Shell scripting tips

It’s been a while since I last wrote a shell script. A few years ago I had to automate many tasks for managing thousands of servers. This made me write many scripts for many purposes, like running sanity checks on a set of clusters, collecting and analysing logs, or automating the deployment of configurations across hundreds of servers. I ended up with many tools at hand, and needed a way to productively write new tools and to reuse the code common to all the scripts.
In this post I will share a few practices that I applied when writing my scripts. I would not call them best practices; they simply worked for me. They are not all my own invention: I took reference from other people’s scripts over the years, and from shell scripting books.

Functions

Using functions was my basic way of achieving code reuse. I designed my functions across all scripts to do one specific thing each, and then composed the scripts with just the logic to call those functions. Another advantage of doing this is that it made the scripts easier to debug when I ran into an issue. As I was writing code to automate tasks on a large-scale infrastructure, scripts tended to be very long, so using functions helped a lot with code clarity. When designing a script, the first thing I prioritized was how to isolate code into functions.
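
As an illustration, the main section of such a script reduces to little more than a sequence of function calls. This is only a sketch; the function names below are hypothetical:

# main logic: nothing but calls to single-purpose functions (hypothetical names)
func_parse_args "${@}"
func_check_cluster_health
func_collect_logs
func_print_report
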
I followed a coding guideline for functions, with only three basic rules:

Function names

Always use the prefix “func_”, followed by a name that describes what the function does. One example:

func_panic () {
# Print error message to stderr and exit with error code
# @params:
#	${*}: error message
# @output:
#	stdout:	none
#	stderr:	error message
# @return: none. This function will exit the program with error code ${RC_ERROR}

	message=${*}
	printf "%s\n" "${message}" 1>&2
	exit ${RC_ERROR}
}

The reason to prefix all function names with func_ is to differentiate them from the other utilities or commands that you call from your script. If I had named the function above just panic, you would see somewhere in the main script a function call like

panic "Exception occured"

Well, one could mistake panic for an external command or a shell builtin like echo or printf. Prefixing the function names with func_ helps quickly tell my functions and other commands apart.

Function documentation

At the top of the function, include comments describing the function’s purpose, expected input parameters, output and return codes. You may have noticed in the previous example how the function was documented. Here is another example:

func_delete_files () {
# Delete files one by one.
# Only files under ${BASE_DIR} will be deleted.
# Function will abort if it fails to delete one of the files.
# @params:
#	${*}: list of files (full path) to delete.
# @output:
#	stdout:	none
#	stderr:	error message
# @return:
#	${RC_SUCCESS}:	everything went fine
#	${RC_ERROR}:	failed to delete a file

	files=${*}
	for file in ${files}; do
		if ! echo "${file}" | grep -q "^${BASE_DIR}/"; then
			echo "Cannot delete files outside of ${BASE_DIR}: ${file}" 1>&2
			return ${RC_ERROR}
		fi
		rm "${file}" || {
			echo "Failed to delete ${file}" 1>&2
			return ${RC_ERROR}
		}
	done
	return ${RC_SUCCESS}
}

Other people read your code. And even if you are the only one writing and maintaining it, you will someday need to read it again. In my view, shell syntax is not very readable in the first place, so at least for isolated code like functions, I think it is good practice to write some guidance about what the function does, what it expects as input parameters, and so on. You’ll thank yourself the day you come back to read a well-documented function that you wrote six months ago; at least I did…
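
For reference, the header convention used in this post boils down to the following blank template; every field is a placeholder to be filled in for the function at hand:

func_name () {
# One-line description of what the function does.
# @params:
#	${1}: description of the first parameter
# @output:
#	stdout:	what the function prints to stdout
#	stderr:	what the function prints to stderr
# @return:
#	${RC_SUCCESS}:	success case
#	${RC_ERROR}:	failure case

	:	# function body goes here
}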

Function output and return codes

Since the return statement in shell scripts can only return numeric codes, I made a habit of using stdout as a function return value as well. What I mean is that I took advantage of the fact that you can pipe a function’s output as input to other programs. Everything that I printed to stdout inside a function should make sense to another program. Otherwise I did not print anything at all, or used stderr for sending information and warnings. Example:

func_create_tmpfile () {
# Create and return a temp file
# @params: none
# @output:
#	stdout:	filename of temp file
#	stderr:	error message
# @return:
#	${RC_SUCCESS}:	everything went fine
#	${RC_ERROR}:	failure

	tmpfile=${TEMP_DIR}/$( basename "${SELF}" )_$( date +"%s" ).tmp
	touch "${tmpfile}" || return ${RC_ERROR}
	printf "%s" "${tmpfile}" # This can be treated as a return value
	return ${RC_SUCCESS}
}

# We can call this function to create a temp file and get the filename
TMP_FILE=$( func_create_tmpfile ) 

You may have noticed that I always use the return statement in functions. Leaving out the return statement will not make your function fail or anything, but I also made it a habit to return error codes that explicitly declare success or failure of the function execution. After a function call, the caller can do a simple check, just like with any other command:

func_xxx # function call
if [ $? -ne ${RC_SUCCESS} ]; then
	# The function failed; do something
fi
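
Putting both conventions together, the call to func_create_tmpfile from earlier can be checked like this. Command substitution propagates the function’s return code to $?, so the caller gets both the stdout value and the status:

TMP_FILE=$( func_create_tmpfile )
if [ $? -ne ${RC_SUCCESS} ]; then
	func_panic "Could not create a temp file in ${TEMP_DIR}"
fi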

Common modules and file structure

After writing a few scripts, I realized that there were common tasks that I kept rewriting. Since those scripts were written at different times, I ended up with different code doing the same thing. In addition to this, log files and temporary files were handled differently in different scripts. I decided to group all the common pieces into one script file, and reuse that file every time I needed to write a new script. Below are the things that I achieved by doing this.
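
The mechanism is plain sourcing. As a minimal sketch (the file name common.sh and the concrete values here are my own choices for illustration), the shared file defines the constants and functions used throughout this post:

# common.sh -- shared constants and functions, sourced by every script
RC_SUCCESS=0
RC_ERROR=1

Each script then pulls it in near the top, before calling anything else:

. "${BASE_DIR}/common.sh"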

Standard directory and file structure

I usually had to deal with three types of files with my scripts:

  • Configuration files
  • Log files
  • Temporary files

In addition to this, many of the scripts had to be deployed and executed on many servers. I decided to create a standard environment for all the scripts, which meant standardizing how files and directories were structured. The standard deployment looked like this:

# script install directory
BASE_DIR="" # <-- decide a location for all scripts

# Script file
SELF=${BASE_DIR}/$( basename "${0}" )

# directory to store all the scripts' configuration files
CONFIG_DIR=${BASE_DIR}/config

# configuration file of script_name.sh
CONFIG_FILE=${CONFIG_DIR}/script_name.conf

# directory to store temporary files
TEMP_DIR=${BASE_DIR}/temp

# directory to store log files
LOG_DIR=${BASE_DIR}/logs/script_name.d

# log file for script_name.sh
LOG_FILE=${LOG_DIR}/script_name.log

Following this standard for all my scripts came with benefits:

  • You don’t need to rethink basic things like where to store your configuration files or log files
  • You always know where your files are
  • Having a standard ${BASE_DIR} can help isolate your scripts’ environment from other system files, and thus helps you enforce rules such as “never delete a file outside of ${BASE_DIR}”. Especially when working on business-critical systems, deleting a file by mistake, or because of a bug in your script, is very dangerous. This rule helped me avoid those types of accidents (a minimal guard for it is sketched below).
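
As a sketch of how such a rule can be enforced, here is a small guard written in the conventions of this post; the name func_in_base_dir is hypothetical:

func_in_base_dir () {
# Check that a given path lives under ${BASE_DIR}
# @params:
#	${1}: path to check
# @output:
#	stdout:	none
#	stderr:	none
# @return:
#	${RC_SUCCESS}:	path is under ${BASE_DIR}
#	${RC_ERROR}:	path is outside of ${BASE_DIR}

	case "${1}" in
		"${BASE_DIR}"/*) return ${RC_SUCCESS} ;;
		*) return ${RC_ERROR} ;;
	esac
}

Any destructive operation can then be gated on it, for example: func_in_base_dir "${file}" || func_panic "Refusing to touch ${file}".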

Common module