Use Bash Builtins

First and foremost, this is nothing new. You will find many people on the Internet making a case for using the native capabilities of your shell (or any programming language's standard library, for that matter).

I frequently encounter Bash scripts (and have written several myself) that call out to external programs to do things that Bash natively supports. Initially this might sound like a pedant's complaint, but there are legitimate reasons to prefer builtins over external programs in your Bash scripts. The primary reason is performance: builtins are frequently significantly faster because the shell doesn't need to fork and exec a new process to run them. There are other nice advantages, like not having to worry about as many platform-specific implementation differences and reducing the dependencies your scripts require.
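
If you are unsure whether a name resolves to a builtin or to an external program, the type builtin can tell you. A minimal sketch follows; the helper name is_builtin is made up for illustration and is not part of the benchmark script.

```bash
# is_builtin NAME: succeed if NAME resolves to a shell builtin in this shell.
# (Hypothetical helper for illustration.)
is_builtin() {
    [[ $(type -t "$1") == builtin ]]
}

for cmd in echo printf sed; do
    if is_builtin "$cmd"; then
        printf '%s is a builtin\n' "$cmd"
    else
        printf '%s is not a builtin (type -t reports: %s)\n' "$cmd" "$(type -t "$cmd")"
    fi
done
```

For an external program, type -t reports "file", and it prints nothing if the name isn't found at all.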

I threw together a simple benchmark script to illustrate the performance points, along with some common examples of unnecessary uses of external programs in scripts. In many cases, people just aren't aware that Bash can do these things natively. Let's fix that.

I've included a snippet of my benchmark results under each section. The full benchmark script can be found here. For this post, I ran the script with 10000 iterations. Also note that the benchmark relies on builtins in Bash 4.0+, so you'll need to install a more recent version if you are on OS X.
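
To give a feel for how such a comparison can be timed, here is a rough sketch of a harness. It is not the benchmark script referred to above; the function name bench and its arguments are assumptions made for this example, and it relies on Bash's time keyword with TIMEFORMAT.

```bash
# bench LABEL ITERATIONS COMMAND [ARGS...]
# Runs COMMAND the given number of times and prints the elapsed wall-clock
# seconds. Hypothetical helper, not the author's benchmark script.
bench() {
    local label=$1 iterations=$2 i elapsed
    shift 2
    local TIMEFORMAT='%R'   # make the `time` keyword report only real seconds
    # `time` writes to stderr, so capture it with 2>&1 around the group.
    elapsed=$( { time for (( i = 0; i < iterations; i++ )); do
        "$@" >/dev/null 2>&1
    done; } 2>&1 )
    printf '%s: %s\n' "$label" "$elapsed"
}

bench 'builtin echo' 1000 echo hello
bench '/bin/echo' 1000 /bin/echo hello
```

The absolute numbers depend on the machine; what matters is the ratio between the two lines.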

Reading files

You're probably familiar with the useless use of cat. It's not uncommon to see cat used to feed a file to a command's standard input or to read it into a variable when we could simply be using the redirection operator instead.

$ echo "$(< ~/stop_using_cat.txt)"
Stop using cat!
$ FILE=$(< ~/stop_using_cat.txt)

Performance comparison:

>>> read_benchmark
builtin redirection: 0.256
/usr/bin/cat: 13.131
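
The same idea applies when processing a file line by line: redirect the file into the loop rather than piping cat into it. A small sketch, using a throwaway temporary file so it can be run as-is (the variable names are only illustrative):

```bash
# Create a throwaway file so the example is self-contained.
tmpfile=$(mktemp)
printf 'first line\nsecond line\n' > "$tmpfile"

# Whole file into a variable (trailing newlines are stripped).
contents=$(< "$tmpfile")

# Line by line, with the file on the loop's standard input
# instead of `cat file | while ...`.
count=0
while IFS= read -r line; do
    count=$(( count + 1 ))
    printf 'line %d: %s\n' "$count" "$line"
done < "$tmpfile"

rm -f "$tmpfile"
```

Because no pipe is involved, the loop runs in the current shell and count is still set after it finishes.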

Arithmetic

I don't see this one very often anymore. Most people are aware of Bash's builtin arithmetic expressions. Some people still use expr or bc though. There are legitimate cases for using bc, but generally native Bash is sufficient.

$ echo $(( 987 + 1 ))
988
$ echo '987 + 1' | bc
988
$ expr 987 + 1
988

Performance comparison:

>>> math_benchmark
builtin shell math: 0.241
/bin/expr: 12.732
/usr/bin/bc: 18.720
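
A few more arithmetic forms, as a sketch. Bash arithmetic is integer-only, which is one of the legitimate reasons to reach for bc; the scaling trick at the end is just one possible workaround for fixed precision, not a general replacement.

```bash
# Integer arithmetic with $(( )) and (( )); no external process is started.
total=$(( 987 + 1 ))          # 988
(( total += 12 ))             # compound assignment, now 1000
quotient=$(( total / 3 ))     # integer division truncates: 333
remainder=$(( total % 3 ))    # 1
power=$(( 2 ** 10 ))          # 1024

# Bash has no floating point. For fixed precision, scale by a power of ten
# and place the decimal point with printf.
scaled=$(( total * 100 / 3 )) # 33333, i.e. 333.33 scaled by 100
printf -v ratio '%d.%02d' $(( scaled / 100 )) $(( scaled % 100 ))
echo "$ratio"                 # 333.33
```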

Format Strings

In trivial cases, you won't gain much by using printf over the echo builtin, but printf is more powerful and, as a builtin, about as fast. Be careful to use the builtin versions of echo and printf and not the external ones. Also, if you're using echo with options in your scripts, you should generally be using printf instead, because the echo options are not standardized and vary wildly across platforms.

$ echo 'hello'
hello
$ printf 'hello\n'
hello

Generally, printf formatting is the same as printf() from C or any other common programming language. The beauty of printf comes when you need complex formatting. You don't need to rely on carefully crafting here docs or tools like column.

    $ for i in {1..5}; do
        printf '%5s %10s %10s\n' ${i} 'zomg' ${RANDOM:0:3}
    done
        1       zomg        305
        2       zomg        326
        3       zomg        688
        4       zomg        286
        5       zomg        485
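
Two other printf behaviors are worth knowing here, shown as a short sketch: the format string is reused until the arguments run out, and -v stores the result in a variable instead of printing it. The variable name padded is only illustrative.

```bash
# printf reuses the format string until all arguments are consumed,
# so one call can print a whole table.
printf '%-8s %5s\n' name count alpha 3 beta 12

# printf -v stores the result in a variable instead of printing it,
# which avoids a $(...) subshell.
printf -v padded '%05d' 42
echo "$padded"   # 00042
```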

The printf builtin is also handy for dealing with numbers in other formats, e.g. hexadecimal MAC addresses.

$ MAC=$(< /sys/class/net/eth0/address)
$ B0=$(printf '%02x' "$(( 0x${MAC:0:2} ^ 2 ))")
$ IPv6="fe80::${B0}${MAC:3:5}ff:fe${MAC:9:5}${MAC:15:2}"
$ echo ${IPv6}
fe80::2488:9cff:fe93:fdc1
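
The steps above can be wrapped in a function. This is only a sketch: the name mac_to_link_local is hypothetical, the sample MAC is inferred from the output shown above, and the function does not compress leading zeros the way canonical IPv6 notation would.

```bash
# mac_to_link_local MAC: print the EUI-64 style link-local address for a
# colon-separated MAC address. Hypothetical helper for illustration.
mac_to_link_local() {
    local mac=${1,,} first
    # Flip the universal/local bit (0x02) of the first octet.
    printf -v first '%02x' "$(( 0x${mac:0:2} ^ 2 ))"
    # Insert ff:fe between the two halves of the MAC.
    printf 'fe80::%s%sff:fe%s%s\n' "$first" "${mac:3:5}" "${mac:9:5}" "${mac:15:2}"
}

mac_to_link_local '26:88:9c:93:fd:c1'   # prints fe80::2488:9cff:fe93:fdc1
```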

Performance comparison:

>>> print_benchmark
builtin echo: 0.172
/bin/echo: 8.685
builtin printf: 0.186
/usr/bin/printf: 10.105

You can even use printf as a date replacement (the %(...)T format requires Bash 4.2 or newer).

$ echo "The date is $(date +%F)."
The date is 2015-08-28.
$ printf 'The date is %(%Y-%m-%d)T.\n'
The date is 2015-08-28.

Performance comparison:

>>> date_benchmark
builtin printf: 0.227
/usr/bin/date: 11.054
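
The %(...)T format takes a timestamp argument in seconds since the epoch, with two special values. A short sketch; the variable names are only illustrative:

```bash
# -1 means "now" and -2 means "when this shell started". Passing -1
# explicitly avoids relying on the default for a missing argument.
printf -v today '%(%Y-%m-%d)T' -1
printf -v started '%(%H:%M:%S)T' -2

echo "Today is ${today}; this shell started at ${started}."
```

Combined with -v, this gives you a timestamp in a variable without a $(date ...) subshell, which is handy for log prefixes inside loops.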

Counting

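Character and line counting are often deferred to wc, but we can also replace some instances of this with pure Bash for performance gains. Note that we use printf to print the variable. We could use echo, but we would have to discard the extra newline it appends. Also note that ${#CONTENT} counts characters while wc -c counts bytes, so the two can differ when the text contains multibyte characters.

$ CONTENT=$(< blogpost.md)
$ printf '%s' "${CONTENT}" | wc -c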
8341
$ echo "${#CONTENT}"
8341

Performance comparison:

>>> charlength_benchmark
builtin variable length: 0.232
/usr/bin/wc -c: 23.800
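
If you need a byte count rather than a character count, one approach is to force the C locale for the length expansion. This is a sketch; byte_length is a made-up helper name.

```bash
# "naïve" with the ï written as its two UTF-8 bytes.
text=$'na\xc3\xafve'

# ${#text} counts characters in the current locale (5 in a UTF-8 locale).
echo "${#text}"

# byte_length STRING: forcing the C locale makes Bash count bytes,
# like `wc -c`. Hypothetical helper for illustration.
byte_length() {
    local LC_ALL=C
    echo "${#1}"
}
byte_length "$text"   # 6
```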

We can do line counting by taking advantage of the mapfile builtin in Bash. mapfile reads lines of standard input and assigns each line to an element in an array. Then we can use the builtin array length syntax to count the lines.

$ MULTILINE=$(printf 'Remember kids,\nUse your builtins!\nCheers.\n')
$ echo "${MULTILINE}" | wc -l
3
$ mapfile -t MLARR <<< "${MULTILINE}" # read into array
$ echo "${#MLARR[@]}"
3

Performance comparison:

>>> linelength_benchmark
builtin mapfile and array length: 1.192
/usr/bin/wc -l: 26.096
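
mapfile can also read straight from a file, which skips the intermediate variable. A sketch using a temporary file so it runs as-is (the array name lines is only illustrative):

```bash
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

# Read the file directly; no command substitution or pipe is needed.
# (Piping into mapfile would fill the array in a subshell and lose it;
# for command output, use: mapfile -t lines < <(some_command).)
mapfile -t lines < "$tmpfile"
echo "${#lines[@]}"      # 3

# The array also gives indexed access to each line.
echo "${lines[1]}"       # two

rm -f "$tmpfile"
```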

Command Paths

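This one is a little pedantic, but Bash has a builtin for locating a program on the search path: hash. It looks up the program's full path and remembers it (hash -t prints it), and it returns a non-zero status if the program can't be found. It's a lot faster than which when you just need to check whether a program exists on the search path.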

$ which python >/dev/null
$ hash python

Performance comparison:

>>> command_benchmark
builtin hash: 0.622
/usr/bin/which: 9.889
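
A typical use is checking dependencies at the top of a script. The following is a sketch; the function name require is hypothetical.

```bash
# require NAME...: fail with a message if any program is missing from PATH.
# Hypothetical helper for illustration.
require() {
    local cmd missing=0
    for cmd in "$@"; do
        # hash returns non-zero when the name is not found on the search path.
        if ! hash "$cmd" 2>/dev/null; then
            printf 'missing required command: %s\n' "$cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

require bash && echo 'bash is available'

# When you do want the path printed, type -P is the builtin counterpart.
type -P bash
```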

Note that there is a which builtin in Zsh that I didn't test because it's not Bash. If you are writing Zsh scripts, you should make use of it too.

Sequence expansion

A slightly lesser-known Bashism is the builtin brace expansion, which can stand in for an external program like seq. The main caveat (making it less useful in my opinion) is that it is strictly textual: Bash does not apply any syntactic interpretation to the context of the expansion or the text between the braces. Because brace expansion is performed before variable expansion, you can't use variables as the endpoints of a sequence.

$ echo {1..10}
1 2 3 4 5 6 7 8 9 10
$ for i in {01..3}; do echo "host${i}"; done
host01
host02
host03
$ limit=10 && echo {1..${limit}} # does not work
{1..10}

Performance comparison:

>>> seq_expansion_benchmark
builtin sequence expansion: 5.328
seq: 10.076
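
When the limit has to come from a variable, an arithmetic for loop is a builtin alternative that still avoids seq. A sketch, with list_hosts as a made-up function name:

```bash
# list_hosts COUNT: print host01 .. hostNN using an arithmetic for loop.
# Brace expansion happens before variable expansion, so {1..$count} would
# stay literal; the (( )) loop accepts variables.
# Hypothetical helper for illustration.
list_hosts() {
    local count=$1 i
    for (( i = 1; i <= count; i++ )); do
        printf 'host%02d\n' "$i"
    done
}

limit=3
list_hosts "$limit"
```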

String Manipulation

Bash supports a TON of variable manipulation methods (see the Bash Hackers Wiki for a more comprehensive list), so I'll cover just a couple of common cases here to illustrate the point. This isn't intended to be a how-to, but a case for using builtins.

People often rely on tr or awk to change letter casing.

$ UPPER='CRUISECONTROL'
$ echo "${UPPER}" | tr 'A-Z' 'a-z'
cruisecontrol
$ echo "${UPPER}" | awk '{print tolower($0)}'
cruisecontrol
$ echo "${UPPER,,}" # native
cruisecontrol

Performance comparison:

>>> downcase_benchmark
builtin variable manipulation: 0.202
/usr/bin/tr: 20.698
/usr/bin/awk: 37.323
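
The case-conversion expansions (Bash 4.0+) work in the other direction too. A brief sketch with illustrative variable names:

```bash
name='cruise control'
upper=${name^^}    # every character upper-cased
capital=${name^}   # only the first character upper-cased
echo "$upper"      # CRUISE CONTROL
echo "$capital"    # Cruise control

# declare -l lower-cases everything assigned to the variable from then on.
declare -l answer
answer='YES'
echo "$answer"     # yes
```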

People often use sed for simple string replacement, e.g. search and replace functionality.

$ CONTENT=$(< blogpost.md)
$ echo "${CONTENT}" | sed -e 's/replace/transform/g' > newpost.md
$ echo "${CONTENT//replace/transform}" > newpost.md

Performance comparison:

>>> replacement_benchmark
builtin variable replacement: 0.244
/usr/bin/sed: 29.274
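
Prefix and suffix removal covers many of the cases where scripts call basename, dirname, or cut. A sketch using a made-up path:

```bash
path='/var/log/nginx/access.log.gz'   # example value, not a real file

file=${path##*/}       # strip longest prefix up to the last /  (like basename)
dir=${path%/*}         # strip shortest suffix from the last /  (like dirname)
stem=${file%%.*}       # strip everything from the first dot
ext=${file##*.}        # keep only the last extension
first=${path/log/LOG}  # single slash: replace the first match only

echo "$file"    # access.log.gz
echo "$dir"     # /var/log/nginx
echo "$stem"    # access
echo "$ext"     # gz
echo "$first"   # /var/LOG/nginx/access.log.gz
```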

On the note of sed, if you must use sed for your use case, be sure to combine multiple expressions into one command rather than using multiple pipes. There is a large performance hit for starting a new sed process.

$ echo "${CONTENT}" | sed -e 's/UNIX/Unix/g' -e 's,Linux,GNU/Linux,g'

Summary

There are many more examples of pure Bash implementations of common programming problems that I didn't cover. The benchmarks hopefully made it pretty clear that Bash wins out performance-wise by a landslide in most simple cases. So generally I'm saying: take advantage of all Bash has to offer, like you would the standard library in your programming language of choice. Using the builtin capabilities of the shell is going to be faster than calling out to external programs, and external programs are often overkill for the common situations where you're using a shell script anyway. Take a look at Stack Overflow now, and notice how many answers involving sed, awk, grep, and wc could be solved with pure Bash.