Awk Basics

I’ve been meaning to learn Awk for so many years. This post is just going to cover some basics to get started; for a proper read, check out The GNU Awk User’s Guide.

Note: I’m going to be using GNU Awk (gawk), as that’s what’s installed by default on my machine; we still type awk to use it. The commands and programs in this post should, for the most part, also work on other Awk implementations.

What is Awk (elevator pitch)

Awk (named after its authors Aho, Weinberger and Kernighan) is a pattern scanning language: a program and simple programming language for processing data within files. It’s particularly useful for extracting and processing records from files with delimited data.

Example data

Before we get started, this is an example of a very simple data file that we’ll use for some of the sample commands. The file is named test1.txt, hence you’ll see this name in the sample commands.

One     1
Two     2
Three   3
Four    4
Five    5
Six     6
Seven   7
Eight   8
Nine    9
Ten     10

Running Awk

We run awk from the command line, either by supplying all the commands via the terminal or by creating files for our commands. To run the command(s) via the terminal just use

awk '{ print }' test1.txt

We can also create files with our awk commands (we’ll use .awk as the file extension) and run them like this (from your terminal)

awk -f sample.awk test1.txt

where our sample.awk file looks like this

{ print }
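
As a rough end-to-end sketch of the file-based approach, the following creates a program file and a data file and then runs one against the other. The temporary directory and the two-line data file are just stand-ins for illustration, not part of the original example.

```shell
# Work in a throwaway directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# A cut-down stand-in for test1.txt and the one-line sample.awk program.
printf 'One 1\nTwo 2\n' > "$workdir/test1.txt"
printf '{ print }\n' > "$workdir/sample.awk"

# -f tells awk to read the program from a file instead of the command line.
file_output=$(awk -f "$workdir/sample.awk" "$workdir/test1.txt")
echo "$file_output"

rm -rf "$workdir"
```

Since the program is just { print }, the output is the data file unchanged.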

Awk, like most *nix commands, will also take input via the terminal (standard input), so if you run the next command you’ll see Awk is waiting for input. Each line you type in the terminal will be processed and output to the screen. Ctrl+D tells Awk you’ve finished your input, at which point the application exits cleanly, so if you have any code that runs on completion (we’ll discuss start up and clean up code later), Awk will still run it.

awk '{print}'

Again, as is standard for *nix commands, you can of course pipe output from other commands into Awk, so let’s list the directory and let Awk process that data using something like this

ls -l | awk '{ print ">>> " $0 }'

The $0 is discussed in Basic Syntax but to save you checking, it outputs the whole row/record. Hence this code takes ls listing the files in the directory in tabular format, then Awk simply prepends >>> to each row from ls
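
To make the field variables a little more concrete, here’s a minimal sketch. The two input records are made up and supplied via printf rather than ls, so the output is predictable. $0 is the whole record, while $1 and $2 are the first and second whitespace-separated fields.

```shell
# Pipe two sample records into awk and print the fields in reverse order.
# The comma in print separates the values with a single space on output.
swapped=$(printf 'One 1\nTwo 2\n' | awk '{ print $2, $1 }')
echo "$swapped"
# prints:
# 1 One
# 2 Two
```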

Basic Syntax

The basic syntax for an awk program is

pattern { action }

So let’s use a grep-like pattern match to locate the row in this file that contains the word Nine

awk '/Nine/ { print $0 }' test1.txt

The / characters delimit a regular expression pattern. So in this case we’re matching on the word Nine, and the action then prints the record/line from the file that contains the expression. $0 is the variable (if you like) representing the current line (i.e. in this case the matching line). The final argument in this command is of course the data file we’re using awk against. So this example is similar to grep Nine test1.txt.

Actually we can reduce this particular command to

awk '/Nine/' test1.txt

The pattern matching is case sensitive, so we could write the following to ignore case

awk 'tolower($0) ~ /nine/' test1.txt
# or
awk 'BEGIN{IGNORECASE=1} /nine/' test1.txt

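In the first example we convert the current record to lower case, then use ~ to indicate that we’re matching it against a regular expression. The second example simply disables case sensitivity by setting IGNORECASE; note that IGNORECASE is a gawk extension, so in other Awk implementations it is just an ordinary variable and the match stays case sensitive.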
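
The tolower approach relies only on standard Awk functions, so here’s a small sketch of it in action. The input rows are made up for illustration and supplied via printf.

```shell
# Two of the four sample rows contain "nine" in some mix of upper and lower case.
# tolower($0) lower-cases the record before it is matched against /nine/.
case_matches=$(printf 'One 1\nNINE 9\nnine 9\nTen 10\n' | awk 'tolower($0) ~ /nine/')
echo "$case_matches"
# prints:
# NINE 9
# nine 9
```

Note that the original record is printed, not the lower-cased copy, because tolower returns a new string and leaves $0 alone.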

Awk doesn’t require that we just use pattern matching; we can write Awk language programs. For example, let’s look for any record with a double-digit number in the second column (i.e. 10), where $2 is the second whitespace-separated field of the record

awk 'length($2) == 2' test1.txt

As you can see from the above code, we’re using the built-in function length to help us. Check out section 9.1, Built-in Functions, of The GNU Awk User’s Guide for information on the built-in functions. As you can imagine, we have string manipulation functions, numeric functions etc.
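
As a further sketch along the same lines, a pattern can be a numeric comparison on a field, and the action can call several built-in functions at once. The three input rows here are a made-up subset of the sample data.

```shell
# Keep rows whose second field is 9 or more, then print the first field
# upper-cased (toupper) followed by its length in characters (length).
big_rows=$(printf 'One 1\nNine 9\nTen 10\n' | awk '$2 >= 9 { print toupper($1), length($1) }')
echo "$big_rows"
# prints:
# NINE 4
# TEN 3
```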

Start up and clean up

We can have code run at the start and at the end (the clean up) of an awk program. So for example

awk 'BEGIN {print "start"} { print } END {print "complete"}' test1.txt

In this case we simply display “start” followed by the lines from the file ending with “complete”.
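
The END block is also a handy place to report on what was processed. Here’s a minimal sketch using the built-in NR variable, which holds the number of records read so far; the two input rows are made up for illustration.

```shell
# BEGIN runs before any input is read, the middle action runs once per record,
# and END runs after the last record, by which point NR is the total record count.
numbered=$(printf 'One 1\nTwo 2\n' | awk 'BEGIN { print "start" } { print NR ": " $0 } END { print NR " records" }')
echo "$numbered"
# prints:
# start
# 1: One 1
# 2: Two 2
# 2 records
```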

Variables/State

As I’m sure you’re aware by now, Awk is really a fairly fully fledged programming language. With this in mind you might be wondering if we can store variables in our Awk programs, and the answer is yes. Here’s an example (I’ve formatted the code as it’s stored in a file, which makes it more readable)

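BEGIN {
    # No record has been read yet, so start the total at zero
    count = 0
}

{
    # Runs for every record: add the second column to the total
    count += $2
}

END {
    print "Total " count
}

I’m sure it’s obvious from the code, but I’ll go through it anyway. At the start of the application we initialise the count variable to 0; no record has been read when BEGIN runs, so $2 would be empty there. Awk then processes each row, adding the second column’s value to count, until the end, where we output the count variable. For our test1.txt file this prints Total 55. Note the braces around count += $2: without them Awk treats the expression as a pattern and prints every record for which it evaluates to non-zero.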
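
A running total extends naturally to other aggregates. As a minimal sketch (the four input rows and the name total are just for illustration), here’s a sum and an average computed in one pass, using NR for the record count.

```shell
# Accumulate the second column per record, then report in END.
# The NR > 0 check avoids dividing by zero when the input is empty.
summary=$(printf 'One 1\nTwo 2\nThree 3\nFour 4\n' | awk '
{ total += $2 }
END { if (NR > 0) print "Total " total, "Average " total / NR }
')
echo "$summary"
# prints: Total 10 Average 2.5
```

An unset Awk variable behaves as 0 in numeric context, so total needs no explicit initialisation here, although setting it in BEGIN does no harm and makes the intent clearer.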