{"id":8979,"date":"2022-01-04T21:38:46","date_gmt":"2022-01-04T21:38:46","guid":{"rendered":"http:\/\/putridparrot.com\/blog\/?p=8979"},"modified":"2022-01-04T21:38:46","modified_gmt":"2022-01-04T21:38:46","slug":"awk-basics","status":"publish","type":"post","link":"https:\/\/putridparrot.com\/blog\/awk-basics\/","title":{"rendered":"Awk Basics"},"content":{"rendered":"<p>I&#8217;ve been meaning to learn Awk for so many years. This post just covers some basics to get started; for a proper read, check out <a href=\"https:\/\/www.gnu.org\/software\/gawk\/manual\/gawk.html\" rel=\"noopener\" target=\"_blank\">The GNU Awk User\u2019s Guide<\/a>.<\/p>\n<p><em>Note: I&#8217;m going to be using GNU Awk (gawk) as that&#8217;s what&#8217;s installed by default on my machine; we still type awk to use it. The commands and programs in this post should, for the most part, also work on other Awk implementations.<\/em><\/p>\n<p><strong>What is Awk (elevator pitch)<\/strong><\/p>\n<p>Awk (Aho, Weinberger, Kernighan (Pattern Scanning Language)) is a program and simple programming language for processing data within files. It&#8217;s particularly useful for extracting and processing records from files with delimited data.<\/p>\n<p><strong>Example data<\/strong><\/p>\n<p>Before we get started, this is an example of a very simple data file that we&#8217;ll use for some of the sample commands. The file is named <em>test1.txt<\/em>, hence you&#8217;ll see this name in the sample commands.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nOne     1\r\nTwo     2\r\nThree   3\r\nFour    4\r\nFive    5\r\nSix     6\r\nSeven   7\r\nEight   8\r\nNine    9\r\nTen     10\r\n<\/pre>\n<p><strong>Running Awk<\/strong><\/p>\n<p>We run <em>awk<\/em> from the command line either by supplying all the commands via the terminal or by creating files for our commands. 
To run the command(s) via the terminal just use<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk '{ print }' test1.txt\r\n<\/pre>\n<p>We can also create files with our awk commands (we&#8217;ll use .awk as the file extension) and run them like this (from your terminal)<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk -f sample.awk test1.txt\r\n<\/pre>\n<p>where our sample.awk file looks like this<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n{ print }\r\n<\/pre>\n<p>Awk, like most *nix commands, will also take input via the terminal, so if you run the next command you&#8217;ll see Awk is waiting for input. Each line you type in the terminal will be processed and output to the screen. Ctrl+D signals end of input, at which point Awk exits normally, so if you have any code that runs on completion (we&#8217;ll discuss start up and clean up code later), Awk will still run that completion code.<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk '{print}'\r\n<\/pre>\n<p>Again, as is standard for *nix commands, you can of course pipe from other commands into Awk, so let&#8217;s list the directory and let Awk process that data using something like this<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nls -l | awk '{ print &quot;&gt;&gt;&gt; &quot; $0 }'\r\n<\/pre>\n<p>The $0 is discussed in <em>Basic Syntax<\/em> but to save you checking, it represents the whole row\/record. 
Hence this code takes the <em>ls<\/em> listing of the files in the directory in tabular format, then Awk simply prepends &gt;&gt;&gt; to each row from <em>ls<\/em>.<\/p>\n<p><strong>Basic Syntax<\/strong><\/p>\n<p>The basic syntax for an awk program is<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\npattern { action }\r\n<\/pre>\n<p>So let&#8217;s use a grep-like pattern match by locating the row in this file that contains the word <em>Nine<\/em><\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk '\/Nine\/ { print $0 }' test1.txt\r\n<\/pre>\n<p>The \/ characters delimit a regular expression pattern. So in this case we&#8217;re matching on the word <em>Nine<\/em> and then the action prints the record\/line from the file that matches the expression. $0 is the variable (if you like) holding the current line (i.e. in this case the matching line). The final argument in this command is of course the data file we&#8217;re using awk against. So this example is similar to <em>grep Nine test1.txt<\/em>.<\/p>\n<p>Actually, because printing the record is the default action when none is given, we can reduce this particular command to <\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk '\/Nine\/' test1.txt\r\n<\/pre>\n<p>The pattern matching is case sensitive, so we could write the following to ignore case<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk 'tolower($0) ~ \/nine\/' test1.txt\r\n# or\r\nawk 'BEGIN{IGNORECASE=1} \/nine\/' test1.txt\r\n<\/pre>\n<p>In the first example we convert the current record to lower case then use ~ to match the result against the regular expression. The second example simply turns off case sensitivity; note that IGNORECASE is a gawk extension, so it may have no effect in other Awk implementations.<\/p>\n<p>Awk doesn&#8217;t restrict us to regular expression matching; a pattern can be any Awk expression. For example, let&#8217;s look for any record with a double digit number in the second column (i.e. 
10)<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk 'length($2) == 2' test1.txt\r\n<\/pre>\n<p>As you can see from the above code, we&#8217;re using the built-in function <em>length<\/em> to help us; check out <a href=\"https:\/\/www.gnu.org\/software\/gawk\/manual\/gawk.html#Built_002din\" rel=\"noopener\" target=\"_blank\">9.1 Built-in Functions<\/a> for information on the built-in functions. As you can imagine, we have string manipulation functions, numeric functions etc.<\/p>\n<p><strong>Start up and clean up<\/strong><\/p>\n<p>We can have code run at the start and at the end (clean up) of an awk program. For example <\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nawk 'BEGIN {print &quot;start&quot;} { print } END {print &quot;complete&quot;}' test1.txt\r\n<\/pre>\n<p>In this case we simply display &#8220;start&#8221;, followed by the lines from the file, ending with &#8220;complete&#8221;.<\/p>\n<p><strong>Variables\/State<\/strong><\/p>\n<p>As I&#8217;m sure you&#8217;re aware by now, Awk is really a fairly fully fledged programming language. With this in mind you might be wondering if we can store variables in our Awk programs, and the answer is yes. 
Here&#8217;s an example (I&#8217;ve formatted the code as it&#8217;s stored in a file, which makes it more readable)<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nBEGIN { \r\n    count = 0 \r\n}\r\n\r\n{ count += $2 }\r\n\r\nEND { \r\n    print &quot;Total &quot; count\r\n}\r\n<\/pre>\n<p>I&#8217;m sure it&#8217;s obvious from the code, but I&#8217;ll go through it anyway &#8211; at the start of the application we initialise the <em>count<\/em> variable to 0 (no record has been read when BEGIN runs, so fields such as $2 are not available there), then as Awk processes each row we add the second column value to <em>count<\/em>, until the end where we output the <em>count<\/em> variable (55 for our test1.txt file). Note that the addition is wrapped in braces to make it an action; written bare, it would be treated as a pattern and Awk would also print every row for which the running total is non-zero.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been meaning to learn Awk for so many years &#8211; this post is just going to cover some basics to getting started for a proper read, check out The GNU Awk User\u2019s Guide. Note: I&#8217;m going to be using Gnu Awk (gawk) as that&#8217;s what&#8217;s installed by default with my machine, we still type 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[320],"tags":[],"class_list":["post-8979","post","type-post","status-publish","format-standard","hentry","category-awk"],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/posts\/8979","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/comments?post=8979"}],"version-history":[{"count":5,"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/posts\/8979\/revisions"}],"predecessor-version":[{"id":9066,"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/posts\/8979\/revisions\/9066"}],"wp:attachment":[{"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/media?parent=8979"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/categories?post=8979"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/putridparrot.com\/blog\/wp-json\/wp\/v2\/tags?post=8979"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}