How to write awk commands and scripts: examples

The awk command is a powerful method for processing or analyzing text files, in particular data files that are organized into lines (rows) and columns.

Simple awk commands can be run from the command line. More complex tasks should be written to a file as awk programs (so-called awk scripts).

The basic format of an awk command looks like this:
awk 'pattern {action}' input-file > output-file
This means: take each line of the input file; if the line matches the pattern, apply the action to it and write the resulting line to the output file. If the pattern is omitted, the action is applied to all lines. For example:
awk '{ print $5 }' table1.txt > output1.txt
This statement takes the element in the 5th column of each line and writes it as a line in the output file "output1.txt". The variable $5 refers to the fifth column; likewise $1, $2, $3, and so on refer to the first, second, third, and following columns. By default, columns are assumed to be separated by spaces or tabs (so-called white space). So, if the input file "table1.txt" contains these lines:

1, Justin Timberlake, Title 545, Price $7.30
2, Taylor Swift, Title 723, Price $7.90
3, Mick Jagger, Title 610, Price $7.90
4, Lady Gaga, Title 118, Price $7.30
5, Johnny Cash, Title 482, Price $6.50
6, Elvis Presley, Title 335, Price $7.30
7, John Lennon, Title 271, Price $7.90
8, Michael Jackson, Title 373, Price $5.50
Then the command would write the following lines to the output file "output1.txt":
545,
723,
610,
118,
482,
335,
271,
373,
If the column separator is something other than spaces or tabs, such as a comma, you can specify that in the awk statement as follows:
awk -F, '{ print $3 }' table1.txt > output1.txt 
This will select the element from column 3 of each line if the columns are considered to be separated by a comma. Therefore the output, in this case, would be:
 Title 545
 Title 723
 Title 610
 Title 118
 Title 482
 Title 335
 Title 271
 Title 373

The list of statements inside the curly brackets ('{','}') is called a block. If you put a conditional expression in front of a block, the statements inside the block will be executed only if the condition is true.
awk '$7=="$7.30" { print $3 }' table1.txt 
In this case the condition is $7=="$7.30", which means that the element at column 7 is equal to the string "$7.30". Inside a quoted awk string the dollar sign is taken literally, so it is not interpreted as a field reference such as $7 (and the single quotes keep the shell from touching it as well).

So this awk statement prints out the element at the 3rd column of each line that has a "$7.30" at column 7.

You can also use regular expressions as conditions. For example:
awk '/30/ { print $3 }' table1.txt
The string between the two slashes ('/') is the regular expression. In this case it is just the string "30". This means, if a line contains the string "30", the system prints out the element at the 3rd column of that line. The output in the above example would be:

Timberlake,
Gaga,
Presley,
 
If the table elements are numbers awk can run calculations on them as in this example:
awk '{ print ($2 * $3) + $7 }'
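To try it out, you can pipe in a made-up line of numbers on standard input:
echo "1 2 3 4 5 6 7" | awk '{ print ($2 * $3) + $7 }'
This prints 13, because the second and third fields are multiplied (2 * 3 = 6) and the seventh field (7) is added.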
Besides the variables that access elements of the current row ($1, $2, etc.) there is the variable $0, which refers to the complete row (line), and the variable NF, which holds the number of fields. You can also define new variables as in this example:
awk '{ sum=0; for (col=1; col<=NF; col++) sum += $col; print sum; }'
This computes and prints the sum of all the elements of each row.
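As a rough sketch that combines these ideas with the sample file from above (assuming the price is always the last field), you could total all the prices in "table1.txt" like this:
awk '{ gsub(/\$/, "", $NF); total += $NF } END { print "Total:", total }' table1.txt
The gsub() call strips the dollar sign from the last field, and the END block runs once after the last line has been read, printing the accumulated total (57.6 for the sample data above).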

Awk statements are frequently combined with sed commands.
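For instance, one possible combination (just a sketch) is to let sed strip the commas first, so that awk's default white-space splitting lines up cleanly with the sample file:
sed 's/,//g' table1.txt | awk '{ print $3, $NF }'
For "table1.txt" this prints the last name and the price of each row, such as "Timberlake $7.30".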

Linux awk command with examples
Each entry below gives the command syntax, followed by a description:
awk ' {print $1,$3} '
Print only columns one and three using stdin
awk ' {print $0} '
Print all columns using stdin
awk ' /'pattern'/ {print $2} '
Print only elements from column 2 that match pattern using stdin
awk -f script.awk inputfile
Just like make or sed, awk uses -f to read its instructions from a file, which is useful when there is a lot to be done and typing it all at the terminal would be impractical
awk ' program ' inputfile
Execute program using data from inputfile
awk "BEGIN { print \"Hello, world!!\" }"
Classic "Hello, world" in awk
awk '{ print }'
Print what's entered on the command line until EOF (^D)
#! /bin/awk -f
BEGIN { print "Hello, world!" }
awk script for the classic "Hello, world!" (make it executable with chmod and run it as-is)
# This is a program that prints
# "Hello, world!" and exits
Comments in awk scripts
awk -F "" 'program' files
Define the FS (field separator) as null, as opposed to white space, the default
awk -F "regex" 'program' files
FS can also be a regular expression
awk 'BEGIN { print "Here is a single \
quote <'\''>" }'
Will print <'>. Here's why we prefer Bourne shells. :)
awk '{ if (length($0) > max) max = \
length($0) }
END { print max }' inputfile
Print the length of the longest line
awk 'length($0) > 80' inputfile
Print all lines longer than 80 characters
awk 'NF > 0' data
Print every line that has at least one field (NF stands for Number of Fields)
awk 'BEGIN { for (i = 1; i <= 7; i++)
print int(101 * rand()) }'
Print seven random numbers from 0 to 100
ls -l . | awk '{ x += $5 } ; END \
{ print "total bytes: " x }'
total bytes: 7449362
Print the total number of bytes used by files in the current directory
ls -l . | awk '{ x += $5 } ; END \
{ print "total kilobytes: " (x + \
1023)/1024 }'
total kilobytes: 7275.85
Print the total number of kilobytes used by files in the current directory
awk -F: '{ print $1 }' /etc/passwd | sort
Print sorted list of login names
awk 'END { print NR }' inputfile
Print the number of lines in a file; NR holds the Number of Records (lines) read so far
awk 'NR % 2 == 0' data
Print the even-numbered lines in a file.
How would you print the odd-numbered lines?
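One possible answer to the question above (odd-numbered lines have NR % 2 equal to 1):
awk 'NR % 2 == 1' data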
ls -l | awk '$6 == "Nov" { sum += $5 }
END { print sum }'
Prints the total number of bytes of files that were last modified in November
awk '$1 ~ /J/' inputfile
Regular expression matching all entries whose first field contains a capital J
awk '$1 !~ /J/' inputfile
Regular expression matching all entries whose first field does not contain a capital J
awk 'BEGIN { print "He said \"hi!\" \
to her." }'
Escaping double quotes in awk
echo aaaabcd | awk '{ sub(/a+/, \
"<A>"); print }'
Prints "<A>bcd"
ls -lh | awk '{ owner = $3 ; $3 = $3 \
" 0wnz"; print $3 }' | uniq
Attribution example; try it :)
awk '{ $2 = $2 - 10; print $0 }' inventory
Print each record of inventory with the value of the second field reduced by 10 (the file itself is not changed)
awk '{ $6 = ($5 + $4 + $3 + $2); print \
$6 }' inventory
Even though field six doesn't exist in inventory, you can create it and assign values to it, then display it
echo a b c d | awk '{ OFS = ":"; $2 = ""
> print $0; print NF }'
OFS is the Output Field Separator and the command will output "a::c:d" and "4" because although field two is nullified, it still exists so it gets counted
echo a b c d | awk '{ OFS = ":"; \
$2 = ""; $6 = "new"
> print $0; print NF }'
Another example of field creation; as you can see, the field between $4 (existing) and $6 (to be created) gets created as well (as $5 with an empty value), so the output will be "a::c:d::new" "6"
echo a b c d e f | awk '\
{ print "NF =", NF;
> NF = 3; print $0 }'
Throwing away three fields (last ones) by changing the number of fields
FS=[ ]
This regular expression sets the field separator to a single space and nothing else (rather than the default, which is any run of white space)
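To see the difference, here is a made-up input with two consecutive spaces:
echo 'a  b' | awk 'BEGIN { FS = "[ ]" } { print NF }'
This prints 3, because every single space is a separator and the double space produces an empty field in the middle; with the default FS it would print 2.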
echo ' a b c d ' |  awk 'BEGIN { FS = \
"[ \t\n]+" }
> { print $2 }'

This will print only "a"
awk '/RE/ { print; exit }' file.txt
Print only the first match of RE (regular expression)
awk -F\\\\ '...' inputfiles ...
Sets FS to a single backslash (the shell strips one level of backslashes and awk strips the other)
BEGIN { RS = "" ; FS = "\n" }
{
print "Name is:", $1
print "Address is:", $2
print "City and State are:", $3
print ""
}
If the input consists of blank-line-separated records like
"John Doe
1234 Unknown Ave.
Doeville, MA", this script treats each such block as one record (RS = "")
and each line within it as a field (FS = "\n"), so the individual lines are easy to work with
awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
> { print $1, $2 }' inputfile
With a two-field file, the records will be printed like this:
"field1:field2

field3;field4

...;..."
Because ORS, the Output Record Separator, is set to two newlines and OFS is ";"
awk 'BEGIN {
> OFMT = "%.0f" # print numbers as integers (rounds)
> print 17.23, 17.54 }'
This will print 17 and 18, because the Output ForMaT is set to round floating point values to the closest integer value
awk 'BEGIN {
> msg = "Dont Panic!"
> printf "%s\n", msg
>} '
You can use printf much as you would use it in C
awk '{ printf "%-10s %s\n", $1, \
$2 }' inputfile
Prints the first field as a 10-character string, left-justified, and $2 normally, next to it
awk 'BEGIN { print "Name  Number"
             print "----  ------" }
     { printf "%-10s %s\n", $1, \
$2 }' inputfile
Making things prettier
awk '{ print $2 > "phone-list" }' \
inputfile
Simple data extraction example, where the second field is written to a file named "phone-list"
awk '{ print $1 > "names.unsorted"
       command = "sort -r > names.sorted"
       print $1 | command }' inputfile
Write the names contained in $1 to a file, then sort and output the result to another file (you can also append with >>, like you would in a shell)
awk 'BEGIN { printf "%d, %d, %d\n", 011, 11, \
0x11 }'
Will print 9, 11, 17 (011 is octal and 0x11 is hexadecimal)
if (/foo/ || /bar/)
   print "Found!"
Simple search for foo or bar
awk '{ sum = $2 + $3 + $4 ; avg = sum / 3
> print $1, avg }' grades
Simple arithmetic operations (most operators closely resemble C's)
awk '{ print "The square root of", \
$1, "is", sqrt($1) }'
2
The square root of 2 is 1.41421
7
The square root of 7 is 2.64575
Simple, extensible calculator
awk '$1 == "start", $1 == "stop"' inputfile
Prints every record from one whose first field is "start" through one whose first field is "stop", inclusive (a range pattern)
awk '
> BEGIN { print "Analysis of \"foo\"" }
> /foo/ { ++n }
> END { print "\"foo\" appears", n,\
 "times." }' inputfile
BEGIN and END rules are executed exactly once, before and after any record processing
echo -n "Enter search pattern: "
read pattern
awk "/$pattern/ "’{ nmatches++ }
END { print nmatches, "found" }’ inputfile
Search using shell
if (x % 2 == 0)
print "x is even"
else
print "x is odd"
Simple conditional. awk, like C, also supports the ?: operator
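For example, the same test could be written with ?: (a sketch that assumes a numeric first field):
awk '{ print $1, ($1 % 2 == 0 ? "is even" : "is odd") }' inputfile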
awk '{ i = 1
  while (i <= 3) {
    print $i
    i++
  }
}' inputfile
Prints the first three fields of each record, one per line.
awk '{ for (i = 1; i <= 3; i++)
  print $i
}'
Prints the first three fields of each record, one per line.
BEGIN {
if (("date" | getline date_now) <= 0) {
  print "Can’t get system date" > \
"/dev/stderr"
  exit 1
}
print "current date is", date_now
close("date")
}
Exiting with an error code different from 0 means something's not quite right. Here's an example
awk 'BEGIN {
> for (i = 0; i < ARGC; i++)
> print ARGV[i]
> }' file1 file2
Prints "awk", "file1", and "file2", one per line (ARGV[0] holds the program name)
for (i in frequencies)
delete frequencies[i]
Delete elements in an array
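For context, an array like frequencies might have been filled by a word-count sketch such as this one (the array and file names are only illustrative):
awk '{ for (i = 1; i <= NF; i++) frequencies[$i]++ }
END { for (word in frequencies) print word, frequencies[word] }' inputfile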
foo[4] = ""
if (4 in foo)
print "This is printed, even though foo[4] is empty"
Check for array elements
function ctime(ts, format)
{
  format = "%a %b %d %H:%M:%S %Z %Y"
  if (ts == 0)
  ts = systime()
  # use current time as default
  return strftime(format, ts)
}
An awk variant of ctime() in C. This is how you define your own functions in awk
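A hypothetical call, assuming gawk (which provides the systime() and strftime() functions used above):
BEGIN { print ctime() }
With no argument, ts defaults to 0, so the current time is printed in the default format.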
BEGIN { _cliff_seed = 0.1 }
function cliff_rand()
{
  _cliff_seed = (100 * log(_cliff_seed)) % 1
  if (_cliff_seed < 0)
    _cliff_seed = - _cliff_seed
  return _cliff_seed
}
A Cliff random number generator
cat apache-anon-noadmin.log | \
awk 'function ri(n) \
{  return int(n*rand()); }  \
BEGIN { srand(); }  { if (! \
($1 in randip)) {  \
randip[$1] = sprintf("%d.%d.%d.%d", \
ri(255), ri(255)\
, ri(255), ri(255)); } \
$1 = randip[$1]; print $0  }'
Anonymize an Apache log (IPs are randomized)
