The awk command is a powerful method for processing or analyzing text files, in particular data files that are organized by lines (rows) and columns.
Simple awk commands can be run from the command line. More complex tasks should be written as awk programs (so-called awk scripts) to a file.
The basic format of an awk command looks like this:
awk 'pattern {action}' input-file > output-file
This means: take each line of the input file; if the line contains the pattern, apply the action to the line and write the resulting line to the output-file. If the pattern is omitted, the action is applied to all lines. For example:
awk '{ print $5 }' table1.txt > output1.txt
This statement takes the element of the 5th column of each line and
writes it as a line in the output file "output1.txt". The variable '$5'
refers to the fifth column. Similarly you can access the first,
second, and third columns with $1, $2, and $3, and so on. By default columns are
assumed to be separated by spaces or tabs (so-called white space). So,
if the input file "table1.txt" contains these lines:
1, Justin Timberlake, Title 545, Price $7.30
2, Taylor Swift, Title 723, Price $7.90
3, Mick Jagger, Title 610, Price $7.90
4, Lady Gaga, Title 118, Price $7.30
5, Johnny Cash, Title 482, Price $6.50
6, Elvis Presley, Title 335, Price $7.30
7, John Lennon, Title 271, Price $7.90
8, Michael Jackson, Title 373, Price $5.50
Then the command would write the following lines to the output file "output1.txt":
545,
723,
610,
118,
482,
335,
271,
373,
If the column separator is something other than spaces or tabs, such as a
comma, you can specify that in the awk statement as follows:
awk -F, '{ print $3 }' table1.txt > output1.txt
This will select the element from column 3 of each line, with the columns considered to be separated by a comma. Therefore the output, in this case, would be:
Title 545
Title 723
Title 610
Title 118
Title 482
Title 335
Title 271
Title 373
The list of statements inside the curly brackets ('{','}') is called a block. If you put a conditional expression in front of a block, the statement inside the block will be executed only if the condition is true.
awk '$7=="\$7.30" { print $3 }' table1.txt
In this case the condition is $7=="\$7.30", which means that the element at column 7 must be equal to $7.30. The backslash in front of the dollar sign makes sure it is taken literally rather than starting a field reference; note that inside an awk string the dollar sign already has no special meaning, so most awk versions also accept a plain "$7.30" here.
So this awk statement prints out the element at the 3rd column of each line that has a "$7.30" at column 7.
You can also use regular expressions as condition. For example:
awk '/30/ { print $3 }' table1.txt
The string between the two slashes ('/') is the regular expression. In
this case it is just the string "30". This means, if a line contains the
string "30", the system prints out the element at the 3rd column of
that line.
The output in the above example would be:
Timberlake,
Gaga,
Presley,
If the table elements are numbers, awk can run calculations on them, as in this example:
awk '{ print ($2 * $3) + $7 }'
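As a quick illustration of per-line arithmetic, here is that statement run on one made-up line (the numeric fields below are invented for this sketch, not taken from table1.txt):

```shell
# Fields 2, 3, and 7 hold numbers: awk computes (4 * 5) + 2 for this line.
echo "id 4 5 x y z 2" | awk '{ print ($2 * $3) + $7 }'
# prints 22
```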
Besides the variables that access elements of the current row ($1, $2,
etc.), there is the variable $0, which refers to the complete row (line),
and the variable NF, which holds the number of fields.
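For example, for a line with three white-space-separated fields, NF and $0 behave like this:

```shell
# NF is the field count; $0 is the whole line.
echo "a b c" | awk '{ print NF; print $0 }'
# prints 3, then "a b c"
```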
You can also define new variables, as in this example:
awk '{ sum=0; for (col=1; col<=NF; col++) sum += $col; print sum; }'
This computes and prints the sum of all the elements of each row.
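Run against two sample rows of numbers (input invented for this sketch), it prints one sum per row:

```shell
# Each output line is the sum of the fields of the corresponding input row.
printf '1 2 3\n10 20 30\n' | awk '{ sum=0; for (col=1; col<=NF; col++) sum += $col; print sum; }'
# prints 6, then 60
```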
Awk statements are frequently combined with sed commands.
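As a small illustration of such a combination, sed can strip the commas from the sample table before awk splits it into columns (two rows of table1.txt are recreated below so the sketch is self-contained):

```shell
# Recreate two sample rows from table1.txt for this sketch.
printf '1, Justin Timberlake, Title 545, Price $7.30\n2, Taylor Swift, Title 723, Price $7.90\n' > table1.txt
# sed removes the commas, then awk prints the 3rd white-space column (the surname).
sed 's/,//g' table1.txt | awk '{ print $3 }'
# prints Timberlake, then Swift
```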
Linux awk command with examples:

| Linux command syntax | Linux command description |
|---|---|
| `awk '{print $1,$3}'` | Print only columns one and three using stdin |
| `awk '{print $0}'` | Print all columns using stdin |
| `awk '/pattern/ {print $2}'` | Print only the elements from column 2 of lines that match pattern using stdin |
| `awk -f script.awk inputfile` | Just like make or sed, awk uses -f to read its instructions from a file, useful when there is a lot to be done and typing it all at the terminal would be impractical |
| `awk 'program' inputfile` | Execute program using data from inputfile |
| `awk "BEGIN { print \"Hello, world!!\" }"` | Classic "Hello, world" in awk |
| `awk '{ print }'` | Print what's entered on the command line until EOF (^D) |
| `#!/bin/awk -f`<br>`BEGIN { print "Hello, world!" }` | awk script for the classic "Hello, world!" (make it executable with chmod and run it as-is) |
| `# This is a program that prints "Hello, world!"`<br>`# and exits` | Comments in awk scripts |
| `awk -F "" 'program' files` | Define the FS (field separator) as null, as opposed to white space, the default |
| `awk -F "regex" 'program' files` | FS can also be a regular expression |
| `awk 'BEGIN { print "Here is a single quote <'\''>" }'` | Will print `<'>`. Here's why we prefer Bourne shells. :) |
| `awk '{ if (length($0) > max) max = length($0) } END { print max }' inputfile` | Print the length of the longest line |
| `awk 'length($0) > 80' inputfile` | Print all lines longer than 80 characters |
| `awk 'NF > 0' data` | Print every line that has at least one field (NF stands for Number of Fields) |
| `awk 'BEGIN { for (i = 1; i <= 7; i++) print int(101 * rand()) }'` | Print seven random numbers from 0 to 100 |
| `ls -l . \| awk '{ x += $5 } END { print "total bytes: " x }'` | Print the total number of bytes used by files in the current directory |
| `ls -l . \| awk '{ x += $5 } END { print "total kilobytes: " (x + 1023)/1024 }'` | Print the total number of kilobytes used by files in the current directory |
| `awk -F: '{ print $1 }' /etc/passwd \| sort` | Print a sorted list of login names |
| `awk 'END { print NR }' inputfile` | Print the number of lines in a file (NR stands for Number of Records) |
| `awk 'NR % 2 == 0' data` | Print the even-numbered lines in a file. How would you print the odd-numbered lines? |
| `ls -l \| awk '$6 == "Nov" { sum += $5 } END { print sum }'` | Prints the total number of bytes of files that were last modified in November |
| `awk '$1 ~ /J/' inputfile` | Regular expression matching all entries whose first field contains a capital J |
| `awk '$1 !~ /J/' inputfile` | Regular expression matching all entries whose first field does not contain a capital J |
| `awk 'BEGIN { print "He said \"hi!\" to her." }'` | Escaping double quotes in awk |
| `echo aaaabcd \| awk '{ sub(/a+/, "<A>"); print }'` | Prints `<A>bcd` |
| `ls -lh \| awk '{ owner = $3; $3 = $3 " 0wnz"; print $3 }' \| uniq` | Attribution example; try it :) |
| `awk '{ $2 = $2 - 10; print $0 }' inventory` | Modify inventory and print it, with the value of the second field reduced by 10 |
| `awk '{ $6 = ($5 + $4 + $3 + $2); print $6 }' inventory` | Even though field six doesn't exist in inventory, you can create it, assign a value to it, and display it |
| `echo a b c d \| awk '{ OFS = ":"; $2 = ""; print $0; print NF }'` | OFS is the Output Field Separator; this outputs "a::c:d" and "4" because although field two is nullified, it still exists, so it gets counted |
| `echo a b c d \| awk '{ OFS = ":"; $2 = ""; $6 = "new"; print $0; print NF }'` | Another example of field creation; the field between $4 (existing) and $6 (newly created) gets created as well ($5, with an empty value), so the output is "a::c:d::new" and "6" |
| `echo a b c d e f \| awk '{ print "NF =", NF; NF = 3; print $0 }'` | Throw away the last three fields by changing the number of fields |
| `FS = "[ ]"` | A regular expression setting the field separator to exactly one space, so runs of blanks are not collapsed as they are with the default |
| `echo ' a b c d ' \| awk 'BEGIN { FS = "[ \t\n]+" } { print $2 }'` | This will print only "a" (the leading blanks produce an empty $1) |
| `awk '/RE/ { print; exit }' file.txt` | Print only the first match of RE (the awk equivalent of `sed -n '/RE/{p;q;}'`) |
| `awk -F'\\' '...' inputfiles ...` | Sets FS to a literal backslash |
| `BEGIN { RS = ""; FS = "\n" } { print "Name is:", $1; print "Address is:", $2; print "City and State are:", $3; print "" }` | For blank-line-separated records like "John Doe / 1234 Unknown Ave. / Doeville, MA", this sets the field separator to newline so the script can easily operate on the individual lines of each record |
| `awk 'BEGIN { OFS = ";"; ORS = "\n\n" } { print $1, $2 }' inputfile` | With a two-field file, each record prints as "field1;field2" followed by a blank line, because OFS is ";" and ORS, the Output Record Separator, is set to two newlines |
| `awk 'BEGIN { OFMT = "%.0f"; print 17.23, 17.54 }'` | This will print 17 and 18, because OFMT (the Output ForMaT) is set to round floating point values to the closest integer |
| `awk 'BEGIN { msg = "Dont Panic!"; printf "%s\n", msg }'` | You can use printf much as you would in C |
| `awk '{ printf "%-10s %s\n", $1, $2 }' inputfile` | Prints the first field as a 10-character string, left-justified, with $2 printed normally next to it |
| `awk 'BEGIN { print "Name Number"; print "---- ------" } { printf "%-10s %s\n", $1, $2 }' inputfile` | Making things prettier |
| `awk '{ print $2 > "phone-list" }' inputfile` | Simple data extraction example: the second field is written to a file named "phone-list" |
| `awk '{ print $1 > "names.unsorted"; command = "sort -r > names.sorted"; print $1 \| command }' inputfile` | Write the names contained in $1 to a file, then sort and output the result to another file (you can also append with >>, as you would in a shell) |
| `awk 'BEGIN { printf "%d, %d, %d\n", 011, 11, 0x11 }'` | Will print 9, 11, 17 (octal, decimal, and hexadecimal) |
| `if (/foo/ \|\| /bar/) print "Found!"` | Simple search for foo or bar |
| `awk '{ sum = $2 + $3 + $4; avg = sum / 3; print $1, avg }' grades` | Simple arithmetic operations (most operators closely resemble C's) |
| `awk '{ print "The square root of", $1, "is", sqrt($1) }'` | Simple, extensible calculator: type 2 and it answers "The square root of 2 is 1.41421"; type 7, "The square root of 7 is 2.64575" |
| `awk '$1 == "start", $1 == "stop"' inputfile` | Prints every record between start and stop |
| `awk 'BEGIN { print "Analysis of \"foo\"" } /foo/ { ++n } END { print "\"foo\" appears", n, "times." }' inputfile` | BEGIN and END rules are executed exactly once, before and after any record processing |
| `echo -n "Enter search pattern: "; read pattern; awk "/$pattern/"' { nmatches++ } END { print nmatches, "found" }' inputfile` | Search using a pattern read from the shell |
| `if (x % 2 == 0) print "x is even"; else print "x is odd"` | Simple conditional; awk, like C, also supports the ?: operator |
| `awk '{ i = 1; while (i <= 3) { print $i; i++ } }' inputfile` | Prints the first three fields of each record, one per line |
| `awk '{ for (i = 1; i <= 3; i++) print $i }'` | Prints the first three fields of each record, one per line |
| `BEGIN { if (("date" \| getline date_now) <= 0) { print "Can't get system date" > "/dev/stderr"; exit 1 } print "current date is", date_now; close("date") }` | Exiting with an error code different from 0 means something's not quite right; here's an example |
| `awk 'BEGIN { for (i = 0; i < ARGC; i++) print ARGV[i] }' file1 file2` | Prints "awk", "file1", and "file2", one per line |
| `for (i in frequencies) delete frequencies[i]` | Delete the elements of an array |
| `foo[4] = ""; if (4 in foo) print "This is printed, even though foo[4] is empty"` | Check for array elements |
| `function ctime(ts, format) { format = "%a %b %d %H:%M:%S %Z %Y"; if (ts == 0) ts = systime(); return strftime(format, ts) }` | An awk variant of C's ctime(); this is how you define your own functions in awk |
| `BEGIN { _cliff_seed = 0.1 } function cliff_rand() { _cliff_seed = (100 * log(_cliff_seed)) % 1; if (_cliff_seed < 0) _cliff_seed = -_cliff_seed; return _cliff_seed }` | A Cliff random number generator |
| `cat apache-anon-noadmin.log \| awk 'function ri(n) { return int(n*rand()) } BEGIN { srand() } { if (!($1 in randip)) { randip[$1] = sprintf("%d.%d.%d.%d", ri(255), ri(255), ri(255), ri(255)) } $1 = randip[$1]; print $0 }'` | Anonymize an Apache log (IP addresses are randomized) |
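One answer to the table's question about printing the odd-numbered lines is simply to flip the modulus test (a sketch with invented input):

```shell
# NR % 2 is 1 for lines 1, 3, 5, ...; a pattern with no action prints the line.
printf 'one\ntwo\nthree\nfour\n' | awk 'NR % 2 == 1'
# prints "one" and "three"
```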