AWK programming language

February 20, 2018.

Notes on the awk programming language.

Notes

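An awk program is a sequence of pattern-action statements. Each statement has two parts: a pattern and an action. The action runs on every input line that matches the pattern.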

pattern { action }

Running awk programs:

awk 'program' input_files

Running an awk program stored in a file:

awk -f prog.awk optional list of files

Printing every line:

{ print }

which is the same as:

{print $0}

The data file: emp.data

Beth	4.00	0
Dan	3.75	0
Kathy	4.00	10
Mark	5.00	20
Mary	5.50	22
Susie	4.25	18

Print the name, rate and hours for everyone who worked more than zero hours:

awk '$3 > 0 {print $1, $2, $3}' emp.data

Print the name and weekly pay (rate times hours) for the same employees:

awk '$3 > 0 {print $1, $2 * $3}' emp.data

Print the names of those who did not work this week:

awk '$3 == 0 {print $1}' emp.data
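The selections above can be tried end to end with a small sketch. It assumes a POSIX shell and any POSIX awk, and recreates emp.data from the listing above with single spaces between fields (awk splits on any run of blanks or tabs by default).

```shell
# emp.data: name, hourly rate, hours worked (same rows as the listing above)
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

# Name and pay for everyone who worked more than zero hours
awk '$3 > 0 { print $1, $2 * $3 }' emp.data

# Names of those who did not work this week
awk '$3 == 0 { print $1 }' emp.data
```

The first command prints Kathy 40, Mark 100, Mary 121 and Susie 76.5, one per line; the second prints Beth and Dan.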

NF is the number of fields in the current line; NR is the number of lines read so far.

{print NF, $1, $NF} prints the number of fields, the first field and the last field.

{print NR, $0} prefixes each line with its line number.

{print "total pay for", $1, "is", $2 * $3 } puts text in the output.

printf statement:

{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
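A short sketch of NR, NF, $NF and printf working together, assuming the same emp.data rows as above. The column widths in the format string are an arbitrary choice for illustration, not something these notes prescribe.

```shell
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

# Record number, field count and last field of every line
awk '{ print NR, NF, $NF }' emp.data

# Pay table: name left-justified in 8 columns, pay right-justified with two decimals
awk '$3 > 0 { printf("%-8s $%6.2f\n", $1, $2 * $3) }' emp.data
```

Unlike print, printf adds no newline or separators of its own, so the format string has to spell out the \n.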

Selection by logical operators &&, ||, !
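A minimal sketch of the three operators used as patterns, again on the emp.data rows from above. The thresholds (a rate of 5 and 20 hours) are picked only for illustration.

```shell
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

# && : rate of at least 5 AND more than 20 hours
awk '$2 >= 5 && $3 > 20 { print $1 }' emp.data

# || : rate of at least 5 OR more than 20 hours
awk '$2 >= 5 || $3 > 20 { print $1 }' emp.data

# !  : everyone the previous pattern did not select
awk '!($2 >= 5 || $3 > 20) { print $1 }' emp.data
```

The first command selects only Mary, the second Mark and Mary, and the third the remaining four names.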

BEGIN and END are special patterns: BEGIN matches before the first line of input is read, and END matches after the last line has been processed.

BEGIN { print "NAME		RATE	HOURS"; print ""}
  { print }
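BEGIN and END are typically paired: a header goes in BEGIN and a summary in END. The sketch below assumes the emp.data rows from above and a hypothetical program file named total.awk, which also shows the awk -f form of invocation.

```shell
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

cat > total.awk <<'EOF'
BEGIN  { print "NAME  PAY" }                       # runs once, before any input
$3 > 0 { pay = $2 * $3; total += pay; n++; print $1, pay }
END    { printf("total %.2f for %d employees\n", total, n) }   # runs once, after all input
EOF

awk -f total.awk emp.data
```

Variables such as total and n need no declaration; they start out as zero (or the empty string) and keep their values across lines, which is why END can report them.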

Built-in functions

Control-Flow Statements

Regular expressions

awk '$4 ~ /Asia/ {print $1}' countries

prints the 1st column (the country) of every line whose 4th column contains Asia.
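The countries file itself is not shown in these notes, so the sketch below makes one up: four rows with country, area, population and continent. The numbers are placeholders for illustration only.

```shell
# Hypothetical countries file: country, area, population, continent
printf '%s\n' 'India 1267 746 Asia' 'Brazil 3286 134 SouthAmerica' \
              'Japan 144 120 Asia' 'France 211 55 Europe' > countries

# ~ : 4th field contains Asia
awk '$4 ~ /Asia/ { print $1 }' countries

# !~ : 4th field does not contain Asia
awk '$4 !~ /Asia/ { print $1 }' countries

# ^ anchors the match at the start; [BF] matches either letter
awk '$1 ~ /^[BF]/ { print $1 }' countries
```

A bare /Asia/ pattern without a field would test the whole line ($0) instead of the 4th column.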

Regular expression metacharacters: \ ^ $ . [ ] | ( ) * + ?

Built-in variables: ARGC ARGV FILENAME FNR FS NF NR OFMT OFS ORS RLENGTH RS RSTART SUBSEP.

Field variables: $0 is the whole line; $1 through $NF are the fields. $(NF-1) is the next-to-last field.

Built-in arithmetic functions: atan2(y,x), cos(x), exp(x), int(x), log(x), rand(), sin(x), sqrt(x), srand(x).
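A quick sketch of a few of these functions, assuming a POSIX awk and a hypothetical file name arith.awk. Note that atan2 takes two arguments (y, then x) and rand takes none. The exact value rand returns depends on the awk implementation, so only its range is checked here.

```shell
cat > arith.awk <<'EOF'
BEGIN {
    print int(3.9), int(-3.9)      # int truncates toward zero
    print sqrt(16), exp(0), log(1)
    print atan2(0, -1)             # pi, printed with the default %.6g output format
    srand(1)                       # fixed seed, so the sequence is reproducible
    r = rand()                     # 0 <= r < 1
    ok = (r >= 0 && r < 1) ? "rand in range" : "rand out of range"
    print ok
}
EOF

awk -f arith.awk
```

A program that consists only of a BEGIN block reads no input, which makes it convenient for experiments like this one.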

Built-in string functions: gsub(r,s,t), index(s,t), match(s,r), split(s,a,fs), sprintf(fmt,list), sub(r,s,t), substr(s,p,n). For example, index("banana", "an") returns 2.
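The string functions are easiest to compare side by side on one input. This is a small sketch assuming a POSIX awk; strfuncs.awk is a hypothetical file name.

```shell
cat > strfuncs.awk <<'EOF'
BEGIN {
    s = "banana"
    print index(s, "an")                 # position of the first "an"
    print substr(s, 3, 3)                # 3 characters starting at position 3
    if (match(s, /an/))                  # match sets RSTART and RLENGTH
        print RSTART, RLENGTH
    n = split("a:b:c", parts, ":")       # returns the number of pieces
    print n, parts[1], parts[3]
    t = s; sub(/a/, "A", t);  print t    # sub replaces the first match only
    t = s; gsub(/a/, "A", t); print t    # gsub replaces every match
    print sprintf("%.2f", 3.14159)       # sprintf returns the string instead of printing it
}
EOF

awk -f strfuncs.awk
```

String positions in awk start at 1, not 0, which is why index reports 2 for the first "an" in "banana".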

Arrays

Associative arrays: pop["Asia"] += $3 or pop[$4] += $3

Test for existence: (subscript in A) is true if A[subscript] exists; the test itself does not create the element.

Delete element: delete pop[i]
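The three array operations above can be combined in one sketch. It assumes a hypothetical countries file with country, area, population and continent columns (placeholder numbers) and a hypothetical program file pop.awk.

```shell
# Hypothetical countries file: country, area, population, continent
printf '%s\n' 'India 1267 746 Asia' 'Brazil 3286 134 SouthAmerica' \
              'Japan 144 120 Asia' 'France 211 55 Europe' > countries

cat > pop.awk <<'EOF'
{ pop[$4] += $3 }                        # accumulate population per continent
END {
    if ("Asia" in pop)                   # existence test
        print "Asia", pop["Asia"]
    delete pop["Asia"]                   # remove one element
    if (!("Asia" in pop))
        print "Asia deleted"
    for (c in pop)                       # iteration order is unspecified
        n++
    print n, "continents left"
}
EOF

awk -f pop.awk countries
```

Because for (c in pop) visits subscripts in an unspecified order, the sketch only counts the remaining elements rather than printing them.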

User defined functions:

function name(paramlist) {
    statements
}
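As a concrete instance of the template above, here is a sketch with a hypothetical helper named max, applied to the hours column of emp.data. The file name max.awk is also an assumption.

```shell
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
              'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

cat > max.awk <<'EOF'
# max: return the larger of two numbers (hypothetical helper for illustration)
function max(a, b) {
    return (a > b) ? a : b
}
NR == 1 { best = $3 + 0 }                # adding 0 forces a numeric value
NR > 1  { best = max(best, $3 + 0) }
END     { print "most hours:", best }
EOF

awk -f max.awk emp.data
```

In a definition, the opening parenthesis should follow the function name directly; when calling a user-defined function, there must be no space between the name and the parenthesis.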

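Multiline records: RS is the record separator. Setting RS = "" makes one or more blank lines separate records, so a record can span several lines.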
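A minimal sketch of multiline records, assuming a hypothetical file addr.txt in which each record is a name line followed by a city line, with a blank line between records. Setting FS to a newline makes each line of a record one field.

```shell
# Hypothetical input: two records separated by a blank line
printf 'Ada Lovelace\nLondon\n\nAlan Turing\nManchester\n' > addr.txt

# RS = "" : blank lines separate records; FS = "\n" : each line is a field
awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " (" $2 ")" }' addr.txt
```

With these settings NR counts records, not physical lines, so the file above yields records 1 and 2.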

system(expression) executes the command given by the string value of expression and returns the command's exit status.
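A small sketch of system, assuming a POSIX awk and a hypothetical file name sys.awk. The command string is handed to the shell, and the value returned is the command's exit status.

```shell
cat > sys.awk <<'EOF'
BEGIN {
    status = system("echo hello from the shell")   # the output comes from the command itself
    print "exit status:", status
}
EOF

awk -f sys.awk
```

The command's output goes straight to standard output; system does not capture it into an awk variable.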

Why and how

Reference