
Awk, named after its developers Aho, Weinberger, and Kernighan, is a programming
language that permits easy manipulation of structured data and the generation of
formatted reports.
What well-maintained awk-compatible languages are there?
• nawk
• gawk
• mawk
• tawk
• mksawk
• awkcc
• awk2c
• a2p
• awka

The awk utility is a pattern scanning and processing language. It searches one or
more files to see if they contain lines that match specified patterns and then
performs associated actions, such as writing the line to the standard output or
incrementing a counter each time it finds a match.

Some of the features of awk are:

• Its ability to view a text file as made up of records and fields in a textual
database.
• Its use of variables to manipulate the database.
• Its use of arithmetic and string operators.
• Its use of common programming constructs such as loops and conditionals.
• Its ability to generate formatted reports.

The three best-known variants of AWK are:

• AWK - the original from AT&T
• NAWK - a newer, improved version from AT&T
• GAWK - the Free Software Foundation's version

Description of an Awk Program

An awk program consists of one or more program lines containing a pattern and/or
action in the following format:
pattern { action }

The pattern selects lines from the input file. The awk utility performs the action on all
lines that the pattern selects. You must enclose the action within braces so that awk
can differentiate it from the pattern. Two rules apply when either the pattern or the
action is omitted:

• If a program line does not contain a pattern, awk selects all lines in the input
file.
• If the program line does not contain an action, awk copies the selected lines to
its standard output (this is usually the display, if you haven't redirected the
output to another program or to a file).
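The two rules above can be seen side by side in a minimal sketch, assuming a POSIX-compatible awk run from a Bourne-style shell. The sample records (a name and a score per line) are hypothetical.

```shell
# Hypothetical sample input: one "name score" record per line.
data='alice 90
bob 75
carol 82'

# Pattern with no action: lines containing "o" are copied to standard output.
pattern_only=$(printf '%s\n' "$data" | awk '/o/')

# Action with no pattern: every line is selected; print only the first field.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

printf '%s\n' "$pattern_only"
printf '%s\n' "$action_only"
```

The first command prints the bob and carol lines unchanged; the second prints the three names.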

To start, awk compares the first line in the input file with each pattern in the program.
If a pattern selects a line (if there is a match), awk takes the action associated with the
pattern. If the line is not selected, awk takes no action. When awk has completed its
comparisons for the first line of the input file, it repeats the process for the next line of
input. It continues this process, comparing subsequent lines in the input file, until it
has read all of the input files.

If several patterns select the same line, awk takes the actions associated with each of
the patterns in the order they appear. It is therefore possible for awk to send a single
line from the input file to its standard output more than once.
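As a minimal sketch (assuming a POSIX-compatible awk and a Bourne-style shell, with hypothetical input), the line "bob 75" below is selected by both patterns, so awk runs both actions for it in program order and the line appears twice in the output.

```shell
# Both patterns select "bob 75"; "alice 90" matches neither.
out=$(printf 'alice 90\nbob 75\n' | awk '
/bob/ { print "first:", $0 }
/75/  { print "second:", $0 }
')
printf '%s\n' "$out"
```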

An awk pattern is used to conditionally pass control to an action. An action only
executes if its relevant pattern was matched. You can use a regular expression,
enclosed within slashes, as a pattern. The ~ operator tests to see if a field or variable
matches a regular expression. The !~ operator tests for no match. You can process
arithmetic and character relational expressions with the following relational operators.

Operator Meaning

< less than
<= less than or equal to
== equal to
!= not equal to
>= greater than or equal to
> greater than

You can combine any of the patterns described above using the Boolean operators ||
(OR) or && (AND).
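The following minimal sketch combines a relational operator, the ~ and !~ regular-expression operators, and the && and || Boolean operators in patterns. It assumes a POSIX-compatible awk and a Bourne-style shell; the records are hypothetical.

```shell
# Hypothetical sample input: "name score" records.
data='alice 90
bob 75
carol 82
dave 60'

# Relational pattern: numeric comparison on the second field.
high=$(printf '%s\n' "$data" | awk '$2 >= 80 { print $1 }')

# ~ (matches a regular expression) combined with && (AND).
low_ac=$(printf '%s\n' "$data" | awk '$1 ~ /^[a-c]/ && $2 < 80 { print $1 }')

# !~ (does not match) combined with || (OR).
other=$(printf '%s\n' "$data" | awk '$1 !~ /a/ || $2 == 60 { print $1 }')

printf '%s\n' "$high" "$low_ac" "$other"
```

Here high holds alice and carol, low_ac holds bob, and other holds bob (no "a" in the name) and dave (score of 60).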

The comma is the range operator. If you separate two patterns with a comma on a
single awk program line, awk selects a range of lines beginning with the first line that
contains the first pattern. The last line awk selects is the next line (possibly that same
starting line) that contains the second pattern. After awk finds the second pattern, it
starts the process over by looking for the first pattern again.
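A minimal sketch of a range pattern, assuming a POSIX-compatible awk and a Bourne-style shell; the START and END marker lines are hypothetical. The second range never sees an END line, so it stays open until the end of the input.

```shell
# /START/,/END/ selects each block from a START line through the next END line.
out=$(printf 'a\nSTART\nb\nEND\nc\nSTART\nd\n' | awk '/START/,/END/')
printf '%s\n' "$out"
```

The lines a and c fall outside both ranges and are not printed.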

Two special patterns, BEGIN and END, allow you to execute commands before awk
starts its processing and after it finishes. The awk utility executes the actions associated
with the BEGIN pattern before, and with the END pattern after, it processes all the
files for input.
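As a minimal sketch (assuming a POSIX-compatible awk and a Bourne-style shell, with hypothetical numeric input), BEGIN prints a header before any line is read, the unpatterned middle action runs once per line, and END reports the accumulated total after the last line.

```shell
out=$(printf '10\n20\n30\n' | awk '
BEGIN { print "start" }
      { sum += $1 }
END   { print "total", sum }
')
printf '%s\n' "$out"
```

The variable sum needs no declaration: an uninitialised awk variable behaves as 0 in arithmetic.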

Actions

The action portion of an awk command causes awk to take action when it matches a
pattern. If you do not specify an action, awk performs the default action, which is the
print command (explicitly written as {print}). This action copies the record
(normally a line) from the input file to awk's standard output.

You can follow a print command with arguments, causing awk to print just the
arguments you specify. The arguments can be variables, fields, string constants, or
other expressions. Using awk, you can send the output from a print command to a file
(>), append it to a file (>>), or pipe it to the input of another program (|).
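The three kinds of output redirection can be tried with the minimal sketch below. It assumes a POSIX-compatible awk, a Bourne-style shell, and a system providing mktemp -d (common, though not required by POSIX); the file name names.txt is hypothetical.

```shell
# Work in a scratch directory (mktemp -d is assumed to be available).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# > writes to a file. Within one awk run the file is truncated only when it is
# first opened; later prints to it in the same run are added after earlier ones.
printf 'alice 90\nbob 75\n' | awk '{ print $1 > "names.txt" }'

# >> appends to the existing file.
printf 'carol 82\n' | awk '{ print $1 >> "names.txt" }'

# | pipes print output into another command, here sort.
sorted=$(printf 'bob 75\nalice 90\n' | awk '{ print $1 | "sort" }')

names=$(cat names.txt)
printf '%s\n' "$names" "$sorted"
```

Note that the file name and the command are quoted strings; without quotes awk would treat them as (empty) variables.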

Unless you separate items in a print command with commas, awk concatenates them.
Commas cause awk to separate the items with the output field separator (normally a
space).

You can include several actions on one line within a set of braces by separating them
with semicolons.
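A minimal sketch of concatenation versus commas, and of several semicolon-separated actions in one set of braces. It assumes a POSIX-compatible awk and a Bourne-style shell; OFS is awk's built-in name for the output field separator.

```shell
# Three actions on one line: concatenate, separate with a comma, then compute.
out=$(printf 'alice 90\n' | awk '{ print $1 $2; print $1, $2; n = $2 + 5; print n }')

# Changing the output field separator changes what the comma inserts.
ofs=$(printf 'alice 90\n' | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

printf '%s\n' "$out" "$ofs"
```

The first command prints alice90, then alice 90, then 95; the second prints alice-90.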
Awk Command Syntax
The awk command has the following command syntax :-

awk [-Fc] [-f program-file | 'program source'] [variable=value ...] [filenames ...]

When using the awk command there are two important aspects we have to specify.
We have to tell awk which data we wish to process (the input data) and then how we
wish to process it (the awk program instructions). Awk lets us do this in several
ways...

The Input Data

• We can point awk to a file or several files containing the input data we wish to
process (the "filenames..." argument in the list of command line options below).
• We can use standard input to specify the input data
(this input data may be typed at the keyboard after executing the awk
command, or may come from another program or UNIX command whose output is piped into awk)

The Awk Program

• We can point awk to a file containing the awk program
(that's the "-f program-file" command line option)
• We can specify the awk program on the command line
(that's the "program source" option)
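The ways of supplying input data and the program can be compared in a minimal sketch, assuming a POSIX-compatible awk, a Bourne-style shell, and mktemp -d. The file names scores.txt and names.awk are hypothetical. All three commands run the same program on the same data.

```shell
tmpdir=$(mktemp -d)

# Hypothetical input file and program file.
printf 'alice 90\nbob 75\n' > "$tmpdir/scores.txt"
printf '{ print $1 }\n' > "$tmpdir/names.awk"

# Program source on the command line, input data from a file.
from_file=$(awk '{ print $1 }' "$tmpdir/scores.txt")

# Program source on the command line, input data piped to standard input.
from_stdin=$(cat "$tmpdir/scores.txt" | awk '{ print $1 }')

# Program read from a file with -f, input data from a file.
from_progfile=$(awk -f "$tmpdir/names.awk" "$tmpdir/scores.txt")

printf '%s\n' "$from_file" "$from_stdin" "$from_progfile"
```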
As well as specifying the awk program and input data, we can also specify other
things such as the input field separator and the initial values of variables we use in our
code. The following list summarises all of the command line options available and
the purpose of each option:

Command Line Option Purpose

-f program-file The -f program-file option specifies the file containing the awk program
code to execute, and is used as an alternative to writing the code on the
command line with the program source option.
program source The program source command line option is used to specify awk code
on the command line itself. If this option is used, the awk code is best
enclosed in single quotes (') to protect it from the shell.
-Fc The -Fc command line option allows you to specify the field separator
(FS) character. By default this is set to whitespace (SPACE and TAB).
To set the field separator to the number zero you would add -F0 or -F"0"
to the command line.
variable=value This option enables us to initialise variables on the command line. To do
this we use the format variable=value, which sets the variable to the given
value just before awk reads the input file that follows it on the command line.
The assignment is therefore not yet in effect inside a BEGIN action.
filenames... This is the file or list of files containing the input data we wish to
process.
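A minimal sketch of -Fc and variable=value, assuming a POSIX-compatible awk, a Bourne-style shell, and mktemp -d. The colon-separated file scores.txt and the variable name min are hypothetical. The last command shows that a command-line assignment has not yet happened when BEGIN runs.

```shell
tmpdir=$(mktemp -d)
printf 'alice:90\nbob:75\n' > "$tmpdir/scores.txt"

# -F: makes the colon the field separator.
names=$(awk -F: '{ print $1 }' "$tmpdir/scores.txt")

# min=80 is assigned just before scores.txt is read.
passed=$(awk -F: '$2 >= min { print $1 }' min=80 "$tmpdir/scores.txt")

# In BEGIN the assignment has not happened yet, so min is still empty.
begin_view=$(awk 'BEGIN { print "min=[" min "]" }' min=80 "$tmpdir/scores.txt")

printf '%s\n' "$names" "$passed" "$begin_view"
```

Many awk implementations also accept -v variable=value, which performs the assignment before BEGIN; that option is not covered in this tutorial.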

In order to complete this tutorial package you will be required to program some awk
code through a WWW front end. You will not need to know all the details mentioned
above, as the WWW user interface is much friendlier than UNIX's (although knowing
them would be beneficial).

Comments and Annotations


The awk utility disregards anything on a program line following a hash sign (#). You
can document an awk program by preceding comments with this symbol.

For example,
# THIS AWK PROGRAM IS EQUIVALENT TO THE UNIX COMMAND "cat" (ALBEIT SLOWER!!)
{ print }

Awk will skip the first line of the above program.

Awk Functions
Awk provides us with several built-in functions for manipulating numbers and strings.

A list of these functions and a description of what they do is shown below :-

Function Name Operation

length(string) returns the number of characters in string;
if you do not supply an argument, it returns
the number of characters in the current
input record.
int(number) returns the integer portion of number.
index(string1, string2) returns the position (counting from 1) at
which string2 first occurs in string1, or 0
if string2 is not present.
split(string, array, delimiter) places the elements of string, delimited by
delimiter, in array[1]...array[n] and
returns the number of elements in the
array.
sprintf(format, arguments) formats arguments according to the format
and returns the formatted string; mimics
the C programming language function of
the same name.
substr(string, position, length) returns a substring of string that begins at
position and is length characters long.
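A minimal sketch that calls each function in the table once, assuming a POSIX-compatible awk and a Bourne-style shell. The strings and the array name parts are hypothetical. Because the program has only a BEGIN action, awk reads no input.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                  # characters in s
    print int(3.9)                   # integer portion
    print index(s, "world")          # position where "world" starts
    n = split("a,b,c", parts, ",")   # fills parts[1]..parts[3]
    print n, parts[1], parts[3]
    print sprintf("%.2f", 3.14159)   # formatted string
    print substr(s, 1, 5)            # first five characters
}')
printf '%s\n' "$out"
```

This prints 11, 3, 7, "3 a c", 3.14 and hello, one per line.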

Arithmetic Operators
Awk takes its arithmetic operators from the C programming language. The following
list describes what each one does.

Operator Function

* Multiplies the expression preceding the operator by the expression following it.
/ Divides the expression preceding the operator by the expression following it.
% Takes the remainder after dividing the expression preceding the operator by the
expression following it.
+ Adds the expression preceding the operator and the expression following it.
- Subtracts the expression following the operator from the expression preceding it.
= Assigns the value of the expression following the operator to the variable
preceding it.
++ Increments the variable by 1 (the operator may precede or follow the
variable, as in ++x or x++).
-- Decrements the variable by 1 (--x or x--).
+= Adds the expression following the operator to the variable preceding it and
assigns the result to the variable preceding the operator.
-= Subtracts the expression following the operator from the variable preceding it and
assigns the result to the variable preceding the operator.
*= Multiplies the variable preceding the operator by the expression following it
and assigns the result to the variable preceding the operator.
/= Divides the variable preceding the operator by the expression following it and
assigns the result to the variable preceding the operator.
%= Takes the remainder, after dividing the variable preceding the operator by the
expression following it, and assigns the result to the variable preceding the
operator.
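A minimal sketch exercising the arithmetic and assignment operators, assuming a POSIX-compatible awk and a Bourne-style shell; the values are arbitrary. Note that / performs floating-point division, so 17 / 5 gives 3.4 rather than 3.

```shell
out=$(awk 'BEGIN {
    a = 17; b = 5
    print a + b, a - b, a * b, a / b, a % b
    x = 10
    x += 5; x -= 3; x *= 2; x /= 4; x %= 4   # 15, 12, 24, 6, then 2
    print x
    i = 1; i++; i++; i--                     # 2, 3, then 2
    print i
}')
printf '%s\n' "$out"
```

The output is "22 12 85 3.4 2" followed by 2 and 2.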

Printf Statement
You can use the printf command in place of print to control the format of the output
that awk generates. The awk version of printf is similar to that of the C language. A
printf command takes the following format:
printf "control-string" arg1, arg2, ... , argn

The control-string determines how printf will format arg1 - argn. These arguments
can be variables or other expressions. Within the control-string, you can use "\n" to
indicate a NEWLINE and "\t" to indicate a TAB.

The control-string contains conversion specifications, one for each argument. A
conversion specification has the following format:

%[-][x[.y]]conv

The - causes printf to left justify the argument. The x is the minimum field width, and
the .y is the number of places to the right of a decimal point in a number. The conv is
a letter from the following list:

Conv Conversion
d decimal
e exponential notation
f floating point number
g use f or e, whichever is shorter
o unsigned octal
s string of characters
x unsigned hexadecimal
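A minimal sketch of printf conversion specifications, assuming a POSIX-compatible awk and a Bourne-style shell with hypothetical input. The | characters are literal text in the control-string, added to make the field widths visible.

```shell
# %-6s: left-justified string in 6 columns; %5.1f: 5 columns, 1 decimal; %d: integer.
rows=$(printf 'alice 90.5\nbob 7\n' | awk '{ printf "%-6s|%5.1f|%d\n", $1, $2, $2 }')

# Octal, hexadecimal, exponential and "shorter of f or e" conversions.
convs=$(awk 'BEGIN { printf "%o %x %e %g\n", 8, 255, 1234.5, 1234567 }')

printf '%s\n' "$rows" "$convs"
```

The rows come out as "alice | 90.5|90" and "bob   |  7.0|7", and the second command prints "10 ff 1.234500e+03 1.23457e+06". Unlike print, printf adds no NEWLINE of its own, so the control-string ends with "\n".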
