
Introduction to SED & AWK

~ Srikanth Naidu CH HP_ESG, E&PE

Wipro Technologies

Agenda::
• Power tools for editing
• Regular Expression
• SED structure
• SED commands
• AWK Programming Model
• Constants and Variables
• Operators
• System Variables
• Formatted Printing
• Passing Parameters Into a Script
• Fundamental programming constructs
• Arrays
• Functions

Power Tools for Editing


sed and awk are power tools for creating and modifying text files. They can save many hours of repetitive work that would otherwise be done by hand in a text editor.

What do they have in common?
• They are invoked using similar syntax.
• They are both stream-oriented, reading input from text files one line at a time and directing the result to standard output.
• They use regular expressions for pattern matching.
• They allow the user to specify instructions in a script.

Regular Expression
What is a regular expression?
A regular expression describes a pattern of characters to be matched. For example, the regular expression ape matches the characters "ape" in: Who has taken the tape?

Metacharacter  Description
.              Matches any single character except newline.
*              Matches any number (including zero) of the single character that immediately precedes it.
[...]          Matches any one of the class of characters enclosed between the brackets.
               A circumflex (^) as the first character inside the brackets reverses the match, so that any character not in the class is matched.
               A hyphen (-) is used to indicate a range of characters.
               The close bracket (]) as the first character in the class is a member of the class.
               All other metacharacters lose their special meaning when specified as members of a class.
^              As the first character of a regular expression, matches the beginning of the line.
$              As the last character of a regular expression, matches the end of the line.
\{n,m\}        Matches a range of occurrences of the single character that immediately precedes it.
               \{n\} matches exactly n occurrences; \{n,\} matches at least n occurrences.
+              Matches one or more occurrences of the preceding regular expression.
?              Matches zero or one occurrence of the preceding regular expression.
|              Specifies that either the preceding or the following regular expression can be matched (alternation).

Note: +, ? and | belong to the extended set of metacharacters used by awk; the basic regular expressions of standard sed do not include them.
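The metacharacters above can be tried out directly from the shell. The following is a minimal sketch, assuming a POSIX shell with sed and awk available; the sample words (aple, apple, ale) and the variable names are made up for illustration.

```shell
# Hypothetical sample input, used only for this illustration.
text='Who has taken the tape?
aple
apple
ale'

# Literal match: "ape" occurs inside the word "tape".
literal=$(printf '%s\n' "$text" | sed -n '/ape/p')

# "ap*le": zero or more "p" between "a" and "le" (basic RE, as used by sed).
star=$(printf '%s\n' "$text" | sed -n '/^ap*le$/p')

# "ap+le": one or more "p" (extended RE, as used by awk).
plus=$(printf '%s\n' "$text" | awk '/^ap+le$/')

# "ap\{2\}le": exactly two "p" in a row (interval expression in sed).
interval=$(printf '%s\n' "$text" | sed -n '/^ap\{2\}le$/p')

printf '%s\n' "literal:" "$literal" "star:" "$star" "plus:" "$plus" "interval:" "$interval"
```

Here the * pattern selects aple, apple and ale, the + pattern drops ale, and the interval pattern keeps only apple.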

Sed Structure
SED :: Stream EDitor
• The pattern space (buffer)
• One-line-at-a-time design
• Global perspective on addressing
    - A sed command can specify zero, one, or two addresses.
    - An address can be a regular expression describing a pattern, a line number, or a line addressing symbol.

How do you invoke sed?
    Command line  :: sed [-e] 'instruction(s)' file
    Script        :: sed -f scriptfile file
    Saving output :: sed -f scriptfile file > out-file

Different types of sed scripts
• Multiple edits to the same file
• Making changes across a set of files
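The invocation forms and address types above can be sketched as follows. This is an illustrative example only: the file names (input.txt, edits.sed, output.txt) and their contents are assumptions, created in a temporary directory.

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/input.txt"

# 1. Command line: each -e introduces one instruction.
cmdline=$(sed -e 's/one/1/' -e 's/two/2/' "$tmp/input.txt")

# 2. Script file: the same instructions, one per line, read with -f.
printf 's/one/1/\ns/two/2/\n' > "$tmp/edits.sed"
fromscript=$(sed -f "$tmp/edits.sed" "$tmp/input.txt")

# 3. Saving output: sed writes to standard output, so redirect it.
sed -f "$tmp/edits.sed" "$tmp/input.txt" > "$tmp/output.txt"

# Addresses: a line number, a range of two addresses, or a pattern.
line2=$(sed -n '2p' "$tmp/input.txt")        # one address (line number)
range=$(sed -n '2,$p' "$tmp/input.txt")      # two addresses ($ is the last line)
pattern=$(sed -n '/thr/p' "$tmp/input.txt")  # regular expression address

printf '%s\n' "$cmdline" "$line2" "$range" "$pattern"
```

The input file itself is never modified; only the redirected output.txt holds the edited text.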

SED Commands
1. Addressing types
    1. [address] command
    2. [line-address] command
    3. address {
           command1
           command2
           command3
       }

2. Comment
    #

3. Substitution
    [address] s/pattern/replacement/flags
    flags::
        n      :: Replace only the nth occurrence of the pattern (n is 1 to 512).
        g      :: Make changes globally on all occurrences in the pattern space.
        p      :: Print the contents of the pattern space.
        w file :: Write the contents of the pattern space to file.
    E.g. s/CA/California/g
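The effect of the substitution flags is easiest to see on a line that contains the pattern several times. A small sketch, assuming a made-up input line:

```shell
line='CA, CA, CA'

# No flag: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/CA/California/')

# Numeric flag: only the 2nd occurrence is replaced.
second=$(printf '%s\n' "$line" | sed 's/CA/California/2')

# g flag: every occurrence is replaced.
all=$(printf '%s\n' "$line" | sed 's/CA/California/g')

# p flag with -n: print only the lines on which a substitution was made.
printed=$(printf 'CA\nNY\n' | sed -n 's/CA/California/p')

printf '%s\n' "$first" "$second" "$all" "$printed"
```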

SED Commands (cont)

4. Delete :: The delete command takes an address and deletes the contents of the pattern space if the line matches the address. The entire line is deleted, not just the portion of the line that matched.
    [address] d
    E.g. /^$/d    (deletes blank lines)

5. Append :: The append command places the supplied text after the current line.
    [line-address] a\
    text
    E.g.
    # cat poem.txt
    A wise old owl lived in an oak;
    The more he saw the less he spoke;
    The less he spoke the more he heard:
    Why can't we all be like that bird?
    # sed '/oak/a\
    > APPENDED TEXT
    > ' poem.txt
    A wise old owl lived in an oak;
    APPENDED TEXT
    The more he saw the less he spoke;
    The less he spoke the more he heard:
    Why can't we all be like that bird?

SED Commands (cont)

6. Insert :: The insert command places the supplied text before the current line.
    [line-address] i\
    text
    E.g.
    # sed '/bird/i\
    > INSERTED TEXT
    > ' poem.txt
    A wise old owl lived in an oak;
    The more he saw the less he spoke;
    The less he spoke the more he heard:
    INSERTED TEXT
    Why can't we all be like that bird?

7. Change :: The change command replaces the contents of the pattern space with the supplied text.
    [address] c\
    text
    E.g.
    # sed '/bird/c\
    > CHANGED TEXT
    > ' poem.txt
    A wise old owl lived in an oak;
    The more he saw the less he spoke;
    The less he spoke the more he heard:
    CHANGED TEXT
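Append, insert and change can also be combined in one script file, which is an instance of "multiple edits to the same file". The sketch below assumes the four-line poem.txt from the examples and a hypothetical script name, edit.sed; both are created in a temporary directory.

```shell
tmp=$(mktemp -d)
cat > "$tmp/poem.txt" <<'EOF'
A wise old owl lived in an oak;
The more he saw the less he spoke;
The less he spoke the more he heard:
Why can't we all be like that bird?
EOF

# One script, three edits. In a script file the text follows on the
# line after a\, i\ or c\ and ends at the first line without a
# trailing backslash.
cat > "$tmp/edit.sed" <<'EOF'
/oak/a\
APPENDED TEXT
/bird/i\
INSERTED TEXT
/saw/c\
CHANGED TEXT
EOF

result=$(sed -f "$tmp/edit.sed" "$tmp/poem.txt")
printf '%s\n' "$result"
```

The output has six lines: the appended text follows the "oak" line, the "saw" line is replaced, and the inserted text precedes the "bird" line.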

SED Commands (cont)

8. List :: The list command (l) displays the contents of the pattern space, showing non-printing characters as two-digit ASCII codes. The exact escape format varies between implementations; GNU sed, for example, prints three-digit octal escapes and marks the end of each line with $.
    [address] l
    E.g.
    # cat stmt.txt
    Ramu is good Boy
    Sita is a beautiful girl
    # sed 'l' stmt.txt
    Ramu is good Boy
    Ramu is good Boy \07 \20
    Sita is a beautiful girl
    Sita is a beautiful girl \31 \01
    # sed -n 'l' stmt.txt
    Ramu is good Boy \07 \20
    Sita is a beautiful girl \31 \01

SED Commands (cont)

9. Transform :: The transform command replaces each character in string abc with the character at the same position in string xyz.
    [address] y/abc/xyz/
    E.g.
    # sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' poem.txt
    A WISE OLD OWL LIVED IN AN OAK;
    THE MORE HE SAW THE LESS HE SPOKE;
    THE LESS HE SPOKE THE MORE HE HEARD:
    WHY CAN'T WE ALL BE LIKE THAT BIRD?

10. Print :: The print command (p) causes the contents of the pattern space to be output. It does not clear the pattern space, nor does it change the flow of control in the script.
    [address] p
    E.g.
    # sed -n '/oak/p' poem.txt
    A wise old owl lived in an oak;

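11. Print line number :: An equal sign (=) following an address prints the line number of the matched line. The number is printed on a line of its own, before the line itself.
    [line-address] =
    E.g.
    # sed '=' poem.txt
    1
    A wise old owl lived in an oak;
    2
    The more he saw the less he spoke;
    3
    The less he spoke the more he heard:
    4
    Why can't we all be like that bird?
    5

    The final 5 is followed by an empty line, which suggests that poem.txt ends with a blank fifth line.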
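The print, line-number and transform commands become more useful when they are grouped under one address with braces (addressing type 3). A minimal sketch, assuming the four-line poem.txt from the examples, recreated here in a temporary directory:

```shell
tmp=$(mktemp -d)
cat > "$tmp/poem.txt" <<'EOF'
A wise old owl lived in an oak;
The more he saw the less he spoke;
The less he spoke the more he heard:
Why can't we all be like that bird?
EOF

# For lines matching /spoke/ only: print the line number, then the line.
numbered=$(sed -n '/spoke/{
=
p
}' "$tmp/poem.txt")

# For the line matching /bird/ only: convert to upper case, then print.
upper=$(sed -n '/bird/{
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
p
}' "$tmp/poem.txt")

printf '%s\n' "$numbered" "$upper"
```

The -n option suppresses the default output, so only what p and = print explicitly appears.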


SED Commands (cont)

12. Next :: The next command (n) outputs the contents of the pattern space and then reads the next line of input without returning to the top of the script.
    [address] n
    E.g.
    # sed -n '/saw/n
    > p' poem.txt
    A wise old owl lived in an oak;
    The less he spoke the more he heard:
    Why can't we all be like that bird?

13. Read file :: The read command (r) reads the contents of file into the pattern space after the addressed line.
    [line-address] r file
    E.g.
    # cat add
    Something more abt the owl at the oak
    Its very amazing one every one used to come to the owl for suggestions
    # sed '/oak/r add
    > ' poem.txt
    A wise old owl lived in an oak;
    Something more abt the owl at the oak
    Its very amazing one every one used to come to the owl for suggestions
    The more he saw the less he spoke;
    The less he spoke the more he heard:
    Why can't we all be like that bird?


SED Commands (cont)

14. Write to file :: The write command (w) writes the contents of the pattern space to the file.
    [address] w file
    E.g.
    # sed -n '/oak/w op.txt
    > ' poem.txt
    # cat op.txt
    A wise old owl lived in an oak;

15. Quit :: The quit command (q) causes sed to stop reading new input lines and stop sending them to the output.
    [line-address] q
    E.g.
    # sed '/oak/q' poem.txt
    A wise old owl lived in an oak;


AWK Programming Model


• AWK is a programming language designed to search files (or input lines) for patterns and to perform actions on the lines that match.
• Its name is an acronym formed from its developers' names (Aho, Weinberger, Kernighan).
• An awk program consists of a main input loop. This loop is the framework within which the code that you write is executed.
• BEGIN and END rules

Flow and control in awk scripts
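The flow of control through the BEGIN rule, the main input loop and the END rule can be sketched as follows. The three input lines (a, b, c) are made up for illustration.

```shell
out=$(printf 'a\nb\nc\n' | awk '
BEGIN { print "start" }              # runs once, before any input is read
      { print NR ": " $0 }           # main input loop: runs for every line
END   { print "lines read: " NR }    # runs once, after the last line
')
printf '%s\n' "$out"
```

The rule without a pattern is applied to every input line, which is what makes the main input loop implicit: no explicit read loop has to be written.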



AWK Programming Model (cont)

Structure of an awk program
    pattern { action }
    pattern { action }
    ...

How do you invoke awk?
    1. Command line :: awk 'instructions' files
    2. Script :: awk -f script files
       Saving output :: awk -f script files > out-file
    3. Script that calls awk by itself :: #!/usr/bin/awk -f

Awk assumes that its input is structured.
• Records and fields
• Field separator (FS); use the -F option to change it
• Fields are referenced with the field operator $: $1, $2, and so on; $0 refers to the entire record
    E.g. # awk '{ print $2, $1, $3 }' names
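Field referencing and the -F option can be illustrated with two small files. The contents of names and the colon-separated file accounts are hypothetical sample data, not taken from the slides.

```shell
tmp=$(mktemp -d)
printf 'John Robinson 696-0987\nPhyllis Chapman 879-0900\n' > "$tmp/names"
printf 'root:x:0\nguest:x:1000\n' > "$tmp/accounts"

# Default field separator (blanks): reorder the first two fields.
swapped=$(awk '{ print $2, $1, $3 }' "$tmp/names")

# -F changes the field separator, here to a colon.
users=$(awk -F: '{ print $1 }' "$tmp/accounts")

# $0 is the whole record; a pattern selects which records the action sees.
whole=$(awk '/Chapman/ { print $0 }' "$tmp/names")

printf '%s\n' "$swapped" "$users" "$whole"
```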


Constants and Variables


Constants:: There are two types of constants, string and numeric (for example, "Ram" and 1).

Variables:: A variable is an identifier that references a value.
• Name the variable and assign a value to it.
• Naming rules apply; variable names are case sensitive.
• AWK initializes variables automatically.
• You need not declare a type when using a variable.
• A variable can hold both a string value and a numeric value; awk decides which one to use from the context.
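The points about initialization and context-dependent typing can be checked with a short BEGIN-only program. The variable names used here (total, name, x, y, Count, count) are arbitrary.

```shell
out=$(awk 'BEGIN {
    # Uninitialized variables act as 0 in a numeric context
    # and as the empty string in a string context.
    print total + 1
    print "[" name "]"

    x = "3"            # assigned a string constant...
    print x + 4        # ...but used as a number here

    y = 3              # assigned a numeric constant...
    print y " apples"  # ...but used as a string here

    Count = 1; count = 2   # two different variables: names are case sensitive
    print Count, count
}')
printf '%s\n' "$out"
```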


System Variables
Variable   Description
FILENAME   Current filename
FS         Field separator (a blank)
NF         Number of fields in current record
NR         Number of the current record
OFS        Output field separator (a blank)
ORS        Output record separator (a newline)
RS         Record separator (a newline)
ARGC       Number of arguments on command line
ARGV       An array containing the command-line arguments
ENVIRON    An associative array of environment variables
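Several of these system variables can be seen at work in one short run. The file data.txt, its contents and the environment variable GREETING are assumptions made for this sketch.

```shell
tmp=$(mktemp -d)
printf 'alice:30:NY\nbob:25\n' > "$tmp/data.txt"

# FS and OFS are set in BEGIN; FILENAME, NR and NF change as input is read.
report=$(cd "$tmp" && awk 'BEGIN { FS = ":"; OFS = " | " }
{ print FILENAME, NR, NF, $1 }' data.txt)

# $NF is the last field of the current record.
last=$(awk -F: '{ print $NF }' "$tmp/data.txt")

# ENVIRON gives access to environment variables by name.
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')

printf '%s\n' "$report" "$last" "$greeting"
```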


Operators
Arithmetic Operators
Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
%          Modulo
^          Exponentiation

Assignment Operators
Operator   Description
++         Add 1 to variable
--         Subtract 1 from variable
+=         Assign result of addition
-=         Assign result of subtraction
*=         Assign result of multiplication
/=         Assign result of division
%=         Assign result of modulo
^=         Assign result of exponentiation

E.g.
    # Count blank lines.
    /^$/ { print x += 1 }
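A few of the operators in action, including the blank-line counter. This is a sketch with made-up input; the END variant is an addition that prints only the final total instead of a running count.

```shell
# Arithmetic operators.
math=$(awk 'BEGIN { print 7 % 3, 2 ^ 10, 7 / 2 }')

# Assignment operators: 5, then 7, then 21, then 22.
assign=$(awk 'BEGIN { n = 5; n += 2; n *= 3; n++; print n }')

# Running count of blank lines, as in the example above.
running=$(printf 'a\n\nb\n\n' | awk '/^$/ { print x += 1 }')

# Variant: accumulate silently and report once in the END rule.
total=$(printf 'a\n\nb\n\n' | awk '/^$/ { x += 1 } END { print x }')

printf '%s\n' "$math" "$assign" "$running" "$total"
```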

Operators (cont)

Relational Operators
Operator   Description
<          Less than
>          Greater than
<=         Less than or equal to
>=         Greater than or equal to
==         Equal to
!=         Not equal to
~          Matches
!~         Does not match

E.g. $5 ~ /LA/ { print $1 ", " $6 }

Boolean Operators
Operator   Description
||         Logical OR
&&         Logical AND
!          Logical NOT

E.g. NF == 6 && NR > 1
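The two examples above can be run against a small table. The file staff and its six-column layout are hypothetical, invented so that field 5 holds a region code and field 6 a phone number.

```shell
tmp=$(mktemp -d)
cat > "$tmp/staff" <<'EOF'
Name Dept Role Level Region Phone
Ann Sales Rep 2 LA 555-0101
Bob Ops Eng 3 NY 555-0102
Cy Ops
EOF

# Match operator: records whose fifth field contains LA.
la=$(awk '$5 ~ /LA/ { print $1 ", " $6 }' "$tmp/staff")

# Logical AND: complete records (6 fields), skipping the header line.
complete=$(awk 'NF == 6 && NR > 1 { print $1 }' "$tmp/staff")

# Logical NOT: records that do not have exactly 6 fields.
short=$(awk '!(NF == 6) { print NR ": " $0 }' "$tmp/staff")

# Does-not-match operator combined with a relational test.
notla=$(awk 'NR > 1 && $5 !~ /LA/ { print $1 }' "$tmp/staff")

printf '%s\n' "$la" "$complete" "$short" "$notla"
```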



Formatted Printing
Awk offers an alternative to the print statement, printf.
    printf ( format-expression [, arguments] )
• A format specification is preceded by a percent sign (%).
• The printf statement can be used to specify the width and alignment of output fields.

Character  Description
c          ASCII character
d          Decimal integer
e          Floating-point format ([-]d.precision e[+-]dd)
E          Floating-point format ([-]d.precision E[+-]dd)
           E.g. printf "%4.3e\n", 1950 gives 1.950e+03
f          Floating-point format ([-]ddd.precision)
           E.g. printf "%4.3f", 1950 gives 1950.000
g          e or f conversion, whichever is shortest
G          E or f conversion, whichever is shortest
o          Unsigned octal value
s          String
x          Unsigned hexadecimal number. Uses a-f for 10 to 15
X          Unsigned hexadecimal number. Uses A-F for 10 to 15
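The format characters, together with width and alignment, can be tried in a BEGIN-only program. The values passed to printf here are arbitrary.

```shell
out=$(awk 'BEGIN {
    printf "%4.3e\n", 1950                  # exponential notation
    printf "%4.3f\n", 1950                  # fixed-point notation
    printf "|%-8s|%5d|\n", "left", 42       # - left-aligns; numbers give the width
    printf "%o %x %X %c\n", 8, 255, 255, "A"
}')
printf '%s\n' "$out"
```

Unlike print, printf does not add a newline by itself, which is why each format string ends in \n.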

Passing Parameters Into a Script


A parameter assigns a value to a variable that can be accessed within the awk script. The variable can be set on the command line, after the script and before the filename.
    awk 'script' var=value inputfile
Each parameter must be interpreted as a single argument. Therefore, spaces are not permitted on either side of the equal sign.
    E.g. awk -f scriptfile high=100 low=60 datafile

In addition, environment variables or the output of a command can be passed as the value of a variable.
    awk '{ ... }' directory=$cwd file1 ...
    awk '{ ... }' directory=`pwd` file1 ...
We can use command-line parameters to define system variables, as in the following example:
    $ awk '{ print NR, $0 }' OFS='. ' names
Parameters set this way are assigned only when awk reaches them while processing the argument list, so they are not yet available in the BEGIN procedure. AWK therefore also provides a way to define parameters before any input is read: the -v option specifies variable assignments that take place before the BEGIN procedure is executed (i.e., before the first line of input is read). The -v option must be specified before a command-line script.
    E.g. awk -v RS=";" '{ print }' phones.block
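The difference between a plain command-line assignment and -v can be sketched as follows. The file data, its three values and the variable names low, high and n are assumptions for illustration.

```shell
tmp=$(mktemp -d)
printf '50\n75\n120\n' > "$tmp/data"

# Parameters placed after the script are visible in the main input loop.
inrange=$(awk '$1 >= low && $1 <= high' low=60 high=100 "$tmp/data")

# ...but they are not yet set when the BEGIN procedure runs.
plain=$(awk 'BEGIN { print "[" n "]" }' n=1 "$tmp/data")

# With -v the assignment happens before BEGIN.
early=$(awk -v n=1 'BEGIN { print "[" n "]" }')

# The output of a command can be passed as a value.
where=$(awk -v directory="$(pwd)" 'BEGIN { print directory }')

printf '%s\n' "$inrange" "$plain" "$early" "$where"
```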

Fundamental programming constructs


1. if statement
    if ( expression )
        action1
    [else
        action2]

2. while loop
    while ( condition ) {
        action
    }

3. for loop
    for ( set_counter ; test_counter ; increment_counter )
        action

4. Other statements that affect flow control
    • break and continue (loops)
    • next and exit (main input loop)
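The constructs above can be combined in one small script. This is an illustrative sketch: the input values and the variable names (kind, fact, bits) are made up, and the script simply classifies each number, computes its factorial with a for loop and counts its binary digits with a while loop.

```shell
out=$(printf '3\n-1\n4\nstop\n9\n' | awk '
$1 == "stop" { exit }       # exit: leave the main input loop
$1 < 0       { next }       # next: skip to the next input line
{
    if ($1 % 2 == 0)
        kind = "even"
    else
        kind = "odd"

    fact = 1
    for (i = 2; i <= $1; i++)
        fact *= i

    n = $1; bits = 0
    while (n > 0) {
        bits++
        n = int(n / 2)
    }
    print $1, kind, fact, bits
}')
printf '%s\n' "$out"
```

The negative value is skipped by next, and nothing after the "stop" line is processed because exit ends the input loop.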

Arrays
An array is used to store a set of values.
    array[subscript] = value

• In awk, you don't have to declare the size of the array; you only have to use the identifier as an array.
• In awk, all arrays are associative arrays. What makes an associative array unique is that its index can be a string or a number.

Special version of the for loop for arrays in AWK
    for ( variable in array )
        do something with array[variable]
    E.g. for ( item in acro ) print item, acro[item]

The keyword in is also an operator that can be used in a conditional expression to test whether a subscript is a member of an array.
    E.g. item in array
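A common use of associative arrays is counting how often each value occurs. The sketch below uses made-up input words; the output of the for-in loop is piped through sort because awk does not guarantee any particular traversal order.

```shell
# Count occurrences: the word itself is the subscript.
counts=$(printf 'apple\npear\napple\n' | awk '
{ count[$1]++ }
END { for (word in count) print word, count[word] }' | sort)

# The in operator tests membership without creating the element.
member=$(awk 'BEGIN {
    acro["awk"] = "Aho, Weinberger, Kernighan"
    if ("awk" in acro)    print "awk is defined"
    if (!("sed" in acro)) print "sed is not defined"
}')

printf '%s\n' "$counts" "$member"
```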

Arrays (cont)

Using split() to create arrays
• The built-in function split() can parse any string into elements of an array.
    n = split(string, array, separator)
• The array's indices start at 1 and go to n.
• If a separator is not specified, the field separator (FS) is used.
• n gives the total number of elements.

Deleting elements of an array
    delete array[subscript]
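Both split() and delete can be tried in a BEGIN-only program. The date string and the array names part and w are arbitrary choices for this sketch.

```shell
out=$(awk 'BEGIN {
    # Split on an explicit separator; indices run from 1 to n.
    n = split("2024-01-15", part, "-")
    print n, part[1], part[3]

    # Delete one element and confirm that it is gone.
    delete part[2]
    status = (2 in part) ? "present" : "deleted"
    print status

    # Without a separator, FS is used (by default, runs of blanks).
    m = split("a b  c", w)
    print m, w[3]
}')
printf '%s\n' "$out"
```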


Functions
Awk has a number of built-in functions in two groups: arithmetic functions and string functions.

Awk's Built-In Arithmetic Functions
Awk Function   Description
cos(x)         Returns cosine of x (x is in radians).
exp(x)         Returns e to the power x.
int(x)         Returns truncated value of x.
log(x)         Returns natural logarithm (base-e) of x.
sin(x)         Returns sine of x (x is in radians).
sqrt(x)        Returns square root of x.
atan2(y,x)     Returns arctangent of y/x in the range -π to π.
rand()         Returns pseudo-random number r, where 0 <= r < 1.
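A quick check of a few arithmetic functions. The die-roll expression at the end is a common idiom added here for illustration, not something taken from the slides.

```shell
# int() truncates toward zero; the other values are exact.
basic=$(awk 'BEGIN { print int(3.9), int(-3.9), sqrt(16), exp(0), log(1) }')

# atan2(0, -1) is pi.
pi=$(awk 'BEGIN { printf "%.4f\n", atan2(0, -1) }')

# rand() returns r with 0 <= r < 1; scale and truncate for a die roll (1 to 6).
roll_ok=$(awk 'BEGIN { roll = int(rand() * 6) + 1; print (roll >= 1 && roll <= 6) }')

printf '%s\n' "$basic" "$pi" "$roll_ok"
```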


Functions (cont)

Awk's Built-In String Functions
Awk Function     Description
gsub(r,s,t)      Globally substitutes s for each match of the regular expression r in the string t. Returns the number of substitutions. If t is not supplied, defaults to $0.
index(s,t)       Returns position of substring t in string s or zero if not present.
length(s)        Returns length of string s or length of $0 if no string is supplied.
match(s,r)       Returns either the position in s where the regular expression r begins, or 0 if no occurrences are found.
split(s,a,sep)   Parses string s into elements of array a using field separator sep; returns number of elements. If sep is not supplied, FS is used. Array splitting works the same way as field splitting.
sub(r,s,t)       Substitutes s for first match of the regular expression r in the string t. Returns 1 if successful; 0 otherwise. If t is not supplied, defaults to $0.


Functions (cont)

Awk Function     Description
substr(s,p,n)    Returns substring of string s at beginning position p up to a maximum length of n. If n is not supplied, the rest of the string from p is used.
tolower(s)       Translates all uppercase characters in string s to lowercase and returns the new string.
toupper(s)       Translates all lowercase characters in string s to uppercase and returns the new string.
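The string functions can be exercised on a single sample string. The string "hello world" and the variable names are arbitrary; copies (t and u) are used so that gsub() and sub() do not alter the original.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)
    print index(s, "world")
    print match(s, /wor/)
    print substr(s, 1, 5)
    print toupper(s)

    t = s
    n = gsub(/o/, "0", t)     # replaces every match, returns the count
    print n, t

    u = s
    sub(/l+/, "L", u)         # replaces the first match only
    print u
}')
printf '%s\n' "$out"
```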

Writing your own functions
    function name (parameter-list) {
        statements
    }
• The parameter-list is a comma-separated list of variables that are passed as arguments into the function when it is called.
• The function typically contains a return statement that returns control to the point in the script where the function was called.
• A function definition can be placed anywhere in a script that a pattern-action rule can appear. Typically, we put the function definitions at the top of the script, before the pattern-action rules.
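A user-defined function placed before the pattern-action rules might look like this. The function name hypot and the input pairs are hypothetical, chosen only to show parameters and a return value.

```shell
out=$(printf '3 4\n6 8\n' | awk '
# Function definition at the top of the script.
function hypot(a, b) {
    return sqrt(a * a + b * b)
}

# Pattern-action rule: call the function for every input line.
{ print hypot($1, $2) }')
printf '%s\n' "$out"
```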

Summary
• SED and AWK are two of the most powerful and most versatile tools that UNIX offers.
• SED and AWK are both well suited to automating monotonous text editing tasks that would normally be done interactively in a text editor.
• SED is definitely the simpler of the two, having grown out of the "substitute" commands of ex and vi. AWK is much more mature and complete, and it shares a number of syntactic features with C.
• SED is a stream editor and can be used to apply a series of edits to a number of files, or to the stdout of a program.
• AWK is a powerful interpreted programming language that provides extensive text processing facilities and is ideal for producing reports and restructuring data.
• Although both SED and AWK were born in a UNIX environment, they have been ported to a variety of platforms and are frequently available in freeware as well as commercial packages.


THANK YOU
