
Programming With Perl

An Introduction
September 2005

Notes:
Structure Of This Course

 This course is split into three parts:


1. Introduction (~0.5 hours)
2. Course material (~9.0 hours)
3. Labs/Tutorials/Exercises (~2.5 hours)

The goal is to cover 75% of the Perl language.

 All of the material in this course comes from “Programming Perl 3rd edition”
and the Perl Cookbook.
 If there’s anything which is not clear then ask as we go.

 One thing I would like from this course is feedback, so,


 Fill in the course feedback form before you leave tomorrow.

An assumption is that everyone has some programming experience. This course isn’t
going to teach programming.

Some parts of Perl are not going to be covered - ties and DBM, formats, and many
system functions. This is all reference material which you can find in any of the
standard texts - or in the man pages.
Agenda - Day 1/2

Day 1
 09:00 - 09:30 Introduction
 09:30 - 11:15 Perl
 11:15 - 11:30 Break
 11:30 - 12:30 Perl
 12:30 - 13:00 Break for lunch
 13:00 - 14:30 Perl
 14:30 - 14:45 Break
 14:45 - 16:00 Perl

Day 2
 09:00 - 09:30 Recap of day 1
 09:30 - 11:15 Perl
 11:15 - 11:30 Break
 11:30 - 12:30 Perl
 12:30 - 13:00 Break for lunch
 13:00 - 14:30 Perl
 14:30 - 14:45 Break
 14:45 - 15:30 Perl
 Conclusions, discussion, questions, feedback

Course agenda is the same times for each of the two days. Labs and exercises happen as
we go. If there are any problems or questions you wish to raise, just ask.

Each day is 09:00 to 16:00 with 30 minutes for lunch and a 15-minute break in both
the morning and the afternoon.

Agenda is flexible. If there are specific areas which I haven’t covered in which you
have an interest, then ask.

There are lots of LABS and exercises - most are small to start with and get more
detailed as we get to the end of the course.

By the time we get to the end of the overview everyone should be capable of writing
simple scripts which manipulate files and do simple pattern matching and substitution.

We will be largely learning by example - lots of the examples in this course come from
the Perl Cookbook.
The Pursuit Of Happiness (Or The Hard Sell)

 Perl is a language for getting your job done.


 Designed to make easy jobs easy without making hard jobs impossible
 What are the easy jobs?
 Manipulate numbers & text & files & directories & computers & networks.
 You want to be able to run external programs & scan their output.
 It should be easy to develop, modify & debug your own programs.
 Perl is a glue language.
 Perl is especially popular with web programmers and developers.
 But only because they discovered it first.
 We will look at Perl from the viewpoint of how it helps in the areas of:
 Design.
 Programming.
 Verification.
 Documentation/Reporting.
 Data analysis.
 Perl is an ideal language for data manipulation.

Notes:
What Is Perl?

 To those who like it:


Practical Extraction and Report Language.
 To those who love it:
Pathologically Eclectic Rubbish Lister.
 Above all Perl is:
 Free.
 Easy to use.
 Capable of “One-Liners” or whole projects.
 But be careful:
 You can write rubbish software in any language.

 If you’ve programmed before in Basic, C, C++, Pascal, awk, Python, English
then you’ll probably feel comfortable with Perl.

Any language can be used to write code which is not maintainable. Perl isn’t an
exception to that rule.

We will look at three different styles of programming.

1. Flat programming - simple scripts.

2. Procedural programming - larger programs based on procedures and control
structures and simple data structures.

3. One-liners.

As part of the course notes there is a style guide for Perl. Follow it (or something like
it).

Some of the things we won’t have time to cover on this course are OO Perl and
Advanced Data Structures. If this is something which interests you, then let me
know since a follow-up course is possible/likely.
The History Of Perl

 This guy is Larry Wall, the creator of Perl.

 Perl has been around since 1987 (Perl1).


 1988 sees Perl 2, 1989 sees Perl 3.
 1991 sees “Programming Perl”, 1st edition, and Perl 4. The Internet explodes into
growth.
 1994 sees Perl 5.
 ...
 2000 - Perl 6 is announced.
 2005 onwards - Perl 6 - we’re still waiting.

Notes:
More About Perl

 Perl is a rich language:


 Perl is modularly extensible.
 You can rapidly design, program, debug & deploy applications.
 You can extend the functionality of those applications as needed.
 You can embed Perl in other languages.
 You can embed other languages in Perl.
 You can write Object-Oriented Perl.

 A misconception:
 Perl is interpreted and so it’s slow!
 Perl compiles to an intermediate format (like Java bytecode or Pascal P-Code).
 Once it is compiled it is passed to the interpreter for execution.
 Hence:
 You can write faster code in C but you can write code faster in Perl.

 Great solutions come from using pre-built Perl modules written in C:


 C speed.
 Perl’s convenience and flexibility.

For embedding the choice of language is C since Perl is written in C.


How To Get Perl

 Unix:
 Available on-site. See:
 /pd/perl/5.005_503/bin/perl
 /pd/perl/5.8.6/bin/perl
 /usr/local/bin/perl
 Windows:
 ActiveState Perl (version 5.8.6) from www.activestate.com
 Linux:
 Included as part of all standard Linux distributions (version 5.8.6)
 Mac OS X:
 Included as part of OS X (version 5.8.1 on OS X 10.3.9)

We have various versions on site - recommend that we use 5.8.x.

Perl Tk is available on-site in version 5.8.x.


Places To Get Useful Information - I

 Internet:
 www.perl.com (The Perl homepage)
 www.perl.org (The Perl mongers homepage)
 www.oreilly.com
 search.cpan.org (Go here to find Perl modules)
 Comp.lang.perl newsgroup hierarchy:
 comp.lang.perl.misc
 comp.lang.perl.moderated
 comp.lang.perl.modules
 comp.lang.perl.tk
 man perl from a Unix command line:
 Gives all the perl help topics
 Ask

All the news groups listed above are available in this building.

Perl is probably the most widely used and understood programming language in
Bristol. People can always come and ask me a question if they have a problem.
Places To Get Useful Information - II

 Books:
 Programming Perl (3rd edition)
 Larry Wall & Tom Christiansen & Jon Orwant - ISBN 0-596-00027-8
 Learning Perl (3rd edition)
 Randal Schwartz and Tom Phoenix - ISBN 0-596-00132-0
 Perl Cookbook (2nd edition).
 Tom Christiansen & Nathan Torkington - ISBN 1-56592-243-3
 Mastering Algorithms With Perl
 Jon Orwant, Jarkko Hietaniemi & John Macdonald - ISBN 1-56592-398-7
 Advanced Perl Programming
 Sriram Srinivasan - ISBN 1-56592-220-4

If you only buy one book make it the camel book (A.K.A. Programming Perl),
followed by the Perl Cookbook. If you do buy Programming Perl make sure it’s the 3rd
edition and NOT the 2nd edition.

There are two Perl in 21 Days books, one of which is available on-line at the CR&D
bookshelf web-site.
(The on-line version can be found in the tutorial areas as a series of PDF files).

Since a lot of this course is going to be Perl by example, I’ve placed a few programs
into the various tutorial areas which all can be used (reused) as you wish. There’s
also a copy of a Perl module (Netlist_Functions.pm) which contains a lot of useful
functions which can be imported into your own programs. Hey, why bother
programming when you can steal! (This really is the philosophy you should be
adopting in your own work.)
(Some of) The Perl Manpages

Manpage      Covers
perl         What perl manpages are available
perldata     Data types
perlsyn      Syntax
perlop       Operators and precedence
perlre       Regular expressions
perlvar      Predefined variables
perlsub      Subroutines
perlfunc     Built-in functions
perlmod      How to make modules work
perlref      References
perlobj      Objects
perlipc      Inter-process communications
perlrun      How to run Perl commands, plus switches
perldebug    Debugging
perldiag     Diagnostic messages

Notes:
(More About) The Perl Manpages

 See also:
 perlfaq1 to perlfaq9
 As of Perl version 5.6.1 you can search individual Perl manpages by using the
name of the manpage as a command and passing a Perl regular expression
as the search pattern.
 Examples:
perlop comma
perlfunc split
perlvar ARG
perldiag 'assigned to typeglob'
 When you don’t know where something is in the documentation, search all
the FAQs:
perlfaq round
Some Terminology

 Idiomatic Perl:
 Widespread and accepted ways of doing certain things in Perl.
if ( $variable != 56 ) { print "Your variable ($variable) did not equal 56\n"; }
print "Your variable ($variable) did not equal 56\n" unless ( $variable == 56 );

 Interpolation:
 Replacing a variable with the variable’s value.
 Regexps:
 Regular expressions.
 CPAN:
 The Comprehensive Perl Archive Network.
 The place to go to get modules written and contributed by other Perl
programmers.
 Don’t reinvent the wheel, or if you do then make sure it’s a better wheel.
 Share code within your office/group/site/business unit.

Idiomatic Perl is one of the most confusing bits of Perl since there are so many
different ways of doing things. This can be both useful (you can program in the way
which suits you) and a drawback (reading other people’s code isn’t always easy).

TMTOWTDI - There’s More Than One Way To Do It - the Perl motto.

Interpolation will be mentioned a lot by people who use Perl a lot - it’s just a fancy
computer science term.

Regexps - these are not exactly the same as regular expressions in other UNIX
applications - so be careful.

CPAN - pretty light on EDA type code. Maybe we should start a forum!
Account Details

 There are six user accounts: user1 to user6


 Password for each account is: ________
 Each area holds:
 Copies of all the course material as .pdf files.
 Tutorial areas for all the labs.
 A “How To” guide.
 A document on “Perl Style”.
 A list of some common regexps.
 Issues 1 and 2 of the Perl Review (as .pdf files).

Notes:
A Standard Header

 This works in Bristol.


#!/usr/local/bin/perl

# Preamble
use strict;
use warnings;
use diagnostics;

# Some standard modules
use Carp;
use Cwd;
use Config;

# Extend lib path
use lib ( "/design/rmc/tools/Perl_Modules/tool/current/" );
use lib ( "/design/rmc/tools/Perl_Modules/tool/current/OS_SPECIFIC/$Config{archname}" );

# Directory containing this script
use FindBin qw( $Bin );
use lib $Bin;

# Site specific
use Netlist_Tools;

 There are other ways of invoking the perl binary that use “eval” with some “magic”.
PREVIEW - Examples Of sprintf()

Field Meaning

%% A percent sign

%c A character with the given number

%s A string

%d A signed integer, in decimal

%u An unsigned integer, in decimal

%o An unsigned integer, in octal

%x An unsigned integer, in hexadecimal

%e A floating-point number, in scientific notation

%f A floating-point number, in fixed decimal notation.

%g A floating-point number, in %e or %f notation

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition.

Be careful - sprintf() in Perl does its own formatting - it is NOT calling the
underlying sprintf() function in the C library.
PREVIEW - Examples Of sprintf()

Field Meaning

%X Like %x, but using uppercase characters

%E Like %e, but using uppercase “E”

%G Like %g, but using uppercase “E” if applicable

%b An unsigned integer, in binary

%p A pointer (the Perl value’s address in hexadecimal)

%n A special: stores the number of characters output so far into the next variable in the
argument list.

In addition to the formats on the previous slide, Perl also supports the following
conversions.

For compatibility, Perl also supports these conversions:

%i - a synonym for %d
%D - a synonym for %ld
%U - a synonym for %lu
%O - a synonym for %lo
%F - a synonym for %f
PREVIEW - Examples Of sprintf()

Flag Meaning

space Prefix positive number with a space

+ Prefix positive number with a plus sign

- Left-justify within field

0 Use zeroes, not spaces, to right-justify

# Prefix non-zero octal with “0”, non-zero hex with “0x”

number Minimum field width

.number “Precision”: digits after the decimal point for floating-point numbers, maximum length
for a string, minimum length for an integer.
l Interpret integer as a C type long or unsigned long

h Interpret integer as C type short or unsigned short (if no flags are supplied interpret
integer as C type int or unsigned)

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition.

Perl allows the following flags between the % and the conversion character.
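
The conversions and flags in the tables above can be tried out with a few calls. This is only a sketch: the values and variable names are made up for illustration and are not part of the course labs.

```perl
use strict;
use warnings;

# Each line pairs one format with the string that sprintf() returns for it.
my $padded  = sprintf( "%05d",  42 );         # zero flag, minimum width 5
my $signed  = sprintf( "%+d",   42 );         # plus flag
my $left    = sprintf( "%-6s|", "ab" );       # left-justify in a width of 6
my $hex     = sprintf( "%#x",   255 );        # '#' flag adds the 0x prefix
my $binary  = sprintf( "%b",    10 );         # binary conversion
my $fixed   = sprintf( "%.2f",  3.14159 );    # precision: digits after the point
my $clipped = sprintf( "%.3s",  "abcdef" );   # precision: maximum string length
my $percent = sprintf( "%d%%",  75 );         # %% gives a literal percent sign

print "$_\n" for $padded, $signed, $left, $hex, $binary, $fixed, $clipped, $percent;
```

printf() takes the same formats and prints the result instead of returning it.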
PREVIEW - Examples Of chop() And chomp()

@lines = `cat myfile`;
chop @lines;

chop($cwd = `pwd`);
chop($answer = <STDIN>);

$answer = chop($tmp = <STDIN>); # WRONG

$last_char = chop($var);

while (<PASSWD>) {
    chomp; # avoid \n on last field
    @array = split /:/;
    ...
}

Remember, chop is indiscriminate, it always removes something, so you’re supposed to
know that the last character on a line is “\n”.

What is in $answer after the line marked WRONG?

chomp is more discriminating, it will only remove the last character if it’s a “\n”.

You could also do s/\n$//; which is explicit.

You almost always want to use chomp() and not chop().

chop() always returns the character it removes. If you chop() a list, then every
item in the list is chopped. The thing which ends up in $answer in the question on
the slide is the character which was removed from the string $tmp. The thing you
probably wanted was $tmp.

chomp() is discriminating. By default it removes the last character on a line only if
that character is “\n”, but the default can be overridden. The
character (or string) which is removed is that contained in the Perl variable $/. So
chomp() can remove any arbitrary length string from the end of an input string.

chomp() returns the number of characters it deleted - not the characters
themselves.
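
The difference in what the two functions remove and return is easy to see side by side. A small sketch with made-up strings (the variable names are illustrative only):

```perl
use strict;
use warnings;

# chomp() removes a trailing newline and returns how many characters it removed.
my $chomped       = "camel\n";
my $removed_count = chomp($chomped);    # $chomped is now "camel"

# A second chomp() finds no newline, so nothing changes.
my $second_count = chomp($chomped);

# chop() removes the last character whatever it is, and returns that character.
my $chopped      = "camel";
my $removed_char = chop($chopped);      # $chopped is now "came"

# chomp() removes whatever $/ holds; here $/ is changed inside one block only.
my $record = "dataEND";
{
    local $/ = "END";
    chomp($record);                     # $record is now "data"
}

print "$chomped $removed_count $second_count $chopped $removed_char $record\n";
```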
PREVIEW - Examples Of hex() And oct()

$number = hex("ffff12c0");
sprintf "%lx", $number; # (That's an ell, not a one.)

sprintf uses the same conventions as C’s sprintf.

perl -e 'print 0xffdc;'

A neat command line alternative when you need a quick conversion.

$val = oct $val if $val =~ /^0/;

Does $val start with a “0” (as opposed to “0x” or “0b”)?

$perms = (stat("filename"))[2] & 07777;
$oct_perms = sprintf "%lo", $perms;

Note that you can always set the value of any variable with a hex value just by doing
this:

$h_number = 0xffdd;
print $h_number;

The hex() function is interpreting a string as a hex number, not a value. If the string
begins with “0x”, this is ignored. To do a reverse conversion use sprintf() as
shown.

Hex strings can only represent integers. Strings which would cause integer overflow
will trigger a warning.

oct() will interpret a string as an octal value. If the string starts with “0” it will be
interpreted as octal. If the string starts with “0x” it will be interpreted as a hex
value. If it begins with “0b” it will be interpreted as a binary value.

Try this:

perl -e 'print 0b11001001;'

Is anyone (apart from me) sad enough to know from what 80’s/90’s TV series this was an
episode title?
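
The conversions in both directions can be checked with small values. This is a sketch with made-up numbers rather than anything from the labs:

```perl
use strict;
use warnings;

# hex() reads a string as hexadecimal; a leading "0x" is optional.
my $from_hex      = hex("ff");
my $from_prefixed = hex("0xff");

# oct() reads octal by default, but honours "0x" and "0b" prefixes.
my $from_oct = oct("0755");
my $oct_hex  = oct("0xff");
my $oct_bin  = oct("0b101");

# sprintf() does the reverse conversion, from a number back to a string.
my $as_hex = sprintf( "%x", 255 );
my $as_oct = sprintf( "%o", 493 );
my $as_bin = sprintf( "%b", 5 );

print "$from_hex $from_oct $oct_bin $as_hex $as_oct $as_bin\n";
```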
Getting Started

 For many programming tasks you’d like a language in which you can say:

print "Hello World!\n"

and expect the language to do just that.

 Perl is such a language.


 Some important points …
 This course is an overview.
 We’re going to cover a lot of Perl very quickly and there will be lots of examples.

 There are many slides in this course which have this symbol in the top left
corner of the slide. All such slides are gathered together into a single
document called “How-to.pdf” in your labs and exercises directory.

This is a minimal (and complete) Perl program, but it illustrates some important
points.

1. You don’t have to say much before you say what you want to say.

2. You don’t have to say much after you’ve said what you want to say either. Unlike
many languages, Perl thinks it’s okay that you just fall off the end of your
program. You may use the exit() function to end a program (actually, you
should use the exit() function to end a program) just as you may force yourself to
pre-declare variables before you use them (actually …) It’s up to you!

Here are a few important points:

1. The \n at the end of the print statement is a newline.

2. Statements are terminated by a semi-colon. (The last statement in a program or block
may omit it, as the example does.)

LAB1 - HELLO_1
Variables, Arrays & Lists, Hashes

Notes:
Variables And Their Syntax

 A variable is a handy place to keep something:


 A place with a name.
 Might be private or public.
 Might be temporary or permanent.
 This is what computer scientists call scope.

 We’ll learn about scope later (or look up my, our and local).

 A variable is distinguished by the sort of data it holds:


 Singular - one thing - strings and numbers.
 Plural - many things - lists of strings or lists of numbers (or both).

 We call a singular variable a scalar.


 We call a plural variable an array.

These are the two fundamental data types in Perl. One of a thing, and more than one
of a thing.

We call a singular variable a scalar.

We call a variable which contains more than one thing, either an array/list or an
associative array/hash.
Variables And Their Syntax

 We can write a different version of our first example (in the getting started section)
like this:

$phrase = "Hello, world!\n"; # Set a variable.


print $phrase; # Print the variable.

 We didn’t have to predefine what type of variable $phrase was.


 The $ character tells Perl that phrase is a scalar.

 Perl has some other variable types with names like hash and handle and typeglob.

Later we’ll see that it is a good idea to force yourself to predefine variables before you
use them (using my()).

Hash and handle we’ll cover later. Typeglob won’t be covered in this course.

LAB1 - HELLO_2
LAB1 - HELLO_3
LAB1 - HELLO_4
Variables And Their Syntax

Type Character Example Is a name for:


Scalar $ $pounds An individual value (number or string)
Array @ @large A list of values keyed by number
Hash % %interest A group of values keyed by a string
Subroutine & &how A callable chunk of Perl code
Typeglob * *struck Everything named struck

Tips:

The $ for scalar is a stylized S.


The @ for array is a stylized A.

Sadly the analogy breaks down after that.

We’ll cover subroutines in detail later in the course. Typeglob won’t be covered in this
course.
Variables And Their Syntax
Construct             Meaning
$days                 Simple scalar value of $days
$days[28]             29th element of @days
$days{'Feb'}          "Feb" value from hash %days

Construct             Meaning
@days                 Array containing ($days[0], ..., $days[n])
@days[3,4,5]          Array slice containing ($days[3],$days[4],$days[5])
@days[3..5]           Array slice containing ($days[3],$days[4],$days[5])
@days{'Jan','Feb'}    Hash slice containing ($days{'Jan'},$days{'Feb'})

Quiz: What’s the value in $days after this has run?

my @days = qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );


my $days = @days;

Review:

Scalars store a single value - all scalars are prefixed by $.

Arrays and hashes store many values. Arrays start with @, hashes start with %.

Arrays (@) are accessed by numeric index - hashes (%) are accessed by a string key.

Note that the range operator (..) has made an appearance. So 1 .. 20 will give you all
the integers between 1 and 20 inclusive. We’ll talk more about the range operator
later.

In the quiz example we’ve introduced a lot of new stuff. qw (think of this as quote-
word) lets you use Barewords to create lists. This whole example is an illustration of
context - the value of $days after the example has run is ?
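
The constructs in the tables can be tried out with a short script. This is a sketch using made-up data (@colours and %code are not part of the course material), and it uses a different array from the quiz so the quiz is still worth doing.

```perl
use strict;
use warnings;

my @colours = qw( red green blue cyan );
my %code    = ( red => 'R', green => 'G', blue => 'B' );

my $second  = $colours[1];               # one element, so the sigil is $
my @pair    = @colours[ 1, 2 ];          # array slice
my @range   = @colours[ 1 .. 3 ];        # slice using the range operator
my $letter  = $code{'green'};            # one hash value, so the sigil is $
my @letters = @code{ 'red', 'blue' };    # hash slice

# An array assigned to a scalar is evaluated in scalar context
# and yields the number of elements in the array.
my $how_many = @colours;

print "$second | @pair | @range | $letter | @letters | $how_many\n";
```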
Numeric Literals

$x = 12345; # integer
$x = 12345.67; # floating point
$x = 6.02e23; # scientific notation
$x = 4_294_967_296; # underscore for legibility
$x = 0377; # octal
$x = 0xffff; # hexadecimal
$x = 0b11000000; # binary

You can’t use “,” in numbers since in Perl the , is an operator - so we use _ instead.

Octal numbers are prefixed with 0 (that’s zero).

Hex numbers are prefixed 0x (that’s zero x).

Binary numbers are prefixed by 0b (that’s zero b).
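
A quick way to convince yourself that the prefixes only change how a number is written, not what it is. A sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $octal  = 0377;             # leading 0 means octal
my $hex    = 0xff;             # leading 0x means hexadecimal
my $binary = 0b11111111;       # leading 0b means binary
my $big    = 4_294_967_296;    # the underscores are ignored by Perl

# The three prefixed literals are all the same number, printed in decimal.
print "octal=$octal hex=$hex binary=$binary\n";
print "big is 2**32\n" if $big == 2**32;
```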


Variables Types - Scalars

 Scalars are assigned a new value with the = operator.


 Scalar variables can be:
 Integers.
 Floating-point numbers.
 Strings.
 References to other variables (think C pointers).
 Objects.
 Double quote marks “” do variable interpolation and backslash interpolation.
 Substitution and turning “\n” into a newline.
 Single quotes ‘’ suppress interpolation.
 Backticks `` will execute an external program and return the output in a string.

The “=“ symbol does assignment. Be careful because the “==“ symbol is used for
equality. At some point in your life you’ll accidentally confuse the two.

Double quotes do variable and backslash interpolation - Interpolation is a fancy
computer-science name for replacing a variable with the contents of that variable.
Single quotes suppress interpolation. Backticks (the ones which lean towards the left)
will execute an external program and return its output to you in the form of a string.
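
The effect of the two kinds of quote can be seen by building the same text both ways. A sketch with a made-up variable ($drink); backticks are left out because they depend on which external programs are installed.

```perl
use strict;
use warnings;

my $drink = "coffee";

my $double = "I need $drink\n";    # $drink and \n are both interpolated
my $single = 'I need $drink\n';    # nothing is interpolated

# The double-quoted string ends in a real newline character; the
# single-quoted one ends in a backslash followed by the letter n.
print $double;
print $single, "\n";
```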
Variables Types - Scalars
$answer = 42; # an integer
$pi = 3.14159265; # a "real" number
$avocados = 6.02e23; # scientific notation
$pet = "Camel"; # string
$sign = "I love my $pet"; # string with interpolation
$cost = 'It costs $100'; # string without interpolation
$thence = $whence; # another variable's value
$salsa = $moles * $avocados; # a gastrochemical expression
$exit = system("vi $file"); # numeric status of a command
$cwd = `pwd`; # string output from a command

 Scalars can also hold references to data structures, subroutines and objects.
$ary = \@myarray; # reference to a named array
$hsh = \%myhash; # reference to a named hash
$sub = \&mysub; # reference to a named subroutine

$ary = [1,2,3,4,5]; # reference to an unnamed array


$hsh = {Na => 19, Cl => 35}; # reference to an unnamed hash
$sub = sub { print $state }; # reference to an unnamed subroutine

$fido = new Camel "Amelia"; # ref to an object

Variable interpolation:

$pet = "Camel";
$sign = "I love my $pet";
print $sign;

What do you think this will print out?

References will be covered extensively when we get to the in-depth look at Perl.
References are the key to writing efficient Perl code with subroutines, and the only
way to do OO programming.

In the example:

$hsh = {Na => 19 , Cl => 35};

the => is the same as a comma “,” - this is a convenience which lets us see easily
where the keys are and where the values are. (Often known as syntactic sugar.)
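
Creating a reference is only half the story; you also need to follow it to get at the data. The sketch below uses made-up data (@stock, %price) and the arrow notation for dereferencing, which is standard Perl but is assumed here ahead of the in-depth coverage later in the course.

```perl
use strict;
use warnings;

my @stock = ( 1, 2, 3 );
my %price = ( tea => 2, cake => 3 );     # => is a comma that also quotes the word on its left

my $ary = \@stock;                       # reference to a named array
my $hsh = \%price;                       # reference to a named hash
my $sub = sub { return $_[0] * 2 };      # reference to an unnamed subroutine

my $first   = $ary->[0];                 # follow the reference to one element
my $count   = scalar @{$ary};            # dereference the whole array
my $tea     = $hsh->{tea};               # follow the reference to one value
my $doubled = $sub->(21);                # call the subroutine through the reference

print "$first $count $tea $doubled\n";
```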
Variables Types - Scalars

 If you use a variable which has never been assigned a value then:
 The uninitialized variable springs into existence.
 Is created with the null value - either 0 or “”.
 Depending on how you use them variables will be interpreted as:
 Strings.
 Numbers.
 True or False, i.e. boolean.
 Context - suppose you said this:

$camels = '123';
print $camels + 1, "\n";

Question: What do you think is printed in the example shown?

Answer: $camels is a string containing the text ‘123’. When Perl tries to add 1 to a
string it first converts the string containing the text ‘123’ into the number 123. It then
adds 1 and (hopefully) gets 124. This is then converted back into a string containing
the text ‘124’ which is then printed. A newline is then printed.
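
The operator you use decides how a scalar is interpreted. A short sketch that extends the $camels example with two string operators and a boolean test (the extra variable names are illustrative):

```perl
use strict;
use warnings;

my $camels = '123';

my $sum    = $camels + 1;      # + asks for numbers
my $joined = $camels . 1;      # . asks for strings
my $repeat = $camels x 2;      # x repeats a string

# In boolean context 0, '0', the empty string and undef are false;
# everything else is true.
my $truth = $camels ? "true" : "false";
my $empty = ''      ? "true" : "false";

print "$sum $joined $repeat $truth $empty\n";
```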

LAB2 - VARIABLES1
LAB2 - VARIABLES2
LAB2 - VARIABLES3
LAB2 - VARIABLES4_A
LAB2 - VARIABLES4_B

PRINTF and SPRINTF and CHOP and CHOMP

LAB2 - VARIABLES5_A, _B, _C


Variables Types - Arrays And Hashes

 Some kinds of variables hold multiple values:


 Arrays.
 Hashes.
 Like scalars, arrays and hashes spring into existence with nothing in them.
 When you assign to them they supply a list context. (We’ll look at this later)

 Arrays and Hashes differ from each other:


 Use an array to look up something by number. Arrays are always denoted with
the “@” symbol - but it’s the whole array.
 Use a hash to look up something by name. Hashes are always denoted with the
“%” symbol - but it’s the whole hash.

 What’s the difference between a list and an array?

Arrays are also called lists - the distinction is blurred - when an array is used with
subscripts it’s generally regarded as an array, when it’s used as an ordered list and
used with push() pop() shift() and unshift() it’s generally regarded as a list.

It also depends upon context as well as how you think about a particular problem.
TMTOWTDI.
Variables Types - Arrays

 An array is an ordered list in which each scalar is accessed by its position in the list.

@home = ("couch", "chair", "table", "stove");

($potato, $lift, $tennis, $pipe) = @home;

($alpha,$omega) = ($omega,$alpha);

$home[0] = "couch";
$home[1] = "chair";
$home[2] = "table";
$home[3] = "stove";

An array is an ordered list in which each scalar is accessed by its position in the list.

The list can contain numbers, strings, or a mixture of both. It can also contain
references to variables and references to objects or references to other arrays or
references to other hashes.

To assign a list value to an array you simply group the values together with “(“ and
“)”.

If you use @home in a list context (on the right side of a list assignment) you’ll get
the list back. So you could set 4 scalar variables as shown.

List assignments happen in parallel so you can swap two scalar variables as shown in
the third example.

Arrays are 0 based (as in C) so while the list contains 4 elements the elements are
numbered 0 to 3.

Array subscripts are enclosed in “[“ and “]” so an individual element is referred to as
$home[n]. Since the element is a scalar (a single thing) it is preceded by $.
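
The points in these notes can be checked against the @home array from the slide. A sketch only: the scalar names are made up, and negative subscripts (which count back from the end of the array) are standard Perl but not mentioned on the slide.

```perl
use strict;
use warnings;

my @home = ( "couch", "chair", "table", "stove" );

my ( $first, $second ) = @home;    # the extra elements are simply ignored
my $third = $home[2];              # subscripts start at 0, so [2] is the third element
my $last  = $home[-1];             # negative subscripts count from the end
my $count = scalar @home;          # four elements, numbered 0 to 3

$home[1] = "sofa";                 # assign to a single element

print "$first $second $third $last $count @home\n";
```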
Variables Types - Arrays

 Examples:
1: @stuff = ("one", "two", "three");
2: $stuff = ("one", "two", "three");

3: @stuff = ("one", "two", "three");


$stuff = @stuff;

4: @x = (@stuff,@nonsense,funkshun())

5: @releases = (
"alpha",
"beta",
"gamma",
);

6: @froots = qw(
apple banana carambola
coconut guava kumquat
mandarin nectarine peach
pear persimmon plum
);

Review: an array variable is able to store a series of values with each uniquely
identified by an integer known as its index. The contents of an array are accessed
collectively by giving the array name prefixed by an @.

@dwarfs = ("Happy", "Sleepy", "Grumpy", "Dopey", "Sneezy", "Bashful", "Doc");
@deadly_sins = ("Gluttony", "Sloth", "Anger", "Envy", "Lust", "Greed", "Pride");
print "@dwarfs never commit @deadly_sins\n";

In the examples shown:

1: The array contains three items.


2: What does $stuff contain ?
3: What does $stuff contain ?
4: What does @x contain ?
5: What does that last “,” do ?
6: But look, we can do away with “,” entirely as long as the list items do not contain
white-space.
List Assignment

 Examples:

1: my ($a, $b, $c) = (1, 2, 3);

2: ($map{red}, $map{green}, $map{blue}) = (0xff0000, 0x00ff00, 0x0000ff);

3: my ($dev, $ino, undef, undef, $uid, $gid) = stat($file);

4: my ($a, $b, @rest) = split;


my ($a, $b, %rest) = @arg_list;

5: while (($login, $password) = getpwent) {


if (crypt($login, $password) eq $password) {
print "$login has an insecure password!\n";
}
}

@days + 0; # implicitly force @days into a scalar context


scalar(@days) # explicitly force @days into a scalar context

1: Parallel assignment of three scalars.

2: Parallel assignment of three scalars - which are values in a hash.

3: If you don’t want some of the things returned in a list, throw them away by
undef’ing them.

4: Here we take $a and $b from the list and then the rest of the list goes into @rest.
Here’s an important principle - the first array or hash in the list gets everything
else in the list! In the next example $a and $b get the first two values from
@arg_list and then the hash %rest gets everything else. There’s an issue here
concerning how many items are left in the list before it’s assigned to the hash %rest -
the number of remaining items needs to be a multiple of 2.

The last two examples show how you can force things into scalar context - the
scalar() function is one way.
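
The rules for who gets what in a list assignment are easiest to see with concrete data. A sketch with a made-up argument list (the file names and option names are hypothetical):

```perl
use strict;
use warnings;

my @arg_list = ( "in.txt", "out.txt", verbose => 1, depth => 3 );

# The first two values go to the scalars; the hash takes the rest,
# which must be an even number of items (key, value, key, value).
my ( $source, $target, %option ) = @arg_list;

# undef in the left-hand list discards the value in that position.
my ( $keep, undef, @others ) = ( 10, 20, 30, 40 );

# An array placed first swallows everything, so $starved stays undefined.
my ( @greedy, $starved ) = ( 1, 2, 3 );

my $n_greedy = scalar(@greedy);    # explicit scalar context gives the count

print "$source $target $option{verbose} $option{depth} $keep @others $n_greedy\n";
```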
List And Array Examples

 Examples:

# Stat returns list value.


$modification_time = (stat($file))[9];

# SYNTAX ERROR HERE.


$modification_time = stat($file)[9]; # OOPS, FORGOT ()

# Find a hex digit.


$hexdigit = ('a','b','c','d','e','f')[$digit-10];

# Get multiple values as a slice.


($day, $month, $year) = (localtime)[3,4,5];

Note: arrays grow dynamically, so you can have a 4 element array like this:
my @list = qw( fred barney wilma betty );
and say this:
$list[656] = "dino";
And Perl will create all the intervening array slots for you (they will all have the value
undef).

If you create a big array and you’d later like to delete it (to save on memory perhaps)
then you can do this:
my @big_array = (); # create the array
@big_array = <SOME_FILE>; # load a ton of stuff into it
undef @big_array; # delete the array

If you want to remove all the entries in an array without undef’ing it and then
recreating it, then just do this:

@my_array = ();

The same works for hashes as well - to empty a hash just do this:

%my_hash = ();
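
The sketch below puts the growing and emptying behaviour in one place, and also shows a common slip: assigning undef as a value does not empty an array. The array names are illustrative, and a small index is used instead of 656 to keep the example short.

```perl
use strict;
use warnings;

my @list = qw( fred barney wilma betty );
$list[6] = "dino";                      # slots 4 and 5 are created holding undef

my $length      = scalar @list;         # 7 elements now
my $gap_defined = defined $list[5] ? 1 : 0;

@list = ();                             # empty the array but keep the variable
my $after_clear = scalar @list;

my @big_array = ( 1 .. 1000 );
undef @big_array;                       # empty it and release its storage
my $after_undef = scalar @big_array;

# Assigning undef as a list value does NOT empty the array:
my @oops = ( 1, 2, 3 );
@oops = (undef);                        # one element, whose value is undef
my $oops_length = scalar @oops;

print "$length $gap_defined $after_clear $after_undef $oops_length\n";
```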
Variables Types - Arrays

 Since arrays are ordered you can do useful operations on them such as:
 Stack operations:
 push()
 pop()
 shift()
 unshift()
 shift and unshift work at the start of the list; push and pop work at the end.

 Example:
@home = ( "go", "where" , "no", "one" , "has" , "gone" );

push( @home , "before" );


unshift( @home , "boldly" );
unshift( @home , "To" );

$first = shift( @home );


$last = pop( @home );
print "First = $first and last = $last\n";

Perl regards an array as an ordered list. The end of the array (i.e. the right-hand part
of the list) is considered the top of the stack. push() and pop() work on the top of
the stack.

shift() and unshift() work on the other end of the stack. shift() takes one element
from the start of a list, unshift puts a new element at the start of the list.

What do you think is printed on the last line of the example?

What does the list @home contain once the example has been run?
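
If you want to check your reasoning before answering, here is the same set of operations on a smaller, made-up array (@queue is not part of the labs), with the contents after each step shown in the comments:

```perl
use strict;
use warnings;

my @queue = ( "b", "c" );

push( @queue, "d" );               # add at the end:    b c d
unshift( @queue, "a" );            # add at the start:  a b c d

my $from_end   = pop(@queue);      # removes "d":       a b c
my $from_start = shift(@queue);    # removes "a":       b c

# push() and unshift() accept several values at once.
push( @queue, "x", "y" );          # b c x y

print "$from_start $from_end @queue\n";
```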
How Do I … Specify A List In A Program?

 You want to include a list in your program.

# A comma separated list
@a = ("quick", "brown", "fox");

# Use qw() if you have a lot of single-word elements
@a = qw( Why are you bugging me? );

# Use something like this if you want to read a list from a file
@bigarray = ();
open(DATA, "< mydatafile")
    or die "Couldn't read from datafile: $!\n";
while (<DATA>) {
    chomp;
    push(@bigarray, $_);
}

# Use the quoting operators. These two lines are equivalent.
# q() is the same as single quotes
$banner = 'The Mines of Moria';
$banner = q(The Mines of Moria);

# Use the quoting operators. These lines are equivalent.
# qq() is the same as double quotes
$name = "Gandalf";
$banner = "Speak, $name, and enter!";
$banner = qq(Speak, $name, and welcome!);

More info: See The Perl Cookbook, section 4.1 Page 91.
How Do I … Specify A List In A Program?

 You want to include a list in your program.

# Backticks
$his_host = 'www.perl.com';
$host_info = `nslookup $his_host`; # expand Perl variable

# qx()
$perl_info = qx(ps $$); # that's Perl's $$
$shell_info = qx'ps $$'; # that's the new shell's $$

# These 3 are identical
@banner = ('Costs', 'only', '$4.95');
@banner = qw(Costs only $4.95);
@banner = split(' ', 'Costs only $4.95');

# Different quoting character
@banner = qw|The vertical bar (\|) looks and behaves like a pipe.|;

More info: See The Perl Cookbook, section 4.1 Page 91.

qx() and backticks behave the same way: both interpolate Perl variables before the
command is run. If you don’t want Perl variables to be expanded then
you can use a single-quote delimiter on qx() to stop this.

q(), qq() and qx() quote single strings. qw() quotes a list of single word strings
by splitting its argument on whitespace without variable interpolation.

If you don’t want to change the quoting character, use a backslash to escape the
delimiter in the string.
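
The quoting operators that do not run external commands can be compared directly. A sketch with made-up strings; qx() is left out because its result depends on the machine it runs on.

```perl
use strict;
use warnings;

my $name = "Gandalf";

my $plain  = q(Speak, $name);            # like single quotes: no interpolation
my $filled = qq(Speak, $name);           # like double quotes: interpolates
my @words  = qw( Costs only $4.95 );     # list of words, no interpolation

# Any delimiter works; escape it with a backslash if it occurs inside.
my $piped  = q|left \| right|;
my $nested = q(balanced (brackets) need no escape);

print "$plain\n$filled\n@words\n$piped\n$nested\n";
```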
How Do I … Change The Size Of an Array?

 You want to enlarge or truncate an array.

# grow or shrink @ARRAY
$#ARRAY = $NEW_LAST_ELEMENT_INDEX_NUMBER;

Solution: Assign to $#ARRAY, or assign to an element past the current end:

$ARRAY[$NEW_LAST_ELEMENT_INDEX_NUMBER] = $VALUE;

$#ARRAY is the index of the last element in @ARRAY.

If you assign it a number smaller than its current value then the array is truncated.
Truncated elements are lost.

If you assign it a number bigger than its current value then the array grows. All new
elements have the value undef.

$#ARRAY is not equal to @ARRAY (or scalar( @ARRAY) ). Indexes start at 0, so the number
of elements is $#ARRAY + 1.

More info: See The Perl Cookbook, section 4.3 Page 95.
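A small sketch of growing and shrinking an array, using an invented @people array for illustration:

```perl
use strict;
use warnings;

my @people = qw(Crosby Stills Nash Young);
print "last index: $#people, count: ", scalar(@people), "\n";  # last index: 3, count: 4

$#people = 1;                   # truncate: only indexes 0 and 1 remain
print "@people\n";              # Crosby Stills

$#people = 3;                   # grow: the new elements 2 and 3 are undef
my $state = defined($people[3]) ? "defined" : "undef";
print "$state\n";               # undef

$people[5] = "Dylan";           # assigning past the end also grows the array
print scalar(@people), "\n";    # 6
```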
How Do I … Swap Values Without Using
A Temporary Variable?
 You want to exchange the values of two variables, but don’t want to use a
temporary variable.

($VAR1, $VAR2) = ($VAR2, $VAR1); Solution

Normally you would do something like this (say in C):

$temp = $a;
$a = $b;
$b = $temp;

You can swap more than two things at a time:

($alpha, $beta, $production) = qw(January March August);
# move beta to alpha,
# move production to beta,
# move alpha to production
($alpha, $beta, $production) = ($beta, $production, $alpha);

More info: See The Perl Cookbook, section 1.3 Page 8.

Most programming languages require you to use a temporary variable when swapping
two variables’ values. Perl however evaluates the whole list on the right-hand side
before it assigns anything to the left, so you won’t accidentally clobber any of your
values. This lets you eliminate the temporary variable.

You can also exchange more than two variables at once.
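As a quick illustration (the variable names here are invented), the same list assignment also swaps array elements when combined with slices:

```perl
use strict;
use warnings;

my ($left, $right) = ("apple", "banana");
($left, $right) = ($right, $left);
print "$left $right\n";          # banana apple

# The same idea swaps two array elements using slices
my @queue = qw(first middle last);
@queue[0, -1] = @queue[-1, 0];
print "@queue\n";                # last middle first
```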


How Do I … Append One Array To Another?

 You want to join two arrays together by adding all the items of one to the end of
the other.
push(@ARRAY1, @ARRAY2); Solution: Use push()

@ARRAY1 = (@ARRAY1, @ARRAY2); Solution: List flattening

@members = ("Time", "Flies");
@initiates = ("An", "Arrow");
push(@members, @initiates);
# @members is now ("Time", "Flies", "An", "Arrow")

Add new elements into a list using splice():

@members = ("Time", "Flies");    # start again from the original list
splice(@members, 2, 0, "Like", @initiates);
print "@members\n";
splice(@members, 0, 1, "Fruit");
splice(@members, -2, 2, "A", "Banana");
print "@members\n";

This is output:
Time Flies Like An Arrow
Fruit Flies Like A Banana

More info: See The Perl Cookbook, section 4.9 Page 108.

push() is optimised for appending one array to another.

If you use list flattening beware that this takes more memory and is slower.

If you want to insert elements of one array into the middle of another, use
splice().

The splice() function:

We’ve already seen push, pop, shift and unshift. They are all special cases of a more
general function called splice(). The splice function takes up to four arguments: an
array to be modified, the index at which it is to be modified, the number of elements
to be removed (starting at the index specified in the previous argument), and a list of
extra elements to be inserted at the index (after the previous elements are removed).
The function returns a list of the elements which are removed.
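The relationship between splice() and the four simpler functions can be sketched like this, using a throwaway @list for illustration:

```perl
use strict;
use warnings;

my @list = qw(b c d);

splice(@list, scalar(@list), 0, "e");   # same as push(@list, "e")
splice(@list, 0, 0, "a");               # same as unshift(@list, "a")
print "@list\n";                        # a b c d e

my ($last)  = splice(@list, -1, 1);     # same as pop(@list)
my ($first) = splice(@list, 0, 1);      # same as shift(@list)
print "$first $last / @list\n";         # a e / b c d

# Replace one element in the middle and capture what was removed
my @removed = splice(@list, 1, 1, "X", "Y");
print "@removed -> @list\n";            # c -> b X Y d
```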
List Flattening

 Contrary to what you might expect:

@virtues = ( "Faith" , "Hope" , ( "Love" , "Charity" ) );

 This doesn’t produce a hierarchical list of three elements where the third element is
itself a two-element list.
 Each element of a list must be a scalar, not another list.
 Above example is actually the same as:
@virtues = ( "Faith" , "Hope" , "Love" , "Charity" );

 It is easy to make a hierarchical list in Perl - see references.
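A minimal sketch of the difference between a flattened list and a nested one. It uses square brackets to create an array reference, which the course covers later under references; the array names are invented.

```perl
use strict;
use warnings;

my @flat   = ("Faith", "Hope", ("Love", "Charity"));   # inner ( ) just flattens
my @nested = ("Faith", "Hope", ["Love", "Charity"]);   # [ ] makes an array reference

print scalar(@flat), "\n";      # 4
print scalar(@nested), "\n";    # 3
print $nested[2][1], "\n";      # Charity
```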

LAB3 - ARRAYS_1
LAB3 - ARRAYS_2
LAB3 - ARRAYS_3
LAB3 - ARRAYS_4
LAB3 - ARRAYS_5
Pick Your Own Quotes

Customary   Generic   Meaning                 Interpolates

''          q//       Literal string          No
""          qq//      Literal string          Yes
``          qx//      Command execution       Yes
()          qw//      Word list               No
//          m//       Pattern match           Yes
s///        s///      Pattern substitution    Yes
y///        tr///     Character translation   No
""          qr//      Regular expression      Yes

$single = q!I said, "You said, 'She said it.'"!;

$double = qq(Can't we get some "good" $variable?);

Some of these forms are syntactic sugar which save you from putting lots of backslashes
in strings (which might be confusing and lead to mistakes).

In the first example we’ve used ! as the quote mark, which means we can freely use "
and ' in the text string we wish to build. We could have used our normal quotes and
escaped the " and ' quotes inside the string, but it would have been very hard to
read.

Any character in a string which might otherwise be interpreted as a controlling
character can always be included in a string by escaping it - i.e. if we want to put a "
in a double-quoted string, we can always do this by writing the " inside the string as
\".

In a double-quoted string, \ followed by a punctuation character is the same as that
character on its own. (A \ followed by a letter or digit may be an escape sequence
instead, such as \n for newline.)


Variables Types - Hashes

 Hashes are arrays accessed by a string.


 Hashes are also called associative arrays.
 push() and pop() and shift() and unshift() have no meaning for hashes.
 A hash has no beginning and no end.
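[Figure: the array @home holds Couch, Chair, Table and Stove in numbered positions; the
hash %longday maps the keys Sun, Mon, Tue, Wed, Thu, Fri and Sat to the values Sunday,
Monday, Tuesday, Wednesday, Thursday, Friday and Saturday, in no particular order.]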
 The % character is used to mark hash names.

Hash keys are not automatically implied by their position. In fact the concept of
position has no meaning for a hash. (And as we will see later, this means that you
can’t walk through a hash by index. To loop over all the things in the hash you use
foreach over keys(), or each(), and the order you get is not the insertion order).

You must supply a key as well as a value when populating a hash.

You can assign a list to a hash (just like an array) but pairs of items from the list will
be interpreted as key/value pairs in the hash. So you can say this:

@list = ( "Sat" , "Saturday" , "Sun" , "Sunday" ,    # etc
          "Fri" , "Friday" );
%hash = @list;

This is the same as:

%hash = ( "Sat" => "Saturday" , "Sun" => "Sunday" ,    # etc
          "Fri" => "Friday" );
Variables Types - Hashes

 %longday could be declared like this:

%longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday",


"Wed", "Wednesday", "Thu", "Thursday", "Fri",
"Friday", "Sat", "Saturday");

 This is hard to read, so Perl provides => as an alternative to the comma.

%longday = (
"Sun" => "Sunday",
"Mon" => "Monday",
"Tue" => "Tuesday",
"Wed" => "Wednesday",
"Thu" => "Thursday",
"Fri" => "Friday",
"Sat" => "Saturday",
);

As in the example from the previous slide - suppose you wanted to translate
abbreviated days names to their corresponding full names. You could write the list
assignment as shown in the top box.

This is visually noisy, so Perl provides the => operator (a synonym for the comma) so
that with a bit of creative formatting the same statement can be written as shown in
the second example.

Remember - Hashes have no order to them - all accessing is done via the keys. Do
not expect foreach over keys() or values() to visit a hash in the order it was written.
Variables Types - Hashes

 Hashes are still an array full of scalars.


 Select an individual hash element using { and }.
 Example - the value associated with “Wed” in our example is:
$longday{ "Wed" };
 Note we’re dealing with a scalar value so there’s a $ on the front, not a %.
 Example:
 Suppose we have a hash called %wife.

$wife{"Tony"} = "Cherie";

$           A scalar so $
wife        The name of the hash
{ }         Since this is a hash we need { and }
"Tony"      The key
"Cherie"    The value

You can assign a list to a hash - see our previous examples - each pair of items in the
list is taken as (respectively) a key and a value.

You can assign a hash to a list. If you do then it’ll convert the hash into a list of
key/value pairs.

Often we use the keys() function to extract a list of just the keys. This list will
also be unordered (the respective keys won’t be in a list in the same order that they
were entered into the original hash) but can be sorted using sort().

Remember a single element of a hash is still a scalar - so it is always prefixed by a $
and not a %. The % refers to the whole hash and not to individual elements. You also
need to use “{“ and “}”.

It is generally true that things don’t come back out of a hash in the same order that
they go in (if say, you get all the keys back out with the keys() function). Do not try
to use push(), pop(), shift() or unshift() with hashes. They don’t work -
remember, position in a hash has no meaning.
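A short sketch of the points above, using a cut-down %longday: sort the keys when you need a predictable order, and note what assigning a hash to a list gives you.

```perl
use strict;
use warnings;

my %longday = (Sun => "Sunday", Mon => "Monday", Tue => "Tuesday");

# keys() comes back in no useful order, so sort before printing
my @sorted = sort keys %longday;
foreach my $abbr (@sorted) {
    print "$abbr => $longday{$abbr}\n";
}

# Assigning a hash to a list gives a flat list of key/value pairs
my @pairs = %longday;
print scalar(@pairs), "\n";    # 6
```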
Functions Which Work With Hashes

 A limited set of functions work with Perl hashes:


 keys List all the keys in a hash
 values List all the values in a hash
 each Used to iterate key/value pairs
 exists Tells you whether a hash key exists
 delete Deletes a hash key/value pair.

my @keys = keys( %my_hash );


my @values = values( %my_hash );

while ( my ( $key , $value ) = each %my_hash )


{
print $key . " " . $value . "\n";
}

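In the same way that an array can be thrown away completely with undef, so can a
hash. So to delete a hash, do this:

undef %my_hash;

(Do not write %my_hash = undef; - that assigns a one-element list to the hash and
leaves it holding a single empty-string key.)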

If however you just want to remove all the entries in the hash without undef’ing it
and then recreating it, then just do this:

%my_hash = ();

i.e. assign the empty list to the hash.
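As a sketch with an invented three-entry hash, delete removes one pair while assigning the empty list removes them all:

```perl
use strict;
use warnings;

my %my_hash = (a => 1, b => 2, c => 3);

delete $my_hash{b};                     # remove one key/value pair
my $remaining = join(",", sort keys %my_hash);
print "$remaining\n";                   # a,c

%my_hash = ();                          # empty the hash
my $count = keys %my_hash;
print "$count\n";                       # 0
```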


How Do I … Create A Hash?

 You want to create and populate a hash with key/value pairs.


Solution: A hash can be initialised with a list, where each pair of values in the list
is interpreted as a key/value pair.

%age = ( "Nat", 24,
         "Jules", 25,
         "Josh", 17 );

This is the same as above:

$age{"Nat"} = 24;
$age{"Jules"} = 25;
$age{"Josh"} = 17;

You can also use the => operator to initialise a hash like this:

%food_color = (
    "Apple" => "red",
    "Banana" => "yellow",
    "Lemon" => "yellow",
    "Carrot" => "orange"
);

The => operator automatically quotes a single word on its left, so you can omit the
quotes on the keys:

%food_color = (
    Apple => "red",
    Banana => "yellow",
    Lemon => "yellow",
    Carrot => "orange"
);

More info: See The Perl Cookbook, section 5.0 Page 129.

Solution: assign a list of pairs of items to the hash. You can also use the => operator
to do the same thing - it is visually easier to see what is happening and where the
key/value pairs are located in the list.

Using => will automatically quote a single word on its left.

Single-word hash keys are also automatically quoted, so you can write
$hash{"somekey"} as $hash{somekey}.

Hashes are stored in an order which is convenient for the implementation of hashes,
which means that the extraction order is not the same as the insertion order.
How Do I … Add An Element To A Hash?

 You need to add an element to a hash.


$HASH{$KEY} = $VALUE; Solution: Simply add a new entry like this

More info: See The Perl Cookbook, section 5.1 Page 130.

Solving this problem is easy - just add any new entry as shown. Perl will take care of
all memory management for you, and just as with arrays and lists, you don’t need to
worry about overflow.

If you use undef as a hash key it will be turned into the empty string “”.

If you try to get a value for a key which isn’t in the hash you’ll also get undef, so you
can’t simply use if $hash{key} to see if a key exists. You need to use
exists($hash{key}) to test whether the key is in the hash,
defined($hash{key}) to see if it is or is not undef, and if($hash{key}) to
test it for true or false.
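The three tests give different answers for different kinds of entry. The sketch below uses an invented %age hash with a zero value, an undef value and a missing key to show each case.

```perl
use strict;
use warnings;

my %age = (Nat => 24, Baby => 0, Unknown => undef);

my @lines;
foreach my $name (qw(Nat Baby Unknown Nobody)) {
    my $exists  = exists($age{$name})  ? "yes" : "no";
    my $defined = defined($age{$name}) ? "yes" : "no";
    my $true    = $age{$name}          ? "yes" : "no";
    push(@lines, "$name: exists=$exists defined=$defined true=$true");
}
print "$_\n" foreach @lines;
# Nat: exists=yes defined=yes true=yes
# Baby: exists=yes defined=yes true=no
# Unknown: exists=yes defined=no true=no
# Nobody: exists=no defined=no true=no
```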
Hashes

 Remember - a hash is just an array where things are looked up by name.


 If you assign a list to a hash - pairs of items become key/value associations.

%map = ('red',0xff0000,'green',0x00ff00,'blue',0x0000ff);

%map = (); # clear the hash first


$map{red} = 0xff0000;
$map{green} = 0x00ff00;
$map{blue} = 0x0000ff;

%map = ( red => 0xff0000, green => 0x00ff00, blue => 0x0000ff,
);

$field = radio_group(
NAME => 'animals',
VALUES => ['camel', 'llama', 'ram', 'wolf'],
DEFAULT => 'camel',
LINEBREAK => 'true',
LABELS => \%animal_names,
);

The => operator has the nice side effect of quoting anything on its left, so we can
leave the quotes off red, green, blue in the third example. The value on the right of
=> will still need quotes if it is a character string.

The last example uses named parameters to invoke complex functions.

When the hash is initialized, the pairs are written in some order. The values generally
don’t come back out in the order they went in.

You can’t use scalar( %hash ) (or even use %hash in scalar context) to find out how
many things are in the hash. If you want to know that, use:

scalar( keys( %hash ) ); or scalar( values( %hash ) );
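For example, a minimal sketch with the colour map from the slide:

```perl
use strict;
use warnings;

my %map = (red => 0xff0000, green => 0x00ff00, blue => 0x0000ff);

my $count = keys %map;      # keys() in scalar context returns the number of keys
printf "%d colours, red is %06x\n", $count, $map{red};   # 3 colours, red is ff0000
```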

LAB4 - HASH_1
LAB4 - HASH_2
LAB4 - HASH_3
LAB4 - HASH_4
LAB4 - HASH_5
Array And Hash Slices

 Slicing an array:

print $tragedy[3] , $tragedy[4] , $tragedy[5];

print @tragedy[3,4,5];

These are equivalent. Note: [ and ]
 Slicing a hash:

print ($sound{cat} , $sound{goldfish} , $sound{dog} , $sound{whale} );

print @sound{ "cat" , "goldfish" , "dog" , "whale" };

These are equivalent. Note: { and }

Slicing an array:

The things in the array slice are not copies - they are the same elements. So
assigning to the array slice is also assigning to the original array elements. (The same
is also true for a hash slice).

The slice is a list (hence the @) and the brackets are [ and ].

Slicing a hash:

The values() function returns hash values in an apparently random order, so to create
a list of values from a hash with a specific order we often have to do something
similar to what is shown in the example. Instead of putting a single key in the curly
braces, we put a list of keys in the curly braces.

The slice is a list (hence the @ and NOT a $ or a %) and the brackets are { and }.
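Because slices refer to the original elements, they can appear on the left of an assignment too. A small sketch, with invented contents for @tragedy and %sound:

```perl
use strict;
use warnings;

my @tragedy = qw(Hamlet Macbeth Othello Lear Coriolanus);
my %sound   = (cat => "meow", dog => "woof");

# Assigning to an array slice changes the original elements
@tragedy[0, 1] = @tragedy[1, 0];
print "@tragedy[0..2]\n";           # Macbeth Hamlet Othello

# A hash slice can set several keys at once ...
@sound{qw(cow duck)} = qw(moo quack);

# ... and fetch values in exactly the order you ask for them
my @ordered = @sound{qw(cat cow dog duck)};
print "@ordered\n";                 # meow moo woof quack
```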
Scalar And List Context

 Examples:

funkshun() should always figure out what it is supposed to return:

$x = funkshun(); # scalar context
$x[1] = funkshun(); # scalar context
$x{"ray"} = funkshun(); # scalar context

@x = funkshun(); # list context
@x[1] = funkshun(); # list context
@x{"ray"} = funkshun(); # list context
%x = funkshun(); # list context

($x,$y,$z) = funkshun(); # list context


($x) = funkshun(); # list context

my $x = funkshun(); # scalar context


my @x = funkshun(); # list context
my %x = funkshun(); # list context
my ($x) = funkshun(); # list context

The first three examples are all evaluated in scalar context.

The second set of examples are all evaluated in list context - even if the assignment
only picks out a single value from such a list.

The rules don’t change when using my to force ourselves to declare variables.

A well designed function can figure out what context it’s been called in (using
wantarray) and return what is appropriate.

The wantarray function is used like this:

if (wantarray)
{
return @an_array;
}
else
{
return $a_scalar;
}
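A complete, runnable sketch of the same idea follows. The body of funkshun() is invented for illustration: it returns a list of colours in list context and the number of colours in scalar context.

```perl
use strict;
use warnings;

# Hypothetical function: the whole list in list context,
# just the number of items in scalar context.
sub funkshun {
    my @items = qw(red green blue);
    return wantarray ? @items : scalar(@items);
}

my @all     = funkshun();    # list context
my $count   = funkshun();    # scalar context
my ($first) = funkshun();    # list context, only the first element is kept

print "@all\n";      # red green blue
print "$count\n";    # 3
print "$first\n";    # red
```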
Variables Types - Simple Data Structures

 Arrays and Hashes are simple, flat data structures.


 How do we build more complex data structures?
 Here’s the wrong way and the right way to do it:

$wife{"Jacob"} = ("Leah", "Rachel", "Bilhah", "Zilpah"); # WRONG

$wife{"Jacob"} = ["Leah", "Rachel", "Bilhah", "Zilpah"]; # RIGHT

 Once this is done you can refer to individual elements like this:

$wife{"Jacob"}[0] = "Leah";
$wife{"Jacob"}[1] = "Rachel";
$wife{"Jacob"}[2] = "Bilhah";
$wife{"Jacob"}[3] = "Zilpah";

Sometimes you need to build not-so-lovely and not-so-simple data structures. Perl lets
you do this by pretending that complicated values are really simple ones.

We want $wife{"Jacob"} to refer to a single thing (it’s a scalar) so it must hold a
Perl reference, and a reference to a list is created using [ and ] and not ( and ). We
are telling Perl to pretend that a whole list is in fact a scalar. The statement creates
an anonymous array (i.e. an array without a name) and puts a reference to it into
the hash element $wife{"Jacob"}. This is how Perl deals with both multi-dimensional
arrays and nested data structures. (The WRONG version stores only "Zilpah", the last
item in the list.)

You can see in the second example how this looks like a multi-dimensional array with
one string subscript and one numeric subscript.

We’ll discuss this in more detail tomorrow … This example (and the one on the
following page) are here to demonstrate that making complex data structures is easy.
Variables Types - Simple Data Structures

 Example:
$kids_of_wife{"Jacob"} = {
"Leah" => ["Reuben","Simeon","Levi","Judah","Issachar","Zebulun"],
"Rachel" => ["Joseph","Benjamin"],
"Bilhah" => ["Dan","Naphtali"],
"Zilpah" => ["Gad","Asher"],
};

$kids_of_wife{"Jacob"}{"Leah"}[0] = "Reuben";
$kids_of_wife{"Jacob"}{"Leah"}[1] = "Simeon";
$kids_of_wife{"Jacob"}{"Leah"}[2] = "Levi";
$kids_of_wife{"Jacob"}{"Leah"}[3] = "Judah";
$kids_of_wife{"Jacob"}{"Leah"}[4] = "Issachar";
$kids_of_wife{"Jacob"}{"Leah"}[5] = "Zebulun";
$kids_of_wife{"Jacob"}{"Rachel"}[0] = "Joseph";
$kids_of_wife{"Jacob"}{"Rachel"}[1] = "Benjamin";
$kids_of_wife{"Jacob"}{"Bilhah"}[0] = "Dan";
$kids_of_wife{"Jacob"}{"Bilhah"}[1] = "Naphtali";
$kids_of_wife{"Jacob"}{"Zilpah"}[0] = "Gad";
$kids_of_wife{"Jacob"}{"Zilpah"}[1] = "Asher";

Suppose we not only wanted to know the names of Jacob’s wives, but also the names
of all sons of all his wives.

In this case we want to treat a hash as a scalar - we use { and } for that. Now we
have an array in a hash in a hash.

Adding another level to a nested data structure is like adding another dimension to a
multi-dimensional array. The important point is that Perl lets you pretend that
something which is complex is a simple scalar.

Perl’s whole object oriented structure is built upon this kind of encapsulation.

Again, we’ll discuss this in detail tomorrow.
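As a preview of that discussion, here is a minimal sketch of walking such a structure. It uses a shortened copy of the %kids_of_wife data, and the %{ ... } and @{ ... } dereferencing syntax which has not been introduced yet.

```perl
use strict;
use warnings;

my %kids_of_wife;
$kids_of_wife{"Jacob"} = {
    "Rachel" => ["Joseph", "Benjamin"],
    "Bilhah" => ["Dan", "Naphtali"],
};

# %{ ... } and @{ ... } turn a reference back into a hash or an array
my @report;
foreach my $wife (sort keys %{ $kids_of_wife{"Jacob"} }) {
    my @kids = @{ $kids_of_wife{"Jacob"}{$wife} };
    push(@report, "$wife: @kids");
}
print "$_\n" foreach @report;
# Bilhah: Dan Naphtali
# Rachel: Joseph Benjamin
```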


Variable Types - Packages

 Why use packages?


 Use other people’s code.
 Lets us split up our own code into manageable units.
 Is the basis for the whole of Perl’s OO system.
 Ensures that our code (subroutine & variable names) does not clash with imported
code.

# This file is Matrix.pm

sub print_me
{
# Code to print out a matrix
}

# This file is Solve.pm
# This is our code
...
use Matrix;
...
sub print_me
{
# Code to print out an equation
}

Packages are a way of splitting up your code. They are roughly equivalent to
C/Spice/Verilog .include statements.

Suppose we pick up Matrix.pm from somewhere - it has a subroutine called print_me.
We import Matrix.pm, and we also have a subroutine called print_me which does
something completely different. When we want to call print_me, which subroutine do
we call?
Variables Types - Packages

 Suppose you want to talk about matrices.


 You would start off by saying this in Matrix.pm:

package Matrix;

 The effect of this is that from this point onwards any global name in Matrix.pm will
be prefixed by Matrix::
 So if you say:

package Matrix;
$result = &print_me();

 Then the real name of $result is $Matrix::result and the real name of
&print_me() is &Matrix::print_me()

In computer-science, and in Perl, each of these packages establishes a “namespace”.


You can have as many namespaces as you want but you’re only ever in one at a time.

If we don’t use a package declaration in our program then the default package is main
(prefix “main::”). This means that the previous example will work since print_me() in
Matrix.pm is really &Matrix::print_me() while print_me in Solve.pm is really
&main::print_me(). {We would be better off in Solve.pm using a declaration like
package Solve; - what would the &print_me() subroutine be called then?}

Code which is brought into a program like this with a use command, is also called a
module. The standard is to name the module with the same name as the package it
contains (but with an initial uppercase letter) and with a .pm filename suffix. Thus the
code for package Matrix; would be contained in a file called Matrix.pm

The nice thing about Perl is that there are a *lot* of packages “out there” that you
can use to solve all sorts of problems.
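To see namespaces at work without creating separate files, the sketch below puts both packages in one script. This is only for illustration (normally each package would live in its own .pm file), and the subroutine bodies are invented.

```perl
use strict;
use warnings;

package Matrix;
sub print_me { return "printing a matrix" }

package Solve;
sub print_me { return "printing an equation" }

package main;

# The same short name lives in two namespaces without clashing
my $from_matrix = Matrix::print_me();
my $from_solve  = Solve::print_me();

print "$from_matrix\n";    # printing a matrix
print "$from_solve\n";     # printing an equation
```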
Variables Types - Pragmas

 In the previous section we used the “use” command to load in some new code (a
module).
 Some of the built-in modules in Perl don’t add code. Rather they change the way
that the language behaves.
 These special modules are called pragmas.
 Example:
use strict;

Pragmas change the way the language works. In the example shown, it tightens up
on some of the rules which Perl uses by default and requires the programmer to be
explicit. This example would require that you declare all your variable names before
you use them - this is usually a good thing - see the section on style in about five
minutes time.
How Do I … Round Floating-Point Numbers?

 You want to round a floating-point number to a certain number of decimal places.

General solution - use sprintf (or printf):

$rounded = sprintf("%FORMATf", $unrounded);

$a = 0.255;
$b = sprintf("%.2f", $a);
print "Unrounded: $a\nRounded: $b\n";
printf "Unrounded: $a\nRounded: %.2f\n", $a;

Unrounded: 0.255
Rounded: 0.26
Unrounded: 0.255
Rounded: 0.26

More info: See The Perl Cookbook, section 2.4 Page 46.

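The number in front of the “f” in the sprintf format lets you specify how many decimal
places the argument should be rounded to. Perl rounds to the nearest value with that
many places. Because most decimal fractions cannot be stored exactly in binary
floating point, a value which looks as if it ends in 5 can round down as well as up.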
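A short sketch of rounding versus truncating, with an invented $price value. The last line is a reminder that the stored binary value decides the result; its output is described in a comment rather than guaranteed, since it depends on the platform's floating-point format.

```perl
use strict;
use warnings;

my $price = 3.14159;

my $two_places   = sprintf("%.2f", $price);
my $three_places = sprintf("%.3f", $price);
my $truncated    = int($price);         # int() truncates, it does not round

print "$two_places\n";      # 3.14
print "$three_places\n";    # 3.142
print "$truncated\n";       # 3

# 2.675 is typically stored as slightly less than 2.675,
# so on most machines this prints 2.67 rather than 2.68.
printf "%.2f\n", 2.675;
```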
How Do I … Compare Floating-Point Numbers?

 You want to compare floating-point numbers to know if they’re equal to a certain
level of significance.
# equal(NUM1, NUM2, ACCURACY) : returns true if NUM1 and NUM2 are
# equal to ACCURACY significant digits (%g counts significant digits;
# use "%.${dp}f" instead to compare a number of decimal places)

sub equal {
    my ($A, $B, $dp) = @_;

    return sprintf("%.${dp}g", $A) eq sprintf("%.${dp}g", $B);
}

More info: See The Perl Cookbook, section 2.2 Page 45.

Floating-point arithmetic isn’t precise so you should never do a direct comparison
using “==“. The solution is to turn the floating-point numbers into strings using
sprintf and then compare those strings.

Alternatively use a large multiplier on both numbers (like 1000000), turn that result
into an integer and then use “==“, but this demands that you have some idea of the
magnitude of the numbers before you start. If the number of decimal places is fixed
this makes the latter solution easier.
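Another common technique, not the one from the Cookbook recipe above, is to check that the difference between the two numbers is smaller than a tolerance you choose. The helper name nearly_equal and the tolerance 1e-9 are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $x and $y differ by less than $epsilon
sub nearly_equal {
    my ($x, $y, $epsilon) = @_;
    return abs($x - $y) < $epsilon;
}

my $sum = 0.1 + 0.2;

my $direct   = ($sum == 0.3)                  ? "equal" : "not equal";
my $tolerant = nearly_equal($sum, 0.3, 1e-9)  ? "equal" : "not equal";

print "direct: $direct\n";        # direct: not equal
print "tolerant: $tolerant\n";    # tolerant: equal
```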
How Do I … Convert Binary And Decimal Numbers?

 You have an integer whose binary representation you would like to print out, or a
binary number which you would like to print as an integer.
sub dec2bin {
my $str = unpack("B32", pack("N", shift));
$str =~ s/^0+(?=\d)//; # otherwise you'll get leading zeros
return $str;
}

sub bin2dec {
return unpack("N", pack("B32", substr("0" x 32 . shift, -32)));
}

$num = bin2dec('0110110'); # $num is 54


$binstr = dec2bin(54); # $binstr is 110110

More info: See The Perl Cookbook, section 2.3 Page 48.

The versions shown use pack and unpack for manipulating strings of data. Both the pack
and unpack functions take a template argument which specifies what they should do
with their other arguments. In Perl 5.6 and later you can also use sprintf("%b", $num)
to print in binary and oct("0b$binstr") to read a binary string; the pack/unpack
versions work on older Perls too.
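In Perl 5.6 and later, sprintf accepts a %b format and oct() understands a 0b prefix, so the two conversions can be sketched more briefly. This is an alternative to the pack/unpack versions on the slide, not a replacement for them.

```perl
use strict;
use warnings;

my $binstr = sprintf("%b", 54);     # decimal to binary string
my $num    = oct("0b0110110");      # binary string to decimal

print "$binstr\n";    # 110110
print "$num\n";       # 54
```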
How Do I … Control Case?

 A string in uppercase needs converting to lowercase, or vice-versa.


use locale;             # needed in 5.004 or above; obey the language environment

# Use functions
$big = uc($little);     # "bo peep" -> "BO PEEP"
$little = lc($big);     # "JOHN" -> "john"

# Use string escapes. Note: uppercase U & L transform the whole word
$big = "\U$little";     # "bo peep" -> "BO PEEP"
$little = "\L$big";     # "JOHN" -> "john"

# Use string escapes. Note: lowercase u & l transform just the first letter of a word
$big = "\u$little";     # "bo" -> "Bo"
$little = "\l$big";     # "BoPeep" -> "boPeep"

# You can do case insensitive string comparisons like this:

if (uc($a) eq uc($b)) {
print "a and b are the same\n";
}

More info: See The Perl Cookbook, section 1.9 Page 19.

The two ways of doing the conversions (functions and string escapes) look different,
but do the same thing. You can set the case of either the first character or the whole
word.

The use locale directive tells the Perl case conversion functions and pattern matching
engine to respect your language environment, allowing for languages with umlauts,
accent marks, cedillas and other diacritics used in many languages.

You can also use the case conversion functions and pattern matching to do case
insensitive string comparisons.
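The functions ucfirst() and lcfirst() do the same job as the \u and \l escapes. A small sketch with invented strings, including a case-insensitive comparison:

```perl
use strict;
use warnings;

my $name = "bo peep";

my $first_only = ucfirst($name);                                    # same as "\u$name"
my $each_word  = join(" ", map { ucfirst($_) } split(/ /, $name));  # capitalise every word

print "$first_only\n";    # Bo peep
print "$each_word\n";     # Bo Peep

# Case-insensitive comparison: fold both sides the same way first
my ($x, $y) = ("Perl", "PERL");
my $same = (lc($x) eq lc($y)) ? "same" : "different";
print "$same\n";          # same
```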
How Do I … Find Out Today’s Date?

 You need to find out the year, month and day values for today’s date.

($day, $month, $year) = (localtime)[3,4,5];


printf("The current date is %04d %02d %02d\n", $year+1900, $month+1, $day);
# prints - The current date is 2005 08 08

# Could also have been written - ($day, $month, $year) = (localtime)[3..5];

use Time::localtime;
$tm = localtime;
($DAY, $MONTH, $YEAR) = ($tm->mday, $tm->mon, $tm->year);

This is an object-oriented version of localtime().

More info: See The Perl Cookbook, section 3.1 Page 73.

Solution - use localtime() and extract the information you want from the list it
returns.

Or, use Time::localtime which overrides localtime() to return a Time::tm object. You
can then use the method calls of that object to get the values you want. As with the
list form, mon counts from 0 and year counts from 1900.
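If all you need is a formatted date string, the strftime function from the standard POSIX module is another option. This is a different technique from the two shown on the slide, and the format string is just one possible choice.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# strftime takes a format followed by the list that localtime returns
my $today = strftime("%Y-%m-%d", localtime);
print "The current date is $today\n";
```

strftime handles the +1900 and +1 adjustments for you.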
Style, File Handles & Operators

Running Perl Programs And Scripts

 If you’re doing something simple - this will work:

% perl -e 'print "Hello World!\n";'

 For longer scripts put the code into a file and say this:

% perl grading

 The most convenient way is to make the file executable and ensure this line is at
the top of the file:

 #!/usr/local/bin/perl -w

% grading

% at the start of the following lines is the Unix shell prompt.

% perl -e : You’re basically trying to cram everything onto one line.


% perl grading : Feed the program explicitly to Perl.
% grading : Let the shell call Perl to run the script.

Useful tip - never just use this at the top of your file to invoke Perl:

#!/usr/local/bin/perl

But rather use this instead:

#!/usr/local/bin/perl -w

This will turn on lots of warning messages.


Good Programming Practice
#!/usr/local/bin/perl -w

use lib "/a/unix/path/to/my/Perl/Modules";

# Pull in some modules

use strict;
use Netlist_Functions;

# Define a constant

use constant PI => 3.141592653589793;

# Create some variables

my @args = ();
my $flag = 1;    # true (Perl has no built-in TRUE, and strict rejects the bare word)

# ALL YOUR PROGRAM CODE GOES HERE

exit 0;

# Put all your subroutines here

A more extensive version of this template can be found in the tutorial area and in
your notes.

Note: Once you “use strict;” all your variables will have to be declared like this:

my $variable;

Or

my $variable = 56;

You’ll get compile time errors if you use a variable without declaring it. With -w,
Perl will also warn about global names which appear only once (usually a typo).

For any programs other than one-liners, ALWAYS use a methodology like this - it will
save you lots of time in debugging applications. We’ll talk more about strict later.
Style Guidelines

 See the separate document provided with the course notes.


 Here’s a brief summary:
 Enable warnings with “#!/usr/local/bin/perl -w” or use warnings;
 Use “use strict;”
 Use “==” for numeric tests and eq for string tests.
 Don’t confuse “==” and “=”.
 Don’t confuse “=” and “=~”.
 Use a consistent indent when writing code.
 Use consistent bracket matching.
 Never, ever use “goto”.
 Don’t use printf when print will do - which is nearly always.
 Use comments - lots of comments.
 Document your code.

Note that there’s a complete style guide included in the course notes. There’s also a
separate style presentation later in the course.
Filehandles

 A filehandle is a name given to a file, device, socket or pipe.


 Filehandles hide the complexity of buffering from your program.
 They also provide a symbolic name.
 You create a filehandle using the open() function.
 Open() needs two parameters:
 The filehandle.
 A filename.

 STDIN, STDOUT and STDERR are predefined for you.

 You also need to specify the behavior of the open() function.

Filehandles

 Using open()
open(SESAME, "filename");               # read from existing file
open(SESAME, "<filename");              # (same thing, explicitly)
open(SESAME, ">filename");              # create file and write to it
open(SESAME, ">>filename");             # append to existing file
open(SESAME, "| output-pipe-command");  # set up an output filter
open(SESAME, "input-pipe-command |");   # set up an input filter

print STDOUT "Enter a number: "; # ask for a number


$number = <STDIN>; # input the number
print STDOUT "The number is $number.\n"; # print the number

chop($number = <STDIN>); # input number and remove newline

$number = <STDIN>; # input number


chop($number); # remove newline

You can use open to create filehandles for a variety of purposes (input, output,
piping).

Once opened the filehandle can be used to access the file or device until it is closed
with …

Using open with the same filehandle again will close the first filehandle.

Once a file is open it can be read from using the line reading operator <>. An empty
<> will read from the files named on the command line, or from STDIN if there are none.

What is STDOUT doing with the print statement in the second example? Since it’s the
default - you don’t need it.

The last two examples do the same thing - you’ll most frequently see the first - this is
one of Perl’s common idioms.

Note that when you do use a filehandle with a print statement, there’s no “,” between
the print, the filehandle and the text.
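A minimal write-then-read sketch follows. It uses the three-argument form of open with lexical filehandles (available since Perl 5.6) rather than the bareword handles on the slide, and a scratch file from the standard File::Temp module so that it does not touch any real file. The variable names are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $filename) = tempfile(UNLINK => 1);   # scratch file for the demo
close($tmp_fh);

open(my $out, ">", $filename) or die "Can't write $filename: $!\n";
print $out "first line\n";      # note: no comma after the filehandle
print $out "second line\n";
close($out);

open(my $in, "<", $filename) or die "Can't read $filename: $!\n";
my @lines = <$in>;              # list context reads every remaining line
close($in);

chomp(@lines);                  # chomp removes only a trailing newline
print scalar(@lines), " lines, first is '$lines[0]'\n";   # 2 lines, first is 'first line'
```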
How Do I … Process All The Files In A Directory

 You want to do something to each file in a particular directory.


Solution: Use opendir to open the directory and readdir to retrieve all the filenames.

opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
while (defined($file = readdir(DIR))) {
    # do something with "$dirname/$file"
}
closedir(DIR);

Example: Read all the files and add on the directory path at the front of the filenames.

$dir = "/usr/local/bin";
print "Text files in $dir are:\n";
opendir(BIN, $dir) or die "Can't open $dir: $!";
while( defined ($file = readdir BIN) ) {
    print "$file\n" if -T "$dir/$file";
}
closedir(BIN);

More info: See The Perl Cookbook, section 9.5 Page 318.

The opendir, readdir and closedir functions operate on directories the same
way that open, close and <> operate on files. Both use handles, but the handles
used by the directory functions are different from those used by files.

In scalar context readdir returns the next filename from a directory until it runs out of
names, at which point it returns undef.

In list context it returns the rest of the filenames in a directory or an empty list if
there are no filenames left.
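The list-context form can be sketched as follows. The example builds its own scratch directory with the standard File::Temp module so that the result is predictable; the file names are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);            # scratch directory for the demo
foreach my $name (qw(b.txt a.txt)) {
    open(my $fh, ">", "$dir/$name") or die "Can't create $dir/$name: $!";
    close($fh);
}

opendir(my $dh, $dir) or die "Can't opendir $dir: $!";
# List context: readdir returns all remaining names, including . and ..
my @files = sort grep { $_ ne "." && $_ ne ".." } readdir($dh);
closedir($dh);

print "@files\n";    # a.txt b.txt
```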
Operators - Arithmetic

Example Name Result


$a + $b Addition Sum of $a and $b
$a * $b Multiplication Product of $a and $b
$a % $b Modulus Remainder of $a divided by $b
$a ** $b Exponentiation $a to the power $b

You can work out subtraction and division for yourself.

You can always use ( and ) to force the order of evaluation you want.
Operators - String

 There is an addition operator for strings that performs concatenation.


 Perl uses .
$a = 123;
$b = 456;
print $a + $b; # prints 579
print $a . $b; # prints 123456

 There’s also a “multiply” operator for strings, called the repeat operator.

$a = 123;
$b = 3;
print $a * $b; # prints 369
print $a x $b; # prints 123123123

Note in the above how Perl is converting from numbers to strings as needed.

String concatenation is also implied in interpolation which occurs in double-quoted
strings.
Operators - String

 The following three statements all print the same thing.

print $a . ' is equal to ' . $b . ".\n"; # dot operator


print $a, ' is equal to ', $b, ".\n"; # list
print "$a is equal to $b.\n"; # interpolation

Of the three different ways of printing shown above, interpolation is the easiest to
understand.
Operators - Assignment

 Assignment:
$a = $b;
$a = $b + 5;
$a = $a * 3;

$a *= 3;

$line .= "\n"; # Append newline to $line.


$fill x= 80; # Make string $fill into 80 repeats of itself.
$val ||= "2"; # Set $val to 2 if it isn't already "true".

$a = $b = $c = 0; # C programmers will be familiar with this

($temp -= 32) *= 5/9;

chop($number = <STDIN>);

The first three assignments are hopefully obvious.

The examples using *=, .=, x= and ||= show the op= syntax, which works for most of
Perl’s binary operators.
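A runnable sketch of the op= forms from the slide, with starting values invented for illustration:

```perl
use strict;
use warnings;

my $line = "total";
$line .= "\n";                 # same as $line = $line . "\n"

my $fill = "-";
$fill x= 10;                   # same as $fill = $fill x 10

my $val;
$val ||= "2";                  # $val was undef (false), so it becomes "2"

my $temp = 212;                # degrees Fahrenheit
($temp -= 32) *= 5/9;          # subtract 32 first, then scale: now Celsius

my $celsius = sprintf("%.1f", $temp);
print length($line), " $fill $val $celsius\n";   # 6 ---------- 2 100.0
```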
Operators - Unary Arithmetic

 Can use something like $variable += 1 as shorthand.


 Perl also has autoincrement and autodecrement operators.

Example Name Result


++$a, $a++ Autoincrement Add 1 to $a
--$a, $a-- Autodecrement Subtract 1 from $a

 If you place the operator in front of the variable it is known as pre-increment or pre-
decrement.
 The value is changed before it is used.
 If you place the operator after the variable it is known as post-increment or post-
decrement.
 The value is changed after it is used.

If you’ve used C before this is exactly the same as pre/post increment/decrement in
that language.

$count = 3;
$limit = $count++;
print "Count=$count and Limit=$limit\n";
# prints: Count=4 and Limit=3

or

$count = 3;
$limit = ++$count;
print "Count=$count and Limit=$limit\n";
# prints: Count=4 and Limit=4
Operators - Unary Arithmetic

 Example:

$a = 5; # $a is assigned 5
$b = ++$a; # $b is assigned the incremented value of $a, 6
$c = $a--; # $c is assigned 6, then $a is decremented to 5

Operators - Logical

 Also known as short-circuit operators.


 Allow the program to make decisions without using lots of “if” statements.

Example      Name   Result

$a && $b     And    $a if $a is false, $b otherwise
$a || $b     Or     $a if $a is true, $b otherwise
! $a         Not    True if $a is not true
$a and $b    And    $a if $a is false, $b otherwise
$a or $b     Or     $a if $a is true, $b otherwise
not $a       Not    True if $a is not true
$a xor $b    Xor    True if $a or $b is true, but not both

open(GRADES, "grades") or die "Can't open file grades: $!\n";

Called short-circuit operators because they skip the evaluation of rightward
arguments once they have enough information to decide an overall result.

The bottom example is from our grading program. Perl tries to open the file called
“grades”. If it succeeds then the program continues with the statements which follow this
line, otherwise Perl issues an error message via the die() function and stops.

Note that this code is easy on the eye and the important thing which the line
is trying to do is the first thing on the line - secondary actions are off to the right of
the code.
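To see what the table means by “$a if $a is false, $b otherwise”, here is a minimal sketch. The expensive() subroutine is a hypothetical helper which only counts how often it is called, so that the skipped evaluations become visible.

```perl
use strict;
use warnings;

my ($empty, $zero, $yes) = ('', 0, 'yes');

# || and && return one of their operands, not just 1 or 0.
my $name = $empty || 'anonymous';   # left side false -> right side
my $kept = $yes   || 'default';     # left side true  -> left side
my $both = $yes   && 'second';      # left side true  -> right side
my $none = $zero  && 'never';       # left side false -> left side

# The right-hand side is skipped once the result is known.
my $calls = 0;
sub expensive { $calls++; return 1 }   # hypothetical helper

my $r1 = $yes  || expensive();      # expensive() is not called
my $r2 = $zero && expensive();      # expensive() is not called
my $r3 = $zero || expensive();      # expensive() is called once

print "expensive() ran $calls time(s)\n";
```

This is the same mechanism that makes open(...) or die ... work: die is only reached when open returns a false value.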
Operators - Numeric And String Comparison

 There are two sets of operators - one for numbers and one for strings.

Comparison               Numeric   String   Return Value

Equal                    ==        eq       True if $a is equal to $b
Not equal                !=        ne       True if $a is not equal to $b
Less than                <         lt       True if $a is less than $b
Greater than             >         gt       True if $a is greater than $b
Less than or equal       <=        le       True if $a is not greater than $b
Greater than or equal    >=        ge       True if $a is not less than $b
Comparison               <=>       cmp      0 if equal, 1 if $a greater, -1 if $b greater
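Why two sets of operators matter is easiest to see with values which compare differently as numbers and as strings. A short sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my ($m, $n) = (10, 9);

my $num_gt = ($m >  $n) ? 1 : 0;   # numerically 10 is greater than 9
my $str_gt = ($m gt $n) ? 1 : 0;   # as strings "10" sorts before "9"

my $num_cmp = $m <=> $n;           # numeric three-way comparison
my $str_cmp = $m cmp $n;           # string three-way comparison

# == compares numbers, so two different strings can be "equal".
my $same_number = ("1.0" == "1") ? 1 : 0;
my $same_string = ("1.0" eq "1") ? 1 : 0;

print "numeric: $num_cmp, string: $str_cmp\n";
```

Using == on strings or eq on numbers is a common source of quiet bugs, so pick the operator that matches what the data means.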

Operators - File Test

 File test operators let you find out information about files before you blindly muck
about with them.
 Here are a few of the file test operators.

Example Name Result


-e $a Exists True if the file named in $a exists
-r $a Readable True if the file named in $a is readable
-w $a Writable True if the file named in $a is writable
-d $a Directory True if the file named in $a is a directory
-f $a File True if the file named in $a is a regular file
-T $a Text file True if the file named in $a is a text file

-e "/usr/bin/perl" or warn "Perl is improperly installed\n";


-f "/vmlinuz" and print "I see you are a friend of Linus\n";

There are a lot more operators not listed - see the Perl man pages or Programming
Perl etc.
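A minimal sketch of the file tests in action. It assumes the core File::Temp module is available and uses a scratch directory so that nothing real is touched; the file name example.txt is made up. It also uses -s (size in bytes), which is one of the operators not listed in the table.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);      # scratch directory, removed at exit
my $file = "$dir/example.txt";         # hypothetical file name

my $before = -e $file ? 1 : 0;         # not created yet

open(my $fh, '>', $file) or die "Can't create $file: $!\n";
print $fh "some text\n";
close($fh) or die "Can't close $file: $!\n";

my $exists  = -e $file ? 1 : 0;        # now it exists
my $is_file = -f $file ? 1 : 0;        # and is a regular file
my $is_dir  = -d $dir  ? 1 : 0;        # the scratch area is a directory
my $size    = -s $file;                # size in bytes, false if empty

print "$file is a regular file of $size bytes\n" if $is_file;
```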
More On Input Operators

 The command input operator ``. (Also known as backtick or qx//).


 The most heavily used input operator is <> (also called the diamond operator).
 Examples:

while (defined($_ = <STDIN>)) { print $_; } # the longest way


while ($_ = <STDIN>) { print; } # explicitly to $_
while (<STDIN>) { print; } # the short way
for (;<STDIN>;) { print; } # while loop in disguise
print $_ while defined($_ = <STDIN>); # long statement modifier
print while $_ = <STDIN>; # explicitly to $_
print while <STDIN>; # short statement modifier

All of these lines are equivalent.

 $_ is the default variable which is used implicitly (when you’re not explicit).

You can use the backtick operator to execute any system command like this:

$info = `finger $user`; # Or - qx/finger $user/;

The command will undergo variable interpolation - so the $user gets converted into a
real user name, then the command is passed to the shell, and all output from the
command is passed back to Perl and put into the variable $info. The numeric
status of the command is stored in the Perl variable $?. Perl interpolates variables
before the shell sees the command, so the $user in our example is seen by Perl and not
the shell; if you need to pass a literal $ symbol through to the shell then you’ll need to
escape it with \.

Be careful how you use <>. In scalar context it reads one line; in list context it reads all the remaining lines into memory at once:

$one_line = <MYFILE>; # Get one line


@all_lines = <MYFILE>; # Get all lines - are you sure?
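The difference between the two contexts can be tried out without a real file. This sketch assumes an in-memory file handle (opening a reference to a string) as a stand-in for MYFILE; the text and variable names are made up.

```perl
use strict;
use warnings;

# An in-memory file stands in for MYFILE so the sketch is self-contained.
my $text = "first\nsecond\nthird\n";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!\n";

my $one_line  = <$fh>;      # scalar context: reads just the first line
my @all_lines = <$fh>;      # list context: reads everything that is left

close($fh);

chomp($one_line, @all_lines);
print "one line: $one_line\n";
print "remaining: @all_lines\n";
```

With a three-line file this is harmless; with a multi-gigabyte log file the list form will try to hold the whole thing in memory.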

If you just use <> without a file handle, then Perl reads from the files named on the
command line (in @ARGV), or from STDIN if no files were named. So with no command line
arguments $input = <STDIN>; and $input = <>; both do the same thing: read a line of
input from STDIN. You can use this to advantage with Perl one-liners where STDIN is
actually a pipe from a shell command like this (the $ is the shell prompt):

$ cat myfile.pl | perl -e 'while (<>) { print if m/^\s*sub/; }'
A Special Case Of Using <>

 Normally when you use the <> operator, you use it like this:

my $line = <STDIN>; # Assign explicitly to a variable

 There is one case where assignment is automatic:


 The <> operator is the only thing inside the conditional of a while() loop.
 If it is, then the input is assigned to $_.
 Used in writing Perl One-Liners.
This:

@ARGV = ('-') unless @ARGV;    # assume STDIN if empty
while (@ARGV) {
    $ARGV = shift @ARGV;       # shorten @ARGV each time
    if (!open(ARGV, $ARGV)) {
        warn "Can't open $ARGV: $!\n";
        next;
    }
    while (<ARGV>) {
        ...                    # code for each line
    }
}

does exactly the same as this:

while (<>) {
    ...                        # code for each line
}

Remember, this special “magic” requires that the only thing inside the condition of the
while loop is the <> operator. If you use the <> operator anywhere else you must assign the
result explicitly if you want to keep the value.

LAB5 - FILES_1
LAB5 - FILES_2
LAB5 - FILES_3
The Range Operator ..

 Examples:

1: for (101 .. 200) { print; } # prints 101102...199200


2: @foo = @foo[0 .. $#foo]; # an expensive no-op
3: @foo = @foo[ -5 .. -1]; # slice last 5 items

4: @alphabet = ('A' .. 'Z');

5: $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

6: @z2 = ('01' .. '31'); print $z2[$mday];

7: @combos = ('aa' .. 'zz');

8: @bigcombos = ('aaaaaa' .. 'zzzzzz');

Ay-Carumba - You’d better have a lot of memory (example 8).

1: Uses $_ as the default value of the loop.

2: $#foo is the index of the last item in @foo - this is true for all arrays.

3: Using a negative subscript on an array counts backwards from the end of the
array.

If the left value is greater than the right value in a .. expression then a null list is
returned. If what you really wanted was to count backwards then do this:

for (reverse 27 .. 56) { print; } # prints 565554 ... 2827

4: When used with strings we get some magic - this gives all the uppercase letters in
the English alphabet.

In scalar context the .. operator behaves differently: it is false as long as its left
operand is false. Once the left operand is true the .. operator is true until the right
operand is true (it is still true on that pass), then the .. operator becomes false again.
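The scalar-context behaviour is easier to follow with an example. This is a minimal sketch which picks out the lines between two markers; the BEGIN and END marker lines and the data are made up for illustration.

```perl
use strict;
use warnings;

my @lines = ("intro", "BEGIN", "alpha", "beta", "END", "outro");
my @inside;

for my $line (@lines) {
    # False until a line matches BEGIN, then true up to and
    # including the line that matches END.
    if ($line =~ /^BEGIN$/ .. $line =~ /^END$/) {
        push @inside, $line;
    }
}

print "@inside\n";
```

Note that both marker lines are included in the result, because the operator is still true on the pass where the right operand becomes true.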
The Conditional Operator ?:

 Just like the C version.


 Is a trinary operator - its two parts separate three expressions like this:
condition ? then : else
 Examples:
$a = $ok ? $b : $c; # get a scalar
@a = $ok ? @b : @c; # get an array
$a = $ok ? @b : @c; # get a count of an array's elements

printf "I have %d camel%s.\n",
    $n, $n == 1 ? "" : "s";

What this says is this (for the first example):
Look at the value of $ok - if it’s true then $a = $b; otherwise $a = $c;

Example: $result = ( $count == 10 ) ? 88 : 99;

Here ( $count == 10 ) is the 1st expression, 88 is the 2nd expression and 99 is the 3rd expression.

The condition part is always evaluated in scalar context - for Truth or Falsity.

Question: In the example - what will the value of $result be if $count is 12?
How Do I … Establish Default Values?

 You would like to give a default value to a variable, but only if it doesn’t already
have one.

$a = $b || $c;                  # use $b if $b is true, else $c
$x ||= $y;                      # set $x to $y unless $x is already true
$a = defined($b) ? $b : $c;     # use $b if $b is defined, else $c

$foo = $bar || "DEFAULT VALUE";

$dir = shift(@ARGV) || "/tmp";

$dir = defined($ARGV[0]) ? shift(@ARGV) : "/tmp";

$dir = @ARGV ? $ARGV[0] : "/tmp";

$count{ $shell || "/bin/sh" }++;

More info: See The Perl Cookbook, section 1.2 Page 6.

The difference between the two types of solution is what they test for - something
being defined, or something being true.

Three values are defined but false: 0, “0” and “”. If a variable already held one
of those values and you wanted to keep that value then || won’t work.
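The difference between testing for truth and testing for definedness shows up as soon as a legitimate value is 0. A small sketch with made-up variables:

```perl
use strict;
use warnings;

my $zero = 0;                 # defined, but false
my $unset;                    # undefined

my $a_or  = $zero || 5;                       # the 0 is lost
my $a_def = defined($zero)  ? $zero  : 5;     # the 0 is kept
my $b_def = defined($unset) ? $unset : 5;     # nothing to keep

print "||: $a_or, defined: $a_def, unset: $b_def\n";
```

On Perl 5.10 and later the defined-or operator // performs the same definedness test in a shorter form; it is left out of the code above so the sketch also runs on older Perls.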
How Do I … Establish Default Values?

 You would like to give a default value to a variable, but only if it doesn’t already
have one.
# find the user name on Unix systems
$user = $ENV{USER}
     || $ENV{LOGNAME}
     || getlogin()
     || (getpwuid($<))[0]
     || "Unknown uid number $<";

The first expression which is true is the result which is assigned to $user.

$starting_point ||= "Greenwich";

@a = @b unless @a;      # copy only if empty
@a = @b ? @b : @c;      # assign @b if nonempty, else @c

More info: See The Perl Cookbook, section 1.2 Page 6.

LAB5 - FILE_4
Control Structures

Control Structures - Truth

 We’ve seen that some operators return a true or false value.


 Here are the rules for the values a scalar can hold.
1. Any string is true except for “” and “0”.
2. Any number is true except for 0.
3. Any reference is true regardless of what it refers to.
4. Any undefined value is false.

0 # would become the string "0", so false.


1 # would become the string "1", so true.
10 - 10 # 10-10 is 0, would convert to string "0", so false.
0.00 # equals 0, would convert to string "0", so false.
"0" # the string "0", so false.
"" # a null string, so false.
"0.00" # the string "0.00", neither "" nor "0", so true!
"0.00" + 0 # the number 0 (coerced by the +), so false.
\$a # a reference to $a, so true, even if $a is false.
undef() # a function returning the undefined value, so false.
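The rules can be checked mechanically. This sketch runs the values from the slide through a boolean test and records the verdict; the %truth hash and the $var variable are illustrative names only.

```perl
use strict;
use warnings;

my $var = 0;   # the referent is false, a reference to it is not

my @cases = (
    [ '0',          0          ],
    [ '1',          1          ],
    [ '10 - 10',    10 - 10    ],
    [ '"0"',        "0"        ],
    [ '""',         ""         ],
    [ '"0.00"',     "0.00"     ],
    [ '"0.00" + 0', "0.00" + 0 ],
    [ '\$var',      \$var      ],
    [ 'undef',      undef      ],
);

my %truth;
for my $case (@cases) {
    my ($label, $value) = @$case;
    $truth{$label} = $value ? 'true' : 'false';
    printf "%-12s %s\n", $label, $truth{$label};
}
```

The one that catches people out is "0.00": as a string it is neither "" nor "0", so it is true until something forces it to be a number.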

Loop Statements

LABEL while (EXPR) BLOCK


LABEL while (EXPR) BLOCK continue BLOCK

LABEL until (EXPR) BLOCK


LABEL until (EXPR) BLOCK continue BLOCK

LABEL for (EXPR; EXPR; EXPR) BLOCK

LABEL foreach (LIST) BLOCK


LABEL foreach var (LIST) BLOCK
LABEL foreach var (LIST) BLOCK continue BLOCK

LABEL BLOCK
LABEL BLOCK continue BLOCK
continue BLOCKs are always optional.

LABELs are always optional.

All these statements have an optional LABEL.

The while statements execute as long as EXPR is true. If while is replaced with until,
then the sense of the test is reversed. Note that unlike some languages which have
do - until loops, in Perl the until test is made at the start of the loop and not the end.

It is customary to make the LABEL name be all uppercase.

The while and until statement can have an optional continue block. This block
is executed every time the block is continued either by falling off the end of the first
block or by an explicit next (a loop-control operator which goes to the next iteration
of the loop).
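A minimal sketch of a continue block at work. The point is that the block runs even when next skips the rest of the loop body, which makes it a safe place for the counter update. The variable names are illustrative.

```perl
use strict;
use warnings;

my $i = 0;
my @odd;
my $continue_runs = 0;

while ($i < 6) {
    next if $i % 2 == 0;      # skip even numbers, but still run continue
    push @odd, $i;
}
continue {
    $i++;                     # runs after every pass, including after next
    $continue_runs++;
}

print "odd: @odd, continue ran $continue_runs times\n";
```

Had the $i++ been placed at the bottom of the main block instead, the next on the first pass would have skipped it and the loop would never end.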
Loop Control

 We’ve already seen that a loop can have a label.


 It’s used with the loop control operators next, last, redo.
 The label names the loop as a whole - not the top of the loop.
 The loop control operator doesn’t “go to” the label.
 The syntax for the loop control operators is this:
 last LABEL
 next LABEL
 redo LABEL
 The last operator immediately exits the loop - any continue block is not executed.
 The next operator skips the rest of the current loop and starts the next one. If
there’s a continue clause then it is executed.
 The redo operator restarts the loop block without evaluating the condition again.
Any continue block is not executed.

The LABEL is optional - if it’s missing then last, next or redo applies to the innermost
enclosing loop. But if you want to jump out of nested loops then the LABEL is needed.
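Here is a short sketch of leaving two nested loops in one go. The ROW label, the data and the $tested counter are made up for illustration.

```perl
use strict;
use warnings;

my @rows = (1, 2, 3);
my @cols = (1, 2, 3);
my ($found_r, $found_c);
my $tested = 0;

ROW: for my $r (@rows) {
    for my $c (@cols) {
        $tested++;
        if ($r * $c == 6) {
            ($found_r, $found_c) = ($r, $c);
            last ROW;             # leaves both loops at once
        }
    }
}

print "$found_r x $found_c = 6 after $tested tests\n";
```

A plain last here would only leave the inner loop, and the outer loop would carry on with the next row.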

Even though I’ve talked about continue blocks a lot - not many people use them.
Loop Control - An Example
LABEL: while (CONDITION)
{
    # Code

    if (SOMETHING) { redo; }

    # Code

    if (SOMETHING) { next; }

    # Code

    if (SOMETHING) { last; }

    # Code
}
continue
{
    # Code
}

The LABEL is optional - if it’s missing then last, next or redo applies to the innermost
enclosing loop. But if you want to jump out of nested loops then the LABEL is needed.
Compound Statements - If And Unless

 A sequence of statements is called a BLOCK.


 Compound statements are built from expressions and BLOCKs.
 Blocks are always surrounded by { and }.

if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ..
if (EXPR) BLOCK elsif (EXPR) BLOCK .. else BLOCK

unless (EXPR) BLOCK


unless (EXPR) BLOCK else BLOCK
unless (EXPR) BLOCK elsif (EXPR) BLOCK ..
unless (EXPR) BLOCK elsif (EXPR) BLOCK .. else BLOCK

Note: it’s elsif NOT elseif.

unless simply reverses the true/false value of if. Note that unless also works with
else and elsif. There’s no such thing as elseunless.
Compound Statements - If And Unless

 Examples:

unless ($x == 1) ...
if ($x != 1) ...
if (!($x == 1)) ...

These all do the same thing. TMTOWTDI (There’s More Than One Way To Do It).

if ((my $colour = <STDIN>) =~ /red/i) {


$value = 0xff0000;
}
elsif ($colour =~ /green/i) {
$value = 0x00ff00;
}
elsif ($colour =~ /blue/i) {
$value = 0x0000ff;
}
else {
warn "unknown RGB component $colour, using black instead\n";
$value = 0x000000;
}

Compound Statements - If And Unless

 Examples:
unless (open(FOO, $foo)) { die "Can't open $foo: $!" }
if (!open(FOO, $foo)) { die "Can't open $foo: $!" }

die "Can't open $foo: $!" unless open(FOO, $foo);


die "Can't open $foo: $!" if !open(FOO, $foo);

open(FOO, $foo) || die "Can't open $foo: $!";


open FOO, $foo or die "Can't open $foo: $!";

chdir $dir or die "chdir $dir: $!";


open FOO, $file or die "open $file: $!";
@lines = <FOO> or die "$file is empty?";
close FOO or die "close $file: $!";

$! is the error code.
I tend to prefer the "or die" form.

In the preferred example - there’s no if and no unless - we’re relying on the short-
circuit evaluation.

$! holds the operating system’s error for the last failed system call, so it is set when
open, chdir or close fails (and also by lots of other operations which call the operating
system).
Control Structures - If And Unless

 Examples:
if ($debug_level > 0) {
# Something has gone wrong. Tell the user.
print "Debug: Danger, Will Robinson, danger!\n";
}

if ($city eq "New York") {


print "New York is northeast of Washington, D.C.\n";
}
elsif ($city eq "Chicago") {
print "Chicago is northwest of Washington, D.C.\n";
}
else {
print "I don't know where $city is, sorry.\n";
}

unless ($destination eq $home) {


print "I'm not going home.\n";
}

Note - if has else and elsif. unless does not have an elseunless.
Control Structures - If And Unless

 More examples - compare with the previous page:

print "Danger, Will Robinson, danger!\n" if ($debug_level > 0);

print "I'm not going home.\n" unless ( $destination eq $home );

Another example of idiomatic Perl. You’ll see the interchangeability of statements like
this a lot.
Control Structures - While And Until

 Perl has four main looping constructs, while & until and for & foreach.
 While & until act like if and unless except that they loop repeatedly.
1. First the condition is checked.
2. If the condition is met, that is the condition is:
1. True for the while loop.
2. False for an until loop.
3. Then the block of code is executed.
while ($tickets_sold < 10000) {
$available = 10000 - $tickets_sold;
print "$available tickets are available. How many would you like: ";
$purchase = <STDIN>;
chomp($purchase);
$tickets_sold += $purchase;
}

while ( $line = <GRADES> ) { ...

Note: If the original condition is never met then the loop is never entered. Make sure
if you intend to leave the loop at some point that you have some code in the loop
which changes the variable which keeps you going through the loop.

The bottom example assigns the next line from the GRADES file to the variable $line
and returns the value of the line so the condition of the while statement can be
evaluated for truth. You might wonder if Perl will exit prematurely when it sees blank
lines in the file - the answer is it won’t because a blank line is a “\n” or newline
character and this is not false. When we do reach the end of the file the line input
operator returns the value undef, which always evaluates to false and so at this point
the loop does terminate. There’s no need for an explicit test because the input
operator is set up to work smoothly in a conditional context.
While Loops

while (my $line = <STDIN>) {


$line = lc $line;
}
continue {
print $line; # still visible
}
# $line now out of scope here

A variable declared local to the while loop (here done with my $line) exists only
inside the loop. If you want $line to be visible after the loop has ended then declare
the variable before the loop begins. We’ll discuss scope shortly.

Also, the use of a continue block here is redundant - we could have easily put all the
statements in the continue block inside the main while loop. We’ll also discuss
last,next and redo shortly.
Control Structures - While And Until

 You will often see command line arguments processed like this:

while (@ARGV) {
process(shift @ARGV);
}

The shift operator removes one element from the argument list each time through the
loop and sends it to a subroutine for processing (here called process()).
Control Structures - For And Foreach

 Examples:

for ($sold = 0; $sold < 10000; $sold += $purchase) {


$available = 10000 - $sold;
print "$available tickets are available. How many would you like: ";
$purchase = <STDIN>;
chomp($purchase);
}

foreach $user (@users) {


if (-f "$home{$user}/.nexrc") {
print "$user is cool... they use a perl-aware vi!\n";
}
}

foreach $key (sort keys %hash) {...


Common Perl idiom
for getting the keys
from a hash.

The for loop takes three expressions. An initial expression - set only once, a condition
to be tested every time the loop is executed and an expression to modify the loop
variable.

The foreach loop is used to iterate through the contents of an array. The foreach loop
treats the expression in ( and ) as a list (this is list context) always - even if there’s
only one element in the list. Then each element is aliased to the loop variable in turn
- IMPORTANT - MODIFYING THE LOOP VARIABLE ALSO MODIFIES THE ORIGINAL
ARRAY.
For Loops

 The for loop has three expressions:


1. An expression which initializes the loop.
2. A condition which will keep the loop executing, and
3. An expression which re-initializes the loop.
 All three expressions are optional - the “;” are not.
 If the condition is missing, it is treated as always true.
 So:
LABEL: for (my $i = 1; $i <= 10; $i++)
{
}

is equivalent to:

{
    my $i = 1;
    LABEL: while ($i <= 10)
    {
    }
    continue { $i++; }
}

For Loop Examples

 Examples:

for ($i = 0, $bit = 1; $i < 32; $i++, $bit <<= 1) {
    print "Bit $i is set\n" if $mask & $bit;
}
# the values in $i and $bit persist past the loop

for (my ($i, $bit) = (0, 1); $i < 32; $i++, $bit <<= 1) {
print "Bit $i is set\n" if $mask & $bit;
}
# loop's versions of $i and $bit now out of scope

You can do more than one thing in the three parts of the loop.

The <<= 1 part of the loop is shifting the value of $bit 1 bit to the left.
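A runnable version of the bit-testing loop, with a made-up mask value so the result can be checked by hand. Note that $bit has to start at 1; if it started at 0 then shifting it would leave it at 0 and no bit would ever be reported.

```perl
use strict;
use warnings;

my $mask = 0b1010_0001;       # bits 0, 5 and 7 are set
my @set_bits;

for (my ($i, $bit) = (0, 1); $i < 32; $i++, $bit <<= 1) {
    push @set_bits, $i if $mask & $bit;
}

print "bits set: @set_bits\n";
```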
Foreach Examples

 Examples:

$sum = 0; foreach $value (@array) { $sum += $value }

for $count (10,9,8,7,6,5,4,3,2,1,'BOOM') { # do a countdown


print "$count\n"; sleep(1);
}

for (reverse 'BOOM', 1 .. 10) { # same thing


print "$_\n"; sleep(1);
}

for $field (split /:/, $data) { # any LIST expression


print "Field contains: `$field'\n";
}

foreach $key (sort keys %hash) {


print "$key => $hash{$key}\n";
}

This is the usual way to get all of the keys out of a hash.

With foreach there isn’t any way to know where you are in a list (unless you decide to
keep track of it yourself with counters etc.)

If the list contains modifiable values (i.e. variables, not constants), then you can
modify those variables by modifying the variable inside the loop. The variable in the
loop is an alias for the variable in the list.
Foreach Examples

 Examples:

foreach $pay (@salaries) { # grant 50% raises


$pay *= 1.50; # works for me!
}

for (@christmas, @easter) { # change menu


s/ham/turkey/;
}

s/ham/turkey/ for @christmas, @easter; # same thing

for ($scalar, @array, values %hash) {


s/^\s+//; # strip leading whitespace
s/\s+$//; # strip trailing whitespace
}

On the last slide we said that the variable inside the loop in a foreach loop was an
implicit alias for the variable in the list which is passed to foreach.

So when we alter the variable in the loop ($pay in the top example) we’re actually
altering the variable in the list which we are reading through.
Control Structures - Breaking Out - Next & Last

 It’s not unusual to have special cases in loops.


 Next skips to the end of the loop and forces the next iteration.
 Last skips to the end of the loop and exits the loop.
 Example:
foreach $user (@users) {
if ($user eq "root" or $user eq "lp") {
next;
}
if ($user eq "special") {
print "Found the special account.\n";
# do some processing
last;
}
}

Control Structures - Breaking Out - Next & Last

 It’s possible to break out of nested loops by labeling your loops and specifying
which loop you want to break out of.

LINE: while ($line = <ARTICLE>) {


last LINE if $line eq "\n"; # stop on first blank line
next LINE if $line =~ /^#/; # skip comment lines
# your ad here
}

LINE is a label.
Would anyone care to speculate on what this piece of code does?

Case Statements

 Perl doesn’t have a case statement, but it’s simple to build one.

SWITCH: {
if (/^abc/) { $abc = 1; last SWITCH; }
if (/^def/) { $def = 1; last SWITCH; }
if (/^xyz/) { $xyz = 1; last SWITCH; }
$nothing = 1;
}

OR

SWITCH: {
/^abc/ && do { $abc = 1; last SWITCH; };
/^def/ && do { $def = 1; last SWITCH; };
/^xyz/ && do { $xyz = 1; last SWITCH; };
$nothing = 1;
}

Perl doesn’t have a case/switch structure since it is so easy to build one. The SWITCH
is a label (remember the convention that all labels are in upper-case), and not some
Perl keyword we haven’t discussed yet.

We haven’t covered do (it’s on the next page), but think of it as a dummy keyword
which enables a statement (the bit between { and }) to be written. All three lines in
the second statement are using short-circuit evaluation. The first thing on the line
(reading from left to right) which is false makes the whole line false and all the
statements following are not evaluated. Remember: in short-circuit evaluation it’s the
first thing which is false in an && statement and the first thing which is true in an ||
statement which controls the flow of the program.

It’s important to remember that once a short-circuit evaluation has enough
information to determine truth/falsity, then none of the other possible clauses are
evaluated. If those other clauses also do assignment then those assignments won’t
happen.
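The SWITCH idiom can be wrapped in a subroutine to try it out. This sketch is a variation on the slide, not the same code: it uses for ($text) instead of a bare block so that $_ is set to the value being tested, and classify() is a hypothetical helper name.

```perl
use strict;
use warnings;

# classify() is a hypothetical helper that wraps the SWITCH idiom.
sub classify {
    my ($text) = @_;
    my $kind;
    SWITCH: for ($text) {                      # aliases $_ to $text
        /^abc/ && do { $kind = 'abc'; last SWITCH; };
        /^def/ && do { $kind = 'def'; last SWITCH; };
        /^xyz/ && do { $kind = 'xyz'; last SWITCH; };
        $kind = 'nothing';                     # default case
    }
    return $kind;
}

print classify($_), "\n" for qw(abcdef xyz123 hello);
```

Each last SWITCH plays the part of break in C: without it, control would fall through to the default assignment at the bottom.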
The do (BLOCK) Construct

# process to place all LFSR stage results in a single file

while(<RESULTS>) {
/LFSR\s\=\s(\w+)/ && do { print LFSRFILE "$1\n" };
$lastfile = $1;
}

This is a way of grouping a lot of statements into a single block.

The do BLOCK executes a sequence of statements in the BLOCK and returns the
value of the last expression evaluated in the BLOCK.

It can be modified with a while or an until statement modifier. If so then Perl
executes the BLOCK before it tests the loop condition.

The do BLOCK itself does not count as a loop, so the loop control statements next,
last, redo cannot be used to leave or restart the BLOCK.
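Both properties of do BLOCK (it returns a value, and with a modifier it runs before the test) fit in one small sketch. The variable names and values are made up.

```perl
use strict;
use warnings;

# do BLOCK yields the value of its last expression.
my $n = 7;
my $parity = do {
    my $rem = $n % 2;
    $rem ? 'odd' : 'even';
};

# With a statement modifier the block runs once before the test is made.
my $count  = 10;
my $passes = 0;
do {
    $passes++;
    $count++;
} while ($count < 5);         # already false, yet the block ran once

print "$n is $parity; loop body ran $passes time(s)\n";
```

An ordinary while ($count < 5) { ... } loop with the same starting value would not have run its body at all.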
The do (FILE) Construct

If do can read the file but can’t compile it, it
returns undef and sets an error message in $@.

# read in config files: system first, then user


for $file ("/design/C6RAM/defaults/defaults.rc",
"$ENV{HOME}/.someprogrc")
{
unless ($return = do $file) {
warn "couldn't parse $file: $@" if $@;
warn "couldn't do $file: $!" unless defined $return;
warn "couldn't run $file" unless $return;
}
}

If the file compiles and runs, the value returned is the value of the last
expression evaluated.

If do can’t read the file it returns undef and sets $! to the error.

The do FILE form uses the value of FILE as a filename and executes the contents
of the file as a Perl script.

Its use is to include subroutines from a Perl subroutine library, but it has been
superseded by use. It is still useful for loading things like configuration data into your
program as shown in the example.

If the file can be read but doesn’t compile then an error is set in $@.
If the file can’t be read then an error is set in $!
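To try do FILE without touching real configuration files, this sketch writes a tiny config file first and then loads it. It assumes the core File::Temp module; the file contents (a hash reference with colour and depth keys) are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small, hypothetical config file so the sketch is self-contained.
my ($fh, $file) = tempfile(SUFFIX => '.rc', UNLINK => 1);
print $fh "+{ colour => 'red', depth => 3 };\n";
close($fh) or die "close $file: $!";

# The file's last expression (a hash reference here) is what do returns.
my $config = do $file;
die "couldn't parse $file: $@" if $@;
die "couldn't do $file: $!"    unless defined $config;

print "colour=$config->{colour} depth=$config->{depth}\n";
```

The leading + in the config file tells Perl that the braces are an anonymous hash and not a block. Remember that do FILE runs whatever code is in the file, so only use it on files you trust.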
Goto

 Perl does support goto - so that’s at least one thing they got wrong then!

 You can:
 goto LABEL
 goto Expression
 goto &name (subroutine)

goto(("FOO", "BAR", "GLARCH")[$i]); # hope 0 <= i < 3

@loop_label = qw/FOO BAR GLARCH/;


goto $loop_label[rand @loop_label]; # random teleport

How Do I … Do Something With Every Element In A List?

 You want to repeat a procedure for every element in a list.

Solution: Use a foreach loop.

foreach $item (LIST) {
    # do something with $item
}

foreach $user (@bad_users) {
    complain($user);
}

Sometimes you need to use a function to generate the list needed by foreach:

foreach $var (sort keys %ENV) {
    print "$var=$ENV{$var}\n";
}

The code in the loop can call last to jump out of the loop, next to move on to the next
element, or redo to jump back to the first statement inside the block.

foreach $user (@all_users) {
    $disk_space = get_usage($user);
    if ($disk_space > $MAX_QUOTA) {
        complain($user);
    }
}

More info: See The Perl Cookbook, section 4.4 Page 97.

The variable set to each value in the list is called the loop iterator. If no variable is
supplied then the global variable $_ will be used. $_ is the default variable used in
many of Perl’s string, list and file functions.
How Do I … Do Something With Every Element In A List?

 You want to repeat a procedure for every element in a list.

while (<FH>) {          # $_ is set to the line just read
    chomp;              # $_ has a trailing \n removed, if it had one
    foreach (split) {   # $_ is split on whitespace
                        # then $_ is set to each chunk in turn
        $_ = reverse;   # the characters in $_ are reversed
        print;          # $_ is printed
    }
}

Perl’s $_ value is preserved through any nested foreach loops.

foreach my $item (@array) {
    print "i = $item\n";
}

To be sure of what is happening it is always better to declare and use your own lexical
variable.

@array = (1,2,3);
foreach $item (@array) {
    $item--;
}
print "@array";
# prints: 0 1 2

The foreach construct has another feature: each time through the loop the iterator
variable is an alias, not a copy.

More info: See The Perl Cookbook, section 4.4 Page 97.

IMPORTANT NOTE: The top example works the way we might hope for. The value of
$_ in the while loop is preserved when the foreach loop is executed. However, if
the while loop had been the inner loop then BAD THINGS would have happened
since the while <FH> construct clobbers the value of the global $_ (I.e. it doesn’t
localize it). Consider this to be a bug or a feature - either way it’s an accident waiting
to happen. See the full explanation on page 99 of the Perl Cookbook.

I would always recommend using lexical variables. These are localized at their point
of declaration and the risk of side-effects is much reduced.

Also note that with a foreach loop, the loop iterator is not a copy of the variable
from the list, it actually is the variable in the list - change the variable and it changes
in the list. This is important - it’s not a copy, it’s an alias.
How Do I … Find Elements In One List But Not In Another?

 You want to find the elements which are in one list but not in another.
# assume @A and @B are already loaded
%seen = (); # lookup table to test membership of B
@aonly = (); # answer

# build lookup table


foreach $item (@B) { $seen{$item} = 1 }

# find only elements in @A and not in @B


foreach $item (@A) {
unless ($seen{$item}) {
# it's not in %seen, so add to @aonly
push(@aonly, $item);
}
}

Straight-forward version

More info: See The Perl Cookbook, section 4.7 Page 104.

Solution: Build a hash of the keys in @B to use as a lookup table. Then iterate through
@A looking to see if the item in @A is in the lookup table. If it is then it’s in both @A
and @B. If it’s not then it’s in @A but not in @B.
How Do I … Find Elements In One List But Not In Another?

 You want to find the elements which are in one list but not in another.
my %seen; # lookup table
my @aonly; # answer

# build lookup table


@seen{@B} = ();

foreach $item (@A) {


push(@aonly, $item) unless exists $seen{$item};
}

Different (idiomatic) version

More info: See The Perl Cookbook, section 4.7 Page 104.

The two different answers vary in how they build the hash. The first (previous slide)
iterates over @B. This one uses a hash slice. A hash slice assigns to several keys at once,
so this:

$hash{"key1"} = 1;
$hash{"key2"} = 2;

is equivalent to:

@hash{"key1", "key2"} = (1,2);

The list in {} holds the keys while the list on the right holds the values. In this second
example we say this:

@seen{@B} = ();

This uses the items in @B as keys for %seen, setting each to undef (because the list
on the right is empty). We later check for the existence of the key - not the logical
truth or the definedness of the value.
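Putting the hash slice and the exists test together gives a runnable sketch. The list contents are made up, and grep is used here in place of the foreach/push loop from the slide; both build the same @aonly.

```perl
use strict;
use warnings;

my @A = qw(apple banana cherry date);
my @B = qw(banana date fig);

my %seen;
@seen{@B} = ();                               # keys from @B, values undef

my @aonly = grep { !exists $seen{$_} } @A;    # keep what @B lacks

print "only in A: @aonly\n";
```

Testing with exists matters here: every value in %seen is undef, so a plain truth test on $seen{$_} would report that nothing had been seen.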
How Do I … Extract Unique Elements From A List?

 You want to remove duplicate elements from a list.


Solution: Use a hash to record the values and then keys() to extract the values.

%seen = ();
@uniq = ();
foreach $item (@list) {
    unless ($seen{$item}) {
        # if we get here, we have not seen it before
        $seen{$item} = 1;
        push(@uniq, $item);
    }
}

# Same as above but faster
%seen = ();
foreach $item (@list) {
    push(@uniq, $item) unless $seen{$item}++;
}

# Same as above but different
%seen = ();
foreach $item (@list) {
    $seen{$item}++;
}
@uniq = keys %seen;

More info: See The Perl Cookbook, section 4.6 Page 102.

Solution: Use a hash to record which items have been seen and then use keys on the
hash to extract them.

Warning. Using a hash like this can use up a lot of memory, and once you’ve used a
hash the keys function will return the keys in a random order (not the insertion
order). If this matters then you need a different solution.
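If the original order does matter, one common approach is to filter the list as you go and never call keys at all. A minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @list = qw(pear apple pear fig apple kiwi);

my %seen;
my @uniq = grep { !$seen{$_}++ } @list;   # true only the first time

print "@uniq\n";
```

The post-increment returns the old count, which is 0 (false) the first time an item turns up, so only first occurrences get through, in the order they appeared. As a bonus %seen ends up holding how many times each item occurred.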
How Do I … Extract Unique Elements From A List?

 You want to remove duplicate elements from a list.


# generate a list of users logged in, removing duplicates
%ucnt = ();
for (`who`) {
s/\s.*\n//; # kill from first space till end-of-line, yielding username
$ucnt{$_}++; # record the presence of this user
}
# extract and print unique keys
@users = sort keys %ucnt;
print "users logged in: @users\n";

More info: See The Perl Cookbook, section 4.6 Page 102.
How Do I … Reverse An Array?

 You want to reverse an array.


# Solution: Use the reverse() function
# reverse @ARRAY into @REVERSED
@REVERSED = reverse @ARRAY;

# Solution: Use a for loop
for ($i = $#ARRAY; $i >= 0; $i--) {
    # do something with $ARRAY[$i]
}

# two-step: sort then reverse
@ascending = sort { $a cmp $b } @users;
@descending = reverse @ascending;

# one-step: sort with reverse comparison
@descending = sort { $b cmp $a } @users;

More info: See The Perl Cookbook, section 4.10 Page 109.

The reverse() function reverses a list.

The for loop actually processes the list in reverse order but keeps the list in its
original order.

If you use reverse() to reverse a list you just sorted then make sure it’s in the
order you want.

The sort() function takes an optional code block which lets you replace the default
alphabetic comparison subroutine with your own. This function is called each time
sort() has to compare two values. The values are loaded into $a and $b which are
automatically localised, so they won’t interfere with any variables you already have
called $a or $b.

The comparison function should return a negative number if $a should appear before
$b in the output list, 0 if the order doesn’t matter and a positive number if $a should
appear after $b in the output list. Perl has two operators that behave this way: <=>
for sorting numbers in ascending order, and cmp for sorting strings in ascending
alphabetic order. By default sort() uses cmp-style comparisons. Of course, you can
always provide your own comparison subroutine.
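The difference between the default cmp-style sort and a numeric sort is worth seeing once. A short sketch with made-up numbers:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort @numbers;                  # default string comparison
my @ascending  = sort { $a <=> $b } @numbers;    # numeric, smallest first
my @descending = sort { $b <=> $a } @numbers;    # numeric, largest first

print "@as_strings | @ascending | @descending\n";
```

The default sort puts 100 before 9 because it compares character by character, which is rarely what you want for numbers.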
How Do I … Traverse A Hash?

 You want to perform an action on each entry in a hash.


Solution: Use each() with a while loop

while(($food, $color) = each(%food_color)) {
    print "$food is $color.\n";
}
Banana is yellow.
Apple is red.
Carrot is orange.
Lemon is yellow.

Solution: Use keys with a foreach loop

foreach $food (keys %food_color) {
    my $color = $food_color{$food};
    print "$food is $color.\n";
}
Banana is yellow.
Apple is red.
Carrot is orange.
Lemon is yellow.

WARNING: foreach does not hand you key/value pairs from a hash; loop over keys %hash
or use each(). push(), pop(), shift() and unshift() do not work on hashes either.

More info: See The Perl Cookbook, section 5.4 Page 135.
How Do I … Delete Something From A Hash?

 You want to remove an entry from a hash.


Solution: Use the delete() function

# remove $KEY and its value from %HASH
delete($HASH{$KEY});

Don’t try to delete a key by setting its value to undef. All that will do is set
the key’s value to undef!

The delete() function is the only way to remove a specific hash entry. Once a key
is deleted it will no longer show up in the list from keys() or in an each()
iteration, and exists() will return false for that key.

More info: See The Perl Cookbook, section 5.3 Page 133.

You can’t delete a key by setting its value to undef since undef is a value which a
hash can store. You must use the delete() function.

If you want to clear a hash then simply assign it to the empty list like this:

%hash = ();
How Do I … Sort A Hash?

 You need to work with the elements of a hash in a particular order.


Solution

# %hash is the hash to sort
@keys = sort { criterion() } (keys %hash);
foreach $key (@keys) {
    $value = $hash{$key};
    # do something with $key, $value
}

Sorting alphabetically by key

foreach $food (sort keys %food_color) {
    print "$food is $food_color{$food}.\n";
}

Sorting by the associated values

foreach $food ( sort { $food_color{$a} cmp $food_color{$b} }
                keys %food_color )
{
    print "$food is $food_color{$food}.\n";
}

More info: See The Perl Cookbook, section 5.9 Page 144.

Solution: Get a list of keys and sort based on the ordering you want.

Sort by default sorts alphabetically. The optional code block passed to sort will be
called every time sort needs to compare two values in the sort function. $a and $b
are localised sort variables.
How Do I … Test For The Presence Of A Key In A Hash?

 You need to know if a hash has a particular key.


%age = ();
$age{"Toddler"} = 3;
$age{"Unborn"} = 0;
$age{"Phantasm"} = undef;

foreach $thing ("Toddler", "Unborn", "Phantasm", "Relic") {


print "$thing: ";
print "Exists " if exists $age{$thing};
print "Defined " if defined $age{$thing};
print "True " if $age{$thing};
print "\n";
}

Output                          Meaning
Toddler: Exists Defined True    Exists, defined, true
Unborn: Exists Defined          Exists, defined
Phantasm: Exists                Exists
Relic:                          None of the above

More info: See The Perl Cookbook, section 5.2 Page 131.

Toddler: It exists because we gave it a value in the hash, that value is defined (3) and
since it’s non-zero, it is true.

Unborn: It exists because we gave it a value in the hash, that value is defined (0) and
since it’s zero it is not true.

Phantasm: It exists because we gave it a value in the hash, that value is undefined so
it fails the defined test and since undef is false it fails the truth test as well.

Relic: It doesn’t exist since we never put it into the hash. So it fails all three tests.
How Do I … Invert A Hash?

 You have a hash and a value for which you want to find the corresponding key.
Solution: Use the reverse() function

# %LOOKUP maps keys to values
%REVERSE = reverse %LOOKUP;

%surname = ( "Mickey" => "Mantle", "Babe" => "Ruth" );
%first_name = reverse %surname;
print $first_name{"Mantle"}, "\n";
Mickey

What happens if two different keys happen to have the same value?

Result - The inverted hash will only have one. For a solution to this
see the “Perl Cookbook” pages 140 and 141.

More info: See The Perl Cookbook, section 5.8 Page 142.

Use reverse() to create an inverted hash whose values are the original hash’s keys
and whose keys are the original hash’s values.

When we treat %surname as a list it becomes:

("Mickey", "Mantle", "Babe", "Ruth"),

or

("Ruth", "Babe", "Mantle", "Mickey"),

because we can’t predict the order in which things come out of hashes. Reversing this
list (assume the first list is the one we get) gives this:

("Ruth", "Babe", "Mantle", "Mickey")

When we treat this list as a hash it becomes:

("Ruth" => "Babe", "Mantle" => "Mickey")
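To see what happens when two keys share a value, here is a minimal sketch. It assumes a made-up %food_color hash in which two foods are yellow, and it uses array references (covered later in the course) to keep every key. The name %foods_with_color is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical data: two foods share the colour yellow.
my %food_color = (Banana => "yellow", Lemon => "yellow", Apple => "red");

# Plain reverse: one of the two yellow foods is silently lost.
my %by_color = reverse %food_color;
print scalar(keys %by_color), " colours survive the plain reverse\n";

# Keep every food by storing a list of foods for each colour.
my %foods_with_color;
while (my ($food, $color) = each %food_color) {
    push @{ $foods_with_color{$color} }, $food;
}

foreach my $color (sort keys %foods_with_color) {
    my @foods = sort @{ $foods_with_color{$color} };
    print "$color: @foods\n";
}
```

Which yellow food survives the plain reverse depends on the order in which the hash hands back its entries, so don't rely on it.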


How Do I … Test For The Presence Of A Key In A Hash?

 You need to know if a hash has a particular key.


# does %HASH have a value for $KEY ? Solution: Use the exists() function
if (exists($HASH{$KEY})) {
# it exists
} else {
# it doesn't
}

# %food_color per the introduction


foreach $name ("Banana", "Martini") {
if (exists $food_color{$name}) {
print "$name is a food.\n";
} else {
print "$name is a drink.\n";
}
}

Banana is a food.
Martini is a drink.

More info: See The Perl Cookbook, section 5.2 Page 131.

exists() checks for the existence of a key in a hash. It doesn’t say anything about the
key’s value (if the key exists).
How Do I … Print A Hash?

 You want to print a hash, but neither print “%hash” nor print %hash works.
Solution: Iterate using each()

while ( ($k,$v) = each %hash ) {
    print "$k => $v\n";
}

Solution: Use map to generate a list of strings

print map { "$_ => $hash{$_}\n" } keys %hash;

Solution: Interpolate the hash as a list and print that

print "@{[ %hash ]}\n";

Solution: Use a temporary array to hold the hash and print that

{
    my @temp = %hash;
    print "@temp";
}

You can print in key order at the cost of doing a sort()

foreach $k (sort keys %hash) {
    print "$k => $hash{$k}\n";
}

More info: See The Perl Cookbook, section 5.5 Page 137.

The best solution is probably the first one.


How Do I … Delete Something From A Hash?

 You want to remove an entry from a hash.


# %food_colour as per Introduction
sub print_foods {
    my @foods = keys %food_colour;
    my $food;

    print "Keys: @foods\n";
    print "Values: ";

    foreach $food (@foods) {
        my $colour = $food_colour{$food};

        if (defined $colour) {
            print "$colour ";
        } else {
            print "(undef) ";
        }
    }
    print "\n";
}

print "Initially:\n";
print_foods();
print "\nWith Banana undef\n";
undef $food_colour{"Banana"};
print_foods();
print "\nWith Banana deleted\n";
delete $food_colour{"Banana"};
print_foods();

Initially:
Keys: Banana Apple Carrot Lemon
Values: yellow red orange yellow

With Banana undef
Keys: Banana Apple Carrot Lemon
Values: (undef) red orange yellow

With Banana deleted
Keys: Apple Carrot Lemon
Values: red orange yellow

More info: See The Perl Cookbook, section 5.3 Page 133.

You can’t delete a key by setting its value to undef since undef is a value which a
hash can store. You must use the delete() function.

As the example shows, setting $food_colour{"Banana"} to undef doesn’t delete
the key from the hash - it only makes the value undef. delete() really does
remove it from the hash.

delete() can also work with a hash slice to remove multiple keys from a hash, like
this:

delete @food_colour{"Banana", "Apple", "Cabbage"};


How Do I …Merge Hashes?

 You need to make a new hash with the entries of two existing hashes.
Solution: Treat the hashes as lists and join them as you would lists. Keys which
appear in both hashes will only appear once in the final hash.

%merged = (%A, %B);

Alternative: Loop over the hashes’ elements and build a new hash.

%merged = ();
while ( ($k,$v) = each(%A) ) {
    $merged{$k} = $v;
}
while ( ($k,$v) = each(%B) ) {
    $merged{$k} = $v;
}

More info: See The Perl Cookbook, section 5.10 Page 145.
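When a key appears in both hashes it is worth knowing which value wins. A minimal sketch, assuming two made-up hashes %A and %B that both contain the key Banana:

```perl
use strict;
use warnings;

# Hypothetical hashes sharing the key "Banana".
my %A = (Apple  => "red",    Banana => "green");
my %B = (Banana => "yellow", Lemon  => "yellow");

# The list assignment stores pairs from left to right, so the pair
# coming from %B overwrites the earlier pair from %A.
my %merged = (%A, %B);

foreach my $k (sort keys %merged) {
    print "$k => $merged{$k}\n";
}
```

If the values in %A should win instead, write the hashes in the other order: (%B, %A).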
How Do I … Traverse A Hash?

 You want to perform an action on each entry in a hash.


while(($key, $value) = each(%HASH)) { Solution: Use each() with a while loop
# do something with $key and $value
}

foreach $key (keys %HASH) { Solution: Use keys with a foreach loop
$value = $HASH{$key};
# do something with $key and $value
}

More info: See The Perl Cookbook, section 5.4 Page 135.

The each() function returns a two element list from the hash each time it is called.
Remember, order has no meaning in hashes, so regardless of the order with which
you put values into the hash, it is very unlikely that they will come back out in that
same order. It is possible to retrieve items in insertion order, but that is beyond the
scope of this course.
How Do I … Find The Most Common Anything?

 You want to know how many times a value in an array or in a hash occurs in the
array or hash.
Solution: Use a hash to count how many times each element (for an array) or value
(for a hash) occurs.

%count = ();
foreach $element (@ARRAY) {
    $count{$element}++;
}

The foreach adds one to $count{$element} for every occurrence of $element.

More info: See The Perl Cookbook, section 5.14 Page 150.
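Once the counts are in a hash, finding the most common element is a sort on those counts. A minimal sketch, assuming a made-up @ARRAY of colour names; the tie-break on the name is an illustrative choice, not part of the recipe.

```perl
use strict;
use warnings;

my @ARRAY = qw(red blue red green red blue);   # hypothetical sample data

my %count;
$count{$_}++ foreach @ARRAY;

# Highest count first; equal counts are ordered alphabetically.
my @by_frequency = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
my $most_common  = $by_frequency[0];

print "$most_common occurs $count{$most_common} times\n";
```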
How Do I … Operate On A Series Of Integers?

 You want to perform an operation on a series of integers between X and Y.

Using the range operator:

foreach ($X .. $Y) {
    # $_ is set to every integer from X to Y, inclusive
}

foreach $i ($X .. $Y) {
    # $i is set to every integer from X to Y, inclusive
}

Using a for loop:

for ($i = $X; $i <= $Y; $i++) {
    # $i is set to every integer from X to Y, inclusive
}

for ($i = $X; $i <= $Y; $i += 7) {
    # $i is set to every seventh integer from X to Y (step size = 7)
}

Remember, for and foreach are synonyms, so that gives us another 4 variations

More info: See The Perl Cookbook, section 2.5 Page 49.

Solution: use a for loop or a foreach with the range operator (..)

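When iterating over consecutive integers, the third method used to be the most
efficient because older versions of Perl built the whole list in memory for a
foreach. Perl 5.005 and later no longer build a temporary list when foreach is given
a range, so the difference is now small.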
Regular Expressions

Notes:
Regular Expressions

 Regular expressions (a.k.a. regexes, regexps, RE’s) are used in:


 grep
 awk
 findstr
 sed
 vi
 Emacs
 A regular expression is a way of describing a set of strings without saying what they
all are.

if (/Windows 95/) { print "Time to upgrade?\n" }

s/Windows/Linux/;

Be careful - regular expressions in Perl are not identical to regular expressions in
other languages.

When you see something that looks like /foo/ you’re looking at a pattern match
operator (the / and the /).

If you can find patterns in a string then you can also replace those patterns with
something else. So when you see something like s/Windows/Linux/ you’re looking at a
substitution of Linux for Windows (which some people might say is a good thing)!

Finally patterns can also specify where something isn’t. This is used with the split
operator - see next slide.
Regular Expressions

 An example of the split operator:

($good, $bad, $ugly) = split( /,/ , "vi,emacs,teco");

($good, $bad, $ugly)   This is the list which gets the results of the split operator.
/,/                    This is the pattern which split uses to chop up the string on
                       its right (the comma between / and /).
"vi,emacs,teco"        This is the text which split operates on.

 Tip - the best way to split a string which contains lots of white space:

@words = split( /\s+/ , $line );

We haven’t covered the \s character class yet - but it stands for any white-space
character. The \s+ means any string containing one or more consecutive white-space
characters (it can be different numbers at different places on a line of text - the fields
on which the split occurs don’t all have to be the same length).
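One thing to watch with /\s+/ is a line that starts with white space. A minimal sketch, using a made-up $line, which compares the pattern form with the special single-space form of split:

```perl
use strict;
use warnings;

# Hypothetical input line with leading white space and mixed separators.
my $line = "  vi   emacs\tteco";

my @with_pattern = split( /\s+/, $line );   # leading white space gives an empty first field
my @with_space   = split( ' ',   $line );   # special case: leading white space is skipped

print scalar(@with_pattern), " fields from /\\s+/\n";
print scalar(@with_space),   " fields from ' '\n";
```

If your lines may be indented, split(' ', $line) is usually the form you want.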
Regular Expressions

 The simplest regular expressions are those which match several characters in a row:

while ($line = <FILE>) {
    if ($line =~ /http:/) {
        print $line;
    }
}

This uses $_ for both the input operator and the string to search for a pattern match:

while (<FILE>) {
    print if /http:/;
}

while (<FILE>) {
    print if /http:/;
    print if /ftp:/;
    print if /mailto:/;
    # What next?
}

In the first example we’re looking for all lines containing the exact text http: (the
/ and / only mark where the pattern starts and ends).

The =~ operator is called the binding operator. It’s telling Perl to look for a match in
the variable $line. If we don’t use the =~ operator then Perl by default searches the
system variable $_. This is a special scalar variable which is used in many places in
Perl - not just pattern matching.

In the second example we’re using the default value $_ (which is also set by the <>
operator).

In the third example we’re looking for lots of different types of links: http, ftp, mailto.
What happens if this later needs to be extended? Wouldn’t it be easier to look for any
number of alphabetic characters followed by a colon?
Regular Expressions

 In regular expression speak that would be:

/[a-zA-Z]+:/

The [ and ] define a character class. The a-z and A-Z represent all the alphabetic
characters (the - means all characters between the starting and ending character
inclusive).

The + means “one or more of whatever is immediately in front of me”. That’s an
example of a quantifier - something which says how many times something is allowed
to repeat. Remember the / and / are not part of the pattern. They’re like quotes in
that they contain the pattern but are not part of it.
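A quick sketch of the character class at work, assuming some made-up lines of text. Single quotes are used so that Perl does not try to interpolate the @ in the mail address.

```perl
use strict;
use warnings;

# Hypothetical sample lines.
my @lines = (
    'see http://www.perl.org for details',
    'write to mailto:someone@example.com',
    'nothing to see here',
    'files are at ftp://ftp.example.com/pub',
);

# Keep every line containing one or more letters followed by a colon.
my @with_links = grep { /[a-zA-Z]+:/ } @lines;

print "$_\n" foreach @with_links;
```

Note that the pattern is deliberately loose: a line such as "Note: remember this" would match as well.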
Regular Expressions - Character Classes

 These are some common Perl character classes.

Name             ASCII definition   Code
Whitespace       [ \t\n\r\f]        \s
Word character   [a-zA-Z_0-9]       \w
Digit            [0-9]              \d

 Note that these match single characters.
 A \w will match a single word character - not a word.
 You can say \w+ to match a word.
 Perl also allows negation of these classes by using the upper case version of the
code.
 \D matches a non-digit character etc.
 There’s one special character class, written with a “.” that will match any character
(except a newline).

Example: /a./ will match any string containing an “a” that is not the last character in
a string. Why?

So this will match “at” or “am” or “a!” but not “a” since there’s nothing after the “a”
for the dot (any character) to match with.

It’ll also match “camel” and “oasis”, but not “sheba”. It matches “caravan” on
the first “a”.
Regular Expressions - Quantifiers

 The character classes we’ve seen so far all match one character.
 You can match a word with \w+ and the “+” is one kind of quantifier.
 General quantifiers are like this:
 {min,max}

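Example    Matches
\d{6,8}    A number of between 6 and 8 digits
\d{5,5}    A number of exactly 5 digits (can also be written \d{5})
\d{5,}     A number of 5 digits or more
\d{0,5}    A number of 5 digits or fewer (write the 0 explicitly: before Perl 5.34
           {,5} is not treated as a quantifier)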

Code Meaning
+ {1,}
* {0,}
? {0,1}

Be very, very careful using “*”. Why?


Regular Expressions - Quantifiers

 Exercise: What does this do, i.e. what will be in $line after the substitution?

$line = "Fred xxxxxxxx barney";


$line =~ s/x*//;
print $line;

 One last thing:


 Quantifiers apply to the immediately preceding character, so:
/bam{2}/ will match "bamm" but not "bambam"

 To apply a quantifier to more than one character, use ( and ) like this:

/(bam){2}/ will match "bambam"

One other thing to note: quantifiers in Perl are greedy by default - each one will match
as much as it can while still allowing the whole pattern to match.
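A minimal sketch of the two points above: a quantifier binds to the single item in front of it unless you group with ( and ), and it takes as much as it can. The variable names are illustrative only.

```perl
use strict;
use warnings;

# A quantifier applies to the immediately preceding character...
my $single_bamm   = "bamm"   =~ /bam{2}/   ? 1 : 0;   # b, a, then two m's
my $single_bambam = "bambam" =~ /bam{2}/   ? 1 : 0;   # no "mm" anywhere
# ...unless ( and ) group several characters together.
my $group_bambam  = "bambam" =~ /(bam){2}/ ? 1 : 0;

# Greedy: x+ takes every x in the run, not just the first one.
my $line  = "Fred xxxxxxxx barney";
my ($run) = $line =~ /(x+)/;

print "bamm: $single_bamm, bambam: $single_bambam, grouped: $group_bambam\n";
print "x+ matched ", length($run), " characters\n";
```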
Regular Expressions - Anchors

 Examples:

/\bFred\b/ would match in   Answer and reason
"The Great Fred"            Yes
"Fred The Great"            Yes
"Frederick The Great"       No - Fred is not followed by a non-word character.

 There are also characters for matching at:


 Start of line “^”.
 End of line “$”. (Don’t worry, Perl won’t confuse this with a variable instance).

 So when we said:
next LINE if $line =~ /^#/;

 What were we saying?

When you try to pattern match, Perl will try to match in every location until it
succeeds. An anchor allows you to specify where a pattern can match.

The special symbol \b matches on a word boundary which is defined as the “nothing”
which exists between a word character “\w” and a non-word character “\W”.

Answer: Go to the next iteration of the loop if the first character on a line is the “#”
character.

Also, when we said that the sequence \d{6,8} would match a number of between 6
and 8 digits - that wasn’t quite true, since it would also match any number containing
9 or more digits as well. To get the desired result we would have to combine
quantifiers with anchors.

Exercise: write a pattern which will match a number of 5 or 6 digits - but will fail to
match one of more than 6 digits.
Regular Expressions - Back References

 Use ( and ) to remember bits of patterns which match.


 Example:

/\d+/      Both these patterns match the same thing - a number
/(\d+)/    But this one remembers what was matched

 What does this do?


s/(\S+)\s+(\S+)/$2 $1/

When you match patterns you can use “(“ and “)” to remember the bits of a string
which did match.
The “(“ and “)” don’t change what matches.

How you remember what was matched depends on where you want to remember it
from. Inside the same pattern the bits of pattern which match are stored in variables
\1 \2 \3 etc. The match from the first pair of “(“ and “)” is in \1 and so on.

Outside the pattern the bits of pattern which match are stored in $1 $2 $3 etc.

Be careful - once you start a new pattern match the old values of $1 $2 $3 etc. are all
wiped out, so if you want to remember them long-term then copy $1 $2 $3 etc. into
new variables.

By the way - there’s no limit to how many bits of the pattern can be remembered,
once you get to \9 or $9 Perl continues with \10 and $10 and so on.

Whoops - no easy answer here this time - you’ll have to work it out.
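Separately from the exercise, here is a minimal sketch of \1 inside a pattern and $1 outside it, assuming a made-up sentence containing a doubled word. The value of $1 is copied straight away, as recommended above.

```perl
use strict;
use warnings;

my $text = "Paris in the the spring";   # hypothetical input
my $repeated;

# \1 means "whatever the first ( ) matched", used inside the same pattern.
if ($text =~ /\b(\w+)\s+\1\b/) {
    $repeated = $1;   # outside the pattern the same text is in $1; copy it now
    print "doubled word: $repeated\n";
}
```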
Regular Expressions - List Processing

 Examples:

@array = (1 + 2, 3 - 4, 5 * 6, 7 / 8);

sort @dudes, @chicks, other();

print reverse sort map {lc} keys %hash;

($hour, $min, $sec, $ampm) = /(\d+):(\d+):(\d+) *(\w+)/;

@hmsa = /(\d+):(\d+):(\d+) *(\w+)/;

Earlier we mentioned the terms scalar and array context.


So far most things have been in scalar context - we’ve seen single results.

Lots of Perl operators can produce either scalar results or list results.
It depends on how they are used. They just “know” what is expected of them.

In the first example @array is a four element list.

In the second example each of @dudes, @chicks and other() returns a list, all
the lists are then joined together to produce a single (big) list and that is passed to
sort().

Some operators produce lists (like keys), while some consume them (like print).

You can stack up several list operators in a row - see example 3. This takes all
the keys from %hash, turns them all into lower-case by applying the lc operator (via
map { }), passes that list to the sort function and then passes the sorted list to the
reverse function. Finally print prints the reversed list.

If you do a pattern match in list context then all the back-references are pulled out as
a list - see example 4 and example 5. TMTOWTDI.
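A small sketch of the same pattern match in scalar and in list context, assuming a made-up time stamp in $stamp:

```perl
use strict;
use warnings;

my $stamp = "12:34:56 pm";   # hypothetical input

# Scalar context: the match just says whether it succeeded.
my $matched = $stamp =~ /(\d+):(\d+):(\d+) *(\w+)/;

# List context: the match returns what each ( ) remembered.
my ($hour, $min, $sec, $ampm) = $stamp =~ /(\d+):(\d+):(\d+) *(\w+)/;

print "matched: $matched\n";
print "hour=$hour min=$min sec=$sec ampm=$ampm\n";
```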
How Do I … Parse Comma-Separated Data?

 You have a file containing comma-separated values that you need to read in, but
these data fields may have quoted commas or escaped quotes in them.
This procedure is from “Mastering Regular Expressions”

sub parse_csv {
    my $text = shift; # record containing comma-separated values
    my @new = ();
    push(@new, $+) while $text =~ m{
        # the first part groups the phrase inside the quotes.
        # see explanation of this pattern in MRE
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?
          | ([^,]+),?
          | ,
    }gx;
    push(@new, undef) if substr($text, -1,1) eq ',';
    return @new; # list of values that were comma-separated
}

Use the standard ParseWords module

use Text::ParseWords;

sub parse_csv {
    return quotewords(",", 0, $_[0]);
}

More info: See The Perl Cookbook, section 1.15 Page 31.

Comma-separated data sounds simple to parse, but it is actually a complex format
since the fields themselves can contain commas. This makes the pattern matching
solution complex and rules out a simple split /,/.

Text::ParseWords hides all this complexity from you. Pass its quotewords()
function two arguments and a CSV string. The first argument is the separator (in this
case a comma); the second is a value which is true or false, and which controls
whether the strings returned have quotes around them.
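A minimal sketch comparing a naive split with quotewords() from Text::ParseWords, assuming a made-up record in which one field contains a comma:

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical record: the middle field contains a comma inside quotes.
my $record = 'vi,"emacs, the editor",teco';

my @naive  = split( /,/, $record );            # breaks the quoted field in two
my @fields = quotewords( ",", 0, $record );    # 0 = do not keep the quotes

print scalar(@naive),  " fields from split\n";
print scalar(@fields), " fields from quotewords\n";
print "second field: $fields[1]\n";
```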
How Do I … Check If A String Is A Valid Number?

 You want to check if a string contains a valid number.

General solution

if ($string =~ /PATTERN/) {
    # is a number
} else {
    # is not
}

Specific solutions

warn "has nondigits" if /\D/;


warn "not a natural number" unless /^\d+$/; # rejects -3
warn "not an integer" unless /^-?\d+$/; # rejects +3
warn "not an integer" unless /^[+-]?\d+$/;
warn "not a decimal number" unless /^-?\d+\.?\d*$/; # rejects .2
warn "not a decimal number" unless /^-?(?:\d+(?:\.\d*)?|\.\d+)$/;
warn "not a C float"
unless /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/;

More info: See The Perl Cookbook, section 2.1 Page 44.

This is something which is common when validating input as part of a CGI script.

The solution is easy as long as you can decide what you mean by a number, and can
then write a regular expression (or series of expressions) to look for the pattern you
desire.

If numbers can have leading or trailing space then a substitution to remove that
space should occur first, like this:

$probable_number =~ s/^\s+|\s+$//g;
How Do I … Copy And Substitute Simultaneously?

 You want an easy way of copying and substituting at the same time when pattern
matching.

You want to avoid this

$dst = $src;
$dst =~ s/this/that/;

So do this

($dst = $src) =~ s/this/that/;

# Make All Words Title-Cased


($capword = $word) =~ s/(\w+)/\u\L$1/g;

# /usr/man/man3/foo.1 changes to /usr/man/cat3/foo.1


($catpage = $manpage) =~ s/man(?=\d)/cat/;

($a = $b) =~ s/x/y/g; # copy $b and then change $a
$a = ($b =~ s/x/y/g); # change $b, count goes in $a

More info: See The Perl Cookbook, section 6.1 Page 164.
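The position of the brackets decides whether you get a changed copy or a count. A minimal sketch, assuming a made-up string in $src:

```perl
use strict;
use warnings;

my $src = "this and this";   # hypothetical input

# Brackets round the assignment: copy first, then substitute in the copy.
(my $dst = $src) =~ s/this/that/g;

# Brackets round the substitution: $src itself is changed, and the
# number of substitutions made is what gets assigned.
my $count = ($src =~ s/this/THIS/g);

print "dst:   $dst\n";
print "src:   $src\n";
print "count: $count\n";
```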
How Do I … Match Only Letters When Pattern Matching?

 You want to see whether a value consists of only alphabetic characters.

Use this if you don’t care about locale

if ($var =~ /^[A-Za-z]+$/) {
    # it is purely alphabetic
}

Use this if you do care about locale

use locale;
if ($var =~ /^[^\W\d_]+$/) {
    print "var is purely alphabetic\n";
}

More info: See The Perl Cookbook, section 6.2 Page 165.

The obvious way of doing this isn’t good enough in the general case since it doesn’t
respect a user’s locale setting. If you need to match letters with diacritical marks, then
use something like the second example which matches against a negated character
class.

The \w regular expression matches one alphabetic character, one numeric character
or _. Therefore \W matches anything that is not one of those. The negated character
class [^\W\d_] specifies a byte which must not be a non-word character, a digit, or an
underscore. That leaves nothing but alphabetics.
How Do I … Match Only Words When Pattern Matching?

 You want to pick out words from a string.


/\S+/ # as many non-whitespace bytes as possible Probably what
/[A-Za-z'-]+/ # as many letters, apostrophes, and hyphens I would choose

/\b([A-Za-z]+)\b/ # usually best


/\s([A-Za-z]+)\s/ # fails at ends or w/ punctuation

You need to decide what you want a word to be, and then
write a pattern to detect it.

For example, is sheep-shearing a word? What about Shepherd’s?

More info: See The Perl Cookbook, section 6.3 Page 167.

What you mean by a word varies between languages. Perl doesn’t have a built-in
definition of what a word is. You must make them from character classes and
quantifiers.

There is no simple, straight-forward answer to this question, so be careful.


How Do I … Comment Regular Expressions?

 You want to comment regular expressions.


# Find duplicate words in paragraphs, possibly spanning line boundaries.
# Use /x for space and comments, /i to match both `is'
# in "Is is this ok?", and use /g to find all dups.
$/ = ""; # paragrep mode
while (<>) {
    while ( m{
            \b          # start at a word boundary
            (\w\S+)     # find a wordish chunk
            (
                \s+     # separated by some whitespace
                \1      # and that chunk again
            ) +         # repeat ad lib
            \b          # until another word boundary
        }xig
    )
    {
        print "dup word '$1' at paragraph $.\n";
    }
}

More info: See The Perl Cookbook, section 6.4 Page 168.

Use the /x modifier. This will cause the regular expression engine to ignore most
whitespace inside a regular expression and will also allow for the insertion of
comments. The allowed whitespace is space, tabs, and newlines.
How Do I … Find The Nth Occurrence Of A Match?

 You want to find the Nth match in a string, not just the first one.
Example: Find the word preceding the third occurrence of “fish”.

Input: One fish two fish red fish blue fish

Use the /g modifier in a while loop and keep count of the number of matches.

$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't `last' out of this loop
    }
}

The third fish is a red one.

Use a repetition count and a repeated pattern

/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;

More info: See The Perl Cookbook, section 6.5 Page 170.

The /g modifier creates a progressive match which can be used in a while loop. To
find the Nth match, it’s easiest to keep your own counter and then whenever you
reach the count you want, do whatever is appropriate.
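Another way to sketch this, if you don't mind collecting every match first: a /g match in list context returns all the remembered pieces at once, and you can then index into the list. The variable names here are illustrative.

```perl
use strict;
use warnings;

my $text = "One fish two fish red fish blue fish";
my $WANT = 3;

# List context plus /g: one element per match of the ( ).
my @words = $text =~ /(\w+)\s+fish\b/gi;

# Arrays count from 0, so the third match is element 2.
my $wanted = $words[ $WANT - 1 ];

print "The third fish is a $wanted one.\n";
```

This is shorter than the counting loop, but it holds every match in memory, which matters for very long strings.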
How Do I … Read Records With A Pattern Separator?

 You want to read in records separated by a pattern.

Solution: Read in the whole file and use split().

undef $/;
@chunks = split(/pattern/, <FILEHANDLE>);

An example: The input stream is a text file that consists of lines separated by
“.Ch”, “.Se”, and “.Ss”, which are codes used in troff. We want to find the text
that falls between them.

Create a localised copy of $/ which will be restored after the block finishes. By
using split with () we also get the captured separators returned in the final array.

# .Ch, .Se and .Ss divide chunks of STDIN
{
    local $/ = undef;
    @chunks = split(/^\.(Ch|Se|Ss)$/m, <>);
}
print "I read ", scalar(@chunks), " chunks.\n";

More info: See The Perl Cookbook, section 6.7 Page 176.

Example 1: (Note: $/ is Perl’s input record separator). $/ cannot be a pattern - it must
be a fixed string. To get round this we undefine $/ so that the next read operation
gets the whole of the rest of the file. Then we split that huge string using whatever
pattern we choose.
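To see what split hands back when the pattern contains ( and ), here is a minimal sketch which splits a made-up in-memory string instead of reading a file:

```perl
use strict;
use warnings;

# Hypothetical document held in a string rather than read from a file.
my $document = "Preface\n.Ch\nIntro text\n.Se\nSection text\n";

# /m lets ^ and $ match at the start and end of every line in the string.
my @chunks = split( /^\.(Ch|Se)$/m, $document );

print "I read ", scalar(@chunks), " chunks.\n";
print "separators seen: $chunks[1] $chunks[3]\n";
```

The text pieces and the captured separators alternate in the result, so every second element is a separator.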
How Do I … Read A Range Of Lines?

 You want to read all lines from one starting pattern to an ending pattern.

Solution: use the range operator with patterns

while (<>) {
    if (/BEGIN PATTERN/ .. /END PATTERN/) {
        # line falls between BEGIN and END in the
        # text, inclusive.
    }
}

Solution: use the range operator with line numbers

while (<>) {
    if ($FIRST_LINE_NUM .. $LAST_LINE_NUM) {
        # line is between the first and last line numbers,
        # inclusive. $FIRST_LINE_NUM and $LAST_LINE_NUM stand for
        # literal numbers here; with real variables write
        # ($. == $first .. $. == $last) instead.
    }
}

You don’t need to keep track of any line numbers in your code, Perl is doing it for
you.

More info: See The Perl Cookbook, section 6.8 Page 177.

Solution: Use the range operator .. Either with patterns or with line numbers.

Here’s a very interesting Perl one-liner which makes use of this feature:

perl -ne 'print if 23 .. 72' any_old_file.txt

Will print out just lines 23 to 72 of the file shown.
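A minimal sketch of both forms of the range operator. To keep it self-contained it reads from a made-up string through an in-memory filehandle (this assumes Perl 5.8 or later); the names $first and $last are illustrative.

```perl
use strict;
use warnings;

my $text = "junk\nBEGIN\nkeep 1\nkeep 2\nEND\nmore junk\n";   # hypothetical input

# Pattern form: true from the BEGIN line to the END line, inclusive.
open( my $fh, '<', \$text ) or die "cannot open in-memory file: $!";
my @kept;
while (<$fh>) {
    push @kept, $_ if /BEGIN/ .. /END/;
}
close $fh;

# Line-number form with variables: compare against $. explicitly.
my ( $first, $last ) = ( 2, 3 );
open( my $fh2, '<', \$text ) or die "cannot open in-memory file: $!";
my @numbered;
while (<$fh2>) {
    push @numbered, $_ if $. == $first .. $. == $last;
}
close $fh2;

print "pattern form kept ", scalar(@kept), " lines\n";
print "line-number form kept ", scalar(@numbered), " lines\n";
```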


How Do I … Match From Where The Last Pattern
Left Off?
 You want to match again from where the last pattern left off.
Solution: Use a combination of the /g modifier, the \G pattern anchor and the pos
function.

while (/(\d+)/g) {
    print "Found $1\n";
}

Use \G to anchor the next match to the end of any previous match.

$n = "   49 here";
$n =~ s/\G /0/g;
print $n;
00049 here

More info: See The Perl Cookbook, section 6.14 Page 190.

If you use the /g pattern modifier, the Perl regular expression engine keeps track of
its position when it finishes matching. The next time you match with /g the engine
starts looking for a match from the remembered position. This lets you use a while
loop to extract the information you want from the string.
How Do I … Match From Where The Last Pattern
Left Off?
 You want to match again from where the last pattern left off.
$_ = "The year 1752 lost 10 days on the 3rd of September";

Find all the numbers.

while (/(\d+)/gc) {
    print "Found number $1\n";
}

Now find what follows the last number.

if (/\G(\S+)/g) {
    print "Found $1 after the last number.\n";
}

Found number 1752
Found number 10
Found number 3
Found rd after the last number.

More info: See The Perl Cookbook, section 6.14 Page 190.

By default, when your match fails (say when you run out of numbers in the example
above), the remembered position is reset to the start. If you don’t want this to
happen because you want to carry on matching then use the /c modifier with /g.

This pattern:

/\G(\S+)/g

will find whatever non-whitespace characters follow the last number (rd, in this case).
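The pos function mentioned in the solution tells you where the engine has got to. A minimal sketch, assuming a shortened made-up version of the sentence above:

```perl
use strict;
use warnings;

my $text = "The year 1752 lost 10 days";   # hypothetical input
my ( $first_number, $offset, $next_word );

# A /g match in scalar context remembers where it stopped.
if ( $text =~ /(\d+)/g ) {
    $first_number = $1;
    $offset       = pos($text);   # offset just past the end of the match
}

# \G anchors the next match at that remembered position.
if ( $text =~ /\G\s*(\w+)/g ) {
    $next_word = $1;
}

print "found $first_number, stopped at offset $offset, next word is $next_word\n";
```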
How Do I … Expand And Compress Tabs?

 You want to convert the tabs in a string into the appropriate number of spaces, or
vice-versa.
# 1
while ($string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e) {
    # spin in empty loop until substitution finally fails
}

# 2
use Text::Tabs;
@expanded_lines = expand(@lines_with_tabs);
@tabulated_lines = unexpand(@lines_without_tabs);

# 3
while (<>) {
    1 while s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
    print;
}

# 4
use Text::Tabs;
$tabstop = 4;
while (<>) { print expand($_) }

More info: See The Perl Cookbook, section 1.7 Page 15.

1. Either use a funny looking substitution.

2. Or use the standard Text::Tabs module.

3. 1 while CONDITION; is the same as while (CONDITION) { } with an empty loop body.

4. Use the standard Text::Tabs module with a tab stop of 4.
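A minimal sketch of expand() from Text::Tabs on a single made-up string, so you can see how many spaces each tab turns into. $tabstop is the variable exported by the module.

```perl
use strict;
use warnings;
use Text::Tabs;

$tabstop = 8;   # exported by Text::Tabs; 8 is also its default

# Hypothetical input: the first tab starts at column 1, the second at column 10.
my $expanded = expand("a\tbc\td");

# Each tab is padded out to the next multiple of 8 columns.
print "[$expanded]\n";
print "length is ", length($expanded), "\n";
```

The first tab becomes 7 spaces and the second 6, because each one only fills up to the next tab stop.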

LAB6 - REGEXP_1
LAB6 - REGEXP_2
Scope, Pragmas, Modules, Subroutines, References

Notes:
Scope

 What do we mean by scope?


 Variables are visible from the point at which they are defined.
 Private versus Public:

Private (left example):

foreach my $pw (@password_list)
{
    my $pw_length = length( $pw );

    if ( $pw_length < 8 )
    {
        print "$pw is too short\n";
    }
}
# $pw and $pw_length don’t exist
# here

Public (right example):

my ( $pw , $pw_length );

foreach $pw (@password_list)
{
    $pw_length = length( $pw );

    if ( $pw_length < 8 )
    {
        print "$pw is too short\n";
    }
}
# $pw and $pw_length do exist
# here

Scope means whether a variable is temporary/permanent and private/public.

By default (if you do nothing at all) Perl’s variables are global and permanent (Later
we’ll see that these are called package variables). Makes writing short programs very
easy, but they can be difficult to debug.

In both cases we have forced all variables to be declared before they are used (using
my) - that doesn’t affect the code. The point is that in the left example $pw and
$pw_length only exist in this piece of code. In the right example the same two
variables exist after the code is finished executing.

Subroutine declarations are global declarations - wherever you place them they are
visible to all code in your package.
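A minimal sketch showing that a variable declared with my inside a loop really is gone afterwards. It assumes a made-up password list, and uses a string eval purely to catch the compile-time error that use strict raises.

```perl
use strict;
use warnings;

my @password_list = qw(secret correcthorsebattery abc);   # hypothetical data
my @too_short;

foreach my $pw (@password_list) {
    my $pw_length = length($pw);
    push @too_short, $pw if $pw_length < 8;
}

# $pw_length is out of scope here. Mentioning it is a compile-time error
# under use strict, so the string eval fails and returns undef.
my $compiled = eval q{ my $n = $pw_length; 1 };
my $error    = $@;

print "too short: @too_short\n";
print "using \$pw_length after the loop failed to compile\n" unless $compiled;
```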
Pragmas

 A special kind of module that affects how your program is compiled.


 Invoked by a use or a no.
 Example:
use strict;
use integer;
{
no strict 'refs'; # allow symbolic references
no integer; # resume floating point arithmetic
# ....
}

Notes:
Pragmas

 use constant;

use constant BUFFER_SIZE => 4096;


use constant ONE_YEAR => 365.2425 * 24 * 60 * 60;
use constant PI => 4 * atan2 1, 1;
use constant DEBUGGING => 0;
use constant ORACLE => 'oracle@cs.indiana.edu';
use constant USERNAME => scalar getpwuid($<);
use constant USERINFO => getpwuid($<);

sub deg2rad { PI * $_[0] / 180 }

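print "This line does nothing" unless DEBUGGING;

In the form shown here each use constant statement defines one constant. From Perl
5.8 onwards you can define several at once by passing a hash reference, e.g.
use constant { DEBUGGING => 0, BUFFER_SIZE => 4096 };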

By convention all constants are defined in upper-case.


Pragmas

 use integer;
use integer;
$x = 10/3;
# $x is now 3, not 3.33333333333333333

use integer;
$x = 1.8;
$y = $x + 1;
$z = -1.8;

This pragma tells the compiler to use integer arithmetic only from now to the end of
the enclosing block.

In the second example you’ll be left with $x == 1.8, $y == 2 and $z == -1. The case
for $z is special since the - sign in front of the 1.8 counts as an operation (unary
minus) so the value of 1.8 is truncated to 1 before its sign bit is flipped.
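A minimal sketch showing that use integer only applies inside the block where it appears. The variable names are illustrative.

```perl
use strict;
use warnings;

my $int_result;
{
    use integer;
    $int_result = 10 / 3;      # integer division inside this block
}
my $float_result = 10 / 3;     # back to floating point outside the block

print "inside the block:  $int_result\n";
print "outside the block: $float_result\n";
```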
Pragmas

 use lib;
#!/usr/bin/perl -w

use lib ( "/design/analog/software/Modules" );

use strict;
use Carp;
use English;

use My_Constants;
use Netlist_Functions;

use Mosfet;
use Capacitor;
use Resistor;
use Diode;
use Instance;

Contents of /design/analog/software/Modules:

My_Constants.pm
Netlist_Functions.pm
Mosfet.pm
Capacitor.pm
Resistor.pm
Diode.pm
Instance.pm

This is used to modify the list of places in which Perl will look to find library modules.
It’s roughly equivalent to adding to your Unix $path variable.

The strict, Carp and English modules are all standard Perl modules. Perl always knows
how to find these.

The modules My_Constants, Netlist_Functions, Mosfet, Capacitor, Resistor, Diode and
Instance are all imported from our user-defined directory.

Parameters to use lib; are prepended to Perl’s search path.
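A small sketch to confirm what use lib does to the search path. It reuses the directory name from the slide; the directory does not need to exist for the example to run.

```perl
use strict;
use warnings;
use lib "/design/analog/software/Modules";   # path taken from the slide

# use lib runs at compile time, so @INC already contains the directory here.
my $found = grep { $_ eq "/design/analog/software/Modules" } @INC;

print "Search path:\n";
print "  $_\n" for @INC;
```

Printing @INC is a quick way to debug a "Can't locate Module.pm in @INC" error.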


Pragmas

 use strict;
use strict; # Install all three strictures.

use strict "vars"; # Variables must be predeclared.


use strict "refs"; # Can't use symbolic references.
use strict "subs"; # Bareword strings must be quoted.

use strict; # Install all...


no strict "vars"; # ...then renege on one.

use strict 'subs';

$x = whatever; # WRONG: bareword error!


$x = whatever(); # This always works, though.

sub whatever; # Predeclare function.


$x = whatever; # Now it's ok.

This pragma changes what Perl considers to be legal code. Sometimes these
strictures seem too strict for casual programming - until you spend an hour looking
for a bug which wouldn’t have happened if you’d used this pragma.

There are three things we can be strict about: subs, vars, and refs.

Symbolic references are suspect for a lot of reasons - it’s pretty easy to use one even
when you don’t mean to. With this stricture in effect you can only use real or hard
references. So, what are symbolic references?

Strict vars will trigger a compile time error if you attempt to access a variable which
has not met one of the following criteria:

1. Predefined by Perl itself (i.e. a built-in variable).
2. Declared with our (for a global) or my (for a lexical).
3. Imported from another package.
4. Fully qualified using its package name and the :: package separator.
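The criteria above can be sketched in a few lines. The variable names are invented for illustration, and the import case (criterion 3) is left out because it needs a second file.

```perl
use strict;
use warnings;

our $package_var = "declared with our";     # criterion 2 (global)
my  $lexical_var = "declared with my";      # criterion 2 (lexical)
$main::qualified = "fully qualified";       # criterion 4

my @report = (
    "program name: $0",                     # criterion 1: $0 is built in
    $package_var,
    $lexical_var,
    $main::qualified,
);
print "$_\n" for @report;

# Uncommenting the next line stops compilation under use strict:
# $undeclared = 1;   # Global symbol "$undeclared" requires explicit package name
```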
Standard Modules

 Carp - Report errors from a user’s perspective.


 Cwd - Finds the current working directory.
 English - Allows use of English variable names.
 Exporter - Determines what a module exports.

 There are lots of other modules - see Chapter 32 of “Programming Perl”.

Carp lets you report errors from the perspective of a user, so if a user fails to use
your modules correctly, the error messages will show up not as problems in your code
(which of course you’ve thoroughly debugged), but in the user’s code. In other words
this is a blame shifter.

Cwd is a module which lets you find out the current working directory - for Unix this
isn’t too useful since you can always use $cwd = `pwd`; However, this is guaranteed
to work on all systems where Perl is installed even when they don’t have a shell
function which will let them do $cwd = `pwd`;

English lets you use English names instead of the standard Perl names for built-in
variables.

Exporter is used with modules to determine what subroutines can be seen from the
outside of the module.
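A hedged sketch of Carp and Cwd in use. The package Maths and its subroutine half() are hypothetical; croak() and getcwd() are the real functions exported by Carp and Cwd.

```perl
use strict;
use warnings;
use Cwd;

package Maths;                       # hypothetical module-style package
use Carp;

sub half {
    my ( $number ) = @_;
    croak( "half() needs a number" ) unless defined $number;
    return $number / 2;
}

package main;

my $dir = getcwd();                  # portable replacement for `pwd`
print "Working directory: $dir\n";

my $result = eval { Maths::half() }; # deliberately misuse the subroutine
my $error  = $@ || "";
print "Caught: $error";              # reported against the calling code, not Maths
```

The eval block is only there so the example can show the message and carry on; in a real script the croak would normally end the program.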
Subroutines

 Syntax:
 To declare a named subroutine without defining it do one of these:
sub NAME;
sub NAME PROTO;
sub NAME ATTRS;
sub NAME PROTO ATTRS;

 To declare and define a named subroutine, add a BLOCK:
sub NAME BLOCK
sub NAME PROTO BLOCK
sub NAME ATTRS BLOCK
sub NAME PROTO ATTRS BLOCK

This all looks pretty complicated - but this is normally how we do things:
sub say_hello {
print "Hello world.\n";
}

say_hello();

A subroutine is a small self-contained sub-program. It is invoked by its name, it may
have arguments passed to it and it can return a scalar or a list value. It’s defined
using the sub keyword followed by the subroutine code in {}.

Subroutines can be defined anywhere in your program, loaded in from other files via
do, require or use, or generated at run time with eval. You can call a subroutine
directly, indirectly through a variable containing either its name or a reference to the
subroutine, or through an object letting the object determine which subroutine should
really be called.

To create an anonymous subroutine just leave out the name. PROTO and ATTRS
stand for prototype and attributes respectively - they’re not so important. NAME and
BLOCK are essential even when they’re missing. For forms without the name you
need to have some way to call the subroutine, so do this:

$subref = sub BLOCK;

And then later on you can say:

&$subref;
Subroutines

 The function return causes execution of the subroutine to finish.


 The value specified after the return is returned as the result.
 Using a return statement is optional (but it shouldn’t be).
 If one isn’t used, then the value returned is the value of the last statement
executed.
@sorted = dictionary_order( "eat" , "at" , "Joes" );
@sorted = dictionary_order( @unsorted );
@sorted = dictionary_order( @sheep , @goats , "shepherd" , $goatherd );

sub get_next { return <>; }

prompt(); # always okay since ()
$next = get_next(); # always okay since ()

prompt; # not a call - definition not seen yet (an error under use strict)
$next = get_next; # okay: get_next definition already seen

sub prompt { print "next> "; }

Just as in previous examples, the lists passed to a subroutine are all flattened. So the
third call to dictionary_order would contain the contents of the array @sheep,
followed by the contents of the array @goats, the value of “shepherd” and finally
the scalar value stored in $goatherd.

It is possible to pass two or more arrays to a subroutine and have them maintain their
integrity (i.e. keep them unflattened) by passing references to the arrays.
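A quick way to see flattening is to count the arguments that arrive. This is a sketch with a hypothetical count_args() helper and made-up array contents.

```perl
use strict;
use warnings;

sub count_args { return scalar @_; }    # how many arguments arrived?

my @sheep = ( "Dolly" , "Shaun" );
my @goats = ( "Billy" );

my $flattened = count_args( @sheep , @goats , "shepherd" );   # one flat list
my $by_ref    = count_args( \@sheep , \@goats );              # two references

print "$flattened $by_ref\n";
```

The first call sees four scalars with no record of where one array ended and the next began; the second sees exactly two references.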

If the subroutine does not require arguments then it can be passed an empty
argument list. The list can also be missed completely as long as Perl knows it’s a
subroutine.

Like variables, subroutines have a leading symbol which indicates what they are. The
name of a subroutine is preceded by an & which may be used when calling it. It must
be used when calling a subroutine in certain contexts (we’ll see these in a minute). It
can’t be used when defining the subroutine however. So this won’t work:

sub &dictionary_order # FATAL Compile Time Error


{
return sort @_;
}
Other Ways To Call Subroutines

 Subroutines which have been defined earlier can be called without “(“ and “)”.

sub make_sequence # from, to, step_size


{
@list = ();
for ( $n = $_[0] ; $n < $_[1] ; $n += $_[2] )
{
push @list , $n;
}
return @list;
}

@stepped_sequence = make_sequence $min , $max , $step_size;

&my_subroutine; # Means my_subroutine( @_ );


my_subroutine; # Means my_subroutine();

Arguments passed to a subroutine are available via the @_ array.

Example 1: A subroutine already defined can be called without the “(“ and “)” around
the argument list.

Example 2: Another way to call a subroutine is to use the & prefix but without passing
any arguments. In this case the subroutine has the value of the @_ array passed to it
instead. This is used to call subroutines from within other subroutines. This is almost
never used in new code but may be present in old code. Always use subroutines as
shown in the style section of this course.
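The difference between the two calling forms can be sketched like this. All three subroutine names are hypothetical.

```perl
use strict;
use warnings;

sub show_args      { return join( "," , @_ ); }

sub with_ampersand { return &show_args; }     # callee sees the caller's @_
sub with_parens    { return show_args(); }    # callee sees an empty @_

my $shared = with_ampersand( 1 , 2 , 3 );
my $empty  = with_parens( 1 , 2 , 3 );

print "shared=[$shared] empty=[$empty]\n";
```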
Named Subroutine Arguments

 Suppose we had a subroutine which took a lot of arguments:

ls( "*" , "any" , 1 , 1 , 0 , 0 , "alpha" , 4 , 1 );

ls( undef, undef , 1 , 1 , undef , undef , undef , 4 , 1 );

ls( cols => 1 , pages => 4 , width => 80 );

sub ls
{
    %arg = @_; # convert a list to a hash

    $arg{ pages } = "*" unless exists $arg{ pages };
    $arg{ cols } = 1 unless exists $arg{ cols };

    #etc
}

Example 1: You don’t want to pass 9 arguments to this subroutine when only a few
are going to change.

Example 2: You could arrange that passing undef as a parameter chooses a default
value but we’d still have to write a long piece of code as shown.

Example 3: Perl supports named parameters by passing a list of name/value pairs (in
effect a hash) to a subroutine rather than a positional list. We can use the => operator
to associate a name with each argument. Inside a subroutine we initialise a hash with the
contents of the @_ array. This documents the call better and since the entries of a hash can
be initialised in any order we don’t need to remember the order of parameters in the call.
Named Subroutine Arguments (Continued)
# Set up some defaults
%std_listing = ( cols => 2 , pages => 4 );

# Use the defaults
ls ( files => "*.txt" , %std_listing );
ls ( files => "*.log" , %std_listing );
ls ( files => "*.hlp" , %std_listing );

# Override some of the defaults
ls ( files => "*.dat" , %std_listing , cols => 8 );

In the first example we set up some default values for some arguments.

In the second set of examples we use the standard set of parameters.

In the third example we use a default set of arguments and then override some of
that standard set as well.
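Overriding works because later pairs in a list replace earlier pairs with the same key when the list is assigned to a hash. A sketch of an ls() that relies on this, with default values chosen only for illustration:

```perl
use strict;
use warnings;

sub ls {
    # Defaults first, caller's pairs last, so the caller always wins.
    my %arg = ( files => "*" , cols => 1 , pages => 1 , @_ );
    return "files=$arg{files} cols=$arg{cols} pages=$arg{pages}";
}

my %std_listing = ( cols => 2 , pages => 4 );

my $plain  = ls();
my $custom = ls( files => "*.dat" , %std_listing , cols => 8 );

print "$plain\n$custom\n";
```

Putting the defaults at the front of the hash initialiser avoids a long run of unless exists tests.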
Aliasing Of Parameters - Pass By Reference

#!/usr/bin/perl -w

use strict;

my $line = "Mary had a little";
my $animal = "lamb";

Print_Rhyme( $line , $animal ); # prints "Mary had a little lamb"
Print_Rhyme( $line , $animal ); # prints "Mary had a little dog"

exit;

sub Print_Rhyme # Parameters passed in @_ as aliases
{
    print $_[0] . " " . $_[1] . "\n";
    $_[1] = "dog";

    return 0;
}

In this code we pass the parameters in @_ (this is always true) and use them in the
subroutine as aliases. Therefore when we change the value of one or more of the
parameters in the subroutine we are actually changing them in the calling code as
well.

Therefore

$_[1] = "dog";

has the same effect as assigning "dog" to $animal in the calling code.

This is nearly always *NOT WHAT YOU WANT*
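For completeness, here is a sketch of one of the few cases where the aliasing is wanted: a subroutine whose whole purpose is to modify its arguments. The name trim_in_place is hypothetical.

```perl
use strict;
use warnings;

sub trim_in_place {
    for my $arg ( @_ ) {      # $arg aliases each element of @_, and so the caller's variable
        $arg =~ s/^\s+//;     # strip leading whitespace
        $arg =~ s/\s+$//;     # strip trailing whitespace
    }
    return;
}

my $name = "   Mary   ";
trim_in_place( $name );
print "[$name]\n";
```

Passing a literal string instead of a variable would fail at run time, since a constant cannot be modified; this is part of why the copying style is the safer default.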


Aliasing Of Parameters - Pass By Value

#!/usr/bin/perl -w

use strict;

my $line = "Mary had a little";
my $animal = "lamb";

Print_Rhyme( $line , $animal ); # prints "Mary had a little lamb"
Print_Rhyme( $line , $animal ); # prints "Mary had a little lamb"

exit;

sub Print_Rhyme # Parameters passed in @_ and copied
{               # into local variables
    my ( $line , $animal ) = @_;

    print $line . " " . $animal . "\n";

    $animal = "dog"; # This change is isolated to the Print_Rhyme subroutine.

    return 0;
}

In this code we pass the parameters in @_ (this is always true) and use them in the
subroutine as values by copying them into local variables. Therefore when we change
the value of one or more of the parameters in the subroutine the change is restricted
to the values of the local variables in the subroutine. Therefore the assignment:

$animal = “dog”;

has no effect on the calling code - it is localised in the Print_Rhyme subroutine.

This is the way you should use subroutines.


A Standard Way Of Using Subroutines
sub _interpolate_value
{
my ( $t1 , $v1 , $t2 , $v2 , $time ) = @_;

croak( "No t1 value in Waveform::_interpolate_value()" ) unless defined( $t1 );


croak( "No v1 value in Waveform::_interpolate_value()" ) unless defined( $v1 );

croak( "No t2 value in Waveform::_interpolate_value()" ) unless defined( $t2 );


croak( "No v2 value in Waveform::_interpolate_value()" ) unless defined( $v2 );

croak( "No time in Waveform::_interpolate_value()" ) unless defined( $time );

if ( $t1 == $time ) { return( $v1 ); }


if ( $t2 == $time ) { return( $v2 ); }

my $delta_t = $t2 - $t1;


my $delta_v = $v2 - $v1;

croak ( "Error - divide by zero in Waveform::_interpolate_value()" ) if ( $delta_t == 0 );

my $dv_by_dt = $delta_v/$delta_t;

my $interpolated_value = $v1 + ( $time - $t1 ) * $dv_by_dt;

return( int $interpolated_value );


}

Elements of the @_ array are special. They are not copies of the actual arguments.
They are aliases to the actual arguments.

If values $_[0], $_[1] etc. are changed then the argument in the calling routine is
changed, i.e the parameters in this case are passed by reference.

This behavior is useful but can lead to hard to find bugs.

Passing by value is the more usual form, so explicitly copy the @_ array into a new
list of variables and, to be doubly safe, declare the receiving variables with my().

The above code is a fragment of an object-oriented program. The _ at the front of
the subroutine name is a convention for internal subroutines in OO code - it’s a
subroutine called only from within the object. croak() is a subroutine made available with:

use Carp;

It corresponds to die(), but reports the error from the caller’s perspective.
Subroutine Calling Context

 When a subroutine is called it is possible to detect whether it was expected to
return a scalar, a list or nothing at all.
 The contexts in which a subroutine is called are:

ls ( @files ); # void context: no return value expected

$listing = ls( @files ); # scalar context: scalar return value expected

@missing = ls( @files ); # list context: list return value expected


($f1 , $f2 ) = ls( @files ); # list context: list return value expected

print ( ls( @files ) ); # list context: list return value expected

The information about the calling context is obtained from the wantarray function.

The function returns:

undef (false and undefined) if the subroutine was called in void context.
"" (false and defined) if the subroutine was called in scalar context.
1 (true and defined) if the subroutine was called in list context.

We could use this information to decide what value a subroutine needs to return.
Subroutine Prototypes

 Subroutines can be defined with a prototype.


 A series of specifiers which restrict the type and number of arguments.

sub add_two_param ( $$ )
{
return( $_[0] + $_[1] );
}

 The prototype is the ( $$ ) part.


 This restricts the arguments to be two scalars.
 But note - if you pass an array then it is evaluated in scalar context - i.e. the two
scalars will be the lengths of the arrays.
 See perlsub man pages.
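The coercion described above is easy to miss, so here is a short sketch using the add_two_param subroutine from the slide. The array contents are made up.

```perl
use strict;
use warnings;

sub add_two_param ($$)       # the prototype must be seen before the calls below
{
    return( $_[0] + $_[1] );
}

my @three = ( 10 , 20 , 30 );
my @two   = ( 40 , 50 );

my $plain   = add_two_param( 3 , 4 );          # two ordinary scalars
my $lengths = add_two_param( @three , @two );  # each array is taken in scalar context

print "$plain $lengths\n";
```

The second call adds the element counts (3 and 2), not the elements, and Perl gives no warning about it.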

Notes:
How Do I … Access Subroutine Arguments

 You have written a function and want to access the arguments passed by its caller.
# Solution
sub hypotenuse {
    return sqrt( ($_[0] ** 2) + ($_[1] ** 2) );
}

# Invoke like this
$diag = hypotenuse(3,4); # $diag is 5

# Better version with private variables
sub hypotenuse {
    my ($side1, $side2) = @_;
    return sqrt( ($side1 ** 2) + ($side2 ** 2) );
}

More info: See The Perl Cookbook, section 10.1 Page 335.

All values passed as arguments are in the special array @_. So the first argument is in
$_[0] and so on. The number of arguments is scalar(@_).

Subroutines should always start by copying the arguments into new private variables.

To return a value from a subroutine use the return function. If there is no return
statement, then the value returned by the subroutine is the value of the last
statement executed by the subroutine.
How Do I … Make Variables Private To A Function

 You want to use temporary variables in your function.


# Solution: Use my to declare variables private to the subroutine.
sub somefunc {
    my $variable;
    my ($another, @an_array, %a_hash);
    # ...
}

# You can combine my variables with an assignment
my ($name, $age) = @ARGV;
my $start = fetch_time();

# Declare some variables
my ($a, $b) = @pair;
my $c = fetch_time();

sub check_x {
    # $x and $y private to this function
    my $x = $_[0];
    my $y = "whatever";
    run_check(); # run_check() can’t see $x or $y
    if ($condition) {
        print "got $x\n";
    }
    # However, check_x can see $a, $b and $c
    # since they are defined in the same scope
}

More info: See The Perl Cookbook, section 10.2 Page 337.

$variable is only visible and accessible within the function somefunc().

When you declare more than one private variable in a single my you must enclose them
in parentheses, like this:

my ($another, @an_array, %a_hash);

Variables declared with my have lexical scope, which means that they only exist
within a certain textual area of your code. Such a variable is destroyed when the body
of code is ended. Usually the body of code is a block with braces around it like this:

{
# Your Code Here
}

Since a lexical scope is usually a block you will often hear the phrase lexical variables
being only visible within their block.
How Do I … Create Persistent Private Variables

 You want a variable to retain its value between calls to a subroutine but not to be
visible outside that subroutine.
# Solution: Wrap the function in another block and declare my
# variables in the block’s scope rather than the function’s.
{
    my $variable;
    sub mysub {
        # ... accessing $variable
    }
}

# Use a BEGIN block if you need to perform initialisation
BEGIN {
    my $variable = 1; # initial value
    sub othersub {
        # ... accessing $variable
    }
}

# By default the initial value in $counter is undef, which is
# treated as zero the first time next_counter() is called
{
    my $counter;
    sub next_counter { return ++$counter }
}

# Do this to initialise to anything other than 0
BEGIN {
    my $counter = 42;
    sub next_counter { return ++$counter }
    sub prev_counter { return --$counter }
}

More info: See The Perl Cookbook, section 10.3 Page 339.

Lexical variables don’t need to vanish when their scope ends. If something more
permanent is still aware of the lexical then it will be maintained. (Perl does this by
reference counting).
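A lexical staying alive after its scope has ended is what makes closures possible. A minimal sketch, with the hypothetical name make_counter:

```perl
use strict;
use warnings;

sub make_counter {
    my ( $start ) = @_;
    my $count = $start;               # lexical, private to this call
    return sub { return $count++; };  # the anonymous sub keeps $count alive
}

my $from_five = make_counter( 5 );
my $from_ten  = make_counter( 10 );

# Each counter has its own private $count.
my @seen = ( $from_five->() , $from_five->() , $from_ten->() );
print "@seen\n";
```

Each call to make_counter() creates a fresh $count, so the two counters do not interfere with each other.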
How Do I … Detect Return Context

 You want to return a value that depends upon the calling context.

# Solution: Use wantarray()
if (wantarray()) {
    print "In list context\n";
    return @many_things;
} elsif (defined wantarray()) {
    print "In scalar context\n";
    return $one_thing;
} else {
    print "In void context\n";
    return; # nothing
}

mysub(); # void context

$a = mysub(); # scalar context
if (mysub()) { } # scalar context

@a = mysub(); # list context
print mysub(); # list context

More info: See The Perl Cookbook, section 10.6 Page 344.

wantarray() returns one of three things depending on how the function was called.

A function can decide what context it was called in and then return something which
is appropriate to that context.

List context is indicated by a true return value.

Scalar context is indicated by a false return value which is defined.

Void context is indicated by an undef return value.
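The three cases can be checked with a small self-contained sketch. The subroutine context() and the variable $last_context are hypothetical; the variable exists only so that the void call leaves a trace.

```perl
use strict;
use warnings;

my $last_context;

sub context {
    $last_context = wantarray()         ? "list"
                  : defined wantarray() ? "scalar"
                  :                       "void";
    return $last_context;
}

my @in_list   = context();   my $seen_list   = $last_context;
my $in_scalar = context();   my $seen_scalar = $last_context;
context();                   my $seen_void   = $last_context;

print "$seen_list $seen_scalar $seen_void\n";
```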


References

 Two kinds of references:


 Hard (real - a bit like pointers in C, C++).
 Symbolic (use the name of one thing to access some other thing).

 Allows a variable or a subroutine to be accessed indirectly.


 A reference is not a variable - it’s a means of accessing a variable.
 To create a reference we use the \ operator.
 This takes an ordinary variable and returns a reference to it, like this:

$ref_to_scalar = \$my_scalar;
$ref_to_array = \@my_array;
$ref_to_hash = \%my_hash;
$ref_to_sub = \&my_sub;

We are going to discuss hard references here and symbolic references (only in
passing) at the end of this section. When we say references we will always mean a
hard reference.

Once we have a reference, we can get at the thing it refers to by prefixing the
reference (optionally in { and }) with the appropriate symbol.

To refer to $my_scalar we write one of these:

${\$my_scalar};
$$ref_to_scalar;
${$ref_to_scalar};

So we can access @my_array like this:

@{\@my_array};
@$ref_to_array;
@{$ref_to_array};

and so on. If you prefix a reference by the wrong symbol then you’ll get an error.
References

 Accessing the elements of an array or hash through a reference:

# This is a bit messy
$a = ${ $hash_ref }{ "first" };
${$array_ref}[0] = $h{ "first" };

# But this is better
$a = $hash_ref->{ "first" };
$array_ref->[0] = $h{ "first" };

The arrow operator takes a reference on its left and either an array index in [] or a
hash key in {} on its right. It locates the array or hash that the reference refers to
and then accesses the appropriate element.
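A short sketch showing that both notations reach the same underlying data. The variable names and values are made up.

```perl
use strict;
use warnings;

my @primes = ( 2 , 3 , 5 );
my %age    = ( tom => 30 );

my $array_ref = \@primes;
my $hash_ref  = \%age;

push @{$array_ref} , 7;              # modifies @primes itself
$hash_ref->{ ann } = 25;             # modifies %age itself

my $count  = scalar @$array_ref;     # number of elements in @primes
my $third  = $array_ref->[2];        # arrow notation
my $same   = ${$array_ref}[2];       # block notation, same element

print "$count $third $same $age{ann}\n";
```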
References And The ref() Function

If $reference contains:                  Then ref( $reference ) returns:

A scalar value                           "" (the empty string)
A reference to a scalar                  "SCALAR"
A reference to an array                  "ARRAY"
A reference to a hash                    "HASH"
A reference to a subroutine              "CODE"
A reference to a filehandle              "IO" or "IO::Handle"
A reference to a typeglob                "GLOB"
A reference to a precompiled pattern     "Regexp"
A reference to another reference         "REF"

Object references are missing from the above list because ref() applied to a
reference to an object returns the name of the class the object belongs to. This, of
course, changes as you use different objects.

Because dereferencing a reference with the wrong prefix can cause errors it’s
sometimes necessary to be able to figure out what kind of referent a specific
reference is referring to.

The built-in ref() function takes a scalar value and returns a description of the kind of
reference it contains.

If a reference is used where a string is expected then the ref function is called
automatically to produce a string and a unique hex address representing the internal
memory address of the referent is appended. This means that printing out a reference
usually produces something like:

HASH(0x10027588)

If you use the ref() function on an object, this will be returned:

my $graphics_object = Polygon->new( 0 , 0 , 5 , 5 , 10 , 32 , 70 , 10 , 12 , 18 ); # Polygon coordinates
print ref( $graphics_object ); # Will print "Polygon"
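One common use of ref() is to decide how to treat an argument before dereferencing it. A sketch, with a hypothetical describe() subroutine:

```perl
use strict;
use warnings;

sub describe {
    my ( $thing ) = @_;
    my $type = ref( $thing );

    return "plain scalar"                                       if $type eq "";
    return "array of " . scalar( @{$thing} ) . " elements"      if $type eq "ARRAY";
    return "hash with " . scalar( keys %{$thing} ) . " keys"    if $type eq "HASH";
    return "subroutine"                                         if $type eq "CODE";
    return "other reference ($type)";
}

my @descriptions = map { describe( $_ ) }
                   ( 42 , [ 1 , 2 , 3 ] , { a => 1 } , sub { 1 } , \"text" );

print "$_\n" for @descriptions;
```

Checking the type first means the subroutine never dereferences with the wrong prefix.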
References And Anonymous Arrays

 References are useful in creating multi-dimensional arrays:

@table = (
    ( 1 , 2 , 3 ) ,
    ( 4 , 5 , 6 ) ,    # This won’t work!
    ( 7 , 8 , 9 ) ,
);

# It is flattened to:
@table = ( 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 );

@row1 = ( 1 , 2 , 3 );
@row2 = ( 4 , 5 , 6 );
@row3 = ( 7 , 8 , 9 );

@cols = ( \@row1 , \@row2 , \@row3 );

$table = \@cols;

[Diagram: $table refers to @cols, whose three elements \@row1, \@row2 and \@row3 refer to
@row1 (1 2 3), @row2 (4 5 6) and @row3 (7 8 9).]

The first example doesn’t work because of list flattening. So we need to use
references to solve this problem.

Each element in a Perl array can store a scalar, and a reference is a scalar (albeit a
special kind of scalar).

The bottom half of the slide shows how to set this up using references. The elements
of the rows can be accessed using the arrow -> notation.

$table->[1]->[2];

This means: find the array referred to by the reference in $table (i.e. @cols) and then
get the element at index 1. That element stores a reference (a reference to @row2),
then get the element at index 2.

What’s the result?

This is a popular way of creating data structures so Perl provides some simple
assistance. If we place the list values in [] instead of () we create a reference to a
nameless (or anonymous) array. The array is automatically initialised to the specified
values.
References And Anonymous Arrays

 References are useful in creating multi-dimensional arrays:

@table = (
    ( 1 , 2 , 3 ) ,
    ( 4 , 5 , 6 ) ,    # This won’t work!
    ( 7 , 8 , 9 ) ,
);

$table = [
    [ 1 , 2 , 3 ] ,
    [ 4 , 5 , 6 ] ,    # But this will!
    [ 7 , 8 , 9 ] ,
];

The bottom example is identical to the data structure we set up on the previous page
except that all the internal arrays are anonymous - so you can’t access @cols or
@rows. The only access to the array elements is via the reference to the overall table.

As a final piece of help, in any expression like:

print $table->[$x]->[$y];

Any arrow between a closing square or curly bracket and an opening square or curly
bracket can be removed. So the above can be rewritten like this:

print $table->[$x][$y];

which is much neater.
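A sketch of walking the whole table, assuming the same 3 x 3 layout as the slide. The row-sum calculation is only there to give the loop something to do.

```perl
use strict;
use warnings;

my $table = [
    [ 1 , 2 , 3 ] ,
    [ 4 , 5 , 6 ] ,
    [ 7 , 8 , 9 ] ,
];

my $element = $table->[1][2];        # row index 1, column index 2

my @row_sums;
for my $row ( @{$table} ) {          # each $row is a reference to an inner array
    my $sum = 0;
    $sum += $_ for @{$row};
    push @row_sums , $sum;
}

print "element=$element sums=@row_sums\n";
```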


References To Hashes
%association = ( cat => "nap" , dog => "gone" , mouse => "ball" );

$association = { cat => "nap" , dog => "gone" , mouse => "ball" };

$behave =
{
    cat => { nap => "lap" , eat => "meat" } ,
    dog => { prowl => "growl" , pool => "drool" } ,
    mouse => { nibble => "cheese" } ,
};

print "Cats eat " , $behave->{cat}->{eat};
print "Cats eat " , $behave->{cat}{eat};

Like the [] array constructor the {} hash constructor creates a reference which must
be assigned to a scalar variable ($association), not to a hash (%association). Like the
array reference, the values in the hash are only accessible via the hash reference:

print $association->{ cat };

You can even nest hashes as well.

Just like arrays, any -> between } and { can be omitted.
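Nested hashes are usually traversed with one loop per level. A sketch using the $behave structure from the slide; the output format is arbitrary.

```perl
use strict;
use warnings;

my $behave = {
    cat   => { nap => "lap" , eat => "meat" } ,
    dog   => { prowl => "growl" , pool => "drool" } ,
    mouse => { nibble => "cheese" } ,
};

my @lines;
for my $animal ( sort keys %{$behave} ) {
    for my $action ( sort keys %{ $behave->{$animal} } ) {
        push @lines , $animal . " " . $action . " " . $behave->{$animal}{$action};
    }
}

print "$_\n" for @lines;
```

The keys are sorted only to make the output order predictable; hashes themselves have no defined order.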


How Do I … Return More Than One Array Or Hash

 You want to return more than one array or one hash.

# Solution: Return references to the hashes or arrays
($array_ref, $hash_ref) = somefunc();

sub somefunc {
    my @array;
    my %hash;

    # ...

    return ( \@array, \%hash );
}

sub fn {
    # ...
    return (\%a, \%b, \%c); # or
    return \(%a, %b, %c); # same thing
}

More info: See The Perl Cookbook, section 10.9 Page 347.

Just as all lists are flattened when multiple lists are passed to a function, the same
happens with lists returned from functions with the return statement. Therefore to
maintain the integrity of the arrays and hashes which are returned from a function,
the arrays and hashes must be returned as references.
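A runnable sketch of the same pattern, showing how the caller uses what comes back. The subroutine sorted_and_counts() and its input are invented for the example.

```perl
use strict;
use warnings;

sub sorted_and_counts {
    my @numbers = @_;
    my @sorted  = sort { $a <=> $b } @numbers;
    my %count;
    $count{ $_ }++ for @numbers;         # how often each value occurs
    return ( \@sorted , \%count );       # two references, nothing flattened
}

my ( $sorted_ref , $count_ref ) = sorted_and_counts( 3 , 1 , 3 , 2 );

print "smallest=$sorted_ref->[0] threes=$count_ref->{3}\n";
```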
Creating Data Structures

 Suppose you write this as the first line of your program:

$sue{ children }->[1]->{ age } = 10;

 That’s pretty minimalist (and neat).

Perl creates a hash called %sue, gives it a new hash element indexed by the string
children, points that to a newly allocated array whose second entry is made to
refer to a newly allocated hash which gets an entry indexed by the string age.
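The structure that Perl builds behind the scenes can be inspected afterwards. A sketch, declaring %sue with my so that it runs under use strict:

```perl
use strict;
use warnings;

my %sue;
$sue{ children }->[1]->{ age } = 10;     # every intermediate level springs into existence

my $kind          = ref $sue{ children };                        # what was created?
my $size          = scalar @{ $sue{ children } };                # index 1 implies two slots
my $first_defined = defined $sue{ children }[0] ? "yes" : "no";  # slot 0 was never filled
my $age           = $sue{ children }[1]{ age };

print "$kind $size $first_defined $age\n";
```

This automatic creation is convenient when building structures, but it also means a mistyped key silently creates a new branch rather than raising an error.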
References To Subroutines

 Anonymous subroutines can be created like this:

sub { print "Hello $_[0]\n"; }

 The above is useless since there’s no way to execute the subroutine, so do this:

$sub_ref = sub { print "Hello $_[0]\n"; };

 We can then call this:

$sub_ref->( "Steve" );

Notes: The “;” at the end of the second example is required since the whole line is a
statement.

The third example executes the code in the subroutine reference. We need to pass a
parameter to the subroutine and this is done by enclosing it between “(” and “)”.
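Subroutine references are often stored in a hash so that code can be chosen by name at run time. A sketch of such a dispatch table; the operation names and the calculate() wrapper are hypothetical.

```perl
use strict;
use warnings;

my %operation = (
    add      => sub { return $_[0] + $_[1]; } ,
    multiply => sub { return $_[0] * $_[1]; } ,
);

sub calculate {
    my ( $name , @args ) = @_;
    my $code = $operation{ $name }
        or die "Unknown operation: $name\n";
    return $code->( @args );             # call through the reference
}

my $sum     = calculate( "add" , 2 , 3 );
my $product = calculate( "multiply" , 4 , 5 );

print "$sum $product\n";
```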
Passing Subroutine Arguments As References
sub mysub
{
    # Arrays are references, counts are scalars
    # (might be useful to prefix references with ref_)

    my ( $array1 , $count1 , $array2 , $count2 ) = @_;

    my $item1 = $array1->[ $count1 ];
    my $item2 = $array2->[ $count2 ];

    # Suppose $item1 = 15 and $item2 = 36

    return( $item1 , $item2 );
}

# Call the above like this (assumes arrays and counts already set up)

my ( $r1 , $r2 ) = mysub( \@array1 , $count1 , \@array2 , $count2 );

print $r1 , $r2;

# prints 15 and 36

References provide a way of passing unflattened arrays or hashes to a subroutine
(remember that when we pass more than one array to a subroutine their identity is
lost because of array flattening).

In this code we are expecting four parameters to be passed to mysub: two arrays,
and two scalars which will be interpreted as an index into those arrays. The arrays are
passed by reference, the scalars by value. Note that we can return more than one
value from a subroutine - in this case we return two.
Returning Subroutine Results As References
# This is an example of how not to do it.
# What do you think is wrong with this?
sub make_random_list
{
    # Counts are scalars

    my ( $count1 , $count2 ) = @_;

    my @new_array = ();

    foreach my $index ( $count1 .. $count2 )
    {
        $new_array[ $index ] = rand();
    }
    return( @new_array );
}

# Call the above like this:

my @big_random_array = make_random_list( 42 , 14826504 );

# Do stuff with big_random_array

print $big_random_array[ 137 ];

Subroutines can return references as well as receiving them. This example shows a
subroutine which generates a large list of random numbers and then copies that list
back to the code which called the subroutine. As shown above the list is copied back
by value, i.e. a big copy of the list is passed back to the calling code as a large array.
This means that in the program code there exists:

1 copy of the array in the subroutine; once the subroutine ends and the array
@new_array goes out of scope, that array is destroyed by Perl.

1 copy of the array brought into existence in the main program as the end of the
subroutine is reached and each of the values in @new_array is copied back into
@big_random_array. This is not the way to do it (TINTWTDI).
Returning Subroutine Results As References
# This is an example of how to do it.
# What do you think is wrong with this?
sub make_random_list
{
    # Counts are scalars

    my ( $count1 , $count2 ) = @_;

    my @new_array = ();

    foreach my $index ( $count1 .. $count2 )
    {
        $new_array[ $index ] = rand();
    }
    return( \@new_array );
}

# Call the above like this:

my $big_random_array = make_random_list( 42 , 14826504 );

# Do stuff with big_random_array

print $big_random_array->[ 137 ];

In this code there is only ever one copy of the list - and it’s the one defined in the
subroutine. When the subroutine ends and returns a reference to the list, normally
Perl would arrange for the list to be destroyed (since it’s local to the subroutine and
it’s about to go out of scope). However, since the subroutine is passing back a
reference to an array, Perl arranges for the array to remain in existence. Only if the
reference to the array is ever made to cease to exist, will Perl then delete the array
which was defined inside the subroutine.

Perl does this using a mechanism called reference counting. Basically it means that all
Perl’s garbage collection is done for you.

If you wanted to force Perl to delete the array created inside the subroutine (to save on
memory, say) then all you need to do is:

undef $big_random_array;

Perl will reduce the reference count on the array, and if it reaches zero then the array
created by the subroutine will be deleted.

Also, since only one thing (a scalar which is a reference) is passed back from the
subroutine to the calling code, it’s very quick and efficient.
Symbolic References

 Examples:

$name = "bam";
$$name = 1; # Sets $bam
$name->[0] = 4; # Sets the first element of @bam
$name->{X} = "Y"; # Sets the X element of %bam to Y
@$name = (); # Clears @bam
keys %$name; # Yields the keys of %bam
&$name; # Calls &bam

With symbolic references Perl is using the value of one variable as the name of
another variable. This can be error prone and confusing, so I tend not to use this type
of reference. You can force Perl to make all of the above examples into errors by
using:

use strict;

Which I would recommend. If you then have a desperate need to use a symbolic
reference for a while you can then always countermand the stricture with:

no strict 'refs';
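A sketch of keeping the exemption as small as possible, followed by the hash-based alternative that usually removes the need for a symbolic reference altogether. The hash %value_of is a made-up name.

```perl
use strict;
use warnings;

our $bam = 0;                 # package variable, so a symbolic reference can find it
my $name = "bam";

{
    no strict 'refs';         # symbolic references allowed only inside this block
    ${$name} = 1;             # sets the package variable $bam
}

# The usual alternative: a hash keyed by name, with no stricture relaxed.
my %value_of;
$value_of{ $name } = 1;

print "$bam $value_of{bam}\n";
```

Symbolic references only ever reach package variables, never my variables, which is another reason the hash is normally the better choice.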
Packages
# This defines three completely distinct subroutines named call.

# The first is in the main namespace.
sub call
{
    ( $sub_ref , @args ) = @_;
    $sub_ref->( @args );
}

package phone;

# The second is in the phone namespace.
sub call
{
    if ( dial() )
    {
        talk();
    }
}

package poker;

# The third is in the poker namespace.
sub call
{
    $pot = 21;
    deal();
}

# If we do this, which call are we calling?
package main;

call( $ref , @args );

We would all like to use popular variable names like $count, $filename, $i. If
we did this there wouldn’t be any way to use other people’s code, since they would
have used the same variable names. Perl solves this problem by assigning each
named variable and each named subroutine to a particular family, known as a
package.

Each package maintains its own symbol table or namespace. So two different
packages may each have different variables and subroutines with identical names in
their own namespace.

By default Perl assumes that code is written in the namespace of the main package
(which is called, appropriately enough, “main”). You can change that default by
using the package keyword. A package declaration changes the namespace until
another package declaration is made or until the end of the current enclosing block,
eval, subroutine or file. See example:

The example defines three subroutines called “call” in three different packages. The
first, since its package isn’t explicitly named, is in the main package. If we wanted to call
one of the other subroutines called call, we could either switch to the package or call
the subroutine version explicitly by prefixing the subroutine name by the package
name like this:

poker::call();
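A runnable sketch of the same idea, with the subroutine bodies replaced by simple return values so the effect of each call is visible:

```perl
use strict;
use warnings;

sub call { return "main::call"; }      # no package statement yet, so this is in main

package phone;
sub call { return "phone::call"; }

package poker;
sub call { return "poker::call"; }

package main;

# An unqualified name is looked up in the current package.
my @who = ( call() , phone::call() , poker::call() , main::call() );
print "@who\n";
```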
Package Variables

 Perl variables come in two flavours:
 Package variables.
 Lexical variables.
 Package variables belong to a particular package.
 These are the standard, no-preparation-necessary, instant variables we all use
most of the time.

# Elsewhere, in another package:
package html;
$i = 56;

# In the current package:
for ( $i = 0 ; $i < 100 ; $i++ )
{
    print "$i\n"; # Prints 0 .. 99
}

for ( $i = 0 ; $i < 100 ; $i++ )
{
    print "$html::i\n"; # Prints ???
}

$i is created when it is first referenced and it exists until it goes out of scope, in this
case the end of the program, since it isn’t a lexical variable - it belongs to the current
package. We can force the use of a variable in another package by prefixing the
name of the variable with the name of the package followed by a ::
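A sketch of two package variables with the same name living side by side. It assumes, as the slide suggests, that the loop runs in package main while $html::i was set elsewhere; the loop is shortened to three passes.

```perl
use strict;
use warnings;

{
    package html;
    our $i = 56;             # this is $html::i
}

our $i;                      # this is $main::i, a different variable

my @pairs;
for ( $i = 0 ; $i < 3 ; $i++ )
{
    push @pairs , $i . "/" . $html::i;   # the loop never touches $html::i
}

print "@pairs\n";
```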
Lexical Variables

 Lexical variables:
 Lexical variables are declared explicitly with the keyword my.

package main;

my $i; # A lexical variable

for ( $i = 0 ; $i < 100 ; $i++ )
{
    my $time = localtime(); # A lexical variable
    print "$i at time=$time\n";
}

Lexical variables differ from package variables in three ways:

1 They don’t belong to any package, so you can’t prefix them with a package name.
2 They can only be accessed within the physical boundaries of the code block or file
scope in which they are declared. In the code shown, the variable $time is only
accessible to code physically located in the for loop and not to code appearing before
or after the loop.
3 They usually cease to exist each time the program leaves the code block in which
they were declared. In the example the variable $time ceases to exist at the end of
each iteration of the for loop (it is recreated at the beginning of each iteration of the
loop).
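
The difference between the two flavours can be sketched in a few lines. This is only an illustration: the values 56 and 7 and the array @seen are made up, and each our declaration is placed in its own block so the two package variables stay clearly separate.

```perl
use strict;
use warnings;

{ package html; our $i = 56; }   # package variable, full name $html::i
{ package main; our $i = 7;  }   # a different variable, full name $main::i

my @seen;
for my $n ( 1 .. 3 )
{
    my $time = "tick $n";        # lexical: a fresh $time on every iteration
    push @seen, $time;
}
# $time is not visible here; mentioning it would be a compile-time
# error under use strict.

print "main: $main::i  html: $html::i\n";
print "@seen\n";
```

The package variables can be reached from anywhere through their qualified names, while $time exists only inside the loop body.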
Modules

 Modules are the re-use part of Perl.

 A Perl module is a text file with a suffix .pm containing some Perl code.
 It’s placed in a “standard” place.
 You can add to the “standard” places with a use lib; statement.
 When the compiler encounters a use statement in a program it searches through
the standard directories, locates the file, and loads the code.

 Modules come in two flavours:


 Traditional - Interface available by exporting symbols.
 Object Oriented - Interface available by method calls.

 When you have created a module you can control what is visible to a user with the
Exporter module. See the example at the end of this section.

The easiest way to see how to use modules is by example.

An example of exporting a module interface with symbols follows on the next slide.
An example of exporting a module’s interface with method calls will be shown when
we come to Object Oriented Perl. (Generally object-oriented modules export nothing,
since the whole idea of methods is that Perl finds them for you automatically based
on the type of the object).
An Example Of Building A Module

 To build a module called Bestiary, create a file called Bestiary.pm that looks like
this:
package Bestiary;
require Exporter;

our @ISA = qw(Exporter);


our @EXPORT = qw(camel); # Symbols to be exported by default
our @EXPORT_OK = qw($weight); # Symbols to be exported on request
our $VERSION = 1.00; # Version number

### Include your variables and functions here

sub camel { print "One-hump dromedary" }

our $weight = 1024;

1; # This is very important

In the example a program can now do this:

use Bestiary;

to be able to access the camel function (but not the weight variable), and:

use Bestiary qw( camel $weight );

to access both the function and the variable.

When you use a module, the module usually makes some variables or functions
available to your program - some symbols are exported from your module. Most
modules use Exporter to do this.

When modules are loaded they must return a TRUE value to indicate that the loading
was successful. This is usually done by ending the file with the statement 1; as shown on
the last line of the example.
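
To try the Bestiary example without creating a separate file, the package can be defined inline. This is a sketch under stated assumptions: the BEGIN block and the %INC entry are only a device to make the example self-contained in one file, and camel is changed to return its string rather than print it so the result is easy to inspect.

```perl
use strict;
use warnings;

# Normally this package would live in its own file, Bestiary.pm.
BEGIN {
    package Bestiary;
    require Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(camel);      # exported by default
    our @EXPORT_OK = qw($weight);    # exported on request
    our $weight    = 1024;
    sub camel { return "One-hump dromedary" }
    $INC{'Bestiary.pm'} = __FILE__;  # tell Perl the module is already loaded
}

use Bestiary qw(camel $weight);      # ask for both symbols by name

print camel(), " weighs $weight\n";
print "Still reachable: $Bestiary::weight\n";
```

With a plain use Bestiary; only camel would be imported, but $Bestiary::weight would remain accessible through its fully qualified name.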
An Example Of Building A Module
# These two lines make the module inherit from the Exporter class
# (described in object-oriented Perl).
require Exporter;
our @ISA = ("Exporter");

# Bestiary can now export symbols into other packages with lines like this.
our @EXPORT = qw($camel %wolf ram); # Export by default
our @EXPORT_OK = qw(leopard @llama $emu); # Export by request
our %EXPORT_TAGS = ( # Export as group
camelids => [qw($camel @llama)],
critters => [qw(ram $camel %wolf)],
);

# You can include any of these statements to import symbols from
# the Bestiary module.
use Bestiary; # Import @EXPORT symbols
use Bestiary (); # Import nothing
use Bestiary qw(ram @llama); # Import the ram function and @llama array
use Bestiary qw(:camelids); # Import $camel and @llama
use Bestiary qw(:DEFAULT); # Import @EXPORT symbols
use Bestiary qw(/am/); # Import $camel, @llama, and ram
use Bestiary qw(/^\$/); # Import all scalars
use Bestiary qw(:critters !ram); # Import the critters, but exclude ram
use Bestiary qw(:critters !:camelids); # Import critters, but no camelids

The first two lines make the module inherit from the Exporter class.

The second set of lines tells Bestiary what it is allowed to export into packages which
use it.

The third set of lines can all be used in any program which uses Bestiary to determine
what is and what is not imported into the current package.

Leaving a symbol off the export lists does not render that symbol inaccessible to the
program using the module. The program will always be able to access the contents of
the module’s package by fully qualifying the name with the package name, like this:

$Bestiary::number_of_lambs;
POD, Special Variables, Internal Perl Functions
Command Line Switches, Perl One-liners

Notes:
POD

 Perl supports a simple mark-up language called POD
 Plain Old Documentation.
 You can embed POD in any sort of file - including Perl scripts/programs.
 Perl simply skips over the POD when compiling.
 The Perl lexer starts skipping when it sees a line beginning with an = sign and an identifier.
 =head1 Here There Be Dragons!
 All of the text from here until the lexer sees =cut, will be ignored.

=item snazzle

The snazzle() function will behave in the most spectacular form possible

=cut

sub snazzle {
my $arg = shift;
....
}

If you ever download CPAN modules you’ll find that a lot of them have POD
documentation included within the code. This is confusing at first until you realise that
the compiler just skips over all the POD.
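
A complete, runnable file with embedded POD might look like the sketch below. The function name snazzle is taken from the slide; its body and the wording of the documentation are invented for illustration.

```perl
use strict;
use warnings;

=head1 NAME

snazzle - a hypothetical function used only to show embedded POD

=head1 DESCRIPTION

Everything between the first POD directive and the cut line is
skipped by the compiler.

=cut

sub snazzle {
    my $arg = shift;
    return "snazzle got $arg";
}

print snazzle('a dragon'), "\n";
```

Running the file executes only the code; running one of the pod2* tools over the same file extracts only the documentation.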

Perl ships with tools to convert files containing POD into various printable file formats:

pod2text File.pm | more

pod2man File.pm | nroff -man | more

Or

pod2man File.pm | troff -man -Tps -t > tmppage.ps
ghostview tmppage.ps

pod2html File.pm > tmppage.html

For a complete overview of POD see Chapter 26 of Programming Perl 3rd edition.

Look at Mosfet.pm in the Examples/OO_Code area. Also see Mosfet.pod_text,
Mosfet.man, Mosfet.postscript and Mosfet.html in the same area.
Some Special Variables

use English;        Short name   What it does

@ARG                @_           Argument list passed to subroutine
$ARG                $_           Default input and pattern-search variable
                    %ENV         Hash containing your current environment variables
$LIST_SEPARATOR     $"           Separator used when an array is interpolated; defaults to a space
$MATCH              $&           The string matched by the last successful pattern
$POSTMATCH          $'           The string following what was last matched
$PREMATCH           $`           The string preceding what was last matched
$OS_ERROR, $ERRNO   $!           Current value of errno (the error from the last failed system call)
                    STDERR       Special filehandle for standard error in any package
                    STDIN        Special filehandle for standard input in any package
                    STDOUT       Special filehandle for standard output in any package

This is not an exhaustive list - see Chapter 28 of Programming Perl, 3rd edition.

Items without a short name don’t need the use English; pragma.
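
A short sketch of some of these variables in use, with the long names from use English. The sample text and the file path are invented for illustration, and the exact error message printed for the failed open depends on the operating system.

```perl
use strict;
use warnings;
use English;

my $text = "the quick brown fox";
my ( $pre, $match, $post ) = ( '', '', '' );
if ( $text =~ /quick/ ) {
    ( $pre, $match, $post ) = ( $PREMATCH, $MATCH, $POSTMATCH );
}
print "[$pre][$match][$post]\n";

my @list = qw(a b c);
$LIST_SEPARATOR = '-';               # same variable as $"
my $joined = "@list";
$LIST_SEPARATOR = ' ';               # put the default back
print "$joined\n";

# $OS_ERROR is the long name for $!
open( my $fh, '<', '/no/such/file/here' )
    or print "open failed: $OS_ERROR\n";
```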
Some Perl Functions (By Category)

 Scalar manipulation:
 chomp, chop, hex, lc, length, oct, reverse, sprintf, substr, tr///, uc, y///.
 Regular expressions:
 m//, s///, split.
 Numeric functions:
 abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt, srand.
 Array processing:
 pop, push, shift, unshift.
 Hash processing:
 delete, each, exists, keys, values.
 Filehandles, files and directories:
 chdir, chmod, chown, chroot, link, mkdir, open, opendir, rename, rmdir, stat,
umask, unlink, utime.

Notes:
Some Perl Functions (By Category)

 Flow of program control:


 continue, die, eval, exit, goto, last, next, redo, return, sub, wantarray.
 Miscellaneous:
 defined, eval, scalar, undef.
 Process and process groups:
 alarm, exec, fork, kill, pipe, setpriority, sleep, system, wait, waitpid.
 Library modules:
 import, package, require, use.
 Classes and objects:
 bless, package, ref, use.
 Time:
 gmtime, localtime, time.

There are also extensive categories for:

1. Low-level socket access.


2. Inter-process communication.
3. Fetching user and group information.
4. Fetching network information.
Examples Of chop() And chomp()

# Remember, chop is indiscriminate, it always removes something, so
# you’re supposed to know that the last character on a line is "\n".
@lines = `cat myfile`;
chop @lines;

chop($cwd = `pwd`);
chop($answer = <STDIN>);

$answer = chop($tmp = <STDIN>); # WRONG - What is in $answer?

$last_char = chop($var);

# chomp is more discriminating, it will only remove the last character
# if it’s a "\n". You could also do s/\n$//; which is explicit.
while (<PASSWD>) {
chomp; # avoid \n on last field
@array = split /:/;
...
}

You almost always want to use chomp() and not chop().

chop() always returns the character it removes. If you chop() a list, then every
item in the list is chopped. The thing which ends up in $answer in the question on
the slide is the character which was removed from the string $tmp. The thing you
probably wanted was $tmp.

chomp() is discriminating: by default it removes the last character on a line only if
that character is “\n”, but the default can be overridden. The string which is removed
is that contained in the Perl variable $/. So chomp() can remove an arbitrary-length
string from the end of an input string.

chomp() returns the number of characters it deleted - not the characters themselves.
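
The return values described above can be checked with a small sketch. The strings and the separator "END" are made up for illustration.

```perl
use strict;
use warnings;

my $line    = "hello\n";
my $removed = chomp( my $copy = $line );    # chomp returns a count
print "chomp removed $removed character(s), leaving '$copy'\n";

my $word = "hello";
my $char = chop( $word );                   # chop returns the character
print "chop removed '$char', leaving '$word'\n";

my $record = "dataEND";
{
    local $/ = "END";                       # change what chomp strips
    chomp $record;
}
print "custom separator: '$record'\n";
```

Note that chop shortened a string which had no newline at all, which is why chomp is usually the safer choice.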
Examples Of hex() And oct()

$number = hex("ffff12c0");

# sprintf uses the same conventions as C’s sprintf.
sprintf "%lx", $number; # (That's an ell, not a one.)

# A neat command line alternative when you need a quick conversion.
perl -e 'print 0xffdc;'

# Does $val start with a "0" (as opposed to "0x" or "0b")?
$val = oct $val if $val =~ /^0/;
$perms = (stat("filename"))[2] & 07777;
$oct_perms = sprintf "%lo", $perms;

Note that you can always set the value of any variable with a hex value just by doing
this:

$h_number = 0xffdd;
print $h_number;

The hex() function interprets a string as a hex number and returns its numeric value. If the
string begins with “0x”, this is ignored. To do a reverse conversion use sprintf() as
shown.

Hex strings can only represent integers. Strings which would cause integer overflow
will trigger a warning.

oct() will interpret a string as an octal value. If the string starts with “0” it will be
interpreted as octal. If the string starts with “0x” it will be interpreted as a hex
value. If it begins with “0b” it will be interpreted as a binary value.
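
A few conversions in both directions, as a quick sketch. The input strings are arbitrary examples.

```perl
use strict;
use warnings;

print hex("ff"), "\n";              # 255
print hex("0xff"), "\n";            # 255, the leading 0x is ignored
print oct("755"), "\n";             # 493
print oct("0x1f"), "\n";            # 31, treated as hex
print oct("0b101"), "\n";           # 5, treated as binary

# Going back from numbers to strings is a job for sprintf.
my $back = sprintf "%x %o %b", 255, 493, 5;
print "$back\n";                    # ff 755 101
```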

Try this:

perl -e 'print 0b11001001;' # Is anyone (apart from me) sad enough to know from
# what 80’s/90’s TV series this was an episode title?
Examples Of sprintf()

Field Meaning

%% A percent sign

%c A character with the given number

%s A string

%d A signed integer, in decimal

%u An unsigned integer, in decimal

%o An unsigned integer, in octal

%x An unsigned integer, in hexadecimal

%e A floating-point number, in scientific notation

%f A floating-point number, in fixed decimal notation.

%g A floating-point number, in %e or %f notation

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition.

Be careful - sprintf() in Perl does its own formatting - it is NOT calling the
underlying sprintf() function in the C library.
Examples Of sprintf()

Field Meaning

%X Like %x, but using uppercase characters

%E Like %e, but using uppercase “E”

%G Like %g, but using uppercase “E” if applicable

%b An unsigned integer, in binary

%p A pointer (the Perl value’s address in hexadecimal)

%n A special: stores the number of characters output so far into the next variable in the
argument list.

In addition to the formats on the previous slide, Perl also supports the following
conversions.

For compatibility, Perl also supports these conversions:

%i - a synonym for %d
%D - a synonym for %ld
%U - a synonym for %lu
%O - a synonym for %lo
%F - a synonym for %f
Examples Of sprintf()

Flag Meaning

space Prefix positive number with a space

+ Prefix positive number with a plus sign

- Left-justify within field

0 Use zeroes, not spaces, to right-justify

# Prefix non-zero octal with “0”, non-zero hex with “0x”

number Minimum field width

.number “Precision”: digits after the decimal point for floating-point numbers, maximum length
for a string, minimum length for an integer.
l Interpret integer as a C type long or unsigned long

h Interpret integer as C type short or unsigned short (if no flags are supplied interpret
integer as C type int or unsigned int)

See Chapter 29 (pages 797 to 799) of Programming Perl, 3rd edition.

Perl allows the following flags between the % and the conversion character.
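
The flags are easiest to understand by seeing their effect. In this sketch the square brackets are only there to make the padding visible, and the values are arbitrary.

```perl
use strict;
use warnings;

my @rows = (
    sprintf( "[%5d]",  42 ),         # [   42]  minimum field width 5
    sprintf( "[%-5d]", 42 ),         # [42   ]  left-justified
    sprintf( "[%05d]", 42 ),         # [00042]  padded with zeroes
    sprintf( "[%+d]",  42 ),         # [+42]    explicit plus sign
    sprintf( "[% d]",  42 ),         # [ 42]    space before a positive number
    sprintf( "[%.2f]", 3.14159 ),    # [3.14]   two digits after the point
    sprintf( "[%.3s]", "camel" ),    # [cam]    maximum string length 3
    sprintf( "[%#x]",  255 ),        # [0xff]   hex with prefix
    sprintf( "[%#o]",  8 ),          # [010]    octal with prefix
);
print "$_\n" for @rows;
```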
Examples Of split()
@chars = split //, $word;
@fields = split /:/, $line;
@words = split " ", $paragraph;
@lines = split /^/, $buffer;

print join ':', split / */, 'hi there'; # Question: What does this produce?

($login, $passwd, $remainder) = split /:/, $_, 3;

split /([-,])/, "1-10,20"; # Produces the list (1, '-', 10, ',', 20);

split /(-)|(,)/, "1-10,20"; # Produces the list (1, '-', undef, 10, undef, ',', 20)

$string = join(' ', split(' ', $string));

Syntax:
split /PATTERN/ , EXPR , LIMIT
split /PATTERN/ , EXPR
split /PATTERN/
split
split() scans a string and splits the string into lots of sub-strings, returning the
resulting list in list context, or the count of sub-strings in scalar context. The
separator is determined by pattern matching using the regular expression given as
part of the split() function - so the separators need not be the same size and need
not be the same string, on every match. Normally the separators are not returned
(but if the pattern contains () then the substring matched by each pair of () IS
included in the resulting list, interspersed with the fields which are normally returned).
If more than one pair of () is used then one substring is returned for each pair (some
may be undef, so be careful).

If the pattern doesn’t match at all then split() returns the original string.

If a limit is supplied then Perl will not return more than that number of sub-strings.

If no string is supplied then Perl uses “$_”.

If no pattern is supplied, or the pattern is the literal space “ ”, then the function splits on
whitespace, /\s+/, after skipping any leading whitespace.
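
The rules above can be tried out directly. The input strings in this sketch are invented, and the expected lists are shown in the comments.

```perl
use strict;
use warnings;

my @chars  = split //, "abc";                    # ('a', 'b', 'c')
my @fields = split /:/, "root:x:0:0";            # ('root', 'x', '0', '0')
my @words  = split ' ', "  leading and   gaps";  # ('leading', 'and', 'gaps')
my @three  = split /:/, "a:b:c:d:e", 3;          # ('a', 'b', 'c:d:e')
my @kept   = split /([-,])/, "1-10,20";          # ('1', '-', '10', ',', '20')
my @whole  = split /:/, "no separator here";     # the original string, unchanged
my $answer = join ':', split / */, 'hi there';   # h:i:t:h:e:r:e

print scalar(@three), " fields with a limit of 3\n";
print "$answer\n";
```

The last line answers the question on the slide: the pattern / */ can match the empty string, so the input is split between every character and the space itself is swallowed as a separator.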
Examples Of split()

open PASSWD, '/etc/passwd' or die "Cannot open /etc/passwd: $!";

while (<PASSWD>) {
chomp; # remove trailing newline
($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split /:/;
...
}

while (<>) {
foreach $word (split) {
$count{$word}++;
}
}

Both examples make use of defaults. In both cases the input text is extracted with
the <> operator and thus the splitting occurs on “$_”.

In the second case split() is passed no string (so it uses “$_”) and no pattern (so it
strips all leading whitespace and then splits on whitespace).
Examples Of stat() And unlink()
($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
$atime,$mtime,$ctime,$blksize,$blocks) = stat $filename;

if (-x $file and ($d) = stat(_) and $d < 0) {


print "$file is executable NFS file\n";
}

$mode = (stat($filename))[2];
printf "Permissions are %04o\n", $mode & 07777;

use File::stat;
$sb = stat($filename);
printf "File is %s, size is %s, perm %04o, mtime %s\n",
$filename, $sb->size, $sb->mode & 07777,
scalar localtime $sb->mtime;

$count = unlink 'file1' , 'file2' , 'file3';

unlink @victims;

The stat() function returns a 13 element list giving statistics for a file. If a particular
field isn’t supported on a particular file system then the corresponding entry will be zero.

See page 801 of “Programming Perl, 3rd edition” for more details.

The File::stat module provides a convenient, by-name access mechanism.

The unlink() function is used to delete a list of files. The function returns the number
of files which were successfully deleted. BE CAREFUL - this is ‘rm’ in disguise.
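
A safe way to experiment with stat() and unlink() is to work on a temporary file. This sketch assumes the standard File::Temp module is available (it ships with Perl); the file contents are arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so nothing important can be deleted by accident.
my ( $fh, $filename ) = tempfile();
print {$fh} "twelve bytes";
close $fh or die "close failed: $!";

my @st = stat $filename;
my ( $mode, $size ) = @st[ 2, 7 ];           # elements 2 and 7 of the 13
printf "size is %d, permissions are %04o\n", $size, $mode & 07777;

my $deleted = unlink $filename;              # returns the number deleted
print "deleted $deleted file(s)\n";
print "file is gone\n" unless -e $filename;
```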
gmtime And localtime
# 0 1 2 3 4 5 6 7 8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime;

$london_month = (qw(Jan Feb Mar Apr May Jun
Jul Aug Sep Oct Nov Dec))[(gmtime)[4]];

# 0 1 2 3 4 5 6 7 8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;

$thisday = (qw(Sun Mon Tue Wed Thu Fri Sat))[(localtime)[6]];

perl -le 'print scalar localtime'

All elements of the lists returned by gmtime() and localtime() are numeric, so January
is month 0, Sunday is day 0.

$year is the number of years since 1900.
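
Because the month and weekday are zero-based and the year is an offset from 1900, a little arithmetic is needed before printing. The sketch below uses gmtime(0), the Unix epoch, so that the result is the same on every run.

```perl
use strict;
use warnings;

my @t = gmtime(0);    # Thursday 1 January 1970, 00:00:00 UTC
my ( $mday, $mon, $year, $wday ) = @t[ 3, 4, 5, 6 ];

my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my @days   = qw(Sun Mon Tue Wed Thu Fri Sat);

# $mon and $wday index straight into the name lists; $year needs 1900 added.
my $stamp = sprintf "%s %02d %s %04d",
    $days[$wday], $mday, $months[$mon], $year + 1900;
print "$stamp\n";     # Thu 01 Jan 1970
```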


system And exec And ``

@args = ("command", "arg1", "arg2");
system(@args) == 0
or die "system @args failed: $?";

# If the program succeeds - then life goes on

@args = ("command", "arg1", "arg2");
exec(@args);

# You will never get here.

# This example uses backticks to capture the output of the "pwd" command.
my $current_directory = `pwd`;

Why is this a bad example?

The system() and exec() functions execute any program on your system for you and
return that program’s exit status - not the program’s output. To capture the output
from a program you must use backticks or qx//.

The difference between the two functions is that system() will do a fork first and then
wait for the executed program to finish. That is, it runs your program for you and
returns when it is done. exec() replaces your running program with the new one,
so it never returns if the replacement succeeds (which makes the return of the exit
status a bit redundant).

See “Programming Perl”, 3rd Edition, page 811, for more details.

In the last example on the slide we use backticks to figure out what our current
directory is. This is an example of how you can capture the output of an external
program - a bad example, because what will happen if you put this script on your
web-page, someone downloads it and then they find out it doesn’t run because their
system doesn’t have a pwd command?
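
One portable alternative, offered here as a suggestion rather than as part of the original slide, is the standard Cwd module, which needs no external command. The sketch also shows how the exit status of a child run with system() is extracted from $?. The child command is the running perl itself (its path is in $^X), chosen only because it is guaranteed to exist.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

my $current_directory = getcwd();   # no external pwd command needed
print "cwd: $current_directory\n";

# Run a child process that deliberately exits with status 3.
my @args = ( $^X, '-e', 'exit 3' );
system(@args);
my $exit_status = $? >> 8;          # the exit code lives in the high byte of $?
print "child exited with status $exit_status\n";
```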
Command Line Switches And Writing Perl One-Liners

 The -e switch allows you to write scripts directly on the command line.

perl -e 'print "Hello World\n";'

 Perl programs can receive arguments from either:
 Standard input.
cat myfile | perl -e 'while(<>){ print unless /^\s*#/; }'

perl -e 'while (<>){ print unless /^\s*#/; }' < myfile

perl -e 'while(<>){ print unless /^\s*#/; }' myfile

 The @ARGV array.
perl -e 'print "@ARGV\n";' alpha.doc beta.txt gamma.eps

Perl one-liners fit the whole of a Perl program onto one line (a command line). See
the accompanying article in the second edition of the Perl Review (contained as a .pdf
file in the Examples directory). Also see the whole of Chapter 19 of “Programming
Perl”, 3rd edition, Pages 486-503 inclusive.

The first example is something you’ve already seen.

In the second example the pipe operator | takes the output of cat and makes it the
standard input to the Perl program. The diamond operator <> takes lines from
standard input, so this example prints every line of the file “myfile” except those
rejected by the pattern match shown (which throws away comment lines - lines whose
first non-blank character is a #).

The third example does the same as the second but uses the file redirection operator
(<).

The fourth example uses the fact that the diamond operator can also open and
redirect the contents of a file specified on a command line. So this example is exactly
equivalent to both examples 2 and 3.

The last example prints out this:

alpha.doc beta.txt gamma.eps


Perl Command Line Switches Useful For One-Liners

Switch        Effect
-e            Used to enter one or more lines of a script.
-i            Specifies that files processed by <> are to be edited in place, with no backup.
-iEXTENSION   Specifies that files processed by <> are to be edited in place, keeping a
              backup of the original with EXTENSION added to its name.
-mMODULE      Loads MODULE as if you had executed use MODULE (); (-MMODULE executes
              use MODULE; instead).
-n            Causes Perl to assume a loop around your code which makes it iterate over
              filename arguments, without printing each line. See Example.
-p            Causes Perl to assume a loop around your code which makes it iterate over
              filename arguments, printing each line after your code has run. See Example.

Use the -i option with care. It renames the input file, opens an output file with the
original name and then selects that output file for all print, printf and write
statements.

If you use only the -i option then NO BACKUP COPY OF YOUR ORIGINAL FILE IS
MADE. The original file will be overwritten. If you do specify EXTENSION then the
original file is backed up using EXTENSION to supply a new name.

Here’s an example:

perl -p -i'.orig' -e 's/foo/bar/' xyz # Note that the -p option has not yet been discussed.

This will rename the file called xyz to xyz.orig as a backup copy, open a new
version of xyz for output and run the substitution on the original file contents, placing
the result of the substitutions into the new file (still called xyz).
An Example Of A Perl One-Liner
This:

#!/usr/bin/perl
$extension = '.orig';
LINE: while (<>) {
    if ($ARGV ne $oldargv) {
        if ($extension !~ /\*/) {
            $backup = $ARGV . $extension;
        }
        else {
            ($backup = $extension) =~ s/\*/$ARGV/g;
        }
        unless (rename($ARGV, $backup)) {
            warn "cannot rename $ARGV to $backup: $!\n";
            close ARGV;
            next;
        }
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    s/foo/bar/;
}
continue {
    print; # this prints to original filename
}
select(STDOUT);

does exactly the same as this:

perl -p -i'.orig' -e 's/foo/bar/' xyz

The example from the previous slide is expanded here as the minimum needed to
replace the functionality of the one-liner.
The Perl -n And -p Command Line Switches

 The -n switch causes Perl to assume the following loop around your script, which
makes it iterate over the filename arguments much as sed -n or awk do.
LINE:
while (<>) {
... # your script goes here
}

 The -p switch causes Perl to assume the following loop around your script, which
makes it iterate over the filename arguments much as sed does.
LINE:
while (<>) {
... # your script goes here
}
continue {
print or die "-p destination: $!\n";
}

In both cases you can use LINE as a loop label from within your script, even though
you can’t actually see it in your file.

With the -n switch, lines are not printed by default. With the -p switch, lines are
printed automatically.

In both cases BEGIN and END blocks may be used to capture control before or after
the implicit loop - just like awk.
Other Perl Command Line Switches

Switch Effect
-c Causes Perl to check the syntax of the script and then exit without executing what has
just been compiled.
-d Runs the script under control of the Perl debugger.
-h Prints a summary of Perl’s command line options.
-T Turns on “taint” checks - an extra form of security useful for running CGI scripts.

-v Prints the version number and patch level of the Perl executable.
-w Prints warnings about variables which are used only once, and variables which are
used before being set. See Chapter 33 of “Programming Perl” 3rd edition.

We will discuss the perl debugger later.

Everyone should always run Perl with the -w option, either as here, as part of the
command line, or more generally as part of the #! line:

#!/usr/local/bin/perl -w

There are many more command line switches than those listed. See the whole of
Chapter 19 of “Programming Perl”, 3rd edition for a complete description.
Command Line Arguments etc.

Item Description
ARGV The special filehandle that iterates over command line filenames in @ARGV.
$ARGV Contains the name of the current file when reading from the ARGV handle using <>.

@ARGV The array containing the command-line arguments intended for the script. $#ARGV is
the number of arguments minus one. $ARGV[0] is the first argument, not the
command name. Use scalar @ARGV for the number of program arguments.

@ARG Within a subroutine, this array holds the argument list passed to that subroutine.
@_ Within a subroutine, this array holds the argument list passed to that subroutine.

Notes:
Adding Command Line Arguments To Your Own Programs

 There are two options:


 Use the standard Getopt::Std or Getopt::Long modules.
 Write your own code - like this:
use Carp; # Provides croak()

sub Process_Command_Line_Arguments
{
    my ( $ref_arguments ) = @_;

    # Process all arguments, taking them from the referenced array
    while ( @$ref_arguments )
    {
        my $next_arg = shift( @$ref_arguments );
        SWITCH: {
            if ( $next_arg =~ m/^-i/i ) { $main::infile = shift( @$ref_arguments ); last SWITCH; }
            if ( $next_arg =~ m/^-o/i ) { $main::outfile = shift( @$ref_arguments ); last SWITCH; }
            if ( $next_arg =~ m/^-d/i ) { $main::debug = 1; last SWITCH; }
            if ( $next_arg =~ m/^-/ ) { croak( "Unknown command line switch $next_arg" ); }
        }
    }
    return 1;
}

Note that the input arguments are via a reference. You should also include some code
to look for something like -h or -help, print out something useful and then exit the
program.
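
For comparison, here is a sketch of the first option using the standard Getopt::Long module. The switch names mirror the hand-written version above; the subroutine name parse_arguments, the %opt hash and the demonstration argument list are invented for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

sub parse_arguments {
    my ( $ref_arguments ) = @_;
    my %opt = ( debug => 0 );
    GetOptionsFromArray(
        $ref_arguments,
        'i=s'    => \$opt{infile},     # -i takes a string value
        'o=s'    => \$opt{outfile},    # -o takes a string value
        'd'      => \$opt{debug},      # -d is a simple flag
        'h|help' => \$opt{help},       # -h or -help
    ) or die "Bad command line\n";
    return \%opt;
}

# A made-up argument list; a real program would pass \@ARGV instead.
my @demo = qw( -i in.txt -o out.txt -d );
my $opt  = parse_arguments( \@demo );
print "in=$opt->{infile} out=$opt->{outfile} debug=$opt->{debug}\n";
```

Passing the argument list by reference keeps the same calling convention as the hand-written subroutine, and unknown switches are reported by the module itself.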
Conclusion

 You’ve seen a lot in a short time.


 The key points of Perl are that:
 Variables consist of scalars and collections of scalars (arrays & hashes).
 A lot of the control structures are similar to C etc.
 References and subroutines.
 Packages and Modules.
 Pattern matching is very powerful.
 Perl is a very versatile language.
 You all now know enough to write useful Perl programs.

Notes:

Now give the advanced material in Style, then run LAB7 - MODULES_AND_SUBROUTINES_1
Style Guidelines For Perl

1 Introduction
This document presents guidelines for anyone who writes Perl scripts for design support tasks. The
aim is to introduce a common style and understanding for the benefit of anyone who either writes
new programs, or has to debug and/or maintain old ones.

2 Program Structure
Structure your program in the same way you would structure a C program. Have one section of
code that is the equivalent of C’s main(), and as long as the total program size is anything other than
trivially small, put code into subroutines that are called from the main program body.

Don’t structure the top-level of a program in file-scope since any variables declared there are visible
in all following subroutines (even if they’re lexical, or my, variables)– instead create the top-level
of your program as a code block (if you like to think in C terms, even label it as MAIN if this helps you)
and put all code there. Also, don’t use global variables at all (i.e., outside the code block), since this
allows variables to have side-effects in different subroutines. To achieve both of these features
structure your code like this:

#!/usr/local/bin/perl

use strict;
use warnings;
use diagnostics;

sub subroutine_1( $$$ );

MAIN:
{
my $variable_1 = 27;

# Program code – equivalent of C’s main()


}

exit;

sub subroutine_1( $$$ )


{
# Subroutine body – can’t see $variable_1 unless it was passed as a
# parameter in the subroutine call.
}

use strict and use warnings are never optional, while use diagnostics gives readable
error messages that are useful for new users (and old ones).
The block with the label MAIN: is where the main body of the program is written. A code block like
this is the equivalent of a loop that runs exactly once, but has the feature that all the lexical
variables declared within its scope are restricted to that scope, i.e., subroutine_1 can’t see the
values of any lexical variables like $variable_1 unless they are passed to subroutine_1 as an
argument of a subroutine call to subroutine_1 (which is basically how you’d hope a program
would behave). Also note that the label (MAIN:) is optional, and can be omitted.

Note that subroutine_1 is declared before the main body of the program. This is only needed if
the subroutine definitions follow the main program – if they precede it then the forward declarations
aren’t needed since the declaration is also the definition. Also note that subroutines can optionally
be declared with prototypes (the $$$ in ( $$$ ) which here declares that the subroutine is
expecting three scalar arguments). This check is performed at compile time so there’s no run-time
overhead for doing this.

If you must use a global variable (you really shouldn’t) then make it explicit that this is what you’re
doing by referring to it as a package variable like this:

#!/usr/local/bin/perl

sub subroutine_1();

$main::count = 56;

MAIN:
{
$main::count = 27;

subroutine_1();
}

exit;

sub subroutine_1()
{
print "The value of count is $main::count\n";
}

Here we’ve declared a global variable called $main::count (it’s a variable named $count in
package main, the default package name, which is why its name is $main::count). This code
prints the value 27 when executed since the initial value of 56 is overwritten in the main body of
code and this is the value seen in subroutine_1 when it is executed. Note that the value of
$main::count wasn’t passed to subroutine_1 as a parameter, but subroutine_1 can still see
its value (it can change its value as well – this is what I mean by having a side-effect).

2.1 Should Subroutine Parameters Be Passed By Value Or Reference ?


If you don’t want parameters passed in subroutines to be changed by the subroutine, then pass
parameters by value. This is nearly always what you want. To do this copy all the parameters to the
subroutine into lexical variables at the start of the subroutine like this:

2 / 18 July 31, 2005


sub subroutine_1( $$$ )
{
my ( $var_1 , $var_2 , $var_3 ) = @_;

# Subroutine code goes here. $var_1 etc are private to this code
}

This is a common Perl idiom where all the variables from the @_ array are copied into lexical
variables in the subroutine. This makes those variables local to the subroutine – changing them in
the subroutine will NOT change them in the calling code. This is normally how you would expect
programs to behave.

If you do want a variable in a subroutine to be changed in the calling code then pass the variables
to the subroutine by reference instead. This is done like this:

MAIN:
{
my $a = 56;

subroutine_1( $a );

print "A=$a\n";
}

exit;

sub subroutine_1( $ )
{
$_[0] = 99; # Alter the first element of the @_ array
}

The elements of the @_ array are aliases for the variables in the calling code (in effect,
implicit references), so changing the value of $_[0] will change the variable $a in the example
above. Therefore the value printed will be A=99. This form is not recommended since it’s
confusing and inconsistent with normal usage.

2.2 Passing Arrays And Hashes To Subroutines


Lists of values when passed to subroutines are flattened, so if you pass two lists to a subroutine,
from the perspective of the subroutine itself this looks like one long list, i.e., the identity of the two
lists is lost. Since this almost certainly isn’t what you want to achieve, pass the lists as references
instead. This way the identities of the two (or more) lists are maintained. Here’s how to do this:

MAIN:
{
my @list_1 = qw( Alpha Baker Charlie Delta );
my @list_2 = qw( Zulu Yankee Xray Whisky );

subroutine_1( \@list_1 , \@list_2 );


}

exit;

sub subroutine_1( $$ )
{
my ( $list_1_r , $list_2_r ) = @_;

print $list_1_r->[ 1 ] , " " , $list_2_r->[ 3 ] , "\n";


}

The two arguments (which are themselves scalars) are references to the original lists so the
subroutine can access the individual elements of the lists. Therefore the above example prints out
“Baker Whisky”.

2.3 Returning One Or More Results From A Subroutine : Part 1


Use the wantarray function to see if a subroutine was called in scalar or list context. If the
wantarray function returns TRUE then return a list, else return a scalar. Here’s how to do this:

MAIN:
{
my @list = subroutine_1();
my $scalar = subroutine_1();

print "@list $scalar";


}

exit;

sub subroutine_1()
{
if ( wantarray )
{
return qw( one two three four five );
}
else
{
return( "once I caught a fish alive\n" );
}
}

The first call to subroutine_1 is in list context (the calling program expects a list to be returned).
In subroutine_1 the wantarray function is evaluated and for this first call it will be TRUE,
therefore subroutine_1 sends back a list of five things (the textual representation of the numbers
one to five inclusive). The second call to subroutine_1 is in scalar context (the calling program
expects a single thing to be returned). Now when the wantarray function is evaluated it is FALSE,
so a single thing is returned (a string consisting of the text “once I caught a fish alive”).

Note that you can also return information from a subroutine that is expected to be interpreted as a
hash. If this is true then you should make sure that you return an even number of scalars (each pair
of scalars will be used as a key/value pair in the resulting hash).

2.4 Returning One Or More Results From A Subroutine : Part 2
You want to return several scalars from a subroutine. Here’s how to do it:

MAIN:
{
    my @values = qw( 6.32 7.88 9.54 12.83 17.99 31.36 18.25 );
    my ( $mean , $median , $mode , $variance ) = statistics( @values );

    # Code to print out results
}

exit;

sub statistics
{
    my @values = @_;
    my ( $mean , $median , $mode , $variance );

    # Code to compute mean, median, mode, variance

    return( $mean , $median , $mode , $variance );
}

We arrange for the subroutine to return four scalars as a list, and for the calling code to assign
that list to four scalar variables of its own.
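
Here is a small runnable sketch of the same idea. To keep it short it returns the mean, median, minimum and maximum rather than the mode and variance; simple_statistics is a hypothetical name used only for this example.

```perl
use strict;
use warnings;

# Hypothetical subroutine: returns four scalars as one list.
sub simple_statistics
{
    my @sorted = sort { $a <=> $b } @_;
    my $number = scalar( @sorted );

    my $sum = 0;
    foreach my $value ( @sorted )
    {
        $sum += $value;
    }

    my $mean   = $sum / $number;
    my $middle = int( $number / 2 );

    # Odd count : the middle item. Even count : mean of the middle two.
    my $median = ( $number % 2 )
               ? $sorted[ $middle ]
               : ( $sorted[ $middle - 1 ] + $sorted[ $middle ] ) / 2;

    return ( $mean , $median , $sorted[ 0 ] , $sorted[ -1 ] );
}

my ( $mean , $median , $minimum , $maximum ) = simple_statistics( 4 , 1 , 3 , 2 , 10 );

print "mean=$mean median=$median min=$minimum max=$maximum\n";
```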

2.5 Making The Equivalent Of C Static Variables


Sometimes you want to be able to create a variable in a subroutine that will maintain its value
between subroutine calls. Here’s how to do this:

MAIN:
{
    my $tmp;

    $tmp = count(); print "Tmp = $tmp\n";

    $tmp = count(); print "Tmp = $tmp\n";
}

exit;

BEGIN
{
    my $count_value = 0;

    sub count()
    {
        $count_value++;
        return $count_value;
    }
}

Place the subroutine definition(s) in a code block (subroutines are visible from everywhere
regardless of how you “hide” them). The lexical variable $count_value is scoped to the
code block it’s defined in and is therefore available to the subroutine count(). However, while
normally a lexical variable will be destroyed once a code block finishes execution, in this case
Perl arranges for it to continue to exist since something is still referring to it (in technical terms
the subroutine count() holds a reference to $count_value, which keeps its reference count
above zero and stops Perl from destroying it).

The only problem is how to get an initial value of zero into the value of $count_value. This is
done by placing all the code in a BEGIN block. Perl guarantees to execute all BEGIN blocks as soon
as they are compiled, thus ensuring that the single line of code “my $count_value = 0” is
executed before any call to the subroutine is made. The above code therefore prints out Tmp = 1
followed by Tmp = 2.

Of course, there’s no reason why several subroutines cannot share a variable in this way to provide
a globally accessed variable that cannot suffer from unintended side-effects. Here’s how:

MAIN:
{
    my $tmp;

    initialize( 37 );

    $tmp = increment(); print "Tmp = $tmp\n";

    $tmp = decrement(); print "Tmp = $tmp\n";
}

exit;

BEGIN
{
    my $value = 0;

    sub initialize( $ )
    {
        $value = shift @_;
    }
    sub increment()
    {
        $value++; return $value;
    }
    sub decrement()
    {
        $value--; return $value;
    }
}

This is a very secure way to create something that can be accessed from anywhere in a controlled
and predictable manner. The variable $value is secure from any unintended side-effects (or even
intended ones) and can be initialized/incremented/decremented from anywhere (you could of course
also add a read subroutine to just return the value). We’ve almost strayed into OO land here since
we’ve created something that is encapsulated (the variable value) and can only be accessed via
subroutine calls (equivalent of OO methods).
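
As an aside, Perl 5.10 and later (newer than the Perl this guide was written for) offer the state keyword, which gives a similar result without the surrounding block. The following is a minimal sketch under that assumption, reusing the count() example from above:

```perl
use strict;
use warnings;
use feature 'state';    # Needs Perl 5.10 or later

sub count
{
    state $count_value = 0;    # Initialised once, on the first call only

    $count_value++;

    return $count_value;
}

my $first  = count();
my $second = count();

print "Tmp = $first\n";
print "Tmp = $second\n";
```

A state variable belongs to one subroutine only, so the shared-variable arrangement with initialize(), increment() and decrement() still needs the enclosing block.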



2.6 Implementing A Switch Statement
One way is to download Switch.pm from CPAN and use that, but that might not be an option for
code you export to other sites. Here’s another way that’s self-contained:

SWITCH:
{
    if ( $condition == TRUE )
    {
        # Run some code

        last SWITCH;
    }
    if ( $some_other_condition == TRUE )
    {
        # Run some other code

        last SWITCH;
    }

    # Run some default code
}

Here, SWITCH is a label (nested switch statements need different labels and this is a drawback)
while the last SWITCH piece of code is the equivalent of C’s break. A bare block is a loop that
executes exactly once, so last leaves it immediately. Note that next also leaves the block (it does
not repeat it); if you really want to run the block again from the top, use redo.
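
When every clause simply tests one value against a fixed set of strings, a hash of code references (a dispatch table) is another self-contained option. This is a sketch only; the command names and the run_command subroutine are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: each key maps to the code for one case.
my %dispatch = (
    start => sub { return "Starting"; } ,
    stop  => sub { return "Stopping"; } ,
);

sub run_command
{
    my ( $command ) = @_;

    # Fall back to a default handler when the command is not in the table.
    my $handler_r = exists( $dispatch{$command} )
                  ? $dispatch{$command}
                  : sub { return "Unknown command"; };

    return $handler_r->();
}

print run_command( "start" ) , "\n";
print run_command( "pause" ) , "\n";
```

Adding a new case is then a matter of adding one entry to the hash, and no labels are needed.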

2.7 Labels : Use Them


Use labels to be explicit about where the commands next and last transfer you (and goto, but you’re
never going to use goto, are you!).

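OUTER:
foreach my $item ( @item_list )
{
    INNER:
    foreach my $object ( @object_list )
    {
        # Code

        next OUTER if ( $some_condition == TRUE );

        # Code

        next INNER if ( $some_other_condition == TRUE );
    }
}

Note that each label is attached to the loop itself. If a label is attached to a bare block wrapped
around the loop, next LABEL leaves that block instead of starting the next iteration.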



2.8 Labels : Don’t Use Them
If you use labels it is always clear where you are transferring control to, but it is never clear at the
transfer point (i.e., the actual label) where transfer of control has come from, and this makes it very
hard to debug code – next and last with labels are just synonyms for goto (and you’re never going
to use goto, are you!). On balance, use labels for SWITCH and one-level loop operations.

3 Writing Efficient, Maintainable And Reusable Code


Package useful code into subroutines and then into modules and then share it with everyone. Install
tools in:

/design/rmc/tools/

and modules in:

/design/rmc/tools/Perl_Modules/tool/dev/

and in both cases release them. Don’t forget to write documentation, ideally as POD (Perl has
translators to generate man pages, html and PDF). Don’t reinvent the wheel.

Since a lot of what we do involves reading and parsing files, and then writing some new file(s), use
Netlist_Tools.pm in the Perl_Modules directory. These routines are debugged and work quite
happily with files that are gigabytes in size and they’ll transparently gunzip any files that are
gzipped even if you don’t know they’re gzipped. Don’t reinvent the wheel.

Also, before you write a mega-thingy widget that will revolutionize human-kind, look on CPAN
just in case someone else has beaten you to it (they probably have)! Don’t reinvent the wheel.

If you’re writing code that makes several different tests on some data, put the most common tests
before the less common ones. For example, if you’re testing a string in a loop like this:

foreach my $line ( @very_large_file )
{
    if ( $line =~ m/^\s*\#/ )    # Lines that are comments (start with a #)
    {
        next;
    }
    if ( $line =~ m/^$/ )        # Lines that are blank
    {
        next;
    }
    if ( $line =~ m/^\s+/ )      # Lines that contain leading white-space
    {
        next;
    }
    if ( $line =~ m/^\S+/ )      # Lines without leading white-space
    {
        # Code to process $line

        next;
    }
}



and you run this code on a file containing 10 million lines, of which 99.99% are neither comments,
nor blank, nor start with white-space, then you’ll end up executing approximately 40 million tests.
If you put the bottom-most test (the test for lines without leading white-space) first, then the same
code will execute about 10 million tests.

3.1 Writing Readable Code


Here’s a very good question. Why do I need to observe and adhere to standards in programming in
an environment like ours? My answer to this is to give an example:

foreach $keyName (keys(%keys)) {


foreach $hierName (keys(%{$keys{$keyName}{instances}})) {
if(${$instances{$hierName}{type}} eq "key") {
my $cellName = ${$instances{expandExpression($keyName, $hierName)}{cellName}};
if(exists($cellProperties{$cellName}{classless}{keyTerminals})) {
foreach my $keyTermName (split(',',
${$cellProperties{$cellName}{classless}{keyTerminals}})) {
if(exists($keys{$keyName}{instances}{$hierName}{$keyTermName})) {
my $netName = expandExpression($keyName,
${$keys{$keyName}{instances}{$hierName}{$keyTermName}});
if(defined($packageTerms{$netName})) {
if(!defined($ios{$netName}) || $ios{$netName} ne "-global") {
if($packageTerms{$netName} eq $keyName) {
keysWarn("Instance of cell with duplicated key names, cell $cellName, in
$padName, duplicated key name is $keyTermName\n"); }
else {
keysWarn("Two or more keys connected to package terminal $netName, key $keyName-
$keyTermName and key ", $packageTerms{$netName}, "\n"); } } }
$packageTerms{$netName} = $keyName; } } } } } }
keysMessage("Checked ".scalar(keys(%packageTerms))." package terminals\n");

I’ve rendered it in a small font size to illustrate a point: the formatting has been preserved exactly as
it was written, and this is a small fragment of a much larger code-base of well over 5000 lines of
code just like this. And my point? I absolutely guarantee that one week after the above code
was written, the original author will not know all the nuances that went into its authorship.
Any debugging exercise will be very difficult for that author, let alone someone who comes fresh to
the task with responsibility to maintain this code once the originator has moved on.

Therefore, style and readability and clarity matter.

3.1.1 Hints For Readable Code


Line up items so that it’s easy to spot errors. For example, this works but isn’t acceptable:

my $lef_filename = undef;
my $log_filename = undef;
my $default_log_filename = "lefPortStrip.log";
my $pin_names_r = [];
my $layer_names_r = [];

run_lef_import( $lef_filename , $log_filename , $default_log_filename ,
$pin_names_r , $layer_names_r );



but this is:

my $lef_filename         = undef;
my $log_filename         = undef;
my $default_log_filename = "lefPortStrip.log";
my $pin_names_r          = [];
my $layer_names_r        = [];

run_lef_import( $lef_filename         ,
                $log_filename         ,
                $default_log_filename ,
                $pin_names_r          ,
                $layer_names_r        );

If you’re writing a complex “if” statement then line up the brackets:

if ( ( $day            == SUNDAY ) &&
     ( $full_moon      == TRUE   ) &&
     ( $spring_equinox == TRUE   ) )
{
    print "It's Easter Sunday\n";
}

Use a 2 or 4 column indent and be consistent in its usage.

Put the opening curly brace on the line after a keyword and lined up with the start of the keyword.

A one-line BLOCK may be put on one line, including left- and right-brace.

if ( $flag == TRUE ) { $result = PI; $next_example = FALSE; }

Don’t omit the semicolon in a one-line BLOCK even though you can (in the above example it’s the
semicolon after the “E” in FALSE). At some point it’s a certainty that you’ll change that one line
block to a multi-line block by adding new commands. At that point the semicolon is needed and
you’ll have to add it anyway.

Don’t put space before the semicolon after a statement. Do put space both before and after a “,”
when separating parameters and list items.

Put space around most (all) operators.

Put space around complicated subscripting code.

Put blank lines between sections of code that do different things.

Don’t put space between a function name and its opening parenthesis.

Break long lines after an operator.



Omit redundant punctuation as long as clarity doesn't suffer.

3.2 Use Constants


If values appear in code that are constants, define them as constants with “use constant”. It is an
accepted convention that constants should appear in all UPPERCASE.

use constant PI => 3.1415926;
use constant E  => 2.7182818;
use constant A  => 6.02E23;

MAIN:
{
my $radius = 2.0;
my $area = PI * $radius * $radius;
}

3.3 Make The Use Of References Obvious


If your code uses references, make sure that the variable names that are used are tagged with
something that makes it obvious they’re references, like _r. If you do this consistently it then
becomes obvious when you try to use something that is/is not a reference in a dereference
operation. For example, in the following code it’s obvious that you should only be using the
dereference operator (the ->) on a reference.

my $array_r = []; # Create a reference to an empty list

# and then later

$array_r->[ 56 ] = PI;

While in the following example it should be obvious that something has gone wrong because the
dereference operator is not being used on a reference (the _r is missing).

my $number = 56;

# and then later

$number->[ 0 ] = get_random_integer();

3.4 Don’t Use Default Values


When using a loop construct like foreach, don’t use the defaults allowed by Perl. I.e. it is allowable
to say this:

foreach ( @l )
{
    print $_;
}

Which doesn’t tell you much about what’s going on and why, whereas the far more readable:

foreach my $book_title ( @library )
{
    print "$book_title\n";
}

tells you exactly what was/is intended. This will be clearer to others when they read your code
and will be clearer to you when you come back to debug your code in a year’s time.

3.5 Distinguish Between for And foreach


The Perl keywords, for and foreach are synonyms, so you can use either one to index through lists
or index through values. Here are two examples of how you should use them:

foreach my $name ( @friends )
{
    print "I have a friend called $name\n";
}

for ( my $count = 0 ; $count <= 10 ; $count++ )
{
    print "Count = $count\n";
}

And here are two examples of how you should not use them:

for my $name ( @friends )
{
    print "I have a friend called $name\n";
}

foreach ( my $count = 0 ; $count <= 10 ; $count++ )
{
    print "Count = $count\n";
}

3.6 Common Sense


Use meaningful variable and subroutine names. Don’t use variables with the names $a and $b. See
the man page for sort() to understand why.

Name variables using my (i.e., use lexical variables). Never use global variables and don’t be
tempted in the heat of debugging to insert just one or two to get around a problem.



Use lots of comments. You’ll be amazed how quickly you’ll forget just what it was you were trying
to express in your code a day, a week, a month, a year ago.

Document functions and procedures.

When in doubt use parentheses. Just because you can omit them doesn’t mean you should omit
them.

If your program is running for more than a few seconds, give your users some feedback. If you’re
programming a GUI in Perl/Tk, use a progress bar.

If your program is driven from the command line then always provide a -help parameter to give
users some idea of what the program does and what to type. Make the invocation of the program
with no parameters display brief usage information, and tell the user that more help is available
with the -help parameter.

Allow default options. Make sure a user knows what they are, when he/she asks for help.

Make error messages clear so a user knows what to fix when things don’t run the way they expect.

Since many programs are often chained together or are run within a single controlling program,
make sure all scripts return an error or success code. The exit code for success is always 0 (zero). If
programs are designed to be chained together in a shell script, then follow the Unix philosophy of
having programs that complete successfully return no output at all (i.e., they are silent).

Here’s a way of setting up and using exit codes:

# Exit codes :

use constant EXIT_OKAY     => 0;    # Success
use constant EXIT_BAD_ARGS => 1;    # Failed with bad arguments

# Later in your program

if ( $number_of_arguments < 4 )    # Not enough arguments given !
{
    exit( EXIT_BAD_ARGS );
}

# And at the end of your program

exit( EXIT_OKAY );

Always return a value from both your program and any subroutines in that program. If you don’t
use an explicit return statement then the value returned is the result of the last statement evaluated. This
will change as you modify your code, and in particular since most code is added at the end of a
program, the return value from what you’re currently writing will be changing what is seen by
whatever wrapper is running your code.

If it’s vital that your code not return a value, because, say, you want to indicate that an error
occurred but it wasn’t a fatal error, then return undef. In Perl undef is a value that represents not
defined.

When you write Modules, remember that a module must always return a value of TRUE, so the last
line of a Module should look like this:

1;

If you cut-and-paste code, then that code belongs in a subroutine.

4 Testing
If your code is destined to be used by others then you must test it. In particular keep a directory or
folder with files that are read by your code, and write some scripts to run common cases. When you
add new features or debug problems, make sure all the old tests are run so that you can prove that
the modifications or additions haven’t caused unintended side-effects that cause old code to stop
working correctly (in computer science parlance this is called regression testing).

5 Traps For The Unwary (Or, Things That Catch Everyone Out Eventually)
Remember to use == for numeric tests and eq for string tests. Don’t fall into the C trap of using =
(assignment) when you mean == (comparison).

Remember not to use = when you mean =~.

Always start your Perl code with this:

use warnings;
use strict;
use diagnostics;

All arrays count from 0, not 1. An array of size 20 has elements [0] to [19] inclusive. There isn’t an
array item [20].

Hashes have no order, so you can’t loop over a hash directly with for or foreach, and you can’t
index into one with [] (use {} with a key instead). If you need to iterate over a hash, loop over the
lists returned by keys or values.
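
A minimal sketch of iterating over a hash with keys and values. The hash %age_of and its contents are made up for this example; the keys are sorted so that the output order is repeatable.

```perl
use strict;
use warnings;

my %age_of = ( Alice => 31 , Bob => 27 , Carol => 45 );

# keys returns the keys in no particular order, so sort them first.
my @report = ();
foreach my $name ( sort keys %age_of )
{
    push @report , "$name is $age_of{$name}";
}

# values returns just the values, which is enough for a total.
my $total = 0;
foreach my $age ( values %age_of )
{
    $total += $age;
}

foreach my $report_line ( @report )
{
    print "$report_line\n";
}
print "Total = $total\n";
```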

6 Some Common Tasks And Possible Solutions


There are many things that occur over and over again in Perl programming. Here are some simple
solutions to some of those tasks.

6.1 Adding A Command Line To A Program Ala Unix


First solution: Use one of the standard Perl Getopt modules (Getopt::Std, which provides the
getopts function, or Getopt::Long). The advantage of these modules is that everything is
already written for you. The disadvantage is that if they don’t do exactly what you want, then you
either alter them or live with them.
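
As a sketch of the first solution, here is how the same three switches might be handled with Getopt::Long. The option names are the ones used in this section; the subroutine name parse_arguments and the example arguments are assumptions made for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical wrapper: parses -input, -output and -print_flag from
# the list that $arguments_r refers to.
sub parse_arguments
{
    my ( $arguments_r ) = @_;

    my $input_filename  = undef;
    my $output_filename = undef;
    my $print_flag      = 0;

    my $parsed_ok = GetOptionsFromArray( $arguments_r ,
                                         "input=s"    => \$input_filename  ,
                                         "output=s"   => \$output_filename ,
                                         "print_flag" => \$print_flag      );

    return ( $parsed_ok , $input_filename , $output_filename , $print_flag );
}

my @example_arguments = qw( -input in.txt -output out.txt -print_flag );

my ( $parsed_ok , $input_filename , $output_filename , $print_flag )
    = parse_arguments( \@example_arguments );

print "input=$input_filename output=$output_filename\n";
```

In a real program you would pass \@ARGV instead of the example list, and report a usage message when $parsed_ok is false.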

Second solution: Write your own routine. Here’s a template for it:



# Needs "use Carp;" for croak, and TRUE and FALSE defined with "use constant".

sub Parse_Command_Line_Arguments( $ )
{
    my ( $arguments_r ) = @_;

    my $usage = "my_prog -input <input filename>
               -output <output filename>
               [-print_flag]";

    foreach my $argument ( @$arguments_r )
    {
        if ( $argument =~ m/^\-help/i )
        {
            # Help requested
            print( "\nUsage: $usage\n\n" );
            exit 0;
        }
    }

    if ( @$arguments_r < 1 )    # No arguments given
    {
        print( "\nUsage: $usage\n" );
        print( "\nUse my_prog -help to get more help\n\n" );
        exit 0;
    }

    my $next_arg        = undef;
    my $input_filename  = undef;
    my $output_filename = undef;
    my $print_flag      = FALSE;

    while ( @$arguments_r )    # Process all arguments
    {
        $next_arg = shift( @$arguments_r );

        SWITCH:
        {
            if ( $next_arg =~ m/^\-input/i )
            {
                $input_filename = shift( @$arguments_r );
                last SWITCH;
            }
            if ( $next_arg =~ m/^\-output/i )
            {
                $output_filename = shift( @$arguments_r );
                last SWITCH;
            }
            if ( $next_arg =~ m/^\-print_flag/i )
            {
                $print_flag = TRUE;
                last SWITCH;
            }
            if ( $next_arg =~ m/^\-/i )
            {
                croak( "Unknown command line switch $next_arg" );
            }
        }
    }

    return ( $input_filename , $output_filename , $print_flag );
}

You can then call this routine like this:

my ( $input_filename  ,
     $output_filename ,
     $print_flag      ) = Parse_Command_Line_Arguments( \@ARGV );

croak( "Missing input filename" )  unless defined $input_filename;
croak( "Missing output filename" ) unless defined $output_filename;

6.2 Loading And Parsing A File


You want to load and loop through all the lines of a file performing some programming tasks on
some or all of the lines. You then want to write out a new file containing whatever manipulations
you’ve done.

Here’s a common way to do this:

#!/usr/local/bin/perl

use strict;
use warnings;
use diagnostics;

use Carp;
use Cwd;
use Config;

use lib ( "/design/rmc/tools/Perl_Modules/tool/current/" );
use lib ( "/design/rmc/tools/Perl_Modules/tool/current/OS_SPECIFIC/$Config{archname}" );

use FindBin qw( $Bin );
use lib $Bin;

use Netlist_Tools;

MAIN:
{
    my $file_r = Read_File( "BigFile.txt" );

    foreach my $line ( @$file_r )
    {
        # Fiddle with the line
    }

    Write_File( "NewFile.txt" , $file_r );

    exit 0;
}

This code will load one of (in this order) BigFile.txt, BigFile.txt.gz,
BigFile.txt.gzip. If you specify an output filename in Write_File that is suffixed in either
.gz or .gzip then the file will be compressed (with gzip) before it is written.

A major advantage of Read_File is that not only will it transparently read in the file via gzip if
necessary, all the lines are then formatted so that every line is in a list that can be iterated, and every
line is guaranteed to have no white-space before the first non-white-space character. There will also
be no white-space at the end of the line and all “words” on a line will be separated by exactly one
space.
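
The line clean-up described above can be sketched in plain Perl as follows. This is not the Netlist_Tools code, only an illustration of the same three steps; normalise_line is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: tidies one line in the way described for Read_File.
sub normalise_line
{
    my ( $line ) = @_;

    $line =~ s/^\s+//;     # No white-space before the first word
    $line =~ s/\s+$//;     # No white-space (or newline) at the end
    $line =~ s/\s+/ /g;    # Exactly one space between "words"

    return $line;
}

my $tidy_line = normalise_line( "   M1   net1\tnet2   vdd  \n" );

print "[$tidy_line]\n";
```

With every line in this form, a simple split on a single space is enough to break a line into its words.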

If, alternatively, you want to create a new file based on some or all of the contents of an input file,
you can re-write the body of the code in the previous program like this:

use Netlist_Tools;

MAIN:
{
    my $in_file_r  = Read_File( "BigFile.txt" );
    my $out_file_r = [];

    foreach my $line ( @$in_file_r )
    {
        # Inspect the line, generate new information from it. Write the
        # new information into a list like this:

        push @$out_file_r , "new stuff";    # Don't add \n at the ends of lines
    }

    Write_File( "NewFile.txt" , $out_file_r );

    exit 0;
}

This will write out the contents of a list (@$out_file_r) which you build up piece-meal based on
some or all of what you read from the original input file.

6.3 Interacting With The LSF Queuing Mechanism


The LSF queuing system allows CPU intensive jobs to use the shared CPU resource of most of the
machines in this building. Here’s how to interface to that queuing system while limiting yourself to
a predetermined number of jobs and adding new jobs to the queue as old jobs complete:

# This example shows how to use ELDO for which we have 4 licenses.
# We'll limit ourselves to use 2 of them. While we're limiting ourselves
# here because of scarce license resource, the same code can be used to
# stop queues being flooded with jobs that are pending but consuming
# queue slots (and making yourself pretty damn unpopular).

my $running_jobs = 0;
my $jobs_limit   = 2;

# The dots in the list below stand for the files omitted from this listing.

foreach my $file ( qw( File_1.cir
                       File_2.cir
                       .
                       .
                       .
                       File_98.cir
                       File_99.cir ) )
{
    # Test the queue (the first line of the bjobs output is a header)

    my $bjobs_output = `bjobs -q linux 2>&1`;
    my @bjobs_lines  = split( /\n/ , $bjobs_output );
    $running_jobs    = scalar( @bjobs_lines ) - 1;

    if ( $running_jobs < $jobs_limit )
    {
        my $command = "bsub -q linux \'eldo -nomail -queue -stver $file\'";

        system( $command );

        print "\nJob $command submitted\n";
    }
    else
    {
        # Queue is full: wait, then try the same file again
        print ".";
        sleep 10;
        redo;
    }
}

exit;



Useful regular expressions
--------------------------------
Roman numbers
m/^m*(d?c{0,3}|c[dm])(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$/i
--------------------------------
Swap first two words
s/(\S+)(\s+)(\S+)/$3$2$1/
--------------------------------
Keyword = Value
m/(\w+)\s*=\s*(.*)\s*$/ # keyword is $1, value is $2
--------------------------------
Line of at least 80 characters
m/.{80,}/
--------------------------------
MM/DD/YY HH:MM:SS
m|(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)|
--------------------------------
Changing directories
s(/usr/bin)(/usr/local/bin)g
--------------------------------
Expanding %7E (hex) escapes
s/%([0-9A-Fa-f][0-9A-Fa-f])/chr hex $1/ge
--------------------------------
Deleting C comments (imperfectly)
s{
/\* # Match the opening delimiter
.*? # Match a minimal number of characters
\*/ # Match the closing delimiter
} []gsx;
--------------------------------
Removing leading and trailing whitespace
s/^\s+//;
s/\s+$//;
--------------------------------
Turning \ followed by n into a real newline
s/\\n/\n/g;
--------------------------------
Removing package portion of fully qualified symbols
s/^.*:://
--------------------------------
IP address
m/^([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])\.
([01]?\d\d?|2[0-4]\d|25[0-5])\.([01]?\d\d?|2[0-4]\d|25[0-5])$/x;
--------------------------------
Removing leading path from filename
s(^.*/)()
--------------------------------
Extracting columns setting from TERMCAP
$cols = ( ($ENV{TERMCAP} || " ") =~ m/:co#(\d+):/ ) ? $1 : 80;
--------------------------------
Removing directory components from program name and arguments
$name = join(" ", map { s,^\S+/,,; $_ } ($0, @ARGV));
--------------------------------
Checking your operating system
die "This isn't Linux" unless $^O =~ m/linux/i;
--------------------------------
Joining continuation lines in multiline string
s/\n\s+/ /g
--------------------------------
Extracting all numbers from a string
@nums = m/(\d+\.?\d*|\.\d+)/g;
--------------------------------
Finding all-caps words
@capwords = m/(\b[^\Wa-z0-9_]+\b)/g;
--------------------------------
Finding all-lowercase words
@lowords = m/(\b[^\WA-Z0-9_]+\b)/g;
--------------------------------
Finding initial-caps word
@icwords = m/(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)/;
--------------------------------
Finding links in simple HTML
@links = m/<A[^>]+?HREF\s*=\s*["']?([^'" >]+?)[ '"]?>/sig;
--------------------------------
Finding middle initial in $_
$initial = m/^\S+\s+(\S)\S*\s+\S/ ? $1 : "";
--------------------------------
Changing inch marks to quotes
s/"([^"]*)"/``$1''/g
--------------------------------
Extracting sentences (two spaces required)
{ local $/ = "";
    while (<>) {
        s/\n/ /g;
        s/ {3,}/  /g;
        push @sentences, m/(\S.*?[!?.])(?=  |\Z)/g;
    }
}
--------------------------------
YYYY-MM-DD
m/(\d{4})-(\d\d)-(\d\d)/ # YYYY in $1, MM in $2, DD in $3
--------------------------------
North American telephone numbers
m/ ^
(?:
1 \s (?: \d\d\d \s)? # 1, or 1 and area code
| # ... or ...
\(\d\d\d\) \s # area code with parens
| # ... or ...
(?: \+\d\d?\d? \s)? # optional +country code
\d\d\d ([\s\-]) # and area code
)
\d\d\d (\s|\1) # prefix (and area code)
\d\d\d\d # exchange
$
/x
--------------------------------
Exclamations
m/\boh\s+my\s+gh?o(d(dess(es)?|s?)|odness|sh)\b/i
--------------------------------
Extracting lines regardless of line terminator
push(@lines, $1)
while ($input =~ s/^([^\012\015]*)(\012\015?|\015\012?)//);
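--------------------------------

As a usage sketch, here is the “Keyword = Value” expression from the list above wrapped in a subroutine. The name parse_setting and the sample input are assumptions for this example. Note that the (.*) in that pattern is greedy, so trailing white-space ends up inside $2 and is trimmed separately here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ( keyword , value ), or an empty list
# when the line is not of the form "keyword = value".
sub parse_setting
{
    my ( $line ) = @_;

    if ( $line =~ m/(\w+)\s*=\s*(.*)\s*$/ )
    {
        my ( $keyword , $value ) = ( $1 , $2 );

        $value =~ s/\s+$//;    # Remove trailing white-space kept by (.*)

        return ( $keyword , $value );
    }

    return;
}

my ( $keyword , $value ) = parse_setting( "timeout = 30  " );

print "keyword=[$keyword] value=[$value]\n";
```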
REGULAR EXPRESSION & (S)PRINTF SUMMARY

Pattern Matching Operators
=~                              matches, or, contains
!~                              does not match, or, does not contain

The m// Operator (Matching)
EXPR =~ m/PATTERN/cgimosx       search string in EXPR for PATTERN
EXPR =~ /PATTERN/cgimosx        as above (the m is optional when the delimiter is /)
EXPR =~ ?PATTERN?cgimosx        as above but once only match
m/PATTERN/cgimosx               search in $_
/PATTERN/cgimosx                search in $_
?PATTERN?cgimosx                search in $_
Modifiers:
/i                              ignore alphabetic case
/m                              let ^ and $ match next to embedded \n
/s                              let . match newline and ignore deprecated $*
/x                              ignore (most) whitespace and permit comments in pattern
/o                              compile pattern once only
/g                              globally find all matches
/cg                             allow continued search after failed /g match

The s/// Operator (Substitution)
LVALUE =~ s/PATTERN/REPLACEMENT/egimosx
s/PATTERN/REPLACEMENT/egimosx
Modifiers:
/i                              ignore alphabetic case (when matching)
/m                              let ^ and $ match next to embedded \n
/s                              let . match newline and ignore deprecated $*
/x                              ignore (most) whitespace and permit comments in pattern
/o                              compile pattern once only
/g                              replace globally, that is, all occurrences
/e                              evaluate the right side as an expression

The tr/// Operator (Transliteration)
LVALUE =~ tr/SEARCHLIST/REPLACEMENTLIST/cds
tr/SEARCHLIST/REPLACEMENTLIST/cds
y/// is a synonym for tr///
Modifiers:
/c                              Complement SEARCHLIST
/d                              Delete found but unreplaced characters
/s                              Squash duplicate replaced characters

Regex Quantifiers
*                               match 0 or more times (maximal)
+                               match 1 or more times (maximal)
?                               match 1 or 0 times (maximal)
{COUNT}                         match exactly COUNT times
{MIN,}                          match at least MIN times (maximal)
{MIN,MAX}                       match at least MIN but not more than MAX times (maximal)
*?                              match 0 or more times (minimal)
+?                              match 1 or more times (minimal)
??                              match 0 or 1 time (minimal)
{MIN,}?                         match at least MIN times (minimal)
{MIN,MAX}?                      match at least MIN but not more than MAX times (minimal)

Extended Regex Sequences
(?#...)                         comment, discard
(?:...)                         cluster-only parentheses, no capturing
(?imsx-imsx)                    enable/disable pattern modifiers
(?imsx-imsx:...)                cluster-only parentheses plus modifiers
(?=...)                         true if lookahead assertion succeeds
(?!...)                         true if lookahead assertion fails
(?<=...)                        true if lookbehind assertion succeeds
(?<!...)                        true if lookbehind assertion fails
(?>...)                         match nonbacktracking subpattern
(?{...})                        execute embedded Perl code
(??{...})                       match regex from embedded Perl code
(?(...)...|...)                 match with if-then-else pattern
(?(...)...)                     match with if-then pattern

Classic Character Classes
\d                              digit [0-9]
\D                              non-digit [^0-9]
\s                              whitespace [ \t\n\r\f]
\S                              non-whitespace [^ \t\n\r\f]
\w                              word [a-zA-Z0-9_]
\W                              non-word [^a-zA-Z0-9_]

Meta-characters   \ | ( ) [ { ^ $ * + ? .
\...                            de-meta next meta, or, meta next non-meta character
...|...                         alternation (match one of many)
(...)                           grouping (treat as a unit). Patterns are stored
                                in $1, $2, etc. after match (\1 \2 inside match)
(?:PATTERN)                     group/cluster but don't capture
[...]                           character class (match one character from a set)
^                               true at beginning of string (or with /m after newline)
\A                              true at beginning of string
.                               match one character (except newline, normally)
$                               true at end of string (or with /m before newline)
\z                              true at end of string
\Z                              true before newline at end of string,
                                otherwise at end of string
\b                              match at word boundary (i.e. \w\W or \W\w)
\B                              match at not a word boundary (i.e. \w\w or \W\W)
\G                              continue from where the last match ended

Regex Metasymbols
\0                              the null character
\NNN                            octal character
\1, \2, ...                     nth captured string
\a                              alarm character
\cX                             control X
\C                              match C char
\e                              ASCII esc
\E                              end case (\L \U or \Q)
\f                              form-feed
\l                              lowercase character
\L                              lowercase until \E
\n                              newline
\Q                              de-meta until \E
\r                              return
\t                              TAB
\u                              titlecase character
\U                              uppercase until \E
\x                              match hex character

printf/sprintf
%%                              percent sign
%c                              character
%s                              string
%d                              signed dec int
%u                              unsigned dec int
%o                              unsigned oct int
%x                              unsigned hex int
%e                              float scientific
%f                              float decimal
%g                              float %e or %f
%X                              like %x but UC
%E                              like %e uses 'E'
%G                              like %g uses 'G'
%b                              unsigned binary
%p                              a pointer
%n                              perl specific

Regexp Grabbag
assign and substitute in one go         ($copy = $original) =~ s/this/that/;
swap two words                          s/(\S+)(\s+)(\S+)/$3$2$1/;
keyword = value                         m/(\w+)\s*=\s*(.*)\s*$/;
line of at least N characters           m/.{80,}/;
changing directories                    s(/usr/bin)(/usr/local/bin)g;
remove & compress whitespace            s/^\s+//; s/\s+$//; s/\s+/ /g;
turn \ followed by n into a real newline    s/\\n/\n/g;
integer                                 $arg =~ m/^\d+$/;
integer + suffix                        $arg =~ m/^\d+[munpfa]$/;
                                        (suffixes: milli, micro, nano, pico, femto, atto)
float                                   $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?$/;
float + suffix                          $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?[munpfa]$/;
filename                                $arg =~ m/[A-Za-z_][\d\w_\.]*$/;
Stepping and Running
s - single step (single step subroutines too)
s EXPR - single step an expression (inc. subroutines & functions)
n - single step (don't single step subroutines)
n EXPR - single step an expression (not subroutines & functions)
<ENTER> - repeat previous s or n command
. - set internal debugger pointer to last line executed and print line
r - continue until the currently executing subroutine returns

Breakpoints
b - set a breakpoint on the line about to execute
b LINE [b 73] - set a breakpoint before LINE
b CONDITION [b $x>10] - set breakpoint on next line with condition
b LINE CONDITION [b 40 $a>12] - set a breakpoint on LINE with condition
b SUBNAME [b Load_File] - set breakpoint before first line of subroutine
b SUBNAME CONDITION - set breakpoint before first line of subroutine with condition
b postpone SUBNAME - set breakpoint at first line of subroutine after compilation
b postpone SUBNAME CONDITION - set breakpoint at first line of subroutine after compilation with condition
b compile SUBNAME - set breakpoint on first statement to be executed after SUBNAME is compiled
b load FILENAME - set breakpoint on first executed line in file
d - delete breakpoint on the line about to execute
d LINE [d 224] - delete breakpoint on line LINE
D - delete all breakpoints
L - list all breakpoints and actions
c - continue execution
c LINE [c 76] - continue execution (set one-time breakpoint on line LINE)

Tracing
T - produce a stack backtrace
t - trace the program
t EXPR - trace an expression
W - delete all watchpoints
W EXPR - add expression as global watchpoint
p - print
p EXPR - print an expression
x - pretty-print (will recursively print data structures)
x EXPR - pretty-print an expression
V - display all variables in current package
V PKG - display all variables in a package
V PKG VARS - display named variables in a package
X - same as V in CURRENTPACKAGE
X VARS - same as V in CURRENTPACKAGE
H - show all commands
H-NUMBER - show last NUMBER commands

Actions and Command Execution
a - delete action on current line
a COMMAND - add action to current line
a LINE - delete action on line
a LINE COMMAND - add action to line
A - delete all actions
< - delete all actions before prompt
< ? - show action before prompt
< EXPR - add action before prompt
<< EXPR - add another action before prompt
> - delete all actions after prompt
> ? - show action after prompt
> EXPR - add action after prompt
>> EXPR - add another action after prompt
{ - like < but a debugger command
{ ? - like < ? but a debugger command
{ COMMAND - like < COMMAND but a debugger command
{{ COMMAND - like << COMMAND but a debugger command
! - repeat previous command
! NUMBER - repeat a numbered command
! -NUMBER - repeat command counting backward
! PATTERN - repeat command containing PATTERN
!! CMD - run external command in sub-process
| - pipe external command to $ENV{PAGER}
| DBCMD - pipe debugger command DBCMD
|| PERLCMD - pipe perl command PERLCMD

Locating Code
l - list the next few lines of code
l LINE - list code from line LINE
l MIN+INCR - list INCR+1 lines of code from line MIN
l MIN-MAX - list code from lines MIN to MAX
- - list a previous few lines
w - list a window (a few lines) around the current line
w LINE - list a window (a few lines) around line LINE
f FILENAME - view a different program or eval statement
/PATTERN/ - search forward for PATTERN. / repeats previous search
?PATTERN? - search backward for PATTERN. ? repeats previous search
S - list all subroutine names
S PATTERN - list all subroutine names matching PATTERN
S !PATTERN - list all subroutine names not matching PATTERN

Miscellaneous Commands
q or ^D - quit the debugger
R - restart the debugger
= or = ALIAS - list all aliases or a named ALIAS
= ALIAS VALUE - create an alias
man or man MANPAGE - show man page for man or a named MANPAGE
O - show all options
O OPTION - set listed options to 1
O OPTION? - show listed options
O OPTION=VALUE - set an option to a value

Always: use warnings; use strict; use diagnostics;
Pattern Matching Operators Extended Regex Sequences Regex Metasymbols
=~ matches, or, contains (?#...) comment, discard \0 the null character
!~ does not match, or, does not contain (?:...) cluster-only parentheses, no capturing \NNN octal character
(?imsx-imsx) enable/disable pattern modifiers \n nth captured string
The m// Operator (Matching) (?imsx-imsx:...) cluster-only parentheses plus modifiers \a alarm character
EXPR =~ m/PATTERN/cgimosx search string in EXPR for PATTERN (?=...) true if lookahead assertion succeeds \cX control X
EXPR =~ /PATTERN/cgimosx as above (the m is optional with / delimiters) (?!...) true if lookahead assertion fails \C match C char
EXPR =~ ?PATTERN?cgimosx as above, but matches only once between calls to reset() (?<=...) true if lookbehind assertion succeeds \e ASCII esc
m/PATTERN/cgimosx search in $_ (?<!...) true if lookbehind assertion fails \E end case (\L\U or \Q)
/PATTERN/cgimosx search in $_ (?>...) match nonbacktracking subpattern \f form-feed
?PATTERN?cgimosx search in $_ (?{...}) execute embedded Perl code \l lowercase character
/i ignore alphabetic case (??{...}) match regex from embedded Perl code \L lowercase until \E
/m let ^ and $ match next to embedded \n (?(...)...|...) match with if-then-else pattern \n newline
/s let . match newline and ignore deprecated $* (?(...)...) match with if-then pattern \Q de-meta until \E
/x ignore (most) whitespace and permit comments in pattern \r return
/o compile pattern once only Classic Character Classes \t TAB
/g globally find all matches \d digit [0-9] \D non-digit [^0-9] \u titlecase character
/cg allow continued search after failed /g match MODIFIERS \s whitespace [ \t\n\r\f] \S non-whitespace [^ \t\n\r\f] \U uppercase until \E
\w word [a-zA-Z0-9_] \W non-word [^a-zA-Z0-9_] \x match hex character
The s/// Operator (Substitution)
LVALUE =~ s/PATTERN/REPLACEMENT/egimosx Meta-characters \ | ( ) [ { ^ $ * + ? . printf/sprintf
s/PATTERN/REPLACEMENT/egimosx \... de-meta next meta, or, meta next non-meta character %% percent sign
/i ignore alphabetic case (when matching) ...|... alternation (match one of many) %c character
/m let ^ and $ match next to embedded \n (...) grouping (treat as a unit). Patterns are stored %s string
/s let . match newline and ignore deprecated $* in $1, $2, etc. after match (\1 \2 inside match) %d signed dec int
/x ignore (most) whitespace and permit comments in pattern (?:PATTERN) group/cluster but don't capture %u unsigned dec int
/o compile pattern once only [...] character class (match one character from a set) %o unsigned oct int
/g replace globally, that is, all occurrences ^ true at beginning of string (or with /m after newline) %x unsigned hex int
/e evaluate the right side as an expression MODIFIERS \A true at beginning of string %e float scientific
. match one character (except newline, normally) %f float decimal
The tr/// Operator (Transliteration)
$ true at end of string (or with /m after newline) %g float %e or %f
LVALUE =~ tr/SEARCHLIST/REPLACEMENTLIST/cds
\z true at end of string %X like %x but UC
tr/SEARCHLIST/REPLACEMENTLIST/cds
\Z true before newline at end of string, %E like %e uses 'E'
y/// is a synonym for tr///
otherwise at end of string %G like %g uses 'G'
/c Complement SEARCHLIST
\b match at word boundary (i.e. \w\W or \W\w) %b unsigned binary
/d Delete found but unreplaced characters
\B match at not a word boundary (i.e. \w\w or \W\W) %p a pointer
/s Squash duplicate replaced characters MODIFIERS
\G continue from where the last match ended %n perl specific
Regex Quantifiers Regexp Grabbag
* match 0 or more times (maximal) assign and substitute in one go ($copy = $original) =~ s/this/that/;
+ match 1 or more times (maximal) swap two words s/(\S+)(\s+)(\S+)/$3$2$1/;
? match 1 or 0 times (maximal) keyword = value m/(\w+)\s*=\s*(.*)\s*$/;
{COUNT} match exactly COUNT times line of at least N characters m/.{80,}/;
{MIN,} match at least MIN times (maximal) changing directories s(/usr/bin)(/usr/local/bin)g;
{MIN,MAX} match at least MIN but not more than MAX times (maximal) remove & compress whitespace s/^\s+//; s/\s+$//; s/\s+/ /g;
*? match 0 or more times (minimal) turn \ followed by n into a real newline s/\\n/\n/g;
+? match 1 or more times (minimal) milli,micro,nano,pico,femto,atto
?? match 0 or 1 time (minimal) integer $arg =~ m/^\d+$/;
{MIN,}? match at least MIN times (minimal) integer + suffix $arg =~ m/^\d+[munpfa]$/;
{MIN,MAX}? match at least MIN but not more than MAX times (minimal) float $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?$/;
float + suffix $arg =~ m/^([+-]?)(\d+)?\.\d\d*([Ee]([+-]?\d+))?[munpfa]$/;
REGULAR EXPRESSION & (S)PRINTF SUMMARY filename $arg =~ m/[A-Za-z_][\d\w_\.]*$/;

Advanced Perl

Style
September 2005
A Standard Header

 This works in Bristol.


#!/usr/local/bin/perl

use strict;                  # Preamble
use warnings;
use diagnostics;

use Carp;                    # Some standard modules
use Cwd;
use Config;

# Extend lib path
use lib ( "/design/rmc/tools/Perl_Modules/tool/current/" );
use lib ( "/design/rmc/tools/Perl_Modules/tool/current/OS_SPECIFIC/$Config{archname}" );

use FindBin qw( $Bin );
use lib $Bin;                # Current directory

use Netlist_Tools;           # Site specific

 There are other binary invocations that use “eval” with some “magic”.

The magic #! line works for all machines on-site, regardless of whether they are
SunOS (Solaris) or Linux based.

We always use strict and warnings. Diagnostics are useful for less experienced
programmers but, if omitted, can be added on a command line invocation with
-Mdiagnostics.

Carp is the standard blame shifter (makes errors show up in client code rather than in
your code). Cwd is a platform independent way of finding the current working directory.
Config is used to allow programs to transparently load precompiled code (C, C++ etc.)
on different binary platforms.

FindBin allows a program to find out from what directory it is being run and to add that
directory to Perl’s path.

Netlist_Tools are site specific tools.


Program Structure

 Structure your program in the same way you would structure a C program.
#!/usr/local/bin/perl

# Standard Header
use strict;
use warnings;
use diagnostics;

# Forward Declarations
sub subroutine_1( $$$ );

# Main Program
MAIN:
{
    my $variable_1 = 27;

    # Program code - equivalent of C's main()
}

# Exit
exit;

# Subroutines
sub subroutine_1( $$$ )
{
    # Subroutine body - can't see $variable_1 unless it was passed as a
    # parameter in the subroutine call.
}

By placing all the code for your program into subroutines and one top-level code block
(here called “Main Program”), we can enforce the scope of all variable declarations
and reduce or eliminate side-effects. Note that the top-level code block is headed by a
label (MAIN:) but this is optional, and the name of the block can be anything (I’ve called
it MAIN to lull C programmers into a false sense of security).
If You Must Use Global variables

 Make all global variables, package variables.


#!/usr/local/bin/perl

# Standard Header
use strict;
use warnings;
use diagnostics;

# Forward Declarations
sub subroutine_1();

# Global Declarations
$main::count = 56;

MAIN:
{
    $main::count = 27;    # Write

    subroutine_1();
}

exit;

sub subroutine_1()
{
    print "The value of count is $main::count\n";    # Read
}

There really isn’t any good reason to use global variables in the sense shown above.
The problem is that a package variable is visible to every subroutine in the program,
whichever file or package it lives in. Therefore any subroutine can modify it and cause
other subroutines that also see the variable to change their behavior - this isn’t usually
what is intended.
Subroutine Parameters - I

 Three choices: Pass by value, by reference, or by value in a hash (next slide).

# The Right Way (Value)

sub subroutine_1( $$$ )
{
    my ( $var_1 , $var_2 , $var_3 ) = @_;

    # Subroutine code goes here. $var_1 etc are private to this code
}

# The Wrong Way (Reference) - a separate example

MAIN:
{
    my $a = 56;

    subroutine_1( $a );

    print "A=$a\n";    # $a has been changed behind the caller's back
}

exit;

sub subroutine_1( $ )
{
    $_[0] = 99; # Alter the first element of the @_ array
}

Of the ways to pass parameters to a subroutine, the best way (the correct way) is to
pass them by value. This is done by copying all the parameters into local variables
(lexical variables) at the start of the subroutine. Make this the first thing that any
subroutine does. Then, if you change the value of any of the variables then it doesn’t
affect the value of that variable in the code that called your subroutine. If you do want
to change the value of one of the input parameters then you can pass by reference
(option 1), or you can return a new value for the variable as a return value from the
subroutine and assign it back to the corresponding variable in the calling code (option
2). Option 1 corresponds to the way you might choose to do this in C. Option 2 is the
correct way to do this in Perl. Note: option 1 and option 2 DO NOT refer to the two
sections of code above.
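As a minimal sketch of option 2 next to the aliasing trap, the two hypothetical subroutines below (double_it and clobber are names invented for this illustration) show the difference. The first copies its argument and returns a new value, the second writes through @_.

```perl
use strict;
use warnings;

# Option 2: copy the argument, compute a new value, return it.
sub double_it
{
    my ( $number ) = @_;        # Private copy; the caller's variable is untouched

    $number = $number * 2;      # Only the copy changes

    return $number;
}

# The trap: @_ aliases the caller's variables, so this changes the caller's data.
sub clobber
{
    $_[0] = 99;

    return;
}

my $cost    = 50;
my $doubled = double_it( $cost );

print "cost=$cost doubled=$doubled\n";     # cost is still 50

my $victim = 50;
clobber( $victim );                        # $victim is now 99

print "victim=$victim\n";
```

With option 2 the calling code decides whether to overwrite its own variable, for example with $cost = double_it( $cost ).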
Subroutine Parameters - II

 Pass by value in a hash.


 Replaces ordering (bad) by naming (good).
sub format_line
{
    my ( $args_r ) = @_;

    $args_r->{ justify } = 0 unless( exists( $args_r->{ justify } ) );    # Default

    my $gap = $args_r->{ cols } - length $args_r->{ text };

    my $left = $args_r->{ justify } ? int( $gap / 2 ) : 0;
    my $right = $gap - $left;

    return $args_r->{ filler } x $left .
           $args_r->{ text } .
           $args_r->{ filler } x $right;
}

# Then later . . .

foreach my $line ( @lines )
{
    # The { ... } creates a reference to a hash.
    # SPACE is a constant assumed to be defined elsewhere with use constant.
    $line = format_line( { text => $line , cols => 20 , justify => 1 , filler => SPACE } );
}

If we pass values in a hash then we replace positional information by name
information. We can optionally set up default values in the subroutine (as the line
marked Default does for justify) so that any missing parameters do not cause the
subroutine to fail.
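One way to supply defaults for every named parameter at once is to merge a list of defaults with the caller's hash. The sketch below is a variation on the slide's subroutine, not the course's own code. The default values chosen here (20 columns, a space as filler) are assumptions made for the illustration.

```perl
use strict;
use warnings;

sub format_line
{
    my ( $args_r ) = @_;

    # Defaults come first and the caller's values second, so the caller wins
    my %args = ( cols => 20 , justify => 0 , filler => q{ } , %{ $args_r } );

    my $gap = $args{ cols } - length $args{ text };
    $gap    = 0 if ( $gap < 0 );            # Text wider than the field: no padding

    my $left  = $args{ justify } ? int( $gap / 2 ) : 0;
    my $right = $gap - $left;

    return ( $args{ filler } x $left ) . $args{ text } . ( $args{ filler } x $right );
}

my $centred = format_line( { text => 'Perl' , cols => 10 , justify => 1 , filler => '.' } );
my $padded  = format_line( { text => 'Perl' } );    # Every default applies

print "[$centred]\n";
print "[$padded]\n";
```

Because the copy is made into a private hash, the caller's hash is not modified, which the slide's version (assigning into $args_r) does not guarantee.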
Passing Arrays And Hashes To Subroutines

 Pass arrays and hashes as references:


MAIN:
{
    my @list_1 = qw( Alpha Baker Charlie Delta );
    my @list_2 = qw( Zulu Yankee Xray Whisky );

    subroutine_1( \@list_1 , \@list_2 );    # Pass As References
}

exit;

sub subroutine_1( $$ )
{
    my ( $list_1_r , $list_2_r ) = @_;    # Copy To Lexicals

    print $list_1_r->[ 1 ] , " " , $list_2_r->[ 3 ] , "\n";    # Use As References
}

Note a subtlety: We’re passing references here to make our program fast. If the lists
that the references point to are large, then we don’t end up copying those large lists
via the stack. We localise the references into subroutine_1 with the my statement, but
we can still change any value in the lists that the references point to, by simply running
the code as shown with $list_1_r->[ 1 ] on the left-hand side of an assignment. In this
respect we’ve exactly emulated C where we’ve called a subroutine with a const pointer
- you can’t change the pointer but you can change the thing it’s pointing at. We’ve also
violated our “Option 2” rule from “Subroutine Parameters - I”, two slides back.
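The subtlety can be seen in a few lines. In this sketch (shout_first is a hypothetical name), the subroutine holds only a private copy of the reference, yet the assignment through it changes the caller's array.

```perl
use strict;
use warnings;

sub shout_first
{
    my ( $list_r ) = @_;                        # A copy of the reference, not of the list

    $list_r->[ 0 ] = uc( $list_r->[ 0 ] );      # Writes into the caller's array

    return scalar( @{ $list_r } );              # Number of elements in the list
}

my @names = qw( alpha baker charlie );
my $count = shout_first( \@names );

print "$count names, first is now $names[0]\n";
```

If the caller's data must stay untouched, the subroutine can take a copy with my @copy = @{ $list_r }; at the cost of the copying that passing a reference was meant to avoid.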
Returning Results From Subroutines - I

 Subroutines can decide what to return based on context:


MAIN:
{
    my @list = subroutine_1();
    my $scalar = subroutine_1();

    print "@list $scalar";
}

exit;

sub subroutine_1
{
    if ( wantarray )
    {
        return qw( one two three four five );
    }
    else
    {
        return( "once I caught a fish alive\n" );
    }
}

Subroutines can return data in context, that is, subroutines can be made to know how
they were called: in list context or in scalar context.
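A common use of wantarray is to return the matching items in list context and just their count in scalar context. The following is a small sketch along those lines. The subroutine name and the sample data are made up for the example.

```perl
use strict;
use warnings;

sub matching_lines
{
    my ( $pattern , @lines ) = @_;

    my @hits = grep { m/$pattern/ } @lines;

    # List context: the matching lines. Scalar context: how many there were.
    return wantarray ? @hits : scalar( @hits );
}

my @log = ( 'error: disk full' , 'ok' , 'error: network down' );

my @errors   = matching_lines( 'error' , @log );    # List context
my $how_many = matching_lines( 'error' , @log );    # Scalar context

print "Found $how_many errors, the first is '$errors[0]'\n";
```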
Returning Results From Subroutines - II

 You want to return several scalars from a subroutine:


MAIN:
{
my @values = qw ( 6.32 7.88 9.54 12.83 17.99 31.36 18.25 );
my ( $mean , $median , $mode , $variance ) = statistics( @values );

# Code to print out results


}

exit;

sub statistics
{
# Code to compute mean, median, mode, variance

return( $mean , $median , $mode , $variance );


}

If you want to return more than one thing from a subroutine, then return a list. You can
then assign that list to another list in the calling code. Note that this can be error prone
(you need to get the right number and order of variables with no language assistance).
You could return a hash with named results, but you then run the risk (especially if you
pass subroutine parameters in a hash as well) of turning each subroutine call into
something with more overhead than code.
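As a sketch of the list-return style, here is one possible body for a statistics subroutine. It is an illustration only: it returns mean, median, minimum and maximum (mode and variance from the slide are left out to keep it short), and the order of the returned values is exactly the thing the caller has to get right.

```perl
use strict;
use warnings;

sub statistics
{
    my @sorted = sort { $a <=> $b } @_;     # Numeric sort of a private copy
    my $count  = @sorted;

    return if ( $count == 0 );              # Nothing to report for an empty list

    my $sum = 0;
    foreach my $value ( @sorted )
    {
        $sum += $value;
    }

    my $middle = int( $count / 2 );
    my $median = ( $count % 2 )
               ? $sorted[ $middle ]
               : ( $sorted[ $middle - 1 ] + $sorted[ $middle ] ) / 2;

    return ( $sum / $count , $median , $sorted[ 0 ] , $sorted[ -1 ] );
}

my ( $mean , $median , $min , $max ) = statistics( 2 , 6 , 4 , 8 );

print "mean=$mean median=$median min=$min max=$max\n";
```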
The Equivalent Of C Static Variables - I

 Sometimes you want to be able to create a variable in a subroutine that will
maintain its value between subroutine calls. Here’s how to do this:

MAIN:
{
    my $tmp;

    $tmp = count(); print "Tmp = $tmp\n";
    $tmp = count(); print "Tmp = $tmp\n";
}

exit;

BEGIN
{
    my $count_value = 0;

    sub count()
    {
        $count_value++;
        return $count_value;
    }
}

Subroutine names are globally visible, so even though count() is buried one level down
everything/anything that needs to call it can do so. However, with code written as
shown, count() can access the variable named $count_value but nothing else in the
program can. It’s a lexical variable and not a package variable (so you can’t say
$main::count_value because that isn’t the way to access this particular variable) and
the fact that count() is referring to it will make sure that perl keeps its reference count
non-zero (so it is persistent and exists for the lifetime of the program). As long as we
make the block in which it is defined a BEGIN block then it will be initialised by Perl
before any of your code starts to run.
The Equivalent Of C Static Variables - II

 Several subroutines sharing common access to provide a “global” variable that
cannot suffer from unintended side-effects.

MAIN:
{
    my $tmp;

    initialize( 37 );

    $tmp = increment(); print "Tmp = $tmp\n";
    $tmp = decrement(); print "Tmp = $tmp\n";
}

exit;

BEGIN
{
    my $value = 0;

    sub initialize( $ ) { $value = shift @_; }

    sub increment() { $value++; return $value; }

    sub decrement() { $value--; return $value; }
}

This is a very secure way to create something that can be accessed from anywhere in
a controlled and predictable manner. The variable $value is secure from any
unintended side-effects (or even intended ones) and can be
initialized/incremented/decremented from anywhere (you could of course also add a
read subroutine to just return the value). We’ve almost strayed into OO land here since
we’ve created something that is encapsulated (the variable value) and can only be
accessed via subroutine calls (equivalent of OO methods).
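The read subroutine mentioned above could look like the sketch below. The name current_value is an assumption, and prototypes are left off these subroutines to keep the example short.

```perl
use strict;
use warnings;

BEGIN
{
    my $value = 0;      # Visible only to the subroutines inside this block

    sub initialize    { ( $value ) = @_; return $value; }

    sub increment     { $value++; return $value; }

    sub decrement     { $value--; return $value; }

    sub current_value { return $value; }    # Read-only access
}

initialize( 37 );
increment();
increment();
decrement();

my $now = current_value();

print "Value is now $now\n";
```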
Implementing A SWITCH Statement

 Perl doesn’t have a SWITCH statement. Here’s how to code one:


SWITCH:
{
    if ( $condition == TRUE )
    {
        # Run some code

        last SWITCH;
    }
    if ( $some_other_condition == TRUE )
    {
        # Run some other code

        last SWITCH;
    }

    # Run some default code
}

Here, SWITCH is a label (so each switch statement needs a different label and this is
a drawback) while the last SWITCH piece of code is the equivalent of C’s break. A bare
block is a loop that executes exactly once, so last leaves it (next would also leave
it, because there is no further iteration to move on to), and redo is the command that
would run the block again from the top.
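Here is a small runnable sketch of the idiom, using last in every clause so that exactly one branch runs. The subroutine classify and its three categories are invented for the illustration.

```perl
use strict;
use warnings;

sub classify
{
    my ( $line ) = @_;

    my $kind = undef;

    SWITCH:
    {
        if ( $line =~ m/^\s*\#/ )
        {
            $kind = 'comment';
            last SWITCH;            # Equivalent of C's break
        }
        if ( $line =~ m/^\s*$/ )
        {
            $kind = 'blank';
            last SWITCH;
        }

        $kind = 'code';             # Default case
    }

    return $kind;
}

foreach my $line ( '# a comment' , '' , 'my $x = 1;' )
{
    print classify( $line ) , "\n";
}
```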
Labels - Use Them/Don’t Use Them

 Labels are, by convention, UPPERCASE.


OUTER:
foreach my $item ( @item_list )
{
    INNER:
    foreach my $object ( @object_list )
    {
        # Code

        next OUTER if ( $some_condition == TRUE );

        # Code

        next INNER if ( $some_other_condition == TRUE );
    }
}

Use labels to be explicit about where the commands next and last transfer you (and
goto, but you’re never going to use goto, are you!).

If you use labels it is always clear where you are transferring control to, but it is never clear at
the transfer point (i.e., the actual label) where transfer of control has come from, and this
makes it very hard to debug code – next and last with labels are just synonyms for goto (and
you’re never going to use goto, are you!) On balance, use labels for SWITCH and one level
loop operations.
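A short sketch of a labelled next in a nested loop is given below. The label is attached directly to the loop it names, so next ROW moves on to the next row. The data and variable names are made up.

```perl
use strict;
use warnings;

my @rows = ( [ 1 , 2 , 3 ] , [ 4 , -1 , 6 ] , [ 7 , 8 , 9 ] );
my @clean_rows;

ROW:
foreach my $row_r ( @rows )
{
    foreach my $cell ( @{ $row_r } )
    {
        next ROW if ( $cell < 0 );      # Abandon this row and start the next one
    }

    push( @clean_rows , $row_r );       # Only reached if no cell was negative
}

print scalar( @clean_rows ) , " rows had no negative values\n";
```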
Writing Efficient, Maintainable And Useable Code - I

 Package useful code into subroutines/modules and then share it.

 Install tools in:


 /design/rmc/tools/

 Install modules in:


 /design/rmc/tools/Perl_Modules/tool/dev/

 Add comments, lots of comments.


 Format your code so it is readable.
 Write documentation.
 Use Netlist_Tools.pm
 Look on CPAN.

 Don’t reinvent the wheel.


Writing Efficient, Maintainable And Useable Code - II

 Put tests in the “right” order.


foreach my $line ( @very_large_file )
{
    if ( $line =~ m/^\s*\#/ ) # Lines that are comments (start with a #)
    {
        next;
    }
    if ( $line =~ m/^$/ ) # Lines that are blank
    {
        next;
    }
    if ( $line =~ m/^\s+/ ) # Lines that contain leading white-space
    {
        next;
    }
    if ( $line =~ m/^\S+/ ) # Lines without leading white-space
    {
        # The common case, so this test should go first

        # Code to process $line

        next;
    }
}

 Don’t reinvent the wheel.

If you’re writing code that makes several different tests on some data, put the most common
tests before the less common ones.

If you run the code on the slide with a file containing 10 million lines, of which 99.99%
are neither comments, nor blank, nor indented with white-space, then you’ll end up executing
approximately 40 million tests. If you put the bottom-most test (the test for lines without
leading white-space) first, then the same code will execute about 10 million tests.
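The effect is easy to measure on a small scale. The sketch below uses made-up data (97 ordinary lines and one each of the rare kinds) and counts how many pattern tests are executed with the common case first. Note one assumption that differs from the slide: the first test is written as m/^[^\s\#]/ so that comment lines starting in column one are not mistaken for data once the test is moved to the top.

```perl
use strict;
use warnings;

my @lines = ( "# comment\n" , "\n" , "  indented\n" , ( "data line\n" ) x 97 );

my $tests     = 0;      # Number of pattern tests executed
my $processed = 0;      # Number of data lines handled

foreach my $line ( @lines )
{
    $tests++;
    if ( $line =~ m/^[^\s\#]/ )     # The common case goes first
    {
        $processed++;
        next;
    }

    $tests++;
    next if ( $line =~ m/^\s*\#/ ); # Comments

    $tests++;
    next if ( $line =~ m/^$/ );     # Blank lines

    $tests++;
    next if ( $line =~ m/^\s+/ );   # Leading white-space
}

print "$processed lines processed using $tests tests\n";
```

With the slide's original ordering the same 100 lines would need 394 tests (four for each of the 97 data lines, plus 1 + 2 + 3 for the others). Here the count is 106.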
Hints For Readable Code - I

 This is not the way to do it …


my $lef_filename = undef;
my $log_filename = undef;
my $default_log_filename = lefPortStrip.log";
my $pin_names_r = [];
my $layer_names_r = [];

run_lef_import( $lef_filename , $log_filename , $default_log_filename ,
pin_names_r , $layer_names_r );

 But this is …
my $lef_filename = undef;
my $log_filename = undef;
my $default_log_filename = "lefPortStrip.log";
my $pin_names_r = [];
my $layer_names_r = [];

run_lef_import( $lef_filename ,
$log_filename ,
$default_log_filename ,
$pin_names_r ,
$layer_names_r );

Note how much easier the missing opening quote and the missing $ sign (lines 3 and 8
of the upper example) are to spot when the code is laid out like the lower example.
Hints For Readable Code - II

 If you’re writing a complex “if” statement, line up the brackets:


if ( ( $day == SUNDAY ) &&
     ( $full_moon == TRUE ) &&
     ( $spring_equinox == TRUE ) )
{
    print "It's Easter Sunday\n";
}

 Use a 2 or 4 column indent and be consistent in its usage.


 Put the opening curly brace on the line after a keyword and lined up with the start
of the keyword.
 A one-line BLOCK may be put on one line, including left- and right-brace.

if ( $flag == TRUE ) { $result = PI; $next_example = FALSE; }

 Do put space both before and after a “,” when separating parameters and list items.
 Do put space around most (all) operators.
 Do put space around complicated subscripting code.

Don’t forget the semicolon in the one-line block case (the semicolon after the E in
FALSE). It is optional there, but it shouldn’t be.
Hints For Readable Code - III

 Do put blank lines between sections of code that do different things.


 Do break long lines after an operator.
 Do omit redundant punctuation as long as clarity doesn't suffer.

 Don’t put space before the semicolon after a statement.


 Don’t put space between a function name and its opening parenthesis.
Using Constants

 This is far more clear:

if ( ( $day == SUNDAY ) &&
     ( $full_moon == TRUE ) &&
     ( $spring_equinox == TRUE ) )
{
    print "It's Easter Sunday\n";
}

 than this:

if ( ( $day == 6 ) &&
     ( $full_moon == 1 ) &&
     ( $spring_equinox == 1 ) )
{
    print "It's Easter Sunday\n";
}

 Using constants that are all UPPERCASE is a common convention.

You define constants with the use constant pragma:

use constant CONSTANT_NAME => value;

For example:

use constant PI => 3.1415926535;
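Putting this together, a minimal sketch might look like the following. The value 6 for SUNDAY simply mirrors the number used on the slide, and the variable names are illustrative. Note that constants are not interpolated inside double-quoted strings, so they are joined in with the . operator.

```perl
use strict;
use warnings;

use constant TRUE   => 1;
use constant FALSE  => 0;
use constant SUNDAY => 6;               # Matches the literal 6 used on the slide
use constant PI     => 3.1415926535;

my $day    = 6;
my $radius = 2;

my $is_sunday = ( $day == SUNDAY ) ? TRUE : FALSE;
my $area      = PI * $radius * $radius;

# "PI" inside a string is just the two letters, so concatenate instead
print "Is Sunday: $is_sunday\n";
print 'Area using PI=' . PI . " is $area\n";
```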


Make The Use Of References Obvious

 Tag all references with “_r”:

my $array_r = []; # Create a reference to an empty list

# and then later

$array_r->[ 56 ] = PI;

 While it should be obvious that this will fail at run time (under use strict):

my $number = 56;

# and then later

$number->[ 0 ] = get_random_integer();

If your code uses references, make sure that the variable names that are used are tagged with
something that makes it obvious they’re references, like _r. If you do this consistently it then
becomes obvious when you try to use something that is/is not a reference in a dereference
operation. For example, in the code above it’s obvious that you should only be using the
dereference operator (the ->) on a reference.
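For reference, this is what happens when a plain number is dereferenced under use strict. The sketch wraps the offending statement in eval only so that the error can be captured and printed instead of ending the program.

```perl
use strict;
use warnings;

my $array_r = [];       # A real reference: the _r tag says so
my $number  = 56;       # Not a reference: no _r tag

$array_r->[ 0 ] = 1;    # Fine

# Dereferencing a plain number is a run-time error under strict refs
my $ok = eval { $number->[ 0 ] = 1; 1; };

print "Dereference failed: $@" unless ( $ok );
```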
Avoid Using Default Values

 Avoid $_ and its cousins.


 You could program this:

foreach ( @_ )
{
print; # By default this statement will print $_
}

 but this is far better:


foreach my $book_title ( @library )
{
print "$book_title\n";
}

When using a loop construct like foreach, don’t use the defaults allowed by Perl. It is
permissible to say remarkably little, but the first example doesn’t tell you much about
what’s going on or why. The second example tells you exactly what is intended.

Using default values leads to concise code that can be very difficult to read (even if
*you* wrote it). Don’t assume that your code will be debugged by you or that the
person debugging it will know what all the default values are. Keep it clear. Keep it
simple. It’s not the obfuscated Perl contest.
Distinguish Between For And Foreach

 The right way:


foreach my $name ( @friends )    # Use foreach with lists
{
    print "I have a friend called $name\n";
}

for ( my $count = 0 ; $count <= 10 ; $count++ )    # Use for with indexes
{
    print "Count = $count\n";
}

 The wrong way:

for my $name ( @friends )
{
    print "I have a friend called $name\n";
}

foreach ( my $count = 0 ; $count <= 10 ; $count++ )
{
    print "Count = $count\n";
}

The Perl keywords, for and foreach are synonyms, so you can use either one to index through
lists or index through values. However, you will confuse others if you use them the wrong way
around (foreach with an index or for with a list).
Use Common Sense - I

 Do use meaningful variable and subroutine names.


 Do use lexical variables.
 Do use lots of comments.
 Do document functions and procedures.
 Don’t use global variables.
 When in doubt use parentheses (which is always).
 Give your users some feedback.
 If you’re programming a GUI in PerlTk, use a progress bar.
 Always program a -help parameter.
 Invoking a program with no parameters should display some help
information.
 Allow default options.
 Make sure a user knows what they are, when he/she asks for help.
 Make error messages clear.
 Use exit codes for chained scripts.

Use meaningful variable and subroutine names. Don’t use variables with the names $a and
$b. See the man page for sort() to understand why. Name variables using my (i.e., use lexical
variables). Never use global variables and don’t be tempted in the heat of debugging to insert
just one or two to get around a problem. Use lots of comments. You’ll be amazed how quickly
you’ll forget just what it was you were trying to express in your code a day, a week, a month, a
year ago. When in doubt use parentheses. Just because you can omit them doesn’t mean you
should omit them. If your program is running for more than a few seconds, give your users
some feedback. If you’re programming a GUI in PerlTk, use a progress bar. If your program is
a command line driven program then always program a -help parameter to give users some
idea of what the program does and what to type. Make the invocation of the program with no
parameters display some help information. Give a user the option to get more help with a
-help parameter. Make error messages clear so a user knows what to fix when things don’t run
the way they expect.
Use Common Sense - II

 Add exit codes. Here’s how:

use constant EXIT_OKAY => 0; # Success


use constant EXIT_BAD_ARGS => 1; # Failed with bad arguments

# Later in your program

if ( $number_of_arguments < 4 ) # Not enough arguments given !


{
exit ( EXIT_BAD_ARGS );
}

# And at the end of your program

exit( EXIT_OKAY );

 Always return values from subroutines with an explicit return statement.


 Why?
 If you ever cut-and-paste code, then that code belongs in a subroutine.
 Why?

Since many programs are often chained together or are run within a single controlling
program, make sure all scripts return an error or success code. The exit code for success is
always 0 (zero). If programs are designed to be chained together in a shell script, then follow
the Unix philosophy of having programs that complete successfully return no output at all (i.e.,
they are silent).

Always return a value from both your program and any subroutines in that program. If you
don’t use an explicit return statement then the value returned is the result of the last statement
evaluated. This will change as you modify your code, and in particular since most code is
added at the end of a program, the return value from what you’re currently writing will be
changing what is seen by whatever wrapper is running your code.

If it’s vital that your code not return a value, because, say, you want to indicate that an error
occurred but it wasn’t a fatal error, then return undef. In Perl undef is a value that
represents not defined.
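The sketch below shows how an implicit return value can surprise a caller. Both subroutines are hypothetical. The first has no return statement, so the caller receives whatever the last statement (here push) happened to produce.

```perl
use strict;
use warnings;

my @log;

# No explicit return: the caller gets the result of push(), which is
# the new number of elements in @log, and it changes on every call.
sub remember_implicit
{
    my ( $message ) = @_;

    push( @log , $message );
}

# Explicit return: the caller gets exactly what we chose to return.
sub remember_explicit
{
    my ( $message ) = @_;

    push( @log , $message );

    return 1;
}

my $first  = remember_implicit( 'started' );
my $second = remember_implicit( 'running' );
my $third  = remember_explicit( 'stopped' );

print "implicit: $first then $second, explicit: $third\n";
```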
Common Traps

 Don’t confuse “==” (numeric comparison) with “=” (assignment).


 Don’t confuse “==” and “=~”
 Use eq for string comparison.
 Always use the standard header:
use warnings;
use strict;
use diagnostics;

although it’s sometimes useful to use -Mdiagnostics on the command line.


 Arrays count from [0], not [1], so a 20 element array has elements [0] to [19].
 Hashes have no order.
 You can’t iterate over them with foreach.
 You can’t index into them with [].
 You can iterate over them with each(), keys() and values() [and sort() to impose order].
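Two of these traps are illustrated in the short sketch below, using made-up data: numeric versus string comparison, and imposing an order on a hash with sort and keys.

```perl
use strict;
use warnings;

# == compares numbers, eq compares strings
my $numeric_same = ( '1.0' == '1.00' ) ? 'yes' : 'no';
my $string_same  = ( '1.0' eq '1.00' ) ? 'yes' : 'no';

print "Numerically equal: $numeric_same, equal as strings: $string_same\n";

# Hashes have no order of their own, so sort the keys to impose one
my %age = ( carol => 41 , alice => 30 , bob => 25 );

my @names_in_order = sort keys %age;

foreach my $name ( @names_in_order )
{
    print "$name is $age{$name}\n";
}

# Arrays count from zero: the last index is one less than the size
my @twenty     = ( 0 ) x 20;
my $last_index = $#twenty;

print "Last index of a 20 element array is $last_index\n";
```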
Object Oriented Filehandles

 It would be nice to treat filehandles like lexical variables. Here’s how:

use IO::Handle;
use IO::File;

MAIN:
{
    my $logfile = "log.log";

    # Then later . . .

    # Create the filehandle. Modes: < = read, > = write, >> = append
    my $log_fh = IO::File->new( "> $logfile" ) or die( "Couldn't open/append file $logfile" );

    # Then later

    print $log_fh "This line's heading for the logfile\n";    # Note: no comma after $log_fh

    exit;
}

The code shows how we can create a lexical variable that is a filehandle. We can pass
this to any subroutine at any stack depth and print information to it as shown. Note that
as with normal filehandles, there is no comma between the filehandle and the thing
that is being printed to it.

Unlike normal filehandles this filehandle is a lexical variable. You can explicitly close
the handle with close, or, you can just let the handle go out of scope at which point it
will be automatically closed.

In early versions of Perl (when machine speeds were 66MHz) there was a
considerable time overhead in loading the vast amount of code that is hidden behind
IO::Handle and IO::File. With modern machine speeds this is no longer an issue.
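
A minimal sketch of passing a lexical filehandle to a subroutine and letting it close when it goes out of scope. The subroutine name Log_Message and the use of a temporary directory are assumptions made for this example only:

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw( tempdir );

# Any subroutine can be handed the filehandle as an ordinary argument.
sub Log_Message
{
    my ( $fh , $message ) = @_;
    print $fh "$message\n";        # Note: no comma after $fh
    return 1;
}

# A scratch directory so that the sketch doesn't touch real files.
my $logfile = tempdir( CLEANUP => 1 ) . "/log.log";

{
    my $log_fh = IO::File->new( $logfile , ">" )
        or die( "Couldn't open $logfile for writing: $!" );

    Log_Message( $log_fh , "First line" );
    Log_Message( $log_fh , "Second line" );
}   # $log_fh goes out of scope here, so the file is closed

# Read the file back to show that both lines were written.
my $read_fh = IO::File->new( $logfile , "<" )
    or die( "Couldn't open $logfile for reading: $!" );
my @lines = $read_fh->getlines();
$read_fh->close();

print "Read back " . scalar( @lines ) . " lines\n";
```

Here the mode is given as a separate second argument to new() rather than as a prefix on the filename; IO::File accepts both forms.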
Add A Command Line To Your Program - I

 There are two solutions:

 Use one of the Getopt modules:


 Getopt::Std
 Getopt::Long
 Advantage - it’s written for you, you supply a template.
 Disadvantage - if it doesn’t do what you want, you’re stuck with it.

 Write your own:


 Advantage - it will do exactly what you want.
 Disadvantage - you have to write it, but,
 We can use (reuse) a template.
Add A Command Line To Your Program - II

 Here’s a template:
use Carp;                                   # Needed for croak()
use constant { TRUE => 1 , FALSE => 0 };    # Used for $print_flag

sub Parse_Command_Line_Arguments( $ )
{
    my ( $arguments_r ) = @_;

    my $usage = "my_prog -input <input filename>
-output <output filename>
[-print_flag]";

    my $numargs = @$arguments_r;
    my $argument = undef;

    foreach $argument ( @$arguments_r )
    {
        if ( $argument =~ m/^\-h(elp)?$/i )
        {
            # Help requested (-h or -help)
            exit 0;
        }
    }

    if ( $numargs < 1 )
    {
        print ( "\nUsage: $usage\n" );
        print ( "\nUse my_prog -h to get more help\n\n" );
        exit 0;
    }

    my $next_arg = undef;
    my $input_filename = undef;
    my $output_filename = undef;
    my $print_flag = FALSE;

    # Process all arguments. Looping on the list itself (rather than on a
    # separate counter) means a switch with a missing value can't run past
    # the end of the list.
    while ( @$arguments_r )
    {
        $next_arg = shift( @$arguments_r );
        SWITCH:
        {
            if ( $next_arg =~ m/^\-input/i )
            {
                $input_filename = shift( @$arguments_r );
                last SWITCH;
            }
            if ( $next_arg =~ m/^\-output/i )
            {
                $output_filename = shift( @$arguments_r );
                last SWITCH;
            }
            if ( $next_arg =~ m/^\-print_flag/i )
            {
                $print_flag = TRUE;
                last SWITCH;
            }
            if ( $next_arg =~ m/^\-/i )
            {
                croak( "Unknown command line switch $next_arg" );
            }
        }
    }

    return ( $input_filename , $output_filename , $print_flag );
}

All of this code is in the file perl.template in the release directory of this course.
Add A Command Line To Your Program - III

 The code is then called like this:


my ( $input_filename ,
$output_filename ,
$print_flag ) = Parse_Command_Line_Arguments( \@ARGV );

croak ( "Missing input filename" ) unless defined $input_filename;


croak ( "Missing output filename" ) unless defined $output_filename;

All of this code is in the file perl.template in the release directory of this course.
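
For comparison with the hand-written template, here is a sketch of how the same three switches might be handled with Getopt::Long. The subroutine name Parse_With_Getopt and the example arguments are assumptions for illustration; this is not the contents of perl.template.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

sub Parse_With_Getopt
{
    my ( $arguments_r ) = @_;

    my ( $input_filename , $output_filename );
    my $print_flag = 0;

    # "=s" means the switch takes a string value; a bare name is a flag.
    GetOptionsFromArray(
        $arguments_r ,
        "input=s"    => \$input_filename ,
        "output=s"   => \$output_filename ,
        "print_flag" => \$print_flag ,
    ) or die( "Usage: my_prog -input <input filename> -output <output filename> [-print_flag]\n" );

    return ( $input_filename , $output_filename , $print_flag );
}

# Example arguments; a real program would pass \@ARGV instead.
my @example_args = qw( -input in.txt -output out.txt -print_flag );
my ( $in , $out , $flag ) = Parse_With_Getopt( \@example_args );

print "input=$in output=$out print_flag=$flag\n";
```

The trade-off is the one described earlier: the module does the parsing and the error checking for you, but you work within the option syntax that it supports.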
Parsing Files - I

 This is a standard Perl idiom for reading and re-writing a file:

use Netlist_Tools;

MAIN:
{
    my $file_r = Read_File( "BigFile.txt" );

    foreach my $line ( @$file_r )
    {
        # Fiddle with the line
    }

    Write_File( "NewFile.txt" , $file_r );

    exit 0;
}

 Note that each time through the foreach loop, $line is an alias for the list element, not a copy, so changing $line changes the list.

You want to load and loop through all the lines of a file performing some programming tasks
on some or all of the lines. You then want to write out a new file containing whatever
manipulations you’ve done.

This code will load one of (in this order) BigFile.txt, BigFile.txt.gz,
BigFile.txt.gzip. If you specify an output filename in Write_File that is suffixed in
either .gz or .gzip then the file will be compressed (with gzip) before it is written.

A major advantage of Read_File is that not only will it transparently read in the file via
gzip if necessary, all the lines are then formatted so that every line is in a list that can be
iterated, and every line is guaranteed to have no white-space before the first non-white-space
character. There will also be no white-space at the end of the line and all “words” on a line
will be separated by exactly one space.

If you don’t want the formatting that Read_File imposes then use
Read_File_Without_Formatting to get at the raw, unaltered data. The regular expressions
that detect the information you’re interested in will be more complicated, but any formatting
that matters will be preserved.
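
The source of Netlist_Tools isn't shown here, so the following is only a sketch of the kind of per-line clean-up described above, using a hypothetical subroutine Normalise_Line and made-up input lines. It also shows that modifying $line inside foreach modifies the list itself:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-line clean-up described above.
sub Normalise_Line
{
    my ( $line ) = @_;

    $line =~ s/^\s+//;      # No white-space before the first character
    $line =~ s/\s+$//;      # No white-space (or newline) at the end
    $line =~ s/\s+/ /g;     # Exactly one space between "words"

    return $line;
}

my @raw_lines = ( "  R1   in  out\t10k \n" , "C1 out   0    1p\n" );
my $file_r    = [ map { Normalise_Line( $_ ) } @raw_lines ];

# $line refers to each element in turn, so this changes the list itself.
foreach my $line ( @$file_r )
{
    $line =~ s/10k/22k/;
}

print "$_\n" foreach @$file_r;
```

Because every line has a predictable shape after the clean-up, the substitution in the loop can be a simple pattern with no allowance for stray tabs or repeated spaces.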
Parsing Files - II

 This is a standard Perl idiom for reading, and then writing, a new file:

use Netlist_Tools;

MAIN:
{
    my $in_file_r = Read_File( "BigFile.txt" );
    my $out_file_r = [];

    foreach my $line ( @$in_file_r )
    {
        # Inspect the line, generate new information from it. Write the
        # new information into a list like this:

        push @$out_file_r , "new stuff"; # Don't add \n at the ends of lines
    }

    Write_File( "NewFile.txt" , $out_file_r );

    exit 0;
}

If, alternatively, you want to create a new file based on some or all of the contents of an input
file, you can re-write the body of the code in the previous program as shown above.

In this code we still read an input file but, rather than altering the information in that file
(and thus destroying the original), we make a new file on-the-fly and then write that to
disk under a new name.
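
A sketch of what the body of that loop might look like, using core Perl only. The input list stands in for whatever Read_File would return, and the line format and the pattern are assumptions chosen for illustration:

```perl
use strict;
use warnings;

# Stand-in for the list that Read_File would return.
my $in_file_r  = [ "R1 in out 10k" , "C1 out 0 1p" , "R2 out 0 47k" ];
my $out_file_r = [];

foreach my $line ( @$in_file_r )
{
    # Keep only the resistors, and record each name and value.
    if ( $line =~ m/^(R\d+) \S+ \S+ (\S+)$/ )
    {
        push @$out_file_r , "$1 = $2";    # No \n at the ends of lines
    }
}

# The input list is untouched; the new list is what would be written out.
print "$_\n" foreach @$out_file_r;
```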
Interacting With A Compute Farm (The LSF Queue)

 Scheduling jobs in an efficient way:

# This example shows how to use ELDO, for which we have 4 licenses. We'll limit ourselves to using 2 of them.
# While we're limiting ourselves here because of scarce license resource, the same code can stop queues being
# flooded with jobs that are pending but consuming queue slots (and making yourself pretty damn unpopular).

my ( $running_jobs , $jobs_limit ) = ( 0 , 2 );

# The qw list is wrapped in parentheses: a bare qw list after foreach is a syntax error from Perl 5.18 onwards.
# The ". . ." stands for the files in between.
foreach my $file ( qw( File_1.cir File_2.cir . . . File_98.cir File_99.cir ) )
{
    # Get queue status: one header line followed by one line per job
    my $bjobs_output = `bjobs -q linux 2>&1`;
    my @bjobs_lines = split( /\n/ , $bjobs_output );
    $running_jobs = scalar( @bjobs_lines ) - 1;

    if ( $running_jobs < $jobs_limit )
    {
        # Queue a job
        my $command = "bsub -q linux \'eldo -nomail -queue -stver $file\'";
        system( $command );
        print "\nJob $command submitted\n";
        sleep 10;
    }
    else
    {
        # Sleep, then try the same file again
        print "."; # Poor man's progress bar
        sleep 10;
        redo;
    }
}

exit;

The LSF queuing system allows CPU-intensive jobs to use the shared CPU resource of most of
the machines in this building. The code above shows how to interface to that queuing system
while limiting yourself to a predetermined number of jobs, adding new jobs to the queue as
old jobs complete.
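
The job-counting step can be pulled out into a subroutine so that it can be checked without access to a queue. This is a sketch only: the name Count_Jobs is hypothetical, and the sample text is a simplified, made-up report, not real queue output.

```perl
use strict;
use warnings;

# Count the jobs in a captured report: every non-blank line except the
# first (header) line is taken to be one job.
sub Count_Jobs
{
    my ( $bjobs_output ) = @_;

    my @lines = grep { m/\S/ } split( /\n/ , $bjobs_output );
    return 0 if @lines == 0;

    return scalar( @lines ) - 1;
}

# Illustrative sample text, not real queue output.
my $sample = join( "\n" ,
    "JOBID USER STAT QUEUE JOB_NAME" ,
    "101 fred RUN linux eldo" ,
    "102 fred PEND linux eldo" ,
) . "\n";

my $jobs_limit = 2;
my $running    = Count_Jobs( $sample );

print "Running: $running, limit: $jobs_limit\n";
print "Queue is full, wait before submitting\n" unless $running < $jobs_limit;
```

As in the slide code, a report consisting of a single message line counts as zero jobs, so the first job is submitted straight away.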
Using Object Oriented (OO) Modules - I

 Using OO modules involves:


 Creating an object and setting up internal state.
 Calling methods to manipulate the object.

 Objects can be:


 Graphic objects.
 File objects.
 HTML objects.
 Database objects.
.
.
.
 The list is literally endless.

 Programmers fall into two distinct groups:


 Users of OO modules.
 Writers of OO modules.

For OO modules you really do need to read the documentation and look at examples.
OO programming is quite different from procedural programming. In OO you create
objects and then, rather than calling functions and procedures to manipulate (potentially
shared) data, you have objects send messages to each other to achieve the same
thing. If you’ve never done this before it can all seem a little weird.
Using Object Oriented (OO) Modules - II

 Let’s look at an example of using colours, palettes and bitmaps:

#!/usr/local/bin/perl

use Bitmap;
use Palette;
use Colour;

use constant BMP_HEIGHT => 100;

MAIN:
{
    # Start: create the palette and a bitmap one pixel wide per colour
    my $palette_r = Palette->new( "Palette_1024.pal" );
    my $colours_in_palette = $palette_r->get_palette_colour_count();
    my $bitmap_r = Bitmap->new( $colours_in_palette , BMP_HEIGHT );

    # Do work: draw one vertical line per colour in the palette
    foreach my $x ( 0 .. ( $colours_in_palette - 1 ) )
    {
        my $colour_r = $palette_r->get_indexed_colour( $x );
        $bitmap_r->vline( 0 , ( BMP_HEIGHT - 1 ) , $x , $colour_r );
    }

    # End: save the result
    $bitmap_r->save( "Scale.bmp" );

    exit;
}

$palette_r is a palette object. $bitmap_r is a bitmap object.

You make objects do useful things by executing methods (subroutine calls), or in OO
parlance, you send them messages asking/telling them what to do.

So, we create a new colour palette by sending Palette the new message with the
name of a file that contains a description of our colour palette. In return we get a
palette object that we store in a variable called $palette_r. We can count how many
colours are in the palette we’ve just created by sending the newly created $palette_r
object the get_palette_colour_count() message - this returns a number, the number of
colours in the palette.

We next create a bitmap object by sending Bitmap the new message. The two
parameters that new() requires are the X and Y dimensions of the bitmap.
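
The Palette and Bitmap modules themselves aren't shown in these notes, so here is a sketch of a much smaller, hypothetical class (Tiny_Palette, invented for this example) to show what sits behind "sending new" and "sending a message" in Perl:

```perl
use strict;
use warnings;

# Hypothetical class, for illustration only; not the course's Palette module.
package Tiny_Palette;

sub new
{
    my ( $class , @colours ) = @_;
    my $self = { colours => [ @colours ] };    # Internal state
    return bless( $self , $class );
}

sub get_colour_count
{
    my ( $self ) = @_;
    return scalar( @{ $self->{colours} } );
}

sub get_indexed_colour
{
    my ( $self , $index ) = @_;
    return $self->{colours}->[$index];
}

package main;

# Create an object, then send it messages (call methods on it).
my $palette_r = Tiny_Palette->new( "red" , "green" , "blue" );
my $count     = $palette_r->get_colour_count();
my $last      = $palette_r->get_indexed_colour( $count - 1 );

print "$count colours, the last is $last\n";
```

The code that uses the object never touches the list of colours directly; it only asks the object questions, which is the same pattern the palette and bitmap example follows.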
Using Object Oriented (OO) Modules - III

 Here are the results (this is Scale.bmp):

 Note how the code to generate this was:


 Brief and concise.
 Reasonably abstract.
 You didn’t need to know about colours or RGB-triples,
 bitmaps or their internal representations,
 or how to perform various geometrical algorithms like drawing lines.

 These are the messages a bitmap object can respond to:


 new(), get_bitmap_x(), get_bitmap_y(), get_bitmap_xy_r(), get_cliprect_x(),
get_cliprect_y(), get_invert_y(), set_cliprect_x(), set_cliprect_y(), set_invert_y(),
set_pixel(), hline(), vline() , line(), circle(), bitmap_sign(), rect(), filled_rect(), save(),
resize().

 Have you noticed that we’re not talking in Perl any more!
Using The Debugger

This file exists as a stand-alone .PDF file in the release area for this course.
Using Regular Expressions

This file exists as a stand-alone .PDF file in the release area for this course.
Perl For Beginners - September 2005 - Course Feedback Form

Was the course content: Excellent / Good / Poor

Was the course duration: Too long / Just right / Too short

Were the course notes: Too detailed / Just right / Not detailed enough

Were the labs: Too detailed / Just right / Not detailed enough

Was the balance between lecture material and labs: Too many lectures / Just right / Too many labs

Was any material covered that you thought should have been omitted? Yes / No
If so, what?

Was any material omitted that you thought should have been covered? Yes / No
If so, what?

Would you be interested in an advanced course covering object oriented Perl? Yes / No

Any other comments

Your name (optional)
