
CS262 Programming Languages

UNIT 1 String Patterns


Building a Web Browser

[Diagram: HTML and JavaScript (the Web Page Source) go into the Web Browser, which produces the Web Page Image. HTML gives the web page basics; JavaScript gives the web page computations.]

Interpreting a source fragment such as <b> hello 1+2+3 takes three steps:
1. Break it up into important words: < b > hello 1 + 2 + 3
2. Understand the structure: a tree with + at the root, 1 as its left child, and the subtree 2 + 3 as its right child, i.e., 1 + (2 + 3)
3. Find the meaning: the arithmetic evaluates to 6
The goal is to use the web browser to structure the learning.

Breaking up strings in Python:

    "hello world".find(" ")   --> 5
    "1 + 1 = 2".find("1", 2)  --> 4    # the 2 is the starting position
    haystack.find(needle)     --> -1   # not found

Selecting Substrings:

    "hello"[1:3]  --> "el"    # start at 1, up to but not including 3
    "hello"[1:]   --> "ello"  # go as far as possible

Splitting Words by Whitespace:

    "Jane Eyre".split()  --> ["Jane", "Eyre"]

We need more control over splitting strings --> Regular Expressions. Regular Expressions: r"[1-3]" matches (denotes) 1 2 3; r"[a-b]" matches a b.

A module is a repository or library of functions and data. In Python, import brings in a module: import re. The 5-character regular expression r"[0-9]" matches ten 1-character strings: "0", "1", ..., "9".

findall takes a r.e. and a string, and returns a list of all of the substrings that match that r.e.:

    re.findall(r"[0-9]", "1+2==3")          --> ["1", "2", "3"]
    re.findall(r"[1-2]", "1+2==3")          --> ["1", "2"]
    re.findall(r"[a-c]", "Barbara Liskov")  --> ["a", "b", "a", "a"]

We'll need to find "/>" and "==" for JavaScript and HTML. Thus, we need to express concatenation and repetition, to match more complicated [compound] strings:

    r"[a-c][1-2]" matches a1 a2 b1 b2 c1 c2
    r"[0-9][0-9]" matches 00 01 02 ... 99
    re.findall(r"[0-9][0-9]", "July 28, 1821")  --> ["28", "18", "21"]
    re.findall(r"[0-9][0-9]", "12345")          --> ["12", "34"]
    re.findall(r"[a-z][0-9]", "a1 2b cc3 44d")  --> ["a1", "c3"]

+ (One or More times): the regular expression r"a+" (+ looks back to the previous r.e.) matches a aa aaa aaaa ...; r"[0-1]+" matches 0 1 00 11 01 100 1101 ... Maximum Munch: an r.e. should consume the biggest string it can, not smaller parts:

    re.findall(r"[0-9]+", "13 from 1 in 1776")  --> ["13", "1", "1776"]
    re.findall(r"[0-9][ ][0-9]+", "a1 2b cc3 44d")  --> ["1 2", "3 44"]   # the [ ] matches a space
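These findall results can be checked directly; a quick sanity script (Python 2 print syntax, to match the course code):

    import re

    # Re-checking the examples above with Python's re module.
    print re.findall(r"[0-9]+", "13 from 1 in 1776")       # ['13', '1', '1776']
    print re.findall(r"[0-9][ ][0-9]+", "a1 2b cc3 44d")   # ['1 2', '3 44']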

Finite State Machines --> a visual representation of regular expressions. Suppose we want r"[0-9]+%" (the % is matched directly). [FSM diagram: start at state 1; a 0-9 edge leads to state 2, which loops on 0-9; a % edge leads from state 2 to the accepting state. The circles are states, the arrows are edges or transitions, and the double circle is the accepting state.]

| (Disjunction or OR): the regular expression r"[a-z]+|[0-9]+" matches one or more letters OR one or more digits. [FSM diagram: from the start state, a-z edges lead to one accepting loop and 0-9 edges lead to another.]

    re.findall(r"[a-zA-Z]+|[0-9]+", "Goethe 1749")  --> ["Goethe", "1749"]

(The A-Z is needed to pick up the capital G; with [a-z]+ alone, findall would return "oethe".)


Optional Components (<something>|<nothing>): [FSM diagram: two 0-9 paths from state 1 to state 4, one passing through an extra digit state for the optional part. The more concise version uses an epsilon edge: a transition taken on no input, i.e., the empty string.]

? (Optional, Zero or One time):

    re.findall(r"-?[0-9]+", "1861-1941 R. Tagore")  --> ["1861", "-1941"]

* (Zero or More times): a+ is equivalent to aa*. So now + * ? [ ] all mean something special in regular expressions. But if we want to refer to the symbols themselves, we can use Escape Sequences, with \ as the escape character: r"\+\+" matches "++".

. (any character except newline):

    re.findall(r"[0-9].[0-9]", "1a1 222 cc3")  --> ["1a1", "222"]

^ [caret] (anything except the listed characters):

    re.findall(r"[0-9][^ab]", "1a1 222 cc3")   --> ["1 ", "22", "2 "]

(?: ) (parentheses to show structure):

    re.findall(r"(?:do|re|mi)+", "mimi rere midore doo-wop")  --> ["mimi", "rere", "midore", "do"]

How to represent (encode) an FSM? Dictionaries! For an edge labeled a from state 1 to state 2: edges[(1, 'a')] = 2. Dictionaries:

    is_flower = {}
    is_flower["rose"] = True
    is_flower["dog"] = False
    >>> is_flower["rose"]     # True
    >>> is_flower["juliet"]   # <Error> (KeyError)

or

    is_flower = {"rose": True, "dog": False}


Tuples: Tuples are immutable lists.

    point = (1, 5)
    point[0] == 1
    point[1] == 5

Let's encode r"a+1+" [FSM: state 1 goes to state 2 on a; state 2 loops on a; state 2 goes to state 3 on 1; state 3 loops on 1]:

    edges = {(1, 'a'): 2, (2, 'a'): 2, (2, '1'): 3, (3, '1'): 3}
    accepting = [3]

FSM Simulator: fsmsim(<string>, <starting state>, <edges>, <accepting>) --> True if the <string> is accepted by the FSM (<edges>, <accepting>).

    def fsmsim(string, current, edges, accepting):
        if string == "":
            return current in accepting
        else:
            letter = string[0]
            if (current, letter) in edges:
                destination = edges[(current, letter)]
                remaining_string = string[1:]
                return fsmsim(remaining_string, destination, edges, accepting)
            else:
                return False
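Trying fsmsim on the r"a+1+" machine encoded above:

    edges = {(1, 'a'): 2, (2, 'a'): 2, (2, '1'): 3, (3, '1'): 3}
    accepting = [3]
    print fsmsim("aaa111", 1, edges, accepting)   # True
    print fsmsim("a1a1", 1, edges, accepting)     # False: no edge for (3, 'a')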

Handling Epsilon and Ambiguity: An FSM accepts a string s if there exists even one path from the start state to any accepting state following s. [Diagram: state 1 has two outgoing a edges, to states 2 and 3: an ambiguous machine.]
Easy-to-write FSMs with epsilon transitions or ambiguity are known as non-deterministic finite state machines (you may not know exactly where to go). A lock-step FSM with no epsilon edges or ambiguity is a deterministic finite state machine. [fsmsim can handle these.] Every non-deterministic FSM has a corresponding deterministic FSM that accepts exactly the same strings. Non-deterministic FSMs are NOT more powerful, they are just more convenient. Idea: build a deterministic machine D where every state in D corresponds to a SET of states in the non-deterministic machine; a sketch of this subset construction appears below.
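A rough sketch of that subset construction (my own illustration, not course code), assuming the list-valued edge encoding used later for nfsmsim, i.e., edges[(state, letter)] is a list of destination states, and ignoring epsilon edges:

    def nfsm_to_dfsm(edges, accepting, start):
        # Each deterministic state is a frozenset of nondeterministic states.
        start_set = frozenset([start])
        new_edges = {}
        new_accepting = []
        work = [start_set]
        seen = set()
        while work:
            current = work.pop()
            if current in seen:
                continue
            seen.add(current)
            # A set-state accepts if any member state accepts.
            if any(s in accepting for s in current):
                new_accepting.append(current)
            # Collect every letter leaving any member state...
            letters = set(l for (s, l) in edges if s in current)
            # ...and merge the destinations into a single set-state.
            for letter in letters:
                dest = frozenset(sum([edges.get((s, letter), [])
                                      for s in current], []))
                new_edges[(current, letter)] = dest
                work.append(dest)
        return new_edges, new_accepting

For example, the ambiguous machine {(1, 'a'): [2, 3], (2, 'b'): [4], (3, 'c'): [4]} becomes a deterministic machine whose start state is frozenset([1]) and whose a edge leads to frozenset([2, 3]).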


Example: r"ab?c". [Diagram: a non-deterministic machine with an epsilon edge for the optional b is converted into a deterministic one whose states are sets of the original states, e.g. {2,3,4,6}; the result has no epsilon edges and no ambiguity.]

Example 2: [Diagram: a machine over a, b, c where one state has an ambiguous b edge; the deterministic version tracks the sets {2,3} and {2,4,5,6}, again with no epsilon edges and no ambiguity.]

Wrap Up:
- STRINGS: sequences of characters.
- REGULAR EXPRESSIONS: concise notation for specifying sets of strings; more flexible than fixed string matching (phone numbers, words, numbers, quoted strings: search for and match them).
- FINITE STATE MACHINES: pictorial equivalent to regular expressions.
- DETERMINISTIC: every FSM can be converted to a deterministic FSM.
- FSM SIMULATION: it is very easy (~10 lines of recursive code) to see if a deterministic FSM accepts a string.

Simulating Non-Deterministic FSMs:


    def nfsmsim(string, current, edges, accepting):
        if string == "":
            return current in accepting
        else:
            letter = string[0:1]
            if (current, letter) in edges:
                remainder = string[1:]
                newstates = edges[(current, letter)]
                # Try every possible destination; one success is enough.
                for newstate in newstates:
                    if nfsmsim(remainder, newstate, edges, accepting):
                        return True
            return False
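For example, an ambiguous machine for r"ab|ac", where state 1 has two outgoing a edges:

    edges = {(1, 'a'): [2, 3], (2, 'b'): [4], (3, 'c'): [4]}
    accepting = [4]
    print nfsmsim("ac", 1, edges, accepting)   # True: the path through 3 works
    print nfsmsim("ad", 1, edges, accepting)   # False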


Reading Machine Minds (Identifying empty FSMs):


    def nfsmaccepts(current, edges, accepting, visited):
        # Returns a string that the machine accepts starting from
        # current, or None if there is no such string.
        if current in visited:
            return None
        elif current in accepting:
            return ""
        else:
            newvisited = visited + [current]
            for edge in edges:
                if edge[0] == current:
                    for newstate in edges[edge]:
                        foo = nfsmaccepts(newstate, edges, accepting, newvisited)
                        if foo != None:
                            return edge[1] + foo
            return None
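For example:

    edges = {(1, 'a'): [2], (2, 'b'): [3]}
    accepting = [3]
    print nfsmaccepts(1, edges, accepting, [])   # "ab": a witness string
    loopy = {(1, 'a'): [1]}                      # loops forever, never accepts
    print nfsmaccepts(1, loopy, accepting, [])   # None: the language is empty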

UNIT 2 Lexical Analysis


A Lexical Analyzer is a program that reads in a web page or a bit of JavaScript and breaks it down into words. Specify HTML + JavaScript: HyperText Markup Language tells a web browser how to display a web page. (A missing end tag causes the text starting from the start tag to be influenced, all the way to the end.)

    Bold tag:      <b>python</b>
    Underline tag: <u>python</u>
    Italics tag:   <i>python</i>
    Anchor tag:    <a href="http://www.google.com">now!</a>
                   [here, href (hypertext reference) is an argument to the anchor tag]
    Paragraph tag: <p>python</p>

LEXICAL ANALYSIS: breaking something into words. TOKEN: the smallest unit of lexical analysis output: words, strings, numbers, punctuation; not whitespace. So, lexical analysis breaks down a string into a list of tokens:

    String --> Lexical Analysis --> List of Tokens

Some HTML tokens: LANGLE (<), LANGLESLASH (</), RANGLE (>), EQUAL (=), STRING ("google.com"), WORD (Welcome!).

We'll use regular expressions to specify tokens, and this is how we write out token definitions in Python:

    def t_RANGLE(token):   # the token name
        r'>'               # the regexp matching this token
        return token       # return the text unchanged

Token Values: By default, the value is the string matched.

    def t_NUMBER(token):
        r'[0-9]+'
        token.value = int(token.value)
        return token

Quoted Strings are critical to interpreting HTML and JavaScript:

    def t_STRING(token):
        r'"[^"]*"'
        return token

We want to skip or pass over spaces!

    def t_WHITESPACE(token):
        r' '
        pass

What's left is words:

    def t_WORD(token):
        r'[^ <>]+'
        return token

A LEXER (LEXical analyzER) is just a collection of token definitions. When two token definitions can match the same string, the behavior of our lexical analyzer may be ambiguous. In our implementation, we favor the token definition listed first.

String Snipping (remove the quotes: they are markers for strings and are separate from the meaning):

    def t_STRING(token):
        r'"[^"]*"'
        token.value = token.value[1:-1]
        return token


Let's make a Lexical Analyzer:

    import ply.lex as lex

    tokens = (
        'LANGLE',        # <
        'LANGLESLASH',   # </
        'RANGLE',        # >
        'EQUAL',         # =
        'STRING',        # "hello"
        'WORD',          # Welcome!
    )

    t_ignore = ' '  # shortcut for whitespace

    def t_LANGLESLASH(token):
        r'</'
        return token

    def t_LANGLE(token):
        r'<'
        return token

    def t_RANGLE(token):
        r'>'
        return token

    def t_EQUAL(token):
        r'='
        return token

    def t_STRING(token):
        r'"[^"]*"'
        token.value = token.value[1:-1]
        return token

    def t_WORD(token):
        r'[^ <>\n]+'
        return token

    webpage = 'This is <b>my</b> webpage!'

    # The next line tells our lexical analysis library that we want to use
    # all of the token definitions above to make a lexical analyzer and
    # break up strings.
    htmllexer = lex.lex()
    htmllexer.input(webpage)
    while True:
        tok = htmllexer.token()
        if not tok:
            break
        # tok --> LexToken(<NAME>, <token>, <line>, <character>)
        print tok

Tracking Line Numbers:

    def t_NEWLINE(token):
        r'\n'
        token.lexer.lineno += 1
        pass

Comments (documentation, removing functionality): HTML comments look like <!-- comments -->. To add support for HTML comments to our lexical analyzer, comments will be modeled as a separate FSM (a lexer state) that ignores everything:

    states = (
        # exclusive: if I am in the middle of processing an HTML comment,
        # I can't be doing anything else.
        ('htmlcomment', 'exclusive'),
    )

    # It goes before the normal lexer rules.
    def t_htmlcomment(token):
        r'<!--'
        token.lexer.begin('htmlcomment')

    def t_htmlcomment_end(token):
        r'-->'
        token.lexer.lineno += token.value.count('\n')
        token.lexer.begin('INITIAL')

    # We've said what to do when an HTML comment begins and ends, but any
    # other character we see in this special HTML comment mode isn't going
    # to match one of those two rules. So...
    def t_htmlcomment_error(token):
        token.lexer.skip(1)
    # It's a lot like pass, except that it gathers up all of the text into
    # one big value so that I can count the newlines later.


Introducing JavaScript (with an example):

    <p>
    Welcome to <b>my</b> webpage. Five factorial (aka 5!) is:
    <script type="text/javascript">
    function factorial(n) {
        if (n == 0) { return 1; };
        return n * factorial(n-1);
    }
    document.write(factorial(5));
    </script>
    </p>

Identifiers: a variable or function name; they identify a particular value or storage location.

    # Identifier for JavaScript
    def t_IDENTIFIER(token):
        r'[A-Za-z][A-Za-z_]*'
        return token

Numbers in JavaScript:

    def t_NUMBER(token):
        r'-?[0-9]+(?:\.[0-9]*)?'
        token.value = float(token.value)
        return token

End-of-line comments in JavaScript:

    def t_eolcomment(token):
        r'//[^\n]*'
        pass

Wrap Up: tokens for HTML and JavaScript.

Anonymous Functions: making functions on the fly; the return here is implicit.

    # find the max element of list_of_words according to the function given
    print findmax(lambda(word): word.find("python"), list_of_words)


Exercise: identify identifiers, and decimal and hexadecimal numbers.

    import ply.lex as lex

    tokens = ('NUM', 'ID')

    def t_ID(token):
        r'[A-Za-z]+'
        return token

    def t_NUM_hex(token):
        r'0x[0-9a-fA-F]+'
        token.value = int(token.value, 0)
        token.type = 'NUM'
        return token

    def t_NUM_decimal(token):
        r'[0-9]+'
        token.value = int(token.value)
        token.type = 'NUM'
        return token

    t_ignore = ' \t\v\r'

    def t_error(token):
        print "Lexer: unexpected character " + token.value[0]
        token.lexer.skip(1)

Regular Expressions SUBSTITUTE: re.sub(regexp, new_text, haystack). Exercise: identify emails that may or may not contain NOSPAM text in them.

    import ply.lex as lex
    import re

    tokens = ('EMAIL',)

    def t_EMAIL(token):
        r'[A-Za-z]+@[A-Za-z]+(?:\.[A-Za-z]+)+'
        token.value = re.sub(r'NOSPAM', '', token.value)
        return token

    def t_error(token):
        print "Lexer: unexpected character " + token.value[0]
        token.lexer.skip(1)

    def addresses(haystack):
        lexer = lex.lex()
        lexer.input(haystack)
        result = []
        while True:
            tok = lexer.token()
            if not tok:
                break
            result += [tok.value]
        return result
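Used together with the t_EMAIL rule above (the input addresses are made up for illustration):

    print addresses("mail ladyNOSPAMada@lovelace.org or alan@turing.uk")
    # ['ladyada@lovelace.org', 'alan@turing.uk']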

Exercise: JavaScript comments & keywords.

    tokens = (
        'ANDAND',        # &&
        'COMMA',         # ,
        'DIVIDE',        # /
        'ELSE',          # else
        'EQUAL',         # =
        'EQUALEQUAL',    # ==
        'FALSE',         # false
        'FUNCTION',      # function
        'GE',            # >=
        'GT',            # >
        # 'IDENTIFIER',  # not used in this problem
        'IF',            # if
        'LBRACE',        # {
        'LE',            # <=
        'LPAREN',        # (
        'LT',            # <
        'MINUS',         # -
        'NOT',           # !
        # 'NUMBER',      # not used in this problem
        'OROR',          # ||
        'PLUS',          # +
        'RBRACE',        # }
        'RETURN',        # return
        'RPAREN',        # )
        'SEMICOLON',     # ;
        # 'STRING',      # not used in this problem
        'TIMES',         # *
        'TRUE',          # true
        'VAR',           # var
    )

    states = (
        ('jscomment', 'exclusive'),
    )


    def t_jscomment(token):
        r'/\*'
        token.lexer.begin('jscomment')

    def t_jscomment_end(token):
        r'\*/'
        token.lexer.lineno += token.value.count('\n')
        token.lexer.begin('INITIAL')

    def t_jscomment_error(token):
        token.lexer.skip(1)

    def t_eolcomment(token):
        r'//[^\n]+'
        pass

    t_ANDAND     = r'&&'
    t_COMMA      = r','
    t_DIVIDE     = r'/'
    t_ELSE       = r'else'
    t_EQUAL      = r'='
    t_EQUALEQUAL = r'=='
    t_FALSE      = r'false'
    t_FUNCTION   = r'function'
    t_GE         = r'>='
    t_GT         = r'>'
    t_IF         = r'if'
    t_LBRACE     = r'{'
    t_LE         = r'<='
    t_LPAREN     = r'\('
    t_LT         = r'<'
    t_MINUS      = r'-'
    t_NOT        = r'!'
    t_OROR       = r'\|\|'
    t_PLUS       = r'\+'
    t_RBRACE     = r'}'
    t_RETURN     = r'return'
    t_RPAREN     = r'\)'
    t_SEMICOLON  = r';'
    t_TIMES      = r'\*'
    t_TRUE       = r'true'
    t_VAR        = r'var'

    t_ignore           = ' \t\v\r'  # whitespace
    t_jscomment_ignore = ' \t\v\r'  # whitespace

    def t_newline(t):
        r'\n'
        t.lexer.lineno += 1

    def t_error(t):
        print "JavaScript Lexer: Illegal character " + t.value[0]
        t.lexer.skip(1)


Exercise: JavaScript numbers and strings.

    import ply.lex as lex

    tokens = ('IDENTIFIER', 'NUMBER', 'STRING')

    def t_IDENTIFIER(token):
        r'[A-Za-z][A-Za-z_]*'
        return token

    def t_NUMBER(token):
        r'-?[0-9]+(?:\.[0-9]*)?'
        token.value = float(token.value)
        return token

    def t_STRING(token):
        r'"(?:[^"\\]|(?:\\.))*"'
        token.value = token.value[1:-1]
        return token

    t_ignore = ' \t\v\r'  # whitespace

    def t_newline(t):
        r'\n'
        t.lexer.lineno += 1

    def t_error(t):
        print "JavaScript Lexer: Illegal character " + t.value[0]
        t.lexer.skip(1)

FSM Optimization: removing dead states.

    def nfsmtrim(edges, accepting):
        states = []
        for e in edges:
            states += [e[0]] + edges[e]
        live = []
        for s in states:
            if nfsmaccepts(s, edges, accepting, []) != None:
                live += [s]
        new_edges = {}
        for e in edges:
            if e[0] in live:
                new_destinations = []
                for destination in edges[e]:
                    if destination in live:
                        new_destinations += [destination]
                if new_destinations != []:
                    new_edges[e] = new_destinations
        new_accepting = []
        for s in accepting:
            if s in live:
                new_accepting += [s]
        return (new_edges, new_accepting)
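For example (this relies on nfsmaccepts from Unit 1): state 3 below is dead, since nothing accepting is reachable from it.

    edges = {(1, 'a'): [2, 3], (2, 'b'): [4]}
    accepting = [4]
    print nfsmtrim(edges, accepting)
    # ({(1, 'a'): [2], (2, 'b'): [4]}, [4])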

UNIT 3 Grammars
Lexing --> list of tokens. A list of words isn't enough: they have to adhere to a valid structure. Grammars give infinite utterances, yet not all utterances. Noam Chomsky: utterances have rules, governed by formal grammars [grammatical sentences].
Formal Grammars are rewrite rules; Sentence, Subject and Verb below are non-terminals, the words are terminals:

    Sentence --> Subject Verb
    Subject  --> Teachers
    Subject  --> Students
    Verb     --> write
    Verb     --> think

A derivation such as Sentence --> Subject Verb --> Students think can be drawn as a tree with Sentence at the root, Subject and Verb as children, and the words as leaves.
Recursion in a context-free (recursive) grammar can allow for an infinite number of utterances. Adding to the previous grammar the rule Subject --> Subject and Subject, we get: Sentence --> Subject Verb --> Subject and Subject Verb --> Students and Teachers think.

Syntactical Analysis (Parsing): token list --> valid in grammar? Lexing + Parsing = Expressive Power, or word rules + sentence rules = creativity! Statements:

    stmt --> identifier = exp
    exp  --> exp + exp
    exp  --> exp - exp
    exp  --> number


Optional Parts of Languages:

    Sent   --> OptAdj Subj Verb
    Subj   --> William
    Subj   --> Tell
    OptAdj --> Accurate
    OptAdj -->              (epsilon: nothing)
    Verb   --> shoots
    Verb   --> bows

Grammars can encode Regular Expressions: number = r"[0-9]+" becomes

    number      --> digit more_digits
    more_digits --> digit more_digits
    more_digits -->
    digit       --> 0
    digit       --> 1
    ...
    digit       --> 9

Grammars >= Regular Expressions. Regular expressions describe regular languages; grammars describe context-free languages. A language L is a context-free language if there exists a context-free grammar G such that the set of strings accepted by G is exactly L. Context-free languages are strictly more powerful than regular languages.

Irregularities: features too complicated to be captured by regular expressions. Balanced parentheses are not regular: the grammar p --> ( p ), p --> (epsilon) generates them, but a regular expression like r"\(*\)*" cannot enforce the balance.

We are going to use formal grammars to understand or describe HTML and JavaScript. Parse trees are a pictorial representation of the structure of an utterance. A parse tree demonstrates that a string is in the language of a grammar.

    exp --> exp + exp
    exp --> exp - exp
    exp --> number

[Diagram: two different parse trees for 1+2+3, one grouping (1+2)+3 and one grouping 1+(2+3); the leaves are the numbers 1, 2 and 3.]

One trait shared by programming languages and natural languages is ambiguity. A grammar is ambiguous if at least 1 string in the grammar has more than 1 different parse tree.

Grammar for HTML & JavaScript. HTML (partial grammar), e.g. <b>welcome to <i>my</i> webpage!</b>:

    html      --> element html
    html      -->
    element   --> word
    element   --> tag_open html tag_close
    tag_open  --> < word >
    tag_close --> </ word >

[Parse tree diagram: the html subtree between tag_open and tag_close (that part of the webpage) is influenced by the bold tag.]


JavaScript (partial grammar):

    === expressions ===
    exp --> identifier
    exp --> number
    exp --> string
    exp --> TRUE
    exp --> FALSE
    exp --> exp + exp
    exp --> exp - exp
    exp --> exp * exp
    exp --> exp / exp
    exp --> exp < exp
    exp --> exp == exp
    exp --> exp && exp

    === statements ===
    stmt --> identifier = exp
    stmt --> return exp
    stmt --> if exp compoundstmt
    stmt --> if exp compoundstmt else compoundstmt
    compoundstmt --> { stmts }
    stmts --> stmt ; stmts
    stmts -->

    === function calls and definitions ===
    js --> element js
    js -->
    element --> function identifier ( optparams ) compoundstmt
    element --> stmt ;
    optparams --> params
    optparams -->
    params --> identifier , params
    params --> identifier

    === expressions continued ===
    exp --> identifier ( optargs )
    optargs --> args
    optargs -->
    args --> exp , args
    args --> exp


lambda: "make me a function", or "I am defining an anonymous function".

    # I'm assigning to the variable mystery the result
    # of the lambda expression
    mystery = lambda(x): x + 2
    print mystery(3)  # 5

map Function: takes a function as its first argument, and then a list, and applies that function to each element of the list in turn, creating a new list.

    def mysquare(x):
        return x * x
    print map(mysquare, [1,2,3,4,5])          # [1, 4, 9, 16, 25]
    # or
    print map(lambda(x): x * x, [1,2,3,4,5])

List Comprehensions:

    print [x*x for x in [1,2,3,4,5]]                    # [1, 4, 9, 16, 25]
    print [len(x) for x in ["hello", "my", "friends"]]  # [5, 2, 7]

Generators: filtering data.

    def odds_only(numbers):
        for n in numbers:
            if (n % 2) == 1:
                yield n
    print [x for x in odds_only([1,2,3,4,5])]     # [1, 3, 5]
    # or
    print [x for x in [1,2,3,4,5] if x % 2 == 1]  # [1, 3, 5]

Encoding Grammars: a rule A --> B C becomes the tuple ("A", ["B", "C"]), and a grammar is a list of such tuples.

...and enumerating strings (a slow way):

    def expand(tokens, grammar):
        for pos in range(len(tokens)):
            for rule in grammar:
                if tokens[pos] == rule[0]:
                    yield tokens[0:pos] + rule[1] + tokens[pos+1:]
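For example, one step of rewriting from the start symbol:

    grammar = [("exp", ["exp", "+", "exp"]), ("exp", ["num"])]
    print list(expand(["exp"], grammar))
    # [['exp', '+', 'exp'], ['num']]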


Reading Machine Minds 2 (Identifying empty Context-free Grammars):


    def cfgempty(grammar, symbol, visited):
        if symbol in visited:
            return None
        elif not any([rule[0] == symbol for rule in grammar]):
            return [symbol]
        else:
            new_visited = visited + [symbol]
            for rhs in [r[1] for r in grammar if r[0] == symbol]:
                if all([None != cfgempty(grammar, r, new_visited) for r in rhs]):
                    result = []
                    for r in rhs:
                        result += cfgempty(grammar, r, new_visited)
                    return result
            return None
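For example:

    grammar = [("S", ["T", "x"]), ("T", ["y"])]
    print cfgempty(grammar, "S", [])    # ['y', 'x']: a witness string
    grammar2 = [("S", ["S", "x"])]      # the only rule is self-recursive
    print cfgempty(grammar2, "S", [])   # None: the language is empty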

Infinite Mind Reading (identify infinite grammars, i.e., ones that accept infinitely many strings):
    def cfginfinite(grammar):
        for Q in [rule[0] for rule in grammar]:
            def helper(current, visited, sizexy):
                if current in visited:
                    return sizexy > 0
                else:
                    new_visited = visited + [current]
                    for rhs in [rule[1] for rule in grammar if rule[0] == current]:
                        for symbol in rhs:
                            if helper(symbol, new_visited, sizexy + len(rhs) - 1):
                                return True
                    return False
            if helper(Q, [], 0):
                return True
        return False
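For example:

    balanced = [("S", ["(", "S", ")"]), ("S", [])]  # infinitely many strings
    print cfginfinite(balanced)   # True
    finite = [("S", ["a"])]       # exactly one string
    print cfginfinite(finite)     # False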

Detecting Ambiguity:
    def expand(tokens_and_derivation, grammar):
        (tokens, derivation) = tokens_and_derivation
        for token_pos in range(len(tokens)):
            for rule_index in range(len(grammar)):
                rule = grammar[rule_index]
                if tokens[token_pos] == rule[0]:
                    yield ((tokens[0:token_pos] + rule[1] +
                            tokens[token_pos+1:]),
                           derivation + [rule_index])


    def isambig(grammar, start, utterance):
        enumerated = [([start], [])]
        while True:
            new_enumerated = enumerated
            for u in enumerated:
                for i in expand(u, grammar):
                    if not i in new_enumerated:
                        new_enumerated = new_enumerated + [i]
            if new_enumerated != enumerated:
                enumerated = new_enumerated
            else:
                break
        return len([x for x in enumerated if x[0] == utterance]) > 1


UNIT 4 Parsing
Given a string s and a grammar G, is s in the language of G? Lexical analysis broke the string down into a stream of tokens, and syntactic analysis takes that stream of tokens and checks to see if it adheres to a context-free grammar. BRUTE FORCE: try all options exhaustively. Memoization is a computer science technique in which we keep a chart or record of previous computations and compute new values in terms of previous answers.
    import timeit

    t = timeit.Timer(stmt="""
    chart = {}
    def memofibo(n):
        if n <= 2:
            return 1
        if n-2 not in chart:
            chart[n-2] = memofibo(n-2)
        if n-1 not in chart:
            chart[n-1] = memofibo(n-1)
        return chart[n-1] + chart[n-2]
    memofibo(25)""")
    print t.timeit(number=100)

    t2 = timeit.Timer(stmt="""
    def fibo(n):
        if n <= 2:
            return 1
        return fibo(n-1) + fibo(n-2)
    fibo(25)""")
    print t2.timeit(number=100)

Parsing State: consider the grammar

    S   --> exp
    exp --> exp + exp
    exp --> exp - exp
    exp --> 1
    exp --> 2

with input = 1 + 2. While parsing, part of the input has been seen and part has not; we mark the boundary with a dot. After seeing "1 +", the state exp --> exp + . exp records that we are in the middle of the rule exp --> exp + exp. When we reach exp --> exp + exp . and then S --> exp ., the input is Parsed!



A PARSING STATE is a rewrite rule from the grammar augmented with one dot (.) on the right-hand side, marking how much has been seen.

Memoization in our Parser:

    parse([t1, t2, ..., tN, ..., tlast])
    chart[N] = all parse states we could be in after seeing t1, t2, ..., tN only!

We must also add a starting or "from" position to our parse states. For the rules exp --> exp + exp and exp --> int, on input = int + int:

    chart[0] (seen nothing):   exp --> . exp + exp, from 0;  exp --> . int, from 0
    chart[1] (seen int):       exp --> int .,       from 0;  exp --> exp . + exp, from 0
    chart[2] (seen int +):     exp --> exp + . exp, from 0;  exp --> . int, from 2
    chart[3] (seen int + int): ...

If we can build the chart, we have solved parsing: if the input is T tokens long and S --> start ., from 0 is in chart[T], then the string is in the language. Parsing works from the start (chart[0] holds S --> . start, from 0) toward the end (chart[T] should hold S --> start ., from 0), filling in the middle entries as it goes.

Making intermediate entries: suppose S --> exp + . exp, from j is in chart[i] (we have seen i tokens). We are expecting to see an exp, so we need to find all rules exp --> something in the grammar, and bring them in. Predicting, or computing the CLOSURE (one way to complete the parsing chart): if chart[i] has X --> ab . cd, from j, then for every grammar rule c --> pqr we add c --> . pqr, from i to chart[i]. Consuming, or SHIFTING over the input (one more way to complete the parsing chart): if chart[i] has X --> ab . cd, from j and c is a terminal, we shift over it: add X --> abc . d, from j to chart[i+1] IF c is the (i+1)-th input token.


Reduction: if chart[i] has a completed state X --> ab ., from j, we reduce by applying the rule in reverse: for every state in chart[j] that was expecting an X (Y --> ... . X ..., from k), we move its dot over the X and add it to chart[i]. Reduction walkthrough for the grammar T --> a B a, B --> b b, on input = a b b a:

    chart[0] (seen nothing): T --> . a B a, from 0
    chart[1] (seen a):       T --> a . B a, from 0
                             B --> . b b,   from 1
    chart[2] (seen a b):     B --> b . b,   from 1
    chart[3] (seen a b b):   B --> b b .,   from 1
                             T --> a B . a, from 0   (by reduction)
    chart[4] (seen a b b a): T --> a B a ., from 0   (accepted!)
AddtoChart: the chart coded in Python is a dictionary where chart[i] is a list of states, e.g. chart[i] = [P --> ( . P ), from 0; P --> ( ) ., from 1; ...].

    def addtochart(chart, index, state):
        if state in chart[index]:
            return False
        else:
            chart[index] += [state]
            return True

Encode the grammar S --> P; P --> ( P ); P --> (epsilon):

    grammar = [("S", ["P"]), ("P", ["(", "P", ")"]), ("P", [])]

Encode the parsing state X --> ab . cd, from j:

    state = ("X", ["a", "b"], ["c", "d"], j)

Writing Closure:

    def closure(grammar, i, x, ab, cd, j):
        return [(rule[0], [], rule[1], i) for rule in grammar \
                if cd <> [] and rule[0] == cd[0]]

Writing Shift:

    def shift(tokens, i, x, ab, cd, j):
        if cd <> [] and tokens[i] == cd[0]:
            return (x, ab + cd[:1], cd[1:], j)

Writing Reduction:

    def reductions(chart, i, x, ab, cd, j):
        return [(state[0], state[1] + [x], state[2][1:], state[3]) for state \
                in chart[j] if cd == [] and state[2] <> [] and state[2][0] == x]


Putting It All Together:

    def parse(tokens, grammar):
        tokens = tokens + ["end_of_input_marker"]
        chart = {}
        start_rule = grammar[0]  # by convention, the first rule in the grammar
        for i in range(len(tokens) + 1):
            chart[i] = []
        start_state = (start_rule[0], [], start_rule[1], 0)
        chart[0] = [start_state]
        for i in range(len(tokens)):
            while True:
                changes = False
                for state in chart[i]:
                    # state == X --> ab . cd, from j
                    x = state[0]
                    ab = state[1]
                    cd = state[2]
                    j = state[3]
                    next_states = closure(grammar, i, x, ab, cd, j)
                    for next_state in next_states:
                        changes = addtochart(chart, i, next_state) or changes
                    next_state = shift(tokens, i, x, ab, cd, j)
                    if next_state <> None:
                        changes = addtochart(chart, i+1, next_state) or changes
                    next_states = reductions(chart, i, x, ab, cd, j)
                    for next_state in next_states:
                        changes = addtochart(chart, i, next_state) or changes
                if not changes:
                    break
        accepting_state = (start_rule[0], start_rule[1], [], 0)
        return accepting_state in chart[len(tokens)-1]
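For example, with the balanced-parentheses grammar encoded earlier:

    grammar = [("S", ["P"]), ("P", ["(", "P", ")"]), ("P", [])]
    print parse(["(", "(", ")", ")"], grammar)   # True
    print parse(["(", ")", ")"], grammar)        # False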

Parse Trees: We also need to produce parse trees to get their meaning and interpret HTML and JavaScript programs. The format we are going to use for our parse trees is nested tuples.
Each parse rule names its left-hand side in p[0] and its right-hand-side pieces in p[1], p[2], ...; p[0] is the returned parse tree:

    def p_exp_number(p):
        'exp : NUMBER'
        p[0] = ("number", p[1])

    def p_exp_not(p):
        'exp : NOT exp'
        p[0] = ("not", p[2])   # p[1] is NOT, p[2] is the exp subtree


Parsing Tags:

    def p_elt_tag(p):
        'elt : LANGLE WORD tag_args RANGLE html LANGLESLASH WORD RANGLE'
        p[0] = ("tag-element", p[2], p[3], p[5], p[7])


Parsing JavaScript:

    def p_exp_binop(p):
        """exp : exp PLUS exp
               | exp MINUS exp
               | exp TIMES exp"""
        p[0] = ("binop", p[1], p[2], p[3])

Setting Associativity and Precedence: the issues that need to be resolved are associativity and precedence.

    precedence = (
        # lower precedence
        ('left', 'PLUS', 'MINUS'),
        ('left', 'TIMES', 'DIVIDE'),
        # higher precedence
    )

Parsing JavaScript Statements:


    import ply.yacc as yacc
    import ply.lex as lex
    import jstokens              # use our JavaScript lexer
    from jstokens import tokens  # use our JavaScript tokens

    start = 'js'  # the start symbol in our grammar

    def p_js(p):
        'js : element js'
        p[0] = [p[1]] + p[2]

    def p_js_empty(p):
        'js : '
        p[0] = []

    def p_element_function(p):
        'element : FUNCTION IDENTIFIER LPAREN optparams RPAREN compoundstmt'
        p[0] = ('function', p[2], p[4], p[6])

    def p_element_statement(p):
        'element : stmt SEMICOLON'
        p[0] = ('stmt', p[1])

    def p_optparams(p):
        'optparams : params'
        p[0] = p[1]

    def p_optparams_empty(p):
        'optparams : '
        p[0] = []

    def p_params(p):
        'params : IDENTIFIER COMMA params'
        p[0] = [p[1]] + p[3]

    def p_params_last(p):
        'params : IDENTIFIER'
        p[0] = [p[1]]


    def p_compoundstmt(p):
        'compoundstmt : LBRACE statements RBRACE'
        p[0] = p[2]

    def p_statements(p):
        'statements : stmt SEMICOLON statements'
        p[0] = [p[1]] + p[3]

    def p_statements_empty(p):
        'statements : '
        p[0] = []

    def p_stmt_if_then(p):
        'stmt : IF exp compoundstmt'
        p[0] = ('if-then', p[2], p[3])

    def p_stmt_if_then_else(p):
        'stmt : IF exp compoundstmt ELSE compoundstmt'
        p[0] = ('if-then-else', p[2], p[3], p[5])

    def p_stmt_assignment(p):
        'stmt : IDENTIFIER EQUAL exp'
        p[0] = ('assign', p[1], p[3])

    def p_stmt_return(p):
        'stmt : RETURN exp'
        p[0] = ('return', p[2])

    def p_stmt_var(p):
        'stmt : VAR IDENTIFIER EQUAL exp'
        p[0] = ('var', p[2], p[4])

    def p_stmt_exp(p):
        'stmt : exp'
        p[0] = ('exp', p[1])

    # For now, we will assume that there is only one type of expression.
    def p_exp_identifier(p):
        'exp : IDENTIFIER'
        p[0] = ("identifier", p[1])

    jslexer = lex.lex(module=jstokens)
    jsparser = yacc.yacc()
    jslexer.input(input_string)
    parse_tree = jsparser.parse(input_string, lexer=jslexer)
    print parse_tree


Parsing JavaScript Expressions:


    import ply.yacc as yacc
    import ply.lex as lex
    import jstokens              # use our JavaScript lexer
    from jstokens import tokens  # use our JavaScript tokens

    start = 'exp'  # we'll start at expression this time

    precedence = (
        ('left', 'OROR'),
        ('left', 'ANDAND'),
        ('left', 'EQUALEQUAL'),
        ('left', 'LT', 'GT', 'LE', 'GE'),
        ('left', 'PLUS', 'MINUS'),
        ('left', 'TIMES', 'DIVIDE', 'MOD'),
        ('right', 'NOT'),
    )

    def p_exp_identifier(p):
        'exp : IDENTIFIER'
        p[0] = ("identifier", p[1])

    def p_exp_number(p):
        'exp : NUMBER'
        p[0] = ('number', p[1])

    def p_exp_string(p):
        'exp : STRING'
        p[0] = ('string', p[1])

    def p_exp_true(p):
        'exp : TRUE'
        p[0] = ('true', p[1])

    def p_exp_false(p):
        'exp : FALSE'
        p[0] = ('false', p[1])

    def p_exp_not(p):
        'exp : NOT exp'
        p[0] = ('not', p[2])

    def p_exp_parens(p):
        'exp : LPAREN exp RPAREN'
        p[0] = p[2]

    def p_exp_lambda(p):
        'exp : FUNCTION LPAREN optparams RPAREN compoundstmt'
        p[0] = ("function", p[3], p[5])


    def p_exp_binop(p):
        """exp : exp OROR exp
               | exp ANDAND exp
               | exp EQUALEQUAL exp
               | exp MOD exp
               | exp LT exp
               | exp GT exp
               | exp LE exp
               | exp GE exp
               | exp PLUS exp
               | exp MINUS exp
               | exp TIMES exp
               | exp DIVIDE exp"""
        p[0] = ('binop', p[1], p[2], p[3])

    def p_exp_call(p):
        'exp : IDENTIFIER LPAREN optargs RPAREN'
        p[0] = ('call', p[1], p[3])

    def p_optargs(p):
        'optargs : args'
        p[0] = p[1]

    def p_optargs_empty(p):
        'optargs : '
        p[0] = []

    def p_args(p):
        'args : exp COMMA args'
        p[0] = [p[1]] + p[3]

    def p_args_last(p):
        'args : exp'
        p[0] = [p[1]]

    jslexer = lex.lex(module=jstokens)
    jsparser = yacc.yacc()
    jslexer.input(input_string)
    parse_tree = jsparser.parse(input_string, lexer=jslexer)
    print parse_tree


UNIT 5 Interpreting
A bug is just an instance where the program's meaning is different from its specification. But in practice, a lot of the time the mistake is actually with the specification. Regardless of whether the problem is with the source code or the specification, understanding what code means in context is critical to figuring out if it's right or wrong. Interpreters: An interpreter finds the meaning of a program by traversing its parse tree. String of HTML + JavaScript --> break it down to words (Lexical Analysis) --> parse those into a tree (Syntactic Analysis) --> walk that tree and understand it (Semantics or Interpreting). Syntax vs. Semantics: Lexing and parsing deal with the form of an utterance. We now turn our attention to semantics, the meaning of an utterance. A well-formed sentence in a natural language can be "meaningless" or "hard to interpret". Similarly, a syntactically valid program can lead to a run-time error if we try to apply the wrong sort of operation to the wrong sort of thing (e.g., 3 + "hello"). Semantic Analysis: The process of looking at a program's source code and trying to see if it's going to be well-behaved or not is known as type checking or semantic analysis. [One goal of semantic analysis is to notice and rule out bad programs (i.e., programs that will apply the wrong sort of operation to the wrong sort of object). This is often called type checking.] Types: A type is a set of similar objects (e.g., number or string or list) with an associated set of valid operations (e.g., addition or length).

[Diagram: Numbers such as 3.14 and 3189 support + and *; Strings such as "hello" and "world" support + and len; Lists such as [1, 2, 3] and ["a", "b"] support +, len, and slicing [1:-1].] Each operation has a different meaning for different types of data.

Graphics: Render a webpage. We'll use a library to do that for us.


    Nelson Mandela <b>was elected</b> democratically.

    graphics.word("Nelson")
    graphics.word("Mandela")
    graphics.begintag("b", {})
    graphics.word("was")
    graphics.word("elected")
    graphics.endtag()
    graphics.word("democratically.")


Nelson Mandela was elected democratically.

Writing an Interpreter: All there is in HTML is word-elements, tag-elements and javascript-elements, and below we see how to handle the first two.
    import graphics

    def interpret(trees):  # Hello, friend
        for tree in trees:  # Hello,
            # ("word-element", "Hello")
            nodetype = tree[0]  # "word-element"
            if nodetype == "word-element":
                graphics.word(tree[1])
            elif nodetype == "tag-element":
                # <b>Strong text</b>
                tagname = tree[1]       # b
                tagargs = tree[2]       # []
                subtrees = tree[3]      # ...Strong Text!...
                closetagname = tree[4]  # b
                if not tagname == closetagname:
                    graphics.warning('Tag mismatch!')
                else:
                    graphics.begintag(tagname, tagargs)
                    interpret(subtrees)
                    graphics.endtag()

    # Note that graphics.initialize and finalize will only work
    # surrounding a call to interpret.
    graphics.initialize()  # enables display of output
    interpret([("word-element", "Hello,"),
               ("tag-element", "b", [], [("word-element", "World!")], "b")])
    graphics.finalize()    # enables display of output

Arithmetic: For the javascript-elements, we'll need to interpret the code down to a string, and then call graphics.word() on that string. However, JavaScript is semantically richer than HTML, and the process of interpretation won't be that simple. We are going to write a recursive procedure to interpret JavaScript arithmetic expressions. The procedure will walk over the parse tree of the expression. This is sometimes called evaluation.
    # Write an eval_exp procedure to interpret JavaScript arithmetic
    # expressions. Only handle +, - and numbers for now.
    def eval_exp(tree):
        # ("number", "5")
        # ("binop", ..., "+", ...)
        nodetype = tree[0]
        if nodetype == "number":
            return int(tree[1])
        elif nodetype == "binop":
            left_child = tree[1]
            operator = tree[2]
            right_child = tree[3]
            left_value = eval_exp(left_child)
            right_value = eval_exp(right_child)
            if operator == "+":
                return left_value + right_value
            elif operator == "-":
                return left_value - right_value


Context: We need to know the values of variables (the context) to evaluate an expression. The meaning of x+2 depends on the meaning (the current state) of x. State: The state of a program execution is a mapping from variable names to values. Evaluating an expression requires us to know the current state. To evaluate x+2, we'll keep around a mapping {"x": 3} (it will get more complicated later). This mapping is called the state. Variable Lookup:

    # ("binop", ("identifier", "x"), "+", ("number", "2"))
    def eval_exp(tree, environment):
        nodetype = tree[0]
        if nodetype == "number":
            return int(tree[1])
        elif nodetype == "binop":
            left_value = eval_exp(tree[1], environment)
            operator = tree[2]
            right_value = eval_exp(tree[3], environment)
            if operator == "+":
                return left_value + right_value
            elif operator == "-":
                return left_value - right_value
        elif nodetype == "identifier":
            variable_name = tree[1]
            return env_lookup(environment, variable_name)

Control Flow: Python and JavaScript have conditional statements like if; we say that such statements can change the flow of control through the program. Program elements that can change the flow of control, such as if, while or return, are often called statements. Typically statements contain expressions, but not the other way around.

Evaluating Statements:
    def eval_stmts(tree, environment):
        stmttype = tree[0]
        if stmttype == "assign":
            # ("assign", "x", ("binop", ..., "+", ...)) <=== x = ... + ...
            variable_name = tree[1]
            right_child = tree[2]
            new_value = eval_exp(right_child, environment)
            env_update(environment, variable_name, new_value)
        elif stmttype == "if-then-else":
            # if x < 5 then A;B; else C;D;
            conditional_exp = tree[1]  # x < 5
            then_stmts = tree[2]       # A;B;
            else_stmts = tree[3]       # C;D;
            if eval_exp(conditional_exp, environment):
                eval_stmts(then_stmts, environment)
            else:
                eval_stmts(else_stmts, environment)


    def eval_exp(exp, env):
        etype = exp[0]
        if etype == "number":
            return float(exp[1])
        elif etype == "string":
            return exp[1]
        elif etype == "true":
            return True
        elif etype == "false":
            return False
        elif etype == "not":
            return not(eval_exp(exp[1], env))

    def env_update(env, vname, value):
        env[vname] = value

Scope: We use the term scope to refer to the portion of a program where a variable has a particular value. So, the environment CANNOT be a flat mapping {}.

[Diagram: the Global environment has x = "outside x" and y = "outside y"; the nested myfun environment has its own x = "os lusiadas".]

Identifiers and Storage Places: Because the value of a variable can change, we will use explicit storage locations to track the current values of variables.

    y = 2
    def myfun(x):
        print x
        print y
    myfun(y + 5)

[Diagram: the global environment holds y: 2; the environment for the call to myfun holds x: 7 and has a parent pointer back to the global environment.]

Environments: There is a special global environment that can hold variable values. Other environments have parent pointers to keep track of nesting or scoping. Environments hold storage locations and map variables to values.


Chained Environments: The process upon a function call is 1. Create a new environment. Its parent is the current environment. 2. Create storage places in the new environment for each formal parameter. 3. Fill in those places with the values of the actual arguments. 4. Evaluate the function body in the new environment.

    greeting = "hola"
    def makegreeter(greeting):
        def greeter(person):
            print greeting + " " + person
        return greeter
    sayhello = makegreeter("hello from uttar pradesh")
    sayhello("lucknow")   # hello from uttar pradesh lucknow

[Diagram: the Global environment maps greeting to "hola" and makegreeter and sayhello to function values; the makegreeter environment maps greeting to "hello from uttar pradesh" and greeter to a function; the greeter environment maps person to "lucknow" and chains up to makegreeter's environment.]

Environment Needs: 1. map variables to values; 2. point to the parent environment. So, we'll encode an environment as (parent_pointer, dictionary).

    def env_lookup(vname, env):
        # env == (parent, dictionary)
        if vname in env[1]:
            return env[1][vname]
        elif env[0] == None:
            return None
        else:
            return env_lookup(vname, env[0])

    def env_update(vname, value, env):
        if vname in env[1]:
            env[1][vname] = value
        elif not env[0] == None:
            env_update(vname, value, env[0])
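A small worked example of the chained lookup (the values are made up):

    global_env = (None, {"y": 2})        # the global environment
    local_env = (global_env, {"x": 7})   # a child with a parent pointer
    print env_lookup("x", local_env)     # 7: found locally
    print env_lookup("y", local_env)     # 2: found via the parent pointer
    env_update("y", 3, local_env)        # walks up and updates the global y
    print env_lookup("y", global_env)    # 3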

Catching Errors: Modern programming languages use exceptions to notice and handle run-time errors. "try-catch" or "try-except" blocks are syntax for handling such exceptions.

    try:
        print "hello"
        1 / 0
    except Exception as problem:
        # only runs if the guarded block has an error
        print "didn't work"
        print problem


Frames:
    # Function calls: new environments, catch return values.
    # Return will throw an exception.
    def eval_stmt(tree, environment):
        stmttype = tree[0]
        if stmttype == "call":
            # ("call", "sqrt", [("number", "2")])
            fname = tree[1]  # "sqrt"
            args = tree[2]   # [("number", "2")]
            fvalue = env_lookup(fname, environment)
            if fvalue[0] == "function":
                # We'll make a promise to ourselves:
                # ("function", params, body, env)
                fparams = fvalue[1]  # ["x"]
                fbody = fvalue[2]
                fenv = fvalue[3]
                if len(fparams) <> len(args):
                    print "ERROR: wrong number of args"
                else:
                    new_env = (fenv, dict((fparams[i],
                        eval_exp(args[i], environment)) for i in range(len(args))))
                    try:
                        eval_stmts(fbody, new_env)
                        return None
                    except Exception as return_value:
                        return return_value
            else:
                print "ERROR: call to non-function"
        elif stmttype == "return":
            retval = eval_exp(tree[1], environment)
            raise Exception(retval)
        elif stmttype == "exp":
            eval_exp(tree[1], environment)

    def env_lookup(vname, env):
        if vname in env[1]:
            return (env[1])[vname]
        elif env[0] == None:
            return None
        else:
            return env_lookup(vname, env[0])

    def env_update(vname, value, env):
        if vname in env[1]:
            (env[1])[vname] = value
        elif not (env[0] == None):
            env_update(vname, value, env[0])

    def eval_exp(exp, env):
        etype = exp[0]
        if etype == "number":
            return float(exp[1])
        elif etype == "binop":
            a = eval_exp(exp[1], env)
            op = exp[2]
            b = eval_exp(exp[3], env)
            if op == "*":
                return a * b
        elif etype == "identifier":
            vname = exp[1]
            value = env_lookup(vname, env)
            if value == None:
                print "ERROR: unbound variable " + vname
            else:
                return value

    def eval_stmts(stmts, env):
        for stmt in stmts:
            eval_stmt(stmt, env)

    sqrt = ("function", ("x"), (("return", ("binop", ("identifier", "x"),
            "*", ("identifier", "x"))),), {})
    environment = (None, {"sqrt": sqrt})
    print eval_stmt(("call", "sqrt", [("number", "2")]), environment)

Function Definitions:
    function myfun(x) {    // fname = myfun, fparams = [x]
        return x + 1;      // fbody
    }

    env[fname] = ("function", fparams, fbody, fenv)
    # fenv: the environment we were in when the function was defined

    def eval_elt(tree, env):
        elttype = tree[0]
        if elttype == "function":
            fname = tree[1]
            fparams = tree[2]
            fbody = tree[3]
            fvalue = ("function", fparams, fbody, env)
            add_to_env(env, fname, fvalue)

Double-Edged Sword: We can simulate JavaScript programs with our interpreter written in Python. That means that anything that can be done in JavaScript could be done in Python as well. It turns out that JavaScript could also simulate Python. So they are equally powerful! (Turing complete; a Turing machine is a mathematical model of computation.) Natural Language Power: While most computer languages are equivalent (in that any computation that can be done in one can also be done in another), it is debated whether the same is true for natural languages. Infinite Loops: Computer programs can contain infinite loops. A program either terminates (halts) in finite time or loops forever. We would like to tell if a program loops forever or not. It is provably impossible to write a procedure that can definitely tell if every other procedure loops forever or not.


This Sentence is False: If tsif halts, then it loops forever. If tsif loops forever, then it halts. Both cases lead to a contradiction. Therefore, halts() cannot exist.

Adding a While Loop to the JavaScript Interpreter:


    def eval_while(while_stmt, env):
        conditional = while_stmt[1]
        loop_body = while_stmt[2]
        while eval_exp(conditional, env):
            eval_stmts(loop_body, env)

    # or, recursively:
    def eval_while(while_stmt, env):
        conditional_exp = while_stmt[1]
        loop_body = while_stmt[2]
        if eval_exp(conditional_exp, env):
            eval_stmts(loop_body, env)
            eval_while(while_stmt, env)

The tsif procedure from the halting argument above:

    def tsif():
        if halts(tsif):
            x = 0
            while True:
                x = x + 1
        else:
            return 0

UNIT 6 Building a Web Browser


Web Browser Architecture: Our HTML lexer, parser and interpreter will drive the main process; our JavaScript lexer, parser and interpreter will serve as subroutines. 1. Web page is lexed and parsed. 2. HTML interpreter walks the Abstract Syntax Tree, and calls the JavaScript interpreter. 3. JavaScript code calls write(). 4. JavaScript interpreter stores text from write(). 5. HTML interpreter calls graphics library. 6. Final image of web page is created. Fitting Them Together: We change our HTML lexer to recognize embedded JavaScript fragments as single tokens (we treat JavaScript as a single HTML token). We'll pass the contents of those tokens to our JavaScript lexer, parser and interpreter later.
    def t_javascript(token):
        # Several backslashes may be unnecessary, but they are there to
        # make sure that the r.e. will be interpreted correctly in any
        # case. This is called defensive programming, and it is more
        # commonly invoked when dealing with security or correctness
        # requirements.
        r'\<script\ type=\"text\/javascript\"\>'
        token.lexer.code_start = token.lexer.lexpos
        token.lexer.begin('javascript')

    def t_javascript_end(token):
        r'\<\/script\>'
        token.value = token.lexer.lexdata[token.lexer.code_start:
                                          token.lexer.lexpos-9]
        token.type = 'JAVASCRIPT'
        token.lexer.lineno += token.value.count('\n')
        token.lexer.begin('INITIAL')
        return token


Extending our HTML Grammar: We extend our HTML parser to handle our special token representing embedded JavaScript.

    def p_element_javascript(p):
        'element : JAVASCRIPT'
        p[0] = ("javascript-element", p[1])

HTML Interpreter on JavaScript Elements:

    def interpret(trees):
        for tree in trees:
            treetype = tree[0]
            if treetype == "word-element":
                graphics.word(tree[1])
            elif treetype == "tag-element":
                ...
            elif treetype == "javascript-element":
                jstext = tree[1]
                jslexer = lex.lex(module=jstokens)
                jsparser = yacc.yacc(module=jsgrammar)
                jstree = jsparser.parse(jstext, lexer=jslexer)
                result = jsinterp.interpret(jstree)
                graphics.word(result)

JavaScript Output: A JavaScript program may contain zero, one or many calls to write(). We will use environments to capture the output of a JavaScript program: assume every call to write appends to the special "javascript output" variable in the global environment.

    def interpret(trees):
        global_env = (None, {"javascript output": ""})
        for elt in trees:
            eval_elt(elt, global_env)
        return global_env[1]["javascript output"]

JavaScript Interpreter, Updating Output:

    def eval_exp(tree, env):
        exptype = tree[0]
        if exptype == "call":
            fname = tree[1]
            fargs = tree[2]
            fvalue = env_lookup(fname, env)
            if fname == "write":
                argval = eval_exp(fargs[0], env)
                output_sofar = env_lookup("javascript output", env)
                env_update("javascript output",
                           output_sofar + str(argval), env)
                return None


Debugging: A good test case gives us confidence that a program implementation adheres to its specification. In this situation, a good test case reveals a bug. Testing: We use testing to gain confidence that an implementation (a program) adheres to its specification (the task at hand). If a program accepts an infinite set of inputs, testing alone cannot prove that program's correctness. Software maintenance (i.e., testing, debugging, refactoring) carries a huge cost. Testing In Depth: When developing a project, there are two ways we could go ahead: either plan and reason about the implementation in advance, then write the code with high confidence that it will be free of bugs, or, because of (time) constraints, just implement it and then test the implementation. To test the implementation, we develop test cases (code that uses the program we would like to test), and if we observe a bug, we start commenting out lines of the test file (Fault Localization), going back and forth with the commenting/uncommenting of lines to see if it still breaks, and manage to pinpoint the bug. Anonymous Functions in our JavaScript Interpreter:

    def eval_exp(tree, env):
        exptype = tree[0]
        if exptype == "function":
            # function(x,y) { return x+y; }
            fparams = tree[1]
            fbody = tree[2]
            return ("function", fparams, fbody, env)
            # For an anonymous function, we don't add it to the
            # environment, unless the user assigns it.

Optimization: An optimization improves the performance of a program while retaining its meaning (i.e., without changing the output). Implementing Optimizations: 1. Think of optimizations. (e.g., x = x + 0, x = x * 1) 2. Transform parse tree (directly). Note: Replacing an expensive multiplication with a cheaper addition is an instance of strength reduction. Optimization Timing: In this class we will optionally perform optimization after parsing but before interpreting. Our optimizer takes a parse tree as input and returns a (simpler) parse tree as output.

    Program Text --Lexing--> Tokens --Parsing--> Tree
                 --Optimization (optional)--> (simpler) Tree
                 --Interpreting--> Result (meaning)

    def optimize(tree):
        etype = tree[0]
        if etype == "binop":
            a = tree[1]
            op = tree[2]
            b = tree[3]
            if op == "*" and b == ("number", "1"):
                return a
            if op == "*" and b == ("number", "0"):
                return ("number", "0")   # or: return b
            if op == "+" and b == ("number", "0"):
                return a
        return tree

For example, ("binop", ("number", "5"), "*", ("number", "1")) optimizes to ("number", "5").

Rebuilding the Parse Tree: We want an optimizer that is recursive; we should optimize the child nodes of a parse tree before optimizing the parent nodes. 1. Recursive calls; 2. Look for patterns; 3. Done.

    def optimize(tree):  # expression trees only
        etype = tree[0]
        if etype == "binop":
            a = optimize(tree[1])
            op = tree[2]
            b = optimize(tree[3])
            if op == "*" and b == ("number", "1"):
                return a
            elif op == "*" and b == ("number", "0"):
                return ("number", "0")
            elif op == "+" and b == ("number", "0"):
                return a
            return ("binop", a, op, b)  # keep the optimized children
        return tree
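For instance, (5 + 0) * 1 collapses all the way down in one recursive pass:

    tree = ("binop",
            ("binop", ("number", "5"), "+", ("number", "0")),
            "*",
            ("number", "1"))
    print optimize(tree)   # ('number', '5')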

Wrap Up:
- Lexing: regular expressions, finite state machines.
- Parsing: context-free grammars, dynamic programming / parse trees.
- Optimizing: must retain meaning.
- Interpreting: walks the A.S.T. recursively.
- Debugging: gain confidence.


HTML embedded in JavaScript embedded in HTML embedded in JavaScript...:


    import ply.lex as lex
    import ply.yacc as yacc
    import graphics as graphics
    import jstokens
    import jsgrammar
    import jsinterp
    import htmltokens
    import htmlgrammar

    htmllexer  = lex.lex(module=htmltokens)
    htmlparser = yacc.yacc(module=htmlgrammar, tabmodule="parsetabhtml")
    jslexer    = lex.lex(module=jstokens)
    jsparser   = yacc.yacc(module=jsgrammar, tabmodule="parsetabjs")

    def interpret(ast):
        for node in ast:
            nodetype = node[0]
            if nodetype == "word-element":
                graphics.word(node[1])
            elif nodetype == "tag-element":
                tagname = node[1]
                tagargs = node[2]
                subast = node[3]
                closetagname = node[4]
                if tagname <> closetagname:
                    graphics.warning("(mismatched " + tagname + " " +
                                     closetagname + ")")
                else:
                    graphics.begintag(tagname, tagargs)
                    interpret(subast)
                    graphics.endtag()
            elif nodetype == "javascript-element":
                jstext = node[1]
                jsast = jsparser.parse(jstext, lexer=jslexer)
                result = jsinterp.interpret(jsast)
                htmlast = htmlparser.parse(result, lexer=htmllexer)
                interpret(htmlast)

    webpage = "..."
    htmlast = htmlparser.parse(webpage, lexer=htmllexer)
    graphics.initialize()
    interpret(htmlast)
    graphics.finalize()

Bending Numbers:
    # Write a procedure optimize(exp) that takes a JavaScript expression
    # AST node and returns a new, simplified JavaScript expression AST.
    # You must handle:
    #
    #   X * 1 == 1 * X == X   for all X
    #   X * 0 == 0 * X == 0   for all X
    #   X + 0 == 0 + X == X   for all X
    #   X - X == 0            for all X
    #
    # and constant folding for +, - and * (e.g., replace 1+2 with 3)


    def optimize(exp):
        etype = exp[0]
        if etype == "binop":
            a = optimize(exp[1])
            op = exp[2]
            b = optimize(exp[3])
            if op == "+" and a == ("number", 0):
                return b
            elif op == "+" and b == ("number", 0):
                return a
            if op == "*" and a == ("number", 1):
                return b
            elif op == "*" and b == ("number", 1):
                return a
            if op == "*" and (a == ("number", 0) or b == ("number", 0)):
                return ("number", 0)
            if op == "-" and a == b:
                return ("number", 0)
            if a[0] == b[0] == "number":
                if op == "+":
                    return ("number", a[1] + b[1])
                if op == "-":
                    return ("number", a[1] - b[1])
                if op == "*":
                    return ("number", a[1] * b[1])
            return ("binop", a, op, b)
        return exp

The Living and the Dead:


    # Those lines can be safely removed because they do not compute a
    # value that is used later. We say that a variable is LIVE if the
    # value it holds may be needed in the future.
    #
    # More formally, a variable is LIVE if its value may be read before
    # the next time it is overwritten. Whether or not a variable is LIVE
    # depends on where you are looking in the program, so most formally
    # we say a variable is live at some point P if it may be read before
    # being overwritten after P.
    #
    # function myfun(a,b,c,d) {
    #     a = 1;        # LIVE: nothing
    #     b = 2;        # LIVE: b
    #     c = 3;        # LIVE: c, b
    #     d = 4;        # LIVE: c, b
    #     a = 5;        # LIVE: a, c, b
    #     d = c + b;    # LIVE: a, d
    #     return (a + d);
    # }


    # Once we know which variables are LIVE, we can now remove
    # assignments to variables that will never be read later. Such
    # assignments are called DEAD code. Formally, given an assignment
    # statement "X = ...", if "X" is not live after that statement, the
    # whole statement can be removed.
    #
    # In this assignment, you will write an optimizer that removes dead
    # code. For simplicity, we will only consider sequences of assignment
    # statements (once we can optimize those, we could weave together a
    # bigger optimizer that handles both branches of if statements, and
    # so on, but we'll just do simple lists of assignments for now).
    #
    # Write a procedure removedead(fragment, returned). "fragment" is
    # encoded as above. "returned" is a list of variables returned at the
    # end of the fragment (and thus LIVE at the end of it).
    #
    # Hint 1: One way to reverse a list is [::-1]
    # >>> [1,2,3][::-1]
    # [3, 2, 1]

    def removedead(fragment, returned):
        old_fragment = fragment
        new_fragment = []
        live = returned
        for stmt in fragment[::-1]:
            if stmt[0] in live:
                new_fragment = [stmt] + new_fragment
                live = [x for x in live if x != stmt[0]]
                live = live + stmt[1]
        if new_fragment == old_fragment:
            return new_fragment
        else:
            return removedead(new_fragment, returned)
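A small made-up fragment, using the (assigned_variable, [variables_read]) encoding the exercise assumes:

    fragment = [("a", ["x"]),   # a = x: overwritten before any read, dead
                ("a", ["y"]),   # a = y
                ("d", ["z"])]   # d = z: never read, dead
    print removedead(fragment, ["a"])
    # [('a', ['y'])]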

Find All the Subsets of a Set:

    def all_subsets(lst):
        pset = [[]]
        for elem in lst:
            pset += [x + [elem] for x in pset]
        return pset
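For example:

    print all_subsets([1, 2, 3])
    # [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]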

UNIT 7 Wrap Up
Review: A language is a set of strings.
- Regular Expressions: concise notation for specifying some sets of strings (regular languages).
- Finite State Machines: pictorial representation of, and a way to implement, regular expressions (deterministic or not).
- Context-Free Grammars: concise notation for specifying some sets of strings (context-free languages).
- Memoization (also called Dynamic Programming): keep previous results in a chart to save computation.
- Lexing: break a big string up into a list of tokens (words), specified using regular expressions.
- Parsing: determine if a list of tokens is in the language of a CFG; if so, produce a parse tree.
- Type: a type is a set of values and associated safe operations.
- Semantics (Meaning): a program may have type errors (or other exceptions) or it may produce a value.


- Optimization: replace a program with another that has the same semantics (but uses fewer resources).
- Interpretation: a recursive walk over the (optimized) parse tree; the meaning of a program is computed from the meanings of its subexpressions.
- Web Browser: lex and parse HTML, treating JS as a special token; the HTML interpreter calls the JS interpreter, which returns a string; the HTML interpreter calls the graphics library to display it all.
- Security: computing in the presence of an adversary.


This file is not offered officially by Udacity.com. This material was created by a student as personal notes while attending the lectures of the course CS262: Programming Languages. This material is offered freely. Lamprianidis Nick. Last edited: 07/27/2012.
