Post on 18-Apr-2015
CS262 Programming Languages
UNIT 1 – String Patterns
Building a Web Browser
[Diagram: a web page (HTML + JavaScript, plus images) is the input to the web browser.]
HTML – web page basics. JavaScript – web page computations.
Example web page source: <b>hello 1+2+3
Three steps: (1) break it up into important words, (2) understand the structure [diagram: the parse tree for 1+2+3, a + node over 1 and a second + node over 2 and 3], (3) find the meaning (6).
The goal is to use the web browser to structure the learning. Breaking up strings in Python: “Hello world”.find(“ ”) --> 5
“1 + 1 = 2”.find(“1”,2[starting position]) --> 4
“haystack”.find(“needle”) --> -1 [not found]
Selecting Substrings: “hello”[1[start here]:3[up to but not including]] --> “el”
“hello”[1:[go as far as possible]] --> “ello”
Splitting Words by Whitespace: “Jane Eyre”.split() --> [“Jane”, “Eyre”]
We need more control over splitting strings --> Regular Expressions Regular Expressions: [1-3] –(matches or denotes)-> “1” “2” “3”
[a-b] --> “a” “b”
A module is a repository or library of functions and data. In Python, import brings in a module: import re
The plain string "[0-9]" is just a 5-character string, but the regular expression r"[0-9]" matches 10 different 1-character strings ("0" through "9"). findall takes a regular expression and a string, and returns a list of all of the substrings that match that regular expression: re.findall(r"[0-9]","1+2==3") --> ["1", "2", "3"]
re.findall(r“[1-2]”,“1+2==3”) --> [“1”, “2”]
re.findall(r“[a-c]”,“Barbara Liskov”) --> [“a”, “b”, “a”, “a”]
We’ll need to find /> and == for JavaScript and HTML. Thus, we need to express concatenation and repetition, to match more complicated [compound] strings: r“[a-c][1-2]” --> “a1” “a2” “b1” “b2” “c1” “c2”
r“[0-9][0-9]” --> “00” “01” “02” ... “99”
re.findall(r“[0-9][0-9]”,“July 28, 1821”) --> [“28”, “18”, “21”]
re.findall(r“[0-9][0-9]”,“12345”) --> [“12”, “34”]
re.findall(r“[a-z][0-9]”,“a1 2b cc3 44d”) --> [“a1”, “c3”]
+ (One or More times) Regular Expression: r"a+" [+ applies to the previous r.e.] --> "a" "aa" "aaa" "aaaa" "aaaaa" ...
r“[0-1]+” -(matches)-> “0” “1” “00” “11” “01” “100” “1101” ...
Maximum Munch (an r.e. should consume the biggest string it can and not smaller parts): re.findall(r“[0-9]+”, “13 from 1 in 1776”) --> [“13”, “1”, “1776”]
re.findall(r"[0-9] [0-9]+" [the space matches a literal space], "a1 2b cc3 44d") --> ["1 2", "3 44"]
Finite State Machines --> a visual representation of regular expressions. Suppose we want r"[0-9]+%" [the % is matched directly]:
[Diagram: the FSM for r"[0-9]+%". Start at state 1; an edge labeled 0-9 leads to state 2, which loops on 0-9; an edge labeled % leads to state 3. The circles are states, the arrows are edges or transitions, and state 3 is the accepting state.]
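A quick check that the pattern the FSM depicts behaves as described (the sample string here is made up for illustration):

```python
import re

# the FSM above accepts one or more digits followed by a percent sign
print(re.findall(r"[0-9]+%", "save 10% now, 5 dollars, 30% later"))
# ['10%', '30%']
```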
| (Disjunction or OR) Regular Expression: r“[a-z]+|[0-9]+”
re.findall(r“[a-z]+|[0-9]+”,“Goethe 1749”) --> [“Goethe”, “1749”]
Optional Components (<something>|<nothing>):
[Diagram: two equivalent FSMs for an optional minus sign followed by digits. In the first, state 1 has a "-" edge and an ϵ edge (ϵ = no input, or the empty string) leading to the digit-reading states. The second machine is more concise, folding the optional "-" into a single edge.]
? (Optional, Zero or One time) Regular Expression: re.findall(r"-?[0-9]+","1861-1941 R. Tagore") --> ["1861", "-1941"]
* (Zero or More times) Regular Expression: a+ is the same as aa*
So now + * ? [ ] all mean something special in regular expressions. But if we want to refer to the symbols themselves, we can use Escape Sequences.
Escape Sequences: \ - Escape Character
r"\+\+" --(matches)-> "++"
. (any character except newline) Regular Expression: re.findall(r"[0-9].[0-9]","1a1 222 cc3") --> ["1a1", "222"]
^ [caret] (anything except the listed characters) Regular Expression: re.findall(r"[0-9][^ab]","1a1 222 cc3") --> ["1 ", "22", "2 "]
(?: ) (parentheses to show structure) Regular Expression: re.findall(r"(?:do|re|mi)+","mimi rere midore doo-wop") --> ["mimi", "rere", "midore", "do"]
How to represent (encode) a FSM? – Dictionaries!
edges[(1,“a”)] = 2
Dictionaries: is_flower = {}
is_flower['rose'] = True
is_flower['dog'] = False
or is_flower = {'rose': True,
                'dog': False}
>>> is_flower['rose']
True
>>> is_flower['juliet']
KeyError: 'juliet' [looking up a missing key is an error]
Tuples: Tuples are immutable lists. point = (1,5)
point[0] == 1
point[1] == 5
Let’s encode r“a+1+”:
[Diagram: the FSM for r"a+1+": state 1 --a--> state 2; state 2 loops on a; state 2 --1--> state 3; state 3 loops on 1 and is accepting.]
edges = {(1,'a'): 2, (2,'a'): 2, (2,'1'): 3, (3,'1'): 3}
accepting = [3]
FSM Simulator: fsmsim(<string>, <starting state>, <edges>, <accepting>) --> True,
if the <string> is accepted by the FSM (<edges>,<accepting>)
def fsmsim(string, current, edges, accepting):
    if string == "":
        return current in accepting
    else:
        letter = string[0]
        if (current, letter) in edges:
            destination = edges[(current, letter)]
            remaining_string = string[1:]
            return fsmsim(remaining_string, destination, edges, accepting)
        else:
            return False
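Running the simulator on the r"a+1+" machine encoded above (restated compactly here so the example is self-contained; print() style so it also runs under Python 3):

```python
def fsmsim(string, current, edges, accepting):
    # recursive FSM simulation, as in the notes
    if string == "":
        return current in accepting
    letter = string[0]
    if (current, letter) in edges:
        return fsmsim(string[1:], edges[(current, letter)], edges, accepting)
    return False

# the FSM for r"a+1+": note the letters in the keys are strings
edges = {(1, 'a'): 2, (2, 'a'): 2, (2, '1'): 3, (3, '1'): 3}
accepting = [3]
print(fsmsim("aaa111", 1, edges, accepting))  # True
print(fsmsim("a1a1", 1, edges, accepting))    # False
```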
Handling Epsilon and Ambiguity:
A FSM accepts a string s if there exists even one path from the start state to any accepting state following s. “easy-to-write” FSMs with epsilon transitions or ambiguity are known as non-deterministic (you may not know exactly where to go) finite state machines. A “lock-step” FSM with no epsilon edges or ambiguity is a deterministic finite state machine. [fsmsim can handle these]
Every non-deterministic FSM has a corresponding deterministic FSM that accepts exactly the same strings.
Non-deterministic FSMs are NOT more powerful, they are just more convenient. Idea: Build a deterministic machine D where every state in D corresponds to a SET of states in the non-deterministic machine.
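One way to sketch the "sets of states" idea in code: simulate the NFA by tracking the set of states we could be in, taking epsilon-closures as we go. This is the subset construction computed on the fly; the encoding here (epsilon edges keyed with None) is an assumption of this sketch, not the course's encoding:

```python
def epsilon_closure(states, edges):
    # all states reachable from `states` using only epsilon (None) edges
    result = set(states)
    frontier = list(states)
    while frontier:
        s = frontier.pop()
        for t in edges.get((s, None), []):
            if t not in result:
                result.add(t)
                frontier.append(t)
    return frozenset(result)

def nfa_accepts(string, start, edges, accepting):
    # deterministic simulation: the "state" is a SET of NFA states
    current = epsilon_closure([start], edges)
    for letter in string:
        nxt = set()
        for s in current:
            nxt.update(edges.get((s, letter), []))
        current = epsilon_closure(nxt, edges)
    return any(s in accepting for s in current)

# an NFA for r"ab?c", with an epsilon edge that skips the optional b
edges = {(1, 'a'): [2], (2, 'b'): [3], (2, None): [3], (3, 'c'): [4]}
print(nfa_accepts("abc", 1, edges, [4]))   # True
print(nfa_accepts("ac", 1, edges, [4]))    # True
print(nfa_accepts("abbc", 1, edges, [4]))  # False
```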
Example: r“ab?c”
[Diagrams: an NFA for r"ab?c" with an ϵ edge that skips the optional b, and the corresponding deterministic FSM whose states are sets of NFA states (e.g., {2,3,4,6}): no epsilon edges, no ambiguity.]
Example 2:
[Diagrams: another NFA with ϵ edges and ambiguity over the letters a, b, and c, and its deterministic equivalent built from sets of states (e.g., {2,3} and {2,4,5,6}): no epsilon edges, no ambiguity.]
Wrap Up:
STRINGS – sequences of characters.
REGULAR EXPRESSIONS – concise notation for specifying sets of strings. More flexible than fixed string matching. (phone numbers, words, numbers, quoted strings) <-- search for and match them.
FINITE STATE MACHINES – pictorial equivalent to regular expressions.
DETERMINISTIC – every FSM can be converted to a deterministic FSM.
FSM SIMULATION – it is very easy (~10 lines of recursive code) to see if a deterministic FSM accepts a string.
Simulating Non-Deterministic FSMs:
def nfsmsim(string, current, edges, accepting):
    if string == "":
        return current in accepting
    else:
        letter = string[0:1]
        if (current, letter) in edges:
            remainder = string[1:]
            newstates = edges[(current, letter)]
            for next_state in newstates:
                if nfsmsim(remainder, next_state, edges, accepting):
                    return True
        return False
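Running the non-deterministic simulator on a small ambiguous machine (nfsmsim restated compactly so the example is self-contained):

```python
def nfsmsim(string, current, edges, accepting):
    # non-deterministic simulation: edges map to LISTS of states
    if string == "":
        return current in accepting
    letter = string[0]
    if (current, letter) in edges:
        for next_state in edges[(current, letter)]:
            if nfsmsim(string[1:], next_state, edges, accepting):
                return True
    return False

# ambiguity: on 'a', state 1 can stay at 1 or move to 2
edges = {(1, 'a'): [1, 2], (2, '1'): [3]}
print(nfsmsim("aa1", 1, edges, [3]))  # True
print(nfsmsim("a1a", 1, edges, [3]))  # False
```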
Lexical Analysis: String --> List of Tokens
Reading Machine Minds (Identifying empty FSMs):
def nfsmaccepts(current, edges, accepting, visited):
    # returns some string accepted by the FSM, or None if there is none
    if current in visited:
        return None
    elif current in accepting:
        return ""
    else:
        newvisited = visited + [current]
        for edge in edges:
            if edge[0] == current:
                for newstate in edges[edge]:
                    foo = nfsmaccepts(newstate, edges, accepting, newvisited)
                    if foo != None:
                        return edge[1] + foo
        return None
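A quick check of nfsmaccepts (restated compactly so the example is self-contained): it returns some accepted string, or None when no accepting state is reachable.

```python
def nfsmaccepts(current, edges, accepting, visited):
    # returns some string accepted by the FSM, or None
    if current in visited:
        return None
    elif current in accepting:
        return ""
    else:
        newvisited = visited + [current]
        for edge in edges:
            if edge[0] == current:
                for newstate in edges[edge]:
                    foo = nfsmaccepts(newstate, edges, accepting, newvisited)
                    if foo != None:
                        return edge[1] + foo
        return None

# a machine accepting "ab": 1 --a--> 2 --b--> 3 (accepting)
edges = {(1, 'a'): [2], (2, 'b'): [3]}
print(nfsmaccepts(1, edges, [3], []))  # ab
print(nfsmaccepts(1, edges, [4], []))  # None (state 4 is unreachable)
```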
UNIT 2 – Lexical Analysis
A Lexical Analyzer is a program that reads in a web page or a bit of JavaScript and breaks it down into words. Specify HTML + JavaScript:
HyperTextMarkupLanguage tells a web browser how to display a web page. (a missing end tag causes the text starting from the start tag to be influenced, all the way to the end) Bold tag: <b>python</b> Underline tag: <u>python</u> Italics tag: <i>python</i>
Anchor tag: <a href = “http://www.google.com”>now!</a> [Here, href (hypertext reference) is an argument to the anchor tag] Paragraph tag: <p>python</p> LEXICAL ANALYSIS – breaking something into words TOKEN – smallest unit of lexical analysis output: words, strings, numbers, punctuation, not whitespace
So, lexical analysis breaks down a string into a list of tokens. Some HTML Tokens:
LANGLE <
LANGLESLASH </
RANGLE >
EQUAL =
STRING “google.com”
WORD Welcome!
We’ll use regular expressions to specify tokens, and this is how we write out token definitions in Python:
def t_RANGLE(token):  # the token name
    r'>'              # the regexp matching this token
    return token      # return the text unchanged
Token Values: By default, the value is the string matched.
def t_NUMBER(token):
    r'[0-9]+'
    token.value = int(token.value)
    return token
Quoted Strings are critical to interpreting HTML and JavaScript:
def t_STRING(token):
    r'"[^"]*"'
    return token
We want to skip or pass over spaces!
def t_WHITESPACE(token):
    r' '
    pass
What's left is words:
def t_WORD(token):
    r'[^ <>]+'
    return token
A LEXER (LEXical analyzER) is just a collection of token definitions.
When two token definitions can match the same string, the behavior of our lexical analyzer may be ambiguous. In our implementation, we favor the token definition listed first. String Snipping (remove quotes that are markers for strings and are separate from the meaning):
def t_STRING(token):
    r'"[^"]*"'
    token.value = token.value[1:-1]
    return token
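The snipping itself is ordinary Python slicing; outside the lexer it looks like this:

```python
# drop the first and last characters (the quote markers)
quoted = '"google.com"'
print(quoted[1:-1])  # google.com
```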
Let’s make a Lexical Analyzer:
import ply.lex as lex
tokens = (
    'LANGLE',       # <
    'LANGLESLASH',  # </
    'RANGLE',       # >
    'EQUAL',        # =
    'STRING',       # "hello"
    'WORD'          # Welcome!
)
t_ignore = ' '  # shortcut for whitespace
def t_LANGLESLASH(token):
    r'</'
    return token
def t_LANGLE(token):
    r'<'
    return token
def t_RANGLE(token):
    r'>'
    return token
def t_EQUAL(token):
    r'='
    return token
def t_STRING(token):
    r'"[^"]*"'
    token.value = token.value[1:-1]
    return token
def t_WORD(token):
    r'[^ <>\n]+'
    return token
webpage = "This is <b>my</b> webpage!"
# the next line tells our lexical analysis library that we want to use
# all of the token definitions above to make a lexical analyzer, and
# break up strings.
htmllexer = lex.lex()
htmllexer.input(webpage)
while True:
    tok = htmllexer.token()
    if not tok: break
    # tok --> LexToken(<NAME>, <token>, <line>, <character>)
    print tok
Tracking Line Numbers:
def t_NEWLINE(token):
    r'\n'
    token.lexer.lineno += 1
    pass
Comments (documentation, removing functionality):
HTML comments: <!-- comments -->
Adding support for HTML comments to our lexical analyzer. Comments will be modeled as a separate FSM that will ignore everything. Lexer States:
states = (
    # exclusive – if I am in the middle of processing an HTML comment,
    # I can't be doing anything else.
    ('htmlcomment', 'exclusive'),
)
# This rule goes before the normal lexer rules.
def t_htmlcomment(token):
    r'<!--'
    token.lexer.begin('htmlcomment')
def t_htmlcomment_end(token):
    r'-->'
    token.lexer.lineno += token.value.count('\n')
    token.lexer.begin('INITIAL')
# We've said what to do when an HTML comment begins and ends, but
# any other character we see in this special HTML comment mode
# isn't going to match one of those two rules. So...
def t_htmlcomment_error(token):
    token.lexer.skip(1)
# It's a lot like pass, except that it gathers up all of the text
# into one big value so that we can count the newlines later.
Introducing JavaScript (with an example): <p>
Welcome to <b>my</b> webpage.
Five factorial (aka 5!) is:
<script type=“text/JavaScript”>
function factorial(n) {
if (n == 0) {
return 1;
} ;
return n * factorial(n-1);
}
document.write(factorial(5));
</script>
</p>
Identifiers: A Variable or Function Name They identify a particular value or storage location.
# Identifier for JavaScript
def t_IDENTIFIER(token):
    r'[A-Za-z][A-Za-z_]*'
    return token
Numbers in JavaScript:
def t_NUMBER(token):
    r'-?[0-9]+(?:\.[0-9]*)?'
    token.value = float(token.value)
    return token
End-of-Line Comments in JavaScript:
def t_eolcomment(token):
    r'//[^\n]*'
    pass
Wrap Up: TOKENS; HTML; JAVASCRIPT.
Anonymous Functions: Making functions on the fly. The return here is implicit.
# find the max element of list_of_words according to the function given
print findmax(lambda(word): word.find('python'), list_of_words)
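findmax itself is not defined in the notes; a minimal sketch of what such a helper could look like (the name and behavior are assumptions based on the call above; the lambda is written in Python 3 style here):

```python
def findmax(f, words):
    # hypothetical helper matching the notes' call: return the element
    # of words for which f(element) is largest
    best = None
    for w in words:
        if best is None or f(w) > f(best):
            best = w
    return best

# 'python'.find('python') is 0; 'monty python'.find('python') is 6
print(findmax(lambda word: word.find('python'), ['python', 'monty python']))
# monty python
```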
Exercise: Identify Identifiers, and Decimal and Hexadecimal Numbers.
import ply.lex as lex
tokens = ('NUM', 'ID')
def t_ID(token):
    r'[A-Za-z]+'
    return token
def t_NUM_hex(token):
    r'0x[0-9a-fA-F]+'
    token.value = int(token.value, 0)
    token.type = 'NUM'
    return token
def t_NUM_decimal(token):
    r'[0-9]+'
    token.value = int(token.value)
    token.type = 'NUM'
    return token
t_ignore = ' \t\v\r'
def t_error(token):
    print "Lexer: unexpected character " + token.value[0]
    token.lexer.skip(1)
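The hex rule leans on int with base 0, which infers the base from the string's prefix:

```python
# int with base 0 reads "0x..." as hexadecimal and plain digits as decimal
print(int("0x2a", 0))  # 42
print(int("42", 0))    # 42
```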
Regular Expressions SUBSTITUTE: re.sub(regexp, new_text, haystack)
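A quick example of re.sub (the sample string is made up for illustration):

```python
import re

# re.sub(regexp, new_text, haystack) replaces every match of the pattern
print(re.sub(r"NOSPAM", "", "cateNOSPAM@exampleNOSPAM.com"))
# cate@example.com
```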
Exercise: Identify emails that may or may not contain NOSPAM text in it.
import ply.lex as lex
import re
tokens = ('EMAIL',)  # note the comma: this must be a tuple
def t_EMAIL(token):
    r'[A-Za-z]+@[A-Za-z]+(?:\.[A-Za-z]+)+'
    token.value = re.sub(r'NOSPAM', '', token.value)
    return token
def t_error(token):
    print 'Lexer: unexpected character ' + token.value[0]
    token.lexer.skip(1)
def addresses(haystack):
    lexer = lex.lex()
    lexer.input(haystack)
    result = []
    while True:
        tok = lexer.token()
        if not tok: break
        result += [tok.value]
    return result
Exercise: JavaScript Comments & Keywords.
tokens = (
'ANDAND', # &&
'COMMA', # ,
'DIVIDE', # /
'ELSE', # else
'EQUAL', # =
'EQUALEQUAL', # ==
'FALSE', # false
'FUNCTION', # function
'GE', # >=
'GT', # >
# 'IDENTIFIER', #### Not used in this problem.
'IF', # if
'LBRACE', # {
'LE', # <=
'LPAREN', # (
'LT', # <
'MINUS', # -
'NOT', # !
# 'NUMBER', #### Not used in this problem.
'OROR', # ||
'PLUS', # +
'RBRACE', # }
'RETURN', # return
'RPAREN', # )
'SEMICOLON', # ;
# 'STRING', #### Not used in this problem.
'TIMES', # *
'TRUE', # true
'VAR', # var
)
states = (
('jscomment','exclusive'),
)
def t_jscomment(token):
    r'/\*'
    token.lexer.begin('jscomment')
def t_jscomment_end(token):
    r'\*/'
    token.lexer.lineno += token.value.count('\n')
    token.lexer.begin('INITIAL')
def t_jscomment_error(token):
    token.lexer.skip(1)
def t_eolcomment(token):
    r'//[^\n]+'
    pass
t_ANDAND = r'&&'
t_COMMA = r','
t_DIVIDE = r'/'
t_ELSE = r'else'
t_EQUAL = r'='
t_EQUALEQUAL = r'=='
t_FALSE = r'false'
t_FUNCTION = r'function'
t_GE = r'>='
t_GT = r'>'
t_IF = r'if'
t_LBRACE = r'{'
t_LE = r'<='
t_LPAREN = r'\('
t_LT = r'<'
t_MINUS = r'-'
t_NOT = r'!'
t_OROR = r'\|\|'
t_PLUS = r'\+'
t_RBRACE = r'}'
t_RETURN = r'return'
t_RPAREN = r'\)'
t_SEMICOLON = r';'
t_TIMES = r'\*'
t_TRUE = r'true'
t_VAR = r'var'
t_ignore = ' \t\v\r' # whitespace
t_jscomment_ignore = ' \t\v\r' # whitespace
def t_newline(t):
    r'\n'
    t.lexer.lineno += 1
def t_error(t):
    print "JavaScript Lexer: Illegal character " + t.value[0]
    t.lexer.skip(1)
Exercise: JavaScript Numbers and Strings.
import ply.lex as lex
tokens = (
'IDENTIFIER',
'NUMBER',
'STRING',
)
def t_IDENTIFIER(token):
    r'[A-Za-z][A-Za-z_]*'
    return token
def t_NUMBER(token):
    r'-?[0-9]+(?:\.[0-9]*)?'
    token.value = float(token.value)
    return token
def t_STRING(token):
    r'"(?:[^"\\]|(?:\\.))*"'
    token.value = token.value[1:-1]
    return token
t_ignore = ' \t\v\r' # whitespace
def t_newline(t):
    r'\n'
    t.lexer.lineno += 1
def t_error(t):
    print "JavaScript Lexer: Illegal character " + t.value[0]
    t.lexer.skip(1)
FSM Optimization: Removing Dead States.
def nfsmtrim(edges, accepting):
    states = []
    for e in edges:
        states += [e[0]] + edges[e]
    live = []
    for s in states:
        if nfsmaccepts(s, edges, accepting, []) != None:
            live += [s]
    new_edges = {}
    for e in edges:
        if e[0] in live:
            new_destinations = []
            for destination in edges[e]:
                if destination in live:
                    new_destinations += [destination]
            if new_destinations != []:
                new_edges[e] = new_destinations
    new_accepting = []
    for s in accepting:
        if s in live:
            new_accepting += [s]
    return (new_edges, new_accepting)
UNIT 3 – Grammars
Lexing --> list of tokens. A list of words isn't enough: they have to adhere to a valid structure. Grammars give infinite utterances, yet not all utterances. Noam Chomsky --> utterances have rules, governed by formal grammars [grammatical sentences]. Formal Grammars:
Sentence --> Subject Verb
Subject --> Teachers
Subject --> Students
Verb --> write
Verb --> think
[The whole set of rewrite rules is the grammar; Sentence, Subject, and Verb are non-terminals; Teachers, Students, write, and think are terminals.]
Recursion in a context-free (recursive) grammar can allow for an infinite number of utterances. Adding to the previous grammar the rule Subject --> Subject and Subject, we get: Sentence --> Subject Verb --> Subject and Subject Verb
--> Students and Teachers think
Syntactical Analysis (Parsing): Token List --> Valid in Grammar?
Lexing + Parsing = Expressive Power or word rules + sentence rules = creativity! Statements: stmt --> identifier = exp
exp --> exp + exp
exp --> exp – exp
exp --> number
[Derivation shown as a parse tree: Sentence at the root, with children Subject and Verb; Subject derives Students, Verb derives think.]
Optional Parts of Languages:
Sent --> OptAdj Subj Verb
Subj --> William
Subj --> Tell
OptAdj --> Accurate
OptAdj --> ϵ
Verb --> shoots
Verb --> bows
Grammars can encode Regular Expressions: number = r'[0-9]+'
number --> digit more_digits
more_digits --> digit more_digits
more_digits --> ϵ
digit --> 0
digit --> 1
...
digit --> 9
Grammars vs. Regular Expressions: Regular Expressions describe Regular Languages; Grammars describe Context-Free Languages. A language L is a context-free language if there exists a context-free grammar G such that the set of strings accepted by G is exactly L. Context-free languages are strictly more powerful than regular languages. Irregularities: features too complicated to be captured by regular expressions.
Balanced Parentheses are not Regular: p --> ( p )
p --> ϵ
[A regular expression such as r'\(*\)*' matches any run of open parentheses followed by any run of close parentheses; it cannot require them to balance.]
We are going to use formal grammars to understand or describe HTML and JavaScript. Parse trees are a pictorial representation of the structure of an utterance. A parse tree demonstrates that a string is in the language of a grammar.
exp --> exp + exp
exp --> exp – exp
exp --> number
[Parse tree for 1+2+3: the root exp splits into exp + exp; the left exp splits again into exp + exp deriving the numbers 1 and 2, and the right exp derives the number 3.]
One trait shared by programming languages and natural languages is ambiguity. A grammar is ambiguous if at least 1 string in the grammar has more than 1 different parse tree.
Grammar for HTML & JavaScript: HTML (Partial Grammar)
html --> element html
html --> ϵ
element --> word
element --> tag_open html tag_close
tag_open --> < word >
tag_close --> </ word >
JavaScript (Partial Grammar) === expressions ========
exp --> identifier
exp --> number
exp --> string
exp --> exp + exp
exp --> exp - exp
exp --> exp * exp
exp --> exp / exp
exp --> exp < exp
exp --> exp == exp
exp --> exp && exp
exp --> TRUE
exp --> FALSE
=== statements ==========
stmt --> identifier = exp
stmt --> return exp
stmt --> if exp compoundstmt
stmt --> if exp compoundstmt else compoundstmt
compoundstmt --> { stmtS }
stmtS --> stmt ; stmtS
stmtS --> ϵ
=== function calls and definitions ==========
js --> element js
js --> ϵ
element --> function identifier ( optparams ) compoundstmt
element --> stmt ;
optparams --> params
optparams --> ϵ
params --> identifier , params
params --> identifier
=== expressions continued ==========
exp --> identifier ( optargs )
optargs --> args
optargs --> ϵ
args --> exp , args
args --> exp
<b>welcome to <i>my</i> webpage!</b>
[Parse tree: html derives element html; the element is a tag-element whose open tag is <b>, whose body is the html subtree for "welcome to <i>my</i> webpage!", and whose close tag is </b>. This subtree (part of the webpage) is influenced by the bold tag.]
lambda: “Make me a function”, or “I am defining an anonymous function”.
# I'm assigning to the variable mystery the result
# of the lambda expression
mystery = lambda(x): x + 2
print mystery(3)  # 5
map Function: Takes a function as its first argument, and then a list, and it applies that function to each element of the list in turn, creating a new list.
def mysquare(x): return x*x
print map(mysquare, [1,2,3,4,5])  # [1,4,9,16,25]
or print map(lambda(x): x*x, [1,2,3,4,5])
List Comprehensions:
print [x*x for x in [1,2,3,4,5]]  # [1,4,9,16,25]
print [len(x) for x in ['hello', 'my', 'friends']]  # [5,2,7]
Generators: Filtering data.
def odds_only(numbers):
    for n in numbers:
        if (n % 2) == 1:
            yield n
print [x for x in odds_only([1,2,3,4,5])]  # [1,3,5]
or print [x for x in [1,2,3,4,5] if x % 2 == 1]  # [1,3,5]
Encoding Grammars: A --> B C, ... becomes [('A', ['B', 'C']), ...]
and enumerating strings (a slow way):
def expand(tokens, grammar):
    for pos in range(len(tokens)):
        for rule in grammar:
            if tokens[pos] == rule[0]:
                yield tokens[0:pos] + rule[1] + tokens[pos+1:]
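Enumerating one-step expansions with this generator (expand restated so the example is self-contained):

```python
def expand(tokens, grammar):
    # one-step expansions: replace each non-terminal, one position
    # at a time, by each of its right-hand sides
    for pos in range(len(tokens)):
        for rule in grammar:
            if tokens[pos] == rule[0]:
                yield tokens[0:pos] + rule[1] + tokens[pos+1:]

grammar = [('exp', ['exp', '+', 'exp']),
           ('exp', ['num'])]
print(list(expand(['exp'], grammar)))
# [['exp', '+', 'exp'], ['num']]
```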
Reading Machine Minds 2 (Identifying empty Context-free Grammars):
def cfgempty(grammar, symbol, visited):
    if symbol in visited:
        return None
    elif not any([rule[0] == symbol for rule in grammar]):
        return [symbol]
    else:
        new_visited = visited + [symbol]
        for rhs in [r[1] for r in grammar if r[0] == symbol]:
            if all([None != cfgempty(grammar, r, new_visited) for r in rhs]):
                result = []
                for r in rhs:
                    result += cfgempty(grammar, r, new_visited)
                return result
        return None
Infinite Mind Reading (Identify infinite grammars [ones that accept infinitely many strings]):
def cfginfinite(grammar):
    for Q in [rule[0] for rule in grammar]:
        def helper(current, visited, sizexy):
            if current in visited:
                return sizexy > 0
            else:
                new_visited = visited + [current]
                for rhs in [rule[1] for rule in grammar if rule[0] == current]:
                    for symbol in rhs:
                        if helper(symbol, new_visited, sizexy + len(rhs) - 1):
                            return True
                return False
        if helper(Q, [], 0):
            return True
    return False
Detecting Ambiguity:
def expand(tokens_and_derivation, grammar):
    (tokens, derivation) = tokens_and_derivation
    for token_pos in range(len(tokens)):
        for rule_index in range(len(grammar)):
            rule = grammar[rule_index]
            if tokens[token_pos] == rule[0]:
                yield ((tokens[0:token_pos] + rule[1] + tokens[token_pos+1:]),
                       derivation + [rule_index])
def isambig(grammar, start, utterance):
    enumerated = [([start], [])]
    while True:
        new_enumerated = enumerated
        for u in enumerated:
            for i in expand(u, grammar):
                if not i in new_enumerated:
                    new_enumerated = new_enumerated + [i]
        if new_enumerated != enumerated:
            enumerated = new_enumerated
        else:
            break
    return len([x for x in enumerated if x[0] == utterance]) > 1
UNIT 4 – Parsing
Given a string s and a grammar G, is s in the language of G? Lexical analysis broke the string down into a stream of tokens; syntactic analysis takes that stream of tokens and checks whether it adheres to a context-free grammar. BRUTE FORCE – try all options exhaustively. Memoization is a computer science technique in which we keep a 'chart' or 'record' of previous computations and compute new values in terms of previous answers.
import timeit
t = timeit.Timer(stmt="""
chart = {}
def memofibo(n):
    if n <= 2:
        return 1
    if n-2 not in chart:
        chart[n-2] = memofibo(n-2)
    if n-1 not in chart:
        chart[n-1] = memofibo(n-1)
    return chart[n-1] + chart[n-2]
memofibo(25)""")
print t.timeit(number=100)
t2 = timeit.Timer(stmt="""
def fibo(n):
    if n <= 2:
        return 1
    return fibo(n-1) + fibo(n-2)
fibo(25)""")
print t2.timeit(number=100)
Parsing State:
S --> exp
exp --> exp + exp
exp --> exp - exp
exp --> 1
exp --> 2
input = 1 + 2
exp --> exp + ● exp [everything before the ● has been seen; everything after it has not]
After seeing the whole input:
exp --> 2 ●
exp --> exp + exp ●
S --> exp ●  Parsed!
A PARSING STATE is a rewrite rule from the grammar augmented with 1 ● on the right-hand side.
Memoization in our Parser: parse([t1, t2, ..., tN, ..., tlast])
chart[N] = all parse states we could be in after seeing t1, t2, ..., tN only!
For the grammar exp --> exp + exp, exp --> int, and the input "int + int":

N           chart[N]
0 ()        exp --> ● exp + exp;  exp --> ● int
1 (int)     exp --> int ●;  exp --> exp ● + exp
2 (int +)   exp --> exp + ● exp;  exp --> ● int
3 (int + int)  ...

We must also add a starting position, or "from" position, to our parse states: for example, "exp --> ● int, from 0" in chart[0], but "exp --> ● int, from 2" in chart[2] (seen 2 tokens).
If we can build the chart, we have solved parsing: if the input is T tokens long, and S --> exp ●, from 0, is in chart[T], then the string is in the language.
start: chart[0] contains S --> ● exp, from 0
end:   chart[T] contains S --> exp ●, from 0
MIDDLE?
Making intermediate entries:
S --> exp + ● exp, from j, in chart[i]  # seen i tokens
We are expecting to see an exp, so we find all rules exp --> something in the grammar and bring them in.
Predicting, or Computing the CLOSURE (one way to complete the parsing chart): if chart[i] has X --> a b ● c d, from j, then for all grammar rules c --> p q r we add c --> ● p q r, from i, to chart[i].
Consuming, or Shifting over the Input (another way to complete the parsing chart): if chart[i] has X --> a b ● c d, from j, and c is a terminal, we shift over it: add X --> a b c ● d, from j, to chart[i+1] IF c is the i+1-th input token.
[Diagram: parsing climbs the ladder int + int → exp + int → exp + exp → exp; generating descends it.]
Reduction: X --> a b ● We reduce by applying the rule in reverse: if we have seen "a b blah", it becomes "X blah".
Reduction Walkthrough:
T --> a B a
B --> b b
input = a b b a

N            chart[N]
0 ()         T --> ● a B a, from 0
1 (a)        T --> a ● B a, from 0;  B --> ● b b, from 1
2 (a b)      B --> b ● b, from 1
3 (a b b)    B --> b b ●, from 1;  T --> a B ● a, from 0
4 (a b b a)  T --> a B a ●, from 0
AddtoChart: The chart coded in Python: A dictionary chart where: chart[i] = [P --> ( ● P ) from 0, P --> ● ( ) from 1, ... ]
def addtochart(chart, index, state):
    if state in chart[index]:
        return False
    else:
        chart[index] += [state]
        return True
Encode Grammar: S --> P
P --> ( P )
P -->
grammar = [('S', ['P']),
           ('P', ['(', 'P', ')']),
           ('P', [])]
Encode Parsing States: X --> a b ● c d, from j:
state = ('X', ['a', 'b'], ['c', 'd'], j)
Writing Closure:
def closure(grammar, i, x, ab, cd, j):
    return [(rule[0], [], rule[1], i) for rule in grammar \
            if cd != [] and rule[0] == cd[0]]
Writing Shift:
def shift(tokens, i, x, ab, cd, j):
    if cd != [] and tokens[i] == cd[0]:
        return (x, ab + cd[:1], cd[1:], j)
Writing Reduction:
def reductions(chart, i, x, ab, cd, j):
    return [(state[0], state[1] + [x], state[2][1:], state[3]) for state \
            in chart[j] if cd == [] and state[2] != [] and state[2][0] == x]
Putting It All Together:
def parse(tokens, grammar):
    tokens += ["end_of_input_marker"]
    chart = {}
    start_rule = grammar[0]  # by convention, the first rule in the grammar
    for i in range(len(tokens)+1):
        chart[i] = []
    start_state = (start_rule[0], [], start_rule[1], 0)
    chart[0] = [start_state]
    for i in range(len(tokens)):
        while True:
            changes = False
            for state in chart[i]:
                # state == X --> a b . c d, from j
                x = state[0]
                ab = state[1]
                cd = state[2]
                j = state[3]
                next_states = closure(grammar, i, x, ab, cd, j)
                for next_state in next_states:
                    changes = addtochart(chart, i, next_state) or changes
                next_state = shift(tokens, i, x, ab, cd, j)
                if next_state != None:
                    changes = addtochart(chart, i+1, next_state) or changes
                next_states = reductions(chart, i, x, ab, cd, j)
                for next_state in next_states:
                    changes = addtochart(chart, i, next_state) or changes
            if not changes:
                break
    accepting_state = (start_rule[0], start_rule[1], [], 0)
    return accepting_state in chart[len(tokens)-1]
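The whole chart parser can be exercised end to end. This sketch restates addtochart, closure, shift, reductions, and parse from the notes in compact form (with != in place of the old <> operator, and avoiding mutation of the caller's token list) and checks the balanced-parentheses grammar:

```python
def addtochart(chart, index, state):
    if state in chart[index]:
        return False
    chart[index] += [state]
    return True

def closure(grammar, i, x, ab, cd, j):
    return [(rule[0], [], rule[1], i) for rule in grammar
            if cd != [] and rule[0] == cd[0]]

def shift(tokens, i, x, ab, cd, j):
    if cd != [] and tokens[i] == cd[0]:
        return (x, ab + cd[:1], cd[1:], j)
    return None

def reductions(chart, i, x, ab, cd, j):
    return [(s[0], s[1] + [x], s[2][1:], s[3])
            for s in chart[j]
            if cd == [] and s[2] != [] and s[2][0] == x]

def parse(tokens, grammar):
    tokens = tokens + ["end_of_input_marker"]
    chart = {i: [] for i in range(len(tokens) + 1)}
    start_rule = grammar[0]
    chart[0] = [(start_rule[0], [], start_rule[1], 0)]
    for i in range(len(tokens)):
        while True:
            changes = False
            for state in chart[i]:
                x, ab, cd, j = state
                for ns in closure(grammar, i, x, ab, cd, j):
                    changes = addtochart(chart, i, ns) or changes
                ns = shift(tokens, i, x, ab, cd, j)
                if ns is not None:
                    changes = addtochart(chart, i + 1, ns) or changes
                for ns in reductions(chart, i, x, ab, cd, j):
                    changes = addtochart(chart, i, ns) or changes
            if not changes:
                break
    accepting = (start_rule[0], start_rule[1], [], 0)
    return accepting in chart[len(tokens) - 1]

grammar = [('S', ['P']),
           ('P', ['(', 'P', ')']),
           ('P', [])]
print(parse(['(', '(', ')', ')'], grammar))  # True  (balanced)
print(parse(['(', ')', ')'], grammar))       # False (unbalanced)
```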
Parse Trees: We also need to produce parse trees to get their meaning and interpret HTML and JavaScript programs. The format we are going to use for our parse trees is nested tuples.
def p_exp_number(p):
    'exp : NUMBER'           # the parse rule: lhs "exp", rhs "NUMBER"
    p[0] = ("number", p[1])  # p[0] is the returned parse tree;
                             # p[1], p[2], ... are the rhs parse trees
def p_exp_not(p):
    'exp : NOT exp'          # p[1] is NOT, p[2] is the exp subtree
    p[0] = ("not", p[2])
Parsing Tags:
def p_elt_tag(p):
    'elt : LANGLE WORD tag_args RANGLE html LANGLESLASH WORD RANGLE'
    p[0] = ('tag-element', p[2], p[3], p[5], p[7])
Parsing JavaScript:
def p_exp_binop(p):
    """exp : exp PLUS exp
           | exp MINUS exp
           | exp TIMES exp"""
    p[0] = ('binop', p[1], p[2], p[3])
Setting Associativity and Precedence: the issues that need to be resolved are associativity and precedence.
precedence = (
    # lower precedence
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE'),
    # higher precedence
)
Parsing JavaScript Statements:
import ply.yacc as yacc
import ply.lex as lex
import jstokens              # use our JavaScript lexer
from jstokens import tokens  # use our JavaScript tokens
start = 'js'  # the start symbol in our grammar
def p_js(p):
    'js : element js'
    p[0] = [p[1]] + p[2]
def p_js_empty(p):
    'js : '
    p[0] = []
def p_element_function(p):
    'element : FUNCTION IDENTIFIER LPAREN optparams RPAREN compoundstmt'
    p[0] = ('function', p[2], p[4], p[6])
def p_element_statement(p):
    'element : stmt SEMICOLON'
    p[0] = ('stmt', p[1])
def p_optparams(p):
    'optparams : params'
    p[0] = p[1]
def p_optparams_empty(p):
    'optparams : '
    p[0] = []
def p_params(p):
    'params : IDENTIFIER COMMA params'
    p[0] = [p[1]] + p[3]
def p_params_last(p):
    'params : IDENTIFIER'
    p[0] = [p[1]]
def p_compoundstmt(p):
    'compoundstmt : LBRACE statements RBRACE'
    p[0] = p[2]
def p_statements(p):
    'statements : stmt SEMICOLON statements'
    p[0] = [p[1]] + p[3]
def p_statements_empty(p):
    'statements : '
    p[0] = []
def p_stmt_if_then(p):
    'stmt : IF exp compoundstmt'
    p[0] = ('if-then', p[2], p[3])
def p_stmt_if_then_else(p):
    'stmt : IF exp compoundstmt ELSE compoundstmt'
    p[0] = ('if-then-else', p[2], p[3], p[5])
def p_stmt_assignment(p):
    'stmt : IDENTIFIER EQUAL exp'
    p[0] = ('assign', p[1], p[3])
def p_stmt_return(p):
    'stmt : RETURN exp'
    p[0] = ('return', p[2])
def p_stmt_var(p):
    'stmt : VAR IDENTIFIER EQUAL exp'
    p[0] = ('var', p[2], p[4])
def p_stmt_exp(p):
    'stmt : exp'
    p[0] = ('exp', p[1])
# For now, we will assume that there is only one type of expression.
def p_exp_identifier(p):
    'exp : IDENTIFIER'
    p[0] = ("identifier", p[1])
jslexer = lex.lex(module=jstokens)
jsparser = yacc.yacc()
jslexer.input(input_string)
parse_tree = jsparser.parse(input_string, lexer=jslexer)
print parse_tree
Parsing JavaScript Expressions:
import ply.yacc as yacc
import ply.lex as lex
import jstokens              # use our JavaScript lexer
from jstokens import tokens  # use our JavaScript tokens
start = 'exp'  # we'll start at expression this time
precedence = (
('left', 'OROR'),
('left', 'ANDAND'),
('left', 'EQUALEQUAL'),
('left', 'LT', 'GT', 'LE', 'GE'),
('left', 'PLUS', 'MINUS'),
('left', 'TIMES', 'DIVIDE', 'MOD'),
('right', 'NOT')
)
def p_exp_identifier(p):
    'exp : IDENTIFIER'
    p[0] = ("identifier", p[1])
def p_exp_number(p):
    'exp : NUMBER'
    p[0] = ('number', p[1])
def p_exp_string(p):
    'exp : STRING'
    p[0] = ('string', p[1])
def p_exp_true(p):
    'exp : TRUE'
    p[0] = ('true', p[1])
def p_exp_false(p):
    'exp : FALSE'
    p[0] = ('false', p[1])
def p_exp_not(p):
    'exp : NOT exp'
    p[0] = ('not', p[2])
def p_exp_parens(p):
    'exp : LPAREN exp RPAREN'
    p[0] = p[2]
def p_exp_lambda(p):
    'exp : FUNCTION LPAREN optparams RPAREN compoundstmt'
    p[0] = ("function", p[3], p[5])
def p_exp_binop(p):
    """exp : exp OROR exp
           | exp ANDAND exp
           | exp EQUALEQUAL exp
           | exp MOD exp
           | exp LT exp
           | exp GT exp
           | exp LE exp
           | exp GE exp
           | exp PLUS exp
           | exp MINUS exp
           | exp TIMES exp
           | exp DIVIDE exp"""
    p[0] = ('binop', p[1], p[2], p[3])
def p_exp_call(p):
    'exp : IDENTIFIER LPAREN optargs RPAREN'
    p[0] = ('call', p[1], p[3])
def p_optargs(p):
    'optargs : args'
    p[0] = p[1]
def p_optargs_empty(p):
    'optargs : '
    p[0] = []
def p_args(p):
    'args : exp COMMA args'
    p[0] = [p[1]] + p[3]
def p_args_last(p):
    'args : exp'
    p[0] = [p[1]]
jslexer = lex.lex(module=jstokens)
jsparser = yacc.yacc()
jslexer.input(input_string)
parse_tree = jsparser.parse(input_string, lexer=jslexer)
print parse_tree
Types: Numbers (5, 1, 3.14, 3189, 514) support the operations + - * /.
Strings ("a", "hello", 'world') support len, slicing such as [1:-1], and + (concatenation).
Lists ([ ], [1, 2, 3], ["a", "b"]) support len and + as well.
Each operation has a different meaning for different types of data.
Rendering HTML, e.g.: Nelson Mandela <b> was elected </b> democratically.
becomes the calls graphics.word("Nelson"), graphics.word("Mandela"), graphics.begintag("b", { }), graphics.word("was"), graphics.word("elected"), graphics.endtag(), graphics.word("democratically"),
which displays: Nelson Mandela was elected democratically. [with "was elected" in bold]
UNIT 5 – Interpreting
A bug is just an instance where the program’s meaning is different from its specification. But in practice a lot of the time the mistake is actually with the specification. Regardless of whether the problem is with the source code or the specification, understanding what code means in context is critical to figuring out if it’s right or wrong. Interpreters: An interpreter finds the meaning of a program by traversing its parse tree. String of HTML + JavaScript --> Break it down to words (Lexical
Analysis) --> Parse those into a tree (Syntactic Analysis) --> Walk
that tree and understand it (Semantics or Interpreting).
Syntax vs. Semantics: Lexing and parsing deal with the form of an utterance. We now turn our attention to semantics, the meaning of an utterance. A well-formed sentence in a natural language can be "meaningless" or "hard to interpret". Similarly, a syntactically valid program can lead to a run-time error if we try to apply the wrong sort of operation to the wrong sort of thing (e.g., 3 + "hello"). Semantic Analysis: The process of looking at a program’s source code and trying to see if it’s going to be well-behaved or not is known as type checking or semantic analysis. [One goal of semantic analysis is to notice and rule out bad programs (i.e., programs that will apply the wrong sort of operation to the wrong sort of object). This is often called type checking.] Types: A type is a set of similar objects (e.g., number or string or list) with an associated set of valid operations (e.g., addition or length).
Graphics: Render a webpage. We’ll use a library to do that for us.
Writing an Interpreter: All there is in HTML is word-elements, tag-elements and javascript-elements, and we'll see how to handle the first two here.

import graphics

def interpret(trees): # Hello, friend
    for tree in trees: # Hello,
        # ("word-element","Hello")
        nodetype = tree[0] # "word-element"
        if nodetype == "word-element":
            graphics.word(tree[1])
        elif nodetype == "tag-element":
            # <b>Strong text</b>
            tagname = tree[1] # b
            tagargs = tree[2] # []
            subtrees = tree[3] # ...Strong Text!...
            closetagname = tree[4] # b
            if tagname != closetagname:
                graphics.warning('Tag mismatch!')
            else:
                graphics.begintag(tagname, tagargs)
                interpret(subtrees)
                graphics.endtag()

# Note that graphics.initialize and finalize will only work surrounding a call to interpret
graphics.initialize() # Enables display of output.
interpret([("word-element", "Hello,"), ("tag-element", 'b', [], [('word-element', 'World!')], 'b')])
graphics.finalize()

Arithmetic: For the javascript-elements, we'll need to interpret the code down to a string and then call graphics.word() on that string. However, JavaScript is semantically richer than HTML, so the process of interpretation won't be that simple. We are going to write a recursive procedure to interpret JavaScript arithmetic expressions. The procedure will walk over the parse tree of the expression; this is sometimes called evaluation.
# Write an eval_exp procedure to interpret JavaScript arithmetic expressions.
# Only handle +, - and numbers for now.
def eval_exp(tree):
    # ("number" , "5")
    # ("binop" , ... , "+", ... )
    nodetype = tree[0]
    if nodetype == "number":
        return int(tree[1])
    elif nodetype == "binop":
        left_child = tree[1]
        operator = tree[2]
        right_child = tree[3]
        left_value = eval_exp(left_child)
        right_value = eval_exp(right_child)
        if operator == "+":
            return left_value + right_value
        elif operator == "-":
            return left_value - right_value
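A quick way to exercise the evaluator is to hand it the tree for (1 + 2) - 3; this sketch restates the procedure so the block is self-contained:

```python
# Evaluating the parse tree for (1 + 2) - 3 (sketch of the course procedure):
def eval_exp(tree):
    nodetype = tree[0]
    if nodetype == "number":
        return int(tree[1])
    elif nodetype == "binop":
        left_value = eval_exp(tree[1])
        operator = tree[2]
        right_value = eval_exp(tree[3])
        if operator == "+":
            return left_value + right_value
        elif operator == "-":
            return left_value - right_value

tree = ("binop",
        ("binop", ("number", "1"), "+", ("number", "2")),
        "-",
        ("number", "3"))
print(eval_exp(tree))  # -> 0
```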
Context: We need to know the values of variables (the context) to evaluate an expression. The meaning of x+2 depends on the meaning of x (the current state of x). State: The state of a program execution is a mapping from variable names to values. Evaluating an expression requires us to know the current state. To evaluate x+2, we'll keep around a mapping {'x': 3} (it will get more complicated later). This mapping is called the state. Variable Lookup:

# ("binop", ("identifier","x"), "+", ("number","2"))
def eval_exp(tree, environment):
    nodetype = tree[0]
    if nodetype == "number":
        return int(tree[1])
    elif nodetype == "binop":
        left_value = eval_exp(tree[1], environment)
        operator = tree[2]
        right_value = eval_exp(tree[3], environment)
        if operator == "+":
            return left_value + right_value
        elif operator == "-":
            return left_value - right_value
    elif nodetype == "identifier":
        variable_name = tree[1]
        return env_lookup(environment, variable_name)

Control Flow: Python and JavaScript have conditional statements like if; we say that such statements can change the flow of control through the program. Program elements that can change the flow of control, such as if or while or return, are often called statements. Typically statements contain expressions, but not the other way around. Evaluating Statements:

def eval_stmts(tree, environment):
    stmttype = tree[0]
    if stmttype == "assign":
        # ("assign", "x", ("binop", ..., "+", ...)) <=== x = ... + ...
        variable_name = tree[1]
        right_child = tree[2]
        new_value = eval_exp(right_child, environment)
        env_update(environment, variable_name, new_value)
    elif stmttype == "if-then-else": # if x < 5 then A;B; else C;D;
        conditional_exp = tree[1] # x < 5
        then_stmts = tree[2] # A;B;
        else_stmts = tree[3] # C;D;
        if eval_exp(conditional_exp, environment):
            eval_stmts(then_stmts, environment)
        else:
            eval_stmts(else_stmts, environment)
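A toy round trip through these pieces. This is a sketch, not the course code: the environment is a flat dict, env_lookup/env_update are minimal stand-ins (the chained versions come later), and the branch statement lists are walked explicitly:

```python
def env_lookup(environment, variable_name):
    return environment.get(variable_name)

def env_update(environment, variable_name, new_value):
    environment[variable_name] = new_value

def eval_exp(tree, environment):
    nodetype = tree[0]
    if nodetype == "number":
        return int(tree[1])
    elif nodetype == "binop":
        left_value = eval_exp(tree[1], environment)
        operator = tree[2]
        right_value = eval_exp(tree[3], environment)
        if operator == "+":
            return left_value + right_value
        elif operator == "-":
            return left_value - right_value
    elif nodetype == "identifier":
        return env_lookup(environment, tree[1])

def eval_stmts(tree, environment):
    stmttype = tree[0]
    if stmttype == "assign":
        env_update(environment, tree[1], eval_exp(tree[2], environment))
    elif stmttype == "if-then-else":
        branch = tree[2] if eval_exp(tree[1], environment) else tree[3]
        for stmt in branch:
            eval_stmts(stmt, environment)

env = {"x": 3}
# x = x + 2
eval_stmts(("assign", "x",
            ("binop", ("identifier", "x"), "+", ("number", "2"))), env)
print(env["x"])  # -> 5
# if x then y = 1; else y = 2;   (5 is truthy, so the then-branch runs)
eval_stmts(("if-then-else", ("identifier", "x"),
            [("assign", "y", ("number", "1"))],
            [("assign", "y", ("number", "2"))]), env)
print(env["y"])  # -> 1
```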
def eval_exp(exp, env):
    etype = exp[0]
    if etype == "number":
        return float(exp[1])
    elif etype == "string":
        return exp[1]
    elif etype == "true":
        return True
    elif etype == "false":
        return False
    elif etype == "not":
        return not eval_exp(exp[1], env)

def env_update(env, vname, value):
    env[vname] = value

Scope: We use the term scope to refer to the portion of a program where a variable has a particular value. So the environment CANNOT be a flat mapping {}. Identifiers and Storage Places: Because the value of a variable can change, we will use explicit storage locations to track the current values of variables.
Environments: There is a special global environment that can hold variable values. Other environments have parent pointers to keep track of nesting or scoping. Environments hold storage locations and map variables to values. Chained Environments: The process upon a function call is 1. Create a new environment. Its parent is the current environment. 2. Create storage places in the new environment for each formal parameter. 3. Fill in those places with the values of the actual arguments. 4. Evaluate the function body in the new environment.
Environment diagram (slide): globals such as x = "outside x" and y = "outside y" live in the Global environment. For the example

def myfun(x):
    print x
    print y
myfun(y+5)

with y : 2 in the global environment, the call creates a new environment holding x : 7, with a parent pointer back to the global environment; print y is resolved through that parent.
greeting = "hola"
def makegreeter(greeting):
    def greeter(person):
        print greeting + " " + person
    return greeter
sayhello = makegreeter("hello from uttar pradesh")
sayhello("lucknow") # hello from uttar pradesh lucknow

Environment diagram (slide): the Global environment maps greeting : "hola", makegreeter : ..., sayhello : ...; the makegreeter environment maps greeting : "hello from uttar pradesh" and greeter : ...; the greeter environment maps person : "lucknow", with parent pointers chaining back toward Global.
Environment needs: 1. Map variables to values. 2. Point to the parent environment. So we'll encode an environment as (parent_pointer, dictionary).

def env_lookup(vname, env):
    # env is (parent, dictionary)
    if vname in env[1]:
        return env[1][vname]
    elif env[0] is None:
        return None
    else:
        return env_lookup(vname, env[0])

def env_update(vname, value, env):
    if vname in env[1]:
        env[1][vname] = value
    elif env[0] is not None:
        env_update(vname, value, env[0])
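The parent pointer is what makes lookups in a nested scope see enclosing bindings, and what makes updates land in the scope that defines the variable. A self-contained check (environment contents here are illustrative):

```python
# env is (parent, dictionary), as in the notes above.
def env_lookup(vname, env):
    if vname in env[1]:
        return env[1][vname]
    elif env[0] is None:
        return None
    else:
        return env_lookup(vname, env[0])

def env_update(vname, value, env):
    if vname in env[1]:
        env[1][vname] = value
    elif env[0] is not None:
        env_update(vname, value, env[0])

global_env = (None, {"y": "outside y"})
local_env = (global_env, {"x": "local x"})

print(env_lookup("x", local_env))  # found locally
print(env_lookup("y", local_env))  # found via the parent pointer
env_update("y", "updated y", local_env)
print(global_env[1]["y"])          # the write reached the defining scope
```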
Catching Errors: Modern programming languages use exceptions to notice and handle run-time errors. "try-catch" or "try-except" blocks are the syntax for handling such exceptions.

try:
    print "hello"
    1 / 0
except Exception as problem:
    # only runs if the guarded block raises an error
    print "didn't work"
    print problem
Frames:

# Function calls: new environments, catch return values.
# "return" will throw an exception.
def eval_stmt(tree, environment):
    stmttype = tree[0]
    if stmttype == "call": # ("call", "sqrt", [("number","2")])
        fname = tree[1] # "sqrt"
        args = tree[2] # [ ("number", "2") ]
        fvalue = env_lookup(fname, environment)
        if fvalue[0] == "function":
            # We'll make a promise to ourselves:
            # ("function", params, body, env)
            fparams = fvalue[1] # ["x"]
            fbody = fvalue[2]
            fenv = fvalue[3]
            if len(fparams) != len(args):
                print "ERROR: wrong number of args"
            else:
                new_env = (fenv, dict((fparams[i], \
                    eval_exp(args[i], environment)) for i in range(len(args))))
                try:
                    eval_stmts(fbody, new_env)
                    return None
                except Exception as return_value:
                    return return_value
        else:
            print "ERROR: call to non-function"
    elif stmttype == "return":
        retval = eval_exp(tree[1], environment)
        raise Exception(retval)
    elif stmttype == "exp":
        eval_exp(tree[1], environment)

def env_lookup(vname, env):
    if vname in env[1]:
        return (env[1])[vname]
    elif env[0] is None:
        return None
    else:
        return env_lookup(vname, env[0])

def env_update(vname, value, env):
    if vname in env[1]:
        (env[1])[vname] = value
    elif env[0] is not None:
        env_update(vname, value, env[0])

def eval_exp(exp, env):
    etype = exp[0]
    if etype == "number":
        return float(exp[1])
    elif etype == "binop":
        a = eval_exp(exp[1], env)
        op = exp[2]
        b = eval_exp(exp[3], env)
        if op == "*":
            return a * b
    elif etype == "identifier":
        vname = exp[1]
        value = env_lookup(vname, env)
        if value is None:
            print "ERROR: unbound variable " + vname
        else:
            return value

def eval_stmts(stmts, env):
    for stmt in stmts:
        eval_stmt(stmt, env)

sqrt = ("function", ("x"), (("return", ("binop", ("identifier","x"), \
    "*", ("identifier","x"))),), {})
environment = (None, {"sqrt": sqrt})
print eval_stmt(("call", "sqrt", [("number","2")]), environment)

Function Definitions:
function myfun(x) {
    return x+1;
}

Here myfun is the fname, (x) gives the fparams, and return x+1; is the fbody. A function definition is stored as env[fname] = ("function", fparams, fbody, fenv), where fenv is the environment we were in when the function was defined.

def eval_elt(tree, env):
    elttype = tree[0]
    if elttype == "function":
        fname = tree[1]
        fparams = tree[2]
        fbody = tree[3]
        fvalue = ("function", fparams, fbody, env)
        add_to_env(env, fname, fvalue)
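The "return raises an exception" trick from the frames code above can be isolated in a few lines. The names here (ReturnValue, eval_body, call) are illustrative stand-ins, not course identifiers; the course code raises a plain Exception, while this sketch uses a dedicated exception class:

```python
# "return" unwinds the function body by raising; the call site catches it.
class ReturnValue(Exception):
    def __init__(self, value):
        self.value = value

def eval_body(stmts):
    for stmt in stmts:
        if stmt[0] == "return":
            raise ReturnValue(stmt[1])  # unwind back to the call site
        # ... other statement types would be handled here ...

def call(body_stmts):
    try:
        eval_body(body_stmts)
        return None                     # fell off the end: no return value
    except ReturnValue as r:
        return r.value

print(call([("return", 42)]))  # -> 42
print(call([]))                # -> None
```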
Double Edged Sword: We can simulate JavaScript programs with our interpreter written in Python. That means that anything that can be done in JavaScript could be done in Python as well. It turns out that JavaScript could also simulate Python. So they are equally powerful! (Turing Complete. Turing Machine – a mathematical model of computation.) Natural Language Power: While most computer languages are equivalent (in that any computation that can be done in one can also be done in another), it is debated whether the same is true for natural languages. Infinite Loops: Computer programs can contain infinite loops. A program either terminates (halts) in finite time or loops forever. We would like to tell if a program loops forever or not. It is provably impossible to write a procedure that can definitely tell if every other procedure loops forever or not.
This Sentence is False: If tsif halts, then it loops forever. If tsif loops forever, then it halts. Both cases lead to a contradiction; therefore, halts() cannot exist. Adding a While Loop to the JavaScript Interpreter:

def eval_while(while_stmt, env):
    conditional = while_stmt[1]
    loop_body = while_stmt[2]
    while eval_exp(conditional, env):
        eval_stmts(loop_body, env)

or, equivalently, with recursion:

def eval_while(while_stmt, env):
    conditional_exp = while_stmt[1]
    loop_body = while_stmt[2]
    if eval_exp(conditional_exp, env):
        eval_stmts(loop_body, env)
        eval_while(while_stmt, env)
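The recursive version can be checked on a toy loop. This sketch supplies minimal stand-ins for eval_exp and eval_stmts (flat dict environment, only the operators the example needs):

```python
# Toy check of the recursive eval_while: while (x < 3) { x = x + 1; }
def eval_exp(exp, env):
    if exp[0] == "number":
        return float(exp[1])
    elif exp[0] == "identifier":
        return env[exp[1]]
    elif exp[0] == "binop" and exp[2] == "<":
        return eval_exp(exp[1], env) < eval_exp(exp[3], env)
    elif exp[0] == "binop" and exp[2] == "+":
        return eval_exp(exp[1], env) + eval_exp(exp[3], env)

def eval_stmts(stmts, env):
    for stmt in stmts:
        if stmt[0] == "assign":
            env[stmt[1]] = eval_exp(stmt[2], env)

def eval_while(while_stmt, env):
    conditional_exp = while_stmt[1]
    loop_body = while_stmt[2]
    if eval_exp(conditional_exp, env):
        eval_stmts(loop_body, env)
        eval_while(while_stmt, env)  # loop by recursing on the same node

env = {"x": 0.0}
loop = ("while",
        ("binop", ("identifier", "x"), "<", ("number", "3")),
        [("assign", "x", ("binop", ("identifier", "x"), "+", ("number", "1")))])
eval_while(loop, env)
print(env["x"])  # -> 3.0
```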
UNIT 6 – Building a Web Browser Web Browser Architecture: Our HTML lexer, parser and interpreter will drive the main process; our JavaScript lexer, parser and interpreter will serve as subroutines.
1. Web page is lexed and parsed. 2. HTML interpreter walks the Abstract Syntax Tree, and calls the JavaScript interpreter. 3. JavaScript code calls write(). 4. JavaScript interpreter stores text from write(). 5. HTML interpreter calls graphics library. 6. Final image of web page is created. Fitting Them Together: We change our HTML lexer to recognize embedded JavaScript fragments as single tokens (we treat JavaScript as a single HTML token). We'll pass the contents of those tokens to our JavaScript lexer, parser and interpreter later.
def t_javascript(token):
    # Several backslashes may be unnecessary, but they are there to
    # make sure that the r.e. will be interpreted correctly in any case.
    # This is called defensive programming; it is more commonly invoked
    # when dealing with security or correctness requirements.
    r'\<script\ type=\"text\/javascript\"\>'
    token.lexer.code_start = token.lexer.lexpos
    token.lexer.begin("javascript")

def t_javascript_end(token):
    r'\<\/script\>'
    token.value = token.lexer.lexdata[token.lexer.code_start: \
        token.lexer.lexpos-9]
    token.type = 'JAVASCRIPT'
    token.lexer.lineno += token.value.count('\n')
    token.lexer.begin('INITIAL')
    return token
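The magic number 9 in lexpos-9 is len("</script>"): the lexer position sits just past the closing tag, so backing up 9 characters keeps only the code between the tags. A standalone check of that slicing idea (the example string is illustrative, no PLY needed):

```python
# Why "lexpos - 9": 9 == len("</script>"), so the slice drops the closing tag.
data = '<script type="text/javascript">write(1);</script>'
code_start = data.index('>') + 1        # just past the opening tag
end = len(data) - len('</script>')      # back up 9 characters from the end
print(data[code_start:end])  # -> write(1);
```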
# tsif, the "This Sentence is False" program from the halting-problem
# argument above:
def tsif():
    if halts(tsif):
        x = 0
        while True:
            x = x + 1
    else:
        return 0
Extending our HTML Grammar: We extend our HTML parser to handle our special token representing embedded JavaScript.

def p_element_javascript(p):
    'element : JAVASCRIPT'
    p[0] = ("javascript-element", p[1])
HTML Interpreter on JavaScript Elements:
def interpret(trees):
    for tree in trees:
        treetype = tree[0]
        if treetype == "word-element":
            graphics.word(tree[1])
        elif treetype == "tag-element":
            ...
        elif treetype == "javascript-element":
            jstext = tree[1]
            jslexer = lex.lex(module=jstokens)
            jsparser = yacc.yacc(module=jsgrammar)
            jstree = jsparser.parse(jstext, lexer=jslexer)
            result = jsinterp.interpret(jstree)
            graphics.word(result)
JavaScript Output: A JavaScript program may contain zero, one or many calls to write(). We will use environments to capture the output of a JavaScript program. Assume every call to write appends to the special “javascript output” variable in the global environment.
def interpret(trees):
    global_env = (None, {"javascript output": ""})
    for elt in trees:
        eval_elt(elt, global_env)
    return global_env[1]["javascript output"]
JavaScript Interpreter, Updating Output:
def eval_exp(tree, env):
    exptype = tree[0]
    if exptype == "call":
        fname = tree[1]
        fargs = tree[2]
        fvalue = env_lookup(fname, env)
        if fname == "write":
            argval = eval_exp(fargs[0], env)
            output_sofar = env_lookup("javascript output", env)
            env_update("javascript output", \
                output_sofar + str(argval), env)
            return None
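The write() bookkeeping can be demonstrated on its own. This sketch uses a flat dict in place of the chained (parent, dictionary) environments, which is enough to show the output string accumulating:

```python
# write() appends to the special "javascript output" variable.
def eval_exp(tree, env):
    if tree[0] == "number":
        return float(tree[1])
    elif tree[0] == "call" and tree[1] == "write":
        argval = eval_exp(tree[2][0], env)
        env["javascript output"] = env["javascript output"] + str(argval)
        return None

global_env = {"javascript output": ""}
eval_exp(("call", "write", [("number", "1")]), global_env)
eval_exp(("call", "write", [("number", "2")]), global_env)
print(global_env["javascript output"])  # -> 1.02.0
```

Note the 1.0 and 2.0: numbers are stored as floats (as in the interpreter's "number" case), so str() renders them with a decimal point.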
Debugging: A good test case gives us confidence that a program implementation adheres to its specification. In this situation, a good test case reveals a bug. Testing: We use testing to gain confidence that an implementation (a program) adheres to its specification (the task at hand). If a program accepts an infinite set of inputs, testing alone cannot prove the program's correctness. Software maintenance (i.e., testing, debugging, refactoring) carries a huge cost. Testing In Depth: When developing a project, there are two ways to go: either plan and reason about the implementation in advance and then write the code with high confidence that it will be free of bugs, or, because of (time) constraints, just implement it and then test the implementation. To test the implementation, we develop test cases (code that uses the program we would like to test). If we observe a bug, we comment out lines of the test file (fault localization), going back and forth with commenting/uncommenting to see if it still "breaks", until we pinpoint the bug. Anonymous Functions in our JavaScript Interpreter:

def eval_exp(tree, env):
    exptype = tree[0]
    if exptype == "function":
        # function(x,y) { return x+y; }
        fparams = tree[1]
        fbody = tree[2]
        # For an anonymous function, we don't add it to the
        # environment unless the user assigns it.
        return ("function", fparams, fbody, env)
Optimization: An optimization improves the performance of a program while retaining its meaning (i.e., without changing the output). Implementing Optimizations: 1. Think of optimizations (e.g., x = x + 0, x = x * 1). 2. Transform the parse tree (directly). Note: Replacing an expensive multiplication with a cheaper addition is an instance of strength reduction. Optimization Timing: In this class we will (optionally) perform optimization after parsing but before interpreting. Our optimizer takes a parse tree as input and returns a (simpler) parse tree as output.

Program Text --Lexing--> Tokens --Parsing--> Tree --Optimization (optional)--> Tree (simpler) --Interpreting--> Result (meaning)
def optimize(tree):
    etype = tree[0]
    if etype == "binop":
        a = tree[1]
        op = tree[2]
        b = tree[3]
        if op == "*" and b == ("number", "1"):
            return a
        if op == "*" and b == ("number", "0"):
            return ("number", "0") # or return b
        if op == "+" and b == ("number", "0"):
            return a
    return tree
Rebuilding The Parse Tree: We desire an optimizer that is recursive. We should optimize the child nodes of a parse tree before optimizing the parent nodes. 1. Recursive calls, 2. Look for patterns, 3. Done.

def optimize(tree): # Expression trees only
    etype = tree[0]
    if etype == "binop":
        a = optimize(tree[1])
        op = tree[2]
        b = optimize(tree[3])
        if op == "*" and b == ("number", "1"):
            return a
        elif op == "*" and b == ("number", "0"):
            return ("number", "0")
        elif op == "+" and b == ("number", "0"):
            return a
        return ("binop", a, op, b) # keep the optimized children
    return tree

Wrap Up:
The parse tree for 5 * 1, ("binop", ("number", "5"), "*", ("number", "1")), optimizes to ("number", "5"). The course pipeline in review:

Lexing – regular expressions, finite state machines
Parsing – context-free grammars, dynamic programming / parse trees
Optimizing – must retain meaning
Interpreting – walks the A.S.T. recursively
Debugging – gain confidence
HTML embedded in JavaScript embedded in HTML embedded in JavaScript...:

import ply.lex as lex
import ply.yacc as yacc
import graphics as graphics
import jstokens
import jsgrammar
import jsinterp
import htmltokens
import htmlgrammar

htmllexer = lex.lex(module=htmltokens)
htmlparser = yacc.yacc(module=htmlgrammar, tabmodule="parsetabhtml")
jslexer = lex.lex(module=jstokens)
jsparser = yacc.yacc(module=jsgrammar, tabmodule="parsetabjs")

def interpret(ast):
    for node in ast:
        nodetype = node[0]
        if nodetype == "word-element":
            graphics.word(node[1])
        elif nodetype == "tag-element":
            tagname = node[1]
            tagargs = node[2]
            subast = node[3]
            closetagname = node[4]
            if tagname != closetagname:
                graphics.warning("(mismatched " + \
                    tagname + " " + closetagname + ")")
            else:
                graphics.begintag(tagname, tagargs)
                interpret(subast)
                graphics.endtag()
        elif nodetype == "javascript-element":
            jstext = node[1]
            jsast = jsparser.parse(jstext, lexer=jslexer)
            result = jsinterp.interpret(jsast)
            htmlast = htmlparser.parse(result, lexer=htmllexer)
            interpret(htmlast)

webpage = """ ... """
htmlast = htmlparser.parse(webpage, lexer=htmllexer)
graphics.initialize()
interpret(htmlast)
graphics.finalize()
Bending Numbers: # Write a procedure optimize(exp) that takes a JavaScript expression AST
# node and returns a new, simplified JavaScript expression AST. You must
# handle:
#
# X * 1 == 1 * X == X for all X
# X * 0 == 0 * X == 0 for all X
# X + 0 == 0 + X == X for all X
# X - X == 0 for all X
#
# and constant folding for +, - and * (e.g., replace 1+2 with 3)
def optimize(exp):
    etype = exp[0]
    if etype == "binop":
        a = optimize(exp[1])
        op = exp[2]
        b = optimize(exp[3])
        if op == "+" and a == ("number", 0):
            return b
        elif op == "+" and b == ("number", 0):
            return a
        if op == "*" and a == ("number", 1):
            return b
        elif op == "*" and b == ("number", 1):
            return a
        if op == "*" and (a == ("number", 0) or b == ("number", 0)):
            return ("number", 0)
        if op == "-" and a == b:
            return ("number", 0)
        if a[0] == b[0] == "number":
            if op == "+":
                return ("number", a[1] + b[1])
            if op == "-":
                return ("number", a[1] - b[1])
            if op == "*":
                return ("number", a[1] * b[1])
        return ("binop", a, op, b)
    return exp
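Because children are optimized first, the rules compose: in the example below, constant folding collapses 1 + 2 and the X - X rule collapses x - x, after which the multiply-by-zero rule finishes the job. (Number payloads are ints here, matching this homework's encoding.)

```python
def optimize(exp):
    if exp[0] == "binop":
        a = optimize(exp[1])
        op = exp[2]
        b = optimize(exp[3])
        if op == "+" and a == ("number", 0): return b
        if op == "+" and b == ("number", 0): return a
        if op == "*" and a == ("number", 1): return b
        if op == "*" and b == ("number", 1): return a
        if op == "*" and (a == ("number", 0) or b == ("number", 0)):
            return ("number", 0)
        if op == "-" and a == b:
            return ("number", 0)
        if a[0] == b[0] == "number":
            if op == "+": return ("number", a[1] + b[1])
            if op == "-": return ("number", a[1] - b[1])
            if op == "*": return ("number", a[1] * b[1])
        return ("binop", a, op, b)
    return exp

# (1 + 2) * (x - x)  simplifies all the way down to 0:
tree = ("binop",
        ("binop", ("number", 1), "+", ("number", 2)),
        "*",
        ("binop", ("identifier", "x"), "-", ("identifier", "x")))
print(optimize(tree))  # -> ('number', 0)
```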
The Living and the Dead: # Those lines can be safely removed because they do not compute a value
# that is used later. We say that a variable is LIVE if the value it holds
# may be needed in the future. More formally, a variable is LIVE if its
# value may be read before the next time it is overwritten. Whether or not
# a variable is LIVE depends on where you are looking in the program, so
# most formally we say a variable is live at some point P if it may be read
# before being overwritten after P.
# function myfun(a,b,c,d) {
# a = 1;
# # LIVE: nothing
# b = 2;
# # LIVE: b
# c = 3;
# # LIVE: c, b
# d = 4;
# # LIVE: c, b
# a = 5;
# # LIVE: a, c, b
# d = c + b;
# # LIVE: a, d
# return (a + d);
# }
#
# Once we know which variables are LIVE, we can now remove assignments to
# variables that will never be read later. Such assignments are called DEAD
# code. Formally, given an assignment statement "X = ...", if "X" is not
# live after that statement, the whole statement can be removed.
# In this assignment, you will write an optimizer that removes dead code.
# For simplicity, we will only consider sequences of assignment statements
# (once we can optimize those, we could weave together a bigger optimizer
# that handles both branches of if statements, and so on, but we'll just do
# simple lists of assignments for now).
# Write a procedure removedead(fragment,returned). "fragment" is encoded
# as above. "returned" is a list of variables returned at the end of the
# fragment (and thus LIVE at the end of it).
#
# Hint 1: One way to reverse a list is [::-1]
# >>> [1,2,3][::-1]
# [3, 2, 1]
def removedead(fragment, returned):
    old_fragment = fragment
    new_fragment = []
    live = returned
    for stmt in fragment[::-1]:
        if stmt[0] in live:
            new_fragment = [stmt] + new_fragment
            live = [x for x in live if x != stmt[0]]
            live = live + stmt[1]
    if new_fragment == old_fragment:
        return new_fragment
    else:
        return removedead(new_fragment, returned)
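Running this on the myfun example from the comments checks out. One assumption, inferred from how the code indexes stmt[0] and stmt[1]: each statement is encoded as (assigned_variable, [variables_read]).

```python
def removedead(fragment, returned):
    old_fragment = fragment
    new_fragment = []
    live = returned                      # live at the end of the fragment
    for stmt in fragment[::-1]:          # walk backwards, tracking liveness
        if stmt[0] in live:
            new_fragment = [stmt] + new_fragment
            live = [x for x in live if x != stmt[0]]
            live = live + stmt[1]
    if new_fragment == old_fragment:     # fixpoint: nothing more to remove
        return new_fragment
    else:
        return removedead(new_fragment, returned)

# myfun's body, encoded as (assigned_variable, [variables_read]):
fragment = [("a", []),          # a = 1;     dead: overwritten before any read
            ("b", []),          # b = 2;
            ("c", []),          # c = 3;
            ("d", []),          # d = 4;     dead: overwritten by d = c + b
            ("a", []),          # a = 5;
            ("d", ["c", "b"])]  # d = c + b;
print(removedead(fragment, ["a", "d"]))  # a = 1 and d = 4 are gone
```

The recursion matters: removing one dead assignment can expose another, so the procedure repeats until a pass changes nothing.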
Find all the Subsets of a Set:

def all_subsets(lst):
    pset = [[]]
    for elem in lst:
        pset += [x + [elem] for x in pset]
    return pset
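Each element doubles the set of subsets (every existing subset either gains the element or doesn't), giving 2^n subsets in total:

```python
def all_subsets(lst):
    pset = [[]]                               # start from the empty subset
    for elem in lst:
        pset += [x + [elem] for x in pset]    # copies of everything so far, plus elem
    return pset

print(all_subsets([1, 2]))  # -> [[], [1], [2], [1, 2]]
print(len(all_subsets([1, 2, 3, 4, 5])))  # -> 32, i.e., 2**5
```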
UNIT 7 – Wrap Up Review: A language is a set of strings. Regular Expressions – concise notation for specifying some sets of strings (regular languages). Finite State Machines – pictorial representation + way to implement regular expressions (deterministic or not). Context-Free Grammars – concise notation for specifying some sets of strings (context-free languages). Memoization (also called Dynamic Programming) – keep previous results in a chart to save computation. Lexing – break a big string up into a list of tokens (words) (specified using r.e.). Parsing – determine if a list of tokens is in the language of a CFG. If so, produce a Parse Tree. Type – a type is a set of values and associated safe operations. Semantics (Meaning) – a program may have type errors (or other exceptions) or it may produce a value.
Optimization – replace a program with another that has the same semantics (but uses fewer resources). Interpretation – a recursive walk over the (optimized) parse tree; the meaning of a program is computed from the meanings of its subexpressions. Web Browser – lex and parse HTML, treating JS as a special token. The HTML interpreter calls the JS interpreter, which returns a string; the HTML interpreter then calls the graphics library to display the result. Security: Computing in the presence of an adversary.
This file is not offered officially by Udacity.com. This material was created by a student as personal notes, while attending the lectures of the course CS262: Programming Languages. This material is offered freely. Lamprianidis Nick Last Edited: 07/27/2012