Methods in Computational Linguistics II Queens College Lecture 5: List Comprehensions.
Methods in Computational Linguistics II
Queens College
Lecture 5: List Comprehensions
Split into words
• sent = "That isn't the problem, Bob."
• sent.split()
• vs.
• nltk.word_tokenize(sent)
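To make the contrast concrete, here is a minimal sketch that runs without NLTK installed. The regular expression is only a rough stand-in chosen for this illustration; it is not the algorithm behind nltk.word_tokenize, which (among other differences) typically splits contractions such as "isn't" into two tokens.

```python
import re

sent = "That isn't the problem, Bob."

# Whitespace splitting leaves punctuation glued to the neighbouring word.
whitespace_tokens = sent.split()

# Hypothetical stand-in for a real tokenizer (NOT NLTK's algorithm):
# keep words, optionally with one internal apostrophe, and emit every
# other non-space character as its own token.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")
regex_tokens = TOKEN_PATTERN.findall(sent)

print(whitespace_tokens)  # ['That', "isn't", 'the', 'problem,', 'Bob.']
print(regex_tokens)       # ['That', "isn't", 'the', 'problem', ',', 'Bob', '.']
```

Note that split() yields 'problem,' and 'Bob.' as single tokens, which is usually not what a later processing step wants.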
List Comprehensions
• Compact way to process every item in a list.
[x for x in array]
dest = []
for x in array:
    dest.append(x)
Methods
• Functions and methods can be applied to the iterating variable, x.
• The value each call returns is stored in the resulting list.
[len(x) for x in array]
dest = []
for x in array:
    dest.append(len(x))
Conditionals
• Elements of the original list can be left out of the resulting list by adding an if condition.
[x for x in array if len(x) == 3]
dest = []
for x in array:
    if len(x) == 3:
        dest.append(x)
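The three patterns seen so far (copy, transform, filter) can be tried on a concrete list. The word list below is made up for this sketch.

```python
# Hypothetical sample data for illustration.
array = ["the", "cat", "sat", "on", "a", "mat", "today"]

copied = [x for x in array]                        # copy every item
lengths = [len(x) for x in array]                  # transform every item
three_letter = [x for x in array if len(x) == 3]   # keep only some items

# Loop equivalent of the filtered comprehension.
dest = []
for x in array:
    if len(x) == 3:
        dest.append(x)

print(lengths)        # [3, 3, 3, 2, 1, 3, 5]
print(three_letter)   # ['the', 'cat', 'sat', 'mat']
```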
Building up
• These can be combined to build up complicated lists.
[x.upper() for x in array if len(x) > 3 and x.startswith('t')]
dest = []
for x in array:
    if len(x) > 3 and x.startswith('t'):
        dest.append(x.upper())
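As a quick check that the comprehension and the loop agree, the sketch below runs both on an invented word list.

```python
# Hypothetical sample data for illustration.
array = ["the", "tree", "is", "taller", "than", "that", "house"]

# Transform (upper) and filter (length and prefix) in one expression.
result = [x.upper() for x in array if len(x) > 3 and x.startswith("t")]

# The same logic written as an explicit loop.
dest = []
for x in array:
    if len(x) > 3 and x.startswith("t"):
        dest.append(x.upper())

print(result)  # ['TREE', 'TALLER', 'THAN', 'THAT']
```

"the" is dropped because it has only three letters, and "house" because it does not start with "t".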
Lists Containing Lists
• Lists can contain lists
• [[a, 1], [b, 2], [d, 4]]
• ...or tuples
• [(a, 1), (b, 2), (d, 4)]
• [[d, d*d] for d in array if d < 4]
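A short sketch of the last comprehension, with an invented input list, shows what the nested result looks like and how the pairs can be unpacked again.

```python
# Hypothetical sample data for illustration.
array = [1, 2, 3, 4, 5]

pairs_as_lists = [[d, d * d] for d in array if d < 4]
pairs_as_tuples = [(d, d * d) for d in array if d < 4]

# Each inner pair can be unpacked directly in a for loop.
squares = []
for d, square in pairs_as_tuples:
    squares.append(square)

# A list of 2-tuples converts directly into a dictionary.
lookup = dict(pairs_as_tuples)

print(pairs_as_lists)   # [[1, 1], [2, 4], [3, 9]]
print(pairs_as_tuples)  # [(1, 1), (2, 4), (3, 9)]
print(lookup[3])        # 9
```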
Using multiple lists
• Multiple lists can be processed in a single list comprehension; each for clause nests inside the one before it, so every x is paired with every y.
• [x*y for x in array1 for y in array2]
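The sketch below, using two invented lists, spells out the nested loops that a two-for comprehension corresponds to, and contrasts it with element-by-element pairing using zip.

```python
# Hypothetical sample data for illustration.
array1 = [1, 2, 3]
array2 = [10, 20]

# Two for clauses: every x is combined with every y.
all_products = [x * y for x in array1 for y in array2]

# Equivalent nested loops; the first for clause is the outer loop.
dest = []
for x in array1:
    for y in array2:
        dest.append(x * y)

# For element-by-element pairing, iterate over zip instead.
pairwise = [x * y for x, y in zip(array1, array2)]

print(all_products)  # [10, 20, 20, 40, 30, 60]
print(pairwise)      # [10, 40]
```

The result has len(array1) * len(array2) items, which can grow quickly for long lists.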
List Comprehension Exercises
Make a list of the first ten multiples of ten (10, 20, 30... 90, 100) using a list comprehension.
Make a list of the first ten cubes (1, 8, 27... 1000) using a list comprehension.
Store five names in a list. Make a second list that adds the phrase "is awesome!" to each name, using a list comprehension.
Write out the following code without using a list comprehension:
plus_thirteen = [number + 13 for number in range(1,11)]
Exercises from: http://introtopython.org/all_exercises_challenges.html#ex_ch_12
Lists within lists are often called 2-d arrays
• This is another way we store tables.
• Similar to nested dictionaries.
• a = [[0,1], [1,0]]
• a[1][1]
• a[0][0]
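A minimal sketch of 2-d indexing: the first index picks a row (an inner list) and the second picks an item within it. The multiplication table and the dictionary version are additions for illustration.

```python
a = [[0, 1], [1, 0]]

# First index selects the row, second index selects the column.
print(a[1][1])  # 0
print(a[0][0])  # 0
print(a[0][1])  # 1

# A nested list comprehension builds a table: 3 rows by 3 columns.
table = [[row * col for col in range(1, 4)] for row in range(1, 4)]
print(table)        # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
print(table[2][1])  # 6

# The same 2x2 table stored as nested dictionaries.
d = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
print(d[0][1])  # 1
```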
Numpy & Arrays
• Numpy is a commonly used package for numerical calculations in Python.
• Its main object is a multidimensional array.
• A[1] List
• A[1][2] ‘Rectangular’ 2-d Matrix
• A[1][2][3] ‘Cube/Prism’ 3-d Matrix
• A[1][2][3][4] 4-d Matrix
• etc.
Numpy arrays
from numpy import *
a = array([1,2,3,4])
a = array([[1,2], [3,4]])   # a 2-d array takes one nested list, not two separate arguments
a.ndim     # number of dimensions
a.shape    # length of each dimension
a.size     # total number of elements
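The following sketch prints these three attributes for a 1-d and a 2-d array. It assumes NumPy is installed and uses the common `import numpy as np` form rather than the star import on the slide.

```python
import numpy as np

a = np.array([1, 2, 3, 4])          # 1-d: built from a flat list
b = np.array([[1, 2], [3, 4]])      # 2-d: built from a list of lists

print(a.ndim, a.shape, a.size)  # 1 (4,) 4
print(b.ndim, b.shape, b.size)  # 2 (2, 2) 4

# 2-d arrays accept list-style indexing and a single tuple-style index.
print(b[1][0], b[1, 0])  # 3 3
```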
numpy array initialization
>>> zeros( (3,4) )
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> ones( (2,3,4), dtype=int16 )
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)
>>> empty( (2,3) )
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
       [ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])
Content Types
• arrays are homogeneous (ndarray)
  – array([1, 3, 4], dtype=int16)
• lists are not homogeneous
  – ['abc', 123, [list1, list2]]
• dtype describes the “type” of object in the array
  – str, tuple, int, etc.
  – numpy.int16, numpy.int32, numpy.float64 etc.
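A small sketch of what homogeneity means in practice, assuming NumPy is installed and imported as np: when the inputs mix integers and floats, the whole array takes one common dtype, whereas a list keeps each item's own type.

```python
import numpy as np

ints = np.array([1, 3, 4], dtype=np.int16)   # dtype requested explicitly
mixed = np.array([1, 2.5, 4])                # the integers are stored as floats

# A plain list happily holds items of different types.
mixed_list = ["abc", 123, [1, 2]]

print(ints.dtype)    # int16
print(mixed.dtype)   # float64
print([type(item).__name__ for item in mixed_list])  # ['str', 'int', 'list']
```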
zip
• zip allows you to “zip” two lists together, creating a list of tuples
• names = ['Andrew', 'Beth', 'Charles']
• ages = [35, 34, 33]
• name_age = list(zip(names, ages))   # list() is needed in Python 3, where zip returns an iterator
  – [('Andrew', 35), ('Beth', 34), ('Charles', 33)]
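A few common follow-ups to zip, sketched with the same names and ages: building a dictionary, undoing the zip, and what happens when the lists differ in length.

```python
names = ["Andrew", "Beth", "Charles"]
ages = [35, 34, 33]

# In Python 3, zip returns an iterator, so wrap it in list() to see the tuples.
name_age = list(zip(names, ages))

# A sequence of pairs converts directly into a dictionary.
age_by_name = dict(zip(names, ages))

# zip(*pairs) reverses the operation, giving back tuples of names and ages.
unzipped_names, unzipped_ages = zip(*name_age)

# zip stops at the end of the shortest input.
truncated = list(zip(names, [35, 34]))

print(name_age)             # [('Andrew', 35), ('Beth', 34), ('Charles', 33)]
print(age_by_name["Beth"])  # 34
print(truncated)            # [('Andrew', 35), ('Beth', 34)]
```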
foreach vs. indexed for loops
“More pythonic”
for n, a in zip(names, ages):
    print("%s -- %s" % (n, a))
vs.
for i in range(len(names)):
    print("%s -- %s" % (names[i], ages[i]))
map
• map allows you to apply the same function to every object in a list.
a = ['1', '2', '4']
list(map(int, a))   # [1, 2, 4]; list() is needed in Python 3, where map returns an iterator
map
Any function can be ‘map’ed over a list, but each element of the list needs to be a valid argument to that function.
def uppercase(s):
    return s.upper()
a = ['abc', 'def', 'ghi']
list(map(uppercase, a))
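The sketch below shows map next to the equivalent list comprehension, and what happens when one element is not a valid argument for the mapped function.

```python
def uppercase(s):
    return s.upper()

a = ["abc", "def", "ghi"]

mapped = list(map(uppercase, a))            # apply uppercase to every item
comprehended = [uppercase(s) for s in a]    # same result as a comprehension
numbers = list(map(int, ["1", "2", "4"]))   # built-in functions work too

# "two" is not a valid argument to int, so the mapping fails on that item.
try:
    list(map(int, ["1", "two"]))
    failure = None
except ValueError as error:
    failure = str(error)

print(mapped)    # ['ABC', 'DEF', 'GHI']
print(numbers)   # [1, 2, 4]
print(failure)
```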
Functions as objects
• A function name can be assigned to a variable.
• map is an example of this, where the first argument to map is a function object.
a = [1, 3, 4]
len(a)
sum(a)
functions = [len, sum]
for fn in functions:
    print("%s %s" % (fn, fn(a)))
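Because functions are ordinary objects, they can also be stored in dictionaries and passed to other functions. The helper names below (apply_twice, add_one) are invented for this sketch.

```python
a = [1, 3, 4]

# Binding a function to a new name: no parentheses, so nothing is called yet.
measure = len
print(measure(a))  # 3

# Functions stored as dictionary values, looked up by a label.
functions = {"len": len, "sum": sum, "max": max}
results = {name: fn(a) for name, fn in functions.items()}
print(results)  # {'len': 3, 'sum': 8, 'max': 4}


def add_one(n):
    return n + 1


def apply_twice(fn, value):
    """Call fn on value, then call fn again on the result."""
    return fn(fn(value))


print(apply_twice(add_one, 5))  # 7
```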
lambda
• Lambda functions are single-use functions that do not need to be ‘def’ed.
• Using the uppercase example again:
def uppercase(s):
    return s.upper()
a = ['abc', 'def', 'ghi']
list(map(uppercase, a))
lambda
• Lambda functions are single-use functions that do not need to be ‘def’ed.
• These are “anonymous” functions
• Using the uppercase example again:
a = ['abc', 'def', 'ghi']
list(map(lambda s: s.upper(), a))
By design, the body of a lambda is a single expression; it cannot contain statements.
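A short sketch of where lambdas tend to appear: as a throwaway argument to map, or as the key function for sorted. The word list is invented for illustration.

```python
a = ["abc", "def", "ghi"]

# The lambda replaces the separately defined uppercase function.
upper = list(map(lambda s: s.upper(), a))

# Lambdas are often used as sort keys: here, sort words by their length.
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda w: len(w))

# A lambda is still an ordinary function object and can be given a name.
square = lambda n: n * n

print(upper)       # ['ABC', 'DEF', 'GHI']
print(by_length)   # ['fig', 'pear', 'banana']
print(square(4))   # 16
```

If the logic needs more than one expression, a regular def is the better fit.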
Aside: Glob
• Construct a list of all filenames matching a pattern.
from glob import glob
glob('*.txt')
glob('/Users/andrew/Documents/*/*.ppt')
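To try glob without depending on files that happen to exist, this sketch creates a few throwaway files in a temporary directory first. The file names are invented for illustration.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as folder:
    # Create three small files, two of which end in .txt.
    for name in ["a.txt", "b.txt", "notes.md"]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("example\n")

    # glob returns full paths in no guaranteed order, so sort them.
    matches = sorted(glob(os.path.join(folder, "*.txt")))
    basenames = [os.path.basename(path) for path in matches]

print(basenames)  # ['a.txt', 'b.txt']
```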
Linguistic Annotation
• Text only takes us so far.
• People are reliable judges of linguistic behavior.
• We can model with machines, but for “gold-standard” truth, we ask people to make judgments about linguistic qualities.
Example Linguistic Annotations
• Sentence Boundaries
• Part of Speech Tags
• Phonetic Transcription
• Syntactic parse trees
• Speaker Identity
• Semantic Role
• Speech Act
• Document Topic
• Argument structure
• Word Sense
• many many many more
We need…
• Techniques to process these.
• Every corpus has its own format for linguistic annotation.
• so…we need to parse annotation formats.
Constructing a linguistic corpus
• Decisions that need to be made:
  – Why are you doing this?
  – What material will be collected?
  – How will it be collected?
    • Automatically?
    • Manually?
    • Found material vs. laboratory language?
  – What meta information will be stored?
  – What manual annotations are required?
    • How will each annotation be defined?
    • How many annotators will be used?
    • How will agreement be assessed?
    • How will disagreements be resolved?
  – How will the material be disseminated?
    • Is this covered by your IRB if the material is the result of a human subject protocol?
Part of Speech Tagging
• Task: Given a string of words, identify the part of speech of each word.
Part of Speech tagging
• Surface level syntax.
• Primary operation
  – Parsing
  – Word Sense Disambiguation
  – Semantic Role labeling
  – Segmentation
    • Discourse, Topic, Sentence
How is it done?
• Learn from Data.
• Annotated Data:
• Unlabeled Data:
Learn the association from Tag to Word
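One very simple way to learn such an association from annotated data is to count, for each word, how often it carries each tag, and then always predict the most frequent one. The sketch below is an assumption for illustration, not necessarily the method the lecture had in mind; the toy corpus, the function names, and the fallback tag "NN" are all invented.

```python
from collections import Counter, defaultdict

# Hypothetical annotated data: (word, tag) pairs.
tagged = [
    ("The", "DET"), ("dog", "NN"), ("is", "VB"), ("fast", "JJ"), (".", "."),
    ("A", "DET"), ("fast", "JJ"), ("dog", "NN"), ("runs", "VB"), (".", "."),
    ("The", "DET"), ("fast", "NN"), ("ended", "VB"), (".", "."),
]

# Build the table: for every word, count how often each tag was seen.
counts = defaultdict(Counter)
for word, tag in tagged:
    counts[word.lower()][tag] += 1


def tag_word(word, default="NN"):
    """Return the most frequent tag for word, or default if it was never seen."""
    seen = counts.get(word.lower())
    if not seen:
        return default
    return seen.most_common(1)[0][0]


def tag_sentence(words):
    return [(word, tag_word(word)) for word in words]


print(tag_sentence(["The", "fast", "ended", "."]))
# [('The', 'DET'), ('fast', 'JJ'), ('ended', 'VB'), ('.', '.')]
print(tag_word("cat"))  # NN
```

In the first call "fast" is a noun but gets tagged JJ, because the table only knows the word's most common tag; "cat" was never seen, so it receives the fallback.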
Limitations
• Unseen tokens
• Uncommon interpretations
• Long-term dependencies
Format conversion exercise
The/DET Dog/NN is/VB fast/JJ ./.
<word ortho="The" pos="DET"></word>
<word ortho="Dog" pos="NN"></word>
<word ortho="is" pos="VB"></word>
<word ortho="fast" pos="JJ"></word>
<word ortho="." pos="."></word>
The dog is fast.
1, 3, DET
5, 7, NN
9, 10, VB
12, 15, JJ
16, 16, .
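One possible approach to the exercise is sketched below: read the slash-tagged line into (word, tag) pairs, then write those pairs out in the XML form and as character offsets into the plain text. The function names are invented, and the offsets are assumed to be 1-based and inclusive, which is what the numbers on the slide suggest.

```python
from xml.sax.saxutils import quoteattr


def parse_slash_tagged(line):
    """Split 'The/DET Dog/NN' into [('The', 'DET'), ('Dog', 'NN')]."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so './.' becomes ('.', '.').
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def to_xml(pairs):
    """Render each pair as a <word> element; quoteattr adds and escapes quotes."""
    return ["<word ortho=%s pos=%s></word>" % (quoteattr(word), quoteattr(tag))
            for word, tag in pairs]


def to_standoff(text, pairs):
    """Return (start, end, tag) rows with 1-based, inclusive character offsets."""
    rows = []
    cursor = 0
    lowered = text.lower()
    for word, tag in pairs:
        # Case-insensitive search, because 'Dog' is tagged but the text has 'dog'.
        start = lowered.find(word.lower(), cursor)
        if start < 0:
            raise ValueError("word not found in text: %r" % word)
        end = start + len(word)
        rows.append((start + 1, end, tag))
        cursor = end
    return rows


pairs = parse_slash_tagged("The/DET Dog/NN is/VB fast/JJ ./.")
for line in to_xml(pairs):
    print(line)
for row in to_standoff("The dog is fast.", pairs):
    print("%d, %d, %s" % row)
```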
Parsing
• Generate a parse tree.
Parsing
• Generate a Parse Tree from:
  – The surface form (words) of the text
  – Part of Speech Tokens
Parsing Styles
Parsing styles
Context Free Grammars for Parsing
• S → VP
• S → NP VP
• NP → Det Nom
• Nom → Noun
• Nom → Adj Nom
• VP → Verb Nom
• Det → “A”, “The”
• Noun → “I”, “John”, “Address”
• Verb → “Gave”
• Adj → “My”, “Blue”
• Adv → “Quickly”
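The grammar above is small enough to parse with a naive top-down search. The sketch below is one way to do it in plain Python; the data layout and function names are invented, and it only works because no rule is left-recursive. Words are matched case-insensitively.

```python
# The rules from the slide: each symbol maps to its possible right-hand sides.
GRAMMAR = {
    "S": [["VP"], ["NP", "VP"]],
    "NP": [["Det", "Nom"]],
    "Nom": [["Noun"], ["Adj", "Nom"]],
    "VP": [["Verb", "Nom"]],
}
LEXICON = {
    "Det": {"a", "the"},
    "Noun": {"i", "john", "address"},
    "Verb": {"gave"},
    "Adj": {"my", "blue"},
    "Adv": {"quickly"},
}


def parse(symbol, words, start):
    """Yield (tree, end) for every way symbol can cover words[start:end]."""
    if symbol in LEXICON:
        if start < len(words) and words[start].lower() in LEXICON[symbol]:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in match_sequence(rhs, words, start):
            yield (symbol,) + tuple(children), end


def match_sequence(symbols, words, start):
    """Yield (trees, end) for every way the symbol sequence matches from start."""
    if not symbols:
        yield [], start
        return
    for tree, middle in parse(symbols[0], words, start):
        for rest, end in match_sequence(symbols[1:], words, middle):
            yield [tree] + rest, end


def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


print(parse_sentence("The blue address gave my address"))
print(parse_sentence("John gave my address"))  # [] because NP requires a Det
```

Trees are nested tuples whose first item is the symbol, for example ('Det', 'The'). The second call returns no parse even though the sentence is fine English, which shows how rigid a small hand-built grammar can be.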
Limitations
• The grammar must be built by hand.
• Can’t handle ungrammatical sentences.
• Can’t resolve ambiguity.
Probabilistic Parsing
• Assign each transition a probability
• Find the parse with the greatest “likelihood”
• Build a table and count
  – How many times does each transition happen
• Structured learning.
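The counting step can be sketched as follows, assuming that "transition" means a grammar rule application and that probabilities are estimated by relative frequency. The observed rule list and the two candidate parses are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rule applications read off a set of hand-parsed sentences.
observed_rules = [
    ("S", ("NP", "VP")), ("S", ("NP", "VP")), ("S", ("VP",)),
    ("Nom", ("Noun",)), ("Nom", ("Noun",)), ("Nom", ("Noun",)),
    ("Nom", ("Adj", "Nom")),
]

# Build the table: how many times was each left-hand side expanded each way?
counts = defaultdict(Counter)
for lhs, rhs in observed_rules:
    counts[lhs][rhs] += 1

# Turn counts into probabilities: count / total for the same left-hand side.
probabilities = {
    lhs: {rhs: n / sum(expansions.values()) for rhs, n in expansions.items()}
    for lhs, expansions in counts.items()
}


def parse_probability(rules_used):
    """Multiply the probabilities of every rule used in one parse."""
    probability = 1.0
    for lhs, rhs in rules_used:
        probability *= probabilities[lhs][rhs]
    return probability


# Two hypothetical candidate parses, each listed as the rules it uses.
parse_a = [("S", ("NP", "VP")), ("Nom", ("Noun",))]
parse_b = [("S", ("VP",)), ("Nom", ("Adj", "Nom")), ("Nom", ("Noun",))]

best = max([parse_a, parse_b], key=parse_probability)
print(parse_probability(parse_a))  # 2/3 * 3/4 = 0.5
print(parse_probability(parse_b))  # 1/3 * 1/4 * 3/4 = 0.0625
```

A real parser would compare parses of the same sentence; the two rule lists here only demonstrate the arithmetic.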
Segmentation
• Sentence Segmentation
• Topic Segmentation
• Speaker Segmentation
• Phrase Chunking
  – NP, VP, PP, SubClause, etc.