Introduction to Python Language and Data Types

100

Transcript of Introduction to Python Language and Data Types

Page 1: Introduction to Python Language and Data Types
Page 2: Introduction to Python Language and Data Types

Invented in the Netherlands, early 90s by Guido van Rossum

Named after Monty Python Open sourced from the beginning Used by Google from the beginning 80 versions as of 4th October 2014.

“Python” or “CPython” is written in C/C++ - Version 2.7.7 came out in mid-2010 - Version 3.4.2 came out in late 2014 “Jython” is written in Java for the JVM “IronPython” is written in C# for the .Net environment

Page 3: Introduction to Python Language and Data Types

“Python is an experiment in how much freedom programmers need. Too much freedom and nobody can read another's code; too little and expressive-ness is endangered.”

- Guido van Rossum

Page 4: Introduction to Python Language and Data Types

Python is super easy to learn, relatively fast, object-oriented, strongly typed, widely used, and portable.

C is much faster but much harder to use.

Java is about as fast and slightly harder to use.

Perl is slower, is as easy to use, but is not strongly typed.

R is more of programming language for statistics.

Page 5: Introduction to Python Language and Data Types

http://docs.python.org/

Page 6: Introduction to Python Language and Data Types

PyDev with Eclipse Komodo Emacs Vim TextMate Gedit IDLE PIDA (Linux)(VIM Based) NotePad++ (Windows) BlueFish (Linux) Interactive Shell

Page 7: Introduction to Python Language and Data Types

Great for learning the language Great for experimenting with the library Great for testing your own modules Two variations: IDLE (GUI) (next slide),

python (command line) Press “Windows + R” and then write “python”. Enter Type statements or expressions at prompt:

>>> print "Hello, world"Hello, world>>> x = 12**2>>> x/272

Page 8: Introduction to Python Language and Data Types

IDLE is an Integrated Development Environment for Python, typically used on Windows

Multi-window text editor with syntax highlighting, auto-completion, smart indent and other.

Python shell with syntax highlighting. Integrated debugger

with stepping, persis-tent breakpoints,and call stack visi-bility

Page 9: Introduction to Python Language and Data Types

Install Python 2.7.7 from the official website. 32 bit as most packages are built for that.

Install setuptools, which will help to install other packages. Download the setup tools and then install.

Start the Python IDLE and start programming. Install Ipython which provides a browser-based notebook with

support for code, text, mathematical expressions, inline plots and other rich media.

Start writing small programs using problems at http://www.ling.gu.se/~lager/python_exercises.html

Can also use www.codecademy.com/tracks/python which has an interactive structure + deal with web APIs.

Page 10: Introduction to Python Language and Data Types

Start comments with #, rest of line is ignored Can include a “documentation string” as the first line of

a new function or class you define

# My first programdef fact(n): “““fact(n) assumes n is a positive integer and returns

facorial of n.”””assert(n>0)return 1 if n==1 else n*fact(n-1)

Page 11: Introduction to Python Language and Data Types

Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores. var Var _var _2_var_ var_2 VaR

2_Var 222var

There are some reserved words:and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while

Page 12: Introduction to Python Language and Data Types

No need to declare unlike in JAVA Simple assignment:

▪ a = 2, A = 2 Need to assign (initialize)

▪ use of uninitialized variable raises exception Not typed

if friendly: greeting = "hello world“ #(or ‘hello word’)else: greeting = 12**2print greeting

Page 13: Introduction to Python Language and Data Types

a

1b

a

1b

a = 1

a = a+1

b = a

a 1

2old reference deletedby assignment (a=...)

new int object createdby add operator (1+1)

Page 14: Introduction to Python Language and Data Types

If you try to access a name before it’s been properly created (by placing it on the left side of an assignment), you’ll get an error.

>>> y

Traceback (most recent call last): File "<pyshell#16>", line 1, in -toplevel- yNameError: name ‘y' is not defined>>> y = 3>>> y3

Page 15: Introduction to Python Language and Data Types

Assignment manipulates references▪ x = y does not make a copy of y▪ x = y makes x reference the object y references

Very useful; but beware! Example:

>>> a = [1, 2, 3]>>> b = a>>> a.append(4)>>> print b[1, 2, 3, 4]

Page 16: Introduction to Python Language and Data Types

You can also assign to multiple names at the same time.

>>> x, y = 2, 3>>> x2>>> y3

>>> x, y, z = 2, 3, 5.67>>> x2>>> y3>>> z5.67

Page 17: Introduction to Python Language and Data Types

+ Addition - Adds values on either side of the operator

a + b will give 30

- Subtraction - Subtracts right hand operand from left hand operand

a - b will give -10

* Multiplication - Multiplies values on either side of the operator

a * b will give 200

/ Division - Divides left hand operand by right hand operand

b / a will give 2

% Modulus - Divides left hand operand by right hand operand and returns remainder

b % a will give 0

** Exponent - Performs exponential (power) calculation on operators

a**b will give 10 to the power 20

// Floor Division - The division of operands where the result is the quotient in which the digits after the decimal point are removed.

9//2 is equal to 4 and 9.0//2.0 is equal to 4.0

Page 18: Introduction to Python Language and Data Types

== Checks if the value of two operands are equal or not, if yes then condition becomes true.

(a == b) is not true.

!= Checks if the value of two operands are equal or not, if values are not equal then condition becomes true.

(a != b) is true.

<> Checks if the value of two operands are equal or not, if values are not equal then condition becomes true.

(a <> b) is true. This is similar to != operator.

> Checks if the value of left operand is greater than the value of right operand, if yes then condition becomes true.

(a > b) is not true.

< Checks if the value of left operand is less than the value of right operand, if yes then condition becomes true.

(a < b) is true.

>= Checks if the value of left operand is greater than or equal to the value of right operand, if yes then condition becomes true.

(a >= b) is not true.

<= Checks if the value of left operand is less than or equal to the value of right operand, if yes then condition becomes true.

(a <= b) is true.

Page 19: Introduction to Python Language and Data Types

& Binary AND Operator copies a bit to the result if it exists in both operands.

(a & b) will give 12 which is 0000 1100

| Binary OR Operator copies a bit if it exists in eather operand.

(a | b) will give 61 which is 0011 1101

^ Binary XOR Operator copies the bit if it is set in one operand but not both.

(a ^ b) will give 49 which is 0011 0001

~ Binary Ones Complement Operator is unary and has the efect of 'flipping' bits.

(~a ) will give -61 which is 1100 0011 in 2's complement form due to a signed binary number.

<< Binary Left Shift Operator. The left operands value is moved left by the number of bits specified by the right operand.

a << 2 will give 240 which is 1111 0000

>> Binary Right Shift Operator. The left operands value is moved right by the number of bits specified by the right operand.

a >> 2 will give 15 which is 0000 1111

Page 20: Introduction to Python Language and Data Types

Operator Description Example

and Called Logical AND operator. If both the operands are true then then condition becomes true.

(a and b) is true.

or Called Logical OR Operator. If any of the two operands are non zero then then condition becomes true.

(a or b) is true.

not Called Logical NOT Operator. Use to reverses the logical state of its operand. If a condition is true then Logical NOT operator will make false.

not(a and b) is false.

Page 21: Introduction to Python Language and Data Types

in Evaluates to true if it finds a variable in the specified sequence and false otherwise.

x in y, here in results in a 1 if x is a member of sequence y.

not in Evaluates to true if it does not finds a variable in the specified sequence and false otherwise.

x not in y, here not in results in a 1 if x is not a member of sequence y.

is Evaluates to true if the variables on either side of the operator point to the same object and false otherwise.

x is y, here is results in 1 if id(x) equals id(y).

is not Evaluates to false if the variables on either side of the operator point to the same object and true otherwise.

x is not y, here is not results in 1 if id(x) is not equal to id(y).

Page 22: Introduction to Python Language and Data Types

We use the term object to refer to any entity in a python program.

Every object has an associated type, which determines the properties of the object.

Python defines six types of built-in objects:

Number : 10String : “hello”, ‘hi’List: [1, 17, 44]Tuple : (4, 5)Dictionary: {‘food’ : ‘something you eat’,

‘lobster’ : ‘an edible, undersea arthropod’}File

Each type of object has its own properties, which we will learn about in the next several weeks.

Types can be accessed by ‘type’ method

Page 23: Introduction to Python Language and Data Types

The usual suspects▪ 12, 3.14, 0xFF, 0377, (-1+2)*3/4**5, abs(x)

Integer – 1, 2, 3, 4. Also 1L, 2L Float – 1.0, 2.0, 3.0 3.2 Addition

▪ 1+2 ->3, 1+2.0 -> 3.0 Division

▪ 1/2 -> 0 , 1./2. -> 0.5, float(1)/2 -> 0.5 Long (arbitrary precision)

▪ 2**100 -> 1267650600228229401496703205376

Page 24: Introduction to Python Language and Data Types

Flexible arrays, mutable▪ a = [99, "bottles of beer", ["on", "the", "wall"]]

Same operators as for strings▪ a+b, a*3, a[0], a[-1], a[1:], len(a)

Item and slice assignment▪ a[0] = 98▪ a[1:2] = ["bottles", "of", "beer"]

-> [98, "bottles", "of", "beer", ["on", "the", "wall"]]▪ del a[-1] # -> [98, "bottles", "of", "beer"]

[0]*3 = [0,0,0]

Page 25: Introduction to Python Language and Data Types

>>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)Positive index: count from the left, starting with 0

>>> list_ex[1]‘abc’

Negative index: count from right, starting with –1>>> list_ex[-3] 4.56

Page 26: Introduction to Python Language and Data Types

>>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)

Return a copy of the container with a subset of the original members. Start copying at the first index, and stop copying before second.

>>> list_ex[1:4](‘abc’, 4.56, (2,3))Negative indices count from end

>>> list_ex[1:-1](‘abc’, 4.56, (2,3))

Page 27: Introduction to Python Language and Data Types

>>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)Omit first index to make copy starting from beginning of the container

>>> list_ex[:2] (23, ‘abc’)Omit second index to make copy starting at first index and going to end

>>> list_ex[2:](4.56, (2,3), ‘def’)

Page 28: Introduction to Python Language and Data Types

[ : ] makes a copy of an entire sequence>>> list_ex[:] (23, ‘abc’, 4.56, (2,3), ‘def’)

Note the difference between these two lines for mutable sequences

>>> l2 = l1 # Both refer to 1 ref, # changing one affects both>>> l2 = l1[:] # Independent copies, two refs

Page 29: Introduction to Python Language and Data Types

>>> li = [‘abc’, 23, 4.34, 23]>>> li[1] = 45 >>> li

[‘abc’, 45, 4.34, 23]

Name li still points to the same memory reference when we’re done.

Page 30: Introduction to Python Language and Data Types

a1 2 3

b

a1 2 3

b4

a = [1, 2, 3]

a.append(4)

b = a

a 1 2 3

Page 31: Introduction to Python Language and Data Types

>>> li = [1, 11, 3, 4, 5]

>>> li.append(‘a’) # Note the method syntax>>> li[1, 11, 3, 4, 5, ‘a’]

>>> li.insert(2, ‘i’)>>> li[1, 11, ‘i’, 3, 4, 5, ‘a’]

>>> print (li + [1,2,3])[1, 11, ‘i’, 3, 4, 5, ‘a’, 1, 2, 3]

Page 32: Introduction to Python Language and Data Types

+ creates a fresh list with a new memory refextend operates on list li in place.

>>> li.extend([9, 8, 7]) >>> li[1, 2, ‘i’, 3, 4, 5, ‘a’, 9, 8, 7]

Potentially confusing: extend takes a list as an argument. append takes a singleton as an argument.>>> li.append([10, 11, 12])>>> li[1, 2, ‘i’, 3, 4, 5, ‘a’, 9, 8, 7, [10, 11, 12]]

Page 33: Introduction to Python Language and Data Types

Lists have many methods, including index, count, remove, reverse, sort >>> li = [‘a’, ‘b’, ‘c’, ‘b’]>>> li.index(‘b’) # index of 1st occurrence1>>> li.count(‘b’) # number of occurrences2>>> li.remove(‘b’) # remove 1st occurrence>>> li [‘a’, ‘c’, ‘b’]

Page 34: Introduction to Python Language and Data Types

>>> a = range(5) # [0,1,2,3,4]>>> a.append(5) # [0,1,2,3,4,5]>>> a.pop() # [0,1,2,3,4]5>>> a.pop(2) # [0,1,2,3,4]2

>>> a.insert(0, 42) # [42,0,1,2,3,4]>>> a.pop(0) # [0,1,2,3,4]42>>> a.reverse() # [4,3,2,1,0]>>> a.sort() # [0,1,2,3,4]

Page 35: Introduction to Python Language and Data Types

>>> names[0]‘Ben'>>> names[1]‘Chen'>>> names[2]‘Yaqin'>>> names[3]Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>IndexError: list index out of rangeIndexError: list index out of range>>> names[-1]‘Yaqin'>>> names[-2]‘Chen'>>> names[-3]‘Ben'

[0] is the first item.[1] is the second item...

Out of range valuesraise an exception

Negative valuesgo backwards fromthe last element.

Names = [ ‘Ben’, ‘Chen’, ‘Yaqin’]

Page 36: Introduction to Python Language and Data Types

>>> ids = ["9pti", "2plv", "1crn"]>>> ids.append("1alm")>>> ids['9pti', '2plv', '1crn', '1alm']>>> del ids[0]>>> ids['2plv', '1crn', '1alm']>>> ids.sort()>>> ids['1alm', '1crn', '2plv']>>> ids.reverse()>>> ids['2plv', '1crn', '1alm']>>> ids.insert(0, "9pti")>>> ids['9pti', '2plv', '1crn', '1alm']

Page 37: Introduction to Python Language and Data Types

>>> names['ben', 'chen', 'yaqin']['ben', 'chen', 'yaqin']

>>> gender = [0, 0, 1][0, 0, 1]

>>> zip(names, gender)[('ben', 0), ('chen', 0), ('yaqin', 1)][('ben', 0), ('chen', 0), ('yaqin', 1)]

Can also be done using a for loop, but this is more efficientCan also be done using a for loop, but this is more efficient

Page 38: Introduction to Python Language and Data Types

The comma is the tuple creation operator, not parentheses>>> 1, -> (1,)

Python shows parentheses for clarity (best practice)>>> (1,) -> (1,)

Don't forget the comma!>>> (1) -> 1

Non-mutable lists. Elements can be accessed same way Empty tuples have a special syntactic form

>>> () -> ()>>> tuple() -> ()

Page 39: Introduction to Python Language and Data Types

>>> yellow = (255, 255, 0) # r, g, b>>> one = (1,)>>> one = (1,)>>> yellow[0]>>> yellow[1:](255, 0)>>> yellow[0] = 0Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>TypeError: 'tuple' object does not support item assignmentTypeError: 'tuple' object does not support item assignment

Very common in string interpolation:>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden",

57.7056)'Andrew lives in Sweden at latitude 57.7'

Page 40: Introduction to Python Language and Data Types

>>> t = (23, ‘abc’, 4.56, (2,3), ‘def’)>>> t[2] = 3.14

Traceback (most recent call last): File "<pyshell#75>", line 1, in -toplevel- tu[2] = 3.14TypeError: object doesn't support item assignment

You can’t change a tuple. You can make a fresh tuple and assign its reference to a

previously used name.>>> t = (23, ‘abc’, 3.14, (2,3), ‘def’)

The immutability of tuples means they’re faster than lists.

Page 41: Introduction to Python Language and Data Types

Lists slower but more powerful than tuples Lists can be modified, and they have lots of handy

operations and methods Tuples are immutable and have fewer features

To convert between tuples and lists use the list() and tuple() functions:li = list(tu)tu = tuple(li)

Page 42: Introduction to Python Language and Data Types

▪ "hello"+"world" "helloworld" # concatenation▪ "hello"*3 "hellohellohello" # repetition▪ "hello"[0] "h" # indexing▪ "hello"[-1] "o" # (from end)▪ "hello"[1:4] "ell" # slicing▪ len("hello") 5 # size▪ "hello" < "jello" 1 # comparison▪ "e" in "hello"1 # search▪ "escapes: \n etc, \033 etc, \if etc"▪ 'single quotes' "double quotes"

Page 43: Introduction to Python Language and Data Types

>>> smiles = "C(=N)(N)N.C(=O)(O)O">>> smiles[0]'C'>>> smiles[1]'('>>> smiles[-1]'O'>>> smiles[1:5]'(=N)'>>> smiles[10:-4]'C(=O)'

Page 44: Introduction to Python Language and Data Types

Boolean test whether a value is inside a container:>>> t = [1, 2, 4, 5]>>> 2 in tTrue>>> 4 not in tFalse

For strings, tests for substrings>>> a = 'abcde'>>> 'c' in aTrue>>> 'cd' in aTrue>>> 'cdf' in aFalse

Page 45: Introduction to Python Language and Data Types

The + operator produces a new tuple, list, or string whose value is the concatenation of its arguments.

>>> (1, 2, 3) + (4, 5, 6) (1, 2, 3, 4, 5, 6)

>>> [1, 2, 3] + [4, 5, 6] [1, 2, 3, 4, 5, 6]

>>> “Hello” + “ ” + “World” ‘Hello World’

Page 46: Introduction to Python Language and Data Types

The * operator produces a new tuple, list, or string that “repeats” the original content.

>>> (1, 2, 3) * 3(1, 2, 3, 1, 2, 3, 1, 2, 3)

>>> [1, 2, 3] * 3[1, 2, 3, 1, 2, 3, 1, 2, 3]

>>> “Hello” * 3‘HelloHelloHello’

Page 47: Introduction to Python Language and Data Types

smiles = "C(=N)(N)N.C(=O)(O)Osmiles = "C(=N)(N)N.C(=O)(O)O"">>> smiles.find("(O)")15

>>> smiles.find(".")9

>>> smiles.find(".", 10)-1

>>> smiles.split(".")['C(=N)(N)N', 'C(=O)(O)O']

Use “find” to find thestart of a substring.

Start looking at position 10.

Find returns -1 if it couldn’tfind a match.

Split the string into partswith “.” as the delimiter

Page 48: Introduction to Python Language and Data Types

if "Br" in “Brother”:print "contains brother“

>>> contains brother

email_address = “ravi”if "@" not in email_address:

email_address += "@latentview.com“

print email_address>>> [email protected]

Page 49: Introduction to Python Language and Data Types

>>> line = " # This is a comment line \n">>> line.strip()'# This is a comment line'>>> line.rstrip()' # This is a comment line'>>> line.rstrip("\n")' # This is a comment line '

Page 50: Introduction to Python Language and Data Types

email = [email protected]>>> email.startswith(“c")TrueTrue>>> email.endswith(“u")TrueTrue

>>> "%[email protected]" % "clin"'[email protected]''[email protected]'

>>> names = [“Ben", “Chen", “Yaqin"]>>> ", ".join(names)‘‘Ben, Chen, Yaqin‘Ben, Chen, Yaqin‘

>>> “chen".upper()‘‘CHEN‘CHEN‘

Page 51: Introduction to Python Language and Data Types

capitalize() -- Capitalizes first letter of string count(str, beg= 0,end=len(string))

Counts how many times str occurs in string. encode(encoding=’UTF-8′,errors=’strict’)

Returns encoded string version of string isalnum()

Returns true if string has at least 1 character isalpha()

Returns true if string has at least 1 character  isdigit()

Returns true if string contains only digits and false otherwise

Page 52: Introduction to Python Language and Data Types

Strings are read only

>>> s = "andrew">>> s[0] = "A"Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>TypeError: 'str' object does not support item assignmentTypeError: 'str' object does not support item assignment

>>> s = "A" + s[1:]>>> s'Andrew‘

Page 53: Introduction to Python Language and Data Types

\n -> newline\t -> tab\\ -> backslash...But Windows uses backslash for directories!filename = "M:\nickel_project\reactive.smi" # DANGER!filename = "M:\\nickel_project\\reactive.smi" # Better!filename = "M:/nickel_project/reactive.smi" # Usually works

Page 54: Introduction to Python Language and Data Types

Dictionaries are lookup tables. Random, not sorted They map from a “key” to a “value”.

symbol_to_name = {"H": "hydrogen","He": "helium","Li": "lithium","C": "carbon","O": "oxygen","N": "nitrogen"}

Duplicate keys are not allowed Duplicate values are just fine Keys can be any immutable value

Page 55: Introduction to Python Language and Data Types

Hash tables, "associative arrays"▪ d = {"duck": "eend", "water": "water"}

Lookup:▪ d["duck"] -> "eend"▪ d["back"] # raises KeyError exception

Delete, insert, overwrite:▪ del d["water"] # {"duck": "eend", "back": "rug"}▪ d["back"] = "rug" # {"duck": "eend", "back": "rug"}▪ d["duck"] = "duik" # {"duck": "duik", "back": "rug"}

Page 56: Introduction to Python Language and Data Types

>>> symbol_to_name["C"]'carbon''carbon'>>> "O" in symbol_to_name, "U" in symbol_to_name(True, False)(True, False)>>> "oxygen" in symbol_to_nameFalse>>> symbol_to_name["P"]>>> symbol_to_name["P"]

Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>KeyError: 'P'KeyError: 'P'>>> symbol_to_name.get("P", "unknown")'unknown''unknown'>>> symbol_to_name.get("C", "unknown")'carbon''carbon'

Page 57: Introduction to Python Language and Data Types

Keys, values, items:▪ d.keys() -> ["duck", "back"]▪ d.values() -> ["duik", "rug"]▪ d.items() -> [("duck","duik"), ("back","rug")]

Presence check:▪ d.has_key("duck") -> 1; d.has_key("spam") -> 0

Values of any type; keys almost any▪ {"name":"Guido", "age":43, ("hello","world"):1,

42:"yes", "flag": ["red","white","blue"]}

Page 58: Introduction to Python Language and Data Types

>>> symbol_to_name.keys()['C', 'H', 'O', 'N', 'Li', 'He']['C', 'H', 'O', 'N', 'Li', 'He']

>>> symbol_to_name.values()['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']

>>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} )

>>> symbol_to_name.items()[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), [('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'),

('P', 'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', ('P', 'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]'helium')]

>>> del symbol_to_name['C']>>> symbol_to_name{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': {'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He':

'helium'}'helium'}

Page 59: Introduction to Python Language and Data Types

dict.clear()Removes all elements of dictionary dict

dict.fromkeys()Create a new dictionary with keys from seq and values set to value

dict.setdefault(key, default=None)Similar to get(), but will set dict[key]=default if key is not already in dict

dict.update(dict2)Adds dictionary dict2‘s key-values pairs to dict

Page 60: Introduction to Python Language and Data Types

Keys must be immutable: numbers, strings, tuples of immutables▪ these cannot be changed after creation

reason is hashing (fast lookup technique) not lists or other dictionaries▪ these types of objects can be changed "in place"

no restrictions on values Keys will be listed in arbitrary order

again, because of hashing

Page 61: Introduction to Python Language and Data Types

if condition: statements[elif condition: statements] ...else: statements

while condition: statements

for var in sequence: statements

breakcontinue

Page 62: Introduction to Python Language and Data Types

var2 = 0 if var2: print "2 - Got a true expression value" print var2 else: print "2 - Got a false expression value" print var2 print "Good bye!“

2 - Got a false expression value 0 Good bye!

var1 = 100 if var1: print "1 - Got a true expression value" print var1 else: print "1 - Got a false expression value" print var1

1 - Got a true expression value 100

Page 63: Introduction to Python Language and Data Types

count = 0 while (count < 9):

print 'The count is:', count count = count + 1

print "Good bye!“

The count is: 0 The count is: 1 The count is: 2 The count is: 3 The count is: 4 The count is: 5 The count is: 6 The count is: 7 The count is: 8 Good bye!

Error with the program

var = 1 while var == 1 :

# This constructs an infinite loop

num = raw_input("Enter a number :") print "You entered: ",

num print "Good bye!"

Page 64: Introduction to Python Language and Data Types

for i in range(20): if i%3 == 0: print i if i%5 == 0: print "Bingo!" print "---“

for letter in 'Python': # First Example print 'Current Letter :', letter

fruits = ['banana', 'apple', 'mango'] for fruit in fruits:

# Second Example print 'Current fruit :', fruit print "Good bye!"

Page 65: Introduction to Python Language and Data Types

def name(arg1, arg2, ...): """documentation""" # optional doc string statements

return # from procedurereturn expression # from function

def add():return 5

>>>print add()5

Page 66: Introduction to Python Language and Data Types

def functionname( parameters ): "function_docstring" function_suite return [expression]

def printme( str ): "This prints a passed string into this function" print strprint ‘\n’

# Now you can call printme function printme ("I'm first call to user defined function!")printme("Again second call to the same function")

Page 67: Introduction to Python Language and Data Types

def gcd(a, b): "greatest common divisor" while a != 0: a, b = b%a, a # parallel assignment return b

>>> gcd.__doc__'greatest common divisor'>>> gcd(12, 20)4

Page 68: Introduction to Python Language and Data Types

class name: "documentation" statements-or-class name(base1, base2, ...): ...Most, statements are method definitions: def name(self, arg1, arg2, ...): ...May also be class variable assignments

Page 69: Introduction to Python Language and Data Types

class Stack: "A well-known data structure…"

def __init__(self): # constructor self.items = []

def push(self, x): self.items.append(x) def pop(self): x = self.items[-1]

del self.items[-1] return x

def empty(self): return len(self.items) == 0 # Boolean result

Page 70: Introduction to Python Language and Data Types

To create an instance, simply call the class object:x = Stack() # no 'new' operator!

To use methods of the instance, call using dot notation:x.empty() # -> 1x.push(1) # [1]x.empty() # -> 0x.push("hello")# [1, "hello"]x.pop() # -> "hello" # [1]

To inspect instance variables, use dot notation:x.items # -> [1]

Page 71: Introduction to Python Language and Data Types

When a Python program starts it only has access to a basic functions and classes.

(“int”, “dict”, “len”, “sum”, “range”, ...) “Modules” contain additional functionality Use “import” to tell Python to load a module>>> import math #Used for mathematical functions>>> import nltk #Used for NLP operations

Page 72: Introduction to Python Language and Data Types

Importing modules: import re; print re.match("[a-z]+", s) from re import match; print match("[a-z]+", s)

Import with rename: import re as regex from re import match as m Before Python 2.0:▪ import re; regex = re; del re

Page 73: Introduction to Python Language and Data Types

>>> import math>>> math.pi3.14159265358979313.1415926535897931

>>> math.cos(0)1.01.0>>> math.cos(math.pi)-1.0-1.0>>> dir(math)

['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','tanh', 'trunc']'tanh', 'trunc']

Page 74: Introduction to Python Language and Data Types

>>> import mathmath.cosmath.cos>>> from math import cos, picos>>> from math import *

Page 75: Introduction to Python Language and Data Types

BeautifulSoup – A bit slow, but useful and easy for beginners

Numpy – Advanced mathematical functionalities SciPy – Algorithms and mathematical tools for python Matplotlib – plotting library. Excellent graphs Scrapy – Advanced Web scraping NLTK - Natural Language Toolkit (Pattern, TextBlob) IPython – Notebooks Scikit-learn – machine learning on Python Pandas – Gensim – Topic Modelling

http://www.catswhocode.com/blog/python-50-modules-for-all-needs

Page 76: Introduction to Python Language and Data Types

a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)

. (a period) -- matches any single character except newline '\n' \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-

zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.

\b -- boundary between word and non-word \s -- (lowercase s) matches a single whitespace character -- space, newline,

return, tab, form [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.

\t, \n, \r -- tab, newline, return \d -- decimal digit [0-9] (some older regex utilities do not support but \d, but

they all support \w and \s) ^ = start, $ = end -- match the start or end of the string \ -- inhibit the "specialness" of a character. So, for example, use \. to match a

period or \\ to match a slash. If you are unsure if a character has special meaning, such as '@', you can put a slash in front of it, \@, to make sure it is treated just as a character.

Page 77: Introduction to Python Language and Data Types

## Search for pattern 'iii' in string 'piiig'.

## All of the pattern must match, but it may appear anywhere ## On success, match.group() is matched text.

  match = re.search(r'iii', 'piiig') =>  found, match.group() == "iii"  match = re.search(r'igs', 'piiig') =>  not found, match == None

  ## . = any char but \n  match = re.search(r'..g', 'piiig') =>  found, match.group() == "iig"

  ## \d = digi , \w = word  match = re.search(r'\d\d\d', 'p123g') =>  found, match.group() == "123“

  match = re.search(r'\w\w\w', '@@abcd!!') =>  found, match.group() == "abc"

Page 78: Introduction to Python Language and Data Types

+ -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's * -- 0 or more occurrences of the pattern to its left ? -- match 0 or 1 occurrences of the pattern to its left

## i+ = one or more i's, as many as possible.  match = re.search(r'pi+', 'piiig') =>  found, match.group() == "piii"

  ## Finds the first/leftmost solution, and within it drives the +  ## as far as possible (aka 'leftmost and largest').  ## In this example, note that it does not get to the second set of i's.  match = re.search(r'i+', 'piigiiii') =>  found, match.group() == "ii"

  ## \s* = zero or more whitespace chars  ## Here look for 3 digits, possibly separated by whitespace.  match = re.search(r'\d\s*\d\s*\d', 'xx1 2   3xx') =>  found, match.group() == "1 2   3"  match = re.search(r'\d\s*\d\s*\d', 'xx12  3xx') =>  found, match.group() == "12  3"  match = re.search(r'\d\s*\d\s*\d', 'xx123xx') =>  found, match.group() == "123"

  ## ^ = matches the start of string, so this fails:  match = re.search(r'^b\w+', 'foobar') =>  not found, match == None  ## but without the ^ it succeeds:  match = re.search(r'b\w+', 'foobar') =>  found, match.group() == "bar“

https://developers.google.com/edu/python/regular-expressions

Page 79: Introduction to Python Language and Data Types

def foo(x): return 1/x

def bar(x): try: print foo(x) except ZeroDivisionError, message: print "Can’t divide by zero:", message

bar(0)

Page 80: Introduction to Python Language and Data Types

Used for cleanup

f = open(file)try: process_file(f)finally: f.close()# always executed

print "OK" # executed on success only

Page 81: Introduction to Python Language and Data Types

Gives an idea where the error happened For example, in nested loops raise IndexError raise IndexError("k out of range") raise IndexError, "k out of range" try:

somethingexcept: # catch everything print "Oops" raise # reraise

Page 82: Introduction to Python Language and Data Types

“w” = “write mode” “a” = “append mode” “wb” = “write in binary” “r” = “read mode” (default) “rb” = “read in binary” “U” = “read files with Unix or Windows line endings”

>>> f = open(“names.txt")>>> f.readline()‘‘Ravi\n‘Ravi\n‘>>> f.read()‘‘Ravi\n‘Ravi\n‘‘‘Shankar\n’Shankar\n’

Page 83: Introduction to Python Language and Data Types

>>> lst= [ x for x in open("text.txt","r").readlines() ]>>> lst['Chen Lin\n', '[email protected]\n', 'Volen 110\n', 'Office ['Chen Lin\n', '[email protected]\n', 'Volen 110\n', 'Office

Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', '[email protected]\n', 'Volen 110\n', 'Offiche Hour: '[email protected]\n', 'Volen 110\n', 'Offiche Hour: Tues. 3-5\n']Tues. 3-5\n']

input_file = open(“in.txt")output_file = open(“out.txt", "w")for line in input_file:

output_file.write(line)

Page 84: Introduction to Python Language and Data Types

>>> import csv >>> with open('eggs.csv', 'rb') as csvfile: ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') ... for row in spamreader: ... print ', '.join(row) Spam, Spam, Spam, Spam, Spam, Baked Beans Spam, Lovely Spam, Wonderful Spam

>>> with open('eggs.csv', 'wb') as csvfile: spamwriter = csv.writer(csvfile) spamwriter.writerow(['Spam'] * 5 + ['Baked Beans']) spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

>>> w = csv.writer(open(Fn,'ab'),dialect='excel')

Page 85: Introduction to Python Language and Data Types
Page 86: Introduction to Python Language and Data Types

Python is a great language for NLP: Simple Easy to debug:▪ Exceptions▪ Interpreted language

Easy to structure▪ Modules▪ Object oriented programming

Powerful string manipulation

Page 87: Introduction to Python Language and Data Types

Here is a description from the NLTK official site:

“NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.”

Page 88: Introduction to Python Language and Data Types

A software package for manipulating linguistic data and performing NLP tasks

Advanced tasks are possible from an early stage Permits projects at various levels Consistent interfaces Facilitates reusability of modules Implemented in Python

Page 89: Introduction to Python Language and Data Types

The token class to encode information about NL texts. Each token instance represents a unit of text such as a

word, a text, or a document. A given instance is defined by a partial mapping from

property names to property values.

TreebankWordTokenizer WordPunctTokenizer PunctWordTokenizer WhitespaceTokenizer

Page 90: Introduction to Python Language and Data Types

>>> import nltk>>> text = nltk.word_tokenize(“Dive into NLTK : Part-of-speech tagging and POS Tagger”)>>> text[‘Dive’, ‘into’, ‘NLTK’, ‘:’, ‘Part-of-speech’, ‘tagging’, ‘and’, ‘POS’, ‘Tagger’]>>> nltk.pos_tag(text)[(‘Dive’, ‘JJ’), (‘into’, ‘IN’), (‘NLTK’, ‘NNP’), (‘:’, ‘:’), (‘Part-of-speech’, ‘JJ’), (‘tagging’, ‘NN’), (‘and’, ‘CC’), (‘POS’, ‘NNP’), (‘Tagger’, ‘NNP’)]

Page 91: Introduction to Python Language and Data Types

In linguistic morphology and information retrieval, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

Stemming programs are commonly referred to as stemming algorithms or stemmers.

Helpful in reducing the vocabulary in large scale operations

Page 92: Introduction to Python Language and Data Types

>>> from nltk.stem.porter import PorterStemmer>>> porter_stemmer = PorterStemmer()>>> porter_stemmer.stem(‘maximum’)u’maximum’>>> porter_stemmer.stem(‘presumably’)u’presum’>>> porter_stemmer.stem(‘multiply’)u’multipli’>>> porter_stemmer.stem(‘provision’)u’provis’>>> porter_stemmer.stem(‘owed’)u’owe’>>> porter_stemmer.stem(‘ear’)u’ear’>>> porter_stemmer.stem(‘saying’)u’say’

Page 93: Introduction to Python Language and Data Types

>>> from nltk.stem.lancaster import LancasterStemmer>>> lancaster_stemmer = LancasterStemmer()>>> lancaster_stemmer.stem(‘maximum’)‘maxim’>>> lancaster_stemmer.stem(‘presumably’)‘presum’>>> lancaster_stemmer.stem(‘presumably’)‘presum’>>> lancaster_stemmer.stem(‘multiply’)‘multiply’>>> lancaster_stemmer.stem(‘provision’)u’provid’>>> lancaster_stemmer.stem(‘owed’)‘ow’>>> lancaster_stemmer.stem(‘ear’)‘ear’

Page 94: Introduction to Python Language and Data Types

>>> from nltk.stem import SnowballStemmer>>> snowball_stemmer = SnowballStemmer(“english”)>>> snowball_stemmer.stem(‘maximum’)u’maximum’>>> snowball_stemmer.stem(‘presumably’)u’presum’>>> snowball_stemmer.stem(‘multiply’)u’multipli’>>> snowball_stemmer.stem(‘provision’)u’provis’>>> snowball_stemmer.stem(‘owed’)u’owe’>>> snowball_stemmer.stem(‘ear’)u’ear’

Page 95: Introduction to Python Language and Data Types

Lemmatisation (or lemmatization) in linguistics, is the process of grouping together the different inflected forms of a word so they can be analysed as a single item.

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications.

Page 96: Introduction to Python Language and Data Types

>>> from nltk.stem import WordNetLemmatizer>>> wordnet_lemmatizer = WordNetLemmatizer()>>> wordnet_lemmatizer.lemmatize(‘dogs’)u’dog’>>> wordnet_lemmatizer.lemmatize(‘churches’)u’church’>>> wordnet_lemmatizer.lemmatize(‘aardwolves’)u’aardwolf’>>> wordnet_lemmatizer.lemmatize(‘abaci’)u’abacus’>>> wordnet_lemmatizer.lemmatize(‘hardrock’)‘hardrock’>>> wordnet_lemmatizer.lemmatize(‘are’)‘are’>>> wordnet_lemmatizer.lemmatize(‘is’)‘is’

Page 97: Introduction to Python Language and Data Types

Stanford POS Tagger Stanford Named Entity Recognizer

english_nertagger.tag(‘Rami Eid is studying at Stony Brook University in NY’.split())Out[3]:[(u’Rami’, u’PERSON’),(u’Eid’, u’PERSON’),(u’is’, u’O’),(u’studying’, u’O’),(u’at’, u’O’),(u’Stony’, u’ORGANIZATION’),(u’Brook’, u’ORGANIZATION’),(u’University’, u’ORGANIZATION’),(u’in’, u’O’),(u’NY’, u’O’)]

Stanford Parser

Page 98: Introduction to Python Language and Data Types

Stanford Parser english_parser.raw_parse_sents((“this is the english parser test”, “the parser is from

stanford parser”))Out[4]:[[u’this/DT is/VBZ the/DT english/JJ parser/NN test/NN’],[u'(ROOT’,u’ (S’,u’ (NP (DT this))’,u’ (VP (VBZ is)’,u’ (NP (DT the) (JJ english) (NN parser) (NN test)))))’],

Page 99: Introduction to Python Language and Data Types

Part of Speech Tagging Parsing Word Net Named Entity

Recognition Information Retrieval Sentiment Analysis Document Clustering Topic Segmentation

Authoring Machine Translation Summarization Information Extraction Spoken Dialog Systems Natural Language

Generation Word Sense

Disambiguation

Page 100: Introduction to Python Language and Data Types

Det Noun Verb Prep Adj

A 0.9 0.1 0 0 0

man 0 0.6 0.2 0 0.2

walks 0 0.2 0.8 0 0

into 0 0 0 1 0

bar 0 0.7 0.3 0 0