Introduction to Python Language and Data Types

Invented in the Netherlands, early 90s by Guido van Rossum

Named after Monty Python Open sourced from the beginning Used by Google from the beginning 80 versions as of 4th October 2014.

“Python” or “CPython” is written in C/C++ - Version 2.7.7 came out in mid-2010 - Version 3.4.2 came out in late 2014 “Jython” is written in Java for the JVM “IronPython” is written in C# for the .Net environment

“Python is an experiment in how much freedom programmers need. Too much freedom and nobody can read another's code; too little and expressive-ness is endangered.”

- Guido van Rossum

Python is super easy to learn, relatively fast, object-oriented, strongly typed, widely used, and portable.

C is much faster but much harder to use.

Java is about as fast and slightly harder to use.

Perl is slower, is as easy to use, but is not strongly typed.

R is more of programming language for statistics.

http://docs.python.org/

PyDev with Eclipse Komodo Emacs Vim TextMate Gedit IDLE PIDA (Linux)(VIM Based) NotePad++ (Windows) BlueFish (Linux) Interactive Shell

Great for learning the language Great for experimenting with the library Great for testing your own modules Two variations: IDLE (GUI) (next slide),

python (command line) Press “Windows + R” and then write “python”. Enter Type statements or expressions at prompt:

>>> print "Hello, world"Hello, world>>> x = 12**2>>> x/272

IDLE is an Integrated Development Environment for Python, typically used on Windows

Multi-window text editor with syntax highlighting, auto-completion, smart indent and other.

Python shell with syntax highlighting. Integrated debugger

with stepping, persis-tent breakpoints,and call stack visi-bility

Install Python 2.7.7 from the official website. 32 bit as most packages are built for that.

Install setuptools, which will help to install other packages. Download the setup tools and then install.

Start the Python IDLE and start programming. Install Ipython which provides a browser-based notebook with

support for code, text, mathematical expressions, inline plots and other rich media.

Start writing small programs using problems at http://www.ling.gu.se/~lager/python_exercises.html

Can also use www.codecademy.com/tracks/python which has an interactive structure + deal with web APIs.

http://www.ling.gu.se/~lager/python_exercises.html

http://www.ling.gu.se/~lager/python_exercises.html

http://www.codecademy.com/tracks/python

Start comments with #, rest of line is ignored Can include a “documentation string” as the first line of

a new function or class you define

# My first programdef fact(n): “““fact(n) assumes n is a positive integer and returns

facorial of n.”””assert(n>0)return 1 if n==1 else n*fact(n-1)

Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores. var Var _var _2_var_ var_2 VaR

2_Var 222var

There are some reserved words:and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while

No need to declare unlike in JAVA Simple assignment:

▪ a = 2, A = 2 Need to assign (initialize)

▪ use of uninitialized variable raises exception Not typed

if friendly: greeting = "hello world“ #(or ‘hello word’)else: greeting = 12**2print greeting

a

1b

a

1b

a = 1

a = a+1

b = a

a 1

2old reference deletedby assignment (a=...)

new int object createdby add operator (1+1)

If you try to access a name before it’s been properly created (by placing it on the left side of an assignment), you’ll get an error.

>>> y

Traceback (most recent call last): File "<pyshell#16>", line 1, in -toplevel- yNameError: name ‘y' is not defined>>> y = 3>>> y3

Assignment manipulates references▪ x = y does not make a copy of y▪ x = y makes x reference the object y references

Very useful; but beware! Example:

>>> a = [1, 2, 3]>>> b = a>>> a.append(4)>>> print b[1, 2, 3, 4]

You can also assign to multiple names at the same time.

>>> x, y = 2, 3>>> x2>>> y3

>>> x, y, z = 2, 3, 5.67>>> x2>>> y3>>> z5.67

+ Addition - Adds values on either side of the operator

a + b will give 30

- Subtraction - Subtracts right hand operand from left hand operand

a - b will give -10

* Multiplication - Multiplies values on either side of the operator

a * b will give 200

/ Division - Divides left hand operand by right hand operand

b / a will give 2

% Modulus - Divides left hand operand by right hand operand and returns remainder

b % a will give 0

** Exponent - Performs exponential (power) calculation on operators

a**b will give 10 to the power 20

// Floor Division - The division of operands where the result is the quotient in which the digits after the decimal point are removed.

9//2 is equal to 4 and 9.0//2.0 is equal to 4.0

== Checks if the value of two operands are equal or not, if yes then condition becomes true.

(a == b) is not true.

!= Checks if the value of two operands are equal or not, if values are not equal then condition becomes true.

(a != b) is true.

<> Checks if the value of two operands are equal or not, if values are not equal then condition becomes true.

(a <> b) is true. This is similar to != operator.

> Checks if the value of left operand is greater than the value of right operand, if yes then condition becomes true.

(a > b) is not true.

< Checks if the value of left operand is less than the value of right operand, if yes then condition becomes true.

(a < b) is true.

>= Checks if the value of left operand is greater than or equal to the value of right operand, if yes then condition becomes true.

(a >= b) is not true.

<= Checks if the value of left operand is less than or equal to the value of right operand, if yes then condition becomes true.

(a <= b) is true.

& Binary AND Operator copies a bit to the result if it exists in both operands.

(a & b) will give 12 which is 0000 1100

| Binary OR Operator copies a bit if it exists in eather operand.

(a | b) will give 61 which is 0011 1101

^ Binary XOR Operator copies the bit if it is set in one operand but not both.

(a ^ b) will give 49 which is 0011 0001

~ Binary Ones Complement Operator is unary and has the efect of 'flipping' bits.

(~a ) will give -61 which is 1100 0011 in 2's complement form due to a signed binary number.

<< Binary Left Shift Operator. The left operands value is moved left by the number of bits specified by the right operand.

a << 2 will give 240 which is 1111 0000

>> Binary Right Shift Operator. The left operands value is moved right by the number of bits specified by the right operand.

a >> 2 will give 15 which is 0000 1111

Operator Description Example

and Called Logical AND operator. If both the operands are true then then condition becomes true.

(a and b) is true.

or Called Logical OR Operator. If any of the two operands are non zero then then condition becomes true.

(a or b) is true.

not Called Logical NOT Operator. Use to reverses the logical state of its operand. If a condition is true then Logical NOT operator will make false.

not(a and b) is false.

in Evaluates to true if it finds a variable in the specified sequence and false otherwise.

x in y, here in results in a 1 if x is a member of sequence y.

not in Evaluates to true if it does not finds a variable in the specified sequence and false otherwise.

x not in y, here not in results in a 1 if x is not a member of sequence y.

is Evaluates to true if the variables on either side of the operator point to the same object and false otherwise.

x is y, here is results in 1 if id(x) equals id(y).

is not Evaluates to false if the variables on either side of the operator point to the same object and true otherwise.

x is not y, here is not results in 1 if id(x) is not equal to id(y).

We use the term object to refer to any entity in a python program.

Every object has an associated type, which determines the properties of the object.

Python defines six types of built-in objects:

Number : 10String : “hello”, ‘hi’List: [1, 17, 44]Tuple : (4, 5)Dictionary: {‘food’ : ‘something you eat’,

‘lobster’ : ‘an edible, undersea arthropod’}File

Each type of object has its own properties, which we will learn about in the next several weeks.

Types can be accessed by ‘type’ method

The usual suspects▪ 12, 3.14, 0xFF, 0377, (-1+2)*3/4**5, abs(x)

Integer – 1, 2, 3, 4. Also 1L, 2L Float – 1.0, 2.0, 3.0 3.2 Addition

▪ 1+2 ->3, 1+2.0 -> 3.0 Division

▪ 1/2 -> 0 , 1./2. -> 0.5, float(1)/2 -> 0.5 Long (arbitrary precision)

▪ 2**100 -> 1267650600228229401496703205376

Flexible arrays, mutable▪ a = [99, "bottles of beer", ["on", "the", "wall"]]

Same operators as for strings▪ a+b, a*3, a[0], a[-1], a[1:], len(a)

Item and slice assignment▪ a[0] = 98▪ a[1:2] = ["bottles", "of", "beer"]

-> [98, "bottles", "of", "beer", ["on", "the", "wall"]]▪ del a[-1] # -> [98, "bottles", "of", "beer"]

[0]*3 = [0,0,0]

>>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)Positive index: count from the left, starting with 0

>>> list_ex[1]‘abc’

Negative index: count from right, starting with –1>>> list_ex[-3] 4.56

>>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)

Return a copy of the container with a subset of the original members. Start copying at the first index, and stop copying before second.

>>> list_ex[1:4](‘abc’, 4.56, (2,3))Negative indices count from end

>>> list_ex[1:-1](‘abc’, 4.56, (2,3))

>>> list_ex = (23, ‘abc’, 4.56, (2,3), ‘def’)Omit first index to make copy starting from beginning of the container

>>> list_ex[:2] (23, ‘abc’)Omit second index to make copy starting at first index and going to end

>>> list_ex[2:](4.56, (2,3), ‘def’)

[ : ] makes a copy of an entire sequence>>> list_ex[:] (23, ‘abc’, 4.56, (2,3), ‘def’)

Note the difference between these two lines for mutable sequences

>>> l2 = l1 # Both refer to 1 ref, # changing one affects both>>> l2 = l1[:] # Independent copies, two refs

>>> li = [‘abc’, 23, 4.34, 23]>>> li[1] = 45 >>> li

[‘abc’, 45, 4.34, 23]

Name li still points to the same memory reference when we’re done.

a1 2 3

b

a1 2 3

b4

a = [1, 2, 3]

a.append(4)

b = a

a 1 2 3

>>> li = [1, 11, 3, 4, 5]

>>> li.append(‘a’) # Note the method syntax>>> li[1, 11, 3, 4, 5, ‘a’]

>>> li.insert(2, ‘i’)>>> li[1, 11, ‘i’, 3, 4, 5, ‘a’]

>>> print (li + [1,2,3])[1, 11, ‘i’, 3, 4, 5, ‘a’, 1, 2, 3]

+ creates a fresh list with a new memory refextend operates on list li in place.

>>> li.extend([9, 8, 7]) >>> li[1, 2, ‘i’, 3, 4, 5, ‘a’, 9, 8, 7]

Potentially confusing: extend takes a list as an argument. append takes a singleton as an argument.>>> li.append([10, 11, 12])>>> li[1, 2, ‘i’, 3, 4, 5, ‘a’, 9, 8, 7, [10, 11, 12]]

Lists have many methods, including index, count, remove, reverse, sort >>> li = [‘a’, ‘b’, ‘c’, ‘b’]>>> li.index(‘b’) # index of 1st occurrence1>>> li.count(‘b’) # number of occurrences2>>> li.remove(‘b’) # remove 1st occurrence>>> li [‘a’, ‘c’, ‘b’]

>>> a = range(5) # [0,1,2,3,4]>>> a.append(5) # [0,1,2,3,4,5]>>> a.pop() # [0,1,2,3,4]5>>> a.pop(2) # [0,1,2,3,4]2

>>> a.insert(0, 42) # [42,0,1,2,3,4]>>> a.pop(0) # [0,1,2,3,4]42>>> a.reverse() # [4,3,2,1,0]>>> a.sort() # [0,1,2,3,4]

>>> names[0]‘Ben'>>> names[1]‘Chen'>>> names[2]‘Yaqin'>>> names[3]Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>IndexError: list index out of rangeIndexError: list index out of range>>> names[-1]‘Yaqin'>>> names[-2]‘Chen'>>> names[-3]‘Ben'

[0] is the first item.[1] is the second item...

Out of range valuesraise an exception

Negative valuesgo backwards fromthe last element.

Names = [ ‘Ben’, ‘Chen’, ‘Yaqin’]

>>> ids = ["9pti", "2plv", "1crn"]>>> ids.append("1alm")>>> ids['9pti', '2plv', '1crn', '1alm']>>> del ids[0]>>> ids['2plv', '1crn', '1alm']>>> ids.sort()>>> ids['1alm', '1crn', '2plv']>>> ids.reverse()>>> ids['2plv', '1crn', '1alm']>>> ids.insert(0, "9pti")>>> ids['9pti', '2plv', '1crn', '1alm']

>>> names['ben', 'chen', 'yaqin']['ben', 'chen', 'yaqin']

>>> gender = [0, 0, 1][0, 0, 1]

>>> zip(names, gender)[('ben', 0), ('chen', 0), ('yaqin', 1)][('ben', 0), ('chen', 0), ('yaqin', 1)]

Can also be done using a for loop, but this is more efficientCan also be done using a for loop, but this is more efficient

The comma is the tuple creation operator, not parentheses>>> 1, -> (1,)

Python shows parentheses for clarity (best practice)>>> (1,) -> (1,)

Don't forget the comma!>>> (1) -> 1

Non-mutable lists. Elements can be accessed same way Empty tuples have a special syntactic form

>>> () -> ()>>> tuple() -> ()

>>> yellow = (255, 255, 0) # r, g, b>>> one = (1,)>>> one = (1,)>>> yellow[0]>>> yellow[1:](255, 0)>>> yellow[0] = 0Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>TypeError: 'tuple' object does not support item assignmentTypeError: 'tuple' object does not support item assignment

Very common in string interpolation:>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden",

57.7056)'Andrew lives in Sweden at latitude 57.7'

>>> t = (23, ‘abc’, 4.56, (2,3), ‘def’)>>> t[2] = 3.14

Traceback (most recent call last): File "<pyshell#75>", line 1, in -toplevel- tu[2] = 3.14TypeError: object doesn't support item assignment

You can’t change a tuple. You can make a fresh tuple and assign its reference to a

previously used name.>>> t = (23, ‘abc’, 3.14, (2,3), ‘def’)

The immutability of tuples means they’re faster than lists.

Lists slower but more powerful than tuples Lists can be modified, and they have lots of handy

operations and methods Tuples are immutable and have fewer features

To convert between tuples and lists use the list() and tuple() functions:li = list(tu)tu = tuple(li)

▪ "hello"+"world" "helloworld" # concatenation▪ "hello"*3 "hellohellohello" # repetition▪ "hello"[0] "h" # indexing▪ "hello"[-1] "o" # (from end)▪ "hello"[1:4] "ell" # slicing▪ len("hello") 5 # size▪ "hello" < "jello" 1 # comparison▪ "e" in "hello"1 # search▪ "escapes: \n etc, \033 etc, \if etc"▪ 'single quotes' "double quotes"

>>> smiles = "C(=N)(N)N.C(=O)(O)O">>> smiles[0]'C'>>> smiles[1]'('>>> smiles[-1]'O'>>> smiles[1:5]'(=N)'>>> smiles[10:-4]'C(=O)'

Boolean test whether a value is inside a container:>>> t = [1, 2, 4, 5]>>> 2 in tTrue>>> 4 not in tFalse

For strings, tests for substrings>>> a = 'abcde'>>> 'c' in aTrue>>> 'cd' in aTrue>>> 'cdf' in aFalse

The + operator produces a new tuple, list, or string whose value is the concatenation of its arguments.

>>> (1, 2, 3) + (4, 5, 6) (1, 2, 3, 4, 5, 6)

>>> [1, 2, 3] + [4, 5, 6] [1, 2, 3, 4, 5, 6]

>>> “Hello” + “ ” + “World” ‘Hello World’

The * operator produces a new tuple, list, or string that “repeats” the original content.

>>> (1, 2, 3) * 3(1, 2, 3, 1, 2, 3, 1, 2, 3)

>>> [1, 2, 3] * 3[1, 2, 3, 1, 2, 3, 1, 2, 3]

>>> “Hello” * 3‘HelloHelloHello’

smiles = "C(=N)(N)N.C(=O)(O)Osmiles = "C(=N)(N)N.C(=O)(O)O"">>> smiles.find("(O)")15

>>> smiles.find(".")9

>>> smiles.find(".", 10)-1

>>> smiles.split(".")['C(=N)(N)N', 'C(=O)(O)O']

Use “find” to find thestart of a substring.

Start looking at position 10.

Find returns -1 if it couldn’tfind a match.

Split the string into partswith “.” as the delimiter

if "Br" in “Brother”:print "contains brother“

>>> contains brother

email_address = “ravi”if "@" not in email_address:

email_address += "@latentview.com“

print email_address>>> [email protected]

>>> line = " # This is a comment line \n">>> line.strip()'# This is a comment line'>>> line.rstrip()' # This is a comment line'>>> line.rstrip("\n")' # This is a comment line '

email = [email protected]>>> email.startswith(“c")TrueTrue>>> email.endswith(“u")TrueTrue

>>> "%[email protected]" % "clin"'[email protected]''[email protected]'

>>> names = [“Ben", “Chen", “Yaqin"]>>> ", ".join(names)‘‘Ben, Chen, Yaqin‘Ben, Chen, Yaqin‘

>>> “chen".upper()‘‘CHEN‘CHEN‘

capitalize() -- Capitalizes first letter of string count(str, beg= 0,end=len(string))

Counts how many times str occurs in string. encode(encoding=’UTF-8′,errors=’strict’)

Returns encoded string version of string isalnum()

Returns true if string has at least 1 character isalpha()

Returns true if string has at least 1 character isdigit()

Returns true if string contains only digits and false otherwise

Strings are read only

>>> s = "andrew">>> s[0] = "A"Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>TypeError: 'str' object does not support item assignmentTypeError: 'str' object does not support item assignment

>>> s = "A" + s[1:]>>> s'Andrew‘

\n -> newline\t -> tab\\ -> backslash...But Windows uses backslash for directories!filename = "M:\nickel_project\reactive.smi" # DANGER!filename = "M:\\nickel_project\\reactive.smi" # Better!filename = "M:/nickel_project/reactive.smi" # Usually works

Dictionaries are lookup tables. Random, not sorted They map from a “key” to a “value”.

symbol_to_name = {"H": "hydrogen","He": "helium","Li": "lithium","C": "carbon","O": "oxygen","N": "nitrogen"}

Duplicate keys are not allowed Duplicate values are just fine Keys can be any immutable value

Hash tables, "associative arrays"▪ d = {"duck": "eend", "water": "water"}

Lookup:▪ d["duck"] -> "eend"▪ d["back"] # raises KeyError exception

Delete, insert, overwrite:▪ del d["water"] # {"duck": "eend", "back": "rug"}▪ d["back"] = "rug" # {"duck": "eend", "back": "rug"}▪ d["duck"] = "duik" # {"duck": "duik", "back": "rug"}

>>> symbol_to_name["C"]'carbon''carbon'>>> "O" in symbol_to_name, "U" in symbol_to_name(True, False)(True, False)>>> "oxygen" in symbol_to_nameFalse>>> symbol_to_name["P"]>>> symbol_to_name["P"]

Traceback (most recent call last):Traceback (most recent call last):File "<stdin>", line 1, in <module>File "<stdin>", line 1, in <module>KeyError: 'P'KeyError: 'P'>>> symbol_to_name.get("P", "unknown")'unknown''unknown'>>> symbol_to_name.get("C", "unknown")'carbon''carbon'

Keys, values, items:▪ d.keys() -> ["duck", "back"]▪ d.values() -> ["duik", "rug"]▪ d.items() -> [("duck","duik"), ("back","rug")]

Presence check:▪ d.has_key("duck") -> 1; d.has_key("spam") -> 0

Values of any type; keys almost any▪ {"name":"Guido", "age":43, ("hello","world"):1,

42:"yes", "flag": ["red","white","blue"]}

>>> symbol_to_name.keys()['C', 'H', 'O', 'N', 'Li', 'He']['C', 'H', 'O', 'N', 'Li', 'He']

>>> symbol_to_name.values()['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']['carbon', 'hydrogen', 'oxygen', 'nitrogen', 'lithium', 'helium']

>>> symbol_to_name.update( {"P": "phosphorous", "S": "sulfur"} )

>>> symbol_to_name.items()[('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'), [('C', 'carbon'), ('H', 'hydrogen'), ('O', 'oxygen'), ('N', 'nitrogen'),

('P', 'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', ('P', 'phosphorous'), ('S', 'sulfur'), ('Li', 'lithium'), ('He', 'helium')]'helium')]

>>> del symbol_to_name['C']>>> symbol_to_name{'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He': {'H': 'hydrogen', 'O': 'oxygen', 'N': 'nitrogen', 'Li': 'lithium', 'He':

'helium'}'helium'}

dict.clear()Removes all elements of dictionary dict

dict.fromkeys()Create a new dictionary with keys from seq and values set to value

dict.setdefault(key, default=None)Similar to get(), but will set dict[key]=default if key is not already in dict

dict.update(dict2)Adds dictionary dict2‘s key-values pairs to dict

Keys must be immutable: numbers, strings, tuples of immutables▪ these cannot be changed after creation

reason is hashing (fast lookup technique) not lists or other dictionaries▪ these types of objects can be changed "in place"

no restrictions on values Keys will be listed in arbitrary order

again, because of hashing

if condition: statements[elif condition: statements] ...else: statements

while condition: statements

for var in sequence: statements

breakcontinue

var2 = 0 if var2: print "2 - Got a true expression value" print var2 else: print "2 - Got a false expression value" print var2 print "Good bye!“

2 - Got a false expression value 0 Good bye!

var1 = 100 if var1: print "1 - Got a true expression value" print var1 else: print "1 - Got a false expression value" print var1

1 - Got a true expression value 100

count = 0 while (count < 9):

print 'The count is:', count count = count + 1

print "Good bye!“

The count is: 0 The count is: 1 The count is: 2 The count is: 3 The count is: 4 The count is: 5 The count is: 6 The count is: 7 The count is: 8 Good bye!

Error with the program

var = 1 while var == 1 :

# This constructs an infinite loop

num = raw_input("Enter a number :") print "You entered: ",

num print "Good bye!"

for i in range(20): if i%3 == 0: print i if i%5 == 0: print "Bingo!" print "---“

for letter in 'Python': # First Example print 'Current Letter :', letter

fruits = ['banana', 'apple', 'mango'] for fruit in fruits:

# Second Example print 'Current fruit :', fruit print "Good bye!"

def name(arg1, arg2, ...): """documentation""" # optional doc string statements

return # from procedurereturn expression # from function

def add():return 5

>>>print add()5

def functionname( parameters ): "function_docstring" function_suite return [expression]

def printme( str ): "This prints a passed string into this function" print strprint ‘\n’

# Now you can call printme function printme ("I'm first call to user defined function!")printme("Again second call to the same function")

def gcd(a, b): "greatest common divisor" while a != 0: a, b = b%a, a # parallel assignment return b

>>> gcd.__doc__'greatest common divisor'>>> gcd(12, 20)4

class name: "documentation" statements-or-class name(base1, base2, ...): ...Most, statements are method definitions: def name(self, arg1, arg2, ...): ...May also be class variable assignments

class Stack: "A well-known data structure…"

def __init__(self): # constructor self.items = []

def push(self, x): self.items.append(x) def pop(self): x = self.items[-1]

del self.items[-1] return x

def empty(self): return len(self.items) == 0 # Boolean result

To create an instance, simply call the class object:x = Stack() # no 'new' operator!

To use methods of the instance, call using dot notation:x.empty() # -> 1x.push(1) # [1]x.empty() # -> 0x.push("hello")# [1, "hello"]x.pop() # -> "hello" # [1]

To inspect instance variables, use dot notation:x.items # -> [1]

When a Python program starts it only has access to a basic functions and classes.

(“int”, “dict”, “len”, “sum”, “range”, ...) “Modules” contain additional functionality Use “import” to tell Python to load a module>>> import math #Used for mathematical functions>>> import nltk #Used for NLP operations

Importing modules: import re; print re.match("[a-z]+", s) from re import match; print match("[a-z]+", s)

Import with rename: import re as regex from re import match as m Before Python 2.0:▪ import re; regex = re; del re

>>> import math>>> math.pi3.14159265358979313.1415926535897931

>>> math.cos(0)1.01.0>>> math.cos(math.pi)-1.0-1.0>>> dir(math)

['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh',['__doc__', '__file__', '__name__', '__package__', 'acos', 'acosh','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','cosh', 'degrees', 'e', 'exp', 'fabs', 'factorial', 'floor', 'fmod','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','frexp', 'fsum', 'hypot', 'isinf', 'isnan', 'ldexp', 'log', 'log10','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan','tanh', 'trunc']'tanh', 'trunc']

>>> import mathmath.cosmath.cos>>> from math import cos, picos>>> from math import *

BeautifulSoup – A bit slow, but useful and easy for beginners

Numpy – Advanced mathematical functionalities SciPy – Algorithms and mathematical tools for python Matplotlib – plotting library. Excellent graphs Scrapy – Advanced Web scraping NLTK - Natural Language Toolkit (Pattern, TextBlob) IPython – Notebooks Scikit-learn – machine learning on Python Pandas – Gensim – Topic Modelling

http://www.catswhocode.com/blog/python-50-modules-for-all-needs



a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)

. (a period) -- matches any single character except newline '\n' \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-

zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.

\b -- boundary between word and non-word \s -- (lowercase s) matches a single whitespace character -- space, newline,

return, tab, form [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.

\t, \n, \r -- tab, newline, return \d -- decimal digit [0-9] (some older regex utilities do not support but \d, but

they all support \w and \s) ^ = start, $ = end -- match the start or end of the string \ -- inhibit the "specialness" of a character. So, for example, use \. to match a

period or \\ to match a slash. If you are unsure if a character has special meaning, such as '@', you can put a slash in front of it, \@, to make sure it is treated just as a character.

## Search for pattern 'iii' in string 'piiig'.

## All of the pattern must match, but it may appear anywhere ## On success, match.group() is matched text.

match = re.search(r'iii', 'piiig') => found, match.group() == "iii" match = re.search(r'igs', 'piiig') => not found, match == None

## . = any char but \n match = re.search(r'..g', 'piiig') => found, match.group() == "iig"

## \d = digi , \w = word match = re.search(r'\d\d\d', 'p123g') => found, match.group() == "123“

match = re.search(r'\w\w\w', '@@abcd!!') => found, match.group() == "abc"

+ -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's * -- 0 or more occurrences of the pattern to its left ? -- match 0 or 1 occurrences of the pattern to its left

## i+ = one or more i's, as many as possible. match = re.search(r'pi+', 'piiig') => found, match.group() == "piii"

## Finds the first/leftmost solution, and within it drives the + ## as far as possible (aka 'leftmost and largest'). ## In this example, note that it does not get to the second set of i's. match = re.search(r'i+', 'piigiiii') => found, match.group() == "ii"

## \s* = zero or more whitespace chars ## Here look for 3 digits, possibly separated by whitespace. match = re.search(r'\d\s*\d\s*\d', 'xx1 2 3xx') => found, match.group() == "1 2 3" match = re.search(r'\d\s*\d\s*\d', 'xx12 3xx') => found, match.group() == "12 3" match = re.search(r'\d\s*\d\s*\d', 'xx123xx') => found, match.group() == "123"

## ^ = matches the start of string, so this fails: match = re.search(r'^b\w+', 'foobar') => not found, match == None ## but without the ^ it succeeds: match = re.search(r'b\w+', 'foobar') => found, match.group() == "bar“

https://developers.google.com/edu/python/regular-expressions



def foo(x): return 1/x

def bar(x): try: print foo(x) except ZeroDivisionError, message: print "Can’t divide by zero:", message

bar(0)

Used for cleanup

f = open(file)try: process_file(f)finally: f.close()# always executed

print "OK" # executed on success only

Gives an idea where the error happened For example, in nested loops raise IndexError raise IndexError("k out of range") raise IndexError, "k out of range" try:

somethingexcept: # catch everything print "Oops" raise # reraise

“w” = “write mode” “a” = “append mode” “wb” = “write in binary” “r” = “read mode” (default) “rb” = “read in binary” “U” = “read files with Unix or Windows line endings”

>>> f = open(“names.txt")>>> f.readline()‘‘Ravi\n‘Ravi\n‘>>> f.read()‘‘Ravi\n‘Ravi\n‘‘‘Shankar\n’Shankar\n’

>>> lst= [ x for x in open("text.txt","r").readlines() ]>>> lst['Chen Lin\n', '[email protected]\n', 'Volen 110\n', 'Office ['Chen Lin\n', '[email protected]\n', 'Volen 110\n', 'Office

Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', Hour: Thurs. 3-5\n', '\n', 'Yaqin Yang\n', '[email protected]\n', 'Volen 110\n', 'Offiche Hour: '[email protected]\n', 'Volen 110\n', 'Offiche Hour: Tues. 3-5\n']Tues. 3-5\n']

input_file = open(“in.txt")output_file = open(“out.txt", "w")for line in input_file:

output_file.write(line)

>>> import csv >>> with open('eggs.csv', 'rb') as csvfile: ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') ... for row in spamreader: ... print ', '.join(row) Spam, Spam, Spam, Spam, Spam, Baked Beans Spam, Lovely Spam, Wonderful Spam

>>> with open('eggs.csv', 'wb') as csvfile: spamwriter = csv.writer(csvfile) spamwriter.writerow(['Spam'] * 5 + ['Baked Beans']) spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

>>> w = csv.writer(open(Fn,'ab'),dialect='excel')

Python is a great language for NLP: Simple Easy to debug:▪ Exceptions▪ Interpreted language

Easy to structure▪ Modules▪ Object oriented programming

Powerful string manipulation

Here is a description from the NLTK official site:

“NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.”

A software package for manipulating linguistic data and performing NLP tasks

Advanced tasks are possible from an early stage Permits projects at various levels Consistent interfaces Facilitates reusability of modules Implemented in Python

The token class to encode information about NL texts. Each token instance represents a unit of text such as a

word, a text, or a document. A given instance is defined by a partial mapping from

property names to property values.

TreebankWordTokenizer WordPunctTokenizer PunctWordTokenizer WhitespaceTokenizer

>>> import nltk>>> text = nltk.word_tokenize(“Dive into NLTK : Part-of-speech tagging and POS Tagger”)>>> text[‘Dive’, ‘into’, ‘NLTK’, ‘:’, ‘Part-of-speech’, ‘tagging’, ‘and’, ‘POS’, ‘Tagger’]>>> nltk.pos_tag(text)[(‘Dive’, ‘JJ’), (‘into’, ‘IN’), (‘NLTK’, ‘NNP’), (‘:’, ‘:’), (‘Part-of-speech’, ‘JJ’), (‘tagging’, ‘NN’), (‘and’, ‘CC’), (‘POS’, ‘NNP’), (‘Tagger’, ‘NNP’)]

In linguistic morphology and information retrieval, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

Stemming programs are commonly referred to as stemming algorithms or stemmers.

Helpful in reducing the vocabulary in large scale operations

>>> from nltk.stem.porter import PorterStemmer>>> porter_stemmer = PorterStemmer()>>> porter_stemmer.stem(‘maximum’)u’maximum’>>> porter_stemmer.stem(‘presumably’)u’presum’>>> porter_stemmer.stem(‘multiply’)u’multipli’>>> porter_stemmer.stem(‘provision’)u’provis’>>> porter_stemmer.stem(‘owed’)u’owe’>>> porter_stemmer.stem(‘ear’)u’ear’>>> porter_stemmer.stem(‘saying’)u’say’

>>> from nltk.stem.lancaster import LancasterStemmer>>> lancaster_stemmer = LancasterStemmer()>>> lancaster_stemmer.stem(‘maximum’)‘maxim’>>> lancaster_stemmer.stem(‘presumably’)‘presum’>>> lancaster_stemmer.stem(‘presumably’)‘presum’>>> lancaster_stemmer.stem(‘multiply’)‘multiply’>>> lancaster_stemmer.stem(‘provision’)u’provid’>>> lancaster_stemmer.stem(‘owed’)‘ow’>>> lancaster_stemmer.stem(‘ear’)‘ear’

>>> from nltk.stem import SnowballStemmer>>> snowball_stemmer = SnowballStemmer(“english”)>>> snowball_stemmer.stem(‘maximum’)u’maximum’>>> snowball_stemmer.stem(‘presumably’)u’presum’>>> snowball_stemmer.stem(‘multiply’)u’multipli’>>> snowball_stemmer.stem(‘provision’)u’provis’>>> snowball_stemmer.stem(‘owed’)u’owe’>>> snowball_stemmer.stem(‘ear’)u’ear’

Lemmatisation (or lemmatization) in linguistics, is the process of grouping together the different inflected forms of a word so they can be analysed as a single item.

Lemmatisation is closely related to stemming. The difference is that a stemmer operates on a single word without knowledge of the context, and therefore cannot discriminate between words which have different meanings depending on part of speech. However, stemmers are typically easier to implement and run faster, and the reduced accuracy may not matter for some applications.

>>> from nltk.stem import WordNetLemmatizer>>> wordnet_lemmatizer = WordNetLemmatizer()>>> wordnet_lemmatizer.lemmatize(‘dogs’)u’dog’>>> wordnet_lemmatizer.lemmatize(‘churches’)u’church’>>> wordnet_lemmatizer.lemmatize(‘aardwolves’)u’aardwolf’>>> wordnet_lemmatizer.lemmatize(‘abaci’)u’abacus’>>> wordnet_lemmatizer.lemmatize(‘hardrock’)‘hardrock’>>> wordnet_lemmatizer.lemmatize(‘are’)‘are’>>> wordnet_lemmatizer.lemmatize(‘is’)‘is’

Stanford POS Tagger Stanford Named Entity Recognizer

english_nertagger.tag(‘Rami Eid is studying at Stony Brook University in NY’.split())Out[3]:[(u’Rami’, u’PERSON’),(u’Eid’, u’PERSON’),(u’is’, u’O’),(u’studying’, u’O’),(u’at’, u’O’),(u’Stony’, u’ORGANIZATION’),(u’Brook’, u’ORGANIZATION’),(u’University’, u’ORGANIZATION’),(u’in’, u’O’),(u’NY’, u’O’)]

Stanford Parser

Stanford Parser english_parser.raw_parse_sents((“this is the english parser test”, “the parser is from

stanford parser”))Out[4]:[[u’this/DT is/VBZ the/DT english/JJ parser/NN test/NN’],[u'(ROOT’,u’ (S’,u’ (NP (DT this))’,u’ (VP (VBZ is)’,u’ (NP (DT the) (JJ english) (NN parser) (NN test)))))’],

Part of Speech Tagging Parsing Word Net Named Entity

Recognition Information Retrieval Sentiment Analysis Document Clustering Topic Segmentation

Authoring Machine Translation Summarization Information Extraction Spoken Dialog Systems Natural Language

Generation Word Sense

Disambiguation

Det Noun Verb Prep Adj

A 0.9 0.1 0 0 0

man 0 0.6 0.2 0 0.2

walks 0 0.2 0.8 0 0

into 0 0 0 1 0

bar 0 0.7 0.3 0 0

Introduction to Python Language and Data Types

Education

Transcript of Introduction to Python Language and Data Types