Lecture05

Knowledge Representation in Digital Humanities Antonio Jiménez Mavillard Department of Modern Languages and Literatures Western University

Transcript of Lecture05

Page 1: Lecture05

Knowledge Representation in Digital Humanities

Antonio Jiménez Mavillard

Department of Modern Languages and Literatures

Western University

Page 2: Lecture05

Lecture 5

* Contents:
  1. Why this lecture?
  2. Discussion
  3. Chapter 5
  4. Assignment
  5. Bibliography

Page 3: Lecture05

Why this lecture?

* This lecture...
  · goes deeper into the development of programming skills
  · introduces strings as a means of text representation, the subject of study for the rest of the course

Page 4: Lecture05

Last assignment discussion

* Time to...
  · consolidate the ideas and concepts covered in the readings
  · discuss issues that arose in the specific solutions to the projects

Page 5: Lecture05

Chapter 5

Text Representation in Python

1. More programming in Python
2. Complex data types

Page 6: Lecture05

Chapter 5

1 More programming in Python
  1.1 Functions
  1.2 Basic data types

Page 7: Lecture05

Chapter 5

2 Complex data types
  2.1 Strings

Page 8: Lecture05

More programming in Python


Page 9: Lecture05

Functions

* Debugging
  · Syntax errors
    + missing colon at the end of the def line
    + incorrect indentation inside the def body
  · Logic errors
    + infinite recursion
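As a hedged illustration of the infinite-recursion logic error listed above, the sketch below contrasts a recursive function that lacks a base case with one that has it. The names total_broken and total are hypothetical (they do not come from the lecture), and the code is written to run under both Python 2.7 and Python 3.

```python
def total_broken(n):
    # Logic error: no base case, so the function never stops calling itself.
    return n + total_broken(n - 1)


def total(n):
    # Base case: stop when there is nothing left to add.
    if n <= 0:
        return 0
    # Recursive case: n plus the sum of the numbers below it.
    return n + total(n - 1)


# Python interrupts runaway recursion by raising a RuntimeError
# (in Python 3 the specific subclass is RecursionError).
try:
    total_broken(3)
    stopped = False
except RuntimeError:
    stopped = True

print(stopped)   # True
print(total(3))  # 6
```

The syntax is valid in both versions of the function; only the missing base case separates the broken one from the working one, which is why this counts as a logic error rather than a syntax error.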

Page 10: Lecture05

Functions

* Definition
  · A function is a named sequence of statements that performs a task
  · To use a function:
    1. Define it
    2. Call it

Page 11: Lecture05

Functions

* Definition
  · Syntax:
    + Definition

def function_name(parameters):  # definition
    statements                  # body

Page 12: Lecture05

Functions

* Definition
  · Syntax:
    + Call

function_name(arguments)

Page 13: Lecture05

Functions

* Arguments vs parameters
  · The arguments are the values passed in the function call
  · The arguments are assigned to the parameters, the names listed in the function definition

Page 14: Lecture05

Functions

* Arguments vs parameters
  · Example:

# print version
def mean(x, y):
    print (x + y) / 2

In [1]: from mean1 import mean

In [2]: mean(2, 4)
3

Page 15: Lecture05

Functions

* Arguments vs parameters
  · Example:
    + The parameter x takes the value of the first argument, 2
    + The parameter y takes the value of the second argument, 4
    + The function calculates (2 + 4) / 2 and prints the result

Page 16: Lecture05

Functions

* Scope
  · A block is a section of code consisting of one or more statements grouped together
  · Examples: the branches of if statements, the code inside for and while loops, the bodies of functions...

Page 17: Lecture05

Functions

* Scope
  · Variables created in a function are local to the function and do not exist outside it
  · Two ways to communicate with the exterior:
    + arguments (input)
    + return statement (output)
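A minimal sketch of the scope rule above, assuming a variant of mean that stores its value in a local variable first. The names result and visible are illustrative only, and the code is written to run under both Python 2.7 and Python 3.

```python
def mean(x, y):
    # 'result' is local: it exists only while mean is running.
    result = (x + y) / 2.0
    return result


# The argument values go in; the return value comes out.
m = mean(2, 4)

# Outside the function the local name is unknown, so using it raises NameError.
try:
    result
    visible = True
except NameError:
    visible = False

print(m)        # 3.0
print(visible)  # False
```

Dividing by 2.0 rather than 2 is a choice made for this sketch so that the result is the same float in both Python versions.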

Page 18: Lecture05

Functions

* Scope
  · Example:

# print version (mean1.py)
def mean(x, y):
    print (x + y) / 2

# return version (mean2.py)
def mean(x, y):
    return (x + y) / 2

In [1]: from mean1 import mean

In [2]: m = mean(2, 4)
3

In [3]: m

In [4]: from mean2 import mean

In [5]: m = mean(2, 4)

In [6]: m
Out[6]: 3

Page 19: Lecture05

Functions

* Exercise 1
  · An integer y is a divisor of the integer x if the remainder of the division x / y equals 0
  · An integer is prime if it is greater than 1 and has no divisors other than 1 and itself

Page 20: Lecture05

Functions

* Exercise 1
  · Write a function that prints the list of prime numbers less than 100

Page 21: Lecture05

Functions

* Exercise 1 (solution)

# prime numbers
def prime_list(n):
    # Print every prime number from 1 up to n
    i = 1
    while i <= n:
        if is_prime(i):
            print i
        i = i + 1

Page 22: Lecture05

Functions

* Exercise 1 (solution)

def is_prime(n):
    # By definition, a prime number is greater than 1
    if n <= 1:
        return False
    result = True
    # Look for a divisor other than 1 and n itself
    i = 2
    while i < n:
        if is_divisor(i, n):
            result = False
            break
        i = i + 1
    return result

Page 23: Lecture05

Functions

* Exercise 1 (solution)

def is_divisor(x, y):
    # True if x divides y with remainder 0
    return y % x == 0

Page 24: Lecture05

Functions

* Exercise 1 (solution)

In [1]: from prime import prime_list

In [2]: prime_list(20)
2
3
5
7
11
13
17
19

Page 25: Lecture05

Functions

* About functions
  · There are predefined functions ready to be used
  · Programmers can define new functions
  · A function can call other functions

Page 26: Lecture05

Functions

* Why functions?
  · Functions are reusable, so they make a program shorter by eliminating repetitive code
  · Long programs divided into functions are easier to write, read, understand and debug

Page 27: Lecture05

References

Downey, Allen. “Chapter 3: Functions.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.


Page 28: Lecture05

Basic data types

* Debugging
  · Logic errors
    + mistaking the data type of a variable

Page 29: Lecture05

Basic data types

* int
  · Type for integers
  · Examples: 1, 1234567890
* long
  · Type for long integers
  · Examples: 10**1000 (a one followed by a thousand zeros)

Page 30: Lecture05

Basic data types

* float
  · Type for floating-point numbers
  · Examples: 1.0, 3.1416
* bool
  · Type for logical values
  · Examples: True, False

Page 31: Lecture05

Basic data types

* The type function
  · Returns the type of a value, variable or expression

In [1]: type(1)
Out[1]: int

In [2]: x = 10**1000 + 1

In [3]: type(x)
Out[3]: long

Page 32: Lecture05

Basic data types

* The type function
  · Returns the type of a value, variable or expression

In [4]: y = 3.1 + 2.21

In [5]: type(y)
Out[5]: float

In [6]: type(x == y)
Out[6]: bool

Page 33: Lecture05

Basic data types

* Type conversion functions
  · int: converts to int (if possible)

In [1]: int("123")
Out[1]: 123

In [2]: int(3.1416)
Out[2]: 3

Page 34: Lecture05

Basic data types

* Type conversion functions
  · float: converts to float (if possible)

In [1]: float(123)
Out[1]: 123.0

In [2]: float('3.1416')
Out[2]: 3.1416

Page 35: Lecture05

Basic data types

* Type conversion functions
  · bool: converts to bool (if possible)

In [1]: bool([1, 2, 3])
Out[1]: True

In [2]: bool(0)
Out[2]: False

Page 36: Lecture05

Basic data types

* Type conversion functions
  · str: converts to str (if possible)

In [1]: str(123)
Out[1]: '123'

In [2]: str(not True)
Out[2]: 'False'
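The "(if possible)" qualifier on the conversion functions can be made concrete with a short sketch. The helper to_int is hypothetical and not part of the lecture; it relies on the fact that int raises ValueError when a string does not look like an integer. The code is written to run under both Python 2.7 and Python 3.

```python
def to_int(text):
    # int raises ValueError when the string is not a valid integer literal.
    try:
        return int(text)
    except ValueError:
        return None


print(to_int("123"))     # 123
print(to_int("abc"))     # None
print(to_int("3.1416"))  # None: a decimal string is not an integer literal

# Converting in two steps works: first to float, then truncating to int.
print(int(float("3.1416")))  # 3
```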

Page 37: Lecture05

References

“5. Built-in Types — Python v2.7.6 Documentation.” N. p., n.d. Web. 17 Feb. 2014.


Page 38: Lecture05

Complex data types


Page 39: Lecture05

Strings

* Debugging
  · Syntax errors
    + not closing '' or ""
  · Semantic errors
    + skipping the first and/or last element (off-by-one errors)

Page 40: Lecture05

Strings

* Debugging
  · Logic errors
    + modifying an element
    + accessing a non-existent element
      - index out of range

Page 41: Lecture05

Strings

* str
  · Type for strings
  · Examples: 'hello world!', "hello world!"
  · A string is a sequence of characters
  · Suitable for representing texts

Page 42: Lecture05

Strings

* Indices
  · Three ways to access a string:
    + As a whole
      - Example: word
    + Its characters one at a time
      - Syntax: string[index]
      - Example: word[1]

Page 43: Lecture05

Strings

* Indices
  · Three ways to access a string:
    + Slices
      - Syntax: string[index_1:index_2]
      - Example: word[2:5]

Page 44: Lecture05

Strings

* Exercise 2
  · Figure out the range of indices for a string
  · Try out several examples
  · Extract a general pattern for any string

Page 45: Lecture05

Strings

* Exercise 2 (solution)

In [1]: s = 'digital'

In [2]: s[1]
Out[2]: 'i'

In [3]: s[0]
Out[3]: 'd'

In [4]: s[7]
IndexError: string index out of range

In [5]: s[6]
Out[5]: 'l'

Page 46: Lecture05

Strings

* Exercise 2 (solution)

In [6]: s = 'humanities'

In [7]: s[1]
Out[7]: 'u'

In [8]: s[0]
Out[8]: 'h'

In [9]: s[10]
IndexError: string index out of range

In [10]: s[9]
Out[10]: 's'

Page 47: Lecture05

Strings

* Exercise 2 (solution)
  · The valid indices run from 0 to the number of characters in the string minus 1

Page 48: Lecture05

Strings

* Indices
  · word = 'digital'

d i g i t a l
0 1 2 3 4 5 6

In [1]: word = 'digital'

In [2]: word
Out[2]: 'digital'

In [3]: word[1]
Out[3]: 'i'

In [4]: word[2:5]
Out[4]: 'git'
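The session above shows that word[2:5] returns 'git', that is, the characters at indices 2, 3 and 4 but not 5. The following sketch spells out that rule and a few related cases. The variable names are illustrative, and the code is written to run under both Python 2.7 and Python 3.

```python
word = 'digital'

# The slice word[i:j] starts at index i and stops just before index j,
# so it contains j - i characters.
middle = word[2:5]

# Omitting an index means "from the start" or "up to the end".
first_three = word[:3]
rest = word[3:]

# The last valid index is len(word) - 1.
last = word[len(word) - 1]

print(middle)       # git
print(first_three)  # dig
print(rest)         # ital
print(last)         # l
```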

Page 49: Lecture05

Strings

* Immutability
  · Strings are immutable (cannot be modified)
  · To change a string, the modified value must be reassigned to a new (or the same) variable

Page 50: Lecture05

Strings

* Immutability
  · Example: word += 's'
    + Equivalent to word = word + 's'
    + Accesses the value of the variable word, concatenates an s, and reassigns the result to the variable word
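A small sketch of what immutability means in practice: assigning to a single character fails, while building a new string and reassigning it works. The variable names modified and capitalized are illustrative, and the code is written to run under both Python 2.7 and Python 3.

```python
word = 'digital'

# Trying to change one character in place raises TypeError.
try:
    word[0] = 'D'
    modified = True
except TypeError:
    modified = False

# Instead, build a new string from pieces of the old one.
capitalized = 'D' + word[1:]

# word += 's' creates a new string and rebinds the name word to it.
word += 's'

print(modified)     # False
print(capitalized)  # Digital
print(word)         # digitals
```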

Page 51: Lecture05

Strings

* Some functions and operators
  · The len function returns the number of characters in a string

In [1]: len('digital')
Out[1]: 7

Page 52: Lecture05

Strings

* Exercise 3
  · Write a function that receives a string and returns its number of characters (use a for loop, not the len function)

Page 53: Lecture05

Strings

* Exercise 3 (solution)

def count(s):
    # Add 1 for every character visited by the loop
    counter = 0
    for c in s:
        counter += 1
    return counter

Page 54: Lecture05

Strings

* Exercise 4
  · What does this function do?

def any_function(string, char):
    result = -1
    index = 0
    while index < len(string):
        if string[index] == char:
            result = index
            break
        index += 1
    return result

Page 55: Lecture05

Strings

* Exercise 4 (solution)
  · It returns the index of the first occurrence of a character in a string, or -1 if the character is not found
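For single characters, the behaviour described in this solution matches the built-in string method find. The sketch below repeats the function from Exercise 4 under the hypothetical name find_char and compares the two. It is written to run under both Python 2.7 and Python 3.

```python
def find_char(string, char):
    # Same logic as the function in Exercise 4, under a descriptive name.
    result = -1
    index = 0
    while index < len(string):
        if string[index] == char:
            result = index
            break
        index += 1
    return result


print(find_char('digital', 'i'))  # 1 (first of the two occurrences)
print('digital'.find('i'))        # 1
print(find_char('digital', 'z'))  # -1
print('digital'.find('z'))        # -1
```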

Page 56: Lecture05

Strings

* Exercise 5
  · Write a function that counts the number of occurrences of a character in a string

Page 57: Lecture05

Strings

* Exercise 5 (solution)

def count(s, ch):
    # Add 1 for every character of s that equals ch
    counter = 0
    for c in s:
        if c == ch:
            counter += 1
    return counter

Page 58: Lecture05

Strings

* Some functions and operators
  · The in operator checks whether a string is contained in another string

In [1]: s = 'abcde'

In [2]: 'bc' in s
Out[2]: True

In [3]: 'rs' in s
Out[3]: False

Page 59: Lecture05

References

Downey, Allen. “Chapter 8: Strings.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.


Page 60: Lecture05

Assignment

* Assignment 5: Lexicon
  · Readings
    + Word play (Think Python)
    + Files (Think Python)

Page 61: Lecture05

Assignment

* Assignment 5: Lexicon
  · Project
    + Grady Ward, as part of the Moby lexicon project, has collected a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games

Page 62: Lecture05

Assignment

* Assignment 5: Lexicon
  · Project
    + Download a copy of the word list from http://thinkpython.com/code/words.txt

Page 63: Lecture05

Assignment

* Assignment 5: Lexicon
  · Project
    + Many English words have endings (suffixes) that identify them as nouns
    + Some suffixes common to nouns are (non-exhaustive list):

Page 64: Lecture05

Assignment

* Assignment 5: Lexicon
  · Project
    -age, -ance, -ant, -cy, -dom, -ee, -ence, -ent, -er, -hood, -ing, -ism, -ist, -ity, -ment, -ness, -or, -ry, -ship, -sion, -tion, -tude

Page 65: Lecture05

Assignment

* Assignment 5: Lexicon
  · Project
    + Write a program that reads the file words.txt and:
      - prints all the nouns (according to the previous list)
      - prints the number of nouns
      - prints the total number of words

Page 66: Lecture05

Assignment

* Assignment 5: Lexicon
  · Project
    + Write a program that reads the file words.txt and:
      - calculates and prints the proportion of nouns relative to the total number of words (expressed as a percentage with decimals)
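One possible starting point for the suffix test and the percentage, offered as a sketch rather than a full solution: it works on a tiny hypothetical sample list instead of reading words.txt, and the names NOUN_SUFFIXES, looks_like_noun and noun_percentage are assumptions made for this illustration. It uses the built-in str.endswith, which accepts a tuple of suffixes, and is written to run under both Python 2.7 and Python 3.

```python
# Suffixes copied from the assignment's (non-exhaustive) list.
NOUN_SUFFIXES = ('age', 'ance', 'ant', 'cy', 'dom', 'ee', 'ence', 'ent',
                 'er', 'hood', 'ing', 'ism', 'ist', 'ity', 'ment', 'ness',
                 'or', 'ry', 'ship', 'sion', 'tion', 'tude')


def looks_like_noun(word):
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return word.endswith(NOUN_SUFFIXES)


def noun_percentage(words):
    # Count the words whose ending matches one of the suffixes.
    nouns = 0
    for word in words:
        if looks_like_noun(word):
            nouns += 1
    # Multiplying by 100.0 keeps the decimals in Python 2 as well.
    return 100.0 * nouns / len(words)


# Hypothetical sample; the real assignment reads its words from words.txt.
sample = ['freedom', 'happiness', 'run', 'quickly', 'teacher']
print(noun_percentage(sample))  # 60.0
```

The suffix test is only a heuristic: a word such as 'sing' ends in -ing without being a noun, so the counts obtained this way are approximate.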

Page 67: Lecture05

References

Downey, Allen. “Chapter 14: Files.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.

Downey, Allen. “Chapter 9: Case Study - Word Play.” Think Python. Sebastopol, CA: O’Reilly, 2012. Print.

“Moby Project.” Wikipedia, the free encyclopedia 19 Jan. 2014. Wikipedia. Web. 20 Feb. 2014.


Page 68: Lecture05

Bibliography

“5. Built-in Types — Python v2.7.6 Documentation.” N. p., n.d. Web. 17 Feb. 2014.

Downey, Allen. Think Python. Sebastopol, CA: O’Reilly, 2012. Print.

“Moby Project.” Wikipedia, the free encyclopedia 19 Jan. 2014. Wikipedia. Web. 20 Feb. 2014.
