Week4, Homework #5 -...
Carnegie Mellon Worcester Polytechnic Institute
Homework #5
Professor Hugh C. Lauer CS-1004 — Introduction to Programming for Non-Majors
(Slides include materials from Python Programming: An Introduction to Computer Science, 2nd edition, by John Zelle and copyright notes by Prof. George Heineman of Worcester Polytechnic Institute)
Homework #5 CS-1004, A-Term 2015 1
Assignment — HW5
Read one or more files of English text
Create a list of the unique words that occur in those files
With a count of the number of occurrences of each word
Sorted alphabetically
Write that list to another file
Objectives
Become familiar with working with strings, lists, and files
Learn how to sort a list
Learn how to read from and write to files
Learn how to create formatted output
Your biggest, most advanced Python program to date
Due Friday, October 2, 6:00 PM
Strongly encouraged to work in 2-person teams
Send e-mail to [email protected] if you would like help in finding a partner
Note
This is a common assignment in C and C++ language courses
Done differently there
Usually with a data structure called a binary tree
Note 2
§11.6.3 of textbook shows a solution using Python dictionaries
Somewhat simpler
NOT PERMITTED FOR THIS ASSIGNMENT!
Structure for HW5
Three modules plus wrapper
Primary modules
1. Open input file, scan for words, strip punctuation, etc.
2. Accumulate words from multiple files, eliminate duplicates, count
3. Write output file in required format
Wrapper
Manage other modules
Prompt user for file names, etc.
(Extra credit) interpret command line arguments
Test parts of program
Example — Gettysburg address
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our ….
Example output — Gettysburg address
7 a
1 above
1 add
1 advanced
1 ago
1 all
...
1 task
1 testing
13 that
11 the
...
1 work
1 world
1 years
------------
138 Distinct words
Requirement
Read one or more input files
Break into individual words
Remove punctuation between words …
… but not within words
Example “But, in a larger sense, we can not dedicate --”
Slide callouts mark the punctuation between words: "Remove these", "And this", "And this"
Requirement
Read one or more input files
Break into individual words
Remove punctuation between words …
… but not within words
Example “Bob’s hard-hearted attitude was his undoing”
Slide callouts mark the apostrophe in “Bob’s” and the hyphen in “hard-hearted”: "But not this", "Or this"
How to read lines from a file
f = open(filename, mode)
filename is a string
Relative to current directory!
mode should be 'r' (i.e., read)
for line in f:
    # process line here
f.close() # finished with file!
Each line is a string ending in '\n'
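As a minimal sketch of the pattern above: the helper name `read_lines` and the file `sample.txt` are made up for illustration, and the sketch uses a `with` block in place of the slide's explicit `f.close()`, since `with` closes the file automatically.

```python
import os
import tempfile

def read_lines(filename):
    """Return the lines of a text file as a list of strings."""
    lines = []
    # 'with' closes the file automatically, even if an error occurs.
    with open(filename, 'r') as f:
        for line in f:
            lines.append(line)  # each line still ends in '\n'
    return lines

# Demo on a small throwaway file (hypothetical name and contents).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'sample.txt')
with open(demo_path, 'w') as out:
    out.write('Four score and seven\nyears ago\n')

print(read_lines(demo_path))
```

The explicit `open()` / `close()` form shown on the slide works equally well; the only difference is who is responsible for closing the file.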
Extracting words from string
Let line be the string
'brought forth on this continent, a new nation,\n' (without the enclosing quotes)
Then line.split() returns the list:–
['brought', 'forth', 'on', 'this', 'continent,', 'a', 'new', 'nation,']
I.e., partitioned at white-space
Definition of white-space: space, tab, line feed, return, form feed, and vertical tab
See Python documentation > Python standard library > Text, §6.1
Note the commas still attached to 'continent,' and 'nation,'
Note: the split() method is more general; given an argument, it splits at that separator string instead of at white-space
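The behavior of split() can be checked in a few lines. This is only an illustration; the variable names and the second sample string are made up.

```python
line = 'brought forth on this continent, a new nation,\n'

# With no argument, split() partitions at runs of white-space
# and discards the trailing '\n'.
words = line.split()
print(words)

# With an argument, split() partitions at that separator string instead.
fields = 'a,b,,c'.split(',')
print(fields)
```

Note that splitting at ',' keeps the empty string between the two adjacent commas, whereas splitting at white-space never produces empty strings.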
Questions?
How to get rid of punctuation
line.strip() method
Also line.rstrip(), line.lstrip()
Argument is a string of the characters to remove …
… from the leading and trailing ends only!
Example: let words[4] be 'continent,'
Then
words[4].strip('.,;:-?!')
returns a new string with these characters stripped from the ends — i.e.,
'continent'
However, "Bob’s".strip('.,;:-?!')
returns
"Bob’s"
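A short illustration of strip(), lstrip(), and rstrip() with the same punctuation string; the constant name `PUNCTUATION` and the sample strings are assumptions made for this sketch.

```python
PUNCTUATION = '.,;:-?!'

# strip() removes the listed characters from both ends only.
stripped = 'continent,'.strip(PUNCTUATION)
print(stripped)

# The apostrophe is inside the word (and not in PUNCTUATION), so nothing changes.
possessive = "Bob's".strip(PUNCTUATION)
print(possessive)

# lstrip() and rstrip() work on one end each.
left_only = '--hello!'.lstrip(PUNCTUATION)
right_only = '--hello!'.rstrip(PUNCTUATION)
print(left_only, right_only)
```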
Note
split() first, then strip()!
I.e., break into words with punctuation first, …
… then remove the punctuation from ends of words, …
… leaving contractions, possessives, and hyphenated words intact!
§11.6.3 does strip() first, then split()
Loses internal hyphens and apostrophes!
Produces many non-words, e.g., 's', 'snt', 't', 've'
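The split-first, strip-second order might be sketched as follows. The function name `words_from_line` and the punctuation set are assumptions, not part of the assignment handout, and the sketch leaves capitalization untouched.

```python
PUNCTUATION = '.,;:-?!'

def words_from_line(line):
    """Split a line at white-space, then strip punctuation from each end."""
    words = []
    for token in line.split():
        word = token.strip(PUNCTUATION)
        if word:  # a token such as '--' strips to the empty string
            words.append(word)
    return words

print(words_from_line('But, in a larger sense, we can not dedicate --\n'))
print(words_from_line("Bob's hard-hearted attitude was his undoing"))
```

Because punctuation is removed only from the ends of each token, the apostrophe in "Bob's" and the hyphen in "hard-hearted" survive, while a free-standing "--" disappears entirely.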
Questions?
You should have enough to read file and split into list (or lists) of words!
One module of your homework project!
What next?
Collect all words from all files into one list
Sort the list
Using list.sort() method
Sorts in place!
Result:
Same words
Lots of duplicates: 'is', 'and', 'the', …
Need to design an algorithm to …
… loop through the list
… for each repeated word, increment count
… for each new word, emit word & count to new list
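A small check of what "sorts in place" means, using a made-up word list: sort() rearranges the existing list and returns None, so its result should not be assigned back to the list variable.

```python
words = ['the', 'that', 'task', 'the', 'that', 'the']

# sort() rearranges the list itself and returns None.
result = words.sort()
print(result)

# After sorting, duplicates sit next to each other.
print(words)
```

Having the duplicates adjacent is what makes a single pass through the list enough to count them.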
Suggestion
Make a list of pairs
I.e., (count, string)
No duplicate strings!
E.g., [ ...,
(1, 'task'),
(1, 'testing'),
(13, 'that'),
(11, 'the'),
...
]
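One possible way to build such a list of pairs from an already sorted word list, without dictionaries, is sketched below. The function name `count_sorted_words` is hypothetical and this is only one of several workable designs.

```python
def count_sorted_words(sorted_words):
    """Collapse a sorted list of words into a list of (count, word) pairs."""
    pairs = []
    if not sorted_words:
        return pairs
    current = sorted_words[0]
    count = 1
    for word in sorted_words[1:]:
        if word == current:
            count += 1  # repeated word: just count it
        else:
            pairs.append((count, current))  # new word: emit the previous one
            current = word
            count = 1
    pairs.append((count, current))  # emit the final word
    return pairs

words = ['the', 'that', 'task', 'the', 'that', 'the']
words.sort()
print(count_sorted_words(words))
```

The final append after the loop is easy to forget: the last word is never followed by a "new" word, so it has to be emitted separately.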
This is second module!
Short but challenging!
Third Module
Format output and write to file
Will discuss next time!
Questions?
Command Lines
Windows, Macintosh, and Linux all have “command prompt” windows
Command line format:–
verb arg1 arg2 arg3 ...
verb is name of a program that carries out command action
Each arg is a string
Delimited by spaces
arg0 is the verb!
Meaning:
Apply verb to the list of arguments
Don’t return till finished!
Operating System’s Responsibility
Pick apart command line
Create a list of strings called “argv”
Number of items in list is “argc”
Load the program named verb (i.e., arg0) into a clean memory space.
Call the function with the name main(), passing argc and argv as arguments
Wait till it returns, continue with next command line
Starting programs in a GUI
User “opens” a file or document
OS or window manager consults list of file types
Finds program that opens the type of this file or document
Based on “extension” of file name
(Essentially) constructs a command line!
As if it had been typed
Name of verb (i.e., program) as arg0
Name of file to be opened as arg1
Other arguments as needed
Calls main() function of the program!
What about Python?
Command must be python or python3
Command line must be
python HW5.py outFile InFile1 InFile2 …
Getting the arguments into Python
import sys
sys.argv is a list containing the strings:
['HW5.py', 'outFile', 'InFile1', 'InFile2', …]
(Slide callouts on the command name: Windows, Macintosh, Linux)
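A sketch of how the wrapper might pick the file names out of the argument list. The function name `parse_args` and the choice to return None when too few arguments are given are assumptions made for illustration.

```python
import sys

def parse_args(argv):
    """Return (out_file, in_files), or None if too few arguments were given."""
    # argv[0] is the script name, argv[1] the output file,
    # and everything after that is an input file.
    if len(argv) < 3:
        return None
    out_file = argv[1]
    in_files = argv[2:]
    return out_file, in_files

# Demo with a literal list in the same shape as sys.argv.
demo = parse_args(['HW5.py', 'outFile', 'InFile1', 'InFile2'])
print(demo)

# In the real program the list comes from the command line.
print(sys.argv)
```

When None comes back, the wrapper could fall back to prompting the user for file names, as the non-extra-credit version does.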
Questions?
string.format()
A method for formatting output strings
To keep columns aligned
To manage ‘field widths’
To manage #’s of significant digits in floats
Etc.
Let T be a template
Structure of template to be described below
Then
T.format(value, value, value, …)
Makes a copy of T
Fills in the value arguments in the “slots” of new copy of T
Formats each value argument according to specifications in each “slot”
Template
See §5.8.2 of textbook
See §6.1.3 of Python Documentation, “Format String Syntax”
Similar to formatting tools in other high-level languages
Example:
T = "Hello {0} {1}, you may have won ${2}"
T.format('Mr.', 'Smith', 1000)
returns
'Hello Mr. Smith, you may have won $1000'
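The template example can be run as written; this short sketch also shows that format() leaves the template itself unchanged. The variable name `message` is introduced only for illustration.

```python
T = "Hello {0} {1}, you may have won ${2}"

# format() returns a new string; the numbers in braces pick arguments by position.
message = T.format('Mr.', 'Smith', 1000)
print(message)

# The template is untouched and can be reused with other values.
print(T)
```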
Other formatting examples
T = 'left justification: {0:<5}'
T.format("hi!")
T = 'right justification: {0:>5}'
T.format("lo!")
Numbers with decimals
Decimal precisions
Commas in numbers
Locale-specific formats
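A few of these cases are sketched below with made-up values. The last template, which right-justifies a count in a four-character field, is only a guess at how a line of the word-count output might be formatted. Locale-specific formats are left out because their output depends on the machine's settings.

```python
# Justification within a field of width 5.
left = 'left justification: {0:<5}'.format("hi!")
right = 'right justification: {0:>5}'.format("lo!")
print(repr(left))
print(repr(right))

# Two digits after the decimal point.
rounded = '{0:.2f}'.format(3.14159)
print(rounded)

# Commas as thousands separators.
grouped = '{0:,}'.format(1234567)
print(grouped)

# A count right-justified in 4 columns, followed by the word.
row = '{0:>4} {1}'.format(13, 'that')
print(repr(row))
```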
References
Textbook, §5.8.2
Python 3.4.2 Documentation > Python Standard Library > Text §6.1.2, 6.1.3
Online help
Questions?