Lecture 11 file management
Introduction to computational thinking
Module 11 : File Management
Asst Prof Michael Lees
Office: N4‐02c‐76
email: mhlees[at]ntu.edu.sg
Module 11 : File Management 1 of 53
Contents
• File basics
• File interaction
– writing, reading
• File facts
• Other file operations
• OS module
Chapter 5 & 14
FILE BASICS
What is a File?
• A file is a collection of data that is stored on secondary storage like a disk or a thumb drive.
• Accessing a file means establishing a connection between the file and the program and moving data between the two.
Two types of file
• Files come in two general types:
– Text files: files where control characters such as “\n” are translated. These are generally human readable.
– Binary files: all the information is taken directly without translation. These are generally not human readable.
Binary vs. plain text
• Plain text
+ Human readable, useful for certain file types.
- Inefficient storage: each character takes one or more bytes. ASCII uses 8 bits (256 combinations) = 1 byte per character; Unicode could take 32 bits = 4 bytes.
• Binary
+ More efficient storage, custom format
- Not human readable
Example
• Storing all the ages of the class (assume 500 students)
ASCII: ‘20’ ‘18’ ‘21’ ‘19’
How many bytes per entry? ASCII = 2 bytes (one per digit): 2 x 500 = 1000 bytes
Binary: 20 18 21 19
How many bytes per entry? Binary = 1 byte. 1 byte is 0‐255 (enough for age): 1 x 500 = 500 bytes
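The byte counts above can be checked with a short sketch. The ages list is made up for illustration, and every age is assumed to have exactly two digits.

```python
# Hypothetical class of 500 students; the four ages from the slide, repeated.
ages = [20, 18, 21, 19] * 125

# Plain text: each two-digit age is stored as two ASCII characters.
text_data = "".join(str(age) for age in ages).encode("ascii")

# Binary: each age (0-255) fits in a single byte.
binary_data = bytes(ages)

print(len(text_data))    # 1000
print(len(binary_data))  # 500
```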
File Objects or Stream
• When opening a file, you create a file object or file stream that is a connection between the file information on disk and the program.
• The stream contains a “buffer” of the information from the file, and provides the information to the program
Buffering
• Reading from a disk is very slow. Thus the computer will read a lot of data from a file in the hope that, if you need the data in the future, it will be “buffered” in the file object.
• This means that the file object contains a copy of information from the file called a cache (pronounced “cash”).
[Diagram: Program ↔ File Buffer (Cache) ↔ Disk, with read and write arrows on each link]
Creating a file object
myFile = open("myFile.txt", "r")
• myFile is the file object.
• It contains the buffer of information.
• The open function creates the connection between the disk file and the file object.
• The first quoted string is the file name on disk, the second is the mode to open it (here, “r” means to read).
File location
• When opened, the name of the file can come in one of two forms:
– “file.txt” assumes the file name is file.txt and that it is located in the current working directory of the program.
– “c:\bill\file.txt” is the fully qualified file name and includes the directory information. On OS X or Linux a fully qualified path looks like ‘/Users/michaellees/python/CZCE1003’.
File modes
Mode   Description
‘r’    read a text file
‘w’    write a text file (wipes contents)
‘a’    append to existing file
‘b’    binary file
‘+’    both read and write
• Be careful if you open a file with the ‘w’ mode. It sets an existing file’s contents to be empty, destroying any existing data.
• The ‘a’ mode is nicer, allowing you to write to the end of an existing file without changing the existing contents.
FILE INTERACTION
Everything is a string
• If you are interacting with plain text files (which is all we will do for this semester), remember that everything is a string:
– everything read is a string
– if you write to a file, you can only write a string
File contents
• Once you have a file object:
• fileObject.read()
– Reads the entire contents of the file as a string and returns it. It can take an optional integer argument N to limit the read, that is fileObject.read(N): at most N characters for a text file (N bytes for a binary file).
• fileObject.readline()
– Delivers the next line as a string.
More contents
• fileObject.readlines()
– Returns a single list of all the lines from the file.
• for line in fileObject:
– Iterator to go through the lines of a file.
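The four ways of reading can be compared side by side. This is a minimal sketch; the file lines_demo.txt and its three lines are invented for the example.

```python
import os
import tempfile

# Hypothetical three-line file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
f = open(path, "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

f = open(path, "r")
whole = f.read()             # the entire file as one string
f.close()

f = open(path, "r")
first_line = f.readline()    # the next line, including its "\n"
remaining = f.readlines()    # a list of the lines not yet read
f.close()

stripped = []
f = open(path, "r")
for line in f:               # one line per iteration
    stripped.append(line.rstrip("\n"))
f.close()

print(repr(whole))       # 'alpha\nbeta\ngamma\n'
print(repr(first_line))  # 'alpha\n'
print(remaining)         # ['beta\n', 'gamma\n']
print(stripped)          # ['alpha', 'beta', 'gamma']
```

Note that every line keeps its trailing newline character until you strip it yourself.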
Close the door behind you
• When done, you close the file. Closing is important because the information in the fileObject buffer is “flushed” out of the buffer and into the file on disk, making sure that no information is lost.
fileObject.close()
Python elegance
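• Python is often praised for its simplicity and elegance.
for line in open("fileToRead.txt"):
    print(line)
• File is automatically opened (by open( ); the older file( ) built-in from Python 2 no longer exists in Python 3).
• File is closed automatically once the for loop is done and nothing refers to the file object any more (this is how the standard CPython interpreter behaves).
• Defaults are read and text.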
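If you do not want to rely on the interpreter to close the file for you, Python's with statement closes it as soon as the block ends. A minimal sketch, assuming a made-up file elegance_demo.txt in a temporary directory:

```python
import os
import tempfile

# Hypothetical file used only for this example.
path = os.path.join(tempfile.mkdtemp(), "elegance_demo.txt")

with open(path, "w") as out_file:
    out_file.write("one\ntwo\n")

lines = []
with open(path) as in_file:      # defaults: read mode, text
    for line in in_file:
        lines.append(line)

# Leaving the with block has already closed the file.
print(in_file.closed)  # True
print(lines)           # ['one\n', 'two\n']
```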
Writing
• Once opened, you can write to a file (if the mode is appropriate):
• fileObject.write(s)
– writes the string s to the file
• fileObject.writelines(list)
– writes a list of strings (one at a time) to the file
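Two details are easy to miss when writing: numbers must be converted with str() first, and writelines() does not add newline characters for you. A short sketch, with a made-up file name write_demo.txt:

```python
import os
import tempfile

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")

f = open(path, "w")
f.write("Total: " + str(42) + "\n")   # only strings can be written
f.writelines(["red\n", "green\n"])    # each string carries its own "\n"
f.writelines(["no", "breaks"])        # no "\n", so these share one line
f.close()

f = open(path, "r")
contents = f.read()
f.close()

print(repr(contents))  # 'Total: 42\nred\ngreen\nnobreaks'
```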
Errors?
• What if the file doesn’t exist?
• Your program should behave gracefully if the file can’t be opened.
• When writing software, treat others as you would like to be treated.
• In later chapters we will describe “exceptions,” but for now we will just assume that you can get the file.
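As a preview only, here is one way a program might behave gracefully when a file is missing. It uses try/except, which is covered properly later; the function name read_text_or_default and the file name missing.txt are invented for this sketch.

```python
import os
import tempfile


def read_text_or_default(path, default=""):
    """Return the file's contents, or `default` if the file cannot be opened."""
    try:
        f = open(path, "r")
    except OSError:          # covers a missing file and a lack of permission
        return default
    contents = f.read()
    f.close()
    return contents


# A path that is guaranteed not to exist: a new, empty temporary directory.
missing_path = os.path.join(tempfile.mkdtemp(), "missing.txt")
result = read_text_or_default(missing_path, default="(no data)")
print(result)  # (no data)
```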
Challenge 11.1 File Copy
Write a program that copies the contents of a file while removing all vowels.
Thought process
• Open input and output file (‘r’, ‘w’)
• Process each line of the input file.
• Take each line and replace any vowel with empty string “”
• Write new string to output file
• Close both files!!
Copy without vowels
vowels = ['a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U']

# File reading and writing
inFile = open("input.txt", "r")
outFile = open("output.txt", "w")

for line in inFile:
    for letter in line:
        if letter in vowels:
            # remove every occurrence of this vowel from the line
            line = line.replace(letter, '')
    outFile.write(line)  # written to the output file

inFile.close()
outFile.close()
FILE FACTS
Newline character
• Each operating system (Windows, OS X, Linux) developed certain standards for representing text.
• In particular, they chose different ways to represent the end of a file, the end of a line, etc.
• This can confuse our text readers!
Universal new line
• To get around this, Python provides a special file option to deal with variations of OS text encoding.
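• The ‘U’ option means that Python deals with the problem so you don’t have to!
fileObj = open('myFile.txt', 'rU')
• In Python 3 this translation is already the default for text files, and the ‘U’ flag was removed in Python 3.11, so there a plain ‘r’ does the same job.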
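The effect of newline translation can be observed directly. This sketch assumes Python 3, where text mode translates line endings by default; the file newline_demo.txt is made up and is written in binary mode so that both line-ending styles land on disk untouched.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# Write one Windows-style (\r\n) and one Unix-style (\n) line as raw bytes.
f = open(path, "wb")
f.write(b"windows line\r\nunix line\n")
f.close()

f = open(path, "r")      # text mode: every line ending is read as "\n"
text_lines = f.readlines()
f.close()

f = open(path, "rb")     # binary mode: bytes come back untranslated
raw = f.read()
f.close()

print(text_lines)  # ['windows line\n', 'unix line\n']
print(raw)         # b'windows line\r\nunix line\n'
```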
Current file position
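• Every file maintains a “current file position.”
• It is the current position in the file and indicates what the file will read next.
• Its starting value is set by the mode (see the mode table above): ‘r’ and ‘w’ start at the beginning of the file, ‘a’ starts at the end.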
File buffer
• When the disk file is opened, the contents of the file are copied into the buffer of the file object.
• Think of the file object as a very big list where every index is one of the pieces of information of the file.
• The current position is the present index in that list.
OTHER FILE OPERATIONS
tell()
• The tell() method tells you the current file position.
• The positions are in bytes (think characters for ASCII) from the beginning of the file:
fileObject.tell() => 42
seek()
• The seek() method updates the current file position to where you like (in bytes offset from the beginning of the file):
– fd.seek(0) # to the beginning of the file
– fd.seek(100) # 100 bytes from beginning
• Counting bytes is a pain.
• Seek has an optional second argument:
– 0: count from the beginning
– 1: count from the current file position
– 2: count from the end (backwards)
e.g., fd.seek(-100, 2) # 100 bytes from end of file
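A small sketch of tell() and seek() on a ten-byte file. The file seek_demo.bin is invented for the example, and it is opened in binary mode because Python 3 only allows seeking relative to the current position or the end (with a non-zero offset) on binary files.

```python
import os
import tempfile

# Hypothetical ten-byte file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")
f = open(path, "wb")
f.write(b"0123456789")
f.close()

fd = open(path, "rb")        # binary mode: positions are plain byte offsets
start = fd.tell()            # position right after opening
fd.read(3)
after_read = fd.tell()       # reading three bytes moved the position

fd.seek(5)                   # 5 bytes from the beginning
from_start = fd.read(2)

fd.seek(-3, 2)               # 3 bytes before the end
from_end = fd.read()

fd.seek(0)
fd.seek(2, 1)                # 2 bytes forward from the current position
from_current = fd.read(1)
fd.close()

print(start, after_read)                   # 0 3
print(from_start, from_end, from_current)  # b'56' b'789' b'2'
```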
Reading forward
• Every read/readline/readlines moves the current pos forward.
• When you hit the end, every read will just yield “” since you are at the end.
• You need to seek to the beginning to start again (or close and open, seek is easier).
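The behaviour at the end of the file can be sketched as follows; the file rewind_demo.txt is a made-up example.

```python
import os
import tempfile

# Hypothetical two-line file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "rewind_demo.txt")
f = open(path, "w")
f.write("line 1\nline 2\n")
f.close()

f = open(path, "r")
first_pass = f.read()        # reads everything, position is now at the end
at_end = f.read()            # nothing left: empty string
also_at_end = f.readline()   # same for readline
f.seek(0)                    # rewind to the beginning
second_pass = f.read()       # the whole file again
f.close()

print(repr(at_end), repr(also_at_end))  # '' ''
print(first_pass == second_pass)        # True
```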
The power of pickle
• Everything is a string. So how about things that aren’t?
• Python provides a standard module called pickle. This is an amazing module that can take almost any Python object and transfer it to a file (converting it to a byte string in the process); this process is called pickling. (import pickle: it’s a module)
pickle.dump(x, f) # pickle object x to file f
x = pickle.load(f) # unpickle object x from file f
Remember: str(2) => ‘2’
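A round trip through pickle might look like the sketch below. The dictionary record and the file record.pkl are invented for illustration; in Python 3 the file has to be opened in binary mode (‘wb’ and ‘rb’) because pickle produces bytes.

```python
import os
import pickle
import tempfile

# Hypothetical object and file name, used only for this example.
record = {"name": "Alice", "ages": [20, 18, 21, 19], "enrolled": True}
path = os.path.join(tempfile.mkdtemp(), "record.pkl")

f = open(path, "wb")          # binary mode: pickle writes bytes
pickle.dump(record, f)        # pickle the object to the file
f.close()

f = open(path, "rb")
restored = pickle.load(f)     # unpickle it again
f.close()

print(restored == record)     # True: same contents
print(restored is record)     # False: it is a new object
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can run code stored in the file.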
Challenge 11.2 MP3 ID3 tag
Take an mp3 file and output the song name and artist name from the ID3 tag.
Thought process
• MP3 is a binary file (can see that if you load in text editor)
• Generally file headers (like ID3) are at specific locations.
• Use the internet to find this location (Wiki)
• Open file, seek to correct bytes and then print out.
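One possible solution is sketched below, under the assumption that the file carries an ID3v1 tag: the last 128 bytes of the file, starting with the marker "TAG", followed by a 30-byte title and a 30-byte artist field. Newer ID3v2 tags sit at the start of the file and are not handled here. The function name read_id3v1, the Latin-1 decoding and the fake test file are all choices made for this sketch, not part of the challenge statement.

```python
import os
import tempfile


def read_id3v1(path):
    """Return (title, artist) from an ID3v1 tag, or None if no tag is found."""
    f = open(path, "rb")
    f.seek(0, 2)                     # jump to the end to learn the file size
    if f.tell() < 128:
        f.close()
        return None
    f.seek(-128, 2)                  # the tag is the last 128 bytes
    tag = f.read(128)
    f.close()
    if tag[:3] != b"TAG":
        return None
    # Fields are fixed width and padded with zero bytes or spaces.
    title = tag[3:33].decode("latin-1").rstrip("\x00 ")
    artist = tag[33:63].decode("latin-1").rstrip("\x00 ")
    return title, artist


def pad(text, width):
    """Encode text and pad it with zero bytes to a fixed width."""
    return text.encode("latin-1").ljust(width, b"\x00")


# Build a fake "mp3": some filler bytes followed by a hand-made ID3v1 tag.
fake_tag = (b"TAG" + pad("Example Song", 30) + pad("Example Artist", 30)
            + pad("", 30) + pad("2012", 4) + pad("", 30) + b"\x00")
fake_path = os.path.join(tempfile.mkdtemp(), "fake.mp3")
f = open(fake_path, "wb")
f.write(b"\x00" * 200 + fake_tag)
f.close()

print(read_id3v1(fake_path))  # ('Example Song', 'Example Artist')
```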
OS MODULE
What is the OS module
• The os module in Python is an interface between the operating system and the Python language.
• As such, it has many sub‐functionalities dealing with various aspects.
• We will look mostly at the file‐related stuff.
import os # to use os
What is a directory/folder
• Whether on Windows, Linux or OS X, every OS maintains a directory structure.
• A directory is a container of files and other directories.
• These directories are arranged in a hierarchy or tree.
Different path styles
• It turns out that each OS has its own way of specifying a path:
– C:\bill\python\myFile.py
– /Users/bill/python/myFile.py
• Nicely, Python knows that: the functions in os.path build and split paths using the conventions of the OS the program is running on.
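A portable way to build a path is to let os.path.join insert the separator instead of typing "\" or "/" yourself. A minimal sketch, reusing the folder names from the slide as relative path parts:

```python
import os

# Join path parts with the separator of the current OS.
path = os.path.join("bill", "python", "myFile.py")

# Split the path back into its directory part and its file name.
directory, file_name = os.path.split(path)

print(path)       # bill\python\myFile.py on Windows, bill/python/myFile.py elsewhere
print(directory)
print(file_name)  # myFile.py
```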
Some OS methods
• os.getcwd(): Returns the full path of the current working directory.
• os.chdir(pathString): Change the current directory to the path provided.
• os.listdir(pathString): Return a list of the files and directories in the path (including hidden files whose names start with ‘.’, but not the special entries ‘.’ and ‘..’).
More OS methods
• os.rename(sourcePathStr, destPathStr): Renames a file or directory.
• os.mkdir(pathStr): make a new directory. So os.mkdir(‘/Users/bill/python/new’) creates the directory new under the directory python.
• os.remove(pathStr): Removes the file.
• os.rmdir(pathStr): Removes the directory, but the directory must be empty.
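The methods above can be tried out safely inside a temporary directory. In this sketch the directory name new and the file names a.txt and b.txt are made up, and the original working directory is restored at the end.

```python
import os
import tempfile

base = tempfile.mkdtemp()          # scratch directory for the experiment
original_cwd = os.getcwd()         # remember where we started

os.chdir(base)                     # make the scratch directory current
os.mkdir("new")                    # create the directory "new" inside it

f = open(os.path.join("new", "a.txt"), "w")
f.write("hello\n")
f.close()

os.rename(os.path.join("new", "a.txt"), os.path.join("new", "b.txt"))
listing = os.listdir("new")        # ['b.txt']

os.remove(os.path.join("new", "b.txt"))   # delete the file first ...
os.rmdir("new")                           # ... so the directory is empty
remaining = os.listdir(".")        # []

os.chdir(original_cwd)             # go back before deleting the scratch directory
os.rmdir(base)

print(listing, remaining)  # ['b.txt'] []
```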
Take home lessons
• Files are important! (Obviously)
• Binary vs. Plain text
• Buffers – why and how.
• Reading/Writing files (binary & plain text)
• Elegant degradation (more in next module)
Further reading
• http://docs.python.org/tutorial/inputoutput.html
• http://diveintopython3.org/files.html