Computer Science 121
Scientific Computing, Winter 2014
Chapter 5: Files and Scripts
Files and Scripts
● File (non-technical): (Word) document, image,
recording, video, etc.
● File (technical): a named collection of bytes on
disk.
● ASCII vs. Binary
● “ASCII file” means “file that can be viewed as text by a program (Notepad) that interprets each byte as an ASCII code”.
● Binary file is anything that cannot be viewed that way.
● “JPEG file” means “file that can be viewed as an image by using a program (Photoshop) that interprets the bytes as a JPEG-encoded image”.
● “MP3 file” means “file that can be heard as an audio recording by using a program that interprets the bytes as an MP3-encoded audio stream”.
● “Foo file” means “file whose contents can be experienced by using a program that interprets the bytes as a Foo encoding”.
● XML (eXtensible Markup Language) is an attempt to compromise between binary and ASCII: make all data human-readable.
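To make "interprets each byte as an ASCII code" concrete, here is a small sketch that converts a piece of text to its numeric codes and back. The variable names are only illustrative.

```matlab
txt = 'Call me';
codes = double(txt);       % numeric ASCII code of each character
disp(codes)                % 67 97 108 108 32 109 101
back = char(codes);        % interpret the numbers as characters again
disp(isequal(back, txt))   % 1 (the round trip recovers the text)
```

The same numbers could just as well be interpreted as pixel values or audio samples; the encoding is in the program that reads them, not in the bytes.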
5.1 Filenames
● General format: name . extension
● For historical reasons, extension is usually
three characters.
● Extension tells OS what program to use to open
file (MS Word, Excel, Matlab, ...)
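As a rough illustration of the name . extension format, MATLAB's fileparts function splits a filename into its pieces. The filename below is just an example.

```matlab
[folder, name, ext] = fileparts('myprog.m');
disp(name)   % myprog
disp(ext)    % .m  (the dot is included in the extension)
```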
Aside: File Deletion
[Figure: a disk holding four named files (foo.m, OMFG.jpg, hamlet.doc, sort.m) and their blocks of bytes (011010, 110101, 000100, 111011).]
● Q.: What happens when you “delete” a file?
● (Drag OMFG.jpg to trash and empty trash…)
● A.: What appears to happen...
[Figure: the same disk with both the name OMFG.jpg and its block of bytes (000100) gone; foo.m, hamlet.doc, and sort.m remain.]
● A.: What actually happens ...
[Figure: the name OMFG.jpg is gone from the listing, but its block of bytes (000100) is still on the disk.]
● Then use WinUnDelete (e.g.) to get back OMFG.jpg
Directory Structure
● Directories (folders) are organized hierarchically
(one inside another)
● So we are forced to choose a single organization
method (like library with card catalog indexed
only by author)
● But we can use links (shortcuts) to add additional
organization, without copying files.
Pathnames
● Pathname is “full name” of directory in a linear form
– e.g., C:\MyDocuments\cs121\myproj\new\
● Complete filename includes path
– e.g., C:\MyDocuments\cs121\myproj\new\myprog.m
● This becomes important because of the ...
Working Directory
>> pwd % print working directory
ans = C:\MATLAB\work
● Without extra effort, we can only access files in our
working directory
>> myprog % run myprog.m script
ERROR: myprog? LOL!!
Working Directory
● Solutions
● Make shortcuts from working directory (annoying)
● >> cd('C:\MyDocuments\cs121\myproj\new\')
>> myprog
ERROR: Can't find someOther.m… loser!
● Use Matlab File menu to add paths: File / Set Path...
Set Path
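The same two solutions can be tried from the command line. This is a minimal sketch, assuming a scratch folder under tempdir (the folder name cs121_cd_demo is made up); addpath changes the search path for the current session, much like adding a folder in the Set Path dialog.

```matlab
startDir = pwd;                                 % remember where we started
projDir = fullfile(tempdir, 'cs121_cd_demo');   % hypothetical project folder
if ~exist(projDir, 'dir')
    mkdir(projDir);
end

cd(projDir);          % solution 1: move the working directory
disp(pwd)             % now prints the project folder
cd(startDir);         % go back

addpath(projDir);     % solution 2: leave pwd alone, extend the search path
rmpath(projDir);      % undo, so the demo leaves no trace
```

With cd, only files in the new working directory become visible; with addpath, files in several folders can be visible at once, which is what fixes the "Can't find someOther.m" situation.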
How Matlab Uses Paths
● When we type a name foo into the interpreter, Matlab follows this sequence:
1. Looks for foo as a variable. If not found, ...
2. Looks in the current directory for a file named foo.m. If not found, ...
3. Searches the directories on the MATLAB search path, in order, for foo.bi (built-in function) or foo.m. If not found, ...
4. Reports ERROR
5.2 File operators
● File write/read operators allow us to save/restore
values from previous Matlab sessions.
● File / Save Workspace As... is simplest way to do
this – saves everything to a .mat file
● If we want to save/restore specific variables, we
can use the save and load commands:
5.2 File operators
>> a = 'foo'; b = 2; c = pi;
>> save myvariables a b
>> clear
>> load myvariables
>> who
Your variables are:
a b
– I never use the other syntax ( >> save('myvariables', 'a', 'b') )
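For comparison, here is a sketch of the function-call syntax, which is handy when the filename is stored in a variable. The .mat file is written to tempdir here only so the demo does not clutter the working directory.

```matlab
a = 'foo'; b = 2; c = pi;
matFile = fullfile(tempdir, 'myvariables.mat');

save(matFile, 'a', 'b');     % same as: save myvariables a b
clear a b c

S = load(matFile);           % load into a struct instead of the workspace
disp(fieldnames(S))          % the saved names: a and b (c was never saved)

load(matFile, 'b');          % or restore just one variable by name
disp(b)
```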
5.3 Importing and Exporting Data
• Often want to get data from other programs
(Excel, LabView, text editor) into Matlab,
and save data in a format that other programs
can read.
• Excel saves data in binary, proprietary (of
course!) .xls format
5.3 Importing and Exporting Data
• Generally, other formats will all be text-based (ASCII)
– .csv : comma-delimited values (no commas in vals)
– .dlm : other delimiter (allows commas in vals)
– .xml : eXtensible Markup Language (newer)
Spreadsheet data should have all cells filled (“flat format”), or Matlab will get confused.
[Figure: two example spreadsheets, one labeled YES and one labeled NO.]
csvread operator allows us to read numerical data, but we need to cut off the header in the file:
• Remove it by hand from the file:
>> d = csvread('sunspots-no-header.csv');
• Specify # of lines to ignore in csvread:
>> d = csvread('sunspots.csv', 1); % ignore first line
5.3 Importing and Exporting Data
>> d = csvread('sunspots.csv', 1)
d = 1749 1 58
1749 2 62.6
1749 3 70
etc.
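A self-contained way to try this without the real sunspots file is to write a three-row stand-in first. In the sketch below the file name demo_sunspots.csv and the third header name Count are assumptions; the numbers are the three rows shown above. The call uses the documented csvread(filename, R1, C1) form, where the row and column offsets are zero-based. Newer MATLAB releases recommend readmatrix over csvread.

```matlab
csvFile = fullfile(tempdir, 'demo_sunspots.csv');   % hypothetical stand-in file
fid = fopen(csvFile, 'w');
fprintf(fid, 'Year,Month,Count\n');                 % header line ('Count' is made up)
fprintf(fid, '%d,%d,%g\n', [1749 1 58; 1749 2 62.6; 1749 3 70]');
fclose(fid);

d = csvread(csvFile, 1, 0);   % skip 1 header row, 0 columns
disp(size(d))                 % 3 3
disp(d(2, 3))                 % 62.6000
```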
5.3 Importing and Exporting Data
● importdata command is useful for heterogeneous data.
● Returns a data structure:
>> d = importdata('sunspots.csv')
d = data: [2820x3 double]
textdata : {'Year', 'Month', ...
colheaders : {'Year', 'Month', ...
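The fields of the returned structure are reached with dot notation. The following sketch builds a small stand-in file (the file name and the Count header are assumptions) and passes the delimiter and the number of header lines explicitly, so nothing depends on auto-detection.

```matlab
csvFile = fullfile(tempdir, 'demo_import.csv');     % hypothetical stand-in file
fid = fopen(csvFile, 'w');
fprintf(fid, 'Year,Month,Count\n');                 % 'Count' is a made-up header
fprintf(fid, '%d,%d,%g\n', [1749 1 58; 1749 2 62.6; 1749 3 70]');
fclose(fid);

d = importdata(csvFile, ',', 1);   % comma-delimited, 1 header line
counts = d.data(:, 3);             % numeric part: third column
disp(d.colheaders)                 % header text, one cell per column
disp(mean(counts))                 % average of the three values
```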
Non-numerical ASCII Files
• txt files : anything we want to treat as text (ASCII characters)
>> fid = fopen('mobydick.txt');
>> s = fread(fid);
>> fclose(fid)
>> s
s = 32 67 97 ... % need to munge this
Non-numerical ASCII Files
>> s = char(s') % transpose, textify
s = Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world....
textread does this for us, and tokenizes words into cell array:
>> s = textread('mobydick.txt', '%s') % '%s': treat as strings
s = {'Call', 'me', 'Ishmael.', …
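Both steps can be tried on a tiny stand-in file. In this sketch the file demo_moby.txt is invented, and textscan is used for the word splitting; it is a swapped-in alternative to textread, which newer MATLAB releases discourage.

```matlab
txtFile = fullfile(tempdir, 'demo_moby.txt');   % hypothetical stand-in file
fid = fopen(txtFile, 'w');
fprintf(fid, 'Call me Ishmael. Some years ago');
fclose(fid);

fid = fopen(txtFile);
codes = fread(fid);          % column vector of byte values
fclose(fid);
s = char(codes');            % transpose to a row, then convert to text
disp(s)

fid = fopen(txtFile);
c = textscan(fid, '%s');     % split on whitespace, treat pieces as strings
fclose(fid);
words = c{1};                % cell array, one word per cell
disp(words{3})               % Ishmael.
```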
5.4 Scripts
● You know most of this stuff already ☺
● You can run a script (e.g., myprog.m) from the interpreter:
>> myprog
● Tips
− Don't name any variables myprog
− Don't use any blank spaces in script names
− Re-read search path stuff from a few pages back
5.5 Scripts as Computations
● Scripts are (mostly) like typing directly into the interpreter – so variables can get overwritten
● This also means that there is no ans value:
>> x = myprog
ERROR: loser trying to execute SCRIPT myprog as a program.
● Nor can we pass arguments:
>> myprog(7)
ERROR: My name is Donnie, and you suck at Matlab.
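The overwriting behavior is easy to see with a one-line script. This sketch writes a hypothetical script demo_overwrite.m into tempdir and executes it with run, which takes a full filename so the search path is not involved.

```matlab
scriptFile = fullfile(tempdir, 'demo_overwrite.m');   % hypothetical script
fid = fopen(scriptFile, 'w');
fprintf(fid, 'x = 99;\n');     % the script's only statement
fclose(fid);

x = 1;
run(scriptFile);   % the script shares our workspace
disp(x)            % 99 (our x was overwritten)
```

Because the script's statements behave as if typed at the prompt, any variable name it uses can clobber one of ours.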