CIS52 – File Manipulation
© 2001 John Urrutia. All rights reserved.
File Manipulation Utilities, Regular Expressions, sed, awk

Overview
comm – compare two sorted files
cut – output sections of the lines in a file
find – find files that match a pattern
paste – merge records (lines) of files
pr – paginate files for printing
tr – translate or delete characters

Overview
regular expressions
sed – Stream Editor (batch file editor)
awk – Aho, Weinberger, Kernighan (pattern matching)

The comm before the storm
Compares 2 sorted files
Results are reported in 3 columns:
1st – records found only in file 1
2nd – records found only in file 2
3rd – records found in both files
Options -1, -2 and -3 suppress the corresponding columns

comm – cont.
Either file name can be replaced by - to read standard input
Example:
File1   File2
aa      bb
dd      cc
ee      dd
gg      ee
hh      ff

comm results
File1   File2   Both
aa
        bb
        cc
                dd
                ee
        ff
gg
hh

option -1
bb
cc
        dd
        ee
ff

option -2
aa
        dd
        ee
gg
hh

option -12
dd
ee
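
The comm example can be reproduced with a short script. This is a minimal sketch assuming a POSIX shell plus mktemp -d (available on GNU and BSD systems); File1 and File2 simply mirror the slide. comm expects both inputs sorted in the current locale's collating order.

```shell
#!/bin/sh
# Work in a throwaway directory so the example files do not clutter anything.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# The two sorted files from the example.
printf '%s\n' aa dd ee gg hh > File1
printf '%s\n' bb cc dd ee ff > File2

comm File1 File2        # column 1: File1 only, column 2: File2 only, column 3: both
comm -1 File1 File2     # suppress column 1
comm -2 File1 File2     # suppress column 2
comm -12 File1 File2    # only the records common to both: dd, ee
comm -23 File1 File2    # only the records unique to File1: aa, gg, hh
sort File2 | comm -12 File1 -   # "-" reads the second file from standard input
```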

cut to the chase
Allows you to extract portions of each record (line) in a file.
Divides the data in each record into fields or columns.
The default field delimiter is the tab character;
it can be changed with the -d option.

cut cont.
cut {-b list | -c list | -f list [-d char] [-s]} [--output-delimiter=string] [file-list]
-b – bytes
-c – characters (GNU cut currently treats these the same as bytes)
-f – fields
-d – delimiter character
-s – display only records that contain the delimiter

cut ! print
char – single byte used to delimit fields in a record
list – list of positions or ranges of characters (or fields) to display
Ranges are comma separated:
1-7 – first 7 characters in the record
1,7 – first and seventh characters

cut ! print again
string – the string substituted for the delimiter in the output (--output-delimiter=string, GNU cut)

cut - Example
[/@linux2 uid]$ cat file1
The quick brown fox eyed the jactitating dog
[/@linux2 uid]$ cut -f1,3,5,8 -d' ' file1
The brown eyed dog
[/@linux2 uid]$ cut -f1,4-6,8 -d' ' file1
The fox eyed the dog
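
A few more cut invocations against the same one-line file1, as a sketch assuming a POSIX shell and cut. The --output-delimiter line needs GNU cut, and the comma-separated input fed to -s is invented for illustration.

```shell
#!/bin/sh
# Scratch directory holding a copy of the example file1.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo 'The quick brown fox eyed the jactitating dog' > file1

cut -c 1-7 file1                     # characters 1 through 7: "The qui"
cut -c 1,7 file1                     # characters 1 and 7: "Ti"
cut -d' ' -f1,4-6,8 file1            # fields 1, 4-6 and 8: "The fox eyed the dog"
cut -d' ' -f2,3 --output-delimiter=: file1   # GNU cut only: "quick:brown"

# Without -s a record lacking the delimiter is printed whole; with -s it is dropped.
printf 'no commas here\na,b\n' | cut -d, -f1      # "no commas here" and "a"
printf 'no commas here\na,b\n' | cut -d, -f1 -s   # only "a"
```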

find that pot of gold
find – selects all files that meet the selection criteria in the expression
If no action is specified, POSIX and GNU find default to -print (some older versions took no action)
Sub-directories are scanned automatically (recursively)
The expression can be simple or complex

find me something
The criteria expression:
ANDs operands separated by a space (-a may also be written explicitly)
ORs operands separated by -o
Is evaluated left to right; AND binds tighter than -o, and \( expression \) groups operands
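
find criteria continued
Actions
-print – prints the path of every file that meets the selection criteria
-exec cmd \; – executes cmd for each file; the command ends at \; and {} is replaced by the file's path
-ok cmd \; – same as -exec, but asks first and runs cmd only if a y (yes) is read from stdin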
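
find criteria continued again
Evaluations
-type c – file of type c (e.g., d for a directory, f for a regular file)
-atime ±n – last accessed more (+n) or less (-n) than n days ago; n alone means exactly n days
-mtime ±n – last modified more (+n) or less (-n) than n days ago
-user uname – owned by user uname (a numeric uid is also accepted)
-nouser – the file's owner uid is not known to the system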
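
The expression rules and primaries above can be tried on a small throwaway tree. This is a sketch assuming a POSIX find and mktemp -d; the directory and file names are hypothetical. The \( \) grouping is needed because AND binds tighter than -o.

```shell
#!/bin/sh
# Build a small, hypothetical tree to search.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p src/lib
touch src/main.c src/lib/util.c src/lib/util.h notes.txt

# Space-separated operands are ANDed: regular files whose name ends in .c
find . -type f -name '*.c' -print

# -o ORs operands; \( \) makes -type f apply to both name tests
find . -type f \( -name '*.c' -o -name '*.h' \) -print

# Modified less than 1 day ago (true for everything just created)
find . -type f -mtime -1 -print

# -exec runs a command per match: {} is the path, \; ends the command
find . -type d -name lib -exec ls {} \;
```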

paste tastes good
paste [options] [filelist]
Corresponding records from each file are merged into 1 output record (tab-separated by default)
-s – process the filelist sequentially: all records of one file are merged into one line before going on to the next file
-d list – the characters in list are used in turn to delimit the merged records

paste continued
[/@linux2 uid]$ cat file1
A
B
C
[/@linux2 uid]$ cat file2
1
2
3
[/@linux2 uid]$ cat file3
x
y
z

paste continued
[/@linux2 uid]$ paste file1 file2 file3
Output (tab-separated):
A 1 x
B 2 y
C 3 z
[/@linux2 uid]$ paste -s file1 file2 file3
Output (tab-separated):
A B C
1 2 3
x y z
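
The same three files can be recreated to try the -d option as well. A minimal sketch assuming a POSIX shell and paste; the delimiter choices are only examples.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' A B C > file1
printf '%s\n' 1 2 3 > file2
printf '%s\n' x y z > file3

paste file1 file2 file3           # A<tab>1<tab>x, B<tab>2<tab>y, C<tab>3<tab>z
paste -s file1 file2 file3        # one line per file: A<tab>B<tab>C ...
paste -d ',:' file1 file2 file3   # delimiters used in turn: A,1:x
paste -s -d '+' file2             # 1+2+3
```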

pr – public relations--NOT
pr paginates file(s) for printing
Page attributes can be specified, e.g., the page length in lines with the -l option
With multiple files, each file starts on a new page

pr – continued
pr paginates a file for printing
Creates a header and a trailer on each page
The header text is changed with the -h option
Header and trailer are suppressed with the -t option
Can create columns of data
-num – number of columns per page (e.g., -2)
-sx – character x is used to separate the columns

pr – continued
Can number each line
-n[c][k]
c – character separating the number from the data (default is the tab character)
k – number of digits (default 5)
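
The pr options above, tried on a small made-up word list. This is a sketch assuming a POSIX pr; the header text is hypothetical, and the exact column layout can differ slightly between pr implementations, so only the numbered output is spelled out.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' one two three four five six > words

pr -t -2 words                  # two columns, no header or trailer
pr -t -2 -s'|' words            # two columns separated by '|'
pr -t -n:3 words                # 3-digit line numbers followed by ':' ("  1:one" ...)
pr -l 20 -h 'Word list' words   # 20-line page with a custom header
```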

Regular Expressions
A set of characters that defines the criteria used to identify a string within a record.
Used by vi, grep, sed, awk, and others.
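
tr – Translate this
tr [-c] [-d] [-s] [-t] set1 [set2]
Translates characters in set1 to the corresponding characters in set2
-c – complement of set1 (every character NOT in set1)
-d – delete the characters found in set1
-s – squeeze each run of a repeated character into a single occurrence
-t – truncate set1 to the length of set2 (GNU tr)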
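
Each tr option in action on short literal strings. A sketch assuming a POSIX tr and an ASCII-compatible locale; the -t line is GNU-only, so it is left as a comment rather than run.

```shell
#!/bin/sh
echo 'hello world' | tr 'a-z' 'A-Z'   # translate: HELLO WORLD
echo 'hello world' | tr -d 'lo'       # delete every l and o: he wrd
echo 'aaa   bbb' | tr -s ' '          # squeeze the run of spaces: aaa bbb
echo 'abc123def' | tr -cd '0-9'       # -c complements set1, so every non-digit
                                      # (including the newline) is deleted: 123
# GNU only: echo 'abcd' | tr -t 'abcd' 'xy'   -> xycd (set1 truncated to "ab")
```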

Regular Expressions
Simple strings
Bound by / … /
Interpreted literally, e.g., /e D/ matches exactly e D
Taste Dee – OK (matches); Taste don’t – not OK

Regular Expressions
The . (period) special single character
Matches any single character
e.g., /.eny/ matches Aeny Beny Ceny
[char-range] defines a character class: any one character in the range
[^char-range] defines the not-in-character class: any one character not in the range
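
The simple-string, period and character-class rules can be checked with grep, which uses the same basic regular expressions (the / / delimiters are not typed). A sketch assuming a POSIX grep; the sample lines are made up from the slides' examples.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' Aeny Beny eny 'Taste Dee' "Taste don't" > sample

grep 'e D' sample      # literal string: Taste Dee
grep '.eny' sample     # . needs one character before "eny": Aeny, Beny
grep '[AB]eny' sample  # character class: Aeny, Beny
grep '[^A]eny' sample  # not-in class: Beny
```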

Regular Expressions
The * (asterisk)
Matches 0 or more of the preceding character.
What’s this?
/.*/
/[a-zA-Z]*/
/([^)]*)/

Regular Expressions
The ^ (for the rabbit) character
In the beginning … /^The/ matches The only at the start of a record
The $ (for the teacher) character
At the end … /dog$/ matches dog only at the end of a record

Regular Expressions
Quote the raven – backslash
\. This yields .
\\ This yields \
\* This yields *
\[ This yields [
\] This yields ]
\/ This yields /
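
The asterisk, the two anchors and backslash quoting, tried with grep and sed. A sketch assuming POSIX grep and sed; all input lines are invented for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'The quick brown fox' 'eyed the dog' 'price 3.50' 'price 3x50' > sample

printf '%s\n' fx fox foooox fax | grep 'fo*x'   # o* = zero or more o: fx, fox, foooox
grep '^The' sample      # anchored at the start: The quick brown fox
grep 'dog$' sample      # anchored at the end: eyed the dog
grep '3.50' sample      # unquoted . matches any character: both price lines
grep '3\.50' sample     # \. is a literal period: price 3.50 only
echo 'a/b' | sed 's/\//|/'   # \/ lets / appear inside s///: a|b
```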

sed – the old Stream EDitor
sed [-n] {-f script-file | script} [file-list]
Copies its input to standard output, editing it along the way
Edits file(s) in a non-interactive mode
Gets its instructions from a script file: -f filename names a file containing sed instructions
With no -f option, the 1st command-line argument is used as the script
-n – suppress the automatic output; only records printed explicitly (e.g., with p) appear
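
sed – the old mill stream
Record processing
1. Read a record from the file list
2. Read an instruction from the script (or command line)
3. Apply the selection criteria (address)
4. If the record is selected, perform the instruction; repeat 2 to 4 until there is no more script
5. Write the record to standard output (unless -n), then repeat 1 to 5 until there are no more records in the file list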

He sed what!!??
Instruction format
[addr1[,addr2]] instruction [arg-list]
An address is either
a line number, or
a regular expression
addr1 – start of the selection
addr2 – end of the selection

Address line numbers
$ designates the last line of the last file
1st address line number – starts selecting records based on their position in the input file list, counting from 1
2nd address line number – stops selecting records once the position in the input file list is greater than that line number
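
Line-number, $ and regular-expression addresses, together with -n and -f, on two small invented files. A sketch assuming a POSIX sed; script.sed is a hypothetical script file name. Line numbers run on from one file to the next, which is why $ means the last line of the last file.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' one two three four five > f1
printf '%s\n' six seven > f2

sed -n '2,4p' f1             # line-number range: two three four
sed -n '/two/,/four/p' f1    # regular-expression range: two three four
sed -n '$p' f1 f2            # last line of the last file: seven
sed -n '6p' f1 f2            # numbering continues into f2: six

# The same kind of instruction read from a script file with -f
printf '%s\n' '2,3p' > script.sed
sed -n -f script.sed f1      # two three
```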

He sed some more
Instructions
! – Not: negates the address selection, e.g., sed -n '/line/!p' file.list
{…} – groups the instructions applied to the address selection

sed Instructions
p – Print the record now and continue
d – Delete the record and get the next record
q – Quit processing; stop; go away (the current record is still written unless -n is used)
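
The !, {…}, p, d and q instructions on a made-up file.list. A sketch assuming a POSIX sed; -n is needed with !p, otherwise every record is printed once automatically and the unselected ones a second time.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' alpha 'beta line' gamma 'delta line' epsilon > file.list

sed -n '/line/!p' file.list       # records NOT matching /line/: alpha gamma epsilon
sed '/line/d' file.list           # same result by deleting the matches
sed '3q' file.list                # print through record 3, then quit
sed -n '/line/{s/line/LINE/;p;}' file.list   # {} applies both instructions: beta LINE, delta LINE
```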

sed Instructions
c – Change
[addr1[,addr2]] c\
yada yada yada
all the selected records are replaced, as a group, by the change text
a – Append
[addr1] a\
…
writes the text on a new line after the selected record

sed Instructions
i – Insert
[addr1] i\
…
writes the text on a new line before the selected record
n – Next
[addr1] n
writes the current record (unless -n), gets the next one and continues the script
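
c\, a\, i\ and n on a four-line file, written in the portable form where the text goes on the line after the backslash. A sketch assuming a POSIX sed; the file and the inserted text are invented.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' one two three four > f

# c\ replaces records 2-3, as a group, with one line: one changed four
sed '2,3c\
changed' f

# a\ writes its text after record 2, i\ before it
sed '2a\
after two' f
sed '2i\
before two' f

# n moves on to the next record, so this prints every second one: two four
sed -n 'n;p' f
```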

sed Instructions
w – Write
[addr1[,addr2]] w filename
writes the selected records to the file filename
r – Read
[addr1] r filename
reads the records from filename and outputs them after the selected record
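
w and r with hypothetical log, errors.txt and footer files. A sketch assuming a POSIX sed; the file name after w or r runs to the end of the instruction, so it is kept last inside the quotes.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'ERROR disk' ok 'ERROR net' > log
printf '%s\n' '--- end of log ---' > footer

sed -n '/ERROR/w errors.txt' log   # errors.txt receives the two ERROR records
cat errors.txt
sed '$r footer' log                # footer is output after the last record
```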

sed Instructions
s – Substitute
[addr1[,addr2]] s/ptrn/repl/[g][p][w file]
for each selected record, match the pattern and replace the match with repl
g – replace all non-overlapping occurrences, not just the first
p – print the record if a substitution was made
w – write the record to the file if a substitution was made
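
The s instruction with and without its flags. A sketch assuming a POSIX sed; the strings and the changed.txt name are invented for the example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

echo 'the cat and the hat' | sed 's/the/a/'     # first match only: a cat and the hat
echo 'the cat and the hat' | sed 's/the/a/g'    # every match: a cat and a hat
echo 'aaaa' | sed 's/aa/b/g'                    # non-overlapping: bb
printf '%s\n' 'red apple' 'green pear' | sed -n 's/apple/APPLE/p'   # red APPLE
printf '%s\n' x1 y2 x3 | sed 's/x/X/w changed.txt'   # changed.txt gets X1 and X3
cat changed.txt
```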

Hawk – Squawk – awk
The programmable utility that does everything.
Named for Aho – Weinberger – Kernighan
Provides:
Conditional execution
Looping
Handles:
Numeric & string variables
Regular expressions
C-style print facilities (printf)
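
awk
awk [-Fc] {-f program-file | 'program'} [file-list]
-Fc – c is the field delimiter character
-f program-file – name of the file holding the awk program
'program' – in-stream instructions given on the command line instead of -f
file-list – list of files to process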

awk – program lines
pattern [{ action }]
Like sed, the pattern selects records
Record processing is the same as sed: each record is tested against every program line in turn

awk – pattern
Patterns follow regular expression format, e.g., /dog/
~ – tests for a match to a regular expression, e.g., $1 ~ /^T/
!~ – tests for NO match to a regular expression
, – establishes a pattern range: all records from the one matching the first pattern through the one matching the second are processed (inclusive)
BEGIN – executes before the first record is processed
END – executes after the last record is processed
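
Regular-expression patterns, ~ and !~, a range, and BEGIN/END on a small invented people file. A sketch assuming a POSIX awk.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'Tom 30' 'Ann 25' 'Tim 41' 'Bob 19' > people

awk '/^T/' people                     # no action, so print: Tom 30, Tim 41
awk '$1 ~ /^T/ { print $2 }' people   # 30, 41
awk '$1 !~ /^T/ { print $1 }' people  # Ann, Bob
awk '/Ann/,/Tim/' people              # inclusive range: Ann 25, Tim 41
awk 'BEGIN { print "start" } END { print NR, "records" }' people   # start, 4 records
```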

awk – relational operators
< – less than
<= – less than or equal to
== – equal to
!= – not equal to
>= – greater than or equal to
> – greater than

awk – operators
Arithmetic
+ – addition
- – subtraction
* – multiplication
/ – division
Assignment
= – assigns the value on the right to the variable on the left
+= – adds the value on the right to the variable on the left

awk – boolean operators
&& – and
|| – or
! – not

awk – actions
# – comment: everything to its right on the line is ignored
Default action is to print the record to stdout
Multiple actions can be taken
Use {…} to enclose multiple actions
Separate actions with ; (or a newline)
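
Relational, arithmetic, assignment and boolean operators combined in patterns and multi-action blocks, using the same kind of invented people file. A sketch assuming a POSIX awk.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'Tom 30' 'Ann 25' 'Tim 41' 'Bob 19' > people

awk '$2 >= 25 && $2 < 40' people            # default action prints: Tom 30, Ann 25
awk '!($2 > 20)' people                     # Bob 19
awk '$1 == "Ann" || $1 == "Bob" { print $1; print $2 * 2 }' people   # Ann 50 Bob 38
# += accumulates; several actions in one {} are separated by ;
awk '{ total += $2; count = count + 1 } END { print total / count }' people   # 28.75
```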

awk – actions
print variable …
Var1, Var2, Var3 – prints the variables separated by the output field separator (OFS)
Var1 Var2 Var3 – prints them with NO separators (concatenated)
"literal value" – prints exactly everything between the quotes

awk – actions
printf "control string", variable …
Control string
\n – new line
\t – tab
%[-][n][.d]conv-char
- – left justification
n – number of characters (field width)
.d – decimal positions

awk – actions
%[-][n][.d]conv-char
- – left justification
n – number of characters (field width)
.d – decimal positions
conv-char – conversion character:
d – decimal, e – exponent, f – floating-point
o – octal, x – hexadecimal, s – string
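
print with and without commas, and printf with width, justification, precision and several conversion characters. A sketch assuming a POSIX awk; the data file is invented.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'Tom 30.5' 'Ann 25' > data

awk '{ print $1, $2 }' data        # comma: OFS between fields: Tom 30.5
awk '{ print $1 $2 }' data         # no comma: concatenated: Tom30.5
awk '{ print "name: " $1 }' data   # literal text: name: Tom
# %-5s left-justifies in 5 characters, %6.2f is 6 wide with 2 decimals
awk '{ printf "%-5s|%6.2f|\n", $1, $2 }' data   # Tom  | 30.50|
awk 'BEGIN { printf "%d %o %x %e %s\n", 255, 255, 255, 1234.5, "hi" }'   # 255 377 ff 1.234500e+03 hi
```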

awk – variables
awk-provided variables
NF – number of fields in the current record
$1…$n – each field in the current record
FS – input field separator (default: runs of spaces or tabs)
OFS – output field separator (default: a space)

awk – variables
awk-provided variables
NR – current record number
$0 – entire current record
RS – record separator (default: newline)
ORS – output record separator (default: newline)
FILENAME – name of the current input file
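
The built-in variables on two invented files, one space-separated and one colon-separated. A sketch assuming a POSIX awk.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' 'a b c' 'd e' > f1
printf '%s\n' 'x:y:z' > f2

awk '{ print FILENAME, NR, NF, $NF }' f1   # f1 1 3 c, then f1 2 2 e
awk -F: '{ print $2 }' f2                  # y
awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }' f2   # x-z
awk 'BEGIN { ORS = ";" } { print $0 }' f1  # a b c;d e;  (no final newline)
```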

awk - variables
Associative Arrays
array_name[string]
The array name should be meaningful
The index of the array is a string
Elements are created automatically the first time they are referenced
for (element in array) actions – visits every index, in no guaranteed order
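
A word count with an associative array, the classic use of string indexes. A sketch assuming a POSIX awk; the fruit file is invented, and sort is added because for-in order is unspecified.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' apple banana apple cherry apple banana > fruit

# count[$1]++ creates each element on first use; for-in order is unspecified,
# so the output is piped through sort: apple 3, banana 2, cherry 1
awk '{ count[$1]++ } END { for (f in count) print f, count[f] }' fruit | sort
```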

awk - functions
length(string) – returns the number of characters in string
int(num) – returns the integer portion of num
index(str1, str2) – returns the position of str2 in str1, or 0 if not present
split(str, arr, del) – populates arr[] from the fields in str delimited by del; returns the number of elements

awk - functions
sprintf(fmt, args) – formats args using fmt and returns the formatted string
substr(str, pos, len) – returns the substring of str starting at position pos with a length of len
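
The built-in functions from the last two slides, applied to the word from the cut example. A sketch assuming a POSIX awk; everything runs in a BEGIN block, so no input file is needed.

```shell
#!/bin/sh
awk 'BEGIN {
    s = "jactitating"
    print length(s)                    # 11
    print int(7.9), int(-7.9)          # 7 -7 (the fraction is dropped)
    print index(s, "tat")              # 6 (positions count from 1)
    print index(s, "dog")              # 0 (not present)
    n = split("The:quick:brown", w, ":")
    print n, w[2]                      # 3 quick
    print sprintf("%.2f", 3.14159)     # 3.14
    print substr(s, 1, 4)              # jact
}'
```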