FORTRAN Short Course
Week 4
Kate Thayer-Calder
March 10, 2009
Topics for this week
Searching in Unix
Grep, Regular Expressions
Multi-Dimensional Arrays
User Defined Datatypes
Missing data
Reading and writing scientific data
Unix Wildcards
* - matches any string of zero or more characters
ls *a returns all files ending in ‘a’
ls a* returns all files starting with ‘a’
? - matches exactly one character
?ouse would return house and mouse, but not grouse.
grep
Searches through a file looking for a specific string or pattern and returns the lines where the string occurs:
grep -i 'alien' ufo.txt (case insensitive)
grep -w 'abduct' ufo.txt (whole word only)
grep -riw 'saucer' * (recursively through subdirectories)
or lines where it does not occur:
grep -v 'censor' ufo.txt
Can grep multiple files; just add them to the list on the line:
grep -i 'parameterization' *.txt
Unix lexicon: “Can’t grep dead trees.”
Regular Expressions
A regular expression (aka RegEx) is a special string for describing a pattern of text.
RegExs can be used with grep or other Unix commands and programs for sifting through text.
They can get really huge, confusing, and powerful. We'll just look at a few simple options.
For more: http://www.regular-expressions.info or just man regex
Grep and Regex
grep foo files             search files for lines with "foo"
grep '^foo' files          "foo" at the start of a line
grep 'foo$' files          "foo" at the end of a line
grep '^foo$' files         lines containing only the word "foo"
grep '\^foo' files         lines containing "^foo" (escape character)
grep '[fF][oO]o' files     search for foo, Foo, fOo, or FOo
grep '^$' files            search for blank lines
grep '[0-9][0-9]' files    search for a pair of numeric digits
grep combinations
Just some ideas for using grep with other Unix commands:
ls -al | grep 'Jan'
ps -ef | grep '501'
man ftp | grep -i 'directory'
head -30 'mydata.txt' | grep 'temperature'
But...
Searching has become much easier than it once was. Usually your desktop search engine will filter through files looking for your keywords.
So, let's talk about more Fortran!
Multi Dimensional Arrays
type, dimension(dim1,dim2,...) :: name
REAL, dimension(lon,lat,height,time) :: temp
Higher dimensional arrays are usually stored contiguously in memory or binary files, in COLUMN MAJOR order
See example Multiarrays.f90
Column Major
Fortran fills up each dimension in order, so for an (i,j,k) array, i fills first, then j, then k.
But do-loops work inside out: the innermost loop varies fastest.
So loop over k first (outermost), then j, then i (innermost).
To fix this, write your do-loops from the last index in to the first.
Do time=1,days
  Do lon=1,360
    Do lat=1,180
      Read (10,fmt) Data(lat,lon,time)
    enddo
  enddo
enddo
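The loop ordering above can be tried out with a small self-contained sketch. This is not the course's Multiarrays.f90; the program name, array sizes, and fill values are made up for illustration.

```fortran
program column_major_demo
  implicit none
  ! Small, made-up sizes so the output stays readable
  integer, parameter :: nlat = 3, nlon = 4, ntime = 2
  real :: temp(nlat, nlon, ntime)
  integer :: lat, lon, time

  ! Outermost loop runs over the LAST index, innermost over the FIRST,
  ! so consecutive iterations touch neighbouring memory locations.
  do time = 1, ntime
     do lon = 1, nlon
        do lat = 1, nlat
           temp(lat, lon, time) = real(lat + 10*lon + 100*time)
        end do
     end do
  end do

  ! Printing the whole array walks it in memory order:
  ! lat varies fastest, then lon, then time.
  print '(3f8.1)', temp
end program column_major_demo
```

Each printed row holds the three lat values for one (lon, time) pair, which makes the storage order visible.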
Array Transformation
Reshape function is pretty cool
Matrix = RESHAPE( Source, Shape )
A = RESHAPE( B, (/3,2/) )
Another way to index your array elements uses 'mod' and integer division (with i counting from 0):
lat = array(MOD(i,num_lats)+1)
lon = array(i/num_lats + 1)
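[Figure: elements 1 to 6 laid out in column-major order in a 3 x 2 array; rows are lats, columns are lons]
        lons
lats    1 4
        2 5
        3 6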
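A minimal sketch of both ideas, assuming a 3 x 2 array like the one in the figure. The names b, a, and the zero-based counter i are illustrative choices, not taken from the course examples.

```fortran
program reshape_demo
  implicit none
  integer, parameter :: num_lats = 3, num_lons = 2
  integer :: b(num_lats*num_lons), a(num_lats, num_lons)
  integer :: i, lat, lon

  b = (/ 1, 2, 3, 4, 5, 6 /)
  ! RESHAPE fills the first dimension first (column-major order)
  a = reshape(b, (/ num_lats, num_lons /))

  ! Print row by row to see the layout from the figure
  do lat = 1, num_lats
     print '(2i4)', a(lat, :)
  end do

  ! Recover (lat, lon) from a zero-based linear position i
  do i = 0, num_lats*num_lons - 1
     lat = mod(i, num_lats) + 1
     lon = i / num_lats + 1
     print '(a,i2,a,i2,a,i2,a,i2)', 'i=', i, ' lat=', lat, &
           ' lon=', lon, ' value=', a(lat, lon)
  end do
end program reshape_demo
```

Note that the mod/division formulas only line up if the linear counter starts at 0; with a counter starting at 1, subtract 1 first.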
Allocatable Arrays
Sometimes, you don't know how large you want your array to be until runtime.
Fortran 90 has “allocatable arrays” that can be declared without fixed dimensions, and filled in when the program is running.
These can be filled from stdin, or a variable in a file, or a calculation based on previous work, or any other run-time value.
See example Multiarrays2.f90
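As a rough sketch of the idea (not the contents of Multiarrays2.f90), the program below reads the array sizes from stdin and allocates at run time. The variable names are assumptions for illustration.

```fortran
program allocatable_demo
  implicit none
  real, allocatable :: temp(:,:)   ! rank is fixed, extents are not
  integer :: nlat, nlon, ierr

  ! Sizes are only known at run time; here they come from stdin.
  print *, 'Enter number of lats and lons:'
  read *, nlat, nlon

  allocate(temp(nlat, nlon), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'

  temp = 0.0
  print *, 'Allocated array with shape ', shape(temp)

  deallocate(temp)   ! release the memory when done
end program allocatable_demo
```

Checking stat= lets the program fail with a clear message instead of crashing if the requested size is too large.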
WHERE statements
An easy way to initialize or set sections of arrays
WHERE (array expression)
  array assignment block
ELSEWHERE
  array assignment block 2
END WHERE
This is called “masking”
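A small sketch of masking with WHERE, using made-up precipitation values and -9999.0 as an assumed missing-data flag:

```fortran
program where_demo
  implicit none
  real :: precip(5), flag(5)

  ! Made-up data; negative values stand in for missing observations
  precip = (/ 1.2, -9999.0, 0.4, 3.1, -9999.0 /)

  where (precip < 0.0)
     flag = 0.0      ! mask is true: mark as missing
  elsewhere
     flag = 1.0      ! mask is false: mark as valid
  end where

  print '(5f5.1)', flag
end program where_demo
```

The mask and the arrays being assigned must all have the same shape.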
FORALL Construct
This statement indicates to the compiler that the operations can be performed in parallel (no operations depend on the value of the operation on other elements in the array)
FORALL (triplet) variable = expression
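A brief sketch of the triplet form, index = start:end[:stride], with an optional mask. The array and its values are illustrative only.

```fortran
program forall_demo
  implicit none
  integer, parameter :: n = 4
  real :: a(n, n)
  integer :: i, j

  a = 0.0

  ! Single triplet: set the diagonal
  forall (i = 1:n) a(i, i) = 1.0

  ! Two triplets plus a mask: fill only the upper triangle
  forall (i = 1:n, j = 1:n, i < j) a(i, j) = real(i + j)

  do i = 1, n
     print '(4f6.1)', a(i, :)
  end do
end program forall_demo
```

FORALL was declared obsolescent in Fortran 2018; newer code often uses DO CONCURRENT for the same purpose, but FORALL still compiles with current compilers.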
Atmospheric Data
You’ll see data stored in arrays in many ways:
MyData(pressure, temp, mixingratio, height)
MyPressure(height), MyTemp(height), MyMixingRatio(height)
Pressure(lat,lon,height,time), Temperature(lat,lon,height,time)
The Perils of Parallel Arrays
It is common in our science to see people using multiple arrays of data that are all the same shape but for different variables (Temperature, Pressure, u wind, v wind, ...)
This is considered bad form in computer science; it would be better to have one array with multiple values possible at each point. Why?
This gets confusing if you implement a 5-D array, however.
User Defined Data Types
Fortran gives us a nice way to describe more complex data structures by creating new data types.
Instead of 4 arrays with different variables in each, we can have one array with four values at each point.
TYPE name
  DataType :: Component_name
  ....
END TYPE name
We can create variables with this type or arrays of variables of this type:
TYPE (name) :: VariableName
TYPE (name), Dimension(d1,d2,d3,d4) :: ArrayName
Example: StdAtmos.f90
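A minimal sketch of a derived type holding the four values mentioned earlier. This is not StdAtmos.f90; the type name sounding, its component names, and the profile values are assumptions for illustration.

```fortran
program derived_type_demo
  implicit none

  ! One type bundles the four variables for a single level
  type :: sounding
     real :: pressure      ! hPa
     real :: temp          ! K
     real :: mixing_ratio  ! g/kg
     real :: height        ! m
  end type sounding

  type(sounding) :: surface
  type(sounding), dimension(3) :: profile
  integer :: k

  ! Structure constructor: components in declaration order
  surface = sounding(1013.25, 288.15, 0.0, 0.0)

  ! Illustrative profile: 1 km steps with a 6.5 K/km lapse rate.
  ! Pressure and mixing ratio are left at surface values for brevity.
  profile = surface
  do k = 1, 3
     profile(k)%height = 1000.0 * real(k)
     profile(k)%temp   = surface%temp - 6.5 * real(k)
  end do

  do k = 1, 3
     print '(a,f8.1,a,f8.2)', 'height (m): ', profile(k)%height, &
           '  temp (K): ', profile(k)%temp
  end do
end program derived_type_demo
```

Components are reached with the % operator, so profile(k)%temp replaces a separate MyTemp(k) array.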
INF and NaN
INF is defined as the value given to any Real that is outside the limits of the type.
Fortran has +INF and -INF
NaN (Not a Number) is produced as the result of an improper floating point calculation.
NaN is not equal to either INF. In fact, in the IEEE standard, NaN is not even equal to itself.
INF or NaN are occasionally used as placeholders for missing data.
See Example: WriteExample2.f90
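One way to produce and test these values is the intrinsic ieee_arithmetic module (Fortran 2003 and later). This sketch is not WriteExample2.f90, and it assumes the compiler supports IEEE arithmetic for default reals.

```fortran
program inf_nan_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: pos_inf, neg_inf, nan

  ! Ask the IEEE module for the special values directly
  pos_inf = ieee_value(1.0, ieee_positive_inf)
  neg_inf = ieee_value(1.0, ieee_negative_inf)
  nan     = ieee_value(1.0, ieee_quiet_nan)

  print *, '+INF, -INF, NaN: ', pos_inf, neg_inf, nan

  ! Under IEEE rules a NaN compares unequal to everything, itself included
  print *, 'NaN == NaN?       ', nan == nan

  ! Safer tests than comparing with ==
  print *, 'ieee_is_nan(NaN): ', ieee_is_nan(nan)
  print *, 'is +INF finite?   ', ieee_is_finite(pos_inf)
end program inf_nan_demo
```

Aggressive optimization flags that relax IEEE semantics can change the result of the nan == nan comparison, so ieee_is_nan is the more dependable check.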
Missing Data
Any observational dataset is going to have holes.
If missing data is not given as an “outside the bounds” value (-9999 or 9999.0) it is often replaced with INF or NaN.
Most Fortran implementations will read INF or NaN in as a Real value (it is a real Real). We need to check for it before doing calculations, or the bad values will propagate through our results (or trigger a runtime error if floating-point trapping is enabled).
See Example: ReadBadData.f90
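A sketch of checking for bad values before a calculation, assuming -9999.0 as the out-of-bounds flag. This is not ReadBadData.f90; the data are made up and built in memory rather than read from a file.

```fortran
program missing_data_demo
  use, intrinsic :: ieee_arithmetic
  implicit none
  real, parameter :: missing = -9999.0   ! assumed missing-data flag
  real :: obs(6)
  logical :: good(6)

  ! Made-up observations with one flagged value, one NaN, one INF
  obs = (/ 12.5, missing, 14.0, 0.0, 13.5, 0.0 /)
  obs(4) = ieee_value(1.0, ieee_quiet_nan)
  obs(6) = ieee_value(1.0, ieee_positive_inf)

  ! A point is usable only if it is finite and not the flag value
  good = ieee_is_finite(obs) .and. (obs /= missing)

  if (count(good) > 0) then
     print *, 'valid points: ', count(good)
     print *, 'mean:         ', sum(obs, mask=good) / real(count(good))
  else
     print *, 'no valid data'
  end if
end program missing_data_demo
```

Building a logical mask once and passing it to SUM and COUNT keeps the bad values out of every later calculation.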
NetCDF Data
NetCDF is an I/O library that is widely used in the earth sciences.
Once the library is installed, you can use its procedures to open and access NetCDF files.
Each file is "self-describing": all of the data is annotated (dimension, units, range of values, missing data values, etc.)
Examples: read_netCDF.f90 with data from NCEP (NCEP.Precip.0100-1204.nc)
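A rough sketch of opening the file and inspecting one variable with the netCDF Fortran 90 interface. This is not read_netCDF.f90: the variable name 'precip' is an assumption (check the real name with ncdump -h), and the program must be compiled and linked against the netCDF Fortran library.

```fortran
program read_precip_sketch
  use netcdf
  implicit none
  character(len=*), parameter :: fname = 'NCEP.Precip.0100-1204.nc'
  character(len=*), parameter :: vname = 'precip'   ! ASSUMED variable name
  integer :: ncid, varid, ndims, i, dimlen
  integer :: dimids(nf90_max_var_dims)

  ! Open read-only and look up the variable by name
  call check(nf90_open(fname, nf90_nowrite, ncid))
  call check(nf90_inq_varid(ncid, vname, varid))

  ! The file is self-describing: ask it for the variable's dimensions
  call check(nf90_inquire_variable(ncid, varid, ndims=ndims, dimids=dimids))
  do i = 1, ndims
     call check(nf90_inquire_dimension(ncid, dimids(i), len=dimlen))
     print *, 'dimension ', i, ' has length ', dimlen
  end do

  call check(nf90_close(ncid))

contains

  ! Stop with the library's own message if any call fails
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
       print *, trim(nf90_strerror(status))
       stop 1
    end if
  end subroutine check

end program read_precip_sketch
```

Once the dimension lengths are known, an allocatable array of matching shape can be filled with nf90_get_var.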
Zonal Average Example
Modelers and Dynamicists like to look at the atmosphere in latitudinal bands.
Don’t have to worry about missing data here...
Loading in precip data is pretty simple if you know the parameters.
When you do a zonal average, first average in time at each point and then average across all longitudes.
Could come up with a less memory intensive way to get the same result...
Example: PlayWithPrecip.f90
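The two averaging steps can be sketched with the SUM intrinsic and its DIM argument. This is not PlayWithPrecip.f90; it uses a tiny synthetic (lon, lat, time) array with made-up values instead of the NCEP data.

```fortran
program zonal_mean_demo
  implicit none
  ! Tiny synthetic grid; real data would be much larger
  integer, parameter :: nlon = 4, nlat = 3, ntime = 5
  real :: precip(nlon, nlat, ntime)
  real :: time_mean(nlon, nlat), zonal_mean(nlat)
  integer :: i, j, t

  ! Fill with made-up values, looping from the last index in to the first
  do t = 1, ntime
     do j = 1, nlat
        do i = 1, nlon
           precip(i, j, t) = real(j) + 0.1*real(i) + 0.01*real(t)
        end do
     end do
  end do

  ! Step 1: average in time at each grid point
  time_mean = sum(precip, dim=3) / real(ntime)

  ! Step 2: average across all longitudes for each latitude band
  zonal_mean = sum(time_mean, dim=1) / real(nlon)

  do j = 1, nlat
     print '(a,i2,a,f8.4)', 'lat band ', j, ': ', zonal_mean(j)
  end do
end program zonal_mean_demo
```

With no missing data, every latitude band has the same number of points, so accumulating a running sum per band while reading would give the same result without holding the intermediate time_mean array.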
What did we talk about?
Searching in Unix
Grep, Regular Expressions
Multi-Dimensional Arrays
User Defined Datatypes
Missing data
Reading and writing scientific data