CS410/510: SciData Management. Scientific Data Management. Dr. Laura Bright, Bill Howe.
![Page 1: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/1.jpg)
CS410/510: SciData Management
Scientific Data Management
Dr. Laura Bright, Bill Howe
![Page 2: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/2.jpg)
Biology
Old way:
  Wet lab chemistry
New way:
  Microarray
  Search GenBank, Ensembl, GDB, SwissProt, Entrez using BLAST, FASTA, GCG, EMBOSS
![Page 3: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/3.jpg)
Astronomy
Old way: Sign up for telescope time
New way: Sloan Digital Sky Survey
Systematically mapping ¼ of the entire sky
12 TB to date, 15 TB final in 2007
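To get a feel for these sizes, here is a back-of-the-envelope sketch of how long it would take to move the survey over a network. The 100 Mbit/s link speed and the decimal definition of a terabyte (10^12 bytes) are assumptions chosen for illustration; neither comes from the slides.

```python
def transfer_days(size_tb: float, link_mbit_per_s: float) -> float:
    """Days needed to move size_tb terabytes over a link of the given speed."""
    size_bits = size_tb * 1e12 * 8                 # assumed: 1 TB = 10**12 bytes
    seconds = size_bits / (link_mbit_per_s * 1e6)  # bits / (bits per second)
    return seconds / 86_400                        # 86,400 seconds per day


# Sizes quoted on the slide; the 100 Mbit/s link is a hypothetical value.
for size_tb in (12, 15):
    days = transfer_days(size_tb, link_mbit_per_s=100)
    print(f"{size_tb} TB at 100 Mbit/s: about {days:.1f} days")
# prints:
# 12 TB at 100 Mbit/s: about 11.1 days
# 15 TB at 100 Mbit/s: about 13.9 days
```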
![Page 4: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/4.jpg)
Oceanography
Old way:
  Field work
  Simplified calculations
New way:
  Finite element analysis
  In situ sensors
  CODAR
![Page 5: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/5.jpg)
Science is Changing
Old Science: “Query the world”
  Data acquisition is the dominant cost
New Science: “Download the world”
  Data analysis is the dominant cost
![Page 6: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/6.jpg)
Course Structure
10% In-class exercises
10% Study Questions
40% Homework Assignments
15% Mini-project
25% Short Paper (3 pages)
No exams
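As a rough illustration of how these weights combine, the sketch below computes a weighted total. The slide gives only the percentages; the assumption that each component is scored from 0 to 100 and combined linearly, and the sample scores themselves, are hypothetical.

```python
# Weights in percent, taken from the slide.
WEIGHTS = {
    "in_class_exercises": 10,
    "study_questions": 10,
    "homework": 40,
    "mini_project": 15,
    "short_paper": 25,
}


def course_grade(scores: dict) -> float:
    """Combine per-component scores (each 0-100) into a weighted total (0-100)."""
    assert sum(WEIGHTS.values()) == 100  # the slide's percentages cover the whole grade
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "in_class_exercises": 90,
    "study_questions": 80,
    "homework": 85,
    "mini_project": 70,
    "short_paper": 88,
}
print(course_grade(example_scores))  # prints 83.5
```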
![Page 7: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/7.jpg)
Short Paper Assignment (1/2)
To be completed individually!
Compare/Contrast a pair of papers
  We provide a list to choose from
![Page 8: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/8.jpg)
Short Paper Assignment (2/2)
25% = 3 milestones + final paper
  2 points: select paper pair (~ week 3)
  5 points: a half-page summary of each paper; one page total (~ week 5)
  3 points: a list of 3 points of contrast/comparison, in complete sentences (~ week 7)
  15 points: final paper (~ week 11)
Both content and mechanics matter!
![Page 9: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/9.jpg)
Study Questions
Covers the readings
Discussion ok, but write up your own answers
  Dr. Bright’s “Pizza rule”
  Try to keep the discussion on the list
3-4 questions per set, about 1 set per week
Details:
  About a paragraph; use complete sentences
  Feel free to use diagrams or figures when appropriate!
  Due at the beginning of class on the due date
![Page 10: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/10.jpg)
Homework Assignments
Covers tools (rather than readings)
To be completed individually!
Send questions to the instructors rather than the list
![Page 11: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/11.jpg)
Late work
Prior approval is necessary, but not always sufficient
![Page 12: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/12.jpg)
Course Web Page
http://www.cs.pdx.edu/~howe/cs410
We hope to post class materials at least an hour before class (no promises)
Extra copies of printed material will be available outside Dr. Bright’s office (FAB 310-24)

| material | web page | hard copy |
| --- | --- | --- |
| lectures | Yes | No |
| readings available online | Yes | No |
| copy-sensitive readings | No | Yes |
| study questions | Yes | Yes |
| homework | Yes | Yes |
![Page 13: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/13.jpg)
Office Hours
Howe: FAB 310-C Monday 4-6 (or by appointment)
Bright: FAB 310-24 Thursday 1-3 (or by appointment)
![Page 14: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/14.jpg)
![Page 15: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/15.jpg)
Course Email List
“scidata”
Ok to discuss study questions
Not ok to discuss homework answers
Send HW questions to instructors
https://webmail.cecs.pdx.edu/mailman/listinfo.cgi/scidata
![Page 16: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/16.jpg)
Academic Integrity
2004-2005 PSU Catalog pages 29-30
Posted on the web page
![Page 17: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/17.jpg)
A First Class Exercise
1) Name (feel free to add pronunciation hints!)
2) Email you wish to use for this class
3) How much experience with RDBMS?
  (A) What’s an RDBMS?
  (B) I’ve taken CS 386, but that’s it
  (C) I’ve used an RDBMS on a few projects
  (D) I write SQL semi-daily
  (E) I’m a DBA
4) How might Scientific Data Management be different from “regular” data management?
![Page 18: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/18.jpg)
(Scientific Data) Management
Interesting data types
  Gene sequences, spatio-temporal objects, scalars, vectors, tensors
  map layers, images, meshes
  unstructured metadata
Interesting scale
  Terabytes becoming petabytes
Interesting access patterns
  Data “products”
  Data “releases”
![Page 19: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/19.jpg)
Scientific (Data Management)
Readings drawn from database literature
We will consider:
  Conventional technology
    Relational databases
    Web Services/XML
  Specialized technology
    GIS
    Grid
    Workflow
    Visualization
Emphasis on case studies
![Page 20: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/20.jpg)
Characterizing SDMS (1/3)
What logical data types are involved?
  DNA sequences
  maps of the earth, rivers, lakes
  maps of the sky, galaxies, stars
  Particle trajectories
What physical data types are involved?
  Multimedia?
  Multidimensional arrays?
  Spatio-temporal objects?
  “ordinary” tuples?
![Page 21: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/21.jpg)
Characterizing SDMS (2/3)
Who are the customers?
  Other Researchers
  General Public
  Policy Makers
  Emergency Workers
  Commercial
![Page 22: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/22.jpg)
Customers?
![Page 23: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/23.jpg)
Characterizing SDMS (3/3)
What is the architecture?
  Pipeline (Workflow)
  Archive (Database)
  Clearinghouse (Portal)
What interfaces are supported?
  Browse
  Query
  Upload
  Derive
  Script (Web Services)
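One way to keep track of the characterization questions from these three slides (data types, customers, architecture, interfaces) is a small record type. The sketch below is only an illustration: the class and field names are invented here, and the sample system is hypothetical rather than one of the case studies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Architecture(Enum):
    PIPELINE = "workflow"
    ARCHIVE = "database"
    CLEARINGHOUSE = "portal"


class Interface(Enum):
    BROWSE = "browse"
    QUERY = "query"
    UPLOAD = "upload"
    DERIVE = "derive"
    SCRIPT = "web services"


@dataclass
class SystemProfile:
    """Answers to the characterization questions for one system (hypothetical type)."""
    name: str
    logical_types: List[str] = field(default_factory=list)
    physical_types: List[str] = field(default_factory=list)
    customers: List[str] = field(default_factory=list)
    architecture: Optional[Architecture] = None
    interfaces: Set[Interface] = field(default_factory=set)

    def open_questions(self) -> List[str]:
        """Names of the characterization fields that are still unanswered."""
        return [key for key, value in vars(self).items()
                if key != "name" and not value]


# A made-up system, partially characterized.
profile = SystemProfile(
    name="hypothetical sky archive",
    customers=["Other Researchers"],
    architecture=Architecture.ARCHIVE,
)
print(profile.open_questions())
# prints ['logical_types', 'physical_types', 'interfaces']
```

Filling in a profile like this for each case study makes it easier to compare systems along the same four axes.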
![Page 24: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/24.jpg)
More Examples
geodata.gov governmental GIS clearinghouse
EOSDIS NASA’s satellite image repository
IOOS Ocean measurement and forecasting
Others?
![Page 25: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/25.jpg)
![Page 26: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/26.jpg)
National Weather Service: Timeline
1849: Smithsonian Institution provides weather instruments to telegraph operators
1900: Galveston Hurricane
1935: Long range forecasts; buoys
1955-1960: Computer forecasts scheduled regularly; weather satellite TIROS I launched
1979: AFOS computer system is deployed, connecting all Weather Service forecast offices
1988: Weather Service mobilizes local forecasting operation to assist in fighting week-long wildfire in Yellowstone park
1990: NEXRAD radar deployment project; a Cray supercomputer deployed
![Page 27: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/27.jpg)
National Weather Service
Data Collection
  Radar
  Satellite
  Forecasts
  Bulletins
Data Dissemination
  Radio: aviation, marine, military channels
  FTP, HTTP, email, RSS: public
Part of a UN-sponsored global network
![Page 28: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/28.jpg)
National Weather Service: Network
![Page 29: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/29.jpg)
The Gateway
[Diagram labels: NWS: Gateway; Public; anonymous FTP; FTPMail; “Family of Services” (direct phone line); HTTP web services (XML/SOAP); web form; email; FTP; bulletins; RSS; radar; satellite; buoys; models]
![Page 30: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/30.jpg)
National Weather Service: Products (1/2)
Computer Models
  GRIB files from 10+ models from regional to global scale
  Example: SL.008001/ST.opnl/MT.ruc_CY.06/RD.20000622/PT.grid_DF.gr1/fh.0003x_tl.press
  Facsimile/Images
  Text products derived from models
  Special products in special formats
Text Products - Warnings, outlooks, advisories, forecast, discussion
  ~100 different types
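The example model path on this slide appears to be built from `key.value` pairs separated by `/` and `_`. The sketch below splits such a path into a dictionary. That reading of the separators is an assumption drawn from the example alone, the function name is invented here, and the slides do not say what the individual keys mean, so the code does not interpret them.

```python
def parse_product_path(path: str) -> dict:
    """Split a product path into {key: value}, assuming 'key.value' pairs
    joined by '/' (directories) and '_' (pairs within one path component)."""
    fields = {}
    for component in path.strip("/").split("/"):
        for pair in component.split("_"):
            key, dot, value = pair.partition(".")
            if not dot:
                raise ValueError(f"expected 'key.value', got {pair!r}")
            fields[key] = value
    return fields


# The example path quoted on the slide.
model_path = ("SL.008001/ST.opnl/MT.ruc_CY.06/RD.20000622/"
              "PT.grid_DF.gr1/fh.0003x_tl.press")
for key, value in parse_product_path(model_path).items():
    print(key, value)
```

A lookup such as `parse_product_path(model_path)["RD"]` then returns `"20000622"` without any manual string slicing.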
![Page 31: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/31.jpg)
National Weather Service: Products (2/2)
Observed Data - kept for 24 hours at least
  observations from aviation, buoys, ships, balloons
  special formats, but some have parsed them to XML
Radar Products - Multicast by connecting a router directly to NWS as well as FTP
  SL.us008001/DF.of/DC.radar/DS.p19r1/SI.kfws/sn.0114
Satellite Products - Cloud Water Vapor, Cloud Liquid Water, Rain Rate, Sea Ice Concentration, Sea Ice Age, Sea Ice Edge, Soil Moisture, Surface Wind, Water Vapor over oceans, Surface Temperature, Snow Water Content, Cloud Amount, and EDR Surface Type
![Page 32: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/32.jpg)
National Weather Service: Radar
![Page 33: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/33.jpg)
National Weather Service: Forecasts (1/3)
Several Climate Models:
  Weather Research and Forecast (WRF)
  Global Forecast System (GFS)
  North American Mesoscale (NAM)
  Nested Grid Model (NGM)
Specialized Models:
  Fire Weather
  Hurricane
  Aviation
![Page 34: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/34.jpg)
National Weather Service: Forecasts (2/3)
National Digital Forecast Database
  3 hr temporal resolution
  5 km spatial resolution
  GRIB files, GIS map layers, data products
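To see what these resolutions imply for data volume, the sketch below counts the values in one gridded forecast element. The two resolutions come from the slide; the 500 km by 500 km region and the 24-hour horizon are hypothetical inputs, and the function name is invented here.

```python
import math

TEMPORAL_RES_HOURS = 3  # from the slide
SPATIAL_RES_KM = 5      # from the slide


def grid_values(width_km: float, height_km: float, horizon_hours: float) -> int:
    """Number of grid values for one forecast element over a region and horizon."""
    cols = math.ceil(width_km / SPATIAL_RES_KM)
    rows = math.ceil(height_km / SPATIAL_RES_KM)
    steps = math.ceil(horizon_hours / TEMPORAL_RES_HOURS)
    return cols * rows * steps


# Hypothetical region and horizon: 100 x 100 cells, 8 time steps.
print(grid_values(500, 500, 24))  # prints 80000
```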
![Page 35: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/35.jpg)
National Weather Service: Forecasts (3/3)
Model Output Statistics (MOS)
Examples:
  Max/Min Temperature Forecasts
  Surface Temp / Dewpoint Forecasts
  Opaque Cloud Amount
  Probability of Precipitation
  Severe weather probabilities
MOS products
![Page 36: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/36.jpg)
National Weather Service: Satellites
Geostationary Operational Environmental Satellites
Variety of images and products
![Page 37: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/37.jpg)
National Weather Service: Summary
Domain?
Customers?
Architecture?
Interfaces?
![Page 38: CS410/510: SciData Management1 Scientific Data Management Dr. Laura Bright Bill Howe.](https://reader035.fdocuments.in/reader035/viewer/2022062517/56649e835503460f94b84efb/html5/thumbnails/38.jpg)