![Page 1: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/1.jpg)
Introduction to Field Station Databases
John Porter
Department of Environmental Sciences
University of Virginia
![Page 2: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/2.jpg)
Roadmap
• Why do we need field station databases?
• Challenges for Ecological Databases
• Database characteristics and types
• Evolving a Database
• Software Tools and Hardware
![Page 3: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/3.jpg)
WHY have Scientific Databases?
• Improvement of data quality
– multiple users provide multiple opportunities for detecting and correcting problems in data
• Cost
– data costs less to save than to collect again
– with environmental data, often data cannot be collected again at any cost
![Page 4: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/4.jpg)
WHY have Scientific Databases?
• Environmental Policy and Management
– environmental policy decisions require data that are regional or national, but most ecological data is collected at smaller scales
– numerous Federal initiatives
  • NII - National Information Infrastructure
  • FGDC - Federal Geographic Data Committee
![Page 5: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/5.jpg)
WHY have Scientific Databases?
• New Science
– Long Term
  • long-term studies depend on databases to retain project history
– Synthesis
  • use of data for a purpose other than that for which it was collected
– Integrated, multidisciplinary projects
  • depend on databases to facilitate sharing of data
![Page 6: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/6.jpg)
Attracting Researchers
Which do you choose?
• Field Station A
– Beautiful mountain forest setting
– Modern Laboratories
• Field Station B
– Beautiful mountain forest setting
– Modern Laboratories
– Climate and Meteorological Data
– Biodiversity Data
– Soils Data
– Topographic Data
![Page 7: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/7.jpg)
Challenges
• Resources
– Equipment
– Operational expenses
– Personnel
![Page 8: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/8.jpg)
Challenges for Scientific Databases
• Long-term perspective
– without databases, most data do not outlive the project that collected them
– goal: data that are accessible and interpretable 20 years in the future
  • technological - need persistent media that do not become technologically obsolete
  • contextual - need to capture context of data collection
  • semantic - terms need to be well-defined
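One way to approach the three requirements above is to keep observations in plain-text CSV alongside a metadata record that captures the collection context and defines every column. The sketch below is a minimal illustration, not a metadata standard: the `write_dataset` helper, the field names, and the sample values are all hypothetical.

```python
import csv
import io
import json


def write_dataset(rows, columns, context):
    """Return (csv_text, metadata_json) for a small dataset.

    rows:    list of dicts, one per observation
    columns: dict mapping column name -> {"definition": ..., "units": ...}
    context: dict describing who collected the data and how
    """
    # Technological: plain-text CSV stays readable without special software.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

    # Contextual and semantic: record the context and define every term.
    metadata = {"context": context, "columns": columns}
    return buffer.getvalue(), json.dumps(metadata, indent=2, sort_keys=True)


# Hypothetical example values, for illustration only.
columns = {
    "date": {"definition": "Sampling date", "units": "ISO 8601 (YYYY-MM-DD)"},
    "air_temp": {"definition": "Air temperature at 2 m height", "units": "degrees Celsius"},
}
context = {"collector": "Example Technician", "method": "Hypothetical hourly logger"}
rows = [{"date": "2001-06-01", "air_temp": 21.4}]

csv_text, metadata_json = write_dataset(rows, columns, context)
```

Writing both pieces in one call means a data file is never produced without its documentation.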
![Page 9: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/9.jpg)
Challenges for Scientific Databases
• Deal with Diversity
– science means asking NEW questions
  • new kinds of queries
– scientific data is heterogeneous and diverse
– scientific users have different backgrounds and goals
– the user community for a given database will be dynamic
![Page 10: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/10.jpg)
Characteristics of Ecological Data
[Figure: data types plotted by Data Volume (per dataset) against Complexity/Metadata Requirements, with axes marked from Low to High. Labeled data types: Satellite Images, Weather Stations, Business Data, Gene Sequences, GIS, Soil Cores, Primary Productivity, Population Data, Biodiversity Surveys. Labeled regions: "Most Ecological Data" and "Most Software".]
![Page 11: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/11.jpg)
Database Characteristics
“Deep” vs “Wide”
“Deep”
• Relatively few kinds of data
• Large numbers of observations
• Sophisticated query and analysis tools
“Wide”
• Many different types of data
• Smaller number of observations of each type
• Few analysis tools
![Page 12: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/12.jpg)
Examples of Scientific Databases
• Large Databases
– GENBANK - genetic sequence data
– PDB - protein structure database - 6K+ atomic coordinate entries
– funding >$1 million/year
– excellent examples of need for database solutions that scale
– substantial focus on specialized tools and storage
![Page 13: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/13.jpg)
Examples of Scientific Databases
• LTER Sites
– approximately 15% of site funding
– focus on long-term data
– diverse approaches to data management at different sites dictated by
  • locations of researchers
  • types of data collected
– testbed for “practical data management”
![Page 14: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/14.jpg)
Examples of Scientific Databases
• WWW pages of individual researchers or research projects
– can provide access to data
– typically do not utilize standards for metadata (documentation)
– typically provide no query tools
![Page 15: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/15.jpg)
Evolving a Database
• Development of a database is an evolutionary process
• Implement system based on current priorities - but think ahead!
• Seek scalable solutions
– avoid bottlenecks
– adding the 1000th piece of data should be as easy as adding the first (or easier)
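As a rough illustration of the scalability point, the sketch below registers datasets through a single function, so the 1000th entry takes exactly the same call as the first. It uses Python's built-in `sqlite3` module with an in-memory database; the `dataset` table, the `register_dataset` helper, and the file paths are hypothetical.

```python
import sqlite3


def open_catalog():
    """Create an in-memory catalog; a real system would use a file or server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset ("
        " id INTEGER PRIMARY KEY,"  # assigned automatically by SQLite
        " title TEXT NOT NULL,"
        " path TEXT NOT NULL UNIQUE)"
    )
    return conn


def register_dataset(conn, title, path):
    """Add one dataset; the call is identical for the 1st and the 1000th."""
    cursor = conn.execute(
        "INSERT INTO dataset (title, path) VALUES (?, ?)", (title, path)
    )
    conn.commit()
    return cursor.lastrowid


conn = open_catalog()
ids = [
    register_dataset(conn, f"Dataset {n}", f"data/{n:04d}.csv")
    for n in range(1, 1001)
]
```

No step here requires hand-editing an index page or renumbering existing entries, which is the kind of bottleneck the slide warns against.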
![Page 16: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/16.jpg)
Developing a Database - Questions to Ask
• Why is this database NEEDED?
• Who will be the USERS of the database?
• What types of QUESTIONS should the database be able to answer?
• What INCENTIVES will be available for data providers?
![Page 17: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/17.jpg)
Library Model
• Individual with 20 books
– just randomly put on shelves
• Individual with 500 books
– sort books on shelves based on topic or alphabetically
• Library
– complex cataloging system
– controlled keyword and subject vocabularies
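A controlled keyword vocabulary can be enforced with very little code. The following is a minimal sketch under assumed names: the keyword list and the `check_keywords` helper are hypothetical, and a real station would maintain its own vocabulary.

```python
# Hypothetical controlled vocabulary; a real station would maintain its own list.
CONTROLLED_KEYWORDS = {"climate", "soils", "biodiversity", "topography"}


def check_keywords(keywords, vocabulary=CONTROLLED_KEYWORDS):
    """Split keywords into (accepted, rejected) after normalizing case and spaces."""
    accepted, rejected = [], []
    for keyword in keywords:
        normalized = keyword.strip().lower()
        if normalized in vocabulary:
            accepted.append(normalized)
        else:
            rejected.append(keyword)
    return accepted, rejected


accepted, rejected = check_keywords(["Climate", " soils ", "weather stuff"])
```

Rejected terms are returned unchanged so that a data manager can decide whether to map them to an existing keyword or extend the vocabulary.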
![Page 18: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/18.jpg)
Commonly Used Types of Software
• Input and Analysis tools
• Metadata Tools
• Information sharing tools – WWW
• Database Management Systems (DBMS)
![Page 19: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/19.jpg)
Input and Analysis
Spreadsheets
• Good
– Widely used, easy to learn for simple graphical and statistical analyses
– Commonly already installed on most computers
• Bad
– Can encourage “bad practices” – create data that can’t easily be used
– Poor support for sophisticated analyses
– Lack of auditability – hard to “back track” how data were manipulated
![Page 20: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/20.jpg)
Statistical Packages
• Examples: SAS, SPSS, Statistica etc.
• Good
– Powerful analysis tools
– Auditable: Can store programs – fully document details of analysis
• Bad
– Harder to learn
– Less common on computers
– Can be expensive
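The auditability advantage applies to any analysis stored as a program, not only to the packages listed. The sketch below uses Python purely as a stand-in (the slides do not mention it) to show a stored analysis that records each step it takes. The missing-value code `-9999.0` and the `summarize` helper are assumptions made for this example.

```python
import statistics

MISSING = -9999.0  # hypothetical missing-value code, assumed for this example


def summarize(values, log):
    """Return the mean of valid values, appending each processing step to log."""
    log.append(f"read {len(values)} raw values")
    valid = [v for v in values if v != MISSING]
    log.append(f"dropped {len(values) - len(valid)} values equal to {MISSING}")
    mean = statistics.mean(valid)
    log.append(f"computed mean of {len(valid)} values")
    return mean


audit_log = []
mean_temp = summarize([20.0, 22.0, MISSING, 24.0], audit_log)
```

Because the raw values are never edited in place, anyone can rerun the program and trace how the result was obtained, which is hard to do after manual changes in a spreadsheet.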
![Page 21: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/21.jpg)
Other Input
• DBMS – Database Management Systems
– We’ll talk more about these later…..
![Page 22: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/22.jpg)
Database Management System (DBMS) Types
• Filesystem-based
– simple
– inefficient
– few capabilities
• Hierarchical
– phylogenetic structures
– geographical images
• Network
– very flexible
– not widely used
• Relational
– widely-used, mature
– table-oriented
– restricted range of structures
• Object-oriented
– developing - few commercial implementations
– diverse structures
– extensible
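To make the contrast between hierarchical and table-oriented structures concrete, the sketch below stores a small taxonomy as a nested hierarchy and then flattens it into (name, parent) rows of the kind a relational table would hold. The taxonomy fragment and the `flatten` helper are illustrative assumptions, not part of any particular DBMS.

```python
# Hypothetical fragment of a taxonomy, stored hierarchically as nested dicts.
taxonomy = {
    "Plantae": {
        "Pinaceae": {"Pinus": {}, "Picea": {}},
        "Fagaceae": {"Quercus": {}},
    }
}


def flatten(tree, parent=None):
    """Convert a nested hierarchy into (name, parent) rows for a relational table."""
    rows = []
    for name, children in tree.items():
        rows.append((name, parent))
        rows.extend(flatten(children, parent=name))
    return rows


rows = flatten(taxonomy)
```

The nested form mirrors the data directly, while the flat rows fit a table but need an explicit parent column to recover the tree. This is one sense in which relational systems offer a restricted range of structures.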
![Page 23: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/23.jpg)
DBMS Advantages and Disadvantages
• Advantages
– additional capabilities
  • sorting
  • query
  • integrity checking
– easy access to data
• Disadvantages
– few graphical or statistical capabilities
– proprietary formats may limit archival quality of data
– require expertise and resources to administer
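The three capabilities listed under Advantages can be seen in a few lines using SQLite through Python's built-in `sqlite3` module. This is a minimal sketch: the `observation` table, the site names, and the plausible-range limits are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation ("
    " site TEXT NOT NULL,"
    " air_temp_c REAL NOT NULL"
    " CHECK (air_temp_c BETWEEN -90 AND 60))"  # assumed plausible range
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?)",
    [("ridge", 14.5), ("marsh", 21.0), ("ridge", 16.5)],
)

# Integrity checking: the DBMS rejects an implausible value.
try:
    conn.execute("INSERT INTO observation VALUES (?, ?)", ("marsh", 999.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Query and sorting: mean temperature per site, warmest first.
site_means = conn.execute(
    "SELECT site, AVG(air_temp_c) FROM observation"
    " GROUP BY site ORDER BY AVG(air_temp_c) DESC"
).fetchall()
```

The range check lives in the table definition, so it applies to every program that writes to the database rather than depending on each user to validate input.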
![Page 24: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/24.jpg)
Choosing a DBMS
• What tasks do you want the DBMS to accomplish?
– query
– sorting
– analysis
• Is there a type of DBMS whose structure best mirrors that of the underlying data?
![Page 25: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/25.jpg)
Database Management Systems
• Commercial Products
– Microsoft ACCESS (part of Microsoft Office)
– Microsoft SQL Server
– Oracle
• Freeware
– MySQL
– PostgreSQL
– MiniSQL
![Page 26: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/26.jpg)
DBMS Backends
• Increasingly DBMS are being used as tools that support the “behind the scenes” activities in support of web sites
– You may not interact with the database itself, but rather with a TOOL that interacts with the database
• Tools such as Content Management Systems (CMS) use programs that in turn use DBMS to perform their functions
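The backend arrangement described above can be sketched as a small program that sits between the user and the database: the user sees only the generated web content, never the SQL. The example below uses Python's built-in `sqlite3` and `html` modules; the `dataset` table, its contents, and both helper functions are hypothetical, and a real CMS is far more elaborate.

```python
import html
import sqlite3


def build_station_db():
    """Create a small in-memory database standing in for the site's backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (title TEXT NOT NULL, year INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO dataset VALUES (?, ?)",
        [("Soil cores", 1998), ("Bird survey <draft>", 2001)],
    )
    return conn


def render_dataset_list(conn):
    """The 'tool': queries the database and returns an HTML fragment."""
    rows = conn.execute("SELECT title, year FROM dataset ORDER BY year").fetchall()
    # Escape titles so stored text cannot break the generated page.
    items = [f"<li>{html.escape(title)} ({year})</li>" for title, year in rows]
    return "<ul>" + "".join(items) + "</ul>"


page_fragment = render_dataset_list(build_station_db())
```

Adding a row to the table changes the page the next time it is rendered, with no hand-editing of HTML.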
![Page 27: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/27.jpg)
Information Sharing Tools
• WWW servers
– Apache Web Server
  • Free
  • Based on open standards
  • Runs on PCs, Macintosh and Unix
– Microsoft Web Server
  • Free, often distributed with Windows
  • Links to Microsoft tools
  • Proprietary - runs only under Windows
![Page 28: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/28.jpg)
WWW Servers
• Need dedicated Internet address that is connected to the network all the time
– A high-speed connection is desirable
• Need space to store web content
• The web server need not be local
– Locally-created WWW pages can be uploaded to a remote server
  • e.g., field station can use server at main university campus and use a modem or even floppy disks to transfer content
![Page 29: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/29.jpg)
What is the “Best Software”?
• SORRY! – there is no one list that is the correct answer for everyone!
• A knowledgeable user, rather than the particular software used, controls what can be accomplished
• Costs
– Cost of software
– Cost of administration
– Life-cycle costs
– Costs of migration
![Page 30: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/30.jpg)
Computer Systems
• UNIX/Linux
– mature, full-functioned system
  • strong on multitasking
  • more reliable and robust
– steep learning curve
– lots of free software
– software can be expensive
– wide array of WWW tools
• PCs & Macs
– rapid improvements in operating system design facilitate network access
– software & hardware inexpensive
– tools are more user-friendly
– number of tools rapidly growing
![Page 31: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/31.jpg)
Cautionary Notes - Lessons from the Worm Community System
![Page 32: Introduction to Field Station Databases John Porter Department of Environmental Sciences University of Virginia John Porter Department of Environmental.](https://reader031.fdocuments.in/reader031/viewer/2022031922/56649e195503460f94b06821/html5/thumbnails/32.jpg)
Final Thoughts
• Ecological databases are increasingly setting the boundaries for science itself
• Databases evolve, but they don’t spontaneously generate
[Figure: Database Building Blocks, with blocks labeled Connectivity, Content, and Organization]