Compilation and Design of a Functioning Distributed Database of North American Electric Generating Emissions
Stefan Falke, Center for Air Pollution Impact and Trend Analysis, Washington University in St. Louis
Gregory Stella, Alpine Geophysics, LLC
Terry Keating, US EPA – Office of Air & Radiation
Demonstration of a Distributed Emissions Inventory using Web Technologies
2
Background
Air pollutant emission inventories for the US, Canada, and Mexico are compiled, stored, and disseminated using different methods.
The development of a single comprehensive and accurate emissions inventory is essential for the coordinated reporting, policy development, transport analyses, and socio-economic studies that create an environment for collaboration among international researchers, policy-makers, and the interested public.
In support of this longer term goal, the Commission on Environmental Cooperation (CEC) and the US EPA have initiated a project to develop a prototype web tool for enabling uniform access to distributed emissions data from North American electricity generating power plants.
3
Distributed Data and Management Networks
Advances in information science and technology are driving the trend toward distributed networks and virtual communities for science and management.
• Cyberinfrastructure: NSF’s initiative to apply new IT to building new ways of conducting collaborative research
http://www.communitytechnology.org/nsf_ci_report/
• Earth Observation Summit: International effort to build comprehensive, coordinated, and sustained Earth observation systems
http://www.earthobservationsummit.gov
• Ecoinformatics: EPA’s vision for national and international cooperation in data and technology development
http://oaspub.epa.gov/sor/user_conference$.startup
• Integrated Ocean Observing System: International network of ocean-related monitoring, assessment, and communication
http://www.ocean.us/
• Linked Environments for Atmospheric Discovery: Network of high-performance computers and software to gain new insights into weather
http://lead.ou.edu/
• Virtual Observatory: Network for astronomical data sharing and distributed analysis
http://www.us-vo.org/
4
Emissions Community Collaborative Activities
NIF Data standards
► Standard format and submission
► NEI XML schema
Environmental Information Exchange Network
► Network linking EPA, States, and other partners through the Internet and standardized data formats
Facility Registry System
► Standard facility codes and locations
Data Sharing Efforts
► States, Tribes, Local agencies, RPOs
► North America
5
Networked Environmental Information System for Global Emissions Inventories (NEISGEI)
NEISGEI is both a conceptual framework and an implementation effort for the development of a fully integrated, distributed air emissions inventory, and the foundation for an all-media environmental information network.
– Tie together data at all spatial and temporal scales using emerging distributed database technologies
– Provide shared, online tools for processing and analysis
– Provide for the seamless merging, manipulation and analysis of Internet accessible air quality-relevant data through the development of emerging Internet-oriented technologies
– Make use of existing resources: link and partner with other efforts
– Build a broad-based air quality user community: scientists, regulators, policy analysts and the public
– Create the network and toolkit piece-wise through multiple, connected projects
Ongoing Efforts: NSF-EPA Digital Government Funded Projects:
• The California Air Resources Network
• Fire & Air Quality Data and Tools Network
Future Effort: EPA OAR RFA on Distributed Air Quality Data in Support of NEISGEI
6
CAREN: The California Air Resources Network
Eduard Hovy, Jose-Luis Ambite, Andrew Philpot, USC Information Sciences Institute
Environmental data sharing among international, national, state and local governments, the public and academic and other non-governmental research organizations is a difficult challenge.
► Barrier: Technological incompatibilities
► Barrier: Data format incompatibilities
► Barrier: Financial (staff time) limitations
[Diagram labels: US EPA, RPOs, AQMDs, Municipalities, Tribes, States, ???]
The Solution Strategy (First Step): Automate the integration of heterogeneous databases
Use semi-automated information integration methods to generate translation protocols between related information sources, e.g. AQMD and CARB.
7
Fire and Air Quality Data Network
[Diagram labels: Data Sources (ftp, text tables, RDBMS); Data Wrappers; Data Catalog (FS Coarse Spatial Data, NEI Fire Emissions, BLM Fire History, Wildland Fire Assessment System, HMS Fire Detection); Spatial Interpolation Service; VIEWS; CAPITA]
• Data ‘wrappers’ are used to translate the format of data sets into a uniform format.
• The data are either stored on the CAPITA database server or dynamically accessed from their original source.
• The datasets are registered with metadata in the data catalog.
• GIS-type interfaces provide users with ways to view, analyze, and export the data.
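As a rough illustration of the wrapper idea, the sketch below translates a comma-separated text table into records with a uniform layout. The uniform field names (source, facility, year, pollutant, tons), the column names, and the sample rows are all hypothetical and are not taken from the CAPITA system.

```python
import csv
import io

def wrap_text_table(text, source, pollutant, columns):
    """Translate a comma-separated text table into uniform records.

    `columns` names the source columns that hold the facility, year,
    and emission value (hypothetical names chosen for this sketch).
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "source": source,
            "facility": row[columns["facility"]],
            "year": int(row[columns["year"]]),
            "pollutant": pollutant,
            "tons": float(row[columns["value"]]),
        })
    return records

# Made-up sample rows, for illustration only.
sample = "PlantName,Yr,SO2_Ann\nPlant A,1999,1200.5\nPlant B,1999,80\n"
uniform = wrap_text_table(
    sample,
    source="example_inventory",
    pollutant="SO2",
    columns={"facility": "PlantName", "year": "Yr", "value": "SO2_Ann"},
)
```

A wrapper for an ftp source or an RDBMS would return the same record layout, so downstream tools would only need to understand one format.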
8
Envisioned Emissions Community Resource of Data & Tools
[Diagram labels: XML, GIS, Estimation Methods, RDBMS, Geospatial One-Stop, Transport Models, Emissions Inventory Catalog, Users & Projects, Web Tools/Services, Emissions Inventories, Data, Data Catalogs, Activity Data, Spatial Allocation, Comparison of Emissions Methods, Data Analysis, Model Development, Wrappers, Emissions Factors, Surrogates, Report Generation]
9
Current Project Objectives
The project’s focus is on criteria pollutants and toxics because of their availability and accessibility.
Recommend and demonstrate to the CEC approaches for the comparability of techniques and methodologies for data gathering and analysis, data management, and electronic data communications for promoting access to publicly available electric utility emissions
Identify, collect, and review existing sources of electric generating utility (EGU) emissions and activity databases, and provide a summary of the state-of-science
Build a prototype web browser tool to query, retrieve, and explore emissions data from heterogeneous databases
• Demonstrate the utility of new information technologies in creating an integrated network of distributed data and tools
• Dynamically link multiple existing datasets without requiring substantial modification on the provider’s end, and provide interfaces that make the links transparent to the end user
• “Add value” to the linked data through the application of data analysis and processing tools
10
Design Objectives
• Distributed
• Non-intrusive to data provider
• Transparent to end user
• Flexible
• Extendable
• Light on user requirements
11
Process of Building Demonstration
Identify and access relevant data (build wrappers)
Build relational database to temporarily store the data that are not accessible in a distributed manner
Acquire authorization and access to those data that are dynamically accessible through internet interfaces
Create field name “mappings” among datasets
Identify available web technologies for building a distributed emissions tool
Develop new components necessary for the prototype
Build web tool prototype for demonstrating the feasibility of exploring emissions data
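The temporary relational store mentioned in the steps above could look something like the following minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table layout and the emission values are invented for illustration and do not reflect the actual CAPITA schema.

```python
import sqlite3

# In-memory database standing in for the temporary relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emissions ("
    "source TEXT, facility TEXT, year INTEGER, pollutant TEXT, tons REAL)"
)

# Made-up rows, for illustration only.
rows = [
    ("NEI", "Plant A", 1999, "SO2", 100.0),
    ("NEI", "Plant B", 1999, "SO2", 50.0),
    ("NPRI", "Plant C", 1999, "SO2", 40.0),
    ("NEI", "Plant A", 1998, "SO2", 999.0),
]
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?, ?, ?)", rows)

def total_by_source(conn, pollutant, year):
    """Sum emissions per inventory for one pollutant and year."""
    cursor = conn.execute(
        "SELECT source, SUM(tons) FROM emissions "
        "WHERE pollutant = ? AND year = ? "
        "GROUP BY source ORDER BY source",
        (pollutant, year),
    )
    return dict(cursor.fetchall())

totals = total_by_source(conn, "SO2", 1999)
```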
12
Available Internet Accessible Emissions Data
Data Source | Time Coverage | Pollutants | Reporting Level
NEI (US) | 1985-1999 (criteria), 1996-1999 (HAPs) | NOx, SO2, CO, PM, VOC, HAPs | Boiler
eGrid (US) | 1996-2000 | NOx, SO2, CO2, Mercury | Boiler & Generator
Clean Air Markets (US) | 1980, 1985, 1988-1999 | NOx, SO2, CO2 | Generator
NPRI (Canada) | 1994-2001 | HAPs (Criteria starting in 2002) | Facility
These are publicly available, on-line accessible emissions data. Other data resources are available, but at this time only in hard copy form and therefore not usable in demonstrating distributed database concepts.
• NEI, NPRI, and eGrid data were downloaded and stored in relational databases on the CAPITA server
• BRAVO Mexican emissions data were obtained in electronic format and imported to the CAPITA server
• Clean Air Markets was identified as the source most suitable for demonstrating distributed access
13
Emissions Data Characteristics
• Web browser query; web map server; recent database structure upgrade
• Web browser query; remotely accessible using SecuRemote; not yet publicly accessible
• Web browser query; remotely accessible using SecuRemote; Oracle database
• Downloadable Excel spreadsheets; plans for a dynamic web system were shelved
14
Database Fields Mapping
Emissions inventories are based on different underlying data models.
Each inventory uses a uniquely defined set of field names. However, many of these field names are similar to (or their content is similar to) fields in another country’s inventory.
Some of the key relationships among the inventories have been captured by developing a “mapping” among fields.
These mappings provide a set of connections that can subsequently be applied to automated query and integration of data from multiple inventories.
[Figure: example field-name mapping for sulfur dioxide across inventories: SO2, SO2Yr, SO2_Ann, Sulfur Dioxide]
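A field mapping of this kind can be sketched as a lookup table from each inventory's field names to a shared name. The field names SO2, SO2Yr, and SO2_Ann come from the slide, but which inventory uses which name is not stated, so the inventory labels, the facility field names, and the common names below are assumptions made for illustration.

```python
# Hypothetical mappings: inventory-specific field name -> common field name.
FIELD_MAPPINGS = {
    "inventory_a": {"SO2": "so2_annual", "Plant": "facility"},
    "inventory_b": {"SO2Yr": "so2_annual", "FacName": "facility"},
    "inventory_c": {"SO2_Ann": "so2_annual", "Facility": "facility"},
}

def to_common(inventory, record):
    """Rename a record's fields to the common names, dropping unmapped fields."""
    mapping = FIELD_MAPPINGS[inventory]
    return {mapping[key]: value for key, value in record.items() if key in mapping}

# Records from different inventories end up with the same field names.
a = to_common("inventory_a", {"SO2": 10.0, "Plant": "Plant A"})
b = to_common("inventory_b", {"SO2Yr": 20.0, "FacName": "Plant B", "Notes": "n/a"})
```

Once every record uses the common names, a single query can run across all inventories without knowing where each record came from.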
15
Leveraging Multiple Projects
The challenges in distributed information systems should be addressed by collaborative efforts across governments, agencies, researchers, and disciplines.
Common underlying goals among projects provide opportunities to naturally design systems to interoperate
Avoids one-time, stand alone solutions that cannot be reused
16
DataFed.Net
The Aerosol Data and Services Federation (http://www.datafed.net) is a network of providers and users for sharing atmospheric data and processing services. DataFed includes a community of participants who share and use data and processing services, a mediator software component to homogenize data access, and peer-to-peer computing for composing web applications.
R. Husar, CAPITA
17
Catalog of Air Quality Related Data
Metadata for each dataset are registered in a catalog allowing users to browse available datasets and determine which datasets to use for their particular application.
The catalog entries include data access instructions.
http://capita.wustl.edu/dvoy_2.0.0/dvoy_services/datafed.aspx?
select data domain: aerosols, emissions, fire …
The emissions data are registered in the DataFed.net catalog
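The sketch below shows one possible shape for such a catalog: each entry carries a little metadata plus an access note, and users filter entries by domain and year. The year coverage follows the data table in this presentation; the class, the field names, and the wording of the access notes are assumptions and do not describe the actual DataFed.net catalog.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    name: str
    domain: str        # e.g. "emissions", "aerosols", "fire"
    years: frozenset   # years covered by the dataset
    access: str        # hypothetical access note

CATALOG = [
    CatalogEntry("NEI", "emissions", frozenset(range(1985, 2000)), "local copy"),
    CatalogEntry("eGrid", "emissions", frozenset(range(1996, 2001)), "local copy"),
    CatalogEntry(
        "Clean Air Markets",
        "emissions",
        frozenset({1980, 1985, *range(1988, 2000)}),
        "remote query",
    ),
    CatalogEntry("NPRI", "emissions", frozenset(range(1994, 2002)), "local copy"),
]

def browse(domain, year=None):
    """List dataset names in a domain, optionally limited to one year."""
    return [
        entry.name
        for entry in CATALOG
        if entry.domain == domain and (year is None or year in entry.years)
    ]
```

For example, `browse("emissions", 2000)` keeps only the datasets whose coverage includes the year 2000.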
19
Modular Applications for Working with Online Data
Emissions data are multi-dimensional (plant, year, pollutant, fuel type, boiler capacity, etc.). This multidimensionality requires multiple “views” of the data in a variety of end-use applications.
Data views can be created in the DataFed.net framework, including maps, time series, and tables. Each view is independently linked to its data sources and described (e.g., by its geographic and temporal extents). Using the data access instructions registered in the catalog, a view can be dynamically assembled.
The modularly designed views can be embedded and controlled in web pages using Javascript, ASP or other web application programming languages.
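To make the idea of multiple views concrete, here is a minimal sketch that derives a time series view and a table view from the same multi-dimensional records. The record layout and the values are invented for illustration; the DataFed.net views themselves are not implemented this way as far as the slides say.

```python
from collections import defaultdict

# Made-up records for illustration: (plant, year, pollutant, tons)
RECORDS = [
    ("Plant A", 1998, "SO2", 10.0),
    ("Plant A", 1999, "SO2", 12.0),
    ("Plant B", 1999, "SO2", 5.0),
    ("Plant B", 1999, "NOx", 3.0),
]

def time_series_view(records, pollutant):
    """Total emissions per year for one pollutant, sorted by year."""
    totals = defaultdict(float)
    for _plant, year, name, tons in records:
        if name == pollutant:
            totals[year] += tons
    return sorted(totals.items())

def table_view(records, year):
    """Plant-by-pollutant table for one year."""
    table = defaultdict(dict)
    for plant, record_year, name, tons in records:
        if record_year == year:
            table[plant][name] = tons
    return dict(table)
```

Each view fixes some dimensions (pollutant or year) and aggregates or lays out the rest, which is why one dataset can feed maps, charts, and tables at once.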
20
Web Services
Substantial progress has been achieved in data interoperability. One of the next advances required is interoperable data analysis/processing tools.
Web services are applications that are used over the Web. Because they are self-contained and use XML-based standards (SOAP, WSDL, UDDI) for describing themselves and communicating with other web resources, they can be reused in a variety of independent applications.
Many of the analysis and processing tools used by the air quality management community could benefit from web service technology. Not only can their data be shared but their heterogeneous, distributed tools that operate on that data can be shared as well.
A longer term vision for web services is to be able to “orchestrate” or “chain” multiple services from multiple providers so that new “third party” applications can be constructed.
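The chaining idea can be illustrated with ordinary functions standing in for remote services. In the sketch below, each "service" is a local Python function (an assumption made to keep the example self-contained; real web services would be called over SOAP or HTTP), and `chain` feeds the output of one into the next.

```python
from functools import reduce

def chain(*services):
    """Compose services so that each one's output feeds the next."""
    def run(data):
        return reduce(lambda result, service: service(result), services, data)
    return run

# Hypothetical stand-ins for services from different providers.
def select_pollutant(pollutant):
    return lambda rows: [row for row in rows if row["pollutant"] == pollutant]

def total_by_year(rows):
    totals = {}
    for row in rows:
        totals[row["year"]] = totals.get(row["year"], 0.0) + row["tons"]
    return totals

def as_text(totals):
    return "\n".join(f"{year}: {tons}" for year, tons in sorted(totals.items()))

# Made-up rows, for illustration only.
rows = [
    {"year": 1999, "pollutant": "SO2", "tons": 30.0},
    {"year": 1998, "pollutant": "SO2", "tons": 10.0},
    {"year": 1999, "pollutant": "SO2", "tons": 5.0},
    {"year": 1999, "pollutant": "NOx", "tons": 7.0},
]
report = chain(select_pollutant("SO2"), total_by_year, as_text)(rows)
```

Because each step only depends on the shape of its input, a third party could swap in a different aggregation or rendering step without touching the others.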
21
Example Web Services Application
Emissions data from multiple databases are displayed on maps, time series, and tables. Tools are included for browsing and querying the data.
In this example, the user can change the pollutant, date, map zoom, data on/off, and map/time scales.
22
North American Emissions Demonstration Data Flow
The settings of each web service can be changed by the user, creating a dynamic application.
[Data flow diagram. Data sources: BRAVO, NPRI, NEI, eGrid, CAM. Components: Wrappers, DataCatalog. Web services: MapPointAccess, MapPointRender, MapImageAccess, MapImageRender, MapImageOverlay.]
Example service settings shown in the diagram:
• MapPointAccess: DataSet = NPRI, Year = 1999, Parameter = SO2
• MapPointAccess: DataSet = eGrid, Year = 1999, Parameter = SO2
• MapPointAccess: DataSet = NEI, Year = 1999, Parameter = SO2
• MapPointRender: Color = Yellow, Symbol = Bar, Width = 8
• MapPointRender: Color = Red, Symbol = Bar, Width = 8
• MapPointRender: Color = Blue, Symbol = Bar, Width = 8
• Border layer: DataSet = N.Am. Borders, Color = Maroon, Size = 2
• Layer Order = N.Am, NEI, eGrid, NPRI
23
Embedded Images and Controllers in Web Page
Parameter Controller
Date Controller
Query Controller
The controllers and map image view can be linked and assembled in a web page. Changing the settings of a controller changes the URL of the map image and updates the web page.
The web page can be constructed using standard web application programming languages, such as JavaScript and ASP.
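The controller-to-image link can be sketched as a function that turns the current controller settings into a query string. The slides describe doing this with JavaScript or ASP; Python is used here only to keep the example self-contained. The endpoint URL and the parameter names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical map image endpoint; not the actual DataFed.net URL.
BASE_URL = "http://example.org/map_image"

def map_image_url(settings):
    """Build the map image URL from the current controller settings."""
    # Sorting keeps the URL stable regardless of the order settings were changed.
    return BASE_URL + "?" + urlencode(sorted(settings.items()))

settings = {"dataset": "NEI", "year": 1999, "parameter": "SO2"}
url_before = map_image_url(settings)

# A date controller changing the year yields a new URL, and so a new image.
settings["year"] = 1998
url_after = map_image_url(settings)
```

In a web page, assigning the new URL to the image element would be what triggers the view to refresh.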
24
Project Results
Technology has been demonstrated to be at the point where we can begin to apply some of the distributed database concepts to “real” situations
Emissions data present unique challenges due to their complex relational dimensionality
Collaborative efforts in the near future could generate a distributed North American emissions inventory
– The initial versions of the inventory would help clarify the issues related to handling complex queries
– Building and using a distributed emissions tool will assist in creating consensus data naming conventions
25
Current Project Challenges
Dynamic Access to Data
– technical snafus
– security issues
– slow performance
Complexity of Emissions Data
Quickly evolving technology
26
What’s Missing
Metadata
– More complete metadata would help in relating heterogeneous databases
More complete access to distributed datasets
– A process for creating trusted provider-user agreements would help address issues of security and data misuse
More comprehensive content
– Networked data and tools that spark additional interest in the technology’s potential
Actual Implementations
– FASTNet will demonstrate the use of many of these technologies in the real time monitoring of major aerosol events this summer
27
Next Steps
Advance the prototype tool to make it more representative of what an emissions inventory would ultimately use
EPA’s RFA on Distributed Air Quality Data in Support of NEISGEI
Link to other web services (Geospatial One-Stop, TerraServer) and other relevant data sources, including Web Map Servers
Establish collaborative partnerships with other researchers and agencies developing related networks and tools (ReVa, Earth Science Federation (NASA), NSF Cyberinfrastructure efforts)
Build tools that add value to the distributed data and provide incentives for data providers to join networks
Clarify the handling of distributed data for optimal system performance
28
Mediators
The flow of data from provider to user passes through brokers, or mediators.
The data continues to be maintained by original providers
Contracts are used to retain a constant link to the original data
Data is still not centrally stored but is “cached” in a format that allows efficient queries and analyses
Mediators provide an interface between the user and the data that enhances the effective information exchange between the two sides
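A mediator with a cache can be sketched in a few lines. In this hedged example, providers are plain functions registered under a name, and the mediator stores each query result so repeated requests do not reach the provider again. The class, the method names, and the fake provider are all hypothetical; the slides do not describe the actual mediator implementation.

```python
class Mediator:
    """Broker between users and data providers, with a simple cache."""

    def __init__(self):
        self._providers = {}
        self._cache = {}

    def register(self, name, fetch):
        """Register a provider's fetch function under a name."""
        self._providers[name] = fetch

    def query(self, name, year):
        """Return data for a provider and year, fetching only on a cache miss."""
        key = (name, year)
        if key not in self._cache:
            self._cache[key] = self._providers[name](year)
        return self._cache[key]

# Fake provider that records how often it is actually contacted.
calls = []

def fake_provider(year):
    calls.append(year)
    return [{"year": year, "pollutant": "SO2", "tons": 1.0}]

mediator = Mediator()
mediator.register("example_provider", fake_provider)
first = mediator.query("example_provider", 1999)
second = mediator.query("example_provider", 1999)
```

The provider still owns and maintains the data; the mediator only holds a copy in a query-friendly form. A real cache would also need an expiry or refresh rule so that provider updates are picked up.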
29
Adding Value to data through Services
[Diagram labels: Server, Client; Mediator Services; Data Sources, Data Users; multidimensional data cubes; analytical web services]