Post on 01-Jan-2016
1
Open DATA METI: All Content As Big Data
Dr. Brand Niemann
Director and Senior Enterprise Architect – Data Scientist
Semantic Community
http://semanticommunity.info/
AOL Government Blogger
http://gov.aol.com/bloggers/brand-niemann/
March 15, 2013
http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI
2
Preface
• Question from Brand Niemann:
– Does this deal with the data elements themselves in the data sets, so you can search for data elements that you want to integrate with other data elements and find their definitions (metadata) to know if they are the same or similar enough to be semantically integrated?
• Answer from John Erickson, Director, Web Science Operations, Tetherless World Constellation (RPI):
– No. DCAT deals with the initial problems of where dataset catalogs and datasets themselves are from and what they contain. Loosely speaking, it does for catalogs and datasets what Dublin Core did for publications: it provides a succinct vocabulary that providers can rely on for describing their datasets, and consumers can rely on for finding. DCAT has already been used as the basis for the schema.org "datasets" extension as a way to make discovery of datasets easier using popular search engines.
– Articulating the actual vocabularies used in published datasets is waaaay beyond the scope of DCAT, in part because DCAT is not restricted to datasets published as linked data. Some work including http://healthdata.tw.rpi.edu are looking at ways to communicate standard vocabularies used in published linked data...
All the work with Data Catalogs does not really help with data integration.
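To make the point concrete, here is a minimal sketch of a catalog-level record. The property names (dct:title, dcat:keyword, dcat:distribution, dcat:accessURL) are real DCAT / Dublin Core terms, but the values, the example.org URL, and the `columns` key are hypothetical and only illustrate the shape: the record says where a dataset is and what it is about, not what its data elements mean.

```python
# Catalog-level description using a few DCAT / Dublin Core property names.
# All values are illustrative placeholders, not taken from the real catalog.
dataset_record = {
    "dct:title": "General Energy Statistics (FY 2011)",
    "dcat:keyword": ["energy", "statistics"],
    "dcat:distribution": [
        {"dct:format": "XLS", "dcat:accessURL": "http://example.org/energy-2011.xls"},
    ],
}


def describes_columns(record):
    """Return True if the record carries per-column (data element) definitions.

    'columns' is a hypothetical key used only for this sketch.
    """
    return bool(record.get("columns"))


def distribution_formats(record):
    """List the file formats advertised by the record's distributions."""
    return [d.get("dct:format") for d in record.get("dcat:distribution", [])]


print(distribution_formats(dataset_record))
print(describes_columns(dataset_record))
```

A record like this is enough to find and download the file, but it gives no basis for deciding whether two columns in two datasets are semantically the same.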
3
Preface
http://www.computerweekly.com/news/2240179544/Big-data-spells-new-architectures
"The data warehouse does what it does well and is not going to go anywhere. But it is not architected very well for the future. Our job, as IT, revolves entirely around one thing -- data integration."
Big Data Spells New Architecture
4
Preface
http://radar.oreilly.com/2007/12/google-admits-data-is-the-inte.html
http://www.forbes.com/sites/jonbruner/2012/04/04/tim-oreilly-on-the-future-of-location-the-guy-with-the-most-data-wins/
‘Big Data is the new software’
5
Preface
• Dominic Sale:
– Introduced as OMB Chief of Data Analytics & Reporting at the Big Data Technology Symposium, March 13, 2013.
– Said “new Digital Government Strategy is treating all content as data.”
– Dominic Sale joined OMB’s Office of E-Government and Information Technology in 2008 as a portfolio manager for several government-wide IT initiatives. At OMB, Dominic played a lead role in implementing and operating major initiatives such as the IT Dashboard, and he is currently heavily involved in implementing the Federal CIO’s 25-Point IT Management Reforms. Prior to arriving at OMB, Dominic began his Federal career as a program analyst in the OCIO at the Department of Transportation. In his prior life as a contractor at both BAE Systems and BearingPoint, Dominic managed EA, capital planning and security initiatives at DOL, NLRB, FDA, and Census. He has also worked on a variety of federal programs, at agencies such as the IRS, US Postal Service, US Mint, US Patent and Trademark Office, and the National Park Service.
http://semanticommunity.info/Big_Data_Symposia#Speaker_Bio_for_Dominic_Sale
“New Digital Government Strategy is treating all content as data.”
6
My Process
• Open DATA METI Web Site to MindTouch Knowledge Base to an Excel Spreadsheet
• Open DATA METI Data Set List by File Type to an Excel Spreadsheet
• Open DATA METI Data Sets by Metadata to an Excel Spreadsheet
• Import the Above (3) and Selected Open DATA METI Data Sets Into Spotfire
• Get Visualizations and Beginning of a Unified Big Data Architecture and Ecosystem for Big Data Integration
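The steps above can be sketched as a small join across the three spreadsheets. This is only an illustration under stated assumptions: the slides do not give the column names in METI2013.xlsx, so `dataset_id`, `page_title`, `file_type`, and `group` are hypothetical, and the rows are placeholders (the actual work was done in Excel and Spotfire, not Python).

```python
# Placeholder rows standing in for the three spreadsheets.
# Column names and the "PDF" entry are hypothetical.
knowledge_base = [
    {"dataset_id": "statistics_sougouenergy_2011",
     "page_title": "General Energy Statistics (FY 2011)"},
]
file_type_list = [
    {"dataset_id": "statistics_sougouenergy_2011", "file_type": "XLS"},
    {"dataset_id": "statistics_sougouenergy_2011", "file_type": "PDF"},
]
metadata = [
    {"dataset_id": "statistics_sougouenergy_2011", "group": "statistics_sougouenergy"},
]


def build_unified_table(kb_rows, type_rows, meta_rows, key="dataset_id"):
    """Combine the three tables into one record per dataset."""
    unified = {}
    for row in kb_rows:
        # Start each record from the knowledge-base row.
        unified[row[key]] = dict(row, file_types=[])
    for row in type_rows:
        # A dataset can have many files, so file types are collected in a list.
        record = unified.setdefault(row[key], {key: row[key], "file_types": []})
        record["file_types"].append(row["file_type"])
    for row in meta_rows:
        # Metadata fields are copied onto the record as-is.
        record = unified.setdefault(row[key], {key: row[key], "file_types": []})
        record.update({k: v for k, v in row.items() if k != key})
    return list(unified.values())


unified_rows = build_unified_table(knowledge_base, file_type_list, metadata)
print(unified_rows)
```

Keying everything on one dataset identifier is the design choice that matters here: once the three sheets share a key, a tool such as Spotfire can relate them.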
7
Open DATA METI: WordPress & CKAN
http://datameti.go.jp/
About DATA METI: Home, Terms of use, Privacy Policy, Notation of credit, Partners leverage DATA METI, Inquiry, API, API Documentation
Section: Tag, Statistics, Revision, Site administrator
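Since the site runs on CKAN and lists an API, the catalog could in principle be read programmatically instead of page by page. The sketch below only builds request URLs for CKAN's Action API (`package_list` and `package_show` are standard CKAN actions). The base path `http://datameti.go.jp/data/` and the `api/3/action/` prefix on this particular site are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode, urljoin

# Assumption: the CKAN instance is served under /data/ on this host.
CKAN_BASE = "http://datameti.go.jp/data/"


def action_url(action, **params):
    """Build a CKAN Action API URL for the given action and query parameters."""
    url = urljoin(CKAN_BASE, "api/3/action/" + action)
    if params:
        url += "?" + urlencode(params)
    return url


# List all dataset names, then ask for one dataset's full record.
list_url = action_url("package_list")
show_url = action_url("package_show", id="statistics_sougouenergy_2011")
print(list_url)
print(show_url)
```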
8
Open DATA METI: MindTouch
http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI
Knowledge Base with Well-Defined URLs
9
Open DATA METI: Excel Spreadsheet 1
Knowledge Base
http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx
10
Open DATA METI: Data Set List
http://datameti.go.jp/data/
Drill Down on These 19
11
Open DATA METI: Excel Spreadsheet 2
Data Set List
http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx
12
Open DATA METI: Comprehensive Energy Statistics
http://datameti.go.jp/data/group/statistics_sougouenergy
13
Open DATA METI: General Energy Statistics (FY 2011)
http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011
Some Have Lots of Files
Source of Data
14
Open DATA METI: Source
http://www.enecho.meti.go.jp/info/statistics/jukyu/index.htm
15
Open DATA METI: Link to Excel Spreadsheet
http://datameti.go.jp/data/dataset/statistics_sougouenergy_2011/resource/b707e1d2-bd3d-483a-ab83-65e081c6daab
Link to Spreadsheet
My Comment: This is too many clicks to get to the actual data!
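One way to cut down the clicks, sketched under assumptions: a CKAN `package_show` response lists each resource with its `url`, so the direct file links can be pulled out in one pass. The sample below is hand-written in the general shape of such a response, using the resource id and spreadsheet URL shown on these slides; the `format` value is assumed, and nothing here was fetched from the live site.

```python
import json

# Hand-written sample in the shape of a CKAN package_show response.
sample_response = json.loads("""
{
  "success": true,
  "result": {
    "name": "statistics_sougouenergy_2011",
    "resources": [
      {
        "id": "b707e1d2-bd3d-483a-ab83-65e081c6daab",
        "format": "XLS",
        "url": "http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls"
      }
    ]
  }
}
""")


def direct_download_links(response):
    """Return (format, url) pairs for every resource in a package_show response."""
    if not response.get("success"):
        return []
    resources = response.get("result", {}).get("resources", [])
    return [(r.get("format", ""), r["url"]) for r in resources if r.get("url")]


links = direct_download_links(sample_response)
for file_format, url in links:
    print(file_format, url)
```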
16
Open DATA METI: Excel Spreadsheet
http://www.enecho.meti.go.jp/info/statistics/jukyu/resource/xls/2011fysokuhou.xls
17
Open DATA METI: Excel Spreadsheet in Spotfire
Needs reformatting and language translation.
Beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp
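The reformatting and translation step might look something like the sketch below: drop blank rows, trim header cells, and map Japanese column headers to English through a glossary. The headers, the glossary, and the values are hypothetical placeholders, not the contents of the actual spreadsheet; headers missing from the glossary are deliberately left untranslated.

```python
# Hypothetical glossary of Japanese column headers.
HEADER_GLOSSARY = {
    "年度": "Fiscal year",
    "石炭": "Coal",
    "天然ガス": "Natural gas",
}

# Placeholder sheet: a header row, a blank row, and one data row.
raw_rows = [
    ["年度 ", "石炭", "天然ガス", "電力"],
    ["", "", "", ""],
    ["2011", "10", "20", "30"],
]


def tidy_sheet(rows, glossary):
    """Drop blank rows and translate header cells found in the glossary."""
    non_blank = [row for row in rows if any(str(cell).strip() for cell in row)]
    if not non_blank:
        return []
    header, *body = non_blank
    english_header = [
        glossary.get(str(cell).strip(), str(cell).strip()) for cell in header
    ]
    return [english_header] + body


tidy_rows = tidy_sheet(raw_rows, HEADER_GLOSSARY)
print(tidy_rows)
```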
18
Open DATA METI: Excel Spreadsheet 3
Data Sets Metadata
http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx
19
Open DATA METI: Excel Spreadsheet 1-3 in Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp
20
Open DATA METI: Excel Spreadsheet 4
Merged Data Sets
http://semanticommunity.info/@api/deki/files/21577/METI2013.xlsx
21
Open DATA METI: Merged Data Sets in Spotfire
https://silverspotfire.tibco.com/us/library#/users/bniemann/Public?AOpenDATAMETI-Spotfire.dxp
22
Summary
• Preface:
– All the work with Data Catalogs does not really help with data integration.
– Big Data Spells New Architecture.
– Big Data is the new software.
– New Digital Government Strategy is treating all content as data.
• The Open DATA METI Data Catalog has been turned into data in spreadsheets and statistical visualizations in Spotfire.
• This simplifies the complex WordPress & CKAN interface, which requires many extra mouse clicks and provides no faceted search.
• Google Chrome provides Japanese language translation of the metadata, but not of the data columns in the spreadsheets.
• This process provides the beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.
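Once the catalog is in tabular form, faceted search reduces to counting and filtering on columns. A minimal sketch, assuming hypothetical `group` and `file_type` columns and placeholder rows (the real spreadsheet columns are not listed in these slides):

```python
from collections import Counter

# Placeholder catalog rows; "other_group" and the file types are hypothetical.
catalog_rows = [
    {"group": "statistics_sougouenergy", "file_type": "XLS"},
    {"group": "statistics_sougouenergy", "file_type": "PDF"},
    {"group": "other_group", "file_type": "XLS"},
]


def facet_counts(rows, field):
    """Count how many rows fall under each value of one facet field."""
    return Counter(row[field] for row in rows if row.get(field))


def filter_by_facets(rows, **selected):
    """Keep only the rows that match every selected facet value."""
    return [
        row for row in rows
        if all(row.get(field) == value for field, value in selected.items())
    ]


print(facet_counts(catalog_rows, "file_type"))
print(filter_by_facets(catalog_rows, group="statistics_sougouenergy", file_type="XLS"))
```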