Tlad 2015 presentation amin+charles-final
![Page 1: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/1.jpg)
BETTER WITH DATA: A CASE STUDY IN SOURCING LINKED DATA INTO A BUSINESS INTELLIGENCE ANALYSIS
Amin Chowdhury, Charles Boisvert, Matthew Love, Ian Ibbotson
TLAD 2015: 13th International Workshop on Teaching, Learning and Assessment of Databases (TLAD) Conference, Birmingham City University
![Page 2: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/2.jpg)
Sourcing Linked Data into a Business Intelligence analysis
Can students apply more than one technology at a time?
• Early barriers prevent access to later work
• Limited time
• Need to measure performance
• Cocktail effect
![Page 3: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/3.jpg)
We need carefully worked case studies
We use open data to examine the relationship between weather conditions and levels of air pollution.
The case draws on a range of practices:
• Finding and accessing open data
• Exploring Linked Data
• Parts of the Extract-Transform-Load (ETL) process of data warehousing
• Building an analytic cube
• Applying data mining tools
Links are provided for the data sources and tools.
![Page 4: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/4.jpg)
Our case study: Air pollution kills
An estimated 29,000 early deaths each year in the UK (Public Health England).
There are government targets for reducing the quantities and/or frequencies of the main pollutants (some figures given below).
Local authorities monitor and publish pollution levels in their areas. Sheffield City Council uses two kinds of monitoring device:
• Diffusion tubes
• Fully automated processing units
![Page 5: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/5.jpg)
Measuring pollution
Nitrogen dioxide diffusion tubes: around 160 devices.
Diffusion tubes:
• are spread throughout the city area
• have to be sent in for analysis
• yield data every six to eight weeks per tube
• are published as aggregated annual levels
![Page 6: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/6.jpg)
Measuring pollution
6 automated stations:
• a.k.a. Groundhogs
• Fixed spots (sort of)
• Measure a variety of pollutants
• Plus temperature and air pressure (from Groundhog1)
• Frequent readings (several per hour) when working
• Log is publicly available
• 15-year archive, with gaps
• Some post-editing: deletions, correction of outliers
![Page 7: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/7.jpg)
Data is available
Sheffield City Council web sites:
• Air Quality: https://www.sheffield.gov.uk/environment/air-quality/monitoring.html
• Air Pollution Monitoring: http://sheffieldairquality.gen2training.co.uk/sheffield/index.html
Good things:
• Automated station results
• We can select a range, choose a format (PostScript, CSV, Excel) and download.
• Data is human-readable (ish)
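Because the automated-station data can be downloaded as CSV, the frequent readings can be summarised by hour while skipping gaps in the archive. The sketch below is a minimal illustration, assuming a hypothetical layout with a `timestamp` column, a pollutant column such as `NO2`, and a `%Y-%m-%d %H:%M` date format; the council's actual export may name and format things differently, and `hourly_means` is an illustrative helper, not part of any published tool.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def hourly_means(text, column="NO2", fmt="%Y-%m-%d %H:%M"):
    """Average the readings in `column` per clock hour, skipping gaps.

    Assumes a CSV with a `timestamp` column (hypothetical layout).
    Empty or non-numeric cells are treated as gaps and left out.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # gap or text marker: exclude it from the mean
        hour = datetime.strptime(row["timestamp"], fmt).replace(minute=0)
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}
```

Treating blanks as gaps, rather than zeros, keeps missing periods from dragging the hourly means down.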
![Page 8: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/8.jpg)
Is it open?
Like so much data sourced from the Internet…
• Textual descriptions
• No obvious way of automatically deriving further information.
Open data: “the idea that certain data should be freely available to everyone to use and republish as they wish” (wikipedia.org/Open_data)
• e.g. Groundhog1 is at “Orphanage Road, Firhill” – where is that? What is it like?
![Page 9: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/9.jpg)
Is it open?
• Navigation is not designed for automation.
• The URL does not reflect the name of the Groundhog.
• On Sir Tim Berners-Lee’s 5-star scale, this is 3/5.
• We want automated discovery by data harvesting tools.
• Plus: how flexibly can users contribute to the data?
• What about the metadata (licensing, quality…)?
5-star scale (image: 5stardata.info):
• Available
• Downloadable
• Open format
• No API
• No automatic discovery
![Page 10: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/10.jpg)
Wanted: automated discovery and consumption.
Figure: example graph linking “C Boisvert” to office 9327, tel 1234 and position Senior Lecturer.
• Store everything as triples.
• Rather than primary keys, use URIs: PKs are unique in one table of one system; URIs are unique worldwide.
Linked Data:
• Form ‘chains’ from point to point through the graph database.
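To make the triple idea concrete, here is a minimal in-memory sketch: a set of (subject, predicate, object) tuples, a `match` function that treats `None` as a wildcard in a partial triple, and a `follow` function that forms a chain through the graph. The `http://example.org/aq/` namespace and the `locatedAt` and `label` predicates are hypothetical placeholders, not the AQ+ vocabulary; only the “Orphanage Road, Firhill” location comes from the Groundhog1 description.

```python
EX = "http://example.org/aq/"  # hypothetical namespace, not the AQ+ one

# Each fact is a (subject, predicate, object) triple; URIs name things globally.
TRIPLES = {
    (EX + "groundhog1", EX + "locatedAt", EX + "site/orphanage-road"),
    (EX + "site/orphanage-road", EX + "label", "Orphanage Road, Firhill"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a partial triple; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def follow(triples, start, predicates):
    """Follow a chain of predicates from `start`; return the nodes reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {obj for node in frontier
                    for (_, _, obj) in match(triples, node, predicate)}
    return frontier
```

For example, `follow(TRIPLES, EX + "groundhog1", [EX + "locatedAt", EX + "label"])` walks from the station to its site and then to the site's label. A real triple store indexes these patterns instead of scanning, but the partial-triple lookup is the same idea SPARQL builds on.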
![Page 11: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/11.jpg)
Air Quality+: Linked Data for Sheffield Pollution
https://github.com/BetterWithDataSociety
• A database of Sheffield pollution measurements as linked data.
• Each Groundhog has its own URI.
• Diverse measures, e.g. NO2, SO2, micro-particles (e.g. diesel fumes), air pressure, air temperature.
• Measurements are archived in the database as triples.
• The ontology allows all but literal values to be investigated further, for instance to find out more about the NO2 compound.
• Machine discovery can add context to the data, e.g. the type of neighbourhood around each Groundhog site.
![Page 12: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/12.jpg)
AQ+ linked data
![Page 13: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/13.jpg)
SPARQL
To query the subject / predicate / object (value) triples in the database, we use the SPARQL query language.
• Specify a partial triple to return all records that fit that pattern.
• Filter, e.g. return values within a selected date range.
• Discover programmatically what Groundhogs there are, what pollutants each monitors, and the readings of those pollutants.
The AQ+ endpoint offers multiple result formats, e.g. CSV, JSON, XML.
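A query like those described above can be sent to a SPARQL endpoint over HTTP following the W3C SPARQL 1.1 Protocol, with the result format chosen through the `Accept` header. The sketch below only builds the request. The endpoint URL and the `ex:` vocabulary (`ex:station`, `ex:pollutant`, `ex:time`, `ex:value`) are placeholders, since the AQ+ ontology's real terms are not given here; the triple patterns play the role of the partial triple, and `FILTER` restricts readings to a date range.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL: substitute the real AQ+ SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Media types for the W3C SPARQL 1.1 result formats.
FORMATS = {
    "csv": "text/csv",
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
}


def readings_query(start, end):
    """SPARQL SELECT for readings between two xsd:dateTime strings.

    The ex: terms are hypothetical; the AQ+ ontology defines its own.
    start/end are inserted verbatim, so pass only trusted values.
    """
    return f"""PREFIX ex:  <http://example.org/aq#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?station ?pollutant ?time ?value WHERE {{
  ?reading ex:station   ?station ;
           ex:pollutant ?pollutant ;
           ex:time      ?time ;
           ex:value     ?value .
  FILTER (?time >= "{start}"^^xsd:dateTime && ?time < "{end}"^^xsd:dateTime)
}}
ORDER BY ?station ?time"""


def build_request(query, fmt="csv"):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": FORMATS[fmt]})
```

Passing the result to `urllib.request.urlopen` would send it, assuming the endpoint honours content negotiation for the requested format.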
![Page 14: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/14.jpg)
SPARQL Editor
boisvert.me.uk/opendata/sparql_aq+.html
Hourly readings from all available Groundhogs between selected dates
• Editing
• SPARQL syntax highlighting
• Queries interpreted on the AQ+ endpoint
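When a client asks for JSON, a SPARQL endpoint answers in the W3C SPARQL 1.1 Query Results JSON format: a `head.vars` list and a `results.bindings` array. A minimal sketch of flattening that into rows, for example before loading them into a staging table, might look like this; the variable names in any real response depend on the query that was run.

```python
import json


def bindings_to_rows(payload):
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts.

    A variable left unbound in a solution is simply absent from that
    binding, so it comes out as None here. All values stay strings;
    convert numeric literals (e.g. using their "datatype") as needed.
    """
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None
         for var in variables}
        for binding in doc["results"]["bindings"]
    ]
```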
![Page 15: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/15.jpg)
Further data sources
A lucky strike:
• A local enthusiast
• Weather station readings at five-minute intervals
• In PDF format: 200 pages per month!
• Converted from PDF to CSV with Bytescout
Giving added context to facts through dimension descriptors drawn from other sources:
• From Groundhog1: temperature and air pressure
• But no data on other factors: wind strength and direction, humidity
Surely these influence pollution formation and/or dispersal? We need detailed historic weather data, which is not cheap.
Licensing rights to this data have not been decided in general. Ask permission to use the data for study purposes: any commercial use of the data could cause the site to be closed.
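Weather readings at five-minute intervals and pollution readings rarely share timestamps, so one simple way to add weather as context is to average the weather to the hour and attach it to each pollution reading from the same hour. This is a sketch under that assumption; the field names (e.g. `wind_speed`) and the hourly granularity are illustrative choices, not taken from the case study.

```python
from collections import defaultdict
from datetime import datetime


def hour_of(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its clock hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def attach_weather(pollution, weather):
    """Attach hourly-mean weather fields to each pollution reading.

    pollution: list of (datetime, value) pairs.
    weather:   list of (datetime, {field: number}) pairs, e.g. five-minute readings.
    Returns (datetime, value, {field: hourly mean}) tuples; an hour with no
    weather readings gets an empty dict rather than invented values.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, fields in weather:
        for name, number in fields.items():
            buckets[hour_of(ts)][name].append(number)
    hourly = {hour: {name: sum(vals) / len(vals) for name, vals in fields.items()}
              for hour, fields in buckets.items()}
    return [(ts, value, hourly.get(hour_of(ts), {})) for ts, value in pollution]
```

Leaving hours without weather data empty, rather than interpolating, keeps gaps visible for the later analysis.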
![Page 16: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/16.jpg)
INTEGRATION OF FURTHER DATA SOURCES
• Microsoft SQL Server Data Warehouse
• ETL processes
• Data Cube from Data Star
• Business Intelligence with MS Analysis Services
• Data Mining
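The case study builds its warehouse in Microsoft SQL Server. As a self-contained stand-in, the sketch below uses Python's built-in `sqlite3` to show the same star-schema idea: dimension tables for station, time and pollutant, and a fact table of readings that refers to them by surrogate key. Table and column names are hypothetical, and `load` covers only the transform-and-load end of an ETL process.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by dimension tables.
SCHEMA = """
CREATE TABLE dim_station   (station_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_time      (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER);
CREATE TABLE dim_pollutant (pollutant_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
CREATE TABLE fact_reading (
    station_id   INTEGER REFERENCES dim_station(station_id),
    time_id      INTEGER REFERENCES dim_time(time_id),
    pollutant_id INTEGER REFERENCES dim_pollutant(pollutant_id),
    value        REAL
);
"""


def surrogate_key(conn, table, key_col, cols, values):
    """Return the key of a dimension row, inserting the row if it is new.

    Table and column names come from this module, never from user input.
    """
    where = " AND ".join(f"{col} = ?" for col in cols)
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {where}", values).fetchone()
    if row is not None:
        return row[0]
    marks = ", ".join("?" for _ in cols)
    cur = conn.execute(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})", values)
    return cur.lastrowid


def load(conn, readings):
    """Load (station, day, hour, pollutant, value) tuples into the star."""
    for station, day, hour, pollutant, value in readings:
        s = surrogate_key(conn, "dim_station", "station_id", ["name"], (station,))
        t = surrogate_key(conn, "dim_time", "time_id", ["day", "hour"], (day, hour))
        p = surrogate_key(conn, "dim_pollutant", "pollutant_id", ["code"], (pollutant,))
        conn.execute("INSERT INTO fact_reading VALUES (?, ?, ?, ?)",
                     (s, t, p, value))
    conn.commit()
```

Usage: create the tables with `conn.executescript(SCHEMA)` on a `sqlite3.connect(":memory:")` connection, call `load`, then join `fact_reading` to its dimensions to aggregate.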
![Page 17: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/17.jpg)
Data Warehouse
![Page 18: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/18.jpg)
Creation of Data Cube from Data Star
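An analytic cube pre-computes a measure for every combination of dimensions, which is what SQL's `GROUP BY CUBE` produces. The pure-Python sketch below mimics that over a list of fact dictionaries, using the mean reading as the measure; it illustrates the idea only, not how Analysis Services stores or processes cubes, and the dimension names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cube(facts, dimensions, measure):
    """Mean of `measure` for every subset of `dimensions`, like GROUP BY CUBE.

    facts: list of dicts holding dimension values and the measure.
    Each result key is a tuple of (dimension, value) pairs; the empty
    tuple is the grand total over all facts.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fact in facts:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = tuple((dim, fact[dim]) for dim in dims)
                sums[key] += fact[measure]
                counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With d dimensions this produces 2^d groupings, which is one reason OLAP servers usually pre-compute only a selection of aggregations.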
![Page 19: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/19.jpg)
Analysis and PowerPivot Exporting
![Page 20: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/20.jpg)
Self Service Data Exploration
![Page 21: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/21.jpg)
Self Service Data Exploration
![Page 22: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/22.jpg)
Data Mining
![Page 23: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/23.jpg)
![Page 24: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/24.jpg)
Cluster Data Mining Tool
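The clustering tool in Analysis Services has its own algorithm and settings. As a rough illustration of what a clustering step does with multi-attribute readings, here is plain k-means in Python, seeded deterministically from the first k points. It is a swapped-in, simplified technique, not the algorithm behind the tool's output.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means on equal-length numeric tuples.

    Centroids are seeded with the first k points (deterministic, but
    sensitive to input order). Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids
```

In practice the attributes (e.g. pollutant level, temperature, air pressure) would need scaling first, since Euclidean distance is dominated by whichever attribute has the largest range.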
![Page 25: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/25.jpg)
Ranked by probability
![Page 26: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/26.jpg)
Comparison of properties of cluster 9
![Page 27: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/27.jpg)
Decision Trees
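A decision tree is grown by repeatedly choosing the attribute split that best separates the target classes. The sketch below shows one such step, scoring numeric thresholds by information gain (entropy reduction). Microsoft's decision tree algorithm uses its own scoring methods, so treat this as a generic illustration; the attribute and class labels you would feed it (e.g. wind speed against a pollution band) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_split(values, labels):
    """Best threshold on one numeric attribute, by information gain.

    Returns (threshold, gain); rows with value <= threshold go left.
    Returns (None, 0.0) if no threshold improves on the parent node.
    """
    base = entropy(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values))[:-1]:
        left = [label for v, label in zip(values, labels) if v <= threshold]
        right = [label for v, label in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

A full tree applies this to every attribute at each node and recurses into the two halves until the classes are pure or too few rows remain.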
![Page 28: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/28.jpg)
Teaching resources: http://aces.shu.ac.uk/teaching.cmsrml/AirQuality
![Page 29: Tlad 2015 presentation amin+charles-final](https://reader035.fdocuments.in/reader035/viewer/2022062515/55cccefcbb61eb100d8b47df/html5/thumbnails/29.jpg)
Questions?