Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data
-
Upload
amit-sheth -
Category
Data & Analytics
-
view
738 -
download
0
description
Transcript of Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data
Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data
Put Knoesis Banner
Keynote at WorldComp 2014, July 21, 2014
Amit ShethLexisNexis Ohio Eminent Scholar & Exec. Director,
The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)Wright State, USA
2
BIG Data 2014
http://hrboss.com/hiringboss/articles/big-data-infographic
3
Only 0.5% to 1% of the data is used for analysis.
http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explodehttp://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
4
Variety – not just structure but modality: multimodal, multisensory
Structured
Unstructured
Semi structured
Audio
Video
Images
5
Velocity
Fast Data
Rapid Changes
Real-Time/Stream Analysis
Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail
6
What has changed now?
About 2 billion of the 5+ billion have data connections – so they perform “citizen sensing”.And there are more devices connected to the Internet than the entire human population.
These ~2 billion citizen sensors and 10 billion devices & objects connected to the Internet makes this an era of IoT (Internet of Things) and Internet of Everything (IoE).
http://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
7
“The next wave of dramatic Internet growth will come through the confluence of people, process, data, and things — the Internet of Everything (IoE).”
- CISCO IBSG, 2013
http://www.cisco.com/web/about/ac79/docs/innov/IoE_Economy.pdf
Beyond the IoE based infrastructure, it is the possibility of developing applications that spansPhysical, Cyber and the Social Worlds that is very exciting.
What has changed now?
8
What has not changed?
We need computational paradigms to tap into the rich pulse of the human populace,
and utilize diverse data
We are still working on the simpler representations of the real-world!
Represent, capture, and compute with richer and fine-grained representations of real-world
problems
What should change?
9
Current focus on Big Data is on meeting Enterprise/Company needs.
Significant opportunity in applications for individual and community needs. Many of these, esp. in complex domains such as health, fitness and well-being; better disaster coordination, personalized smart energy These need to exploit diverse data types and sources: Physical(sensor/IoT), Cyber(Web) and Social data.
Smart data –personalized, contextually relevant, actionable information – provide a better computational paradigm.
My take on thinking beyond the Big Data buzz
10
• Not just data to information, not just analysis, but actionable information, delivering insight and support better decision making right in the context of human activities
What is needed?
Data InformationActionable: An apple a day
keeps the doctor away
A blood test has ~30 bio markers…how will a doctor cope with a test with 300K data points?
11
What is needed? Taking inspiration from cognitive models
• Bottom up and top down cognitive processes: – Bottom up: find patterns, mine (ML, …)– Top down: Infusion of models and background
knowledge (data + knowledge + reasoning)
Left(plans)/Right(perceives) BrainTop(plans)/Bottom(perceives) Brainhttp://online.wsj.com/news/articles/SB10001424052702304410204579139423079198270
12
• Ambient processing as much as possible while enabling natural human involvement to guide the system
What is needed?
Smart Refrigerator: Low on Apples
Adapting the Plan: shopping for apples
13
Contextual
Information Smart Data
Makes Sense to a human
Is actionable – timely and better decisions/outcomes
15
My 2004-2005 formulation of SMART DATA - Semagix
Formulation of Smart Data strategy providing services for Search, Explore, Notify.
“Use of Ontologies and Data repositories to gain
relevant insights”
16
Smart Data (2014 retake)
Smart data makes sense out of Big data
It provides value from harnessing the challenges posed by volume, velocity, variety
and veracity of big data, in-turn providing actionable information and improve decision
making.
17
OF human, BY human FOR human
Smart data is about extracting value by improving human involvement in data creation,
processing and consumption. It is about (improving)
computing for human experience.
Another perspective on Smart Data
18Petabytes of Physical(sensory)-Cyber-Social Data everyday! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
‘OF human’ : Relevant Real-time Data Streams for Human Experience
Use of Prior Human-created Knowledge Models
19
‘BY human’: Involving Crowd Intelligence in data processing
Crowdsourcing and Domain-expert guided Machine Learning Modeling
20
Detection of events, such as wheezing sound, indoor temperature, humidity,
dust, and CO level
Weather Application
Asthma Healthcare Application
Close the window at home during day to avoid CO in
gush, to avoid asthma attacks at night
‘FOR human’ : Improving Human Experience (Smart Health)
Population Level
Personal
Public Health
Action in the Physical World
Luminosity
CO levelCO in gush during day time
21
Electricity usage over a day, device at work, power consumption, cost/kWh,
heat index, relative humidity, and public events from social stream
Weather Application
Power Monitoring Application
‘FOR human’ : Improving Human Experience (Smart Energy)
Population Level Observations
Personal Level Observations
Action in the Physical World
Washing and drying has resulted in significant cost
since it was done during peak load period. Consider
changing this time to night.
22
Every one and everything has Big Data –It is Smart Data that matter!
23http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/
MIT Technology Review, 2012
The Patient of the Future
Physical-Cyber-Social Computing An early 21st century approach to Computing for Human
Experience
PCS Computing
People live in the physical world while interacting with the cyber and social worlds
Physical WorldCyber World
Social World
26
Computations leverage observations form sensors, knowledge and experiences from
people to understand, correlate, and personalize solutions.
Physical-Cyber
Social-Cyber
Physical-Cyber-Social
What if?
27
Sensors around, on, and in humans will bridge the physical and cyber world.
Cyber
Physical
We believe that current CPS should view the physical worldby incorporate solutions form (knowledge) cyber worldwith a lens of social context.
There are silos of knowledge on the cyber world which are under utilized.
Social
Social networks bridge the social interactions in the physical and cyber world.
Mark’s discomfort sensed by: galvanic skin response, heart rate, fitbit, and Microsoft Kinect
Physical Cyber Social Computing involves: (1) Comparing physiological observations from people similar to him (age, weight, lifestyle,ethnicity, etc.) (2) Analyzing health experiences of similar people reporting heartburn (3) Incorporating history of ailments of Mark (4) Leveraging medical domain knowledge of diseases and symptoms.•He is advised to visit a doctor since he had a heart condition (from EMR) in the past and heartburns in similar people (social) was a symptom of arterial blockage
Mark is experiencing heartburn.
Alert to contact his doctor.
Physical
Sensing
Actuating
Computing
Rich knowledge of the medical
domain
EMR and PHR
Physiological sensor data from
human population
Health related experiences shared by humans
PCS Computing: Health Scenario
28
Vertical operators facilitate transcending from data-information-knowledge-
wisdom using background knowledge
Horizontal operators facilitate semantic integration of multimodal and
multisensory observations
PCS Computing
PCS computing is a holistic treatment of data, information, and knowledge from physical, cyber, and social worlds to integrate, understand, correlate, and provide contextually relevant abstractions to humans. Think of PCS Computing as the application/semantic layer for the IoE-based infrastructure.
http://wiki.knoesis.org/index.php/PCS
DATAsensor observations
KNOWLEDGEsituation awareness useful
for decision making
29
Primary challenge is to bridge the gap between data and knowledge
30
What if we could automate this sense making ability?
… and do it efficiently and at scale
31
Making sense of sensor data with
Henson et al An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ont, 2011
32
People are good at making sense of sensory input
What can we learn from cognitive models of perception?
The key ingredient is prior knowledge
33* based on Neisser’s cognitive model of perception
ObserveProperty
PerceiveFeature
Explanation
Discrimination
1
2
Translating low-level signals into high-level knowledge
Focusing attention on those aspects of the environment that provide useful information
Prior Knowledge
Perception Cycle*
Convert large number of observations to semantic abstractions that provide insights and translate into
decisions
34
To enable machine perception,
Semantic Web technology is used to integrate
sensor data with prior knowledge on the Web
W3C SSN XG 2010-2011, SSN Ontology
35
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
Prior knowledge on the Web
36
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
Prior knowledge on the Web
37
ObserveProperty
PerceiveFeature
Explanation1
Translating low-level signals into high-level knowledge
Explanation
Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
38
Inference to the best explanation• In general, explanation is an abductive
problem; and hard to compute
Finding the sweet spot between abduction and OWL• Single-feature assumption* enables use
of OWL-DL deductive reasoner
* An explanation must be a single feature which accounts for
all observed properties
Explanation
Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
Representation of Parsimonious Covering Theory in OWL-DL
39
ExplanatoryFeature ≡ ssn:isPropertyOf∃ —.{p1} … ssn:isPropertyOf⊓ ⊓ ∃ —.{pn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Observed Property Explanatory Feature
Explanation
Explanatory Feature: a feature that explains the set of observed properties
40
ObserveProperty
PerceiveFeature
Explanation
Discrimination2
Focusing attention on those aspects of the environment that provide useful information
Discrimination
Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory
features
41
ExpectedProperty ≡ ssn:isPropertyOf.{f∃ 1} … ssn:isPropertyOf.{f⊓ ⊓ ∃ n}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Expected Property Explanatory Feature
Discrimination
Expected Property: would be explained by every explanatory feature
42
NotApplicableProperty ≡ ¬ ssn:isPropertyOf.{f∃ 1} … ¬ ssn:isPropertyOf.{f⊓ ⊓ ∃ n}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Not Applicable Property Explanatory Feature
Discrimination
Not Applicable Property: would not be explained by any explanatory feature
43
DiscriminatingProperty ≡ ¬ExpectedProperty ¬NotApplicableProperty⊓
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Discriminating Property Explanatory Feature
Discrimination
Discriminating Property: is neither expected nor not-applicable
Qualities-High BP-Increased Weight
Entities-Hypertension-Hypothyroidism
kHealth
Machine Sensors
Personal Input
EMR/PHR
Comorbidity risk score e.g., Charlson Index
Longitudinal studies of cardiovascular risks
- Find risk factors- Validation - domain knowledge - domain expert
Find contribution of each risk
factor
Risk Assessment Model
Current Observations-Physical-Physiological-History
Risk Score (e.g., 1 => continue3 => contact clinic)
Model CreationValidate correlations
Historical observations e.g., EMR, sensor observations
44
Risk Score: from Data to Abstraction and Actionable Information
45
Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n3)
How do we implement machine perception efficiently on a
resource-constrained device?
46
intelligence at the edge
Approach 1: Send all sensor observations to the cloud for processing
Approach 2: downscale semantic processing so that
each device is capable of machine perception
47
0101100011010011110010101100011011011010110001101001111001010110001101011000110100111
Efficient execution of machine perception
Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning
Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.
48
O(n3) < x < O(n4)
O(n)
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds• Complexity growth reduced from polynomial
to linear
Evaluation on a mobile device
49
2 Prior knowledge is the key to perceptionUsing SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web
3 Intelligence at the edgeBy downscaling semantic
inference, machine perception can execute efficiently on resource-constrained devices
1 Translate low-level data to high-level knowledge
Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making
Semantic Perception for smarter analytics: 3 ideas to takeaway
50
• Healthcare: ADFH, Asthma, GI– Using kHealth system
• Smart Cities: Traffic management
I will use applications in 2 domains to demonstrate
• Social Media Analysis*:Crisis coordinationUsing Twitris platform
kHealthKnowledge-enabled Healthcare
To reduce preventable readmissions of patients with chronic heart failure (CHF, specifically ADHF) and GI;
Asthma in children
51
Brief Introduction Video
53
Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information
canary in a coal mine
Empowering Individuals (who are not Larry Smarr!) for their own health
kHealth: knowledge-enabled healthcare
Weight Scale
Heart Rate Monitor
Blood PressureMonitor
54
Sensors
Android Device (w/ kHealth App)
Readmissions cost $17B/year: $50K/readmission; Total kHealth kit cost: <
$500
kHealth Kit for the application for reducing ADHF readmission
ADHF – Acute Decompensated Heart Failure
55
1http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/2http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html 3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145.
25 million
300 million
$50 billion
155,000
593,000
People in the U.S. are diagnosed with asthma (7 million are children)1.
People suffering from asthma worldwide2.
Spent on asthma alone in a year2
Hospital admissions in 20063
Emergency department visits in 20063
Asthma: Severity of the problem
Sensordrone (Carbon monoxide,
temperature, humidity) Node Sensor
(exhaled Nitric Oxide)
56
Sensors
Android Device (w/ kHealth App)
Total cost: ~ $500
kHealth Kit for the application for Asthma management
*Along with two sensors in the kit, the application uses a variety of population level signals from the web:
Pollen level Air Quality Temperature & Humidity
59
what can we do to avoid asthma episode?
Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.
Variety Volume
VeracityVelocity
ValueWhat risk factors influence asthma control?What is the contribution of each risk factor?
sem
antic
s Understanding relationships betweenhealth signals and asthma attacksfor providing actionable information
WHY Big Data to Smart Data: Asthma example
kHealth: Health Signal Processing Architecture
Personal level Signals
Public level Signals
Population level Signals
Domain Knowledge
Risk Model
Events from Social Streams
Take Medication before going to work
Avoid going out in the evening due to high pollen levels
Contact doctor
AnalysisPersonalized Actionable
Information
Data Acquisition & aggregation
60
61
Asthma Domain Knowledge
Domain Knowledge
ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist
Asthma Control and Actionable Information
62
Patient Health Score (diagnostic)
Risk assessment model
Semantic Perception
Personal level Signals
Public level Signals
Domain Knowledge
Population level Signals
GREEN -- Well Controlled YELLOW – Not well controlledRed -- poor controlled
How controlled is my asthma?
63
Patient Vulnerability Score (prognostic)
Risk assessment model
Semantic Perception
Personal level Signals
Public level Signals
Domain Knowledge
Population level Signals
Patient health Score
How vulnerable* is my control level today?
*considering changing environmental conditions and current control level
67
Sensordrone – for monitoring environmental air quality
Wheezometer – for monitoringwheezing sounds
Can I reduce my asthma attacks at night?
What are the triggers? What is the wheezing level?
What is the propensity toward asthma?
What is the exposure level over a day?
Commute to Work
Asthma: Actionable Information for Asthma Patients
Luminosity
CO level
CO in gush during day time
Actionable Information
Personal level Signals
Public level Signals
Population level Signals
What is the air quality indoors?
68
Population Level
Personal
Wheeze – YesDo you have tightness of chest? –Yes
Observations Physical-Cyber-Social System Health Signal Extraction Health Signal Understanding
<Wheezing=Yes, time, location>
<ChectTightness=Yes, time, location>
<PollenLevel=Medium, time, location>
<Pollution=Yes, time, location>
<Activity=High, time, location>
Wheezing
ChectTightness
PollenLevel
Pollution
Activity
Wheezing
ChectTightness
PollenLevel
Pollution
Activity
RiskCategory
<PollenLevel, ChectTightness, Pollution,Activity, Wheezing, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory><2, 1, 1,3, 1, RiskCategory>
.
.
.
Expert Knowledge
Background Knowledge
tweet reporting pollution level and asthma attacks
Acceleration readings fromon-phone sensors
Sensor and personal observations
Signals from personal, personal spaces, and community spaces
Risk Category assigned by doctors
Qualify
Quantify
Enrich
Outdoor pollen and pollution
Public Health
Health Signal Extraction to Understanding
Well Controlled - continueNot Well Controlled – contact nursePoor Controlled – contact doctor
73
RDF OWL
How are machines supposed to integrate and interpret sensor data?
Semantic Sensor Networks (SSN)
74
W3C Semantic Sensor Network Ontology
Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
76
W3C Semantic Sensor Network Ontology
Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
SSNOntology
2 Interpreted data(deductive)[in OWL] e.g., threshold
1 Annotated Data[in RDF]e.g., label
0 Raw Data[in TEXT]e.g., number
Levels of Abstraction
3 Interpreted data (abductive)[in OWL]e.g., diagnosis
Intellego
“150”
Systolic blood pressure of 150 mmHg
ElevatedBlood
Pressure
Hyperthyroidism
less
use
ful …
…
mor
e us
eful
……
78
79
Making sense of sensor data with
80
People are good at making sense of sensory input
What can we learn from cognitive models of perception?• The key ingredient is prior knowledge
81* based on Neisser’s cognitive model of perception
ObserveProperty
PerceiveFeature
Explanation
Discrimination
1
2
Perception Cycle*
Translating low-level signals into high-level knowledge
Focusing attention on those aspects of the environment that provide useful information
Prior Knowledge
82
To enable machine perception,
Semantic Web technology is used to integrate sensor data with prior knowledge on the Web
83
Prior knowledge on the Web
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
84
Prior knowledge on the Web
W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph
85
ObserveProperty
PerceiveFeature
Explanation1
Translating low-level signals into high-level knowledge
Explanation
Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
86
Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
ObserveProperty
PerceiveFeature
Explanation
Discrimination2
Focusing attention on those aspects of the environment that provide useful information
Discrimination
87
Discrimination
Discriminating Property: is neither expected nor not-applicable
DiscriminatingProperty ≡ ¬ExpectedProperty ¬NotApplicableProperty⊓
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Discriminating Property Explanatory Feature
88
Semantic scalability: Resource savings of abstracting sensor data
Orders of magnitude resource savings for generating and storing relevant abstractions vs. raw observations.
Relevant abstractions
Raw observations
89
How do we implement machine perception efficiently on aresource-constrained device?
Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes• Asymptotic complexity: O(n3)
90
intelligence at the edge
Approach 1: Send all sensor observations to the cloud for processing
Approach 2: downscale semantic processing so that each device is capable of machine perception
Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.
91
Efficient execution of machine perception
Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning
0101100011010011110010101100011011011010110001101001111001010110001101011000110100111
92
O(n3) < x < O(n4) O(n)
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes• Time reduced from minutes to milliseconds• Complexity growth reduced from polynomial to
linear
Evaluation on a mobile device
93
2 Prior knowledge is the key to perceptionUsing SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web
3 Intelligence at the edgeBy downscaling semantic inference, machine perception can
execute efficiently on resource-constrained devices
Semantic Perception for smarter analytics: 3 ideas to takeaway
1 Translate low-level data to high-level knowledgeMachine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making
94
PCS Computing for Traffic Analytics:for personal and community needs
96
Duration: 36 months
Requested funding: 2.531.202 €
CityPulse Consortium
City of Aarhus
City of Brasov
97
Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical data/Physical), incident reports (textual/Cyber) and Tweets (Social)
http://511.org/
Every minute update of speed, volume, travel time, and occupancy resulting in 178 million link status observations, 8 million tweets, 738 active events, and 146 scheduled events with many unevenly sampled observations collected over 3 months.
Variety Volume
VeracityVelocity
ValueCan we detect the onset of traffic congestion?Can we characterize traffic congestion based on events?Can we estimate traffic delays in a road network?
sem
antic
s Representing prior knowledge of traffic lead to a focused exploration of this massive dataset
Big Data to Smart Data: Traffic Management example
98
Heterogeneity leading to complementary observations
Textual Streams for City Related Events
99
City Event Annotation – CRF Annotation Examples
Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O
B-LOCATIONI-LOCATIONB-EVENTI-EVENTO
Tags used in our approach:
These are the annotations providedby a Conditional Random Field modeltrained on tweet corpus to spotcity related events and location
BIO – Beginning, Intermediate, and Other is a notation used in multi-phrase entity spotting 100
Accident
Music event
Sporting event Road Work
Theatre event
External events<ActiveEvents, ScheduledEvents>
Internal observations<speed, volume, traveTime>
Weather
Time of Day
101
Modeling Traffic Events: Pictorial representation
102
Slow moving traffic
Link Description
Scheduled Event
Scheduled Event
511.org
511.org
Schedule Information
511.org
Domain Experts
ColdWeather
PoorVisibility
SlowTraffic
IcyRoad
Declarative domain knowledge
Causal knowledge
Linked Open Data
ColdWeather(YES/NO)IcyRoad (ON/OFF) PoorVisibility (YES/NO)SlowTraffic (YES/NO)
1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0
Domain Observations
Domain Knowledge
Structure and parameters
103
WinterSeason
Otherknowledge
Correlations to causations using Declarative knowledge on the Semantic Web
Combining Data and Knowledge Graph
Traffic jam
Link Descriptio
n
Scheduled Event
traffic jam
baseball game
Add missing random variables
Time of day
bad weather CapableOf slow traffic
bad weather
Traffic data from sensors deployed on road network in San Francisco
Bay Area
time of day
traffic jam
baseball gametime of day
slow traffic
Three Operations: Complementing graphical model structure extraction
Add missing links bad weather
traffic jam
baseball gametime of day
slow traffic
Add link directionbad weather
traffic jam
baseball gametime of day
slow traffic
go to baseball game Causes traffic jam
Knowledge from ConceptNet5
traffic jam CapableOfoccur twice each daytraffic jam CapableOf slow traffic
104
City Infrastructure
Tweets from a cityPOS
Tagging
Hybrid NER+ Event term extraction
Geohashing
Temporal Estimation
Impact Assessment
Event Aggregation
OSM Locations
SCRIBE ontology
511.org hierarchy
City Event Extraction
City Event Extraction Solution Architecture
City Event Annotation
OSM – Google Open Street MapsNER – Named Entity Recognition 105
City Events from Sensor and Social Streams can be…
• Complementary• Additional information• e.g., slow traffic from sensor data and accident from textual data
• Corroborative• Additional confidence• e.g., accident event supporting a accident report from ground truth
• Timely • Additional insight• e.g., knowing poor visibility before formal report from ground truth
106
Evaluation – Extracted Events AND Ground Truth Verification
Complementary Events
Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade
City event extracted from twitter reporting about traffic complementing the road construction event reported on 511.org
Evaluation – Extracted Events AND Ground Truth Verification
Corroborative Events
Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade
City event from twitter providing corroborative evidence for fog reported by 511.org
Evaluation – Extracted Events AND Ground Truth Verification
Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade
City event from twitter providing report of a tornado before an event related to strong winds is reported by 511.org
Timeliness
Events from Social Streams and City Department*
Corroborative EventsComplementary Events
Event SourcesCity events extracted from tweets511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade
City event from twitter providing complementary and corroborative evidence for fog reported by 511.org
*511.org 110
111
Actionable Information in City Management
Tweets from a CityTraffic Sensor Data OSM Locations
SCRIBE ontology
511.org hierarchy
Web of Data
How issues in a city can be resolved?e.g., what should I do when I have fog condition?
112
Two excellent videos• Vinod Khosla:
the Power of Storytelling and the Future of Healthcare
• Larry Smarr: The Human Microbiome and the Revolution in Digital Health
Wrapping up: For more on importance of what we talked about
113
• Big Data is every where– at individual and community levels - not just
limited to corporation – with growing complexity: Physical-Cyber-Social
• Analysis is not sufficient• Bottom up techniques are not sufficient, need
top down processing, need background knowledge
Wrapping up: Take Away
114
Wrapping up: Take Away
• Focus on Humans and Improve human life and experience with SMART Data.– Data to Information to Contextually Relevant
Abstractions (Semantic Perception)– Actionable Information (Value from data) to assist
and support human in decision making.
• Focus on Value -- SMART Data– Big Data Challenges without the intention of deriving
Value is a “Journey without GOAL”.
• Collaborators: Clinicians: Dr. William Abrahams (OSU-Wexner), Dr. Shalini Forbis (Dayton Childrens), Dr. Sangeeta Agrawal (VA), Valerie Shalin (WSU Cognitive Scientists ), Payam Barnaghi (U-Surrey), Ramesh Jain(UCI), …
• Funding: NSF (esp. IIS-1111183 “SoCS: Social Media Enhanced Organizational Sensemaking in Emergency Response,”), AFRL, NIH, Industry….
Acknowledgment
116
Amit Sheth’s PHD students
Ashutosh Jadhav*
Hemant Purohit
Vinh Nguyen
Lu ChenPavan Kapanipathi
*
Pramod Anantharam
*
Sujan Perera
Maryam Panahiazar
Sarasi Lalithsena
Shreyansh Batt
Kalpa Gunaratna
Delroy Cameron
Sanjaya Wijeratne
Wenbo Wang
Special thanks: Ashu. This presentation covers some of the work of my PhD students. Key contributors: Pramod Anantharam, Cory Henson and TK Prasad.
Special thanks
117
• Among top universities in the world in World Wide Web (cf: 10-yr impact, Microsoft Academic Search: among top 10 in June2014)
• Among the largest academic groups in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications
• Exceptional student success: internships and jobs at top salary (IBM Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups )
• 100 researchers including 15 World Class faculty (>3K citations/faculty avg) and ~45 PhD students- practically all funded
• Extensive research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …)
118
Top organization in WWW: 10-yr Field Rating (MAS)
120
thank you, and please visit us at
http://knoesis.org
Smart Data to Big Data; Physical-Cyber-Social Computing