Post on 27-Jan-2015
Seminars BigData Trento 26/03/2013
Via Sommarive, 18 – sala EIT ICT Labs.
Alfonso Crisci - a.crisci@ibimet.cnr.it
Valentina Grasso - grasso@lamma.rete.toscana.it
Image:http://www.greenbookblog.org/2012/03/21/big-data-opportunity-or-threat-for-market-research/
Severe weather events and social media streams:
bigdata approach for impact mapping
Social media and SEO are the information rivers available on the web.
Are they useful or not? That is the question (W. Shakespeare).
Social Media are Data
•contents (UGC)
•conversation
•connection
•collaboration
•community
A big lens on crowd behaviour, essentially through:
COMMUNITY membership and CONVERSATION
5C
Why Social Media & Big Data?
User-generated content (UGC) is currently the world's largest mine of data for every purpose.
Performing data mining on this kind of data involves many tasks and computational services for parsing and information extraction, concerning:
Georeferencing
Social network analytics
Semantic processing
Information rendering and visualisation
Co-inference with other information sources
Retrieval and storage of SM streams
Now there are platforms to realize the data meet-up:
MapReduce
Parallel computation
RHadoop
Working code today…
RHadoop (Revolution Analytics) is only an example!
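As a rough illustration of the MapReduce idea named on this slide, here is a minimal single-machine sketch in Python. It is not RHadoop code: the sample records and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical, and a real platform would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample records (day, text); not data from the study.
tweets = [
    ("2011-04-07", "che caldo oggi"),
    ("2011-04-07", "troppo caldo a #milano"),
    ("2011-04-08", "afa e sole"),
]

def map_phase(record):
    """Emit one (word, 1) pair for every word in the tweet text."""
    _day, text = record
    return [(word.lower(), 1) for word in text.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)

mapped = chain.from_iterable(map_phase(t) for t in tweets)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

Because each `map_phase` call touches a single record and each `reduce_phase` call touches a single key, both steps can in principle be distributed without changing the result.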
Social Media versus Weather
Intrinsically Big Data
SM & weather are connected!!
Plenty of weather content on SM
• Weather is a common conversation topic
• Services push the personalization of weather forecasts
• Perceived weather has a local dimension
• Weather could become an "emergency" issue
Considering severe weather events..
Where
Who
When
They happen in space and in time and, through media and SM, trigger the build-up of an informative frame on the web sphere.
A deep analogy with web processes exists!
Weather as emergency issue main features
•FREQUENT: compared with other emergencies
•FAMILIAR: people deal with weather daily
•PREDICTABLE: important for warnings
•LOCATED: specific spatial and temporal dimension
#fires #earthquake #chemical #nuclear #disaster #health #terrorism
Weather as an operational context where communities may strengthen their "resilience" attitude.
In an emergency, "behaviours" modulate "impacts" on society.
If I'm aware and prepared, I act responsibly.
US tornado warning:
people got used to "weather warnings" and learnt to be proactive in protecting themselves.
The aim: enhance the resilience of communities
Changing climate - changing awareness
In Italy and Europe, over the last 10 years climate change has made us more exposed to extreme weather events - "preparedness"
Tornado hits: US - Italy 1999-2009
The geographical spread and magnitude of events are important for awareness
Lovely (or less so) weather fakes on SM... are everywhere…
Information verification becomes a must!
Welcome Bigdata!
Verification is a question of time, event shape and coherence
Weather phenomena and social/communication streams as "analogue", time-delayed information waves
[Figure: wave phases along the time axis: start, peak, decline]
…..and geography as well
… dynamic information warping is a means to explore the time coherence between a real physical process [or its mathematical representation!!!!] and information flows
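One simple way to probe time coherence between a physical series and a delayed information stream is a lagged correlation. The Python sketch below is an illustration under stated assumptions, not the authors' method: it uses plain Pearson correlation at integer lags rather than any warping technique, and the two series are invented so that the social signal trails the physical one by one step.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def best_lag(physical, social, max_lag):
    """Return (lag, r) for the delay of `social` behind `physical` with the highest r."""
    scores = {}
    for lag in range(max_lag + 1):
        x = physical[:len(physical) - lag]  # drop the last `lag` physical values
        y = social[lag:]                    # drop the first `lag` social values
        scores[lag] = pearson(x, y)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

# Hypothetical series: the social signal repeats the temperature wave one day later.
temperature = [18, 20, 25, 29, 27, 22, 19, 18]
social = [170, 180, 200, 250, 290, 270, 220, 190]

lag, r = best_lag(temperature, social, max_lag=3)
print(lag, round(r, 3))
```

A lag with a clearly higher correlation than its neighbours suggests that the information wave follows the physical one with that delay; on real data the peak is usually much less sharp.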
In a multidimensional space or, better, in every time-varying system (such as the atmosphere or the “web information seas”), some structures can always be detected.
Uncovering the Lagrangian Skeleton of Turbulence. Mathur et al., Phys Rev Lett. 2007 Apr 6;98(14):144502. Epub 2007 Apr 4.
Lagrangian coherent structures (LCS): well known in ecology and fluid dynamics
When two or more time-varying systems are connected, a supercoherence could be detected if the processes are linked.
The link structure between SM and weather could hypothetically be described by a suitable hierarchy model (theory of middle-number systems, Weinberg 1975). Social media and weather relationships are surely an organized complexity: too many parts to be deterministically predicted, too few to be statistically forecast.
Agent-Based Modeling of Complex Spatial Systems, http://www.ncgia.ucsb.edu/projects/abmcss/ May Yuan, University of Oklahoma
SMERST 2013: Social Media and Semantic Technologies in Emergency Response, 15-16 April 2013, University of Warwick, Coventry UK
Disaster 2.0 project
Weather event: early heat wave on 5-7 April 2011
Working case on the Italian Twitter-sphere
• investigate time/space coherence between the event extension and its social footprint on Twitter
• semantic analysis of the Twitter stream on peak and off-peak days
Research objectives
Heat wave as a good case
Emergency as a consequence of "behaviour"
Communication is key: "how to act"
Heat wave: definition
A period with persistent T° above the seasonal mean. The local definition depends on the regional climatic context.
Severe weather refers to any dangerous meteorological phenomena with the potential to cause damage, serious social disruption, or loss of human life. [WMO]
Types of severe weather phenomena vary, depending on the latitude, altitude, topography, and atmospheric conditions.
Ref: http://en.wikipedia.org/wiki/Severe_weather
To overcome all SM & weather complexities, a 5-point road map:
• Identify a 1-dimensional time flux of information from the SM world
• Detect every local statistical linear association between this flux and a parametric (physical) space-time representation (a time-space grid of data).
• Map the significance in predetermined classes.
• Verify the patterns against observations.
• Confirm with semantic and text mining.
• Run a community analysis of SM streams to detect user filters
Target and Products
Stakeholders:
• forecasters
• institutional stakeholders
• EM communities
• media agents
Products:
• DNKT: a semantic-based SM stream metric
• A spatial associative map: the areas with a significant association between the SM time vector (DNKT) and the coupled time-gridded data stack of weather parameters
• A semantic analysis of the Twitter stream:
- clustering
- word clouds
- SNA improvements
Target: detect areas where it's worth focusing attention, also for communication purposes.
Data used
Heat wave period considered (7-13 April 2011)
Social
- Twitter API, key-tagged (CALDO-AFA-SETE, i.e. heat, sultriness, thirst): 6069 tweets collected through the geosearch service for the Italian area
- Retweets and replies included (full volume stream)
Climate & Weather (7-10 April 2011)
- Urban daily maximum T°
- Daily gridded data (lon 5-20 E, lat 35-50 N): WRF-ARW model daily T°max data (9 km box)
Semantized Twitter stream metrics
DNKT shows time coherence with the daily profiles of area-averaged temperature
* Critical days identified as numerical neighbours of the peaks (7-8-9 April): social "heat days"
DNKT - "daily number of key-tagged tweets"
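A minimal Python sketch of how a DNKT-like series could be computed is shown here. The key-tags come from the slides; everything else is an assumption made for illustration: the sample tweets are invented, the matching rule (a whole word equal to a key-tag) is a guess, and "critical days" is interpreted as the peak day plus its immediate neighbours.

```python
from collections import Counter

KEY_TAGS = {"caldo", "afa", "sete"}  # key-tags named in the slides

def dnkt(tweets):
    """Count, per day, the tweets whose text contains at least one key-tag."""
    counts = Counter()
    for day, text in tweets:
        words = {w.strip("#.,!?").lower() for w in text.split()}
        if words & KEY_TAGS:
            counts[day] += 1
    return counts

def critical_days(counts, n_neighbours=1):
    """Return the peak day plus its neighbours, in calendar order (assumed rule)."""
    days = sorted(counts)
    peak = max(days, key=lambda d: counts[d])
    i = days.index(peak)
    return days[max(0, i - n_neighbours): i + n_neighbours + 1]

# Hypothetical sample stream of (ISO date, text); not data from the study.
sample = [
    ("2011-04-06", "un po' di caldo"),
    ("2011-04-06", "bella giornata"),
    ("2011-04-07", "che caldo oggi"),
    ("2011-04-07", "troppa afa"),
    ("2011-04-08", "caldo a #milano"),
    ("2011-04-08", "ho sete"),
    ("2011-04-08", "#afa insopportabile"),
    ("2011-04-09", "ancora caldo"),
    ("2011-04-09", "sete e sole"),
    ("2011-04-10", "finalmente meno caldo"),
]

counts = dnkt(sample)
print(dict(counts))
print(critical_days(counts))
```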
The associative map as a tool
Semantic-based social stream in 1D time space (DNKT)
Weather informative layers in 2D space * time
Linear-association, statistically based verifier, pixel by pixel
Geographic Associative Map (2D space)
Impacted areas in evidence
It's an X-ray of a weather map: the Twitter stream is used as a "contrast medium" to visualize impacted areas.
This is not a Twitter map
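The pixel-by-pixel idea can be sketched in Python as follows: correlate the 1D DNKT series with the time series of each grid cell, then bin the result into classes. This is a simplified illustration, not the authors' verifier. It classifies on the correlation coefficient itself instead of a significance test, the class cut-offs are hypothetical, and the tiny 2x2 grid is invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def associative_map(series, stack):
    """stack[t][i][j] is the weather value at time t, pixel (i, j); returns r per pixel."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [pearson(series, [layer[i][j] for layer in stack]) for j in range(cols)]
        for i in range(rows)
    ]

def classify(r_map, thresholds=(0.5, 0.8)):
    """Class index per pixel: 0 weak, 1 moderate, 2 strong (hypothetical cut-offs)."""
    return [[sum(r >= t for t in thresholds) for r in row] for row in r_map]

# Hypothetical data: 5 daily layers on a 2x2 grid and a 5-day DNKT series.
dnkt_series = [10, 30, 50, 30, 10]
stack = [
    [[20, 15], [28, 20]],
    [[24, 15], [24, 21]],
    [[28, 15], [20, 22]],
    [[24, 15], [24, 23]],
    [[20, 15], [28, 24]],
]

r_map = associative_map(dnkt_series, stack)
classes = classify(r_map)
print(classes)
```

In this toy grid only the top-left pixel follows the DNKT wave, so it is the only one that lands in the strongest class; the other three are constant, inverse or unrelated.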
Associative maps fit well
Urban maximum T° over 28 °C on 9 April
where & when
Semantic analytics
- Corpus creation: DNKT classification by heat-wave peak days: heat days (7-8-9 April), no-heat days (6-10-11 April).
- Term word clouds (min word frequency > 30), heat days vs no-heat days
- Clustering of associated terms
- Term frequency ranking comparison
- Hashtag word clouds, heat days vs no-heat days
R Stat 15.2. Packages used: tm (Feinerer and Hornik, 2012) & wordcloud (Fellows, 2012)
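The term frequency ranking step can be approximated with the short Python sketch below. It is a stand-in for the R `tm` workflow named above, not a port of it. Assumptions: a term's frequency is taken as the share of tweets containing it, hashtags and the key-tags are excluded, and the sample corpus is invented.

```python
from collections import Counter

KEY_TAGS = {"caldo", "afa", "sete"}  # excluded from the ranking, as in the slides

def term_ranking(texts, top=3):
    """Rank terms by the share (%) of texts containing them, key-tags and hashtags excluded."""
    n = len(texts)
    doc_freq = Counter()
    for text in texts:
        terms = {
            w.strip(".,!?;:").lower()
            for w in text.split()
            if not w.startswith("#")
        }
        terms -= KEY_TAGS
        terms.discard("")
        doc_freq.update(terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
    return [(term, round(100 * count / n, 1)) for term, count in ranked]

# Hypothetical "heat days" corpus; not data from the study.
heat_days = [
    "oggi troppo caldo",
    "troppo sole oggi",
    "oggi afa a #milano",
    "che sete",
]

ranking = term_ranking(heat_days, top=2)
print(ranking)
```

Running the same function on a "no-heat days" corpus and comparing the two rankings side by side gives the kind of comparison the slide describes.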
Word clouds of terms (excluding the key-tags caldo-afa-sete): heat days vs no-heat days
Term association clustering: heat days vs no-heat days
- heat days: "heat" is THE conversation topic
- no-heat days: "heat" is marginal to the conversation topic
Term frequency ranking
Rank   no-heat days (N=2608)   heat days (N=3461)
1°     oggi 6.0%               oggi 8.3%
2°     sole 5.5%               troppo 7.7%
3°     troppo 4.1%             sole 5.9%
(oggi = today, sole = sun, troppo = too much)
Hashtag word clouds: heat days vs no-heat days
Semantic results
On peak days:
- widening of the lexical base during "heat critical days": heat as a conversation topic
- the ranking of terms (e.g. adjectives such as "troppo"!) is useful to detect changes in communication during climatic stress
- geographic names appear in the term and hashtag word sets ("#milano"!).
This fits with recent research on "social media contribution to situational awareness during emergencies".
Snow events
SNA of keytagged social media streams
Begin: 10 Feb 2013
End: 11 Feb 2013
The graph metrics of SM streams are dynamic.
The graph centrality analysis of media and institutions may provide very useful parameters for weather event follow-up.
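As an illustration of graph centrality on a key-tagged stream, the Python sketch below builds a mention graph and computes normalized in-degree centrality. The choice of in-degree (rather than another centrality measure) is an assumption, and the account names and tweets are hypothetical.

```python
from collections import defaultdict
import re

def mention_edges(tweets):
    """Yield (author, mentioned_user) edges from '@user' mentions in the tweet text."""
    for author, text in tweets:
        for user in re.findall(r"@(\w+)", text):
            yield author.lower(), user.lower()

def in_degree_centrality(edges):
    """In-degree of each node divided by (n - 1), where n is the number of nodes."""
    nodes, incoming = set(), defaultdict(set)
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:  # ignore self-mentions
            incoming[dst].add(src)
    denom = max(len(nodes) - 1, 1)
    return {node: len(incoming[node]) / denom for node in nodes}

# Hypothetical accounts and tweets; not data from the study.
tweets = [
    ("anna", "RT @meteo_agency neve a Firenze #firenzeneve"),
    ("luca", "@meteo_agency @comune strade chiuse"),
    ("comune", "allerta neve, seguite @meteo_agency"),
]

centrality = in_degree_centrality(mention_edges(tweets))
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

Recomputing the metric over successive time windows of the stream is one way to follow how the role of media and institutional accounts changes during an event.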
#firenzeneve
Conclusions
- A methodology for a social "x-ray" of a weather event: a semantized SM stream can act as a "contrast medium" to understand the social impact of severe weather events.
- The social geosensing mining methodology is able to map severe weather impacts and overcome the weaknesses inside social media data.
- Weather is a key emergency context where it's worth working on community resilience, also with the help of insightful social content.
Reproducible R code
socialsensing Code & Data
https://github.com/alfcrisci/socialgeosensing.git
Wiki Recipes in
https://github.com/alfcrisci/socialgeosensing/wiki
#thanks
Contacts: Crisci Alfonso & Valentina Grasso
mail: a.crisci@ibimet.cnr.it
mail: grasso@lamma.rete.toscana.it
Twitter: @alf_crisci @valenitna
www.lamma.rete.toscana.it www.ibimet.cnr.it
#nowquestions (slowly please, if possible)