Big Data and Harvesting Data from Social Media
Rajendra Akerkar
Invited Talk at “Inclusion‐Exclusion: Global Digital Cultures”, University of Bergen. 01‐03‐2016
01.03.2016 Rajendra Akerkar 2
“Ask not what your data can do for you, ask what you can do with your data.”
‐ a data‐driven reimagining of a famous JFK quote
Data‐driven paradigm
How can raw data be used to gain new insights, predict trends and changes, introduce innovation and market leads, and create new opportunities?
Information ≤ Data, Information ≠ Insights
Reasons to explore big data with social media
Big Data Characteristics
• Volume
• Velocity
• Variety
• Variability
• Complexity
Data types, from more to less structured:
• Structured: Data comprising a defined data type, format, and structure.
• Semi‐structured: Textual data files with a discernible pattern, enabling parsing.
• Quasi‐structured: Textual data with inconsistent data formats that can be formatted with effort, tools, and time.
• Unstructured: Data that has no inherent structure and is mostly stored as different types of files.
Examples of each data type:
• Structured Data
• Semi‐Structured Data: View Source
• Quasi‐Structured Data: http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=data+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=d566e0fbd09c8604&biw=1382&bih=651
• Unstructured Data: The Red Wheelbarrow, by William Carlos Williams
Source: EMC
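As a rough illustration of why a clickstream URL like the one above counts as quasi‐structured, the sketch below pulls its parameters into a flat record using Python's standard library. The function name parse_clickstream is hypothetical and not part of the talk. Note that in this URL the parameters sit in the fragment (after "#") rather than in the query string, which is one of the format inconsistencies a parser has to handle.

```python
from urllib.parse import urlsplit, parse_qs

# The clickstream URL from the slide, split across lines for readability.
URL = (
    "http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t"
    "&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1"
    "&oq=data+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl="
    "&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=d566e0fbd09c8604&biw=1382&bih=651"
)


def parse_clickstream(url):
    """Turn a search URL into a flat dict of parameter name -> value.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Parameters may live in the fragment or in the query string.
    raw = parts.fragment or parts.query
    # keep_blank_values=True preserves empty parameters such as "gs_sm=".
    params = parse_qs(raw, keep_blank_values=True)
    # Keep the first value of each parameter.
    return {name: values[0] for name, values in params.items()}


record = parse_clickstream(URL)
print(record["q"])   # current search query
print(record["pq"])  # previous search query
```

The resulting record is structured, but only because the parser encodes knowledge about where this particular site places its parameters.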
What to do with these data?
Aggregation and Statistics
• Data warehouse and OLAP
Indexing, Searching, and Querying
• Keyword‐based search
• Pattern matching (XML/RDF)
Knowledge discovery
• Data Mining
• Statistical Modelling
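The first two kinds of processing listed above can be sketched in a few lines of Python. This is a toy illustration, not a data warehouse or search engine: the sample posts, the XML document, and the function names count_by, keyword_search, and match_xml are all invented for this example.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical sample records, invented for illustration.
POSTS = [
    {"user": "a", "lang": "en", "text": "Flood warning issued for the river area"},
    {"user": "b", "lang": "en", "text": "Road closed after flood"},
    {"user": "a", "lang": "no", "text": "Flom stenger veien"},
]

# Hypothetical XML export of similar posts.
XML_DOC = """<posts>
  <post lang="en"><text>Flood warning</text></post>
  <post lang="no"><text>Flom</text></post>
</posts>"""


def count_by(posts, field):
    """Aggregation: count posts per distinct value of one field."""
    return Counter(post[field] for post in posts)


def keyword_search(posts, keyword):
    """Keyword-based search: posts whose text contains the word."""
    word = keyword.lower()
    return [post for post in posts if word in post["text"].lower().split()]


def match_xml(xml_text, lang):
    """Pattern matching: text of every <post> with the given lang attribute."""
    root = ET.fromstring(xml_text)
    return [el.findtext("text") for el in root.findall(f".//post[@lang='{lang}']")]


print(count_by(POSTS, "lang"))
print([post["user"] for post in keyword_search(POSTS, "flood")])
print(match_xml(XML_DOC, "en"))
```

Knowledge discovery (data mining and statistical modelling) goes beyond such lookups and is not covered by this sketch.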
Emergency Management in Social Media Generation
The analysis of the communication behaviour via social media in an emergency situation and its impact on emergency management procedures.
Social Media Mining
Social Media Mining is the process of representing, analyzing, and extracting meaningful patterns from social media data.
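A minimal sketch of the three steps in this definition, assuming posts are available as (author, text) pairs: represent the posts as a mention graph, analyze it by counting incoming mentions, and extract a simple pattern (the most frequent hashtags). The sample posts and all function names are hypothetical; real pipelines would use far richer representations.

```python
import re
from collections import Counter, defaultdict

# Hypothetical (author, text) pairs, invented for illustration.
POSTS = [
    ("alice", "@bob see the #flood map"),
    ("carol", "@bob thanks! #flood #bergen"),
    ("bob", "@alice stay safe"),
]

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")


def build_mention_graph(posts):
    """Represent: directed graph mapping each author to the users they mention."""
    graph = defaultdict(set)
    for author, text in posts:
        for target in MENTION.findall(text):
            graph[author].add(target.lower())
    return graph


def in_degree(graph):
    """Analyze: number of distinct authors mentioning each user."""
    return Counter(target for targets in graph.values() for target in targets)


def top_hashtags(posts, n=3):
    """Extract a pattern: the n most frequent hashtags."""
    counts = Counter(
        tag.lower() for _, text in posts for tag in HASHTAG.findall(text)
    )
    return counts.most_common(n)


graph = build_mention_graph(POSTS)
print(in_degree(graph).most_common(1))
print(top_hashtags(POSTS, n=1))
```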
Sources of real‐time data streams
Three key sources of live information streams:
• Spontaneous user‐generated content
• Machine‐generated content
• Structured database content
Information is becoming increasingly multimedia
• A purely text‐based approach is inadequate
Also multilingual and multicultural
Several research issues!
Social media mining challenges
1. Big Data Paradox
• Social media data is big, yet not evenly distributed.
• Often little data is available for an individual.
2. Obtaining Appropriate Samples
• Are our samples reliable representatives of the full data?
3. Noise Removal Fallacy
• Too much removal makes data more sparse.
• The definition of noise is relative, complicated, and task‐dependent.
4. Evaluation Dilemma
• When there is no ground truth, how can you evaluate?
5. Deception Detection
• Information intended to deceive can spread through social media the same as valid information.
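The Big Data Paradox (challenge 1) can be made concrete with a small calculation. The sketch below uses an invented list of post authors in which one account dominates, then measures how much of the data comes from that account and how many users contribute almost nothing. The data and function names are hypothetical and only illustrate the shape of the problem.

```python
from collections import Counter

# Hypothetical author id for each of 100 posts: one heavy user, one
# moderate user, and five users with a single post each.
AUTHORS = ["u1"] * 90 + ["u2"] * 5 + ["u3", "u4", "u5", "u6", "u7"]


def posts_per_user(authors):
    """Number of posts contributed by each user."""
    return Counter(authors)


def share_of_posts_from_top_user(authors):
    """Fraction of all posts written by the single most active user."""
    counts = posts_per_user(authors)
    return counts.most_common(1)[0][1] / len(authors)


def share_of_sparse_users(authors, max_posts=1):
    """Fraction of users who have at most max_posts posts."""
    counts = posts_per_user(authors)
    sparse = sum(1 for count in counts.values() if count <= max_posts)
    return sparse / len(counts)


print(share_of_posts_from_top_user(AUTHORS))
print(share_of_sparse_users(AUTHORS))
```

In this toy data set the collection as a whole looks large, yet most individual users have a single post, so any per‐user analysis has very little to work with. Filtering posts as noise would shrink those per‐user counts further, which is the concern behind challenge 3.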
Thank you