WORKSHOP- BIG DATA ANALYTICS Israeli Social Protest Osher Arbib Winter 2012-2013 Tel-Aviv University...
1
WORKSHOP - BIG DATA ANALYTICS
Israeli Social Protest
Osher Arbib
Winter 2012-2013
Tel-Aviv University
2
General Review
• The Israeli Social Protest began in the summer of 2011 and is the largest protest movement in Israel's history.
• One of the main factors in the protest's development was the ease of sharing information over the Internet, both in established formats such as e-mail and forums and on the social networks Facebook and Twitter.
3
Project Background
• Over a period of about a year and a half, e-mails from the protest were collected: some personal e-mails, but mostly e-mails sent to distribution lists.
• The project goal is to build an efficient model for data analysis and to derive information that will be useful for analyzing the social-science aspects of the protest.
4
Project Purposes
• The analysis of the given information will be on two levels:
– Analyzing the structured information (analysis by subject, timeline, traffic volume, etc.)
– Understanding the unstructured information (i.e., understanding the text content)
• The work may also produce a model suitable for analyzing information sources in real time, which could help with future events.
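As a minimal sketch of the structured level, the snippet below parses raw e-mail text with Python's standard `email` module and counts messages per month and per subject. The sample messages, the addresses, and the helper names `normalize_subject` and `summarize` are hypothetical, made up for illustration; the slides do not describe the project's actual data layout or pipeline.

```python
from collections import Counter
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample messages (not from the project's corpus).
RAW_MESSAGES = [
    "From: a@example.org\nTo: list@example.org\n"
    "Date: Mon, 01 Aug 2011 10:00:00 +0300\nSubject: Housing march\n\nBody one",
    "From: b@example.org\nTo: list@example.org\n"
    "Date: Sat, 06 Aug 2011 21:30:00 +0300\nSubject: Re: Housing march\n\nBody two",
    "From: a@example.org\nTo: list@example.org\n"
    "Date: Sat, 03 Sep 2011 18:00:00 +0300\nSubject: Rally logistics\n\nBody three",
]


def normalize_subject(subject):
    """Strip leading reply/forward prefixes so threads group together."""
    subject = subject.strip()
    while subject.lower().startswith(("re:", "fw:", "fwd:")):
        subject = subject.split(":", 1)[1].strip()
    return subject.lower()


def summarize(raw_messages):
    """Return (messages per month, messages per normalized subject)."""
    per_month = Counter()
    per_subject = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        sent = parsedate_to_datetime(msg["Date"])
        per_month[sent.strftime("%Y-%m")] += 1
        per_subject[normalize_subject(msg["Subject"])] += 1
    return per_month, per_subject


per_month, per_subject = summarize(RAW_MESSAGES)
print(sorted(per_month.items()))
print(per_subject.most_common(1))
```

Grouping by month gives a coarse traffic-volume timeline; a finer bucket (week or day) only requires a different `strftime` pattern.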
5
Database
• Structured Data → Analysis of Network
• Unstructured Data → Content Analysis
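The slides do not say which network is analyzed. Assuming it is the graph of who e-mails whom, a rough sketch of the structured branch could look like the following. The edge list and the helper names `build_graph` and `degree_summary` are hypothetical, and plain dictionaries are used to keep the example dependency-free; a graph library such as NetworkX could hold the same graph.

```python
from collections import Counter, defaultdict

# Hypothetical (sender, recipient) pairs, e.g. taken from From/To headers.
EDGES = [
    ("alice", "list"),
    ("bob", "list"),
    ("alice", "bob"),
    ("alice", "list"),
    ("carol", "alice"),
]


def build_graph(edges):
    """Build a weighted directed graph: graph[sender][recipient] = count."""
    graph = defaultdict(Counter)
    for sender, recipient in edges:
        graph[sender][recipient] += 1
    return graph


def degree_summary(graph):
    """Return (out_degree, in_degree) as message counts per node."""
    out_degree = Counter()
    in_degree = Counter()
    for sender, targets in graph.items():
        for recipient, weight in targets.items():
            out_degree[sender] += weight
            in_degree[recipient] += weight
    return out_degree, in_degree


graph = build_graph(EDGES)
out_degree, in_degree = degree_summary(graph)
print(out_degree.most_common(1))
print(in_degree.most_common(1))
```

Nodes with a high weighted out-degree are the most active senders, and nodes with a high in-degree (typically distribution lists) are the main hubs.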
6
Python Libraries and Tools
NumPy, SciPy, pandas, matplotlib, pygrametl, etc.
7
Database
• Structured Data → Analysis of Network (NetworkX)
• Unstructured Data → Content Analysis (tf-idf)
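For readers unfamiliar with tf-idf, here is a small pure-Python sketch of one common variant, with tf as the term count divided by document length and idf as log(N / df). The slides do not state which variant or library the project used, so the formula, the sample documents, and the function name `tf_idf` are assumptions for illustration only.

```python
import math
from collections import Counter

# Hypothetical, already-tokenized documents (not from the project's corpus).
DOCS = [
    ["housing", "prices", "protest", "tent"],
    ["protest", "march", "saturday"],
    ["tent", "city", "housing", "housing"],
]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df),
    one common variant among several.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(DOCS)
for i, w in enumerate(weights):
    top_term = max(w, key=w.get)
    print(i, top_term, round(w[top_term], 3))
```

Terms that appear in every document get a weight of zero under this variant, which is why words that are frequent across the whole corpus drop out of the per-document rankings.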
8
Google Visualization
9
Limitations
• The content analysis is limited:
– Language limitations: even recognizing verb conjugations and nouns in Hebrew is complicated.
– The limited tools allow only entry-level content analysis, identifying expressions of at most two to three words.
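The kind of entry-level analysis described above, finding recurring expressions of two to three words, can be sketched with simple n-gram counting. This is an illustrative assumption, not the project's actual tool: the function names `ngrams` and `frequent_expressions` are hypothetical, the sample texts are English stand-ins, and whitespace tokenization does nothing to address the Hebrew morphology problem noted in the first limitation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield consecutive n-word tuples from a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequent_expressions(texts, sizes=(2, 3), min_count=2):
    """Count 2- and 3-word expressions that occur at least min_count times."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        for n in sizes:
            counts.update(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical sample texts (English stand-ins for the Hebrew e-mails).
TEXTS = [
    "the people demand social justice",
    "we demand social justice now",
    "social justice for the people",
]

expressions = frequent_expressions(TEXTS)
for gram, count in sorted(expressions.items(), key=lambda kv: (-kv[1], kv[0])):
    print(" ".join(gram), count)
```

Raising `min_count` or restricting `sizes` trades recall for a shorter, cleaner list of candidate expressions.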