Hands-on Workshop: Big (Twitter) Data
Damian Trilling
d.c.trilling@uva.nl · @damian0604
www.damiantrilling.net
Department of Communication Science, Universiteit van Amsterdam
30 January 2014, 13.15
#bigdata Damian Trilling
In this session (3/4):
What we’ll do
1. A bunch of exercises
2. If you want to, the opportunity to develop your own script
Björn and I will help you.
I'll now show you some example scripts that you can use for the exercises and as inspiration for your own project. You will find everything you need at http://beehub.nl/bigdata-cw/workshop.
In the future, you will also find them at https://github.com/uvacw/py-examples.
RE exercise 1: Automated coding
See the example from this morning.
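If you want a quick reminder of what automated coding can look like, here is a minimal dictionary-based sketch. It is not the example from this morning: the codebook, its categories and regular expressions, and the function name `code_text` are all hypothetical. Each text gets a 1 for a category if any of that category's patterns matches, and a 0 otherwise.

```python
import re

# Hypothetical codebook: category -> regular expressions that indicate it.
CODEBOOK = {
    "economy": [r"\beconom\w*", r"\bbudget\w*"],
    "health": [r"\bhealth\w*", r"\bhospital\w*"],
}


def code_text(text, codebook=CODEBOOK):
    """Return {category: 1 or 0}: 1 if any pattern of that category matches text."""
    return {
        category: int(any(re.search(p, text, re.IGNORECASE) for p in patterns))
        for category, patterns in codebook.items()
    }
```

For example, `code_text("The budget for the new hospital")` codes the text as both economy and health. Applying it to every row of a dataset gives you one binary variable per category that you can write to a CSV file.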
RE exercise 2: Frequencies
netvizz ⇒ engeltjes.tab ⇒ engeltjes.py ⇒ screen output + engeltjes_count.csv
Something new: the nltk package and the removal of stopwords
www.nltk.org
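A minimal sketch of the frequency step, assuming the netvizz export is tab-separated with a header row and the post text sits in a column called `post_message` (a hypothetical name; check the header of your own engeltjes.tab). It uses nltk's Dutch stopword list, `nltk.corpus.stopwords.words("dutch")`, when nltk and its stopwords corpus are installed (via `nltk.download("stopwords")`), and otherwise falls back to a tiny illustrative list. The function names are hypothetical and not taken from engeltjes.py.

```python
import csv
import re
from collections import Counter

# Tiny illustrative fallback list of Dutch stopwords (not exhaustive).
FALLBACK_STOPWORDS = {"de", "het", "een", "en", "van", "ik", "is", "dat", "op", "in"}


def load_stopwords(language="dutch"):
    """Return nltk's stopword set if nltk and its corpus are available, else the fallback."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words(language))
    except (ImportError, LookupError):
        return set(FALLBACK_STOPWORDS)


def count_words(lines, text_column, stop_words):
    """Count lower-cased words in one column of tab-separated lines, skipping stopwords."""
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        text = row.get(text_column) or ""  # short rows yield None for missing columns
        words = re.findall(r"\w+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


def write_counts(counts, out_file):
    """Write word counts, most frequent first, as a two-column CSV."""
    writer = csv.writer(out_file)
    writer.writerow(["word", "count"])
    for word, n in counts.most_common():
        writer.writerow([word, n])
```

To mimic the pipeline above, open engeltjes.tab with `open("engeltjes.tab", encoding="utf-8", newline="")`, pass the file object to `count_words` together with `load_stopwords()`, print `counts.most_common(20)` to the screen, and hand the counts plus an opened engeltjes_count.csv to `write_counts`.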
RE exercise 3: Sentiment analysis
The pattern module: pattern.nl (other languages: en | es | de | fr | it)
http://www.clips.ua.ac.be/pages/pattern
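pattern's own function for this is `sentiment()` (e.g. `from pattern.nl import sentiment`), which returns a (polarity, subjectivity) pair; check the documentation above for details. To show the underlying idea of lexicon-based sentiment analysis without installing anything, here is a toy sketch. It is not pattern's API or lexicon: the mini-lexicon, its scores, and the functions `polarity` and `label` are hypothetical. A text's polarity is simply the average score of the lexicon words it contains.

```python
import re

# Hypothetical mini-lexicon: word -> polarity between -1 (negative) and 1 (positive).
LEXICON = {"goed": 0.7, "mooi": 0.8, "lief": 0.6, "slecht": -0.7, "saai": -0.5}


def polarity(text, lexicon=LEXICON):
    """Average polarity of lexicon words found in text; 0.0 if none are found."""
    words = re.findall(r"\w+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label(text, lexicon=LEXICON):
    """Turn the polarity score into a positive/negative/neutral label."""
    score = polarity(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Real lexicons such as pattern's are much larger and also handle things like intensifiers ("erg goed"), which this sketch ignores.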
RE exercise 4: Your own ideas
1. Have a look at the examples on beehub or GitHub.
2. Ask Google.
3. Ask us for advice.
Before you start
Common errors
indentation error: Pay attention to TAB and SPACE.
error in line YYY: Have a close look at line YYY in your editor.
index out of range: Maybe you are trying to read column 5 from a table with only 4 columns?
Try your script on a small dataset first!
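Two of these tips can be built into a script directly. A minimal sketch, assuming tab-separated input; the function name `read_column` and its `max_rows` parameter are hypothetical. Note that Python counts from 0, so the fifth column is index 4. The function skips (and counts) rows that are too short instead of crashing with "IndexError: list index out of range", and `max_rows` lets you run it on only the first few rows while testing.

```python
import csv
from itertools import islice


def read_column(lines, column, max_rows=None):
    """Return (values, skipped) for a 0-based column of tab-separated lines.

    Rows with too few columns are skipped and counted instead of raising
    IndexError. max_rows limits how many rows are read (None means all).
    """
    values = []
    skipped = 0
    for row in islice(csv.reader(lines, delimiter="\t"), max_rows):
        if len(row) <= column:
            # row[column] would raise "IndexError: list index out of range"
            skipped += 1
            continue
        values.append(row[column])
    return values, skipped
```

If `skipped` is not 0 on your small test file, check which column you are asking for before running the script on the full dataset.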
Questions or comments?
Damian Trilling
d.c.trilling@uva.nl · @damian0604
www.damiantrilling.net