A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London
Muhammad Adnan, Guy Lansley, Paul Longley
Department of Geography, University College London
Web: http://www.uncertaintyofidentity.com
Introduction
• Use of social media services has increased
• But how representative are social media data sets of Census or Electoral Roll data?
• This paper provides an ethnicity, age, and gender analysis of Twitter users
• A comparison is provided with the 2011 Census data
• Could have potential applications in cyber marketing and cyber security
Twitter (www.twitter.com)
• Online social-networking and micro-blogging service
• Launched in 2006
• Users can send messages of 140 characters or less
• Approximately 200 million active users
• 350 million tweets daily
• In 2012, UK and London were ranked 4th and 3rd, respectively, in terms of the number of posted tweets
Data available through the Twitter API
• User Creation Date
• Followers
• Friends
• User ID
• Language
• Location
• Name
• Screen Name
• Time Zone
• Geo Enabled
• Latitude
• Longitude
• Tweet date and time
• Tweet text
Users can download a 1% sample of the live tweets through the API
4 million geo-tagged tweets downloaded during August and December, 2012
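As a rough illustration of how geo-tagged tweets might be restricted to Greater London, the sketch below keeps only records whose latitude and longitude fall inside a bounding box. The `Tweet` record, the function names, and the bounding-box coordinates are assumptions made for this example; the slides do not say how the study area was delimited, and a real analysis would more likely test points against the Greater London boundary polygon.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Approximate bounding box for Greater London: (min_lon, min_lat, max_lon, max_lat).
# These coordinates are an assumption for illustration, not taken from the slides.
LONDON_BBOX: Tuple[float, float, float, float] = (-0.51, 51.28, 0.34, 51.70)


@dataclass
class Tweet:
    """Hypothetical record holding a few of the fields listed above."""
    user_id: int
    name: str
    text: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def in_bbox(tweet: Tweet, bbox: Tuple[float, float, float, float] = LONDON_BBOX) -> bool:
    """Return True if the tweet carries coordinates that fall inside the box."""
    if tweet.latitude is None or tweet.longitude is None:
        return False  # not geo-tagged
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon <= tweet.longitude <= max_lon
            and min_lat <= tweet.latitude <= max_lat)


def geotagged_in_london(tweets: Iterable[Tweet]) -> List[Tweet]:
    """Keep only geo-tagged tweets located inside the bounding box."""
    return [t for t in tweets if in_bbox(t)]
```

A bounding box also captures some tweets just outside the Greater London boundary, so it is only a first-pass filter.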
Predicting Ethnicity of Twitter Users by using their ‘Names’
Analysing Names on Twitter
• Some examples of NAME variations on Twitter
Real Names
Kevin Hodge
Andre Alves
Jose de Franco
Carolina Thomas, Dr.
Prof. Martha Del Val
Fabíola Sanchez Fernandes
Fake Names
Castor 5.
WHAT IS LOVE?
MysticMind
KIRILL_aka_KID
Vanessa
Petuna
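The distinction between usable and unusable names can be sketched as a simple heuristic: strip common titles, then accept a name only if at least two purely alphabetic tokens remain. This is an illustrative assumption, not the authors' cleaning procedure; the `TITLES` set and the function `forename_surname` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of titles to discard before pairing forename and surname.
TITLES = {"dr", "prof", "mr", "mrs", "ms", "miss"}

# One or more letters (any script), optionally joined by hyphens or apostrophes.
NAME_TOKEN = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def forename_surname(name: str) -> Optional[Tuple[str, str]]:
    """Return (forename, surname) if the name looks like a real personal name."""
    # Split on whitespace and commas, then drop trailing full stops ("Dr." -> "Dr").
    tokens = [t.strip(".") for t in re.split(r"[\s,]+", name.strip())]
    tokens = [t for t in tokens if t and t.lower() not in TITLES]
    if len(tokens) < 2:
        return None  # single-word names such as "Vanessa" cannot form a pair
    if not all(NAME_TOKEN.fullmatch(t) for t in tokens):
        return None  # digits, underscores, or punctuation suggest a made-up name
    # Simplification: first token as forename, last token as surname.
    return tokens[0], tokens[-1]
```

With the examples above, "Carolina Thomas, Dr." yields the pair ("Carolina", "Thomas"), while "Castor 5.", "WHAT IS LOVE?", "MysticMind", and "Vanessa" are rejected. The heuristic is deliberately crude: it truncates compound surnames such as "Del Val" and would accept any phrase made of plain words.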
Classifying Twitter Data to ethnic origins
• Applied ONOMAP (www.onomap.org) on FORENAME + SURNAME pairs
Kevin Hodge (ENGLISH)
Andre de Franco (ITALIAN)
…
…
…
…
[Figure: Twitter users classified by Onomap group. Groups shown: English, Italian, Pakistani, Indian, Turkish, Greek, Bangladeshi, Spanish, German, French, Portuguese, Sikh]
Tweeting Activity by different Ethnic Groups
• We used the Information Theory Index (Theil's H) to compare segregation between different Twitter ethnic groups
H = Σi ti (E − Ei) / (E T)
Where (for each Twitter ethnic group):
E = Greater London's entropy
Ei = entropy of each output area in Greater London
T = population of London
ti = population of each output area in Greater London
• 0 = no segregation; 1 = maximum segregation
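A minimal sketch of how the index could be computed for one group, assuming the standard two-category form of Theil's H (the group versus everyone else) and natural-log entropy. The function names and the input layout, one `(group_count, total_count)` pair per output area, are assumptions for illustration; the slides do not show the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def binary_entropy(p: float) -> float:
    """Entropy of a two-category split where the group's share is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully homogeneous area has zero entropy
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def theil_h(areas: Sequence[Tuple[float, float]]) -> float:
    """Theil's H = sum_i t_i * (E - E_i) / (E * T).

    areas: one (group_count, total_count) pair per output area.
    """
    total = sum(t for _, t in areas)          # T: population of the whole region
    group_total = sum(g for g, _ in areas)
    if total <= 0:
        raise ValueError("total population must be positive")
    region_entropy = binary_entropy(group_total / total)  # E
    if region_entropy == 0.0:
        raise ValueError("index is undefined when the group is absent or universal")
    weighted = sum(
        t * (region_entropy - binary_entropy(g / t))      # t_i * (E - E_i)
        for g, t in areas
        if t > 0
    )
    return weighted / (region_entropy * total)
```

Two areas with identical group shares give 0, and two areas that each contain only one of the two categories give 1, matching the interpretation of the scale.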
Segregation in different ethnic groups of Twitter Users
Ethnic group       Domestic buildings and gardens   Week days   Week nights   Weekend
British            0.483                            0.211       0.401         0.315
Irish              0.67                             0.357       0.571         0.475
White Other        0.63                             0.303       0.51          0.42
Pakistani          0.765                            0.488       0.679         0.633
Indian             0.748                            0.451       0.673         0.59
Bangladeshi        0.864                            0.671       0.834         0.784
Black Caribbean    0.831                            0.548       0.808         0.666
Black African      0.764                            0.492       0.704         0.64
Chinese            0.712                            0.403       0.608         0.524
Other              0.71                             0.374       0.593         0.497
0 = no segregation; 1 = maximum segregation
• Onomap groups were aggregated to match the appropriate groups from the Census
Comparison of Ethnic Groups between ‘2011 Census’ and ‘Twitter’
              London total   White British   White Other   Indian   Pakistani   Bangladeshi   Black African   Chinese
Week Night    53611          71.35%          12.12%        2.63%    2.63%       1.82%         1.52%           1.74%
Week Day      80676          73.12%          11.80%        2.41%    2.41%       1.56%         1.25%           1.61%
Weekend       67351          72.86%          12.17%        2.61%    2.61%       1.67%         1.39%           1.73%
2011 Census   -              44.89%          12.65%        6.64%    2.74%       2.72%         7.02%           1.52%
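One simple way to read the comparison is as a representation ratio: a group's share among Twitter users divided by its share in the 2011 Census. The sketch below uses the Week Day and 2011 Census percentages from the table; the ratio itself and the function name are illustrative assumptions, not a measure reported in the slides.

```python
from typing import Dict, Sequence

GROUPS = [
    "White British", "White Other", "Indian", "Pakistani",
    "Bangladeshi", "Black African", "Chinese",
]

# Percentages copied from the comparison table (Week Day row and 2011 Census row).
TWITTER_WEEK_DAY = [73.12, 11.80, 2.41, 2.41, 1.56, 1.25, 1.61]
CENSUS_2011 = [44.89, 12.65, 6.64, 2.74, 2.72, 7.02, 1.52]


def representation_ratios(twitter: Sequence[float],
                          census: Sequence[float]) -> Dict[str, float]:
    """Twitter share divided by Census share; values above 1 mean over-representation."""
    return {group: t / c for group, t, c in zip(GROUPS, twitter, census)}


if __name__ == "__main__":
    for group, ratio in representation_ratios(TWITTER_WEEK_DAY, CENSUS_2011).items():
        print(f"{group:15s} {ratio:.2f}")
```

On these figures the White British share on Twitter is roughly 1.6 times the Census share, while the Black African share is less than a fifth of it. Differences of this kind reflect the limits of name-based classification as well as differences in who uses Twitter.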
Comparison of the distribution of ethnicity with the 2011 Census
[Figure: White British (quintiles), with panels for the 2011 Census and Twitter]
Gender and Age Analysis of Twitter Users
Gender Analysis of Twitter Users
[Figure: bar chart of the percentage (0% to 60%) of tweets and of unique users classified as Male, Female, Unisex, and Not Found]
Monica: Age estimation from given names
• Original data provided by CACI, consisting of a total of 12,000 names from a sample of almost 7 million individuals
• However, this sample did not account for people under the age of 18
• Birth certificate data from 1994 to 2011 was used to supplement the dataset (total of 9.7 million names)
• Data was then standardised by the age structure from the 2011 Census
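One plausible reading of the standardisation step is sketched below: for each age band, take the rate at which a given name occurs in the sample, weight it by the Census population of that band, and normalise so the result sums to 1. This is an assumption about what "standardised" means here; the function name, the input layout, and the numbers in the usage example are hypothetical and are not the CACI or birth certificate data.

```python
from typing import Dict


def age_distribution(name_counts: Dict[str, float],
                     sample_totals: Dict[str, float],
                     census_population: Dict[str, float]) -> Dict[str, float]:
    """Estimate P(age band | given name), reweighted to the Census age structure.

    name_counts[band]       people in the sample with this name in the band
    sample_totals[band]     all people in the sample in the band
    census_population[band] Census population in the band
    """
    weighted = {}
    for band, census_n in census_population.items():
        sampled = sample_totals.get(band, 0)
        # Rate of the name within the band; zero if the band was not sampled.
        rate = name_counts.get(band, 0) / sampled if sampled else 0.0
        weighted[band] = rate * census_n
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("name not observed in any age band")
    return {band: w / total for band, w in weighted.items()}


if __name__ == "__main__":
    # Made-up numbers purely to show the call.
    dist = age_distribution(
        name_counts={"20-24": 10, "60-64": 10},
        sample_totals={"20-24": 100, "60-64": 1000},
        census_population={"20-24": 500, "60-64": 500},
    )
    print(dist)
```

Working with rates rather than raw counts keeps an over-sampled age band from dominating the estimate, which is the point of tying the result to the Census age structure.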
[Figure: Monica: age estimation from given names. Percentage (0% to 45%) by age group (0-4 to 85+) for the names PAUL, BETTY, GUY, and MUHAMMAD]
Age-Sex structure of Twitter Users and 2011 Census
[Figure: age-sex structure, with panels for Male and Female]
Generalised Land Use Database
GLUD category            Tweets (%)   Tweets per km2
Open Water               1.11         402.71
Domestic Buildings       12.93        1748.52
Non-Domestic Buildings   14.14        3468.55
Road                     29.36        2681.84
Path                     0.84         1204.20
Rail                     2.17         1962.57
Green Space              10.91        303.62
Domestic Gardens         17.69        867.89
Other                    10.86        1637.06
Hourly Twitter Activity by Land Use
[Figure: percentage of tweets (0.0% to 40.0%) by time of day (0:00 to 23:00) for Non-Domestic Buildings, Transport, and Residential land uses]
Conclusion
• An insight into the ethnic, gender, and age distribution of Twitter users
• A first attempt to compare a social media data set with the census of population
• Future work will investigate the micro-level activity patterns of Twitter users at different times of the day
• We also envisage extending this analysis to other social media services, e.g. Foursquare and Facebook