A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

A geodemographic analysis of the ethnicity and identity of Twitter users in Greater London. Muhammad Adnan, Guy Lansley, Paul Longley. Department of Geography, University College London. Web: http://www.uncertaintyofidentity.com

description

This is the talk accompanying my GISRUK 2013 conference paper.

Transcript of A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Page 1: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

A geodemographic analysis of the ethnicity and identity of Twitter users in Greater London

Muhammad Adnan, Guy Lansley, Paul Longley

Department of Geography, University College London

Web: http://www.uncertaintyofidentity.com

Page 2: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Introduction

• Use of social media services has increased

• But how representative are social media data sets of Census or Electoral Roll data?

• This paper provides an ethnicity, age, and gender analysis of Twitter users

• A comparison is provided with the 2011 Census data

• Could have potential applications in cyber marketing and cyber security

Page 3: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Twitter (www.twitter.com)

• Online social-networking and microblogging service

• Launched in 2006

• Users can send messages of 140 characters or less

• Approximately 200 million active users

• 350 million tweets daily

• In 2012, the UK and London were ranked 4th and 3rd, respectively, in terms of the number of posted tweets

Page 4: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Data available through the Twitter API

• User Creation Date

• Followers

• Friends

• User ID

• Language

• Location

• Name

• Screen Name

• Time Zone

• Geo Enabled

• Latitude

• Longitude

• Tweet date and time

• Tweet text

Users can download a 1% sample of the live tweets through the API
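
As a rough illustration of working with these fields, the sketch below models a tweet record and keeps only geo-tagged tweets that fall inside a bounding box. The `Tweet` class, its field names, and the bounding-box coordinates are assumptions made for this example; they are not the Twitter API's schema or the boundary used in the study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Approximate bounding box around Greater London (an assumption for
# illustration; the slides do not give the boundary actually used).
# Order: (min_lon, min_lat, max_lon, max_lat)
LONDON_BBOX: Tuple[float, float, float, float] = (-0.51, 51.28, 0.34, 51.70)


@dataclass
class Tweet:
    """Hypothetical record holding a subset of the fields listed above."""
    user_id: int
    name: str
    text: str
    created_at: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def in_bbox(tweet: Tweet, bbox: Tuple[float, float, float, float] = LONDON_BBOX) -> bool:
    """Return True if the tweet carries coordinates that fall inside bbox."""
    if tweet.latitude is None or tweet.longitude is None:
        return False  # not geo-tagged
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon <= tweet.longitude <= max_lon
            and min_lat <= tweet.latitude <= max_lat)


def geotagged_in_london(tweets: Iterable[Tweet]) -> List[Tweet]:
    """Keep only geo-tagged tweets located inside the London bounding box."""
    return [t for t in tweets if in_bbox(t)]
```

A bounding box is only a coarse first filter; clipping to the actual Greater London boundary would need a point-in-polygon test against boundary data.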

Page 5: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

4 million geo-tagged tweets downloaded during August and December, 2012

Page 6: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

4 million geo-tagged tweets downloaded during August and December, 2012

Page 7: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Predicting Ethnicity of Twitter Users by using their ‘Names’

Page 8: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Analysing Names on Twitter

• Some examples of NAME variations on Twitter

Real Names

Kevin Hodge

Andre Alves

Jose de Franco

Carolina Thomas, Dr.

Prof. Martha Del Val

Fabíola Sanchez Fernandes

Fake Names

Castor 5.

WHAT IS LOVE?

MysticMind

KIRILL_aka_KID

Vanessa

Petuna
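
One way to separate usable names from the rest is a simple filter that strips titles and requires at least a forename and a surname. The sketch below is a crude heuristic written for this example, not the cleaning procedure used in the study; the `TITLES` list and the function name `forename_surname` are assumptions.

```python
import re
from typing import Optional, Tuple

# Titles to drop before splitting a name (illustrative list only).
TITLES = {"dr", "prof", "mr", "mrs", "ms", "miss"}

# A single allowed character: any Unicode letter, whitespace,
# or one of . , ' -  (digits, underscores and other symbols are rejected).
ALLOWED_CHAR = re.compile(r"[^\W\d_]|[\s.,'\-]")


def forename_surname(display_name: str) -> Optional[Tuple[str, str]]:
    """Return (forename, surname) if the name looks like a personal name.

    Returns None for names with digits or symbols, and for names that
    have fewer than two tokens once titles are removed.
    """
    if not all(ALLOWED_CHAR.fullmatch(ch) for ch in display_name):
        return None
    tokens = [tok.strip(".,") for tok in display_name.split()]
    tokens = [tok for tok in tokens if tok and tok.lower() not in TITLES]
    if len(tokens) < 2:
        return None
    # First token is the forename; everything after it is the surname.
    return tokens[0], " ".join(tokens[1:])
```

With the examples above, "Prof. Martha Del Val" yields ("Martha", "Del Val"), while "Castor 5.", "WHAT IS LOVE?", "KIRILL_aka_KID" and single-word names such as "Vanessa" are rejected. A phrase made only of letters would still pass, so a filter like this can only be a first step before a name classifier.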

Page 9: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Classifying Twitter Data to ethnic origins

• Applied ONOMAP (www.onomap.org) on FORENAME + SURNAME pairs

Kevin Hodge (ENGLISH)

Andre de Franco (ITALIAN)

Page 10: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Tweeting Activity by different Ethnic Groups

[Figure panels: English, Italian, Pakistani, Indian, Turkish, Greek, Bangladeshi, Spanish, German, French, Portuguese, Sikh]

Page 11: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

• We used the Information Theory Index (Theil’s H) to compare segregation between different Twitter ethnic groups

H = Σi [ ti (E − Ei) ] / (E T)

Where (for each Twitter ethnic group):

E = Greater London’s entropy

Ei = entropy of each output area in Greater London

T = population of London

ti = population of each output area in Greater London

• 0 = no segregation; 1 = maximum segregation
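
A minimal sketch of the index, using the symbols defined above. It assumes that entropy is computed for a two-way split (the ethnic group in question versus everyone else) and uses natural logarithms; the slides do not state either choice, and the function names are invented for this example.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy of a two-way split where p is the share of the group."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a unit containing only one side has zero entropy
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def theil_h(group_counts: Sequence[float], totals: Sequence[float]) -> float:
    """Theil's H = sum_i t_i * (E - E_i) / (E * T).

    group_counts[i]: members of the group in output area i
    totals[i]:       total population (t_i) of output area i
    """
    T = sum(totals)
    E = binary_entropy(sum(group_counts) / T)
    if E == 0.0:
        raise ValueError("H is undefined when the overall group share is 0 or 1")
    weighted = 0.0
    for g, t in zip(group_counts, totals):
        if t == 0:
            continue  # empty areas contribute nothing
        weighted += t * (E - binary_entropy(g / t))
    return weighted / (E * T)
```

For two equally sized areas, `theil_h([10, 10], [20, 20])` gives 0 because each area mirrors the overall mix, while `theil_h([20, 0], [20, 20])` gives 1 because the group is confined to one area.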

Segregation in different ethnic groups of Twitter Users

Page 12: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Segregation in different ethnic groups of Twitter Users

Ethnic Groups     Domestic buildings and gardens   Week Days   Week Nights   Weekend
British           0.483                            0.211       0.401         0.315
Irish             0.67                             0.357       0.571         0.475
White Other       0.63                             0.303       0.51          0.42
Pakistani         0.765                            0.488       0.679         0.633
Indian            0.748                            0.451       0.673         0.59
Bangladeshi       0.864                            0.671       0.834         0.784
Black Caribbean   0.831                            0.548       0.808         0.666
Black African     0.764                            0.492       0.704         0.64
Chinese           0.712                            0.403       0.608         0.524
Other             0.71                             0.374       0.593         0.497

0 = no segregation; 1 = maximum segregation

Page 13: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

• Onomap groups were aggregated to match the appropriate groups from the Census

Comparison of Ethnic Groups between ‘2011 Census’ and ‘Twitter’

              London Total   White British   White other   Indian   Pakistani   Bangladeshi   Black African   Chinese
Week Night    53611          71.35%          12.12%        2.63%    2.63%       1.82%         1.52%           1.74%
Week Day      80676          73.12%          11.80%        2.41%    2.41%       1.56%         1.25%           1.61%
Weekend       67351          72.86%          12.17%        2.61%    2.61%       1.67%         1.39%           1.73%
2011 Census   -              44.89%          12.65%        6.64%    2.74%       2.72%         7.02%           1.52%
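
To make the gap between the two sources easier to read, the sketch below computes, for each group, the percentage-point difference and the ratio between the Week Day Twitter share and the 2011 Census share. The figures are transcribed from the table above; the variable and function names are invented for this example.

```python
from typing import Dict, List, Tuple

GROUPS: List[str] = [
    "White British", "White other", "Indian", "Pakistani",
    "Bangladeshi", "Black African", "Chinese",
]

# Shares in percent, transcribed from the comparison table.
TWITTER_WEEK_DAY: List[float] = [73.12, 11.80, 2.41, 2.41, 1.56, 1.25, 1.61]
CENSUS_2011: List[float] = [44.89, 12.65, 6.64, 2.74, 2.72, 7.02, 1.52]


def compare(twitter: List[float], census: List[float]) -> Dict[str, Tuple[float, float]]:
    """Map each group to (percentage-point difference, Twitter/Census ratio)."""
    return {
        group: (round(t - c, 2), round(t / c, 2))
        for group, t, c in zip(GROUPS, twitter, census)
    }


if __name__ == "__main__":
    for group, (diff, ratio) in compare(TWITTER_WEEK_DAY, CENSUS_2011).items():
        print(f"{group:15s} {diff:+7.2f} pp   x{ratio:.2f}")
```

A ratio above 1 means the group's share among Twitter users exceeds its Census share, and a ratio below 1 means it falls short of it.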

Page 14: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Comparison of the distribution of ethnicity with the 2011 Census

[Figure panels: 2011 Census, Twitter; variable shown: White British (quintiles)]

Page 15: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Gender and Age Analysis of Twitter Users

Page 16: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Gender Analysis of Twitter Users

[Figure: percentage (0% to 60%) by category (Male, Female, Unisex, Not Found); series: Number of Tweets, Number of Unique Users]

Page 17: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Monica: Age estimation from given names

• Original data provided by CACI, consisting of a total of 12,000 names from a sample of almost 7 million individuals

• However, this sample did not account for people under the age of 18

• Birth certificate data from 1994 to 2011 was used to supplement the dataset (total of 9.7 million names)

• Data was then standardised by the age structure from the 2011 Census
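
The slides do not spell out how the standardisation is carried out. One plausible reading is a reweighting step: counts of a name in each age band are scaled so that the sample's overall age structure matches the Census, then normalised into an age distribution for that name. The sketch below follows that reading; it is an assumption, not Monica's actual procedure, and the age bands and function name are hypothetical.

```python
from typing import Dict


def standardised_age_distribution(
    name_counts: Dict[str, float],
    sample_age_totals: Dict[str, float],
    census_age_shares: Dict[str, float],
) -> Dict[str, float]:
    """Estimate P(age band | name) after reweighting to the census age structure.

    name_counts:        people with this name per age band in the sample
    sample_age_totals:  all people per age band in the sample
    census_age_shares:  share of the population per age band in the census
    """
    sample_total = sum(sample_age_totals.values())
    weighted: Dict[str, float] = {}
    for band, count in name_counts.items():
        sample_share = sample_age_totals[band] / sample_total
        # Bands under-represented in the sample get a weight above 1.
        weight = census_age_shares[band] / sample_share
        weighted[band] = count * weight
    total = sum(weighted.values())
    return {band: value / total for band, value in weighted.items()}
```

For example, if a band makes up 10% of the sample but 20% of the Census population, counts in that band are doubled before the name's age distribution is normalised.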

Page 18: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Monica: Age estimation from given names

[Figure: percent (0% to 45%) by age group (0-4 to 85+); series: PAUL, BETTY, GUY, MUHAMMAD]

Page 19: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Age-Sex structure of Twitter Users and 2011 Census

[Figure panels: Male, Female]

Page 20: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Generalised Land Use Database

GLUD category            Tweets (%)   Tweets per km2
Open Water               1.11         402.71
Domestic Buildings       12.93        1748.52
Non-Domestic Buildings   14.14        3468.55
Road                     29.36        2681.84
Path                     0.84         1204.20
Rail                     2.17         1962.57
Green Space              10.91        303.62
Domestic Gardens         17.69        867.89
Other                    10.86        1637.06
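
The two columns measure different things: the share of all tweets falling in a category, and the number of tweets per unit of that category's area. The sketch below shows how both could be derived from per-category tweet counts and areas. The function name is hypothetical and the inputs in the usage note are made up for illustration; they are not the study's data.

```python
from typing import Dict, Tuple


def land_use_summary(
    tweet_counts: Dict[str, int],
    areas_km2: Dict[str, float],
) -> Dict[str, Tuple[float, float]]:
    """Map each land-use category to (share of tweets in %, tweets per km2)."""
    total = sum(tweet_counts.values())
    return {
        category: (100.0 * count / total, count / areas_km2[category])
        for category, count in tweet_counts.items()
    }
```

With made-up inputs of 300 tweets on 10 km2 of road and 100 tweets on 50 km2 of green space, road accounts for 75% of tweets at 30 per km2, and green space for 25% at 2 per km2. This is why a category with a large area can have a sizeable share of tweets but a low density.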

Page 21: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Hourly Twitter Activity by Land Use

[Figure: percentage of tweets (0.0% to 40.0%) by time (0:00 to 23:00); series: Non-Domestic Buildings, Transport, Residential]

Page 22: A Geodemographic Analysis of Ethnicity and Identity of Twitter Users in Greater London

Conclusion

• An insight into the ethnic, gender, and age distribution of the Twitter users

• A first attempt to compare any social media data set with the census of population

• Future work will involve the investigation of micro-level activity patterns of Twitter users during different times of the day

• We also envisage extending this analysis to other social media services, e.g. Foursquare and Facebook