Bipin Shetty Santosh Kalyankrishnan. Project Thesis In this project, we have analyzed information...

Social Networking sites and Indian caste system Bipin Shetty Santosh Kalyankrishnan


Page 1: Bipin Shetty Santosh Kalyankrishnan. Project Thesis In this project, we have analyzed information gathered from social networks to understand the nature.

Social Networking sites and Indian caste system

Bipin Shetty, Santosh Kalyankrishnan

Page 2:

Project Thesis

In this project, we analyze information gathered from social networks to understand the nature of any bias in how users choose their friends.

We look at how Orkut users form friend links to determine whether, and to what extent, they prefer friends of the same caste or language. We have also calculated this bias for individual cities.

We were given a large amount of Orkut data (e.g. names and friend links), which we mine for various metrics. From these metrics, we hope to draw conclusions about the degree of bias that exists.

Page 3:

Milestones completed

• We gathered data identifying caste names, languages and their associated last names. We were able to identify 616 frequently occurring last names, along with the caste, religion and language associated with each.
• We have stored this information in XML format using the tags <casteName>, <lastname>, <religion>, <parentCaste> and <language>.
• We compared the last names in our data against this listing and identified the caste, language and parentCaste of each individual using MySQL scripts.
• We will then insert this data into a table that records each user's profile, caste name, language and location.
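The last-name lookup described above can be sketched in Python. This is only an illustration, not the MySQL scripts used in the project: the `<castes>` root and `<entry>` wrapper elements, the placeholder record, and the function names `load_lookup` and `classify` are all assumptions. Only the five field tags come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in an assumed layout; all values are placeholders.
SAMPLE_XML = """
<castes>
  <entry>
    <casteName>CasteA</casteName>
    <lastname>Examplename</lastname>
    <religion>ReligionX</religion>
    <parentCaste>ParentA</parentCaste>
    <language>LanguageX</language>
  </entry>
</castes>
"""

FIELDS = ("casteName", "religion", "parentCaste", "language")


def load_lookup(xml_text):
    """Build a dict keyed by lower-cased last name."""
    lookup = {}
    for entry in ET.fromstring(xml_text).iter("entry"):
        last_name = entry.findtext("lastname", default="").strip().lower()
        if last_name:
            lookup[last_name] = {
                f: entry.findtext(f, default="").strip() for f in FIELDS
            }
    return lookup


def classify(full_name, lookup):
    """Return the record for the final token of a profile name, or None."""
    tokens = full_name.split()
    if len(tokens) < 2:  # profile has no last name
        return None
    return lookup.get(tokens[-1].lower())
```

Profiles with a single-token name, or with a last name missing from the listing, come back as `None`, so they can be counted separately from the identified profiles.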

Page 4:

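Milestones completed

We were able to identify the links within a caste, language or parentCaste (intra-caste, intra-language, intra-parentCaste) and the links that cross them (inter-caste, inter-language, inter-parentCaste).

Calculation of modularity: we used the formula

$$Q = \sum_i \left( e_{ii} - a_i^2 \right)$$

where $e_{ii}$ is the fraction of edges with both ends in community $i$ and $a_i$ is the fraction of edge ends attached to community $i$.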

Modularity is then a measure of the fraction of intra-community edges minus the expected value of the same quantity in a network with the same community divisions, but with edges placed without regard to communities.

Modularity ranges from -1/2 to 1, with 0 representing no more community structure than would be expected in a random graph, and significantly positive values representing strong community structure.
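A minimal Python sketch of the modularity formula above, treating each caste, language or parentCaste value as a community. The function name, the input shapes, and the choice to skip links with an unlabelled endpoint are assumptions for illustration; the slides do not say how such links were handled.

```python
from collections import defaultdict


def modularity(edges, group_of):
    """Q = sum_i (e_ii - a_i^2) over undirected friend links.

    edges: iterable of (u, v) pairs, each friendship listed once.
    group_of: dict mapping a profile to its group label.
    Links with an unlabelled endpoint are skipped (an assumption).
    """
    intra = defaultdict(int)  # links with both ends in group i
    ends = defaultdict(int)   # link ends attached to group i
    m = 0                     # number of links counted
    for u, v in edges:
        gu, gv = group_of.get(u), group_of.get(v)
        if gu is None or gv is None:
            continue
        m += 1
        ends[gu] += 1
        ends[gv] += 1
        if gu == gv:
            intra[gu] += 1
    if m == 0:
        return 0.0
    # e_ii = intra[g] / m, a_i = ends[g] / (2 * m)
    return sum(intra[g] / m - (ends[g] / (2 * m)) ** 2 for g in ends)


# Two triangles of same-group friends joined by one cross-group link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
group_of = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}
q = modularity(edges, group_of)  # 5/14, roughly 0.357
```

In the toy graph each group holds 3 of the 7 links and half of the link ends, so Q = 2 × (3/7 − 1/4) = 5/14. If every profile carried the same label, Q would be 0.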

Page 5:

Accomplishment

We were able to identify the caste, language and parentCaste of about 25% of profiles.

We calculated the bias on caste, language and parentCaste using the modularity measure above:

               All    Bombay   Hyderabad
Sub-Caste      0.24   0.14     0.26
Parent Caste   0.55   0.36     0.55
Language       0.43   0.32     0.43
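The per-city figures require restricting the friend links to one city at a time. The slides do not say how a link was assigned to a city, so the sketch below assumes a link belongs to a city only when both profiles list that city. All names here are hypothetical, and `intra_fraction` is a simple stand-in score used to keep the example self-contained; it is not the modularity measure behind the reported numbers.

```python
from collections import defaultdict


def edges_by_city(edges, city_of):
    """Group friend links by city, keeping only links whose two
    endpoints list the same city (an assumed definition)."""
    per_city = defaultdict(list)
    for u, v in edges:
        cu, cv = city_of.get(u), city_of.get(v)
        if cu is not None and cu == cv:
            per_city[cu].append((u, v))
    return dict(per_city)


def score_by_city(edges, group_of, city_of, score):
    """Apply score(edges, group_of) to each city's links separately."""
    return {
        city: score(city_edges, group_of)
        for city, city_edges in edges_by_city(edges, city_of).items()
    }


def intra_fraction(edges, group_of):
    """Stand-in score: share of links whose endpoints share a group."""
    edges = list(edges)
    same = sum(1 for u, v in edges if group_of[u] == group_of[v])
    return same / len(edges) if edges else 0.0
```

Passing the scoring function as a parameter lets the same city split be reused for sub-caste, parent caste and language by swapping `group_of`, and for any bias measure by swapping `score`.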

Page 6:

Interpretation and Conclusion

We find a strong bias towards parent caste in making friends on the Orkut social network. We attribute this to the fact that only two major castes account for most occurrences.

We can conclude that language is a significant criterion for making friends on Orkut.

We also find a strong bias in making friends with respect to sub-caste.

Our findings also point to a stronger bias in caste and language in non-cosmopolitan cities like Hyderabad than in metropolitan, multilingual cities like Bombay.

Page 7:

Next Milestone

Calculate the bias on the three parameters for a few more cities to understand the distribution.

Alan has also suggested running an algorithm to find strong community structure in our data. We would then calculate the bias within that community structure.

Page 8:

Tradeoffs and bottlenecks

Many Orkut user names were not crawled, so we will not be able to identify caste properly for those profiles.

Some Orkut users don’t have a last name, and the last names of many others don’t map to a caste.

Page 9:

Any Questions?