Social Networking sites and Indian caste system
Bipin Shetty, Santosh Kalyankrishnan
Project Thesis
In this project, we have analyzed information gathered from social networks to understand the nature of the bias, if any.
We look at preferences in forming friend links among Orkut users to determine whether, and to what extent, there is a preference with respect to caste and language. We have also calculated the bias for individual cities on the same criteria.
We were provided with a large amount of Orkut data (e.g. names and friend links), which we use to mine information and metrics. Based on these metrics, we hope to draw conclusions about the degree of bias that exists.
Milestones completed
• We did a lot of data gathering to identify caste names, languages and their associated last names.
• We were able to identify 616 frequently occurring last names, along with their associated caste, religion and language.
• We have stored this information in XML format using the tags <casteName>, <lastname>, <religion>, <parentCaste>, <language>.
• We have processed the last names provided in our data, compared them with our last-name listing, and identified the caste, language and parentCaste of each individual using MySQL scripts.
• We will then insert this data into a table that identifies the user profile, caste name, language and location.
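The lookup step described above can be sketched in Python as follows. This is only an illustration, not the MySQL scripts used in the project: the `<castes>` and `<entry>` wrapper elements, the function names `load_lookup` and `label_profile`, and every value in the sample file are hypothetical placeholders. Only the five inner tag names come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup file. The <castes>/<entry> wrappers and all values are
# placeholders; only the five inner tag names are taken from the slides.
LOOKUP_XML = """
<castes>
  <entry>
    <casteName>CasteX</casteName>
    <lastname>NameA</lastname>
    <religion>ReligionP</religion>
    <parentCaste>ParentM</parentCaste>
    <language>LanguageU</language>
  </entry>
</castes>
"""

FIELDS = ("casteName", "religion", "parentCaste", "language")


def load_lookup(xml_text):
    """Map each lower-cased last name to a dict of its attributes."""
    lookup = {}
    for entry in ET.fromstring(xml_text.strip()).iter("entry"):
        last = entry.findtext("lastname", default="").strip().lower()
        if last:
            lookup[last] = {f: entry.findtext(f, default="").strip()
                            for f in FIELDS}
    return lookup


def label_profile(full_name, lookup):
    """Return the attributes for the name's last token, or None if unmapped."""
    parts = full_name.split()
    if len(parts) < 2:  # a single-token name carries no last name
        return None
    return lookup.get(parts[-1].lower())


lookup = load_lookup(LOOKUP_XML)
print(label_profile("Someone NameA", lookup))
```

Profiles for which `label_profile` returns `None` (no last name, or a last name missing from the listing) are simply left unlabeled.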
Milestones completed
• We were able to identify the links within a caste, language or parentCaste (intra-caste, intra-language, intra-parentCaste) and the links across them (inter-caste, inter-language, inter-parentCaste).
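A minimal sketch of this intra/inter classification, assuming the friend links are available as pairs of user IDs and each labeled user maps to a single group (caste, language or parentCaste). The function name `count_links` and the sample IDs and labels are hypothetical.

```python
def count_links(edges, group_of):
    """Count friend links within a group, across groups, and unlabeled.

    edges:    iterable of (user_a, user_b) pairs
    group_of: dict mapping user -> group label; users missing from it
              are treated as unlabeled and their links are skipped
    """
    intra = inter = skipped = 0
    for a, b in edges:
        ga, gb = group_of.get(a), group_of.get(b)
        if ga is None or gb is None:
            skipped += 1      # at least one endpoint has no label
        elif ga == gb:
            intra += 1        # both endpoints share the group
        else:
            inter += 1        # endpoints belong to different groups
    return intra, inter, skipped


# Placeholder data: u5 has no label, so the u4-u5 link is skipped.
edges = [("u1", "u2"), ("u1", "u3"), ("u3", "u4"), ("u4", "u5")]
group_of = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
print(count_links(edges, group_of))  # (2, 1, 1)
```

The same function can be run once per criterion by passing a different `group_of` mapping.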
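Calculation of modularity: we have used the formula $Q = \sum_i \left( e_{ii} - a_i^2 \right)$, where $e_{ii}$ is the fraction of edges with both ends in community $i$ and $a_i$ is the fraction of edge ends attached to community $i$.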
Modularity is then a measure of the fraction of intra-community edges minus the expected value of the same quantity in a network with the same community divisions, but with edges placed without regard to communities.
Modularity is therefore at most 1 and cannot fall below -1/2, with 0 representing no more community structure than would be expected in a random graph, and significantly positive values representing the presence of strong community structure.
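As a rough illustration of the modularity formula above, the following sketch computes Q for an undirected graph given as an edge list, where each node has exactly one group label. The function name `modularity` and the toy graph are assumptions made for the example, not the project's actual code or data.

```python
from collections import defaultdict


def modularity(edges, group_of):
    """Q = sum over groups i of (e_ii - a_i^2) for an undirected graph.

    e_ii: fraction of edges with both ends in group i
    a_i:  fraction of edge ends attached to group i
    """
    m = 0
    inside = defaultdict(int)  # edges with both ends in the group
    ends = defaultdict(int)    # edge ends attached to the group
    for a, b in edges:
        ga, gb = group_of[a], group_of[b]
        m += 1
        ends[ga] += 1
        ends[gb] += 1
        if ga == gb:
            inside[ga] += 1
    if m == 0:
        return 0.0
    return sum(inside[g] / m - (ends[g] / (2 * m)) ** 2 for g in ends)


# Toy graph: two triangles joined by a single edge, one group per triangle.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
group_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(modularity(edges, group_of))  # 5/14, about 0.357
```

If every node is placed in one group, the same function returns 0, which matches the "no more structure than random" reading of Q = 0.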
Accomplishment
• We were able to identify the caste/language/parentCaste of about 25% of the profiles.
• We calculated the bias on caste, language and parentCaste using the modularity formula above.

              All    Bombay  Hyderabad
Sub-Caste     0.24   0.14    0.26
Parent Caste  0.55   0.36    0.55
Language      0.43   0.32    0.43
Interpretation and Conclusion
• We find a strong bias towards parent caste in making friends on the Orkut social network. We attribute this to the fact that only two major castes account for most occurrences.
• We can conclude that language is a significant criterion for making friends on Orkut.
• We also find a strong bias in making friends with respect to sub-caste.
• Our findings also point to a stronger bias in caste and language in less cosmopolitan cities like Hyderabad than in metropolitan, multilingual cities like Bombay.
Next Milestone
• Calculate the bias on the three parameters for a few more cities to understand the distribution.
• Alan has also suggested running an algorithm to find strong community structure in our data. We would then calculate the bias within that community structure.
Tradeoffs and bottlenecks
• Many Orkut user names were not crawled, so we will not be able to properly identify their caste.
• Some Orkut users don’t have a last name, and for many others the last name doesn’t map to a caste.
Any Questions?