Master's thesis defense talk

Posted on 19-Jun-2015


+

Towards Web 3.0: Harnessing Collective Intelligence of Humans for Knowledge Acquisition and Web Accessibility

Presenter: Deepti Aggarwal

Advisors: Prof. Venkatesh Choppella and Prof. Vasudeva Varma

Reviewers: Dr. Raghu Reddy and Dr. Priyanka Srivastava

1

+Evolution of the Web

Web 1.0 (Web as an information portal): focus on ownership; static web pages; meaning is dictated.

Web 2.0 (Web as a social platform): focus on community; user generated content; meaning is socially constructed.

Web 3.0 (Web as a personalized portable web): focus on individual; semantic and portable pages; meaning is socially constructed and contextually reinvented.

Timeline: 1990, 2003, 2010, 2020

2

The World Wide Web is a large collection of interconnected documents.

+Getting to Web 3.0: Major hurdles

1. Scattered data: Where should I look for the data?

2. Excess of data: Which data is best for me?

3. Understanding data: How can I understand the available data?

3

+Getting to Web 3.0: Through context and semantics

- The web where every piece of data owns its semantics, and the context of the content is defined by the data.

- The web which is capable of reading and understanding the user context.

- CONTEXT refers to why the content is relevant and to whom.

- SEMANTICS refers to the meaning of data and how it is relevant to a given context.

4

+Web 3.0: Possible applications

- Personalized web: content and advertising that match user preferences and choices.

- Data on demand: no need for browsing when all databases are semantically connected to each other.

- Multi-lingual web: easy access to sources available in varied languages.

Knowledge acquisition (Extraction and Validation)

Accessibility (Re-narration)

5

+Getting to Web 3.0: Methodology & contributions

Methodology: (Human Computer Interaction)

- Research through design (Zimmerman 2007).

- Prototyping – User studies – Analysis – Discussions.

Contributions:

- Three prototypes and their studies.

- Power of Friends, uPick: extracting and validating information.

- Alipi: making the web more accessible through re-narration.

6

+

Exploration 1: Power of Friends, an online friendsourcing game.

Problem: Extract and validate information

7

+Problem: Extracting & Validating Community related Information

Friends on social networks possess a variety of information about each other.

Applications: to personalize one’s browsing and targeted advertising.

Issues: information is scattered, and no one is an expert.

8

+Existing approaches

Task: Extract information about a person X.

Approach 1: Ask X. (21 questions)

Approach 2: Ask X’s friends. (Bernstein et al. 2008)

Problem: both approaches involve the social awkwardness of revealing the truth.

9

+Motivation

Looking-Glass Self Theory: get the opinion of friends

Cultural Consensus Theory: ask everyone

Secure Multi-party Computation: ensure privacy

Power of Ten: make it fun

10

+Our approach: Crowd Consensus

Our approach: Ask X’s friend to guess the opinion of X’s other friends.

Benefits: Tackles social awkwardness in an engaging and fun way.
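The slide does not spell out how the guesses are combined. A minimal sketch, assuming each friend submits one guess per question and the most frequent guess is taken as the crowd's answer; the function name, the sample question and the guesses are hypothetical and are not taken from the game itself.

```python
from collections import Counter

def crowd_consensus(guesses):
    """Return the most common guess and the share of players who gave it."""
    if not guesses:
        raise ValueError("need at least one guess")
    counts = Counter(guesses)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(guesses)

# Hypothetical question about X: "Does X like cooking?"
# Each entry is one friend's guess of what X's other friends would say.
guesses = ["yes", "no", "yes", "yes", "no"]
answer, share = crowd_consensus(guesses)
print(answer, share)
```

Because only the aggregate is reported, no individual guess has to be revealed, which is one way the privacy goal from the motivation slide could be met.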

11


+Power of Friends: Our Proposed game

A single-player, asynchronous social game.

12

+User study of Power of Friends

- Seven communities, 67 participants (40 female).

- Questions related to community members: 10 in each game play.

- Questions related to the likes, hobbies and daily activities of community members.

- Task: play the game online.

- Four sessions: demographic information and questions about bonding, game demonstration, game play and interview.

13

+Results of the study

Community Id | Number of questions correctly identified
C1 | 6/10
C2 | 8/10
C3 | 5/10
C4 | 7/10
C5 | 6/10
C6 | 8/10
C7 | 7/10

Communities C2 and C6 were the most accurate (8/10 each).

The performance of a community correlated with the bonding level among its members.

14

+Study Findings

- It is challenging: “It requires a lot of thinking. I wish I knew my coworkers better”.

- It creates a social impact: “It is not possible that my friend … knows cooking, I think she hates it. I have to ask her.”

- It explores social awkwardness of answering a given question: “It is a cool way of giving my answer ... No one knows my answer except me.”

15

+Study Findings (contd.)

- It creates a sense of connectedness among people: “Its kind of fun to see how accurately my thinking aligns with my friends.”

- 25% of the participants got confused while playing and needed help recalling the game strategy.

- 30% recommended a multi-player setting, 10% a time-based challenge, and 60% publishing the game on Facebook.

16

+Design Themes

- Identify the level of bonding among friends, as it impacts their performance in the game.

- Include questions about every group member.

- Select the questions carefully, keeping the interests of the members in mind.

- Allow participants to generate questions.

17

+Discussions and Future Work

- Exploring an indirect mode of interaction for larger communities. (IRB approved)

- A comparative study between direct and indirect modes of answering questions is planned.

- Publishing the game on Facebook. (Social media interaction)

18

Personalized web: content and advertising that match user preferences and choices.

+

Exploration 2: uPick, a crowdsourcing system for extracting Named Entities.

Problem: Extracting and validating information

19

+Problem scenario: Acquiring accurate and up-to-date information about Sachin from various web sources.

20

+Problem: Extracting useful data on demand

21

+Difficulty in Processing English language

“You see sir, I can talk English, I can walk English, I can laugh English, I can run English, because English is such a funny language.” (Amitabh Bachchan in the movie Namak Halal)

22

+Other Problems

Sachin Tendulkar was born in Bombay. He studied in Sharadashram...

Sachin Tendulkar was born in Bombay. Master Blaster is …

Sachin remembered his father last night … He said he loved poems.

Sachin Tendulkar was born in Bombay. Tendlya is …

Problem types: co-reference, ambiguity, acronyms, abbreviations.

23

+Constructs of a sentence: Named Entity and relations

- A named entity (NE) is an atomic element in a body of text.

- Types: person, organization, location etc.

- Different named entities, when linked together, form a relation.

Example: Sachin Tendulkar was born in Bombay

- Subject: “Sachin Tendulkar”, NE of type ‘Person’

- Relation: “was born in”, NE of type ‘Verb’

- Object: “Bombay”, NE of type ‘Location’

24

+Extracting relationships among NEs: Required process

- Identify part of speech constructs: noun, verb, adjective etc.

- Determine co-references, abbreviations and acronyms.

- Connect them together to form a relationship.
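The three steps above can be illustrated with a toy sketch. It assumes the tokens are already hand-tagged with Penn Treebank style tags (so no tagger is called), uses a small hypothetical alias table in place of real co-reference resolution, and links a named entity, a verb phrase and a second named entity into one relation. None of the names below come from the thesis.

```python
# Hand-tagged tokens (Penn Treebank style tags); a real system would get these from a POS tagger.
TAGGED = [("Master", "NNP"), ("Blaster", "NNP"), ("was", "VBD"),
          ("born", "VBN"), ("in", "IN"), ("Bombay", "NNP")]

# Hypothetical alias table standing in for co-reference and nickname resolution.
ALIASES = {"Master Blaster": "Sachin Tendulkar", "Tendlya": "Sachin Tendulkar"}

def chunk(tagged):
    """Merge consecutive tokens of the same kind: 'NE' (proper nouns) or 'REL' (verbs, prepositions)."""
    chunks = []
    for word, tag in tagged:
        if tag.startswith("NNP"):
            kind = "NE"
        elif tag.startswith("VB") or tag == "IN":
            kind = "REL"
        else:
            kind = "OTHER"
        if chunks and chunks[-1][1] == kind:
            chunks[-1] = (chunks[-1][0] + " " + word, kind)
        else:
            chunks.append((word, kind))
    return chunks

def extract_relations(tagged):
    """Return (subject, relation, object) triples for every NE-REL-NE sequence."""
    chunks = chunk(tagged)
    relations = []
    for i in range(len(chunks) - 2):
        (subj, k1), (rel, k2), (obj, k3) = chunks[i:i + 3]
        if (k1, k2, k3) == ("NE", "REL", "NE"):
            relations.append((ALIASES.get(subj, subj), rel, ALIASES.get(obj, obj)))
    return relations

relations = extract_relations(TAGGED)
print(relations)
```

A pattern this simple produces many wrong candidates on real text, which is the gap the crowd is later asked to close.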

25

+Existing approach: Automated techniques

- Natural Language Processing based: rule based.

- Machine Learning based: supervised and unsupervised learning.

- Other methods: Vocabulary based.

- Hybrid: NLP and vocabulary based.

- Issues: Dependency, Scalability.

26

+uPick: Our Proposed System

A crowdsourcing system to extract named-entity relationships from documents.

27

+uPick Working

- Step 1: Extract NEs and relations using a POS tagger and the relation extraction rules proposed by Chen (1983).

- Step 2: Present the extracted relations to a crowd in the form of a game (challenge).

- Step 3: Collect the generated responses.

- Step 4: Filter the relations by collecting the majority votes and comparing them against the expert-filtered relations.

28

+Processing of the generated data

- With the help of human experts, we collected valid relations for each document from the automatically generated relations (step 1). These relations form a ground truth dataset for further validation.

- We compare the collected responses from each game against the expert-corrected facts stored in the database and filter out erroneous response data.

- The relation instances receiving a majority are taken as true facts corresponding to the document.
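The majority step can be sketched as follows. This is an illustration only: it assumes each player's response is the list of relations they marked as valid, treats "majority" as more than half of the players, and uses made-up candidate relations. How uPick actually stores and filters responses is not shown on the slides.

```python
from collections import Counter

def majority_vote(responses):
    """Keep relations that more than half of the players selected as valid."""
    votes = Counter(rel for selected in responses for rel in set(selected))
    threshold = len(responses) / 2
    return {rel for rel, n in votes.items() if n > threshold}

# Hypothetical candidate relations produced by the automated step.
r1 = ("Sachin Tendulkar", "was born in", "Bombay")
r2 = ("Sachin Tendulkar", "studied in", "Sharadashram")
r3 = ("Bombay", "was born in", "Sachin Tendulkar")  # deliberately invalid candidate

# One list per player: the relations that player marked as valid.
responses = [[r1, r2], [r1], [r1, r2, r3]]

# Hypothetical expert ground truth for the same document.
expert_valid = {r1, r2}

accepted = majority_vote(responses)
agreed = accepted & expert_valid    # crowd and experts agree
disputed = accepted - expert_valid  # crowd accepted, experts did not
print(len(accepted), len(agreed), len(disputed))
```

Comparing `accepted` with the expert set is what yields per-document counts of correctly and incorrectly identified relations.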

29

+User study of uPick

- Supervised laboratory study, 12 participants (8 female).

- Three sessions: training, game play and interview.

- Four documents: Ashoka Maurya, Sachin Tendulkar, Shahrukh Khan, and Sonia Gandhi.

- Procedure: Read the given text and select the relations from the given list.

30

+Study Results

Measure | D1 | D2 | D3 | D4
Total number of presented relations | 37 | 39 | 40 | 33
Correctly identified valid relations | 19 | 18 | 19 | 15
Incorrectly identified valid relations as invalid | 5 | 6 | 4 | 1
Correctly identified invalid relations | 12 | 12 | 16 | 15
Incorrectly identified invalid relations as valid | 1 | 3 | 1 | 2
Accuracy (Correctly identified relations / total relations) | 84% | 77% | 87% | 91%
Accuracy using automated techniques only (Valid relations / total relations) | 65% | 61% | 57% | 49%
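The two accuracy rows follow from the counts by the formulas given in their labels. A short sketch that recomputes them; the counts are copied from the results, while the dictionary keys and function names are hypothetical.

```python
# Counts copied from the study results; key names are illustrative only.
counts = {
    "D1": {"total": 37, "valid_kept": 19, "valid_rejected": 5, "invalid_rejected": 12, "invalid_kept": 1},
    "D2": {"total": 39, "valid_kept": 18, "valid_rejected": 6, "invalid_rejected": 12, "invalid_kept": 3},
    "D3": {"total": 40, "valid_kept": 19, "valid_rejected": 4, "invalid_rejected": 16, "invalid_kept": 1},
    "D4": {"total": 33, "valid_kept": 15, "valid_rejected": 1, "invalid_rejected": 15, "invalid_kept": 2},
}

def crowd_accuracy(c):
    """Correctly identified relations (valid kept + invalid rejected) over all presented relations."""
    return (c["valid_kept"] + c["invalid_rejected"]) / c["total"]

def automated_accuracy(c):
    """Share of the automatically extracted relations that are actually valid."""
    return (c["valid_kept"] + c["valid_rejected"]) / c["total"]

for doc, c in counts.items():
    print(f"{doc}: crowd {crowd_accuracy(c):.1%}, automated only {automated_accuracy(c):.1%}")
```

Recomputed this way, every value lands within one percentage point of the reported percentage; the small differences presumably come from rounding.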

31

+Discussions and future work

- Helpful in remembering facts related to a text, so could be used in online education systems.

- Turn it into an engaging game play.

- Leaderboards and persistent scoring.

32

Data on demand: no need for browsing when all databases are semantically connected to each other.

+

Exploration 3: Alipi, an online crowdsourcing system for re-narration.

Problem: Making the web accessible

33

+Problem scenario: Accessibility of the web content

A webpage on Fire Safety is re-narrated in Hindi

34

How can a person who does not know English understand web pages on fire safety? Solution: re-narration.

+Why are the existing approaches not sufficient?

- Single point of control and authority.

- Author forced to anticipate target audience.

- Transferring authorship is difficult.

35

+Alipi: A re-narration framework (Dinesh et al. 2012)

- Users rewrite different sections of a web page.

- Distribution of the point of control from author to users.

- A step from target audience to target communities.

- Follows the principle of “the best content for each one”.

36

+Alipi Architecture

37

+Alipi Architecture: Creating and Storing the re-narrations

38

+Alipi Architecture: Displaying a re-narrated page to the user

39

+Alipi Prototype

1. Open the website http://alipi.us. Enter the page of interest, here, http://iiit.ac.in

2. Click on the button “Re-narrate”

40

+Alipi Prototype: Steps to re-narrate a page

3. Select a section of the web-page. Re-narrate the element.

4. Publish your re-narration by providing the target community.

41

+Alipi Prototype: Steps to see the available re-narrations

3. After clicking the “Re-narrations” button, choose a re-narration from the available list.

4. The queried page will change with the re-narrated element.
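The publish and view steps suggest a simple data model: a re-narration is tied to a page, to one element of that page and to a target community, and viewing substitutes the matching elements. The sketch below is an assumption made for illustration, not Alipi's actual implementation; the class names, the element keys, the example URL and the placeholder text are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Renarration:
    page_url: str      # page the re-narration belongs to
    element_path: str  # which section of the page it rewrites
    community: str     # target community chosen at publish time
    text: str          # the rewritten content

class RenarrationStore:
    """In-memory stand-in for wherever re-narrations are kept."""

    def __init__(self):
        self._items = []

    def publish(self, renarration):
        self._items.append(renarration)

    def available(self, page_url):
        """List the communities that have re-narrations for this page."""
        return sorted({r.community for r in self._items if r.page_url == page_url})

    def apply(self, page_url, elements, community):
        """Return a copy of the page elements with matching re-narrations substituted."""
        result = dict(elements)
        for r in self._items:
            if r.page_url == page_url and r.community == community and r.element_path in result:
                result[r.element_path] = r.text
        return result

page_url = "http://example.org/fire-safety"
elements = {"h1": "Fire Safety", "p1": "Keep exits clear."}

store = RenarrationStore()
store.publish(Renarration(page_url, "h1", "Hindi readers", "<Hindi re-narration of the heading>"))

renarrated = store.apply(page_url, elements, "Hindi readers")
print(store.available(page_url))
print(renarrated)
```

Keeping the original elements untouched and substituting on a copy mirrors the idea that the author's page stays as it is while each community sees its own version.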

42

+My contribution: Testing feasibility of Alipi

- IIIT-H R&D showcase: 70 participants (45 male).

- Objective: to find out what motivates users to use Alipi, and for what sorts of tasks.

- Task: to re-narrate a web page (the IIIT-H webpage, a page on Indian culture, or any other page) and later to check the available re-narrations.

- Four phases: demographics, training, system experience and questionnaire.

43

+Findings of the study

- Participants appreciated both roles, re-narrator and reader: the preferred role varied between known and unknown domains.

- Re-narrators preferred text-based re-narrations over video and audio re-narrations: to avoid setting up the camera and bandwidth issues.

- Readers preferred re-narrations in mixed media: to get a rich experience.

- The majority wanted to re-narrate for their friends and see re-narrations from known people: their preferences are known.

- Participants found the interface design non-intuitive and hard to follow, but the system very useful for sharing information.

44

+My contribution 2: Alipi browser plugin

- Allows dynamic filtration based on user profile.

- Bypasses the URL http://alipi.us

- Decentralized and editable user profile.

45

+Discussions and future work

- How can we check the credibility of a re-narration: filtration of noisy re-narrations, ranking based on public voting?

- How can we improve our selection algorithm to incorporate: rapidly growing online communities, dialects of a geographical location, vicinity of user mentioned region?

- What could be the security implications of the Alipi architecture?

46

Multi-lingual web: easy access and interoperability among contents between different languages.

+Summary and the way ahead

47

+Summary of my work:

- Personalized web: content and advertising that match user preferences and choices.

- Data on demand: no need for browsing when all databases are semantically connected to each other.

- Multi-lingual web: easy access and interoperability among contents between different languages.

Knowledge acquisition (Extraction and Validation): Power of Friends, uPick

Accessibility (Re-narration): Alipi

48

+Future Work

- Can the proposed Crowd Consensus framework be useful to reduce the number of iterations required for crowdsourcing tasks?

- Using the belief modality, can we develop a mathematical model to check the accuracy of answers generated by the Crowd Consensus approach and to determine the conditions under which the accuracy may deviate?

- Can the proposed uPick approach be useful in enhancing the experience of students while reading textbooks?

- How can we check the relatedness of a re-narration (generated with the Alipi tool) to the original document as well as to other available re-narrations of the same web page?

49

+References

- C. Cooley. Human Nature & Social Order. Social Science Classics Series. Transaction Pub, 1964.

- M. S. Bernstein, D. Tan, G. Smith, M. Czerwinski, and E. Horvitz. Personalization via friendsourcing. ACM Trans. Comput.-Hum. Interact., 17(2):6:1–6:28, May 2008.

- P.-S. Chen. English sentence structure and entity-relationship diagrams. Information Sciences, 29(2-3):127–149, 1983.

- S. C. Weller. Cultural consensus theory: Applications and frequently asked questions. Field Methods, 19(4):339–368, 2007.

50

+References (contd.)

- I. Tuomi. Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. J. Manage. Inf. Syst., 16(3):103–117, Dec. 1999.

- S. Sekine. Named Entity: History and Future. 2004.

- W. Du and M. J. Atallah. Secure multi-party computation problems and their applications: a review and open problems. In Proceedings of the 2001 workshop on New security paradigms, NSPW ’01, pages 13–22, New York, NY, USA, 2001. ACM.

- Z. Syed, E. Viegas, and S. Parastatidis. Automatic discovery of semantic relations using mindnet. LREC, 2010.

51

+References (contd.)

- 21 Questions. http://apps.facebook.com/twentyoneq/.

- Mindnet. http://research.microsoft.com/apps/pubs/default.aspx?id=69647.

- Power of 10. http://en.wikipedia.org/wiki/Power of 10.

- Stanford POS tagger. http://nlp.stanford.edu/software/tagger.shtml.

52

+Related Publications

- D. Aggarwal, R. A. Khot, and V. Choppella. Power of Friends: When Friends Guess About their Friends’ Guess. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI ’13, Paris, France, 2013, ACM.

- D. Aggarwal, R. A. Khot, V. Varma, and V. Choppella. UPICK: Crowdsourcing Based Approach to Extract Relations Among Named Entities. In Proceedings of IndiaHCI, Pune, India, 2012 (Accepted as full paper).

- T. B. Dinesh, S. Uskudrali, S. Sastry, D. Aggarwal, and V. Choppella. Alipi: A framework for re-narrating web pages. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility, W4A ’12, pages 22:1-4, Lyon, France, 2012, ACM.

- D. Aggarwal, R. A. Khot, A. K. Dey, and V. Choppella. Crowd Consensus: Friendsourcing based approach to generate cultural beliefs. In preparation.

53

+Public Demonstrations

- Presented “Alipi: Making the web Inclusive and Accessible for All” in IIIT-Hyderabad R&D Showcase, Hyderabad, India, 2013.

- Presented “Crowdsourcing Based Approach to Extract Relations Among Named Entities” in OpenData Camp-Hyderabad Meet, Hyderabad, India, 2012.

- Poster presentations on “Power of Friends: Rethinking Games With a Purpose”, and “Alipi: A renarration Web” in IIIT-Hyderabad R&D Showcase, Hyderabad, India, 2012.

54

+Special Thanks

My family

Reviewers

Prof. Anind Dey

Prof. Vasudeva Varma

IIIT-H Faculty

Study participants

Prof. Venkatesh Choppella

Friends

Dr. T. B. Dinesh

55

Thank you!

For more details: deepti.r.aggarwal@gmail.com

http://pascal.iiit.ac.in/~deepti.aggarwal

Web 3.0… Web of opportunities! This is just the beginning!

56