CSE 574 Extracting, Managing & Personalizing Web Information

Post on 09-Jan-2016


04/20/23 23:56 1

CSE 574 Extracting, Managing & Personalizing Web Information

• Staffing
  – Dan Weld
  – Raphael Hoffmann

• Content
  – Intersection of AI, ML, DB & HCI

• Student Responsibilities
  – Reading, Reports, Discussion
  – Project (for those taking 3 credits)

Class Focus

Extracting, Managing & Personalizing Web Information


Why Information Extraction

• Next-Generation Search
  – CiteSeer, Google Scholar, MSRA Libra
  – Google Product Search
  – Flipdog
  – Zvents
  – Zoominfo

• Question Answering


People


…Continued


…Continued Some More


Making Structured Content

• Information Extraction
  – E.g. Google Scholar
  – Cons: Noisy

• Communal Content Creation
  – E.g. Wikipedia
  – Cons: Bootstrapping & Incentives


Why Managing?

• Select
• Store, Index, Aggregate
• Search, Query, Explore
• Share, Collaborate, “Publish”

Example: Personalized Portals
cf. DBlife, Rexa, Dontcheva UIST-07


DBlife


Summaries - 1


Summaries - 2


Summaries - 3


Summaries - 4


Summaries - 5


Summaries - 6


Why Personalize?

• Because we can.


Preliminary Schedule

• Information Extraction
  – Traditional Machine Learning Approaches
  – Self-Supervised Methods
  – Other Issues: Coreference & Ontology

• Collaborative Content Creation & UI Issues
  – Applying Constraints from Interaction to Learning
  – Decision Theoretic Interaction
  – Faceted Interfaces

• Community Information Management
  – Extraction over Evolving Text
  – Data Provenance
  – Mashups & Personalized Web

• Next-Generation Search
  – Inference, Textual Entailment, Machine Reading
  – Entity Search


For next time

• Read
  – Agichtein, Gravano. Snowball: Extracting Relations from Large Plain-Text Collections.

• Add yourself to mailing list

• Look at papers on website wiki
  – Add new ones
  – Add summary (different from report)
  – Note if you wish to present one

• Think about project / (form a group?)