Template-based Authoring Knowledge Systems Laboratory Stanford.

Template-based Authoring
Knowledge Systems Laboratory, Stanford

Project Goals
- Assist analyst in everyday work
- Knowledge authoring tools to assist in:
  - Research for reports
  - Produce reports
  - Consume reports
  - Share reports
- Our solution: Semantic Web Templates

Semantic Web Templates
- Knowledge representation, semantics are key for information exchange
- Creation, maintenance of knowledge must be transparent
- Automate extraction of knowledge
- Enhance knowledge retrieval methods

Semantic Web Templates
- Similar to MS Word templates
  - Different templates for different tasks
  - Word templates can have restrictions on text
    - Very primitive, such as length of text
    - Simplistic patterns such as “phone number”
    - No concepts such as “color” or “country”
- One template, many documents
- HTML templates are very common today
  - Many web sites use a SQL database as back end: template + SQL → HTML

Semantic Web Templates
- An HTML file with additional tags
- Tags specify:
  - Where particular knowledge is stated
  - What kind of knowledge it is
  - Where it came from, if applicable
  - References to an entity or relation
  - Repetitive regions of text
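As a rough illustration of the tag roles listed above, the following Python sketch reads a small annotated HTML fragment and collects each annotated span with its kind, source, and entity reference. The attribute names (`data-type`, `data-source`, `data-ref`), the `kb:` identifier, and the sample fragment are all hypothetical; the slides do not show the project's actual tag syntax.

```python
from html.parser import HTMLParser

# Hypothetical annotated fragment. The attribute names are assumptions made
# for this sketch, not the project's real template syntax.
TEMPLATE = (
    '<p>The capital of '
    '<span data-type="Country" data-ref="kb:France">France</span> is '
    '<span data-type="City" data-source="example-source">Paris</span>.</p>'
)


class AnnotationCollector(HTMLParser):
    """Collect the text and attributes of every annotated <span>."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # finished annotations, in document order
        self._open = None      # annotation currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "data-type" in attrs:
            self._open = {
                "type": attrs["data-type"],          # what kind of knowledge
                "source": attrs.get("data-source"),  # where it came from
                "ref": attrs.get("data-ref"),        # entity it refers to
                "text": "",                          # where it is stated
            }

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._open is not None:
            self.annotations.append(self._open)
            self._open = None


def extract_annotations(html):
    """Return a list of annotation dicts found in the HTML string."""
    collector = AnnotationCollector()
    collector.feed(html)
    collector.close()
    return collector.annotations
```

The sketch ignores nested annotations and repetitive regions; handling those would need a stack of open annotations rather than a single slot.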

Goal: Assist Research
- Unstructured extraction
  - Sort through buckets of data to find gold
  - Entity recognition
  - Relation recognition
- Semi-structured extraction
  - Utilize repetitive patterns within a page
  - Use similar pages to extract more data
  - Robust despite changing pages, data

Unstructured Extraction
- Natural language processing
- News feeds
- Indexing, storage, retrieval
- Plugin architecture
  - Web Services
  - Our system, collaboration with IBM via NIMD
- Rover news crawler
  - Political news articles from Yahoo!
  - 22,000 articles, ~8500 concepts, ~1000 relations
- Used in authoring tools

Unstructured Extraction
- Pattern-based system
- Leverage “hints” for the reader in news articles
  - British Prime Minister Tony Blair
  - <type Country> <subClassOf Politician> <unknown name>
  - “Tony Blair” is a Prime Minister who represents the Country “England”.
- System runs daily on Yahoo political news
  - Highlights known terms in green
  - Highlights new terms in red
- Used to create search index, maintain KB
- Demo
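The pattern on this slide (a country term, then a politician title, then an unrecognized name) can be sketched with a regular expression. This is a minimal illustration under stated assumptions, not the system's actual matcher: the two lookup tables are hypothetical stand-ins for the knowledge base, and the mapping of "British" to "England" simply mirrors the slide's own example.

```python
import re

# Hypothetical lookup tables standing in for the knowledge base.
COUNTRY_ADJECTIVES = {"British": "England"}  # mapping mirrors the slide's example
POLITICIAN_TITLES = {"Prime Minister", "President"}


def build_pattern():
    """Build a regex for: <country adjective> <title> <capitalized multi-word name>."""
    adjectives = "|".join(re.escape(a) for a in COUNTRY_ADJECTIVES)
    # Longest titles first so a longer title wins over any shorter prefix.
    titles = "|".join(
        re.escape(t) for t in sorted(POLITICIAN_TITLES, key=len, reverse=True)
    )
    return re.compile(rf"\b({adjectives}) ({titles}) ([A-Z][a-z]+(?: [A-Z][a-z]+)+)")


PATTERN = build_pattern()


def extract_facts(text, known_names=frozenset()):
    """Return one fact per match, marking names not yet in the KB as new."""
    facts = []
    for adjective, title, name in PATTERN.findall(text):
        facts.append({
            "name": name,
            "type": title,
            "country": COUNTRY_ADJECTIVES[adjective],
            # "known" and "new" correspond to the green and red highlights.
            "status": "known" if name in known_names else "new",
        })
    return facts
```

For example, `extract_facts("British Prime Minister Tony Blair spoke to reporters.")` yields a single fact whose name is "Tony Blair" and whose status is "new" until the name is added to `known_names`.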

Semi-structured Extraction
- Extract, produce knowledge
- Initial model is Domain Authorities
  - Enhance KB with ground facts
  - Strong for relations and breadth of data
  - Leverages work of others
  - Makes use of SQL databases
- Future work is wide-scale web of trust

Semi-structured Extraction
- Site Registry
  - By description and property
  - CIA World Factbook has data about items which are of type <Country>
  - CIA World Factbook has properties <population>, <hasNeighbor>, <hasMembership>, etc.
- Demo
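A site registry of this kind can be pictured as a lookup from each registered site to the entity type it describes and the properties it offers. The sketch below is an assumption about the shape of such a registry, not the project's data model; the dictionary layout and the `sites_for` helper are hypothetical.

```python
# Hypothetical registry layout: one entry per registered site, recording the
# type of item the site describes and the properties it provides.
SITE_REGISTRY = {
    "CIA World Factbook": {
        "describes": "Country",
        "properties": {"population", "hasNeighbor", "hasMembership"},
    },
}


def sites_for(entity_type, prop, registry=SITE_REGISTRY):
    """Return the names of sites that describe entity_type and offer prop."""
    return sorted(
        name
        for name, entry in registry.items()
        if entry["describes"] == entity_type and prop in entry["properties"]
    )
```

An extractor could call `sites_for("Country", "population")` to decide which registered site to consult for a missing fact.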

Semi-structured Extraction
- Publishing
- Human editing good for high-level concepts
- Automated techniques good for relations, ground-level facts, and massive repetition
- Rover web crawler
- Template construction is currently manual
- With critical mass of data, templates could be discovered.

Enhanced Document Retrieval
- Enhanced document retrieval
- Search based on concept
- Find articles about…
  - Membership: Scottie Pippen → Trailblazers
  - Membership: Osama bin Laden → al-Qaeda
  - Subgroups: Ramadan Shallah → Islamic Jihad → al-Qaeda
- Semantic search
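Concept-based search over membership and subgroup relations can be sketched as a walk up the group hierarchy. In the Python below, the Scottie Pippen entry comes from the slide; "Person A", "Person B", "Group X", "Group Y", and the article identifiers are placeholders invented for illustration, and the two dictionaries are an assumed representation rather than the system's knowledge base.

```python
# Membership and subgroup relations. Only the Scottie Pippen entry is from the
# slide; the "Person"/"Group" names are placeholders for illustration.
MEMBER_OF = {"Scottie Pippen": "Trailblazers", "Person A": "Group X"}
SUBGROUP_OF = {"Group X": "Group Y"}

# Placeholder index: article id -> entities mentioned in that article.
ARTICLES = {
    "article-1": {"Scottie Pippen"},
    "article-2": {"Person A"},
    "article-3": {"Person B"},
}


def groups_of(entity):
    """Return every group the entity belongs to, following subgroup links upward."""
    groups = []
    group = MEMBER_OF.get(entity)
    while group is not None and group not in groups:  # guard against cycles
        groups.append(group)
        group = SUBGROUP_OF.get(group)
    return groups


def find_articles(group, articles=ARTICLES):
    """Return ids of articles mentioning any direct or indirect member of group."""
    return sorted(
        article_id
        for article_id, mentioned in articles.items()
        if any(group in groups_of(entity) for entity in mentioned)
    )
```

Searching for the parent group also returns articles about members of its subgroups, which a plain keyword search on the group name would miss.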

Enhanced Document Retrieval
- Document Augmentation
  - Sidebar acts as glossary as you read
  - Pre-fetch data user is likely to want
  - Adapt to user preferences, activities
  - Deeper understanding for user, gets answers to questions raised while reading

Enhanced Document Retrieval
- Search Augmentation
  - Google assumes users only want documents
  - Provide answers along with documents
  - Use query term denotation to more closely target results
    - “Browns Ferry” is a garden park
    - “Browns Ferry” is a nuclear power plant
  - Automates what people do with IR systems
    - Append hints about the type of term being sought
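The idea of appending a type hint to an ambiguous query term can be sketched as follows. The two senses of "Browns Ferry" are taken from the slide; the `DENOTATIONS` table and the `augmented_queries` helper are hypothetical names used only for this illustration.

```python
# Known senses per term. The "Browns Ferry" senses come from the slide; the
# table itself is a hypothetical stand-in for a knowledge base lookup.
DENOTATIONS = {"Browns Ferry": ["garden park", "nuclear power plant"]}


def augmented_queries(term, denotations=DENOTATIONS):
    """Return one query per known sense of term, each with a type hint appended."""
    senses = denotations.get(term, [])
    if not senses:
        return [term]  # nothing known about the term: search it unchanged
    return [f'"{term}" {sense}' for sense in senses]
```

A search front end could run each augmented query separately and group the results by sense, or ask the user which sense was intended before searching.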

Search Augmentation
- Demo: Basic Search
- Demo: Followup Data
- Demo: Disambiguation
- Demo: Relations

Basic Question Answering
- Automated techniques for ground facts
- Use reasoners for higher-level facts
  - Tie in with KSL AQUAINT work
- Feedback, direction from user
- Structure of knowledge allows simple form of question answering

Basic Question Answering
- Multiple views into data
- Browse interface
  - Ugly, but complete view
- Activity-based knowledge presentation
  - Search, document augmentation
- Future work: accept user feedback, customization, preferred sources

Basic Question Answering
- Query by example
  - Users create many similar documents
  - These are targeted to an activity
  - Use past work to speed present work
  - User creates templates which present data they find interesting in a way they find convenient

Query by Example

Goal: Produce Reports
- Most reports are made with Office
  - Word processor, spreadsheet
- Enhance with semantic awareness
- Provide seamless access to knowledge
- Transparent maintenance, creation
- Low overhead of operation
- Avoid centralized approach
  - Contrast with relational database

Word Processing
- Creation of new data
- Semantic scan
  - Like spell check or grammar check
  - Automatically identifies referenced entities
  - Learns new entities, relations between entities
- Annotation of text
  - User manually adjusts system
  - User adds new data
- System gets smarter over time
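A semantic scan that behaves like a spell checker can be sketched as a pass over the text that flags candidate entity names as known or new, plus a step that records a user-confirmed entity. This is a minimal sketch under stated assumptions: the capitalized-phrase heuristic, the `KNOWN_ENTITIES` excerpt, and both function names are hypothetical, and a real scan would use far better entity recognition.

```python
import re

# Hypothetical excerpt of the knowledge base: entity name -> type.
KNOWN_ENTITIES = {"Tony Blair": "Politician"}

# Crude heuristic: two or more consecutive capitalized words form a candidate.
CAPITALIZED_PHRASE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def semantic_scan(text, known=KNOWN_ENTITIES):
    """Return (phrase, start offset, status) for each candidate entity in text."""
    results = []
    for match in CAPITALIZED_PHRASE.finditer(text):
        phrase = match.group()
        status = "known" if phrase in known else "new"
        results.append((phrase, match.start(), status))
    return results


def learn(phrase, entity_type, known=KNOWN_ENTITIES):
    """Record a user-confirmed entity so later scans recognize it."""
    known[phrase] = entity_type
```

Each call to `learn` shrinks the set of phrases flagged as new on the next scan, which is one simple way a system could get smarter as it is used.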

Word Processing
- Create data via entry into templates
- Create new templates
  - For others
  - For personal use
- Extend templates with new entry areas
- Enhance analyst’s view
  - Semantic search, document augmentation
- Sidebar boxes are templates too

Word Processing
- Demo: Semantic Scan
- Demo: Annotation
- Demo: Knowledge Creation

Spreadsheets
- Spreadsheets are key tools in analysis
- Tabular format, UI are both intuitive
- Sorting, basic math functions
- We add semantics:
  - New formula type: “Get Data”
  - New formula type: “Put Data”
- Summarization, new views

Spreadsheets
- Example scenario
  - Suppose SARS was found to affect Asian-Americans more than others?
  - Analyst wants to determine, based on that, which states are most at risk
  - Knowledge from Census tells us Asian-American population as a percentage
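The scenario can be sketched as a "Get Data" lookup per state followed by an ordinary spreadsheet sort. Everything named below is an assumption made for illustration: the `get_data`, `put_data`, and `rank_by` functions, the property name, and the knowledge base layout are hypothetical, and the state names and percentages are made-up placeholder values, not Census figures.

```python
# Placeholder knowledge base keyed by (entity, property). The states and
# percentages are invented illustration values, not Census data.
KB = {
    ("State A", "asianAmericanPopulationPercent"): 12.0,
    ("State B", "asianAmericanPopulationPercent"): 3.5,
    ("State C", "asianAmericanPopulationPercent"): 7.25,
}


def get_data(entity, prop, kb=KB):
    """Hypothetical "Get Data" formula: fetch one property of one entity."""
    return kb.get((entity, prop))


def put_data(entity, prop, value, kb=KB):
    """Hypothetical "Put Data" formula: write a cell value back to the KB."""
    kb[(entity, prop)] = value


def rank_by(entities, prop, kb=KB):
    """Return (entity, value) rows sorted from highest to lowest value."""
    rows = [(entity, get_data(entity, prop, kb)) for entity in entities]
    rows = [(entity, value) for entity, value in rows if value is not None]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

In a spreadsheet, each `get_data` call would fill one cell, and the sort is the same operation analysts already perform on a column.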

Spreadsheets

Goal: Consume Reports
- Verify others’ data against yours
- Incorporate others’ results into your knowledge base, track sources
- Maintain data
  - Change notification
  - Document updates with new data
- Versioning of documents, data
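Verifying incoming data against your own while tracking sources can be sketched as a merge that keeps every value alongside its source and reports disagreements. This is an assumed design for illustration only; the `merge_report` function, the knowledge base layout, and the sample values in the usage note are hypothetical.

```python
def merge_report(kb, report_facts, source):
    """Merge facts from a report into kb, recording the source of each value.

    kb maps (entity, property) to a list of (value, source) pairs.
    report_facts maps (entity, property) to a single value.
    Returns the keys where the report disagrees with an existing value.
    """
    conflicts = []
    for key, value in report_facts.items():
        entries = kb.setdefault(key, [])
        if any(existing != value for existing, _ in entries):
            conflicts.append(key)  # keep both values, flag for review
        if (value, source) not in entries:
            entries.append((value, source))
    return conflicts
```

For instance, merging a colleague's report that gives a placeholder value of 120 for a fact your own notes record as 100 would return that fact's key as a conflict, with both values and their sources retained for the analyst to review.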

Goal: Share Reports
- Easily exchangeable via e-mail
- Truth maintenance techniques
- Multiple views into data
- Leverage domain expertise
  - The missile guy has a KB, …
- Collaboration, trust levels
  - Colleagues disagree, sources are unreliable

Conclusion
- KD-D effort is focused on authoring, analysis tasks
- Leverage automated techniques to complement manual techniques
- System gets smarter as it’s used
- Tie in with commonly used applications