CONCEPT MODELING: Đorđe Popović, Ognjen Šćekić, Veljko Milutinović A Research Review January...


Page 1: CONCEPT MODELING: Đorđe Popović, Ognjen Šćekić, Veljko Milutinović A Research Review January – July 2006.

CONCEPT MODELING: Đorđe Popović, Ognjen Šćekić, Veljko Milutinović

A Research Review

January – July 2006.


Initial Assignment

• January 2006, initial assignment: Get acquainted with different ways of Concept Modeling in general.

• More specifically, explore the possibilities offered by RDF and OWL.

• One of the ideas: Use the 7 Ws: WHAT, WHO, WHEN, WHERE, WHY, WHICH, (W)HOW.


What is concept modeling?

• A way of modeling reality:
    – Identifying concepts
    – Identifying relations among concepts
    – Organizing the concepts in a knowledge base, allowing an "intelligent" way to search and process this data.

• Why do we need concept modeling? To make electronic resources not only machine-processable, but also machine-understandable!


Challenges

• How to create a model that has a uniform structure, and is powerful enough to capture the essence of any concept?

• How should these models be linked into an efficient structure?

• How can we bridge the gap between natural language and a machine-processable model?


Why start with patents?

• Described by a very formal, structured language – claims.

• Each patent is a novel concept.

• Definition of one patent is usually based on another one.


Structure of a Patent Document

• General info about the patent
• Abstract of the patent
• Claims – primary target for What
• Description
• References to related patents


Conceptual Indexing

• What is conceptual indexing?
    “New technique for organizing information to support subsequent access that can dramatically improve your ability to find the information you need, with less hassle and with better results.” (William A. Woods)

• Conceptual indexing combines techniques of:
    – Knowledge representation
    – Natural language processing
    – Classical techniques for indexing words and phrases

• Bridges the gap between natural language and a machine-processable model.


Conceptual Indexing

Conceptual indexing technology is a combination of:

• Concept extractor: identifies phrases to be indexed.

• Concept assimilator: analyzes a concept phrase to determine its place in the conceptual taxonomy.

• Conceptual retrieval system: uses the conceptual taxonomy to make connections between requested and indexed items.

[Figure 1 – Main components of a conceptual indexer. Diagram labels: Target Text, Concept Extractor, Concept Assimilator, Conceptual Retrieval System, Request, Connections between requests and indexed items.]
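
The three components can be illustrated with a minimal Python sketch. It is only a toy under stated assumptions: the extractor matches a fixed phrase list, and the assimilator files each phrase under its last word (a "head word" rule chosen here for illustration, not the method of any actual conceptual indexer). All names and sample documents are hypothetical.

```python
from collections import defaultdict

def extract_concepts(text, known_phrases):
    """Concept extractor: return the known phrases that occur in the text."""
    lowered = text.lower()
    return [p for p in known_phrases if p in lowered]

def assimilate(phrase, taxonomy):
    """Concept assimilator: file a phrase under its head word (last token)."""
    head = phrase.split()[-1]
    if head != phrase:
        taxonomy[head].add(phrase)

def retrieve(request, taxonomy, index):
    """Conceptual retrieval: match the request and every concept it subsumes."""
    concepts = {request} | taxonomy.get(request, set())
    hits = set()
    for concept in concepts:
        hits |= index.get(concept, set())
    return hits

taxonomy = defaultdict(set)   # general concept -> more specific concepts
index = defaultdict(set)      # concept -> ids of documents mentioning it

# Hypothetical sample documents and phrase list.
documents = {
    "doc1": "Synthetic grass panels form a playing surface.",
    "doc2": "Natural grass needs regular watering.",
    "doc3": "The surface is made of concrete.",
}
phrases = ["synthetic grass", "natural grass", "playing surface", "surface"]

for doc_id, text in documents.items():
    for phrase in extract_concepts(text, phrases):
        assimilate(phrase, taxonomy)
        index[phrase].add(doc_id)

grass_hits = retrieve("grass", taxonomy, index)
surface_hits = retrieve("surface", taxonomy, index)
```

Note that "grass" is never indexed directly; the request reaches doc1 and doc2 only through the taxonomy, which is the kind of connection the retrieval system is meant to provide.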


Hybrid Approach: Indices + RDF/OWL

• Conceptual indices

• RDF/OWL

• Motivation: Use the advantages of one approach to eliminate the drawbacks of the other.


Conceptual Indices vs. RDF/OWL

                     Conceptual indices                          RDF/OWL ontologies

Major advantages:    Linear-complexity structures                Very expressive and precise
                     Provide basic subsumption relations         Based on First-Order Logic
                     Provide built-in knowledge                  Supported by W3C
                     on low-level concepts

Major drawbacks:     Incapability of establishing explicit       Great complexity
                     relations among high-level concepts
                     Incapability to create precise models


Why not use ontologies alone?

• If we want to use an ontology we have 2 choices:
    – Use an existing, well-established ontology that might not suit our needs.
    – Create a new ontology which does suit our needs:
        – We can create several different ontologies, depending on how we want to capture the information.
        – Problems arise when we want to merge ontologies.

• This approach works fine within a closed community with specific needs:
    – There already exists a well-defined basic ontology structure.
    – Community members have a good knowledge of how to model new concepts in terms of the existing ones.


Why not use indices alone?

• For example, let us take the simplest possible definition, for a bird:

    bird [1] – a creature with wings and feathers that lays eggs and can usually fly.

• Our index might then contain the following associations:

    creature, wings, feathers, eggs, fly.

• A conceptual index does not offer the possibility to state the fact that some birds do not fly!

[1] Word definition taken from Longman Dictionary of Contemporary English, 3rd edition, 1995.


Hybrid Approach

• An index of associations represents a simple model, similar to what humans have in mind when they first think of a bird.

• Having enough associations, one can create a model with a considerable degree of accuracy.

• RDF/OWL statements provide a means for expressing additional (but very important) information (e.g. there are birds that cannot fly!).

• We believe this is good enough for most applications.
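
As a rough illustration of the hybrid idea, the sketch below keeps the bird associations in a plain index and adds explicit statements as (subject, predicate, object) tuples that stand in for RDF/OWL statements. The predicate names `is_a` and `cannot`, and the penguin example, are assumptions made for this sketch.

```python
# Associations from the index: a simple "first thought" model of a bird.
index = {"bird": {"creature", "wings", "feathers", "eggs", "fly"}}

# Explicit statements as (subject, predicate, object) triples, standing in
# for RDF/OWL statements. The predicate names are hypothetical.
statements = {
    ("penguin", "is_a", "bird"),
    ("penguin", "cannot", "fly"),
}

def associations(concept):
    """Inherit associations from parents, then apply explicit exceptions."""
    result = set(index.get(concept, set()))
    for subject, predicate, obj in statements:
        if subject == concept and predicate == "is_a":
            result |= associations(obj)
    for subject, predicate, obj in statements:
        if subject == concept and predicate == "cannot":
            result.discard(obj)
    return result

bird_model = associations("bird")
penguin_model = associations("penguin")
```

The index alone would describe a penguin as something that flies; the single explicit statement corrects that without making the index itself any more complex.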


Hybrid Approach

• It is important to keep track of how many times a term is mentioned, because it affects its descriptive power.

Example: U.S. Patent 6,989,179 – “Synthetic grass sport surfaces”, claims section:

    1. synthetic grass [10]
    2. playing surface [9]
    …

• These terms represent the essence of what is being described!
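
Counting occurrences of candidate terms can be sketched in a few lines of Python. The sample text below is invented for illustration and is not quoted from the patent, so the counts differ from the ones listed above.

```python
import re

def count_terms(text, terms):
    """Count case-insensitive whole-phrase occurrences of each term."""
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    # Most frequent first; ties keep the order of `terms`.
    return sorted(counts.items(), key=lambda item: -item[1])

# Hypothetical sample text.
sample = (
    "A synthetic grass playing surface made of synthetic grass panels. "
    "The playing surface uses synthetic grass sections."
)
ranking = count_terms(sample, ["playing surface", "synthetic grass", "panels"])
```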


Hybrid Approach

• However, this is only because we know what “synthetic grass” and “playing surface” are! At some level, we need to have some intrinsic, built-in knowledge base of basic concepts!

• All the other concepts can then be described in terms of these basic concepts.

• Solution: Conceptual indexers are equipped with a knowledge base of basic terms.


Patent Model – Conceptual Index

• A patent’s Claims section is scanned and processed by a conceptual indexer.

• The result is a descriptive index, associated with the patent (its size is approx. 1-5% of the full text).

• This index can be seen as an ordered list of the patent’s WHAT associations (terms, phrases, sentence fragments).

• An entry in the descriptive index contains a low-level concept, and the number of its occurrences.


Patent Model – RDF/OWL

• For a different application, a different RDF/OWL model needs to be devised.

• For describing patents, this model could be used to capture explicitly stated information:
    – Patent number and other numbers (WHICH)
    – Inventor, examiner, attorney, … (WHO)
    – Date when the patent was filed (WHEN)
    – Explicit references to similar patents (WHICH)
    – etc.

• Each W can have multiple sub-categories that are application-specific!
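
One possible in-memory shape for such a model is sketched below: a descriptive index for the WHAT associations plus a list of (W, sub-category, value) statements for the other Ws. The class, the sub-category names and the placeholder values are assumptions for illustration; a real system would store the statements as RDF/OWL rather than Python tuples.

```python
from dataclasses import dataclass, field

W_CATEGORIES = {"WHAT", "WHO", "WHEN", "WHERE", "WHY", "WHICH", "HOW"}

@dataclass
class PatentModel:
    # WHAT associations: (low-level concept, number of occurrences).
    descriptive_index: list = field(default_factory=list)
    # Other Ws: (W category, sub-category, value) statements.
    statements: list = field(default_factory=list)

    def add_statement(self, w, sub_category, value):
        if w not in W_CATEGORIES:
            raise ValueError(f"unknown W category: {w}")
        self.statements.append((w, sub_category, value))

    def query(self, w):
        """Return all (sub-category, value) pairs stored under one W."""
        return [(sub, value) for cat, sub, value in self.statements if cat == w]

model = PatentModel(
    descriptive_index=[("synthetic grass", 10), ("playing surface", 9)]
)
model.add_statement("WHICH", "patent_number", "6,989,179")
# Placeholder values: the slides do not give the inventor or filing date.
model.add_statement("WHO", "inventor", "INVENTOR NAME (placeholder)")
model.add_statement("WHEN", "filing_date", "FILING DATE (placeholder)")
```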


Patent Model – Creation

[Figure 2 – Creation of a patent model. Diagram labels: Patent, Conceptual Indexer, “descriptive index”, XML/RDF.]

The Claims section is processed by the conceptual indexer to produce an index associated with the patent.

Additional information about the concept is captured by RDF/OWL statements, in a predefined, application-specific structure.


Patent Model – Result

[Figure 3 – Patent model. Diagram labels: XML/RDF, WHO, WHEN, WHERE, WHAT, application-specific information, “descriptive index”.]

WHAT associations are contained in a descriptive index. Other Ws are expressed through RDF/OWL statements.


Patent model – Big Picture

• Descriptive indices are re-processed by the conceptual indexer, to form the system index.

• Each entry in the system index retains links to the descriptive indices it originates from, and vice versa.

• This structure allows us to:
    – Perform quick searches of the existing patents
    – Add/remove patents easily
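
The two-way linking can be sketched as an inverted index kept alongside the per-patent descriptive indices. This is a simplified stand-in: it merges terms verbatim rather than re-processing them with a conceptual indexer, and the patent ids and the second patent's terms are hypothetical.

```python
from collections import defaultdict

class SystemIndex:
    def __init__(self):
        self.descriptive = {}             # patent id -> {term: count}
        self.postings = defaultdict(set)  # term -> patent ids (back-links)

    def add_patent(self, patent_id, descriptive_index):
        self.descriptive[patent_id] = dict(descriptive_index)
        for term in descriptive_index:
            self.postings[term].add(patent_id)

    def remove_patent(self, patent_id):
        # Only the terms of the removed patent need to be touched.
        for term in self.descriptive.pop(patent_id, {}):
            self.postings[term].discard(patent_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        return sorted(self.postings.get(term, set()))

system = SystemIndex()
system.add_patent("P1", {"synthetic grass": 10, "playing surface": 9})
system.add_patent("P2", {"playing surface": 3, "rubber infill": 4})
before = system.search("playing surface")
system.remove_patent("P1")
after = system.search("playing surface")
```

Because every system-index entry knows which descriptive indices it came from, removing a patent only touches the entries for that patent's own terms.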


[Figure 4 – Top-level scheme. Diagram labels: RDF/OWL, INDICES, XML/RDF, WHO, WHEN, WHERE, WHAT, Automated Reasoner, Conceptual Indexer, “System Index”.]


Patent Model – Implicit Links

• Descriptions of similar concepts (patents) usually make frequent use of similar or even the same terms.

• By determining overlapping terms we create dynamic, implicit links among similar concepts.

• The number of such implicit links can be used to express similarity among concepts.

• The algorithm for determining the similarity needs to be tweaked empirically.
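
A minimal sketch of overlap-based similarity follows. Normalising the number of shared terms by the number of distinct terms (the Jaccard coefficient) is an assumption made here; the slides only say that the measure has to be tuned empirically. The two sample indices are hypothetical.

```python
def implicit_links(index_a, index_b):
    """Terms shared by two descriptive indices."""
    return set(index_a) & set(index_b)

def similarity(index_a, index_b):
    """Shared terms divided by all distinct terms (Jaccard coefficient)."""
    union = set(index_a) | set(index_b)
    if not union:
        return 0.0
    return len(implicit_links(index_a, index_b)) / len(union)

# Hypothetical descriptive indices: term -> number of occurrences.
patent_a = {"synthetic grass": 10, "playing surface": 9, "panels": 4}
patent_b = {"playing surface": 3, "panels": 1, "rubber infill": 2}

links = implicit_links(patent_a, patent_b)
score = similarity(patent_a, patent_b)
```

A tuned version might weight each shared term by its occurrence counts instead of treating all links equally.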


Advantages & Drawbacks

• Advantages
    – Reduced complexity (a great reduction of direct links between concepts)
    – Fast search and retrieval (as the result of using indices)
    – Scalability

• Drawbacks
    – Use of indices implies loss of precision


New Assignment

• May 2006, specific assignment:
    – Find ways of extracting prior art from previously filed patents.
    – Use the results to determine novel art in the descriptions of patents that have yet to be filed.
    – Generate new claims from newly found novel art, to be submitted for new patents.


Determining prior and novel art

• This work is currently done by experts.

• Requires great knowledge of the subject, and much time spent searching various databases of existing patents.

• Both time-consuming and money-consuming!


Determining prior and novel art

• Existing tools use statistical, data-mining techniques.
    – Very efficient and fast algorithms are available for extracting relevant keyphrases.
    – But they have limited capabilities for establishing anything other than basic relations among concepts, and those relations are usually undefined.
    – Problem: How to determine more complex relations among concepts to create claims (sentences)?

• Solution: Additional Natural Language Processing (NLP) techniques are required!


Proposed solution – Stage 1

Three stages:
    1. Statistical analysis & seed extraction
    2. Construction of Claims table
    3. Creation of claims

• Statistical analysis & seed extraction:
    – Process the text with a statistical analysis tool (in our case, KEA 3.0).
    – The output of such tools is an index of relevant words/phrases (keywords), each associated with a score.
    – Ideally, by using a conceptual indexer, the output would be a much more expressive “conceptual index”.
    – Composite keywords are turned into a single keyword and its descriptors.
    – Use empirical rules on word scores and composite phrases to determine the most relevant keywords, and declare them to be the seeds for further analysis.


Proposed solution – Stage 1

• Tools such as KEA require initial training and tweaking to achieve maximum performance.

• We trained KEA on a set of 12 relevant Sun patents.

• All extracted seeds are kept in a database, so that they are available later when needed.
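
The seed-selection step of Stage 1 might look roughly like the sketch below. It does not call KEA; it starts from a list of (keyphrase, score) pairs such as a keyphrase extractor could produce. The scores, the threshold of 0.5 and the "last word is the keyword" rule are all assumptions standing in for the empirical rules mentioned above.

```python
def split_composite(phrase):
    """Treat the last word as the keyword and the rest as its descriptors."""
    *descriptors, keyword = phrase.split()
    return keyword, descriptors

def select_seeds(scored_keywords, min_score=0.5):
    """Keep keywords whose score reaches a hand-tuned threshold."""
    seeds = {}
    for phrase, score in scored_keywords:
        if score < min_score:
            continue
        keyword, descriptors = split_composite(phrase)
        entry = seeds.setdefault(keyword, {"score": 0.0, "descriptors": set()})
        entry["score"] = max(entry["score"], score)
        entry["descriptors"].update(descriptors)
    return seeds

# Hypothetical scores; a real run would take them from the extractor's output.
scored = [
    ("synthetic grass", 0.9),
    ("playing surface", 0.8),
    ("support surface", 0.6),
    ("invention", 0.1),
]
seeds = select_seeds(scored)
```

Here "playing surface" and "support surface" collapse into the single seed "surface" with two descriptors, while the low-scoring "invention" is dropped.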


Proposed solution – Stage 2

• Construction of Claims table:
    – Text is processed once more to eliminate the sentences not containing any of the seeds.
    – Each seed is assigned an entry in the claims table, and its occurrences in the text are marked with a unique marker.
    – The text is then analyzed sentence by sentence.
    – Each sentence is decomposed into its functional parts: subject fragments, object fragments, predicate fragments and different adverbial fragments. (NLP – the hardest part!)
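
The first two steps of Stage 2 (dropping seed-free sentences and marking seed occurrences) can be sketched with regular expressions. The naive sentence splitter, the `[n]` marker format and the sample text are assumptions; the decomposition into functional parts is the NLP step and is not attempted here.

```python
import re

def build_claims_table(text, seeds):
    """Drop seed-free sentences and tag seed occurrences with [n] markers."""
    table = {i: seed for i, seed in enumerate(seeds)}
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    marked = []
    for sentence in sentences:
        tagged = sentence
        for i, seed in table.items():
            tagged = re.sub(
                r"\b(" + re.escape(seed) + r")\b",
                rf"\1[{i}]",
                tagged,
                flags=re.IGNORECASE,
            )
        if tagged != sentence:  # at least one seed was found
            marked.append(tagged)
    return table, marked

# Hypothetical input text.
text = (
    "The panels are made of synthetic grass. "
    "The invention is described below. "
    "Each surface is continuous."
)
table, marked = build_claims_table(text, ["grass", "surface"])
```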


[0] Grass (WHAT) TYPE: synthetic

[1] Surface(s) (WHAT) TYPE: [0], support, playing
    are manufactured from s.g. panels [2] (predicate)

[2] Panel(s) (WHAT) TYPE: [0]
    are placed side-by-side (predicate)
        to form continuous support surface [1] (WHY)
    form continuous support surface (predicate)
    are formed of grass sections [3] (predicate)
    are square OR rectangular (predicate)
    have different color tones (predicate)

[3] Section(s) (WHAT) TYPE: [0]
    are cut from grass panels [from 2] (predicate)
    are sewn OR glued OR attached together (predicate)
        by a hook and loop attachment (HOW)
        in a criss crossed way (HOW)
        to create a checkered pattern (WHY)
    create checkered pattern (predicate)
    are assembled with ribbons OR fibers (predicate)
        lying in different directions (HOW)

[4] Ribbon(s) (WHAT) TYPE: [2]
    lie in different directions (predicate)
    are fibrillated (predicate)
        to remove the grain directions (WHY)

etc…

Figure 5 – U.S. Patent 6,989,179 – “Synthetic grass sport surfaces”, Claims table (part of)


Proposed solution – Stage 3

• Creating claims once the table is complete is straightforward.

• Here are some of the created claims from the previously shown table:
    1. A synthetic grass surface manufactured from synthetic grass panels.
    2. A synthetic grass playing surface as defined in claim 1, wherein said synthetic grass panels are placed side by side to form a continuous support surface.
    3. A synthetic grass playing surface as defined in claim 2, wherein said synthetic grass panels are formed of synthetic grass sections.

• Generated claims are compared against the prior-art database to select only those claims describing potential novel art.
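
The chaining pattern in the example claims (each dependent claim refers to the one before it) can be reproduced with a small template function. This is only a sketch of the final formatting step; the function name and its inputs are hypothetical, and it assumes the independent claim and the refinement clauses have already been read off the claims table.

```python
def generate_claims(independent, subject, refinements):
    """Build one independent claim, then chain each refinement to the previous claim."""
    claims = [independent]
    for clause in refinements:
        claims.append(
            f"A {subject} as defined in claim {len(claims)}, wherein said {clause}."
        )
    return claims

claims = generate_claims(
    "A synthetic grass surface manufactured from synthetic grass panels.",
    "synthetic grass playing surface",
    [
        "synthetic grass panels are placed side by side "
        "to form a continuous support surface",
        "synthetic grass panels are formed of synthetic grass sections",
    ],
)

for number, claim in enumerate(claims, start=1):
    print(f"{number}. {claim}")
```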


Problems

Major obstacles that needed to be overcome were:

• How to determine prior art:
    – Concept classifier
    – Sentence Template Tool (NLP)

• How to determine functional parts of a sentence:
    – Sentence Analyzer (NLP)


[Figure 6 – Top-level scheme. Recoverable diagram elements: Set of SUN’s previous patents [1]; Patent document [2]; Related patents [3]; Keywords & claims database [4] (add new entry); NLP sentence analysis tool [5]; KEA keyphrase extractor [6]; Parsed sentences [7]; Sentence template tool [9]; Concept Classifier [10]; Potential novel concepts & prior art [11]; Potential novel concepts [12]; Determine seeds [13]; Seeds; Keywords; Keywords (training set); Keywords from previous patents; Purify with a word stemmer tool and LoPar; Grammar rules to determine functional parts of a sentence; Create all possible affirmative sentences containing seeds and place them in Claims table (unfiltered); Filter out sentences describing prior-art concepts, and sort by relevance [18]; Human interaction; Generate new claims [19]; NLP processing.

Legend: system part that needs to be implemented; output or partial result; external tool; database or set of data; dataflow; interaction.]

• Patent description is processed by KEA and the Sentence template tool to extract relevant keywords (seeds).

• Seeds are compared against prior art contained in the database.

• Claims table is created by analyzing sentences containing seeds.

• New claims are generated from the table.


Implementation of NLP parts

• A subgroup of the research team began working on the NLP tools.

• After extensive research we adopted the Stanford parser as the base tool for our work (http://nlp.stanford.edu).

• The parser analyzes single sentences. Its output is a tree structure showing types of words and sentence fragments.

• It can also determine basic grammar relations.

• Our plan: Use the first output to create the template tool, and both outputs to determine functional parts of a sentence.

Stanford parser – an example

"One implementation of the snapshot copy process provides a two-table approach."

(ROOT
  (S
    (NP
      (NP (CD One) (NN implementation))
      (PP (IN of)
        (NP (DT the) (NN snapshot) (NN copy) (NN process))))
    (VP (VBZ provides)
      (NP (DT a) (JJ two-table) (NN approach)))
    (. .)))

num(implementation-2, One-1)
nsubj(provides-8, implementation-2)
det(process-7, the-4)
nn(process-7, snapshot-5)
nn(process-7, copy-6)
prep_of(implementation-2, process-7)
det(approach-11, a-9)
amod(approach-11, two-10)
dobj(provides-8, approach-11)

Grammar relations can be used to determine main functional parts of sentences.
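
As a rough sketch of that last point, the snippet below reads dependency lines in the textual form shown in the example and picks out subject, predicate and direct object from the `nsubj` and `dobj` relations. It works on the printed output only and does not call the parser; the function names are hypothetical, and handling of the remaining relations is left out.

```python
import re

# Matches lines such as "nsubj(provides-8, implementation-2)".
DEP = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")

def parse_dependencies(lines):
    """Turn 'rel(head-i, dependent-j)' strings into (rel, head, dependent) tuples."""
    triples = []
    for line in lines:
        match = DEP.fullmatch(line.strip())
        if match:
            rel, head, _, dep, _ = match.groups()
            triples.append((rel, head, dep))
    return triples

def functional_parts(triples):
    """Pick subject, predicate and direct object from nsubj/dobj relations."""
    parts = {}
    for rel, head, dep in triples:
        if rel == "nsubj":
            parts["predicate"], parts["subject"] = head, dep
        elif rel == "dobj":
            parts["predicate"], parts["object"] = head, dep
    return parts

lines = [
    "num(implementation-2, One-1)",
    "nsubj(provides-8, implementation-2)",
    "det(process-7, the-4)",
    "nn(process-7, snapshot-5)",
    "nn(process-7, copy-6)",
    "prep_of(implementation-2, process-7)",
    "det(approach-11, a-9)",
    "amod(approach-11, two-10)",
    "dobj(provides-8, approach-11)",
]
triples = parse_dependencies(lines)
parts = functional_parts(triples)
```

Modifier relations such as `nn`, `amod` and `prep_of` could then be attached to these head words to rebuild the full subject and object fragments.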


Sentence Template Tool

• Motivation: In a single patent document, authors often use the same sentence templates for describing various patent parts.

• This tool allows the users to specify the sentence templates to find, and the parts they want extracted.

Page 37: CONCEPT MODELING: Đorđe Popović, Ognjen Šćekić, Veljko Milutinović A Research Review January – July 2006.

3737//4545

Sentence Template Tool

• Example from US patent No. 6,804,755:

FIG. 1 is a pictorial representation of a distributed data processing system in which the present invention may be implemented;

FIG. 2 is a block diagram of a storage subsystem in accordance with a preferred embodiment of the present invention;
...
FIG. 10 is an exemplary block diagram of a multi-layer mapping table in accordance with a preferred embodiment of the present invention;

FIG. 11 is an exemplary illustration of FlexRAID in accordance with the preferred embodiment of the present invention;
...

etc.

• There are more than 20 sentences with the same structure in this patent description!


Sentence Template Tool

• This sentence structure is typical of many patent descriptions, where the inventor describes what the figures represent.

• Figure description sentences may contain important novel concepts.

• Novel concepts from already filed patents can be treated as prior art for the analysis of future patents.


Sentence Template Tool

• For example:

"FIG. 10 is an exemplary block diagram of a multi-layer mapping table in accordance with a preferred embodiment of the present invention."

• The query that would return the underlined sentence part ("multi-layer mapping table") might look like this:

"Fig" * "is" * <NounPhrase><Preposition><?:NounPhrase>*<.>

• We developed a comprehensive query syntax for comparing parsed sentence trees, similar to the one shown here.
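The query engine itself is not shown in the slides. As a rough stand-in, the same figure-description template can be approximated on plain text with a regular expression instead of a parsed sentence tree. Everything below (the pattern, the group names, the function name) is an assumption made for illustration and is not the authors' query syntax.

```python
import re

# Figure-description sentences quoted in the slides (US patent No. 6,804,755).
SENTENCES = """\
FIG. 1 is a pictorial representation of a distributed data processing system in which the present invention may be implemented;
FIG. 2 is a block diagram of a storage subsystem in accordance with a preferred embodiment of the present invention;
FIG. 10 is an exemplary block diagram of a multi-layer mapping table in accordance with a preferred embodiment of the present invention;
FIG. 11 is an exemplary illustration of FlexRAID in accordance with the preferred embodiment of the present invention;
"""

# Plain-text approximation of: "Fig" * "is" * <NounPhrase><Preposition><?:NounPhrase>*<.>
# The named group "concept" plays the role of the captured <?:NounPhrase>.
FIGURE_TEMPLATE = re.compile(
    r"FIG\.\s*(?P<number>\d+)\s+is\s+"        # "FIG. 10 is"
    r"(?P<kind>an?\s+.+?)\s+of\s+"            # "an exemplary block diagram of"
    r"(?:an?\s+|the\s+)?(?P<concept>.+?)"     # optional article, then the concept
    r"\s+in\s+(?:accordance\s+with|which)\b"  # start of the trailing boilerplate
)


def extract_figure_concepts(text):
    """Return (figure number, concept) pairs for every matching sentence."""
    return [
        (int(match.group("number")), match.group("concept"))
        for match in FIGURE_TEMPLATE.finditer(text)
    ]


for number, concept in extract_figure_concepts(SENTENCES):
    print(f"FIG. {number}: {concept}")
```

A regular expression only sees surface text, so it is far more brittle than a query over parse trees: it depends on the literal words "of" and "in accordance with", whereas a tree query can match any preposition and any noun phrase.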


Advantages

• Frequently used queries can be stored for later use.

• If this tool is to be used primarily within a company, people working for the company can be given guidelines on how to describe certain parts of a patent, which makes the tool easier and more efficient to use.

• The key advantage of this approach is that it is much more accurate than statistical tools, because it is controlled by humans.


An Unfortunate Turn . . .

• Unfortunately, the funding for the project was not approved.

• Our goal now is to use the accumulated knowledge in a somewhat different direction!


Future plans

• Use the results returned by Google, refine them by applying semantic analysis, and give immediate answers to user queries!

• Users should be able to use the query syntax not merely to specify keywords, but also to require that terms appear in a specified context, or to ask specific questions.


Future plans

• This kind of analysis requires an enormous amount of CPU time, and should therefore be performed only for specific searches:

    • Patents
    • Legal acts and documents
    • Newspaper and other archives
    • Deep internet search
    • etc.


Future plans

• Possible solution: each document should contain an additional metadata section holding the parsed data for the plain text it contains.

• That way, documents that change rarely need to be processed only once.

• The additional storage cost should be outweighed by the increased search performance.
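One way to picture the proposed metadata section is a parse cache stored alongside the text and keyed by a hash of that text, so that an unchanged document is never parsed twice. This is only a sketch under stated assumptions: the dictionary layout, the field names text_sha256 and parsed, and the trivial placeholder_parse function are all hypothetical, and a real system would call an actual parser.

```python
import hashlib


def placeholder_parse(text):
    """Stand-in for an expensive parser: split into sentences, then tokens."""
    return [sentence.split() for sentence in text.split(".") if sentence.strip()]


def attach_parse_metadata(document, parse=placeholder_parse):
    """Return (document, reparsed) with an up-to-date parsed-data section.

    `document` is a dict with a "text" key and an optional "metadata" key.
    The text is parsed only if its hash differs from the one stored in the
    metadata, so documents that rarely change are processed once.
    """
    digest = hashlib.sha256(document["text"].encode("utf-8")).hexdigest()
    metadata = document.get("metadata", {})
    if metadata.get("text_sha256") == digest:
        return document, False  # cached parse is still valid

    updated = dict(
        document,
        metadata={"text_sha256": digest, "parsed": parse(document["text"])},
    )
    return updated, True


doc = {"text": "One implementation provides a two-table approach. It is simple."}
doc, reparsed_first = attach_parse_metadata(doc)
doc, reparsed_again = attach_parse_metadata(doc)
print(reparsed_first, reparsed_again)
```

The storage overhead is the parsed structure plus one hash per document; the saving is one parser run for every later search over an unchanged document.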


Future plans

• Our idea is still in the first stage of development.

• Further research is needed to explore the quality and feasibility of the proposed solution.

• However, we expect to produce some interesting results.


Đorđe Popović (…@ptt.yu)
Ognjen Šćekić (…@cg.yu)
Veljko Milutinović (…@etf.bg.ac.yu)

Thank you!

CONCEPT MODELING: A Research Review