Automatic Discovery of Technology Trends from Patent Text
![Page 1: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/1.jpg)
Automatic Discovery of Technology Trends from Patent Text
Youngho Kim, Yingshi Tian, Yoonjae Jeong, Ryu Jihee, Sung-Hyon Myaeng
School of Engineering, Information and Communication University, South Korea
![Page 2: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/2.jpg)
Introduction
• Motivation: Patent text is a good source for discovering technological progress.
• Problem: Previous approaches to patent analysis (citation analysis, network-based patent analysis) have some drawbacks
– They need domain expertise
– Salient concepts are not easy to recognize
– Wide application of the proposed methods is hampered
![Page 3: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/3.jpg)
Introduction
• In this paper, the authors want to
– Avoid the limitations mentioned previously
• Method
1. Semantic key-phrase extraction (no experts needed)
2. Technological trend discovery (unsupervised)
• Semantic key-phrase types:
– Problem, such as “recognizing spoken language”
– Solution, such as “language model”
– Domain, such as “speech recognition”
![Page 4: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/4.jpg)
Introduction
• Application: help users explore large numbers of technical documents efficiently to identify technological trends. The slide shows an example.
![Page 5: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/5.jpg)
Overall procedure
1. Technology identification through semantic key-phrase extraction
• A probabilistic framework combined with linguistic clues
• The probabilistic framework is weighted
• The linguistic clues are weighted
• Finally, a statistical learner (LIBSVM) learns from these features
2. Technological trend discovery
• Select the important technologies during a time span
• Link them according to semantic relatedness
![Page 6: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/6.jpg)
Problem Formulation
• Definitions
– Domain: a field of technology given by a user query, from which a collection for the related field is generated
– Problem: what a patent or a method attempts to solve
– Solution: a method, a model, or an approach that is associated with a particular problem
– Technology: a combination of a problem, a solution, and the given domain
– Time Span:
![Page 7: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/7.jpg)
Problem Formulation
• Definition
– Technological Trend: a main stream of technologies during a time span l.
• Example:
![Page 8: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/8.jpg)
Technological Trend Discovery System
• Structure of Patent Documents
• Semantic Key-phrase Extraction– Problem Extraction– Solution Extraction
• Technological Trend Discovery
![Page 9: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/9.jpg)
Structure of Patent Documents
• Database: USPTO (United States Patent and Trademark Office)
[Figure: sample patent document with labeled parts: time span, citation information, linguistic features]
![Page 10: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/10.jpg)
Semantic Key-phrase Extraction
• Step 1
– Parse a patent to get the smallest noun phrases as key-phrase candidates (e.g. signal patterns)
– Expand NP to V+NP using dependencies (e.g. recognizing signal patterns)
• Step 2
– Identify Problem key-phrases by classifying the candidates
• Step 3
– Among the remaining candidates, extract Solution key-phrases to get
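Step 1 can be illustrated with a toy sketch. It assumes the tokens are already POS-tagged, and it uses adjacency (a verb directly before the noun phrase) as a stand-in for the dependency parse the slide mentions. The tag sets and the function name `extract_candidates` are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of Step 1. Input: a list of (word, POS tag) pairs.
# Assumption: adjacency replaces a real dependency parse.
NOUN_TAGS = {"NN", "NNS"}
MODIFIER_TAGS = {"JJ"}
VERB_TAGS = {"VB", "VBG", "VBZ", "VBP", "VBD", "VBN"}


def extract_candidates(tagged):
    """Return (noun_phrase, verb_expanded_phrase_or_None) pairs."""
    candidates = []
    i = 0
    while i < len(tagged):
        start = i
        # Optional adjectives, then one or more nouns.
        while i < len(tagged) and tagged[i][1] in MODIFIER_TAGS:
            i += 1
        noun_start = i
        while i < len(tagged) and tagged[i][1] in NOUN_TAGS:
            i += 1
        if i > noun_start:
            noun_phrase = " ".join(word for word, _ in tagged[start:i])
            expanded = None
            # Expand NP to V+NP when a verb directly precedes the phrase.
            if start > 0 and tagged[start - 1][1] in VERB_TAGS:
                expanded = tagged[start - 1][0] + " " + noun_phrase
            candidates.append((noun_phrase, expanded))
        elif i == start:
            i += 1  # Token is neither modifier nor noun: skip it.
    return candidates
```

For the tagged phrase "a method for recognizing signal patterns", this yields the candidate "signal patterns" together with its expansion "recognizing signal patterns".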
![Page 11: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/11.jpg)
Problem Extraction Feature
• Topical language model (unigram)
• Consider the dependency (bigram model)
• Special smoothing: relevance & background language models
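The slide does not give the smoothing formula, so the following is only a sketch of one common way to combine a relevance model with a background model: linear interpolation of unigram probabilities. The weight `lam`, the probability `floor`, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def phrase_log_prob(phrase_tokens, relevance, background, lam=0.7, floor=1e-9):
    """Log-probability of a phrase under an interpolated unigram model.

    lam weights the relevance model; (1 - lam) weights the background model.
    floor avoids log(0) for words unseen in both models.
    """
    score = 0.0
    for word in phrase_tokens:
        p = lam * relevance.get(word, 0.0) + (1 - lam) * background.get(word, 0.0)
        score += math.log(max(p, floor))
    return score
```

A candidate phrase whose words are frequent in the topical collection scores higher than one made of general background words, which is the topicality signal the feature is meant to capture.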
![Page 12: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/12.jpg)
Problem Extraction
• Issue: The probability model is biased toward topicality, so another mechanism is needed to correct it
• Method: Linguistic clues
– Gather all distinct patterns from the annotations
– Generalize a grammar from these patterns
– E.g. (method/NN+in/PP) and (system/NN+in/PP) ==> (method | system)/NN+in/PP
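The generalization step in the example can be sketched as follows. This is a minimal illustration that merges patterns differing only in their first word; the string notation and the function name `generalize` are assumptions, and the authors' actual generalization rules may be broader.

```python
from collections import defaultdict


def generalize(patterns):
    """Merge patterns that share the first tag and the remaining elements."""
    groups = defaultdict(list)
    for pattern in patterns:
        first, _, rest = pattern.partition("+")
        word, _, tag = first.partition("/")
        if word not in groups[(tag, rest)]:
            groups[(tag, rest)].append(word)
    merged = []
    for (tag, rest), words in groups.items():
        # A single word stays as is; several words become an alternation.
        head = words[0] if len(words) == 1 else "(" + " | ".join(words) + ")"
        merged.append(f"{head}/{tag}+{rest}" if rest else f"{head}/{tag}")
    return merged
```

Given "method/NN+in/PP" and "system/NN+in/PP", the function returns the single pattern "(method | system)/NN+in/PP".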
![Page 13: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/13.jpg)
Problem Extraction Feature
• 342 generalized patterns
![Page 14: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/14.jpg)
Problem Extraction
• Generalized patterns need a confidence score
• A statistical machine learner (LIBSVM) is applied to the linguistic clues and the language models
• LIBSVM classifies each candidate as problem or non-problem using the above features
![Page 15: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/15.jpg)
Solution Extraction
• Probabilistic features would not be useful
– Solution phrases are rarely shared among cited documents
• Add the “head word” feature (e.g. model, approach, method, methodology)
• The other feature categories are the same as in Problem Extraction
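A head-word feature of this kind could look like the sketch below. It assumes the head of an English noun phrase is its last token, and the word list contains only the examples from the slide; both are simplifications for illustration.

```python
# Head words taken from the slide's examples; the real list is likely longer.
SOLUTION_HEADS = {"model", "approach", "method", "methodology"}


def head_word(phrase):
    """Assume the head of an English noun phrase is its last token."""
    tokens = phrase.lower().split()
    return tokens[-1] if tokens else ""


def head_word_feature(phrase):
    """Binary feature: 1 if the phrase is headed by a solution-like word."""
    return 1 if head_word(phrase) in SOLUTION_HEADS else 0
```

With this feature, "language model" is marked as solution-like, while "recognizing spoken language" is not.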
![Page 16: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/16.jpg)
Technology Trend Discovery
• Reduction: Select several salient technologies and associate them through semantic relations
• How to find a good time span that yields effective technological trends
– KL-divergence to compare two language models
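The comparison of two language models with KL-divergence can be sketched as below. Add-alpha smoothing over a shared vocabulary is an assumption here, used so that every probability is nonzero; the slides do not say how the models are smoothed for this step.

```python
import math
from collections import Counter


def smoothed_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[word] * math.log(p[word] / q[word]) for word in p)
```

A divergence near zero means the two time spans use nearly the same vocabulary distribution; a large value suggests the technology vocabulary has shifted between them.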
![Page 17: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/17.jpg)
Technology Trend Discovery
• How to find salient technologies within time spans
– If a technology is important, many patents will refer to it
– Mutual information concept
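The slide names only the "mutual information concept" without a formula. One plausible reading, shown here purely as an assumption, is pointwise mutual information between a technology and a time span, computed from patent counts. The parameter names are hypothetical.

```python
import math


def pmi(n_tech_in_span, n_tech, n_span, n_total):
    """Pointwise mutual information between a technology and a time span.

    n_tech_in_span: patents in the span that refer to the technology
    n_tech:         patents overall that refer to the technology
    n_span:         patents in the span
    n_total:        patents in the whole collection
    """
    if n_tech_in_span == 0:
        return float("-inf")
    p_joint = n_tech_in_span / n_total
    p_tech = n_tech / n_total
    p_span = n_span / n_total
    return math.log(p_joint / (p_tech * p_span))
```

A positive score means the technology is referred to in the span more often than its overall frequency would predict, which matches the intuition that an important technology is referenced by many patents.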
![Page 18: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/18.jpg)
Technology Trend Discovery Algorithm
• Step 1
– Define an initial time span (based on the density of the data)
• Step 2
– Generate all possible combinations of time spans (e.g. <1998~2000, 1999~2001>)
• Step 3
– Calculate the KL-divergences of all pairs from Step 2 and rank them
• Step 4
– Select the most important technology among the top n pairs
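Steps 2 and 3 can be sketched as follows. This assumes fixed-length sliding windows, a mapping from year to a flat token list (`docs_by_year`, a hypothetical input format), and add-alpha smoothing. Step 4 is left out because the slides do not specify the selection criterion in enough detail.

```python
import math
from collections import Counter
from itertools import combinations


def windows(first_year, last_year, length):
    """All contiguous time spans of the given length, as (start, end) years."""
    return [(y, y + length - 1) for y in range(first_year, last_year - length + 2)]


def span_model(docs_by_year, span, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model for the tokens inside a time span."""
    counts = Counter()
    for year in range(span[0], span[1] + 1):
        counts.update(docs_by_year.get(year, []))
    total = sum(counts.values()) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def kl(p, q):
    """KL(p || q) over a shared vocabulary."""
    return sum(p[word] * math.log(p[word] / q[word]) for word in p)


def rank_span_pairs(docs_by_year, first_year, last_year, length):
    """Score every pair of time spans by KL-divergence, highest first."""
    vocab = sorted({word for tokens in docs_by_year.values() for word in tokens})
    models = {span: span_model(docs_by_year, span, vocab)
              for span in windows(first_year, last_year, length)}
    scored = [(kl(models[a], models[b]), a, b)
              for a, b in combinations(models, 2)]
    return sorted(scored, reverse=True)
```

With years 1998 to 2001 and a span length of 3, `windows` produces the pair from the slide's example, (1998, 2000) and (1999, 2001).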
![Page 19: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/19.jpg)
Experiment
• Database: USPTO
• Domain: Speech recognition
• Data size: 1,420 US patent documents
• Time: 1976 - 2003
• Annotators: three computer science graduate students
• Annotated documents: 400 (selected uniformly over the time span)
![Page 20: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/20.jpg)
Experiment
• Annotation work
– Acronyms are resolved (using Wiki and simple parenthetical patterns)
– WordNet is used to normalize nouns and verbs
• Technology phrases (the answers) form a gold standard decided by majority vote
• Annotators agreed on 78% of the sample (about 300)
• Technology Trend Discovery has no gold standard; building one is too hard (too many time spans), so it has no good evaluation
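Building a gold standard by majority vote among three annotators can be sketched as below. The agreement measure here counts items on which all annotators gave the same label; how the reported 78% was computed is not stated on the slide, so that definition is an assumption.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by more than half of the annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def full_agreement_rate(annotations):
    """Fraction of items on which every annotator gave the same label."""
    agreed = sum(1 for labels in annotations if len(set(labels)) == 1)
    return agreed / len(annotations)
```

For three annotators, two matching labels are enough for a gold label; an item with three different labels gets no gold label under this sketch.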
![Page 21: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/21.jpg)
Experiment
• Set the background language model
• Used LIBSVM as the machine learner, with 5-fold cross validation
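For readers unfamiliar with 5-fold cross validation, the split can be sketched in plain Python as below. The round-robin fold assignment is an illustrative choice; the slides do not say how the folds were formed.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of k folds.

    Items are assigned to folds round-robin, so fold sizes differ by at most 1.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With the 400 annotated documents, each fold holds out 80 documents for testing and trains on the other 320.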
![Page 22: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/22.jpg)
Experiment
• All features were shown to be effective
![Page 23: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/23.jpg)
Experiment
• From the above steps, many meaningful problems and solutions can be discovered
• Issue: Synonymy remains a problem (even when synonyms from WordNet are used)
![Page 24: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/24.jpg)
Experiment
• Discover technological trends by the Technology Trend Discovery Algorithm
![Page 25: Automatic Discovery of Technology Trends from Patent Text](https://reader036.fdocuments.in/reader036/viewer/2022062501/56815f99550346895dce9e24/html5/thumbnails/25.jpg)
Conclusion & future work
• Discovering such trends can reveal latent technologies
• It can also assist exploration by alleviating the information overload caused by search results
• Future work
– Synonymy issue in semantic key-phrase extraction
– A standardized evaluation for TTD needs to be investigated