David W. Embley, Stephen W. Liddle, & Deryle W. Lonsdale, Brigham Young University, USA
Toward Tomorrow’s Semantic Web
An Approach Based on Information Extraction Ontologies
David W. Embley
Brigham Young University
Funded in part by the National Science Foundation
Presentation Outline
• Grand Challenge
• Meaning, Knowledge, Information, Data
• Fun and Games with Data
• Information Extraction Ontologies
• Applications
• Limitations and Pragmatics
• Summary and Challenges
Grand Challenge
Semantic Understanding
Can we quantify & specify the nature of this grand challenge?
“If ever there were a technology that could generate trillions of dollars in savings worldwide …, it would be the technology that makes business information systems interoperable.”
(Jeffrey T. Pollock, VP of Technology Strategy, Modulant Solutions)
“The Semantic Web: … content that is meaningful to computers [and that] will unleash a revolution of new possibilities … Properly designed, the Semantic Web can assist the evolution of human knowledge …”
(Tim Berners-Lee, …, Weaving the Web)
“20th Century: Data Processing
“21st Century: Data Exchange
“The issue now is mutual understanding.”
(Stefano Spaccapietra, Editor in Chief, Journal on Data Semantics)
“The Grand Challenge [of semantic understanding] has become mission critical. Current solutions … won’t scale. Businesses need economic growth dependent on the web working and scaling (cost: $1 trillion/year).”
(Michael Brodie, Chief Scientist, Verizon Communications)
What is Semantic Understanding?
Understanding: “To grasp or comprehend [what’s] intended or expressed.”
Semantics: “The meaning or the interpretation of a word, sentence, or other language form.”
- Dictionary.com
Can We Achieve Semantic Understanding?
“A computer doesn’t truly ‘understand’ anything.”
But computers can manipulate terms “in ways that are useful and meaningful to the human user.”
- Tim Berners-Lee
Key Point: it only has to be good enough.
And that’s our challenge and our opportunity!
…
Information Value Chain
Meaning
Knowledge
Information
Data
Translating data into meaning
Foundational Definitions
• Meaning: knowledge that is relevant or activates
• Knowledge: information with a degree of certainty or community agreement (ontology)
• Information: data in a conceptual framework
• Data: attribute-value pairs
- Adapted from [Meadow92]
Data
Attribute-Value Pairs
• Fundamental for information
• Thus, fundamental for knowledge & meaning
Data Frame
• Extensive knowledge about a data item
  - Everyday data: currency, dates, time, weights & measures
  - Textual appearance, units, context, operators, I/O conversion
• Abstract data type with an extended framework
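The Data Frame bullets can be made concrete with a small sketch. The class below is hypothetical (its name, fields, and the price pattern are assumptions for illustration, not the authors' implementation); it only shows how textual appearance, context keywords, units, and I/O conversion might sit together in one abstract data type.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataFrame:
    """Hypothetical data frame: knowledge about one kind of data item."""
    name: str
    value_pattern: str                              # textual appearance
    keywords: list = field(default_factory=list)    # context words
    unit: str = ""                                  # unit of measure

    def recognize(self, text):
        """Return (value, start, end) for every textual match."""
        return [(m.group(), m.start(), m.end())
                for m in re.finditer(self.value_pattern, text)]

    def to_internal(self, value):
        """I/O conversion: strip everything but digits, e.g. '$33,000' -> 33000."""
        return int(re.sub(r"[^\d]", "", value))


# Assumed pattern for US-dollar prices with thousands separators.
price = DataFrame("Price", r"\$\d{1,3}(?:,\d{3})*", ["asking", "price"], "USD")
matches = price.recognize("Price $33,000 Phone (916) 972-9117")
amount = price.to_internal(matches[0][0])
```

Here `matches` holds the recognized string with its 0-based offsets, and `amount` is the converted internal value.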
?
Olympus C-750 Ultra Zoom
Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm
Digital Camera
?
Year 2002
Make Ford
Model Thunderbird
Mileage 5,500 miles
Features Red, ABS, 6 CD changer, keyless entry
Price $33,000
Phone (916) 972-9117
Car Advertisement
?
Flight #   Class  From  Time/Date           To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm, 02 01 04   CDG  7:35 am, 03 01 04  0
Delta 119  Coach  CDG   10:20 am, 09 01 04  JFK  1:00 pm, 09 01 04  0
Airline Itinerary
?
Monday, October 13, 2003
Group A      W  L  T  GF  GA  Pts.
USA          3  0  0  11   1   9
Sweden       2  1  0   5   3   6
North Korea  1  2  0   3   4   3
Nigeria      0  3  0   0  11   0
Group B      W  L  T  GF  GA  Pts.
Brazil       2  0  1   8   2   7
…
World Cup Soccer
?
Calories 250 cal
Distance 2.50 miles
Time 23.35 minutes
Incline 1.5 degrees
Speed 5.2 mph
Heart Rate 125 bpm
Treadmill Workout
?
Place Bonnie Lake
County Duchesne
State Utah
Type Lake
Elevation 10,000 feet
USGS Quad Mirror Lake
Latitude 40.711°N
Longitude 110.876°W
Maps
Place Bonnie Lake
County Duchesne
State Utah
Type Lake
Elevation 10,100 feet
USGS Quad Mirror Lake
Latitude 40.711°N
Longitude 110.876°W
Information Extraction Ontologies
[Figure: diagram labeled Source, Target, Information Extraction, and Information Exchange]
What is an Extraction Ontology?
Augmented Conceptual-Model Instance
• Object & relationship sets
• Constraints
• Data frame value recognizers
Robust Wrapper (Ontology-Based Wrapper)
• Extracts information
• Works even when site changes or when new sites come on-line
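A rough sketch of why an ontology-based wrapper can survive layout changes: recognition depends on what values look like, not on where they sit in a page. Everything below (the `ONTOLOGY` dictionary, its patterns, the `max` constraint field, and the two sample texts) is a hypothetical miniature assumed for illustration.

```python
import re

# Hypothetical miniature of a car-ad extraction ontology:
# object sets, a maximum-participation constraint, and a value recognizer.
ONTOLOGY = {
    "Year":    {"max": 1,    "pattern": r"\b(?:19|20)\d{2}\b"},
    "Price":   {"max": 1,    "pattern": r"\$\d{1,3}(?:,\d{3})*"},
    "Feature": {"max": None, "pattern": r"\b(?:ABS|keyless entry|CD changer)\b"},
}


def extract(text):
    """Apply every recognizer to the text, ignoring page layout entirely."""
    result = {}
    for name, spec in ONTOLOGY.items():
        found = re.findall(spec["pattern"], text)
        if spec["max"] is not None:
            found = found[:spec["max"]]   # enforce the participation limit
        result[name] = found
    return result


# The same facts in two different layouts.
table_layout = "Year 2002 Make Ford Price $33,000 Features ABS, keyless entry"
prose_layout = "For sale: 2002 Ford with ABS and keyless entry, asking $33,000."
```

Both layouts yield the same extracted values, which is the property the "Robust Wrapper" bullets describe.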
CarAds Extraction Ontology
[Figure: conceptual-model diagram of the CarAds ontology. Object sets: CarAds, Year, Make, Model, Trim, ModelTrim, Mileage, Price, PhoneNr, Color, Feature, Accessory, BodyType, OtherFeature, Engine, Transmission. Relationship sets are labeled "has" and carry participation constraints such as 0:1, 1:*, 0:*, 0:0.7:1, 0:0.9:1, and 0:0.78:1.]
<ObjectSet x="329" y="51" lexical="true" name="Mileage" id="osmx50">
  <DataFrame>
    <InternalRepresentation>
      <DataType typeName="String"/>
    </InternalRepresentation>
    <ValuePhraseList>
      <ValuePhrase hint="Mileage Pattern 1">
        <ValueExpression color="ffffff">
          <ExpressionText>[1-9]\d{0,2}[kK]</ExpressionText>
        </ValueExpression>
        <LeftContextExpression color="ffffff">
          …
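To see what the `ExpressionText` in "Mileage Pattern 1" accepts, the regular expression can be tried directly. This is a small check using Python's `re` module; the sample strings are made up for illustration, and the left-context expression (elided in the slide) is not modeled.

```python
import re

# The value expression from the Mileage data frame, copied verbatim.
MILEAGE_PATTERN_1 = re.compile(r"[1-9]\d{0,2}[kK]")

# Hypothetical sample strings.
samples = ["85K miles", "only 7k", "120K", "0k", "7,000 miles"]
hits = {s: [m.group() for m in MILEAGE_PATTERN_1.finditer(s)] for s in samples}
```

The pattern matches shorthand mileages such as "85K" but not spelled-out values such as "7,000", which presumably fall to other value phrases in the list.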
Extraction Ontologies: An Example of Semantic Understanding
• “Intelligent” Symbol Manipulation
• Gives the “Illusion of Understanding”
• Obtains Meaningful and Useful Results
A Variety of Applications
• Information Extraction
• Semantic Web Page Annotation
• Free-Form Semantic Web Queries
• Task Ontologies for Free-Form Service Requests
• High-Precision Classification
• Schema Mapping for Ontology Alignment
• Record Linkage
• Accessing the Hidden Web
• Ontology Discovery and Generation
• Challenging Applications (e.g. BioInformatics)
Application #1
Information Extraction
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
Constant/Keyword Recognition
Descriptor/String/Position(start/end)
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
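A minimal sketch of how such Descriptor/String/Position records could be produced. The recognizer patterns are assumptions, far simpler than real data frames. The ad text uses the "miles on her" wording from the later slides, since that variant is the one whose offsets agree with the Price position listed in the table; positions are 1-based and inclusive to match it. With this transcription the phone numbers land one character later than in the slide.

```python
import re

AD = ("'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. "
      "Previous owner heart broken! Asking only $11,995. #1415. "
      "JERRY SEINER MIDVALE, 566-3800 or 566-3888")

# Hypothetical recognizers: (descriptor, regular expression).
RECOGNIZERS = [
    ("Year", r"(?<=')\d{2}\b"),
    ("Make", r"CHEV"),
    ("Make", r"CHEVY"),
    ("Model", r"Cavalier"),
    ("Feature", r"Red|5 spd"),
    ("Mileage", r"\d{1,3},\d{3}"),
    ("KEYWORD(Mileage)", r"miles"),
    ("Price", r"(?<=\$)\d{1,3},\d{3}"),
    ("PhoneNr", r"\d{3}-\d{4}"),
]


def recognize(text):
    """Return (descriptor, string, start, end) with 1-based inclusive positions."""
    records = []
    for descriptor, pattern in RECOGNIZERS:
        for m in re.finditer(pattern, text):
            records.append((descriptor, m.group(), m.start() + 1, m.end()))
    return sorted(records, key=lambda r: (r[2], r[3]))


records = recognize(AD)
for r in records:
    print("|".join(str(part) for part in r))
```

Note that the recognizers deliberately over-generate: "11,995" is reported as both Price and Mileage, and "CHEV" alongside "CHEVY". Resolving such conflicts is the job of the heuristics.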
Heuristics
• Keyword proximity
• Subsumed and overlapping constants
• Functional relationships
• Nonfunctional relationships
• First occurrence without constraint violation
Keyword Proximity
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
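One plausible reading of keyword proximity, sketched under assumptions: when several constants compete for the same object set, prefer the one nearest to that set's keyword. The `gap` distance measure and the function names are hypothetical; the spans are the ones listed in the recognition table.

```python
# Competing Mileage constants and the Mileage keyword, as (string, start, end).
candidates = [("7,000", 38, 42), ("11,995", 100, 105)]
keywords = [("miles", 44, 48)]


def gap(a, b):
    """Positional distance between two spans; 0 if they overlap."""
    return max(b[1] - a[2], a[1] - b[2], 0)


def closest_to_keyword(candidates, keywords):
    """Pick the candidate with the smallest distance to any keyword."""
    return min(candidates, key=lambda c: min(gap(c, k) for k in keywords))


mileage = closest_to_keyword(candidates, keywords)
```

Under this rule "7,000" wins as Mileage because it sits next to "miles", leaving "11,995" to be claimed by Price.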
Subsumed/Overlapping Constants
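A sketch of one way to handle subsumed constants, assuming the rule is "within the same descriptor, a match strictly contained in a longer match is discarded". The slides do not spell the rule out, so treat the function below as an illustration rather than the actual algorithm.

```python
def drop_subsumed(records):
    """Drop a record when a longer record of the same descriptor covers its span."""
    kept = []
    for r in records:
        descriptor, _, start, end = r
        subsumed = any(
            o is not r
            and o[0] == descriptor
            and o[2] <= start and end <= o[3]        # o covers r
            and (o[3] - o[2]) > (end - start)        # and o is strictly longer
            for o in records
        )
        if not subsumed:
            kept.append(r)
    return kept


records = [
    ("Make", "CHEV", 5, 8),
    ("Make", "CHEVY", 5, 9),
    ("Model", "Cavalier", 11, 18),
]
resolved = drop_subsumed(records)
```

Here "CHEV" (positions 5 to 8) is covered by "CHEVY" (positions 5 to 9), so only "CHEVY" survives as the Make.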
Functional Relationships
Nonfunctional Relationships
First Occurrence without Constraint Violation
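The relationship heuristics can be sketched together, under assumptions: a functional object set contributes at most one value per ad, so the first surviving occurrence is kept, while a nonfunctional set such as Feature keeps every value. The `FUNCTIONAL` set is inferred from the single-valued columns of the generated Car tuple, not taken from the ontology itself, and the input records assume keyword proximity and subsumption have already been applied.

```python
# Assumption: these object sets are functional (at most one value per ad).
FUNCTIONAL = {"Year", "Make", "Model", "Mileage", "Price", "PhoneNr"}


def resolve(records):
    """Keep the first occurrence for functional sets, all values otherwise."""
    car, features = {}, []
    for descriptor, value, _start, _end in sorted(records, key=lambda r: r[2]):
        if descriptor in FUNCTIONAL:
            car.setdefault(descriptor, value)   # first occurrence wins
        else:
            features.append(value)              # nonfunctional: keep all
    return car, features


# Records as listed on the slides, after the earlier heuristics.
records = [
    ("Year", "97", 2, 3), ("Make", "CHEVY", 5, 9), ("Model", "Cavalier", 11, 18),
    ("Feature", "Red", 21, 23), ("Feature", "5 spd", 26, 30),
    ("Mileage", "7,000", 38, 42), ("Price", "11,995", 100, 105),
    ("PhoneNr", "566-3800", 136, 143), ("PhoneNr", "566-3888", 148, 155),
]
car, features = resolve(records)
```

Of the two phone numbers, only the first is kept, while both features are retained.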
Database-Instance Generator
insert into Car values (1001, '97', 'CHEVY', 'Cavalier', '7,000', '11,995', '566-3800');
insert into CarFeature values (1001, 'Red');
insert into CarFeature values (1001, '5 spd');
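A minimal sketch of a database-instance generator, using Python's standard `sqlite3` module and an in-memory database. The table schemas and column names are assumptions; the slide shows only the insert statements, not the table definitions.

```python
import sqlite3

# Resolved values for one ad (as produced by the heuristics).
car = {"Year": "97", "Make": "CHEVY", "Model": "Cavalier",
       "Mileage": "7,000", "Price": "11,995", "PhoneNr": "566-3800"}
features = ["Red", "5 spd"]


def generate_inserts(car_id, car, features):
    """Build parameterized insert statements for one car ad."""
    statements = [(
        "insert into Car values (?, ?, ?, ?, ?, ?, ?)",
        (car_id, car["Year"], car["Make"], car["Model"],
         car["Mileage"], car["Price"], car["PhoneNr"]),
    )]
    statements += [("insert into CarFeature values (?, ?)", (car_id, f))
                   for f in features]
    return statements


conn = sqlite3.connect(":memory:")
# Hypothetical schemas; column names are not given in the slides.
conn.execute("create table Car (id, year, make, model, mileage, price, phone)")
conn.execute("create table CarFeature (car_id, feature)")
for sql, params in generate_inserts(1001, car, features):
    conn.execute(sql, params)

rows = conn.execute(
    "select feature from CarFeature where car_id = 1001 order by rowid"
).fetchall()
```

Parameterized statements are used here instead of string concatenation so that values extracted from arbitrary web text cannot break the SQL.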
Application #2
Semantic Web Page Annotation
Annotated Web Page (Demo)
OWL
<owl:Class rdf:ID="CarAds">
  <rdfs:label xml:lang="en">CarAds</rdfs:label>
  ......
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMileage" />
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMileage" />
      <owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:maxCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasMileage" />
      <owl:allValuesFrom rdf:resource="#Mileage" />
    </owl:Restriction>
  </rdfs:subClassOf>
  ……
</owl:Class>
……
<owl:Class rdf:ID="Mileage">
  <rdfs:label xml:lang="en">Mileage</rdfs:label>
  ……
</owl:Class>
……
<CarAds rdf:ID="CarAdsIns2">
  <CarAdsValue rdf:datatype="&xsd;string">2</CarAdsValue>
</CarAds>
……
<Mileage rdf:ID="MileageIns2">
  <StartingCharPosition rdf:datatype="&xsd;nonNegativeInteger">237</StartingCharPosition>
  <EndingCharPosition rdf:datatype="&xsd;nonNegativeInteger">241</EndingCharPosition>
</Mileage>
…….
<owl:Thing rdf:about="#CarAdsIns2">
  <hasMake rdf:resource="#MakeIns2" />
  <hasModel rdf:resource="#ModelIns2" />
  <hasYear rdf:resource="#YearIns2" />
  <hasMileage rdf:resource="#MileageIns2" />
  <hasPhoneNr rdf:resource="#PhoneNrIns2" />
  <hasPrice rdf:resource="#PriceIns2" />
</owl:Thing>
……
![Page 62: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/62.jpg)
Application #3
Free-Form Semantic Web Queries
![Page 63: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/63.jpg)
Find Ontology
“Tell me about cruises on San Francisco Bay. I’d like to know scheduled times, cost, and the duration of cruises on Friday of next week.”
![Page 64: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/64.jpg)
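Formulate Query
• Selection Constants: San Francisco Bay; Friday, Oct. 29th
• Projection: scheduled times, cost, duration
• Join Path
[Figure: query formulation diagram combining the selection constants, the projection, and the join path into a Result expression]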
![Page 65: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/65.jpg)
| StartTime | Price | Duration | Source |
| --- | --- | --- | --- |
| 10:45 am, 12:00 pm, 1:15, 2:30, 4:00 | $20.00, $16.00, $12.00 | | 1 |
| 10:00 am, 10:45 am, 11:15 am, 12:00 pm, 12:30 pm, 1:15 pm, 1:45 pm, 2:30 pm, 3:00 pm, 3:45 pm, 4:15 pm, 5:00 pm | $17.00, $16.00, $12.00 | 1 Hour | 2 |
![Page 66: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/66.jpg)
Application #4
Task Ontologies for Free-Form Service Requests
![Page 67: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/67.jpg)
Basic Idea
• Service Request: “I want to see a dermatologist next week; any day would be ok for me, at 4:00 p.m. The dermatologist must be within 20 miles from my home and must accept my insurance.”
• Match with Task Ontology
  • Domain Ontology
  • Process Ontology
• Complete, Negotiate, Finalize
![Page 68: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/68.jpg)
Domain Ontology
[Figure: domain ontology diagram. Object sets: Appointment (carrying the “->” marker), Place, Insurance, Service Provider, Person, Name, Doctor, Pediatrician, Service Description, Duration, Medical Service Provider, Auto Service Provider, Auto Mechanic, Dermatologist, Address, Cost, Date, Time. Relationship labels: has, is at, is on, provides, accepts, is with, is for. Example Insurance values: "IHC", "DMBA".]
![Page 69: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/69.jpg)
Appointment …
context keywords/phrase: “appointment |want to see a |…”
Dermatologist …
context keywords/phrases: “([D|d]ermatologist) | …”
I want to see a dermatologist next week; any day would
be ok for me, at 4:00 p.m. The dermatologist must be
within 20 miles from my home and must accept my
insurance.
![Page 73: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/73.jpg)
Date
…
NextWeek(d1: Date, d2: Date) returns (Boolean {T, F})
context keywords/phrases: next week | week from now | …

Distance
internal representation: real;
input (s: String)
context keywords/phrases: miles | mile | mi | kilometers | kilometer | meters | meter | centimeter | …
Within(d1: Distance, “20”) returns (Boolean {T or F})
context keywords/phrases: within | not more than | ≤ | …
return (d1 ≤ d2)
…
end;
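A data frame like Distance pairs recognizers (context keywords) with operations over an internal representation. The following Python sketch shows one way this could look, assuming meters as the internal representation. The names `TO_METERS`, `parse_within`, and `within` are hypothetical; the regular expressions are built only from the keywords listed in the data frame, and the conversion factors are standard unit definitions.

```python
import re

# Internal representation is a real number; meters are assumed here.
# Unit names come from the context keywords of the Distance data frame.
TO_METERS = {
    "miles": 1609.344, "mile": 1609.344, "mi": 1609.344,
    "kilometers": 1000.0, "kilometer": 1000.0,
    "meters": 1.0, "meter": 1.0, "centimeter": 0.01,
}

# Longest unit names first so "miles" is tried before "mi".
UNITS = "|".join(sorted(TO_METERS, key=len, reverse=True))
DISTANCE = re.compile(r"(\d+(?:\.\d+)?)\s*(" + UNITS + r")\b")
# Context keywords for the Within operation, followed by a distance.
WITHIN = re.compile(r"\b(within|not more than)\s+" + DISTANCE.pattern)


def parse_within(request):
    """Return the distance bound in meters found in the request, or None."""
    m = WITHIN.search(request)
    if m is None:
        return None
    return float(m.group(2)) * TO_METERS[m.group(3)]


def within(d1, d2):
    """Within(d1, d2) holds when d1 does not exceed the bound d2."""
    return d1 <= d2


request = ("I want to see a dermatologist next week; any day would be ok for me, "
           "at 4:00 p.m. The dermatologist must be within 20 miles from my home "
           "and must accept my insurance.")
bound = parse_within(request)
print(within(15 * TO_METERS["miles"], bound))
```

The recognized constraint becomes a Boolean predicate that can later be tested against candidate providers, which is the role the Within operation plays in the request.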
![Page 75: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/75.jpg)
[Figure: the domain ontology pruned to the parts relevant to the request. Object sets: Appointment (carrying the “->” marker), Place, Dermatologist, Person, Name, Address, Date, Time. Relationship labels: is at, is on, has, is with, is for.]
![Page 76: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/76.jpg)
Process Ontology
[Figure: state net of the scheduling process, with these states and transitions]
• @create, @process ontology(domain ontology): initialize
  task-view = create-task-view(domain ontology); task-constraints = create-task-constraints(task-view);
• ready
• initial-task-view ready
  • no missing information
  • missing information: task-view = get-from-system(task-view); if (still missing values) task-view = get-from-user(task-view);
• …
• ready to schedule
  • task-view = null: report that the appointment cannot be scheduled
  • task-view != null: schedule-appointment(task-view.Person.Name, task-view.Service Provider.Name, task-view.Date, task-view.Time, task-view.Address); report that the appointment is scheduled;
![Page 77: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/77.jpg)
Specification Satisfaction
[Figure: instance graph with the objects Dermatologist0, Appointment7, and Person100 and the values "Dr. Carter", "Lynn Jones", "IHC", "DMBA", "4:00", "5 Jan 05", "Orem 600 State St.", "Provo 300 State St."]
Date(“28 Dec 04”) and NextWeek(“28 Dec 04”, “5 Jan 05”)
Dermatologist(Dermatologist0) is at Address(“Orem 600 State St.”) and Within(DistanceBetween(“Provo 300 State St.”, “Orem 600 State St.”), “22”)
∃i2 (Dermatologist(Dermatologist0) accepts Insurance(i2) and Equal(“IHC”, i2))
![Page 78: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/78.jpg)
Application #5
High-Precision Classification
![Page 79: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/79.jpg)
An Extraction Ontology Solution
![Page 80: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/80.jpg)
Document 1: Car Ads
Document 2: Items for Sale or Rent
Density Heuristic
![Page 81: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/81.jpg)
Expected Values Heuristic
Document 1: Car Ads
Year: 3, Make: 2, Model: 3, Mileage: 1, Price: 1, Feature: 15, PhoneNr: 3
Document 2: Items for Sale or Rent
Year: 1, Make: 0, Model: 0, Mileage: 1, Price: 0, Feature: 0, PhoneNr: 4
![Page 82: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/82.jpg)
Vector Space of Expected Values
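| Object set | OV | D1 | D2 |
| --- | --- | --- | --- |
| Year | 0.98 | 16 | 6 |
| Make | 0.93 | 10 | 0 |
| Model | 0.91 | 12 | 0 |
| Mileage | 0.45 | 6 | 2 |
| Price | 0.80 | 11 | 8 |
| Feature | 2.10 | 29 | 0 |
| PhoneNr | 1.15 | 15 | 11 |

D1: 0.996
D2: 0.567
[Figure: the vectors ov, D1, and D2 drawn in the vector space]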
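The two scores on this slide match the cosine of the angle between each document vector and the expected-values vector OV. The slide does not name the measure, so treating it as cosine similarity is an inference from the numbers. A minimal Python check, with a hypothetical helper name `cosine`:

```python
import math

# Values copied from the slide, in the order
# Year, Make, Model, Mileage, Price, Feature, PhoneNr.
OV = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]
D1 = [16, 10, 12, 6, 11, 29, 15]
D2 = [6, 0, 0, 2, 8, 0, 11]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(round(cosine(OV, D1), 3))  # 0.996
print(round(cosine(OV, D2), 3))  # 0.567
```

A document whose counts are roughly proportional to the expected values scores near 1 regardless of its length, which is why the car-ads page scores so much higher than the mixed for-sale page.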
![Page 83: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/83.jpg)
Grouping Heuristic
Document 1: Car Ads
Year, Make, Model, Price, Year, Model, Year, Make, Model, Mileage, …
Document 2: Items for Sale or Rent
Year, Mileage, …, Mileage, Year, Price, Price, …
![Page 84: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/84.jpg)
Grouping

Car Ads
• Year, Year, Make, Model: 3
• Price, Year, Model, Year: 3
• Make, Model, Mileage, Year: 4
• Model, Mileage, Price, Year: 4
• …
Grouping: 0.875

Sale Items
• Year, Year, Year, Mileage: 2
• Mileage, Year, Price, Price: 3
• Year, Price, Price, Year: 2
• Price, Price, Price, Price: 1
• …
Grouping: 0.500

$$\text{Expected Number in Group} = \left\lfloor \sum_{\text{1-Max}} \text{Ave} \right\rfloor = 4 \quad \text{(for our example)}$$

$$\text{Grouping} = \frac{\text{Sum of Distinct 1-Max Object Sets in each Group}}{\text{Number of Groups} \times \text{Expected Number in a Group}}$$

$$\text{Car Ads: } \frac{3+3+4+4}{4 \times 4} = 0.875 \qquad \text{Sale Items: } \frac{2+3+2+1}{4 \times 4} = 0.500$$
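The grouping measure can be reproduced with a few lines of Python. This is a sketch under the assumption that the recognized 1-Max object-set labels are simply cut into consecutive groups of the expected size; the function name `grouping_score` is hypothetical, and the two label sequences are the sixteen labels shown in the groups on the slide.

```python
def grouping_score(labels, expected):
    """Sum of distinct labels per group / (number of groups * expected size)."""
    # Cut the sequence into consecutive, complete groups of `expected` labels.
    groups = [labels[i:i + expected]
              for i in range(0, len(labels) - expected + 1, expected)]
    distinct = sum(len(set(group)) for group in groups)
    return distinct / (len(groups) * expected)


CAR_ADS = ["Year", "Year", "Make", "Model",
           "Price", "Year", "Model", "Year",
           "Make", "Model", "Mileage", "Year",
           "Model", "Mileage", "Price", "Year"]
SALE_ITEMS = ["Year", "Year", "Year", "Mileage",
              "Mileage", "Year", "Price", "Price",
              "Year", "Price", "Price", "Year",
              "Price", "Price", "Price", "Price"]

print(grouping_score(CAR_ADS, 4))     # 0.875
print(grouping_score(SALE_ITEMS, 4))  # 0.5
```

The expected group size of 4 is the floor of the summed expected values for the 1-Max object sets (Year, Make, Model, Mileage, Price), as stated on the slide.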
![Page 85: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/85.jpg)
Application #6
Schema Mapping for Ontology Alignment
![Page 86: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/86.jpg)
Problem: Different Schemas
Target Database Schema
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}

Different Source Table Schemas
• {Run #, Yr, Make, Model, Tran, Color, Dr}
• {Make, Model, Year, Colour, Price, Auto, Air Cond., AM/FM, CD}
• {Vehicle, Distance, Price, Mileage}
• {Year, Make, Model, Trim, Invoice/Retail, Engine, Fuel Economy}
![Page 87: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/87.jpg)
Solution: Remove Internal Factoring
Discover Nesting: Make, (Model, (Year, Colour, Price, Auto, Air Cond, AM/FM, CD)*)*
Unnest: $\mu_{(\text{Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\ \mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\ \text{Table}$
[Figure: source table with a Legend and ACURA rows]
![Page 88: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/88.jpg)
Solution: Replace Boolean Values
$\beta^{\text{Yes},}_{\text{CD}}\ \text{Table}$
$\beta^{\text{Yes},}_{\text{Auto}}\ \beta^{\text{Yes},}_{\text{Air Cond}}\ \beta^{\text{Yes},}_{\text{AM/FM}}$
[Figure: source table with a Legend, ACURA rows, and the columns Auto, Air Cond., AM/FM, and CD]
![Page 89: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/89.jpg)
Solution: Form Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto, Auto>, <Air Cond., Air Cond.>, <AM/FM, AM/FM>, <CD, >
![Page 90: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/90.jpg)
Solution: Adjust Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, $6300>, <Auto>, <Air Cond>, <AM/FM>
![Page 91: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/91.jpg)
Solution: Do Extraction
![Page 92: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/92.jpg)
Solution: Infer Mappings
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
Each row is a car.
• $\pi_{\text{Model}}\ \mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\ \text{Table}$
• $\pi_{\text{Make}}\ \mu_{(\text{Model, Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\ \mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\ \text{Table}$
• $\pi_{\text{Year}}\ \text{Table}$
Note: Mappings produce sets for attributes. Joining to form records is trivial because we have OIDs for table rows (e.g. for each Car).
![Page 93: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/93.jpg)
Solution: Infer Mappings
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
$\pi_{\text{Model}}\ \mu_{(\text{Year, Colour, Price, Auto, Air Cond, AM/FM, CD})^*}\ \text{Table}$
![Page 94: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/94.jpg)
Solution: Do Extraction
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
$\pi_{\text{Price}}\ \text{Table}$
![Page 95: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/95.jpg)
Solution: Do Extraction
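{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
$\rho_{\text{Colour} \leftarrow \text{Feature}}\ \pi_{\text{Colour}}\ \text{Table} \ \cup\ \rho_{\text{Auto} \leftarrow \text{Feature}}\ \pi_{\text{Auto}}\ \beta^{\text{Yes},}_{\text{Auto}}\ \text{Table} \ \cup\ \rho_{\text{Air Cond.} \leftarrow \text{Feature}}\ \pi_{\text{Air Cond.}}\ \beta^{\text{Yes},}_{\text{Air Cond.}}\ \text{Table} \ \cup\ \rho_{\text{AM/FM} \leftarrow \text{Feature}}\ \pi_{\text{AM/FM}}\ \beta^{\text{Yes},}_{\text{AM/FM}}\ \text{Table} \ \cup\ \rho_{\text{CD} \leftarrow \text{Feature}}\ \pi_{\text{CD}}\ \beta^{\text{Yes},}_{\text{CD}}\ \text{Table}$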
![Page 96: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/96.jpg)
Application #7
Record Linkage
![Page 97: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/97.jpg)
“Kelly Flanagan” Query
![Page 98: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/98.jpg)
A Multi-faceted Approach
Gather evidence from each of several different facets
• Attributes
• Links
• Page Similarity
Combine the evidence
![Page 99: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/99.jpg)
Attributes
• Phone number, email address, state, city, zip code
• Data-frame recognizers
![Page 100: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/100.jpg)
Links
![Page 101: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/101.jpg)
Page Similarity
“adjacent cap-word pairs”: Cap-Word (Connector | Preposition (Article)? | (Capital-Letter Dot))? Cap-Word.
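The cap-word-pair pattern translates fairly directly into a regular expression. The Python sketch below is an illustration only: the slide does not enumerate the connectors, prepositions, or articles, so the short word lists here are placeholders chosen for the example, and `PAIR` is a hypothetical name.

```python
import re

CAP_WORD = r"[A-Z][a-z]+"
CONNECTOR = r"(?:and|&)"           # assumed word list
PREPOSITION = r"(?:of|in|at|for)"  # assumed word list
ARTICLE = r"(?:the|a|an)"          # assumed word list
INITIAL = r"[A-Z]\."               # Capital-Letter Dot

# Cap-Word (Connector | Preposition (Article)? | Capital-Letter Dot)? Cap-Word
PAIR = re.compile(
    rf"\b{CAP_WORD}"
    rf"(?:\s+(?:{CONNECTOR}|{PREPOSITION}(?:\s+{ARTICLE})?|{INITIAL}))?"
    rf"\s+{CAP_WORD}\b"
)

text = "Kelly J. Flanagan teaches at Brigham Young University"
print(PAIR.findall(text))
```

Two pages that share many such pairs (names of people, institutions, places) are more likely to concern the same person, which is how the pairs serve as page-similarity evidence.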
![Page 102: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/102.jpg)
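Confidence Matrix for Each Facet

$$
\begin{array}{c|cccccccc}
 & C_1 & C_2 & \cdots & C_i & \cdots & C_j & \cdots & C_n \\
\hline
C_1 & 1 & C_{12} & & C_{1i} & & C_{1j} & & C_{1n} \\
C_2 & & 1 & & C_{2i} & & C_{2j} & & C_{2n} \\
\vdots & & & & \vdots & & \vdots & & \vdots \\
C_i & & & & 1 & & C_{ij} & & C_{in} \\
\vdots & & & & & & \vdots & & \vdots \\
C_j & & & & & & 1 & & C_{jn} \\
\vdots & & & & & & & & \vdots \\
C_n & & & & & & & & 1
\end{array}
$$

$$
C_{ij} =
\begin{cases}
P(C_i \text{ and } C_j \text{ refer to a same person} \mid \text{evidence for a facet } f) \\
0 & \text{if no evidence for a facet } f
\end{cases}
$$

Training set to compute the conditional probabilities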
![Page 103: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/103.jpg)
Final Matrix
[Figure: Confidence Matrix for Attributes, Confidence Matrix for Links, and Confidence Matrix for Page Similarity combined into the final matrix]
0.96 + 0 + 0.78 - 0.96 * 0 - 0.96 * 0.78 - 0.78 * 0 + 0.96 * 0 * 0.78 = 0.9912
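The arithmetic on this slide is the inclusion-exclusion expansion of 1 − (1 − a)(1 − b)(1 − c), that is, the probability that at least one of three independent pieces of evidence holds. Reading the combination this way is an inference from the numbers. A short Python check, with the hypothetical function name `combine`:

```python
from math import prod


def combine(confidences):
    """Combine per-facet confidences as 1 - product of (1 - c)."""
    return 1.0 - prod(1.0 - c for c in confidences)


# Attributes, links, and page similarity for one citation pair, as on the slide.
a, b, c = 0.96, 0.0, 0.78
expanded = a + b + c - a * b - a * c - c * b + a * b * c

print(round(combine([a, b, c]), 4))  # 0.9912
print(round(expanded, 4))            # 0.9912
```

A facet with no evidence contributes 0 and leaves the combined value unchanged, while every additional positive facet can only raise the confidence.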
![Page 104: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/104.jpg)
Grouping Algorithm
• Input: final confidence matrix
• Output: citations grouped by same person
• The idea: if {Ci, Cj} and {Cj, Ck} are each highly confident matches, then group {Ci, Cj, Ck}
• The threshold we use for “highly confident” is 0.8.
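The transitive grouping idea can be sketched with a union-find structure over the final confidence matrix. The Python below is an illustrative sketch, not the authors' algorithm: the function name `group_citations` is hypothetical, the matrix values are invented for the example, and treating a confidence equal to the threshold as “highly confident” is an assumption.

```python
def group_citations(conf, threshold=0.8):
    """Group citation indices whose pairwise confidence reaches the threshold.

    `conf` is a square matrix; only the upper triangle (i < j) is read.
    """
    n = len(conf)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if conf[i][j] >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical final matrix for four citations.
FINAL = [
    [1.0, 0.9912, 0.30, 0.10],
    [0.0, 1.0,    0.85, 0.05],
    [0.0, 0.0,    1.0,  0.20],
    [0.0, 0.0,    0.0,  1.0],
]
print(group_citations(FINAL))  # [[0, 1, 2], [3]]
```

Citations 0 and 2 end up together even though their direct confidence is only 0.30, because each is confidently linked to citation 1.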
![Page 105: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/105.jpg)
Experimental Results
![Page 106: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/106.jpg)
Application #8
Accessing the Hidden Web
![Page 107: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/107.jpg)
Obtaining Data Behind Forms
• Web information is stored in databases
• Databases are accessed through forms
• Forms are designed in various ways
![Page 108: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/108.jpg)
Hidden Web Extraction System
[Figure: system diagram with the components User Query (“Find green cars costing no more than $9000.”), Site Form, Application Extraction Ontology, Input Analyzer, Retrieved Page(s), Output Analyzer, and Extracted Information]
![Page 109: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/109.jpg)
Application #9
Ontology Discovery & Generation
![Page 110: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/110.jpg)
TANGO: Table Analysis for Generating Ontologies
• Recognize and normalize table information
• Construct mini-ontologies from tables
• Discover inter-ontology mappings
• Merge mini-ontologies into a growing ontology
![Page 111: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/111.jpg)
Recognize Table Information
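| Country | Population (July 2001 est.) | Religion: Albanian Orthodox | Religion: Muslim | Religion: Roman Catholic | Religion: Shi’a Muslim | Religion: Sunni Muslim | Religion: other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Afghanistan | 26,813,057 | | | | 15% | 84% | 1% |
| Albania | 3,510,484 | 20% | 70% | 30% | | | |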
![Page 112: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/112.jpg)
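Construct Mini-Ontology

| Country | Population (July 2001 est.) | Religion: Albanian Orthodox | Religion: Muslim | Religion: Roman Catholic | Religion: Shi’a Muslim | Religion: Sunni Muslim | Religion: other |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Afghanistan | 26,813,057 | | | | 15% | 84% | 1% |
| Albania | 3,510,484 | 20% | 70% | 30% | | | |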
![Page 113: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/113.jpg)
Discover Mappings
![Page 114: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/114.jpg)
Merge
![Page 115: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/115.jpg)
Application #10
Challenging Applications (e.g., Bioinformatics)
![Page 116: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/116.jpg)
Large Extraction Ontologies
![Page 117: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/117.jpg)
Complex Semi-Structured Pages
![Page 118: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/118.jpg)
Additional Analysis Opportunities
Sibling Page Comparison
Semi-automatic Lexicon Update
Seed Ontology Recognition
![Page 119: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/119.jpg)
Sibling Page Comparison
![Page 120: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/120.jpg)
Sibling Page Comparison
Attributes
![Page 121: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/121.jpg)
Sibling Page Comparison
![Page 122: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/122.jpg)
Sibling Page Comparison
![Page 123: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/123.jpg)
Semi-automatic Lexicon Update
Additional Protein Names
Additional Source Species or Organisms
![Page 124: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/124.jpg)
nucleus;
nucleus;zinc ion binding;nucleic acid binding;
zinc ion binding;nucleic acid binding;
linear;
NP_079345;
9606;
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo;
NP_079345;
Homo sapiens;human;
GTTTTTGTGTT……….ATAAGTGCATTAACGGCCCACATG;
FLJ14299
msdspagsnprtpessgsgsgg………tagpyyspyalygqrlasasalgyq;
hypothetical protein FLJ14299;
8;eight;
“8:?p\s?12”;“8:?p11.2”;“8:?p11.23”;:: “37,?612,?680”;
“37,?610,?585”;
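The quoted entries above read as regular-expression lexicon patterns. The sketch below shows how such patterns could act as recognizers. The patterns are copied from the slide; their grouping into two lists, the `recognizes` helper, and the sample strings are assumptions made for illustration.

```python
import re

# Patterns copied from the slide; the split into two lists is an assumption.
LOCATION_PATTERNS = [r"8:?p\s?12", r"8:?p11.2", r"8:?p11.23"]
POSITION_PATTERNS = [r"37,?612,?680", r"37,?610,?585"]

def recognizes(patterns, text):
    """Return True if any lexicon pattern matches the whole string."""
    return any(re.fullmatch(p, text) is not None for p in patterns)

ALL_PATTERNS = LOCATION_PATTERNS + POSITION_PATTERNS
samples = ["8p12", "8:p 12", "8p11.23", "37612680", "37,610,585", "9p12"]
results = {s: recognizes(ALL_PATTERNS, s) for s in samples}
```

The optional `:`, `\s`, and `,` let one pattern cover several surface forms of the same value. Note that the `.` in `8:?p11.2` is unescaped as printed, so it matches any character in that position, not only a literal period.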
Seed Ontology Recognition
![Page 125: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/125.jpg)
Seed Ontology Recognition
![Page 126: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/126.jpg)
Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges
![Page 127: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/127.jpg)
Limitations and Pragmatics
Data-Rich, Narrow Domain
Ambiguities ~ Context Assumptions
Incompleteness ~ Implicit Information
Common Sense Requirements
Knowledge Prerequisites
…
![Page 128: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/128.jpg)
Busiest Airport in 2003?
Chicago
- 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
- 931,000 Landings (Federal Aviation Admin.)
Atlanta
- 58,875,694 Passengers (Sep., latest numbers available)
Memphis
- 2,494,190 Metric Tons (Airports Council Int’l.)
![Page 131: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/131.jpg)
Busiest Airport in 2003?
Chicago
- 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
- 931,000 Landings (Federal Aviation Admin.)
Atlanta
- 58,875,694 Passengers (Sep., latest numbers available)
Memphis
- 2,494,190 Metric Tons (Airports Council Int’l.)
Ambiguous
Whom do we trust? (How do they count?)
![Page 132: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/132.jpg)
Busiest Airport in 2003?
Chicago
- 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
- 931,000 Landings (Federal Aviation Admin.)
Atlanta
- 58,875,694 Passengers (Sep., latest numbers available)
Memphis
- 2,494,190 Metric Tons (Airports Council Int’l.)
Important qualification
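The airport example can be made concrete with a small sketch: "busiest" has an answer only once a metric is chosen, and two sources may disagree on the same metric. The figures are those on the slide; the record fields and the helper names `busiest_by_metric` and `conflicts` are illustrative assumptions.

```python
from collections import defaultdict

# Figures as printed on the slide; field names are illustrative.
CLAIMS = [
    {"airport": "Chicago", "metric": "landings", "value": 928_735,
     "source": "Nat. Air Traffic Controllers Assoc."},
    {"airport": "Chicago", "metric": "landings", "value": 931_000,
     "source": "Federal Aviation Admin."},
    {"airport": "Atlanta", "metric": "passengers", "value": 58_875_694,
     "source": None, "qualifier": "Sep., latest numbers available"},
    {"airport": "Memphis", "metric": "metric tons", "value": 2_494_190,
     "source": "Airports Council Int'l."},
]

def busiest_by_metric(claims):
    """'Busiest' is only defined per metric, so answer once per metric."""
    best = {}
    for claim in claims:
        current = best.get(claim["metric"])
        if current is None or claim["value"] > current["value"]:
            best[claim["metric"]] = claim
    return {metric: claim["airport"] for metric, claim in best.items()}

def conflicts(claims):
    """Find (airport, metric) pairs for which sources report different values."""
    seen = defaultdict(set)
    for claim in claims:
        seen[(claim["airport"], claim["metric"])].add(claim["value"])
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}

answers = busiest_by_metric(CLAIMS)
disagreements = conflicts(CLAIMS)
```

With only the slide's data, each metric names a different airport, and the Chicago landings figure is flagged as disputed. The `qualifier` field is where a note such as "latest numbers available" would have to be kept so that it is not lost during extraction.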
![Page 133: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/133.jpg)
Dow Jones Industrial Average
| | High | Low | Last | Chg |
|---|---|---|---|---|
| 30 Indus | 10527.03 | 10321.35 | 10409.85 | +85.18 |
| 20 Transp | 3038.15 | 2998.60 | 3008.16 | +9.83 |
| 15 Utils | 268.78 | 264.72 | 266.45 | +1.72 |
| 66 Stocks | 3022.31 | 2972.94 | 2993.12 | +19.65 |
44.07
10,409.85
Graphics, Icons, …
![Page 134: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/134.jpg)
Dow Jones Industrial Average
| | High | Low | Last | Chg |
|---|---|---|---|---|
| 30 Indus | 10527.03 | 10321.35 | 10409.85 | +85.18 |
| 20 Transp | 3038.15 | 2998.60 | 3008.16 | +9.83 |
| 15 Utils | 268.78 | 264.72 | 266.45 | +1.72 |
| 66 Stocks | 3022.31 | 2972.94 | 2993.12 | +19.65 |
44.07
10,409.85
Reported on same date
Weekly
Daily
Implicit information: weekly stated in upper corner of page; daily not stated.
![Page 135: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/135.jpg)
Presentation Outline
Grand Challenge
Meaning, Knowledge, Information, Data
Fun and Games with Data
Information Extraction Ontologies
Applications
Limitations and Pragmatics
Summary and Challenges
![Page 136: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/136.jpg)
Some Key Ideas
Data, Information, and Knowledge
Data Frames
• Knowledge about everyday data items
• Recognizers for data in context
Ontologies
• Resilient Extraction Ontologies
• Shared Conceptualizations
Limitations and Pragmatics
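To illustrate "recognizers for data in context", here is a minimal sketch of a data frame as a value pattern plus context keywords. The class name `ExtractionDataFrame`, the price pattern, the keyword list, and the sample ad text are all hypothetical and are not taken from the talk.

```python
import re
from dataclasses import dataclass

@dataclass
class ExtractionDataFrame:
    """A value pattern plus keywords that must appear near a match."""
    name: str
    value_pattern: str
    context_keywords: tuple

    def recognize(self, text, window=20):
        """Return matched values whose surrounding text contains a context keyword."""
        hits = []
        for match in re.finditer(self.value_pattern, text):
            start = max(0, match.start() - window)
            context = text[start:match.end() + window].lower()
            if any(keyword in context for keyword in self.context_keywords):
                hits.append(match.group())
        return hits

# Hypothetical data frame for a price in a car ad.
price = ExtractionDataFrame("Price", r"\$\d{1,3}(?:,\d{3})*", ("price", "asking", "obo"))
found = price.recognize("2001 Honda Civic, asking $4,500 obo. Call 555-1234 after 6.")
rejected = price.recognize("Paid $300 in tax")
```

The dollar amount in the second string matches the value pattern but has no supporting keyword nearby, so it is not reported as a Price.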
![Page 137: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/137.jpg)
Some Research Issues
Building a library of open source data recognizers
Precisely finding and gathering relevant information
• Subparts of larger data
• Scattered data (linked, factored, implied)
• Data behind forms in the hidden web
Improving concept matching
• Indirect matching
• Calculations, unit conversions, alternative representations, …
…
![Page 138: Toward Tomorrow’s Semantic Web An Approach Based on Information Extraction Ontologies David W. Embley Brigham Young University Funded in part by the National.](https://reader030.fdocuments.in/reader030/viewer/2022032704/56649d455503460f94a22666/html5/thumbnails/138.jpg)
Some Research Challenges
Web Page Understanding
• Suppose extraction is ~85% accurate
• Generate a page grammar
  - Increased recall (more extracted)
  - Increased precision (fewer false positives)
  - Fast extraction from same-site sibling pages
Universal Rules for Schema Matching
• Must rules be domain-specific?
• Can some rules be “universal”?
Boundaries of Usefulness: When should machine learning not be used?
Application to Significant Problems
• Like those above
• Many more …
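For reference, recall and precision as used above can be computed from the set of extracted values and a hand-checked gold set. The sketch uses the standard definitions; the sample values are invented for illustration.

```python
def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical values for one page.
gold = {"Honda", "Civic", "2001", "$4,500"}
extracted = {"Honda", "Civic", "2001", "555-1234"}
precision, recall = precision_recall(extracted, gold)
```

Here one false positive (`555-1234`) lowers precision and one missed value (`$4,500`) lowers recall, so both come out at 0.75.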
www.deg.byu.edu
(Machine Learning)