![Page 1: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/1.jpg)
Quality Taxonomies
Jim Nisbet, Senior Vice President of Technology
Semio Corporation
Knowledge Technologies 2001, March 5th, 2001
![Page 2: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/2.jpg)
Ontology / Taxonomy
Root Ontology
Taxonomy Generation
Static Discovery
Dynamic Discovery
![Page 3: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/3.jpg)
What is Quality?
“Best value for the money”
According to this definition, you are entitled to high performance from a costly product; likewise, a low-cost product or service is expected to deliver less. For example, a loose demo delivery is both predictable and acceptable, since its quality is low conformance at low cost.
![Page 4: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/4.jpg)
What is Quality?
“Good Quality is Nominal Conformance”
Taxonomy Quality is defined as Taxonomy Conformance to:
• Valid requirements;
• Explicitly documented development standards; and
• Implicit characteristics that are expected of all professionally developed taxonomies, such as the desire for good maintainability.
![Page 5: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/5.jpg)
Standards
ISO 2788-1986
• International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute)
ISO 5964-1985
• International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute)
ANSI/NISO Z39.19-1993
• National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)
SEMIO Quality Plan v1, 2000
ISO/IEC 13250 Topic Maps
RDF
• Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML
![Page 6: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/6.jpg)
Project Plan
1. Kick-off
2. Requirements Review
3. Lexicon Review
4. Taxonomy Review
5. Tags Review
6. Final Review
![Page 7: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/7.jpg)
1. Kick-off
Objectives
• Purpose
• Scope
• Scale
• Users
• Conditions of receipt
Roles
• Supplier
• Customer
– Admin
– KE
– Experts
– Users
Planning
Training and Transfer
![Page 8: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/8.jpg)
2. Requirements Review
• Sources
• Lexicon
• Ontology
• Install
![Page 9: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/9.jpg)
Sources
• Dispersion (Multiplicity, Size, Homogeneity)
• Refresh
• Access

| Features | Internet, News, E-Mail | Reports, Patents | E-Trade, Logs |
| --- | --- | --- | --- |
| Informative content | - | + | + |
| Number of topics covered | + | + | - |
| Structured information | - | + | + |
| Size of records | - | + | - |
| Number of records | + | - | + |
![Page 10: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/10.jpg)
Typical Patterns
Disparity
• Adjust sources
• Adjust crawl strategy
• Isolate communities / taxonomies
![Page 11: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/11.jpg)
Lexicon
• Vocabularies, etc.
• Substitutions: Acronyms, Synonyms, etc.
• Preferred Keywords: Brand Names, etc.
• Banned Keywords
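As a rough illustration of how the lexicon requirements above might be applied to extracted phrases, here is a minimal Python sketch. The tables `SUBSTITUTIONS`, `PREFERRED`, and `BANNED`, their entries, and both function names are hypothetical and are not part of any Semio product; they only show one way substitutions, preferred keywords, and banned keywords could interact.

```python
import re

# Hypothetical lexicon requirements; every entry is illustrative only.
SUBSTITUTIONS = {"irs": "internal revenue service", "st loan": "short-term loan"}
PREFERRED = {"semio"}
BANNED = {"click here", "home page"}


def normalize_phrase(phrase):
    """Lower-case, collapse whitespace, then apply the substitution table."""
    key = re.sub(r"\s+", " ", phrase.strip().lower())
    return SUBSTITUTIONS.get(key, key)


def apply_lexicon_rules(phrases):
    """Return (kept, preferred): de-duplicated terms and the preferred subset."""
    kept, preferred = [], []
    for phrase in phrases:
        term = normalize_phrase(phrase)
        if term in BANNED:
            continue  # banned keywords never reach the lexicon
        if term not in kept:
            kept.append(term)
        if term in PREFERRED and term not in preferred:
            preferred.append(term)
    return kept, preferred
```

Applying substitutions before the banned-keyword check means an acronym and its expansion are treated as the same term throughout.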
![Page 12: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/12.jpg)
Typical Patterns
Lack of requirements
• Use librarian resources
![Page 13: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/13.jpg)
Ontology
• Thesaurus?
• Is the information domain analysis complete, consistent, and accurate?
• Is the partitioning of the problem complete?
![Page 14: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/14.jpg)
Typical Patterns
Directory versus Taxonomy
• Isolate “directory” branches
Thesaurus versus Taxonomy
• Put an ontology on top of the thesaurus
• Check as early as possible that the thesaurus generics match the extracted lexicon
Very high-level design for top-category requirements
• Plan to work bottom-up
See also Taxonomy (functions, combinations, etc.)
![Page 15: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/15.jpg)
Install
Implementation / Integration:
• Are external and internal interfaces properly defined?
• Are all requirements traceable to the system level?
• Has prototyping been conducted for the user/customer?
• Is performance achievable within the constraints imposed by other system elements?
• Are requirements consistent with schedule, resources, and budget?
![Page 16: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/16.jpg)
Typical Patterns
• Scale
• Security
• Missing Documents
![Page 17: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/17.jpg)
3. Lexicon Review
Coverage
• Extracted words / Words
• (Extracted Index / Index)
Sources benchmarking
• Coverage
• Extraction quality
• Topic distribution
Structure
• Most Frequent Phrases
• Most Productive Generics
Substitutions
Exceptions
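The coverage ratios listed above (extracted words over words, extracted index over index) can be sketched as a simple set ratio. This is only one reading of the slide: the function name `coverage` and the choice to count distinct items rather than occurrences are assumptions.

```python
def coverage(extracted, reference):
    """Share of distinct reference items that also appear among the extracted items."""
    reference_set = set(reference)
    if not reference_set:
        return 0.0  # nothing to cover, so report zero rather than divide by zero
    return len(reference_set & set(extracted)) / len(reference_set)


# Hypothetical example: words found by extraction versus all words in a source.
words_in_source = ["loan", "tax", "asset", "liability"]
extracted_words = ["loan", "tax"]
lexical_coverage = coverage(extracted_words, words_in_source)
```

The same function could be reused per source to benchmark sources against each other.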
![Page 18: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/18.jpg)
Typical Patterns
Low level of frequency / quality for the most meaningful content
• Increase size of value corpus
• Filter and re-import lexicon
![Page 19: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/19.jpg)
4. Taxonomy Review
Taxonomy Operation
• Correctness
• Reliability
• Usability
• Integrity
• Efficiency
Taxonomy Revision
• Maintainability
• Flexibility
• Testability
Taxonomy Transition
• Portability
• Reusability
• Interoperability
![Page 20: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/20.jpg)
Folk Taxonomies Design
The Berlin and Kay model: Taxonomy = Nomenclature + Terminology
[Figure: tree of folk-taxonomy ranks, from a single unique beginner (UB) at Level 0 down through life-forms (lf), generics (g), specifics (s), and varietals (v) at Levels 1 to 4.]

| Level | Rank | Example |
| --- | --- | --- |
| Level 0 | Unique Beginner (UB) | Tax |
| Level 1 | Life Form (lf) | Liability |
| Level 2 | Generic (g) | Loan |
| Level 3 | Specific (s) | Term loan |
| Level 4 | Varietal (v) | Short-term loan |
![Page 21: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/21.jpg)
Correctness
• Accuracy
• Completeness
• Consistency
![Page 22: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/22.jpg)
Accuracy
• Precision
• Recall
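Precision and recall for one category can be computed by comparing the documents a taxonomy assigns to it with the documents a reviewer judges relevant. The sketch below uses the standard definitions; the function name and the document identifiers are hypothetical.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one category.

    assigned: documents the taxonomy placed in the category
    relevant: documents a reviewer judged to belong there
    """
    assigned, relevant = set(assigned), set(relevant)
    hits = len(assigned & relevant)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review of a single category.
precision, recall = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
```

Here two of the four assigned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).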
![Page 23: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/23.jpg)
Completeness
• Taxonomy
• Maps
• Lexicon
• Collection
![Page 24: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/24.jpg)
Concentration Works Against Quality
[Figure labels: Lexicon, Document Collection, Maps, Taxonomy, Tagging]
• Tagging Coverage
• Ontology Coverage
• Hook Coverage
• Map Coverage
• Lexical Coverage
• Collection Coverage
![Page 25: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/25.jpg)
Consistency: Typical Patterns
• Objectivization
• Hyperonymy
• Speciation
• Necessity
![Page 26: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/26.jpg)
Objectivization
Employment
– Firing
– Hiring
– Salaries

• Avoid functional categories
• Don’t mix functions / objects
• Exhaust scripts
• Match idiomatic phrases
![Page 27: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/27.jpg)
Genericity
Parts
– Air Conditioning
– Belts and Hoses
– Body
– Brake System
– Chassis
– Engine
– Exhaust System
– Fuel System
– Glass
– Ignition

• Avoid meronymy
• Don’t mix meronymy / hyperonymy
• Exhaust prototypes
![Page 28: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/28.jpg)
Speciation
Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher (WordNet)

• Avoid “strings” of categories
• Avoid (non-idiom) properties for categories
![Page 29: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/29.jpg)
Necessity
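[Figure: two layouts of a Tax taxonomy built from the labels Individuals, Corporations, Assets, and Liability. In the first, Tax splits into Individuals and Corporations, and each of these is subdivided into Assets and Liability. The arrangement of the second layout and the node letters (B to K) are not recoverable from the transcript.]
• Avoid non-productive categories
• Avoid combinations of categories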
![Page 30: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/30.jpg)
Nomenclature (Design Structure) Quality Index
• Depth
• Width
• Balance
[Figure: folk-taxonomy rank tree, Levels 0 to 4. UB = unique beginner, lf = life-form, g = generic, s = specific, v = varietal.]
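Depth, width, and balance can be measured directly on a taxonomy stored as a parent-to-children mapping. The sketch below is one possible set of definitions, not the one used in the talk: depth is the deepest level, width is the largest number of categories on any level, and balance is assumed to be the ratio of the shallowest leaf depth to the deepest leaf depth. The example tree and all function names are illustrative, and the code assumes the tree has no cycles.

```python
def level_widths(tree, root):
    """Number of categories on each level, level 0 being the root."""
    widths = []
    frontier = [root]
    while frontier:
        widths.append(len(frontier))
        frontier = [child for node in frontier for child in tree.get(node, [])]
    return widths


def leaf_depths(tree, root):
    """Depth of every leaf category, found by an iterative traversal."""
    depths = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        children = tree.get(node, [])
        if children:
            stack.extend((child, depth + 1) for child in children)
        else:
            depths.append(depth)
    return depths


def structure_indices(tree, root):
    """Depth, width, and an assumed balance ratio (1.0 means all leaves equally deep)."""
    widths = level_widths(tree, root)
    depths = leaf_depths(tree, root)
    balance = min(depths) / max(depths) if max(depths) else 1.0
    return {"depth": len(widths) - 1, "width": max(widths), "balance": balance}


# Illustrative tree: one deep branch and one shallow branch.
EXAMPLE_TREE = {
    "Tax": ["Liability", "Assets"],
    "Liability": ["Loan"],
    "Loan": ["Term loan"],
    "Term loan": ["Short-term loan"],
}
indices = structure_indices(EXAMPLE_TREE, "Tax")
```

A low balance value, as in this example, flags a taxonomy where some branches are developed far more deeply than others.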
![Page 31: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/31.jpg)
Complexity Index
Cyclomatic complexity increases with the number of cross-references within the taxonomy, giving an indication of its complexity and of how difficult it is to test.
Taxonomy Complexity Index combines:
• autonomy
• closure
• similarity
• typicality
• commonality
• redundancy
• stability
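To make the link between cross-references and cyclomatic complexity concrete, the sketch below applies McCabe's formula V(G) = E - N + 2P to a taxonomy viewed as a graph, where E counts parent-child links plus cross-references, N counts categories, and P counts connected components. Treating the taxonomy as an undirected graph and the function name are assumptions; the other components of the Taxonomy Complexity Index are not modeled.

```python
def cyclomatic_complexity(links, cross_references):
    """McCabe's V(G) = E - N + 2P for a taxonomy graph.

    links: (parent, child) pairs forming the hierarchy
    cross_references: extra (category, category) pairs outside the hierarchy
    """
    edges = list(links) + list(cross_references)
    parent = {}  # union-find forest used to count connected components

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    components = len({find(node) for node in list(parent)})
    return len(edges) - len(parent) + 2 * components


# Illustrative hierarchy: a pure tree scores 1, each cross-reference adds 1.
HIERARCHY = [
    ("Tax", "Individuals"),
    ("Tax", "Corporations"),
    ("Individuals", "Assets"),
    ("Corporations", "Liability"),
]
tree_only = cyclomatic_complexity(HIERARCHY, [])
with_cross_ref = cyclomatic_complexity(HIERARCHY, [("Assets", "Liability")])
```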
![Page 32: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/32.jpg)
Maturity index
IEEE Standard 982.1-1988 suggests a software maturity index, which can be applied to a taxonomy to provide an indication of its stability.
Maturity Index combines:
• number of modules in the current ontology / taxonomy.
• number of modules in the current ontology / taxonomy that have been changed.
• number of modules added to the current ontology / taxonomy.
• number of modules deleted from the previous version of the ontology / taxonomy.
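The four counts above combine in the software maturity index of IEEE 982.1-1988 as SMI = (M_T - (F_a + F_c + F_d)) / M_T, where M_T is the number of modules in the current version and F_a, F_c, F_d are the numbers added, changed, and deleted. The sketch below applies that formula to a taxonomy; treating a "module" as a category or branch is an assumption, as are the function name and the sample figures.

```python
def maturity_index(total, changed, added, deleted):
    """Maturity index: (M_T - (F_a + F_c + F_d)) / M_T.

    total:   modules in the current ontology / taxonomy (M_T)
    changed: modules in the current version that have been changed (F_c)
    added:   modules added to the current version (F_a)
    deleted: modules deleted from the previous version (F_d)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - (added + changed + deleted)) / total


# Hypothetical release: 200 categories, 10 changed, 6 added, 4 deleted.
smi = maturity_index(total=200, changed=10, added=6, deleted=4)
```

The index approaches 1.0 as fewer modules change between versions, which is read as the taxonomy stabilizing.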
![Page 33: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/33.jpg)
5. Tags Review
• Document coverage
• Concepts coverage

<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
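A tag file like the one above can be read with Python's standard `xml.etree.ElementTree` module to estimate document coverage and concept coverage. This is a sketch under assumptions: the element names are taken from the sample, while the definitions of the two coverage ratios (tagged documents over collection documents, used categories over all categories) and the function names are illustrative guesses rather than the definitions used in the talk.

```python
import xml.etree.ElementTree as ET

# The sample tag set from the slide, reproduced so the sketch is self-contained.
TAGSET = """<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>"""


def read_tags(xml_text):
    """Map each document URL to a {tag name: weight} dictionary."""
    root = ET.fromstring(xml_text)
    tags_by_doc = {}
    for document in root.iter("document"):
        url = document.findtext("docurl", default="").strip()
        tags_by_doc[url] = {
            tag.findtext("tagname", default="").strip(): float(
                tag.findtext("weight", default="0")
            )
            for tag in document.findall("tag")
        }
    return tags_by_doc


def document_coverage(tags_by_doc, collection_urls):
    """Assumed definition: share of collection documents carrying at least one tag."""
    urls = set(collection_urls)
    tagged = {url for url, tags in tags_by_doc.items() if tags}
    return len(tagged & urls) / len(urls) if urls else 0.0


def concept_coverage(tags_by_doc, category_names):
    """Assumed definition: share of taxonomy categories used as a tag at least once."""
    names = set(category_names)
    used = {name for tags in tags_by_doc.values() for name in tags}
    return len(used & names) / len(names) if names else 0.0
```

For example, `document_coverage(read_tags(TAGSET), urls)` would be called with the list of URLs in the crawled collection, and `concept_coverage` with the list of category names in the taxonomy.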
![Page 34: Quality Taxonomies Jim Nisbet Senior Vice President of Technology Semio Corporation Knowledge Technologies 2001 March 5 th, 2001.](https://reader036.fdocuments.in/reader036/viewer/2022062515/56649c7d5503460f94932a42/html5/thumbnails/34.jpg)
6. Final Review
• Receipt
• Maintenance