Topic Maps for Association Rule Mining
![Page 1: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/1.jpg)
Topic Maps for
Association Rule Mining
Tomáš Kliegr, Jan Zemánek, Marek Ovečka
Department of Information and Knowledge Engineering
Faculty of Informatics and Statistics
University of Economics, Prague
![Page 2: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/2.jpg)
Data Mining using CRISP-DM
The goal of data mining is to obtain useful non-trivial patterns from the data.
Analytical Report
![Page 3: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/3.jpg)
Common data mining tasks
- Clustering
- Classification
- Association rules, e.g. Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad)
![Page 4: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/4.jpg)
Association Rule Mining
EXAMPLE
Duration(2y+) and District(Prague) => Loan Quality(good)
Antecedent: Duration(2y+) and District(Prague). Consequent: Loan Quality(good).
Unlike clustering and classification, association rules provide true “nuggets”: rules meeting selected interest measures.
THE PROBLEM WITH INTEREST MEASURES
It is usually not possible to tweak the interest measure thresholds so that only the really interesting rules are output. To be on the safe side, we often get (many!) more rules than desired.
THE QUEST FOR TOPIC MAPS
Select the really interesting rules from the automatically generated output. Help with searching through the results.
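To make the notion of an interest measure concrete, here is a minimal sketch that computes two common measures, support and confidence, for the example rule. The slides do not say which measures were used, so the choice of these two is an assumption, and the four records are invented for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

def matches(record: Record, condition: Record) -> bool:
    """True if the record has every attribute value required by the condition."""
    return all(record.get(attr) == value for attr, value in condition.items())

def support_confidence(records: List[Record],
                       antecedent: Record,
                       consequent: Record) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent => consequent."""
    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical toy data, not taken from the dataset used in the talk.
records = [
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "good"},
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "good"},
    {"Duration": "2y+", "District": "Prague", "LoanQuality": "bad"},
    {"Duration": "1y", "District": "Brno", "LoanQuality": "good"},
]

support, confidence = support_confidence(
    records,
    antecedent={"Duration": "2y+", "District": "Prague"},
    consequent={"LoanQuality": "good"},
)
```

A rule is output when its measures reach the chosen thresholds; lowering the thresholds "to be on the safe side" is what lets many more rules through.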
![Page 5: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/5.jpg)
The quest
- Past results
- Background knowledge
- Redundant rules
Discovered nuggets
More precise tasks or automatic rule filtering
The lingua franca for exchange of data mining models is PMML
![Page 6: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/6.jpg)
Predictive Model Markup Language
• XML Schema
• PMML is the leading standard for statistical and data mining models
• Supported by over 20 vendors and organizations
• Covers the technical part of the CRISP-DM cycle
http://www.dmg.org/pmml_examples/index.html
![Page 7: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/7.jpg)
PMML is “just” an XML Schema
• Developed for deploying mining models
• Good for migration from one data mining environment to another
But:
• No explicit links between nodes
• Verbose
• Self-contained. Lacks support for
– Interlinking multiple PMML documents
– Interlinking PMML with other information
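The "no explicit links" point can be illustrated with a small sketch. In a PMML association model, rules point to itemsets and itemsets point to items only through id strings, so a consumer has to resolve those references itself. The fragment below is a simplified, hypothetical one: it omits the PMML namespace and the other elements and attributes a complete, valid document would carry, and the item values are invented.

```python
import xml.etree.ElementTree as ET

# Simplified fragment, for illustration only (not a complete PMML document).
PMML_FRAGMENT = """
<AssociationModel functionName="associationRules">
  <Item id="1" value="Duration=2y+"/>
  <Item id="2" value="District=Prague"/>
  <Item id="3" value="LoanQuality=good"/>
  <Itemset id="10"><ItemRef itemRef="1"/><ItemRef itemRef="2"/></Itemset>
  <Itemset id="11"><ItemRef itemRef="3"/></Itemset>
  <AssociationRule antecedent="10" consequent="11"
                   support="0.5" confidence="0.67"/>
</AssociationModel>
"""

def resolve_rules(xml_text):
    """Follow the id references by hand and return (antecedent, consequent) item lists."""
    root = ET.fromstring(xml_text.strip())
    # Item ids -> item values
    items = {item.get("id"): item.get("value") for item in root.findall("Item")}
    # Itemset ids -> lists of item values, resolved through ItemRef/@itemRef
    itemsets = {
        itemset.get("id"): [items[ref.get("itemRef")] for ref in itemset.findall("ItemRef")]
        for itemset in root.findall("Itemset")
    }
    # Rules reference itemsets by id in their antecedent/consequent attributes
    return [
        (itemsets[rule.get("antecedent")], itemsets[rule.get("consequent")])
        for rule in root.findall("AssociationRule")
    ]

rules = resolve_rules(PMML_FRAGMENT)
```

Nothing in the XML itself ties `antecedent="10"` to the itemset with `id="10"`; the link exists only by convention, which is the gap that typed associations in a topic map are meant to close.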
![Page 8: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/8.jpg)
Association Rule Mining Ontology
The ontology is a „semantization“ of the PMML XML Schema
DESIGN GUIDELINES
The key design principle was to allow easy transformation of data from PMML to AROn.
SCOPE
The ontology is limited to the subset of PMML relevant to association rule mining: 60 topic types, 50 association types and 20 occurrence types.
USE
No automatic transformation is available yet, but we are working on one using the OKS framework. Currently, data can be input using Ontopoly.
![Page 9: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/9.jpg)
Design guidelines: Elements
• xs:element is mapped to a topic type
• Topics are assigned the same names as PMML nodes
– But respecting spaces between words and capitalization
• Superclasses are introduced for semantically similar XML nodes
• Named elements that are used as children of other elements and carry most of the semantics of their parents are merged with the parent
• If an XML element has a directly corresponding topic type in the ontology, the URI of the XML element within the schema is used as the subject identifier
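The element guidelines can be sketched in a few lines. This is one possible reading, not the authors' transformation: it assumes that "respecting spaces" means splitting CamelCase element names into words, and the schema URI and the `#` fragment convention used for the subject identifier are hypothetical.

```python
import re

# Hypothetical schema location, used only to show the shape of an identifier.
SCHEMA_URI = "http://example.org/pmml-schema.xsd"

def topic_name(element_name: str) -> str:
    """Insert a space wherever a lowercase letter or digit is followed by a capital."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", element_name)

def subject_identifier(element_name: str, schema_uri: str = SCHEMA_URI) -> str:
    """Build a subject identifier from the element's position in the schema (assumed form)."""
    return f"{schema_uri}#{element_name}"

names = [topic_name(n) for n in ("AssociationRule", "Itemset", "MiningSchema", "PMML")]
identifier = subject_identifier("AssociationRule")
```

Here `AssociationRule` becomes the topic type name `Association Rule`, while an all-capitals name such as `PMML` is left unchanged.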
![Page 10: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/10.jpg)
Design guidelines: Attributes
• An enumeration restriction on an attribute is mapped as a topic type with an enumeration superclass (this is a workaround for missing TMCL support in OKS)
• Attributes that could be interpreted as references to other elements become associations
• Other attributes become occurrence types
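The three attribute rules above amount to a small decision procedure, sketched here under stated assumptions. The `AttributeDecl` record and the order in which the rules are checked are illustrative choices, not part of the ontology; the attribute names are taken from the PMML association model.

```python
from dataclasses import dataclass

@dataclass
class AttributeDecl:
    """Hypothetical summary of an attribute declaration in the XML Schema."""
    name: str
    has_enumeration: bool = False      # type is restricted to an enumerated value list
    references_element: bool = False   # value is the id of another element

def map_attribute(attr: AttributeDecl) -> str:
    """Pick the Topic Maps construct for an attribute (check order is an assumption)."""
    if attr.has_enumeration:
        return "topic type with enumeration superclass"
    if attr.references_element:
        return "association type"
    return "occurrence type"

mapping = {
    attr.name: map_attribute(attr)
    for attr in (
        AttributeDecl("functionName", has_enumeration=True),
        AttributeDecl("antecedent", references_element=True),
        AttributeDecl("support"),
    )
}
```

Under this reading, a rule's `antecedent` attribute turns into an association between the rule topic and an itemset topic, while `support` stays a plain occurrence on the rule.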
![Page 11: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/11.jpg)
Design guidelines: Associations
• Names for association types are chosen freely so that they are as descriptive as possible
• Introduce fewer rather than more associations
– Minimizes the effort when populating the ontology from PMML
– Avoids unnecessary inflation of the topic map
• Link only the semantically closest topics
– Additional „soft“ relations can be introduced with inference statements or derived with tolog
![Page 12: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/12.jpg)
Design guidelines: Role types
• Topic types used to map PMML elements are used as role types
– Unless multiple topic types are permitted at an association end. In that case a superclass is used as the role type, or a new role type is introduced
![Page 13: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/13.jpg)
Two alternative association rule representations
- Apriori based (Item-Itemset)
- GUHA based (Boolean Attributes)
![Page 14: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/14.jpg)
Ongoing work
• Support for background knowledge: „already known association rules“
• Support for schema mapping: „linking of background knowledge with mining results“
• Both are already in the ontology, distinguished by the base of the subject identifier
Schema Mapping
• http://keg.vse.cz/sma/XXX
Background Knowledge
• http://keg.vse.cz/bko/xxx
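Telling the two extensions apart by the base of the subject identifier could look like the following minimal sketch. The two base URIs come from the slide; the labels returned and the local names in the usage lines are hypothetical.

```python
SCHEMA_MAPPING_BASE = "http://keg.vse.cz/sma/"
BACKGROUND_KNOWLEDGE_BASE = "http://keg.vse.cz/bko/"

def classify(subject_identifier: str) -> str:
    """Decide which part of the ontology a topic belongs to from its identifier base."""
    if subject_identifier.startswith(SCHEMA_MAPPING_BASE):
        return "schema mapping"
    if subject_identifier.startswith(BACKGROUND_KNOWLEDGE_BASE):
        return "background knowledge"
    return "other"

# Hypothetical local names, for illustration only.
kinds = [
    classify("http://keg.vse.cz/sma/example-mapping"),
    classify("http://keg.vse.cz/bko/example-rule"),
    classify("http://example.org/something-else"),
]
```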
![Page 15: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/15.jpg)
Data Mining Use case
PREDICT LOAN QUALITY
Find client characteristics that could be used to predict their attitude to paying back a loan.
BASED ON PAST RECORDS
Input data: records of loans already granted
![Page 16: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/16.jpg)
The data
• 6181 clients in the PKDD’99 financial dataset
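Data were preprocessed, i.e. source columns were mapped to derived attributes:

| Source column | Source values | Derived attribute | Derived values |
|---|---|---|---|
| District | Prague, Brno, … | district | Prague, Brno, … |
| duration | many distinct values in <0;100> | Duration | <0;12>, <13;23>, <24;inf> |
| status | A, B, C, D | statusAgg | Good, Medium, Bad |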
| ID | sex | age | duration | district | Loan quality |
|---|---|---|---|---|---|
| 5464 | male | 54 | 12 months | Prague | A |
| 5489 | female | 20 | 6 months | Ostrava | E |
| … | … | … | … | … | … |
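The duration preprocessing step can be sketched as a simple binning function. The three interval labels are the ones listed on the slide; treating the raw value as a whole number of months is an assumption based on the sample records, and the function name is hypothetical.

```python
def bin_duration(months: int) -> str:
    """Map a raw loan duration in months to one of the three Duration intervals."""
    if months < 0:
        raise ValueError("duration must be non-negative")
    if months <= 12:
        return "<0;12>"
    if months <= 23:
        return "<13;23>"
    return "<24;inf>"

# The two sample records shown on the slide, plus a two-year loan.
binned = [bin_duration(m) for m in (12, 6, 24)]
```

Reducing many distinct values to a handful of intervals keeps the number of candidate items, and therefore of candidate rules, manageable for the rule learner.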
![Page 17: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/17.jpg)
Preprocessed data
Association Rule Learner
• … and perhaps 9997 other association rules
![Page 18: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/18.jpg)
WE CAN’T PRESENT ALL 10,000 RULES TO THE CLIENT
ASK THE CLIENT WHAT HE KNOWS
If loan duration is more than two years and the loan was given in Prague district, we can expect good loan quality.
…background knowledge
![Page 19: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/19.jpg)
Semantize the results
![Page 20: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/20.jpg)
Formalize Background Knowledge
![Page 21: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/21.jpg)
Schema Mapping
• Background knowledge can use a different “vocabulary” than the data
• If we are to use background knowledge in querying, we need to interlink it with the data.
The same approach would apply if we interlink several mining models (PMMLs)
![Page 22: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/22.jpg)
Deleting information with Topic Maps
• Find association rules that subsume background knowledge
Visualization of a tolog query
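The slide shows the filtering as a tolog query, which is not reproduced in this transcript. As a rough stand-in, the idea can be sketched in Python. The definition of subsumption used here is an assumption: a discovered rule is taken to subsume a known rule when it has the same consequent and its antecedent contains every item of the known rule's antecedent. The `Rule` type and the sample rules are illustrative.

```python
from typing import FrozenSet, List, NamedTuple

class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]

def subsumes(rule: Rule, known: Rule) -> bool:
    """Assumed definition: same consequent, and the known antecedent is contained in the rule's."""
    return known.antecedent <= rule.antecedent and known.consequent == rule.consequent

def drop_known(rules: List[Rule], background: List[Rule]) -> List[Rule]:
    """Keep only rules that do not subsume any piece of background knowledge."""
    return [r for r in rules if not any(subsumes(r, k) for k in background)]

background = [
    Rule(frozenset({"Duration(2y+)", "District(Prague)"}), frozenset({"Loan Quality(good)"})),
]
discovered = [
    Rule(frozenset({"Duration(2y+)", "District(Prague)", "Sex(M)"}),
         frozenset({"Loan Quality(good)"})),
    Rule(frozenset({"Sex(M)", "Salary(Low)", "District(Havlickuv Brod)"}),
         frozenset({"Quality(Bad)"})),
]

remaining = drop_known(discovered, background)
```

In this toy run the first discovered rule only restates what the client already knows, so it is removed and the second rule is kept.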
![Page 23: Topic Maps for Association Rule Mining](https://reader034.fdocuments.in/reader034/viewer/2022051208/546c86c4af795985168b4cdf/html5/thumbnails/23.jpg)
Summary
• Methodology for transferring XML Schema to Topic Maps
• Association Rule Mining Ontology based on PMML
• Easily extensible to other data mining algorithms
• Initial attempts to formalize background knowledge
• Initial attempts to use Topic Maps for schema mapping
AROn On-Line: http://maiana.topicmapslab.de/u/lmaicher/tm/kliegr