Mapping Regulations to Industry–Specific Taxonomies Chin Pang Cheng, Gloria T. Lau, Kincho H. Law Engineering Informatics Group, Stanford University June 5, 2007


Mapping Regulations to Industry–Specific Taxonomies

Chin Pang Cheng, Gloria T. Lau, Kincho H. Law

Engineering Informatics Group, Stanford University

June 5, 2007


Motivating Problem

To Legal Practitioners:
- Hierarchical, well-structured
- Precise and concise
- Familiar with regulatory organization systems

To Industry Practitioners:
- Voluminous
- Not trained to read regulations
- More familiar with industry-specific terminology and classification structure


Mapping Regulations to Taxonomies

Possible Cases:
- One-Taxonomy-One-Regulation
- One-Taxonomy-N-Regulation
- N-Taxonomy-One-Regulation
- N-Taxonomy-N-Regulation


One-Taxonomy-One-Regulation

- Simple keyword latching task
- Stemming (e.g. piling → pile, disabled → disable)
- Word interval: the words of a concept may be separated by other words in the regulation text

Concept: “fire alarm system”
Regulation: “… fire alarm and detection system …”
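The stemming and word-interval ideas on this slide can be sketched as follows. This is only an illustration under stated assumptions: `crude_stem` is a hypothetical stand-in for a real stemmer, and `max_gap` (the number of extra words tolerated between consecutive concept words) is an assumed parameter, not something the slides specify.

```python
import re

def crude_stem(word):
    """Hypothetical, very rough stemmer: strips one common suffix and a trailing 'e'."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def concept_matches(concept, text, max_gap=2):
    """True if the stemmed concept words appear in order in the text,
    with at most max_gap other words between consecutive concept words."""
    stems = [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    target = [crude_stem(w) for w in re.findall(r"[a-z]+", concept.lower())]
    if not target:
        return False
    for start, stem in enumerate(stems):
        if stem != target[0]:
            continue
        pos = start
        for want in target[1:]:
            window = stems[pos + 1 : pos + 2 + max_gap]
            if want not in window:
                break
            pos = pos + 1 + window.index(want)
        else:
            return True
    return False

# The slide's example: "and detection" sits between "alarm" and "system".
print(concept_matches("fire alarm system", "… fire alarm and detection system …"))
```

With `max_gap=0` the same call requires the concept words to be adjacent, so the slide's example would no longer match.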


Each taxonomy concept is hyperlinked

“No Matched Sections” for non-matched OmniClass concepts

See other matched related concepts in that section

Inverted Regulations
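One way to picture the inverted view on this slide (from each concept to the sections it matches) is a small index. The sketch below is an assumption about the data layout, not the authors' implementation; the section labels "sec A" and "sec B" and the helper names are hypothetical.

```python
from collections import defaultdict

def invert(section_concepts):
    """Turn {section: [concepts]} into {concept: [sections]}."""
    index = defaultdict(list)
    for section, concepts in section_concepts.items():
        for concept in concepts:
            index[concept].append(section)
    return dict(index)

def lookup(index, concept):
    """Sections matched to a concept, or the placeholder shown for unmatched concepts."""
    return index.get(concept) or "No Matched Sections"

def related_concepts(section_concepts, section, concept):
    """Other concepts matched in the same section."""
    return sorted(c for c in section_concepts.get(section, []) if c != concept)

# Hypothetical matches: section label -> concepts found in that section.
matches = {
    "sec A": ["fire alarm system", "sprinkler system"],
    "sec B": ["fire alarm system"],
}
index = invert(matches)
print(lookup(index, "fire alarm system"))
print(lookup(index, "curtain walls"))
print(related_concepts(matches, "sec A", "fire alarm system"))
```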


One-Taxonomy-N-Regulation

Alabama (AL) regulation Arizona (AZ) regulation


One Regulation as the Base

(AL)

(AZ)


Similarity Comparison on Sections

Core from Lau, Law and Wiederhold (2005):
- Feature extraction (e.g. concepts, measurements)
- Comparison of shared features
- Consideration of hierarchical and referential information

G. Lau, K. Law and G. Wiederhold. “Legal Information Retrieval and Application to E-Rulemaking,” In Proceedings of the 10th International Conference on Artificial Intelligence and Law (ICAIL 2005), Bologna, Italy, pp. 146-154, Jun 6-11, 2005.

[Figure: nodes in comparison, section A (AL regulation) and section U (AZ regulation). Labels: parent, sibling, child, child node, reference node, psc(A), psc(U), ref(U); score components f0, s-refs-psc, psc-psc.]


Inclusion of Regulation Hierarchy

Terminological differences: revealed by neighbor inclusion

[Figure: UFAS 4.13.9 Door Hardware compared with BS8300 12.5.4.2 Door Furniture, together with their parents (4.13 Doors; 12.5.4 Doors) and siblings (4.13.1, 4.13.2, 4.13.3, 4.13.12; 12.5.4.1).]

Uniform Federal Accessibility Standards, 4.13.9 Door Hardware
4.13 Doors
  4.13.1 General
  ...
  4.13.9 Door Hardware
    Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy to grasp with one hand and does not require tight grasping ...
  ...
  4.13.12 Door Opening Force

British Standard 8300, 12.5.4.2 Door Furniture
12.5.4 Doors
  12.5.4.1 Clear Widths of Door Openings
  12.5.4.2 Door Furniture
    Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip and operate by a wheelchair user or ambulant disabled person ...


N-Taxonomy-One-Regulation

- Multiple taxonomies exist in a single industry
- Translation is unavoidable
- E.g. in the architectural, engineering and construction (AEC) industry:
  - Industry Foundation Classes (IFC)
  - CIMsteel Integration Standards (CIS/2)
  - Automating Equipment Information Exchange (AEX)
  - UniFormat™, MasterFormat™
  - etc.
- Possible solution: Merging taxonomy

Unfamiliar taxonomy


Proposed System


Proposed Methodology of Taxonomy Mapping

[F] 903.4.2 Alarms. Approved audible devices shall be connected to every automatic sprinkler system. Such sprinkler water-flow alarm devices shall be activated by water flow equivalent to the flow of a single sprinkler of the smallest orifice size installed in the system. Alarm devices shall be provided on the exterior of the building in an approved location. Where a fire alarm system is installed, actuation of the automatic sprinkler system shall actuate the building fire alarm system.

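[Figure: concepts from two taxonomies, T1 and T2, matched in the section text above. Labels shown: sprinkler system, orifice, fire, alarm, water flow, fire alarm system.]

Taxonomy Mapping:
- Mainly manually nowadays
- Usually term matching (e.g. fire ↔ fire alarm)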


Demonstration in Construction Industry

[Figure: the International Building Code (IBC) serves as the knowledge corpus between Taxonomy 1 (OmniClass) and Taxonomy 2 (ifcXML). Example concepts shown: steel, IfcSlab.]

Corpus: carefully selected (in the same domain)


Relatedness Analysis on Concepts

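Notations:
- a pool of m concepts for a taxonomy
- a corpus of N regulation sections
- frequency vector $\mathbf{c}_i$ is an N-by-1 vector storing the occurrence frequencies of concept i among the N documents
- frequency matrix C is an N-by-m matrix in which the i-th column vector is $\mathbf{c}_i$

Example (m = 4, N = 5; rows are sections 1 to 5, columns are concepts 1 to 4):

$$C = \begin{bmatrix} 1 & 5 & 0 & 2 \\ 0 & 1 & 0 & 0 \\ 3 & 1 & 2 & 0 \\ 0 & 0 & 3 & 0 \\ 1 & 0 & 1 & 0 \end{bmatrix}, \qquad \mathbf{c}_3 = \begin{bmatrix} 0 \\ 0 \\ 2 \\ 3 \\ 1 \end{bmatrix}$$

Concept 3 is matched to Section 4 3 times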
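As a rough sketch of how such a frequency matrix could be assembled, the snippet below builds C from a list of (section, concept, count) records. The record format and the function names are assumptions made for illustration; the counts are chosen to agree with the slide's statement that Concept 3 is matched to Section 4 three times.

```python
def frequency_matrix(records, n_sections, n_concepts):
    """Build an N-by-m matrix (list of rows) from (section, concept, count) records.
    Section and concept numbers are 1-based, as on the slide."""
    C = [[0] * n_concepts for _ in range(n_sections)]
    for section, concept, count in records:
        C[section - 1][concept - 1] += count
    return C

def frequency_vector(C, concept):
    """Column of C for the given 1-based concept number."""
    return [row[concept - 1] for row in C]

# Illustrative match counts for N = 5 sections and m = 4 concepts.
records = [
    (1, 1, 1), (1, 2, 5), (1, 4, 2),
    (2, 2, 1),
    (3, 1, 3), (3, 2, 1), (3, 3, 2),
    (4, 3, 3),
    (5, 1, 1), (5, 3, 1),
]
C = frequency_matrix(records, n_sections=5, n_concepts=4)
c3 = frequency_vector(C, 3)
print(c3)  # the entry for Section 4 is 3
```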


Cosine Similarity Measure

- Common arithmetic measure of similarity to compare documents in text mining
- Finding the angle between two frequency vectors in N dimensions: $\mathbf{c}_i$ and $\mathbf{c}_j$, from Taxonomy 1 and 2 respectively
- Similarity score ∈ [0, 1]
- Represented using dot product and magnitude, the similarity score is given by:

$$\mathrm{Sim}(i, j) = \frac{\mathbf{c}_i \cdot \mathbf{c}_j}{\lVert \mathbf{c}_i \rVert \, \lVert \mathbf{c}_j \rVert}$$
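A minimal sketch of the cosine score between two concept frequency vectors follows. The two vectors are illustrative values, and returning 0.0 for a concept with no matches is an assumption the slides do not address.

```python
import math

def cosine_similarity(c_i, c_j):
    """Dot product of the two frequency vectors divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(c_i, c_j))
    norm_i = math.sqrt(sum(a * a for a in c_i))
    norm_j = math.sqrt(sum(b * b for b in c_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: a concept matched to no section is unrelated to everything
    return dot / (norm_i * norm_j)

# Illustrative frequency vectors over N = 5 sections.
c_i = [0, 0, 2, 3, 1]
c_j = [1, 0, 3, 0, 1]
print(cosine_similarity(c_i, c_j))  # 7 / sqrt(14 * 11)
```

Because frequencies are non-negative, the score stays within [0, 1], which agrees with the range stated on the slide.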


Jaccard Similarity Coefficient

- Statistical measure of the extent of overlap of two vectors in N dimensions, $\mathbf{c}_i$ and $\mathbf{c}_j$, from Taxonomy 1 and 2
- Defined as size of intersection divided by size of union of the vector dimension sets:

$$\mathrm{Jaccard}(i, j) = \frac{\lvert \mathbf{c}_i \cap \mathbf{c}_j \rvert}{\lvert \mathbf{c}_i \cup \mathbf{c}_j \rvert}$$

- For concept relatedness analysis,

$$\mathrm{Sim}(i, j) = \frac{N_{11}}{N_{11} + N_{10} + N_{01}}$$

where
- N11 = number of sections both concepts i and j are matched to
- N10 = number of sections concept i is matched to but not concept j
- N01 = number of sections concept j is matched to but not concept i
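The counts N11, N10 and N01 can be read directly off two frequency vectors by treating any non-zero frequency as a match. The sketch below does that with illustrative vectors; returning 0.0 when neither concept is matched anywhere is an assumption.

```python
def jaccard_similarity(c_i, c_j):
    """N11 / (N11 + N10 + N01), counting a section as matched when its frequency is non-zero."""
    n11 = sum(1 for a, b in zip(c_i, c_j) if a > 0 and b > 0)
    n10 = sum(1 for a, b in zip(c_i, c_j) if a > 0 and b == 0)
    n01 = sum(1 for a, b in zip(c_i, c_j) if a == 0 and b > 0)
    union = n11 + n10 + n01
    return n11 / union if union else 0.0  # assumption: 0.0 when neither concept is matched

# Illustrative frequency vectors over N = 5 sections.
c_i = [0, 0, 2, 3, 1]
c_j = [1, 0, 3, 0, 1]
# Both matched in sections 3 and 5 (N11 = 2), only i in section 4, only j in section 1.
print(jaccard_similarity(c_i, c_j))  # 0.5
```

Note that the frequencies themselves are discarded here; only whether a section is matched matters, unlike the cosine measure.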


Market Basket Model

- Probabilistic measure to find item-item correlation, used in data mining
- Two main elements: (1) set of items; (2) set of baskets
- Association rule $\{i_1, i_2, \ldots, i_k\} \rightarrow j$ means a basket containing all the items $i_1, \ldots, i_k$ is very likely to contain item j
- Confidence of a rule = $\Pr(j \mid i_1, \ldots, i_k)$
- Interest of a rule = $\lvert \Pr(j \mid i_1, \ldots, i_k) - \Pr(j) \rvert$
- Example: Coca-Cola → Pepsi: low confidence but high interest


Market Basket Model (cont’d)

For concept relatedness analysis:
- N11 = number of sections both concepts i and j are matched to
- N01 = number of sections concept j is matched to but not concept i
- N10 = number of sections concept i is matched to but not concept j
- N00 = number of sections both concepts i and j are NOT matched to

Probability of concept j is

$$\Pr(j) = \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}}$$

Confidence of association rule $i \rightarrow j$ is

$$\mathrm{Conf}(i \rightarrow j) = \frac{N_{11}}{N_{11} + N_{10}}$$

Forward similarity of concept i and j is the interest:

$$\mathrm{Sim}(i, j) = \left| \frac{N_{11}}{N_{11} + N_{10}} - \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}} \right|$$
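The forward similarity can be sketched from two frequency vectors as below, taking interest as the absolute difference between the confidence and Pr(j) (the usual market-basket definition, assumed here). The vectors are illustrative, and the 0.0 fallback for a concept i with no matches is an assumption.

```python
def contingency(c_i, c_j):
    """Return (N11, N10, N01, N00), counting a section as matched when its frequency is non-zero."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(c_i, c_j):
        if a > 0 and b > 0:
            n11 += 1
        elif a > 0:
            n10 += 1
        elif b > 0:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def forward_similarity(c_i, c_j):
    """Interest of the rule i -> j: |Conf(i -> j) - Pr(j)|."""
    n11, n10, n01, n00 = contingency(c_i, c_j)
    total = n11 + n10 + n01 + n00
    if total == 0 or n11 + n10 == 0:
        return 0.0  # assumption: undefined confidence is treated as no relatedness
    confidence = n11 / (n11 + n10)
    pr_j = (n11 + n01) / total
    return abs(confidence - pr_j)

# Illustrative frequency vectors over N = 5 sections.
c_i = [0, 0, 2, 3, 1]
c_j = [1, 0, 3, 0, 1]
print(contingency(c_i, c_j))         # (2, 1, 1, 1)
print(forward_similarity(c_i, c_j))  # |2/3 - 3/5|
```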


Asymmetry of Market Basket Model

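Asymmetry of market basket model:

Forward similarity:

$$\mathrm{Sim}(i, j) = \lvert \mathrm{Conf}(i \rightarrow j) - \Pr(j) \rvert = \left| \frac{N_{11}}{N_{11} + N_{10}} - \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}} \right|$$

Backward similarity:

$$\mathrm{Sim}(j, i) = \lvert \mathrm{Conf}(j \rightarrow i) - \Pr(i) \rvert = \left| \frac{N_{11}}{N_{11} + N_{01}} - \frac{N_{11} + N_{10}}{N_{11} + N_{10} + N_{01} + N_{00}} \right|$$

OmniClass concept i         ifcXML concept j          Sim(i, j)   Sim(j, i)
curtain walls               IfcCurtainWall            0.992849    0.992849
sound and signal devices    IfcSwitchingDeviceType    0.998808    0.998808
roof decking                IfcSlab                   0.802344    0.370313
speakers                    IfcAlarmType              0.883194    0.018024
gypsum board                IfcWallType               0.568832    0.029939
concrete                    IfcSlab                   0.119548    0.427615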
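To see where the asymmetry comes from, the sketch below computes both directions from the four section counts, taking interest as the absolute difference between confidence and prior probability (an assumed reading of the formulas). The counts are hypothetical and are not taken from the table; the function assumes each concept is matched to at least one section.

```python
def forward_backward(n11, n10, n01, n00):
    """Return (Sim(i, j), Sim(j, i)) from the four section counts.
    Assumes each concept is matched to at least one section."""
    total = n11 + n10 + n01 + n00
    pr_i = (n11 + n10) / total
    pr_j = (n11 + n01) / total
    conf_i_to_j = n11 / (n11 + n10)
    conf_j_to_i = n11 / (n11 + n01)
    return abs(conf_i_to_j - pr_j), abs(conf_j_to_i - pr_i)

# Hypothetical counts: concept i appears in 1 section, always together with j,
# while concept j appears in 10 of 100 sections.
forward, backward = forward_backward(n11=1, n10=0, n01=9, n00=90)
print(forward, backward)  # roughly 0.9 and 0.09

# When the two concepts always co-occur (N10 = N01 = 0) both directions agree.
same_fwd, same_bwd = forward_backward(n11=3, n10=0, n01=0, n00=97)
print(same_fwd == same_bwd)
```

A narrow concept that always co-occurs with a broad one scores high in the forward direction and low in the backward direction, which is the pattern of the "speakers" and "gypsum board" rows.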


Evaluation of Accuracy

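- Root Mean Square Error (RMSE): difference between the true values and the predicted values. For Taxonomy 1 of m concepts and Taxonomy 2 of n concepts:

$$\mathrm{RMSE} = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \mathrm{true}_{i,j} - \mathrm{predicted}_{i,j} \right)^2}$$

- Precision: fraction of predictions that are correct

$$\mathrm{Precision} = \frac{\lvert \text{Predicted Related} \cap \text{Actually Related} \rvert}{\lvert \text{Predicted Related} \rvert}$$

- Recall: fraction of correct matches that are predicted

$$\mathrm{Recall} = \frac{\lvert \text{Predicted Related} \cap \text{Actually Related} \rvert}{\lvert \text{Actually Related} \rvert}$$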
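The three measures can be sketched for an m-by-n grid of concept pairs as follows. The 0.5 threshold used to decide which pairs count as "related" is an assumption (the slides do not state how scores are turned into predictions), and the small matrices are made-up illustrative values.

```python
import math

def rmse(true, predicted):
    """Root mean square error over all m*n concept pairs."""
    m, n = len(true), len(true[0])
    total = sum((true[i][j] - predicted[i][j]) ** 2 for i in range(m) for j in range(n))
    return math.sqrt(total / (m * n))

def related_pairs(scores, threshold):
    """Set of (i, j) pairs whose score reaches the threshold."""
    return {(i, j) for i, row in enumerate(scores) for j, s in enumerate(row) if s >= threshold}

def precision_recall(true, predicted, threshold=0.5):
    """Precision and recall of predicted related pairs; threshold is an assumed cut-off."""
    predicted_related = related_pairs(predicted, threshold)
    actually_related = related_pairs(true, threshold)
    hits = len(predicted_related & actually_related)
    precision = hits / len(predicted_related) if predicted_related else 0.0
    recall = hits / len(actually_related) if actually_related else 0.0
    return precision, recall

# Made-up 2-by-2 example: true relatedness (0 or 1) and predicted similarity scores.
true = [[1, 0], [0, 1]]
predicted = [[0.9, 0.6], [0.1, 0.2]]
print(rmse(true, predicted))              # sqrt(1.02 / 4)
print(precision_recall(true, predicted))  # (0.5, 0.5)
```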


Evaluation Results

Cosine Similarity: Average among three metrics

Jaccard Similarity: NOT preferred (unacceptably low recall, though high precision)

Market Basket Model: Preferred (lowest RMSE, highest recall)

             Cosine Similarity   Jaccard Similarity   Market Basket Model
RMSE         0.1000              0.1300               0.0825
Precision    0.9130              1.0000               0.7955
Recall       0.3559              0.1186               0.5932

20 concepts from OmniClass, 20 concepts from ifcXML


Conclusion

Mapping industry-specific taxonomy to regulation allows industry practitioners to retrieve regulations faster

Four cases:
- 1-Taxonomy-1-Regulation: simple keyword latching
- 1-Taxonomy-N-Regulation: hierarchy of regulation sections considered
- N-Taxonomy-1-Regulation: 3 similarity analysis metrics introduced (cosine similarity, Jaccard similarity, market basket model)
- N-Taxonomy-N-Regulation: future step


~ Thank You ~