Page 1: REGNET Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold  Contact glau@stanford.edu .

REGNET

Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold

http://eig.stanford.edu/regnet

Contact: glau@stanford.edu

http://eig.stanford.edu/glau

An E-Government Infrastructure for Regulation Parsing and Relatedness Analysis

Page 2: Motivation

  • Multiple sources of regulations
      • Multiple jurisdictions: federal, state, local, etc.
      • Different formats, terminologies, contexts
      • Amending rules, conflicting ideas

[Figure: sample regulations in different formats: UK DDA in HTML, ADAAG in HTML, IBC in PDF]

Page 3: Motivation

  • Multiple sources of regulations
      • Multiple jurisdictions: federal, state, local, etc.
      • Different formats, terminologies, contexts
      • Amending rules, conflicting ideas
  • Need for a repository
      • Locate relevant information
      • E.g., small business: penalty fees for violations
  • Need for an analysis tool
      • Complexity of regulations
      • Multiple jurisdictions
      • Understanding of regulations and their relationships

Page 4: Example 1: Related Provisions

ADAAG Appendix 4.6.3

… Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.

CBC 1129B.4.3

… Ramps shall not encroach into any parking space.

Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces …

CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.

Page 5: Example 2: Related but Conflicting Provisions

ADAAG 4.7.2 Slope. … Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes…

CBC 1127B.5.5 Beveled lip. The lower end of each curb ramp shall have a ½ inch (13 mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.

ADAAG focuses on wheelchair traversal; CBC focuses on the visually impaired when using a cane.

Page 6: Scope

  1. Overview: examples of system capabilities
  2. Repository development
  3. Relatedness analysis

[Figure: system architecture in two parts.
Repository development: regulations in HTML, PDF, plain text, etc.; shallow parser; XML regulations; feature extractor; refined XML regulations. Feature labels (generic and domain-specific): concepts (Semio), measurements, exceptions, definitions, author-prescribed indices, glossary terms, chemicals, effective dates. Sources: Ontology, Domain Expert.
Relatedness analysis (Similarity Analysis Core): inputs are refined XML regulations and domain knowledge (author-prescribed indices, ontology with synonymic information, domain-specific scoring algorithm, ...); feature matching on measurements, concepts, effective dates, drinking water contaminants, ...; base score; score refinements (neighbor inclusion, reference distribution); refined score; discard below-threshold pairs; related pairs.]

Page 7: Overview of System Capabilities: Parsing

Original 40CFR

PART 279—Standards For The Management Of Used Oil

Subpart B – Applicability

…
§ 279.12 Prohibitions.
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
(b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § 279.82(c).
(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices:
(1) Industrial furnaces identified in § 260.10 of this chapter;
(2) Boilers, as defined in § 260.10 of this chapter, that are identified as follows:
(i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes;
….
§ 262.11 Used Oil Specification.
…..

40CFR natural structure

[Figure: tree rooted at 40 CFR 279. Second level: Subpart A, Subpart B, …, Subpart I. Third level: Section 262.10, Section 262.11, Section 262.12, … Fourth level: Subsection (a), Subsection (b), Subsection (c), Subsection (d). Example leaf: "(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units…"]

Page 8: Overview of System Capabilities: Parsing

IBC in 2-columned PDF

XML hierarchy

<regulation id="ibc" name="international building code" type="private">

<regElement id="ibc.1107" name="special occupancies"> …

<regElement id="ibc.1107.2" name="assembly area seating">

<reference id="ibc.1107.2.4.1" times="1" />

<concept name="assembl area" times="1" /> …

<regText>Assembly areas with fixed seating shall comply with Sections … </regText>

<regElement id="ibc.1107.2.1" name="services"> ... </regElement>

</regElement>

</regElement>

</regulation>
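As a rough illustration of how a consumer might read this hierarchy, the sketch below walks nested `regElement` nodes with Python's standard `xml.etree.ElementTree`. It is not part of REGNET: the `walk` helper is hypothetical, and the sample string is a trimmed copy of the fragment above with the elided parts removed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's fragment; elided content ("…") is left out.
SAMPLE = """
<regulation id="ibc" name="international building code" type="private">
  <regElement id="ibc.1107" name="special occupancies">
    <regElement id="ibc.1107.2" name="assembly area seating">
      <reference id="ibc.1107.2.4.1" times="1" />
      <regText>Assembly areas with fixed seating shall comply with Sections ...</regText>
      <regElement id="ibc.1107.2.1" name="services" />
    </regElement>
  </regElement>
</regulation>
"""

def walk(element, depth=0):
    """Yield (depth, id, name) for every regElement nested below `element`."""
    for child in element.findall("regElement"):
        yield depth, child.get("id"), child.get("name")
        yield from walk(child, depth + 1)

root = ET.fromstring(SAMPLE)
outline = list(walk(root))
for depth, reg_id, name in outline:
    # Indent each provision by its depth in the regulation tree.
    print("  " * depth + f"{reg_id}: {name}")
```

Because each provision is its own element, structure-based operations such as finding parents, siblings and children reduce to ordinary tree traversal.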

Page 9: Overview of System Capabilities: Feature Parsing

Extracted features and usages of features:

  • Search Terms / Concepts
  • Definitions
  • Links to References
  • Letter of Interpretation
  • Search by Concept

Page 10: Overview of System Capabilities: Comparisons

Regulation comparison: 40CFR vs. 22CCR

Page 11: Overview of System Capabilities: E-rulemaking

Drafted regulations compared with public comments

[Figure: content of Section 1105.4 shown alongside its 6 related public comments, labeled "1105.4 [6]"]

Page 12: Scope

  1. Overview: examples of system capabilities
  2. Repository development
  3. Relatedness analysis

[Figure: system architecture diagram, as on Page 6]

Page 13: Repository development

[Figure: repository-development part of the architecture diagram. Labels: regulations in HTML, PDF, plain text, etc.; shallow parser; XML regulations; feature extractor; generic features; domain-specific features; concepts (Semio); measurements; exceptions; definitions; author-prescribed indices; glossary terms; chemicals; effective dates; Ontology; Domain Expert; refined XML regulations]

Page 14: Shallow parser

  • Data sources: Americans with Disabilities Act Accessibility Guidelines (ADAAG), Uniform Federal Accessibility Standards (UFAS), Code of Federal Regulations Title 40 (40CFR), UK and Scottish Disability Discrimination Act, etc.
  • Current standard: HTML, PDF, hardcopy...
  • Our system standard: XML
  • Unit of extraction: section/provision

<regElement id="ufas.4.32.1" name="minimum number" asterisk="0">

<regText> Fixed or built-in seating, ... </regText>

<ref name="ufas.4.5" num="1" />

<ref name="ufas.4.32" num="1" />

</regElement>

Page 15: Shallow parser: PDF to basic XML format

[Figure: shallow parser pipeline. Labels: original regulations; single column; plain text; parse table of contents; get structure info; parse according to structure; Webster dictionary; XML regulations tagged with basic document structure]

40cfr.279.12, as extracted from the PDF (hyphenated line breaks preserved):

(a) Surface impoundment prohibition.
Used oil shall not be managed in sur-
face impoundments or waste piles un-
less the units are subject to regulation
under parts 264 or 265 of this chapter.

[Figure: tree rooted at 40 CFR 279, as on Page 7, with the example leaf "(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units…"]

Page 16: Shallow parser: HTML to basic XML format

[Figure: shallow parser pipeline, as on Page 15]

<regulation id="40.cfr" name="code of federal regulations" type="federal"> ...

<regElement id="40.cfr.279.12.c" name="Burning in particular units."> ...

<regElement id="40.cfr.279.12.c.3" name="">

<reference id="40.cfr.264.O" times="1" /> ...

<concept name="waste incinerator" times="1" />

<regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>

</regElement>

</regElement>

</regulation>

Page 17: Shallow parser: extracting references

[Figure: parse tree for the reference "Subpart O of part 264 or 265" within 40.cfr. Grammar symbols shown: REF, ASSUME_LEV0, LEV2’, SUBPART, UL’, UL, BACKREFKEY, LEV1r’, LEV1p, LEV1r, CONN’, LEV1a’, LEV1a, LEV1s, INT, CONN, PART, CONL2, e. Leaf tokens: Subpart, O, of, part, 264, or, 265.]

<regulation id="40.cfr" name="code of federal regulations" type="federal"> ...

<regElement id="40.cfr.279.12.c" name="Burning in particular units"> ...

<regElement id="40.cfr.279.12.c.3" name="">

<reference id="40.cfr.264.O" times="1" /> ...

<concept name="waste incinerator" times="1" />

<regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>

</regElement>

</regElement>

</regulation>
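The slide's parser is grammar-based. As a much simpler stand-in, the sketch below uses a single regular expression to expand a phrase such as "subpart O of parts 264 or 265" into fully qualified identifiers in the `40.cfr.<part>.<subpart>` style seen in the `reference` tag above. The pattern, the `extract_references` function and the assumption that each listed part yields one identifier are all illustrative, not taken from REGNET.

```python
import re

# Hypothetical pattern: "subpart X of part(s) N", with further parts joined
# by ",", "or" or "and". A real grammar would cover many more reference forms.
SUBPART_REF = re.compile(
    r"subpart\s+(?P<subpart>[A-Z]+)\s+of\s+parts?\s+"
    r"(?P<parts>\d+(?:\s*(?:,|or|and)\s*\d+)*)",
    re.IGNORECASE,
)

def extract_references(text, prefix="40.cfr"):
    """Return one identifier per (part, subpart) combination found in `text`."""
    refs = []
    for match in SUBPART_REF.finditer(text):
        subpart = match.group("subpart").upper()
        for part in re.findall(r"\d+", match.group("parts")):
            refs.append(f"{prefix}.{part}.{subpart}")
    return refs

text = ("Hazardous waste incinerators subject to regulation under "
        "subpart O of parts 264 or 265 of this chapter.")
references = extract_references(text)
print(references)
```

The regular expression only handles this one phrasing; nested or back-referencing citations ("this section", "paragraph (c) of ...") are the cases where a grammar like the one in the figure becomes necessary.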

Page 18: Shallow parser: feature extraction

  • Non-structural characteristics specific to a corpus
  • To aid user retrieval of relevant materials
  • For analysis purposes

[Figure: repository-development part of the architecture diagram, as on Page 13]

Page 19: Shallow parser: feature extraction

  • Generic features
      • Concepts: noun phrases
      • Exceptions: negated provisions
      • Definitions: terminologies defined in regulations
  • Domain-specific features
      • Glossary terms: definitions from reference guides
      • Author-prescribed indices: concepts from field handbooks
      • Measurements: e.g., 2 inches max, 4 ppm
      • Chemicals: list of drinking water contaminants from EPA
      • Effective dates: provision updates

[Figure: repository-development part of the architecture diagram, as on Page 13]

Page 20: Example of definition/glossary tags

Original section 3.5 from the ADAAG

3.5 DEFINITIONS.
Accessible. Describes a site, building, facility, or portion thereof …
Clear. Unobstructed.

Refined section 3.5 in XML format

<regElement name="adaag.3.5" title="definitions" asterisk="0">
  <indexTerm name="facility" num="1" />
  <definition>
    <term> accessible </term>
    <definedAs> Describes a site, building, facility, or portion thereof... </definedAs>
  </definition>
  <definition>
    <term> clear </term>
    <definedAs> Unobstructed. </definedAs>
  </definition>
</regElement>

Page 21: Example of indexTerm, concept, measurement & exception tags

Original section 4.6.3 from the UFAS

4.6.3* PARKING SPACES. Parking spaces for disabled people shall be at least 96 in (2440 mm) wide and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9). Parking access aisles shall be part of ...

EXCEPTION: … an adjacent access aisle at least 96 in (2440 mm) wide complying with 4.5...

Refined section 4.6.3 in XML format

<regElement name="ufas.4.6.3" title="parking spaces" asterisk="1">
  <concept name="access aisle" num="3" /> …
  <indexTerm name="accessible circulation route" num="1" />
  <measurement unit="inch" magnitude="96" quantifier="min" />
  <ref name="ufas.4.5" num="1" />
  <regText> Parking spaces for disabled people shall ... </regText>
  <exception> If accessible parking spaces for ... </exception>
</regElement>
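To make the `measurement` tag more concrete, here is a minimal sketch of how a phrase like "at least 96 in" might be turned into a (unit, magnitude, quantifier) triple. The slides do not describe the actual extraction rules, so the regular expression, the unit table and the `extract_measurements` function are assumptions made for illustration only.

```python
import re

# Hypothetical unit table; only units that appear in the slides are listed.
UNIT_NAMES = {"in": "inch", "inch": "inch", "inches": "inch", "mm": "mm", "ppm": "ppm"}

MEASUREMENT = re.compile(
    r"(?:(?P<pre>at least|at most)\s+)?"          # leading quantifier
    r"(?P<magnitude>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>inches|inch|in|mm|ppm)\b"
    r"(?:\s*\([^)]*\))?"                          # skip "(2440 mm)" style equivalents
    r"(?:\s+(?:wide\s+)?(?P<post>min(?:imum)?|max(?:imum)?)\b)?",  # trailing quantifier
    re.IGNORECASE,
)

def extract_measurements(text):
    """Return (unit, magnitude, quantifier) triples; quantifier is 'min', 'max' or ''."""
    found = []
    for m in MEASUREMENT.finditer(text):
        word = (m.group("pre") or m.group("post") or "").lower()
        if word == "at least" or word.startswith("min"):
            quantifier = "min"
        elif word == "at most" or word.startswith("max"):
            quantifier = "max"
        else:
            quantifier = ""
        unit = UNIT_NAMES[m.group("unit").lower()]
        found.append((unit, m.group("magnitude"), quantifier))
    return found

ufas_text = ("Parking spaces for disabled people shall be at least 96 in (2440 mm) wide "
             "and shall have an adjacent access aisle 60 in (1525 mm) wide minimum")
measurements = extract_measurements(ufas_text)
print(measurements)
```

Each triple maps directly onto the `unit`, `magnitude` and `quantifier` attributes of the tag; the parenthesised metric equivalent is deliberately skipped so that one physical requirement is not counted twice.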

Page 22: Usages of extracted features revisited

Extracted features and usages of features:

  • Search Terms / Concepts
  • Definitions
  • Links to References
  • Letter of Interpretation
  • Search by Concept

Page 23: Scope

  1. Overview: examples of system capabilities
  2. Repository development
  3. Relatedness analysis

[Figure: system architecture diagram, as on Page 6]

Page 24: Relatedness analysis

[Figure: Similarity Analysis Core. Inputs: refined XML regulations and domain knowledge (author-prescribed indices, ontology with synonymic information, domain-specific scoring algorithm, ...). Stages: feature matching (measurements, concepts, effective dates, drinking water contaminants, ...); base score; score refinements (neighbor inclusion, reference distribution); refined score; discard below-threshold pairs; related pairs]

Page 25: Relatedness analysis

  • Goal: utilize the structure and referencing of regulations, together with domain knowledge, to obtain a better comparison
  • Measure: similarity score f(A, U) ∈ (0, 1)
  • Nodes A and U are provisions from two different regulation trees

[Figure: ADAAG tree containing node A and UFAS tree containing node U, linked by f ∈ (0, 1). The parent, sibling and child nodes around each node form psc(A) and psc(U); the nodes referenced by U form ref(U). Legend: child node; reference node; nodes in comparison]

Page 26: Base score f_0 computation

  • Linear combination of feature matching:

      $$f_0(A, U) = \frac{\sum_{i=1}^{N} F(A, U, i)}{N}$$

      • F(A, U, i) = similarity score between Sections (A, U) based on feature i
      • N = total number of features

  • Feature matching
      • Based on the vector model, using cosine similarity as the distance between feature vectors
      • Similarity between two documents M and N:

          $$\frac{d_M \cdot d_N}{|d_M| \, |d_N|}$$

      • d_M and d_N are document vectors
      • Cosine is normalized, so with non-negative weights it is always between 0 and 1

[Figure: Similarity Analysis Core diagram, as on Page 24]
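The base score can be sketched in a few lines: compute one cosine per feature type and average them. The code below is an illustration under stated assumptions, not the REGNET implementation: feature vectors are plain dictionaries, all feature types are weighted equally (as the slide's formula suggests), and the names `cosine`, `base_score`, `section_a` and `section_u` are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: an empty vector matches nothing
    return dot / (norm_u * norm_v)

def base_score(features_a, features_u):
    """f_0(A, U): average of F(A, U, i) over the feature types listed for A."""
    names = list(features_a)
    total = sum(cosine(features_a[n], features_u.get(n, {})) for n in names)
    return total / len(names)

# Toy provisions with two feature types each (values are made-up frequencies).
section_a = {
    "concept": {"access aisle": 2, "parking space": 1},
    "measurement": {"96 inch min": 1},
}
section_u = {
    "concept": {"access aisle": 1, "curb ramp": 1},
    "measurement": {"96 inch min": 1},
}
score = base_score(section_a, section_u)
print(round(score, 4))
```

Here the measurement vectors are identical (cosine 1) and the concept vectors overlap only on "access aisle", so the averaged score lands between the two.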

Page 27: Example of feature vectors

  • Traditional term match
      • Each index term i is assigned a positive, non-binary weight w_{i,M} in each document vector d_M
      • Weight selection: frequency of term, or the tf-idf model
          • tf = term frequency (term density)
          • idf = inverse document frequency = log(n/n_i) (term rarity), where n is the number of documents and n_i the number of documents containing term i
      • Excluding stopwords
  • Feature = concept
      • Concept vectors are formed per provision, based on concept frequency in each provision
      • F(provision 1, provision 2, feature = concept) = cosine between the two concept vectors
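A small sketch of the tf-idf weighting applied to concepts, assuming each provision has already been reduced to a list of extracted concept strings. The function name `tf_idf_vectors`, the provision ids and the concept lists are hypothetical; the weighting is the textbook tf × log(n/n_i) form named on the slide.

```python
import math
from collections import Counter

def tf_idf_vectors(provisions):
    """Map {provision_id: [concept, ...]} to {provision_id: {concept: tf * idf}}."""
    n = len(provisions)
    doc_freq = Counter()
    for concepts in provisions.values():
        doc_freq.update(set(concepts))  # count each concept once per provision
    vectors = {}
    for pid, concepts in provisions.items():
        tf = Counter(concepts)
        vectors[pid] = {c: tf[c] * math.log(n / doc_freq[c]) for c in tf}
    return vectors

# Made-up concept lists for three provisions.
provisions = {
    "p1": ["access aisle", "access aisle", "parking space"],
    "p2": ["access aisle", "curb ramp"],
    "p3": ["door hardware"],
}
vectors = tf_idf_vectors(provisions)
print(vectors["p1"])
```

A concept that occurs in every provision gets idf = log(1) = 0 and so contributes nothing to the cosine, which is the "term rarity" effect the slide refers to.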

Page 28: Axis dependency: non-Boolean matching

  • Vector model assumes mutual independence between axes
  • Domain experts do not necessarily agree
      • A measurement of “2 inches max” can be a 70% match to “2 inches”
      • Synonyms exist, e.g., ontology defined for chemicals
  • Limitation observed
      • Need flexibility to model domain knowledge, such as a 0, 50%, 75% and 100% measurement match

[Figure: measurement scores among "2 ppm", "2 ppm min" and "2 ppm max". Scores shown: 1 (three times), 0.75 (twice) and 0.5]

Page 29: Proposed non-Boolean matching model

  • Define a feature matching matrix E
      • E_{ij} = % match between features i and j
      • E.g., a 3-dimensional vector space using “2 ppm”, “2 ppm max” and “2 ft” as the first, second and third measurement axes:

          $$E = \begin{pmatrix} 1 & 0.75 & 0 \\ 0.75 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

  • Vector space transformation
      • Map feature vectors onto an alternate space via matrix D
      • Cosines are computed on the consolidated frequency vectors
      • E.g., similarity based on measurements:

          $$\frac{D m_A \cdot D m_U}{|D m_A| \, |D m_U|}$$

Page 30: Vector space transformation

  • Define D such that E = D^T D is fulfilled
  • Cosine between the consolidated frequency vectors m'_A = D m_A and m'_U = D m_U:

      $$\frac{m'_A \cdot m'_U}{|m'_A| \, |m'_U|}
        = \frac{D m_A \cdot D m_U}{|D m_A| \, |D m_U|}
        = \frac{(D m_A)^T D m_U}{\sqrt{(D m_A)^T D m_A} \, \sqrt{(D m_U)^T D m_U}}
        = \frac{m_A^T D^T D \, m_U}{\sqrt{m_A^T D^T D \, m_A} \, \sqrt{m_U^T D^T D \, m_U}}
        = \frac{m_A^T E \, m_U}{\sqrt{m_A^T E \, m_A} \, \sqrt{m_U^T E \, m_U}}$$

  • Reduces to a Boolean cosine when E = I
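Because the final expression depends only on E, the cosine can be evaluated without ever constructing D. The sketch below does this with plain Python lists, using the 3-axis example matrix from the previous slide ("2 ppm", "2 ppm max", "2 ft"). The function names `quad` and `matched_cosine` and the two sample vectors are assumptions for illustration.

```python
import math

def quad(x, E, y):
    """Bilinear form x^T E y for list-based vectors and a list-of-lists matrix."""
    return sum(x[i] * E[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def matched_cosine(m_a, m_u, E):
    """Cosine of the transformed vectors, computed directly from E = D^T D."""
    denom = math.sqrt(quad(m_a, E, m_a)) * math.sqrt(quad(m_u, E, m_u))
    return quad(m_a, E, m_u) / denom if denom else 0.0

# Axes: "2 ppm", "2 ppm max", "2 ft" (feature matching matrix from the slides).
E = [
    [1.0, 0.75, 0.0],
    [0.75, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

m_a = [1, 0, 0]  # provision A mentions "2 ppm" once
m_u = [0, 1, 0]  # provision U mentions "2 ppm max" once

partial = matched_cosine(m_a, m_u, E)         # partial match through E
boolean = matched_cosine(m_a, m_u, IDENTITY)  # ordinary cosine when E = I
print(partial, boolean)
```

With E the two provisions score 0.75, while the identity matrix recovers the Boolean cosine of 0 for vectors with no axis in common.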

Page 31: Score refinements based on regulation structure

  • Neighbor inclusion
      • Diffusion of similarity between clusters of nodes in the tree
      • Self vs. parent-sibling-child (psc): f_{s-psc}
      • psc vs. psc: f_{psc-psc}

[Figure: ADAAG tree around node A and UFAS tree around node U. The parent, sibling and child nodes of each form psc(A) and psc(U); the comparisons are labeled s-psc and psc-psc]

[Figure: Similarity Analysis Core diagram, as on Page 24]

Page 32: Neighbor inclusion: psc vs. psc

[Figure: ADAAG nodes A1 and A2 with psc(A1) and psc(A2); UFAS nodes U1 and U2 with psc(U1) and psc(U2). Legend: child node; nodes in comparison; spread of similarity; red: similar nodes; blue: dissimilar nodes]

  • Take a linear combination of neighboring pair scores
  • Formulate a neighbor structure matrix N
  • Define the score matrix (written Π here, since the original symbol did not survive extraction; Π_0 holds the base scores)
  • We have Π_{psc-psc} = N_A Π_0 N_U^T

Page 33: Neighbor inclusion: self vs. psc

[Figure: ADAAG nodes A1 and A2 with psc(A2); UFAS nodes U1 and U2 with psc(U1). Legend: child node; nodes in comparison; spread of similarity; red: similar nodes; blue: dissimilar nodes]

  • Take a linear combination of neighbor vs. self scores
  • Formulate a neighbor structure matrix N
  • Define the score matrix (written Π here, since the original symbol did not survive extraction; Π_0 holds the base scores)
  • We have Π_{s-psc} = ½ (Π_0 N_U^T + N_A Π_0)
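The two neighbor-inclusion products (psc vs. psc and self vs. psc) can be sketched with small list-based matrices. The slides do not say how N is filled in, so this example assumes that row i of N spreads equal weight over psc(i), the parent, siblings and children of node i. The tree encoding, the helper names and the base-score values are all hypothetical.

```python
def neighbor_matrix(parent_of):
    """Build N from a tree given as {node: parent}, with the root mapped to None.

    Assumption: row i holds 1/|psc(i)| for every node in psc(i), else 0.
    """
    nodes = list(parent_of)
    col = {name: k for k, name in enumerate(nodes)}
    N = [[0.0] * len(nodes) for _ in nodes]
    for node in nodes:
        parent = parent_of[node]
        psc = {c for c in nodes if parent_of[c] == node}  # children
        if parent is not None:
            psc.add(parent)                               # parent
            psc |= {s for s in nodes if parent_of[s] == parent and s != node}  # siblings
        for other in psc:
            N[col[node]][col[other]] = 1.0 / len(psc)
    return N

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*Y)] for row in X]

def transpose(X):
    return [list(column) for column in zip(*X)]

def psc_psc(N_a, base, N_u):
    """N_A * base * N_U^T: average base score over pairs of neighbors."""
    return matmul(matmul(N_a, base), transpose(N_u))

def s_psc(N_a, base, N_u):
    """1/2 * (base * N_U^T + N_A * base): self vs. the other node's neighbors."""
    left = matmul(base, transpose(N_u))
    right = matmul(N_a, base)
    return [[0.5 * (l + r) for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]

# Two tiny trees: a parent with two children on each side.
N_a = neighbor_matrix({"A": None, "A.1": "A", "A.2": "A"})
N_u = neighbor_matrix({"U": None, "U.1": "U", "U.2": "U"})

# Made-up base scores f_0; rows follow A, A.1, A.2 and columns U, U.1, U.2.
base = [
    [0.2, 0.0, 0.0],
    [0.0, 0.9, 0.4],
    [0.0, 0.6, 0.1],
]
pp = psc_psc(N_a, base, N_u)
sp = s_psc(N_a, base, N_u)
print(pp[2][2], sp[2][2])
```

In this toy case the pair (A.2, U.2) has a low base score of 0.1, but its neighbors (A, A.1) and (U, U.1) are similar, so both refinement matrices give it a noticeably higher value.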

Page 34: Score refinements based on regulation structure

  • Reference distribution
      • Diffusion of similarity between referencing nodes and referenced nodes in the tree
      • E.g., f(A5.3, U6.4(a)) updates f(A2.1, U3.3)

[Figure: ADAAG with Section 2.1 and Section 5.3; UFAS with Section 3.3 and Section 6.4(a). Labels: reference; similar sections: f_0 != 0; no cross reference]

Page 35: Reference distribution: s-ref and ref-ref

[Figure: two panels, each showing ADAAG sections A1, A2, A3 and UFAS sections U1, U2, U3 with reference links: "s-ref comparisons of Sections A2, U2" and "ref-ref comparisons of Sections A2, U2"]

  • Take a linear combination of reference vs. self and reference vs. reference scores
  • Formulate a reference structure matrix R
  • Define the score matrix (written Π here, since the original symbol did not survive extraction; Π_0 holds the base scores)
  • We have Π_{ref-ref} = R_A Π_0 R_U^T and Π_{s-ref} = ½ (Π_0 R_U^T + R_A Π_0)
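The reference-distribution products have the same shape as the neighbor-inclusion ones, with a reference structure matrix in place of N. The sketch below replays the earlier example, where ADAAG Section 2.1 is taken to reference 5.3 and UFAS Section 3.3 is taken to reference 6.4(a). How R is weighted is not stated in the slides; equal weight per outgoing reference is assumed here, and all names and score values are illustrative.

```python
def reference_matrix(sections, refs):
    """R[i][j] = 1/|ref(i)| if section i references section j (assumed weighting)."""
    col = {name: k for k, name in enumerate(sections)}
    R = [[0.0] * len(sections) for _ in sections]
    for src, targets in refs.items():
        for dst in targets:
            R[col[src]][col[dst]] = 1.0 / len(targets)
    return R

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*Y)] for row in X]

def transpose(X):
    return [list(column) for column in zip(*X)]

def ref_ref(R_a, base, R_u):
    """R_A * base * R_U^T: similarity of the sections that A and U point to."""
    return matmul(matmul(R_a, base), transpose(R_u))

def s_ref(R_a, base, R_u):
    """1/2 * (base * R_U^T + R_A * base): self vs. the other side's references."""
    left = matmul(base, transpose(R_u))
    right = matmul(R_a, base)
    return [[0.5 * (l + r) for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]

R_a = reference_matrix(["2.1", "5.3"], {"2.1": ["5.3"]})
R_u = reference_matrix(["3.3", "6.4(a)"], {"3.3": ["6.4(a)"]})

# Made-up base scores: only the referenced sections 5.3 and 6.4(a) are similar.
base = [
    [0.0, 0.0],  # 2.1 vs 3.3, 2.1 vs 6.4(a)
    [0.0, 0.8],  # 5.3 vs 3.3, 5.3 vs 6.4(a)
]
rr = ref_ref(R_a, base, R_u)
sr = s_ref(R_a, base, R_u)
print(rr[0][0], sr[0][0])
```

The base score between 2.1 and 3.3 is zero, yet the ref-ref entry for that pair picks up the 0.8 similarity of the sections they reference, which is the diffusion effect described two slides earlier.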

Page 36: Example of results: UFAS vs BS8300

  • Phrasing difference between American and British regulations

ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy …

bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip …

  • Neighbor similarities imply similarity between the nodes of interest

[Figure: UFAS subtree under 4.13 Doors (4.13.1, 4.13.2, 4.13.3, 4.13.9 Door Hardware, 4.13.12) next to BS8300 subtree under 12.5.4 Doors (12.5.4.1, 12.5.4.2 Door Furniture), with parent and sibling links marked]

Page 37: Example of results: almost identical provisions

Regulation comparison: 40CFR vs. CCR

Page 38: Example of results: e-rulemaking

  • Application domain: e-rulemaking
      • Comparison between a draft of rules and the associated public comments
  • ADAAG Chapter 11, rights-of-way draft
      • Less than 15 pages
      • Over 1400 public comments received within 4 months
      • Comments ~10 MB in size; most are several pages long
  • A new regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed

Page 39: Example of results: e-rulemaking

Regulations compared with public comments

[Figure: content of Section 1105.4 shown alongside its 6 related public comments, labeled "1105.4 [6]"]

Page 40: Example of results: e-rulemaking

Related draft section and public comment

Adaag.1105.4.1
Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians …

Deborah Wood, October 29, 2002
… This often means walk lights that are so short in duration that by the time a person who is blind realizes …

No identified related section

Donna Ring, September 6, 2002
If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills …

Concern not addressed in the draft

Page 41: Conclusions

  • An infrastructure for
      • Repository for regulations
          • Shallow parser
          • Feature extractions
      • Similarity comparison
          • Base score
          • Score refinements
  • Results
      • Comparisons between Federal codes, European codes
      • Application to e-rulemaking
  • Future directions
      • Extension of application to other domains of semi-structured documents
      • Conflict analysis?

Page 42: Thank You!