REGNET
Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold
http://eig.stanford.edu/regnet
http://eig.stanford.edu/glau
An E-Government Infrastructure for Regulation Parsing and Relatedness Analysis
Motivation
Multiple sources of regulations
- Multiple jurisdictions: federal, state, local, etc.
- Different formats, terminologies, and contexts (e.g., the UK DDA and the ADAAG in HTML, the IBC in PDF)
- Amending rules, conflicting ideas
Need for a repository
- To locate relevant information, e.g., the penalty fees a small business faces for violations
Need for an analysis tool
- Complexity of regulations
- Multiple jurisdictions
- Understanding of regulations and their relationships
Example 1: Related Provisions
ADAAG Appendix 4.6.3
… Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.
CBC 1129B.4.3
… Ramps shall not encroach into any parking space.
Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces …
The CBC allows curb ramps to encroach into accessible parking stall access aisles, while the ADAAG disallows encroachment into any portion of the stall.
Example 2: Related but Conflicting Provisions
ADAAG 4.7.2 Slope. … Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes …
CBC 1127B.5.5 Beveled lip. The lower end of each curb ramp shall have a ½ inch (13mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.
The ADAAG focuses on wheelchair traversal; the CBC focuses on visually impaired people who use a cane.
Scope
1. Overview: examples of system capabilities
2. Repository development
3. Relatedness analysis
[Figure: system architecture. Repository development: regulations in HTML, PDF, plain text, etc. → shallow parser → XML regulations → feature extractor, which draws on Semio, an ontology, and a domain expert to tag generic and domain-specific features (concepts, measurements, exceptions, definitions, author-prescribed indices, glossary terms, chemicals, effective dates) → refined XML regulations. Relatedness analysis (Similarity Analysis Core): feature matching on concepts, measurements, author-prescribed indices, drinking water contaminants, effective dates, etc., using domain knowledge (an ontology with synonymic information, a domain-specific scoring algorithm) → base score → score refinements (neighbor inclusion, reference distribution) → refined score → discard below-threshold pairs → related pairs.]
Overview of System Capabilities: Parsing
Original 40CFR:
PART 279—Standards For The Management Of Used Oil
Subpart B – Applicability
…
§ 279.12 Prohibitions.
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
(b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § 279.82(c).
(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices:
(1) Industrial furnaces identified in § 260.10 of this chapter;
(2) Boilers, as defined in § 260.10 of this chapter, that are identified as follows:
(i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes; …
§ 279.11 Used Oil Specification. …
[Figure: 40CFR natural structure: 40 CFR 279 → Subpart A, Subpart B, …, Subpart I → Section 279.10, Section 279.11, Section 279.12, … → Subsections (a) to (d). Example: subsection (a), "Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units …"]
Overview of System Capabilities: Parsing
IBC in a two-column PDF, parsed into an XML hierarchy:
<regulation id="ibc" name="international building code" type="private">
  <regElement id="ibc.1107" name="special occupancies"> …
    <regElement id="ibc.1107.2" name="assembly area seating">
      <reference id="ibc.1107.2.4.1" times="1" />
      <concept name="assembly area" times="1" /> …
      <regText>Assembly areas with fixed seating shall comply with Sections … </regText>
      <regElement id="ibc.1107.2.1" name="services"> ... </regElement>
    </regElement>
  </regElement>
</regulation>
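A minimal Python sketch of how a downstream tool might walk this nested regElement hierarchy with the standard library's xml.etree.ElementTree. The sample is an abridged copy of the fragment above with the elided parts dropped so that it parses, and walk_provisions is a hypothetical helper, not part of the system described here.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the IBC fragment above (elided parts omitted so the XML parses).
SAMPLE = """
<regulation id="ibc" name="international building code" type="private">
  <regElement id="ibc.1107" name="special occupancies">
    <regElement id="ibc.1107.2" name="assembly area seating">
      <reference id="ibc.1107.2.4.1" times="1" />
      <concept name="assembly area" times="1" />
      <regText>Assembly areas with fixed seating shall comply with Sections ...</regText>
      <regElement id="ibc.1107.2.1" name="services" />
    </regElement>
  </regElement>
</regulation>
"""


def walk_provisions(elem, depth=0):
    """Yield (depth, id, name, concepts, references) for each nested regElement, depth-first."""
    for child in elem.findall("regElement"):  # direct children only
        concepts = [c.get("name") for c in child.findall("concept")]
        refs = [r.get("id") for r in child.findall("reference")]
        yield depth, child.get("id"), child.get("name"), concepts, refs
        yield from walk_provisions(child, depth + 1)  # descend into subsections


root = ET.fromstring(SAMPLE.strip())
for depth, pid, name, concepts, refs in walk_provisions(root):
    print("  " * depth + f"{pid} ({name}) concepts={concepts} refs={refs}")
```

Because each provision is its own regElement, the section/subsection nesting of the source document is recovered simply by recursion.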
Overview of System Capabilities: Feature Parsing
[Screenshot: extracted features and their usages: search terms/concepts, definitions, links to references, letters of interpretation, search by concept]
Overview of System Capabilities: Comparisons
[Screenshot: regulation comparison, 40CFR vs. 22CCR]
Overview of System Capabilities: E-rulemaking
[Screenshot: drafted regulations compared with public comments; the content of Section 1105.4 is shown alongside its 6 related public comments, labeled "1105.4 [6]"]
Repository development
[Figure: repository-development part of the system architecture: regulations in HTML, PDF, plain text, etc. → shallow parser → XML regulations → feature extractor (drawing on Semio, an ontology, and a domain expert; features include concepts, measurements, exceptions, definitions, author-prescribed indices, glossary terms, chemicals, effective dates) → refined XML regulations]
Shallow parser
Data sources: Americans with Disabilities Act Accessibility Guidelines (ADAAG), Uniform Federal Accessibility Standards (UFAS), Code of Federal Regulations Title 40 (40CFR), UK and Scottish Disability Discrimination Act, etc.
Current formats: HTML, PDF, hardcopy, …
Our system standard: XML
Unit of extraction: section/provision
<regElement id="ufas.4.32.1" name="minimum number" asterisk="0">
  <regText> Fixed or built-in seating, ... </regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
Shallow parser: PDF → Basic XML format
[Figure: shallow-parser flow: original regulations are converted to single-column plain text; the table of contents is parsed to get structure information; the text is parsed according to that structure (the flow also uses a Webster dictionary); the output is XML regulations tagged with basic document structure]
Example, 40cfr.279.12:
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
[Figure: resulting natural structure of 40 CFR 279: Subpart A, Subpart B, …, Subpart I → sections → subsections (a) to (d), with subsection (a) as the example]
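Text extracted from two-column PDFs breaks words across lines. In the raw text of 40cfr.279.12(a), for instance, "surface" arrives as "sur-" / "face" and "unless" as "un-" / "less". The Python sketch below shows one plausible repair step, assuming a word list plays the role of the Webster dictionary in the shallow-parser flow: a line-final hyphen is removed only if the joined word is in the dictionary, and is kept otherwise (as in "built-in"). DICTIONARY and dehyphenate are hypothetical names, and the slides do not say how the dictionary is actually used.

```python
import re

# Hypothetical stand-in for the dictionary lookup; a real system would load a full word list.
DICTIONARY = {"surface", "unless"}


def dehyphenate(lines, dictionary=DICTIONARY):
    """Join single-column text lines, repairing words split by a line-final hyphen."""
    text = ""
    for line in lines:
        line = line.strip()
        frag = re.search(r"(\w+)-$", text)  # does the text so far end in "frag-"?
        if frag:
            rest = re.match(r"\w+", line)
            joined = frag.group(1) + (rest.group(0) if rest else "")
            if joined.lower() in dictionary:
                text = text[:-1] + line  # drop the hyphen: "sur-" + "face" -> "surface"
            else:
                text = text + line  # keep the hyphen: "built-" + "in" -> "built-in"
        else:
            text = f"{text} {line}" if text else line
    return text


raw = [
    "Used oil shall not be managed in sur-",
    "face impoundments or waste piles un-",
    "less the units are subject to regulation",
    "under parts 264 or 265 of this chapter.",
]
print(dehyphenate(raw))
```

Checking the joined form against a dictionary, rather than always deleting the hyphen, is what keeps genuine compounds such as "built-in" intact.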
Shallow parser: HTML → Basic XML format
<regulation id="40.cfr" name="code of federal regulations" type="federal"> ...
  <regElement id="40.cfr.279.12.c" name="Burning in particular units."> ...
    <regElement id="40.cfr.279.12.c.3" name="">
      <reference id="40.cfr.264.O" times="1" /> ...
      <concept name="waste incinerator" times="1" />
      <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
    </regElement>
  </regElement>
</regulation>
Shallow parser: extracting references
[Figure: parse tree for the reference "subpart O of parts 264 or 265", built from grammar symbols such as REF, ASSUME_LEV0, SUBPART, LEV1, CONN, PART, and INT; the ASSUME_LEV0 node supplies the omitted top level, 40.cfr]
<regulation id="40.cfr" name="code of federal regulations" type="federal"> ...
  <regElement id="40.cfr.279.12.c" name="Burning in particular units"> ...
    <regElement id="40.cfr.279.12.c.3" name="">
      <reference id="40.cfr.264.O" times="1" /> ...
      <concept name="waste incinerator" times="1" />
      <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
    </regElement>
  </regElement>
</regulation>
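The reference grammar itself is not spelled out in the slides. As a much simpler stand-in, the Python sketch below uses a single regular expression to resolve phrases of the form "subpart X of part(s) N [or/and M]" into ids like 40.cfr.264.O. It assumes that the regulation prefix (40.cfr here) comes from context, which is the role ASSUME_LEV0 appears to play. The pattern and function name are hypothetical and cover only this one phrasing.

```python
import re

# Hypothetical pattern for one reference phrasing, e.g. "subpart O of parts 264 or 265".
SUBPART_OF_PARTS = re.compile(
    r"subpart\s+([A-Z]+)\s+of\s+parts?\s+(\d+(?:\s*(?:,|or|and)\s*\d+)*)",
    re.IGNORECASE,
)


def extract_subpart_refs(text, prefix="40.cfr"):
    """Return reference ids such as '40.cfr.264.O', one per (part, subpart) pair found."""
    refs = []
    for m in SUBPART_OF_PARTS.finditer(text):
        subpart = m.group(1).upper()
        for part in re.findall(r"\d+", m.group(2)):  # "264 or 265" -> ["264", "265"]
            refs.append(f"{prefix}.{part}.{subpart}")
    return refs


text = ("Hazardous waste incinerators subject to regulation under "
        "subpart O of parts 264 or 265 of this chapter.")
print(extract_subpart_refs(text))  # ['40.cfr.264.O', '40.cfr.265.O']
```

A grammar-based parser like the one in the figure generalizes better to nested levels, back-references, and connectives than a single regex does.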
Shallow parser: feature extraction
Features are non-structural characteristics specific to a corpus, extracted to aid user retrieval of relevant materials and for analysis purposes.
Shallow parser: feature extraction
Generic features
- Concepts: noun phrases
- Exceptions: negated provisions
- Definitions: terminologies defined in regulations
Domain-specific features
- Glossary terms: definitions from reference guides
- Author-prescribed indices: concepts from field handbooks
- Measurements: e.g., 2 inches max, 4 ppm
- Chemicals: list of drinking water contaminants from the EPA
- Effective dates: provision updates
Example of definition/glossary tags
Original section 3.5 from the ADAAG:
3.5 DEFINITIONS. Accessible. Describes a site, building, facility, or portion thereof … Clear. Unobstructed.
Refined section 3.5 in XML format:
<regElement name="adaag.3.5" title="definitions" asterisk="0">
  <indexTerm name="facility" num="1" />
  <definition>
    <term> accessible </term>
    <definedAs> Describes a site, building, facility, or portion thereof... </definedAs>
  </definition>
  <definition>
    <term> clear </term>
    <definedAs> Unobstructed. </definedAs>
  </definition>
</regElement>
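One simple heuristic for splitting a definitions section like ADAAG 3.5 into term/definition pairs is sketched below in Python. It assumes every entry has the form "Term. Definition text." with a single capitalized word as the term. Multi-word terms, or definitions that contain one-word capitalized sentences, would need a more careful rule. ENTRY, extract_definitions, and definitions_to_xml are hypothetical names, not the system's extractor.

```python
import re
import xml.etree.ElementTree as ET

# Assumed entry layout: "Term. Definition ..." with a single capitalized word as the term.
ENTRY = re.compile(r"([A-Z][a-z]+)\.\s+(.*?)(?=\s+[A-Z][a-z]+\.\s|\s*$)", re.DOTALL)


def extract_definitions(body):
    """Split a run of 'Term. Definition.' entries into (term, definition) pairs."""
    return [(term.lower(), text.strip()) for term, text in ENTRY.findall(body)]


def definitions_to_xml(pairs):
    """Build <definition><term/><definedAs/></definition> elements like the refined XML above."""
    elements = []
    for term, text in pairs:
        definition = ET.Element("definition")
        ET.SubElement(definition, "term").text = term
        ET.SubElement(definition, "definedAs").text = text
        elements.append(definition)
    return elements


# The all-caps heading "DEFINITIONS." is skipped because the term pattern needs lowercase letters.
section = "3.5 DEFINITIONS. Accessible. Describes a site, building, facility, or portion thereof … Clear. Unobstructed."
for element in definitions_to_xml(extract_definitions(section)):
    print(ET.tostring(element, encoding="unicode"))
```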
Example of indexTerm, concept, measurement & exception tags
Original section 4.6.3 from the UFAS:
4.6.3* PARKING SPACES. Parking spaces for disabled people shall be at least 96 in (2440 mm) wide and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9). Parking access aisles shall be part of ...
EXCEPTION: … an adjacent access aisle at least 96 in (2440 mm) wide complying with 4.5...
Refined section 4.6.3 in XML format:
<regElement name="ufas.4.6.3" title="parking spaces" asterisk="1">
  <concept name="access aisle" num="3" /> …
  <indexTerm name="accessible circulation route" num="1" />
  <measurement unit="inch" magnitude="96" quantifier="min" />
  <ref name="ufas.4.5" num="1" />
  <regText> Parking spaces for disabled people shall ... </regText>
  <exception> If accessible parking spaces for ... </exception>
</regElement>
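The measurement tag records a magnitude, a unit, and a quantifier. Below is a hedged Python sketch of a regex-based extractor for phrasings like "at least 96 in (2440 mm) wide" and "60 in (1525 mm) wide minimum". The unit table, the quantifier cues, and the function name are illustrative assumptions, not the system's actual rules. Metric conversions in parentheses are skipped so that each dimension is counted once.

```python
import re

# Illustrative unit aliases (assumption): surface form -> unit name used in the XML tag.
UNITS = {"inches": "inch", "inch": "inch", "in": "inch", "ft": "foot", "mm": "millimeter", "ppm": "ppm"}

MEASURE = re.compile(r"(\d+(?:\.\d+)?)\s*(inches|inch|in|ft|mm|ppm)\b")
# Quantifier cues: "at least"/"at most" right before the number, or min/max shortly after it
# (optionally past a parenthetical conversion and one word such as "wide").
BEFORE_CUE = re.compile(r"\b(at least|at most)\s*$", re.IGNORECASE)
AFTER_CUE = re.compile(r"\s*(?:\([^)]*\)\s*)?(?:\w+\s+)?(min(?:imum)?|max(?:imum)?)\b", re.IGNORECASE)


def extract_measurements(text):
    """Return (magnitude, unit, quantifier) triples; quantifier is 'min', 'max' or None."""
    found = []
    for m in MEASURE.finditer(text):
        # Skip values inside parentheses, such as the metric conversion "(2440 mm)".
        if text.rfind("(", 0, m.start()) > text.rfind(")", 0, m.start()):
            continue
        quantifier = None
        before = BEFORE_CUE.search(text[:m.start()])
        after = AFTER_CUE.match(text[m.end():])
        if before:
            quantifier = "min" if before.group(1).lower() == "at least" else "max"
        elif after:
            quantifier = after.group(1).lower()[:3]  # "minimum" -> "min", "maximum" -> "max"
        found.append((m.group(1), UNITS[m.group(2)], quantifier))
    return found


ufas_463 = ("Parking spaces for disabled people shall be at least 96 in (2440 mm) wide and shall "
            "have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9).")
print(extract_measurements(ufas_463))  # [('96', 'inch', 'min'), ('60', 'inch', 'min')]
```

Each triple maps directly onto the attributes of a measurement tag, e.g. unit="inch" magnitude="96" quantifier="min".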
Usages of extracted features revisited
[Screenshot: extracted features and their usages: search terms/concepts, definitions, links to references, letters of interpretation, search by concept]
Relatedness analysis
[Figure: Similarity Analysis Core: refined XML regulations → feature matching (concepts, measurements, author-prescribed indices, drinking water contaminants, effective dates, etc.), using domain knowledge (an ontology with synonymic information, a domain-specific scoring algorithm) → base score → score refinements (neighbor inclusion, reference distribution) → refined score → discard below-threshold pairs → related pairs]
Relatedness analysis
Goal: use the structure and cross-referencing of regulations, together with domain knowledge, to obtain a better comparison.
Measure: similarity score f(A, U) ∈ [0, 1], where nodes A and U are provisions from two different regulation trees.
[Figure: provision A in the ADAAG tree compared with provision U in the UFAS tree; the comparison also involves their parent, sibling, and child nodes, psc(A) and psc(U), and reference nodes such as ref(U)]
Base score f0 computation
Linear combination of feature matching scores:
$$ f_0(A, U) = \frac{1}{N} \sum_{i=1}^{N} F(A, U, i) $$
where F(A, U, i) is the similarity score between sections A and U based on feature i, and N is the total number of features.
Feature matching: based on the vector model, with cosine similarity between feature vectors as the measure of closeness. The similarity between two documents M and N is
$$ \mathrm{sim}(M, N) = \frac{d_M \cdot d_N}{|d_M|\,|d_N|} $$
where d_M and d_N are the document vectors. The cosine is normalized, and because feature weights are non-negative it always lies between 0 and 1.
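The base score can be sketched in a few lines of Python. This assumes that each provision's features are already stored as sparse weight vectors (dicts from feature value to weight), one per feature type. The feature-type names and the data below are made up for illustration.

```python
import math


def cosine(d_m, d_n):
    """Cosine similarity of two sparse, non-negative weight vectors (dict: feature value -> weight)."""
    dot = sum(w * d_n.get(k, 0.0) for k, w in d_m.items())
    norm = math.sqrt(sum(w * w for w in d_m.values())) * math.sqrt(sum(w * w for w in d_n.values()))
    return dot / norm if norm else 0.0  # an empty vector matches nothing


def base_score(features_a, features_u, feature_types):
    """f0(A, U) = (1/N) * sum over the N feature types of F(A, U, i), with F = cosine."""
    scores = [cosine(features_a.get(i, {}), features_u.get(i, {})) for i in feature_types]
    return sum(scores) / len(scores)


# Made-up feature vectors for two provisions (feature type -> {feature value: frequency}).
A = {"concept": {"access aisle": 3, "parking space": 1}, "measurement": {"96 inch min": 1}}
U = {"concept": {"access aisle": 1}, "measurement": {"96 inch min": 1}}
print(base_score(A, U, ["concept", "measurement"]))  # (3/sqrt(10) + 1) / 2 ≈ 0.974
```

Equal weights 1/N are used here, as in the formula. A weighted linear combination would only change the final averaging step.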
Example of feature vectors
Traditional term match: each index term i is assigned a positive, non-binary weight w_{i,M} in each document vector d_M.
Weight selection: frequency of the term, or the tf-idf model
- tf = term frequency (term density)
- idf = inverse document frequency = log(n/n_i) (term rarity), where n is the number of documents and n_i is the number of documents containing term i
- Stopwords are excluded
Feature = concept
- Concept vectors are formed per provision from the frequency of each concept in that provision
- F(provision 1, provision 2, feature = concept) = cosine between the two concept vectors
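Here is a Python sketch of tf-idf weighting for concept vectors. It uses idf = log(n/n_i) as above and raw concept counts as tf. The toy provisions and their concept lists are invented for illustration, and stopword removal is assumed to have already happened upstream.

```python
import math
from collections import Counter


def tfidf_vectors(provisions):
    """Map provision id -> {concept: tf * idf}, with tf = raw count and idf = log(n / n_i)."""
    n = len(provisions)
    counts = {pid: Counter(concepts) for pid, concepts in provisions.items()}
    doc_freq = Counter()  # n_i: number of provisions mentioning concept i
    for c in counts.values():
        doc_freq.update(c.keys())
    return {
        pid: {concept: tf * math.log(n / doc_freq[concept]) for concept, tf in c.items()}
        for pid, c in counts.items()
    }


def cosine(u, v):
    """Cosine between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Toy concept lists; provision ids and concepts are illustrative only.
provisions = {
    "adaag.4.6.3": ["parking space", "access aisle", "access aisle", "curb ramp"],
    "ufas.4.6.3": ["parking space", "access aisle", "access aisle", "access aisle"],
    "ufas.4.13.9": ["door hardware", "accessible door"],
}
vectors = tfidf_vectors(provisions)
print(cosine(vectors["adaag.4.6.3"], vectors["ufas.4.6.3"]))   # shared concepts: between 0 and 1
print(cosine(vectors["adaag.4.6.3"], vectors["ufas.4.13.9"]))  # no shared concepts: 0.0
```

A concept that occurs in every provision gets idf = log(1) = 0 and so drops out of the comparison. This is the "term rarity" effect the idf factor is meant to capture.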
Axis dependency: non-Boolean matching
The vector model assumes mutual independence between axes, but domain experts do not necessarily agree:
- A measurement of "2 inches max" can be a 70% match to "2 inches"
- Synonyms exist, e.g., an ontology defined for chemicals
Limitation observed: we need flexibility to model domain knowledge, such as a 0, 50%, 75%, or 100% measurement match.
[Figure: measurement match scores among "2 ppm", "2 ppm min", and "2 ppm max": 1 for an exact match, 0.75 and 0.5 for partial matches]
Proposed non-Boolean matching model
Define a feature matching matrix E, where E_ij = % match between features i and j. For example, in a 3-dimensional vector space using "2 ppm", "2 ppm max", and "2 ft" as the first, second, and third measurement axes:
$$ E = \begin{pmatrix} 1 & 0.75 & 0 \\ 0.75 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} $$
Vector space transformation: map feature vectors onto an alternate space via a matrix D; cosines are computed on the consolidated frequency vectors. For example, the similarity based on measurements is
$$ \frac{D m_A \cdot D m_U}{|D m_A|\,|D m_U|} $$
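The non-Boolean cosine can also be computed directly from E as m_Aᵀ E m_U / sqrt(m_Aᵀ E m_A · m_Uᵀ E m_U), which equals the cosine of D m_A and D m_U whenever E = DᵀD. The same bilinear form is known elsewhere as a soft cosine measure. Below is a plain-Python sketch using the 3-axis E above. The function names are illustrative.

```python
import math

# Feature matching matrix E from the slide; axes: "2 ppm", "2 ppm max", "2 ft".
E = [[1.0, 0.75, 0.0],
     [0.75, 1.0, 0.0],
     [0.0, 0.0, 1.0]]


def bilinear(x, M, y):
    """x^T M y for vectors and a square matrix given as plain lists."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def soft_cosine(m_a, m_u, M):
    """m_a^T M m_u / sqrt(m_a^T M m_a * m_u^T M m_u); equals the ordinary cosine when M = I."""
    denom = math.sqrt(bilinear(m_a, M, m_a) * bilinear(m_u, M, m_u))
    return bilinear(m_a, M, m_u) / denom if denom else 0.0


m_a = [1, 0, 0]  # provision A mentions "2 ppm" once
m_u = [0, 1, 0]  # provision U mentions "2 ppm max" once
print(soft_cosine(m_a, m_u, E))  # 0.75, whereas the ordinary cosine would be 0.0
```

Working with E directly avoids computing D at all. D matters only for the interpretation of the result as a cosine in a transformed space.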
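Vector space transformation
Define D such that E = D^T D. The cosine between the consolidated frequency vectors m'_A = D m_A and m'_U = D m_U is then
$$ \frac{m'_A \cdot m'_U}{|m'_A|\,|m'_U|} = \frac{D m_A \cdot D m_U}{|D m_A|\,|D m_U|} = \frac{(D m_A)^T (D m_U)}{\sqrt{(D m_A)^T (D m_A)}\,\sqrt{(D m_U)^T (D m_U)}} = \frac{m_A^T D^T D\, m_U}{\sqrt{m_A^T D^T D\, m_A}\,\sqrt{m_U^T D^T D\, m_U}} = \frac{m_A^T E\, m_U}{\sqrt{m_A^T E\, m_A}\,\sqrt{m_U^T E\, m_U}} $$
This reduces to the ordinary Boolean cosine when E = I.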
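The derivation only needs some D with E = DᵀD. One way to obtain such a D is shown below as a sketch; it is not necessarily the authors' choice. A Cholesky factorization E = L Lᵀ exists when E is symmetric positive definite, which holds for the example E (its eigenvalues are 0.25, 1, and 1.75), and then D = Lᵀ. The Python below uses plain lists and hypothetical helper names to build D and check that cos(D m_A, D m_U) agrees with the E-based formula.

```python
import math


def cholesky(M):
    """Lower-triangular L with M = L L^T; fails if M is not symmetric positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(M[i][i] - s)  # ValueError here means a negative pivot
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def matvec(M, x):
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


E = [[1.0, 0.75, 0.0], [0.75, 1.0, 0.0], [0.0, 0.0, 1.0]]
D = [list(row) for row in zip(*cholesky(E))]  # D = L^T, hence D^T D = L L^T = E

m_a, m_u = [1, 0, 0], [0, 1, 0]
print(cosine(matvec(D, m_a), matvec(D, m_u)))  # ≈ 0.75, matching m_a^T E m_u / (...)
```

If an expert-specified E is not positive definite, this factorization fails. In that case an eigen-decomposition, or working with E directly, would be needed instead.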
Score refinements based on regulation structure
Neighbor inclusion: diffusion of similarity between clusters of nodes in the tree
- Self vs. parent-sibling-child (psc): f_{s-psc}
- psc vs. psc: f_{psc-psc}
[Figure: provision A in the ADAAG tree and provision U in the UFAS tree, each with its parent, siblings, and children (psc(A), psc(U)); s-psc compares a node with the other node's psc, and psc-psc compares the two psc clusters]
Neighbor inclusion: psc vs. psc
[Figure: nodes A1, A2 in the ADAAG tree and U1, U2 in the UFAS tree, with their neighborhoods psc(A1), psc(A2), psc(U1), psc(U2); similarity spreads between neighborhoods (red: similar nodes; blue: dissimilar nodes)]
Take a linear combination of neighboring pair scores:
- Formulate a neighbor structure matrix N
- Define the score matrix F_0, whose (i, j) entry is f_0(A_i, U_j)
- Then
$$ F_{psc\text{-}psc} = N_A\, F_0\, N_U^T $$
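The psc-psc update amounts to two matrix products. Below is a plain-Python sketch with toy data. It assumes each row of N gives equal weight to the parent, siblings, and children of that node, so rows sum to 1. The slides do not specify the weighting, and all numbers here are made up.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def psc_psc(N_A, F0, N_U):
    """F_psc-psc = N_A F0 N_U^T: entry (i, j) combines f0 over pairs (neighbor of A_i, neighbor of U_j)."""
    return matmul(matmul(N_A, F0), transpose(N_U))


# Toy trees: node 0 is the parent of nodes 1 and 2, so psc(0) = {1, 2}, psc(1) = {0, 2}, psc(2) = {0, 1}.
# Assumed weighting: equal weights over each node's psc set (each row sums to 1).
N = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
F0 = [[0.9, 0.1, 0.0],  # made-up base scores f0(A_i, U_j)
      [0.2, 0.8, 0.1],
      [0.0, 0.1, 0.7]]
P = psc_psc(N, F0, N)
print(P[0][0])  # ≈ 0.425: the mean of f0 over pairs drawn from psc(A_0) x psc(U_0)
```

With row-normalized N, every entry of F_psc-psc is a weighted average of base scores and therefore stays in [0, 1].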
Neighbor inclusion: self vs. psc
[Figure: nodes A1, A2 in the ADAAG tree and U1, U2 in the UFAS tree, with neighborhoods psc(A2) and psc(U1); similarity spreads between each node and the other tree's neighborhood (red: similar nodes; blue: dissimilar nodes)]
Take a linear combination of neighbor vs. self scores:
- Formulate a neighbor structure matrix N
- Define the score matrix F_0, whose (i, j) entry is f_0(A_i, U_j)
- Then
$$ F_{s\text{-}psc} = \tfrac{1}{2}\left(F_0\, N_U^T + N_A\, F_0\right) $$
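The self vs. psc term averages two one-sided comparisons. Below is a plain-Python sketch that reuses the same made-up toy N and F0 as in the psc-psc case; the equal-weight N is an assumption, not taken from the slides.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def s_psc(N_A, F0, N_U):
    """F_s-psc = 1/2 (F0 N_U^T + N_A F0)."""
    left = matmul(F0, transpose(N_U))  # (i, j): A_i itself vs. the neighbors of U_j
    right = matmul(N_A, F0)            # (i, j): the neighbors of A_i vs. U_j itself
    return [[0.5 * (l + r) for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


# Same toy trees as before: node 0 is the parent of nodes 1 and 2 (assumed equal weights).
N = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
F0 = [[0.9, 0.1, 0.0],  # made-up base scores f0(A_i, U_j)
      [0.2, 0.8, 0.1],
      [0.0, 0.1, 0.7]]
S = s_psc(N, F0, N)
print(S[0][0])  # ≈ 0.075 = 0.5 * (0.5*0.1 + 0.5*0.0 + 0.5*0.2 + 0.5*0.0)
```

Averaging both directions keeps the score symmetric in how it treats the two regulation trees.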
Score refinements based on regulation structure
Reference distribution: diffusion of similarity between referencing nodes and referenced nodes in the tree. For example, f(A5.3, U6.4(a)) updates f(A2.1, U3.3).
[Figure: ADAAG Section 2.1 references Section 5.3, and UFAS Section 3.3 references Section 6.4(a); there is no cross-reference between the two regulations, but Sections 5.3 and 6.4(a) are similar (f_0 ≠ 0)]
Reference distribution: s-ref and ref-ref
[Figure: ADAAG sections A1 to A3 and UFAS sections U1 to U3 linked by references, illustrating the s-ref and ref-ref comparisons of Sections A2 and U2]
Take a linear combination of reference vs. self and reference vs. reference scores:
- Formulate a reference structure matrix R
- Define the score matrix F_0, whose (i, j) entry is f_0(A_i, U_j)
- Then
$$ F_{ref\text{-}ref} = R_A\, F_0\, R_U^T, \qquad F_{s\text{-}ref} = \tfrac{1}{2}\left(F_0\, R_U^T + R_A\, F_0\right) $$
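Here is a plain-Python sketch of both reference-distribution terms on a toy version of the ADAAG 2.1/5.3 vs. UFAS 3.3/6.4(a) example. It assumes that R[i][k] = 1 when section i references section k; the slides do not give the weighting. The 0.8 base score is made up.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def ref_ref(R_A, F0, R_U):
    """F_ref-ref = R_A F0 R_U^T: A_i vs. U_j scored through the sections they reference."""
    return matmul(matmul(R_A, F0), transpose(R_U))


def s_ref(R_A, F0, R_U):
    """F_s-ref = 1/2 (F0 R_U^T + R_A F0): each section vs. the other side's referenced sections."""
    left = matmul(F0, transpose(R_U))  # (i, j): A_i vs. sections referenced by U_j
    right = matmul(R_A, F0)            # (i, j): sections referenced by A_i vs. U_j
    return [[0.5 * (l + r) for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


# ADAAG sections: index 0 = 2.1, index 1 = 5.3; UFAS sections: index 0 = 3.3, index 1 = 6.4(a).
R_A = [[0, 1], [0, 0]]         # ADAAG 2.1 references 5.3
R_U = [[0, 1], [0, 0]]         # UFAS 3.3 references 6.4(a)
F0 = [[0.0, 0.0], [0.0, 0.8]]  # only 5.3 and 6.4(a) are similar (made-up score)
print(ref_ref(R_A, F0, R_U))  # [[0.8, 0.0], [0.0, 0.0]]: f(2.1, 3.3) picks up f0(5.3, 6.4(a))
print(s_ref(R_A, F0, R_U))    # [[0.0, 0.4], [0.4, 0.0]]
```

This reproduces the slide's point: even without any cross-reference between the two regulations, the similarity of the referenced sections flows back to the referencing pair.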
Example of results: UFAS vs BS8300
Phrasing differences between American and British regulations:
ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy …
bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip …
Neighbor similarities imply similarity between the nodes of interest.
[Figure: UFAS 4.13 Doors (children 4.13.1, 4.13.2, 4.13.3, 4.13.9 Door Hardware, 4.13.12) matched against BS8300 12.5.4 Doors (children 12.5.4.1, 12.5.4.2 Door Furniture); the similar parents and siblings support the match between 4.13.9 and 12.5.4.2]
Example of results: almost identical provisions
[Screenshot: regulation comparison, 40CFR vs. CCR]
Example of results: e-rulemaking
Application domain: e-rulemaking, i.e., comparing a draft of rules with the associated public comments
- ADAAG Chapter 11, rights-of-way draft: fewer than 15 pages
- Over 1400 public comments received within 4 months
- Comments total ~10MB; most are several pages long
- A new regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed
Example of results: e-rulemaking
[Screenshot: regulations compared with public comments; the content of Section 1105.4 is shown alongside its 6 related public comments, labeled "1105.4 [6]"]
Example of results: e-rulemaking
Related draft section and public comment:
adaag.1105.4.1: Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians …
Deborah Wood, October 29, 2002: … This often means walk lights that are so short in duration that by the time a person who is blind realizes …
No identified related section:
Donna Ring, September 6, 2002: If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills …
Concern not addressed in the draft.
Conclusions
An infrastructure for
- Repository for regulations: shallow parser, feature extraction
- Similarity comparison: base score, score refinements
Results
- Comparisons between federal codes and European codes
- Application to e-rulemaking
Future directions
- Extension of the application to other domains of semi-structured documents
- Conflict analysis?
Thank You!