REGNET: An Information Infrastructure for Government Regulations

Stanford University
Gloria Lau, Dr. Shawn Kerrigan, Dr. Kincho Law, Dr. Gio Wiederhold

WITS’03, Dec 13th, 2003

Motivation

- Multiple sources of regulations
  - e.g., federal, state, local
  - Different formats
  - Conflicting ideas
- Need for a repository
  - Locate relevant information, e.g., for a small business
- Need for an analysis tool
  - Complexity of regulations
  - Multiple sources
  - Understanding of regulations and their relationships

Example 1

ADAAG Appendix 4.6.3

… Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.

CBC 1129B.4.3

… Ramps shall not encroach into any parking space.

Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces …

CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.

Example 2

ADAAG 4.7.2 Slope. … Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes …

CBC 1127B.5.5 Beveled lip. The lower end of each curb ramp shall have a ½ inch (13mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.

ADAAG focuses on wheelchair traversal; CBC focuses on visually impaired persons using a cane.

Scope

- Repository development
  - Shallow parser
  - Feature extraction
  - Ontology development
- Automated extraction of related provisions
  - Feature matching
  - Structural matching
  - Application to e-rulemaking
- Compliance assistance using a Q&A system
  - FOPC (first-order predicate calculus) logic implementation
  - Q&A compliance check

Repository development

[Figure: Repository development pipeline. Components: regulations in HTML, PDF, plain text, etc.; shallow parser; XML regulations; feature extractor; Semio (concepts); domain expert; ontology; refined XML regulations. Extracted features: generic features (concepts, measurements, exceptions, definitions) and domain-specific features (author-prescribed indices, glossary terms, chemicals, effective dates).]

Shallow parser

- Data sources
  - Accessibility standards: US, UK and Scotland
  - Drinking water standards in environmental regulations: federal and California
- Current standard: HTML, PDF, hardcopy...
- Our system standard: XML
- Unit of extraction: section

<regElement name="ufas.4.32.1" title="minimum number" asterisk="0">
  <regText> Fixed or built-in seating, ... </regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
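To make the target format concrete, here is a minimal sketch of wrapping one extracted section in a `regElement` like the one above. It assumes the section id, title and text have already been isolated by the parser. The function name `to_reg_element` and the rule that any dotted number in the text (e.g. "4.5") is a cross-reference are hypothetical, not REGNET's shallow parser; `asterisk` is simply set to "0" because the slide does not explain its meaning.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical heuristic: any dotted number such as "4.5" or "4.32" in the
# section text is treated as a cross-reference to another section.
REF_RE = re.compile(r"\b\d+(?:\.\d+)+\b")


def to_reg_element(prefix: str, section_id: str, title: str, text: str) -> ET.Element:
    """Wrap one extracted section in a <regElement>, mirroring the slide's schema."""
    # asterisk="0" is copied from the slide; its meaning is not described there.
    elem = ET.Element("regElement", name=f"{prefix}.{section_id}", title=title, asterisk="0")
    ET.SubElement(elem, "regText").text = text
    # num = how many times each referenced section is mentioned in the text.
    counts = Counter(REF_RE.findall(text))
    for ref, n in sorted(counts.items()):
        if ref != section_id:  # ignore self-mentions
            ET.SubElement(elem, "ref", name=f"{prefix}.{ref}", num=str(n))
    return elem
```

For example, `ET.tostring(to_reg_element("ufas", "4.32.1", "minimum number", "Illustrative text that cites 4.5 and 4.32."), encoding="unicode")` yields an element with two `ref` children, `ufas.4.32` and `ufas.4.5`. The sample sentence is made up; the real section text is elided on the slide.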

Automated Translation to Hierarchical Structure

PART 279—Standards For The Management Of Used Oil

Subpart B – Applicability

…§ 279.12 Prohibitions. (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter. (b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § 279.82(c). (c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices: (1) Industrial furnaces identified in § 260.10 of this chapter; (2) Boilers, as defined in § 260.10 of this chapter, that are identified as follows: (i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes;….

[Figure: Example of the resulting hierarchy. 40 CFR 279 contains Subpart A, Subpart B, …, Subpart I; Subpart B contains Section 279.10, Section 279.11, Section 279.12, …; Section 279.12 contains Subsection (a), Subsection (b) and Subsection (c). The node for Subsection (a) holds its text: "(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter."]
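The step from a flat section body to subsection nodes can be sketched with a sequence-aware marker scan. This is a hypothetical heuristic (`split_subsections` is an illustrative name), not the translation procedure the slides describe: it only accepts the next expected letter, so the in-text citation "§ 279.82(c)" and the nested item "(i)" inside (c) are not taken as new subsections.

```python
import re
from string import ascii_lowercase

# A marker "(x)" at the start of the text or after whitespace, followed by a space.
MARKER_RE = re.compile(r"(?:^|(?<=\s))\(([a-z])\)\s")


def split_subsections(section_text: str) -> dict[str, str]:
    """Split a CFR-style section body into its lettered subsections (a), (b), (c), ...

    A marker opens a new subsection only if it is the next letter in sequence,
    so citations such as "279.82(c)" (no preceding whitespace) and nested items
    such as "(i)" inside (c) (out of sequence) stay inside the current subsection.
    """
    subsections: dict[str, str] = {}
    expected = 0    # index of the next letter we accept
    current = None  # letter of the subsection being collected
    start = 0       # offset where that subsection begins
    for m in MARKER_RE.finditer(section_text):
        if expected < len(ascii_lowercase) and m.group(1) == ascii_lowercase[expected]:
            if current is not None:
                subsections[current] = section_text[start:m.start()].strip()
            current, start = m.group(1), m.start()
            expected += 1
    if current is not None:
        subsections[current] = section_text[start:].strip()
    return subsections
```

Each returned entry would become one subsection node under Section 279.12; nested levels such as (1) and (i) could be split the same way with their own marker sequences.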

Ontology View

Feature extraction

- Generic features: concepts, exceptions, definitions
- Domain-specific features: glossary terms, author-prescribed indices, effective dates, measurements, chemicals (e.g., drinking water contaminants)

XML regulation with features added

Original section 141.11.b from 40 CFR:

§ 141.11 Maximum contaminant levels for inorganic chemicals. (a) The maximum contaminant level for arsenic applies only to community water systems ... (b) The maximum contaminant level for arsenic is 0.05 milligrams per liter for community water systems until January 23, 2006.

Refined section 141.11.b in XML format:

<regElement id="40.cfr.141.11.b" name="">
  <dwc name="arsen" times="1" />
  <concept name="commun water system" times="1" />
  <measurement unit="ppm" size="0.05" quantifier="max" />
  <date to="January 23, 2006" />
  ...
  <regText> The maximum contaminant level for arsenic is 0.05 milligrams per liter for community water systems until January 23, 2006. </regText>
</regElement>
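A rough sketch of how the chemical, measurement and date tags above could be produced from the section text follows. The regular expressions, the `add_features` name and the mapping of milligrams per liter to `ppm` are assumptions for illustration, not the REGNET feature extractor; stemming (which yields names such as "arsen") and concept extraction via Semio are omitted.

```python
import re
import xml.etree.ElementTree as ET

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Hypothetical patterns; a real extractor would need a much richer grammar.
MEASUREMENT_RE = re.compile(
    r"(?P<size>\d+(?:\.\d+)?)\s+(?P<unit>milligrams per liter|mg/l)", re.IGNORECASE)
UNTIL_RE = re.compile(rf"until\s+(?P<to>(?:{MONTHS})\s+\d{{1,2}},\s+\d{{4}})")
# Assumption: mg/L in dilute water is tagged as ppm, matching unit="ppm" on the slide.
UNIT_MAP = {"milligrams per liter": "ppm", "mg/l": "ppm"}


def add_features(reg_id: str, text: str, chemicals: list[str]) -> ET.Element:
    """Build a <regElement> carrying dwc, measurement and date tags plus the raw text."""
    elem = ET.Element("regElement", id=reg_id, name="")
    lowered = text.lower()
    for chem in chemicals:  # drinking-water contaminant tags, with occurrence counts
        times = lowered.count(chem.lower())
        if times:
            ET.SubElement(elem, "dwc", name=chem, times=str(times))
    for m in MEASUREMENT_RE.finditer(text):
        attrs = {"unit": UNIT_MAP[m.group("unit").lower()], "size": m.group("size")}
        if "maximum" in lowered:  # crude cue for an upper limit
            attrs["quantifier"] = "max"
        ET.SubElement(elem, "measurement", attrs)
    for m in UNTIL_RE.finditer(text):  # "until <date>" gives an end date
        ET.SubElement(elem, "date", to=m.group("to"))
    ET.SubElement(elem, "regText").text = text
    return elem
```

Called with the section 141.11.b text and a contaminant list such as `["arsenic"]`, this produces `dwc`, `measurement` and `date` tags analogous to those in the refined XML above.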

Similarity Analysis

[Figure: Similarity analysis core. Refined XML regulations feed feature matching on measurements, exceptions, definitions, author-prescribed indices and glossary terms, giving a base score; neighbor inclusion gives a refined score; reference distribution gives the final score. Pairs below the threshold are discarded as trash, and the remaining pairs are output as related pairs.]

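Similarity Score computation

- Feature matching: the base score is the average of the individual feature scores,

$$f_0 = \frac{\sum_{i \in \text{features}} f_i}{\#\,\text{features}}$$

- Features
  - Concept and index match: tf-idf vector, where tf is the term frequency and idf is the inverse document frequency, $\mathrm{idf} = \log(n / n_i)$, with $n$ the total number of documents and $n_i$ the number of documents containing term $i$
  - Chemical match
  - Measurement match
  - Exception match
  - Effective date match
  - Glossary/definition term match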
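The scoring on this slide can be sketched as follows, assuming each section has already been reduced to a list of (stemmed) terms. Comparing two tf-idf vectors with cosine similarity is an assumption, since the slide does not say how vectors are compared; `base_score` is the plain average $f_0$ over whatever per-feature scores are available.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """tf-idf weights per document, with idf = log(n / n_i) as on the slide."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # n_i for each term
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u: Vector, v: Vector) -> float:
    """Assumed comparison of two tf-idf vectors (0.0 if either vector is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def base_score(feature_scores: Dict[str, float]) -> float:
    """f0: the unweighted average of the per-feature match scores f_i."""
    return sum(feature_scores.values()) / len(feature_scores)
```

A concept match between two sections would be `cosine(vecs[a], vecs[b])`; that value, together with chemical, measurement and other match scores, is then averaged by `base_score`.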

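Score refinements

- Near-tree neighbors
  - Self vs. parent-sibling-child (psc): $f_{s\text{-}psc}$
  - psc vs. psc: $f_{psc\text{-}psc}$

[Figure: Section A in ADAAG and section U in UFAS, each shown with its parent, siblings and children, which form psc(A) and psc(U). The s-psc comparison matches a section against the other section's psc set; the psc-psc comparison matches psc(A) against psc(U).]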
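The near-tree refinement can be illustrated with a small sketch over parent maps. Averaging inside $f_{s\text{-}psc}$ and $f_{psc\text{-}psc}$ and combining the three scores with `max` are hypothetical choices, not necessarily REGNET's weighting; `f0` stands for any base-score function over a pair of section ids.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Tree = Dict[str, Optional[str]]   # section id -> parent id (None for the root)
Score = Callable[[str, str], float]


def psc(tree: Tree, node: str) -> List[str]:
    """Parent, siblings and children of `node` within one regulation tree."""
    parent = tree[node]
    children = [n for n, p in tree.items() if p == node]
    if parent is None:
        return children
    siblings = [n for n, p in tree.items() if p == parent and n != node]
    return [parent] + siblings + children


def refined_score(a: str, u: str, tree_a: Tree, tree_u: Tree, f0: Score) -> float:
    pa, pu = psc(tree_a, a), psc(tree_u, u)
    # s-psc: each section compared with the other section's psc set.
    s_psc = [f0(a, y) for y in pu] + [f0(x, u) for x in pa]
    # psc-psc: every pair drawn from psc(A) x psc(U).
    psc_psc = [f0(x, y) for x in pa for y in pu]
    f_s_psc = mean(s_psc) if s_psc else 0.0
    f_psc_psc = mean(psc_psc) if psc_psc else 0.0
    # Hypothetical combination rule: keep the strongest of the three signals.
    return max(f0(a, u), f_s_psc, f_psc_psc)
```

With this rule, two sections whose own texts match weakly can still score highly when their parents match, which is the effect the UFAS vs BS8300 door example relies on.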

Score refinements

- Reference distribution, $f_{rd}$
  - Not-so-immediate neighbor effect on the score, e.g., $f(A5.3, U6.4(a))$ updates $f(A2.1, U3.3)$

[Figure: In ADAAG, Section 2.1 references Section 5.3; in UFAS, Section 3.3 references Section 6.4(a). There is no cross-reference between the two regulations, but Sections 5.3 and 6.4(a) are similar ($f_0 \neq 0$), so their score updates that of the pair (2.1, 3.3).]
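A one-pass sketch of reference distribution, mirroring the example on this slide: if section a references ra and section u references ru, the score of (ra, ru) is blended into the score of (a, u). The blending weight `alpha` and the single, non-iterated pass are assumptions; the slide does not specify how the update is weighted.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def reference_distribution(scores: Dict[Pair, float],
                           refs_a: Dict[str, List[str]],
                           refs_u: Dict[str, List[str]],
                           alpha: float = 0.5) -> Dict[Pair, float]:
    """Blend the scores of referenced pairs into each referencing pair (one pass)."""
    final = dict(scores)
    for (a, u), base in scores.items():
        # Scores of all pairs (ra, ru) such that a -> ra and u -> ru are references.
        linked = [scores.get((ra, ru), 0.0)
                  for ra in refs_a.get(a, []) for ru in refs_u.get(u, [])]
        if linked:
            final[(a, u)] = (1 - alpha) * base + alpha * sum(linked) / len(linked)
    return final
```

With `scores = {("A2.1", "U3.3"): 0.0, ("A5.3", "U6.4(a)"): 0.8}` and the references 2.1 to 5.3 and 3.3 to 6.4(a), the pair (A2.1, U3.3) rises from 0.0 to 0.4 under the default weight, while (A5.3, U6.4(a)) is unchanged. The numbers are illustrative only.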

Preliminary results: UFAS vs BS8300

Phrasing difference between American and British regulations:

ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy …

bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip …

[Figure: UFAS 4.13 Doors (children include 4.13.1, 4.13.2, 4.13.3, 4.13.9 Door Hardware and 4.13.12) aligned with BS8300 12.5.4 Doors (children include 12.5.4.1 and 12.5.4.2 Door Furniture). Parent and sibling similarities support the match between 4.13.9 and 12.5.4.2.]

Neighbor similarities imply similarity between the nodes of interest.

Preliminary results: e-rulemaking

- Application domain: e-rulemaking
  - Comparison between a draft of the rules and the associated public comments
- ADAAG Chapter 11, rights-of-way draft
  - Fewer than 15 pages
  - Over 1400 public comments received within 4 months
  - Comments total ~10 MB; most are several pages long
- A new regulation draft can easily generate a large amount of data that must be reviewed and analyzed

Preliminary results: e-rulemaking

[Figure: Draft section 1105.4, shown with its content and its 6 related public comments ("1105.4 [6]").]

Preliminary results: e-rulemaking

Related draft section and public comment:

Adaag.1105.4.1: Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians …

Deborah Wood, October 29, 2002: … This often means walk lights that are so short in duration that by the time a person who is blind realizes …

No identified related section:

Donna Ring, September 6, 2002: If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills …

Concern not addressed in the draft.
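Sorting comments into "related to a draft section" and "no identified related section", as in the preliminary e-rulemaking results, amounts to thresholding the final pair scores. In this sketch `triage_comments`, the threshold and the identifiers and scores used in the test are made up for illustration; it is not the procedure used to produce the results shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def triage_comments(scores: Dict[Tuple[str, str], float],
                    threshold: float) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group comments under the draft sections they relate to; list unmatched comments.

    `scores` maps (comment_id, section_id) to a final similarity score.
    """
    related: Dict[str, List[str]] = defaultdict(list)
    all_comments = set()
    for (comment, section), score in sorted(scores.items()):
        all_comments.add(comment)
        if score >= threshold:  # pairs below the threshold are discarded
            related[section].append(comment)
    matched = {c for cs in related.values() for c in cs}
    # Comments with no related section may raise concerns the draft does not address.
    return dict(related), sorted(all_comments - matched)
```

`len(related[section])` gives the per-section comment count of the kind shown as "1105.4 [6]", and the second return value lists candidates for "concern not addressed in the draft".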

Compliance Assistance System

Compliance Issues
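The diagrams for the compliance-assistance slides did not survive extraction. As a hypothetical illustration of a Q&A compliance check, the sketch below hand-codes the arsenic provision of 40 CFR 141.11 quoted earlier as a single rule in Python, standing in for a real first-order logic implementation. The question wording, the answer keys and the `check_arsenic_mcl` name are invented for illustration.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical questions the Q&A front end would ask the user.
QUESTIONS = {
    "community_water_system": "Is the system a community water system? (yes/no)",
    "arsenic_mg_per_l": "What is the measured arsenic level in mg/L?",
    "sample_date": "On what date was the sample taken?",
}
MCL_END = date(2006, 1, 23)  # 141.11(b) applies "until January 23, 2006"


def check_arsenic_mcl(answers: Dict[str, object]) -> Tuple[str, List[str]]:
    """Rule, roughly: violation(s) <- community(s) & arsenic(s, x) & x > 0.05 & date < MCL_END."""
    missing = [k for k in QUESTIONS if k not in answers]
    if missing:  # keep asking until every premise of the rule is known
        return "ask", [QUESTIONS[k] for k in missing]
    # 141.11(a): the arsenic MCL applies only to community water systems.
    if not answers["community_water_system"] or answers["sample_date"] >= MCL_END:
        return "not applicable", []
    if answers["arsenic_mg_per_l"] > 0.05:
        return "violation", ["arsenic exceeds the 0.05 mg/L MCL of 40 CFR 141.11(b)"]
    return "compliant", []
```

An empty answer set returns all three questions; once answered, the same call returns a verdict. A real system would derive the questions from the logical form of many provisions rather than hard-coding one rule.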

Conclusions

- An infrastructure for:
  - Repository development: shallow parser, feature extraction, ontology development
  - Automated extraction of related provisions: feature matching, structural matching, application to e-rulemaking
  - Compliance assistance using a Q&A system: FOPC logic implementation, Q&A compliance check
- Future directions
  - Application to other semi-structured documents
  - Inconsistency identification

Thank You!

Questions?