Legal Data Markup Software: CS501 Design Presentation, November 9th, 2000


Legal Data Markup Software

CS501 Design Presentation
November 9th, 2000

Project Team
  Developers
    Ju Joh
    Sylvia Kwakye
    Jason Lee
    Nidhi Loyalka
    Omar Mehmood
    Charles Shagong
    Brian Williams
  Sponsors
    Professor William Arms
    Professor Thomas Bruce
  Reviewer
    Amy Siu

Introduction
  Objective: US Code (ASCII) to well-formed, valid XML output.
  XML output used as input to other applications.
  Goal of end use: making law available for general public use.

Overview
  Development Environment
  Execution Environment
  Software Design
  DTD Design
  Packaging

Development Environment Hardware
  Server
    233 MHz Intel PII processor
    128 MB memory
    28 GB hard disk
  Notebook Computers
    400 MHz Intel Celeron processor
    96 MB memory
    4.7 GB hard disk

Development Environment Software
  Red Hat Linux 6.2
  Perl 5.6
  SSH Secure Shell 2.3
  CVS 1.10.7
  Emacs 20.5.1
  VIM 5.6

Execution Environment Caveat

Client upgrades execution hardware and software environment at own risk. LDMS not guaranteed to work under new conditions.

Execution Environment Naming Standards
  General Rule
    Must start with a word in lower case.
    First letter of additional words in upper case.
  Filename Naming Convention
    Example: thePerlFile.pl
  File Name Length
    Maximum of 20 characters.

Execution Environment Naming Standards (cont'd)
  Function Names
    Must begin with a verb.
    Example: initializeModule
  Variable Names
    Must begin with qualifiers.
    Example: $error_LastErrorMessage

Execution Environment Naming Standards (cont'd)
  Filehandle Names
    Must be all capital letters.
  XML Output File Names
    Same as input file name with ".xml" extension.
  DTD Element Names
    Element names in capital letters.
    Nested element names start with DIV.
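The naming rules above are mechanical enough to check automatically. The following is a minimal sketch in Python (used here only for illustration; the project itself is written in Perl). The function names are hypothetical, and two readings of the slides are assumptions: that the 20-character limit includes the extension, and that the ".xml" extension replaces the input file's extension rather than being appended to it.

```python
import os
import re

MAX_FILENAME_LENGTH = 20  # "Maximum of 20 characters" (assumed to include the extension)

# Lower-case first word, then zero or more words that each start with a capital.
CAMEL_CASE = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)*$")


def is_valid_filename(filename):
    """Check the general naming rule and the length limit for a file name."""
    stem, _extension = os.path.splitext(filename)
    return len(filename) <= MAX_FILENAME_LENGTH and bool(CAMEL_CASE.match(stem))


def is_valid_filehandle(name):
    """Filehandle names must consist of capital letters only."""
    return re.fullmatch(r"[A-Z]+", name) is not None


def xml_output_name(input_filename):
    """Derive the XML output name (assumption: the extension is replaced)."""
    stem, _extension = os.path.splitext(input_filename)
    return stem + ".xml"
```

For example, `is_valid_filename("thePerlFile.pl")` accepts the name used on the slide, while a name beginning with a capital letter is rejected.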

Execution Environment Coding Standards
  A function shall not exceed 100 lines.
  A function shall have preceding comments on its purpose, pre- and postcondition.
  A variable shall have a purpose comment.
  Each loop shall have begin and end comments.
  A 3-space indentation shall be used for each block of code.

Execution Environment Coding Standards (cont'd)
  Perl contractions shall not be used.
  Each file shall have a modification history log.
  Each file shall include a copyright and license notice.
  Version number shall correspond to major and minor revisions to software.

Software Design
  System Architectural Components
  Modules and their descriptions
  Design Constraints
  Error Handling
  Application Environment
  User Interfaces

System Architecture
  [Figure 1: Top-level diagram of major architectural components. Boxes shown: Program, Read and Parse File, Language Parsing, Output.]

UML Component Diagram
  [Diagram: LDMS Main with components File Parser, Natural Language, and Output; subcomponents labeled IH, WsPM, WPM, EMH, StM, SM, FC, XMH.]

File Parser Component
  File Parser
    State Machine
    Input Handler

Natural Language Component
  Natural Language
    Whitespace Pattern Matching
    Word Pattern Matching

Output Component
  Output
    Error Message
    File Creator
    Status Message
    XML Output Handler

WhiteSpacePatternMatching

Input

StoreAndOutputFile CreateFile

StateMachine

StoreAndOutputErrors

WordPatternMatching

Figure 2: UML class diagram for LDMS

Design Constraints
  8-bit ASCII input files.
  Non-uniform title structure.
  Unattended operation.

Title Variation Example

-CITE- 11 USC Sec. 506 01/23/00
-EXPCITE- TITLE 11 - BANKRUPTCY
    CHAPTER 5 - CREDITORS, THE DEBTOR, AND THE ESTATE
    SUBCHAPTER I - CREDITORS AND CLAIMS
-HEAD- Sec. 506. Determination of secured status

Title Variation (cont'd)

-CITE- 46 USC Sec. 13102 01/05/99
-EXPCITE- TITLE 46 - SHIPPING
    Subtitle II - Vessels and Seamen
    Part I - State Boating Safety Programs
    CHAPTER 131 - RECREATIONAL BOATING SAFETY
-HEAD- Sec. 13102. Program acceptance
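The two records above differ in their division hierarchy (Title, Chapter, Subchapter versus Title, Subtitle, Part, Chapter), which is what "non-uniform title structure" means in practice. A minimal Python sketch of how such records might be split into dashline fields and their hierarchy read off follows. The function names and the list of division keywords are assumptions made for illustration and are not the LDMS implementation, which is written in Perl.

```python
import re

# A dashline tag such as -CITE-, -EXPCITE-, or -HEAD-.
DASHLINE = re.compile(r"-([A-Z]+)-")

# Division keywords seen in the two sample records (hypothetical, incomplete list).
DIVISION = re.compile(
    r"\b(TITLE|SUBTITLE|PART|CHAPTER|SUBCHAPTER)\s+(\S+)\s+-\s", re.IGNORECASE
)


def split_fields(record):
    """Map each dashline tag in a record to its whitespace-normalized text."""
    parts = DASHLINE.split(record)
    # parts[0] is any text before the first tag; tags and bodies then alternate.
    return {tag: " ".join(body.split()) for tag, body in zip(parts[1::2], parts[2::2])}


def division_levels(expcite_text):
    """Return the (kind, number) pairs found in an EXPCITE field, outermost first."""
    return [(kind.upper(), number) for kind, number in DIVISION.findall(expcite_text)]


record_11 = (
    "-CITE- 11 USC Sec. 506 01/23/00 -EXPCITE- TITLE 11 - BANKRUPTCY "
    "CHAPTER 5 - CREDITORS, THE DEBTOR, AND THE ESTATE "
    "SUBCHAPTER I - CREDITORS AND CLAIMS "
    "-HEAD- Sec. 506. Determination of secured status"
)
record_46 = (
    "-CITE- 46 USC Sec. 13102 01/05/99 -EXPCITE- TITLE 46 - SHIPPING "
    "Subtitle II - Vessels and Seamen Part I - State Boating Safety Programs "
    "CHAPTER 131 - RECREATIONAL BOATING SAFETY "
    "-HEAD- Sec. 13102. Program acceptance"
)
```

Calling `division_levels(split_fields(record_11)["EXPCITE"])` yields three levels, while the Title 46 record yields four, so a parser cannot assume a fixed depth.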

Error Handling
  Handled at topmost level.
  Processed by StoreAndOutputErrors module.
  Standard report format:
    <date> <time> <input filename> <user id> <line number> <error message>
  Four main categories of errors.
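As an illustration of the standard report format, the sketch below assembles one report line in Python. The slides fix only the order of the six fields; the date and time layouts, the single-space separator, and the function name are assumptions.

```python
from datetime import datetime


def format_error_report(input_filename, user_id, line_number, message, when=None):
    """Build one report line: date, time, input filename, user id, line number, message."""
    when = when or datetime.now()
    return "{} {} {} {} {} {}".format(
        when.strftime("%Y-%m-%d"),  # assumed date layout
        when.strftime("%H:%M:%S"),  # assumed time layout
        input_filename,
        user_id,
        line_number,
        message,
    )
```

Passing an explicit `when` value keeps the output reproducible, which is convenient when comparing logs from unattended batch runs.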

Error Categories
  Error                          Resolution
  Improper command.              Print brief usage help, exit.
  Output file already exists.    Exit and log error message unless overwrite flag is set.
  Linux system error.            Log to standard error, exit.
  Non-critical data error.       Tag region as unprocessed, continue.

Application Environment
  Preconditions
    Input files must exist in a known path.
    Required hardware and software must be available.
    Sufficient system resources must be free.
  Postconditions
    A valid, well-formed XML document conforming to our DTD will be produced.
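The postcondition has two parts: well-formedness and validity against the DTD. The Python sketch below checks only the first part, using the standard library parser; validating against a DTD would need a validating parser, which is outside what this sketch covers. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A document with a missing closing tag, such as `<STRUCTDIV><TITLEDATA>text</STRUCTDIV>`, fails this check.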

User Interface Design
  Very little runtime interactivity required.
  Command-line operation.
  Allows batch processing.

Command-Line Arguments
  Parameter        Effect
  -O <filename>    Output XML to <filename>.
  -F               Force overwriting of existing file.
  -V               Verbose error and status messages.
  -L#              Status messages every # lines processed.
  -?               Display help message.
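The argument table can be sketched with Python's standard `argparse` module as follows. The program name `ldms`, the positional `input_file` argument, and the destination names are assumptions; the slides do not say how the input file is specified.

```python
import argparse


def build_parser():
    """Build a parser for the five documented switches."""
    # add_help=False so that -? (rather than argparse's default -h) shows help.
    parser = argparse.ArgumentParser(prog="ldms", add_help=False)
    parser.add_argument("input_file", help="US Code input file (hypothetical positional)")
    parser.add_argument("-O", dest="output_file", metavar="<filename>",
                        help="Output XML to <filename>.")
    parser.add_argument("-F", dest="force", action="store_true",
                        help="Force overwriting of existing file.")
    parser.add_argument("-V", dest="verbose", action="store_true",
                        help="Verbose error and status messages.")
    parser.add_argument("-L", dest="status_interval", type=int, default=0, metavar="#",
                        help="Status messages every # lines processed.")
    parser.add_argument("-?", action="help", help="Display help message.")
    return parser


args = build_parser().parse_args(["-O", "out.xml", "-F", "-L50", "title11.txt"])
```

Here `-L50` is parsed as an interval of 50 lines, and leaving `-L` out gives an interval of 0, which this sketch treats as no status reporting.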

Status Reporting
  Frequency of status reports controlled by -L parameter.
  Default is no status reporting.
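A minimal Python sketch of this behavior follows, assuming an interval of 0 stands for the default of no status reporting. The function name, the message wording, and the `report` callback are hypothetical.

```python
def process_lines(lines, status_interval=0, report=print):
    """Process lines, reporting progress every status_interval lines (0 disables it)."""
    count = 0
    for count, _line in enumerate(lines, start=1):
        # Per-line parsing would happen here.
        if status_interval > 0 and count % status_interval == 0:
            report("Processed {} lines".format(count))
    return count
```

Passing a list's `append` method as `report` collects the messages instead of printing them, which is useful for unattended runs.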

Module Diagrams
  Diagrams can be divided into two categories:
    Structural diagrams.
      Flow diagram.
    Behavioral diagrams.
      Culture diagram.
      Context diagram.

Flow Diagram
  [Diagram: nodes Cornell LII, House, LDMS, and Public; flows labeled U.S. Code (ASCII), U.S. Code (ASCII), U.S. Code (XML), and U.S. Code.]

Culture Diagram
  [Diagram: nodes Cornell LII, House, LDMS, and Public, annotated with the following remarks.]
    Format of code is not negotiable.
    Seriously faulty input must be manually resolved.
    XML should be double-checked.
    "Why does publishing take so long?"

Context Diagram
  [Diagram: entities House of Representatives, U.S. Code, Legal Data Markup System, Cornell Legal Information Institute, and XML; relationships labeled Produces, Uses as Input, Downloads, Produces, Executes, and Publishes.]

DTD Schema
  [Diagram: element hierarchy. Elements shown: STRUCTDIV, TITLEDATA, (field tags), NAVGROUP, CITE, HEAD, EXPCITE, DIVEXPCITE, STATUTE, SOURCE, DIVSOURCE, STATAMEND, DATATEXT, DATATEXTNAME, XREF.]

The <STRUCTDIV> Tag
  Generic tag to define structural divisions.
  May contain <TITLEDATA>, parsed character data (#PCDATA), or another <STRUCTDIV>.
  Attributes:
    NAME - Label of division.
    VLEVEL - Depth of division.
    HLEVEL - Sequential order of division.
    EID - Globally unique identifier.

The <TITLEDATA> Tag

A container for sequences of fields (dashline-tagged text). May contain <NAVGROUP>, <STATUTE>, #PCDATA, or any of the field tags (MISC1-MISC8, REFTEXT, COD, CHANGE, TRANS, EXEC, CROSS, SECREF).

Navigational Tags
  <NAVGROUP> - Container for navigational information, such as <CITE>, <HEAD>, and <EXPCITE>.
  <CITE> - Label, section number, and title.
  <EXPCITE> - Hierarchy of catchlines.
  <DIVEXPCITE> - Individual catchline.
  <HEAD> - Name of current TOC section.

Content Tags
  <STATUTE> - Container for actual legal data.
  <SOURCE> - List of relevant sources.
  <DIVSOURCE> - Individual sources within a <SOURCE> tag.
  <STATAMEND> - Amendments to a statute.

Data Tags
  <DATATEXT> - Text that consists of a centered header, followed by content.
  <DATATEXTNAME> - Header of the current data.
  <XREF> - Cross-reference: a link to another area of the USC.

LDMS Tags in Action

-CITE- 1 USC Sec. 1 01/23/00

-EXPCITE- TITLE 1 - GENERAL PROVISIONS CHAPTER 1 - RULES OF CONSTRUCTION

-HEAD- Sec. 1. Words denoting number, gender, and so forth

LDMS Tags in Action (cont'd)
<STRUCTDIV name="Sec." vlevel="3" hlevel="1" eid="112358">
<TITLEDATA>
<NAVGROUP>
<CITE titlenumber="1"> 1 USC Sec. 1 01/23/00</CITE>
<EXPCITE level="3"> TITLE 1 - GENERAL PROVISIONS CHAPTER 1 - RULES OF CONSTRUCTION</EXPCITE>
<HEAD> Sec. 1. Words denoting number, gender, and so forth</HEAD>
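The markup shown above can be produced programmatically. The Python sketch below builds the same navigational structure with the standard library's `xml.etree.ElementTree`. The function name and parameters are hypothetical, the lower-case attribute names follow the example rather than the upper-case names on the tag slide, and, like the slide, the sketch leaves out the <DIVEXPCITE> children.

```python
import xml.etree.ElementTree as ET


def build_section(fields, name, vlevel, hlevel, eid, title_number, expcite_level):
    """Build a STRUCTDIV element holding the navigational tags for one section."""
    structdiv = ET.Element("STRUCTDIV", {
        "name": name, "vlevel": str(vlevel), "hlevel": str(hlevel), "eid": str(eid),
    })
    titledata = ET.SubElement(structdiv, "TITLEDATA")
    navgroup = ET.SubElement(titledata, "NAVGROUP")
    cite = ET.SubElement(navgroup, "CITE", {"titlenumber": str(title_number)})
    cite.text = fields["CITE"]
    expcite = ET.SubElement(navgroup, "EXPCITE", {"level": str(expcite_level)})
    expcite.text = fields["EXPCITE"]
    head = ET.SubElement(navgroup, "HEAD")
    head.text = fields["HEAD"]
    return structdiv


fields = {
    "CITE": "1 USC Sec. 1 01/23/00",
    "EXPCITE": "TITLE 1 - GENERAL PROVISIONS CHAPTER 1 - RULES OF CONSTRUCTION",
    "HEAD": "Sec. 1. Words denoting number, gender, and so forth",
}
section = build_section(fields, "Sec.", 3, 1, 112358, 1, 3)
xml_text = ET.tostring(section, encoding="unicode")
```

Building the tree through an XML library rather than by string concatenation means closing tags and character escaping are handled by the serializer, which helps with the well-formedness postcondition.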

Packaging
  Release package will include:
    Documentation
    Source Code
    Executable Files
    Data Files

Documentation
  Source-level documentation.
  Program design document.
  DTD design document.

Source-Level Documentation
  Required for inclusion in each build.
  Source code comments.
  Separate text files.

Program Design Document
  Intended as developer/maintainer resource.
  High-level view of processing engine.
  Individual processing components.
  Component interfaces.
  Updated as development progresses.

DTD Design Document
  Resource for DTD developers and maintainers.
  List of all elements and use.
  List of all attributes and use.
  Modified as development progresses.

Source Code
  Source code for prototypes will not be considered deliverables.
  Testing harnesses will not be considered deliverables.
  All source code for release version will be provided.

Executables and Data Files
  One executable script file.
  No other executables will be included.
  DTD will be considered a deliverable.

Installation
  No installation script is planned.
  Path to Perl binary must be specified at head of executable script.
  Project directory must be copied in its entirety to desired location.
  Relative paths within directory must remain unchanged.
  User must have write permission