Legal Data Markup Software CS501 Design Presentation November 9th, 2000.
Project Team
Developers: Ju Joh, Sylvia Kwakye, Jason Lee, Nidhi Loyalka, Omar Mehmood, Charles Shagong, Brian Williams
Sponsors: Professor William Arms, Professor Thomas Bruce
Reviewer: Amy Siu
Introduction
Objective: Convert the US Code (ASCII) into well-formed, valid XML output.
XML output used as input to other applications.
Goal of end use: Making law available for general public use.
Development Environment Hardware
Server: 233 MHz Intel PII processor, 128 MB memory, 28 GB hard disk
Notebook Computers: 400 MHz Intel Celeron processor, 96 MB memory, 4.7 GB hard disk
Development Environment Software
Red Hat Linux 6.2, Perl 5.6, SSH Secure Shell 2.3, CVS 1.10.7, Emacs 20.5.1, VIM 5.6
Execution Environment Caveat
Client upgrades execution hardware and software environment at own risk. LDMS not guaranteed to work under new conditions.
Execution Environment Naming Standards
General Rule: Must start with a word in lower case. First letter of additional words in upper case.
Filename Naming Convention: Example: thePerlFile.pl
File Name Length: Maximum of 20 characters.
Execution Environment Naming Standards
Function Names: Must begin with a verb. Example: initializeModule
Variable Names: Must begin with qualifiers. Example: $error_LastErrorMessage
Execution Environment Naming Standards
Filehandle Names: Must be all capital letters.
XML Output File Names: Same as input file name with “.xml” extension.
DTD Element Names: Element names in capital letters. Nested element names start with DIV.
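The naming rules above can be read as simple pattern checks. The following is a minimal Python sketch of that reading, not part of LDMS itself (which is written in Perl). The function names, the regular expressions, the choice to count the extension toward the 20-character limit, and the choice to append ".xml" rather than replace an existing extension are all assumptions made for illustration.

```python
import re

# Assumed reading of the general rule: a lower-case first word, then
# zero or more words that each start with an upper-case letter.
CAMEL_CASE = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")

MAX_FILENAME_LENGTH = 20  # "Maximum of 20 characters" (extension included: assumption)


def is_valid_filename(filename):
    """Check a source file name such as 'thePerlFile.pl'."""
    if len(filename) > MAX_FILENAME_LENGTH:
        return False
    stem, dot, _extension = filename.rpartition(".")
    if not dot:
        # No extension present: the whole name is the stem.
        stem = filename
    return bool(CAMEL_CASE.match(stem))


def is_valid_filehandle(name):
    """Filehandle names must be all capital letters."""
    return bool(re.match(r"^[A-Z]+$", name))


def xml_output_name(input_filename):
    """Input file name with a '.xml' extension (appended: assumption)."""
    return input_filename + ".xml"
```

A check like this could run as part of a build or review step; the slides do not say whether the team automated the standards.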
Execution Environment Coding Standards
A function shall not exceed 100 lines.
A function shall have preceding comments on its purpose, pre- and postconditions.
A variable shall have a purpose comment.
Each loop shall have begin and end comments.
A 3-space indentation shall be used for each block of code.
Execution Environment Coding Standards
Perl contractions shall not be used.
Each file shall have a modification history log.
Each file shall include a copyright and license notice.
Version number shall correspond to major and minor revisions to software.
Software Design
System Architectural Components
Modules and their descriptions
Design Constraints
Error Handling
Application Environment
User Interfaces
System Architecture
[Figure 1: Top-level diagram of major architectural components. Labels: Program; Read and Parse File; Language Parsing; Output.]
[Figure 2: UML class diagram for LDMS. Labels: Status, WhiteSpacePatternMatching, Input, StoreAndOutputFile, CreateFile, StateMachine, StoreAndOutputErrors, WordPatternMatching.]
Title Variation Example
-CITE- 11 USC Sec. 506 01/23/00
-EXPCITE- TITLE 11 - BANKRUPTCY CHAPTER 5 - CREDITORS, THE DEBTOR, AND THE ESTATE SUBCHAPTER I - CREDITORS AND CLAIMS
-HEAD- Sec. 506. Determination of secured status
Title Variation (cont’d)
-CITE- 46 USC Sec. 13102 01/05/99
-EXPCITE- TITLE 46 - SHIPPING Subtitle II - Vessels and Seamen Part I - State Boating Safety Programs CHAPTER 131 - RECREATIONAL BOATING SAFETY
-HEAD- Sec. 13102. Program acceptance
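Both samples use the same dashline tags (-CITE-, -EXPCITE-, -HEAD-) while the hierarchy under -EXPCITE- differs from title to title. As a rough illustration of the first parsing step, the Python sketch below splits such text into tagged fields. The regular expression for a dashline tag and the names `DASHLINE_TAG`, `split_fields`, and `SAMPLE` are assumptions; the slides do not describe how LDMS actually recognizes tags.

```python
import re

# Assumption: a dashline tag is an upper-case word wrapped in hyphens
# (e.g. -CITE-, -EXPCITE-, -HEAD-) and set off by whitespace.
DASHLINE_TAG = re.compile(r"(?:^|\s)-([A-Z][A-Z0-9]*)-(?=\s|$)")

SAMPLE = (
    "-CITE- 46 USC Sec. 13102 01/05/99 "
    "-EXPCITE- TITLE 46 - SHIPPING Subtitle II - Vessels and Seamen "
    "Part I - State Boating Safety Programs "
    "CHAPTER 131 - RECREATIONAL BOATING SAFETY "
    "-HEAD- Sec. 13102. Program acceptance"
)


def split_fields(text):
    """Return (tag, content) pairs in the order they appear."""
    matches = list(DASHLINE_TAG.finditer(text))
    fields = []
    for index, match in enumerate(matches):
        start = match.end()
        # Content runs up to the next tag, or to the end of the text.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(text)
        fields.append((match.group(1), text[start:end].strip()))
    return fields


for tag, content in split_fields(SAMPLE):
    print(tag, "=>", content)
```

The separators inside the hierarchy ("TITLE 46 - SHIPPING") are not mistaken for tags because a space follows the hyphen. Splitting the -EXPCITE- content into its individual levels is left out, since the level names vary between titles.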
Error Handling
Handled at topmost level.
Processed by StoreAndOutputErrors module.
Standard report format: <date> <time> <input filename> <user id> <line number> <error message>
Four main categories of errors.
Error Categories
Error                         Resolution
Improper command.             Print brief usage help, exit.
Output file already exists.   Exit and log error message unless overwrite flag is set.
Linux system error.           Log to standard error, exit.
Non-critical data error.      Tag region as unprocessed, continue.
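The report format and the four resolutions can be sketched in a few lines. This Python example is illustrative only: the date and time layout, the category keys, and the function names are assumptions, and the real StoreAndOutputErrors module is not shown in the slides.

```python
from datetime import datetime

# Whether processing stops after each error category, per the table.
# The category keys are hypothetical labels.
EXITS_AFTER_ERROR = {
    "improper_command": True,
    "output_exists": True,      # unless the overwrite flag is set
    "system_error": True,
    "noncritical_data": False,  # region is tagged as unprocessed instead
}


def should_exit(category, overwrite=False):
    """Return True if the run should stop after this kind of error."""
    if category == "output_exists" and overwrite:
        return False
    return EXITS_AFTER_ERROR[category]


def format_error_report(input_filename, user_id, line_number, message, when=None):
    """Build one report line:
    <date> <time> <input filename> <user id> <line number> <error message>
    The YYYY-MM-DD and HH:MM:SS layouts are assumptions.
    """
    when = when or datetime.now()
    return "{} {} {} {} {} {}".format(
        when.strftime("%Y-%m-%d"),
        when.strftime("%H:%M:%S"),
        input_filename,
        user_id,
        line_number,
        message,
    )
```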
Application Environment
Preconditions
Input files must exist in a known path.
Required hardware and software must be available.
Sufficient system resources must be free.
Postconditions
A valid, well-formed XML document conforming to our DTD will be produced.
User Interface Design
Very little runtime interactivity required.
Command-line operation.
Allows batch processing.
Command-Line Arguments
Parameter        Effect
-O <filename>    Output XML to <filename>.
-F               Force overwriting of existing file.
-V               Verbose error and status messages.
-L#              Status messages every # lines processed.
-?               Display help message.
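A small hand-written parser is enough to handle this parameter table. The Python sketch below is one possible reading of it: it assumes that -O takes its filename as the next argument, that the number in -L# is attached directly to the flag, and that any remaining arguments are input files. The name `parse_args` and the option keys are hypothetical.

```python
def parse_args(argv):
    """Parse LDMS-style arguments into a dictionary of options."""
    options = {
        "output": None,      # -O <filename>
        "force": False,      # -F
        "verbose": False,    # -V
        "status_every": 0,   # -L#, 0 means no status reporting
        "help": False,       # -?
        "inputs": [],        # positional input files (assumption)
    }
    index = 0
    while index < len(argv):
        arg = argv[index]
        if arg == "-O":
            if index + 1 >= len(argv):
                raise ValueError("-O requires a filename")
            options["output"] = argv[index + 1]
            index += 1  # skip the filename just consumed
        elif arg == "-F":
            options["force"] = True
        elif arg == "-V":
            options["verbose"] = True
        elif arg == "-?":
            options["help"] = True
        elif arg.startswith("-L") and arg[2:].isdigit():
            options["status_every"] = int(arg[2:])
        elif arg.startswith("-"):
            # Improper command: the caller prints usage help and exits.
            raise ValueError("unknown parameter: " + arg)
        else:
            options["inputs"].append(arg)
        index += 1
    return options
```

For example, `parse_args(["-O", "out.xml", "-F", "-L500", "usc11.txt"])` sets the output file, the overwrite flag, and a status interval of 500 lines.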
Status Reporting
Frequency of status reports controlled by -L parameter.
Default is no status reporting.
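The reporting rule amounts to a counter checked against the -L value. A minimal Python sketch, assuming a value of 0 stands for the default of no reporting; the function name, the message wording, and the `report` callback are hypothetical:

```python
def process_lines(lines, status_every=0, report=print):
    """Count lines, emitting a status message every `status_every` lines.

    status_every=0 (the assumed default) disables status reporting.
    """
    processed = 0
    for _line in lines:
        processed += 1
        # The markup work for each line would happen here.
        if status_every > 0 and processed % status_every == 0:
            report("{} lines processed".format(processed))
    return processed
```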
Module Diagrams
Diagrams can be divided into two categories:
Structural diagrams.
   Flow diagram.
Behavioral diagrams.
   Culture diagram.
   Context diagram.
Flow Diagram
[Diagram. Nodes: House, Cornell LII, LDMS, Public. Flow labels: U.S. Code (ASCII), U.S. Code (ASCII), U.S. Code (XML), U.S. Code.]
Culture Diagram
[Diagram. Nodes: House, Cornell LII, LDMS, Public. Annotations: "Format of code is not negotiable."; "Seriously faulty input must be manually resolved."; "XML should be double-checked."; “Why does publishing take so long?”]
Context Diagram
[Diagram. Nodes: House of Representatives, U.S. Code, Legal Data Markup System, Cornell Legal Information Institute, XML. Edge labels: Produces, Uses as Input, Downloads, Produces, Executes, Publishes.]
DTD Schema
[Diagram of the element hierarchy. Elements shown: STRUCTDIV, TITLEDATA, NAVGROUP, CITE, HEAD, EXPCITE, DIVEXPCITE, STATUTE, SOURCE, DIVSOURCE, STATAMEND, DATATEXT, DATATEXTNAME, XREF, (FIELD TAGS).]
The <STRUCTDIV> Tag
Generic tag to define structural divisions.
May contain <TITLEDATA>, parsed character data (#PCDATA), or another <STRUCTDIV>.
NAME - Label of division.
VLEVEL - Depth of division.
HLEVEL - Sequential order of division.
EID - Globally unique identifier.
The <TITLEDATA> Tag
A container for sequences of fields (dashline-tagged text). May contain <NAVGROUP>, <STATUTE>, #PCDATA, or any of the field tags (MISC1-MISC8, REFTEXT, COD, CHANGE, TRANS, EXEC, CROSS, SECREF).
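The containment rules stated for <STRUCTDIV> and <TITLEDATA> can be checked mechanically. The Python sketch below encodes only those two rules as a lookup table and walks a parsed tree; it is an illustration, not the LDMS DTD. The names `ALLOWED_CHILDREN` and `find_violations` are hypothetical, and the standard-library parser used here checks well-formedness only, so real validation would still need a DTD-aware tool.

```python
import xml.etree.ElementTree as ET

# Field tags listed on the slide: MISC1-MISC8 plus the named fields.
FIELD_TAGS = {"MISC%d" % n for n in range(1, 9)} | {
    "REFTEXT", "COD", "CHANGE", "TRANS", "EXEC", "CROSS", "SECREF",
}

# Allowed child elements, taken from the two slides above.
# Character data (#PCDATA) is allowed in both and is not checked here.
ALLOWED_CHILDREN = {
    "STRUCTDIV": {"TITLEDATA", "STRUCTDIV"},
    "TITLEDATA": {"NAVGROUP", "STATUTE"} | FIELD_TAGS,
}


def find_violations(element):
    """Return (parent, child) pairs that break the rules above."""
    violations = []
    allowed = ALLOWED_CHILDREN.get(element.tag)
    for child in element:
        # Elements without an entry in the table are not checked.
        if allowed is not None and child.tag not in allowed:
            violations.append((element.tag, child.tag))
        violations.extend(find_violations(child))
    return violations
```

For instance, a <NAVGROUP> placed directly inside a <STRUCTDIV> would be reported, since the slide allows it only inside <TITLEDATA>.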
Navigational Tags
<NAVGROUP> - Container for navigational information, such as <CITE>, <HEAD>, and <EXPCITE>.
<CITE> - Label, section number, and title.
<EXPCITE> - Hierarchy of catchlines.
<DIVEXPCITE> - Individual catchline.
<HEAD> - Name of current TOC section.
Content Tags
<STATUTE> - Container for actual legal data.
<SOURCE> - List of relevant sources.
<DIVSOURCE> - Individual sources within a <SOURCE> tag.
<STATAMEND> - Amendments to a statute.
Data Tags
<DATATEXT> - Text that consists of a centered header, followed by content.
<DATATEXTNAME> - Header of the current data.
<XREF> - Cross-reference: a link to another area of the USC.
LDMS Tags in Action
-CITE- 1 USC Sec. 1 01/23/00
-EXPCITE- TITLE 1 - GENERAL PROVISIONS CHAPTER 1 - RULES OF CONSTRUCTION
-HEAD- Sec. 1. Words denoting number, gender, and so forth
…
LDMS Tags in Action
<STRUCTDIV name="Sec." vlevel="3" hlevel="1" eid="112358">
<TITLEDATA>
<NAVGROUP>
<CITE titlenumber="1"> 1 USC Sec. 1 01/23/00</CITE>
<EXPCITE level="3"> TITLE 1 - GENERAL PROVISIONS CHAPTER 1 - RULES OF CONSTRUCTION</EXPCITE>
<HEAD> Sec. 1. Words denoting number, gender, and so forth</HEAD>
…
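To show how the navigational part of this output could be produced, here is a short Python sketch that builds the same elements with the standard library. It is not the LDMS implementation; the function name `build_section` is hypothetical, the attribute values are copied from the slide, and the sketch covers only the three fields shown before the elided remainder.

```python
import xml.etree.ElementTree as ET


def build_section(cite, expcite, head):
    """Build the STRUCTDIV/TITLEDATA/NAVGROUP portion shown on the slide."""
    # Attribute values below are the ones from the slide's example.
    structdiv = ET.Element(
        "STRUCTDIV",
        {"name": "Sec.", "vlevel": "3", "hlevel": "1", "eid": "112358"},
    )
    titledata = ET.SubElement(structdiv, "TITLEDATA")
    navgroup = ET.SubElement(titledata, "NAVGROUP")
    ET.SubElement(navgroup, "CITE", {"titlenumber": "1"}).text = cite
    ET.SubElement(navgroup, "EXPCITE", {"level": "3"}).text = expcite
    ET.SubElement(navgroup, "HEAD").text = head
    return structdiv


section = build_section(
    "1 USC Sec. 1 01/23/00",
    "TITLE 1 - GENERAL PROVISIONS CHAPTER 1 - RULES OF CONSTRUCTION",
    "Sec. 1. Words denoting number, gender, and so forth",
)
print(ET.tostring(section, encoding="unicode"))
```

Building the tree through an XML library rather than by string concatenation keeps the output well-formed, since every element is closed and special characters are escaped automatically.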
Source-Level Documentation
Required for inclusion in each build.
Source code comments.
Separate text files.
Program Design Document
Intended as developer/maintainer resource.
High-level view of processing engine.
Individual processing components.
Component interfaces.
Updated as development progresses.
DTD Design Document
Resource for DTD developers and maintainers.
List of all elements and use.
List of all attributes and use.
Modified as development progresses.
Source Code
Source code for prototypes will not be considered deliverables.
Testing harnesses will not be considered deliverables.
All source code for release version will be provided.
Executables and Data Files
One executable script file.
No other executables will be included.
DTD will be considered a deliverable.
Installation
No installation script is planned.
Path to Perl binary must be specified at head of executable script.
Project directory must be copied in its entirety to desired location.
Relative paths within directory must remain unchanged.
User must have write permission