Computer representation of legal documents Fabio Vitali University of Bologna May 2 nd, 2000.
Managing legislative information in Parliaments: new frontiers
Prof. Fabio Vitali
Department of Computer Science
University of Bologna
Purpose of this talk
To assert that parliamentary processes and citizens’ access to parliamentary records and documents can be improved by:
Adopting the best technologies for document management (mainly, XML and related standards)
Adopting standard formats for naming and electronic representation of documents, possibly a common, multi-lingual, multi-national standard.
Fostering the creation and adoption of many different software tools to be made available to support these standards.
Summary
My background
Computer support for parliamentary activities: functionalities, advantages
Key discussion points: data/metadata; different views of the idea of document; content, structure and presentation; metadata and ontologies; naming mechanism
Norme In Rete
Norms on the Net
Italy-wide initiative sponsored by the Ministry of Justice (1999 - present) to develop:
An XML-based data format for national, regional and local norms
A naming schema to identify all relevant documents, both available and unavailable, both existing and potential
A distributed, federated architecture allowing for multiple storage centers with overlapping competencies, official and not official, unified by a single search engine
National standard, adopted by a large number of institutions at both the national and local level. A major source of inspiration for LexML (Brazil).
Akoma Ntoso
Sponsored by the UN Department of Economic and Social Affairs (UNDESA); born in 2004 and now adopted by Kenya, Nigeria, South Africa, Cameroon, etc.
The name stands for Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies.
It covers:
Describing structures for legislative documents in XML
Referencing documents within and across countries using URIs
Adding systematic metadata to documents using ontologically sound approaches based on OWL, FRBR, etc.
It is meant for describing and managing legislative documents and the parliamentary workflow documentation needs in Africa.
Easy to implement, easy to understand, easy to use, yet complete, precise and reliable.
CEN Metalex
CEN-sponsored initiative for an XML-based interchange format for European-wide legislative systems.
Born in 2006 and still ongoing.
Output for ongoing European projects.
Not an actual format, but rather a meta-format allowing individual formats to recognize each other.
Basic idea: to identify similar structures through roles rather than vocabulary; an article is an article regardless of what it is called.
Naming, workflow and references are also managed, to support functionality without giving up generality.
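The idea of recognizing structures by role rather than by element name can be illustrated with a small Python sketch. The element names (`articolo`, `section`, `artikel`, `rubrica`, `heading`) and the role table are invented for illustration; they are not taken from CEN Metalex or from any national format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from local element names to shared roles.
ROLE_OF = {
    "articolo": "article",
    "section": "article",
    "artikel": "article",
    "rubrica": "heading",
    "heading": "heading",
}

def roles_in(xml_text):
    """Return the list of roles found in a document, in document order."""
    root = ET.fromstring(xml_text)
    return [ROLE_OF[el.tag] for el in root.iter() if el.tag in ROLE_OF]

italian = "<legge><articolo><rubrica>Definizioni</rubrica></articolo></legge>"
kenyan = "<act><section><heading>Definitions</heading></section></act>"

# Different vocabularies, same sequence of roles.
print(roles_in(italian))
print(roles_in(kenyan))
```

A tool written against the roles can then process both documents without knowing either local vocabulary.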
Computer support for parliamentary activities
Support for document generation: drafting activities, record keeping, translation into national languages, etc.
Support for workflow: management of documents across their lifecycle, storage, security, timely involvement of relevant individuals and offices
Support for citizens’ access: multi-channel publication (on paper and on the web), search, classification, identification
Further activities: consolidation, version comparison, language synchronization, etc.
Standard Applications, Architectures or Formats?
Applications rely on concrete technologies (e.g., programming languages, operating systems, programming libraries, etc.) and provide actual support for users' processes and experience.
Architectures describe processes, actors and roles, and describe the characteristics of the tools that support them.
Data formats describe the kind of information that is exchanged by tools and that is kept over time.
Standardizing applications forces common architectures and data formats, but also forces uniformity in users' processes and experience, and is the most fragile with respect to technological advances.
Standardizing architectures is less fragile, but forces uniformity in processes and experience.
Standardizing formats first provides solutions that are not dependent on technological advances, and fosters the further generation of architectural and applicative standards as a result, rather than as a prerequisite.
HTML, PDF
HTML is just a publishing medium. It helped make the Web a big success, but it is constrained by its own simplicity:
Excessive reliance on typographic rather than semantic description
Few rules, and not even strongly enforced
PDF is a commercial, opaque data format aimed at guaranteeing the visual aspect of documents:
Appropriate when the important characteristic to be maintained is the visual aspect
No support for structure, homogeneity, semantic awareness
A different format is needed, one that provides:
Clear differentiation between visual aspect and actual meaning
Strong syntactic rules, strictly enforced, to guarantee uniformity, homogeneity and sophisticated applications
XML
XML (Extensible Markup Language) is a very widely adopted W3C standard. XML is pure syntax, without predefined semantics, which allows document designers to provide their own. Thanks to the associated languages (DTD, XSLT, RDF) we can create sophisticated applications that are very flexible in use. XML makes it possible to create markup languages that are readable, generic, structured and hierarchical.
Parliamentary documents and XML
XML is ideal for representing parliamentary documents (and especially bills and acts):
They have a well-defined structure, which is systematic and standardized
There are required and optional parts according to rules and tradition
There are containment constraints that determine the global correctness of the document
There are references to other texts (schedules, other acts, etc.) that can fruitfully be used to create a hypertext network.
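As a minimal sketch of these points, the Python fragment below parses a toy act, checks a containment rule and collects the references it contains. The element names (`act`, `preamble`, `section`, `heading`, `p`, `ref`), the `href` value and the sample text are hypothetical and do not reproduce any actual standard.

```python
import xml.etree.ElementTree as ET

ACT = """
<act>
  <preamble>An act to provide for initial definitions.</preamble>
  <section id="sec1">
    <heading>Initial definitions</heading>
    <p>See <ref href="/it/act/2004/137">Act 137/2004</ref>.</p>
  </section>
</act>
"""

# Hypothetical containment rule: which children each element may have.
ALLOWED_CHILDREN = {
    "act": {"preamble", "section"},
    "preamble": set(),
    "section": {"heading", "p"},
    "heading": set(),
    "p": {"ref"},
    "ref": set(),
}

def containment_errors(root):
    """List every child element that its parent is not allowed to contain."""
    errors = []
    for parent in root.iter():
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN.get(parent.tag, set()):
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors

def references(root):
    """Collect the targets of all references, the raw material for hyperlinks."""
    return [ref.get("href") for ref in root.iter("ref")]

root = ET.fromstring(ACT)
print(containment_errors(root))
print(references(root))
```

In practice such rules would normally be written in a schema language (DTD or XML Schema) rather than in application code; the dictionary above only stands in for that.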
Why is XML good?
[Figure: “Energy / Information” diagram with the labels “Conversion is difficult” and “Conversion is very easy”]
What to look for
Simple, standard-based data formats:
To facilitate usage and understanding.
Relying on all the relevant W3C and ISO standards.
Long-term feasibility and evolution (backward and forward):
To support documents being drafted now as well as those already drafted and enacted a long time ago.
To support a useful lifespan of the system and the documents in the tens and possibly hundreds of years.
Self-explaining formats:
Documents need to be able to provide all information for their use and meaning through a simple examination, even without the aid of specialized software.
Tools need to be created with ease to provide automatic and semi-automatic aid to data markup and document description.
Manual markup or fine tuning is still a possible option for exceptions.
Approaches
Extensibility:
It must be possible to allow local customizations of the data model.
It must be possible to extend the reach of the language towards more countries, more document types, larger vocabularies of fragment qualification.
Format-induced homogeneity:
Documents produced by different tools and individuals need to be, as much as possible, identical.
Documents produced by hand and by tools need to be, as much as possible, identical.
Multiple uses:
Display on PC screen, display on cell phone, display on Braille terminal, print on paper, print on paper with a different paper size, cataloguing, searching, workflow management (during drafting and active lifecycle), automatic consolidation, textual analysis, semantic analysis, provision analysis, cross-country comparison, synchronized translation, etc.
Understanding the data/metadata dichotomy (1)
Data: the actual content (text, structure, images, schemas), exactly as it was provided by the author of the document.
Metadata: any consideration, comment or additional information that can be expressed on the content and on the document. Metadata is generated either by human intervention or through automated processes.
Ontology (in short): a formalized representation of the conceptual model that shapes all metadata associated to a document.
Understanding the data/metadata dichotomy (2)
Authors’ contribution: data
The words, punctuation and breaks, exactly as they have been written and accepted by the original author (in the case of legislation, the legislative body).
Editors’ contribution: metadata
Publication data. Lifecycle information. Footnotes. Analysis of provisions.
Metadata is useless unless it is provided following a precise conceptual model, called an ontology.
In a way, editors are the authors of the metadata.
Put another way, metadata is information about a document that was not provided by its authors.
Different views on the idea of document (1)
Different concepts:
Italian Act 137/2004
The current consolidated version of the Italian Act 137/2004
An XML representation of the current consolidated version of the Italian Act 137/2004
The file stored as “act137-2004.xml” in a specific folder of my computer
Different properties:
What is the name of the document?
Who is the author of the document?
What is the creation date of the document?
The IFLA FRBR hierarchy…
Work: a distinct intellectual creation.
Expression: the specific form in which a work is realized.
Manifestation: the representation of an expression according to the requirements of a medium.
Item: a single exemplar (an instance) of a manifestation.
… provides different answers:
E.g.: a different name for each level
E.g.: the legislator, the editor, the publisher, the data provider
E.g.: the enactment date, the consolidation date, the generation date, the copy date
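The four FRBR levels and their different answers to "name, author, date" can be sketched as a chain of Python data classes. This is only an illustration of the hierarchy described above: the class layout is an assumption, and every date and author label in the sample data is a placeholder, not real information about Act 137/2004.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Work:
    name: str
    author: str          # the legislator
    created: date        # the enactment date

@dataclass
class Expression:
    work: Work
    name: str
    author: str          # the editor
    created: date        # the consolidation date

@dataclass
class Manifestation:
    expression: Expression
    name: str
    author: str          # the publisher
    created: date        # the generation date

@dataclass
class Item:
    manifestation: Manifestation
    name: str
    author: str          # the data provider
    created: date        # the copy date

# All dates and author labels below are placeholders, not real data.
work = Work("Italian Act 137/2004", "legislator", date(2004, 1, 1))
expr = Expression(work, "Current consolidated version of Italian Act 137/2004",
                  "editor", date(2006, 1, 1))
manif = Manifestation(expr, "XML representation of the consolidated version",
                      "publisher", date(2007, 1, 1))
item = Item(manif, "act137-2004.xml", "data provider", date(2008, 1, 1))

def answers(item):
    """Return (level, name, author, date) for each FRBR level of one file."""
    m = item.manifestation
    e = m.expression
    w = e.work
    return [(type(x).__name__, x.name, x.author, x.created) for x in (w, e, m, item)]

for row in answers(item):
    print(row)
```

Asking "who is the author?" of the file on disk therefore has four legitimate answers, one per level.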
Different views on the idea of document (2)
Different processes. E.g.:
A repeal is really a process on the work.
An amendment is a process on an expression generating a new one.
The markup is a process on an expression generating a manifestation.
The copy is a process on an item generating another item.
Different peculiarities:
A work has no content. The content of an expression is a set of words and drawings. The content of a manifestation is computer data.
Works are eternal and created by Authors. Expressions are stable and created either by Authors or by Editors with domain expertise (consider amendment acts that do not specify the resulting consolidated text). Manifestations are created by computer tools used by secretaries or low level operatives.
Content, structure and presentation (1)
Content: what exactly was written in the document.
Structure: how the content is organized.
Presentation: the typographical choices to present a document on screen or on paper.
Content, structure and presentation (2)
The structure adds meaning to pieces of content.
The words “Initial definitions” take on meaning once we know they are the title of section #1 of the Italian Act 137/2004
The structure connects the presentation to the content
Once we know that the text “Initial definitions” is the heading of a section, we can apply the typographical choices associated to section headings.
The structure can be used to test and validate the correctness of a document
We can deduce that a document is incorrect if there is no heading associated to a section.
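The two roles of structure described above (driving presentation and supporting validation) can be sketched in a few lines of Python. The element names and the "style sheet" are hypothetical; upper-casing and indentation merely stand in for real typographical choices.

```python
import xml.etree.ElementTree as ET

DOC = ("<act><section><heading>Initial definitions</heading>"
       "<p>Text of the section.</p></section></act>")

# Hypothetical style sheet: typographical choices keyed by structural role.
STYLE = {"heading": str.upper, "p": lambda text: "    " + text}

def render(root):
    """Apply the style associated with each structural element to its text."""
    lines = []
    for el in root.iter():
        if el.tag in STYLE and el.text:
            lines.append(STYLE[el.tag](el.text))
    return lines

def sections_without_heading(root):
    """Return the sections that are missing a heading."""
    return [s for s in root.iter("section") if s.find("heading") is None]

root = ET.fromstring(DOC)
print(render(root))
print(len(sections_without_heading(root)))
```

Changing the presentation means changing only `STYLE`; the content and its structure stay untouched.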
Descriptive vs. prescriptive approach
Descriptive schemas: a very loose set of constraints providing a full vocabulary of elements and little or no check on their presence and order. They are meant to:
Describe a set of documents while allowing many exceptions to the basic rule.
Describe an existing (and thus non-modifiable) set of documents.
Describe a set of documents created by a higher authority than the XML coder.
Prescriptive schemas: a more restricted set of constraints providing the same full vocabulary plus tight checks on presence and order. They are meant to:
Impose adherence to drafting guidelines, and reject non-compliant documents.
Impose homogeneity on the work of multiple different authors.
Allow applications to expect certain characteristics of the documents to be present.
Akoma Ntoso, for instance, provides a two-tiered approach, allowing the full potential of both to be expressed.
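The difference between the two kinds of schema can be made concrete with two small checks over the same vocabulary. This is a sketch in plain Python rather than in a schema language, and the vocabulary and ordering rule are invented for illustration; they are not the rules of Akoma Ntoso.

```python
import xml.etree.ElementTree as ET

VOCABULARY = {"act", "section", "heading", "p"}

def descriptive_ok(root):
    """Loose check: every element belongs to the vocabulary, nothing more."""
    return all(el.tag in VOCABULARY for el in root.iter())

def prescriptive_ok(root):
    """Tight check: vocabulary plus presence and order inside each section."""
    if not descriptive_ok(root):
        return False
    for section in root.iter("section"):
        tags = [child.tag for child in section]
        # A heading in first position, followed only by one or more paragraphs.
        if len(tags) < 2 or tags[0] != "heading" or any(t != "p" for t in tags[1:]):
            return False
    return True

legacy = ET.fromstring("<act><section><p>Old text with no heading.</p></section></act>")
drafted = ET.fromstring("<act><section><heading>Scope</heading><p>New text.</p></section></act>")

print(descriptive_ok(legacy), prescriptive_ok(legacy))
print(descriptive_ok(drafted), prescriptive_ok(drafted))
```

The legacy document can still be described, while only the newly drafted one passes the prescriptive check.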
Metadata (and ontologies) (1)
Documents’ content does not include all that is interesting about them. A metadata schema is necessary to associate to documents all data that is not in the content of a document.
Some metadata schemas are flat, i.e., metadata are simply text values referring to the document; e.g.: Dublin Core, MARC 21, etc. This prevents tools from:
differentiating between the different ideas of document,
identifying more precisely the classes of concepts associated to documents, such as actors (persons and organizations), events, provisions, places, terms, etc.
An ontology expressed using Semantic Web concepts and languages (e.g., OWL and/or Topic Maps) offers all the advantages of metadata schemas, and also makes it possible to:
associate appropriate properties to different ideas of documents (e.g., author, creation date, title, etc.)
make assertions about abstract concepts rather than plain strings
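A rough illustration of the contrast, written in plain Python rather than in OWL or Topic Maps: flat metadata compares strings, while a structured model attaches statements to a specific idea of document and refers to concepts by identifier. The identifiers (`org:parliament`, `org:editor`) and all values are hypothetical.

```python
from dataclasses import dataclass

# Flat metadata: plain strings attached to "the document".
flat_a = {"creator": "Parliament", "date": "2004"}
flat_b = {"creator": "The Parliament", "date": "2004"}

@dataclass(frozen=True)
class Organization:
    ident: str       # hypothetical identifier of the concept
    label: str       # one of possibly many display strings

@dataclass(frozen=True)
class Assertion:
    level: str       # which idea of document: "work", "expression", ...
    prop: str
    value: object

parliament = Organization("org:parliament", "Parliament")
same_parliament = Organization("org:parliament", "The Parliament")

structured = [
    Assertion("work", "author", parliament),
    Assertion("expression", "author", Organization("org:editor", "Editorial office")),
]

def same_actor(a, b):
    """Compare concepts by identifier, not by the string used to display them."""
    return a.ident == b.ident

def authors_of(level, assertions):
    """Return the identifiers of the authors asserted for one idea of document."""
    return [a.value.ident for a in assertions if a.level == level and a.prop == "author"]

print(flat_a["creator"] == flat_b["creator"])
print(same_actor(parliament, same_parliament))
print(authors_of("expression", structured))
```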
Metadata issues
Authorship of metadata:
The generation of metadata is itself an authoring process and needs to be controlled, dated, signed, clearly identified.
Versioning of metadata:
Metadata may change in time, and actually more often than the document content. How should we deal with these changes?
Relationships between metadata and IFLA FRBR document levels:
Each piece of metadata refers to one and not the other idea of document. We need to make sure that these associations are unambiguous and agreed upon.
Location of metadata: internal or external?
Internal location guarantees co-maintenance of content and metadata, but makes it difficult to allow for multiple views of the same content.
External location allows multiple metadata sets to coexist on the same document, but complicates correct association of data and metadata.
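One possible way to address authorship, versioning and FRBR level together is to record every piece of metadata as a dated, attributed statement. The sketch below assumes such a record layout; the field names and the sample statements are invented, and digital signatures are left out.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class MetadataStatement:
    level: str          # FRBR level the statement refers to
    prop: str
    value: str
    asserted_by: str    # who authored this piece of metadata
    asserted_on: date   # when it was authored

def value_as_of(statements, level, prop, when):
    """Return the most recent value asserted on or before the given date."""
    candidates = [s for s in statements
                  if s.level == level and s.prop == prop and s.asserted_on <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.asserted_on).value

# Placeholder statements kept outside the document (external location).
statements = [
    MetadataStatement("expression", "classification", "general", "editor A", date(2005, 1, 1)),
    MetadataStatement("expression", "classification", "fiscal", "editor B", date(2007, 1, 1)),
]

print(value_as_of(statements, "expression", "classification", date(2006, 6, 1)))
print(value_as_of(statements, "expression", "classification", date(2008, 1, 1)))
```

Older statements are never overwritten, so the history of the metadata remains available alongside its current value.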
Metadata terminology
Objective:
A piece of information for which no reasonable doubt can exist.
E.g., the title of article 15, the publication date.
Subjective:
A piece of information that requires an active interpretation from a human that may be wrong, or for which different opinions exist.
E.g., resolution of implicit citations, classification of provisions.
Low competence:
A piece of information whose determination requires only the kind of competence one may expect from a non-specialized employee, such as a secretary, armed with just common sense and some topical experience.
E.g., where article 1 ends and article 2 starts.
High competence:
A piece of information whose determination requires the kind of competence one may expect from specialized jurists who come to their results after careful and painstaking reasoning.
E.g., dates and times in norms.
Workflow management
An important bit of metadata sophistication is the support for workflow:
Explicit management of document evolution
Identification of sources of authority (e.g., legislative bodies), sources of changes (e.g., amending acts), time of changes (time of acts is an extremely complex discipline)
Reliable identification of actors and content (through digital signature)
Consolidation and side-by-side comparison
Only possible when structure, content and presentation of documents are explicitly separated.
Traditional approaches are labour-intensive and manual, requiring both legislative and typographic competences.
Explicit recording of structure and independence from presentation allows:
Consolidation as a semi-automatic process based on explicit structural references in amendments and modification laws
Side-by-side comparison as a fully automatic process based on different presentation patterns for the differences between an original and a modified text.
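Both processes can be sketched in Python once the text is addressed by structure. The article identifiers, the amendment format and the sample text are all hypothetical, and the standard library's `difflib` is used here only as a stand-in for a real comparison tool.

```python
import difflib

# The original text, keyed by structural identifier (hypothetical ids).
original = {
    "art1": "This act applies to all public bodies.",
    "art2": "The fee is 10 units.",
}

# An amendment expressed as explicit structural references.
amendment = [
    {"action": "replace", "target": "art2", "text": "The fee is 12 units."},
    {"action": "insert", "target": "art3", "text": "This act enters into force immediately."},
]

def consolidate(text, changes):
    """Apply each change to a copy of the text and return the new version."""
    result = dict(text)
    for change in changes:
        if change["action"] == "replace" and change["target"] not in result:
            raise KeyError(f"cannot replace missing {change['target']}")
        if change["action"] in ("replace", "insert"):
            result[change["target"]] = change["text"]
        elif change["action"] == "repeal":
            del result[change["target"]]
    return result

def compare(old, new):
    """Return diff lines marking what was removed (-) and what was added (+)."""
    old_lines = [f"{k}: {v}" for k, v in sorted(old.items())]
    new_lines = [f"{k}: {v}" for k, v in sorted(new.items())]
    return [line for line in difflib.ndiff(old_lines, new_lines)
            if line.startswith(("- ", "+ "))]

consolidated = consolidate(original, amendment)
for line in compare(original, consolidated):
    print(line)
```

The consolidation step is called semi-automatic in the text because a human still has to turn the wording of an amending act into structural references like those in `amendment`.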
Naming documents and fragments
Uniform Resource Identifiers (URIs):
These are used throughout the World Wide Web to indicate resources.
The best known are URLs (Uniform Resource Locators), which are used to navigate on the web, e.g. http://www.akomantoso.org/09-examples.html
Naming documents and fragments (2)
With legislative documents, the situation is more complex.
Works, expressions and manifestations are not physical resources, but abstract entities. Only items are physical resources. Yet references are rarely (or never) to items.
So works, expressions and manifestations must have their own URIs. Such a URI will not be a URL (i.e., it will not correspond to a physical address on a computer).
The act of finding the URL of the item that best represents the manifestation we are looking for is called URI resolution.
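URI resolution can be pictured as a walk down the FRBR levels until a physical copy is found. The sketch below assumes a small in-memory catalogue; every URI and URL in it is invented, and the policies "latest expression" and "first listed manifestation" are assumptions made for the example, not rules from any standard.

```python
# Hypothetical catalogue: every URI and URL here is invented for illustration.
EXPRESSIONS_OF = {           # work URI -> expression URIs, oldest first
    "/it/act/2004/137": ["/it/act/2004/137/original", "/it/act/2004/137/consolidated"],
}
MANIFESTATIONS_OF = {        # expression URI -> manifestation URIs, preferred first
    "/it/act/2004/137/original": ["/it/act/2004/137/original.xml"],
    "/it/act/2004/137/consolidated": ["/it/act/2004/137/consolidated.xml",
                                      "/it/act/2004/137/consolidated.pdf"],
}
ITEMS_OF = {                 # manifestation URI -> URLs of physical copies
    "/it/act/2004/137/consolidated.xml": ["http://example.org/store/act137-2004.xml"],
    "/it/act/2004/137/consolidated.pdf": ["http://example.org/store/act137-2004.pdf"],
}

def resolve(uri):
    """Walk down from any level to the URL of an item that best represents it."""
    if uri in EXPRESSIONS_OF:                 # a work: take its latest expression
        return resolve(EXPRESSIONS_OF[uri][-1])
    if uri in MANIFESTATIONS_OF:              # an expression: try each manifestation
        for manifestation in MANIFESTATIONS_OF[uri]:
            url = resolve(manifestation)
            if url is not None:
                return url
        return None
    items = ITEMS_OF.get(uri, [])             # a manifestation: pick a stored copy
    return items[0] if items else None

print(resolve("/it/act/2004/137"))
print(resolve("/it/act/2004/137/original"))
```

The second call returns `None` because the catalogue holds no physical copy of the original expression, which shows that a valid URI may name a document that is not available.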
Naming documents and fragments (3)
A naming schema must guarantee a few properties:
Complete: all relevant documents (in all their levels) must be contemplated.
Global: all legislative bodies (ideally even across countries) must be able to use and clearly identify their documents.
Meaningful: names need to mean something. It should be possible to make assumptions about the kind, freshness and relevance of a citation by looking only at the reference’s name.
Memorizable: names need to be easy to jot down, easy to remember, easy to correct if something was written down wrongly.
Guessable: given a reference to act 136/2005, it should be easy to deduce the form for act 76/2006, etc.
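A name pattern with these properties might look like the following Python sketch. The pattern `/<country>/<type>/<year>/<number>` and the country code `it` are assumptions made for the example; they are not the naming convention of Norme In Rete, Akoma Ntoso or CEN Metalex.

```python
import re

# Hypothetical name pattern: /<country>/<type>/<year>/<number>
PATTERN = re.compile(
    r"^/(?P<country>[a-z]{2})/(?P<kind>[a-z]+)/(?P<year>\d{4})/(?P<number>\d+)$"
)

def work_name(country, kind, year, number):
    """Build the name of a work from the facts usually found in a citation."""
    return f"/{country}/{kind}/{year}/{number}"

def parse_name(name):
    """Read country, kind, year and number back from the name alone."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid name: {name}")
    parts = match.groupdict()
    return parts["country"], parts["kind"], int(parts["year"]), int(parts["number"])

known = work_name("it", "act", 2005, 136)
# Guessable: the same rule yields the name of act 76/2006.
guessed = work_name("it", "act", 2006, 76)

print(known, guessed)
print(parse_name(guessed))
```

Because the name is built only from facts present in an ordinary citation, a reader can write it down without first looking the document up.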
The basic features of a good national standard
Compatibility with CEN Metalex
Systematic use of W3C standards (esp. XML, XML Schema, Namespaces, Semantic Web languages, etc.)
Separate: structure, normative content, presentation, metadata
Strong naming policies (a future extension of CEN Metalex will provide guidelines)
Allow for exceptions, extensions and customization
Why bother?
An open standard for data format allows for easier, more cost-effective distribution of legislative content.
An open standard for data format allows for long-term preservation of investments and supports ease of maintenance.
An open standard for data format allows for a thriving, competitive market of tools.
An open standard for data format allows integration of authoritative content providers and added-value content providers (esp. private publishers and academics).
An open standard for data format allows comparative studies to be performed with greater ease.
Inventing, adopting, or… ?
As long as fundamental compatibility is maintained
in terms of basic structures (CEN Metalex)
and naming policies (URI-based),
it is not relevant whether you adopt existing standards…
E.g. Akoma Ntoso
… or invent your own new national one.
But do behave fairly, and allow for international interoperability.
Conclusions (1)
A successful system is built on three key factors:
Precise and sophisticated content structure
Complete metadata model (with precise time-awareness)
Sophisticated and easy-to-use naming mechanism
NormeInRete, Akoma Ntoso and (increasingly) CEN Metalex share these properties.
It is also important to remember that we are discovering new and interesting ways to store and use information at this very moment. Casting in stone design decisions that prevent the future evolution of document formats, tool architectures and overall functionalities is therefore wrong and doomed to fail.
Conclusions (2)
Adopting an international standard (e.g. Akoma Ntoso) is a first step in the right direction:
It is open to local customization, yet international.
It allows immediate adoption of existing architectures and tools, yet allows for local developments and extensions.
Sharing knowledge and experiences with colleagues from other countries increases the chance of success of local initiatives.
Chances for training and capacity building exist:
Cf.: Summer school on Legislative Informatics in Florence (September 2007, June 2008)…
… but also local initiatives specific to regional and national needs (e.g. African legislative school, Kenya, January 2008)