
Legal Information Retrieval on the Web

The Experience of the NiR Portal (http://www.nir.it)

Costantino Ciampi, e-mail: c.ciampi@ittig.cnr.it

CONSIGLIO NAZIONALE DELLE RICERCHE, Istituto di Teoria e Tecniche dell’Informazione Giuridica (http://www.ittig.cnr.it)

Contents

Normeinrete (NIR), "Access to Law on the Net": an e-Government project

• Project description (goals, technology, results)

• Standardization in the legal domain:

– XML representation of Italian norms

– URN adoption to automate hyperlinking among norms in a distributed environment

Rome, 26 April 2004

NiR Project "Access to Law on the Net"

Project goals

• Improving accessibility to legislation by providing a single point of access to Italian and EU legal documents published on different web sites

– ICT to support the fulfillment of rights

• Supporting the Public Administration (PA) in managing the life cycle of legislative documentation and the consolidation of laws, by providing standards, software tools and methodologies

– ICT to improve PA efficiency

A system prototype (third version) is available at the URL: http://www.normeinrete.it

NiR Actors

• Main actors:

– Ministry of Justice (initiator) (www.giustizia.it)

– AIPA, now CNIPA: formerly the Authority for Information Technology in the Public Administration, now the National Center for Information Technology in the Public Administration (founder and technical coordinator) (www.cnipa.it)

• Scientific and technical partners:

– Institute of Legal Information Theory and Technologies of the CNR, Florence (www.ittig.cnr.it)

– CINECA Consortium, Bologna (www.cineca.it)

• Public Administrations participating in the Project

Steps and Resources of the NiR Project

• Phase I (May 1999 - May 2000): first feasibility study and implementation of the Portal prototype

• Phase II (December 2000 - November 2001): second feasibility study, extension of the document base and qualitative evolution of the Portal prototype

• Phase III (2002/2003): definition of the standards (URN and XML) and preparation of the software for disseminating them (reference parser, structure parser, NIREditor XML)

• Phase IV (2004/2005): assignment of operations to external managers and full operation of the NIR Portal (funded by the e-Government programme and Italian financial laws)

NiR Project Strategy

• Implementation of a specialized portal, delivering search and retrieval functions for legislative documents published on the web sites of various Public Administrations;

• Definition of standards, consistent with Internet technologies, to represent data and metadata meaningful in the legal domain;

• Development and distribution of open source software to support legislative document management and publishing;

• Training and knowledge sharing among Public Administrations.

Present Results

• www.normeinrete.it provides unified access to Italian and European Union legislation published on different institutional web sites

So far:

– more than 50 public institutions have taken part in the Project;

– more than 140,000 documents have been indexed;

– about 160,000 search sessions are held monthly on the site;

– the NiR Legal Database ("Norm Catalogue"), including metadata, has been created and is kept up to date;

– the NiR Standards have been defined.

• Two standards issued by AIPA/CNIPA as technical norms:

– DTD definitions for Italian legislation;

– URN definition for any kind of legal document.

• Editors and other software tools developed and distributed to the PA to support implementation of the standards.

NiR Features

• The system is based on a co-operative technological architecture, resulting in a federation of legislative databases developed on different platforms.

• Co-operation is achieved by means of suitable application gateways, which provide "loose" integration by adopting two standards:

– one for identifying legal resources (URNs), and

– one for representing document structures and metadata in XML mark-up, according to ad hoc DTDs.

Searching Tools and Architecture of the NIR System (1)

The NIR System consists of:

• NiR nodes: components belonging to administration domains, containing legal database systems and the related application gateways. Documents can be stored in the file system or within database/full-text management systems; they are all accessible through the Internet.

• Central registries: components in the co-operative layer that publish the information needed to allow effective co-operation.
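To make the idea of "loose" integration concrete, here is a minimal sketch of a federated search across nodes. It is not the NiR implementation: the `Gateway` interface, the `InMemoryGateway` stand-in and the `Hit` record are all hypothetical names introduced for illustration. The only assumption taken from the slides is that every node sits behind an application gateway and that documents are identified by URN.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Hit:
    urn: str    # location-independent identifier of the norm
    url: str    # physical address on the node that returned the hit
    title: str


class Gateway(Protocol):
    """Hypothetical interface exposed by each node's application gateway."""

    def search(self, query: str) -> list[Hit]: ...


class InMemoryGateway:
    """Stand-in for a real node: matches the query against document titles."""

    def __init__(self, documents: list[Hit]) -> None:
        self._documents = documents

    def search(self, query: str) -> list[Hit]:
        needle = query.lower()
        return [d for d in self._documents if needle in d.title.lower()]


def federated_search(gateways: list[Gateway], query: str) -> dict[str, list[Hit]]:
    """Query every node and group the hits by URN, so that copies of the same
    norm published on different sites end up under a single key."""
    grouped: dict[str, list[Hit]] = {}
    for gateway in gateways:
        for hit in gateway.search(query):
            grouped.setdefault(hit.urn, []).append(hit)
    return grouped
```

Grouping by URN rather than by URL is the design point: the portal can present one result per norm even when several administrations publish it.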

Searching Tools and Architecture of the NIR System (2)

• Central registries include:

– Standards repository (XML DTD and URN grammar definitions and tools);

– Registry of official Authority names, needed to standardise URN adoption;

– Registry of NiR nodes, containing information needed to allow interaction between NiR agents and domain application gateways;

– Norm Catalogue, containing, for each norm: title, basic classification, URN and the list of known physical addresses (URL) where it is published.

The Norm Catalogue (> 45,000 documents)

– The Norm Catalogue is a relational database containing, for each norm: title, basic classification, URN and the list of known physical addresses (URL) where it is published
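The record layout described above maps naturally onto two relational tables: one row per norm, and one row per known address. The following is a minimal sketch under that assumption, using an in-memory SQLite database; the table names, column names and helper functions are hypothetical and do not reflect the actual catalogue schema.

```python
import sqlite3

# Hypothetical schema: one row per norm, plus one row per known address.
SCHEMA = """
CREATE TABLE norm (
    urn            TEXT PRIMARY KEY,
    title          TEXT NOT NULL,
    classification TEXT
);
CREATE TABLE norm_url (
    urn TEXT NOT NULL REFERENCES norm(urn),
    url TEXT NOT NULL,
    PRIMARY KEY (urn, url)
);
"""


def open_catalogue() -> sqlite3.Connection:
    """Create an empty in-memory catalogue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_norm(conn: sqlite3.Connection, urn: str, title: str,
             classification: str, urls: list[str]) -> None:
    """Register a norm together with the addresses where it is published."""
    conn.execute("INSERT INTO norm VALUES (?, ?, ?)", (urn, title, classification))
    conn.executemany("INSERT INTO norm_url VALUES (?, ?)", [(urn, u) for u in urls])


def known_urls(conn: sqlite3.Connection, urn: str) -> list[str]:
    """Return the known addresses for a URN (empty list if none are known)."""
    rows = conn.execute(
        "SELECT url FROM norm_url WHERE urn = ? ORDER BY url", (urn,)
    )
    return [row[0] for row in rows]
```

Keeping the addresses in a separate table lets one URN point to any number of copies without changing the norm's own record.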

NiR Standards

• Uniform Resource Name (URN) definition (based on IETF specifications) to:

– identify each document regardless of its physical address (URL)

– allow automatic hyperlinking through a resolution system (similar to the DNS)

• Document Type Definition (DTD) for Italian legislative and regulatory acts (based on the W3C XML meta-language) to represent document structure, semantics and metadata

(*) The standards have been issued as AIPA/CNIPA technical standards and published as regulations in the Italian Official Journal

URNs (1/3)

• Each law contains several references to other laws: the whole legislative corpus can be seen as a net, with laws as nodes connected through references;

• Building a hypertext of laws through URLs requires manual work;

• The URN is a persistent, location-independent resource identification mechanism;

• URNs are defined as a combination of elements, according to a specific grammar; these are basically the name of the enacting Authority, the type of norm, the date, the number and, when needed, some more detailed specifications;

• URNs can be built regardless of whether the corresponding documents are available online.
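As an illustration of how such a name is assembled from its elements, here is a minimal sketch. The `urn:nir:<authority>:<type>:<date>;<number>` layout and the `#` separator for finer specifications are assumptions made for this example; the normative grammar is the one in the AIPA/CNIPA standard, which these slides do not reproduce. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NormReference:
    authority: str                 # enacting Authority, e.g. "stato"
    norm_type: str                 # type of norm, e.g. "legge"
    enacted: date                  # date of the act
    number: int                    # number of the act
    detail: Optional[str] = None   # optional finer specification, e.g. "art1"

    def to_urn(self) -> str:
        # Assumed layout, for illustration only.
        urn = (
            f"urn:nir:{self.authority}:{self.norm_type}:"
            f"{self.enacted.isoformat()};{self.number}"
        )
        if self.detail:
            urn += f"#{self.detail}"   # the separator is an assumption
        return urn


# Sample values, purely illustrative.
reference = NormReference("stato", "legge", date(2000, 11, 24), 340)
print(reference.to_urn())
```

Nothing in `to_urn` needs to know where, or whether, the document is published, which is the point of the last bullet above.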

URNs (2/3)

• Adopting a URN-based scheme makes it possible to build an automated distributed hypertext, following a model similar to the DNS (Domain Name System), which resolves human-readable web site names into numerical IP addresses.

• This opportunity relies on the following considerations:

– the natural language expressions used in law references usually contain repetitive patterns and are therefore automatically detectable;

– the URN is built by combining data that are (almost) always included in the reference;

– cross references between each URN and the list of corresponding URLs, needed for the resolution service, can be built automatically.
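The first two considerations can be sketched with a regular expression over one common Italian citation pattern ("legge 24 novembre 2000, n. 340"). This is a simplified illustration, not the NiR parser: it covers two norm types only, assumes the enacting Authority is always the State (`stato`), and uses an assumed `urn:nir:` layout. All names are hypothetical.

```python
import re

MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# Citation wording -> norm type as written in the URN (assumed spelling).
NORM_TYPES = {
    "legge": "legge",
    "decreto legislativo": "decreto.legislativo",
}

REFERENCE = re.compile(
    r"(?P<type>legge|decreto legislativo)\s+"
    r"(?P<day>\d{1,2})\s+(?P<month>" + "|".join(MONTHS) + r")\s+(?P<year>\d{4}),\s*"
    r"n\.\s*(?P<number>\d+)",
    re.IGNORECASE,
)


def find_references(text: str) -> list[str]:
    """Detect citations in running text and return one URN per citation."""
    urns = []
    for match in REFERENCE.finditer(text):
        norm_type = NORM_TYPES[match["type"].lower()]
        month = MONTHS[match["month"].lower()]
        iso_date = f"{int(match['year']):04d}-{month:02d}-{int(match['day']):02d}"
        # Authority fixed to "stato": an assumption of this sketch.
        urns.append(f"urn:nir:stato:{norm_type}:{iso_date};{match['number']}")
    return urns
```

Every piece of the URN (type, date, number) comes straight out of the matched citation, which is why the link can be generated without knowing where the cited act is published.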

URNs: tools and examples (3/3)

• Parser

– Available online; automatically detects references within laws.

• Resolution service

– Resolves URNs into URLs (when known).

XML Representation of Italian Legislative and Regulatory Acts (1/5)

Three categories

• Documents with a well-defined structure

– laws, constitutional laws, regional laws

• Partially structured documents

– regulatory acts, decrees

• Generic documents

– any kind of non-structured acts, enclosures, ...

DTD definition approach (2/5)

Three DTDs

• Basic DTD: well-structured simple documents

• Strict DTD: well-structured complex documents

• Loose DTD: documents with irregular structure or exceptions (suitable for historical documents)

Each DTD can represent several document types.

Mark-up must be carried out using only the relevant elements.

XML Elements (categories) (3/5)

• Structural elements

– heading, preamble, sections, articles, paragraphs, ...

• Special elements

– references to other laws, formatted representation of relevant entities embedded in the text (institutions, dates, places)

• Elements containing metadata

– subject-matter classification, publication data, preparatory process (iter)

• Semantic elements

– obligations, prohibitions, penalties, exceptions, modifications, abrogations, ...
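To show how these categories fit together in one document, here is a small sketch that parses a marked-up fragment and lists its outgoing references. The element and attribute names (`act`, `meta`, `heading`, `article`, `paragraph`, `ref`, `urn`) are hypothetical English stand-ins, not the names defined in the NiR DTDs, and the URN value is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up: metadata, structural elements and one reference.
SAMPLE = """\
<act>
  <meta>
    <classification>privacy</classification>
  </meta>
  <heading>Sample act</heading>
  <article id="art1">
    <paragraph id="art1-par1">
      Processing is subject to
      <ref urn="urn:example:sample-law">the sample law</ref>.
    </paragraph>
  </article>
</act>
"""


def outgoing_references(xml_text: str) -> list[tuple[str, str]]:
    """Return (paragraph id, cited URN) pairs found in the document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for paragraph in root.iter("paragraph"):
        for ref in paragraph.iter("ref"):
            pairs.append((paragraph.get("id"), ref.get("urn")))
    return pairs


for paragraph_id, urn in outgoing_references(SAMPLE):
    print(paragraph_id, "cites", urn)
```

Because the reference is an element carrying a URN rather than a hard-coded URL, the same mark-up keeps working when the cited document moves.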

Examples of Legal Texts in XML (4/5)

• Example of an Italian Act, tagged with the Basic DTD

• Examples of fragments of legal texts in different formats (XML vs HTML)

• Navigating the document structure with a visual XML editor

Training on XML and Development of an XML NirEditor (5/5)

Considering the relevance of XML to NIR:

• intense training activity has been carried out, also with the aid of a multimedia e-learning product developed by ITTIG-CNR;

• an XML editor, which will be distributed as open source software, has been developed and enriched with parsing functions by ITTIG-CNR.

Opportunities deriving from NIR standards

• Advanced search functions

• Supporting the life cycle of legislative documents (law enactment workflow, "law in force" at any given date)

• Moving from a totally "free" approach to a more formally defined organizational model, in order to achieve completeness and improve precision

Concluding Remarks: Current Developments and Future Initiatives

• Software tools to support Administrations in the adoption of NiR standards

• XML Schema definition

• Parsing services

• New metadata

• Implementation of distributed URN resolution

• Certification of the authenticity of acts through digital signature technology

... The End …