Service-Oriented Architecture for automatic markup of documents

Service-Oriented Architecture for automatic markup of documents. A use case for legal documents. Francisco Adolfo Cifuentes-Silva Library of Congress of Chile - BCN 2014-08-19 “Digital law libraries at the crossroads: Innovative solutions to complex challenges.”

description

WLIC IFLA 2014 presentation

Transcript of Service-Oriented Architecture for automatic markup of documents

Page 1: Service-Oriented Architecture for automatic markup of documents

Service-Oriented Architecture for automatic markup of documents.

A use case for legal documents.

Francisco Adolfo Cifuentes-Silva

Library of Congress of Chile - BCN

2014-08-19

“Digital law libraries at the crossroads: Innovative solutions to complex challenges.”

Page 2: Service-Oriented Architecture for automatic markup of documents

Outline:
Project context
Strategic decisions
- SOA
- Linked Open Data
- Akoma-Ntoso
Automatic markup
- Named Entity Recognizer
- URI assignment
- Structural Markup
- Akoma-Ntoso translator
Results and discussion
Conclusions
Future work
Acknowledgements

Project context

The project arose in response to two problems:

1. Obtaining all parliamentary interventions within the legislative process (Congress sessions and related documents)

2. Tracing the evolution of a law and the discussion around it, from the moment it is introduced as a bill until it is published as law

Francisco Adolfo Cifuentes-Silva - Library of Congress of Chile 2


Page 3: Service-Oriented Architecture for automatic markup of documents

And in an automated way!


Page 4: Service-Oriented Architecture for automatic markup of documents

Project context

How: through two sibling projects:

Parliamentary Labor project (PL):

To obtain all parliamentary interventions within the legislative process (Congress sessions and related documents)

History of the Law project (HL):

To trace the evolution of a law and the discussion around it, from the moment it is introduced as a bill until it is published as law


Page 5: Service-Oriented Architecture for automatic markup of documents

Project context

“Sibling projects” because both can be built by processing the same documents:

• Session dailies
• Debate reports
• Reports
• Amendments
• Bills
• etc.

Page 6: Service-Oriented Architecture for automatic markup of documents

Project context

Page 7: Service-Oriented Architecture for automatic markup of documents

Project context

[Diagram: Congress and legal resources]

Page 8: Service-Oriented Architecture for automatic markup of documents

Project context

Chilean Congress

- Senate
- Chamber of Deputies

Page 9: Service-Oriented Architecture for automatic markup of documents

Project context

Legal resources production

- Session dailies
- Debate reports
- Bills, etc.

Page 10: Service-Oriented Architecture for automatic markup of documents

Project context

[Diagram: Congress and legal resources; Workflow]

Page 11: Service-Oriented Architecture for automatic markup of documents

Project context

Business Processes

- Each type of document has its own process flow

- BCN implements a Workflow Management System for PL & HL

Page 12: Service-Oriented Architecture for automatic markup of documents

Project context

[Diagram: Congress and legal resources; Workflow; Tools]

Page 13: Service-Oriented Architecture for automatic markup of documents

Project context

Support tools

- Automatic XML Marker
- Web XML Editor
- An XSD (XML Schema) is the basis of the support tools

Page 14: Service-Oriented Architecture for automatic markup of documents

Project context

[Diagram: Congress and legal resources; Workflow; Tools; XML Storage]

Page 15: Service-Oriented Architecture for automatic markup of documents

Project context

XML Storage

- SVN server for XML documents
- Allows us to manage all XML versions
- REST access: HTTP GET, PUT
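As a rough illustration of the REST access mentioned on this slide, the sketch below builds HTTP GET and PUT requests for one XML document with Python's standard library. The base URL and document path are hypothetical; the slides do not describe BCN's actual endpoints or authentication, and whether a plain PUT creates a new revision depends on how the server is configured.

```python
import urllib.request

# Hypothetical base URL of the XML repository; not taken from the slides.
BASE_URL = "http://svn.example.org/xml"


def build_get(doc_path: str) -> urllib.request.Request:
    """Build a GET request that would fetch the current version of a document."""
    return urllib.request.Request(f"{BASE_URL}/{doc_path}", method="GET")


def build_put(doc_path: str, xml_text: str) -> urllib.request.Request:
    """Build a PUT request that would upload a new version of a document."""
    return urllib.request.Request(
        f"{BASE_URL}/{doc_path}",
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "application/xml; charset=utf-8"},
        method="PUT",
    )


if __name__ == "__main__":
    get_req = build_get("sessions/2014/session-001.xml")
    put_req = build_put("sessions/2014/session-001.xml", "<session/>")
    print(get_req.get_method(), get_req.full_url)
    print(put_req.get_method(), put_req.full_url)
    # Nothing is sent here; urllib.request.urlopen(req) would perform the call.
```

Keeping request construction separate from sending makes the two operations easy to inspect without a live server.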

Page 16: Service-Oriented Architecture for automatic markup of documents

Project context

[Diagram: Congress and legal resources; Workflow; Tools; XML Storage; Information extraction; Linked Open Data]

Page 17: Service-Oriented Architecture for automatic markup of documents

Project context

Information Extraction

New information is extracted from the enriched XML in two formats:

- Linked Open Data
- Relational data (fact table)
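A minimal sketch of what extracting both formats from one enriched document could look like. The XML element and attribute names (`session`, `speech`, `by`, `entity`, `uri`) and the predicate URIs are invented for illustration; the slides do not show BCN's actual markup or vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical enriched XML: tag and attribute names are illustrative only.
SAMPLE = """
<session uri="http://example.org/session/1">
  <speech by="http://example.org/person/10">
    I thank <entity uri="http://example.org/org/5">the Senate</entity>.
  </speech>
  <speech by="http://example.org/person/11">No mentions here.</speech>
</session>
"""

# Hypothetical predicate URIs.
HAS_SPEAKER = "http://example.org/vocab#hasSpeaker"
MENTIONS = "http://example.org/vocab#mentions"


def extract(xml_text):
    """Return (triples, fact_rows) derived from one enriched document."""
    root = ET.fromstring(xml_text)
    session_uri = root.get("uri")
    triples, rows = [], []
    for speech in root.iter("speech"):
        speaker = speech.get("by")
        triples.append((session_uri, HAS_SPEAKER, speaker))
        mentions = [e.get("uri") for e in speech.iter("entity")]
        for uri in mentions:
            triples.append((session_uri, MENTIONS, uri))
        # One fact-table row per intervention: session, speaker, mention count.
        rows.append((session_uri, speaker, len(mentions)))
    return triples, rows


if __name__ == "__main__":
    triples, rows = extract(SAMPLE)
    for s, p, o in triples:
        print(f"<{s}> <{p}> <{o}> .")  # N-Triples style line
    print(rows)
```

The same pass over the document feeds both outputs: triples for the Linked Open Data side and flat rows for the relational side.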

Page 18: Service-Oriented Architecture for automatic markup of documents

Project context

[Diagram: Congress and legal resources; Workflow; Tools; XML Storage; Information extraction; Linked Open Data]

Page 19: Service-Oriented Architecture for automatic markup of documents

Project context

[Diagram: Congress and legal resources; Workflow; Tools; XML Storage; Information extraction; Linked Open Data]

New data is used for a new process

Page 20: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

Service Oriented Architecture

Our focus:

- HTTP is the base
- REST Web Services
- W3C Web Standards

Page 21: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

Service Oriented Architecture

[Diagram: Workflow Management System; Automatic Markup; XML Editor; RDF Triplestore; SVN XML; Mediator; Web Services]

Page 22: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

Linked Open Data (LOD)

BCN has published LOD since 2011:

- Dataset of legal norms
- Dataset of legislative documents
- Datasets and ontologies about:
  – People
  – Geographic places
  – Organizations
  – Others, such as roles, bills, congress structure, etc.

Please visit http://datos.bcn.cl

Page 23: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

Linked Open Data

For automatic markup we are using:

• URIs for legal documents
• URIs for metadata
• URIs for named entities:
  – URIs for people
  – URIs for organizations
  – URIs for roles
  – URIs for events
  – URIs for locations
  – … URIs for all

Page 24: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

The definition of an XML Schema

We need an XML Schema to mark up documents and, eventually, to exchange them, so we have two main choices:

• Own XML Schema = low interoperability and reusability, high cost

• Standard XML Schema = high interoperability and reusability, low cost

Page 25: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

Page 26: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

The definition of an XML Schema

Standard XML Schema = high interoperability and reusability, low cost

OK, but why Akoma-Ntoso?

Page 27: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

Akoma-Ntoso

- XML Schema for legal documents, designed and supported by “great minds” in the OASIS group

- Supports many types of documents (session dailies, bills, debate reports, amendments, among others)

Page 28: Service-Oriented Architecture for automatic markup of documents

Strategic decisions

Akoma-Ntoso

- There is a growing set of tools for working with it, such as Web XML editors and office editor tools, for example:
  – LegisProWeb
  – Bungeni
  – Lime Editor

Page 29: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

[Diagram: Automatic XML Marker pipeline: Plain text → Named Entities recognition → URI assignment → Structural Markup → Akoma-Ntoso translation → AKN XML]

Page 30: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

[Diagram: Automatic XML Marker pipeline: Plain text → Named Entities recognition → URI assignment → Structural Markup → Akoma-Ntoso translation → AKN XML]

Page 31: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Named Entity Recognizer (NER)

- We need to identify entities in the text
- We are using a version of Stanford NER adapted to Spanish, which uses a CRF classifier
- The classifier was trained on large documents, achieving over 80% effectiveness in entity recognition
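A CRF tagger such as Stanford NER assigns a label to each token, so a service built on it usually has to merge labelled tokens into entity mentions before anything else can use them. The sketch below shows that grouping step in Python (the BCN service itself is written in Java). The sentence and label names are illustrative and not taken from the BCN model.

```python
from itertools import groupby

# Token/label pairs as a CRF tagger might emit them; "O" marks non-entity tokens.
# The sentence and the label set are illustrative assumptions.
TAGGED = [
    ("El", "O"), ("senador", "O"), ("Juan", "PERSON"), ("Pérez", "PERSON"),
    ("habló", "O"), ("en", "O"), ("Valparaíso", "LOCATION"), (".", "O"),
]


def group_entities(tagged):
    """Merge runs of consecutive tokens that share the same non-"O" label."""
    entities = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != "O":
            text = " ".join(token for token, _ in run)
            entities.append((text, label))
    return entities


if __name__ == "__main__":
    print(group_entities(TAGGED))
```

One known limitation of this simple grouping: two distinct entities of the same type that appear side by side are merged into a single mention.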

Page 32: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Named Entity Recognizer (NER)

Web service, written in Java and based on Stanford NER

Page 33: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

[Diagram: Automatic XML Marker pipeline: Plain text → Named Entities recognition → URI assignment → Structural Markup → Akoma-Ntoso translation → AKN XML]

Page 34: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

URI assignment

- Once the NER has found all entities, we need to assign each one its URI

- This tool is called “The Mediator”; it was developed in collaboration with the Weso Research Group of the University of Oviedo.

Page 35: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Mediator output in XML

Web service, written in Java and based on Apache Lucene

Page 36: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Mediator features

- Connected to a SPARQL endpoint
- Allows context information to be set for each work session (e.g. date, chamber, type of document being marked up)
- Using the context information, it applies a set of heuristics for each entity type to identify the correct URI for each entity
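To make the idea of context-driven heuristics concrete, here is a small sketch of one possible heuristic for people: among candidates with a matching name, keep only those who served in the session's chamber on the session's date. The candidate records, field names and URIs are hypothetical, and the Mediator's real heuristics are not described in the slides.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Candidate:
    uri: str
    name: str
    chamber: str
    term_start: date
    term_end: date


@dataclass
class Context:
    session_date: date
    chamber: str


# Hypothetical candidates, as a label search might return them.
CANDIDATES = [
    Candidate("http://example.org/person/1", "Juan Pérez", "senate",
              date(1990, 3, 11), date(1998, 3, 10)),
    Candidate("http://example.org/person/2", "Juan Pérez", "deputies",
              date(2010, 3, 11), date(2014, 3, 10)),
]


def resolve(mention: str, ctx: Context, candidates: List[Candidate]) -> Optional[str]:
    """Return the URI of the single candidate consistent with the context, else None."""
    matches = [
        c for c in candidates
        if c.name.lower() == mention.lower()
        and c.chamber == ctx.chamber
        and c.term_start <= ctx.session_date <= c.term_end
    ]
    # Ambiguous or empty results stay unresolved so a human editor can decide.
    return matches[0].uri if len(matches) == 1 else None


if __name__ == "__main__":
    ctx = Context(session_date=date(2012, 5, 2), chamber="deputies")
    print(resolve("Juan Pérez", ctx, CANDIDATES))
```

Returning `None` rather than guessing fits a workflow in which automatic markup is followed by manual review.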

Page 37: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

[Diagram: Automatic XML Marker pipeline: Plain text → Named Entities recognition → URI assignment → Structural Markup → Akoma-Ntoso translation → AKN XML]

Page 38: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Structural markup

- The problem is to detect structural sections

- Combination of methods:
  – Regular expressions
  – Algorithms for detecting sequences
  – Rules and algorithms
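The sketch below shows one way regular expressions and sequence detection can work together: a regex proposes heading candidates, and a sequence check accepts only those that continue the numbering 1, 2, 3, and so on. This filters out in-text references that merely look like headings. The heading pattern and the sample text are assumptions for illustration, not BCN's actual rules.

```python
import re

# Lines starting with "Artículo <n>" are heading candidates (the pattern is an assumption).
HEADING = re.compile(r"^Artículo\s+(\d+)\b", re.IGNORECASE)


def find_articles(lines):
    """Return (line_index, article_number) for headings forming the sequence 1, 2, 3, ..."""
    found = []
    expected = 1
    for index, line in enumerate(lines):
        match = HEADING.match(line.strip())
        # Accept a candidate only if it is the next number in the sequence.
        if match and int(match.group(1)) == expected:
            found.append((index, expected))
            expected += 1
    return found


if __name__ == "__main__":
    text = [
        "Artículo 1.- Créase el registro.",
        "Artículo 5 de la ley anterior se mantiene vigente.",
        "Artículo 2.- El registro será público.",
    ]
    print(find_articles(text))
```

In the sample, the second line matches the regex but is rejected by the sequence check because article 2 is expected at that point.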

Page 39: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Structural markup

- The combination of methods depends on the document type

- Finally, the object representation of the document (similar to a DOM) is converted to ad-hoc XML

Page 40: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Structural markup

Web service, written in Java

Page 41: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

[Diagram: Automatic XML Marker pipeline: Plain text → Named Entities recognition → URI assignment → Structural Markup → Akoma-Ntoso translation → AKN XML]

Page 42: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Akoma-Ntoso translator

- We need AKN documents for editing, enrichment and extraction
- AKN is a complex schema
- The best solution was to build a web service that converts ad-hoc XML to AKN

Page 43: Service-Oriented Architecture for automatic markup of documents

Automatic markup in XML

Akoma-Ntoso translator

Web service, written in Java

Page 44: Service-Oriented Architecture for automatic markup of documents

Results and discussion

Positive impact on the work: XML markup takes dramatically less time than manual labeling of documents, which reduces the time and cost of product generation

Page 45: Service-Oriented Architecture for automatic markup of documents

Results and discussion

Time for completing a History of the Law in distinct scenarios

Page 46: Service-Oriented Architecture for automatic markup of documents

Conclusions

SOA has made it possible to improve each component separately, with a positive impact on the final result (e.g. datasets, NER training, heuristics)

It is possible to integrate additional XML Schemas for the output

Page 47: Service-Oriented Architecture for automatic markup of documents

Conclusions

The automatic markup of XML documents, followed by manual enrichment of metadata, provides an excellent source for data extraction

Our SOA-based solution allows easy integration of exceptions and new cases in the markup

Page 48: Service-Oriented Architecture for automatic markup of documents

Future Work

Alfonso Pérez, Director of the BCN, has established the concept of the “Semantic Library” as one of the main objectives of the BCN in the institutional strategic plan.

This new concept implies applying the automatic markup scheme to all BCN areas, developing new markup schemas, and possibly facing new challenges in identifying document sections and semantic content.

Page 49: Service-Oriented Architecture for automatic markup of documents

Acknowledgements

• Library of Congress of Chile Team
• Developers team
  – Ricardo Muñoz
  – Claudio Devia
  – Eridan Otto
  – David Vilches
  – Me

Page 50: Service-Oriented Architecture for automatic markup of documents

Thanks for your attention!

If you need more details, you can contact me:

fcifuentes <at> bcn <dot> cl

twitter.com/fcifuentes

www.slideshare.net/francisco.cifuentes

www.linkedin.com/in/fcifuentes