XML for Information Management – Day 2 Airi Salminen University of Erlangen-Nuremberg...
XML for Information Management – Day 2
Airi Salminen
University of Erlangen-Nuremberg, Computational Linguistics
Instructor: Professor Airi Salminen, http://users.jyu.fi/~airi/
12.1.-16.1.2009
XML for Information Management
Day 2: Background of XML
Outline
1. Markup languages
2. Structured documents
3. World Wide Web Consortium
1. Markup languages
Markup
• intended for human readers
• intended for computers
Markup for human readers
to clarify the written expression
• punctuational
• presentational
Text has always included some kind of markup, also before the time of computers.
Without markup: Texthasalwaysincludedsomekindofmarkupalsobeforethetimeofcomputers
With punctuational markup: Text has always included some kind of markup, also before the time of computers.
Markup for computers
to provide information for a software module
• presentational
• procedural
• descriptive
In markup languages there is a clear separation of markup and primary content. Markup is metadata, adding some information to the primary data.
Presentational markup
information about the way the software module should present the primary content to the human perceiver
The markup in an HTML file:
In <i>markup languages</i> there is clear separation of <i>markup</i> and <i>primary content</i>. Markup is <i>metadata</i>, adding some information to the primary data.
The tags <i> and </i> represent presentational markup in HTML.
Procedural markup
a processing instruction for the software module
The markup in an XML file:
<![CDATA[<element>Example of an XML element</element>]]>
The strings <![CDATA[ and ]]> represent procedural markup in XML.
<![CDATA[ instructs the XML processor to regard all text before ]]> as character data.
]]> instructs the XML processor to continue normal identification of markup.
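The effect of a CDATA section can be observed with any XML processor. The following is a minimal Python sketch using the standard library module xml.etree.ElementTree. The wrapper element name note is an assumption, added only so that the fragment forms a well-formed document.

```python
import xml.etree.ElementTree as ET

# "note" is a hypothetical wrapper element; the CDATA section is the one from the slide.
document = "<note><![CDATA[<element>Example of an XML element</element>]]></note>"

root = ET.fromstring(document)

# The processor reports the CDATA content as plain character data of <note>.
text = root.text
child_count = len(root)  # number of child elements recognized inside <note>

print(text)
print(child_count)
```

The parsed note element has no child elements: everything between <![CDATA[ and ]]> is delivered to the application as character data, not as markup.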
Declarative markup
describes the content of a piece of primary content, what it is, or declares that the piece is a member of a particular class
The markup in an XML file:
<student>
  <first_name>Steve</first_name>
  <last_name>Chung</last_name>
  <email>[email protected]</email>
</student>
XML is primarily for declarative markup.
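Because declarative markup names each piece of content, a program can select content by name. A minimal Python sketch with xml.etree.ElementTree, using the student record above; the variable names are illustrative, and the e-mail value is copied exactly as it appears in the source.

```python
import xml.etree.ElementTree as ET

# The student record from the slide, written without whitespace between elements.
student_xml = (
    "<student>"
    "<first_name>Steve</first_name>"
    "<last_name>Chung</last_name>"
    "<email>[email protected]</email>"
    "</student>"
)

student = ET.fromstring(student_xml)

# Map each element name to its character content.
fields = {child.tag: child.text for child in student}
full_name = f"{fields['first_name']} {fields['last_name']}"

print(full_name)
```

The element names say what the content is (a first name, a last name, an e-mail address), not how it should be presented.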
Markup in XML
‣ All markup delivers information to the XML processor. A DTD represents metamarkup, facilitating the definition of the markup vocabulary.
‣ Markup in an XML document is usually classified with respect to the application.
‣ Processing instructions represent procedural markup.
‣ Element tags represent declarative markup.
‣ In the specification of an XML application, different kinds of meanings can be given to element names: they can be processing instructions to the application or instructions about the way the content should be presented by the application.
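The distinction between processing instructions and element tags is visible in the node types an XML processor reports. A minimal Python sketch with the standard library module xml.dom.minidom; the element names rhymes and rhyme and the stylesheet reference style.css are assumptions chosen for illustration.

```python
from xml.dom import minidom

# Hypothetical document: one processing instruction followed by the root element.
document = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    '<rhymes><rhyme>Example</rhyme></rhymes>'
)

# Separate the top-level nodes by type.
instructions = [
    node for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
elements = [
    node for node in document.childNodes
    if node.nodeType == node.ELEMENT_NODE
]

for pi in instructions:
    print("procedural:", pi.target, pi.data)
for element in elements:
    print("declarative:", element.tagName)
```

The processing instruction is addressed to an application (here, one that applies stylesheets), while the element tag only names the content.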
Example of HTML markup
<html>
<head><title>University of Jyväskylä</title></head>
<body>
<h2>Faculties</h2>
<ul>
<li>Humanities
<li>Information Technology
<li>Social Sciences
</ul>
<br>
<address>[email protected]</address>
</body>
</html>
The element markup describes the structure for WWW publishing.
<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
  <contact_email>[email protected]</contact_email>
</university>
The same primary content, with XML markup describing the content of the elements.
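With content-describing markup, an application can ask for "the faculties" or "the name" directly. A minimal Python sketch with xml.etree.ElementTree, using the university document above; the contact_email element is left out of the sketch because its value is not legible in the source.

```python
import xml.etree.ElementTree as ET

university_xml = """<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
</university>"""

root = ET.fromstring(university_xml)

# Select content by what it is, not by how it is displayed.
university_name = root.findtext("name")
faculty_names = [faculty.text for faculty in root.iter("faculty")]

print(university_name)
print(faculty_names)
```

In the HTML version the same selection would have to rely on presentation structure (the title and the list items), which says nothing about what the content means.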
Logical structure of the HTML document
html
  head
    title
      University of Jyväskylä
  body
    h2
      Faculties
    ul
      li
        Humanities
      li
        Information Technology
      li
        Social Sciences
    br
    address
Logical structure of the XML document
university
  name
    University of Jyväskylä
  faculties
    Faculties
    faculty
      Humanities
    faculty
      Information Technology
    faculty
      Social Sciences
  contact_email
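The logical structure of an XML document can be derived mechanically from its markup. The following Python sketch walks the element tree and produces an indented outline; the function name outline is hypothetical, and the input is a shortened version of the university document.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return the logical structure as indented lines: element names and quoted text leaves.

    Text that follows a child element (tail text) is ignored in this sketch.
    """
    indent = "  " * depth
    lines = [indent + element.tag]
    text = (element.text or "").strip()
    if text:
        lines.append(f'{indent}  "{text}"')
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(
    "<university><name>University of Jyväskylä</name>"
    "<faculties>Faculties<faculty>Humanities</faculty></faculties></university>"
)
structure = outline(root)
print("\n".join(structure))
```

Each element becomes an inner node of the tree and each piece of primary content becomes a leaf.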
2. Structured documents
Structured document
‣ structure, content, and external presentation can be separated from each other and processed separately
‣ structural components have names
‣ structural components can be recognized by software modules
‣ possible to define the structure
Structured document
‣ Structure: different languages for defining the structure, e.g., DTD, XML Schema, RELAX NG for XML
‣ Content: an open language standard, e.g., SGML, XML
‣ Layout: different languages for defining the layout, e.g., CSS and XSL for XML
Structured document: example
‣ Structure: DTD.txt
‣ Content: rhymes.txt, rhymes.xml
‣ Layout: style.txt, style.css
‣ Content with the style attached: rhymes with style attachment.xml, rhymes with style attachment.txt
Management of structured documents
‣ document management
‣ management of the data contained in documents
Characteristics in the management of structured documents
‣ Design. Adopting the approach of structured document management in an environment often requires careful planning before the creation of documents. This includes schema design and layout design.
‣ Content production. Content can be produced by different types of software, e.g. by a syntax-directed editor. Validity is checked against the schema.
‣ Evolution. Schema versioning, layout versioning.
‣ Operations. The most typical operation is some kind of transformation.
‣ Software. Many kinds of software systems are used.
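A transformation, the most typical operation named above, takes a document in one structure and produces a document in another. The following Python sketch converts a shortened university document into an HTML page with xml.etree.ElementTree. The function name university_to_html is hypothetical; in practice a dedicated transformation language such as XSLT is often used instead of hand-written code.

```python
import xml.etree.ElementTree as ET

def university_to_html(xml_text):
    """Transform a <university> document into an HTML page listing its faculties."""
    source = ET.fromstring(xml_text)

    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = source.findtext("name")

    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h2").text = "Faculties"
    faculty_list = ET.SubElement(body, "ul")
    for faculty in source.iter("faculty"):
        ET.SubElement(faculty_list, "li").text = faculty.text

    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(html, encoding="unicode")

result = university_to_html(
    "<university><name>University of Jyväskylä</name>"
    "<faculties><faculty>Humanities</faculty>"
    "<faculty>Social Sciences</faculty></faculties></university>"
)
print(result)
```

The content is untouched by the transformation; only the structure around it changes, which is what makes separate processing of content, structure, and layout possible.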
Traditional document management
- No schema design.
- Processing applied to a document.
- Content, structure, and layout together.
Structured document management
- Schema design important. Layout is also designed.
- Schemas can be utilized in various ways. Semantic information attached to the schemas.
- Processing of document parts.
- Content, structure, and layout can be processed separately.
- Management required for content, schema, and stylesheet items and their different versions.
Database management
- A database is often the information repository of one software system, the Database Management System (DBMS); data is processed by the operations of the DBMS.
- Design divided into schema design and view design.
- Content produced gradually, by the operations of the DBMS.
- Queries are the most important operations.
Structured document management
- Different software systems used to manipulate data.
- Schema design often related to extensive sectoral standard development. Layout requires design as well.
- Content produced by different kinds of programs, e.g. interactively by structure editors or automatically.
- Transformations are the most important operations.
Database languages
‣ definition languages
‣ query languages
Structured document languages
‣ definition languages
‣ style languages
‣ various manipulation, transformation and query languages
3. World Wide Web Consortium
‣ W3C develops specifications to support the use of the web, publicly available at http://www.w3.org/TR/
‣ Development is systematic
‣ Development process is specified and published
Phases of the development process
‣ Working Draft: represents work in progress.
‣ Candidate Recommendation: has received significant review from its immediate technical community; explicit call for implementation and technical feedback.
‣ Proposed Recommendation: represents consensus in the development group; proposed to the Advisory Committee for review.
‣ Recommendation: represents consensus within W3C; widespread implementation encouraged.
What happens to a W3C Recommendation?
‣ It remains a Recommendation indefinitely.
‣ W3C rescinds the Recommendation. A report called Rescinded Recommendation is published.
‣ A new version of the Recommendation is developed.
‣ Minor modifications are made. A report called Proposed Edited Recommendation is published.