ISOM
Standards in Information Management: XML
Arijit Sengupta
Learning Objectives
• Learn what XML is
• Learn the various ways in which XML is used
• Learn the key companion technologies
• See how XML is being used in industry as a meta-language
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Overview: What is XML?
• A tag-based meta-language
• Designed for structured data representation
• Represents data hierarchically (in a tree)
• Provides context to data (makes it meaningful)
    Self-describing data
• Separates presentation (HTML) from data (XML)
• An open W3C standard
• A subset of SGML
    HTML, by contrast, is an application of SGML
Overview: What is XML?
• XML is a “use everywhere” data specification
[Figure: XML as the common exchange format between documents, configuration, a database, Application X, and a repository]
Overview: Documents vs. Data
• XML is used to represent two main types of things:
    Documents: lots of text, with tags to identify and annotate portions of the document
    Data: hierarchical data structures
Overview: XML and Structured Data
• Pre-XML representation of data:
    "PO-1234","CUST001","X9876","5","14.98"
• XML representation of the same data:
    <PURCHASE_ORDER>
        <PO_NUM> PO-1234 </PO_NUM>
        <CUST_ID> CUST001 </CUST_ID>
        <ITEM_NUM> X9876 </ITEM_NUM>
        <QUANTITY> 5 </QUANTITY>
        <PRICE> 14.98 </PRICE>
    </PURCHASE_ORDER>
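The contrast between the two representations can be sketched in Python. This is a minimal illustration that assumes the standard library's csv and xml.etree.ElementTree modules; the variable names are illustrative and not part of the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same purchase order in both representations (values taken from the slide).
csv_text = '"PO-1234","CUST001","X9876","5","14.98"\n'
xml_text = """
<PURCHASE_ORDER>
  <PO_NUM>PO-1234</PO_NUM>
  <CUST_ID>CUST001</CUST_ID>
  <ITEM_NUM>X9876</ITEM_NUM>
  <QUANTITY>5</QUANTITY>
  <PRICE>14.98</PRICE>
</PURCHASE_ORDER>
"""

# CSV: fields are positional, so the reader must already know
# that the fourth column (index 3) holds the quantity.
row = next(csv.reader(io.StringIO(csv_text)))
csv_quantity = int(row[3])

# XML: each value is looked up by the name of the element that wraps it,
# so the data carries its own context.
order = ET.fromstring(xml_text)
xml_quantity = int(order.findtext("QUANTITY"))
xml_price = float(order.findtext("PRICE"))
```

Reordering the CSV columns would silently break the positional lookup, while the XML lookup by element name would keep working.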
Overview: Benefits of XML
• Open W3C standard
• Representation of data across heterogeneous environments
    Cross-platform
    Allows for a high degree of interoperability
• Strict rules
    Syntax
    Structure
    Case-sensitive
Overview: Who Uses XML?
• Submissions by
    Microsoft
    IBM
    Hewlett-Packard
    Fujitsu Laboratories
    Sun Microsystems
    Netscape (AOL), and others…
• Technologies using XML
    SOAP, ebXML, BizTalk, WebSphere, many others…
Syntax and Structure: Components of an XML Document
• Elements
    Each element has a beginning and an ending tag: <TAG_NAME>...</TAG_NAME>
    Elements can be empty (<TAG_NAME />)
• Attributes
    Describe an element; e.g. data type, data range, etc.
    Can appear only on the beginning tag
• Prologue and processing instructions
    XML declaration with the encoding specification (Unicode by default)
    Processing instructions, e.g. a stylesheet reference
    Namespace and schema declarations (written as attributes, usually on the root element)
Syntax and Structure: Components of an XML Document
    <?xml version="1.0" ?>
    <?xml-stylesheet type="text/xsl" href="template.xsl"?>
    <ROOT>
        <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
        <ELEMENT2> </ELEMENT2>
        <ELEMENT3 type='string'> </ELEMENT3>
        <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
    </ROOT>
• Prologue (processing instructions): the first two lines
• Elements: ROOT, ELEMENT1, ELEMENT2 and the sub-elements
• Elements with attributes: ELEMENT3 and ELEMENT4
Syntax and Structure: Rules for Well-Formed XML
• There must be one, and only one, root element
• Sub-elements must be properly nested
    A tag must end within the tag in which it was started
• Attributes are optional
    Defined by an optional schema
• Attribute values must be enclosed in "" or ''
• Processing instructions are optional
• XML is case-sensitive
    <tag> and <TAG> are not the same type of element
Syntax and Structure: Well-Formed XML?
    <?xml version="1.0" ?>
    <PARENT>
        <CHILD1>This is element 1</CHILD1>
        <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
    </PARENT>
• No, CHILD2 and CHILD3 do not nest properly
Syntax and Structure: Well-Formed XML?
    <?xml version="1.0" ?>
    <PARENT>
        <CHILD1>This is element 1</CHILD1>
    </PARENT>
    <PARENT>
        <CHILD1>This is another element 1</CHILD1>
    </PARENT>
• No, there are two root elements
Syntax and Structure: Well-Formed XML?
    <?xml version="1.0" ?>
    <PARENT>
        <CHILD1>This is element 1</CHILD1>
        <CHILD2/>
        <CHILD3></CHILD3>
    </PARENT>
• Yes
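The well-formedness rules can be checked mechanically: a conforming parser must reject any document that breaks them. The following Python sketch assumes the standard library's xml.etree.ElementTree parser; the helper name is_well_formed is hypothetical, and the sample documents mirror the three quiz slides plus one case-sensitivity case.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    # CHILD2 is closed before CHILD3, so the tags do not nest properly.
    "bad_nesting": "<PARENT><CHILD1>This is element 1</CHILD1>"
                   "<CHILD2><CHILD3>Number 3</CHILD2></CHILD3></PARENT>",
    # Two top-level PARENT elements: only one root element is allowed.
    "two_roots": "<PARENT><CHILD1>This is element 1</CHILD1></PARENT>"
                 "<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>",
    # Empty elements may be written as <CHILD2/> or <CHILD3></CHILD3>.
    "good": "<PARENT><CHILD1>This is element 1</CHILD1>"
            "<CHILD2/><CHILD3></CHILD3></PARENT>",
    # XML is case-sensitive, so </TAG> does not close <tag>.
    "case_mismatch": "<tag>text</TAG>",
}

results = {name: is_well_formed(doc) for name, doc in samples.items()}
```

Note that this only tests well-formedness; checking a document against a schema (validity) is a separate and stricter step.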
Syntax and Structure: An XML Document
    <?xml version='1.0'?>
    <bookstore>
        <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
            <title>The Autobiography of Benjamin Franklin</title>
            <author>
                <first-name>Benjamin</first-name>
                <last-name>Franklin</last-name>
            </author>
            <price>8.99</price>
        </book>
        <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
            <title>The Confidence Man</title>
            <author>
                <first-name>Herman</first-name>
                <last-name>Melville</last-name>
            </author>
            <price>11.99</price>
        </book>
    </bookstore>
Syntax and Structure: Namespaces: Overview
• Part of XML’s extensibility
• Allow authors to differentiate between tags of the same name (using a prefix)
    Frees the author to focus on the data and decide how best to describe it
    Allows multiple XML documents from multiple authors to be merged
• Identified by a URI (Uniform Resource Identifier)
    When a URL is used, it does NOT have to represent a live server
Syntax and Structure: Namespaces: Declaration
• Anatomy of a namespace declaration:
    xmlns:bk="http://www.example.com/bookinfo/"
    xmlns is the namespace declaration, bk is the prefix, and the quoted value is the URI (URL)
• Namespace declaration examples:
    xmlns:bk="http://www.example.com/bookinfo/"
    xmlns:bk="urn:mybookstuff.org:bookinfo"
Syntax and Structure: Namespaces: Examples
    <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
        <bk:TITLE>All About XML</bk:TITLE>
        <bk:AUTHOR>Joe Developer</bk:AUTHOR>
        <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
    </BOOK>

    <bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
             xmlns:money="urn:finance:money">
        <bk:TITLE>All About XML</bk:TITLE>
        <bk:AUTHOR>Joe Developer</bk:AUTHOR>
        <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
    </bk:BOOK>
Syntax and Structure: Namespaces: Default Namespace
• An XML namespace declared without a prefix becomes the default namespace for all sub-elements
• All elements without a prefix will belong to the default namespace:
    <BOOK xmlns="http://www.bookstuff.org/bookinfo">
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>
Syntax and Structure: Namespaces: Scope
• Unqualified elements belong to the inner-most default namespace
    BOOK, TITLE, and AUTHOR belong to the default book namespace
    PUBLISHER and NAME belong to the default publisher namespace
    <BOOK xmlns="www.bookstuff.org/bookinfo">
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
        <PUBLISHER xmlns="urn:publishers:publinfo">
            <NAME>Microsoft Press</NAME>
        </PUBLISHER>
    </BOOK>
Syntax and Structure: Namespaces: Attributes
• Unqualified attributes do NOT belong to any namespace
    Even if there is a default namespace
• This differs from elements, which belong to the default namespace
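The scoping rules for default namespaces and unqualified attributes can be observed with a namespace-aware parser. This is a minimal Python sketch that assumes the standard library's xml.etree.ElementTree, which reports qualified names as {uri}local; the edition attribute is a hypothetical addition used only to show an unqualified attribute.

```python
import xml.etree.ElementTree as ET

# 'edition' is a hypothetical attribute added for illustration only.
doc = """
<BOOK xmlns="http://www.bookstuff.org/bookinfo" edition="2">
  <TITLE>All About XML</TITLE>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
"""

BOOK_NS = "{http://www.bookstuff.org/bookinfo}"
PUB_NS = "{urn:publishers:publinfo}"

book = ET.fromstring(doc)

# Unprefixed elements take the innermost default namespace in scope.
title = book.find(BOOK_NS + "TITLE")
publisher = book.find(PUB_NS + "PUBLISHER")
name = publisher.find(PUB_NS + "NAME")

# The unqualified attribute keeps a plain name: it is in no namespace,
# even though its element is in the default namespace.
attribute_names = list(book.attrib)
```

Looking up TITLE without the namespace part, for example book.find("TITLE"), returns None here, which is a common source of confusion when a default namespace is in effect.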
Syntax and Structure: Entities
• Entities provide a mechanism for textual substitution, e.g.
    Entity    Substitution
    &lt;      <
    &amp;     &
• You can define your own entities
• Parsed entities can contain text and markup
• Unparsed entities can contain any data
    JPEG photos, GIF files, movies, etc.
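Entity substitution for the predefined entities can be seen in a round trip: special characters are replaced by entity references when text is written into a document, and the parser substitutes them back when reading. This Python sketch assumes the standard library's xml.sax.saxutils and xml.etree.ElementTree; the NOTE element and sample text are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "price < 10 & quantity > 2"

# escape() replaces &, < and > with the predefined entity references.
escaped = escape(raw)

# Embedding the escaped text keeps the document well-formed; the parser
# performs the entity substitution when the document is read.
document = "<NOTE>" + escaped + "</NOTE>"
parsed_text = ET.fromstring(document).text

# Embedding the raw text instead would break well-formedness.
try:
    ET.fromstring("<NOTE>" + raw + "</NOTE>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False
```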
The XML ‘Alphabet Soup’
• XML itself is fairly simple
• Most of the learning curve is knowing about all of the related technologies
The XML ‘Alphabet Soup’
• XML (Extensible Markup Language): defines XML documents
• Infoset (Information Set): abstract model of XML data; definition of terms
• DTD (Document Type Definition): non-XML schema
• XSD (XML Schema): XML-based schema language
• XDR (XML Data Reduced): an earlier XML schema
• CSS (Cascading Style Sheets): allows you to specify styles
• XSL (Extensible Stylesheet Language): language for expressing stylesheets; consists of XSLT and XSL-FO
• XSLT (XSL Transformations): language for transforming XML documents
• XSL-FO (XSL Formatting Objects): language to describe precise layout of text on a page
The XML ‘Alphabet Soup’
• XPath (XML Path Language): a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer
• XPointer (XML Pointer Language): supports addressing into the internal structures of XML documents
• XLink (XML Linking Language): describes links between XML documents
• XQuery (XML Query Language, draft): flexible mechanism for querying XML data as if it were a database
• DOM (Document Object Model): API to read, create and edit XML documents; creates an in-memory object model
• SAX (Simple API for XML): API to parse XML documents; event-driven
• Data Island: XML data embedded in an HTML page
• Data Binding: automatic population of HTML elements from XML data
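The difference between the two parsing APIs in this list, DOM (in-memory tree) and SAX (event-driven), can be sketched in Python. The example assumes the standard library's xml.dom.minidom and xml.sax modules; the TitleCollector class is a hypothetical handler written for this illustration, and the sample document reuses titles from the bookstore slide.

```python
import xml.sax
from xml.dom import minidom

doc = (
    "<bookstore>"
    "<book><title>The Autobiography of Benjamin Franklin</title></book>"
    "<book><title>The Confidence Man</title></book>"
    "</bookstore>"
)

# DOM: the whole document is loaded into an in-memory object model,
# which can then be navigated (and edited) freely.
dom = minidom.parseString(doc)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# SAX: the parser streams through the document and fires callbacks;
# nothing is kept unless the handler stores it.
class TitleCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleCollector()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_titles = handler.titles
```

DOM is usually more convenient for small documents that need editing, while SAX tends to suit large documents because memory use does not grow with document size.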
The XML ‘Alphabet Soup’: Schemas: Overview
• DTD (Document Type Definition)
    Not written in XML
    No support for data types or namespaces
• XSD (XML Schema Definition)
    Written in XML
    Supports data types
    Current standard recommended by W3C
The XML ‘Alphabet Soup’: Schemas: Purpose
• Define the “rules” (grammar) of the document
    Data types
    Value bounds
• An XML document that conforms to a schema is said to be valid
    More restrictive than well-formed XML
• Define which elements are present and in what order
• Define the structural relationships of elements
The XML ‘Alphabet Soup’: Schemas: DTD Example
• XML document:
    <BOOK>
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>
• DTD schema:
    <!DOCTYPE BOOK [
        <!ELEMENT BOOK (TITLE+, AUTHOR) >
        <!ELEMENT TITLE (#PCDATA) >
        <!ELEMENT AUTHOR (#PCDATA) >
    ]>
The XML ‘Alphabet Soup’: Schemas: XSD Example
• XML document:
    <CATALOG>
        <BOOK>
            <TITLE>All About XML</TITLE>
            <AUTHOR>Joe Developer</AUTHOR>
        </BOOK>
        …
    </CATALOG>
The XML ‘Alphabet Soup’: Schemas: XSD Example
    <xsd:schema id="NewDataSet"
                targetNamespace="http://tempuri.org/schema1.xsd"
                xmlns="http://tempuri.org/schema1.xsd"
                xmlns:xsd="http://www.w3.org/1999/XMLSchema"
                xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
        <xsd:element name="book">
            <xsd:complexType content="elementOnly">
                <xsd:all>
                    <xsd:element name="title" minOccurs="0" type="xsd:string"/>
                    <xsd:element name="author" minOccurs="0" type="xsd:string"/>
                </xsd:all>
            </xsd:complexType>
        </xsd:element>
        <xsd:element name="Catalog" msdata:IsDataSet="True">
            <xsd:complexType>
                <xsd:choice maxOccurs="unbounded">
                    <xsd:element ref="book"/>
                </xsd:choice>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>
The XML ‘Alphabet Soup’: Schemas: Why You Should Use XSD
• Newest W3C standard
• Broad support for data types
• Reusable “components”
    Simple data types
    Complex data types
• Extensible
• Inheritance support
• Namespace support
• Ability to map to relational database tables
• XSD support in Visual Studio .NET
The XML ‘Alphabet Soup’: Transformations: XSL
• Language for expressing document styles
• Specifies the presentation of XML
    More powerful than CSS
• Consists of:
    XSLT
    XPath
    XSL Formatting Objects (XSL-FO)
The XML ‘Alphabet Soup’: Transformations: Overview
• XSLT: a language used to transform XML data into a different form (commonly XML or HTML)
[Figure: an XML source document passes through XSLT to produce XML, HTML, … output]
The XML ‘Alphabet Soup’: Transformations: XSLT
• The language used for converting XML documents into other forms
• Describes how the document is transformed
• Expressed as an XML document (.xsl)
• Template rules
    Patterns match nodes in the source document
    Templates are instantiated to form part of the result document
• Uses XPath for querying, sorting, etc.
The XML ‘Alphabet Soup’: XPath (XML Path Language)
• General purpose query language for identifying nodes in an XML document
• Declarative (vs. procedural)
• Contextual – the results depend on current node
• Supports standard comparison, Boolean and mathematical operators (=, <, and, or, *, +, etc.)
The XML ‘Alphabet Soup’: XPath Operators
Operator    Usage description
/           Child operator: selects only immediate children (when at the beginning of the pattern, context is root)
//          Recursive descent: selects elements at any depth (when at the beginning of the pattern, context is root)
.           Indicates current context
..          Selects the parent of the current node
*           Wildcard
@           Prefix to attribute name (when alone, it is an attribute wildcard)
[ ]         Applies filter pattern
The XML ‘Alphabet Soup’: XPath Query Examples
./author (finds all author elements within the current context)
/bookstore (finds the bookstore element at the root)
/* (finds the root element)
//author (finds all author elements anywhere in the document)
/bookstore[@specialty = "textbooks"] (finds all bookstores where the specialty attribute = "textbooks")
/book[@style = /bookstore/@specialty] (finds all books where the style attribute = the specialty attribute of the bookstore element at the root)
More XPath Examples
Path Expression Result
/bookstore/book[1] Selects the first book element that is the child of the bookstore element
/bookstore/book[last()] Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1] Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3] Selects the first two book elements that are children of the bookstore element
//title[@lang] Selects all the title elements that have an attribute named lang
//title[@lang='eng'] Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00] Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
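Several of the expressions in this table can be tried out in Python. The sketch below assumes the standard library's xml.etree.ElementTree, which implements only a subset of XPath: positional and attribute predicates work, but position()<3 and numeric comparisons such as price>35.00 are not supported, so the price filter is written in plain Python instead. Paths are relative to the parsed root (the bookstore element), and the book titles and prices are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the slides.
doc = """
<bookstore>
  <book><title lang="eng">Book A</title><price>29.99</price></book>
  <book><title lang="eng">Book B</title><price>39.95</price></book>
  <book><title>Book C</title><price>49.99</price></book>
</bookstore>
"""
root = ET.fromstring(doc)  # root is the bookstore element

# /bookstore/book[1], book[last()] and book[last()-1]
first = root.find("book[1]")
last = root.find("book[last()]")
second_last = root.find("book[last()-1]")

# //title[@lang] and //title[@lang='eng']
with_lang = root.findall(".//title[@lang]")
english = root.findall(".//title[@lang='eng']")

# /bookstore/book[price>35.00]/title, emulated in Python because
# ElementTree does not support numeric comparison predicates.
expensive_titles = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 35.00
]
```

A full XPath engine would evaluate all of the table's expressions directly; the emulation here is only a workaround for the standard library's limited subset.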
XPath Functions
• Accessor functions: node-name, data, base-uri, document-uri
• Numeric value functions: abs, ceiling, floor, round, …
• String functions: compare, concat, substring, string-length, upper-case, lower-case, starts-with, ends-with, matches, replace, …
• Other functions include functions on boolean values, dates, nodes, etc.
The XML ‘Alphabet Soup’: Data Islands
• XML embedded in an HTML document
• Manipulated via client-side script or data binding
    <XML id="XMLID">
        <BOOK>
            <TITLE>All About XML</TITLE>
            <AUTHOR>Joe Developer</AUTHOR>
        </BOOK>
    </XML>

    <XML id="XMLID" src="mydocument.xml"></XML>
The XML ‘Alphabet Soup’: Data Islands
• Can be embedded in an HTML SCRIPT element
    <SCRIPT language="xml" id="XMLID">
    <SCRIPT type="text/xml" id="XMLID">
    <SCRIPT language="xml" id="XMLID" src="mydocument.xml">
• XML is accessible via the DOM
The XML ‘Alphabet Soup’: XML-Based Applications
• Microsoft SQL Server
    Retrieve relational data as XML
    Query XML data
    Join XML data with existing database tables
    Update the database via XML Updategrams
    New XML data type in SQL Server 2005
• Microsoft Exchange Server
    XML is the native representation of many types of data
    Used to enhance performance of UI scenarios (for example, Outlook Web Access (OWA))
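XML as a Meta-Language
• A language to create languages
[Figure: XML/DTD at the center, surrounded by related technologies and derived languages: CSS, XSL, DSSSL, XSLT, DOM, SAX, XLL, XSchema, XPath, XPointer, MathML, BeanML, CML, WML, XQL, GO]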
Gene Ontology (GO)
• Describing and manipulating information about the molecular function, biological process and cellular component of gene products.
• Gene Ontology website: http://www.geneontology.org
• GO DTD: ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd
• GO Browsers and tools: http://www.geneontology.org/#tools
• GO Resources and samples: http://www.geneontology.org/#annotations
Math ML
• Describing and manipulating mathematical notations
• MathML website: www.w3.org/Math
• MathML DTD: www.w3.org/Math/DTD
• MathML Browser: www.w3.org/Amaya
• MathML Resources: www.webeq.com/mathml (see sample documents here)
Chemical ML
• Representing molecular and chemical information
• CML website: www.xml-cml.org
• CML DTD: www.xml-cml.org/dtdschema/index.html
• CML Browser and Authoring Environment: www.xml-cml.org/jumbo.html
• CML Resources: www.xml-cml.org/chimeral/index.html (see sample documents here; some require plug-in downloads and can be slow)
Wireless ML
• Allows web pages to be displayed on mobile devices
• WML works with WAP to deliver the content
• Underlying model: a deck of cards that the user can sift through
• WAP/WML website: www.wapforum.org
• WML DTD: www.wapforum.org/DTD/wml_1.1.xml
• WAP/WML Resources
    www.oasis-open.org/cover/wap-wml.html
    www.w3scripts.com/wap (tutorial on WML; also see the WAP demo)
Scalable Vector Graphics
• Describing vector graphics data for use over the web
• Rendering is done in the browser
• Bandwidth demands are lower and scaling is easier
• SVG website: www.w3.org/Graphics/SVG
• SVG Plug-Ins: www.adobe.com/svg
• SVG Resources
    www.irt.org/articles/js176 (1999 article and a good, brief tutorial)
    planet.svg (an example from Deitel)
Bean ML
• Describing software components such as Java Beans
• Defines how the components are interconnected and can be used
• Bean ML Specs and Tools: www.alphaworks.ibm.com/aw.nsf/techmain/bml
• Bean ML Resources: www.oasis-open.org/cover/beanML.html
• With Bean ML
    You can mark up beans using Bean ML
    And invoke different operations on beans
    Includes the BML Scripting Framework
XBRL
• Extensible Business Reporting Language
• Capturing and representing financial and accounting information
• Variety of situations
    e.g. publishing reports, extracting data for analysis, regulatory forms, etc.
• Initiated under the direction of AICPA
• XBRL website: www.xbrl.org
• XBRL DTDs and Schemas: http://www.xbrl.org/Core/2000-07-31/default.htm
• Demos and Tools
    http://www.xbrl.org/Demos/demos.htm
    http://www.xbrl.org/Tools.htm
News ML
• Designed to be media-independent
• Initiated by the International Press Telecommunications Council
• Enables tracking of news stories over time
• NewsML website: www.newsml.org
• NewsML DTD: http://www.oasis-open.org/cover/newsML.html
• SportsML DTD (derived from the NewsML DTD): http://xml.coverpages.org/sportsML.html
cXML
• CommerceXML from Ariba plus 40 other companies
• cXML website: www.cxml.org
• Primary set of tools/implementations to support cXML: http://www.ariba.com/solutions/solutions_overview.cfm
    See also the Whitepapers link explaining how these can be used for
        E-procurement
        E-fulfillment
        And others
xCBL
• xCBL from Microsoft, SAP, Sun
• xCBL website: www.xcbl.org
    Marketed as an XML component library for B2B e-commerce
• Available resources (see internal links)
    DTDs and Schemas
    XDK: SOX Parser and an XSLT Engine
    Example documents
ebXML
• UN/CEFACT: the United Nations body whose mandate covers worldwide policy and technical development in the area of trade facilitation and electronic business (www.uncefact.org)
• ebXML website: www.ebxml.org
• Current endorsements: http://www.ebxml.org/endorsements.htm
    Still needs buy-in from the larger IS/IT vendors
• Related effort: RosettaNet, http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
    Business processes for IT, component and chip companies
Conclusion
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Resources
• http://www.xml.com/
• http://www.w3.org/xml/
• http://www.w3schools.com/
• http://msdn.microsoft.com/xml/