XML-talk
aschwarzman
1. XML by example
1.1. Credit card statement (paper)
Cardmember Statement
ACCOUNT NUMBER: 4444888822221111
AVAILABLE CREDIT: 5,000
CLOSING DATE: 11/25/02
PAYMENT DUE DATE: 12/15/02

CARDMEMBER STATEMENT SUMMARY
TRANS DATE | POST DATE | REFERENCE NUMBER | DESCRIPTION OF TRANSACTION | CREDITS  | CHARGES
1023       | 1025      | 2416QZP          | Townhouse Store #3306 DC   |          | 10.65
1027       | 1027      | 2422KQ12         | Wazuri DC                  |          | 55.00
1103       | 1103      | 7422120F         | Payment – Thank you        | 1,000.00 |
1.2. Credit card statement (XML)
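One way the paper statement in 1.1 might be marked up is sketched below and parsed with Python's standard library. Only Description, Transaction, and Category appear in the talk itself; every other element name here is an assumption made for illustration, as is assigning the "Groceries" category to the first transaction.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the paper statement. Element names other than
# Description, Transaction, and Category are assumed, not taken from the talk.
STATEMENT_XML = """\
<Statement>
  <AccountNumber>4444888822221111</AccountNumber>
  <AvailableCredit>5,000</AvailableCredit>
  <ClosingDate>11/25/02</ClosingDate>
  <PaymentDueDate>12/15/02</PaymentDueDate>
  <Transaction Category="Groceries">
    <TransDate>1023</TransDate>
    <PostDate>1025</PostDate>
    <ReferenceNumber>2416QZP</ReferenceNumber>
    <Description>Townhouse Store #3306 DC</Description>
    <Charge>10.65</Charge>
  </Transaction>
  <Transaction>
    <TransDate>1027</TransDate>
    <PostDate>1027</PostDate>
    <ReferenceNumber>2422KQ12</ReferenceNumber>
    <Description>Wazuri DC</Description>
    <Charge>55.00</Charge>
  </Transaction>
  <Transaction>
    <TransDate>1103</TransDate>
    <PostDate>1103</PostDate>
    <ReferenceNumber>7422120F</ReferenceNumber>
    <Description>Payment -- Thank you</Description>
    <Credit>1,000.00</Credit>
  </Transaction>
</Statement>
"""

# Parse the text into a tree of elements.
root = ET.fromstring(STATEMENT_XML)

# Each piece of the statement is now addressable by name.
account = root.findtext("AccountNumber")
descriptions = [t.findtext("Description") for t in root.iter("Transaction")]

print(account)
print(descriptions)
```

Unlike the paper layout, nothing here depends on column positions: a program finds the account number or the descriptions by element name.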
1.3. XML building blocks
XML deals with documents.
A document is a basic unit of XML information, composed of elements and other markup in an orderly package.
<Description> Payment -- Thank you </Description>
  Start tag (markup): <Description>
  Character data: Payment -- Thank you
  End tag (markup): </Description>
  Element: the start tag, character data, and end tag together
An element is an identifiable, named component of a document. It
  can have content (but doesn’t have to): data, other elements
  can be a pointer to information (cross-reference, link)
  must have one start tag and one end tag (or, when empty, a single empty-element tag)
  can nest inside other elements but cannot overlap them
An attribute provides additional information about an element: <Transaction Category="Groceries">. It
  is found inside the start tag
  may be required or implied
An element may have multiple attributes.
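The nesting rule and the attribute syntax can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Description is nested entirely inside Transaction: allowed.
nested = (
    '<Transaction Category="Groceries">'
    "<Description>Townhouse Store #3306 DC</Description>"
    "</Transaction>"
)

# Transaction closes before Description does: the elements overlap.
overlapping = (
    "<Transaction><Description>Townhouse Store #3306 DC"
    "</Transaction></Description>"
)

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False

# Attributes live in the start tag and are exposed as name/value pairs.
element = ET.fromstring(nested)
print(element.tag, element.attrib)
```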
1.4. Credit card statement DTD
• What a DTD can provide for (structure, sequence, in-document linking, selected occurrence indicators) and what it cannot (datatyping, flexible occurrence indicators)
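The DTD slide did not survive in this transcript, so what follows is only a sketch of what such a DTD might look like, not the author's. It is embedded in the document's internal subset, and the Ref, Reverses, Note, and Amount names, as well as the second transaction, are hypothetical. Python's standard parser reads the declarations but does not validate against them; a validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD in the internal subset, followed by a document instance.
DOC = """\
<!DOCTYPE Statement [
  <!-- Structure and sequence: children must appear in this order. -->
  <!ELEMENT Statement (AccountNumber, Transaction*)>
  <!ELEMENT AccountNumber (#PCDATA)>
  <!-- Occurrence indicators: ? optional, * zero or more, + one or more. -->
  <!ELEMENT Transaction (Description, Amount, Note?)>
  <!ELEMENT Description (#PCDATA)>
  <!ELEMENT Note (#PCDATA)>
  <!-- No datatyping: Amount is just text; a DTD cannot require a number. -->
  <!ELEMENT Amount (#PCDATA)>
  <!-- Attributes are required or implied; ID/IDREF link within a document. -->
  <!ATTLIST Transaction
      Ref      ID    #REQUIRED
      Category CDATA #IMPLIED
      Reverses IDREF #IMPLIED>
]>
<Statement>
  <AccountNumber>4444888822221111</AccountNumber>
  <Transaction Ref="t1" Category="Groceries">
    <Description>Townhouse Store #3306 DC</Description>
    <Amount>10.65</Amount>
  </Transaction>
  <Transaction Ref="t2" Reverses="t1">
    <Description>Illustrative reversal</Description>
    <Amount>ten dollars or so</Amount>
  </Transaction>
</Statement>
"""

# The standard-library parser accepts the DOCTYPE but performs no validation.
root = ET.fromstring(DOC)
transactions = root.findall("Transaction")

# The IDREF value names another element's ID; the application follows the link.
by_ref = {t.get("Ref"): t for t in transactions}
linked = by_ref[transactions[1].get("Reverses")]
print(linked.findtext("Description"))
```

The second Amount is not a number, yet the instance conforms to the DTD as written. That is the datatyping gap named above.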
1.5. Document types and their instances
• Invoice
• Sales catalog
• Dictionary
• Journal article
1.6. Validating parser
1.6.1. What a parser does
• Is the document well-formed? (for stand-alone docs)
• Does the DTD conform to the XML spec?
• Does a document instance conform to the DTD?
1.6.2. What a parser does not do
• Check semantics (“gobbledygook” might be meaningless but valid as far as a validating parser is concerned)
• Check what a DTD cannot enforce (datatyping, flexible occurrence indicators)
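Checks that a parser does not make have to be done by the application. The sketch below assumes a hypothetical Amount element: the parser accepts any character data in it, and a small function written for this example (bad_amounts) reports the values that are not decimal numbers.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation


def bad_amounts(xml_text):
    """Return the Amount values that cannot be read as decimal numbers."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    bad = []
    for amount in root.iter("Amount"):
        # Allow thousands separators such as 1,000.00.
        text = (amount.text or "").strip().replace(",", "")
        try:
            Decimal(text)
        except InvalidOperation:
            bad.append(amount.text)
    return bad


SAMPLE = (
    "<Statement>"
    "<Transaction><Amount>55.00</Amount></Transaction>"
    "<Transaction><Amount>gobbledygook</Amount></Transaction>"
    "<Transaction><Amount>1,000.00</Amount></Transaction>"
    "</Statement>"
)

# The document is well-formed, so parsing succeeds; only the
# application-level check notices the meaningless amount.
print(bad_amounts(SAMPLE))  # ['gobbledygook']
```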
1.7. Credit card statement in XML environment
2. Components of an XML system
• Document instance
• DTD/Schema
• Validating parser
• Processing system
2.1.1. Document
Two kinds:
  Well-formed
  Valid (has a model)
Usually created:
  Manually, using an XML editor (Epic, XMetaL)
  Programmatically, from a database, another XML document, or by conversion from another format (LaTeX, MSWord)
2.1.2. DTD
The modeling mechanism specified by the XML standard:
  models one type of information
  is a set of rules describing how documents of that type can be marked up
2.2. Processing system
XML DOES NOT DO ANYTHING! Your software CAN!
Start/stop behavior
  Run a script, load a database, create a “form letter” and fill in contents
  Link
  Format (start bold, end bold)
Process
  Extract selected elements (e.g., metadata)
  Rearrange/resequence content
  Rename, add content
  Count how many
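The "Process" operations listed above can be sketched in a few lines with Python's standard xml.etree.ElementTree. The element names (PostDate, Charge, Credit), the new Entry tag, and the Reviewed attribute are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

XML = """\
<Statement>
  <Transaction><PostDate>1025</PostDate><Description>Townhouse Store #3306 DC</Description><Charge>10.65</Charge></Transaction>
  <Transaction><PostDate>1027</PostDate><Description>Wazuri DC</Description><Charge>55.00</Charge></Transaction>
  <Transaction><PostDate>1103</PostDate><Description>Payment -- Thank you</Description><Credit>1000.00</Credit></Transaction>
</Statement>
"""
root = ET.fromstring(XML)

# Extract selected elements.
descriptions = [d.text for d in root.iter("Description")]

# Count how many transactions are charges.
charge_count = len(root.findall("Transaction/Charge"))

# Rearrange/resequence content: latest posting date first.
root[:] = sorted(root, key=lambda t: t.findtext("PostDate"), reverse=True)

# Rename elements and add content (a new attribute on each entry).
for transaction in root:
    transaction.tag = "Entry"
    transaction.set("Reviewed", "no")

print(descriptions)
print(charge_count)
print(ET.tostring(root, encoding="unicode"))
```

The markup only names the parts. Every action here (extracting, counting, sorting, renaming) is done by the software.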
3. XML origins
3.1. What is markup?
Information added to a document that enhances its meaning in certain ways, in that it identifies the parts and how they relate to each other.
3.2. Pre-electronic (traditional) markup
Set this header in 12-point Helvetica Medium italic on a 14-point text body, justified on a 22-pica slug with indents of 1 en on the left and none on the right.
3.3. Markup language
A set of symbols that can be placed in the text of a document to demarcate and label the parts of that document.
3.4. Specific markup languages
Tell the formatter what action to take: "carriage return", "center the following lines", "go to the next page", etc.
3.4.1. RTF, Script, etc.
Script example
.sp (skip one line)
.bf roman 12 (change font size)
.bd .ce Chapter 1. Introduction (center "Chapter 1. Introduction" and print it in bold)
3.4.2. WYSIWYG word processors, DTP, and professional typesetting systems
• WordPerfect, MSWord, WordStar, MacWrite
• Quark, Ventura
• XYVision, Penta, Miles 33
Proprietary and not interchangeable; structure and presentation are inextricably intertwined. Retrieval and cross-referencing are difficult.
3.5. Generic markup languages
Use descriptive tags rather than formatting codes. Indicate the logical structure of the documents. Separate formatting from structure/content.
3.5.1. Macro-based languages
• LaTeX for TeX
• Syspub for Waterloo Script
• ms for nroff
LaTeX example
\to{Mr. Smith} stands for 3 commands:
\noindent
\settabs 6 \columns
\+TO:&Mr. Smith\cr
3.6. SGML
• 1960s. GCA’s “GenCode” (Graphic Communications Association)
• 1969. IBM’s GML, Generalized Markup Language (Charles Goldfarb, Edward Mosher, and Raymond Lorie)
• 1978. ANSI working group, headed by Charles Goldfarb, formed to develop a standard text-description language based on GML as a format for text interchange
• 1983. SGML developed. DoD and IRS adopt SGML. DoD develops CALS (Computer-Aided Acquisition and Logistic Support) as an SGML application. (CALS tables still in use.) AAP develops DTDs for books and journals. SGML spreads in Europe and North America
• 1986. ISO ratifies SGML as a standard (ISO 8879:1986)
3.7. HTML
• Early 1990s. Tim Berners-Lee and Anders Berglund of the European particle physics lab CERN develop HyperText Markup Language (Berglund designed a publishing system to test SGML in the 1980s)
• HTML is an application of SGML for hypertext documents
• Both a step forward (Web, wide adoption, public interest in markup) and a step back (generic coding principles compromised: one (!) doc type used for all purposes, many tags purely presentational)
HTML example
HTML tags format
4. XML
1998. W3C group under Jon Bosak: a simplified version of SGML, with 80% of SGML’s power and 20% of its complexity
4.1. What XML can do
XML can be used to tag…
Content (what type of information is this?)
  City, state, zip
  Part number
  Debit, credit, payment
  Question, answer
  Genus, species
  Indications, contraindications
Structure (what part of the document is this?)
  Paragraph, sub-section, section, chapter, list
  Table, figure, formula, video
  Author block, signature block, address block
Pointers (location, navigation, linking, and other relationships)
  Hypertext links
  Cross-references
  Indexing terms
Metadata (information about data)
  Bibliographic/cataloging information (author, title, publication date)
  Index terms and keywords (search terms)
  Revision, version, edition
  Status, tracking information
  Data sources
  Editor’s and reviewer’s comments
  Abstracts, highlights, “teasers”, “blurbs”
Rendering/Processing (if you MUST): how text should behave, display, or print. Normally handled through a stylesheet, but…
  position of graphic on the page (floating, centered)
  line break in titles
  tables
  author’s whimsy (“I want this word bold just because”)
4.2. XML is…
A subset of SGML.
A meta-language that describes the concepts and rules to build domain-specific markup languages.
A family of technologies/standards (most of them W3C Recommendations): XSLT, XSL, XPointer, XPath, XQuery, XLink, DOM, SAX, etc.
XML can be used:
  for document modeling
  for data interchange
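Two of the technologies named above, SAX and DOM, are programming interfaces for reading XML. As a rough sketch of how they differ, the example below counts hypothetical Transaction elements both ways with Python's standard library: SAX reports elements as events while reading, whereas DOM builds the whole tree first and then queries it.

```python
import xml.dom.minidom
import xml.sax

XML = "<Statement><Transaction/><Transaction/><Transaction/></Statement>"


class TransactionCounter(xml.sax.ContentHandler):
    """SAX handler: called once per start tag as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "Transaction":
            self.count += 1


# SAX: event-driven, no tree is kept in memory.
handler = TransactionCounter()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_count = handler.count

# DOM: the whole document becomes a tree that can be searched afterwards.
document = xml.dom.minidom.parseString(XML)
dom_count = len(document.getElementsByTagName("Transaction"))

print(sax_count, dom_count)
```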
4.3. XML applications (domain-specific markup languages)
Device/media-oriented:
• XHTML – Web
• WML – wireless markup language
• VoxML – spoken word markup language
Discipline-oriented:
• MathML – mathematical markup language
• CML – chemical markup language
Industry-oriented:
• Airlines/aircraft
• Semiconductors
Process-oriented:
• SVG – Scalable Vector Graphics
4.4. XML is not…
• a programming language. Does not replace C++, Java, Perl, etc.
• a user interface
• a presentation format
• a text formatting or processing system
• a standard set of document types
• a standard or recommended set of tags
• UNICODE
• a database
• user-unfriendly
5. XML in a publishing environment
5.1. Uncontrolled inputs, controlled outputs
[Figure: workflow diagram. Labels: WordPerfect, MSWord, LaTeX, HTML, PostScript; XML Converter; XML document; XML Article; XML DB; Composition Engine; Low-res PDF; High-res PDF; XSLT stylesheet; HTML; A&I Services; TOCs, Indices, Search Interfaces; CrossRef MDDB; Hand-held computer; Cell phone; Telephone]
5.2. Integrated environment with controlled inputs and outputs
Example: technical manual (aircraft, automobile, etc.)
Conceptual configuration of a database-centered XML-aware system (adapted from The SGML Implementation Guide by B. Travis and D. Waldt)
[Figure: Master Database (Text Objects, Graphics, Works in Progress) at the center. Activities: Authoring, Editing, Reviewing, Copy-editing, Converting, Imaging, Composing, Publishing, Abstracting and Indexing, Searching, Archiving, Revising, Tracking, Referencing and linking, Translating, Assigning]
6. XML advantages
• Encode (mark up) data only once. Create a single information repository
• Separates content/structure from presentation/formatting
• Software/hardware independent
• Interoperability: common language for a community to agree on data content; machine-to-machine communication.
• Portability
• Preservation
• Non-proprietary/open industry standard
• Reuse/re-purposing (many outputs)
• Enables semantically complex searching and retrieval
• Cuts down on the number of required converters (saves software development costs)
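The separation of content from presentation and the "many outputs" point can be illustrated with a short sketch: one XML source, two renderings. In the publishing workflow described earlier this step would be done with an XSLT stylesheet or a composition engine; plain Python functions stand in for them here, and the Article, Title, and Para names are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# A single source document, marked up once.
ARTICLE = (
    "<Article>"
    "<Title>XML &amp; publishing</Title>"
    "<Para>Mark up once.</Para>"
    "<Para>Publish many times.</Para>"
    "</Article>"
)


def to_text(root):
    """Render the article as plain text with an upper-case title."""
    lines = [root.findtext("Title").upper()]
    lines += [para.text for para in root.iter("Para")]
    return "\n".join(lines)


def to_html(root):
    """Render the same article as an HTML fragment."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("Title"))]
    parts += ["<p>%s</p>" % escape(para.text) for para in root.iter("Para")]
    return "".join(parts)


root = ET.fromstring(ARTICLE)
print(to_text(root))
print(to_html(root))
```

Adding a third output means writing one more rendering function. The source document does not change.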