Languages for E-Business/E-commerce (ect7010/Materials/Lecture/Lec5.pdf)
Slide 1
ECT 7010 Fundamentals of E-Commerce Technologies, Edited by Christopher C. Yang
Languages for E-Business/E-commerce
Slide 2
Aspects of content delivery
• Languages for formatting and aesthetics (HTML, DHTML, Style Sheets)
• Structure and semantics of the content (XML)
• Scripting languages for processing data and interfacing with external applications (CGI, JavaServer Pages, JavaScript, VBScript, ColdFusion, PHP)
Slide 3
History of Web
• Originally: availability and dissemination of information
• Adopted SGML – Standard Generalized Markup Language
• Maintained the idea of a mark-up language with hyperlinks, removing complex features
• HTML – HyperText Markup Language
• DHTML – Dynamic HTML, more flexible
• Not standardized; no common standards adopted by the leading browsers.
Slide 4
XML – eXtensible Markup Language
• Supported by major players like Microsoft and IBM.
• Projected to be the most relevant web standard of the future.
• Best suited for E-Commerce (as front-end and back end sub-systems can be conveniently integrated).
Slide 5
PDF – Portable Document Format
• Proprietary format from Adobe
• Best suited for distributing complex documents
• Preservation of original style, format etc.
• Freely available Reader (Adobe Reader) – no charge for viewing the documents.
Slide 6
SGML – Standard Generalized Markup Language
• Born out of generic coding and mark-up languages in early 1970’s.
• SGML as a formal standard under ISO
• First edition in 1986
• Amended in 1988.
Slide 7
SGML (contd.)
• Specifies a standard method for describing the structure of a document.
• Open-ended definitions
• Does not directly specify any type of content data, no restrictions on any type of data.
• Flexible, able to describe any logically structured set of information (e.g. form, memo, book, dictionary, spreadsheets, databases).
• Sophisticated, used by a loyal group of developers.
Slide 8
SGML - Not a markup scheme.
• But a means for describing any markup scheme.
• SGML can be used for developing markup schemes for different document classes.
Slide 9
Uses of SGML
• Publishing
• Multi-media
• Save and store information for the long term, e.g.
– Association of American Publishers (AAP): manuscript preparation and markup for publishers, authors, and editors
– Air Transport Association (ATA)
– Computer-aided Acquisition and Logistic Support (CALS), created by the U.S. Department of Defense: supporting and maintaining military equipment
Slide 10
HTML – Hyper Text Markup Language
• Most prevalent form of web pages is HTML.
• Born out of rejection of SGML’s complexity, is easy to use.
• Both Microsoft I.E. and Netscape Communicator support it.
• HTML content can be formatted with the addition of a few tags.
• Can be created using dynamic code generators or templates
• Word processor files, even MS-Excel, Access and PowerPoint files can be exported to HTML.
Slide 11
Advantages and disadvantages of HTML
• HTML has pre-defined tags, both a strength and a weakness, depending on the user’s experience.
• Elements of HTML include title, body, background, paragraphs, lists, tables, forms, formatting (bold, italics, underline)
• Can add Java applets and ActiveX controls within HTML pages.
• HTML can be developed using editors ranging in complexity from MS Notepad to WYSIWYG editors.
• Limitation – new tags cannot be created for structures or formats that HTML does not already define.
Slide 12
Insertion of pictures and graphics in HTML
• One of the main reasons for the success of the WWW.
• Inline images, hyper-linked images (stored on a different page).
• Types of raster graphics: GIF, JPEG, PNG, no need for plug-ins.
Syntax for inserting a picture
<html>
<head><title> Picture of a dog </title></head>
<body>
<IMG SRC="dog.jpg" border="0">
</body>
</html>
Slide 13
Other features
• Align – takes values bottom, top, left, middle and right.
• Alt – an alternative message
• Border – defines border width
• Height – height of the image and allows resizing
• HSpace – horizontal space
• SRC – URL of the image
• VSpace – vertical space
• Width – width of the image and allows resizing
• Background color
– <body bgcolor="#99CCFF">
– Here, 99 denotes red, CC denotes green and FF denotes blue in hexadecimal notation.
– or
– <body background="lightgreen.jpg">
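The hexadecimal notation for bgcolor can be illustrated with a small sketch. It is written in Python purely for illustration; the function name parse_hex_color is an assumption of this example and is not part of HTML or of the lecture material.

```python
from typing import Tuple


def parse_hex_color(value: str) -> Tuple[int, int, int]:
    """Split an HTML colour such as "#99CCFF" into (red, green, blue)."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hexadecimal digits, got {value!r}")
    # Each pair of hexadecimal digits encodes one channel in the range 0-255.
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue


print(parse_hex_color("#99CCFF"))  # (153, 204, 255)
```

So #99CCFF is a light blue: a moderate amount of red (153), more green (204), and full blue (255).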
Slide 14
Most common HTML tags
<html> </html>      Marks the beginning and end of the document
<head> </head>      Specifies the beginning and end of the header
<title> </title>    Indicates the title; not displayed in the body of the web page
<body> </body>      Indicates the main part of the web page
<hn> </hn>          Specifies the heading level, from 1 (largest) to 6 (smallest)
<p> </p>            Delimits a paragraph, set off by a blank line
<li> </li>          Delimits the beginning and end of a list item (inside <ul> for an unordered list)
<hr>                Inserts a horizontal line
<br>                Breaks the flow of text and continues on the next line
<b> </b>            Indicates that the text within is bold
<i> </i>            Indicates that the text within is italicized
<u> </u>            Indicates that the text within is underlined
<table> </table>    Inserts a table in the document
<form> </form>      Inserts a form in the document
Slide 15
For additional functionalities, we might want to add Java applets
<APPLET CODEBASE=".." CODE="hm30.class" NAME="HotMedia" WIDTH="239" HEIGHT="50">
<PARAM NAME="mvrfile" VALUE="data/preview.mvr">
</APPLET>
Slide 16
Links to other pages
• Hyper-links- most attractive feature in Hypertext media.
• Makes navigation through the pages very convenient.
• Useful for e-commerce also, as it can be used to balance and distribute the load among various servers. Delegate time consuming operations to the service providers.
• <A HREF="second.htm"> second page </A>
• Here <A> and </A> act as anchors for the link. HREF indicates the URL of the target page.
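To show that the HREF attribute is what a program follows when it processes a link, here is a minimal Python sketch using the standard library's html.parser module. The class name LinkCollector is an assumption of this example; the anchor markup is the one from the slide.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the HREF value of every anchor (<A>) tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<A HREF="second.htm"> second page </A>')
print(collector.links)  # ['second.htm']
```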
Slide 17
Web publishing
• A web server should be running on the computer where the document to be published is located.
• FTP (File Transfer Protocol) allows remote insertion of pages on a web server.
• HTML editors
– Many editors are available, with a range of features, starting from the simple Windows Notepad
Slide 18
Features of HTML editors
• Collaboration and site management
– Version control, navigate and control
• Database features
– Integrate with database through JDBC or ODBC
– Send form data to an email address or save form data to a file
• Deployment features
– Multilingual editing, ftp to Web server
• Design features
– JavaScript & VBScript support
– Easy integration with CSS1 and CSS2
– DHTML with cross-browser tuning
– Visual wizard for forms, tables and frames
– Support for image composition and mapping, pixel-precise positioning
– Drag and drop support
– HTML syntax checking, spell checker, site navigation overview
• HTML editing features
– Customizable templates
– HTML validation and cross XML compliance tool.
– Search, replace, replace all features
– Supports Java applets, ActiveX, CGI
– Syntax coloring
– In-built DHTML scripts or wizard
– Import, view, play multi-media files (GIF, JPEG, BMP, WAV and MIDI)
Slide 19
Advanced HTML and DHTML
• Dynamic behavior of a web page can be created using many technologies, e.g. JavaScript, VBScript, Document Object Model (DOM), and Cascading Style Sheets (CSS).
• The display of the web page can be changed after the page loads.
• Use of CSS for a uniform look and feel of the whole website.
• Event-driven animation can be interactive and interesting.
• Additional HTML tags
– Tags <div> and <span>
– Advisable to use <div> as a generic container.
• Create a box that can be placed anywhere in the page and filled with whatever content
Slide 20
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<STYLE type="text/css">
<!--
#pic1 {POSITION: absolute; Z-INDEX: 1; LEFT: 30px; TOP: 30px; Visibility: visible}
-->
</STYLE>
<BODY>
<div id="pic1"><img src="http://mis.mgmt.umb.edu/euni.gif" width="100" height="50" alt="" border="0"></div>
</BODY>
</HTML>
• Both absolute positioning and relative positioning are possible.
• With DHTML, less data has to be downloaded, as there are no large bitmap files.
Slide 21
Elements that DHTML can control
• Ability to hide and unhide portions of pages.
• Display attributes of style sheets (text, background, form field, images, frames, tables and paragraphs).
• Animation effects, make them more interactive and engaging
• Scroll bars, ticker objects
• Events
– OnBlur, OnFocus, OnLoad, OnAbort, OnChange, OnClick, OnError, OnKeyDown
Slide 22
Portable Document Format (PDF)
• PDF writer and distiller
• On any computer platform
– (DOS, Mac, Unix, Windows)
• Preserves the format and looks
• Can’t be modified or tampered with if protected
• PDF Reader available free of cost to any user.
• FrameMaker and Illustrator are able to write PDF format.
• Corel with WordPerfect can export to the original PDF format but cannot modify existing documents.
• Can be set up as an add-in to MS Word
Slide 23
Cascading Style Sheets (CSS)
• More powerful
• More flexible
• Useful for designing consistent looking web pages, like templates
• Designers will create style sheets and apply them to any web page.
• Developers can define their own classes and restyle existing HTML elements.
• CSS Fundamentals
– Style sheets have a defined order of precedence
– Formatting rules are applied in a hierarchical manner
– CSS1 – by W3C, a set of style sheets or statements that may determine how a given element is presented in a web page format, using Netscape and I.E. browsers.
• Advantages of CSS
– Separation of style and layout of HTML files from their informational content.
– Provides relative measurement for any size of monitor screen or resolution.
– Enables companies to implement a house look and feel on their site and promote branding.
– Improves the printing of web documents instead of having unpredictable HTML transfer to paper.
– Enables access to the web for people with disabilities (larger fonts, variation of colors)
Slide 24
CSS2
• A newer standard proposed by W3C and agreed upon by the industry for richer and more accessible web pages.
• See latest news:
– http://www.w3.org/Style/CSS/#news
• New features – sidebars, navigation scrollbars
• Images can be layered
• Control over table layout
• Useful features of CSS
– Both absolute and relative measurements can be applied
– Color control
– Fonts and texts can be formatted
– Position, alignment properties
– Spacing and areas (which includes borders, margins, padding, width, height, float property and clear property).
Slide 25
Example of a simple style
<html>
<head>
<title>Style sheet</title>
<style type="text/css">
<!--
body {background: #FFFFFF}
A:link {color: #80FF00}
A:visited {color: #FF00FF}
H1 {font-size: 24pt; font-family: arial; color: blue}
H2 {font-size: 18pt; font-family: braggadocio}
H3 {font-size: 14pt; font-family: Desdemona}
-->
</style>
</head>
<body>
<h1>this is heading 1 </h1>
<h2>heading 2</h2>
<h3>heading3</h3>
</body>
</html>
Slide 26
SGML, XML, and HTML
Slide 27
Relating XML to SGML
Slide 28
SGML, HTML, XML
• Standard Generalized Markup Language (SGML) is a way of expressing data in text-processing applications. It has been around for more than a decade.
• Both XML and HTML are document formats derived from SGML. Thus they all share certain characteristics, such as a similar syntax and the use of bracketed tags.
• But HTML is an application of SGML, whereas XML is a subset of SGML.
• The distinction is important. Basically HTML can’t be used to define new applications, but XML can.
Slide 29
Relating XML to SGML
• SGML is a very powerful, very general standard, but with that power comes increased complexity.
• XML is a subset of SGML intended to make SGML “light” enough for use on the Web. XML is a proper subset of SGML: all XML documents are valid SGML documents, but not all SGML documents are valid XML documents.
[Figure: XML shown as a subset of SGML]
Slide 30
Relating XML to SGML (Cont’d)
The following shows the relationships among SGML, XML, and HTML. Rectangular boxes indicate applications, and ellipses indicate framework languages or meta-languages:
[Figure: SGML and XML Applications. Meta-languages: SGML, XML. Applications shown: Text Encoding Initiative, Channel Definition Format (CDF), Open Financial Exchange (OFX), Chemical Markup Language (CML), DocBook, Edgar, HTML, Wireless Markup Language (WML)]
Slide 31
Relating XML to HTML
Slide 32
Relating XML to HTML
XML documents consist of three distinct components, namely:
1. Data content - the words themselves
2. Structure - the document type and the organization of its elements, e.g., memo, contract, cooking recipe. Also, what kind of elements it can contain and in what order they can occur.
3. Presentation - the way the information is presented to the reader, on a piece of paper, a browser screen or via voice synthesis. Also which fonts or voice inflections are used for each element type and so on.
The central idea of XML is that significant benefits accrue to the document owner if these three aspects of a document are kept separate and made explicit in a computer system.
Slide 33
Relating XML to HTML (Cont’d)
Let’s compare and contrast the treatment of the three strands of a document in traditional word processors to their treatment in XML.
[Figure: Difference in XML and HTML documents. The Content, Structure, and Presentation strands are shown for a traditional WYSIWYG (HTML) document and for an XML document]
Slide 34
Relating XML to HTML (Cont’d)
• Structure
Traditional approach: As for structure - capturing what the information really is - the concept is hardly present at all. The only structural information stored relates to the creation of the final paper output - details about page margins, font sizes, and so on.
XML approach: the inherent structure of documents such as procedure manuals, invoices, and tax returns is considered just as important as the content itself. Presentation information is also important but is kept well separated from the content.
• In XML, you create document content by concentrating on what the information really is and how it is structured as shown in the next slide.
Slide 35
What is XML?
Slide 36
What is XML?
• XML is actually compatible with SGML - XML documents can be read by any SGML authoring or viewing tool.
• However, XML is less complex than SGML, and it’s designed to work across a limited bandwidth network such as the Internet.
• According to Tim Bray, coeditor of the XML specification, the idea behind XML was to take the benefits of SGML, remove the complicated parts, keep it light, and make it work on the Web.
Slide 37
What is XML? (Cont’d)
• HTML, SGML, and XML will continue to be used where appropriate; none of them will render the others obsolete.
• HTML will remain the simplest way to publish data quickly on the Web, mostly short-term data such as meeting agendas or advertising brochures.
• If the data has a longer-term use and needs a bit more structure, Web builders will want to move to XML.
• Unlike HTML and XML, SGML will probably never gain widespread acceptance on the Internet, simply because it was never designed or optimized for the demands of a network protocol. For high-end, highly structured publishing applications, SGML will continue to fit the bill.
Slide 38
What is XML? (Cont’d)
• XML is an acronym which stands for eXtensible Markup Language.
• XML is not a software program. Like HTML, it provides a standard approach for describing, capturing, processing, and publishing information, but it has significant benefits over HTML.
Slide 39
What is XML? (Cont’d)
• XML Vs. HTML
HTML:
<!-- HTML snippet -->
<h1>Invoice</h1>
<p>From: Joe Bloggs
<p>To : A. Another
<p>Date : 1 Feb 1999
<p>Amount : $100.00
<p>Tax : 21 %
<p>Total Due : $121.00
XML:
<!-- XML snippet -->
<Invoice>
  <From>Joe Bloggs</From>
  <To>A. Another</To>
  <Date year='1999' month='2' day='1' />
  <Amount currency='Dollars'>100.00</Amount>
  <TaxRate>21</TaxRate>
  <TotalDue currency='Dollars'>121.00</TotalDue>
</Invoice>
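One practical consequence of the XML version is that a program can pick out the amount and the tax rate by name and check the total. The following Python sketch uses the standard library's xml.etree.ElementTree; the function name total_due is an assumption of this example, and the markup is the invoice from the slide.

```python
import xml.etree.ElementTree as ET

INVOICE_XML = """
<Invoice>
  <From>Joe Bloggs</From>
  <To>A. Another</To>
  <Date year='1999' month='2' day='1' />
  <Amount currency='Dollars'>100.00</Amount>
  <TaxRate>21</TaxRate>
  <TotalDue currency='Dollars'>121.00</TotalDue>
</Invoice>
""".strip()


def total_due(invoice_xml: str) -> float:
    """Recompute the total from the Amount and TaxRate elements."""
    invoice = ET.fromstring(invoice_xml)
    amount = float(invoice.findtext("Amount"))
    tax_rate = float(invoice.findtext("TaxRate"))  # a percentage, e.g. 21
    return round(amount * (1 + tax_rate / 100), 2)


declared = float(ET.fromstring(INVOICE_XML).findtext("TotalDue"))
print(total_due(INVOICE_XML), declared)  # 121.0 121.0
```

Doing the same with the HTML version would require guessing which <p> holds which figure from its label text.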
Slide 40
What is XML? (Cont’d)
– XML is a markup language, but unlike XML, other markup languages such as HTML are fixed: the feature set of the markup is fixed in the design of the language. HTML has a fixed set of tags with which you craft your documents - <H1>, <P>, <TABLE>, etc.
– XML, on the other hand, does not define any particular set of tags; rather, it provides a standardized framework with which to define your own, or to use those defined by others that best fit your needs.
Slide 41
What is XML? (Cont’d)
<Main>
  <Name>Curry</Name>
  <CountryOfOrigin Country="India"/>
  <Description>Distinctive. Excellent with slow cooked, earthy dishes.</Description>
  <Example>Curry Chicken</Example>
</Main>
Slide 42
What is XML? (Cont’d)
• The majority of people simply use the XML-based markup languages created by others that best fit their purpose rather than creating their own. For instance, a number of industry-standard XML-based languages already exist in fields such as Push Technology (CDF – Channel Definition Format), Electronic Commerce (IOTP – Internet Open Trading Protocol) and WAP (WML – Wireless Markup Language).
Slide 43
An Example: Use of XML
Slide 44
An Example: Use of XML
The main advantage of being able to define your own markup language is that it gives you the freedom to capture and publish useful information about what your data is and how it is structured, instead of having to stick to a predefined format.
For instance, a company running an e-Business selling PCs on the Internet has the following sort of information that it needs to publish:
Maker: Acme PC Inc
Model: Blaster 555
Storage:
RAM: 72 MB
Hard Disk: 2 GB
Slide 45
An Example: Use of XML (Cont’d)
<!-- HTML snippet -->
<h1>Personal Computers For Sale</h1>
<h2>Maker : Acme PC Inc</h2>
<h3>Model : Blaster 555</h3>
<table border="1">
<tr> <td>Storage</td> </tr>
<tr> <td>RAM</td> <td>72MB</td> </tr>
<tr> <td>Hard Disk</td> <td>2 GB</td> </tr>
</table>
Slide 46
An Example: Use of XML (Cont’d)
Slide 47
An Example: Use of XML (Cont’d)
The original data has been transformed into HTML for publishing purposes. In that transformation, useful information about what the data really is has been lost. The HTML version of the data knows nothing about PCs or hard disk sizes. All it knows about are heading levels, tables, italic text, etc. As a consequence, when this document is let loose on the WWW, search engines and users alike see only a collection of heading levels, tables, italic text, etc., as in the figure in the next slide.
Slide 48
An Example: Use of XML (Cont’d)
HTML document linked to the WWW:
Slide 49
An Example: Use of XML (Cont’d)
Instead, the XML document looks like this:
<!-- XML snippet -->
<PcForSale>
  <Maker>Acme PC Inc</Maker>
  <Model>Blaster 555</Model>
  <Storage>
    <Ram Units="MB">72</Ram>
    <HardDisk Units="GB">2</HardDisk>
  </Storage>
</PcForSale>
This document can have a much richer interface to the Web, an interface that presents all sorts of possibilities about how it might be put to use, as in the figure in the next slide.
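As one illustration of such a use, the sketch below searches PcForSale entries by hard disk capacity with Python's xml.etree.ElementTree. The wrapping Catalogue element, the second PC entry, and the function name models_with_disk_over are all assumptions made up for this example; only the first entry comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: the first entry is from the slide, the second is invented.
CATALOGUE_XML = """
<Catalogue>
  <PcForSale>
    <Maker>Acme PC Inc</Maker>
    <Model>Blaster 555</Model>
    <Storage>
      <Ram Units="MB">72</Ram>
      <HardDisk Units="GB">2</HardDisk>
    </Storage>
  </PcForSale>
  <PcForSale>
    <Maker>Example Corp</Maker>
    <Model>Sample 900</Model>
    <Storage>
      <Ram Units="MB">128</Ram>
      <HardDisk Units="GB">6</HardDisk>
    </Storage>
  </PcForSale>
</Catalogue>
""".strip()


def models_with_disk_over(catalogue_xml, min_gb):
    """Return the models whose hard disk is larger than min_gb gigabytes."""
    root = ET.fromstring(catalogue_xml)
    matches = []
    for pc in root.iter("PcForSale"):
        disk = pc.find("Storage/HardDisk")
        # Skip entries without a disk or with a unit this sketch does not handle.
        if disk is None or disk.get("Units") != "GB":
            continue
        if float(disk.text) > min_gb:
            matches.append(pc.findtext("Model"))
    return matches


print(models_with_disk_over(CATALOGUE_XML, 2))  # ['Sample 900']
```

The query works because the element names and the Units attribute say what each number means, which the HTML table version does not.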
Slide 50
An Example: Use of XML (Cont’d)
XML document linked to the WWW:
Slide 51
An Example: Use of XML (Cont’d)
By keeping information about what the pieces of data really are, e.g. a hard disk capacity, a PC model name, etc., you can contemplate:
1. Letting the browser do the work to format the data on the user’s screen. Perhaps allowing users to choose between a variety of “looks” or presentation formats for the same data.
2. Letting the user’s browser perform calculations on the data, and manipulate and display the results in a variety of ways.
3. Allowing intelligent searching of the information, e.g., “find all PCs for sale on the Web with disk drive capacity greater than 2GB.”
Slide 52
An Example: Use of XML (Cont’d)
4. Intelligently checking that all the pieces of information required for a proper entry on the PC selling Web page are actually there, e.g., “all PCs must have a RAM size element and can optionally have a hard disk size element.”
5. Performing complex queries on the data either for your own management purposes or as a service to customers, e.g., “how many laptop PCs with built-in CD-ROM drives were sold last month in Arkansas?”
6. Building rich links between different types of information, e.g., linking a sales invoice (itself perhaps an XML document!) with the particular makes/models of PCs it references.
Slide 53
An Example: Use of XML (Cont’d)
7. Standardizing a set of XML element types for an entire industry, such as PC vendors. Users and vendors alike would benefit from the standardization. Software “robots” could crawl the Web to find the perfect PC for you, based on criteria you specify. Vendors would be able to easily contrast their offerings with those of the competition via “tick sheets” and so forth.
8. Avoiding the need to “dumb down” data into HTML prior to publishing. This activity often involves complex software and is frequently error prone. With XML, the data can be stored and published in the same format. You don’t need either batch or on-the-fly translation into HTML.
Slide 54
Why’s XML important
Slide 55
Why’s XML important
• Problem: A disclaimer saying “Best viewed at 800-by-600-pixel resolution”.
– XML will help to solve that problem because, rather than specifying where to display something, Web builders will be able to specify the structure of the document. For example, you can specify the document’s title, its author, a list of related links, and so on. Then any device with an XML browser - a palm-top computer, a set-top box, or a high-powered workstation, for example - will be able to render a version of the document specifically tailored to that device.
Slide 56
Why’s XML important (Cont’d)
• Perhaps XML’s best feature, though, is its inherent extensibility. Companies and organizations will be able to extend XML to meet new challenges and applications. One XML-based language is already in use - Microsoft’s Channel Definition Format (CDF) - and more are on the way, including the Resource Description Framework (RDF) and the Open Software Description (OSD).
• XML holds the promise of becoming a standardized mechanism for the exchange of data as well as documents. For example, XML may become a way for databases from different vendors to exchange information across the Internet.
Slide 57
Document Type Definition (DTD)
Slide 58
Document Type Definition (DTD)
• A Document Type Definition (DTD) is a set of syntax rules for tags. It tells you what tags you can use in a document, what order they should appear in, which tags can appear inside other ones, which tags have attributes, and so on. A DTD can be part of an XML document, but it’s usually a separate document or series of documents.
• Because XML is not a language itself, but rather a system for defining languages, it doesn’t have a universal DTD the way HTML does. Instead, each industry or organization that wants to use XML for data exchange can define its own DTDs.
Slide 59
Document Type Definition (DTD) (Cont’d)
With a DTD, XML documents can be checked automatically against rules such as:
1. A person’s name consists of an optional title, a given name, and a surname.
2. A TV timetable contains one or more channels. Each channel contains one or more time slots. Each time slot has a program title and an optional description.
Slide 60
Document Type Definition (DTD) (Cont’d)
These effects can be achieved in the DTD by listing the element types you wish to use in your document and indicating the structural order in which they can occur. A utility program called an XML Parser is then able to test whether or not the document meets the prescribed rules such as the following:
Slide 61
XML processor (or XML parser)
• A tool for reading XML documents is popularly called an XML parser, though the more formal name is an XML processor. XML processors pass data to an application for authoring, publishing, searching, or displaying. XML doesn't provide an application programming interface (API) to an application; it just passes data to it. No XML processor will parse data that isn’t well-formed.
Slide 62
Well-formed and Valid Document
Slide 63
Well-formed and Valid Document
• Two related types of XML documents
– Well-formed document: A well-formed XML document conforms to the general rules of XML syntax, which are more rigorous than those of either HTML or SGML.
– XML character data is never left hanging without an ending markup designation of some sort, either an end tag such as in the tag-pair <MYTAG></MYTAG> or a special empty element tag with a forward slash before the right-angle bracket, such as <MYTAG/>.
– XML markup always starts with a left-angle bracket or an ampersand; element types and attribute names are case-sensitive; attributes require quotation marks; and so on.
Slide 64
Well-formed and Valid Document (Cont’d)
– Valid document: Valid XML documents are ones that conform to a specific DTD. Confirming the validity of XML documents is largely the work of authoring and publishing tools, whereas XML-capable browsers need only check for well-formedness in order to read XML documents.
• The XML parser in an authoring tool will have to worry about the checking for well-formedness and validity, but browsers will have to worry only about looking for well-formed XML.
Slide 65
• Well-formed
– Minimum set of requirements
– Logically coherent in the manner defined by the XML specification
• Valid
– Well-formed
– Conforms to a document type definition containing declarations that specify the structure of a document
Slide 66
Well-formed
• Well-formed
– Elements
• contain one or more elements
• contain a uniquely named element (root element), no part of which appears in the content of other elements
• all other elements within the root element must be correctly nested
Slide 67
<customer>
  <customer-id id="aa0001">
    <customer-name>
      <first-name>Peter</first-name>
      <last-name>Taylor</last-name>
    </customer-name>
    <customer-address>123, High Street</customer-address>
    <customer-phone>21234567</customer-phone>
  </customer-id>
  <customer-id id="aa0002">
    …
  </customer-id>
</customer>
Slide 68
– Attributes
• values are passed to the application by the XML parser
• do not constitute part of the content of the element
• should not contain <, &, or the quotation mark (' or ") that delimits the value
Slide 69
– Entities
• a mechanism to associate a name with a long piece of text
• general entities
– <!ENTITY entityname "some lengthy text …. ">
– <!ENTITY entityname SYSTEM "http://www.server.com/file.html">
• parameter entities
– <!ENTITY % entityname "some lengthy text … ">
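A general entity declared in a document's internal DTD subset is replaced by its text when the document is parsed. The Python sketch below shows this with xml.etree.ElementTree; the entity name company, its replacement text, and the note element are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The entity "company" is declared in the internal DTD subset and
# referenced in the content as &company;.
DOCUMENT = """<!DOCTYPE note [
  <!ENTITY company "Acme PC Inc">
]>
<note>Sold by &company;.</note>"""

note = ET.fromstring(DOCUMENT)
print(note.text)  # Sold by Acme PC Inc.
```

This sketch covers only an internal general entity. External (SYSTEM) entities and parameter entities are handled differently by parsers and are not exercised here.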
Slide 70
HTML Vs. Well-formed XML
• HTML
<TITLE>Reasonable HTML</title>
Some text, and I <I>really</I> don’t want a line break between “line” and “break.”
<P>Here’s a picture: <IMG src=madonna.jpg>
• Well-formed XML
<HTML>
<TITLE>Well-formed XML</TITLE>
Some text, and I <I>really</I> don’t want a line break between “line” and “break.”
<P>Here’s a picture: <IMG src="madonna.jpg"/></P>
</HTML>
(In the XML version, the space in “line break” is a Unicode non-breaking space.)
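The individual differences between the two versions can be checked one at a time. The sketch below assumes Python's standard-library `xml.etree.ElementTree` as a non-validating XML parser; the rule descriptions and the shortened fragments are illustrative assumptions based on the slide, not part of it.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if text is accepted by an XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# (rule, HTML-style fragment, well-formed XML equivalent)
cases = [
    ("tag names are case-sensitive",
     "<TITLE>Reasonable HTML</title>",
     "<TITLE>Well-formed XML</TITLE>"),
    ("attribute values must be quoted",
     "<IMG src=madonna.jpg />",
     '<IMG src="madonna.jpg"/>'),
    ("every element must be closed",
     '<P>A picture: <IMG src="madonna.jpg"></P>',
     '<P>A picture: <IMG src="madonna.jpg"/></P>'),
    ("there must be a single root element",
     "<TITLE>t</TITLE><P>text</P>",
     "<HTML><TITLE>t</TITLE><P>text</P></HTML>"),
]

for rule, html_style, xml_style in cases:
    print(f"{rule}: HTML-style={parses(html_style)}, XML={parses(xml_style)}")
```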
Slide 71
XML Applications
• Push Technologies
  CDF – Channel Definition Format
• Electronic Commerce transactions
  IOTP – Internet Open Trading Protocol
• Mathematics
  MathML – Mathematical Markup Language
• Wireless Application Protocol
  WML – Wireless Markup Language
Slide 72
Document Type Definition (DTD)
Slide 73
DTD
• Content Models: XML allows element types to be declared that can contain other elements; the order and occurrence of these elements can be constrained in various ways.
• Element type declarations: <!ELEMENT …>
e.g.,
<!ELEMENT foo EMPTY>
The foo element must be empty.
<!ELEMENT foo ANY>
The foo element can contain any mixture of character data and other elements, as long as those element types have been declared in the DTD.
<!ELEMENT foo (apple|orange|banana)>
The foo element contains exactly one element: an apple, an orange, or a banana.
Slide 74
DTD
• A sequence of elements, one after another
<!ELEMENT person (name, address, telephone)>
A person element consists of a name element, an address element, and a telephone element, in exactly that order.
E.g., (VALID)
<person>
  <name>…
  <address>…
  <telephone>…
</person>
E.g., (INVALID)
<person>
  <name>…
  <address>…
</person>
Slide 75
DTD
• A selection from a list of elements, only one allowed
<!ELEMENT fruit (apple|orange|banana)>
A fruit element consists of an apple element, or an orange element, or a banana element.
E.g., (VALID)
<fruit>
  <apple>…
</fruit>
E.g., (INVALID)
<fruit>
  <apple>…
  <orange>…
  <banana>...
</fruit>
Slide 76
DTD
• An element occurring once or not at all
<!ELEMENT person (name, address, telephone?)>
A person element consists of a name element, followed by an address element, optionally followed by a telephone element.
E.g., (VALID)
<person>
  <name>…
  <address>…
  <telephone>…
</person>
or
<person>
  <name>…
  <address>…
</person>
E.g., (INVALID)
<person>
  <name>…
  <address>…
  <telephone>…
  <telephone>…
</person>
Slide 77
DTD
• An element occurring zero or more times
<!ELEMENT person (name, address, telephone*)>
A person element consists of a name element, followed by an address element, followed by zero or more telephone elements.
E.g., (VALID)
<person>
  <name>…
  <address>…
</person>
or
<person>
  <name>…
  <address>…
  <telephone>…
  <telephone>…
</person>
or
<person>
  <name>…
  <address>…
  <telephone>…
</person>
Slide 78
DTD
• An element occurring one or more times
<!ELEMENT person (name, address, telephone+)>
A person element consists of a name element, followed by an address element, followed by one or more telephone elements.
E.g., (VALID)
<person>
  <name>…
  <address>…
  <telephone>…
</person>
or
<person>
  <name>…
  <address>…
  <telephone>…
  <telephone>…
</person>
E.g., (INVALID)
<person>
  <name>…
  <address>…
</person>
Slide 79
,    Strict ordering
|    Selection
+    Repetition with minimum 1
*    Repetition with minimum 0
?    Optional
()   Grouping
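Because these operators closely mirror regular-expression syntax, a content model can be checked with a short Python sketch. This is an illustration under stated assumptions, not how a real validating parser is implemented: the function names `model_to_regex` and `conforms` are hypothetical, only element-content models are handled (no #PCDATA, EMPTY, or ANY), and the sample instances use empty child elements for brevity.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def model_to_regex(model: str) -> re.Pattern:
    """Translate a content model such as "(a, b?, c*)" into a regex.

    Each child name becomes a group that consumes "name " (the name plus
    one space). "," (strict ordering) becomes plain concatenation, while
    "|", "?", "*", "+" and "()" keep their regex meaning.
    """
    compact = re.sub(r"\s+", "", model)
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    return re.compile(pattern.replace(",", ""))

def conforms(model: str, element_xml: str) -> bool:
    """Check the child elements of element_xml against the content model."""
    element = ET.fromstring(element_xml)
    encoded = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(encoded) is not None

optional_phone = "(name, address, telephone?)"
print(conforms(optional_phone, "<person><name/><address/></person>"))
print(conforms(optional_phone,
               "<person><name/><address/><telephone/><telephone/></person>"))
print(conforms("(apple|orange|banana)", "<fruit><apple/></fruit>"))
print(conforms("(name, address, telephone+)",
               "<person><name/><address/></person>"))
```

Encoding each child as its name plus a trailing space keeps names such as `to` and `total` from being confused with one another during matching.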
Slide 80
DTD
• An element containing any other element(s) in any order
<!ELEMENT person ANY>
A person element consists of any combination of elements in any order.
• Some more complex examples
– <!ELEMENT invoice (from,to,item+)>
An invoice element consists of a from element followed by a to element followed by one or more item elements.
– <!ELEMENT invoice (from,to?,item+)>
An invoice element consists of a from element, optionally followed by a to element, followed by one or more item elements.
– <!ELEMENT invoice (from,to*,item+)>
An invoice element consists of a from element, followed by zero or more to elements, followed by one or more item elements.
Slide 81
DTD
• Some more complex examples
– <!ELEMENT invoice (from,to,(item+|batch*))>
An invoice element consists of a from element followed by a to element. This is followed either by one or more item elements or by zero or more batch elements.
– <!ELEMENT invoice ANY>
An invoice element consists of any combination of elements and character data, in any order.
– <!ELEMENT invoice (((from,to)|(to,from)),item?)>
An invoice element consists of the elements from and to in any order, followed by an optional item element.
– <!ELEMENT invoice (from,to,(item+|batch?))>
An invoice element consists of a from element followed by a to element, followed by either one or more item elements or an optional batch element.
Slide 82
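X (x,(y|z)?)
x, optionally followed by either y or z
<X>
  <x> </x>
</X>

<X>
  <x> </x>
  <y> </y>
  <z> </z>
</X>
Only one of y or z may follow x under this model, so an instance that contains both, like the second one, does not conform.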
Slide 83
x, y, and z in any order, or a single a element:
( ( (x,y,z) | (x,z,y) | (y,x,z) | (y,z,x) | (z,x,y) | (z,y,x) ) | a )
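XML DTDs have no dedicated "any order" operator, so a model like this has to list every permutation explicitly. The sketch below generates such a model in Python; the function name `any_order` is hypothetical, and the spacing is chosen only to reproduce the model above.

```python
from itertools import permutations

def any_order(names):
    """Build a content model allowing each name exactly once, in any order."""
    groups = ["(" + ",".join(p) + ")" for p in permutations(names)]
    return "( " + " | ".join(groups) + " )"

# x, y, and z in any order, or a single a element.
model = "( " + any_order(["x", "y", "z"]) + " | a )"
print(model)

# The number of alternatives grows factorially: 4 names need 24 groups.
four = any_order(["w", "x", "y", "z"])
print(four.count("|") + 1)
```

The factorial growth is the practical reason this style is only workable for a handful of elements.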
Slide 84
Attributes – Example 1
<?xml version="1.0"?>
<!DOCTYPE Attribute_test [
  <!ELEMENT Attribute_test ANY>
  <!ELEMENT fruit (#PCDATA)>
  <!ATTLIST fruit
    type (Orange|Apple|Mango) "Apple"
    From CDATA #IMPLIED
    Provider CDATA #FIXED "ABC Company"
    Price CDATA #REQUIRED>
  <!ELEMENT Owner (#PCDATA)>
  <!ENTITY owner "D.J.">
]>
<Attribute_test>
  <fruit From="USA" Provider="ABC Company" Price="1.0"/>
  <Owner>&owner;</Owner>
</Attribute_test>
Slide 85
Attributes
Attribute List Declarations
– String
– Enumerated
– ID, IDREF, IDREFS
– ENTITY, ENTITIES
– NMTOKEN, NMTOKENS
– NOTATION
• String attributes
<!ATTLIST foo bar CDATA … >
• Enumerated attributes
e.g.,
<!ATTLIST apple quality (GOOD|BAD|INDIFFERENT) … >
Valid: <apple quality = "GOOD">
Invalid: <apple quality = "good"> (enumerated values are case-sensitive)
Slide 86
Attributes
• ID/IDREF/IDREFS
ID:
<!ATTLIST foo UniqueName ID … >
e.g.,
<foo UniqueName = "P1234">
IDREF: IDREF attributes are the flip side of the coin. Any IDREF attribute assigned a value in a valid XML document must match the value assigned to an ID attribute somewhere in the document.
Here is an example of a bar element using its IDREF attribute to point to the foo element of the last example.
<!ATTLIST bar Reference IDREF … >
<bar Reference = "P1234">
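The matching rule for IDREF values can be sketched as a simple cross-check. The Python example below is only an approximation of what a validating parser does: it assumes the attribute names `UniqueName` and `Reference` from the declarations above, and the wrapping `doc` element and the value `P9999` are hypothetical additions for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<doc>
  <foo UniqueName="P1234"/>
  <bar Reference="P1234"/>
  <bar Reference="P9999"/>
</doc>"""

def dangling_references(xml_text, id_attr="UniqueName", ref_attr="Reference"):
    """Return IDREF values that match no ID value anywhere in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    return [el.get(ref_attr) for el in root.iter()
            if el.get(ref_attr) is not None and el.get(ref_attr) not in ids]

# P1234 points at the foo element; P9999 points at nothing.
print(dangling_references(doc))
```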
Slide 87
Attributes
• ENTITY/ENTITIES
An entity is similar to the sort of macro you would expect to find in a word processing package: a short string of characters that can be used as an abbreviation for a large piece of text (or markup).
(SEE Example 1)
Slide 88
Attributes
• NMTOKEN/NMTOKENS
An NMTOKEN attribute is restricted to the characters allowed in a name: any combination of letters, digits, and the punctuation characters “.”, “_”, “-”, and “:”. Note that this list does not contain any white space characters.
E.g.,
<!ATTLIST product code NMTOKEN …>
Valid: <product code = "Alpha-123">
<product code = "333">
Invalid: <product code = "A 123">
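The character restriction can be approximated with a regular expression. The Python sketch below is a simplification that covers only ASCII letters and digits plus the four punctuation characters listed above; the XML specification also admits many non-ASCII name characters, which are ignored here. The name `is_nmtoken` is hypothetical.

```python
import re

# ASCII-only approximation of the NMTOKEN character set.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")

def is_nmtoken(value: str) -> bool:
    """Return True if value consists only of the allowed name characters."""
    return NMTOKEN.fullmatch(value) is not None

for code in ["Alpha-123", "333", "A 123"]:
    print(code, is_nmtoken(code))
```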
Slide 89
Attribute defaults

• Required attributes
These are attributes that must be given a value in every occurrence of the element.
(SEE Example 1)
Slide 90
Attribute defaults

• Implied attributes
These are attributes that can be left unspecified if desired. The XML processor passes the fact that the attribute was unspecified through to the XML application, which can then choose what best to do.
(SEE Example 1)
Slide 91
Attribute defaults

• Fixed attributes
These are attributes that have their value fixed in the DTD.
(SEE Example 1)
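The four kinds of default in Example 1 (a plain default value, #IMPLIED, #FIXED, and #REQUIRED) can be summarised in a short Python sketch. This is a hand-written illustration of the rules, not a real DTD processor: the table `FRUIT_ATTRS` mirrors the fruit ATTLIST from Example 1, while the function name `apply_defaults` and the `"DEFAULT"` marker are hypothetical.

```python
# name -> (allowed values, or None for CDATA; default kind; default value)
FRUIT_ATTRS = {
    "type": (("Orange", "Apple", "Mango"), "DEFAULT", "Apple"),
    "From": (None, "#IMPLIED", None),
    "Provider": (None, "#FIXED", "ABC Company"),
    "Price": (None, "#REQUIRED", None),
}

def apply_defaults(declared, given):
    """Return the effective attributes, or raise ValueError if invalid."""
    for name in given:
        if name not in declared:
            raise ValueError(f"undeclared attribute: {name}")
    result = {}
    for name, (allowed, kind, default) in declared.items():
        value = given.get(name)
        if value is None:
            if kind == "#REQUIRED":
                raise ValueError(f"missing required attribute: {name}")
            if kind == "#IMPLIED":
                continue  # stays unspecified; the application decides
            value = default  # plain default or #FIXED: declared value is supplied
        if kind == "#FIXED" and value != default:
            raise ValueError(f"{name} is fixed to {default!r}")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}")
        result[name] = value
    return result

# The fruit element from Example 1: type is omitted, so "Apple" is supplied.
print(apply_defaults(FRUIT_ATTRS,
                     {"From": "USA", "Provider": "ABC Company", "Price": "1.0"}))
```

Omitting Price, changing Provider, or writing `type="apple"` would each raise `ValueError` in this sketch, matching the required, fixed, and enumerated rules respectively.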