se3316a_2013_02_intro_html
description
Transcript of se3316a_2013_02_intro_html
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 1/61
SE 3316 © Jagath Samarabandu 1
SE 3316a - WEB TECHNOLOGIES
Chapter 1
HTML and Web Pages
TEB 351
519-661-2111 x80058
Dr. Jagath Samarabandu
Slides based on those from Møller et. al.
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 2/61
SE 3316 © Jagath Samarabandu 2
Objectives The history of HTML
URLs and related schemes Survivor's guides to HTML and CSS
Limitations of HTML Unicode
The World Wide Web Consortium (W3C)
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 3/61
SE 3316 © Jagath Samarabandu 3
Hypertext Collections of document connected by
hyperlinks Paul Otlet, philosophical treatise (1934)
Vannevar Bush, hypothetical Memex system(1945)
Ted Nelson introduced hypertext (1968)
Hypermedia generalizes hypertext beyondtext
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 4/61
SE 3316 © Jagath Samarabandu 4
Markup Languages
Notation for adding formal structure to text
Charles Goldfarb, the INLINE system (1970) Standard Generalized Markup Language,
SGML (1986) DTD, element, attribute, tag, entity:
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 5/61
SE 3316 © Jagath Samarabandu 5
SGML Example
<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)><!ATTLIST greeting style (big|small) "small"><!ENTITY hi "Hello">
]><greeting style="big"> &hi; world! </greeting>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 6/61
SE 3316 © Jagath Samarabandu 6
The Origins of the WWW
WWW was invented by Tim Berners-Lee at
CERN (1989) Hypertext across the Internet (replacing FTP)
Three constituents: HTML + URL + HTTP HTML is an SGML language for hypertext
URL is a notation for locating files on servers
HTTP is a high-level protocol for file transfers
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 7/61
SE 3316 © Jagath Samarabandu 7
The Design of HTML
Simple, purist design principles
HTML describes the logical structure of adocument
Browsers are free to interpret tags differently HTML is a lightweight file format
Size of file containing just ”Hello World!”:− MS Word: 11,274 bytes
− PDF: 4915 bytes
− HTML: 28 bytes
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 8/61
SE 3316 © Jagath Samarabandu 8
The History of HTML
1992: HTML 1.0, Tim Berners-Lee original proposal
1994: HTML 2.0, standard with best features 1996: Competing Netscape and Explorer features
1996: HTML 3.2, the Browser Wars end
1997: HTML 4.0, stylesheets are introduced 1999: HTML 4.01, we have a winner!
2000: XHTML 1.0, an XML version of HTML 4.01
2001: XHTML 1.1, modularization 2002: XHTML 2.0, simplified (abandoned in 2009)
2008: XHTML5/HTML5 initiative (current draft Aug. 6, 2013)
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 9/61
SE 3316 © Jagath Samarabandu 9
Uniform Resource Locator - URL
Consists of a scheme (http), a server
(example.com ), and a path (/ex/one) A Web resource is located by a URL
http://example.com/ex/one/
Relative URLsgml/dtd.html
Fragment identifierhttp://example.com/ex/page.html#sec1
Subset of the generic URI
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 10/61
SE 3316 © Jagath Samarabandu 10
Uniform Resource Identifier - URI
scheme:scheme-specific-part
Conventions about use of /, #, and ?− “/” for separating hierarchical structure
−
“?” for separating a query from query-ableresource− “#” for separating URI from “fragment identifier”
http is most common scheme. Other schemes: mailto, file, ftp
Official registry of schemes at IANA
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 11/61
SE 3316 © Jagath Samarabandu 11
Uniform Resource Name - URN
A pointer to a resource without referring to a
particular location.− E.g. urn:isbn:0-471-94128-X
Not widely used at time of introduction
Some applications in semantic web withinResource Description Framework (RDF)
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 12/61
SE 3316 © Jagath Samarabandu 12
Internationalized Resource Identifier
Syntax of URI is restricted to US-ASCII
IRI is being developed to allow internationalcharacters in a URI
Uses a mapping that is rather complex E.g. http://www.blåbærgrød.dk/blåbærgrød.html
Maps to: http://www.xn--blbrgrd-fxak7p.dk/bl%E5b%E6rgr%F8d.html
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 13/61
Data Flow Models in WWW
Sep 10, 2012 SE 3316 © Jagath Samarabandu 13
Server
Server
Server
Browser
Browser
Browser
Static pages
Server side
processing
Server sideprocessing Javascript
engine
Client sideprocessing
Asynchronouscommunication
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 14/61
SE 3316 © Jagath Samarabandu 14
Survivor’s Guide to HTML
Overall structure of an HTML document
<html><head><title>The Title of the Document</title></head><body bgcolor="white">...
</body></html>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 15/61
SE 3316 © Jagath Samarabandu 15
Simple Formatting
<html><head><title>Good Advice</title>
</head><body><h1>Good Advice for Everyday Life</h1><h2>For UNIX programmers</h2>
<b>Never</b> type:<p><tt>rm -rf /*</tt><p>on your computer.<h2>For Nuclear Scientists</h2><b>Never</b> press the
<i>Big <font color="red">Red</font> Button</i>.</body>
</html>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 16/61
SE 3316 © Jagath Samarabandu 16
More Formatting<html>
<head><title>Things To Do</title>
</head><body><ol>
<li>Feed the cat.<li>Try out the shell command:<pre>foreach x ( `ls` )
cat $x | tr "aeiouy" "x" > $xend</pre>
<li>Buy ticket for Timbuktu.
</ol></body>
</html>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 17/61
SE 3316 © Jagath Samarabandu 17
Hyperlinks: Source Document
<html><head>
<title>Source Document</title></head><body>
<a href="target.html#danger">Look here</a>.</body>
</html>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 18/61
SE 3316 © Jagath Samarabandu 18
Hyperlinks: Target Document
<html><head>
<title>Target Document</title></head><body>
...<a name="danger"></a>
<h2>Chapter 17: Dangerous Shell Commands</h2> Never execute a shell command thatinadvertently changes all vowels tothe character 'x'.
</body></html>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 19/61
SE 3316 © Jagath Samarabandu 19
Tables<table border="1">
<tr><td>PostScript</td><td align="right">11,274 bytes</td>
</tr><tr><td>PDF</td><td align="right">4,915 bytes</td>
</tr><tr><td>MS Word</td><td align="right">19,456 bytes</td>
</tr>
<tr><td>HTML</td><td align="right">28 bytes</td>
</tr></table>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 20/61
SE 3316 © Jagath Samarabandu 20
Tables for Alignment
<table width="100%"><tr><td align="left"><a href="index.html"><img src="home.gif" border="0"></a><a href="info.html"><img src="info.gif" border="0"></a>
</td><td align="right"><a href="links.html"><img src="left.gif" border="0"></a><a href="survivor.html"><img src="right.gif” border="0"></a>
</td></tr>
</table><h1>Using Tables</h1>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 21/61
SE 3316 © Jagath Samarabandu 21
Fill-Out Forms<form method="get"action="http://www.google.com/search"><input type="text" name="q">
<input type="submit" name="btnG" value="GoogleSearch"></form>
Tip:
Use http://www.brics.dk/ixwt/echo as the action
URL will echo the GET and PUT variables
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 22/61
SE 3316 © Jagath Samarabandu 22
GUI Elements<input name="foo" type="text" size="20"><hr><input name="bar" type="radio" value="s">Small<input name="bar" type="radio" value="m">Medium <input name="bar" type="radio" value="l">Large
<hr><input name="baz" type="checkbox" value="c">Cheese<input name="baz" type="checkbox"value="p">Pepperoni<input name="baz" type="checkbox"value="a">Anchovies
<hr><select name="bar"><option value="s">Small<option value="m">Medium <option value="l">Large</select><hr><select name="baz" multiple><option value="c">Cheese<option value="p">Pepperoni<option value="a">Anchovies
</select>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 23/61
SE 3316 © Jagath Samarabandu 23
GUI Elements ... contd.
<hr><textarea name="foo" rows="5" cols="20">
Write something here...
</textarea><hr><input name="foo" type="password" value="tomato"><hr><input name="foo" type="file">
<hr><input name="foo" type="hidden" value="you can't seethis"><hr><input name="qux" type="image" src="Denmark.gif"><hr>
<input type="submit" value="Submit this form"><hr><input type="reset" value="Reset this form">
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 24/61
SE 3316 © Jagath Samarabandu 24
Logical Versus Physical
Logical structure the page starts with a
header
the entries are written ina list
numbers are
emphasized
Physical layout headers are centered,
huge, and grey
lists have square bullets
emphasis is rendered inbold-style italics
Phone Numbers
John Doe, (202) 555-1414 Jane Dow, (202) 555-9132 Jack Doe, (212) 555-1742
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 25/61
SE 3316 © Jagath Samarabandu 25
Cascading Style Sheets – CSS
Cascading Stylesheets separate structure fromlayout
The essential concepts are selectors and properties
Properties describe physical layout of specific HTML
tags CSS 2.1 has 98 properties that can be set
A stylesheet associates properties with selectedtags
Avoids duplication
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 26/61
SE 3316 © Jagath Samarabandu 26
Some CSS Properties and Values
font-style normal, italics, oblique
font-size 12pt, larger, 150%, 1.5em
text-align
line-height normal, 1.2em, 120%
display
color red, yellow, rgb(212,120,20)
left, right, center, justify
block, inline, list-item, none
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 27/61
SE 3316 © Jagath Samarabandu 27
Applying a Stylesheet
h1 { color: #888; font: 50px/50px "Impact"; text-align: center; }ul { list-style-type: square; }em { font-style: italic; font-weight: bold; }
<html><head><title>Phone Numbers</title>
<link href="style.css" rel="stylesheet" type="text/css"></head><body><h1>Phone Numbers</h1><ul>
<li>John Doe, <em>(202) 555-1414</em><li>Jane Dow, <em>(202) 555-9132</em><li>Jack Doe, <em>(212) 555-1742</em>
</ul></body>
</html>
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 28/61
SE 3316 © Jagath Samarabandu 28
Structure of a Stylesheet
A selector is a list of tag names
For each selector, some properties areassigned values:− b {color: red; font-size: 12pt}
− i {color: green}
Longer selectors give context sensitivity:
− table b {color: blue; font-size: 12pt}− form b {color: yellow; font-size: 12pt}
− i {color: green}
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 29/61
SE 3316 © Jagath Samarabandu 29
CSS Selectors
The most specific selector is chosen to apply
E F matches any F element that is a descendant ofE element
E > F matches any F element that is a child element
of E E.foo matches any E element whose class attribute
contains foo
E#foo matches any E element with ID equal to foo Other selectors include *, + and pseudo-classes
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 30/61
SE 3316 © Jagath Samarabandu 30
Specificity in Action
<head><style type="text/css"> b {color: red;}
b b {color: blue;} b.foo {color: green;} b b.foo {color: yellow;} b.bar {color: maroon;}</style>
<title>CSS Test</title></head>
<body><b class=foo>Hey!</b><b>Wow!<b>Amazing!</b><b class=foo>Impressive!</b><b class=bar>k00l!</b><i>Fantastic!</i>
</b></body>
Hey!Hey!Hey!Hey! Wow!Wow!Wow!Wow! Amazing!Amazing!Amazing!Amazing! Impressive!Impressive!Impressive!Impressive! K00l!K00l!K00l!K00l! Fantastic!Fantastic!Fantastic!Fantastic!Green Red Blue ellow aroon Red
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 31/61
SE 3316 © Jagath Samarabandu 31
HTML Validity
HTML has a formal syntax specification
800 lines of DTD notation A validator gives syntax errors for invalid
documents
Valid documents may contain this logo:
Most HTML documents on the Web are
invalid: Lets check a few. Shall we!
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 32/61
SE 3316 © Jagath Samarabandu 32
Validation Errors
<html><body>
<table><b>123</i></table></body>
</html>
Check errors on following HTML markup
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 33/61
SE 3316 © Jagath Samarabandu 33
Reasons for Invalidity
Ignorance of the HTML standard
Lack of testing ”This page is optimized for the XYZ browser”
”This page is best viewed in 1024x768” Automatic tools generate invalid HTML output
Forgiving browsers try to interpret invalidinput
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 34/61
SE 3316 © Jagath Samarabandu 34
How Does This Look?
<h2>Lousy HTML</h1><li><a>This is not very</b> good.
<li><i>In fact, it is quite bad</em></ul>But the browser does <a naem="goof">something.
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 35/61
SE 3316 © Jagath Samarabandu 35
Problems with Invalidity
There are several different browsers
Each browsers has many differentimplementations
Each implementation must interpret invalidHTML
There are many arbitrary choices to make
As a result, the HTML standard has beenundermined
HTML renders differently for most clients
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 36/61
SE 3316 © Jagath Samarabandu 36
A Standard for Invalid HTML
The HTML Tidy tool tries to save the situation
Invalid HTML is transformed to (almost) validHTML
Still many arbitrary choices, but now weagree
<h2>Lousy HTML</h1><li><a>This is not very</b> good.
<li><i>In fact, it is quite bad</em></ul>But the browser does <a naem="goof">something.
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 37/61
SE 3316 © Jagath Samarabandu 37
HTML Recipes<h1>Rhubarb Cobbler</h1><h2>Wed, 4 Jun 95</h2>This recipe is suggested by Jane Dow.Rhubarb Cobbler made with bananas as the main sweetener.
It was delicious.
<table><tr><td> 2 1/2 cups <td> diced rhubarb<tr><td> 2 tablespoons <td> sugar<tr><td> 2 <td> fairly ripe bananas
<tr><td> 1/4 teaspoon <td> cinnamon<tr><td> dash of <td> nutmeg</table>
<i>Combine all and use as cobbler, pie, or crisp.</i><p>This recipe has 170 calories, 28% from fat,58% from carbohydrates, and 14% from protein.<p>Related recipes: <a href="#GardenQuiche">Garden Quiche</a>is also yummy.
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 38/61
SE 3316 © Jagath Samarabandu 38
Limitations of HTML
HTML is designed for hypertext, not for
recipes Structure and presentation is intertwined
HTML validation is less than recipe validation
HTML standards have been undermined
We need a special Recipe Markup Language!
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 39/61
SE 3316 © Jagath Samarabandu 39
Bytes vs. Characters
HTML files are represented as text files
A text file is logically a sequence ofcharacters but physically a sequence of bytes
Several mappings exist:− ASCII
− EBCDIC
− Unicode Unicode aims to cover all characters in all
past or present written languages
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 40/61
SE 3316 © Jagath Samarabandu 40
Unicode
Unicode Characters−
Mapping characters to numbers− Numbers are called “code points”
Glyphs− Pictorial representation of each code point− Handled by fonts
Encoding/Decoding− Convention for mapping code points to a serial bit
stream (“code units”) and vice versa− UTF-8 (8 bit code units), UTF-16 (16 bits)
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 41/61
SE 3316 © Jagath Samarabandu 41
Unicode Characters
A character is a symbol that appears in a text
−
letters of the alphabet− pictograms (like ©)
− accents
Unicode characters are abstract entities:− LATIN CAPITAL LETTER A (A)
− LATIN CAPITAL LETTER A WITH RING ABOVE (Å)
− HIRAGANA LETTER SA (さ)− SINHALA LETTER SA (ස)
− RUNIC LETTER THURISAZ THURS THORN ( )
සමරබ
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 42/61
SE 3316 © Jagath Samarabandu 42
Unicode Glyphs
A glyph is a graphical presentation
A typical example is: Å This may represent several characters:
− LATIN CAPITAL LETTER A WITH RING ABOVE
− ANGSTROM SIGN Or even a sequence of characters:
−
LATIN CAPITAL LETTER A− COMBINING RING ABOVE
Some characters even result in several glyphs
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 43/61
SE 3316 © Jagath Samarabandu 43
Unicode Code Points
A code point is a unique number assigned to everyUnicode character
Code points are between 0 and 1,114,112
Current standard (6.1) defines 110,116 code points
The character HIRAGANA LETTER SA is assignedthe code point 12,373
Written as U+3055 (4+ digit hex of code point)
Code point 0 through 127 coincide with ASCII
Some code point are never assigned
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 44/61
SE 3316 © Jagath Samarabandu 44
Unicode Character Encoding
A character encoding interprets a sequenceof bytes as a sequence of code points
The bytes are first parsed into code units
Code units have a fixed length
One or more code units may be required todenote a code point
Examples are UTF-8, UTF-16, UTF-32
UTF
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 45/61
SE 3316 © Jagath Samarabandu 45
UTF-8
A code unit is a single byte
A code point is from 1 to 4 code units Code units between 0 and 127 directly
represent the corresponding code points
110XXXXX indicates that 2 code units are used
1110XXXX indicates that 3 code units are used
11110XXX indicates that 4 code units are used The remaining code units looks like 10XXXXXX
UTF 8 E l
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 46/61
SE 3316 © Jagath Samarabandu 46
UTF-8 Example
11100011 10000001 10010101
11100011 10000001 10010101 11000001010101
12,373 HIRAGANA LETTER SA
UTF-8 files start with the special sequence“EF BB BF” (11101111 10111011 10111111)
What does this stand for?
UTF 16
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 47/61
SE 3316 © Jagath Samarabandu 47
UTF-16
A code unit consists of 2 bytes
Code points below 65,536 are in a single code unit Higher code points are represented as:
− 110110XXXXXXXXXX 110111XXXXXXXXXX
− (after subtracting 65,536) This makes sense because Unicode assign no code
points between the numbers:
− 1101100000000000 (55,296)
− and
− 1101111111111111 (57,343)
B t O d
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 48/61
SE 3316 © Jagath Samarabandu 48
Byte Order
When reading several bytes at once, we mustconsider the byte order of the architecture
UTF-16 starts text with the special code point:− 1111111011111111 (65,279, 0xFEFF)
− called zero-width non-breaking space The dual code point
− 1111111111111110 (65,534)− is never assigned
UTF-16LE and UTF-16BE may avoid this
UTF 16 E l
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 49/61
SE 3316 © Jagath Samarabandu 49
UTF-16 Example
11111110 11111111 00110000 01010101
11111110 11111111 00110000 01010101 00110000 01010101
12,373 HIRAGANA LETTER SA
U i d i J
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 50/61
SE 3316 © Jagath Samarabandu 50
Unicode in Java
Java represents characters as UTF-16 codeunits – not as UTF-16 code points!
A pragmatic choice to use only 16 bits
The length function on strings may be wrong
(see String.codePointCount) Some strings may represent illegal data
ISO 8859 1
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 51/61
SE 3316 © Jagath Samarabandu 51
ISO-8859-1
Another popular character encoding
Only 256 code points Single byte code units
Coincides with ASCII on code points 0-127 Cannot represent general Unicode
In all, there are hundreds of differentencodings...
Character Encodings in HTML
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 52/61
SE 3316 © Jagath Samarabandu 52
Character Encodings in HTML
The document may declare its own encoding:<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8">
Rest of the document is assumed to be
encoded in UTF-8 Unicode characters may be represented
using numeric character references as:さ
How to Input Unicode
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 53/61
SE 3316 © Jagath Samarabandu 53
How to Input Unicode
Needs software support for
− localization (L10n),− multilingualization (m17n),
− internationalization (i18n)
Input methods− Allows entering characters not found on the input
device
− E.g. Chinese characters using a QWERTYkeyboard
Developing i18n Ready Software
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 54/61
SE 3316 © Jagath Samarabandu 54
Developing i18n Ready Software
Old days of using ascii strings are gone!
Represent all strings using Unicode Ensure that character data is decoded when
reading and encoded when writing
Use Unicode aware functions for manipulatingstrings
World Wide Web Consortium (W3C)
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 55/61
SE 3316 © Jagath Samarabandu 55
World Wide Web Consortium (W3C)
Develops HTML, CSS, and most Webtechnology
Founded in 1994
Has 380 companies and organizations as
members Is directed by Tim Berners-Lee
Located at MIT (US), Inria (France), Keiko(Japan)
W3C Players
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 56/61
SE 3316 © Jagath Samarabandu 56
W3C Players
Members ($50,000 per year)
Team Advisory board
Technical Architecture Group Working Groups
W3C Documents
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 57/61
SE 3316 © Jagath Samarabandu 57
W3C Documents
Working Drafts
Candidate Recommendations Proposed Recommendations
Recommendations Working Group Notes
Member Submissions
Staff Comments
Team Submissions
W3C Principles
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 58/61
SE 3316 © Jagath Samarabandu 58
W3C Principles
Consensus among members
Limited intellectual property rights Free Web access to technical reports (unlike
ISO)
WHATWG
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 59/61
SE 3316 © Jagath Samarabandu 59
WHATWG
Web Hypertext Application Technology WorkingGroup
Founded by individuals from Apple, Mozilla,Opera
Focuses on HTML5 standard. Differs slightly from W3C HTML5
Aims to have better involvement with browservendors- From WHATWG FAQ
Summary
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 60/61
SE 3316 © Jagath Samarabandu 60
Summary
History and structure of HTML, CSS,URL/URI/URN/IRI and Unicode
Survivor’s guides to these technologies
Limitations of HTML for general data (the
recipe collection example)
Essential Online Resources
7/18/2019 se3316a_2013_02_intro_html
http://slidepdf.com/reader/full/se3316a201302introhtml 61/61
SE 3316 © Jagath Samarabandu 61
Essential Online Resources
http://www.w3.org/TR/html5/
http://www.w3.org/Addressing/ http://www.w3.org/Style/CSS/
http://validator.w3.org/ http://www.w3.org/
http://www.whatwg.org/