XML and the Semi-Structured Data Model
Transcript of XML and the Semi-Structured Data Model
1
XML and the Semi-Structured Data Model
2
Motivation
• We have seen that relational databases are very convenient to query. However:
  – There is a LOT of data that is not in relational databases!
• Perhaps the most widely accessed database is the web, and it certainly isn’t a relational database.
3
Documents Vs. Databases
Documents                        Databases
Paragraphs, Sentences            Tables, tuples
Easy for people to understand    Easy for computers to understand
Static                           Dynamic
4
Querying the Web
• The web can be queried using a search engine. However, we can’t ask questions like:
  – What is the weather in Zanzibar today?
  – What is the lowest price for which a Jaguar is sold on the web?
• Problems:
  – There are no facilities for asking complex questions, such as aggregation of data
  – Words have overloaded meanings (Jaguar)
5
Understanding the Web
• In order to query the web, we must be able to understand it.
• Two Computer Science Approaches:
  – Artificial Intelligence Approach
  – Database Approach
6
Artificial Intelligence Approach
“The web is unstructured and we must deal with it”
• Use techniques for machine learning to understand the web.
• Example: To understand the word “Jaguar”, check whether it appears on a page with the words “car” or “automobile”, or rather with “jungle” and “rainforest”
• Problem: Such techniques tend to be inexact and have a large percentage of mistakes
7
Database Approach
“The web is unstructured and we will structure it”
• Sometimes problems that are very difficult can be solved easily by enforcing a standard
• Encourage the use of XML as a standard for data exchange on the web
8
Example XML Document
<?xml version="1.0"?>
<transaction>
  <account>89-344</account>
  <buy shares = "100">
    <ticker exch = "NASDAQ">WEBM</ticker>
  </buy>
  <sell shares = "30">
    <ticker exch = "NYSE">GE</ticker>
  </sell>
</transaction>
Labels in the slide figure: Opening Tag, Attribute Name, Attribute Value, Element, Closing Tag
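The parts labeled on this slide can be picked out with any XML parser. Below is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name describe_orders is made up for this illustration and is not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The transaction document from the slide, written with plain ASCII quotes.
DOC = """<?xml version="1.0"?>
<transaction>
  <account>89-344</account>
  <buy shares="100">
    <ticker exch="NASDAQ">WEBM</ticker>
  </buy>
  <sell shares="30">
    <ticker exch="NYSE">GE</ticker>
  </sell>
</transaction>"""


def describe_orders(root):
    """Return (action, shares, exchange, symbol) for each buy/sell element."""
    orders = []
    for order in root:
        if order.tag in ("buy", "sell"):
            ticker = order.find("ticker")
            orders.append(
                (order.tag, int(order.get("shares")), ticker.get("exch"), ticker.text)
            )
    return orders


root = ET.fromstring(DOC)
print(root.tag)                   # element name of the document element
print(root.find("account").text)  # text content of the account element
print(describe_orders(root))
```

Here order.tag is the element name, order.get("shares") reads an attribute value by attribute name, and ticker.text is the text between the opening and closing tags.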
9
XML Representation of a Table
<?xml version="1.0"?>
<ROWSET>
  <ROW num = "1">
    <ENAME>KING</ENAME>
    <SAL>5000</SAL>
  </ROW>
  <ROW num = "2">
    <ENAME>SCOTT</ENAME>
    <SAL>3000</SAL>
  </ROW>
</ROWSET>

ENAME    SAL
KING     5000
SCOTT    3000
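The mapping from a table to this ROWSET/ROW layout is mechanical. The sketch below shows one way to produce it in Python; the function name rows_to_rowset is an assumption made for this example, and real tools may differ in details such as whitespace.

```python
import xml.etree.ElementTree as ET


def rows_to_rowset(columns, rows):
    """Build a ROWSET element: one ROW per tuple, one child element per column."""
    rowset = ET.Element("ROWSET")
    for num, row in enumerate(rows, start=1):
        row_el = ET.SubElement(rowset, "ROW", num=str(num))
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return rowset


table = rows_to_rowset(["ENAME", "SAL"], [("KING", 5000), ("SCOTT", 3000)])
print(ET.tostring(table, encoding="unicode"))
```

Each tuple becomes a ROW element whose num attribute is its position, and each column value becomes the text of an element named after the column.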
10
Very Unstructured XML
<?xml version="1.0"?>
<DamageReport>
The insured’s <Vehicle Make = "Volks"> Beetle </Vehicle> broke through the guard rail and plummeted into the ravine. The cause was determined to be <Cause>faulty brakes </Cause>. Amazingly there were no casualties.
</DamageReport>
11
XML Vs. HTML
• XML and HTML are brothers. They are both special cases of SGML.
• HTML has specific tag and attribute names. These are associated with a specific meaning
• XML can have any tag and attribute name. These are not associated with any meaning
• HTML is used to specify visual style
• XML is used to specify meaning
12
Rules for Creating XML Documents
13
Rule 1 – XML Declaration
• An XML document should begin with an XML declaration. A simple declaration is:
<?xml version="1.0"?>
Other things can be specified, such as the character encoding.
14
Rule 2 – Document Element
• Use exactly one top-level document element.
Legal example:
<?xml version="1.0"?>
<Question> This is legal </Question>
Illegal example (two top-level elements):
<?xml version="1.0"?>
<Question> Is this legal? </Question>
<Answer> No. </Answer>
15
Rule 3 – Match Opening and Closing Tags
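• Every opening tag must have a matching closing tag, and elements must be properly nested. XML is case sensitive, so <Question> and </QUESTION> do not match. The following examples are all illegal:
Example:
<Question> This is illegal </QUESTION>
<Question> <B> Is this legal? </Question> </B>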
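Rules 2 and 3 can be checked mechanically, because a conforming XML parser rejects any document that breaks them. The following sketch uses Python's standard parser; is_well_formed is a helper name chosen for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


examples = [
    "<Question> This is legal </Question>",
    # Rule 2: two top-level elements
    "<Question> Is this legal? </Question><Answer> No. </Answer>",
    # Rule 3: the case of the closing tag does not match
    "<Question> This is illegal </QUESTION>",
    # Rule 3: elements are not properly nested
    "<Question> <B> Is this legal? </Question> </B>",
]

for example in examples:
    print(is_well_formed(example), example)
```

Only the first example is accepted; the other three raise a parse error.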
16
Rule 4 – Comments
• Comments are between <!-- and --> characters. Comments can’t appear as attribute values or within a tag.
Example:
<!-- This is a legal comment -->
<Question <!-- This is illegal -->>
Why is this illegal
<!-- This is a legal comment -->
</Question>
17
Rule 5 – Element Names
• Element and attribute names must be continuous sequences of letters, digits, hyphens, underscores or periods. A name must start with a letter or an underscore, and it cannot contain spaces.
Example:
Legal Names:
<_legal> <This-is-OK>
Illegal Names:
<2-Part-Question> <Two Part Question>
<Question 4You = "Yes">
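The naming rule can be approximated with a regular expression. This is a deliberately simplified, ASCII-only sketch: the real XML grammar also admits many non-ASCII letters and the colon (used for namespace prefixes). The name is_legal_name is an assumption of this example.

```python
import re

# Simplified rule: start with a letter or underscore, then letters,
# digits, underscores, periods or hyphens. No spaces anywhere.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_legal_name(name):
    """Return True if the whole string matches the simplified name rule."""
    return NAME.fullmatch(name) is not None


for candidate in ["_legal", "This-is-OK", "2-Part-Question", "Two Part Question", "4You"]:
    print(candidate, is_legal_name(candidate))
```

The first two names pass; the others fail because they start with a digit or contain a space.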
18
Rule 6 – Attribute Values
• Attribute values
  – go in opening tags.
  – should be enclosed by matching quotes (' or ")
  – should have only text and not tags
Legal Example:
<Question Poster = "Yitzchak">Do you like XML? </Question>
<Answer Poster = 'Yaakov'>I do.</Answer>
19
Rule 6 – Continued
Illegal Examples:
<Question Poster = "Yitzchak'>Do you like XML? </Question>
<Question>Do you like XML? </Question Poster = "Yitzchak">
<Question Poster = "<first>Yitzchak</first>">Do you like XML? </Question>
20
Rule 7 – Empty Elements
• Empty elements are elements that do not contain text or nested elements. They can be written in a compact syntax:
<Person First = "Shmuel" Last = "Levy"></Person>
is the same as
<Person First = "Shmuel" Last = "Levy" />
21
Abstract View of XML
22
A Different Data Model
                  Relational        Semi-Structured
Abstract Model    Sets of tuples    Labeled Directed Graph
Concrete Model    Tables            XML Documents
Standard for      Storing Data      Data Exchange
23
An Example
<?xml version="1.0"?>
<transaction>
  <account>89-344</account>
  <buy shares = "100">
    <ticker exch = "NASDAQ">WEBM</ticker>
  </buy>
  <sell shares = "30">
    <ticker exch = "NYSE">GE</ticker>
  </sell>
</transaction>
24
Corresponding Tree
transaction
  account
    89-344
  buy
    shares
      100
    ticker
      exch
        NASDAQ
      WEBM
  sell
    shares
      30
    ticker
      exch
        NYSE
      GE
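One way to see the labeled tree is to list every path of labels from the root down to a value. The sketch below does this for the transaction document; the "@" prefix for attribute edges and the function name leaves are conventions chosen for this example only.

```python
import xml.etree.ElementTree as ET

DOC = """<transaction>
  <account>89-344</account>
  <buy shares="100"><ticker exch="NASDAQ">WEBM</ticker></buy>
  <sell shares="30"><ticker exch="NYSE">GE</ticker></sell>
</transaction>"""


def leaves(elem, path=""):
    """Return (label path, value) pairs for all attributes and text values."""
    here = f"{path}/{elem.tag}" if path else elem.tag
    out = []
    for name, value in elem.attrib.items():
        out.append((f"{here}/@{name}", value))   # attribute edge
    text = (elem.text or "").strip()
    if text:
        out.append((here, text))                 # text value of this node
    for child in elem:
        out.extend(leaves(child, here))          # nested element edges
    return out


for label_path, value in leaves(ET.fromstring(DOC)):
    print(label_path, "->", value)
```

Element names and attribute names act as edge labels, and the text and attribute values are the leaves of the tree.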
25
Using XML
• Querying XML: There are query languages that query XML and return XML. Examples: XQuery, XPath, SQL4X
• Displaying XML: An XML document can have an associated style-sheet which specifies how the document should be translated to HTML. Examples: CSS, XSL
26
Namespaces
• Namespaces are used to attach an accepted meaning to a set of tags.
• Syntax for defining a namespace:
<SomeElement xmlns:prefixname="namespaceURL">
The namespace will be recognized within the SomeElement element.
27
Example Namespace
<irs:Form id="1040" xmlns:irs="http://www.irs.gov">
  <irs:Name>Tina Wells</irs:Name>
  <PhoneNumber>03-5655666</PhoneNumber>
</irs:Form>
• In order for the namespace to be recognized in all elements, the declaration should be in the document element
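A namespace-aware parser treats a prefixed name as the pair (namespace URL, local name). The sketch below shows how Python's ElementTree exposes this, assuming the form example is closed with </irs:Form>; the NS dictionary is just a local prefix mapping used for lookups.

```python
import xml.etree.ElementTree as ET

FORM = (
    '<irs:Form id="1040" xmlns:irs="http://www.irs.gov">'
    "<irs:Name>Tina Wells</irs:Name>"
    "<PhoneNumber>03-5655666</PhoneNumber>"
    "</irs:Form>"
)

# Prefix mapping used only for searching; the prefix itself is arbitrary.
NS = {"irs": "http://www.irs.gov"}

form = ET.fromstring(FORM)
print(form.tag)                          # qualified name of the document element
print(form.find("irs:Name", NS).text)    # element inside the irs namespace
print(form.find("PhoneNumber").tag)      # element without a namespace
```

ElementTree writes a qualified name as {URL}localname, so Form and Name carry the IRS namespace while PhoneNumber, which has no prefix, does not.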
28
XSQL Pages
29
What are XSQL Pages?
• XSQL pages are XML documents that have SQL queries embedded in them.
• When a user requests to view an XSQL page, the web server:
  1. Dynamically computes the embedded queries
  2. Translates the query results into XML
  3. Inserts the results in the proper places in the document
  4. Transforms the result to HTML if a stylesheet is given
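Steps 1 to 3 above can be imitated in a few lines. The sketch below is not the Oracle implementation: it uses an in-memory SQLite database as a stand-in for the real connection, skips the stylesheet step, and the names run_query and expand, along with the sample Sailors row, are assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

XSQL_QUERY = "{urn:oracle-xsql}query"  # qualified name of <xsql:query>


def run_query(conn, sql):
    """Steps 1 and 2: run the query and wrap its rows in ROWSET/ROW elements."""
    cursor = conn.execute(sql)
    columns = [d[0].upper() for d in cursor.description]
    rowset = ET.Element("ROWSET")
    for num, row in enumerate(cursor, start=1):
        row_el = ET.SubElement(rowset, "ROW", num=str(num))
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return rowset


def expand(page, conn):
    """Step 3: replace every xsql:query element with its query result."""
    if page.tag == XSQL_QUERY:
        return run_query(conn, page.text)
    for parent in list(page.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == XSQL_QUERY:
                rowset = run_query(conn, child.text)
                rowset.tail = child.tail  # keep text that followed the query
                parent[i] = rowset
    return page


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT)")
conn.execute("INSERT INTO Sailors VALUES (13, 'Rusty')")

page = ET.fromstring(
    '<RESULTS xmlns:xsql="urn:oracle-xsql">Here is something interesting: '
    "<xsql:query>SELECT sname FROM Sailors WHERE sid = 13</xsql:query>"
    "</RESULTS>"
)
result = expand(page, conn)
print(ET.tostring(result, encoding="unicode"))
```

The surrounding XML is left untouched; only the query element is swapped for the rows it produced.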
30
A Simple Example
<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql">
  SELECT sname
  FROM Sailors
</xsql:query>
You should specify the connection and the namespace on the document element.
31
Page Seen in Browser
<?xml version="1.0"?>
<ROWSET>
  <ROW num = "1">
    <SNAME>Rusty</SNAME>
  </ROW>
  <ROW num = "2">
    <SNAME>Justin</SNAME>
  </ROW>
</ROWSET>
• A ROWSET element encloses query result
• Each ROW element encloses each row
• Each column in the row is within a tag with its column’s name
32
Another Example
<?xml version="1.0"?>
<RESULTS connection="scott" xmlns:xsql="urn:oracle-xsql">
Here is something interesting:
<xsql:query>
SELECT sname, age + rating as ra
FROM Sailors
WHERE sid = 13
</xsql:query>
</RESULTS>
33
Resulting Document
<?xml version="1.0"?>
<RESULTS>
  Here is something interesting:
  <ROWSET>
    <ROW num = "1">
      <SNAME>Rusty</SNAME>
      <RA>55</RA>
    </ROW>
  </ROWSET>
</RESULTS>
34
Using Parameters
• Your page can use parameters. The value of a parameter param is the first of the following that is available:
  1. The value of the URL parameter param, if supplied
  2. The value of the HTTP session object param, if supplied
  3. The value of the closest ancestor’s attribute named param, if present
  4. An empty string
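The lookup order above amounts to a chain of fallbacks. Here is a minimal sketch of that order; the function resolve_param and the way ancestors are represented (a list of attribute dictionaries, closest element first) are assumptions of this example, not the XSQL API.

```python
def resolve_param(name, url_params, session, ancestor_attrs):
    """Return the first available value for `name`, or "" if there is none.

    ancestor_attrs lists the attribute dictionaries of the enclosing
    elements, ordered from the closest one outward (hypothetical layout).
    """
    if name in url_params:          # 1. URL parameter
        return url_params[name]
    if name in session:             # 2. HTTP session object
        return session[name]
    for attrs in ancestor_attrs:    # 3. closest ancestor attribute
        if name in attrs:
            return attrs[name]
    return ""                       # 4. empty string


ancestors = [{"sname": "Joe"}, {"sname": "Outer", "connection": "scott"}]
print(resolve_param("sname", {"sname": "Jim"}, {}, ancestors))
print(resolve_param("sname", {}, {}, ancestors))
print(resolve_param("rating", {}, {}, ancestors))
```

A value given in the URL wins over everything else, and an attribute on an enclosing element serves as the default.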
35
Example with Parameters
<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
            sname = "Joe">
  SELECT *
  FROM Sailors
  WHERE sname = '{@sname}'
</xsql:query>
36
Evaluating the Query
• Suppose the XSQL document is at:
http://cs.huji.ac.il/~db/query1.xsql
• Then, requesting the URL:
http://cs.huji.ac.il/~db/query1.xsql?sname=Jim
will return all the details of Jim.
• Requesting
http://cs.huji.ac.il/~db/query1.xsql
will return all the details of Joe (the default value).
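The effect of the two requests can be reproduced with plain text substitution. The sketch below assumes that each {@name} reference is replaced textually before the query runs, and that URL values override the default taken from the sname attribute; substitute and query_for are names invented for this example.

```python
import re

# Matches references of the form {@name}.
PARAM = re.compile(r"\{@([A-Za-z_][A-Za-z0-9_-]*)\}")

QUERY = "SELECT * FROM Sailors WHERE sname = '{@sname}'"
DEFAULTS = {"sname": "Joe"}  # from the sname attribute on xsql:query


def substitute(template, values):
    """Replace each {@name} with its value, or with "" if it has none."""
    return PARAM.sub(lambda match: values.get(match.group(1), ""), template)


def query_for(url_params):
    """Build the SQL text for a request; URL values override the defaults."""
    return substitute(QUERY, {**DEFAULTS, **url_params})


print(query_for({"sname": "Jim"}))  # request with ?sname=Jim
print(query_for({}))                # request without parameters
```

Because the replacement is purely textual, the quotes around {@sname} in the query are what make the value a SQL string literal.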
37
A Strange Example
<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
            select = "*" where = "1=1" order="1">
SELECT {@select}
FROM {@from}
WHERE {@where}
ORDER BY {@order}
</xsql:query>
38
Customizing Results
• The query tag can have different attributes that customize the query results. Here are some of the important options:
  – max-rows: The maximum number of rows returned
  – skip-rows: The number of rows to skip before returning rows
  – rowset-element: The name of the rowset element
  – row-element: The name of the row element
39
Customizing Results
<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
            skip = "0" max-rows="2" skip-rows="{@skip}">
SELECT *
FROM Program
ORDER BY url
</xsql:query>
By calling the same page with different values for skip, we can see the different programs
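The combination of skip-rows and max-rows behaves like a window sliding over the ordered result. A small sketch of that behaviour follows, using a made-up list of program URLs in place of the Program table; page_of is a name chosen for this illustration.

```python
def page_of(rows, skip_rows=0, max_rows=None):
    """Skip the first skip_rows rows, then return at most max_rows rows."""
    rest = rows[skip_rows:]
    return rest if max_rows is None else rest[:max_rows]


# Hypothetical contents of the Program table, already ordered by url.
programs = ["a.html", "b.html", "c.html", "d.html", "e.html"]

for skip in (0, 2, 4):
    print(skip, page_of(programs, skip_rows=skip, max_rows=2))
```

Requesting the page with skip=0, skip=2 and skip=4 walks through the programs two at a time.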
40
Notes
• An XSQL document can have many queries.
• The queries can appear within arbitrary XML tags
• We can produce XML that has a more nested structure using the CURSOR function...
41
Remembering Subqueries in the SELECT Clause
• Subqueries in the SELECT clause must return a single value. What do we do if we want, for each boat, all the sailors who reserved that boat?
• We want each bid to be associated with a table of Sailors data!
42
Using the CURSOR Function
<?xml version="1.0"?>
<xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql">
  SELECT bid,
         CURSOR(SELECT S.sid, sname
                FROM Sailors S, Reserves R
                WHERE S.sid = R.sid AND R.bid = B.bid) AS Reservers
  FROM Boats B
</xsql:query>
43
<?xml version="1.0"?>
<ROWSET>
  <ROW num = "1">
    <BID>113</BID>
    <RESERVERS>
      <RESERVERS_ROW num = "1">
        <SID> 13 </SID>
        <SNAME> Joe </SNAME>
      </RESERVERS_ROW>
      <RESERVERS_ROW num = "2">
        ....
      </RESERVERS_ROW>
    </RESERVERS>
  </ROW>
</ROWSET>
Note that the alias of the nested query (RESERVERS) is used for the inner tags, in place of ROWSET and ROW.
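The nesting can be pictured as grouping the reservers under their boat. The sketch below builds the same shape from in-memory data; the function nested_rowset and the second sailor (14, Ann) are invented for this illustration, and the real output is produced by the database, not by code like this.

```python
import xml.etree.ElementTree as ET


def nested_rowset(boats, reservers_by_bid, alias="RESERVERS"):
    """Build ROWSET/ROW elements with one nested rowset per boat."""
    rowset = ET.Element("ROWSET")
    for num, bid in enumerate(boats, start=1):
        row = ET.SubElement(rowset, "ROW", num=str(num))
        ET.SubElement(row, "BID").text = str(bid)
        nested = ET.SubElement(row, alias)  # named after the query alias
        sailors = reservers_by_bid.get(bid, [])
        for inner_num, (sid, sname) in enumerate(sailors, start=1):
            inner = ET.SubElement(nested, alias + "_ROW", num=str(inner_num))
            ET.SubElement(inner, "SID").text = str(sid)
            ET.SubElement(inner, "SNAME").text = sname
    return rowset


doc = nested_rowset([113], {113: [(13, "Joe"), (14, "Ann")]})
print(ET.tostring(doc, encoding="unicode"))
```

Each boat gets its own RESERVERS element, so a single row can carry a whole table of sailors.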
44
Setting Page Level Parameters
• The following statement defines a parameter pname. The value of pname is the value in the first column of the first row of the query
• The variable pname will be recognized in the page
<xsql:set-page-param name="pname">
SELECT Statement
</xsql:set-page-param>
45
Example
<?xml version="1.0"?>
<page connection="scott" xmlns:xsql="urn:oracle-xsql">
  <xsql:set-page-param name="num-stories">
    SELECT headings_num
    FROM user_prefs
    WHERE userid = {@user}
  </xsql:set-page-param>
  <xsql:query max-rows="{@num-stories}">
    SELECT title, url
    FROM latest_news
  </xsql:query>
</page>
46
Another Way to Define a Page Level Parameter
• Page level parameters can also be set with the statement:
<xsql:set-page-param name="pname" value="val"/>
• For example:
<xsql:set-page-param name="num-stories" value="10"/>
47
Additional Options
• The set-page-param element can have the following attributes:
  – only-if-unset: If the value is “yes” then the parameter will be set only if it has no value
  – ignore-empty-value: If the value is “yes” then the parameter will be set only if its value will not be an empty string
48
Setting Cookie Values
• The following statement defines a parameter pname. The value of pname is the value in the first column of the first row of the query
• The variable pname will be recognized until the cookie expires
<xsql:set-cookie name="pname">
  SELECT Statement
</xsql:set-cookie>
49
Additional Attributes for Set-Cookie
• The set-cookie element can have the following attributes:
  – max-age: The number of seconds before the cookie expires (defaults to expire when user exits current browser instance)
  – only-if-unset
  – ignore-empty-value
50
Example
<?xml version="1.0"?>
<page connection="scott" xmlns:xsql="urn:oracle-xsql">
  <xsql:set-cookie name="siteuser" max-age="31536000"
                   only-if-unset="yes" ignore-empty-value="yes">
    SELECT username
    FROM site_users
    WHERE username = '{@username}' AND password = '{@password}'
  </xsql:set-cookie>
  <!-- Other Actions Here -->
</page>
51
DML or PL/SQL
• We can do DML (update, insert, delete) or call PL/SQL procedures with the following basic syntax:
<xsql:dml>
  DML Statement
</xsql:dml>
or
<xsql:dml>
  BEGIN
    Any valid PL/SQL Statement
  END;
</xsql:dml>
52
Example
<xsql:dml>
  INSERT INTO page_requests_log(page, userid)
  VALUES ('page12.xsql', '{@siteuser}')
</xsql:dml>
If successful, the following element is added to the page:
<xsql-status action="xsql:dml" rows="n" />
Otherwise, an error element is added:
<xsql-error action="xsql:dml"> ... </xsql-error>