Efficiently Publishing Relational Data as XML Documents

Jayavel Shanmugasundaram

University of Wisconsin-Madison / IBM Almaden Research Center

Joint work with: Rimon Barr, Michael Carey, Bruce Lindsay, Hamid Pirahesh, Berthold Reinwald, Eugene Shekita

Outline

• Why?

• How?

• Which?

• Hence

XML Example

<department name="Purchasing">
  <emplist>
    <employee> John </employee>
    <employee> Mary </employee>
  </emplist>
  <projlist>
    <project> Internet </project>
    <project> Recycling </project>
  </projlist>
</department>

What is the big deal about XML?

• Elegantly models complex, hierarchical/ graph-structured data

• Domain-specific tags (unlike HTML)

• Simple!

Fast emerging as dominant standard for data exchange on the WWW

Why Relational Data?

• Most business data stored in relational databases

• Unlikely to change in the near future
  – Scalability, Reliability, Performance, Tools

Need efficient means to publish relational data as XML documents

Usage Scenario

• Application/User sends a query to produce XML documents over the Internet to the existing database system (RDBMS)

• The XML result comes back and is processed or displayed in a browser

Example Relational Schema

Department

DeptId  DeptName
10      Purchasing

Project

ProjId  DeptId  ProjName
888     10      Internet
795     10      Recycling

Employee

EmpId  DeptId  EmpName
101    10      John
91     10      Mary

Salary

XML Representation

<department name="Purchasing">
  <emplist>
    <employee> John </employee>
    <employee> Mary </employee>
  </emplist>
  <projlist>
    <project> Internet </project>
    <project> Recycling </project>
  </projlist>
</department>

Main Issues

• Relational data is flat, XML is a tagged graph

• How do we specify translation from flat model to a graph model?
  – A query language to map from relations to XML

• How do we transform flat representations to tagged nested representations?
  – Efficient implementation strategies

Outline

• Why?

• How?
  – Language?
  – Mechanism?

• Which?

• Hence

Transformation Languages

• Two obvious choices:
  – XML Query Language
  – SQL

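XMLQL: Default XML View

<defaultview>
  <department>
    <row> <deptid>10</> <deptname>Purchasing</> </row>
  </department>
  <employee>
    <row> <empid>101</> <deptid>10</> <empname>John</> </row>
    <row> <empid>91</> <deptid>10</> <empname>Mary</> </row>
  </employee>
  <project>
    <row> <projid>888</> <deptid>10</> <projname>Internet</> </row>
    <row> <projid>795</> <deptid>10</> <projname>Recycling</> </row>
  </project>
</defaultview>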

XMLQL: Query Over Default View

WHERE <defaultview.department.row>
        <deptid> $did </> <deptname> $dname </>
      </> IN DefaultView
CONSTRUCT <department name=$dname>
            <emplist>
              { WHERE <defaultview.employee.row>
                        <deptid> $did </> <empname> $ename </>
                      </> IN DefaultView
                CONSTRUCT <employee> $ename </> }
            </emplist>
            <projlist>
              { WHERE <defaultview.project.row>
                        <deptid> $did </> <projname> $pname </>
                      </> IN DefaultView
                CONSTRUCT <project> $pname </> }
            </projlist>
          </>

XMLQL: Query Result

<department name="Purchasing">
  <emplist>
    <employee> John </employee>
    <employee> Mary </employee>
  </emplist>
  <projlist>
    <project> Internet </project>
    <project> Recycling </project>
  </projlist>
</department>

XMLQL: Pros and Cons

• Pros:
  – Natural for XML users
  – Infrastructure to build hierarchies of XML views
  – One query language for XML and relational data

• Cons:
  – Ignores existing API (JDBC), tools, support
  – Need to mature new query language (aggregates etc.)

SQL: Key Ideas

• Sub-queries to specify nesting

• Scalar functions to specify tags/attributes
  – XML Constructors

• Aggregate functions to group child elements

SQL: Query to publish XML

Select DEPT(d.name,
            <subquery to produce emplist>,
            <subquery to produce projlist>
)
From Department d

SQL: XML Constructor

Define XML Constructor DEPT(dname: varchar(20), emplist: xml, projlist: xml) As (
  <department name=$dname>
    <emplist> $emplist </emplist>
    <projlist> $projlist </projlist>
  </department>
)

Select DEPT(d.name,
            (Select XMLAGG(EMP(e.name))
             From Employee e
             Where e.deptno = d.deptno),
            <subquery to produce projlist>
)
From Department d

SQL: XML Constructor

Define XML Constructor EMP(ename: varchar(20)) As (
  <employee> <name> $ename </name> </employee>
)

Select DEPT(d.name,
            (Select XMLAGG(EMP(e.name))
             From Employee e
             Where e.deptno = d.deptno),
            (Select XMLAGG(PROJ(p.name))
             From Project p
             Where p.deptno = d.deptno)
)
From Department d

Query Result

<department name="Purchasing">
  <emplist>
    <employee> John </employee>
    <employee> Mary </employee>
  </emplist>
  <projlist>
    <project> Internet </project>
    <project> Recycling </project>
  </projlist>
</department>

SQL: Pros and Cons

• Pros:
  – Reuses SQL infrastructure/API
  – Natural for SQL users
  – Efficient execution inside relational engine

• Cons:
  – Limited support for XML View Composition

Outline

• Why?

• Which?

• Hence

Relations to XML: Issues

• Two main differences:
  – Nesting (structuring)
  – Tagging

• Space of alternatives:
  – Tagging: early or late
  – Structuring: early or late
  – Each can be done inside or outside the engine

Stored Procedure Approach

• Issue queries for sub-structures and tag them

• Could be a Stored Procedure

• Example: a query on Department in the DBMS engine returns (10, Purchasing); further queries on Employee return (John), (Mary) and on Project return (Internet), (Recycling)

• Problem: Too many SQL queries!

Early Tagging, Early Structuring, Outside Engine
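The query pattern behind this approach can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions: the in-memory tables mirror the example schema, and `make_db` and `publish_departments` are hypothetical helper names, not part of the original system. The point is the query count, which grows with the number of departments.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def make_db():
    """Build an in-memory copy of the example schema (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John'), (91, 10, 'Mary');
        INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
    """)
    return conn


def publish_departments(conn):
    """Tag outside the engine; return (xml_text, number_of_queries_issued)."""
    parts = []
    depts = conn.execute(
        "SELECT DeptId, DeptName FROM Department ORDER BY DeptId").fetchall()
    queries = 1
    for dept_id, dept_name in depts:
        parts.append(f"<department name={quoteattr(dept_name)}>")
        # One extra query per department for its employees ...
        parts.append("<emplist>")
        for (name,) in conn.execute(
                "SELECT EmpName FROM Employee WHERE DeptId = ? ORDER BY EmpName",
                (dept_id,)):
            parts.append(f"<employee>{escape(name)}</employee>")
        parts.append("</emplist>")
        queries += 1
        # ... and one more for its projects.
        parts.append("<projlist>")
        for (name,) in conn.execute(
                "SELECT ProjName FROM Project WHERE DeptId = ? ORDER BY ProjName",
                (dept_id,)):
            parts.append(f"<project>{escape(name)}</project>")
        parts.append("</projlist>")
        parts.append("</department>")
        queries += 1
    return "".join(parts), queries


if __name__ == "__main__":
    xml_text, issued = publish_departments(make_db())
    print(issued, "queries issued")
    print(xml_text)
```

With one department the sketch issues three queries; with N departments it issues 1 + 2N, which is the problem the slide points at.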

Correlated CLOB Approach

• Problem: Correlated execution of sub-queries

Select DEPT(d.name,
            (Select XMLAGG(EMP(e.name))
             From Employee e
             Where e.deptno = d.deptno),
            (Select XMLAGG(PROJ(p.name))
             From Project p
             Where p.deptno = d.deptno)
)
From Department d

Early Tagging, Early Structuring, Inside Engine
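A rough way to try the correlated shape of this query on an ordinary engine is to emulate it in SQLite, driven here from Python. This is a sketch under assumptions, not the deck's implementation: string concatenation stands in for the XML constructors, SQLite's `group_concat` stands in for XMLAGG, plain text stands in for CLOBs, values are not XML-escaped, and `make_db` and `publish_inside_engine` are hypothetical names.

```python
import sqlite3


def make_db():
    """Build an in-memory copy of the example schema (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John'), (91, 10, 'Mary');
        INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
    """)
    return conn


# Each scalar sub-query is correlated on d.DeptId, so the engine
# re-evaluates it once per department row.
CORRELATED_QUERY = """
SELECT '<department name="' || d.DeptName || '">'
       || '<emplist>'
       || COALESCE((SELECT group_concat('<employee>' || e.EmpName || '</employee>', '')
                    FROM Employee e WHERE e.DeptId = d.DeptId), '')
       || '</emplist>'
       || '<projlist>'
       || COALESCE((SELECT group_concat('<project>' || p.ProjName || '</project>', '')
                    FROM Project p WHERE p.DeptId = d.DeptId), '')
       || '</projlist>'
       || '</department>'
FROM Department d
ORDER BY d.DeptId
"""


def publish_inside_engine(conn):
    """Return one tagged text fragment per department, built by the engine."""
    return [row[0] for row in conn.execute(CORRELATED_QUERY)]


if __name__ == "__main__":
    for fragment in publish_inside_engine(make_db()):
        print(fragment)
```

The order in which `group_concat` concatenates siblings is not guaranteed, so the sketch makes no promise about employee or project order within a list.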

De-Correlated CLOB Approach

• Problem: CLOBs during processing

-- One grouped sub-result per child list, then a join on the department
With EmpStruct (deptname, empinfo) AS (
  Select d.deptname,
         XMLAGG(EMP(e.empname))
  From department d left join employee e
    on d.deptid = e.deptid
  Group By d.deptname),
ProjStruct (deptname, projinfo) AS (
  Select d.deptname,
         XMLAGG(PROJ(p.projname))
  From department d left join project p
    on d.deptid = p.deptid
  Group By d.deptname)
Select DEPT(d1.deptname, d1.empinfo, d2.projinfo)
From EmpStruct d1 full join ProjStruct d2
  on d1.deptname = d2.deptname

Early Tagging, Early Structuring, Inside Engine

Late Tagging, Late Structuring

• XML document content produced without structure (in arbitrary order)

• Tagger enforces order as final step

• Pipeline: Relational Query Processing → Unstructured content → Tagging → Result XML Document

Redundant Relation Approach

• How do we represent nested content as relations?

• Source tuples: (10, Purchasing); (10, Internet), (10, Recycling); (10, John), (10, Mary)

• Joined result:

(Purchasing, John, Internet)
(Purchasing, John, Recycling)
(Purchasing, Mary, Internet)
(Purchasing, Mary, Recycling)

• Problem: Large relation due to data redundancy!

Late Tagging, Late Structuring
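The blow-up is easy to reproduce. The sketch below, which assumes an in-memory SQLite copy of the example schema and uses the hypothetical helper names `make_db` and `redundant_relation`, joins each department with both of its child tables at once, so every employee is paired with every project.

```python
import sqlite3


def make_db():
    """Build an in-memory copy of the example schema (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John'), (91, 10, 'Mary');
        INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
    """)
    return conn


def redundant_relation(conn):
    """Join the parent with both child tables in a single relation."""
    return conn.execute("""
        SELECT d.DeptName, e.EmpName, p.ProjName
        FROM Department d
        LEFT JOIN Employee e ON e.DeptId = d.DeptId
        LEFT JOIN Project p ON p.DeptId = d.DeptId
        ORDER BY e.EmpName, p.ProjName
    """).fetchall()


if __name__ == "__main__":
    for row in redundant_relation(make_db()):
        print(row)
```

Two employees and two projects already give four rows for a single department; in general the row count per department is the product of its child counts rather than their sum.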

Outer Union Approach

• How do we represent nested content as relations?

• Department is the parent of Employee and Project

• Separate results: (10, Purchasing); (Purchasing, John), (Purchasing, Mary); (Purchasing, Internet), (Purchasing, Recycling)

• Outer union, with a trailing type column (1 = employee, 0 = project):

(Purchasing, null, Internet, 0)
(Purchasing, null, Recycling, 0)
(Purchasing, John, null, 1)
(Purchasing, Mary, null, 1)

• Problem: Wide tuples (having many columns)

Hash-based Tagger

• Results not structured early
  – In arbitrary order

• Tagger has to enforce order during tagging
  – Hash-based approach

• Inside/Outside engine tagger

• Problem: Requires memory for entire document
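A minimal sketch of such a tagger, assuming outer-union tuples of the form (department, employee, project, type) with type 1 for employee rows and 0 for project rows, as in the outer union example. The function name `hash_tag` is hypothetical. Rows may arrive in any order, so the tagger groups them in a hash table keyed on the department and can only emit output once everything has been read.

```python
from xml.sax.saxutils import escape, quoteattr


def hash_tag(rows):
    """Group unsorted outer-union rows by department, then emit XML."""
    # Hash table: department name -> its child lists. The whole document
    # sits in this table until the input is exhausted.
    depts = {}
    for dept, emp, proj, kind in rows:
        entry = depts.setdefault(dept, {"emp": [], "proj": []})
        if kind == 1:
            entry["emp"].append(emp)
        elif kind == 0:
            entry["proj"].append(proj)

    out = []
    for dept, entry in depts.items():
        out.append(f"<department name={quoteattr(dept)}>")
        out.append("<emplist>")
        out.extend(f"<employee>{escape(e)}</employee>" for e in entry["emp"])
        out.append("</emplist>")
        out.append("<projlist>")
        out.extend(f"<project>{escape(p)}</project>" for p in entry["proj"])
        out.append("</projlist>")
        out.append("</department>")
    return "".join(out)


if __name__ == "__main__":
    unsorted_rows = [
        ("Purchasing", None, "Recycling", 0),
        ("Purchasing", "John", None, 1),
        ("Purchasing", None, "Internet", 0),
        ("Purchasing", "Mary", None, 1),
    ]
    print(hash_tag(unsorted_rows))
```

Siblings come out in arrival order in this sketch. The dictionary is what makes memory use proportional to the document size.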

Late Tagging, Early Structuring

• Structured XML document content produced

• Tagger just adds tags (constant space)

• Pipeline: Relational Query Processing → Structured content → Tagging → Result XML Document

Sorted Outer Union Approach

Tuples over columns A through G (n = null):

A B n D n n n
A B n n E n n
A n C n n F n
A n C n n n G

Sort By: Aid, Bid, Cid

• Problem: Only partial ordering required

Late Tagging, Early Structuring
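For the department example, a sorted outer union can be sketched in SQLite from Python as follows. The column layout (DeptId, DeptName, EmpName, ProjName, Kind) and the Kind codes 0, 1, 2 are assumptions made for this illustration and differ from the type column shown in the outer union example; `make_db` and `sorted_outer_union` are hypothetical names.

```python
import sqlite3


def make_db():
    """Build an in-memory copy of the example schema (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Department (DeptId INTEGER, DeptName TEXT);
        CREATE TABLE Employee (EmpId INTEGER, DeptId INTEGER, EmpName TEXT);
        CREATE TABLE Project (ProjId INTEGER, DeptId INTEGER, ProjName TEXT);
        INSERT INTO Department VALUES (10, 'Purchasing');
        INSERT INTO Employee VALUES (101, 10, 'John'), (91, 10, 'Mary');
        INSERT INTO Project VALUES (888, 10, 'Internet'), (795, 10, 'Recycling');
    """)
    return conn


# One branch per table, padded with NULLs to a common width.
# Sorting on (DeptId, Kind) puts each department row first, followed by
# its employees and then its projects. Columns 3 and 4 are added to the
# sort key only to make the output of this sketch deterministic.
SORTED_OUTER_UNION = """
SELECT DeptId, DeptName, NULL AS EmpName, NULL AS ProjName, 0 AS Kind
FROM Department
UNION ALL
SELECT DeptId, NULL, EmpName, NULL, 1 FROM Employee
UNION ALL
SELECT DeptId, NULL, NULL, ProjName, 2 FROM Project
ORDER BY 1, 5, 3, 4
"""


def sorted_outer_union(conn):
    """Return the structured (parent-before-children) row stream."""
    return conn.execute(SORTED_OUTER_UNION).fetchall()


if __name__ == "__main__":
    for row in sorted_outer_union(make_db()):
        print(row)
```

The sort only needs to keep each parent ahead of its children and keep siblings of one list together; the total order imposed here is stronger than that, which is the cost the slide's problem bullet refers to.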

Constant Space Tagger

• Detects changes in XML document hierarchy

• Adds appropriate opening/closing tags

• Inside/outside engine
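The idea can be sketched as a streaming tagger over rows that already arrive in document order. The row layout (DeptId, DeptName, EmpName, ProjName, Kind), with Kind 0 for a department row, 1 for an employee and 2 for a project, is an assumption of this illustration, and `tag_sorted` is a hypothetical name. The tagger remembers only which department and which list are currently open.

```python
from xml.sax.saxutils import escape, quoteattr

# Kind code -> (list tag, item tag). Assumed encoding for this sketch.
LIST_TAGS = {1: ("emplist", "employee"), 2: ("projlist", "project")}


def tag_sorted(rows):
    """Yield XML fragments for sorted rows using constant extra state."""
    dept_open = False   # is a <department> element currently open?
    open_list = None    # Kind code of the currently open list, if any
    for dept_id, dept_name, emp, proj, kind in rows:
        if kind == 0:
            # A department row marks a change at the top of the hierarchy:
            # close whatever is still open, then open the new element.
            if open_list is not None:
                yield f"</{LIST_TAGS[open_list][0]}>"
                open_list = None
            if dept_open:
                yield "</department>"
            yield f"<department name={quoteattr(dept_name)}>"
            dept_open = True
            continue
        list_tag, item_tag = LIST_TAGS[kind]
        if kind != open_list:
            # The child type changed: switch from one list to the next.
            if open_list is not None:
                yield f"</{LIST_TAGS[open_list][0]}>"
            yield f"<{list_tag}>"
            open_list = kind
        value = emp if kind == 1 else proj
        yield f"<{item_tag}>{escape(value)}</{item_tag}>"
    if open_list is not None:
        yield f"</{LIST_TAGS[open_list][0]}>"
    if dept_open:
        yield "</department>"


if __name__ == "__main__":
    sorted_rows = [
        (10, "Purchasing", None, None, 0),
        (10, None, "John", None, 1),
        (10, None, "Mary", None, 1),
        (10, None, None, "Internet", 2),
        (10, None, None, "Recycling", 2),
    ]
    print("".join(tag_sorted(sorted_rows)))
```

As a simplification, a department with no rows of a given kind gets no list element for that kind. Because fragments are yielded as rows arrive, nothing beyond the two state variables is retained.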

Classification of Alternatives

• Early Tagging, Early Structuring
  – Inside engine: Correlated CLOB, De-Correlated CLOB
  – Outside engine: Stored Procedure

• Late Tagging, Early Structuring
  – Sorted Outer Union (tagging inside)
  – Sorted Outer Union (tagging outside)

• Late Tagging, Late Structuring
  – Unsorted Outer Union (tagging inside)
  – Unsorted Outer Union (tagging outside)

Outline

• Why?

• Which?

• Hence

Performance Evaluation

Figure: test tables arranged in levels: TABLE0; TABLE00, TABLE01; TABLE000, TABLE001, TABLE010, TABLE011. Labeled dimensions: Query Depth, Query Fan Out, Database Size.

Inside vs. Outside Engine

Figure: chart over Query Fan Out comparing Stored Proc, CLOB-Corr, CLOB-DeCorr, Redundant R, Unsorted OU (Out), Unsorted OU (In), Sorted OU (Out), Sorted OU (In).

Where Does Time Go?

Figure: time breakdown by component: XML File, Tagging, Bind Out, Execution.

Effect of Query Fan Out

Figure: Time (in sec) versus Query Fan Out for CLOB-Corr, CLOB-DeCorr, Unsorted OU, Sorted OU.

Effect of Query Depth

Figure: Time versus Query Depth for CLOB-Corr, CLOB-DeCorr, Unsorted OU, Sorted OU.

Memory Considerations

• Sorted outer union more robust

• Relational sort highly scalable!

Outline

• Why?

• Which?

• Hence

Conclusion

• Publishing XML from relational sources is important on the Internet

• Language alternatives:
  – SQL based
  – XML query language based

• Implementation alternatives:
  – Inside engine >> Outside engine
  – Unsorted Outer Union: sufficient main memory
  – Sorted Outer Union: otherwise
