Query Optimization Techniques and Performance Issues in XML and Parallel databases

Query Optimization Techniques and Performance Issues in XML and Parallel databases. CSE 8330. Instructor: Dr. Margaret H. Dunham. Presenter: Akshaya Aradhya.


Transcript of Query Optimization Techniques and Performance Issues in XML and Parallel databases

Page 1: Query Optimization Techniques and Performance Issues in XML and Parallel databases


Page 2: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Introduction
Query optimization in XML databases
Query optimization in Parallel databases
Comparison
Conclusion and Future work
Bibliography

Topics to be covered

Page 3: Query Optimization Techniques and Performance Issues in XML and Parallel databases

XML is an emerging standard for exchanging, storing, and representing data.

Data encoded in XML can be constrained by a DTD (Document Type Definition), to which a valid document conforms.

XML is intuitive, and its tree-like structure makes documents easy to interpret.

Introduction

Page 4: Query Optimization Techniques and Performance Issues in XML and Parallel databases

The XML data model is considerably more complex than the relational model, which yields a larger search space when optimizing XML queries.

Before transforming a query, we need to establish that the transformed query is equivalent to the original. This requires studying equivalence with respect to both the data and the query.

Introduction

Page 5: Query Optimization Techniques and Performance Issues in XML and Parallel databases

XML query optimization techniques can be divided into two groups: content-based and structure-based.

Content-based query optimization: based on statistics or classification.

Query execution can be improved by classifying the elements and then transforming the query using constraints obtained from the data.

Introduction

Page 6: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Parallel database systems are used in decision support systems and a wide range of modern database applications.

The machine architecture of parallel database systems is based on a parallel dataflow architecture, which makes use of a conventional, shared-nothing hardware design.

For each relation in the database, the tuples are declustered (partitioned) across disk storage units, each of which is attached to an individual processor.
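As a rough illustration of declustering, the sketch below spreads the tuples of one relation across a fixed number of nodes. The slide does not say which partitioning strategy is used, so hash partitioning on an integer key is an assumption here (round-robin and range partitioning are common alternatives). The names `decluster`, `employees`, and `num_nodes` are hypothetical.

```python
from collections import defaultdict


def decluster(tuples, key, num_nodes):
    """Assign each tuple to a node by hashing an integer key (assumed strategy)."""
    partitions = defaultdict(list)
    for row in tuples:
        # Modulo on the key plays the role of the hash function in this sketch.
        node = row[key] % num_nodes
        partitions[node].append(row)
    return dict(partitions)


# Hypothetical relation with eight tuples, spread over four nodes.
employees = [{"id": i, "dept": i % 3} for i in range(8)]
parts = decluster(employees, "id", 4)
```

Each node then holds only its own partition, so a scan of the relation can proceed on all nodes at once.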

Introduction

Page 7: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Parallelism demonstrates two properties that make it very desirable.

The first is linear scale-up: after the number of processors is increased by a factor of ‘k’, the system can perform a task ‘k’ times larger in the same span of time.

The second is linear speedup: for a task of fixed size, the response time is reduced by a factor of ‘k’ if the number of processors is increased by a factor of ‘k’.
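The two properties can be expressed as simple ratios of elapsed times. The sketch below uses the usual definitions: speedup compares the same task on a small and a large system, and scale-up compares a small task on a small system with a k-times larger task on a k-times larger system. The function names and the timings are made up for illustration.

```python
def speedup(time_on_small_system, time_on_large_system):
    """Same task on both systems; linear speedup means the result equals k."""
    return time_on_small_system / time_on_large_system


def scaleup(time_small_task_small_system, time_large_task_large_system):
    """Task and system both grow by k; linear scale-up means the result equals 1."""
    return time_small_task_small_system / time_large_task_large_system


# Hypothetical measurements with k = 4 processors.
s = speedup(120.0, 30.0)    # 4.0: linear speedup for k = 4
u = scaleup(120.0, 150.0)   # below 1: scale-up is sub-linear
```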

Introduction

Page 8: Query Optimization Techniques and Performance Issues in XML and Parallel databases

During query processing in parallel databases, parallelism can be exploited in three different ways.

In independent parallelism, different processors execute different query operators in parallel, provided the operators do not depend on each other.

In pipelined (inter-operator) parallelism, two or more operators in a producer-consumer relationship run concurrently, with the output of the producer passed on to the consumer as it is generated.

Finally, in intra-operator (partitioned) parallelism, copies of the same query operator run on multiple processors simultaneously, each operating on a different partition of the data.
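The sketch below illustrates two of these forms on toy data. Generators model the producer-consumer flow of a pipeline (rows reach the consumer one at a time, although a generator chain runs in a single thread), and a thread pool runs one copy of the same operator per partition. All names are hypothetical, and a real engine would use separate processors rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(rows):
    """Producer operator: emits rows one at a time."""
    for row in rows:
        yield row


def select(rows, predicate):
    """Consumer operator: filters rows as they arrive from the producer."""
    for row in rows:
        if predicate(row):
            yield row


def partitioned_count(partitions, predicate, workers=4):
    """Run the same scan-select pipeline on every partition, then combine."""
    def count_one(part):
        return sum(1 for _ in select(scan(part), predicate))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_one, partitions))


# Hypothetical data already declustered into three partitions.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
total = partitioned_count(partitions, lambda x: x % 2 == 0)
```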

Introduction

Page 9: Query Optimization Techniques and Performance Issues in XML and Parallel databases

The ToXin indexing scheme was developed to overcome the limitations of applying optimization to path query processing.

Its primary goal is to exploit the path structure of XML databases in all stages of query processing.

ToXin has two types of index structures: the value index and the path index.

Optimization mechanism using ToXin tree

Page 10: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Algorithm: ConstructIndexTree
Output: Tree T

ConstructIndexTree()
1. Perform a depth first traversal of the tree.
2. For each visited edge
   2.1 Check whether the corresponding index edge has been added
       2.1.1 For the current index edge of the XML element
             2.1.1.1 Update the instance function in two redundant hash tables representing forward and backward navigation tables
             2.1.1.2 Add the parent node and child node
   2.2 If it has been added already, skip to the next index edge
3. Stop
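A minimal Python sketch of the steps above, using the standard `xml.etree.ElementTree` parser. It is a simplified reading of the pseudocode, not ToXin's actual implementation. Index edges are identified by label paths, nodes are numbered in depth-first order, and the forward and backward tables are plain dictionaries. One assumption: the sketch records the parent-child pair for every visited edge, even when the index edge already exists, so that repeated elements stay navigable. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def construct_index_tree(root):
    """Return (index_edges, forward, backward) for a parsed XML tree."""
    index_edges = []              # distinct (parent path, child path) edges
    seen = set()
    forward = defaultdict(list)   # (edge, parent id) -> list of child ids
    backward = {}                 # (edge, child id) -> parent id
    counter = [0]                 # depth-first node numbering; the root is 0

    def visit(node, path, node_id):
        for child in node:
            counter[0] += 1
            child_id = counter[0]
            child_path = path + "/" + child.tag
            edge = (path, child_path)
            if edge not in seen:  # add the index edge only once
                seen.add(edge)
                index_edges.append(edge)
            forward[(edge, node_id)].append(child_id)
            backward[(edge, child_id)] = node_id
            visit(child, child_path, child_id)

    visit(root, "/" + root.tag, 0)
    return index_edges, dict(forward), backward


# Hypothetical document with a repeated element.
doc = ET.fromstring(
    "<lib><book><title/></book><book><title/><author/></book></lib>"
)
edges, fwd, bwd = construct_index_tree(doc)
```

The two tables hold the same pairs keyed in opposite directions, which is what makes both forward and backward navigation a single lookup.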

Optimization mechanism using ToXin tree

Page 11: Query Optimization Techniques and Performance Issues in XML and Parallel databases

An input query is divided into a set of subqueries, and each operation is evaluated separately as part of the query.

An effective execution order for these operations is obtained by generating evaluation plans for the full set of operations, which in turn helps execute the query faster.

The final result is obtained by joining the partial results together.

Optimization mechanism in Lore

Page 12: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Algorithm: PlanSelectionAlgorithm
Input: Input list (for the query)
Output: Plan P

PlanSelectionAlgorithm (input list)
1. Create a structure in order to track the binding variables
2. While input list is not empty
   2.1 For each element in the input list
       2.1.1 Based on the current bound variables, find the cheapest access method for the remaining steps
       2.1.2 If the step has the least cost, mark the variables as bound and add it to the plan P
       2.1.3 Remove the chosen step
3. Return the final plan P obtained from the previous steps
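The greedy loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not Lore's actual optimizer: each step is given two made-up costs, one for when the variables it needs are already bound and one for when they are not, and the `Step` class, `step_cost`, and the sample steps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    needs: frozenset        # variables that make the step cheap once bound
    binds: frozenset        # variables the step binds when executed
    cost_if_bound: int
    cost_if_unbound: int


def step_cost(step, bound):
    """Cheapest access method for a step, given the currently bound variables."""
    return step.cost_if_bound if step.needs <= bound else step.cost_if_unbound


def select_plan(steps):
    bound = set()           # structure tracking the bound variables
    plan = []
    remaining = list(steps)
    while remaining:
        cheapest = min(remaining, key=lambda s: step_cost(s, bound))
        bound |= cheapest.binds       # mark its variables as bound
        plan.append(cheapest.name)    # add the step to the plan
        remaining.remove(cheapest)    # remove the chosen step
    return plan


# Hypothetical steps, listed in an unfavourable order.
steps = [
    Step("lookup_titles", frozenset({"b"}), frozenset({"t"}), 1, 50),
    Step("lookup_books", frozenset({"a"}), frozenset({"b"}), 2, 100),
    Step("scan_authors", frozenset(), frozenset({"a"}), 10, 10),
]
plan = select_plan(steps)
```

Because each choice changes which variables are bound, the cost of every remaining step is re-evaluated on each pass through the loop.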

Optimization mechanism in Lore

Page 13: Query Optimization Techniques and Performance Issues in XML and Parallel databases

PAT algebra is a set-oriented algebra that defines a series of set operations and rules.

Input queries are first checked for syntactic correctness and then transformed into PAT expressions.

Based on the relationships among elements in the DTD, the PAT expressions are normalized using the PAT algebra to obtain a new query.

Optimizing queries in XML structured document databases

Page 14: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Query optimization based on Schema

Page 15: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Query optimization by pruning and rewriting queries

Page 16: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Query optimization by classification of elements

Page 17: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Join Strategy Selection

Page 18: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Optimal Serial Plan (in identical processors)

Page 19: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Comparison between Relational Database Management System vs. XML Database System

Page 20: Query Optimization Techniques and Performance Issues in XML and Parallel databases
Page 21: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Comparison of algorithms

Page 22: Query Optimization Techniques and Performance Issues in XML and Parallel databases

The tree generation algorithm and some of the optimal plan selection and generation algorithms run in polynomial time, so they need to be optimized to run in linear time.

PAT algebra is being extended to make it more suitable for query optimization. Frequency search operations make heavy use of the indexing techniques in PAT.

Future research will also focus on the generation and use of partially correlated sub-plans, which depend on bindings passed between portions of a query plan.

When a significant number of paths pass through a small number of objects, a transformation which introduces a group-by clause can be useful.

Further work is examining how to implement the ToXin graph and whether the ToXin tree can be extended to serve as an alternative to DOM for querying, updating, and storing XML documents.

Conclusion and Future work

Page 23: Query Optimization Techniques and Performance Issues in XML and Parallel databases

Value-based grouping and join techniques are being investigated, along with multi-way structural joins, new access methods for merged operators, and several structural pattern techniques.

In addition, new optimization algorithms have to be implemented to improve caching in Web Service Management Systems, and XQuery language constructs are to be optimized.

Cost-based decisions are to be integrated into earlier stages of the query evaluation process, and the cost model has to be refined in order to model the CPU cost precisely.

Conclusion and Future work

Page 24: Query Optimization Techniques and Performance Issues in XML and Parallel databases

[1] Dunren Che, Karl Aberer, and M. Tamer Özsu. 2006. Query optimization in XML structured-document databases. The VLDB Journal 15, 3 (September 2006), 263-289.

[2] Jason McHugh and Jennifer Widom. 1999. Query Optimization for XML. In Proceedings of the 25th International Conference on Very Large Data Bases (VLDB '99), Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 315-326.

[3] Boag, S.; Berglund, A.; Chamberlin, D.; Siméon, J.; Kay, M.; Robie, J. & Fernández, M. F. (2007), 'XML Path Language (XPath) 2.0', Technical report, W3C, http://www.w3.org/TR/2007/REC-xpath20-20070123/.

[4] Haw, S.C and Rao, G.S.V.R.K., 2005. Query Optimization Techniques for XML Databases. International Journal of Information Technology, 2(1): 97 – 104.

[5] S. Groppe and S. Bottcher: “Schema-based Query Optimization for XQuery Queries”, Proceedings of the Advances in Databases and Information Systems 2005, Tallinn, Estonia, 2005.

[6] Mary F. Fernandez and Dan Suciu. 1998. Optimizing Regular Path Expressions Using Graph Schemas. In Proceedings of the Fourteenth International Conference on Data Engineering (ICDE '98). IEEE Computer Society, Washington, DC, USA, 14-23.

[7] Dung Xuan Thi Le, Stephane Bressan, David Taniar, and Wenny Rahayu. 2007. Semantic XPath query transformation: opportunities and performance. In Proceedings of the 12th international conference on Database systems for advanced applications (DASFAA'07), Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, and Ekawit Nantajeewarawat (Eds.). Springer-Verlag, Berlin, Heidelberg, 994-1000.

Bibliography

Page 25: Query Optimization Techniques and Performance Issues in XML and Parallel databases

[8] Airi Salminen and Frank Wm. Tompa: “PAT expressions: an algebra for text search”. In Acta Linguistica Hungarica 41, pages 277-306, 1994.

[9] F. Rizzolo and A. Mendelzon. Indexing XML Data with ToXin. In Proc. 4th Int. Workshop on the Web and Databases (in conjunction with ACM SIGMOD), Santa Barbara, CA, May 2001.

[10] Jason McHugh and Jennifer Widom: “Query Optimization for XML”. In proceedings of the 25th Very Large Data Bases Conference, Edinburgh, Scotland, 1999.

[11] Wei Sun, Daxin Liu, and Wansong Zhang. 2004. An efficient method for XML queries optimization based DTD abstraction and classification. In Fifth World Congress on Intelligent Control and Automation (WCICA 2004), vol. 5, pp. 3926-3929, 15-19 June 2004.

[12] Alberto O. Mendelzon. “ToX: The Toronto XML Server”. Proc. Int. Database Engineering and Applications Symposium (IDEAS). IEEE CS Press. Edmonton, Canada, July 2002.

[13] J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: A Database Management System for Semistructured Data. SIGMOD Record, 26(3):54-66, September 1997.

[14] McHugh, J., Widom. J., 1999b. Optimizing branching path expressions. Technical Report, Stanford University.

[15] Ke Geng, Gillian Dobbie, and Yulong Meng. 2009. Survey of XML Semantic Query Optimization. In Proceedings of the 2009 Fourth International Conference on Internet Computing for Science and Engineering (ICICSE '09). IEEE Computer Society, Washington, DC, USA, 297-300.

[16] Tae-Sun Chung and Hyoung-Joo Kim. 2002. Extracting indexing information from XML DTDs. Inf. Process. Lett. 81, 2 (January 2002), 97-103.

[17] Wu, Y., Patel, J.M., Jagadish, H.V.: Structural join order selection for XML query optimization. In: ICDE, pp. 443-454. IEEE Computer Society, New York (2003)

Bibliography

Page 26: Query Optimization Techniques and Performance Issues in XML and Parallel databases

[18] Abdelkader Hameurlain, Franck Morvan: Evolution of Query Optimization Methods. T. Large-Scale Data- and Knowledge-Centered Systems 1: 211-242 (2009)

[19] Andreas M. Weiner, Theo Härder: An integrative approach to query optimization in native XML database management systems. IDEAS 2010: 64-74

[20] Amol Deshpande and Lisa Hellerstein. 2008. Flow Algorithms for Parallel Query Optimization. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering (ICDE '08). IEEE Computer Society, Washington, DC, USA, 754-763.

[21] S. M. Mahajan and V. P. Jadhav. 2011. A survey of issues of query optimization in parallel databases. In Proceedings of the International Conference & Workshop on Emerging Trends in Technology (ICWET '11). ACM, New York, NY, USA, 553-554.

[22] Sai Wu, Feng Li, Sharad Mehrotra, and Beng Chin Ooi. 2011. Query optimization for massively parallel data processing. In Proceedings of the 2nd ACM Symposium on Cloud Computing (SOCC '11). ACM, New York, NY, USA, Article 12, 13 pages.

[23] David J. DeWitt and Jim Gray. 1990. Parallel database systems: the future of database processing or a passing fad?. SIGMOD Rec. 19, 4, 104-112.

[24] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. 2009. Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. 2, 2 (August 2009), 1626-1629.

[25] Foto N. Afrati and Jeffrey D. Ullman. 2010. Optimizing joins in a map-reduce environment. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT '10), Ioana Manolescu, Stefano Spaccapietra, Jens Teubner, Masaru Kitsuregawa, Alain Leger, Felix Naumann, Anastasia Ailamaki, and Fatma Ozcan (Eds.). ACM, New York, NY, USA, 99-110.

[26] Utkarsh Srivastava, Kamesh Munagala, Jennifer Widom, and Rajeev Motwani. 2006. Query optimization over web services. In Proceedings of the 32nd international conference on Very large data bases (VLDB '06), Umeshwar Dayal, Khu-Yong Whang, David Lomet, Gustavo Alonso, Guy Lohman, Martin Kersten, Sang K. Cha, and Young-Kuk Kim (Eds.). VLDB Endowment 355-366.

[27] Mikael Fernandus Simalango: XML Query Processing and Query Languages: A Survey CoRR abs/1010.1147: (2010)

Bibliography