AN EFFICIENT MULTI JOIN QUERY OPTIMIZATION FOR RELATIONAL

DATABASE MANAGEMENT SYSTEM USING

SWARM INTELLIGENCE APPROACHES

AHMED KHALAF ZAGER ALSAEDI

A thesis submitted in fulfillment of the requirements for the award of the

Doctor of Philosophy in Information Technology

Faculty of Computer Science and Information Technology

Universiti Tun Hussein Onn Malaysia

JUNE 2016


DEDICATION

To my Lord Allah, my Creator; to my teacher and master, the messenger Mohamed bin Abdullah
(peace be upon him); to my beloved mother; to my beloved family, wife and children; and to all
the people in my life who touch my heart, I dedicate this research.


ACKNOWLEDGEMENT

In the name of Allah, the beneficent, the merciful

I would like to express my deepest appreciation to my supervisor, Prof. Dr. Hajah
Rozaida Bt. Ghazali. Without her guidance and persistent help, my thesis would not have
been finished. During the last few years, she has spent countless hours patiently
guiding me to build interesting ideas, strengthen the algorithms, and improve the writing.
As a supervisor, she has shown wisdom, insight, wide knowledge, and a conscientious
attitude. All of these set a good example for me in becoming a good researcher.

I would like to thank all my friends in the database group who have made my
Ph.D. life more colorful. I would also like to express my sincere gratitude to Prof. Dr.
Mustafa Mat Deris, who helped me with the research methodology and made me capable
of completing this research work.

Finally, I would like to thank my mother, Rashida. Without her continuous
support and encouragement, I would never have been able to achieve my goals or
follow through on every decision I made during my Ph.D. life.


ABSTRACT

Multi Join Query Optimization (MJQO) is becoming the centre of attention in the context of
Database Management Systems (DBMS). Its functions consist of combining data from multiple
tables, reducing the number of queries needed, optimizing the Query Execution Plan (QEP), and
moving processing to the database servers to enhance both data integrity and performance.
MJQO is an optimization task that serves to locate the optimal QEP of a Relational DBMS
(RDBMS) in query processing. A major problem associated with RDBMSs is that they are still
unable to fully meet the demands of big data. Most MJQO techniques explore the solution space
at an extremely slow pace. Many queries attempt to gather information from multiple sites or
relations, while every relation is compelled to answer these queries with its limited
resources. This leads to data being accessed from many locations with limited memory, which
inevitably increases the size of the database, the number of joins, and the Query Execution
Time (QET). To avoid trapping and slow convergence in the search for the optimal QEP, as well
as slow query execution, this work proposes three optimization algorithms based on Particle
Swarm Optimization (PSO), Ant Colony Optimization (ACO), and the Two-Phase Artificial Bee
Colony (TPABC) to solve the optimization problem in the RDBMS framework. The TPABC algorithm
solves MJQO problems via simulation, increasing exploration and exploitation whilst balancing
them to obtain optimal results for the given queries. A directed acyclic graph, based on the
materialized query graph, aids the optimization algorithms in solving MJQO by removing
non-promising QEPs, which decreases the QEP combination space. Finally, experimental results
demonstrate that the performance of TPABC, compared with PSO, ACO, and the naive technique in
terms of computational time, is very promising, indicating that the TPABC algorithm can solve
MJQO problems in less time and at lower cost than the other approaches.

ABSTRAK

To date, it is clear that Multi Join Query Optimization (MJQO) has received much attention in
the field of Database Management Systems (DBMS). Its functions consist of combining data from
multiple tables, reducing the number of queries needed, optimizing the Query Execution Plan
(QEP), and moving the processing of many server databases to improve data integrity and
performance. MJQO is an optimization task that describes the search for the optimal QEP of a
DBMS in query processing. However, most MJQO techniques obtain their solutions at a very slow
rate. Therefore, to overcome the trapping problem, slow convergence in the search for the
optimal QEP, and slow query execution time, this study proposes improvements to three
optimization algorithms. The improved MJQO, inspired by Particle Swarm Optimization (PSO), Ant
Colony Optimization (ACO), and the two-phase behaviour of the Artificial Bee Colony (ABC), was
used to solve problems in the RDBMS framework. The main objective of this study is to optimize
the QEP and reduce the Query Execution Time (QET) in RDBMS using swarm intelligence approaches
inspired by three optimization algorithms: ABC, PSO, and ACO. Accordingly, the improved
Two-Phase Artificial Bee Colony algorithm (TPABC) is used to solve the MJQO problem through
simulation, increased exploitation, and improved search quality, balancing them to obtain
optimal results for the given queries. The graph structure is represented by a directed
acyclic graph based on the materialized query graph; to help the optimization algorithms solve
the MJQO problem, unsuitable QEPs are pruned, which reduces the QEP combination space.
Finally, the experimental results show that the performance of TPABC compared with PSO, ACO,
and the naive technique in terms of computation time is very encouraging, indicating that the
TPABC algorithm can solve the MJQO problem in a short time and at a lower cost than other
techniques.


TABLE OF CONTENTS

TITLE i
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF PUBLICATIONS xi
LIST OF TABLES xiii
LIST OF FIGURES xvi
LIST OF SYMBOLS AND ABBREVIATIONS xx
LIST OF APPENDICES xxii

CHAPTER 1 INTRODUCTION 1
1.1 Research Background 1
1.2 Research Problems 4
1.3 Aims of Research 8
1.4 Research Objective 8
1.5 Significance of Research Contribution 8
1.6 Scope of Research 9
1.7 Thesis Organization 10

CHAPTER 2 LITERATURE REVIEW 11
2.1 Introduction 11
2.2 Advantages of Database Management System 11
2.3 Query Optimization 13
2.3.1 Optimization in RDBMS 14
2.3.2 Architecture of Query Optimizer 16
2.4 Join Methods 19
2.4.1 Natural Join 20
2.4.2 Outer Join 21
2.4.3 Left Join 23
2.4.4 Right-Outer Join 24
2.5 MJQO in RDBMS 24
2.6 Advantage of Multi-Join Query Optimization 25
2.7 Techniques for MJQO in RDBMS 27
2.8 Manipulation Database: SQL 29
2.9 Swarm Intelligence Overview 31
2.10 Exploration and Exploitation Properties in SI 33
2.11 Swarm Intelligent Algorithms 35
2.11.1 Ant Colony Optimization 35
2.11.2 Particle Swarm Optimization (PSO) 38
2.11.3 Artificial Bee Colony Algorithm 40
2.11.4 Behavior of Real Bees 41
2.11.5 Modified Versions of ABC 46
2.12 Application of Swarm Intelligence in MJQO 50
2.13 Scenario Leading to the Research Framework 52
2.14 Chapter Summary 56

CHAPTER 3 RESEARCH METHODOLOGY 57
3.1 Introduction 57
3.2 Research Methodology Framework 58
3.3 Database 60
3.4 The Design of MJQO 60
3.5 Complexity of MJQO Problem 62
3.5.1 Baseline Join Enumeration Algorithms 63
3.5.2 Bottom-up and Enumerations 68
3.5.3 Comparison of DPsize and DPset 69
3.6 Proposed MJQO Based on Swarm Intelligent Approaches 69
3.6.1 Artificial Bee Colony (ABC) 70
3.6.2 Particle Swarm Optimization (PSO) 75
3.6.3 Ant Colony Optimization (ACO) 78
3.7 The Standard Test Function Performance of SI Approaches 81
3.8 Multi-Join Query Optimization Techniques 90
3.9 Graphical Representation 91
3.10 Multi-View Processing Plan 97

CHAPTER 4 THE PROPOSED IMPROVED ABC ALGORITHM 103
4.1 Introduction 103
4.2 Notation 105
4.3 Artificial Bee Colony Algorithm 105
4.4 Improved ABC Algorithm for MJQO Problem 106
4.4.1 Subset Function 107
4.4.2 QEP Function 107
4.4.3 Cost Estimate Function 108
4.4.4 QET Function 109
4.5 Proposed Two Phase Artificial Bees Colony 110
4.5.1 First Phase (Employed Bee) 113
4.5.2 Second Phase (Onlooker Bee) 121
4.6 Pruning Technique 124
4.7 Optimization 126

CHAPTER 5 SIMULATION RESULT 128
5.2 QET in RDBMS based on My SQL Server 129
5.2.1 Experiment One 130
5.2.2 Experiment Two 131
5.2.3 Experiment Three 133
5.2.4 Experiment Four 135
5.2.5 Experiment Five 137
5.3 Optimized and Unoptimized Query Effect 139
5.4 Time Complexity 141
5.5 Standard Test Function Performance of SI Approaches 147
5.6 MJQO with Proposed Optimization Algorithm Two-Phase ABC 150
5.6.1 The Effect on Number of Queries 151
5.6.2 The Effect on Number of Data Size 153
5.7 Effectiveness of ABC Algorithm Over Naïve Heuristic Algorithm 154
5.8 Efficiency of Two-Phase ABC Algorithm 156
5.9 Optimization Using TPABC Algorithm for Evaluation Time 158
5.10 Discussions 160

CHAPTER 6 CONCLUSION AND RECOMMENDATIONS 162
6.2 Summary of Findings 163
6.3 Contribution of the Research 166
6.4 Recommendation and Future Work 167

REFERENCES 169
APPENDIX 186


LIST OF PUBLICATIONS

Journals:

(i) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, "An Efficient Multi Join
Query Optimization for DBMS using Swarm Intelligent Approach", published by IEEE,
8-11 Dec., DOI: 10.1109/WICT.2014.7077312.

(ii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, "Materializing multi join
query optimization for DBMS using swarm intelligent approach", published in International
Journal of Computer Information Systems and Industrial Management Applications (IJCISIM),
ISSN 2150-7988, Volume 7 (2015), pp. 074-083, © MIR Labs,
www.mirlabs.net/ijcisim/index.htm, special issues.

(iii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, "Materialized View
Selection for Query Optimization in Data Warehouse System Using Heuristic Approaches",
published in Journal of Next Generation Information Technology, Vol. 6, No. 3, pp. 13-24,
2015 (Scopus).

(iv) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, "Improved MJQO for DBMS
using swarm intelligent approach", Advanced Science Letters, Volume 20, Number 10/11/12,
American Scientific Publishers. Publication type: Journals. ISSN: 19366612, 19367317,
2016 (Scopus).

http://www.scimagojr.com/journalsearch.php?q=American%20Scientific%20Publishers&tip=pub

Proceeding:

(i) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, "An Efficient Multi Join
Query Optimization for RDBMS using Swarm Intelligent Approach", in proceedings of the Fourth
World Congress on Information and Communication Technologies, WICT 2014 (December 08-10,
2014, Malacca, Malaysia).

(ii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, "An Efficient Multi Join
Query Optimization for Relational Database Management System Using Two Phase Artificial Bess
Colony Algorithm", in proceedings of IVIC'15, 4th International Visual Informatics
Conference, held at Hotel Bangi-Putrajaya, Kuala Lumpur, 17-19 November, Advances in Visual
Informatics, Volume 9429 of the series Lecture Notes in Computer Science, pp. 213-226
(LNCS, 2015).

(iii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, "Query Optimization for
RDBMS using Swarm Intelligence Approaches", International Symposium of Information and
Internet Technology, MALTESAS conferences, 26-28 January 2016, Melaka, Malaysia.

http://www.ivic.org.my/

http://link.springer.com/book/10.1007/978-3-319-25939-0

http://link.springer.com/bookseries/558

LIST OF TABLES

2.1 Enroll table for natural join 21
2.2 Student table for natural join 21
2.3 Result example for join of two tables 21
2.4 Student table for outer join 22
2.5 Faculty table for outer join 22
2.6 Outer join for two tables 22
2.7 Student table for left join 23
2.8 Faculty table for left join 23
2.9 Left join for two tables 23
2.10 Student table for Equijoin 24
2.11 Faculty table for Equijoin 24
2.12 Right outer Equijoin for two tables 24
2.13 Student table for query 1 29
2.14 Result of query 1 30
2.15 Student table for query 2 30
2.16 Enroll table for query 2 30
2.17 Result for query 2 30
2.18 Swarm intelligent techniques 35
2.19 An overview of ABC based on honey bee behaviour 49
3.1 Comparison of MJQO problem based on ∣P∣, ∣T∣ 62
3.2 An example of particle coding 77
4.1 Simple notation used in this chapter 105
4.2 Running example of queries and plans 112
4.3 An example illustrating the QET enumerate algorithm 119
5.1 Running queries for experiment one 130
5.2 QET and field number for experiment one 131
5.3 Running queries for experiment two 132
5.4 QET and field number for experiment two 133
5.5 Running queries for experiment three 134
5.6 QET and field number for experiment three 135
5.7 Running queries for experiment four 136
5.8 Query in single and multi-join for experiment four 136
5.9 Running queries for experiment five 138
5.10 QET and field number for experiment five 138
5.11 Running example of query optimization 140
5.12 Improvement factor of DPopt over DPset 141
5.13 Optimization time in chain queries 142
5.14 Optimization time in cycle queries 143
5.15 Time optimization for star queries 145
5.16 Time optimization for clique queries 146
5.17 The results obtained by PSO, ACO and ABC algorithms 147
5.18 Mean of best function values obtained for 50 cycles by ABC algorithm under different colony sizes 148
5.19 QET based on four optimization algorithms 152
5.20 Comparison of two algorithms combined with two techniques 153
5.21 QET for two-phase ABC 159

LIST OF FIGURES

1.1 Multi Join Query Optimization problem 7
2.1 Executing SQL Queries in RDBMS 15
2.2 Sample architecture of the query optimization in DBMS 16
2.3 QEP for various instances of a template query 18
2.4 Swarm Intelligent Capability and Benefit 32
2.5 Ant Colony Optimization ACO 36
2.6 Working of Ant Colony Optimization 37
2.7 Pseudocode for ACO 37
2.8 Working of Particle Swarm Optimization 39
2.9 Pseudocode for PSO 40
2.10 Pseudocode for ABC 45
2.11 Scenario leading to the research framework 55
3.1 Research Methodology Framework 59
3.2 Example of query graph types 61
3.3 Bottom-up order: DPSize 64
3.4 Bottom-up order: DPSet 66
3.5 Bottom-up enumeration: DPoptimization 68
3.6 Example of join processing tree 71
3.6 (a) Left deep tree 71
3.6 (b) Right deep tree 71
3.6 (c) Bushy tree 71
3.7 Join Operation Between Two Relations 72
3.8 Simulation of the Bees Behaviour with MJQO 75
3.9 ACO for MJQO 80
3.10 Schaffer Function 82
3.11 Evaluation of Mean Best Values for Schaffer Function 83
3.12 Source code of Schaffer Function 83
3.13 Sphere Function 84
3.14 Evaluation of Mean Best Values for Sphere Function 84
3.15 Source code of Sphere Function 85
3.16 Griewank Function 85
3.17 Evaluation of Mean Best Values for Griewank Function 86
3.18 Source code of Griewank Function 87
3.19 Rastrigin Function 87
3.20 Evaluation of Mean Best Values for Rastrigin Function 88
3.21 Source code of Rastrigin Function 88
3.22 Rosenbrock Function 89
3.23 Evaluation of Mean Best Values for Rosenbrock Function 89
3.24 Source code of Rosenbrock Function 90
3.25 Query Evaluation Based Swarm Intelligent Technique 91
3.26 (a) Initial Graph 92
3.26 (b) First Iteration 93
3.26 (c) Second Iteration 94
3.27 Remove invalid edges 95
3.28 Pruning Technique 96
3.29 Example Queries 98
3.30 (a) Query graph for query 1 98
3.30 (b) Query graph for query 2 99
3.30 (c) Apply MVPP merged plan for queries 99
3.31 MVPP merged plan for queries 100
4.1 Simple Examples of MJQO Based on Proposed Algorithm 104
4.2 Flowchart of Proposed TPABC Algorithm 111
4.3 Phase Two with Pruning Technique 126
5.1 Effect of Field Number for Experiment 1 131
5.2 Effect of Field Number for Experiment 2 133
5.3 Effect of Field Number for Experiment 3 135
5.4 Effects of Field Number on QET 137
5.5 Effect of Columns Number on QET 139
5.6 Effect of Optimizing Join Queries on QET 140
5.7 Chain Query for Set Tables 142
5.8 Relative Performance of Chain Query 143
5.9 Cycle Queries for Set Tables 143
5.10 Relative Performance for Cycle Queries 144
5.11 Star Queries for Set Tables 144
5.12 Relative Performance for Star Queries 145
5.13 Clique Queries for Set Tables 146
5.14 Relative Performance for Clique Query 146
5.15 The Result Obtained by PSO, ACO, and ABC Algorithms 148
5.16 Evolution of Mean Best Values for Schaffer Function 149
5.17 Evolution of Mean Best Values for Sphere Function 150
5.18 Effectiveness of Optimization Algorithm 152
5.19 Accuracies of Different Optimization Algorithms 153
5.20 Effect of Data Size 154
5.21 Two-Combined ABC Techniques 156
5.22 Effects of Number of Queries on QET 157
5.23 Effect of Number of Relations on QET 157
5.24 Optimization Time for Two-Phase ABC 159

LIST OF SYMBOLS AND ABBREVIATIONS

MJQO - Multi Join Query Optimization
SJQO - Single Join Query Optimization
RDBMS - Relational Database Management System
QET - Query Execution Time
QEP - Query Execution Plan
ABC - Artificial Bee Colony
PSO - Particle Swarm Optimization
ACO - Ant Colony Optimization
NT - Native Technique
$N(R_i)$ - Set of neighbors for a relation $R_i \in R$ w.r.t. $G$
$N(S)$ - Set of neighbors for a set of relations $S \subseteq R$ w.r.t. $G$
$\mathrm{Min}(S)$ - Relation with the smallest subscript index in a set of relations $S$
TPABC - Two-Phase Artificial Bees Colony Algorithm
ACOMJQO - Ant Colony Optimization for MJQO
$C_i$ - Set of connected subsets of $R$ with a cardinality of $i$
$P_S^k$ - Set of k-way partitions of a connected subset $S$
$P_S$ - Set of partitions of a connected subset $S$
$P$ - Set of partitions of all the connected subsets in $C$
$T_S$ - Multiset of connected subsets in all partitions in $P_S$
$T$ - Multiset of connected subsets in all partitions in $P$
$I_S$ - Set of interesting plans for a connected subset $S$
CSE(QEP′) - Set of CSEs of a plan QEP′ w.r.t. $Q$
Cost(QEP′) - Cost of a plan QEP′
JoinExp(QEP′) - Join expression associated with a plan QEP′
CSE(QEP) - Set of CSEs of a Query Execution Plan
$R = \{R_0, \dots, R_{n-1}\}$ - Set of relations in $Q$
$G = (V, E)$ - Query graph for $Q$
$C = \bigcup_{i=2}^{n} C_i$ - Set of connected subsets of $R$ with a cardinality of at least 2
$Q = \{Q_1, \dots, Q_n\}$ - Set of queries
$U_i = \{U_{i1}, \dots, U_{i|U_i|}\}$ - Set of all the possible plans for $Q_i$
$W_i = \{W_{i1}, \dots, W_{i|W_i|}\}$ - Set of all the possible plans for $Q_i$


LIST OF APPENDICES

A: System analysis and configuration database 187

B: Querying from Multiple Tables 200

C: Source Code 207

CHAPTER 1

INTRODUCTION

1.1 Research Background

A database management system (DBMS) is a computer software application that
interacts with the user, other applications, and the database itself to capture and analyze
data. A general-purpose DBMS is designed to allow the definition, creation, querying,
update, and administration of databases. A relational DBMS (RDBMS) is a DBMS based
on the relational model. As of 2015, many frequently used databases are based on the
relational database model.

Multi join query optimization (MJQO) for DBMS is perhaps the most

important application for searching and retrieving information in shorter amounts of

time. The rapid growth in the amount of data available in the world has compelled

DBMS to manage its data efficiently. This plays a big role in storage management and

maintenance of the data (Wang & Strong, 1996).

Another major aspect of data management is information retrieval, the
process of accessing data held in relational databases by issuing queries against them.
Structured Query Language (SQL) is a programming language designed for organizing,
manipulating, and retrieving data to and from an RDBMS (Srivastava & Han, 2012).

A query in an RDBMS can be executed in multiple ways. Because each query
contains SQL clauses and filters, a large number of alternative Query Execution Plans
(QEPs) are possible, and selecting the optimal QEP among them is the main difficulty.


A QEP is represented as a query tree that includes information about the access
method available for each relation, as well as the algorithms used to compute the
relational operations in the tree. The important step is to generate code for the selected
QEP, which is then executed in either compiled or interpreted mode to produce
the query results (Singh, 2006).

When a query is submitted, the query optimizer checks its validity and considers a
large number of execution strategies for it. Because so many alternative execution plans
are possible, it is not feasible, beyond a certain point, to analyze every possible query
execution plan.

The inability to work with a large amount of data is a problem, and the major
concern pertaining to this flaw is the inability to select an optimal QEP for execution.
The MJQO problem appears when the number of joins in the query tree increases,
which subsequently increases the number of QEPs. The traditional approach is very
costly and time-consuming.

The problem of optimal join order in query optimization is NP-hard (Leo &
Cesar, 2008). To reduce its complexity, it should be approached with a well-accepted
heuristic in RDBMS (Moerkotte & Neumann, 2006). On the other hand, (David &
Frank, 2007) accounted for all bushy plans, but mathematically excluded cross products
from the enumeration space. Thus, in many cases, the query optimizer
ends up settling for a plan that is only nearly optimal.
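To make the growth of the plan space concrete, the short Python sketch below counts join trees for n relations using two standard combinatorial results: n! left-deep orders and (2(n-1))!/(n-1)! ordered bushy trees, both with cross products allowed. The function names are illustrative and are not taken from this thesis.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Left-deep join orders for n relations: one per permutation."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Ordered bushy join trees for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


if __name__ == "__main__":
    # Print how quickly both counts grow with the number of relations.
    for n in (2, 4, 6, 8, 10):
        print(f"n={n:2d}  left-deep={left_deep_orders(n):>10,}  bushy={bushy_trees(n):>16,}")
```

Even for four relations there are already 24 left-deep orders and 120 bushy trees, which is why exhaustive enumeration quickly becomes impractical.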

The optimal QEP depends on the number of tuples involved in a query.
The query optimizer therefore relies primarily on statistical information to estimate
tuple counts, and the quality of the chosen plan depends on the accuracy of these estimates.
Improving the quality of the selection of an optimal QEP comes at additional
CPU cost and increased memory consumption. Cost estimation models are
mathematical algorithms or parametric equations used to estimate the cost of a QEP
in terms of time or memory consumption (Dong & Shivnath, 2011).
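A cost estimation model of this kind can be sketched in a few lines of Python. The example below assumes a deliberately simple textbook model, not the cost function used in this thesis: the size of a join result is estimated as the product of the input sizes and the join selectivities, and the cost of a left-deep plan is the sum of its intermediate result sizes. The row counts, selectivities, and names (`ROWS`, `SELECTIVITY`, `left_deep_cost`) are hypothetical.

```python
from itertools import permutations

# Hypothetical statistics: row counts per relation and join selectivities.
ROWS = {"A": 1000, "B": 100, "C": 10}
SELECTIVITY = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}


def left_deep_cost(order, rows=ROWS, selectivity=SELECTIVITY):
    """Estimated cost of a left-deep plan: sum of intermediate result sizes."""
    joined = [order[0]]
    current = float(rows[order[0]])
    total = 0.0
    for rel in order[1:]:
        factor = 1.0
        for prev in joined:
            # Pairs without a join predicate act as a cross product (factor 1).
            factor *= selectivity.get(frozenset((prev, rel)), 1.0)
        current = current * rows[rel] * factor
        total += current
        joined.append(rel)
    return total


# Exhaustively pick the cheapest left-deep order under this toy model.
best_order = min(permutations(ROWS), key=left_deep_cost)
```

Under these assumed statistics, joining B and C first keeps the intermediate result small, whereas starting with A and C forces a cross product. This shows how strongly the estimate depends on the accuracy of the underlying statistics.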

The RDBMS, which is based on the relational database model, is the best-known
type of database in use today (Leo & Cesar, 2008). A query language is an
effective tool that provides an interface for users to store and access data. In the past
few decades, SQL has emerged as the standard query language (Vidya Banu &
Nagaveni, 2012); (Rashid & Ali, 2010); (Chaudhuri & Krishnamurthy, 1995).


Two components that are essential for query evaluation are the query optimizer
and the query execution engine (Chaudhuri & Kr). An optimal solution should be able
to evaluate each connected subset enumerate (CSE) once and reuse its result for
subsequent queries to improve overall query performance. Complex multi-join queries
usually take longer to evaluate due to their inherent complexity. There
could be considerable performance savings from sharing the computation of CSEs among
the queries.
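The idea of evaluating a shared sub-expression once and reusing it across queries can be illustrated with a small Python sketch. Everything here is hypothetical: the tables, the `SharedEvaluator` class, and the nested-loop `natural_join` helper are illustrative only and do not reproduce the thesis implementation.

```python
def natural_join(left, right):
    """Nested-loop natural join of two lists of dict rows."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                out.append({**l, **r})
    return out


class SharedEvaluator:
    """Evaluates join queries and caches every intermediate join result."""

    def __init__(self, tables):
        self.tables = tables
        self.cache = {}      # frozenset of table names -> joined rows
        self.joins_run = 0   # number of joins actually executed

    def evaluate(self, names):
        """Join the named tables left to right, reusing cached prefixes."""
        key = frozenset(names)
        if key in self.cache:
            return self.cache[key]
        if len(names) == 1:
            rows = self.tables[names[0]]
        else:
            left = self.evaluate(names[:-1])
            rows = natural_join(left, self.tables[names[-1]])
            self.joins_run += 1
        self.cache[key] = rows
        return rows


# Hypothetical data: two queries share the (student JOIN enroll) sub-expression.
tables = {
    "student": [{"sid": 1, "name": "Ali"}, {"sid": 2, "name": "Mei"}],
    "enroll": [{"sid": 1, "cid": "DB"}, {"sid": 2, "cid": "AI"}, {"sid": 1, "cid": "AI"}],
    "course": [{"cid": "DB", "title": "Databases"}, {"cid": "AI", "title": "Artificial Intelligence"}],
    "advisor": [{"sid": 1, "advisor": "Dr. X"}],
}
evaluator = SharedEvaluator(tables)
q1 = evaluator.evaluate(("student", "enroll", "course"))
q2 = evaluator.evaluate(("student", "enroll", "advisor"))
```

Evaluated independently, the two queries would need four joins. With the shared cache the common (student JOIN enroll) result is computed once, so only three joins are executed.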

In an RDBMS context, it has been shown that substantial performance savings can
be obtained by using MJQO techniques. In addition to MJQO techniques in the
RDBMS context, there are also some preliminary studies (Chaudhuri & Ger, 2006);
(Tomasiz et al., 2010); (Lim & Herodotou, 2012) on MJQO techniques in the
DBMS context proposed by Google (Dean & Ghemawat, 2004), which has recently
emerged as a new paradigm for large-scale data analysis and has been widely embraced by
Amazon, Google, Facebook, Yahoo!, and many other companies.

There are two key reasons for this. First, the framework can be scaled to
thousands of commodity machines in a fault-tolerant manner, and is thus able to use
more machines to support parallel computing. Second, the framework has a simple yet
expressive programming model through which users can parallelize their
programs without being concerned about issues such as fault tolerance and execution
strategy (Deng & Chain, 2014).

While MJQO techniques (Prasad & Deshpande, 2011), (Yihong et al.,
1998), (Nilesh et al., 2003) have been extensively studied in the RDBMS context, most
focus mainly on optimizing a handful of SQL join queries. The MJQO problem considered
here differs from these works, since the focus is on optimizing a large
collection (hundreds or thousands) of cross product queries produced by
applications of enumerative set-based queries.

In a traditional database, the total number of relations in a multi-join query
is usually fewer than 10, which can be handled effectively by dynamic programming
approaches. The complexity of the problem increases with the complex
multi-join queries generated by certain modern applications, such as knowledge-based
systems, decision support systems, expert systems, Online Analytical Processing (OLAP),
and data mining.

An increase in the number of tables in the join query also increases the number
of alternative QEPs, which complicates the optimizer's task. Traditional methods are
not able to solve this optimization problem effectively due to the increased size of the
data and the larger number of tables (Dong et al., 2011). Deterministic algorithms, greedy
algorithms, and heuristic algorithm-based approaches have tried to approximate the
optimal solution, but their performance remains weak (Steinbrunn & Kemper, 1997).

This problem has since been tackled with genetic approaches and randomized approaches,
such as tabu search, ant colony, and bee colony, all of which perform better
(Kadkhodaei & Mahmoudi, 2011), but better-quality solutions are still needed.
Another work has proposed a new algorithm that combines the cuckoo search algorithm
(Yang & Deb, 2009) with the tabu search algorithm (Glover & Ullman,
1989) to seek better solutions and determine the optimal join order. It is an integrated
part of the query optimizer. The optimizer generates a QEP, which takes some time to
execute. None of these authors found an optimal solution to this problem, because
they used only one database and their results were based solely on the number of
tables in that database, which is insufficient.

1.2 Problem Statements

This study addresses two problems, namely MJQO and Single Join Query
Optimization (SJQO) in RDBMS. Both are crucial factors that affect the capability
of the database. The MJQO technique used in an RDBMS should aim to obtain the results of
each query efficiently, and the query process itself should be optimized for time
efficiency as well.

However, the MJQO techniques used in RDBMSs are, on average, inefficient in terms of
Query Execution Time (QET) and cost. Traditional query optimization
technology spends a long time on each query, and staff lose time when requesting
work-related information, which increases the daily and annual costs of institutes and
companies. In traditional RDBMS applications, the number of joins N involved in a
single query is relatively small, usually N < 10.

With the expansion of database applications, traditional query
optimization techniques are unable to support some of the latest database applications,
such as Decision Support Systems (DSS), OLAP, and Data Mining
(DM), which may demand a query of more than 100 joins.


When multiple users issue a variety of queries against multiple tables of a distributed,
federated database holding varied data, the tables must be joined. This can result in many
database operations, huge intermediate tables, more joins, and slow processing or even a
deadlock situation. On the other hand, queries need to return
answers to clients quickly. To solve this problem, the number of joins,
query plans, and queries must be minimized and sharing increased in order to decrease
administration time (and hence cost).

Hence, the shortfalls of traditional query optimization are gradually being
exposed, and it is necessary to explore new techniques to solve the MJQO
problem. Since MJQO is an NP-hard problem (Li Liu & Dong, 2008), the number of QEPs
corresponding to a query grows exponentially as the number of joins increases, which
leads to the computational complexity of the MJQO problem.

Hence, there is a need for improved quality and performance. These
criteria are important for increasing query speed and reducing cost
in RDBMS. Therefore, a new intelligent approach is required, such as a swarm intelligence
approach, that performs well and achieves a shorter QET at low cost.

Solving such problems with heuristic algorithms has become a research hotspot, since the
data reside at many locations or sites of an RDBMS and therefore call for
multi-optimization or decentralized optimization, as shown in studies using ACO (Li Liu &
Dong, 2008), the greedy algorithm (Prasan & Bhobe, 2000), the Genetic Algorithm (GA),
and ABC (Abber & Mourad, 2013). Several approaches have been proposed to model
the specific intelligent behavior of meta-heuristics applied to solving
combinatorial problems.

The state-of-the-art work in this direction (Tomasz & Potamias, 2010)

proposed two sharing techniques for a batch of jobs. Recent researchers have used

different models to solve the MJQO problem. However, they have been unable to

provide a better solution in reducing the corresponding time and cost. Traditional

methods are not able to solve this optimization problem effectively due to the increased

data size and large number of tables (Dong, 2008).


The optimal join order in the RDBMS framework has been widely adopted by
modern enterprises, such as Facebook (Thusoo & Borthakur, 2010), to process
complex analytical queries on large data warehouse systems due to its high scalability,
fine-grained fault tolerance, and easy programming model for large-scale data
analysis. Given the long execution times of such complex queries, it makes sense to
spend more time optimizing them in order to reduce the overall processing time.

While the optimal join order problem has recently attracted much attention in

a conventional RDBMS context (Kiyoshi & Guy, 1990); (Guido Moerkotte, 2006);

(Guido & Thomas, 2012); (Isard & Prabhakaran, 2009); (Pit Fender & Guido, 2013);

(Pit Fender & Thomas Neumann, 2012); (Fender & Moerkotte, 2012); (Roy &

Siddhesh, 2000); (Nilesh & Sudarshan, 2003); (Zhou & Lehner, 2007), the developed

solutions are not applicable to RDBMS due to the differences in query evaluation

framework and algorithms.

The optimal join order problem in RDBMS has a larger join enumeration space

compared to that in RDBMS due to the presence of multi-way joins. There has been

good work in RDBMS context for complexity study (Kiyoshi & Guy, 1990);

(Moerkotte, 2006); (Fender & Guido, 2012); (Fender & Neumann, 2012).

To the best of our knowledge, there has not been any prior work on the study

of these problems in the presence of multi-way joins in DBMS context. First, the

intermediate results in RDBMS are always materialized instead of being pipelined as

in RDBMS, which simplifies the MJQO problem in two ways.

Second, the MJQO problem in RDBMS may incur deadlocks due to the

pipelining framework (Nilesh & Sudarshan, 2003), while RDBMS does not have

deadlock problem due to the materialization framework. Materializing and reusing

results of Connected Subset Enumerate (CSE) in RDBMS may incur additional

materialization and reading costs due to the pipelining framework. However, since the

intermediate results are always materialized in the DBMS framework, there is no

additional overhead incurred by the technique.

Although the MJQO problem in RDBMS has been shown to be a very difficult

problem with a search space that is doubly exponential in the size of the queries

(Prasan & Siddhesh, 2000); (Nilesh & Sudarshan, 2003); (Jingree & Lehner, 2007),

the simplification in RDBMS makes it possible to propose join order algorithms for the

MJQO problem. However, these algorithms are unable to reduce the QET, the cost,

and the search space.

The search space is large: there are many possible plans, many of them semantically

equivalent, and a logical plan with N operators has 2^N possible placement

decisions. As a simple example, Figure 1.1 shows different possible plans for

only 3 joins on 4 tables.

Figure 1.1: Multi Join Query Optimization Problem

The plans in Figure 1.1 share the same (A JOIN B) subtree. Existing techniques calculate the cost

for all possible plans, which takes a long time. With the swarm intelligence approaches,

instead of computing the cost of this subtree in every plan, it is computed once,

saved, and reused whenever the subtree is seen again. This technique

reduces the time complexity from (2N)!/(N+1)! to "just" 3^N. For N = 4,

this means going from 336 orderings to 81.
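The arithmetic and the reuse idea above can be checked with a short sketch. It reads the second complexity expression as 3^N, which is an assumption that agrees with the quoted value 81 for N = 4. The table sizes, the additive cost model and the three example plans are hypothetical and serve only to show that a shared subtree is costed once.

```python
from functools import lru_cache
from math import factorial


def orderings_without_reuse(n: int) -> int:
    """Number of orderings given by the (2N)! / (N+1)! expression in the text."""
    return factorial(2 * n) // factorial(n + 1)


def orderings_with_reuse(n: int) -> int:
    """Bound of 3^N quoted for the case where subtree costs are reused."""
    return 3 ** n


# Hypothetical table sizes, used only to give the toy cost model some numbers.
ROWS = {"A": 1000, "B": 200, "C": 50, "D": 10}
evaluated = []  # records every join subtree whose cost is actually computed


@lru_cache(maxsize=None)
def cost(tree) -> int:
    """Toy cost of a join tree written as nested pairs, e.g. (("A", "B"), "C")."""
    if isinstance(tree, str):
        return ROWS[tree]  # cost of scanning a base table
    evaluated.append(tree)
    left, right = tree
    # Toy model (an assumption): a join costs the sum of its inputs.
    return cost(left) + cost(right)


# Three plans that all contain the (A JOIN B) subtree.
plans = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    (("A", "B"), ("C", "D")),
]
plan_costs = [cost(p) for p in plans]

print(orderings_without_reuse(4), orderings_with_reuse(4))
print(evaluated.count(("A", "B")))
```

The cache plays the role of the saved subtree cost: the three plans contain nine join nodes in total, but only the distinct ones are evaluated.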

1.3 Aim of Research

This study aims to provide a comprehensive, in-depth and systematic

study of the MJQO problem in the RDBMS paradigm and to propose swarm intelligence

approaches, namely standard ACO, PSO, and an improved Two-Phase Artificial Bee

Colony Algorithm (TPABC).

[Figure 1.1 panels: four alternative join trees over tables A, B, C and D, each built from three JOIN operators, with leaf orders A B C D, A B C D, A B D C and C D A B.]

The proposed algorithm is used to search for the query execution plan and the

optimal global query execution plan to solve the MJQO problem, in order

to reduce time and cost and to increase the performance of the RDBMS.

1.4 Research Objectives

To achieve the research aims, the objectives are as follows:

(i) To design an MJQO for a RDBMS using a query graph based on Pruning and

Materialize Techniques.

(ii) To propose a new Two-Phase Artificial Bee Colony (TPABC) by removing the

scout-bee agent in order to improve the exploration factor.

(iii) To optimize Query Execution Plan (QEP) and Query Execution Time (QET)

for (i) using the proposed (ii).

(iv) To compare the performance of the proposed method in (ii) with other swarm-based

QEP methods, such as PSO and ACO, in terms of processing time and accuracy.

1.5 Significance of Research

An important component of an RDBMS is query optimization. A user request is

usually expressed in a high-level, non-procedural language that describes the conditions

the result produced by the RDBMS needs to satisfy.

The main problem in the RDBMS is data volume, which has grown from 10 GB to

100 TB, or even exabytes, in recent years. Query processing needs to combine

unrelated sources over distributed databases to obtain data from huge spaces.

Each query in the optimization phase produces more than one query plan, and

the optimizer tries to select the best plan at the lowest cost. All clients see similar views

and are able to find similar replicas of unstructured data, which leads to very expensive

throughput and long waiting times for a user or client in a company, resulting in loss of

income.

Human needs keep multiplying while resources remain limited, and the MJQO

problem is a similar case. Economic resources are limited and insufficient to satisfy

all human needs, which are characterized by parochialism and by constant

renewal, such as the recurring needs for food, housing,

treatment, and jobs. The multi-join query optimization problem has been widely addressed

in RDBMS.

Therefore, it is necessary to design an efficient MJQO to determine the best

QEP and to minimize the number of queries or objectives and joins, based on a swarm

intelligence approach that can be adapted to solve the MJQO problem. The proposed

TPABC optimization algorithm is used to select an evaluation plan for a batch of

queries and the best plans in RDBMS. This is done by expanding exploration to find the

optimal QEP for MJQO in order to improve the performance of RDBMS. The

exploitation process is increased using TPABC to find the global optimal plan from

common sub-expression sharing among queries.

1.6 Scope of Research

This research aims to enhance the overall status of MJQO in RDBMS by solving the MJQO

problem, which is an NP-hard problem in RDBMS. The study proposes swarm

intelligence approaches (ABC, PSO, and ACO) as new methods to reduce the

complexity and cost in order to solve this problem. All these algorithms are used to

optimize QEP, QET and cost. The research work proposes TPABC to improve the

exploration and exploitation factors and to increase the performance of the database. The

study attempts to solve the optimal join order problem in RDBMS based on four types of

query graph in the RDBMS framework.

1.7 Thesis Organization

This thesis is organized into six chapters. The first chapter introduces the

research background, problem statements, objectives and contributions. Chapter

two presents a comprehensive literature review of the problems in RDBMS and

provides an overview of swarm intelligence-based algorithms, such as ABC, ACO,

and PSO, and of join techniques in RDBMS.

Chapter three covers the methodology used to carry out the study systematically. It

consists of the optimization algorithms (i.e. ABC, PSO, and ACO) and two new techniques

to solve MJQO problems in DBMS. Chapter four explains the proposed improved

TPABC swarm-based MJQO in DBMS to solve the MJQO problem, and compares

TPABC with a naive heuristic algorithm to improve the factors of exploration and

exploitation. Chapter five presents the simulation results and analyses the data of both MJQO and

QET. Finally, Chapter six concludes the work, summarizes the contributions

of the research, and points out some directions for future work.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

The second chapter of the thesis is the heart of the investigation: it provides

an overview of contemporary literature in a broad academic and historical context

(Boote & Beile, 2005). The chapter sets out to describe the focus or content of the study

and to define the scope of the study. This literature review explores three

domain themes of the research work: Relational Database Management System

(RDBMS) performance, Multi Join Query Optimization (MJQO) as a key issue for

improving RDBMS performance, and swarm intelligence approaches as a technique

to solve MJQO problems. The scope of this literature review is expanded to include

the research that examines these domain themes, since the MJQO

problem has been widely addressed in Relational Database Management Systems

(RDBMS).

2.2 Advantages of Database Management System

Because data are the crucial raw material from which information is derived, there must be

a good method to manage such data. A DBMS helps make data management more

efficient and effective. In particular, a DBMS provides advantages such as improved

data sharing. The DBMS helps create an environment in which end users have better

access to more and better-managed data. Such access makes it possible for end users

to respond quickly to changes in their environment. A DBMS also improves data security.

The more users access the data, the greater the risk of data security

breaches. Corporations therefore invest considerable amounts of time, effort, and money

to ensure that corporate data are used properly.

The use of a DBMS therefore provides a framework for better enforcement of

data privacy and security policies. It also offers better data integration: wider access to

well-managed data promotes an integrated view of the organization’s

operations and a clearer view of the big picture.

It becomes much easier to see how actions in one segment of the company

affect other segments. Data inconsistency exists when different versions of the same

data appear in different places. The RDBMS makes it possible to produce quick

answers to ad hoc queries. From a database perspective, a query is a specific request

issued to the DBMS for data manipulation, for example, to read or update the data.

Simply put, a query is a question, and an ad hoc query is a spur-of-the-moment question.

The RDBMS sends back an answer (called the query result set) to the application.

Technological advancements in the transmission of data through the network have

largely influenced the cost of transmitting data per terabyte over long distances

(Gelogo & Lee, 2012). Furthermore, the RDBMS has achieved progress in two

dimensions that matter to companies: data management and data transfer.

Based on the related research, data management happens to be more costly than

data transfer (Gelogo & Lee, 2012; Graefe, 1996). In addition, there is a rapidly

growing interest in outsourcing DBMS tasks to third parties that can provide these

tasks at a much lower cost due to economies of scale.

Designing a new outsourcing model has a few benefits, but the most

significant benefit is the reduction of the cost of running a DBMS on one’s own (Gelogo

& Lee, 2012; Buyya et al., 2011).

Such a model shares information between multiple devices, and the number of

these devices is expected to increase. Currently, many

companies offer DBMS as a cloud service, such as Microsoft Azure, Google,

Amazon EC2, GoGrid, Garantia Data, MongoLab, etc.

2.3 Query Optimization

Current relational optimizers are influenced by the techniques introduced in the System R

query optimizer (Patricia & Raymond, 1997; Chaudhuri & Krishnamurthy, 2006). One

important contribution of this work is a cost-based framework to obtain execution

plans, which is still used with some variations in most current optimizers.

Another important contribution of (Patricia & Raymond, 1997) is a bottom-up

dynamic programming search strategy to traverse the space of candidate execution

plans. This strategy needs to consider O(N) expressions (Kiyoshi & Guy, 1990) for a

given query. To decrease optimization time, some heuristics are used, such as delaying

the optimization of Cartesian products or considering only left-deep join trees.
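The bottom-up strategy with the two heuristics named above can be sketched as follows. This is a simplified illustration and not the System R implementation: the table cardinalities, the join selectivities, the chain-shaped query graph and the cost model (sum of intermediate result sizes) are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical statistics for a chain query A - B - C - D.
CARD = {"A": 1000, "B": 100, "C": 10, "D": 500}
SEL = {
    frozenset(("A", "B")): 0.01,
    frozenset(("B", "C")): 0.1,
    frozenset(("C", "D")): 0.05,
}


def connected(subset, table):
    """True if some join predicate links `table` to a table already in `subset`."""
    return any(frozenset((t, table)) in SEL for t in subset)


def join_rows(rows, subset, table):
    """Estimated size of joining an intermediate result with one more table."""
    out = rows * CARD[table]
    for t in subset:
        out *= SEL.get(frozenset((t, table)), 1.0)
    return out


def best_left_deep_plan(tables):
    """Bottom-up dynamic programming over table subsets, left-deep plans only."""
    # best[S] = (cost so far, estimated rows, join order) for the table set S
    best = {frozenset([t]): (0.0, float(CARD[t]), (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            subset = frozenset(combo)
            for table in combo:  # `table` is joined last (left-deep shape)
                rest = subset - {table}
                if rest not in best or not connected(rest, table):
                    continue  # heuristic: do not consider Cartesian products
                cost, rows, order = best[rest]
                new_rows = join_rows(rows, rest, table)
                new_cost = cost + new_rows
                if subset not in best or new_cost < best[subset][0]:
                    best[subset] = (new_cost, new_rows, order + (table,))
    return best[frozenset(tables)]


total_cost, result_rows, join_order = best_left_deep_plan(["A", "B", "C", "D"])
print(join_order, total_cost)
```

Each subset keeps only its cheapest left-deep plan, so larger plans are assembled from already optimized smaller ones instead of being enumerated from scratch.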

The Starburst optimizer (Laura & Christoph, 1998; Laura & Lohman, 1990)

extends System R with a more efficient, extensible approach and consists of two

rule-based subsystems. In the second phase, the actual execution plan is chosen.

Physical operators called LOLEPOPs can be combined in many ways to

implement higher-level operators, and such combinations are expressed in a grammar

production-like language (Guy & Lohman, 2001). The join enumerator in Starburst

is similar to System R's bottom-up enumeration scheme. The Exodus optimizer

generator (Graefe & David, 1987) is the first extensible optimization framework that

uses a top-down approach.

Exodus separates the optimizer's search strategy from its data model, and

distinguishes between transformation rules (which map one algebraic expression into

another) and implementation rules (which map an algebraic expression into an

operator tree). Although it was difficult to construct efficient optimizers with Exodus, it provided a

useful foundation for the next generation of extensible optimizers.

The Volcano Optimizer Generator (William, 1993) improves the efficiency of

Exodus and introduces more extensibility and effectiveness. Volcano's search algorithm

combines dynamic programming with directed search based on physical properties,

branch-and-bound pruning and heuristic guidance. Finally, the Cascades framework

(Shekita & Wilms, 1993) solves some problems present in Exodus and Volcano, and

improves functionality, ease of use, and robustness without compromising

extensibility and efficiency.

Cascades is the state-of-the-art rule-based optimization framework used in

current optimizers such as Tandem's NonStop SQL (Pedro & Celis, 1996) and

Microsoft SQL Server (Graefe, 1996). The Cascades framework differs from

Starburst in its approach to enumeration. In fact, this system does not use two distinct

optimization phases as Starburst does, and the application of rules is goal-driven, as

opposed to the forward-chaining rule application phase in Starburst. A detailed

description of Cascades and some extensions to the original framework appear in

(Yongwen, 1998; Billings, 1997).

2.3.1 Optimization of Relational Database Management System

Relational query languages provide a high-level declarative interface to access data

stored in relational database systems. With a declarative language, users (or

applications acting as users) write queries stating what they want, but without

specifying step-by-step instructions on how to obtain such results.

In turn, the RDBMS internally determines the best way to evaluate the input

query and obtains the desired result. Structured Query Language, or SQL (Jim Melton

& Alan Simon, 1993), has become the most widely used relational database language.

In order to answer a given SQL query, a typical RDBMS goes through a series of steps,

illustrated in Figure 2.1. First, the input query, treated as a string of characters,

is parsed and transformed into an algebraic tree that represents the structure of the

query.

This step performs both syntactic and semantic checks over the input query,

rejecting all invalid requests. Second, the algebraic tree is optimized and turned into a query

execution plan. A query execution plan indicates not only the operations required to

evaluate the input query, but also the order in which they are performed, the algorithm

used to perform each step and the way in which stored data are obtained and processed

(Graefe, 1993). Finally, the query execution plan is evaluated and results are passed back to the

user in the form of a relational table.
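The parse, optimize and execute steps described above can be observed with SQLite through Python's standard sqlite3 module. This is only an illustration on a different engine than the one in the thesis: the schemas and sample rows for R and S are assumptions, and table T from the figure's query is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (a TEXT, x INTEGER);
    CREATE TABLE S (b INTEGER, c INTEGER, y INTEGER);
    INSERT INTO R VALUES ('X', 1), ('Y', 2), ('Z', 3);
    INSERT INTO S VALUES (5, 1, 1), (7, 6, 2), (9, 4, 3), (50, 8, 3);
""")

query = "SELECT R.a, S.c FROM R, S WHERE R.x = S.y AND S.b < 10 ORDER BY R.a"

# Parsing: a syntactically invalid request is rejected before any planning.
try:
    conn.execute("SELECT R.a FROM R WHERE")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Optimization: ask the engine which execution plan it selected.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Execution: the plan is evaluated and a relational table is returned.
rows = conn.execute(query).fetchall()

for step in plan:
    print(step)
print(rows)
```

The plan rows printed by EXPLAIN QUERY PLAN are specific to SQLite and vary between versions, so they are shown here without an expected output.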

Figure 2.1: Executing SQL Queries in a Relational Database System.

[Figure 2.1 content: simplified RDBMS with three stages: Parser, Optimizer, Execution Engine.
Input SQL query: SELECT R.a, S.c FROM R, S, T WHERE R.x = S.y AND S.b < 10
Algebra tree operators: projection ∏ R.a, S.c; join ⋈ R.x = S.y; selection σ S.b < 10; relation R.
Query execution plan operators: Project R.a, S.c; Merge join R.x, S.y; Sequential scan over R; Sort [S.y]; Clustered index scan over S; Filter on the fly [S.b < 10].
Output table (R.a, S.c): (X, 1), (Y, 6), (Z, 4).]

Modern relational query optimizers are complex pieces of code and typically

represent 40 to 50 developer-years of effort (Raghu & Johannes, 2000). As stated

before, the role of the optimizer in a database system is to identify an efficient

execution plan to evaluate the input query, as indicated in Figure 2.1. To that end,

optimizers usually examine a large number of possible query plans and choose the one

that is expected to result in the fastest execution.

Database queries are given in declarative languages, typically SQL. The goal

of query optimization is to choose the best execution strategy for a given query under

the given resource constraints. While the query specifies the user intent (i.e., the

desired output), it does not specify how the output should be produced. This allows for

optimization decisions, and for many queries there is a wide range of possible

execution strategies, which can differ greatly in their resulting performance. This

renders query optimization an important step during query processing.

The role of the optimizer is to determine the lowest cost plan for executing

queries. By "lowest cost plan," we mean an access path to the data that takes the least

amount of time. The system invokes the optimizer for Structured Query Language (SQL)

statements when more than one execution plan is possible. The optimizer chooses what

it thinks is the optimum plan. This plan persists until the statement is either invalidated

or dropped by the application.

2.3.2 Architecture of Query Optimizer

Several query optimization frameworks have been proposed in the literature (David,

1987; William, 1993; Patricia & Raymond, 1997; Laura & Christoph, 1998; Graefe,

1995) and most modern optimizers rely on the concepts introduced in these references.

Although implementation details vary among specific systems, virtually all

optimizers share the same basic structure (Ioannidis, 1997; Surajit, 1998) as shown in

Figure 2.2.

[Figure 2.2 labels: Input Query; Enumeration Engine; Sub QEP Explored; Cost Estimation module; Cardinality Estimate; Output Query Execution Plan.]

Figure 2.2: Simplified Architecture of the Query Optimizer in a Database System.

For each input query, the optimizer considers a multiplicity of alternative plans.

For that purpose, an enumeration engine navigates through the space of candidate

execution plans by applying rules.

Some optimizers have a set of rules to enumerate alternative plans (Patricia &

Raymond, 1997), while others implement extensible transformational rules to

navigate through the search space (Laura, 1998; Graefe, 1995).

During optimization, a cost module estimates the expected consumption of

resources of each discovered query plan (resources are usually the number of I/O's, but

can also include CPU time, memory, communication bandwidth, or a combination of

these). Finally, once all interesting execution plans are explored, the optimizer extracts

the best one, which is evaluated in the execution engine, as shown in Figure 2.3.

The cost estimation module is then a critical component of a relational

optimizer. In general, it is not possible to obtain the exact cost of a given plan without

executing it (which does not make sense during optimization). Thus, the optimizer is

forced to estimate the cost of any given plan without executing it. It is then

fundamental for an optimizer to rely on accurate procedures to estimate costs, since

optimization is only as good as its cost estimates. Cost estimation must also be

efficient, since it is repeatedly invoked during the optimization process.

The basic framework for estimating costs is based on the following recursive

approach described in (Surajit, 1998):

(i) Collect statistical summaries of stored data.

(ii) Given an operator in the execution plan and statistical summaries for each of its

sub-plans, determine two things: (a) the statistical summaries of the output and

(b) the estimated cost of executing the operator.

The second step can be applied iteratively to an arbitrary

tree to derive the costs of each operator. The estimated cost of a plan is then obtained

by combining the costs of each of its operators. In general, the number of disk I/O's

needed to manage intermediate results while executing a query plan (and thus the

plan's cost) is a function of the sizes of the intermediate query results.
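The recursive scheme above can be sketched as a walk over a plan tree in which each operator derives its output size and cost from the estimates of its sub-plans. The `PlanNode` class, the selectivity values and the per-operator cost formulas below are hypothetical simplifications, with the row count standing in for a full statistical summary.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PlanNode:
    op: str                                   # "scan", "filter" or "join"
    children: List["PlanNode"] = field(default_factory=list)
    base_rows: int = 0                        # stored statistic, scans only
    selectivity: float = 1.0                  # filters and joins only


def estimate(node: PlanNode) -> Tuple[float, float]:
    """Return (estimated output rows, estimated cumulative cost) for a sub-plan."""
    if node.op == "scan":
        # Step (i): use the collected statistic; reading costs one unit per row.
        return float(node.base_rows), float(node.base_rows)
    # Step (ii): combine the estimates of the sub-plans.
    inputs = [estimate(child) for child in node.children]
    input_cost = sum(cost for _, cost in inputs)
    if node.op == "filter":
        rows_in = inputs[0][0]
        return rows_in * node.selectivity, input_cost + rows_in
    if node.op == "join":
        left_rows, right_rows = inputs[0][0], inputs[1][0]
        out_rows = left_rows * right_rows * node.selectivity
        return out_rows, input_cost + left_rows + right_rows
    raise ValueError(f"unknown operator: {node.op}")


plan = PlanNode("join", selectivity=0.01, children=[
    PlanNode("filter", selectivity=0.1,
             children=[PlanNode("scan", base_rows=1000)]),
    PlanNode("scan", base_rows=500),
])
rows, cost = estimate(plan)
print(rows, cost)
```

Because the join cost depends on the estimated sizes of its inputs, an error in the filter selectivity propagates into the cost of every operator above it.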

Therefore, the cost estimation module heavily depends on cardinality estimates

of sub-plans generated during optimization. The following example illustrates how

sizes of intermediate results can significantly change the plan that is chosen by an

optimizer.

Example 1: Consider the following query template, where C is a numeric parameter.

SELECT * FROM R, S

WHERE R.x = S.y AND R.a < C

Figure 2.3 shows the execution plans produced by an optimizer when

C is instantiated with the values 20, 200, and 2000. The three instantiated queries are almost

identical.

Figure 2.3: Query Execution Plans for Various Instances of a Template Query

The resulting query plans are considerably different. For instance, in Figure 2.3

(A), the optimizer estimates that the number of tuples in R satisfying R.a < 20 is very

small, so it chooses to evaluate the query as follows. First, using a secondary index

over R.a, it retrieves the record identifiers of all tuples in R that satisfy R.a < 20. Then,

using lookups against table R, it fetches the actual tuples that correspond to those

record identifiers. Finally, it performs a nested-loop join between the subset of tuples of R

calculated before and table S, which is sequentially scanned.

For the case C = 2000 in Figure 2.3 (hash join), the optimizer estimates that the

number of tuples of R satisfying R.a < 2000 is rather large, and therefore chooses to

scan both tables sequentially (discarding on the fly the tuples from R that do not satisfy

the condition R.a < 2000) and then perform a hash join to obtain the result.

In this scenario, the lookups of the previous plan would have been too

numerous and, therefore, too expensive.

Figure 2.3 (merge join) shows yet another execution plan, which is chosen when

the number of tuples of R satisfying the predicate is neither too small nor too large. In

this case, table S is scanned in increasing order of S.y using a clustered index, and

table R is scanned sequentially (discarding invalid tuples on the fly as before) and then

sorted by R.x.

A merge join is then performed on the two intermediate results. It is known that if

cardinality estimates are accurate, overall cost estimates are typically off by no more than

10 percent (Michael & Lohman, 2001). However, cardinality estimates can be off by

orders of magnitude when the underlying assumptions on the data distribution are

invalid. Clearly, if the optimizer does not have accurate cardinality estimations during

optimization, the "wrong" execution plan might be chosen for a given query.
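The effect of the estimated cardinality on the chosen plan can be reproduced with a toy cost model. The table sizes and the three cost formulas below are invented for illustration and are not taken from any real optimizer; `k` stands for the estimated number of tuples of R that satisfy the predicate.

```python
from math import log2

R_ROWS = 100_000   # hypothetical size of table R
S_ROWS = 10_000    # hypothetical size of table S


def plan_costs(k: float) -> dict:
    """Toy costs of the three plans for k qualifying tuples of R."""
    return {
        # index lookups are expensive per qualifying tuple, plus one scan of S
        "index nested-loop join": S_ROWS + 100 * k,
        # scan both tables, then sort the k qualifying tuples of R
        "merge join": R_ROWS + S_ROWS + k * log2(k + 1),
        # scan both tables, pay a fixed hash-table overhead, probe k tuples
        "hash join": R_ROWS + S_ROWS + 3 * S_ROWS + k,
    }


def choose_plan(k: float) -> str:
    """Pick the plan with the lowest estimated cost."""
    costs = plan_costs(k)
    return min(costs, key=costs.get)


for estimate in (20, 2_000, 50_000):
    print(estimate, choose_plan(estimate))

# Underestimation: the optimizer believes k = 2,000 while the true value is 50,000.
chosen = choose_plan(2_000)
true_costs = plan_costs(50_000)
penalty = true_costs[chosen] - min(true_costs.values())
print(chosen, penalty)
```

Under these assumed formulas a small estimate selects the index nested-loop plan, a medium one the merge join and a large one the hash join, and the underestimated case pays a positive penalty for sorting a large intermediate result.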

In the previous example, if the number of tuples satisfying R.a < 2000 is

underestimated, the optimizer could choose the less efficient plan (for that scenario)

of Figure 2.3 (merge join), and therefore waste time by sorting a large intermediate

subset of R. In the context of adaptive query processing (Joseph & Franklin, 2003),

where initial bad choices during optimization can be later corrected during query

execution, accurate cardinality estimates allow the optimizer to start with a higher

quality execution plan, thus minimizing the probability of dynamic changes during

query execution. For this reason, it is crucial to provide the optimizer with

accurate procedures to estimate cardinality values during optimization.

The next section gives an overview of statistical structures that can be used to

estimate the cardinality of intermediate results generated by query sub-plans during

optimization, and of the few types of join methods in DBMS.

2.4 Join Methods

Theta join combines tuples from different relations provided they satisfy the theta

condition. The join condition is denoted by the symbol θ, as in R1 ⋈θ R2, where R1 and R2 are

relations having attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that the attributes

do not have anything in common, that is, R1 ∩ R2 = Ø. The optimizer can select

from multiple join methods. When the rows from two tables are joined, one table is

designated the outer table and the other the inner table.

The optimizer decides which of the tables should be the outer table and which

should be the inner table. During a join, the optimizer scans the rows in the outer and

inner tables to locate the rows that match the join condition. The optimizer analyses

the statistics for each table and, for example, might identify the smallest table or the table

with the best selectivity for the query as the outer table.

If indexes exist for one or more of the tables to be joined, the optimizer takes them into

account when selecting the outer and inner tables. If more than two tables are to be

joined, the optimizer analyses the various combinations of joins on table pairs to

determine which pair to join first, which table to join with the result of the join, and so

on for the optimum sequence of joins.

The cost of a join is largely influenced by the method in which the inner and

outer tables are accessed to locate the rows that match the join condition. The optimizer

selects from two join methods when determining the query optimizer plan.
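The roles of the outer and inner tables can be illustrated with a nested-loop join, one common join method. The sketch below is a minimal version under stated assumptions: rows are Python dictionaries with disjoint attribute names, the relations `r1` and `r2` are made-up sample data, and the theta condition is passed in as a comparison function.

```python
import operator


def nested_loop_join(outer, inner, outer_key, inner_key, theta=operator.eq):
    """Join two lists of row dictionaries on theta(outer[outer_key], inner[inner_key])."""
    result = []
    for outer_row in outer:            # the outer table is scanned once
        for inner_row in inner:        # the inner table is scanned once per outer row
            if theta(outer_row[outer_key], inner_row[inner_key]):
                # attribute names are disjoint, so the rows can simply be merged
                result.append({**outer_row, **inner_row})
    return result


r1 = [{"a1": 1}, {"a1": 5}]                   # smaller relation, used as outer table
r2 = [{"b1": 3}, {"b1": 5}, {"b1": 7}]        # larger relation, used as inner table

equi = nested_loop_join(r1, r2, "a1", "b1")                      # theta is "="
less_than = nested_loop_join(r1, r2, "a1", "b1", operator.lt)    # theta is "<"
print(equi)
print(len(less_than))
```

Using the smaller relation as the outer table keeps the number of inner scans low, which matches the statistics-based choice described above.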

The current join methods, such as natural join and outer join, are not sufficient to merge

the tables of a database. It is therefore necessary to find a new and efficient way to improve and

optimize queries and RDBMS performance.

2.4.1 Natural Join (⋈)

Natural join does not use any comparison operator. It does not concatenate the way a

Cartesian product does. A natural join can be performed only if there is at least one common

attribute that exists between two relations. In addition, the attributes must have the

same name and domain. Natural join acts on those matching attributes where the values

of attributes in both the relations are the same.

Example Two:

SELECT Enroll.StuId, lastName, firstName

FROM Student, Enroll

WHERE classNo = 'ART103A'

AND Enroll.StuId = Student.StuId

Table 2.1: Enroll Table

StuId   Class Number   Grade
S1001   ART103A        A
S1001   HST205A        C
S1002   ART103A        D
S1002   CSC201A        F
S1002   MTH103C        B
S1010   ART103A
S1010   MTH103C
S1020   CSC201A        B
S1020   MTH101B        A

Table 2.2: Student Table

StuId   Last Name   First Name   Major     Credits
S1001   Smith       Tom          History   90
S1002   Chin        Ann          Math      36
S1005   Lee         Perry        History   3
S1010   Burns       Edward       Art       63
S1013   McCarthy    Owen         Math      0
S1015   Jones       Mary         Math      42
S1020   Rivera      Jane         CSC       15

Table 2.3: Result Example Two Join

StuId   Last Name   First Name
S1001   Smith       Tom
S1002   Chin        Ann
S1010   Burns       Edward

Example Two requires the two tables shown in Tables 2.1 and 2.2 and joins their

records into the new table shown in Table 2.3. The result shows the student

identifier, last name and first name. This is similar to the join operation in

relational algebra, and SQL allows the user to do a natural join. The result of joining

Enroll and Student is shown in Table 2.3.
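Example Two can be reproduced with SQLite through Python's sqlite3 module, using the rows of Tables 2.1 and 2.2. The column name `classNumber` and the use of NULL for the two blank grades are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (stuId TEXT PRIMARY KEY, lastName TEXT,
                          firstName TEXT, major TEXT, credits INTEGER);
    CREATE TABLE Enroll (stuId TEXT, classNumber TEXT, grade TEXT);
    INSERT INTO Student VALUES
        ('S1001', 'Smith', 'Tom', 'History', 90),
        ('S1002', 'Chin', 'Ann', 'Math', 36),
        ('S1005', 'Lee', 'Perry', 'History', 3),
        ('S1010', 'Burns', 'Edward', 'Art', 63),
        ('S1013', 'McCarthy', 'Owen', 'Math', 0),
        ('S1015', 'Jones', 'Mary', 'Math', 42),
        ('S1020', 'Rivera', 'Jane', 'CSC', 15);
    INSERT INTO Enroll VALUES
        ('S1001', 'ART103A', 'A'), ('S1001', 'HST205A', 'C'),
        ('S1002', 'ART103A', 'D'), ('S1002', 'CSC201A', 'F'),
        ('S1002', 'MTH103C', 'B'), ('S1010', 'ART103A', NULL),
        ('S1010', 'MTH103C', NULL), ('S1020', 'CSC201A', 'B'),
        ('S1020', 'MTH101B', 'A');
""")

# Join written with an explicit equality condition, as in Example Two.
explicit_rows = conn.execute("""
    SELECT Enroll.stuId, lastName, firstName
    FROM Student, Enroll
    WHERE classNumber = 'ART103A' AND Enroll.stuId = Student.stuId
    ORDER BY Enroll.stuId
""").fetchall()

# Same result with NATURAL JOIN, which matches on the shared column stuId.
natural_rows = conn.execute("""
    SELECT stuId, lastName, firstName
    FROM Student NATURAL JOIN Enroll
    WHERE classNumber = 'ART103A'
    ORDER BY stuId
""").fetchall()

print(explicit_rows)
```

Both queries return the three rows of Table 2.3, since stuId is the only attribute the two tables have in common.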

2.4.2 Outer Join

The natural join discussed previously selects only the rows that are common to the

tables participating in the join. In cases where we are interested in selecting elements

of a table regardless of whether they are present in the second table, we need to

use the SQL OUTER JOIN command. The syntax for performing an outer join in SQL

is database dependent. For example, in Oracle, we place a "(+)" in the WHERE

clause on the other side of the table for which we want to include all the rows. Let us

assume we have the following two tables, Tables 2.4 and 2.5.

Student OUTER-EQUIJOIN Faculty

Compare Student.LastName with Faculty.name

The result of the outer join query is shown in Table 2.6.

Table 2.4: Student Table

StuId   LastName   FirstName   Major   Credits
(rows missing in the source)

Table 2.5: Faculty Table

FacId   Name     Department   Rank
F101    Adams    Art          Professor
F105    Tanaka   CSC          Instructor
F110    Byrne    Math         Assistant
F115    Smith    History      Associate
F221    Smith    CSC          Professor

Table 2.6: Outer Join for Student and Faculty Tables

StuId   LastName   FirstName   Major     Credits   FacId   Name     Department   Rank
S1001   Smith      Tom         History   90        F221    Smith    CSC          Professor
S1001   Smith      Tom         History   90        F115    Smith    History      Associate
(rows for unmatched students missing in the source)
                                                   F101    Adams    Art          Professor
                                                   F105    Tanaka   CSC          Instructor
                                                   F110    Byrne    Math         Assistant

The outer equijoin searches the full left and right tables. In the current example, it

compares the student last name in Table 2.4

with the name in the faculty Table 2.5 to find matching names. When a match is found,

the combined record is put in the result Table 2.6; otherwise, the columns of the other table

are left null.

2.4.3 Left Outer Join

In a left outer join, all rows from the first table mentioned in the SQL query are selected,

regardless of whether there is a matching row in the second table mentioned in the SQL

query. Let us assume we have the following two tables.

Table 2.7: Student Table

StuId   LastName   FirstName   Major   Credits
(rows missing in the source)

Table 2.8: Faculty Table

FacId   Name    Department   Rank
(leading rows missing in the source)
F115    Smith   History      Associate
F221    Smith   CSC          Professor

Table 2.9: Left Join for Student and Faculty Tables

StuId   LastName   FirstName   Major   Credits   FacId   Name   Department   Rank
(rows missing in the source)

In the left join, all rows of the left table are kept in the result. The

last name in the student Table 2.7 is compared with the column name

of the faculty Table 2.8. If the same name is found in the faculty table,

the matching faculty record is saved in the result Table 2.9; otherwise, the faculty columns are null.

2.4.4 Right Outer Join

The right outer join keyword returns all rows from the right table with the matching

rows in the left table. The result is NULL in the left side when there is no match.

Student RIGHT-OUTER-EQUIJOIN Faculty

Table 2.10: Student Table

StuId   LastName   FirstName   Major   Credits
(rows missing in the source)

Table 2.11: Faculty Table

FacId   Name    Department   Rank
(leading rows missing in the source)
F115    Smith   History      Associate
F221    Smith   CSC          Professor

Table 2.12: Result of Right Outer Join

StuId   LastName   FirstName   Major   Credits   FacId   Name   Department   Rank
(rows missing in the source)
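The left and right outer joins of Sections 2.4.3 and 2.4.4 can be sketched with SQLite through Python's sqlite3 module. Two assumptions are made: the Student rows are taken from Table 2.2 and the Faculty rows from Table 2.5, and the right outer join is expressed as a LEFT JOIN with the two tables swapped, because older SQLite versions do not accept RIGHT JOIN. Only the columns needed for the comparison are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (stuId TEXT, lastName TEXT);
    CREATE TABLE Faculty (facId TEXT, name TEXT, department TEXT);
    INSERT INTO Student VALUES
        ('S1001', 'Smith'), ('S1002', 'Chin'), ('S1005', 'Lee'),
        ('S1010', 'Burns'), ('S1013', 'McCarthy'), ('S1015', 'Jones'),
        ('S1020', 'Rivera');
    INSERT INTO Faculty VALUES
        ('F101', 'Adams', 'Art'), ('F105', 'Tanaka', 'CSC'),
        ('F110', 'Byrne', 'Math'), ('F115', 'Smith', 'History'),
        ('F221', 'Smith', 'CSC');
""")

# Left outer join: every student is kept; faculty columns are NULL without a match.
left_rows = conn.execute("""
    SELECT s.stuId, s.lastName, f.facId
    FROM Student AS s LEFT JOIN Faculty AS f ON s.lastName = f.name
    ORDER BY s.stuId, f.facId
""").fetchall()

# Right outer join of Student with Faculty, written as Faculty LEFT JOIN Student:
# every faculty member is kept; student columns are NULL without a match.
right_rows = conn.execute("""
    SELECT f.facId, f.name, s.stuId
    FROM Faculty AS f LEFT JOIN Student AS s ON s.lastName = f.name
    ORDER BY f.facId
""").fetchall()

for row in left_rows:
    print(row)
for row in right_rows:
    print(row)
```

In Python the NULL values appear as None. Student Smith matches two faculty members, so the left join returns one more row than there are students.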

2.5 Multi Join Query Optimization in Relational Database Management System

Query optimization is a function of many relational database management systems.

The query optimizer attempts to determine the most efficient way to execute a given

query by considering the possible query plans. The multi-join query is one of the basic

operations when using a database. Therefore, multi-join query optimization is of great

necessity to improve database performance.

There are often other cost metrics in addition to execution time that are relevant

to compare query plans (Trummer & Immanuel, 2015). In a cloud computing scenario,

for instance, one should compare query plans not only in terms of how much time they

take to execute but also in terms of how much money their execution costs.

In the context of approximate query optimization, it is possible to execute query plans

on randomly selected samples of the input data in order to obtain approximate results

with reduced execution overhead.

For example, in a database system enhanced with inference capabilities, a

simple query involving a rule with multiple definitions may expand to more than one

actual query that has to be run over the database.

In the past few years, several attempts have been made to extend the benefits

of the database approach in business to other areas, such as artificial intelligence and

engineering design automation. Traditionally, query optimizers like (Chaudhuri, 2006)

optimize queries one at a time and do not identify any commonalities in queries,

resulting in repeated computations. As observed in (Rosenthal & Chakravarthy, 1988;

Sellis, 1988) exploiting common results can lead to significant performance gains.

This is known as multi-query optimization. Existing techniques for multi-query

optimization assume that all intermediate results are materialized (Cosar & Srivastava,

2008; Roy & Seshadri, 2000; Deshpande et al., 1998).

They assume that if a common subexpression is to be shared, it will be

materialized and read whenever it is required subsequently. Current multi-query

optimization techniques do not try to exploit pipelining of results to all the users of the

common subexpression.
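Finding the common subexpressions of a batch of queries, which is the starting point of the multi-query optimization described above, can be sketched as follows. The plan representation (nested tuples of the form ("join", left, right)) and the three example queries are hypothetical. The sketch treats ("join", "A", "B") and ("join", "B", "A") as different expressions, a simplification that a real optimizer would remove by normalizing the plans first.

```python
from collections import Counter


def subexpressions(plan):
    """Yield every join subtree of a plan written as ("join", left, right)."""
    if isinstance(plan, tuple):
        yield plan
        for child in plan[1:]:
            yield from subexpressions(child)


def shared_subexpressions(batch):
    """Return the subtrees that occur in more than one query of the batch."""
    counts = Counter()
    for plan in batch.values():
        # a set, so that a subtree repeated inside one query is counted once
        counts.update(set(subexpressions(plan)))
    return {expr for expr, n in counts.items() if n > 1}


batch = {
    "Q1": ("join", ("join", "A", "B"), "C"),
    "Q2": ("join", ("join", "A", "B"), "D"),
    "Q3": ("join", "C", "D"),
}
shared = shared_subexpressions(batch)
print(shared)
```

Each shared subtree is a candidate for being computed once, materialized, and read by every query that needs it.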

https://en.wikipedia.org/wiki/Relational_database_management_system

https://en.wikipedia.org/wiki/Query_plan

REFERENCES

Abber Al-Dayel & Murad. (2013). Query paraphrasing enhancement using artificial

bee colony. In the 3rd International Conference on Web Intelligence. Mining and

Semantics.

Alamery & Faraahi. (2010). Multi-join query optimization using the bee’s algorithm.

In Proceedings of the 7th International Symposium on Distributed Computing

and Artificial Intelligence (pp. 449-457).

Alberto, O. & Mendelzon, S. (2002). Proc. algorithm for the generation of optimal

bushy join trees without cross products. In Database Engineering and

Applications Symposium (IDEAS). IEEE CS Press. Edmonton, Canada, July .

Alzaqebah, M. & Abdullah, S. (2011). Artificial bee colony search algorithm for

examination timetabling problems, ToX: The Toronto XML Server. Int. J.

Phys.Sci.(79).

Arens & Knoblock. (1994). Cooperating agents for information retrieval. In processing

of the second of international conference on comparative of information system.

Awadallah, M. (2011). Artificial Bee Colony Algorithm for Curriculum-Based Course

Timetabling Problem. ICIT.(78).

Baykasoglu, L. Ozbakir & P. Tapkan. (2007). Artificial bee colony algorithm and its

application to generalized assignment problem. Swarm Intelligence: Focus on

Ant and Particle Swarm Optimization, 2007, 113-144.

Beard, L. Getoor, M. Blake.(2007) Visual mining of multi-modal social networks at

different abstraction levels. IEEE Conference on Information Visualization

Symposium of Visual Data Mining (IV-VDM), July 2007.

Beynon & Kur (2001). A. Sussman, H. Andrade, R. Ferreira, and J. Saltz. Processing

large-scale multi-dimensional data in parallel and distributed

environments. Parallel Computing, 28(5):827–859.

Billings, K. (1997). A TPCD model for database query optimization in Cascades. M.S.

Thesis, Portland State University.

Biskup, D. & Feldmann, M. S. (2001). Benchmarks for scheduling on a single

machine against restrictive and unrestrictive common due dates. Computers &

Operations Research, volume 28, pp. 787 -801.

Boote, N. & Beile, P. (2005). Scholars Before Researchers: On the Centrality of the

Dissertation Literature Review in Research Preparation. Educational Researcher,

Vol. 34, No. 6, pp 3-15.

Bramley & Chiu (2000). A component based services architecture for building

distributed applications. In Proceedings of HPDC.

Bullnheimer, B. & Hartl, R. (1998). Applying the ant system to the vehicle

routing problem. In: Osman, I.H., Vo, S., Martello, S., Roucairol, C. (eds.)

Metaheuristics: Advances and Trends in Local Search Paradigms for

Optimization, pp. 109–120. Kluwer Academics, Dordrecht.

Burrough, P.A. (1986) Principles of Geographic Information Systems for Land

Resource Assessment. Monographs on Soil and Resources Survey No. 12,

Oxford Science Publications, New York.

Camazine, S., Franks, N., & Deneubourg, L. (2001). Self-Organization in Biological

Systems. Princeton Studies in Complexity, Princeton University Press,

Princeton, NJ.(62).

Cao, Y., & Fang, Q. (2008). Parallel Query Optimization Techniques for Multi-Join

Expressions. Journal of Software 13(2), 250-256.

Catherine Riccardo (2012). Introduction in database conceptual. Third edition,

Canada. Cathleen Sether.

Celis & Pedro. (1996). The query optimizer in Tandem's new Server Ware SQL

Product. In Proceedings of the Twenty-second International Conference on Very

Large Databases (VLDB'96).

Chakravarthy, B.S. (1986). Measuring Strategic Performance. Strategic Management

Journal, 7(5), 437-458. doi:10.1002/smj.4250070505.

Chande, S.V., & snik, M. (2007). Genetic Optimization for the Join Ordering Problem

of Database Queries. Jaipur, India, Department of Computer Science

International School of Informatics and Management.

Chaudhuri & Ger (2006). Probabilistic information retrieval approach for ranking of

database query results. ACM Transactions on Database Systems 31(3):1134–

1168. DOI doi.acm.org/10.1145/1166074.1166085

Chaudhuri & Krishnamurthy (2006). Optimizing queries with materialized views.

Proc. 11th ICDE, 190-200.

Chen & Dunham (2006). Answering Top-K Queries with Multi-Dimensional

Selections: The Ranking Cube Approach. ACM.

Christian & Andrea, (2003). Relative Deprivation, Personal Income Satisfaction, and

Average Well-Being under Different Income Distributions, Economics Working

Papers 2003,05, Christian-Albrechts-University of Kiel, Department of

Economics.

Christopher Beer & Tim Hendtlass (2012). Improving Exploration in Ant Colony

Optimisation with Antennation, IEEE must be obtained for all other uses.

Chuan, Z., Xin, Y., & Jian, Y. (2001). An evolutionary approach to materialized views

selection in a data warehouse environment. IEEE Trans. Syst., Man,

Cybern., 31:282–294.

Chuang, Y., & Chen, C. (2012). Black-Box Optimization Benchmarking for Noiseless

Function Testbed using Artificial Bee Colony Algorithm. GECCO’10, Portland

Oregon, USA.

Civicioglu P, Besdok .(2011). A conceptual comparison of the CK, PSO, DE and ABC

algorithms, Springer.(78).

Colorni, A., Dorigo, M., & Maniezzo, V. (1994). Ant system for job-shop scheduling.

Belgian Journal of Operations Research, Statistics and Computer Science 4(1),

39–53. Computer Science and Software Engineering.

Cosar, C., Reed, B., Silberstein, A., & Srivastava, U. (2008). Automatic optimization

of parallel dataflow programs. In USENIX Annual Technical Conference (ATC),

pages 267–273.

D. Karaboga & C. Ozturk (2011). A novel clustering approach: artificial bee colony

(ABC) algorithm. Applied Soft Computing Journal.

David & Frank. Tompa (2007). Optimal top-down join enumeration. School of

Computer Science University of Waterloo Waterloo, Ontario, Canada.

Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large

clusters. In OSDI, pages 137–150.

DeHaan, D., & Tompa, F. (2007). Optimal top-down join enumeration. In SIGMOD,

pages 785–796.

Deng, W., Chain, H., & Li, H. (2014). A Novel Hybrid Intelligence Algorithm for

Solving Combinatorial Optimization Problems. Journal of Computing Science

and Engineering, Vol. 8, No. 4, December, pp. 199-206.

Derakhshan, R., Dehne, F., Korn, O., & Stantic, B. (2006). Simulated annealing for

materialized view selection in data warehousing environment. In Proceedings of

the 24th IASTED international conference on Database and applications, pages

89–94, Anaheim, CA, USA. ACTA Press.

Derakhshan, R., Stantic, B., Korn, O., & Dehne, F. (2008). Parallel simulated annealing

for materialized view selection in data warehousing environments. In

Proceedings of the 8th international conference on Algorithms and

Architectures for Parallel Processing, ICA3PP '08, pages 121–132,

Berlin, Heidelberg. Springer-Verlag.

Dervis & Akay, B. (2009). A comparative study of artificial bee colony algorithm.

Appl Math Comput 214(1):108–132.

Deshpande, Yihong, Z., Prasad M., F., & Amit S. (1998). Simultaneous optimization

and evaluation of multiple dimensional queries. In SIGMOD, pages 271–282.

Dong & Horvath (2007) Understanding Network Concepts in Modules, BMC Systems

Biology 1:24.

Dong, & Shivnath Babu. (2011). Mapreduce programming and cost-based

optimization? crossing this chasm with starfish. PVLDB, 4(12):1446–1449.

Dorigo, M. & Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial

Systems, Oxford University Press, Oxford.(62)

Dorigo, M. & Stützle, T. (2004). Ant Colony Optimization. MIT Press, Cambridge,

2004.(61).

Dorigo, M., & Gambardella, L.M. (1997). Ant colonies for the traveling salesman

problem. BioSystems 43(2), 73–81.

El-Abd, M. (2010). A Cooperative Approach to The Artificial Bee Colony Algorithm.

CEC(79) El-Abd M .(2005). All rights reserved. doi:10.1016/j.aei..01.004.(67).

Elghandour, I., & Vienna, A. (1993) . on Data Engineering, , pp. 345–354. Restore

Reusing results of mapreduce jobs. Austria IEEE Conference , PVLDB,

5(6):586597.

Fender, P., & Moerkotte, G. (2011). A new, highly efficient, and easy to implement

top-down join enumeration algorithm. In ICDE, pages 864–875.

Fender, P., & Moerkotte, G. (2012). Reassessing top-down join enumeration.

TKDE,24(10):1803–1818.

Fender, P., Guido, M., Thomas, N. & Viktor L. (2012). Effective and robust pruning

for top-down join enumeration algorithms. In ICDE, pages 414–425.

Franklin & Joseph. (2003). Flux: An Adaptive Partitioning Operator for Continuous

Query Systems, ICDE.

Franklin, Mistry & Jonsson (1996). Adaptive query processing: Technology in

evolution. IEEE Data Engineering Bulletin.

Frisch, K. (1967). The Dance Language and Orientation of Bees. Cambridge, Mass.:

The Belknap Press of Harvard University Press.

165

Garro, B.A., Sossa, H. & Vazquez, RA. (2011). Artificial neural network synthesis by

means of artificial bee colony algorithm. 2011 IEEE Congress of Evolutionary

Computation (CEC)

Gelogo, Y. E., & Lee, S. (2014). Database Management System as a Cloud Service.

International Journal of Future Generation Communication and Networking,

Vol. 5, No. 2, June 2012.

Geng, K., Dobbie, G., & Meng, Y. (2009). Survey of XML Semantic Query

Optimization. In Proceedings of the 2009 Fourth International Conference on

Internet Computing for Science and Engineering (ICICSE '09). IEEE Computer

Society, Washington, DC, USA, 297-300.

Giakoumakis, L., & Galindo-Legaria, C. (2008). Testing SQL Servers Query

Optimizer: Challenges, Techniques and Experiences. IEEE.

Glover , F., & Ullman, D. (1989). Optimizing joins in a map-reduce environment. In

EDBT, pages 99–110,. Tabu Search-Part I.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine

Learning. Addison-Wesley Pub. Co. ISBN 0201157675.

Graefe, G. (1993b). Query evaluation techniques for large databases. ACM Computing

Surveys, 25(2).

Graefe, G. (1995). The Cascades framework for query optimization. Data Engineering

Bulletin, 18(3).

Graefe, G. (1996). The Microsoft Relational Engine. In Proceedings of the Twelfth

International Conference on Data Engineering (ICDE'96).

Graefe, G., & David, J. (1987). The EXODUS optimizer generator. In Proceedings of

the 1987 ACM International Conference on Management of Data (SIGMOD'87).

Graefe, G., & McKenna, W. (1993a). The Volcano optimizer generator: Extensibility

and efficient search. In Proceedings of the Ninth International Conference on

Data Engineering (ICDE'93).

Guido M. & Thomas N., (2008). Dynamic programming strikes back. In University of

Mannheim Mannheim, Germany

Guido M.(2006). Analysis of two existing and one new dynamic programming, VLDB

'06 Proceedings of the 32nd international conference on Very large data, Pages

930-941

Gupta A, et al. (2001) Crystal structure of Rv2118c: an AdoMet-dependent

methyltransferase from Mycobacterium tuberculosis H37Rv. J Mol Biol

312(2):381-91

Haas, M., Lin, E. T., & Roth, M. A.(2002). Data Integration through Database

Federation. IBM System Journal, VOL 41, No 4.

Hawash, A., Deik, A., & Jarrar, M. (2010). Towards Query Optimization for the Data

Web - Disk Based Algorithms: Trace Equivalence and Bisimilarity. In:

Proceedings of the International Conference on Intelligent Semantic Web -

Services and Applications (ISWSA’10), Amman, Jordan (pp. 131-137).

doi:10.11451874590.

Horng & C. C. Yeh. (2000). Applying genetic algorithms to query optimization in

document retrieval. Information Processing & Management, 36(5), pp. 737–759.

Ibaraki & T. Kameda. (1984). On the optimal nesting order for computing n-relational

joins. ACM TODS, 9(3):482.

Ioannidis, Y. (1997). Query optimization. In Handbook for Computer Science. CRC

Press

Jeanne, R.L. (1986). The evolution of the organization of work in social insects. Monit.

Zool. Ital. 20 (1986) 267–287.

Jehrek, R. (2010). Database systems for management, third edition. University of

Wisconsin Madison, Wisconsin, USA, Book.

Jeya, D., & Mohan, V. (2009). ABC Tester - Artificial Bee Colony Based Software

Test Suite Optimization Approach.

Kadkhodaei, H., & Mahmoudi, F. (2011). A combination method for Join Ordering

Problem in relational databases using Genetic Algorithm and Ant Colony.

Kang & Bhargava (1994). Multi query optimization on algorithm level. Journal Data &

Knowledge Engineering, Volume 14, Issue 1, Nov. 1994, pages 57-75.

Karaboga, D., & Basturk, B. (2008). On the performance of artificial bee colony

(ABC) algorithm . Volume 8, Issue 1, January (2008).

Karnan. (2013). A Comprehensive review of Artificial Bee Colony Algorithm.(70)

Kennedy, J., Eberhart, R. C., & Shi, Y. (2001). Swarm Intelligence. Morgan

Kaufmann, San Francisco, CA.(62).

Kiyoshi O,. & Guy M. Lohman(1990). Measuring the complexity of join enumeration

Knight, G. (2014). Writing a Wellcome Trust Data Management & Sharing

Plan. London School of Hygiene & Tropical Medicine.

Krink, B., & Thomsen, R. (2004). Noisy optimization problems: a particular challenge

for differential evolution. In Proceedings of 2004 Congress on Evolutionary

Computation, IEEE Press, Piscataway, NJ, 2004, pp. 332–339.

Krink, D., & Karaboga, S. (2008). On the performance of ABC algorithm. Applied Soft

Computing, Volume 8, Issue 1, January, pages 687-697.

Krishnamurthy, W.R., Boral, H., & Zaniolo. (1986). Optimization of nonrecursive

queries. In: Proc. of the Conf. on Very Large Data Bases (VLDB), Kyoto, Japan,

pp. 128–137.

Laura & Johann (1989). Extensible query processing in Starburst. In Proceedings of

the ACM International Conference on Management of Data (SIGMOD'89).

Leo Giakoumakis & Cesar Galindo Legaria (2008). Testing SQL Servers Query

Optimizer: Challenges, Techniques and Experiences. IEEE.

Li, N., Liu, Y., Dong, Y., & Gu, J. (2008). Application of Ant Colony Optimization

Algorithm to Multi Join Query Optimization, Springer-Verlag Berlin

Heidelberg.

Lim, H., Herodotou, H., & Babu, S. (2012). Stubby: A transformation-based optimizer

for MapReduce workflows. PVLDB, 5(11):1196–1207.

Ling, C., Yixin, L., Jianli, C., Ling, & Jing, G . (1988). A diversity guaranteed ant

colony algorithm based on queries. In ICDE, pages 311–319.

Liu, S., & Wang, Y. (2010). Quantum dynamic mechanism-based parallel ant colony

optimization algorithm. International Journal of Computational Intelligence

Systems, 3:101–113.(61).

Liu, X., Cai, Z. (2009). Artificial bee colony Programming Made Faster, Fifth

International Conference on Natural Computation(80).

Lohman, O., & Guy, M. (1990).. Measuring the complexity of join enumeration in

query optimization. In Proceedings of the Sixteenth, International Conference

on Very Large Databases (VLDB'90).

Lohman,G. (1988). Grammar-like functional rules for representing query optimization

alternatives. In Proceedings of the 1988 ACM International Conference on

Management of Data (SIGMOD'88), 1988.

Luo, T.S. Pan, P.W. Tsai, & J.S. Pan.(2010). Parallelized artificial bee colony with

ripple-communication strategy. In Genetic and Evolutionary Computing

(ICGEC), 2010 Fourth International Conference on, IEEE, 2010, 350–353.

Management Applications. ISSN 2150-7988 Volume 7 pp. 074-083 © MIR

Labs,www.mirlabs.net/ijcisim/index.htm.special issues.Management Journal,7

437-458.

Maniezzo, V., Dorigo, M., & Colorni, A. (1994). The ant system applied to the

quadratic assignment problem, IRIDIA/94-28. Universite de Bruxelles, Belgium

(80)

Manolescu, I., Bouganim, L., Fabret, F., & Simon, E. (2002). Efficient Querying of

Distributed Resources in Mediator Systems. CoopIS/DOA/ODBASE 2002,

pp. 468–485, 2002.

Matysiak, M. (1995). Efficient optimization of large join queries using Tabu Search.

Information Sciences, 83(1-4), 77–88. doi:10.1016/0020-0255(94)00094-R.

McHugh, J., & Widom, J. (1999). Query Optimization for XML. In proceedings of the

25th Very Large Data Bases Conference, Edinburgh, Scotland.

McHugh, J., Abiteboul, S., Goldman, R., Quass, D., & Widom, J. (1997). Lore: A Database

Management System for Semistructured Data. SIGMOD Record, 26(3):54-66,

September.

McHugh, J., & Widom, J. (1999). Optimizing branching path expressions. Technical

Report, Stanford University.

Mcleod & Khan . (2003). A probe based technique to optimize join queries in

distributed internet bases, Knowledge and Information Systems.

Mehta PK, et al. (1993) Aminotransferases: demonstration of homology and division

into evolutionary subgroups. Eur J Biochem 214(2):549-61.

Mezura-Montes, E., & Cetina-Domínguez, O. (2009). Exploring promising regions of

the search space with The scout bee in the Artificial Bee Colony for Constrained

optimization(79).

Michael m M., Guy, M. & Lohman, M. (2001). DB2's learning optimizer. In

Proceedings of the 27th International Conference on Very Large Databases

(VLDB'01).

Middendorf, M. (2002). Ant colony optimization, in: Tutorial Proc. Genetic and

Evolutionary Computation Conference .

Mistry, P. Roy, S. Sudarshan, & Ramamritham (2001). Materialized view selection

and maintenance using multi-query optimization. In Proceedings of the 2001

ACM-SIGMOD Conference, Santa Barbara, CA. ACM Press.

Moerkotte & T. Neumann. (2006). Analysis of two existing and one new dynamic

programming algorithm for the generation of optimal bushy join trees without

cross products. VLDB '06 Proceedings of the 32nd international conference on

Very large data bases.

Moerkotte, G., & Kemper, A. (1997). Heuristic and randomized optimization for the

join ordering problem. VLDB Journal.

Moerkotte, G., & Neumann, G. (2006). Analysis of two existing and one new dynamic

programming algorithm for the generation of optimal bushy join trees without

cross products. VLDB Endowment, Seoul, Korea.

Moerkotte, G., & Neumann, T. (2006). Dynamic programming strikes back Analysis

of two existing and one new dynamic programming algorithm for the generation

of optimal bushy join trees without cross products. In VLDB, pages 930–941.

Moerkotte, G., & Neumann, T. (2008). Dynamic programming strikes back.

In SIGMOD, pages 539–552.

Montgomery , J., & Randall, M. (2002). Anti-pheromone as a tool for better

exploration of search space. In Proceedings of the Third International Workshop

on Ant Algorithms, ANTS ’02, pages 100–110, London, UK, Springer-

Verlag.(61).

Mukul, J., Praveen, S. (2013). Query Optimization: An Intelligent Hybrid Approach

using Cuckoo and Tabu Search. International Journal of Intelligent Information

Technologies, 9(1), 40-55,

Nakamichi & T. (2004). Diversity control in ant colony optimization. Artificial Life

Robot, 7:198–204.

Narasimhan. (2009). Parallel artificial bee colony (PABC) algorithm. In Nature &

Biologically Inspired Computing, World Congress on, 2009, 306-311.

Nilesh, N., Dalvi, K. S., Sanghai, P., & Sudarshan, S. (2003). Pipelining in multi-query

optimization. J. Comput. Syst. Sci., 66(4):728–762,

Noha, A.R., Yousri, Khalil, M., & Nagwa, M. (2005). Algorithms for selecting

materialized views in a data warehouse. In The 3rd ACS/IEEE International

Conference on Computer Systems and Applications, page 27.

Pandao, M., & Isalkar, A. (2012). Multi query optimization using heuristic approach.

International Journal of Computer Science and Network, ISSN 2277-5420, 2012.

Patricia & Raymond (1997). System R: Relational Approach to Database

Management. ACM Transactions on Database Systems, 1(2), 1976, 97-137.

Patricia, G., Selinger, M., Astrahan, D., Chamberlin, A., & Thomas, P. (1979). Access

path selection in a relational database management system. In Proceedings of the

1979 ACM International Conference on Management of Data (SIGMOD'79).

Petrou & Amiri, "Robust electrical spin injection into a semiconductor

heterostructure", Phys. Rev. B 62, 8180 (2000).

Pham, D.T., Otri, S., & Afify, A. (2007). data clustering using the bees algorithm.

Intelligent Systems Laboratory, Manufacturing Engineering Centre, Cardiff

University, Cardiff CF24 3AA, UK.(69).

Phuboon, J., & Auepanwiriyakul, R. (2007). Selecting materialized views using Two-

Phase optimization with multiple view processing plan. In World Academy of

Science, Engineering and Technology 27,

Phuboonob, J. (2009). Materialized View Selection Using Two-Phase Optimization

Algorithm. PhD thesis, National Institute of Development Administration,

Bangkapi, Bangkok, Thailand.

Pinal Dave, (2014) . Adventureworks 2014. (http://blog.sqlauthority.com).

Pit Fender & Guido Moerkotte. (2012). Reassessing top-down join enumeration. IEEE

Transactions on Knowledge and Data Engineering, 24(10):1803–1818.

Pit Fender & Guido Moerkotte. (2013). Top down plan generation: From theory to

practice. In ICDE, pages.

Plale, B. and K. Schwan (2003). Dynamic Querying of Event Streams with the

dQUOB System, IEEE Transactions on Parallel and Distributed Systems, IEEE

Computer Science Press, Vol. 14, No. 3, pp. 422–432.

Pulikanti, S., Singh, A. (2009). An ABC for the Quadratic Knapsack problem. ICONIP

2009, pp. 196-205.

Quan & Xinling Shi (2008). On the analysis of performance of the improved artificial-

bee-colony algorithm, in natural computation, 7, 2008, 654-658.

Randall, M., & Tonkes, E.(2002). Intensification and diversification strategies in ant

colony system. Complexity International, 9, (61).

Rashid & Ali (2009). Efficient Transformation of a Natural Language Query to SQL

for Proceedings of the Conference on Language & Technology 2009

Ribeiro, C. C., Ribeiro, C. D., & Lanzelotte, R. S. (1997). Query optimization in

distributed relational databases. Journal of Heuristics, 3(1), 3–23.

doi:10.1023/A:1009670031749.

Riley, J. R., Greggers, U., Smith, A. D., Reynolds, D. R., & Menzel, R. (2005). The

flight paths of honeybees recruited by the waggle dance. Nature, vol. 435, no.

7039, pp. 205–207.

Rizzolo, F., & Mendelzon , A. (2001). Indexing XML Data with ToXin. In Proc. 4th

Int. Workshop on the Web and Database (in Conjunction with ACM SIGMOD),

Santa Barbara, CA, May.

Rodríguez, M. (2010). Automata Theory Based Approach to the Join Ordering

Problem in Relational Database Systems.

Rosenthal and U. S. Chakravarthy (1988). Anatomy of a modular multiple query

optimizer. Intl. Conf. Very Large Databases, pages 230–239, 1988.

Roussopoulos (2000). WebView Materialization. In the Proceedings of the

ACM SIGMOD International Conference on Management of Data, Dallas,

Texas, USA, May 2000.

Roy, P., Seshadri, S., & Siddhesh, B. (2000). Efficient and extensible algorithms for

multi query optimization. SIGMOD Rec., 29(2):249–260.

Russell, C., Eberhart & Yuhui, S., (1998). Comparison between Genetic Algorithms

and Particle Swarm Optimization. Evolutionary Programming VII,–

Springer.(67)

S. Abiteboul, S. Cluet, V. Christophides, T. Milo, G. Moerkotte, and J. Simeon

(1998). Querying documents in object databases. Intl. Journal on Digital

Libraries, 1:5–19.

Salminen, A., & Tompa, F. Wm. (1994). PAT expressions: an algebra for text search.

In Acta Linguistica Hungarica 41, pages 277–306.

Seeley, T.D., Visscher, P.K., & Passino, K.M. (2006). Group decision making in honey

bee swarms. American Scientist 94: 220–229. doi:10.1511/2006.3.220.

Seeley, T.D. (1995). The Wisdom of the Hive: The Social Physiology of Honey Bee

Colonies. Harvard University Press, Cambridge, MA.

Sellis & Timos, K. (1988). Multiple-query optimization. TODS, 13(1):23–52.

Tomasz, N., Michalis, P., Chaitanya, M., George, K., & Nick, K. (1997). MRShare:

Sharing across multiple queries in MapReduce. PVLDB, 3(1-2).

Seyed, H., Talebian & Sameem, A. (2009). Using genetic algorithm to select

materialized views subject to dual constraints. In 2009 International Conference

on Signal Processing Systems, pages 633–638, Singapore, 2009.

Shams, I., & Aryanezhad, M. (2010). Optimization of a Multiproduct CONWIP-based

Manufacturing System using Artificial Bee Colony Approach. IMECS, Hong

Kong(80).

Shekita Lapis, G., & Wilim (1993). Starburst mid-flight: As the dust clears. IEEE

Transactions on Knowledge and Data Engineering, 2(1).

Shekita, E., Young, H., & Tan, K.L. (1993). Multi-join optimization for symmetric

multiprocessors. In: Proc. of the Conf. on Very Large Data Bases (VLDB),

Dublin, Ireland, pp. 479–492.

Shuang, B., Chen, J., & Li, Z. (2009). Study on hybrid psaco algorithm. Applied

Intelligence, pages 1–10.(61).

Simon & Jim Melton (1993). Understanding the New SQL: A Complete Guide. Morgan

Kaufmann.

Singh, S.K. (2006). Database systems: concepts, design and applications. Pearson

Education, first impression, ISBN 81-7758-567-3

Sinha & Craig M. Chase (1996): Prefetching and Caching for Query Scheduling in a

Special Class of Distributed Applications. ICPP, Vol. 3 : 95-102.

Sonmez. (2010). Design of Fiber Reinforced Laminates for Maximum Fatigue Life.

Procedia Engng 2, 251-256.

Sood & Qureshi (1985). Database machines: illustrated. Computers Systems

Architecture: Springer-Verlag. ISBN 0387171649.

Souley, B., & Mohamed, D. (2013). Performing analysis of query optimizers under

variety of hardware components in RDBMS. In Journal of Computer Engineering and

Information Technology, Journal of Software 13(2), 250–256 (2002).

Srivastava, D Han, MA Rico-Ramirez, M Bray, T Islam.,(2012). Selection of

classification techniques for land use/land cover change investigation, Advances

in Space Research 50 (9), 1250–1265.

Steinbrunn, M., Moerkotte, G., & Kemper, A. (1997). Heuristic and randomized

optimization for the join ordering problem. The Very Large Data Bases Journal,

6(3), 191–208. doi:10.1007/s007780050040.

Surajit (1998). An overview of query optimization in relational systems. In Proceedings

of the Symposium on Principles of Database Systems (PODS).

Surjanovic, S. & Bingham, D. (2013). Virtual Library of Simulation Experiments: Test

Functions and Datasets. Retrieved May 28, 2016, from

http://www.sfu.ca/~ssurjano.

Swami, A., Iyer, B. (1993). A polynomial time algorithm for optimizing join queries.

In: Proc. IEEE Conf. on Data Engineering, Vienna, Austria: 345–354.

Tae, S., & Hyoung, J. (2002). Extracting indexing information from XML DTDs. Inf.

Process. Lett. 81, 2 (January 2002), 97-103.

Talbi, E. G., Roux, O., Fonlupt, C., & Robillard, D. (2009). Parallel ant colonies for the

quadratic assignment problem. Future Generation Computer Systems, 17:441–449.

Tang YY, et al. (2010) Does short-term mental training induce grey matter change

Jurnal Progress in Modern Biomedicine 2010 Vol. 10 No. 15 pp. 2961-2963.

1673-6273

Teodorovi, C.D., & Dell M. (2005). Bee colony optimization: a cooperative learning

approach to complex transportation problems. In: Proceedings of the 10th EWGT

Meeting, Poznan, 13–16 September 2005.

Tereshko, CV., & Loengarov, A. (2005). “Collective Decision-Making in Honey Bee

Foraging Dynamics”. Computing and Information Systems Journal, ISSN 1352-

9404, vol. 9, No 3.(72)

Tereshko, V. (2000). Reaction–diffusion model of a honeybee colony’s foraging

behavior. In: M. Schoenauer (Ed.), Parallel Problem Solving from Nature VI,

Lecture Notes in Computer Science, vol. 1917, Springer–Verlag, Berlin(72).

Tereshko, V., & Lee, T. (2002). How information mapping patterns determine

foraging behavior of a honeybee colony. Open Systems and Information

Dynamics 9 181–193(72).

Tereshko, V., & Loengarov, A. (2005). Collective Decision-Making in Honey Bee

Foraging Dynamics. Comput. Inf. Sys. J., 9(3): 1–7. International Journal of

Computer Science and Information Technologies, Vol. 5 (3), 2014, 3052-305.

http://www.sersc.org/journals/IJFGCN/vol5_no2/6.pdf.

Thiele, L., Miettinen, K., Korhonen, P.J., & Molina, J. (2009). A preference-based

evolutionary algorithm for multi-objective optimization. Evolutionary

Computation, 17(3), 411–436.

Thusoo, A., & Borthakur, D. (2010). Data Warehousing and Analytics Infrastructure at

Facebook.

Tomasz, N., Michalis P., Chaitanya, M., George K., & Nick K. (2010). MRShare:

Sharing across multiple queries in MapReduce. PVLDB, 3(1-2):494–505.

Trummer and I. Koch. (2015). Multi-objective parametric query optimization. PVLDB,

8(3):221–232, 2015.

Tsai, et.al .(2009). Enhanced Artificial Bee Colony Optimization, Innovative

Computing, vol.5, pp.1349-4198, Aug.2009.

TUBA, M. (2013). Artificial Bee Colony (ABC) Algorithm, Exploitation and

Exploration Balance. Latest Advances in Information Science and Applications.

Upen, S., & Jack, M. ( 1986). Multiple query processing in deductive databases using

query graphs. In VLDB, pages 384–391.

Vidya Banu & N. Nagaveni. (2013) Evaluation of a perturbation-based technique for

Visscher, PK., & Seeley, TD. (1982). Foraging strategy of honey bee colonies in a

temperate deciduous forest. Issue 6, 63:1790–1801(72).

Vivek, S., & Brajesh, P (2012). An Idea of Extraction of Information Using Query

Optimization and Rank Query. (2012). International Journal of Advanced

Research Vol., 74. John Wiley & Sons, New York.VLDB, pages 930–941.

Wang & Beni (2011). An Improved Artificial Bee Colony Algorithm, IEEE Bolaji

AL, Khader AT, Al-betar MA.

Wang, R.Y., Strong & Guauacio, L.M. (1996). Beyond accuracy: what data quality

means to data consumers. Journal of Management Information Systems, 12(4), 5-33.

Wang, X., Burns, R., Terzis, A., & Sun. (2008). Network-Aware Join Processing in

Global Scale Database Federations. ICDE 2008.

Wei, S., Daxin, L., Wansong, Z. ( 2004 ). An efficient method for XML queries

optimization based DTD abstraction and classification. Intelligent Control and

Automation,. WCICA 2004. Fifth World Congress on , vol.5, no., pp. 3926-

3929 Vol.5.

Wu, S., & Banzhaf, W. (2008). The use of computational intelligence in intrusion

detection systems: a review.

Wu, S., Feng L., Sharad M. & Beng, O. (2011). Query optimization for massively

parallel data processing. In SOCC, pages 12:1–12:13. Scientific & Engineering

Research Volume 2, Issue 9, ISSN 2229-5518.

Wu, Y., Patel, J.M., & Jagadish, H.V. (2003). Structural join order selection for XML

query optimization. In: ICDE, pp. 443-454. IEEE Computer Society, New York

Xue, H., Zhang, P., & Yang, L. (2010). A multiple ant colonies optimization algorithm

based on immunity for solving tsp. pages 289–293, (61).

Yang & Deb (2009). Cuckoo search via Lévy flights. World Congress on Nature &

Biologically Inspired Computing (NaBIC 2009). IEEE Publications. pp. 210–

214

Yang, J., Karlapalem, K., & Li, O. (1997). Algorithms for materialized view design in data

warehousing environment. In Proceedings of the 23rd International Conference on

Very Large Data Bases, pages 136–145. Morgan Kaufmann Publishers Inc.

Yongwen, X. (1998). Efficiency in the Columbia database query optimizer. M.S.

Thesis, Portland State University.

Zafarni, E. (1993). New method for optimizing join queries processing in heterogeneous

distributed database. IEEE in knowledge discovery and data mining search. In

Proceedings of the Ninth International Conference on Data Engineering

(ICDE'93).

Zhang & Lin. (2010). An adaptive heterogeneous multiple ant colonies system.

volume 1, pages 193–196,.(61).

Zhang, & Chen, Y. (2011). Best-worst ant system. In 2011 3rd International

Conference on Advanced Computer Control. pages 392–395, (61).

Zhang, C., & Yang, J. (1999). Genetic algorithm for materialized view selection in

data warehouse environments. In Proceedings of the First International

Conference on Data Warehousing and Knowledge Discovery, pages 116–125.

Springer-Verlag.

Zhou, J., & Larson, P. (2007). Efficient exploitation of similar subexpressions for

query processing. In SIGMOD, pages 533–544.
