AN EFFICIENT MULTI JOIN QUERY OPTIMIZATION FOR RELATIONAL
DATABASE MANAGEMENT SYSTEM USING
SWARM INTELLIGENCE APPROACHES
AHMED KHALAF ZAGER ALSAEDI
A thesis submitted in
fulfillment of the requirements for the award of the
Doctor of Philosophy in Information Technology
Faculty of Computer Science and Information Technology
Universiti Tun Hussein Onn Malaysia
JUNE 2016
DEDICATION
To my Lord Allah, my Creator; to my teacher and master, the Messenger Mohamed bin Abdullah
(peace be upon him); to my beloved mother; to my beloved family, my wife and children; and to
all the people in my life who have touched my heart, I dedicate this research.
ACKNOWLEDGEMENT
In the name of Allah, the beneficent, the merciful
I would like to express my deepest appreciation to my supervisor, Prof. Dr. Hajah
Rozaida Bt. Ghazali. Without her guidance and persistent help, this thesis would not have
been finished. Over the last few years, she has spent countless hours patiently guiding me
to build interesting ideas, strengthen the algorithms, and improve my writing. As a
supervisor, she has shown wisdom, insight, wide knowledge, and a conscientious attitude,
all of which have set me a good example of how to become a good researcher.
I would like to thank all my friends in the database group, who have made my
Ph.D. life more colorful. I would also like to express my sincere gratitude to Prof. Dr.
Mustafa Mat Deris, who helped me with the research methodology and made me capable
of completing this research work.
Finally, I would like to thank my mother, Rashida, without whose continuous
support and encouragement I would never have been able to achieve my goals or make
the decisions I made during my Ph.D. life.
ABSTRACT
Multi Join Query Optimization (MJQO) has become a centre of attention in the context of Database Management Systems (DBMS). Its functions include combining data from multiple tables, reducing the number of queries needed, optimizing the Query Execution Plan (QEP), and moving processing to database servers to enhance both data integrity and performance. MJQO is an optimization task that seeks the optimal QEP of a relational DBMS (RDBMS) during query processing. A major problem with RDBMSs is that they are still unable to fully meet the demands of big data, and most MJQO techniques explore the solution space very slowly. Many queries gather information from multiple sites or relations, while each relation must answer these queries with its limited resources. This leads to data being accessed from many locations with limited memory capacity, which inevitably increases the size of the database, the number of joins, and the Query Execution Time (QET). To avoid becoming trapped and converging slowly in the search for the optimal QEP, and to reduce query execution time, this work proposes three optimization algorithms based on Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and a Two-Phase Artificial Bee Colony (TPABC) to solve the optimization problem in the RDBMS framework. The TPABC algorithm solves MJQO problems through simulation, increasing both exploration and exploitation while balancing them to obtain optimal results for the given queries. A directed acyclic graph, based on a materialized query graph, helps the optimization algorithms solve MJQO by pruning non-promising QEPs, which reduces the QEP combination space. Finally, experimental results show that the computational-time performance of TPABC, compared with PSO, ACO, and the native technique, is very promising, indicating that the TPABC algorithm can solve MJQO problems in less time and at lower cost than the other approaches.
ABSTRAK
To date, it is clear that Multi Join Query Optimization (MJQO) has received a great deal of attention in the field of Database Management Systems (DBMS). Its functions consist of combining data from multiple tables, reducing the number of queries required, optimizing the Query Execution Plan (QEP), and moving processing to many database servers to improve data integrity and performance. MJQO is an optimization task: it describes the search for the optimal QEP of a DBMS in query processing. However, most MJQO techniques obtain their solutions at a very slow rate. Therefore, to overcome the problems of becoming trapped, slow convergence in the search for the optimal QEP, and slow query execution time, this study proposes improvements to three optimization algorithms. The improved MJQO is inspired by Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and the two-phase behavior of the Artificial Bee Colony (ABC), which are used to solve the problem in the RDBMS framework. The main objective of this study is to optimize the QEP and reduce the Query Execution Time (QET) in RDBMS using a swarm intelligence approach inspired by three optimization algorithms: ABC, PSO, and ACO. Accordingly, the improved Two-Phase Artificial Bee Colony algorithm (TPABC) is used to solve the MJQO problem through simulation, increased exploitation and search quality, and a balance between them, in order to obtain optimal results for the given queries. The graph structure is represented by a directed acyclic graph based on the materialized query graph; to help the optimization algorithms solve the MJQO problem, unsuitable QEPs are pruned, thereby reducing the QEP combination space. Finally, the experimental results show that the performance of TPABC compared with PSO, ACO, and the naïve technique in terms of computation time is very encouraging, indicating that the TPABC algorithm can solve the MJQO problem in a short time and at a lower cost than other techniques.
TABLE OF CONTENTS
TITLE i
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF PUBLICATIONS xi
LIST OF TABLES xiii
LIST OF FIGURES xvi
LIST OF SYMBOLS AND ABBREVIATIONS xx
LIST OF APPENDICES xxii
CHAPTER 1 INTRODUCTION 1
1.1 Research Background 1
1.2 Research Problems 4
1.3 Aims of Research 8
1.4 Research Objective 8
1.5 Significance of Research Contribution 8
1.6 Scope of Research 9
1.7 Thesis Organization 10
CHAPTER 2 LITERATURE REVIEW 11
2.1 Introduction 11
2.2 Advantages of Database Management System 11
2.3 Query Optimization 13
2.3.1 Optimization in RDBMS 14
2.3.2 Architecture of Query Optimizer 16
2.4 Join Methods 19
2.4.1 Natural Join 20
2.4.2 Outer Join 21
2.4.3 Left Join 23
2.4.4 Right Outer Join 24
2.5 MJQO in RDBMS 24
2.6 Advantages of Multi-Join Query Optimization 25
2.7 Techniques for MJQO in RDBMS 27
2.8 Manipulation Database: SQL 29
2.9 Swarm Intelligence Overview 31
2.10 Exploration and Exploitation Properties in SI 33
2.11 Swarm Intelligent Algorithms 35
2.11.1 Ant Colony Optimization 35
2.11.2 Particle Swarm Optimization (PSO) 38
2.11.3 Artificial Bee Colony Algorithm 40
2.11.4 Behavior of Real Bees 41
2.11.5 Modified Versions of ABC 46
2.12 Application of Swarm Intelligence in MJQO 50
2.13 Scenario leading to The Research Framework 52
2.14 Chapter Summary 56
CHAPTER 3 RESEARCH METHODOLOGY 57
3.1 Introduction 57
3.2 Research Methodology Framework 58
3.3 Database 60
3.4 The Design of MJQO 60
3.5 Complexity of MJQO problem 62
3.5.1 Baseline Join Enumeration Algorithms 63
3.5.2 Bottom-up Enumerations 68
3.5.3 Comparison of DPsize and DPset 69
3.6 Proposed MJQO Based on Swarm Intelligent Approaches 69
3.6.1 Artificial Bee Colony (ABC) 70
3.6.2 Particle Swarm Optimization (PSO) 75
3.6.3 Ant Colony Optimization (ACO) 78
3.7 The Standard Test Function Performance of SI Approaches 81
3.8 Multi-Join Query Optimization Techniques 90
3.9 Graphical Representation 91
3.10 Multi-View Processing Plan 97
3.11 Chapter Summary 102
CHAPTER 4 THE PROPOSED IMPROVED ABC ALGORITHM 103
4.1 Introduction 103
4.2 Notation 105
4.3 Artificial Bee Colony Algorithm 105
4.4 Improved ABC Algorithm for MJQO Problem 106
4.4.1 Subset Function 107
4.4.2 QEP Function 107
4.4.3 Cost Estimate Function 108
4.4.4 QET Function 109
4.5 Proposed Two Phase Artificial Bees Colony 110
4.5.1 First phase (Employed Bee) 113
4.5.2 Second phase (Onlooker Bee) 121
4.6 Pruning Technique 124
4.7 Optimization 126
4.8 Chapter Summary 127
CHAPTER 5 SIMULATION RESULT 128
5.1 Introduction 128
5.2 QET in RDBMS based on My SQL Server 129
5.2.1 Experiment One 130
5.2.2 Experiment Two 131
5.2.3 Experiment Three 133
5.2.4 Experiment Four 135
5.2.5 Experiment Five 137
5.3 Optimized and Unoptimized Query effect 139
5.4 Time Complexity 141
5.5 Standard Test Function Performance of SI Approaches 147
5.6 MJQO with Proposed Optimization Algorithm Two-Phase ABC 150
5.6.1 The effect on Number of Queries 151
5.6.2 The effect on Number of Data size 153
5.7 Effectiveness of ABC Algorithm Over Naïve Heuristic Algorithm 154
5.8 Efficiency of Two-Phase ABC Algorithm 156
5.9 Optimization Using TPABC Algorithm for Evaluation Time 158
5.10 Discussions 160
5.11 Chapter Summary 161
CHAPTER 6 CONCLUSION AND RECOMMENDATIONS 162
6.1 Introduction 162
6.2 Summary of Findings 163
6.3 Contribution of the Research 166
6.4 Recommendations and Future Work 167
REFERENCES 169
APPENDIX 186
LIST OF PUBLICATIONS
Journals:
(i) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, “An
Efficient Multi Join Query Optimization for DBMS using Swarm Intelligent
Approach”, published by IEEE, 8-11 Dec., DOI: 10.1109/WICT.2014.7077312.
(ii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, “Materializing
Multi Join Query Optimization for DBMS Using Swarm Intelligent Approach”,
International Journal of Computer Information Systems and Industrial Management
Applications (IJCISIM), ISSN 2150-7988, Volume 7 (2015), pp. 074-083, special issue,
© MIR Labs, www.mirlabs.net/ijcisim/index.htm.
(iii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris,
“Materialized View Selection for Query Optimization in Data Warehouse
System Using Heuristic Approaches”, Journal of Next Generation
Information Technology, Vol. 6, No. 3, pp. 13-24, 2015 (Scopus).
(iv) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, “Improved
MJQO for DBMS Using Swarm Intelligent Approach”, Advanced Science Letters,
Volume 20, Number 10/11/12, American Scientific Publishers,
ISSN 1936-6612, 1936-7317, 2016 (Scopus).
Proceedings:
(i) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, “An
Efficient Multi Join Query Optimization for RDBMS Using Swarm Intelligent
Approach”, in Proceedings of the Fourth World Congress on Information and
Communication Technologies (WICT 2014), December 08-10, 2014,
Malacca, Malaysia.
(ii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, “An
Efficient Multi Join Query Optimization for Relational Database Management
System Using Two Phase Artificial Bees Colony Algorithm”, presented at
IVIC'15 - 4th International Visual Informatics Conference, Hotel Bangi-Putrajaya,
Kuala Lumpur, 17-19 November; published in Advances in Visual Informatics,
Lecture Notes in Computer Science, Volume 9429, pp. 213-226 (LNCS, 2015).
(iii) Ahmed Khalaf Zager, Rozaida Ghazali and Mustafa Mat Deris, “Query Optimization for RDBMS Using Swarm Intelligence Approaches”,
International Symposium of Information and Internet Technology,
MALTESAS conferences, 26-28 January 2016, Melaka, Malaysia.
LIST OF TABLES
2.1 Enroll table for natural join 21
2.2 Student table for natural join 21
2.3 Result example of two-table join 21
2.4 Student table for outer join 22
2.5 Faculty table for outer join 22
2.6 Outer join for two tables 22
2.7 Student table for left join 23
2.8 Faculty table for left join 23
2.9 Left join for two tables 23
2.10 Student table for Equijoin 24
2.11 Faculty table for Equijoin 24
2.12 Right outer Equijoin for two tables 24
2.13 Student table for query 1 29
2.14 Result of query 1 30
2.15 Student table for query 2 30
2.16 Enroll table for query 2 30
2.17 Result for query 2 30
2.18 Swarm intelligent techniques 35
2.19 An overview of ABC based on honey bee behavior 49
3.1 Comparison of MJQO problem based on ∣P∣, ∣T∣ 62
3.2 An example of particle coding 77
4.1 Simple notation used in this chapter 105
4.2 Running example of queries and plans 112
4.3 An example illustrating the QET enumeration algorithm 119
5.1 Running queries for experiment one 130
5.2 QET and field number for experiment one 131
5.3 Running queries for experiment two 132
5.4 QET and Field Number for Experiment Two 133
5.5 Running queries for Experiment Three 134
5.6 QET and field number for Experiment three 135
5.7 Running queries for experiment four 136
5.8 Query in single and multi-join to experiment four 136
5.9 Running queries for experiment five 138
5.10 QET and field number for experiment five 138
5.11 Running example of Query optimization 140
5.12 Improvement factor of DPopt over DPset 141
5.13 Optimization time in chain queries 142
5.14 Optimization time in cycle queries 143
5.15 Time optimization for star queries 145
5.16 Time optimization for clique queries 146
5.17 The results obtained by PSO, ACO and ABC algorithms 147
5.18 Mean of best function values obtained over 50 cycles by the ABC algorithm under different colony sizes 148
5.19 QET based on four optimization algorithms 152
5.20 Comparison of two algorithms combined with two techniques 153
5.21 QET for two-phase ABC 159
LIST OF FIGURES
1.1 Multi Join Query Optimization problem 7
2.1 Executing SQL Queries in RDBMS 15
2.2 Sample architecture of the query optimization in DBMS 16
2.3 QEP for various instances of a template query 18
2.4 Swarm Intelligent Capability and Benefit 32
2.5 Ant Colony Optimization ACO 36
2.6 Working of Ant Colony Optimization 37
2.7 Pseudocode for ACO 37
2.8 Working of Particle Swarm Optimization 39
2.9 Pseudocode for PSO 40
2.10 Pseudocode for ABC 45
2.11 Scenario leading to the research framework 55
3.1 Research Methodology Framework 59
3.2 Example of query graph types 61
3.3 Bottom-up order : DPSize 64
3.4 Bottom-up order : DPSet 66
3.5 Bottom-up enumeration: DPoptimization 68
3.6 Example of join processing tree 71
3.6 (a) Left deep tree 71
3.6 (b) Right deep tree 71
3.6 (c) Bushy tree 71
3.7 Join Operation Between Two Relations 72
3.8 Simulation of the Bees Behaviour with MJQO 75
3.9 ACO for MJQO 80
3.10 Schaffer Function 82
3.11 Evaluation of Mean Best Values for Schaffer Function 83
3.12 Source code of Schaffer Function 83
3.13 Sphere Function 84
3.14 Evaluation of Mean Best Values for Sphere Function 84
3.15 Source code of Sphere Function 85
3.16 Griewank Function 85
3.17 Evaluation of Mean Best Values for Griewank Function 86
3.18 Source code of Griewank Function 87
3.19 Rastrigin Function 87
3.20 Evaluation of Mean Best Values for Rastrigin Function 88
3.21 Source code of Rastrigin Function 88
3.22 Rosenbrock Function 89
3.23 Evaluation of Mean Best Values for Rosenbrock Function 89
3.24 Source code of Rosenbrock Function 90
3.25 Query Evaluation Based on Swarm Intelligent Technique 91
3.26 (a) Initial Graph 92
3.26 (b) First Iteration 93
3.26 (c) Second Iteration 94
3.27 Remove invalid edges 95
3.28 Pruning Technique 96
3.29 Example Queries 98
3.30 (a) Query graph for query 1 98
3.30 (b) Query graph for query 2 99
3.30 (c) Apply MVPP merged plan for queries 99
3.31 MVPP merged plan for queries 100
4.1 Simple Examples of MJQO Based on Proposed algorithm 104
4.2 Flowchart of Proposed TPABC Algorithm 111
4.3 Phase Two with Pruning Technique 126
5.1 Effect of Field Number for Experiment 1 131
5.2 Effect of Field Number for Experiment 2 133
5.3 Effect of Field Number for Experiment 3 135
5.4 Effects of Field Number on QET 137
5.5 Effect of Number of Columns on QET 139
5.6 Effect of optimizing Join Queries on QET 140
5.7 Chain Query for Set Tables 142
5.8 Relative Performance of Chain Query 143
5.9 Cycle Queries for Set Tables 143
5.10 Relative Performance for Cycle Queries 144
5.11 Star Queries for Set Tables 144
5.12 Relative Performance for Star Queries 145
5.13 Clique Queries for Set Tables 146
5.14 Relative Performance for Clique Query 146
5.15 The Result Obtained by PSO, ACO, and ABC Algorithms 148
5.16 Evaluation of Mean Best Values for Schaffer Function 149
5.17 Evolution of mean best values for Sphere Function 150
5.18 Effectiveness of optimization algorithm 152
5.19 Accuracies of Different Optimization Algorithms 153
5.20 Effect of Data Size 154
5.21 Two-Combined ABC Techniques 156
5.22 Effects of Number of Queries on QET 157
5.23 Effect of Number of Relation on QET 157
5.24 Optimization Time for Two-phase ABC 159
LIST OF SYMBOLS AND ABBREVIATIONS
MJQO - Multi join query optimization
SJQO - Single Join Query Optimization
RDBMS - Relational database management system
QET - Query execution time
QEP - Query execution plan
ABC - Artificial bee colony
PSO - Particle swarm optimization
ACO - Ant colony optimization
NT - Native Technique
N (Ri) - Set of neighbors for a relation Ri ∈ R w.r.t. G
N(S) - Set of neighbors for a set of relations S ⊆ R w.r.t. G
Min(S) - Relation with the smallest subscript index in a set of relations S
TPABC - Two-Phase Artificial Bees Colony Algorithm
ACOMJQO - Ant Colony Optimization for MJQO
Ci - Set of connected subsets of R with a cardinality of i
$P_S^k$ - Set of k-way partitions of a connected subset S
$P_S$ - Set of partitions of a connected subset S
P - Set of partitions of all the connected subsets in C
$T_S$ - Multiset of connected subsets in all partitions in $P_S$
T - Multiset of connected subsets in all partitions in P
$I_S$ - Set of interesting plans for a connected subset S
CSE (QEP′) - Set of CSEs of a plan QEP′ w.r.t. Q
Cost (QEP′) - Cost of a plan QEP′
JoinExp (QEP′) - Join expression associated with a plan QEP′
CSE (QEP) - Set of CSEs of a query execution plan
R = {R0, · · ·, Rn−1} - Set of relations in Q
G = (V, E) - Query graph for Q
$C = \bigcup_{i=2}^{n} C_i$ - Set of connected subsets of R with a cardinality of at least 2
$Q = \{Q_1, \ldots, Q_n\}$ - Set of queries
$U_i = \{U_{i1}, \ldots, U_{i|U_i|}\}$ - Set of all the possible plans for Qi
$W_i = \{W_{i1}, \ldots, W_{i|W_i|}\}$ - Set of all the possible plans for Qi
LIST OF APPENDICES
A: System analysis and configuration database 187
B: Querying from Multiple Tables 200
C: Source Code 207
CHAPTER 1
INTRODUCTION
1.1 Research Background
A database management system (DBMS) is a computer software application that
interacts with the user, other applications, and the database itself to capture and analyze
data. A general-purpose DBMS is designed to allow the definition, creation, querying,
update, and administration of databases. A relational DBMS (RDBMS) is a DBMS based
on the relational model. As of 2015, many of the most frequently used databases are based
on the relational database model.
Multi join query optimization (MJQO) is arguably one of the most important
techniques for searching and retrieving information from a DBMS in less time. The rapid
growth in the amount of data available in the world has compelled DBMSs to manage
their data efficiently, which plays a big role in the storage management and maintenance
of the data (Wang & Strong, 1996).
Information retrieval is another major aspect of data management. It is the
process of accessing data from relational databases by issuing queries against them.
Structured Query Language (SQL) is a programming language designed for organizing,
manipulating, and retrieving data in an RDBMS (Srivastava & Han, 2012).
A query in an RDBMS can be executed in multiple ways. Because each query
contains SQL clauses and filters, a large number of alternative Query Execution Plans
(QEPs) are possible, which makes selecting the optimal QEP the main difficulty.
A QEP is represented as a query tree that includes the access method chosen for
each relation as well as the algorithms used to compute the relational operations in the
tree. The next important step is to generate code for the selected QEP, which is then
executed in either compiled or interpreted mode to produce the query results (Singh, 2006).
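As an illustration of the structure just described, the following minimal Python sketch models a QEP as a tree whose leaves record an access method for each relation and whose inner nodes record a join algorithm. The class name `PlanNode`, the operator labels, and the three-table plan over the Student, Enroll, and Faculty tables are hypothetical choices for illustration, not the plan representation used in this thesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """One node of a (hypothetical) query execution plan tree."""
    operator: str                   # join algorithm or access method, e.g. "HashJoin", "SeqScan"
    relation: Optional[str] = None  # set only on leaf nodes (access to a base relation)
    children: List["PlanNode"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Return an indented, EXPLAIN-like text form of this subtree."""
        label = self.operator if self.relation is None else f"{self.operator}({self.relation})"
        lines = ["  " * depth + label]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


# Hypothetical plan: (Student JOIN Enroll) JOIN Faculty.
plan = PlanNode("HashJoin", children=[
    PlanNode("NestedLoopJoin", children=[
        PlanNode("SeqScan", "Student"),
        PlanNode("IndexScan", "Enroll"),
    ]),
    PlanNode("SeqScan", "Faculty"),
])
print(plan.render())
```

Keeping the access method on the leaves and the join algorithm on the inner nodes mirrors the description above: choosing a QEP means choosing both the tree shape and the operator at every node.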
When a query is submitted, the query optimizer checks its validity and considers
a large number of execution strategies for it. Because so many alternative execution plans
are possible, it is generally not feasible to analyze every possible query execution plan.
The inability to work with large amounts of data is a problem, and the major
concern pertaining to this flaw is the inability to select an optimal QEP for execution.
The MJQO problem arises as the number of joins in the query tree increases, since each
additional join increases the number of possible QEPs. The traditional approach of
examining these plans is very costly and time-consuming.
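To make this growth concrete, the following Python sketch counts join trees using the standard textbook formulas, which allow cross products and treat A JOIN B and B JOIN A as different trees: n! left-deep trees and (2n-2)!/(n-1)! bushy trees for n relations. The function names are illustrative, and the sketch is not part of the method proposed in this thesis.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Left-deep join trees over n relations (cross products allowed): n!."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Bushy join trees over n relations (cross products allowed): (2n-2)!/(n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (2, 4, 6, 8, 10):
    print(f"{n:>2} relations: {left_deep_orders(n):>10} left-deep, {bushy_trees(n):>12} bushy")
```

For four relations this gives 24 left-deep orders and 120 bushy trees; for ten relations the bushy count already exceeds 17 billion, which is why exhaustive costing quickly becomes impractical.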
The problem of finding the optimal join order in query optimization is NP-hard (Leo &
Cesar, 2008). To reduce its complexity, well-accepted heuristics are applied in RDBMSs
(Moerkotte & Neumann, 2006). For example, David & Frank (2007) considered all bushy
plans but excluded cross products from the enumeration space. Consequently, in many
cases the query optimizer settles for a plan that is only near-optimal.
Choosing an optimal QEP depends on the number of tuples processed by the query.
The query optimizer therefore relies primarily on statistical information to estimate tuple
counts, and the quality of its choice depends on the accuracy of these estimates.
Improving the selection of an optimal QEP comes at the price of additional CPU cost and
memory consumption. Cost estimation models are mathematical algorithms or parametric
equations used to estimate the cost of a QEP in terms of time or memory consumption
(Dong & Shivnath, 2011).
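A minimal sketch of such a cost model, in Python, is shown below. It uses C_out, a common textbook cost function that sums the estimated sizes of intermediate results, together with the usual assumption that join predicates are independent. The function name, the cardinalities, and the selectivities are hypothetical; this is not the cost estimate function used in this thesis.

```python
from typing import Dict, List, Tuple


def estimate_cout(order: List[str],
                  card: Dict[str, int],
                  sel: Dict[Tuple[str, str], float]) -> float:
    """C_out cost of a left-deep join order: the sum of estimated intermediate result sizes.

    Assumes independent join predicates: |L JOIN R| = |L| * |R| * (product of the
    selectivities between R and every relation already in L). A missing pair
    means no join predicate, i.e. a cross product with selectivity 1.0.
    """
    joined = [order[0]]
    size = float(card[order[0]])
    cost = 0.0
    for rel in order[1:]:
        factor = 1.0
        for prev in joined:
            factor *= sel.get((prev, rel), sel.get((rel, prev), 1.0))
        size = size * card[rel] * factor
        cost += size  # each intermediate result, and the final one, is counted
        joined.append(rel)
    return cost


# Hypothetical statistics (not taken from the thesis experiments).
card = {"Student": 1000, "Enroll": 5000, "Faculty": 10}
sel = {("Student", "Enroll"): 0.001, ("Student", "Faculty"): 0.1}

print(estimate_cout(["Student", "Enroll", "Faculty"], card, sel))
print(estimate_cout(["Enroll", "Faculty", "Student"], card, sel))
```

With these hypothetical statistics, the order Student, Enroll, Faculty is estimated at about 10,000, while Enroll, Faculty, Student, which starts with a cross product, is estimated at about 55,000, so the optimizer would prefer the first order. The quality of this choice depends entirely on the accuracy of the cardinality and selectivity estimates.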
The RDBMS, which is based on the relational database model, is the most widely
used type of database today (Leo & Cesar, 2008). A query language is an effective tool
that provides users with an interface to store and access data. Over the past few decades,
SQL has emerged as the standard query language (Vidya Banu & Nagaveni, 2012;
Rashid & Ali, 2010; Chaudhuri & Krishnamurthy, 1995).
Two components are essential for query evaluation: the query optimizer and the
query execution engine (Chaudhuri & Kr). An optimal solution should be able to evaluate
each connected subset enumerate (CSE) once and reuse its results for subsequent queries
to improve overall query performance. Complex multi-join queries usually take longer to
evaluate because of their inherent complexity, so considerable performance savings are
possible by sharing the computation of CSEs among the queries.
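The idea of evaluating a shared subexpression once and reusing its result can be sketched in Python with memoization. In the hypothetical example below, two queries both contain A JOIN B; `functools.lru_cache` stands in for materializing the intermediate result, and the `evaluations` list records which subexpressions are actually computed. The function name and the way joins are split are assumptions for illustration only.

```python
from functools import lru_cache
from typing import FrozenSet

evaluations = []  # records every subexpression that is actually computed


@lru_cache(maxsize=None)
def evaluate(subset: FrozenSet[str]) -> str:
    """Evaluate a join over `subset`, reusing cached results for repeated subexpressions.

    A stand-in for real execution: it records the call and returns a label.
    Joins are split as (all but the alphabetically last relation) JOIN (last relation).
    """
    evaluations.append(subset)
    rels = sorted(subset)
    if len(rels) == 1:
        return rels[0]
    left = evaluate(frozenset(rels[:-1]))  # cached after the first query computes it
    return f"({left} JOIN {rels[-1]})"


q1 = evaluate(frozenset({"A", "B", "C"}))
q2 = evaluate(frozenset({"A", "B", "D"}))  # reuses the cached result for {A, B}
print(q1, q2, len(evaluations))
```

The two queries need six subexpression evaluations without sharing but only four with it, because A JOIN B (and its input A) are computed once and reused by the second query.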
In an RDBMS context, it has been shown that substantial performance savings can
be obtained by using MJQO techniques. Beyond the RDBMS context, there are also some
preliminary studies (Chaudhuri & Ger, 2006; Tomasiz et al., 2010; Lim & Herodotou,
2012) on MJQO techniques in the DBMS framework based on MapReduce, proposed by
Google (Dean & Ghemawat, 2004), which has recently emerged as a new paradigm for
large-scale data analysis and has been widely embraced by Amazon, Google, Facebook,
Yahoo!, and many other companies.
There are two key reasons for this. First, the framework can scale to thousands of
commodity machines in a fault-tolerant manner, and can thus use more machines to
support parallel computing. Second, the framework has a simple yet expressive
programming model through which users can parallelize their programs without being
concerned about issues such as fault tolerance and execution strategy (Deng & Chain, 2014).
While MJQO techniques (Prasad & Deshpande, 2011; Yihong et al., 1998; Nilesh et al.,
2003) have been extensively studied in the RDBMS context, most of them focus on
optimizing a handful of SQL join queries. The MJQO problem considered here differs
from these works, since it focuses on optimizing a large collection (hundreds or
thousands) of cross-product queries produced by applications of enumerative set-based
queries.
In a traditional database, the total number of relations in a multi-join query is
usually less than 10, which can be handled effectively by dynamic programming
approaches. The complexity of the problem increases, however, because certain modern
applications, such as knowledge-based systems, decision support systems, expert
systems, Online Analytical Processing (OLAP), and data mining, generate complex
multi-join queries.
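The dynamic programming approach mentioned above can be sketched as follows. This Python example is a generic bottom-up enumeration over subsets of relations, in the spirit of the DPsize and DPset algorithms listed in Chapter 3 but not a reproduction of them: it keeps the cheapest plan for every subset and builds larger subsets from pairs of smaller ones. The C_out cost, the relation names, and the statistics are hypothetical. Because it tries every split of every subset, its work grows roughly as 3^n, which is why such methods are practical only for small n.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


def dp_join_order(card: Dict[str, int],
                  sel: Dict[FrozenSet[str], float]) -> Tuple[float, str]:
    """Bottom-up dynamic programming over relation subsets (bushy plans allowed).

    best[S] = (cost, size, plan) of the cheapest plan found for joining exactly S.
    Cost is C_out (sum of intermediate result sizes); base-table scans cost 0.
    A pair of relations missing from `sel` is joined as a cross product.
    """
    rels = sorted(card)
    best = {frozenset([r]): (0.0, float(card[r]), r) for r in rels}

    def join_size(left: FrozenSet[str], right: FrozenSet[str],
                  lsize: float, rsize: float) -> float:
        factor = 1.0
        for a in left:
            for b in right:
                factor *= sel.get(frozenset((a, b)), 1.0)
        return lsize * rsize * factor

    for k in range(2, len(rels) + 1):          # subsets of size 2, 3, ..., n
        for combo in combinations(rels, k):
            s = frozenset(combo)
            for i in range(1, k):              # every split of s into left/right parts
                for left_combo in combinations(combo, i):
                    left = frozenset(left_combo)
                    right = s - left
                    lcost, lsize, lplan = best[left]
                    rcost, rsize, rplan = best[right]
                    size = join_size(left, right, lsize, rsize)
                    cost = lcost + rcost + size
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, size, f"({lplan} JOIN {rplan})")
    cost, _, plan = best[frozenset(rels)]
    return cost, plan


# Hypothetical three-relation chain query A - B - C.
card = {"A": 10, "B": 1000, "C": 100}
sel = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.01}
print(dp_join_order(card, sel))
```

For this hypothetical chain query, the sketch returns a cost of 200 and the plan (C JOIN (A JOIN B)), that is, it joins the selective pair A, B first.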
An increase in the number of tables in a join query also increases the number of
alternative QEPs, which complicates the optimizer's task. Traditional methods are not
able to solve this optimization problem effectively because of the increased size of the
data and the larger number of tables (Dong et al., 2011). Deterministic algorithms, greedy
algorithms, and heuristic approaches have tried to approximate the optimal solution, but
their performance remains weak (Steinbrunn & Kemper, 1997).
The problem has since been tackled with genetic and randomized approaches, such as
tabu search, ant colony, and bee colony optimization, all of which perform better
(Kadkhodaei & Mahmoudi, 2011), but higher-quality solutions are still needed. Another
work proposed an algorithm that combines cuckoo search (Yang & Deb, 2009) with tabu
search (Glover & Ullman, 1989) to seek better solutions and determine the optimal join
order; such an algorithm is an integrated part of the query optimizer, which generates a
QEP that takes some time to execute. None of these authors found an optimal solution to
this problem, because each used only one database and based their results only on the
number of tables in that database, which is insufficient.
1.2 Problem Statements
This study addresses two problems in RDBMSs, namely MJQO and Single Join Query
Optimization (SJQO), both of which are crucial factors affecting the capability of the
database. An MJQO technique used in an RDBMS should aim to obtain the results of each
query efficiently, and the query process should also be optimized for time efficiency.
However, the MJQO techniques used in RDBMSs are, on average, inefficient in
terms of Query Execution Time (QET) and cost. Traditional query optimization
technology wastes a long time on each query, including the time staff spend waiting for
requested information, which increases daily and annual costs in institutions and
companies. In traditional RDBMS applications, the number of joins N involved in a single
query is relatively small, usually N < 10.
With the expansion of database applications, traditional query optimization
techniques are unable to support some of the latest database applications, such as
Decision Support Systems (DSS), OLAP, and Data Mining (DM), which may demand
queries with more than 100 joins.
When multiple users issue a variety of queries against multiple tables with varied
data in a distributed federated database, the tables must be joined. This can result in
many database operations, huge intermediate tables, slow join processing, or even a
deadlock, whereas queries need to return answers to clients quickly. To solve this
problem, the number of joins, query plans, and queries must be minimized and sharing
must be increased, in order to decrease administration time (and thus cost).
Hence, the shortcomings of traditional query optimization are gradually being
exposed, and it is necessary to explore new techniques to solve the MJQO problem. MJQO
is an NP-hard problem (Li Liu & Dong, 2008): as the number of joins increases, the
number of QEPs corresponding to a query grows exponentially, which leads to the
computational complexity of the MJQO problem.
Hence, improved quality and performance are needed. These criteria are important
for increasing query speed and reducing cost in RDBMSs. Therefore, a new intelligent
approach, such as a swarm intelligence approach, that performs well, with a shorter QET
and low cost, is required.
Solving such problems with heuristic algorithms has become a hotspot, because the
problem appears at many locations or sites of an RDBMS and therefore requires
multi-optimization or decentralized optimization, as shown in studies using ACO (Li Liu
& Dong, 2008), the Greedy Algorithm (Prasan & Bhobe, 2000), the Genetic Algorithm
(GA), and ABC (Abber & Mourad, 2013), among others. Several approaches have been
proposed that model specific intelligent behaviors as metaheuristics for solving
combinatorial problems.
The state-of-the-art work in this direction (Tomasz & Potamias, 2010) proposed two
sharing techniques for a batch of jobs. Recent researchers have used different models to
solve the MJQO problem; however, they have been unable to provide a better solution for
reducing the corresponding time and cost. Traditional methods cannot solve this
optimization problem effectively because of the increased data size and the large number
of tables (Dong, 2008).
Optimal join ordering in the RDBMS framework has been widely adopted by modern
enterprises, such as Facebook (Thusoo & Borthakur, 2010), to process complex analytical
queries on large data warehouse systems, owing to its high scalability, fine-grained fault
tolerance, and easy programming model for large-scale data analysis. Given the long
execution times of such complex queries, it makes sense to spend more time optimizing
them in order to reduce the overall processing time.
While the optimal join order problem has recently attracted much attention in the
conventional RDBMS context (Kiyoshi & Guy, 1990; Guido Moerkotte, 2006; Guido &
Thomas, 2012; Isard & Prabhakaran, 2009; Pit Fender & Guido, 2013; Pit Fender &
Thomas Neumann, 2012; Fender & Moerkotte, 2012; Roy & Siddhesh, 2000; Nilesh &
Sudarshan, 2003; Zhou & Lehner, 2007), the solutions developed there are not applicable
to the DBMS framework considered here because of differences in the query evaluation
framework and algorithms.
The optimal join order problem in the DBMS framework has a larger join enumeration
space than that in a conventional RDBMS because of the presence of multi-way joins.
There has been good work on its complexity in the conventional RDBMS context (Kiyoshi
& Guy, 1990; Moerkotte, 2006; Fender & Guido, 2012; Fender & Neumann, 2012).
To the best of our knowledge, there has not been any prior work on these problems
in the presence of multi-way joins in the DBMS context. In the DBMS framework,
intermediate results are always materialized instead of being pipelined as in a
conventional RDBMS, which simplifies the MJQO problem in two ways.

First, the MJQO problem in a conventional RDBMS may incur deadlocks due to the
pipelining framework (Nilesh & Sudarshan, 2003), whereas the DBMS framework has no
deadlock problem because of its materialization framework. Second, materializing and
reusing the results of a Connected Subset Enumerate (CSE) in a conventional RDBMS may
incur additional materialization and reading costs due to the pipelining framework; since
intermediate results are always materialized in the DBMS framework, no additional
overhead is incurred by this technique.
Although the MJQO problem in conventional RDBMSs has been shown to be very
difficult, with a search space that is doubly exponential in the size of the queries (Prasan
& Siddhesh, 2000; Nilesh & Sudarshan, 2003; Jingree & Lehner, 2007), the simplification
offered by the materialized DBMS framework has made it possible to propose join order
algorithms for the MJQO problem. However, these algorithms are unable to reduce the
QET, the cost, and the size of the search space.
The difficulty stems from the large search space: there are many possible plans and many semantically equivalent logical plans, and a logical plan with N operators has $2^N$ possible placement decisions. As a simple example, Figure 1.1 shows different possible plans for only 3 joins on 4 tables.
Figure 1.1: Multi Join Query Optimization Problem
These plans share the same (A JOIN B) subtree. Existing techniques calculate the cost of all possible plans, which takes a long time, even when swarm intelligence approaches are used. Instead of computing the cost of this subtree in every plan, it can be computed once, saved, and reused whenever the subtree appears again. With this reuse, the time complexity drops from $(2N)!/(N+1)!$ to "just" $3^N$; for N = 4, this means going from 336 orderings to 81.
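The two counts above, and the effect of costing a shared subtree only once, can be reproduced with a small Python sketch. The cardinalities, the 1/100 join selectivity and the cost model (sum of intermediate result sizes) are hypothetical assumptions chosen for illustration only; `functools.lru_cache` plays the role of the saved subtree costs.

```python
from functools import lru_cache
from math import factorial

def naive_orderings(n: int) -> int:
    """Number of orderings without reuse, (2N)! / (N+1)!, as quoted in the text."""
    return factorial(2 * n) // factorial(n + 1)

def memoized_orderings(n: int) -> int:
    """Number of orderings with reuse of subtree costs, 3^N, as quoted in the text."""
    return 3 ** n

# Hypothetical statistics: base-table cardinalities and one join selectivity (1/100).
CARD = {"A": 1000, "B": 200, "C": 5000, "D": 50}
SEL_DIVISOR = 100

costed = {"subtrees": 0}  # counts subtrees whose cost is actually computed

@lru_cache(maxsize=None)  # save each subtree's cost and reuse it when seen again
def cost(tree):
    """Return (cardinality, cost) of a join tree written as nested tuples."""
    costed["subtrees"] += 1  # only runs on a cache miss
    if isinstance(tree, str):  # leaf: a base table
        return CARD[tree], 0
    left, right = tree
    lcard, lcost = cost(left)
    rcard, rcost = cost(right)
    out = lcard * rcard // SEL_DIVISOR  # estimated join result size
    return out, lcost + rcost + out  # toy cost: sum of intermediate result sizes

# Three plans that all contain the (A JOIN B) subtree, in the spirit of Figure 1.1.
plans = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
]
best = min(plans, key=lambda p: cost(p)[1])

print(naive_orderings(4), memoized_orderings(4))  # 336 81
print(best, cost(best)[1], costed["subtrees"])
```

Without the cache, costing the three plans would touch 21 subtrees (7 nodes per plan); with it, only 11 are computed, because (A JOIN B) and the base tables are costed once.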
1.3 Aim of Research
This study aims to provide a comprehensive, in-depth and systematic study of the MJQO problem in the RDBMS paradigm and to propose swarm intelligence approaches, namely standard ACO and PSO and an improved Two-Phase Artificial Bee Colony algorithm (TPABC).
The proposed algorithm is used to search for and select query execution plans and the optimal global query execution plan to solve the MJQO problem, in order to reduce time and cost and to increase the performance of the RDBMS.
1.4 Research Objectives
To achieve the research aims, the objectives are as follows:
(i) To design an MJQO approach for an RDBMS using a query graph based on Pruning and Materialization Techniques.
(ii) To propose a new Two-Phase Artificial Bee Colony (TPABC) by removing the
scout-bee agent in order to improve the exploration factor.
(iii) To optimize Query Execution Plan (QEP) and Query Execution Time (QET)
for (i) using the proposed (ii).
(iv) To compare the performance of the proposed method in (ii) with other swarm-based QEP optimization methods, such as PSO and ACO, in terms of processing time and accuracy.
1.5 Significance of Research
Query optimization is an important component of an RDBMS. A user request is usually expressed in a high-level, non-procedural language that describes the conditions the result produced by the RDBMS needs to satisfy.
A major problem in RDBMSs is data volume, which in recent years has grown from 10 GB to 100 TB or even exabytes. Query processing needs to combine unrelated sources over distributed databases to obtain data from such huge spaces.
Each query in the optimization phase produces more than one query plan, and the optimizer tries to select the best plan at the lowest cost. All clients see similar views and are able to find similar replicas of unstructured data; this leads to very expensive throughput and long waiting times for users or clients in a company, resulting in a loss of income.
Human needs are multiplying while resources remain limited, and the MJQO problem has a similar character. Economic resources are limited and insufficient to satisfy all human needs, which recur and must be renewed repeatedly, such as the constant need for food, housing, treatment, and jobs. The MJQO problem has been widely addressed in RDBMS research.
Therefore, it is necessary to design an efficient MJQO approach, based on swarm intelligence, that determines the best QEP while minimizing the number of queries or objectives and joins, and that can be adapted to solve the MJQO problem. The proposed TPABC optimization algorithm is used to select an evaluation plan for a batch of queries and the best plans in the RDBMS. This is done by expanding exploration to find the optimal QEP for MJQO in order to improve the performance of the RDBMS. Exploitation is strengthened in TPABC to find the globally optimal plan by sharing common sub-expressions among queries.
1.6 Scope of Research
This research aims to improve the overall state of MJQO in RDBMS by solving the MJQO problem, which is NP-hard. The study proposes swarm intelligence approaches, namely ABC, PSO, and ACO, as new methods to reduce complexity and cost in solving this problem. All these algorithms are used to optimize QEP, QET and cost. The research proposes TPABC to improve the exploration and exploitation factors and thereby increase the performance of the database. The study attempts to solve the optimal join order problem in RDBMS based on four types of query graph in the RDBMS framework.
1.7 Thesis Organization
This thesis is organized into six chapters. The first chapter introduces the research background, problem statements, objectives and contributions. Chapter two presents a comprehensive literature review of the problems in RDBMS and provides an overview of swarm intelligence-based algorithms, such as ABC, ACO, and PSO, and of join techniques in RDBMS.
Chapter three describes the methodology used to carry out the study systematically. It covers the optimization algorithms (i.e. ABC, PSO, and ACO) and two new techniques to solve MJQO problems in DBMS. Chapter four explains the proposed improved TPABC swarm-based MJQO in DBMS and compares TPABC with a naive heuristic algorithm to improve the exploration and exploitation factors. Chapter five presents the simulation results and data analysis for both MJQO and QET. Finally, Chapter six concludes the work, summarizes the contributions of the research and gives suggestions, and points out some directions for future work.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
The second chapter of the thesis is the heart of the investigation: it provides an overview of contemporary literature in a broad academic and historical context (Boote & Beile, 2005). The chapter describes the focus and content of the study and defines its scope. This literature review explores three domain themes of the research work: Relational Database Management System (RDBMS) performance, Multi Join Query Optimization (MJQO) as a key issue in improving RDBMS performance, and swarm intelligence approaches as a technique to solve MJQO problems. The scope of this review is expanded to include research that examines these domain themes, since the MJQO problem has been widely addressed in Relational Database Management Systems (RDBMS).
2.2 Advantages of Database Management System
Because data are the crucial raw material from which information is derived, a good method is needed to manage such data. A DBMS makes data management more efficient and effective; in particular, it provides advantages such as improved data sharing. The DBMS helps create an environment in which end users have better access to more and better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment. A DBMS also improves data security: the more users access the data, the greater the risk of data security breaches, and corporations invest considerable amounts of time, effort, and money to ensure that corporate data are used properly.
Therefore, the use of a DBMS provides a framework for better enforcement of data privacy and security policies. A DBMS also enables better data integration: wider access to well-managed data promotes an integrated view of the organization's operations and a clearer view of the big picture.
It becomes much easier to see how actions in one segment of the company affect other segments. Data inconsistency exists when different versions of the same data appear in different places; a properly designed database reduces this risk. The RDBMS also makes it possible to produce quick answers to ad hoc queries. From a database perspective, a query is a specific request issued to the DBMS for data manipulation, for example to read or update the data. Simply put, a query is a question, and an ad hoc query is a spur-of-the-moment question. The RDBMS sends back an answer (called the query result set) to the application.
Technological advances in the transmission of data over networks have strongly influenced the cost of transmitting data per terabyte over long distances (Gelogo & Lee, 2012). Furthermore, the RDBMS has progressed in two dimensions that matter to companies: data management and data transfer.
Based on related research, data management is more costly than data transfer (Gelogo & Lee, 2012; Graefe, 1996). In addition, there is rapidly growing interest in outsourcing DBMS tasks to third parties that can provide these tasks at much lower cost due to economies of scale.
This new outsourcing model has several benefits, the most significant being a lower cost than running a DBMS on one's own (Gelogo & Lee, 2012; Buyya et al., 2011).
Such a service shares information among multiple devices, whose number is expected to increase. Many companies currently offer DBMS as a cloud service, such as Microsoft Azure, Google, Amazon EC2, GoGrid, Guarantee Data, and MongoLab.
2.3 Query Optimization
Current relational optimizers are influenced by the techniques introduced in the System R query optimizer (Patricia & Raymond, 1997; Chaudhuri & Krishnamurthy, 2006). One important contribution of this work is a cost-based framework for obtaining execution plans, which is still used, with some variations, in most current optimizers.
Another important contribution of (Patricia & Raymond, 1997) is a bottom-up dynamic programming search strategy to traverse the space of candidate execution plans. This strategy needs to consider $O(3^N)$ expressions (Kiyoshi & Guy, 1990) for a given query over N tables. To decrease optimization time, some heuristics are used, such as delaying the optimization of Cartesian products or considering only left-deep join trees.
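The bottom-up strategy, restricted to left-deep trees and postponing Cartesian products as the heuristics above suggest, can be sketched in Python. The table sizes, join predicates, selectivities and the cost function (sum of intermediate result sizes) are hypothetical; this is a simplified illustration of the dynamic programming idea, not System R's actual cost model.

```python
from itertools import combinations
from math import prod

# Hypothetical statistics: cardinalities and join predicates (a chain R - S - T - U).
CARD = {"R": 1000, "S": 100, "T": 10, "U": 500}
PRED = {
    frozenset({"R", "S"}): 0.01,
    frozenset({"S", "T"}): 0.1,
    frozenset({"T", "U"}): 0.05,
}

def link_selectivity(rest, table):
    """Combined selectivity of predicates between `table` and the tables in `rest`,
    or None when no predicate connects them (the join would be a Cartesian product)."""
    sels = [PRED[frozenset({o, table})] for o in rest if frozenset({o, table}) in PRED]
    return prod(sels) if sels else None

def left_deep_dp(tables):
    """Bottom-up dynamic programming over subsets, keeping the cheapest left-deep plan."""
    best = {frozenset({t}): (0.0, float(CARD[t]), (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for combo in combinations(sorted(tables), size):
            subset = frozenset(combo)
            for inner in sorted(subset):  # table joined last (inner of the top join)
                rest = subset - {inner}
                if rest not in best:
                    continue
                sel = link_selectivity(rest, inner)
                if sel is None:  # heuristic: postpone Cartesian products
                    continue
                rest_cost, rest_card, order = best[rest]
                card = rest_card * CARD[inner] * sel
                cost = rest_cost + card  # toy cost: sum of intermediate sizes
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, card, order + (inner,))
    return best[frozenset(tables)]

cost, card, order = left_deep_dp(list(CARD))
print(order, cost)
```

Each subset keeps only its cheapest plan, so plans for larger subsets are built from saved results rather than re-enumerated. Because subsets without a connecting predicate are skipped, a query with a disconnected join graph would need the Cartesian-product heuristic relaxed.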
The Starburst optimizer (Laura & Christoph, 1998; Laura & Lohman, 1990) extends System R with a more efficient and extensible approach and consists of two rule-based subsystems. In the first phase, the query is rewritten into an equivalent form; in the second phase, the actual execution plan is chosen.
Physical operators called LOLEPOPs can be combined in many ways to implement higher-level operators, and such combinations are expressed in a grammar-production-like language (Guy & Lohman, 2001). The join enumerator in Starburst is similar to the System R bottom-up enumeration scheme. The Exodus optimizer generator (Graefe & David, 1987) was the first extensible optimization framework to use a top-down approach.
Exodus separates the optimizer's search strategy from its data model and distinguishes between transformation rules (which map one algebraic expression into another) and implementation rules (which map an algebraic expression into an operator tree). Although it was difficult to construct efficient optimizers with Exodus, it provided a useful foundation for the next generation of extensible optimizers.
The Volcano Optimizer Generator (William, 1993) improves the efficiency of Exodus and introduces more extensibility and effectiveness. Volcano's search algorithm combines dynamic programming with directed search based on physical properties, branch-and-bound pruning, and heuristic guidance. Finally, the Cascades framework (Shekita & Wilms, 1993) solves some problems present in Exodus and Volcano and improves functionality, ease of use, and robustness without compromising extensibility and efficiency.
Cascades is the state-of-the-art rule-based optimization framework used in current optimizers such as Tandem's NonStop SQL (Pedro & Celis, 1996) and Microsoft SQL Server (Graefe, 1996). The Cascades framework differs from Starburst in its approach to enumeration: it does not use two distinct optimization phases as Starburst does, and the application of rules is goal-driven, as opposed to the forward-chaining rule application phase in Starburst. A detailed description of Cascades and some extensions to the original framework appear in (Yongwen, 1998; Billings, 1997).
2.3.1 Optimization of Relational Database Management System
Relational query languages provide a high-level declarative interface to access data
stored in relational database systems. With a declarative language, users (or
applications acting as users) write queries stating what they want, but without
specifying step-by-step instructions on how to obtain such results.
In turn, the RDBMS internally determines the best way to evaluate the input
query and obtains the desired result. Structured Query Language, or SQL (Jim Melton & Alan Simon, 1993), has become the most widely used relational database language. To answer a given SQL query, a typical RDBMS goes through a series of steps, illustrated in Figure 2.1. First, the input query, treated as a string of characters, is parsed and transformed into an algebraic tree that represents the structure of the query.
This step performs both syntactic and semantic checks over the input query, rejecting all invalid requests. Next, the algebraic tree is optimized and turned into a query execution plan. A query execution plan indicates not only the operations required to evaluate the input query, but also the order in which they are performed, the algorithm used to perform each step, and the way in which stored data are obtained and processed (Graefe, 1993). Finally, the query execution plan is evaluated and the results are passed back to the user in the form of a relational table.
Figure 2.1: Executing SQL Queries in a Relational Database System.
[Figure 2.1 diagram: Input SQL query (selecting R.a and S.c with predicates R.x = S.y and S.b < 10) → Parser → algebra tree (π R.a, S.c over the join R.x = S.y of R and σ S.b<10 (S)) → Optimizer → query execution plan (merge join on R.x, S.y over a sequential scan of R and a clustered index scan of S with an on-the-fly filter S.b < 10, plus a sort step) → Execution Engine → output table (R.a, S.c).]
Modern relational query optimizers are complex pieces of code and typically
represent 40 to 50 developer-years of effort (Raghu & Johannes, 2000). As stated
before, the role of the optimizer in a database system is to identify an efficient
execution plan to evaluate the input query, as indicated in Figure 2.1. To that end,
optimizers usually examine a large number of possible query plans and choose the one
that is expected to result in the fastest execution.
Database queries are given in declarative languages, typically SQL. The goal
of query optimization is to choose the best execution strategy for a given query under
the given resource constraints. While the query specifies the user intent (i.e., the
desired output), it does not specify how the output should be produced. This allows for
optimization decisions, and for many queries there is a wide range of possible
execution strategies, which can differ greatly in their resulting performance. This
renders query optimization an important step during query processing.
The role of the optimizer is to determine the lowest-cost plan for executing queries. "Lowest-cost plan" here means an access path to the data that takes the least amount of time. The optimizer is invoked for Structured Query Language (SQL) statements when more than one execution plan is possible. The optimizer chooses what it estimates to be the optimum plan, and this plan persists until the statement is either invalidated or dropped by the application.
2.3.2 Architecture of Query Optimizer
Several query optimization frameworks have been proposed in the literature (David,
1987; William, 1993; Patricia & Raymond, 1997; Laura & Christoph, 1998; Graefe,
1995) and most modern optimizers rely on the concepts introduced in these references.
Although implementation details vary among specific systems, virtually all
optimizers share the same basic structure (Ioannidis, 1997; Surajit, 1998) as shown in
Figure 2.2.
[Figure 2.2 diagram (Simplified Query Optimizer): Input query → Enumeration Engine, which explores sub-QEPs and passes them to the Cost Estimation module, supported by Cardinality Estimates → Output query execution plan.]
Figure 2.2: Simplified Architecture of the Query Optimizer in a Database System.
For each input query, the optimizer considers a multiplicity of alternative plans. For that purpose, the enumeration engine navigates through the space of candidate execution plans by applying rules.
Some optimizers have a set of rules to enumerate alternative plans (Patricia & Raymond, 1997), while others implement extensible transformational rules to navigate through the search space (Laura, 1998; Graefe, 1995).
During optimization, a cost module estimates the expected consumption of
resources of each discovered query plan (resources are usually the number of I/O's, but
can also include CPU time, memory, communication bandwidth, or a combination of
these). Finally, once all interesting execution plans are explored, the optimizer extracts
the best one, which is then evaluated in the execution engine, as shown in Figure 2.3.
The cost estimation module is then a critical component of a relational
optimizer. In general, it is not possible to obtain the exact cost of a given plan without
executing it (which does not make sense during optimization). Thus, the optimizer is
forced to estimate the cost of any given plan without executing it. It is then
fundamental for an optimizer to rely on accurate procedures to estimate costs, since
optimization is only as good as its cost estimates. Cost estimation must also be
efficient, since it is repeatedly invoked during the optimization process.
The basic framework for estimating costs is based on the following recursive approach, described in (Surajit, 1998):
(i) collect statistical summaries of the stored data;
(ii) given an operator in the execution plan and statistical summaries for each of its sub-plans, determine two things: statistical summaries of the operator's output, and the estimated cost of executing the operator.
The second step can be applied iteratively to an arbitrary tree to derive the cost of each operator. The estimated cost of a plan is then obtained by combining the costs of its operators. In general, the number of disk I/Os needed to manage intermediate results while executing a query plan (and thus the plan's cost) is a function of the sizes of the intermediate query results.
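The two-step recursive approach can be sketched as a small Python function that walks a plan bottom-up, deriving each operator's output cardinality and cost from those of its sub-plans. The row counts, selectivities and per-operator cost formulas are hypothetical placeholders, not a real optimizer's cost model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Step (i): hypothetical statistical summaries of the stored data (row counts only).
STATS = {"R": 10_000, "S": 2_000}

@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    child: Plan
    selectivity: float  # estimated fraction of input rows that qualify

@dataclass
class Join:
    left: Plan
    right: Plan
    selectivity: float  # estimated fraction of the cross product that matches

Plan = Union[Scan, Filter, Join]

def estimate(node: Plan) -> tuple[float, float]:
    """Step (ii), applied recursively: return (output cardinality, cumulative cost)."""
    if isinstance(node, Scan):
        rows = float(STATS[node.table])
        return rows, rows  # toy cost: one unit per row read
    if isinstance(node, Filter):
        rows, cost = estimate(node.child)
        return rows * node.selectivity, cost + rows  # one unit per row tested
    if isinstance(node, Join):
        lrows, lcost = estimate(node.left)
        rrows, rcost = estimate(node.right)
        out = lrows * rrows * node.selectivity
        # toy cost: read both inputs once and produce the output
        return out, lcost + rcost + lrows + rrows + out
    raise TypeError(f"unknown operator: {node!r}")

plan = Join(Filter(Scan("R"), 0.1), Scan("S"), 0.001)
rows, cost = estimate(plan)
print(rows, cost)
```

The join's cost depends directly on the estimated sizes of its inputs, so an error in the filter's selectivity propagates to every operator above it.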
Therefore, the cost estimation module heavily depends on cardinality estimates
of sub-plans generated during optimization. The following example illustrates how
sizes of intermediate results can significantly change the plan that is chosen by an
optimizer.
Example 1: Consider the following query template, where C is a numeric parameter.
SELECT * FROM R, S
WHERE R.x = S.y AND R.a < C
Figure 2.3 shows the execution plans produced by an optimizer when C is instantiated with the values 20, 200, and 2000. The three instantiated queries are almost identical.
Figure 2.3: Query Execution Plans for Various Instances of a Template Query
The resulting query plans are considerably different. For instance, in Figure 2.3
(A), the optimizer estimates that the number of tuples in R satisfying R.a < 20 is very
small, so it chooses to evaluate the query as follows. First, using a secondary index
over R.a, it retrieves the record identifiers of all tuples in R that satisfy R.a < 20. Then,
using lookups against table R, it fetches the actual tuples that correspond to those
record identifiers. It performs a nested-loop join between the subset of tuples of R
calculated before, and table S, which is sequentially scanned.
For the case C = 2000 (the hash join plan in Figure 2.3), the optimizer estimates that the number of tuples of R satisfying R.a < 2000 is rather large, and therefore chooses to scan both tables sequentially (discarding on the fly the tuples from R that do not satisfy the condition R.a < 2000) and then perform a hash join to obtain the result.
(In this scenario, the lookups of the previous plan would have been too numerous and therefore too expensive.)
The merge join plan in Figure 2.3 shows yet another execution plan, chosen when the number of tuples of R satisfying the predicate is neither too small nor too large. In this case, table S is scanned in increasing order of S.y using a clustered index, and table R is scanned sequentially (discarding invalid tuples on the fly, as before) and then sorted by R.x.
Finally, a merge join is performed on the two intermediate results. It is known that if cardinality estimates are accurate, overall cost estimates are typically off by no more than 10 percent (Michael & Lohman, 2001). However, cardinality estimates can be off by orders of magnitude when the underlying assumptions on the data distribution are invalid. Clearly, if the optimizer does not have accurate cardinality estimates during optimization, the "wrong" execution plan might be chosen for a given query.
In the previous example, if the number of tuples satisfying R.a < 2000 is underestimated, the optimizer could choose the less efficient (for that scenario) merge join plan of Figure 2.3, and therefore waste time sorting a large intermediate subset of R. In the context of adaptive query processing (Joseph & Franklin, 2003), where initial bad choices made during optimization can later be corrected during query execution, accurate cardinality estimates allow the optimizer to start with a higher-quality execution plan, thus minimizing the probability of dynamic changes during query execution. For this reason, it is crucial to provide the optimizer with accurate procedures to estimate cardinality values during optimization.
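The sensitivity of the plan choice in Example 1 to the estimated number of qualifying tuples can be illustrated with a toy Python model. It assumes R.a is uniformly distributed over 0..9999 with one tuple per value, and uses invented page counts and per-plan cost formulas, tuned only so that the three regimes of Figure 2.3 appear; none of the numbers are measurements.

```python
from __future__ import annotations

# Hypothetical statistics (assumptions for illustration only).
R_TUPLES = 10_000          # R.a assumed uniform over 0..9999, one tuple per value
R_PAGES, S_PAGES = 1_000, 10

def estimated_rows(c: int) -> int:
    """Estimated number of R tuples with R.a < c under the uniformity assumption."""
    return max(0, min(c, R_TUPLES))

def plan_costs(k: int) -> dict[str, int]:
    """Toy I/O-style costs of the three plan shapes for k qualifying R tuples."""
    return {
        # index on R.a: one fetch per qualifying tuple, then S rescanned per tuple
        "index + nested loop": k * (1 + S_PAGES),
        # scan R and S (S in S.y order), sort the qualifying R tuples, merge join
        "sort + merge join": R_PAGES + S_PAGES + 2 * k,
        # scan both tables and hash join; treated as a flat cost in this toy model
        "hash join": 3 * (R_PAGES + S_PAGES),
    }

def choose_plan(k: int) -> str:
    costs = plan_costs(k)
    return min(costs, key=costs.get)

for c in (20, 200, 2000):
    print(c, choose_plan(estimated_rows(c)))

# An underestimate (200 rows instead of 2000) picks the merge plan for C = 2000.
print(choose_plan(200))
```

The last line mirrors the scenario described above: with a tenfold underestimate, the model selects the sort-and-merge plan even though the hash join is cheaper for the true number of rows.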
Statistical structures can be used to estimate the cardinality of intermediate results generated by query sub-plans during optimization. The next section gives an overview of the types of join methods available in a DBMS.
2.4 Join Methods
Theta join combines tuples from different relations provided that they satisfy the theta condition. The join is denoted $R_1 \bowtie_{\theta} R_2$, where θ is the join condition and R1 and R2 are relations with attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that the two relations have no attributes in common, that is, $\{A_1, \dots, A_n\} \cap \{B_1, \dots, B_n\} = \emptyset$. The optimizer can select from multiple join methods. When the rows from two tables are joined, one table is designated the outer table and the other the inner table.
The optimizer decides which of the tables should be the outer table and which should be the inner table. During a join, the optimizer scans the rows in the outer and inner tables to locate the rows that match the join condition. The optimizer analyses the statistics for each table; for example, it might choose the smallest table, or the table with the best selectivity for the query, as the outer table.
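To see why the choice of outer table matters, the following Python sketch applies the textbook block nested-loop I/O formula (the outer table is read once, and the inner table is scanned once per block of B - 2 outer pages, where B is the number of buffer pages) in both directions. The page counts and buffer size are hypothetical, and real optimizers also weigh selectivity and available indexes.

```python
from math import ceil

def bnl_cost(outer_pages: int, inner_pages: int, buffer_pages: int) -> int:
    """Page I/Os of a block nested-loop join: the outer is read once, and the inner
    is scanned once for every block of (buffer_pages - 2) outer pages."""
    blocks = ceil(outer_pages / (buffer_pages - 2))
    return outer_pages + blocks * inner_pages

def choose_outer(pages, buffer_pages):
    """Try both tables as the outer table; return ((outer, inner), cost) of the cheaper."""
    (a, pa), (b, pb) = pages.items()
    options = {
        (a, b): bnl_cost(pa, pb, buffer_pages),
        (b, a): bnl_cost(pb, pa, buffer_pages),
    }
    return min(options.items(), key=lambda kv: kv[1])

# Hypothetical sizes: R occupies 1000 pages, S 100 pages; 12 buffer pages.
print(choose_outer({"R": 1000, "S": 100}, buffer_pages=12))
```

With these numbers the smaller table S is the better outer table (10,100 page I/Os versus 11,000), which matches the heuristic of choosing the smallest table as the outer table.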
If indexes exist for one or more of the tables to be joined, the optimizer takes them into
account when selecting the outer and inner tables. If more than two tables are to be
joined, the optimizer analyses the various combinations of joins on table pairs to
determine which pair to join first, which table to join with the result of the join, and so
on for the optimum sequence of joins.
The cost of a join is largely influenced by the method in which the inner and
outer tables are accessed to locate the rows that match the join condition. The optimizer
selects from two join methods when determining the query plan.
Current join methods, such as the natural join and the outer join, are not sufficient to merge database tables; it is therefore necessary to find new and efficient ways to improve and optimize queries and RDBMS performance.
2.4.1 Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate tuples the way a Cartesian product does. A natural join can be performed only if there is at least one common attribute between the two relations; in addition, the attributes must have the same name and domain. Natural join acts on those matching attributes where the values of the attributes in both relations are the same.
Example Two:
SELECT Enroll.StuId, lastName, firstName
FROM Student, Enroll
WHERE classNo = 'ART103A'
AND Enroll.StuId = Student.StuId;
Table 2.1: Enroll Table
StuId   Class Number   Grade
S1001   ART103A        A
S1001   HST205A        C
S1002   ART103A        D
S1002   CSC201A        F
S1002   MTH103C        B
S1010   ART103A
S1010   MTH103C
S1020   CSC201A        B
S1020   MTH101B        A
Table 2.2: Student Table
StuId   Last Name   First Name   Major     Credits
S1001   Smith       Tom          History   90
S1002   Chin        Ann          Math      36
S1005   Lee         Perry        History   3
S1010   Burns       Edward       Art       63
S1013   McCarthy    Owen         Math      0
S1015   Jones       Mary         Math      42
S1020   Rivera      Jane         CSC       15
Table 2.3: Result of the Example Two Join
StuId   Last Name   First Name
S1001   Smith       Tom
S1002   Chin        Ann
S1010   Burns       Edward
Example Two uses the two tables shown in Tables 2.1 and 2.2 and joins their records into the new table shown in Table 2.3, which lists the last name and first name of each student enrolled in ART103A. This is similar to the join operation in relational algebra; SQL allows the user to express such a natural join directly.
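Example Two can be run end to end with Python's built-in `sqlite3` module against an in-memory database loaded with the rows of Tables 2.1 and 2.2. The column names (`lastName`, `firstName`, `classNo`, and so on) are adapted from the table headers; the sketch also shows the equivalent `NATURAL JOIN` form, which joins on the shared `StuId` column.

```python
import sqlite3

students = [  # Table 2.2
    ("S1001", "Smith", "Tom", "History", 90),
    ("S1002", "Chin", "Ann", "Math", 36),
    ("S1005", "Lee", "Perry", "History", 3),
    ("S1010", "Burns", "Edward", "Art", 63),
    ("S1013", "McCarthy", "Owen", "Math", 0),
    ("S1015", "Jones", "Mary", "Math", 42),
    ("S1020", "Rivera", "Jane", "CSC", 15),
]
enrollments = [  # Table 2.1 (missing grades stored as NULL)
    ("S1001", "ART103A", "A"), ("S1001", "HST205A", "C"),
    ("S1002", "ART103A", "D"), ("S1002", "CSC201A", "F"),
    ("S1002", "MTH103C", "B"), ("S1010", "ART103A", None),
    ("S1010", "MTH103C", None), ("S1020", "CSC201A", "B"),
    ("S1020", "MTH101B", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StuId TEXT PRIMARY KEY, lastName TEXT, "
             "firstName TEXT, major TEXT, credits INTEGER)")
conn.execute("CREATE TABLE Enroll (StuId TEXT, classNo TEXT, grade TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", students)
conn.executemany("INSERT INTO Enroll VALUES (?, ?, ?)", enrollments)

# Example Two: join condition written in the WHERE clause.
rows = conn.execute("""
    SELECT Enroll.StuId, lastName, firstName
    FROM Student, Enroll
    WHERE classNo = 'ART103A' AND Enroll.StuId = Student.StuId
    ORDER BY Enroll.StuId
""").fetchall()

# The same result with NATURAL JOIN, which matches on the common column StuId.
natural = conn.execute("""
    SELECT StuId, lastName, firstName
    FROM Student NATURAL JOIN Enroll
    WHERE classNo = 'ART103A'
    ORDER BY StuId
""").fetchall()

print(rows)  # the three rows of Table 2.3
```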
2.4.2 Outer Join
Section 2.4.1 discussed the natural join, which selects only the rows common to the tables participating in a join. When we want to select rows from a table regardless of whether they have a match in the second table, we need the SQL OUTER JOIN command. The syntax for performing an outer join in SQL is database dependent. For example, in Oracle, "(+)" is placed in the WHERE clause on the opposite side from the table whose rows should all be included. Assume the following two tables, Tables 2.4 and 2.5.
Student OUTER-EQUIJOIN Faculty
Compare Student.LastName with Faculty.Name
The result of the outer join query is shown in Table 2.6.
Table 2.4: Student Table
StuId   LastName   FirstName   Major     Credits
S1001   Smith      Tom         History   90
S1002   Chin       Ann         Math      36
S1005   Lee        Perry       History   3
S1010   Burns      Edward      Art       63
S1013   McCarthy   Owen        Math      0
S1015   Jones      Mary        Math      42
S1020   Rivera     Jane        CSC       15
Table 2.5: Faculty Table
FacId   Name     Department   Rank
F101    Adams    Art          Professor
F105    Tanaka   CSC          Instructor
F110    Byrne    Math         Assistant
F115    Smith    History      Associate
F221    Smith    CSC          Professor
Table 2.6: Outer Join for Student and Faculty Tables
StuId   LastName   FirstName   Major     Credits   FacId   Name     Department   Rank
S1001   Smith      Tom         History   90        F221    Smith    CSC          Professor
S1001   Smith      Tom         History   90        F115    Smith    History      Associate
S1002   Chin       Ann         Math      36        NULL    NULL     NULL         NULL
S1005   Lee        Perry       History   3         NULL    NULL     NULL         NULL
S1010   Burns      Edward      Art       63        NULL    NULL     NULL         NULL
S1013   McCarthy   Owen        Math      0         NULL    NULL     NULL         NULL
S1015   Jones      Mary        Math      42        NULL    NULL     NULL         NULL
S1020   Rivera     Jane        CSC       15        NULL    NULL     NULL         NULL
NULL    NULL       NULL        NULL      NULL      F101    Adams    Art          Professor
NULL    NULL       NULL        NULL      NULL      F105    Tanaka   CSC          Instructor
NULL    NULL       NULL        NULL      NULL      F110    Byrne    Math         Assistant
The outer equijoin searches both the left and the right table in full. In the current example, each student's last name in Table 2.4 is compared with each faculty name in Table 2.5. Where the names match, the combined record is placed in the result Table 2.6; otherwise, the record is kept with NULL values in the columns of the other table.
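Because SQLite only gained RIGHT and FULL OUTER JOIN in version 3.39.0, the full outer equijoin of Tables 2.4 and 2.5 is sketched here in plain Python instead. Rows are tuples and `None` stands for NULL; the helper `full_outer_equijoin` is a hypothetical illustration, not a library function.

```python
STUDENT = [  # Table 2.4: (StuId, LastName, FirstName, Major, Credits)
    ("S1001", "Smith", "Tom", "History", 90),
    ("S1002", "Chin", "Ann", "Math", 36),
    ("S1005", "Lee", "Perry", "History", 3),
    ("S1010", "Burns", "Edward", "Art", 63),
    ("S1013", "McCarthy", "Owen", "Math", 0),
    ("S1015", "Jones", "Mary", "Math", 42),
    ("S1020", "Rivera", "Jane", "CSC", 15),
]
FACULTY = [  # Table 2.5: (FacId, Name, Department, Rank)
    ("F101", "Adams", "Art", "Professor"),
    ("F105", "Tanaka", "CSC", "Instructor"),
    ("F110", "Byrne", "Math", "Assistant"),
    ("F115", "Smith", "History", "Associate"),
    ("F221", "Smith", "CSC", "Professor"),
]

def full_outer_equijoin(left, right, lkey, rkey):
    """Matching pairs, plus unmatched rows from either side padded with None (NULL)."""
    lpad, rpad = (None,) * len(left[0]), (None,) * len(right[0])
    result, matched_right = [], set()
    for lrow in left:
        matches = [j for j, rrow in enumerate(right) if lrow[lkey] == rrow[rkey]]
        for j in matches:
            result.append(lrow + right[j])
            matched_right.add(j)
        if not matches:  # keep the left row with NULLs on the right
            result.append(lrow + rpad)
    for j, rrow in enumerate(right):
        if j not in matched_right:  # keep the right row with NULLs on the left
            result.append(lpad + rrow)
    return result

# Compare Student.LastName (index 1) with Faculty.Name (index 1).
rows = full_outer_equijoin(STUDENT, FACULTY, lkey=1, rkey=1)
for row in rows:
    print(row)
```

The result has the 11 rows of Table 2.6: two matches for Smith, six students without a matching faculty name, and three faculty members without a matching student.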
2.4.3 Left Outer Join
In a left outer join, all rows from the first table mentioned in the SQL query are selected, regardless of whether there is a matching row in the second table mentioned in the SQL query. Assume the following two tables.
Table 2.7: Student Table
StuId   LastName   FirstName   Major     Credits
S1001   Smith      Tom         History   90
S1002   Chin       Ann         Math      36
S1005   Lee        Perry       History   3
S1010   Burns      Edward      Art       63
S1013   McCarthy   Owen        Math      0
S1015   Jones      Mary        Math      42
S1020   Rivera     Jane        CSC       15
Table 2.8: Faculty Table
FacId   Name     Department   Rank
F101    Adams    Art          Professor
F105    Tanaka   CSC          Instructor
F110    Byrne    Math         Assistant
F115    Smith    History      Associate
F221    Smith    CSC          Professor
Table 2.9: Left Join for Student and Faculty Tables
StuId   LastName   FirstName   Major     Credits   FacId   Name    Department   Rank
S1001   Smith      Tom         History   90        F221    Smith   CSC          Professor
S1001   Smith      Tom         History   90        F115    Smith   History      Associate
S1002   Chin       Ann         Math      36        NULL    NULL    NULL         NULL
S1005   Lee        Perry       History   3         NULL    NULL    NULL         NULL
S1010   Burns      Edward      Art       63        NULL    NULL    NULL         NULL
S1013   McCarthy   Owen        Math      0         NULL    NULL    NULL         NULL
S1015   Jones      Mary        Math      42        NULL    NULL    NULL         NULL
S1020   Rivera     Jane        CSC       15        NULL    NULL    NULL         NULL
In the left join, all rows of the left table are kept in the result. The last name in the Student table (Table 2.7) is compared with the Name column of the Faculty table (Table 2.8); if a matching name is found, the matching faculty record is combined with the student record in the result (Table 2.9); otherwise, the faculty columns are NULL.
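The left outer join of Tables 2.7 and 2.8 can be reproduced with Python's `sqlite3` module, which supports LEFT OUTER JOIN in all versions. Only a few columns are selected to keep the output short; the table and column names are adapted from the table headers, and `Rank` is quoted as an identifier as a precaution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StuId TEXT, LastName TEXT, FirstName TEXT, "
             "Major TEXT, Credits INTEGER)")
conn.execute('CREATE TABLE Faculty (FacId TEXT, Name TEXT, Department TEXT, "Rank" TEXT)')
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", [  # Table 2.7
    ("S1001", "Smith", "Tom", "History", 90), ("S1002", "Chin", "Ann", "Math", 36),
    ("S1005", "Lee", "Perry", "History", 3), ("S1010", "Burns", "Edward", "Art", 63),
    ("S1013", "McCarthy", "Owen", "Math", 0), ("S1015", "Jones", "Mary", "Math", 42),
    ("S1020", "Rivera", "Jane", "CSC", 15),
])
conn.executemany("INSERT INTO Faculty VALUES (?, ?, ?, ?)", [  # Table 2.8
    ("F101", "Adams", "Art", "Professor"), ("F105", "Tanaka", "CSC", "Instructor"),
    ("F110", "Byrne", "Math", "Assistant"), ("F115", "Smith", "History", "Associate"),
    ("F221", "Smith", "CSC", "Professor"),
])

# Every Student row is kept; Faculty columns are NULL (None) when no name matches.
rows = conn.execute("""
    SELECT s.StuId, s.LastName, f.FacId, f.Department
    FROM Student AS s LEFT OUTER JOIN Faculty AS f ON s.LastName = f.Name
    ORDER BY s.StuId, f.FacId
""").fetchall()
for row in rows:
    print(row)
```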
2.4.4 Right Outer Join
The RIGHT OUTER JOIN keyword returns all rows from the right table together with the matching rows from the left table. The result is NULL on the left side when there is no match.
Student RIGHT-OUTER-EQUIJOIN Faculty
Table 2.10: Student Table
StuId   LastName   FirstName   Major     Credits
S1001   Smith      Tom         History   90
S1002   Chin       Ann         Math      36
S1005   Lee        Perry       History   3
S1010   Burns      Edward      Art       63
S1013   McCarthy   Owen        Math      0
S1015   Jones      Mary        Math      42
S1020   Rivera     Jane        CSC       15
Table 2.11: Faculty Table
FacId   Name     Department   Rank
F101    Adams    Art          Professor
F105    Tanaka   CSC          Instructor
F110    Byrne    Math         Assistant
F115    Smith    History      Associate
F221    Smith    CSC          Professor
Table 2.12: Result of Right Outer Join
StuId   LastName   FirstName   Major     Credits   FacId   Name     Department   Rank
S1001   Smith      Tom         History   90        F221    Smith    CSC          Professor
S1001   Smith      Tom         History   90        F115    Smith    History      Associate
NULL    NULL       NULL        NULL      NULL      F101    Adams    Art          Professor
NULL    NULL       NULL        NULL      NULL      F105    Tanaka   CSC          Instructor
NULL    NULL       NULL        NULL      NULL      F110    Byrne    Math         Assistant
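In SQLite, RIGHT OUTER JOIN is only available from version 3.39.0, so a portable way to obtain Table 2.12 with Python's `sqlite3` module is to swap the operands of a LEFT OUTER JOIN. The sketch also runs the native form when the linked SQLite library is new enough; only the identifying columns are kept for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StuId TEXT, LastName TEXT)")
conn.execute("CREATE TABLE Faculty (FacId TEXT, Name TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [  # key columns of Table 2.10
    ("S1001", "Smith"), ("S1002", "Chin"), ("S1005", "Lee"), ("S1010", "Burns"),
    ("S1013", "McCarthy"), ("S1015", "Jones"), ("S1020", "Rivera"),
])
conn.executemany("INSERT INTO Faculty VALUES (?, ?)", [  # key columns of Table 2.11
    ("F101", "Adams"), ("F105", "Tanaka"), ("F110", "Byrne"),
    ("F115", "Smith"), ("F221", "Smith"),
])

# Student RIGHT OUTER JOIN Faculty, written as Faculty LEFT OUTER JOIN Student.
portable = """
    SELECT s.StuId, s.LastName, f.FacId, f.Name
    FROM Faculty AS f LEFT OUTER JOIN Student AS s ON s.LastName = f.Name
    ORDER BY f.FacId
"""
rows = conn.execute(portable).fetchall()

if sqlite3.sqlite_version_info >= (3, 39, 0):  # native RIGHT JOIN is available
    native = conn.execute("""
        SELECT s.StuId, s.LastName, f.FacId, f.Name
        FROM Student AS s RIGHT OUTER JOIN Faculty AS f ON s.LastName = f.Name
        ORDER BY f.FacId
    """).fetchall()
    assert native == rows

for row in rows:
    print(row)
```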
2.5 Multi Join Query Optimization in Relational Database Management System
Query optimization is a function of many relational database management systems. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. The multi-join query is one of the basic operations in database use; therefore, multi-join query optimization is essential for improving database performance.
There are often cost metrics other than execution time that are relevant when comparing query plans (Trummer & Immanuel, 2015). In a cloud computing scenario, for instance, query plans should be compared not only in terms of how much time they take to execute but also in terms of how much money their execution costs. In the context of approximate query optimization, it is possible to execute query plans on randomly selected samples of the input data in order to obtain approximate results with reduced execution overhead.
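When plans are compared on several metrics, such as execution time and monetary cost, there is usually no single best plan. One common way to make this precise, assumed here for illustration and not necessarily the method of the cited work, is to keep only the Pareto-optimal plans, i.e. those that no other plan beats on one metric without being worse on another. The plan names and numbers below are hypothetical.

```python
from typing import NamedTuple

class PlanCost(NamedTuple):
    name: str
    seconds: float  # estimated execution time
    dollars: float  # estimated monetary cost of execution

def dominates(a: PlanCost, b: PlanCost) -> bool:
    """a dominates b if it is no worse on both metrics and strictly better on one."""
    no_worse = a.seconds <= b.seconds and a.dollars <= b.dollars
    better = a.seconds < b.seconds or a.dollars < b.dollars
    return no_worse and better

def pareto_frontier(plans):
    """Plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]

candidates = [
    PlanCost("P1", 10.0, 5.0),
    PlanCost("P2", 20.0, 1.0),
    PlanCost("P3", 15.0, 6.0),  # slower and more expensive than P1
    PlanCost("P4", 12.0, 3.0),
]
print([p.name for p in pareto_frontier(candidates)])
```

The final choice among the remaining plans (fastest, cheapest, or a weighted trade-off) is left to the user's preferences or budget.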
For example, in a database system enhanced with inference capabilities, a
simple query involving a rule with multiple definitions may expand to more than one
actual query that has to be run over the database.
In the past few years, several attempts have been made to extend the benefits
of the database approach in business to other areas, such as artificial intelligence and
engineering design automation. Traditionally, query optimizers (Chaudhuri, 2006)
optimize queries one at a time and do not identify any commonalities in queries,
resulting in repeated computations. As observed in (Rosenthal & Chakravarthy, 1988;
Sellis, 1988) exploiting common results can lead to significant performance gains.
This is known as multi-query optimization. Existing techniques for multi-query
optimization assume that all intermediate results are materialized (Cosar & Srivastava,
2008; Roy & Seshadri, 2000; Deshpande et al., 1998).
They assume that if a common subexpression is to be shared, it will be
materialized and read whenever it is required subsequently. Current multi-query
optimization techniques do not try to exploit pipelining of results to all the users of the
common subexpression.
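The idea of sharing a common subexpression across a batch of queries, materializing it once and reading it whenever it is required subsequently, can be sketched in Python. Plans are nested tuples of table names and subtrees are matched syntactically (so (A, B) and (B, A) count as different), which is a simplifying assumption; the hypothetical `evaluate` function only builds a string instead of executing joins.

```python
from collections import Counter

def subexpressions(tree):
    """Yield every join subtree of a plan written as nested tuples of table names."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree:
            yield from subexpressions(child)

def common_subexpressions(plans):
    """Join subtrees that occur in more than one plan of the batch."""
    counts = Counter()
    for plan in plans:
        counts.update(set(subexpressions(plan)))  # count each subtree once per plan
    return {expr for expr, n in counts.items() if n > 1}

batch = [
    (("A", "B"), "C"),
    (("A", "B"), "D"),
    (("C", "D"), "E"),
]
shared = common_subexpressions(batch)

materialized = {}        # shared results, computed once and then read
evaluations = Counter()  # how often each join subtree is actually computed

def evaluate(tree):
    """Stand-in for execution: returns a string describing the result."""
    if isinstance(tree, str):
        return tree
    if tree in materialized:
        return materialized[tree]
    evaluations[tree] += 1
    result = "(" + " JOIN ".join(evaluate(child) for child in tree) + ")"
    if tree in shared:
        materialized[tree] = result
    return result

for plan in batch:
    print(evaluate(plan))
print(shared, evaluations[("A", "B")])
```

The shared subtree (A JOIN B) is computed once and then read by the second query, which is the materialize-and-reuse behaviour that existing multi-query optimization techniques assume.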
REFERENCES
Abber Al-Dayel & Murad. (2013). Query paraphrasing enhancement using artificial
bee colony. In the 3rd International Conference on Web Intelligence. Mining and
Semantics.
Alamery & Faraahi. (2010). Multi-join query optimization using the bee’s algorithm.
In Proceedings of the 7th International Symposium on Distributed Computing
and ArtificialIntelligence (pp. 449-457).
Alberto, O., & Mendelzon, S. (2002). Proc. algorithm for the generation of optimal bushy join trees without cross products. In Database Engineering and Applications Symposium (IDEAS). IEEE CS Press, Edmonton, Canada, July.
Alzaqebah, M., & Abdullah, S. (2011). Artificial bee colony search algorithm for examination timetabling problems. Int. J. Phys. Sci.
Arens & Knoblock (1994). Cooperating agents for information retrieval. In Proceedings of the Second International Conference on Cooperative Information Systems.
Awadallah, M. (2011). Artificial Bee Colony Algorithm for Curriculum-Based Course Timetabling Problem. ICIT.
Baykasoglu, A., Ozbakir, L., & Tapkan, P. (2007). Artificial bee colony algorithm and its application to generalized assignment problem. In Swarm Intelligence: Focus on Ant and Particle Swarm Optimization (pp. 113–144).
Beard, Getoor, L., & Blake, M. (2007). Visual mining of multi-modal social networks at different abstraction levels. In IEEE Conference on Information Visualization, Symposium on Visual Data Mining (IV-VDM), July 2007.
Beynon, M., Kurc, T., Sussman, A., Andrade, H., Ferreira, R., & Saltz, J. (2001). Processing large-scale multi-dimensional data in parallel and distributed environments. Parallel Computing, 28(5), 827–859.
Billings, K. (1997). A TPC-D model for database query optimization in Cascades. M.S. thesis, Portland State University.
Biskup, D., & Feldmann, M. S. (2001). Benchmarks for scheduling on a single machine against restrictive and unrestrictive common due dates. Computers & Operations Research, 28, 787–801.
Boote, N., & Beile, P. (2005). Scholars before researchers: On the centrality of the dissertation literature review in research preparation. Educational Researcher, 34(6), 3–15.
Bramley & Chiu (2000). A component based services architecture for building distributed applications. In Proceedings of HPDC.
Bullnheimer, B., & Hartl, R. (1998). Applying the ant system to the vehicle routing problem. In Osman, I. H., Voß, S., Martello, S., & Roucairol, C. (Eds.), Metaheuristics: Advances and Trends in Local Search Paradigms for Optimization (pp. 109–120). Dordrecht: Kluwer Academic.
Burrough, P. A. (1986). Principles of Geographic Information Systems for Land Resource Assessment. Monographs on Soil and Resources Survey No. 12. New York: Oxford Science Publications.
Camazine, S., Franks, N., & Deneubourg, J.-L. (2001). Self-Organization in Biological Systems. Princeton Studies in Complexity. Princeton, NJ: Princeton University Press.
Cao, Y., & Fang, Q. (2008). Parallel query optimization techniques for multi-join expressions. Journal of Software, 13(2), 250–256.
Catherine Riccardo (2012). Introduction in database conceptual (3rd ed.). Canada: Cathleen Sether.
Celis, P. (1996). The query optimizer in Tandem's new ServerWare SQL product. In Proceedings of the Twenty-second International Conference on Very Large Databases (VLDB'96).
Chakravarthy, B. S. (1986). Measuring strategic performance. Strategic Management Journal, 7(5), 437–458. doi:10.1002/smj.4250070505.
Chande, S. V., & Snik, M. (2007). Genetic optimization for the join ordering problem of database queries. Department of Computer Science, International School of Informatics and Management, Jaipur, India.
Chaudhuri & Ger (2006). Probabilistic information retrieval approach for ranking of database query results. ACM Transactions on Database Systems, 31(3), 1134–1168. doi:10.1145/1166074.1166085.
Chaudhuri & Krishnamurthy (2006). Optimizing queries with materialized views. In Proceedings of the 11th ICDE, pp. 190–200.
Chen & Dunham (2006). Answering top-k queries with multi-dimensional selections: The ranking cube approach. ACM.
Christian & Andrea, (2003). Relative Deprivation, Personal Income Satisfaction, and
Average Well-Being under Different Income Distributions, Economics Working
Papers 2003,05, Christian-Albrechts-University of Kiel, Department of
Economics.
Beer, C., & Hendtlass, T. (2012). Improving exploration in ant colony optimisation with antennation. IEEE.
Chuan, Z., Xin, Y., & Jian, Y. (2001). An evolutionary approach to materialized views selection in a data warehouse environment. IEEE Transactions on Systems, Man, and Cybernetics, 31, 282–294.
Chuang, Y., & Chen, C. (2012). Black-box optimization benchmarking for noiseless function testbed using artificial bee colony algorithm. GECCO’10, Portland, Oregon, USA.
Civicioglu, P., & Besdok, E. (2011). A conceptual comparison of the CK, PSO, DE and ABC algorithms. Springer.
Colorni, A., Dorigo, M., & Maniezzo, V. (1994). Ant system for job-shop scheduling. Belgian Journal of Operations Research, Statistics and Computer Science, 4(1), 39–53.
Cosar, C., Reed, B., Silberstein, A., & Srivastava, U. (2008). Automatic optimization of parallel dataflow programs. In USENIX Annual Technical Conference (ATC), pages 267–273.
Karaboga, D., & Ozturk, C. (2011). A novel clustering approach: Artificial bee colony (ABC) algorithm. Applied Soft Computing.
David & Frank Tompa (2007). Optimal top-down join enumeration. School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada.
Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. In OSDI, pages 137–150.
DeHaan, D., & Tompa, F. (2007). Optimal top-down join enumeration. In SIGMOD, pages 785–796.
Deng, W., Chain, H., & Li, H. (2014). A novel hybrid intelligence algorithm for solving combinatorial optimization problems. Journal of Computing Science and Engineering, 8(4), 199–206.
Derakhshan, R., Dehne, F., Korn, O., & Stantic, B. (2006). Simulated annealing for materialized view selection in data warehousing environment. In Proceedings of the 24th IASTED International Conference on Database and Applications, pages 89–94, Anaheim, CA, USA. ACTA Press.
Derakhshan, R., Stantic, B., Korn, O., & Dehne, F. (2008). Parallel simulated annealing for materialized view selection in data warehousing environments. In Proceedings of the 8th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP '08), pages 121–132. Berlin, Heidelberg: Springer-Verlag.
Dervis & Akay, B. (2009). A comparative study of artificial bee colony algorithm. Applied Mathematics and Computation, 214(1), 108–132.
Deshpande, Yihong, Z., Prasad, M. F., & Amit, S. (1998). Simultaneous optimization and evaluation of multiple dimensional queries. In SIGMOD, pages 271–282.
Dong & Horvath (2007). Understanding network concepts in modules. BMC Systems Biology, 1:24.
Dong, & Babu, S. (2011). MapReduce programming and cost-based optimization? Crossing this chasm with Starfish. PVLDB, 4(12), 1446–1449.
Dorigo, M., & Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial Systems. Oxford: Oxford University Press.
Dorigo, M., & Stützle, T. (2004). Ant Colony Optimization. Cambridge, MA: MIT Press.
Dorigo, M., & Gambardella, L. M. (1997). Ant colonies for the traveling salesman problem. BioSystems, 43(2), 73–81.
El-Abd, M. (2010). A cooperative approach to the artificial bee colony algorithm. In IEEE Congress on Evolutionary Computation (CEC).
El-Abd, M. (2005). doi:10.1016/j.aei..01.004.
Elghandour, I., & Aboulnaga, A. (2012). ReStore: Reusing results of MapReduce jobs. PVLDB, 5(6), 586–597.
Fender, P., & Moerkotte, G. (2011). A new, highly efficient, and easy to implement
top-down join enumeration algorithm. In ICDE, pages 864–875.
Fender, P., & Moerkotte, G. (2012). Reassessing top-down join enumeration. TKDE, 24(10), 1803–1818.
Fender, P., Moerkotte, G., Neumann, T., & Leis, V. (2012). Effective and robust pruning for top-down join enumeration algorithms. In ICDE, pages 414–425.
Franklin & Joseph (2003). Flux: An adaptive partitioning operator for continuous query systems. ICDE.
Franklin, Mistry, & Jonsson (1996). Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin.
Frisch, K. (1967). The Dance Language and Orientation of Bees. Cambridge, Mass.:
The Belknap Press of Harvard University Press.
Garro, B. A., Sossa, H., & Vazquez, R. A. (2011). Artificial neural network synthesis by means of artificial bee colony algorithm. In 2011 IEEE Congress on Evolutionary Computation (CEC).
Gelogo, Y. E., & Lee, S. (2014). Database management system as a cloud service. International Journal of Future Generation Communication and Networking, 5(2), June 2012.
Geng, K., Dobbie, G., & Meng, Y. (2009). Survey of XML Semantic Query
Optimization. In Proceedings of the 2009 Fourth International Conference on
Internet Computing for Science and Engineering (ICICSE '09). IEEE Computer
Society, Washington, DC, USA, 297-300.
Giakoumakis, L., & Galindo-Legaria, C. (2008). Testing SQL Server's query optimizer: Challenges, techniques and experiences. IEEE.
Glover, F. (1989). Tabu search, Part I. ORSA Journal on Computing, 1(3), 190–206.
Afrati, F. N., & Ullman, J. D. (2010). Optimizing joins in a map-reduce environment. In EDBT, pages 99–110.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley.
Graefe, G. (1993b). Query evaluation techniques for large databases. ACM Computing Surveys, 25(2).
Graefe, G. (1995). The Cascades framework for query optimization. Data Engineering Bulletin, 18(3).
Graefe, G. (1996). The Microsoft Relational Engine. In Proceedings of the Twelfth International Conference on Data Engineering (ICDE'96).
Graefe, G., & DeWitt, D. J. (1987). The EXODUS optimizer generator. In Proceedings of the 1987 ACM International Conference on Management of Data (SIGMOD'87).
Graefe, G., & McKenna, W. (1993a). The Volcano optimizer generator: Extensibility and efficient search. In Proceedings of the Ninth International Conference on Data Engineering (ICDE'93).
Guido M., & Thomas N. (2008). Dynamic programming strikes back. University of Mannheim, Mannheim, Germany.
Guido M. (2006). Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases, pages 930–941.
Gupta A, et al. (2001) Crystal structure of Rv2118c: an AdoMet-dependent
methyltransferase from Mycobacterium tuberculosis H37Rv. J Mol Biol
312(2):381-91
Haas, L. M., Lin, E. T., & Roth, M. A. (2002). Data integration through database federation. IBM Systems Journal, 41(4).
Hawash, A., Deik, A., & Jarrar, M. (2010). Towards Query Optimization for the Data Web - Disk Based Algorithms: Trace Equivalence and Bisimilarity. In Proceedings of the International Conference on Intelligent Semantic Web-Services and Applications (ISWSA’10), Amman, Jordan (pp. 131–137). doi:10.11451874590.
Horng, J. T., & Yeh, C. C. (2000). Applying genetic algorithms to query optimization in document retrieval. Information Processing & Management, 36(5), 737–759.
Ibaraki, T., & Kameda, T. (1984). On the optimal nesting order for computing n-relational joins. ACM Transactions on Database Systems, 9(3), 482–502.
Ioannidis, Y. (1997). Query optimization. In Handbook for Computer Science. CRC Press.
Jeanne, R. L. (1986). The evolution of the organization of work in social insects. Monitore Zoologico Italiano, 20, 267–287.
Jehrek, R. (2010). Database systems for management (3rd ed.). University of Wisconsin, Madison, Wisconsin, USA.
Jeya, D., & Mohan, V. (2009). ABC Tester - Artificial Bee Colony Based Software
Test Suite Optimization Approach.
Kadkhodaei, H., & Mahmoudi, F. (2011). A combination method for Join Ordering
Problem in relational databases using Genetic Algorithm and Ant Colony.
Kang & Bhargava (1994). Multiple-query optimization at algorithm-level. Data & Knowledge Engineering, 14(1), 57–75.
Karaboga, D., & Basturk, B. (2008). On the performance of artificial bee colony (ABC) algorithm. Applied Soft Computing, 8(1), 687–697.
Karnan (2013). A comprehensive review of artificial bee colony algorithm.
Kennedy, J., Eberhart, R. C., & Shi, Y. (2001). Swarm Intelligence. San Francisco, CA: Morgan Kaufmann.
Kiyoshi O., & Guy M. Lohman (1990). Measuring the complexity of join enumeration in query optimization.
Knight, G. (2014). Writing a Wellcome Trust Data Management & Sharing Plan. London School of Hygiene & Tropical Medicine.
Krink, B., & Thomsen, R. (2004). Noisy optimization problems: A particular challenge for differential evolution. In Proceedings of the 2004 Congress on Evolutionary Computation (pp. 332–339). Piscataway, NJ: IEEE Press.
Krink, D., & Karboga, S. (2008). On the performance of ABC algorithm. Applied Soft Computing, 8(1), 687–697.
Krishnamurthy, R., Boral, H., & Zaniolo, C. (1986). Optimization of nonrecursive queries. In Proceedings of the Conference on Very Large Data Bases (VLDB), Kyoto, Japan, pp. 128–137.
Laura & Johann (1989). Extensible query processing in Starburst. In Proceedings of the ACM International Conference on Management of Data (SIGMOD'89).
Li, N., Liu, Y., Dong, Y., & Gu, J. (2008). Application of Ant Colony Optimization
Algorithm to Multi Join Query Optimization, Springer-Verlag Berlin
Heidelberg.
Lim, H., Herodotou, H., & Babu, S. (2012). Stubby: A transformation-based optimizer for MapReduce workflows. PVLDB, 5(11), 1196–1207.
Ling, C., Yixin, L., Jianli, C., Ling, & Jing, G. (1988). A diversity guaranteed ant colony algorithm based on queries. In ICDE, pages 311–319.
Liu, S., & Wang, Y. (2010). Quantum dynamic mechanism-based parallel ant colony optimization algorithm. International Journal of Computational Intelligence Systems, 3, 101–113.
Liu, X., & Cai, Z. (2009). Artificial bee colony programming made faster. In Fifth International Conference on Natural Computation.
Lohman, O., & Guy, M. (1990). Measuring the complexity of join enumeration in query optimization. In Proceedings of the Sixteenth International Conference on Very Large Databases (VLDB'90).
Lohman, G. (1988). Grammar-like functional rules for representing query optimization alternatives. In Proceedings of the 1988 ACM International Conference on Management of Data (SIGMOD'88).
Luo, Pan, T. S., Tsai, P. W., & Pan, J. S. (2010). Parallelized artificial bee colony with ripple-communication strategy. In Fourth International Conference on Genetic and Evolutionary Computing (ICGEC), IEEE, pp. 350–353.
Management Applications, ISSN 2150-7988, Volume 7, pp. 074–083. MIR Labs, www.mirlabs.net/ijcisim/index.htm.
Maniezzo, V., Dorigo, M., & Colorni, A. (1994). The ant system applied to the quadratic assignment problem. Technical Report IRIDIA/94-28, Université Libre de Bruxelles, Belgium.
Manolescu, I., Bouganim, L., Fabret, F., & Simon, E. (2002). Efficient querying of distributed resources in mediator systems. In CoopIS/DOA/ODBASE 2002, pp. 468–485.
Matysiak, M. (1995). Efficient optimization of large join queries using Tabu Search.
Information Sciences, 83(1-4), 77–88. doi:10.1016/0020-0255(94)00094-R.
McHugh, J., & Widom, J. (1999). Query Optimization for XML. In proceedings of the
25th Very Large Data Bases Conference, Edinburgh, Scotland.
McHugh, J., Abiteboul, S., Goldman, R., Quass, D., & Widom, J. (1997). Lore: A database management system for semistructured data. SIGMOD Record, 26(3), 54–66.
McHugh, J., & Widom, J. (1999). Optimizing branching path expressions. Technical Report, Stanford University.
McLeod & Khan (2003). A probe based technique to optimize join queries in distributed internet databases. Knowledge and Information Systems.
Mehta PK, et al. (1993) Aminotransferases: demonstration of homology and division
into evolutionary subgroups. Eur J Biochem 214(2):549-61.
Mezura-Montes, E., & Cetina-Domínguez, O. (2009). Exploring promising regions of the search space with the scout bee in the artificial bee colony for constrained optimization.
Stillger, M., Lohman, G. M., Markl, V., & Kandil, M. (2001). LEO: DB2's learning optimizer. In Proceedings of the 27th International Conference on Very Large Databases (VLDB'01).
Middendorf, M. (2002). Ant colony optimization. In Tutorial Proceedings of the Genetic and Evolutionary Computation Conference.
Mistry, H., Roy, P., Sudarshan, S., & Ramamritham, K. (2001). Materialized view selection and maintenance using multi-query optimization. In Proceedings of the 2001 ACM SIGMOD Conference, Santa Barbara, CA. ACM Press.
Moerkotte, G., & Neumann, T. (2006). Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases.
Moerkotte, G., & Kemper, A. (1997). Heuristic and randomized optimization for the join ordering problem. VLDB Journal.
Moerkotte, G., & Neumann, T. (2006). Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. VLDB Endowment, Seoul, Korea.
Moerkotte, G., & Neumann, T. (2006). Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy join trees without cross products. In VLDB, pages 930–941.
Moerkotte, G., & Neumann, T. (2008). Dynamic programming strikes back. In SIGMOD, pages 539–552.
Montgomery, J., & Randall, M. (2002). Anti-pheromone as a tool for better exploration of search space. In Proceedings of the Third International Workshop on Ant Algorithms (ANTS ’02), pages 100–110. London, UK: Springer-Verlag.
Mukul, J., & Praveen, S. (2013). Query optimization: An intelligent hybrid approach using cuckoo and tabu search. International Journal of Intelligent Information Technologies, 9(1), 40–55.
Nakamichi, Y., & Arita, T. (2004). Diversity control in ant colony optimization. Artificial Life and Robotics, 7, 198–204.
Narasimhan, H. (2009). Parallel artificial bee colony (PABC) algorithm. In World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), pp. 306–311.
Dalvi, N. N., Sanghai, S. K., Roy, P., & Sudarshan, S. (2003). Pipelining in multi-query optimization. Journal of Computer and System Sciences, 66(4), 728–762.
Noha, A.R., Yousri, Khalil, M., & Nagwa, M. (2005). Algorithms for selecting
materialized views in a data warehouse. In The 3rd ACS/IEEE International
Conference on Computer Systems and Applications, page 27.
Pandao, M., & Isalkar, A. (2012). Multi-query optimization using heuristic approach. International Journal of Computer Science and Network, ISSN 2277-5420.
Astrahan, M. M., et al. (1976). System R: Relational approach to database management. ACM Transactions on Database Systems, 1(2), 97–137.
Selinger, P. G., Astrahan, M. M., Chamberlin, D. D., Lorie, R. A., & Price, T. G. (1979). Access path selection in a relational database management system. In Proceedings of the 1979 ACM International Conference on Management of Data (SIGMOD'79).
Petrou & Amiri, "Robust electrical spin injection into a semiconductor
heterostructure", Phys. Rev. B 62, 8180 (2000).
Pham, D. T., Otri, S., & Afify, A. (2007). Data clustering using the bees algorithm. Intelligent Systems Laboratory, Manufacturing Engineering Centre, Cardiff University, Cardiff CF24 3AA, UK.
Phuboon, J., & Auepanwiriyakul, R. (2007). Selecting materialized views using Two-
Phase optimization with multiple view processing plan. In World Academy of
Science, Engineering and Technology 27,
Phuboonob, J. (2009). Materialized view selection using two-phase optimization algorithm. PhD thesis, National Institute of Development Administration, Bangkapi, Bangkok, Thailand.
Pinal Dave (2014). AdventureWorks 2014. http://blog.sqlauthority.com
Pit Fender & Guido Moerkotte. (2012). Reassessing top-down join enumeration. IEEE
Transactions on Knowledge and Data Engineering, 24(10):1803–1818.
Pit Fender & Guido Moerkotte (2013). Top down plan generation: From theory to practice. In ICDE.
Plale, B., & Schwan, K. (2003). Dynamic querying of event streams with the dQUOB system. IEEE Transactions on Parallel and Distributed Systems, 14(3), 422–432.
Pulikanti, S., & Singh, A. (2009). An ABC for the quadratic knapsack problem. ICONIP 2009, pp. 196–205.
Quan & Xinling Shi (2008). On the analysis of performance of the improved artificial-bee-colony algorithm. In Natural Computation, 7, 654–658.
Randall, M., & Tonkes, E. (2002). Intensification and diversification strategies in ant colony system. Complexity International, 9.
Rashid & Ali (2009). Efficient transformation of a natural language query to SQL. In Proceedings of the Conference on Language & Technology 2009.
Ribeiro, C. C., Ribeiro, C. D., & Lanzelotte, R. S. (1997). Query optimization in
distributed relational databases. Journal of Heuristics, 3(1), 3–23.
doi:10.1023/A:1009670031749.
Riley, J. R., Greggers, U., Smith, A. D., Reynolds, D. R., & Menzel, R. (2005). The flight paths of honeybees recruited by the waggle dance. Nature, 435(7039), 205–207.
Rizzolo, F., & Mendelzon, A. (2001). Indexing XML data with ToXin. In Proceedings of the 4th International Workshop on the Web and Databases (in conjunction with ACM SIGMOD), Santa Barbara, CA, May.
Rodríguez, M. (2010) . Automata Theory Based Approach to the Join Ordering
Problem in Relational Database Systems.
Rosenthal, A., & Chakravarthy, U. S. (1988). Anatomy of a modular multiple query optimizer. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 230–239.
Roussopoulos (2000). WebView materialization. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, Texas, USA, May 2000.
Roy, P., Seshadri, S., Sudarshan, S., & Bhobe, S. (2000). Efficient and extensible algorithms for multi query optimization. SIGMOD Record, 29(2), 249–260.
Eberhart, R. C., & Shi, Y. (1998). Comparison between genetic algorithms and particle swarm optimization. In Evolutionary Programming VII. Springer.
Abiteboul, S., Cluet, S., Christophides, V., Milo, T., Moerkotte, G., & Siméon, J. (1998). Querying documents in object databases. International Journal on Digital Libraries, 1, 5–19.
Salminen, A., & Tompa, F. W. (1994). PAT expressions: An algebra for text search. Acta Linguistica Hungarica, 41, 277–306.
Seeley, T. D., Visscher, P. K., & Passino, K. M. (2006). Group decision making in honey bee swarms. American Scientist, 94, 220–229. doi:10.1511/2006.3.220.
Seeley, T. D. (1995). The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies. Cambridge, MA: Harvard University Press.
Sellis, T. K. (1988). Multiple-query optimization. ACM Transactions on Database Systems, 13(1), 23–52.
Seyed, H., Talebian & Sameem, A. (2009). Using genetic algorithm to select
materialized views subject to dual constraints. In 2009 International Conference
on Signal Processing Systems, pages 633–638, Singapore, 2009.
Shams, I., & Aryanezhad, M. (2010). Optimization of a multiproduct CONWIP-based manufacturing system using artificial bee colony approach. IMECS, Hong Kong.
Shekita, Lapis, G., & Wilms (1993). Starburst mid-flight: As the dust clears. IEEE Transactions on Knowledge and Data Engineering, 2(1).
Shekita, E., Young, H., & Tan, K. L. (1993). Multi-join optimization for symmetric multiprocessors. In Proceedings of the Conference on Very Large Data Bases (VLDB), Dublin, Ireland, pp. 479–492.
Shuang, B., Chen, J., & Li, Z. (2009). Study on hybrid PS-ACO algorithm. Applied Intelligence, 1–10.
Melton, J., & Simon, A. R. (1993). Understanding the New SQL: A Complete Guide. Morgan Kaufmann.
Singh, S. K. (2006). Database Systems: Concepts, Design and Applications. Pearson Education, first impression. ISBN 81-7758-567-3.
Sinha & Chase, C. M. (1996). Prefetching and caching for query scheduling in a special class of distributed applications. ICPP, 3, 95–102.
Sonmez (2010). Design of fiber reinforced laminates for maximum fatigue life. Procedia Engineering, 2, 251–256.
Sood & Qureshi (1985). Database Machines: Illustrated. Computer Systems Architecture. Springer-Verlag. ISBN 0387171649.
Souley, B., & Mohamed, D. (2013). Performance analysis of query optimizers under varying hardware components in RDBMS. Journal of Computer Engineering and Information Technology.
Srivastava, Han, D., Rico-Ramirez, M. A., Bray, M., & Islam, T. (2012). Selection of classification techniques for land use/land cover change investigation. Advances in Space Research, 50(9), 1250–1265.
Steinbrunn, M., Moerkotte, G., & Kemper, A. (1997). Heuristic and randomized optimization for the join ordering problem. The VLDB Journal, 6(3), 191–208. doi:10.1007/s007780050040.
Surajit (1998). An overview of query optimization in relational systems. In Proceedings of the Symposium on Principles of Database Systems (PODS).
Surjanovic, S., & Bingham, D. (2013). Virtual Library of Simulation Experiments: Test Functions and Datasets. Retrieved May 28, 2016, from http://www.sfu.ca/~ssurjano.
Swami, A., & Iyer, B. (1993). A polynomial time algorithm for optimizing join queries. In Proceedings of the IEEE Conference on Data Engineering, Vienna, Austria, pp. 345–354.
Tae, S., & Hyoung, J. (2002). Extracting indexing information from XML DTDs. Information Processing Letters, 81(2), 97–103.
Talbi, E. G., Roux, O., Fonlupt, C., & Robillard, D. (2009). Parallel ant colonies for the quadratic assignment problem. Future Generation Computer Systems, 17, 441–449.
Tang, Y. Y., et al. (2010). Does short-term mental training induce grey matter change? Progress in Modern Biomedicine, 10(15), 2961–2963. 1673-6273.
Teodorović, D., & Dell'Orco, M. (2005). Bee colony optimization: A cooperative learning approach to complex transportation problems. In Proceedings of the 10th EWGT Meeting, Poznan, 13–16 September 2005.
Tereshko, V., & Loengarov, A. (2005). Collective decision-making in honey bee foraging dynamics. Computing and Information Systems Journal, ISSN 1352-9404, 9(3).
Tereshko, V. (2000). Reaction–diffusion model of a honeybee colony’s foraging behavior. In M. Schoenauer (Ed.), Parallel Problem Solving from Nature VI, Lecture Notes in Computer Science, vol. 1917. Berlin: Springer-Verlag.
Tereshko, V., & Lee, T. (2002). How information mapping patterns determine foraging behavior of a honeybee colony. Open Systems and Information Dynamics, 9, 181–193.
Tereshko, V., & Loengarov, A. (2005). Collective decision-making in honey bee foraging dynamics. Computing and Information Systems Journal, 9(3), 1–7.
Thiele, L., Miettinen, K., Korhonen, P.J., & Molina, J. (2009). A preference-based
evolutionary algorithm for multi-objective optimization. Evolutionary
Computation, 17(3), 411–436.
Thusoo, A., & Borthakur, D. (2010). Data warehousing and analytics infrastructure at Facebook.
Nykiel, T., Potamias, M., Mishra, C., Kollios, G., & Koudas, N. (2010). MRShare: Sharing across multiple queries in MapReduce. PVLDB, 3(1–2), 494–505.
Trummer, I., & Koch, C. (2015). Multi-objective parametric query optimization. PVLDB, 8(3), 221–232.
Tsai, et al. (2009). Enhanced artificial bee colony optimization. International Journal of Innovative Computing, Information and Control, 5, ISSN 1349-4198, August 2009.
Tuba, M. (2013). Artificial bee colony (ABC) algorithm, exploitation and exploration balance. Latest Advances in Information Science and Applications.
Chakravarthy, U. S., & Minker, J. (1986). Multiple query processing in deductive databases using query graphs. In VLDB, pages 384–391.
Vidya Banu & N. Nagaveni. (2013) Evaluation of a perturbation-based technique for
Visscher, P. K., & Seeley, T. D. (1982). Foraging strategy of honey bee colonies in a temperate deciduous forest. Ecology, 63(6), 1790–1801.
Vivek, S., & Brajesh, P. (2012). An idea of extraction of information using query optimization and rank query. International Journal of Advanced Research.
Wang & Beni (2011). An improved artificial bee colony algorithm. IEEE.
Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33.
Wang, X., Burns, R., Terzis, A., & Sun (2008). Network-aware join processing in global scale database federations. ICDE 2008.
Wei, S., Daxin, L., & Wansong, Z. (2004). An efficient method for XML queries optimization based on DTD abstraction and classification. In Fifth World Congress on Intelligent Control and Automation (WCICA 2004), vol. 5, pp. 3926–3929.
Wu, S., & Banzhaf, W. (2008). The use of computational intelligence in intrusion detection systems: A review.
Wu, S., Li, F., Mehrotra, S., & Ooi, B. C. (2011). Query optimization for massively parallel data processing. In SoCC, pages 12:1–12:13.
Wu, Y., Patel, J.M., & Jagadish, H.V. (2003). Structural join order selection for XML
query optimization. In: ICDE, pp. 443-454. IEEE Computer Society, New York
Xue, H., Zhang, P., & Yang, L. (2010). A multiple ant colonies optimization algorithm based on immunity for solving TSP. Pages 289–293.
Yang & Deb (2009). Cuckoo search via Lévy flights. World Congress on Nature &
Biologically Inspired Computing (NaBIC 2009). IEEE Publications. pp. 210–
214
Yang, J., Karlapalem, K., & Li, Q. (1997). Algorithms for materialized view design in data warehousing environment. In Proceedings of the 23rd International Conference on Very Large Data Bases, pages 136–145. Morgan Kaufmann Publishers Inc.
Yongwen, X. (1998). Efficiency in the Columbia database query optimizer. M.S. thesis, Portland State University.
Zafarni, E. (1993). New method for optimizing join queries processing in heterogeneous distributed databases. In Proceedings of the Ninth International Conference on Data Engineering (ICDE'93).
Zhang & Lin (2010). An adaptive heterogeneous multiple ant colonies system. Volume 1, pages 193–196.
Zhang, & Chen, Y. (2011). Best-worst ant system. In 2011 3rd International Conference on Advanced Computer Control, pages 392–395.
Zhang, C., & Yang, J. (1999). Genetic algorithm for materialized view selection in data warehouse environments. In Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery, pages 116–125. Springer-Verlag.
Zhou, J., & Larson, P. (2007). Efficient exploitation of similar subexpressions for
query processing. In SIGMOD, pages 533–544.