Horizontal aggregations in SQL to prepare dataset using Split-SPJ method




Welcome To

Thesis Presentation


Presentation On

Horizontal Aggregations in SQL to prepare Dataset using Split-SPJ Method

A THESIS & PROJECT

BY

Arifur Rahman (074051)
Md. Taz Uddin (074044)
Md. Tareq Imran (074050)

Supervised BY

Sumaya Kazary
Assistant Professor, Dept. of CSE, DUET


Overview

1. Introduction
2. Analysis
3. Experimental Overview
4. Compare Performance
5. Future Plans

April 10, 2023


Introduction

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, significant manual effort is required to build data sets where a horizontal layout is required. We propose simple yet powerful methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations, evaluated here with the Split-SPJ method. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms.


Introduction (Contd)

Data Mining: Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.


Introduction (Contd)

Dataset: A dataset (or data set) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the dataset in question.

Vertical Aggregation: A vertical aggregation arranges the result of a query (for example, one using the GROUP BY clause in SQL) vertically, with one row per group. Relational database systems generally return aggregations in this vertical form.
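As a minimal sketch of a vertical aggregation, the example below runs a GROUP BY query in SQLite through Python's sqlite3 module. The sales table, its columns, and its rows are hypothetical and are not part of the thesis data; they only illustrate that each group comes back as its own row.

```python
import sqlite3

# Hypothetical sample table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("s1", "mon", 10), ("s1", "mon", 5), ("s1", "tue", 7), ("s2", "mon", 3)],
)

# Vertical aggregation: GROUP BY returns one row per (store, day) group.
vertical_rows = conn.execute(
    "SELECT store, day, SUM(amount) FROM sales "
    "GROUP BY store, day ORDER BY store, day"
).fetchall()

for row in vertical_rows:
    print(row)
```

Each store appears on several rows, once per day, which is the layout a data mining tool usually cannot consume directly.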


Introduction (Contd)

Horizontal Aggregation: We introduce a new class of aggregations that behave similarly to standard SQL aggregations but produce tables with a horizontal layout. In contrast, we call standard SQL aggregations vertical aggregations, since they produce tables with a vertical layout. Horizontal aggregations require only a small syntax extension to aggregate functions called in a SELECT statement.
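SQLite has no such syntax extension, so the sketch below only illustrates the horizontal layout itself, using CASE expressions in standard SQL. This is a swapped-in technique for illustration, not the method of this presentation, and the sales table and its values are hypothetical.

```python
import sqlite3

# Hypothetical sample table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("s1", "mon", 10), ("s1", "mon", 5), ("s1", "tue", 7), ("s2", "mon", 3)],
)

# Horizontal layout: one row per store, one aggregated column per day.
# A CASE without ELSE yields NULL, so a store with no rows for a day
# gets NULL in that day's column.
horizontal_rows = conn.execute(
    "SELECT store, "
    "SUM(CASE WHEN day = 'mon' THEN amount END) AS amount_mon, "
    "SUM(CASE WHEN day = 'tue' THEN amount END) AS amount_tue "
    "FROM sales GROUP BY store ORDER BY store"
).fetchall()

for row in horizontal_rows:
    print(row)
```

Every distinct value of day becomes a column, so the width of the result depends on the data rather than on the query author.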


Analysis

Problem of Horizontal Aggregation: The number of result columns may exceed the number of columns the DBMS allows. Two limits can be reached: the maximum number of columns in one table, and the maximum column name length when columns are named automatically.

To elaborate on this, a horizontal aggregation can return a table that goes beyond the maximum number of columns in the DBMS when the set of columns {R1, ..., Rk} has a large number of distinct combinations of values, or when there are multiple horizontal aggregations in the same query.
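One way to anticipate this problem is to count the distinct combinations before building the result. The sketch below is an assumption-laden illustration: the helper names horizontal_column_count and exceeds_limit, the sales table, and the max_columns parameter are all hypothetical, and table and column names are interpolated directly, so they must come from trusted input.

```python
import sqlite3

def horizontal_column_count(conn, table, pivot_cols):
    """Count distinct value combinations of the pivot columns R1..Rk.

    Each combination would become one column of the horizontal result.
    """
    cols = ", ".join(pivot_cols)
    sql = f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
    return conn.execute(sql).fetchone()[0]

def exceeds_limit(n_group_cols, n_aggregated_cols, max_columns):
    """True when grouping columns plus aggregated columns pass the limit."""
    return n_group_cols + n_aggregated_cols > max_columns

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, day TEXT, product TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("s1", "mon", "a", 10),
        ("s1", "mon", "b", 5),
        ("s1", "tue", "a", 7),
        ("s2", "mon", "a", 3),
    ],
)

needed = horizontal_column_count(conn, "sales", ["day", "product"])
too_wide = exceeds_limit(1, needed, max_columns=255)
print(needed, too_wide)
```

With several horizontal aggregations in one query, the counts for each would be added together before the comparison.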


Analysis (Contd)

Column limits of different database systems:

Database                Maximum permitted columns
Microsoft Access        255
Microsoft SQL Server    1024
MySQL                   4096
Oracle                  Default 1000, but it can be increased by command


Analysis (Contd)

Introducing the Split-SPJ method

If the vertical attributes of a table are:
ID, VA1, VA2, VA3, VA4, ..., VA255, VA256, VA257, ..., VA272, VA273
(these cannot be aggregated into a single table with the SPJ method)

The output of the Split-SPJ method:
Table-1: ID, VA1, VA2, VA3, VA4, VA5, VA6, VA7, ..., VA255
Table-2: ID, VA256, VA257, ..., VA270, VA271, VA272, VA273
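The split shown above can be sketched as a simple chunking of the attribute list. The function name split_columns and the parameter max_aggregated are hypothetical; the default of 255 follows the split point used on this slide, and the ID column is repeated in every output table so the tables can be joined back together.

```python
def split_columns(key, columns, max_aggregated=255):
    """Split a wide column list into per-table lists, each led by the key.

    max_aggregated is the number of aggregated columns per output table
    (an assumption taken from the slide's split point).
    """
    return [
        [key] + columns[start:start + max_aggregated]
        for start in range(0, len(columns), max_aggregated)
    ]

# VA1 .. VA273, as in the example above.
columns = [f"VA{i}" for i in range(1, 274)]
tables = split_columns("ID", columns)

for number, table_columns in enumerate(tables, start=1):
    print(f"Table-{number}: {table_columns[0]}, "
          f"{table_columns[1]} .. {table_columns[-1]}")
```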


Experimental Overview

Vertical aggregation of experimental data:

Facebook_id   Image_name   Character_length
User1         Pic1         31
User1         Pic2         27
User1         Pic4         20
User1         Pic10        30
...           ...          ...
User4         Pic200       10
User4         Pic220       26
User4         Pic299       15
User4         Pic340       25
User4         Pic360       35


Experimental Overview (Contd)

Horizontal aggregation in SPJ method:

Facebook_id  Image_name_pic1  Image_name_pic2  Image_name_pic3  ...  Image_name_pic255
User1  31  31  20
User2  14  17  14
User3  17  15  13
User4  10  5  8


Experimental Overview (Contd)

Horizontal aggregation in proposed Split-SPJ method:

Table-1

Facebook_id  Image_name_pic1  Image_name_pic2  Image_name_pic3  ...  Image_name_pic255
User1  31  31  20
User2  14  17  14
User3  17  15  13
User4  10  5  8

Table-2

Facebook_id  Image_name_pic256  Image_name_pic257  Image_name_pic258  ...  Image_name_pic360
User1  31  31  60  50
User2  14  45  40
User3  17  15
User4  10  5  80
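The two-table result above can be sketched end to end in SQLite. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes SPJ means building each aggregated column with its own select-project subquery and left-joining it onto the list of user ids, and that Split-SPJ repeats this for each chunk of columns. The names spj_select, split_spj, photos, and table_N are hypothetical, the column limit is set to 2 instead of 255 to keep the example small, and the rows are a few values from the vertical table. Identifiers and values are interpolated directly into the SQL, so they must come from trusted input.

```python
import sqlite3

def spj_select(source, group_col, pivot_col, value_col, values):
    """SPJ-style SELECT for one chunk of pivot values.

    Each value gets its own aggregated subquery, left-joined onto the
    distinct group keys so that missing combinations appear as NULL.
    """
    cols = [f"f0.{group_col} AS {group_col}"]
    joins = []
    for i, value in enumerate(values, start=1):
        cols.append(f"f{i}.agg AS {pivot_col}_{value}")
        joins.append(
            f"LEFT JOIN (SELECT {group_col}, SUM({value_col}) AS agg "
            f"FROM {source} WHERE {pivot_col} = '{value}' "
            f"GROUP BY {group_col}) f{i} "
            f"ON f0.{group_col} = f{i}.{group_col}"
        )
    return (
        f"SELECT {', '.join(cols)} "
        f"FROM (SELECT DISTINCT {group_col} FROM {source}) f0 "
        + " ".join(joins)
    )

def split_spj(conn, source, group_col, pivot_col, value_col, max_cols):
    """Create one output table per chunk of at most max_cols pivot values."""
    values = [row[0] for row in conn.execute(
        f"SELECT DISTINCT {pivot_col} FROM {source} ORDER BY {pivot_col}")]
    names = []
    for n, start in enumerate(range(0, len(values), max_cols), start=1):
        name = f"table_{n}"
        chunk = values[start:start + max_cols]
        conn.execute(
            f"CREATE TABLE {name} AS "
            + spj_select(source, group_col, pivot_col, value_col, chunk))
        names.append(name)
    return names

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos "
    "(facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
conn.executemany(
    "INSERT INTO photos VALUES (?, ?, ?)",
    [
        ("User1", "Pic1", 31), ("User1", "Pic2", 27), ("User1", "Pic4", 20),
        ("User4", "Pic200", 10), ("User4", "Pic220", 26),
    ],
)

table_names = split_spj(
    conn, "photos", "facebook_id", "image_name", "character_length",
    max_cols=2)
first_table = conn.execute(
    "SELECT * FROM table_1 ORDER BY facebook_id").fetchall()
print(table_names)
print(first_table)
```

Because every output table carries facebook_id, the pieces can be joined back on that key whenever a consumer can handle the full width.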


Experimental Overview

Compare Performance:

When the number of aggregated columns is below 255, performance is the same for the SPJ and Split-SPJ methods.


Experimental Overview (Contd)

Compare Performance:

When the number of aggregated columns exceeds 255, the SPJ method is unable to aggregate beyond 255 columns.


Experimental Overview (Contd)

Compare Performance:

When the number of aggregated columns exceeds 255, the Split-SPJ method can aggregate them into multiple tables.


Future Plan


If the length of an aggregated object exceeds the column length allowed by the database, an error occurs, which may be overcome by using an alias method. This means it is very complex to aggregate when data fields contain an image or a file (such as BLOB data).
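The alias idea is not specified further here, so the following is only one possible reading: replace any automatically generated column name that is too long with a short alias and keep a lookup from the original name. The function name make_aliases, the alias pattern c1, c2, ..., and the max_len value of 30 are all hypothetical.

```python
def make_aliases(generated_names, max_len=30):
    """Map each generated column name to a name no longer than max_len.

    Names that already fit are kept; longer ones get a short positional
    alias. max_len is an assumed limit, not a value from any specific DBMS.
    """
    aliases = {}
    for position, name in enumerate(generated_names, start=1):
        aliases[name] = name if len(name) <= max_len else f"c{position}"
    return aliases

names = [
    "Image_name_pic1",
    "Image_name_" + "x" * 40,   # a generated name that is too long
    "Image_name_pic3",
]
aliases = make_aliases(names)
for original, alias in aliases.items():
    print(alias, "<-", original)
```

The lookup would need to be stored alongside the output tables so that the short names can still be traced back to the values they stand for.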
