Kudo Codefest: Faster data retrieval with SQL query optimization

Transcript of Kudo Codefest: Faster data retrieval with SQL query optimization

Page 1

Faster Data Retrieval with SQL Query Optimization

Andrew Kaligis, Ajeng Tya Meiranti

Page 2

PROBLEM

Kudo uses agents as its primary business model.

To make Kudo grow, we need to grow our agent network across all provinces.

More agents means more transactions.

Growing transactions mean many kinds of data are saved into our database.

Millions of rows are spread across hundreds of tables in our database.

Page 3

How do we search faster across millions of rows?

Page 4

How do we maintain performance while our data keeps growing every day?

Page 5

Indexing

"Indexing in a database is like the index in a book."

Consider creating an index when:

The column is often used in a WHERE clause or a join condition.

The column contains a wide range of values.

The column contains many null values.

The table is large and most queries are expected to retrieve less than 2-4% of the rows.

Page 6

Indexing

The whole point of having an index is to speed up search queries by cutting down the number of rows in a table that need to be examined.
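To see an index cut down the rows examined, the sketch below asks SQLite for its query plan before and after creating an index. It is only an illustration: the agents table, its sample rows, and the index name idx_agents_city are assumptions, and other database engines report their plans differently.

```python
import sqlite3

# Hypothetical agents table, loosely modelled on the columns named in this talk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agents (agent_name, city) VALUES (?, ?)",
    [(f"agent-{i}", f"city-{i % 100}") for i in range(10_000)],
)

QUERY = "SELECT agent_name FROM agents WHERE city = ?"


def query_plan(connection, sql, params):
    """Return SQLite's textual plan steps for a query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return [row[-1] for row in rows]


plan_before = query_plan(conn, QUERY, ("city-7",))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_agents_city ON agents (city)")
plan_after = query_plan(conn, QUERY, ("city-7",))   # lookup through the new index

print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the whole table; afterwards it reports a search that uses idx_agents_city.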

Page 7

Avoid SELECT * FROM

Some programmers have a habit of writing "SELECT * FROM my_table".

A query with * selects every column during the table scan.

Example: our table has 6 columns (id, agent_name, address, city, province_id, distributor_id) and 1,000,000 rows. Each cell contains 0.1 KB of data.

Fetch all columns: 0.1 KB * 6 columns * 1,000,000 rows = 600,000 KB (585.9 MB)

Fetch only the required columns (agent_name and city): 0.1 KB * 2 columns * 1,000,000 rows = 200,000 KB (195.3 MB)
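The size estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes every cell has the same size (0.1 KB); the function name estimated_transfer is made up for the illustration.

```python
def estimated_transfer(cell_kb, columns, rows):
    """Return (kilobytes, megabytes) for a result set of equally sized cells."""
    kilobytes = cell_kb * columns * rows
    return kilobytes, kilobytes / 1024


all_columns_kb, all_columns_mb = estimated_transfer(0.1, 6, 1_000_000)
two_columns_kb, two_columns_mb = estimated_transfer(0.1, 2, 1_000_000)

print(f"all 6 columns     : {all_columns_kb:,.0f} KB ({all_columns_mb:.1f} MB)")
print(f"agent_name and city: {two_columns_kb:,.0f} KB ({two_columns_mb:.1f} MB)")
```

Selecting two of the six columns moves roughly a third of the data.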

Page 8

Avoid SELECT * FROM

The difference between the two queries is significant.

So never use * in your query unless you really need every column.
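As a small illustration of the advice, the sketch below runs both forms of the query against an in-memory SQLite table with the six columns from the example. The sample rows are placeholder values invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE agents (
           id INTEGER PRIMARY KEY, agent_name TEXT, address TEXT,
           city TEXT, province_id INTEGER, distributor_id INTEGER)"""
)
conn.executemany(
    "INSERT INTO agents (agent_name, address, city, province_id, distributor_id) "
    "VALUES (?, ?, ?, ?, ?)",
    [(f"agent-{i}", f"street {i}", "Jakarta", 31, 7) for i in range(1, 4)],
)

# Every column comes back, whether or not the caller uses it.
wide_rows = conn.execute("SELECT * FROM agents").fetchall()

# Only the two columns that are actually displayed.
narrow_rows = conn.execute("SELECT agent_name, city FROM agents").fetchall()

print(len(wide_rows[0]), "values per row with SELECT *")
print(len(narrow_rows[0]), "values per row with named columns")
```

Naming the columns also keeps the result shape stable if columns are later added to the table.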

Page 9

Pagination

Query with LIMIT and OFFSET

Retrieve data faster to show to the end user.

Case: showing 50 rows per page needs 0.1 KB * 2 columns * 50 rows = 10 KB (small, isn't it?)
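A minimal sketch of LIMIT/OFFSET pagination, using SQLite through Python. The helper name fetch_page and the sample table are assumptions made for the illustration; the page size of 50 comes from the case above.

```python
import sqlite3

PAGE_SIZE = 50  # rows per page, as in the case above


def fetch_page(connection, page, page_size=PAGE_SIZE):
    """Return one page of (agent_name, city) rows; pages are numbered from 1."""
    offset = (page - 1) * page_size
    return connection.execute(
        # ORDER BY keeps the page boundaries stable between requests.
        "SELECT agent_name, city FROM agents ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agents (agent_name, city) VALUES (?, ?)",
    [(f"agent-{i}", "Jakarta") for i in range(1, 121)],
)

first_page = fetch_page(conn, 1)
third_page = fetch_page(conn, 3)  # only 20 of the 120 rows remain for page 3
print(len(first_page), len(third_page))
```

Without an ORDER BY clause the database is free to return rows in any order, so the same row could appear on two pages.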

Page 10

Joining Many Tables Is Bad

Page 11

Split "joined query": Total Sales by Main Category

Category

id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category

id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

Item_Category

id  item_id  category_id
1   8001     5
2   8002     6
3   8003     7
4   8004     8

Order

id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4

Page 12

Split "joined query": Case Query

-- order is a reserved word, so it is quoted here (MySQL uses backticks instead)
SELECT "order".total_sales
FROM "order"
LEFT JOIN (SELECT item_category.item_id,
                  item_category.category_id,
                  map_category.main_category_id
           FROM item_category
           LEFT JOIN map_category
                  ON item_category.category_id = map_category.category_id
           GROUP BY item_category.item_id) AS flag_category
       ON "order".item_id = flag_category.item_id

Page 13

Split "joined query": Part 1

Category

id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category

id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

SELECT category_id, main_category_id
FROM map_category

Page 14

Split "joined query": Part 2

Item_Category

id  item_id  category_id
1   8001     5
2   8002     6
3   8003     7
4   8004     8

SELECT category_id, item_id
FROM item_category

Page 15

Split "joined query": Part 3

Order

id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4

SELECT item_id, total_sales
FROM "order"
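The three simple queries can then be combined in application code. The sketch below is one possible way to do that, not the authors' implementation: it loads the sample rows from the Page 11 tables into SQLite, runs the three queries, and sums total_sales per main category in Python. The variable names are hypothetical, and Part 2 selects item_id first only so that the result can be turned directly into a lookup.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE map_category (id INTEGER, category_id INTEGER, main_category_id INTEGER);
    CREATE TABLE item_category (id INTEGER, item_id INTEGER, category_id INTEGER);
    CREATE TABLE "order" (id INTEGER, item_id INTEGER, total_sales INTEGER);
    INSERT INTO map_category VALUES (1, 5, 3), (2, 6, 2), (3, 7, 1), (4, 8, 4);
    INSERT INTO item_category VALUES (1, 8001, 5), (2, 8002, 6), (3, 8003, 7), (4, 8004, 8);
    INSERT INTO "order" VALUES (1, 8001, 3), (2, 8002, 2), (3, 8003, 1), (4, 8004, 4);
    """
)

# Part 1: category -> main category
main_of_category = dict(
    conn.execute("SELECT category_id, main_category_id FROM map_category")
)
# Part 2: item -> category
category_of_item = dict(
    conn.execute("SELECT item_id, category_id FROM item_category")
)
# Part 3: sales per item
sales = conn.execute('SELECT item_id, total_sales FROM "order"').fetchall()

# Combine in application code instead of joining in the database.
sales_by_main_category = defaultdict(int)
for item_id, total_sales in sales:
    main_category_id = main_of_category[category_of_item[item_id]]
    sales_by_main_category[main_category_id] += total_sales

print(dict(sales_by_main_category))
```

The two lookup tables are small and change rarely, which also makes them natural candidates for caching. Whether splitting is faster than the join depends on table sizes and indexes, so it is worth measuring both.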

Page 16

Caching Mechanism

Load data faster without querying the server.

Page 17

Caching Mechanism

Redis uses RAM to store data. This helps fetch data faster, because processing data in RAM is faster than on a hard disk.

Redis uses a key-value data structure. We can get a specific collection using a specific key.

Page 18

Caching Mechanism

Sample implementation
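The sketch below is not the authors' implementation; it is a minimal cache-aside example under stated assumptions. To stay runnable without a Redis server, it uses a hypothetical DictCache class as an in-memory stand-in for a key-value client. The key format, the agents_in_city function, and the sample rows are likewise invented for the illustration.

```python
import json
import sqlite3


class DictCache:
    """In-memory stand-in for a Redis client (hypothetical, for illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def agents_in_city(connection, cache, city):
    """Cache-aside read: try the cache first, fall back to SQL and remember the result."""
    key = f"agents:city:{city}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no SQL query is issued
    rows = [
        row[0]
        for row in connection.execute(
            "SELECT agent_name FROM agents WHERE city = ? ORDER BY id", (city,)
        )
    ]
    cache.set(key, json.dumps(rows))  # stored as a string, as a key-value store would
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agents (agent_name, city) VALUES (?, ?)",
    [("Andi", "Jakarta"), ("Budi", "Depok"), ("Citra", "Jakarta")],
)

cache = DictCache()
first = agents_in_city(conn, cache, "Jakarta")   # miss: reads from the database
conn.execute("DELETE FROM agents")               # the table changes afterwards
second = agents_in_city(conn, cache, "Jakarta")  # hit: served from the cache
print(first, second)
```

The second call still returns the deleted rows, which shows both the benefit (no database query) and the cost (stale data). With a real Redis client, the same get and set pattern applies, and giving each key an expiry time is a common way to limit staleness.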

Page 19

Table Denormalization

Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.

A denormalized table may contain rows with multiple values for an attribute (repeating groups).

https://en.wikipedia.org/wiki/Denormalization

Page 20

Table Denormalization

Still, denormalization brings the danger of update anomalies back to the database. Therefore, you have to do it deliberately. You should document any denormalization thoroughly.

Shipping

Id  name
1   TIKI
2   JNE

Address

Id  name
1   Jakarta
2   Depok

Item

Id  name
1   Shoes
2   Handphone

Denormalized order table

Order_id  Order_date  Shipping_name  Address_name  Item_name
12010     2016/05/26  TIKI           Jakarta       Handphone
12011     2016/05/26  TIKI           Depok         Handphone
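The tables above can be turned into a small runnable sketch. The normalized orders table with its foreign-key columns (shipping_id, address_id, item_id) and the name order_report are assumptions, since the slide shows only the lookup tables and the denormalized result. The final update uses an invented new item name to illustrate the update anomaly mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shipping (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item     (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO shipping VALUES (1, 'TIKI'), (2, 'JNE');
    INSERT INTO address  VALUES (1, 'Jakarta'), (2, 'Depok');
    INSERT INTO item     VALUES (1, 'Shoes'), (2, 'Handphone');

    -- Assumed normalized order table: it stores only foreign keys.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                         shipping_id INTEGER, address_id INTEGER, item_id INTEGER);
    INSERT INTO orders VALUES (12010, '2016/05/26', 1, 1, 2),
                              (12011, '2016/05/26', 1, 2, 2);

    -- Denormalized copy: the names are stored redundantly, so reads need no join.
    CREATE TABLE order_report AS
    SELECT o.order_id, o.order_date,
           s.name AS shipping_name, a.name AS address_name, i.name AS item_name
    FROM orders o
    JOIN shipping s ON s.id = o.shipping_id
    JOIN address  a ON a.id = o.address_id
    JOIN item     i ON i.id = o.item_id;
    """
)

report = conn.execute(
    "SELECT order_id, order_date, shipping_name, address_name, item_name "
    "FROM order_report ORDER BY order_id"
).fetchall()
print(report)

# Update anomaly: renaming an item in the source table does not reach the copy.
conn.execute("UPDATE item SET name = 'Smartphone' WHERE id = 2")
stale = conn.execute(
    "SELECT item_name FROM order_report WHERE order_id = 12010"
).fetchone()[0]
print(stale)
```

Reads from order_report need no join, but every change to a lookup table must also be applied to the redundant copy, which is why the denormalization should be deliberate and documented.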

Page 21

"The fastest query is the one you never make."

Page 22

Andrew Kaligis

Ajeng Tya Meiranti