Kudo Codefest: Faster data retrieval with SQL query optimization

Faster Data Retrieval with SQL Query Optimization

Andrew Kaligis <[email protected]>

Ajeng Tya Meiranti <[email protected]>

Kudo uses agents as its primary business model.

PROBLEM

To make Kudo grow, we need to grow our agent base across all provinces.

More agents mean more transactions.

Growing transactions mean many kinds of data are saved into our database.

Millions of rows are spread across hundreds of tables in our database.

How do we search faster across these millions of rows?

How do we maintain performance while our data keeps growing every day?

Indexing

“An index in a database is like the index in a book.”

Consider indexing a column when:

The column is often used in a WHERE clause or a join condition.

The column contains a wide range of values.

The column contains many null values.

The table is large and most queries retrieve only a small share of its rows (roughly 2-4% or less).

Indexing

The whole point of having an index is to speed up search queries by cutting down the number of rows in a table that need to be examined.
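To see the effect of an index on the number of rows examined, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The agents table, its sample rows, and the index name idx_agents_city are assumptions made for illustration; other database engines report query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, used only for this illustration.
conn.execute(
    "CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)"
)
cities = ["Jakarta", "Depok", "Bandung", "Surabaya"]
conn.executemany(
    "INSERT INTO agents (agent_name, city) VALUES (?, ?)",
    [(f"agent_{i}", cities[i % len(cities)]) for i in range(10_000)],
)


def query_plan(sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


lookup = "SELECT agent_name FROM agents WHERE city = ?"

# Without an index SQLite has to scan the whole table.
plan_without_index = query_plan(lookup, ("Depok",))

# With an index on the filtered column it can search instead.
conn.execute("CREATE INDEX idx_agents_city ON agents (city)")
plan_with_index = query_plan(lookup, ("Depok",))

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The first plan should describe a full table scan and the second a search that uses idx_agents_city. The exact wording of the plan text varies between SQLite versions.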

Avoid SELECT *

Some programmers have a habit of writing "SELECT * FROM my_table".

A query with * selects every column of every row it reads.

Example: our table has 6 columns (id, agent_name, address, city, province_id, distributor_id) and 1,000,000 rows. Each cell contains 0.1 KB of data.

Fetch all columns:

0.1 KB * 6 columns * 1,000,000 rows = 600,000 KB (585.9 MB)

Fetch only the required columns (agent_name and city):

0.1 KB * 2 columns * 1,000,000 rows = 200,000 KB (195.3 MB)

The difference between the two queries is significant.

So, never use * in a query unless you really need every column.
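The same comparison can be tried on a small scale. This is a rough sketch with Python's sqlite3 module; the sample rows are made up, and counting characters in the returned values is only a crude stand-in for the amount of data transferred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same six columns as the example table; the rows are hypothetical.
conn.execute(
    """CREATE TABLE agents (
           id INTEGER PRIMARY KEY, agent_name TEXT, address TEXT,
           city TEXT, province_id INTEGER, distributor_id INTEGER)"""
)
conn.executemany(
    "INSERT INTO agents (agent_name, address, city, province_id, distributor_id)"
    " VALUES (?, ?, ?, ?, ?)",
    [(f"agent_{i}", f"Jl. Example No. {i}", "Jakarta", 31, i % 10)
     for i in range(1_000)],
)


def payload_size(sql):
    """Rough size of a result set: total characters across all returned values."""
    return sum(len(str(value)) for row in conn.execute(sql) for value in row)


all_columns = payload_size("SELECT * FROM agents")
needed_columns = payload_size("SELECT agent_name, city FROM agents")

print("SELECT *               :", all_columns, "characters")
print("SELECT agent_name, city:", needed_columns, "characters")
```

Naming only agent_name and city returns a noticeably smaller result set than SELECT * for the same rows.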

Pagination: query with LIMIT and OFFSET

Retrieve data faster to show to the end user.

Case: showing 50 rows per page needs only 0.1 KB * 2 columns * 50 rows = 10 KB (small, isn't it?)
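A page of 50 rows can be fetched with LIMIT and OFFSET as in the sketch below. It uses Python's sqlite3 module; the agents table, the sample data, and the helper name fetch_page are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with 1,000 sample agents.
conn.execute(
    "CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO agents (id, agent_name, city) VALUES (?, ?, ?)",
    [(i, f"agent_{i}", "Jakarta") for i in range(1, 1001)],
)

PAGE_SIZE = 50


def fetch_page(page_number):
    """Return one page of (agent_name, city) rows; pages are numbered from 1."""
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        # ORDER BY keeps the page boundaries stable between requests.
        "SELECT agent_name, city FROM agents ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


first_page = fetch_page(1)
second_page = fetch_page(2)
print(len(first_page), first_page[0], second_page[0])
```

Each call returns at most 50 rows, so the amount of data sent to the end user stays small no matter how large the table grows.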

Joining Many Tables Is Bad

Split “joined query”: Total Sales By Main Category

Category

id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category

id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

Item_Category

id  item_id  category_id
1   8001     5
2   8002     6
3   8003     7
4   8004     8

Order

id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4

Split “joined query”: Case Query

-- order is a reserved word in SQL, so the table name is quoted
SELECT "order".total_sales
FROM "order"
LEFT JOIN (
    SELECT item_category.item_id,
           item_category.category_id,
           map_category.main_category_id
    FROM item_category
    LEFT JOIN map_category
        ON item_category.category_id = map_category.category_id
    GROUP BY item_category.item_id
) AS flag_category
    ON "order".item_id = flag_category.item_id

Split “joined query”: Part 1

Category

id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category

id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

SELECT category_id, main_category_id FROM map_category

Item_Category

id  item_id  category_id
1   8001     5
2   8002     6
3   8003     7
4   8004     8

SELECT category_id, item_id FROM item_category

Order

id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4

SELECT item_id, total_sales FROM "order"
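The slides do not show how the three simple result sets are combined afterwards. One possible way, sketched here with Python's sqlite3 module, is to join them in application code with dictionary lookups. The rows are taken from the "Total Sales By Main Category" slide; the variable names and the in-memory SQLite setup are assumptions.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE map_category (id INTEGER, category_id INTEGER, main_category_id INTEGER);
    CREATE TABLE item_category (id INTEGER, item_id INTEGER, category_id INTEGER);
    CREATE TABLE "order" (id INTEGER, item_id INTEGER, total_sales INTEGER);
    INSERT INTO map_category VALUES (1, 5, 3), (2, 6, 2), (3, 7, 1), (4, 8, 4);
    INSERT INTO item_category VALUES (1, 8001, 5), (2, 8002, 6), (3, 8003, 7), (4, 8004, 8);
    INSERT INTO "order" VALUES (1, 8001, 3), (2, 8002, 2), (3, 8003, 1), (4, 8004, 4);
    """
)

# Three simple single-table queries instead of one nested join.
main_category_of = dict(
    conn.execute("SELECT category_id, main_category_id FROM map_category")
)
category_of_item = {
    item_id: category_id
    for category_id, item_id in conn.execute(
        "SELECT category_id, item_id FROM item_category"
    )
}
orders = conn.execute('SELECT item_id, total_sales FROM "order"').fetchall()

# Combine the results in application code with dictionary lookups.
sales_by_main_category = defaultdict(int)
for item_id, total_sales in orders:
    main_category_id = main_category_of[category_of_item[item_id]]
    sales_by_main_category[main_category_id] += total_sales

print(dict(sales_by_main_category))
```

Each query touches a single table, and the lookups that the join performed inside the database happen in memory instead. Whether this is actually faster than the joined query depends on the table sizes and the indexes available, so it is worth measuring both.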

Caching Mechanism

Load data faster without querying the database server.

Redis uses RAM to store data. This helps fetch data faster, because processing data in RAM is faster than reading it from a hard disk.

Redis uses a key-value data structure. We can get a specific collection using a specific key.

Caching Mechanism: sample implementation
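The original sample implementation is not preserved in this transcript. As a stand-in, here is a minimal sketch of the cache-aside pattern in Python. To keep it self-contained, a plain dict plays the role of Redis (both map a key to a stored value); the agents table, the key format agent:<id>, and the function name get_agent are assumptions, not Kudo's actual code.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one sample agent.
conn.execute(
    "CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)"
)
conn.execute("INSERT INTO agents VALUES (1, 'Budi', 'Depok')")

cache = {}                     # stand-in for Redis: key -> serialized value
stats = {"database_hits": 0}   # counts how often we fall through to the database


def get_agent(agent_id):
    """Return an agent as a dict, checking the cache before the database."""
    key = f"agent:{agent_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no query is made

    stats["database_hits"] += 1    # cache miss: query the database once
    row = conn.execute(
        "SELECT id, agent_name, city FROM agents WHERE id = ?", (agent_id,)
    ).fetchone()
    if row is None:
        return None
    agent = {"id": row[0], "agent_name": row[1], "city": row[2]}
    cache[key] = json.dumps(agent)  # store for later requests
    return agent


first = get_agent(1)
second = get_agent(1)
print(first, second, stats["database_hits"])
```

The second call is answered from the cache, so the database is queried only once. A real setup would also need an expiry time or an invalidation step so that cached values do not go stale when the underlying row changes.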

Denormalization table

it contains rows with multiple values for an attribute (repeating groups) or

Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.

https://en.wikipedia.org/wiki/Denormalization

Denormalization table

Still, denormalization brings the danger of update anomalies back to the database. Therefore, you have to do it deliberately. You should document any denormalization thoroughly.

Shipping

Id  name
1   TIKI
2   JNE

Address

Id  name
1   Jakarta
2   Depok

Item

Id  name
1   Shoes
2   Handphone

Order_id  Order_date  Shipping_name  Address_name  Item_name
12010     2016/05/26  TIKI           Jakarta       Handphone
12011     2016/05/26  TIKI           Depok         Handphone
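The trade-off can be made concrete with a small sketch in Python's sqlite3 module. The lookup tables and the denormalized rows follow the slide; the order_normalized table with foreign-key columns and both table names are hypothetical, added only to show what the denormalized table replaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shipping (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item     (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO shipping VALUES (1, 'TIKI'), (2, 'JNE');
    INSERT INTO address  VALUES (1, 'Jakarta'), (2, 'Depok');
    INSERT INTO item     VALUES (1, 'Shoes'), (2, 'Handphone');

    -- Hypothetical normalized order table that stores only foreign keys.
    CREATE TABLE order_normalized (
        order_id INTEGER PRIMARY KEY, order_date TEXT,
        shipping_id INTEGER, address_id INTEGER, item_id INTEGER);
    INSERT INTO order_normalized VALUES
        (12010, '2016/05/26', 1, 1, 2),
        (12011, '2016/05/26', 1, 2, 2);

    -- Denormalized table: the names are copied into each order row.
    CREATE TABLE order_denormalized (
        order_id INTEGER PRIMARY KEY, order_date TEXT,
        shipping_name TEXT, address_name TEXT, item_name TEXT);
    INSERT INTO order_denormalized VALUES
        (12010, '2016/05/26', 'TIKI', 'Jakarta', 'Handphone'),
        (12011, '2016/05/26', 'TIKI', 'Depok', 'Handphone');
    """
)

# Normalized read: three joins are needed to resolve the names.
joined = conn.execute(
    """SELECT o.order_id, o.order_date, s.name, a.name, i.name
       FROM order_normalized AS o
       JOIN shipping AS s ON s.id = o.shipping_id
       JOIN address  AS a ON a.id = o.address_id
       JOIN item     AS i ON i.id = o.item_id
       ORDER BY o.order_id"""
).fetchall()

# Denormalized read: a single-table query returns the same rows.
flat = conn.execute(
    """SELECT order_id, order_date, shipping_name, address_name, item_name
       FROM order_denormalized ORDER BY order_id"""
).fetchall()

print(flat)
```

Both reads return the same rows, but the denormalized one needs no joins. The cost is the update anomaly mentioned above: renaming a shipping provider now means updating every order row that carries a copy of its name.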

“The fastest query is the one you never make.”

Andrew Kaligis [email protected]

Ajeng Tya Meiranti [email protected]
