Kudo Codefest: Faster data retrieval with SQL query optimization
Faster Data Retrieval with SQL Query Optimization
Andrew Kaligis <[email protected]>
Ajeng Tya Meiranti <[email protected]>
Kudo uses agents as its primary business model
PROBLEM
To make Kudo grow, we need to grow our agent base across all provinces.
More agents means more transactions.
Growing transaction volume means many kinds of data are saved in our database.
Millions of records are spread across hundreds of tables in our database.
How do we search faster across millions of records?
How do we maintain performance while our data keeps growing every day?
Indexing
“Indexing in a database is like the index in a book.”
Consider indexing a column when:
The column is often used in a WHERE clause or a join condition.
The column contains a wide range of values.
The column contains many null values.
The table is large and most queries retrieve less than 2-4% of the rows.
Indexing
The whole point of having an index is to speed up
search queries by essentially cutting down
the number of records/rows in a table that need to be
examined.
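A minimal sketch of that effect, using Python's built-in sqlite3 module so it runs without a database server. The agent table, its contents, and the index name idx_agent_city are assumptions made for illustration, and the plan text differs between database engines.

```python
import sqlite3

# Hypothetical "agent" table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO agent (agent_name, city) VALUES (?, ?)",
    [(f"agent_{i}", f"city_{i % 100}") for i in range(10_000)],
)


def query_plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


lookup = "SELECT agent_name FROM agent WHERE city = 'city_42'"

plan_before = query_plan(lookup)  # no index on city: the whole table is scanned
conn.execute("CREATE INDEX idx_agent_city ON agent (city)")
plan_after = query_plan(lookup)   # with the index: only matching rows are visited

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows the query switching from a table scan to a search through the index.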
Some programmers have a habit of writing "SELECT * FROM my_table".
Avoid (Select * from)
A query with * selects every column during the table scan.
Example: our table has 6 columns (id, agent_name, address, city, province_id, distributor_id) and 1,000,000 rows. Each cell contains 0.1 KB of data.
Fetch all columns: 0.1 KB * 6 columns * 1,000,000 rows = 600,000 KB (585.9 MB)
Fetch only the required columns (agent_name & city): 0.1 KB * 2 columns * 1,000,000 rows = 200,000 KB (195.3 MB)
Avoid (Select * from)
The difference between the two queries is significant.
So never use * in a query unless you actually need every column.
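The comparison can be sketched on a small scale with Python's built-in sqlite3 module. The row contents and the payload_chars helper are assumptions for illustration; the helper counts characters as a rough stand-in for the amount of data fetched.

```python
import sqlite3

# Hypothetical six-column agent table, matching the column names in the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent (id INTEGER PRIMARY KEY, agent_name TEXT, address TEXT,"
    " city TEXT, province_id INTEGER, distributor_id INTEGER)"
)
conn.executemany(
    "INSERT INTO agent (agent_name, address, city, province_id, distributor_id)"
    " VALUES (?, ?, ?, ?, ?)",
    [(f"agent_{i}", f"street {i}", "Jakarta", i % 34, i % 10) for i in range(1000)],
)


def payload_chars(sql):
    """Rough size of a result set: total characters across all fetched cells."""
    return sum(len(str(cell)) for row in conn.execute(sql) for cell in row)


all_columns = payload_chars("SELECT * FROM agent")
needed_columns = payload_chars("SELECT agent_name, city FROM agent")

print(all_columns, needed_columns)
```

Naming only agent_name and city fetches a fraction of what SELECT * fetches for the same rows.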
Pagination
Query with LIMIT and OFFSET
Retrieve data faster to show to the end user
Case: showing 50 rows per page needs 0.1 KB * 2 columns * 50 rows = 10 KB (small, isn't it?)
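A minimal pagination sketch with LIMIT and OFFSET, using Python's built-in sqlite3 module. The agent table, its 1,000 sample rows, and the fetch_page helper are assumptions for illustration.

```python
import sqlite3

PAGE_SIZE = 50  # rows shown per page, as in the case above

# Hypothetical agent table with 1,000 rows (ids 1..1000).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO agent (agent_name, city) VALUES (?, ?)",
    [(f"agent_{i}", "Jakarta") for i in range(1, 1001)],
)


def fetch_page(page, page_size=PAGE_SIZE):
    """Return one page of (agent_name, city) rows; pages are numbered from 1."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT agent_name, city FROM agent ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


first_page = fetch_page(1)
third_page = fetch_page(3)
```

The ORDER BY keeps page boundaries stable between requests. With very large offsets the database still has to step over the skipped rows, so deep pages cost more than early ones.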
Joining Many Tables Is Bad
Split “joined query”: Total Sales By Main Category

Category
id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category
id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

Item_Category
id  item_id  category_id
1   8001     5
2   8002     6
3   8003     7
4   8004     8

Order
id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4
Split “joined query”: Case Query
-- "order" is a reserved word, so it is quoted here (MySQL uses backticks instead)
SELECT "order".total_sales
FROM "order"
LEFT JOIN (SELECT item_category.item_id,
                  item_category.category_id,
                  map_category.main_category_id
           FROM item_category
           LEFT JOIN map_category
             ON item_category.category_id = map_category.category_id
           GROUP BY item_category.item_id) AS flag_category
  ON "order".item_id = flag_category.item_id
Split “joined query”: Part 1

Category
id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category
id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

SELECT category_id, main_category_id FROM map_category
Split “joined query”: Part 2

Item_Category
id  item_id  category_id
1   8001     3
2   8002     2
3   8003     1
4   8004     4

SELECT category_id, item_id FROM item_category
Split “joined query”: Part 3

Order
id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4

SELECT item_id, total_sales FROM "order"
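One way the three simple queries could be combined in application code is sketched below with Python's built-in sqlite3 module. The table contents follow the sample tables for "Total Sales By Main Category", with Item_Category holding sub-category ids 5 to 8. The in-memory database, the extra lookup of category names, and the dictionary-based merge are assumptions for illustration, not necessarily how the authors implemented it.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE map_category (id INTEGER PRIMARY KEY, category_id INTEGER,
                           main_category_id INTEGER);
CREATE TABLE item_category (id INTEGER PRIMARY KEY, item_id INTEGER,
                            category_id INTEGER);
CREATE TABLE "order" (id INTEGER PRIMARY KEY, item_id INTEGER,
                      total_sales INTEGER);
INSERT INTO category VALUES (1,'Fashion'),(2,'Healthy'),(3,'Electronic'),
    (4,'Others'),(5,'TV'),(6,'Tooth Health'),(7,'Shoes'),(8,'Toys');
INSERT INTO map_category VALUES (1,5,3),(2,6,2),(3,7,1),(4,8,4);
INSERT INTO item_category VALUES (1,8001,5),(2,8002,6),(3,8003,7),(4,8004,8);
INSERT INTO "order" VALUES (1,8001,3),(2,8002,2),(3,8003,1),(4,8004,4);
""")

# Part 1: category -> main category (plus names for readable output).
main_of = dict(conn.execute(
    "SELECT category_id, main_category_id FROM map_category"))
names = dict(conn.execute("SELECT id, name FROM category"))

# Part 2: item -> category.
category_of_item = dict(conn.execute(
    "SELECT item_id, category_id FROM item_category"))

# Part 3: sales per item.
sales = conn.execute('SELECT item_id, total_sales FROM "order"').fetchall()

# Merge in the application instead of in one joined query.
totals = defaultdict(int)
for item_id, total_sales in sales:
    main_id = main_of[category_of_item[item_id]]
    totals[names[main_id]] += total_sales

print(dict(totals))
```

Each query touches a single table, and the small lookup dictionaries are natural candidates for caching.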
Caching Mechanism
Load data faster without querying the server
Caching Mechanism
Redis uses RAM to store data. It helps fetch data faster, because processing data in RAM is faster than on a hard disk.
Redis uses a key-value data structure. We can get a specific collection using a specific key.
Caching Mechanism Sample implementation
Denormalization table
it contains rows with multiple values for an attribute (repeating groups) or
Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.
https://en.wikipedia.org/wiki/Denormalization
Denormalization table
Still, denormalization brings the danger of update anomalies back to the database. Therefore, you have to do it deliberately. You should document any
denormalization thoroughly.
Shipping
id  name
1   TIKI
2   JNE

Address
id  name
1   Jakarta
2   Depok

Item
id  name
1   Shoes
2   Handphone

Order_id  Order_date  Shipping_name  Address_name  Item_name
12010     2016/05/26  TIKI           Jakarta       Handphone
12011     2016/05/26  TIKI           Depok         Handphone
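The trade-off can be sketched with Python's built-in sqlite3 module. The lookup tables and the denormalized rows follow the sample data above; the order_normalized table with foreign-key ids is an assumption added to show what the denormalized table replaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipping (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO shipping VALUES (1,'TIKI'),(2,'JNE');
INSERT INTO address VALUES (1,'Jakarta'),(2,'Depok');
INSERT INTO item VALUES (1,'Shoes'),(2,'Handphone');

-- Assumed normalized form: orders reference the lookup tables by id.
CREATE TABLE order_normalized (order_id INTEGER PRIMARY KEY, order_date TEXT,
    shipping_id INTEGER, address_id INTEGER, item_id INTEGER);
INSERT INTO order_normalized VALUES
    (12010,'2016/05/26',1,1,2),(12011,'2016/05/26',1,2,2);

-- Denormalized form: the names are copied into the order row.
CREATE TABLE order_denormalized (order_id INTEGER PRIMARY KEY, order_date TEXT,
    shipping_name TEXT, address_name TEXT, item_name TEXT);
INSERT INTO order_denormalized VALUES
    (12010,'2016/05/26','TIKI','Jakarta','Handphone'),
    (12011,'2016/05/26','TIKI','Depok','Handphone');
""")

# Reading the normalized data needs three joins.
joined = conn.execute("""
    SELECT o.order_id, o.order_date, s.name, a.name, i.name
    FROM order_normalized AS o
    JOIN shipping AS s ON s.id = o.shipping_id
    JOIN address AS a ON a.id = o.address_id
    JOIN item AS i ON i.id = o.item_id
    ORDER BY o.order_id
""").fetchall()

# Reading the denormalized data needs none.
flat = conn.execute("""
    SELECT order_id, order_date, shipping_name, address_name, item_name
    FROM order_denormalized
    ORDER BY order_id
""").fetchall()
```

Both reads return the same rows, but renaming a shipping provider now means updating every copied name in order_denormalized, which is the update anomaly the warning above refers to.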
“The fastest query is the one you never make.”
Andrew Kaligis [email protected]
Ajeng Tya Meiranti [email protected]