Why MySQL could be slow with large tables?

June 9, 2006 By Peter Zaitsev 125 Comments 

If you've been reading enough database-related forums, mailing lists, or blogs, you have probably heard complaints about MySQL being unable to handle more than 1,000,000 rows (or pick any other number). On the other hand, it is well known that with customers like Google, Yahoo, LiveJournal, and Technorati, MySQL has installations with many billions of rows that deliver great performance. What could be the reason?

The reason is normally table design and an understanding of the inner workings of MySQL. If you design your data wisely, considering what MySQL can and cannot do, you will get great performance; if not, you might become upset and become one of those bloggers.

Note: any database management system is different in some respects, and what works well for Oracle, MS SQL, or PostgreSQL may not work well for MySQL, and the other way around. Even storage engines have very important differences which can affect performance dramatically.

The three main issues you should be concerned with if you're dealing with very large data sets are buffers, indexes, and joins.

Buffers 

The first thing you need to take into account is that the situation when data fits in memory and when it does not are very different. If you started from an in-memory data size and expect a gradual performance decrease as the database grows, you may be surprised by a severe drop in performance. This especially applies to index lookups and joins, which we cover later. As everything usually slows down a lot once it does not fit in memory, the good solution is to make sure your data fits in memory as well as possible. This can be done by data partitioning (i.e., old and rarely accessed data stored on different servers), multi-server partitioning to use combined memory, and a lot of other techniques which I should cover at some later time.
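To make the buffer side of this concrete, here is a minimal my.cnf sketch. The option names are real MySQL settings; the values are placeholders you would size to your own RAM and workload, not recommendations from this post.

# Hypothetical my.cnf fragment; sizes are illustrative only
[mysqld]
# MyISAM caches index blocks here; data blocks rely on the OS cache
key_buffer_size         = 2G
# InnoDB caches both data and index pages in its buffer pool
innodb_buffer_pool_size = 8G
# Per-connection buffers used for sorts and some joins
sort_buffer_size        = 2M
join_buffer_size        = 2M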

So that you understand how much having data in memory changes things, here is a small example with numbers. If you have your data fully in memory, you can perform over 300,000 random lookups per second from a single thread, depending on the system and table structure. If your data is fully on disk (both data and index), you would need 2+ IOs to retrieve each row, which means you get about 100 rows/sec. Note that multiple drives do not really help a lot here, as we're speaking about a single thread/query. So the difference


is about 3,000 times! That might be a bit extreme, as there are few completely uncached workloads, but a 100+ times difference is quite frequent.

Indexes  

What everyone knows about indexes is that they are good for speeding up access to the database. Some people also remember that whether an index is helpful or not depends on index selectivity: how large a proportion of rows matches a particular index value or range. What is often forgotten is that, depending on whether the workload is cached or not, a different selectivity is needed before an index shows a benefit. In fact, even the MySQL optimizer currently does not take this into account. For an in-memory workload, index access might be faster even if 50% of the rows are accessed, while for disk-IO-bound access we might be better off doing a full table scan even if only a few percent of the rows are accessed.
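When the optimizer picks the wrong plan for your cache situation, MySQL's index hints let you override it. A hedged sketch, assuming a hypothetical table orders with an index idx_status; the hint syntax itself is standard MySQL:

-- Disk-bound workload: force the full table scan even though an index exists
SELECT COUNT(*) FROM orders IGNORE INDEX (idx_status) WHERE status = 'shipped';

-- Cached workload: insist on the index instead
SELECT COUNT(*) FROM orders FORCE INDEX (idx_status) WHERE status = 'shipped';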

Let's do some computations again. Consider a table which has 100-byte rows. With a decent SCSI drive we can get 100MB/sec read speed, which gives us about 1,000,000 rows per second for fully sequential access of tightly packed rows, quite a possible scenario for MyISAM tables. Now, if we take the same hard drive for a fully IO-bound workload, it will be able to provide just about 100 row lookups by index per second. The difference is 10,000 times for our worst-case scenario. It might not be that bad in practice, but again, it is not hard to reach a 100 times difference.

Here is a little illustration. I've created a table with over 30 million rows. The "val" column in this table has 10,000 distinct values, so the range 1..100 selects about 1% of the table. The times for a full table scan vs. a range scan by index:

mysql> select count(pad) from large;
+------------+
| count(pad) |
+------------+
|   31457280 |
+------------+
1 row in set (4 min 58.63 sec)

mysql> select count(pad) from large where val between 1 and 100;
+------------+
| count(pad) |
+------------+
|     314008 |
+------------+
1 row in set (29 min 53.01 sec)
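The original post does not show the table definition; a hypothetical schema that would behave like the one above (only the column names val and pad come from the text, everything else, including the engine, is assumed) might look like this:

-- Hypothetical schema for the example table; for reproduction only
CREATE TABLE large (
  id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
  val INT UNSIGNED NOT NULL,       -- ~10,000 distinct values
  pad VARCHAR(60)  NOT NULL,       -- filler so rows have non-trivial size
  PRIMARY KEY (id),
  KEY idx_val (val)
) ENGINE=MyISAM;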

Also remember: not all indexes are created equal. Some indexes may be placed in a sorted way, or their pages may be placed in random places; this may affect index scan/range scan


speed dramatically. The rows referenced by an index could also be located sequentially or require random IO when index ranges are scanned. There are also clustered keys in InnoDB, which combine index access with data access, saving you IO for completely disk-bound workloads.
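To illustrate the clustered-key point, here is a hedged sketch with made-up table and column names. In InnoDB the primary key is the clustered key, so rows are physically stored in primary-key order and a primary-key range scan reads index and data together:

-- Hypothetical InnoDB table: rows are physically ordered by the primary key
CREATE TABLE events (
  event_time DATETIME NOT NULL,
  device_id  INT      NOT NULL,
  payload    VARCHAR(255),
  PRIMARY KEY (event_time, device_id),  -- clustered: time-range scans read sequentially
  KEY idx_device (device_id)            -- secondary: lookups go back through the primary key
) ENGINE=InnoDB;

-- This range scan follows the clustered key, so index access and data access coincide
SELECT COUNT(*) FROM events
WHERE event_time BETWEEN '2006-01-01 00:00:00' AND '2006-01-31 23:59:59';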

There are certain optimizations in the works which would improve the performance of index accesses/index scans. For example, retrieving index values first and then accessing rows in sorted order can help a lot for big scans. This will reduce the gap, but I doubt it will be closed.

Joins  

Joins are used to compose the complex objects which were previously normalized into several tables, or to perform complex queries finding relationships between objects. A normalized structure and a lot of joins is the right way to design your database, as the textbooks teach you, but when dealing with large data sets it can be a recipe for disaster. The problem is not the data size (normalized data normally becomes smaller) but the dramatically increased number of index lookups, which can be random accesses. This problem exists for all kinds of applications; however, for OLTP applications with queries examining only a few rows, it is less of a problem. Data retrieval, search, DSS, and business intelligence applications, which need to analyze a lot of rows and run aggregates, are where this problem is most dramatic.

Some joins are also better than others. For example, if you have a star join with small dimension tables, it will not slow things down too much. On the other hand, a join of a few large tables, which is completely disk bound, can be very slow.
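As a hedged illustration of that distinction (the fact and dimension table names below are made up): as long as the dimension tables fit in memory, the big fact table is read once and the per-row lookups into the dimensions stay cheap.

-- Hypothetical star schema: one large fact table, two small dimensions
SELECT d.calendar_month, p.category, SUM(f.amount) AS revenue
FROM   sales_fact f
JOIN   dim_date    d ON d.date_id    = f.date_id      -- small, cached dimension
JOIN   dim_product p ON p.product_id = f.product_id   -- small, cached dimension
GROUP BY d.calendar_month, p.category;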

One of the reasons elevating this problem in MySQL is the lack of advanced join methods at this point (the work is on the way): MySQL can't do hash join or sort-merge join; it can only do the nested loops method, which requires a lot of index lookups which may be random.
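To make the nested-loops point concrete, here is a hedged sketch with hypothetical table names. For every row read from the outer table, MySQL performs an index lookup into the inner table; if the inner table does not fit in memory, each lookup is a random disk read:

-- Each qualifying row in orders triggers one lookup into customers
SELECT o.order_id, c.name
FROM   orders    o
JOIN   customers c ON c.customer_id = o.customer_id
WHERE  o.created_at >= '2006-01-01';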

Here is a good example. As we saw, my 30-million-row (12GB) table was scanned in less than 5 minutes. Now, if we do an equality join of that table to another 30-million-row table, the access will be completely random. We would need to perform 30 million random row reads, which gives us 300,000 seconds at the 100 rows/sec rate. So we would go from 5 minutes to almost 4 days if we need to do the join. Some people assume a join would cost close to two full table scans (as 60 million rows need to be read); this is way wrong.


Do not take me as being against normalization or joins. It is a great principle and should be used when possible. Just do not forget about the performance implications when designing the system, and do not expect joins to be free.

Finally, I should mention one more MySQL limitation which requires you to be extra careful when working with large data sets. In MySQL, a single query runs as a single thread (with the exception of MySQL Cluster), and MySQL issues IO requests one by one for query execution, which means that if single-query execution time is your concern, many hard drives and a large number of CPUs will not help. Sometimes it is a good idea to manually split the query into several, run them in parallel, and aggregate the result sets.
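A hedged sketch of that manual split, assuming a hypothetical orders table with a numeric primary key: each chunk below would be run on its own connection, and the application adds the partial results together.

-- Run each chunk on a separate connection/thread; the ranges are illustrative only
SELECT SUM(amount) FROM orders WHERE order_id BETWEEN        1 AND 10000000;
SELECT SUM(amount) FROM orders WHERE order_id BETWEEN 10000001 AND 20000000;
SELECT SUM(amount) FROM orders WHERE order_id BETWEEN 20000001 AND 30000000;
-- The application sums the three partial results to get the final answer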

So, if you're dealing with large data sets and complex queries, here are a few tips.

Try to fit the data set you're working with in memory: processing in memory is so much faster, and you have a whole bunch of problems solved just by doing so. Use multiple servers to host portions of the data set. Store the portion of data you're going to work with in a temporary table, etc. (see the sketch after these tips).

Prefer full table scans to index accesses: for large data sets, full table scans are often faster than range scans and other types of index lookups. Even if you look at 1% of rows or less, a full table scan may be faster.

Avoid joins to large tables: joining large data sets using nested loops is very expensive. Try to avoid it. Joins to smaller tables are OK, but you might want to preload them into memory before the join so there is no random IO needed to populate the caches.
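Two of these tips lend themselves to short sketches. The table and column names below are hypothetical; the statements themselves are ordinary MySQL syntax.

-- Pull just the working subset into a temporary table before heavy processing
CREATE TEMPORARY TABLE recent_orders
SELECT * FROM orders WHERE created_at >= '2006-06-01';

-- Warm a small lookup table before joining to it; a plain scan pulls it into the caches
SELECT COUNT(*) FROM dim_product;
-- For MyISAM tables the index can also be preloaded explicitly
LOAD INDEX INTO CACHE dim_product;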

With proper application architecture and table design, you can build applications operating on very large data sets based on MySQL.

Filed Under: Insight for DBAs, Production, Tips 

About Peter Zaitsev 

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments


1. David Stone says:

November 21, 2009 at 8:17 am 

Hello all,

I have a very large table, 3 billion rows, 300GB in size. It's of IP traffic logs. Select queries were slow until I added an index on the timestamp field. It took about 6 hours to create the index, as I tweaked some buffer sizes to help it along. Previous attempts at creating the index took days. The index file is about 28GB in size now. The server has 4GB of RAM and dual Core 2 2.6GHz processors. All it does is process these logs and handle the occasional query we need to do when we run a report for someone, maybe once a week.

Adding the index really helped our reporting, BUT now the inserts are taking forever. Each file we process is about 750MB in size, and we insert 7 of them nightly. It used to take about 9 hours to insert all the files, and now it takes upwards of 15 hours, which is becoming a problem.

My original insert script used a mysqli prepared statement to insert each row as we iterate through the file, using the getcsv() function. I did some reading and found some instances where mysqli can be slow, so yesterday I modified the script to use regular mysql functions, but with an insert statement using multiple VALUES to insert 50 records at a time. There is no appreciable performance gain.

I know I will most likely have to break this very large table down into tables by week.

My question is what my plan of attack should be to get the best insert performance?

MERGE tables?

PARTITION tables?

Simply break up my big table into smaller ones?

I think the root of my issue is that the indexes don't fit into RAM.


2. David Stone says:

November 21, 2009 at 8:21 am 

I forgot to add that while the server is inserting the logs, I see very LOW disk throughput, just 1.5Mb/s! And very low CPU usage as well! I was hoping to see the machine either disk- or CPU-bound to help troubleshoot what the problem is, but this is not the case.

Thanks all!

3. zia says:

December 15, 2009 at 5:45 am 

Hi,

What are the causes of a MySQL index getting lost? Can MySQL lose an index during a high traffic load?

Best Regards

Zia Ullah Butt

4. Mustafa TURAN says:

December 27, 2009 at 6:32 pm 


records having the same value in column5, and another value in column10 of table A equal to column6 of table B. I use an inner join. Following this post I tried to divide it into smaller tables and run one SQL statement at a time, but it is still not faster. Could you please advise me? I need this urgently.

Best regards

Seuth

7. Frank Qaxy says:

August 17, 2010 at 6:44 am 

"One of the reasons elevating this problem in MySQL is the lack of advanced join methods at this point (the work is on the way): MySQL can't do hash join or sort-merge join; it can only do the nested loops method, which requires a lot of index lookups which may be random."

Are the advanced join methods available now?

8. Eddie says:

August 22, 2010 at 2:16 am 

Hello,

I need to delete all 300,000 records found in one table from a table that has 20,000,000 records, and neither the subquery I wrote nor the join I wrote gives me any result at all in over 12 hours. This is being done locally on a laptop with 2 GB of RAM and a dual-


core 1.86 GHz CPU, while nothing else is happening. The large table has 2 indexes on it and totals 3 GB, more than the RAM in the machine; this is done on Ubuntu 10. Is this normal, for a delete involving 2 tables to take so long? Is there something special about a delete that makes it MUCH MUCH slower than a select? And will it improve performance to remove the indexes so the larger table size is less than 2 GB? (There are only 3 fields of one character each in the table, plus one field of 40 characters, which is indexed and is the field being used for the SQL statement relating both tables.) MySQL works so fast on some things that I wonder if I have a real problem in my Ubuntu OS, or MySQL 5.1, or PHP 5.1.x? Can a real expert please comment on whether these are realistic times or not, and offer advice on how to improve performance without adding RAM, which I will be able to do when I can afford new hardware.

Thanks very much.

9. Erick says:

August 22, 2010 at 2:23 am 

Eddie, this depends on

1. Your MY.CONF settings

http://www.notesbit.com/index.php/web-mysql/mysql/mysql-tuning-optimizing-my-cnf-file/

2. The type of table it is: is it MyISAM or InnoDB? For InnoDB, you may also adjust the InnoDB-specific settings.

10. nilesh says:

August 24, 2010 at 7:56 am 

8/7/2019 Why MySQL could be slow with large tables

http://slidepdf.com/reader/full/why-mysql-could-be-slow-with-large-tables 10/14

I think MySQL does support hash joins and merge joins (merge on primary, hash otherwise). It's been a while since I've worked on MySQL, so please correct me if I am wrong, and update the current status on the blog itself. It would be a great help to readers. Thanks

11. Alex says:

September 21, 2010 at 12:13 pm 

Hi

We have a small data warehouse with a 50-million-row fact table (around 12GB). It is partitioned by month. (We are using InnoDB 1.0.6.)

Some of our queries need to access the entire table (full table scan); these queries are deadly slow. Our I/O system offers around 60MB/sec, but before this limit is reached the I/O system is flooded by a very high number of IOPS (we have observed around 1200 IOPS).

It seems to me that the limitation is in how MySQL (or InnoDB) reads data: it is not capable of doing scatter reads, every disk read is only 16K, and reading a large table this way basically screws up the I/O system.

The only solution we found is to increase memory and try to cache the table, but that doesn't seem to me a REAL solution in the long term.

Is there anything we can do with the MySQL and InnoDB configuration?

Thanks


12. Rag says:

November 13, 2010 at 2:05 pm 

I have around 900,000 user records, and I have problems with login, which is very slow (terribly slow). All I have is PHP/MySQL on a VPS with 768MB RAM. The total size of the MySQL data is just around 650 MB.

What I am using to check the login is this simple query: "SELECT useremail,password FROM USERS WHERE useremail=".$_REQUEST['USER_EMAIL']."AND password=".$_REQUEST['USER_PASSWORD'];

Sometimes it takes a few minutes, or else it times out.

My max script execution time in PHP is set to 30 secs.

Any idea to improve?

Thanks in Advance

13. John M Ventimiglia says:

November 13, 2010 at 2:10 pm 

While your question is better suited elsewhere, may I put my ESP cap on and suggest you add indexes? There are many very good articles on optimizing queries on this site. You may want to read them.


14. Mohammed says:

December 3, 2010 at 5:39 am 

Peter,

I have a MySQL database performance issue and I have posted it at the link below on the MySQL Performance forum.

Please provide your view on this; it is very urgent and critical.

http://forum.percona.com/index.php/t/1639/ 

Thanks

Mohammed.

15. peter says:

December 4, 2010 at 11:59 am 

Mohammed,

If you require urgent assistance for a project of critical importance, the forum is not the right way to seek help, as it is only looked at in spare time.

In such cases commercial services work much better: http://www.percona.com/contact/sales/


16. subrahmanyam k says:

February 1, 2011 at 11:48 pm 

Hi,

I am using this MySQL query:

select contentid,SheetName,languageName from contentapplicationview where sheetName in (select tblapplicationauthorized.date from tblapplicationauthorized where applicationid='" + applicationId + "' and authscrname='" + screenName + "') and fkApplicationId='" + applicationId + "' order by SheetName,languageName

I have 40,000 rows in the database, and whenever I fire this query it takes too much time to get the data from the database.

What should I do about this?

thanks,

subrahmanyam k

17. Andrew says:

February 24, 2011 at 5:11 am 

In response to Rag:

What I am using to check the login is this simple query: "SELECT useremail,password FROM USERS WHERE useremail=".$_REQUEST['USER_EMAIL']."AND password=".$_REQUEST['USER_PASSWORD'];


Probably down to the way your MySQL table is set up. I'd be more concerned about your login though; I hope a bit further up the script you have

$_REQUEST['USER_PASSWORD'] = mysql_real_escape_string($_REQUEST['USER_PASSWORD']);

otherwise some little script kiddie is going to cause you an even bigger problem in the future.