InnoDB Full-text Search in MySQL 5.6

By Ernie Souhrada

Copyright © 2006-2014 Percona LLC

In "InnoDB Full-text Search in MySQL 5.6," senior consultant Ernie Souhrada opens with a quick overview of Full-text Search (FTS) in InnoDB and some observations that he's made while getting it configured. In Chapter 2, he compares query results between MyISAM FTS and InnoDB FTS over the same data sets and also provides insight into query performance. He closes by revisiting some of the “quirks” from Chapters 1 and 2 to see if the behavior has changed.


Table of Contents

Chapter 1: FTS in InnoDB
Chapter 2: The Queries!
Chapter 3: Performance

About Percona

Percona was founded in August 2006 by Peter Zaitsev and Vadim Tkachenko and now employs a global network of experts with a staff of more than 100 people. Our customer list is large and diverse, including Fortune 50 firms, popular websites, and small startups. We have over 1,800 customers and, although we do not reveal all of their names, chances are we're working with every large MySQL user you've heard about. To put Percona's MySQL expertise to work for you, please contact us.


Chapter 1: FTS in InnoDB

I’ve never been a very big fan of MyISAM; I would argue that in most situations, any possible advantages to using MyISAM are far outweighed by the potential disadvantages and the strengths of InnoDB. However, up until MySQL 5.6, MyISAM was the only storage engine with support for full-text search (FTS).

I’ve encountered many customers for whom the prudent move would be a migration to InnoDB, but due to their use of MyISAM FTS, the idea of a complete or partial migration was, for one reason or another, an impractical solution. So, when FTS for InnoDB was first announced, I thought this might end up being the magic bullet that would help these sorts of customers realize all of the benefits that have been engineered into InnoDB over the past few years and still keep their FTS capability without having to make any significant code changes.

Unfortunately, I think that hope may be premature. While it is true that InnoDB full-text search in MySQL 5.6 is syntactically identical to MyISAM full-text search, in the sense that the SQL required to run a MATCH .. AGAINST is the same (modulo any new features introduced with InnoDB full-text search), that’s largely where the similarities end.

Chapter 1 provides a very quick overview of FTS in InnoDB and some observations that I’ve made while getting it configured. In Chapter 2, I’ll compare query results between MyISAM FTS and InnoDB FTS over the same data sets, and then finally, in Chapter 3, we’ll look at query performance. In Chapter 3 I’ll also revisit some of the “quirks” from Chapters 1 and 2 to see if the behavior has changed.

NOTE 2: For purposes of this discussion, I used two separate data sets. The first one is a set of about 8K very SEO-stuffed web pages, wherein the document title is the page title, and the document body is the HTML-tag-stripped body of the page. We’ll call this data set “SEO”; it’s about 20MB of actual data. The other one is a set of about 790K directory records, each one containing the name, address, and some other public-records-type information about each person. We’ll call this data set “DIR”, and it’s about 155MB of actual data.

NOTE 3: Also, keep in mind that I used the community editions of MySQL 5.5.30 and MySQL 5.6.10 with no tuning whatsoever (with one exception that I’ll explain below). The idea behind this investigation wasn’t to find out how to make InnoDB FTS blazingly fast, but simply to get a sense of how it works compared to traditional MyISAM FTS. We’ll get to performance in Chapter 3. For now, the important number to note is that the InnoDB buffer pool for my 5.6 instance is 128MB, smaller than the size of my DIR data.

So, with all of that out of the way, let’s get to it.


Here is our basic table definition for the DIR dataset. The table for the SEO dataset looks identical, except that we replace “full_name” with a VARCHAR(255) “title” and “details” with a TEXT “body”.
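As a rough sketch of what such a pair of definitions might look like: the column names, the INT UNSIGNED primary key, and the VARCHAR(255)/TEXT types on the SEO side come from the surrounding text, while the table names and the DIR column types are assumptions made for illustration, not the exact DDL used here.

```sql
-- Sketch only: table names and the DIR column types are assumptions.
CREATE TABLE dir_test_innodb (
  id        INT UNSIGNED NOT NULL,
  full_name VARCHAR(100),
  details   MEDIUMTEXT,
  PRIMARY KEY (id),
  FULLTEXT KEY (full_name, details)
) ENGINE=InnoDB;

-- Same shape for the SEO data: title/body replace full_name/details.
CREATE TABLE seo_test_innodb (
  id    INT UNSIGNED NOT NULL,
  title VARCHAR(255),
  body  TEXT,
  PRIMARY KEY (id),
  FULLTEXT KEY (title, body)
) ENGINE=InnoDB;
```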

We also have identical tables created in 5.5.30 where, of course, the only difference is that the engine is MyISAM rather than InnoDB. Loading the data was done via a simple Perl script, inserting one row at a time with AutoCommit enabled; remember, the focus here isn’t on performance just yet.

Having loaded the data, the first thing we notice is that there are a lot of “new” InnoDB tablespace files in our database directory:


By comparison, this is what we see on the MyISAM side:


I also observed that if I simply load the data into an InnoDB table that has never had a full-text index on it, and then I create one, the following warning is generated:
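A minimal sketch of the sequence that leads to this situation, assuming a hypothetical table dir_test_innodb that was created and loaded without any FULLTEXT KEY (the index name is also made up). The exact warning text is not reproduced here; as the discussion that follows explains, it concerns a rebuild to add a hidden document-ID column.

```sql
-- Sketch only: dir_test_innodb and ft_name_details are hypothetical names.
-- The table is assumed to have been loaded without a FULLTEXT index.
ALTER TABLE dir_test_innodb
  ADD FULLTEXT KEY ft_name_details (full_name, details);

-- Display any warnings raised by the ALTER above.
SHOW WARNINGS;
```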

This doesn’t make a lot of sense to me. Why does InnoDB need to add a hidden column (similar to GEN_CLUST_INDEX when you don’t define a PRIMARY KEY, I assume) when I already have an INT UNSIGNED PK that should suffice as any sort of document ID? As it turns out, if you create a column called FTS_DOC_ID which is a BIGINT UNSIGNED NOT NULL with a unique index on it, your table doesn’t need to be rebuilt. The most important item to note here: FTS_DOC_ID must be spelled and specified exactly that way, IN ALL CAPS. If you try “fts_doc_id” or any other mixture of lettercase, you’ll get an error:

Various points in the documentation mention the notion of a “document ID” that “might reflect the value of an ID column that you defined for the underlying table, or it can be a sequence value generated by InnoDB when the table does not contain a suitable column,” but there are only a handful of references to FTS_DOC_ID found when searching the MySQL 5.6 manual, and the only page which appears to suggest how using an explicitly-defined column is done is this one, which discusses improving bulk insert performance.

At the very bottom, the page claims that you can speed up bulk loading into an InnoDB FT index by declaring a column called FTS_DOC_ID at table creation time of type BIGINT UNSIGNED NOT NULL with a unique index on it, loading your data, and then creating the FT index after the data is loaded.


One obvious problem with those instructions is that if you define a column and a unique key as they suggest, your data won’t load due to unique key constraint violations unless you also do something to provide some sort of sequence value for that column, whether as an auto_increment value or via some other means. The bit that troubles me further is the introduction of a column-level case-sensitivity requirement that only seems to actually matter at table creation time.
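The bulk-load recipe described above might be sketched as follows. The table name, the full-text index name, and the column sizes are assumptions; the unique index name follows the manual's bulk-loading example, and AUTO_INCREMENT is just one way to supply the sequence values mentioned above.

```sql
-- Sketch only: table name, index name ft_name_details, and column sizes
-- are assumptions made for illustration.
CREATE TABLE dir_test_docid (
  id         INT UNSIGNED NOT NULL,
  FTS_DOC_ID BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- must be upper case here
  full_name  VARCHAR(100),
  details    MEDIUMTEXT,
  PRIMARY KEY (id),
  UNIQUE KEY FTS_DOC_ID_INDEX (FTS_DOC_ID)
) ENGINE=InnoDB;

-- ... bulk-load the rows here ...

-- Create the FULLTEXT index only after the data is in place.
ALTER TABLE dir_test_docid
  ADD FULLTEXT KEY ft_name_details (full_name, details);
```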

Once I’ve got a table with an explicit FTS_DOC_ID column, however, MySQL apparently has no problem with either of the following statements:
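For illustration only (these are not the original statements, and the table name and values are made up), statements of the following kind refer to the column in other lettercases. Column names are not case-sensitive in ordinary MySQL statements, which is what makes the creation-time requirement stand out.

```sql
-- Illustrative only: hypothetical table and values, not the original statements.
INSERT INTO dir_test_docid (id, fts_doc_id, full_name, details)
VALUES (1, 1, 'Example Person', 'example details');

SELECT Fts_Doc_Id, full_name
  FROM dir_test_docid
 WHERE fts_doc_id = 1;
```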

Philosophically, I find that kind of behavior unsettling. I don’t like case-sensitivity in my table or column names to begin with (I may be one of the few people that likes lower_case_table_names = 1 in /etc/my.cnf), but I think it’s even worse that the case-sensitivity only matters some of the time. That strikes me as a good recipe for DBA frustration.

Now, let’s return to all of those FTS_*.ibd files. What, exactly, are they? In short, the _CONFIG.ibd file contains configuration info about the FT index (the same sort of configuration data that can be queried out of the I_S.INNODB_FT_CONFIG table, as I’ll discuss momentarily), and the others contain document IDs of new rows that are added to or removed from the table and which need to be merged back into or removed from the main index.

I’m not entirely sure about the _STOPWORDS.ibd file just yet; I thought it might be somehow related to a custom stopwords table, but that doesn’t seem to be the case (or at least not in the way that I had thought), so I will need to look through the source to figure out what’s going on there.


In any case, for each new FULLTEXT KEY that you create, you’ll get a corresponding FTS_*_DOC_ID.ibd file (but none of the others), and if you drop a FT index, its corresponding FTS_*_DOC_ID.ibd file will also be removed. HOWEVER, even if you drop all of the FT indexes for a given table, you’re still left with all of the other FTS_*.ibd files, and it appears that the only way to get rid of them is to actually rebuild the table.

Also, while we’re on the subject of adding and dropping FT indexes, it’s entirely possible to DROP multiple FT indexes with InnoDB in the same ALTER TABLE statement, but it’s not possible to CREATE more than one at a time. If you try it, this is what happens:


That’s an odd limitation. Do it as two separate ALTER statements, and it appears to work fine.
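A minimal sketch of the workaround, with hypothetical table and index names: each FULLTEXT index is added in its own ALTER TABLE, while both can be dropped together.

```sql
-- Sketch only: table and index names are hypothetical.
-- Adding two FULLTEXT indexes: one ALTER TABLE statement per index.
ALTER TABLE dir_test_innodb ADD FULLTEXT KEY ft_name (full_name);
ALTER TABLE dir_test_innodb ADD FULLTEXT KEY ft_details (details);

-- Dropping both can be done in a single statement.
ALTER TABLE dir_test_innodb
  DROP INDEX ft_name,
  DROP INDEX ft_details;
```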

But here’s where things start to get even weirder.

According to the documentation, if we specify the name of a table that has a FT index for the global variable innodb_ft_aux_table, we should be able to get some statistics about the FT indexes on that table by querying the various I_S.INNODB_FT_* tables.

In particular, the INNODB_FT_CONFIG table is supposed to “display metadata about the FULLTEXT index and associated processing for an InnoDB table.”

The documentation also tells us that we can keep our FT indexes up to date by setting innodb_optimize_fulltext_only = 1 and then running OPTIMIZE TABLE, and that we might have to run OPTIMIZE TABLE multiple times if we’ve had a lot of changes to the table.

This all sounds pretty good, in theory, but at least some part of it doesn’t seem to work. First, let’s check the stats immediately after setting these variables, and then let’s push some additional data into the table, run an OPTIMIZE or two, delete some data, and see what happens:
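The procedure described in the documentation can be sketched roughly like this. The schema and table name ('test/dir_test_innodb') are assumptions; the variables and INFORMATION_SCHEMA table are the ones named above.

```sql
-- Sketch only: 'test/dir_test_innodb' is a hypothetical schema/table name.
SET GLOBAL innodb_ft_aux_table = 'test/dir_test_innodb';

-- Metadata about the FULLTEXT index on that table.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;

-- Make OPTIMIZE TABLE work on the full-text index data only.
SET GLOBAL innodb_optimize_fulltext_only = 1;
OPTIMIZE TABLE dir_test_innodb;

-- Look at the metadata again to see whether anything changed.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;

-- Restore the normal OPTIMIZE TABLE behavior afterwards.
SET GLOBAL innodb_optimize_fulltext_only = 0;
```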


I ran OPTIMIZE TABLE several more times, and each execution took between 6 and 8 seconds, but the output of the query against I_S.innodb_ft_config never changed, so it seems like at least some of the diagnostic output isn’t working quite right.

Intuitively, I would expect some changes in total_word_count, or optimize_(start|end)_time, and the like. However, if I check some of the other I_S tables, I do find that the number of rows in I_S.innodb_ft_index_table is changing, so it’s pretty clear that I do actually have a FT index available.
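A quick way to watch those row counts, as a sketch that assumes innodb_ft_aux_table already names the table of interest:

```sql
-- Assumes innodb_ft_aux_table has already been set for the table in question.
-- Rows in the on-disk full-text index.
SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE;

-- Tokens from recently inserted rows that are still held in the index cache.
SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE;
```

Running both before and after an OPTIMIZE TABLE gives a rough picture of whether the index is actually being maintained, independent of what INNODB_FT_CONFIG reports.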

At the start of this chapter, I mentioned that I did make one configuration change to the default InnoDB settings for MySQL 5.6, and that was to change innodb_ft_min_token_size from the default of 3 to a value of 4 so that it would be identical to the MyISAM default. After all, the (naive?) hope here is that when I run an FTS query against both MyISAM and InnoDB I will get back the same results; if this equivalence doesn’t hold, then as a consultant, I might have a hard time recommending this feature to my customers, and as an end user, I might have a hard time using the feature at all, because it could completely alter the behavior of my application unless I also make a nontrivial number of code changes.
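To confirm that the two engines are using the same minimum token length, the relevant variables can be inspected as sketched below. Note that innodb_ft_min_token_size cannot be changed with SET at runtime; it has to be set in the server configuration, followed by a restart and a rebuild of any existing FULLTEXT indexes.

```sql
-- InnoDB FTS minimum token length (MySQL 5.6); default 3, set to 4 here.
SHOW GLOBAL VARIABLES LIKE 'innodb_ft_min_token_size';

-- MyISAM FTS minimum word length; default 4.
SHOW GLOBAL VARIABLES LIKE 'ft_min_word_len';
```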

In the next chapter, we’ll reload and reset our SEO and DIR data sets back to their initial states, run some queries, and compare the output. Stay tuned, it gets rather curious.


Chapter 2: The Queries!

In Chapter 1 we took a quick look at some initial configuration of InnoDB full-text search and discovered a little bit of quirky behavior; here, we are going to run some queries and compare the result sets.

Our hope is that one of two things will happen: either the results returned from a MyISAM FTS query will be exactly identical to the same query when performed against InnoDB data, OR the results returned by InnoDB FTS will somehow be “better” (as much as it’s actually possible to do this in a single blog post) than what MyISAM gives us.

Recall that we have two different sets of data, one which is the text of roughly 8000 SEO-stuffed webpage bodies (we call that one SEO) and the other, which we call DIR, that is roughly 800,000 directory records with name, address, and the like.

We are using MySQL 5.5.30 and MySQL 5.6.10 with no configuration tuning other than to set innodb_ft_min_token_size to 4 (rather than the default of 3) so that it matches MyISAM’s default ft_min_word_len.

First, MyISAM, with MySQL 5.5, on the SEO data set:


The same query, run against InnoDB on 5.6.10:
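The general shape of such a natural-language query is sketched below. The table name is hypothetical, and the search string is only a guess built from the words “arizona” and “records” that come up in the discussion that follows; the exact query used is not reproduced here.

```sql
-- Sketch only: table name and search string are illustrative assumptions.
SELECT id, title,
       MATCH(title, body) AGAINST ('arizona records') AS score
  FROM seo_test
 ORDER BY score DESC
 LIMIT 5;
```

Without a modifier, AGAINST uses natural language mode, so the same statement can be run unchanged against the MyISAM table on 5.5 and the InnoDB table on 5.6.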


Wow. I’m not sure if I should be concerned so much that the *scores* are different, but the *matches* are COMPLETELY DIFFERENT between 5.5/MyISAM and 5.6/InnoDB.

Now, we know that MyISAM FTS does have the caveat with natural language searches whereby a word that’s present in 50% or more of the rows is treated as a stopword, so does that account for our problem?

It might, because the word ‘arizona’ appears in over 6900 of the 7150 rows, and the word ‘records’ appears in 7082 of them. So let’s try something else that’s less likely to have that issue.

The word “corporation” appears in 143 of the documents; the word “forms” appears in 439 of them, and the word “commission” appears in 130. There might be some overlap here, but even if there isn’t, 143+130+439 < 0.5 * 7150, so none of these should be treated as stopwords in MyISAM.
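The arithmetic, and a sketch of the corresponding search, can be written out as follows. The table name is hypothetical, and combining the three words into a single search string is an assumption about the query, not a copy of it.

```sql
-- Worst case with no overlap: 143 + 439 + 130 = 712 documents,
-- well below the 50% threshold of 0.5 * 7150 = 3575.
SELECT 143 + 439 + 130 AS max_matching_docs,
       0.5 * 7150      AS fifty_percent_threshold;

-- Sketch of the second search; table name and search string are assumptions.
SELECT id, title,
       MATCH(title, body) AGAINST ('corporation commission forms') AS score
  FROM seo_test
 ORDER BY score DESC
 LIMIT 5;
```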

With 5.5:

With 5.6:

OK, now I’m starting to get a little worried.

The docs do tell us that the default stopword list is substantially different between InnoDB and MyISAM, and as it turns out, there are only 36 stopwords in the default InnoDB list, but there are 543 stopwords in the default MyISAM list.

What happens if we take the MyISAM stopwords, insert them into a table, and configure that table to be our stopword list for InnoDB?


This is the table that we’re trying to emulate:

The docs tell us that we need to create an *InnoDB* table with a single VARCHAR column named “value”. OK, sounds easy enough:

But, when we try to use this table, here’s what comes back:

And here’s what appeared in the server’s error log:

Uh… Does this mean that my next blog post should be entitled, “When is a VARCHAR Not Really a VARCHAR?” Thinking that maybe this was a case of GEN_CLUST_INDEX causing me issues, I tried adding a second column to the table which was an integer PK, and in another attempt, I tried just making the “value” column the PK, but neither of those worked.

Also, trying to set innodb_ft_user_stopword_table produced the same error. I submitted a bug report (68450), and as you can see from the bug discussion, it turns out that this table is character-set-sensitive. If you’re going to use your own stopword table for InnoDB FTS, at least for the moment, this table must use the latin1 character set.


As far as I can tell, this little gotcha doesn’t appear to be mentioned anywhere in the MySQL 5.6 documentation; every place where it talks about creating one of these stopword tables, it simply mentions the table engine and the column name/type, so I’m not sure if this is an intentional restriction that just needs to be better documented or if it’s a limitation with the InnoDB FTS feature that will be removed in a later version.
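Putting the pieces together, a custom stopword table that avoids the character-set problem might be set up as sketched here. The schema, table name, column length, and sample words are assumptions; the latin1 requirement is the one that came out of the bug discussion above.

```sql
-- Sketch only: schema, table name, column length and sample words are hypothetical.
CREATE TABLE test.my_stopwords (
  value VARCHAR(30)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;   -- latin1, per the bug discussion

INSERT INTO test.my_stopwords (value)
VALUES ('about'), ('after'), ('again');

-- Point the server at the custom list, given as 'schema/table'.
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';

-- Existing FULLTEXT indexes must be dropped and re-created
-- before the new list is reflected in them.
```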

Now that we’ve sorted this out, let’s drop and rebuild our FT index on the InnoDB table and try the above queries one more time. We already know what the MyISAM results are going to be; do our InnoDB results change? No, they are exactly the same, although the scores did change slightly.

And with 5.6:

What about a Boolean mode query? The docs tell us that if we use Boolean mode, and we put a “+” in front of our search term, then that term *must* appear in the search results. But does it?

With 5.5:


There’s only one row in the table that actually matches all three search terms, and in this case, both MyISAM and InnoDB FTS performed identically and found it. I’m not really concerned about the fact that the next four rows are completely different; the scores are zero, which means “no match.” This looks promising, so let’s explore further. Again, from the docs, if we run a boolean mode query where some of the search terms are prefixed with “+” and others have no prefix, results that have the unprefixed term will be ranked higher than those without it. So, for example, if we change the above query to be “+james +peterson arizona” then we might expect to get back multiple matches containing the words “James” and “Peterson”, and we should expect the record from Arizona to be towards the top of the list.
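The two Boolean-mode searches discussed here can be sketched as follows. The table name is hypothetical, and the first search string is inferred from the description of three required terms rather than copied from the original query.

```sql
-- Sketch only: the table name is hypothetical.
-- All three terms are required.
SELECT id, full_name,
       MATCH(full_name, details)
         AGAINST ('+james +peterson +arizona' IN BOOLEAN MODE) AS score
  FROM dir_test
 ORDER BY score DESC
 LIMIT 5;

-- 'james' and 'peterson' are required; 'arizona' is optional
-- and should only raise the rank of rows that contain it.
SELECT id, full_name,
       MATCH(full_name, details)
         AGAINST ('+james +peterson arizona' IN BOOLEAN MODE) AS score
  FROM dir_test
 ORDER BY score DESC
 LIMIT 5;
```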

With 5.5, this is exactly what happens:

With 5.6, we’re not so fortunate.

These results aren’t even close to identical. As it turns out, the full record for “Alphonso Lee Peterson Sr” does also contain the name “James”, and the word “Peterson” is listed in there several times, but “Arizona” is not present at all, whereas the record for “James R Peterson” had all three search terms and no significant repetition of any of them. Using this particular query, “James R Peterson” is #15 on the list.

At this point, it’s pretty obvious that the way MyISAM is calculating the scores is much different from the way that InnoDB is doing it, and given what I said earlier about the repetition of words in the “Alphonso Lee Peterson Sr” record versus the “James R Peterson” one, we might argue that InnoDB is actually behaving more correctly than MyISAM.


Imagine if we were searching through newspaper articles or something of that sort, and we were looking for articles containing the word “MySQL”; odds are that an article which has 10 instances of “MySQL” might be more desirable to us than an article which only has it mentioned once. So if I look at these results from that perspective, I can understand the how and the why of it. My concern is that there are likely going to be people who believe that switching to InnoDB FTS is simply a matter of upgrading to 5.6 and running ALTER TABLE foo ENGINE=InnoDB. In theory, yes. In practice, not even close.

I tried one more Boolean search, this time looking for someone’s full name, which I knew to be present only once in the database, and I used double quotes to group the search terms as a single phrase:
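A phrase search of this kind might look like the sketch below. The table name and the first name in the phrase are made up; only the surname and the single-letter middle initial correspond to the record discussed in this chapter.

```sql
-- Sketch only: hypothetical table name and first name.
-- The double quotes inside the search string group the words into one phrase.
SELECT id, full_name,
       MATCH(full_name, details)
         AGAINST ('"John B Smith"' IN BOOLEAN MODE) AS score
  FROM dir_test
 ORDER BY score DESC
 LIMIT 1;
```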

With 5.5:

Looks good, there he is. Now what happens under 5.6?

In the immortal words of Homer J. Simpson, “D’OH!!” Why is MyISAM able to locate this record but InnoDB cannot find it at all? I suspect that the “B” is causing problems for InnoDB, because it’s only a single character and we’ve set innodb_ft_min_token_size to 4.

Thus, when InnoDB is parsing the data and building the word list, it’s completely ignoring Mr. Smith’s middle initial. To test this hypothesis, I reset innodb_ft_min_token_size to 1, dropped/rebuilt the InnoDB index, and tried again.
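A sketch of the drop-and-rebuild step, assuming innodb_ft_min_token_size has already been changed in the server configuration and the server restarted (the variable is not dynamic). The table and index names are hypothetical.

```sql
-- Sketch only: dir_test_innodb and ft_name_details are hypothetical names.
-- Confirm the new setting is in effect after the restart.
SHOW GLOBAL VARIABLES LIKE 'innodb_ft_min_token_size';

-- Rebuild the index so that it is tokenized with the new minimum length.
ALTER TABLE dir_test_innodb DROP INDEX ft_name_details;
ALTER TABLE dir_test_innodb
  ADD FULLTEXT KEY ft_name_details (full_name, details);
```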


Aha, there he is! Based on that result, I would caution anyone designing an application that’s going to use InnoDB FTS to be quite mindful of the types of queries that you’re expecting your users to run. In particular, if you expect or are going to allow users to enter search phrases that include initials, numbers, or any other string of length less than 3 (the default), I think you’re going to be forced to set innodb_ft_min_token_size to 1. Otherwise you’ll run into the same problem as our Mr. Smith here.

[This does raise the question of why it works with MyISAM when ft_min_word_len defaults to 4, but that is a topic for another day.]

Note that there may or may not be some performance implications to cranking this value all the way down; that is something I have not yet tested but will be reporting on in Chapter 3. I can, however, confirm that the on-disk size of my DIR dataset is exactly the same with a setting of 1 versus a setting of 4.

This may or may not be the case with multi-byte character sets or with ideographic languages such as Japanese, although Japanese poses its own unique problems for FTS of any kind due to its lack of traditional word boundaries.

In any event, it appears that we’ve solved the Boolean-mode search issue, but we still have vastly different results with the natural-language-mode search. For those of you who are expecting and need to have the MyISAM-style search results, there is at least one potential escape hatch from this rabbit hole. When defining a FULLTEXT KEY, you can use the “WITH PARSER” modifier to specify the name of a UDF which references your own custom-written fulltext parser plugin. Thus I am thinking that it may be possible to take the MyISAM full-text parser code, convert it to a plugin, and use it for InnoDB FT indexes where you’re expecting MyISAM-style results. Verifying or refuting this conjecture is left as an exercise for the reader.

A quick recap of what we’ve learned so far:

* There are parts of InnoDB FTS configuration which are both letter-case and character-set sensitive. Watch out!
* When you add your first FULLTEXT KEY to an InnoDB table, be prepared for a table rebuild.
* Calculation of match score is completely different between the two engines; sometimes this leads to wildly different results.
* If you were hoping to use InnoDB FTS as a simple drop-in replacement for your current MyISAM FTS, the results may surprise you.

That last point bears particular emphasis, as it also illustrates an important best practice even if FTS isn’t involved. Always test how your application behaves as a result of a major MySQL version upgrade before rolling it into production! Percona has tools (pt-upgrade and Percona Playback) that can help you with this. These tools are free and open source; please use them. You, and your users, will be happy that you did.


Chapter 3: Performance


Recall that we have been working with two data sets, one which I call SEO (8000 keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

The table definition for the SEO data set is:

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (`full_name`, `details`)” to our DIR tables and adding “FULLTEXT KEY (`title`, `body`)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes. All times in seconds.
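The two loading strategies can be sketched as follows for the SEO table. The table names are hypothetical and the data load itself is elided; only the column names and the FULLTEXT KEY column list come from the text above.

```sql
-- Sketch only: table names are hypothetical.
-- One pass: the FULLTEXT KEY is part of the table definition, then load.
CREATE TABLE seo_one_pass (
  id    INT UNSIGNED NOT NULL,
  title VARCHAR(255),
  body  TEXT,
  PRIMARY KEY (id),
  FULLTEXT KEY (title, body)
) ENGINE=InnoDB;
-- ... load the rows ...

-- Two passes: create and load the table first, then add the index.
CREATE TABLE seo_two_pass (
  id    INT UNSIGNED NOT NULL,
  title VARCHAR(255),
  body  TEXT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
-- ... load the rows ...
ALTER TABLE seo_two_pass ADD FULLTEXT KEY (title, body);
```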


Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool; the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well. Again, all times in seconds.

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some of the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous chapter:


The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.
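Per-state timings of this kind can be collected with the session profiler, roughly as sketched here. The table name and search phrase are hypothetical stand-ins for query 5.

```sql
-- Sketch only: table name and search phrase are hypothetical.
SET profiling = 1;

SELECT id, full_name,
       MATCH(full_name, details)
         AGAINST ('"John B Smith"' IN BOOLEAN MODE) AS score
  FROM dir_test_innodb
 ORDER BY score DESC
 LIMIT 1;

-- List the profiled statements with their Query_ID values.
SHOW PROFILES;

-- Per-state timings; replace 1 with the Query_ID reported by SHOW PROFILES.
SHOW PROFILE FOR QUERY 1;
```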

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. OK, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall Chapter 2, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:


Hmmm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

So what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.

About the author: Ernie Souhrada

Ernie joined Percona in April of 2012 as a Senior Consultant, bringing many years of diverse experience as a systems architect and engineer in both independent consulting and standard-employee roles. He has worked in almost every technology role present in the Internet era, from Perl/Java developer to Linux sysadmin, MySQL DBA to Cisco network engineer, security auditor to IT engineering manager. He thrives on and excels at taking on those novel challenges which require creative cross-disciplinary solutions.

Prior to joining Percona, Ernie was working heavily with deployment automation and designing infrastructure stacks driven by some of today’s leading virtualization platforms.

Ernie holds a BS in Mathematics and a BA in Political Science from Arizona State University in Tempe, Arizona, where he lives with his wife and two cats, patiently awaiting the start of the next ski season.
