Fulltext engine for non fulltext searches
Fulltext engine for
Non-Fulltext Queries
Adrian Nuta // Sphinxsearch // 2013
• Introduction
• Non-fulltext queries
• Special data columns
• Fulltext to speed up non-fulltext queries
Introduction
What is Sphinx
• free, open-source search server
• fast: 700 qps / core / 1M docs
• flexible: 100+ features
• scalable:
o 300 mil. queries / day
o 50 TB data, 100+ boxes
Sphinx document
• Doc ID
• Fulltext fields
o inverted index
o indexed, not stored
• Attributes
o Integer, Float, Bool, Timestamp, MVA, String, JSON
o stored, not indexed
o held in memory or on disk
Meet SphinxQL
[Diagram: the application uses the same MySQL connector and MySQL protocol for both servers; it sends MySQL language to MySQL and SphinxQL language to Sphinx.]
SELECT * FROM mytable WHERE ...
Non-fulltext queries
What can Sphinx do besides fulltext?
• usual WHERE, ORDER BY, GROUP BY
• GROUP BY custom extensions:
o WITHIN GROUP ORDER BY
o GROUP <N> BY
• Aggregation, timestamp and math functions
• Comparison functions: IF(), INTERVAL(), IN()
• Geo spatial: GEODIST(), GEOPOLY2D()
WITHIN GROUP ORDER BY
mysql> SELECT *,DAY(added) as today FROM facetdemo WHERE property2 = 160 AND today =26 GROUP BY brand_id WITHIN GROUP ORDER BY price ASC ORDER BY brand_id ASC;
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| id | price | brand_id | property2 | added | title | brand_name | property | today |
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| 520157 | 10 | 1 | 160 | 1382745486 | Product Nine Seven | brand1 | Three | 26 |
| 1726473 | 10 | 2 | 160 | 1382796463 | Product Two Three | brand2 | Eight | 26 |
| 1588875 | 11 | 3 | 160 | 1382762264 | Product Three Six | brand3 | Five | 26 |
| 1556197 | 10 | 4 | 160 | 1382754018 | Product Eight Six | brand4 | Seven | 26 |
| 751443 | 11 | 5 | 160 | 1382803444 | Product Six Three | brand5 | One | 26 |
| 512776 | 11 | 6 | 160 | 1382743642 | Product Ten Five | brand6 | Six | 26 |
mysql> SELECT *,DAY(added) as today FROM facetdemo WHERE property2 = 160 AND today =26 GROUP BY brand_id WITHIN GROUP ORDER BY price DESC ORDER BY brand_id ASC;
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| id | price | brand_id | property2 | added | title | brand_name | property | today |
+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+
| 815154 | 998 | 1 | 160 | 1382819286 | Product Two Nine | brand1 | Eight | 26 |
| 2793903 | 999 | 2 | 160 | 1382813601 | Product Eight Five | brand2 | Two | 26 |
| 699831 | 1000 | 3 | 160 | 1382790589 | Product One Six | brand3 | Eight | 26 |
| 714052 | 1000 | 4 | 160 | 1382794137 | Product One Ten | brand4 | Three | 26 |
| 2791902 | 999 | 5 | 160 | 1382813140 | Product Five Three | brand5 | Four | 26 |
| 2753725 | 1000 | 6 | 160 | 1382803662 | Product Seven Three | brand6 | Two | 26 |
Using GROUP <N> BY
mysql> SELECT * FROM facetdemo GROUP 3 BY brand_id WITHIN GROUP ORDER BY added DESC ORDER BY brand_id ASC;
+---------+-------+----------+------------+---------------------+------------+----------+
| id | price | brand_id | added | title | brand_name | property |
+---------+-------+----------+------------+---------------------+------------+----------+
| 1479848 | 938 | 1 | 1382735889 | Product Ten Seven | brand1 | Four |
| 2479064 | 398 | 1 | 1382734998 | Product Ten Five | brand1 | Eight |
| 1480553 | 687 | 1 | 1382734048 | Product Four Two | brand1 | One |
| 1479580 | 62 | 2 | 1382734834 | Product Nine Seven | brand2 | Ten |
| 1479585 | 357 | 2 | 1382734834 | Product Six Two | brand2 | Five |
| 477383 | 908 | 2 | 1382733871 | Product Ten Three | brand2 | Eight |
| 2478429 | 425 | 3 | 1382734839 | Product Three Ten | brand3 | Five |
| 477456 | 519 | 3 | 1382734818 | Product Ten One | brand3 | Six |
| 477521 | 190 | 3 | 1382734403 | Product Three Two | brand3 | Five |
| 2478459 | 931 | 4 | 1382734850 | Product One Two | brand4 | Five |
| 1479718 | 891 | 4 | 1382734065 | Product Two One | brand4 | Three |
| 2478514 | 106 | 4 | 1382733868 | Product Six Seven | brand4 | One |
| 477297 | 991 | 5 | 1382734844 | Product Five Eight | brand5 | Four |
| 2479053 | 648 | 5 | 1382733994 | Product Six One | brand5 | Nine |
| 1480798 | 250 | 5 | 1382732121 | Product One Seven | brand5 | Eight |
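The grouping behaviour shown above can be modeled outside Sphinx. The following is a minimal Python sketch, not Sphinx code: `group_n_by` and its parameters are hypothetical names, and the sample rows reuse a few ids from the output above plus one invented row. With `n=1` it behaves like plain GROUP BY combined with WITHIN GROUP ORDER BY.

```python
from collections import defaultdict

def group_n_by(rows, group_key, order_key, n=1, descending=True):
    """Keep the best n rows of each group; groups are returned in ascending key order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    result = []
    for key in sorted(groups):
        # WITHIN GROUP ORDER BY: rank rows inside one group
        ranked = sorted(groups[key], key=lambda r: r[order_key], reverse=descending)
        # GROUP <N> BY: keep at most n rows per group
        result.extend(ranked[:n])
    return result

rows = [
    {"id": 1479848, "brand_id": 1, "added": 1382735889},
    {"id": 2479064, "brand_id": 1, "added": 1382734998},
    {"id": 1480553, "brand_id": 1, "added": 1382734048},
    {"id": 1, "brand_id": 1, "added": 1382730000},  # invented older row, dropped by n=3
    {"id": 1479580, "brand_id": 2, "added": 1382734834},
    {"id": 477383, "brand_id": 2, "added": 1382733871},
]

top = group_n_by(rows, "brand_id", "added", n=3)
print([r["id"] for r in top])
# [1479848, 2479064, 1480553, 1479580, 477383]
```

Sorting each group is the simplest way to express the idea; it says nothing about how Sphinx implements grouping internally.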
Using HAVING
mysql> SELECT *,COUNT(*) FROM facetdemo where property2 = 190 and price>900 GROUP BY brand_id HAVING COUNT(*)>1000;
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
| id | price | brand_id | property2 | added | title | brand_name | property | count(*) |
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
| 2566 | 934 | 24 | 190 | 1382615816 | Product One Three | brand24 | Six | 1023 |
| 4807 | 905 | 11 | 190 | 1382616392 | Product Five Six | brand11 | Eight | 1023 |
| 5539 | 985 | 44 | 190 | 1382616552 | Product Ten Four | brand44 | Three | 1009 |
| 7655 | 912 | 10 | 190 | 1382617104 | Product Four Five | brand10 | Ten | 1028 |
| 16837 | 968 | 20 | 190 | 1382619365 | Product One Nine | brand20 | Five | 1015 |
+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+
5 rows in set (0.17 sec)
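Comparing simple queries
+----------------------------------------------------------+---------------------------------------------------------+-------+--------+------------+
| Operation                                                | Example                                                 | MySQL | Sphinx | Difference |
+----------------------------------------------------------+---------------------------------------------------------+-------+--------+------------+
| Filter by integer, group by integer                      | WHERE property_int=190 GROUP BY brand_id                | 0.32  | 0.14   | 2.2x       |
| Group by integer, order by count(*)                      | GROUP BY brand_id ORDER BY COUNT(*) DESC                | 1.76  | 0.53   | 3.3x       |
| Filter by integer, order by timestamp                    | WHERE brand_id=20 ORDER BY added ASC                    | 0.00  | 0.14   | 0          |
| Filter by integer, order by timestamp and integer column | WHERE brand_id=20 ORDER BY added DESC, property_int ASC | 0.31  | 0.19   | 1.5x       |
+----------------------------------------------------------+---------------------------------------------------------+-------+--------+------------+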
Using IF comparison
mysql> SELECT COUNT(*), IF( property2=270 OR price<80, 1,
IF(property2=280 OR price> 900,2,3)
) AS expr FROM facetdemo GROUP BY expr;
+----------+------+
| count(*) | expr |
+----------+------+
| 7494455 | 3 |
| 1357178 | 2 |
| 1148366 | 1 |
+----------+------+
3 rows in set (1.04 sec)
Using INTERVAL for segmentation
mysql> SELECT id, price, INTERVAL(price,0,300,600,900) AS pricerange, COUNT(*) FROM facetdemo WHERE
brand_id=27 GROUP BY pricerange ORDER BY pricerange ASC;
+------+-------+------------+----------+
| id | price | pricerange | count(*) |
+------+-------+------------+----------+
| 219 | 196 | 1 | 58283 |
| 46 | 467 | 2 | 60535 |
| 109 | 667 | 3 | 60789 |
| 5 | 962 | 4 | 20285 |
+------+-------+------------+----------+
4 rows in set (0.19 sec)
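The bucket numbers in the pricerange column above are consistent with INTERVAL() returning how many of the listed points are less than or equal to the value. A minimal Python sketch of that reading follows; `interval` is a hypothetical helper, and the prices are the four sample rows from the output.

```python
from bisect import bisect_right
from collections import Counter

def interval(value, *points):
    """Return the number of ascending points that are <= value (0 if below the first point)."""
    return bisect_right(points, value)

# Prices taken from the sample rows above
prices = [196, 467, 667, 962]
buckets = [interval(p, 0, 300, 600, 900) for p in prices]
print(buckets)           # [1, 2, 3, 4]

# Grouping by the bucket number gives a simple price segmentation
print(Counter(buckets))
```

A value exactly on a boundary, such as 300, falls into the upper bucket under this reading.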
Geo spatial in Sphinx
• GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })
o in = { deg | degrees | rad | radians }
o out = { m | meters | km | ft | mi | miles }
o method = { haversine | adaptive }
haversine: high precision, expensive
adaptive: good precision, cheaper (polar flat-Earth algorithm)
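For reference, the haversine method named above is the standard great-circle formula. Below is a minimal Python sketch of that formula, not Sphinx's implementation: `haversine_km` is a hypothetical name, the Earth radius constant is an assumption (Sphinx may use a different value), and the second point in the usage line is invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, not taken from Sphinx

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; all inputs in radians (comparable to in=rad, out=km)."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Origin as used in the GEODIST examples of this deck; the target point is invented
print(haversine_km(0.710011075352, -1.2918035709982, 0.7105, -1.2925))
```

The trigonometric calls per row are what makes this method comparatively expensive on large collections.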
• POLY2D(x1,y1,x2,y2,x3,y3, …)
• GEOPOLY2D(lat1,lng1,lat2,lng2,lat3,lng3,...)
o lat/lng in degrees
• CONTAINS(polygon, x, y)
mysql> SELECT *, CONTAINS(GEOPOLY2D(40.95164274496,-76.88583678218,41.188446201688,-73.203723511772,39.900666261352,-74.171833538046,40.059260979044,-76.301076056469),latitude_deg,longitude_deg) AS inside FROM geodemo WHERE inside=1 LIMIT 0,100;
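A point-in-polygon test of the kind CONTAINS() performs can be sketched with the classic ray-casting algorithm. This Python version treats coordinates as flat 2D values, which is closer to POLY2D than to GEOPOLY2D; whether Sphinx uses ray casting internally is not stated in the slides. `contains` is a hypothetical helper, the polygon vertices are copied from the query above, and the two test points are invented.

```python
def contains(polygon, x, y):
    """Ray casting: count how many polygon edges a ray from (x, y) towards +x crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Vertices from the GEOPOLY2D call above, as (lat, lng) pairs in degrees
polygon = [
    (40.95164274496, -76.88583678218),
    (41.188446201688, -73.203723511772),
    (39.900666261352, -74.171833538046),
    (40.059260979044, -76.301076056469),
]

print(contains(polygon, 40.5, -75.0))   # True
print(contains(polygon, 42.0, -75.0))   # False
```

For small polygons the flat approximation is usually close enough for illustration; large polygons need the curvature handling that GEOPOLY2D is meant to provide.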
Special data columns
Multi value attribute (MVA)
• a column that holds a set of integers
[Diagram: example row with Price (float) = 199.99 and Categories (MVA) = 24, 128, 300]
MVA with multiple selection
mysql> SELECT id,price,brand_id,categories FROM facetdemo WHERE categories IN (13,14);
+------+-------+----------+------------+
| id | price | brand_id | categories |
+------+-------+----------+------------+
| 1 | 874 | 47 | 13 |
| 2 | 712 | 38 | 11,14 |
| 9 | 113 | 25 | 12,14 |
| 17 | 440 | 46 | 13,15 |
| 19 | 206 | 50 | 13,17 |
| 21 | 76 | 28 | 7,10,13 |
| 22 | 363 | 21 | 13,17,20 |
...
Grouping on MVA
mysql> SELECT id,price,brand_id,categories,GROUPBY(),COUNT(*) FROM facetdemo GROUP BY categories;
+------+-------+----------+------------+-----------+----------+
| id | price | brand_id | categories | groupby() | count(*) |
+------+-------+----------+------------+-----------+----------+
| 1 | 874 | 47 | 13 | 13 | 362931 |
| 2 | 712 | 38 | 11,14 | 14 | 185023 |
| 2 | 712 | 38 | 11,14 | 11 | 329874 |
| 3 | 773 | 7 | 12,16 | 16 | 143837 |
| 3 | 773 | 7 | 12,16 | 12 | 349446 |
| 4 | 803 | 31 | 6,9 | 9 | 267583 |
| 4 | 803 | 31 | 6,9 | 6 | 184772 |
...
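The two MVA behaviours above (IN matches when any value of the set matches; GROUP BY counts a document once per value it holds) can be modeled in a few lines. This is a minimal Python sketch of the semantics only: `mva_in` and `group_by_mva` are hypothetical names, and the documents reuse a few id/categories pairs from the outputs above.

```python
from collections import Counter

docs = [
    {"id": 1, "categories": [13]},
    {"id": 2, "categories": [11, 14]},
    {"id": 3, "categories": [12, 16]},
    {"id": 4, "categories": [6, 9]},
    {"id": 9, "categories": [12, 14]},
]

def mva_in(doc, values):
    """categories IN (...): true if at least one value of the set is listed."""
    return any(v in values for v in doc["categories"])

def group_by_mva(docs):
    """GROUP BY on an MVA: every document contributes to each of its values."""
    counts = Counter()
    for doc in docs:
        for value in set(doc["categories"]):
            counts[value] += 1
    return counts

print([d["id"] for d in docs if mva_in(d, {13, 14})])  # [1, 2, 9]
counts = group_by_mva(docs)
print(counts[14], counts[12], counts[13])              # 2 2 1
```

Because one document can land in several groups, the per-group counts can add up to more than the number of documents.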
Going further: JSON
• starting with version 2.1, Sphinx supports JSON documents
• useful for:
o unstructured data
o complex one-to-many relations
{
"id": 1,
"gid": 2,
"title": "some title",
"tags":
[ "tag1", "tag2", "tag3" ],
"property": [
{
"name": "color",
"value": "blue"
},
{
"name": "weight",
"value": 2.56
}
]
}
JSON attributes
• filter, sort and group
• JSON/MVA array functions:
LENGTH(), LEAST(), GREATEST()
• Advanced JSON search in array of objects:
ANY(), ALL(), INDEXOF()
Advanced searching in JSON
document:
id: 1011
title: Hotel Sky
myjson: {
…
offers: [
{
'type': 3,
'start': start_timestamp,
'end': end_timestamp
},
{
'type': 1,
'start': start_timestamp,
'end': end_timestamp
}
…
]
}
SELECT *,ANY (
( item.type = 1 AND
item.start > my_start_timestamp AND
item.end < my_end_timestamp )
FOR item IN myjson.offers
) AS condition
FROM index
WHERE condition =1
• ANY ( cond FOR var IN json.array )
o true if at least one element matches the condition
• ALL ( cond FOR var IN json.array )
o true if all elements match the condition
• INDEXOF ( cond FOR var IN json.array )
o returns the index of the first element that matches the condition
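The three functions map closely onto familiar iteration helpers. The following Python sketch models their semantics on an array of offers; the function names, the numeric timestamps and the search window are invented for illustration. Returning -1 when nothing matches, and treating an empty array as "all match", are assumptions here rather than statements from the slides.

```python
def any_match(cond, array):
    """ANY(cond FOR var IN array): true if at least one element satisfies cond."""
    return any(cond(item) for item in array)

def all_match(cond, array):
    """ALL(cond FOR var IN array): true if every element satisfies cond."""
    return all(cond(item) for item in array)

def index_of(cond, array):
    """INDEXOF(cond FOR var IN array): position of the first match, -1 if none (assumed)."""
    for i, item in enumerate(array):
        if cond(item):
            return i
    return -1

# Invented values standing in for start_timestamp / end_timestamp
offers = [
    {"type": 3, "start": 100, "end": 200},
    {"type": 1, "start": 150, "end": 250},
]
my_start, my_end = 120, 300

def cond(item):
    return item["type"] == 1 and item["start"] > my_start and item["end"] < my_end

print(any_match(cond, offers))  # True
print(all_match(cond, offers))  # False
print(index_of(cond, offers))   # 1
```

The key point is that the whole condition is checked against one array element at a time, so `type`, `start` and `end` must all match within the same offer.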
Fulltext to speed up non-fulltext queries
Without a fulltext match:
SELECT *,(...) as heavy_expr
WHERE attr=x AND heavy_expr =1
The query does a full scan and computes the heavy expression for the whole collection.
With a fulltext match:
SELECT *,(...) as heavy_expr
WHERE MATCH('attrx') AND heavy_expr =1
The heavy expression is computed only on the result set returned by the fulltext match.
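The effect described above can be illustrated with a toy model that counts how often the heavy expression runs. This Python sketch is a simplification under stated assumptions, not Sphinx's execution engine: the documents, the `state_code` field values, the `inverted` dictionary standing in for the fulltext index, and all function names are invented.

```python
from collections import defaultdict

docs = {
    1: {"state_code": "NY", "x": 1.0},
    2: {"state_code": "CA", "x": 50.0},
    3: {"state_code": "NY", "x": 30.0},
    4: {"state_code": "TX", "x": 2.0},
}

# Toy inverted index: keyword -> ids of the documents containing it
inverted = defaultdict(set)
for doc_id, doc in docs.items():
    inverted[doc["state_code"]].add(doc_id)

calls = {"n": 0}

def heavy_expr(doc):
    """Stand-in for an expensive expression; counts its own invocations."""
    calls["n"] += 1
    return doc["x"] < 10

def full_scan(code):
    """No fulltext match: the expression is evaluated for every row of the collection."""
    hits = []
    for doc_id, doc in docs.items():
        flag = heavy_expr(doc)
        if doc["state_code"] == code and flag:
            hits.append(doc_id)
    return sorted(hits)

def with_match(code):
    """Fulltext match first: the expression runs only on the matched documents."""
    return sorted(i for i in inverted.get(code, ()) if heavy_expr(docs[i]))

def run(strategy, code):
    calls["n"] = 0
    return strategy(code), calls["n"]

print(run(full_scan, "NY"))   # ([1], 4)
print(run(with_match, "NY"))  # ([1], 2)
```

Both strategies return the same rows; only the number of heavy evaluations differs, and the gap grows with the size of the collection relative to the matched set.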
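Sphinx with FT filter
+----------------------------------------------------------+--------------------------------------------------------------+-------+---------------+----------------+
| Operation                                                | Example                                                      | MySQL | Sphinx w/o FT | Sphinx with FT |
+----------------------------------------------------------+--------------------------------------------------------------+-------+---------------+----------------+
| Filter by integer, order by timestamp and integer column | WHERE brand_id=20 ORDER BY added DESC, property_int ASC      | 0.31  | 0.19          |                |
| Fulltext filter, order by timestamp and integer column   | WHERE MATCH('brand20') ORDER BY added DESC, property_int ASC |       |               | 0.13           |
+----------------------------------------------------------+--------------------------------------------------------------+-------+---------------+----------------+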
Speed up geo spatial with fulltext
• example: find items within a 10 km radius of a point in New York City. Speed-up: search only items belonging to New York state
mysql> SELECT *, GEODIST(0.710011075352, -1.2918035709982,latitude,longitude,{in=rad,out=km,method=adaptive}) as distance FROM geodemo WHERE distance < 10 ORDER BY distance ASC LIMIT 0,10;
10 rows in set (0.17 sec)
mysql> SELECT *, GEODIST(0.710011075352, -1.2918035709982,latitude,longitude,{in=rad,out=km,method=adaptive}) as distance FROM geodemo WHERE MATCH('@state_code NY') AND distance < 10 ORDER BY distance ASC LIMIT 0,10;
10 rows in set (0.03 sec)