Cassandra Summit 2015: Intro to DSE Search

An Introduction to DSE Search Caleb Rackliffe Software Engineer [email protected] @calebrackliffe

Transcript of Cassandra Summit 2015: Intro to DSE Search

Page 1: Cassandra Summit 2015: Intro to DSE Search

An Introduction to DSE Search

Caleb Rackliffe, Software Engineer

[email protected]

@calebrackliffe

Page 2: Cassandra Summit 2015: Intro to DSE Search

What problem were we trying to solve?

Page 3: Cassandra Summit 2015: Intro to DSE Search

Application

DataStax Driver

Page 4: Cassandra Summit 2015: Intro to DSE Search

SELECT * FROM customers WHERE country LIKE '%land%';

Page 5: Cassandra Summit 2015: Intro to DSE Search

What about secondary indexes?

Page 6: Cassandra Summit 2015: Intro to DSE Search

Why not just create your own secondary index implementation that supports wildcard queries?

Page 7: Cassandra Summit 2015: Intro to DSE Search

I need full-text search!

Page 8: Cassandra Summit 2015: Intro to DSE Search
Page 9: Cassandra Summit 2015: Intro to DSE Search

Why did we build something new?

Page 10: Cassandra Summit 2015: Intro to DSE Search

Application

DataStax Driver

Solr Client

Page 11: Cassandra Summit 2015: Intro to DSE Search

Polyglot Persistence!

Page 12: Cassandra Summit 2015: Intro to DSE Search

Application

DataStax Driver

Solr Client

Consistency

Cost

Complexity

Page 13: Cassandra Summit 2015: Intro to DSE Search
Page 14: Cassandra Summit 2015: Intro to DSE Search

partitioning

multi-DC

replication

geospatial

wildcards

monitoring

C* field type support (UDT, Tuple, collections)

security

live indexing

sorting

faceting

fault-tolerant distributed search

caching

text analysis

grouping

automatic index updates

JVM

CQL

repair

Page 15: Cassandra Summit 2015: Intro to DSE Search

Application

DataStax Driver

Solr Client

Consistency

Complexity

Cost

Page 16: Cassandra Summit 2015: Intro to DSE Search

How about some examples?

Page 17: Cassandra Summit 2015: Intro to DSE Search

Creating a Solr Core

Start a node…

bash$ dse cassandra -s

Create a table…

cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr': 1};

cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY, fullname text, address_ map<text, text>);

Create the core…

bash$ dsetool create_core test.user generateResources=true

Page 18: Cassandra Summit 2015: Intro to DSE Search

The Schema

bash$ dsetool get_core_schema test.user

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="text">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType class="org.apache.solr.schema.StrField" name="string"/>
  </types>
  <fields>
    <field indexed="true" name="username" stored="true" type="string"/>
    <field indexed="true" name="fullname" stored="true" type="text"/>
    <dynamicField indexed="true" name="address_*" stored="true" type="string"/>
  </fields>
  <uniqueKey>username</uniqueKey>
</schema>
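The generated schema analyzes fullname with a standard tokenizer followed by a lowercase filter, while username and the address_* fields are plain strings. As a rough illustration of what that difference means for matching, here is a minimal Python sketch. The regular expression is only an approximation of StandardTokenizerFactory (which follows Unicode text segmentation rules), and the function names are hypothetical.

```python
import re

def analyze_text(value):
    """Approximate the 'text' field type: tokenize, then lowercase.

    Assumption: runs of word characters stand in for the real
    StandardTokenizer, which is more sophisticated than this regex.
    """
    tokens = re.findall(r"\w+", value)
    return [token.lower() for token in tokens]

def analyze_string(value):
    """Approximate the 'string' field type: the whole value is one term."""
    return [value]

# fullname is a 'text' field, so each word is indexed on its own, in lowercase.
print(analyze_text("Sergio Bossa"))
# username is a 'string' field, so only the exact value is indexed.
print(analyze_string("sbtourist"))
```

Under this model a query for fullname:bossa can match "Sergio Bossa", whereas a string field matches only on its exact value or a wildcard over it.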

Page 19: Cassandra Summit 2015: Intro to DSE Search

Insert Rows (…and Index Documents)

cqlsh:test> INSERT INTO user(username, fullname, address_) VALUES('sbtourist', 'Sergio Bossa', {'address_home' : 'UK', 'address_work' : 'UK'});

cqlsh:test> INSERT INTO user(username, fullname, address_) VALUES('bereng', 'Berenguer Blasi', {'address_home' : 'ES', 'address_work' : 'ES'});

cqlsh:test> INSERT INTO user(username, fullname, address_) VALUES('thegrinch', 'Sven Delmas', {'address_home' : 'US', 'address_work' : 'HQ'});

…and that’s it. No ETL. No writing to a second datastore.

Page 20: Cassandra Summit 2015: Intro to DSE Search

Wildcards

cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"address_home:U*"}';

 username  | address_
-----------+----------------------------------------------
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(2 rows)
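To make the wildcard semantics concrete, the sketch below replays the query against the three sample rows in plain Python. It is an in-memory stand-in rather than anything DSE runs: the rows are flattened so that each map entry is its own field, fnmatchcase plays the role of the wildcard matcher, and wildcard_search is a hypothetical helper. It also shows one way a client might build the solr_query JSON string without hand-typed quoting.

```python
import json
from fnmatch import fnmatchcase

# The three rows from the slides, flattened so each map entry is its own field.
docs = [
    {"username": "sbtourist", "address_home": "UK", "address_work": "UK"},
    {"username": "bereng", "address_home": "ES", "address_work": "ES"},
    {"username": "thegrinch", "address_home": "US", "address_work": "HQ"},
]

def wildcard_search(docs, field, pattern):
    """Return usernames whose field value matches a shell-style pattern."""
    return [d["username"] for d in docs if fnmatchcase(d.get(field, ""), pattern)]

# Building the JSON with json.dumps avoids mismatched or curly quotes.
solr_query = json.dumps({"q": "address_home:U*"})
print(solr_query)
print(wildcard_search(docs, "address_home", "U*"))
```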

Page 21: Cassandra Summit 2015: Intro to DSE Search

Sorting and Limits

cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}';

 username  | address_
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 bereng    | {'address_home': 'ES', 'address_work': 'ES'}

(3 rows)

cqlsh:test> SELECT username, address_ FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}' LIMIT 1;

 username  | address_
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(1 rows)
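The ordering in these results can be checked with a small Python model of "match everything, sort descending on one field, then apply a limit". This is only a sketch over the sample rows; search_all is a hypothetical helper, not a DSE or Solr API.

```python
docs = [
    {"username": "sbtourist", "address_home": "UK", "address_work": "UK"},
    {"username": "bereng", "address_home": "ES", "address_work": "ES"},
    {"username": "thegrinch", "address_home": "US", "address_work": "HQ"},
]

def search_all(docs, sort_field, descending=False, limit=None):
    """Model of q=*:* with a sort clause and an optional LIMIT."""
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=descending)
    # The limit is applied after sorting, so LIMIT 1 returns the top hit.
    return ordered if limit is None else ordered[:limit]

print([d["username"] for d in search_all(docs, "address_home", descending=True)])
print([d["username"] for d in search_all(docs, "address_home", descending=True, limit=1)])
```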

Page 22: Cassandra Summit 2015: Intro to DSE Search

Faceting

cqlsh:test> SELECT * FROM user WHERE solr_query='{"q":"*:*", "facet":{"field" : "address_work"}}';

 facet_fields
-----------------------------------------------------
 {"address_work" : {"ES" : 1 , "HQ" : 1 , "UK" : 1}}

(1 rows)
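A field facet is a count of matching documents per distinct field value. The following Python sketch reproduces the counts above from the three sample rows; facet_counts is a hypothetical helper used only for illustration.

```python
from collections import Counter

docs = [
    {"username": "sbtourist", "address_home": "UK", "address_work": "UK"},
    {"username": "bereng", "address_home": "ES", "address_work": "ES"},
    {"username": "thegrinch", "address_home": "US", "address_work": "HQ"},
]

def facet_counts(docs, field):
    """Count documents per distinct value of the given field."""
    return dict(Counter(d[field] for d in docs if field in d))

print({"address_work": facet_counts(docs, "address_work")})
```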

Page 23: Cassandra Summit 2015: Intro to DSE Search

Partition Restrictions

cqlsh:test> CREATE TABLE event(sensor_id bigint, recording_time timestamp, description text, PRIMARY KEY(sensor_id, recording_time));

cqlsh:test> SELECT recording_time, description FROM test.event WHERE sensor_id = 2314234432 AND solr_query='description:unremarkable';
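Restricting the search to a single partition key means only that partition's rows need to be considered. The sketch below models the idea in Python with a dictionary keyed by sensor_id. The event rows are invented for illustration, search_partition is a hypothetical helper, and the whitespace tokenization is a simplification of real text analysis.

```python
# Hypothetical sample data: partition key (sensor_id) -> rows in that partition.
events = {
    2314234432: [
        ("2015-09-22T10:00:00", "unremarkable reading"),
        ("2015-09-22T10:05:00", "temperature spike"),
    ],
    99: [
        ("2015-09-22T10:00:00", "unremarkable reading"),
    ],
}

def search_partition(events, sensor_id, term):
    """Search one partition only; rows in other partitions are never examined."""
    rows = events.get(sensor_id, [])
    return [(ts, desc) for ts, desc in rows if term in desc.lower().split()]

print(search_partition(events, 2314234432, "unremarkable"))
```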

Page 24: Cassandra Summit 2015: Intro to DSE Search

What do the internals look like?

Page 25: Cassandra Summit 2015: Intro to DSE Search

Indexing

Page 26: Cassandra Summit 2015: Intro to DSE Search

[Diagram: three indexing states (Buffered, Searchable, Durable) laid out across Memory and Disk]

Page 27: Cassandra Summit 2015: Intro to DSE Search

[Diagram: three indexing states (Buffered, Searchable, Durable) laid out across Memory and Disk]

Page 28: Cassandra Summit 2015: Intro to DSE Search

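[Diagram: the indexing pipeline across Memory and Disk. Documents are buffered in the RAMBuffer, become searchable as segments after a soft commit, and become durable as segments on disk after a hard commit.]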
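The three states on this slide (buffered, searchable, durable) can be modeled with a small Python class. This is a deliberately simplified sketch, not Lucene's or DSE's implementation: IndexModel and its methods are hypothetical names, and it assumes a hard commit also makes buffered documents searchable, which in a real deployment depends on configuration.

```python
class IndexModel:
    """Toy model of buffered -> searchable -> durable document states."""

    def __init__(self):
        self.ram_buffer = []       # buffered: accepted but not yet searchable
        self.memory_segments = []  # searchable, but not yet persisted
        self.disk_segments = []    # searchable and durable

    def add(self, doc):
        self.ram_buffer.append(doc)

    def soft_commit(self):
        # Turn the buffer into a new in-memory segment so queries can see it.
        if self.ram_buffer:
            self.memory_segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()

    def hard_commit(self):
        # Assumption: a hard commit first flushes the buffer, then persists
        # every in-memory segment.
        self.soft_commit()
        self.disk_segments.extend(self.memory_segments)
        self.memory_segments.clear()

    def search(self, term):
        segments = self.memory_segments + self.disk_segments
        return [doc for segment in segments for doc in segment if term in doc]

    def durable_docs(self):
        return [doc for segment in self.disk_segments for doc in segment]


index = IndexModel()
index.add("the quick brown dog")
print(index.search("dog"))      # nothing yet: the document is only buffered
index.soft_commit()
print(index.search("dog"))      # now searchable
index.hard_commit()
print(index.durable_docs())     # now durable as well
```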

Page 29: Cassandra Summit 2015: Intro to DSE Search

Querying

Page 30: Cassandra Summit 2015: Intro to DSE Search

Replica Selection

RF=2, shards: A-E

[Diagram: a coordinator and nodes 1-5, with each of the shards A-E shown on two nodes; legend: Healthy, Unhealthy]

Page 31: Cassandra Summit 2015: Intro to DSE Search

Replica Selection

RF=2, shards: A-E

[Diagram: a coordinator and nodes 1-5, with each of the shards A-E shown on two nodes; legend: Healthy, Unhealthy]
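With RF=2, every shard can be served by either of two nodes, so the coordinator has to pick a set of healthy nodes that together cover all shards. The Python sketch below shows one plausible way to do that with a greedy cover. It is not DSE's actual selection algorithm: the shard placement is assumed (the transcript does not preserve the diagram's layout), and select_replicas is a hypothetical name.

```python
# Assumed placement for 5 nodes, RF=2, shards A-E (not taken from the slides).
placement = {
    1: {"A", "E"},
    2: {"A", "B"},
    3: {"B", "C"},
    4: {"C", "D"},
    5: {"D", "E"},
}

def select_replicas(placement, healthy):
    """Greedily choose healthy nodes until every shard is covered.

    Returns a mapping of node -> shards that node is asked to search.
    """
    needed = set().union(*placement.values())
    candidates = [node for node in sorted(placement) if node in healthy]
    chosen = {}
    while needed:
        # Pick the healthy node that covers the most still-uncovered shards.
        best = max(candidates, key=lambda n: len(placement[n] & needed), default=None)
        if best is None or not (placement[best] & needed):
            raise RuntimeError(f"no healthy replica for shards {sorted(needed)}")
        chosen[best] = placement[best] & needed
        needed -= chosen[best]
    return chosen

# Node 3 is unhealthy, so its shards B and C must come from nodes 2 and 4.
print(select_replicas(placement, healthy={1, 2, 4, 5}))
```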

Page 32: Cassandra Summit 2015: Intro to DSE Search

What happens if a shard query fails?

Page 33: Cassandra Summit 2015: Intro to DSE Search

Failover: Phase 1

4 nodes, RF = 2, shards: A-D, no vnodes

[Diagram: nodes 1-4]

Page 34: Cassandra Summit 2015: Intro to DSE Search

Failover: Phase 2

4 nodes, RF = 2, shards: A-D, no vnodes

[Diagram: nodes 1-4]

Page 35: Cassandra Summit 2015: Intro to DSE Search
Page 36: Cassandra Summit 2015: Intro to DSE Search

Failover: Phase 3

4 nodes, RF = 2, shards: A-D, no vnodes

[Diagram: nodes 1-4]
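The transcript does not preserve what each failover phase does, so the following is only a generic Python sketch of the underlying idea: when a shard query fails on one node, retry it on another replica of the same shard. The placement for 4 nodes, RF = 2, shards A-D is assumed, and query_with_failover is a hypothetical name rather than a DSE API.

```python
# Assumed placement for 4 nodes, RF=2, shards A-D, no vnodes.
placement = {
    1: {"A", "D"},
    2: {"A", "B"},
    3: {"B", "C"},
    4: {"C", "D"},
}

def query_with_failover(placement, plan, failing_nodes):
    """Run each shard query on its planned node, retrying on other replicas.

    plan maps shard -> preferred node; failing_nodes simulates nodes whose
    shard queries fail. Returns shard -> node that finally answered.
    """
    answered = {}
    for shard, preferred in sorted(plan.items()):
        alternates = [n for n in sorted(placement)
                      if n != preferred and shard in placement[n]]
        for node in [preferred] + alternates:
            if node not in failing_nodes:
                answered[shard] = node
                break
        else:
            # Every replica of this shard failed, so the query cannot complete.
            raise RuntimeError(f"all replicas failed for shard {shard}")
    return answered

plan = {"A": 1, "B": 2, "C": 3, "D": 4}
# Node 3 fails, so shard C is retried on its other replica, node 4.
print(query_with_failover(placement, plan, failing_nodes={3}))
```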

Page 37: Cassandra Summit 2015: Intro to DSE Search

Platform Integrations

Page 38: Cassandra Summit 2015: Intro to DSE Search

Search + Analytics: Explicit Predicate Pushdown

bash$ dse spark

scala> val table = sc.cassandraTable("wiki","solr")

scala> val result = table.select("id","title").where("solr_query='body:dog'").collect

Page 39: Cassandra Summit 2015: Intro to DSE Search

http://docs.datastax.com