Cassandra Summit 2015: Intro to DSE Search


An Introduction to DSE Search
Caleb Rackliffe, Software Engineer
caleb.rackliffe@datastax.com | @calebrackliffe

What problem were we trying to solve?

[Diagram: an application connects to the cluster through the DataStax Driver]

SELECT * FROM customers WHERE country LIKE '%land%';


What about secondary indexes?
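Stock secondary indexes only get you so far. A rough sketch, assuming the customers table and country column from the query above: a native index answers equality lookups, but not the wildcard search we actually want.

cqlsh> CREATE INDEX ON customers (country);
cqlsh> SELECT * FROM customers WHERE country = 'Ireland';    -- equality lookups work
cqlsh> SELECT * FROM customers WHERE country LIKE '%land%';  -- not supported by the built-in index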

Why not just create your own secondary index implementation that supports wildcard queries?


I need full-text search!

Why did we build something new?


[Diagram: the application talks to Cassandra through the DataStax Driver and to a separate Solr cluster through a Solr client]

Polyglot Persistence!


[Diagram: the same two-datastore setup, now annotated with the problems it creates: Consistency, Cost, Complexity]


partitioning, multi-DC, replication, geospatial, wildcards, monitoring, C* field type support (UDT, Tuple, collections), security, live indexing, sorting, faceting, fault-tolerant distributed search, caching, text analysis, grouping, automatic index updates, JVM, CQL, repair


[Diagram: with DSE Search, a single cluster serves both the DataStax Driver and Solr clients, addressing the Consistency, Complexity, and Cost concerns]

How about some examples?

Creating a Solr Core

bash$ dse cassandra -s

cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr': 1};

cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY, fullname text, address map<text, text>);

bash$ dsetool create_core test.user generateResources=true

Start a node with Search enabled…

Create a keyspace and table…

Create the core…

bash$ dsetool get_core_schema test.user

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="text">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType class="org.apache.solr.schema.StrField" name="string"/>
  </types>
  <fields>
    <field indexed="true" name="username" stored="true" type="string"/>
    <field indexed="true" name="fullname" stored="true" type="text"/>
    <dynamicField indexed="true" name="address_*" stored="true" type="string"/>
  </fields>
  <uniqueKey>username</uniqueKey>
</schema>

The Schema
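The generated schema can also be pulled down, tweaked, and pushed back. A sketch of that loop; the reload_core options here are from memory, so treat them as an assumption and check the dsetool docs:

bash$ dsetool get_core_schema test.user > schema.xml
bash$ # edit schema.xml (analyzers, copy fields, etc.)
bash$ dsetool reload_core test.user schema=schema.xml reindex=true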

Insert Rows (…and Index Documents)

cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('sbtourist', 'Sergio Bossa', {'address_home': 'UK', 'address_work': 'UK'});

cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('bereng', 'Berenguer Blasi', {'address_home' : 'ES', 'address_work' : 'ES'});

cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('thegrinch', 'Sven Delmas', {'address_home': 'US', 'address_work': 'HQ'});

…and that’s it. No ETL. No writing to a second datastore.

Wildcards

cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"address_home:U*"}';

 username  | address
-----------+----------------------------------------------
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(2 rows)

Sorting and Limits

cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}';

 username  | address
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 bereng    | {'address_home': 'ES', 'address_work': 'ES'}

(3 rows)

cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}' LIMIT 1;

 username  | address
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(1 rows)

Faceting

cqlsh:test> SELECT * FROM user WHERE solr_query='{"q":"*:*", "facet":{"field":"address_work"}}';

 facet_fields
----------------------------------------------------
 {"address_work": {"ES": 1, "HQ": 1, "UK": 1}}

(1 rows)

Partition Restrictions

cqlsh:test> CREATE TABLE event(sensor_id bigint, recording_time timestamp, description text, PRIMARY KEY(sensor_id, recording_time));

cqlsh:test> SELECT recording_time, description FROM test.event WHERE sensor_id = 2314234432 AND solr_query='description:unremarkable';
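For context, a quick sketch of how the event table might be indexed and populated before running that query; the core-creation flags mirror the user example above, and the sample row is made up:

bash$ dsetool create_core test.event generateResources=true

cqlsh:test> INSERT INTO event(sensor_id, recording_time, description) VALUES(2314234432, '2015-09-22 10:00:00', 'unremarkable vibration levels');

Because the partition key (sensor_id) is pinned, the search can be routed to the replicas owning that partition instead of fanning out to every shard.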

What do the internals look like?

Indexing

[Diagrams: the Lucene indexing lifecycle. New documents are Buffered in an in-memory RAMBuffer; a Soft Commit flushes them into Segments and makes them Searchable; a Hard Commit writes Segments to Disk and makes them Durable.]
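Not on the original slides, but to make the soft/hard commit split concrete: the cadence of both is usually tuned in the core's solrconfig.xml under <updateHandler>. The values below are purely illustrative:

<autoSoftCommit>
  <maxTime>10000</maxTime>        <!-- ms until buffered docs become searchable -->
</autoSoftCommit>
<autoCommit>
  <maxTime>60000</maxTime>        <!-- ms until segments are flushed and made durable on disk -->
  <openSearcher>false</openSearcher>
</autoCommit>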

Querying

Replica Selection

[Diagrams: a 5-node ring, RF=2, shards A-E. For each query the coordinator selects a minimal set of healthy replicas that together cover all shards, skipping replicas marked unhealthy.]

What happens if a shard query fails?

Failover: Phases 1-3

[Diagrams: a 4-node cluster, RF = 2, shards A-D, no vnodes. The three phases show the coordinator retrying the failed shard query against the other replica that owns the same range.]

Platform Integrations

Search + Analytics: Explicit Predicate Pushdown

bash$ dse spark

scala> val table = sc.cassandraTable("wiki","solr")

scala> val result = table.select("id","title").where("solr_query='body:dog'").collect
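A hypothetical follow-up, just to show the shape of the result: collect returns an Array[CassandraRow] you can inspect directly in the shell.

scala> result.take(10).foreach(println)   // print the first few matching rows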

http://docs.datastax.com