Cassandra Summit 2015: Intro to DSE Search


An Introduction to DSE Search
Caleb Rackliffe, Software Engineer
caleb.rackliffe@datastax.com | @calebrackliffe

What problem were we trying to solve?

[Diagram: an application connects to the cluster through the DataStax Driver]

SELECT * FROM customers WHERE country LIKE '%land%';


What about secondary indexes?
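Stock secondary indexes only get you so far. A rough sketch, assuming the customers table and country column from the query above: a native index answers equality lookups, but not the wildcard search we actually want.

cqlsh> CREATE INDEX ON customers (country);
cqlsh> SELECT * FROM customers WHERE country = 'Ireland';    -- equality lookups work
cqlsh> SELECT * FROM customers WHERE country LIKE '%land%';  -- not supported by the built-in index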

Why not just create your own secondary index implementation that supports wildcard queries?


I need full-text search!

Why did we build something new?


[Diagram: the application talks to Cassandra through the DataStax Driver and to a separate Solr cluster through a Solr client]

Polyglot Persistence!


[Diagram: the same two-datastore setup, now annotated with the problems it creates: Consistency, Cost, Complexity]


partitioning, multi-DC, replication, geospatial, wildcards, monitoring, C* field type support (UDT, Tuple, collections), security, live indexing, sorting, faceting, fault-tolerant distributed search, caching, text analysis, grouping, automatic index updates, JVM, CQL, repair


[Diagram: with DSE Search, a single cluster serves both the DataStax Driver and Solr clients, addressing the Consistency, Complexity, and Cost concerns]

How about some examples?

Creating a Solr Core

bash$ dse cassandra -s

cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'Solr': 1};

cqlsh:test> CREATE TABLE test.user(username text PRIMARY KEY, fullname text, address map<text, text>);

bash$ dsetool create_core test.user generateResources=true

Start a node with Search enabled…

Create a keyspace and table…

Create the core…

bash$ dsetool get_core_schema test.user

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
  <types>
    <fieldType class="org.apache.solr.schema.TextField" name="text">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType class="org.apache.solr.schema.StrField" name="string"/>
  </types>
  <fields>
    <field indexed="true" name="username" stored="true" type="string"/>
    <field indexed="true" name="fullname" stored="true" type="text"/>
    <dynamicField indexed="true" name="address_*" stored="true" type="string"/>
  </fields>
  <uniqueKey>username</uniqueKey>
</schema>

The Schema
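The generated schema can also be pulled down, tweaked, and pushed back. A sketch of that loop; the reload_core options here are from memory, so treat them as an assumption and check the dsetool docs:

bash$ dsetool get_core_schema test.user > schema.xml
bash$ # edit schema.xml (analyzers, copy fields, etc.)
bash$ dsetool reload_core test.user schema=schema.xml reindex=true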

Insert Rows (…and Index Documents)

cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('sbtourist', 'Sergio Bossa', {'address_home': 'UK', 'address_work': 'UK'});

cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('bereng', 'Berenguer Blasi', {'address_home' : 'ES', 'address_work' : 'ES'});

cqlsh:test> INSERT INTO user(username, fullname, address) VALUES('thegrinch', 'Sven Delmas', {'address_home': 'US', 'address_work': 'HQ'});

…and that’s it. No ETL. No writing to a second datastore.

Wildcards

cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"address_home:U*"}';

 username  | address
-----------+----------------------------------------------
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(2 rows)

Sorting and Limits

cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}';

 username  | address
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}
 sbtourist | {'address_home': 'UK', 'address_work': 'UK'}
 bereng    | {'address_home': 'ES', 'address_work': 'ES'}

(3 rows)

cqlsh:test> SELECT username, address FROM user WHERE solr_query='{"q":"*:*", "sort":"address_home desc"}' LIMIT 1;

 username  | address
-----------+----------------------------------------------
 thegrinch | {'address_home': 'US', 'address_work': 'HQ'}

(1 rows)

Faceting

cqlsh:test> SELECT * FROM user WHERE solr_query='{"q":"*:*", "facet":{"field":"address_work"}}';

 facet_fields
----------------------------------------------------
 {"address_work": {"ES": 1, "HQ": 1, "UK": 1}}

(1 rows)

Partition Restrictions

cqlsh:test> CREATE TABLE event(sensor_id bigint, recording_time timestamp, description text, PRIMARY KEY(sensor_id, recording_time));

cqlsh:test> SELECT recording_time, description FROM test.event WHERE sensor_id = 2314234432 AND solr_query='description:unremarkable';
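For context, a quick sketch of how the event table might be indexed and populated before running that query; the core-creation flags mirror the user example above, and the sample row is made up:

bash$ dsetool create_core test.event generateResources=true

cqlsh:test> INSERT INTO event(sensor_id, recording_time, description) VALUES(2314234432, '2015-09-22 10:00:00', 'unremarkable vibration levels');

Because the partition key (sensor_id) is pinned, the search can be routed to the replicas owning that partition instead of fanning out to every shard.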

What do the internals look like?

Indexing

[Diagrams: the Lucene indexing lifecycle. New documents are Buffered in an in-memory RAMBuffer; a Soft Commit flushes them into Segments and makes them Searchable; a Hard Commit writes Segments to Disk and makes them Durable.]
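Not on the original slides, but to make the soft/hard commit split concrete: the cadence of both is usually tuned in the core's solrconfig.xml under <updateHandler>. The values below are purely illustrative:

<autoSoftCommit>
  <maxTime>10000</maxTime>        <!-- ms until buffered docs become searchable -->
</autoSoftCommit>
<autoCommit>
  <maxTime>60000</maxTime>        <!-- ms until segments are flushed and made durable on disk -->
  <openSearcher>false</openSearcher>
</autoCommit>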

Querying

Replica Selection

[Diagrams: a 5-node ring, RF=2, shards A-E. For each query the coordinator selects a minimal set of healthy replicas that together cover all shards, skipping replicas marked unhealthy.]

What happens if a shard query fails?

Failover: Phases 1-3

[Diagrams: a 4-node cluster, RF = 2, shards A-D, no vnodes. The three phases show the coordinator retrying the failed shard query against the other replica that owns the same range.]

Platform Integrations

Search + Analytics: Explicit Predicate Pushdown

bash$ dse spark

scala> val table = sc.cassandraTable("wiki","solr")

scala> val result = table.select("id","title").where("solr_query='body:dog'").collect
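A hypothetical follow-up, just to show the shape of the result: collect returns an Array[CassandraRow] you can inspect directly in the shell.

scala> result.take(10).foreach(println)   // print the first few matching rows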

http://docs.datastax.com