Enabling Search in your Cassandra Application with DataStax Enterprise
Marc Selwan, Solutions Engineer (@MarcSelwan)
Why Search?
Confidential
The bright blue butterfly hangs on the breeze.
[the] [bright] [blue] [butterfly] [hangs] [on] [the] [breeze]
Terms
Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
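The sentence-to-terms step above can be sketched in a few lines of Python. This is a minimal illustration, not how Lucene actually analyzes text: the `tokenize` and `build_inverted_index` helpers and the second document are hypothetical, and real analyzers also handle stemming, stop words, and more.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word terms (a toy analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

docs = {
    1: "The bright blue butterfly hangs on the breeze.",
    2: "A blue whale swims.",  # hypothetical second document
}
terms = tokenize(docs[1])
index = build_inverted_index(docs)
```

Looking up `index["blue"]` returns the ids of both documents, which is the basic lookup a search engine performs for a single-term query.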
What is Solr Missing?
Not a Database
Doesn’t Cluster
Not transparently sharded
Requires ETL to ingest application data
Doesn’t Reindex
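Open Source Search Reference Architecture
[Diagram: Your Application uses a DB API to reach an OLTP DB (Transactional Workloads) and a Search API to reach a separate Search Cluster (Search Workloads); Your ETL sits between the OLTP DB and the Search Cluster.]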
DSE Search Reference Architecture
[Diagram: Your Application connected to a ring of Search + Cassandra nodes, with token labels 10 through 80.]
CQL
Easy CQL API
All the goodness of DataStax driver
Distributed, Replicated, Always On
Data locality and shared memory
• Automatic indexing on db insert
• Higher ingestion throughput
• Distributed query optimization
Compared to open source search
• No separate search cluster to manage
• Probably less total hardware required
• No “Split Brain” data inconsistencies
• No ETL or sync to build and maintain
• No app level data management code
Data stored in Cassandra
Indexes stored in Solr/Lucene
[Diagram: a Coordinator and a Shard Router in front of a node running Cassandra and Solr side by side. Cassandra: Mem-table in Memory, Commit Log and SSTables on Disk. Solr: RAM Buffer and Index Segments in Memory, Index Segments on Disk.]
UPDATE videos
SET tags = {'cat tubes', 'Al Gore''s Internet', 'NoSQL Fairytales'}
WHERE videoid = b3a76c6b-7c7f-4af6-964f-803a9283c401;
OSS Solr
[Diagram: RAM Buffer and Index Segments in Memory, Index Segments on Disk; one region is labeled Not Searchable and the other Searchable.]
DSE Search
[Diagram: RAM Buffer and Index Segments in Memory, Index Segments on Disk; the only label is Searchable.]
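The difference between the two diagrams can be mimicked with a toy model. This sketch assumes the Not Searchable region is the RAM buffer (documents that have not yet been flushed to a segment); the `ToyIndex` class and its `buffer_searchable` flag are hypothetical and say nothing about how Solr or DSE are implemented internally.

```python
class ToyIndex:
    """Toy model: a RAM buffer of new docs plus flushed index segments."""

    def __init__(self, buffer_searchable):
        # Assumption: this flag is the only difference between the two diagrams.
        self.buffer_searchable = buffer_searchable
        self.ram_buffer = []
        self.segments = []

    def add(self, doc):
        """New documents always land in the RAM buffer first."""
        self.ram_buffer.append(doc)

    def commit(self):
        """Flush the RAM buffer into a new segment."""
        if self.ram_buffer:
            self.segments.append(list(self.ram_buffer))
            self.ram_buffer.clear()

    def search(self, term):
        """Return visible docs containing the term as a whole word."""
        visible = [doc for segment in self.segments for doc in segment]
        if self.buffer_searchable:
            visible += self.ram_buffer
        return [doc for doc in visible if term in doc.lower().split()]

oss_like = ToyIndex(buffer_searchable=False)
dse_like = ToyIndex(buffer_searchable=True)
for idx in (oss_like, dse_like):
    idx.add("cat tubes")

before_commit = (oss_like.search("cat"), dse_like.search("cat"))
oss_like.commit()
after_commit = oss_like.search("cat")
```

In this model the document is found immediately when the buffer is searchable, and only after `commit()` when it is not.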
Let’s see this in action!
Search in Retail
Filter queries: These are awesome because the filter's result set gets cached in memory and can be reused by later queries.
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "fq":"categories:Books", "sort":"title asc"}' limit 10;
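Hand-writing the JSON inside a CQL string literal is easy to get wrong, so here is a small sketch that builds the statement above from a Python dict. The `solr_query_cql` helper is hypothetical (it is not part of any DataStax driver), and it only builds a string; it does not validate the table name or run the query.

```python
import json

def solr_query_cql(table, query, limit):
    """Return a CQL SELECT whose solr_query literal is the JSON-encoded dict."""
    payload = json.dumps(query)
    # Double any single quotes so the JSON fits inside a CQL string literal.
    literal = payload.replace("'", "''")
    # Note: the table name is interpolated as-is; pass trusted values only.
    return f"SELECT * FROM {table} WHERE solr_query='{literal}' LIMIT {limit};"

cql = solr_query_cql(
    "amazon.metadata",
    {"q": "title:Noir~", "fq": "categories:Books", "sort": "title asc"},
    10,
)
```

The resulting string matches the query above apart from whitespace inside the JSON and the case of `LIMIT`.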
Faceting: Get counts of matching documents per field value
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "facet":{"field":"categories"}}' limit 10;
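What a field facet returns can be shown with a toy count over matching documents. This is a sketch only: the `facet_counts` helper and the sample rows are hypothetical, and it assumes `categories` is a multi-valued field.

```python
from collections import Counter

def facet_counts(docs, field):
    """Count how many matching docs carry each value of a multi-valued field."""
    counts = Counter()
    for doc in docs:
        for value in doc.get(field, []):
            counts[value] += 1
    return counts

# Hypothetical documents that matched the q clause.
matches = [
    {"title": "Noir City", "categories": ["Books", "Mystery"]},
    {"title": "Film Noir", "categories": ["Movies"]},
    {"title": "Noir Nights", "categories": ["Books"]},
]
counts = facet_counts(matches, "categories")
```

Here `counts` holds one entry per category value, which is the shape of data a category sidebar in a retail search page needs.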
Geospatial Searches: Supports box and radius
SELECT * FROM amazon.clicks WHERE solr_query='{"q":"asin:*", "fq":"+{!geofilt pt=\"37.7484,-122.4156\" sfield=location d=1}"}' limit 10;
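Conceptually, the radius filter keeps points within `d` kilometers of `pt`. The sketch below illustrates that idea with a haversine distance; it is an approximation for intuition, not the distance code Solr uses, and the helper names and sample points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(points, center, d_km):
    """Keep the (lat, lon) points that lie within d_km of the center."""
    return [p for p in points if haversine_km(center[0], center[1], p[0], p[1]) <= d_km]

center = (37.7484, -122.4156)  # the pt value from the query above
points = [
    (37.7484, -122.4156),  # same spot
    (37.7534, -122.4156),  # roughly half a kilometer north
    (37.8044, -122.2712),  # across the bay, well outside 1 km
]
nearby = within_radius(points, center, 1)
```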
Joins: Not your relational joins. These queries 'borrow' indexes from other tables to add filter logic. These are fast!
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;
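One way to read the join filter: collect the `asin` values of clicks that match `area_code:415`, then keep only the metadata rows whose `asin` is in that set. The sketch below shows that semi-join on in-memory lists; the `join_filter` helper and the sample rows are hypothetical.

```python
def join_filter(from_docs, to_docs, from_field, to_field, predicate):
    """Keep to_docs whose to_field appears among matching from_docs' from_field."""
    keys = {doc[from_field] for doc in from_docs if predicate(doc)}
    return [doc for doc in to_docs if doc[to_field] in keys]

# Hypothetical rows standing in for amazon.clicks and amazon.metadata.
clicks = [
    {"asin": "A1", "area_code": 415},
    {"asin": "A2", "area_code": 212},
    {"asin": "A3", "area_code": 415},
]
metadata = [
    {"asin": "A1", "title": "Noir City"},
    {"asin": "A2", "title": "Film Noir"},
    {"asin": "A3", "title": "Noir Nights"},
]
joined = join_filter(clicks, metadata, "asin", "asin", lambda c: c["area_code"] == 415)
```

No columns from the clicks rows appear in the result, which is why this is a filter and not a relational join.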
Fun all in one.
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "facet":{"field":"categories"}, "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;
How do you get started?
1) Spin up a new C* Cluster with search enabled using the DSE installer.
$ sudo service dse cassandra -s
2) Run your schema DDL to create the C* keyspace and tables.
3) Run dsetool on the videos table*
$ dsetool create_core keyspace.table generateResources=true reindex=true
4) Write a CQL query with a Solr Search in it.
SELECT * FROM keyspace.table WHERE solr_query='column:*';
*This will create Lucene indexes on ALL the columns in your table.
Behind the scenes… dsetool
$ dsetool create_core killrvideo.videos generateResources=true
schema.xml
solrconfig.xml
CQL Query
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>…
<fields>
  <field indexed="true" multiValued="false" name="added_date" stored="true" type="TrieDateField"/>
  <field indexed="true" multiValued="false" name="location" stored="true" type="TextField"/>
  <field indexed="true" multiValued="false" name="preview_image_location" stored="true" type="TextField"/>
  <field indexed="true" multiValued="false" name="name" termVectors="true" stored="true" type="TextField"/>
  <field indexed="true" multiValued="true" name="tags" termVectors="true" stored="true" type="TextField"/>
  <field indexed="true" multiValued="false" name="userid" stored="true" type="UUIDField"/>
  <field indexed="true" multiValued="false" name="videoid" stored="true" type="UUIDField"/>
  <field indexed="true" multiValued="false" name="location_type" stored="true" type="TrieIntField"/>
  <field indexed="true" multiValued="false" name="description" termVectors="true" stored="true" type="TextField"/>
</fields>
<uniqueKey>videoid</uniqueKey>
</schema>
<!--======= Copyright DataStax, Inc. Please see the included license file for details.-->
<!-- For more details about configurations options that may appear in this file, see http://wiki.apache.org/solr/SolrConfigXml.-->
<config>
  <!-- In all configuration below, a prefix of "solr." for class names is an alias that causes solr to search appropriate packages, including org.apache.solr.(search|update|request|core|analysis) You may also specify a fully qualified Java classname if you have your own custom plugins. -->
…
SELECT * FROM killrvideo.videos WHERE solr_query='name:*';
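To check what a generated schema.xml will index, the field list can be read with Python's standard XML parser. The sketch below uses a trimmed, well-formed excerpt of the schema shown above (three fields, with the elided types section left out), so `SCHEMA_EXCERPT` is an assumption and not the full generated file.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the generated schema; the elided <types> section is omitted.
SCHEMA_EXCERPT = """<schema name="autoSolrSchema" version="1.5">
  <fields>
    <field indexed="true" multiValued="false" name="name" termVectors="true" stored="true" type="TextField"/>
    <field indexed="true" multiValued="true" name="tags" termVectors="true" stored="true" type="TextField"/>
    <field indexed="true" multiValued="false" name="videoid" stored="true" type="UUIDField"/>
  </fields>
  <uniqueKey>videoid</uniqueKey>
</schema>"""

root = ET.fromstring(SCHEMA_EXCERPT)

# Field name -> Solr field type, in document order.
fields = {f.get("name"): f.get("type") for f in root.iter("field")}

# Fields that hold collections (for example a CQL set of tags).
multi_valued = [f.get("name") for f in root.iter("field") if f.get("multiValued") == "true"]

unique_key = root.findtext("uniqueKey")
```

A listing like this makes it easy to spot columns that do not need to be indexed before trimming the schema.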
Thank you!