Wanna search? Piece of cake!

A fast, scalable, and easy-to-set-up search engine for your data.

by Alexey Kursov
http://www.linkedin.com/in/kursov

ElasticSearch is a
● distributed
● RESTful
● free/open source search server
● based on Apache Lucene.

It is developed by Shay Banon (@kimchy) and released under the terms of the Apache License. ElasticSearch is written in Java.

http://elasticsearch.org/
http://elasticsearch.com/

Apache Lucene is a
● free/open source information retrieval software library
● originally created in Java
● supported by the Apache Software Foundation
● released under the Apache Software License

While suitable for any application which requires full text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site searching.

http://lucene.apache.org/core/

Lucene?

Indexing.

ElasticSearch achieves fast search responses because, instead of searching the text directly, it searches an index.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).

ElasticSearch uses Apache Lucene to create and manage this inverted index.

Basic Concepts

In computer science, an inverted index is an index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents. The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database. Simple example:

Given the texts:

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

we have the following inverted file index (where the integers in the set notation brackets refer to the indexes (or keys) of the text symbols, T[0], T[1] etc.):

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

Inverted index
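The banana example above can be reproduced in a few lines. This is only a minimal sketch of the idea, not how Lucene stores its index: it splits on whitespace, does no analysis or scoring, and the function names `build_inverted_index` and `search_all` are made up for this illustration.

```python
from collections import defaultdict


def build_inverted_index(texts):
    """Map each word to the set of text ids (positions in `texts`) containing it."""
    index = defaultdict(set)
    for text_id, text in enumerate(texts):
        for word in text.split():
            index[word].add(text_id)
    return dict(index)


def search_all(index, words):
    """Return the ids of texts that contain every word in `words`."""
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings) if postings else set()


texts = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(texts)

for word in sorted(index):
    print(word, sorted(index[word]))
# a [2]
# banana [2]
# is [0, 1, 2]
# it [0, 1, 2]
# what [0, 1]

print(sorted(search_all(index, ["what", "is", "it"])))
# [0, 1]
```

A query never touches the original texts: it looks up each word and intersects the resulting sets, which is why the extra work at indexing time pays off at search time.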

Basic Concepts

Data representation.

In ElasticSearch, a Document is the unit of search and index. An index consists of one or more Documents, and a Document consists of one or more Fields (in database terminology, a Document corresponds to a table row, and a Field corresponds to a table column).

The schema declares:
- what fields there are
- which field should be used as the unique/primary key
- which fields are required
- how to index and search each field
- etc.

An index may store documents of different "mapping types". You can associate multiple mapping definitions for each mapping type. A mapping type is a way of separating the documents in an index into logical groups.
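To make the Document / Field / schema vocabulary concrete, here is a toy model in plain Python. It is not the ElasticSearch mapping API: the type names ("book", "author"), the field names, the dictionary layout, and the `validate` helper are all hypothetical and exist only for this sketch.

```python
# Toy model only: plain dictionaries, NOT the ElasticSearch mapping format.
# Each "mapping type" declares its fields, its key field and its required fields.
mappings = {
    "book": {
        "key": "isbn",
        "required": ["isbn", "title"],
        "fields": {"isbn": "string", "title": "string", "pages": "integer"},
    },
    "author": {
        "key": "name",
        "required": ["name"],
        "fields": {"name": "string"},
    },
}

PYTHON_TYPES = {"string": str, "integer": int}


def validate(doc_type, document, mappings):
    """Return a list of problems found when checking a document against its mapping type."""
    mapping = mappings[doc_type]
    problems = []
    for name in mapping["required"]:
        if name not in document:
            problems.append("missing required field: " + name)
    for name, value in document.items():
        declared = mapping["fields"].get(name)
        # Undeclared fields are ignored in this toy model.
        if declared is not None and not isinstance(value, PYTHON_TYPES[declared]):
            problems.append("wrong type for field: " + name)
    return problems


# One "index" holding documents of two different mapping types.
index = [
    ("book", {"isbn": "978-0", "title": "Search Basics", "pages": 120}),
    ("author", {"name": "Jane Doe"}),
]

for doc_type, document in index:
    print(doc_type, validate(doc_type, document, mappings))
```

Each document is a bag of named fields (the "table row"), and the mapping type plays the role of the schema for one logical group of documents inside the same index.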

Competitors?

http://lucene.apache.org/solr/

http://sphinxsearch.com/

What's the same?

Lucene Query, Facet, Index functionality implementation:

The two are very similar, with some differences and nuances on either side. There is a lot of information about this on the internet; see, for example, this series of articles: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

What's the difference?

ElasticSearch main advantages (IMHO):

1. Low barriers to entry. ElasticSearch is a more "intuitive, accessible" system (significantly less configuration, as it's dynamic via HTTP schema builder and sensible defaults)
2. JSON-based API is cleaner and easier to use
3. The replication and sharding capabilities are much simpler to configure
4. Complex documents (nested)
5. Multiple document types per schema
6. Joins (parent/child relationships)
7. Online schema changes
8. Self-contained cluster
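As an illustration of points 2 and 3, sharding and replication can be set with a single JSON request when an index is created. The sketch below only builds the HTTP request with the Python standard library; the node address `http://localhost:9200`, the index name `articles`, and the helper `create_index_request` are assumptions made for this example, and nothing is sent unless you uncomment the last line against a running node.

```python
import json
import urllib.request


def create_index_request(base_url, index_name, shards, replicas):
    """Build (but do not send) a PUT request that creates an index with the given settings."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local node and hypothetical index name.
request = create_index_request("http://localhost:9200", "articles", shards=3, replicas=1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually create the index on a running node:
# urllib.request.urlopen(request)
```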

What's the difference?

Solr main advantages (IMHO):

1. Solr has a bigger, more mature user, dev, and contributor community
2. Solr is more mature and maybe more stable
3. Solr has more response formats (XML, CSV, JSON)
4. Better 3rd-party product integration
5. Pivot Facets
6. More customizable

Who wins?

We all do!

ES Clients and "river" plugins

There are clients for many languages and platforms (from the official site): Java, .NET, Perl, Python, Ruby, PHP, JavaScript, Scala, Clojure, Go, Erlang, EventMachine, OCaml, Smalltalk

There are "river" (data import) plugins for:

JDBC, CouchDB, Wikipedia, Twitter, RabbitMQ, RSS, MongoDB, Open Archives Initiative (OAI), St9, Sofa, Amazon SQS, LDAP, Dropbox, ActiveMQ, Solr, CSV, JMS

Who uses it?

How to connect from my code?

NEST (the guys from stackoverflow.com and I think it is the best .NET client for ElasticSearch)

NEST aims to be a .NET client with a very concise API (http://github.com/Mpdreamz/NEST).

Its main goal is to provide a solid strongly typed Elasticsearch client. It also has string/dynamic overloads for more dynamic use cases.

Why NEST?

● Fluent. Looks like:

ElasticClient.Search<Foo>(s => s.From(0).Size(10).SortAscending(f => f.Name).Query(...

● JSON serializer/deserializer: Newtonsoft Json.NET with all its advantages
● Strongly typed
● Useful attributes for configuration
● Actively developed and improved
● Open source
● Clear and beautiful source code
● Available on NuGet
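For comparison, the fluent NEST call above corresponds roughly to one JSON search request over HTTP. The sketch below builds such a request in Python without sending it. Assumptions: a node at `http://localhost:9200`, a hypothetical index called `foo` with a sortable `name` field, and a `match_all` query standing in for the query that is elided in the NEST snippet; `search_request` is a helper invented for this example.

```python
import json
import urllib.request


def search_request(base_url, index_name, start, size, sort_field):
    """Build (but do not send) a paged, sorted search request."""
    body = {
        "from": start,                             # like .From(0)
        "size": size,                              # like .Size(10)
        "sort": [{sort_field: {"order": "asc"}}],  # like .SortAscending(f => f.Name)
        "query": {"match_all": {}},                # placeholder for the elided .Query(...)
    }
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local node, hypothetical index and field names.
request = search_request("http://localhost:9200", "foo", start=0, size=10, sort_field="name")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To run the search against a live node:
# urllib.request.urlopen(request)
```

A typed client such as NEST saves you from writing this JSON by hand and maps the hits back to your own classes.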

Other clients you can find here: http://www.elasticsearch.org/guide/clients/

Practice
