ElasticSearch for data mining
Uploaded by william-simms
![Page 2: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/2.jpg)
About Me
Software Developer
Agile Team Member
Team Lead
Agile Advocate
SDLC Implementer
![Page 3: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/3.jpg)
SDLC
![Page 4: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/4.jpg)
Big Data
“Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.”
- Wikipedia
![Page 5: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/5.jpg)
The 3 Vs
• Volume
  • From a few gigabytes up to petabytes
• Velocity
  • Arrives quickly
• Variety
  • Multiple types of data
![Page 6: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/6.jpg)
What is ElasticSearch?
• You know, for search…
• Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
![Page 7: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/7.jpg)
Let’s break that down…
• Distributed
  • Runs on multiple servers simultaneously
• Multitenant
  • The same system serves different groups of data
• REST
  • Web-based programming interface
• NoSQL for storage
  • Uses JSON
• Open Source
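
To make the REST and JSON points concrete, here is a minimal sketch of what storing one document over HTTP could look like. Everything specific in it is an assumption for illustration: a node listening on `localhost:9200` (the default HTTP port), and a made-up index `library`, type `book`, and id `1`. The `/{index}/{type}/{id}` path matches Elasticsearch releases from the era of this deck; later releases changed how types appear in the URL. The request is only built and printed, not sent.

```python
import json
import urllib.request

# Assumptions for illustration only: a node on localhost:9200 and a
# hypothetical index "library", type "book", and document id "1".
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) an HTTP PUT that would store one JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# A schema-free JSON document: just fields and values.
book = {"title": "Moby Dick", "author": "Herman Melville", "year": 1851}
request = build_index_request("library", "book", "1", book)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(request)
```

The same request could be issued from any language or from the command line, which is why so many client libraries exist.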
![Page 8: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/8.jpg)
So what is ElasticSearch?
• It’s a search engine
• Stores data on multiple machines
• Stores multiple types of data
• Stores in JSON format
• REST interface
  • There are managed and unmanaged programming interfaces
  • Client libraries include: .NET, Java, Node.js, JavaScript, Scala, Clojure, PHP, Perl, Python, Ruby, Haskell, Erlang, ColdFusion, Smalltalk, OCaml, CommandLine, EventMachine, Go
![Page 9: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/9.jpg)
Administration Tools
• cURL
  • Command-line REST interface
• Marvel
![Page 10: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/10.jpg)
Definitions
• Cluster
  • One or more nodes
• Document
  • A stored record
• Field
  • A document has a list of fields, or key-value pairs
• Index
  • Think of this as a database
• Term
  • An exact value to be matched: “FOO”, “Foo”, and “foo” are not the same term
• Type
  • Similar to a database table
• Text
  • Field value
  • Analyzed into terms, which are stored in the index
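
The relationship between documents, fields, text, and terms can be sketched with a toy inverted index. This is not Lucene's analyzer or Elasticsearch's storage format; the `analyze` function (lowercase, then split on whitespace) and the sample documents are assumptions chosen only to show why “Foo” and “foo” are different terms.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (an assumption, not Lucene's): lowercase, split on whitespace."""
    return text.lower().split()


# Two documents; each is a set of fields (key-value pairs).
documents = {
    1: {"title": "Foo Fighters live"},
    2: {"title": "foo bar baz"},
}

# Inverted index for the "title" field: term -> ids of documents containing it.
inverted = defaultdict(set)
for doc_id, doc in documents.items():
    for term in analyze(doc["title"]):
        inverted[term].add(doc_id)


def term_lookup(term):
    """Exact-match lookup: the query term is NOT analyzed first."""
    return sorted(inverted.get(term, set()))


print(term_lookup("foo"))  # [1, 2]
print(term_lookup("Foo"))  # []
print(term_lookup("FOO"))  # []
```

Because the text was lowercased when it was analyzed, only the term `foo` exists in the index; an exact term lookup for `Foo` or `FOO` finds nothing.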
![Page 11: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/11.jpg)
ElasticSearch Resources
• ElasticSearch
  • elasticsearch.org
• ElasticSearch NEST
  • .NET client
  • nest.azurewebsites.net
![Page 12: ElasticSearch for data mining](https://reader034.fdocuments.in/reader034/viewer/2022052400/55a800f91a28aba14d8b456a/html5/thumbnails/12.jpg)
Installation
• Get the binaries
• Unzip
• Run elasticsearch.bat
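
Once the batch file is running, a quick way to confirm the node is up is to request its root URL, which answers with a small JSON description of the node. The sketch below assumes the default HTTP port 9200 on the local machine; the helper name `node_info` is hypothetical, and the exact fields in the reply may vary by version.

```python
import http.client
import json
import urllib.request


def node_info(url="http://localhost:9200/", timeout=2.0):
    """Return the node's JSON reply as a dict, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections and timeouts (URLError derives
        # from it); ValueError covers a reply that is not valid JSON.
        return None
    return data if isinstance(data, dict) else None


info = node_info()
if info is None:
    print("No node answered on port 9200; is elasticsearch.bat still running?")
else:
    print("Node is up:", info.get("name"))
```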