Page 1 © Hortonworks Inc. 2014
Discover HDP 2.1: Apache Solr for Hadoop Search
Hortonworks. We do Hadoop.
Speakers
Justin Sears
Hortonworks Product Marketing Manager
Rohit Bakhshi
Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform
Paul Codding
Hortonworks Solution Engineer, focused on customer success with Apache Storm & Apache Solr
Agenda
• Overview of Apache Solr and Hadoop Search
• Hadoop Search Demo
• Q & A
A Modern Data Architecture
APPLICATIONS
• Business Analytics
• Custom Applications
• Packaged Applications
DEV & DATA TOOLS
• Build & Test
OPERATIONS TOOLS
• Provision, Manage & Monitor
DATA SYSTEM
• REPOSITORIES: RDBMS, EDW, MPP
• ENTERPRISE HADOOP: Governance & Integration, Security, Operations, Data Access, Data Management
SOURCES
• OLTP, ERP, CRM Systems
• Documents, Emails
• Web Logs, Click Streams
• Social Networks
• Machine Generated
• Sensor Data
• Geolocation Data
HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
GOVERNANCE & INTEGRATION
• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS
DATA ACCESS
• Batch: MapReduce
• Script: Pig
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Search: Solr
• Others: In-Memory Analytics, ISV engines
DATA MANAGEMENT
• YARN: Data Operating System
• HDFS (Hadoop Distributed File System), nodes 1 to N
SECURITY
• Authentication, Authorization, Accounting, Data Protection
• Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
OPERATIONS
• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie
HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
DATA ACCESS on YARN (Data Operating System)
• Search: Solr
Agenda
Overview Features Q & A
Search: Overview
Expanded Data Access Interfaces to Hadoop
• BATCH: MapReduce
• INTERACTIVE: Tez
• STREAMING: Storm
• ONLINE: HBase, Accumulo
• SEARCH: Solr
YARN: Cluster Resource Management
HDFS: Redundant, Reliable Storage
Search: Overview
Apache Solr: Open source enterprise search for Hadoop and HDP
• Open architecture: In the community, for the community
• Simple, powerful UI for advanced search applications
• High-performance indexing & sub-second search times over billions of documents
• Deep integration roadmap with HDP
LucidWorks: Hortonworks partner for search
• Enterprise support provided in partnership with LucidWorks
• 9 committers total (7 PMC) for Apache Solr
Open Source Components for HDP Search
Comprehensive enterprise search using open source technologies:
• High-Performance Indexing
• Powerful, Accurate & Efficient Search Algorithms
• Ranked & Field searching
• Flexible faceting, highlighting, joins & result grouping
• Pluggable ranking models
• Advanced Full-Text Search Capabilities
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces - XML, JSON and HTTP
• Comprehensive HTML Administration Interfaces
• Server statistics exposed over JMX for monitoring
• Linearly scalable
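To make the "Standards Based Open Interfaces" and "faceting, highlighting" bullets concrete, here is a minimal sketch of how a client might build a Solr search request over HTTP with a JSON response. The host, the collection name `documents`, and the field names `source` and `body` are hypothetical and do not come from the slides; the query parameters (`q`, `wt`, `rows`, `facet`, `facet.field`, `hl`, `hl.fl`) are standard Solr request parameters. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, text, facet_field, highlight_field, rows=10):
    """Build a Solr /select URL asking for ranked hits, facet counts and highlights as JSON."""
    params = [
        ("q", text),                  # the query, here a field search
        ("wt", "json"),               # response format: JSON over HTTP
        ("rows", rows),               # number of ranked results to return
        ("facet", "true"),            # turn faceting on
        ("facet.field", facet_field), # count matching documents per value of this field
        ("hl", "true"),               # turn highlighting on
        ("hl.fl", highlight_field),   # field whose matching snippets are highlighted
    ]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and field names, for illustration only.
url = build_search_url("http://localhost:8983", "documents", "body:hadoop", "source", "body")
print(url)
```

Any HTTP client can then fetch the URL; the same request returns XML instead if `wt` is set to `xml`.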
Scalable Indexing of Data in HDFS
• Ingest: MapReduce job
– CSV
– Microsoft Office files
– Grok (log data)
– Zip
– Solr XML
– Seq files
– WARC
• Processing: Apache Pig
– Write your own Pig scripts to index content
– Pig for preprocessing and joining
– Output the resulting datasets to Solr
Data flow: HDFS (Raw Documents) → MapReduce or Pig Job → Solr (Lucene Indexes)
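As a rough illustration of the ingest step for the CSV case, the sketch below turns CSV rows into Solr documents and prepares a JSON update request. It is a single-process stand-in, not the MapReduce or Pig job the slide describes; it only shows the shape of the data that ends up in Solr. The host, the collection name `documents`, and the column names are hypothetical. Posting a JSON array of documents to the `/update` handler with `commit=true` is standard Solr behaviour; the sketch builds the request without sending it.

```python
import csv
import io
import json


def csv_to_solr_docs(csv_text, id_field):
    """Turn each CSV row into one flat Solr document with an 'id' key."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {key: value for key, value in row.items() if value != ""}  # skip empty cells
        doc["id"] = row[id_field]  # Solr schemas commonly use 'id' as the unique key
        docs.append(doc)
    return docs


def build_update_request(base_url, collection, docs):
    """Return (url, body, headers) for a JSON update that commits immediately."""
    url = f"{base_url}/solr/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers


# Hypothetical sample data, standing in for raw documents stored in HDFS.
sample = "ticket,source,body\n1,email,Cluster is slow\n2,weblog,Login failed\n"
docs = csv_to_solr_docs(sample, id_field="ticket")
url, body, headers = build_update_request("http://localhost:8983", "documents", docs)
print(url)
print(body.decode("utf-8"))
```

In a real cluster the same row-to-document mapping would run inside the distributed job, so that indexing scales with the data in HDFS rather than with a single client.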
Search: Reference Architecture
HDFS (Hadoop Distributed File System)
MapReduce Indexing Job
HDP Search Demo
Paul Codding
Learn More About Hadoop Search
Hortonworks.com/hadoop/solr/
Register for the remaining 2 Discover HDP 2.1 Webinars
Hortonworks.com/webinars
Next Webinar: Apache Storm for Stream Data Processing in Hadoop
Thursday, June 19, 10am Pacific
Thank you!