Building a data driven search application with LucidWorks SiLK
Confidential and Proprietary © Copyright 2013
Building a Data-Driven Log Application with SILK
April 21, 2014
Search | Discover | Analyze
Agenda
• Introduction to LucidWorks
• The Continuum of Search
• LucidWorks SILK
– Enabling Big Data Search
– 360-degree view of customers and systems
– Breakthrough ROI
• Solution Components
• Demonstration
• Summary and Q&A
Speakers
Will Hayes
• Chief Product Officer at LucidWorks
• 15 years product, marketing and BD experience
• Prior to LW 8 years @Splunk (Employee ~9)
• Proud Search Snob
Ravi Krishnamurthy
• Leads LucidWorks’ newly created Solutions team
• 16-year track record of data-driven solutions
– Customer analytics/nano-targeting
– Improving product development operations
– Video processing and transmission
• Establishing search as the paradigm for solving the "last mile problem" of big data
Who is LucidWorks
Commercial entity behind Lucene/Solr, the industry-leading open search engine:
• 300+ enterprise customers
• Consulting, training, SLAs and “Pro-Active Support” for open source
LucidWorks platform provides advanced search capabilities directly on Solr:
• Connectors, Entity Extraction, Security, pipelines, rules and more…
• Solutions (e.g. SiLK & LucidWorks App for Splunk) to help streamline use case adoption
Continuum of Search
• ‘Enterprise Search’
– Application Innovation: Intranet Search, Knowledge Base
– Index Characteristics: Gigabyte scale, Single instance, Full-text
• ‘Intelligent Search’
– Application Innovation: E-Discovery, E-Commerce
– Index Characteristics: Terabyte Scale, Cluster-ready, Structured/Unstructured Data, Near real-time
• ‘Big Data Search’
– Application Innovation: Search on Hadoop, Log Analysis, Fraud Detection
– Index Characteristics: Unlimited Scale, Cloud-ready, Handles any data type, Real-time, NoSQL Alternative
Solr is the Choice
• Creates the data access layer leveraged by best-in-class data-driven applications
• Is the choice of those building data-driven applications at massive scale
Big Data Search
• A Big Data Search index: Unlimited Scale, Cloud-ready, Handles any data type, Real-time, NoSQL Alternative
• Creates the data access layer for the Ad-Hoc Discovery, Personalization and Context that developers & users demand in their Big Data applications
• LucidWorks is the partner of choice of the leading Big Data vendors to deliver next-generation search
Big Data Ecosystem WITHOUT LucidWorks Search
Input Data Stream
Traditional RDBMS/EDW
Doc Stores
Platform for Data Storage and Machine Learning
Difficult Getting Value from Data
1. Opaque
2. Narrow views into data
3. Out-of-date
4. Not Actionable
5. Accessible mostly to expert users
6. Expensive, ineffective translation to broader set of users
Product Managers
Business Users
Rest of Org
Data Scientist
BI Analyst
IT
HDFS; NoSQL; Hadoop
Real-time Processing
Input Data Stream
Traditional RDBMS/EDW
Doc Stores
Directly Access Data and Insights to Drive Actions:
Breakthrough ROI
Predictive
Relevant
Actionable
Timely
HDFS; NoSQL; Hadoop
Real-time Processing
Lucene/Solr
Solving the Last Mile Problem of Big Data
Solution Components
Gateway
JDBC Connector
Web/File System Crawl
Data Warehouse
Hadoop Connectors
Clickstream Networking
Data Sources
Connectors
Servers
Events from App/Server/Web Logs, etc.
• Application Logs
– 2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= path=/browse params={fl=lucid_facet&facet.query={!tag%3Done_day}dateCreated:[NOW-1DAY/DAY+TO+NOW/DAY] &facet.query={!tag%3Done_year}dateCreated:[NOW-365DAYS/DAY+TO+NOW/DAY]&start=260&q=faceting&f.project.facet.limit=20&role=DEFAULT&req_type=main&hl.simple.post=</span>&facet.field={!ex%3Dsource}source&facet.field={!ex%3Dsource}list_type&facet.field={!ex%3Dsource}issue_status&facet.field={!ex%3Dsource}lucid_facet&facet.field={!ex%3Dproject}project&facet.field={!ex%3Dauthor_display}author_display} hits=6761 status=0 QTime=14
• Firewall Logs
– Apr 07 2014 10:14:56 eventid='1278457197410173971' severity=severe category="Penetrate/ArpPoisoning" hostId=r signature=3201-2 description="Unix Password File Access Attempt" attacker=110.236.0.15 target=27.96.128.0 target=141.146.8.66 gc_score="-5" gc_riskdelta="3" gc_riskrating="false" gc_deny_packet="true" gc_deny_attacker="false"
• Web Logs
– 50.17.233.225 - - [09/Mar/2014:06:26:50 -0700] "GET / HTTP/1.1" 200 24442 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"
• Syslogs
– Apr 17 07:00:42 Lucids-MacBook-Pro-25.local Microsoft Outlook[2461]: CGSCopyDisplayUUID: Invalid display 0x18d88a81
• Other: Database Logs, Click Data, Conversions, Social Media (Tweets…), Financial Data, Product Catalogs, Knowledge Base, etc.
• Volume, Variety and Velocity
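To make the application log sample above more concrete, here is a minimal Python sketch of how one such Solr request line could be split into fields. The regular expression is an assumption derived only from the single sample line shown on this slide (timestamp, level, logger, collection, path, params, hits, status, QTime); the function name `parse_solr_line` and the shortened `sample` line are illustrative and not part of SiLK or LogStash.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample application log line on the slide:
# "<date> <time>,<millis> <LEVEL> <logger> - [<collection>] ... path=<p> params={...} hits=N status=N QTime=N"
SOLR_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) - \[(?P<collection>[^\]]+)\] "
    r".*?path=(?P<path>\S+) params=\{(?P<params>.*)\} "
    r"hits=(?P<hits>\d+) status=(?P<status>\d+) QTime=(?P<qtime>\d+)$"
)


def parse_solr_line(line):
    """Return a dict of fields for one request log line, or None if it does not match."""
    m = SOLR_LINE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S,%f")
    for key in ("hits", "status", "qtime"):
        rec[key] = int(rec[key])
    # Pull the user query out of the raw parameter string (first "q=" pair).
    q = re.search(r"(?:^|&)q=([^&]*)", rec["params"])
    rec["query"] = q.group(1) if q else None
    return rec


# Shortened version of the slide's sample line (most facet parameters omitted).
sample = ("2013-12-18 01:37:20,637 INFO core.SolrCore - [collection1] webapp= "
          "path=/browse params={start=260&q=faceting&role=DEFAULT} "
          "hits=6761 status=0 QTime=14")
record = parse_solr_line(sample)
print(record["query"], record["hits"], record["qtime"])
```

The firewall, web and syslog samples would each need their own pattern; in the demo this kind of field extraction is handled by the log shipping tools rather than by hand-written code.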
Application Development Process
• Understand your Users
• Know your Data
• Prepare and Ingest Data into Solr
• Build Visualizations
• Iterate
Search Analytics—Understand your Users
• Who will use this application?
– Business User (eCommerce or KM), IT and Search Administrators
• What are they interested in?
– What are people searching for?
– Which queries are returning zero hits?
– Which searches are providing slow response times?
– What is my memory & CPU usage, JVM metrics, etc.?
– Is there a trend in my slow searches?
– Is the cache warm-up time very large?
• The first three are of interest to Business Users; Search Admins/Developers are interested in all six.
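The first three questions above (popular queries, zero-hit queries, slow searches) can be answered from parsed request log records with very little code. The following Python sketch is only an illustration: the record keys `query`, `hits` and `qtime`, the function name `summarize`, the 1000 ms threshold and the three sample records are all invented for this example and do not come from the demo data.

```python
from collections import Counter


def summarize(records, slow_ms=1000):
    """Summarize parsed search log records.

    Each record is assumed to be a dict with 'query', 'hits' and 'qtime'
    keys; slow_ms is an arbitrary illustrative threshold in milliseconds.
    """
    top = Counter(r["query"] for r in records)
    zero_hits = Counter(r["query"] for r in records if r["hits"] == 0)
    slow = [r for r in records if r["qtime"] >= slow_ms]
    return {
        "top_queries": top.most_common(3),
        "zero_hit_queries": zero_hits.most_common(3),
        "slow_queries": sorted(slow, key=lambda r: r["qtime"], reverse=True),
    }


# Made-up records for illustration only.
records = [
    {"query": "faceting", "hits": 6761, "qtime": 14},
    {"query": "faceting", "hits": 6761, "qtime": 1200},
    {"query": "solr clowd", "hits": 0, "qtime": 9},
]
report = summarize(records)
print(report["top_queries"])
print(report["zero_hit_queries"])
```

In a dashboard these aggregations would normally be computed by the search engine itself (for example with facets) rather than in application code; the sketch only shows what is being counted.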
Search Analytics–Know your Data
• Where is the data available?
– Core Logs
– Core Request Logs
– Connector Logs
– MBeans API
– Log4j
• Data Connectors
– LogStash (for this example)
– Hadoop Job Jar
Centralized Logging Infrastructures
• Can be built using a combination of LogStash, Apache Flume, Lumberjack, RabbitMQ, Apache Kafka, etc.
• Today’s example uses LogStash; extensive documentation is at http://logstash.net/docs/1.4.0
Diagram: Shipper, Shipper → Broker → Indexer
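The shipper, broker and indexer roles in the diagram can be sketched in a few lines of Python. This is an in-process stand-in, not LogStash: the broker is a standard-library `queue.Queue`, the index is a plain list, and the function names, host names and log lines are hypothetical.

```python
import queue


def shipper(lines, broker, host):
    """Tag each raw line with its origin host and hand it to the broker."""
    for line in lines:
        broker.put({"host": host, "message": line})


def indexer(broker, index):
    """Drain the broker and append each event to the index (a plain list here)."""
    while True:
        try:
            event = broker.get_nowait()
        except queue.Empty:
            break
        index.append(event)


broker = queue.Queue()  # stand-in for a real message broker
index = []              # stand-in for the search index
shipper(["GET / 200"], broker, host="web-1")
shipper(["ERROR disk full", "INFO ok"], broker, host="app-1")
indexer(broker, index)
print(len(index), index[0])
```

The point of the broker is to decouple the two sides: shippers keep accepting log lines even when the indexer is slow or offline, and the indexer catches up from the queue later.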
Solr/Solr Cloud
Search Analytics—Data Ingestion & Visualization
Gateway (Reverse Proxy)
Solr Output Writer for LogStash (HTTP)
Search Logs
Visualization Configurable Dashboards
Hadoop Connector
GrokIngestMapper
LogStash
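The HTTP ingestion path in this diagram ends with documents being posted to Solr. The Python sketch below is not the LogStash Solr output writer; it only shows, under stated assumptions, what an equivalent request to Solr's JSON update handler could look like. The base URL, the collection name `logs` and the field names with `_s`/`_i` suffixes are assumptions that depend on the Solr installation and schema.

```python
import json
import urllib.request


def build_update_request(docs, solr_url="http://localhost:8983/solr", collection="logs"):
    """Build (but do not send) an HTTP POST that adds docs to a Solr collection."""
    url = f"{solr_url}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One parsed search log event; the field names are hypothetical.
docs = [{"id": "1", "query_s": "faceting", "hits_i": 6761, "qtime_i": 14}]
req = build_update_request(docs)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Committing on every request is convenient for a demo but costly at log volume; a real pipeline would batch documents and rely on periodic commits.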
DEMO
Search | Discover | Analyze
• Contacts
– Will Hayes, Chief Product Officer, [email protected], twitter: @iamwillhayes
– Ravi Krishnamurthy, Director of Solutions, [email protected]
• Links
– http://www.lucidworks.com/silk
Q & A