The Enterprise Search Market in a Nutshell

1 The Enterprise Search Market in a Nutshell Iain Fletcher [email protected] October 19, 2015 ICIC 2015, Nice

Transcript of The Enterprise Search Market in a Nutshell

Page 1: The Enterprise Search Market in a Nutshell

The Enterprise Search Market in a Nutshell

Iain Fletcher

[email protected]

October 19, 2015

ICIC 2015, Nice

Page 2: The Enterprise Search Market in a Nutshell

Agenda

• About Search Technologies (30 seconds)

• The enterprise search market

• Likely future architectures for supporting important search applications

Page 3: The Enterprise Search Market in a Nutshell

Search Technologies: Background

• Founded 2005

• 180 employees

• 600+ customers

• Independent consulting company

• Focus on enterprise search

• Working with all leading platforms

Locations: San Diego, London UK, San Jose CR, Cincinnati, San Francisco, Washington (HQ), Frankfurt DE, Prague CZ

Page 5: The Enterprise Search Market in a Nutshell

The Enterprise Search Market

Page 6: The Enterprise Search Market in a Nutshell

High-level Search Engine Classifications

1. Part of a portfolio; many are recently acquired technologies

– E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead, Oracle/Endeca

2. Stand-alone specialists, often deployed to address specific apps or challenges

– E.g. GSA (Google Search Appliance), Coveo, Attivio, Sinequa, Recommind

3. Open source, with or without support or proprietary add-ons

– Raw: Lucene, Solr, Elasticsearch

– With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK

4. Cloud-based services, typically based on open source technology

– E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)

Page 7: The Enterprise Search Market in a Nutshell

Market Observations

The dominant market share is currently with SharePoint, open source, and the GSA

• SharePoint 2013 search is credible, and bundled

– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise

• Solr and Elasticsearch are robust and reliable

– Thanks to very widespread deployment

• The Google brand sells, and a lot of GSAs have been shipped during the past few years

Page 8: The Enterprise Search Market in a Nutshell

Functional Observations

• Core indexing / searching is generally fast and reliable

– Search is a maturing / converging technology

• Key differences remain in peripheral functionality, such as content processing prior to indexing, and query processing

– Coveo, Attivio, Sinequa etc. have well-developed indexing pipelines, UI tools, and a range of data connectors

– SharePoint and GSA are delivered with limited content processing functionality and limited connectivity

– Solr, Elasticsearch, AWS CloudSearch and Azure Search don’t provide a formal indexing pipeline, UI, or connectors

Page 9: The Enterprise Search Market in a Nutshell

Further Observations

• The search engines with less focus on peripheral issues such as content processing and connectivity have dominant market share

• Connectivity is often challenging, especially when combined with continual data growth and document-level security requirements

• The movement of data sets to the cloud adds further complexity for enterprise search systems

– Hybrid indexing environments will be with us for some years

– Some content sets in the cloud, some behind the firewall
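Document-level security, mentioned above, is commonly handled by storing an access-control list with each indexed document and filtering results against the searching user's group memberships. The Python sketch below is a minimal illustration under that assumption. The field names, group names, and the `search` function are hypothetical, and no particular search engine's API is implied.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class IndexedDoc:
    """An index entry that carries its own access-control list (hypothetical schema)."""
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str] = field(default_factory=frozenset)


def search(index: List[IndexedDoc], term: str, user_groups: FrozenSet[str]) -> List[str]:
    """Return IDs of documents that match the term AND that the user may see."""
    term = term.lower()
    return [
        doc.doc_id
        for doc in index
        # A document is visible only if the user shares at least one group with it.
        if term in doc.text.lower() and doc.allowed_groups & user_groups
    ]


if __name__ == "__main__":
    index = [
        IndexedDoc("hr-1", "Salary review 2015", frozenset({"hr"})),
        IndexedDoc("pub-1", "Salary bands overview", frozenset({"hr", "staff"})),
    ]
    print(search(index, "salary", frozenset({"staff"})))
    print(search(index, "salary", frozenset({"hr"})))
```

Keeping the access-control list in the index is what makes connectivity harder: the connector has to extract permissions as well as content, and keep both current as the repository changes.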

Page 10: The Enterprise Search Market in a Nutshell

Great Search requires Attention to Detail

E.g. in content processing prior to indexing:

• Normalization

– Names, dates, synonyms…

• Entity identification and resolution

• Categorization

• Document vector extraction

• Document splitting and concatenation

• Link & popularity analysis

• Dupe & near-dupe detection

[Diagram labels: Index; security; category; metadata]
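As a rough illustration of two of the steps listed on this slide, the following Python sketch normalizes dates and synonyms, then flags near-duplicates by Jaccard similarity over word shingles. The synonym table, the DD/MM/YYYY date format, the shingle size, and the 0.8 threshold are illustrative assumptions, not details taken from the talk or from any particular search platform.

```python
import re
from datetime import datetime

# Hypothetical synonym table; a real deployment would load a curated list.
SYNONYMS = {"ms": "microsoft", "intl": "international"}


def normalize(text: str) -> str:
    """Lowercase, rewrite DD/MM/YYYY dates as ISO 8601, and map synonyms."""
    def iso_date(match: re.Match) -> str:
        try:
            return datetime.strptime(match.group(0), "%d/%m/%Y").date().isoformat()
        except ValueError:
            return match.group(0)  # leave impossible dates untouched

    text = re.sub(r"\b\d{2}/\d{2}/\d{4}\b", iso_date, text.lower())
    tokens = re.findall(r"[a-z0-9\-]+", text)
    return " ".join(SYNONYMS.get(tok, tok) for tok in tokens)


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Compare two documents after normalization, so trivial variants collapse."""
    return jaccard(shingles(normalize(doc_a)), shingles(normalize(doc_b))) >= threshold
```

Normalizing before comparing is the design point here: two copies of a document that differ only in case, date format, or a known synonym produce identical shingle sets.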

Page 11: The Enterprise Search Market in a Nutshell

Future Directions for Search

So what will search architectures look like in the future?

Important influences:

• The business need for organizational and analytical agility

• The convergence of search and (“big data”) analytics

• Continual growth in data volumes, and evolution in repository / storage fashions

Page 12: The Enterprise Search Market in a Nutshell

Converging Architectures

Let’s take a brief look at:

1. The “Big Data Architecture”, as evangelized by IBM, Cloudera, etc.

2. Recent Search Architectures

Background Info

Page 13: The Enterprise Search Market in a Nutshell

The Big Data Architecture

Designed for Structured Data

Page 14: The Enterprise Search Market in a Nutshell

The Traditional Search Architecture

Designed for Unstructured Content

[Diagram: content sources (Employee Directory, CMS, File Share, etc.) and an integrated search engine made up of Connectors, Index Pipeline, Search Index, and UI]

Page 15: The Enterprise Search Market in a Nutshell

The Traditional Search Architecture

Designed for Unstructured Content

[Diagram: content sources (Employee Directory, CMS, File Share, etc.) and an integrated search engine made up of Connectors, Index Pipeline, Search Index, and UI]

• As data volumes grow, re-indexing becomes challenging

• The rate at which content can be acquired from repositories is usually the bottleneck

Page 16: The Enterprise Search Market in a Nutshell

The Traditional Search Architecture

[Diagram: content sources (Employee Directory, CMS, File Share, etc.) and an integrated search engine made up of Connectors, Index Pipeline, Search Index, and UI; labeled RE-INDEX]

• A few documents per second?

• There are only 2.6 million seconds in a month
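The arithmetic behind this slide can be sketched as follows. Only the "few documents per second" rate and the roughly 2.6 million seconds in a 30-day month come from the slide; the corpus size and the specific rates tried below are illustrative assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY   # 2,592,000, i.e. roughly 2.6 million


def docs_per_month(docs_per_second: float) -> int:
    """How many documents can be acquired in a 30-day month at a steady rate."""
    return int(docs_per_second * SECONDS_PER_MONTH)


def reindex_days(corpus_size: int, docs_per_second: float) -> float:
    """Days needed to re-acquire and re-index a whole corpus at a steady rate."""
    return corpus_size / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (1, 3, 10):
        print(f"{rate} docs/s -> {docs_per_month(rate):,} docs per month")
    # Hypothetical corpus of 50 million documents at 3 docs/s.
    print(f"50M docs at 3 docs/s: {reindex_days(50_000_000, 3):.1f} days")
```

At a few documents per second, a month of uninterrupted crawling covers only a few million documents, so a full re-index of a corpus in the tens of millions takes many months.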

Page 17: The Enterprise Search Market in a Nutshell

A Better Search Architecture

• Re-indexing rates greatly improved

• “Touch-time” with repositories can be managed autonomously

[Diagram labels: Content Sources (Employee Directory, CMS, etc.); Connectors; Secure Cache; Content Processing; Index Pipeline; Search Engine; Search Index; RE-INDEX; Iterative Development]
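One way to picture the decoupling on this slide is the minimal Python sketch below. All names (`SecureCache`, `crawl`, `reindex`) are hypothetical, and the in-memory dictionary merely stands in for a persistent, access-controlled store. The point is only that connectors write raw content to a cache once, so the index can be rebuilt from the cache with a revised processing step without touching the source repositories again.

```python
from typing import Callable, Dict, Iterable, Tuple


class SecureCache:
    """In-memory stand-in for the staging cache between connectors and indexing."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def put(self, doc_id: str, raw: str) -> None:
        self._docs[doc_id] = raw

    def items(self) -> Iterable[Tuple[str, str]]:
        return self._docs.items()


def crawl(repository: Iterable[Tuple[str, str]], cache: SecureCache) -> int:
    """Slow step: pull each document from the repository once and cache it."""
    fetched = 0
    for doc_id, raw in repository:
        cache.put(doc_id, raw)
        fetched += 1
    return fetched


def reindex(cache: SecureCache, process: Callable[[str], str]) -> Dict[str, str]:
    """Fast step: rebuild the index from the cache, never from the repository."""
    return {doc_id: process(raw) for doc_id, raw in cache.items()}


if __name__ == "__main__":
    cache = SecureCache()
    crawl([("doc-1", "  Annual Report  "), ("doc-2", "Org Chart")], cache)
    index_v1 = reindex(cache, str.strip)                         # first pipeline
    index_v2 = reindex(cache, lambda raw: raw.strip().lower())   # revised pipeline
    print(index_v1)
    print(index_v2)
```

Because `reindex` reads only from the cache, the content processing step can be changed and rerun as often as needed, which is what makes iterative development practical.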

Page 18: The Enterprise Search Market in a Nutshell

The Future Architecture?

[Diagram labels: Hadoop; Content Sources (Employee Directory, CMS, etc.); Connectors; Secure Cache; Content Processing; Index Pipeline; Search Engine; Search Index; RE-INDEX; Iterative Development]

• This environment will encourage ever more sophisticated text analytics

• We expect to see much innovation in text analytics during the next few years

• The deliverable is a better and richer search index

Page 19: The Enterprise Search Market in a Nutshell

An Established Architecture

[Diagram labels: Hadoop; Content Sources (Employee Directory, CMS, etc.); Connectors; Secure Cache; Content Processing; Index Pipeline; Search Engine; Search Index; RE-INDEX; Iterative Development]

• Google.com has worked something like this since 2004

Page 20: The Enterprise Search Market in a Nutshell

An Integrated Search/Analytics Architecture

[Diagram labels: Hadoop; Content Sources (CMS, File system, etc.); Connectors; Data Sources (Data Warehouse, Logfiles, etc.); ETL; Secure Cache; Content Processing; Iterative Development; Rapid Indexing; Search App. (twice); Analysis App. (twice)]

• Encourages agile exploitation of data and content resources

Page 21: The Enterprise Search Market in a Nutshell

Summary 1

• Search and Big Data applications are tending towards the same architecture

• Autonomous connectivity and content processing simplify and de-risk, if you can get it right

• The foundation of great search is still a clean, rich and detailed index

• The “search index” itself is a mature technology, almost a commodity

• Much of the innovation during the next few years will be in text analytics, and other methods of preparing content prior to indexing

Page 22: The Enterprise Search Market in a Nutshell

The compulsory analyst quote….

And finally….

“Enterprise Search Can Bring Big Data Within Reach”

• Multiple, purpose-built indexes that are derived from enriched content are necessary.

http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/

* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog

Page 23: The Enterprise Search Market in a Nutshell

The Enterprise Search Market in a Nutshell

Iain Fletcher

[email protected]

October 20, 2015

Questions?

Page 24: The Enterprise Search Market in a Nutshell

Spare Slides

Page 25: The Enterprise Search Market in a Nutshell

Reference Architecture

[Diagram labels: Content sources; Connectors; Staging Repository; Big Data Framework; Content Processing Pipelines; Semantics; Text Mining; Quality Metrics; Indexes (shown twice); Search Engine; Query parsing; Web Browser]

Page 26: The Enterprise Search Market in a Nutshell

Where is the Focus?

• The Business View

• The Implementation View

[Diagram labels, shown once for each view: Application; Content Capture & Preparation; Data Store / Index]