Enterprise Search - Introduction
![Page 1: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/1.jpg)
Enterprise Search
8/12/2011 – Damien Dewitte
![Page 2: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/2.jpg)
2.
Enterprise Search
Setting the scene
Damien Dewitte
Lead ECM consultant
![Page 3: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/3.jpg)
3.
Contents
- The enterprise search promise
- Some thoughts on search scenarios
- Make your content “findable”
- Search: How it works
- The enterprise search market
![Page 4: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/4.jpg)
4.
![Page 5: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/5.jpg)
5.
While on the Intranet …
![Page 6: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/6.jpg)
6.
the Enterprise Search promise
![Page 7: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/7.jpg)
7.
The Enterprise Search Promise
IDC 2001: “The High Cost of Not Finding Information”
- Cost =
  - Poor decisions based on faulty or poor information
  - Duplicated efforts within different divisions/projects
  - Lost sales due to customers’ inability to find products and services
  - Lost productivity due to employees’ inability to find information
![Page 8: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/8.jpg)
8.
The Enterprise Search Promise
Google (2008)
![Page 9: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/9.jpg)
9.
The Enterprise Search Promise
![Page 10: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/10.jpg)
10.
The Invisible Intranet
Using Search on an Intranet usually leaves a huge portion of existing valuable information ‘invisible’, because
- Some information silos are not indexed:
  - Databases with structured content
  - External sources
  - Isolated departmental content repositories
  - Individual desktops
  - Content applications ‘in the cloud’
  - Digital Archives
- Some information is “over-secured”
- Some information is trapped in proprietary file formats, which cannot be indexed
- Some information cannot be extracted as text
  - Rich Media files (Audio, Video)
  - Badly scanned documents
![Page 11: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/11.jpg)
11.
The Enterprise Search Promise
![Page 12: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/12.jpg)
12.
The Enterprise Search Promise
[Diagram: content sources labelled structured, unstructured and real-time feed a single Enterprise Search Platform, which serves several search applications.]
Content sources:
- RDBMS (JDBC, ODBC, SQLNet, DW, DM)
- Applications (e.g. ERM, CRM, Help Desk)
- Legacy Data (e.g. ISAM, VSAM, IMS)
- Message Queues (e.g. TIBCO, MQ-Series)
- DMS (e.g. M’Soft CMS, Documentum)
- eMail Systems (e.g. Notes, Exchange)
- Files (e.g. Word, Excel, pdf, images, mp3)
- Portals (e.g. WebSphere, WebLogic)
- WWW (HTML, XML, WML, JavaScript)
- Private Webs (e.g. news feeds, Intranets)
- Direct Push
Search applications: Site search, Mail search, BI search, DMS search, Corporate search, eCommerce search, …
![Page 13: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/13.jpg)
13.
The Enterprise Search Promise
“There’s no reason to expect that search is going to get that much better. The basic algorithms by which search is done have not improved much since about 1975.
The only way to improve the situation is by enhancing search engines with more deterministic metadata.
If you look at the victory of Google, it wasn’t because they had better search techniques. It’s because they deployed one key metadata value – how many pages are linked to this one – to enhance the relevancy of their results.
The same concepts need to be applied to the enterprise.”
(Tim Bray)
![Page 14: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/14.jpg)
14.
Some thoughts on search scenarios
![Page 15: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/15.jpg)
15.
Enterprise versus web search
| | Web | Enterprise |
|---|---|---|
| Content | Mainly HTML and PDF | All formats and sources, including databases and legacy systems |
| Security | Focus on system security | Also restricting user access to specific content |
| Updates | Via (scheduled) crawling | Push updates to the index (near real time) |
| Volume | On average: 1000 files | Potentially: > 1.000.000 “records” |
| Metadata management | Centrally in e.g. Web CMS | Consolidate metadata from various source systems |
| Relevance | Popularity via hyperlinks | Popularity via “social” instruments? |
![Page 16: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/16.jpg)
16.
Enterprise versus web search
Probably the cheapest website search you can find
![Page 17: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/17.jpg)
17.
Structured versus unstructured
Start by filtering
Start by typing
![Page 18: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/18.jpg)
18.
Search versus research
“Meeting minutes social collaboration project”
“Amplexor proposal for Intranet”
“Timesheets april 2009”
“Ecm and Green IT in Europe”
“Does ECM have impact on governmental decisions in Spain?”
“I know you’re out there..”
“Life is like a box of chocolates, … You never know what you gonna get”
“average time spent on searching for content”
![Page 19: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/19.jpg)
19.
Search versus research
Search based on
- Information Type (Meeting minutes, Proposal, Invoice, Timesheet, …)
- Document Format (PDF, DOC, PPT, e-mail, …)
- Organisational Source
  - Projects
  - Products
  - Processes
    - HR
    - Compliance
    - Marketing
    - IT
    - …
  - …
- Publication Date, Modification date
- Author
“Meeting minutes social collaboration project”
Search queries are more or less predictable (after analysis)
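A minimal sketch of what "search based on" metadata can look like: documents carry a few of the fields listed above and a query is just a set of exact-match filters. The `Doc` class, its field names and the sample documents are assumptions made for illustration; they do not come from any product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    title: str
    info_type: str      # e.g. "Meeting minutes", "Proposal"
    doc_format: str     # e.g. "PDF", "DOC"
    source: str         # organisational source, here a project name
    author: str
    published: date


def search(docs, **filters):
    """Return the documents whose metadata equals every given filter value."""
    return [d for d in docs
            if all(getattr(d, field) == value for field, value in filters.items())]


# Hypothetical sample content.
docs = [
    Doc("Kick-off minutes", "Meeting minutes", "DOC", "Social collaboration", "An", date(2011, 3, 1)),
    Doc("Intranet proposal", "Proposal", "PDF", "Intranet", "Bert", date(2011, 5, 9)),
    Doc("Steering minutes", "Meeting minutes", "PDF", "Intranet", "An", date(2011, 6, 2)),
]

hits = search(docs, info_type="Meeting minutes", source="Social collaboration")
print([d.title for d in hits])
```

Because the queries are predictable, the filters can be fixed in advance; that is what makes this kind of search easy to serve with a filter-first interface.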
![Page 20: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/20.jpg)
20.
Search versus research
Research based on
- Entities:
  - People
  - Geographical locations
  - Companies & Brands
  - …
- Source: Internal or External
- Publication Date Range
- Natural language search
“Does ECM have impact on governmental decisions in Spain?”
Search queries are unpredictable. The system should be “taught” how to interpret a query (natural language search, entity extraction from content, …).
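One simple way to "teach" a system about entities is a lookup list (gazetteer). The sketch below is only an illustration under that assumption: the `GAZETTEER` contents are hypothetical, and real entity extraction usually relies on linguistic models rather than a fixed list.

```python
import re

# Hypothetical gazetteer: entity type -> known names (lower-case).
GAZETTEER = {
    "location": {"spain", "europe"},
    "company": {"amplexor"},
}


def extract_entities(text):
    """Return {entity_type: [names found]} using whole-word, case-insensitive matching."""
    lowered = text.lower()
    found = {}
    for entity_type, names in GAZETTEER.items():
        matches = sorted(n for n in names
                         if re.search(r"\b" + re.escape(n) + r"\b", lowered))
        if matches:
            found[entity_type] = matches
    return found


print(extract_entities("Does ECM have impact on governmental decisions in Spain?"))
```

The same function can run over content at indexing time and over the query at search time, so that both sides are tagged with the same entities.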
![Page 21: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/21.jpg)
21.
Metadata
What is metadata?
- Information about the information:
  - Descriptive
  - Structural
  - Administrative
Types of metadata:
- Implicit (e.g. creation date, publication date, URL, filename, file format, source system, …)
- Explicit (e.g. owner, topic, summary, expiry date, status, …)
Guiding metadata input with:
- Taxonomies
- Folksonomies
- Ontologies
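Implicit metadata can be collected without asking anyone to type it in. As a hedged sketch, the function below reads a few such values from the file system; the field names are assumptions, and the modification time is used because a true creation date is not available portably.

```python
import os
import tempfile
from datetime import datetime, timezone


def implicit_metadata(path):
    """Collect metadata that can be read from the file system without user input."""
    stat = os.stat(path)
    name = os.path.basename(path)
    return {
        "filename": name,
        "file_format": os.path.splitext(name)[1].lstrip(".").lower(),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }


# Demo on a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    demo = os.path.join(folder, "Proposal.PDF")
    with open(demo, "w") as handle:
        handle.write("draft")
    meta = implicit_metadata(demo)
    print(meta)
```

Explicit metadata (owner, topic, status, …) cannot be derived this way and still needs a person or a business rule.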
![Page 22: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/22.jpg)
22.
Taxonomies
![Page 23: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/23.jpg)
23.
Folksonomies
http://taggalaxy.de
![Page 24: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/24.jpg)
24.
Ontologies
Taxonomies, representing knowledge as a set of concepts within a domain, and the relationships between those concepts
http://en.wikipedia.org/wiki/Geopolitical_ontology
![Page 25: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/25.jpg)
25.
Metadata
Statement 1: “A performant Enterprise Search Engine should not require information workers to add metadata. It should just crawl all my information sources”
But:
- Will users understand the results displayed? (title, author, …)
- How will they filter results?
- Does it really help to crawl 1.000.000 records if 900.000 have become irrelevant over time?
![Page 26: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/26.jpg)
26.
Metadata
Statement 2: “Google doesn’t need metadata”
Are you sure?
![Page 27: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/27.jpg)
27.
Metadata
So you think Google doesn’t need metadata?
![Page 28: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/28.jpg)
28.
Simple example of the semantic web
![Page 29: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/29.jpg)
29.
Metadata
Statement 3: Adding metadata is so time consuming my information workers will never do it.
Yes, but:
- In a structured ECM approach, much of the metadata input can be automated, because it can be deduced from business rules
- If you’re not 100% sure you will need a metadata field for a specific purpose, then don’t create it.
- Convince users of the value of the metadata fields which remain
- Make it user-friendly for content contributors to add metadata
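To illustrate deducing metadata from business rules, here is a small sketch in which the storage location and file name of a document determine some of its metadata. The rules, paths and field names are all hypothetical; a real ECM system would express such rules in its own configuration.

```python
# Hypothetical rules: each maps a condition on the storage path to metadata values.
RULES = [
    (lambda path: "/hr/" in path, {"process": "HR", "confidential": True}),
    (lambda path: "/projects/" in path, {"organisational_source": "Projects"}),
    (lambda path: path.endswith("-minutes.doc"), {"information_type": "Meeting minutes"}),
]


def deduce_metadata(path):
    """Apply every matching rule; later rules overwrite earlier ones on conflict."""
    metadata = {}
    for condition, values in RULES:
        if condition(path):
            metadata.update(values)
    return metadata


print(deduce_metadata("/projects/intranet/2011-05-kickoff-minutes.doc"))
```

Every field filled in by a rule is one field less for the content contributor to enter.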
![Page 30: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/30.jpg)
30.
Metadata
Avoid defining metadata around the document, if it should already be present IN the document.
![Page 31: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/31.jpg)
31.
Make content findable
![Page 32: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/32.jpg)
32.
Findability
Findability is not obtained just by implementing search technology
AIIM.org: “Information Organization and Access (IOA) refers to a collection of technologies to help you organize and find information”, which includes:
- enterprise search
- content classification
- categorization and clustering
- fact and entity extraction
- taxonomy creation and management
- information presentation (i.e., visualization)
- information governance
![Page 33: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/33.jpg)
33.
Findability Tips & Tricks
The more value content has, the more effort should be spent in managing it (and making it findable)
![Page 34: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/34.jpg)
34.
Findability Tips & Tricks
One search interface doesn’t solve it all. Keep in mind that
- Specific content sources or Lines of Business might require specialized search screens
![Page 35: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/35.jpg)
35.
Findability Tips & Tricks
Define specific search scopes, if your information governance permits …
![Page 36: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/36.jpg)
36.
Findability Tips & Tricks
Landing Pages are still “in”!
- Projects Overview Page
- Knowledge base page (links to knowledge bases)
- Practical Guide (categorized hyperlinks to practical information)
- Tools
- Forms
- Filtered listings (e.g. automatic listing of all FAQ content types)
![Page 37: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/37.jpg)
37.
How search works
![Page 38: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/38.jpg)
38.
How it works
Architecture
[Diagram: content sources (web content; files, documents; databases; custom applications; multimedia) → connectors (web crawler, file traverser, database connector, content push) → document processing pipeline → index files → search with query and result processing pipelines and a filter → query, results and alert → front ends (vertical applications, portals, custom front-ends, mobile devices). Tuning and administration span the whole platform.]
![Page 39: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/39.jpg)
39.
How it works
Connect to content sources and get data
- Web pages (e.g. XML, HTML, WML): Crawler
- Files, documents (e.g. Word, Excel, pdf): File traverser
- Database content (e.g. Oracle, DB2): Database connectors
- Applications (e.g. Sharepoint, Documentum, Exchange, CMS/DMS): Application connectors
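As an illustration of the simplest connector type, a file traverser can be sketched with the standard library. This is an assumption-laden toy: `INDEXABLE` and `traverse` are made-up names, and only plain-text formats are read, whereas a real connector would also convert Word, Excel or PDF files to text.

```python
import os
import tempfile

INDEXABLE = {".txt", ".html", ".xml"}  # assumption: formats this sketch can read as text


def traverse(root):
    """Yield (path, text) for every indexable file below root."""
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() in INDEXABLE:
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as handle:
                    yield path, handle.read()


# Demo on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    for name, body in [("a.txt", "meeting minutes"), ("b.bin", "skip me")]:
        with open(os.path.join(root, name), "w", encoding="utf-8") as handle:
            handle.write(body)
    found = [(os.path.basename(p), text) for p, text in traverse(root)]
    print(found)
```

Crawlers and database connectors follow the same pattern: enumerate items in the source, fetch each one, and hand the text plus metadata to the processing pipeline.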
![Page 40: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/40.jpg)
40.
How it works
Analyze and index content to make it searchable
- Convert and process content through pre-processing pipeline:
  - Lemmatization/stemming, entity extraction, taxonomy classification
  - Custom logic (e.g. adding special tags)
- Write content to index files
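The indexing step can be sketched as a tiny pipeline: tokenize, reduce words to a stem, and write the result to an inverted index. The `stem` function below is a deliberately crude stand-in for real lemmatization/stemming, and the in-memory dictionary stands in for index files; both are assumptions for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Crude suffix stripping: drop a plural "s", then "ing", if enough letters remain."""
    for suffix in ("s", "ing"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[: -len(suffix)]
    return token


def build_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[stem(token)].add(doc_id)
    return index


docs = {1: "Meeting minutes", 2: "Timesheets April 2009", 3: "Minutes of the meetings"}
index = build_index(docs)
print(sorted(index["meet"]), sorted(index["minute"]))
```

Thanks to stemming, "meeting" and "meetings" end up under the same index entry, so a query for either form finds both documents.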
![Page 41: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/41.jpg)
41.
Search Engine: How It Works
Analyze query
- Use query language or query API
- Convert and process query through query pipeline:
  - Linguistic processing
  - Custom logic (e.g. query term modification/addition)
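A query pipeline can be sketched as two stages: a linguistic stage that normalizes the query and a custom-logic stage that adds terms. The stop-word list and the synonym table below are hypothetical examples, not part of any engine.

```python
import re

# Hypothetical tables used by this sketch.
STOPWORDS = {"the", "of", "for", "on", "in", "a"}
SYNONYMS = {"ecm": ["enterprise content management"], "minutes": ["notes"]}


def normalize(query):
    """Linguistic stage: lower-case, tokenize, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in STOPWORDS]


def expand(terms):
    """Custom-logic stage: append synonyms after the original terms."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def process_query(query):
    """Run the query through both stages in order."""
    return expand(normalize(query))


print(process_query("Minutes of the ECM project"))
```

The query should go through the same linguistic processing as the content did at indexing time, otherwise the terms will not match the index entries.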
![Page 42: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/42.jpg)
42.
How it works
Match query to content index
- Query- and content-adaptive matching
- Exploit all information and structure in the data
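The slide does not say how matching is scored. As one common textbook approach, swapped in here purely for illustration, the sketch below ranks documents by TF-IDF: a term counts for more when it is rare across the collection. The sample documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample content.
docs = {
    "d1": "meeting minutes social collaboration project",
    "d2": "amplexor proposal for intranet project",
    "d3": "timesheets april 2009",
}
term_counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}


def idf(term):
    """Rarer terms get a higher weight; terms in no document get 0."""
    containing = sum(1 for counts in term_counts.values() if term in counts)
    return math.log(len(docs) / containing) if containing else 0.0


def score(query):
    """Return (doc_id, score) pairs with score > 0, best first."""
    terms = query.lower().split()
    scores = {doc_id: sum(counts[t] * idf(t) for t in terms)
              for doc_id, counts in term_counts.items()}
    return sorted(((d, s) for d, s in scores.items() if s > 0),
                  key=lambda pair: pair[1], reverse=True)


ranking = score("intranet project")
print([doc_id for doc_id, _ in ranking])
```

"intranet" occurs in one document and "project" in two, so the document containing both terms ranks first and the one with only the common term second.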
![Page 43: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/43.jpg)
43.
How it works
Return results to user
- Convert and process results through result pipeline:
  - Re-sort, filter for security, organize for dynamic drilldown
- Pass results on to application (generated or through API)
- Push results to alert engine and then external environment (e.g. mail, queue)
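The result pipeline steps named above can be sketched together: remove hits the user may not see, re-sort by score, and count values for a drill-down facet. The hit structure, group names and scores are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical hits: relevance score plus the groups allowed to read each document.
hits = [
    {"title": "HR salary grid", "score": 0.9, "type": "Policy", "allowed": {"hr"}},
    {"title": "Intranet proposal", "score": 0.7, "type": "Proposal", "allowed": {"staff", "hr"}},
    {"title": "Kick-off minutes", "score": 0.8, "type": "Meeting minutes", "allowed": {"staff"}},
]


def process_results(hits, user_groups):
    """Security-trim, re-sort by score, and count values for a drill-down facet."""
    visible = [h for h in hits if h["allowed"] & user_groups]
    visible.sort(key=lambda h: h["score"], reverse=True)
    facet = Counter(h["type"] for h in visible)
    return visible, facet


results, facet = process_results(hits, {"staff"})
print([h["title"] for h in results], dict(facet))
```

Filtering before counting matters: facet counts computed on untrimmed results would reveal that restricted documents exist.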
![Page 44: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/44.jpg)
44.
Mediafin
![Page 45: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/45.jpg)
45.
How it works
Federated Search: Relies on the indexes and the relevance algorithms of the underlying search engines
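Because each underlying engine ranks with its own algorithm, their scores cannot be compared directly. One simple merging strategy, shown here only as a sketch, is to keep each engine's own order and interleave the lists round-robin. The two engine functions and their document ids are hypothetical stand-ins for real search back ends.

```python
from itertools import zip_longest


def federated_search(query, engines):
    """Send the query to every engine and interleave their ranked lists.

    Each engine's own order is kept; duplicates are dropped on second sight.
    """
    ranked_lists = [engine(query) for engine in engines]
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for doc in tier:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical engines: each returns document ids in its own relevance order.
def dms_engine(query):
    return ["dms:proposal", "dms:minutes"]


def mail_engine(query):
    return ["mail:re-proposal", "dms:proposal", "mail:agenda"]


merged = federated_search("proposal", [dms_engine, mail_engine])
print(merged)
```

The trade-off is visible in the code: no central index has to be built, but the merged ranking is only as good as the individual engines and the interleaving rule.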
![Page 46: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/46.jpg)
46.
the Enterprise Search market
![Page 47: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/47.jpg)
47.
The Enterprise Search Market
What’s the vendors’ focus?
- Business Intelligence
- Text-mining (linguistic support!)
- E-Commerce
- Image/Video: Visual Information retrieval
- Audio/Video: speech recognition
- eDiscovery
- …
![Page 48: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/48.jpg)
48.
The Enterprise Search Market
Enterprise search products can be:
- Specialized: products that use search to address a need in a specific area like customer service or to supplement business intelligence platforms
- Integrated: products that merge search capabilities with other information management functions like content management, collaboration or analytics; the goal of these products is to become deeply ingrained in the technology portfolio so that the use of the tool becomes a ubiquitous part of the information workplace
- Detached: products like Google’s appliance focused on ease of deployment and flexibility
![Page 49: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/49.jpg)
49.
The Enterprise Search Market
Forrester (September 2011) evaluated twelve vendors/products in its Market Overview (not including open source):
- Autonomy IDOL 7 (acquired by HP)
- Attivio AIE 1.3
- Coveo Platform 6.5
- Endeca Latitude 2 (acquired by Oracle)
- Exalead CloudView 5.1
- Fabasoft Mindbreeze 5.0
- Google Search Appliance 6.8
- IBM Content Analytics with Enterprise Search 2.2
- ISYS Enterprise Server v9.7
- Microsoft FAST Search for SharePoint Server 2010
- Sinequa ES 7
- Vivisimo Velocity 8.0
![Page 50: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/50.jpg)
50.
The Enterprise Search Market
Important Trends
- Social and collaborative features
- Mobile support
- Audio/Video
- Cloud
- Spatial support
- Semantics/text analytics
- Search Based Applications (“SBA”)
![Page 51: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/51.jpg)
51.
Wrap up
Search Technology platforms are mature and are available on the market in abundance and multiple flavors.
But make sure you are:
Cost-effective (what’s the business case? Priorities?)
Consistent in Content classification and Governance
Continuously monitoring usage and improving relevance
Clever & Pragmatic
Creative (User interface, multi-device)
![Page 52: Enterprise Search - Introduction](https://reader034.fdocuments.in/reader034/viewer/2022050905/54c8c1004a79598c568b459b/html5/thumbnails/52.jpg)
52.
Thank you!