Federated Search in a Disparate Environment

June 4, 2009

PREPARED FOR: Gilbane San Francisco

Helen L. Mitchell Curtis
Senior Program Director, Enterprise Solutions

8403 Colesville Road, Silver Spring Metro Plaza 2, Suite 400, Silver Spring, MD 20910
301.588.5900
301.588.0390
[email protected] www.macf.com


Biography

Helen L. Mitchell Curtis, Senior Program Director of Enterprise Solutions, Macfadden

• 32+ years at FDA, and led one of the largest enterprise search implementations among Civilian Federal Agencies
• Develop enterprise-wide search strategies & solutions
• Integrate search technologies across IT applications and disparate document repositories
• Build governance, management and end user buy-in
• Promote collaboration, standards, findability and improved organization of data and document assets
• Passion: to help clients reduce costs, improve quality and efficiency, reduce 'pain points' and achieve a positive search experience

About Macfadden

• Founded in 1986 as a small disadvantaged entrepreneurial company; graduated from SBA 8(a) in 1998

• Became 100% employee-owned in 2007, S-Corporation

• Acquired Systems Integration Group, Inc. and Total Security Services International, Inc. (TSSI) in 2008

• 225 employees; projected 2009 annual gross revenues $40 million; $135M in contract backlog; 90% prime contracts; (TSSI sole wholly-owned subsidiary)

CAPABILITIES:
• Enterprise Search Solutions
• Integrated IT Solutions & Security
• Counter Terrorism Planning
• Disaster Response Management
• Threat & Vulnerability Assessment
• Program/Project Management
• Intelligence Gathering & Analysis

FAST X10 Partner

Microsoft Certified Partner - Information Worker Solutions with Search Specialization Competency

Clarify Terms

1. Definition by AIIM Market IQ
2. Definition by CMS Watch
3. A Federated Search Primer - Part II
4. Deep Web Technologies

Findability Issues

• AIIM Market IQ Research on Findability (of 528 end users):
  • 50% believe Findability in their organization is “Worse to Much Worse” than their consumer-facing web sites
  • 49% have no formal goal for Enterprise Findability within their organizations
  • 49% “Agreed or Strongly Agreed” that finding the information to do their job is difficult and time consuming
  • 69% believe less than 50% of their organization's information is searchable online
  • 36% reference five or more systems in any given week

Source: AIIM Market Intelligence, 2008

Why Use Federated Search

1. To increase findability so users can accomplish their business objectives
2. To access multiple content sources through a common search interface
3. To increase user awareness of all content sources
4. To eliminate using multiple database search protocols and passwords
5. To access public or subscription search sites
6. To search the deep web for scientific, technical and business content
7. To reduce search time and display results in a common format

Federated ‘Master Index’ Search

• Index content from multiple data sources into a single master search index
• Queries & results come from that one master index
• Many Enterprise Search products integrate FS via ‘connectors’ to accomplish this (e.g., FAST, Autonomy, Endeca)

Source: New Idea Engineering, Inc.
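A minimal sketch of the master-index idea, not any vendor's implementation: documents from every source are indexed once into a single inverted index, and queries touch only that index. The source names, document IDs and the simple whitespace tokenizer are assumptions made for illustration.

```python
from collections import defaultdict

def build_master_index(sources):
    """Index documents from every source into one inverted index.

    `sources` maps a (hypothetical) source name to {doc_id: text}.
    """
    index = defaultdict(set)
    for source_name, docs in sources.items():
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add((source_name, doc_id))
    return index

def search(index, query):
    """Return documents containing every query term, using only the index."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

# Two hypothetical repositories feeding the single master index.
sources = {
    "file_share": {"f1": "Inspection report for facility", "f2": "Travel policy"},
    "database": {"d7": "Facility inspection schedule"},
}
index = build_master_index(sources)
results = search(index, "facility inspection")
```

Here `results` lists matches from both repositories, even though the query never contacts either one directly.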

Federated ‘Data Silos’ Search

• ‘Search federator’ process queries each data source silo
• Transforms the user's search terms to match each content source's requirements
• Submits the query to each of the sources simultaneously
• Merges each source’s results together into a single look and feel
• Maintains no indices of its own; relies upon the capabilities of all the linked systems

Source: New Idea Engineering, Inc.
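The federator steps above can be sketched roughly as follows. The two silos, their query syntaxes and their canned results are hypothetical stand-ins; a real federator would call each system's own search interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical silos: each expects its own query syntax.
def search_catalog(query):      # expects terms joined with " AND "
    data = {"drug AND label": ["Catalog: Drug label guide"]}
    return data.get(query, [])

def search_archive(query):      # expects terms joined with "+"
    data = {"drug+label": ["Archive: Drug label history"]}
    return data.get(query, [])

# For each silo: (query translator, search function).
SILOS = {
    "catalog": (lambda terms: " AND ".join(terms), search_catalog),
    "archive": (lambda terms: "+".join(terms), search_archive),
}

def federated_search(user_query):
    terms = user_query.split()
    with ThreadPoolExecutor(max_workers=len(SILOS)) as pool:
        # Translate the query for each silo and submit all searches at once.
        futures = {
            name: pool.submit(search_fn, translate(terms))
            for name, (translate, search_fn) in SILOS.items()
        }
        # Merge every silo's results into one common format; no index is kept.
        merged = []
        for name, future in futures.items():
            for title in future.result():
                merged.append({"source": name, "title": title})
    return merged

results = federated_search("drug label")
```

Because the federator waits on every future, the overall response time follows the slowest silo.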

Surface vs. Deep Web Search

Deep Web FS Examples:
• www.completeplanet.com - 70,000+ searchable DBs & specialty search engines
• www.science.gov - federates U.S. federal agency science information
• http://imlsdcc.grainger.uiuc.edu/ - Institute of Museum & Library Services (IMLS) - Digital Collections & Content w/descriptions of digital resources developed by IMLS grantees

Source: Juanico-Environmental Consultants, Ltd.

Vertical Search Engine

• Closely related to Deep Web - searches a particular niche, i.e., a specific industry, topic or type of content (e.g., scientific research, travel, movies, images, blogs)

• Example: www.vetseek.info is a search engine focusing on veterinary science and related topics

Challenges

• Authentication
  • Showing each record’s branding and copyright information
  • Licensed or subscription databases

• True De-duplication
  • Virtually impossible because DBs return 10-20 results at a time
  • Vendors usually de-dupe only the first result set returned

• Security
  • Mapping user credentials and access rights to each repository security model

• Speed
  • Limited by slowest search engine’s performance
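To illustrate the de-duplication challenge above, here is a small sketch of first-page de-duplication. The common result format (`title`, `url`) and the choice of a normalized URL as the duplicate key are assumptions; real products may match on other fields.

```python
def dedupe_first_page(result_pages):
    """Drop duplicates across the first page returned by each source.

    `result_pages` maps a source name to its first page of results.
    Each result is a dict with 'title' and 'url' (an assumed format).
    """
    seen = set()
    unique = []
    for source, page in result_pages.items():
        for item in page:
            # Assumed duplicate key: URL, lowercased, no trailing slash.
            key = item["url"].lower().rstrip("/")
            if key not in seen:
                seen.add(key)
                unique.append({**item, "source": source})
    return unique

# Hypothetical first pages from two sources.
result_pages = {
    "library": [{"title": "Deep Web Primer", "url": "http://example.org/primer/"}],
    "journal": [
        {"title": "Deep Web Primer", "url": "http://EXAMPLE.org/primer"},
        {"title": "Vertical Search", "url": "http://example.org/vertical"},
    ],
}
unique = dedupe_first_page(result_pages)
```

Only the pages already fetched are compared, so a duplicate that a source would return on a later page goes undetected. That is why true de-duplication is so hard.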

Challenges (continued)

• Lack of data standardization
  • Each source has a unique access method & needs translation
  • Metadata mapping between FSS and underlying systems

• Access methods to sources may change
  • Requires an interface rewrite or modification

• Rules for error handling
  • Ex. Query term not available: exclude the query, the repository, or proceed without the term?
  • Ex. Timeouts or connection problem

• Complex searches usually not available
  • Fielded searches
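The error-handling question above (exclude the query, exclude the repository, or proceed without the term) can be made explicit as a configurable rule. This is a sketch under assumed names: `UnsupportedTermPolicy`, `plan_query` and the field names are hypothetical, not part of any product.

```python
from enum import Enum

class UnsupportedTermPolicy(Enum):
    EXCLUDE_QUERY = "exclude_query"            # fail the whole search
    EXCLUDE_REPOSITORY = "exclude_repository"  # skip this source only
    DROP_TERM = "drop_term"                    # search without the term

def plan_query(fields, supported_fields, policy):
    """Return the fields to send to one repository, or None to skip it.

    `fields` is a fielded query such as {"author": "smith"};
    `supported_fields` is the set of fields the repository can search.
    """
    unsupported = set(fields) - set(supported_fields)
    if not unsupported:
        return dict(fields)
    if policy is UnsupportedTermPolicy.EXCLUDE_QUERY:
        raise ValueError(f"unsupported fields: {sorted(unsupported)}")
    if policy is UnsupportedTermPolicy.EXCLUDE_REPOSITORY:
        return None
    # DROP_TERM: proceed with only the fields this repository understands.
    return {k: v for k, v in fields.items() if k in supported_fields}

query = {"author": "smith", "year": "2008"}
dropped = plan_query(query, {"author"}, UnsupportedTermPolicy.DROP_TERM)
skipped = plan_query(query, {"author"}, UnsupportedTermPolicy.EXCLUDE_REPOSITORY)
```

Whichever rule is chosen, applying it consistently across sources keeps result sets explainable to users.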

Challenges (continued)

• Relevancy scores
  • Can’t identify a single relevancy ranking model

• Each repository’s relevancy ranking refers only to its own results
  • May not be useful when comparing the results with those from another system

• Access to content stored in a variety of places

• Results page may not let user obtain identified documents
  • This may involve a built-in viewer or invoking the owning product’s interface.

• Combining navigators from each result set
  • i.e., faceted search, taxonomies and auto-generated clusters

• Selecting the right FS engine
  • Depends on business goals, type of content sources - structured vs. unstructured, licensed/subscriptions
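To make the relevancy-score challenge concrete, the sketch below merges results from two engines that score on different scales. Min-max normalization is one possible workaround assumed here for illustration. It is not a method from the presentation, and the engine names and scores are invented.

```python
def normalize(results):
    """Min-max scale one source's scores into the range 0..1."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - low) / (high - low)) for doc, score in results]

def merge_by_normalized_score(results_by_source):
    """Merge all sources into one list ranked by normalized score."""
    merged = []
    for source, results in results_by_source.items():
        if results:
            merged.extend((source, doc, s) for doc, s in normalize(results))
    return sorted(merged, key=lambda r: r[2], reverse=True)

results_by_source = {
    "engine_a": [("a1", 950.0), ("a2", 400.0)],           # 0-1000 scale
    "engine_b": [("b1", 8.0), ("b2", 5.0), ("b3", 2.0)],  # 0-10 scale
}
ranked = merge_by_normalized_score(results_by_source)
```

Note the limitation: each engine's weakest hit is scaled to 0 regardless of how relevant it actually is, so the merged ranking is only a rough guide.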

Benefits

• Single master index
  • Quicker response times
  • No need to access original data sources
  • Relevancy algorithms applied uniformly
  • Dynamic navigators are available for all documents

• Time savings
  • Searches many sources at one time
  • Combines results into a single results page

• Quality of results
  • Client selects the sources to search

• Minimum impact on the data silos
  • Only accessed when a user performs a query
  • Eliminates the added load of crawling/indexing the data source

Benefits (continued)

• Improve productivity
  • Reduces number of searches executed to find relevant results
  • Save, reuse, schedule, and even share effective search queries

• Leverage security controls at queried source
  • Access repositories that are secured against crawls but can be accessed by search queries

• Reduce costs
  • No additional capacity requirements for content index since it's not crawled by search server

• Most current content
  • As soon as the source is updated, the info is available to the searcher on the very next query

• Increase awareness
  • Identify most relevant sources to search based on # of results each source produced

FDA Case Study Success (Federated ‘Master Index’ Search System)

ACTION: Started small with high ‘pain points’
RESULT: Increased productivity & popularity

ACTION: Modified business processes*
RESULT: Standardized nomenclature increased efficiencies

ACTION: Users across organization could find content in silos
RESULT: Produced more timely and QUALITY work products

ACTION: Indexed structured & unstructured content repositories with document level security
RESULT: Grew from 1 repository of 500 documents to 50 repositories with 30+ million documents & data. Users access based on ‘need to know’.

ACTION: Introduced standardized search web services into applications
RESULT: Decreased development time and costs, increased management & user acceptance, integrated in more applications

ACTION: Increased user awareness through training, newsletters and meetings
RESULT: Used more & content added. Search requirements gathered at BEGINNING of project development.

FSS Example (uses FAST ESP - Vertical Search)

FSS Example (uses MS & Vivisimo)

FSS Example (uses Webfeat)

Best Practices

Future Vision

Future Vision (continued)

Resources

• Great source of info on many Federated Search topics: www.federatedsearchblog.com – Author: Sol Lederman

• List of Open Source & commercial search components & tools: http://www.searchcomponentsonline.com/federated-search-vendors.html

• List of many Deep Web Databases: http://www.noodletools.com/debbie/literacies/information/5locate/advicedepth.html

• Info on the Deep Web: http://www.internettutorials.net/deepweb.asp

• Some Digital Image Resources on the Deep Web: http://www.readwriteweb.com/archives/digital_image_resources_on_the_deep_web.php

• Info on Vertical Search Engines: http://www.altsearchengines.com/category/verticals/

• 50 Niche Search Engines: http://www.accrediteddldegrees.com/2008/50-niche-search-engines-that-will-make-your-everyday-life-easier/

• Library of Congress list of FS Portal Products & Vendors: http://www.loc.gov/catdir/lcpaig/portalproducts.html

• 99 Resources to Research & Mine the Invisible Web: http://www.collegedegree.com/library/college-life/99-resources-to/

References

• “What’s in a Name: Federated Search” - by Miles Kehoe, New Idea Engineering, Inc. - Volume 4 Number 4 - August 2007

• “Federated Search Engine Article” - Online (Weston, Conn.) 28 no2 16-19 Mr/Ap2004 (Reprint of article by Donna Fryer, www.SearchitRight.com)

• “Growing Up With Federated Search” - by Walt Warnick, OSTI

• “Sophisticated Yet Simple - The Technology Behind OSTI's E-print Network: Part 3” - Walt Warnick, OSTI

• “Vertical Search Engines & the Deep Web” - Laura B. Cohen, http://www.internettutorials.net/

• www.federatedsearchblog.com - by Sol Lederman

• “Exploring a ‘Deep Web’ that Google can’t Grasp” - NYT 2-23-09, http://www.nytimes.com/2009/02/23/technology/internet/23search.html?_r=1&ref=business

• “Federated Search Primer, Part I-III” - by Sol Lederman

• www.searchdoneright.com - by Vivisimo - Raoul - CEO & Cofounder

• “Enterprise Search Grows Up” - Podcast from BizTalk

• “Federation: Big Need, Still a Challenge” - Stephen Arnold, 4/25/08

• “The Future of Federated Search or What Will the World Look Like in 10 Years” - Rich Turner

THANK YOU!

Helen L. Mitchell Curtis
Senior Program Director, Enterprise Solutions

[email protected]

240-247-1946 (w)
240-743-7975 (m)

MACFADDEN

Delivering Results. Exceeding Expectations.