II-SDV 2015, 20 - 21 April, in Nice
Transcript of II-SDV 2015, 20 - 21 April, in Nice
1
Search Technologies: Who We Are
The leading independent IT services firm specializing in the design,
implementation, and management of enterprise search and big data
search solutions.
2
Search Technologies: Background
San Diego
London UK
San Jose, CR
Cincinnati
San Francisco
Washington (HQ)
Frankfurt DE
• Founded 2005
• 150+ employees
• 600+ customers worldwide
• Deep enterprise search expertise
• Consistent revenue growth
• Consistent profitability
3
Search Engine and Big Data Expertise
Our Technology and Integration Partners
4
600+ Customers
5
Search Technologies: What We Do
• All aspects of search application implementation
– Content access and processing, search system architecture, configuration, deployment
– Accuracy analysis, metrics, engine scoring, relevancy ranking, query enhancement
– User interface, analytics, visualization
• Technology assets to support implementation
– Aspire high performance content processing
– Content Connectors (Document, Jive, SharePoint, Salesforce, Box.com, etc.)
• Engagement models
– Most projects start with an “assessment”
– Fully project-managed solutions, designed, delivered, and supported
– Experts for hire, supporting in-house teams or as a subcontractor
6
Search Technologies: Expertise by Role
Role | Responsibilities
Project Manager | Ensures project is on time and within budget.
Architect | Designs overall solution architecture.
Requirements Analyst | Documents requirements and solution goals.
Engineer | Hands-on software development and configuration.
Lead Engineer | Lead developer – understands application from end to end.
Search Quality Analyst | Analyzes search metrics, improves quality of results.
Data Analyst | Analyzes data; defines content processing needs.
Support Engineer | Provides 8x5 or 24x7 support for software and managed services.
7
Q&A
8
Microsoft Search Expertise
• Over 150 SharePoint/FAST customers
• 50 Engineers trained on Microsoft search technologies
• Projects with Elsevier, CPA, Florida Power & Light, Library of Congress,
GPO, Norton Rose, Daily Mail Group, Accenture, Unilever
• Working with all versions and combinations of SharePoint and FAST
(ESP, FSIS, FSIA, Etc.)
• FAST’s Worldwide Partner of the Year back in 2006
– 30,000+ days of implementation experience since…
• Already up-to-speed with SharePoint 2013
9
Google Search Appliance Expertise
• 30 Engineers trained on GSA
• Over 100 Google Search Appliance Customers
• Projects with AARP, Isilon, Petco, SAIC, ESRI, General Electric, Vistage,
Ning, NVIDIA, Hershey, Mayo Clinic and others
• Google recommends us for their most challenging GSA integration
projects
• Focus on integration and implementation issues that GSA does not
handle out-of-the-box
• 3rd Party Connector Development
10
Open Source Expertise
• 40 Engineers Trained on Solr/Lucene or ElasticSearch stack (ELK)
• Projects with Comcast, BBC, U.S. House, MemoryLane, Bloomberg,
Citibank, BusinessLink, Genentech, Qualcomm, YP.com
• Focus on extending document processing and query parsing
frameworks to enable open source to function in complex enterprise
scenarios
• Focus on extreme scale and performance scenarios – YP.com, Comcast
11
Big Data Expertise
• Expertise with Big Data technologies
– NoSQL – Hadoop, Cloudera
– Data Mining and Machine Learning – Mahout
– Distributed databases – Cassandra / Datastax
• Projects in Data Warehouse Search, Fraud Detection, Automated
Candidate Matching
12
Assessment & Delivery
13
Primary Engagement Models
• Provide complete search & big data solutions, from
requirements analysis and design through to
implementation and ongoing support
• Provide expert services to support in-house customer
projects or larger system integrators
14
Delivery Model
Assessment
• Deep dive to evaluate
technical situation and
business objectives
• Document Analysis
• Develop detailed project
plan and schedule
Implementation
• Focus on technical
execution and quality
• Tightly manage objectives
as per Assessment
• Ensure completion to
timeframe and budget
Support
• Knowledge transfer from
Implementation team
ensures smooth hand-off to
Support
– 8x5 or 24x7
– Managed Services
– Hosting
Assessment → Statement of Work → Implementation → Completion → Support
15
Assessment Process
An Assessment typically involves the following steps, but is always customized to the requirements of the customer.
Phases: DEFINE → REVIEW & ANALYZE → RECOMMEND → REPORT
• Define Business and Technical Objectives
• Review Existing Applications
• Review Data, Environment and User Requirements
• Review Performance Requirements
• Define Future Architecture
• Generate Assessment Report
16
Assessment Document
• Executive Summary
• High Level Requirements
• System Overview
• Detailed Requirements
• New Initiatives
• Proposed Project Plan
• Conclusions and Next Steps
17
Assessment Benefits
• Deep focus on Technical and Business Objectives
• Detailed, documented options and recommendations
• Better visibility into the Project Scope
• Better communications and Project Management
• Opportunity to leverage expertise on your team
18
Project Execution Model
• Projects organized around 3 key personnel
• Agile development methodologies (Sprints and Scrums)
• JIRA and Greenhopper used for issue tracking and project management
Technical Lead
Project Manager
Architect
19
Support & Managed Services
20
Technical Support & Managed Services
• Standard and Premium Support available worldwide
• Application Managed Services available worldwide
• Communication Channels
– Support Online Portal (http://support.searchtechnologies.com)
– Support Phone Line (619 564 4351 option 1)
– Support Email ([email protected])
• Support Time Frames
– 8x5 or 24x7
Severity | Regular Support | Premium Support
Critical | 4 business hours after logging the issue | 2 hours after logging the issue and Call Support
Major | 1 business day after logging the issue | 4 business hours after logging the issue
Minor | 2 business days after logging the issue | 1 business day after logging the issue
Trivial | 1 business week after logging the issue | 2 business days after logging the issue
21
Support Online Portal
http://support.searchtechnologies.com
22
Other Online Resources (Wiki)
http://wiki.searchtechnologies.com
23
Hosting Services
• 10+ hosted customer applications
• 24x7 Technical Support
• Cloud Hosted Services
24
Organization
25
Executive Team
Executive | Enterprise Search Industry Experience
Kamran Khan, President & CEO | 19 years: International Sales, VP Sales, Executive
John Steinhauer, VP Technology | 16 years: Development Management, Project Management, Executive
Pat Booth, Director of Finance | 17 years: Finance, Operations, Executive
Paul Nelson, Chief Architect | 25 years: Development, Innovation, Architecting, Dev. Management
John Back, VP Sales - US | 15 years: Sales, VP Sales
Graham Charlesworth, VP Sales - Europe | 17 years: Business Development, VP Sales, Executive
Dennis Tran, Vice President | 21 years: International Sales, VP Sales
Graham Gillen, VP Marketing | 15 years: VP Marketing, Product Marketing, Analyst & Partner Relations
Iain Fletcher, Director Marketing Europe | 17 years: International Sales, Product Management & Marketing
Years in the Search / IT Industry
26
Organization Chart
Kamran Khan, CEO
Pat Booth, Director of Finance
Joni Morgan, Sr. Bus Analyst
Nathalie Rodriguez, Corp. Accountant
Karen Pramis, Corp. Accountant
Graham Gillen, VP Marketing
Stacy Brooks, Marketing Mgr
Iain Fletcher, Dir. Marketing Europe
Telemarketing Associates
Graham Charlesworth, VP Sales Europe
Graham Jackson, Account Mgr
Bernd Rahmig, DE Acct Mgr
Linda Berry, EU Finance & Admin
John Steinhauer, VP Technology
Phil Lewis, UK Tech Dir (16 Engineers)
Maynor Alvarado, CR Tech Dir (59 Engineers)
Joan Schaech, East PS Mgr (31 Engineers)
Matt Lumsden, West PS Mgr (9 Engineers)
John-Henry Gross, Product Mgr
John Back, VP Sales NA
Mary Jo Houghton, Account Mgr - NE
Jerry Junker, Account Mgr - MW
Joe Abrams, Account Mgr - W
Dennis Tran, Google Accounts
Mimy Indra, Account Mgr - Federal
Jan Seaton, Director HR
Paula Small, Recruiter
Amanda Bolanos, Sr. Admin Asst
Kristin Andrews, Receptionist/AA
Paul Nelson, VP, Chief Architect
27
Engineering Team
• Project Engineering
– Frontline technical consultants working on customer projects
• Project Management
– Global organization to manage customer projects
• Core Engineering
– Building assets and tools used by project teams and customers
• Technical Support and Managed Services
– Supporting software and Applications
• Sales Engineering
– Technical expertise to drive sales
28
Aspire Content Processing, Connectors and QPL
Technology Assets
29
Technology Assets
[Diagram: Content sources, Connectors, Aspire Content Processing Pipelines, Staging Repository, Publishers, Indexes, Search Engine, QPL, Web Browser]
1. Aspire Framework
– High Performance Content Processing
– Ingests and processes content and publishes to a variety of indexes for commercial and open source search engines
2. Aspire Data Connectors
– API level access to content repositories
3. Query Processing Language (QPL)
– Advanced query processing
Complements to commercial and open source search technologies
30
Aspire Content Processing
31
Importance of Content Processing
• Inconsistent and sparse content, especially metadata, is a
leading cause of user dissatisfaction and underperformance
in search applications
• Meticulous preprocessing prior to indexing is a critical, yet
often neglected aspect of search systems
• The original format and structure of the content is typically
optimized for human consumption; content processing
optimizes it for indexing and search
32
Content Processing Supports
[Diagram: Content sources, Connectors, Aspire Content Processing Pipelines, Staging Repository, Publishers, Indexes, Search Engine, QPL, Web Browser]
• Optimum Relevancy & Recall
• Search Navigators
• Content Grouping
• Secure Content Hub For Enterprise Content
• Support for Advanced Analytics
33
Content Processing Stages
[Diagram: Content sources, Connectors, Aspire Content Processing Pipelines, Staging Repository, Publishers, Indexes, Search Engine, QPL, Web Browser]
• Connectors – Secure access to content
• Staging Repository – Fast & secure re-indexing
• Pipelines
– Cleansing, enriching and normalizing prior to indexing
• Publishers – Output to search engine
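The four stages above can be sketched as a small program. This is only an illustration of the flow (connector output, staging copy, pipeline stages, publisher), not Aspire's actual API: the `Document` class, the stage functions, and `run_pipeline` are all hypothetical names, and the staging repository is assumed to be a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Document:
    """A unit of content moving through the pipeline (hypothetical model)."""
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


# A pipeline stage takes a document and returns the processed document.
Stage = Callable[[Document], Document]


def strip_whitespace(doc: Document) -> Document:
    """Cleansing: collapse runs of whitespace in the body text."""
    doc.text = " ".join(doc.text.split())
    return doc


def add_length(doc: Document) -> Document:
    """Enriching: derive a simple metadata field from the content."""
    doc.metadata["length"] = str(len(doc.text))
    return doc


def run_pipeline(source: Iterable[Document],
                 stages: List[Stage],
                 staging: Dict[str, Document],
                 publish: Callable[[Document], None]) -> None:
    """Connector output -> staging repository -> pipeline stages -> publisher."""
    for raw in source:
        # Keep an unprocessed copy so a re-index need not re-crawl the source.
        staging[raw.doc_id] = Document(raw.doc_id, raw.text, dict(raw.metadata))
        doc = raw
        for stage in stages:
            doc = stage(doc)
        publish(doc)
```

Here `source` stands in for a connector and `publish` for a publisher; passing `some_list.append` as `publish` is enough to try it out. Keeping the raw copy in `staging` is what makes re-indexing fast: the stages can be re-run from the staged copies after a pipeline change.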
34
What is Aspire?
• A vendor neutral framework to support high-volume, high-
performance content processing
• A toolkit to create custom components needed to
implement high quality search implementations
• A highly effective and low cost way to prepare data for
indexing by extracting and normalizing metadata, cleansing
and enriching data
• A framework that enables Search Technologies to create
outstanding search experiences for customers
35
Content Processing Examples
• Normalization
– Names, dates, synonyms, spelling
• Entity identification and resolution
• Derive additional metadata from content
• Discover hierarchy metadata
• Categorization
• Document Matching
• Document segmentation and concatenation
• Link analysis
• Duplicate detection
• Security analysis
[Diagram: Index with security, category, and metadata]
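Two of the examples above, date normalization and duplicate detection, can be sketched briefly. This is a minimal illustration under stated assumptions: the accepted date formats are invented for the example, documents are assumed to be dictionaries with a `text` key, and the fingerprint only catches exact duplicates after case and whitespace folding (near-duplicate detection would need a different technique).

```python
import hashlib
from datetime import datetime
from typing import Dict, Iterable, List, Optional

# Assumed input formats; a real collection would need its own list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(value: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def fingerprint(text: str) -> str:
    """Hash of lower-cased, whitespace-collapsed text."""
    canonical = " ".join(text.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drop_duplicates(docs: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first document seen for each fingerprint."""
    seen = set()
    unique = []
    for doc in docs:
        key = fingerprint(doc["text"])
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

Normalizing before indexing means that date filters and sorting behave consistently no matter how each source wrote the date.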
36
Aspire Reference Architecture with Big Data Scaling for Big Data Solutions
[Diagram: Content sources, Connectors, Aspire Content Processing Pipelines, Staging Repository, Publishers, Indexes, Search Engine, QPL, Web Browser; plus a Big Data Framework with a Big Data Array of Aspire instances, and Indexes, Semantics, Text Mining, Quality Metrics]
37
Aspire Benefits
• Vendor neutral framework “future proofs” solutions
• Mature toolkit provides full set of components to create
solutions faster, economically and reliably
• Improved index quality enabled by content processing
• Java based solution supports a wide array of computing
platforms and is scalable
• Workflow and scripting support enables more flexible and
maintainable solutions
38
Customers Using Aspire
• Search Technologies
• ACS / Xerox
• Adecco
• ASCO
• Aspermont
• BASF
• Bayer (POC)
• Blackberry
• Bloomberg/BNA
• Boehringer Ingelheim
• Carson-Dellosa
• CBBB
• Celera Systems
• Chick-Fil-A
• CPA Global
• EMC Corporation
• Evonik (Germany)
• Florida Power & Light
• GE Research
• GFR Media
• Haymarket (PistonHeads)
• Haymarket (HIFI)
• Hershey
• JobSite
• Just Eat
• Labour
• LOC
• Mitre
• NARA
• Deloitte
• Nectar
• NetDocuments
• New York Housing
• OLRC
• OSD/CAPE
• Penske Truck leasing
• Reed Business International
• Rolls Royce
• SCIE
• Seagate
• Shire
• Sony Media
• Sprint
• Thoughtworks
• United Nations
39
Aspire Fundamentals
• An OSGi framework + plug-in components architecture
• Vendor independent
• Intuitive Admin UI
• Rich library of component bundles and components
– Connectors to content sources
– Document processing components
• Parsing, extracting, splitting, joining, metadata mapping, etc.
• Scripting support using Groovy
– Publishers to leading search engines
• Integration with Hadoop
40
Intuitive Modern Administration UI
41
Aspire Community
Licensing & Maintenance
• Free to download and use
• Registration Required
• License Agreement Required
• Maintenance & Support is not available
Packaging
• Framework and Core Components
• Publishers: Solr, CloudSearch and GSA
• Connectors for File system and RDB
• No security
• Javadoc for Programming New Components
• Administration Tool
• Archetypes for quickly creating new components and
distributions
• Access to Aspire Wiki
• Access to the Maven repository
– But for a limited set of components
Aspire Enterprise
Licensing & Maintenance
• Priced per server per month
• Maintenance and Technical Support included
Packaging
• Aspire Community, plus
– All currently available publishers*
– Corporate Site Map
– Enterprise Security
– Distributed Processing
– Connectors: CIFS, Heritrix, Enhanced RDB
– Dynamic Crawler Controls
• Access to Wiki
• Access to the Aspire Maven Repository
– Includes access to all released pipeline components
• Technical Support (via support portal, telephone, and e-mail); 8x5 or 24x7 support available (additional cost)
* Except FAST Content API
42
Connectors
43
Connectors Provide
• API level access to repositories
• Retrieval of:
– Content and metadata
– ACLs for repositories that support security
– Hierarchy information
• Full and incremental crawling
• Multiple modes for crawl scheduling
• Search engine independence
• Ease of install and configuration from a common Admin UI
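The difference between full and incremental crawling can be illustrated with a snapshot comparison. This is a sketch of one common approach, not a description of how Aspire connectors are implemented: it assumes each repository item has an id and a change signature (for example a modification time or content hash), and `diff_crawl` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Item id -> change signature (e.g. modification time or content hash).
Snapshot = Dict[str, str]


def diff_crawl(previous: Snapshot,
               current: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots and return (added, updated, deleted) item ids."""
    added = sorted(i for i in current if i not in previous)
    updated = sorted(i for i in current
                     if i in previous and current[i] != previous[i])
    deleted = sorted(i for i in previous if i not in current)
    return added, updated, deleted
```

A full crawl is the special case where `previous` is empty, so every item is reported as added. An incremental crawl passes the snapshot saved by the last run and sends only the three change lists on to processing.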
44
Connectors
• Aspire Enterprise Connectors
– File (CIFS)
– RDB
– Heritrix
• Premium Connectors
– SharePoint 2010
– SharePoint 2013
– Lotus Notes
– Amazon S3
– Confluence
– Documentum
– EMC eRoom
– Socialcast
– IBM Connections
– Salesforce.com
– TeamForge
– Oracle RightNow
– Jive
45
QPL – Query Processing Language
46
We Expect Help With Queries
47
What is Query Processing?
• Analyzing the content of a query to determine the user's intent and
optimizing it for the search engine
• Examples:
– Term consolidation: red wine → “red wine”
– Term expansion: FSA → FSA OR “Financial Services Authority”
– Semantic expansion: Gun → Gun OR Rifle OR Pistol OR Firearm
– Geographic: Near Buffalo NY → &q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
– Normalization: Bill Smith → William Smith
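The consolidation, expansion, and normalization examples above can be sketched as a small rule-based rewriter. This is not QPL itself: the rule tables and the `process_query` function are hypothetical, the rules are hard-coded only for illustration, and expansions are wrapped in parentheses (an addition to the slide's notation) so they stay grouped inside longer queries.

```python
import re
from typing import Dict, List

# Hypothetical rule tables; a real deployment would load these from configuration.
PHRASES: List[str] = ["red wine"]
EXPANSIONS: Dict[str, List[str]] = {
    "fsa": ["FSA", '"Financial Services Authority"'],
    "gun": ["Gun", "Rifle", "Pistol", "Firearm"],
}
NAME_VARIANTS: Dict[str, str] = {"bill": "William"}


def process_query(query: str) -> str:
    """Apply consolidation, then normalization and expansion, token by token."""
    text = query.strip()
    # Term consolidation: wrap known multi-word terms in quotes.
    for phrase in PHRASES:
        pattern = rf"\b{re.escape(phrase)}\b"
        text = re.sub(pattern, lambda m: f'"{m.group(0)}"', text,
                      flags=re.IGNORECASE)
    out = []
    # Split on whitespace but keep quoted phrases together as one token.
    for token in re.findall(r'"[^"]*"|\S+', text):
        key = token.lower()
        if key in NAME_VARIANTS:
            out.append(NAME_VARIANTS[key])        # normalization
        elif key in EXPANSIONS:
            out.append("(" + " OR ".join(EXPANSIONS[key]) + ")")  # expansion
        else:
            out.append(token)
    return " ".join(out)
```

Keeping the rules in tables rather than in user-interface code reflects the point of the slide: the rewriting logic can be maintained in one place, independent of the search engine that receives the final query. The sketch does not handle queries that already contain quoted phrases.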
48
Benefit of Query Processing
• Improved Precision and Recall
– Users want to type just a few terms
– Search engines want users to speak advanced Boolean
• Improved User Experience
– Query processing acts like a skilled interpreter
• Remove the extraneous
• Fill in the details to bridge the gap between human and machine
49
Query Processing Language - QPL
• Search Engine Independent Server to Process Queries
– Scripting rule-based approach
– Supports maintainability of business logic
– Search engine independence reduces TCO
– Gives search engineers control, where it belongs
• UI engineers should not be controlling queries
– Search Technologies expert services to implement
and tune
50
DPMS and Aspire - EXAMPLES
51
DPMS Example #1 – Federal Register
52
DPMS Example #1 – Federal Register
53
DPMS Example #1 – Federal Register
54
DPMS Example #2 – World’s Patent Data
• Consolidation of 80 million XML encoded patents from 95 patent offices into a single, searchable application.
• A long and rich history since 1790 with numerous linguistic, normalization, cleansing, enrichment and data linking challenges
• Forward and backward references
• Assignee, inventor, corporate hierarchies for which normalization is required
• Multiple classification hierarchies which change over time
• Hierarchical claims structure
• Whole document comparison features (similarity search)
• KEY ISSUES: Controlling complexity and handling scale
55
DPMS Example #2 – World’s Patent Data
56
DPMS Example #2 – World’s Patent Data
57
DPMS Example #2 – World’s Patent Data
58
Document Processing Methodology for Search
• The Philosophy
– Understand the Document Model
– Understand the User Model
• Includes business-level requirements
– Create the Search Engine Model
• Search = the pivot point between User and Data
– Document everything
59
DPMS – The Methodology
1. Assessment (Search Technologies Architect and Business Analyst)
– Assessment Report: expert assessment and recommendations
2. Detailed Analysis
– DPMS Analysis (Knowledge Engineer, Business Analyst, etc.)
– DMDs
– Review (Architect, Domain Experts, Peers)
3. Execution
– Implementation (Developer)
– Validate DMDs
– Aspire, Search Engine
– Validation
60
Business Process Overview
Submission
Ingest Process
Congressional Submission Workflow (folder)
Migration Application
Bulk Submission Process
Preservation
Archival Processing Workflow
Archival Updating Workflow
Access
Public User Access & Delivery Application
Authorized User Access & Delivery Application
Processing
Package Updating Workflow
Access Processing Workflow
Publishing Process
ILS Integration Application
Submission Process
Congressional Submission Workflow (interactive)
DMD Defines How Data Flows Through System
• What renditions are available?
• How will metadata be extracted and merged?
• What manual edits may be required?
• How are PDF files processed?
• How will the HTML rendition be created?
• How will parser data and input files be validated?
• What's on the search form?
• How will the content and metadata be indexed?
• What are the navigators?
• How will the MODS be created?
• How are search results formatted?
• What do content URLs look like?
61
Google Additional Slides
62
What Search Technologies Provides
• GSA Search Assessment Analysis
• Search application development
• Corporate Wide Search Solution
• SharePoint GSA search integration
• Custom Connectors, such as RightNow, Lotus Connections, Confluence, etc.
• System architecture and design
• Security integration
• Performance analysis and optimization
• Managed Service and 24x7 Support
63
GSA Assessment Services
• Search Application Assessment
– Requirements gathering and planning
• Entity Recognition Assessment
– Entity identification and implementation planning
• Sensitive Data Assessment
– Data security above and beyond document-level ACL compliance
64
Customer Examples
• EMC – Storage Platform
– Corporate Wide Search Platform for internal users and partners
– Aspire connectors: SharePoint, File system, Database, eRoom, JIVE, Teamforge, Socialcast
• Isilon Systems – Storage Platform
– Customer Support – RightNow Connector
– Sales – Salesforce.com
• Amirsys – Medical Diagnosis
– Decision Support Portal
– Used by 40,000 physicians in 50 countries
• Savvis – Service and Web Hosting Company
– Command Center application
– SharePoint Connector
65
Case Study Slides
66
Example Customers
Corporate Wide Search
• EMC
• Norton Rose (FAST ESP) – Application Management, Technical Support
• PTC (FAST ESP) – Tier 3 Technical Support, Hosting
• BNA (Solr/Lucene) – Application Management, Consulting, Hosting
• Unilever (Verity K2, RetrievalWare, FAST ESP) – Application Management, Consulting
• NXT Customers (NXT) – 40 Hosted NXT Applications
• Chick-Fil-A (FAST ESP) – Application Management, Consulting
• Seagate - GSA-based CWS. 3 connectors + Aspire Enterprise framework to normalize
Data Warehouse (Big Data)
• State Compensation Insurance Fund
E-Commerce
• Nordstrom
• Apple (anonymous)
• Samsung (anonymous)
Search & Match
• Adecco (anonymous) and/or Jobsite
Media & Publishing
• Reed Elsevier (Reconstruction Data etc.)
• CPA (FAST ESP) – Application Management, Consulting, Hosting
• Haymarket
• Gartner (?)
Government
• GPO (FAST ESP) – Application Management, Consulting
• Library of Congress (FAST ESP) – Application Management, Consulting, Hosting
• NARA – National Archives – Application and Infrastructure Architecture and Development, Consulting
• OLRC
Need more examples in different solution areas. Maybe not so many on CWS.
67
Corporate Wide Search / Enterprise Search
68
Comcast
69
Comcast
Background
• Built on Solr/Lucene
• Largest cable and home internet provider in the US
• Search Technologies provides expert architecture, design and
development services to in-house team.
• Replacement of a home-grown system with new Solr / Hadoop
application used to service set-top box requests and browsing of TV
listings by subscribers
70
Comcast
Key Details
• Very fast indexer - 500 records per second
• Recommendations engine processes 2.8 billion records in 8 hours
(down from 24 hours).
• Vote-counting recommendations algorithm calculates recommended
movies and TV shows for a million movies and shows in Comcast’s
library.
• Millisecond search response - using Solrj
• Integration with and improvement of existing
ranking/grouping/boosting rules
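The recommendations figure above can be turned into a per-second rate with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes the same 2.8 billion records were processed in both the 24-hour and the 8-hour runs, which the slide implies but does not state, and it treats the rate as a sustained average.

```python
RECORDS = 2_800_000_000  # records per recommendations run (from the slide)
NEW_HOURS = 8            # current run time
OLD_HOURS = 24           # previous run time


def records_per_second(records: int, hours: float) -> float:
    """Average sustained throughput over the whole run."""
    return records / (hours * 3600)


new_rate = records_per_second(RECORDS, NEW_HOURS)  # about 97,000 records/s
old_rate = records_per_second(RECORDS, OLD_HOURS)  # about 32,000 records/s
speedup = OLD_HOURS / NEW_HOURS                    # 3.0
```

Under these assumptions the batch job averages roughly 97,000 records per second, a threefold improvement. The indexer's 500 records per second is a separate figure and measures a different workload.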
71
Capital Group
72
Capital Group
Background
• Built on FAST ESP
• Global investment and financial management firm
• Search Technologies built the complete solution
• Intranet search portal serving multiple applications and departments
covering every aspect of the business
• Searching prior customer communications, presentations, and legal
documents
• Used by every aspect of the business
73
Capital Group
Key Details
• New, highly customised search user interface
• Migration from legacy RetrievalWare system
• Core technologies: Java Server Faces, Weblogic 9.2, Apache Web
Services API, Apache commons, Embedded Java DB
• Support for Chinese and Japanese
• Customised feeding and document processing
• Data resides in Documentum, Lotus Notes, Oracle
• Full Windows AD-based security
74
SAIC
75
SAIC
• Background
• Built on Google GSA
• Large Government-focused systems integrator
• Search Technologies provides expert services
• Intranet application
76
SAIC
• Key Details
• Indexing SharePoint Cluster of 50 Site collections
• Hundreds of User-Managed Sub-Sites
• Document-level security and NTLM authentication
• XSLT customization to display fields according to document type
• Massive expansion planned
77
Media & Publishing
78
Yellowpages.com
79
Yellowpages.com
Background
• Originally Built on FAST ESP. Recently migrated to Solr/Lucene
• World's Leading Internet Yellowpages Site
• Owned by AT&T
• Search Technologies involved since 2005 providing expert services on
both FAST ESP and Solr/Lucene
80
Yellowpages.com
Key Details
• Business Listings available for all 50 states
• Massively scalable search clusters in 2 data centers
• ATG based JSP GUI
• Oracle content updated daily
• Handling over 2000 queries per second
• Linguistic work (spellings, synonyms)
81
GPO.gov
82
GPO.gov
Background
• Built on FAST ESP & Documentum
• The publishing arm of the Federal Government
• Search Technologies is the main contractor for search, including
architecture, design, development and implementation
• The Federal Digital System www.gpo.gov/fdsys provides public access
to information provided by Congress and other Federal agencies
“The GPO and the Office of the Federal Register accomplished a minor miracle in warp speed time” - Ray Mosely, Director of the Federal Register
83
GPO.gov
Key Details
• 50+ data sources, each with its own legacy, format & purpose, including
US Laws, Congressional Reports, Daily Congressional Records,
Economic Indicators, Reports to the President and the Budget of the US
Government
• Developed a document processing infrastructure to prepare incoming
data sets for indexing
84
Computer Patent Annuities (CPA)
85
Computer Patent Annuities (CPA)
Background
• Built on FAST ESP
• Leading legal/intellectual property services provider
• Search Technologies is providing the complete solution
• A major new patent search application involving 90MM patents from
100+ authorities around the world
86
Computer Patent Annuities (CPA)
Key Details
• Data cleansing, normalization & enrichment
• Establishing new relationships between patents
• Fast “similarity searching” requiring highly optimized indexes
• Collaborative tools for patent research teams
• Search-driven BI features in SharePoint