CREATE SEARCH DRIVEN BUSINESS INTELLIGENCE APPLICATION USING FAST SEARCH FOR SHAREPOINT



Pankaj Bose

Niraj Tenany

• Session Overview

• Presenters Bio

• Introduction to Netwoven

• Industry Facts (Business Intelligence and Search)

• Business Intelligence Challenges

• Benefits of integrating Business Intelligence and Search

• Search Market

• FAST Search features and functions

• Steps To Build Search Centric Application

• Demo

• Wrap up and Next Steps

AGENDA

SESSION OVERVIEW

• Understand the Importance of Search Based Applications in today’s enterprise and how to integrate Business Intelligence and Search for business benefit

• Role of Microsoft FAST Search in an enterprise for building Search based Business Intelligence applications

• Demonstration of a FAST Search based BI application

NIRAJ TENANY – PRESIDENT AND CEO, NETWOVEN, INC.

• Based in USA

• Formerly Microsoft Consulting Services Head of Enterprise Applications Practice

• Frequent speaker in Enterprise Content Management and Search events

• Works with Fortune 1000 companies to define and implement ECM, BI, and Search strategies

PANKAJ BOSE – ECM AND SEARCH PRACTICE HEAD, NETWOVEN, INC.

• Based in India

• Architect and implementer of large scale Enterprise Content Management and Search based applications for large and medium sized companies

• Formerly, Architect at Lockheed Martin Corp in USA as Technical Lead for ECM and Search implementations

• Extensive experience with different ECM and Search platforms

NETWOVEN BACKGROUND

Founded in 2001 by former Microsoft executives

Top talent from industry

Firm leadership comprised of Microsoft, Accenture, Oracle and Intel talent

Former senior executives of Wipro, Infosys, and McKinsey on our board

US headquartered company with development center in India

Save the Children

Industry Verticals

Life Sciences, Financial Services, Energy, Manufacturing, Not For Profit, Software

Netwoven Technology Services

Enterprise Content Management

Business Intelligence

Business Process Management

Netwoven Solution Practices

NETWOVEN SERVICES

Solution Area | Description

Out-Tasking Your SharePoint 2010 | SharePoint managed services with L1, L2 and L3 support

Upgrading to SharePoint 2010 | Upgrade intranet, extranet or internet sites

Social Networking with SharePoint 2010 | Build communities with SharePoint

Document Management with SharePoint 2010 | Develop document management systems on SharePoint or migrate them to it

Business Intelligence with SharePoint 2010 | Develop reports, dashboards and map based drill downs with SharePoint

Portal and Collaboration with SharePoint 2010 | Develop intranet and collaboration sites using SharePoint

Web Content Management with SharePoint 2010 | Develop intranet and extranet sites using SharePoint

Enterprise Search with SharePoint 2010 | Develop Search based Applications using SharePoint 2010

NETWOVEN SHAREPOINT SERVICES

• Every 2 days we generate more data than we did from the dawn of time through 2003

• Worldwide volume of data is growing at 59% per year

• Between 75% and 85% of data is unstructured

• In 5 years the majority of analytic data will come from unstructured sources

- Gartner Blog

BUSINESS INTELLIGENCE FACTS

• Time spent searching for information averages 8.8 hours per week for a cost of $14,209 per worker per year

• Analyzing information soaks up 8.1 hours per week, costing an organization $13,078 annually

SEARCH FACTS

- IDC

BUSINESS INTELLIGENCE CHALLENGES

• With data growing exponentially businesses need better tools to get information faster

• Complexity of integrating large number of disparate data sources

• Difficulty in integrating structured and unstructured data

• End users spend a great deal of time trying to find information, reinventing the wheel, and not having the right information to make decisions

BENEFITS OF INTEGRATING BUSINESS INTELLIGENCE AND SEARCH

• Reduce the time lost searching for information

• Simplify integration of disparate data sources

• Improve integration of structured and unstructured data, thereby providing better insights

• Reduce the time lost reinventing the wheel

• Improve decision making by having the right information available in a timely manner

BENEFITS OF INTEGRATING BUSINESS INTELLIGENCE AND SEARCH

• Integration of search and other types of applications creates a new category of applications called Search Based Applications

• Integration of BI and search is one form of search based application

BENEFITS OF INTEGRATING BUSINESS INTELLIGENCE USING SEARCH

• Easy to use interface that end users understand

• Enables the integration and search of any data source

• Searches across multiple sources

• Easily integrates structured and unstructured data sources

• Indexes the sources in real time

• Provides assisted navigation to filter the search results, thereby reducing the time it takes to find information

• Ability to display results in highly visual and interactive form

INFORMATION ACCESS COMPLEXITY

SIMPLIFIED INFORMATION ACCESS

WHAT IS A SEARCH BASED APPLICATION?

• Search-based applications (SBA) are software applications in which a search engine platform is used as the core infrastructure for information access and reporting. SBAs use semantic technologies to aggregate, normalize and classify unstructured, semi-structured and/or structured content across multiple repositories, and employ natural language technologies for accessing the aggregated information.

- Wikipedia

• Advanced content processing

• Extraction of entities, properties, key phrases

• Content classification

• Sentiment analysis

• Connectors

• Out of the box (from SharePoint interface)

• Out of the box JDBC connectors

• Content API to create custom connectors

• Query and Federation Object Model

• FOM to search repositories by native search process

• FOM to create the core results XML and populate refiners

• Query object model to execute complex queries using FAST Query Language

COMPONENTS OF FAST SEARCH

• Identify your content source (possibly a mix)

• Structured (database fields with traditional field types)

• Unstructured (database text fields, documents, web pages)

• Configure connectors to crawl content sources

• Use filters to crawl only the specific type(s) of content you need

• Review generated crawled properties

• Use SharePoint Central Admin UI or FAST PowerShell cmdlets

• Use SPY processor stage to review contents of crawled properties

• Add additional crawled properties if needed

STEPS TO BUILD A SEARCH CENTRIC APPLICATION - I

• Review and update content processing pipeline

• Extract entities

• Persons / Locations / Companies / Key phrases / Any other custom entities

• Use entity extraction framework of FAST For SharePoint, Service Pack 1

• Use Out of The Box or custom dictionaries

• Configure custom property extraction stage

• Create / Update \etc\config_data\DocumentProcessor\CustomPropertyExtractors.xml

• Create new crawled properties if needed

• Create managed properties and make them searchable and refinable
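
The dictionary-driven property extraction described above can be illustrated outside the product. The following is a minimal Python sketch of the matching technique only, not the FAST pipeline stage itself. The dictionary entries, the normalized values and the function name are all assumptions made for illustration.

```python
import re

# Hypothetical dictionary: surface phrase -> normalized property value.
# A real deployment would maintain this as a custom dictionary.
SENTIMENT_DICTIONARY = {
    "very clean": "Cleanliness: positive",
    "dirty": "Cleanliness: negative",
    "long wait": "Waiting time: negative",
    "friendly staff": "Staff: positive",
}


def extract_properties(text, dictionary):
    """Return the normalized value of every dictionary phrase found in text.

    Matching is case-insensitive and whole-word. Each value is reported
    once, ordered by where its phrase first appears in the text.
    """
    hits = []
    for phrase, value in dictionary.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), value))
    values = []
    for _, value in sorted(hits):
        if value not in values:
            values.append(value)
    return values


if __name__ == "__main__":
    comment = "Friendly staff, but a long wait at admission."
    print(extract_properties(comment, SENTIMENT_DICTIONARY))
```

The extracted values play the role of a crawled property that is then mapped to a searchable, refinable managed property.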

STEPS TO BUILD A SEARCH CENTRIC APPLICATION - II

• Review and update content processing pipeline

• Extend pipeline with custom processing stages

• Why?

• Mechanism

• Create an executable that takes some inputs and produces some outputs

• The executable can be any command (exe, java class, scripts etc.)

• Update \etc\pipelineextensibility.xml to add a RUN section that uses the command.

• Provide a set of crawled properties that act as input.

• Provide a set of crawled properties that get populated with the output.

• Reset the document processor service

o psctrl reset

» Feed a document

» Map crawled and managed properties

» Do a full crawl

STEPS TO BUILD A SEARCH CENTRIC APPLICATION - III

Classification Geo Search Sentiments

• Develop Search Interface

• Refinement panel makes great dimensions

Refiners sorted by frequency: indicates the importance of a refiner

Exact counts / percentages: help in deep analysis of content

Applying a refiner filters the result set: leads to further granular analysis while exposing new dimensions

• Create visual refiners

• Extend the Refinement Panel web part

• Override the GetXPathNavigator method

• Get the refinement XML from base.GetXPathNavigator

• Use the XML as data source for Chart controls
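
To show how refinement XML can feed a chart, here is a minimal Python sketch that turns one refiner into frequency-sorted data points with percentages. The XML shape (FilterCategory elements with a Title attribute, containing Filter elements with Value and Count children) is an assumption for illustration. Inspect the XML your own refinement panel returns and adjust the element names accordingly.

```python
import xml.etree.ElementTree as ET


def chart_series(refinement_xml, category_title):
    """Return (label, count, percentage) tuples for one refiner.

    Points are sorted by count, highest first, so the most frequent
    refinement value leads the chart.
    """
    root = ET.fromstring(refinement_xml)
    points = []
    for category in root.iter("FilterCategory"):
        if category.get("Title") != category_title:
            continue
        for item in category.iter("Filter"):
            label = (item.findtext("Value") or "").strip()
            count_text = (item.findtext("Count") or "").strip()
            # Skip entries without a numeric count (for example an "Any" link).
            if label and count_text.isdigit():
                points.append((label, int(count_text)))
    total = sum(count for _, count in points)
    points.sort(key=lambda point: point[1], reverse=True)
    return [
        (label, count, round(100.0 * count / total, 1) if total else 0.0)
        for label, count in points
    ]


SAMPLE_XML = """
<FilterPanel>
  <FilterCategory Title="Overall Experience">
    <Filters>
      <Filter><Value>Any Overall Experience</Value><Count></Count></Filter>
      <Filter><Value>Good</Value><Count>40</Count></Filter>
      <Filter><Value>Excellent</Value><Count>50</Count></Filter>
      <Filter><Value>Horrible</Value><Count>10</Count></Filter>
    </Filters>
  </FilterCategory>
</FilterPanel>
"""

if __name__ == "__main__":
    for point in chart_series(SAMPLE_XML, "Overall Experience"):
        print(point)
```

Each tuple maps directly onto a bar or pie slice, and the label can double as the refiner value to apply when the user clicks the chart.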

STEPS TO BUILD A SEARCH CENTRIC APPLICATION - IV

• Customize Search Result Web parts

• Extend SearchCoreResults web part.

Add additional sources

Override CreateDataSource and ConfigureDataSourceProperties to create / configure the data source

Override GetXPathNavigator for mixing of results from data sources

• Change XSLT to display specific metadata

• Roll-up numbers by result collapsing

• Display previews

• Aggregate Search Results from Federation

• Create a new LocationRunTime class implementing ILocationRuntime and IRefinable

• Execute queries in native format

• Create Core Results XML

• Fill up the refiner
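
The aggregation steps above (run the query natively per source, build one results XML, fill a refiner) can be sketched independently of SharePoint. The Python below is a conceptual illustration only and does not use the Federation Object Model. The element names, the source names and the `federate` function are assumptions.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def federate(query, sources):
    """Run query against every source and merge the hits.

    sources maps a source name to a callable that runs the query in that
    source's native format and returns a list of {"title", "url"} dicts.
    Returns the merged results XML and a (source, count) refiner list.
    """
    results = ET.Element("All_Results")  # assumed root element name
    refiner = Counter()
    for name, run_native_query in sources.items():
        for hit in run_native_query(query):
            node = ET.SubElement(results, "Result")
            ET.SubElement(node, "title").text = hit["title"]
            ET.SubElement(node, "url").text = hit["url"]
            ET.SubElement(node, "source").text = name
            refiner[name] += 1
    return ET.tostring(results, encoding="unicode"), refiner.most_common()


# Hypothetical stand-ins for native searches against two repositories.
def search_surveys(query):
    return [
        {"title": "Survey 1", "url": "db://surveys/1"},
        {"title": "Survey 2", "url": "db://surveys/2"},
    ]


def search_documents(query):
    return [{"title": "Cleaning policy", "url": "http://example.com/policy"}]


if __name__ == "__main__":
    xml_text, refiner = federate(
        "clean", {"surveys": search_surveys, "documents": search_documents}
    )
    print(xml_text)
    print(refiner)
```

Tagging every hit with its source is what allows the merged result set to expose the source as a refiner.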

STEPS TO BUILD A SEARCH CENTRIC APPLICATION - V

Overview of the scenario

A US based hospital chain conducts patient surveys at all of its locations to:

Improve patient loyalty

Increase referrals

Evaluate healthcare provider performance

Identify areas of improvements

The chain surveys all of its in-house patients at the time of their discharge. The survey responses are stored in a database. The hospital typically uses SQL Server SSAS and SSRS to produce BI dashboards and reports. While this works to a great extent, there are some shortfalls:

The reports only consider the specific answers to objective questions like “How did you like the meal?”, where the options are Excellent, Good, Not so good, and Horrible. However, survey respondents can express their true sentiments in one or more sentences. As traditional BI cannot make use of unstructured content, these comments are left out.

BI reports tell us precisely WHAT happened. However, they often stop short of telling us WHY.

The BI reports have no provision for answering flexible user questions, such as ones about the cleanliness of hospital toilets.

Important attributes / entities hidden within the comments text are ignored while they could be crucial business dimensions.

Hospital management decided to deploy search to extract information as discussed above while retaining BI capabilities.

USE CASE

DEMO

HOW WE DID IT

• Survey data is available in a database

Comments is a text field that is used for key phrase extraction

Other fields used are of regular data types – string, integer, etc.

• An external application was used for key phrase extraction and normalization

(FAST ESP does have a key phrase extraction processor, but FS4SP does not have one yet)

• A dictionary was created from the key phrases. The dictionary is used in a custom property extraction processor

• The processor fills in crawled properties of sentiments during indexing

• Database indexing is done using JDBC connector (BDC also works)

• Generated crawled properties are mapped to managed properties that need to be searched or used in refiners – such as Overall Experience, Speciality, No of days in hospital, etc.

HOW WE DID IT - II

• Using Federation Object Model

• Visual refiners are created using the existing RefinementManager object in the search page. This can also be done by extending the RefinementPanel web part.

• RefinementManager provides the refiner XML

• The Microsoft Chart control uses the refiner XML as its data source

• Selected refiners are used to construct the breadcrumb

• KeywordQuery objects are also used to collect data points for multiple timeframes.

• SearchCoreResults webpart XSLT has been updated to display patient comments

• Sentiments are extracted key phrases represented as refiners

COMMON SEARCH APPLICATION CATEGORIES

• Extended search platforms

• Search engines

• Question-answering applications

• Categorization/metadata tagging tools

• Categorizers and clustering engines

• Visualization tools for information navigation and analysis

• Filtering and alerting tools and text analytics

• Translation and globalization software

CONTACTS

• Niraj Tenany

• President and CEO, Netwoven, Inc.


• Pankaj Bose

• ECM Practice Head, Netwoven, Inc.


• Rashi Bajaj

• Business Development Manager
