Business intelligence 2011

Business Intelligence: Connecting the dots for discoveries. Ding Li

description

How to release the value locked in isolated data to assist business discovery

Transcript of Business intelligence 2011


Business Intelligence: Connecting the dots for discoveries

Ding Li


Can Business be Intelligent?

• Today’s business is in an age of dramatic change. Business Intelligence (BI) is an interactive process by which companies promptly discern trends and patterns in business operations, products, services, customers, markets and competitors, and thereby derive insights and draw conclusions.

• Human brains are extremely good at integrating separate pieces of data (even almost forgotten ones) with the current scenario to make the best possible decision; corporations’ decision systems have a long way to go to be nearly as efficient.

• It requires a combination of technologies, art and human intelligence to surface the value under the data sea efficiently.

• Contents of the study:


◦ BI System

◦ BI Data Flow Architecture

◦ BI Development Process

◦ BI and Data Preparation

◦ BI and Data Visualization

◦ BI and Dashboard Design

◦ BI and Web Analytics

◦ BI and Social Network

◦ BI and Semantic Technologies

◦ BI and Algorithm


Pressures-Responses-Support Model

• Business Environment (source of pressures and opportunities): Globalization, Customer Demand, Market Conditions, Competition, Technology Advance, Regulations

• Organization Responses: Strategic Planning, New Business Models, Restructure Business Processes, Choose New Vendors, Improve Partnership Relationships, Improve Information Systems, Encourage Innovation, Improve Customer Service, Improve Communication, Improve Data Access, Automate Tasks, Real-time Response

• Decision and Support: Analysis, Predictions, Decisions

• Business Intelligence: Support

(Turban, 2010)


Brief History of BI

• 1958, Hans Peter Luhn published the paper “A Business Intelligence System” in the IBM Journal of Research and Development, describing intelligence as “the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal.”

• 1983, Teradata sold the first relational database management system (RDBMS) designed specifically for decision support to Wells Fargo.

• 1992, Bill Inmon published the book “Building the Data Warehouse” (Wiley).

• 1995, The Data Warehouse Institute (TDWI) was formed.

• 1996, Ralph Kimball published the book “The Data Warehouse Toolkit: practical techniques for building dimensional data warehouses”. Business units build their own data “marts”, which can be connected with a “bus”.

• 1996, Jim Gray published the article “Data Cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals”, which supports OLAP (online analytical processing).

(Hammerbacher, 2009)


Goal of BI

• Better, Quicker Business Decision-Making

• Views of Business Operations: Historical, Current, Predictive

• Performance Management, Reporting, Analytics, Data Mining, Predictive Analytics, Benchmarking

• Data sources (internal and external): Finance, R&D, Supply & Production, Customer & Sales, Usage, Industry Analysis, Competitor Status, User Analysis, Product Ranking, Technology Analysis


Common Pitfalls of Current System

• Reporting data from departments are fragmented (e.g. in Excel/PDF files).

• Manual extraction is prevalent.

• Updating frequency is relatively low, usually monthly.

• Analysts often spend more time on data collection than on data analysis. Developers spend precious time on manual data feeding instead of improving the products and services.

• “Information silo problem”: rich information at the source is not easily accessible, or even known, to users.

• If it is a sin to leave useful data unused or underused, then most organizations in the business world, if not all, are guilty of it. The waste is tremendous.

[Diagram labels: Financial, Supply, Production, Subscription, Usage, Industry Analysis, Competitor Status; Executives]


Target System

• Categorized, top-down business views: Departmental, Product, Customer Relation

• Users and analysis levels: Executives (Strategic Analysis), Managers (Tactical Analysis), Knowledge Workers (Operational Analysis)

• Automatic and integrated system across departments: Data Integration, Statistical Analysis, Business Metrics Calculation

• Overnight/real-time data collection

• Data sources (internal and external): Finance, R&D, Supply & Production, Subscription & Sales, Usage, Industry Analysis, Competitor Status, User Analysis, Product Ranking, Technology Analysis


What BI is Not

• BI is not a panacea for a poor or outdated information system

◦ If information is incomplete because some pieces are still kept in text files and manipulated manually, it is better to change the business process: move all the information into better data systems and automate the business logic.

◦ If the information is fragmented because there are no unique, well-formatted keys to link it, it is better to improve the production system with well-designed keys.

• BI is not just a collection of charts or tables

◦ BI is supposed to transform data into information.

◦ BI is supposed to link information together to provide insights and assist discovery.

◦ BI is supposed to support both information aggregation and drilldown.

◦ BI is supposed to support “information retrieval”, i.e. search capability.

◦ Replicating Excel charts/tables in a BI system often results in static or mediocre reports.


Where does Intelligence come from in BI?

• A BI system organizes and visualizes information so well that human intelligence can be fully engaged to analyze it efficiently.

• Humans put analysis methods and knowledge into a BI system so that it can behave like a “smart” expert, following pre-defined logic.

• In a well-designed BI system, large volumes of data can be linked together in a data network that reveals underlying relationships which would otherwise be hidden from human eyes.

• In a (near) real-time system, fresh data can reach decision makers’ fingertips so quickly that prompt steps can be taken before permanent damage is done, for example to retain customers who have just requested to cancel services.


BI Vendor Examples

• QlikTech – QlikView is a flexible, nimble BI solution

• Microsoft – SQL Server + SharePoint + Excel PowerPivot + Silverlight

• Actuate – Business Performance Management (BPM), built on BIRT (an open source BI platform)

• Oracle – comprehensive platform

• SAS – Business Analysis, Forecast, and Data Visualization

• IBM Cognos – Corporate Performance Management (CPM)

• SAP – supports a software-as-a-service infrastructure

• Google – Google Analytics

• Information Builders – Customer Relationship Management (CRM)


QlikView

• Pros

◦ The click-driven, visually interactive interface is simple to learn and use.

◦ Based on in-memory associative technology, which is fast.

◦ Flexible data sources (Oracle, SQL, Excel, text files).

◦ Quicker to build compared with traditional BI systems.

• Cons

◦ Needs straightforward relationships among tables, which requires very clean data to link multiple tables.

◦ Its underlying calculation logic, set analysis, is not rigorous and is hard to use for complicated logic.

◦ Its script language is not complete enough to accomplish comprehensive tasks.

◦ All the data need to be in memory.


References

• Gray, J., Bosworth, A., Layman, A. & Pirahesh, H. (1996). Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. In Su, S. (Ed.). Proceedings of the 12th International Conference on Data Engineering (pp. 152-159). New York, NY: IEEE.

• Hammerbacher, J. (2009). Information platforms and the rise of the data scientist. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 5. Sebastopol, CA: O’Reilly Media.

• Inmon, W. H. (1992). Building the data warehouse. New York, NY: Wiley and Sons.

• Kimball, R. & Ross, M. (1996). The Data Warehouse Toolkit: practical techniques for building dimensional data warehouses. New York, NY: Wiley and Sons.

• Luhn, H.P. (1958). A business intelligence system. IBM Journal of Research and Development, 2(4), 314-319.

• Turban, E., Sharda, R., Delen, D., and King, D. (2010). Business Intelligence: a managerial approach (2nd ed.). Upper Saddle River, NJ: Prentice Hall.


BI and Data Source

• Data Warehouse

◦ Description: a repository of an organization’s electronically stored data

◦ Pros: integrated, validated, logic clearly defined

◦ Cons: long time to build, expensive

• Data Mart, Staging Table

◦ Description: a subset of an organizational data store, usually oriented to a specific purpose

◦ Pros: validated, logic clearly defined, easy to build

◦ Cons: not fully integrated

• Production Database

◦ Description: data extracted directly from the production system

◦ Pros: real time, no extra storage

◦ Cons: impact on production, data not validated, transformation limited

• Manually Edited File

◦ Description: files maintained by information workers

◦ Pros: flexible, cheap

◦ Cons: prone to human error, lack of details


Data Warehouse and ETL

• Sources: Oracle, MS SQL, Excel File, Text File, Web

• Flow: Extract → Transform → Load → BI Data Warehouse → BI System

• Operations shown along the flow: Standardize Primary Keys, Cleaning, Format, Translate Embedded Logic, Referential Integrity Check, Indexing, Summarization, Derivation, Merge, Sort, Integration, Aggregation

(Moss, 2003)
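The extract, transform, load flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline described by Moss (2003): the two source record sets, the column names and the `sales_summary` table are hypothetical, and an in-memory SQLite database stands in for the BI data warehouse.

```python
import sqlite3

# Hypothetical extracts from two sources with inconsistent keys and formats.
ORACLE_ROWS = [{"cust_id": " c001 ", "amount": "120.50"},
               {"cust_id": "C002", "amount": "80"}]
EXCEL_ROWS = [{"cust_id": "c001", "amount": "30.25"},
              {"cust_id": "", "amount": "15"}]  # missing key, will be rejected


def extract():
    """Extract: pull the raw rows from every source."""
    return ORACLE_ROWS + EXCEL_ROWS


def transform(rows):
    """Transform: standardize keys, clean, then aggregate per customer."""
    totals = {}
    for row in rows:
        key = row["cust_id"].strip().upper()  # standardize the primary key
        if not key:                           # cleaning: drop rows without a key
            continue
        totals[key] = totals.get(key, 0.0) + float(row["amount"])
    return sorted(totals.items())             # sort before loading


def load(conn, summary):
    """Load: write the summarized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE sales_summary (cust_id TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?)", summary)
    conn.commit()


def run_etl():
    """Run the three stages against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    query = "SELECT cust_id, total FROM sales_summary ORDER BY cust_id"
    return conn.execute(query).fetchall()
```

The primary-key constraint on the hypothetical table plays the role of a simple integrity check at load time; a real warehouse would add referential checks and indexing as the slide lists.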


Data Flow Architecture: Building Data Mart

• Each department offers aggregated data in staging tables, or the BI system queries production/standby tables directly.

• The BI system integrates the data and generates reports.

• Pro: quick to build

• Con: data not fully integrated

[Diagram: staging tables for Submission, Production, Sales, Usage, Financials and External Data feed the BI System.]


Data Flow Architecture: Building Data Warehouse

• Each department offers raw or aggregated data in staging tables and pushes the data to a central database repository.

• The BI system pulls data and results from the data warehouse and generates reports.

• Pro: deep data integration and complicated analysis can be realized efficiently.

• Con: long time to build

[Diagram: staging tables for Submission, Production, Sales, Usage, Financials and External Data feed the Data Warehouse, which feeds the BI System.]


Data Flow Architecture: Hybrid Design

• Start with a data mart.

• Gradually build the data warehouse.

• Pro: quick to build the data mart; eventually gains the advantages of a data warehouse.

• Con: complicated process.

[Diagram labels: Central Staging Repository, BI System, Production Staging Tables, Sales Summary Tables, External Data Staging Tables, Usage Staging Tables, Financials Staging Tables, Submission Staging Tables]


Facebook’s Dataspace Management with Open Source Tools

• All data from the enterprise (structured and unstructured): Transactional Databases, Application Logs, Web Crawls (Post)

• 15 terabytes of new data per day in 2009

• Hadoop Distributed File System (HDFS): parallelized data processing at massive scale

• Hive: data warehousing framework, with a query language and a query UI (HiPal)

• Tools

◦ Argus: portal for sharing charts and graphs

◦ Databee: workflow management system

◦ PyHive: Python script framework for MapReduce

◦ Cassandra: storage system for serving data to end users

(Hammerbacher, 2009)


References

• Hammerbacher, J. (2009). Information platforms and the rise of the data scientist. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 5. Sebastopol, CA: O’Reilly Media.

• Moss, L. T. & Atre, S. (2003). Business intelligence roadmap: the complete project lifecycle for decision-support applications. Boston, MA: Addison-Wesley.


Heavyweight Development Process

(Moss, 2003)


Agile Development Process

• Plan: Business Goals, KPI

• Analysis: Data Sources, Calculation Logic

• Data ETL: Extraction, Transform, Loading

• Design: Report Layout, Data Visualization

• Validation: Data, Logic

• Feedback: New Requirements

Phased Release

◦ Important KPIs first.

◦ Well-connected data first.

Quick Feedback

◦ Design

◦ Data

◦ Logic


Challenges of BI Management

• A BI project spans all departments; winning cooperative support is the key to its success.

• BI development often encounters unexpected issues. Forcing a deadline may result in low-quality reports; relaxing the due date too much may stall the project.

• A BI system is very efficient at exposing data abnormalities. If data owners and suppliers treat the process as a rare opportunity to fix data at the source, a cleaner data system can be an excellent bonus of a BI project.


References

• Moss, L. T. & Atre, S. (2003). Business intelligence roadmap: the complete project lifecycle for decision-support applications. Boston, MA: Addison-Wesley.


Data Connection and Naming Issues

• Naming issues when linking data

◦ Same thing with different names

◦ Different things with the same name

• Possible solutions

◦ Matching on multiple fields: choose a set of parameters and create a set of fixed rules that decide whether things match or not.

◦ Collective reconciliation: take advantage of the full network of data for record matching.

(Segaran, 2009)


Matching on Multiple Fields

• Set up matching rules

1. First Name, Last Name, Country, Organization, Department

2. Email, Last Name

• Submitting author: Tufte, Ed; Country: US; Organization: Princeton; Department: Politics; Email: [email protected]

• Author profile: Edward R. Tufte; Country: United States; Organization: Princeton University; Department: Political Science; Email: [email protected]
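The two rules on this slide can be expressed as fixed field sets that must all agree. The sketch below is a minimal illustration, assuming exact comparison after trimming and lower-casing; the field names and the placeholder e-mail address are hypothetical, since the addresses in the slide are not available.

```python
# Each rule is a set of fields that must all agree for two records to match.
RULES = [
    ("first_name", "last_name", "country", "organization", "department"),
    ("email", "last_name"),
]


def normalize(value):
    """Compare case-insensitively and ignore surrounding whitespace."""
    return value.strip().lower()


def matching_rule(a, b, rules=RULES):
    """Return the 1-based number of the first satisfied rule, or None."""
    for number, fields in enumerate(rules, start=1):
        if all(a.get(f) and b.get(f) and normalize(a[f]) == normalize(b[f])
               for f in fields):
            return number
    return None


submitted = {"first_name": "Ed", "last_name": "Tufte", "country": "US",
             "organization": "Princeton", "department": "Politics",
             "email": "tufte@example.org"}   # placeholder address
profile = {"first_name": "Edward R.", "last_name": "Tufte",
           "country": "United States",
           "organization": "Princeton University",
           "department": "Political Science",
           "email": "Tufte@Example.org"}     # placeholder address
```

With these records rule 1 fails, because the first name, country, organization and department all differ in spelling, while rule 2 succeeds on e-mail plus last name. That is the reason for keeping a second, narrower rule.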


Collective Reconciliation

• Even if no single field matches perfectly for the submitting author, we can conclude that this is a match by combining the similarity of multiple fields.

• Submitting author: Tufte, Ed; Country: US; Organization: Princeton; Department: Politics

• Author profile: Edward R. Tufte; Country: United States; Organization: Princeton University; Department: Political Science
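One way to combine field similarities into a single score is sketched below. It is an assumption-laden illustration rather than the method of Segaran (2009): it uses `difflib.SequenceMatcher` from the Python standard library as the string similarity, sorts the tokens so that "Tufte, Ed" and "Edward R. Tufte" line up, and relies on a hypothetical alias table, equal weights and a hypothetical threshold of 0.65.

```python
import re
from difflib import SequenceMatcher

ALIASES = {"us": "united states"}   # hypothetical abbreviation table
WEIGHTS = {"name": 1.0, "country": 1.0, "organization": 1.0, "department": 1.0}
MATCH_THRESHOLD = 0.65              # hypothetical cut-off


def normalize(text):
    """Lower-case, expand known aliases and sort the word tokens."""
    text = text.strip().lower()
    text = ALIASES.get(text, text)
    return " ".join(sorted(re.findall(r"[a-z0-9]+", text)))


def field_similarity(a, b):
    """Similarity of two field values, between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_similarity(a, b, weights=WEIGHTS):
    """Weighted average of the per-field similarities."""
    total = sum(weights.values())
    return sum(w * field_similarity(a[f], b[f])
               for f, w in weights.items()) / total


def is_match(a, b):
    return record_similarity(a, b) >= MATCH_THRESHOLD


submitted = {"name": "Tufte, Ed", "country": "US",
             "organization": "Princeton", "department": "Politics"}
profile = {"name": "Edward R. Tufte", "country": "United States",
           "organization": "Princeton University",
           "department": "Political Science"}
stranger = {"name": "Smith, John", "country": "UK",
            "organization": "Oxford", "department": "Physics"}
```

None of the raw field values are identical between `submitted` and `profile`, yet the combined score clears the threshold, while the unrelated `stranger` record stays well below it. In practice the weights and threshold would need tuning against known matches.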


Data Modeling – Event Chain

• Event chain: Submit → Editor Review → Peer Review → Production → Online → Downloads

• Dates recorded: Submit Date, Review Dates, Final Decision Date, Online Date, Download Date

• Separate dates make it easy to trace the history of articles in the system.

• Users can select a period of submit dates, and the charts of accepted articles and published articles will include only the articles submitted in that period.

• The model is suitable for detailed analysis.


Data Modeling – Event List

• All the event dates and regions are consolidated in the event table.

• When a journal, a period or a region is selected, all the charts change to reflect the selection.

• The data model is suitable for a high-level overview.

• Event table fields: Event ID, Journal, Event Date, Event Region

• Event types: Submit, Editor Review, Peer Review, Production, Online, Usage
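A consolidated event table makes the "select once, update every chart" behavior simple to implement, because every chart reads from the same filtered rows. The sketch below is illustrative only: the rows, journal names and regions are hypothetical, and the event type is stored as a column alongside the four fields named on the slide.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    event_id: int
    journal: str
    event_type: str
    event_date: date
    event_region: str


# Hypothetical rows of the consolidated event table.
EVENTS = [
    Event(1, "Journal A", "Submit", date(2011, 1, 5), "Asia"),
    Event(2, "Journal A", "Online", date(2011, 3, 9), "Asia"),
    Event(3, "Journal B", "Submit", date(2011, 2, 1), "Europe"),
    Event(4, "Journal A", "Usage", date(2011, 3, 20), "Europe"),
    Event(5, "Journal B", "Usage", date(2011, 4, 2), "Asia"),
]


def select(events, journal=None, start=None, end=None, region=None):
    """Keep the events that satisfy every filter that was supplied."""
    return [
        e for e in events
        if (journal is None or e.journal == journal)
        and (start is None or e.event_date >= start)
        and (end is None or e.event_date <= end)
        and (region is None or e.event_region == region)
    ]


def counts_by_type(events):
    """Data behind one chart: number of events of each type."""
    return Counter(e.event_type for e in events)
```

For example, `counts_by_type(select(EVENTS, region="Asia", start=date(2011, 3, 1)))` counts only the Asian events from March 2011 onward, and any other chart built on the same `select` call reflects the same selection.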


References

• Segaran, T. (2009). Connecting data. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 20. Sebastopol, CA: O’Reilly Media.


Data Visualization

• Informative

◦ Reveal the intended message clearly with enough data

◦ Offer different perspectives to facilitate discovery

• Efficient

◦ Visually emphasize what matters and reveal relationships

◦ Use axes, color and size to convey meaning

• Novel

◦ Break the limits of default formats; choose the best format to suit the data

◦ A fresh look at the data

◦ A new level of understanding

• Aesthetic

◦ Appropriate use of graphical construction to offer visual appeal

(Iliinsky, 2010)


1854 Cholera Epidemic in London

The epidemic took the lives of 600 Londoners in September 1854. What was the cause?

Dr. John Snow started by mapping the locations of the incidents.

(Tufte, 2001)


Discovery seems so easy when the right information is put together

Then Dr. John Snow linked the incident locations to pump sites.

It was later verified that the Broad Street pump was the cause of the epidemic.

(Tufte, 2001)


2008 Electoral Vote Results of Presidential Election

(Nagourney, 2008)

Issue: the geographically accurate map is actually a very inaccurate map of electoral influence.



2008 Electoral Vote Results of Presidential Election - Revision

(Iliinsky, 2010)

Accurate and beautiful: a proportionally weighted electoral vote results map of the United States



Mining and Visualizing Social Patterns

From public data in a local newspaper: 18 women attending 14 different social events.

The links between women are weighted by the number of events both women attended.

Start with the strongest links to reveal clustering.

(Krebs, 2010)


Mining and Visualizing Social Patterns (2)

Gradual inclusion: focus initially on the strongest ties in the structure, then gradually lower the membership threshold to reveal weaker ties in the network.

Very weak links are dismissed as social noise.

(Krebs, 2010)
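The weighting and gradual-inclusion steps can be sketched as follows. This is an illustration under assumptions, not the analysis of Krebs (2010): the attendance data are hypothetical (five people, five events), clusters are taken to be connected components of the links at or above the current threshold, and the `noise_floor` parameter is a made-up name for the weight below which links are dismissed.

```python
from itertools import combinations

# Hypothetical attendance records, not the newspaper data from the slide.
ATTENDANCE = {
    "Ann": {"e1", "e2", "e3"},
    "Beth": {"e1", "e2", "e3"},
    "Cara": {"e2", "e3"},
    "Dee": {"e4", "e5"},
    "Eve": {"e3", "e4", "e5"},
}


def link_weights(attendance):
    """Weight each pair by the number of events both attended."""
    weights = {}
    for a, b in combinations(sorted(attendance), 2):
        shared = len(attendance[a] & attendance[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def clusters(weights, threshold):
    """Connected components using only links with weight >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for (a, b), weight in weights.items():
        if weight >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


def gradual_inclusion(weights, noise_floor=2):
    """Lower the threshold step by step, stopping above the noise level."""
    strongest = max(weights.values())
    return {t: clusters(weights, t)
            for t in range(strongest, noise_floor - 1, -1)}
```

With this data the strongest tie (three shared events) surfaces first, a second cluster appears when the threshold drops to two, and the single-event links are never included because they fall below the assumed noise floor.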


References

• Krebs, V. (2010). Your choices reveal who you are: mining and visualizing social patterns. In Steele, J. & Iliinsky, N. (Eds.). Beautiful Visualization, chapter 7. Sebastopol, CA: O’Reilly Media.

• Iliinsky, N. (2010). On beauty. In Steele, J. & Iliinsky, N. (Eds.). Beautiful Visualization, chapter 1. Sebastopol, CA: O’Reilly Media.

• Nagourney, A., Zeleny, J. & Carter, S. (2008). The electoral map: key states. The New York Times. Retrieved from http://elections.nytimes.com/2008/president/whos-ahead/key-states/map.html.

• Tufte, E. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press.


Challenge of Dashboard Design

• “A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.”

• “Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations.”

• “No matter how great the technology, a dashboard’s success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately.”

• “Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think.”

• Unfortunately, most vendors focus their marketing efforts on flash and dazzle that subvert the goals of clear communication. “Once implemented, however, these cute displays lose their spark in a matter of days and become just plain annoying.”

(Few, 2006)


Common Measures (KPIs)

• Sales: bookings, billings, sales pipeline, number of orders, order amounts, selling prices

• Marketing: market share, campaign success, customer demographics

• Finance: revenues, expenses, profits

• Web Services: number of visitors, number of page hits, visit durations

Comparative measures, each with an example:

• The same measure at the same point in time in the past: the same day last year

• The same measure at some other point in time in the past: the end of last year

• The current target for the measure: a budgeted amount for the current period

• A prior prediction of the measure: a forecast of where we expected to be today

• An extrapolation of the current measure: a projection out into the future, e.g. year end

• Some measure of the norm for this measure: average, normal range or a benchmark

(Few, 2006)


Non-Quantitative Dashboard Data

• Tasks that are behind schedule

• Tasks that need to be completed

• Accomplishments that should be highlighted

• Issues that need to be investigated

(Few, 2006)


Utilize Short-Term Memory

• Memory comes in three fundamental types:

◦ Iconic memory (a.k.a. the visual sensory register)

◦ Short-term memory (a.k.a. working memory)

◦ Long-term memory

• Only 3-9 chunks of information can be stored in short-term memory.

• Graphs over text.

◦ Individual numbers are stored in discrete chunks.

◦ One or more lines in a line graph can represent a great deal of information as a single chunk.

• Relevant information on the same screen.

◦ Once information is no longer visible, it is no longer available unless it is one of the few chunks stored in short-term memory.

◦ If everything remains within eye span, users can exchange information in and out of short-term memory at lightning speed.

(Few, 2006)


Information in Well-designed Dashboard

• Exceptionally well organized

◦ All important data on one page

• Condensed, primarily in the form of summaries and exceptions

◦ Summaries: single numbers from sums or averages.

◦ Exceptions: something that falls outside the realm of normality and needs attention.

• Specific to and customized for the dashboard’s audience and objectives

◦ Information should be narrowed to address the objective(s).

◦ Use the audience’s vocabulary.

• Displayed using concise and often small media that communicate the data and its message in the clearest and most direct way possible.

◦ Reduce the non-data pixels.

◦ Enhance the data pixels.

(Few, 2006)


Reducing the Non-Data Ink

(Few, 2006)

When the non-data ink is removed or reduced, the data become more prominent and it is easier to find trends or patterns among them.


Emphasize Most Important Data

(Few, 2006)

Different degrees of visual emphasis are associated with different regions of a dashboard.

Information in the center gains emphasis only when it is set apart from what surrounds it.

Recent data often deserve a finer time scale than more distant historical data.

Visual attributes, such as color, size, line width, enclosure, and added marks, can also be used to highlight important data.


Effective Dashboard Display Media

(Few, 2006)

• Line chart: easier to spot trends

• Clean display of related data

• Simple symbol or number


(Few, 2006)

Organize the display objects to reveal their intrinsic relationship


Sample Sales Dashboard

(Few, 2006)


Add Interactivity to Dashboard

Add selection box so users can focus on a subset of data


When Dashboard is not Enough

• As soon as a dashboard shows abnormalities, users will often want to know more details about them.

• The responsible individual can be called on to provide the details, and may in turn query the database or ask IT staff to run the query. The process is long and resource-consuming.

• Layered reports can provide top-down views:

◦ Layer 1: one-page dashboard

◦ Layer 2: more detailed aggregation, such as regional reports

◦ Layer 3: data tables with all the details needed

• The data in detail views can be narrowed down from the top views, which offers a natural analysis flow.


References

• Few, S. (2006). Information Dashboard Design. Sebastopol, CA: O’Reilly Media.


BI and Web Analytics: So much data, still so few insights

The reason for so few actionable insights even with abundant web click data:

The clickstream is about “what”, but not “why”.

(Kaushik, 2010)


Web Analytics 2.0

(Kaushik, 2010)


Web Analytics Tools

(Kaushik, 2010)


Metrics for Clickstream Analysis

• Visits and Unique Visitors

◦ Using session ID and persistent cookie ID

• Time on Page and Time on Site

◦ No leaving time is recorded for the last page, unless an “unloaded” script is used.

• Bounce Rate

◦ People leave the site without a single click.

◦ Useful: bounce rate from top referrers

• Exit Rate

◦ Useful only in the middle of “sequential” pages.

• Conversion Rate

(Kaushik, 2010)
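These metrics can be computed directly from per-visit page sequences. The sketch below is a minimal illustration with hypothetical data and definitions that are common but assumed here: a bounce is a visit with a single page view, the exit rate of a page is exits divided by views of that page, and a conversion is a visit that reaches a chosen goal page.

```python
# Hypothetical visits: (session_id, visitor_cookie_id, pages viewed in order).
VISITS = [
    ("s1", "v1", ["/home", "/product", "/checkout"]),
    ("s2", "v1", ["/home"]),
    ("s3", "v2", ["/product", "/home"]),
    ("s4", "v3", ["/home", "/product"]),
]


def visit_count(visits):
    """Visits are counted by distinct session ID."""
    return len({session for session, _, _ in visits})


def unique_visitors(visits):
    """Unique visitors are counted by distinct persistent cookie ID."""
    return len({visitor for _, visitor, _ in visits})


def bounce_rate(visits):
    """Share of visits that ended after a single page view."""
    bounces = sum(1 for _, _, pages in visits if len(pages) == 1)
    return bounces / len(visits)


def exit_rate(visits, page):
    """Share of views of `page` that were the last view of a visit."""
    views = sum(pages.count(page) for _, _, pages in visits)
    exits = sum(1 for _, _, pages in visits if pages[-1] == page)
    return exits / views if views else 0.0


def conversion_rate(visits, goal_page):
    """Share of visits that reached the goal page."""
    converted = sum(1 for _, _, pages in visits if goal_page in pages)
    return converted / len(visits)
```

Note that visitor `v1` contributes two visits but only one unique visitor, which is the distinction between the session ID and the persistent cookie ID mentioned on the slide.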


Top Questions to Answer

• How many visitors come to my site?

• Where are visitors coming from?

◦ Direct traffic.

◦ Referring sites.

◦ Search engines: keywords.

◦ Campaigns and paid ads.

• What do I want visitors to do on my site?

• What are visitors actually doing?

◦ Top entry pages.

◦ Top viewed pages.

◦ Site overlay analysis (navigation analysis).

◦ Abandonment analysis.

(Kaushik, 2010)


Typical Analysis Flow

(Kaushik, 2010)

• Bounce rate of top search keywords

◦ Search keywords: users’ intent

◦ Bounce: not happy with the findings

◦ Q: ranked for the wrong keyword?

◦ Q: landing pages missing information?

• Site overlay (click density) analysis

◦ % clicks or conversions

• User behavior: also check days to convert


Source of Traffic Analysis: Who sends valued traffic?

(Kaushik, 2010)


Module Click Analysis

• Pages using the same layout template share the same modules.

• Click analysis at module level can reveal which modules are outperforming or underperforming.

• Clicks on link positions within each module can reveal more user behavior patterns.

• Many pages, same layout: performance across pages?


Scroll Percentage for Long Page

• 0-20% scrolled: 30%

• 20%-40% scrolled: 22%

• 50%-60% scrolled: 11%

• 60%-80% scrolled: 9%

• 80%-100% scrolled: 26%


Visitor Segmentation

• What/how are they viewing?

• Why do they leave?

• How to engage them more?

• How to connect them?

New Visitors

Casual Visitors

Loyal Visitors

Lapsed Visitors

• Growing the loyal visitors is essential to keep the site thriving.

• So it is important to understand their navigation patterns and what they like and dislike.


Consumption of Content


Navigation Flow Among Top Pages/Content

(Adobe.com)


Navigation Flow to a Page

(Adobe.com)


Navigation Flow from a Page

(Adobe.com)


Markov Chain Analysis: Grouping Page Views for Behavior Analysis

(Gwizdka, 2010)
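A first-order Markov chain over grouped page views can be estimated by counting transitions between consecutive states. The sketch below is a generic illustration and is not taken from Gwizdka (2010): the state names ("search", "results", "article") and the sessions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sessions, each a sequence of grouped page-view states.
SESSIONS = [
    ["search", "results", "article"],
    ["search", "results", "search", "results", "article"],
    ["search", "article"],
]


def transition_matrix(sessions):
    """Estimate P(next state | current state) from observed sequences."""
    counts = defaultdict(Counter)
    for states in sessions:
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }
```

Each row of the resulting matrix sums to one. A high probability of returning from "results" to "search", for instance, would suggest that users often reformulate their queries.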


Factors Influencing Satisfaction for Information Retrieval

• System Effectiveness: measures how well a given IR system achieves its objective.

◦ Precision (relevant documents retrieved / total retrieved documents)

◦ Recall (relevant documents retrieved / total relevant documents in database)

• User Effectiveness: measures the accuracy and completeness with which users achieve certain goals.

◦ Number of tasks successfully completed

◦ Number of relevant documents obtained

◦ Time taken by users to complete set tasks

• User Effort: measures users’ effort to get relevant information.

◦ Number of clicks

◦ Number of queries and query reformulations

◦ Rank position accessed

(Al-Maskari, 2010)
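The two system-effectiveness ratios follow directly from their definitions on this slide. The short sketch below uses hypothetical document IDs; returning 0.0 for an empty denominator is an assumption made here to keep the functions total.

```python
def precision(retrieved, relevant):
    """Relevant documents retrieved / total retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Relevant documents retrieved / total relevant documents in database."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Hypothetical query result and relevance judgments.
RETRIEVED = ["d1", "d2", "d3", "d4"]
RELEVANT = ["d1", "d3", "d5", "d6", "d7"]
```

Here two of the four retrieved documents are relevant, so precision is 0.5, and two of the five relevant documents were found, so recall is 0.4.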


See Users’ Experience by Visual Replay of HTML Stream

http://www.tealeaf.com/products/real-time-customer-experience-management.php (accessed on Dec 6th, 2011)

Tealeaf is one of the tools that record all the dynamically generated HTML at the network level and store it for later searching and visual replay.


See Users’ Joy and Tears by Visual Replay of What Users Saw and Their Actions

Such case studies can help to understand the reasons behind the summarized numbers.

http://www.tealeaf.com/products/real-time-customer-experience-management.php (accessed on Dec 6th, 2011)


Web Detective: Solve the web mysteries

Post

The third-party payment system received payment for one candy and forwarded the user to the application server to receive a receipt.

Receipt

The server stored the order as two candies and printed a receipt for two candies.

Valid


Web Detective: Replaying the web session can reveal the true culprit

Time

• 10:00 Open Tab 1 and add one candy.

• 10:05 Open Tab 2 and add a second candy.

• 10:10 Submit Tab 1.

• 10:10 Receive the payment in Tab 1.

• 10:11 Process the order in Tab 2.


References

• Adobe training video. Retrieved from https://outv.omniture.com/.

• Al-Maskari, A. and Sanderson, M. (2010). A review of factors influencing user satisfaction in information retrieval. Journal of the American Society for Information Science and Technology, 61: 859–868. Doi: 10.1002/asi.21300

• Gwizdka, J. (2010). Distribution of cognitive load in Web search. Journal of the American Society for Information Science and Technology, 61: 2167–2187. DOI: 10.1002/asi.21385

• Kaushik, A. (2010). Web Analytics 2.0. Indianapolis, IN: Wiley Publishing.


BI and Social Network

• Social networks, such as LinkedIn, Facebook, and Twitter, are becoming an important means for people, including scientists, to share information, though the academic world has been slow to adopt them. (Curry, 2009)

• The capability to extract this vast, unstructured, time-sensitive information is becoming increasingly important for business analysis.

• The recent development of literature-based scientific social networks is promising.

◦ Sites: BioMedExperts, UniPHY

◦ Unique for the research world: preloaded professional profiles based on publications; preloaded networking based on co-authorship analysis; periodic publication updates sent within each user’s network.

• Effective ways to analyze the content on social networks and to promote scientists’ contributions on them still need to be developed.


Effectiveness of Scientific Social Networks

• Academic social networks will soon fall out of favor if they cannot help scientists effectively.

• We need to study whether such networks can improve scientists’ research productivity, increase collaboration among scientists, and increase the traffic to scientific content web sites.

◦ Statistical analysis based on users’ profiles on the site.

◦ Web analytics using tools like Google Analytics.

◦ Scenario analysis using session capture tools like Tealeaf.

◦ Traditional usability tests using tools like Morae.

◦ Surveys.

• Linking users’ activities on academic social networks, their profiles at professional member societies and their clickstreams on academic content sites can help to understand and serve each user efficiently.

◦ Order content according to the user’s long-term and short-term interests.

◦ Recommend relevant events, such as academic forums/seminars and industry shows.

◦ Let users promote academic content or events that interest them via social networks.


Building Users’ Expert Profile Based on Concepts in Publications

(Gunter, 2009)

Document fingerprints aggregated to expert profiles
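One way to picture the aggregation step is the sketch below. It assumes a document fingerprint can be represented as a mapping from concept to weight, and that an expert profile is simply the summed weights over an author's documents. Both are simplifying assumptions for illustration, not the method described by Gunter (2009); the names and numbers are hypothetical.

```python
from collections import Counter


def expert_profile(fingerprints, top_n=3):
    """Sum concept weights over an author's documents, strongest first."""
    total = Counter()
    for fingerprint in fingerprints:
        total.update(fingerprint)  # Counter.update adds the weights
    return total.most_common(top_n)


# Hypothetical fingerprints for two papers by the same author.
docs = [
    {"obesity": 0.5, "genetics": 0.25},
    {"obesity": 0.25, "linkage mapping": 0.5},
]
profile = expert_profile(docs, top_n=2)
```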

Page 75: Bussiness intelligence 2011


Motivating Contribution in Social Media

• Social Learning

◦ People learn by observation in social situations and begin to act like the people they observe, even without external incentives (Bandura, 1977).

◦ Social sites can make it easy for users to observe the behaviors of active users.

• Feedback

◦ Theories of reciprocity (Cialdini, 1984; Gouldner, 1960), reinforcement (Ferster, 1957), and the need to belong (Baumeister, 1995) all suggest that feedback from other users should predict long-term participation of social media users.

◦ Site design and its backend technologies can make it convenient for users to tag and comment.

• Distribution

◦ Reputation is a common motivation for participation in many online environments.

◦ Competitive motivations in the form of reputation and status attainment have been cited as a primary incentive for continued participation in open-source software (Hertel, 2003).

◦ Bloggers cite the intent to affect their professional reputation as being among their top motivations for blogging (Marlow, 2006).

◦ Promoting active users and distributing their influence is an effective social currency to ‘bribe’ key contributing users.

Page 76: Bussiness intelligence 2011


Case Study at Facebook: Motivating Newcomer Contribution

• Measures: dependent variable

◦ The number of photos uploaded by the newcomers between their third and fifteenth weeks on the site.

• Measures: independent variables

◦ Learning: the number of photo-uploading stories the newcomers saw in their News Feeds during their first two weeks.

◦ Singling out: whether the newcomer was tagged in a photo during his or her first two weeks.

◦ Feedback: whether the newcomer received any comments on his or her initial photos during the first two weeks.

◦ Distribution: the number of News Feed stories shown to friends about the newcomer’s photos.

(Burke, 2009)

Page 77: Bussiness intelligence 2011


Result of Case Study at Facebook: Motivating Newcomer Contribution

• “Design elements which facilitate learning from friends, singling out, feedback, and content distribution can help increase the level of engagement for new users, leading to further content contributions and an overall better user experience.”

• “The most consistent result we found was for learning from friends. An increase in visible photo activity was always predictive of increased newcomer contribution.”

• “Designers of social networking sites should also find ways to support newcomers with varying behavioral patterns.”

◦ “For newcomers who are active, highlighting opportunities for others to leave them feedback and allowing the newcomers to increase the size of their audience may be particularly effective.”

◦ “For newcomers who are relatively inactive, designers might want to encourage their friends to pay more attention to them, whether through singling out in a public fashion or sending more directed private communication.”

(Burke, 2009)

Page 78: Bussiness intelligence 2011


References

• Bandura, A. (1977). Social Learning Theory. New York, NY: General Learning Press.

• Baumeister, R. & Leary, M. (1995). The need to belong: Desire for interpersonal attachments as a fundamental human motivation. Psychological Bulletin, 117(3), 497-529.

• Burke, M., Marlow, C. & Lento, T. (2009). Feed me: Motivating newcomer contribution in social network sites. Proceedings of the 27th International Conference on Human Factors in Computing Systems (pp. 945-954). Boston, MA: ACM Press.

• Cialdini, R.B. (1984). Influence. New York, NY: William Morrow and Company.

• Curry, R., Kiddle, C. & Simmonds, R. (2009). Social networking and scientific gateways. Proceedings of the 5th Grid Computing Environments Workshop. doi: 10.1145/1658260.158266.

• Ferster, C. & Skinner, B. (1957). Schedules of Reinforcement. New York, NY: Appleton-Century-Crofts.

• Gouldner, A. (1960). The norm of reciprocity: A preliminary statement. American Sociological Review, 25(2), 161-178.

• Gunter, D. (2009). Semantic Search. Bulletin of the American Society for Information Science and Technology, 36: 36-37.

• Hertel, G., Niedner, S. & Herrmann, S. (2003). Motivation of software developers in open source projects: An internet-based survey of contributors to the Linux kernel. Research Policy, 32(7), 1159-1177.

• Marlow, C. (2006). Linking without thinking: Weblogs readership and online social capital formation. In Proceedings of the International Communication Association, Dresden, Germany.

Page 79: Bussiness intelligence 2011


Semantic Technologies, BI and Just-in-Time Discovery

• “Discoverability requires the ability to recall related historical data so that an arriving piece of data can find its place, similar to the way each jigsaw puzzle piece is assessed relative to a work-in-progress puzzle.” (Jonas, 2009)

• Directories for enterprise-wide discoverability

◦ Context-less directories: basic directories to locate information.

◦ Semantically reconciled directories: concepts with similar meanings are bundled together.

◦ Semantically reconciled and relationship-aware directories: information is linked together in context, which enables context-based discovery.

• Academic publishers can organize the attributes and activities of their subscribers, users, and authors so that they can easily be pulled together, and place new information into that context to assist business discovery.

Page 80: Bussiness intelligence 2011


Semantic Web – Linked Data

(Berners-Lee, 2001)

A relational database is too rigid to capture dynamic relationships. New fields and new relationships have to be added to the database all the time, which is not efficient.

A graph database is designed to store dynamic relationships with a simple and flexible schema. Here are some examples, most of them open source:

• Sesame (http://openrdf.org)

• Jena (http://jena.sourceforge.net)

• AllegroGraph (http://agraph.franz.com)

• Neo4J (http://neo4j.org)

(Segaran, 2009)
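The flexibility argument can be made concrete with a small in-memory sketch. It is not how Sesame, Jena, AllegroGraph, or Neo4J are implemented; the class, the relation names, and the identifiers are hypothetical. The point is only that a new kind of relationship is just another edge label, with no schema change required.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory graph: (subject, relation) -> set of objects."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        return sorted(self.edges.get((subject, relation), ()))


graph = Graph()
graph.add("researcher:1", "author_of", "paper:1")
graph.add("researcher:1", "author_of", "paper:2")
# A relationship type nobody planned for: no table or column to add.
graph.add("researcher:1", "reviewed", "proposal:7")
```

In a relational design the last line would typically require a new table or column before any data could be stored.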

Page 81: Bussiness intelligence 2011


Semantic Web Elements: URI, RDF, Ontology

Example triples:

Gene 1 – Modify – Gene 2

Gene 2 – Affect – Disease A

Inferred: Gene 1 – May Affect – Disease A

URI: Uniform Resource Identifier

• Specifies an entity

• Identical and exchangeable across different documents

RDF: Resource Description Framework

• Subject – Predicate – Object (triples)

• Expresses the relationship between entities

Ontology

• Collection of URIs and RDF statements

• Collection of inference rules
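The gene example on this slide can be written as triples plus one inference rule. The sketch below is a plain-Python illustration; the predicate names and the rule function are assumptions chosen to mirror the slide, not part of any RDF or ontology library.

```python
def infer_may_affect(triples):
    """Rule: if X modifies Y and Y affects Z, then X may affect Z."""
    modifies = [(s, o) for s, p, o in triples if p == "modifies"]
    affects = [(s, o) for s, p, o in triples if p == "affects"]
    return {
        (x, "may_affect", z)
        for x, y in modifies
        for y2, z in affects
        if y == y2
    }


# Subject, predicate, object: the two stated facts from the slide.
facts = [
    ("Gene 1", "modifies", "Gene 2"),
    ("Gene 2", "affects", "Disease A"),
]
inferred = infer_may_affect(facts)
```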

Page 82: Bussiness intelligence 2011


Dublin Core Metadata Initiative

The Dublin Core is a set of predefined properties for describing documents.

The following example demonstrates the use of some of the Dublin Core properties in an RDF document:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF PUBLIC "-//DUBLIN CORE//DCMES DTD 2002/07/31//EN"
  "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://dublincore.org/">
    <dc:title>Dublin Core Metadata Initiative - Home Page</dc:title>
    <dc:description>The Dublin Core Metadata Initiative Web site.</dc:description>
    <dc:date>2001-01-16</dc:date>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
    <dc:contributor>The Dublin Core Metadata Initiative</dc:contributor>
    <!-- guesses for the translation of the above titles -->
    <dc:title xml:lang="fr">L'Initiative de métadonnées du Dublin Core</dc:title>
    <dc:title xml:lang="de">der Dublin-Core Metadata-Diskussionen</dc:title>
  </rdf:Description>
</rdf:RDF>
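As a rough illustration of how a BI system might read such metadata, the sketch below parses a shortened copy of the RDF document with Python's standard `xml.etree.ElementTree` module and lists the Dublin Core properties as (subject, property, value) triples. The function name is hypothetical, and the DOCTYPE and translated titles are left out of the sample to keep it short.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the document above.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://dublincore.org/">
    <dc:title>Dublin Core Metadata Initiative - Home Page</dc:title>
    <dc:date>2001-01-16</dc:date>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""


def dublin_core_triples(rdf_xml):
    """Return (subject, 'dc:property', value) for each Dublin Core element."""
    root = ET.fromstring(rdf_xml)
    prefix = "{" + DC_NS + "}"
    triples = []
    for description in root.findall("{" + RDF_NS + "}Description"):
        subject = description.get("{" + RDF_NS + "}about")
        for child in description:
            if child.tag.startswith(prefix):
                triples.append((subject, "dc:" + child.tag[len(prefix):], child.text))
    return triples


triples = dublin_core_triples(RDF_XML)
```

A full RDF toolkit would handle more of the syntax; this sketch only covers the simple `rdf:Description` form shown on the slide.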

Page 83: Bussiness intelligence 2011


Semantic Tools: RDFS, OWL, SPARQL

(Shadbolt, 2006)

RDFS: RDF Schema

• RDFS is an extension to RDF.

• It provides the framework to describe application-specific classes and properties.

<rdfs:Class rdf:ID="animal" />

<rdfs:Class rdf:ID="horse">
  <rdfs:subClassOf rdf:resource="#animal"/>
</rdfs:Class>

OWL: Web Ontology Language

• A family of knowledge representation languages for authoring ontologies.

• Expresses and processes information on the web.

Class(a:cat_owner complete
  intersectionOf(a:person restriction(a:has_pet someValuesFrom (a:cat))))
SubPropertyOf(a:has_pet a:likes)
Class(a:cat_liker complete
  intersectionOf(a:person restriction(a:likes someValuesFrom (a:cat))))

• Cat owners have cats as pets.

• has_pet is a subproperty of likes, so anything that has a pet must like that pet.

=> Cat owners must like a cat.
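The cat-owner reasoning can be mimicked in a few lines of plain Python. This is only a sketch of the inference, not an OWL reasoner: the sub-property table, the `types` lookup, and the individuals are hypothetical stand-ins for the ontology.

```python
# has_pet is declared a sub-property of likes, as in the OWL snippet.
SUBPROPERTY_OF = {"has_pet": "likes"}


def with_inferred(facts):
    """Add (s, parent, o) for every fact whose property has a parent."""
    inferred = set(facts)
    frontier = list(facts)
    while frontier:
        s, p, o = frontier.pop()
        parent = SUBPROPERTY_OF.get(p)
        if parent and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            frontier.append((s, parent, o))
    return inferred


def class_members(facts, types, prop, target_type):
    """Persons with at least one `prop` value of type `target_type`."""
    return sorted({
        s for s, p, o in facts
        if p == prop and types.get(s) == "person" and types.get(o) == target_type
    })


types = {"alice": "person", "bob": "person", "tom": "cat", "felix": "cat"}
facts = {("alice", "has_pet", "tom"), ("bob", "likes", "felix")}

cat_owners = class_members(facts, types, "has_pet", "cat")
cat_likers = class_members(with_inferred(facts), types, "likes", "cat")
```

Every cat owner ends up among the cat likers, while someone who only likes a cat is not inferred to own one.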

SPARQL: an RDF query language

What are all the country capitals in Africa?

PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
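To show what such a query does, the sketch below evaluates the same four triple patterns against a small in-memory list of triples. It is a toy pattern matcher written for illustration, not a SPARQL engine; the identifiers such as `city:Nairobi` and the sample data are hypothetical.

```python
def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable
            if new.setdefault(term, value) != value:
                return None
        elif term != value:  # constant
            return None
    return new


def query(triples, patterns, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in triples:
        new = unify(patterns[0], triple, binding)
        if new is not None:
            yield from query(triples, patterns[1:], new)


TRIPLES = [
    ("city:Nairobi", "cityname", "Nairobi"),
    ("city:Nairobi", "isCapitalOf", "country:Kenya"),
    ("city:Mombasa", "cityname", "Mombasa"),
    ("country:Kenya", "countryname", "Kenya"),
    ("country:Kenya", "isInContinent", "Africa"),
    ("city:Paris", "cityname", "Paris"),
    ("city:Paris", "isCapitalOf", "country:France"),
    ("country:France", "countryname", "France"),
    ("country:France", "isInContinent", "Europe"),
]

PATTERNS = [
    ("?x", "cityname", "?capital"),
    ("?x", "isCapitalOf", "?y"),
    ("?y", "countryname", "?country"),
    ("?y", "isInContinent", "Africa"),
]

results = [(b["?capital"], b["?country"]) for b in query(TRIPLES, PATTERNS)]
```

Shared variables such as `?x` and `?y` are what join the patterns together, in the same way the `;` and `.` separators chain the patterns in the SPARQL text.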

Page 84: Bussiness intelligence 2011


Linked Data for STM Publication

[Figure: linked-data graph for an STM publication. Nodes and labels recoverable from the figure: R. Arlen Price (Faculty); the article “An obesity-related locus in chromosome region 12q23-24”; Diabetes; American Diabetes Association; National Institutes of Health; a Faculty Profile listing Research Interest (Genetics of Complex Traits, Genetics of Obesity, Behavioral Genetics, Genetic Epidemiology) and Research Techniques (linkage mapping, linkage disequilibrium association analyses, and gene expression profiling); Ding Li (Student). Edge labels: Author, Subscribe, Read, Publication, Funding, Profile, Research Strength, Attend Events, Proposal, Review.]

Linking data helps to serve each researcher’s needs better.

Page 85: Bussiness intelligence 2011


Semantic Publishing – Integrate Data in Academic Journals

(Seringhaus, 2007)

Publish machine-readable summary information in XML along with the article.

A BI system can retrieve and organize the metadata.

Page 86: Bussiness intelligence 2011


Semantic Publishing – Semantic Enhancement to Research Articles

(Shotton, 2009)

The relevant data can be linked together online.

A BI system can help to retrieve and organize the relationships and data.

Page 87: Bussiness intelligence 2011


BI and E-Science

Research is becoming more data-driven and often requires linking data on a large scale.

BI can trace the location of data sources, understand the relationships among these academic databases, and provide users with corresponding data services.

(Luciano, 2007)

Multiple pathway databases are linked to construct the human insulin signaling pathway.

Page 88: Bussiness intelligence 2011


References

• Berners-Lee, T., Hendler, J. and Lassila, O. (2001). The Semantic Web. Scientific American, 284(5), 28–37.

• Jonas, J. & Sokol, L. (2009). Data finds data. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 7. Sebastopol, CA: O’Reilly Media.

• Luciano, J. and Stevens, R. (2007). e-Science and biological pathway semantics. BMC Bioinformatics, 8(Suppl 3): S3. doi: 10.1186/1471-2105-8-S3-S3.

• Segaran, T. (2009). Connecting data. In Segaran, T. & Hammerbacher, J. (Eds.). Beautiful Data, chapter 20. Sebastopol, CA: O’Reilly Media.

• Seringhaus, M. and Gerstein, M. (2007). Publishing perishing? Towards tomorrow's information architecture. BMC Bioinformatics, 8:17. doi: 10.1186/1471-2105-8-17.

• Shadbolt, N., Berners-Lee, T., and Hall, W. (2006). The Semantic Web Revisited. IEEE Intelligent Systems, 21(3): 96–101.

• Shotton, D., Portwin, K., Klyne, G., and Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol, 5(4): e1000361. doi: 10.1371/journal.pcbi.1000361.

Page 89: Bussiness intelligence 2011


BI and Algorithm by Example

In the US and Canada:

• A list of hospitals

• A list of medical groups

• All with (latitude, longitude)

How do we find the nearby hospitals (within 1 mile) for each medical group?

Calculating the distance for every combination is too time-consuming.

We need to limit the candidates before calculation.

Simple spherical law of cosines formula to calculate distance:

d = R × acos(sin(lat1) × sin(lat2) + cos(lat1) × cos(lat2) × cos(long2 − long1))

where R is the earth’s radius (mean radius = 6,371 km)
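The formula above translates directly into code. The sketch below assumes coordinates are given in decimal degrees and converts them to radians first, since the trigonometric functions in Python's `math` module work in radians. The function name and the clamping step are additions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, as on the slide
KM_PER_MILE = 1.609344


def distance_km(lat1, long1, lat2, long2):
    """Great-circle distance by the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_long = math.radians(long2 - long1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_long))
    # Rounding can push the value slightly outside [-1, 1] for
    # identical or antipodal points, which would make acos fail.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def within_one_mile(lat1, long1, lat2, long2):
    return distance_km(lat1, long1, lat2, long2) <= KM_PER_MILE
```

Calling this for every hospital and medical group pair is exactly the all-combinations cost the slide wants to avoid, which motivates limiting the candidates first.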

Page 90: Bussiness intelligence 2011


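Can We Find a Key?

A database is efficient at judging whether two things are the same, but we need to find a key.

The distance between two points 1° apart at the equator: 69.1 miles

NYC lat: 40.714623, long: -74.006605

Round(latitude * 70) = 2850

Round(longitude * 70) = -5180

How about using the key ‘2850_-5180’?

The distance between two points 1° apart in longitude depends on latitude, but it is always less than 70 miles, so each cell of the above key is at most about 1 mile on a side. Away from the equator the cells become narrower than 1 mile in longitude, so the key only narrows down the candidates; the exact distance still has to be calculated for them.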

Simple spherical law of cosines formula to calculate distance:

d = R × acos(sin(lat1) × sin(lat2) + cos(lat1) × cos(lat2) × cos(long2 − long1))

where R is the earth’s radius (mean radius = 6,371 km, or 3,959 mi)
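The key construction on this slide can be sketched as follows. The function names are hypothetical, and the second helper is an addition that estimates how large one cell is at a given latitude, using the 69.1 miles per degree figure from the slide.

```python
import math

MILES_PER_DEGREE = 69.1  # approximate, from the slide


def grid_key(lat, long, scale=70):
    """Key such as '2850_-5180': rounded latitude and longitude cells."""
    return f"{round(lat * scale)}_{round(long * scale)}"


def cell_size_miles(lat, scale=70):
    """Approximate (height, width) of one grid cell at this latitude."""
    height = MILES_PER_DEGREE / scale
    width = height * math.cos(math.radians(lat))
    return height, width


nyc_key = grid_key(40.714623, -74.006605)
nyc_height, nyc_width = cell_size_miles(40.714623)
```

At New York's latitude the cell is a little under 1 mile tall and roughly three quarters of a mile wide, which matters when deciding how many neighboring cells to search.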

Page 91: Bussiness intelligence 2011


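Boundary Condition

Two points within the same square are roughly within 1 mile of each other, but what about points in adjacent squares?

We produce 9 keys, covering the square and its adjacent squares, for one group (say hospitals), then compare them with the keys of the other group (say medical groups). Because the squares are slightly smaller than 1 mile, and narrower still in longitude at higher latitudes, a pair just under 1 mile apart can occasionally fall two squares apart; widening the neighborhood covers that case.

This approach can also handle the transition point of longitude, where 179.99 and -179.99 are adjacent, provided the keys wrap around at that point.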

2851,-5181    2851,-5180    2851,-5179

2850,-5181    2850,-5180    2850,-5179

2849,-5181    2849,-5180    2849,-5179
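The neighbor-key lookup can be sketched as below. It indexes hospitals by their own cell and probes the surrounding cells for each medical group, which is equivalent to generating neighbor keys for one side and joining on equality. Two things are assumptions added here rather than taken from the slides: the `reach` of the neighborhood is derived from the cell size instead of being fixed at one cell (a 3×3 block is the case where both reaches are 1), and every candidate is confirmed with an exact distance check. The sketch does not handle the ±180° longitude wrap or locations near the poles.

```python
import math
from collections import defaultdict

SCALE = 70                 # cells per degree, as on the slides
MILES_PER_DEGREE = 69.1    # approximate length of one degree of latitude
EARTH_RADIUS_MILES = 3959.0


def cell(lat, long):
    return round(lat * SCALE), round(long * SCALE)


def distance_miles(lat1, long1, lat2, long2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2)
                 * math.cos(math.radians(long2 - long1)))
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, cos_angle)))


def nearby_pairs(groups, hospitals, radius=1.0):
    """Return (group, hospital) name pairs at most `radius` miles apart."""
    index = defaultdict(list)
    for name, lat, long in hospitals:
        index[cell(lat, long)].append((name, lat, long))

    pairs = []
    for g_name, g_lat, g_long in groups:
        row, col = cell(g_lat, g_long)
        cell_height = MILES_PER_DEGREE / SCALE
        cell_width = cell_height * math.cos(math.radians(g_lat))
        reach_row = math.ceil(radius / cell_height)
        reach_col = math.ceil(radius / cell_width)
        for r in range(row - reach_row, row + reach_row + 1):
            for c in range(col - reach_col, col + reach_col + 1):
                for h_name, h_lat, h_long in index.get((r, c), ()):
                    if distance_miles(g_lat, g_long, h_lat, h_long) <= radius:
                        pairs.append((g_name, h_name))
    return pairs
```

Only hospitals in a handful of cells are compared with each medical group, so the number of distance calculations grows with the local density of hospitals rather than with the size of the whole list.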

Page 92: Bussiness intelligence 2011


References

• Movable Type Ltd. Calculate distance, bearing and more between Latitude/Longitude points. Retrieved from http://www.movable-type.co.uk/scripts/latlong.html

Page 93: Bussiness intelligence 2011


Future Plan

• More on Data Mining

• More on Data Modeling

• BI and User Experience

• BI and Predictive Analysis

• BI and Technology Intelligence

Page 94: Bussiness intelligence 2011


Thank You

• Please send your comments, suggestions, and discussion to [email protected]

• The file will be updated at: http://www.slideshare.net/dingli2/