Deep Web Search - INFORMSnymetro.chapter.informs.org/prac_cor_pubs/GlendorNov15.pdf · Deep Web vs....

November 2006 © 2006 Glenbrook Networks, Inc. Deep Web Search Series A Financing Julia Komissarchik Co-Founder & VP Products Glenbrook Networks


November 2006 © 2006 Glenbrook Networks, Inc.

Deep Web Search

Series A Financing

Julia Komissarchik
Co-Founder & VP Products
Glenbrook Networks


Deep Web vs. Surface Web

Surface Web - This is what search engines (e.g., Google) sift through

Deep Web
• Pages revealed only after actions on prior page(s)
• Rich in content
• Pages are often created on the fly, in response to a specific inquiry
• Content is often perishable
• More and more of the desired content is stored in the Deep Web

Challenge - To harvest desired data from the Surface Web and determine intelligently when and how to go to the Deep Web to collect more

[Figure: examples of Deep Web content: Franchise Locations; Lawyers/CPAs/Doctors Listings; Country/State/Federal Government Registrations; Job Listings; Office Locations; Professional Memberships; Events; Blogs]

Surface Web – 12 Billion Pages

Deep Web – 300+ Billion Pages


Deep Web Examples

• Travel
   • United Airlines
   • Expedia
• Job Search
   • URS
   • Express Script
• Store Locator
   • Wendy’s
   • H&R Block
• Katrina
   • Katrina Survivor – Connector List from “Gulf Coast News”


Problem

• Size - 10-30 times larger than the existing Surface Web covered by Google, Yahoo, MSN and others
• Pages are dynamically generated in response to a question entered into DHTML forms – no passwords, just appropriate questions
• Pages don’t have URLs - non-addressable space, no classic search engine link-based page ranking
• Designed for humans, not machines
• Long tail – millions of web sites with DHTML forms of arbitrary structure
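The deck does not describe how Glenbrook's trawler handles forms. As a minimal sketch of the first step any form-driven crawler needs, the Python below uses only the standard library to list the forms on a page together with the field names a "question" would have to fill in. The sample page, the `/locator` action, and the field names `zip` and `radius` are invented for illustration.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collects each <form> on a page with its action, method, and field names."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one dict per completed form
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action", ""),
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in ("input", "select", "textarea") and self._current is not None:
            # Only named controls are submitted with the form.
            name = attrs.get("name")
            if name:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def find_forms(html):
    """Return a list of form descriptions found in an HTML string."""
    finder = FormFinder()
    finder.feed(html)
    finder.close()
    return finder.forms


# Hypothetical store-locator page, invented for this example.
SAMPLE_PAGE = """
<html><body>
  <form action="/locator" method="post">
    <input type="text" name="zip">
    <select name="radius"><option>5</option><option>10</option></select>
    <input type="submit" value="Find">
  </form>
</body></html>
"""

forms = find_forms(SAMPLE_PAGE)
print(forms)
```

Knowing the field names is only the starting point: deciding which values to submit, and recognizing the result pages, is the hard part the slide refers to, and it is not attempted here.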


Solution

• Vertical focus
   • Manageable size
   • Restricted semantics
   • Pragmatic knowledge
• Sophisticated AI techniques – penetration through forms
• Fact-based page ranking vs. link-based page ranking


Prediction - 3-5 year out impact

• Web
   • Deep Web will become pervasive
   • Most businesses will have an extensive online presence
   • No standardization in view for facts representation
• Factual Search will become prevailing
   • Expectations of answers to a question, not references to a bunch of documents to be read
   • Ability to ask questions like
      • Find a restaurant in a 5 mile radius that serves chicken enchilada under $5 and is open right now
      • Find a local plumber that has been in business for more than 10 years, has experience with sprinkler systems and accepts Visa
      • Find a business that is within 5 minutes walk of a bus stop for a line that is within 10 minutes of a given location
      • Find locations that have the highest increase of job openings in the IT industry
      • Who are the top 10 RBI players in the American League over the last month
      • What was the apartment rentals trend in San Francisco over the last 6 months
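Once facts are held as structured records rather than pages, the restaurant question above becomes a simple filter. The Python sketch below illustrates that idea only; the record layout, restaurant names, coordinates, and prices are all made up, and nothing here is taken from Glenbrook's system.

```python
import math
from dataclasses import dataclass
from datetime import time


@dataclass
class Restaurant:
    name: str
    lat: float
    lon: float
    opens: time
    closes: time
    menu: dict  # dish name -> price in dollars


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_restaurants(records, lat, lon, dish, max_price, now, radius_miles=5.0):
    """Names of restaurants within the radius that serve `dish` for less
    than `max_price` and are open at `now`.
    Assumes opening hours do not cross midnight."""
    hits = []
    for r in records:
        price = r.menu.get(dish)
        if price is None or price >= max_price:
            continue
        if not (r.opens <= now < r.closes):
            continue
        if miles_between(lat, lon, r.lat, r.lon) > radius_miles:
            continue
        hits.append(r.name)
    return hits


# Hypothetical records, invented for this example.
RECORDS = [
    Restaurant("Casa Uno", 37.7800, -122.4100, time(11), time(22), {"chicken enchilada": 4.50}),
    Restaurant("Casa Dos", 37.7790, -122.4150, time(11), time(22), {"chicken enchilada": 7.25}),
    Restaurant("Casa Tres", 37.3382, -121.8863, time(11), time(22), {"chicken enchilada": 4.00}),
    Restaurant("Casa Cuatro", 37.7760, -122.4180, time(17), time(23), {"chicken enchilada": 4.75}),
]

result = find_restaurants(
    RECORDS, 37.7749, -122.4194, "chicken enchilada", 5.00, time(12, 30)
)
print(result)
```

The other three records each fail one condition (price, distance, opening hours), which is the kind of answer a list of links cannot give directly.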


So – A Lot Of Business Opportunities

• Successful vertical applications
   • Business Info/Local Search
   • Events/Entertainment
   • Travel
   • Sports
   • Health
   • Job Search
• A new Google in the making – synthesis of multiple factual vertical Deep Web search engines into a massive horizontal search


Glenbrook Networks

Series A Financing

Data Collection and Fact Extraction Via Intelligent Web Trawling


Glenbrook Networks

• Glenbrook delivers data using intelligent patented search technology capable of extracting targeted information from the web with high precision, efficiency, and speed
• What makes us unique:
   • Glenbrook trawls the Deep Web, not just crawls the Surface Web
   • Glenbrook’s output is a structured factual data feed rather than a simple list of links to pages of potential interest
   • It is not template based (so it is highly scalable)


Application Examples: Glendor Local Search



Application Examples: Glendor Deep Web Job Search


Trend Analysis of Job Postings: Case Study

• A large Wall Street financial institution approached Glenbrook for help collecting in-depth information about public companies, in particular job postings
• Using the Glenbrook Deep Web trawler and Fact Extractor, data was collected biweekly directly from public companies’ websites
• The data feed was used by analysts to perform trend analysis that influenced their stock market recommendations
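The deck does not say how the analysts computed their trends. One simple possibility, sketched below in Python, is to compare the number of open postings per company between the first and last snapshot of a feed. The company names, dates, and counts are placeholders, not data from the case study.

```python
def posting_trend(snapshots):
    """snapshots: chronological list of (date_label, {company: open_postings}).
    Returns {company: (absolute_change, percent_change)} between the first
    and last snapshot. Companies missing from the last snapshot, or with
    zero postings at the start, are skipped."""
    first = snapshots[0][1]
    last = snapshots[-1][1]
    trend = {}
    for company, start in first.items():
        end = last.get(company)
        if end is None or start == 0:
            continue
        change = end - start
        trend[company] = (change, round(100.0 * change / start, 1))
    return trend


# Placeholder feed with one snapshot every two weeks.
SNAPSHOTS = [
    ("2006-01-04", {"Company A": 200, "Company B": 80}),
    ("2006-01-18", {"Company A": 230, "Company B": 76}),
    ("2006-02-01", {"Company A": 250, "Company B": 60}),
]

trend = posting_trend(SNAPSHOTS)
# Companies ordered from fastest-growing to fastest-shrinking.
ranked = sorted(trend, key=lambda c: trend[c][1], reverse=True)
print(trend)
print(ranked)
```

A first-to-last comparison ignores everything in between; a real analysis would likely look at the whole series, for example with a moving average or a fitted slope.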


Public Companies Job Postings Trend Analysis (by Company)

[Chart: job postings per company at weekly snapshots from 12/28/2005 to 5/31/2006; y-axis from 0 to 1,800. Series: QUALCOMM Incorporated, Altera Corporation, AmerisourceBergen Corporation, Target Corp., Dell Inc.]


Public Companies Job Postings Trend Analysis (by Company and by City)

[Chart: job postings per company and city at weekly snapshots from 12/28/2005 to 5/31/2006; y-axis from 0 to 120. Series: McKesson, Ontario, CA; QUALCOMM, India; QUALCOMM, Germany; URS Corporation, Baghdad, Iraq; Sun Microsystems, Burlington, MA; Dell Inc., Cincinnati, OH]


Thank you

Julia [email protected] 759 3959