Estrat search

©2012 LHST. Search Recovery and Discovery. Prof. Lee SCHLENKER, E-Stratégies, Sept 5th 2014. Preliminary Draft. How can you use enterprise technologies to improve apprenticeship?


Page 1: Estrat search

©2012 LHST

Search Recovery and Discovery

Prof. Lee SCHLENKER

E-Stratégies, Sept 5th 2014

- Preliminary Draft -

How can you use enterprise technologies to improve apprenticeship?

Page 2: Estrat search


THE ANSWER (at least in part)

Focus | Improve | Knowledge | Leverage | Measure
Organization | Processes | Explicit | Transactions | Efficiency
Services | Delivery | Implicit | Interactions | Effectiveness
Networks | Relationships | Emerging | Interactions | Innovation
Search | Relevancy | Connected | Proximity | CTR

Page 3: Estrat search


Agenda


Are business solutions anything more than recovering something you once knew, or discovering something that is “out there” that others can’t find?

Page 4: Estrat search


The World Wide Web

• Size of the indexed World Wide Web in 2012, as indexed by Google: about 40 billion pages

• Yahoo deals with 12TB of data per day (according to Ron Brachman)

• Twitter hits 400 million tweets per day (June, 2012. Dick Costolo, CEO at Twitter)

• Over 2.5 billion photos uploaded to Facebook each month (2010. blog.facebook.com)

• 55 Million WordPress Sites in the World

http://www.worldwidewebsize.com/

Page 5: Estrat search


How important is « Search »?

• Search is the attempt to make sense of information

• As the amount of information explodes, search has become the user’s interface metaphor.

• Twenty percent of searches are for entertainment, 15 percent are commercial in nature, and 65 percent are informational

• On the Internet, all intent is commercial in one form or another

"The perfect search engine," says Google co-founder Larry Page, "would understand exactly what you mean and give back exactly what you want."


Page 6: Estrat search


August 2002: How Google beat Amazon

• Why does Paul Ford link the search for “meaning” to the “Semantic Web”?

• How should “The New Economy” be defined? Does the notion still make sense today?

• The author compares Google with Amazon and eBay. Why is the latter's business model under threat today?

• What are the differences between “web search” and “enterprise search”?

• Assess how feasible his notion of a “personal agent” is today.

Page 7: Estrat search


Three types of search

• Web search applies search technology to documents on the open web.

• Desktop search applies search technology to the content on a single computer.

• Enterprise search involves making diverse content searchable for a defined audience.

With Search you won’t ever have to leave your house or open a physical book…


Eric Borboen

Page 8: Estrat search


Types of search engines

• Text-based (Bing, Google, Yahoo!): search by keywords, with limited support for queries in natural language.

• Multimedia (QBIC, WebSeek, SaFe): search by visual appearance (shapes, colors, …).

• Question answering systems (Ask, NSIR, Answerbus): search in (restricted) natural language.

• Clustering systems (Vivísimo/IBM, Clusty)

• Research systems (Lemur/MIT, Nutch)

Page 9: Estrat search


Multi-staged process

• Crawling the set of documents to extract the keywords from their contents,

• Indexing the extracted keywords in a semi-structured form, and

• Resolving user queries to return the most relevant results


Robert Korfhage
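
The three stages on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular engine works: the `DOCS` dictionary stands in for crawled pages, and the function names `crawl`, `build_index` and `resolve` are hypothetical. Ranking here is a plain count of matching terms.

```python
import re
from collections import Counter, defaultdict

# Hypothetical in-memory corpus standing in for documents found by a crawler.
DOCS = {
    "doc1": "Enterprise search makes diverse content searchable",
    "doc2": "Web search applies search technology to the open web",
    "doc3": "Desktop search covers the content on a single computer",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(docs):
    """Stage 1: visit every document and extract its keywords."""
    return {doc_id: tokenize(text) for doc_id, text in docs.items()}

def build_index(crawled):
    """Stage 2: map each keyword to the documents containing it, with counts."""
    index = defaultdict(Counter)
    for doc_id, tokens in crawled.items():
        for token in tokens:
            index[token][doc_id] += 1
    return index

def resolve(index, query):
    """Stage 3: return documents containing every query term, best match first."""
    terms = tokenize(query)
    if not terms:
        return []
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))

    def score(doc_id):
        # Total number of occurrences of the query terms in the document.
        return sum(index[term][doc_id] for term in terms)

    return sorted(candidates, key=lambda d: (-score(d), d))

index = build_index(crawl(DOCS))
results = resolve(index, "search content")
print(results)
```

The index is built once and reused for every query, which is the reason crawling and indexing are separated from query resolution.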

Page 10: Estrat search


Major information retrieval models

• Boolean

• Vector

• Probabilistic

• Fuzzy retrieval

• Language modeling

• Latent semantic indexing
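
To make the difference between the first two models concrete, here is a small sketch that runs the same query through a Boolean match and a vector-space ranking. The three documents are invented for illustration, and the smoothed IDF weighting is one common variant chosen as an assumption, not the only definition.

```python
import math
from collections import Counter

# Hypothetical toy collection.
DOCS = {
    "d1": "google search engine ranking",
    "d2": "enterprise search security and ranking",
    "d3": "desktop search on a single computer",
}
TOKENS = {doc_id: text.split() for doc_id, text in DOCS.items()}

def boolean_and(query):
    """Boolean model: a document matches only if it contains every term."""
    terms = query.split()
    return sorted(d for d, toks in TOKENS.items() if all(t in toks for t in terms))

def idf(term):
    """Smoothed inverse document frequency (assumed variant, never zero)."""
    df = sum(1 for toks in TOKENS.values() if term in toks)
    return math.log((1 + len(TOKENS)) / (1 + df)) + 1.0

def vector(tokens):
    """TF-IDF weights for a bag of tokens."""
    return {t: count * idf(t) for t, count in Counter(tokens).items()}

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def vector_rank(query):
    """Vector model: rank every document with a non-zero similarity."""
    q = vector(query.split())
    scored = [(cosine(q, vector(toks)), d) for d, toks in TOKENS.items()]
    return [d for s, d in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]

print(boolean_and("enterprise ranking"))
print(vector_rank("enterprise ranking"))
```

The Boolean model returns an unordered yes/no set, while the vector model also returns partial matches and orders them by similarity.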

Page 11: Estrat search


Crawl

• The first step in classifying web pages is to find an ‘index item’ that might relate expressly to the ‘search term.’

• These days, a continuous crawl method is employed as opposed to an incidental discovery based on a seed list.

• Most search engines use sophisticated scheduling algorithms to “decide” when to revisit a particular page, so that its index entry stays relevant.

• The speed of the web server running the page as well as resource constraints like amount of hardware or bandwidth also figure in.

With Search you won’t ever have to leave your house or open a physical book…
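
The revisit scheduling described on this slide can be illustrated with a priority queue. The formula below is an assumption made up for this sketch (real schedulers are far more elaborate): pages that change often are revisited sooner, slow servers are visited less aggressively, and a crawl budget models the hardware and bandwidth constraints. The URLs and field names are hypothetical.

```python
import heapq

def revisit_interval(changes_per_day, latency_s, min_hours=1.0, max_hours=168.0):
    """Hours to wait before recrawling a page (illustrative heuristic)."""
    base = 24.0 / max(changes_per_day, 1e-6)   # frequent changes -> short wait
    penalty = 1.0 + latency_s                  # slow server -> longer wait
    return min(max(base * penalty, min_hours), max_hours)

def schedule(pages, budget):
    """Return the `budget` URLs that are due soonest."""
    heap = [
        (revisit_interval(p["changes_per_day"], p["latency_s"]), p["url"])
        for p in pages
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(min(budget, len(heap)))]

# Hypothetical pages with estimated change rates and server response times.
PAGES = [
    {"url": "news.example/home", "changes_per_day": 24.0, "latency_s": 0.2},
    {"url": "blog.example/post", "changes_per_day": 1.0, "latency_s": 0.5},
    {"url": "static.example/about", "changes_per_day": 0.01, "latency_s": 0.1},
]

queue = schedule(PAGES, budget=2)
print(queue)
```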


Page 12: Estrat search


Database Search (Indexing Engines)

• Searching for text-based content in structured data formats (databases, XML, CSV etc.) presents special challenges

• Databases allow logical queries which full-text search doesn't (use of multi-field boolean logic for instance).

• There is no crawling necessary for a database since the data is already structured.

• Databases are slow when solving complex queries or using customized indexing formats (compounding, normalization, transformation, transliteration, etc.)
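
As an illustration of the multi-field boolean logic mentioned above, the sketch below uses Python's built-in `sqlite3` module. The `documents` table, its columns and its rows are hypothetical; the point is only that the query combines conditions on several structured fields with a text match, which a plain keyword search cannot express.

```python
import sqlite3

# In-memory database with a hypothetical document table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, author TEXT, year INTEGER, dept TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO documents (author, year, dept, body) VALUES (?, ?, ?, ?)",
    [
        ("Durand", 2012, "sales", "Quarterly search advertising report"),
        ("Martin", 2013, "sales", "Enterprise search rollout plan"),
        ("Durand", 2013, "legal", "Search licensing contract"),
    ],
)

# (author OR department) AND year AND text match, all in one logical query.
rows = conn.execute(
    "SELECT id, author, year FROM documents "
    "WHERE (author = ? OR dept = ?) AND year >= ? AND body LIKE ? "
    "ORDER BY id",
    ("Durand", "sales", 2013, "%search%"),
).fetchall()
print(rows)
```

No crawling step is needed here because the fields are already structured, but the `LIKE` pattern scans every row, which hints at why databases struggle with heavier text queries.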


Page 13: Estrat search


Enterprise Search

• Content Ingestion – push or pull content collection

• Content processing and analysis – normalizing content

• Indexing - dictionary of all unique words, with ranking and frequency

• Query parsing – user entries, multidimensional filters and paging information

• Matching – comparing the query to the stored index
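
The five steps on this slide can be strung together in one small class. This is a sketch under stated assumptions: the class name `EnterpriseIndex`, the push-style `ingest` method, the metadata fields (`dept`, `lang`) and the sample documents are all invented for illustration, and matching is a simple "any term" frequency score.

```python
import re
import unicodedata
from collections import Counter, defaultdict

def normalize(text):
    """Content processing: strip accents, lowercase, split into words."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.findall(r"[a-z0-9]+", ascii_text.lower())

class EnterpriseIndex:
    def __init__(self):
        self.postings = defaultdict(Counter)  # word -> {doc_id: frequency}
        self.metadata = {}                    # doc_id -> filterable fields

    def ingest(self, doc_id, text, **metadata):
        """Content ingestion (push model) followed by indexing."""
        self.metadata[doc_id] = metadata
        for word in normalize(text):
            self.postings[word][doc_id] += 1

    def search(self, query, filters=None, page=1, page_size=10):
        """Query parsing (terms, filters, paging) and matching against the index."""
        filters = filters or {}
        scores = Counter()
        for term in normalize(query):
            for doc_id, freq in self.postings.get(term, {}).items():
                scores[doc_id] += freq
        hits = [
            d for d in sorted(scores, key=lambda d: (-scores[d], d))
            if all(self.metadata[d].get(k) == v for k, v in filters.items())
        ]
        start = (page - 1) * page_size
        return hits[start:start + page_size]

index = EnterpriseIndex()
index.ingest("hr-1", "Politique de congés: stratégie RH", dept="hr", lang="fr")
index.ingest("it-1", "Search strategy for the intranet search portal", dept="it", lang="en")
index.ingest("it-2", "Intranet security policy", dept="it", lang="en")

print(index.search("intranet search", filters={"dept": "it"}))
```

Normalizing both the content and the query with the same function is what lets a query typed as "stratégie" or "strategie" reach the same index entry.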


Page 14: Estrat search


Company Overview

Profitability

• Profit Margin (ttm): 27.48%

• Operating Margin (ttm): 32.45%

Management Effectiveness

• Return on Assets (ttm): 15.21%

• Return on Equity (ttm): 22.36%

Income Statement

• Revenue (ttm): 13.43B

• Revenue Per Share (ttm): 43.676

• Qtrly Revenue Growth (yoy): 57.70%

• Gross Profit (ttm): 6.38B

Internet users spend about 15 million hours a month on the site. Nearly four out of five Internet searches happen on Google or on sites that license its technology.


Page 15: Estrat search


History

January 1996-December 1997 - Sergey Brin and Larry Page create BackRub, the precursor to the Google search engine.

Sept. 7, 1998 - Google is incorporated and takes up residence in a Menlo Park, California, garage with four employees.

September-October 2002 - Google rolls out its keyword advertising program worldwide based on the GoTo.com model.

March-April 2002 - Google launches a beta version of Google News.

May-June 2003 - Google launches AdSense, an advertising program that delivers ads based on the content of Web sites.

Google is the fastest growing company ever – 400 000 percent revenue growth in five years.


Page 16: Estrat search


Vision Statement

“To organize the world's information and make it universally accessible and useful”

“You Can Make Money Without Doing Evil”

“You Can Be Serious Without a Suit”

“No Pop Up Ads”

Larry Page : “I’m not a big believer in strategy”


Page 17: Estrat search


What made Google Google?

To judge relevance, Google looks at the links pointing to a page, the anchor text around those links, and the popularity of the linking pages; the PageRank algorithm computes that popularity from the link graph.

Google has 175,000 computers dedicated to the job of crawling, more than all the computers on earth in the early 1970s.

Google developed its own OS for its servers, along with a unique approach to designing, cooling and stacking the components.
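
The link-popularity idea behind PageRank can be sketched as a power iteration over a tiny link graph. This is the textbook formulation with a damping factor of 0.85, offered as an illustration and not as Google's production algorithm; the four-page graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the pages it links to; returns page -> score."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly across its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical graph: pages a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page c ends up on top because three pages link to it, and page a ranks second because its single inbound link comes from the most popular page.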


Page 18: Estrat search


Don’t be evil

“‘Being a different kind of company’ encompasses more than the products we make and the business we're building; it means making sure that our core values inform our conduct in all aspects of our lives as Google employees.”

I. Serving our Users

II. Respecting Each Other

III. Avoiding Conflicts of Interest

IV. Preserving Confidentiality

V. Maintaining Books and Records

VI. Protecting Google's Assets

VII. Obeying the Law

VIII. Using our Code

Google tracks what products you shop for, the mail you send, which phrases you research in a book, which satellite photos and news stories you view,…


Page 19: Estrat search


Search as an artifact

Giving a different meaning to the concept of « Portal »


Page 20: Estrat search


AdWords

• You create your ads

• Your ads appear on Google

• You attract customers

• You're charged only if someone clicks your ad, not when your ad is displayed.
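
The pay-per-click rule above comes down to simple arithmetic, shown here with made-up numbers. The fixed cost per click is a simplifying assumption for the example; in practice the price of a click varies.

```python
def campaign_cost(impressions, clicks, cost_per_click):
    """Return (click-through rate, amount billed) for a pay-per-click campaign."""
    ctr = clicks / impressions if impressions else 0.0
    billed = clicks * cost_per_click  # displays alone cost nothing
    return ctr, billed

# Hypothetical campaign: 10,000 displays, 200 clicks, 0.50 per click.
ctr, billed = campaign_cost(impressions=10_000, clicks=200, cost_per_click=0.5)
print(f"CTR: {ctr:.1%}, billed: {billed:.2f}")
```

An ad shown 10,000 times but never clicked would be billed nothing, which is why the click-through rate (CTR) is the measure that matters for search advertising.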


Page 21: Estrat search


AdSense

AdSense automatically crawls the content of your pages and delivers ads (you can choose text ads, image ads, or both) that are relevant to your audience and your site.


Page 22: Estrat search


Google Apps

• Gmail -- Offer custom email addresses to your organization with up to 25 gigabytes of storage for each account, search tools to help people find information fast, plus instant messaging and calendar tools built right into the email interface.

• Google Talk -- Your users can call or send instant messages to their contacts for free, anytime, anywhere in the world. File sharing and voicemail are included, too.

• Google Calendar -- Your users can organize their schedules and share events, meetings and entire calendars with others. Your organization can also publish calendars and events on the web.

• Google Docs -- Your users can create documents, spreadsheets and presentations and collaborate with each other in real time right inside a web browser window.

• The Start Page -- A central place for your users to preview their inboxes and calendars, access your essential content, and search the web.

• Google Page Creator -- Create and publish web pages for your domain quickly and easily with this what-you-see-is-what-you-get page design tool.


Page 23: Estrat search


Stepping off line

• Google continues to bet on centralized servers and thin clients. That's why it is spending $600 million to build a new data center in North Carolina: the purpose is to provide 100% uptime for business applications.

• Google built its web office suite via acquisitions. The startups it has acquired are: Gtalkr (instant messaging), Writely (word processing), iRows (spreadsheets), JotSpot (wiki), Tonic Systems (presentations), and Zenter (presentations).

• Google, whose web office solutions are based on AJAX, stands out among the big companies for its clear online office strategy. To provide offline capabilities, Google developed Google Gears, a set of browser plugins and JavaScript libraries that enable AJAX applications to run offline.


Page 24: Estrat search


Social Media

• Google plans to begin introducing a common set of standards (OpenSocial) to allow software developers to write programs for Google’s social network, Orkut, as well as others, including LinkedIn, hi5, Friendster, Plaxo and Ning, as well as Salesforce and Oracle.

• Google can benefit from their success, in part by selling advertising on those sites and in part by incorporating social media functions inside its own applications.

• Google said it has advertising relationships with several social networks (including Facebook), and a $900 million partnership to sell ads on MySpace.


Page 25: Estrat search


Structuring the world’s information

• Browser - Chrome, an application to handle all the information

• Internet - Support for net neutrality initiatives

• Mobile OS - Android as an open platform

• Mobile Device - Nexus

Page 26: Estrat search


What is Google trying to do?

• Vic Gundotra: “Google's mobile moves are driven by one objective: pushing the industry to open up”

• The phones sold on the Google website will all be available unlocked.

• Google doesn't want to compete with other companies offering handsets.

• They want to change the mindset of consumers towards having an open handset that will work with any network, anywhere.

Page 27: Estrat search


Staying focused

• Constant transformation: from big mainframes to PCs, and from PCs to the Internet

• People increasingly rely on powerful mobile phones instead of PCs to surf the Web

• Online advertising may well lose its role as the Web's primary economic engine

• Recent Google acquisitions include Android, maker of a mobile operating system; GrandCentral, a VOIP operator; and AdMob, a mobile advertising network

• Google has invested heavily in mapping and location technologies

• Google's mobile strategy isn't about hardware; it's about generating money from its core business: advertising

Sizing up Google's Nexus 10 tablet

Page 28: Estrat search


Localisation – the next frontier

• Google's US ad revenue = 15 billion

• The size of the US Yellow Pages market is roughly 14 billion.

• Jonathan Rosenberg: mobile ads are already a billion-dollar market for Google.

• Google owns 97% search market share, while offering localized search auto-complete, ads that map to physical locations, and creating a mobile coupon offers network

• Google Trusted Stores, Google Wallet, and now Google Local Delivery


Page 29: Estrat search


More than just « local »

Rich content SERP will allow Google to move into:

• Travel search

• Paid media (ebooks, music, magazines, newspapers, videos etc.)

• Real estate

• Large lead generation markets (like insurance, mortgage, credit cards, .edu)

• Ecommerce search

Page 30: Estrat search


What are businesses really looking for?

Criterion | Web Search | Enterprise Search
Validity | Popular search | + Deep Search
Algorithms | Links | Semantics
Scope | Public pages | + Private pages
Type | Web pages | + Data stores
Concerns | Ranking | + Security

Page 31: Estrat search


What will Enterprise Search require?

Architecture | Issues
Query layer | How will people find the data?
Indexing layer | What metadata (context) is relevant?
Processing layer | How should we interpret the data?
Connector layer | How can we bring this data “home”?

These are multiple opportunities to add value to the Microsoft platform!

Page 32: Estrat search


Digital footprints and clickstreams

• Before the Web we assumed that our digital footprint was as ephemeral as a phone call

• Clickstreams can provide a level of intelligence about how people use the Web

• Innovative companies have figured out how to deliver great Web-based services by divining clickstream patterns

• We have yet to aggregate the critical mass of clickstreams in a database of intentions


Page 33: Estrat search


The power of blogs

• Blogs are personal statements of who their authors are and who they wish to be in the searchable world.

• The blog is an indexable statement of an individual’s social standing, relationships, interests and history.

• Mass personalization – blogs can become proxies for personal taxonomies

• Intelligent engines will be able to discern patterns among blogs, providing third-order relevance inputs that help define and return far better search results.

John Battelle


Page 34: Estrat search


The Semantic Web

• The Web is in the process of becoming the next great computing platform, owned by no-one and used by everyone.

• The telephone, the automobile, the television, the stereo are all part of the network (your dog, your kid)

• By tracking not only what searches you do, but what sites you visit, the engines of the future will be able to build a real-time profile of your interests

• Recovery is everywhere you’ve been before, discovery is everything you may wish to find, but have yet to encounter.

• In the near future we’ll store everything that can be digitized on one massive platform – the Google grid?


Page 35: Estrat search


Why should Search make sense to you?

• It’s what your job in marketing, sales and management is all about

• Decisions are based on judgment and precision

• Search ends with proof of value rather than an empty box

• Enterprise Search is an integral part of BI, Collaboration, ECM, and UC