Flax ovum search-across_the_enterprise

Post on 25-Jun-2015

See some common myths, discover the various open source enterprise search packages available and see some case studies on how open source software has helped organisations build effective search.

Transcript of Flax ovum search-across_the_enterprise

Open Source Search for the Enterprise

Charlie Hull
Managing Director, Flax
3rd November 2010
OVUM Briefing, Search Across the Enterprise

charlie@flax.co.uk
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch

Who are Flax?

- Search engine specialists with decades of experience
- Developers, innovators and strategists
- Based in Cambridge, UK
- Technology agnostic – but open source exponents
- Recently selected as UK Authorized Partner by Lucid Imagination
- Customers include Mydeco, NLA, Durrants Ltd, Financial Times, MediaMiser, MySkreen, Accenture, University of Cambridge
- Recently asked to present at British Computer Society and Lucene Revolution conferences

What is open source?

“Open-source software (OSS) is computer software that is available in source code form for which the source code and certain other rights normally reserved for copyright holders are provided under a software license that permits users to study, change, and improve the software. […] Some open source software is available within the public domain” (Wikipedia)

Myths about open source

- It's the work of amateur developers
- If I use open source, I have to open up my software/servers/network to all and sundry
- Open source software isn't reliable or scalable
- It's free
- It's unsupported

Open source search software

Apache Lucene and Solr

- Flexible licensing
- Vector space model
- Java and other languages
- Well known and supported

Xapian

- The successor to Muscat
- Bayesian probabilistic ranking
- C/C++ with language bindings
- Highly accurate & scalable

And more...

Apache Lucene and Solr are trademarks of The Apache Software Foundation
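To make the "vector space model" bullet concrete, here is a minimal sketch in plain Python of ranking documents by TF-IDF cosine similarity. It is an illustration of the general idea only: it does not use Lucene's API, Lucene's real scoring is more elaborate, and the function names, the smoothed IDF formula and the sample documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each tokenised document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption) so a term found in every document keeps a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return document indices ordered by decreasing similarity to the query."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that never occur in the collection get a weight of zero.
    query_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(query_vec, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical sample documents, already split into words.
docs = [
    "open source search software".split(),
    "newspaper stories and press cuttings".split(),
    "enterprise search with open source".split(),
]
ranking = rank("press cuttings".split(), docs)
```

Here `ranking` lists the second document first, since it is the only one sharing terms with the query. A probabilistic ranker of the kind named in the other list would replace the `cosine` step with a different weighting scheme.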

Some examples

Newspaper Licensing Agency – NLA Clipshare
http://www.nla-clipshare.com

- 20 million newspaper stories
- 6500 users
- Content from every major newspaper (and most regionals)
- Used by journalists, clippings agencies, media monitors
- Replacing internal systems at major newspapers
- One of very few ways to search content from all the papers within hours of publication

Some examples

Financial Times – press cuttings
http://presscuttings.ft.com

- Web Service for easy integration
- XML source data
- Faceted search
- Area filters (whole article, body, headline, byline or any combination)
- Synonyms, spelling suggestions
- Built from scratch in a fortnight
- Designed as a prototype, scaled to production use without significant change
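As a rough illustration of what "area filters" and "faceted search" mean in practice, the sketch below restricts a search to chosen fields of an article and then counts the results by a facet. It is not the Financial Times system: the sample records, the `section` facet and the function names are all hypothetical, and a real engine would use an index rather than substring matching.

```python
from collections import Counter

# Hypothetical records; the field names and the "section" facet are assumptions.
ARTICLES = [
    {"headline": "Bank raises rates", "byline": "A. Smith",
     "body": "Markets reacted calmly.", "section": "Markets"},
    {"headline": "Tech shares climb", "byline": "B. Jones",
     "body": "The bank sector lagged.", "section": "Companies"},
    {"headline": "Election preview", "byline": "A. Smith",
     "body": "Polls open on Thursday.", "section": "Politics"},
]


def search(term, areas=("headline", "byline", "body")):
    """Return articles in which `term` occurs in at least one of the chosen areas."""
    term = term.lower()
    return [a for a in ARTICLES if any(term in a[area].lower() for area in areas)]


def facet_counts(results, field):
    """Count how many results fall under each value of `field`."""
    return Counter(a[field] for a in results)


headline_only = search("bank", areas=("headline",))  # area filter: headline only
anywhere = search("bank")                            # whole article
facets = facet_counts(anywhere, "section")           # facet counts for the result set
```

Restricting the areas narrows the match to one article, while the whole-article search returns two, and `facets` gives the per-section counts a user could click on to drill down.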

Some examples

Durrants Ltd. – Media monitoring platform

- Thousands of client search profiles
- Hundreds of thousands of articles per day
- Complex publication hierarchy
- Established pipeline

Solution

- Flexible query language allows OCR errors, punctuation, fuzzy matching, weighting
- Supports features of previous engine
- Scalable master-slave architecture

- Accuracy improved in some cases from 95% rejected to 95% accepted
- Hardware budget 15% of previous system
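The idea of a query that tolerates OCR errors can be sketched with edit distance: a word counts as a match if it is within a small number of character edits of the query term. This is a generic illustration in plain Python, not the Durrants query language; the function names, the `max_edits` threshold and the sample text are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(term, text, max_edits=1):
    """Return the words in `text` within `max_edits` edits of `term`,
    ignoring case and surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and edit_distance(term.lower(), w) <= max_edits]


# Hypothetical OCR output with a digit read for a letter and a "q" read for a "g".
hits = fuzzy_match("monitoring", "Media m0nitoring, and monitorinq platform.")
```

Both misread words are returned as hits, while unrelated words are rejected. Raising `max_edits` catches more OCR damage at the cost of more false matches, which is where weighting comes in.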

Some examples

(Unnamed multinational radio suppliers) – Intranet search

- 12 million documents
- Multiple formats – Office, PDF, HTML...
- User and group-based security (LDAP)
- Faceted search
- Users can 'tag' interesting documents – for example to identify a 'reference' version
- Open source chosen because of significant cost advantage – commercial solutions uneconomic at this scale
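Group-based security on search results usually comes down to hiding any document the user's groups are not allowed to see. The sketch below shows that filtering step only, under stated assumptions: the group memberships are hard-coded here, whereas a real deployment would look them up in the LDAP directory, and the document records and names are hypothetical.

```python
# Hypothetical data: in a real deployment group membership would come from LDAP.
USER_GROUPS = {"alice": {"engineering", "staff"}, "bob": {"staff"}}

DOCUMENTS = [
    {"id": 1, "title": "Antenna design notes", "allowed_groups": {"engineering"}},
    {"id": 2, "title": "Holiday policy", "allowed_groups": {"staff"}},
    {"id": 3, "title": "Board minutes", "allowed_groups": {"directors"}},
]


def visible_results(user, results):
    """Keep only the results that share at least one group with the user."""
    groups = USER_GROUPS.get(user, set())  # unknown users belong to no groups
    return [doc for doc in results if doc["allowed_groups"] & groups]


alice_ids = [d["id"] for d in visible_results("alice", DOCUMENTS)]
bob_ids = [d["id"] for d in visible_results("bob", DOCUMENTS)]
```

With this data `alice` sees documents 1 and 2 and `bob` sees only document 2. At the scale quoted above, an engine would more likely apply the group restriction as part of the query than filter results afterwards.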

A look at Lucene & Solr

- Among the top 15 open source projects
- Installations at over 4,000 companies
- Downloads have grown nearly 10x over the past three years
- Over 7,000 downloads a day

Lucid Imagination

- USA based
- Employs 9 out of 15 top Lucene committers
- Offers training, consulting and up to 24x7 support
- Developing value-add software
- Flax are UK partners & resellers

Lucid Works Enterprise

Who are Lucid working with?

Some Lucene & Solr numbers

- LinkedIn – 30 million users
- Internet Archive – a billion indexed pages
- Salesforce.com – 8 terabytes of searchable data
- Twitter – a billion queries a day

Why open source search?

- Flexible, extendable
- Powerful & scalable
- Lower cost, especially when planning for growth
- Commercial support available as necessary
- Freedom to innovate

Looking to the future

- More and more content including social media
- Multiple delivery platforms
- Search-powered applications
- Cloud computing
- More use of entity extraction & sentiment analysis

- Search no longer a bolt-on, but a platform for innovation
- Open source no longer an outsider, but the obvious choice

Thank you!

Any questions?

charlie@flax.co.uk
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch