Flax ovum search-across_the_enterprise

Open Source Search for the Enterprise. Charlie Hull, Managing Director, Flax. 3rd November 2010. OVUM Briefing, Search Across the Enterprise. [email protected] www.flax.co.uk/blog +44 (0) 8700 118334 Twitter: @FlaxSearch

description

See some common myths, discover the various open source enterprise search packages available and see some case studies on how open source software has helped organisations build effective search.

Transcript of Flax ovum search-across_the_enterprise

Page 1: Flax ovum search-across_the_enterprise

Open Source Search for the Enterprise

Charlie Hull
Managing Director, Flax
3rd November 2010
OVUM Briefing, Search Across the Enterprise

[email protected]
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch

Page 2: Flax ovum search-across_the_enterprise

Who are Flax?

- Search engine specialists with decades of experience
- Developers, innovators and strategists
- Based in Cambridge, UK
- Technology agnostic – but open source exponents
- Recently selected as UK Authorized Partner by Lucid Imagination
- Customers include Mydeco, NLA, Durrants Ltd, Financial Times, MediaMiser, MySkreen, Accenture, University of Cambridge
- Recently asked to present at British Computer Society and Lucene Revolution conferences

Page 3: Flax ovum search-across_the_enterprise

What is open source?

“Open-source software (OSS) is computer software that is available in source code form for which the source code and certain other rights normally reserved for copyright holders are provided under a software license that permits users to study, change, and improve the software. […] Some open source software is available within the public domain” (Wikipedia)

Page 9: Flax ovum search-across_the_enterprise

Myths about open source

- It's the work of amateur developers
- If I use open source, I have to open up my software/servers/network to all and sundry
- Open source software isn't reliable or scalable
- It's free
- It's unsupported

Page 12: Flax ovum search-across_the_enterprise

Open source search software

Apache Lucene and Solr
- Flexible licensing
- Vector space model
- Java and other languages
- Well known and supported

Xapian
- The successor to Muscat
- Bayesian probabilistic ranking
- C/C++ with language bindings
- Highly accurate & scalable

And more....

Apache Lucene and Solr are trademarks of The Apache Software Foundation
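
The slide contrasts two ranking approaches: a vector space model and probabilistic ranking. The sketch below is only an illustration of the difference, not the code of either engine. It scores a toy corpus with TF-IDF cosine similarity (a common vector space formulation) and with BM25 (used here as a representative probabilistic scheme, which is an assumption about what the slide means). The corpus, the function names and the parameter values are all invented for the example.

```python
import math
from collections import Counter

# Toy corpus; documents and queries are invented for illustration.
DOCS = {
    "d1": "open source search engine for enterprise search",
    "d2": "commercial enterprise software licensing",
    "d3": "open source licensing explained",
}

def tokenize(text):
    return text.lower().split()

def idf(term, docs):
    # Smoothed inverse document frequency: rarer terms weigh more.
    n = sum(1 for words in docs.values() if term in words)
    return math.log((len(docs) + 1) / (n + 1)) + 1.0

def cosine_score(query, doc_id, docs):
    """Vector space model: cosine of the query and document TF-IDF vectors."""
    terms = set(query) | set(docs[doc_id])
    q_tf, d_tf = Counter(query), Counter(docs[doc_id])
    q_vec = [q_tf[t] * idf(t, docs) for t in terms]
    d_vec = [d_tf[t] * idf(t, docs) for t in terms]
    dot = sum(q * d for q, d in zip(q_vec, d_vec))
    norm = math.sqrt(sum(q * q for q in q_vec)) * math.sqrt(sum(d * d for d in d_vec))
    return dot / norm if norm else 0.0

def bm25_score(query, doc_id, docs, k1=1.2, b=0.75):
    """Probabilistic model: BM25 with commonly used k1 and b values."""
    avg_len = sum(len(words) for words in docs.values()) / len(docs)
    words = docs[doc_id]
    tf = Counter(words)
    score = 0.0
    for t in query:
        n = sum(1 for w in docs.values() if t in w)
        t_idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)
        # Term frequency saturates, and long documents are penalised.
        denom = tf[t] + k1 * (1 - b + b * len(words) / avg_len)
        score += t_idf * tf[t] * (k1 + 1) / denom
    return score

def rank(query_text, scorer):
    """Return ids of matching documents, best first, using the given scorer."""
    docs = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}
    query = tokenize(query_text)
    scored = [(doc_id, scorer(query, doc_id, docs)) for doc_id in docs]
    return [doc_id for doc_id, s in sorted(scored, key=lambda p: -p[1]) if s > 0]
```

Calling `rank("open source search", cosine_score)` and `rank("open source search", bm25_score)` ranks the same toy documents with each model; on real collections the two can order results differently, which is why the choice of model matters.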

Page 14: Flax ovum search-across_the_enterprise

Some examples

Newspaper Licensing Agency – NLA Clipshare
http://www.nla-clipshare.com

- 20 million newspaper stories
- 6500 users
- Content from every major newspaper (and most regionals)
- Used by journalists, clippings agencies, media monitors
- Replacing internal systems at major newspapers
- One of very few ways to search content from all the papers within hours of publication

Page 19: Flax ovum search-across_the_enterprise

Some examples

Financial Times – press cuttings
http://presscuttings.ft.com

- Web Service for easy integration
- XML source data
- Faceted search
- Area filters (whole article, body, headline, byline or any combination)
- Synonyms, spelling suggestions
- Built from scratch in a fortnight
- Designed as a prototype, scaled to production use without significant change
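
To make "faceted search" and "area filters" concrete, here is a minimal sketch, assuming each article is a record with headline, byline, body and section fields. The records, field names and functions are hypothetical and are not taken from the Financial Times system. An area filter restricts which fields are searched, and a facet is a count of the matching results grouped by one field.

```python
from collections import Counter

# Hypothetical press-cutting records; the field names are assumptions.
ARTICLES = [
    {"headline": "Bank raises rates", "byline": "A. Smith",
     "body": "Markets reacted calmly.", "section": "Markets"},
    {"headline": "Tech firm floats", "byline": "B. Jones",
     "body": "The bank advising on the float declined to comment.",
     "section": "Companies"},
    {"headline": "Rates hold steady", "byline": "A. Smith",
     "body": "No change this month.", "section": "Markets"},
]

def search(term, areas=("headline", "byline", "body")):
    """Return articles containing the term in any of the chosen areas."""
    term = term.lower()
    return [a for a in ARTICLES if any(term in a[area].lower() for area in areas)]

def facet_counts(results, field):
    """Count matching articles per value of one field, e.g. per section."""
    return Counter(a[field] for a in results)
```

For example, `facet_counts(search("bank"), "section")` counts hits per section across the whole article, while `search("bank", areas=("headline",))` limits the match to headlines only.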

Page 22: Flax ovum search-across_the_enterprise

Some examples

Durrants Ltd.
- Media monitoring platform
- Thousands of client search profiles
- Hundreds of thousands of articles per day
- Complex publication hierarchy
- Established pipeline

Solution
- Flexible query language allows OCR errors, punctuation, fuzzy matching, weighting
- Supports features of previous engine
- Scalable master-slave architecture

Results
- Accuracy improved in some cases from 95% rejected to 95% accepted
- Hardware budget 15% of previous system
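
One common way to tolerate OCR errors is fuzzy matching by edit distance: a query term matches a word if only a few single-character edits separate them. The sketch below illustrates that general idea using Levenshtein distance; it is an assumption that something similar applies here, and it is not the query language built for Durrants. The function names and the `max_edits` parameter are invented for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_match(term, text, max_edits=1):
    """True if any word in text is within max_edits of the term."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return any(edit_distance(term.lower(), w) <= max_edits for w in words)
```

With the default of one edit, `fuzzy_match("durrants", "Report by Durrant5 Ltd.")` still finds the name despite the misread final character. Raising `max_edits` catches more OCR damage at the cost of more false matches.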

Page 24: Flax ovum search-across_the_enterprise

Some examples

(Unnamed multinational radio suppliers)
- Intranet search
- 12 million documents
- Multiple formats – Office, PDF, HTML...
- User and group-based security (LDAP)
- Faceted search
- Users can 'tag' interesting documents – for example to identify a 'reference' version
- Open source chosen because of significant cost advantage – commercial solutions uneconomic at this scale

Page 27: Flax ovum search-across_the_enterprise

A look at Lucene & Solr

- Among the top 15 open source projects
- Installations at over 4,000 companies
- Downloads have grown nearly 10x over the past three years
- Over 7,000 downloads a day

Lucid Imagination
- USA based
- Employs 9 out of 15 top Lucene committers
- Offers training, consulting and up to 24x7 support
- Developing value-add software
- Flax are UK partners & resellers

Page 28: Flax ovum search-across_the_enterprise

Lucid Works Enterprise

Page 29: Flax ovum search-across_the_enterprise

Who are Lucid working with?

Page 30: Flax ovum search-across_the_enterprise

Some Lucene & Solr numbers

- LinkedIn – 30 million users
- Internet Archive – a billion indexed pages
- Salesforce.com – 8 terabytes of searchable data
- Twitter – a billion queries a day

Page 35: Flax ovum search-across_the_enterprise

Why open source search?

- Flexible, extendable
- Powerful & scalable
- Lower cost, especially when planning for growth
- Commercial support available as necessary
- Freedom to innovate

Page 43: Flax ovum search-across_the_enterprise

Looking to the future

- More and more content including social media
- Multiple delivery platforms
- Search-powered applications
- Cloud computing
- More use of entity extraction & sentiment analysis

Search no longer a bolt-on, but a platform for innovation
Open source no longer an outsider, but the obvious choice

Page 44: Flax ovum search-across_the_enterprise

Thank you!

Any questions?

[email protected]
www.flax.co.uk/blog
+44 (0) 8700 118334
Twitter: @FlaxSearch