eGrove Systems - "SOLR" An Apache Product



Page 1: eGrove Systems - "SOLR" An Apache Product

SOLR - An Apache Product

777 Washington Road #5, Parlin, NJ 08859

Phone: 732 307 2655
Email: [email protected]

Page 2: eGrove Systems - "SOLR" An Apache Product

CONTENTS

INTRODUCTION

FEATURES

FUNCTIONS

ARCHITECTURE

PERFORMANCE

PROs & CONs

FUTURE TRENDS

WEBSITES USING SOLR


Page 3: eGrove Systems - "SOLR" An Apache Product

INTRODUCTION

Page 4: eGrove Systems - "SOLR" An Apache Product

INTRODUCTION

• A full-text search server based on Lucene
• XML/HTTP interfaces
• Loose schema to define types and fields
• Web administration interface
• Extensive caching
• Index replication
• Extensible open architecture
• Written in Java 5, deployable as a WAR
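To make the HTTP interface concrete, here is a minimal sketch of how a client might build a search request URL. The host, port, and core name (`localhost:8983`, `mycore`) are assumptions for illustration only, and the exact path depends on the SOLR version and deployment. The `q`, `rows`, and `wt` parameters are standard SOLR query parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10, wt="json"):
    """Return a /select URL for a simple keyword query.

    base_url is the (assumed) root of a SOLR core or webapp.
    """
    # q = query string, rows = number of results, wt = response writer (format)
    params = {"q": query, "rows": rows, "wt": wt}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local instance and core name, used only for illustration.
url = build_select_url("http://localhost:8983/solr/mycore", "title:lucene")
print(url)
```

The URL can then be fetched with any HTTP client, which is what makes the interface language-neutral.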


Page 5: eGrove Systems - "SOLR" An Apache Product


INTRODUCTION

Page 6: eGrove Systems - "SOLR" An Apache Product

FEATURES

Page 7: eGrove Systems - "SOLR" An Apache Product

• Advanced full-text search
• Optimized for high traffic volume
• Standards-based open interfaces: XML, JSON and HTTP
• Comprehensive administration interfaces
• Near-real-time indexing
• Extensible plugin architecture
• Multiple search indices
• Apache UIMA integration
• Rich document parsing
• Advanced storage options
• Performance optimization


Page 8: eGrove Systems - "SOLR" An Apache Product

FUNCTIONS

Page 9: eGrove Systems - "SOLR" An Apache Product

• XML/HTTP and JSON APIs
• Hit highlighting
• Faceted search and filtering
• Geospatial search
• Fast incremental updates and index replication
• Caching
• Replication
• Web administration interface
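As an illustration of how faceting, filtering, and hit highlighting are requested through the HTTP API, the sketch below assembles the query parameters for one search. The field names (`brand`, `category`, `inStock`, `name`) are hypothetical. The parameter names (`facet`, `facet.field`, `fq`, `hl`, `hl.fl`) are standard SOLR request parameters.

```python
from urllib.parse import urlencode, parse_qs


def faceted_query_params(query, facet_fields, filters=(), highlight_fields=()):
    """Build a list of (name, value) pairs for a faceted, highlighted search."""
    params = [("q", query), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on
        params.extend(("facet.field", field) for field in facet_fields)
    # Each fq narrows the result set without affecting relevance scoring
    params.extend(("fq", flt) for flt in filters)
    if highlight_fields:
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    return params


# Hypothetical product-search example.
params = faceted_query_params(
    "camera",
    facet_fields=["brand", "category"],
    filters=["inStock:true"],
    highlight_fields=["name"],
)
query_string = urlencode(params)
print(query_string)
```

A list of pairs is used instead of a dict because parameters such as `facet.field` and `fq` can legitimately appear more than once in a single request.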


Page 10: eGrove Systems - "SOLR" An Apache Product

ARCHITECTURE

Page 11: eGrove Systems - "SOLR" An Apache Product

ARCHITECTURE

Source: www.xaviermorera.com

Page 12: eGrove Systems - "SOLR" An Apache Product

PERFORMANCE

Page 13: eGrove Systems - "SOLR" An Apache Product

Performance Factors

• Schema design
  • Number of indexed fields
  • omitNorms
  • Term vectors
  • docValues

• Configuration
  • mergeFactor
  • Caches

• Indexing
  • Bulk updates
  • Commit strategy
  • Optimize

• Querying
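The indexing factors above (bulk updates and commit strategy) can be sketched as follows: documents are sent in large batches, and a commit is requested only once, with the final batch, rather than after every document. This is a minimal sketch that only prepares the requests and sends nothing. The update URL, the batch size, and the helper names `chunked` and `build_update_requests` are assumptions for illustration; the JSON update format and the `commit=true` parameter are as found in recent SOLR versions.

```python
import json
from itertools import islice


def chunked(docs, batch_size):
    """Yield successive lists of at most batch_size documents."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def build_update_requests(update_url, docs, batch_size=1000):
    """Return (url, json_body) pairs; only the last one asks for a commit."""
    batches = list(chunked(docs, batch_size))
    requests = []
    for index, batch in enumerate(batches):
        is_last = index == len(batches) - 1
        # Committing once at the end avoids paying the commit cost per batch
        url = f"{update_url}?commit=true" if is_last else update_url
        requests.append((url, json.dumps(batch)))
    return requests


# Hypothetical documents and endpoint, used only for illustration.
docs = [{"id": str(i), "title": f"Document {i}"} for i in range(2500)]
requests = build_update_requests(
    "http://localhost:8983/solr/mycore/update", docs, batch_size=1000
)
for url, body in requests:
    print(url, len(json.loads(body)))
```

Each pair would be sent as an HTTP POST with a JSON content type. Whether to commit explicitly at the end or to rely on server-side automatic commits is a deployment decision that this sketch does not settle.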


Page 14: eGrove Systems - "SOLR" An Apache Product

1. Memory testing: SOLR response time for a 1-million-volume index on 8 GB and 32 GB instances.

Source: www.hathitrust.org


Page 15: eGrove Systems - "SOLR" An Apache Product

2. SOLR index size analysis for a Twitter dataset.

Source: www.dzone.com


Page 16: eGrove Systems - "SOLR" An Apache Product

PROs & CONs

Page 17: eGrove Systems - "SOLR" An Apache Product

PROS

• Easy monitoring
• Highly scalable
• Fault tolerant
• Flexible and adaptable, with easy configuration
• Performance optimization
• Highly configurable and user-extensible caching
• Freely available
• Multilingual support
• Easy implementation and setup
• Less resource utilization

CONS

• A general lack of commitment towards SOLR
• Less attention paid to JVM settings and garbage collection
• Increased latency
• Occasional large I/O load to replicate large merges
• Complicated load balancing and management
• Reconfiguration required if the master is lost


Page 18: eGrove Systems - "SOLR" An Apache Product

FUTURE TRENDS

Page 19: eGrove Systems - "SOLR" An Apache Product

• Out-of-the-box (OOTB) simple faceted browsing
• Automatic database indexing
• Federated search
  • HA with failover
• Alternate output formats (JSON, Ruby)
• Highlighter integration
• Spellchecker
• Alternate APIs (Google Data, OpenSearch)


Page 20: eGrove Systems - "SOLR" An Apache Product

WEBSITES USING SOLR

Page 21: eGrove Systems - "SOLR" An Apache Product

• Whitehouse.gov
• Buy.com
• CNET
• Netflix
• Apple
• Disney
• eTrade
• NASA
• MTV
• Zappos
• AOL
• Digg


Page 22: eGrove Systems - "SOLR" An Apache Product

Thank You