MontySolr: Embedding CPython in Solr
Roman Chyla, [email protected], May 26, 2011
Thursday, May 26, 2011
Why should I care?
- Our challenge is to connect Python and Java
  - Without compromises
- We created the MontySolr extension
  - Robust, tested (will be used by our system)
  - But works for any Python application (e.g. Django)
  - And for any C/C++ app that Python understands!
  - Open source (GPL v2)
- Try it out!
  - https://github.com/romanchyla/montysolr
Outline
‣ Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
- Wrap-up
CERN
- European Organization for Nuclear Research
  - Switzerland, Geneva
- The largest laboratory for High Energy Physics
  - Home to the Large Hadron Collider
  - 40-50K HEP scientists worldwide
SPIRES
- Stanford Linear Accelerator Center (SLAC)
  - High-Energy Physics Literature Database
  - Started December 1991
- The first web server outside Europe/CERN
  - The first database on the web
Invenio
- Integrated digital library software behind INSPIRE
- Used by very large institutional repositories
  - http://repositories.webometrics.info/toprep_inst.asp
- Customizable virtual collections
- Flexible management of metadata
  - 3 000 authors per article
- Powerful search engine
  - Incl. citation map analysis
- Written in Python (since 2001)
  - 290 000 lines of code
Outline
- Context
‣ The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
- Wrap-up
The Challenge
- HEP scientific community
  - Searches are metadata-oriented
- However, fulltexts are changing the situation
  - And we want to provide an even better service
  - Bigger volumes of data
  - NLP processing
  - Semantic search
The Challenge
Invenio
Query: supersymmetry AND author:ellis
fulltext:supersymmetry
IDs: 1;2;3;9....
1-6M IDs
1. Only IDs, no score = no ranking
2. Score merging difficult (if available)
3. Push IDs? (e.g. faceting)
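The merging problem listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample IDs and scores are hypothetical and do not come from Invenio or Solr. It shows that a plain AND over two ID lists yields a set with no ranking information, and that ranking is only possible if one side also supplies scores.

```python
def merge_hits(metadata_hits, fulltext_hits):
    """AND-combine two result sets that consist of record IDs only."""
    # With bare IDs the only sensible order is by ID; relevance is lost.
    return sorted(set(metadata_hits) & set(fulltext_hits))


def rank_by_fulltext_score(metadata_hits, fulltext_scores):
    """AND-combine IDs with a {record_id: score} mapping, best score first."""
    common = set(metadata_hits) & set(fulltext_scores)
    # Ties on score are broken by ID so the result is deterministic.
    return sorted(common, key=lambda rec_id: (-fulltext_scores[rec_id], rec_id))


if __name__ == "__main__":
    author_hits = [1, 2, 3, 9]                 # hypothetical metadata hits
    fulltext = {2: 0.4, 9: 0.9, 10: 1.0}       # hypothetical fulltext scores
    print(merge_hits(author_hits, list(fulltext)))
    print(rank_by_fulltext_score(author_hits, fulltext))
```

The second function only works when the fulltext engine returns scores, and it ignores any ranking signal from the metadata side, which is the "score merging difficult" point.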
What is the “best” solution?
- We love Python...
  - ...and our applications are written in Python...
- But what if Solr is the master search engine?
  - Merge results inside Solr?
  - Typical size: 1-10 million IDs
  - Expected latency: 1-2 s
- What we want to achieve:
  - Fast transfer of hits from Invenio to Solr
  - Leverage the power of both (no compromises)
  - Developer-friendly integration, simplicity
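To get a feeling for why the transfer format matters at this scale, the sketch below compares two ways of shipping a list of record IDs: as ';'-separated text (the form shown in the earlier diagram) and as a packed array of native integers. The function name and the sample ID range are assumptions made for illustration; the numbers it prints are not measurements from this talk.

```python
from array import array


def payload_sizes(ids):
    """Return (bytes as ';'-separated text, bytes as packed native ints)."""
    as_text = ";".join(str(i) for i in ids).encode("ascii")
    as_packed = array("i", ids).tobytes()
    return len(as_text), len(as_packed)


# 100 000 hypothetical seven-digit record IDs
ids = list(range(1000000, 1000000 + 100000))
text_bytes, packed_bytes = payload_sizes(ids)
print(text_bytes, packed_bytes)
```

Text also has to be parsed on the receiving side, whereas a packed array maps directly onto an integer array.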
Outline
- Context
- The Challenge
‣ Key components
- Available technologies
- Our approach
- Evaluation
- Demonstration
- Wrap-up
To embed Solr (in Java app)
- Your app simulates a Java web container?
  - Use EmbeddedSolrServer
- It knows nothing about Java servlets?
  - Use the DirectSolrConnection class
- Maybe we are too lazy?
  - Embed the web container (in my case Jetty)
  - Seemed strange (webserver inside webserver)
  - ... but it worked well
To use Solr in non-Java app
- Solr is already usable via HTTP requests, but we need something else here...
- Remote objects/calls?
  - Pyro, execnet, CORBA, SOAP...
  - or simply pipes?
- Access Python from Java?
  - Jython
  - JEPP
- Access Java from Python?
  - JPype
  - JCC
Jython?
- Implementation of Python in 100% Java
  - Both Java and Python code
  - Truly multithreaded
- C modules will not work
  - but see http://bit.ly/iTRYbb
- Slower than CPython
JEPP - Java Embedded Python
- Python code runs inside the Python interpreter
- Embeds the CPython interpreter in Java via the Java Native Interface (JNI)
- http://jepp.sourceforge.net/
  - recently updated (27-Jan)
  - but JCC is more active
JCC
- Embeds JVM in Python
- C++ code generator
- C++ object interface wraps a Java library
- C++ wrappers conform to Python's C type system
- Result: complete Python extension module
To use Solr in non-Java app
Jython | JCC | JEPP
Python C modules
Speed
No code changes
Access from Python
Access from Java
✓ ✓
✓ ?
✓ ✓
✓ ✓
✓ ... ✓
The first try
[Diagram: Invenio; JCC; Solr]
The devil is in the details...
GIL - Global Interpreter Lock
Unfortunately, a Python web app is not like a Java one...
We can have 200 threads, but only 4 will run at a time...
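The effect of the GIL can be tried out with a small experiment. The sketch below is an illustration only; the function names are made up for this example. It runs the same CPU-bound job in several threads. On a standard CPython build with the GIL, every thread returns the correct result, but only one thread executes Python bytecode at any moment inside the process, so the total running time is roughly the same as running the jobs one after another.

```python
import threading


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_threads(n_threads, limit):
    """Run count_primes(limit) once per thread and collect the results."""
    results = [None] * n_threads

    def worker(slot):
        results[slot] = count_primes(limit)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_in_threads(4, 1000))
```

Timing `run_in_threads(4, limit)` against four sequential calls of `count_primes(limit)` with a larger limit makes the lack of speed-up visible.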
Fortunately, a solution exists
- JCC can embed Python inside Java
  - Special thanks to Andi Vajda! (JCC creator)
- We write 'empty' classes in Java ...
  - ... and implement them in Python
[Diagram: Python w/ Java inside; Java w/ Python inside]
The second try
[Diagram: Invenio frontend; XML; Solr w/ Invenio (backend); JCC]
Implementing the bridge
- Special Java class
  - With method pythonExtension()
- Native method pythonDecRef()
  - JCC provides its implementation
- And a number of other native methods
  - These will be implemented using Python
- Like writing JNI Java/C code but without compilation...
MontySolr extension
- JCC has great potential, but also added complexity...
- So the MontySolr project was born
  - Modules must be built in shared mode
  - JCC dynamic library loaded and started from the main thread
  - Simple mechanism of the Python bridge and message
  - Configurable handlers on the Python side
  - Secured dereferencing of the native objects
  - Threading on the Java side
  - Multiprocessing on the Python side
  - Easy ant targets (compilation) ...
Hello World - Java part
    public class MontySolrBridge extends BasicBridge implements PythonBridge {
        // pointer to the Python object that implements the native methods
        private long pythonObject;
        public void pythonExtension(long pythonObject) {
            this.pythonObject = pythonObject;
        }
        public long pythonExtension() {
            return this.pythonObject;
        }
        // drop the Python reference when Java collects this object
        public void finalize() throws Throwable {
            pythonDecRef();
        }
        public native void pythonDecRef();
        public void sendMessage(PythonMessage message) {
            PythonVM vm = PythonVM.get();
            vm.acquireThreadState();
            try {
                receive_message(message);
            } finally {
                // release the thread state even if the Python side raises
                vm.releaseThreadState();
            }
        }
        // implemented in Python
        public native void receive_message(PythonMessage message);
    }
Hello World - Python part
    from montysolr import MontySolrBridge

    class SimpleBridge(MontySolrBridge):
        def __init__(self):
            super(SimpleBridge, self).__init__()

        def receive_message(self, message):
            # called from Java through the native receive_message() method
            query = message.getParam('query')
            message.setResults('Hello world!')
            print('Python received from Java: %s' % query)
Example - running MontySolr
- Java side
  - JRE (32/64 bit)
  - Standard Solr/Lucene jars
  - JCC dynamic library
- Python side
  - Python interpreter (32/64 bit)
  - 4 Python modules (jcc, solr, lucene, montysolr)
- In the main thread
  - First we load JCC
  - Then start the Python interpreter ...
  - ... load Python handlers
Solr as search service
[Diagram: Invenio frontend; XML; Solr w/ Invenio (backend); JCC]
Example
[Diagram: Solr; MyCustomHandler; refersto:author:ellis]
Example - Solr custom handler
    // build a message addressed to the Python handler
    PythonMessage msg = MontySolrVM.INSTANCE
        .createMessage("perform_search")
        .setSender("Invenio")
        .setParam("query", "refersto:author:ellis");
    // hand the message over to Python
    MontySolrVM.INSTANCE.sendMessage(msg);
    Object result = msg.getResults();
    if (result != null) {
        int[] hits = (int[]) result;
    }
Example - JNI connection
[Diagram: Solr; MyCustomHandler; refersto:author:ellis; PythonBridge; Invenio wrappers]
Example - Python side
    # search time - called from Java
    def perform_search(message):
        query = message.getParam("query")
        hits = call_real_search(query)
        # cast Python list into Java array
        message.setResults(JArray_ints(hits))

    # handler is made 'visible' at startup
    SolrpieTarget('Invenio:perform_search', perform_search)
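The idea of configurable handlers that are looked up by name can be tried out without Java at all. The following is a pure-Python mock and not MontySolr code: the classes `Message` and `HandlerRegistry`, and the tiny in-memory index, are hypothetical stand-ins that only mimic the getParam/setResults/getResults calls seen in the slides.

```python
class Message(object):
    """Stand-in for the message object passed between Java and Python."""

    def __init__(self, recipient, **params):
        self.recipient = recipient
        self.params = params
        self.results = None

    def getParam(self, name):
        return self.params.get(name)

    def setResults(self, results):
        self.results = results

    def getResults(self):
        return self.results


class HandlerRegistry(object):
    """Maps names such as 'Invenio:perform_search' to Python callables."""

    def __init__(self):
        self._targets = {}

    def register(self, name, func):
        self._targets[name] = func

    def dispatch(self, message):
        handler = self._targets.get(message.recipient)
        if handler is None:
            raise KeyError("no handler registered for %r" % message.recipient)
        handler(message)
        return message


def perform_search(message):
    # A real handler would call the search engine; this one uses a fake index.
    fake_index = {"refersto:author:ellis": [1, 2, 3, 9]}
    message.setResults(fake_index.get(message.getParam("query"), []))


registry = HandlerRegistry()
registry.register("Invenio:perform_search", perform_search)
reply = registry.dispatch(Message("Invenio:perform_search",
                                  query="refersto:author:ellis"))
print(reply.getResults())
```

Because the caller only knows the handler's name, handlers can be swapped or added on the Python side without touching the calling code.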
Example
[Diagram: Solr; MyCustomHandler; refersto:author:ellis; PythonBridge; Invenio wrappers; Invenio (x4)]
Example - Java side again
    PythonMessage msg = MontySolrVM.INSTANCE
        .createMessage("perform_search")
        .setSender("Invenio")
        .setParam("query", "refersto:author:ellis");
    MontySolrVM.INSTANCE.sendMessage(msg);
    // the results set by the Python handler are now available
    Object result = msg.getResults();
    if (result != null) {
        int[] hits = (int[]) result;
    }
Solr as search service
[Diagram: Apache webserver; XML; Solr w/ Invenio (backend); JCC; Invenio (x2)]
Outline
- Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
‣ Evaluation
- Wrap-up
[Figure: Memory and garbage collection]
[Figure: Comparing speed and load...]
[Figure: The effect of cache]
Robust?
- Extensive siege tests show very good performance and stability under high load
  - 100-200 users, complex searches
  - 50 concurrent users, citation analysis
  - JCC incurs small overhead
- We detected no memory leaks
  - The same as dbpedia.org
- But watch out for errors in C
  - An error in a C module brings down the whole JVM
  - (errors in a pure Python module can be handled)
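The last point, that errors in pure Python can be handled, might look like the sketch below. It is an assumption-laden illustration, not MontySolr code: messages are plain dicts here and all function names are made up. An exception raised by a handler is caught and recorded on the message instead of propagating to the caller. A crash inside a C extension (for example a segmentation fault) cannot be caught this way.

```python
import traceback


def call_handler_safely(handler, message):
    """Run a handler; on failure store the traceback in the message."""
    try:
        handler(message)
        return True
    except Exception:
        message["error"] = traceback.format_exc()
        return False


def broken_handler(message):
    message["results"] = 1 / 0  # raises ZeroDivisionError


def working_handler(message):
    message["results"] = [1, 2, 3, 9]


failed = {"query": "refersto:author:ellis"}
ok = {"query": "refersto:author:ellis"}
print(call_handler_safely(broken_handler, failed), "error" in failed)
print(call_handler_safely(working_handler, ok), ok["results"])
```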
Easy to develop/maintain?
- Added complexity
  - Java in the toolbox
  - Need to compile C++ extensions
  - Python/OS version dependencies
- For this we get
  - Easy integration with Invenio
  - The best of two applications
  - A lot of features for free
  - And we can control Solr from Python!
Outline
- Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
‣ Wrap-up
Wrap-up
- Our challenge was to connect two different languages/systems
  - And we wanted to get the best of the two...
  - So we had to plug Python into Solr
  - And now our Solr knows citation analysis!
- We created the MontySolr extension
  - Robust, tested (will be used by INSPIRE)
  - Works for any Python application (e.g. Django)
  - And for any C/C++ app that Python understands!
  - Free software license
- Try it out! Help us make it better!
  - https://github.com/romanchyla/montysolr
Questions?
- MontySolr
  - https://github.com/romanchyla/montysolr
- Roman Chyla
  - Fellow, CERN Scientific Information Service
  - [email protected] @rchyla
  - https://svnweb.cern.ch/trac/rcarepo
Additional information
Links
- Invenio platform
  - http://invenio-software.org/
- INSPIRE Digital library
  - http://inspirebeta.net/
- Diagrams of JCC and JEPP
  - Andreas Schreiber: Mixing Java and Python
  - http://www.slideshare.net/onyame/mixing-python-and-java
- On Jython C Extension API
  - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
- Demo of a running service:
  - http://insdev01.cern.ch
#1 - How to embed Solr (standard)
- solr.client.solrj.embedded.EmbeddedSolrServer
#2 - How to embed Solr (simplified)
- solr.servlet.DirectSolrConnection
  - like previous, but simpler
  - all the queries are sent as strings, everything is just a string
  - very flexible and probably suitable for quick integration
#3 - Example of a Solr custom handler
#4 - Example Python handler