© 2013 LucidWorks
Edanz Journal Selector Case Study: a Prototype Based on Solr/Nutch/Hadoop
Liang SHEN @shenzhuxi
European Bioinformatics Institute
Edanz Journal Selector
a Prototype based on Solr/Nutch/Hadoop
English editing for scientists
Help scientists publish papers
Target journal?
Journal Selector
Open Access
PubMed
Journal TOCs
Created in 2009
21,498 journals from 1,677 publishers
Institute for Computer Based Learning, Heriot-Watt University
Partner
• Springer Metadata API: provides metadata for over 5 million online documents
• Springer Open Access API: provides metadata, full-text content, and images for over 80,000 open access articles
Open Source Stack
• Infrastructure: Amazon Web Services
• Data processing: Hadoop/Hive
• Index: Solr/Lucene
• Web service: Drupal
• Secret Sauce/Custom Works
Infrastructure: Amazon EC2
Data processing
[Diagram labels: APIs, Feeds, Web Pages, HDFS, Index]
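As a rough illustration of the feed-to-index path in this diagram, the sketch below turns items from an RSS table-of-contents feed into flat records of the kind a search index could ingest. The sample feed, the field names, and the `feed_to_docs` helper are all hypothetical; the talk does not show the actual processing code, which ran on Hadoop/Hive.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real journal TOC feed would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example Journal</title>
<item><title>First article</title><link>http://example.org/a1</link>
<description>Abstract one.</description></item>
<item><title>Second article</title><link>http://example.org/a2</link>
<description>Abstract two.</description></item>
</channel></rss>"""


def feed_to_docs(xml_text):
    """Convert RSS items into flat dicts (assumed field names, not from the talk)."""
    channel = ET.fromstring(xml_text).find("channel")
    journal = channel.findtext("title", default="")
    docs = []
    for item in channel.iter("item"):
        docs.append({
            "id": item.findtext("link", default=""),       # article URL as the key
            "journal": journal,                             # feed title as journal name
            "title": item.findtext("title", default=""),
            "abstract": item.findtext("description", default=""),
        })
    return docs


docs = feed_to_docs(SAMPLE_FEED)
```

Records in this shape could then be written to HDFS for batch processing or posted to an index; either step is outside what this sketch covers.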
Web service
<script src="http://global.js.widget.eja.hk/ja/edanz_ja/w.js"></script>
Embeddable web widget
Split Index for performance
The index can be split by a facet field without changing the ranking, provided every query always includes that facet field.
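One way to read this slide: if every query carries a value for the facet field, documents can be partitioned by that value and each query only needs to visit the matching partition. The sketch below illustrates that routing under stated assumptions; the field name `subject`, the core names, and the base URL are hypothetical and not taken from the talk.

```python
from urllib.parse import urlencode

# Hypothetical mapping from facet value to the Solr core holding that slice.
CORE_BY_SUBJECT = {
    "biology": "journals_biology",
    "chemistry": "journals_chemistry",
}

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance


def build_select_url(text, subject, rows=10):
    """Return the select URL for the single core that holds `subject`."""
    try:
        core = CORE_BY_SUBJECT[subject]
    except KeyError:
        raise ValueError(f"no index partition for subject {subject!r}")
    # The filter query is kept so the same request also works on an unsplit index.
    params = urlencode({"q": text, "fq": f'subject:"{subject}"', "rows": rows})
    return f"{SOLR_BASE}/{core}/select?{params}"


url = build_select_url("protein folding", "biology")
```

Because each query touches one smaller core, the cost per query depends on the size of that partition rather than on the whole collection.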
@shenzhuxi
Thanks!
Questions?