Faceted browsing for ACL Anthology
Praveen Bysani
ACL Anthology
• a digital archive of research papers in CL and NLP
• contains over 20,100 papers
• free of cost
• archive for sister conferences and journals
Current browser
• direct and navigational search
• hard to navigate
• non-customized search
• non-sortable results
Faceted browsing
• Combination of navigational and direct search paradigms
• Facets are properties of information elements
• Access to organized information
• Ability to explore the collection in multiple dimensions through filters
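As a rough illustration of the idea above, facets can be treated as record properties that are both counted (to show the available choices) and used as filters (to narrow the collection). The sketch below is plain Python over made-up records; the field names (`year`, `venue`, `authors`) and the sample papers are assumptions for illustration, not the Anthology's actual schema.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
PAPERS = [
    {"title": "Paper A", "year": 2010, "venue": "ACL", "authors": ["Author One", "Author Two"]},
    {"title": "Paper B", "year": 2010, "venue": "EMNLP", "authors": ["Author Two"]},
    {"title": "Paper C", "year": 2011, "venue": "ACL", "authors": ["Author Three"]},
]


def facet_counts(records, field):
    """Count how many records carry each value of a facet field."""
    counts = Counter()
    for record in records:
        value = record[field]
        # Multi-valued facets (e.g. authors) contribute one count per value.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts


def apply_filters(records, filters):
    """Keep only records that match every selected facet value (logical AND)."""
    def matches(record, field, wanted):
        value = record[field]
        return wanted in value if isinstance(value, list) else value == wanted

    return [r for r in records if all(matches(r, f, w) for f, w in filters.items())]


# Narrow by two facets at once, then recompute counts on a narrowed set.
selected = apply_filters(PAPERS, {"venue": "ACL", "year": 2010})
remaining_years = facet_counts(apply_filters(PAPERS, {"venue": "ACL"}), "year")
```

Recomputing the counts after each filter is what lets a user see which values remain available in the other dimensions.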
Faceted Browsing
• Ruby on Rails (RoR) + Blacklight plugin
• Apache Solr
• Metadata from XML
• Blacklight customization for XML
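For a sense of what the search layer does underneath, a faceted request to Apache Solr can be sketched as a URL built from Solr's standard `q`, `fq`, `facet`, and `facet.field` parameters. This is only a sketch: the host, the core name `anthology`, and the field names `year` and `venue` are assumptions, and the deployed system issues its queries through Blacklight rather than through code like this.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, filters=None, rows=10):
    """Build a Solr select URL that asks for results plus facet counts."""
    params = [
        ("q", text),          # the direct-search part of the request
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),    # switch faceting on
    ]
    # One facet.field parameter per facet to be counted.
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value becomes a filter query (fq).
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field names.
url = build_facet_query(
    "http://localhost:8983/solr/anthology/select",
    "parsing",
    ["year", "venue"],
    {"venue": "ACL"},
)
```

The function only builds the URL; sending it and reading the `facet_counts` section of the JSON response is left out.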
Show view
Index View
More cookies..
• User feedback
  • Comment / Share / Like
  • Suggestions for correcting the metadata
• Ability to export bib in six formats
• Author pages
  • List of publications
  • Co-authors
• Third-party annotations
  • Automatically annotate articles with new metadata
• Anthology as a corpus
  • API to make the Anthology an object of study
• OAI compatible
  • Allows metadata harvesting
• Available at http://aclanthology.heroku.com/
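The author pages listed above (publications and co-authors per author) can be derived directly from paper metadata. A minimal sketch, assuming each paper record carries a `title` and a list of `authors`; the records and field names are hypothetical, and the real system would first need normalized author names.

```python
from collections import defaultdict

# Hypothetical records; names and titles are placeholders.
PAPERS = [
    {"title": "Paper A", "authors": ["Author One", "Author Two"]},
    {"title": "Paper B", "authors": ["Author Two"]},
    {"title": "Paper C", "authors": ["Author Three", "Author One"]},
]


def build_author_pages(papers):
    """Group publications and co-authors under each author name."""
    pages = defaultdict(lambda: {"publications": [], "coauthors": set()})
    for paper in papers:
        for author in paper["authors"]:
            page = pages[author]
            page["publications"].append(paper["title"])
            # Everyone else on the same paper is a co-author.
            page["coauthors"].update(a for a in paper["authors"] if a != author)
    return dict(pages)


pages = build_author_pages(PAPERS)
```

Grouping by the raw name string means two spellings of the same author would produce two pages.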
Challenges
• Normalizing the quality of Anthology metadata
• SIG information
  • YAML files
  • No identifiers provided
• DOI
  • From ACM
  • Changes in names of papers, authors
Similar works
ACL Author Network
• bibliometrics
ACL Search Bench
• Semantic search
Plans for the future
• A common data schema to integrate all
• Indexing the whole text data
• Range queries for year facet
• Exporting total volume bibliography
• Enriching author pages
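The planned range queries on the year facet map onto Solr's range faceting parameters (`facet.range`, `facet.range.start`, `facet.range.end`, `facet.range.gap`) together with a range filter query. The following is a sketch only: it assumes a numeric field called `year`, and the bounds and bucket size are arbitrary placeholders.

```python
from urllib.parse import urlencode


def year_range_params(field, start, end, gap, selected=None):
    """Build Solr parameters for bucketed year counts and an optional year filter."""
    params = [
        ("q", "*:*"),        # match everything; only the counts matter here
        ("rows", "0"),
        ("facet", "true"),
        ("facet.range", field),
        ("facet.range.start", str(start)),
        ("facet.range.end", str(end)),
        ("facet.range.gap", str(gap)),
    ]
    if selected is not None:
        low, high = selected
        # Inclusive range filter, e.g. year:[2000 TO 2010].
        params.append(("fq", f"{field}:[{low} TO {high}]"))
    return urlencode(params)


# Hypothetical bounds: ten-year buckets, filtered to 2000 through 2010.
query_string = year_range_params("year", 1970, 2020, 10, selected=(2000, 2010))
```

With a range filter in place, a user could select a span of years in one step instead of ticking individual year values.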