BioNames
These are some notes to kickstart discussion about "BioNames". For background on the project see the original proposal http://dx.doi.org/10.6084/m9.figshare.92091
Background
My goal with BioNames is "evidence based taxonomy". Each (animal) name should be linked to the publication that made that name available. Ideally, the user should be able to read that publication in situ, but an external link (e.g., DOI to paper behind a paywall) is also acceptable. In the short term this enables users to find the original description of a taxon; in the long term it will facilitate data mining these descriptions for trait data. By linking names to taxonomic concepts, users could also see maps, phylogenies, and images associated with the taxonomic names.
In many ways the underlying goal of BioNames is not "yet another names database", but rather a way to visualise a large mapping of names to literature. This mapping is incomplete, partly because the original source database misses many links between names and literature. BioNames is partly a tool to help find these (e.g., by displaying articles that contain the taxon name, so that users can discover the original publications themselves).
Principles
Everything is a first class citizen
A key principle is that every object we care about gets equal status. Most taxonomic databases focus, say, on the name, and the publication is relegated to a dumb text string. I want the publication to have the same status as the name. It has its own page which displays what we know about that publication, with (internal) links to the containing journal, the published names, the author(s), etc. The experience should be like reading a Wikipedia page, where each important concept in the article on a topic is itself linked to another Wikipedia page. The user can explore names, publications, journals, authors, all on the same site.
Dashboard
I like the idea of having a Google Analytics-style dashboard for displaying information. Obvious examples are use of a name over time (like http://synynyms.com/ but with the ability to click on a time period and see the corresponding publications), the distribution of publications over time for an author, etc.
Multiple entry points
There are multiple ways to get "into" the data. Search is the default, but if we have an external identifier (say, a DOI) we can append it to a base URL and get the corresponding item. For example, the DOI 10.1016/j.ode.2007.11.004 can be used like this:
http://bionames.org/id/10.1016/j.ode.2007.11.004
to get the database record for a publication.
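A minimal sketch of this identifier-based entry point, assuming the base URL from the DOI example and that the identifier is appended verbatim. The function name is hypothetical and not part of any existing BioNames code.

```python
# Base URL taken from the DOI example in these notes.
BASE_URL = "http://bionames.org/id/"


def url_for_identifier(identifier: str) -> str:
    """Append an external identifier (e.g., a DOI) to the base URL.

    Assumption: the identifier is used as-is, with only surrounding
    whitespace removed, mirroring the unencoded DOI in the example.
    """
    return BASE_URL + identifier.strip()


if __name__ == "__main__":
    print(url_for_identifier("10.1016/j.ode.2007.11.004"))
```

Other identifier types would presumably follow the same pattern, though the notes only show a DOI.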
What's in BioNames?
First release
The first release needs to have taxonomic names and literature, with the ability to search by name, citation, and to display full text of a publication (where it is available and open). This basically requires a simple interface to the underlying database.
Second release
The second step is to add taxonomic concepts so users can have a taxonomy to browse to help navigation and discovery. This would also enable the addition of concept-specific stuff such as maps (GBIF), sequence-based phylogenies (NCBI), and images (EOL). This would make the interface much friendlier and more accessible, but requires that I build a mapping between taxon names and taxon concepts (this is in progress). It also needs tools to display maps and phylogenetic trees. I have code for this.
Stories
Feature: Tell me about a taxon name
  In order to learn more about a species
  Users should be able to see a summary of the name, its authorship, and thumbnail of the relevant publication.

As a result, they have a "bibliography" for the taxon (rather like the existing feature in BHL) but also have the "key" publication, the original description, singled out. They can then explore those publications (wherever possible these will be in the same database).

  Scenario: See original description
    Given that we know the original description of a name
    When I visit the page for that name
    Then I should see the original description clearly marked
    And be able to read it with minimal effort
    Because that's the whole point of this project!

  Scenario: See a species with an unknown original description
    Given that we don't know the original description of a name
    When I visit the page for that name
    Then I should see clearly that we don't know the original description
    And I should see all the literature we do know about
    So that I might be able to find the OD in the existing literature

  Scenario: List all literature known about a name
    Given that we have multiple articles containing a name
    When I visit the page for that name
    Then I should see a list of all the literature containing that name
    And it should be ordered chronologically
    So I can attempt to find an OD if BioNames doesn't have one marked

  Scenario: See a classification
    When I visit the page for a name
    Then I should see it in a classification
    So I know where it lies in the tree of life, and what it might be related to

  Scenario: Navigate a classification
    When I visit the page for a name
    Then I should be able to click higher order taxa
    So I can browse around the tree of life

  Scenario: Link to other projects
    Given a name has links out to EoL, ALA, and an LSID
    When I visit the page for a name
    Then I should see a link to its taxon page in EoL
    And a link to its page in ALA
    And a link to resolve its LSID
    So that I can discover more information about this taxon on other sites
    And make Cyndy happy

  Scenario: Link to synonyms for this name
    Given a name that has synonyms
    When I visit the page for this name
    Then I should see links to its synonyms
    So that I can find more information about this critter
NCBI classification and trees
GBIF classification and maps
Feature: Tell me about a genus
  In order to explore the tree of life
  Users should be able to see and browse all the species in a genus

  In order to find missing data
  Curators should be able to visually see which species do not have an OD

  Scenario: See species
    When I visit a genus page
    Then I should see all the species in that genus
    So I know what's available to me

  Scenario: Browse species
    When I click on a species
    Then I should be taken to that species' page
    So I can see the OD and other information on the name

  Scenario: Visualize missing data
    As a curator
    When I visit a genus page
    Then I should see the OD or lack thereof for all species
    So I can spot species that still need their ODs identified

  Scenario: See a classification
    When I visit the page for a genus
    Then I should see it in a classification
    So I know where it lies in the tree of life, and what it might be related to

  Scenario: Navigate a classification
    When I visit the page for a genus
    Then I should be able to click higher taxa
    So I can browse around
Feature: Tell me about a family or higher-order taxon
  In order to explore the tree of life
  Users should be able to see and browse the children of this taxon

  In order to find missing data
  Curators should be able to visually see what percentage of children have ODs

  Scenario: See children
    When I visit a high-level taxon page
    Then I should see all the children of that taxon
    So I know what's available to me

  Scenario: Browse child
    When I click on a child
    Then I should be taken to that child's page
    So I can see the information about the child

  Scenario: Visualize missing data
    As a curator
    When I visit a high-level taxon page
    Then I should see a visual representation of how many species in each child have ODs
    So I can spot species that still need their ODs identified

  Scenario: See a classification
    When I visit the page for a higher taxon
    Then I should see it in a classification
    So I know where it lies in the tree of life, and what it might be related to

  Scenario: Navigate a classification
    When I visit the page for a higher taxon
    Then I should be able to click higher taxa
    So I can browse around
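The "Visualize missing data" scenarios for genera and higher taxa both come down to counting how many species have a known OD. A rough sketch of that calculation follows, assuming each species maps to either a publication identifier or `None`. The data shape, the function name, and the placeholder names are all hypothetical.

```python
from typing import Dict, Optional


def od_coverage(species_to_od: Dict[str, Optional[str]]) -> float:
    """Return the percentage of species with a known original description."""
    if not species_to_od:
        return 0.0  # no species, so nothing to report
    known = sum(1 for od in species_to_od.values() if od is not None)
    return 100.0 * known / len(species_to_od)


if __name__ == "__main__":
    # Placeholder names and identifiers, not real records.
    children = {
        "Aus": {"Aus bus": "pub-1", "Aus cus": None},
        "Dus": {"Dus eus": "pub-2"},
    }
    for child, species in children.items():
        print(child, od_coverage(species))
```

A per-child percentage like this could feed whatever visual representation the page ends up using.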
Feature: Tell me about a publication
  In order to read about a species, including its OD
  Users should be able to see the text of the publication in their browser and be able to navigate to the journal, names contained, and author

  Scenario: Read a publication
    Given a publication with an accessible digital copy
    When I visit the page for a publication
    Then I should see the scans in my browser
    So I can read the text and do my work

  Scenario: Visit a publication behind a paywall
    Given a publication locked up behind a paywall
    When I visit the page for a publication
    Then I should see a link to where I can view the text
    So I can read the text

  Scenario: See all names in a publication
    When I am reading a publication with species names mentioned
    Then I should see all the names aggregated into one place
    And I should be able to click on them to visit that name page
    So I can explore species related to ones I was looking at

  Scenario: See the name originally described
    Given an original description
    When I'm reading that publication
    Then I should see the originally described name
    And it should be specially marked
    Because ODs are ultimately what BioNames is all about

  Scenario: View citation information
    When I'm reading a publication
    Then I should be able to copy and paste a citation to it
    And the citation is available in varying format styles
    So I'm encouraged to cite it in my work

  Scenario: Visit external resources
    Given a publication with external identifiers
    When I'm viewing that publication
    Then I should be able to click the identifiers and be taken out to their respective projects (BioStor, BHL, DOI resolver, etc.)
    So that our collaborators get more traffic

  Scenario: Browse all pubs by the authors
    When I'm viewing a publication
    Then the authors should be listed
    And I should be able to click them to see their other pubs
    So that I can explore related works and learn something new

  Scenario: Browse all pubs in journal
    When I'm viewing a publication
    Then I should see the journal it was published in
    And I should be able to click on it to see its other pubs
    So that I can get a big overview of potentially related publications

Feature: Tell me about an author
  In order to discover more work by an author
  Users should be able to see and explore a summary of an author's output, e.g., publications and taxon names.

  Scenario: See all articles by an author
    When I visit an author's page
    Then I should see all the articles published by them
    So that I can explore publications relevant to my interests

  Scenario: Group articles by Journal
    When I visit an author's page
    Then I should be able to slice and dice their publications
    So that I can find what I'm looking for, or what may interest me

  Scenario: See all taxa mentioned by an author
    When I visit an author's page
    Then I should see all the taxa mentioned by them in print
    So that I can see their area of expertise and explore taxa related to those I'm interested in
(you can see this live at http://iphylo.org/~rpage/afd/author/Distant%2CW+L). A nice thing about treemaps is you can quickly see gaps (e.g., publications that haven't been linked to their digital equivalent).
Feature: Tell me about a journal
  In order to get a big overview of publications
  Users should be able to browse articles by journal

  Scenario: See all the articles published in a journal
    When I visit a journal page
    Then I should see all the articles published in it
    So that I can explore publications relevant to my interests

  Scenario: Refine the articles list
    When I visit a journal page
    Then I should be able to refine the list of articles by date and so forth
    So that I can reduce the number of things on screen to just what I may be interested in

  Scenario: Visualize publication trends
    When I visit a journal page
    Then I should be able to see the publication trends through time
    So that I have a rough idea of whether this journal is relevant to what I'm looking for

Feature: Full search
The user treats the site like Google and just types in a query (e.g., "homonym replacement name"). They get a list of publications that match the query, e.g.:
The search results show a thumbnail of the publication, and also the names in each publication (as "tags"). You can go to the publication, or click on the name.
Feature: Give me an overview
There are various overviews that could be displayed to give people insight into the state of taxonomy. For example, here is a bubble diagram of the numbers of names published in each journal, showing how Zootaxa dominates the landscape:
We could also display timelines for taxa, e.g., within a genus, what is the cumulative rate of description of species:
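One way such a cumulative timeline might be computed is sketched here, assuming we have the year of description for each species in a genus. The function name and the sample years are made up for illustration.

```python
from collections import Counter
from itertools import accumulate
from typing import Iterable, List, Tuple


def cumulative_descriptions(years: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (year, cumulative species count) pairs in chronological order."""
    counts = Counter(years)          # species described per year
    ordered = sorted(counts)         # distinct years, oldest first
    totals = accumulate(counts[year] for year in ordered)
    return list(zip(ordered, totals))


if __name__ == "__main__":
    # Hypothetical description years for the species of one genus.
    for year, total in cumulative_descriptions([1900, 1900, 1905, 1910]):
        print(year, total)
```

The resulting pairs can be handed directly to a charting library to draw the timeline.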
Feature: Export data to EoL
We will export our data, including the original descriptions for species, to the Encyclopedia of Life using a Darwin Core Archive (DwCA).
Technology
Database
The database is the JSON-based document store CouchDB. The data will be hosted at http://cloudant.com, which means I don't have to manage a server, and interested users could grab the entire database.
The actual data is currently in a MySQL database where it is being cleaned and linked. I have scripts for generating JSON documents from this database, which are then sent to the CouchDB database.
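A rough sketch of the row-to-document step, not the actual scripts. It assumes a MySQL row has already been fetched as a dictionary with an `id` column, and it uses CouchDB's standard `PUT /{db}/{docid}` call to store the document. The server URL, database name, and column names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

COUCH_URL = "http://localhost:5984"  # hypothetical server address
DB_NAME = "bionames"                 # hypothetical database name


def row_to_document(row: dict) -> dict:
    """Turn a relational row into a JSON document, dropping NULL columns."""
    doc = {key: value for key, value in row.items() if value is not None}
    doc["_id"] = str(doc.pop("id"))  # CouchDB document ids are strings
    return doc


def build_put_request(doc: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that stores the document."""
    url = f"{COUCH_URL}/{DB_NAME}/{quote(doc['_id'], safe='')}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    row = {"id": 42, "title": "An example article", "doi": None}
    request = build_put_request(row_to_document(row))
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. For a large export, CouchDB's bulk document endpoint is probably a better fit than one request per document.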
Web interface
The prototypes are currently a mixture of PHP and JavaScript. The bulk of the database querying comprises simple PHP wrappers around calls to the CouchDB API. Very crude HTML templates are then populated with JavaScript functions which generate the web pages.
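The prototype wrappers are written in PHP and are not shown in these notes. To illustrate the kind of CouchDB API call such a wrapper would make, here is a Python sketch that builds the URL for a view query. The design document name, view name, and key are hypothetical; the URL layout and the JSON-encoded `key` parameter follow CouchDB's view API.

```python
import json
from urllib.parse import quote, urlencode


def view_url(base: str, db: str, design_doc: str, view: str, key) -> str:
    """Build the URL for querying a CouchDB view by key.

    CouchDB expects the key parameter to be JSON-encoded, so a string
    key is sent wrapped in double quotes.
    """
    query = urlencode(
        {"key": json.dumps(key), "include_docs": "true"},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"{base}/{db}/_design/{design_doc}/_view/{view}?{query}"


if __name__ == "__main__":
    # Hypothetical design document ("names") and view ("by_name").
    print(view_url("http://localhost:5984", "bionames", "names", "by_name", "Aus bus"))
```

A wrapper would fetch this URL and pass the returned JSON rows to the page templates.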