Free author registration Thomas Krichel LIU & НГУ 2008-12-11.

free author registration

Thomas Krichel, LIU & НГУ

2008-12-11

me today

• I am working for the Palmer School of Library and Information Science in the College of Information and Computer Science of the C.W. Post Campus of Long Island University in Brookville, NY, U.S.A., and for the Division of Information Systems in the Faculty of Information Technology at Novosibirsk State University in Novosibirsk, Russia.

• I do a lot of programming & sysadmin.

formerly

• I am a trained economist.

• My main claim to fame is the creation and coordination of the RePEc digital library for economics at http://repec.org.

• My main area of work within RePEc is the NEP: New Economics Papers current awareness service. That is a totally different topic from today's.

RePEc now

• It is a collection of data about academic economics.

• The bulk of the data is data about documents.

• And the bulk of that is
  – published article data
  – working paper data

• But the interesting data is the author, institution and usage data.

RePEc principle of 1997

• many archives
  – archives offer metadata about digital objects (mainly working papers & journal articles)

• one database
  – the data from all archives forms one single logical database

• many services
  – users can access the data through many services
  – providers of archives offer their data to all services
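
The three principles can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each archive hands over its records as dictionaries carrying a unique `handle` field, as in the ReDIF templates RePEc uses. The function names, the case-insensitive comparison of handles and the second archive with its record are all hypothetical. The point is that merging happens once, and every service then reads the same logical database.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def build_database(archives: Dict[str, Iterable[Record]]) -> Dict[str, Record]:
    """Merge the records of many archives into one logical database keyed by handle."""
    database: Dict[str, Record] = {}
    for archive_name, records in archives.items():
        for record in records:
            # Assumption: handles are compared case-insensitively.
            handle = record["handle"].lower()
            if handle in database:
                raise ValueError(f"duplicate handle {handle!r} from archive {archive_name!r}")
            database[handle] = dict(record, archive=archive_name)
    return database


def title_list_service(database: Dict[str, Record]) -> List[str]:
    """One of many possible services: all of them read the same merged database."""
    return sorted(r["title"] for r in database.values() if "title" in r)


archives = {
    "sur": [{"handle": "repec:sur:surrec:9601",
             "title": "dynamic aspect of growth and fiscal policy"}],
    # Hypothetical second archive and record, for illustration only.
    "xyz": [{"handle": "repec:xyz:wpaper:0001", "title": "a hypothetical working paper"}],
}
database = build_database(archives)
print(title_list_service(database))
```

A second service, for example one that counts records per archive, would read the same `database` dictionary instead of contacting each archive again.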

RePEc is based on 900+ archives

• Blackwell
• MPRA
• DEGREE
• S-WoPEc
• NBER
• CEPR
• Taylor & Francis
• US Fed in Print
• IMF
• OECD
• MIT
• University of Surrey
• CO PAH
• Elsevier

to form a 630k item dataset

254,000 working papers

370,000 journal articles

1,600 software components

4,200 book and chapter listings

17,600 author records

10,800 institutional contact listings

RePEc is used in many services

• EconPapers
• NEP: New Economics Papers
• Google Scholar
• RePEc Author Service
• Twitter bulk posting (planned)
• LogEc
• IDEAS
• RuPEc
• EDIRC
• CitEc
• MPRA

… describes documents

template-type: redif-paper 1.0
title: dynamic aspect of growth and fiscal policy
author-name: thomas krichel
author-person: repec:per:1965-06-05:thomas_krichel
author-email: [email protected]
author-name: paul levine
author-email: [email protected]
author-workplace-name: university of surrey
classification-jel: c61; e21; e23; e62; o41
file-url: ftp://www.econ.surrey.ac.uk/pub/repec/sur/surrec/surrec9601.pdf
file-format: application/pdf
creation-date: 199603
revision-date: 199711
handle: repec:sur:surrec:9601
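
Templates like this one consist of plain `field: value` lines, and a field such as `author-name` may occur several times. The following minimal Python sketch reads such a template under the assumption that each field sits on its own line. It deliberately ignores ReDIF continuation lines and the grouping of fields such as `author-name` and `author-email` into author clusters, so it is not a full ReDIF parser. The function name `parse_redif` is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A field name, a colon, then the value (simplified; not the full ReDIF grammar).
FIELD_LINE = re.compile(r"^([a-z][a-z0-9-]*):\s*(.*)$", re.IGNORECASE)


def parse_redif(text: str) -> Dict[str, List[str]]:
    """Map each field name to the list of its values, in order of appearance.

    Lines that do not look like 'field: value' are skipped.
    """
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match:
            fields[match.group(1).lower()].append(match.group(2).strip())
    return dict(fields)


sample = """template-type: redif-paper 1.0
title: dynamic aspect of growth and fiscal policy
author-name: thomas krichel
author-name: paul levine
file-url: ftp://www.econ.surrey.ac.uk/pub/repec/sur/surrec/surrec9601.pdf
handle: repec:sur:surrec:9601"""
record = parse_redif(sample)
print(record["author-name"])
```

Every value is stored in a list, so repeated fields keep their order and single-valued fields are handled the same way.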

… describes persons (RAS)

template-type: redif-person 1.0
name-full: mankiw, n. gregory
name-last: mankiw
name-first: n. gregory
handle: repec:per:1984-06-16:n__gregory_mankiw
email: [email protected]:http://post.economics.harvard.edu/faculty/mankiw/mankiw.html
workplace-institution: repec:edi:deharus
workplace-institution: repec:edi:nberrus
author-article: repec:aea:aecrev:v:76:y:1986:i:4:p:676-91
author-article: repec:aea:aecrev:v:77:y:1987:i:3:p:358-74
author-article: repec:aea:aecrev:v:78:y:1988:i:2:p:173-77
…
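
Handles such as those in the `author-article` lines appear to follow a `repec:archive:series:item` pattern, in which the item part can itself contain colons. The Python sketch below assumes that pattern. It holds for the document and article handles shown here but not for the older `repec:per:…` person handles. The sketch splits handles into their parts and counts a registrant's claimed works per series. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Handle(NamedTuple):
    archive: str
    series: str
    item: str


def split_handle(handle: str) -> Handle:
    """Split 'repec:archive:series:item'; the item keeps any further colons."""
    parts = handle.strip().lower().split(":", 3)
    if len(parts) != 4 or parts[0] != "repec":
        raise ValueError(f"not a repec:archive:series:item handle: {handle!r}")
    _, archive, series, item = parts
    return Handle(archive, series, item)


def claims_per_series(handles: Iterable[str]) -> Counter:
    """Count claimed works per archive:series pair."""
    return Counter(f"{h.archive}:{h.series}" for h in map(split_handle, handles))


articles = [
    "repec:aea:aecrev:v:76:y:1986:i:4:p:676-91",
    "repec:aea:aecrev:v:77:y:1987:i:3:p:358-74",
    "repec:aea:aecrev:v:78:y:1988:i:2:p:173-77",
]
print(split_handle(articles[0]))
print(claims_per_series(articles))
```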

… describes institutions

template-type: redif-institution 1.0
primary-name: university of surrey
primary-location: guildford
secondary-name: department of economics
secondary-phone: (01483) 259380
secondary-email: [email protected]: (01483) 259548
secondary-postal: guildford, surrey gu2 5xh
secondary-homepage: http://www.econ.surrey.ac.uk/
handle: repec:edi:desuruk

author registration

• It started when JISC funding allowed us to hire a student to write an author registration system.

• The system went online as “HoPEc” in late 2000.

• It has been renamed “RePEc Author Service” (RAS).

• A 2002 grant from OSI allowed for a rewrite and expansion.

researcherID

• ResearcherID is a system by Thomson ISI. It allows authors to find their documents.

• It has been modeled after the RePEc author service.

• But the document and personal records are not freely available.

success of RAS

• Measuring the success of an author registration service is difficult in general.

• In RePEc we are fortunate that an independent list of the top 1000 authors exists.

• Of those, 80% are registered.

author registration?

• Author registration is not disambiguation of names.

• Author registration is not authority control.

• Author registration is usually done by authors themselves. It involves two steps
  – Registrants put in some personal data.
  – Registrants find, in the document data, records about documents they have written.

personal data

• These contain required elements
  – person's name
  – email

• and optional elements
  – institutional affiliation
  – homepage URL

search for authorships

• This is based on a set of name variations.

• A name variation is a string by which the author fields of document metadata may have referred to the registrant.

• Example:
  – Thomas Krichel
  – Крихель, Т.

• Registrants maintain a name variations profile.
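
The search for authorships can be sketched in Python as matching the author strings of document records against the registrant's name variations profile. This is a minimal illustration under stated assumptions. Names are compared after case-folding, replacing punctuation with spaces and collapsing whitespace. A document counts as a candidate when any of its author strings equals a variation. The actual ACIS matching rules may differ, and the document identifiers are hypothetical.

```python
import re
import unicodedata
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Case-fold, replace punctuation with spaces and collapse whitespace."""
    name = unicodedata.normalize("NFC", name).casefold()
    name = re.sub(r"[^\w\s]", " ", name)
    return " ".join(name.split())


def find_candidates(variations: Iterable[str], documents: Dict[str, List[str]]) -> List[str]:
    """Return ids of documents with an author string equal to some name variation."""
    profile = {normalize(v) for v in variations}
    return sorted(doc_id for doc_id, authors in documents.items()
                  if any(normalize(a) in profile for a in authors))


# Hypothetical document records: document id -> author strings as found in metadata.
documents = {
    "d1": ["thomas krichel", "paul levine"],
    "d2": ["Крихель Т"],
    "d3": ["Thomas Krichelle"],
}
print(find_candidates(["Thomas Krichel", "Крихель, Т."], documents))
```

Word order is preserved, so "Krichel, Thomas" would not match "Thomas Krichel". This is one reason a registrant needs several variations in the profile.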

authors

• An author is a registrant who has at least one work claim.

• Since author registration is a pioneering innovation by yours truly, its purpose is not yet clearly understood.

• A user who registers to gain access to data is called a bozo registrant.

• RAS managers periodically clear presumed bozo registrants.

free? as in $0

• Registrants don't pay money for registration.

• Document data providers don't pay to have their document data listed.

• Registrants' data is freely available if they allow it.

free? as in freedom

• Author records are freely available for any purpose, as long as we have the registrants' consent.

• Registrants' consent is assumed for anything but the email address. By default email addresses are not exported.
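
The rule that consent is assumed for everything except the email address can be expressed as an export filter. This minimal Python sketch uses the required and optional personal data listed earlier: name and email are required, affiliation and homepage are optional. The `export_email` opt-in flag and the function names are hypothetical and are not ACIS identifiers.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Registrant:
    name: str                          # required
    email: str                         # required, but private by default
    affiliation: Optional[str] = None  # optional
    homepage: Optional[str] = None     # optional
    export_email: bool = False         # hypothetical opt-in for exporting the email


def export_record(person: Registrant) -> Dict[str, str]:
    """Build the freely available record; the email is left out unless opted in."""
    record = {"name": person.name}
    if person.affiliation:
        record["affiliation"] = person.affiliation
    if person.homepage:
        record["homepage"] = person.homepage
    if person.export_email:
        record["email"] = person.email
    return record


person = Registrant("Thomas Krichel", "registrant@example.org",
                    homepage="http://openlib.org/home/krichel")
print(export_record(person))
```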

freedom is crucial

• Users will not register unless they expect that their records will be used.

• They will prefer a system that has high re-usage.

• Therefore I am confident an open system will win over a closed system.

free document data

• In principle, document data has to contain only three fields
  – Title
  – Author name expressions
  – URL for further information and/or

• Such data is in principle not copyrightable. But there are still only a few sources that have such data readily available.

service implementation scale

• Registration of authors can be conducted against any document datasets.

• What is the appropriate set
  – type scale?
  – subject scale?

• RAS shows it works at the scale of a single discipline with research paper documents, both articles and working papers.

• But economics is fairly insular.

AuthorClaim.org

• Since 2008 yours truly has been working on an interdisciplinary system.

• This will be the last important project before my death.

• The idea is that it will help the fledgling institutional repository (IR) movement.

• Since IRs currently are either empty or contain rubbish, AuthorClaim has to be primed with other contents.

datasets

• The datasets used in AuthorClaim are
  – PubMed (problematic)
  – DBLP (XML file only)
  – CiteSeer
  – arXiv (not announced yet)
  – CIS (non-free dataset)
  – E-LIS

• Work is under way to include a broad range of the repositories listed in DOAR.

PubMed

• The 800-pound gorilla of bibliographic datasets, with 17 million records.

• Free only as in $0, through a convoluted license.

• In addition, NLM added the condition that I would not offer the personal records to them. Just saying that they would refuse them if I offered them was not enough for them.

DBLP

• Not freely available either.
  – only an XML dump of some records (individual documents)
  – only for non-commercial purposes

• Overlap with CiteSeer would be nice to clean up.

CIS

• This is the Current Index to Statistics.

• Not a free dataset at all, but yours truly has access to a database version from which the 3 required metadata fields can be extracted.

DOAR repositories

• DOAR repositories use the OAI-PMH protocol. Dirty UTF-8/XML seems to be the main culprit when harvesting fails.

• Roughly, out of 1200 registered repositories, about half work on any given day.

• For roughly two thirds, we can get some records by harvesting and stopping when the first error occurs.

• BTW, RePEc is the second-largest DOAR repository by number of records.
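
The "harvest until the first error" tactic can be sketched in Python with the standard library's XML parser. The sketch follows the `resumptionToken` elements of OAI-PMH `ListRecords` responses page by page. It keeps whatever identifiers it has gathered once a response fails to parse or the repository returns an OAI-PMH `error` element. The `fetch` callable is an assumption that keeps the sketch testable offline. A real implementation would request `?verb=ListRecords&metadataPrefix=oai_dc` first and `?verb=ListRecords&resumptionToken=<token>` afterwards. The fake pages at the end are made up.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_until_error(fetch: Callable[[Optional[str]], bytes]) -> List[str]:
    """Collect record identifiers page by page; stop quietly at the first error.

    fetch(None) returns the first ListRecords response, fetch(token) the next page.
    """
    identifiers: List[str] = []
    token: Optional[str] = None
    while True:
        try:
            root = ET.fromstring(fetch(token))
        except (ET.ParseError, OSError):
            break  # dirty UTF-8/XML or a network failure: keep what we have
        if root.find(f"{OAI_NS}error") is not None:
            break  # OAI-PMH protocol error reported by the repository
        for header in root.iter(f"{OAI_NS}header"):
            identifier = header.findtext(f"{OAI_NS}identifier")
            if identifier:
                identifiers.append(identifier.strip())
        token_element = root.find(f".//{OAI_NS}resumptionToken")
        if token_element is None or not (token_element.text or "").strip():
            break  # no resumption token: the list is complete
        token = token_element.text.strip()
    return identifiers


def page(ids: List[str], token: str) -> bytes:
    """Build a fake ListRecords response for offline testing."""
    records = "".join(f"<record><header><identifier>{i}</identifier></header></record>"
                      for i in ids)
    return (f'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
            f"{records}<resumptionToken>{token}</resumptionToken>"
            f"</ListRecords></OAI-PMH>").encode("utf-8")


pages = {None: page(["oai:x:1", "oai:x:2"], "t1"),
         "t1": page(["oai:x:3"], "t2"),
         "t2": b"<OAI-PMH><broken \xff"}  # invalid UTF-8 on the third page
print(harvest_until_error(pages.__getitem__))
```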

subject coverage and overlap

• The subject coverage of AuthorClaim will remain uneven unless publishers give data directly (replacing libraries, eventually).

• Overlap is less of a problem than lack of good data. RePEc routinely groups various versions of authors' work together. This is feasible if they are in the claimed set of a person.

scaling issue

• With 30 times the number of records, and with PubMed only using initials (phew!), registrants with common names have large sets of potential documents to work through.

• Clearly they also derive more benefits.

• Example: Joanna P. Davies currently has 795 proposed documents. Now think about Chen or Li.

machine learning

• In a new project, Илья Королёв (Ilya Korolev) and Thomas Krichel are working on enhancing ACIS to provide help through machine learning.

• The idea is that the users will submit a few positive and negative examples, and machine learning sorts the most likely authored documents to the front. The assessment of such a system is really interesting.
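
The project's learning method is not described here, so the following is only an illustration of the idea with a swapped-in technique. It computes a multinomial naive Bayes score over title words, with add-one smoothing and equal class priors. The model is trained on the registrant's positive and negative examples and then used to sort candidate documents. All names and example titles are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Set, Tuple

Model = Tuple[Counter, Counter, Set[str]]


def words(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def train(positives: List[str], negatives: List[str]) -> Model:
    """Count title words in documents the registrant accepted and refused."""
    pos, neg = Counter(), Counter()
    for title in positives:
        pos.update(words(title))
    for title in negatives:
        neg.update(words(title))
    return pos, neg, set(pos) | set(neg)


def score(title: str, model: Model) -> float:
    """Log-likelihood ratio of 'authored' versus 'not authored' (equal priors)."""
    pos, neg, vocab = model
    size = len(vocab) + 1  # one extra slot for words never seen in training
    pos_total, neg_total = sum(pos.values()), sum(neg.values())
    total = 0.0
    for w in words(title):
        total += math.log((pos[w] + 1) / (pos_total + size))
        total -= math.log((neg[w] + 1) / (neg_total + size))
    return total


def rank(candidates: List[str], positives: List[str], negatives: List[str]) -> List[str]:
    """Sort candidate titles so the most likely authored documents come first."""
    model = train(positives, negatives)
    return sorted(candidates, key=lambda title: score(title, model), reverse=True)


positives = ["fiscal policy and growth", "economic growth model"]
negatives = ["protein folding in yeast", "yeast gene expression"]
print(rank(["gene expression in yeast cells", "fiscal policy in a growth model"],
           positives, negatives))
```

How to assess such a ranking is a separate question. One option is to measure how early the truly authored documents appear in the sorted list.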

ACIS

• This is the Academic Contribution Information System.

• It is generic software to enable author registration services that are somewhat more general.

• Work on ACIS was sponsored by the Open Society Institute.

• The software was written by Ivan V. Kurmanov. It is verrrry complicated.

basic idea

• A contribution is a relationship between document data records and personal records that a registrant can claim.

• Authorship and editorship are built-in contribution types, but others can be configured.

• The contribution system allows registrants to provide information about their contribution.

no document creation

• Using ACIS, registrants cannot create document records.

• While many RAS registrants want to do this, it is considered out of scope for an ACIS installation.

• ACIS-based systems are not supposed to substitute for the work of publishers, but to complement it.

ACIS implementations and document services

• An ACIS implementation service (AIS) can work with a document submission service (DSS).

• A DSS would typically run EPrints, DSpace or Fedora-Commons.

• While such systems are distinct, running on different machines etc., they can be so interconnected that they appear integrated to a naive user.

interoperability

• AIS and DSS interoperability comes in different levels.

• With each level up, we have more (better) interoperability.

• We have levels 0 to 4.

• At level zero, an AIS and a DSS simply live side by side, and no interaction takes place.

level 1

• In level 1, a DSS provides metadata about its documents to an AIS.
  – The data is stored in files
  – in a compatible format. For ACIS this would be AMF or ReDIF.

• The AIS processes the data periodically.
  – It adds new records to the document data set.
  – It performs probationary associations between documents and authors.

level 2

• A DSS delivers to the AIS data for some of its authorships that point to data in the AIS. The AIS can accept any of the following 3 identification avenues
  – an identifier known to the AIS
  – a shortID, previously generated by the AIS
  – an email address, known to the AIS as the login of a registrant.

• This data will have to be entered by a submitter.
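
The three identification avenues suggest a simple lookup on the AIS side. This minimal Python sketch assumes that the AIS keeps three in-memory indexes and tries identifier, shortID and login email in that order. The class, the lookup order, the lowercasing of emails and all values in the example (including the shortID `p123`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AisIndex:
    by_identifier: Dict[str, str] = field(default_factory=dict)  # identifier -> registrant
    by_short_id: Dict[str, str] = field(default_factory=dict)    # shortID -> registrant
    by_login: Dict[str, str] = field(default_factory=dict)       # lowercased login email -> registrant

    def resolve(self, value: str) -> Optional[str]:
        """Return the registrant an authorship points to, or None if no avenue matches."""
        value = value.strip()
        if value in self.by_identifier:
            return self.by_identifier[value]
        if value in self.by_short_id:
            return self.by_short_id[value]
        return self.by_login.get(value.lower())


index = AisIndex(
    by_identifier={"repec:per:1965-06-05:thomas_krichel": "registrant-1"},
    by_short_id={"p123": "registrant-2"},
    by_login={"someone@example.org": "registrant-3"},
)
for value in ["repec:per:1965-06-05:thomas_krichel", "p123", "Someone@Example.org", "unknown"]:
    print(value, "->", index.resolve(value))
```

Unresolved values return `None`. The AIS can then leave the authorship unidentified instead of guessing.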

level 3

• The DSS helps submitters to find the data required for level 2 interoperability.

• While submitters enter authorship data, the DSS performs searches in the AIS data. If matching records are found, the submitter is invited to select them.

• The document data is then exported to the AIS in the usual way.

implementing level 3

• The AIS needs to expose registrants' data to the DSS. The data cannot be made publicly available if we want the email address to be an avenue of identification.

• The DSS must search the AIS data, display possible matches in an unobtrusive way, and give submitters an easy way to choose one.

level 4

• The DSS immediately notifies the AIS about a document submission.

• The AIS processes the notification, and the document is added to the research profiles of its identified authors.

level dependency

• There is level dependency
  – level 1 is really required for other levels.
  – level 2 is a basis for level 3.
  – level 4 can be done without either level 2 or level 3.

• Current ACIS code can implement all four levels.

• There is code written for EPrints 2.0 that implements the DSS side of the interoperability.
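
The dependencies between levels can be written down as a small table and checked mechanically. This minimal Python sketch assumes the dependencies are exactly those stated on the level dependency slide: levels 2, 3 and 4 need level 1, and level 3 also needs level 2. The table and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set

REQUIRES: Dict[int, FrozenSet[int]] = {
    0: frozenset(),        # side by side, no interaction
    1: frozenset(),        # metadata files from the DSS
    2: frozenset({1}),
    3: frozenset({1, 2}),  # level 2 is a basis for level 3
    4: frozenset({1}),     # works without levels 2 and 3
}


def missing_levels(planned: Set[int]) -> Set[int]:
    """Return the levels that must be added for the planned setup to be consistent."""
    needed: Set[int] = set()
    for level in planned:
        if level not in REQUIRES:
            raise ValueError(f"unknown interoperability level {level}")
        needed |= REQUIRES[level]
    return needed - planned


print(missing_levels({3}))     # levels 1 and 2 are missing
print(missing_levels({1, 4}))  # nothing is missing
```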

ACIS components

• “rid” is a feeding daemon. It feeds records in files into a processor. It uses the Berkeley DB transactional database system.

• “ARDB” is a software suite that implements relational bibliographic datasets.

• There is a general web application layer. It fires up XSLT.

ACIS components, a few more

• A “shortID” system associates shortIDs with documents and, more importantly, with registrants.

• A “userData” system manages the data handled by users and feeds it back to the ARDB system.

• A “resources” system deals with searches and suggestions.

ACIS functionality

• Besides the association of documents with users, ACIS provides a range of functionality that complements or extends the basic functionality.

• I will review some now.

ACIS contact details

• This is a set of trivial fields
  – email. This detail is required but not exported by default.
  – homepage
  – phone number
  – postal address

• We don't do pictures of the registrants' dogs etc.

affiliations profile

• This is more complicated.

• Institutional data is kept as separate records, not as string data.

• Registrants can search for existing institutional records to create an affiliation with.

• Or they can propose a new record to be added by filling out a form.

research profile

• This is a collection of metadata about research documents the registrant has written.

• Available functions include
  – display a list of works in the profile
  – search for new suggested works
  – manual search for works by title
  – display refused research documents
  – change preferences for automatic updates

automatic updates

• By default, when a document record quotes a person's shortID, the document is added to that person's profile.

• By default, a regular search using the name variations profile identifies a set of potential new documents and reports them to the user via email.

• The registrant may choose to have exact matches from these searches added to the research profile automatically.
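
The three update rules can be combined into one decision per incoming document record. This minimal Python sketch rests on several assumptions. Variations are stored lowercased with single spaces, and an "exact" match is an author string equal to a variation. A looser match, where only the last word (the surname) agrees, is always just suggested. The `Profile` class, its fields, the shortID `p123` and the return values are hypothetical and are not the ACIS data model.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Profile:
    short_id: str
    variations: Set[str]            # lowercased name variations
    auto_add_exact: bool = False    # registrant preference; off by default
    documents: Set[str] = field(default_factory=set)
    emailed_suggestions: List[str] = field(default_factory=list)


def process_new_document(profile: Profile, doc_id: str,
                         quoted_short_ids: Set[str], author_names: Iterable[str]) -> str:
    names = [" ".join(n.lower().split()) for n in author_names]
    # A record quoting the person's shortID is added by default.
    if profile.short_id in quoted_short_ids:
        profile.documents.add(doc_id)
        return "added"
    exact = any(n in profile.variations for n in names)
    surnames = {v.split()[-1] for v in profile.variations if v.split()}
    loose = any(n.split()[-1] in surnames for n in names if n.split())
    # Exact matches are added automatically only if the registrant chose so.
    if exact and profile.auto_add_exact:
        profile.documents.add(doc_id)
        return "added"
    # Other matches are reported to the registrant by email.
    if exact or loose:
        profile.emailed_suggestions.append(doc_id)
        return "suggested"
    return "ignored"


profile = Profile("p123", {"thomas krichel", "t krichel"})
print(process_new_document(profile, "d1", {"p123"}, []))
print(process_new_document(profile, "d2", set(), ["Thomas  Krichel"]))
```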

document to document links

• Document-to-document links can be created by authors to say that two documents in the profile are related.

• Document full-text links can be confirmed or rejected.

• Typically such full-text files would be found by an automated search external to the AIS.

citations profile

• Within this profile, authors can partially manage citation information for items in the research profile.

• Just as a DSS may submit document data to an AIS, a citation discovery service may give citation data to an AIS.

• Such data can be maintained in the citations profile.

references processing

• References are processed to see if they may correspond to a document in the research profile.

• If a document in the profile has a potential citation, it is called an “interesting” document.

• Once reference processing is done, registrants can navigate documents in decreasing order of interest.
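
Reference processing can be illustrated with Python's standard `difflib`. The sketch treats a reference string as a potential citation of a profile item when it contains the item's title or when `difflib.SequenceMatcher` rates the two strings as similar enough. It then lists the interesting documents by decreasing number of potential citations. The matching rule and the 0.6 threshold are assumptions for illustration, not ACIS behavior. The second title and the reference strings are made up.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def potential_citations(profile_titles: Dict[str, str], references: List[str],
                        threshold: float = 0.6) -> Dict[str, List[str]]:
    """Map each profile document to the reference strings that may cite it."""
    hits: Dict[str, List[str]] = {doc: [] for doc in profile_titles}
    for ref in references:
        ref_lower = ref.lower()
        for doc, title in profile_titles.items():
            title_lower = title.lower()
            similar = SequenceMatcher(None, title_lower, ref_lower).ratio() >= threshold
            if title_lower in ref_lower or similar:
                hits[doc].append(ref)
    return hits


def by_interest(hits: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Interesting documents (at least one potential citation), most interesting first."""
    interesting = [(doc, len(refs)) for doc, refs in hits.items() if refs]
    return sorted(interesting, key=lambda item: (-item[1], item[0]))


profile_titles = {"d1": "Dynamic aspect of growth and fiscal policy",
                  "d2": "A note on author registration"}
references = [
    "Krichel, T. and Levine, P. (1996) Dynamic aspect of growth and fiscal policy. Surrey.",
    "Krichel T, Levine P. Dynamic aspects of growth and fiscal policy",
    "Smith, J. (2001) Something else entirely.",
]
print(by_interest(potential_citations(profile_titles, references)))
```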

suggestions processing

• Registrants navigate the set of suggested citations to see if the reference string really matches the research profile item.

• If the registrant refuses a citation, there is a screen where she can later overturn that decision.

automatic citation updates

• If the reference is very close to citation data, the registrant can have it added automatically.

• When a co-author has identified a citation to an item in her profile, the registrant can allow it to be added automatically.
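
The two automatic rules amount to a per-suggestion decision that depends on the registrant's preferences. This minimal Python sketch makes two assumptions. "Very close" is measured with `difflib.SequenceMatcher` against the item's title, with a 0.9 threshold. A co-author's confirmation arrives as a simple flag. The function name, its parameters and the threshold are hypothetical.

```python
from difflib import SequenceMatcher


def accept_automatically(reference: str, item_title: str, *,
                         confirmed_by_coauthor: bool,
                         allow_close_matches: bool,
                         allow_coauthor_confirmations: bool,
                         closeness: float = 0.9) -> bool:
    """Decide whether a suggested citation is added without asking the registrant."""
    if allow_coauthor_confirmations and confirmed_by_coauthor:
        return True
    if allow_close_matches:
        ratio = SequenceMatcher(None, reference.lower(), item_title.lower()).ratio()
        return ratio >= closeness
    return False


title = "Dynamic aspect of growth and fiscal policy"
print(accept_automatically("dynamic aspect of growth and fiscal policy", title,
                           confirmed_by_coauthor=False, allow_close_matches=True,
                           allow_coauthor_confirmations=False))
```

Suggestions that are not accepted automatically stay in the set the registrant navigates by hand.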

thank you for your attention!

http://openlib.org/home/krichel