Which One’s Which? Understanding Versioning in Repositories, 22nd April 2008

Which One’s Which?

Understanding Versioning in Repositories

22nd April 2008

VIF Workshop, 22nd April 2008, www.lse.ac.uk/library/vif

This afternoon’s programme

• 13.30 – VIF and Versioning

• 14.30 – Refreshment break

• 14.50 – Breakouts
  • 1. Metadata
  • 2. Strategy and Advocacy

• 15.50 – Refreshment break

• 16.10 – Software and Versioning

• 17.30 – Reception

The VIF project

• Funded by JISC’s Repositories and Preservation Programme from July 2007 to May 2008

• Ran in 3 stages:
  • a user requirements exercise
  • development of a framework with input from an Expert Group and comments from a Review Group
  • a dissemination phase to promote the recommendations and guidance and raise awareness of the issue of versioning

• The framework is web-based; it highlights the issues associated with versioning and gives guidance for people involved in:
  • repository management
  • software development
  • creation of content

The Framework: www.lse.ac.uk/library/vif

This session covers an overview of the research and the problem

Starting with the research

VIF User Requirements Exercise

• Surveys:
  • Drew on experience of VERSIONS Project
  • Created with BOS software
  • Split into 2 surveys: Information Professionals and Academics

• Interviews:
  • A number of informal background gathering interviews
  • A few structured formal interviews with specific audiences such as an archivist and a records manager

• Follow up dataset questions:
  • Very small questionnaire sent to targeted individuals
  • Identified from the surveys and from DataShare project

The Surveys

• Timing:
  • First draft completed by late July
  • Piloted in August
  • Survey ran from 30th August to 15th October 2007
  • Follow-on survey ended mid November

• Incentives used – Amazon vouchers & iPod Nano

• Promotion via:
  • JISC lists
  • Personal e-mails/telephone calls
  • Internal newsletters

• Approx. 150 responses (plus approx. 60 incomplete responses not used in the analysis)

Respondent Profile

• Academics Survey:
  • 50 responses
  • Mainly Lecturers/Professors
  • Mainly UK, some US, Australian and European

• Professionals Survey:
  • 100 responses
  • Mainly Library Repository people
  • Mainly UK

Survey results 1 – current attitudes

• Only 5% of Academics and 6.5% of Information Professionals surveyed found it easy to identify versions of digital objects within institutional repositories. The situation becomes even worse across multiple repositories (1.8% and 1.1% respectively).

• Academics are broadly happy (66%) with how they identify versions on their own computer etc.

• There is a strong feeling amongst Academics that repositories should only include the ‘finished’ version of a work. Free text boxes were often used to make this point even when this was not the question being asked

Survey results 2 – Current situation

• Although text documents are the most popular type of material created by Academics and stored within repositories, both Information Professionals and Academics anticipate a substantial rise in the use of other types of digital objects, e.g. audio and video files

• Approximately a third of Information Professionals involved with repositories stated that they either have no system currently in place or ‘don’t know’ how they deal with versioning at present

• Information Professionals have little influence prior to ingest/deposit

Survey results 3 – Proposed versioning solutions

• No one silver-bullet solution

• Many of the potential solutions to the issue of versioning covered by the survey received strong support from both groups of respondents, but numerous problems were captured by free text responses

• The only solution with any claim to broad support was the use of date stamps – but again with many warnings

Survey recommendations

• Versioning is relevant to all types of digital objects – any framework should therefore be deliberately broad

• Premium placed on ‘final’ versions of digital objects - 91.6% of total respondents thought that being able to clearly identify the ‘finished’ version of an object was ‘essential’ or ‘important’

• Academics appear largely disengaged from the problem of versioning (and are largely happy with the way they deal with their own work) – any advice given should be simple and flexible and avoid top-down enforcement

• Cross-repository versioning is a problem and should be dealt with in the framework

Back to the framework: the problem

What is a version anyway?

Are a pre-publication text document and the published journal article versions of each other?

Are a digitised 18th century map of Hertfordshire and a present day map of the same place versions of each other?

Are audio recordings of the same piece of music played by different orchestras at different times and in different places versions of each other?

Are a video of a conference session, a photo taken there, the presentation given and the original article that led to the session in any way versions of each other?

Question 1 – just iterations or outputs as well?

• There are different levels of versioning iterations:
  • minor changes (a revision)
  • significant changes (a landmark version, e.g. peer reviewed, published etc.)
  • formatting or stylistic changes (e.g. typesetting or font)
  • change of file format (creating a digital variant)

• But one research project can generate many outputs describing the same idea or work

• It is possible to call both outputs and iterations ‘versions’

One research project – many potential ‘versions’

Question 2 – version object vs. version relationship

• What links the objects in the examples together?
  • Author? Idea? Time? We will go into a good potential model later – FRBR

• Understanding requires recognition that there is a difference between a single ‘version’ and a ‘version relationship’:

• Example 1: A researcher
  • Version - I want to cite the latest version, is this one it?
  • Version relationship - I’m writing about the development of an academic’s work over the past 10 years. His outputs are numerous and include diagrams, conference presentations and articles. When and in what order were they produced?

• Example 2: A repository manager
  • Version - Does this wind speed dataset contain the latest collection of data?
  • Version relationship - There are 2 datasets measuring ‘wind speed’ taken from exactly the same place. They were recorded by different people for different purposes. Should they be linked? If so, how?

VIF’s assumptions:

• A version is identifiable; the change between versions is describable and understood by either human or machine

• The understanding of what a version is relates to:
  • either its content (i.e. a digital variant) or its format (i.e. a digital copy)
  • either an iteration or output
  • both the object itself and its relationship to other objects

• Some versions can be perceived to be more relevant/appropriate, authorised and/or authentic than others by either author or reader. But, only the end user might determine which version is most relevant for them and why

• Clarity about versions should help an end user understand which is the ‘best version’ for their purposes

Developing a definition and the framework:

• The framework has been developed:
  • recognising a fairly broad audience of interest groups and levels of knowledge
  • to provide user driven advice to support repositories
  • to provide best practice which will be spread via a ground-up approach
  • recognising that certain things like ‘final version’ are critical to identify, but maintaining an agnostic stance about relative importance of objects – ‘fit for purpose’ rather than ‘best or most relevant’

• We therefore needed:
  • a deliberately wide understanding of what constituted a version for all involved
  • to encompass anything that anyone might consider to be a version
  • to find ways to make versioning information transparent to the end user

VIF’s definition:

• A 'version' is a digital object (in whatever format) that exists in time and place and has a context that can be described by the relationship it has to other objects

• A ‘version relationship’ is an understanding or expression of how two or more objects relate to each other
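The two definitions above could be sketched as a pair of record types. This is an illustration only, not part of the framework: the field names, the ChangeKind labels (borrowed from the kinds of change listed under Question 1) and the example objects are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical labels for the kinds of change named under Question 1."""
    REVISION = "minor changes"
    LANDMARK = "significant changes"
    FORMATTING = "formatting or stylistic changes"
    FILE_FORMAT = "change of file format"
    OUTPUT = "separate output describing the same work"


@dataclass(frozen=True)
class Version:
    """A digital object that exists in time and place."""
    identifier: str     # hypothetical repository identifier
    title: str
    file_format: str    # "in whatever format"
    last_changed: date  # time
    location: str       # place, e.g. the repository holding the object


@dataclass(frozen=True)
class VersionRelationship:
    """An expression of how two objects relate to each other."""
    source: Version
    target: Version
    kind: ChangeKind


def latest(versions):
    """Return the most recently changed object in a group."""
    return max(versions, key=lambda v: v.last_changed)


# Invented example objects: a preprint and the later published article.
preprint = Version("example:1", "Example article", "pdf", date(2007, 3, 1), "Repository A")
published = Version("example:2", "Example article", "pdf", date(2007, 9, 1), "Repository B")
link = VersionRelationship(preprint, published, ChangeKind.LANDMARK)
```

Keeping the relationship as its own record, rather than as a field on the object, mirrors the distinction drawn in Question 2: "is this the latest version?" is answered from the objects, "how do these relate?" from the relationships.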

Now for the framework itself:

The essential information for all within the framework

Making information about version transparent

• VIF has identified the pieces of information that give clues about version status

• We then worked out how to make this information available to the people who use repositories

Essential Versioning Information

• There are five pieces of information that, when some or all are present, will allow someone to understand what version they have:

Embedding the information

• The repository metadata itself is frequently bypassed. If the object does not contain the information itself, it can become impossible for an end user to ascertain version status

• Metadata is not evident when:
  • Access to the object is through a direct link
  • Access occurs via a search engine like Google
  • Cross repository search services are used. They often deal with inconsistent metadata by harvesting as much information as possible and then reproducing it in their standard format

Object Solutions

• Filename – should be unambiguous, preferably uniform. Could include repository no., for example.

• Coversheet – http://eprints.lse.ac.uk/2631/

• ID tags / Properties – dates, version no. etc. can be stored in these easily at the creation stage or by repositories.

• Watermark – http://arxiv.org/PS_cache/astro-ph/pdf/0701/0701001v2.pdf
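As an illustration of the filename suggestion, a uniform name could be assembled from a repository number, title, version number and date. The pattern, the function name and the example values below are assumptions made for this sketch; the framework does not prescribe a particular scheme.

```python
import re
from datetime import date


def version_filename(repository_no, title, version_no, last_changed, extension):
    """Build a uniform filename (hypothetical pattern: number_title_vN_date.ext)."""
    # Reduce the title to lowercase words joined by hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{repository_no}_{slug}_v{version_no}_{last_changed.isoformat()}.{extension}"


# Invented example values.
name = version_filename(1234, "Understanding Versioning", 2, date(2008, 4, 22), "pdf")
print(name)  # 1234_understanding-versioning_v2_2008-04-22.pdf
```

Because the version number and date travel with the file, they remain visible when the object is reached through a direct link or a search engine rather than through the repository record.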

How to access the Framework’s Recommendations:

Overview of framework recommendations

Repository Management:
• Formulate wider strategy; set and promote clear policies
• Use object solutions and get version information at ingest
• Include version information in metadata

Software Development:
• Make systems cope with and link more than one version
• Look at a FRBRised structure to establish version relationships
• Support richer metadata using DC application profiles

Recommendations for Content Creators:
• State the author, title and date last changed
• Keep track of which versions are available and where
• See VERSIONS Toolkit for more information
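A rough sketch of what a "FRBRised" structure might look like: FRBR (Functional Requirements for Bibliographic Records) distinguishes Work, Expression, Manifestation and Item. The example below nests files (manifestations) under expressions of one work. The record layout and all values are invented, and treating a change of content as a new expression and a change of file format as a new manifestation is an assumption of this sketch rather than a framework rule.

```python
from collections import defaultdict

# Each record is (work, expression, manifestation); all values are invented examples.
records = [
    ("Article X", "author's draft", "draft.doc"),
    ("Article X", "author's draft", "draft.pdf"),
    ("Article X", "published text", "published.pdf"),
]


def group_by_work(records):
    """Nest manifestations under their expression, and expressions under their work."""
    tree = defaultdict(lambda: defaultdict(list))
    for work, expression, manifestation in records:
        tree[work][expression].append(manifestation)
    # Convert to plain dicts so lookups of missing keys raise KeyError.
    return {work: dict(expressions) for work, expressions in tree.items()}


tree = group_by_work(records)
```

With this grouping, a repository could show that draft.doc and draft.pdf carry the same content in different formats, while published.pdf is a different expression of the same work.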

The Framework

• Go look at the framework - full details, explanations, pros and cons, guidance and recommendations contained within

• All on the web, will be available as PDF in May

• It’s not finished and can still be improved!

• www.lse.ac.uk/library/vif

Survey recommendations - revisited

• Versioning is relevant to all types of digital objects – any framework should therefore be deliberately broad

• Premium placed on ‘final’ versions of digital objects - 91.6% of total respondents thought that being able to clearly identify the ‘finished’ version of an object was ‘essential’ or ‘important’

• Academics appear largely disengaged from the problem of versioning (and are largely happy with the way they deal with their own work) – any advice given should be simple and flexible and avoid top-down enforcement

• Cross-repository versioning is a problem and should be dealt with in the framework

Project Director: Frances Shipsey, LSE Library, [email protected]

Project Manager: Jenny Brace, LSE Library, [email protected]

Project and Communications Officer: Dave Puplett, LSE Library, [email protected]

Project Officer: Paul Cave, University of Leeds, [email protected]

Project Officer: Catherine Jones, Science and Technology Facilities Council, [email protected]