Easy as ABC: A triumph of re-useable metadata. Julia Hickie, Mark Raadgever, Trove Support.

Post on 15-Dec-2015

Easy as ABC: A triumph of re-useable metadata

Julia Hickie, Mark Raadgever

Trove Support

https://plot.ly/~wragge/6/trove-newspaper-articles-by-state/

1955

1. A web-crawling bot to pick up records

2. Transformers to change the records

3. A loader to dump them in Trove
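The three steps above can be sketched as a small pipeline. This is a minimal illustration only, not the NLA Harvester's actual code: the function names (`crawl`, `transform`, `load`, `harvest`), the record fields, and the title regular expression are all assumptions made for the example, and the page fetcher is passed in so the sketch runs without network access.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]

# Hypothetical pattern: take the page <title> as the record title.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def crawl(urls: Iterable[str], fetch: Callable[[str], str]) -> Iterator[Tuple[str, str]]:
    """Step 1: pick up the raw page behind each URL."""
    for url in urls:
        yield url, fetch(url)


def transform(url: str, html: str) -> Record:
    """Step 2: change a raw page into a simple metadata record."""
    match = TITLE_RE.search(html)
    title = match.group(1).strip() if match else ""
    return {"identifier": url, "title": title}


def load(records: Iterable[Record], store: List[Record]) -> int:
    """Step 3: put the records into the target store; returns how many were loaded."""
    count = 0
    for record in records:
        store.append(record)
        count += 1
    return count


def harvest(urls: Iterable[str], fetch: Callable[[str], str], store: List[Record]) -> int:
    """Run crawl, transform and load in sequence."""
    records = (transform(url, html) for url, html in crawl(urls, fetch))
    return load(records, store)
```

Keeping each step as a separate function means the crawler or the transformer can be swapped for a different contributor without touching the loader.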

Image: Dragline loading a dump truck at German Creek brown coal open mine, Queensland, 1985. Sievers, Wolfgang. http://nla.gov.au/nla.pic-vn4801485

Image: Tonka. Brian Auer. https://flic.kr/p/4qKzQE

Image: Mi colección de Transformers (17/Dic/2007). Gustavo Vargas. https://flic.kr/p/4ee2Nh

Why didn’t they become a Trove contributor?

1. No resources, no money, no capability for technical change

2. Can’t meet a metadata standard (that no longer exists)

http://www.abc.net.au/radionational/feed/2887252/podcast.xml

http://www.abc.net.au/radionational/programs/healthreport/

http://www.abc.net.au/radionational/programs/healthreport/past-programs/index=2013

Diagram (labels only): Radio National website, PHP script, NLA Harvester, Trove. Formats shown between components: HTML, XML.

• Regular expressions
• XSLT stylesheets
• Java modules

http://trove.nla.gov.au/version/209294503

WHY?

Michael Neubert, From wheels to bikes: http://wheelbike.blogspot.com.au/2012/02/starting-search-for-bikes-in-trove.html

Chart: ABC Clickthroughs July 2013-September 2014

Series: Clickthroughs, by month from 2013-07 to 2014-09 (scale 0 to 1600). Marked events: Content added, Promotion.

Why Radio National?

What else?

• Standardised records allow analysis of content

• Digital historians can use the API to investigate trends

• E.g. Tim Sherratt’s In a Word
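As a rough illustration of the kind of trend analysis standardised records make possible, the sketch below computes, per year, the share of records whose title contains a given word. It is not how In a Word is implemented and it does not call the Trove API; the record shape (`title`, `date` as `YYYY-MM-DD`) and the function name `word_share_by_year` are assumptions for the example.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def word_share_by_year(records: Iterable[Dict[str, str]], word: str) -> Dict[str, float]:
    """Return, for each year, the fraction of records whose title contains `word`."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    target = word.lower()
    for record in records:
        year = record["date"][:4]  # assumes dates start with a four-digit year
        totals[year] += 1
        # Split the title into lower-case words so matching is whole-word only.
        words = re.findall(r"[a-z']+", record["title"].lower())
        if target in words:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical records, invented for the example.
SAMPLE_RECORDS = [
    {"title": "Health and sleep", "date": "2013-07-01"},
    {"title": "Vaccines explained", "date": "2013-08-12"},
    {"title": "Sleep science", "date": "2014-02-03"},
]
```

Because every record carries the same fields, the same function works across programs and years without per-source special cases.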

http://inaword.dhistory.org

https://github.com/wragge/radio-national-data

Lessons Learned

• Adaptation of existing functions
  – Sitemap harvesting
  – RSS harvesting
  – XSLT transformation

• Development of generic rather than specialised tools

• Staff learning opportunities – we became better at using core technology
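To make the RSS harvesting item above concrete, here is a minimal sketch that turns the items of an RSS 2.0 feed into simple records using only the Python standard library. It is an illustration under stated assumptions, not the harvester's real code: the function name `parse_rss_items`, the output field names, and the sample feed are all invented for the example, while `title`, `link` and `pubDate` are standard RSS 2.0 item elements.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss_items(rss_xml: str) -> List[Dict[str, str]]:
    """Extract title, link and publication date from each <item> in an RSS feed."""
    root = ET.fromstring(rss_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            # findtext returns None when an element is missing, so fall back to "".
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "date": (item.findtext("pubDate") or "").strip(),
        })
    return records


# Hypothetical feed content, invented for the example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample program</title>
    <item>
      <title>Sample episode one</title>
      <link>http://example.org/episodes/1</link>
      <pubDate>Mon, 01 Jul 2013 08:00:00 +1000</pubDate>
    </item>
    <item>
      <title>Sample episode two</title>
      <link>http://example.org/episodes/2</link>
    </item>
  </channel>
</rss>"""
```

Calling `parse_rss_items(SAMPLE_FEED)` yields one record per episode; the second record has an empty `date` because its item has no `pubDate`.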

Future

• Re-examine contributors previously unable to meet technical requirements

• Encourage re-use of the dataset – including adding it to library catalogues as well as scholarly analysis

• Think beyond conventional data

Questions?