Metadata Extraction Projects for Education Network Australia
edna is partly funded by the Australian Government Department of Education, Employment and Workplace Relations. Managed and maintained by Education Services Australia.
Metadata Extraction Projects
Pru Mitchell & Sarah Hayman
Education Network Australia
delivering innovative, cost-effective services across all sectors of education
formed 1 March 2010
not-for-profit, ministerial company (MCEECDYA)
www.esa.edu.au
VETADATA
ANZ-LOM
IEEE LOM
ASCED
ASCO
Metadata is not scalable
We can no longer be comprehensive or meet the standards set by our collection policy, because now we have:
more content
less funding
fewer cataloguers
same old clunky metadata tools
Solutions
reduce quantity of metadata
reduce quality of metadata
get someone else to create/pay for metadata: users, other organisations
improve metadata creation tools
program machines to create metadata
edna proofs of concept
me.edu.au professional networking
edna sustainable collections (ESC)
faceted search: rights and user level
Flinders University-edna AI research
professional networking site for educators
users bookmark and discuss resources, and these are aggregated to their own URL
the system collects, manages and maps metadata
person – resource - tag - community
edna Sustainable Collections (ESC)
harvests bookmarks from key educators in me.edu.au and external services
links to OpenCalais entity data
How does it do this?
takes an RSS feed
extracts available metadata
checks for duplicates
maps it to edna metadata profile in DSpace metadata management system
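The four steps above can be sketched in a few lines of Python. This is only an illustration, not the ESC implementation: the sample feed, the URL normalisation rule and the Dublin Core-style target field names (dc.title, dc.identifier.uri and so on) are assumptions chosen for the example, and the actual edna metadata profile may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for a harvested bookmark feed.
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example bookmarks</title>
    <item>
      <title>Fractions game</title>
      <link>http://example.org/fractions</link>
      <description>Interactive fractions activity</description>
      <dc:creator>A. Teacher</dc:creator>
      <category>maths</category>
    </item>
    <item>
      <title>Fractions game (again)</title>
      <link>http://example.org/fractions/</link>
      <category>numeracy</category>
    </item>
    <item>
      <title>Water cycle video</title>
      <link>http://example.org/water</link>
    </item>
  </channel>
</rss>"""

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core namespace in the feed


def extract_items(feed_xml):
    """Steps 1 and 2: take an RSS feed and extract whatever metadata is there."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
            "creator": item.findtext(DC + "creator"),
            "categories": [c.text for c in item.findall("category") if c.text],
        }


def normalise(url):
    """Assumed duplicate key: lower-cased URL without a trailing slash."""
    return url.strip().rstrip("/").lower()


def dedupe(records):
    """Step 3: keep the first record per normalised link; drop items with no link."""
    seen = set()
    for record in records:
        key = normalise(record["link"] or "")
        if key and key not in seen:
            seen.add(key)
            yield record


def to_profile(record):
    """Step 4: map to hypothetical Dublin Core-style fields, omitting empty values."""
    mapped = {
        "dc.title": record["title"],
        "dc.identifier.uri": record["link"],
        "dc.description": record["description"],
        "dc.creator": record["creator"],
        "dc.subject": record["categories"],
    }
    return {field: value for field, value in mapped.items() if value}


records = [to_profile(r) for r in dedupe(extract_items(SAMPLE_FEED))]
for record in records:
    print(record)
```

Fields the feed does not supply are simply left out, which leaves the gaps (for example subject or user level) visible for an information manager to fill.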
Outcomes
increasing efficiency for information managers
freeing information managers to focus on higher-end work, e.g. subject and user-level metadata
adding user suggestions to the collection
widening the range of resources being captured and evaluated
Faceted search
use metadata to help solve a stakeholder issue: the cost of educational copying
harvest rights/licence metadata
make this meaningful to educators ‘what can I do with this resource?’
preference openly licensed content
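A minimal sketch of how rights and user-level facets might work over harvested records. The records, the set of licences treated as open and the plain-language permission labels are all hypothetical, invented for this example rather than taken from the edna service.

```python
from collections import Counter

# Hypothetical records; field names and values are assumptions.
RESOURCES = [
    {"title": "Water cycle video", "licence": "All rights reserved", "user_level": "primary"},
    {"title": "Fractions game", "licence": "CC BY", "user_level": "primary"},
    {"title": "Essay writing guide", "licence": "CC BY-NC", "user_level": "secondary"},
    {"title": "Welding safety", "licence": "CC BY", "user_level": "VET"},
]

# Assumed set of licences treated as "openly licensed" for ranking.
OPEN_LICENCES = {"CC BY", "CC BY-SA", "CC0"}


def permission_label(licence):
    """Turn a licence string into an answer to 'what can I do with this resource?'."""
    if licence in OPEN_LICENCES:
        return "openly licensed"
    if licence.startswith("CC "):
        return "Creative Commons with restrictions"
    return "check before copying"


def facet_counts(resources, field):
    """Count how many resources fall under each value of a facet field."""
    return Counter(r[field] for r in resources)


def search(resources, **filters):
    """Filter by exact facet values, then list openly licensed content first."""
    hits = [r for r in resources if all(r.get(k) == v for k, v in filters.items())]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(hits, key=lambda r: r["licence"] not in OPEN_LICENCES)


print(facet_counts(RESOURCES, "licence"))
for hit in search(RESOURCES, user_level="primary"):
    print(hit["title"], "-", permission_label(hit["licence"]))
```

Sorting rather than filtering keeps restricted resources discoverable while still giving preference to content that is cheaper to copy.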
AI proof of concept
Flinders University Artificial Intelligence and Knowledge Laboratory and Education.au 2008-09
partial automation of categorisation and annotation of web pages
Elements of the project
text analysis
automatic classification by edna category
suggestion of categories from controlled vocabulary
classification data capture tool
Findings
35% accuracy for mapping category from title alone
60% accuracy using WordNet-based semantic relatedness
confirmation of the need for a human eye/expertise
classification information may be contained in images/style/tone, not text
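To give a feel for why semantic relatedness can help when a title does not literally contain a category name, here is a toy sketch. It does not use WordNet and is not the project's method: the small is-a hierarchy (PARENT), the category list and the inverse-path-length score are assumptions made for illustration, loosely modelled on path-based similarity over a hypernym tree.

```python
# Toy is-a hierarchy (child -> parent); a hand-built stand-in for WordNet.
PARENT = {
    "fractions": "arithmetic",
    "arithmetic": "mathematics",
    "triangles": "geometry",
    "geometry": "mathematics",
    "cells": "biology",
    "plants": "biology",
    "biology": "science",
    "sonnets": "poetry",
    "poetry": "literature",
    "mathematics": "discipline",
    "science": "discipline",
    "literature": "discipline",
}
KNOWN = set(PARENT) | set(PARENT.values())

# Hypothetical controlled vocabulary of categories.
CATEGORIES = ["mathematics", "science", "literature"]


def path_to_root(term):
    """Return the chain of terms from `term` up to the root of the hierarchy."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def relatedness(a, b):
    """Score 1 / (1 + number of edges between a and b); 0.0 for unknown terms."""
    if a not in KNOWN or b not in KNOWN:
        return 0.0
    steps_from_b = {term: i for i, term in enumerate(path_to_root(b))}
    for steps_from_a, term in enumerate(path_to_root(a)):
        if term in steps_from_b:  # first shared ancestor gives the shortest path
            return 1.0 / (1 + steps_from_a + steps_from_b[term])
    return 0.0


def suggest_category(title, categories=CATEGORIES):
    """Suggest the category most related to any word in the title, or None."""
    words = title.lower().split()
    scores = {c: max((relatedness(w, c) for w in words), default=0.0) for c in categories}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)


print(suggest_category("Adding fractions"))
print(suggest_category("Holiday photos"))
```

A title with no word in the hierarchy yields no suggestion at all, which is one way such a tool could hand the decision back to a human cataloguer.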
Conclusions
consider new approaches and keep pace with developments, cultural and technical
find opportunities to involve users in discovery, evaluation and description of content
continue to explore smart tools to help build and manage collections
Questions, feedback