Newspaper Digitisation Workflows. Rose Holley, Manager ANDP. Presentation to Cultural Heritage Digitisation professionals, 26 November 2008


  • Slide 1
  • 1 Newspaper Digitisation Workflows Rose Holley- Manager ANDP Presentation to Cultural Heritage Digitisation professionals 26 November 2008
  • Slide 2
  • 2 General Workflow
      • Preparing for Digitisation
      • Creation of digital images
      • Adding metadata and Quality Assurance
      • Optical Character Recognition
      • Quality Assurance
      • Other information: Access & interaction; Statistics
  • Slide 3
  • 3 Preparing for Digitisation
      • Identify title to be digitised
      • Source master microfilm from owner
      • Send master microfilm to scanning contractors
      • Add title to Content Management System
  • Slide 4
  • 4 Add Title Screen
  • Slide 5
  • 5 Microfilm converted to digital images
  • Slide 6
  • 6 Image Reception
      • Images received from scanning contractor on LTO2 tape
      • Tapes added to tape robot and extracted
      • Reels automatically added to Content Management System
      • Reel details are checked
      • Images ingested into Content Management System
  • Slide 7
  • 7 Check Reel Details
  • Slide 8
  • 8 Ingest Reels
  • Slide 9
  • 9 Quality Assurance (QA)
      • QA Phase 1: Add metadata (dates and page numbers)
      • Supervisor reviews marked pages
      • QA Phase 2: Define batches
      • QA Phase 2: Resolve duplicates
      • QA Phase 2: Create missing page targets
  • Slide 10
  • 10 Adding Metadata
      • Date and Page Sequence number added
  • Slide 11
  • 11 Supervisor Review
      • Supervisor reviews pages marked for attention
  • Slide 12
  • 12 Define Batches
      • Batches defined by date
      • Each batch contains 2-3000 images
      • Batches are automatically assigned a number
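The batching rules on this slide can be sketched in a few lines. This is only an illustration, not the Content Management System's actual logic: the `PageImage` type, the `define_batches` function, and the `max_images` limit are all hypothetical names, and the rule of never splitting one issue date across two batches is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count


@dataclass
class PageImage:
    image_id: str      # hypothetical identifier for a scanned page
    issue_date: date   # date added during QA Phase 1


def define_batches(images, max_images=3000, first_number=1):
    """Group pages into date-ordered batches and number them automatically.

    Assumption: all pages of one issue date stay in the same batch, so a
    single date with more than max_images pages yields an oversized batch.
    """
    by_date = {}
    for img in images:
        by_date.setdefault(img.issue_date, []).append(img)

    numbers = count(first_number)
    batches, current = [], []
    for day in sorted(by_date):
        pages = sorted(by_date[day], key=lambda p: p.image_id)
        # Close the current batch if adding this date would exceed the limit.
        if current and len(current) + len(pages) > max_images:
            batches.append((next(numbers), current))
            current = []
        current.extend(pages)
    if current:
        batches.append((next(numbers), current))
    return batches
```

Each returned tuple pairs an automatically assigned batch number with the pages in that batch.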
  • Slide 13
  • 13 Resolve Duplicates
      • Duplicate pages compared and the best copy is selected
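Before duplicates can be compared, the candidate copies have to be found. A minimal sketch, assuming that two images count as duplicates when they share the same issue date and page number; `find_duplicate_groups` and the tuple layout are hypothetical, and choosing the best copy is left to the operator, as on the slide.

```python
from collections import defaultdict


def find_duplicate_groups(pages):
    """Return {(issue_date, page_number): [image_id, ...]} for duplicated pages.

    pages is an iterable of (image_id, issue_date, page_number) tuples.
    Only keys with more than one image are returned.
    """
    groups = defaultdict(list)
    for image_id, issue_date, page_number in pages:
        groups[(issue_date, page_number)].append(image_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```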
  • Slide 14
  • 14 Missing Pages
      • Missing page targets are generated
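One way to find where missing page targets are needed is to look for gaps in the page sequence numbers of each issue. The sketch below assumes that sequence numbers start at 1 for every issue date; `missing_page_targets` is a hypothetical name. It cannot detect pages missing after the highest page that was scanned.

```python
def missing_page_targets(pages_by_date):
    """Return (issue_date, page_number) pairs that need a missing page target.

    pages_by_date maps an issue date to the page sequence numbers present.
    """
    targets = []
    for issue_date, sequence_numbers in sorted(pages_by_date.items()):
        present = set(sequence_numbers)
        if not present:
            continue
        # Every number below the highest scanned page should be present.
        for number in range(1, max(present) + 1):
            if number not in present:
                targets.append((issue_date, number))
    return targets
```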
  • Slide 15
  • 15 Optical Character Recognition (OCR)
      • Complete batches are added to a tape
      • Tapes are generated and written by IT
      • Tapes sent to OCR contractor
      • Contractor completes OCR processes
      • OCR data (not images) is returned via FTP
  • Slide 16
  • 16 Tapes Created
      • Completed batches added to a tape
  • Slide 17
  • 17 Optical Character Recognition (OCR) of pages and article zoning
  • Slide 18
  • 18 OCR Data Reception (Automated process)
      • OCR contractor advises NLA server that a batch has been completed
      • NLA server downloads the batch
      • Batch is ingested into Content Management System
      • Checks are performed on data validity
      • QA Derivatives are generated
      • Articles may now be searched, but are not yet accessible
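The automated reception steps run in a fixed order, which can be sketched as a small pipeline. This is an assumption-laden illustration: the step names, the `process_batch` function, and the handler callables are hypothetical stand-ins for whatever the NLA server actually runs, and stopping at the first failed step is an assumed behaviour.

```python
# Ordered steps taken after the contractor reports a completed batch.
STEPS = ["download", "ingest", "validate", "generate_qa_derivatives"]


def process_batch(batch_id, handlers):
    """Run each step in order; stop at the first step that reports failure.

    handlers maps a step name to a callable taking batch_id and returning
    True on success. A fully processed batch is searchable but stays
    inaccessible until it passes the later QA stage.
    """
    completed = []
    for step in STEPS:
        if not handlers[step](batch_id):
            return {"batch": batch_id, "completed": completed,
                    "failed_at": step, "searchable": False, "accessible": False}
        completed.append(step)
    return {"batch": batch_id, "completed": completed,
            "failed_at": None, "searchable": True, "accessible": False}
```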
  • Slide 19
  • 19 Batch information
  • Slide 20
  • 20 Quality Assurance (QA)
      • A random sample of Issues and Articles is checked
      • Volume and Issue number are checked for accuracy
      • Sample articles are checked against Quality Acceptance Criteria (QAC)
      • Error rates calculated against QAC on the fly
      • Supervisor checks final result and decides on accepting the batch
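The sampling and on-the-fly error rate can be illustrated as follows. The slides do not state the sample size or the QAC thresholds, so `sample_size` and `max_error_rate` are placeholder parameters, and `sample_articles` and `RunningErrorRate` are hypothetical names. The final accept or reject decision remains with the supervisor.

```python
import random


def sample_articles(article_ids, sample_size, seed=None):
    """Draw a random sample of articles (without replacement) for checking."""
    ids = list(article_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


class RunningErrorRate:
    """Track the error rate as each sampled article is checked."""

    def __init__(self, max_error_rate):
        self.max_error_rate = max_error_rate  # placeholder for a QAC threshold
        self.checked = 0
        self.failed = 0

    def record(self, passed):
        self.checked += 1
        if not passed:
            self.failed += 1

    @property
    def error_rate(self):
        return self.failed / self.checked if self.checked else 0.0

    def within_criteria(self):
        return self.error_rate <= self.max_error_rate
```

Recording each check as it happens means the operator sees the current rate after every article instead of waiting for the whole sample to be finished.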
  • Slide 21
  • 21 Selecting the batch
  • Slide 22
  • 22 Volume & Issue Number Check
  • Slide 23
  • 23 Article checked against QAC
  • Slide 24
  • 24 Clean fields checked for accuracy
  • Slide 25
  • 25 Supervisor checks result and makes a decision
  • Slide 26
  • 26 QA Results
      • Automated email sent to supplier advising the result
      • Emails for rejected batches include a summary of errors
      • Summary of errors saved for all batches
      • Accepted batches are immediately accessible
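A result notification of this kind could be composed with Python's standard `email` package. This is a sketch only: `build_result_email`, the wording of the message, and the addresses passed in are hypothetical. The message is built but not sent.

```python
from email.message import EmailMessage


def build_result_email(batch_number, accepted, errors, supplier_address, sender_address):
    """Compose the QA result email; rejected batches get an error summary."""
    outcome = "accepted" if accepted else "rejected"
    message = EmailMessage()
    message["To"] = supplier_address
    message["From"] = sender_address
    message["Subject"] = f"Batch {batch_number} {outcome}"

    lines = [f"Batch {batch_number} has been {outcome}."]
    if not accepted:
        lines.append("Summary of errors:")
        lines.extend(f"- {error}" for error in errors)
    message.set_content("\n".join(lines))
    return message
```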
  • Slide 27
  • 27 Access
      • Access is provided through Australian Newspapers beta
      • Users can search or browse newspapers
      • Search results can be refined using filters
      • Users can browse by Newspaper title or Date
  • Slide 28
  • 28 Search Results
  • Slide 29
  • 29 Newspaper information
  • Slide 30
  • 30 User Interaction
      • Users are able to:
          • Correct the text
          • Add tags
          • Add comments
      • User-added content is not currently moderated, but may be in future.
  • Slide 31
  • 31 Statistics
      • Stats for content received and QA'd are generated on request by the Content Management System
      • Stats for volume usage of the beta are collected using Google Analytics
      • Stats for user contributions to the beta are collected on an as-needed basis
  • Slide 32
  • 32 Content Statistics
  • Slide 33
  • 33 Work Statistics
  • Slide 34
  • 34 Usage Statistics
  • Slide 35
  • 35 Questions?