SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Post on 13-Apr-2017


Yuta Hashimoto, PhD student, Department of Humanistic Informatics, Kyoto University

March 15, 2015 @ University of Michigan

Introduction

• Who am I
  • A PhD student studying DH at Kyoto University
  • Research interest: Digital History
  • Background: History of Science
  • Also an iOS/Android developer
    • Kin Digi Reader (近デジリーダー): a mobile reader for the Kindai Digital Library

• In this talk, I will…
  • Introduce an application named SMART-GS
  • And its possible contributions to Japanese studies

What is SMART-GS?

• A transcription/annotation suite for digitized historical manuscripts

• Has been developed at Kyoto University since 2007

• An open source project

• SMART-GS is NOT
  • An OCR application for handwritten texts
  • A language-dependent application

A Screenshot

Project Background: The Increase of Large-Scale Digital Archives

How Should Historians Handle Digital Images?

David Hilbert (1862-1943)

Problems with Paper-based Research

1. Papers are heavy and require space

2. Difficult to share the “metadata” added to the manuscripts with co-workers

3. Organizing information is also difficult
  • Searching, grouping, indexing, etc.

Main Features of SMART-GS

Introducing SMART-GS

Markup Functions for Texts and Images

• Various ways of marking up image regions:
  • Drawing a rectangle or polygon shape
  • Drawing an arrow from one region to another
  • Putting a comment on a region
  • etc.

• HTML markup for texts:
  • Highlighting a certain word or phrase
  • Adding a link to an external website

Linking Markups

• Any two markups can be linked to each other

• These links are one-to-many and bidirectional

• A link itself can be annotated
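The linking model described above can be pictured as a small graph of markups. The following is a minimal sketch only: the names `Markup`, `Link`, and `LinkStore` are hypothetical and are not taken from the SMART-GS source code. It illustrates links that can be followed from either end (bidirectional), a markup that can have many links (one-to-many), and a link that carries its own note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Markup:
    """A marked-up image region or text span (hypothetical model)."""
    markup_id: str
    kind: str    # e.g. "rectangle", "polygon", "text-highlight"
    target: str  # e.g. an image file name or a transcription id


@dataclass
class Link:
    """A link between two markups that carries its own annotation."""
    a: Markup
    b: Markup
    note: str = ""


class LinkStore:
    """Stores links so that they can be followed from either endpoint."""

    def __init__(self):
        # markup_id -> list of Link objects touching that markup
        self._by_markup = {}

    def link(self, a, b, note=""):
        edge = Link(a, b, note)
        # Register the same Link under both endpoints: this is what makes
        # the link bidirectional, and appending allows one-to-many links.
        self._by_markup.setdefault(a.markup_id, []).append(edge)
        self._by_markup.setdefault(b.markup_id, []).append(edge)
        return edge

    def neighbors(self, m):
        """Return the markups linked to m, in the order they were linked."""
        result = []
        for edge in self._by_markup.get(m.markup_id, []):
            result.append(edge.b if edge.a == m else edge.a)
        return result


# Usage: one image region linked to two other markups.
store = LinkStore()
region = Markup("m1", "rectangle", "page_012.jpg")
phrase = Markup("m2", "text-highlight", "transcription_012")
other = Markup("m3", "polygon", "page_040.jpg")
store.link(region, phrase, note="transcription of this region")
store.link(region, other, note="same name appears here")
```

Registering each link under both endpoints keeps lookups symmetric, so a reader can start from either the image region or the text and reach the other side.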

Word Spotting for Handwritten Text (DSC Search)

Search results for query “Scheler” (a German philosopher’s name)

How DSC Search Indexes Images

1. Separate the image into lines

2. Divide each line into thin slits

3. Compute a gradient vector for each pixel in each slit

4. Accumulate these gradient vectors (the results are used as “feature vectors”)
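The indexing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual DSC Search implementation: the line image is assumed to be already separated (step 1) and given as a list of rows of grayscale values, gradients are taken as central differences, and "accumulate" is read in the simplest way, as a per-slit sum. The function names and the `slit_width` parameter are hypothetical; a real system may well bin gradients by direction instead of summing them.

```python
def pixel_gradient(img, x, y):
    """Central-difference gradient (gx, gy) at pixel (x, y), clamped at borders."""
    height, width = len(img), len(img[0])
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, width - 1)]
    up = img[max(y - 1, 0)][x]
    down = img[min(y + 1, height - 1)][x]
    return (right - left) / 2.0, (down - up) / 2.0


def slit_features(line_img, slit_width=2):
    """Return one accumulated gradient vector per vertical slit of a line image."""
    height, width = len(line_img), len(line_img[0])
    features = []
    # Step 2: walk over the line in thin vertical slits.
    for x0 in range(0, width, slit_width):
        sum_x = sum_y = 0.0
        for x in range(x0, min(x0 + slit_width, width)):
            for y in range(height):
                # Step 3: gradient vector of each pixel in the slit.
                gx, gy = pixel_gradient(line_img, x, y)
                # Step 4: accumulate (assumed here to be a plain sum).
                sum_x += gx
                sum_y += gy
        features.append((sum_x, sum_y))
    return features


# Usage: a tiny 3x4 "line" with a vertical edge between columns 1 and 2.
line = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
features = slit_features(line, slit_width=2)
```

Each line image becomes a left-to-right sequence of small vectors, so two words can later be compared as sequences without recognizing any characters.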

How DSC Search Finds Similar Images

Query image

Candidate Image

• DSC Search calculates the “distances” between the query and candidate images by comparing their feature vector sequences

• The smaller the distance, the more likely the two images are to have similar shapes
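The slides do not say how the distance between two feature vector sequences is computed. One common choice for comparing sequences of different lengths is dynamic time warping (DTW), and the sketch below uses it purely as an assumption; the function names are hypothetical and the code is not taken from SMART-GS.

```python
import math


def vector_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sequence_distance(query, candidate):
    """DTW distance between two sequences of feature vectors (assumed method)."""
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] = best alignment cost of query[:i] against candidate[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = vector_distance(query[i - 1], candidate[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the candidate
                                 cost[i][j - 1],      # stretch the query
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def rank_candidates(query, candidates):
    """Return candidate names sorted from most to least similar to the query."""
    return sorted(candidates,
                  key=lambda name: sequence_distance(query, candidates[name]))


# Usage: a slightly stretched copy of the query ranks above an unrelated shape.
query = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
candidates = {
    "unrelated": [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)],
    "stretched": [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 0.0)],
}
ranking = rank_candidates(query, candidates)
```

An alignment-based distance of this kind tolerates the same word being written slightly wider or narrower, which matters for handwriting.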

Pros and Cons of DSC Search

• Pros
  • Can be applied to any type of document, regardless of its language and text direction
  • No need to run machine learning

• Cons
  • Requires users to preprocess images by separating lines
  • Not accurate for manuscripts written by multiple authors

Applications of SMART-GS to Historical Research Projects

Transcription Project of Kuratomi’s Diary

• Baron Yuzaburo Kuratomi (1853-1943)
  • An elite bureaucrat-politician of the Meiji, Taisho, and early Showa eras

• Project goal
  • To publish a complete transcription of Kuratomi’s diary
  • Which consists of more than 300 notebooks

Team-based Transcription with SMART-GS

WebDAV Server

gsx file

1. Create draft transcriptions

2. Add annotations

3. Revise and finalize transcription texts

Transcription of Hajime Tanabe’s Lecture Notebooks

• Hajime Tanabe (1885-1962)
  • One of the prominent philosophers of the Kyoto School

• Tanabe’s lecture notebooks
  • Written in Japanese, German, Latin, Greek, and English
  • And written in extremely bad handwriting

Group Reading of Tanabe’s Notebooks

Transcription of Earthquake Recordings

◀ Teibi Shinsai Roku (丁未震災録): A record of a large earthquake that took place in 1847

▲Reading Group of Earthquake Recordings (古地震研究会)

How SMART-GS can Contribute to Japanese Studies

As a Group Learning Tool

Creating a Shared Dictionary with SMART-GS

As a Platform for International Collaboration

• NIJL’s large-scale project
  • Titled “Construction of the International Collaborative Network on Japanese Classical Books”
  • 0.3 million books will be digitized and published on the web by 2024

Our Current Attempts

• To have NIJL use SMART-GS as their official transcription tool

• And to make SMART-GS a global platform for Japanese studies

• So that scholars all over the world can cooperate through the network on the same platform

Ongoing Development: the Web Version

Conclusion

• More and more digital images of historical manuscripts have become available on the web

• SMART-GS provides a set of features to handle these digital images effectively

• And it offers ways to collaborate with other scholars through the network

• Our next attempt is to make SMART-GS a global platform where scholars can collaborate with each other

Thank you for listening!

Any questions or comments?