SMART-GS: A Tool for Studying Digitized Historical Manuscripts

SMART-GS: A Tool for Studying Digitized Historical Manuscripts Yuta Hashimoto PhD student, Department of Humanistic Informatics Kyoto University March 15, 2015 @ University of Michigan


Page 1: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Yuta Hashimoto
PhD student, Department of Humanistic Informatics
Kyoto University

March 15, 2015 @ University of Michigan

Page 2: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Introduction

• Who am I
  • A PhD student studying DH at Kyoto University
  • Research interest: Digital History
  • Background: History of Science
  • Also an iOS/Android developer
    • Kin Digi Reader (近デジリーダー)
    • A mobile reader for the Kindai Digital Library

• In this talk, I will…
  • Introduce an application named SMART-GS
  • And its possible contributions to Japanese studies

Page 3: SMART-GS: A Tool for Studying Digitized Historical Manuscripts
Page 4: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

What is SMART-GS?

• A transcription/annotation suite for digitized historical manuscripts

• Developed at Kyoto University since 2007

• An open source project

• SMART-GS is NOT
  • An OCR application for handwritten texts
  • A language-dependent application

Page 5: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

A Screenshot

Page 6: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Project Background: The Increase of Large-Scale Digital Archives

Page 7: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

How Should Historians Handle Digital Images?

David Hilbert (1862-1943)

Page 8: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Problems with Paper-based Research

1. Papers are heavy and require space

2. The “metadata” added to the manuscripts is difficult to share with co-workers

3. Organizing information is also difficult
   • Searching, grouping, indexing, etc.

Page 9: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Main Features of SMART-GS

Page 10: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Introducing SMART-GS

Page 11: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Markup Functions for Texts and Images

• Various ways of marking up image regions:
  • Rectangle or polygon shapes
  • Drawing an arrow from one region to another
  • Putting a comment on it
  • etc.

• HTML markup for texts:
  • Highlighting a certain word or phrase
  • Adding a link to an external website

Page 12: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Linking Markups

• Any two markups can be linked to each other

• These links are one-to-many and bidirectional

• A link itself can be annotated
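The linking model above can be sketched as a small data structure. This is only an illustration of one-to-many, bidirectional, annotatable links; the names `Link`, `LinkStore`, `link`, and `neighbours` are hypothetical and are not taken from SMART-GS itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Link:
    """A link between two markups; the link carries its own annotation."""
    a: str          # id of one markup (hypothetical id scheme)
    b: str          # id of the other markup
    note: str = ""  # annotation attached to the link itself


class LinkStore:
    """Stores links so that they can be followed from either end."""

    def __init__(self) -> None:
        self._by_markup: Dict[str, List[Link]] = defaultdict(list)

    def link(self, a: str, b: str, note: str = "") -> Link:
        new_link = Link(a, b, note)
        # Register the same Link object under both ends (bidirectional).
        self._by_markup[a].append(new_link)
        self._by_markup[b].append(new_link)
        return new_link

    def neighbours(self, markup_id: str) -> List[str]:
        # One markup may take part in many links (one-to-many).
        links = self._by_markup.get(markup_id, [])
        return [l.b if l.a == markup_id else l.a for l in links]
```

For example, an image region could be linked to two text spans, and the region could then be reached again from either span by calling `neighbours` on that span's id.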

Page 13: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Word Spotting for Handwritten Text (DSC Search)

Search results for query “Scheler” (a German philosopher’s name)

Page 14: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

How DSC Search Indexes Images

1. Separate the image into lines

2. Divide each line into thin slits

3. Compute a gradient vector for each pixel in each slit

4. Accumulate these gradient vectors (which will be used as “feature vectors”)
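Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration and not the actual DSC Search code: the function name `slit_features`, the slit width, the central-difference gradient, and the four-bin accumulation by sign are all assumptions made for the example. The input is assumed to be one already-separated text line, given as rows of grayscale values.

```python
from typing import List

Image = List[List[float]]  # grayscale rows, all of the same length


def slit_features(line_img: Image, slit_width: int = 2) -> List[List[float]]:
    """Return one feature vector per vertical slit of a single text-line image."""
    height, width = len(line_img), len(line_img[0])
    features = []
    for x0 in range(0, width, slit_width):
        x1 = min(x0 + slit_width, width)
        # Accumulate gradient components by sign so that opposite
        # stroke edges do not cancel out: [+gx, -gx, +gy, -gy].
        acc = [0.0, 0.0, 0.0, 0.0]
        for y in range(height):
            for x in range(x0, x1):
                # Differences between neighbouring pixels, clamped at the borders.
                gx = line_img[y][min(x + 1, width - 1)] - line_img[y][max(x - 1, 0)]
                gy = line_img[min(y + 1, height - 1)][x] - line_img[max(y - 1, 0)][x]
                acc[0] += max(gx, 0.0)
                acc[1] += max(-gx, 0.0)
                acc[2] += max(gy, 0.0)
                acc[3] += max(-gy, 0.0)
        features.append(acc)
    return features
```

Each line image then becomes a left-to-right sequence of small vectors, which is the form a sequence comparison needs.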

Page 15: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

How DSC Search Finds Similar Images

Query image

Candidate Image

• DSC Search calculates the “distances” between the query and candidate images by comparing their feature vector sequences

• The smaller the distance, the more likely it is that the two images have similar shapes
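The slides do not say how the feature vector sequences are compared. As one possible illustration, the sketch below uses dynamic time warping (DTW), a common choice for comparing sequences of different lengths; this is an assumption, not a description of DSC Search. The names `dtw_distance` and `rank_candidates` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw_distance(query: Sequence[Vector], candidate: Sequence[Vector]) -> float:
    """Dynamic time warping distance between two feature vector sequences."""
    n, m = len(query), len(candidate)
    inf = float("inf")
    # cost[i][j] = best alignment cost of query[:i] against candidate[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], candidate[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_candidates(query: Sequence[Vector],
                    candidates: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return candidate names ordered from smallest to largest distance."""
    return sorted(candidates, key=lambda name: dtw_distance(query, candidates[name]))
```

Because DTW allows one slit to align with several, a word written slightly wider or narrower than the query can still receive a small distance.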

Page 16: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Pros and Cons of DSC Search

• Pros
  • Can be applied to any type of document, regardless of its language or text direction
  • No need to run machine learning

• Cons
  • Requires users to preprocess the images by separating lines
  • Not accurate for manuscripts written by multiple authors

Page 17: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Applications of SMART-GS to Historical Research Projects

Page 18: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Transcription Project of Kuratomi’s Diary

• Baron Yuzaburo Kuratomi (1853-1948)
  • An elite bureaucrat-politician of the Meiji, Taisho, and early Showa eras

• Project goal
  • To publish a complete transcription of Kuratomi’s diary
  • Which consists of more than 300 notebooks

Page 19: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Team-based Transcription with SMART-GS

WebDAV Server

gsx file

1. Create draft transcriptions

2. Add annotations

3. Revise and finalize transcription texts

Page 20: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Transcription of Hajime Tanabe’s Lecture Notebooks

• Hajime Tanabe (1885-1962)
  • One of the prominent philosophers of the Kyoto School

• Tanabe’s lecture notebooks
  • Written in Japanese, German, Latin, Greek, and English
  • And written in extremely bad handwriting

Page 21: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Group Reading of Tanabe’s Notebooks

Page 22: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Transcription of Earthquake Records

◀ Teibi Shinsai Roku (丁未震災録): A record of a large earthquake that took place in 1847

▲ Reading Group of Earthquake Records (古地震研究会)

Page 23: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

How SMART-GS can Contribute to Japanese Studies

Page 24: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

As a Group Learning Tool

Page 25: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Creating a Shared Dictionary with SMART-GS

Page 26: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

As a Platform for International Collaboration

• NIJL’s large-scale project
  • Titled “Construction of the International Collaborative Network on Japanese Classical Books”
  • 0.3 million books will be digitized and published on the web by 2024

Page 27: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Our Current Attempts

• To have NIJL use SMART-GS as their official transcription tool

• And to make SMART-GS a global platform for Japanese studies

• So that scholars all over the world can cooperate through the network on the same platform

Page 28: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Ongoing Development: the Web Version

Page 29: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Conclusion

• More and more digital images of historical manuscripts have become available on the web

• SMART-GS provides a set of features to handle these digital images effectively

• And it offers ways to collaborate with other scholars through the network

• Our next attempt is to make SMART-GS a global platform where scholars can collaborate with each other

Page 30: SMART-GS: A Tool for Studying Digitized Historical Manuscripts

Thank you for listening!

Any questions or comments?