Kotonoha: An Example Sentence Based Spaced Repetition System

Post on 21-Mar-2017

Kotonoha: An Example Sentence Based Spaced Repetition System

Arseny Tolmachev, Sadao Kurohashi

Kyoto University, Graduate School of Informatics

D1

2017-03-15

Background: Learning words with flashcards
• Lexical knowledge is crucial for language learning
• Mostly self-learning
• E.g. Japanese Language Proficiency Test level N1 requires knowing about 10,000 words

Flashcards: a method of organizing information for learning, in which each item is formulated as a question-answer pair

[Figure: a flashcard with a Question side and an Answer side]

Spaced Repetition

[Figure: graph from the supermemo.com website]
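The slide itself only shows a graph. As a minimal sketch of how a spaced repetition scheduler can space reviews further apart after each successful recall, the following uses a simplified SM-2-style rule (the algorithm published by SuperMemo). This is an illustration only, not Kotonoha's actual scheduler; the names `CardState` and `review` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0    # successful reviews in a row
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 "easiness factor"


def review(state: CardState, quality: int) -> CardState:
    """Apply one review graded from 0 (forgot) to 5 (perfect recall)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality < 3:
        # Failed recall: restart the sequence, leave the ease unchanged.
        return CardState(0, 1, state.ease)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Later intervals grow multiplicatively with the ease factor.
        interval = round(state.interval_days * state.ease)
    # SM-2 ease update, clamped at 1.3 from below.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, state.ease + delta)
    return CardState(state.repetitions + 1, interval, ease)
```

With perfect recall each time, the intervals in this sketch grow as 1 day, 6 days, 16 days, and so on, while a failed recall resets the card to a 1-day interval.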

Spaced Repetition: Software
• One of the first implementations: https://www.supermemo.com
• The most popular SRS: Anki, http://ankisrs.net/
• And many more

Japanese (Word) Learning Tools
• Most of them are for elementary/beginner learners
  • Hiragana/katakana
  • Fixed word lists/lessons
• Tools for advanced learners are scarce
  • Anki
  • Several more

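Motivation: Context
• We use words in context: with other words
• Contextual word usage differs from language to language

Real-life example:
• バスは乗客を拾った ("the bus picked up passengers", using 拾う, literally "to pick something up")
  • Non-canonical usage in Japanese, OK in Russian
• バスは乗客を乗せた ("the bus took passengers on board")
  • Canonical usage in Japanese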

Flashcard problems
• Creating flashcards from scratch is time-consuming
  • Need to fill in all the information
  • Possibly find example sentences somewhere
• Premade decks do not work as well as manually created ones
  • A matter of UI and system implementation
• Lack of context
  • Especially in questions
  • Card content is usually fixed (e.g. only one context)

Kotonoha SRS
• Web (responsive)
  • + mobile apps (planned)
• Flashcards
• Spaced Repetition
• Intermediate+

Features
• Example sentences
  • In question cards
• Batch operation
• Japanese-oriented

https://kotonoha.ws

Kotonoha: Usage Pattern
• Find new words
  • Reading books, classes, assignments
• Add words into the system
  • Kotonoha makes it easy to add new words
• Repeat flashcards
  • E.g. 100 cards every day
  • Learn word usage too: Kotonoha shows a new example each repetition
• Have a rich vocabulary (in the long term)
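Showing a new example at each repetition can be sketched as a simple per-word rotation over that word's example sentences. This is only an assumption about how such a feature might work, not a description of Kotonoha's implementation; `ExampleRotation` and `next_example` are hypothetical names.

```python
class ExampleRotation:
    """Cycles through the example sentences stored for each word."""

    def __init__(self, examples):
        # examples: mapping from a word to its list of example sentences
        self._examples = {word: list(sents) for word, sents in examples.items()}
        self._position = {}  # next index to show, per word

    def next_example(self, word):
        """Return the next sentence for `word`, or None if it has none."""
        sentences = self._examples.get(word, [])
        if not sentences:
            return None
        index = self._position.get(word, 0)
        # Wrap around once every sentence has been shown.
        self._position[word] = (index + 1) % len(sentences)
        return sentences[index]
```

A learner reviewing 走る three times with two stored sentences would see the first, the second, and then the first again.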

Kotonoha: Adding words

• Batch operation: words are added in lists
• Kotonoha fills in readings and glosses from dictionaries (JMDict, Warodai)
• Kotonoha assigns example sentences
• Easy to learn the words you want
• Screenshot annotations: "Word was already added", "Word was not added before", "Report that you forgot the word"
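The batch workflow above can be sketched roughly as follows. The dictionary here is a toy in-memory mapping standing in for JMDict or Warodai, and `Card` and `add_batch` are hypothetical names for illustration, not Kotonoha's API.

```python
from dataclasses import dataclass


@dataclass
class Card:
    word: str
    reading: str
    glosses: list


def add_batch(words, dictionary, deck):
    """Add a list of words to `deck` (a dict mapping word -> Card).

    `dictionary` maps a word to a (reading, glosses) pair.
    Returns three lists: (added, already_present, not_found).
    """
    added, already_present, not_found = [], [], []
    for word in words:
        if word in deck:
            # Corresponds to the "word was already added" marker.
            already_present.append(word)
        elif word in dictionary:
            reading, glosses = dictionary[word]
            deck[word] = Card(word, reading, list(glosses))
            added.append(word)
        else:
            not_found.append(word)
    return added, already_present, not_found


toy_dictionary = {
    "走る": ("はしる", ["to run"]),
    "拾う": ("ひろう", ["to pick up"]),
}
```

Calling `add_batch(["走る", "拾う"], toy_dictionary, deck)` on an empty `deck` creates two cards with readings and glosses filled in; repeating the call reports both words as already present.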

Kotonoha: Adding words (2)

• Check what gets into flashcards
• Recommendations: words using the same characters

Kotonoha: Repetition

[Screenshots: question card (reading), answer card]

Kotonoha: Writing Printouts

Print out and practice writing difficult words

Example sentences
• Automatically extracted from a web corpus
  • The Tatoeba corpus is small and not very diverse
• Consider a set of sentences for a target word
• Three aspects: Value, Diversity, Coverage
• Intrinsic Value (for a single sentence)
  • Not a garbage sentence, such as a fragment of something
  • Representative usage of the target
  • Understandable by a learner
  • Grammatical
• Diversity (for a sentence set)
  • Different usages of the target, distinct words
• Coverage: acquire usages of rare words and rare senses
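One way to trade off value against diversity when picking a small sentence set is a greedy selection that penalizes overlap with sentences already chosen. The sketch below is a generic illustration of that idea, not the authors' method: the per-sentence value scores, the Jaccard word-overlap measure, and the `penalty` weight are all assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token lists, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_examples(candidates, k, penalty=0.5):
    """Greedily pick up to k sentences.

    candidates: list of (tokens, value) pairs, where `value` is a
    hypothetical precomputed intrinsic-value score for the sentence.
    """
    selected = []
    remaining = list(candidates)

    def gain(candidate):
        tokens, value = candidate
        # Redundancy: similarity to the closest already-selected sentence.
        redundancy = max(
            (jaccard(tokens, chosen[0]) for chosen in selected), default=0.0
        )
        return value - penalty * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With this rule, a near-duplicate of an already selected sentence can lose to a lower-valued sentence that shows the target word in a different context.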

Example sentence extraction overview

[Figure: pipeline diagram] Raw Corpus → Preprocessing (analyze and index) → Search Engine → Example Candidates (~10k sentences) → Selection → High-quality sentences (~10-15)

• Query: 走る
• Search: solves the coverage problem
  • Distributed
  • Handles huge corpora
  • Uses lexical dependency information
  • Prefers sentences with rich syntactic structure near the target
• Selection: deals with value and diversity
• Example sentences for the query 走る:
  • 私は走るのすき
  • 走っている子供を見た
  • …
  • 遊びに走る若者
  • 酒に走りたい気持ち
  • …
  • 悪事千里を走る

Details are out of scope of this presentation

Flashcards: Daily Repetition

Example sentence evaluation

• This is an idea; there are no results yet
• Show different sentences to learners of a similar level
• Assumption 1: Good example sentences help learners remember words
• Assumption 2: We can use learner confidence to judge the educational quality of a sentence
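Under Assumption 2, one simple way to compare sentences would be to average the confidence that learners report after seeing each sentence. The sketch below is purely illustrative, since the slides state that no results exist yet: the review log format, the 1-5 confidence scale, and the `min_reviews` threshold are assumptions.

```python
from collections import defaultdict
from statistics import mean


def rank_sentences(reviews, min_reviews=2):
    """Rank sentences by mean reported confidence, highest first.

    reviews: iterable of (sentence_id, confidence) pairs, where
    confidence is a hypothetical 1-5 self-report from a learner.
    Sentences with fewer than `min_reviews` observations are skipped.
    """
    by_sentence = defaultdict(list)
    for sentence_id, confidence in reviews:
        by_sentence[sentence_id].append(confidence)
    scored = [
        (sentence_id, mean(values))
        for sentence_id, values in by_sentence.items()
        if len(values) >= min_reviews
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A real evaluation would also need to control for word difficulty and learner level, which is why the slide proposes showing different sentences to learners of a similar level.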

Collecting NLP training data

Kotonoha can be a useful source of NLP training data for:
• Reading estimation
• Word sense disambiguation

Learners are interested in getting this information right

Presently only reporting is implemented

Implementation problems
• Three segmentation standards in one package
• Flashcards are mostly JMDict-based
  • And the words there are rather inconsistent
  • On the other hand, it is not a segmentation dictionary
• Example sentence extraction uses the JUMAN/KNP pipeline
• Reading estimation is done using KyTea/UniDic
  • And resources for reading-annotated Japanese are extremely scarce :(
• Because of this
  • Some example sentence coverage problems
  • Reading estimation errors

Kotonoha: Present
• Available: https://kotonoha.ws
• Open source (core SRS)
  • https://github.com/kotonoha/server
• (Very low volume) open beta test
  • Will try to increase the user base in the following months
  • Potential users (Japanese learners) are very welcome!
• Side note:
  • https://github.com/kotonoha/akane
  • Scala library for JUMAN/KNP/KyTea and others