Kotonoha: An Example Sentence Based Spaced Repetition System
Arseny Tolmachev, Sadao Kurohashi
Kyoto University, Graduate School of Informatics
D1
2017-03-15
Background: Learning words with flashcards
• Lexical knowledge is crucial for language learning
• Mostly self-learning
• E.g. Japanese Language Proficiency Test level N1 requires knowing about 10,000 words
2
Flashcards: a method of organizing information that can be formulated in question-answer form for learning
[Figure: a flashcard with a Question side and an Answer side]
3
Spaced Repetition
[Figure: graph from the supermemo.com website]
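The scheduling idea behind spaced repetition can be illustrated with a minimal sketch in the style of SM-2, the algorithm published by SuperMemo. The slides do not say which scheduling algorithm Kotonoha uses, so this is only an illustration; the `Card` class and the `review` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 "easiness factor", never below 1.3


def review(card: Card, quality: int) -> Card:
    """Return the new card state after a review graded 0 (forgot) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease unchanged.
        return Card(repetitions=0, interval_days=1, ease=card.ease)
    # Successful recall: adjust the ease, then grow the interval.
    penalty = (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, card.ease + 0.1 - penalty)
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = round(card.interval_days * ease)
    return Card(card.repetitions + 1, interval, ease)
```

Calling `review` repeatedly with high grades makes the interval grow roughly geometrically, while a failed review resets the card to a one-day interval.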
4
Spaced Repetition: Software
• One of the first implementations: https://www.supermemo.com
• The most popular SRS: Anki, http://ankisrs.net/
• And many more
5
Japanese (Word) Learning Tools
• Most of them are for elementary/beginner learners
  • Hiragana/katakana
  • Fixed word lists/lessons
• Tools for advanced learners are scarce
  • Anki
  • Several more
6
Motivation: Context
• We use words in context: with other words
• Contextual word usage differs from language to language
Real-life example:
• バスは乗客を拾った ("The bus picked up passengers"): non-canonical usage in Japanese, OK in Russian
• バスは乗客を乗せた ("The bus took passengers on board"): canonical usage in Japanese
7
Flashcard problems
• Creating flashcards from scratch is time-consuming
  • Need to fill in all information
  • Possibly find example sentences somewhere
• Premade decks do not work as well as manually created ones
  • Matter of UI and system implementation
• Lack of context
  • Especially in questions
  • Card content is usually fixed (e.g. only one context)
8
Kotonoha SRS
• Web (responsive)
  • + mobile apps (planned)
• Flashcards
• Spaced Repetition
• Intermediate+
Features
• Example sentences
  • In question cards
• Batch operation
• Japanese-oriented
https://kotonoha.ws
9
Kotonoha: Usage Pattern
• Find new words
  • Reading books, classes, assignments
• Add words into the system
  • Kotonoha makes it easy to add new words
• Repeat flashcards
  • E.g. 100 cards every day
  • Learn word usage too: Kotonoha shows a new example at each repetition
• Have a rich vocabulary (in the long term)
10
Kotonoha: Adding words
• Batch operation: words are added in lists
• Kotonoha fills in readings and glosses from dictionaries (JMDict, Warodai)
• Kotonoha assigns example sentences
• Easy to learn the words you want
• Word was already added
• Word was not added before
• Report that you forgot the word
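The batch flow described on this slide could be sketched roughly as follows. This is not Kotonoha's code: `TOY_DICTIONARY` is a made-up in-memory stand-in for a real dictionary such as JMDict, and `Candidate` and `prepare_batch` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Toy stand-in for a real dictionary; the entries are illustrative only.
TOY_DICTIONARY = {
    "走る": ("はしる", "to run"),
    "乗せる": ("のせる", "to take on board"),
}


@dataclass
class Candidate:
    word: str
    reading: Optional[str]  # None when the word is not in the dictionary
    gloss: Optional[str]
    already_added: bool     # True if the learner already has this word


def prepare_batch(words, known_words, dictionary=TOY_DICTIONARY):
    """Fill in reading and gloss for each word and flag words added earlier."""
    batch = []
    for word in words:
        reading, gloss = dictionary.get(word, (None, None))
        batch.append(Candidate(word, reading, gloss, word in known_words))
    return batch
```

For example, `prepare_batch(["走る", "拾う"], known_words={"走る"})` marks 走る as already added and leaves the reading of 拾う empty, because the toy dictionary has no entry for it.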
11
Kotonoha: Adding words (2)
• Check what gets into flashcards
• Recommendations: words using the same characters
12
Kotonoha: Repetition
[Figure: question card (reading) and answer card]
13
Kotonoha: Writing Printouts
Print out and practice writing difficult words
14
Example sentences
• Automatically extracted from a web corpus
  • The Tatoeba corpus is small and not very diverse
• Consider a set of sentences for a target word
• Three aspects: Value, Diversity, Coverage
• Intrinsic Value (for a single sentence)
  • Not a garbage sentence, such as a fragment of something
  • Representative usage of the target
  • Understandable by a learner
  • Grammatical
• Diversity (for a sentence set)
  • Different usages of the target, distinct words
• Coverage: acquire usages of rare words and rare senses
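One way to picture the trade-off between value and diversity is a greedy selection that penalizes overlap with sentences already chosen. The sketch below is an assumption for illustration, not the selection method used in Kotonoha: the per-sentence value scores are taken as given, overlap is measured with Jaccard similarity over pre-segmented tokens, and `select_examples` and `diversity_weight` are hypothetical names. Coverage is not modeled here.

```python
def jaccard(a: set, b: set) -> float:
    """Share of tokens two sentences have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(candidates, k=3, diversity_weight=0.5):
    """Greedily pick up to k sentences, trading value against redundancy.

    candidates: list of (tokens, value) pairs, where tokens is a list of
    already segmented words and value is a score in [0, 1].
    Returns the indices of the chosen candidates in selection order.
    """
    token_sets = [set(tokens) for tokens, _ in candidates]
    chosen = []
    remaining = set(range(len(candidates)))
    while remaining and len(chosen) < k:
        def gain(i):
            # Redundancy is the highest overlap with any sentence chosen so far.
            redundancy = max(
                (jaccard(token_sets[i], token_sets[j]) for j in chosen),
                default=0.0,
            )
            return candidates[i][1] - diversity_weight * redundancy
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: two near-duplicate sentences and one with a different usage of 走る.
toy_candidates = [
    (["私", "は", "走る", "の", "が", "好き"], 0.9),
    (["私", "は", "走る", "の", "が", "嫌い"], 0.8),
    (["若者", "が", "遊び", "に", "走る"], 0.6),
]
```

With the toy data and `k=2`, the near-duplicate second sentence is skipped in favor of the lower-valued but different third one; setting `diversity_weight=0` reduces the procedure to picking the highest-valued sentences.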
15
Example sentence extraction overview
[Diagram] Raw Corpus → Preprocessing (analyze and index) → Search Engine → Example Candidates (~10k sentences) → Selection → High-quality sentences (~10-15)
Query: 走る
Sentences shown in the diagram: 私は走るのすき / 走っている子供を見た / … / 遊びに走る若者 / 酒に走りたい気持ち / … / 悪事千里を走る
Search: solving the coverage problem
• Distributed
• Handles huge corpora
• Uses lexical dependency information
• Prefers sentences with rich syntactic structure near the target
Selection: dealing with value and diversity
Details are out of scope of this presentation
16
Flashcards: Daily Repetition
17
Example sentence evaluation
This is an idea; there are no results yet
Show different sentences to learners of a similar level
Assumption: Good example sentences help learners remember words.
Assumption 2: We can use confidence to judge the educational quality of a sentence
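The evaluation idea could be prototyped by averaging the confidence learners report for cards shown with each sentence. The sketch below is purely illustrative: the review log format, the numeric confidence scale, and the `mean_confidence` function are assumptions, and the sample data used with it is made up rather than measured.

```python
from collections import defaultdict
from statistics import mean


def mean_confidence(reviews, min_reviews=2):
    """Average reported confidence per example sentence.

    reviews: iterable of (sentence_id, confidence) pairs from learners of a
    similar level. Sentences with fewer than min_reviews observations are
    dropped, since a single answer says little about the sentence.
    """
    by_sentence = defaultdict(list)
    for sentence_id, confidence in reviews:
        by_sentence[sentence_id].append(confidence)
    return {
        sentence_id: mean(values)
        for sentence_id, values in by_sentence.items()
        if len(values) >= min_reviews
    }
```

Under Assumption 2, sentences with a higher average would be the better teaching examples. A real study would also have to control for word difficulty and for how often each learner had already seen the word.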
21
Collecting NLP training data
Kotonoha can be a useful source of NLP training data for:
• Reading estimation
• Word sense disambiguation
Learners are interested in getting this information right
Presently only reporting is implemented
22
Implementation problems
• Three segmentation standards in one package
• Flashcards are mostly JMDict-based
  • And the words there are rather inconsistent
  • On the other hand, it is not a segmentation dictionary
• Example sentence extraction uses the JUMAN/KNP pipeline
• Reading estimation is done using KyTea/UniDic
  • And resources for reading-annotated Japanese are extremely scarce :(
• Because of this
  • Some example sentence coverage problems
  • Reading estimation errors
23
Kotonoha: Present
• Available: https://kotonoha.ws
• Open source (core SRS)
  • https://github.com/kotonoha/server
• (Very low volume) open beta test
  • Will try to increase the user base in the following months
  • Potential users (Japanese learners) are very welcome!
• Side note:
  • https://github.com/kotonoha/akane
  • Scala library for JUMAN/KNP/KyTea and more