Lexical Simplification
A subtask of text simplification
  Replacing words or short phrases with simpler variants in a context-aware fashion
Motivation
  To reach a wider range of readers with limited vocabulary
    ▪ Children
    ▪ People with low literacy or a cognitive disability
    ▪ Second-language learners
Involved Processes
Identification of complex words or phrases
Substitute lookup
  Synonyms from a thesaurus
  Distributional similarity
Context-based ranking
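The three processes above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the frequency table, thesaurus, bigram counts, and threshold are hypothetical stand-ins for real resources, and using word frequency as a proxy for complexity is an assumption, not something the slides specify.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources (assumptions for illustration only).
WORD_FREQ: Dict[str, int] = {
    "hitler": 50, "committed": 300, "terrible": 900, "atrocities": 12,
    "cruelties": 40, "barbarities": 15, "during": 5000, "the": 100000,
    "second": 4000, "world": 6000, "war": 5000,
}
THESAURUS: Dict[str, List[str]] = {"atrocities": ["cruelties", "barbarities"]}
BIGRAM_COUNT: Dict[Tuple[str, str], int] = {
    ("terrible", "cruelties"): 7,
    ("terrible", "barbarities"): 1,
}
COMPLEX_THRESHOLD = 20  # words rarer than this are treated as complex


def is_complex(word: str) -> bool:
    """Step 1: identify complex words (here: low-frequency words)."""
    return WORD_FREQ.get(word.lower(), 0) < COMPLEX_THRESHOLD


def candidates(word: str) -> List[str]:
    """Step 2: look up substitutes that are more frequent than the original."""
    base = WORD_FREQ.get(word.lower(), 0)
    return [c for c in THESAURUS.get(word.lower(), []) if WORD_FREQ.get(c, 0) > base]


def rank(prev_word: str, cands: List[str]) -> List[str]:
    """Step 3: rank by fit with the previous word, then by frequency."""
    return sorted(
        cands,
        key=lambda c: (BIGRAM_COUNT.get((prev_word, c), 0), WORD_FREQ.get(c, 0)),
        reverse=True,
    )


def simplify(sentence: str) -> str:
    tokens = sentence.split()
    out = []
    for i, tok in enumerate(tokens):
        if is_complex(tok):
            cands = candidates(tok)
            if cands:
                prev = tokens[i - 1].lower() if i > 0 else ""
                tok = rank(prev, cands)[0]
        out.append(tok)
    return " ".join(out)


print(simplify("Hitler committed terrible atrocities during the second World War"))
```

With these toy tables, "atrocities" is the only word flagged as complex, and "cruelties" wins the ranking because it co-occurs more often with "terrible".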
Examples
Technical Medical Language
  Original: Hypertension risk factors include obesity,...
  Simplified: High blood pressure risk factors include excessive weight,...
Legal Language
  Original: The Products transacted through the Service are...
  Simplified: The Products managed through the Service are...
Low Literacy Readers
  Original: Hitler committed terrible atrocities during the second World War
  Simplified: Hitler committed terrible cruelties during the second World War
Related Approaches
Knowledge-based approach
  Uses a thesaurus or WordNet
  Hard to capture all simplification contexts
Lexical simplification as paraphrasing
  Paraphrasing does not deal with complexity reduction specifically
Lexical simplification as machine translation
  Requires a complex-simple parallel corpus
  Wikipedia-Simple Wikipedia corpora
    ▪ Not comparable
Wikipedia: Resource for Lexical Simplification
Simple English Wikipedia (SEW)
  An edition of the normal, or Complex, English Wikipedia (CEW) written in simpler constructs with a restricted vocabulary
  Wikipedia for children, low-literacy readers, second-language readers, etc.
  121,095 content pages
  Semi-parallel to its complex counterpart
Resource: "For the Sake of Simplicity: Unsupervised Extraction of Lexical Simplifications from Wikipedia", Yatskar et al.
Edit Model
A Wikipedia article evolves from one version to the next through different types of edits
  fix: correction of grammar or factual content
  simplify: simplification of lexical items or phrases
  no-op: no edit
  spam: removal of spam
Edit Model
Edits in SEW versions are a mix of different types of edits
The task
  Separate the simplification edits from the other edits
Edit Model
Definitions
  An article in Wikipedia corresponds to a title
  Each article has a sequence of versions caused by successive edits
  A word or phrase belongs to an article if some version of that article contains it
  Lexical edit instance
    ▪ A phrase in one version was changed to a different phrase in the next
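As a rough illustration of the definition of a lexical edit instance, the sketch below aligns consecutive versions of an article token by token and collects the spans that were replaced. The function name and the use of Python's `difflib` for alignment are assumptions made for this example; the slides do not say how versions are aligned.

```python
import difflib
from typing import List, Tuple


def lexical_edit_instances(versions: List[str]) -> List[Tuple[str, str]]:
    """Collect (old phrase, new phrase) pairs between consecutive versions."""
    instances = []
    for old, new in zip(versions, versions[1:]):
        old_toks, new_toks = old.split(), new.split()
        matcher = difflib.SequenceMatcher(a=old_toks, b=new_toks, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only substitutions; insertions and deletions are ignored here.
            if tag == "replace":
                instances.append(
                    (" ".join(old_toks[i1:i2]), " ".join(new_toks[j1:j2]))
                )
    return instances


versions = [
    "The Products transacted through the Service are...",
    "The Products managed through the Service are...",
]
print(lexical_edit_instances(versions))
```

On the two versions shown, the only replaced span is "transacted" changed to "managed".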
Edit Model
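Probability that an edit operation is applied to a phrase
Probability of the phrase being modified to another phrase under that operation
Probability that a phrase is edited to another phrase
Our interest
  Probability of a substitution under the simplification edit operation
    ▪ Estimate this probability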
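The relationship between these quantities can be written as a mixture over edit operations. The notation here is an assumption, since the symbols did not survive on the slides: $A$ is the original phrase, $a$ the phrase it is changed to, and $o$ an edit operation.

```latex
P(a \mid A) \;=\; \sum_{o \,\in\, \{\text{fix},\ \text{simplify},\ \text{no-op},\ \text{spam}\}} P(o \mid A)\, P(a \mid A, o)
```

Under this reading, the term of interest is the one for $o = \text{simplify}$.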
Edit Model: Simplifying Assumptions
For the sake of simplicity, discard spam edits
A no-op edit leaves the phrase unchanged
Edit Model: Probability Estimates
Assumption
  Occurrences of simplification in ComplexEW are negligible in comparison to fixes
    ▪ Only fix edits occur in ComplexEW
Probability estimate of a fix edit
  Fraction of ComplexEW articles containing the phrase in which the phrase was modified
Edit Model: Probability Estimates
Fraction of SimpleEW articles containing the phrase in which the phrase was modified
  These modifications are fix + simplify edits
Assumption
  The probability of any particular fix operation being applied in SimpleEW is proportional to that in ComplexEW
  The SimpleEW fix rate might be dampened because already-edited ComplexEW articles are copied over
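A minimal numeric sketch of these estimates, under stated assumptions: `alpha` is a hypothetical proportionality constant for the dampened SimpleEW fix rate, and treating the simplification rate as whatever remains of the SimpleEW modification rate after removing the fix share is one reading of "fix + simple edit", not a formula recovered from the slides. The counts are made up.

```python
def modification_rate(n_containing: int, n_modified: int) -> float:
    """Fraction of articles containing a phrase in which the phrase was modified."""
    return n_modified / n_containing if n_containing else 0.0


def estimate_simplify_rate(f_complex: float, f_simple: float, alpha: float = 1.0) -> float:
    """Estimate how often a phrase is simplified in SimpleEW.

    f_complex: modification rate in ComplexEW (assumed to be fixes only)
    f_simple:  modification rate in SimpleEW (fixes plus simplifications)
    alpha:     hypothetical proportionality constant (assumption)
    """
    p_fix_simple = alpha * f_complex
    # Clamp at zero so that a noisy estimate never goes negative.
    return max(0.0, f_simple - p_fix_simple)


# Made-up counts for one phrase.
f_c = modification_rate(n_containing=200, n_modified=10)  # ComplexEW
f_s = modification_rate(n_containing=50, n_modified=10)   # SimpleEW
print(estimate_simplify_rate(f_c, f_s, alpha=0.5))
```

With these counts the ComplexEW rate is 0.05 and the SimpleEW rate is 0.2, so with `alpha=0.5` the estimate is roughly 0.175.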
Edit Model: Probability Estimates
probability that A is changed to a different word in SimpleEW
Estimate of
Estimate of