
Guannan Lu

March 17, 2015

Word representations: a simple and general method for semi-supervised learning

Outline

Motivation

Word representations

Distributional representations

Clustering-based representations

Distributed representations

Supervised evaluation tasks

Chunking

Named entity recognition (NER)

Experiments & Results

Summary

Motivation

Semi-supervised approaches can improve accuracy

But they can be tricky and time-consuming

A popular approach:

Use unsupervised methods to induce word features

Clustering

Word embeddings

Questions:

Which features are good for which tasks?

Should we prefer certain word features?

Can we combine them?

Word Representations

Word representation: a mathematical object associated with each word, often a vector

Word feature: the value of each dimension

Conventional representation: e.g. the one-hot representation

Problems:

Data sparsity
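To make the sparsity problem concrete, here is a minimal sketch (not from the slides; the toy vocabulary and the 50-dimensional embedding size are made up) contrasting one-hot vectors with a dense, low-dimensional representation:

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """One-hot representation: a |V|-dimensional vector with a single 1."""
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

# Every pair of distinct words is orthogonal, so one-hot vectors carry no
# notion of similarity, and their dimensionality grows with the vocabulary.
print(one_hot("cat") @ one_hot("mat"))      # 0.0 -- "cat" and "mat" look unrelated

# A distributed (embedding) representation is dense and low-dimensional,
# e.g. 50 dimensions per word regardless of |V| (values here are random).
embeddings = np.random.randn(len(vocab), 50) * 0.1
print(embeddings[word_to_id["cat"]].shape)  # (50,)
```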

Distributional representations
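The content of this slide is not preserved in the transcript. As a rough illustration of the general idea only, distributional representations derive a word vector from a word-context co-occurrence matrix and reduce its dimensionality, e.g. with the SVD used in latent semantic analysis (Landauer et al., 1998, cited below); the toy corpus, window size, and target dimensionality in this sketch are made up:

```python
import numpy as np

# Toy corpus; in practice this would be a large unlabeled corpus.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Word-by-word co-occurrence counts within a +/-1 window.
F = np.zeros((len(vocab), len(vocab)))
for s in sentences:
    for i, w in enumerate(s):
        for j in range(max(0, i - 1), min(len(s), i + 2)):
            if j != i:
                F[idx[w], idx[s[j]]] += 1

# LSA-style dimensionality reduction: truncated SVD of the co-occurrence matrix.
U, S, Vt = np.linalg.svd(F)
k = 2  # target dimensionality (tiny here; hundreds in practice)
word_vectors = U[:, :k] * S[:k]
print(word_vectors[idx["cat"]])
```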

Clustering-based representations

Brown clustering (Brown et al., 1992)

A hierarchical clustering algorithm

A class-based bigram language model

Time complexity: O(V·K²)

V is the size of the vocabulary, K is the number of clusters.

Limitations:

Based only on bigram statistics

Does not consider word usage in a wider context
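As a rough illustration of the class-based bigram model that Brown clustering optimizes, the next-word probability factors as P(w_i | w_{i-1}) = P(C(w_i) | C(w_{i-1})) · P(w_i | C(w_i)). The sketch below computes this factorization from toy counts and a hand-picked class assignment; it is not the clustering algorithm itself:

```python
from collections import Counter

# Toy class assignment C(w) and a toy corpus; a real run clusters a large vocabulary.
word_class = {"the": 0, "a": 0, "cat": 1, "dog": 1, "sat": 2, "ran": 2}
corpus = ["the", "cat", "sat", "the", "dog", "ran", "a", "cat", "sat"]

class_seq = [word_class[w] for w in corpus]
class_bigrams = Counter(zip(class_seq, class_seq[1:]))  # counts of (C(w_prev), C(w))
prev_class_counts = Counter(class_seq[:-1])             # how often each class starts a bigram
class_counts = Counter(class_seq)
word_counts = Counter(corpus)

def p_next(prev_word, word):
    """Class-based bigram model: P(w | w_prev) = P(C(w) | C(w_prev)) * P(w | C(w))."""
    c_prev, c = word_class[prev_word], word_class[word]
    p_class = class_bigrams[(c_prev, c)] / prev_class_counts[c_prev]
    p_word_given_class = word_counts[word] / class_counts[c]
    return p_class * p_word_given_class

print(p_next("the", "cat"))
```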

Distributed representations

Not to be confused with distributional representations!

also known as word embeddings

dense, real-valued, low-dimensional

Neural language models

Distributed representations

Collobert and Weston (2008) embeddings

A neural language model

Discriminative and non-probabilistic

General architecture (e.g. SRL, NER, POS tagging)

Implementation differences:

Do not achieve the low log-rank

Corrupt the last word of each n-gram

Separate learning rates are used
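The training signal behind these embeddings is a pairwise ranking criterion: an observed n-gram should outscore, by a margin, the same n-gram with its last word replaced by a random word. A minimal numpy sketch of that hinge loss follows; the one-layer scorer here is a made-up stand-in, not the actual C&W network:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, n = 1000, 50, 5           # toy sizes
E = rng.normal(0, 0.1, (vocab_size, dim))  # word embeddings
W = rng.normal(0, 0.1, (n * dim,))         # toy linear scorer (stand-in for the MLP)

def score(word_ids):
    """Score an n-gram by concatenating its embeddings and applying the scorer."""
    return float(W @ E[word_ids].reshape(-1))

def ranking_loss(ngram_ids):
    """Hinge loss: the true n-gram should outscore a corrupted one by a margin of 1.
    The corruption replaces the *last* word with a random word."""
    corrupted = list(ngram_ids)
    corrupted[-1] = rng.integers(vocab_size)
    return max(0.0, 1.0 - score(ngram_ids) + score(corrupted))

print(ranking_loss([3, 17, 42, 8, 99]))
```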

Distributed representations

HLBL embeddings (Mnih and Hinton, 2009)

Log-bilinear model

Predict the feature vector of the next word

Hierarchical structure (binary tree)

Represent each word as a leaf with a particular path

The probability of a word is the product of the probabilities of the binary choices along its path
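A minimal sketch of that idea, with toy vectors and a hand-built two-node tree rather than the actual HLBL model: the context predicts a feature vector, and the probability of a word is the product of sigmoid decisions along its leaf path.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim = 4
rng = np.random.default_rng(1)

# Log-bilinear prediction: the predicted next-word feature vector is a
# (learned) linear combination of the context words' feature vectors.
context_vectors = rng.normal(size=(2, dim))          # embeddings of two context words
C = [rng.normal(size=(dim, dim)) for _ in range(2)]  # per-position combination matrices
r_hat = sum(Ci @ ri for Ci, ri in zip(C, context_vectors))

# Each internal tree node has a vector; each word is a leaf identified by a
# path of left/right decisions (+1 = left, -1 = right) from the root.
node_vectors = {"root": rng.normal(size=dim), "root.L": rng.normal(size=dim)}
word_path = {"cat": [("root", +1), ("root.L", -1)]}  # toy path for one word

def p_word(word):
    """P(word | context) = product of binary-decision probabilities along its path."""
    p = 1.0
    for node, direction in word_path[word]:
        p *= sigmoid(direction * (node_vectors[node] @ r_hat))
    return p

print(p_word("cat"))
```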

Evaluation tasks

Chunking: syntactic sequence labeling

CoNLL-2000 shared task

Model: CRFsuite (a linear-chain CRF implementation)

Data

The Penn Treebank

7936 sentences (training)

1000 sentences (development)
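To give a concrete sense of how induced word representations plug into a CRFsuite-style tagger, here is a hypothetical per-token feature function (binary string features plus real-valued features); the feature names, the scaling factor, and the Brown-prefix lengths are illustrative choices, not the paper's exact templates:

```python
def token_features(tokens, i, embeddings, brown_paths, scale=0.5):
    """Features for token i: baseline word/affix features plus
    word-representation features (embedding dimensions and Brown-path prefixes)."""
    word = tokens[i]
    feats = {
        "word=" + word.lower(): 1.0,
        "suffix3=" + word[-3:]: 1.0,
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1.0,
    }
    # Real-valued embedding features, scaled before being handed to the CRF.
    for d, value in enumerate(embeddings.get(word.lower(), [])):
        feats[f"emb_{d}"] = scale * value
    # Brown-cluster features: prefixes of the word's path in the cluster hierarchy.
    path = brown_paths.get(word.lower())
    if path:
        for length in (4, 6, 10, 20):
            feats[f"brown_{length}=" + path[:length]] = 1.0
    return feats

# Example usage with toy lookup tables:
emb = {"dog": [0.1, -0.2, 0.05]}
brown = {"dog": "110100101110"}
print(token_features(["the", "dog", "barks"], 1, emb, brown))
```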

Evaluation tasks

NER: sequence prediction problem

The regularized averaged perceptron model (Ratinov and Roth, 2009)

CoNLL-2003 shared task

204K words for training, 51K for development, 46K for testing

Out-of-domain dataset: MUC7 formal run (59K words)
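The tagger here is the regularized averaged perceptron mentioned above; as a rough sketch of the core idea only, this is a plain averaged perceptron for per-token classification (not Ratinov and Roth's exact regularized, structured implementation), over made-up feature dictionaries:

```python
from collections import defaultdict

def train_averaged_perceptron(data, labels, epochs=5):
    """data: list of feature dicts ({feature_name: value}); labels: list of tags.
    Returns averaged weights per label, which are more stable than the final weights."""
    w = {y: defaultdict(float) for y in set(labels)}      # current weights
    total = {y: defaultdict(float) for y in set(labels)}  # running sum for averaging
    steps = 0
    for _ in range(epochs):
        for feats, gold in zip(data, labels):
            steps += 1
            # Predict the highest-scoring label under the current weights.
            pred = max(w, key=lambda y: sum(w[y].get(f, 0.0) * v for f, v in feats.items()))
            if pred != gold:
                # Standard perceptron update: promote the gold label, demote the prediction.
                for f, v in feats.items():
                    w[gold][f] += v
                    w[pred][f] -= v
            # Accumulate current weights for the average.
            for y in w:
                for f, v in w[y].items():
                    total[y][f] += v
    return {y: {f: v / steps for f, v in total[y].items()} for y in w}

# Toy usage:
data = [{"word=paris": 1.0}, {"word=runs": 1.0}]
labels = ["LOC", "O"]
print(train_averaged_perceptron(data, labels)["LOC"])
```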

Evaluation: Features

[Feature templates for chunking and NER]

Experiment

Unlabeled data

RCV1 corpus (63 million words in 3.3 million sentences)

Preprocessing technique (Liang, 2005)

Remove all sentences that are less than 90% lowercase a-z.
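A minimal sketch of that cleaning rule, assuming the 90% threshold is measured over a sentence's non-whitespace characters (an interpretation; the slide does not specify):

```python
def keep_sentence(sentence, threshold=0.9):
    """Keep a sentence only if at least `threshold` of its non-whitespace
    characters are lowercase ASCII letters (a-z)."""
    chars = [c for c in sentence if not c.isspace()]
    if not chars:
        return False
    lowercase = sum(1 for c in chars if "a" <= c <= "z")
    return lowercase / len(chars) >= threshold

corpus = ["the cat sat on the mat", "IBM Q3 REVENUE UP 12%", "a quiet afternoon"]
cleaned = [s for s in corpus if keep_sentence(s)]
print(cleaned)  # drops the mostly-uppercase/numeric line
```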

Results

Capacity of word representations

[Chunking and NER result tables]

Summary

Word features are induced in an unsupervised, task-agnostic, and model-agnostic manner

Disadvantage: accuracy might be lower than with a task-specific semi-supervised method

Contributions:

The first work to compare different word representations

Combining different word representations can further improve accuracy

Future work:

Induce phrase representations

Apply to other supervised NLP systems

References

Brown, P. F., deSouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18, 467–479.

Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML.

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 259–284.

Liang, P. (2005). Semi-supervised learning for natural language. Master's thesis, Massachusetts Institute of Technology.

Mnih, A., & Hinton, G. E. (2009). A scalable hierarchical distributed language model. NIPS (pp. 1081–1088).

Ratinov, L., & Roth, D. (2009). Design challenges and misconceptions in named entity recognition. CoNLL.

Turian, J., Ratinov, L., & Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 384–394).

Q&A

Any questions?

Thank you!