NLP in Scala with Breeze and Epic
description
Transcript of NLP in Scala with Breeze and Epic
NLP in Scala with Breeze and Epic
David HallUC Berkeley
ScalaNLP Ecosystem
Breeze Epic Puck
• Linear Algebra• Scientific Computing• Optimization
• Natural Language Processing
• Structured Prediction
• Super-fast GPUparser for English≈ ≈
Numpy/Scipy PyStruct/NLTK
≈{ }
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
PP
NP
VP
VPNP
V S
VP
S
It certainly won’t get there on looks.
Natural Language Processing
Epic for NLP
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
PP
NP
VP
VPNP
V S
VP
S
Named Entity Recognition
Person Organization Location Misc
NER with Epic> import epic.models.NerSelector> val nerModel = NerSelector.loadNer("en").get> val tokens = epic.preprocess.tokenize("Almost 20
years ago, Bill Watterson walked away from \"Calvin & Hobbes.\"")
> println(nerModel.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [LOC:Calvin & Hobbes] . ''
Not a location!
Annotate a bunch of data?
Building an NER system> val data: IndexedSeq[Segmentation[Label, String]]
= ???> val system = SemiCRF.buildSimple(data,
startLabel,
outsideLabel)> println(system.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [MISC:Calvin & Hobbes] . ''
Gazetteers
http://en.wikipedia.org/wiki/List_of_newspaper_comic_strips_A%E2%80%93F
Gazetteers
Using your own gazetteer> val data: IndexedSeq[Segmentation[Label, String]]
= ???> val myGazetteer = ???> val system = SemiCRF.buildSimple(data,
startLabel, outsideLabel,
gaz = myGazetteer)
Gazetteer• Careful with gazetteers!• If built from training data, system will use it
and only it to make predictions!• So, only known forms will be detected.• Still, can be very useful…
Semi-CR-What?• Semi-Markov Conditional Random Field• Don’t worry about the name.
Semi-CRFs
Semi-CRFs
+ score(Berkeley, CA)+ score(- A bowl of )+ score(Churchill-Brenneis Orchards)+ score(Page mandarins and medjool dates)
score(Chez Panisse)
Features
= w(starts-with-Chez)+w(starts-with-C…)+w(ends-with-P…)+w(starts-sentence)+w(shape:Xxx Xxx) +w(two-words)+w(in-gazetteer)
score(Chez Panisse)
Building your own featuresval dsl = new WordFeaturizer.DSL[L](counts) with SurfaceFeaturizer.DSL import dsl._
word(begin) // word at the beginning of the span + word(end – 1) // end of the span + word(begin – 1) // before (gets things like Mr.) + word (end) // one past the end + prefixes(begin) // prefixes up to some length + suffixes(begin) + length(begin, end) // names tend to be 1-3 words + gazetteer(begin, end)
Using your own featurizer> val data: IndexedSeq[Segmentation[Label, String]]
= ???> val myFeaturizer = ???> val system = SemiCRF.buildSimple(data,
startLabel, outsideLabel,
featurizer = myFeaturizer)
Features• So far, we’ve been able to do everything with
(nearly) no math.• To understand more, need to do some math.
Machine Learning Primer• Training example (x, y)– x: sentence of some sort– y: labeled version
• Goal:– want score(x, y) > score(x, y’), forall y’.
Machine Learning Primer
score(x, y) = wTf(x, y)
Machine Learning Primer
score(x, y) = w.t * f(x, y)
Machine Learning Primer
score(x, y) = w dot f(x, y)
Machine Learning Primer
score(x, y) >= score(x, y’)
Machine Learning Primer
w dot f(x, y) >= w dot f(x, y’)
Machine Learning Primer
w dot f(x, y) >= w dot f(x, y’)
Machine Learning Primer
f(x,y)
f(x,y’)
w
w + f(x,y)
(w + f(x,y)) dot f(x,y) >= w dot f(x,y)
The Perceptron> val featureIndex : Index[Feature] = ???> val labelIndex: Index[Label] = ???> val weights = DenseVector.rand[Double]
(featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val labelScores = DV.tabulate(labelIndex.size) { yy =>val features = featuresFor(x, yy)weights.t * new FeatureVector(indexed)// or weights dot new FeatureVector(indexed) } …}
The Perceptron (cont’d)> val featureIndex : Index[Feature] = ???> val labelIndex: Index[Label] = ???> val weights = DenseVector.rand[Double](featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val labelScores = ... val y_best = argmax(labelScores)
if (y != y_best) {weights += new FeatureVector(featuresFor(x, y))weights -= new FeaureVector(featuresFor(x, y_best))
}}
Structured Perceptron• Can’t enumerate all segmentations! (L * 2n)• But dynamic programs exist to max or sum• … if the feature function has a nice form.
Structured Perceptron> val featureIndex : Index[Feature] = ???> val labelIndex: Index[Label] = ???> val weights = DenseVector.rand[Double]
(featureIndex.size)
> for ( epoch <- 0 until numEpochs; (x, y) <- data ) { val y_best = bestStructure(weights, x) if (y != y_best) { weights += featuresFor(x, y) weights -= featuresFor(x, y_best) } }
Constituency Parsing
Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.
PP
NP
VP
VPNP
V S
VP
S
Multilingual Parser
Avg Ar Ba Fr Ge HeHun Ko Po
Swe
70
75
80
85
90
95"Berkeley"Epic
“Berkeley:” [Petrov & Klein, 2007]; Epic [Hall, Durrett, and Klein, 2014]
Epic Pre-built Models• Parsing
– English, Basque, French, German, Swedish, Polish, Korean
– (working on Arabic, Chinese, Spanish)• Part-of-Speech Tagging
– English, Basque, French, German, Swedish, Polish
• Named Entity Recognition– English
• Sentence segmentation– English – (ok support for others)
• Tokenization– Above languages
What Is Breeze?
Dense Vectors, Matrices, Sparse Vectors,Counters, Matrix Decompositions
What Is Breeze?
Nonlinear Optimization,Probability Distributions
Getting startedlibraryDependencies ++= Seq( "org.scalanlp" %% "breeze" % "0.8.1",
// optional: native linear algebra libraries // bulky, but faster "org.scalanlp" %% "breeze-natives" % "0.8.1")
scalaVersion := "2.11.1"
Linear Algebra> import breeze.linalg._> val x = DenseVector.zeros[Int](5)
// DenseVector(0, 0, 0, 0, 0)
> val m = DenseMatrix.zeros[Int](5,5)
> val r = DenseMatrix.rand(5,5)
> m.t // transpose> x + x // addition> m * x // multiplication by vector> m * 3 // by scalar> m * m // by matrix> m :* m // element wise mult, Matlab .*
Return Type Selection> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)
> dv + sv// res: DenseVector[Double] = DenseVector(1.0, 2.0)
> sv + sv// res: SparseVector[Double] = SparseVector(1.0, 2.0)
Return Type Selection> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)
> (dv: Vector[Double]) + (sv: Vector[Double])// res: Vector[Double] = DenseVector(1.0, 2.0)
> (sv: Vector[Double]) + (sv: Vector[Double])// res: Vector[Double] = SparseVector()
Static: VectorDynamic: Dense
Static: VectorDynamic: Sparse
Linear Algebra: Slices> val m = DenseMatrix.zeros[Int](5, 5)
> m(::,1) // slice a column// DV(0, 0, 0, 0, 0)
> m(4,::) // slice a row
> m(4,::) := DV(1,2,3,4,5).t
> m.toString0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 5
Linear Algebra: Slices> m(0 to 1, 3 to 4).toString
0 0 2 3
> m(IndexedSeq(3,1,4,2),IndexedSeq(4,4,3,1))
0 0 0 0 0 0 0 0 5 5 4 2 0 0 0 0
Universal Functions> import breeze.numerics._
> log(DenseVector(1.0, 2.0, 3.0, 4.0))// DenseVector(0.0, 0.6931471805599453, // 1.0986122886681098, 1.3862943611198906)
> exp(DenseMatrix( (1.0, 2.0), (3.0, 4.0)))
> sin(Array(2.0, 3.0, 4.0, 42.0))
> // also sin, cos, sqrt, asin, floor, round, digamma, trigamma, ...
UFuncs: In-Place> import breeze.numerics._
> val v = DV(1.0, 2.0, 3.0, 4.0)
> log.inPlace(v)// v == DenseVector(0.0, 0.6931471805599453, // 1.0986122886681098, 1.3862943611198906)
UFuncs: Reductions> sum(DenseVector(1.0, 2.0, 3.0))
// 6.0> sum(DenseVector(1, 2, 3))
// 6> mean(DenseVector(1.0, 2.0, 3.0))
// 2.0> mean(DenseMatrix( (1.0, 2.0, 3.0)
(4.0, 5.0, 6.0)))// 3.5
UFuncs: Reductions> val m = DenseMatrix((1.0, 3.0), (4.0, 4.0))
> sum(m) == 12.0
> mean(m) == 3.0
> sum(m(*, ::)) == DenseVector(4.0, 8.0)> sum(m(::, *)) == DenseMatrix((5.0, 7.0))
> mean(m(*, ::)) == DenseVector(2.0, 4.0)> mean(m(::, *)) == DenseMatrix((2.5, 3.5))
Broadcasting> val dm = DenseMatrix((1.0,2.0,3.0), (4.0,5.0,6.0))
> sum(dm(::, *)) == DenseMatrix((5.0, 7.0, 9.0))
> dm(::, *) + DenseVector(3.0, 4.0)// DenseMatrix((4.0, 5.0, 6.0), (8.0, 9.0, 10.0)))
> dm(::, *) := DenseVector(3.0, 4.0)// DenseMatrix((3.0, 3.0, 3.0), (4.0, 4.0, 4.0))
UFuncs: Implementation> object log extends UFunc
> implicit object logDouble extends log.Impl[Double, Double] { def apply(x: Double) = scala.math.log(x)
}
> log(1.0) // == 0.0
> // elsewhere: magic to automatically make this work:> log(DenseVector(1.0)) // == DenseVector(0.0)> log(DenseMatrix(1.0)) // == DenseMatrix(0.0)> log(SparseVector(1.0)) // == SparseVector()
UFuncs: Implementation> object add1 extends UFunc
> implicit object add1Double extends add1.Impl[Double, Double] { def apply(x: Double) = x + 1.0
}
> add1(1.0) // == 2.0
> // elsewhere: magic to automatically make this work:> add1(DenseVector(1.0)) // == DenseVector(2.0)> add1(DenseMatrix(1.0)) // == DenseMatrix(2.0)> add1(SparseVector(1.0)) // == SparseVector(2.0)
Breeze Benchmarks
Breeze jblas Mahout Apache Commons0
500
1000
1500
2000
2500
3000
135.04
685.5
2626
2151
Singular Value Decomposition
Lower is better
Breeze Benchmarks
Breeze jblas Mahout Apache Commons0
500
1000
1500
2000
2500
3000
26
2562
56
Sparse/Dense Multiply
Lower is better
N/A
Breeze Benchmarks
Breeze jblas Mahout Apache Commons0
5
10
15
20
25
30
0.037 0.075
25.376
Sparse/Dense Addition
Lower is better
N/A
Nonnegative Matrix Factorization
V W H≈ Vij, Wij, Hij >= 0
Nonnegative Matrix Factorization
• Input: matrix V• Output: W * H ≈ V, W and H ≥ 0
[Lee and Seung, 2011]
Breeze NMF val n = V.rows val m = V.cols val r = dims val W = DenseMatrix.rand(n, r) val H = DenseMatrix.rand(r, m)
for(i <- 0 until iters) { H :*= ((W.t * V) :/= (W.t * W * H)) W :*= ((V * H.t) :/= (W * H * H.t)) }
(W, H)
From Breeze to Gust
+
Breeze NMF val n = X.rows val m = X.cols val r = dims val W = DenseMatrix.rand(n, r) val H = DenseMatrix.rand(r, m)
for(i <- 0 until iters) { H :*= ((W.t * X) :/= (W.t * W * H)) W :*= ((X * H.t) :/= (W * H * H.t)) }
(W, H)
Gust NMF val n = X.rows val m = X.cols val r = dims val W = CuMatrix.rand(n, r) val H = CuMatrix.rand(r, m)
for(i <- 0 until iters) { H :*= ((W.t * X) :/= (W.t * W * H)) W :*= ((X * H.t) :/= (W * H * H.t)) }
(W, H)
≥6x speedup on my laptop!
Optimization
(x2 + y – 11)2 + (x + y2 – 7)2
Optimizationtrait DiffFunction[T] extends (T=>Double) { /** Calculates both the value
and the gradient at a point */ def calculate(x:T):(Double,T)}
Optimizationval df = new DiffFunction[DV[Double]] { def calculate(values: DV[Double]) = { val gradient = DV.zeros[Double](2) val (x,y) = (values(0),values(1)) val value = pow(x * x + y - 11, 2) + pow(x + y * y - 7, 2) gradient(0) = 4 * x * (x * x + y - 11) + 2 * (x + y * y - 7) gradient(1) = 2 * (x * x + y - 11) + 4 * y * (x + y * y - 7)
(value, gradient)
}}
Optimizationbreeze.optimize.minimize(df, DV(0.0, 0.0))
// DenseVector(3.0000000000012905, 1.999999999997128)
Optimization• Fully configurable• Support for:– LBFGS (+ L2/L1 Regularization)– Stochastic Gradient (+ L2/L1 Regularization)– Truncated Newton– Linear Programs– Conjugate Gradient Descent
Thanks!
Breeze Epic Puck
www.github.com/dlwh/{breeze, epic, puck}