Using for NLP
Michelle Casbon
Text By the Bay
May 17, 2016
San Francisco
The construction of predictive models, trained on features
extracted from raw text
Turn text into numbers, do some math, and turn
it back into text.
NLP in the wild
• Data ingestion
• Interactive Voice Response
• SMS prioritization
• Multilingual news
• Release feedback
• Intent to purchase
Prediction
Math to the rescue
$\ln\left[\frac{p}{1-p}\right] = a + BX + \varepsilon$
$\frac{p}{1-p} = e^{a + BX + \varepsilon}$
$p = \frac{1}{1 + e^{-a - BX - \varepsilon}}$
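As a quick numeric check of the algebra above, the Python sketch below converts a linear score into a probability and back into log-odds. The intercept, coefficients, and feature values are made-up illustration numbers, not figures from the talk, and the error term is left out.

```python
import math

def log_odds(p):
    """ln[p / (1 - p)]: map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))

def probability(a, B, X):
    """p = 1 / (1 + exp(-(a + B.X))): map a linear score to a probability."""
    score = a + sum(b * x for b, x in zip(B, X))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical intercept, coefficients, and feature vector.
a = -1.0
B = [0.5, -0.25, 0.4]
X = [1.0, 0.0, 3.0]

p = probability(a, B, X)

# Applying the log-odds transform recovers the linear score a + B.X.
score = a + sum(b * x for b, x in zip(B, X))
assert abs(log_odds(p) - score) < 1e-9
```

The two functions are inverses of each other, which is why a model fitted on the log-odds scale can report a probability for every document.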
MLlib to the rescue
Training Data
pipeline.fit(training)
[1.0, 3.0, 7.0, …]
IdiML to the rescue
https://github.com/g-c-k/idiml
IdiML
• Feature extraction
• Model training
• Prediction
[1.0, [1.0, 0.0, 3.0]]
Feature Extraction
Training
Prediction
[1.0, 0.0, 3.0]
Lorem ipsum dolor sit amet, consectetur adipiscing elit
PROFIT
Featurization
Extract Content
Tokenize
Bigrams
Trigrams
Feature Lookup
[1.0, 0.0, 3.0]
Vector
Lorem ipsum dolor sit amet, consectetur adipiscing elit
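The featurization steps on this slide (tokenize, build bigrams and trigrams, look each feature up, emit a vector) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not IdiML's implementation: the tokenizer regex, the function names, and the count-based vector are all hypothetical choices.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(text):
    """Unigrams, bigrams, and trigrams for one document."""
    tokens = tokenize(text)
    return tokens + ngrams(tokens, 2) + ngrams(tokens, 3)

def build_lookup(documents):
    """Assign each distinct feature a vector index, in first-seen order."""
    lookup = {}
    for doc in documents:
        for feature in extract_features(doc):
            lookup.setdefault(feature, len(lookup))
    return lookup

def vectorize(text, lookup):
    """Count known features; features missing from the lookup are ignored."""
    counts = Counter(f for f in extract_features(text) if f in lookup)
    vector = [0.0] * len(lookup)
    for feature, count in counts.items():
        vector[lookup[feature]] = float(count)
    return vector

# Hypothetical training text and new document.
lookup = build_lookup(["Lorem ipsum dolor sit amet"])
vector = vectorize("ipsum dolor ipsum", lookup)
```

The lookup table is what keeps training and prediction aligned: a new document is mapped through the same feature-to-index table, so position i in its vector means the same thing it meant at training time.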
Model Training
LogisticRegressionWithLBFGS
[1.0, [1.0, 0.0, 3.0]]
LabeledPoint
Model Storage
[1.0, 0.0, 3.0]
Vector
Add classification
LogisticRegressionModel
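To make the training step concrete, here is a dependency-free Python sketch that fits a logistic regression on (label, vector) pairs shaped like the LabeledPoint above. The slide names MLlib's LogisticRegressionWithLBFGS; this sketch swaps in plain batch gradient descent so it runs without Spark, and the training data, learning rate, and iteration count are made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: map a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train(labeled_points, learning_rate=0.1, iterations=2000):
    """Fit intercept and weights by batch gradient descent on log-loss."""
    dim = len(labeled_points[0][1])
    intercept, weights = 0.0, [0.0] * dim
    n = len(labeled_points)
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * dim
        for label, features in labeled_points:
            z = intercept + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        intercept -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights

def predict(model, features):
    """Return 1.0 if the predicted probability is at least 0.5, else 0.0."""
    intercept, weights = model
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 if sigmoid(z) >= 0.5 else 0.0

# Hypothetical (label, feature vector) pairs.
training = [
    (1.0, [1.0, 0.0, 3.0]),
    (1.0, [2.0, 0.0, 2.0]),
    (0.0, [0.0, 1.0, 0.0]),
    (0.0, [0.0, 2.0, 1.0]),
]
model = train(training)
```

The returned (intercept, weights) pair plays the role of the stored model: it is all that prediction needs, which is why it can be saved once and looked up later.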
Prediction
Extract Content
Tokenize
Bigrams
Trigrams
Feature Lookup
[0.0, 1.0, 4.0]
Vector
Model Lookup
Predict
New document
[0.0, 1.0, 4.0]
Vector
Classification Lookup
Lorem ipsum dolor sit amet, consectetur adipiscing elit
PROFIT
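The prediction flow above (vector in, model lookup, predict, classification lookup) might look roughly like the Python sketch below. The stored model, its weights, the model name, and the class names are hypothetical stand-ins, not values or APIs from IdiML; only the input vector is taken from the slide.

```python
import math

# Hypothetical stored state standing in for the model and classification lookups.
MODEL_STORE = {"demo": {"intercept": -1.0, "weights": [0.5, -0.25, 0.4]}}
CLASSIFICATIONS = {0.0: "negative", 1.0: "positive"}

def predict_probability(model, vector):
    """Apply the logistic function to the model's linear score."""
    z = model["intercept"] + sum(w * x for w, x in zip(model["weights"], vector))
    return 1.0 / (1.0 + math.exp(-z))

def classify(model_name, vector, threshold=0.5):
    """Model lookup, prediction, then classification lookup."""
    model = MODEL_STORE[model_name]
    p = predict_probability(model, vector)
    label = 1.0 if p >= threshold else 0.0
    return CLASSIFICATIONS[label], p

# Feature vector for the new document, as shown on the slide.
name, p = classify("demo", [0.0, 1.0, 4.0])
```

Keeping the numeric label and the human-readable classification in separate lookups means the model itself never has to know what its classes are called.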
What makes it so great?
Single object
Flexibility
• Deployment environment
• Device
• Logging framework
Standardization for developers
Core functionality
Custom ML
…
REST API
IdiML persistence layer
Version Control
Hyperparameter Tuning
Performance… if you have small data
Task                   Time in µs
Vector prediction      300
DataFrame prediction   7800
DataFrames are slow ...
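To put the two timings in perspective, the short Python calculation below turns them into a slowdown factor and a rough throughput. It assumes predictions run one after another on a single thread, which the slides do not state.

```python
# Timings from the table, in microseconds per prediction.
vector_us = 300
dataframe_us = 7800

# How many times slower the DataFrame path is.
slowdown = dataframe_us / vector_us

# Sequential predictions per second (assumes one prediction at a time).
vector_per_second = 1_000_000 / vector_us
dataframe_per_second = 1_000_000 / dataframe_us
```

Under that assumption the vector path is 26 times faster, which works out to roughly 3,300 predictions per second against roughly 130.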
Performance
Computing power to process the entire Twitter feed in real-time
from this: to this:
What’s next for IdiML?
• Support more statistical models
• Expand automated hyperparameter tuning across the full training pipeline
• Support more options for featurization
• Generic external touchpoints
Summary
• Flexibility, speed, woot!
• Continuous stream processing, woot!
• Multi-language support, woot!
• Scala & MLlib, woot!