Shifu plugin-trainer and pmml-adapter

Shifu-Plugin Demo

Lisa Hua, 7/21/2014

1. Convert PMML back to ML models
2. Integrate into Shifu as Shifu-plugin-*
3. Add examples
4. Performance test for the PMML evaluator

Miscellaneous

1. Compatibility issue: Spark depends on Akka 2.2.3, while Shifu uses 2.1.1.
2. Spark overview
3. About the showcase:
   a. a video that introduces Shifu
   b. a poster that describes my project
   c. a project title and project description

PMML Adapter Demo

Lisa Hua, 06/23/14

ML Framework | Neural Network | Logistic Regression | SVM | Decision Tree
Encog        | Support        | Support             | TBD | None
Spark        | None           | Support             | TBD | TBD
Mahout       | Support        | Support             | TBD | TBD
H2o          | TBD            | None                | TBD | TBD

Outline

1. Neural Network Model Conversion
   a. Encog NN model
   b. Mahout NN model
2. Logistic Regression Model Conversion
   a. Encog LR model (NN)
   b. Spark LR model
   c. Mahout LR model
3. PMML Adapter API and how to extend the PMML Adapter

Performance Test

Mahout NN Model - trainOnline()

protected void initMLModel() {
    ...
    mlModel = new MultilayerPerceptron();
    // addLayer(size, isFinalLayer, squashFunction); the first layer's size is the number of input fields
    mlModel.addLayer(20, false, "Identity");
    mlModel.addLayer(45, false, "Sigmoid");
    mlModel.addLayer(1, true, "Sigmoid");
    for (MahoutData data : inputDataSet) {
        mlModel.trainOnline(data.getInput());
    }
    ...
}

protected void adaptToPMML() {
    ...
    Matrix[] matrixList = nnModel.getWeightMatrices();
    ...
}

Squash functions:
1. Only identity and sigmoid are supported for now.
2. squashFunctionList is protected and has no getter, so the adapter currently sets the activation function to sigmoid by default.
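The weight matrices returned by getWeightMatrices() line up with the layer sizes passed to addLayer(). The sketch below assumes, consistent with the bias handling in these slides, that every non-final layer carries one extra bias neuron. Under that assumption, the matrix between layer i and layer i+1 has size(i+1) rows and size(i)+1 columns. The class name MlpShapes is hypothetical, and the code is plain Java with no Mahout dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: weight-matrix shapes for an MLP in which every
// non-final layer has one extra bias neuron (assumed Mahout-style layout).
class MlpShapes {
    // Returns one {rows, cols} pair per weight matrix between consecutive layers.
    static List<int[]> weightMatrixShapes(int[] layerSizes) {
        List<int[]> shapes = new ArrayList<>();
        for (int i = 0; i + 1 < layerSizes.length; i++) {
            int rows = layerSizes[i + 1];  // one row per neuron in the next layer
            int cols = layerSizes[i] + 1;  // previous layer's neurons plus its bias neuron
            shapes.add(new int[] {rows, cols});
        }
        return shapes;
    }

    public static void main(String[] args) {
        for (int[] s : weightMatrixShapes(new int[] {20, 45, 1})) {
            System.out.println(s[0] + " x " + s[1]);
        }
    }
}
```

For the 20-45-1 network above, this prints 45 x 21 and 1 x 46. For the 2,3,1 network dumped in section 2.1, it gives 3 x 3 and 1 x 4, which matches the number of rows and entries in that dump.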

Mahout NN Model - getOutput()

// in the adapter
for (int k = 1; k < columnSize; k++) {
    neuron.withConnections(new Connection(matrix.get(j, k)));
}
// bias neuron for each layer, set to bias = 1
neuron.withConnections(new Connection(matrix.get(j, 0)));

The bias is the first neuron in each layer except the final layer.

protected void evaluatePMML() {
    for (int i = 0; i < mahoutDataSet.size(); i++) {
        Assert.assertEquals(
            getPMMLEvaluatorResult(pmmlEvalResultList.get(i)),
            getMahoutResult(mahoutDataSet.get(i)),
            DELTA); // DELTA = 1e-5
    }
}

private double getMahoutResult(MahoutData data) {
    return mlModel.getOutput(data.getEvalInput()).get(0);
}
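The adapter loop turns column 0 of each weight-matrix row into a connection from the bias neuron, whose value is 1. This is why the evaluator's comment expects each Neuron's own bias to be 0. The sketch below checks that this encoding gives the same net input as storing column 0 in the Neuron's bias attribute instead. It is framework-free, and its class names (PmmlStyleNeuron, RowConversion) are hypothetical. The sample row is the single row of the second weight matrix in the section 2.1 dump; the previous-layer values are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, framework-free stand-in for a PMML Neuron and its connections.
class PmmlStyleNeuron {
    final double bias;                                // the Neuron's bias attribute
    final List<Integer> sources = new ArrayList<>();  // index of each source neuron
    final List<Double> weights = new ArrayList<>();   // weight of each connection

    PmmlStyleNeuron(double bias) { this.bias = bias; }

    void connect(int source, double weight) {
        sources.add(source);
        weights.add(weight);
    }

    // PMML-style net input: bias + sum(weight * value of source neuron).
    double netInput(double[] previousLayer) {
        double z = bias;
        for (int i = 0; i < sources.size(); i++) {
            z += weights.get(i) * previousLayer[sources.get(i)];
        }
        return z;
    }
}

class RowConversion {
    // Bias as a connection: column 0 connects from the bias neuron
    // (index 0, value 1.0), and the bias attribute stays 0.
    static PmmlStyleNeuron biasAsConnection(double[] row) {
        PmmlStyleNeuron n = new PmmlStyleNeuron(0.0);
        for (int k = 0; k < row.length; k++) {
            n.connect(k, row[k]);
        }
        return n;
    }

    // Bias as an attribute: column 0 becomes the bias, and only
    // columns 1..n become connections.
    static PmmlStyleNeuron biasAsAttribute(double[] row) {
        PmmlStyleNeuron n = new PmmlStyleNeuron(row[0]);
        for (int k = 1; k < row.length; k++) {
            n.connect(k, row[k]);
        }
        return n;
    }

    public static void main(String[] args) {
        // Row 0 of the second weight matrix in the section 2.1 dump.
        double[] row = {0.04388751672411058, -0.35597268769777723, 0.21149680575173224, 0.34402628331423807};
        // Index 0 is the bias neuron (1.0); the other values are made up.
        double[] previousLayer = {1.0, 0.2, 0.7, 0.5};
        System.out.println(biasAsConnection(row).netInput(previousLayer));
        System.out.println(biasAsAttribute(row).netInput(previousLayer));
    }
}
```

Both lines print the same value, because the bias neuron contributes 1.0 times column 0 in either encoding.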

Outline

Encog LR Model - compute()

lrModel = (BasicNetwork) networkReader.read(new FileInputStream("EncogLR.lr"));
double[] weights = lrModel.getWeights();
...
}

protected void evaluatePMML() {
    for (int i = 0; i < dataSet.size(); i++) {
        Assert.assertEquals(getPMMLEvaluatorResult(index++),
                getNextEncogLRResult(mlResultIterator), DELTA);
    }
}

private double getNextEncogLRResult(Iterator<MLDataPair> mlResultIterator) {
    MLData result = lrModel.compute(mlResultIterator.next().getInput());
    return result.getData(0);
}
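Each evaluatePMML() method in these slides makes the same check: for every record, the PMML evaluator's score must match the framework's own score within DELTA (1e-5). The dependency-free sketch below implements that check. ScoreComparison and firstMismatch are hypothetical names. The tolerance rule mirrors JUnit's assertEquals(double, double, double), which treats two values as equal if they compare equal or differ by at most delta.

```java
// Hypothetical helper mirroring the per-record checks in evaluatePMML().
class ScoreComparison {
    static final double DELTA = 1e-5;  // the tolerance used in the slides

    // Same rule as JUnit's assertEquals(double, double, double): equal if
    // Double.compare says so (e.g. both NaN) or if |a - b| <= delta.
    static boolean withinDelta(double a, double b, double delta) {
        return Double.compare(a, b) == 0 || Math.abs(a - b) <= delta;
    }

    // Index of the first record whose scores disagree, or -1 if all agree.
    static int firstMismatch(double[] pmmlScores, double[] mlScores, double delta) {
        if (pmmlScores.length != mlScores.length) {
            throw new IllegalArgumentException("score arrays differ in length");
        }
        for (int i = 0; i < pmmlScores.length; i++) {
            if (!withinDelta(pmmlScores[i], mlScores[i], delta)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration.
        double[] pmml = {0.25, 0.5, 0.75};
        double[] ml = {0.250001, 0.5, 0.76};
        System.out.println(firstMismatch(pmml, ml, DELTA));
    }
}
```

Reporting the first failing index, rather than stopping at the first assertion, makes it easier to see which input record diverged.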

Spark LR Model: train() and predict()

lrModel = LogisticRegressionWithSGD.train(points.rdd(), iterations, stepSize);
double[] weights = lrModel.weights();

protected void evaluatePMML() {
    ...
    List<Double> evalList = lrModel.predict(evalRDD).cache().collect();
    for (...) {
        Assert.assertEquals(getPMMLEvaluatorResult(i), evalList.get(i), DELTA);
    }
}

Notes:
1. The method lrModel.weights() returns the intercept followed by the weight list.
2. Compatibility issue: Spark depends on Akka 2.2.3, while Shifu uses 2.1.1. Changing the Akka version of shifu-core from 2.1.1 to 2.2.3 currently causes a compatibility problem. Based on the build history, I suspect the issue lies in Guagua, but the root cause is still unknown.
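Note 1 says the Spark weight array starts with the intercept. Assuming exactly that layout, the sketch below splits the array into an intercept and per-field coefficients. It then computes the logistic score 1 / (1 + e^-(b + w·x)) that the PMML side is compared against. The class name InterceptFirstLr and the plain double[] representation are assumptions for illustration, not Spark API.

```java
import java.util.Arrays;

// Hypothetical sketch assuming the layout from note 1: [intercept, w1, ..., wn].
class InterceptFirstLr {
    final double intercept;
    final double[] coefficients;

    InterceptFirstLr(double[] interceptThenWeights) {
        if (interceptThenWeights.length == 0) {
            throw new IllegalArgumentException("weight array is empty");
        }
        this.intercept = interceptThenWeights[0];
        this.coefficients = Arrays.copyOfRange(interceptThenWeights, 1, interceptThenWeights.length);
    }

    // Logistic score 1 / (1 + e^-(b + w.x)).
    double score(double[] x) {
        if (x.length != coefficients.length) {
            throw new IllegalArgumentException("expected " + coefficients.length + " features");
        }
        double margin = intercept;
        for (int i = 0; i < x.length; i++) {
            margin += coefficients[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-margin));
    }

    public static void main(String[] args) {
        InterceptFirstLr lr = new InterceptFirstLr(new double[] {-1.0, 0.5, 0.25});
        System.out.println(lr.intercept + " " + Arrays.toString(lr.coefficients));
        System.out.println(lr.score(new double[] {2.0, 0.0}));
    }
}
```

If the Spark version in use returns a thresholded 0/1 label from predict() rather than a probability, a score computed this way is the value to compare against the PMML output.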

Mahout LR Model - train() and classifyScalar()

lrModel = new OnlineLogisticRegression(2, 20, new L1());
// numCategories, numFeatures, PriorFunction
for (MahoutDataPair pair : inputDataSet) {
    lrModel.train(pair.getActual(), pair.getFeatureField());
}

protected void adaptToPMML() {
    ...
    // coefficients: a dense matrix that is (numCategories - 1) x numFeatures
    Matrix matrix = lrModel.getBeta();
    ...
}

private double getMahoutResult(MahoutDataPair data) {
    // returns a single scalar probability in the case where we have two categories
    return lrModel.classifyScalar(data.getVector());
}
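In the two-category case, the result of classifyScalar() can be reproduced from getBeta(). With numCategories = 2, the beta matrix has a single row, and the probability of category 1 is the logistic function of that row's dot product with the instance. The sketch below assumes this logistic link, consistent with the code comment above, and uses plain arrays instead of Mahout's Matrix and Vector. The name BinaryBetaScore is hypothetical. The beta matrix has no separate intercept column, so any bias term would have to come from a constant input feature.

```java
// Hypothetical sketch: two-category scoring from a beta matrix with
// (numCategories - 1) = 1 row and numFeatures columns, using plain arrays.
class BinaryBetaScore {
    static double probabilityOfCategoryOne(double[][] beta, double[] instance) {
        if (beta.length != 1) {
            throw new IllegalArgumentException("two categories imply exactly one beta row");
        }
        double[] row = beta[0];
        if (row.length != instance.length) {
            throw new IllegalArgumentException(
                "beta has " + row.length + " columns, instance has " + instance.length);
        }
        double dot = 0.0;
        for (int i = 0; i < row.length; i++) {
            dot += row[i] * instance[i];  // no separate intercept term
        }
        return 1.0 / (1.0 + Math.exp(-dot));  // assumed logistic link
    }

    public static void main(String[] args) {
        double[][] beta = {{0.8, -0.3, 0.1}};  // made-up coefficients
        double[] instance = {1.0, 2.0, 3.0};
        System.out.println(probabilityOfCategoryOne(beta, instance));
    }
}
```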

Summary of Evaluation Dataset

Model               | ML Framework      | Input Data Fields | Input Data | Evaluation Data | Nodes in Each Layer
Neural Network      | Encog (2 layers)  | 20                | 450        | 118             | 20,45,45,1560
Neural Network      | Encog (3 layers)  | 25                | 450        | 550             | 25,20,15,20,1
Neural Network      | Mahout (2 layers) | 20                | 450        | 118             | 20,45,45,1560
Neural Network      | Mahout (3 layers) | 25                | 450        | 550             | 25,20,15,20,1
Logistic Regression | Encog             | 20                | 450        | 118             |
Logistic Regression | Spark             | 20                | 450        | 118             |
Logistic Regression | Mahout            | 20                | 450        | 118             |

Summary of the Functions

Framework | Model | Class Name | Parent Class/Interface | Training Method | Retrieve Training Result | Evaluation Method | Basic Data Structure
Encog | Neural Network, Logistic Regression | BasicNetwork | MLClassification | compute(MLDataSet data) | getWeights(): double[] | compute() | MLData: Double[]; MLDataSet: Set<Double[]>
Spark | Logistic Regression | LogisticRegressionModel | GeneralizedLinearModel, ClassificationModel | train(RDD data) | weights(): double[] | predict(RDD<Vector>): RDD<Double> | RDD: Resilient Distributed Dataset
Mahout | Neural Network | MultilayerPerceptron | NeuralNetwork | trainOnline(Vector instance) | getWeightMatrices(): Matrix[] | getOutput(Vector): Vector | Vector; Matrix: List<Vector>
Mahout | Logistic Regression | OnlineLogisticRegression | AbstractOnlineLogisticRegression | train(int actual, Vector instance) | getBeta(): Matrix | classifyScalar(Vector instance): double |

Outline

3. PMML Adapter API

1. For a new ML model conversion:
   a. Implement a subclass of PMMLModelBuilder<TargetPMMLModel, SourceMLModel> and implement adaptMLModelToPMML().
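The slides name the extension point but not its exact signature. The sketch below uses a hypothetical single-method interface, modeled on PMMLModelBuilder<TargetPMMLModel, SourceMLModel>, to show the shape of a new adapter: read the trained model's parameters and write them into the target PMML structure. Every type here (SketchPMMLModelBuilder, ToyLrModel, ToyRegressionTable, ToyLrAdapter) is a stand-in invented for illustration; the real Shifu and JPMML classes differ. The toy target mimics a PMML RegressionTable with NumericPredictor children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PMMLModelBuilder<TargetPMMLModel, SourceMLModel>.
interface SketchPMMLModelBuilder<T, S> {
    T adaptMLModelToPMML(S mlModel);
}

// Toy source model: intercept followed by one weight per input field.
class ToyLrModel {
    final double[] interceptThenWeights;
    ToyLrModel(double[] interceptThenWeights) { this.interceptThenWeights = interceptThenWeights; }
}

// Toy target: mimics a PMML <RegressionTable> with <NumericPredictor> children.
class ToyRegressionTable {
    final double intercept;
    final List<String> fieldNames = new ArrayList<>();
    final List<Double> coefficients = new ArrayList<>();
    ToyRegressionTable(double intercept) { this.intercept = intercept; }

    String toXml() {
        StringBuilder sb = new StringBuilder("<RegressionTable intercept=\"" + intercept + "\">");
        for (int i = 0; i < fieldNames.size(); i++) {
            sb.append("<NumericPredictor name=\"").append(fieldNames.get(i))
              .append("\" coefficient=\"").append(coefficients.get(i)).append("\"/>");
        }
        return sb.append("</RegressionTable>").toString();
    }
}

// A new adapter: one class per (target PMML model, source ML model) pair.
class ToyLrAdapter implements SketchPMMLModelBuilder<ToyRegressionTable, ToyLrModel> {
    private final List<String> activeFields;  // input field names, in model order
    ToyLrAdapter(List<String> activeFields) { this.activeFields = activeFields; }

    @Override
    public ToyRegressionTable adaptMLModelToPMML(ToyLrModel mlModel) {
        double[] w = mlModel.interceptThenWeights;
        if (w.length != activeFields.size() + 1) {
            throw new IllegalArgumentException("expected intercept plus one weight per field");
        }
        ToyRegressionTable table = new ToyRegressionTable(w[0]);
        for (int i = 0; i < activeFields.size(); i++) {
            table.fieldNames.add(activeFields.get(i));
            table.coefficients.add(w[i + 1]);
        }
        return table;
    }

    public static void main(String[] args) {
        ToyLrAdapter adapter = new ToyLrAdapter(java.util.Arrays.asList("x1", "x2"));
        System.out.println(adapter.adaptMLModelToPMML(new ToyLrModel(new double[] {0.5, 1.0, -2.0})).toXml());
    }
}
```

Keeping one builder class per (target, source) pair means adding a framework only requires a new subclass, without touching the existing adapters.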

Next Steps

● Support: supported by the PMML Adapter
● None: the ML framework does not currently support this ML model
● TBD: to be determined

ML Framework | Neural Network | Logistic Regression | SVM | Decision Tree
Encog        | Support        | Support             | TBD | None
Spark        | None           | Support             | TBD | TBD
Mahout       | Support        | Support             | TBD | TBD
H2o          | TBD            | None                | TBD | TBD

1. PMML skeleton - Neural Network

<PMML>
  <Header></Header>
  <DataDictionary></DataDictionary>  (specifies the format of the input CSV)
  <NeuralNetwork functionName="classification">  (the model)
    <MiningSchema></MiningSchema>  (how to use the input data)
    <LocalTransformations></LocalTransformations>  (derived fields)
    <NeuralInputs></NeuralInputs>  (input layer: which fields are used)
    <NeuralLayer activationFunction="logistic">  (one per hidden layer, plus one for the output layer)
      <Neuron id="X,Y" bias="0.0"> ... </Neuron>
    </NeuralLayer>
    <NeuralOutputs numberOfOutputs="1"> ... </NeuralOutputs>
  </NeuralNetwork>
</PMML>

2.1 PMML Neural Network - Mahout

2,3,1
{ 0 => {0:-0.2861259717601905, 1:-0.4079344783742465, 2:-0.43218273192749174}
  1 => {0:0.223912887382075, 1:-0.08865866120943716, 2:0.4095464158191267}
  2 => {0:0.14754755237008804, 1:0.2638192545136143, 2:0.06633581725392071} }
{ 0 => {0:0.04388751672411058, 1:-0.35597268769777723, 2:0.21149680575173224, 3:0.34402628331423807} }
0.5635827615510126, 0.5482023969601073, 0.5609684690326279, 0.5751568027254008,

ML Framework | Propagation     | Weight   | Train                       | Evaluate
Encog        | backpropagation | double[] | MLTrain/Propagation         |
Mahout       | feed-forward    | Matrix   | network.trainOnline(vector) | network.getOutput(vector)

3. PMML Evaluation

public Map<String, Double> evaluateRaw(EvaluationContext context) {
    NeuralNetwork neuralNetwork = getModel();
    Map<String, Double> result = Maps.newLinkedHashMap();
    NeuralInputs neuralInputs = neuralNetwork.getNeuralInputs();
    for (NeuralInput neuralInput : neuralInputs) {
        DerivedField derivedField = neuralInput.getDerivedField();
        FieldValue value = ExpressionUtil.evaluate(derivedField, context);
        ...
        result.put(neuralInput.getId(), (value.asNumber()).doubleValue());
    }
    List<NeuralLayer> neuralLayers = neuralNetwork.getNeuralLayers();
    for (NeuralLayer neuralLayer : neuralLayers) {
        List<Neuron> neurons = neuralLayer.getNeurons();
        for (Neuron neuron : neurons) {
            double z = neuron.getBias(); // the bias for each Neuron; should be set to 0
            List<Connection> connections = neuron.getConnections();
            for (Connection connection : connections) {
                double input = result.get(connection.getFrom());
                z += input * connection.getWeight();
            }
            double output = activation(z, neuralLayer);
            result.put(neuron.getId(), output);
        }
        normalizeNeuronOutputs(neuralLayer, result);
    }
    return result;
}

private double activation(double z, NeuralLayer neuralLayer) {
    ...
    switch (activationFunction) {
        case LOGISTIC:
            return 1.0 / (1.0 + Math.exp(-z)); // sigmoid
        case IDENTITY:
            return z; // linear
        ...
    }
}
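The evaluator above walks the network layer by layer. Input values are stored in a map keyed by id, and each neuron's output is the activation of its bias plus the weighted sum of the values it connects from. The plain-Java sketch below reproduces that loop for the two activation functions the adapter currently supports, logistic and identity. The types (Activation, Con, EvalNeuron, EvalLayer, LayerwiseEvaluator) are hypothetical stand-ins for the PMML/JPMML classes, and the per-layer normalization step (normalizeNeuronOutputs) is omitted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the PMML elements the evaluator walks.
enum Activation { LOGISTIC, IDENTITY }

class Con {
    final String from;    // id of the source neuron or input field
    final double weight;
    Con(String from, double weight) { this.from = from; this.weight = weight; }
}

class EvalNeuron {
    final String id;
    final double bias;
    final List<Con> connections;
    EvalNeuron(String id, double bias, List<Con> connections) {
        this.id = id; this.bias = bias; this.connections = connections;
    }
}

class EvalLayer {
    final Activation activation;
    final List<EvalNeuron> neurons;
    EvalLayer(Activation activation, List<EvalNeuron> neurons) {
        this.activation = activation; this.neurons = neurons;
    }
}

class LayerwiseEvaluator {
    static double activation(double z, Activation f) {
        switch (f) {
            case LOGISTIC: return 1.0 / (1.0 + Math.exp(-z));  // sigmoid
            case IDENTITY: return z;                           // linear
            default: throw new IllegalArgumentException("unsupported activation: " + f);
        }
    }

    // Seeds the map with the input values, then fills in each layer's neurons
    // in order, reading only values computed for inputs or earlier layers.
    static Map<String, Double> evaluate(Map<String, Double> inputs, List<EvalLayer> layers) {
        Map<String, Double> result = new LinkedHashMap<>(inputs);
        for (EvalLayer layer : layers) {
            for (EvalNeuron neuron : layer.neurons) {
                double z = neuron.bias;
                for (Con c : neuron.connections) {
                    Double input = result.get(c.from);
                    if (input == null) {
                        throw new IllegalStateException("no value for " + c.from);
                    }
                    z += input * c.weight;
                }
                result.put(neuron.id, activation(z, layer.activation));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> inputs = new LinkedHashMap<>();
        inputs.put("x1", 1.0);
        inputs.put("x2", 2.0);
        List<EvalLayer> layers = java.util.Arrays.asList(
            new EvalLayer(Activation.IDENTITY, java.util.Arrays.asList(
                new EvalNeuron("h", 0.5, java.util.Arrays.asList(new Con("x1", 1.0), new Con("x2", -1.0))))),
            new EvalLayer(Activation.LOGISTIC, java.util.Arrays.asList(
                new EvalNeuron("out", 0.0, java.util.Arrays.asList(new Con("h", 2.0))))));
        System.out.println(evaluate(inputs, layers));
    }
}
```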

How to get a score from the PMML evaluator - EvaluatorTest

PMML pmml = loadPMML(getClass());
// loadPMML: InputStream is = getResourceAsStream("/pmml/" + getSimpleName() + ".pmml");
//           return IOUtil.unmarshal(is);
NeuralNetworkEvaluator evaluator = new NeuralNetworkEvaluator(pmml);
InputStream is = getClass().getResourceAsStream("/pmml/NormalizedData.csv");
List<Map<FieldName, String>> input = CsvUtil.load(is);
for (Map<FieldName, String> maps : input) {
    Map<FieldName, NeuronClassificationMap> evaluateList =
        (Map<FieldName, NeuronClassificationMap>) evaluator.evaluate(maps);
    for (NeuronClassificationMap cMap : evaluateList.values()) {
        for (Map.Entry<?, Double> entry : cMap.entrySet()) {
            System.out.println(index++ + ":" + entry.getKey() + ":" + entry.getValue() * 1000);
        }
    }
}
List<FieldName> activeFields = evaluator.getActiveFields();
