BSSML16 L8. REST API, Bindings, and Basic Workflows
Uploaded by bigml-inc
Automating Machine Learning
API, bindings, BigMLer and Basic Workflows
#BSSML16
December 2016
#BSSML16 Automating Machine Learning December 2016 1 / 29
Outline
1 Introduction: ML as a System Service
2 ML as a RESTful Cloudy Service
3 Client-side workflows: REST API and bindings
4 Client-side workflows: BigMLer
Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
• APIs: ML building blocks
• Abstraction layer over feature engineering
• Abstraction layer over algorithms
• Automation
The Roadmap
RESTful-ish ML Services
• Excellent abstraction layer
• Transparent data model
• Immutable resources and UUIDs: traceability
• Simple yet effective interaction model
• Easy access from any language (API bindings)
Algorithmic complexity and computing resources management problems mostly washed away
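One way to picture the traceability point: if every resource is immutable and carries a unique ID, the chain of IDs behind a result is its full history. The sketch below is only an illustration. It assumes IDs have the shape `<type>/<hexadecimal key>`, as in the examples of this deck, and the helper names `parse_resource_id` and `lineage` are hypothetical.

```python
import re
from typing import NamedTuple, Sequence

# Assumed shape of a resource ID: "<type>/<hexadecimal key>",
# e.g. "source/56fbbfea200d5a3403000db7".
RESOURCE_ID = re.compile(r"^(?P<kind>[a-z]+)/(?P<key>[0-9a-f]+)$")


class ResourceRef(NamedTuple):
    kind: str  # resource type, e.g. "dataset"
    key: str   # unique identifier within that type


def parse_resource_id(resource_id: str) -> ResourceRef:
    """Split a resource ID into its type and its unique key."""
    match = RESOURCE_ID.match(resource_id)
    if match is None:
        raise ValueError(f"not a resource ID: {resource_id!r}")
    return ResourceRef(match.group("kind"), match.group("key"))


def lineage(step_ids: Sequence[str]) -> str:
    """Render the chain of resource types a result was derived from."""
    return " -> ".join(parse_resource_id(step).kind for step in step_ids)
```

Because resources never change after creation, a stored list of IDs like this is enough to say exactly which source, dataset, and model produced a given result.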
RESTful done right: Whitebox resources
• Your data, your model
• Model reverse engineering becomes moot
• Maximizes reach (Web, CLI, desktop, IoT)
Higher-level Machine Learning
Example workflow: Batch Centroid
Objective: Label each row in a Dataset with its associated centroid.
We need to...
• Create Dataset
• Create Cluster
• Create BatchCentroid from Cluster and Dataset
• Save BatchCentroid as new Dataset
Example workflow: building blocks
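# create a dataset from an existing source
curl -X POST "https://bigml.io/dataset?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"source": "source/56fbbfea200d5a3403000db7"}'
# create a cluster from that dataset
curl -X POST "https://bigml.io/cluster?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'
# create a batch centroid from the cluster and the dataset
curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65",
          "cluster": "cluster/33e2e231a34fff333000b65"}'
# retrieve a dataset
curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"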
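The same building blocks can be prepared from Python with only the standard library, which shows how little the interaction model requires. This is a sketch under assumptions: the resource path is taken to come before the credentials query string, `auth` stands in for the `$AUTH` value used with curl, and the helper names are hypothetical. The requests are built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "https://bigml.io"  # host taken from the curl examples


def build_create_request(resource_type: str, payload: dict, auth: str) -> Request:
    """Prepare (but do not send) a POST that creates a resource.

    `auth` is assumed to be a ready-made query string holding the
    credentials, playing the role of $AUTH in the curl commands.
    """
    return Request(
        url=f"{BASE_URL}/{resource_type}?{auth}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_request(resource_id: str, auth: str) -> Request:
    """Prepare (but do not send) a GET that retrieves a resource by ID."""
    return Request(url=f"{BASE_URL}/{resource_id}?{auth}", method="GET")
```

Passing one of these objects to `urllib.request.urlopen` would perform the call; every resource type follows the same create-then-retrieve pattern.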
Example workflow: Web UI
Automation via bindings
from bigml.api import BigML
api = BigML()
# `source` must hold the path or URL of the raw data (not shown here)
project = api.create_project({'name': 'ToyBoost'})
orig_source = api.create_source(source,
    {"name": "ToyBoost", "project": project['resource']})
api.ok(orig_source)
orig_dataset = api.create_dataset(orig_source, {"name": "Boost"})
api.ok(orig_dataset)
# `trainset` starts as the ID of the training dataset (not shown here)
trainset = api.get_dataset(trainset)
for loop in range(0, 10):
    api.ok(trainset)
    model = api.create_model(trainset, {
        "name": "ToyBoost - Model%d" % loop, "objective_fields": ["letter"],
        "excluded_fields": ["weight"], "weight_field": "100011"})
    api.ok(model)
    batchp = api.create_batch_prediction(model, trainset, {
        "name": "ToyBoost - Result%d" % loop, "all_fields": True, "header": True})
    api.ok(batchp)
    batchp = api.get_batch_prediction(batchp)
    batchp_dataset = api.get_dataset(batchp['object'])
    # the output of each round becomes the training set of the next one
    trainset = api.create_dataset(batchp_dataset, {})
Example workflow: Python bindings
from bigml.api import BigML
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
# create dataset and cluster, waiting for both
dataset = api.create_dataset(source)
api.ok(dataset)
cluster = api.create_cluster(dataset)
api.ok(cluster)
# create a batch centroid with output to dataset
centroid = api.create_batch_centroid(cluster, dataset,
                                     {'output_dataset': True,
                                      'all_fields': True})
api.ok(centroid)
# wait again, via polling, until the dataset is finished
batch_dataset_id = centroid['object']['output_dataset_resource']
batch_dataset = api.get_dataset(batch_dataset_id)
api.ok(batch_dataset)
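Waiting "via polling" means the client repeatedly fetches a resource until the server reports it as finished. The loop below is a minimal sketch of that idea, not the bindings' implementation. The status codes, the `status`/`code` layout of the fetched dictionary, and the `wait_until_finished` helper are assumptions made for illustration.

```python
import time
from typing import Callable

# Assumed status codes for this sketch.
FINISHED = 5
FAULTY = -1


def wait_until_finished(
    fetch: Callable[[], dict],
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the returned resource is finished or has failed."""
    for _ in range(max_polls):
        resource = fetch()
        code = resource["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            raise RuntimeError(resource["status"].get("message", "resource failed"))
        sleep(interval)  # still in progress: wait before asking again
    raise TimeoutError("resource did not finish within the polling budget")
```

Every step of a client-side workflow pays this round-trip cost, which is part of why such workflows are hard to optimize.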
Client-side automation via bindings
Strengths of bindings-based solutions
• Versatility: Maximum flexibility and possibility of encapsulation (via proper engineering)
• Native: Easy to support any programming language
• Offline: Whitebox models allow local use of resources (e.g., real-time predictions)
Client-side automation via bindings
Strengths of bindings-based solutions
from bigml.model import Model
model_id = 'model/5643d345f43a234ff2310a3e'
# Download of (whitebox) resource
local_model = Model(model_id)
# Purely local calculations
local_model.predict({'plasma glucose': 132})
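To see why a whitebox model allows purely local calculations, consider a decision tree that has been downloaded as plain data: predicting is just a walk from the root to a leaf, with no network call. The sketch below uses a hypothetical, heavily simplified tree layout with made-up fields and thresholds. It is not the actual model format, and stopping at the current node when an input is missing is a design choice of this sketch.

```python
from typing import Any, Dict

# Hypothetical toy tree: every node carries a prediction, and inner
# nodes add a numeric split with "below" and "above" children.
TOY_TREE: Dict[str, Any] = {
    "prediction": "false",
    "split": {"field": "plasma glucose", "threshold": 123.5},
    "below": {"prediction": "false"},
    "above": {
        "prediction": "true",
        "split": {"field": "bmi", "threshold": 30.0},
        "below": {"prediction": "false"},
        "above": {"prediction": "true"},
    },
}


def predict_locally(node: Dict[str, Any], row: Dict[str, float]) -> str:
    """Walk the tree using only local data and return a prediction."""
    while "split" in node:
        split = node["split"]
        if split["field"] not in row:
            break  # missing input: answer with the current node's prediction
        branch = "below" if row[split["field"]] <= split["threshold"] else "above"
        node = node[branch]
    return node["prediction"]
```

With this toy tree, `predict_locally(TOY_TREE, {'plasma glucose': 132})` follows the "above" branch and, since no further inputs are given, answers from that node.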
Client-side automation via bindings
Problems of bindings-based solutions
• Complexity: Lots of details outside the problem domain
• Reuse: No inter-language compatibility
• Scalability: Client-side workflows are hard to optimize
Not enough abstraction
Simple workflow in a one-liner
# 1-click cluster
bigmler cluster \
    --output-dir output/job \
    --train data/iris.csv \
    --test-datasets output/job/dataset \
    --remote \
    --to-dataset
# the created dataset id:
cat output/job/batch_centroid_dataset
Simple automation: “1-click” tasks
# "1-click" ensemble
bigmler --train data/iris.csv \
--number-of-models 500 \
--sample-rate 0.85 \
--output-dir output/iris-ensemble \
--project "vssml tutorial"
# "1-click" dataset with parameterized fields
bigmler --train data/diabetes.csv \
--no-model \
--name "4-featured diabetes" \
--dataset-fields \
"plasma glucose,insulin,diabetes pedigree,diabetes" \
--output-dir output/diabetes \
--project vssml_tutorial
Rich, parameterized workflows: cross-validation
# cross-validation with parameterized input
# --k-folds: number of folds during validation
bigmler analyze --cross-validation \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 3 \
    --output-dir output/diabetes-validation
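Commands like this chain together through the files that earlier runs leave in their output directories. As a hedged sketch, the same chaining can be done from Python: read the stored dataset ID and assemble the argument list. It assumes, as the `$(cat ...)` idiom suggests, that the output directory holds a file named `dataset` containing the ID. The helper names are hypothetical, and the command is only built, not executed.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def read_resource_id(output_dir: PathLike, kind: str = "dataset") -> str:
    """Return the resource ID stored in <output_dir>/<kind>."""
    resource_id = Path(output_dir, kind).read_text().strip()
    if not resource_id:
        raise ValueError(f"no {kind} ID found in {output_dir}")
    return resource_id


def cross_validation_command(
    output_dir: PathLike, k_folds: int, target_dir: PathLike
) -> List[str]:
    """Build the argument list for a k-fold cross-validation run."""
    return [
        "bigmler", "analyze", "--cross-validation",
        "--dataset", read_resource_id(output_dir),
        "--k-folds", str(k_folds),
        "--output-dir", str(target_dir),
    ]
```

The resulting list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting issues when directory names contain spaces.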
Rich, parameterized workflows: feature selection
# feature selection with parameterized input
# --k-folds: number of folds during validation
# --staleness: stopping criterion
# --optimize: optimization metric
# --penalty: algorithm parameter
bigmler analyze --features \
    --dataset $(cat output/diabetes/dataset) \
    --k-folds 2 \
    --staleness 2 \
    --optimize precision \
    --penalty 1 \
    --output-dir output/diabetes-features-selection
Client-side Machine Learning Automation
Problems of client-side solutions
• Complex: Too fine-grained, leaky abstractions
• Cumbersome: Error handling, network issues
• Hard to reuse: Tied to a single programming language
• Hard to scale: Parallelization again a problem
• Hard to generalize: CLI tools like bigmler hide complexity at the cost of flexibility
The algorithmic complexity and computing resources management problems that had been mostly washed away are back!
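The "Cumbersome" item is easy to underestimate: every remote call in a client-side workflow can fail for reasons unrelated to machine learning, so each one tends to get wrapped in boilerplate like the following. This is a generic retry-with-backoff sketch, not something taken from the bindings. The helper name, the retried exception types, and the delays are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying on network-style errors with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A workflow of ten steps needs this kind of wrapper, plus polling, around each step, which is exactly the non-domain detail the slide complains about.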
Questions?