[Research] Protocols and Structures for Inference: A RESTful API for Machine Learning - James...
psikit.net | github.com/psi-project
Protocols and Structures for Inference
A RESTful API for Machine Learning
work conducted at ANU in collaboration with CISRA
and additional support from an Amazon AWS Education Grant
James [email protected] @jamesatbond
Mark [email protected] @mdreid
Barry [email protected]
An ‘ecosystem’ of ML services
Prediction API
Microsoft Azure Machine Learning
Amazon Machine Learning
…and many others
The problem is not that these are bad (they’re all very good) nor that there is competition (also good)
But this ecosystem doesn’t encourage service composition or provide a way for ML practitioners of all sizes to share their data and algorithms
Goals of PSI
A web service API
• that is standardised, yet
• sufficiently flexible to support a wide range of ML techniques
Select your ML du jour
Flexible Federated
Protocols and Structures for Inference
An API specification for ML web services
Communication via JSON
Set of common ML-related resources that describe their differences using a schema language (based on JSON schema)
Support for data transformation, training, prediction and updating/online learning
with points for extension and customisation
and support for data formats beyond JSON
Iris Versicolor by Danielle Langlois / CC-BY-SA-3.0
Catalogue of PSI resources
• Relations (datasets) of instances and their Attributes
• Learners
• Predictors
• Transformers
• Collections of PSI resources
Structure of a PSI service
[Diagram: resource tree; the legend distinguishes required resources, optional resources and resource instances]
PSI start
  Schema collection
    schema (integer, string, ...)
  Relations collection
    relation
      attribute
        sub-attribute
  Learners collection
    learner
  Predictors collection
    predictor
      update
  Transformers collection
    transformer
Structured attributes (arrays, objects) can be decomposed and new attributes created
This is also a PSI service
[Diagram: the resource tree from “Structure of a PSI service”, shown again]
An organisation or individual could choose to provide access to one or more datasets
The ‘root’ and collection resources are very lightweight
…and so is this
[Diagram: the resource tree from “Structure of a PSI service”, shown again]
ML researchers could present their just-published learning algorithm as a resource
…and this, etc.
[Diagram: the resource tree from “Structure of a PSI service”, shown again]
Or even a single predictor
Is it RESTful? Is that important?
100% RESTful is not a reasonable aim
But RESTful principles can improve interoperability and ease development of clients
• Discoverable namespace
• Extensible through links entry in resource representations (similar to HTML link element & part of JSON Hyper-schema standard)
Client must still know it’s using a PSI service (no PSI media types)…
• but each resource does provide information about how to use it through schema
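A minimal sketch of how a client might use the links entry for discovery, assuming link objects carry "rel" and "href" fields as in JSON Hyper-Schema. The example representation, its URIs and the relation names are hypothetical, not taken from the PSI specification.

```python
import json

# Hypothetical resource representation: the URIs and relation names are
# invented for illustration; only the idea of a "links" entry is from the slides.
representation = json.loads("""
{
  "uri": "http://example.org/psi/relation/iris",
  "links": [
    {"rel": "attributes", "href": "http://example.org/psi/relation/iris/attributes"},
    {"rel": "help", "href": "http://example.org/docs/iris.html"}
  ]
}
""")


def find_link(resource, rel):
    """Return the href of the first link with the given relation, or None."""
    for link in resource.get("links", []):
        if link.get("rel") == rel:
            return link.get("href")
    return None


print(find_link(representation, "attributes"))
```

A client written this way only hard-codes relation names, not URI layouts, which is the point of a discoverable namespace.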
Schema describes…
• Form of learning tasks that learners can process
• The domain and range of transformers
• Service-specific queries supported by relations
• The data format of attributes
Common workflows
[Diagram: three workflows (Train, Predict, Update) connecting Relation, Attribute, Transformer, Learner and Predictor resources and other data sources. Arrow labels: emits, accepts, requires. Items exchanged: instance representations, schema, updates, predicted values. Legend: resource, JSON.]
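Training
[Diagram with steps numbered 1 to 4; the mapping of numbers to steps was lost in extraction]
• Discover attributes; reshape as needed; compose with transformers
• GET the learner: its representation includes the task schema (resources: a vector attribute and a nominal attribute; plus any other training parameters)
• POST a task (resources: a vector attribute, a nominal attribute; n = 5; l = 0.5), giving JSON representations of attributes (not their values) or URI references
• Response: 201 Created / 202 Accepted, with the URI of the new resource (icon lost in extraction, presumably the trained predictor)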
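The exchange on this slide can be sketched as follows. This is an illustration under stated assumptions, not the PSI wire format: the key names "source" and "target", the helper names and all URIs are hypothetical, while the "resources" grouping, the parameters n = 5 and l = 0.5, and the 201/202 status codes come from the slide.

```python
import json


def build_task(vector_attribute_uri, nominal_attribute_uri, **params):
    """Assemble a learning task that refers to attributes by URI, not by value.

    The keys "source" and "target" are hypothetical; a real learner's task
    schema (obtained via GET) dictates the actual property names.
    """
    task = {
        "resources": {
            "source": vector_attribute_uri,
            "target": nominal_attribute_uri,
        }
    }
    task.update(params)  # any other training parameters, e.g. n and l
    return task


def describe_response(status, location):
    """Interpret the two success codes named on the slide."""
    if status == 201:
        return f"created: {location}"
    if status == 202:
        return f"accepted, not yet ready: {location}"
    raise ValueError(f"unexpected status {status}")


task = build_task(
    "http://example.org/psi/relation/iris/measurements",  # hypothetical URI
    "http://example.org/psi/relation/iris/species",       # hypothetical URI
    n=5,
    l=0.5,
)
print(json.dumps(task, indent=2))  # body a client would POST to the learner
print(describe_response(201, "http://example.org/psi/predictor/1"))
```

Sending URIs instead of attribute values keeps the request small and lets the service fetch the training data itself.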
What’s in the schema
Algorithm requires an attribute that produces feature vectors of numbers
JSON schema required to describe a PSI attribute’s representation and enforce that it produces correct values:
{"type":"object","properties":{"responseType":{"enum":["attribute#description"],"required":true},"uri":{"type":"string","required":true},"schema":{"type":"object","properties":{"type":{"enum":["array"],"required":true},"items":{"type":"array","items":{"type":"object","properties":{"type":{"enum":["integer","number"],"required":true}}},"required":true}},"required":true},"description":{"type":"string"},"provenance":{"type":["string","object"]},"relation":{"type":"string"},"subattributes":{"type":"array","items":{"type":"string"}}}}
$X \in \mathbb{R}^n$
This is really the only change, but this is still very complicated
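To make the constraint concrete, here is a hand-written check equivalent to the "schema" part of the JSON schema above: the attribute's own schema must be an array whose items are all typed integer or number. It is a sketch for this one case, not a general JSON Schema validator, and the example attribute descriptions and their URIs are hypothetical.

```python
NUMERIC_TYPES = {"integer", "number"}


def emits_numeric_vectors(description):
    """True if an attribute description declares an array of numeric items."""
    schema = description.get("schema")
    if not isinstance(schema, dict) or schema.get("type") != "array":
        return False
    items = schema.get("items")
    if not isinstance(items, list):
        return False
    # Every item must be an object with a "type" of integer or number.
    return all(
        isinstance(item, dict) and item.get("type") in NUMERIC_TYPES
        for item in items
    )


# Hypothetical attribute descriptions for illustration.
measurements = {
    "responseType": "attribute#description",
    "uri": "http://example.org/psi/relation/iris/measurements",
    "schema": {"type": "array", "items": [{"type": "number"}] * 4},
}
species = {
    "responseType": "attribute#description",
    "uri": "http://example.org/psi/relation/iris/species",
    "schema": {"type": "string"},
}

print(emits_numeric_vectors(measurements))
print(emits_numeric_vectors(species))
```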
Pre-defined PSI schema eases burden
Algorithm requires an attribute that produces feature vectors of numbers, $X \in \mathbb{R}^n$
PSI schema:
"$arrayAttribute": {"allItems": "$numberSchema"}
Algorithm requires an attribute that produces feature vectors of real numbers, integers or strings, $X \in (\mathbb{R} \cup \Sigma^{*})^n$
PSI schema:
"$arrayAttribute": {"allItems": "$atomicValueSchema"}
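As a rough illustration of what the two pre-defined schemas admit, the checks below test values (not schemas) against the two sets described on the slide. The function names are hypothetical, and reading "$atomicValueSchema" as "number or string" is an assumption based on the slide's wording.

```python
from numbers import Real


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def is_number_vector(x):
    """Values allowed by an array attribute whose items are all numbers."""
    return isinstance(x, list) and all(_is_number(v) for v in x)


def is_atomic_vector(x):
    """Values allowed when items may be numbers or strings (assumed reading)."""
    return isinstance(x, list) and all(
        _is_number(v) or isinstance(v, str) for v in x
    )


print(is_number_vector([5.1, 3.5, 1.4, 0.2]))
print(is_number_vector([5.1, "setosa"]))
print(is_atomic_vector([5.1, "setosa", 3]))
```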
Prediction
Simple prediction
• GET ?value=[5.1,3.5,1.4,0.2]
• Response: setosa
Or join predictor with attribute to predict on whole relation
• Join request with URI of an attribute (@)
• Response: 201 Created, URI of new attribute (@)
• GET ?instance=all
• Response: [setosa,setosa,setosa,…,virginica]
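A small sketch of how a client could build the simple prediction request. The `value` parameter and the example vector come from the slide; the predictor URI and the helper name are hypothetical, and percent-encoding the JSON value is an assumption about what a service would accept.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def prediction_url(predictor_uri, value):
    """Append ?value=<compact JSON> to a predictor URI, percent-encoded."""
    encoded_value = json.dumps(value, separators=(",", ":"))
    return f"{predictor_uri}?{urlencode({'value': encoded_value})}"


url = prediction_url(
    "http://example.org/psi/predictor/iris",  # hypothetical predictor URI
    [5.1, 3.5, 1.4, 0.2],
)
print(url)

# Decoding the query string recovers the JSON value from the slide.
print(parse_qs(urlsplit(url).query)["value"][0])
```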
Beyond JSON data types
• PSI rich values support data of any media type (PSI schema can still be used for data type validation)
• Rich value is either an HTTP URI or Data URI
Iris Versicolor by Danielle Langlois / CC-BY-SA-3.0
data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQYGBcUFhYaHSUfGhsjHBYWICwgIyYnKSop
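A minimal sketch of packing binary data into a Data URI and unpacking it again, using only the standard library. The helper names are hypothetical and the parser handles only the simple `data:<media type>;base64,<payload>` form shown on the slide.

```python
import base64


def to_data_uri(payload: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 Data URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_uri(uri: str):
    """Split a base64 Data URI into (media_type, bytes)."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 Data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(encoded)


jpeg_header = b"\xff\xd8\xff\xe0"  # first bytes of a JPEG file
uri = to_data_uri(jpeg_header, "image/jpeg")
print(uri)
print(from_data_uri(uri))
```

A rich value that is too large to inline can instead be given as an ordinary HTTP URI, as the slide notes.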
Proofs of concept
Demonstration PSI service at poseidon.cecs.anu.edu.au
• Play 1.2-based service that exposes some classification and regression algorithms from scikit-learn, plus a simple ranking algorithm using scikit-learn
Demonstration JavaScript client at psi.cecs.anu.edu.au/demo
• HTML forms generated from PSI schema
• Predictor evaluation and comparison
An HTML to bag-of-words transformer in Python will be on GitHub soon
Querying relations
Schema defines the elements of the query, their data type and can even include descriptions (which become hints here)
Constructing a learning task
Evaluation via client-service interactions
Future
• Amazon AWS AMI of Play-based service planned, thanks to an Amazon AWS Education Grant
• PSI provides the core of a flexible ML API that can be freely implemented
• Security & authentication can be built on top
• Can be offered as alternative interface to existing ML web services