Artificial intelligence and IoT
Transcript of Artificial intelligence and IoT
Page 1
Programmable web of the future
{ "firstName": "Veselin", "lastName": "Pizurica", "epochTime": 1381953702 }
Page 2
Today's talk is about the future: the future of the web
Integration/convergence:
– APIs
– Sensor networks/M2M
– Cloud
– Data mining
– Intelligent decision engines
Page 3
Introduction to AI
– Learning, pattern recognition
– Intelligent agents
– Probabilistic reasoning and uncertainty
– Graphical models
Page 4
Material used
• UGent AI course: http://telin.ugent.be/~sanja/ArtificialIntelligence
• BayesiaLab white paper
• Wikipedia
• Google search
Page 5
Map of Analytic Modeling
Breiman (2001) and Shmueli (2010)
Page 6
Ŷ = f(X)
Predictive modeling
Page 7
Y = f(X)
Explanatory modeling
Page 8
Intelligent agents
Agent: an entity that perceives and acts (from Latin agere, to do).
A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome.
Abstractly, an agent is a function from percept histories to actions, f: P* → A (see the sketch below).
For any given class of environments and tasks, we seek the agent (or class of agents) with the best performance.
In practice, computational limitations make perfect rationality unachievable → design the best program for the given machine resources.
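As a minimal Python sketch (not from the slides, and with an invented rule table), the percept-history-to-action mapping can be written directly:

    # A table-driven agent: maps the entire percept history to an action.
    # The table is invented for illustration; real tables are unmanageably large.
    def table_driven_agent(percepts, table, default_action="noop"):
        return table.get(tuple(percepts), default_action)

    table = {("cold",): "heat_on", ("cold", "warm"): "heat_off"}
    print(table_driven_agent(["cold"], table))          # -> heat_on
    print(table_driven_agent(["cold", "warm"], table))  # -> heat_off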
Page 9
Rationality
• A rational agent is one that does the right thing.
• How do we know whether it is the right thing?
– By considering the consequences of the agent's behavior (i.e., the sequence of states through which the environment goes as a result of the agent's behavior)
• A sequence of states (through which the environment goes) is evaluated by a performance measure.
Page 10
Specifying the task environment
To design a rational agent, we must specify the task environment. Consider the task of designing an automated taxi:
– Performance measure: safety, destination, profits, legality, comfort
– Environment: streets/freeways, traffic, pedestrians, weather
– Actuators: steering, accelerator, brake, horn, speaker/display
– Sensors: video, accelerometers, gauges, engine sensors, keyboard
Pages 11–18
Environment types (a series of table slides; images only in the transcript)
Page 19
• Four basic types, in order of increasing generality:
– simple reflex agents
– reflex agents with state
– goal-based agents
– utility-based agents
All of these can be turned into learning agents.
Agent types
Page 20
Simple reflex agents
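The slide itself is a diagram; as a hedged Python sketch, a simple reflex agent selects an action from condition-action rules using only the current percept (the rules below are invented):

    # Simple reflex agent: no internal state, just condition-action rules
    # applied to the current percept (rules invented for illustration).
    RULES = {
        "obstacle_ahead": "brake",
        "light_red": "brake",
        "light_green": "accelerate",
    }

    def simple_reflex_agent(percept, rules=RULES):
        return rules.get(percept, "do_nothing")

    print(simple_reflex_agent("light_red"))  # -> brake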
Page 21
Reflex agents with state
Page 22
Goal-based agents
Page 23
Utility-based agents
Page 24
Why learning?
Why do we want an agent to learn? (Why not program an improved design from the beginning?)
– We cannot anticipate all possible situations that the agent might find itself in
– We cannot anticipate all changes over time
– Programmers might not know how to program a solution themselves (e.g., how to program face recognition)
Learning modifies the agent's decision mechanisms to improve performance.
Page 25
Pattern recognition
Unsupervised learning
– Learning patterns without explicit feedback supplied
– The system forms clusters or natural groupings of the input patterns (based on some similarity criteria) ➡ Clustering
Reinforcement learning
– Learning from a series of reinforcements: rewards and punishments
Supervised learning
– Learning a function that maps input to output based on available (observed) input-output pairs (correct answers for each instance)
Semi-supervised learning
– A few labeled samples are available along with a large collection of unlabeled ones
– Learn from the geometry of the unlabeled samples and use the labeled ones to improve the learning
Page 26
Supervised Learning
Labeled training sets are used to train a classifier.
Page 27
Unsupervised Learning
• No labeled training sets are provided
• The system applies a specified clustering/grouping criterion to the unlabeled dataset, grouping together the "most similar" objects (according to the given criterion)
Page 28
Pattern Recognition Process
Data acquisition and sensing
– Measurements of physical variables
– Important issues: bandwidth, resolution, etc.
Pre-processing
– Removal of noise in data
– Isolation of patterns of interest from the background
Feature extraction
– Finding a new representation in terms of features
Classification
– Using features and learned models to assign a pattern to a category
Post-processing
– Evaluation of confidence in decisions
Page 29
Feature vectors
A single object is represented by several features, e.g. shape, size, color, weight:
x1 = shape (e.g., number of sides)
x2 = size (e.g., some numeric value)
x3 = color (e.g., RGB values)
...
xd = some other (numeric) feature
X = (x1, ..., xd) becomes a feature vector.
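For instance, a minimal Python version of such a feature vector (all values invented):

    import numpy as np

    # One object as a point in feature space: x1 = shape (number of sides),
    # x2 = size, x3-x5 = color (RGB), x6 = weight (values invented).
    x = np.array([4, 2.5, 200, 30, 30, 1.2])
    print(x.shape)  # (6,) - a single feature vector in R^6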
Page 30
Classical model of Pattern Recognition
Page 31
Example of Simple Classifier
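The slide is an image; one classifier that fits this "simple" description is a nearest-class-mean (minimum-distance) classifier, sketched here in Python on invented data:

    import numpy as np

    # Nearest-class-mean classifier: assign a sample to the class whose mean
    # feature vector is closest in Euclidean distance (toy data, invented).
    X_train = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
    y_train = np.array([0, 0, 1, 1])
    means = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

    def classify(x):
        return int(np.argmin(np.linalg.norm(means - x, axis=1)))

    print(classify(np.array([0.8, 1.2])))  # -> 0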
Page 32
Clustering: k-means
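A minimal Python sketch of k-means (Lloyd's algorithm) on invented toy data, to make the clustering step concrete:

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        # Lloyd's algorithm: alternate nearest-centroid assignment and
        # centroid recomputation.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
            labels = np.argmin(dists, axis=1)
            for c in range(k):
                if np.any(labels == c):
                    centroids[c] = X[labels == c].mean(axis=0)
        return labels, centroids

    X = np.array([[0.0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]])
    labels, _ = kmeans(X, k=2)
    print(labels)  # the two spatial groups get two distinct labels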
Page 33
“Curse of dimensionality”
Finding the principal eigenvectors of the covariance matrix of the data: PCA
Page 34
PCA
Principal component analysis (PCA) is an orthogonal transformation that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
It is not, however, optimized for class separability. An alternative is linear discriminant analysis, which does take this into account. PCA is also sensitive to the scaling of the variables.
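To make the covariance-eigenvector view from the previous slide concrete, a minimal PCA sketch in Python (center, compute the covariance, project on the top eigenvectors):

    import numpy as np

    def pca(X, n_components):
        # Eigendecomposition of the covariance matrix of the centered data.
        Xc = X - X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
        return Xc @ eigvecs[:, order]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    X[:, 1] = 3 * X[:, 0]       # make two variables strongly correlated
    print(pca(X, 2).shape)      # (100, 2): decorrelated, reduced representation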
Page 35
Deep Learning
• Choosing the correct feature representation of input data is a way for people to bring prior knowledge of a domain that increases an algorithm's computational performance and accuracy. To move towards general artificial intelligence, algorithms need to be less dependent on this feature engineering and better learn to identify the explanatory factors of input data on their own.
• Deep learning tries to move in this direction by capturing a 'good' representation of input data by using compositions of non-linear transformations.
Page 36
Two types of models
• Probabilistic graphical models have nodes in each layer that are treated as latent random variables. In this case, you care about the joint distribution p(x, h) of the input data x and the hidden latent random variables h that describe the input data. These latent random variables describe a distribution over the observed data.
• Direct encoding (neural network) models have nodes in each layer that are treated as computational units. This means each node h performs some computation (normally nonlinear, like a sigmoid function) given its inputs from the previous layer.
Page 37
Decision trees
1. Learn rules from data
2. Apply each rule at each node
3. Classification happens at the leaves of the tree
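A hedged sketch with scikit-learn (assuming it is available) on invented binary data, showing the learn-rules / classify-at-a-leaf flow:

    from sklearn.tree import DecisionTreeClassifier

    # Invented attributes: [Hungry, Raining, Reservation]; label 1 = wait.
    X = [[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1]]
    y = [1, 1, 0, 0]

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(tree.predict([[1, 0, 0]]))  # -> [1]: this instance reaches a "wait" leaf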
Page 38
Decision Trees example
Example: decision whether to wait for a table in a restaurant, depending on the following attributes:
1. Alternate (Alt): Is there a suitable alternative restaurant nearby?
2. Bar: Is there a comfortable bar area in the restaurant where I can wait?
3. Fri/Sat (Fri): True on Fridays/Saturdays
4. Hungry (Hun): Are we hungry?
5. Patrons (Pat): How many people are in the restaurant (None, Some, or Full)?
6. Price: the restaurant's price range ($, $$, $$$)
7. Raining (Rain): Is it raining outside?
8. Reservation (Res): Did we make a reservation?
9. Type: the kind of restaurant (French, Italian, Thai, or burger)
10. WaitEstimate (Est): the wait time estimated by the host (0–10 min, 10–30, 30–60, or >60)
Page 39
Decision tree
How many distinct decision trees are there with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)
E.g., with 6 Boolean attributes: 2^64 = 18,446,744,073,709,551,616 trees
Page 40
Uncertainty
Let At denote the action "leave for airport t minutes before flight". Will At get me there on time?
• A purely logical approach leads to weak conclusions:
§ "A90 will get me there on time if there is no accident on the way and it doesn't rain and my tires remain intact and no meteorite hits the car, etc."
§ None of these can be inferred for sure → plan success cannot be inferred
Page 41
Uncertainty
• Consider diagnosis of a patient with a headache. Many causes are possible, like sinus problems or eyesight, tense muscles, flu, cancer, ... Suppose a logical rule attempts to express this:
Headache ⇒ Sinusitis ∨ EyeSight ∨ StiffNeck ∨ Flu ∨ Cancer ...
• The problem is that there is an almost unlimited list of possible causes. A causal rule like StiffNeck ⇒ Headache doesn't work either (a stiff neck doesn't always cause a headache).
• Trying to use logic in this type of domain fails because:
§ there is too much work to list all the attributes
§ there is no complete theory or knowledge
§ not all the necessary tests can be or have been run
Page 42
Why probabilistic reasoning?
• Probabilistic reasoning is useful because logic often fails due to:
– Laziness: too many attributes to list
– Ignorance: theoretical (no complete knowledge of the domain) and practical (not enough observations, tests, ...)
• Probabilistic assertions summarize the effects of laziness and ignorance.
Page 43
Graphical models
• Graphical models
• Markov random fields
• Bayesian networks
Page 44
Graphical models
Bayesian networks
Graphical models are related to mathematical graph theory.
Page 45
• A graph is a set of objects (represented by nodes, also called vertices or points), where some pairs of the nodes are connected by links (edges).
• If the edges are directed, they are also called arrows, and the graph is directed. In a weighted graph, weights are assigned to the edges. The graph is complete if all the vertices are connected to each other.
• Probabilistic graphs:
– nodes ↔ random variables (r.v.s)
– edges ↔ probabilistic dependencies between these r.v.s
Probabilistic graphs
Page 46
• Bayesian networks – directed graphical models (figure: a node X with its descendants; edges denote causal influence)
• Markov random fields – undirected graphs (figure: a node X with its neighbors)
Common graphical models
Page 47
Markov rule
• In a directed graph: P(Xi | all nondescendants) = P(Xi | Parents(Xi))
• A special case, the Markov chain: P(Xi | Xi−1, ..., X1) = P(Xi | Xi−1)
• Markov random field: P(Xi | all other nodes) = P(Xi | Neighbors(Xi))
Page 48
• Undirected probabilistic graphs
• Used a lot in digital image processing and computer vision
• This example illustrates an application in image segmentation
Markov Random Fields (MRFs)
Page 49
Bayesian networks
(figure: an example network with nodes travel, smoker?, disease 1, disease 2, X-ray, and symptoms)
Page 50
Bayes' rule
Often we perceive as evidence the effect of some unknown cause, and we want to determine that cause, e.g. the chance of disease_x given symptom_y.
Product rule: P(a ∧ b) = P(a | b) P(b)
Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
Or in distribution form: P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
Useful for assessing diagnostic probability from causal probability:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
P(disease_x | symptom_y) = P(symptom_y | disease_x) P(disease_x) / P(symptom_y)
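A quick numeric check of the diagnostic form in Python (all probabilities invented):

    # P(disease | symptom) = P(symptom | disease) P(disease) / P(symptom)
    p_disease = 0.01                 # prior
    p_symptom_given_disease = 0.9    # causal (likelihood) direction
    p_symptom = 0.08                 # evidence

    posterior = p_symptom_given_disease * p_disease / p_symptom
    print(posterior)  # 0.1125: a rare disease stays fairly unlikely despite the symptom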
Page 51
Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (each link means "directly influences")
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
Page 52
Network: directed acyclic graph
nodes: random variables; edges: causal influence
X has causal influence on Y
• Evidence for X forms causal support for Y
• Evidence for Y forms diagnostic support for X
(figure: edge X → Y; descendants of X vs. non-descendants of X)
Page 53
Network separation
Let us investigate (conditional) independence in three simple networks featuring these types of nodes, and let a ⊥ b | c denote "a and b are conditionally independent given c".
For the serial network a → c → b:
P(a, b, c) = P(a) P(c | a) P(b | c)
P(a, b) = Σ_c P(a) P(c | a) P(b | c) = P(a) P(b | a) ≠ P(a) P(b)
(in this network a and b are in general not independent)
Consider now evidence in c:
P(a, b | c) = P(a, b, c) / P(c) = P(a) P(c | a) P(b | c) / P(c) = P(a | c) P(b | c)
So, we can say that the node c blocks the path between a and b.
Page 54
D-separation contd.
A, B, and C are non-overlapping sets. The sets A and B are d-separated by C if each node in A is d-separated from each node in B by C.
Page 55
Example: Car diagnosis
Initial evidence: the car won't start.
Testable variables (green), "broken, so fix it" variables (orange); hidden variables (gray) ensure sparse structure and reduce parameters.
Page 56
• The belief propagation algorithm was introduced by Judea Pearl in 1982
• Exact inference in networks without loops; complexity linear in the number of nodes
• Became very popular after it was shown that the same computations occur in turbo codes and the same principles in the Viterbi algorithm
• Main idea: inference by local message passing among neighboring nodes. A message can loosely be interpreted as "I (node i) think that you (node j) are that much likely to be in a given state."
Belief propagation
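As a hedged Python sketch of local message passing, consider forward messages on a three-node chain x1 → x2 → x3 (a loop-free network; all numbers invented). Each message m(x_j) = Σ_i P(x_j | x_i) m(x_i) is node i telling node j how likely its states are:

    import numpy as np

    prior = np.array([0.6, 0.4])      # P(x1) over two states (invented)
    T = np.array([[0.7, 0.3],         # P(x_next | x_prev), rows sum to 1
                  [0.2, 0.8]])

    msg = prior
    for _ in range(2):                # pass messages x1 -> x2 -> x3
        msg = msg @ T                 # m(x_j) = sum_i P(x_j | x_i) * m(x_i)
    print(msg)                        # marginal of x3: [0.45 0.55]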
Page 57
Message passing revisited
1. Distributed soldier counting
2. Distributed soldier counting with the leader in line
Page 58
Numenta: HTM model
An HTM network consists of regions arranged in a hierarchy. Jeff Hawkins: “It combines and extends approaches used in Bayesian networks, spatial and temporal clustering algorithms, while using a tree-shaped hierarchy of nodes that is common in neural networks.”
Read the book, it is great fun →
Page 59
Semantic web and IBM’s Watson
The "heart and soul” is Unstructured Information Management Architecture [UIMA]
Page 60
Presentation 2nd part
• Smart web
– API economy
– IoT
• Bayesian nets
– Troubleshooting and diagnostics
– Sensor integration via a plugin framework
– Intelligent decisions and actions
– Cloud deployment
– An IFTTT-like application using the framework above
Page 61
Page 62
API
• APIs have become the new patents
• Who holds the data holds the knowledge
• Companies don't share their know-how, but they are willing to share their know-what (via an application programming interface, API)
• The API economy is coming, and it will be a major driver of profit for many companies
Page 63
Classical product distribution
Services distributed via API
Page 64
API Market
Page 65
Page 66
Sensor Networks
• Network of specialized sensors intended to monitor and record conditions at diverse locations.
• Commonly monitored parameters are temperature, humidity, pressure, wind direction and speed, illumination intensity, vibration intensity, sound intensity, power-line voltage, chemical concentrations, pollutant levels and vital body functions.
Page 67
Page 68
M2M is becoming a reality
The API economy has become a reality
Page 69
Programmable web of the future
Sensors gather and push data to the cloud. API economies share data and services in the cloud. There, an intelligent engine aggregates and correlates data from different sources, creating new VALUE that can be used either to:
– Provide new insights (analysis)
– Create new instructions (actions) via APIs
Page 70
Three types of AI/IoT implementations
• "Ambient intelligence" – mesh networks; information flow and decisions stay local
• "IoT analytics" – big-data-like use-case scenarios
• IoT analytics + APIs + cloud + decision engine + actions
Page 71
From an IBM talk on IoT
Page 72
Decision Engine
Page 73
IF THIS THEN THAT IS NOT GOING TO WORK
Page 74
CRM/BPM IS NOT GOING TO WORK
Page 75
Technology that can deal with huge data sets under complexity and uncertainty?
Google/Toyota/Renault/Volvo driverless car research projects
Page 76
Bayes models will win the battle
Page 77
Why is this different?
Page 78
Bayesian network modeling
A data analysis technique ideally suited to messy, complex data. The focus is on structure discovery: determining an optimal graphical model that describes the inter-relationships in the underlying processes.
structure discovery AND inter-relationships
Page 79
• How do you express that a car needs both battery and fuel to function? Easy.
• How do you say that if your lights are not working, it is most likely a battery fault, but it could also be that just the lights are broken? Still, the fact that the lights are not working points to the battery fault as the most likely cause.
If you only model via composition and add behavior separately (what most of the tools do these days), you are heading for complexity!
Page 80
Example: car model
Car model with relations, NO data available: the chance that the car will start is above 98%.
Page 81
Lights are off. The chance that the battery functions has dropped from 99.99% to less than 50%, and the chance that the car will start is below 50%.
Car example: lights are off
Page 82
Lights are on. The battery works, so there is no need to check it. The chance that the car will start now depends only on the fuel.
Car example: lights are on
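A hedged Python sketch of this kind of reasoning, by brute-force enumeration over a tiny hand-built network (battery and fuel as independent causes, lights depending on the battery, start depending on both). All CPT numbers are invented, so the posterior differs from the slide's (here the battery belief drops from 99.99% to about 91% rather than below 50%):

    from itertools import product

    P_BATT, P_FUEL = 0.9999, 0.99            # priors (invented)
    P_LIGHTS_ON = {True: 0.999, False: 0.0}  # P(lights on | battery ok/dead)
    P_START = {(True, True): 0.999}          # P(start | battery, fuel); else 0.001

    def joint(batt, fuel, lights_on, start):
        p = (P_BATT if batt else 1 - P_BATT) * (P_FUEL if fuel else 1 - P_FUEL)
        pl = P_LIGHTS_ON[batt]
        p *= pl if lights_on else 1 - pl
        ps = P_START.get((batt, fuel), 0.001)
        return p * (ps if start else 1 - ps)

    # P(battery ok | lights off), enumerating the remaining variables:
    num = sum(joint(True, f, False, s) for f, s in product([True, False], repeat=2))
    den = sum(joint(b, f, False, s) for b, f, s in product([True, False], repeat=3))
    print(round(num / den, 3))  # 0.909: dark lights slash belief in the battery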
Page 83
Prototype architecture
Pluggable sensors
Pluggable Actions
Decision engine
Website where User configures Logic (recipes)
Developer extensions (new capabilities)
Database of recipes
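A hedged sketch of what one of these user-configured recipes might look like (all field and plugin names are invented; the prototype's real format is not shown in the transcript):

    # IFTTT-style recipe: a pluggable sensor feeds the decision engine,
    # which triggers a pluggable action when the condition holds.
    recipe = {
        "name": "cold-room-alert",
        "sensor": {"plugin": "temperature", "location": "server-room"},
        "condition": {"below": 15.0},  # degrees Celsius
        "action": {"plugin": "sms", "message": "Server room is getting cold"},
    }

    def evaluate(recipe, reading):
        # Minimal decision step: return the action to dispatch, or None.
        if reading < recipe["condition"]["below"]:
            return recipe["action"]
        return None

    print(evaluate(recipe, 12.5))  # -> the SMS action dict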
Page 84
DEMO!!
Page 85
“Trading places”