Artificial intelligence and IoT
Transcript of Artificial intelligence and IoT
Page 1
Programmable web of the future
{ "firstName": "Veselin", "lastName": "Pizurica", "epochTime": 1381953702 }
Page 2
Today's talk is about the future: the future of the web
Integration/convergence:
– APIs
– Sensor networks/M2M
– Cloud
– Data mining
– Intelligent decision engines
Page 3
Introduction to AI
– Learning, pattern recognition
– Intelligent agents
– Probabilistic reasoning and uncertainty
– Graphical models
Page 4
Material used
• UGent AI course: http://telin.ugent.be/~sanja/ArtificialIntelligence
• BayesiaLab white paper
• Wikipedia
• Google search
Page 5
Map of Analytic Modeling
Breiman (2001) and Shmueli (2010)
Page 6
Ŷ = f(X)
Predictive modeling
Page 7
Y = f(X)
Explanatory modeling
Page 8
Intelligent agents
Agent: an entity that perceives and acts (from Latin agere, to do).
A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome.
Abstractly, an agent is a function from percept histories to actions, f: P* → A (see the sketch below).
For any given class of environments and tasks, we seek the agent (or class of agents) with the best performance.
In practice, computational limitations make perfect rationality unachievable → design the best program for the given machine resources.
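As a minimal Python sketch (not from the slides, and with an invented rule table), the percept-history-to-action mapping can be written directly:

    # A table-driven agent: maps the entire percept history to an action.
    # The table is invented for illustration; real tables are unmanageably large.
    def table_driven_agent(percepts, table, default_action="noop"):
        return table.get(tuple(percepts), default_action)

    table = {("cold",): "heat_on", ("cold", "warm"): "heat_off"}
    print(table_driven_agent(["cold"], table))          # -> heat_on
    print(table_driven_agent(["cold", "warm"], table))  # -> heat_off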
Page 9
Rationality
• A rational agent is one that does the right thing.
• How do we know whether it is the right thing?
– By considering the consequences of the agent's behavior (i.e., the sequence of states through which the environment goes as a result of the agent's behavior)
• A sequence of states (through which the environment goes) is evaluated by a performance measure.
Page 10
Specifying the task environment
To design a rational agent, we must specify the task environment. Consider the task of designing an automated taxi:
– Performance measure: safety, destination, profits, legality, comfort
– Environment: streets/freeways, traffic, pedestrians, weather
– Actuators: steering, accelerator, brake, horn, speaker/display
– Sensors: video, accelerometers, gauges, engine sensors, keyboard
Pages 11–18
Environment types (a series of table slides; images only in the transcript)
Page 19
• Four basic types, in order of increasing generality:
– simple reflex agents
– reflex agents with state
– goal-based agents
– utility-based agents
All of these can be turned into learning agents.
Agent types
Page 20
Simple reflex agents
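The slide itself is a diagram; as a hedged Python sketch, a simple reflex agent selects an action from condition-action rules using only the current percept (the rules below are invented):

    # Simple reflex agent: no internal state, just condition-action rules
    # applied to the current percept (rules invented for illustration).
    RULES = {
        "obstacle_ahead": "brake",
        "light_red": "brake",
        "light_green": "accelerate",
    }

    def simple_reflex_agent(percept, rules=RULES):
        return rules.get(percept, "do_nothing")

    print(simple_reflex_agent("light_red"))  # -> brake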
Page 21
Reflex agents with state
Page 22
Goal-based agents
Page 23
Utility-based agents
Page 24
Why learning?
Why do we want an agent to learn? (Why not program an improved design from the beginning?)
– We cannot anticipate all possible situations that the agent might find itself in
– We cannot anticipate all changes over time
– Programmers might not know how to program a solution themselves (e.g., how to program face recognition)
Learning modifies the agent's decision mechanisms to improve performance.
Page 25
Pattern recognition
Unsupervised learning
– Learning patterns without explicit feedback supplied
– The system forms clusters or natural groupings of the input patterns (based on some similarity criteria) ➡ Clustering
Reinforcement learning
– Learning from a series of reinforcements: rewards and punishments
Supervised learning
– Learning a function that maps input to output based on available (observed) input-output pairs (correct answers for each instance)
Semi-supervised learning
– A few labeled samples are available along with a large collection of unlabeled ones
– Learn from the geometry of the unlabeled samples and use the labeled ones to improve the learning
Page 26
Supervised Learning
Labeled training sets are used to train a classifier.
Page 27
Unsupervised Learning
• No labeled training sets are provided
• The system applies a specified clustering/grouping criterion to the unlabeled dataset, grouping together the "most similar" objects (according to the given criterion)
Page 28
Pattern Recognition Process
Data acquisition and sensing
– Measurements of physical variables
– Important issues: bandwidth, resolution, etc.
Pre-processing
– Removal of noise in data
– Isolation of patterns of interest from the background
Feature extraction
– Finding a new representation in terms of features
Classification
– Using features and learned models to assign a pattern to a category
Post-processing
– Evaluation of confidence in decisions
Page 29
Feature vectors
A single object is represented by several features, e.g. shape, size, color, weight:
x1 = shape (e.g., number of sides)
x2 = size (e.g., some numeric value)
x3 = color (e.g., RGB values)
...
xd = some other (numeric) feature
X = (x1, ..., xd) becomes a feature vector.
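For instance, a minimal Python version of such a feature vector (all values invented):

    import numpy as np

    # One object as a point in feature space: x1 = shape (number of sides),
    # x2 = size, x3-x5 = color (RGB), x6 = weight (values invented).
    x = np.array([4, 2.5, 200, 30, 30, 1.2])
    print(x.shape)  # (6,) - a single feature vector in R^6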
Page 30
Classical model of Pattern Recognition
Page 31
Example of Simple Classifier
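The slide is an image; one classifier that fits this "simple" description is a nearest-class-mean (minimum-distance) classifier, sketched here in Python on invented data:

    import numpy as np

    # Nearest-class-mean classifier: assign a sample to the class whose mean
    # feature vector is closest in Euclidean distance (toy data, invented).
    X_train = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
    y_train = np.array([0, 0, 1, 1])
    means = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

    def classify(x):
        return int(np.argmin(np.linalg.norm(means - x, axis=1)))

    print(classify(np.array([0.8, 1.2])))  # -> 0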
Page 32
Clustering: k-means
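A minimal Python sketch of k-means (Lloyd's algorithm) on invented toy data, to make the clustering step concrete:

    import numpy as np

    def kmeans(X, k, iters=20, seed=0):
        # Lloyd's algorithm: alternate nearest-centroid assignment and
        # centroid recomputation.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
            labels = np.argmin(dists, axis=1)
            for c in range(k):
                if np.any(labels == c):
                    centroids[c] = X[labels == c].mean(axis=0)
        return labels, centroids

    X = np.array([[0.0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]])
    labels, _ = kmeans(X, k=2)
    print(labels)  # the two spatial groups get two distinct labels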
Page 33
“Curse of dimensionality”
Finding the principal eigenvectors of the covariance matrix of the data: PCA
Page 34
PCA
Principal component analysis (PCA) is an orthogonal transformation that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
It is not, however, optimized for class separability. An alternative is linear discriminant analysis, which does take this into account. PCA is also sensitive to the scaling of the variables.
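To make the covariance-eigenvector view from the previous slide concrete, a minimal PCA sketch in Python (center, compute the covariance, project on the top eigenvectors):

    import numpy as np

    def pca(X, n_components):
        # Eigendecomposition of the covariance matrix of the centered data.
        Xc = X - X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
        return Xc @ eigvecs[:, order]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    X[:, 1] = 3 * X[:, 0]       # make two variables strongly correlated
    print(pca(X, 2).shape)      # (100, 2): decorrelated, reduced representation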
Page 35
Deep Learning
• Choosing the correct feature representation of input data is a way for people to bring prior knowledge of a domain that increases an algorithm's computational performance and accuracy. To move towards general artificial intelligence, algorithms need to be less dependent on this feature engineering and better learn to identify the explanatory factors of input data on their own.
• Deep learning tries to move in this direction by capturing a 'good' representation of input data by using compositions of non-linear transformations.
Page 36
Two types of models
• Probabilistic graphical models have nodes in each layer that are treated as latent random variables. In this case, you care about the joint distribution p(x, h) of the input data x and the hidden latent random variables h that describe the input data. These latent random variables describe a distribution over the observed data.
• Direct encoding (neural network) models have nodes in each layer that are treated as computational units. This means each node h performs some computation (normally nonlinear, like a sigmoid function) given its inputs from the previous layer.
Page 37
Decision trees
1. Learn rules from data
2. Apply each rule at each node
3. Classification happens at the leaves of the tree
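A hedged sketch with scikit-learn (assuming it is available) on invented binary data, showing the learn-rules / classify-at-a-leaf flow:

    from sklearn.tree import DecisionTreeClassifier

    # Invented attributes: [Hungry, Raining, Reservation]; label 1 = wait.
    X = [[1, 0, 1], [1, 1, 0], [0, 0, 0], [0, 1, 1]]
    y = [1, 1, 0, 0]

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(tree.predict([[1, 0, 0]]))  # -> [1]: this instance reaches a "wait" leaf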
Page 38
Decision Trees example
Example: decision whether to wait for a table in a restaurant, depending on the following attributes:
1. Alternate (Alt): Is there a suitable alternative restaurant nearby?
2. Bar: Is there a comfortable bar area in the restaurant where I can wait?
3. Fri/Sat (Fri): True on Fridays/Saturdays
4. Hungry (Hun): Are we hungry?
5. Patrons (Pat): How many people are in the restaurant (None, Some, or Full)?
6. Price: the restaurant's price range ($, $$, $$$)
7. Raining (Rain): Is it raining outside?
8. Reservation (Res): Did we make a reservation?
9. Type: the kind of restaurant (French, Italian, Thai, or burger)
10. WaitEstimate (Est): the wait time estimated by the host (0–10 min, 10–30, 30–60, or >60)
Page 39
Decision tree
How many distinct decision trees are there with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows
= 2^(2^n)
E.g., with 6 Boolean attributes: 2^64 = 18,446,744,073,709,551,616 trees
Page 40
Uncertainty
Let At denote the action "leave for airport t minutes before flight". Will At get me there on time?
• A purely logical approach leads to weak conclusions:
§ "A90 will get me there on time if there is no accident on the way and it doesn't rain and my tires remain intact and no meteorite hits the car, etc."
§ None of these can be inferred for sure → plan success cannot be inferred
Page 41
Uncertainty
• Consider diagnosis of a patient with a headache. Many causes are possible, like sinus problems or eyesight, tense muscles, flu, cancer, ... Suppose a logical rule attempts to express this:
Headache ⇒ Sinusitis ∨ EyeSight ∨ StiffNeck ∨ Flu ∨ Cancer ...
• The problem is that there is an almost unlimited list of possible causes. A causal rule like StiffNeck ⇒ Headache doesn't work either (a stiff neck doesn't always cause a headache).
• Trying to use logic in this type of domain fails because:
§ there is too much work to list all the attributes
§ there is no complete theory or knowledge
§ not all the necessary tests can be or have been run
Page 42
Why probabilistic reasoning?
• Probabilistic reasoning is useful because logic often fails due to:
– Laziness: too many attributes to list
– Ignorance: theoretical (no complete knowledge of the domain) and practical (not enough observations, tests, ...)
• Probabilistic assertions summarize the effects of laziness and ignorance.
Page 43
Graphical models
• Graphical models
• Markov random fields
• Bayesian networks
Page 44
Graphical models
Bayesian networks
Graphical models are related to mathematical graph theory.
Page 45
• A graph is a set of objects (represented by nodes, also called vertices or points), where some pairs of the nodes are connected by links (edges).
• If the edges are directed, they are also called arrows, and the graph is directed. In a weighted graph, weights are assigned to the edges. The graph is complete if all the vertices are connected to each other.
• Probabilistic graphs:
– nodes ↔ random variables (r.v.s)
– edges ↔ probabilistic dependencies between these r.v.s
Probabilistic graphs
Page 46
• Bayesian networks – directed graphical models (figure: a node X with its descendants; edges denote causal influence)
• Markov random fields – undirected graphs (figure: a node X with its neighbors)
Common graphical models
Page 47
Markov rule
• In a directed graph: P(Xi | all nondescendants) = P(Xi | Parents(Xi))
• A special case, the Markov chain: P(Xi | Xi−1, ..., X1) = P(Xi | Xi−1)
• Markov random field: P(Xi | all other nodes) = P(Xi | Neighbors(Xi))
Page 48
• Undirected probabilistic graphs
• Used a lot in digital image processing and computer vision
• This example illustrates an application in image segmentation
Markov Random Fields (MRFs)
Page 49
Bayesian networks
(figure: an example network with nodes travel, smoker?, disease 1, disease 2, X-ray, and symptoms)
Page 50
Bayes' rule
Often we perceive as evidence the effect of some unknown cause, and we want to determine that cause, e.g. the chance of disease_x given symptom_y.
Product rule: P(a ∧ b) = P(a | b) P(b)
Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
Or in distribution form: P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
Useful for assessing diagnostic probability from causal probability:
P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
P(disease_x | symptom_y) = P(symptom_y | disease_x) P(disease_x) / P(symptom_y)
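A quick numeric check of the diagnostic form in Python (all probabilities invented):

    # P(disease | symptom) = P(symptom | disease) P(disease) / P(symptom)
    p_disease = 0.01                 # prior
    p_symptom_given_disease = 0.9    # causal (likelihood) direction
    p_symptom = 0.08                 # evidence

    posterior = p_symptom_given_disease * p_disease / p_symptom
    print(posterior)  # 0.1125: a rare disease stays fairly unlikely despite the symptom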
Page 51
Bayesian networks
A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.
Syntax:
• a set of nodes, one per variable
• a directed, acyclic graph (each link means "directly influences")
• a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
Page 52
Network: directed acyclic graph
nodes: random variables; edges: causal influence
X has causal influence on Y
• Evidence for X forms causal support for Y
• Evidence for Y forms diagnostic support for X
(figure: edge X → Y; descendants of X vs. non-descendants of X)
Page 53
Network separation
Let us investigate (conditional) independence in three simple networks featuring these types of nodes, and let a ⊥ b | c denote "a and b are conditionally independent given c".
For the serial network a → c → b:
P(a, b, c) = P(a) P(c | a) P(b | c)
P(a, b) = Σ_c P(a) P(c | a) P(b | c) = P(a) P(b | a) ≠ P(a) P(b)
(in this network a and b are in general not independent)
Consider now evidence in c:
P(a, b | c) = P(a, b, c) / P(c) = P(a) P(c | a) P(b | c) / P(c) = P(a | c) P(b | c)
So, we can say that the node c blocks the path between a and b.
Page 54
D-separation contd.
A, B, and C are non-overlapping sets. The sets A and B are d-separated by C if each node in A is d-separated from each node in B by C.
Page 55
Example: Car diagnosis
Initial evidence: the car won't start.
Testable variables (green), "broken, so fix it" variables (orange); hidden variables (gray) ensure sparse structure and reduce parameters.
Page 56
• The belief propagation algorithm was introduced by Judea Pearl in 1982
• Exact inference in networks without loops; complexity linear in the number of nodes
• Became very popular after it was shown that the same computations occur in turbo codes and the same principles in the Viterbi algorithm
• Main idea: inference by local message passing among neighboring nodes. A message can loosely be interpreted as "I (node i) think that you (node j) are that much likely to be in a given state."
Belief propagation
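As a hedged Python sketch of local message passing, consider forward messages on a three-node chain x1 → x2 → x3 (a loop-free network; all numbers invented). Each message m(x_j) = Σ_i P(x_j | x_i) m(x_i) is node i telling node j how likely its states are:

    import numpy as np

    prior = np.array([0.6, 0.4])      # P(x1) over two states (invented)
    T = np.array([[0.7, 0.3],         # P(x_next | x_prev), rows sum to 1
                  [0.2, 0.8]])

    msg = prior
    for _ in range(2):                # pass messages x1 -> x2 -> x3
        msg = msg @ T                 # m(x_j) = sum_i P(x_j | x_i) * m(x_i)
    print(msg)                        # marginal of x3: [0.45 0.55]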
Page 57
Message passing revisited
1. Distributed soldier counting
2. Distributed soldier counting with the leader in line
Page 58
Numenta: HTM model
An HTM network consists of regions arranged in a hierarchy. Jeff Hawkins: “It combines and extends approaches used in Bayesian networks, spatial and temporal clustering algorithms, while using a tree-shaped hierarchy of nodes that is common in neural networks.”
Read the book, it is great fun →
Page 59
Semantic web and IBM’s Watson
The "heart and soul” is Unstructured Information Management Architecture [UIMA]
Page 60
Presentation 2nd part
• Smart web
– API economy
– IoT
• Bayesian nets
– Troubleshooting and diagnostics
– Sensor integration via a plugin framework
– Intelligent decisions and actions
– Cloud deployment
– An IFTTT-like application using the framework above
Page 61
Page 62
API
• APIs have become the new patents
• Who holds the data holds the knowledge
• Companies don't share their know-how, but they are willing to share their know-what (via an application programming interface, API)
• The API economy is coming, and it will be a major driver of profit for many companies
Page 63
Classical product distribution
Services distributed via API
Page 64
API Market
Page 65
Page 66
Sensor Networks
• Network of specialized sensors intended to monitor and record conditions at diverse locations.
• Commonly monitored parameters are temperature, humidity, pressure, wind direction and speed, illumination intensity, vibration intensity, sound intensity, power-line voltage, chemical concentrations, pollutant levels and vital body functions.
Page 67
Page 68
M2M is becoming a reality
The API economy has become a reality
Page 69
Programmable web of the future
Sensors gather and push data to the cloud. API economies share data and services in the cloud. There, an intelligent engine aggregates and correlates data from different sources, creating new VALUE that can be used either to:
– Provide new insights (analysis)
– Create new instructions (actions) via APIs
Page 70
Three types of AI/IoT implementations
• "Ambient intelligence" – mesh networks; information flow and decisions stay local
• "IoT analytics" – big-data-like use-case scenarios
• IoT analytics + APIs + cloud + decision engine + actions
Page 71
From an IBM talk on IoT
Page 72
Decision Engine
Page 73
IF THIS THEN THAT IS NOT GOING TO WORK
Page 74
CRM/BPM IS NOT GOING TO WORK
Page 75
Technology that can deal with huge data sets under complexity and uncertainty?
Google/Toyota/Renault/Volvo driverless car research projects
Page 76
Bayes models will win the battle
Page 77
Why is this different?
Page 78
Bayesian network modeling
A data analysis technique ideally suited to messy, complex data. The focus is on structure discovery: determining an optimal graphical model that describes the inter-relationships in the underlying processes.
structure discovery AND inter-relationships
Page 79
• How do you express that a car needs both battery and fuel to function? Easy.
• How do you say that if your lights are not working, it is most likely a battery fault, but it could also be that just the lights are broken? Still, the fact that the lights are not working points to the battery fault as the most likely cause.
If you only model via composition and add behavior separately (what most of the tools do these days), you are heading for complexity!
Page 80
Example: car model
Car model with relations, NO data available: the chance that the car will start is above 98%.
Page 81
Lights are off. The chance that the battery functions has dropped from 99.99% to less than 50%, and the chance that the car will start is below 50%.
Car example: lights are off
Page 82
Lights are on. The battery works, so there is no need to check it. The chance that the car will start now depends only on the fuel.
Car example: lights are on
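A hedged Python sketch of this kind of reasoning, by brute-force enumeration over a tiny hand-built network (battery and fuel as independent causes, lights depending on the battery, start depending on both). All CPT numbers are invented, so the posterior differs from the slide's (here the battery belief drops from 99.99% to about 91% rather than below 50%):

    from itertools import product

    P_BATT, P_FUEL = 0.9999, 0.99            # priors (invented)
    P_LIGHTS_ON = {True: 0.999, False: 0.0}  # P(lights on | battery ok/dead)
    P_START = {(True, True): 0.999}          # P(start | battery, fuel); else 0.001

    def joint(batt, fuel, lights_on, start):
        p = (P_BATT if batt else 1 - P_BATT) * (P_FUEL if fuel else 1 - P_FUEL)
        pl = P_LIGHTS_ON[batt]
        p *= pl if lights_on else 1 - pl
        ps = P_START.get((batt, fuel), 0.001)
        return p * (ps if start else 1 - ps)

    # P(battery ok | lights off), enumerating the remaining variables:
    num = sum(joint(True, f, False, s) for f, s in product([True, False], repeat=2))
    den = sum(joint(b, f, False, s) for b, f, s in product([True, False], repeat=3))
    print(round(num / den, 3))  # 0.909: dark lights slash belief in the battery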
Page 83
Prototype architecture
Pluggable sensors
Pluggable Actions
Decision engine
Website where User configures Logic (recipes)
Developer extensions (new capabilities)
Database of recipes
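A hedged sketch of what one of these user-configured recipes might look like (all field and plugin names are invented; the prototype's real format is not shown in the transcript):

    # IFTTT-style recipe: a pluggable sensor feeds the decision engine,
    # which triggers a pluggable action when the condition holds.
    recipe = {
        "name": "cold-room-alert",
        "sensor": {"plugin": "temperature", "location": "server-room"},
        "condition": {"below": 15.0},  # degrees Celsius
        "action": {"plugin": "sms", "message": "Server room is getting cold"},
    }

    def evaluate(recipe, reading):
        # Minimal decision step: return the action to dispatch, or None.
        if reading < recipe["condition"]["below"]:
            return recipe["action"]
        return None

    print(evaluate(recipe, 12.5))  # -> the SMS action dict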
Page 84
DEMO!!
Page 85
“Trading places”