Early work in intelligent systems
![Page 1: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/1.jpg)
Early work in intelligent systems
Alan Turing (1912 – 1954) Arthur Samuel (1901-1990)
![Page 2: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/2.jpg)
Early work in intelligent systems
• Alan Turing (1912 – 1954)
– Father of computer science, mathematician, philosopher, codebreaker (WW II), homosexual
• The Turing Machine
• The Turing Test (AI)
![Page 3: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/3.jpg)
Early work in intelligent systems
Alan Turing (1950):
We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications:
Structure of the child machine = Hereditary material
Changes of the child machine = Mutations
Natural selection = Judgment of the experimenter.
![Page 4: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/4.jpg)
Early work in intelligent systems
Arthur Samuel (1901-1990)
• “How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?” (1959)
• “The aim is to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence.” (1983)
![Page 5: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/5.jpg)
Genetic Programming
• Breed a population of computer programs to solve a given problem
– An extension of genetic algorithms
– Selection, crossover, mutation
![Page 6: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/6.jpg)
Preparatory Steps
John Koza: the human user supplies:
(1) The set of terminals (e.g., the independent variables of the problem, zero-argument functions, and random constants)
(2) The set of primitive functions for each branch of the program to be evolved
(3) The fitness measure
(4) The parameters for controlling the run
(5) The termination criteria
![Page 7: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/7.jpg)
1. Terminal Set
• External inputs to the program
• Numerical constants (problem dependent?): π, e, 0, 1, …, random numbers, …
![Page 8: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/8.jpg)
2. Function Set
• Arithmetic functions
• Conditional branches (if statements)
• Problem specific functions (controllers, filters, integrators, differentiators, circuit elements, …)
![Page 9: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/9.jpg)
3. Fitness Measure
• The GP measures the fitness of each individual (computer program)
• Fitness is usually averaged over a variety of different cases
– Program inputs
– Initial conditions
– Different environments
![Page 10: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/10.jpg)
4. Control Parameters
• Population size (thousands or millions)
• Selection method
• Crossover probability
• Mutation probability
• Maximum program size
• Elitism option
![Page 11: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/11.jpg)
5. Termination Criterion
• Maximum number of generations / real time
• Convergence of highest / mean fitness
• …
![Page 12: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/12.jpg)
GP Flowchart
![Page 13: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/13.jpg)
Initialization
![Page 14: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/14.jpg)
Initialization
Max((* x x) (+ x (* 3 y)))
Prefix notation (Lisp)
Max(x*x, x+3*y)
• Nodes (points, functions)
• Links, terminals
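The initial population is typically filled with randomly generated trees. The sketch below is one way to do that, not the procedure from the slides: the primitive sets, the `grow` helper, the 0.3 terminal probability, and the nested-tuple representation are all assumptions chosen for illustration.

```python
import random

# Illustrative primitive sets (assumptions, not taken from the slides).
FUNCTIONS = {"max": 2, "+": 2, "*": 2}   # function name -> number of arguments
TERMINALS = ["x", "y", 3]

def grow(max_depth, rng, p_terminal=0.3):
    """Grow a random tree as a nested tuple: (function, child, child, ...)."""
    if max_depth == 0 or rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(grow(max_depth - 1, rng, p_terminal)
                           for _ in range(FUNCTIONS[name]))

def depth(tree):
    """Number of links on the longest path from the root to a terminal."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

def to_prefix(tree):
    """Render a tree in Lisp-style prefix notation."""
    if not isinstance(tree, tuple):
        return str(tree)
    return "(" + " ".join([tree[0]] + [to_prefix(c) for c in tree[1:]]) + ")"

rng = random.Random(0)
population = [grow(3, rng) for _ in range(4)]
for individual in population:
    print(to_prefix(individual))
```

With this representation the slide's example is the tuple `("max", ("*", "x", "x"), ("+", "x", ("*", 3, "y")))`.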
![Page 15: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/15.jpg)
Program Tree
(+ 1 2 (IF (> TIME 10) 3 4))
If Time > 10 then x = 3
else x = 4
Solution = 1 + 2 + x
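The prefix form maps directly onto a nested data structure. As a minimal sketch, assuming a nested-tuple representation (the helper names `tokenize` and `parse` are made up for this example), the expression above can be read into a tree like this:

```python
def tokenize(text):
    """Split a Lisp-style expression into parentheses and symbols."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume tokens from the front of the list and return one expression."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)                 # drop the closing ")"
        return tuple(items)           # (function, arg1, arg2, ...)
    try:
        return int(token)             # numeric constant
    except ValueError:
        return token                  # function name or variable

tree = parse(tokenize("(+ 1 2 (IF (> TIME 10) 3 4))"))
print(tree)
```

The root of the resulting tuple is the function `+`, and its last child is the `IF` subtree whose first child is the test `(> TIME 10)`.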
![Page 16: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/16.jpg)
Mutation
• Select one individual probabilistically
• Pick one point in the individual
• Delete the subtree at the chosen point
• Grow a new subtree at the mutation point in the same way as for the initial random population
• The result is a syntactically valid executable program
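The mutation steps above can be sketched as follows. This is an illustration under assumptions: trees are nested tuples, the primitive sets are invented, the mutation point is chosen uniformly over all nodes, and `grow`, `paths`, `replace_at`, and `mutate` are hypothetical helper names.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # illustrative function set: name -> arity
TERMINALS = ["x", 1, 2]                # illustrative terminal set

def grow(max_depth, rng):
    """Grow a random subtree, the same way an initial individual would be built."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(grow(max_depth - 1, rng) for _ in range(FUNCTIONS[name]))

def paths(tree, prefix=()):
    """List the path (tuple of child indices) to every node in the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            found += paths(child, prefix + (i,))
    return found

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the subtree at path replaced by new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def mutate(tree, rng, max_depth=2):
    """Pick one point, delete the subtree there, and grow a new one in its place."""
    point = rng.choice(paths(tree))
    return replace_at(tree, point, grow(max_depth, rng))

parent = ("+", ("*", "x", "x"), 1)
child = mutate(parent, random.Random(0))
```

Because the new subtree is built only from the same function and terminal sets, and each function receives the number of arguments it expects, the offspring is again a well-formed tree.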
![Page 17: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/17.jpg)
Crossover
• Select two parents probabilistically based on fitness
• Randomly pick a node in the first parent (often biased so that an internal node is chosen 90% of the time)
• Independently randomly pick a node in the second parent
• Swap subtrees at the chosen nodes
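A minimal sketch of subtree crossover, assuming nested-tuple trees and that the two parents have already been selected. The 90% internal-node bias follows the bullet above; the helper names (`paths`, `get_at`, `replace_at`, `pick_point`, `crossover`) are hypothetical.

```python
import random

def paths(tree, prefix=()):
    """List the path (tuple of child indices) to every node in the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            found += paths(child, prefix + (i,))
    return found

def get_at(tree, path):
    """Return the subtree rooted at path."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the subtree at path replaced by new_subtree."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]

def pick_point(tree, rng, p_internal=0.9):
    """Choose a crossover point, preferring internal (function) nodes."""
    all_paths = paths(tree)
    internal = [p for p in all_paths if isinstance(get_at(tree, p), tuple)]
    leaves = [p for p in all_paths if not isinstance(get_at(tree, p), tuple)]
    if internal and rng.random() < p_internal:
        return rng.choice(internal)
    return rng.choice(leaves)

def crossover(parent_a, parent_b, rng):
    """Pick a point in each parent independently and swap the subtrees."""
    point_a = pick_point(parent_a, rng)
    point_b = pick_point(parent_b, rng)
    child_a = replace_at(parent_a, point_a, get_at(parent_b, point_b))
    child_b = replace_at(parent_b, point_b, get_at(parent_a, point_a))
    return child_a, child_b

a = ("+", ("-", 2, "x"), ("-", 3, "y"))
b = ("+", "x", "y")
child_a, child_b = crossover(a, b, random.Random(1))
```

Since the two points are chosen independently, the offspring usually differ in size and shape from their parents, but the total number of nodes across both offspring equals the total across both parents.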
![Page 18: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/18.jpg)
Reproduction
• Select an individual probabilistically based on fitness
• Copy it (unchanged) into the next generation of the population (cloning)
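Selection "probabilistically based on fitness" is often implemented as roulette-wheel (fitness-proportionate) selection. The sketch below assumes that higher fitness is better and that all fitness values are non-negative; an error-based fitness where low is good would first have to be converted. The names `select` and `reproduce` are illustrative.

```python
import random

def select(population, fitnesses, rng):
    """Roulette-wheel selection: pick one individual with probability
    proportional to its fitness (assumes higher is better, values >= 0)."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def reproduce(population, fitnesses, rng):
    """Copy one selected individual unchanged into the next generation."""
    return select(population, fitnesses, rng)   # tuples/strings are immutable

rng = random.Random(0)
population = ["prog_a", "prog_b"]
fitnesses = [1.0, 3.0]
picks = [reproduce(population, fitnesses, rng) for _ in range(4000)]
share_b = picks.count("prog_b") / len(picks)    # expected to be near 0.75
```

Tournament selection is a common alternative when fitness values are negative or badly scaled.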
![Page 19: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/19.jpg)
Example

| Independent variable (x) | Dependent variable (y) |
|---|---|
| -1.00 | 1.00 |
| -0.80 | 0.84 |
| -0.60 | 0.76 |
| -0.40 | 0.76 |
| -0.20 | 0.84 |
| 0.00 | 1.00 |
| 0.20 | 1.24 |
| 0.40 | 1.56 |
| 0.60 | 1.96 |
| 0.80 | 2.44 |
| 1.00 | 3.00 |

Generate a computer program with one input x whose output equals the given data y
( y = x^2 + x + 1 )
![Page 20: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/20.jpg)
Preparatory Steps
1 Terminal set: T = {x, Random Constants}
2 Function set: F = { +, -, *, % } (% denotes protected division)
3 Fitness: The sum of the absolute value of the differences between the program’s output and the given data (low is good)
4 Parameters: Population size M = 4
5 Termination: An individual emerges whose sum of absolute errors is less than 0.1
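As a sketch of steps 2, 3, and 5 for this example, the code below evaluates candidate programs against the tabulated data. Programs are represented as plain Python callables for brevity, and `pdiv`, `fitness`, and `terminated` are illustrative names. Returning 1 on division by zero is the usual protected-division convention, stated here as an assumption.

```python
import operator

# The eleven (x, y) pairs from the example table.
DATA = [(-1.0, 1.00), (-0.8, 0.84), (-0.6, 0.76), (-0.4, 0.76), (-0.2, 0.84),
        (0.0, 1.00), (0.2, 1.24), (0.4, 1.56), (0.6, 1.96), (0.8, 2.44),
        (1.0, 3.00)]

def pdiv(a, b):
    """Protected division (%): return 1 instead of failing when b is 0."""
    return a / b if b != 0 else 1.0

FUNCTION_SET = {"+": operator.add, "-": operator.sub,
                "*": operator.mul, "%": pdiv}

def fitness(program):
    """Sum of absolute errors over the data set (low is good)."""
    return sum(abs(program(x) - y) for x, y in DATA)

def terminated(program, tolerance=0.1):
    """Termination test: total absolute error below the tolerance."""
    return fitness(program) < tolerance

target = lambda x: x * x + x + 1
print(terminated(target))             # the target polynomial fits the table
print(terminated(lambda x: x + 1))    # a poor candidate does not
```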
![Page 21: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/21.jpg)
Initialization
![Page 22: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/22.jpg)
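Fitness Evaluation

| Individual | (a) | (b) | (c) | (d) |
|---|---|---|---|---|
| Program | x + 1 | x^2 + 1 | 2 | x |
| Fitness | 0.67 | 1.00 | 1.70 | 2.67 |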
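The four fitness values on this slide appear to match the area between each candidate and the target curve x^2 + x + 1 over [-1, 1], rather than a sum over the eleven tabulated points. That reading is an assumption, as are the labels (a) to (d) and the helper name `area_error`. A quick numerical check with a midpoint rule:

```python
def area_error(program, lo=-1.0, hi=1.0, steps=20000):
    """Midpoint-rule estimate of the integral of |program(x) - target(x)|."""
    target = lambda x: x * x + x + 1
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += abs(program(x) - target(x))
    return total * h

INDIVIDUALS = {
    "a": lambda x: x + 1,
    "b": lambda x: x * x + 1,
    "c": lambda x: 2.0,
    "d": lambda x: x,
}

scores = {name: round(area_error(f), 2) for name, f in INDIVIDUALS.items()}
print(scores)   # approximately 0.67, 1.0, 1.7, 2.67 for a, b, c, d
```

Under either reading, individual (a) has the smallest error and is the most fit of the four.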
![Page 23: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/23.jpg)
Reproduction
• Copy (a), the most fit individual
• Mutate (c)
![Page 24: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/24.jpg)
Crossover
![Page 25: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/25.jpg)
Crossover
![Page 26: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/26.jpg)
Interpreting a program tree
![Page 27: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/27.jpg)
Interpreting a program tree
{ – [ + ( – 3 0 ) ( – x 1 ) ] [ / ( – 3 0 ) ( – x 2 ) ] }
What does this evaluate as?
What are the terminals, functions, and lists?
![Page 28: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/28.jpg)
Interpreting a program tree
{ – [ + ( – 3 0 ) ( – x 1 ) ] [ / ( – 3 0 ) ( – x 2 ) ] }
[ (3 – 0) + (x – 1) ] – [ (3 – 0) / (x – 2) ]
• Terminals = { 3, 0, x, 1, 2}
• Functions = { –, +, / }
• Lists = ( – 3 0 ), [ + ( – 3 0 ) ( – x 1 ) ], …
![Page 29: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/29.jpg)
Interpreting a program tree
recursion
\Re*cur"sion\ (-shŭn), n. [L. recursio.]
See recursion.
factorial ( n )
    if n == 0 then return 1
    else return n * factorial (n – 1)
![Page 30: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/30.jpg)
Interpreting a program tree
Recursive function EVAL:
if EXPR is a list then // i.e., delimited by parentheses
    PROC = EXPR(1)
    VAL = PROC [ EVAL(EXPR(2)), EVAL(EXPR(3)), … ]
else // i.e., EXPR is a terminal
    if EXPR is a variable or constant then
        VAL = EXPR
    else // i.e., EXPR is a function with no arguments
        VAL = EXPR ( )
    end
end
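A runnable counterpart of the EVAL pseudocode, as a sketch: lists become nested tuples, variables and zero-argument functions are looked up in an environment dictionary, and the names `eval_tree`, `FUNCTIONS`, and `env` are assumptions made for this example.

```python
import operator

FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}

def eval_tree(expr, env):
    """Recursively evaluate a program tree."""
    if isinstance(expr, tuple):                  # a list: (function arg1 arg2 ...)
        proc = FUNCTIONS[expr[0]]
        return proc(*(eval_tree(arg, env) for arg in expr[1:]))
    if isinstance(expr, str):                    # a named terminal
        value = env[expr]
        # A zero-argument function is called; a variable is returned as-is.
        return value() if callable(value) else value
    return expr                                  # a numeric constant

# { - [ + ( - 3 0 ) ( - x 1 ) ] [ / ( - 3 0 ) ( - x 2 ) ] }
tree = ("-",
        ("+", ("-", 3, 0), ("-", "x", 1)),
        ("/", ("-", 3, 0), ("-", "x", 2)))

print(eval_tree(tree, {"x": 5}))   # (3 + 4) - (3 / 3) = 6.0
```

Plain `/` raises `ZeroDivisionError` at x = 2 for this tree, which is one reason GP function sets often use a protected division instead.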
![Page 31: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/31.jpg)
Can computer programs create new inventions?
![Page 32: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/32.jpg)
GP Inventions
• Two patents filed by Keane, Koza, and Streeter on July 12, 2002
1. Creation of Tuning Rules for PID Controllers that Outperform the Ziegler-Nichols and Åström-Hägglund Tuning Rules
2. Creation of 3 Non-PID Controllers that Outperform a PID Controller that uses the Ziegler-Nichols or Åström-Hägglund Tuning Rules
![Page 33: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/33.jpg)
GP for Antenna Design
• X band antenna – Jason Lohn, NASA Ames
– Wide beamwidth for a circularly polarized wave
– Wide bandwidth
![Page 34: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/34.jpg)
The evolution of genetic programming

| Hardware | Years | CPU Power | Results |
|---|---|---|---|
| Texas Instruments LISP machine | 1987–1994 | 1 (base) | Example problems |
| 64-node Transtech transputer | 1994–1997 | 9 | Human competitive results |
| 64-node Parsytec parallel computer | 1995–2000 | 22 | Reproduction of 20th century patents |
| 70-node Alpha parallel computer | 1999–2001 | 7 | Circuit synthesis |
| 1,000-node Pentium II | 2000–2002 | 9 | Reproduction of 21st century patents |
| 1,000-node Pentium II, 4 weeks of CPU time | 2002 | 9 | Two new patents |
![Page 35: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/35.jpg)
GP Computational Effort
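• Human brain: 10^12 neurons, each operating on a time scale of about 1 msec → 10^15 operations per second
• 1 peta-op = 1 brain second (B-sec)
• Keane, Koza, Streeter patents:

| | Pop. | Gens. | Hours | Nodes | MHz | B-sec |
|---|---|---|---|---|---|---|
| Patent 1 | 100K | 76 | 107 | 1K | 350 | 135 |
| Patent 2 | 100K | 325 | 1409 | 1K | 350 | 1775 |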
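The B-sec column can be reproduced from the other columns with a back-of-the-envelope calculation. The sketch assumes one operation per clock cycle per node, which is a simplification, and `brain_seconds` is an illustrative name.

```python
# 10^12 neurons, roughly 10^3 operations per second each -> 10^15 ops per second.
BRAIN_OPS_PER_SECOND = 1e12 * 1e3

def brain_seconds(hours, nodes, mhz):
    """Total machine operations expressed in brain seconds (B-sec).

    Assumes every node performs one operation per clock cycle.
    """
    operations = hours * 3600 * nodes * mhz * 1e6
    return operations / BRAIN_OPS_PER_SECOND

print(round(brain_seconds(107, 1000, 350)))    # Patent 1: 135
print(round(brain_seconds(1409, 1000, 350)))   # Patent 2: 1775
```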
![Page 36: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/36.jpg)
When should you use GP?
• Problem involving many variables that are interrelated in highly nonlinear ways
• Relationships among variables are not well understood
• Discovery of the size and shape of the solution is a major part of the problem
• “Black art” problems (controller tuning)
• Areas where you have no idea how to program a solution, but you know what you want
![Page 37: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/37.jpg)
When should you use GP?
• Problems where a good approximate solution is satisfactory
– Design
– Control and estimation
– Bioinformatics
– Classification
– Data mining
– System identification
– Forecasting
![Page 38: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/38.jpg)
When should you use GP?
• Areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data
– genome, protein, microarray data
– satellite image data
– astronomical data
– petroleum databases
– medical records
– marketing databases
– financial databases
![Page 39: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/39.jpg)
Schema Theory for GP
• The # symbol represents “don’t care”
• Example: H = ( + ( – # y ) # ) instances are:
( + ( – x y ) x ) → ( x – y ) + x
( + ( – x y ) y ) → ( x – y ) + y
( + ( – y y ) x ) → ( y – y ) + x
( + ( – y y ) y ) → ( y – y ) + y
![Page 40: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/40.jpg)
Schema Theory for GP
Example: H = ( + ( – # y ) # )
• Order o(H) = number of defined symbols
o(H) = ?
• Length N(H) = number of symbols
N(H) = ?
• Defining length L(H) = number of links joining defined symbols
L(H) = ?
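The three quantities can be computed mechanically from a schema tree. The sketch below assumes nested-tuple schemata with "#" as the don't-care symbol, and it reads "links joining defined symbols" as the links of the smallest connected fragment that contains every defined symbol. The helper names are illustrative.

```python
def _walk(node, link_counts):
    """Return the number of defined symbols in this subtree and record,
    for every link parent -> child, how many lie below that link."""
    if not isinstance(node, tuple):
        return int(node != "#")
    defined = int(node[0] != "#")
    for child in node[1:]:
        below = _walk(child, link_counts)
        link_counts.append(below)
        defined += below
    return defined

def schema_stats(schema):
    """Return (order o, length N, defining length L) of a schema."""
    link_counts = []
    order = _walk(schema, link_counts)
    length = len(link_counts) + 1          # a tree has one more node than links
    # A link is needed to join the defined symbols exactly when some of them
    # lie below it and some lie elsewhere in the tree.
    defining_length = sum(1 for below in link_counts if 0 < below < order)
    return order, length, defining_length

H = ("+", ("-", "#", "y"), "#")
order, length, defining_length = schema_stats(H)
print(order, length, defining_length)      # 3 5 2
```

For H = ( + ( – # y ) # ) the defined symbols are +, – and y, giving o(H) = 3, N(H) = 5, and L(H) = 2.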
![Page 41: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/41.jpg)
Schema Theory for GP
All these schema sample the program ( + ( – 2 x ) y )
What are the schema defining length L, order o, and length N?
[Figure: four schema trees with the same shape as the program, each with some symbols replaced by #]
![Page 42: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/42.jpg)
Schema Theory for GP
| Schema (left to right) | L | o | N |
|---|---|---|---|
| 1 | 3 | 4 | 5 |
| 2 | 2 | 2 | 5 |
| 3 | 1 | 2 | 5 |
| 4 | 0 | 1 | 5 |

[Figure: the same four schema trees, each with some symbols replaced by #]
![Page 43: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/43.jpg)
Schema Theory for GP
How many schema match a tree of length N ?
For example, consider the program
( + ( – 2 x ) ( – 3 y ) )
[Figure: tree diagram of the program above]
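If a schema must have the same shape as the program, each of the N symbols is either kept or replaced by #, so a tree of length N is matched by 2^N schemata. A short enumeration for the program above, assuming nested-tuple trees (the generator name `schemata` is illustrative):

```python
from itertools import product

def schemata(tree):
    """Yield every same-shape schema that the tree is an instance of."""
    if not isinstance(tree, tuple):
        yield tree            # keep the terminal ...
        yield "#"             # ... or replace it with "don't care"
        return
    head, *children = tree
    child_options = [list(schemata(child)) for child in children]
    for symbol in (head, "#"):
        for combination in product(*child_options):
            yield (symbol,) + combination

program = ("+", ("-", 2, "x"), ("-", 3, "y"))
all_schemata = list(schemata(program))
print(len(all_schemata))      # 2**7 = 128
```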
![Page 44: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/44.jpg)
Schema Theory for GP
Definitions:
• m(H, t) = number of instances of schema H in the population at generation t
• G = structure of schema H
For example, if H = ( + ( – # y ) # )
then G = ( # ( # # # ) # )
![Page 45: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/45.jpg)
Schema Theory for GP
• m(H, t) = number of instances of schema H at generation t
• m(H, t+1/2) = number of instances of H selected for crossover / mutation
• m(H, t+1) = number of instances of H after crossover / mutation
• Fitness proportionate selection:
$$m(H, t+\tfrac{1}{2}) = m(H, t)\, f(H, t) / f_{\text{ave}}$$
![Page 46: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/46.jpg)
Schema Theory for GP
Crossover: two ways for destruction of schema H
1. Program h $\in$ H crosses with program g that has a different structure than G (event $D_1$)
2. Program h $\in$ H crosses with program g that has the same structure as G, but g $\notin$ H (event $D_2$)
$$\Pr(\text{crossover destruction}) = \Pr(D) = \Pr(D_1) + \Pr(D_2)$$
![Page 47: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/47.jpg)
Crossover Destruction – Type 1
Parents:
( + ( – 2 x ) ( – 3 y ) )
( + x y )
Crossover (the subtree ( – 2 x ) of the first parent is swapped with y of the second) results in
( + y ( – 3 y ) )
( + x ( – 2 x ) )
Both schema are destroyed
![Page 48: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/48.jpg)
Crossover Destruction – Type 2
If h = (+ x y) $\in$ H = (# x y) and g = (g1 y x) $\notin$ H, then crossover between the + and x gives:
( + y x ) and (g1 x y ) $\in$ H, schema preserved
But if h = (+ x y) $\in$ H = (+ x #) and g = (g1 y x) $\notin$ H, then crossover between the + and x gives:
( + y x ) and (g1 x y ) $\notin$ H, schema destroyed (unless g1 = “+”)
![Page 49: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/49.jpg)
Crossover Destruction – Type 1
Program h $\in$ H crosses with program g that has a different structure than G (event $D_1$)
M = population size
$$\Pr(D_1) = \Pr(D \mid g \notin G)\, \Pr(g \notin G)$$
$$\Pr(g \notin G) = \frac{M - m(G, t+\tfrac{1}{2})}{M}$$
$$\Pr(D \mid g \notin G) = P_{\text{diff}}$$
![Page 50: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/50.jpg)
Crossover Destruction – Type 2
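Program h $\in$ H crosses with program g that has the same structure as G, but g $\notin$ H (event $D_2$)
$$\Pr(D_2) = \Pr(D \mid g \in G)\, \Pr(g \in G)$$
$$\Pr(g \in G) = \frac{m(G, t+\tfrac{1}{2})}{M}$$
$$\Pr(D \mid g \in G) = \Pr(D \mid g \notin H)\, \Pr(g \notin H \mid g \in G)$$
$$\Pr(g \notin H \mid g \in G) = \frac{m(G, t+\tfrac{1}{2}) - m(H, t+\tfrac{1}{2})}{m(G, t+\tfrac{1}{2})}$$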
![Page 51: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/51.jpg)
Crossover Destruction – Type 2
$$\Pr(D \mid g \notin H) \le \frac{L(H)}{N(H) - 1}$$
Therefore,
$$\Pr(D_2) \le \frac{L(H)}{N(H) - 1} \cdot \frac{m(G, t+\tfrac{1}{2}) - m(H, t+\tfrac{1}{2})}{M}$$
![Page 52: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/52.jpg)
Crossover Destruction
$$\Pr(D) = \Pr(D_1) + \Pr(D_2) \le \frac{M - m(G, t+\tfrac{1}{2})}{M}\, P_{\text{diff}} + \frac{L(H)}{N(H) - 1} \cdot \frac{m(G, t+\tfrac{1}{2}) - m(H, t+\tfrac{1}{2})}{M}$$
![Page 53: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/53.jpg)
Crossover Destruction
Crossover occurs with probability $p_c$
$$m(H, t+1) = (1 - p_c)\, m(H, t+\tfrac{1}{2}) + p_c\, m(H, t+\tfrac{1}{2})\, [1 - \Pr(D)]$$
$$m(H, t+1) = m(H, t+\tfrac{1}{2})\, [1 - p_c \Pr(D)]$$
![Page 54: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/54.jpg)
Mutation Destruction
$$\Pr(\text{mutation destruction}) = 1 - (1 - p_m)^{o(H)} \approx p_m\, o(H)$$
$$m(H, t+1) = m(H, t+\tfrac{1}{2})\, [1 - p_c \Pr(D)]\, [1 - p_m\, o(H)]$$
Combine previous results to obtain a lower bound for m(H, t+1)
![Page 55: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/55.jpg)
Schema Theory for GP
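$$m(H, t+1) \ge \frac{m(H, t)\, f(H, t)}{f_{\text{ave}}}\, [1 - p_m\, o(H)] \times \left[ 1 - p_c \left\{ \left( 1 - \frac{m(G, t)\, f(G, t)}{M f_{\text{ave}}} \right) P_{\text{diff}} + \frac{L(H)}{N(H) - 1} \cdot \frac{m(G, t)\, f(G, t) - m(H, t)\, f(H, t)}{M f_{\text{ave}}} \right\} \right]$$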
Slightly more complex than GA schema theorem
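To get a feel for the bound, it can be evaluated numerically. The function below is a direct transcription of the inequality under the slide's definitions; the function name, the argument names, and the example numbers are made up for illustration.

```python
def gp_schema_bound(m_H, f_H, m_G, f_G, f_ave, M, L, N, order, p_c, p_m, P_diff):
    """Lower bound on m(H, t+1) from the GP schema theorem (requires N > 1)."""
    selected_H = m_H * f_H / f_ave                # m(H, t+1/2)
    selected_G = m_G * f_G / f_ave                # m(G, t+1/2)
    pr_d1 = (1 - selected_G / M) * P_diff         # mate has a different structure
    pr_d2 = (L / (N - 1)) * (selected_G - selected_H) / M   # same structure, not in H
    survive_crossover = 1 - p_c * (pr_d1 + pr_d2)
    survive_mutation = 1 - p_m * order
    return selected_H * survive_mutation * survive_crossover

# Hypothetical numbers: 10 instances of H with above-average fitness.
bound = gp_schema_bound(m_H=10, f_H=2.0, m_G=20, f_G=1.5, f_ave=1.0, M=100,
                        L=2, N=5, order=3, p_c=0.5, p_m=0.01, P_diff=1.0)
print(bound)   # about 12.125
```

With crossover and mutation switched off (`p_c = 0`, `p_m = 0`) the bound reduces to the selection term m(H, t) f(H, t) / f_ave.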
![Page 56: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/56.jpg)
Schema Theory for GP
Simplification: early in the GP run (high diversity) we have:
• $\Pr(D \mid g \notin G) = P_{\text{diff}} \approx 1$
• $m(G, t)\, f(G, t) / (M f_{\text{ave}}) \ll 1$
![Page 57: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/57.jpg)
Schema Theory for GP
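$$m(H, t+1) \ge \frac{m(H, t)\, f(H, t)}{f_{\text{ave}}}\, (1 - p_m\, o(H)) \times \left[ 1 - p_c \left\{ \left( 1 - \frac{m(G, t)\, f(G, t)}{M f_{\text{ave}}} \right) P_{\text{diff}} + \frac{L(H)}{N(H) - 1} \cdot \frac{m(G, t)\, f(G, t) - m(H, t)\, f(H, t)}{M f_{\text{ave}}} \right\} \right]$$
$$\approx \frac{m(H, t)\, f(H, t)}{f_{\text{ave}}}\, (1 - p_m\, o(H)) \times \left[ 1 - p_c \left\{ 1 - \frac{L(H)}{N(H) - 1} \cdot \frac{m(H, t)\, f(H, t)}{M f_{\text{ave}}} \right\} \right]$$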
![Page 58: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/58.jpg)
Schema Theory for GP
For short schema, $L(H) / [N(H) - 1] \ll 1$ and
$$\frac{m(H, t)\, f(H, t)}{f_{\text{ave}}}\, (1 - p_m\, o(H)) \times \left[ 1 - p_c \left\{ 1 - \frac{L(H)}{N(H) - 1} \cdot \frac{m(H, t)\, f(H, t)}{M f_{\text{ave}}} \right\} \right] \approx \frac{m(H, t)\, f(H, t)}{f_{\text{ave}}}\, (1 - p_m\, o(H))\, (1 - p_c)$$
![Page 59: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/59.jpg)
Schema Theory for GP
$$m(H, t+1) \ge \frac{m(H, t)\, f(H, t)}{f_{\text{ave}}}\, (1 - p_m\, o(H))\, (1 - p_c)$$
Nearly the same as the GA schema theorem:
$$m(H, t+1) \ge \frac{m(H, t)\, f(H, t)}{f_{\text{ave}}}\, (1 - p_m\, o(H)) \times \left( 1 - p_c\, \frac{L(H)}{N(H) - 1} \right)$$
![Page 60: Early work in intelligent systems](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814f2e550346895dbcbaad/html5/thumbnails/60.jpg)
GP references
• www.genetic-programming.org
• www.genetic-programming.com
• cswww.essex.ac.uk/staff/poli