IF THE SINGULARITY ARRIVES, WILL IT BE BY DESIGN OR EVOLUTION?
Bill Worzel [email protected]
Evolution Enterprises http://evolver.biz
Data Day Texas, 11 Jan 2014, Austin, TX
Monday, January 13, 14
NATURE HAS MANY ROOMS
• Animals solve the problem of survival in many ways
• Most are adapted to specific ecological niches
• Genetics forms the common language of living creatures
EVOLUTIONARY ALGORITHMS (EA) BORROW FROM NATURE
• Based on natural selection and population dynamics
• Create a population of solutions
• Preferentially select and recombine better individuals to find better solutions
AN ELEGANT SEARCH
• EAs combine global search with local search
• Randomly generated individuals test many niches
• Selection and recombination home in on the best neighborhoods
GENETIC ALGORITHMS (GA)
• GAs encode information and then combine and mutate individuals
• In the simplest case, the encoding is a bit string mapped to variable values
• The initial population of individuals is created randomly
[Figure: a bit string, 101001011010, whose bits encode variables such as P/E and Trend; a population of such strings]
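A minimal sketch of the encoding step, assuming (hypothetically) that the 12-bit string splits into two 6-bit fields mapped linearly onto a P/E range of 0 to 63 and a trend range of -1 to 1. The field layout and the helper names `decode` and `random_population` are illustrative choices, not details from the talk.

```python
import random

# Hypothetical layout: (number_of_bits, low, high) for each encoded variable.
FIELDS = [
    (6, 0.0, 63.0),   # P/E ratio
    (6, -1.0, 1.0),   # trend
]


def decode(bits: str, fields=FIELDS) -> list:
    """Map consecutive bit fields linearly onto their real-valued ranges."""
    if len(bits) != sum(n_bits for n_bits, _, _ in fields):
        raise ValueError("bit string length does not match the field layout")
    values, pos = [], 0
    for n_bits, low, high in fields:
        integer = int(bits[pos:pos + n_bits], 2)   # field read as an unsigned integer
        pos += n_bits
        max_int = (1 << n_bits) - 1
        values.append(low + (high - low) * integer / max_int)
    return values


def random_population(size: int, n_bits: int, rng: random.Random) -> list:
    """Create the initial population as uniformly random bit strings."""
    return ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(size)]


population = random_population(20, 12, random.Random(42))
print(decode("101001011010"))   # P/E = 41.0, trend = -11/63 (about -0.175)
```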
SELECTION & FITNESS
• A subset of individuals is selected at random from the population
• The fitness of each is calculated
• The best pair is combined to produce offspring
[Figure: a selected group with fitness 32, 16, 18 and 90; the two best (90 and 32) are mated]
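The selection step can be sketched as follows, assuming higher fitness is better. The group size and the name `select_best_pair` are hypothetical; the demo reuses the four fitness values from the figure.

```python
import random
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_best_pair(population: Sequence[T], fitness: Callable[[T], float],
                     group_size: int, rng: random.Random) -> Tuple[T, T]:
    """Draw a random subset (the mating group) and return its two fittest members."""
    group = rng.sample(list(population), group_size)   # random subset, no repeats
    ranked = sorted(group, key=fitness, reverse=True)  # fitness computed for each member
    return ranked[0], ranked[1]


# A group of four individuals with fitness 32, 16, 18 and 90.
group_fitness = {"A": 32, "B": 16, "C": 18, "D": 90}
best, second = select_best_pair(list(group_fitness), group_fitness.__getitem__, 4,
                                random.Random(0))
print(best, second)   # D A  (fitness 90 and 32)
```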
CROSSOVER & MUTATION
• Crossover combines bit strings
• Mutation changes bits
• Both operations are stochastic
• Offspring replace parents or weaker individuals in population
[Figure: parents 101001011010 and 011101101110 are crossed at a crossover point (after bit 7), giving offspring 011101111010 and 101011001110; the second offspring also has one mutated bit (bit 5)]
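A sketch of one-point crossover and bit-flip mutation on bit strings. The crossover point (after bit 7) and the flipped bit (bit 5, index 4) are inferred from the strings in the figure; the function names and the per-bit mutation rate are illustrative assumptions.

```python
import random


def one_point_crossover(a: str, b: str, point: int) -> tuple:
    """Swap the tails of two equal-length bit strings after `point` bits."""
    return a[:point] + b[point:], b[:point] + a[point:]


def random_crossover(a: str, b: str, rng: random.Random) -> tuple:
    """Stochastic version: choose the crossover point at random."""
    return one_point_crossover(a, b, rng.randrange(1, len(a)))


def flip_bit(bits: str, index: int) -> str:
    """Flip a single bit (used here to reproduce the figure's mutation)."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Independently flip each bit with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b for b in bits)


child1, child2 = one_point_crossover("101001011010", "011101101110", 7)
print(child1, child2)        # 101001001110 011101111010
print(flip_bit(child1, 4))   # 101011001110
```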
BUILDING BLOCKS AND SCHEMAS
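• The building block hypothesis states that GAs find good, simple components (building blocks) that confer higher fitness on individuals
• The Schema Theorem shows that above-average building blocks (schemata) tend to accrue in the population to produce the best individuals: $E[m(H,t+1)] \ge \frac{m(H,t)\,f(H)}{a_t}\,[1-p]$, where $m(H,t)$ is the number of individuals matching schema $H$ at generation $t$, $f(H)$ is their average fitness, $a_t$ is the average fitness of the population, and $p$ is the probability that crossover or mutation disrupts the schema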
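To make the bound concrete, here is a small sketch. It estimates the disruption probability with the commonly quoted approximation for one-point crossover plus bit-flip mutation, p ≈ p_c·δ(H)/(l−1) + o(H)·p_m, where l is the string length, o(H) the schema order and δ(H) its defining length; the rates 0.7 and 0.01 and the example schemata are hypothetical.

```python
def matches(bits: str, schema: str) -> bool:
    """True if `bits` is an instance of `schema` ('*' means "don't care")."""
    return len(bits) == len(schema) and all(s in ("*", b) for b, s in zip(bits, schema))


def order(schema: str) -> int:
    """o(H): the number of fixed (non-'*') positions."""
    return sum(s != "*" for s in schema)


def defining_length(schema: str) -> int:
    """delta(H): distance between the first and last fixed positions."""
    fixed = [i for i, s in enumerate(schema) if s != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def disruption_probability(schema: str, p_crossover: float, p_mutation: float) -> float:
    """Approximate p for one-point crossover plus bit-flip mutation."""
    return (p_crossover * defining_length(schema) / (len(schema) - 1)
            + order(schema) * p_mutation)


def schema_bound(m: float, f_h: float, a_t: float, p: float) -> float:
    """Lower bound on E[m(H, t+1)] given m(H, t), f(H), a_t and p."""
    return m * f_h / a_t * (1.0 - p)


# A short, low-order schema whose 10 members are 20% fitter than average...
short = schema_bound(10, 1.2, 1.0, disruption_probability("10****", 0.7, 0.01))
# ...versus an equally fit but long schema that crossover disrupts easily.
long_ = schema_bound(10, 1.2, 1.0, disruption_probability("1**0*1", 0.7, 0.01))
print(round(short, 2), round(long_, 2))   # 10.08 3.24
```

Only the short schema is guaranteed (in expectation) to grow, which is the intuition behind building blocks being short and low-order.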
CASE STUDY: AGRICULTURAL MODELING
• Decision support software for farmers: with a large number of new hybrids available, which should they choose?
• Needed to integrate agronomic, weather, economic and personal factors
• GA used not as an optimizer but as an “optionizer” in a multi-objective space
RICH LITERATURE FOR GA
• Many conferences, particularly GECCO (Genetic and Evolutionary Computation Conference)
• D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989
• J. Holland, Hidden Order: How Adaptation Builds Complexity, Helix Books, 1995
• J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975 (MIT Press edition, 1992)
GENETIC PROGRAMMING (GP)
• GP evolves computer programs (usually functions)
• Essentially a program that produces programs as its output
• Extends the idea of recombining bit strings to recombining parse trees
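A minimal sketch of a program as a parse tree, assuming a representation of nested tuples `(op, left, right)` whose leaves are variable names or constants. The function set and the name `evaluate` are hypothetical.

```python
import operator

# Hypothetical function set: binary arithmetic operators keyed by symbol.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env: dict) -> float:
    """Evaluate a parse tree: ("op", left, right), a variable name, or a constant."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]   # variable lookup
    return tree            # numeric constant


program = ("+", ("*", "x", "x"), 1)   # the tree for x*x + 1
print(evaluate(program, {"x": 3}))    # 10
```

Because a program is just data, crossover and mutation can cut and splice these tuples the same way a GA splices bit strings.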
GP OVERVIEW
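[Figure: the GP cycle]
Input data and GP parameters → initial program population
GP cycle: select mating group → select two best programs → crossover and mutate → replace least fit with offspring → terminate?
No: repeat the GP cycle. Yes: output results.
(The figure marks several of these steps as stochastic processes.)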
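One possible reading of the cycle as code, assuming a steady-state scheme in which the offspring replace the two least-fit members of the mating group (the figure does not say whether “least fit” refers to the group or the whole population). To keep the sketch short, the demo uses bit lists with the OneMax fitness (count of 1 bits) as a stand-in representation; in GP the crossover and mutation operators would act on program trees instead. All names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

P = TypeVar("P")


def steady_state_cycle(population: List[P],
                       fitness: Callable[[P], float],
                       crossover: Callable[[P, P, random.Random], Tuple[P, P]],
                       mutate: Callable[[P, random.Random], P],
                       group_size: int,
                       max_steps: int,
                       rng: random.Random,
                       target: Optional[float] = None) -> P:
    """Run the select / crossover-and-mutate / replace cycle until termination."""
    if group_size < 4:
        raise ValueError("need two parents plus two distinct individuals to replace")
    for _ in range(max_steps):
        # Select a mating group at random and rank it by fitness (best first).
        group = rng.sample(range(len(population)), group_size)
        group.sort(key=lambda i: fitness(population[i]), reverse=True)
        # The two best programs in the group become parents.
        child_a, child_b = crossover(population[group[0]], population[group[1]], rng)
        # Mutated offspring replace the two least-fit members of the group.
        population[group[-1]] = mutate(child_a, rng)
        population[group[-2]] = mutate(child_b, rng)
        # Terminate early once a good-enough individual appears.
        if target is not None and max(map(fitness, population)) >= target:
            break
    return max(population, key=fitness)


def onemax(bits):                  # stand-in fitness: number of 1 bits
    return sum(bits)


def one_point(a, b, rng):          # stand-in crossover on bit lists
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def flip(bits, rng, rate=0.05):    # stand-in mutation
    return [1 - bit if rng.random() < rate else bit for bit in bits]


rng = random.Random(7)
pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
start = max(map(onemax, pop))
best = steady_state_cycle(pop, onemax, one_point, flip, 4, 5000, rng, target=20)
print(start, onemax(best))
```

Replacing only the bottom two of a group of at least four means the best individual found so far is never lost.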
CONSTRUCTING TREES
• Randomly assemble a population of function trees, constrained by the GP parameters
From: ‘A Field Guide To Genetic Programming’
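A sketch of random tree construction in the style of the “full”, “grow” and ramped half-and-half methods described in A Field Guide to Genetic Programming. The function and terminal sets are hypothetical, and this is an illustration rather than that book's code; maximum depth stands in for the GP parameters that constrain the trees.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # hypothetical function set: symbol -> arity
TERMINALS = ["x", "y", 1, 2]           # hypothetical terminal set


def random_tree(max_depth: int, method: str, rng: random.Random):
    """Build a tree with 'full' or 'grow' (a terminal has depth 0)."""
    n_terms, n_funcs = len(TERMINALS), len(FUNCTIONS)
    stop_early = method == "grow" and rng.random() < n_terms / (n_terms + n_funcs)
    if max_depth == 0 or stop_early:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op,) + tuple(random_tree(max_depth - 1, method, rng)
                         for _ in range(FUNCTIONS[op]))


def depth(tree) -> int:
    """Depth of a tree: 0 for a terminal, 1 + deepest child otherwise."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def ramped_half_and_half(size: int, max_depth: int, rng: random.Random) -> list:
    """Cycle depths 1..max_depth, switching between 'full' and 'grow' each ramp."""
    population = []
    for i in range(size):
        d = 1 + i % max_depth
        method = "full" if (i // max_depth) % 2 == 0 else "grow"
        population.append(random_tree(d, method, rng))
    return population


rng = random.Random(3)
population = ramped_half_and_half(20, 4, rng)
print(population[1])   # a depth-2 tree built with 'full'
```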
CROSSOVER (RECOMBINATION)
From: ‘A Field Guide To Genetic Programming’
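A sketch of subtree crossover on tuple-based trees: pick a random node in each parent and graft the second parent's subtree into the first. For simplicity it chooses crossover points uniformly over all nodes; practical systems often bias the choice toward function nodes. The helper names are hypothetical.

```python
import random


def subtree_paths(tree, path=()):
    """Yield the path (a tuple of child indices) to every node, root first."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    """Follow `path` down from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def subtree_crossover(parent_a, parent_b, rng: random.Random):
    """Replace a random subtree of parent_a with a random subtree of parent_b."""
    point_a = rng.choice(list(subtree_paths(parent_a)))
    point_b = rng.choice(list(subtree_paths(parent_b)))
    return replace_subtree(parent_a, point_a, get_subtree(parent_b, point_b))


a = ("+", ("*", "x", "x"), 1)    # x*x + 1
b = ("-", "y", ("+", "x", 2))    # y - (x + 2)
print(replace_subtree(a, (2,), get_subtree(b, (2,))))  # ('+', ('*', 'x', 'x'), ('+', 'x', 2))
child = subtree_crossover(a, b, random.Random(5))
```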
MUTATION
From: ‘A Field Guide To Genetic Programming’
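A sketch of subtree mutation: choose a node at random and replace it with a freshly grown random subtree. The function and terminal sets, the replacement depth of 2 and the helper names are hypothetical.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # hypothetical function set: symbol -> arity
TERMINALS = ["x", 1, 2]               # hypothetical terminal set


def grow(max_depth: int, rng: random.Random):
    """Random 'grow' tree used as the replacement subtree."""
    if max_depth == 0 or rng.random() < len(TERMINALS) / (len(TERMINALS) + len(FUNCTIONS)):
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op,) + tuple(grow(max_depth - 1, rng) for _ in range(FUNCTIONS[op]))


def node_paths(tree, path=()):
    """Paths (tuples of child indices) to every node, root first."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from node_paths(child, path + (i,))


def replace_at(tree, path, new):
    """Copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new),) + tree[i + 1:]


def subtree_mutation(tree, rng: random.Random, max_depth: int = 2):
    """Pick a node uniformly at random and replace it with a new random subtree."""
    target = rng.choice(list(node_paths(tree)))
    return replace_at(tree, target, grow(max_depth, rng))


parent = ("+", ("*", "x", "x"), 1)
print(subtree_mutation(parent, random.Random(0)))
```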
THE DEVIL IN THE DETAILS
• How do you correct syntax errors?
• How do you maintain type coherence?
• How do you control overfitting?
• GP is computationally intensive
BUT HEAVEN’S ON OUR SIDE
• Naturally parallel algorithm: linear speedup, because most of the work is not sequentially dependent
• Sub-populations may be run asynchronously in parallel, giving each processor about m*n/p individuals, where m is the number of individuals in a sub-population, n is the number of sub-populations, and p is the number of processors
• Matches up well with cloud computing
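A small sketch of the m*n/p arithmetic, reading it (as an interpretation) as the number of individuals each processor evaluates per generation when whole sub-populations are assigned to processors round-robin. `assign_islands` and `processor_loads` are hypothetical helpers. When n is not a multiple of p the load is uneven, and the busiest processor handles m·⌈n/p⌉ individuals.

```python
def assign_islands(n_islands: int, n_processors: int) -> list:
    """Round-robin assignment of sub-population (island) indices to processors."""
    return [list(range(p, n_islands, n_processors)) for p in range(n_processors)]


def processor_loads(m: int, n: int, p: int) -> list:
    """Individuals each processor evaluates when every island holds m individuals."""
    return [m * len(islands) for islands in assign_islands(n, p)]


print(processor_loads(m=500, n=8, p=4))    # [1000, 1000, 1000, 1000], i.e. m*n/p each
print(processor_loads(m=500, n=10, p=4))   # [1500, 1500, 1000, 1000]
```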
THE SKGP
• Uses purely functional combinators to represent programs
• Efficient, powerful, reusable code
• Parallel speedup becomes superlinear because of code reuse
COMBINATORS
• Applicative algebra, closely related to the lambda calculus; application associates to the left, so Sxyz means ((Sx)y)z
• Sxyz = xz(yz)
• Kxy = x
• Ix = x
• Bxyz = x(yz)
• Cxyz = xzy
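The reduction rules above can be checked directly by writing each combinator as a curried Python function. This is only an illustration: Python evaluates arguments eagerly, which is fine for these identities but not for recursive definitions that rely on lazy evaluation. `succ` is a hypothetical helper.

```python
# Each combinator is a curried function mirroring its rewrite rule.
S = lambda x: lambda y: lambda z: x(z)(y(z))   # Sxyz = xz(yz)
K = lambda x: lambda y: x                      # Kxy  = x
I = lambda x: x                                # Ix   = x
B = lambda x: lambda y: lambda z: x(y(z))      # Bxyz = x(yz)
C = lambda x: lambda y: lambda z: x(z)(y)      # Cxyz = xzy

succ = lambda n: n + 1    # hypothetical helper for the checks below

print(S(K)(K)(42))        # SKK behaves like I: 42
print(B(succ)(succ)(0))   # B f g x = f(g x): 2
print(C(K)(1)(2))         # C K x y = K y x = y: 2
```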
VARIABLE ABSTRACTION
• D.A. Turner showed that all bound variables could be removed completely using combinators (Turner 1979, A New Implementation Technique for Applicative Languages, Software–Practice and Experience, vol 9, 31-49)
• Essentially, this provides a way to create expressions that are purely combinators applied to data, with no reference to variables
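A minimal sketch of variable removal (bracket abstraction), assuming terms are nested pairs in which an application `f a` is the tuple `(f, a)`. It uses only three basic rules; Turner's 1979 scheme adds optimizations (for example using B and C) that produce shorter expressions. Applied to the body `+ 1 x`, these basic rules yield the same expression used in the adding-1 example.

```python
def abstract(var, term):
    """Remove `var` from `term` with the basic rules:
    [x] x = I,   [x] c = K c (c any other atom),   [x] (E1 E2) = S ([x]E1) ([x]E2)."""
    if term == var:
        return "I"
    if not isinstance(term, tuple):
        return ("K", term)
    f, a = term
    return (("S", abstract(var, f)), abstract(var, a))


def show(term):
    """Render a term with left-associative application, parenthesizing compound arguments."""
    args = []
    while isinstance(term, tuple):
        term, arg = term
        args.append(arg)
    parts = [str(term)]
    for arg in reversed(args):
        parts.append("(" + show(arg) + ")" if isinstance(arg, tuple) else str(arg))
    return " ".join(parts)


plus_one = (("+", 1), "x")              # the body "+ 1 x"
print(show(abstract("x", plus_one)))    # S (S (K +) (K 1)) I
```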
EXAMPLE COMBINATOR FUNCTION
Example: ‘S(S(K +)(K 1))I’ is the function that adds 1, so S(S(K +)(K 1))I applied to 3 reduces as follows:
S(S(K +)(K 1))I 3
S(K +)(K 1)3(I 3)
K + 3((K 1)3)(I 3)
+ ((K 1)3)(I 3)
+ 1 (I 3)
+ 1 3
4
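The reduction above can be reproduced mechanically. This sketch assumes the nested-pair term representation (`(f, a)` for an application) and a strict `+` primitive that fires only once both arguments are numbers; it performs one leftmost-outermost reduction per step and prints each term with spaces between applications.

```python
import operator

PRIMITIVES = {"+": (2, operator.add)}   # strict primitive: symbol -> (arity, function)


def unwind(term):
    """Split a left-nested application into its head and argument list."""
    args = []
    while isinstance(term, tuple):
        term, arg = term
        args.append(arg)
    return term, args[::-1]


def rebuild(head, args):
    for arg in args:
        head = (head, arg)
    return head


def step(term):
    """Perform one leftmost-outermost reduction; return None at normal form."""
    head, args = unwind(term)
    if head == "I" and len(args) >= 1:
        return rebuild(args[0], args[1:])
    if head == "K" and len(args) >= 2:
        return rebuild(args[0], args[2:])
    if head == "S" and len(args) >= 3:
        x, y, z = args[:3]
        return rebuild(((x, z), (y, z)), args[3:])
    if head in PRIMITIVES:
        arity, fn = PRIMITIVES[head]
        if len(args) >= arity and all(isinstance(a, int) for a in args[:arity]):
            return rebuild(fn(*args[:arity]), args[arity:])
    # No redex at the head: reduce the arguments from left to right.
    for i, arg in enumerate(args):
        reduced = step(arg)
        if reduced is not None:
            return rebuild(head, args[:i] + [reduced] + args[i + 1:])
    return None


def show(term):
    head, args = unwind(term)
    return " ".join([str(head)] + ["(" + show(a) + ")" if isinstance(a, tuple) else str(a)
                                   for a in args])


def normalize(term):
    """Return the printed trace of every step down to normal form."""
    trace = [show(term)]
    while (term := step(term)) is not None:
        trace.append(show(term))
    return trace


add_one_to_3 = ((("S", (("S", ("K", "+")), ("K", 1))), "I"), 3)   # S(S(K +)(K 1))I 3
print("\n".join(normalize(add_one_to_3)))
```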
COMBINATOR FUNCTIONS QUICKLY BECOME COMPLEX
Here is the function for factorial:
def fac = S(S(S(K cond)(S(S(K =)(K 0))I))(K 1))(S(S(K *)I)(S(K fac)(S(S(K -)I)(K 1))))
Evaluation is left as an “exercise for the reader.”
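Taking up the exercise: a sketch of a lazy evaluator that reduces a term until its head can no longer be rewritten (weak head normal form). It assumes `cond` forces only its test before choosing a branch, treats `=`, `*` and `-` as strict primitives, and expands `fac` as a named definition on demand, which is what makes the self-reference work. The definition string is the slide's factorial with balanced parentheses. The evaluator rewrites trees rather than shared graphs, so repeated arguments are re-evaluated; a graph reducer would share them.

```python
import operator
import re

# Strict primitives: symbol -> (arity, implementation); their arguments are forced.
STRICT = {"=": (2, operator.eq), "*": (2, operator.mul), "-": (2, operator.sub)}


def parse(text):
    """Parse combinator notation: juxtaposition is application, associating left."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def sequence():
        nonlocal pos
        term = None
        while pos < len(tokens) and tokens[pos] != ")":
            token = tokens[pos]
            pos += 1
            if token == "(":
                item = sequence()
                if pos == len(tokens):
                    raise SyntaxError("missing ')'")
                pos += 1                                   # consume ')'
            else:
                item = int(token) if token.isdigit() else token
            term = item if term is None else (term, item)  # left-associative
        if term is None:
            raise SyntaxError("empty expression")
        return term

    term = sequence()
    if pos != len(tokens):
        raise SyntaxError("unbalanced ')'")
    return term


def unwind(term):
    """Split a left-nested application into (head, [arg1, arg2, ...])."""
    args = []
    while isinstance(term, tuple):
        term, arg = term
        args.append(arg)
    return term, args[::-1]


def rebuild(head, args):
    for arg in args:
        head = (head, arg)
    return head


DEFINITIONS = {
    "fac": parse("S(S(S(K cond)(S(S(K =)(K 0))I))(K 1))"
                 "(S(S(K *)I)(S(K fac)(S(S(K -)I)(K 1))))"),
}


def whnf(term):
    """Reduce lazily until the head can no longer be rewritten."""
    while True:
        head, args = unwind(term)
        if isinstance(head, str) and head in DEFINITIONS:
            term = rebuild(DEFINITIONS[head], args)         # expand a named definition
        elif head == "I" and len(args) >= 1:
            term = rebuild(args[0], args[1:])
        elif head == "K" and len(args) >= 2:
            term = rebuild(args[0], args[2:])
        elif head == "S" and len(args) >= 3:
            x, y, z = args[:3]
            term = rebuild(((x, z), (y, z)), args[3:])
        elif head == "cond" and len(args) >= 3:
            branch = args[1] if whnf(args[0]) else args[2]  # only the test is forced
            term = rebuild(branch, args[3:])
        elif isinstance(head, str) and head in STRICT and len(args) >= STRICT[head][0]:
            arity, fn = STRICT[head]
            values = [whnf(arg) for arg in args[:arity]]
            term = rebuild(fn(*values), args[arity:])
        else:
            return term


print(whnf(parse("fac 5")))   # 120
```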
THE SKGP
• Implements programs as graphs, combining combinators with GP to produce purely functional (combinator) expressions
• Combinators have the property of being ‘structure altering operators’
• There is evidence that GP can be limited in its search ability without such a capability
(Daida, unpublished; based on Daida 2004, Demonstrating Constraints to Diversity with a Tunably Difficult Problem for Genetic Programming)
CHURCH-ROSSER THEOREM
• The Church-Rosser Theorem says that evaluation of pure functional expressions is order independent: whatever order the reductions are performed in, any result (normal form) reached is the same
• Because of this, each functional piece can be stored once it has been evaluated and then reused, since the order of evaluation does not matter
• Because GP shares pieces across generations, this reuse gives a superlinear speedup: shared components do not have to be recomputed
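A minimal sketch of the reuse idea, assuming GP programs are pure expression trees evaluated over a fixed set of inputs. Because a pure subtree always yields the same values, its result can be cached by structure and reused by any program that contains it. This illustrates memoization in general; it is not the SKGP's actual implementation, and `make_evaluator` and its counters are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_evaluator(xs):
    """Return an evaluator that caches the value vector of every subtree it sees."""
    cache = {}
    stats = {"computed": 0, "reused": 0}

    def evaluate(tree):
        if tree in cache:                  # pure subtree: same structure, same values
            stats["reused"] += 1
            return cache[tree]
        stats["computed"] += 1
        if tree == "x":
            result = tuple(xs)
        elif isinstance(tree, tuple):
            op, left, right = tree
            result = tuple(OPS[op](a, b) for a, b in zip(evaluate(left), evaluate(right)))
        else:                              # numeric constant
            result = (tree,) * len(xs)
        cache[tree] = result
        return result

    return evaluate, stats


evaluate, stats = make_evaluator([0, 1, 2, 3])
shared = ("*", "x", "x")                   # a building block shared by two programs
print(evaluate(("+", shared, 1)))          # (1, 2, 5, 10)
print(evaluate(("-", shared, "x")))        # (0, 0, 2, 6)
print(stats)                               # {'computed': 5, 'reused': 3}
```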
CASE STUDY: MODELING THE MODEL
• Modeling chemical kinetics for NASA
• NASA had a set of first-principles models used to simulate the combustion of jet fuel and its exhaust gases: accurate but very slow
• Trained on output from the simulator, the SKGP produced a highly accurate function for predicting output gas amounts across a wide range of values
• The resulting function was 2370x faster than running the simulation
• The function was a highly accurate empirical solution of the models' PDEs
CASE STUDY: LISTENING TO DATA
• Collaboration with Dr. Richard Cote and USC to study bladder cancer
• Is there a molecular signature that matches the T-stage of tumors? No! The attempt produced complicated, poorly performing functions
• Examining the data showed that tumors with local metastasis were consistently misclassified
• Is there a signature in the tumor that indicates local metastasis? Yes! This produced a set of concise, highly accurate, biologically sensitive functions that could identify when a tumor had metastasized
SOME APPLICATIONS
• Inferential sensors (Dow Chemical)
• Financial modeling (Analytic Research Foundation, State Street Global Advisors)
• Antenna design (NASA)
• Analog circuit layout (Solido Design)
• Solid-state memory management (NVMdurance)
OPEN SOURCE SOLUTIONS
• Java: ECJ, a widely used Java implementation from a well-known GP researcher
• Python: DEAP, an “all-in-one package” written in Python
• Clojure: PushGP, a stack-based version of GP with many nice features, also developed by a respected GP researcher
PROPRIETARY
• Evolver by Evolution Enterprises: http://evolver.biz
• Data Modeler by Evolved Analytics: http://www.evolved-analytics.com/
GP REFERENCES
• J. Koza, Genetic Programming I-IV, MIT Press, Morgan Kaufmann and Kluwer, 1992-2003
• R. Poli, W.B. Langdon and N.F. McPhee, A Field Guide to Genetic Programming, 2008
• Various authors, Genetic Programming Theory and Practice I-XI, 2002-2013
• Mitra et al., The use of genetic programming in the analysis of quantitative gene expression profiles for identification of nodal status in bladder cancer, BMC Cancer, 6(159), 2006
POSSIBLE FUTURES
• Some immediate areas of application include Smart Grid and energy-efficient designs, intrusion detection, and discovery of protein-gene-SNP networks
• Since evolutionary algorithms produce a population of solutions, they give a multi-dimensional analysis that provides more information than a single solution
• EAs can continuously analyze data as it comes in, adapting to a changing environment while still providing high-performance solutions
• There is still a gap between functions and full programs, though functional methods narrow it and could lead to functional co-applications (an ecology of functions)
“THE BEST WAY TO PREDICT THE FUTURE IS TO INVENT IT.”
-ALAN KAY