Jerry Tsai, Jerry.Tsai@clintuition.com. This presentation is available at: clintuition.com/pubs
Optimal Model Search By a Genetic Algorithm Using SAS®
Jerry Tsai
Problem Statement
- n observations; p possible predictors
- $n \gg p \gg 0$
- $2^p$ possible subsets of the set of predictors
- The challenge: Choose a subset of the possible predictors that has the greatest predictive ability relative to its size
Problem Definition
- What do statisticians call this problem?
  - “Subset selection”
  - Finding the “best (predictive) model”
  - Finding a “parsimonious model”
- How do statisticians approach this problem?
  - Conduct a search through a space defined by the $2^p$ possible combinations of the p parameters to find a subset of those parameters that optimizes an objective function
Reasons to Search for an Optimal Model
1. To describe the relative importance of variables
2. To save money in data collection and management
3. To enhance predictive ability

But we should make very sure it is worth the effort:
- Inappropriate for estimation and hypothesis testing
- Time-consuming
Commonly-Known Search Heuristics
- Forward; backward; stepwise
  - Found in REG, LOGISTIC, PHREG, more
- LAR (least angle regression)
- LASSO (least absolute shrinkage and selection operator)
- Both LAR and LASSO are found in GLMSELECT
- All of these heuristics use an incremental approach when searching for an optimal model
Incremental Approach
- To a set, add or subtract one variable at a time
- Include or exclude a candidate variable if:
  - The variable meets entry and stopping criteria, OR
  - The set of variables with the candidate variable added better optimizes the objective function
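
To make the incremental idea concrete, here is a minimal sketch of greedy forward selection, written in Python rather than SAS purely for compactness. It is not the logic of any particular SAS procedure. The objective function, the `USEFUL` set, and the variable names from golf through juliett are made up for illustration, and a smaller score is assumed to be better.

```python
from typing import Callable, FrozenSet, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    objective: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float, int]:
    """Greedy forward selection that minimizes `objective`.

    Returns the chosen subset, its score, and the number of subsets scored.
    """
    chosen: FrozenSet[str] = frozenset()
    best = objective(chosen)
    evaluations = 1
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current set.
        trials = [(objective(chosen | {v}), v) for v in remaining]
        evaluations += len(trials)
        score, winner = min(trials)
        if score >= best:  # no single addition improves the objective: stop
            break
        chosen, best = chosen | {winner}, score
        remaining.remove(winner)
    return chosen, best, evaluations


# Hypothetical objective (illustration only): a penalty for each missing
# "useful" variable plus a smaller penalty for each variable included.
USEFUL = {"bravo", "charlie", "kilo"}


def toy_objective(subset: FrozenSet[str]) -> float:
    return 10.0 * len(USEFUL - subset) + 1.0 * len(subset)


names = ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot",
         "golf", "hotel", "india", "juliett", "kilo", "lima"]
subset, score, n_scored = forward_select(names, toy_objective)
print(sorted(subset), score, n_scored)
```

The search only ever looks one variable ahead, so it scores a few dozen subsets here instead of all 4096. That is fast, but it also means a variable that helps only in combination with another may never be tried.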
Holistic Approach
- Assess a set of variables as a whole
  - Sets of variables are compared to one another
  - Each element (variable) of the set is treated equally
- Disadvantage: less “helpful” elements of the set are treated the same as more “helpful” elements of the set
- Advantage: May uncover synergism or confounding among variables
Advantage of a Non-incremental Approach
- The absolute optimum may be undiscoverable through an incremental approach, due to:
  - Confounding
  - Endogeneity
  - Nonlinearity (with respect to a link function)
- Space searched could be much greater
  - Forward selection: $O(p^2)$
  - $\lim_{p \to \infty} \frac{O(p^2)}{2^p} = 0$, and this expression quickly converges; that is, forward selection examines a vanishing fraction of the $2^p$ possible subsets
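
A quick numeric check of how fast that ratio shrinks, sketched in Python. The count used for forward selection is the worst case in which every round is completed (p candidates in the first round, p - 1 in the second, and so on), which is an assumption about how the $O(p^2)$ figure arises.

```python
def forward_selection_models(p: int) -> int:
    """Worst-case number of candidate models forward selection scores."""
    # p candidates in round 1, p - 1 in round 2, ..., 1 in round p.
    return p * (p + 1) // 2


for p in (5, 10, 20, 40):
    searched = forward_selection_models(p)
    total = 2 ** p
    print(f"p={p:2d}  forward={searched:4d}  all subsets={total:>17,d}  "
          f"fraction={searched / total:.2e}")
```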
Advantage of Using Regression
Statisticians are very familiar with generalized linear models (GLMs)
Parameter estimates are amenable to comprehensible interpretation
Genetic Algorithm Implementation
- Create a generation of sets of variables (a set of sets)
- Score all sets in a generation
- Sets that score higher are selected for reproduction
- These selected sets are recombined and mutated to yield additional sets.
- These additional sets will constitute a new generation that will in turn undergo scoring, selection, and recombination.
Why Use a Genetic Algorithm?
Examples from nature suggest local optima are eventually found
A holistic approach allows variables to be assessed simultaneously
The search covers a much larger area than traditional incremental approaches
Implementation
The presence (or absence) of each variable in a set is represented by a bit
A string of bits together represents a chromosome
So each chromosome represents a subset of the possible predictors
Implementation Illustration
- 12 possible parameters: alfa, bravo, charlie, delta … kilo, lima
- Representation example: The variables bravo, charlie, and kilo constitute a subset (i.e., constitute a model)
  - Chromosome: 0110 0000 0010
  - Positions: abcd efgh ijkl
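
The encoding can be sketched in a few lines of Python (used here instead of SAS only for brevity). The slides elide the names between delta and kilo, so echo through juliett are filled in from the NATO alphabet as an assumption, guided by the "abcd efgh ijkl" legend. `encode` and `decode` are hypothetical helper names.

```python
# Names from golf to juliett are assumed; the slides show only "delta…kilo".
NAMES = ["alfa", "bravo", "charlie", "delta", "echo", "foxtrot",
         "golf", "hotel", "india", "juliett", "kilo", "lima"]


def encode(subset, names=NAMES):
    """Return the bit string whose i-th bit is 1 when names[i] is in subset."""
    return "".join("1" if name in subset else "0" for name in names)


def decode(bits, names=NAMES):
    """Return the variables whose bit is set, in their original order."""
    return [name for name, bit in zip(names, bits) if bit == "1"]


chromosome = encode({"bravo", "charlie", "kilo"})
print(chromosome)          # 011000000010
print(decode(chromosome))  # ['bravo', 'charlie', 'kilo']
```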
Genetic Operation – Mutation
Logically negate bits within a chromosome (point mutation): 0 becomes 1; 1 becomes 0
Implementation Illustration
- Assume 12 possible parameters: alfa, bravo, charlie, delta … kilo, lima
- Example: bravo, charlie, and kilo are in the model, all other variables are not
  - Chromosome: 0110 0000 0010
  - Positions: abcd efgh ijkl
Mutation Example
- Before mutation: 0110 0000 0010
- Randomly selected for mutation: bravo, echo, lima
- After mutation: 0010 1000 0011
Genetic Operation – Mutation
- Logically negate random bits within a chromosome (point mutation)
  - 0 becomes 1; 1 becomes 0
- Example: {bravo; charlie; kilo}; MUTATE(bravo; echo; lima)
  - Result: 0010 1000 0011
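
A minimal Python sketch of point mutation that reproduces the example above. `flip` negates chosen positions, and `mutate` picks the positions at random. Flipping each bit independently at a fixed `rate` is one common scheme and an assumption here, since the slides do not say how the random bits are chosen.

```python
import random


def flip(bits: str, positions) -> str:
    """Logically negate the bits at the given 0-based positions."""
    chosen = set(positions)
    return "".join(
        ("1" if b == "0" else "0") if i in chosen else b
        for i, b in enumerate(bits)
    )


def mutate(bits: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate` (assumed scheme)."""
    return flip(bits, [i for i in range(len(bits)) if rng.random() < rate])


parent = "011000000010"              # {bravo; charlie; kilo}
print(flip(parent, [1, 4, 11]))      # bravo, echo, lima -> 001010000011
print(mutate(parent, 0.1, random.Random(2009)))  # depends on the seed
```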
Genetic Operation – Crossover
Two chromosomes exchange genetic information (Morgan 1916)
Crossover Example
- Parents:
  - 0110 0000 0010 {bravo; charlie; kilo}
  - 0100 1000 0001 {bravo; echo; lima}
- Offspring:
  - 0110 0000 0001 {bravo; charlie; lima}
  - 0100 1000 0010 {bravo; echo; kilo}
Genetic Operation – Crossover
- Two chromosomes exchange genetic information (Morgan 1916)
- Example: CROSSOVER[{bravo; charlie; kilo}; {bravo; echo; lima}; @ foxtrot]
  - Before: 0110 0000 0010 and 0100 1000 0001
  - After: 0110 0000 0001 and 0100 1000 0010
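
Single-point crossover can be sketched in Python as below. Reading "@ foxtrot" as a cut just after the sixth bit is an assumption, though it does reproduce the offspring shown. The function name and signature are illustrative.

```python
def crossover(a: str, b: str, cut: int):
    """Single-point crossover: swap everything after the first `cut` bits."""
    if len(a) != len(b):
        raise ValueError("chromosomes must have the same length")
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


p1 = "011000000010"   # {bravo; charlie; kilo}
p2 = "010010000001"   # {bravo; echo; lima}
c1, c2 = crossover(p1, p2, cut=6)   # assumed: cut just after foxtrot (bit 6)
print(c1)  # 011000000001 -> {bravo; charlie; lima}
print(c2)  # 010010000010 -> {bravo; echo; kilo}
```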
Genetic Algorithm - Main Steps
- Initialize
  - Set up environment
  - Create starting generation
- Evaluate (i.e., score)
  - Chromosomes (i.e., individuals)
  - Generation
- Report, interim
- Select (i.e., choose which individuals reproduce)
- Reproduce (i.e., create new generation)
  - Apply genetic operators
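
The main steps can be strung together into a compact loop. The following Python sketch is not the author's SAS macro. It uses a toy objective (distance to a made-up target chromosome, smaller is better) and plain fitness-proportional selection through `random.choices`, and every parameter name and default value is an assumption chosen for illustration.

```python
import random


def run_ga(objective, n_bits, pop_size=20, crossover_prob=0.8,
           mutation_rate=0.02, patience=10, seed=0):
    """Minimize `objective` over bit strings of length n_bits."""
    rng = random.Random(seed)
    # Initialize: create the starting generation at random.
    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(pop_size)]
    scores = {}                    # saved objective scores, keyed by chromosome
    best, stale, generation = None, 0, 0
    while True:
        # Evaluate: score each chromosome, reusing saved scores.
        for c in population:
            if c not in scores:
                scores[c] = objective(c)
        gen_best = min(population, key=scores.get)
        # Report, interim (here: just track the best chromosome so far).
        if best is None or scores[gen_best] < scores[best]:
            best, stale = gen_best, 0
        else:
            stale += 1
        # Escape? Stop after `patience` generations without improvement.
        if stale >= patience:
            return best, scores[best], generation
        # Select: lower objective -> higher fitness -> more likely a parent.
        worst = max(scores[c] for c in population)
        fitness = [worst - scores[c] + 1e-9 for c in population]
        parents = rng.choices(population, weights=fitness, k=pop_size)
        # Reproduce: apply crossover and mutation to pairs of parents.
        population = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if rng.random() < crossover_prob:
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                population.append("".join(
                    ("1" if bit == "0" else "0")
                    if rng.random() < mutation_rate else bit
                    for bit in child))
        generation += 1


TARGET = "011000000010"   # hypothetical "true" model, for illustration only


def toy_objective(bits):
    # Hamming distance to TARGET; 0 means the target subset was found.
    return sum(x != y for x, y in zip(bits, TARGET))


best, score, generations = run_ga(toy_objective, n_bits=12)
print(best, score, generations)
```

In a real search the objective would fit a regression model for each chromosome, which is why the saved-score dictionary matters: fitting is the expensive step.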
Flow Chart
[Figure: flow chart of the algorithm. Initialize → Evaluate → Report, Interim → Escape? If No: Select → Reproduce → back to Evaluate. If Yes: Report, Final.]
Initialize
- Clear environment
- Initialize parameters
- Create &&VAR&I macro variables from the list of possible parameters
- Evaluate and store minimum (aka null) model
- Evaluate and store maximum (aka full) model
- Initialize parents (create starting generation)
Evaluate
- Individual (chromosomes)
  - If a chromosome has a score saved, assign that score to the chromosome
  - Otherwise, evaluate the chromosome on its fitness for reproduction
  - Save scores for newly-evaluated chromosomes
- Generation (of chromosomes)
  - Evaluate and store historical information on the characteristics of the generation, e.g., the mean score.
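
The bookkeeping described here amounts to a score cache plus a per-generation history. A minimal Python sketch follows, assuming a smaller objective score is better and using a stand-in objective (the number of variables in the model). `saved_scores`, `history`, and `evaluate_generation` are hypothetical names.

```python
from statistics import mean

saved_scores = {}      # chromosome -> objective score
history = []           # one summary dict per generation


def evaluate_generation(population, objective):
    """Score a generation, calling `objective` only for unseen chromosomes."""
    newly_scored = 0
    for chromosome in population:
        if chromosome not in saved_scores:
            saved_scores[chromosome] = objective(chromosome)
            newly_scored += 1
    scores = [saved_scores[c] for c in population]
    history.append({"mean": mean(scores), "best": min(scores),
                    "newly_scored": newly_scored})
    return scores


def count_ones(bits):
    # Stand-in objective (an assumption): number of variables in the model.
    return bits.count("1")


evaluate_generation(["0110", "0001", "0110"], count_ones)
evaluate_generation(["0110", "1111"], count_ones)
print(history)
```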
Scores
Evaluate each chromosome by computing the value of these functions:
- Objective function = the function to be optimized
  - Reward greater predictive ability while penalizing any increase in the number of parameters
  - e.g., Akaike’s Information Criterion (AIC)
- Fitness function
  - A function based on the objective function that determines the probability of a chromosome being selected for reproduction.
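
As a sketch of how the two functions relate, the Python below computes AIC from a log-likelihood and then maps a generation's AIC values to positive fitness values. The AIC formula is standard. The mapping in `fitness_from_objective` is only one possible choice and is an assumption, and the log-likelihoods are made-up numbers.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 ln(L); smaller is better."""
    return 2 * n_params - 2 * log_likelihood


def fitness_from_objective(objective_scores, floor=0.01):
    """Map objective scores (smaller = better) to positive fitness values.

    Assumed scheme: scaled distance below the worst score in the generation,
    plus a small floor so the worst chromosome can still be selected.
    """
    worst = max(objective_scores)
    span = (worst - min(objective_scores)) or 1.0
    return [(worst - s) / span + floor for s in objective_scores]


# Made-up (log-likelihood, parameter count) pairs, for illustration only.
models = {"011000000010": (-120.0, 3),
          "011000000011": (-119.5, 4),
          "000000000000": (-150.0, 0)}
aics = {bits: aic(ll, k) for bits, (ll, k) in models.items()}
fit = dict(zip(aics, fitness_from_objective(list(aics.values()))))
print(aics)
print(fit)
```

Here the four-variable model has a slightly higher likelihood than the three-variable one, but its AIC is worse because the extra parameter costs more than it gains.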
SAS® Code Evaluation Illustration
    proc anly-proc data = input-data-set <options>;
      model response-variable =
        %do i = 1 %to &cntvars.;
          %if %substr(&bitstrg., &i., 1) = 1 %then %do;
            &&var&i..
          %end;
        %end;
        </ options>;
      <other statements>;
    run;

- &cntvars. = p = # of possible parameters
- &bitstrg. = chromosome
- &&var&i.. = variable(s)
SAS® Code Comments
You will very likely create output data sets from the PROC (through ODS statements, OUTPUT statements, or an output option on the MODEL statement) to obtain the statistics that will constitute your objective function and fitness function scores.
I actually use a modified version of my %ITERLIST macro (Tsai, WUSS 2008) to create the list of variables in the MODEL statement.
Evaluate Escape Criterion
- You need to specify a condition to escape the loop… if you want the algorithm to terminate
- Escape criteria examples:
  - Mean score for a particular generation fails to exceed any of those for a specified number of generations immediately preceding
  - Failure to surpass the best score seen so far within a specified number of generations
  - Time or resource constraints reached
  - Minimum score surpassed
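
These criteria can be combined into one check that runs after each generation. The Python sketch below assumes objective scores where smaller is better (as with AIC), so "exceed" and "surpass" become "fall below". The function name, its parameters, and the default limits are all illustrative assumptions.

```python
import time


def escape_reason(best_scores, mean_scores, started, *, patience=5,
                  time_limit=3600.0, target=None):
    """Return why the loop should stop, or None to continue.

    Scores are objective values with smaller = better (as with AIC);
    each list holds one entry per generation evaluated so far.
    """
    if target is not None and best_scores[-1] <= target:
        return "target score reached"
    if time.monotonic() - started > time_limit:
        return "time limit reached"
    if len(best_scores) > patience:
        earlier, recent = best_scores[:-patience], best_scores[-patience:]
        if min(recent) >= min(earlier):
            return "best score not improved"
    if len(mean_scores) > patience:
        previous = mean_scores[-patience - 1:-1]
        if mean_scores[-1] >= max(previous):
            return "mean score no better than any recent generation"
    return None


start = time.monotonic()
print(escape_reason([250, 247, 246], [260, 255, 251], start, patience=2))
print(escape_reason([250, 246, 246, 246], [260, 251, 250, 249], start,
                    patience=2))
```

The first call returns `None` because both the best and the mean score are still improving. The second stops because the best score has sat at 246 for the last two generations.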
Select
- Those chromosomes with superior scores are given preference in the selection for reproduction
- The method of selection is at the analyst’s discretion.
  - One popular method used in GAs is stochastic universal sampling
Stochastic Universal Sampling
- Uses a single randomly chosen value to sample from the generation, choosing chromosomes at evenly spaced intervals across their collective fitness score
  - F = sum of the fitness scores for all chromosomes in a generation
  - N = number of chromosomes to be selected for reproduction
  - The N selection points are spaced F/N apart, starting from a random offset between 0 and F/N
- Source: Wikipedia, 2009
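
A minimal Python sketch of stochastic universal sampling, following the F and N definitions above. It assumes fitness values are non-negative with a positive total. The function name and the `rng` parameter are illustrative.

```python
import random
from itertools import accumulate


def stochastic_universal_sampling(fitness, n, rng=random):
    """Return the indices of n selected chromosomes.

    Pointers are spaced F/n apart, starting at a single random offset in
    [0, F/n), where F is the total fitness of the generation.
    """
    total = sum(fitness)                  # F
    step = total / n                      # F / N
    start = rng.uniform(0, step)          # the one random draw
    edges = list(accumulate(fitness))     # cumulative fitness boundaries
    picks, i = [], 0
    for k in range(n):
        pointer = start + k * step
        # Advance to the chromosome whose fitness interval holds the pointer.
        while i < len(edges) - 1 and pointer >= edges[i]:
            i += 1
        picks.append(i)
    return picks


fitness = [4.0, 3.0, 2.0, 1.0]
print(stochastic_universal_sampling(fitness, 4, random.Random(2009)))
```

Because the pointers are evenly spaced, a chromosome with expected count n·f/F is always chosen either the floor or the ceiling of that many times. That gives less sampling noise than drawing N independent roulette-wheel spins.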
Reproduce
Apply to selected chromosomes the genetic operations of crossover and mutation.
The resulting chromosomes constitute (in part and possibly in full) a new generation.
Final Report
- Number of generations the algorithm evaluated
- Mean fitness score for each generation
- Best chromosome discovered and its fitness and objective scores
Disadvantages of using a GA
- Not a built-in SAS functionality
- Many parameters to specify
  - Generation size
  - Crossover probability
  - Mutation rate
  - Objective function / Fitness function
- Time-consuming to run
- Still may not find the absolute optimum
Advantages of using a GA
- Deeper exploration of the model space.
- Allows you to remain within a familiar paradigm (regression) with interpretable parameter coefficients
- Agnostic to the regression model chosen: can use the same macro for any GLM with minor modifications
- “Proven” success in the real world
Suggested Reading
- References in paper
- Search heuristics
  - LAR and LASSO heuristics -- Robert Cohen, Peter Flom, and David Cassell
- Information criteria in model selection
  - Linear regression -- Dennis Beal
  - Logistic and proportional hazards regression -- Ernest Shtatland
  - Mixed models -- Jesse Canchola and Torsten Neilands