Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat Map Visualization for Biomarkers Detection of LGL Leukemia By: David Garcia

Table of Contents

• What is LASSO?

• How Does LASSO Work?

• LASSO and Feature Selection

• LGL Leukemia

• Statistical Biomarker Discovery

• Methods and Results

• Questions

What is LASSO?

• LASSO = Least Absolute Shrinkage and Selection Operator

• Developed by Robert Tibshirani in 1996

• LASSO is a method of feature selection

What is LASSO?

• Estimates a regression coefficient $b_i$ for each feature $x_i$

• Uses a penalty function via a tuning parameter λ

• Sets coefficients of less relevant features to zero

How Does LASSO Work?

Regression Equation:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$

$x_1, x_2, \dots, x_n$ are the variables/features

$\hat{y}$ is the predicted outcome

How Does LASSO Work?

$y_1 = b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_n x_{1n}$

$y_2 = b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_n x_{2n}$

$\vdots$

$y_m = b_0 + b_1 x_{m1} + b_2 x_{m2} + \dots + b_n x_{mn}$

How Does LASSO Work?

$y_1 = b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_n x_{1n} + e_1$

$y_2 = b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_n x_{2n} + e_2$

$\vdots$

$y_m = b_0 + b_1 x_{m1} + b_2 x_{m2} + \dots + b_n x_{mn} + e_m$

How Does LASSO Work?

• GOAL:

find $b_0, b_1, \dots, b_n$ that minimize the total squared prediction error

$e_1^2 + e_2^2 + \dots + e_m^2$
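
As a small illustration of this goal, the sketch below computes the prediction errors $e_i = y_i - \hat{y}_i$ and sums their squares, which is the usual least-squares criterion. The function names and the toy data are hypothetical and are not taken from the presentation.

```python
def predict(b0, b, x):
    """Regression equation: y_hat = b0 + b1*x1 + ... + bn*xn."""
    return b0 + sum(bj * xj for bj, xj in zip(b, x))


def residuals(b0, b, X, y):
    """Prediction errors e_i = y_i - y_hat_i, one per sample."""
    return [yi - predict(b0, b, xi) for xi, yi in zip(X, y)]


def sum_squared_errors(b0, b, X, y):
    """Total squared prediction error e_1^2 + ... + e_m^2."""
    return sum(e ** 2 for e in residuals(b0, b, X, y))


# Hypothetical toy data: 3 samples, 2 features, generated from b0 = 1, b = [1, 2].
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [6.0, 3.0, 6.0]

exact_fit = sum_squared_errors(1.0, [1.0, 2.0], X, y)  # coefficients that generated y
poor_fit = sum_squared_errors(0.0, [1.0, 2.0], X, y)   # intercept deliberately wrong
print(exact_fit, poor_fit)  # prints 0.0 3.0
```

Squaring each error before summing keeps positive and negative errors from cancelling, so a smaller total always means predictions that are closer to the observed values.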

How Does LASSO Work?

• Correlated (mutually dependent) features $x_i$ lead to regression coefficient estimates $b_i$ with very large variances

• Tuning parameter λ is used to restrict the total absolute size of the regression coefficients

$|b_1| + |b_2| + \dots + |b_n| \le c$
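
For reference, the restriction can be written out as a full optimization problem. The block below is a sketch in the notation of these slides, following Tibshirani's standard formulation: the bound applies to the absolute values of the coefficients, and the intercept $b_0$ is conventionally left unpenalized. The penalized form shows where the tuning parameter λ enters. Some software rescales the squared-error term (for example by $1/(2m)$), which changes only the numerical scale of λ.

```latex
\begin{aligned}
\text{Constrained form:}\quad
  & \min_{b_0, b_1, \dots, b_n} \sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2
    \quad \text{subject to} \quad \sum_{j=1}^{n} |b_j| \le c \\
\text{Penalized form:}\quad
  & \min_{b_0, b_1, \dots, b_n} \sum_{i=1}^{m} \Bigl( y_i - b_0 - \sum_{j=1}^{n} b_j x_{ij} \Bigr)^2
    + \lambda \sum_{j=1}^{n} |b_j|
\end{aligned}
```

A larger λ corresponds to a smaller bound c, so more coefficients are shrunk, and more of them end up exactly at zero.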

How Does LASSO Work?

[Figure: plot of the constraint on the coefficients, with both axes marked at -c and c]

LASSO and Feature Selection

• Use of λ drives less relevant $b_i$ to zero

• LASSO can be used to filter features that contribute less to the expected result

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$

λ = 0.5

$\hat{y} = b_0 + 0 \cdot x_1 + b_2 x_2 + 0 \cdot x_3 + b_4 x_4$
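
To make the zeroing effect concrete, here is a minimal sketch of one common way to fit LASSO: cyclic coordinate descent with soft-thresholding. The presentation does not say which solver was used, so this is illustrative only. The function names and the toy data are hypothetical, and the objective is assumed to be $\frac{1}{2m}\sum_i e_i^2 + \lambda \sum_j |b_j|$ with an unpenalized intercept.

```python
from itertools import product


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Fit b0 and b by cycling over the features one at a time."""
    m, n = len(X), len(X[0])
    b0, b = 0.0, [0.0] * n
    for _ in range(n_sweeps):
        # Unpenalized intercept: mean of the residuals without b0.
        b0 = sum(y[i] - sum(b[j] * X[i][j] for j in range(n)) for i in range(m)) / m
        for j in range(n):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(m):
                partial = y[i] - b0 - sum(b[k] * X[i][k] for k in range(n) if k != j)
                rho += X[i][j] * partial
                z += X[i][j] ** 2
            if z == 0.0:
                continue  # an all-zero feature carries no information
            b[j] = soft_threshold(rho / m, lam) / (z / m)
    return b0, b


# Hypothetical toy data: 8 samples, 4 mutually orthogonal +/-1 features.
# Features 1 and 3 have weak effects (0.1 and -0.2); features 2 and 4 are strong.
X = [[a, b, c, a * b * c] for a, b, c in product([-1.0, 1.0], repeat=3)]
y = [1.0 + 0.1 * x[0] + 3.0 * x[1] - 0.2 * x[2] - 2.0 * x[3] for x in X]

b0_hat, b_hat = lasso_coordinate_descent(X, y, lam=0.5)
print(round(b0_hat, 6), [round(v, 6) for v in b_hat])
# coefficients come out approximately as [0.0, 2.5, 0.0, -1.5]
```

With λ = 0.5 the two weak coefficients are set exactly to zero, while the two strong ones survive but are shrunk toward zero by 0.5, mirroring the pattern in the equation above.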

LASSO and Feature Selection

• LASSO can be used in bioinformatics to select genes that may contribute more to the presence of disease

λ = 0.5

$\hat{y} = b_0 + 0 \cdot x_1 + b_2 x_2 + 0 \cdot x_3 + b_4 x_4$

$x_i$ is the transcription level of gene $i$

$\hat{y}$ is the presence or absence of disease

LGL Leukemia

• LGL = large granular lymphocytic

• Results from lack of programmed cell death

• No current standard treatment

Statistical Biomarker Discovery

• Other methods of biomarker detection select genes based on biomedical perspectives

• Proposed method uses a purely statistical approach

• Results need to be verified via further biomedical studies

Methods and Results

• Sample of 45 subjects with 10,444 attributes

• 37 infected / 8 normal

• y = 0 for normal / 1 for infected

• Sample data standardized using z-scores

• Combination of heat map visualization and LASSO
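
The z-score standardization step can be sketched as follows. The slides do not say whether the sample or the population standard deviation was used, so the sample standard deviation here is an assumption. The function name and the toy matrix are hypothetical.

```python
from statistics import mean, stdev


def zscore_columns(X):
    """Standardize each column (attribute) to mean 0 and sample SD 1."""
    columns = list(zip(*X))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # assumption: sample SD (n - 1 denominator)
    return [
        # A constant column has SD 0; map it to 0.0 instead of dividing by zero.
        [(v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, mus, sds)]
        for row in X
    ]


# Hypothetical toy matrix: 3 subjects (rows) by 2 attributes (columns).
X_raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Z = zscore_columns(X_raw)
print(Z)  # prints [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

Putting every attribute on the same scale matters for LASSO because the penalty acts on the coefficient sizes, which would otherwise depend on each attribute's units.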

Methods and Results

• Testing set contains one sample

• Leave-one-out cross validation used to choose optimal λ

• Authors choose the λ that results in the most shrinkage with a mean squared error within one standard error of the minimum

• λ = 0.02868446
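
The selection rule described above can be sketched as follows: form leave-one-out splits, then pick the largest λ whose mean error stays within one standard error of the minimum. The function names and the error values are hypothetical and only illustrate the rule. They are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def leave_one_out_splits(m):
    """Yield (training indices, held-out index) so each test set has one sample."""
    for held_out in range(m):
        yield [i for i in range(m) if i != held_out], held_out


def one_standard_error_rule(lambdas, fold_errors):
    """fold_errors[k] lists the held-out squared errors obtained with lambdas[k]."""
    means = [mean(errs) for errs in fold_errors]
    std_errors = [stdev(errs) / sqrt(len(errs)) for errs in fold_errors]
    best = min(range(len(lambdas)), key=lambda k: means[k])
    threshold = means[best] + std_errors[best]
    # Largest lambda (most shrinkage) whose mean error is still within the threshold.
    eligible = [lam for lam, mu in zip(lambdas, means) if mu <= threshold]
    return lambdas[best], max(eligible)


# Hypothetical held-out errors: 4 candidate lambdas, 4 folds each.
lambdas = [0.01, 0.03, 0.1, 0.3]
fold_errors = [
    [0.10, 0.20, 0.30, 0.20],  # mean 0.20
    [0.10, 0.15, 0.25, 0.14],  # mean 0.16 (the minimum)
    [0.12, 0.18, 0.26, 0.16],  # mean 0.18
    [0.30, 0.40, 0.50, 0.40],  # mean 0.40
]

lam_min, lam_1se = one_standard_error_rule(lambdas, fold_errors)
print(lam_min, lam_1se)  # prints 0.03 0.1
```

Choosing the larger λ trades a small increase in estimated error for a sparser model, which suits biomarker discovery where a short gene list is easier to verify.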

Methods and Results

• 21 genes selected from LASSO method

• "FCGBP", "KIT", "CD34", "NLGN2", "SPINK2", "HIPK1", "SNORA31", "NR4A3", "SNORA27", "CASK", "SNORA4", "ACSM3", "NELL2", "NAGPA", "VPS25", "LYZ", "DUSP2", "GOLGA8A", "PHGDH", "SERF1A", "TNFSF9"

Methods and Results

• Database for Annotation, Visualization and Integrated Discovery (DAVID) tool used to classify genes

• One gene shows potential as LGL leukemia biomarker

Questions