Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat Map Visualization for...

Post on 18-Jan-2016


Combining Least Absolute Shrinkage and Selection Operator (LASSO) and Heat Map Visualization for Biomarkers Detection of LGL Leukemia

By: David Garcia

Table of Contents

• What is LASSO?
• How Does LASSO Work?
• LASSO and Feature Selection
• LGL Leukemia
• Statistical Biomarker Discovery
• Methods and Results
• Questions

What is LASSO?

• LASSO = Least Absolute Shrinkage and Selection Operator

• Developed by Robert Tibshirani in 1996

• LASSO is a method of feature selection

What is LASSO?

• Estimates a regression coefficient $b_i$ for each feature $x_i$

• Adds a penalty on the size of the coefficients, controlled by a tuning parameter $\lambda$

• Can shrink the coefficients of less relevant features exactly to zero

How Does LASSO Work?

Regression Equation:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

$x_1, x_2, \dots, x_n$ are the variables (features)

$\hat{y}$ is the predicted outcome
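To make the regression equation concrete, here is a minimal Python sketch that evaluates it for one feature vector. The function name `predict` and the numbers are hypothetical, chosen only for illustration.

```python
def predict(b0, b, x):
    """Return y_hat = b0 + b1*x1 + ... + bn*xn for a single feature vector x."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per feature")
    return b0 + sum(bi * xi for bi, xi in zip(b, x))


# Hypothetical coefficients (b0 = 1.0; b1, b2, b3) and feature values (x1, x2, x3).
y_hat = predict(1.0, [2.0, -0.5, 0.0], [3.0, 4.0, 10.0])
print(y_hat)  # 1 + 2*3 - 0.5*4 + 0*10 = 5.0
```

A coefficient of zero (here $b_3$) means the corresponding feature has no influence on the prediction, which is the effect LASSO exploits for feature selection.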

How Does LASSO Work?

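For $m$ samples, with $x_{ji}$ the value of feature $i$ for sample $j$:

$$
\begin{aligned}
y_1 &= b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_n x_{1n} \\
y_2 &= b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_n x_{2n} \\
&\;\;\vdots \\
y_m &= b_0 + b_1 x_{m1} + b_2 x_{m2} + \dots + b_n x_{mn}
\end{aligned}
$$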

How Does LASSO Work?

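$$
\begin{aligned}
y_1 &= b_0 + b_1 x_{11} + b_2 x_{12} + \dots + b_n x_{1n} + e_1 \\
y_2 &= b_0 + b_1 x_{21} + b_2 x_{22} + \dots + b_n x_{2n} + e_2 \\
&\;\;\vdots \\
y_m &= b_0 + b_1 x_{m1} + b_2 x_{m2} + \dots + b_n x_{mn} + e_m
\end{aligned}
$$

where $e_j$ is the prediction error for sample $j$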
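Given a set of coefficients, each error $e_j$ is the observed $y_j$ minus the model's prediction for sample $j$. A minimal sketch computing all $m$ errors at once; the function name `residuals` and the toy data are hypothetical.

```python
def residuals(b0, b, X, y):
    """e_j = y_j - (b0 + sum_i b_i * x_ji) for each sample j (one row of X per sample)."""
    return [yj - (b0 + sum(bi * xji for bi, xji in zip(b, row)))
            for row, yj in zip(X, y)]


# Hypothetical data: m = 3 samples (rows), n = 2 features (columns).
X = [[1.0, 2.0],
     [0.0, 1.0],
     [2.0, 0.0]]
y = [4.0, 1.5, 3.0]
e = residuals(0.5, [1.0, 1.0], X, y)
print(e)  # [0.5, 0.0, 0.5]
```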

How Does LASSO Work?

• Goal: find $b_0, b_1, \dots, b_n$ that minimize the sum of squared prediction errors

$$e_1^2 + e_2^2 + \dots + e_m^2 = \sum_{j=1}^{m} \Bigl(y_j - b_0 - \sum_{i=1}^{n} b_i x_{ji}\Bigr)^2$$
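Least squares squares each error before summing them. A minimal Python sketch (function names hypothetical) contrasts this with squaring the total error: when errors of opposite sign cancel, the total can be zero even though the fit is poor.

```python
def sum_squared_errors(e):
    """Least-squares criterion: e_1^2 + e_2^2 + ... + e_m^2."""
    return sum(ej ** 2 for ej in e)


def square_of_total(e):
    """(e_1 + e_2 + ... + e_m)^2, shown only for contrast."""
    return sum(e) ** 2


# Hypothetical errors of opposite sign.
e = [2.0, -2.0, 1.0, -1.0]
print(sum_squared_errors(e))  # 10.0
print(square_of_total(e))     # 0.0, because the errors cancel
```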


How Does LASSO Work?

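• Highly correlated (collinear) features $x_i$ lead to least-squares coefficients $b_i$ with very large variances

• The tuning parameter $\lambda$ is used to restrict the regression coefficients; in the equivalent constrained form, the sum of their absolute values is bounded by $c$ (the intercept $b_0$ is not restricted):

$$|b_1| + |b_2| + \dots + |b_n| \le c$$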
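Moving the bound into the objective gives the penalized form in which $\lambda$ appears directly. For each bound $c$ there is a corresponding $\lambda \ge 0$, and a larger $\lambda$ corresponds to a smaller $c$. Written with the symbols used above (note that software packages often scale the error term, for example by $1/(2m)$, which changes the numeric scale of $\lambda$):

```latex
(\hat{b}_0, \hat{b}_1, \dots, \hat{b}_n)
  = \arg\min_{b_0, b_1, \dots, b_n}
    \sum_{j=1}^{m} \Bigl(y_j - b_0 - \sum_{i=1}^{n} b_i x_{ji}\Bigr)^2
    + \lambda \sum_{i=1}^{n} |b_i|
```

With $\lambda = 0$ this reduces to ordinary least squares; as $\lambda$ grows, more of the $b_i$ are set exactly to zero.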

How Does LASSO Work?

[Figure: constraint region for two coefficients, with both axes marked at $-c$ and $c$]

LASSO and Feature Selection

• Increasing $\lambda$ drives the coefficients $b_i$ of less relevant features to exactly zero

• LASSO can therefore filter out features that contribute little to the predicted outcome

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$$


With $\lambda = 0.5$:

$$\hat{y} = b_0 + 0 \cdot x_1 + b_2 x_2 + 0 \cdot x_3 + b_4 x_4$$
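One standard way to compute a LASSO fit is cyclic coordinate descent with soft-thresholding; the slides do not say which solver the authors used, so this is a swapped-in technique for illustration. A minimal sketch, assuming the scaled objective $(1/(2m))\sum_j e_j^2 + \lambda \sum_i |b_i|$ used by several common packages; the helper names and toy data are hypothetical, and $\lambda = 0.5$ is reused from the slide only as an illustrative value. With two orthogonal, standardized features, the weak feature's coefficient becomes exactly zero at $\lambda = 0.5$.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0); returns exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the (assumed) objective
        (1/(2m)) * sum_j (y_j - b0 - sum_i b_i * x_ji)^2 + lam * sum_i |b_i|,
    with the intercept b0 left unpenalized. Returns (b0, [b_1, ..., b_n])."""
    m, n = len(X), len(X[0])
    b0, b = 0.0, [0.0] * n
    # (1/m) * sum_j x_jk^2 for each feature k (equals 1 for standardized features).
    col_sq = [sum(row[k] ** 2 for row in X) / m for k in range(n)]
    for _ in range(n_iter):
        # Unpenalized intercept: mean of y minus the current feature contributions.
        b0 = sum(yj - sum(bi * xi for bi, xi in zip(b, row))
                 for row, yj in zip(X, y)) / m
        for k in range(n):
            if col_sq[k] == 0.0:
                continue  # an all-zero feature cannot enter the model
            # Correlation of feature k with the residual that leaves feature k out.
            rho = sum(row[k] * (yj - b0 - sum(b[i] * row[i] for i in range(n) if i != k))
                      for row, yj in zip(X, y)) / m
            b[k] = soft_threshold(rho, lam) / col_sq[k]
    return b0, b


# Hypothetical data: two centered, orthogonal features; y generated as 1 + 2*x1 + 0.2*x2.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.2, -0.8, 2.8, -1.2]
for lam in (0.0, 0.5):
    b0, b = lasso_cd(X, y, lam)
    print(lam, round(b0, 6), [round(v, 6) for v in b])
# 0.0 1.0 [2.0, 0.2]
# 0.5 1.0 [1.5, 0.0]
```

The soft-threshold step is what produces exact zeros: any coefficient whose correlation with the partial residual is smaller than $\lambda$ in magnitude is set to 0.0, while larger ones are shrunk by $\lambda$.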

LASSO and Feature Selection

• LASSO can be used in bioinformatics to select genes that may contribute more to the presence of disease

With $\lambda = 0.5$:

$$\hat{y} = b_0 + 0 \cdot x_1 + b_2 x_2 + 0 \cdot x_3 + b_4 x_4$$

$x_i$ is the transcription level of gene $i$

$\hat{y}$ is the predicted presence or absence of disease

LGL Leukemia

• LGL = large granular lymphocytic

• Results from a lack of programmed cell death (apoptosis)

• No standard treatment currently exists

Statistical Biomarker Discovery

• Other biomarker detection methods select genes from a biomedical perspective

• The proposed method takes a purely statistical approach

• Results must be verified by further biomedical studies

Methods and Results

• Sample of 45 subjects, each with 10,444 attributes (gene expression levels)

• 37 LGL leukemia patients / 8 normal subjects

• $y = 0$ for normal, $y = 1$ for leukemia

• Sample data standardized using z-scores

• Combination of heat map visualization and LASSO
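The slides do not specify exactly how the z-scores were computed. A minimal sketch, assuming each attribute (column) is standardized across subjects using its mean and population standard deviation; the function name and toy values are hypothetical. Standardizing puts all genes on a common scale, which matters because the LASSO penalty treats every coefficient alike.

```python
from statistics import mean, pstdev


def zscore_columns(X):
    """Standardize each column (attribute) of X to mean 0 and standard deviation 1.

    Assumes the population standard deviation; a constant column is mapped to 0.0
    because it has no spread to divide by.
    """
    stats = [(mean(col), pstdev(col)) for col in zip(*X)]
    return [[(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)]
            for row in X]


# Hypothetical expression values: 3 subjects (rows) x 2 genes (columns).
X = [[1.0, 10.0],
     [2.0, 10.0],
     [3.0, 10.0]]
Z = zscore_columns(X)
for row in Z:
    print([round(v, 4) for v in row])
```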

Methods and Results

• Each testing set contains one sample

• Leave-one-out cross-validation (LOOCV) is used to choose the optimal $\lambda$

• The authors choose the $\lambda$ that gives the most shrinkage while keeping the mean squared error within one standard error of the minimum

• $\lambda = 0.02868446$
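The selection procedure above can be sketched as follows. This is a minimal, self-contained illustration, not the authors' code: it assumes a squared-error LASSO fitted by coordinate descent with the $1/(2m)$ scaling, and the helper names, candidate $\lambda$ grid, and toy data are hypothetical. With LOOCV, each candidate $\lambda$ requires $m$ fits (45 for this data set).

```python
from math import sqrt
from statistics import mean, stdev


def soft_threshold(z, gamma):
    """Shrink z toward 0 by gamma, returning exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2m)) * SSE + lam * sum|b_i| (assumed scaling)."""
    m, n = len(X), len(X[0])
    b0, b = 0.0, [0.0] * n
    col_sq = [sum(row[k] ** 2 for row in X) / m for k in range(n)]
    for _ in range(n_iter):
        b0 = sum(yj - sum(bi * xi for bi, xi in zip(b, row))
                 for row, yj in zip(X, y)) / m
        for k in range(n):
            if col_sq[k] > 0.0:
                rho = sum(row[k] * (yj - b0 - sum(b[i] * row[i] for i in range(n) if i != k))
                          for row, yj in zip(X, y)) / m
                b[k] = soft_threshold(rho, lam) / col_sq[k]
    return b0, b


def loocv_errors(X, y, lam):
    """Leave-one-out CV: squared error on each held-out sample, one fit per sample."""
    errs = []
    for j in range(len(X)):
        b0, b = lasso_cd(X[:j] + X[j + 1:], y[:j] + y[j + 1:], lam)
        y_hat = b0 + sum(bi * xi for bi, xi in zip(b, X[j]))
        errs.append((y[j] - y_hat) ** 2)
    return errs


def one_se_lambda(lambdas, errors_per_lambda):
    """Largest lambda (most shrinkage) whose mean CV error is within one
    standard error of the minimum mean CV error."""
    means = [mean(errs) for errs in errors_per_lambda]
    best = min(range(len(lambdas)), key=lambda k: means[k])
    best_errs = errors_per_lambda[best]
    se = stdev(best_errs) / sqrt(len(best_errs))
    return max(lam for lam, mu in zip(lambdas, means) if mu <= means[best] + se)


# Hypothetical toy data: 6 subjects, 2 features, y = 1 (leukemia) or 0 (normal).
X = [[1.2, -0.3], [0.8, 0.5], [1.0, -1.1], [-1.1, 0.4], [-0.9, 1.0], [-1.0, -0.5]]
y = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
lambdas = [0.001, 0.01, 0.05, 0.1, 0.2]
errors = [loocv_errors(X, y, lam) for lam in lambdas]
chosen = one_se_lambda(lambdas, errors)
print("lambda chosen by the one-standard-error rule:", chosen)
```

Picking the largest $\lambda$ within one standard error, rather than the $\lambda$ with the smallest error, trades a statistically negligible loss in CV error for a sparser model, i.e. fewer selected genes.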

Methods and Results

• 21 genes selected by the LASSO method

• "FCGBP", "KIT", "CD34", "NLGN2", "SPINK2", "HIPK1", "SNORA31", "NR4A3", "SNORA27", "CASK", "SNORA4", "ACSM3", "NELL2", "NAGPA", "VPS25", "LYZ", "DUSP2", "GOLGA8A", "PHGDH", "SERF1A", "TNFSF9"

Methods and Results

• The Database for Annotation, Visualization and Integrated Discovery (DAVID) tool is used to classify the selected genes

• One gene shows potential as an LGL leukemia biomarker

Questions