Feature Extraction with Data Mining
zoe.bme.gatech.edu/~klee7/docs/HYU-IE-Sky.pdf

Feature Extraction with Data Mining: Introducing Sky’s research Ph.D. Kichun “Sky” Lee Post-doctoral Research Fellow, Emory University October 31, 2011


Feature Extraction with Data Mining: Introducing Sky’s research

Ph.D. Kichun “Sky” Lee

Post-doctoral Research Fellow, Emory University

October 31, 2011


Content

1 Introduction: an overview of research and presentation

2 Dependence Maps: a new dimensionality reduction

3 Semi-supervised Shrinkage Rule: a classification on wavelet domains


Introduction

Part I

Introduction: an overview of research and presentation

Kichun “Sky” Lee 10/31/2011 3/30 Feature Extraction with Data Mining


Introduction | Research Overview | Presentation Overview


Education and Experiences

Timeline: 1994 to 2011

Education
- Teagu Science High School
- B.S. Industrial Management (minor Electrical Engineering), KAIST
- M.S. Industrial Engineering (Human Computer Interaction), KAIST
- Ph.D. in Statistics, Industrial Systems Engineering (minor Electrical Computer Engineering), Georgia Tech

Work Experience
- IBM (internship)
- ETRI (part-time engr. staff)
- ICG (development head)
- TmaxSoft (researcher)
- Samsung SDS (researcher)
- Georgia Tech, Emory (post-doc)

Military Obligation


Research Methods

[Diagram of research areas]
- Information Technology: HCI, System SW
- Data Mining: Semi-supervised Learning, Dim. Reduction, SVMs, Multivariate Statistics
- Time Series: Wavelet, Multi-scale methods, Fractality, Functional Data Analysis
- Bioinformatics
- Mgt. Science


Research Outputs

Journals

- Kichun Lee∗ and Brani Vidakovic. Semi-supervised Wavelet Shrinkage. Computational Statistics & Data Analysis, 2011 (Accepted).

- Youngja Park∗, Dean P Jones, Thomas R Ziegler, Kichun Lee, Kavitha Kotha, Tianwei Yu, Greg S. Martin. Metabolic effects of albumin therapy in acute lung injury measured by 1H-NMR spectroscopy of plasma: a pilot study. Critical Care Medicine, 10:2308–2313, 2011.

- Pepa Ramirez Cobo, Kichun Sky Lee, Annalisa Molini, Amilcare Porporato, Gabriel Katul, Brani Vidakovic∗. Wavelet-based spectral methods for extracting self-similarity measures in time-varying two-dimensional rainfall maps. Journal of Time Series Analysis, 32:337–446, 2011.

- Youngja Park, Ngoc-Anh Le, Tianwei Yu, Nana Gletsu Miller, Carolyn J. Accardi, Kichun S. Lee, Shaoxiong Wu, Thomas R. Ziegler, and Dean P. Jones∗. Sulfur Amino Acid-Free Meal Increased Plasma Triglyceride as assessed by 1H-NMR Spectroscopy. Journal of Nutrition, 141:1424–1431, 2011.

- Sharla Gayle Patterson, Charlene W. Bayer, Robert J. Hendry, Nancy Sellers, K. Sky Lee, Brani Vidakovic, Boris Mizaikoff, Sheryl G.A. Gabram-Mendola∗. Breath Analysis by Mass Spectrometry: A new Tool for Breast Cancer (BC) Detection? The American Surgeon, 77:747–751, 2011.

- Kichun Sky Lee∗, Jongphil Kim, Brani Vidakovic. Regularity of Irregularity: Testing for Monofractality by Multifractal Tools. International Journal of Mathematics and Computer Science: Special Issue on Computational Biology and Data Mining, 2010 (Accepted).

- Kichun Sky Lee∗, M. Forrest Abouelnasr, Charlene W. Bayer, Sheryl G.A. Gabram, Boris Mizaikoff, Andre Rogatko, and Brani Vidakovic. Mining exhaled volatile organic compounds for breast cancer detection. Advances and Applications in Statistical Sciences, 1:327–342, 2009.

- Gordana Derado∗, Kichun Lee, Orietta Nicolis, F. DuBois Bowman, Mary Newell, Fabrizio F. Rugger, and Brani Vidakovic. Wavelet-Based 3-D Multifractal Spectrum with Applications in Breast MRI Images, pages 281–292. Springer, 2008.

- H. Ahn∗, K. Lee, and K. Kim. Global optimization of support vector machines using genetic algorithms for bankruptcy prediction. In Neural Information Processing, pages 420–429. Springer, 2006.


Research Outputs

Revision/Submitted

- Kichun Lee∗ and Alexander Gray. Dependency maps, a dimensionality reduction with dependency distance and low-dimensional representation for high-dimensional data. Submitted to Data Mining and Knowledge Discovery (under minor revision).

- Kichun Sky Lee, Jean Francois Coeurjolly, Brani Vidakovic. Variance estimation for fractional Brownian motions with fixed Hurst parameters. Submitted to Communications in Statistics: Theory and Methods.

- Kichun Sky Lee∗ and Brani Vidakovic. Assessing time-changing Hurst exponent and variance in multifractional Brownian motion. Submitted to Annals of the Institute of Statistical Mathematics.

- Kichun Lee, John D. Carew, Jun-Hee Heu∗. Recovering the boundary of a vessel wall from phase contrast magnetic resonance images in low resolutions. Submitted to IEICE Transactions on Information and Systems.

- Jongsu Lee, Chul-Yong Lee∗, Kichun Lee. Forecasting Demand for a Newly Introduced Product Using Reservation Price Data and Bayesian Updating. Submitted to Industrial Marketing Management.

- Youngja H. Park, Kichun Lee, Quinlyn A. Soltow, Frederick H. Strobel, Kenneth L. Brigham, Richard E. Parker, Mark E. Wilson, Roy L. Sutliff, Keith G. Mansfield, Lynn M. Wachtman, Thomas R. Ziegler, Dean P. Jones∗. Divergent behavior of environmental chemicals and endogenous metabolites in mammals. Submitted to Toxicology.

Working papers

- Kichun Lee, Youngja Park, Dean P Jones∗. Principal component loading statistics (PCLS) based feature selection for discriminant analysis with PCA, PLS, and OPLS in spectroscopy data.

- Youngja Park∗, Kichun Lee, Thomas R. Ziegler, Gautam Habber, Brani Vidakovic, Dean P Jones. Assessment of nutritional deficiency using Multifractal analysis on 1H NMR spectra of human plasma. In preparation for American Journal of Physiology-Regulatory Integrative and Comparative Physiology.

- Kyoung-jae Kim, Kichun Lee, Hyunchul Ahn∗. Predicting financial distress for corporate risk management using SVM with two-dimensional reduction technique. In preparation for Information Science.


Research in IT


[Figure: layered web application server (WAS) architecture.
Layers: Client Layer, WAS Middleware Layer, Source Layer.
Components: HTML/Web browser, WSDL/DLL Web service client, Applet, Java Application, COM Application, NMS, Web server, HTML/CGI/PHP/SSI, Engine Container, EJB, Servlet, JMS, WS/ebXML, Manager, JNDI, Security, JTA, Session, JMX, JCA, Administration Tools, Connector, Database, TX Mgr, Mainframe, Legacy EIS, Other J2EE.
Connections: HTTP, SOAP/ebXML, RMI, JDBC, X/OPEN, IIOP, COM Bridge, JMX, TCP/IP.
On top of the WAS: ERP, BPM, MRP, CRM; IT Services: logistics, financial, health-care, etc.]

- Java EE 1.3 and 1.4 certificates, the first in the world (team award)


Presentation Overview


[Figure: overview of the presentation as a processing pipeline. Four signal panels labeled "Data of High Frequency" (x-axis 500 to 2000) and four panels labeled "Transformed Data" (x-axis 500 to 2000). Diagram labels: transform; Descriptors (Fractality, Regularity); Dimensionality Reduction; Shrinkage; Inverse transform; Decision; Classification (Y/N); SVMs; Applications.]


Part II

Dependence Maps: a new dimensionality reduction


Dependence Maps: a new dimensionality reduction | Motivation | Introduction of Dependence | Dependence Distance

Introduction of Dimensionality Reduction (DR)

- Goal?
  - Low dimensional representation of high dimensional data:

$$\Omega = \{x_1, \dots, x_n\}, \quad x_i \in \mathbb{R}^m$$

find $z_1, \dots, z_n \in \mathbb{R}^p$ such that $p \ll m$


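[Figure: points $x_1, x_2, x_3, x_4, \dots, x_{n-1}, x_n$ in the $m$-dimensional space are mapped to points $z_1, z_2, z_3, z_4, \dots, z_{n-1}, z_n$ in the $p$-dimensional space.]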


Motivation for a new DR


Introduction of Dependence

Definition

Dependency between $x_m \in X_0$ and $x_i \in X_t$, denoted by $\mathrm{Dep}_t(m,i)$, is

$$\mathrm{Dep}_t(m,i) = \frac{\Pr(X_t = x_i,\, X_0 = x_m)}{\Pr(X_t = x_i)\,\Pr(X_0 = x_m)}.$$

- $\mathrm{Dep}_t(m,i) < 1$: negatively dependent
- $\mathrm{Dep}_t(m,i) = 1$: independent
- $\mathrm{Dep}_t(m,i) > 1$: positively dependent

- Assumption?
  - Each $x_i \in \Omega$ is connected with others with its own "closenesses" to others
  $\Rightarrow$ Consider connectivities of $x_i$ at the $t$-th step Markovian transition

Q. How is it related to our goal?
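As a minimal sketch of the definition above, the following Python snippet computes Dep_t(m, i) for a small Markov chain. It assumes a row-stochastic transition matrix P and a uniform initial distribution Pr(X_0 = x_m) = 1/n; the slides do not state the initial distribution, and the function name `dependency_matrix` and the toy numbers are hypothetical.

```python
import numpy as np

def dependency_matrix(P, t=1):
    """Return an n x n array whose entry [m, i] is Dep_t(m, i).

    Assumption (not stated in the slides): X_0 is uniform over the n points.
    Every state must be reachable at step t, otherwise Pr(X_t = x_i) is zero.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    Pt = np.linalg.matrix_power(P, t)   # t-step transition probabilities
    pi0 = np.full(n, 1.0 / n)           # Pr(X_0 = x_m), assumed uniform
    joint = pi0[:, None] * Pt           # Pr(X_t = x_i, X_0 = x_m)
    marg_t = joint.sum(axis=0)          # Pr(X_t = x_i)
    return joint / (pi0[:, None] * marg_t[None, :])

# Toy 3-state chain (made-up numbers, for illustration only).
P = np.array([[0.8, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.8]])
dep = dependency_matrix(P, t=2)
```

In this toy chain a point is positively dependent on itself (dep[0, 0] > 1) and negatively dependent on the far state (dep[0, 2] < 1). A chain whose rows are all identical gives Dep_t = 1 everywhere, which is the independent case.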


Dependence Distance

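- The dependence distance between $x_i$ and $x_j$ at $t$-step transition in $\Omega$ is defined:

$$D_t^2(i,j) = \sum_{m=1}^{n} \bigl(\mathrm{Dep}_t(m,i) - \mathrm{Dep}_t(m,j)\bigr)^2$$

[Figure: transitions $X_0, X_1, X_2, \dots, X_t$. For each $x_m$, the dependence of $x_i$, $\mathrm{Dep}(X_0 = m, X_t = i)$, is compared with the dependence of $x_j$, $\mathrm{Dep}(X_0 = m, X_t = j)$.]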
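The distance above can be sketched in a few lines of Python. The snippet takes a precomputed matrix of dependency values, with entry [m, i] standing for Dep_t(m, i), and returns all pairwise squared distances. The function name `dependence_distance_sq` and the 3 x 3 input values are hypothetical and serve only as an illustration.

```python
import numpy as np

def dependence_distance_sq(dep):
    """Squared dependence distances D_t^2(i, j) from dep[m, i] = Dep_t(m, i)."""
    dep = np.asarray(dep, dtype=float)
    # diff[m, i, j] = Dep_t(m, i) - Dep_t(m, j)
    diff = dep[:, :, None] - dep[:, None, :]
    return (diff ** 2).sum(axis=0)      # sum over m

# Hypothetical dependency values, for illustration only.
dep = np.array([[2.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [0.0, 1.0, 2.0]])
D2 = dependence_distance_sq(dep)
```

Two points are close under this distance when every x_m depends on them in nearly the same way. In the toy input, columns 0 and 2 have opposite dependency profiles, so D2[0, 2] = 8 is the largest entry, while D2[0, 1] = D2[1, 2] = 2.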


Dependence Mapping

Theorem

$$D_t^2(i,j) = \sum_{m=1}^{n} \lambda_m^2 \bigl(v_m(i) - v_m(j)\bigr)^2,$$

$v_m$ are eigenvectors of $P^t (D^{(t)})^{-1}$, $P^t$ is the $t$-th step transition probability and a diagonal matrix $D^{(t)}$ is defined by the column sum of $P^t$: $D^{(t)}_{kk} = \sum_{\ell=1}^{n} P^t_{\ell,k}$.

- By taking the largest $p$ nontrivial eigenvalues of $P^t (D^{(t)})^{-1}$:

$$x_i \mapsto \bigl(v_1(i), \dots, v_p(i)\bigr)$$
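The spectral form of the distance can be checked numerically on a toy chain. The sketch below is not the author's implementation: it builds A = P^t (D^(t))^{-1} and uses a singular value decomposition as a stand-in for the eigendecomposition named in the theorem (the two agree up to signs when A is symmetric, as in the toy chain). Distances are taken between columns of A, ignoring any constant scaling between A and Dep_t. The function names, the `skip` parameter for the trivial component, and the numbers are all assumptions of this sketch.

```python
import numpy as np

def dependence_spectrum(P, t=1):
    """Return A = P^t (D^(t))^{-1}, its singular values s, and vectors V."""
    P = np.asarray(P, dtype=float)
    Pt = np.linalg.matrix_power(P, t)   # t-step transition probabilities
    col_sum = Pt.sum(axis=0)            # D^(t)_kk = sum over l of P^t[l, k]
    A = Pt / col_sum[None, :]           # right-multiplication by (D^(t))^{-1}
    # Assumption: SVD replaces the eigendecomposition named in the theorem.
    _, s, Vt = np.linalg.svd(A)
    return A, s, Vt.T                   # V[i, m] plays the role of v_m(i)

def dependence_map(P, t=1, p=2, skip=1):
    """Map x_i to (v_1(i), ..., v_p(i)), skipping the first `skip` components."""
    _, _, V = dependence_spectrum(P, t)
    return V[:, skip:skip + p]

# Toy symmetric 3-state chain (made-up numbers, for illustration only).
P = np.array([[0.8, 0.2, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.8]])
A, s, V = dependence_spectrum(P, t=2)

# Left side: squared distances between columns of A.
direct = ((A[:, :, None] - A[:, None, :]) ** 2).sum(axis=0)
# Right side: sum over m of s_m^2 (v_m(i) - v_m(j))^2.
spectral = ((s[None, None, :] * (V[:, None, :] - V[None, :, :])) ** 2).sum(axis=2)

Z = dependence_map(P, t=2, p=2)         # one 2-dimensional row per point
```

For this chain the leading component is constant across points, so it contributes nothing to any distance; that is why the sketch treats it as the trivial one and skips it.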


Illustration of Dependence Mapping

[Figure: five panels showing the input data and its embeddings by LLE, ISOMAP, Diffusion Maps, and Dependence Maps.]


Application to Volatile Organic Compounds (VOCs)

To diagnose cancer based on the VOCs in subjects' breath.

[Figure (a): Scaled image of VOC data for the case group; 20 individuals; $x_i \in \mathbb{R}^{380}$. Axes: subject index, VOCs.]
[Figure (b): Scaled image for the control group; 20 individuals. Axes: subject index, VOCs.]
[Figure: 2nd subject in the case group; 2nd column of the matrix in (a). Axes: index of VOCs, magnitude of VOC.]
[Figure: 10th subject in the control group; 10th column of the matrix in (b). Axes: index of VOCs, magnitude of VOC.]


Application to Volatile Organic Compounds (VOCs)

[Figure: two-dimensional embeddings (1st component vs. 2nd component) of the control and case groups in six panels: Dependence Maps, ISOMAP, Diffusion Maps (zoomed in), PCA, LLE, and PCA (zoomed in).]


Part III

Semi-supervised Shrinkage Rule: a classification on wavelet domains


[Figure: flowchart relating Data of High Frequency, transform, Transformed Data, Shrinkage, Dimensionality Reduction, Inverse transform, Descriptors (Fractality, Regularity), Classification (SVMs), Decision (Y/N), and Applications.]


Wavelet Data Shrinkage: Hard-thresholding

[Figure: signals on [0, 1] (left) and their wavelet coefficients, indices 1 to 1000 (right); the top row is linked by the wavelet transform $W$, the bottom row by the inverse transform $W^{-1}$.]
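The transform, threshold, inverse-transform cycle on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code. The slides do not name the wavelet, so an orthonormal Haar transform is swapped in. The threshold is the universal one, $\lambda = \sigma\sqrt{2 \log N}$. Leaving the coarsest coefficient untouched is a common convention assumed here. The signal values are hypothetical.

```python
import math


def haar_forward(signal):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    coeffs = list(signal)
    length = len(coeffs)
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * k] + coeffs[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        detail = [(coeffs[2 * k] - coeffs[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        coeffs[:length] = approx + detail
        length = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    signal = list(coeffs)
    length = 1
    while length < len(signal):
        approx, detail = signal[:length], signal[length:2 * length]
        merged = []
        for a, d in zip(approx, detail):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        signal[:2 * length] = merged
        length *= 2
    return signal


def hard_threshold(d, lam):
    """Keep a coefficient if |d| > lam, otherwise set it to zero."""
    return d if abs(d) > lam else 0.0


def denoise(signal, sigma):
    """W -> hard thresholding -> W^{-1}, with lam = sigma * sqrt(2 log N)."""
    lam = sigma * math.sqrt(2 * math.log(len(signal)))
    coeffs = haar_forward(signal)
    # Leave the coarsest approximation coefficient (index 0) untouched.
    kept = [coeffs[0]] + [hard_threshold(d, lam) for d in coeffs[1:]]
    return haar_inverse(kept)


# Hypothetical noisy step signal with noise level sigma = 0.05.
noisy = [0.05, -0.04, 0.03, -0.02, 1.04, 0.97, 1.02, 0.98]
denoised = denoise(noisy, sigma=0.05)
```

Only the single large detail coefficient that encodes the jump survives the threshold, so `denoised` is piecewise constant: the mean of the first four samples followed by the mean of the last four.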


Idea of Semi-supervised Learning

• Idea
  • Use both labeled and unlabeled observations

[Figure: two scatter plots with a query point “?”; left: with labeled observations only; right: with labeled and unlabeled observations (blue dots).]


SS Learning by Manifold Regularization

• Given empirical data

$$(x_1, y_1), \dots, (x_\ell, y_\ell) \in X \times \{0, 1\}, \qquad x_{\ell+1}, \dots, x_{\ell+u} \in X,$$

goal: find a function $f$

$$f : X \mapsto \{0, 1\}.$$

• Approach: penalty $V$ for labeled data + penalty along the topology of $X$.

$$f = \arg\min_{f,\ \|f\|_A = c} \ \frac{1}{\ell} \sum_{i=1}^{\ell} V(x_i, y_i, f) + \gamma_I \|f\|_I^2.$$


SS Learning by Manifold Regularization

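• Solution:
  • For $V$, an approximation: fix $f(x_i) = y_i$, $i = 1, \dots, \ell$
  • For $\|\cdot\|_I$, the graph Laplacian $L = D - W$ with adjacency matrix $W = [W_{ij}]$ and $D = \mathrm{diag}\bigl(\sum_{j=1}^{\ell+u} W_{ij}\bigr)$
  • Typically, $W_{ij} = \exp\bigl(-\|x_i - x_j\|^2 / \sigma\bigr)$ if $x_i, x_j$ are in the neighborhood

$$f = \arg\min_{f,\ \|f\|_A = c} \ \frac{1}{\ell} \sum_{i=1}^{\ell} V(x_i, y_i, f) + \gamma_I \sum_{i,j=1}^{\ell+u} \bigl(f(x_i) - f(x_j)\bigr)^2 W_{ij}$$

$$= \arg\min_{f,\ \|f\|_A = c} \ \tilde{f}^T L \tilde{f} = \bigl[f_\ell;\ -L_3^{-1} L_2 f_\ell\bigr],$$

where $\tilde{f} = \bigl(f(x_1), \dots, f(x_{\ell+u})\bigr)^T$ and $f_\ell = \bigl(f(x_1), \dots, f(x_\ell)\bigr)^T$.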
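The closed form $-L_3^{-1} L_2 f_\ell$ can be checked on a tiny graph. The sketch below rests on assumptions the slide does not spell out: the labeled points come first, $L_2$ is the unlabeled-by-labeled block of $L$, $L_3$ is the unlabeled-by-unlabeled block, and the norm constraint $\|f\|_A = c$ is ignored. The 1-D points, `sigma`, `radius`, and all function names are hypothetical.

```python
import math


def gaussian_weights(points, sigma, radius):
    """W_ij = exp(-|x_i - x_j|^2 / sigma) for neighbors within `radius`, else 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = abs(points[i] - points[j])
            if i != j and dist <= radius:
                w[i][j] = math.exp(-dist ** 2 / sigma)
    return w


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def harmonic_labels(w, labels):
    """Return f on the unlabeled points: f_u = -L_3^{-1} L_2 f_l, with L = D - W."""
    n, l = len(w), len(labels)
    lap = [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
           for i in range(n)]
    l3 = [row[l:] for row in lap[l:]]  # unlabeled x unlabeled block
    l2 = [row[:l] for row in lap[l:]]  # unlabeled x labeled block
    rhs = [-sum(l2[r][c] * labels[c] for c in range(l)) for r in range(n - l)]
    return solve(l3, rhs)


# Labeled points first (x = 0 with y = 0, x = 3 with y = 1), then two unlabeled ones.
points = [0.0, 3.0, 1.0, 2.0]
W = gaussian_weights(points, sigma=1.0, radius=1.5)
f_unlabeled = harmonic_labels(W, labels=[0.0, 1.0])
```

On this chain the unlabeled values interpolate between the two labels (one third and two thirds), and thresholding at 0.5 assigns each unlabeled point the label of its nearer labeled neighbor.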


Semi-supervised Shrinkage

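• Select a background shrinkage $\delta$, e.g., hard thresholding, with thresholds $\lambda_1$ and $\lambda_2$; $\lambda = \sigma \sqrt{(2 + \tau) \log N}$

• The SS estimator $\theta^{SS}$ for $\theta$ is given by

$$\theta^{SS}(d \mid \delta, \lambda_1, \lambda_2) =
\begin{cases}
0 & \text{if } |d| < \lambda_1, \\
\delta(d)\, f(\tilde{d}) & \text{if } \lambda_1 \le |d| \le \lambda_2, \\
\delta(d) & \text{if } |d| > \lambda_2
\end{cases}$$

→ Sort the coefficients into labeled ones (included, excluded) and unlabeled ones.

→ Determine the identities of the unlabeled ones by “semi-supervised learning.”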
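The three-zone rule can be written as a small function. This sketch assumes that the classifier output $f$ is available as a 0/1 label per coefficient (called `keep` here, a hypothetical name) and that the background rule $\delta$ is hard thresholding at $\lambda_1$, so that $\delta(d) = d$ inside the middle zone. Neither choice is confirmed by the slides.

```python
def hard_shrink(d, lam):
    """Background rule delta: hard thresholding at lam."""
    return d if abs(d) > lam else 0.0


def ss_shrink(d, keep, lam1, lam2, delta):
    """theta^SS(d | delta, lam1, lam2) for one coefficient d.

    `keep` is the 0/1 label assigned to d by the semi-supervised classifier;
    it only matters when lam1 <= |d| <= lam2.
    """
    if abs(d) < lam1:
        return 0.0              # clearly noise: excluded
    if abs(d) <= lam2:
        return delta(d) * keep  # uncertain zone: the classifier decides
    return delta(d)             # clearly signal: background rule applies


# Hypothetical coefficients and classifier labels.
lam1, lam2 = 1.0, 2.0
coeffs = [0.4, 1.5, 1.5, 3.0]
labels = [0, 1, 0, 1]
shrunk = [ss_shrink(d, k, lam1, lam2, lambda x: hard_shrink(x, lam1))
          for d, k in zip(coeffs, labels)]
```

The two coefficients of size 1.5 fall in the uncertain zone and are treated differently according to their labels, which is the only place where the semi-supervised step changes the outcome.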


Illustration of δ in SS Shrinkage

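[Figure: two panels plotting the shrinkage rule against $d$ over $[-6, 6]$; left: $\delta^{SS}(d \mid \delta^{hard})$, i.e., $\delta^{SS}$ based on $\delta^{hard}$; right: $\delta^{SS}(d \mid \delta^{semisoft})$, i.e., $\delta^{SS}$ based on $\delta^{semisoft}$.]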


Optimality of the SS Rule

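• The oracle risk of diagonal projection (DP), which is unachievable, becomes

$$R(\hat{\theta}, \theta) = E\,\|\hat{\theta} - \theta\|_2^2 = \sum_{i=1}^{N} \min(\theta_i^2, \sigma^2) := R_{oracle}(DP, \theta)$$

Theorem

The SS estimator $\delta^{SS}(d \mid \delta^{hard}, \lambda_1, \lambda_2)$, as defined above, satisfies the inequality

$$R(\delta^{SS}, \theta) \le L \bigl(\sigma^2 + R_{oracle}(DP, \theta)\bigr)$$

for an $L \sim \log N$ and all $\theta \in \mathbb{R}^N$, where $\lambda_1$ and $\lambda_2$ are sufficiently close to $\sigma \sqrt{2 \log N}$.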
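To make the two quantities in the inequality concrete, the sketch below evaluates the oracle risk and the right-hand side of the bound for a hypothetical sparse vector. The slides only state $L \sim \log N$, so the value $L = 2 \log N + 1$ used here is an illustrative constant of that order, not the one from the theorem.

```python
import math


def oracle_risk(theta, sigma):
    """R_oracle(DP, theta) = sum_i min(theta_i^2, sigma^2)."""
    return sum(min(t * t, sigma * sigma) for t in theta)


def risk_bound(theta, sigma, big_l):
    """Right-hand side L * (sigma^2 + R_oracle(DP, theta)) of the inequality."""
    return big_l * (sigma ** 2 + oracle_risk(theta, sigma))


# Hypothetical sparse coefficient vector: two large entries, six zeros.
theta = [5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sigma = 1.0
r_oracle = oracle_risk(theta, sigma)
bound = risk_bound(theta, sigma, big_l=2 * math.log(len(theta)) + 1)
```

Each large coefficient contributes $\sigma^2$ to the oracle risk and each zero contributes nothing, so sparse signals have a small oracle risk and therefore a small bound.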


Simulations: Noised Signals

[Figure: six noisy test signals on [0, 1]: Bumps, Blocks, HeaviSine, Doppler, Piecewise-Regular, and Piecewise-Polynomial.]


Simulations: After Shrinkage

[Figure: the six test signals on [0, 1] after shrinkage: Bumps, Blocks, HeaviSine, Doppler, Piecewise-Regular, and Piecewise-Polynomial.]


Comparisons with the Background Shrinkages

• Comparison (AMSE ratio) of SS with the background thresholding it is based on

• Ratio < 1 means SS performed better than its background thresholding on average

[Figure: three panels over indices 1 to 6; left: AMSE ratio for SNR = 3, Hybrid-SureShrink, N = 1024; middle: AMSE ratio for SNR = 5, hard shrinkage that minimizes AMSE, N = 2048; right: threshold levels of $\lambda_1$ and $\lambda_2$ in $\tau$ (hard threshold vs. SS threshold).]
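The ratio reported on this slide can be computed as follows. The slides do not define AMSE, so the sketch assumes it is the mean squared error averaged over repeated simulation runs. The toy signal and the estimates are hypothetical.

```python
def mse(estimate, truth):
    """Mean squared error between one estimated signal and the true signal."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def amse(estimates, truth):
    """Average MSE over repeated simulation runs."""
    return sum(mse(est, truth) for est in estimates) / len(estimates)


def amse_ratio(ss_estimates, background_estimates, truth):
    """AMSE of the SS rule divided by the AMSE of its background rule."""
    return amse(ss_estimates, truth) / amse(background_estimates, truth)


# Hypothetical toy data: a true signal and two runs of each method.
truth = [0.0, 1.0, 1.0, 0.0]
background_runs = [[0.2, 1.0, 1.0, 0.0], [0.0, 0.8, 1.0, 0.0]]
ss_runs = [[0.1, 1.0, 1.0, 0.0], [0.0, 0.9, 1.0, 0.0]]
ratio = amse_ratio(ss_runs, background_runs, truth)
```

A ratio below 1, as in this toy case, corresponds to the SS rule having the smaller average error.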


Relationship to Other Measurements

Table: Comparisons (AMSE ratio) of the SS rule with various background shrinkages. Underlined numbers mean SS performed better than its background thresholding.

             ABE     SS Rule   BAMS    SS Rule   Hybrid   SS Rule   VisuShrink   SS Rule
Bumps        1.934   2.063     1.951   1.948     2.600    2.532     2.939        2.702
Blocks       1.008   .9196     .9093   .9071     2.031    1.818     1.148        1.009
HeaviSine    .7119   .5189     .4071   .4072     .4004    .3836     .5131        .5497
Doppler      1.018   .8907     .7964   .7955     1.238    1.185     1.076        1.053
P.Reg.       1.341   1.296     1.118   1.117     1.621    1.565     1.654        1.615
P.Poly.      1.756   1.832     1.555   1.555     2.105    2.058     2.641        2.517


• Conclusion

Data mining techniques + feature extraction on domains of both original time and wavelet

Examples:
• Dependence-based dimensionality reduction
• Semi-supervised shrinkage rules

Now what?


• Research plan

[Diagram: Data Mining and Time Series; IT Service, BT Service, and Knowledge Service; Service Quality Engineering, Information Technology, Knowledge engineering, and Business intelligence.]

• Data mining in IT/BT/Knowledge services
• Systematic quality management, delivery, integration of IT/BT/Knowledge services


Thank you!