Towards root-cause analysis in compression ... - IBM Research
Machine Learning group
© Copyright IBM Corporation 2006
IBM Haifa Labs
Towards root-cause analysis in compression-based similarity
Justin Wong, Yiheng Xu, Elad Yom-Tov & Michal Rosen-Zvi
December 2006
Similarity by compression
• If the tree for compressing one signal is good for compressing another signal, the two must be similar
• Therefore, the compression-based similarity measure:
$$S = \frac{\mathrm{size}(A \mid \mathrm{Tree}_B)}{\min\left(\mathrm{size}(A \mid \mathrm{Tree}_A),\ \mathrm{size}(B \mid \mathrm{Tree}_B)\right)}$$
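The idea of compressing one signal with another signal's model can be illustrated with a rough stand-in. The slides do not say which compressor or tree structure is meant, so this sketch substitutes zlib's preset dictionary (`zdict`) for "the tree of B". The function names `size_given` and `tree_similarity` are hypothetical, and the numbers it produces are not comparable to those in the slides.

```python
import zlib


def size_given(signal: bytes, model_source: bytes) -> int:
    """Compressed size of `signal` when the compressor is primed with
    `model_source` as a preset dictionary (a stand-in for Tree_B)."""
    comp = zlib.compressobj(level=9, zdict=model_source)
    return len(comp.compress(signal) + comp.flush())


def tree_similarity(a: bytes, b: bytes) -> float:
    """size(A | Tree_B) / min(size(A | Tree_A), size(B | Tree_B)).

    Values near 1 mean B's model compresses A almost as well as a
    signal's own model does; larger values mean the signals differ more.
    """
    return size_given(a, b) / min(size_given(a, a), size_given(b, b))
```

With this stand-in, two signals that differ in only a few bytes should give a ratio close to 1, while unrelated signals should give a much larger ratio. Note that zlib only uses the last 32 KB of a preset dictionary, so the sketch is meaningful for short signals only.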
Similarity by compression
• The compression-based similarity measure: Normalized compression distance

$$S = \frac{\mathrm{size}(AB) - \min\left(\mathrm{size}(A),\ \mathrm{size}(B)\right)}{\max\left(\mathrm{size}(A),\ \mathrm{size}(B)\right)}$$

Compression-based measures have been proposed and applied in:
• DNA clustering
• Language hierarchies
• Optical character recognition
• Analysis of literature
• Plagiarism detection
• Classification of music
Rudi Cilibrasi and Paul M. B. Vitányi, "Clustering by Compression," IEEE Transactions on Information Theory, vol. 51, no. 4, April 2005, pp. 1523–1545.
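The normalized compression distance can be computed with any general-purpose compressor. The following is a minimal sketch that uses Python's standard `bz2` module. The helper names `csize` and `ncd` are chosen here for illustration, and concatenating the two byte strings is one simple way to form the joint input AB.

```python
import bz2


def csize(data: bytes) -> int:
    """Size in bytes of the bzip2-compressed input."""
    return len(bz2.compress(data))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two byte strings.

    (size(AB) - min(size(A), size(B))) / max(size(A), size(B))
    Smaller values indicate more similar inputs.
    """
    ca, cb, cab = csize(a), csize(b), csize(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)
```

Two inputs that share most of their content should yield a smaller distance than an input paired with unrelated data. Because real compressors add fixed overhead, the distance of an input to itself is small but not exactly zero.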
Example
• 10 noisy signals from two sensors; one has a 5% addition to the basic signal in samples 15-20
[Figure: the noisy signals plotted over samples 0 to 100, amplitude axis from -1.5 to 1.5]
Feature space
[Figure: two feature-space panels, titled "Simple PCA" and "1st PCA on compression features"]
Compression via general-purpose compression software
• Pros:
  - Simple to do ("black box")
  - Elegant
  - Works! (Author detection, DNA analysis, image processing, …)
• Cons:
  - No root-cause analysis
  - Computationally expensive
IBM’s East Fishkill 300mm Fab
Residual Gas Analysis process
• A quadrupole mass spectrometer, used for process monitoring and fault detection in semiconductor manufacturing.
• The main purpose of RGA systems is to detect contamination residue on the wafer as well as undesirable process variation.
• During degas process monitoring, the RGA measures the intensity of each mass within the pre-defined mass range.
• One full spectrum scan from mass 1 to mass 100 usually takes around 4-5 seconds.
• For a typical degas process, about 20 scans are recorded per wafer, and each scan consists of measurements of 100 masses. The intensity of each mass represents the concentration/partial pressure of its corresponding chemical species.
RGA data
Residual Gas Analysis (RGA) data
• Two datasets:
  - 211 wafers, 80 defective (37.9%)
  - 135 wafers, 12 defective (8.9%)
• Data consists of between 19 and 25 spectral scans per wafer, taken at 4-second intervals.
  - Compression allows using such variable-length data.
• Preprocessing:
  - All measurements below $10^{-6}$ were set to $10^{-6}$.
  - All measurements were then transformed logarithmically (due to the large span of values).
• Processing:
  - Compression using bzip2
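The preprocessing and compression steps above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the slides do not give the logarithm base (base 10 is assumed here) or how scans were serialized before compression (fixed-precision text, one scan per line, is assumed). The function names are hypothetical.

```python
import bz2
import math

FLOOR = 1e-6  # measurements below this value are raised to it


def preprocess(scans: list[list[float]]) -> list[list[float]]:
    """Floor each measurement at FLOOR, then take log10 (assumed base)."""
    return [[math.log10(max(v, FLOOR)) for v in scan] for scan in scans]


def serialize(scans: list[list[float]], digits: int = 3) -> bytes:
    """Assumed serialization: one scan per line, fixed-precision text."""
    lines = (" ".join(f"{v:.{digits}f}" for v in scan) for scan in scans)
    return "\n".join(lines).encode("ascii")


def compressed_size(scans: list[list[float]]) -> int:
    """bzip2-compressed size of a wafer's preprocessed scans."""
    return len(bz2.compress(serialize(preprocess(scans))))
```

Because the compressor consumes a byte string of any length, a wafer with 19 scans and a wafer with 25 scans can be handled by the same code without padding or truncation.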
Results
• Slightly over 88% of the wafers in dataset 1 and 97% of the wafers in dataset 2 were correctly classified
  - We used 10-fold cross-validation to validate results.
  - AdaBoost, 100 iterations, linear weak learner
  - First 10 eigenvectors used as features
• The feature space of the first dataset (right) still shows clusters. These clusters correlate strongly with the number of scans taken (that is, each cluster represents wafers with a certain number of scans).
• Therefore, it seems that the actual spectral values have a smaller effect than the number of scans.
[Figure: feature space of the first dataset; legend: Defective/Working]
Receiver Operating Characteristic Curve
[Figure: ROC curves, one panel each for Dataset 1 and Dataset 2]
Identify the first scan at which a wafer can be identified as defective
• Let N be the maximal scan number for the specific wafer
• For i = 1 to N:
  - Check similarity of wafers using scans 1 through i
  - If defective wafers are significantly different from working wafers, and the specific wafer is more similar to the defective wafers, stop.
• We generated a leave-one-out ROC using an SVM (libsvm) for each of the N scans. If the area under the ROC curve is large, defective and working wafers are significantly different.
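The separation test in the loop above can be sketched as follows. This is a simplified illustration, not the authors' procedure: the slides obtain scores from a leave-one-out SVM (libsvm), whereas here `scores_at` is a hypothetical callback that returns scores for defective and working wafers computed from scans 1 through i. The threshold of 0.9 is an arbitrary stand-in for a "large" area, and the additional check that the specific wafer is closer to the defective group is omitted.

```python
from typing import Callable, Optional, Sequence, Tuple

Scores = Tuple[Sequence[float], Sequence[float]]


def auc(defective: Sequence[float], working: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (defective, working) pairs in which the defective wafer scores
    higher; ties count as half."""
    wins = 0.0
    for d in defective:
        for w in working:
            if d > w:
                wins += 1.0
            elif d == w:
                wins += 0.5
    return wins / (len(defective) * len(working))


def first_separating_scan(
    max_scan: int,
    scores_at: Callable[[int], Scores],
    threshold: float = 0.9,  # assumed cutoff for a "large" area
) -> Optional[int]:
    """Return the first i in 1..max_scan at which the two groups
    separate, or None if they never do."""
    for i in range(1, max_scan + 1):
        defective, working = scores_at(i)
        if auc(defective, working) >= threshold:
            return i
    return None
```

An area of 0.5 means the scores do not distinguish the groups at all, and 1.0 means every defective wafer scores above every working wafer.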
Results
Identify the variable which indicates that a specific wafer is defective
• Can compression be used?
• No, because amplitudes matter most.
• We identify the most indicative variable by the ratio of the maximal amplitude of the mass (in any scan) to the median (across scans) of the maximal height of this mass
• This ratio is the severity level for each mass
• Tested on defective wafers in dataset 2, full agreement was found between current software and compression-based similarity
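The severity ratio can be sketched under one possible reading of the definition above: for a single wafer, each mass's largest intensity in any scan is divided by the median of that mass's intensity across the wafer's scans. That reading is an assumption, as are the function names, the use of raw positive intensities, and the mapping of list position to mass number (position 0 is taken to be mass 1).

```python
from statistics import median


def severity_levels(scans: list[list[float]]) -> list[float]:
    """Per-mass ratio of the peak intensity to the median intensity
    across the wafer's scans (assumed reading of the definition)."""
    n_masses = len(scans[0])
    levels = []
    for m in range(n_masses):
        heights = [scan[m] for scan in scans]
        levels.append(max(heights) / median(heights))
    return levels


def most_indicative_mass(scans: list[list[float]]) -> int:
    """Mass number (1-based, assumed) with the highest severity level."""
    levels = severity_levels(scans)
    return max(range(len(levels)), key=levels.__getitem__) + 1
```

A mass whose intensity spikes in a single scan gets a ratio well above 1, while a mass that stays flat across scans gets a ratio of 1.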
Identifying problematic manufacturing stages in lithographic tools
Prime bake
Photoresist spin coating
Soft bake (PAB)
Exposure
Post-exposure bake (PEB)
Puddle resist development
Post-develop bake (PDB)
Available measures:
• Critical dimension (CD) SEM measure
• Critical dimension (CD) scatterometry pre-etching measures
Critical dimension (CD) is measured to assess performance
Finding relevant features using regression in the baking process
Clear correlation between pre-exposure bake timing and CDs
Prediction of out of spec
Compression-based prediction provided the following categories:
• Baking
  - In module (not in process)
  - In process
• Between modules + in transition modules
• Other modules (coating/exposure)
5 modules x 2
2
Benefits from concentration in first part of the sequence that causes variations in CDs
Summary
• Compression is a useful method to measure similarity between many data types, without worrying about feature extraction
• Root-cause analysis remains a challenge, though possible in some cases
• Also, in some cases it may miss salient features of the data