Posted on 15-Jul-2020
Bayesian Inference Technique for Data Mining for Yield Enhancement in Semiconductor Manufacturing Data
Presenter: M. Khakifirooz
Co-authors: C-F Chien, Y-J Chen
National Tsing Hua University
ISMI 2015, 16th-18th Oct.
KAIST, Daejeon, Korea
Outline
The Purpose of Bayesian Inference
Data Structure provided by Data Model
Data Analysis Approach
•Bayesian Variable Selection (BVS)
•Data Clearance
•Yield Classification
Conclusive Research Framework
Final Decision Table
Conclusion & Path Forward
Bayesian Inference
Naïve Bayesian Classifier
Gaussian Bayesian Classifier
…
Bayesian Networks
Learning Curve
The Purpose of Bayesian Inference
Human Experience + System Analysis
Yield Learning Curve of Semiconductor Manufacturing:
In addition to data analytics, cumulative engineering training and experience significantly enhance yield improvement (Effron (1996), Tobin et al. (1999)).
The Purpose of Bayesian Inference
Data Structure provided by Data Model
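$i = 1, \dots, M$: number of process stages
$N$: sample size
$1 \le k_i \le N$: number of specific tools at each stage
$n_{ij},\ j = 1, \dots, k_i$: frequency of each specific tool
$1 \le P_{n_{ij}} \le n_{ij}$: number of existing chambers for each tool
$p_l,\ l = 1, \dots, P_{n_{ij}}$: frequency of each existing chamber

$$N = \sum_{j=1}^{k_i} \sum_{l=1}^{P_{n_{ij}}} p_l, \qquad N \cdot M = \sum_{i=1}^{M} \sum_{j=1}^{k_i} \sum_{l=1}^{P_{n_{ij}}} p_l$$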
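The counting identities above can be checked on a toy lot history. This is a sketch under stated assumptions: each wafer is recorded as a hypothetical dict mapping a stage name to the (tool, chamber) pair it ran on, and the values reuse the three-observation tool/chamber example from the tables in this deck. For every stage the chamber frequencies $p_l$ must add up to $N$, so summing over all $M$ stages gives $N \cdot M$.

```python
from collections import Counter

# Toy history (hypothetical layout): the (tool, chamber) pair each of the
# N wafers visited at each of the M process stages.
history = [
    {"stage1": ("Tool1", "Ch1"), "stage2": ("Tool2", "Ch2")},
    {"stage1": ("Tool1", "Ch2"), "stage2": ("Tool1", "Ch1")},
    {"stage1": ("Tool2", "Ch1"), "stage2": ("Tool2", "Ch2")},
]
stages = ["stage1", "stage2"]
N, M = len(history), len(stages)


def stage_counts(stage):
    """Return n_ij (wafers per tool) and p_l (wafers per tool-chamber pair)."""
    n_ij = Counter(w[stage][0] for w in history)
    p_l = Counter(w[stage] for w in history)
    return n_ij, p_l


total = 0
for stage in stages:
    n_ij, p_l = stage_counts(stage)
    k_i = len(n_ij)                                  # tools used at this stage
    assert 1 <= k_i <= N
    for tool, n in n_ij.items():
        P = sum(1 for (t, _) in p_l if t == tool)    # chambers used on this tool
        assert 1 <= P <= n
    assert sum(p_l.values()) == N                    # N = sum_j sum_l p_l
    total += sum(p_l.values())

assert total == N * M                                # N*M = sum_i sum_j sum_l p_l
print(N, M, total)
```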
Response variable: %Yield (continuous)
Explanatory variables: Stages (tools-chambers) (nominal)
Stages (process time) (continuous)
Nominal Variables
Obs.   var1   var2
n1     a1     a2
n2     a1     b2
n3     b1     NA

Dummy Variables
Obs.   var1-a1   var1-b1   var2-a2   var2-b2
n1     1         0         1         0
n2     1         0         0         1
n3     0         1         0         0
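The nominal-to-dummy expansion in these two tables can be sketched in plain Python. The helper `one_hot` is hypothetical (written for illustration, not taken from the source); it names each dummy `<var>-<level>` and gives a missing value (`None`, the "NA" cell) a 0 in every dummy of that variable, which reproduces the table. In practice a library routine such as pandas `get_dummies` does the same job.

```python
def one_hot(rows, columns):
    """Expand nominal columns into 0/1 dummy columns named '<var>-<level>'.

    A missing value (None) gets 0 in every dummy of its variable.
    """
    levels = {c: sorted({r[c] for r in rows if r[c] is not None}) for c in columns}
    names = [f"{c}-{lv}" for c in columns for lv in levels[c]]
    encoded = [
        [1 if r[c] == lv else 0 for c in columns for lv in levels[c]]
        for r in rows
    ]
    return names, encoded


rows = [
    {"var1": "a1", "var2": "a2"},
    {"var1": "a1", "var2": "b2"},
    {"var1": "b1", "var2": None},   # the "NA" cell
]
names, encoded = one_hot(rows, ["var1", "var2"])
print(names)   # ['var1-a1', 'var1-b1', 'var2-a2', 'var2-b2']
for row in encoded:
    print(row)
```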
Data Structure provided by Data Model
Yield    stage 1             stage 2
obs. 1   Tool 1              Tool 2
obs. 2   Tool 1              Tool 1
obs. 3   Tool 2              Tool 2

Yield    stage 1             stage 2
obs. 1   Chamber 1           Chamber 2
obs. 2   Chamber 2           Chamber 1
obs. 3   Chamber 1           Chamber 2

Yield    stage 1             stage 2
obs. 1   Tool 1. Chamber 1   Tool 2. Chamber 2
obs. 2   Tool 1. Chamber 2   Tool 1. Chamber 1
obs. 3   Tool 2. Chamber 1   Tool 2. Chamber 2

Yield    stage 1             stage 2
obs. 1   Date 1.1            Date 1.2
obs. 2   Date 2.1            Date 2.2
obs. 3   Date 3.1            Date 3.2

Yield    s1.T1.Ch1   s1.T1.Ch2   s1.T2.Ch1   s2.T2.Ch2   s2.T1.Ch1
obs. 1   1           0           0           1           0
obs. 2   0           1           0           0           1
obs. 3   0           0           1           1           0

Yield    s1.T1.Ch1   s1.T1.Ch2   s1.T2.Ch1   s2.T2.Ch2   s2.T1.Ch1
obs. 1   Date 1.1    0           0           Date 1.2    0
obs. 2   0           Date 2.1    0           0           Date 2.2
obs. 3   0           0           Date 3.1    Date 3.2    0
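The last two tables combine stage, tool and chamber into one dummy per occurring combination, then replace each 1 with the process time. A minimal sketch, assuming the per-observation assignments are held as lists of (tool, chamber) pairs (a hypothetical layout) and using the "Date i.j" strings as stand-ins for timestamps. Columns are ordered by first appearance and then grouped by stage; Python's `sorted` is stable, so the within-stage order is kept.

```python
# Toy example from the tables above: (tool, chamber) per stage, and the
# matching process times ("Date i.j" stands in for a real timestamp).
assign = [
    [("T1", "Ch1"), ("T2", "Ch2")],
    [("T1", "Ch2"), ("T1", "Ch1")],
    [("T2", "Ch1"), ("T2", "Ch2")],
]
times = [
    ["Date 1.1", "Date 1.2"],
    ["Date 2.1", "Date 2.2"],
    ["Date 3.1", "Date 3.2"],
]

# Every (stage, tool, chamber) that occurs, in order of first appearance,
# then grouped by stage (sorted() is stable).
seen = []
for row in assign:
    for stage, (tool, chamber) in enumerate(row, start=1):
        if (stage, tool, chamber) not in seen:
            seen.append((stage, tool, chamber))
columns = sorted(seen, key=lambda col: col[0])
labels = [f"s{s}.{t}.{c}" for s, t, c in columns]

# 0/1 indicator matrix, and the same matrix with process times in place of 1.
indicator = [[1 if row[s - 1] == (t, c) else 0 for s, t, c in columns] for row in assign]
timed = [
    [times[i][s - 1] if row[s - 1] == (t, c) else 0 for s, t, c in columns]
    for i, row in enumerate(assign)
]
print(labels)
for r in timed:
    print(r)
```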
Data Structure provided by Data Model
Obs.   var1-a1   var1-b1   var1-c1
n1     1         0         0
n2     0         0         1
n3     0         1         0

Pr(i-th variable selected): 1/3, 1/3, 1/3
$(\text{var1-}a_1, \text{var1-}b_1, \text{var1-}c_1) \sim \mathrm{Multinomial}\left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$
[Figure: probability simplex with vertices (1,0,0) = var1-a1, (0,1,0) = var1-b1, (0,0,1) = var1-c1]
To randomly pick a point in this space, we need a continuous distribution over the multinomial probabilities (posterior distribution): the Dirichlet distribution.
Selection probability based on engineer experience
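The simplex picture can be made concrete with a small sampler. This is a sketch, assuming a Dirichlet prior over the three level-selection probabilities with hypothetical concentration parameters `alpha`: a Dirichlet draw is obtained by normalising independent Gamma draws (`random.gammavariate`), and a Multinomial(1; p) draw picks one level (`random.choices`). The last lines show the standard Dirichlet-multinomial conjugate update, posterior = Dirichlet(alpha + counts), with illustrative counts.

```python
import random


def sample_dirichlet(alpha, rng):
    """One draw from Dirichlet(alpha): normalise independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


levels = ["var1-a1", "var1-b1", "var1-c1"]
# Hypothetical prior weights: equal values match the uniform 1/3 selection
# probabilities; a larger entry would encode an engineer's prior belief.
alpha = [1.0, 1.0, 1.0]
rng = random.Random(0)

probs = sample_dirichlet(alpha, rng)            # a random point on the simplex
pick = rng.choices(levels, weights=probs)[0]    # one Multinomial(1; probs) draw

# Conjugacy: after observing selection counts (illustrative only),
# the posterior is Dirichlet(alpha + counts).
counts = [5, 1, 0]
alpha_post = [a + c for a, c in zip(alpha, counts)]
post_mean = [a / sum(alpha_post) for a in alpha_post]
print(probs, pick, post_mean)
```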
Critical Phenomena:
i. High dimensionality caused by transforming categorical variables into dummies
ii. Multicollinearity caused by the nature of dummy variables
iii. A complicated posterior distribution, which makes direct variable selection hard
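Item ii can be checked directly: with an intercept, a full set of level dummies sums to the intercept column, so the design matrix loses rank (the "dummy variable trap"). A plain-Python sketch with a hypothetical exact-fraction `rank` helper; the five observations extend the var1 example above for illustration only. Dropping one level as the reference category restores full column rank.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gauss-Jordan elimination on exact fractions (no rounding error)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(rows):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r


# Columns: intercept, var1-a1, var1-b1, var1-c1 (hypothetical observations).
X_full = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
row_sums = [sum(r[1:]) for r in X_full]          # dummies add up to 1 in every row
X_drop = [r[:3] for r in X_full]                 # drop var1-c1 as the reference level

print(rank(X_full), len(X_full[0]))  # rank 3 < 4 columns: perfectly collinear
print(rank(X_drop), len(X_drop[0]))  # rank 3 == 3 columns: full column rank
```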
Remedy:
Approximate inference with sampling: use random sampling (MCMC techniques: Gibbs sampler, Metropolis-Hastings, …) to approximate the distribution and select the significant explanatory variables.
Data Analysis Approach
Data Analysis Approach: Gibbs Sampler
Suppose $(x_1, x_2) \sim \Pr(x_1, x_2)$.
Begin with initial values $x_1^{(0)}, x_2^{(0)}$.
Sample at iteration $t$ as follows:
Iteration   Sample $x_1$                                   Sample $x_2$
$t$         $x_1^{(t)} \sim \Pr(x_1 \mid x_2^{(t-1)})$     $x_2^{(t)} \sim \Pr(x_2 \mid x_1^{(t)})$
Iterate the above step until the sample values have the same distribution as if they were sampled from the true joint posterior distribution.
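The two-step update can be sketched on a target whose answer is known. This is an illustration only, assuming a standard bivariate normal with correlation `rho` as a stand-in for the posterior: each full conditional is Gaussian, $x_1 \mid x_2 \sim N(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$, so the sampler alternates two `random.gauss` draws and discards a burn-in period before the chain settles.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for (x1, x2) ~ N(0, [[1, rho], [rho, 1]])."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)      # conditional standard deviation
    x1, x2 = 0.0, 0.0                   # initial values x1^(0), x2^(0)
    samples = []
    for t in range(1, n_iter + 1):
        x1 = rng.gauss(rho * x2, sd)    # x1^(t) ~ Pr(x1 | x2^(t-1))
        x2 = rng.gauss(rho * x1, sd)    # x2^(t) ~ Pr(x2 | x1^(t))
        if t > burn_in:                 # discard early, unconverged draws
            samples.append((x1, x2))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_iter=20000, burn_in=1000)
n = len(samples)
m1 = sum(a for a, _ in samples) / n
m2 = sum(b for _, b in samples) / n
cov = sum((a - m1) * (b - m2) for a, b in samples) / n
v1 = sum((a - m1) ** 2 for a, _ in samples) / n
v2 = sum((b - m2) ** 2 for _, b in samples) / n
corr = cov / math.sqrt(v1 * v2)
print(round(m1, 2), round(m2, 2), round(corr, 2))   # should be near 0, 0, 0.8
```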
Based on the frequency of visits, the most probable variables are selected.
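Selection by visit frequency can be sketched with a Gibbs sampler over binary inclusion indicators $\gamma_j$ (1 = variable selected). The target here is a hypothetical toy log-posterior, $\log \Pr(\gamma) = \sum_j a_j \gamma_j + \sum_{j<k} b_{jk} \gamma_j \gamma_k + \text{const}$, not the model used in the study; its full conditional is $\operatorname{logit} \Pr(\gamma_j = 1 \mid \gamma_{-j}) = a_j + \sum_{k \ne j} b_{jk} \gamma_k$. Variables whose inclusion frequency exceeds 0.5 are kept (an assumed threshold).

```python
import math
import random

# Hypothetical weights; b is symmetric, and the negative b[0][1] makes
# variables 0 and 1 compete (e.g. two collinear dummies).
a = [2.0, 0.5, -2.0, -0.5]
b = [[0.0, -1.0, 0.0, 0.0],
     [-1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]


def gibbs_inclusion(a, b, n_iter, burn_in, seed=0):
    """Update each gamma_j from its full conditional; return visit frequencies."""
    rng = random.Random(seed)
    p = len(a)
    gamma = [0] * p
    visits = [0] * p
    for t in range(n_iter):
        for j in range(p):
            logit = a[j] + sum(b[j][k] * gamma[k] for k in range(p) if k != j)
            gamma[j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-logit)) else 0
        if t >= burn_in:
            for j in range(p):
                visits[j] += gamma[j]
    kept = n_iter - burn_in
    return [v / kept for v in visits]


freq = gibbs_inclusion(a, b, n_iter=5000, burn_in=500)
selected = [j for j, f in enumerate(freq) if f > 0.5]
print([round(f, 2) for f in freq], selected)
```

For these weights the exact inclusion probabilities are about 0.82, 0.42, 0.12 and 0.38, so only variable 0 should be selected.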
Data Analysis Approach: Data Clearance
When X is categorical (dummy variables) and Y is a quantitative variable:
- parametric or non-parametric?
- dependent or independent?
- unbalanced classes?
Yield class    Yield value (%)               Representative var.
Bad Yield      < 53.12                       1
Middle Yield   53.12 ≤ yield ≤ 57.51         ignore
Good Yield     > 57.51                       0
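The cut points in this table translate into a small labelling function. A sketch only: `yield_class` is a hypothetical name, the thresholds are the table's, and the sample yields are made up. Wafers in the middle band return `None` and are dropped before analysis.

```python
def yield_class(y, low=53.12, high=57.51):
    """Map a %yield value to the binary response used for data clearance.

    1 = bad yield (< low), 0 = good yield (> high),
    None = middle band (low <= y <= high), which is ignored.
    """
    if y < low:
        return 1
    if y > high:
        return 0
    return None


yields = [50.0, 53.12, 55.3, 57.51, 60.2]          # illustrative values
labels = [yield_class(y) for y in yields]
kept = [(y, c) for y, c in zip(yields, labels) if c is not None]
print(labels)  # [1, None, None, None, 0]
```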
Data Analysis Approach: Data Clearance
                        Variable II
                        Level a      Level b
Variable I   Level c    $f_{ca}$     $f_{cb}$
             Level d    $f_{da}$     $f_{db}$
If both var. I and var. II are explanatory:
- test the interchangeability of measures
- measure the degree of homogeneity
If var. I is explanatory and var. II is the response:
- measure the reliability of an instrument (test/scale)
- measure objectivity, or lack of bias
MEASUREMENT of AGREEMENT, W. S. Robinson (1957)
Cohen’s Kappa 𝓚
𝒦 < 0, "No agreement"
0 ≤ 𝒦 < 0.2, "Slight agreement"
0.2 ≤ 𝒦 < 0.4, "Fair agreement"
0.4 ≤ 𝒦 < 0.6, "Moderate agreement"
0.6 ≤ 𝒦 < 0.8, "Substantial agreement"
0.8 ≤ 𝒦 ≤ 1, "Almost perfect agreement"
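Cohen's kappa compares the observed agreement $p_o$ (share of counts on the diagonal) with the agreement $p_e$ expected by chance from the row and column totals: $\mathcal{K} = (p_o - p_e)/(1 - p_e)$. A plain-Python sketch for a square contingency table, paired with the interpretation bands listed above; the 2 x 2 counts are hypothetical, and `p_e = 1` (all counts in one cell) would make the ratio undefined.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table of counts; the diagonal holds agreeing pairs."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[i][i] for i in range(k)) / n                    # observed agreement
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2   # chance agreement
    return (p_o - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Map kappa to the interpretation bands on this slide."""
    if kappa < 0:
        return "No agreement"
    for upper, label in [(0.2, "Slight agreement"), (0.4, "Fair agreement"),
                         (0.6, "Moderate agreement"), (0.8, "Substantial agreement")]:
        if kappa < upper:
            return label
    return "Almost perfect agreement"


# Hypothetical 2 x 2 counts for two binary (dummy) variables.
table = [[30, 5],
         [5, 10]]
kappa = cohens_kappa(table)
print(round(kappa, 3), agreement_label(kappa))
```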
Research Framework (I)
[Flowchart: A Bayesian Framework for Semiconductor Manufacturing Data. Phases: Problem Definition; Data Preparation; Data Mining & Key Factor Screening. Steps: Data Integration; Dummy Variable Construction for Integrated Variables (1460 var.); Cohen's Kappa Statistics for each pair of input variables (branches: Agreement / No Agreement); Wrap the associated variables; Assign Cutting Point & Bad/Middle/Good Wafers]
Agreement class               Number of variable pairs
Almost perfect agreement      3
Substantial agreement         109
Moderate agreement            1,764
Fair agreement                24,539
Slight agreement              280,081
No agreement                  758,574
THE CLASS DISTRIBUTION FOR THE KAPPA TEST FOR EACH PAIR OF INPUT VARIABLES
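A quick arithmetic check ties this table to the 1,460 dummy variables: testing every unordered pair of input variables gives $\binom{1460}{2} = 1460 \cdot 1459 / 2$ kappa statistics, which should equal the total of the class counts. The sketch uses only the table's numbers and `math.comb`.

```python
from math import comb

counts = {
    "Almost perfect agreement": 3,
    "Substantial agreement": 109,
    "Moderate agreement": 1_764,
    "Fair agreement": 24_539,
    "Slight agreement": 280_081,
    "No agreement": 758_574,
}
n_vars = 1460                  # dummy variables after construction
n_pairs = comb(n_vars, 2)      # unordered pairs of input variables
print(n_pairs, sum(counts.values()))  # 1065070 1065070
```

Both totals are 1,065,070, so the table accounts for every pair; roughly 71% of the pairs fall in the "No agreement" class.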
Research Framework (II)
[Flowchart, Research Framework (II). Phases: Data Mining & Key Factor Screening; Model Construction, Evaluation & Interpretation. Steps: Cohen's Kappa Statistics for each pair of X & Y; Data Clearance 𝒦 ≤ 0.2 (branches: No Agreement / Agreement); BVS via Gibbs Sampler; GLM Construction with Gaussian distribution & Repeated Random Sub-sampling Validation; A Comparison to the Wrapped Variables; Define Abnormal Devices & Time]
Model          RMSE                       Adjusted R-squared
               Min     Median   Max       Min     Median   Max
Gibbs + GLM    1.842   2.653    2.841     0.046   0.371    0.711
GBM + GLM      2.534   3.051    3.332     0.000   0.053    0.337
RF + GLM       2.268   2.838    3.660     0.016   0.293    0.507
GLM            7.951   34.60    139.8     0.000   0.029    0.214
Number of resamples: 20; number of iterations: 2
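Repeated random sub-sampling validation can be sketched as follows. Assumptions, all labelled: a Gaussian GLM with identity link is ordinary least squares, here with a single predictor; each of the 20 resamples uses a hypothetical 75/25 train/test split; RMSE and adjusted R-squared are computed on the held-out part and summarised as min/median/max like the table. The data are synthetic, not the study's.

```python
import random
import statistics


def fit_ols(x, y):
    """Least-squares intercept and slope (Gaussian GLM, identity link, one predictor)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse_adj_r2(x, y, intercept, slope, n_predictors=1):
    """RMSE and adjusted R-squared of the fitted line evaluated on (x, y)."""
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in resid)
    my = statistics.fmean(y)
    sst = sum((yi - my) ** 2 for yi in y)
    n = len(y)
    r2 = 1 - sse / sst
    adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return (sse / n) ** 0.5, adj


def min_median_max(values):
    return min(values), statistics.median(values), max(values)


def repeated_subsampling(x, y, n_resamples=20, train_frac=0.75, seed=0):
    """Refit on a fresh random train/test split each time; summarise test metrics."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rmses, adjs = [], []
    for _ in range(n_resamples):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        b0, b1 = fit_ols([x[i] for i in train], [y[i] for i in train])
        rmse, adj = rmse_adj_r2([x[i] for i in test], [y[i] for i in test], b0, b1)
        rmses.append(rmse)
        adjs.append(adj)
    return min_median_max(rmses), min_median_max(adjs)


# Synthetic data for illustration only: yield = 55 + 2*x + N(0, 1) noise.
gen = random.Random(1)
x = [gen.uniform(0, 5) for _ in range(200)]
y = [55 + 2 * xi + gen.gauss(0, 1) for xi in x]
rmse_summary, adj_summary = repeated_subsampling(x, y)
print("RMSE min/median/max:", rmse_summary)
print("Adj. R^2 min/median/max:", adj_summary)
```

Reporting the spread over resamples, rather than one split, is what makes the min/median/max columns in the table comparable across models.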
Decision Graph
[Figure: decision graph; legend: High Yield, Middle Yield, Low Yield]
Factors                                   Date (Bad)                                  Date (Good)
Stage10 - Tool2 - Chamber3                before 8/29/2014 2:32                       after 8/29/2014 12:50
Stage12 - Tool2 - Chamber1                between 8/30/2014 3:26 & 8/30/2014 3:43     before 8/29/2014 10:55
Stage12 - Tool2 - Chamber4                after 8/29/2014 7:36 till 8/30/2014 3:44    before 8/29/2014 7:36
Stage13 - Tool5 - Chamber2                -                                           generally affected the high yield
Stage17 - Tool2 - Chamber2                after 8/30/2014 12:21                       before 8/30/2014 10:37
Stage23 - Tool3 - Chamber2                -                                           generally affected the high yield
Stage44 - Tool7 - Chamber2 and Chamber3   at 9/3/2014                                 at 9/1/2014
Stage49 - Tool1 - Chamber4                at 9/3/2014                                 at 9/2/2014
Stage57 - Tool1 - Chamber3                -                                           generally affected the high yield
Decision Table
Based on the empirical results, we validate that the proposed approach is practically viable: adding domain knowledge and experience to the system could improve the results.
One way to use domain knowledge is to restrict conjunctions in rules to tools, chambers, and steps that are related to each other and occur within a reasonable time frame.
The data are not sampled from a stationary population; hence, over time the results may change significantly, or some empirical findings may be rejected based on engineers' domain knowledge, which does not mean that the result is incorrect.
The result may be a proxy for one or more events occurring elsewhere or in other time periods; hence, a simulation study is an essential tool for evaluating the accuracy of the proposed method.
Conclusion & Path Forward