FINAL SUBMISSION FORMAT INSTRUCTIONS FOR PROCEEDINGS OF BUSINESS ...
Using Data Mining Technology to Build a Quality Improvement System
Ruey-Shun Chen
Institute of Information Management, China University of Technology, Taiwan,
No. 56, Sec. 3, Shinglung Rd., Wenshan Chiu, Taipei City 116, Taiwan [email protected]
R. C. Wu and C. C. Chen
Institute of Information Management, National Chiao Tung University, Taiwan,
1001 Ta Hsueh Road, Hsinchu, Taiwan 300, ROC
ABSTRACT
Data mining technology helps enterprises make decisions efficiently. In this
paper, we describe the design and construction of a quality improvement system
that uses data warehouse and data mining technology to discover the most
significant variables in semiconductor packaging plants. By comparing the
classification results of the proposed methods, we set up an improvement system
that provides an efficient tool for analyzing data and detecting problems, with a
view to identifying the causes of problems and eventually enhancing the yield.
Moreover, several real-world problems are identified and solved through data
analysis in this research. The experimental results show that the predictions made
by decision tree analysis are more accurate than those made by the other methods,
i.e. neural network, Bayesian, clustering and association rules. The decision tree
algorithm is therefore better suited to detecting hidden patterns of problems in
the packaging industry, and its use can increase yields.
Keywords: data mining, information system, quality improvement
INTRODUCTION
Modern semiconductor manufacturing companies continue to play an important role
in meeting market demand for ever-increasing chip production. This paper shows how
the packaging flows run and identifies their critical problems in detail. Each
independent station captures, stores, integrates, and reports the data generated by
each of its machines. Drawing on our experience, we develop a system to analyze the
main variables (Alex et al., 2000; Michael et al., 1999). We also discuss
experiences related to the practical use of data mining as a tool to improve the
productivity of problem solving in yield enhancement. Enterprises engaged in keen
competition attach great importance to improving product quality and delivering
products on time to meet customers’ requirements; whoever obtains accurate
information faster and makes decisions sooner than rivals do will have a chance of
being successful (Robert, 2000). Moreover, we detect the hidden causes of product
quality problems and solve them. Another key issue that corporate information
systems face nowadays is how to capture correct information at the right time and
deliver it to the right executives. In short, an intelligent quality improvement
system based on data warehouse and data mining should be implemented with a view to
achieving the following four goals.
1. Constructing a data warehouse of semiconductor product quality problems from
legacy systems and establishing applicable procedures to analyze and solve the
quality problems.
2. Exploring the critical variables that cause product quality problems and
finding solutions for the vital variables of each problem.
3. Identifying the most accurate of the proposed algorithms.
4. Decreasing chip defects and increasing production yields.
LITERATURE REVIEW
2.1 Data warehouse
A data warehouse is the sum of all decision-support techniques. It assists
knowledge workers in making decisions better and faster (Jiawei Han and Micheline
Kamber, 2001), and it is the core of a decision-support system. A data warehouse
should not only function as a database but should also have the following four
features:
1. Integration – a data warehouse combines an enterprise’s information sources, including various computer systems, databases, and application programs. These information sources may be discrete and inconsistent.
2. Subject-oriented – data are combined as needed to answer questions raised by specific companies or organizations.
3. Time-variant – unlike conventional operational data, a data warehouse attaches great importance to dynamic data that vary with time (week, month, year) and to data acquired from outside the enterprise.
4. Non-volatile – once data are stored in a data warehouse, they are preserved and no longer changed, so they are read-only. New data accumulate over time and are continually added to the data warehouse for use by decision-makers.
In short, by creating a centralized data warehouse, using appropriate data analysis
tools, and quickly developing software that supports decision-making, a data
warehouse enables decision-makers to acquire the information they need at any time
and to use it as an important reference for their decisions (J. Ross Quinlan, 1993;
Surajit Chaudhuri, Umeshwar Dayal, 1997).
2.2 Data mining
Data mining is the process whereby knowledge is discovered in a database and then
implicit, previously unknown and potentially useful information is extracted from the
database (Frawley et al., 1991). It enables the discovery of potentially useful
information in voluminous information in order to provide references for decision-
makers. The whole process of data mining comprises data selection, preprocessing,
conversion, data analysis, and interpretation and evaluation (Yu, 1999).
Having defined data mining and its objective, we now look into the steps leading
to the discovery of knowledge. Kleissner (1998) suggests that a knowledge
discovery cycle should comprise the following four steps:
1. data selection
2. data cleaning
3. data conversion and meaning-giving
4. data mining
These steps lead to the discovery of knowledge, the essence of which lies in mining
target data. Brachman et al. (1996) believe that all activities and processes
connected with the exploration of knowledge are intended to find useful patterns in
the data; important causes of problems are then identified in order to solve them,
using the data mining algorithm as well as subsequent processing or re-processing
of knowledge. After knowledge is discovered, domain experts have to evaluate and
explain it so as to ensure that it is genuinely useful. A complete process of
knowledge discovery is shown in Fig. 1 (Jiawei Han, Micheline Kamber, 2001).
Fig. 1. Data collection and relevant procedures in the course of data mining
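The four-step knowledge discovery cycle described above (selection, cleaning, conversion, mining) can be illustrated with a minimal Python sketch. The record layout, field names, station names, and values below are hypothetical and are not taken from the study's run cards; the "mining" step is deliberately trivial and only stands in for a real algorithm.

```python
from statistics import mean

# Hypothetical run-card records; field names and values are illustrative only.
RAW_RECORDS = [
    {"lot": "A1", "station": "ILB", "temp_c": "410", "defect": "short"},
    {"lot": "A2", "station": "ILB", "temp_c": "", "defect": "none"},
    {"lot": "A3", "station": "potting", "temp_c": "120", "defect": "none"},
    {"lot": "A4", "station": "ILB", "temp_c": "455", "defect": "short"},
    {"lot": "A5", "station": "ILB", "temp_c": "405", "defect": "none"},
]


def select(records, station):
    """Step 1, data selection: keep only the records relevant to the question."""
    return [r for r in records if r["station"] == station]


def clean(records):
    """Step 2, data cleaning: drop records with a missing temperature reading."""
    return [r for r in records if r["temp_c"].strip()]


def convert(records):
    """Step 3, data conversion: cast text fields to numbers and label outcomes."""
    return [
        {"temp_c": float(r["temp_c"]), "defective": r["defect"] != "none"}
        for r in records
    ]


def mine(records):
    """Step 4, data mining: a deliberately simple pattern, namely the mean
    temperature of defective units versus good units."""
    bad = [r["temp_c"] for r in records if r["defective"]]
    good = [r["temp_c"] for r in records if not r["defective"]]
    return {"mean_temp_defective": mean(bad), "mean_temp_good": mean(good)}


def knowledge_discovery(records, station):
    """Chain the four steps in the order given in the text."""
    return mine(convert(clean(select(records, station))))


if __name__ == "__main__":
    print(knowledge_discovery(RAW_RECORDS, "ILB"))
```

In practice the evaluation and interpretation by domain experts mentioned in the text would follow the last step; that part is not modeled here.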
2.3 LCD driver IC packaging process
LCD driver IC back-end manufacturing processes can be categorized by IC packaging
type into three types: TCP (Tape Carrier Package), COF (Chip on Film) and COG (Chip
on Glass). Currently most LCD driver ICs use the TCP package, driver ICs for mobile
phone LCD panel modules mostly use the COG package, and the COF package is the
future trend. Fig. 2 (Tsai, 2001) shows the main LCD driver IC back-end process
(Yang et al., 2001), and Table 1 (Industrial Technology Research Institute Material
Center, 2004) compares the advantages of TAB (Tape Automated Bonding, i.e. TCP),
COG and COF.
Fig. 2. LCD driver IC back-end packaging process.
Table 1: Advantage comparisons among TAB, COG and COF
PROBLEM STATEMENT
3.1 TCP packaging technology
The original purpose of TCP’s tape packaging technology was to replace wire
bonding packaging. With the increase in IC I/O counts and the trend toward
automated production, TCP technology has matured and is currently the mainstream
technology for large-sized LCD driver IC packaging (Yang et al., 2001), as
illustrated in Fig. 3. The LCD driver IC packaging procedure covered in our study
is as follows.
1. Ultraviolet (UV) irradiation:
After the wafers are cut into individual chips, the chips remain attached
to the original UV tape film. Irradiating the film with UV light softens it,
allowing the attached chips to be detached conveniently.
2. Inner Lead Bonding (ILB):
The inner lead bonding process uses a heated press to join the inner leads of
the tape to the gold bumps on the chip, forming connection points. The process
thus connects the chip to the circuit on the tape.
3. Inner Lead Bonding QC:
After inner lead bonding, quality control checks the inner leads’ completeness,
pitch, lead-to-tape bond, etc.
4. Potting:
Potting applies a resin coating that seals and protects the chips from
moisture damage, increases the support of the lead frames, and helps with heat
exchange.
5. Curing:
After potting, in addition to the brief heating of the resin on the machine,
the product must be placed in an oven for further heating so that the resin
becomes completely moisture-free and hardens.
6. Marking:
The marking process prints text onto the IC product’s packaging so that the
product specifications and the original production process can be identified.
The marking method is determined by the client’s needs.
7. Final testing:
Final testing is performed after the packaging process is complete. Probes
contact the product’s outer leads, and electronic test equipment verifies that
the TCP-packaged product meets specification and screens out defects.
8. RW:
The tape bonding machine is used to wind up the tape and press the leads
together. Fixed rollers transport the soft, chip-bearing tape; when the inner
leads attached to the tape reach position, the bonding head lowers to press and
bond the leads, thus completing the RW process for a chip.
9. Packing:
The finished products are classified, packed into client-specified packing
containers, and labeled with labels, logos, etc.
Fig. 3. TCP packaging process.
3.2 Problem definition
The study applies data mining in the semiconductor packaging industry with the aim
of identifying unknown but useful knowledge. The manufacturing industry typically
uses run cards to record quality-related problems raised by customers but lacks an
effective way to make the most of this information, which results in wasted time
and unnecessary cost for investigation and analysis when a problem recurs.
Furthermore, because the data are often large and complicated, the personnel in
charge of quality-related problems can hardly identify the discrepancy factors or
generalize the characteristics or types of the problems rapidly and correctly. For
these two reasons, the challenge addressed in this study is to draw conclusions
about the problems arising in the semiconductor manufacturing plant.
The literature reports applications of data mining in a number of fields, including
manufacturing, finance and telecommunications. Among the available data mining
tools, the common classification methods are decision tree, Naïve Bayesian, neural
network, clustering and association rules, as shown in Table 2 (Chien, 2004).
Table 2: Classification of data mining technology
In applied telecommunications cases, Naïve Bayesian predicts better than decision
tree (Hsieh, 2005). This study therefore applies the decision tree, neural network,
Bayesian, clustering and association rules methods to previous run card data and,
based on the implementation outcomes, identifies which algorithm gives superior
results for the semiconductor packaging industry. In this way, we can draw
reasonable conclusions about the patterns of incidents and build a problem
diagnosis analysis system that, through discussion with the domain experts, helps
the personnel in charge of quality-related problems reduce the diagnosis time and
narrow the scope of an incident.
SYSTEM DESIGN AND IMPLEMENTATION
4.1 Proposed system architecture
Based on the data mining system, the complete design framework for the intelligent
quality improvement system is illustrated in Fig. 4 (Lee, 2001). The function of
each design element is described in the following steps.
1. Experts, domain knowledge: First, determine the goals to be achieved with data
mining; these guide the collection of relevant data, data pre-processing, and
the selection of data attributes and data mining methods.
2. Data collection: Consider how historical data from the existing systems, i.e.
WIP and ERP, are to be converted into the processing area.
3. Data standardization: After the data sources are determined, standardize the
data types to ensure consistency between subsequently collected data and
pre-processed data.
4. Data preprocess: When data have been collected in the processing area but not
yet stored in the data warehouse, some fields may contain missing or inaccurate
data. To enhance processing efficiency and accuracy, it is essential to carry
out data integration, conversion, extraction, and cleaning.
5. Data warehouse, OLAP (Online Analytical Processing): Because the data to be
processed are distributed across different databases and are always large in
size, reducing data search time is the key to the whole data mining process.
Data warehousing is applied to address these challenges. In addition, OLAP
operations on data cubes include roll-up, drill-down, slice, dice, and pivot.
6. Select attributes: Data analysis typically proceeds after the proper attributes
for the specific analysis target have been selected by the domain expert,
because either insufficient or excessive attributes prevent correct analysis
results.
7. Data mining engine: The data mining engine is the core, and also the critical
part, of the system framework. The most commonly used classification methods
are decision tree, Bayesian, neural network, clustering, and association rules.
8. Results evaluation: A tremendous amount of mined data and patterns may exist;
the mined results become usable and interpretable only through parameter setup.
In addition, we may set restrictive conditions to retrieve more significant
outcomes, and the expert may then help interpret and assess the mined rules or
patterns. If the assessment results are unsatisfactory, one may return to
earlier steps and adjust the methods or parameters until a proper outcome is
obtained.
9. Results display: The mined results may be presented according to user
preference.
10. Knowledge base: The knowledge base, which stores expert knowledge and the
rules obtained from data mining, can be updated from time to time to serve as
the basis for various kinds of decision support.
[Fig. 4 components: (1) experts, domain knowledge (application goal is determined); (2) data collection from ERP and WIP; (3) data standardization; (4) data preprocess; (5) data warehouse and OLAP; (6) select attributes; (7) data mining engine (decision tree, neural network, Bayesian, clustering, association); (8) evaluate results; (9) results display; (10) knowledge base (decision tree mined knowledge)]
Fig. 4. Complete design framework for intelligent quality improvement system.
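The OLAP operations named in step 5 (roll-up, drill-down, slice, dice, and pivot) can be sketched over a tiny in-memory fact table. This is only an illustration in plain Python: the dimensions, rows, and defect counts below are hypothetical, and a production system would run such queries against the data warehouse's cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, station, defect_type, defect_count).
FACTS = [
    ("2005-01", "ILB", "short", 4),
    ("2005-01", "potting", "shrinkage", 2),
    ("2005-02", "ILB", "short", 3),
    ("2005-02", "ILB", "broken lead", 5),
    ("2005-02", "potting", "shrinkage", 1),
]
DIMENSIONS = ("month", "station", "defect_type")


def roll_up(facts, keep):
    """Sum the measure over every dimension not listed in `keep`.
    Keeping more dimensions gives the finer view, i.e. a drill-down."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    checks = [(DIMENSIONS.index(d), set(vals)) for d, vals in allowed.items()]
    return [row for row in facts if all(row[i] in vals for i, vals in checks)]


def pivot(facts, row_dim, col_dim):
    """Pivot: cross-tabulate two dimensions into a nested dict."""
    r, c = DIMENSIONS.index(row_dim), DIMENSIONS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for row in facts:
        table[row[r]][row[c]] += row[-1]
    return {key: dict(cols) for key, cols in table.items()}


if __name__ == "__main__":
    print(roll_up(FACTS, ["station"]))            # coarse view
    print(roll_up(FACTS, ["month", "station"]))   # drill-down
    print(pivot(FACTS, "station", "month"))
```

Here drill-down is expressed as a roll-up that keeps more dimensions, which is one common way to think about the two operations as inverses.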
To construct the proposed system architecture, several elements must be set up,
including:
1. Setting up a data warehouse, which includes the following steps:
(1) Setting up the data warehouse architecture
(2) Setting up data warehouse procedures
(3) Setting up the data warehouse schema
(4) Setting up the fact table
(5) Setting up the dimension tables
(6) Setting up the multidimensional model
2. Setting up the decision analysis and data mining system
After the data cubes have been constructed, decision-making analysis can be
integrated with the data mining system. The goal of integration is to allow
OLAP analysis results to supply the knowledge base within the data mining
system, thus providing analysis information to the data mining system and
creating a point of reference for data mining tasks. OLAP technology brings
people’s observations and intelligence into the data mining system, thus
improving the speed and depth at which data are mined. Furthermore, the
knowledge discovered by the data mining system acts as a guide in OLAP analysis
tasks, increasing the depth of analysis. As a result, even information that
OLAP alone leaves undiscovered, which is extremely complex and subtle in
nature, can be brought to light.
3. Setting up the data mining system
Data classification basically consists of the following two-step process (J.
Ross Quinlan, 1993; Jiawei Han and Micheline Kamber, 2001):
(1) Training model: A training data set is drawn from the items collected in the
database. This set is analyzed by the algorithm used to classify the data, for
example decision tree or clustering. The resulting learning model, or
classifier, is represented in the form of classification rules.
(2) Classification: A test data set is drawn from the items collected in the
database and entered into the classifier. After deviations in the
classification model have been corrected, unknown data are entered into the
revised classifier to predict subsequent results.
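The two-step classification process can be sketched with a one-level decision tree (a "decision stump") that chooses its split attribute by information gain. This is an illustrative simplification and not the SQL Server algorithm used in the study; the attribute names, attribute values, and labels below are hypothetical toy data.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on `attr`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def train_stump(rows, labels):
    """Step 1 (training): pick the most informative attribute and derive one
    classification rule per attribute value (the majority class)."""
    best = max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[best]].append(label)
    rules = {value: Counter(g).most_common(1)[0][0] for value, g in groups.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen values
    return best, rules, default


def classify(model, row):
    """Step 2 (classification): apply the learned rules to one record."""
    attr, rules, default = model
    return rules.get(row[attr], default)


def accuracy(model, rows, labels):
    """Fraction of test records whose predicted label matches the true label."""
    hits = sum(classify(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Hypothetical toy data: which machine and tape material produced each unit.
TRAIN_ROWS = [
    {"machine": "M1", "material": "tapeA"},
    {"machine": "M1", "material": "tapeB"},
    {"machine": "M2", "material": "tapeA"},
    {"machine": "M2", "material": "tapeB"},
    {"machine": "M1", "material": "tapeA"},
    {"machine": "M2", "material": "tapeB"},
]
TRAIN_LABELS = ["defect", "defect", "ok", "ok", "defect", "ok"]
TEST_ROWS = [
    {"machine": "M1", "material": "tapeB"},
    {"machine": "M2", "material": "tapeA"},
]
TEST_LABELS = ["defect", "ok"]

if __name__ == "__main__":
    model = train_stump(TRAIN_ROWS, TRAIN_LABELS)
    print("split attribute:", model[0])
    print("test accuracy:", accuracy(model, TEST_ROWS, TEST_LABELS))
    print("prediction:", classify(model, {"machine": "M1", "material": "tapeA"}))
```

A full decision tree would apply the same attribute choice recursively to each subset; the stump keeps the sketch short while still showing the separation between training and classification.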
4.2 System implementation architecture
The system's environment and framework are shown in Fig. 5. They include the Data
Warehouse Server, Data Mining Server, Web Server, and the data mining and quality
improvement front-end PCs. Microsoft SQL Server 2005 provides several kinds of data
mining algorithms for various applications. When building the classification
engines, we adopt the algorithms of SQL Server 2005 for calculation (Hsieh, 2005).
[Fig. 5 components: front-end PCs, Internet, firewall, DMZ, Web server, intranet, Data Warehouse, Data Mining Server, WIP Server, ERP Server]
Fig. 5. The environment and framework of intelligent quality improvement system.
4.3 Experimental results analysis
Quality problem data (25,150 entries) are predicted, classified, and analyzed with
the decision tree, neural network, Bayesian, clustering and association rules
algorithms. The data are randomly divided into a training set and a testing set at
a ratio of 3:1 (training set: 18,862 entries; testing set: 6,288 entries). With
the decision tree, 5,653 of the 6,288 testing entries are classified correctly,
representing a success rate of 89.9%.
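The split sizes and the success rate quoted above can be checked with a few lines of Python. The function names and the shuffling seed are assumptions made for this sketch; the paper does not say how the random split was implemented.

```python
import random

TOTAL_ENTRIES = 25_150
CORRECT_PREDICTIONS = 5_653


def split_sizes(total, train_parts=3, test_parts=1):
    """Return (train, test) sizes for a train:test ratio, rounding the training
    share down so that the two sizes always sum to the total."""
    train = total * train_parts // (train_parts + test_parts)
    return train, total - train


def random_split(entries, train_parts=3, test_parts=1, seed=0):
    """Shuffle a copy of the entries and cut it at the training-set size.
    The seed is arbitrary and only makes the sketch reproducible."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    train_size, _ = split_sizes(len(shuffled), train_parts, test_parts)
    return shuffled[:train_size], shuffled[train_size:]


train_size, test_size = split_sizes(TOTAL_ENTRIES)
success_rate = CORRECT_PREDICTIONS / test_size

if __name__ == "__main__":
    print(train_size, test_size, f"{success_rate:.1%}")
```

With 25,150 entries the 3:1 ratio gives 18,862 training and 6,288 testing entries, and 5,653 correct predictions out of 6,288 correspond to 89.9%, matching the figures in the text.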
To establish the database, we collected data from January to December 2005. Using
the decision tree and clustering results provided by the intelligent quality
improvement system as a reference for manual analysis and problem resolution from
January to December 2004, we found that the four major factors that influence
quality are: broken/bent/delaminated inner leads, shrinkage, shorts, and foreign
objects drawn into the resin/tape indentation. Before using the decision tree
algorithm, the improvement rates were 8.0%, 7.8%, 7.9%, 7.6%, 7.9%, 8.2%, 7.9%,
7.3%, 7.9%, and 7.6% for each quality problem, with an overall average of 7.8%.
After improvement with the decision tree, the improvement rates were 13.0%, 13.9%,
13.1%, 14.5%, 12.7%, 12.6%, 12.8%, 13.6%, 13.2%, and 13.3% for each quality
problem, with an overall average of 13.3%. The overall average improvement rates
are shown in Table 3, and the curves before and after improvement are shown in
Fig. 6. Hence, the decision tree method is more effective and accurate than the
other methods when applied to quality problems in the semiconductor packaging
industry.
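The overall averages reported above follow directly from the per-problem rates, as this short check shows. The variable names are ours, and the final difference is a derived figure that the paper does not state explicitly.

```python
from math import fsum

# Per-problem improvement rates (%) as listed in the text.
BEFORE = [8.0, 7.8, 7.9, 7.6, 7.9, 8.2, 7.9, 7.3, 7.9, 7.6]
AFTER = [13.0, 13.9, 13.1, 14.5, 12.7, 12.6, 12.8, 13.6, 13.2, 13.3]


def average(rates):
    """Arithmetic mean; fsum limits floating-point accumulation error."""
    return fsum(rates) / len(rates)


before_avg = round(average(BEFORE), 1)
after_avg = round(average(AFTER), 1)
# Difference between the two averages, in percentage points.
gain_points = round(average(AFTER) - average(BEFORE), 1)

if __name__ == "__main__":
    print(before_avg, after_avg, gain_points)
```

The unrounded means are 7.81% and 13.27%, which round to the 7.8% and 13.3% given in the text, a difference of roughly 5.5 percentage points.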
Table 3: Comparisons of data mining results
Fig. 6. Diagram of overall improvement rate curves for each quality problem yearly.
CONCLUSIONS
To meet the goals stated above, our research uses data warehouse, OLAP, decision
tree, neural network, Bayesian, clustering and association rules algorithms to
perform classification analysis of the causes of yield problems in the
manufacturing process of a semiconductor packaging plant. It compares the
correctness and applicability of the proposed algorithms and provides a
decision-making policy for executives, with a view to identifying the causes of
problems and solutions for their main variables, making decisions quickly, and
eventually reducing the time taken to solve quality problems. The results and
contributions of this research are listed as follows.
Among the proposed classification algorithms, predictions made by decision tree
have an accuracy of 89.9%, while predictions made by neural network, Bayesian,
clustering and association rules have accuracies of 84.3%, 83.1%, 82.6% and 80.7%
respectively. The decision tree algorithm is therefore more effective and
appropriate than the other algorithms for analyzing quality problems in the
semiconductor packaging industry.
The experimental results show that among the four attributes (man, machine,
material and method), the first priority to explore in the semiconductor packaging
industry is machine, the second is material, the third is method and the fourth is
man.
We have also found solutions for the major variables, pressure and temperature, of
the inner lead bonding and potting flows at the packaging level.
References
Alex Berson, Stephen Smith, Kurt Thearling. (2000). Building data mining
applications for CRM. McGraw-Hill.
Atsumi, K., N. Kashima, Y. Maehara, T. Mitsuhashi, T. Komatsu, and N. Ochiai.
(1989). Inner lead bonding techniques for 500 lead dies having a 90 um lead
pitch. Proc. 39th Electronic Components Conference, 171-176.
Brachman, R.J., T. Khabaza, W. Kloesgen, G.P. Shapiro, E. Simoudis. (1996). Mining
business databases. Communications of the ACM, 39(11), 42-48.
Chien H.H. (2004). Using data mining techniques for analysis of manufacturing
process quality and improvement – using LCD drive IC packaging as example.
National Chiao Tung University Masters in Management.
Frawley, W.J., G. Piatetsky-Shapiro, C.J. Matheus. (1991). Knowledge discovery in
databases: an overview. Knowledge Discovery in Databases, AAAI/MIT Press, 1-30.
Hsieh C.B. (2005). Data Mining and Business intelligence: SQL Server 2005. Ting
Mao Publish Company.
Hsu G.H. (1999). An Advanced Packaging Technology: Wafer Packaging Technology.
Materials Magazine, 151, 86-91.
Ikeya, Y., K. Atsumi, N. Kashima, Y. Maehara, K. Okano. (1989). High-accuracy
inner lead bonding technique. Proc. IEMT-Japan, 71-74.
Industrial Technology Research Institute Material Center, (2004). Industrial
Economics and Knowledge Center Project.
J. Ross Quinlan. (1993). C4.5: Programs for machine learning. Morgan Kaufmann
Publishers.
Jiawei Han, Micheline Kamber. (2001). Data mining: concepts and techniques.
Morgan Kaufmann Publishers.
Kleissner, C. (1998). Data mining for the enterprise. IEEE Proceedings of the 31st
Annual Hawaii International Conference on System Sciences, 7, 295-304.
Lee J. F. (2001). Research and exploration of data mining. Information and Education
Magazine.
Michael J.A. Berry, Gordon S. Linoff. (1997). Data mining techniques: for marketing,
sales, and customer support. John Wiley & Sons.
Michael J.A. Berry, Gordon S. Linoff. (1999). Mastering data mining: the art &
science of customer relationship management. John Wiley & Sons.
Peter F. Drucker, Ikujiro Nonaka, David A. Garvin. (1998). Harvard business review
on knowledge management. Harvard Business School Press.
Robert Groth. (2000). Data mining: building competitive advantage. Prentice-Hall
Inc.
Scharr, T.A. (1983). TAB bonding a 200 lead die. Proc. ISHM Symposium, 561-565.
Surajit Chaudhuri, Umeshwar Dayal. (1997). An overview of data warehousing and
OLAP technology. SIGMOD, 26, 65-74.
Tsai Tsan-Lian. (2001). Research in Taiwan LCD driver IC finishing process optimum
work distribution model. National Chiao Tung University Masters in High
Executive Management.
Vivek R. Gupta. (1997). An introduction to data warehousing. System Services
Corporation.
Yang et al. (2001). Analysis and reliability assessment in inner lead welding machine
characteristics for tape carrier package IC. Electronics and Materials Magazine,
132-142.
Yu, P.S. (1999). Data mining and personalization technologies. IBM T. J. Watson
Research Center, IEEE.
Zhengxin Chen. (2001). Data mining and uncertain reasoning: an integrated approach.
John Wiley & Sons.