Analyzing Survey Data Concerning the Construction of Central Taiwan Science Park using Association Rules, Neural Networks and Geographical Information Systems

Fang-Yie Leu
Dept. of Computer Science and Information Engineering, Tunghai University, Taiwan
[email protected]

Huelva 2007, International Conference of Territorial Intelligence organised in the framework of CAENTI. WORKSHOP 2: Territorial Analysis Tools

1 Introduction

• Nowadays, Geographical Information Systems (GISs) are widely used, particularly for designing and displaying a city’s road networks, underground pipes, power lines, and so on. Users can search for roads or landmarks on an electronic map, or on the Internet if the map has a web version, to find the locations they are interested in.

• Machine learning is another well-known family of intelligent techniques and models.

• Most researchers and decision makers rely on computers to analyze their data in depth; the data are typically stored in databases or files.

• However, databases and files are passive data: we can only query or manipulate them.

• They never actively tell us the knowledge hidden in them. In the social and geographic domains, few applications deploy GIS and data mining together.

• In this paper, we discuss how to apply association rules and neural networks to analyze survey data collected from people living in Shitun district and Daya village, the two areas surrounding Central Taiwan Science Park.

• The aim is to let researchers uncover facts about the construction of the science park (before and after) that cannot be obtained superficially from the raw data, classify survey data according to specific rules, and predict the trends of given information.

• Local and central governments can use the prediction results as a reference when making public policy.

• Furthermore, if we feed the analytical results into a GIS, the hidden meanings or rules embedded in the survey data can be uncovered more deeply and precisely.

2 System Description

• Computers are sometimes good at learning concepts.

• A concept is a set of objects, symbols, or events grouped together due to sharing certain characteristics.

• Concepts can be well designed and structured for future retrieval and management.

• Common concept structures include trees, rules, networks, and mathematical equations.

• Many types of data mining techniques have been developed.

• Among them, association rules and neural networks are the two most commonly used.

• Association rules are often used to discover relationships between two sets of items.

• Neural networks can predict the outcome for a given input, or classify data into groups.

2.1 Association rules

• Affinity analysis is the general process of determining which things go together.

• A typical application is market basket analysis, where the goal is to determine which items a consumer is likely to purchase during a shopping trip.

• The output of market basket analysis is a set of associations about consumer purchasing behaviour.

• To perform affinity analysis between two things A and B, two important parameters must be considered: confidence and support.

• Confidence is a conditional probability: for the rule A → B it is the probability that B occurs given that A occurs, and for the rule B → A it is the probability that A occurs given that B occurs.

• Support is defined as the percentage of all transactions or instances containing the attribute values found in an association rule.
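
The two measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `support` and `confidence` are hypothetical, and the baskets are made-up toy data, not taken from the CTSP survey.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Made-up toy baskets (hypothetical, NOT CTSP survey data).
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"milk", "eggs"},
    {"bread"},
]

support_ab = support(baskets, {"milk", "bread"})        # both items together
conf_a_to_b = confidence(baskets, {"milk"}, {"bread"})  # rule milk -> bread
conf_b_to_a = confidence(baskets, {"bread"}, {"milk"})  # rule bread -> milk
```

With this toy data the two confidences differ (milk → bread is weaker than bread → milk), which is why a rule's direction has to be stated alongside its confidence.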

2.2 Neural networks

• Neural networks offer a mathematical model that attempts to mimic the human brain.

• Knowledge is often represented as a layered set of interconnected processors.

• These processor nodes are frequently referred to as neurodes so as to indicate a relationship with the neurons of the brain.

• Each node has a weighted connection to several other nodes in adjacent layers.

• Individual nodes take the input received from connected nodes and use the weights together with a simple function to compute output values.

• Learning is accomplished by modifying network connection weights while a set of input instances is repeatedly passed through the network.

• Once trained, an unknown instance passing through the network is classified according to the value(s) seen at the output layer.

• The number of input nodes is determined by the inputs of the application concerned.

• Values of input nodes should be between -1.0 and 1.0.

• Table 1 shows the initial weight values for the neural network shown in Fig. 1.

• Fig. 1 Neural network framework: input layer (Node 1, Node 2, Node 3; the values 1.0, 0.4 and 0.7 are shown), hidden layer (Node j, Node i) and output layer (Node k, producing the output).

Table 1 The initial weight values for the neural network shown in Fig. 1.

W1j     W1i     W2j     W2i     W3j     W3i     Wjk     Wik
0.20    0.10    0.30    -0.10   -0.10   0.20    0.10    0.50
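
As a rough sketch, one forward pass through the network of Fig. 1 with the Table 1 weights might look as follows. Several things here are assumptions rather than statements from the slides: the first weight column is read as W1j (node 1 to node j), the values 1.0, 0.4 and 0.7 are taken to be the inputs of nodes 1, 2 and 3 in that order, the "simple function" is taken to be the logistic sigmoid, and the names `sigmoid` and `forward` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (assumed activation function)."""
    return 1.0 / (1.0 + math.exp(-x))


# Initial weights from Table 1; "W1j" is read as the link from node 1 to node j.
weights = {
    "W1j": 0.20, "W1i": 0.10,
    "W2j": 0.30, "W2i": -0.10,
    "W3j": -0.10, "W3i": 0.20,
    "Wjk": 0.10, "Wik": 0.50,
}

# Assumption: the values shown in Fig. 1 feed input nodes 1, 2 and 3 in order.
inputs = {1: 1.0, 2: 0.4, 3: 0.7}


def forward(weights, inputs):
    """Return the outputs of hidden nodes j and i and of output node k."""
    # Each hidden node sums its weighted inputs, then applies the sigmoid.
    net_j = sum(weights[f"W{n}j"] * x for n, x in inputs.items())
    net_i = sum(weights[f"W{n}i"] * x for n, x in inputs.items())
    out_j = sigmoid(net_j)
    out_i = sigmoid(net_i)
    # The output node does the same with the two hidden-node outputs.
    out_k = sigmoid(weights["Wjk"] * out_j + weights["Wik"] * out_i)
    return out_j, out_i, out_k


out_j, out_i, out_k = forward(weights, inputs)
print(f"node j: {out_j:.3f}, node i: {out_i:.3f}, node k: {out_k:.3f}")
```

Under these assumptions the weighted sums into nodes j and i are 0.25 and 0.20, and the output of node k comes out at roughly 0.58.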

Table 2 A population of weight elements for the neural network in Fig. 1

Training sequence  W1j     W1i     W2j     W2i     W3j     W3i     Wjk     Wik
1                  0.20    0.10    0.30    -0.10   -0.10   0.20    0.10    0.50
2                  0.14    0.38    0.19    0.25    -0.17   0.27    0.11    0.54
3                  0.20    0.10    0.38    -0.16   -0.16   0.24    0.12    0.53
4                  0.23    0.10    0.39    -0.18   -0.17   0.26    0.15    0.54
…                  …       …       …       …       …       …       …       …

• When an output is larger (smaller) than its corresponding target value, -ΔW (+ΔW) is added to the corresponding weight.

• Table 2 shows examples. The output of a neural network is limited to the range 0 to 1.0.

• Often a sigmoid function is employed (see Fig. 2).

• The number of output nodes is determined by the number of output bits.

• An N bit output requires bits, i.e., nodes

• Fig. 2 Sigmoid function (horizontal axis from -6 to 6; vertical axis from 0 to 1.2)
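
The slides do not give the formula behind Fig. 2. A minimal sketch, assuming the common logistic form 1 / (1 + e^(-x)), evaluates the curve at the integer positions along the figure's horizontal axis; the function name `sigmoid` is illustrative.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x)) (assumed form of the curve in Fig. 2)."""
    return 1.0 / (1.0 + math.exp(-x))


# Evaluate at the integer x positions from -6 to 6.
xs = list(range(-6, 7))
ys = [sigmoid(x) for x in xs]

for x, y in zip(xs, ys):
    print(f"{x:+d}  {y:.4f}")
```

The values rise monotonically, pass through 0.5 at x = 0 and stay strictly between 0 and 1, which is why such a function keeps a node's output within the 0 to 1.0 range.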

2.3 Geographical Information System (GIS)

• A Geographical Information System (GIS for short) is an application system for creating, storing, analyzing and managing spatial data and associated attributes.

• GIS is often used for scientific investigations, resource management, asset management, environmental impact assessment, city development planning, cartography, and route planning, for example, to identify a polluted area that needs to be isolated from others.

3. Data Analysis on Survey Data of CTSP

• Since Central Taiwan Science Park (CTSP) began operating in 2003, its surrounding environment has changed greatly.

• Currently about 20,000 people are employed by CTSP corporations.

• The ultimate number of employees will be 50,000, which will increase the population living around CTSP to about 200,000.

• The more people, the more business opportunities and the more shops and restaurants.

• Question 34: About 4 to 5 years ago (before CTSP started running), where did you go shopping?

• Question 35: In the past year, where have you gone shopping?

• About 65% of the answers to Question 35 are the same as those to Question 34.

• Most people go to large-scale supermarkets or nearby smaller-scale supermarkets to purchase their daily needs.

• Their explanation is that they are familiar with everything in the supermarket or supermarkets they usually go to.

• They know what the supermarkets stock and what the prices are. They can also easily find what they want and are used to buying.

• Therefore, they do not change where they shop.
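
The share of respondents who gave the same answer to a "before" question and its "after" counterpart could be computed along these lines. This is a sketch only: the function name `unchanged_share`, the answer categories and the five sample respondents are all hypothetical and are not real survey data.

```python
def unchanged_share(before, after):
    """Fraction of respondents whose 'after' answer equals their 'before' answer."""
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty answer lists of equal length")
    same = sum(1 for b, a in zip(before, after) if b == a)
    return same / len(before)


# Hypothetical answers for five respondents; NOT real CTSP survey data.
q_before = ["large supermarket", "nearby supermarket", "traditional market",
            "large supermarket", "nearby supermarket"]
q_after = ["large supermarket", "nearby supermarket", "large supermarket",
           "large supermarket", "convenience store"]

share = unchanged_share(q_before, q_after)
print(f"{share:.0%} of respondents gave the same answer to both questions")
```

The same function applies to any of the paired before/after questions, provided the two lists are ordered by respondent.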

• Question 36: About 4 to 5 years ago (before CTSP started running), where did you eat out?

• Question 36a: In the past year, where have you eaten out?

• Many residents have their meals at home.

• But those who often eat out have gradually switched from faraway restaurants to nearby ones, since many restaurants have opened in the areas where they live, i.e., Shitun district and Daya village.

• The reason is that more than 70,000 people (employees and their families) have come to live near CTSP, so many restaurants are needed.

• Further, most restaurants (old and new) are much more expensive than before, since high-tech employees earn much more money than the original residents.

• Of course, the restaurants also offer higher-quality food.

• Question 37: About 4 to 5 years ago (before CTSP started running), where did you go to spend your leisure time?

• Question 37a: In the past year, where have you gone to spend your leisure time?

• About 83% of people have not changed the places where they like to spend their leisure time.

• There are three reasons.

• The first is that no new leisure facilities, e.g., parks and department stores, have been constructed and equipped, so nothing attracts residents away from their original leisure facilities.

• The second is that the original residents are currently not rich enough to switch to new leisure facilities.

• The last is that they enjoy their current ways of spending leisure time.

• On further study, all three reasons hold.

• But, we dare to conclude that people living in these two areas have earned much more money than before.

• Question 38: About 4 to 5 years ago (before CTSP started running), where was your office or business location?

• Question 38a: In the past year, where was your office or business location?

• About 89% of people have not changed their jobs, since most people living in Daya village are less educated.

• Residents seldom have the chance to work at CTSP.

• Most original Shitun residents, even those with somewhat higher education, already had their own jobs before 2003.

• Their professions, positions and ages may or may not be acceptable to CTSP corporations.

• So only a few original residents are hired by the corporations.

• The remaining 11% of residents are business people.

• They have moved their businesses back to the area near CTSP.

• They do benefit from the science park.

4. Conclusions

• We are still working on the survey data, and plan to deploy association rules to explore the deep relationships between two large item sets and the knowledge existing in the raw data.

• We also intend to use a neural network as a well-trained learning model to predict the outcome of a future event or the trend of a currently known phenomenon.

• The details will be introduced and described in the full paper which should be finished before Nov. 20, 2007.

• Thank you for your attention.