Session 7: Introduction to important statistical techniques for competitiveness analysis: example and interpretations

Asia‐Pacific Research and Training Network on Trade
www.artnetontrade.org

ARTNeT Greater Mekong Sub-region (GMS) initiative

ARTNeT Consultant: Witada Anukoonwattaka, PhD
Thammasat University, Thailand

Outline
• Concepts of data analysis
• Basic data analysis
  – Interpreting quantitative and qualitative data
• Technical tools
  – Statistical analysis
  – Regression
• Concepts and interpretation of basic regression analysis

What is data analysis?
1. Describing what is going on in the dataset
E.g. You explore the sample to find out
– the level of, and changes in, the relative price competitiveness of the observed garment producers on average
– differences in cost competitiveness among firm groups, such as
  • purely national firms vs. foreign joint ventures
  • small vs. large firms

2. Testing hypotheses
E.g. You may want to know
– Are the changes in the relative cost of Chinese garments to that of the GMS group systematically related to tariff reductions?
– Do the changes in relative costs differ systematically between countries in the group?
– Are the trends in competitiveness similar between exports to the US and Japanese markets?

3. Forecasting
• Can exchange rate depreciation increase the export competitiveness of GMS countries to China? By how much?
• Can tariff reductions enhance the export competitiveness of GMS countries? To what extent?

Describing what is going on in the data

Interpreting Quantitative Data (1)
1. Overall average scores: high or low? Very high or very low scores might mean that the question is poorly worded.
2. Standard deviations: a low standard deviation means respondents generally gave a common response. A high standard deviation means they gave different responses.
3. The frequency distribution will help you get a better idea of what is happening.
• Is there a bimodal distribution, where two different groups gave very different responses?
• A bimodal distribution might show up as a normal average score but a high standard deviation.
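To see why the average alone can hide a bimodal pattern, here is a minimal Python sketch. The two lists of 1-to-5 survey scores are hypothetical (they do not come from the session data); both have the same mean, but only the frequency counts and the standard deviation reveal the two camps.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical 1-5 scores: two camps of respondents (low scorers and high scorers).
bimodal = [1, 1, 1, 2, 2, 4, 4, 5, 5, 5]
# Hypothetical 1-5 scores: nearly everyone gives the same answer.
common = [2, 3, 3, 3, 3, 3, 3, 3, 3, 4]

mean_bimodal, mean_common = mean(bimodal), mean(common)
sd_bimodal, sd_common = stdev(bimodal), stdev(common)

# Frequency distribution: how many respondents chose each score.
freq_bimodal = Counter(bimodal)

print("means:", mean_bimodal, mean_common)
print("standard deviations:", round(sd_bimodal, 2), round(sd_common, 2))
print("frequencies (bimodal):", sorted(freq_bimodal.items()))
```

Both averages are "normal" middle scores, yet nobody in the first group actually answered 3; the larger standard deviation is the warning sign and the frequency table shows the reason.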

Interpreting Quantitative Data (2)
4. Compare the results between the different demographic subgroups.
– Focus especially on the items where interesting things happened in the frequency distributions.
5. If you are serious about understanding your numeric data, you should also perform some statistical analyses.

Interpreting Qualitative Data

1. Read through all the comments. Get a feeling for what people are saying.
2. Categorize the comments into different areas.
3. Look at each category separately. How many unique comments are in each? How detailed are those comments? How strongly are they stated? At this point, you should be able to identify which categories are more important and which are less important.
4. Look at the different subgroups to see if any relationships emerge between subgroups and categories of comments.

Technical Data Analysis:
• Statistical analysis
• Hypothesis testing
• Forecasting

Statistical Analysis
1. Analysis of individual variables
– Look at the “central tendency”, “distribution” and “dispersion” of responses to each data variable.
2. Analysis of relationships between variables
– Look at “possible interdependence” between data variables.
3. Analysis of differences in characteristics between subgroups
– Look at “characteristic differences” between subgroups.

Examples

What are we analyzing when we investigate a competitiveness survey dataset to find out…
a) Whether foreign investment tends to enhance labor productivity in the garment industry?
b) Whether export-oriented industries have higher labor productivity than import-competing industries?
c) How productive labor is in the garment industry?

Descriptive Statistics

Statistics       Worker   Industry   Activity 1   Activity 2   Activity 3   Activity 4   AFTA   Foreign
Mean             58.75    2          0.33         0.50         0.25         0.42         1      0.58
Standard Error   13.50    0.28       0.14         0.15         0.13         0.15         0.33   0.15
Median           45       2          0            0.5          0            0            1      1
Mode             30       2          0            1            0            0            2      1
SD               46.76    0.95       0.49         0.52         0.45         0.51         1.13   0.51
Minimum          15       1          0            0            0            0            -1     0
Maximum          180      4          1            1            1            1            2      1
Sum              705      24         4            6            3            5            12     7
Count            12       12         12           12           12           12           12     12

Note: You can do descriptive statistics in Excel

• Go to menu Tools – Add-Ins, check the Analysis ToolPak, and then press the OK button. The next time you open the Tools menu, you will see Data Analysis at the bottom of the menu.

• Click menu Tools – Data Analysis and you will see the Data Analysis dialog. Scroll down and you will see Descriptive Statistics. Select it and click the OK button.

• You will get the Descriptive Statistics dialog form. In the Input Range, select the range of data that you want to analyze. Include the labels in the first row and check the corresponding check box. Also check the Summary statistics check box and then click the OK button.

The result of the descriptive statistics tool, after formatting, is shown in the figure below.
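For readers working outside Excel, the same summary can be sketched in Python with the standard library. The `describe` helper and the `workers` list are illustrative assumptions: the values are hypothetical firm sizes, not the survey data behind the table, and the standard deviation is assumed to be the sample version (n - 1 in the denominator).

```python
import math
import statistics as st

def describe(values):
    """Return the same summary statistics as the descriptive statistics table."""
    n = len(values)
    sd = st.stdev(values)  # sample SD, i.e. n - 1 in the denominator
    return {
        "Mean": st.mean(values),
        "Standard Error": sd / math.sqrt(n),  # SD divided by the square root of n
        "Median": st.median(values),
        "Mode": st.mode(values),
        "SD": sd,
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

# Hypothetical number of workers in 12 firms (illustrative values only).
workers = [15, 20, 30, 30, 30, 40, 50, 60, 70, 80, 100, 180]
summary = describe(workers)

for name, value in summary.items():
    print(f"{name:15s} {value:.2f}")
```

Note how the mean (58.75) sits well above the median (45) because one large firm pulls the average up.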

Analyzing Individual Variables

• Central tendency of the data
• Distribution of the data
• Dispersion of the data

Tools for Measuring Central Tendency: Mode, Median, Mean

• Mode is the most frequently occurring value.
• Median is the middle value.
• Mean is the average value.

Notes:
a “Yes” means the indicator is suitable for the measurement level shown.
b May be OK in some circumstances. See Example 2.
c May be misleading when the distribution is asymmetric or has a few outliers.

Competitiveness Analysis Examples:
Example 1: Which measure of central tendency would you use to find the following information from your dataset?

a) Unit labor cost of firms in the footwear industry
b) The majority of foreign investors in the textile industry
c) Average export ratio when the dataset shows that

Firm No.   Export ratio
1          20%
2          24%
3          28%
4          30%
5          85%
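For part (c), a quick calculation shows how far the mean and the median drift apart when one firm is an outlier. This is only an illustrative sketch using the five export ratios listed above.

```python
from statistics import mean, median

# Export ratios (percent) of firms 1-5 from Example 1.
export_ratio = [20, 24, 28, 30, 85]

avg = mean(export_ratio)    # pulled upward by firm 5
mid = median(export_ratio)  # middle value, unaffected by the size of the outlier

print(f"mean = {avg:.1f}%, median = {mid}%")
```

The mean (37.4%) is higher than the export ratio of four of the five firms, so the median (28%) gives a more typical picture here.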

Example 2. The following ordinal scale data shows customers’ views on the quality of domestically produced garments (sample size is 30). Is it possible to find the “mean” of this ordinal variable?

Analyzing Data Dispersion: ‘Range’ and ‘Standard Deviation (SD)’

Dispersion is the spread of the values around the central tendency.

Range = Max - Min

$SD = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Note: All statistical programs (even Excel) are capable of calculating descriptive statistics for you.
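As a worked illustration, the range and the standard deviation can be computed step by step. The sketch below reuses the export ratios from Example 1 and assumes the sample standard deviation (dividing by n - 1), which is what Excel's Descriptive Statistics tool reports; the result is cross-checked against Python's built-in `statistics.stdev`.

```python
import math
from statistics import stdev

# Export ratios (percent) from Example 1.
data = [20, 24, 28, 30, 85]

data_range = max(data) - min(data)

n = len(data)
x_bar = sum(data) / n
squared_deviations = [(x - x_bar) ** 2 for x in data]
sd_manual = math.sqrt(sum(squared_deviations) / (n - 1))  # sample SD

print("Range:", data_range)
print("SD (manual):", round(sd_manual, 2))
print("SD (library):", round(stdev(data), 2))
```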

Analyzing Data Distribution: A Frequency Distribution

The frequency distribution is a summary of the frequency of individual values or ranges of values for a variable.

A Frequency Distribution of Age Groups

Normal Distribution

We usually expect the data observations to be normally distributed if we performed random sampling.

If the mean of our example is 20.5 and the standard deviation is 7.5, we can estimate that approximately 95% of the scores will fall in the range of 20.5-(2*7.5) to 20.5+(2*7.5), or between 5.5 and 35.5.

[Figure: normal distribution curve with bands at ±1 SD and ±2 SD around the mean]
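The two-standard-deviation rule can be checked numerically. The sketch below uses the mean and SD from the example and Python's `statistics.NormalDist`; it assumes the scores really are normally distributed.

```python
from statistics import NormalDist

mean_score = 20.5
sd = 7.5

lower = mean_score - 2 * sd  # 20.5 - 15
upper = mean_score + 2 * sd  # 20.5 + 15

# Share of a normal distribution that lies within 2 SD of the mean.
scores = NormalDist(mu=mean_score, sigma=sd)
share = scores.cdf(upper) - scores.cdf(lower)

print(f"range: {lower} to {upper}")
print(f"share within range: {share:.1%}")
```

The range works out to 5.5 to 35.5, and the share inside it is about 95.4%, which is where the "approximately 95%" rule of thumb comes from.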

Analyzing Relationships between Variables

• Scatter plot
• Cross tabulation (Pivot Table)
• Regression analysis

Relationships between Variables

Is there any relationship between the two variables shown in the scatter plot?

Cross Tabulation (Pivot Table)

                     Export orientation
Attitude toward QC   Low   Medium   High   Total
Indifferent          27    37       56     120
Somewhat positive    35    39       41     115
Positive             43    33       30     106
Total                105   109      127    341

Note: Some statisticians call it a Contingency Table, while MS Excel calls it a Pivot Table.

Interpretation (1)

                     Export orientation
Attitude toward QC   Low   Medium   High   Total   Share of sample
Indifferent                                120     35%
Somewhat positive                          115     34%
Positive                                   106     31%
Total                105   109      127    341     100%

Distribution of the attitude variable (Total column): is the sample biased toward a particular attitude?

Distribution of the export-orientation variable (Total row): is the sample biased toward particular firm types?

Interpretation (2)

                     Export orientation
Attitude toward QC   Low   Medium   High   Total
Indifferent                         56
Somewhat positive                   41
Positive             43    33       30     106
Total                               127

High column: distribution of attitudes for high-export firms.
Positive row: distribution of export orientation for firms with a positive attitude toward QC.

• Is attitude toward QC associated with the export orientation of the firms?
• Do the firms with a positive attitude toward QC tend to be low or high export-orientation firms?
• Do the firms with high export orientation tend to be positive or indifferent toward QC?

Analysis of Differences between Groups

• Are there differences between low-export and high-export firms in the attitude toward QC?

E.g. Differences between firm groups.

Percentage Cross Tabulation (% of firms within each export-orientation group)

                     Export orientation
Attitude toward QC   Low   Medium   High   Total
Indifferent          26    34       44     35
Somewhat positive    33    36       32     34
Positive             41    30       24     31
Total                100   100      100    100
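The percentage table can be derived from the counts in the cross tabulation by dividing each cell by its column total. A minimal sketch, using the counts shown earlier (the variable names are only illustrative):

```python
# Counts from the cross tabulation: attitude toward QC by export orientation.
counts = {
    "Indifferent":       {"Low": 27, "Medium": 37, "High": 56},
    "Somewhat positive": {"Low": 35, "Medium": 39, "High": 41},
    "Positive":          {"Low": 43, "Medium": 33, "High": 30},
}
groups = ["Low", "Medium", "High"]

# Column totals: number of firms in each export-orientation group.
col_totals = {g: sum(row[g] for row in counts.values()) for g in groups}

# Column percentages: share of each attitude within each export group.
col_pct = {
    attitude: {g: round(100 * row[g] / col_totals[g]) for g in groups}
    for attitude, row in counts.items()
}

for attitude, row in col_pct.items():
    print(f"{attitude:18s}", row)
```

Reading across a row now compares like with like: 41% of low-export firms are positive toward QC against 24% of high-export firms, even though the groups differ in size.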

Note: You can do Cross Tabulation in Excel

In Microsoft Excel, cross tabulations can be automated using a Pivot Table. You may use either the Pivot Table icon in the toolbar or the MS Excel menu Data – Pivot Table and Pivot Chart Report.

When you click the toolbar icon or the menu item, the Pivot Table wizard will pop up. Click Next.

In step 2 of the wizard, highlight the data, including the data labels in the top row, as shown in the following figure.

In step 3 of the Pivot Table wizard, select the Layout button.

To examine the relationship between the variables Playground and Satisfaction, drag and drop the variable names on the right into the diagram. Put the Satisfaction button in the row area and the Playground button in the column area, then drop Satisfaction once more into the Data area. It will appear as Sum of Satisfaction. After that, double-click this last button (Sum of Satisfaction) and the Pivot Table Field dialog will appear. Select Summarize by Count and then click the OK button twice.

When you are back at step 3 of the Pivot Table wizard, click the Finish button.

MS Excel will automatically create the cross tabulation table. Personally, I don't like to use it directly because it may contain very long formulas. Thus, I prefer to highlight this Pivot Table and use menu Edit – Copy (CTRL-C). Then select another cell and use menu Edit – Paste Special. Click the Values option and click the OK button.

Key Considerations
• Watch the "n" (number of observations). Be wary of small samples.
– If there are few respondents in a particular category, you should NOT trust the data, or at least you should look for much stronger trends before trusting the results.

For example, can we draw a conclusion if we found that…

Case A) 38% of sample (8 observations) said they have not had a problem competing with imports from China.

Case B) 88% of sample (8 observations) said they have not had a problem competing with imports from China.
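One way to make the two cases concrete is to put a rough 95% confidence interval around each percentage. The sketch below is an illustration under two assumptions that are not stated in the slides: the 8 observations are the whole category (so 38% is about 3 of 8 and 88% is about 7 of 8), and the Wilson score interval is used as the interval formula. The helper name `wilson_interval` is illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumption: 8 respondents in total; 3 of 8 is about 38%, 7 of 8 is about 88%.
case_a = wilson_interval(3, 8)
case_b = wilson_interval(7, 8)

print("Case A: %.0f%% to %.0f%%" % (100 * case_a[0], 100 * case_a[1]))
print("Case B: %.0f%% to %.0f%%" % (100 * case_b[0], 100 * case_b[1]))
```

With only 8 observations both intervals are very wide. In Case A the interval straddles 50%, so we cannot even say which answer the majority would give; in Case B the much stronger trend keeps the whole interval above 50%, which is the sense in which small samples call for stronger trends.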

• Knowing whether a relationship is strong enough or not strong enough with smaller respondent numbers takes some practice and experience.

• What you really want to know is whether the relationship is "statistically significant".
– This type of analysis is rather technical.

Introduction to Regression Analysis
• Regression Analysis

A technique for using data to identify relationships among variables and to use these relationships to make predictions.

Basic Concepts of Regression Analysis

• You first fit a straight line to model the data.

• A straight line provides the simplest model of the relationship between the response (y variable) and the predictor (x variable).

$y = b_0 + b_1 x + \text{error}$

Simple Linear Regression

[Figure: scatter plot of the Productivity Index (Y) against the Firm-size Index (X), with the fitted line]

Productivity = b0 + b1(Size) + error

$y = b_0 + b_1 x + \text{error}$

$y = b_0 + b_1 x + \text{error}$

– y: dependent variable
– x: independent variable
– b_0, b_1: coefficients
– error: how far the fitted line is from the data

• The size of the coefficient gives you the size of the effect that variable is having on your dependent variable.

• The sign on the coefficient (positive or negative) gives you the direction of the effect.

Interpretation

Productivity = b0 + b1(Size) + error

Regression: Productivity = 5 + 3 Size + error

Prediction: Expected Productivity = 5 + 3 Size

• Productivity is predicted to increase by 3 units if firm size increases by 1 unit.

• If the average firm size of the industry of interest is 20, we get a predicted productivity of 5 + 3(20) = 65.

b1 represents the increase in productivity for each additional unit of firm size.

b0 could in theory be thought of as the productivity of a firm whose size is zero.
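To show where coefficients such as 5 and 3 come from, here is a minimal sketch of fitting a straight line by ordinary least squares. The `fit_line` helper and the data are hypothetical: the productivity values were constructed to scatter around 5 + 3 × Size so that the fitted coefficients reproduce the example.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope: change in y per unit of x
    b0 = mean_y - b1 * mean_x    # intercept: fitted y when x is zero
    return b0, b1

# Hypothetical firm-size and productivity indices (not survey data).
size = [5, 10, 15, 20, 25, 30]
productivity = [21, 33, 51, 66, 78, 96]

b0, b1 = fit_line(size, productivity)
predicted_at_20 = b0 + b1 * 20

print(f"Productivity = {b0:.1f} + {b1:.1f} * Size")
print("Predicted productivity for Size = 20:", predicted_at_20)
```

None of the hypothetical firms lies exactly on the line; the gaps between the observed values and the fitted values are the error term.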

Your turn!

What is the following regression telling us?

Market share = 100 – 0.2 (labor cost) + error

General Regression
• If a straight line doesn’t fit the data well, you can
– Fit a curved line with quadratic or cubic terms
– Apply a log transformation to the response (y) or predictor variable (x).

E.g. $\ln y = \beta_0 + \beta_1 \ln x + \text{error}$
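A short sketch of the log transformation in practice. The data are hypothetical and generated from y = 2·x^1.5, a curve that a straight line in x and y would fit poorly; after taking logs of both variables the relationship is exactly linear, and the slope can be read as an elasticity (a 1% rise in x goes with roughly a β1% rise in y). The `fit_line` helper is an illustrative least-squares routine.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical curved relationship: y = 2 * x^1.5 (no noise, for clarity).
x = [1, 2, 4, 8, 16]
y = [2 * xi ** 1.5 for xi in x]

# Regress ln(y) on ln(x) instead of y on x.
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]
beta0, beta1 = fit_line(ln_x, ln_y)

print(f"ln y = {beta0:.3f} + {beta1:.3f} ln x")
print("implied multiplier exp(beta0):", round(math.exp(beta0), 3))
```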

A regression model may need more than one independent variable to adequately describe the response (Y variable).

This is called “Multiple Regression”.

$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \text{error}$

• Each coefficient tells you how much the response is expected to increase when that independent variable increases by one, holding all the other independent variables constant.

E.g. What is the regression telling us?

export price = 120 - 3 (exchange rate) + 1.7 (wage) + e

Regression Output

$\text{Export share} = b_0 + b_1(\text{real wage}) + b_2(\text{investment}) + b_3(\text{labor productivity}) + e$

Variable             Estimated coefficient   SE      t statistic   P value
Constant             41.36                   37.82   1.094         0.280
Real wage            -15.85***               2.88    5.500         0.000
Investment           0.64                    0.27    0.236         0.814
Labor productivity   2.42***                 0.81    2.992         0.004

R square = 0.646
Adjusted R square = 0.613
Prob>F = 0.000

Note: Statistical significance at the 1 percent, 5 percent and 10 percent levels is indicated by ***, **, and *.

Interpretation of a regression output (1)
1) Are the independent (X) variables having a genuine effect on the response (Y)?

1.1 Look for a small “P value” in a regression output.
– The “P value” tells you how confident you can be that each individual variable has some correlation with the dependent variable: it is the probability of seeing a coefficient at least this far from zero if the variable truly had no effect. It is also called the observed significance level.
– “P < 0.05” is the most common threshold for statistical significance.
• It says that, if the variable had no effect, a result this strong would turn up less than 5% of the time, assuming your model is specified correctly.

Interpretation of a regression output (2)
1.2 Look for a large “t statistic” in a regression output.
– The t statistic is the coefficient divided by its standard error (SE).
– The SE tells you the precision of the regression coefficient. If a coefficient is large compared to its standard error, then the t statistic is large (the coefficient is significantly different from 0).
– Your regression software will compare the t statistic on your variable with values in the table of the t distribution to determine the P value, which is the number that you really need to be looking at.

Interpretation of a regression output (3)
– The larger the t statistic (in absolute value), the smaller the P value; once the t statistic is large enough, you have P value < 0.05.

1.3 Look for symbols indicating statistical significance at the 1%, 5%, and 10% levels.
– Statistical significance at the 1%, 5%, and 10% levels is another way of saying P < 0.01, P < 0.05, and P < 0.10, respectively.
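The "coefficient divided by SE" rule can be checked against the regression output shown earlier. The sketch below recomputes the t statistic for three rows of that table and compares it with the printed value. The printed t statistics appear to be reported in absolute value, so the comparison ignores the sign. The Investment row is left out because its printed coefficient, SE and t statistic do not reproduce each other (0.64 / 0.27 is about 2.37, not 0.236), which suggests a slip in one of the printed figures.

```python
# (coefficient, SE, printed t statistic) taken from the regression output table.
rows = {
    "Constant": (41.36, 37.82, 1.094),
    "Real wage": (-15.85, 2.88, 5.500),
    "Labor productivity": (2.42, 0.81, 2.992),
}

# t statistic = coefficient / standard error
t_values = {name: coef / se for name, (coef, se, _) in rows.items()}

for name, (coef, se, printed_t) in rows.items():
    print(f"{name:20s} t = {t_values[name]:7.3f}   printed = {printed_t}")
```

Small differences in the third decimal place are expected because the printed coefficients and SEs are rounded.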

Interpretation of a regression output (4)
2) Is your regression model making accurate predictions?
- Look for an “R-squared ($R^2$)” close to 100%.
- It says how much of the variation in the dependent variable (Y) has been explained by the regression model.
Ex. What is meant by $R^2$ = 100%?

3) Is the set of explanatory variables well chosen?
- See whether “Adjusted R square ($\bar{R}^2$)” is much lower than $R^2$.
- Adjusted R square penalizes every added explanatory variable, so a large gap usually says that some of the included variables explain little and that more relevant explanatory variables may be missing from the model.
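The link between R square and Adjusted R square can be written out with the usual adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables. In the sketch below, R² = 0.646 and k = 3 come from the regression output shown earlier; the sample size n = 36 is an assumption, since the slides do not report it, chosen only because it reproduces the printed Adjusted R square.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.646  # R square from the regression output
k = 3       # real wage, investment, labor productivity
n = 36      # ASSUMED sample size (not reported in the slides)

adj = adjusted_r2(r2, n, k)
print("Adjusted R square:", round(adj, 3))

# Adding a regressor that does not raise R square lowers the adjusted value.
adj_with_useless_variable = adjusted_r2(r2, n, k + 1)
print("With one useless extra variable:", round(adj_with_useless_variable, 3))
```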

Interpretation of a regression output (5)
4) You should be aware that the P value is generally more important than R-squared.

- The P value tells you how confident you can be that each individual variable has some correlation with the dependent variable.

- R-squared is generally of secondary importance, unless your main concern is using the regression equation to make accurate predictions.

Interpretation of a regression output (6)
5) The sign of multicollinearity (independent variables may be correlated with each other)

- The regression as a whole has a small P value (Prob>F in the upper part of the regression output is less than 0.05), but the individual variables have large P values.

- It means the coefficients on individual variables may be insignificant even when the regression as a whole is significant.

- Intuitively, this is because highly correlated independent variables are explaining the same part of the variation in the dependent variable, so their explanatory power and the significance of their coefficients are "divided up" between them.

Regression Methods and Choosing Criteria

Regression
Continuous X variables ⇒ Continuous response (Y)

E.g. How are the age and the body mass index (BMI) of a patient associated with the length of stay in the hospital?

$\text{Day} = b_0 + b_1\,\text{Age} + b_2\,\text{BMI} + e$

General Linear Model
Categorical X variables ⇒ Continuous response (Y)

How are the payment method and the day of the week associated with the cost of a transaction?

$\text{Cost} = b_0 + b_1\,\text{DayDummy} + b_2\,\text{MethodDummy} + e$

Day (x1)   Dummy Value
Mon        0,1
Tue        0,1
Wed        0,1

Method (x2)   Dummy Value
Credit        0,1
Cash          0,1
Check         0,1
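Dummy coding simply turns each category into its own 0/1 variable. The sketch below illustrates this for one hypothetical transaction; the `dummy_code` helper and the `is_...` variable names are illustrative, not part of any statistical package.

```python
def dummy_code(value, categories):
    """Return one 0/1 dummy per category: 1 for the observed category, else 0."""
    return {f"is_{c}": int(value == c) for c in categories}

days = ["Mon", "Tue", "Wed"]
methods = ["Credit", "Cash", "Check"]

# Hypothetical transaction: paid in cash on a Tuesday.
transaction = {"day": "Tue", "method": "Cash"}

x = {
    **dummy_code(transaction["day"], days),
    **dummy_code(transaction["method"], methods),
}
print(x)
```

When the regression includes a constant term, one dummy per variable (for example Mon and Credit) is normally left out as the reference category, because the full set of dummies for a variable always sums to 1; the remaining coefficients are then read as differences from that reference category.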

Binary Logistic Regression
Two Response (Y) Categories

Are customers who saw an advertisement for a company's new cereal more likely to buy the product?

Analysts randomly sample customers and ask them whether they saw the advertisement and whether they bought the cereal.

Binary Logistic Regression
Two Response (Y) Categories

Decision (y)   Coding
Buy            1
Don’t buy      0

$\Pr(\text{Decision}) = f(\text{Ad. Dummy}) + e$
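To give a feel for what the function f looks like, the sketch below assumes f is the logistic function, Pr(Buy) = 1 / (1 + exp(-(b0 + b1·Ad))). With a single 0/1 predictor the fitted coefficients can be read directly from the two group proportions, so no estimation routine is needed. The survey counts are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def logistic(z):
    """Inverse of logit: turns log-odds back into a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical survey counts (not from the slides).
bought_saw_ad, total_saw_ad = 30, 50   # customers who saw the advertisement
bought_no_ad, total_no_ad = 15, 50     # customers who did not

p_ad = bought_saw_ad / total_saw_ad    # 0.6
p_no_ad = bought_no_ad / total_no_ad   # 0.3

b0 = logit(p_no_ad)                    # log-odds of buying when Ad = 0
b1 = logit(p_ad) - logit(p_no_ad)      # change in log-odds when Ad = 1
odds_ratio = math.exp(b1)

print(f"Pr(Buy | no ad) = {logistic(b0):.2f}")
print(f"Pr(Buy | ad)    = {logistic(b0 + b1):.2f}")
print(f"odds ratio      = {odds_ratio:.2f}")
```

A positive b1 (an odds ratio above 1) means customers who saw the advertisement are more likely to buy; whether the difference is statistically significant is judged from the P value of b1, as in ordinary regression.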

Ordinal Logistic Regression
More than Two Response (Y) Categories in Natural Order

Is the weight of a hen related to the size of its eggs?

Researchers randomly sample hens, record the weight of each hen, and classify the size of its eggs as small, medium, or large.

Ordinal Logistic Regression
More than Two Response (Y) Categories in Natural Order

Egg Size (y)   Coding
Small          1
Medium         2
Large          3

$\Pr(\text{EggSize}) = f(\text{HenWeight}) + e$

Nominal Logistic Regression
More than Two Response (Y) Categories with No Natural Order

Is the color of the vehicle that consumers purchase related to their gender or age?

Because the colors of the vehicles cannot be arranged from least to greatest, the response categories do not follow a natural order.

Nominal Logistic Regression
More than Two Response (Y) Categories with No Natural Order

Color (y)   Dummy Value
Silver      0,1
Blue        0,1
Red         0,1

$\Pr(\text{Color}) = f(\text{Age}) + e$

Potential Misuses of Statistics

• Manipulating the scale to change the appearance of the distribution of data

• Eliminating high/low scores for more coherent presentation

• Inappropriately focusing on certain variables to the exclusion of other variables

• Presenting correlation as causation

Conclusion
• Statistical analysis is just one way of working with observable information.
• It consists of tests used to analyze data. These tests provide an analytical framework within which researchers can pursue their research questions.
• However, statistical tests may be misused, resulting in potential misinterpretation and misrepresentation.

Reading
• Sykes, A. An Introduction to Regression Analysis. Inaugural Coase Lecture. Chicago Working Paper in Law & Economics.
• US General Accounting Office (1992). Quantitative Data Analysis: An Introduction. Report to Program Evaluation and Methodology Division.
• Colorado State University. Introduction to Statistics. http://writing.colostate.edu/guides/research/stats/index.cfm
• Trochim, William M.K. (2006). Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/index.php