A Simple Guide To The Analysis Of Social Science Quantitative Data




Transcript of A Simple Guide To The Analysis Of Social Science Quantitative Data

A Simple Guide to the Analysis of Quantitative Data: An Introduction with Hypotheses, Illustrations and References
By Paul Andrew Bourne

A Simple Guide to the Analysis of Quantitative Data: An Introduction with Hypotheses, Illustrations and References

By Paul Andrew Bourne
Health Research Scientist, the University of the West Indies, Mona Campus

Department of Community Health and Psychiatry
Faculty of Medical Sciences
The University of the West Indies, Mona Campus, Kingston, Jamaica

© Paul Andrew Bourne 2009

A Simple Guide to the Analysis of Quantitative Data: An Introduction with Hypotheses, Illustrations and References

The copyright of this text is vested in Paul Andrew Bourne, and the Department of Community Health and Psychiatry is the publisher. No chapter may be reproduced wholly or in part without the express written permission of both author and publisher. All rights reserved.

Published April 2009
Department of Community Health and Psychiatry, Faculty of Medical Sciences, The University of the West Indies, Mona Campus, Kingston, Jamaica.

National Library of Jamaica Cataloguing in Publication Data
A catalogue record for this book is available from the National Library of Jamaica.
ISBN 978-976-41-0231-1 (pbk)

Cover design and photograph by Paul Andrew Bourne.

Table of Contents

Preface
Menu bar: Contents of the Menu bar in SPSS
Function: Purposes of the different things on the menu bar
Mathematical symbols (numeric operations) in SPSS
Listing of Other Symbols
The whereabouts of some SPSS functions, or commands
Disclaimer
Coding Missing Data
Computing Date of Birth
List of Figures
List of Tables
How do I obtain access to the SPSS PROGRAM?

1. INTRODUCTION
   1.1.0a Steps in the analysis of hypothesis
   1.1.1a Operational definitions of a variable
   1.1.1b Typologies of variable
   1.1.1 Levels of measurement
   1.1.3 Conceptualizing descriptive and inferential statistics

2. DESCRIPTIVE STATISTICS ANALYZED
   2.1.1 Interpreting data based on their levels of measurement
   2.1.2 Treating missing (i.e. non-response) cases

3. HYPOTHESES: INTRODUCTION
   3.1.1 Definitions of Hypotheses
   3.1.2 Typologies of Hypothesis
   3.1.3 Directional and non-Directional Hypotheses
   3.1.4 Outliers (i.e. skewness)
   3.1.5 Statistical approaches for treating skewness

4. Hypothesis 1 [using Cross tabulations and Spearman ranked ordered correlation]
   A1. Physical and social factors and instructional resources will directly influence the academic performance of students who will write the Advanced Level Accounting Examination;
   A2. Physical and social factors and instructional resources positively influence the academic performance of students who write the Advanced Level Accounting Examination, and the relationship varies according to gender;
   B1. Pass successes in Mathematics, Principles of Accounts and English Language at the Ordinary/CXC General level will positively influence success on the Advanced Level Accounting Examination;
   B2. Pass successes in Mathematics, Principles of Accounts and English Language at the Ordinary.

5. Hypothesis 2 [using Crosstabulations]
   There is a relationship between religiosity, academic performance, age and marijuana smoking of post-primary school students, and this relationship varies based on gender.

6. Hypothesis 3 [Paired Sample t-test]
   There is a statistical difference between the pre-Test and the post-Test scores.

7. Hypothesis 4 [using Pearson Product Moment Correlation]
   Ho: There is no statistical relationship between expenditure on social programmes (public expenditure on education and health) and levels of development in a country; and
   H1: There is a statistical association between expenditure on social programmes (i.e. public expenditure on education and health) and levels of development in a country.

8. Hypothesis 5 [using Logistic Regression]
   The health care seeking behaviour of Jamaicans is a function of educational level, poverty, union status, illnesses, duration of illnesses, gender, per capita consumption, ownership of health insurance policy, and injuries.
   [Health Care Seeking Behaviour = f(educational levels, poverty, union status, illnesses, duration of illnesses, gender, per capita consumption, ownership of health insurance policy, injuries)]

9. Hypothesis 6 [using Linear Regression]
   There is a negative correlation between access to tertiary level education and poverty, controlled for sex, age, area of residence, household size, and educational level of parents.

10. Hypothesis 7 [using Pearson Product Moment Correlation Coefficient and Crosstabulations]
   There is an association between the introduction of the Inventory Readiness Test and the Performance of Students in Grade 1.

11. Hypothesis 8 [using Spearman rho]
   People who perceive themselves to be in the upper and middle classes are more likely than those in the lower (or working) class to strongly believe that acts of incivility are caused only by persons in garrison communities.

12. Hypothesis 9
   Various cross tabulations

13. Hypothesis 10 [using Pearson and Crosstabulations]
   There is no statistical difference between the typology of workers in the construction industry and how they view the top 10 productivity outcomes.

14. Hypothesis 11 [using Crosstabulations and Linear Regression]
   Determinants of the academic performance of students

15. Hypothesis 12 [using Spearman ranked ordered correlation]
   People who perceive themselves to be within the lower social status (i.e. class) are more likely to be uncivil than those of the upper classes.

16. Data Transformation
   Recoding
   Dummying variables
   Summing similar variables
   Data reduction

Glossary
Reference
Appendices
   Appendix 1: Labeling non-responses
   Appendix 2: Statistical errors in data
   Appendix 3: Research Design
   Appendix 4: Example of Analysis Plan
   Appendix 5: Assumptions in regression
   Appendix 6: Steps in running a bivariate cross tabulation
   Appendix 7: Steps in running a trivariate cross tabulation
   Appendix 8: What is placed in a cross tabulations table, using the above SPSS output
   Appendix 9: How to run a Regression in SPSS
   Appendix 10: Running Regression in SPSS
   Appendix 11a: Interpreting strength of associations
   Appendix 11b: Interpreting strength of association
   Appendix 12: Selecting cases
   Appendix 13: UNDO selecting cases
   Appendix 14: Weighting cases
   Appendix 15: Undo weighting cases
   Appendix 15: Statistical symbolisms
   Appendix 16: Converting from string to numeric data
      Apparatus One: Converting from string data to numeric data
      Apparatus Two: Converting from alphabetic and numeric data to all numeric data
   Appendix 17: Steps in running Spearman rho
   Appendix 18: Steps in running Pearson's Product Moment Correlation
   Appendix 19: Sample sizes and their appropriate sampling error
   Appendix 20: Calculating sample size from sampling error(s)
   Appendix 21: Sample sizes and their sampling errors
   Appendix 22: Sample sizes and their sampling errors
   Appendix 23: If conditions
   Appendix 24: The meaning of value
   Appendix 25: Explaining Kurtosis and Skewness
   Appendix 26: Sampled Research Papers

PREFACE

One of the complexities for many undergraduate students and first-time researchers is how to blend their socialization with the systematic rigours of scientific inquiry. For some, the socialization process will have embedded in them hunches, faith, family authority and even 'hearsay' as acceptable modes of establishing the existence of certain phenomena. These are not principles or approaches rooted in academic theorizing or critical thinking.
Despite overwhelming scientific evidence gathered through empiricism, some perspectives that students hold are difficult to change even after they have been falsified, as students still want to hold 'true' to their previous ways of gaining knowledge. Even though time may clearly show that those issues are obsolete or even 'mythological', students will always adhere to information that they garnered in their early socialization. The difficulty in objectivism is not the 'truths' that it claims to provide or how we must relate to these realities; it is how young researchers abandon their preferred socialization in favour of research findings. Furthermore, the difficulty for humans, and even more so for upcoming scholars, is how to validate their socialization against research findings in the presence of empiricism. Against this background, social researchers must understand that ethics must govern the reporting of their findings, irrespective of the results and their value systems. Ethical principles, in social or natural research, are not 'good' because of their inherent construction, but because they protect the subjects (participants) from researchers who may think the study's contribution outweighs any harm that the interviewees may suffer from taking part in the study. Then there is the issue of confidentiality, which may sometimes conflict with the personal situations faced by the researcher. Put simply, I suggest that which takes precedence is determined by the code of conduct that guides the profession. Hence, undergraduate students should be made aware that findings must be reported without any form of alteration. This then gives rise to the question: how do we systematically investigate social phenomena?

The age-old discourse on the correctness of quantitative versus qualitative research will not be explored in this work, as such a debate is obsolete and rehashing it here would be pointless. Instead, this textbook offers illustrations of how to analyze quantitative data without including any qualitative interpretation techniques. I believe that the problems students face in interpreting statistical (i.e. quantitative) data must be addressed, as the complexities are many but can be overcome in a short time with assistance. My rationale for using 'hypotheses' as the premise upon which to build an analysis is embedded in the logic of how to explore social or natural happenings. I know that hypothesis testing is not the only approach to examining current germane realities, but it is one that uses more 'pure' science techniques than other approaches.

Hypothesis testing is not simply about the null hypothesis, Ho (no statistical relationship), or the alternative hypothesis, Ha; it is a systematic approach to the investigation of observable phenomena. In attempting to make undergraduate students recognize the rich annals of hypothesis testing and how paramount it is to the discovery of social fact, I recommend that we begin by reading Thomas S. Kuhn (the Scientific Revolution), Emile Durkheim (study on suicide), W.E.B. DuBois (study on the Philadelphian Negro) and the works of Garth Lipps, which clearly depict the knowledge base garnered from its usage.

In writing this book, I tried not to assume that readers have grasped the intricacies of quantitative data analysis; as such, I have provided the apparatus and the solutions that are needed in analyzing data from stated hypotheses. The purpose of this approach is for junior researchers to thoroughly understand the material while recognizing the importance of hypothesis testing in scientific inquiry.

Paul Andrew Bourne, Dip Ed, BSc, MSc, PhD
Health Research Scientist
Department of Community Health and Psychiatry
Faculty of Medical Sciences
The University of the West Indies
Mona, Jamaica.

ACKNOWLEDGEMENT

This textbook would not have materialized without the assistance of a number of people (scholars, associates, and students) who took time from their busy schedules to guide, proofread and make invaluable suggestions on the initial manuscript. Those who offered themselves include Drs. Ikhalfani Solan, Samuel McDaniel and Lawrence Nicholson, who proofread the manuscript and made suggestions as to its appropriateness, simplicity and reach to those it intends to serve. Furthermore, Mr. Maxwell S. Williams is largely responsible for planting the idea in my mind for a book of this nature. Special thanks must be extended to Mr. Douglas Clarke, an associate, who directed my thoughts in times of frustration and bewilderment, and on occasion gave me insight into the material and how it could be made better for the students. In addition, I would like to extend my heartiest appreciation to Professor Anthony Harriott and Dr. Lawrence Powell, both of the Department of Government, UWI, Mona, Jamaica, who are my mentors and have provided me with guidance and scope for the material, and who also offered their expert advice on the initial manuscript. Also, I would like to take this opportunity to acknowledge all the students of Introduction to Political Science (GT24M) of the 2006/07 class who used the introductory manuscript and made suggestions for its improvement, in particular Ms. Nina Mighty.

Menu Bar Content:

A social researcher should not only be cognizant of the statistical techniques and modalities of his/her discipline, but also needs a comprehensive grasp of the various functions within the menu of the SPSS program. What does the menu bar contain, and what are the functions of its contents?
The menu bar contains the following:
- File
- Edit
- View
- Data
- Transform
- Analyze
- Graphs
- Utilities
- Add-ons
- Window
- Help

The functions of the various contents of the menu bar are explored below.

Box 1: Menu Function

Menu Bar Functions: Purposes of the different things on the menu bar

File: This menu deals with the different functions associated with files, such as (i) opening, (ii) reading, (iii) saving, and (iv) exiting.
Edit: This menu stores functions such as (i) copying, (ii) pasting, (iii) finding, and (iv) replacing.
View: Within this lie functions that are screen related.
Data: This menu operates several functions, such as (i) defining, (ii) configuring, (iii) entering data, (iv) sorting, (v) merging files, (vi) selecting and weighting cases, and (vii) aggregating files.
Transform: Transformation is concerned with previously entered data, including (i) recoding, (ii) computing, (iii) reordering, and (iv) addressing missing cases.
Analyze: This houses all forms of data analysis apparatus, available with a simple click of the Analyze command.
Graphs: Creation of graphs or charts can begin with a click on the Graphs command.
Utilities: This deals with sophisticated ways of making complex data operations easier, as well as simply viewing the description of the entered data.

MATHEMATICAL SYMBOLS (NUMERIC OPERATIONS) in SPSS

NUMERIC OPERATIONS   FUNCTIONS
+                    Add
-                    Subtract
*                    Multiply
/                    Divide
**                   Raise to a power
()                   Order of operations
<                    Less than
>                    Greater than
>=                   Greater than or equal to
=                    Equal
~=                   Not equal to
&                    And: both relations must be true
|                    Or: either relation may be true
~                    Negation: true becomes false, false becomes true

Box 2: Mathematical symbols and their Meanings

LISTING OF OTHER SYMBOLS

SYMBOLS          MEANINGS
YRMODA           (i.e. year, month, day) Date of birth (e.g. 1968, 12, 05)
a                Y intercept
b                Coefficient of slope (or regression)
f                Frequency
n                Sample size
N                Population
R                Coefficient of correlation, Spearman's
r                Coefficient of correlation, Pearson
Sy               Standard error of estimate
W or Wt          Weight
μ                Mu, or population mean
β                Beta coefficient
α3               Measure of skewness
Σ                Summation
σ                Standard deviation
χ2               Chi-square; this is the value used to test for goodness of fit
CC               Coefficient of Contingency
fa               Frequency of class interval above modal group
fb               Frequency of class interval below modal group
X                A single value or variable
R̄                Adjusted r, which is the coefficient of correlation corrected for the number of cases
X̄ or Ȳ           Arithmetic mean of X or Y
RND              Round off to the nearest integer
SYSMIS           This denotes system-missing values
MISSING          All missing values
Type I Error     Claiming that events are related (or that means are different) when they are not
Type II Error    Assuming that events are not related (or that means are not different) when they are
φ                Phi coefficient
r2               The proportion of variation in the dependent variable explained by the independent variable(s)
P(A)             Probability of event A
P(A/B)           Probability of event A given that event B has happened
CV               Coefficient of variation
SE               Standard error
O                Observed frequency
X                Independent (explanatory, predictor) variable in regression
Y                Dependent (outcome, response, criterion) variable in regression
df               Degree of freedom
t                Symbol for the t ratio (the critical ratio that follows a t distribution)
R2               Squared multiple correlation in multiple regression

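The Type I and Type II error definitions in the listing above can be illustrated with a minimal Python sketch. It is not part of the SPSS workflow described in this text; the function name `classify_decision` and its two boolean parameters are hypothetical names chosen for illustration only. The sketch simply labels each combination of the real state of the null hypothesis and the researcher's decision.

```python
def classify_decision(null_is_true: bool, rejected_null: bool) -> str:
    """Label a test decision against the (normally unknown) real state of the null hypothesis.

    Hypothetical helper for illustration; not an SPSS function.
    """
    if null_is_true and rejected_null:
        # Claimed a relationship (or difference) that does not exist.
        return "Type I error"
    if not null_is_true and not rejected_null:
        # Missed a relationship (or difference) that does exist.
        return "Type II error"
    # The decision agrees with the real state of the world.
    return "No problem"


if __name__ == "__main__":
    # Print all four combinations of truth and decision.
    for null_is_true in (True, False):
        for rejected_null in (False, True):
            outcome = classify_decision(null_is_true, rejected_null)
            print(f"null true={null_is_true}, rejected={rejected_null}: {outcome}")
```

In practice the real state of the null hypothesis is unknown, so the function is useful only for reasoning about which kind of mistake a given decision would risk.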
FURTHER INFORMATION ON TYPE I and TYPE II Error

                                      The real world: the null hypothesis is really...
Finding from your survey              True              False
You found the null hypothesis True    No problem        Type II error
You found the null hypothesis False   Type I error      No problem

THE WHEREABOUTS OF SOME SPSS FUNCTIONS

Functions or commands, followed by their whereabouts in SPSS (the process in arriving at the various commands):

- Mean, mode, median, standard deviation, skewness or kurtosis, range, minimum or maximum: Analyze > Descriptive statistics > Frequencies > Statistics
- Chi-square: Analyze > Descriptive statistics > Crosstabs
- Pearson's Moment Correlation: Analyze > Correlate > Bivariate
- Spearman's rho: Analyze > Correlate > Bivariate (ensure that you deselect Pearson's, and select Spearman's rho)
- Linear Regression: Analyze > Regression > Linear
- Logistic Regression: Analyze > Regression > Binary
- Discriminant Analysis: Analyze > Classify > Discriminant
- Mann-Whitney U Test: Analyze > Nonparametric Test > 2 Independent Samples
- Independent Sample t-test: Analyze > Compare means > Independent Samples T-Test
- Wilcoxon matched-pairs test or Wilcoxon signed-rank test: Analyze > Nonparametric Test > 2 Related Samples
- t-test: Analyze > Compare means
- Paired-samples t-test: Analyze > Compare means > Paired-samples T-test
- One-sample t-test: Analyze > Compare means > One-sample T-test
- One-way analysis of variance: Analyze > Compare means > One-way ANOVA
- Factor Analysis: Analyze > Data reduction > Factor
- Descriptive (for a single metric variable): Analyze > Descriptive statistics > Descriptive
- Graphs (pie chart, bar charts, histogram): Graphs > (select the appropriate type)
- Scatter plots: Graphs > Scatter
- Weighting cases: Data > Weight cases > select "Weight cases by"
- Selecting cases: Data > Select cases > If all conditions are satisfied > Select If
- Replacing missing values: Transform > Replace missing values

Box 3: The whereabouts of some SPSS Functions

Disclaimer

I am a trained Demographer, and as such, I have undertaken extensive review of various aspects of the SPSS program.
However, I would like to make it unequivocally clear that this text does not represent the SPSS (Statistical Product and Service Solutions, formerly Statistical Package for the Social Sciences) brand. Thus, this text is not sponsored or approved by SPSS, and so any errors in it are not the responsibility of the brand name. SPSS is a registered trademark of SPSS Inc. In the event that you need more pertinent information on the SPSS program or other related products, requests may be forwarded to: SPSS UK Ltd., First Floor, St. Andrew's House, West Street, Woking GU21 1EB, United Kingdom.

Coding Missing Data

The coding of data for survey research is not limited to responses, as we also need to code missing data. Several codes indicate missing values, and the researcher should know them and the context in which they apply in the coding process. 'No answer' in a survey indicates something apart from the respondent's refusal to answer or failure to remember to answer. The fundamental issue here is that there is no information for the respondent, as the information is missing.

Table: Missing Data codes for Survey Research

Question                              Refused answer   Didn't know answer   No answer recorded
Less than 6 categories                7                8                    9
More than 7 and less than 3 digits    97               98                   99
More than 3 digits                    997              998                  999

Note
'Less than 6 categories': when a question is asked of a respondent, the options (or responses) may be many. In this case, if the question has 6 options or fewer, refusal can be coded 7, didn't know 8, and no answer 9.
Some researchers do not make a distinction between the missing categories, and 999 (or 99) is used in all cases of missing values.

Computing Date of Birth

If you are only given year of birth:

Step 1: First, select Transform, and then Compute.

Step 2: On selecting Compute Variable, SPSS provides the Compute Variable dialogue box.

Step 3: In the Target Variable box, write the word that the researcher wants to use to represent the idea.

Step 4: If the SPSS version is later than 12.0 (i.e. 13 to 17), the next step is to select 'All' in the Function group box. To convert year of birth to actual age, select Xdate.Year.

Step 5: Having selected Xdate.Year, use the arrow to take it into the Numeric Expression box, then replace the ? mark with the relevant variable in the dataset.

LISTING OF FIGURES AND TABLES

Listing of Figures

Figure 1.1.1: Flow Chart: How to Analyze Quantitative Data?
Figure 1.1.2: Properties of a Variable
Figure 1.1.3: Illustration of Dichotomous Variables
Figure 1.1.4: Ranking of the Levels of Measurement
Figure 1.1.5: Levels of Measurement
Figure 2.1.0: Steps in Analyzing Non-Metric Data
Figure 2.1.1: Respondents' Gender
Figure 2.1.2: Respondents' Gender
Figure 2.1.3: Social Class of Respondents
Figure 2.1.4: Social Class of Respondents
Figure 2.1.5: Steps in Analyzing Metric Data
Figure 2.1.6: Running SPSS for a Metric Variable
Figure 2.1.7: Running SPSS for a Metric Variable
Figure 2.1.8: Running SPSS for a Metric Variable
Figure 2.1.9: Running SPSS for a Metric Variable
Figure 2.1.10: Running SPSS for a Metric Variable
Figure 2.1.11: Running SPSS for a Metric Variable
Figure 2.1.12: Running SPSS for a Metric Variable
Figure 2.1.13: Running SPSS for a Metric Variable
Figure 2.1.14: Running SPSS for a Metric Variable
Figure 2.1.15: Running SPSS for a Metric Variable
Figure 2.1.16: Running SPSS for a Metric Variable
Figure 4.1.1: Age - Descriptive Statistics
Figure 4.1.2: Gender of Respondents
Figure 4.1.3: Respondents' parent educational level
Figure 4.1.4: Parental/Guardian Composition for Respondents
Figure 4.1.5: Home Ownership of Respondents' Parent/Guardian
Figure 4.1.6: Respondents Affected by Mental and/or Physical Illnesses
Figure 4.1.7: Suffering from mental illnesses
Figure 4.1.8: Affected by at least one Physical Illness
Figure 4.1.9: Dietary Consumption for Respondents
Figure 6.1.2: Typology of Previous School
Figure 6.1.3: Skewness of Examination i (i.e. Test i)
Figure 6.1.4: Skewness of Examination ii (i.e. Test ii)
Figure 6.1.5: Perception of Ability
Figure 6.1.6: Self-perception
Figure 6.1.7: Perception of task
Figure 6.1.8: Perception of utility
Figure 6.1.9: Class environment influence on performance
Figure 6.1.10: Perception of Ability
Figure 6.1.11: Self-perception
Figure 6.1.12: Self-perception
Figure 6.1.13: Perception of task
Figure 6.1.14: Perception of Utility
Figure 6.1.15: Class Environment influence on Performance
Figure 7.1.1: Frequency distribution of total expenditure on health as % of GDP
Figure 7.1.2: Frequency distribution of total expenditure on education as % of GNP
Figure 7.1.3: Frequency distribution of the Human Development Index
Figure 7.1.4: Running SPSS for social expenditure on social programme
Figure 7.1.5: Running bivariate correlation for social expenditure on social programme
Figure 7.1.6: Running bivariate correlation for social expenditure on social programme
Figure 13.1.1: Categories that describe Respondents' Position
Figure 13.1.2: Company's Annual Work Volume
Figure 13.1.3: Company's Labour Force on an average per year
Figure 13.1.4: Respondents' main Area of Construction Work
Figure 13.1.5: Percentage of work self-performed in contrast to sub-contracted
Figure 13.1.6: Percentage of work self-performed in contrast to sub-contracted
Figure 13.1.7: Years of Experience in Construction Industry
Figure 13.1.8: Geographical Area of Employment
Figure 13.1.9: Duration of service with current employer
Figure 13.1.10: Productivity changes over the past five years
Figure 14.1.1: Characteristic of Sampled Population
Figure 14.1.2: Employment Status of Respondents

Listing of Tables

Table 1.1.1: Synonyms for the different Levels of measurement
Table 1.1.2: Appropriateness of Graphs, from different Levels of measurement
Table 1.1.3: Levels of measurement with examples and other characteristics
Table 1.1.4: Levels of measurement, and measure of central tendencies and measure of variability
Table 1.1.5: Combinations of Levels of measurement, and types of statistical Test which are applicable
Table 1.1.6a: Statistical Tests and their Levels of Measurement
Table 1.1.6b:
Table 2.1.1a: Gender of Respondents
Table 2.1.1b: General happiness
Table 2.1.2: Social Status
Table 2.1.3: Descriptive Statistics on the Age of the Respondents
Table 2.1.4: From the following list, please choose what the most important characteristics of democracy are for you
Table 4.1.1: Respondents' Age
Table 4.1.2 (a): Univariate Analysis of the explanatory Variables
Table 4.1.2 (b): Univariate Analysis of explanatory
Table 4.1.2 (c): Univariate Analysis of explanatory
Table 4.1.3: Bivariate Relationships between academic performance and subjective Social Class (n=99)
Table 4.1.4: Bivariate Relationships between comparative academic performance and subjective Social Class (n=108)
Table 4.1.5: Bivariate Relationships between academic performance and physical exercise (n=111)
Table 4.1.6 (i): Bivariate Relationships between academic performance and instructional materials (n=113)
Table 4.1.6 (ii): Relationship between academic performance and materials among students who will be writing the A Level Accounting Examination, 2004
Table 4.1.7: Bivariate Relationships between academic performance and Class attendance (n=106)
Table 4.1.8: Bivariate Relationship between academic performance and attendance
Table 4.1.9: Bivariate Relationships between academic performance and breakfast consumption (n=114)
Table 4.1.10: Relationship between academic performance and breakfast consumption among A Level Accounting students, controlling for Gender
Table 4.1.11: Bivariate Relationships between academic performance and migraine (n=116)
Table 4.1.12: Bivariate Relationships between academic performance and mental illnesses (n=116)
Table 4.1.13: Bivariate Relationships between academic performance and physical illnesses (n=116)
Table 4.1.14: Bivariate Relationships between academic performance and illnesses (n=116)
Table 4.1.15: Bivariate Relationships between current academic performance and past performance in CXC/GCE English language Examination (n=112)
Table 4.1.16: Bivariate Relationships between academic performance and past performance in CXC/GCE English language Examination, controlling for Gender
Table 4.1.17: Bivariate Relationships between academic performance and past performance in CXC/GCE Mathematics Examination n=
Table 4.1.18 (i): Bivariate Relationships between academic performance and past performance in CXC/GCE principles of accounts Examination (n=114)
Table 4.1.19 (ii): Bivariate Relationships between academic performance and past performance in CXC/GCE POA Examination, controlling for Gender
Table 4.1.20: Bivariate Relationships between academic performance and Self-Concept (n=112)
Table 4.1.21: Bivariate Relationships between academic performance and Dietary Requirements (n=116)
Table 4.1.22: Summary of Tables
Table 5.1.1: Frequency and percent Distributions of explanatory model Variables
Table 5.1.2: Relationship between Religiosity and Marijuana Smoking (n=7,869)
Table 5.1.3: Relationship between Religiosity and Marijuana Smoking controlled for Gender
Table 5.1.4: Relationship between Age and marijuana smoking (n=7,948)
Table 5.1.5: Relationship between marijuana smoking and Age of Respondents, controlled for sex
Table 5.1.6: Relationship between academic performances and marijuana smoking (n=7,808)
Table 5.1.7: Relationship between academic performances and marijuana smoking, controlled for Gender
Table 5.1.8: Summary of Tables
Table 6.1.1: Age Profile of respondent
Table 6.1.2: Examination Scores
Table 6.1.3(a): Class Distribution by Gender
Table 6.1.3(b): Class Distribution by Age Cohorts
Table 6.1.3(c): Pre-Test Score by Typology of Group
Table 6.1.4: Comparison of Examination I and Examination II
Table 6.1.5: Comparison across the Group by Tests
Table 6.1.6: Analysis of Factors influence on Test ii Scores
Table 6.1.7: Cross-Tabulation of Test ii Scores and Factors
Table 6.1.8: Bivariate Relationship between students Factors and Test ii Scores
Table 7.1.1: Descriptive Statistics - total expenditure on public health (as Percentage of GNP, HRD, 1994)
Table 7.1.2: Descriptive Statistics of expenditure on public education (as Percentage of GNP, HRD, 1994)
Table 7.1.3: Descriptive Statistics of Human Development (proxy for development)
Table 7.1.4: Bivariate Relationships between dependent and independent Variables
Table 7.1.5: Summary of Hypotheses Analysis
Table 8.1.1: Age Profile of Respondents (n=16,619)
Table 8.1.2: Logged Age Profile of Respondents (n=16,619)
Table 8.1.3: Household Size (all individuals) of Respondents
Table 8.1.4: Union Status of the sampled Population (n=16,619)
Table 8.1.5: Other Univariate Variables of the Explanatory Model
Table 8.1.6: Variables in the Logistic Equation
Table 8.1.7: Classification Table
Table 8.1.1: Univariate Analyses
Table 8.1.2: Frequency Distribution of Educational Level by Quintile
Table 8.1.3: Frequency Distribution of Jamaica's Population by Quintile and Gender
Table 8.1.4: Frequency Distribution of Educational Level by Quintile
Table 8.1.5: Frequency Distribution of Pop. Quintile by Household Size
Table 8.1.6: Bivariate Analysis of access to Tertiary Edu. and Poverty Status
Table 8.1.7: Bivariate Analysis of access to Tertiary Edu. and Geographic Locality of Residents
Table 8.1.8: Bivariate Analysis of geographic locality of residents and poverty Status
Table 8.1.9: Bivariate Relationship between access to tertiary level education by Gender
Table 8.1.10: Bivariate Relationship between Access to Tertiary Level Education by Gender controlled for Poverty Status
Table 8.1.11: Regression Model Summary
Table 10.1.1: Univariate Analysis of Parental Information
Table 10.1.2: Descriptive on Parental Involvement
Table 10.1.3: Univariate Analysis of Teachers' Information
Table 10.1.4: Univariate Analysis of ECERS-R Profile
Table 10.1.5: Bivariate Analysis of Self-reported Learning Environment and Mastery on Inventory Test
Table 10.1.6: Relationship between Educational Involvement, Psychosocial and Environment Involvement and Inventory Test
Table 10.1.6: Relationship between Educational Involvement, Psychosocial and Environment Involvement and Inventory Test
Table 10.1.8: School Type by Inventory Readiness Score
Table 11.1.1: Incivility and Subjective Social Status
Table 12.1.2: Have you or someone in your family known of an act of Corruption in the last 12 months?
Table 12.1.3: Gender of Respondent
Table 12.1.4: In what Parish do you live?
Table 12.1.5: Suppose that you, or someone close to you, have been a victim of a crime. What would you do...?
Table 12.1.6: What is your highest level of Education?
Table 12.1.7: In terms of Work, which of these best describes your Present situation?
Table 12.1.8: Which best represents your Present position in Jamaica Society?
Table 12.1.9: Age on your last Birthday?
Table 12.1.10: Age categorization of Respondents
Table 12.1.11: Suppose that you, or someone close to you, have been a victim of a crime. What would you do... by Gender of respondent Cross Tabulation
Table 12.1.12: If involved in a dispute with neighbour and repeated discussions have not made a difference, would you...? by Gender of respondent Cross Tabulation
Table 12.1.13: Do you believe that corruption is a serious problem in Jamaica?
by Gender of respondent Cross Tabulation
Table 12.1.14: Have you or someone in your family known of an act of corruption in the last 12 months? by Gender of respondent Cross Tabulation
Table 14.1.1: Marital Status of Respondents
Table 14.1.2: Marital Status of Respondents by Gender
Table 14.1.3: Marital Status by Gender by Age cohort
Table 14.1.4: Marital Status by Gender by Age Cohort
Table 14.1.5: Educational Level by Gender by Age Cohorts
Table 14.1.6: Income Distribution of Respondents
Table 14.1.7: Parental Attitude Toward School
Table 14.1.8: Parent Involving Self
Table 14.1.9: School Involving Parent
Table 14.1.8: Regression Model Summary
Table 15.1.1: Correlations
Table 15.1.2: Cross Tabulation between incivility and social status

How do I obtain access to the SPSS PROGRAM?

Step One: In order to access the SPSS program, the student should select START at the bottom left-hand corner of the computer monitor. This is followed by selecting All Programs (see below).

Step Two: The next step is to select SPSS for Windows. Having chosen SPSS for Windows, a dialogue box appears to the right of it with the following options: SPSS for Windows; SPSS 12.0 (or 13.0, or 15.0); SPSS Map Geo-dictionary Manager Ink; and, last, SPSS Manager.

Step Three: Having done step two, the student will select SPSS 12.0 (or 13.0, or 14.0 or 15.0) for Windows, as this is the program with which he/she will be working.

Step Four: On selecting SPSS for Windows in step three, the dialogue box below appears. The next step is to select OK, which results in what appears in step five.

Step Five: What should I now do? The student should then select the inner red box with the X.

Step Six: This is what the SPSS spreadsheet looks like (see Figure below).

Step Seven: What is the difference here?
Look to the bottom left-hand corner of the spreadsheet and you will see two terms: (1) Data View and (2) Variable View. Data View accommodates the entering of the data once the template has been established in the Variable View. Thus, the Variable View allows for the entering of data (i.e. responses from the questionnaires) in the Data View. Ergo, the student must ensure that he/she has established the template before any typing can be done in the Data View.

Data View: Observe what the Data View window looks like.
Variable View: Observe what the Variable View window looks like.

CHAPTER 1

1.1.0a: INTRODUCTION

This book is in response to an associate's request for material that would provide simple illustrations of how to analyze quantitative data in the Social Sciences from actual hypotheses. He contended that all the currently available textbooks, despite providing some degree of analysis on quantitative data, failed to provide actual illustrations of cases in which hypotheses are given and a comprehensive assessment is made to answer issues surrounding the appropriate univariate, bivariate and/or multivariate processes of analysis. Hence, I began a quest to peruse the textbooks that presently exist in Research Methods in Social Sciences, Research Methods in Political Sciences, Introductory Statistics, Statistical Methods, Multivariate Statistics, and course materials on Research Methods, which revealed that a void existed in this regard. I have therefore consulted a plethora of academic sources in order to formulate this text.

In wanting to comprehensively fulfill my friend's request, I have used a number of datasets that I have analyzed over the past 6 years, along with the provision of key terminologies which are applicable to understanding the various hypotheses.
I am cognizant that a need exists to provide some information on Simple Quantitative Data Analysis, but this text is in keeping with the demand to make available materials for aiding the interpretation of quantitative data, and is not intended to unveil any new materials in the discipline. The rationale behind this textbook is embedded in the simple reality that many undergraduate students are faced with the complex task of choosing the most appropriate statistical test. This becomes problematic for them, as the issue of wanting to complete an assignment, and knowing that it is properly done, will plague the pupil. The answer to this question lies in three fundamental issues: (1) the nature of the variables (continuous or discrete), (2) whether the purpose of the analysis is mere description or statistical inference, and/or (3) whether any of the independent variables are covariates². Nevertheless, the materials provided here are a range of research projects, which will give new information on particular topics, from the hypothesis to the univariate analysis and the bivariate or multivariate analyses.

² If the effects of some independent variables are assessed after the effects of other independent variables are statistically removed (Tabachnick and Fidell 2001, 17)

1.1.0b: STEPS IN ANALYZING A HYPOTHESIS

One of the challenges faced by a social researcher is how to succinctly conceptualize (i.e. define) his/her variables, which will also be operationalized (measured) for the purpose of the study. Having written a hypothesis, the researcher should identify the number of variables which are present, and then distinguish the dependent from the independent variables. Following this, he/she should recognize the level of measurement to which each variable belongs, and then decide which statistical test is appropriate based on the combination of the variables' levels of measurement.
The figure below is a flow chart depicting the steps in analyzing data when given a hypothesis.

The production of this text is in response to the need for a simple book which would address the concerns of undergraduate students who must analyze a hypothesis. Among the issues raised in this book are (1) the systematic steps involved in analyzing a hypothesis, (2) definitions of a hypothesis, (3) typologies of hypothesis, (4) conceptualization of a variable, (5) types of variables, (6) levels of measurement, (7) illustration of how to perform SPSS operations on the description of different levels of measurement and inferential statistics, (8) Type I and II errors, (9) arguments on the treatment of missing values as well as outliers, (10) how to transform selected quantitative data, and (11) other pertinent matters.

The primary reason behind the use of many of the illustrations, conceptualizations and peripheral issues rests squarely on the fact that the reader should grasp a thorough understanding of how the entire process is done, and the rationale for the method used.
ANALYZING QUANTITATIVE DATA
STEP ONE: Write your hypothesis.
STEP TWO: Identify the variables from the hypothesis.
STEP THREE: Define and operationalize each variable selected from the hypothesis.
STEP FOUR: Decide on the level of measurement for each variable.
STEP FIVE: Decide which variable is the DV, and which the IV.
STEP SIX: Check for skewness and/or outliers in metric variables.
STEP SEVEN: Do descriptive statistics for the chosen variables.
STEP EIGHT: If statistical association, causality or predictability is needed, continue; if not, stop!
STEP NINE: If statistical inference is needed, look at the combination of DV and IV(s).
STEP TEN: Choose the appropriate statistical test based on the combination of DV and IVs.
STEP ELEVEN: Having used the test, analyze the data carefully, based on the statistical test.
FIGURE 1.1.1: FLOW CHART: HOW TO ANALYZE QUANTITATIVE DATA?

This entire text is about how to analyze quantitative data from a hypothesis. Based on Figure 1.1.1, it may appear that the research process begins from a hypothesis, but this is not the case. Despite that, I am emphasizing the interpretation of hypotheses, which is the basis for this monograph, and so I start from an actual hypothesis. Thus, before I provide you with operational definitions of variables, I will provide some contextualization of what a variable is; then the steps will be worked out.

1.1.1a: DEFINITIONS OF A VARIABLE

Undergraduates and first-time researchers should be aware that quantitative data analysis is primarily based on (1) empirical literature, (2) typologies of variables within the hypothesis, (3) conceptualization and operationalization of the variables, and (4) the level of measurement of each variable. It should be noted that defining a variable is not simply the collation of a group of words because we are so inclined, as each variable requires two critical characteristics in order that it is done properly (see Figure 1.1.2).
PROPERTIES OF A VARIABLE: MUTUAL EXCLUSIVITY; EXHAUSTIVENESS
FIGURE 1.1.2: PROPERTIES OF A VARIABLE.

In order to provide a comprehensive outlook on a variable, I will use the definitions of various scholars so as to give a clear understanding of what it is.

Variables are empirical indicators of the concepts we are researching. Variables, as their name implies, have the ability to take on two or more values... The categories of each variable must have two requirements. They should be both exhaustive and mutually exclusive. By exhaustive, we mean that the categories of each variable must be comprehensive enough that it is possible to categorize every observation (Babbie, Halley, and Zaino 2003, 11)... Exclusive refers to the fact that every observation should fit into only one category (Babbie, Halley and Zaino 2003, 12)

A variable is therefore something which can change and can be measured. (Boxill, Chambers and Wint 1997, 22)

The definition of a variable, then, is any attribute or characteristic of people, places, or events that takes on different values. (Furlong, Lovelace, Lovelace 2000, 42)

A variable is a characteristic or property of an individual population unit (McClave, Benson and Sincich 2001, 5)

Variable. A concept or its empirical measure that can take on multiple values (Neuman 2003, 547).

Variables are, therefore, the quantification of events, people, and places in order to measure observations which are categorical (i.e. nominal and ordinal data) and non-categorical (i.e. metric) in an attempt to be informed about the observation in reality. Each variable must fulfill two basic conditions: (i) exhaustiveness: the variable must be so defined that all tenets are captured, as it is comprehensive enough to include all the observations; and (ii) mutual exclusivity: the variable should be so defined that it applies to one event and one event only (i.e. every observation should fit into only one category) (Bourne 2007).
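The two conditions can also be checked mechanically. The sketch below is only an illustration written in Python (it is not an SPSS procedure): the function name check_categories, the age cohorts and their cut-off values are all hypothetical, chosen merely to set a coding scheme that violates both conditions beside one that satisfies them.

```python
def check_categories(categories, observations):
    """Find observations that break exhaustiveness or mutual exclusivity.

    categories maps a label to a rule (a function returning True or False).
    Returns (unclassified, overlapping).
    """
    unclassified = []  # fits no category: the scheme is not exhaustive
    overlapping = []   # fits several categories: not mutually exclusive
    for value in observations:
        matches = [label for label, rule in categories.items() if rule(value)]
        if not matches:
            unclassified.append(value)
        elif len(matches) > 1:
            overlapping.append((value, matches))
    return unclassified, overlapping


# Hypothetical age cohorts with two deliberate flaws:
# age 30 fits two cohorts, and ages above 60 fit none.
flawed = {
    "30 and under": lambda age: age <= 30,
    "30 to 60": lambda age: 30 <= age <= 60,
}

# A corrected (still hypothetical) scheme.
sound = {
    "under 30": lambda age: age < 30,
    "30 to 60": lambda age: 30 <= age <= 60,
    "over 60": lambda age: age > 60,
}

ages = [18, 30, 45, 72]
flawed_result = check_categories(flawed, ages)
sound_result = check_categories(sound, ages)
print(flawed_result)
print(sound_result)
```

With the flawed scheme, age 72 is left uncategorized and age 30 falls into two cohorts; the corrected scheme returns two empty lists, which is what a properly defined variable should give.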
One of the difficulties of social research is not the identification of a variable or variables in the study but the conceptualization and, oftentimes, the operationalization of the chosen construct. Thus, whereas the conceptualization (i.e. the definition) of the variable may (or may not) be complex, it is how to measure such a concept (i.e. variable) which oftentimes poses the problem for researchers. This must be done properly, bearing in mind the attributes of a variable, because it is this operational definition which you will be testing in the study (see Typologies of Variables, below). Thus, the testing of hypotheses is embedded within variables and the empiricism which is used to guide present studies. Hypothesis testing is a technique that is frequently employed by demographers, statisticians, economists and psychologists, to name a few practitioners, who are concerned about the testing of theories, the verification of truths about reality, and the modification of social realities within particular time, space and settings. With this being said, researchers must ensure that a variable is properly defined in an effort to ensure that the stated phenomenon is so defined and measured.

1.1.1b: TYPOLOGIES OF VARIABLE (examples, using Figure 1.1.2, above)

Health care seeking behaviour: People visiting a health practitioner or health consultant such as a doctor, nurse, pharmacist or healer for care and/or advice.
Levels of education: The number of years of formal schooling that one has completed.
Union status: A social arrangement between or among individuals. This arrangement may include conjugal or a social state for an individual.
Gender: A sociological state of being male or female.
Per capita income: This is used as a proxy for the income of the individual by analyzing the consumption pattern.
Ownership of health insurance: Individuals who possess an insurance policy (or policies).
Injuries: A state of being physically hurt.
The examples here are incidences of disability, impairments, and chronic or acute cuts and bruises.
Illness: A state of unwellness.
Age: The number of years lived up to the last birthday.
Household size: The number of individuals who share at least one common meal, use common sanitary conveniences and live within the same dwelling.

Now that the premise has been formed in regard to the definition of a variable, the next step in the process is to identify the category to which each variable belongs. Thus, the researcher needs to know the level of measurement of each variable: nominal, ordinal, interval, or ratio (see 1.1.2a).

1.1.2a: LEVELS OF MEASUREMENT³: Examples and definitions

Nominal - The naming of events, peoples, institutions, and places, which are coded numerically by the researcher because the variable has no inherent numerical attributes. This variable may be either (i) dichotomous, or (ii) non-dichotomous.

Dichotomous variable - The categorization of a variable which has only two sub-groupings, for example gender (male and female); capital punishment (permissive and restrictive); religious involvement (involved and not involved).
Non-dichotomous variable - The naming of events which span more than two sub-categories (examples: Counties in Jamaica: Cornwall, Middlesex and Surrey; Party Identification: Democrat, Independent, Republican; Ethnicity: Caucasian, Blacks, Chinese, Indians; Departments in the Faculty of Social Sciences: Management Studies, Economics, Sociology, Psychology and Social Work, Government; Political Parties in Jamaica: People's National Party (PNP), Jamaica Labour Party (JLP), and the National Democratic Movement (NDM); Universities in Jamaica: University of the West Indies; University of Technology, Jamaica; Northern Caribbean University; University College of the Caribbean; et cetera).

Ordinal - Rank-categorical variables: variables which name categories that by their very nature indicate a position, or arrange the attributes in some rank ordering. The examples here are as follows: (i) Level of Educational Institutions: Primary/Preparatory, All-Age, Secondary/High, Tertiary; (ii) Attitude toward gun control: strongly oppose, oppose, favour, strongly favour; (iii) Social status: upper-upper, upper-middle, middle-middle, lower-middle, lower class; (iv) Academic achievement: A, B, C, D, F.

Interval or ratio - These variables share all the characteristics of a nominal and an ordinal variable, along with an equal distance between each category and, in the case of ratio variables, a true zero value (for example age; weight; height; temperature; fertility; votes in an election; mortality; population; population growth; migration rates).

Now that the definitions and illustrations have been provided for the levels of measurement, the student should understand the position of these measures (see 1.1.2b).

³ Stanley S. Stevens is credited with the development of the typologies of scales (levels of measurement): (i) nominal, (ii) ordinal, (iii) interval and (iv) ratio (see Stevens 1946, 1948, 1968; Downie and Heath 1970).
Figure 1.1.3: Illustration of dichotomous variables. Dichotomies (or dichotomous variables) shown: gender (male, female); science (pure, applied); typologies of book (fictional, non-fictional); alive, dead; induction, deduction; parametric statistics, non-parametric statistics; burial, non-burial; religious, non-religious; primary data, secondary data; use of service, non-use of service; decomposed, non-decomposed.

1.1.2b: RANKING LEVELS OF MEASUREMENT

Figure 1.1.4: Ranking of the levels of measurement: RATIO (highest), INTERVAL, ORDINAL, NOMINAL (lowest).

The very nature of levels of measurement allows for (or does not allow for) data manipulation. If the level of measurement is nominal (for example fiction and non-fiction books), then the researcher does not have a choice in the reconstruction of this variable to a level which is below it. If the level of measurement, however, is ordinal (for example no formal education, primary, secondary and tertiary), then one may decide to use a lower level of measure (for example below secondary, and secondary and above). The same is possible with an interval variable: the social scientist may want to use one level down, ordinal, or two levels down, nominal. This is equally the case for a ratio variable. Thus, the further one goes up the pyramid, the more scope exists in data transformation.
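The downward transformations described above can be illustrated with a short Python sketch. It is an illustration only: the function names, the age cut-offs (30 and 60 years) and the decision to place secondary in the upper group are assumptions made for this example and are not rules taken from the text.

```python
# Ordinal categories listed from the lowest to the highest rank.
EDUCATION_ORDER = ["no formal education", "primary", "secondary", "tertiary"]


def education_to_dichotomy(level):
    """Ordinal to dichotomy: collapse four ranked categories into two."""
    cut = EDUCATION_ORDER.index("secondary")  # assumed cut-off
    if EDUCATION_ORDER.index(level) < cut:
        return "below secondary"
    return "secondary and above"


def age_to_cohort(age):
    """Ratio to ordinal: collapse age in years into ranked cohorts."""
    if age < 30:       # assumed cut-off
        return "young"
    if age <= 60:      # assumed cut-off
        return "middle-aged"
    return "elderly"


def cohort_to_dichotomy(cohort):
    """Ordinal to dichotomy: one further level down."""
    return "elderly" if cohort == "elderly" else "not elderly"


ages = [19, 34, 61]
cohorts = [age_to_cohort(a) for a in ages]
print(cohorts)
print([cohort_to_dichotomy(c) for c in cohorts])
print(education_to_dichotomy("primary"), "|", education_to_dichotomy("tertiary"))
```

The reverse direction is not available: once ages have been stored only as cohorts, the exact ages cannot be recovered, which is why the higher levels of measurement offer more scope for transformation.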
Table 1.1.1: Synonyms for the different Levels of measurement

Levels of Measurement    Other terms
Nominal                  Categorical; qualitative; discrete⁴
Ordinal                  Qualitative; discrete; rank-ordered; categorical
Interval/Ratio           Numerical; continuous⁵; quantitative; scale; metric; cardinal

Table 1.1.2: Appropriateness of Graphs for different levels of measurement

Levels of Measurement         Bar chart    Pie chart    Histogram    Line graph
Nominal                       Yes          Yes          ____         ____
Ordinal                       Yes          Yes          ____         ____
Interval/Ratio (or metric)                              Yes          Yes

⁴ Discrete variables take on a finite and usually small number of values, and there is no smooth transition from one value or category to the next (for example gender, social class, types of community, undergraduate courses)
⁵ Continuous variables are measured on a scale that changes values smoothly rather than in steps

Table 1.1.3: Levels of measurement⁶ with Examples and Other Characteristics

Examples
  Nominal:  Gender; Religion; Political Parties; Race/Ethnicity; Political Ideologies
  Ordinal:  Social class; Preference; Level of education; Gender equity; Levels of fatigue; Noise level; Job satisfaction
  Interval: Temperature; Shoe size; Life span
  Ratio:    Age; Height; Weight; Reaction time; Income; Score on an Exam.; Fertility; Population of a country; Population growth; Crime rates

Mathematical properties
  Nominal:  Identity
  Ordinal:  Identity; Magnitude
  Interval: Identity; Magnitude; Equal interval
  Ratio:    Identity; Magnitude; Equal interval; True zero

Mathematical Operation(s)
  Nominal:  None
  Ordinal:  Ranking
  Interval: Addition; Subtraction
  Ratio:    Addition; Subtraction; Division; Multiplication

Compiled: Paul A. Bourne, 2007; a modification of Furlong, Lovelace and Lovelace 2000, 74

⁶ Levels of measurement concern the essential nature of a variable, and it is important to know this because it determines what one can do with a variable (Burham, Gilland, Grant and Layton-Henry 2004, 114)
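The cumulative pattern in Table 1.1.3, in which each level keeps the properties of the level below it and adds one more, can be expressed as a small Python sketch. The names Level, PROPERTIES, OPERATIONS, properties_of and ratios_are_meaningful are hypothetical and introduced only for this illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ranked from lowest (1) to highest (4)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Each step up the ranking adds one mathematical property (Table 1.1.3).
PROPERTIES = ["identity", "magnitude", "equal interval", "true zero"]

# Mathematical operations listed for each level in Table 1.1.3.
OPERATIONS = {
    Level.NOMINAL: [],
    Level.ORDINAL: ["ranking"],
    Level.INTERVAL: ["addition", "subtraction"],
    Level.RATIO: ["addition", "subtraction", "division", "multiplication"],
}


def properties_of(level):
    """Return the properties held by a variable at the given level."""
    return PROPERTIES[:int(level)]


def ratios_are_meaningful(level):
    """Statements such as 'twice as much' require a true zero."""
    return "true zero" in properties_of(level)


for level in Level:
    print(level.name, properties_of(level), OPERATIONS[level])

# Age (ratio): 40 years is twice 20 years.
# Temperature in degrees Celsius (interval): 40 degrees is not twice as hot as 20.
print(ratios_are_meaningful(Level.RATIO), ratios_are_meaningful(Level.INTERVAL))
```

This is why division and multiplication appear only in the ratio column of the table: without a true zero, a ratio of two values carries no meaning.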
Table 1.1.4: Levels of measurement, Measure of Central Tendency and Measure of Variability

                         Measure of central tendencies    Measure of variability
Levels of Measurement    Mean    Mode    Median           Mean deviation    Standard deviation
Nominal                  NA      Yes     NA               NA                NA
Ordinal                  NA      Yes     Yes              NA                NA
Interval/Ratio⁷          Yes     Yes     Yes              Yes               Yes

NA denotes Not Applicable
⁷ Ratio variable is the highest level of measurement, with nominal being first (i.e. lowest); ordinal, second; and interval, third.

Table 1.1.5: Combinations of Levels of measurement, and types of Statistical test which are applicable⁸

Dependent variable    Independent variable    Statistical test
Nominal               Nominal                 Chi-square
Nominal               Ordinal                 Chi-square; Mann-Whitney
Nominal               Interval/ratio          Binomial distribution; ANOVA; Logistic Regression; Kruskal-Wallis; Discriminant Analysis
Ordinal               Nominal                 Chi-square
Ordinal               Ordinal                 Chi-square; Spearman rho
Ordinal               Interval/ratio          Kruskal-Wallis H; ANOVA
Interval/ratio        Nominal                 ANOVA; Independent-sample t test
Interval/ratio        Ordinal
Interval/ratio        Interval/ratio          Pearson r; Multiple Regression

Table 1.1.5 depicts how a dependent variable which is, for example, nominal, when combined with an independent variable which is nominal, uses a particular statistical test.

⁸ One of the fundamental issues in analyzing quantitative data is not merely to combine and then interpret data, but to use each variable appropriately. This is further explained below.
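The rule set out in Table 1.1.4 (the mode for nominal data; the mode and median for ordinal data; all measures for interval/ratio data) may be sketched in Python as follows. This is a minimal illustration using the standard statistics module: the function name describe and the three small datasets are invented for the example, and the nominal and ordinal data are assumed to be entered as numeric codes, as they would be in SPSS.

```python
import statistics


def describe(values, level):
    """Return only the summaries applicable to the level of measurement.

    values: numeric codes (nominal, ordinal) or measurements (metric)
    level:  "nominal", "ordinal" or "metric"
    """
    summary = {"mode": statistics.mode(values)}
    if level == "nominal":
        return summary
    if level == "ordinal":
        # median_low always returns an observed value, so the median of an
        # ordinal variable is an actual category code.
        summary["median"] = statistics.median_low(values)
        return summary
    mean = statistics.mean(values)
    summary["median"] = statistics.median(values)
    summary["mean"] = mean
    summary["mean deviation"] = sum(abs(v - mean) for v in values) / len(values)
    summary["standard deviation"] = statistics.stdev(values)  # sample formula
    return summary


gender_codes = [1, 1, 2, 1, 2]       # hypothetical: 1 = male, 2 = female
class_codes = [1, 2, 2, 3, 1, 2]     # hypothetical: 1 = lower, 2 = middle, 3 = upper
ages = [19, 20, 20, 21, 25]          # hypothetical ages in years

nominal_summary = describe(gender_codes, "nominal")
ordinal_summary = describe(class_codes, "ordinal")
metric_summary = describe(ages, "metric")
print(nominal_summary)
print(ordinal_summary)
print(metric_summary)
```

A mean of the gender codes could be computed arithmetically, but it would describe nothing, which is the point of the NA entries in the table.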
STATISTICAL TESTS AND THEIR LEVELS OF MEASUREMENT

Test                          Independent variable                     Dependent variable
Chi-Square (χ²)               Nominal, Ordinal                         Nominal, Ordinal
Mann-Whitney U test           Dichotomous                              Nominal, Ordinal
Kruskal-Wallis H test         Non-dichotomous, Ordinal                 Ordinal, or skewed⁹ Metric
Pearson's r                   Normally distributed¹⁰ Metric            Normally distributed Metric
Linear Regression             Normally distributed Metric, dummy       Normally distributed Metric
Independent Samples T-test    Dichotomous                              Normally distributed Metric
ANOVA                         Nominal, Ordinal (non-dichotomous¹¹)     Normally distributed Metric
Logistic regression           Metric, dummy                            Dichotomous (skewed values or otherwise)
Discriminant analysis         Metric, dummy                            Dichotomous (normally distributed value)

Notes to Table 1.1.6b
Chi-Square (χ²): Used to test for associations between two variables
Mann-Whitney U test: Used to determine differences between two groups
Kruskal-Wallis H test: Used to determine differences between three or more groups
Pearson's r: Used to determine strength and direction of a relationship between two variables
Linear Regression: Used to determine strength and direction of a relationship between two or more variables
Independent Samples T-test: Used to determine difference between two groups
ANOVA: Used to determine difference between three or more groups
Logistic regression: Used to predict relationship between many variables
Discriminant analysis: Used to predict relationship between many variables

⁹ Skewness indicates that there is a pileup of cases to the left or right tail of the distribution
¹⁰ Normality is observed whenever the values of skewness and kurtosis are zero
¹¹ Non-dichotomous (i.e. polytomous) denotes having many (i.e. several) categories
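The table above can be read as a lookup: given how the independent and the dependent variable are measured, which rows accept both? The Python sketch below is one possible way of doing that lookup and nothing more. The tag vocabulary ("dichotomous", "normal metric" and so on), the name candidate_tests and the example variables are my own simplifications; the sketch merely mirrors the rows of the table and does not check any statistical assumption on the data.

```python
# Each row: (test, tags accepted for the IV, tags accepted for the DV),
# transcribed from the table above using a simplified tag vocabulary.
TABLE = [
    ("Chi-square", {"nominal", "ordinal"}, {"nominal", "ordinal"}),
    ("Mann-Whitney U", {"dichotomous"}, {"nominal", "ordinal"}),
    ("Kruskal-Wallis H", {"non-dichotomous"}, {"ordinal", "skewed metric"}),
    ("Pearson's r", {"normal metric"}, {"normal metric"}),
    ("Linear regression", {"normal metric", "dummy"}, {"normal metric"}),
    ("Independent-samples t-test", {"dichotomous"}, {"normal metric"}),
    ("ANOVA", {"non-dichotomous"}, {"normal metric"}),
    ("Logistic regression", {"metric", "dummy"}, {"dichotomous"}),
    ("Discriminant analysis", {"metric", "dummy"}, {"dichotomous"}),
]


def candidate_tests(iv_tags, dv_tags):
    """List every test whose row accepts both variables.

    A variable is described by a set of tags; a row matches when the
    variable shares at least one tag with the row's accepted tags.
    """
    return [test for test, iv_ok, dv_ok in TABLE
            if (iv_tags & iv_ok) and (dv_tags & dv_ok)]


# Hypothetical variables.
gender = {"nominal", "dichotomous"}
social_class = {"ordinal", "non-dichotomous"}
exam_score = {"metric", "normal metric"}

print(candidate_tests(gender, exam_score))
print(candidate_tests(social_class, exam_score))
print(candidate_tests(exam_score, exam_score))
```

The function returns candidates rather than a single answer on purpose: where more than one row matches, the choice still rests with the researcher and the purpose of the analysis.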
LEVELS OF MEASUREMENT AND THEIR MEASURING ASSOCIATION

NOMINAL: Lambda; Cramér's V; Contingency coefficient; Phi
ORDINAL: Gamma; Somers' D; Kendall's tau-b; Kendall's tau-c
INTERVAL/RATIO: Pearson's r

Figure 1.1.5: Levels of measurement

In the formulas below, $\chi^2$ is the chi-square statistic and $N$ is the number of cases.

Lambda ($\lambda$): This is a measure of the statistical relationship between two nominal variables.

Phi ($\phi$): This is a measure of association between two dichotomous variables (i.e. dichotomous dependent and dichotomous independent): $\phi = \sqrt{\chi^2 / N}$.

Cramér's V ($V$): This is a measure of association between two nominal variables: $V = \sqrt{\chi^2 / [N(k-1)]}$, where $k$ is the smaller of the number of rows and the number of columns. In the event that there is a dichotomous dependent and a dichotomous independent variable, $V$ is identical to phi.

Gamma ($\gamma$): This is used to measure the statistical association between two ordinal variables.

Contingency coefficient ($cc$): This is used for association in which the matrix is more than 2 x 2 (i.e. 2 for the dependent and 2 for the independent), for example 2 x 3; 3 x 2; 3 x 3: $cc = \sqrt{\chi^2 / (\chi^2 + N)}$.

Pearson's r: This is used for non-skewed metric variables: $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$.

1.1.3: CONCEPTUALIZING DESCRIPTIVE AND INFERENTIAL STATISTICS

Research is not done in isolation from the reality of the wider society. Thus, the social researcher needs to understand whether his/her study is descriptive and/or inferential, as this guides the selection of certain statistical tools. Furthermore, an understanding of the two constructs dictates the extent to which the analyst will employ each, as there is a clear demarcation between descriptive and inferential statistics. In order to grasp this distinction, I will provide a number of authors' perspectives on each terminology.

Descriptive statistics describe samples of subjects in terms of variables or combination of variables (Tabachnick and Fidell 2001, 7)

Numerical descriptive measures are commonly used to convey a mental image of pictures, objects, tables and other phenomenon.
The two most common numerical descriptive measures are: measures of central tendencies and measures of variability (McDaniel 1999, 29; see also Watson, Billingsley, Croft and Huntsberger 1993, 71)

Techniques such as graphs, charts, frequency distributions, and averages may be used for description and these have much practical use (Yamane 1973, 2; see also Blaikie 2003, 29; Crawshaw and Chambers 1994, Chapter 1)

Descriptive statistics: statistics which help in organizing and describing data, including showing relationships between variables (Boxill, Chambers and Wint 1997, 149)

We'll see that there are two areas of statistics: descriptive statistics, which focuses on developing graphical and numerical summaries that describe some phenomenon, and inferential statistics, which uses these numerical summaries to assist in making decisions (McClave, Benson and Sincich 2001, 1)

Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present the information in a convenient form (McClave, Benson and Sincich 2001, 2)

Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data (McClave, Benson and Sincich 2001, 2)

The phrase statistical inference will appear often in this book. By this we mean, we want to infer or learn something about the real world by analyzing a sample of data. The ways in which statistical inference are carried out include: estimating parameters; predicting outcomes, and testing hypothesis (Hill, Griffiths and Judge 2001, 9).

Inferential statistics is not only about causal relationships; King, Keohane and Verba argue that it is categorized into two broad areas: (1) descriptive, and (2) causal inference. Thus, descriptive inference speaks to the description of a population from what is available, namely the sample.
Burham, Gilland, Grant and Layton-Henry (2004) state that:

Causal inferences differ from descriptive ones in one very significant way: they take a leap not only in terms of description, but in terms of some specific causal process [i.e. predictability of the variables] (Burham, Gilland, Grant and Layton-Henry 2004, 148).

In order that this textbook can be helpful and simple, I will provide operational definitions of concepts as well as illustrations of particular terminologies, along with the appropriateness of statistical techniques based on the typologies of variable and the level of measurement (see Tables 1.1.1 to 1.1.6).

CHAPTER 2

2.1.0: DESCRIPTIVE STATISTICS

The interpretation of quantitative data commences with an overview (i.e. background information on the survey or study; this is normally demographic information) of the general dataset in an attempt to provide a contextual setting for the research (descriptive statistics, see above), upon which any association may be established (inferential statistics, see above). Hence, this chapter provides the reader with the analysis of univariate data (descriptive statistics), with appropriate illustrations of how various levels of measurement may be interpreted, and/or diagrams chosen based on their suitability.

A variable may be non-metric (i.e. nominal or ordinal) or metric (i.e. scale, interval/ratio). It is based on this premise that particular descriptive statistics are provided. In keeping with this background, I will begin this process with non-metric, then metric data. The first part of this chapter will provide a thorough outline of how nominal and/or ordinal variables are analyzed. Then, the second part will analyze metric variables.

HOW TO DO DESCRIPTIVE STATISTICS FOR A NON-METRIC VARIABLE?
STEP ONE: Ensure that the variable is non-metric (e.g. Gender, general happiness)
STEP TWO: Select Analyze
STEP THREE: Select descriptive statistics
STEP FOUR: Select frequency
STEP FIVE: Select the non-metric variable
STEP SIX: Select statistics at the end
STEP SEVEN: Select mode, or mode and median (based on whether the variable is nominal or ordinal, respectively)
STEP EIGHT: Select Chart
STEP NINE: Choose bar or pie graphs
STEP TEN: Select paste or ok
STEP ELEVEN: Analyze the output (use Table 2.1.1a)
Figure 2.1.0: Steps in Analyzing Non-metric data

2.1.1a: INTERPRETING NON-METRIC (or Categorical) DATA

NOMINAL VARIABLE (when there are no missing cases)

Table 2.1.1a: Gender of respondents

Gender    Frequency    Percent    Valid Percent
Male      150          69.4       69.4
Female    66           30.6       30.6
Total     216          100.0      100.0

Identifying Non-missing Cases: When there are no differences between the percent column and the valid percent column, then there are no missing cases.

How is the table analyzed? Of the sampled population (n=216¹²), 69.4% were males compared to 30.6% females.

¹² The total number of persons interviewed for the study. It is advisable that valid percents are used in descriptive statistics, as there may be some instances when missing cases are present within the dataset, which makes the percent figure different from the valid percent (Table 2.1.1b).

NOMINAL VARIABLE: Establishment of when there are missing cases

Table 2.1.1b: General Happiness

General Happiness    Frequency    Percent    Valid Percent
Very happy           467          30.8       31.1
Pretty happy         872          57.5       58.0
Not too happy        165          10.9       11.0
Missing Cases        13           0.9        -
Total                1,517        100.0      100.0

Identifying Missing Cases: When data are missing (which indicates that some of the respondents did not answer the specified question), there is a disparity between the values for percent and those for valid percent. In this case, 13 of 1,517 respondents did not answer the question on general happiness. In cases where there is a difference between the two aforementioned categories (i.e.
percent and valid percent), the student should remember to use the valid percent. The rationale behind the use of the valid percent is simple, the research is about those persons who have answered and they are captured in the valid percent column. Hence, it is recommended that the student use the valid percent column at all time in analyzing quantitative data. Interpretation: Of the sampled population (n=1,517), the response rate is 99.1%(n=1,504)13. Of the valid responses (n=1,504), 31.1% (n=467) indicated that they werevery happy, with 58.0% (n=872) reported being pretty happy, compared to 11.0%(n=165) who said not too happy. 13 Because missing cases are within the dataset (13 or 0.9%), there is a difference between percent and valid percent. Thus, care should be taken when analyzing data. This is overcome when the valid percents are used.69 69. Owing to the typology of the variable (i.e. nominal), this may be presented graphical byeither a pie graph or a bar graph.Pie graphFemale, 30.6, 31% Male, 69.4,69% Figure 2.1.1: Respondents gender ORBar graph70605040302010 0MaleFemale Figure 2.1.2: Respondents gender 70 70. ORDINAL VARIABLETable 2.1.2: Subjective (or self-reported) Social ClassFrequency PercentValid Percent Social class: Lower 100 46.3 46.3Middle104 48.1 48.1Upper 125.650.6 Total 216 100.0100.0 Interpreting the Data in Table 2.1.2:When the respondents were asked to select what best describe their social standing, of the sampled population (n=216), 46.3% reported lower (working) class, 48.1% revealed middle class compared to 5.6% who said upper middle class. Based on the typology of variable (i.e. ordinal), the graphical options are (i) pie graph and/or (2) bar graph.Note: In cases where there is no difference between the percent column and that of valid percent, researchers infrequently use both columns. 
The column which is normally used is valid percent, as this provides the information on those persons who actually responded to the specified question. When it is reported, however, the preferred term is "percent" rather than "valid percent".

[Figure 2.1.3: Social class of respondents (bar graph: Lower class 46.3, Middle class 48.1, Upper middle class 5.6; vertical axis from 0 to 50)]

Or

[Figure 2.1.4: Social class of respondents (pie graph: Lower class 46.3, Middle class 48.1, Upper middle class 5.6)]

2.1.1b: STEPS IN INTERPRETING METRIC VARIABLE: METRIC (i.e. scale or interval/ratio)

HOW TO DO DESCRIPTIVE STATISTICS FOR A METRIC VARIABLE?

STEP ONE: Know the metric variable (Age)
STEP TWO: Select Analyze
STEP THREE: Select descriptive statistics
STEP FOUR: Select frequency
STEP FIVE: Select the metric variable
STEP SIX: Select statistics at the end
STEP SEVEN: Select mean, standard deviation, skewness
STEP EIGHT: Select Chart
STEP NINE: Choose histogram with normal curve
STEP TEN: Select paste or OK
Finally: Analyze the output (use Table 2.1.3)

Figure 2.1.5: Steps in Analyzing Metric data

INTERPRETING METRIC DATA: METRIC (i.e. scale or interval/ratio) VARIABLE

Table 2.1.3: Descriptive statistics on the Age of the Respondents

N   Valid                    216
    Missing                  0
Mean                         20.33
Median                       20.00
Mode                         20
Std. Deviation               1.692
Skewness                     2.868
Std. Error of Skewness       .166

Of the sampled population (n=216), the mean age of the sample was 20 yrs and 4 months (i.e. 4 ≈ 0.33 x 12) ± 1 yr and 8 months (i.e. 8 ≈ 0.692 x 12), with a skewness of 2.868 (skewness has no unit). Statistically, an acceptable skewness must be less than or equal to 1.0 in absolute value. Hence, the skewness in this sample is unacceptable, as it is an indicator of errors in the reporting of the data by the respondents. With this being the case, the researcher (i.e. statistician) has three options at his/her disposal. They are (1) to remove the skewness, (2) not to use the data because of the high degree of errors, and (3) to use the median instead of the mean. It should be noted that all the measures of central tendency (i.e. the arithmetic mean, the mode and the median) are about the same (i.e. mean 20.33, mode 20.0, and median 20.0). The skewness is caused by extreme values in the data set. Hence, in this case, the arithmetic mean is distorted by that value (or those values), and so it is not advisable to use it to indicate the centre of the distribution. (See below how this is done in SPSS.)

The figures below are to enable readers to have a systematic plan of how to arrive at the SPSS output for analyzing a metric variable (for example, age of respondents). Following the plan, I implement it in an actual SPSS illustration of how this is done.

Figures 2.1.6 to 2.1.16: Running SPSS for a Metric variable

Step One: Analyze (Figure 2.1.6)
Step Two: Descriptive statistics (Figure 2.1.7)
Step Three: Select Frequency (Figure 2.1.8)
Step Four: Select the metric variable; the metric variable in this case is age (Figure 2.1.9)
Step Five: Move the selected metric variable from its original list to the target box, as shown in the figure (Figure 2.1.10). At the end of Step Five, you will see Statistics; select it (Figure 2.1.11)
Step Six: A metric variable requires that you do the mean. Also choose the following: SD, minimum, range, skewness, kurtosis (Figure 2.1.12)
Step Seven: At the end of Step Five, you will also see Charts; select Histogram with normal curve (Figure 2.1.13)
Step Eight: Highlight the argument (Figure 2.1.14)
Step Nine: Select run, using the key indicated in the figure (Figure 2.1.14)
Step Ten: Final output, which the researcher will now analyze (Figure 2.1.15)
Step Eleven: Histogram, which is a pictorial of the distribution of the metric variable, age (Figure 2.1.16; horizontal axis: "Age on your last birthday?"; vertical axis: Frequency; Mean = 34.95, Std. Dev. = 13.566, N = 1,280)

2.1.2a: MISSING (i.e. NON-RESPONSE) CASES

Table 2.1.4: From the following list, please choose what is for you the most important characteristic of democracy

                                                          Frequency   Percent
Open and fair election                                    314         23.5
An economic system that guarantees a dignified salary    177         13.2
Freedom of speech                                         321         24.0
Equal treatment for everybody                             295         22.0
Respect for minority                                      35          2.6
Majority rules                                            54          4.0
Parliamentarians who represented their electorates        52          3.9
A competitive party system                                47          3.5
Don't know/No answer                                      43          3.2¹⁴
Total                                                     1,338       100.0

Source: Powell, Bourne and Waller 2007, 11

Of the sampled population (n=1,338), when asked "From the following list, please choose what is for you the most important characteristic of democracy?", 23.5% (n=314) said "Open and fair elections", 13.2% (n=177) remarked "An economic system that guarantees a dignified salary", 24.0% (n=321) said "Freedom of speech", 22.0% (n=295) indicated "Equal treatment for everybody by courts of law", 2.6% (n=35) mentioned "Respect for minorities", 4.0% (n=54) felt "Majority rule", 3.9% (n=52) believed "Members of Parliament who represent their electors", and 3.5% (n=47) answered "A competitive party system", compared to 3.2% (n=43) who had no answer (i.e. "Don't know/No answer"), which is referred to as missing values or non-response (see note 14).

¹⁴ "Don't know/no answer" is an issue of fundamental importance in survey research. This is called non-response.

The issue of non-response becomes problematic whenever it is approximately 5% or more (see for example George and Mallery 2003, chapter 4; Tabachnick and Fidell 2001, chapter 4; Thirkettle 1988, 10). Missing data are not simply about non-response; they may also distort the interpretation of data in the case of inferential statistics.
In some instances they are so influential that they create what is called a Type II error. According to Thirkettle (1988), "Unless every person to be interviewed is interviewed the results will not be valid. Non-response must therefore be kept to negligible proportions" (Thirkettle 1988, 10). Thirkettle's perspective is idealistic, and it is not supported by any of the other scholars whom I have read (see for example Babbie, Halley and Zaino 2003; George and Mallery 2003; Tabachnick and Fidell 2001; Bobko 2001; Willemsen 1974). The non-response rate that is considered unacceptable is 20%. When this marker is reached or surpassed, researchers are inclined not to use the variable. Thus, in the case of Table 2.1.4, a non-response rate of 3.2% is considered to be negligible.

Furthermore, missing data are not simply about non-response from the interviewed; it is the difficulty of generalizability that they may cause which poses the problem in data analysis. "Its seriousness depends on the pattern of missing data, how much is missing, and why it is missing" (Tabachnick and Fidell 2001, 58). According to Tabachnick and Fidell (2001):

    "The pattern of missing data is more important than the amount missing. Missing values scattered randomly through a data matrix pose less serious problems. Nonrandomly missing values, on the other hand, are serious no matter how few of them there are because they affect the generalizability of results" (Tabachnick and Fidell 2001, 58).

They continue that:

    "If only a few data points, say, 5% or less, are missing in a random pattern from a large data set, the problems are less serious and almost any procedure for handling missing values yields similar results" (Tabachnick and Fidell 2001, 59).

2.1.2b: TREATING MISSING (i.e. NON-RESPONSE) CASES

Unlike a dominant theory which is generally accepted by many scholars, the construct of missing data is fluid.
Thus, I will be forwarding some of the arguments that exist on the matter.

Fundamentally, the handling of missing cases rests primarily on the following categorizations: (1) whether the cases are less than 5%, (2) whether the number of non-responses exceeds 20%, and (3) whether they are randomly or non-randomly distributed within the dataset. Scholars such as Thirkettle (1988) and Tabachnick and Fidell (2001) believe that in the event that the number of such cases is less than or equal to 5%, they are acceptable. On the other hand, in the event that such non-responses are more than or equal to 20%, those variables are totally dropped from the data analysis. Thus, according to Tabachnick and Fidell (2001, chapter 4) and George and Mallery (2003, chapter 4), these are the available options in manipulating missing cases:

- drop all cases with them;
- deletion of cases (i.e. this is a default function of SPSS, SAS, and SYSTAT);
- impute values for those missing cases: insert the series mean¹⁵,¹⁶, the mean of nearby points, or the median of nearby points;
- using regression: (i) linear trends at point, and (ii) linear interpolation;
- expectation maximization (EM)¹⁷,¹⁸;
- using prior knowledge; and
- multiple imputation.

¹⁵ "It is best to avoid mean substitution unless the proportion of missing is very small and there are no other options available to you" (Tabachnick and Fidell 2001, 66)
¹⁶ "Series mean is by far the most frequently used method" (George and Mallery 2003, 50)
¹⁷ "EM methods offer the simplest and most reasonable approach to imputation of missing data, as long as you have access to SPSS MVA" (Tabachnick and Fidell 2001, 66)
¹⁸ "Regression or EM. These methods are the most sophisticated and are generally recommended" (de Vaus 2002, 69)
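The thresholds and two of the imputation options listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPSS procedure itself: missing responses are represented as None, the function names are hypothetical, and the label "judgment call" for rates between 5% and 20% is my own placeholder, since the sources quoted above do not prescribe a rule for that range.

```python
from statistics import mean


def non_response_rate(values):
    """Percentage of entries that are missing (None)."""
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def classify_non_response(rate):
    """Rule-of-thumb thresholds from the text: <= 5% acceptable, >= 20% drop.

    The middle label is an assumption; the text gives no rule for that range.
    """
    if rate >= 20.0:
        return "drop variable"
    if rate <= 5.0:
        return "acceptable"
    return "judgment call"


def impute_series_mean(values):
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_linear_interpolation(values):
    """Fill interior gaps on a straight line between the nearest observed
    neighbours; leading or trailing gaps are left as None."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


# Table 2.1.4: 43 of 1,338 respondents gave no answer (about 3.2%).
print(classify_non_response(100 * 43 / 1338))  # acceptable

# Hypothetical ages with missing entries, for illustration only.
ages = [20, 21, None, 23, 19, None, None, 22]
print(impute_series_mean(ages))
print(impute_linear_interpolation(ages))
```

Series-mean substitution gives every missing case the same value, which is why Tabachnick and Fidell (note 15) caution against it unless very little is missing; interpolation instead borrows from neighbouring cases, so it only makes sense when the order of the cases is meaningful.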
CONCLUSION

The issue of how to treat missing values is as unresolved as the question of the existence of a Supreme Being, God, and the views on it are just as divergent. One scholar forwards the view that 10% of the data cases can be missing for them to be replaced by mean values (Marsh 1988), whereas another group of statisticians, Tabachnick and Fidell (2001), believe that not more than 5% of the cases should be absent for replacement by any approach. The latter scholars, however, do not think that a 5% benchmark is in and of itself an automatic justification for replacement; rather, the researcher should test this by way of cross tabulation. This is done with some other variable(s) in an attempt to ascertain if any difference exists between the responses and the non-responses. Only on concluding that no difference is present between the responses and the non-responses do they subscribe to the replacement of missing data within the dataset. Missing data are then replaced by one of the appropriate mathematical techniques: series mean, mean of nearby points, median of nearby points, linear interpolation, and/or linear trends at points.

This perspective is not the dominant viewpoint, as within the various disciplines some scholars are purists and so take a fundamentally different stance from others who may relax this somewhat.

One of the difficulties for social researchers and upcoming practitioners of the craft is to grasp their discipline's delimitations, and some of the rationale present therein, in an effort to concretize their own position grounded in some empiricism. In keeping with this tradition, I will present a discourse on the matter, and I will add that scholars should be mindful of what obtains within their craft. It should be noted that sometimes these premises are best practices and in other instances they are merely guides and not laws.

On the other hand, in a dialogue with the Professor of Demography at the University of the West Indies, Mona, C. Uche, PhD., he, being a purist of the Chicago School, argued that the arbitrary substitution of non-responses can be a misrepresentation of the views of the non-respondents, and so he advises researchers not to take that route, even if the cases are less than 5%.

In a separate discussion, the Professor of Applied Sociology, Patricia Anderson, PhD., also of the Chicago School, held the view that while it is possible to replace missing data points for a variable, in the case of Jamaica non-response should be taken as is. She argued that "no answer", in Jamaica, is somewhat different from the answers of those who indicated a chosen response. Thus, if the researcher substitutes missing cases with the mean value, or any other technique for that matter, he/she runs the risk of misrepresenting the social reality.

With Marsh, Tabachnick and Fidell, Uche, and Anderson, we may conclude that this discourse is far from over. Thus, the treatment of missing values must be left up to the researcher, within the context of the society and with some validation of the chosen perspective.

CHAPTER 3

3.1.0: HYPOTHESIS: INTRODUCTION

All research is based on the premise of an investigation of some unknown phenomenon. Quantitative studies, however, do not merely provide information; they are substantially hinged on the foundation of hypothesis testing, as this allows for a logical way of thinking. Therefore, this chapter continues from Chapter 2 while furthering the research process, which is the use of hypotheses and of the appropriate statistical test in an effort to validate the hypothesis of the research in question. One author argues that it is widely accepted that studies should be geared towards testing hypotheses (Blaikie 2003, 13). He continues that when research starts out with one or more hypotheses, they should ideally be derived from a theory of some kind, preferably expressed in the form of a set of propositions (Blaikie 2003, 14).
The use of hypotheses, in objectivism, is not limited to the examination of past theories; without them, the realities that social scientists seek to explore become a maze with no ending in sight. According to Blaikie (2003), "Hypotheses that are plucked out of thin air, or are just based on hunches, usually makes limited contributions to the development of knowledge because they are unlikely to connect with the existing state of knowledge" (Blaikie 2003, 14).

Thus, I will begin with the definition of the construct, hypothesis. Then I will proceed with a full interpretation of the results, beginning with the germane univariate data (see for example chapter 2), followed by the most suitable associational test (see chapter 1), given the levels of measurement.

3.1.1: DEFINITIONS OF HYPOTHESIS

A hypothesis is a proposition of a relationship between two variables: a dependent and an independent (Babbie, Halley and Zaino 2003, 12). The dependent variable is influenced by external stimuli (or the independent variable), and the independent variable acts on its own to cause, or lead to, an impact on the dependent. According to Babbie, Halley and Zaino, "A dependent variable is the variable you are trying to explain" (Babbie, Halley and Zaino 2003, 13).

Boxill, Chambers and Wint (1997), on the other hand, write that a hypothesis is "a non-obvious statement which makes an assertion establishing a testable base about a doubtful or unknown statement" (Boxill, Chambers and Wint 1997, 150).

Neuman (2003) states that a hypothesis is "The statement from a causal explanation or proposition that has at least one independent and one dependent variable, but it has yet to be empirically tested" (Neuman 2003, 536).

Another group of scholars write that a hypothesis is "A statement about the (potential) relationship between the variables a researcher is studying. They are usually testable statements in the form of predictions about relationships between the variables, and are used to guide the design of studies" (Furlong, Lovelace and Lovelace 2000, G8).

Every hypothesis must have two attributes. These are (1) a dependent variable, and (2) an independent variable. Thus, embedded within each hypothesis are at least two variables. So as to make this easily understandable, I will give a few examples.

- There is an association between breakfast consumption and one's academic performance. DV (dependent variable): academic performance; IV (independent variable): breakfast consumption.
- Determinants of wellbeing of the Jamaican elderly. (Such a hypothesis requires the use of multiple regression analysis, as it possesses a number of different causal factors.) Hence, the DV is wellbeing, and the IVs are educational attainment; biomedical conditions; age cohorts of the elderly (young elderly, old-elderly and the oldest-old elderly); union status; area of residence; social support; employment status; number of people in household; financial support; environmental conditions; income; cost of health care; exercise.

3.1.2: TYPOLOGIES OF HYPOTHESIS

In social research, hypotheses are categorized as either (1) theoretical or (2) statistical. According to Blaikie (2003), "Statistical hypotheses deal only with the specific problem of estimating whether a relationship found in a probability sample also exists in the population" (Blaikie 2003, 178).

This textbook will only use statistical hypotheses. Furthermore, statistical hypotheses are written as null, Ho¹⁹, and alternative, Ha²⁰. The Ho indicates no statistical association in the population, whereas the Ha denotes a statistical association in the population between the dependent and the independent variable(s). Furthermore, a statistical hypothesis may be either directional or non-directional.

¹⁹ In regression analysis, the null hypothesis is Ho: β = 0.
²⁰ When using the regression analytic technique, the alternative hypothesis is Ha: β ≠ 0.

3.1.3: DIRECTIONAL AND NON-DIRECTIONAL HYPOTHESES

NON-DIRECTIONAL HYPOTHESES

Non-directional hypotheses exist whenever the researcher has not specified any direction for the hypothesis. The examples here are as follows:

- Politicians and clergymen differ in their level of corruption;
- There is an association between the number of hours spent studying and the examination results obtained;
- Men and women differ in their likelihood of being personal secretaries;
- Curative care, preventative care, social class, educational attainment, and types of school attended are determinants of well-being.

DIRECTIONAL HYPOTHESES

Directional hypotheses exist when the researcher specifies a direction for the hypothesis:

1. Positive relationship, meaning an increase in one variable sees an increase in the other variable(s):
   - An increase in one's age is associated with more years of work experience;
   - There is a positive relationship between educational attainment and income received;
   - There is a direct relationship between fertility and population increases.
2. Negative relationship, meaning an increase in one variable results in a reduction in the other variable(s):
   - An increase in one's age is associated with a reduction in physical functioning;
   - There is an inverse relationship between educational attainment and the fertility of a woman;
   - There is an inverse relationship between the number of hours the West Indian cricketers spend practising and their failing.

3.1.4a: OUTLIERS

    "Despite the fact that it is mathematically appropriate to compute the mean for interval and ratio data [i.e. metric or scale data], there are times when the median may be more descriptive measure of central tendency for interval and ratio data because highly irregular values (called outliers) [exist] in the data set [and these] may affect the value of the mean (especially in small sets of scores), but they have no effect on the value of the median" (Furlong, Lovelace and Lovelace 2000, 94-95).

It is on this premise that the median is used instead of the mean as a measure of central tendency. Statistically, the mean is affected by extremely large or small values, which explains the skewness that exists in the descriptive statistics for interval/ratio variables. Thus, care must be taken in using highly skewed data for a hypothesis. In the event that the researcher intends to use the skewed variable as is, he/she should ensure that the statistical test is appropriate for this situation (see Chapter I). Otherwise, the information that is garnered is of no use.

In the event that outliers are detected within a variable, the researcher should explore his/her available options before a decision is taken on any particular course of action. If skewness (i.e. an indicator of outliers) is detected, this does not presuppose that the mean is inappropriate, as some statisticians argue that a skewness with an absolute value of approximately 1 is acceptable.

The social researcher should be cognizant that outliers are not only an issue in metric variables but may also be present in categorical variables. According to Tabachnick and Fidell:

    "Rummel (1970) suggests deleting dichotomous variables with 90-10 splits between categories or more both because the correlation coefficients between these variables and others are truncated and because the scores for the cases in the small category are more influential than those in the category with numerous cases" (Tabachnick and Fidell 2001, 67)

3.1.4b: REASONS for OUTLIERS

- Data recording (entry) error;
- Instrumentation error: the item entered in the particular category may be different from those previously entered.

3.1.4c: IDENTIFICATION of OUTLIERS

- Mathematically, using skewness;
- Graphical approach.

3.1.4d: TREATMENT of OUTLIERS

- If data entry: correct this by using the questionnaire, then redo the analysis;
- If instrumentation: drop the case(s).

3.1.5: STATISTICAL APPROACHES FOR ADDRESSING SKEWNESS

If the absolute value of the skewness (i.e. the numerical value without taking the sign into consideration) is more than 1, the following should be sought in an attempt to either (i) remove the skewness, or (ii) reduce the skewness. These options are as follows:

i) Log10 of the value;
ii) Loge, or ln, of the value;
iii) Square root of the variable;
iv) Square of the variable.

In the event that we are unable to reduce or remove the skewness, the researcher should not use the mean as a measure of the average, as it is affected by the outliers²¹ which are present within the dataset. In addition, he/she should ensure that the variable in question, for the purpose of hypothesis testing, is used with a statistical test that is able to accommodate such skewness (see Chapter I).

In order to provide a better understanding of the construct in this text, I will present each hypothesis in a new chapter.

²¹ "An outlier is a case with such an extreme value on one variable (a univariate outlier) or such a strange combination of scores on two or more variables (multivariate outlier) that they distort statistics" (Tabachnick and Fidell 2001, 66)
3.1.6: LEVEL OF SIGNIFICANCE and CONFIDENCE INTERVAL

Setting the level of confidence is a critical aspect of hypothesis testing in quantitative studies. A confidence interval (CI) set at the 95% level means that we accept a 5% risk of rejecting the null hypothesis, Ho, when it is in fact true (level of significance = 100% minus CI, or CI = 100% minus level of significance). According to Blaikie,

    "If we do not want to make this mistake [level of significance], we should set the level as high as possible, say 99.9%, thus running only a 0.01% risk. The problem is that the higher we set the level, the greater is the risk of a type II error [see Appendix II]. Conversely, the lower we set the level [of significance], the greater is the possibility of committing a type I error [see Appendix II] and the possibility of committing a type II error." (Blaikie 2003, 180)

In the attempt to complete research projects and/or assignments, we sometimes fail to check all the assumptions that are applicable to a particular variable. Even though we would like to examine the association and/or causal relationships that exist between or among different variables (i.e. hypothesis testing), this anxiety should not overshadow one's adherence to the statistical principles, which are there to guide the soundness of the interpretation of the figures. Thus, care is needed in ensuring that we establish mathematical appropriateness prior to the execution of hypothesis testing.

The chapters that follow will utilize the preceding chapter and this one. That is, I will commence each chapter with a hypothesis, followed by a presentation of the appropriate descriptive and inferential statistics. The social researcher should note that the hypothesis will be separated into variables; this will allow me to apply the most suitable inferential tools, as discussed in chapters I and II.

I am cognizant that undergraduate students would want a textbook that does their particular study, but this book is not that.
This textbook seeks to bridge a particular gap, which is: how do I interpret various descriptive and inferential statistics? Hence, I have sought to provide a holistic interpretation of the data analysis section of a study, using hypotheses. Hypothesis testing disaggregates generalizations into simple propositions that can be verified empirically, which is the rationale for using hypotheses to depict the logical processes in data interpretation.

CHAPTER 4

It may appear from your reading thus far that descriptive statistics are presented separately from inferential statistics in your paper, and that they are disjoint. A piece of research is a whole, which requires descriptive and sometimes inferential statistics. It should be noted, however, that a study may be entirely descriptive (see for example Probing Jamaica's Political Culture by Powell, Bourne and Waller 2007) or it may examine some association, causality or predictability (i.e. inferential statistics). If a project requires inferential statistics, then a fundamental layer in the data analysis is the descriptive statistics. The use of inferential statistics rests squarely on the level of measurement, the typologies of variable and the set of assumptions which are met by the variables. Tabachnick and Fidell (2001) aptly summarize this when they say that:

    "Use of inferential and descriptive statistics is rarely an either-or proposition. We are usually interested in both describing and making inferences about a data set. We describe the data, find reliable differences or relationships, and estimate population values for the reliable findings. However, there are more restrictions on inferences than there are on description" (Tabachnick and Fidell 2001, 8)

In keeping with providing a simple textbook on how to analyze quantitative data, the previously outlined chapters have sought to give a general framework of what is expected in the interpretation of social science research.
This is only the base; as such, I will now embark, from henceforth, on providing the readers with worked examples of different hypotheses, one in each chapter, including detailed interpretations of those hypotheses from a descriptive to an inferential statistical perspective.