Data Inputting, Preparing, Coding, Presenting, and Tabulating
Data Inputting by Using SPSS
Showing an example on SPSS with Likert scale
Stages of Data Analysis
Raw data may not be in a form that lends itself well to analysis. Raw data are recorded just as the respondent indicated. For an oral response, the raw data are in the words of the respondent, whereas for a questionnaire response, the actual number checked is the number stored.
Raw data will often also contain errors both in the form of respondent errors and non-respondent errors. Whereas a respondent error is a mistake made by the respondent, a non-respondent error is a mistake made by an interviewer or by a person responsible for creating an electronic data file representing the responses.
Data Editing: The process of checking the completeness, consistency, and legibility of data and making the data ready for coding and transfer to storage. Compare these two questions:
How old are you? 52 years
How many years have you been married? 43 years
Comment: Together these answers imply marriage at age 9, which is implausible, so at least one of them is incorrect.
Field editing: Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.
In-house editing: A careful editing job performed by a centralized office staff.
■ Editing Technology: Today, computer routines can check for inconsistencies automatically. Thus, for electronic questionnaires, rules can be entered that prevent inconsistent responses from ever being stored in the file used for data analysis.
These rules should represent the conservative judgment of a trained data analyst. Some online survey services can assist in providing this service.
Show a Likert example in SPSS.
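An automatic consistency rule of the kind described above can be sketched in Python. This is only an illustration: the function name and the minimum marriage age of 16 are assumptions made for the example, not rules taken from the slides or from SPSS.

```python
def check_marriage_consistency(age, years_married, min_marriage_age=16):
    """Return True when the two answers can both be true.

    min_marriage_age is an assumed threshold; a trained analyst
    would choose the value that fits the survey population.
    """
    return years_married <= age - min_marriage_age


# The two answers from the editing example above.
record = {"age": 52, "years_married": 43}
is_consistent = check_marriage_consistency(record["age"], record["years_married"])

if not is_consistent:
    print("Inconsistent answers: flag this record for editing")
```

A rule like this would run before the response is stored, so the inconsistent pair never reaches the analysis file.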
Data Coding: The assignment of numerical scores or classifying symbols to previously edited data. Careful editing makes the coding job easier. Codes are meant to represent the meaning in the data.
Assigning numerical symbols permits the transfer of data from questionnaires or interview forms to a computer. Codes are often numerical symbols. However, they are more broadly defined as rules for interpreting, classifying, and recording data. In qualitative research, numbers are seldom used for codes.
Pre-coding Fixed-Alternative Questions
When a questionnaire is highly structured, the categories may be pre-coded before the data are collected. This coding is useful when inputting data into SPSS.
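As a rough illustration of pre-coding, the sketch below maps the answer categories of a five-point Likert item to numbers before any data are collected. The category labels and the 1 to 5 scores are assumptions chosen for the example; a real questionnaire defines its own codebook.

```python
# Hypothetical codebook for one five-point Likert item.
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def code_responses(responses, codebook):
    """Translate answer labels into their pre-assigned numeric codes."""
    return [codebook[label] for label in responses]


answers = ["Agree", "Neutral", "Strongly agree"]
coded = code_responses(answers, LIKERT_CODES)
print(coded)
```

The coded list is what would be typed into one column of the data file.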
Error Checking: The final stage in the coding process is error checking and verification, or data cleaning, to ensure that all codes are legitimate.
For example, if “sex” is coded 1 for “male” and 2 for “female” and a 3 code is found, a mistake obviously has occurred and an adjustment must be made.
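The check described above can be sketched as a small Python routine that lists every stored value not found in the codebook. The function name and the sample column are hypothetical.

```python
# Codebook from the example: 1 = male, 2 = female.
VALID_SEX_CODES = {1: "male", 2: "female"}


def find_invalid_codes(values, valid_codes):
    """Return (row_index, value) pairs for codes outside the codebook."""
    return [(i, v) for i, v in enumerate(values) if v not in valid_codes]


# Hypothetical data column containing one illegitimate code (3).
sex_column = [1, 2, 2, 3, 1]
invalid = find_invalid_codes(sex_column, VALID_SEX_CODES)
print(invalid)
```

Each reported row would then be traced back to the questionnaire and corrected.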
[Figure: coding examples with the categories Female, Male, Wrong and Jobless, Staff, Worker]
Statistical Methods
Descriptive Statistics
Univariate Statistics
Bivariate Statistics
Multivariate Statistics
Descriptive Statistics
Means
Standard Deviation & Variance
Kurtosis & Skewness
Variation Coefficient
Cross Tabulation and Percentages
Kurtosis
Kurtosis is any measure of the "peakedness" of the probability distribution of a real-valued random variable. It is a descriptor of the shape of a probability distribution. There are various interpretations of kurtosis and of how particular measures should be interpreted; these are primarily peakedness (width of peak), tail weight, and lack of shoulders (distribution primarily peak and tails, not in between).
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case.
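To make the peaked versus flat contrast concrete, the sketch below computes excess kurtosis (the fourth standardized moment minus 3, so that a normal distribution scores zero) for two small made-up data sets. It uses the simple moment formula; statistical packages may apply a sample-size correction, so their output can differ from these values.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Hypothetical data: most values at the mean, with two extreme values.
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]
# Hypothetical data: values spread evenly, like a uniform distribution.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(excess_kurtosis(peaked))  # positive: peaked with heavy tails
print(excess_kurtosis(flat))    # negative: flat top
```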
Positive Peaked Distribution
Negative Flat Distribution
Skewness
For a normal distribution, the skewness is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left, and positive values indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.
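The sign convention can be checked with a short sketch that computes skewness as the third standardized moment. The three data sets are hypothetical, and the formula is the uncorrected moment version, so a package that applies a sample correction may report slightly different numbers.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


right_skewed = [1, 1, 2, 2, 3, 10]          # long right tail
left_skewed = [-x for x in right_skewed]    # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
print(skewness(symmetric))     # zero
```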
Positive right skewed
Negative left skewed
How to Calculate Kurtosis and Skewness by Using SPSS
Calculating Positive Peaked Kurtosis
Distribution of Peaked Positive Kurtosis
Calculating Negative Flat Kurtosis
Distribution of Flat Negative Kurtosis
Calculating Skewness
Distribution of Negative left skewed
Positive right skewed
How to graph Kurtosis & Skewness
Negative left skewed
flat Kurtosis Distribution
Positive right skewed
Variation Coefficient
From the population: V.C. = (σ / μ) x 100
From the sample: V.C. = (s / x̄) x 100
The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.
The higher the CV, the greater the dispersion in the variable. The CV for a model aims to describe the model fit in terms of the relative sizes of the squared residuals and outcome values. The lower the CV, the smaller the residuals relative to the predicted value. This is suggestive of a good model fit.
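A minimal sketch of the calculation, using Python's standard `statistics` module: the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. The function name and the data are made up for illustration; the `population` flag switches between the population and the sample standard deviation.

```python
import statistics


def coefficient_of_variation(data, population=False):
    """Return the standard deviation as a percentage of the mean."""
    mean = statistics.mean(data)
    if population:
        sd = statistics.pstdev(data)  # population standard deviation
    else:
        sd = statistics.stdev(data)   # sample standard deviation (n - 1)
    return sd / mean * 100


# Hypothetical scores with mean 5 and population standard deviation 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
cv_population = coefficient_of_variation(scores, population=True)
cv_sample = coefficient_of_variation(scores)

print(round(cv_population, 1))
print(round(cv_sample, 1))
```

Because the result is a unitless percentage, it can be compared across variables measured on different scales.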
Variation Coefficient Interpretation
Homogeneous
Relatively homogeneous
Neither homogeneous nor heterogeneous
Relatively heterogeneous
Heterogeneous
Positive Peaked Kurtosis
V.C. = (2.10442 / 5) x 100 ≈ 42
Neither homogeneous nor heterogeneous
Negative Flat Kurtosis
V.C. = (0.61237 / 3) x 100 ≈ 20
Homogeneous
Negative left skewed
V.C. = (788.88 / 1430) x 100 ≈ 55
Neither homogeneous nor heterogeneous
Positive right skewed
V.C. = (991.79 / 676) x 100 ≈ 147
Heterogeneous
Graphics and Pie Charts
Year    Unemployment in X Country
2007    5
2008    8
2009    10
2010    9
2011    8
2012    12
2013    10
Unemployment Rates in x Country
Line Graphs
Unemployment By Years in X Country
Histogram With Two Variables
Unemployment by Gender in x Country
Pie Chart
Tabulation and Cross Tabulation
Tabulation is a descriptive method that classifies and arranges raw data into a readable, understandable, and interpretable form. Simple tabulation involves only one variable.
Example:
Let us conduct a mini survey of a small group and ask which color they prefer for their new car. The raw data were loaded into an SPSS database.
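A one-variable tabulation of this kind can be sketched with `collections.Counter`. The six responses below are hypothetical stand-ins, since the survey's raw data are not reproduced in this transcript.

```python
from collections import Counter


def frequency_table(values):
    """Map each category to (count, percentage of all responses)."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, round(100 * count / total, 1))
        for category, count in counts.most_common()
    }


# Hypothetical answers to "Which color do you prefer for your new car?"
colors = ["gray", "white", "gray", "black", "white", "gray"]
table = frequency_table(colors)

for category, (count, percent) in table.items():
    print(f"{category:<6} {count:>3} {percent:>6}%")
```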
Two-Way Cross Tabulation
Using tabulation with two nonmetric variables
EXAMPLE:
Let us add the gender of the participants to the mini survey as a second nonmetric variable.
Total Percentage of the Cross Tabulation
22.2% of the sample are females who prefer gray.
16.7% of the sample are males who prefer white.
Row or Column Percentage of the Cross Tabulation
44.4% of the females prefer gray.
33.3% of the males prefer white.
57.1% of the people who prefer gray are female.
60% of the people who prefer white are male.
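The three kinds of percentage differ only in the denominator: the grand total, the row total, or the column total. The sketch below uses cell counts chosen to be consistent with the percentages quoted above (18 respondents, 9 of each gender). The gray and white counts follow from those percentages, while the "other" color category is an assumption, because the original table is not reproduced here.

```python
from collections import Counter

# (gender, color) cell counts; "other" is an assumed catch-all category.
counts = Counter({
    ("female", "gray"): 4, ("female", "white"): 2, ("female", "other"): 3,
    ("male", "gray"): 3, ("male", "white"): 3, ("male", "other"): 3,
})


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total = sum(counts.values())
row_totals = Counter()  # totals per gender
col_totals = Counter()  # totals per color
for (gender, color), n in counts.items():
    row_totals[gender] += n
    col_totals[color] += n

cell = counts[("female", "gray")]
total_pct = pct(cell, total)                  # share of the whole sample
row_pct = pct(cell, row_totals["female"])     # share of all females
col_pct = pct(cell, col_totals["gray"])       # share of all gray choices

print(total_pct, row_pct, col_pct)
```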
Index Numbers (IN)
Scores or observations recalibrated to indicate how they relate to a base number.
Indicator: A measurable variable used as a representation of an associated (but non-measured or non-measurable) factor or quantity. For example, the consumer price index (CPI) serves as an indicator of the general cost of living, which consists of many factors, some of which are not included in computing the CPI. Indicators are common statistical devices employed in economics.
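The recalibration to a base number described under Index Numbers can be sketched as follows, using the unemployment figures from the graphics section. Choosing 2007 as the base year (index = 100) is an assumption made for the example.

```python
def index_numbers(series, base_key):
    """Express every value as a percentage of the base-period value."""
    base = series[base_key]
    return {key: round(100 * value / base, 1) for key, value in series.items()}


# Unemployment in X Country by year, taken from the table in this deck.
unemployment = {2007: 5, 2008: 8, 2009: 10, 2010: 9, 2011: 8, 2012: 12, 2013: 10}
indexed = index_numbers(unemployment, base_key=2007)

for year, value in indexed.items():
    print(year, value)
```

An index of 160 for 2008, for instance, means unemployment was 60% higher than in the base year.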