Section 2 - Getting Started
1
Core Skill Training Session Six: “Data Analysis”
Session 2: “Getting Started”
Core Skills for Data Processing
ORSC 2004 - Internal Training
2
Objective
At the end of the training program, participants should be able to:
Understand data layouts
Understand what tables will look like
Define the data structure for various data formats
Understand coding conventions
Get an appreciation of the basic elements
3
Various data formats
Questionnaire data can be computerised in many ways
Market Research software mostly uses FLAT files
Customised software is available for capturing MR data
QINPUT, MERLIN and Surveycraft are some of the most popular packages
4
Single Card data
1000290022 00061860200310041324 040800100000000000 1.3979167
1000390022 00061860200310041359 040800100000000000 0.6460563
1001210022 00061860200310041249 040800100000000000 0.8865789
1013240022 00061867200310051800 040800100000000000 0.6759740
1013250022 00061867200310051831 040800100000000000 0.8857447
1013260022 00061867200310051842 040800100000000000 1.3810526
1013300022 00061867200310051857 040800100000000000 1.5300000
1015240022 00062321200310041216 040800100000000000 1.4328262
(Figure labels: Serial Number / Respondent ID; records R1 to R5; Record length)
Respondent ID is the unique ID for the record
Number of lines in the file = Sample Size
Maximum length of record = 32,767 (size of integer)
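The single card rules above (one line per respondent, unique Respondent ID, line count equals sample size) can be sketched in Python. This is only an illustration: the slide does not say which columns hold the Respondent ID, so the position used here (columns 1 to 6) is an assumption, and `read_single_card` is a hypothetical helper name.

```python
# Sketch only. ID_START/ID_END are assumed positions, not taken from the slide.
SAMPLE = """\
1000290022 00061860200310041324 040800100000000000 1.3979167
1000390022 00061860200310041359 040800100000000000 0.6460563
1001210022 00061860200310041249 040800100000000000 0.8865789
"""

ID_START, ID_END = 1, 6      # hypothetical, 1-based inclusive columns
MAX_RECORD_LENGTH = 32767    # limit quoted on the slide


def read_single_card(text):
    """Return {respondent_id: record}, one record per non-empty line."""
    records = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        if len(line) > MAX_RECORD_LENGTH:
            raise ValueError("record longer than 32,767 columns")
        respondent_id = line[ID_START - 1:ID_END]
        if respondent_id in records:
            raise ValueError(f"duplicate respondent ID {respondent_id}")
        records[respondent_id] = line
    return records


records = read_single_card(SAMPLE)
sample_size = len(records)   # number of lines in the file = sample size
```

With the three sample lines, `sample_size` is 3 and every key in `records` is distinct.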
5
Multicard data
00048011 01 04070917213204070917374232570237550000480202837525750 111020744t242-345235849862468-24860004803 1 111-4 208050505050810 2452486098240960004804001010 55334333333433453145555413155 6468900004805 2115245444433353443442343435514334333 42592400070011 01 040709173010040709175624 245982496000700201395277173 2310190746464640600007003 1 112-7 105080803050308 4262460007004030707 335435532455335352554523555555530007005 21113123322&2133222122431232323212313
R1
R2
Each respondent has more than one line of information; each line is called a “CARD”
In general a card is 99 characters long
A card can also be longer than 99 characters
The unique identification in this data format is Respondent ID + Card ID
Maximum length of record = 32,767 (size of integer). The maximum record length in this case is the sum of the record lengths of all cards
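As a rough Python illustration of "Respondent ID + Card ID" as the unique key, the sketch below groups card lines by respondent. The records and the column positions (serial in columns 1 to 4, card number in column 5) are made up for the example and are not taken from the data shown on the slide.

```python
from collections import defaultdict

# Hypothetical layout: serial in columns 1-4, card number in column 5.
SER = (1, 4)
CRD = (5, 5)

# Made-up card lines: respondent 0001 has cards 1 and 2,
# respondent 0002 has cards 1 and 3.
LINES = [
    "00011AAAA",
    "00012BBBB",
    "00021CCCC",
    "00023DDDD",
]


def field(line, start, end):
    """Columns start..end of a line (1-based, inclusive)."""
    return line[start - 1:end]


def group_cards(lines):
    """Return {serial: {card_number: line}}; the pair must be unique."""
    respondents = defaultdict(dict)
    for line in lines:
        serial = field(line, *SER)
        card = int(field(line, *CRD))
        if card in respondents[serial]:
            raise ValueError(f"duplicate card {card} for serial {serial}")
        respondents[serial][card] = line
    return dict(respondents)


grouped = group_cards(LINES)
```

Here `grouped` has two respondents, and the total record length for a respondent is the sum of the lengths of that respondent's cards.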
6
Quantum data format
Quantum can handle both single card and multicard data formats
In both formats, Quantum allows multi-punch data
In multi-punch format, each column can hold 12 values: the individual constants 0123456789-&
Any combination of these 12 codes (punches) can exist in a single column
The advantage of this format is that more data fits into the maximum record length of 32,767 characters
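One way to picture a multi-punched column is as a set of punches rather than a single character. The Python sketch below is an illustration of that idea only; it says nothing about how Quantum stores such a column internally, and `column_value` is a hypothetical helper.

```python
PUNCHES = "1234567890-&"  # the 12 codes one column can hold


def column_value(*codes):
    """Build the content of one multi-punched column as a set of punches."""
    value = frozenset(codes)
    unknown = value - set(PUNCHES)
    if unknown:
        raise ValueError(f"not a punch code: {sorted(unknown)}")
    return value


# One column holding three answers at once: punches 1, 3 and &.
answers = column_value("1", "3", "&")

# Each punch is either present or absent, so one column has 2**12
# possible states (including the blank column with no punches).
states_per_column = 2 ** len(PUNCHES)
```

A single-coded column can hold only one of its codes at a time, which is why multi-punching packs more information into the same record length.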
7
Introducing Quantum – What does it do?
Check and validate the data
Edit and correct the data
Produce different types of lists and reports of data
Produce new data files
Recode data and produce new variables
Generate tables
Perform Statistical Calculations
8
Underlying concepts
Quantum consists of 2 phases or sessions
Edit Section
For each questionnaire:
- Check and correct data
- Modify/ Recode data
Tabulation Section
- Count questionnaires
- Produce tables
- Format tables
9
Underlying concepts
Edit section
- Data examination
- Data modification
- Data correction
Tables section
- Cross tabulation of data
- Control statements to determine layout
10
Layout of a table
X-break
Base size
Table title
Base Title
Frequency
Percentage
Mean score
Project Heading
Side headings
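To make the table elements listed above concrete, here is a small Python sketch that computes a base size, frequencies, percentages and a mean score for each column of a break. The data and the `crosstab` function are invented for illustration; this is not how Quantum is programmed.

```python
from collections import Counter, defaultdict


def crosstab(rows):
    """rows: iterable of (break_label, rating) pairs, rating an int."""
    by_break = defaultdict(list)
    for label, rating in rows:
        by_break[label].append(rating)

    table = {}
    for label, ratings in by_break.items():
        base = len(ratings)                 # base size for this column
        freq = Counter(ratings)             # frequency of each rating
        table[label] = {
            "base": base,
            "frequency": dict(freq),
            "percentage": {k: 100.0 * v / base for k, v in freq.items()},
            "mean": sum(ratings) / base,    # mean score on the base
        }
    return table


# Made-up answers to a 1-5 rating question, split by a two-way break.
rows = [("Male", 4), ("Male", 5), ("Female", 4),
        ("Female", 2), ("Female", 4), ("Male", 4)]
table = crosstab(rows)
```

In this example each break column has a base of 3, and two of the three "Male" respondents gave a rating of 4.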
11
Coding conventions
A Quantum program is a file created using a text editor
The tables section consists of statement types
Each statement starts on a new line
Each statement consists of parameters and options
A statement may be up to 200 characters
The standard Quantum separator is the semi-colon (;)
Long statements may be continued on new lines with a + in the first position. In certain cases long statements may be continued with a ++ in the first position
Comments are denoted by /* at the start of the line. You may also see Quantum programs that use C at the start of a line for comments.
12
Coding conventions
A Sample of Quantum Program
/*
/* Here is a comment
/*
tab q5 brk1;c=c115'1';nz
+dsp
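The conventions for comments, continuation lines and the semi-colon separator can be illustrated with a short Python sketch that turns physical lines into logical statements. It is a simplification under stated assumptions: it handles only the /* comment form and the single + continuation, it raises an error for ++, and it splits on every semi-colon without considering semi-colons inside constants. The sample program in the sketch is made up.

```python
def logical_statements(source):
    """Drop /* comment lines and join '+' continuation lines."""
    statements = []
    for line in source.splitlines():
        if not line.strip() or line.startswith("/*"):
            continue
        if line.startswith("++"):
            raise NotImplementedError("'++' continuation is outside this sketch")
        if line.startswith("+"):
            if not statements:
                raise ValueError("continuation line with nothing to continue")
            statements[-1] += line[1:]      # append text after the '+'
        else:
            statements.append(line)
    return statements


def split_statement(statement):
    """Split one statement into its parts on the ';' separator."""
    return statement.split(";")


# Made-up program text for the example.
PROGRAM = """\
/*
/* Here is a comment
/*
tab q5 brk1;c=c115'1'
+;nz
"""

statements = logical_statements(PROGRAM)
parts = split_statement(statements[0])
```

The three comment lines disappear, the continuation line is folded into the `tab` statement, and `parts` holds the statement's three components.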
13
Fundamentals and Terminology
14
Fundamentals
Individual constants
These are ASCII characters or multicodes, which are any combination of the codes 1234567890-&, or a blank alone. They are enclosed in single quotes: '1' '2' '123' ' ' ...
Punches are listed individually, and a range of punches is written with a slash (/) between two codes, meaning “through” in the order &-01234567890-&.
Examples:
'1' Punch 1 ; '123' Punches 1 or 2 or 3
'1/5' Punches 1 or 2 or 3 or 4 or 5; ' ' no punches (blank)
Order of punches is & - 0 1 2 3 4 5 6 7 8 9 0 - &
'&/9' is the same as '1/&': both cover all 12 punches
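The range rule can be checked with a small Python sketch that expands a punch specification into a set of codes, using the punch order given above. How a range is resolved when a code appears twice in that order is an assumption of this sketch (it takes the shortest matching span), not something stated on the slide, and `expand` is a hypothetical name.

```python
ORDER = "&-01234567890-&"   # punch order from the slide


def expand(spec):
    """Expand a specification such as '123', '1/5' or ' ' into a set."""
    if spec.strip() == "":
        return frozenset()                  # blank: no punches
    if len(spec) == 3 and spec[1] == "/":
        first, last = spec[0], spec[2]
        spans = []
        for i, code in enumerate(ORDER):
            if code == first:
                j = ORDER.find(last, i)     # next 'last' at or after 'first'
                if j != -1:
                    spans.append(ORDER[i:j + 1])
        if not spans:
            raise ValueError(f"no such range in punch order: {spec!r}")
        return frozenset(min(spans, key=len))   # assumption: shortest span
    if not set(spec) <= set(ORDER):
        raise ValueError(f"not a punch specification: {spec!r}")
    return frozenset(spec)                  # individually listed punches


all_punches = expand("&/9")
```

Under these assumptions `expand("1/5")` gives punches 1 to 5, and `expand("&/9")` and `expand("1/&")` both give the full set of 12 punches.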
15
Fundamentals
Individual constants
The - punch is sometimes referred to as the 11th or X punch, and & is sometimes referred to as the 12th, Y or V punch.
Each code represents one answer to a question. For example, take the question
‘What is your favorite color?’ with the response list:
Red : 1
Yellow : 2
Blue : 3
Green : 4
Black : 5
White : 6
The answer is coded into one column. If my favorite color is green, this appears in the data file as a 4 in the appropriate column; if your favorite color is red, there will be a 1 in that column.
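Read back from the data, the same code list becomes a simple lookup. The Python sketch below assumes, purely for illustration, that the answer sits in column 12 of the record; the slide does not name a column.

```python
COLOR_CODES = {
    "1": "Red", "2": "Yellow", "3": "Blue",
    "4": "Green", "5": "Black", "6": "White",
}
COLOR_COLUMN = 12   # hypothetical column position (1-based)


def favorite_color(record):
    """Decode the single code held in the favorite-color column."""
    code = record[COLOR_COLUMN - 1]
    return COLOR_CODES.get(code)    # None for a blank or unlisted code


green_record = " " * 11 + "4"       # a 4 in column 12
```

`favorite_color(green_record)` returns "Green", and a record with a blank in that column returns `None`.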
16
Fundamentals
Strings of Data Constants
Strings are lists of single ASCII characters, enclosed in dollar signs ($).
A string refers to more than one column of data
Examples:
$1234$
$ABC$
$ $
17
Fundamentals
Numbers
- Whole Numbers
- Real Numbers
Variables: Variables or arrays may be defined as data, integer or real types. Names may be up to 10 characters long.
Example: int unit 1
real weight 10s
Whenever “s” is used, varn is interpreted as var(n)
18
Variables/ column referencing
Columns are referred to by their actual position in the data. If you open the data file in an editor and place the cursor on a character, the column position is the cursor position
In a single card data file, the actual column position is used directly to refer to a column. For example, c12 refers to column 12 in a single card data file
In a multicard data file, a column is referred to in combination with its card number. The format is “cXNN” for cards 1 to 9 and “cXXNN” for cards 10 and above, where X (or XX) is the card number and NN is the column position. One-digit column positions are written with a leading “0”.
Example: c108 refers to 1st card 8th column
c412 refers to 4th card 12th position
c1009 refers to 10th card 9th position
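The referencing scheme can be expressed as a little arithmetic. The Python sketch below splits a reference into card number and column position, assuming each card occupies a block of 100 column numbers (card 1 in 101 to 200, card 2 in 201 to 300, and so on). `parse_column_ref` is a hypothetical helper written for this note, not part of Quantum.

```python
import re

CARD_WIDTH = 100   # assumption: each card occupies 100 column numbers


def parse_column_ref(ref, multicard=True):
    """Return (card, column) for a reference such as 'c108' or 'c12'."""
    match = re.fullmatch(r"c(\d+)", ref)
    if not match:
        raise ValueError(f"not a column reference: {ref!r}")
    position = int(match.group(1))
    if not multicard:
        return (None, position)     # single card: position used directly
    card, offset = divmod(position - 1, CARD_WIDTH)
    return (card, offset + 1)


first_example = parse_column_ref("c108")
```

With these assumptions c108 resolves to card 1, column 8; c412 to card 4, column 12; and c1009 to card 10, column 9.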
19
Variables/ column referencing
A series of columns may be considered as either string or numeric and is referenced as c(m,n) where m is the start column position and n is the end column position
Examples:
c(12,15) refers to columns 12 to 15 in a single card data file
c(106,110) refers to columns 6 to 10 of 1st card in a multicard data file
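For a single card record, c(m,n) corresponds to a plain slice of the line. The Python sketch below shows the same field read once as a string and once as a number; `field` and `numeric_field` are hypothetical helper names, and the record is the start of the first line of the single card sample.

```python
def field(record, m, n):
    """Columns m..n of a record (1-based, inclusive) as a string."""
    if not 1 <= m <= n:
        raise ValueError("expected 1 <= m <= n")
    return record[m - 1:n]


def numeric_field(record, m, n):
    """The same columns read as a whole number; None if blank."""
    text = field(record, m, n).strip()
    return int(text) if text else None


record = "1000290022 00061860200310041324"
as_string = field(record, 12, 15)           # the four characters in c(12,15)
as_number = numeric_field(record, 12, 15)   # the same columns as a number
```

In a multicard file the card would be resolved first, and the same slice would then be taken from that card's line.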
20
Describing Data Structure
21
Data Structure
By default Quantum reads one record or a line from your data file at a time. Each record may be up to 100 columns long
Most Market Research surveys consist of multi-card records
Some surveys consist instead of long records with more than 100 columns of data
These data structures must be described on the struct statement
Format: struct;options
The “struct” statement must be the first statement in your program
22
Data Structure – contd..
Specifying Long records
struct;reclen=n
where n is the length of the record in columns
the maximum length of a record is approximately 32,000 columns
Specifying Multi-card Data Sets
This is the most common form of struct statement
struct;read=2;ser=c(m,n);crd=c(p,q)
Where, read = 2 denotes a multi-card set; ser = defines the columns of the serial number; crd = defines columns of the card number
Example: struct;read=2;ser=c(1,4);crd=c80
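To show how the pieces of a struct statement fit together, the Python sketch below splits the example into its options and turns the column specifications into numeric ranges. It only illustrates the statement's shape; `parse_struct` and `column_range` are hypothetical helpers and cover nothing beyond the option forms shown on the slide.

```python
import re


def parse_struct(statement):
    """Split 'struct;key=value;...' into a dict of option strings."""
    parts = [p.strip() for p in statement.split(";") if p.strip()]
    if not parts or parts[0] != "struct":
        raise ValueError("not a struct statement")
    options = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        options[key] = value
    return options


def column_range(spec):
    """'c(1,4)' -> (1, 4); 'c80' -> (80, 80)."""
    match = re.fullmatch(r"c\((\d+),(\d+)\)", spec)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = re.fullmatch(r"c(\d+)", spec)
    if match:
        column = int(match.group(1))
        return column, column
    raise ValueError(f"not a column specification: {spec!r}")


options = parse_struct("struct;read=2;ser=c(1,4);crd=c80")
serial_columns = column_range(options["ser"])
card_columns = column_range(options["crd"])
```

For the slide's example, the serial number is read from columns 1 to 4 and the card number from column 80.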
23
Data Structure – contd..
When a multi-card set is read, the cards are defined as follows:
Card 1 Columns 101-200
Card 2 Columns 201-300
Card 3 Columns 301-400
Card 4 Columns 401-500
…..
Card 10 Columns 1001-1100
By default a maximum of 9 cards are permitted in a set.
Reading Multi-card data sets with 10 or more cards
The option max=n is used to define the maximum number of cards in the set
Example:
struct;read=2;ser=c(1,5);crd=c(6,7); max=19
24
Data Structure – contd..
Checking the structure of multi-card data sets
Quantum automatically checks for duplicate card types within a serial number and for adjacent duplicate serial numbers
Not every card has to be present for every respondent in a multicard data file
It is possible to check that specific cards are present using req=
Example:
struct;read=2;ser=c(1,5);crd=c(6,7); max=19;req=1,2
In this example each record must have a card 1 and 2 present. If either or both are missing the record will be rejected
If you require a series of cards to be present specify the first and last separated by a slash
struct;read=2;ser=c(1,5);crd=c(6,7); max=19;req=1/5
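The effect of req= can be sketched in Python as a set comparison between the cards a respondent has and the cards that are required. The helper names are hypothetical, and accepting a mix of comma lists and slash ranges in one value is an assumption of the sketch, not something the slide confirms.

```python
def expand_req(spec):
    """'1,2' -> {1, 2}; '1/5' -> {1, 2, 3, 4, 5}."""
    required = set()
    for item in spec.split(","):
        if "/" in item:
            first, last = (int(x) for x in item.split("/"))
            required.update(range(first, last + 1))
        else:
            required.add(int(item))
    return required


def missing_cards(cards_present, req_spec):
    """Required cards that the respondent does not have, in order."""
    return sorted(expand_req(req_spec) - set(cards_present))


# A respondent with cards 1 and 3 only, checked against req=1,2.
missing = missing_cards([1, 3], "1,2")
```

A respondent for whom `missing_cards` returns a non-empty list corresponds to a record that would be rejected.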