Section 2 - Getting Started

Transcript of Section 2 - Getting Started

Page 1: Section 2 - Getting Started

Core Skill Training Session Six: “Data Analysis”

Session 2: “Getting Started”

Core Skills for Data Processing

ORSC 2004 - Internal Training

Page 2: Section 2 - Getting Started

Objective

At the end of the training program, participants should be able to

Understand data layouts

Understand what tables will look like

Define the data structure for various data formats

Understand coding conventions

Get an appreciation of basic elements

Page 3: Section 2 - Getting Started

Various data formats

Questionnaire data can be computerised in many ways

Market Research software mostly uses FLAT files

Customised software is available for capturing MR data

QINPUT, MERLIN and Surveycraft are among the most popular packages

Page 4: Section 2 - Getting Started

Single Card data

1000290022 00061860200310041324 040800100000000000 1.3979167

1000390022 00061860200310041359 040800100000000000 0.6460563

1001210022 00061860200310041249 040800100000000000 0.8865789

1013240022 00061867200310051800 040800100000000000 0.6759740

1013250022 00061867200310051831 040800100000000000 0.8857447

1013260022 00061867200310051842 040800100000000000 1.3810526

1013300022 00061867200310051857 040800100000000000 1.5300000

1015240022 00062321200310041216 040800100000000000 1.4328262

[Figure callouts: Serial Number / Respondent ID; records R1 to R5; Record length]

Respondent ID is the unique ID for the record

Number of lines in the file = Sample Size

Maximum length of record = 32,767 (size of integer)
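The points above can be sketched in Python. This is only an illustration: the slide does not say which columns hold the respondent ID, so columns 1 to 6 are assumed here, and the function name `read_single_card` is hypothetical.

```python
def read_single_card(lines, id_start=1, id_end=6):
    """Index single-card records by respondent ID.

    id_start/id_end are 1-based, inclusive column positions (assumed, not
    taken from the slide).
    """
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        resp_id = line[id_start - 1:id_end]
        if resp_id in records:
            raise ValueError(f"duplicate respondent ID {resp_id}")
        records[resp_id] = line
    return records


sample = [
    "1000290022 00061860200310041324 040800100000000000 1.3979167",
    "1000390022 00061860200310041359 040800100000000000 0.6460563",
    "1001210022 00061860200310041249 040800100000000000 0.8865789",
]
records = read_single_card(sample)
sample_size = len(records)  # one line per respondent
print(sample_size)
```

Because each respondent occupies exactly one line, the sample size is simply the number of records read.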

Page 5: Section 2 - Getting Started

Multicard data

R1

00048011 01 04070917213204070917374232570237550

000480202837525750 111020744t242-345235849862468-2486

0004803 1 111-4 208050505050810 245248609824096

0004804001010 55334333333433453145555413155 646890

0004805 2115245444433353443442343435514334333 425924

R2

00070011 01 040709173010040709175624 245982496

000700201395277173 231019074646464060

0007003 1 112-7 105080803050308 426246

0007004030707 33543553245533535255452355555553

0007005 21113123322&2133222122431232323212313

Each respondent has more than one line of information; each line is called a “CARD”

In general, the length of a card is 99 characters

Cards can also be longer than 99 characters

The unique identification in this data format is Respondent ID + Card ID

Maximum length of record = 32,767 (size of integer). The maximum record length in this case is the sum of the record lengths of all cards
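A minimal Python sketch of the Respondent ID + Card ID key follows. The column positions are assumptions inferred from the sample data on this slide (serial number in columns 1 to 6, card number in column 7), and the card lines used are shortened, hypothetical ones.

```python
from collections import defaultdict


def group_cards(lines, ser=(1, 6), crd=(7, 7)):
    """Group card lines by respondent; key each card by its card number.

    ser and crd are 1-based, inclusive column ranges (assumed values).
    """
    respondents = defaultdict(dict)
    for line in lines:
        serial = line[ser[0] - 1:ser[1]]
        card = int(line[crd[0] - 1:crd[1]])
        if card in respondents[serial]:
            raise ValueError(f"duplicate card {card} for respondent {serial}")
        respondents[serial][card] = line
    return dict(respondents)


# Hypothetical, shortened card lines (not the slide's actual data).
lines = [
    "0004801 first card",
    "0004802 second card",
    "0007001 first card",
]
grouped = group_cards(lines)
print(sorted(grouped))            # respondent IDs
print(sorted(grouped["000480"]))  # card numbers held for one respondent
```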

Page 6: Section 2 - Getting Started

Quantum data format

Quantum can handle both single card/ multicard data formats

In both formats, Quantum allows multi-punch data

In multi-punch data format, each column can hold any of 12 values, the individual constants 0123456789-&

Any combination of the above 12 codes (punches) can exist in a single column

The advantage of this format is that more data fits into the maximum record length of 32,767 characters
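One way to picture a multi-punch column is as a set of up to 12 codes. The Python sketch below packs such a set into a 12-bit mask; this is purely an illustration of the idea and is not claimed to be Quantum's actual storage format. The names `PUNCHES`, `encode` and `decode` are hypothetical.

```python
PUNCHES = "0123456789-&"  # the 12 codes a column can hold


def encode(punches):
    """Pack a collection of punch codes into a 12-bit integer mask."""
    mask = 0
    for p in punches:
        mask |= 1 << PUNCHES.index(p)
    return mask


def decode(mask):
    """Return the punch codes present in a mask, in PUNCHES order."""
    return "".join(p for i, p in enumerate(PUNCHES) if mask & (1 << i))


column = encode("13&")
print(decode(column))     # codes stored in the column
print(2 ** len(PUNCHES))  # number of distinct combinations per column
```

With 12 independent punches a single column can represent 4,096 different combinations, which is why multi-punch data packs more information into the same record length.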

Page 7: Section 2 - Getting Started

Introducing Quantum – What does it do?

Check and validate the data

Edit and correct the data

Produce different types of lists and reports of data

Produce new data files

Recode data and produce new variables

Generate tables

Perform Statistical Calculations

Page 8: Section 2 - Getting Started

Underlying concepts

Quantum consists of 2 phases or sections

Edit Section

For each questionnaire:

- Check and correct data

- Modify/ Recode data

Tabulation Section

Count questionnaires

Produce tables

Format tables

Page 9: Section 2 - Getting Started

Underlying concepts

Edit section

• Data examination

• Data modification

• Data correction

Tables section

• Cross tabulation of data

• Control statements to determine layout

Page 10: Section 2 - Getting Started

Layout of a table

[Figure: sample table annotated with its elements: Project Heading, Table title, Base Title, Base size, X-break, Side headings, Frequency, Percentage, Mean score]

Page 11: Section 2 - Getting Started

Coding conventions

A Quantum program is a file created using a text editor

The tables section is made up of statements of different types

Each statement starts on a new line

Each statement consists of parameters and options

A statement may be up to 200 characters long

The standard Quantum separator is the semicolon (;)

Long statements may be continued on new lines with a + in the first position. In certain cases long statements may be continued with a ++ in the first position

Comments are denoted by /* at the start of the line. You may see Quantum programs that use C at the start of a line for comments.
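The conventions above can be illustrated with a small Python sketch that drops comment lines, joins continuation lines and splits a statement on the separator. It is not a Quantum parser: it assumes continuation text is appended directly to the previous line, treats + and ++ alike, ignores the older C-style comments, and uses hypothetical input lines.

```python
def logical_statements(lines):
    """Drop comment lines and join '+' continuation lines."""
    statements = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("/*"):
            continue  # blank line or comment
        if line.startswith("+") and statements:
            # Assumption: '+' and '++' are both treated as plain continuation.
            statements[-1] += line.lstrip("+")
        else:
            statements.append(line)
    return statements


def split_statement(statement):
    """Split one statement on the ';' separator (naive: no quoting rules)."""
    return statement.split(";")


# Hypothetical input, loosely modelled on the conventions listed above.
program = [
    "/* Here is a comment",
    "tab q5 brk1;c=c115'1'",
    "+;nz",
]
stmts = logical_statements(program)
print(stmts)
print(split_statement(stmts[0]))
```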

Page 12: Section 2 - Getting Started

Coding conventions

A Sample of Quantum Program

 

/*

/* Here is a comment

/*

 

tab q5 brk1;c=c115'1';nz

+dsp

Page 13: Section 2 - Getting Started

Fundamentals and Terminology

Page 14: Section 2 - Getting Started

Fundamentals

Individual constants

These are ASCII characters or multicodes, which are any combination of the codes 1234567890-&, or blank alone. They are enclosed in single quotes: '1' '2' '123' ' ' and so on. A slash (/) between two codes denotes 'through' in the order &-01234567890-&.

Punch codes are referenced in apostrophes. Punches are listed individually, and a range of punches is denoted by a / meaning 'through'

Examples:

'1' Punch 1

'123' Punches 1 or 2 or 3

'1/5' Punches 1 or 2 or 3 or 4 or 5

' ' No punches (blank)

Order of punches is & - 0 1 2 3 4 5 6 7 8 9 0 - &

'&/9' is the same as '1/&'; both ranges cover all 12 punches
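The range rule can be checked with a short Python sketch. The punch order is taken from the slide; how Quantum resolves a code that appears twice in that order is not stated, so the sketch assumes the shortest span is meant. The function name `expand` is hypothetical.

```python
ORDER = "&-01234567890-&"  # punch order from the slide; 0, - and & appear twice


def expand(spec):
    """Expand a punch specification such as '123' or '1/5' into a set of codes."""
    if "/" not in spec:
        return set(spec.replace(" ", ""))  # blank means no punches
    first, last = spec.split("/")
    # Assumption: when a code occurs twice in ORDER, take the shortest span.
    spans = [
        (j - i, i, j)
        for i, a in enumerate(ORDER) if a == first
        for j, b in enumerate(ORDER) if b == last and j >= i
    ]
    if not spans:
        raise ValueError(f"invalid range {spec!r}")
    _, i, j = min(spans)
    return set(ORDER[i:j + 1])


print(sorted(expand("1/5")))
print(expand("&/9") == expand("1/&"))
```

Both '&/9' and '1/&' expand to the full set of 12 codes, which is why the two forms are equivalent.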

Page 15: Section 2 - Getting Started

Fundamentals

Individual constants

The - punch is sometimes referred to as the 11th or X punch, and & is sometimes referred to as the 12th, Y or V punch.

Each code represents one answer to a question. For example, the question ‘What is your favorite color?’ has the response list:

Red : 1

Yellow : 2

Blue : 3

Green : 4

Black : 5

White : 6

The answer is coded into one column. If my favorite color is green, a 4 appears in the appropriate column of the data file; if your favorite color is red, a 1 appears in that column.

Page 16: Section 2 - Getting Started

Fundamentals

Strings of Data Constants

Strings are lists of single ASCII characters. They are enclosed in dollar signs ($).

Strings are referenced in dollar signs and refer to more than one column of data

Examples:

$1234$

$ABC$

$ $

Page 17: Section 2 - Getting Started

Fundamentals

Numbers

- Whole Numbers

- Real Numbers

Variables: variables or arrays may be defined as data, integer or real types. Names may be up to 10 characters long.

Example: int unit 1

real weight 10s

Whenever “s” is used, varn is interpreted as var(n); for example, weight3 is read as weight(3)

Page 18: Section 2 - Getting Started

Variables/ column referencing

Columns are referred to by their actual position in the data. That is, if you open the data file in an editor and place the cursor on a character, the cursor's column position is the column number

In a single card data file, the column position itself is used to refer to a column. For example, c12 refers to column 12 in a single card data file

In a multicard data file, the column must be referred to in combination with the card number. The format is “cXNN” for cards 1 to 9 and “cXXNN” for cards 10 and above, where X refers to the card number and NN to the column position. One-digit column positions must be preceded by “0”.

Example: c108 refers to 1st card 8th column

c412 refers to 4th card 12th position

c1009 refers to 10th card 9th position
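The three examples can be reproduced with a small Python sketch that splits a multicard reference into card and column. It is an illustration only; `parse_column` is a hypothetical helper, and treating a reference ending in 00 as column 100 of the previous card is an assumption.

```python
import re


def parse_column(ref):
    """Split a multicard reference like 'c108' into (card, column)."""
    match = re.fullmatch(r"c(\d{3,4})", ref)
    if not match:
        raise ValueError(f"not a multicard column reference: {ref!r}")
    card, column = divmod(int(match.group(1)), 100)
    if column == 0:
        # Assumption: cN00 is column 100 of the previous card.
        card, column = card - 1, 100
    if card < 1:
        raise ValueError(f"no card number in {ref!r}")
    return card, column


for ref in ("c108", "c412", "c1009"):
    print(ref, parse_column(ref))
```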

Page 19: Section 2 - Getting Started

Variables/ column referencing

A series of columns may be considered as either string or numeric and is referenced as c(m,n) where m is the start column position and n is the end column position

Examples:

c(12,15) refers to columns 12 to 15 in a single card data file

c(106,110) refers to columns 6 to 10 of 1st card in a multicard data file
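As a rough Python analogue of c(m,n), the sketch below reads a range of columns from a single-card record either as text or as a number. Column positions are 1-based and inclusive, as on the slide; the helper names are hypothetical, and reading a blank field as 0 is an assumption.

```python
def field(record, start, end):
    """Return columns start..end (1-based, inclusive) of a record as text."""
    return record[start - 1:end]


def numeric_field(record, start, end):
    """Read the same columns as an integer; blank fields count as 0 (assumption)."""
    text = field(record, start, end).strip()
    return int(text) if text else 0


record = "1000290022 00061860200310041324 040800100000000000 1.3979167"
print(field(record, 12, 15))          # string view of c(12,15)
print(numeric_field(record, 12, 15))  # numeric view of c(12,15)
```

The same columns give the string 0006 or the number 6, depending on how the field is treated.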

Page 20: Section 2 - Getting Started

Describing Data Structure

Page 21: Section 2 - Getting Started

Data Structure

By default Quantum reads one record (one line) from your data file at a time. Each record may be up to 100 columns long

Most Market Research surveys consist of multi-card records

Some surveys consist instead of long records with more than 100 columns of data

These data structures must be described on the struct statement

Format: struct;options

The “struct” statement must be the first statement in your program

Page 22: Section 2 - Getting Started

Data Structure – contd..

Specifying Long records

  struct;reclen=n

where n is the length of the record in columns

The maximum length of a record is approximately 32,000 columns (32,767, the limit given earlier in this session)

 Specifying Multi-card Data Sets

This is the most common form of struct statement

struct;read=2;ser=c(m,n);crd=c(p,q)

where read=2 denotes a multi-card set, ser= defines the columns of the serial number, and crd= defines the columns of the card number

Example: struct;read=2;ser=c(1,4);crd=c80
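To make the roles of ser= and crd= concrete, the Python sketch below splits the example struct statement into its options and uses them to pull the serial number and card number out of one card line. It is an illustration under stated assumptions, not Quantum's own parsing: the helper names and the 80-column card line are hypothetical.

```python
import re


def column_span(ref):
    """Convert 'c(m,n)' or 'cN' into a (start, end) pair of 1-based columns."""
    match = re.fullmatch(r"c\((\d+),(\d+)\)", ref)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = re.fullmatch(r"c(\d+)", ref)
    if match:
        return int(match.group(1)), int(match.group(1))
    raise ValueError(f"unrecognised column reference: {ref!r}")


def parse_struct(statement):
    """Split a struct statement into a dict of its options."""
    keyword, *options = [part.strip() for part in statement.split(";")]
    if keyword != "struct":
        raise ValueError("not a struct statement")
    return dict(option.split("=", 1) for option in options if option)


opts = parse_struct("struct;read=2;ser=c(1,4);crd=c80")
ser, crd = column_span(opts["ser"]), column_span(opts["crd"])

# Hypothetical 80-column card: serial 0123 in columns 1-4, card number 2 in column 80.
card_line = "0123" + " " * 75 + "2"
print(card_line[ser[0] - 1:ser[1]], card_line[crd[0] - 1:crd[1]])
```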

Page 23: Section 2 - Getting Started

Data Structure – contd..

When a multi-card set is read, the cards are defined as follows: 

Card 1 Columns 101-200

Card 2 Columns 201-300

Card 3 Columns 301-400

Card 4 Columns 401-500

…..

Card 10 Columns 1001-1100

By default a maximum of 9 cards are permitted in a set.
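The card-to-column mapping in the list above follows a simple rule: card k occupies columns 100k+1 to 100k+100. A Python sketch of that rule, with hypothetical helper names and made-up card text, might look like this:

```python
def card_column_range(card):
    """Return the (first, last) flat column numbers for a card."""
    return card * 100 + 1, card * 100 + 100


def flatten(cards):
    """Map flat column numbers to characters for one respondent.

    `cards` maps card number -> card text; missing cards are simply absent.
    """
    flat = {}
    for card, text in cards.items():
        first, _ = card_column_range(card)
        for offset, char in enumerate(text[:100]):
            flat[first + offset] = char
    return flat


print(card_column_range(1), card_column_range(10))
flat = flatten({1: "ABCDEFGH", 4: "x" * 11 + "Z"})
print(flat[108], flat[412])  # card 1 column 8, card 4 column 12
```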

Reading Multi-card data sets with 10 or more cards

The option max=n is used to define the maximum number of cards in the set

Example:

struct;read=2;ser=c(1,5);crd=c(6,7);max=19

Page 24: Section 2 - Getting Started

Data Structure – contd..

Checking the structure of multi-card data sets

Quantum automatically checks for duplicate card types within a serial number and for adjacent duplicate serial numbers

It is not mandatory for every card to be present for every respondent in a multicard data file

It is possible to check that specific cards are present using req=

Example:

struct;read=2;ser=c(1,5);crd=c(6,7);max=19;req=1,2

In this example each record must have cards 1 and 2 present. If either or both are missing, the record will be rejected

If you require a series of cards to be present, specify the first and last separated by a slash:

struct;read=2;ser=c(1,5);crd=c(6,7);max=19;req=1/5
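The checks described on this slide can be imitated in a few lines of Python. This is a sketch of the logic only, not of how Quantum reports or rejects records: the function names and message texts are hypothetical, and only the two req= forms shown above (a comma list and a first/last range) are relied on.

```python
def parse_req(req):
    """Expand a req= value such as '1,2' or '1/5' into a set of card numbers."""
    required = set()
    for part in req.split(","):
        if "/" in part:
            first, last = part.split("/")
            required.update(range(int(first), int(last) + 1))
        else:
            required.add(int(part))
    return required


def check_record(card_numbers, req, max_cards=9):
    """Return a list of problems found in one respondent's card numbers."""
    problems = []
    seen = set()
    for card in card_numbers:
        if card in seen:
            problems.append(f"duplicate card {card}")
        if not 1 <= card <= max_cards:
            problems.append(f"card {card} outside 1..{max_cards}")
        seen.add(card)
    for card in sorted(parse_req(req) - seen):
        problems.append(f"missing required card {card}")
    return problems


print(check_record([1, 2, 7], "1,2", max_cards=19))
print(check_record([1, 3, 3], "1/5", max_cards=19))
```

A respondent with cards 1, 2 and 7 passes req=1,2 even though other cards are absent, while a respondent with cards 1, 3 and 3 fails req=1/5 on both the duplicate and the missing cards.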