The Utility of Metadata for Questionnaire Design and Evaluation

24 April 2007 QUEST2007: Statistics Canada, Ottawa, Canada

1


Jim Esposito
Bureau of Labor Statistics
Washington, DC

Disclaimer: The views and opinions expressed in this presentation are those of the presenter/author and not necessarily those of the Bureau of Labor Statistics or the Bureau of the Census.


2

Objectives of Presentation

To draw attention to the concept of metadata and to its scope and relevance

To describe a case study involving the measurement of work/employment that illustrates the utility of metadata in evaluating and designing questionnaire items


3

Metadata: An Informal Definition

Metadata can be defined as any information (verbal or numeric or code, qualitative or quantitative) that provides context for understanding survey-generated data:

  Domain-specific/ethnographic information
  Concepts and question objectives
  Questionnaire items and administration modes
  Instructional materials
  Pre- and post-survey evaluation research
  Classification algorithms


4

The Measurement of Labor-Force Status via the CPS

Current Population Survey [CPS]

  Official source of LF statistics in USA (e.g., monthly unemployment rate)
  CPS measures work, not jobs
  60,000 households a month
  Principal LF categories: Employed [EMP], unemployed [UE], not-in-the-labor-force [NILF]
  Employed: Work for pay, one hour or more; unpaid work in family business, 15 hours or more; job (but absent last week)

Data collected monthly via two modes [face-to-face and telephone CAPI; centralized CATI]
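The "Employed" definition on this slide can be read as a simple classification rule. The sketch below is illustrative only: the class and field names are hypothetical, it covers just the three conditions listed above, and it is not the CPS production algorithm (which also has to separate UE from NILF using items not described here).

```python
from dataclasses import dataclass


@dataclass
class PersonWeek:
    """Hypothetical summary of one person's reference week (names are assumptions)."""
    paid_hours: float = 0.0                    # hours worked for pay last week
    unpaid_family_business_hours: float = 0.0  # unpaid hours in a family business
    has_job_but_absent: bool = False           # has a job but was absent last week


def is_employed(person: PersonWeek) -> bool:
    """Apply the three 'Employed' conditions as stated on the slide."""
    if person.paid_hours >= 1:
        return True
    if person.unpaid_family_business_hours >= 15:
        return True
    return person.has_job_but_absent
```

Under this reading, a person with one paid hour counts as employed, while a person with 14 unpaid hours in a family business and no job does not.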


5

CPS: Some Relevant Details

The CPS was redesigned in the early 1990s; the redesign used a multiple-method evaluation plan (e.g., behavior coding, interviewer and respondent debriefings, split-ballot design) and generated a substantial amount of metadata

The CPS relies on about 16 questionnaire items to generate estimates for its three major labor force categories: EMP, UE and NILF (and various subcategories)

Again, CPS measures work, not jobs


6

The Measurement of Employment Status via the ACS

American Community Survey [ACS]

  Largest survey conducted in the USA; will replace the Decennial Census “long form”
  250,000 households a month
  Collects data on a broad range of demographic topics (e.g., population, housing, disability status, employment status, educational attainment, health insurance)

Adheres to BLS employment concept with the same three major categories: EMP, UE and NILF

Data collected continuously via three modes [SAQ (66%), CATI and face-to-face CAPI]


7

ACS: Some Relevant Details

The ACS was developed over a series of stages (starting in the early 1990s) and achieved full implementation in 2005; there is a substantial amount of metadata documenting this process

At present, the ACS relies on the content of six CPS items (modified for use in the ACS) to generate its estimates for three employment status categories: EMP, UE and NILF

Because of methodological/procedural differences, the CPS and the ACS cannot be expected to produce identical estimates


8

CPS: Work Item and DQ Issues [1]

CPS Work Question [No-business-in-household wording.]

  LAST WEEK, did you do ANY work for pay?

Data Quality [DQ] Issues, CPS Redesign

  Final evaluation phase (1992-93): Interviewers rated this item as one of the more problematic questions on the redesigned CPS (e.g., Just my job?; Do you mean my regular job?)

  On the basis of other evaluation data (e.g., behavior-coding and response-distribution analyses), these “reports” by respondents were determined not to represent a serious data-quality issue because of the likelihood of interviewer mediation and “repair work”


9

CPS: Work Item and DQ Issues [2]

Data Quality Issues (continued)

Respondent debriefing data indicated that this question did miss some marginal/paid work activity (1.6%): “In addition to people who have regular jobs, we are also interested in people who may only work a few hours per week. Last week, did [name] do any work at all, even for as little as one hour?”

The evaluation work conducted during the redesign was documented extensively by Census Bureau and BLS researchers in the 1990s (e.g., conferences; papers; book chapter); however, much of this work is not cited in ACS evaluation documents


10

ACS: Work Item and DQ Issues [1]

Current ACS

  LAST WEEK, did this person do ANY work for either pay or profit? Mark (X) the “Yes” box even if the person worked only 1 hour, or helped without pay in the family business or farm for 15 hours or more, or was on active duty in the Armed Forces.

Data Quality Issues

  ACS underestimates employment (which compromises estimates in the other two categories, UE and NILF); see next slide


11

CPS vs. C2000/ACS Estimates

CPS/Census-2000 Match Study

  “Combined-Month Sample”: February through May, 2000, specific rotations; ~86,000 addresses; weighted N: 207,875,749

CPS vs. ACS-like employment status items

  EMP: 64.1% vs. 62.3% (underestimate)
  UE: 2.7% vs. 3.4% (overestimate)
  NILF: 32.8% vs. 34.0% (overestimate)

Note: The employment status items from the Census-2000 long form are identical to those used in the current ACS.
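The direction labels above (underestimate/overestimate) follow from simple percentage-point differences, taking the CPS figures as the reference. A minimal sketch of that arithmetic, using only the percentages reported on this slide (the variable and function names are illustrative):

```python
# Percentages as reported on the slide; the CPS is treated as the reference.
cps = {"EMP": 64.1, "UE": 2.7, "NILF": 32.8}
acs_like = {"EMP": 62.3, "UE": 3.4, "NILF": 34.0}


def point_differences(reference: dict, comparison: dict) -> dict:
    """Percentage-point difference (comparison minus reference) per category."""
    return {k: round(comparison[k] - reference[k], 1) for k in reference}


diffs = point_differences(cps, acs_like)
for category, diff in diffs.items():
    label = "underestimate" if diff < 0 else "overestimate"
    print(f"{category}: {diff:+.1f} points ({label} relative to CPS)")
```

A negative difference means the ACS-like items yield a lower figure than the CPS for that category; a positive difference means a higher one.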


12

ACS: Work Item and DQ Issues [2]

Data Quality Issues

Small-scale evaluation [2004]: Expert reviews; behavior coding; focus groups with ACS interviewers

Behavior coding [CATI site; 51 HHs; 104 persons]:

  INT codes: exact (78%); major changes (10%); data due in part to prior context [disability questions]
  RSP codes: adequate answers (98%); other than simple yes or no (21%); examples (e.g., “For pay, yes.”; Just his “regular job.”; “No, currently unemployed.”)
  Read-if-Necessary Statement: Never read

Focus groups: “pay or profit” confusing; multiple-job holders and self-employed (e.g., “Did you mean, other than my regular job?”); read-if-necessary statement rarely read; some interviewers ask about job directly


13

ACS: Revised Work Items

Revisions to ACS Work Question

  (1A): LAST WEEK, did this person work for pay at a job (or business)? [If “no” to 1A, ask (1B).]
  (1B): LAST WEEK, did this person do ANY work for pay, even for as little as one hour?

Rationale

  Current ACS work question confuses some respondents: Why?
  Using a two-part question appears to clarify the response task for some respondents and in so doing better achieves the objective of gathering accurate data on work activity and employment status
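The skip pattern of the revised item (ask 1B only when 1A is answered “no”) can be sketched as follows. The function and parameter names are hypothetical, and this is only an illustration of the routing described above, not the ACS instrument's actual logic.

```python
from typing import Optional


def reported_work_last_week(answer_1a: bool,
                            answer_1b: Optional[bool] = None) -> bool:
    """Combine answers to the two-part work item.

    answer_1a: response to (1A), work for pay at a job (or business).
    answer_1b: response to (1B), any work for pay, even one hour;
               only asked (and only required) when 1A is "no".
    """
    if answer_1a:
        return True  # 1B is skipped
    if answer_1b is None:
        raise ValueError("(1B) must be asked when (1A) is answered 'no'")
    return answer_1b
```

For example, `reported_work_last_week(False, True)` represents a person who said “no” to 1A but reported some paid work when probed with 1B.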


14

Estimates of Labor-Force/Employment Status

2006 ACS Content Test

  January to March 2006; ~63,000 addresses, equally split between control/current vs. test/revised groups

Current vs. revised ACS items

  EMP: 62.8% vs. 65.7% (plus 2.9%)*
  UE: 4.1% vs. 3.6% (minus 0.5%)
  NILF: 33.1% vs. 30.7% (minus 2.4%)*

Revised items manifest less bias and variability, as well


15

The CPS Work Item: Why might it be problematic for some respondents?

Grice (1975): Maxims on Quantity

  1. Make your contribution as informative as is required (for the current purposes of the exchange).
  2. Do not make your contribution more informative than is required.

Fowler (1995): Principles 3 and 3d.

  Principle 3: A survey question should be worded so that every respondent is answering the same question.
  Principle 3d: If what is to be covered is too complex to be included in a single question, ask multiple questions.


16

Invoking Grice on Quantity: Hypothetical Example [ACS/SAQ]

LAST WEEK, did you do ANY work for pay?

Respondent [full-time job]: How should I answer this [#!&?@] question? It doesn’t mention a “job” and probably would if that’s what they wanted to know. And it specifically says “work for pay”, so it must mean doing work on the side. OK, just check the “no” box.

Reference to a “job” is missing. [Maxim 1]

“Work for pay” is specified, which would seem superfluous (especially for someone with a full-time job): Who works all those hours for free? [Maxim 2]


17

Resolution for ACS: Two-Part Work Item

Revisions to ACS Work Question

  (1A): LAST WEEK, did this person work for pay at a job (or business)? [If “no” to 1A, ask (1B).]
  (1B): LAST WEEK, did this person do ANY work for pay, even for as little as one hour?

Part (1A) specifically mentions “job”, “work for pay” and “business”.

Part (1B) captures work for “as little as one hour”.

Not perfect, but better than current ACS item.


18

Closing Remarks

Even survey questions that appear simple and straightforward may not be so for some respondents. [Key issues: Why, and how many respondents are affected?]

It is risky to import questions from one survey to another, especially when the surveys differ in terms of mode of administration (and in various other ways, too).

In evaluating and “fixing” questionnaire items, quantitative research, alone, is not sufficient.

Summary: Our best hope for optimizing data quality (i.e., minimizing measurement error) is a thorough and critical review of relevant metadata, followed by prudent design and evaluation decisions that are informed by such reviews.


19

Thank You

Questions or comments?

Post-workshop: [email protected]
