Data Collection and Sampling
Uploaded by annis-ryan
Primary Data
• There are various methods for collecting primary (original) data, e.g. questionnaire, survey, interview, observation
• Much greater control over the investigation
• Can more easily avoid “data-driven” research
• Cost can be prohibitive
• Pilot studies can be very helpful
Choice of method
• Shipman: choice often between sampling and case study
• Intensive versus extensive research design
• Qualitative versus quantitative data
• Interpretivists favour the former; positivists favour the latter
• All primary research involves selection
• Most methods require sampling
Sampling: general principles
• No method is superior a priori
• Trade-offs: standardisation versus control, generalisability versus flexibility
• Shipman: the sampling method used depends on the nature of the study undertaken
• The basis for the sample must be transparent
• The cost of surveying the entire population (e.g. a census) is prohibitive
• Constraint of feasibility
Sampling: definitions
• Population: must be defined
• Finite population: e.g. voters
• Sampling unit: single potential member of sample
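• Sampling frame: list of sampling units (NB the 1936 US presidential election, when the Literary Digest poll drew its frame from telephone directories and car registrations, over-represented wealthier voters and wrongly predicted a Landon victory)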
• Sample: drawn from sampling frame
Probability Sampling
• Probability of each sampling unit being chosen is known (often equal probability)
• Simple random sampling: classic method, regarded as most reliable, least biased
• Number the members of the sampling frame and select via a random number generator
• Other probabilistic methods are available
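A minimal Python sketch of simple random sampling, assuming the sampling frame is held as a list. The helper name `simple_random_sample` and the frame of numbered voters are hypothetical; `random.Random.sample` draws without replacement, so each unit has the same chance n/N of selection.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each with equal probability.

    Hypothetical helper for illustration; `seed` makes the draw reproducible.
    """
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the size of the frame")
    rng = random.Random(seed)
    # Number the frame members 0..N-1 and let the random number generator
    # pick n distinct numbers (sampling without replacement).
    chosen = rng.sample(range(len(frame)), n)
    return [frame[i] for i in sorted(chosen)]


# Hypothetical sampling frame of 100 numbered voters
frame = [f"voter_{i:03d}" for i in range(100)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```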
Systematic sampling
• List members of sampling frame
• Choose first sample member randomly
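• Then choose every Kth unit, where K = N/n (N = population size, n = sample size)
• More convenient than simple random sampling for large populations
• A periodic pattern in the list can coincide with K and bias the sample; e.g. corner shops recurring at regular intervals in a list of addresses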
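The systematic procedure, and the periodicity problem, can be sketched as follows. The helper `systematic_sample` and the street of 200 addresses with a corner shop at every 10th position are hypothetical. Because the shop pattern repeats with the same period as K, every sample drawn is either all shops or no shops.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Random start, then every Kth unit, with K = N // n (hypothetical helper)."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the size of the frame")
    k = N // n  # sampling interval K = N/n, rounded down when N/n is not whole
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical list of 200 street addresses in which every 10th one is a
# corner shop: a periodic pattern that coincides with K = 200/20 = 10.
addresses = [("shop" if i % 10 == 0 else "house", i) for i in range(200)]
shop_counts = set()
for seed in range(50):
    drawn = systematic_sample(addresses, 20, seed=seed)
    shop_counts.add(sum(kind == "shop" for kind, _ in drawn))
print(sorted(shop_counts))
```

K is rounded down here, so when N/n is not a whole number the last few units can never be chosen; treating the list as circular is one common remedy.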
Stratified sampling
• Divide population into groups of alike members
• Sample drawn from each stratum usually proportionate to that stratum's share of the population
• Draw randomly from groups
• Cost-effective
• Helps ensure representativeness
• Can lead to excessive number of sub-groups
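Proportional stratified sampling might be sketched like this. The helper, its largest-remainder rounding rule and the urban/suburban/rural frame are assumptions for illustration: each stratum h receives n × N_h / N units (N_h = stratum size), and a simple random sample is drawn within each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n, seed=None):
    """Proportional stratified random sample (hypothetical helper).

    Each stratum h receives n * N_h / N units; leftover units from rounding
    go to the strata with the largest remainders, so the total is exactly n.
    """
    N = len(frame)
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and the size of the frame")
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    # Integer arithmetic: quota = n * N_h // N, remainder used for rounding
    alloc, remainder = {}, {}
    for h, units in strata.items():
        alloc[h], remainder[h] = divmod(n * len(units), N)
    shortfall = n - sum(alloc.values())
    for h in sorted(remainder, key=remainder.get, reverse=True)[:shortfall]:
        alloc[h] += 1

    rng = random.Random(seed)
    # Draw a simple random sample within each stratum
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}


# Hypothetical frame: 600 urban, 300 suburban and 100 rural households
frame = ([("urban", i) for i in range(600)]
         + [("suburban", i) for i in range(300)]
         + [("rural", i) for i in range(100)])
sample = stratified_sample(frame, stratum_of=lambda unit: unit[0], n=50, seed=7)
print({h: len(units) for h, units in sample.items()})  # {'urban': 30, 'suburban': 15, 'rural': 5}
```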
Cluster Sampling
• Select large groups (clusters)
• Select sampling units from the clusters randomly
• Example: take a city, divide it into areas, number the areas, select areas randomly, number the units within the selected areas, select units randomly
• Very cost-effective
• Very good if the sampling frame is poorly defined
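The city example translates into a two-stage sketch. `two_stage_cluster_sample` and the 12-area city are hypothetical. Only the areas need to be numbered in full; units need listing only within the areas actually chosen, which is why the method copes with a poorly defined sampling frame (the sketch holds the whole city in memory purely for convenience).

```python
import random


def two_stage_cluster_sample(clusters, n_areas, n_per_area, seed=None):
    """Two-stage cluster sample (hypothetical helper).

    clusters maps an area name to the list of units in that area.
    """
    rng = random.Random(seed)
    # Stage 1: number the areas and select some of them at random
    chosen_areas = rng.sample(sorted(clusters), n_areas)
    sample = []
    for area in chosen_areas:
        units = clusters[area]
        # Stage 2: select units at random within each chosen area
        sample.extend(rng.sample(units, min(n_per_area, len(units))))
    return chosen_areas, sample


# Hypothetical city divided into 12 areas of 50 households each
city = {f"area_{a:02d}": [f"area_{a:02d}/hh_{h:02d}" for h in range(50)]
        for a in range(12)}
areas, households = two_stage_cluster_sample(city, n_areas=3, n_per_area=10, seed=4)
print(areas, len(households))
```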
Non-probability Sampling
• Convenience sampling: select whoever is available
• Quota sampling: collect data according to proportions of the population
• Selection of subjects absolutely crucial
• Requires great skill from interviewers
• Snowball sampling: select each new subject through the previous one
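Quota sampling can be illustrated with a short sketch, assuming hypothetical quotas (a 52%/48% split applied to n = 25) and a hypothetical stream of available subjects. The sample matches the population proportions on the quota variable, but who ends up in it depends on availability, so it is not a probability sample.

```python
def quota_sample(arrivals, group_of, quotas):
    """Quota sampling sketch (hypothetical helper).

    Subjects are taken in the order they become available, which is not
    random, and accepted only while their group's quota is unfilled.
    """
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota filled
    return sample, remaining


# Hypothetical quotas for n = 25 set from population shares of 52% / 48%
quotas = {"female": 13, "male": 12}
# Hypothetical stream of passers-by: whoever happens to be available
arrivals = [(f"person_{i}", "female" if i % 3 == 0 else "male") for i in range(100)]
sample, unfilled = quota_sample(arrivals, lambda p: p[1], quotas)
print(len(sample), unfilled)  # 25 {'female': 0, 'male': 0}
```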
Non-Probability Sampling
• Theoretical sampling: select those most likely to be affected by an issue
• Can ignore things which do not fit
• Can interpret observations according to the theory
• Non-probability sampling cannot claim representativeness as easily, but it gives the researcher much more discretion and control
Response Rates
• Another possible trade-off is on response rates
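• Response rate: $R = 1 - \frac{n - r}{n} = \frac{r}{n}$, where n is the number in the sample and r the number who respond
• Initial sample size: $n' = \frac{n}{1 + n/N}$, where $n = s^2/SE^2$, s is the estimated standard deviation, SE the required standard error and N the population size (see F-N and N: 194-9)
• Even if the initial sample size is appropriate, response rates can be low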
• Postal questionnaires: typically 20-40%
• Non-response bias
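The response-rate and sample-size formulas can be checked numerically. This sketch assumes s is an estimate of the population standard deviation and SE the required standard error of the mean; the figures are hypothetical, and `questionnaires_to_send` is an added, hypothetical helper that inflates a postal mail-out by the expected response rate (30% here, inside the 20-40% range quoted for postal questionnaires). Sending more questionnaires raises the number of replies but does nothing about non-response bias.

```python
import math


def response_rate(n, r):
    """R = 1 - (n - r)/n, i.e. the share r/n of the sample that responded."""
    return 1 - (n - r) / n


def required_sample_size(s, se, N):
    """Sample size from n = s^2 / SE^2 with the finite population correction
    n' = n / (1 + n/N), rounded up to a whole unit."""
    n = s ** 2 / se ** 2
    return math.ceil(n / (1 + n / N))


def questionnaires_to_send(n_required, expected_rate):
    """Hypothetical helper: inflate the mail-out so that, at the expected
    response rate, roughly n_required replies come back."""
    return math.ceil(n_required / expected_rate)


# Hypothetical survey: s = 10, target SE = 0.5, population N = 2000
n_needed = required_sample_size(s=10, se=0.5, N=2000)
print(n_needed)                                # 334
print(questionnaires_to_send(n_needed, 0.30))  # 1114
print(round(response_rate(n=500, r=150), 2))   # 0.3
```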
Response Rates
• Non-respondents could affect findings
• This matters if the reason for non-response is related to the issue studied: e.g. reluctance to interview people who are drunk hampers a study of alcoholism
• Response rate can be improved by cover letter, callbacks, skill of researcher, length of questionnaire, types of question
Conclusions
• All types of primary data require selection
• If sampling used: various methods possible
• Sampling method relates to research tool
• Different data collection techniques: questionnaires, interviews, etc. - all to be studied in Research Methods 2 - all have advantages and disadvantages
Secondary Data
Introduction
• Primary quantitative data have several advantages, particularly control; so do primary qualitative data
• Do not equate primary and qualitative
• Today: advantages of secondary data
• Searching on electronic data sources including the Internet
Secondary data
• The primary/secondary distinction is not the same as the qualitative/quantitative one
• Qualitative can include secondary data sources such as personal documents, auto/biographies, etc.
• Secondary: collected by someone else, e.g. another academic researcher, business, government agency, etc.
Secondary data
• Used extensively in social science:
  – Durkheim: suicide
  – Marx: wages, incomes, prices
  – Weber: church records
• Economists mainly use secondary data
Advantages of Secondary Data
• Might be the only data available
• Enables longitudinal/time-series work
• Cheaper (in cost and time) and more convenient than primary data
• Aids generalisation
• Arises from natural settings (non-reactive/unobtrusive data)
• Allows replication and checking (validity)
Disadvantages of Secondary Data
• May not be exactly the data required
• Differences in underlying sampling, design, questions asked, method of ascertaining information, etc.
• These differences can introduce bias
• The method of data generation is crucial to econometric studies
Electronic Data Sources
• Through the library system
• Through the internet
• Known versus unknown sources
• Known sources via library catalogue
• Reliability and credibility are a problem for all electronic sources, more so than for non-electronic sources
Electronic Data - Literature
• You can search by author or subject across journals, via several static websites/portals:
• www.econlit.org/
• www.sosig.ac.uk
• www.mimas.ac.uk
• www.economics.ltsn.ac.uk
• www.esds.ac.uk
Electronic Data: Databases
• There are many databases available online
• Most have standardised, national data free to download in various formats
• The most common file format is .csv, but .html and even .xls files are also common
• Sources include OECD, ONS, UN, Penn World Tables, BEA (US), Ameristat, Eurostat, World Bank, CIA and the US Statistical Abstract
• See Dissertation homepage/hb
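Since .csv is the most common download format, a minimal parsing sketch using Python's standard `csv` module may help. The column names `country`, `year` and `value` and the placeholder rows are hypothetical; real downloads differ in layout and in how they mark missing values, which is one reason to check definitions before combining secondary sources.

```python
import csv
import io

# Hypothetical extract in a common "long" layout: one row per country-year.
# Placeholder values only; a real file would come from one of the sources above.
RAW = """country,year,value
Country A,2000,100.0
Country A,2001,
Country B,2000,95.5
"""


def read_series(fileobj):
    """Parse a CSV download into typed records (hypothetical column names)."""
    records = []
    for row in csv.DictReader(fileobj):
        cell = row["value"].strip()
        records.append({
            "country": row["country"],
            "year": int(row["year"]),
            # Blank cells are common in downloads: keep them as None, not 0
            "value": float(cell) if cell else None,
        })
    return records


records = read_series(io.StringIO(RAW))
missing = [r for r in records if r["value"] is None]
print(len(records), len(missing))  # 3 1
```

For a file on disk, the same function could be called as `read_series(f)` inside `with open(path, newline="", encoding="utf-8") as f:`, where `path` and the encoding are assumptions about the download.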
Conclusions
• Secondary data has many advantages and disadvantages relative to primary
• There is a wide range of secondary data available
• Much data is available on the internet
• Internet sources must be scrutinised more closely than other sources
Qualitative Data
Introduction
• Principles of research design and sampling broadly hold for both quantitative and qualitative data
• However, they apply most easily to quantitative analysis
• Qualitative analysis has different foci
• Qualitative analysis is little used in economics, relative both to quantitative analysis and to the other social sciences
Qualitative techniques: types
• Case study
• Fieldwork (ethnography)
• Observation
• Unstructured interviews
• Analytic induction/grounded theory
• Discourse analysis
• Theoretical sampling
Qualitative techniques: principles
• “Qualitative” is often defined simply as “not quantitative”
• Can use quantitative methods for pattern detection and qualitative methods for causal analysis
• Or use qualitative and quantitative methods as equals in inference (triangulation)
• Quantification often inappropriate
Qualitative techniques: principles
• Interpretivism, Verstehen (interpretive understanding)
• Used to be associated only with autobiography, letters, personal documents, diaries
• Ethnography is fairly recent
• Focus on cases rather than generality
Qualitative techniques: principles
• Analysis not really a separate stage of research
• Design, data collection and analysis all simultaneous and continuous
• Open-ended approach: theory and conclusions formed iteratively
• Imagination is crucial
• Recognise the importance of exceptions
• Context is crucial
Fieldwork
• Study of people acting in their daily lives
• Access a group but remain somewhat detached
• Approach with key questions
• Teams get range of perspectives
• Danger of self-perception and bias
Participant Observation
• Adopt perspectives of subject group in order to understand them
• Learning language, customs, behaviours, work, leisure, etc.
• Hanging around and learning the ropes
• Being an outsider can change subjects’ behaviour
• Complete participation: researcher wholly concealed; issues of contamination and artificiality
Participant Observation
• Researchers can go native (internalise group lifestyle)
• Covert researchers can be in danger or create detrimental behaviour
• Researchers can be “piggy in the middle”
• Covert: recording observations can be difficult (e.g. hidden cameras may be needed)
• Serious ethical issues with covert observation
Employ analytic induction
• Go in with prejudices and theories
• Revise theory in light of evidence
• Generate new theories until evidence seems to fit
• Affords the researcher flexibility, but also demands it
• Need to be open to disconfirming cases
Grounded theory
• Data collected
• Develop categories (with inevitable theoretical priors and language)
• Categories checked by data
• Once categories seem secure and grounded in the evidence, formulate interconnections between categories
Evaluation
• Broad range of qualitative techniques
• Control over the investigation; less data-driven; flexibility much greater than in quantitative studies
• Logistically difficult: huge amounts of data produced, and problems with manipulating it (although NVivo helps with this)
• Must be careful to collect evidence widely to avoid bias
• Can be ethical issues re: data collection and reporting