Post on 22-Jan-2018
Data Collection, Assessment of Qualitative Data, Data Processing: Key Issues
Bikash Sapkota
B. Optometry
Institute of Medicine, TU, Nepal
• Introduction to data
• Classification of data
• Collection of data
• Methods of data collection
• Assessment of qualitative data
• Processing of data
- Editing
- Coding
- Tabulation
- Graphical representation
Presentation Layout
What is data?
Data are observations or evidence about the social world
Data, the plural of datum, can be quantitative or qualitative in nature
‘data is produced, not given’; that is, researchers choose what to call data, it is not just ‘there’ to be ‘found’. (Marsh 1988)
- The Sage Dictionary of Social Research Methods
The terms 'data' and 'information' are often used interchangeably
However, the terms have distinct meanings
Data
Facts, events, transactions which have been recorded
Raw input materials from which information is produced
Information
Data that have been processed in such a way as to be useful to the recipient
Basic data are processed in some way to form information
Data & Information
Research studies in behavioral science are mainly concerned with characteristics or traits
Thus, tools are administered to quantify these characteristics
- but not all traits or characteristics can be quantified
The data can be classified into two broad categories:
Data
Qualitative Data or Attributes
Quantitative Data or Variables
Nature of Data
1. Qualitative Data or Attributes
The characteristics or traits to which a numerical value cannot be assigned are called attributes
e.g. gender, motivation, etc.
2. Quantitative Data or Variables
The characteristics or traits to which a numerical value can be assigned are called variables
e.g. height, weight, etc.
Constants
A constant is any characteristic or condition that is the same for all the observed units or sample subjects of a study
Variables
A characteristic or trait in behavioral science that can be quantified is termed a variable
Variables
Continuous variables
Discrete variables
Variables
1. Continuous variables
A characteristic whose observations can take any value over a particular range
It can assume either fractional or integral values, e.g. wt. of children in kg, height of pts.
2. Discrete variables
Those which, on the other hand, exist only in whole units (usually units of one), not in fractional values
E.g. no. of cataract pts. in a village, WBC count
Attribute vs. Variable
Attribute
A category of a characteristic to which a subject either belongs or does not belong, or a property that a subject either possesses or does not possess
Examples of attributes: being sick, blood group, etc.
Variable
Describes a characteristic in terms of a numerical value, which is expressed in units of measurement
Examples of variables: height, weight, blood pressure, age of pts., etc.
Qualitative Data
In such data there is no notion of magnitude or size of the characteristic
They are just categorized
The data are classified by counting the individuals having the same characteristics or attribute and not by measurement
For example: Gender: male/female; Disease: present/absent; Smoking: smoking/not smoking
These data can be measured in nominal and ordinal scales
Quantitative Data
Anything that can be expressed as a number, or quantity or magnitude
Describes characteristics in terms of a numerical value, which is expressed in units of measurement
E.g. level of hemoglobin in the blood, no. of glaucoma pts., intraocular pressure, weight, etc.
Such observations are quantitative, as each individual is represented by a number
These data can be measured in interval and ratio scales
Measurement Scale
The choice of appropriate statistical technique depends upon the type of data in question
Qualitative Data
• Nominal Scale
• Ordinal Scale
Quantitative Data
• Interval Scale
• Ratio Scale
Nominal Scale
The least precise, or crudest, of the 4 basic scales of measurement
Implies the classification of an item into 2 or more categories, without any notion of extent or magnitude
There is no particular order assigned to them
Numbers are used only as names or labels for the categories; the category frequencies can be used to determine percentages and the mode
E.g. boys and girls; pass and fail; rural and urban
Ordinal Scale
The ordinal scale is a more precise scale than the nominal scale
The variable is categorized into levels with a meaningful natural order
But there is no information about the size of the interval between levels
E.g. pain: none, mild, moderate, severe
Interval Scale
The interval scale is a more precise and refined scale than the nominal and ordinal scales
This scale has all the characteristics and relationships of the ordinal scale; in addition, the distances between any two numbers on the scale are known
The size of the interval between two observations can be measured
E.g. the temperature of a body
Ratio Scale
It has the same properties as an interval scale as well as a true or absolute zero value
The ratio scale numerals have the qualities of real numbers, and can be added, subtracted, multiplied or divided
E.g. mean systolic BP
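As a rough illustration of how the scale of measurement limits the choice of statistic, the sketch below picks one summary per scale using only Python's standard library. The sample values, the `PAIN_ORDER` list and the function name `summarize` are hypothetical and exist only for this example.

```python
from statistics import mean, median_low, mode

# Hypothetical sample values, for illustration only.
residence = ["rural", "urban", "rural", "rural"]        # nominal
pain = ["mild", "none", "severe", "mild", "moderate"]   # ordinal
systolic_bp = [118, 126, 130, 122]                      # ratio

PAIN_ORDER = ["none", "mild", "moderate", "severe"]     # natural order of the levels


def summarize(values, scale, order=None):
    """Return a summary statistic that is meaningful for the given scale."""
    if scale == "nominal":
        # Categories have no order: only counts, percentages and the mode apply.
        return mode(values)
    if scale == "ordinal":
        # Order is meaningful but interval size is not: use the median category.
        ranks = sorted(order.index(v) for v in values)
        return order[median_low(ranks)]
    if scale in ("interval", "ratio"):
        # Distances between values are known, so the mean is meaningful.
        return mean(values)
    raise ValueError(f"unknown scale: {scale}")


print(summarize(residence, "nominal"))         # rural
print(summarize(pain, "ordinal", PAIN_ORDER))  # mild
print(summarize(systolic_bp, "ratio"))
```

The point of the sketch is only the branching: a mean is never computed for nominal or ordinal values.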
The process of systematically gathering data for a particular purpose from various sources, so that they are systematically observed, recorded and organized
It is the first step of statistical study
There are several ways of collecting data
The choice of procedures usually depends on the objectives and design of the study and the availability of time, money and personnel
Collection of Data
To obtain information
To keep on record
To make decisions about important issues
To pass information onto others
For research study
Purpose of Data Collection
Data collection is an extremely important part of any research because the conclusions of a study are based on what the data reveal
How Important Is It?
Nature, scope & objective of the enquiry
Sources of information
Availability of funds
Techniques of data collection
Availability of trained persons
Factors to be considered before data collection
Example: documents, creative works, interviews, man-made materials, surveys
Example: unpublished theses and dissertations, manuscripts, books, journals
Sources of Data
Internal
External
- Primary Data
- Secondary Data
Internal sources of Data
o Many institutions and departments hold information about their regular functions for their own internal purposes
o When such information is used in a survey, it is called an internal source of data
o E.g. social welfare society
External sources of data
o When information is collected from outside agencies, it is called an external source of data
o Such data are either primary or secondary
o This type of information can be collected by census or by sampling, through a survey
Internal & External Sources of Data
Data collected by the investigator from personal experimental studies for a specific research goal are called primary data
The data are collected specially for a research project
Used when secondary data are unavailable or inappropriate
Data are to be unique, original, reliable and accurate in nature
Primary data have not been changed or altered by human beings; therefore their validity is greater than that of secondary data
Primary Data
Merits
Targeted issues are addressed
Data interpretation is better
High accuracy of data
Greater control
Addresses specific research issues
Demerits
Elevated cost
Time consuming
More resources are required
Inaccurate feedback
Requires a lot of skill and labor
Primary Data
Interview (direct/indirect)
Schedule
Questionnaires survey
Focus group discussion (FGD)
Community forums and public hearings
Observation
Case studies
Key informants interview
Internet/E-mail/SMS
Primary Data Collection Techniques
The data are collected by the investigator personally; he/she must be a keen observer
He/she asks or cross-examines the informant and collects the necessary information
It is original in character
Direct personal observation
Direct personal observation is adopted in the following cases
Where greater accuracy is needed
Where the field of enquiry is not large
Where confidential data are to be collected
Where sufficient time is available
Suitability of direct personal observation
Merits
Original data
True and reliable data
Encouraging response because of the personal approach
A high degree of accuracy
Demerits
Unsuitable for a large area
Expensive & time-consuming
An untrained investigator brings poor results
Information is collected at the convenience of the informant
Direct personal observation
The investigator approaches witnesses or third parties who are in touch with the informant
The enumerator interviews people who are directly or indirectly connected with the problem under study
Generally this method is employed by different enquiry committees and commissions
The police department generally adopts this method to get clues about thefts, riots, murders, etc.
Indirect oral interview
It is more suitable when the area to be studied is large
It is used when direct information cannot be obtained
This system is generally adopted by governments
Suitability of indirect oral interview
Merits
Simple and convenient
Saves time, money and labor
Useful in investigation of a large area
Adequate information can be obtained
Demerits
Information cannot be fully relied upon, as there is no direct contact
Interviewing an unsuitable person will spoil the results
To get reliable data, a sufficient no. of people must be interviewed
A careless attitude on the part of the informant affects the degree of accuracy
Indirect oral interview
Local agents or correspondents are appointed; they collect the information and transmit it to the office or person
They do so according to their own ways and tastes
Adopted by newspapers, agencies, etc.
The informants are generally called correspondents
Suitable in those cases where the information is to be obtained at regular intervals from a wide area
Information through agencies
Merits
Extensive information can be obtained
It is the cheapest and most economical method
Speedy information is possible
It is useful where information is needed regularly
Demerits
The information may be biased
Degree of accuracy cannot be maintained
Uniformity cannot be maintained
Data may not be original
Information through agencies
The questionnaire is sent to the respondents, with blank spaces for answers
A covering letter is also sent along with the questionnaire, requesting the respondents to extend their full cooperation
Adopted by research workers, private individuals, non-official agencies and government
Appropriate in cases where informants are spread over a wide area
Mailed questionnaires
Merits
Of all the methods, the mailed questionnaire is the most economical
It can be widely used when the area of investigation is large
It saves money, labor and time
Demerits
The accuracy and reliability of the data cannot be assured
There is a long delay in receiving the duly filled-in questionnaires
Mailed questionnaires
Very similar to the questionnaire method
The main difference is that a schedule is filled in by an enumerator who is specially appointed for the purpose
The enumerator goes to the respondents, asks them the questions from the proforma in the order listed, and records the responses in the space provided
Enumerators must be trained in administering the schedule
Data Collection Through Schedules
A detailed study of a geographical area to gather data on attitudes, impressions, opinions, satisfaction levels, etc., by polling a section of the population
Census Survey
• Conducted regularly at large intervals of time
Continuous Survey
• Conducted regularly and frequently
Ad-hoc Survey
• Conducted at specific times for specific need
• ‘as and when’ required
Survey Types
Merits
Cover large population
Less expensive
Information is accurate
Demerits
Avoided for small-scale studies
Time consuming
Information does not penetrate deeply
Researcher must have good knowledge
Survey
It is a method of comprehensive study of a social unit, which may be a person, a family, an institution, an organization or a community
Merits
Direct behavioral study
Real & personal experience is recorded
Makes possible the study of social change
Increases analytical ability & skills
Demerits
One case is almost always different from another
Personal bias
Usable only in a limited sphere
Consumes more time & money
Case Study
Useful to further explore a topic, providing a broader understanding of why the target group may behave or think in a particular way
Assists in determining the reasons for attitudes and beliefs
Conducted with a small sample of the target group, and used to stimulate discussion and gain greater insights
Focus Group Discussion
Merits
Useful when exploring cultural values and health beliefs
Can be used to explore complex issues
Can be used to develop hypotheses for further research
Does not require participants to be literate
Demerits
Lack of privacy/anonymity
Potential for the risk of ‘group think’
Potential for group to be dominated by one or two people
Group leader needs to be skilled at conducting focus groups, dealing with conflict, drawing out passive participants
Time consuming to conduct and analyse
Focus Group Discussion
The application and combination of several research methods in the study of the same phenomenon
Researchers can hope to overcome the weaknesses, intrinsic biases and problems that come from single-method, single-observer and single-theory studies
The purpose of triangulation in qualitative research is to increase the credibility and validity of the results
Triangulation
Types (Denzin 1978)
Data Triangulation
Investigator Triangulation
Theory Triangulation
Methodological Triangulation
Beating the Bias
Secondary data are those data which have already been collected and analysed by some earlier agency for its own use, and which are later used by a different agency
Published Sources
Unpublished Sources
Sources of Secondary Data
Secondary Data
Various governmental, international and local agencies publish statistical data; chief among them are:
International publications: e.g. UNO, WHO, Nature, etc.
Official publications of Government: Department of Drug Administration, Central Bureau of Statistics
Semi-official publications: semi-government institutions like Municipal Corporations, District Boards, etc. publish reports
Published Sources
Publications of Research Institutions: Nepal Development Research Institute, Nepalese Journal of Ophthalmology, etc. publish the findings of their research programs
Journals and Newspapers: current and important material on statistics and socio-economic problems can be obtained from journals and newspapers like Swasthya Khabar Patrika, Health Today Magazine, The Sight, etc.
Published Sources
Records maintained by various government and private offices
Researches carried out by individual research scholars in the universities or research institutes
According to Prof. Bowley, “It is never safe to take published statistics at their face value without knowing their meaning and limitations and it is always necessary to criticize arguments that can be based on them.”
Unpublished Sources
Before using the secondary data, the investigators should consider the following factors:
Precautions in the use of Secondary Data
Suitability of data
Adequacy of data
Reliability of data
Reliability of data – may be tested by checking:
Who collected the data?
What were the sources of the data?
Were the data collected properly?
Suitability of data
Data that are suitable for one enquiry may not necessarily be suitable for another
The objective, scope and nature of the original enquiry must be studied
Adequacy of data – data are considered inadequate if they relate to an area that is either narrower or wider than the area of the present enquiry
Secondary Data must possess the following characteristics
Primary data
o Real-time data
o Sure about sources of data
o Help to give results/findings
o Costly and time-consuming process
o Avoid bias in response data
o More flexible
Secondary data
o Past data
o Not sure about sources of data
o Help in refining the problem
o Cheap and not time-consuming
o Cannot know whether the data are biased or not
o Less flexible
The characteristics or traits to which a numerical value cannot be assigned are called qualitative data (attributes)
e.g. gender, color, honesty etc.
Methods of collecting qualitative data
Methods of Qualitative Data Collection
Direct Observation
In-depth Interview
Case Study
Triangulation
Use of Secondary Data
Assessment of Qualitative Data
Classification of Qualitative data
Qualitative Data
Geographical Classification
Chronological Classification
Qualitative Classification
Assessment of Qualitative Data
Tabulation of Qualitative Data
Qualitative data values can be organized by a frequency distribution
A frequency distribution lists
– Each of the categories
– The frequency/counts for each category
Assessment of Qualitative Data
Frequency Table
A simple data set is: cataract, cataract, keratoconus, glaucoma, glaucoma, cataract, glaucoma, cataract
A frequency table for this qualitative data is
The most commonly occurring eye condition is cataract
Eye condition Frequency
Cataract 4
Keratoconus 1
Glaucoma 3
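The frequency table above can be reproduced with a short sketch using `collections.Counter` from Python's standard library. The variable names are chosen for this example only; the data set is the one listed on the slide.

```python
from collections import Counter

# The simple data set from the slide.
eye_conditions = [
    "cataract", "cataract", "keratoconus", "glaucoma",
    "glaucoma", "cataract", "glaucoma", "cataract",
]

# Count how often each category occurs.
frequency = Counter(eye_conditions)

print(f"{'Eye condition':<15}Frequency")
for condition, count in frequency.items():
    print(f"{condition.capitalize():<15}{count}")

# The most commonly occurring category (the mode).
most_common_condition, most_common_count = frequency.most_common(1)[0]
print(most_common_condition, most_common_count)  # cataract 4
```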
Assessment of Qualitative Data
What Is A Relative Frequency?
The relative frequencies are the proportions (or percents) of the observations out of the total
A relative frequency distribution lists
– Each of the categories
– The relative frequency for each category
Relative frequency = Frequency/Total
Assessment of Qualitative Data
Relative Frequency Table
A relative frequency table for this qualitative data is
A relative frequency table can also be constructed with percents (50%, 12.5% and 37.5% for the above table)
Eye condition Relative Frequency
Cataract .500 (=4/8)
Keratoconus .125 (=1/8)
Glaucoma .375 (=3/8)
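As a minimal sketch of the rule "relative frequency = frequency / total", the same eye-condition data set can be turned into proportions and percents. The variable names are illustrative assumptions.

```python
from collections import Counter

# The same data set used for the frequency table.
eye_conditions = [
    "cataract", "cataract", "keratoconus", "glaucoma",
    "glaucoma", "cataract", "glaucoma", "cataract",
]

frequency = Counter(eye_conditions)
total = sum(frequency.values())

# Relative frequency = frequency / total
relative_frequency = {
    condition: count / total for condition, count in frequency.items()
}

for condition, proportion in relative_frequency.items():
    # Show each proportion and the equivalent percent.
    print(f"{condition.capitalize():<15}{proportion:.3f}  ({proportion:.1%})")
```

The proportions always sum to 1 (100%), which is a quick check that no observation was dropped.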
Assessment of Qualitative Data
Graphical representation Of Qualitative Data
Bar Diagram
Pie or Sector Diagram
Line Diagram
Pictogram
Map Diagram or Cartogram
Assessment of Qualitative Data
Data Processing
The data, after collection, have to be prepared for analysis
Collected data are raw and must undergo some processing before analysis
The results of the analysis are strongly affected by the form of the data
So, proper data processing is a must to get reliable results
Data Processing
Checking the questionnaires and schedules
Reduction of mass data to manageable proportion
Summing up the material so as to prepare tables, charts, graphs and various groupings and breakdowns for presenting the results
Minimizing the errors which may creep in at various stages of the survey
Objectives of Data Processing
1. Manual Data Processing
Involves human intervention
Implies many chances for errors, such as delays in data capture and a high number of operator misprints
Implies higher expenses for labor as well as for equipment and supplies, rent, etc.
Types of Data Processing
2. Mechanical Data Processing
Different calculations and processing are performed using mechanical machines like calculators etc.
The use of mechanical machines makes data processing easier and less time-consuming
The chances of errors also become far lower than in manual data processing
Types of Data Processing
3. Electronic Data Processing
Processing of data by use of computer and its programs
Types of Data Processing
4. Real Time Processing
There is a continual input, process and output of data
Data have to be processed within a small stipulated time period (real time)
E.g. when a bank customer withdraws a sum of money from his or her account, it is vital that the transaction be processed and the account balance updated as soon as possible
Types of Data Processing
5. Batch Processing
In batch processing, a group of transactions is collected over a period of time, entered and processed, and then the batch results are produced
Batch processing requires separate programs for input, process and output
It is an efficient way of processing a high volume of data
E.g. payroll system, examination system and billing system
Types of Data Processing
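The difference between real-time and batch processing can be sketched in a few lines. The function names, the account balance and the timesheet figures below are hypothetical and only mirror the bank and payroll examples on the slides.

```python
def process_real_time(balance, withdrawal):
    """Apply one transaction immediately and return the updated balance."""
    if withdrawal > balance:
        raise ValueError("insufficient funds")
    return balance - withdrawal


def process_batch(hours_by_employee, hourly_rate):
    """Process a whole batch collected over a period and return all results together."""
    return {name: hours * hourly_rate for name, hours in hours_by_employee.items()}


# Real time: the balance is updated as soon as the withdrawal happens.
balance = 1000
balance = process_real_time(balance, 200)
print(balance)  # 800

# Batch: timesheets are collected over a month and processed in one run.
timesheets = {"A": 160, "B": 120}
print(process_batch(timesheets, hourly_rate=5))  # {'A': 800, 'B': 600}
```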
The processing of data involves activities such as
Questionnaire checking
Editing
Coding
Classification
Tabulation
Graphical representation
Data cleaning
Data adjusting
Important Steps in Data Processing
When the data are collected through questionnaires, the first step of data processing is to check whether the questionnaires are acceptable or not
Not accepted if:
It gives the impression that the respondent could not understand the questions
It is incomplete, partially or fully
It is answered by a person who has inadequate knowledge
Questionnaire Checking
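Of the rejection criteria above, completeness is the one that is easiest to check mechanically. The sketch below assumes each returned questionnaire is stored as a dictionary; the question keys and the function name `is_complete` are hypothetical.

```python
# Hypothetical question keys, for illustration only.
REQUIRED_QUESTIONS = ["age", "gender", "smoker"]


def is_complete(questionnaire, required=REQUIRED_QUESTIONS):
    """Return True only if every required question has a non-blank answer."""
    for question in required:
        answer = questionnaire.get(question)
        if answer is None or str(answer).strip() == "":
            return False
    return True


returned = [
    {"age": 52, "gender": "female", "smoker": "no"},
    {"age": 47, "gender": "", "smoker": "yes"},  # partially incomplete
    {},                                          # fully incomplete
]
accepted = [q for q in returned if is_complete(q)]
print(len(accepted), "of", len(returned), "accepted")  # 1 of 3 accepted
```

Whether a respondent understood the questions or had adequate knowledge still needs human judgment and is not covered by this check.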
Process of examining the data collected in questionnaires/schedules
to detect errors and omissions
to correct these when possible
to make sure the schedules are ready for tabulation
Data Editing
The editor is responsible for seeing that the data are:
As accurate as possible
Consistent with other facts secured
Uniformly entered
As complete as possible
Acceptable for tabulation and arranged to facilitate coding and tabulation
Data Editing
Editing for quality
• Data form complete
• Free of bias, errors, inconsistency and dishonesty
Editing for tabulation
• Modification to facilitate tabulation
• Ignoring extremely high/low values
Field editing
• Translating or rewriting
Central editing
• Correcting or replacing wrong entries
Types of Editing
To gather information
To make data relevant and appropriate for analysis
To find errors and modify them
To ensure that the information provided is accurate
To establish the consistency of data
To determine whether or not the data are complete
To obtain the best possible data available
Necessity of Editing
Process of assigning numerals or other symbols to answers so that responses can be put into limited number of categories or classes
Translating answers into numerical values or assigning numbers to the various categories of a variable to be used in data analysis
Coding is done by using a codebook, a code sheet and a computer card
Coding is done on the basis of the instructions given in the codebook
The codebook gives a numerical code for each variable
Coding of Data
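Coding from a codebook can be sketched as a lookup from answers to numerical codes. The variables, the codes and the missing-value code `9` below are assumptions made for this example, not values taken from the slides.

```python
# A hypothetical codebook: one numerical code per category of each variable.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "pain": {"none": 0, "mild": 1, "moderate": 2, "severe": 3},
}
MISSING = 9  # assumed code for blank or unrecognised answers


def code_response(response, codebook=CODEBOOK, missing=MISSING):
    """Translate one respondent's answers into numerical codes."""
    coded = {}
    for variable, codes in codebook.items():
        # Normalise the answer so that "Female" and " female " get the same code.
        answer = str(response.get(variable, "")).strip().lower()
        coded[variable] = codes.get(answer, missing)
    return coded


print(code_response({"gender": "Female", "pain": "moderate"}))  # {'gender': 2, 'pain': 2}
print(code_response({"gender": "male"}))                         # {'gender': 1, 'pain': 9}
```

Keeping the codes in one dictionary plays the role of the codebook: every coder applies the same instructions.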
• A codebook contains coding instructions and the necessary information about variables in the data set
• A codebook generally contains the following information:
- column number
- record number
- variable number
- variable name
- question number
- instructions for coding
Codebook
To organize the coded data
To form a structure for coding
For interpretation of data
For drawing conclusions from the coded data
To translate answers into numerical values
To assign numbers to the various categories for data analysis
It is necessary for efficient analysis
Necessity of Coding
The process of arranging the primary data in a definite pattern and presenting them in a systematic way
The crude data obtained from an experiment or survey are classified according to their properties
Classification can be done qualitatively or quantitatively
Classification of Data
Classified data are more easily understood
It presents the facts in a simpler form
It facilitates quick comparison
It helps further statistical treatment such as averages, dispersion, etc.
It makes errors easier to detect
Objectives of classification
Qualitative classification
Geographical classification
Chronological classification
Qualitative classification
Quantitative classification
Discrete classification
Continuous classification
Types of classification
Geographical Classification
Data are classified by location of occurrence (i.e. area, region), e.g. cataract pts. district-wise
Chronological classification
Data are classified by time of occurrence of the observations or events
The categories are arranged in chronological order
E.g. no. of trachoma pts. recorded from 2000 to 2010
Qualitative Classification
Qualitative classification (Classification according to attributes)
Data are classified according to some quality such as religion, literacy, sex, occupation etc.
Simple classification
Classification is made into 2 classes, such as male or female
Manifold classification
2 or more attributes are studied simultaneously
E.g. classification according to sex, then marital status and then literacy
Qualitative Classification
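Simple and manifold classification can both be sketched as counting, the only difference being how many attributes form the key. The records below are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical records, for illustration only: (sex, literacy)
records = [
    ("male", "literate"), ("female", "literate"), ("female", "illiterate"),
    ("male", "literate"), ("female", "literate"), ("male", "illiterate"),
]

# Simple classification: one attribute, two classes.
by_sex = Counter(sex for sex, _ in records)

# Manifold classification: two attributes studied simultaneously.
by_sex_and_literacy = Counter(records)

print(by_sex["male"], by_sex["female"])             # 3 3
print(by_sex_and_literacy[("female", "literate")])  # 2
```

Adding a third attribute such as marital status only means adding a third element to each record.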
The process of systematically organizing and recording a long series of data into rows and columns for further analysis and interpretation
It is a concise, logical & orderly arrangement of data in columns & rows
Tabulation
It presents an overall view of findings in a simpler way
It helps to identify trends
It displays relationships in a comparable way between parts of the findings
It conserves space and reduces explanatory and descriptive statement to a minimum
It facilitates the process of comparison
It provides a basis for various statistical computations
Usefulness of Tabulation
Graphical Representation
Graphs help to understand the data easily
A single picture is worth a thousand words, so goes a common saying
People who are not statistically minded can also easily understand the data and compare them
The most common graphs are bar charts and pie charts in qualitative studies and histograms in quantitative studies
Graphical Representation
Advantages
It is easier to read
Can show relationship between 2 or more sets of observations in one look
Universally applicable
Has high communication power
Simplifies complex data
Has a more lasting effect on the brain
Presentation of Qualitative data
1. Bar Diagram
• Consists of equally spaced vertical (or horizontal) rectangular bars of equal width placed on a common horizontal (or vertical) base line
• The categories are placed on X-axis and their frequencies on Y-axis
Graphical Representation
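A bar diagram can be sketched in plain text without any plotting library. The example below draws horizontal bars of equal width, one per category, from the eye-condition data set used earlier in the deck; the function name `bar_diagram` and the `#` symbol are assumptions for illustration.

```python
from collections import Counter

# The eye-condition data set from the frequency table slide.
eye_conditions = [
    "cataract", "cataract", "keratoconus", "glaucoma",
    "glaucoma", "cataract", "glaucoma", "cataract",
]
frequency = Counter(eye_conditions)


def bar_diagram(freq, symbol="#"):
    """Return a horizontal bar diagram: one bar per category, length = frequency."""
    width = max(len(category) for category in freq)
    return "\n".join(
        f"{category:<{width}} | {symbol * count} {count}"
        for category, count in freq.items()
    )


print(bar_diagram(frequency))
```

Here the categories run down the base line and the bar length carries the frequency, which is the horizontal variant of the diagram described above.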
[Figure: bar diagram "Health Program at IOM"; x-axis: health program (BPH, MBBS, B.Optom, B.Pharma); y-axis: no. of students (0 to 400)]
Simple Bar diagram
Component Bar diagram
Multiple Bar diagram
Graphical Representation
2. Pie Chart
• A circular diagram divided into segments, where each segment represents the frequency in a category
Graphical Representation
Production of health manpower yearly
Pictogram
Line diagram
Cartogram
Graphical Representation
Presentation of Quantitative Data
1. Histogram
• Graphical representation of a set of contiguously drawn bars
• Most popular graph for continuous variables
Graphical Representation
Frequency Polygon
Frequency Curve
Scatter Diagram
Time Plot
Graphical Representation
Stem-leaf Display
Box-and-whisker Plot
Includes consistency checks and treatment of missing responses
Although preliminary consistency checks have been made during editing, the checks at this stage are more thorough and extensive, because they are made by computer
Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to identify out-of-range values for each variable
Data Cleaning
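The kind of check that statistical packages run during data cleaning can be sketched in plain Python. The valid ranges, variable names and records below are hypothetical; real limits would come from the study's own codebook.

```python
# Hypothetical valid ranges per variable, for illustration only.
VALID_RANGES = {"age": (0, 120), "iop_mmhg": (0, 80)}


def find_problems(records, valid_ranges=VALID_RANGES):
    """Return (record index, variable, problem) for missing or out-of-range values."""
    problems = []
    for index, record in enumerate(records):
        for variable, (low, high) in valid_ranges.items():
            value = record.get(variable)
            if value is None:
                problems.append((index, variable, "missing"))
            elif not low <= value <= high:
                problems.append((index, variable, "out of range"))
    return problems


records = [
    {"age": 45, "iop_mmhg": 16},
    {"age": 450, "iop_mmhg": 18},  # likely a data-entry error
    {"age": 60},                   # missing response
]
print(find_problems(records))
# [(1, 'age', 'out of range'), (2, 'iop_mmhg', 'missing')]
```

The sketch only flags problems; how a flagged value is treated (corrected from the original form, set to missing, or dropped) remains the researcher's decision.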
If any correction needs to be done for the statistical analysis, the data is adjusted accordingly
Data Adjusting
Data adjusting is not always necessary but it may improve the quality of analysis sometimes
Data Analysis
• Biostatistics by Prem P. Panta
• Fundamentals of Research Methodology and Statistics by Yogesh K. Singh
• Research Design by J. W. Creswell
• Internet
References
Thank you