
Transcript of Beverly (How to Write a Research Proposal That Works)

HOW TO WRITE A RESEARCH PROPOSAL THAT WORKS

Beverly B. Palmer, Ph.D.
California State University, Dominguez Hills

Visiting Fulbright Professor
Universiti Malaysia Sabah

What Is Research?

One of a number of ways of generating knowledge. We acquire knowledge through a variety of ways such as:

Tenacity of belief (I have absolute faith)
Inspiration (I am illuminated by a new idea)
Referral to Authority (I accept existing knowledge)
Reason (I think rationally and logically)
Experience (I validate knowledge empirically)

Research combines the last four ways of knowing on the above list. It is a

“systematic, controlled, empirical and critical investigation of natural phenomena guided by theory and hypotheses about the presumed relations among such phenomena” (Kerlinger, 1986: 10).

Your research must be an original and significant contribution to the field. One of the reasons you do a literature search after you determine the research question is to make sure no one else has done the same research.

Everything follows from the research question.

1. First be curious about everything you observe in order to find an area of interest. For example, suppose you decide you want to do research on stress.

2. The next step is to formulate questions around your area of interest. For example, you could ask:
– What causes stress?
– What is stress?
– How to reduce stress?

3. Now you pick one of these questions to focus on. The main problem beginning researchers have is in narrowing down the research topic. So let's suppose you pick the last question, "How to reduce stress". Next you need to define each term in the question. So you need to define "stress" and you need to search for ways to reduce it. You start to search the literature to accomplish these two tasks.

Literature Search

You can look in the library or you can search online, but you only want to use the words stress and reduce in your search. Again, don't get sidetracked by reading material that does not focus on these two words. Also, look at the references in each of your sources. Those references will lead you to related sources. From your literature search you now are able to create an even more focused question, such as, "Does time management reduce stress in college students?" because you found time management is one of the major ways of reducing stress.

Operationalize Terms

The next step is to operationalize the terms in your focused research question. Operationalized definitions are ones which state the concept in terms of measurable and observable behavior. For example, stress could be conceptualized as the physiological response of an increase in blood pressure or as the psychological response of a feeling of tension and anxiety. If you pick the psychological response, you have to operationalize this tense and anxious feeling as measurable and observable behaviors. You also have to do that with the concept of time management. One way of operationalizing concepts is to find or create a scale that measures the concept.

Using the Correct Instrument to Answer the Research Question

You are either going to use an instrument that has already been standardized and published or one that you create. If you create your own instrument, you will need to pilot it to make sure it has reliability and validity. You also have two types of instruments to choose from:

(a) a survey, which basically yields discrete data (each question asks about a separate concept and these are answered in terms of categories such as “yes” and “no”)

(b) a scale, which yields continuous data (there are several questions or items for each concept and the responses are given in terms of ratings of 1-5, e.g., "very much" to "not at all").

The type of instrument you use will determine the type of data analysis you can do. For example, discrete (categorical) data can only be analyzed through qualitative means or by chi-square analysis. Continuous data can be analyzed by inferential statistics.
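A minimal sketch of the chi-square option for categorical data, assuming the scipy library is available; the counts below are invented purely for illustration and are not from any study:

    # Hypothetical example: chi-square test of independence on categorical
    # survey responses (the counts are invented for illustration).
    from scipy.stats import chi2_contingency

    # Rows: uses time management (yes / no); columns: reports high stress (yes / no).
    observed = [[18, 42],   # time managers: 18 report high stress, 42 do not
                [35, 25]]   # non-time managers: 35 report high stress, 25 do not

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
    # A small p value suggests the two categorical variables are related.
    # Continuous scale scores would instead call for inferential statistics
    # such as t-tests, correlations or regression.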

    

If you decide to create a survey, here are some things to consider:

This is the Survey Design chapter from The Survey System's Tutorial, revised July 2000. It is reproduced here as a service to the research community. Copyright 2000, Creative Research Systems.

Survey Design

Knowing what the client wants is the key factor to success in any type of business. News media, government agencies and political candidates need to know what the public thinks. Associations need to know what their members want. Large companies need to measure the attitudes of their employees. The best way to find this information is to conduct a survey. This chapter is intended primarily for those who are new to survey research. It discusses options and provides suggestions on how to design and conduct a successful survey project. It does not provide instruction on using specific parts of The Survey System, although it mentions parts of the program that can help you with certain tasks.

The Steps in a Survey Project

1. Establish the goals of the project - What you want to learn.
2. Determine your sample - Whom you will ask.
3. Choose interviewing methodology - How you will ask.
4. Create your questionnaire - What you will ask.
5. Pre-test the questionnaire, if practical - Test the questions.
6. Conduct interviews and enter data - Ask the questions.
7. Analyze the data - Produce the reports.

This chapter covers the first five steps. The Survey System's Tutorial I and II cover entering data and producing reports.

Establishing Goals

The first step in any survey is deciding what you want to learn. The goals of the project determine whom you will survey and what you will ask them. If your goals are unclear, the results will probably be unclear. Some typical goals include learning more about:

The potential market for a new product or service

Ratings of current products or services

Employee attitudes

Customer/patient satisfaction levels

Reader/viewer/listener opinions

Association member opinions

Opinions about political candidates or issues

Corporate images

These sample goals represent general areas. The more specific you can make your goals, the easier it will be to get usable answers.

Selecting Your Sample

There are two main components in determining whom you will interview. The first is deciding what kind of people to interview. Researchers often call this group the target population. If you conduct an employee attitude survey or an association membership survey, the population is obvious. If you are trying to determine the likely success of a product, the target population may be less obvious. Correctly determining the target population is critical. If you do not interview the right kinds of people, you will not successfully meet your goals.

The next thing to decide is how many people you need to interview. Statisticians know that a small, representative sample will reflect the group from which it is drawn. The larger the sample, the more precisely it reflects the target group. However, the rate of improvement in the precision decreases as your sample size increases. For example, to increase a sample from 250 to 1,000 only doubles the precision. You must make a decision about your sample size based on factors such as: time available, budget and necessary degree of precision.

The Survey System (and this Web site) includes a sample size calculator that can help you decide on the sample size (jump to the calculator page for a general discussion of sample size considerations).
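As a rough sketch of the arithmetic behind these sample size considerations (this is the standard margin-of-error approximation at 95% confidence with the worst-case proportion of 0.5, not The Survey System's own calculator):

    # Approximate margin of error for a simple random sample, illustrating why
    # quadrupling a sample from 250 to 1,000 only doubles its precision.
    import math

    def margin_of_error(n, p=0.5, z=1.96):
        """Worst-case margin of error at 95% confidence for a sample of size n."""
        return z * math.sqrt(p * (1 - p) / n)

    for n in (250, 1000):
        print(f"n = {n:4d}: margin of error = about {margin_of_error(n) * 100:.1f} points")
    # n =  250: about 6.2 percentage points
    # n = 1000: about 3.1 percentage points - twice the precision for four times the sample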

Avoiding a Biased Sample

A biased sample will produce biased results. Totally excluding all bias is almost impossible; however, if you recognize bias exists you can intuitively discount some of the answers. The following list shows a few examples of biased samples.

Sample: Your customers
Probable bias: Favorable
Reason: They would not be your customers if they were unhappy, but it is important to know what keeps them happy.

Sample: Your ex-customers
Probable bias: Unfavorable
Reason: If they were happy they would not be ex-customers, but it is important to know why they left you.

Sample: "Phone in" polls
Probable bias: Extreme views
Reason: Only people with a strong interest in a subject (either for or against) are likely to call in - and they may do so several times to load the vote.

Sample: Daytime interviews
Probable bias: Non-working people
Reason: Most people who are at home during the day do not work. Their opinions may not reflect the working population.

Sample: Internet
Probable bias: Atypical people
Reason: Limited to people with Internet access. Internet users are not representative of the general population, even when matched on age, gender, etc. This can be a serious problem, unless you are only interested in people who have Internet access.

The consequences of a source of bias depend on the nature of the survey. For example, a survey for a product aimed at retirees will not be as biased by daytime interviews as will a general public opinion survey. A survey of possible Internet products can safely ignore people who are not on the Internet.

Quotas

A quota is a sample size for a sub-group. It is sometimes useful to establish quotas to ensure that your sample accurately reflects relevant sub-groups in your target population. For example, men and women have somewhat different opinions in many areas. If you want your survey to accurately reflect the general population's opinions, you will want to ensure that the percentage of men and women in your sample reflects their percentages of the general population. If you are interviewing users of a particular type of product, you probably want to ensure that users of the different current brands are represented in proportions that approximate the current market share. Alternatively, you may want to ensure that you have enough users of each brand to be able to analyze the users of each brand as a separate group. If you are doing telephone interviewing, The Survey System's optional Sample Management Module can help you enforce quotas. It lets you create automatically enforced quotas and/or monitor your sample during interviewing sessions.
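A minimal sketch of how quotas might be tracked by hand during data collection if you are not using dedicated software; the group names and targets are invented:

    # Hypothetical quota tracker: stop interviewing a sub-group once its target is met.
    quotas = {"male": 200, "female": 200}      # invented targets for a sample of 400
    completed = {"male": 0, "female": 0}

    def can_interview(group):
        """Return True while the sub-group's quota is not yet filled."""
        return completed[group] < quotas[group]

    def record_completion(group):
        if not can_interview(group):
            raise ValueError(f"Quota for {group!r} is already filled")
        completed[group] += 1

    # Screen each respondent's group before starting the interview.
    if can_interview("female"):
        record_completion("female")
    print(completed)   # {'male': 0, 'female': 1}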

Interviewing Methods

Once you have decided on your sample you must decide on the method of data collection. Each method has advantages and disadvantages.

Personal Interviews

An interview is called personal when the Interviewer asks the questions face-to-face with the Interviewee. Personal interviews can take place in the home, at a shopping mall, on the street, outside a movie theater or polling place, and so on.

Advantages

The ability to let the Interviewee see, feel and/or taste a product.

The ability to find the target population. For example, you can find people who have seen a film more easily outside a theater in which it is playing than by calling phone numbers at random.

Longer interviews are sometimes tolerated. Particularly with in-home interviews that have been arranged in advance. People may be willing to talk longer face-to-face to a person than to someone on the phone.

Disadvantages

Personal interviews usually cost more per interview than other methods. This is particularly true of in-home interviews, where travel time is a major factor.

Each mall has its own characteristics. It draws its clientele from a specific geographic area surrounding it, and its shop profile also influences the type of client. These characteristics may differ from the target population and create a non-representative sample.

Telephone Surveys

Surveying by telephone is the most popular interviewing method in the USA. This is made possible by nearly universal coverage (96% of homes have a telephone).

Advantages

People can usually be contacted faster over the telephone than with other methods. If the Interviewers are using CATI (computer-assisted telephone interviewing), the results can be available minutes after completing the last interview.

You can dial random telephone numbers when you do not have the actual telephone numbers of potential respondents.

If you are using computer-assisted interviewing, The Survey System's optional Interviewing Module (see Chapter 11 in the Main Manual) helps automatically ensure that questions are skipped when they should be, can check the logical consistency of answers and can present questions or answers in a random order (the last two are sometimes important for reasons that are described later).
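A minimal sketch, independent of The Survey System, of the automatic question-skipping mentioned above; the questions and answers are invented:

    # Hypothetical skip pattern: a follow-up question is asked only when an
    # earlier answer makes it relevant.
    def run_interview(get_answer):
        answers = {}
        answers["owns_car"] = get_answer("Do you own a car? (yes/no)")
        if answers["owns_car"] == "yes":
            # Only car owners see the follow-up; everyone else skips it.
            answers["fuel_type"] = get_answer("Which type(s) of fuel do you use?")
        else:
            answers["fuel_type"] = "not applicable"
        return answers

    # Simulated respondent who does not own a car: the fuel question is skipped.
    scripted = {"Do you own a car? (yes/no)": "no"}
    print(run_interview(lambda prompt: scripted.get(prompt, "")))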

Disadvantages

Many telemarketers have given legitimate research a bad name by claiming to be doing research when they start a sales call. Consequently, many people are reluctant to answer phone interviews and use their answering machines to screen calls. Since over half of the homes in the USA have answering machines, this problem is getting worse.

The growing number of working women often means that no one is home during the day. This limits calling time to a "window" of about 6-9 p.m. when you can be sure to interrupt dinner or a favorite TV program.

You cannot show or sample products by phone.

Mail Surveys

Advantages

Mail surveys are among the least expensive.

This is the only kind of survey you can do if you have the names and addresses of the target population, but not their telephone numbers.

The questionnaire can include pictures - something that is not possible over the phone.

Mail surveys allow the respondent to answer at their leisure, rather than at the often inconvenient moment they are contacted for a phone or personal interview. For this reason, they are not considered as intrusive as other kinds of interviews.

Disadvantages

Time!  Mail surveys take longer than other kinds. You will need to wait several weeks after mailing out questionnaires before you can be sure that you have gotten most of the responses.

In populations of lower educational and literacy levels, response rates to mail surveys are often too small to be useful. This, in effect, eliminates many immigrant populations that form substantial markets in many areas. Even in well-educated populations, response rates vary from as low as 3% up to 90%. As a rule of thumb, the best response levels are achieved from highly-educated people and people with a particular interest in the subject (which, depending on your target population, could lead to a biased sample).

One way of improving response rates to mail surveys is to mail a postcard telling your sample to watch out for a questionnaire in the next week or two. You can also follow up a questionnaire mailing after a couple of weeks with another card asking them to return the questionnaire. The downside is that this doubles or triples your mailing cost. If you have purchased a mailing list from a supplier you may also have to pay a second (and third) use fee - you often cannot buy the list once and re-use it.

Another way to increase responses to mail surveys is to use an incentive. One possibility is to send a dollar bill along with the survey (or offer to donate the dollar to a charity specified by the respondent.) Another is to include the people who return completed surveys in a drawing for a prize. A third is to offer a copy of the (non-confidential) result highlights to those who complete the questionnaire. Any of these techniques will increase the response rates.

Remember that if you want a sample of 1,000 people, and you estimate a 10% response level, you need to mail 10,000 questionnaires. You may want to check with your local post office about bulk mail rates - you can save on postage using this mailing method. However, many researchers do not use bulk mail, because many people associate "bulk" with "junk" and will throw it out without opening the envelope, lowering your response rate.

Computer Direct Interviews

These are interviews in which the Interviewees enter their own answers directly into a computer. They can be used at malls, trade shows, offices, and so on. The Survey System's optional Interviewing Module and Interview Stations can easily create computer-direct interviews.

Advantages

The virtual elimination of data entry and editing costs.

You will get more accurate answers to sensitive questions. Recent studies of potential blood donors have shown respondents were more likely to reveal HIV-related risk factors to a computer screen than to either human interviewers or paper questionnaires. The National Institute of Justice has also found that computer-aided surveys among drug users get better results than personal interviews. Employees are also more often willing to give more honest answers to a computer than to a person or paper questionnaire.

The elimination of interviewer bias. Different interviewers can ask questions in different ways, leading to different results. The computer asks the questions the same way every time.

Ensuring skip patterns are accurately followed. The Survey System can ensure people are not asked questions they should skip, based on their earlier answers. These automatic skips are more accurate than relying on an Interviewer reading a paper questionnaire.

Response rates are usually higher. Computer-aided interviewing is still novel enough that some people will answer a computer interview when they would not have completed another kind of interview.

Disadvantages

The Interviewees must have access to a computer or one must be provided for them.

As with mail surveys, computer direct interviews may have serious response rate problems in populations of lower educational and literacy levels. This method may grow in importance as computer use increases.

E-mail Surveys

E-mail surveys are both very economical and very fast. More people have e-mail than have full Internet access. This makes e-mail a better choice than a Web page survey for some populations. On the other hand, e-mail surveys are limited to simple questionnaires, whereas Web page surveys can include complex logic.

Advantages

Speed.  An e-mail questionnaire can gather several thousand responses within a day or two.

There is practically no cost involved once the set up has been completed.

You can attach pictures and sound files.

The novelty element of an e-mail survey often stimulates higher response levels than ordinary “snail” mail surveys.

Disadvantages

You must possess (or purchase) a list of e-mail addresses to mail to.

Some people will respond several times or pass questionnaires along to friends to answer. Many programs have no check to eliminate people responding multiple times to bias the results. The Survey System's E-mail Module will only accept one reply from each address mailed to. It eliminates duplicate and pass-along questionnaires and checks to ensure that respondents have not ignored instructions (e.g., giving 2 answers to a question requesting only one); a rough sketch of these checks appears after this list of disadvantages.

Many people dislike unsolicited e-mail even more than unsolicited regular mail. You may want to send e-mail questionnaires only to people who expect to get mail from you.

You cannot use e-mail surveys to generalize findings to the whole population. People who have e-mail are different from those who do not, even when matched on demographic characteristics, such as age and gender.

Many e-mail programs are limited to plain ASCII text questionnaires and cannot show pictures. E-mail questionnaires from The Survey System can attach graphic or sound files.  Although use of e-mail is growing very rapidly it is not universal - and is even less so outside the USA (three-quarters of the world's e-mail traffic takes place within the USA). Many “average” citizens still do not possess e-mail facilities. So e-mail surveys do not reflect the population as a whole. At this stage they are probably best used in a corporate environment where e-mail is much more common or when most members of the target population are known to have e-mail.
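A minimal sketch of the duplicate-reply and instruction checks described above; this is not The Survey System's E-mail Module, and the addresses and answers are invented:

    # Hypothetical checks for an e-mail survey: keep only the first reply from
    # each address and drop replies that give two answers to a one-answer question.
    replies = [
        {"from": "a@example.com", "q1": ["yes"]},
        {"from": "b@example.com", "q1": ["yes", "no"]},   # ignored the instructions
        {"from": "a@example.com", "q1": ["no"]},          # duplicate address
    ]

    seen = set()
    accepted = []
    for reply in replies:
        if reply["from"] in seen:
            continue                      # duplicate or pass-along reply
        if len(reply["q1"]) != 1:
            continue                      # q1 expects exactly one answer
        seen.add(reply["from"])
        accepted.append(reply)

    print(accepted)   # only the first, valid reply from a@example.com survives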

Internet/Intranet (Web Page) Surveys

Web surveys are rapidly gaining popularity. They have major speed and cost advantages, but also major sampling limitations. These limitations make software selection especially important and restrict the groups you can study using this technique.

Advantages

Web page surveys are extremely fast.  A questionnaire posted on a popular Web site can gather several thousand responses within a few hours.  Many people who will respond to an e-mail invitation to take a Web survey will do so the first day; most will do so within a few days.

There is practically no cost involved once the set up has been completed.  Large samples do not cost more than smaller ones (except for any cost to acquire the sample).

You can show pictures and play sounds.

Web page questionnaires can use complex question skipping logic, randomizations and other features not possible with paper questionnaires or most e-mail surveys.

Web page questionnaires can use colors, fonts and other formatting options not possible in most e-mail surveys.

On average, people give longer answers to open-ended questions on Web page questionnaires than they do on other kinds of self-administered surveys.

Disadvantages

Current use of the Internet is far from universal. Internet surveys do not reflect the population as a whole. This is true even if a sample of Internet users is selected to match the general population in terms of age, gender and other demographics.

People can easily quit in the middle of a questionnaire. They are not as likely to complete a long questionnaire on the Web as they would be if talking with an interviewer.

Depending on your software, you may have no control over who replies - anyone from Afghanistan to Zanzibar cruising that web page may answer.

There is often no control over people responding multiple times to bias the results.

At this stage we recommend using the Internet for surveys only when your target population consists entirely of Internet users. Business-to-business research and employee attitude surveys can often meet this requirement. Surveys of the general population usually will not. In either case, be sure your survey software prevents people from completing more than one questionnaire. You may also want to restrict access by requiring a password (The Survey System's Internet Module allows you to do this) or by putting the survey on a page that can only be accessed directly (there are no links to it).

Scanning Questionnaires

Scanning questionnaires is a method of data collection that can be used with paper questionnaires that have been administered in face-to-face interviews, mail surveys, or surveys completed by an Interviewer over the telephone. The Survey System can produce paper questionnaires that can be scanned using the Remark Office OMR Program (available from CRS). Other software can scan questionnaires and produce an ASCII File that can be read into The Survey System.

Advantages

Scanning can be the fastest method of data entry for paper questionnaires.

Scanning is more accurate than a person in reading a properly completed questionnaire.

Disadvantages

Scanning is best-suited to "check the box" type surveys and bar codes. Scanning programs have various methods to deal with text responses, but all require additional data entry time.

Scanning is less forgiving (accurate) than a person in reading a poorly marked questionnaire. It also requires an investment in additional hardware to do the actual scanning.

Summary of Survey Methods

Your choice of survey method will depend on several factors. These include:

Speed

E-mail and Web page surveys are the fastest methods, followed by telephone interviewing. Interviewing by mail is the slowest.

Cost

Personal interviews are the most expensive, followed by telephone and then mail. E-mail and Web page surveys are the least expensive for large samples.

Internet Usage

E-mail and Web page surveys offer significant advantages, but you cannot generalize their results to the population as a whole.

Literacy Levels

Illiterate and less-educated people rarely respond to mail surveys.

Sensitive Questions

People are more likely to answer sensitive questions when interviewed directly by a computer.

Questionnaire Design

General Considerations

The first rule is to design the questionnaire to fit the medium. Phone interviews cannot show pictures. Survey-by-mail respondents cannot ask, "What exactly do you mean by that?" if they do not understand a question. Intimate, personal questions are sometimes best handled by mail or computer, where anonymity is most assured.

While The Survey System will easily let you combine surveys gathered using different mediums, it is not usually recommended that you do so. A mail survey will often not give the same answers as the same survey done by phone or in person. If you used one method in the past and need to compare results, stick to that method, unless there is a compelling reason to change.

KISS - keep it short and simple. If you present a 20-page questionnaire most potential respondents will give up in horror before even starting. Ask yourself what you will do with the information from each question. If you cannot give yourself a satisfactory answer, leave it out. Avoid the temptation to add a few more questions just because you are doing a questionnaire anyway. If necessary, place your questions into three groups: must know, useful to know and nice to know. Discard the last group, unless the previous two groups are very short.

Start with an introduction or welcome message. In the case of mail questionnaires, this message can be in a cover letter or on the questionnaire form itself. If you are sending e-mails that ask people to take a Web page survey, put your main introduction or welcome message in the e-mail. When practical, state who you are and why you want the information in the survey. A good introduction or welcome message will encourage people to complete your questionnaire.

Allow a "Don't Know" or "Not Applicable" response to all questions, except to those in which you are certain that all respondents will have a clear answer. In most cases, these are wasted answers as far as the researcher is concerned, but are necessary alternatives to avoid frustrated respondents. Sometimes "Don't Know" or "Not Applicable" will really represent some respondents' most honest answers to some of your questions. Respondents who feel they are being coerced into giving an answer they do not want to give often do not complete the questionnaire.

For the same reason, include “Other” or “None” whenever either of these are a logically possible answer. When the answer choices are a list of possible opinions, preferences or behaviors you should usually allow these answers.

On paper, computer direct and Internet surveys these four choices should appear as appropriate. You may want to combine two or more of them into one choice, if you have no interest in distinguishing between them. You will rarely want to include “Don't Know,” “Not Applicable,” “Other” or “None” in a list of choices being read over the telephone or in person, but you should allow the interviewer the ability to accept them when given by respondents.

Question Types

Researchers use three basic types of questions: multiple choice, numeric open end and text open end (sometimes called "verbatims"). Examples of each kind of question follow:

Rating Scales and Agreement Scales are two common types of questions that some researchers treat as multiple choice questions and others treat as numeric open end questions. Examples of these kinds of questions are:
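As a hedged stand-in for formatted example items, the sketch below represents each of these question types as a simple data structure; the wording is invented, not taken from the chapter:

    # Invented examples of the basic question types, represented as data structures.
    questionnaire = [
        {"type": "multiple choice",
         "text": "Which brand of coffee did you buy most recently?",
         "choices": ["Brand X", "Brand Y", "Brand Z", "Other", "Don't know"]},

        {"type": "numeric open end",
         "text": "How many cups of coffee do you drink in a typical day?"},

        {"type": "text open end",    # a "verbatim"
         "text": "What do you like most about your usual brand?"},

        {"type": "rating scale",
         "text": "How would you rate this product overall?",
         "choices": ["Excellent", "Good", "Fair", "Poor"]},

        {"type": "agreement scale",
         "text": "My supervisor gives me positive feedback.",
         "choices": ["Agree", "Not sure", "Disagree"]},
    ]

    for q in questionnaire:
        print(f"[{q['type']}] {q['text']}")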

Question and Answer Choice Order

There are two broad issues to keep in mind when considering question and answer choice order. One is how the question and answer choice order can encourage people to complete your survey. The other issue is how the order of questions or the order of answer choices could affect the results of your survey.

Ideally, the early questions in a survey should be easy and pleasant to answer. These kinds of questions encourage people to continue the survey. In telephone or personal interviews they help build rapport with the interviewer. Grouping together questions on the same topic also makes the questionnaire easier to answer.

Whenever possible leave difficult or sensitive questions until near the end of your survey. Any rapport that has been built up will make it more likely people will answer these questions. If people quit at that point anyway, at least they will have answered most of your questions.

Answer choice order can make individual questions easier or more difficult to answer. Whenever there is a logical or natural order to answer choices, use it. Always present agree-disagree choices in that order. Presenting them in disagree-agree order will seem odd. For the same reason, positive to negative and excellent to poor scales should be presented in those orders. When using numeric rating scales higher numbers should mean a more positive or more agreeing answer.

Question order can affect the results in two ways. One is that mentioning something (an idea, an issue, a brand) in one question can make people think of it while they answer a later question, when they might not have thought of it if it had not been previously mentioned.

The other way question order can affect results is habituation. This problem applies to a series of questions that all have the same answer choices. It means that some people will usually start giving the same answer, without really considering it, after being asked a series of similar questions. People tend to think more when asked the earlier questions in the series and so give more accurate answers to them.

If you are using telephone, computer direct or Internet interviewing, The Survey System can help with this problem. You can use an Interviewing Instruction to have The Survey System present a series of questions in a random order in each interview. This technique will not eliminate habituation, but will ensure that it applies equally to all questions in a series, not just to particular questions near the end of a series. You must have the Interviewing Module to use these instructions in telephone or computer direct interviews.
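A minimal sketch, independent of The Survey System, of presenting a series of similar questions in a different random order for each interview; the statements are invented:

    # Shuffle a series of similar questions independently for each interview so
    # that habituation is spread evenly across the series.
    import random

    series = [
        "My supervisor gives me positive feedback.",
        "I have the tools I need to do my job.",
        "My workload is reasonable.",
        "I would recommend this company as a place to work.",
    ]

    def questions_for_interview(questions):
        order = questions[:]      # copy, so the master list keeps its order
        random.shuffle(order)
        return order

    for question in questions_for_interview(series):
        print(question)           # each interview sees a different order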

Another way to reduce this problem is to ask only a short series of similar questions at a particular point in the questionnaire. Then ask one or more different kinds of questions, and then another short series, if needed.

A third way to reduce habituation is to change the “positive” answer. This applies mainly to level-of-agreement questions. You can word some statements so that a high level of agreement means satisfaction (e.g., “My supervisor gives me positive feedback”) and others so that a high level of agreement means dissatisfaction (e.g., “My supervisor usually ignores my suggestions”). This technique forces the respondent to think more about each question. One negative aspect of this technique is that you will usually have to do Data Transformations on some of the questions after the results are entered, because having the higher levels of agreement always mean a positive (or negative) answer makes the analysis much easier. However, the few minutes extra work may be a worthwhile price to pay to get more accurate data.
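A minimal sketch of the kind of data transformation described above, assuming the pandas library and a 5-point agreement scale; the column names and values are invented:

    # Reverse-code negatively worded agreement items so that a higher score
    # always means a more positive answer (on a 1-5 scale, 1 <-> 5, 2 <-> 4).
    import pandas as pd

    responses = pd.DataFrame({
        "gives_positive_feedback": [5, 4, 2, 3],   # positively worded item
        "ignores_my_suggestions":  [1, 2, 4, 3],   # negatively worded item
    })

    responses["ignores_my_suggestions_rc"] = 6 - responses["ignores_my_suggestions"]
    print(responses)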

The order in which the answer choices are presented can also affect the answers given. People tend to pick the choices nearest the start of a list when they read the list themselves on paper or a computer screen. People tend to pick the most recent answer when they hear a list of choices read to them.

As mentioned previously, sometimes answer choices have a natural order (e.g., Yes, followed by No; or Excellent - Good - Fair - Poor). If so, you should use that order. At other times, questions have answers that are obvious to the person that is answering them (e.g., “What brand(s) of car do you own?”). In these cases, the order in which the answer choices are presented is not likely to affect the answers given. However, there are kinds of questions, particularly questions about preference or recall or questions with relatively long answer choices that express an idea or opinion, in which the answer choice order is more likely to affect which choice is picked. Here too, if you are using telephone, computer direct or Internet interviewing, The Survey System can help. You can use an Interviewing Instruction  to have The Survey System present the answer choices in a random order. You must have the Interviewing Module to use these instructions in telephone or computer direct interviews.

Other Tips

Keep the questionnaire as short as possible. We mentioned this principle before, but it is so important it is worth repeating. More people will complete a shorter questionnaire, regardless of the interviewing method. If a question is not necessary, do not include it.

Start with a Title (e.g., Leisure Activities Survey). Always include a short introduction - who you are and why you are doing the survey. It is often a good idea to give the name of the research company rather than the client (e.g., XYZ Research Agency rather than the manufacturer of the product/service being surveyed). Many firms create a separate research company name (even if it is only a direct phone line to the research department) to disguise themselves. This is to avoid possible bias, since people rarely like to criticize someone to their face and are much more open to a third party.

Reassure your respondent that his or her responses will not be revealed to your client, but only combined with many others to learn about overall attitudes.

Include a cover letter with all mail surveys. A good cover letter will increase the response rate. A bad one, or none at all, will reduce the response rate. Include the information in the preceding two paragraphs and mention the incentive (if any). Describe how to return the questionnaire. Include the name and telephone number of someone the respondent can call if they have any questions. Include instructions on how to complete the survey itself.

Mail questionnaires should be numbered on each page and include the return address on the questionnaire itself, because pages and envelopes can be separated from each other. Envelopes should have return postage prepaid. Using a postage stamp often increases response rates, but is expensive, since you must stamp every envelope - not just the returned ones.

You may want to leave a space for the respondent to add their name and title. Some people will put in their names, making it possible for you to recontact them for clarification or follow-up questions. Indicate that filling in their name is optional. Do not have a space for a name, if the questions are sensitive in nature. Some people would become suspicious and not complete the survey.

If you hand out questionnaires on your premises, you obviously cannot remain anonymous, but keep the bias problem in mind when you consider the answers.

If the survey contains commercially sensitive material, ask a "security" question up front to find whether the respondent or any member of his family, household or any close friend works in the industry being surveyed. If so, terminate the interview immediately. They (or family or friends) may work for the company that commissioned the survey - or for a competitor. In either case, they are not representative and should be eliminated. If they work for a competitor, the nature of the questions may betray valuable secrets. The best way to ask security questions is in reverse (i.e., if you are surveying for a pharmaceutical product, phrase the question as "We want to interview people in certain industries - do you or any member of your household work in the pharmaceutical industry?"). If the answer is "Yes" thank the respondent and terminate the interview. Similarly, it is best to eliminate people working in the advertising, market research or media industries, since they may work with competing companies.

After the security question, start with general questions. If you want to limit the survey to users of a particular product, you may want to disguise the qualifying product. As a rule, start from general attitudes to the class of products, through brand awareness, purchase patterns, specific product usage to questions on specific problems (i.e., work from "What types of coffee have you bought in the last three months" to "Do you recall seeing a special offer on your last purchase of Brand X coffee?"). If possible put the most important questions into the first half of the survey. If a person gives up half way through, at least you have the most important information.

Make sure you include all the relevant alternatives as answer choices. Leaving out a choice can give misleading results. For example, a number of recent polls that ask Americans if they support the death penalty, yes or no, have found 70-75% of the respondents choosing "yes." But polls that offer the choice between the death penalty and life in prison without the possibility of parole show support for the death penalty at about 50-60%. Polls that offer the alternatives of the death penalty or life in prison without the possibility of parole, with the inmates working in prison to pay restitution to their victims' families, have found support for the death penalty closer to 30%.

So what is the true level of support for the death penalty? The lowest figure is probably best, since it represents the percentage that favor that penalty regardless of the alternative offered. The need to include all relevant alternatives is not limited to political polls. You can get misleading data anytime you leave out alternatives.

Do not put two questions into one. Avoid questions such as "Do you buy frozen meat and frozen fish?" A "Yes" answer can mean the respondent buys meat or fish or both. Similarly with a question such as "Have you ever bought Product X and, if so, did you like it?" A "No" answer can mean "never bought" or "bought and disliked." Be as specific as possible. "Do you ever buy pasta?" can include someone who once bought some in 1990. It does not tell you whether the pasta was dried, frozen or canned and may include someone who had pasta in a restaurant. It is better to say "Have you bought pasta (other than in a restaurant) in the last three months?" "If yes, was it frozen, canned or dried?" Few people can remember what they bought more than three months ago unless it was a major purchase such as an automobile or appliance.

The overriding consideration in questionnaire design is to make sure your questions can accurately tell you what you want to learn. The way you phrase a question can change the answers you get. Try to make sure the wording does not favor one answer choice over another.

Avoid emotionally charged words or leading questions that point towards a certain answer. You will get different answers from asking "What do you think of the XYZ proposal?" than from "What do you think of the Republican XYZ proposal?" The word "Republican" in the second question would cause some people to favor or oppose the proposal based on their feelings about Republicans, rather than about the proposal itself. It is very easy to create bias in a questionnaire. This is another good reason to test it before going ahead.

If you are comparing different products to find preferences, give each one a neutral name or reference. Do not call one "A" and the second one "B." This immediately brings images of A grades and B grades to mind, with the former being seen as superior to the latter. It is better to give each a "neutral" reference such as "M" or "N" that does not have as strong a quality difference image. If possible, just refer to the "first" product and the "second" product.

Avoid technical terms and acronyms, unless you are absolutely sure that respondents know what they mean. LAUTRO, AGI, GPA, EIEIO (Life Assurance and Unit Trust Regulatory Organization, Adjusted Gross Income, Grade Point Average and Engineering Information External Inquiries Officer) are all well-known acronyms to people in those particular fields, but very few people would understand all of them. If you must use an acronym, spell it out the first time it is used.

Make sure your questions accept all the possible answers. A question like "Do you use regular or premium gas in your car?" does not cover all possible answers. The owner may alternate between both types. The question also ignores the possibility of diesel or electric-powered cars. A better way of asking this question would be "Which type(s) of fuel do you use in your cars?" The responses allowed might be:

Regular gasoline
Premium gasoline
Diesel
Other
Do not have a car

If you want only one answer from each person, ensure that the options are mutually exclusive. For example:

     In which of the following do you live?

     a house
     an apartment
     the suburbs

This question ignores the possibility of someone living in a house or an apartment in the suburbs.

Score or Scale questions (e.g., "If "5" means very good and "1" means very poor, how would you rate this product?") are a particular problem. Researchers are very divided on this issue. Many surveys use a ten-point scale, but there is considerable evidence to suggest that anything over a five-point scale is irrelevant. This depends partially on education. Among university graduates a ten-point scale will work well. Among people with less than a high school education five points is sufficient. In third world countries, a three-point scale (good/acceptable/bad) is often all a respondent can understand. Another problem is that you are assuming that the difference in the factors is within the scale limits - you may have a five-point scale but in a respondent's mind one factor may rate 10 points in comparison to the others.

If you do use a rating scale be sure the labels are meaningful. For example:

     What do you think about product X?

It's the best on the market
It's about average
It's the worst on the market

A question phrased like the one above will force most answers into the middle category, resulting in very little usable information.

If you have used a particular scale before and need to compare results, use the same scale. Four on a five-point scale is not equivalent to eight on a ten-point scale. Someone who rates an item "4" on a five-point scale might rate that item anywhere between "6" and "9" on a ten-point scale.

Be aware of cultural factors. In the third world, respondents have a strong tendency to exaggerate answers. Researchers are often perceived as being government agents, with the power to punish or reward according to the answer given (and this is sometimes true). Accordingly they often give "correct" answers rather than what they really believe. Even when the questions are not overtly political and deal purely with commercial products or services, the desire not to disappoint important visitors with answers that may be considered negative may lead to exaggerated scores. Always discount "favorable" answers by a significant factor in all cases. The desire to please is not limited to the third world.

In personal interviews it is vital for the Interviewer to have empathy with the Interviewee. In general, Interviewers should try to "blend" with respondents in terms of race, language, sex, age, etc. Choose your Interviewers according to the likely respondents.

Leave your demographic questions (age, sex, income, education, etc.) until the end of the questionnaire. By then the Interviewer should have built a rapport with the Interviewee that will allow honest responses to such personal questions. Mail questionnaires should do the same, although the rapport must be built by good question design, rather than personality. Exceptions are any demographic questions that qualify someone to be included in the survey. For example, many researchers limit some surveys to people in certain age groups. These questions must come near the beginning.

Paper questionnaires requiring text answers should always leave sufficient space for handwritten answers. Lines should be about half-an-inch (one cm.) apart. The number of lines you should have depends on the question. Three to five lines are average. Leave a space at the end of a questionnaire entitled "Other Comments." Sometimes respondents offer casual remarks that are worth their weight in gold and cover some area you did not think of, but which respondents consider critical. Many products have a wide range of secondary uses that the manufacturer knows nothing about but which could provide a valuable source of extra sales if approached properly. In one third world market, a major factor in the sale of candles was the ability to use the spent wax as floor polish - but the manufacturer only discovered this by a chance remark.

Always consider the layout of your questionnaire. This is especially important on paper, computer direct and Internet surveys. You want to make it attractive, easy to understand and easy to complete. If you are creating a paper survey, you also want to make it easy for your data entry personnel.

Try to keep your answer spaces in a straight line, either horizontal or vertical. A single answer choice on each line is best. Eye tracking studies show the best place to use for answer spaces is the right hand edge of the page. It is much easier for a field worker or respondent to follow a logical flow across or down a page. Using the right edge is also easiest for data entry. The Survey System lets you create a Questionnaire Form with the answer choices in two columns. Creating the form that way can save a lot of paper or screen space, but you should recognize doing so makes the questionnaire harder to complete. It also slows the data entry process.

Questions and answer choice grids, as in the second of the following examples, are popular with many researchers. They can look attractive and save paper or computer screen space. They also can avoid a long series of very repetitive question and answer choice lists. Unfortunately, they also are a bit harder than the repeated lists for some people to understand. As always, consider whom you are studying when you create your questionnaire.

Look at the following layouts and decide which you would prefer to use:

   Do you agree, disagree or have no opinion that this company has:

A good vacation policy - agree/not sure/disagree.
Good management feedback - agree/not sure/disagree.
Good medical insurance - agree/not sure/disagree.
High wages - agree/not sure/disagree.

An alternative layout is:   

   Do you agree, disagree or have no opinion that this company has:

                              Agree   Not Sure   Disagree
   A good vacation policy       1        2          3
   Good management feedback     1        2          3
   Good medical insurance       1        2          3
   High wages                   1        2          3

The second example shows the answer choices in neat columns and has more space between the lines. It is easier to read. The numbers in the second example will also speed data entry.

Surveys are a mixture of science and art, and a good researcher will save their cost many times over by knowing how to ask the correct questions.

Pre-test the Questionnaire

The last step in questionnaire design is to test a questionnaire with a small number of interviews before conducting your main interviews. Ideally, you should test the survey on the same kinds of people you will include in the main study. If that is not possible, at least have a few people, other than the question writer, try the questionnaire. This kind of test run can reveal unanticipated problems with question wording, instructions to skip questions, etc. It can help see if the interviewees are understanding your questions and giving useful answers. If you change any questions after a pre-test, you should not combine the results from the pre-test with the results of post-test interviews. The Survey System will invariably provide you with mathematically correct answers to your questions, but choosing sensible questions and administering surveys with sensitivity and common sense will improve the quality of your results dramatically.


If you are going to create a scale, here are some of the things you need to consider:

Validity Issues in Measuring Psychological Constructs: The Case of Emotional Intelligence

Measuring a psychological construct like emotional intelligence is as much an art as it is a science. Because such psychological constructs are latent and not directly observable, issues of construct validity are paramount, but are, unfortunately, often glossed over in the methodology sections of research papers. In an effort to increase the validity of conclusions reached using paper-and-pencil measures of psychological constructs like emotional intelligence, this web page was constructed. This page covers the major validity issues involved in measuring psychological constructs, using examples from measuring emotional intelligence. The information gathered here will provide insight regarding the construct of emotional intelligence and how one would attempt to clarify its meaning and measure it (as well as any other psychological construct for that matter).

As of yet, no one has created a measure of emotional intelligence. However, due to the appeal and applicability of such a construct, it is almost certain that someone will attempt such an endeavor soon. As with measuring any psychological construct, one must not rush to make conclusions based on the results of a poorly constructed measuring instrument.

EMOTIONAL INTELLIGENCE

Why emotional intelligence is important

Researchers investigated dimensions of emotional intelligence (EI) by measuring related concepts, such as social skills, interpersonal competence, psychological maturity and emotional awareness, long before the term "emotional intelligence" came into use. Grade school teachers have been teaching the rudiments of emotional intelligence since 1978, with the development of the Self Science Curriculum and the teaching of classes such as "social development," "social and emotional learning," and "personal intelligence," all aimed at "rais[ing] the level of social and emotional competence" (Goleman, 1995: 262). Social scientists are just beginning to uncover the relationship of EI to other phenomena, e.g., leadership (Ashforth and Humphrey, 1995), group performance (Williams & Sternberg, 1988), individual performance, interpersonal/social exchange, managing change, and conducting performance evaluations (Goleman, 1995). According to Goleman (1995: 160), "Emotional intelligence, the skills that help people harmonize, should become increasingly valued as a workplace asset in the years to come." And Shoshona Zuboff, a psychologist at Harvard Business School, points out, "corporations have gone through a radical revolution within this century, and with this has come a corresponding transformation of the emotional landscape. There was a long period of managerial domination of the corporate hierarchy when the manipulative, jungle-fighter boss was rewarded. But that rigid hierarchy started breaking down in the 1980s under the twin pressures of globalization and information technology. The jungle fighter symbolizes where the corporation has been; the virtuoso in interpersonal skills is the corporate future" (Goleman, 1995: 149). If these predictions are true, then the interest in emotional intelligence, if there is such a thing, is sure to increase, and with this increase in interest comes a corresponding increase in trying to measure emotional intelligence. Two such measures purport to measure emotional intelligence: one test is from USA Weekend and the other is from Utne Reader. However, neither of these tests provides any evidence that its results are reliable or valid.

Definition and dimensions of emotional intelligence

Recent discussions of EI proliferate across the American landscape -- from the cover of Time, to a best-selling book by Daniel Goleman, to an episode of the Oprah Winfrey show. But EI is not some easily dismissed "neopsycho-babble." EI has its roots in the concept of "social intelligence," first identified by E.L. Thorndike in 1920. Psychologists have been uncovering other intelligences for some time now, and grouping them mainly into three clusters: abstract intelligence (the ability to understand and manipulate with verbal and mathematical symbols), concrete intelligence (the ability to understand and manipulate with objects), and social intelligence (the ability to understand and relate to people) (Ruisel, 1992). Thorndike (1920: 228) defined social intelligence as "the ability to understand and manage men and women, boys and girls -- to act wisely in human relations." And Gardner (1983) includes inter- and intrapersonal intelligences in his theory of multiple intelligences. These two intelligences comprise social intelligence. He defines them as follows:

Interpersonal intelligence is the ability to understand other people: what motivates them, how they work, how to work cooperatively with them. Successful salespeople, politicians, teachers, clinicians, and religious leaders are all likely to be individuals with high degrees of interpersonal intelligence. Intrapersonal intelligence ... is a correlative ability, turned inward. It is a capacity to form an accurate, veridical model of oneself and to be able to use that model to operate effectively in life.

Emotional intelligence, on the other hand, "is a type of social intelligence that involves the ability to monitor one's own and others' emotions, to discriminate among them, and to use the information to guide one's thinking and actions" (Mayer & Salovey, 1993: 433). According to Salovey & Mayer (1990), the originators of the concept of emotional intelligence, EI subsumes Gardner's inter- and intrapersonal intelligences, and involves abilities that may be categorized into five domains:

– Self-awareness: observing yourself and recognizing a feeling as it happens.
– Managing emotions: handling feelings so that they are appropriate; realizing what is behind a feeling; finding ways to handle fears and anxieties, anger, and sadness.
– Motivating oneself: channeling emotions in the service of a goal; emotional self-control; delaying gratification and stifling impulses.
– Empathy: sensitivity to others' feelings and concerns and taking their perspective; appreciating the differences in how people feel about things.
– Handling relationships: managing emotions in others; social competence and social skills.

Self-awareness (intrapersonal intelligence), empathy and handling relationships (interpersonal intelligence) are essentially dimensions of social intelligence.

MEASUREMENT ISSUES
Psychological constructs

Emotional intelligence is a psychological construct, an abstract theoretical variable that is invented to explain some phenomenon which is of interest to scientists. Salovey and Mayer invented (made up) the idea of emotional intelligence to explain why some people seem to be more "emotionally competent" than other people. It may just be that they are better listeners and this explains the variability in people's "emotional competence." Or it may be that these people differ in emotional intelligence, and this is what explains the difference. Salovey and Mayer believed it was necessary to develop the construct of emotional intelligence in order to explain this difference in people. Examples of other psychological constructs, just to name a few, include organizational commitment, self esteem, job satisfaction, tolerance for ambiguity, optimism, and intention to turnover.

Problems with Measurement
So imagine for the moment that you are a social scientist and you want to measure emotional intelligence using a paper-and-pencil instrument, or in other words, a questionnaire (also referred to as a scale or measure). A questionnaire can include more than one measure or scale (e.g., a measure of self-esteem and a measure of depression). Questionnaires are the most commonly


used procedure of data acquisition in field research (Stone, 1978), and many researchers have questioned how good these questionnaires really are. Field research involves investigating something out in the "real world" rather than in a laboratory. Problems with the reliability and validity of some of these questionnaires have often led to difficulties in interpreting the results of field research (Cook, Hepworth, Wall & Warr, 1981; Schriesheim, Powers, Scandura, Gardiner & Lankau, 1993; Hinkin, 1995). Unfortunately, researchers begin using these measures or questionnaires before knowing whether they are any good, and often draw significant conclusions only to be contradicted later by other researchers who are able to measure the constructs more accurately and precisely (Hinkin, 1995). Thus, before you go ahead and add another lousy measure of a psychological construct to the already growing pile of them, take a few minutes now to learn about the process of creating valid and reliable instruments that measure psychological constructs.

Validity and Reliability
Developing a measure of a psychological construct is a difficult and extremely time-consuming process if it is to be done correctly (Schmitt & Klimoski, 1991). However, if you don't take the time to do it right, then any conclusions you reach using your questionnaire may be dubious. Many organizational researchers believe that the legitimacy of organizational research as a scientific endeavor depends upon how well the measuring instruments measure the intended constructs (Schoenfeldt, 1984). The management field needs measures that provide results that are valid and reliable if the field is to advance (cf. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1985). The American Psychological Association (1985) states that measures of psychological constructs should demonstrate content validity, criterion-related validity and internal consistency, or reliability, which in turn provide evidence of construct validity. Reliability refers to the extent to which the question responses correlate with the overall score on the questionnaire. In other words, do all the questions "hang together," all attempting to measure the same thing, whatever that thing is? What that "thing" is involves the issue of validity. Validity is basically "the best available approximation to the truth or falsity of a given inference, proposition, or conclusion" (Trochim, 1991: 33). In this particular case, where a measure is being constructed, validity refers to how well the questionnaire measures what it is supposed to be measuring. There are different types of validity, and each will be discussed below. What needs to be stressed at this point is that the key word here is demonstrating, not proving, the validity of our questionnaires. We can never prove that our instruments measure what they are supposed to measure. There is no one person or statistical test that can prove or give approval of your measure. That's why it is suggested that one use the modifier "approximately" when referring to validity because "one can never know what is true. At best, one can know what has not yet been ruled out as false" (Cook & Campbell, 1979: 37). Only through time and lots of testing will the approximate "validity and reliability" of your measure be established. I use quotes around the words validity and


reliability because the measure itself is not reliable and valid, only the conclusions reached using the measure are reliable and valid.

1. Construct validity
Construct validity is concerned with the relationship of the measure to the underlying attributes it is attempting to assess. A law analogy sums it up nicely: construct validity refers to measuring the construct of interest, the whole construct, and nothing but the construct. The goal is to measure emotional intelligence, fully and exclusively. To what degree is your questionnaire measuring the theoretical construct of emotional intelligence (only and completely)? Answering this question will demonstrate the construct validity of your instrument. Instead of measuring emotional intelligence, the instrument might be measuring something else entirely, part of emotional intelligence along with part of something else, or only part of emotional intelligence rather than the full construct.

Construct validity is an overarching type of validity, and includes face, content, criterion-related, predictive and concurrent validity (described below) and convergent and discriminant validity. Convergent validity is demonstrated by the extent to which the measure correlates with other measures designed to assess similar constructs. Discriminant validity refers to the degree to which the scale does not correlate with other measures designed to assess dissimilar constructs. Basically, by providing evidence of all these variations of construct validity (content, criterion-related, convergent and discriminant), you are establishing that your scale measures what it was intended to measure. Construct validity is often examined using the multitrait-multimethod matrix developed by Campbell and Fiske (1959). See two other terrific web pages for a thorough description of this method: one by Trochim and one by Jabs.

2. Face and content validity
Face validity refers to whether a measure appears "valid on the face." In plain English, it means that just by looking at it, one would declare that the measure has face validity. It is a judgment call: one would look at, say, a measure of emotional intelligence and say, "Yes, it looks to me like it measures emotional intelligence." Obviously, this is the


weakest form of construct validity. Content validity is established by showing that the questionnaire items (questions) are a sample of a universe or domain in which the researcher is interested (Cronbach & Meehl, 1955). Again, this is a judgment call, but more systematic means can be used (such as concept mapping and factor analysis, both described below). This means that, as in the case of emotional intelligence, a questionnaire would have to tap or ask questions about all dimensions of the construct. If our questionnaire of emotional intelligence only asked about how well you engage in conversation at a party, then the content adequacy of our measure is suspect. Our focus is too narrow and our questions are not a representative sample of the entire domain or "world of" emotional intelligence. The problem here is that we don't really know what the domain entails. We have only the educated guesses of two guys and a few other researchers who say the domain of emotional intelligence consists of five dimensions. As will be discussed later on, concept mapping is a useful tool for developing and gaining consensus on the domain of a construct. See Schriesheim, Powers, Scandura, Gardiner, and Lankau (1993) for a very thorough review of the content adequacy of paper-and-pencil survey-type instruments.

3. Criterion-related validity
This refers to the relationship between your measure and other independent measures (Hinkin, 1995). It is the degree to which your measure uncovers relationships that are in keeping with the theory underlying the construct. Criterion-related validity is an indicator that reflects to what extent scores on our measure of emotional intelligence can be related to a criterion. A criterion is some behavior or cognitive skill of interest that we want to predict using our test scores of emotional intelligence. For instance, we would predict that people scoring higher in emotional intelligence on our test would demonstrate more sensitivity to others' problems, would be better able to control their impulses, and would be able to label their emotions more easily than someone who scores lower on our test of emotional intelligence. Evidence of criterion-related validity would usually be demonstrated by the correlation between the test scores and the scores on a criterion performance.

Criterion-related validity has two sub-components: predictive validity and concurrent validity (Cronbach & Meehl, 1955).


Predictive validity refers to the correlation between the test scores and the scores of a criterion performance given at a later date. Concurrent validity refers to the correlation between the test scores and the scores of a criterion performance when both tests are given at the same time. An example will help clarify the two types of validity.

Perhaps we want to predict the performance of front desk clerks at a hotel. This will be our criterion that we want to predict using some test. The test we will use in this case is a measure of emotional intelligence. The predictive validity of the emotional intelligence test can be estimated by correlating an employee's score on a test of emotional intelligence with his/her performance evaluation a year after taking the test. If there is a high positive correlation, then we can predict performance using the emotional intelligence measure and have demonstrated the predictive validity of the emotional intelligence measure. To demonstrate concurrent validity, we would have to correlate emotional intelligence test scores and criterion scores (current performance evaluations). If the correlation is large and positive, this would provide evidence of concurrent validity. Because the concurrent validity correlation coefficient tends to underestimate the corresponding predictive validity correlation coefficient, predictive validity tends to be preferred to concurrent validity.
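To make the arithmetic concrete, here is a minimal sketch in Python (the document itself recommends SPSS for analysis, so this is only an illustration). The scores and variable names are invented; the point is simply that both predictive and concurrent validity are estimated as a correlation between test scores and criterion scores.

    # Hypothetical illustration: estimating predictive validity by correlating
    # emotional intelligence (EI) test scores with performance ratings collected
    # one year later. All numbers are invented for this sketch.
    from scipy.stats import pearsonr

    ei_scores = [72, 65, 88, 54, 91, 60, 77, 83, 69, 58]                  # EI test at hiring
    performance_1yr = [3.8, 3.1, 4.5, 2.9, 4.7, 3.0, 4.0, 4.2, 3.5, 2.8]  # ratings a year later

    r, p_value = pearsonr(ei_scores, performance_1yr)
    print(f"Predictive validity estimate: r = {r:.2f} (p = {p_value:.3f})")
    # For concurrent validity, the same calculation would use performance
    # evaluations collected at the same time as the EI test.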

4. Internal consistency
Also known as internal consistency reliability, this refers to how well the questions correlate with each other and with the total test score. Basically, what internal consistency reliability assesses is whether the items are all measuring the same thing, whatever that "thing" might be. There are several different statistical procedures for estimating this reliability. The most common estimates a coefficient alpha, or Cronbach's coefficient alpha. If a scale is multi-dimensional, consisting of numerous subscales, then a coefficient alpha must be estimated for each subscale.
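Coefficient alpha can be computed directly from the item responses. The sketch below, on invented responses, uses the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of the total score); any statistics package (including SPSS) will produce the same value.

    # Minimal sketch of Cronbach's coefficient alpha on invented data.
    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """items: respondents x items matrix of scores."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)       # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    responses = np.array([[4, 5, 4], [2, 2, 3], [5, 5, 4], [3, 4, 3], [1, 2, 2]])
    print(f"alpha = {cronbach_alpha(responses):.2f}")
    # For a multi-dimensional scale, compute alpha separately for each subscale.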

CREATING A MEASURE OF EMOTIONAL INTELLIGENCE
Now that we know what we are up against, let's begin developing a measure of emotional intelligence (or any other construct you wish to measure). The basic steps for developing measures, as suggested by Schwab (1980), are as follows:
Step 1: Item Development – the generation of individual items or questions.
Step 2: Scale Development – the manner in which items are combined to form scales.
Step 3: Scale Evaluation – the examination of the scale in light of reliability and validity issues.
The following discussion will be presented in the order of steps suggested by Schwab (1980), with modifications and additions made as necessary. At each step, the issues relating to validity and reliability will be addressed.

Step 1: Item Generation
The first step in creating a measure of a psychological construct is creating test questions or items. For example, in the case of emotional intelligence, you may create a group of 20 questions, the answers to which would provide evidence of a person's emotional intelligence. But how do you know what to ask? And how many questions are needed? The answer is that you have to ask questions that sample the construct domain, and you have to ask enough questions to ensure that the entire domain has been covered, but not too many extraneous questions. According to Hinkin (1995: 969), the "measure must adequately capture the specific domain of interest yet contain no extraneous content." This has to do with content validity, and there is no statistical or quantitative index of content validity. It is a matter of judgment and of collecting evidence to demonstrate the content validity of the measure. However, first things first. You have to define the construct you are interested in measuring. It may already be defined by the existing literature, or it may need to be defined based on a review of the literature. In the case of emotional intelligence, Salovey and Mayer have provided a theoretical universe of emotional intelligence. They suggest that emotional intelligence consists of the five dimensions noted above. One way of generating items for your measure would be to create questions that tap these five dimensions, utilizing the classification schema defined by them. This is called the deductive approach to item development (Hinkin, 1995). So, you say, now we're getting somewhere. All I have to do is write questions that get at all five dimensions of emotional intelligence. And if I can't do it alone, I can ask experts to help generate questions within the conceptual definition of emotional intelligence. But how does one know if Salovey and Mayer are right? How does one know that emotional intelligence comprises five dimensions and not six or three? And how do you know if the dimensions they mentioned are right? Maybe emotional intelligence consists of five dimensions, but just not the dimensions as they defined them.

If little literature or theory exists concerning a construct, then an inductive approach to item development must be undertaken (Hinkin, 1995). Basically the researcher is left to determine the domain or dimensions of the construct. The researcher can gather qualitative data, such as interviews, and categorize the content of the interviews in order to generate the dimensions of the construct.


One method of data gathering that is quite useful in developing the conceptual domain of a construct is concept mapping.

Developed by William Trochim (1989), concept mapping is a "type of structured conceptualization" that allows a group of people to conceptualize, in the form of a "concept map" (a visual display), the domain of a construct. The group of people can consist of just about anyone and is typically best when a "wide variety of relevant people" are included (Trochim, 1989: 2). In the case of emotional intelligence, in order to develop the domain of the construct, one might wish to gather a group of experts, such as psychologists or human resources managers, or a group of employees. The group is then asked to brainstorm about the construct. For emotional intelligence, the brainstorming focus statement might be something like: "Generate statements which describe the ways in which a person high in emotional intelligence is distinct from someone low in emotional intelligence" or "What is emotional intelligence?" The entire process of concept mapping is described in Trochim (1989).

What concept mapping does, as can also be done with data collected via qualitative methods such as interviews, is factor analyze, or sort, the items into groups, which then provide a foundation for defining a construct as multi-dimensional. If we were to gather a group of experts and conduct a concept mapping session, we would hope that their conceptualization of emotional intelligence would consist of the five dimensions suggested by Mayer and Salovey, thus lending support to Mayer and Salovey's theoretical dimensions.
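Trochim's full concept-mapping procedure combines multidimensional scaling with cluster analysis; the sketch below is only a simplified illustration of the aggregation idea, showing how several people's pile sorts of invented statements can be turned into a similarity matrix and clustered into candidate dimensions. All item wordings and sort data are made up.

    # Simplified, illustrative sketch of aggregating pile-sort data.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    items = ["notices own moods", "calms self down", "reads others' feelings",
             "keeps friendships", "delays gratification"]

    # Each sorter assigns every item to a pile (pile labels are arbitrary integers).
    sorts = [
        [1, 2, 3, 3, 2],   # sorter 1
        [1, 1, 2, 2, 1],   # sorter 2
        [1, 2, 3, 3, 2],   # sorter 3
    ]

    n = len(items)
    co = np.zeros((n, n))
    for piles in sorts:
        for i in range(n):
            for j in range(n):
                co[i, j] += piles[i] == piles[j]   # how often two items share a pile

    distance = len(sorts) - co                      # items sorted together often are "close"
    condensed = distance[np.triu_indices(n, k=1)]   # condensed distances for scipy
    clusters = fcluster(linkage(condensed, method="average"), t=3, criterion="maxclust")
    for item, c in zip(items, clusters):
        print(c, item)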

Regardless of whether a deductive or inductive approach to item generation is undertaken, the main issue is content validity, specifically domain sampling. In the case of a deductive procedure, items are generated theoretically from the literature. These items may then be assessed by experts in the area for content validity. In the case of emotional intelligence, we could develop items to cover the five dimensions. Then we could ask a group of psychologists to sort the items into six categories: the five dimensions plus an "other" category. Items assigned to the proper category more than 80% or 85% of the time would be retained for use in the questionnaire. The "other" category and those items not meeting the cutoff for the proper category would be discarded. This procedure is described as a best practice in Hinkin (1995). Another way of tackling this would be, rather than giving the five dimensions to the experts, simply to ask them to sort the items into as many categories as they see fit. The results can be analyzed in the same manner used in concept mapping. If the experts come up with five dimensions like those theorized, then the researcher can be more confident in those dimensions. Just because some people have theorized about the domain of a construct does not mean one should rely uncritically on their conceptualization of it. By giving the experts the categories up front, you are, in essence, assuming that those categories, dimensions, or conceptualization of the construct are correct, and you are limiting the experts within those boundaries. Allowing the experts to sort into as many categories as they see fit allows the data to speak for itself, and if the categories coincide with the theorized categories, this is confirmatory evidence of the conceptualization of the domain.
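The retention rule described above (keep an item only if enough judges place it in its intended dimension) is easy to mechanize. The following is a hedged sketch on invented judge assignments; item names, dimension labels, and the 80% cutoff are placeholders for whatever your own study uses.

    # Illustrative check of expert-sorting agreement with an 80% cutoff.
    intended = {"item_01": "empathy", "item_02": "self-awareness", "item_03": "empathy"}

    judge_sorts = {   # dimension each judge assigned to each item (invented data)
        "item_01": ["empathy", "empathy", "empathy", "handling relationships", "empathy"],
        "item_02": ["self-awareness", "self-awareness", "other", "self-awareness", "self-awareness"],
        "item_03": ["empathy", "other", "other", "handling relationships", "empathy"],
    }

    CUTOFF = 0.80
    for item, assignments in judge_sorts.items():
        agreement = assignments.count(intended[item]) / len(assignments)
        decision = "retain" if agreement >= CUTOFF else "discard or revise"
        print(f"{item}: {agreement:.0%} agreement -> {decision}")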

If an inductive approach was taken, the same process can be undertaken. Experts may be used to sort the data. If interviews were conducted, the raw, qualitative data may be sorted, from which items are generated for each category. Another way of sorting involves generating items from the raw data, using as much of the wording provided by the interviewees as possible, and then sorting the items. The raw data or items may be sorted by either telling the sorters the number of categories to sort into or by allowing the sorters to categorize into as many categories as they see fit (and each sorter may sort into a different number of categories!). Once again, by allowing the sorters to determine the number of categories, it allows the data to speak rather than forcing the data into some preconceived notion as to how many categories there should be.

The main concern in generating items for a measure is with content validity -- that is, assessing the adequacy with which the measure assesses the domain of interest.

The content validity of a measure should be assessed as soon as the items have been developed. This way, if items need revision, this can be done before the researcher has large investments in the preparation and administration of the questionnaire (Schriesheim, et al., 1993).

Step 2: Scale Development
There are three stages within this step: design of the developmental study, scale construction, and reliability assessment (Hinkin, 1995).
A. Developmental study
At this stage in the process, the researcher has a potential set of items for the questionnaire measuring the intended construct. However, at this point, we don't know if the items measure the construct. We only know that they seem to break down (via the sorting) into categories that seem to reflect the underlying dimensions of the construct. Next, the researcher has to administer the items or questionnaire to see how well the items conform to the expected and theorized structure of the construct. There are five important issues in measurement that need to be addressed in the developmental study phase of scale development.

The Sample

Who the questionnaire or items are given to makes a difference. The sample of individuals should be selected to reflect or represent the population of individuals the researcher intends to study and make inferences about in the future.

Reverse-scored Items


Negatively worded items (items worded so that a positive response indicates a "lack" of the construct) are mainly used to eliminate or attenuate response pattern bias, or response set. Response pattern bias is where the respondent simply goes down the page without really reading the questions thoroughly and, for example, circles "4" as the response to every question. With reverse-scored items, the thought is that the respondent will have to think about the response because the answer is "reversed." However, in recent years, reverse-scored items have come under attack because these items were found to reduce the validity of questionnaire responses (Schriesheim & Hill, 1981) and may in fact introduce systematic error to the scale (Jackson, Wall, Martin, & Davids, 1993). In a factor analysis (a sorting of the items into underlying categories or dimensions) of negatively worded and positively worded items, the loadings of negatively worded items were lower than those of positively worded items loading on the same factor (Hinkin, 1995). Alternatives for attenuating response pattern bias should be sought before automatically turning to reverse-scored items. Keeping the scales shorter rather than longer can help reduce response pattern bias.
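If reverse-scored items are used anyway, their responses must be recoded before scoring. A small sketch, assuming a 1-to-5 Likert scale (the data are invented): a reversed response r becomes (low + high) - r.

    # Recoding a negatively worded item on a 1-5 scale.
    def reverse_score(response: int, low: int = 1, high: int = 5) -> int:
        return (low + high) - response

    raw = [5, 4, 2, 1, 3]                       # responses to a negatively worded item
    recoded = [reverse_score(r) for r in raw]
    print(recoded)                              # [1, 2, 4, 5, 3]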

Number of Items
The measure of a construct should include enough items to adequately sample the domain while at the same time being as parsimonious as possible, in order to obtain content and construct validity (Cronbach and Meehl, 1955). The number of items in a scale can affect responses in different ways. Scales with too many items can be excessively lengthy and can induce fatigue and response pattern bias (Anastasi, 1976). By keeping the number of items to a minimum, response pattern bias can be reduced (Schmitt & Stults, 1985). However, if too few items are used, then the content and construct validity and the reliability of the measure may be at risk (Kenny, 1979; Nunnally, 1976). Single-item scales (scales that ask just one question to measure a construct) are most susceptible to these problems (Hinkin & Schriesheim, 1989). Adequate internal consistency reliability can be obtained with as few as three items (Cook, Hepworth, Wall, & Warr, 1981), and additional items have progressively less impact on scale reliability (Carmines & Zeller, 1979).

Scaling of Items
The scaling of items refers to the choice of responses given for each item. Examples include Likert-type scales, such as choosing from 1 to 5, where the numbers refer to strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree, respectively. Semantic differential scales use pairs of words such as "happy" and "sad," and the respondent chooses a response on a scale of 1 to 7


or 1 to 5, with "1" referring to "happy" and "5" or "7" referring to "sad," and the numbers in between referring to states between being happy and sad. The important issue to contend with at this point is achieving sufficient variance or variability among respondents. A researcher would not want a measure with a Likert-type scale with responses from 1 to 3 on which most respondents choose response "3." Such a measure is not capable of differentiating among respondents, and perhaps giving choices from 1 to 5 would alleviate this problem. The reliability of Likert-type scales increases as the number of response choices increases up to five, but then levels off (Lissitz & Green, 1975).
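A quick way to see whether an item's response scale is producing enough variability to differentiate respondents is simply to inspect the spread of responses. The numbers below are invented for illustration.

    # Checking response variability on invented data.
    import statistics

    item_1to3 = [3, 3, 3, 2, 3, 3, 3, 3]        # nearly everyone picks "3"
    item_1to5 = [4, 2, 5, 3, 1, 4, 2, 5]        # responses spread across the range

    print(statistics.stdev(item_1to3))          # close to 0: little information
    print(statistics.stdev(item_1to5))          # larger: the item differentiates respondents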

Sample Size
In terms of confidence in the results, the larger the sample size the better. That is, if the researcher has generated items and is looking to conduct a developmental study to check the validity and reliability of the items, then the larger the sample of individuals administered the items, the better. The larger the sample, the more likely the results will be statistically significant. When conducting a factor analysis of the items to check the underlying structure of the construct, the results may be susceptible to sample size effects (Hinkin, 1995). Rummel (1970) recommends an item-to-response ratio of 1:4, and Schwab (1980) recommends a ratio of 1:10. For example, if a researcher has 20 items to analyze, then the sample size should be anywhere from 80 to 200 respondents. Newer research in this area has found that a sample size of 150 respondents should be adequate to obtain an accurate exploratory factor analysis solution, provided the internal consistency reliability is reasonably strong (Guadagnoli & Velicer, 1988). An exploratory factor analysis is conducted when there is no a priori conceptualization of the construct. A confirmatory factor analysis is conducted when the researcher is attempting to confirm the theoretical conceptualization put forth in the literature. In the case of emotional intelligence, a confirmatory factor analysis would be conducted to see if the items "break down" or "sort" into five factors or "dimensions" similar to those suggested by Mayer and Salovey. Recent research suggests that a minimum sample size of 200 is necessary for an accurate confirmatory factor solution (Hoelter, 1983).

B. Scale construction
At this point in the process, the researcher has generated items and administered them to a sample (hopefully representative of the population of interest). The researcher has taken into consideration reverse-scored items, the


number of items needed to both adequately sample the domain and be parsimonious, the scaling of the items to ensure sufficient variance among the respondents, and has used an adequate sample size. Now comes the process of constructing the scale or measure of the construct, through a process of reducing the number of items and refining the construct. The most common technique for doing this is factor analysis (Ford, MacCallum & Tait, 1986). Items that do not load sufficiently on a factor should be discarded or revised. A minimum item loading of .40 is the most commonly mentioned criterion (Hinkin, 1995).

The purpose of the factor analysis in the construction of the scale is to "examine the stability of the factor structure and provide information that will facilitate the refinement of a new measure" (Hinkin, 1995: 977). The researcher is trying to establish the factor structure or dimensionality of the construct. Administering the items to a couple of different independent samples and then factor analyzing the results of each sample will help provide evidence (or lack of evidence!) of a stable factor structure. If the researcher finds a different factor structure for each sample, then the researcher has some work to do to uncover a stable (the same for all samples) factor structure. Although either an exploratory or a confirmatory factor analysis can be conducted, Hinkin (1995: 977) recommends a confirmatory approach at this point in scale development: "...because of the objective of the task of scale development, it is recommended that a confirmatory approach be utilized ... [because] it allows the researcher more precision in evaluating the measurement model." And although the confirmatory factor analysis will tell the researcher whether the items load on the same factors, it does not tell the researcher whether those factors measure the intended construct. For example, in the case of emotional intelligence, if I administered the items to a sample and the items loaded on five factors, I might be tempted to jump to conclusions and say my items measure the same five dimensions as outlined by Mayer and Salovey. This would be a big mistake. All I really know at this point is that the items appear to measure five factors or dimensions of "something." I still don't know what that something is. I hope that it is emotional intelligence, but I won't gather evidence of that until Step 3: Scale Evaluation (see below).
C. Reliability assessment
Two basic issues are to be dealt with at this point: internal consistency and the stability of the scale over time. As mentioned previously, internal consistency reliability assesses whether or not the items "hang together" -- that is, whether the items all measure the same phenomenon. The internal consistency reliability of measures is commonly assessed using Cronbach's alpha. The stability of the measure over time will be assessed by the test-retest reliability of the measure, since emotional intelligence is not expected to change over time (Stone, 1978). An alpha of .70 will be considered the minimum acceptable level for this measure.
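The item-loading check mentioned above (drop or revise items whose loadings fall below .40) can be illustrated with an exploratory factor model. Note the hedge: a true confirmatory factor analysis would normally be run in SEM software; scikit-learn only provides an exploratory factor model, and the responses below are simulated rather than real EI data.

    # Exploratory illustration of screening items by factor loading (simulated data).
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    responses = rng.integers(1, 6, size=(200, 20)).astype(float)   # 200 respondents, 20 items

    fa = FactorAnalysis(n_components=5)        # five hypothesized dimensions
    fa.fit(responses)
    loadings = fa.components_.T                # items x factors

    CUTOFF = 0.40
    for i, item_loadings in enumerate(loadings, start=1):
        if np.abs(item_loadings).max() < CUTOFF:
            print(f"Item {i}: highest loading {np.abs(item_loadings).max():.2f} -> discard or revise")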

Step 3: Scale Evaluation


At this point in the process, a measure of a psychological construct has been developed that is both reliable and valid. Construct validity was demonstrated via concept mapping, factor analysis, internal consistency, and test-retest reliability. However, as suggested by Hinkin (1995: 979, 980),

Demonstrating the existence of a nomological network of relationships with other variables through criterion-related validity, assessing two groups who would be expected to differ on the measure, and demonstrating discriminant and convergent validity using a method such as the multitrait-multimethod matrix developed by Campbell and Fiske (1959) would provide further evidence of the construct validity of the new measure.

Criterion-related validity
Criterion-related validity is an indicator that reflects to what extent scores on the measure of the construct of interest can be related to a criterion. A criterion is some behavior or cognitive skill of interest that one wants to predict using the test scores of the construct of interest. For instance, in the case of emotional intelligence, people who score higher in emotional intelligence according to the measure would be predicted to demonstrate more sensitivity to others' problems, be able to control their impulses, and be able to label their emotions more easily than someone who scores lower on the test of emotional intelligence. Evidence of criterion-related validity would usually be demonstrated by the correlation between the test scores and the scores on a criterion performance. For emotional intelligence, the criterion performance could be showing sensitivity to others' problems, being able to label one's feelings, and so on, as judged by an expert. One way of doing this would be to have the facilitators of a sensitivity training group (T-group) judge a sample of T-group participants on the performance of the criteria. "The training or T-group is an approach to human relations training which, broadly speaking, provides participants with an opportunity to learn more about themselves and their impact on others and, in particular, to learn how to function more effectively in face-to-face situations" (Cooper & Mangham, 1971: v). As such, it is a rich environment for seeing the display of emotional intelligence. The facilitators of each T-group will supply subjective ratings of each group member's level of emotional intelligence, and these will be correlated with the observed scores of each group member on the emotional intelligence instrument, providing further evidence for the measure's validity.

Construct validity
Construct validity includes face, content, criterion-related, predictive, concurrent, convergent and discriminant validity, as well as internal consistency. Issues concerning face, content, predictive and concurrent validity have already been addressed in previous sections. As mentioned previously, construct validity is often examined using the multitrait-multimethod matrix, a wonderful method that addresses issues of convergent and discriminant validity (see Campbell and Fiske (1959) or the web pages by Trochim and Jabs for details on this method). Convergent validity is demonstrated by the extent to which the measure correlates with other measures designed to assess similar constructs.


Discriminant validity refers to the degree to which the scale does not correlate with other measures designed to assess dissimilar constructs.

In the case of emotional intelligence, the newly developed measure could be correlated with Gist's (1995) Social Intelligence measure, Riggio's (1986) Social Skills Inventory, Hogan's (1969) Empathy Scale, Snyder's (1986) Self-monitoring Scale, Eysenck's (1977) I.7 Impulsiveness Questionnaire and Watson and Greer's (1983) Courtauld Emotional Control Scale. Such correlations with specific dimensions of the emotional intelligence measure would provide evidence for convergent validity. Specifically,

Hogan's Empathy Scale should converge with the empathy subscale of the emotional intelligence instrument;

Eysenck's I.7 Impulsiveness Questionnaire should negatively correlate and Watson and Greer's Courtauld Emotional Control Scale should positively correlate with the motivating oneself subscale of the emotional intelligence instrument;

Riggio's Social Skills Inventory should converge with the handling relationships subscale of the emotional intelligence instrument; and

Gist's Social Intelligence measure should positively correlate with the self-awareness and handling relationships subscales of the emotional intelligence instrument.

The correlations of these other scales with specific subscales of the measure of emotional intelligence would be predicted to be stronger than the correlations of any of these other scales with the entire measure of emotional intelligence, thus providing evidence of discriminant validity. In addition, discriminant validity of any measure of emotional intelligence would have to address how emotional intelligence differs from other intelligences.
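In practice, these convergent and discriminant predictions reduce to inspecting a correlation matrix. The sketch below uses simulated scores; the column names only mirror two of the scales mentioned above and do not represent real data from those instruments.

    # Hedged sketch of the convergent/discriminant check on simulated scores.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 150
    df = pd.DataFrame({
        "EI_empathy_subscale": rng.normal(size=n),
        "EI_total": rng.normal(size=n),
        "Hogan_empathy": rng.normal(size=n),
        "Riggio_social_skills": rng.normal(size=n),
    })

    print(df.corr().round(2))
    # With real data, convergent validity would be supported if Hogan_empathy
    # correlated strongly with EI_empathy_subscale; discriminant validity if that
    # correlation exceeded Hogan_empathy's correlation with the full EI_total score.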

In addition, as with any measure of a psychological construct, social desirability should be assessed. One of the most popular measures of social desirability is the Crowne and Marlowe (1964) measure. Another point to be mentioned is that a different independent sample should be used at each stage in the development of any psychological construct, thus attenuating the possibility of "sample specific" findings and increasing the generalizability of the measure.

CONCLUSION
Creating a paper-and-pencil measure of a psychological construct is a lengthy and difficult process, and poor measures continue to be created and used. Some researchers may not understand or appreciate the importance of reliability and validity to proper measurement. Many researchers create measures and never validate them, instead relying on "the face validity if a measure appears to capture the construct of interest" (Hinkin, 1995: 981). In addition, because


developing sound measures is an arduous and lengthy process, many researchers take shortcuts or simply avoid the process altogether. Schmitt (1989) believes that the behavioral science field may overlook the importance of validity and reliability, instead emphasizing statistical analysis. Statistical procedures and analyses are of little importance if the data are collected with measures that have not been shown to provide reliable and valid data (Nunnally, 1978). And without sound measurement, the theoretical progress of the field is in jeopardy (Schwab, 1980).

REFERENCES
Anastasi, A. (1976). Psychological testing, 4th ed. New York: Macmillan.
Ashforth, B.E. & Humphrey, R.H. (1995). Emotion in the workplace: A reappraisal. Human Relations, 48(2), 97-125.

Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56: 81-105.

Carmines, E.G. & Zeller, R.A. (1979). Reliability and validity assessment. Beverly Hills: Sage.

Cook, T.D. & Campbell, D.T. (1979). Quasi-experimentation. Boston: Houghton Mifflin Company.

Cook, J.D., Hepworth, S.J., Wall, T.D. & Warr, P.B. (1981). The experience of work. San Diego: Academic Press.

Cooper, C.L. & Mangham, I.L. (1971). T-groups: A Survey of Research. London: Wiley-Interscience.

Cronbach, L.J. & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52: 281-302.

Crowne, D. & Marlowe, D. (1964). The approval motive: Studies in evaluative dependence. New York: Wiley.

Eysenck, S.B., Pearson, P.R., Easting, G. & Allsopp, J.F. (1985). Age norms for impulsiveness, venturesomeness and empathy in adults. Personality and Individual Differences, 6(5), 613-619.

Ford, J.K., MacCallum, R.C. & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39: 291-314.

Gardner, H. (1993). Multiple Intelligences. New York: BasicBooks.

Gist, M.E. (1995). The Social Intelligence measure.


Goleman, D. (1995). Emotional intelligence. New York: Bantam Books.

Guadagnoli, E. & Velicer, W.F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103: 265-275.

Hinkin, T.R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21(5), 967-988.

Hinkin, T.R. & Schriesheim, C.A. (1989). Development and application of new scales to measure the French and Raven (1959) bases of social power. Journal of Applied Psychology, 74(4): 561-567.

Hoelter, J.W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods and Research, 11: 325-344.

Hogan, R. (1969). Development of an empathy scale. Journal of Consulting and Clinical Psychology, 33, 307-316.

Jackson, P.R., Wall, T.D., Martin, R. & Davids, K. (1993). New measures of job control, cognitive demand and production responsibility. Journal of Applied Psychology, 78: 753-762.

Kenny, D.A. (1979). Correlations and causality. New York: Wiley.

Lissitz, R.W. & Green, S.B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60: 10-13.

Mayer, J.D. & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17, 433-442.

Nunnally, J.C. (1976). Psychometric theory, 2nd ed. New York: McGraw-Hill.

Riggio, R. (1986). Assessment of basic social skills. Journal of Personality and Social Psychology, 51(3), 649-660.

Ruisel, I. (1992). Social intelligence: Conception and methodological problems. Studia Psychologica, 34(4-5), 281-296.

Rummel, R.J. (1970). Applied factor analysis. Evanston, IL: Northwestern University Press.

Salovey, P. & Mayer, J.D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9, 185-211.

Schmitt, N.W. & Klimoski, R.J. (1991). Research methods in human resources management. Cincinnati: South-Western Publishing.

Schmitt, N.W. & Stults, D.M. (1985). Factors defined by negatively keyed items: The results of careless respondents? Applied Psychological Measurement, 9: 367-373.


Schoenfeldt, L.F. (1984). Psychometric properties of organizational research instruments. In T.S. Bateman & G.R. Ferris (Eds.), Method and analysis in organizational research. Reston, VA: Reston Publishing.

Schriesheim, C.A. & Hill, K. (1981). Controlling acquiescence response bias by item reversal: The effect on questionnaire validity. Educational and Psychological Measurement, 41: 1101-1114.

Schriesheim, C.A., Powers, K.J., Scandura, T.A., Gardiner, C.C. & Lankau, M.J. (1993). Improving construct measurement in management research: Comments and a quantitative approach for assessing the theoretical content adequacy of paper-and-pencil survey-type instruments. Journal of Management, 19: 385-417.

Schwab, D.P. (1980). Construct validity in organization behavior. In B.M. Staw & L.L. Cummings (Eds.), Research in organizational behavior, Vol. 2. Greenwich, CT: JAI Press.

Snyder, M. (1986). On the nature of self-monitoring: Matters of assessment, matters of validity. Journal of Personality and Social Psychology, 51(1), 125-139.

Stone, E. (1978). Research methods in organizational behavior. Glenview, IL: Scott, Foresman.

Thorndike, E.L. (1920). Intelligence and its uses. Harper's Magazine, 140, 227-235.

Trochim, W.M. (1991). Developing an evaluation culture for international agricultural research. In D.R. Lee, S. Kearl, and N. Uphoff (Eds.), Assessing the Impact of International Agricultural Research for Sustainable Development: Proceedings from a Symposium at Cornell University, Ithaca, NY, June 16-19. Ithaca, NY: Cornell Institute for Food, Agriculture and Development.

Trochim, W.M. (1989). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1-16.

Trochim, W.M. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9(5), 575-604.

Watson, M. & Greer, S. (1983). Development of a questionnaire measure of emotional control. Journal of Psychosomatic Research, 27(4), 299-305.

Williams, W.M. & Sternberg, R.J. (1988). Group intelligence: Why some groups are better than others. Intelligence, 12, 351-377.

Creating a hypothesis
The next step after operationalizing your concepts is to write your research question as a hypothesis. We usually write the hypothesis as a null one. For


example, your hypothesis might be: "College students in their first year of studies in a Malaysian public college who are taught time management skills (defined as learning the importance of setting aside two hours each day, at the same time and in the same place, for doing assignments and studying) will show no difference in stress (as defined by the SCL-90-R anxiety scale) after one month compared with students who were not taught time management skills." Notice that this hypothesis makes the focused research question even more specific and includes the operational definition of every important concept. This hypothesis will now guide the research method you choose.

Types of Research Design
Generally, research design is divided into:
Quantitative designs:
– Survey
– Quasi-Experimental
– Experimental
Qualitative designs:
– Historical/Comparative Studies
– Ethnographic (including Action Research)

Methodological Issues
1. Selection and assignment of subjects: Random selection is not the same

as random assignment to groups. Since random selection and assignment are both difficult to achieve, most studies are quasi-experimental and use matching of groups already created in the setting (e.g., students in a particular classroom).

2. The number of subjects in each group must account for possible drop-outs. The rule of thumb is at least 30 subjects in each "cell" that you are going to use in statistical analysis (e.g., if you analyze male versus female responses) or for each variable in your analysis.

3. Order effects from the presentation of the stimuli. For example, if you are giving the subjects two questionnaires or two scales, you need to present scale 1 first to half of the subjects and scale 2 first to the other half of the subjects (this is known as counterbalancing).
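Counterbalancing can be set up with a simple random split of the subject list. The sketch below uses invented subject IDs; in a real study the assignment would be recorded before data collection begins.

    # Randomly assigning half the subjects to each presentation order.
    import random

    subjects = [f"S{i:02d}" for i in range(1, 21)]
    random.shuffle(subjects)
    half = len(subjects) // 2

    order_assignment = {s: ("Scale 1 then Scale 2" if i < half else "Scale 2 then Scale 1")
                        for i, s in enumerate(subjects)}
    for subject, order in sorted(order_assignment.items()):
        print(subject, "->", order)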

These are just some of the methodological issues you need to consider in order to control for threats to reliability and validity. Here are some of the most common threats:

Validity and Reliability
Whichever design is used, the research must be valid and reliable.
– Internal validity = the extent to which results are accurately interpreted.
– External validity = the generalizability of results.
– Internal reliability = consistency of methods and analysis among researchers in the same setting.
– External reliability = replicability of methods and analysis by independent observers in the same or similar settings.


Some Pitfalls of Analysis & Interpretation
– Invalid design, e.g. mismatch between instrument and research questions (problem of internal validity).
– Hawthorne effect, e.g. researcher effect on subject response (problem of internal validity).
– Uncorroborated analysis (problem of internal validity & reliability in qualitative research).
– Interaction effects of multiple testing & treatments (problem of external validity in experimental & quasi-experimental research).
– History: incidents between or before testing that interfere with the effects of treatment (a problem in experimental & quasi-experimental research).

Data Analysis
Your data analysis, just as everything else, follows from your research question, the way your concepts were operationalized, and your research design.
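For the time-management hypothesis stated earlier, the analysis that follows from the design would be a comparison of the two groups' post-test stress scores. The sketch below uses an independent-samples t-test on invented numbers (the document recommends SPSS; this Python version is only an illustration, and a real study would use the SCL-90-R anxiety scale scores).

    # Illustrative test of the null hypothesis of "no difference" in stress.
    from scipy.stats import ttest_ind

    time_mgmt_group = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0, 1.4]   # invented post-test scores
    control_group   = [1.6, 1.8, 1.4, 2.0, 1.7, 1.5, 1.9, 1.6]

    t_stat, p_value = ttest_ind(time_mgmt_group, control_group)
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
    # If p < .05, reject the null hypothesis of no difference between the groups.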

Help
The information on survey instruments and on construct validity was taken from links obtained through the following website, which also has excellent guidelines for conceptualizing and writing up research: http://www.lib.csubak.edu/Dave/psyc/apaman.html. (Incidentally, Dave is David Cohen, a colleague of mine in the California State University system.)
Another useful website for getting statistics help and for downloading statistics programs (and even an online book, SPSS for Beginners) is: http://www.statistics.com.
The best book for knowing which research design and statistical analysis to use, plus how to write them up in the correct format, as well as a very clear explanation of how to use SPSS, is:
Green, S.B., Salkind, N.J. & Akey, T.M. (2000). Using SPSS for Windows: Analyzing and Understanding Data. New Jersey: Prentice Hall.
Also see the attached guidelines for what to include in each section of your proposal.