Research in Higher Education Journal Volume 38
Evidence on relationships between teaching evaluations and
ratemyprofessors teaching tags
Terrance Jalbert
University of Hawaii Hilo
ABSTRACT
This research examines relationships between teaching tags and teaching evaluations.
The paper constitutes one in a series of papers by the same author that examines
Ratemyprofessors.com data. Since 2014, Ratemyprofessors’ reviewers can assign tags that
provide detailed comments about the class and professor. The analysis here examines how these
student-assigned teaching tags relate to teaching evaluations and the willingness of a student to
take another course from the professor. This paper analyzes data for 202 business professors.
Results identify variables important and unimportant in determining teaching ratings. Professors
might focus efforts on critical areas identified here to improve their teaching ratings.
Keywords: Determinants of Teaching Evaluations, Ratemyprofessors, Teaching Tags
Copyright statement: Authors retain the copyright to the manuscripts published in AABRI journals.
Please see the AABRI Copyright Policy at http://www.aabri.com/copyright.html
INTRODUCTION
Student evaluations of teaching provide a useful method of assessing professor
performance. These evaluations allow for standardized comparison across professors and across
universities. Evidence indicates increasing popularity of teaching evaluations for determining
instructor performance. Miller and Seldin (2014) conducted a survey of 401 accredited four-year
liberal arts colleges. Results indicate that 99.3 percent of these colleges consider classroom
teaching a major factor in evaluating faculty performance. Moreover, 94.2 percent of colleges
use systematic student evaluations in determining teaching performance. Student evaluations
exceed the popularity of the second-place variable, chair evaluation, by more than fifteen percent.
Emphasis on student evaluations increased by more than six percent from 2000 to 2010.
Given the importance of teaching evaluations, it is valuable to study determinants and
potential biases in these evaluations. One difficulty associated with conducting studies on
teaching evaluations occurs because the evaluations are generally not publicly available. Thus,
most studies examining evaluations include data limited to a single university. A recently
developed instructor rating tool, called Ratemyprofessors (RMP), allows students to make public
reviews of faculty. As of June 27, 2019, the service reports ratings for 1.7 million professors
from over 7,500 schools, including more than 19 million individual ratings
(Ratemyprofessors.com, 2019).
The Ratemyprofessors tool enjoys high levels of popularity and respect among students.
Brown, Baillie and Fraser (2009) provide an excellent student survey regarding
Ratemyprofessors. Results show that 83 percent of students have visited RMP and 36 percent
have rated a professor on RMP. Students indicated that RMP ratings constitute an honest and
representative measure of teaching abilities. Indeed, 96 percent of participants believed that students provide reviews in their RMP ratings that are at least as honest as those in standard teaching evaluations. Moreover, 81 percent of students believed RMP ratings represent instructors' performance at least as well as standard teaching evaluations. Students indicated substantial intentions to use RMP ratings when making academic decisions. Some 71 percent of students avoided taking an instructor because of their RMP rating. Results show significant correlation between RMP and standard teaching ratings. However, some unexplained differences remain between the two rating methods. Their results show that standard teaching evaluations tend to be higher than RMP ratings.
Since September of 2014, RMP reviewers can assign tags to teaching evaluations.
Students can select up to three tags out of a list of 20 provided by Ratemyprofessors
(Ratemyprofessors, 2019). The Tags function allows students to assign indicators to an
evaluation that provide insights into important determinants of their evaluation. This study uses
RMP data with a focus on the relationship between Tags and overall teaching evaluations. By
examining these data for multiple professors and multiple schools, this study provides a national
study of teaching evaluation determinants.
This paper represents the third in a series that relies on variations of the same dataset. In
an earlier paper, Jalbert (2019a) combines Ratemyprofessors and Social Science Research
Network Data (SSRN) for 300 business professors to examine relationships between professor
teaching and research ratings. Results indicate no relationship between teaching and research
ratings. The author argues that teaching and research duties represent separate and distinct
activities. Results further indicate that course difficulty, expertise area, experience, tenured
status and Hotness significantly impact teaching ratings. In addition, rating profiles differ
significantly between public and private schools. Jalbert (2019b) examined the extent to which professors who excel at both teaching and research exist. Results indicate that few professors achieve excellence in both teaching and research. Fewer than 25 percent of professors placed in the top 50th percentile on both measures. Moreover, many seriously deficient professors
exist in the areas of teaching, research or both. Results show 50 percent of all professors falling
in the lower 30th percentile for teaching, research, or both.
The remainder of the paper proceeds as follows. The next section provides a review of the extant literature. The data and methodology section presents the data utilized in the study, provides summary statistics and describes how the data examination proceeded. The following section presents the empirical results and discussion. The paper closes with concluding comments and recommendations for professors.
LITERATURE REVIEW
A large body of research examines teaching evaluations. Much of this research examines
how various demographic factors relate to teaching evaluations. Marks (2000) used structural
equation modeling to examine student evaluations. He identified five latent variables in student
evaluations as follows: organization; workload and difficulty; expected grades and fairness of
grading; instructor liking and concern; and perceived learning.
Mixed evidence exists on differences in teaching ratings by gender. Some authors find
that female professors receive lower teaching evaluations than male professors. Wagner, Rieger
and Voorvelt (2016) found that women are eleven percent less likely than men to earn teaching
evaluations that meet the minimum requirements for tenure and promotion. Miles and House
(2015) found that female and male professors earn generally comparable evaluations, but that females earn noticeably
lower scores in large classes. Mengel, Sauermann and Zölitz (2018) examined a sample
including nearly 20,000 teaching evaluations. They found that women received systematically
lower teaching evaluations than their male counterparts. Basow (1995) evaluated student evaluations by gender of both the student and the professor. She found no impact of student gender on male professors' evaluations. However, male students ranked female professors lower than female students did. Similarly, Centra and Gaubatz (2000) found that female students consistently
give higher evaluations to female professors. Jalbert (2019a and 2019b) found that female
professors receive significantly higher teaching evaluations than male professors. He cautions
readers on interpreting results from these studies. He notes that his results, and those from other
similar studies, could be driven by either rater bias or gender-based differences in teaching
capabilities. In a rare paper to address both issues, Boring, Ottoboni and Stark (2016) find that
females receive lower teaching evaluations and further find that these differences are not
reflective of differences in teaching quality.
A small body of research examines the extent to which class size impacts teaching
evaluations. McPherson (2006) examined principles-level courses, finding that class size
negatively affects teaching evaluations. Feldman (1984) found a small inverse relationship
between class size and students’ overall evaluation of the teacher and course. Miles and House
(2015) found that class size impacts teaching evaluations. Linsky and Strauss (1975) found a
negative relationship between overall teaching ratings and university enrollment. This result
suggests that professors from small schools earn higher teaching evaluations than those from
large schools. However, Jalbert (2019a) found no relationship between university size and
teaching evaluations.
A stylized fact in higher education suggests that course grades significantly impact
teaching evaluations. McPherson (2006) found that expected grades significantly impact student
teaching evaluation scores. Similarly, Bilgen Susanh and Kaytaz (2015) found positive
associations between student teaching evaluations and grades in Turkey. Miles and House
(2015) examined more than 30,000 evaluations for 255 professors. Their results show that
higher teaching evaluations associate with higher expected grades. Isley and Singh (2005)
examined expected grades and cumulative grades as they relate to teaching evaluations. They
found the gap between expected grade and cumulative grades significantly explains teaching
evaluations with a negative sign. This implies that larger discrepancies lead to lower
evaluations. The authors argue this gap constitutes a more appropriate variable than expected
grade in explaining teaching evaluation scores. Brown, Baillie and Fraser (2009) found that difficulty ratings relate more strongly to overall ratings on RMP than they do on standard teaching evaluations.
A small body of literature examines the role of humor in education and in teaching
evaluations. Garner (2006) examined evaluation scores for 117 undergraduate students in a
distance education format. He found that courses with humor received significantly higher
evaluations than those without humor. Moreover, students retained more information when
humor was incorporated into the course. Banas, Dunbar, Rodriguez and Liu (2011) provided a review of the existing literature and summarized the results by noting that the use of nonaggressive, appropriate humor in the classroom associates with a more relaxed and interesting learning environment, higher instructor evaluations, stronger motivation to learn and more course enjoyment. Sojka, Gupta and Deeter-Schmelz (2010) found that faculty believe easy and
entertaining instructors receive higher teaching evaluations.
An interesting study by Lewis and McKinzie (2019) examined the impact of industry experience on teaching evaluations. They examined 355 sets of teaching evaluations and identified a positive relationship between industry experience and instructor evaluations. Interestingly, they found negative relationships between years of teaching experience and three evaluation criteria: defining expectations and objectives, effective communication and availability of the professor.
DATA AND METHODOLOGY
This paper utilizes data from Ratemyprofessors.com. Data collection began with the
sample used in Jalbert (2019a) and Jalbert (2019b). The initial sample included 300 business
professors from 104 universities, who were both listed in the Social Science Research Network
(SSRN) website and had 10 student reviews on Ratemyprofessors. Collected data includes the
teaching rating and difficulty rating of each professor from RMP. Data collection also involved recording the number of student reviews each professor received and the percentage of students who indicated a willingness to take another class from the professor, a measure RMP has reported since May 25, 2016. The analysis also involved collecting Tag data. For evaluations completed after September of 2014, students may add up to three Tags to their review, indicating how they describe the
professor and course. Ratemyprofessors aggregates these Tags and provides totals for
professors. The following list reports the twenty candidate Tags students may select from:
Gives Good Feedback, Respected, Lots of Homework, Accessible Outside Class, Get Ready to
Read, Participation Matters, Skip Class? You Won’t Pass, Inspirational, Graded by a Few
Things, Test Heavy, Group Projects, Clear Grading Criteria, Hilarious, Beware of Pop Quizzes,
Amazing Lectures, Lecture Heavy, Caring, Extra Credit, So Many Papers and Tough Grader
(Ratemyprofessors.com, 2019).
To supplement professor data, the analysis involved gathering data on the university
where the professors teach. Collected data includes University Quality, University Reputation and University Average Rating (each on a scale from 0 to 5, with 5 equaling the highest score). Finally, we identified data on the Number of Professors reviewed at the university and used this as
a measure of university size. The Public vs Private variable indicates 1 if the school is public
and 0 if private. The Professor Gender variable indicates 1 for female and 0 for male.
Next, the process involved reducing the dataset on two criteria to ensure that the dataset
included only recently active professors. To eliminate obsolete observations, data reduction
involved eliminating professors whose most recent Ratemyprofessors.com rating occurred prior
to January 1, 2017. This process reduced the dataset by 94 observations to 206 observations.
Next, the method removed four observations that included no reported Tags. This resulted in a
final dataset of 202 professor observations for analysis including 7,563 total evaluations
implying an average of just over 37 evaluations per professor.
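For concreteness, the two screening steps can be expressed as simple filters on a professor-level table. The sketch below is illustrative only; the column names (last_rating_date, total_tags) and the toy values are assumptions, not the paper's dataset.

```python
import pandas as pd

# Toy stand-in for the initial 300-professor sample; column names are assumptions.
profs = pd.DataFrame({
    "professor":        ["A", "B", "C"],
    "last_rating_date": pd.to_datetime(["2018-03-14", "2016-11-02", "2019-05-30"]),
    "total_tags":       [12, 4, 0],
})

# Screen 1: drop professors whose most recent RMP rating predates January 1, 2017.
recent = profs[profs["last_rating_date"] >= "2017-01-01"]

# Screen 2: drop professors whose reviews carry no Tags.
final = recent[recent["total_tags"] > 0]
print(final)  # in the paper, these two screens leave 202 professor observations
```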
The analysis considered five measures of teaching evaluation. Raw Evaluation as
reported by RMP constitutes the first measure. The process continued by making three
enhancements to this measure. Given the stylized fact that course difficulty impacts teaching
evaluations, the second measure of performance combines teaching ratings and course difficulty
measures. This paper equally weights the two measures to arrive at a new measure, Weighted
Evaluation, in a manner analogous to Jalbert (2019a and 2019b). Equation 1 shows the
calculations of this measure.
Next, the approach considered the possibility that students from different universities
evaluate professors in different ways. The analysis here uses RMP University Average Rating to
standardize the Raw evaluations across universities. Equation 2 shows the Standardized
Evaluation technique. By multiplying the relative evaluations by 5 in Equation 2, the
Standardized Evaluations utilize the same scale as raw and weighted evaluations. The fourth
approach both weighted and standardized the data, Standardized and Weighted Evaluation, as
shown in Equation 3. These first four measures have values ranging from 1-5 with 5 equaling
the highest. A final teaching evaluation measure uses a new measure of teaching performance
available from RMP whereby students indicate if they would take the professor again. The
Willingness to Retake variable equals the percentage of students who indicate they would take
the class again.
Weighted Evaluation = (Raw Evaluation + Course Difficulty) / 2    (1)

Standardized Evaluation = (Raw Evaluation / University Average Rating) × 5    (2)

Standardized and Weighted Evaluation = (Standardized Evaluation + Course Difficulty) / 2    (3)
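A minimal sketch of Equations 1 through 3 applied to a professor-level table appears below. The column names (raw_eval, difficulty, univ_avg_rating) and the toy values are assumptions introduced for illustration.

```python
import pandas as pd

# Toy professor-level frame; column names and values are assumptions, not the paper's data.
df = pd.DataFrame({
    "raw_eval":        [3.7, 4.2, 2.9],
    "difficulty":      [3.1, 2.5, 4.0],
    "univ_avg_rating": [3.6, 3.8, 3.5],
})

# Equation 1: equally weight the raw rating and the course difficulty rating.
df["weighted_eval"] = (df["raw_eval"] + df["difficulty"]) / 2

# Equation 2: express the rating relative to the university average and multiply
# by 5 so the result shares the 1-5 scale of the other measures.
df["standardized_eval"] = df["raw_eval"] / df["univ_avg_rating"] * 5

# Equation 3: apply the difficulty weighting to the standardized rating.
df["std_weighted_eval"] = (df["standardized_eval"] + df["difficulty"]) / 2
```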
The data included observations from 91 universities: 10 universities produced 4 professors, 25 universities produced 3 professors, 31 universities produced 2 professors and 25 universities produced 1 professor. The Number of Professors evaluated at the sample
universities ranged from under 350 to more than 8,500 indicating considerable size diversity
among the sample universities. Data includes 57 finance professors, 52 accounting professors, 52
economics professors, 28 management professors and 13 professors classified as others.
As noted earlier, students may select up to three tags to accompany their evaluation.
Table 1 (Appendix) shows available tags from which students can select. Aggregate Responses
indicates the total number of responses received by all sample professors. Recall that the total
number of evaluations in the dataset equaled 7,563. The largest number of responses equals 745
for Tough Grader followed by Skip Class You Won’t Pass and Respected with 596 and 461
responses respectively. The next two columns show the number of professors having at least one
response and the number of professors without a response for each Tag. The most common Tag
was Tough Grader. The least common tag was So Many Papers. Percent with Responses
indicates the percentage of professors with one or more responses for a Tag with values ranging
from 6.93 percent to 77.72 percent. The column Mean Number of Responses shows the average
number of responses for those professors who received the Tag. The largest mean number of
responses equals 4.745 and the minimum equals 1.071.
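The Table 1 summary columns follow directly from a professor-by-Tag count matrix. The sketch below shows one way such a summary might be computed; the frame, its column names and its values are hypothetical.

```python
import pandas as pd

# Toy tag-count frame: one row per professor, one column per Tag; values are illustrative.
tags = pd.DataFrame({
    "Tough Grader":   [5, 0, 3, 2],
    "So Many Papers": [0, 1, 0, 0],
})

summary = pd.DataFrame({
    "Aggregate Responses":    tags.sum(),
    "Professors with 1+":     (tags > 0).sum(),
    "Percent with Responses": (tags > 0).mean() * 100,
    # Mean responses among only those professors who received the Tag at least once.
    "Mean Responses":         tags.where(tags > 0).mean(),
})
print(summary.round(2))
```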
RESULTS
The analysis begins with single regressions of the dependent variables on each of the twenty-eight independent variables individually. The approach used ordinary least squares
regression and estimated the following three models for each independent variable where ɛ is a
random error term:
Raw Evaluation = α + β1(Independent Variable) + ε    (4)

Weighted Evaluation = α + β1(Independent Variable) + ε    (5)

Willingness to Retake = α + β1(Independent Variable) + ε    (6)
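Each of these single regressions can be estimated with ordinary least squares in a few lines. The sketch below, using toy data and assumed column names, illustrates the form of Equations 4 through 6 for one independent variable.

```python
import pandas as pd
import statsmodels.api as sm

# Toy data standing in for the professor-level dataset; values are illustrative only.
df = pd.DataFrame({
    "raw_eval":         [3.2, 4.1, 3.8, 2.9, 4.5, 3.6],
    "amazing_lectures": [1, 4, 3, 0, 6, 2],
})

# One OLS regression per (dependent, independent) pair, as in Equations 4-6.
y = df["raw_eval"]
X = sm.add_constant(df["amazing_lectures"])
fit = sm.OLS(y, X).fit()

print(fit.params)    # intercept and slope coefficient
print(fit.tvalues)   # t-statistics
print(fit.rsquared)  # R-squared of the kind reported in Table 2
```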
Table 2 (Appendix) shows the results. Panel A provides results for the Raw Evaluation
dependent variable. Panel B and C show results for the Weighted Evaluation dependent variable
and Willingness to Retake dependent variable respectively. Raw Evaluation results reveal eight
significant independent variables at the one percent level: Amazing Lectures, Hilarious,
Respected, Caring, Inspirational, Accessible Outside Class, Course Difficulty and Good
Feedback. An additional four variables show significance at the 5 percent level: Get Ready to
Read, Clear Grading Criteria, Expertise Area and Public Versus Private.
In each case, the data revealed coefficients consistent with the expected sign, as noted in the column labeled Expected Sign (E.S.). Each variable produced a positive coefficient with the exceptions of Get Ready to Read, Course Difficulty and Public versus Private. The data here
confirms the common conception that course difficulty affects teaching evaluations. R2 values on the regressions range from 0.0004 to 0.1707, with Course Difficulty and Amazing Lectures producing the highest values.
Regressions on Weighted Evaluations produced similar results. Again, twelve
independent variables produce significant results including: Amazing Lectures, Hilarious,
Respected, Caring, Inspirational, Accessible Outside Class, Skip Class You Won’t Pass, Good
Feedback, Expertise Area, School Reputation, Public versus Private and Number of Professors.
R2 values ranged from 0.0000 to 0.0642, with Amazing Lectures, Respected, and Good Feedback
producing the highest R2. Two significant variables in the Raw Evaluations did not reach
significance in the Weighted Evaluation results: Get Ready to Read and Clear Grading Criteria.
In addition, Course Difficulty was not incorporated as an explanatory variable in the Weighted
Evaluation examinations. Three insignificant variables in the Raw Evaluation results achieved significance in the Weighted Evaluation results: Skip Class You Won’t Pass, School Reputation and Number of Professors.
Results reveal robustness to changes in dependent variable specifications. Standardized
Evaluations (Equation 2) and Weighted Standardized Evaluations (Equation 3) produced similar
results to the raw and weighted variables presented here. For this reason, the tables here exclude
these results.
The analysis turns to an examination of the dependent variable Willingness to Retake.
Recall that the Willingness to Retake variable indicates the percentage of students who would retake the
class. Many observations did not include data for this variable. For this reason, regressions on
Retake include 108 observations. Results show the variables Amazing Lectures, Hilarious,
Respected, Caring, Assessible Outside Class, Get Ready to Read, Course Difficulty, Text Heavy,
Clear Grading Criteria, Pop Quizzes and Professor Gender significantly explain the variable
Would Retake. The positive coefficient on Professor Gender reveals that students indicate a
higher desire to retake female professors.
The approach continues with multiple regression analysis that incorporates all twenty-
eight independent variables. We estimate the following equation using ordinary least squares
regression for Raw Evaluations:
Raw Evaluation = α + β1(Amazing Lectures) + β2(Hilarious) + β3(Respected) + β4(Caring)
+ β5(Inspirational) + β6(Accessible Outside Class) + β7(Lecture Heavy)
+ β8(Participation Matters) + β9(Skip Class You Won’t Pass) + β10(Group Projects)
+ β11(Get Ready to Read) + β12(So Many Papers) + β13(Lots of Homework)
+ β14(Course Difficulty) + β15(Test Heavy) + β16(Clear Grading Criteria)
+ β17(Graded by a Few Things) + β18(Tough Grader) + β19(Good Feedback)
+ β20(Pop Quizzes) + β21(Extra Credit) + β22(Expertise Area) + β23(Professor Gender)
+ β24(University Quality) + β25(University Reputation) + β26(University Average Rating)
+ β27(Public vs Private) + β28(Number of Professors) + ε    (7)
The Weighted Evaluation regressions, while otherwise identical, exclude the Course
Difficulty independent variable. The regression on Willingness to Retake utilizes the same
independent variables noted in Equation 7.
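The sketch below illustrates the form of Equation 7 using simulated data and an abbreviated set of regressors; the column names and generated values are assumptions, not the paper's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 202  # matches the paper's sample size; the data below are simulated, not the paper's

# Toy stand-in for the professor-level dataset; only a subset of the 28 regressors is shown.
df = pd.DataFrame({
    "amazing_lectures":  rng.integers(0, 8, n),
    "respected":         rng.integers(0, 8, n),
    "lecture_heavy":     rng.integers(0, 8, n),
    "course_difficulty": rng.uniform(1, 5, n),
})
df["raw_eval"] = (5.2 - 0.46 * df["course_difficulty"]
                  + 0.05 * df["amazing_lectures"] + rng.normal(0, 0.5, n))

# Equation 7 (abbreviated): regress Raw Evaluation on all regressors jointly.
X = sm.add_constant(df[["amazing_lectures", "respected", "lecture_heavy", "course_difficulty"]])
fit = sm.OLS(df["raw_eval"], X).fit()
print(fit.summary())  # coefficients, t-statistics, R2 and adjusted R2, as reported in Table 3

# The Weighted Evaluation model is identical except that Course Difficulty is dropped.
```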
Table 3 (Appendix) shows the results. Results for Raw Evaluations, presented in Panel
A, show twelve significant variables. Interestingly, Hilarious, Caring and Accessible Outside Class do not reach the 10 percent significance level in the multiple regression. However, some
insignificant variables in the single regressions show significance in the multiple regression.
These variables include: Lecture Heavy with a negative coefficient, Graded by a Few Things
with a negative coefficient, Professor Gender with a negative coefficient and School Quality with
a negative coefficient. The resulting R2 equals 0.5196 and the Adjusted R2 equals 0.4419.
The Weighted Evaluation examination, Panel B, produces similar results; however, the
variables Clear Grading Criteria, School Quality and School Reputation do not reach a 10
percent significance level. The R2 and Adjusted R2 equal 0.3438 and 0.2420 respectively.
Willingness to Retake (Panel C) produces interesting results that differ substantially from
the previous results. Amazing Lectures, Hilarious, Caring and Inspirational do not achieve
significance in the model. Willingness to Retake reveals a positive association with Respected, Accessible Outside Class, Clear Grading Criteria, Extra Credit, and University Average Rating.
Willingness to Retake reveals a negative relationship with Lecture Heavy, Get Ready to Read,
Test Heavy, and Tough Grader. The regression results in an R2 of 0.6071 and an adjusted R2 of
0.4679.
Next, the approach applied stepwise regression to the model. Stepwise regression
objectively selects explanatory variables for inclusion in the model that increase the model fit.
Several variable selection methods exist. The analysis here uses forward selection which
involves adding variables to the model in each stage that meet the inclusion criteria. The
process stops when no further variables meet the criteria. The criteria require a ten percent
significance level for inclusion into the model.
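A minimal sketch of forward selection with a ten percent entry criterion appears below. At each stage it adds the candidate regressor with the smallest p-value, provided that p-value falls below 0.10; the data frame and column names are hypothetical, and the routine is illustrative rather than the paper's exact procedure.

```python
import pandas as pd
import statsmodels.api as sm

def forward_select(y, X, alpha=0.10):
    """Forward selection: repeatedly add the candidate with the smallest
    p-value if that p-value is below alpha; stop when no candidate qualifies."""
    chosen = []
    remaining = list(X.columns)
    while remaining:
        pvals = {}
        for var in remaining:
            fit = sm.OLS(y, sm.add_constant(X[chosen + [var]])).fit()
            pvals[var] = fit.pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Example with toy data; column names and values are assumptions.
df = pd.DataFrame({
    "raw_eval":          [3.2, 4.1, 3.8, 2.9, 4.5, 3.6, 4.0, 3.1],
    "course_difficulty": [3.9, 2.8, 3.1, 4.2, 2.5, 3.3, 2.9, 4.0],
    "amazing_lectures":  [1, 4, 3, 0, 6, 2, 5, 1],
})
selected = forward_select(df["raw_eval"], df[["course_difficulty", "amazing_lectures"]])
print(selected)  # variables listed in their order of entry
```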
Table 4 (Appendix) shows the stepwise regression results. Panel A shows results for
Raw Evaluations. Results indicate that eleven variables met the ten percent significance criteria
for inclusion in the model. Six variables show significance at the one percent level, three at the five percent level and two at the ten percent level. As one might expect, Course Difficulty
produced the highest Partial R2 at 0.1707. Amazing Lectures provides a partial R2 of 0.0880.
Lecture Heavy, Caring, Public vs Private and Graded by a Few Things round out the one
percent significance variables. The full model R2 reaches 0.4622.
Panel B shows results for Weighted Evaluations. As noted earlier, the approach excludes
Course Difficulty from these regressions. Amazing Lectures, Graded by a Few Things, Accessible Outside Class and Public vs Private achieve significance at the one percent level. University Average Rating enters the model, indicating some significant differences in evaluations across schools. The full model produces a Model R2 of 0.3126, a value lower than in the Raw Evaluation results. Interestingly, Caring was an important contributor in the Raw Evaluation results but did not enter the Weighted Evaluation results. Conversely, Gender entered the Weighted Evaluation results but did not enter the Raw Evaluation results.
Panel C shows the Willingness to Retake results. The results here differ substantially
from the Raw and Weighted Evaluation results. Test Heavy, which did not enter the Panel A or
B results, entered the model here in Stage 2. It produced a partial R2 of 0.1370 indicating the
type of testing impacts Willingness to Retake. Another interesting finding shows that Amazing
Lectures and Course Difficulty did not enter the Willingness to Retake model. The full model
resulted in a Model R2 of 0.5502.
Next we aggregate the results to glean collective inferences. Examining the combined
results in this fashion allows for identification of independent variables that exhibit robustness to
measurement differences. Table 5 (Appendix) shows the combined results. Panels A and B
report results for the Single and Multiple regressions respectively. A positive notation indicates
the variable entered the model significantly with a positive coefficient. A negative notation
indicates the variable entered the model significantly with a negative coefficient. Variables
without a notation did not enter the model. Panel C reports results for the stepwise regression.
The figures in the cells show the sequence in which the variables entered the model.
Not surprisingly, Amazing Lectures shows positive significance in each simple and two
multiple regressions. It also entered two stepwise regressions. The only regressions that
Amazing Lectures did not enter as significant were the multiple and stepwise regressions on
Willingness to Retake. This suggests the obvious path of improving lecture quality for teaching
evaluation improvement. Hilarious generated significant positive results in the simple
regressions for each dependent variable. However, it did not result in significant coefficients in
the multiple or stepwise regressions. This finding suggests that research on the relationships
between humor and teaching evaluations should carefully consider appropriate control variables.
Respected constituted the only variable significant in every regression, entering with a positive coefficient in each case. Clearly, Respected impacts teaching evaluations.
Nevertheless, it is difficult to identify specific actions a professor might take to increase
the amount of respect they enjoy. This finding suggests the need for additional research to
identify variables related to students’ respect of a professor.
Caring yielded significant results in the single regressions with a positive sign and in two
of the stepwise regressions. However, results did not reach ten percent significance in the
multiple regressions. Overall, it appears advantageous for professors to increase their level of
caring. Inspirational was significant in two of three single and multiple regressions, with
insignificant results in both cases for the Retake variable. Results produced mixed signs on the
Inspirational variable with positive coefficients in the single regressions, but negative
coefficients in the multiple regressions. This variable entered each of the stepwise regressions
late in the sequence and with marginal significance. While the result is interesting, it remains
difficult to pinpoint specific actions a professor might take to increase perceived inspiration
levels.
Accessible Outside of Class delivered significantly positive coefficients in each single
regression and in one multiple regression. The variable entered each stepwise regression, as early as the third stage. Students clearly value accessibility. Moreover, accessibility falls
directly under control of the professor. For this reason, improving accessibility appears
beneficial for professors seeking higher evaluation scores.
Lecture Heavy generated significant negative coefficients in each multiple regression but
did not achieve significance in the single regressions. It entered each stepwise regression in the
earlier stages. It appears that including other classroom activities, in addition to lectures, might
produce higher teaching evaluations. Results show little effect of Participation Matters, Skip
Class you Won’t Pass and Group Projects on teaching evaluations.
Get Ready to Read yielded significant negative coefficients in two single regressions and
one multiple regression. It also entered two stepwise regressions. The result implies that
professors should carefully select reading materials for their courses and might eliminate
noncritical reading. The variables So Many Papers and Lots of Homework did not produce any
significant results.
Course Difficulty produced significant coefficients in the Raw Evaluation and Willingness to Retake single regressions, but only for Raw Evaluation in the multiple regressions. It was the most significant item in the Raw
Evaluation stepwise regression but did not enter the other stepwise regressions. The negative
coefficient indicates that less difficult courses receive higher ratings, suggesting one path for professors to increase their evaluation scores.
Next, consider grading techniques used by the professor. Results show Test Heavy
impacts Willingness to Retake negatively but did not show an impact for the other dependent
variables. Clear Grading Criteria produced positive coefficients in the Raw Evaluations and
Willingness to Retake regressions, but did not show significance in any Weighted Evaluation
regressions. Graded by a Few Things resulted in negative coefficients in the raw and weighted
multiple regressions and entered each of the stepwise regressions. Tough Grader generated a
negative coefficient for Willingness to Retake and entered two stepwise regressions. Good
Feedback yielded positive coefficients for the Raw and Weighted single and multiple regressions
and entered the stepwise regressions for these two dependent variables. Pop Quizzes showed a
negative coefficient on the Willingness to Retake single regression but did not result in a
significant coefficient for any other model. Extra Credit returned only one significant coefficient, which was positive, in the Willingness to Retake multiple regression. The combined evidence suggests that professors wishing to improve their
evaluations should focus heavily on specifying and adhering to unambiguous grading criteria.
They should work to make their examinations representative of all materials covered. Finally,
providing good feedback leads to better teaching evaluations.
Professor Expertise Area yielded significant results with mixed coefficient signs. These
results suggest that further research examining evaluations by area, using a larger dataset, might
produce interesting results. Similarly, Professor Gender produces coefficients with mixed signs,
again suggesting additional research to further understand the relationship. University
Reputation, Average University Rating and Number of Professors produced sporadic significant
results. The variability in those results makes it difficult to draw inferences. Negative coefficients on the Public vs Private single regressions, and the variable's entry into two stepwise regressions, indicate systematic differences between public and private university evaluations.
Public university professors appear to earn lower evaluations than private school professors. It
remains unclear whether these differences imply private schools provide better professors to their students or whether systematic bias exists in the evaluations. Future research might shed additional light
on this issue.
CONCLUDING COMMENTS
This paper examines determinants of business professor teaching evaluations. The
examination utilizes newly available data from Ratemyprofessors.com. Ratemyprofessors.com
provides a public forum for evaluating professors that is standardized across professors,
universities and nations. A recently added feature of Ratemyprofessors.com allows students to
incorporate Tags into their evaluations. These Tags allow students to identify selected
characteristics of the professor and course that influenced their evaluation. This paper examines
relationships between these Tags and associated teaching evaluations. The paper utilizes data for
202 business professors.
Results indicate some variables impact teaching evaluations, while others produce limited
or no evidence of an impact on evaluations. The analysis here confirms that course difficulty
impacts teaching ratings in a negative way. As one would expect, higher quality lectures relate to
higher teaching evaluations. While respected professors receive significantly higher
evaluations, it remains unclear what specific actions professors might undertake to improve in this
category. Evidence indicates that professors who make themselves more available outside of
class can improve their teaching evaluations. Providing clear grading criteria and good feedback
might further lead to higher teaching evaluations. Professors should manage required reading to
minimize student workload whenever possible. Professors wishing to improve their teaching
evaluations should focus on these areas as well as other significant areas noted in this research.
Like all research, this paper includes some limitations. These limitations include the
relatively small dataset utilized in the paper. Future research might verify the results here when
additional data becomes available through RMP. Further, a considerable body of literature
suggests that female and male students evaluate professors differently. RMP does not report
evaluator gender, making it impossible to assess these effects in the current research. Students
may list only three Tags per review, and they do not rank the Tags by importance.
Enhanced data on these metrics might provide new and interesting results.
REFERENCES
Banas, J.A., N. Dunbar, D. Rodriguez and S.J. Liu (2011) “A Review of Humor in Educational
Settings: Four Decades of Research,” Communication Education, Vol. 60(1), p. 115-144
Bilgen Susanh, Z., and M. Kaytaz (2015) “Determinants of Student Evaluation of Teaching:
Evidence from Turkey,” Journal of Business & Economic Policy, vol. 2(1) p. 121-134
Boring, A., K. Ottoboni and P.B. Stark (2016) “Student Evaluations of Teaching (mostly) do not Measure
Teaching Effectiveness,” Science Open Research, 11 pages, (DOI: 10.14293/S2199-
1006.1.SOR-EDU.AETBZC.v1)
Basow, S. A. (1995) “Student Evaluations of College Professors: When Gender Matters.”
Journal of Educational Psychology, vol. 87(4) p. 656-665
Brown, M.J., M. Baillie, and S. Fraser (2009) “Rating Ratemyprofessors.com: A Comparison of
Online and Official Student Evaluations of Teaching,” College Teaching, Vol. 57(2), p.
89-92
Centra, J. A. and N. B. Gaubatz (2000) “Is there Gender Bias in Student Evaluations of
Teaching?,” Journal of Higher Education, vol. 71(1) p. 17-33
Feldman, K.A. (1984) “Class Size and College Students’ Evaluations of Teachers and Courses:
A Closer Look,” Research in Higher Education, Vol. 21(1) p. 45-116.
Garner, R.L. (2006) “Humor in Pedagogy: How Ha-Ha Can Lead to Aha!,” College Teaching,
Vol. 54(1) p. 177-180
Isley, P. and H. Singh (2005) “Do Higher Grades Lead to Favorable Student Evaluations?,”
Journal of Economic Education, Vol. 36(1), p. 29-42.
Jalbert, Terrance (2019a) “Relationships between Business Faculty Teaching and Research
Ratings,” Research in Higher Education Journal, Vol. 36, p. 1-20
Jalbert, Terrance (2019b) “Do Business Professors Who Excel at Both Teaching and Research
Exist?” Research in Higher Education Journal, Vol. 37, p. 1-17
Lewis, V.J., and K. McKinzie (2019) “Impact of Industry and Teaching Experience, Course
Level and Department on Student Evaluations,” Quarterly Review of Business
Disciplines, Vol. 5(4), p. 335-373
Linsky, A.S. and M.A. Strauss (1975) “Student Evaluations, Research Productivity and Eminence
of College Faculty,” Journal of Higher Education, Vol. 46(1) p. 89-102
Marks, R.B. (2000) “Determinants of Student Evaluations of Global Measures of Instructor and
Course Value,” Journal of Marketing Education, Vol. 22(2) p. 108-119
McPherson, M. A. (2006), “Determinants of How Students Evaluate Teachers,” Research in
Economic Education, Vol.37(1, Winter) p. 3-20
Mengel, F., J. Sauermann and U. Zölitz (2018), “Gender Bias in Teaching Evaluations,” Journal
of the European Economic Association, Vol. 17(2), p. 535-566
Miles, P. and D. House (2015) “The Tail Wagging the Dog: An Overdue Examination of Student
Teaching Evaluations,” International Journal of Higher Education, Vol. 4(2), p. 116-126
Miller, J.E. and P. Seldin (2014) “Changing Practices in Faculty Evaluation,” American
Association of University Professors, Reports and Publications, (May-June), downloaded
June 25, 2019 from www.aaup.org/article/changing-practices-faculty-
evaluation#.XRJN4HdFyUk
Ratemyprofessors.com, accessed June 27, 2019 at: www.ratemyprofessors.com/About.jsp
Sojka, J., A.K. Gupta and D.R. Deeter-Schmelz (2010) “Student and Faculty Perceptions of
Student Evaluations of Teaching: A Study of Similarities and Differences,” College
Teaching, Vol. 50(2) p. 44-49
Wagner, N., M. Rieger and K. Voorvelt (2016) “Gender, Ethnicity and Teaching Evaluations:
Evidence from Mixed Teaching Teams,” Economics of Education Review, Vol. 54
(October), p. 79-94
APPENDIX
Table 1: Number of Responses
Ratemyprofessors.com Tag / Aggregate Responses / Professors with One or More Response / Professors without a Response / Percent with Responses / Mean Number of Responses
Amazing Lectures 287 93 109 46.04 3.086
Hilarious 207 72 130 35.64 2.875
Respected 461 135 67 66.83 3.415
Caring 331 113 89 55.94 2.929
Inspirational 155 75 127 37.13 2.067
Accessible Outside Class 198 95 107 47.03 2.084
Lecture Heavy 374 129 73 63.86 2.899
Participation Matters 247 97 105 48.02 2.546
Skip Class You Won’t Pass 596 138 64 68.32 4.319
Group Projects 163 65 137 32.18 2.508
Get Ready to Read 302 103 99 50.99 2.932
So Many Papers 15 14 188 6.93 1.071
Lots of Homework 379 110 92 54.46 3.446
Test Heavy 261 109 93 53.96 2.395
Clear Grading Criteria 360 128 74 63.37 2.813
Graded by a Few Things 158 83 119 41.09 1.904
Tough Grader 745 157 45 77.72 4.745
Good Feedback 343 128 74 63.37 2.680
Pop Quizzes 78 33 169 16.34 2.364
Extra Credit 180 59 143 29.21 3.051
Would Take Again 108
This table identifies twenty Tags that students may apply in the process of rating a professor.
Students may select up to three Tags for each professor review. Data includes 202 professors
and 7,563 individual evaluations. Aggregate Responses indicates the total number of responses
received by all sample professors. Professors with one or more response equals the number of
professors with at least one response for the Tag. Professors without a response shows the
number of professors who did not receive a response for the tag. Percent with responses
indicates the percentage of professors that had one or more responses. Mean number of
responses specifies the average number of responses for those professors who received the Tag.
Table 2: Single Regression Results
Dependent Var Panel A: Raw Evaluations. (N=202) Panel B: Weighted Evaluations. (N=202)
Independent Variable E.S Intercept Coef. T-Stat R2 Intercept Coef. T-Stat R2
Amazing Lectures + 3.5573 0.0879 4.88*** 0.1064 3.4363 0.0351 3.70*** 0.0642
Hilarious + 3.6221 0.0586 2.98*** 0.0424 3.4600 0.0255 2.5** 0.0303
Respected + 3.5445 0.0603 4.55*** 0.0938 3.4331 0.0233 3.34*** 0.0527
Caring + 3.6052 0.0470 3.31*** 0.0519 3.4586 0.0168 2.27** 0.0251
Inspirational + 3.6018 0.1048 3.58*** 0.0601 3.4582 0.0365 2.38** 0.0275
Accessible Outside Class + 3.6015 0.0824 2.85*** 0.0390 3.4446 0.0423 2.85** 0.0390
Lecture Heavy ? 3.7076 -0.0137 -1.01 0.0051 3.4990 -0.0069 -0.99 0.0049
Participation Matters ? 3.6618 0.01665 0.63 0.0020 3.4914 -0.0043 -0.32 0.0005
Skip Class You Won’t Pass ? 3.6927 -0.0036 -0.32 0.0005 3.5435 0.0111 1.97* 0.0190
Group Projects ? 3.6993 -0.0213 -0.81 0.0033 3.4889 -0.0035 -0.26 0.0003
Get Ready to Read - 3.7612 -0.0528 -2.57** 0.0321 3.4953 -0.0062 -0.57 0.0016
So Many Papers - 3.6907 -0.1154 -0.65 0.0021 3.4866 -0.0058 -0.06 0.0000
Lots of Homework - 3.6899 -0.0041 -0.27 0.0004 3.4661 0.0107 1.37 0.0093
Course Difficulty - 5.2069 -0.4634 -6.42*** 0.1707
Test Heavy ? 3.7234 -0.0319 -1.29 0.0082 3.4789 0.0046 0.44 0.0010
Clear Grading Criteria + 3.6238 0.0327 2.21** 0.0239 3.4761 0.0056 0.73 0.0027
Graded by a Few Things - 3.7031 -0.0268 -0.81 0.0033 3.5030 -0.0215 -1.26 0.0079
Tough Grader - 3.7072 -0.0068 -0.85 0.0036 3.4645 0.0059 1.44 0.0102
Good Feedback + 3.4597 0.0780 3.86*** 0.0695 3.4274 0.0346 3.30*** 0.0516
Pop Quizzes ? 3.7006 -0.0477 -1.25 0.0078 3.4889 -0.0071 -0.36 0.0007
Extra Credit + 3.6734 0.0099 0.78 0.0030 3.4837 0.0027 0.42 0.0009
Expertise Area ? 3.5380 0.0997 2.45** 0.0292 3.5407 -0.0378 -1.80* 0.0159
Professor Gender ? 3.632 0.1712 1.56 0.0121 3.5066 -0.0702 -1.24 0.0077
School Quality + 3.6819 0.0001 0.00 0.0000 3.0857 0.1033 1.25 0.0078
School Reputation + 2.9410 0.1875 1.43 0.0101 2.8761 0.1543 2.30** 0.0258
School Average Rating + 2.6839 0.2974 0.37 0.0007 1.7803 0.4570 1.25 0.0077
Public vs Private ? 3.8565 -0.2515 -2.35** 0.0268 3.5710 -0.1224 -2.22** 0.0240
Number of Professors - 3.7797 -0.0000 -1.23 0.0076 3.5687 -0.0000 -2.04** 0.0205
Dependent Var Panel C: Willingness to Retake (N=108)
Independent Variable E.S. Intercept Coef. T-Stat R2
Amazing Lectures + 66.5734 1.6030 2.31** 0.0479
Hilarious + 67.2862 1.5415 2.14** 0.0415
Respected + 63.7992 1.8399 3.33*** 0.0947
Caring + 66.4809 1.3041 2.50** 0.0556
Inspirational + 68.1423 1.6635 1.49 0.0204
Accessible Outside Class + 65.6756 2.7307 2.49** 0.0552
Lecture Heavy ? 70.2533 -0.1346 -0.26 0.0006
Participation Matters ? 67.3451 1.5404 1.47 0.0200
Skip Class You Won’t Pass ? 70.4735 -0.1240 -0.29 0.0008
Group Projects ? 70.1099 -0.1922 -0.20 0.0004
Get Ready to Read - 73.6551 -1.7340 -2.18** 0.0428
So Many Papers - 70.9909 -8.4295 -1.30 0.0158
Lots of Homework - 71.1197 -0.4613 -0.80 0.0061
Course Difficulty - 105.3084 -10.7697 -2.83*** 0.0701
Test Heavy ? 74.3714 -2.2576 -2.38** 0.0506
Clear Grading Criteria + 67.2959 0.09337 1.70* 0.0266
Graded by a Few Things - 72.1418 -1.8639 -1.52 0.0213
Tough Grader - 72.2589 -0.4228 -1.44 0.0191
Good Feedback + 66.7817 1.3096 1.62 0.0243
Pop Quizzes ? 71.9727 -3.5007 -2.57** 0.0587
Extra Credit + 69.1312 0.5177 1.17 0.0128
Expertise Area ? 66.0349 2.3572 1.12 0.0117
Professor Gender ? 66.5641 12.0026 2.83** 0.0489
University Quality + 31.7040 9.8684 1.31 0.0160
University Reputation + 44.8546 6.3730 1.00 0.0094
University Average Rating + -106.3552 47.1721 1.42 0.0187
Public vs Private ? 66.6786 4.3464 0.81 0.0061
Number of Professors - 66.8148 0.0014 0.85 0.0067
This table shows results of simple regressions. ***, ** and * indicate significance at the 1, 5 and
10 percent levels respectively. E.S. indicates the expected sign.
Table 3: Multiple Regressions
Dependent Variable Panel A: Raw Evals. Panel B: Weighted Evals. Panel C: Retake
Independent Variable E.S. Coefficient T-Stat Coefficient T-Stat Coefficient T-Stat
Intercept 2.8043 1.1068 -221.0462
Amazing Lectures + 0.04763 1.68* 0.03813 2.25** -0.3205 -0.28
Hilarious + 0.0087 0.41 -0.0116 -0.93 1.2648 1.55
Respected + 0.5637 3.00*** 0.0189 1.69* 1.9910 2.19**
Caring + 0.0407 1.58 0.0162 1.05 0.6376 0.62
Inspirational + -0.07604 -2.05** -0.0520 -2.35** -1.3209 -0.86
Accessible Outside Class + 0.04978 1.46 0.02936 1.43 2.7579 2.12**
Lecture Heavy ? -0.0857 -3.98*** -0.0490 -3.81*** -1.844 -2.02**
Participation Matters ? -0.0253 -1.08 -0.01657 -1.18 0.7197 0.77
Skip Class You Won’t Pass ? -0.0105 -0.67 0.0045 0.49 0.3703 0.61
Group Projects ? -0.0394 -1.64 -0.0060 -0.42 0.1516 0.17
Get Ready to Read - -0.0328 -1.48 -0.0079 -0.60 -1.6226 -1.90*
So Many Papers - -0.0607 -0.41 -0.0496 -0.56 -2.8011 -0.52
Lots of Homework - 0.0080 0.48 0.0068 0.69 -0.2580 -0.40
Course Difficulty - -0.3273 -4.27*** 3.8287 0.83
Test Heavy ? -0.0249 -0.73 0.0020 0.10 -3.5070 -2.75***
Clear Grading Criteria + 0.05268 2.12** 0.0005 0.03 2.2264 2.42**
Graded by a Few Things - -0.0793 -2.06** -0.0548 -2.39** -2.2798 -1.55
Tough Grader - -0.0086 -0.59 0.0071 0.82 -1.1295 -2.05**
Good Feedback + 0.0563 2.38** 0.0316 2.23** -0.5624 -0.58
Pop Quizzes ? 0.0027 0.08 -0.0058 -0.30 -1.4596 -1.21
Extra Credit + 0.0025 0.12 0.0040 0.26 1.8500 2.33**
Expertise Area ? 0.0602 1.66* -0.0368 -1.86* 2.4976 1.17
Professor Gender ? -0.1605 -1.67* -0.1505 -2.65*** 6.4030 1.27
University Quality + -0.5544 -1.92* -0.0950 -0.55 3.7277 0.27
University Reputation + 0.6119 2.49** 0.2131 1.45 7.9396 0.69
University Average Rating + 0.4593 0.80 0.5549 1.62 59.4311 2.11**
Public vs Private ? -0.1451 -1.40 -0.0870 -1.40 5.8166 1.11
Number of Professors - -0.0000 -0.10 -0.0000 -1.08 0.0015 1.00
F-Statistic 6.68*** 3.38*** 4.36***
R2 0.5196 0.3438 0.6071
Adj R2 0.4419 0.2420 0.4679
This table shows results of multiple regressions including all twenty-eight independent variables
on the dependent variables. The indicators ***, ** and * indicate significance at the 1, 5 and 10
percent levels respectively. E.S. indicates the expected sign.
Table 4: Stepwise Regression Results
Dependent Variable Panel A: Raw Evaluations
Variable Entered / Number of Variables / Partial R2 / Model R2 / F Value
Course Difficulty 1 0.1707 0.1707 41.17***
Amazing Lectures 2 0.0880 0.2587 23.61***
Lecture Heavy 3 0.0431 0.3018 12.23***
Caring 4 0.0421 0.3439 12.65***
Public vs Private 5 0.0262 0.3701 8.16***
Graded by a Few Things 6 0.0299 0.4000 9.71***
Accessible Outside Class 7 0.0157 0.4157 5.20**
Get Ready to Read 8 0.0138 0.4295 4.67**
Respected 9 0.0140 0.4435 4.83**
Clear Grading Criteria 10 0.0103 0.4538 3.62*
Inspirational 11 0.0084 0.4622 2.95*
Panel B: Weighted Evaluations
Amazing Lectures 1 0.0642 0.0642 13.72***
Graded by a Few Things 2 0.0413 0.1055 9.19***
Accessible Outside Class 3 0.0329 0.1384 7.56***
Public vs Private 4 0.0340 0.1724 8.09***
Lecture Heavy 5 0.0211 0.1935 5.13**
Expertise Area 6 0.0231 0.2166 5.75**
Tough Grader 7 0.0219 0.2385 5.58**
University Average Rating 8 0.0146 0.2531 3.78*
Professor Gender 9 0.0107 0.2638 2.79*
Good Feedback 10 0.0124 0.2762 3.27*
Inspirational 11 0.0114 0.2875 3.03*
Respected 12 0.0121 0.2996 3.29*
University Reputation 13 0.0130 0.3126 3.59*
Panel C: Willingness to Retake
Respected 1 0.0947 0.0947 11.09***
Test Heavy 2 0.1370 0.2317 18.73***
Caring 3 0.0757 0.3075 11.37***
Tough Grader 4 0.0350 0.3424 5.48**
Clear Grading Criteria 5 0.0251 0.3675 4.04**
University Quality 6 0.0257 0.3932 4.28**
Lecture Heavy 7 0.0280 0.4212 4.83**
University Average Rating 8 0.0252 0.4464 4.51**
Extra Credit 9 0.0341 0.4804 6.42**
Graded by a Few Things 10 0.0205 0.5010 3.99**
Accessible Outside Class 11 0.0148 0.5158 2.94*
Get Ready to Read 12 0.0173 0.5331 3.52*
Inspirational 13 0.0170 0.5502 3.56*
This table shows stepwise regression results. The approach utilizes forward selection criteria
requiring significance at the ten percent level for inclusion in the model. The indicators ***, **
and * indicate significance at the 1, 5 and 10 percent levels respectively.
Table 5: Aggregated Results
Analysis Type Panel A: Single Regressions Panel B: Multiple Regressions Panel C: Stepwise Regressions
Dependent Variable E.S. Raw Weight Retake Raw Weight Retake Raw Weight Retake
Amazing Lectures + + + + + + 2 1
Hilarious + + + +
Respected + + + + + + + 9 12 1
Caring + + + + 4 3
Inspirational + + + - - 11 11 13
Accessible Outside Class + + + + + 7 3 11
Lecture Heavy ? - - - 3 5 7
Participation Matters ?
Skip Class You Won’t Pass ? +
Group Projects ?
Get Ready to Read - - - - 8 12
So Many Papers -
Lots of Homework -
Course Difficulty - - - - 1
Test Heavy ? - - 2
Clear Grading Criteria + + + + + 10 5
Graded by a Few Things - - - 6 2 10
Tough Grader - - 7 4
Good Feedback + + + + + 10
Pop Quizzes ? -
Extra Credit + + 9
Expertise Area ? + - + - 6
Professor Gender ? + - - 9
University Quality + - 6
University Reputation + + + 13
University Average Rating + + 8 8
Public vs Private ? - - 5 4
Number of Professors - -
This table shows the combined empirical analysis results. In Panels A and B, a positive notation
indicates the variable entered the model significantly with a positive coefficient. A negative
notation indicates the variable entered the model significantly with a negative coefficient.
Variables without a notation did not enter the model significantly. In Panel C, the cells denote
the sequence or stage with which the variable entered the model. E.S. indicates the expected
sign.