The Assessment of Teaching at a Large Urban Community College

The Assessment of Teaching at a Large Urban Community College. Terri M. Manning and Denise Wells, Central Piedmont Community College; Lily Hwang, Morehouse College; Lynn Delzell, UNC-Charlotte. Presentation made to AIR, May 19th, 2003 – Tampa, FL


Transcript of The Assessment of Teaching at a Large Urban Community College

Page 1: The Assessment of Teaching at a Large Urban Community College

The Assessment of Teaching at a Large Urban Community College

Terri M. Manning and Denise Wells, Central Piedmont Community College

Lily Hwang, Morehouse College

Lynn Delzell, UNC-Charlotte

Presentation made to AIR, May 19th, 2003 – Tampa, FL

Page 2: The Assessment of Teaching at a Large Urban Community College

Why do we evaluate teaching?

We do teaching evaluation for two reasons (heavy emphasis on the 1st):

1. So faculty will have feedback from students that can be used to improve teaching.

2. So chairs/division directors can have one consistent indicator of students’ perceptions about faculty (especially part-timers). These are often used as one of several means of teaching assessment for merit.

Page 3: The Assessment of Teaching at a Large Urban Community College

Problems in General with “Evaluation of Teaching” Tools

Most are created internally

Committees don’t always start at the beginning – “what is good teaching?”

Most are not tested for (at least) validity and reliability

Many are thrown together rather quickly by a committee whose goal is a usable survey tool

Page 4: The Assessment of Teaching at a Large Urban Community College

Very Few Tools are For Sale

Institutions are unique and what they want to measure is unique (undergraduate, graduate, continuing ed, literacy and distance ed courses)

Because most institutions see them for what they are…. happiness coefficients

No one will stand behind them… “our tool is a valid measure of teaching”

They would never stand up in court

So be very careful! Never cite your teaching eval as a reason for not renewing a contract.

Page 5: The Assessment of Teaching at a Large Urban Community College

Problems with the use of them…..

The scores are used inappropriately and sometimes unethically (or at least stupidly)

They are used for merit pay, promotion and tenure

Scores are treated like gospel - “you are a bad teacher because you scored below the department mean on the tool”

Page 6: The Assessment of Teaching at a Large Urban Community College

Problems with use, cont.

Critical at the community college where 100% of the job description is “to teach”

Used to make hiring and firing decisions

Teachers are placed in a “catch-22” situation (do I pretend this tool measures teaching or blow it off….. you could be in trouble either way)

Who is included in group means for comparison purposes?

Page 7: The Assessment of Teaching at a Large Urban Community College

A Misconception

You get a bunch of people together

Throw a bunch of questions together

Call it a teaching evaluation tool

And “hocus pocus” it is a valid, reliable, sensitive and objective tool

You can make merit, promotion and tenure decisions with it… no problem

Page 8: The Assessment of Teaching at a Large Urban Community College

What Makes a Good Questionnaire?

Validity – it truly (with proof) tests what it says it tests (good teaching)

Reliability – it tests it consistently over time or over terms, across campuses and methods

Sensitivity (this is critical) – it picks up fine or small changes in scores – when improvements are made, they show up (difficult with a 5-point Likert scale)

Objectivity – participants can remain objective while completing the tool – it doesn’t introduce bias or cause reactions in subjects

Page 9: The Assessment of Teaching at a Large Urban Community College

Problems Inherent in Teaching Evaluation with Validity

What is “good teaching”?

It isn’t the same for all teachers

It isn’t the same for all students

We know it when it is not there or “absent”

Yet, we don’t always know it when we see it (if the style is different than ours)

Who gets to define good teaching?

How do you measure good teaching?

How can you show someone how to improve it based on a “Likert-scale” tool (this is how you raise your mean by .213 points)?

Page 10: The Assessment of Teaching at a Large Urban Community College

Problems Inherent in Teaching Evaluation with Reliability

Students’ perceptions change (e.g. giving them the survey just after a tough exam versus giving it to them after a fun group activity in class)

From class to class of the same course, things are not consistent

Too much is reliant on the student’s feeling that day (did they get enough sleep, eat breakfast, break up with a boyfriend, feel depressed, etc.)

Faculty are forced into a standard bell curve on scores

There is often too much noise (other interactive factors, e.g. student issues, classroom issues, time of day)

Page 11: The Assessment of Teaching at a Large Urban Community College

Greatest Problem …. Sensitivity

Likert scales of 1-5 leave little room for improvement

Is a faculty member with a mean of 4.66 really a worse teacher than a faculty member with a mean of 4.73 on a given item?

Can you document for me exactly how one can improve their scores?

In many institutions, faculty have learned how to abuse these in their merit formulas

Faculty with an average mean across items of 4.88 still don’t get into the highest rung of merit pay

Page 12: The Assessment of Teaching at a Large Urban Community College

The Standard Bell Curve

[Figure: standard normal curve. Horizontal axis: standard deviations from the mean (-3 to 3). Share of cases per band, left to right: 2.14%, 13.59%, 34.12%, 34.12%, 13.59%, 2.14%.]

Page 13: The Assessment of Teaching at a Large Urban Community College

IQ – An Example of a (somewhat) Normally Distributed Item (key is range)

[Figure: distribution of scaled IQ scores. Horizontal axis: scaled IQ score (55, 70, 85, 100, 115, 130, 145). Mean = 100, Standard Deviation = 15.]

Page 14: The Assessment of Teaching at a Large Urban Community College

The Reality of Our Tool - Question #1 (17,734 responses from Fall 2000)

1. The instructor communicates course objectives, expectations, attendance policies and assignments.

[Figure: bar chart of the percent of responses at each rating]

Rating   Percent
5        67.9%
4        20.5%
3         9.4%
2         1.5%
1         0.5%

Item Mean = 4.54, Standard Deviation = .77
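The item mean and standard deviation reported for question 1 can be recomputed from the response percentages on the slide. The following is a minimal sketch, assuming the five percentages are the whole distribution (they sum to 99.8 because of rounding, so the code renormalizes) and using a population standard deviation. The function name is illustrative and not part of the original analysis.

```python
import math

# Response percentages for question 1 as reported on the slide (Fall 2000).
distribution = {5: 67.9, 4: 20.5, 3: 9.4, 2: 1.5, 1: 0.5}


def mean_and_sd(dist):
    """Return (mean, population SD) of a rating distribution given as percentages."""
    total = sum(dist.values())  # 99.8 here because of rounding, so renormalize
    mean = sum(score * pct for score, pct in dist.items()) / total
    variance = sum(pct * (score - mean) ** 2 for score, pct in dist.items()) / total
    return mean, math.sqrt(variance)


item_mean, item_sd = mean_and_sd(distribution)
top_two_share = distribution[5] + distribution[4]
print(f"mean={item_mean:.2f} sd={item_sd:.2f} share of 4s and 5s={top_two_share:.1f}%")
```

With almost nine in ten responses at 4 or 5, the distribution is piled against the top of the scale, which is the point the slide makes against treating it as a bell curve.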

Page 15: The Assessment of Teaching at a Large Urban Community College

What Would the Scores Look Like?

Scores Forced into a Bell Curve

Standard Deviations Above and Below the Mean:   -3     -2     -1     Mean   +1     +2     +3
Score:                                          2.23   3.00   3.77   4.54   5.31   6.08   6.85

Maximum Score = 5
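The scores on this slide follow from the question 1 item mean (4.54) and standard deviation (.77) reported on the previous slide: each one is the mean plus or minus a whole number of standard deviations. A short sketch of that arithmetic, with illustrative names, shows which of those points cannot exist on a 5-point scale.

```python
MEAN, SD, MAX_SCORE = 4.54, 0.77, 5.0  # question 1 values from the slides


def cut_points(mean, sd, spread=3):
    """Score located k standard deviations from the mean, for k = -spread..spread."""
    return {k: round(mean + k * sd, 2) for k in range(-spread, spread + 1)}


points = cut_points(MEAN, SD)
# Any point above the scale maximum is a score no instructor can actually earn.
unreachable = {k: score for k, score in points.items() if score > MAX_SCORE}

for k, score in points.items():
    flag = " (above the maximum score)" if k in unreachable else ""
    print(f"{k:+d} SD: {score:.2f}{flag}")
```

Everything at or beyond one standard deviation above the mean is off the scale, so the upper half of a forced bell curve has nowhere to go.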

Page 16: The Assessment of Teaching at a Large Urban Community College

How We Developed the Student Opinion Survey at CPCC

We started with the old tool

An analysis was done (it was rather poor and proof of administrative reactions to current issues)

The old tool contained 20 questions mostly about the business of teaching (handing back exams, speaking clearly, beginning class on time, etc.)

91% of faculty received all 4s and 5s on each item

The less sophisticated students were, the higher they rated their teachers

Page 17: The Assessment of Teaching at a Large Urban Community College

Next…..

A subcommittee of the Institutional Effectiveness Committee was formed consisting mainly of faculty

The committee spent one year studying the tools of other colleges and universities and lifting what we liked

We found virtually nothing for sale

What we did find were test banks of questions

Page 18: The Assessment of Teaching at a Large Urban Community College

Next, cont.

We started with 50-60 questions we liked off of other tools

We narrowed the questions down

We worked through every single word in each statement to make sure they were worded exactly like we wanted them and that they measured what we wanted

We ended up with 36 questions on the new tool

Page 19: The Assessment of Teaching at a Large Urban Community College

Next, cont.

We worked on the answer scale

We found students had trouble processing the Likert scale (it wasn’t defined)

Students liked the A-F grading scale but faculty didn’t (it took far less time)

We worked through the “excellent, good, fair, poor” type of scale and the “strongly agree to strongly disagree” scale. We tested two types during our pilot process.

Page 20: The Assessment of Teaching at a Large Urban Community College

Next, cont.

We wanted to create subscales with a wider range of scores than a 1-5 scale:

The art of teaching

The science of teaching

The business of teaching

The course

The student

Page 21: The Assessment of Teaching at a Large Urban Community College

Next, cont.

We pilot tested the tool with about 10 classes and followed it up with focus groups (Fall 1999)

We revised the tool

We pilot tested again (many sections, about 400 students) with two scales (Summer 2000):

A-F scale like grades

A-E scale with definitions for each score

Page 22: The Assessment of Teaching at a Large Urban Community College

What We Found

Students rated faculty differently depending on the scale. Example:

A-F Scale, item 13: How would you rate the instructor on encouraging thinking and learning

Strongly Agree Scale, item 13: The instructor encourages thinking and learning.

A-F Scale                 Strongly Agree Scale
Mean     3.56             Mean     3.48
St.Dev.  .74              St.Dev.  .71
A   241 (68.7%)           SA  203 (58.8%)
B    75 (21.4%)           A   107 (31.0%)
C    28 (8.0%)            PA   31 (9.0%)
D     6 (1.7%)            D     4 (1.2%)
F     1 (.3%)             SD    0
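The two means and standard deviations for item 13 can be reproduced from the response counts. The sketch below assumes the top category is scored 4 and the bottom 0 on both scales; the slide does not state the scoring, but this coding reproduces the reported figures. It also assumes a sample (n - 1) standard deviation. The function and variable names are illustrative.

```python
import math

# Response counts for item 13 from the Summer 2000 pilot, as shown on the slide.
af_counts = {"A": 241, "B": 75, "C": 28, "D": 6, "F": 1}
agree_counts = {"SA": 203, "A": 107, "PA": 31, "D": 4, "SD": 0}

# Assumed scoring (not stated in the source): best category = 4, worst = 0.
af_scores = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
agree_scores = {"SA": 4, "A": 3, "PA": 2, "D": 1, "SD": 0}


def summarize(counts, scores):
    """Return (n, mean, sample SD) for grouped rating counts."""
    n = sum(counts.values())
    mean = sum(scores[k] * c for k, c in counts.items()) / n
    ss = sum(c * (scores[k] - mean) ** 2 for k, c in counts.items())
    return n, mean, math.sqrt(ss / (n - 1))


af_n, af_mean, af_sd = summarize(af_counts, af_scores)
ag_n, ag_mean, ag_sd = summarize(agree_counts, agree_scores)
print(f"A-F scale:            n={af_n} mean={af_mean:.2f} sd={af_sd:.2f}")
print(f"Strongly Agree scale: n={ag_n} mean={ag_mean:.2f} sd={ag_sd:.2f}")
```

The gap in means is small, but the share of responses in the top category drops by about ten percentage points on the Strongly Agree scale, which is where the two wordings differ most.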

Page 23: The Assessment of Teaching at a Large Urban Community College

More Testing

We took the first full data-set (Fall 2000) and did some comprehensive analysis on the tool. We found:

Students rated the faculty in more difficult classes higher (we and the Deans thought the opposite would be true)

Students rated most course difficulty levels as “about right.”

Students didn’t inflate their course involvement and preparation

Page 24: The Assessment of Teaching at a Large Urban Community College

We Attempted to Establish Validity

We took the survey results to a Division Director and had them look at the scores from the survey and compare them with what they knew to be true of their faculty over the years.

The faculty analyzed had been at the college for years and had a definite “history of teaching”

Some we looked at scored rather low and some extremely high (but lots of variance)

The Division Director felt the survey picked the faculty out in order of their teaching ability. Those scoring lower were not considered as good at teaching as those who scored high.

Page 25: The Assessment of Teaching at a Large Urban Community College

Why Validity is Hard

Typically to establish validity, one uses a tool considered “valid” and compares the new tool to the results of the valid tool

With teaching evaluation, there are no established “valid” tools

The only way we knew to validate it was against the historical records of teaching at the College and through some statistical tests (factor analysis)

Page 26: The Assessment of Teaching at a Large Urban Community College

Results

We finalized the tool in summer of 2000

We began using it in every class in Fall 2000

Page 27: The Assessment of Teaching at a Large Urban Community College

Improving Teaching

Chairs or Division Directors should use it appropriately

It is one indicator of teaching (we say it counts no more than 40%)

A criterion or benchmark was set (average of 4 on all items – our criterion)

If a faculty member scores an average of 4 out of five on every item, how much more can we really expect?

Do not norm-reference it (set means and standard deviations based on your department’s norms)

Why?????

Page 28: The Assessment of Teaching at a Large Urban Community College

Case Scenario

In Fall a faculty member rates a 4.22 on item 12 on the survey. In her department the mean on that item was 4.76, SD=.36. This faculty member is told “you scored more than one SD below the department mean and need to improve your teaching.”

That faculty member works very hard to improve her teaching. In the Spring term on item 12 she scores a 4.51. She is happy her scores are now up within one SD of the department mean.

However, everyone else in the department also raised their scores and the new department mean is 4.81, SD=.28. Her scores are still more than one SD below the department mean.

Page 29: The Assessment of Teaching at a Large Urban Community College

Case Scenario, cont.

What’s worse, she has a friend in another department where the department mean on item 12 was 3.99, SD=.21.

If only she worked in that department, she would score more than one standard deviation above the mean and be considered a good teacher.

That chair wouldn’t ask her to make improvements in her teaching.

Is she really a better or worse teacher in either department????
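The case scenario comes down to a few z-scores: the same item score lands in very different places depending on which department mean and standard deviation it is compared with. The sketch below uses the numbers from the scenario; the function name and case labels are illustrative, and the fixed criterion of 4.0 is the benchmark named on the "Improving Teaching" slide.

```python
CRITERION = 4.0  # fixed benchmark: an average of 4 on the item


def z_score(score, dept_mean, dept_sd):
    """Number of department standard deviations a score sits from the department mean."""
    return (score - dept_mean) / dept_sd


# (label, her item 12 score, department mean, department SD)
cases = [
    ("Fall, own department", 4.22, 4.76, 0.36),
    ("Spring, own department (Fall norms)", 4.51, 4.76, 0.36),
    ("Spring, own department (Spring norms)", 4.51, 4.81, 0.28),
    ("Spring, friend's department", 4.51, 3.99, 0.21),
]

for label, score, dept_mean, dept_sd in cases:
    z = z_score(score, dept_mean, dept_sd)
    meets = "meets" if score >= CRITERION else "misses"
    print(f"{label}: z = {z:+.2f}, {meets} the 4.0 criterion")
```

Under norm-referencing her standing moves from below to within to below to far above the cutoff without her teaching changing between the last three rows, while the criterion-referenced reading is the same in every row.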

Page 30: The Assessment of Teaching at a Large Urban Community College

Case Scenario, cont.

Things can be very different within departments:

Some classes are electives

Some classes are required for majors

Multiple disciplines will be incorporated into a department mean

Some courses are easier than others

Students are forced into some classes and don’t want to be there

Page 31: The Assessment of Teaching at a Large Urban Community College

Once a Tool is Established….

We found that we had to impress upon the faculty and staff that:

Every time you change a single word, you invalidate the survey

Every time you change the scale, you invalidate the survey

Every time you add or throw out a question, you invalidate the survey

If not, they want to keep changing it

Page 32: The Assessment of Teaching at a Large Urban Community College

Characteristics of the New Teaching Evaluation Tool

Page 33: The Assessment of Teaching at a Large Urban Community College

Comparing the Scales

[Figure: bar chart comparing the Old Tool and the New Tool on the percent of responses at each rating (5's, 4's, 3's, 2's, 1's); vertical axis 0 to 80]

Old Tool % 4-5 = 91%

New Tool % 4-5 = 85%

Page 34: The Assessment of Teaching at a Large Urban Community College

Psychometric Properties - Validity

Factor Analysis of the Teacher Evaluation Assessment Survey

Eigenvalues and Factor Loadings

Factor 1: Instructor, Eigenvalue = 19.35

Factor 2: Course, Eigenvalue = 2.61

Emerging Factor 3: Student, Eigenvalue = 1.26

Page 35: The Assessment of Teaching at a Large Urban Community College

The Instructor – Factor 1

The art, science and business of teaching did not factor out separately

The science and business of teaching were highly correlated to the art of teaching

This makes sense. If a faculty member does not utilize multiple methods in teaching or hand papers back in a reasonable amount of time – chances are students won’t rate them as good teachers

How faculty utilize appropriate methods and manage the classroom impacts how students see them as teachers

Page 36: The Assessment of Teaching at a Large Urban Community College

Psychometric Properties - Reliability

Internally consistent = a measure of how consistently the instrument assesses teaching quality across the items

Cronbach’s Alpha - compares the functioning of each item to all the other items within the instrument (a perfectly reliable instrument will produce a coefficient of 1.00)

The TEAS yielded an Alpha of .974 indicating very good internal reliability
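For readers who want to see how a coefficient like this is calculated, here is a minimal sketch of the standard Cronbach's Alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The response data below is made up for illustration and is not TEAS data; the function name is also illustrative.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's Alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from five students on four items (1-5 scale).
toy_rows = [
    [5, 5, 4, 5],
    [4, 4, 4, 5],
    [3, 4, 3, 3],
    [5, 4, 5, 5],
    [2, 3, 2, 3],
]
alpha = cronbach_alpha(toy_rows)
print(f"alpha = {alpha:.3f}")

# When every item moves in lockstep, the coefficient reaches its ceiling of 1.00.
lockstep_alpha = cronbach_alpha([[1, 1, 1], [3, 3, 3], [5, 5, 5]])
```

A high Alpha says the items agree with each other; it does not by itself say they measure good teaching, which is the validity question treated separately in this presentation.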

Page 37: The Assessment of Teaching at a Large Urban Community College

Psychometric Properties - Sensitivity

While the TEAS may be able to distinguish improvement in instructors who performed “Below Average” or “Very Poor,” it will not identify improvement in those who have already scored in the top rating (this is fine with us)

Another indication that the instrument may not detect small changes is the rather small item standard deviations (.72 - .98)

The greater the spread across items, the better the sensitivity (the subscales produce this)

Page 38: The Assessment of Teaching at a Large Urban Community College

Sub-Scales: The Important Pieces

Page 39: The Assessment of Teaching at a Large Urban Community College

The Art of Teaching

The Art of Teaching (items: 8, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21)

The art of teaching involves the more innate aspects of teaching that are not considered method. Examples of this would be a teacher’s ability to motivate students, be enthusiastic, show a positive attitude toward students and the course, encourage participation, make students feel valued and comfortable asking questions, etc.

Page 40: The Assessment of Teaching at a Large Urban Community College

Art of Teaching

Scale of possible points for this item is 11-55 points (it is more sensitive).

Mean: 48.9

St. Dev: 8.1

Number scoring 11-21 (<2 on every item) 174 (1.0%)

Number scoring 22-32 (<3 on every item) 674 (4.1%)

Number scoring 33-43 (<4 on every item) 2,376 (14.5%)

Number scoring 44-55 (4/5s on every item) 13,192 (80.4%)

From Fall 2000 dataset

Page 41: The Assessment of Teaching at a Large Urban Community College

Science of Teaching

The Science of Teaching (items: 2, 9, 16, 18, 19)

The science of teaching involves methods or areas that can be taught such as organizing class time, clarifying materials with examples, making relevant assignments, use of the textbook and teaching new things to students.

Page 42: The Assessment of Teaching at a Large Urban Community College

Science of Teaching

Scale of possible points for this item is 5-25 points.

Mean: 22.2

St. Dev: 3.5

Number scoring 5-9 (<2 on every item) 121 (.7%)

Number scoring 10-14 (<3 on every item) 547 (3.2%)

Number scoring 15-19 (<4 on every item) 2,551 (14.8%)

Number scoring 20-25 (4/5s on every item) 14,054 (81.4%)

From Fall 2000 dataset.

Page 43: The Assessment of Teaching at a Large Urban Community College

The Business of Teaching

The Business of Teaching (items: 1, 3, 4, 5, 6, 7)

The business of teaching involves items and issues required by the institution such as handing out syllabi, applying policies and being fair to students, meeting the class for the entire period, holding office hours, providing feedback and announcing tests in advance, etc.

Page 44: The Assessment of Teaching at a Large Urban Community College

The Business of Teaching

Scale of possible points for this item is 6-30 points.

Mean: 26.8

St. Dev: 3.9

Number scoring 6-11 (<2 on every item) 73 (.4%)

Number scoring 12-17 (<3 on every item) 401 (2.4%)

Number scoring 18-23 (<4 on every item) 2,505 (14.7%)

Number scoring 24-30 (4/5s on every item) 14,043 (82.5%)

From Fall 2000 dataset

Page 45: The Assessment of Teaching at a Large Urban Community College

The Course

The Course (3 items: 22, 24, 27)

The course evaluation has less to do with the teacher and more to do with the course characteristics, its applicability to the students’ field of study, difficulty level, etc.

Page 46: The Assessment of Teaching at a Large Urban Community College

The Course

Scale of possible points for this item is 3-15 points.

Mean: 12.8

St. Dev: 2.4

Number scoring 3-5 (<2 on every item) 142 ( .8%)

Number scoring 6-8 (<3 on every item) 750 ( 4.4%)

Number scoring 9-11 (<4 on every item) 3,476 (20.6%)

Number scoring 12-15 (4/5s on every item) 12,489 (74.1%)

From Fall 2000 dataset

Page 47: The Assessment of Teaching at a Large Urban Community College

The Student

The Student (items: 31, 32, 33, 34, 35, 36)

This allows a student to assess the amount of effort they put into the course. While faculty are not responsible for this, it may help explain the variance in teacher evaluation.

Page 48: The Assessment of Teaching at a Large Urban Community College

The Student

Scale of possible points for this item is 6-30 points.

Mean: 26.2

St. Dev: 2.3

Number scoring 6-11 (<2 on every item) 27 ( .2%)

Number scoring 12-17 (<3 on every item) 283 ( 1.7%)

Number scoring 18-23 (<4 on every item) 3,175 (19.0%)

Number scoring 24-30 (4/5s on every item) 13,209 (79.1%)

From Fall 2000 dataset
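The subscale totals and score bands on the preceding slides can be sketched as a sum of 1-5 item ratings followed by a banding step. The item lists below are copied from the slides (item 16 appears under both Art and Science there). Everything else is an assumption for illustration: the function names, the sample responses, and the choice to leave a subscale unscored when any of its items is missing, which the source does not describe.

```python
# Item numbers per subscale, as listed on the slides.
SUBSCALES = {
    "art": [8, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21],
    "science": [2, 9, 16, 18, 19],
    "business": [1, 3, 4, 5, 6, 7],
    "course": [22, 24, 27],
    "student": [31, 32, 33, 34, 35, 36],
}


def subscale_score(responses, items):
    """Sum of the 1-5 ratings for a subscale; None if any item is unanswered (assumed rule)."""
    values = [responses.get(item) for item in items]
    if any(value is None for value in values):
        return None
    return sum(values)


def band(score, n_items):
    """Band 1-4 by average rating: 1 = below 2, 2 = below 3, 3 = below 4, 4 = 4 or higher."""
    return min(score // n_items, 4)


def band_range(n_items, band_number):
    """Lowest and highest total score in a band, e.g. 11 items, band 1 -> (11, 21)."""
    low = band_number * n_items
    high = 5 * n_items if band_number == 4 else (band_number + 1) * n_items - 1
    return low, high


# Hypothetical student: mostly 4s, a 5 on item 8, a 2 on item 22.
responses = {item: 4 for item in range(1, 37)}
responses[8] = 5
responses[22] = 2

for name, items in SUBSCALES.items():
    score = subscale_score(responses, items)
    print(name, score, "band", band(score, len(items)))
```

The band edges produced by `band_range` match the ranges on the slides (for example 11-21, 22-32, 33-43 and 44-55 for the 11-item Art subscale), and summing items is what widens the range beyond a single 1-5 rating.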

Page 49: The Assessment of Teaching at a Large Urban Community College

Correlations between Subscales

                                 ART      SCIENCE  BUSINESS  COURSE   STUDENT
ART       Pearson Correlation    1.000    .916**   .863**    .755**   .532**
          Sig. (2-tailed)        .        .000     .000      .000     .000
          N                      16416    16193    15919     15842    15588
SCIENCE   Pearson Correlation    .916**   1.000    .870**    .753**   .529**
          Sig. (2-tailed)        .000     .        .000      .000     .000
          N                      16193    17273    16711     16562    16374
BUSINESS  Pearson Correlation    .863**   .870**   1.000     .704**   .532**
          Sig. (2-tailed)        .000     .000     .         .000     .000
          N                      15919    16711    17022     16294    16117
COURSE    Pearson Correlation    .755**   .753**   .704**    1.000    .602**
          Sig. (2-tailed)        .000     .000     .000      .        .000
          N                      15842    16562    16294     16857    16035
STUDENT   Pearson Correlation    .532**   .529**   .532**    .602**   1.000
          Sig. (2-tailed)        .000     .000     .000      .000     .
          N                      15588    16374    16117     16035    16694

** Correlation is significant at the 0.01 level (2-tailed).

Page 50: The Assessment of Teaching at a Large Urban Community College

Regression – What accounts for the most variance (entire data set)?

Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change   df1  df2    Sig. F Change
1      .917a  .841      .841               3.1669                      .841             76834.418  1    14569  .000
2      .926b  .858      .858               2.9907                      .017             1768.653   1    14568  .000
3      .929c  .863      .863               2.9326                      .005             584.070    1    14567  .000

(The last five columns are the Change Statistics.)

a. Predictors: (Constant), SCIENCE

b. Predictors: (Constant), SCIENCE, BUSINESS

c. Predictors: (Constant), SCIENCE, BUSINESS, COURSE

86% of the variance in the Art of Teaching can be accounted for by the way students rated the Science and Business of Teaching and the Course.

Page 51: The Assessment of Teaching at a Large Urban Community College

Regression – One Course for One Instructor

Model Summary

Model  R      R Square  Adjusted R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .877a  .770      .757               5.3518                      .770             60.236    1    18   .000
2      .931b  .868      .852               4.1779                      .098             12.537    1    17   .003
3      .953c  .909      .892               3.5670                      .042             7.321     1    16   .016

(The last five columns are the Change Statistics.)

a. Predictors: (Constant), SCIENCE

b. Predictors: (Constant), SCIENCE, STUDENT

c. Predictors: (Constant), SCIENCE, STUDENT, BUSINESS

In this English 231 class (Amer. Lit.), 89% of the variance in the Art of Teaching (adjusted R Square = .892) can be accounted for by how the students rated the Science and Business of Teaching and how the students rated their own classroom participation and readiness.
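The Adjusted R Square and F Change columns in the one-course model summary can be checked against the R Square column using the standard formulas for stepwise regression output. The sketch below assumes n = 20 students, inferred from df2 = 18 with one predictor in the model; the function names are illustrative. Because the inputs are the rounded values printed in the table, the F Change results only approximate the table entries.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R Square for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_change(r2_full, r2_reduced, df1, df2):
    """F statistic for the gain in R Square when df1 predictors are added."""
    return ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)


N = 20  # assumption: inferred from df2 = 18 for the one-predictor model

# (model, R Square, number of predictors, R Square of previous step, df2)
steps = [
    (1, 0.770, 1, 0.000, 18),
    (2, 0.868, 2, 0.770, 17),
    (3, 0.909, 3, 0.868, 16),
]

for model, r2, k, previous_r2, df2 in steps:
    print(
        f"Model {model}: adjusted R2 = {adjusted_r2(r2, N, k):.3f}, "
        f"F change = {f_change(r2, previous_r2, 1, df2):.2f}"
    )
```

The adjusted values come out at roughly .757, .852 and .892, matching the table. With only 20 students the adjustment matters, which is why the 89% figure sits below the unadjusted R Square of .909.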

Page 52: The Assessment of Teaching at a Large Urban Community College

Differences Between Departments

[Figure: mean subscale scores (art, science, business, course, student) for two departments, Hospitality and Science; vertical axis 5 to 65]

Page 53: The Assessment of Teaching at a Large Urban Community College

What Was Envisioned by The Committee

Faculty determined to be excellent in the art of teaching, the science of teaching and the business of teaching would be selected to put together training modules or mentoring programs in each area through the CTL

Faculty scoring low on any of the subscales would be sent to the CTL for serious help

Improvements made would be documented over time

Page 54: The Assessment of Teaching at a Large Urban Community College

The Chair/Division Director’s Role

Use the TEAS fairly

It is what it is…..

When faculty need help, send them for it

Attempt to create an atmosphere of “value in good teaching” in your division

Faculty can and should help each other

Look for other ways to evaluate teaching (portfolios, observations, self-assessments)

Page 55: The Assessment of Teaching at a Large Urban Community College

What we plan to do with it…..

We plan to sell it through our college’s Services Corporation (503c)

We will either sell the rights to it (test plus booklet) so you can reproduce it and do your own analysis

Or we can sell the scantron sheets with the survey printed on them and do the analysis for you

Over the next year we plan to analyze a university sample

Page 56: The Assessment of Teaching at a Large Urban Community College

The End

This presentation can be found at:

http://inside.cpcc.edu/planning

Click on studies and reports

It is listed as AIR teaching eval 2003