Elisabeth Gareis recently raised a question regarding student evaluations of our courses, which prompted me to write this. But her post doesn’t have “evaluations” in its title, and so I’m making a new post of this, rather than simply commenting on Elisabeth’s, in order to draw attention to the matter of evaluations.
I take a deep breath and write that I have deep and progressively more distressing doubts about the worth and efficacy of teaching evaluations, at least as they exist at Baruch College. There, I’ve said it.
I lay no claim to having systematically studied Baruch’s evaluations as a whole. But I carefully scrutinize every evaluation of every member of my department every term, and as a member of the School of Liberal Arts & Sciences’ Personnel and Budget Committee I see the evaluations of every member of the arts and sciences faculty who comes before the committee for personnel actions, including reappointments, tenure and promotion, and sabbaticals. I’m probably as familiar with the college’s teaching evaluation patterns as anyone. And among the things I see are several consistent tendencies that trouble me. Trouble, as in “Why do we put so much emphasis on such imperfect instruments?” Let me quickly note that I wish the college paid a whole lot more attention to the importance of teaching in tenure and promotion processes than it does; what I’m talking about here are the evaluations, not the teaching. And, because this is a blog, I’m not going to go into detail; I’m merely pointing out some of the issues that concern me.
First, it is my considered opinion that our evaluation format serves primarily as a popularity contest. Because I’ve observed all my faculty (except some of the very newest GTFs and adjuncts) in the classroom, I have a sense of the relationship between what I can actually see of their skills and how students rate them. There is some consistency at the lower end, I think; people who are in my opinion less than skilled teachers do tend to get lower ratings, but I’m not sure there’s a strong correlation here. What I do find consistently, though, is that nice guys tend to finish first. I’ve seen accomplished teachers get lower scores because of their personalities. And I’ve seen at least one colleague consistently receive 5.0s while teaching not much differently than anyone else in our department. I have come to the conclusion that students tend to rate their profs by how much they like them, not by how skillful or effective their teaching is.
When I examine the entire range of items on a single evaluation printout, I find very little variation from item to item (though I would exclude from this generalization the final items about improvements in communication skills, etc., on the new forms). That is, if students approve of their instructors on the whole, they tend to rate them highly on every individual item. They tend to discriminate minimally among the different qualities the form asks them to evaluate. The fundamental idea behind our format is for us to receive useful feedback about our performance on specific aspects of our teaching (i.e., commentary on where we could use improvement), but I have little sense that this is what is actually happening.
The scores of a significant majority of my department’s faculty cluster rather closely together. While a few get scores in the mid to high 3s, and thus lower the department’s mean a bit, most get scores between 4.2 and 4.6. On the one hand, this isn’t a bad thing; it’s nice to see so many fine teachers in my department. On the other hand, how do we make sense of a 5-point scale when almost everyone’s scores lie within a range of four tenths of a point near the top? I’d like to be able to take teaching evaluation scores into account in my decision-making, but I can’t seriously make distinctions between people when nearly all the scores are bunched together this way, nor does this pattern incline me to think that students are employing their critical faculties when they evaluate us.
There is much more to be said about this. Most of you are familiar with many of the other criticisms that are regularly leveled against the evaluation process and format. I’ve opened the valve just a bit to release some of the pressure. I repeat, my comments here are in no way meant to downplay the importance of teaching. To the contrary, I so value good teaching that I’m troubled by how poorly we evaluate it. We are using an inherently flawed process as a guide to making critical tenure and promotion decisions. Experience tells me that it is next to impossible to change our evaluation procedures (it took 25 years or so to achieve the last changes to our format, changes I see as mostly cosmetic, though there are those who disagree with me about this), and I reluctantly acknowledge that I don’t have any alternative clearly in mind. I just thought I’d use this occasion to rant a bit.
In the course of a casual conversation with our Provost after I had written this, I mentioned some of these thoughts. He was surprised at my conclusions and told me research shows that evaluations are valid instruments. But as we pursued the topic a bit, it became clear that what he’s talking about (at least as I understood him) is the difference between someone with a 2.5 and someone with a 4.5. Yes, I recognize that these evaluations can be used to spot the occasional complete incompetent. But that’s not how we use them most of the time; it’s actually quite rare to find someone consistently below 3.5, and there really aren’t all that many people consistently below 4.0, at least in the School of Liberal Arts & Sciences.