Pondering Teaching Evaluations

Posted on December 12, 2008 by Glenn Petersen

Elisabeth Gareis recently raised a question regarding student evaluations of our courses, which prompted me to write this. But her post doesn’t have “evaluations” in its title, and so I’m making a new post of this, rather than simply commenting on Elisabeth’s, in order to draw attention to the matter of evaluations.

I take a deep breath and write that I find I have deep and progressively more distressing doubts about the worth and efficacy of teaching evaluations, at least as they exist at Baruch College. There, I’ve said it.

I lay no claim to having systematically studied Baruch’s evaluations as a whole. But I carefully scrutinize every evaluation of every member of my department every term, and as a member of the School of Liberal Arts & Sciences’ Personnel and Budget Committee I see the evaluations of every member of the arts and sciences faculty who comes before the committee for personnel actions, including reappointments, tenure and promotion, and sabbaticals. I’m probably as familiar with the college’s teaching evaluation patterns as anyone. And among the things I see are several consistent tendencies that trouble me. Trouble, as in “Why do we put so much emphasis on such imperfect instruments?” Let me quickly note that I wish the college paid a whole lot more attention to the importance of teaching in tenure and promotion processes than it does; what I’m talking about here are the evaluations, not the teaching. And, because this is a blog, I’m not going to go into detail; I’m merely pointing out some of the issues that concern me.

First, it is my considered opinion that our evaluation format serves primarily as a popularity contest. Because I’ve observed all my faculty (except some of the very newest GTFs and adjuncts) in the classroom, I have a sense of the relationship between what I can actually see of their skills and how students rate them. There is some consistency at the lower end, I think; people who are in my opinion less than skilled teachers do tend to get lower ratings, but I’m not sure there’s a strong correlation here. What I do find consistently, though, is that nice guys tend to finish first. I’ve seen accomplished teachers get lower scores because of their personalities. And I’ve seen at least one colleague consistently receive 5.0s while teaching not much differently than anyone else in our department. I have come to the conclusion that students tend to rate their profs by how much they like them, not by how skillful or effective their teaching is.

When I examine the entire range of items on a single evaluation printout, I find very little variation from item to item (though I would exclude from this generalization the final items about improvements in communication skills, etc., on the new forms). That is, if students approve of their instructors on the whole, they tend to rate them highly on every individual item. They tend to discriminate minimally among the different qualities the form asks them to evaluate. The fundamental idea behind our format is for us to receive useful feedback about our performances on specific aspects of our teaching (i.e., commentary on where we could use improvement), but I have little sense that this is what is actually happening.

The scores of a significant majority of my department’s faculty cluster rather closely together. While a few get scores in the mid to high 3s, and thus lower the department’s mean a bit, most get scores between 4.2 and 4.6. On the one hand, this isn’t a bad thing; it’s nice to see so many fine teachers in my department. On the other hand, how do we make sense of a 5-point scale when almost everyone’s scores lie within a range of four-tenths of a point near the top? I’d like to be able to take teaching evaluation scores into account in my decision-making, but I can’t seriously make discriminations between people when nearly all the scores are bunched together this way, nor does this pattern incline me to think that students are employing their critical faculties when they evaluate us.

There is much more to be said about this. Most of you are familiar with many of the other criticisms that are regularly leveled against the evaluation process and format. I’ve opened the valve just a bit to release some of the pressure. I repeat, my comments here are in no way meant to downplay the importance of teaching. To the contrary, I so value good teaching that I’m troubled by how poorly we evaluate it. We are using an inherently flawed process as a guide to making critical tenure and promotion decisions. Experience tells me that it is next to impossible to change our evaluation procedures (it took 25 years or so to achieve the last changes to our format, changes I see as mostly cosmetic, though there are those who disagree with me about this), and I reluctantly acknowledge that I don’t have any alternative clearly in mind. I just thought I’d use this occasion to rant a bit.

In the course of a casual conversation with our Provost after I had written this, I mentioned some of these thoughts. He was surprised at my conclusions and told me research shows that evaluations are valid instruments. But as we pursued the topic a bit, it became clear that what he’s talking about, at least as I understood him, is the difference between someone with a 2.5 and someone with a 4.5. Yes, I recognize that these evaluations can be used to spot the occasional complete incompetent. But that’s not how we use them most of the time; it’s actually quite rare to find someone consistently below 3.5 and there really aren’t all that many people consistently below 4.0, at least in the School of Liberal Arts & Sciences.


21 Responses to Pondering Teaching Evaluations

Arthur Lewin says:

December 12, 2008 at 10:21 am

“I have come to the conclusion that students tend to rate their profs by how much they like them, not by how skillful or effective their teaching is.” It might not be that cut and dried. Because they like their instructors, students may actually learn more from them.

Also, the tight clustering of teaching evaluation scores reminds me of the tight clustering of student grades in elite colleges and in most graduate schools. The same underlying factors may be responsible for all three phenomena.

“If students approve of their instructors on the whole, they tend to rate them highly on every individual item.” I notice the same thing when I write recommendations for students. If I approve of the student on the whole, I tend to rate them highly on every individual item.

The intriguing questions you raise are probably true of evaluations as a whole. There is no such thing as a perfect evaluation mechanism, including tests of every kind. For example, the SATs are certainly better than what went before, basing college admissions on bloodlines, connections or money. But, nonetheless, like all evaluation mechanisms, they have serious flaws.

Perhaps all we can do is try to move toward perfection in every kind of evaluation knowing that we can never achieve it. The intriguing questions you raise may serve as a metaphor not only for testing in general but for the entire scientific enterprise.
Veena Talwar Oldenburg says:

December 12, 2008 at 10:33 am

I find that the scores students give are related to the scores they get on tests. And they think that As and occasionally Bs are what they deserve for every quiz. When they get fewer marks than that–and in my classes a lot of them actually cluster around C and D, and some with Fs–they will give you the same grade, and call all the tests confusing, not related to the textbook, etc.
Yes, you do get a few points for having a sense of humor, but that is about it. I know colleagues who serve sweet snacks (donuts and the like, rich in white sugar and trans fats) in every elective–and that goes over very well too. But that tactic hasn’t persuaded me…
Elisabeth Gareis says:

December 12, 2008 at 12:28 pm

Research supports some of the objections raised during this discussion. Students seem to perceive instructors with a sense of humor and pleasant personality as more effective. Grades seem to influence course evaluations (although the effect is less pronounced the older the student gets). Another factor is gender: Older students and students from specific cultural backgrounds seem to prefer male instructors, younger students female instructors. I haven’t seen any studies on bribery with food. It’s research waiting to be conducted!

So, evaluations are not perfect, but in combination with observation reports (if they are based on sound and measurable criteria) and other documents (e.g., the instructor’s teaching philosophy, syllabi, samples of student work, grade distributions), they seem useful.

On a personal level, I find the evaluations (the numerical items, but particularly also the handwritten comments) a helpful tool for revising my syllabi and teaching goals.
Sean O'Toole says:

December 12, 2008 at 2:46 pm

Interesting post. Thanks, Glenn. I’ve always wondered about evaluations. I’ve also always taken them really seriously. A former colleague and administrator once told me about a study of student evaluations that gave me pause. In the study (the details of which I’m afraid escape me), students were asked to evaluate their professors after the first five minutes of class on the first day and then again at the very end of the semester. The results were virtually identical. My colleague used this to give me the following advice about the first day: smile often and don’t wear black.

There are a couple of ways to read these findings, of course, and perhaps they’re both right: On one hand, good teaching requires finding ways to get students to “like” us — to feel inspired to engage with us in what is (hopefully) new and intellectually challenging work. At a recent Master Teachers workshop, Charlie Cannon talked about this in terms of inviting students to make the leap into what might otherwise seem uninspired (the required course), abstract (the purely “academic” subject), or counterintuitive (the unknown). In this reading, students’ gut-level responses constitute more than a popularity contest. On the other hand, students make judgments quickly, unconsciously, and according to preexisting systems of value: smile often and don’t wear black.

I know some former colleagues who can’t bear — or resolutely refuse — to look at evaluations. Women, nonwhite, and/or lesbian and gay faculty often fear retaliation. This is perhaps less true on a campus as diverse as Baruch, but it’s a concern nonetheless.

Regardless of who we are (or who students think we are), the results of student evaluations require interpretation. The student who gives a 1 or 2 across the board in a class that otherwise gives 4’s and 5’s is of particular interest. Has this student had a different experience of the course than the others? Misread the instructions? Is he or she upset about a grade? Responding to some deeply felt belief about how we “should” teach — or look or behave? Could this student have been reached, even, given our particular subject/pedagogy/embodiment and his or her own attitudes? The real “problem” with evaluations is that we don’t know. We can only look for patterns, try to address resistant students better in the future, and perhaps remember that teaching is itself an imperfect instrument.
Mehmet Genc says:

December 12, 2008 at 7:27 pm

Nice post. My concern is not with clustering among faculty who get between 4.2 and 4.6. I see the correlation between “expected” grades and the ratings as a bigger problem. Students’ evaluations should be based on how the teacher performed in class. Instead, it becomes a yardstick for rewarding or punishing the instructor based on the grades he gives. Call me cynical but this is what I see. The incentives set up in the system make the resolution of the problem more difficult. If your contract is renewed annually and renewal depends on how well you teach, well… you will do whatever it takes to raise your ratings, even if it means giving every student accolades. If, on the other hand, every professor’s grade distribution was identical, students would have to evaluate and differentiate professors on other items. They would be forced to. Would there be clustering? Perhaps. Would overall ratings fall? Probably. But could one attribute any difference in evaluations to the expected grades? No, because all professors’ grade distributions would be more or less the same.
Therefore, the problem is more than having a sense of humor. Yes, we do have to invite students to take a leap, but we also do have to reduce the variance in grade distributions within the same course.
Tomasello says:

December 13, 2008 at 12:16 am

I personally prefer glancing at the voluntarily submitted and quite random Ratemyprofessors numbers [Quality = Helpfulness+Clarity (1-5) and Easiness (1-5)].

There are two types of scores that, of course, are useless: When I see a professor who has the two numbers (Quality and Easiness) above 4.0, I can’t judge. This could imply, “You’re good; I learn.” But it could just as well mean, “You’re easy; that’s (you’re) good.” When I see the two scores below 3.0, that could mean, “The concepts are very difficult; you’re not an easy grader” or “You can’t teach; you’re also abusive.” So we can infer nothing here either.

However, there are two types of scores, I think, worth considering seriously: When I see 4.3 for Quality and, say, 2.3 for Easiness, then I kind of walk away believing students think, “Tough subject. Good teacher.” And, unfortunately, whenever there is the opposite, for example, Quality 1.7 and Easiness 3.2, one has to really wonder what the heck is going on in the classroom.
Tomasello says:

December 14, 2008 at 10:22 am

I see that Baruch College has officially sanctioned Ratemyprofessors:
http://www.baruch.cuny.edu/news/david_sitt_hottest.htm
I guess we can now use the numbers for tenure and promotions.
glenn petersen says:

December 15, 2008 at 5:46 pm

I’ve been listening to discussions about correlations between students’ expected grades and their overall ratings of their professors for as long as I’ve been on P&B committees (i.e., 25 years or so). I didn’t mention the issue in my post because it is to me a classic example of the problem with correlations: causality. In my own conceptualization of relationships between teaching and grades, a skilled teacher’s students should be learning a great deal, and learning it well, and should therefore be likely to earn high grades. I realize that this is not always the case, of course, but it does seem to me that in any given case it is entirely possible that a significant correlation between high expectations and high appreciation for the teacher’s skill would be entirely appropriate.
Dennis Slavin says:

December 15, 2008 at 8:56 pm

Sean mentions a study that sounds like the ones cited on pp. 12-15 of Bain, “What the Best College Teachers Do.” Indeed Bain cites a study that suggests that only a few seconds of exposure to an instructor by video are enough to produce a close correlation to evaluation scores after an entire semester. One might conclude that this boils down to very superficial qualities (like smiling etc.), but Bain draws a deeper inference: that humans have evolved (my paraphrase) to have a highly developed ability to judge who they can learn from best — a judgment that they have learned by experience is usually borne out over the course of the semester. Expected grades might be related, but only in the sense that they are tied to learning. For what it’s worth, in years of examining the results of Baruch’s previous evaluation form, which asked students what grade they expected, I found only a very loose correlation with the evaluation score. As Glenn suggests, causality is uncertain.

I would point out that the difference between 4.2 and 4.6 on the evaluation form translates to the difference between 84 and 92 — not an insubstantial difference, especially if students seem prone to grade us highly.
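The conversion above can be written out as a short sketch. It assumes the reading that a score is expressed as a percentage of the 5-point maximum; the function name `to_percent` is only illustrative and is not part of any Baruch form or system.

```python
def to_percent(score, scale_max=5.0):
    """Express an evaluation score as a percentage of the scale maximum.

    Assumption: the score is read as a simple fraction of the top value (5).
    """
    return round(score / scale_max * 100, 1)


low = to_percent(4.2)
high = to_percent(4.6)
print(low, high)  # the 0.4-point gap becomes an 8-point gap on a 100-point scale
```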
glenn petersen says:

December 17, 2008 at 5:07 pm

Prof. Tomasello draws our attention to Ratemyprofessor.com. Some readers may be unaware of the video clips he has posted on that site: (http://professorsstrikeback.mtvu.com/professor-andrew-tomasello-baruch-college/). Since he’s brought the topic up, let me take this opportunity to weigh in with my own views about his video clips. In them he responds to some criticisms of his personal style that students have made on the Ratemyprofessor.com site. He tells students, among other things:

“You say I’m mean and not cool? F–k you.” [clip 1]

“You say I’m too controlling? You’ve got to stop getting stoned before my class.” [clip 2]

“I don’t have to come to class. I don’t have to teach.”
“Every other Thursday I get a direct deposit check from the State of New York into my bank account. I don’t have to come to class.” [clip 3]

My hunch is that he thinks everyone will understand he’s being playfully ironic, but I can assure you that many—probably most—of our Baruch students do not get the joke. I did a systematic, anonymous, survey in my classes last spring, asking my students to view the clips and then write down their responses. In my two classes, a total of two students reported that they found them funny. One student said the clips simply prove that Prof. Tomasello is just the sort of person (I’m paraphrasing here) his students say he is. The rest—that is, the overwhelming majority—said they were offended by them. Some said offended; many said deeply offended.
I raise this issue because it has been my experience, and that of others who teach (at Baruch and elsewhere), that a great many students—particularly working class, immigrant, and newer students—are intimidated by their professors and hesitate to approach us with questions about the class. While we should be doing everything we can to encourage students to approach us, Prof. Tomasello’s video clips serve, in my opinion, only to confirm their fears, and communicate to them that we are impatient, condescending, rude, and just as terrifying as they imagine us to be. I think his videos are doing all of us, students, faculty, and administration, a grave disservice.
At least some of his colleagues assume that Prof. Tomasello believes he’s engaging in something along the lines of “tough love,” telling students what they need to hear—that their problems are of their own making. But tough love is something parents must occasionally resort to with children having specific emotional and behavioral problems. It is not meant to be a one-size-fits-all aid to education.
Baruch’s administration is troubled by these videos, but says it’s up to the Faculty Senate to deal with them; the Faculty Senate’s executive committee members are loath to engage with the issue, saying either that it’s a matter of academic freedom or that it’s the administration’s responsibility. So the issue isn’t being addressed.
Andrew, would you think a bit more about what your remarks are communicating to students unfamiliar with your earthy style, and to parents coming from backgrounds where higher education is conducted in a more formal style? Why should they want to entrust their children’s education to someone who says f–k you to them when they raise questions about your pedagogical style? I recognize your right to say what you think, but I also point to our professional responsibilities to teach our students as effectively as possible. You’re not scaring them away from just yourself; when we fail to respectfully ask you to take the posts down we’re all complicit in communicating to our students that we really don’t care enough about them to curtail such outlandish, and offensive, behavior toward them.
Tomasello says:

December 17, 2008 at 11:54 pm

Anyway, back to the topic at hand and my reason for visiting the blog:

While cleaning my desk this morning, I found ratings from my two summer courses. What I found curious was the difference in the area entitled “This course challenged me intellectually.” The course that sweeps through music history while presenting a great number of new concepts and difficult vocabulary received a 3.7. The other course, an introduction to the music business in which the curriculum slogs through cases of good and less-than-good business decisions made by artists and the roles and responsibilities of support staff, received a 4.0.

I’m always stunned by the disconnect between the students’ (numerical) perception of the intellectual challenges of a course and my perception of same. This is especially true in the music capstone course (for which I have been maintaining a BLSCI blog for the last five semesters). These students are responsible for source readings from Plato to Marcuse and Adorno, for understanding subtle topics like aesthetics and critical theory, and for submitting a major research paper. Yet 4900 doesn’t always get to 4.0 for “challengingness.”

It could be that the explanation lies in the fundamental differences in the understanding of the terms. I think that, perhaps for some students in my class, the term “intellectually challenging” must mean “interesting.”

No doubt, there is some truth in the evaluations; and they are instructive for teachers to ponder. But, looking at the questionnaire again, I can only wonder what students perceive as a “goal” or what for them constitutes something being “organized” or “helpful.”
Arthur Lewin says:

December 18, 2008 at 3:28 pm

Glenn writes that, “In my own conceptualization of relationships between teaching and grades, a skilled teacher’s students should be learning a great deal, and learning it well, and should therefore be likely to earn high grades. . . . (I)t does seem to me that in any given case it is entirely possible that a significant correlation between high expectations and high appreciation for the teacher’s skill would be entirely appropriate.”

A self-fulfilling prophecy may be at work here. Because the instructor is upbeat and positive about his students and their ability to do the work, they are perhaps encouraged to think well of the instructor and believe that they can do the work. . . Studies have shown that this is true on the elementary and secondary levels. I suspect it might also be true in higher education.

Perhaps positivity promotes an upward spiral in the learning curve, while negativity engenders a downward movement.
Tomasello says:

December 19, 2008 at 5:07 pm

Arthur, it’s no doubt true that happier people are better at getting their ideas believed. After all, if teaching is ultimately sales, who wants to buy an idea from a grouch? And since “vibes” are contagious, who wouldn’t want to pay attention to someone who was selling positivity?

Since it has been proven that physically attractive people are more successful in life, get better jobs, and earn more, I could imagine that attractive people are probably more positive about life. I would also believe that attractive teachers are seen as more successful at teaching than the gnomish and asymmetrical among us. (In my all-boys HS, for example, most everyone thought Mrs. Brinko to be both a “10” and the best teacher.)

If you sort the Baruch RMP site for physical attractiveness (Hot?), you’ll see that of the 60 professors most often rated “Hot” by students, only 7 score under 3.5 in the Overall Quality range–not a scientific study by any means but anecdotal evidence worthy of note.

As a hideous professor way down at the Willy Loman end of the spectrum, I know I have to work harder on my teaching technique as well as on my smile and my shoe shine.
Susan Chambre says:

December 22, 2008 at 6:12 pm

This certainly has been an interesting and lively discussion. My thanks to Glenn for his frankness and for initiating the discussion.

I have two points and suggestions to add:

1) First, teaching evaluations are only one measure of teaching effectiveness. They’re an easy way to assess a complex phenomenon: what students learn, what they retain, how it influences their long term academic and job-related performance.

They are not terribly reliable: when in the semester they’re administered apparently matters and expected grade seems to influence the outcome. Various discussions have viewed the development of evaluations as an important source of grade inflation.

Like all indicators, they are imperfect but not to be ignored.

I would hope that this discussion might inspire a revisiting not of the instrument (which was recently changed) but of the use and the need to combine evaluations with other indicators of performance: a standardized teaching portfolio and/or perhaps a review of syllabi as part of the external review process of individuals and Departments.

2) The current method of calculating scores relies on a measure — the mean — which is highly subject to extreme values. In one course of mine last spring, one unhappy student lowered the evaluation on some items from 4.5 to 4.29. Two unhappy students, giving ratings of 1, lowered the mean to 4.07. Of the 14 students responding in this small class, there were two students who were unhappy. They seemed to agree that the course challenged them intellectually and improved their writing. But their low rankings made a difference. The remaining 12 students all provided ratings of 4 or 5 on every item.

In the absence of complex policy changes, perhaps the administration might want to consider substituting means with medians: this statistic is far less subject to the impact of one or two individuals, especially in a small class.
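To illustrate the point about means and medians, here is a minimal sketch using hypothetical ratings for a 14-student class. These are made-up numbers, not the actual data from the course described above: twelve students give 4s and 5s, and two unhappy students each give a 1.

```python
from statistics import mean, median

# Hypothetical ratings (illustrative only, not the real evaluation data).
satisfied = [5] * 8 + [4] * 4   # twelve students rating 4 or 5
unhappy = [1, 1]                # two students rating 1


def summarize(ratings):
    """Return (mean rounded to two decimals, median) for a list of ratings."""
    return round(mean(ratings), 2), median(ratings)


without_outliers = summarize(satisfied)
with_outliers = summarize(satisfied + unhappy)

print("12 satisfied students:", without_outliers)
print("all 14 students:      ", with_outliers)
```

In this made-up class the two low ratings pull the mean down by about half a point while the median does not move. One caveat: with whole-number ratings the median can only land on whole or half values, so it is sturdier against a few extreme scores but also coarser than the mean.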

The problem, then, as I see it is not that the evaluations are not useful, but that they need to be evaluated more holistically as part of a broader profile and might be analyzed differently.
Arthur Lewin says:

December 23, 2008 at 4:17 pm

Susan, you write that “(v)arious discussions have viewed the development of evaluations as an important source of grade inflation.” Yes, I have also seen that. However, the opposite must also be true, namely that “the lack of evaluations would be an important source of grade deflation.”

Nonetheless, “when in the semester they’re administered apparently matters and expected grade seems to influence the outcome,” thus calling the reliability of evaluations into question. However, if a way could be found to control when in the semester evaluations are given, and if a mathematical formula could be devised to control for respondents’ expected grade, the reliability of evaluations might be improved.

But even if evaluations were made more reliable, that is more consistent, the question of their validity, their ability to measure what they purport to measure, “what students learn, what they retain, how it influences their long term academic and job-related performance,” nonetheless would remain.

In similar fashion the reliability and the validity of the measures we use to evaluate students should also be questioned. Perhaps a holistic approach in evaluating students’ evaluations of faculty should be matched by a holistic approach in how we evaluate our students. Not that we don’t in either case – we do – but the holistic trend should be encouraged in both.
Andra Ghent says:

January 6, 2009 at 6:18 pm

I am quite frankly appalled by the notion that good teaching = good grades. With any class where there are more than 30 students, there is (or at least should be) a hard curve such that there is no relationship whatsoever between the quality of teaching and the grades. The fact that there are no hard curves mandated by Baruch policy is honestly astounding to me and I do feel as though I am being punished on my teaching evaluations by enforcing one.
Andra Ghent says:

January 6, 2009 at 6:21 pm

As another thought, it is exceptionally easy for an instructor to game teaching evaluations – give an easy midterm with high grades and then give a difficult final. Of course, substantially more learning happens when the instructor follows the converse approach…
glennpetersen says:

January 7, 2009 at 12:40 am

By my quick back-of-the-envelope calculation, Andra, I’ve been teaching for quite a bit longer than you’ve been alive, which probably means I’m hopelessly out of date and old-fashioned. You’ll have to bring me up to speed. I’m not quite sure what a “hard curve” is (and a trip to Google suggests that the term has no widely agreed-upon sense), but I’m assuming that it means some of those who are doing what I think is admirable work are going to have to receive Cs, Ds, and maybe Fs. I’m quite at a loss to understand why my students’ essays, if I’m working with students closely to help them write well, should necessarily fall into this hard curve. Does this hard curve mean that we must assign students poor grades even if they do good work? Does it mean that even if all the students in one section learn a great deal, because they’ve been taught by a skilled teacher, some of them will have to receive lower grades than students who’ve learned less in a section taught by an incompetent teacher? That students should be graded not on the basis of what they’ve learned, or the quality of the work they’ve done, but purely on the basis of their standing relative to the performance of other students? In other words, does this mean that good teaching is irrelevant to learning and learning irrelevant to grades? Could you please explain to us just how an arbitrary curve helps students learn more effectively, makes them into more thoughtful citizens, or conscientious human beings? I confess that I do not understand this concept. In the spirit of this blog, dedicated as it is to helping us improve our teaching, please expand a bit on just how this pedagogy works, and why it will make us better teachers, and our students better learners. And why, after watching students struggle through the course of the semester to improve the quality of their work, I would still be required to give them the same grades they earned on their first essays?
And perhaps you might also give us some insight into just how you think the Faculty Senate and our union could be convinced to go along with a “mandate” to make hard curves “Baruch policy.”
Andra says:

January 22, 2009 at 6:30 pm

Yes, you have the right idea about what I meant about a hard curve. This was the policy at my undergraduate institution and I think it makes a lot of sense.

Grades are inherently relative since different faculty members have widely varying views on what constitutes mastery of the material. I would have failed 1/3 of my students last term if I had been grading on an absolute scale. Was this because my tests were too hard or because the students didn’t study? I just don’t know – my sense is that you always get into these sorts of judgment calls if you leave the discretion to the instructor to define grades.

My undergraduate transcripts also listed both the student’s grade and the class average. This allows whoever is reading the transcript to come to their own conclusions about what the grades mean.

It just seems like the incentive of the professor is to give out high grades under the current system as such a strategy yields better teaching evaluations, fewer complaints from students (and hence less work), and a warm fuzzy feeling that you are a brilliant educator since so many of your students get good grades. Given this incentive compatibility problem, I believe it is the responsibility of the university to ensure that grades represent relative rankings on the same scale instead of widely varying opinions as to what constitutes mastery of the material. I certainly agree that good instructors teach their students more and that, if we were all benevolent social planner types, we would allocate grades without any regard to our own incentives and that this would be the best of all possible worlds. However, I’m not at all confident that we are in fact so blind to our own best interests and egos.

More generally, the problem is that the current system seems to explicitly put students from top-tier institutions in the best position when applying to graduate schools and on the job market. Unless you are intimately familiar with the academic standards of an institution, you just don’t know how to evaluate a transcript when it comes from a lesser known institution. So employers and grad schools rationally discount the grades from such institutions and choose to rely more on where the student went to school than how they did – they figure the very fact that the student got in and got at all decent grades means he or she is intelligent and competent. My suspicion is that the problem is compounded by substantially more grade inflation at private universities than at public institutions.

My view is that the best students from most universities are usually quite good but the average probably differs quite a bit from institution to institution. If I were an employer, I would much rather hire a student in the top decile at Baruch than a median student from an Ivy League university. But if I am not all that familiar with Baruch’s academic standards, I don’t know whether a student with a 3.75 is in the top decile or just OK. The current system thus does a serious disservice to our top performers. This seriously undermines the notion of equality of opportunity.
glennpetersen says:

January 25, 2009 at 4:56 pm

A few observations:

First, in my last comment I asked a series of questions about how this hard curve helps students learn. In Andra’s reply the word “learn” appears not at all; “teach” shows up just once. “Grade” or “grading” appears 12 times. We seem to be talking past one another. Perhaps someone else could weigh in here and find a way for the two of us to get onto the same page?

Second, in Andra’s list of “incentives,” I see nothing about educating students. I don’t know what the people she hangs out with talk about, but my colleagues and I spend far more time talking about what we want students to learn, and why we want them to learn it, than we do about gaming the system. Perhaps she might consider hanging with a different crowd.

Third, I am of course quite concerned about our students’ employment prospects (I have a daughter graduating in May and I have a pretty good sense of the apprehension she’s experiencing these days). But I have no influence whatsoever over corporate hiring practices, so I find Andra’s discussion about employers’ assessments of our students’ grades somewhat abstract. On the other hand, I do have a fair degree of impact on what students learn in my classroom, and so I’ll continue to focus my efforts on how I teach, rather than on how Baruch’s grades compare with Princeton’s. (I have difficulty imagining that simply by getting Baruch’s grading practices onto a level playing field we’d see our students achieving hiring preference over Harvard/Wharton/Princeton grads in, for instance, the financial services industry, where so many of them have sought employment in recent years, but I’d be more than happy to consider data to the contrary.)

Finally, the notion that we would be making a substantial contribution to equality of opportunity for our students by requiring ourselves to give them Fs, Ds, and a large number of Cs for what we think is good work calls to mind Samuel Johnson’s celebrated characterization of patriotism.
Dennis Slavin says:

January 26, 2009 at 11:15 pm

If we have general agreement about what our students should learn in a class – let me rephrase that as what our students should be able to DO once they’ve completed a class (solve certain types of equations; identify the style period in which pieces of concert music were composed; express in complete sentences six reasons for the start of the Civil War; etc.) and we inform them of our expectations – then grading becomes an exercise in judging how close they’ve come to achieving those goals. I can live easily with that amount of subjectivity in grading.

If all of our students can do those things, then they all deserve A’s, but we should reconsider where to position the bar next time round. In my experience we can come fairly close to general agreement about goals after a reasonable amount of discussion. And if there is a good mix of newer and more experienced instructors taking part in the discussion, the goals will be about right. If we find that a large percentage of the students don’t meet these criteria, we might have set the bar too high; we might have taught them poorly; they might have blown off our class.

Andra suggests that some instructors might like the “warm fuzzy feeling that you are a brilliant educator since so many of your students get good grades.” If that’s the Scylla to avoid, I’d suggest that Charybdis is the smug feeling of superiority that might derive from failing our students en masse. I’ve seen both.

Given that very few American colleges indicate deciles, I don’t think that our students are disadvantaged without that measure. Most employers know what a 3.75 (or 2.75) from Baruch means. By the way, the mean GPA of Baruch graduates in 2008 was 3.1.

Finally, I think that while “hard curves” are very unfair – a given class might be mostly nitwits or mostly honors students; how can the B’s in one class be compared with the other? – the essential problem is that such curves contradict what our enterprise is about: learning. Those who have learned should receive grades that reflect that.

Comments are closed.
