From: Mike Theall
Date: Apr 28 2006 - 7:43am
I have three internal reports to prepare by next Wednesday (good thing it’s Trav’s turn in webcast 2), so I can’t make a thorough response now, but maybe later. However, I’ll try for a quick answer.
Anything can happen, including the apparent contradiction of high learning and low ratings. No one claims the generalizations are perfect. In fact, it’s the rare violations that give rise to a lot of the confusion about ratings. My position is that a good (i.e., a comprehensive) eval system will have the capability and the mandate to further explore both the bus hits and the embarrassments. That system combines various eval mechanisms with data from assessment, institutional research, and other sources to try to understand phenomena that don’t “fit”.
Example: Jen Franklin and I looked at gender and ratings and found (from the entire dataset) no relationship between gender and ratings (i.e., no evidence of gender bias). We then broke the data down by discipline. AHA! We found significant gender differences in ratings in business and fine arts (i.e., a violation of the generalization from a lot of major research and from our own full-dataset analysis). Then we collected data about the gender of the faculty and found, in these two cases, predominantly male faculty. Then we explored course assignments and found that in these areas women were primarily teaching large-enrollment, required, entry-level courses (each factor contributing a bit toward lowering average ratings for whoever teaches such courses). Thus, the ratings differences may have had nothing to do with gender and teaching, but rather with context and teaching (i.e., the difficulty of teaching such courses compared to senior seminars, etc.). A replication at another university (both studies with big samples, by the way) found no male-female differences in these disciplines. Further exploration also found no gender inequity in hiring and no inequities in course assignments.
So, had we stopped at finding significant differences favoring males in these disciplines, we could have made the error of claiming that ratings are biased by gender and/or by some disciplinary factor. There may have been bias in the first case, but it wasn’t from students and it wasn’t because ratings were invalid or unreliable. Just the opposite. Ratings provided data that were reliable and valid and, when properly reviewed, showed evidence of a different kind of possible gender bias: that coming from administrators or the workplace environment.
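The pattern described above is essentially confounding by course assignment (a Simpson’s-paradox-style effect): an apparent gender gap in the aggregate that vanishes once course type is held constant. A minimal sketch with invented numbers (none of these ratings come from the actual studies; the grouping logic is the point):

```python
# Hypothetical illustration: within a discipline, women appear to get
# lower average ratings, but the gap disappears once course type
# (large intro course vs. senior seminar) is taken into account.
# All numbers below are invented for illustration only.

records = [
    # (gender, course_type, rating)
    ("F", "intro",   3.2), ("F", "intro",   3.4), ("F", "intro", 3.3),
    ("F", "seminar", 4.4),
    ("M", "intro",   3.3),
    ("M", "seminar", 4.5), ("M", "seminar", 4.3), ("M", "seminar", 4.4),
]

def mean(xs):
    return sum(xs) / len(xs)

def avg_rating(gender, course_type=None):
    """Average rating for a gender, optionally within one course type."""
    vals = [r for g, c, r in records
            if g == gender and (course_type is None or c == course_type)]
    return mean(vals)

# Aggregate comparison: women look worse overall...
print(avg_rating("F"), avg_rating("M"))  # 3.575 vs 4.125

# ...because women teach mostly intro courses, which draw lower ratings
# for whoever teaches them. Within each course type the gap vanishes:
print(avg_rating("F", "intro"),   avg_rating("M", "intro"))    # equal
print(avg_rating("F", "seminar"), avg_rating("M", "seminar"))  # equal
```

Stopping at the first comparison would have supported the wrong conclusion; the disaggregated comparison is what pointed the investigation toward course assignments rather than student bias.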
So, violations of the general rules occur with ratings (as with every other area). Claiming the occasional violation as evidence of invalidity or unreliability is most often a mistake. Good practice requires the ability and support to investigate unusual phenomena.
Use my name if you want to.