Thursday, May 11, 2006

Evaluation of teaching NOT a priority for any professional organization! - Trav Johnson 5-1-2006

"... general evaluation of teaching is not a priority for any professional organization. ... Having an organization whose primary focus is evaluating teaching ... would be an important step in promoting and legitimizing the evaluation of teaching in higher education. "

From: Trav Johnson

Date: May 1 2006 - 6:34pm

Tom, I like your idea. Thanks, Mike, for your great description of the history and context of this issue. I would like to add one point that may seem peripheral, but I think it is important.

I have noticed for years that there is not a strong professional organization for the evaluation of teaching. I know there is the special interest group within AERA and there is a topical interest group in AEA. There is some interest/support from organizations like POD and AIR. There is also support from specific centers such as the IDEA Center and the Evaluation Center (Western Michigan).

But the general evaluation of teaching is not a priority for any professional organization. It is always something added on the side and its proponents are scattered across a host of organizations, institutions, and efforts. I think this fragmentation is a hindrance to establishing more widespread understanding and support for the evaluation of teaching. Having an organization whose primary focus is evaluating teaching (or at least one of its primary foci is evaluating teaching) would be an important step in promoting and legitimizing the evaluation of teaching in higher education.

This type of organization could help develop and promote a set of “7 Principles” or other standards/guidelines/best practices for evaluating teaching. And it could lend more credibility and unity to evaluation-of-teaching ideas and efforts. This may be one of the things needed to eventually establish more widely accepted and more broadly used principles of practice, such as "7 Principles for Fair and Effective Student Evaluation of Teaching".
Trav

"Principles for good evaluation will always be threatening" - Mike Theall 5-1-2006

"... 1) there has been no major organizational backing for published evaluation principles; 2) there has not been coverage of evaluation issues the equivalent of the press surrounding C & G’s publication of their principles; and 3) principles for good undergraduate education do not threaten anyone, whereas principles for good evaluation will always be threatening. ... Agreeing to the principles might mean actually having to do something and it seems to me that a lot of objection to ratings comes from objection to the very notion of being evaluated. "
From: Mike Theall

Date: May 1 2006 - 9:08am

Tom said, “…there is as much mis-, mal- and non-feasance around evaluation of teaching and courses as ever…. I say we forget the flat Earthers and move on. They have never been convinced by research or reasoned argument. More of the same will be a similar waste of time.” I agree.

From the start, Jen Franklin & I stressed improving the practice of evaluation. In fact, our first major paper was about the knowledge and attitudes of ratings users. We did a number of more typical papers on ratings validity & related issues, but those were primarily a function of working on and needing to validate the eval & dev system we developed in our FIPSE grant in the late 80s.

In the more recent past, especially since about 2000-2001 when I started collaborating with Raoul Arreola, I have thought even more about the state of practice; about the closed-minded attitudes of so many; and about the effectiveness of the established research in terms of affecting practice. Not to say the research is weak … just the opposite … but the reality is that the researchers, as good as they were, most often spoke to each other and did not reach the wide audience of users. That’s what led to our development of the “meta-profession” model as a tool to help institutions, faculty, and administrators deal with the issues “on the ground”. Campus discussions and attempts to reach consensus about faculty work and performance expectations have to be the basis for evaluation and development policy & practice.

That’s why I put the emphasis on evaluation being “local”, on coupling eval & dev, and on examination of faculty work. Raoul, Jen, & I all come from “systems” backgrounds and view evaluation from a macro perspective as well as a micro one. The larger view is critical because it demonstrates the need for well-articulated evaluation & development systems rather than haphazard processes and ad hoc questionnaires. Our primary target audience is institutional administrators because they have the ability to put effective systems in place. As an example, think of the differences between AERA and AAHE. At AERA, researchers talk to each other. AAHE succeeded because it talked to top-level administrators and got “buy-in” on its initiatives and support for campus activities.

So, would a “7 Principles” approach make a difference? I can’t predict that it would, but it couldn’t hurt. There have been several attempts to disseminate guidelines before. Ken Doyle implied as much in his books in ’76 & ’83, as did John Centra in 1979. Braskamp, Brandenburg & Ory list 12 “considerations” for evaluation and 5 for development in their 1984 book. The second books from both Centra (’93) and Braskamp & Ory (’94) reinforced their broader views of good practice. McKnight had a 14-point list back in 1984. Dick Miller’s second book had a 10-point list in 1987. Pete Seldin has a chapter on building a system in his 1999 book, and it has guidelines as well. The most specific applied process is Arreola’s “8-Step” approach (in his 2 editions of ‘Developing a comprehensive eval system’, ’95 & ’00) that results in a “source-impact matrix” specifying and prioritizing what will be evaluated, by whom, using what methods. That will be reinforced in the third edition, coming out this summer, and that book will have an extended description of the ‘meta-profession’ and its application to evaluation & development.

The guidelines I went over in the first webcast incorporate this work and add my own twists. A shorter list could be taken pretty much directly out of those guidelines. So it’s not that there haven’t been attempts to disseminate something like a ‘7 Principles’. The differences have been in the marketing and politics of these ideas. Three factors seem important: 1) there has been no major organizational backing for published evaluation principles; 2) there has not been coverage of evaluation issues the equivalent of the press surrounding C & G’s publication of their principles; and 3) principles for good undergraduate education do not threaten anyone, whereas principles for good evaluation will always be threatening.

Witness what we see most in the press (e.g., the Chronicle as prime offender): ill-informed stories about ratings controversies that paint the occasional negative study (e.g., Williams & Ceci) as being equal in weight to well-established research. There is no equivalent history of casting doubt on the research that underlies the 7 Principles. The reality is that few will take issue with mom & apple pie statements about effective education. Not to dismiss the Principles, but check out any institutional mission statement to find obeisances in all the appropriate directions. It’s easy and non-threatening to pay lip service to these statements, but to accept evaluation (especially by those who haven’t reached our pinnacle of intellectual authority and disciplinary expertise & stature) is another story. There are valid & reliable ways to do evaluation well. Agreeing to the principles might mean actually having to do something and it seems to me that a lot of objection to ratings comes from objection to the very notion of being evaluated. I suspect such critics would take issue with almost any form of evaluation of teaching.

Well, enough of my cynicism. The bottom line is that it would be a matter of minutes to develop a set of principles. Wide publication of these principles, particularly if they were offered collaboratively by one or more professional organizations, could have an impact. I could get POD and the AERA SIG on Faculty Teaching, Evaluation & Development to sign on, and we have contacts in the other organizations that would be relevant. Any attempt to improve practice is a step in the right direction. We should probably do something like this & I am very willing to take part.

Next step?

mike

"Student Engagement Test"? - Trav Johnson 4-28-2006

"... my suggestion is that we think of a 'student engagement' test. We want students to do what it takes for them to learn significant concepts, skills, etc.; so the focus of evaluation is on course designs and instructors that best facilitate and support student engagement in significant learning. "
From: Trav Johnson

Date: Apr 28 2006 - 4:54pm

Let me take a little different approach to one part of this discussion. I know what I am about to say is implied in some of the previous comments by others, but I think it is useful to make it explicit.

Both the design of a course and the characteristics/actions of the teacher are obviously important, but neither is the primary focus of what we really want to know. First and foremost, we want students to learn that which is important and relevant (we could spend all day discussing what “important and relevant” might mean, but let’s suspend that discussion for now). We know from research and learning theories that students learn by doing, by being engaged in the learning process. So we are not so concerned with course design and what teachers do per se, only that these facilitate what students do to learn (think of Barr and Tagg's article "From Teaching to Learning” and Dee Fink’s book “Creating Significant Learning Experiences”).

Course design and teacher performance are means to an end. What we want to see are course designs that require students to engage in significant learning. We want to see instructors who effectively engage students in the learning process. So my suggestion is that we think of a “student engagement” test. We want students to do what it takes for them to learn significant concepts, skills, etc.; so the focus of evaluation is on course designs and instructors that best facilitate and support student engagement in significant learning.

So if instructors are completely interchangeable, we may wonder what value the instructor adds to student engagement in learning in a course. If student engagement is dependent purely on the instructor teaching the course, we might wonder what is wrong with the course design. By the way, my experience is that a well-designed course that engages students in meaningful learning can go a long way to compensate for *almost* any teacher deficiencies, i.e., the course can be adequate or even very good. On the other hand, course design alone will seldom make a course outstanding. The amazing courses I sometimes hear about (i.e., those that have life-changing effects on many students) always seem to have an exceptional teacher (in addition to good course design).

Trav

Trav D. Johnson, Asst. Director
BYU Faculty Center, 4450 WSC
(801) 422-5845

Ratings Drop as Consequence of Shift in Course Design Toward Active Learning? - Steve Ehrmann 4-28-2006

"... I've heard this story many times before: instructor shifts to a design that involves more active learning, collaboration, and student responsibility. Instructor ratings drop, sometimes to a career threatening degree."
From: Steve Ehrmann

Date: Apr 28 2006 - 9:27am


I was the person (or at least one of them) who asked the question Steve cites at the beginning of his email. Here are some additional facts:
* the two physics courses have very different designs. The earlier, traditional design features a single lecturer and about 800 students (two lecture sections; lots of discussion sections taught by others). The use of only one lecturer (in a department of 80+ faculty) makes it more feasible for the department to select superior lecturers, and student ratings are often high.
* One faculty member (who had taught the course some years ago and had gotten high ratings) didn't like the learning outcomes and redesigned the course. In the new design, students are taught in groups of 100 (each with a different instructor, plus TAs), using discussion, experiments, clickers, etc. I THINK his ratings went down. I'm sure that the ratings of the average instructor (8 of them) were lower than the ratings of the single lecturer in the old design.
* I know the learning outcomes (conceptual understanding) improved dramatically, relative to the old design.
* a highly rated instructor in the same department claims that a) the department doesn't have enough good teachers to staff so many sections of the redesigned course, so the average rating of the eight course leaders will be significantly lower than the rating of the single lecturer in the old design, and b) he infers that lower instructor ratings mean that students in this introductory course are less likely to want to learn physics in the future. It's better to have more affection for physics than better learning outcomes, he said. Whether you agree or not, this is consistent with what Mike said in the workshop - whatever SCE [Student Course Evaluation] measures, it's only a partial measure of how good the course was.
* My guess is that many faculty in this research-oriented department of 80 faculty would agree that most of them could not do either model well (lecture freshmen; guide on the side for freshmen), that some could teach comfortably in both ways, and that some could do only one well.

I've heard this story many times before: instructor shifts to a design that involves more active learning, collaboration, and student responsibility. Instructor ratings drop, sometimes to a career-threatening degree. I don't know how often this drop in SCE scores is a measure of decreased satisfaction (the instructor isn't working as hard for me? I'm not comfortable learning this way?) and to what extent it is an artifact -- the change in pedagogy created a mismatch with the questions on the form (which perhaps were biased toward questions about lecturing: the instructor is lecturing less, so scores go down, even for students who are satisfied with the course and learning well).

Steve

**********
Steve Ehrmann (ehrmann@tltgroup.org)
The TLT Group
301-270-8311
Blog: http://jade.mcli.dist.maricopa.edu/2steves/

Validity & Reliability of Student Course Ratings - M. Theall 4/28/2006

"... a good (i.e., a comprehensive) eval system will have the capability and the mandate to further explore both the bus hits and the embarrassments. "

From: Mike Theall
Date: Apr 28 2006 - 7:43am

I have three internal reports to prepare by next Wednesday (good thing it’s Trav’s turn in webcast 2), so I can’t make a thorough response now, but maybe later. However, I’ll try for a quick answer.

Anything can happen, including the apparent contradiction of high learning and low ratings. No one claims the generalizations are perfect. In fact, it’s the rare violations that give rise to a lot of the confusion about ratings. My position is that a good (i.e., a comprehensive) eval system will have the capability and the mandate to further explore both the bus hits and the embarrassments. That system combines various eval mechanisms with data from assessment, institutional research, and other sources to try to understand phenomena that don’t “fit”.

Example: Jen Franklin & I looked at gender & ratings and found (from the entire dataset) no relationships to ratings (i.e., no evidence of gender bias). We then broke the data down by discipline. AHA! We found significant differences in ratings in business & fine arts (i.e., a violation of the generalization from a lot of major research and from our own full-dataset analysis). Then we collected data about the gender of the faculty & found, in these 2 cases, predominantly male faculty. Then we explored course assignments & found that in these areas women were primarily teaching large-enrollment, required, entry-level courses (each factor contributing a bit toward lowering average ratings for whoever teaches such courses). Thus, the ratings differences may have had nothing to do with gender and teaching, but rather with context and teaching (i.e., the difficulty of teaching such courses compared to senior seminars, etc.). A replication at another university (both studies with big samples, by the way) found no male-female differences in these disciplines. Further exploration also found no gender inequity in hiring and no inequities in course assignments.
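To make the disaggregation step concrete, here is a minimal sketch of that kind of subgroup check. It is an illustration of the general approach only, not the actual Franklin & Theall analysis, and it assumes a flat table of ratings with hypothetical column names (rating, instructor_gender, discipline, course_level, enrollment).

```python
# Minimal illustrative sketch, NOT the actual Franklin & Theall analysis.
# Assumes a pandas DataFrame with hypothetical columns:
#   rating, instructor_gender, discipline, course_level, enrollment
import pandas as pd


def gender_gap_by_discipline(df: pd.DataFrame) -> pd.DataFrame:
    """Mean rating by instructor gender, overall and within each discipline."""
    # Full-dataset comparison (the step that showed no overall gender effect).
    print(df.groupby("instructor_gender")["rating"].mean())

    # Disaggregated comparison (the step that flagged particular disciplines).
    return (
        df.groupby(["discipline", "instructor_gender"])["rating"]
        .agg(["mean", "count"])
        .unstack("instructor_gender")
    )


def course_context_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Before attributing a gap to gender, compare the courses each group is
    assigned: required, entry-level, large-enrollment courses tend to pull
    ratings down for whoever teaches them."""
    return df.groupby(["discipline", "instructor_gender"]).agg(
        share_entry_level=("course_level", lambda s: (s == "entry").mean()),
        mean_enrollment=("enrollment", "mean"),
        mean_rating=("rating", "mean"),
    )
```

If the second table shows that the lower-rated group is disproportionately assigned the large, required, entry-level courses, then the apparent "gender effect" in the first table may really be a course-assignment effect, which is exactly the distinction the example above turns on.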

So, had we stopped at finding significant differences favoring males in these disciplines, we could have made the error of claiming that ratings are biased by gender and/or by some disciplinary factor. There may have been bias in the first case, but it wasn’t from students and it wasn’t because ratings were invalid or unreliable. Just the opposite. Ratings provided data that were reliable and valid and when properly reviewed, showed evidence of a different kind of possible gender bias: that coming from administrators or the workplace environment.

So, violations of the general rules occur with ratings (as with every other area). Claiming the occasional violation as evidence of invalidity or unreliability is most often a mistake. Good practice requires the ability and support to investigate unusual phenomena.

Use my name if you want to.

Cheers,

mike

Text Chat That Started It All - Excerpts: Bus Test vs. Embarrassment Test, etc. [vs. Hurricane Katrina Test]

Several of us began discussing these issues toward the end of our first synchronous session in the TLT Group's online workshop "Student Course Evaluations -- from Paper to On-Line: Issues, Questions, and Some Answers" (April 27, May 4, and May 11, 2006), co-sponsored by the POD Network.

We were exchanging questions and comments by voice, in public text chat, and in private text chat. Here's an excerpt from the public text chat discussion:

"Tom Angelo] Well-designed courses have to take into account the characteristics of the students (and that requires assessment of their characteristics, particularly prior learning and beliefs). But well-designed courses are of limited use if they don't focus on the "right" outcomes.

[Steve Gilbert TLT Group] Can a course be well-designed if it mostly takes advantage of the skills of a brilliant, charismatic teacher?

[Tom Angelo] Perhaps, but it won't pass the "bus test".

[Steve Gilbert TLT Group] "bus test"? I'll bite....

[Melissa McDaniels] Are students positioned to comment on course design?

[Mike Theall] Steve, I don't think you can design a course that way. Sure, take advantage of a strength, but don't rely on it alone
...

[Tom Angelo] If that brilliant, charismatic teacher (Mike Theall? Steve Gilbert?) gets hit by a bus, can another smart and well-prepared teacher (who's not brilliant or charismatic) achieve similar learning outcomes with the students?

...

[Steve Gilbert TLT Group] Bus test only applies to a teacher? What if that bus hit the whole dept? Demolished the entire campus.....?

[Tom Angelo] That's the Hurricane Katrina test."

Embarrassment Test AND Pride Test?

Could the real test we’re looking for be a combination of the “embarrassment test” and the “pride test”?

If lots of people would be happy that the bus hit the teacher, then that course/etc. is bad.
If lots of people would be upset if the bus hit the teacher, then that course/etc. is good.

If some course, teacher activity, etc. is happening in a way that many people want to hide from the public because there is something embarrassing about it, then that is probably bad.

If some course, teacher, etc. is happening in a way that many people want to tell the world about it because they are proud of what they believe is happening, then that is probably good.

If hardly anyone would notice if the bus hit the teacher and someone replaced him/her, then that is probably bad. Isn’t it?

Who would brag about courses passing "The Bus Test"?

Suppose an institution could say that all the courses were so “well-designed” that they could pass the “bus test.” Would that be something to brag about?

I think it might be rather embarrassing to acknowledge that “all of our courses are structured so that the faculty members are truly interchangeable.” It would be a little like trying to brag about having figured out a really terrific curriculum guide and selection of textbooks without saying anything about the qualifications, skills, or other characteristics of the teachers. [Without saying anything about the variations in learning goals and needs among the students, either!]

Why do you think MIT can give away so much course content without anyone believing even for a moment that MIT is giving away MIT courses or any significant part of an MIT education? Commercial publishers and for-profit educational organizations can provide excellent training and develop excellent instructional materials… some of which can be used with interchangeable teachers or no teachers at all by SOME students for SOME purposes.

[Of course, many people have been learning quite well quite independently via access to nothing more than books. But the number of such learners compared with the overall population is almost negligible. Some of the best of those are called “scholars” and they are widely recognized as unusual – if not downright peculiar.]

I would expect an institution to want to be able to say something like: “We have really hard-working, highly skilled teams to develop course-related materials and help faculty members develop syllabi and activities. We’re really proud of the kind of support these teams can provide for our faculty members – especially how well we can respond to differences among the faculty, among departments, and among course goals. Obviously, this approach is just an extension of our intense commitment to respecting individual differences among students, too.”