From john_damron@pop.mindlink.net Mon Oct 14 20:56:48 1996
POLITICS OF THE CLASSROOM (revised)
Pt. 1/SimpleText Version
e-mail: john_damron@mindlink.bc.ca
Copyright John C Damron, Douglas College, 700 Royal Ave, New Westminster, British Columbia, Canada. This document can be freely redistributed in whole or in part, provided that this copyright notice is included intact, and that no material profit is generated from such a transaction.
The author can be contacted in the Social Sciences Department, Douglas College, P.O. Box 2503, New Westminster, British Columbia, Canada, V3L 5B2 (telephone (604) 527-5312).
TABLE OF CONTENTS
MAIN TEXT (Quotations indicated by quote)
APPENDIX A: THE VALIDITY AND ACCURACY OF STUDENT RATINGS
APPENDIX B: MORE ON INSTRUCTOR PERSONALITY AND STUDENT RATINGS
APPENDIX C: THE VALIDITY OF STUDENT RATINGS INTERPRETATIONS
APPENDIX D: A CROSS-DISCIPLINE RATING BIAS
APPENDIX E: THE POLITICS OF VALIDITY -- DISCRIMINANT AND CONVERGENT
ENDNOTES (Numerically denoted in text as [x])
REFERENCES
From their inception earlier this century, student instructional rating questionnaires have been touted as a cheap and convenient means of evaluating the teaching of college and university faculty. They were eagerly embraced by college administrators in the 1960s because they offered a ready vehicle for assessing faculty hired to teach the droves of students entering post-secondary institutions. Their promise, technical appearance and utter simplicity have ensured the popular use of student instructional ratings for well over thirty years now.
Nevertheless, student ratings were and continue to be controversial because it is not clear that they do in fact assay teaching effectiveness. For example, some critics argue that student ratings are unduly affected "by the personal style of the instructor rather than [the] instructor's ability to convey instructional material" (Abrami, Leventhal & Perry, 1982). Ultimately, such concerns gave rise to a host of studies that explored the relationship of instructor personality to student ratings and, in some instances, to student achievement.
Before examining the contribution of instructor personality to
student instructional ratings, let us first take note of two
fundamental dimensions of teaching - instructional processes and
instructional products. Instructional processes comprise the
mix of routines, techniques and strategies employed by instructors to
promote learning in their students. Among others, these may include
informative lectures, classroom and laboratory demonstrations, field
trips, debates, and instructional videos. Instructional products, in
contrast, consist of the substantive outcomes promoted by
instructional processes (Abrami, d'Apollonia & Cohen, 1990).
Although both are essential elements of teaching, instructional
products are utterly fundamental. It is the products of teaching that
students come to possess. And it is the products of teaching that
students take with them when they leave places of learning in pursuit
of further challenges. Whether called student learning, mastery, or
cognitive growth, it is the capacity to prompt such changes that
distinguishes effective from ineffective teaching. Thus, one can say
that instructional products establish whether and to what extent
classroom practices are, indeed, effective instructional processes
(e.g., Abrami, d'Apollonia & Cohen, 1990). The reader is advised
to bear the process-product distinction clearly in mind while reading
the pages that follow.
Studies of instructor personality have considered two distinctly different views of personality (Murray, Rushton & Paunonen, 1990; Feldman, 1986) and a broad feature of personality, instructor expressiveness (e.g., Abrami, Leventhal & Perry, 1982). Research bearing on the latter has usually been conducted within the framework of the "Dr. Fox" (or educational seduction) paradigm, in which instructional processes and their assessment are examined in a laboratory setting.
Investigations of the relationship of instructor personality to student ratings have assessed personality either by asking instructors to complete an established personality inventory or by having students or colleagues record their perceptions of instructors' personalities on questionnaires provided for this purpose (Feldman, 1986). In a comprehensive review of the literature on instructor personality and student instructional ratings, Feldman (1986) found mixed evidence that students' ratings were systematically related to instructors' personalities. Although some studies found little or no relationship between the personality characteristics of instructors and their student ratings, others reported broad and robust relationships. These disparate outcomes are attributable to the procedures used to assess instructor personality (Feldman, 1986). When personality is assessed with an established personality inventory completed by the instructor, there is virtually no correlation between instructor personality and student ratings. However, when the assessment is based on the perceptions of students or colleagues, the overall relationship of instructor personality to student ratings is substantial, with positive correlations ranging from moderate to high (Feldman, 1986).
The effect of perceived instructor personality on student instructional ratings can be substantial. Murray (1975), for example, found that 67% of between-teacher variance in student ratings was attributable to, and predictable from, peer ratings of instructor liberalism, lightheartedness, extraversion, exhibitionism, and other perceived attributes. Sherman & Blackburn (1975) and Tomasco (1980) reported that student ratings were predictable from student perceptions of, among other characteristics, instructor amicability, pragmatism, nurturance, changeability, and exhibitionism. Similarly, Rushton, Murray & Paunonen (1983) and Murray, Rushton & Paunonen (1990) found that 40 to 70% of between-instructor variance in student ratings was attributable to peer assessments of instructors' personality attributes. For the most part, student perceptions of instructor personality are similar to those formed by departmental colleagues (Feldman, 1986). Research of this sort suggests that a considerable portion of the differences in instructors' student ratings is based on perceived personality characteristics rather than instructional effectiveness.
A fascinating extension of these findings was turned up in the aforementioned study by Murray, Rushton & Paunonen (1990). They found that the mix of perceived personality characteristics that yields high student ratings varied markedly for different course types. For instance, highly rated instructors of large introductory courses were perceived as liberal, neurotic and extraverted, while high-scoring instructors of smaller discussion-oriented classes were perceived as gregarious, adaptable and supportive. Findings such as these prompted the authors to conclude that instructors "tend to be differently suited to different types of courses" and that "the compatibility of teachers to courses appears to be determined in part by personality characteristics."
As noted above, researchers have also examined the effects of a more
restricted range of personality characteristics on student ratings
and student achievement. Most can be broadly summarized as
"instructor expressiveness." Although details differ from study to
study, the overall results suggest that vacuous but animated,
charismatic, and amusing lectures yield significantly higher student
ratings than substantive but less animated lectures.
A meta-analysis of a dozen of these studies revealed that "instructor expressiveness had a substantial impact on student ratings but a small impact on student achievement" (Abrami, Leventhal & Perry, 1982). Summary and global ratings, which are frequently used to make tenure and promotion decisions, were particularly elevated by instructor expressiveness. The analysis also found that lecture content had a sizable influence on student achievement but only a negligible impact on student ratings. Findings such as these prompted the architects of the educational seduction paradigm to conclude that student instructional ratings "should not be used to make decisions about faculty promotion and tenure because charismatic and enthusiastic faculty can receive favorable student ratings regardless of how well they know their subject matter or. . .how much their students learn" (Abrami, Leventhal & Perry, 1982, p. 447; Ware & Williams, 1975; 1980).[1;2]
What is one to make of these findings? Some researchers (e.g.,
Murray, 1975; Tomasco, 1980; Murray, Rushton & Paunonen, 1990)
suggest that the high positive correlation between perceived
instructor personality and student instructional ratings is evidence
that instructors are effective because of the persona they inject
into their classroom work. Thus, in a sense, instructor personality
is conceived of as an instructional process or an antecedent thereof.
However, this view begs a fundamental question. Namely, are student
perceptions of instructor effectiveness (as revealed by student
ratings) equivalent to more objective assays of instructional
effectiveness, e.g., examinations and other systematic measures of
instructional products? In fact, they are not. As noted above,
although the correlation between instructor expressiveness and
student ratings is sizable, the correlation between expressiveness
and student achievement is negligible (Abrami, Leventhal & Perry,
1982). More generally, while the correlation between student ratings
and perceived instructor personality is considerable, their
relationship to measures of student achievement is markedly smaller
(Dowell & Neal, 1982, 1983; McCallum, 1984) [3]. Thus, we surmise
that expressiveness and other perceived instructor characteristics
share a minor portion of variance with measures of student
achievement. Also, however, given their substantial contribution to
student ratings and the weak correlation between student ratings and
student achievement, it is clear that perceived instructor traits
share much more variance with student ratings than with student
achievement (see Appendix B for more information).
Perceived instructor personality is conceptually related to a phenomenon known to social psychologists as implicit personality theory (Brewer & Crano, 1994, pp. 128-139). As suggested by the name, implicit personality theories are generic presumptions people make in order to infer the personality attributes of other people. These inferences are usually made quickly and on the basis of scanty evidence. Indeed, they are often inordinately influenced by the first impressions formed of other people (e.g., Widmeyer & Loy, 1988). Thus, as Brewer & Crano (1994, p. 143) note, we tend to "make snap judgments about the personalities of other individuals that cannot possibly be. . .accurate" and indeed, "[there is] plenty of evidence of biases and distortions in the person perception process." Considering the ease with which impressions are formed and the typically sizable role of preconceptions, stereotypes and social context in their formation, perceptions of instructor personality may reveal as much about students as they do about instructors, a point made by Leventhal, Abrami & Perry (1976) in a closely related context.
The idea that student perceptions of instructors are affected by preconceptions and stereotypes is supported by research showing that male and female university faculty are evaluated differently by their students (Goodwin & Stevens, 1993). For example, Kaschak (1978) found that male students tended to rate male professors higher than their female counterparts but female students rated male and female professors equally. A similar study by Basow & Silberg (1987) found that male and female students rated female professors significantly lower than male professors, even when professors were matched for type of course, years of teaching experience, and tenure status. When considered together with the results of closely related studies (e.g., Lombardo & Tocci, 1979; Ferber & Huber, 1975), such findings prompt the conclusion that "less favorable ratings of women are most likely to occur when women are seen as not fitting gender stereotypes" (Basow & Silberg, 1987, p. 312), a conclusion reinforced by Berry's (1989) observation that "women faculty members are evaluated less favorably, especially when they step out of traditionally 'feminine' areas of knowledge."
Students are also apt to confuse situationally dictated and role-driven behavior with instructor personality, a point vividly illustrated below by Brewer and Crano.
Students are prompted to attribute instructor demeanor to personality
rather than roles and situations because the role is enacted by the
instructor, who is also the focus of attention. Situational factors
are not as salient (e.g., Brewer & Crano, 1994, p. 199) [4].
Thus, it is not altogether surprising that, as noted above,
instructors of small, discussion-oriented classes are often perceived
as gregarious and supportive while instructors of large introductory
university courses are seen as extraverted but also somewhat
"neurotic" (e.g., exposed, apprehensive, controlled and aloof).
Neither is it surprising that instructors become known in their
academic communities for these attributes. For the most part, the
circumstances of their teaching promote them.
Three interrelated observations seem warranted at this juncture.
First, to the extent that use of student instructional rating
questionnaires is predicated on the assumption that they measure
instructional effectiveness, the foregoing findings are clearly
problematic. When considered together with validity research yielding
only marginal and unstable relationships between student ratings and
instructional outcomes (e.g., Palmer, Carliner & Romer, 1978;
Dowell & Neal, 1982, 1983; Abrami, d'Apollonia & Cohen,
1990), it seems likely that most of the factors contributing to
student instructional ratings are unrelated to instructors' ability
to promote student learning (e.g., Small, Hollenbeck, & Haley,
1982; Ware & Williams, 1980; Chandler, 1978; see Appendix A).
Indeed, given the considerable contribution of perceived
instructor personality, student ratings may be more closely linked to
student stereotypes of teachers than to instructional effectiveness
(Erikson, 1983).
Second, the research summarized above creates knotty problems for conscientious instructors whose continued employment depends on receiving high student ratings. Since expressiveness and other perceived personality characteristics contribute greatly to high student ratings, instructors may choose to commit substantial time and psychological energy to projecting these qualities in class. However, while these inflate instructional ratings, they may add relatively little to student achievement (see Appendix B).
Alternatively, instructors may dedicate themselves to preparing substantive, conceptually challenging lectures because these contribute substantially to student achievement. However, sober, substantive and carefully given lectures may limit instructors' ability to evince exhibitionism, lightheartedness, amicability, and the like, even if they are prompted to do so. And, since lecture content contributes much less to student instructional ratings, the price instructors pay for this strategy is lower student ratings and, possibly, loss of promotions, salary increments, or employment.
Or, instructors may attempt a compromise by allocating classroom time to both strategies. But this would result in declines in either student ratings or student achievement compared to teaching keyed to one or the other strategy. Moreover, sizable doses of persona may draw students' attention away from intellectually challenging lecture materials (Ware & Williams, 1977, 1980). And clearly, since student ratings may bear on future employment and student achievement is essentially invisible to convenors and administrators, the safest strategy is to simply maximize student ratings.[5]
Third, an upshot of the research cited above is that student ratings are questionable assays of teaching effectiveness. However, in one important sense, this observation misses the fundamental point of student instructional ratings. Student ratings are often required of new teaching faculty (and encouraged for tenured faculty) because they make it more likely that instruction will uphold the "open-humanistic-excellence" rhetoric embraced by college administrations and sold to education ministries (see endnote 5).[6] As a means of classroom surveillance, administrative use of student ratings ensures that teaching will seldom rise to a level of substance that precipitates student unhappiness. For the most part, this practice is embraced independently of course objectives and standards, the capabilities and commitments of students, the validity of the evaluation instrument, and the consequences of shallow teaching for long-term student success. When combined with open-door admissions and exceedingly liberal grading policies, the net result can be little more than the trappings of student success.[7] Rather than unfettered excellence in post-secondary education, the overarching institutional agenda revealed by such practices is classroom marketability, elevated enrollments, and very high consumer satisfaction. Unfortunately, while such a stratagem may produce contented students, it essentially forsakes responsibility for educational leadership in favor of the methods, goals and standards of the commodity marketplace (McMurtry, 1991). Although endemic in private-for-profit training schools (e.g., Trend College, CompuCollege, and the like), this surely is not a state of affairs one expects to find in an institution of higher learning.
Consumer-market models of education have weighty and widespread implications for college and university instructors, particularly in light of the research on perceived instructor personality discussed in the pages above. Within the logic of such models, instructors themselves become commodities whose market value is established substantially by student (i.e., customer) ratings.[8] McMurtry (1992) makes a similar observation.
Whether embraced explicitly or tacitly, consumer-market models of
education raise a host of overarching issues that are in need of
prompt and thorough examination.[9] However, one point seems obvious
from the very start. There are limits on how much of themselves
instructors can be required to market in the interest of student
ratings. And there are substantial limits on how much instructors
should contribute to practices whose aims are centered more on
consumer satisfaction than student achievement.
Unfortunately, troublesome findings such as these have had little if
any influence on college teaching evaluation practices. The reasons
are fourfold. First, routine teaching evaluation is neither
influenced nor guided by the professional research literature.
Indeed, teaching evaluation is not widely understood as a practice
that must be conducted with validated instruments and professional
expertise. Deans, chairmen and convenors who wouldn't think of
interpreting other psychometric profiles often do not hesitate to
interpret student ratings data and ground rather weighty decisions in
their interpretations. Second, widespread use of student ratings
creates the appearance of teaching evaluation whether or not the
evaluation instrument has been validated. Everyone (including
students, boards of governors and education ministers) can see that
the institution is dedicated to "excellence in teaching." This
salutary effect would be lost if validation testing proved the
instrument to be invalid. Third, even invalid evaluation instruments
are capable of assessing student satisfaction with instructors and
their courses, a factor that college administrators may value as much
as or more than instructional effectiveness. Fourth, college
administrators embrace student instructional ratings because they
serve their managerial interests. Compared to qualitative evaluation
methods, student ratings render teaching and teaching evaluation
calculable and comparable across instructors and disciplines.
Moreover, even vacuous teaching evaluation allows administrators to
police faculty and induce them to comply with administrative agendas.
Few other administrative prerogatives offer such control.
Unfortunately, when acted on conscientiously by instructors, the data
yielded by invalid evaluation instruments are likely to prompt
classroom changes that diminish instructional effectiveness (see
endnote 3). Presumably, this compromise is acceptable to those who
place greater value on consumer satisfaction than effective teaching
and student achievement.
Although much of the foregoing constitutes a vigorous critique of student instructional ratings, it overshadows an obvious but noteworthy final point. Ultimately, it is not incumbent upon instructors to prove that student ratings are invalid measures of instructional effectiveness. The case for student ratings must be made by those who purport to evaluate teaching, for the same reasons that those presuming to assess intelligence, aptitude, and mental health must demonstrate the validity of their procedures. However, as suggested in the pages above, such a case will not be easily made and, indeed, may not be possible. Yet clearly, without convincing empirical evidence to the contrary, it is not obvious that student instructional rating questionnaires are accurately characterized as teaching effectiveness metrics. But what then are they?
Student ratings and the instructional changes implied by them are little more than prescriptions for professional demeanor. Although demonstrably linked to the social perceptions of students, these prescriptions are largely unrelated to the substantive products of instruction (see Appendix A). Moreover, when asserted dogmatically and enforced by threats to promotions, raises or continued employment, student ratings become a potent means of manipulating the behavior of college and university teachers. As such, they expose teaching faculty to arbitrary regimentation and thus constitute a considerable threat to academic freedom. For these and other reasons noted in the pages above, it is not surprising that instructors often chafe against mandatory use of student instructional ratings.
------------------------------------------------------------
As noted earlier in this paper, although instructional processes and
products are essential components of teaching, instructional products
are the more fundamental because they embody the effects of
instructional processes on student achievement. As such, they provide
criteria for establishing whether and to what extent classroom
practices are viable instructional processes. High positive
correlations between student ratings and instructional products and
low correlations with extraneous factors form the basis of student
rating validity.
According to Dowell & Neal (1982), the student ratings research literature is extensive, inconsistent, and of strikingly variable quality. It is thus difficult to interpret and summarize concisely. However, as outlined below by Abrami, d'Apollonia & Cohen (1990), several methodological considerations characterize all coherent rating validity research.
After circumscribing a similar set of controls, Dowell & Neal
(1982, p. 51) note that such methodological requisites are essential
considerations in any critical examination of validity studies
"because these elements are inappropriately implemented in all but a
few studies." Bearing the foregoing in mind, the validity of student
ratings will be summarized here in light of the results of four
meta-analyses of existing validity studies and Abrami, d'Apollonia
& Cohen's (1990) analysis of variability in multisection
validation findings.
In a meta-analysis of 41 early multisection validity studies, Cohen (1981) found that student achievement explained 18.5% and 22% of (respectively) overall instructor and course rating variance, leaving most between-teacher rating variance unaccounted for. It is appropriate to interpret these results cautiously, however, because many of the primary studies in Cohen's analysis used neither random assignment to course sections nor statistical procedures to control for initial differences in student ability (Cohen, 1981). In a better controlled subsequent meta-analysis, Cohen (1983) found that student achievement accounted for 14.4% of overall instructor rating variance.
Other analyses have turned up somewhat lower estimates of student rating validity. In a meta-analysis of 14 multisection validity studies, McCallum (1984) found that student achievement explained 10.1% and 6.4% of (respectively) overall instructor and course rating variance. And, in a quantitative analysis of six validity studies chosen for their exceptional control of student presage variables, Dowell & Neal (1982) found that student achievement accounted for only 3.9% of between-teacher student rating variance.[10] Indeed, in their primary study with the most extensive controls for student ability (Palmer, Carliner, & Romer, 1978), student achievement accounted for only 1.44% of between-instructor ratings variance. Both findings suggest that the link between student instructional ratings and achievement is mediated by student ability (Dowell & Neal, 1982).
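The percentages reported in the two paragraphs above are proportions of variance explained (squared correlations). The short sketch below converts each to the magnitude of the implied correlation and to the share of rating variance left unexplained. It is only an illustration of the arithmetic: the dictionary and function names are hypothetical, the figures are copied from the text rather than recomputed from the primary studies, and the sign of each correlation is assumed to be positive because it cannot be recovered from a squared value.

```python
import math

# Proportion of student rating variance explained by student achievement,
# copied from the figures quoted in the text (not recomputed from the studies).
VARIANCE_EXPLAINED = {
    "Cohen (1981), instructor ratings": 0.185,
    "Cohen (1981), course ratings": 0.22,
    "Cohen (1983), instructor ratings": 0.144,
    "McCallum (1984), instructor ratings": 0.101,
    "McCallum (1984), course ratings": 0.064,
    "Dowell & Neal (1982)": 0.039,
    "Palmer, Carliner & Romer (1978)": 0.0144,
}


def implied_correlation(r_squared):
    """Return the magnitude of the correlation implied by r squared."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie between 0 and 1")
    return math.sqrt(r_squared)


def unexplained_share(r_squared):
    """Return the proportion of variance NOT accounted for."""
    return 1.0 - r_squared


if __name__ == "__main__":
    for study, r2 in VARIANCE_EXPLAINED.items():
        r = implied_correlation(r2)
        rest = unexplained_share(r2)
        print(f"{study:38s} r = {r:.2f}   unexplained = {rest:.1%}")
```

Read this way, even the most favorable estimate leaves roughly four fifths of rating variance unaccounted for by achievement, and the best controlled primary study implies a correlation of about .12.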
Of equal significance, Dowell & Neal (1982) found evidence that validity coefficients are strongly affected by situational factors. They note that the research literature
Abrami, d'Apollonia & Cohen (1990) draw a similar conclusion
about variability in validity outcomes across validity studies and
over rating dimensions.
In a closely related observation, Dowell & Neal (1982) conclude
that
As noted on the first page of this appendix, validation designs must
incorporate control procedures that protect against threats to
internal and external validity. Failure to properly implement such
controls corrupts the integrity and generality of the data yielded by
the design and renders them essentially uninterpretable (Dowell &
Neal, 1982).
Cohen (1981) acknowledges that his meta-analysis cannot address concerns about the internal validity of validation studies. He notes that "it is difficult to determine. . .the extent to which achievement differences among sections can be attributed to differences among teachers" (p. 305). Dowell & Neal (1982) take a step further in concluding that most validation studies are of questionable internal validity (p. 51; p. 60). They also suggest that many or most validation studies lack external validity, a conclusion that Cohen (1981) resists, at least with respect to his own study (p. 305). However, Abrami, d'Apollonia & Cohen's (1990, pp. 222-224) examination of Cohen's validity outcome data suggests that his validity coefficients cannot be generalized to other students, instructors and institutional settings [Note: Cohen is the same person in both references].
Thus, the research above suggests that the validity of student instructional ratings is problematic for three fundamental reasons. First, validation studies that do not properly control for biasing factors (e.g., student characteristics) yield internally invalid and uninterpretable estimates of rating validity. Second, when appropriate controls are implemented, resulting validity estimates account for only a small fragment of between-instructor rating variance. The proportion of variance accounted for appears to be inversely related to the scope of the controls. Third, even among well-designed validity studies, validity coefficients tend to be highly variable and mediated by situational factors to such a degree that coherent context-independent estimates of validity are not possible (Dowell & Neal, 1982; 1983; Abrami, d'Apollonia & Cohen, 1990). The latter two problems have weighty implications for the accuracy and developmental utility of student ratings.
As noted above, meta-analyses of published validity studies (e.g., Dowell & Neal, 1982; Cohen, 1983; McCallum, 1984) indicate that student achievement explains between 4% and 14% of student ratings variance, leaving 86% to 96% of variance unaccounted for and attributable to factors other than teaching effectiveness (e.g., perceived instructor personality and the like).[11; 12] Such validity levels are characterized by near maximum standard errors of estimate (.9798 to .9250), exceedingly wide confidence intervals, and (consequently) a very high incidence of specious and adventitious differences between and among instructors' effectiveness ratings (e.g., Ferguson, 1981, pp. 130-132; Howell, 1992, pp. 237-240).[13]
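The standard errors of estimate quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes standardized (z-score) variables, for which the standard error of estimate is the square root of one minus the squared correlation, and it assumes validity coefficients of .20 and .38, values chosen here because they reproduce the quoted figures; the essay itself reports only the rounded variance percentages. The function names and the 1.96 multiplier for an approximate 95% band are illustrative choices, not part of the cited sources.

```python
import math


def standardized_see(r):
    """Standard error of estimate in standard deviation units: sqrt(1 - r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt(1.0 - r * r)


def approx_interval(predicted_z, r, z_crit=1.96):
    """Approximate band around a predicted z-score.

    Assumes normally distributed errors and ignores sampling error in r,
    so it is a rough large-sample illustration only.
    """
    half_width = z_crit * standardized_see(r)
    return predicted_z - half_width, predicted_z + half_width


if __name__ == "__main__":
    for r in (0.20, 0.38):  # assumed validity coefficients
        print(f"r = {r:.2f}  standard error of estimate = {standardized_see(r):.4f}")

    # An instructor rated one standard deviation above the mean: the predicted
    # achievement z-score is r * 1.0, surrounded by a very wide band.
    r = 0.38
    low, high = approx_interval(r * 1.0, r)
    print(f"predicted achievement z = {r:.2f}, approximate band = ({low:.2f}, {high:.2f})")
```

A standard error of estimate of 1.0 would mean that ratings carry no predictive information about achievement at all, so values of .93 to .98 are close to that ceiling. Under the stated assumptions, the band around the predicted achievement of a highly rated instructor extends well below the mean.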
As shown below, in comparing the results of their (1982) quantitative review of multisection validity studies with those of a critic (Cohen, 1983), Dowell & Neal (1983) come to a similar conclusion regarding the questionable accuracy of student ratings.
Dowell & Neal (1983) conclude their discussion with the following
observation about the managerial appeal of student ratings.
This observation is virtually identical to a thesis of the present
paper.
According to educator Wilbert McKeachie (1987, p. 4), "for personnel purposes, faculty and administrators rightly have great concerns about the reliability and validity of evaluation data." He goes on to suggest that these concerns are not as urgent when student ratings are used for instructional development. However, this is true only in the narrow sense that use of student ratings for instructional development does not usually affect one's employment status. The fact remains that student ratings may still play a considerable role in guiding and substantiating instructional change. At the very least, this function is served poorly by marginally valid or invalid teaching evaluation instruments.
It was noted earlier in this paper that laboratory studies show that
instructor expressiveness has a substantial impact on student
instructional ratings but a smaller effect on student achievement.
Indeed, a meta-analysis of such findings (Abrami, Leventhal &
Perry, 1982) found that instructor expressiveness accounted for 29%
of student rating variance but only 4.3% of student achievement
variance. In contrast, lecture content accounted for 16% of student
achievement variance but only 4.6% of student ratings variance. Thus,
an upshot of educational seduction studies is that student ratings
are oversensitive to the expressive style of instructors and
substantially insensitive to instructors' ability to promote student
learning (Abrami, Leventhal & Perry, 1982). As illustrated below
by the authors of the foregoing meta-analysis, these differences are
not particularly subtle.
In related studies of student and instructor characteristics, student
ratings, and student achievement, Abrami, Perry & Leventhal
(1982) obtained results similar to those reported above. Moreover,
Abrami, Perry & Leventhal (1982) found that in both laboratory
and field studies (see Abrami & Mizener, 1985)
Unlike educational seduction research, studies of the relationship of
perceived instructor personality to student achievement have been
somewhat rare (e.g., Feldman, 1986). However, in a field (real
classroom) study by Murray (1978), instructional effectiveness was
assessed by students' ratings and by objective measures of student
achievement. Murray found that peer ratings of instructor personality
characteristics were more highly correlated with student ratings than
with student achievement. He also found that the perceived
personality traits associated with student ratings overlapped little
with the particular attributes associated with student achievement.
Thus, while instructor liberalism, exhibitionism, extraversion and
personal warmth contributed sizably to student ratings, they were
unrelated to objective measures of student achievement.
Multiple correlations between peer ratings of instructor personality
and, respectively, student ratings and student achievement, revealed
that perceived instructor personality traits explained 2.6 times more
variance in student ratings than in student achievement (R-squared
values of .5476 and .2116, corresponding to multiple correlations of
.74 and .46). On the basis of the foregoing, Murray concluded that
In a frequently cited multisection validity study, Sullivan &
Skanes (1974) obtained similar results using a methodology different
from Murray's. They found a sizable subgroup of instructors who
facilitated high achievement in their students yet received low
ratings from them, and a second subgroup whose members prompted low
student achievement but nonetheless received high ratings.
In institutional environments in which instructors' student ratings are visible and student achievement is essentially invisible, inequities are inevitable. Instructors who are skilled at the art of impression management are likely to receive high student ratings whether or not their students have adequately mastered course materials. In contrast, instructors with effective pedagogical skills who cannot or will not manage students' impressions will receive substantially poorer ratings, especially if they fail to exude liberalism, exhibitionism and other key personality attributes. These findings, which would seem to be consistent with those of educational seduction researchers, call into question the validity of student ratings as measures of instructional effectiveness.[15]
------------------------------------------------------------
Although recent critiques of student ratings have quite properly
focused on rating instrument validity, another form of validity --
the validity of ratings interpretations -- has also been discussed.
The point here is that student rating data can be valid or invalid,
as can interpretations of ratings data. This point is well taken.
Even if a sufficiently valid rating questionnaire existed, there are
no guarantees that interpretations of ratings data will be valid (or
reasonable, coherent or fair). The edited excerpts below on ratings
interpretation validity are drawn from a paper written by Jennifer
Franklin and Michael Theall (1990). These authors oversee faculty
development offices at Northeastern University and the University of
Alabama respectively.
_______________________________
Conversations with faculty and administrators...led increasingly to concerns about what users [e.g., chairmen; deans] were doing with the information we were providing. We saw that some departmental administrators, who routinely use ratings to make decisions about personnel, evaluation policy, and resource allocation, were not familiar enough with important ratings issues to make well informed decisions...
We received many requests from faculty for assistance in interpreting reports, and we discovered that our clients would not or could not use many of the instructions for interpretation that we had provided. Clearly stated disclaimers regarding the limitations of ratings data in particular circumstances appeared to have little effect on the inclination of some clients to use invalid or inadequate data...
Our research findings, as well as anecdotal reports from many of our colleagues, suggest that many of those who routinely use ratings are liable to be seriously uninformed about critical issues. For example, among faculty respondents who reported using ratings for personnel decisions involving other faculty, nearly half were unable to identify likely sources of bias in ratings results, recognize standards for proper samples, or interpret commonly used descriptive statistics...
A great deal of scholarly attention has been paid to the validity and reliability of student ratings as a measure of instructional quality. Considerably less has been given to actual practice... Utilization of ratings is one of the least often studied or discussed issues in the realm of ratings phenomenon. There are far fewer reported observations of ratings users in action in personnel decision making or of the ways in which teaching improvement consultants use ratings in interactions with their faculty clients...
Even given the inherently less than perfect nature of ratings data and the analytical inclinations of academics, the problem of unskilled users, making decisions based on invalid interpretations of ambiguous or frankly bad data, deserves attention. According to Thompson (1988, p. 217) "Bayes' Theorem shows that anything close to an accurate interpretation of the results of imperfect predictors is very elusive at the intuitive level. Indeed, empirical studies have shown that persons unfamiliar with conditional probability are quite poor at doing so (that is, interpreting ratings results) unless the situation is quite simple." It seems likely that the combination of less than perfect data with less than perfect users could quickly yield completely unacceptable practices, unless safeguards were in place to insure that users knew how to recognize problems of validity and reliability, understood the inherent limitations of rating data and knew valid procedures for using ratings data in the contexts of summative and formative evaluation.
Whether the practices of those who operate rating systems or use ratings can stand close inspection has become open to question. It is hard to ignore the mounting anecdotal evidence of abuse. Our findings, and the evidence that ratings use is on the increase, taken together, suggest that ratings malpractice, causing harm to individual careers and undermining institutional goals, deserves our attention ...(pp. 78-80).
_________________________________
The mechanics and style of interpreting ratings appear to vary dramatically across the domains of ratings use, particularly with respect to the role of quantitative information. It is our impression that many teaching consultants employ subjective, experientially based methods of dealing with information, while administrative decision makers may strive to construct empirically based (or "empirical looking") formulas...
There are some fundamental concepts for using numbers in decision making. To the degree that these concepts are ignored, interpretations of data become, at best, projective tests reflecting what the user (e.g., a chairperson or dean) already knows, believes, or perceives in the data. Treating tables of numbers like inkblots ('ratings by Rorschach') will cause decisions to be subjective and liable to error or even litigation...
Ratings are particularly subject to sampling problems, such as not having enough courses on which to base a comparison between two instructors and not involving enough students in rating each course section. Moreover, the fact that classes with fewer than thirty students are statistically small samples means that special statistical methods are required for some purposes.... Substantially different models for analysis are also required for various uses of the data. Given such problems, there are many opportunities for error in dealing with numbers. Three types of errors come to mind immediately.
The first involves interpretation of severely flawed data, with no recognition of the limitations imposed by problems in data collection, sampling, or analysis. This error can be compared to a Type I error in research -- wrongly rejecting the null hypothesis -- because it involves incorrectly interpreting the data and coming to an unwarranted conclusion. In this case, misinterpretation of statistics could lead to a decision favoring one instructor over another, when in fact the two instructors are not significantly different.
The second type of error occurs when, given adequate data, there is a failure to distinguish significant differences from insignificant differences. This error can be compared to a Type II error -- failure to reject the null hypothesis -- because the user does not realize that there is enough evidence to warrant a decision. In this case, failure to use data from available reports (assuming the reports to be complete, valid, reliable, and appropriate) may be prejudicial to an instructor whose performance has been outstanding but who, as a result of the error, is not appropriately rewarded or, worse, is penalized.
The third type of error occurs when, given significant differences, there is a failure to account for or correctly identify the sources of differences. This error combines the other two types and is caused by misunderstanding of the influences of relevant and irrelevant variables. In this case, a personal predisposition toward teaching style... may lead a user to attribute negative meanings to good ratings, or to misinterpret the results of an item as negative evidence when the item is actually irrelevant and there is no quantitative justification for such a decision.
Any of these errors can render an interpretation entirely invalid...
How can we conceptualize the problem of ensuring that users do not make decisions or take actions that are based on invalid interpretations of data? In the following example, invalid interpretations are seen to result from either invalid or unreliable data or from lack of skill, knowledge, or necessary information on the part of the user. The strategy is to make sure that users either have or have access to sufficient skills or information to form valid hypotheses. Valid, reliable hypotheses are those interpretations of ratings that knowledgeable, skilled users, with adequate information concerning the present data, would be likely to produce or concur with.
Let us...state our goal in the following way: "The user will make decisions that are based on valid, reliable hypotheses about the meaning of data." In this case, the user should receive or construct working hypotheses that do the following things:
- Take into account problems in measurement, sampling, or data collection and include any appropriate warnings or disclaimers regarding the suitability of the data for interpretation and use.
- Do not attempt to account for differences between any results when they are statistically not significant (probably <.05).
- Disregard any significant differences that are merely artifacts (for example, small differences observed in huge samples, which can technically be significant but are unimportant).
- Account for any practically important, significant differences between results in terms of known, likely sources of systematic bias in ratings or reliably observed correlations, as well as in terms of relevant praxiological constructs about teaching or instruction.
The user should also refrain from constructing or acting on hypotheses that do not meet these conditions... (pp. 87-89)...
The validity of inferences or interpretations should concern those
who design and operate ratings systems as much as validity and
reliability of instruments used to obtain the data... How use occurs
ought to be a very important issue, one for which those who develop
ratings systems ought to be held accountable... (pp. 80-81).
_______________________________
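The excerpt's distinction between differences that are statistically significant and differences that are practically important can be made concrete with a small sketch. The Python below is illustrative only: the function name compare_mean_ratings, the sample figures, the shared standard deviation, and the choice of a large-sample z-test with Cohen's d as the effect-size measure are all assumptions introduced here, not procedures taken from Franklin and Theall.

```python
import math

def compare_mean_ratings(mean_a, mean_b, sd, n_a, n_b):
    """Compare two mean ratings with a two-sided large-sample z-test.

    Returns (p_value, cohens_d). Both groups are assumed to share the
    same standard deviation `sd`, a simplification for illustration.
    """
    # Standard error of the difference between the two means.
    se = sd * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = (mean_a - mean_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Effect size: the difference expressed in standard-deviation units.
    cohens_d = (mean_a - mean_b) / sd
    return p_value, cohens_d

# Hypothetical figures on a 5-point scale with SD = 0.8.
# Case 1: a tiny gap (0.05 points) observed in huge samples.
p_huge, d_huge = compare_mean_ratings(4.05, 4.00, 0.8, 5000, 5000)
# Case 2: a larger gap (0.30 points) observed in modest samples.
p_small, d_small = compare_mean_ratings(4.30, 4.00, 0.8, 40, 40)

print(f"huge samples:   p = {p_huge:.4f}, d = {d_huge:.3f}")
print(f"modest samples: p = {p_small:.4f}, d = {d_small:.3f}")
```

With these made-up inputs, the tiny gap clears the conventional .05 threshold only because the samples are enormous, while the larger gap does not clear it. For classes of fewer than thirty students, which the excerpt calls statistically small samples, a small-sample method (for example, a t-test) would be the more appropriate choice than the z-test used in this sketch.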
The following is an edited excerpt from an article on student ratings
written by William Cashin (1990), a prominent evaluation expert at
Kansas State University. The paper is titled "Students Do Rate
Different Academic Fields Differently." Cashin examined very large
data bases of students' ratings obtained with either the Educational
Testing Service's Student Instructional Ratings questionnaire (SIR)
or Kansas State University's IDEA questionnaire. Both are widely used
in the USA. In a nutshell, he found sizable differences in how
students rate teaching across various academic disciplines.
__________________________________
If you ask a college teacher whether students rate different academic fields differently, he or she will most probably say yes. If you ask why, you are not likely to be given much justification beyond the conviction that different fields are different. Nevertheless, there is increasing evidence that the conventional wisdom is correct. Students do rate academic fields differently. What is not clear is why...
The high group tends to consist of the arts and humanities. This trend is not universal, however; English language and literature and history both fall into the medium-low group. The low groups tend to consist mostly of business, economics, computer science, math, physical sciences, and engineering. The biological and social sciences and health and other professions tend to fall somewhere in the middle.
If we look at "Course Effectiveness" and "Instructor Effectiveness" combined, we see that the fine and applied arts and music fall into the high group for both measures. If we consider fields that are high on one measure and medium-high on the other, art, communications, foreign languages and literature, home economics, secretarial studies, and speech also fall toward the high end. This is very much a humanities cluster, with the exception of home economics and secretarial studies.
Several fields fall into the low group for both course effectiveness and instructor effectiveness: business and management, computer and information sciences, data processing technologies, economics, engineering, physical sciences, and physics. To the fields that were low on one measure and medium low on the other we must add accounting, chemistry, mathematical sciences, and philosophy. This is very much a math-science technical cluster, with the exception of philosophy and, perhaps, business and management.
The primary implication [of these findings] is that...we need to decide what to do about this phenomenon when we interpret student-ratings data. Administrators can no longer look at data from a variety of fields and unquestioningly compare numbers directly. Instructors cannot look at two courses they are teaching and necessarily assume that, if their ratings for the two courses are the same, that they taught both courses equally well.
The real problem arises from our not knowing why the different fields are rated differently. This finding is not due just to variations in student motivation (for example, required versus elective courses) or class size. In one unpublished analysis of IDEA data it was found, even after researchers controlled for students' motivation and class size, that differences in academic fields explained an additional 10 percent or more of the variance for some IDEA course objectives. In another study of a sample of IDEA data, 14-18 percent of the remaining variance was explained after controlling for differences among institutions, in number of courses for each field, in student motivation, and in class size.
There are several possible explanations for differences in the ratings of different academic fields. One is that the more quantitative courses tend to receive lower ratings. The low fields tend to be math, science, engineering, and quantitative business courses (for example, accounting and economics). A possible explanation for these differences is that students' quantitative skills are more poorly developed than their verbal skills. This would make quantitative courses more difficult to teach. Moreover, quantitative courses may receive lower ratings because students have lower expectations of success and lower actual rates of success. We have evidence that higher student ratings are related to...students' satisfaction and that, as grades decrease, students more frequently attribute their poor performance to factors external to themselves.
Another explanation of different ratings for different fields is that the more sequential courses, where success depends heavily on the mastery of material from a previous course, tend to receive lower ratings. This holds true for most math and science courses and for many professional courses, but it also holds true of foreign language courses, which tend to receive low ratings. Sequential courses may receive lower ratings because today's students are not studying as much as students have in previous decades and so do not have as solid a foundation for the courses that come later in a sequence...
Yet another explanation is that students in different majors rate courses differently, because of differences in attitude, in academic skill and goals, in motivation, in learning styles, or in models of effective teaching. Although students majoring in any given field are likely to vary in many ways, it is quite possible that, taken as a group, they have certain characteristics that are related to how they rate courses and instructors... (pp. 113-119).
----------------------------------------
* Lawrence Aleamoni, an evaluation expert at the University of Arizona, has made an observation of a similar sort regarding rating biases against required courses and student biases associated with various course levels (e.g., freshman, sophomore, and the like). He reports that
the variables that distinguish a required course from an elective, and that identify courses by level (freshman, sophomore, and so on) do seem to generate significant differences in student ratings. For example, the higher the proportion of students taking the class as a requirement, the lower the overall rating. [Moreover], freshmen tend to rate their teachers significantly lower than do sophomores, sophomores tend to rate them significantly lower than do juniors, and so on.
There are literally hundreds of student rating questionnaires in use
in North America, many of which are "home grown" or mass produced by
optical scanner manufacturers. Although a few questionnaires have
undergone convergent validity testing (e.g., the SIR, IDEA and MSSIR),
most have not, despite the finding that validity coefficients
can be negative (see endnote 14). Moreover, with very rare exceptions,
none have undergone discriminant validation. Thus, although weighty
decisions about faculty careers are often made on the basis of
student ratings, the psychometric integrity of the rating instrument
is essentially unknown but treated as coherent and valid. It is
difficult to imagine a more cavalier state of affairs.
INSTRUCTOR PERSONALITY AND THE POLITICS OF THE CLASSROOM (revised) Pt. 4
------------------------------------------------------------
1. The two bodies of research cited here, on perceived instructor personality (e.g., Feldman, 1986) and on instructor expressiveness (e.g., Abrami, Leventhal, & Perry, 1982), complement each other in methodologically significant ways. The latter research was conducted in a laboratory setting where study variables are carefully manipulated and measured in a context free of extraneous influences. Studies of this sort are said to have high internal validity because they accurately identify relationships between or among variables. The former research was done in a more natural but less controlled setting where contaminating variables may influence the outcome of the study. Such studies tend to have high external validity, meaning that their results are readily generalizable to other settings. Conclusions rooted in both sorts of study are preferred.
2. Closely related concerns about administrative uses of student ratings have been expressed, respectively, by educators Paul Rosenfeld and S.C. Erikson.
3. The ratings yielded by virtually all student instructional rating
questionnaires bear a marginal and unstable or unknown relationship
to the very quality effective teaching must promote: student
learning. College instructors are expected, nevertheless, to submit
themselves to teaching evaluation programmes, permit their
professional performances to be judged by them, and, perhaps, change
their instructional techniques in light of feedback from them.
Although many instructors indeed do these things, it is difficult to
imagine a practice more harmful to a community that is ostensibly
committed to instructional effectiveness. While such programmes
create the appearance of coherent teaching evaluation, they provide
little or no basis for accurately assessing instructional products or
promoting the development of effective pedagogical techniques. And
ironically, because student instructional ratings are poorly
correlated with instructional products, the changes instructors make
to their teaching routines to elevate student ratings are more likely
to compromise than improve teaching effectiveness.
This can be a rather disconcerting realization for enthusiastic proponents of student ratings, who may prefer to believe that evaluation feedback is inherently veridical. But on what grounds is this belief sustainable? What is the relationship of student feedback to classroom proficiency? On what basis can instructors know that the classroom changes implied by student rating feedback are, in fact, improvements? And if feedback bears only a meager relationship to student learning, to what of pedagogical worth is it related? These questions are neither frivolous, unreasonable, nor esoteric. They are utterly fundamental and in need of unambiguous answers.
4. This tendency is widely known among social psychologists as the "fundamental attribution error," according to which people regularly attribute role- or situation-driven behavior to the personality of the actor.
5. As noted below by a seasoned B.C. community college observer, predicaments such as these are mostly of political rather than pedagogical origin.
6. In most instances this socializing strategy succeeds. New
instructors are usually on probation for two years or longer and are
vulnerable to dismissal. Most are prepared to do what they must to
comply with the expectations of those empowered to affect their
employment. And most, therefore, are ready to do what is necessary to
assure that the student ratings they receive are acceptably high. In
an atmosphere in which good teaching is equated with high student
ratings, it makes sense to weave into one's classroom performance
virtually anything that elevates such ratings. For the most part,
this is simple survival. However, in light of the research reported
herein, it is not likely to give rise to effective teaching processes
(also see endnote 3).
7. In a day and age dominated by vulgar consumerism, college degree mills and media hype, the differences between education and the trappings of education can easily go unnoticed. They go unnoticed because both endow students with the conspicuous signs of education -- course credits, grades, transcripts, and degrees, diplomas or certificates. The mere trappings of education are now pervasive in North America. Despite their grades and diplomas, graduates of the public school systems often read, write and think at levels that belie their thirteen years of public education. As indicated in the observation below, this is not a particularly new problem.
This observation was made in 1981 by John Roueche, professor and
director of the Community College Leadership Program, University of
Texas at Austin. Twelve years later, public school students fared no
better.
Statistics such as these make it all the more necessary for colleges
to adopt strategies that are demonstrably effective. Yet, as Roueche
& Roueche (1993, p. 20) note below, college administrations
across North America have not been particularly eager to do so.
8. This "market value" can sizably affect instructor employability,
tenure, promotions, raises and awards. Probationary and nonregular
faculty are, of course, particularly vulnerable.
9. As noted below by McMurtry (1991), the incursion of commodity market influences into academic and educational matters is now considerable.
10. This validity coefficient, which was based on coefficients
computed by primary researchers, rose to 6.8% when a primary study
with a negative validity coefficient was excluded.
11. Cohen's (1981) results were not used to calculate this range because they were influenced by primary studies in which students were not randomly assigned to course sections and student ability controls were not used, thus yielding dubious evidence of validity (Abrami, Leventhal, & Perry, 1982, p. 459). Inclusion of Cohen's global validity coefficients would result in an upper limit of 22% and a slightly smaller standard error of estimate.
12. Of course, large systematic sources of ratings variance such as perceived instructor personality (40-70% of rating variance) essentially drive student ratings when perceptions of instructor personality aren't controlled for.
13. In this context the confidence interval is the range of rating values that contains an instructor's true teaching effectiveness score in 95 out of 100 samples (i.e., assessments). In the research cited on page 12 by Dowell & Neal the confidence interval very nearly contains all possible rating values. Thus, virtually any rating score could be an instructor's true score.
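Why such an interval can span nearly the whole range of scores may be seen from the textbook standard error of estimate, which equals the criterion standard deviation multiplied by the square root of (1 - r-squared). The sketch below is an illustration under stated assumptions, not a reconstruction of Dowell & Neal's calculation: the function name is hypothetical, the criterion standard deviation is set to 1 so that results are in standard-deviation units, and the 6.8% variance figure is borrowed from endnote 10 purely as an example input.

```python
import math

def prediction_interval_half_width(variance_explained, sd_criterion=1.0, z=1.96):
    """Half-width of an approximate 95% interval around a predicted score.

    Uses the standard error of estimate, sd * sqrt(1 - r^2), where
    `variance_explained` is r^2 expressed as a proportion (0 to 1).
    """
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("variance_explained must lie between 0 and 1")
    see = sd_criterion * math.sqrt(1.0 - variance_explained)
    return z * see

# Baseline: a predictor that explains none of the criterion variance.
no_predictor = prediction_interval_half_width(0.0)
# Illustrative input: a predictor that explains 6.8% of the variance.
weak_predictor = prediction_interval_half_width(0.068)
# Proportional narrowing of the interval gained from the weak predictor.
shrinkage = 1.0 - weak_predictor / no_predictor

print(f"half-width, no predictor:   {no_predictor:.2f} SD")
print(f"half-width, weak predictor: {weak_predictor:.2f} SD")
print(f"proportional narrowing:     {shrinkage:.1%}")
```

Under these assumptions, a predictor that explains 6.8% of the criterion variance narrows the interval by less than 4 percent relative to having no predictor at all, which is consistent with the endnote's remark that virtually any score could be the true one.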
14. Dowell & Neal (1983) note that similar variability characterizes mean validity estimates, an observation echoed by McCallum (1984, p. 151) and Abrami, d'Apollonia & Cohen (1990, e.g., pp. 222-223).
15. Some social psychological studies also shed light on the
relationship of perceived instructor personality to student ratings.
For example, in a study of the effects of instructor "warmth" on
student perceptions of instructor personality and teaching ability,
Widmeyer & Loy (1988) informed a 270-student physical education
class that their regular professor would be temporarily unavailable.
In his place would be a guest lecturer, "Dr. Jim Wilson," who would
deliver a lecture on the compatibility of sport and education (Dr.
Wilson was actually senior author W.N. Widmeyer). Students were then
given a written biographical sketch of Dr. Wilson. For half of the
students, the biographical sketch disclosed that Dr. Wilson was
considered to be "a rather warm person - industrious, critical,
practical, and determined" (the "warm" condition). Remaining students
received the same sketch with the exception that Dr. Wilson was
characterized as a rather cold person (the "cold" condition). All
students were informed that their impressions of Dr. Wilson would be
sought after the lecture. Dr. Wilson was then ushered into class
where he delivered a 40-minute "neutral and informative lecture" to
all students. He then departed the lecture theatre. A teaching
assistant distributed evaluation materials and instructed students to
assess Dr. Wilson's personality and teaching ability.
Statistical analyses yielded significant differences (p < .001) between assessments performed in the warm and cold conditions. With respect to personality, students who were told that Dr. Wilson was cold assessed him as less pleasant, less sociable, less good-natured and less humorous than students in the warm condition. With respect to teaching ability, students in the cold condition assessed Dr. Wilson as less knowledgeable, less considerate, less interesting and less intelligent than students in the warm condition. Students in the cold condition were also less likely than their counterparts to surmise that Dr. Wilson would "go far" in his teaching career. These results are consistent with early findings of person perception researchers (e.g., Asch, 1946; Kelley, 1950).
Widmeyer & Loy (1988) concluded that student assessments of teaching abilities
[In related research, Murray (1978) also found that warmth was
significantly correlated (0.47) with student ratings. However, Murray
found that warmth was virtually unrelated (-0.01) to student
achievement.]
---------------------------------
* An earlier version of this paper appeared in the June 1994 issue of
FACULTY MATTERS (No. 5, pages 9-12) and the September, 1994 issue of
UPDATE (the newsletter of the Okanagan University College Faculty
Association). The author would like to express his gratitude to Dr.
Bruce Landon, Dr. Stephen Mainprize, Dr. Ray Koopman, Dr. Bruce
Alexander, Dr. John McMurtry, Mr. Ross Powell, Mr. Bill Main, Ms.
Jean Cockburn, Ms. Pam Burry, and Ms. Roslyn Dixon for their helpful
and encouraging comments on earlier drafts of this paper. The author
can be contacted in the Social Sciences Department, Douglas College,
P.O. Box 2503, New Westminster, British Columbia, Canada V3L 5B2
(Telephone (604) 527-5312).
----------------------------------
December 27, 1994
Revised June 17, 1995
Revised December 14, 1995
Revised July 21, 1996
Revised October 8, 1996
Word Count
Main Text: 3400
Total: 12,911
------------
Abrami, P.C., d'Apollonia, S., & Cohen, P.A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology, 82, 219-231.
Abrami, P.C., Leventhal, L., & Perry, R.P. (1982). Educational seduction. Review of Educational Research, 52, 446-464.
Abrami, P.C., Perry, R.P., & Leventhal, L. (1982). The relationship between student personality characteristics, teacher ratings, and student achievement. Journal of Educational Psychology, 74, 111-125.
Abrami, P.C. & Mizener, D.A. (1985). Student/instructor attitude similarity, student ratings, and course performance. Journal of Educational Psychology, 77, 693-702.
Aleamoni, L. (1989). Typical faculty concerns about evaluation of teaching. In L.M. Aleamoni (Ed.), Techniques for evaluating and improving instruction. San Francisco: Jossey-Bass, Inc.
Asch, S.E. (1946). Forming impressions of personality. Journal of Abnormal and Social Psychology, 41, 258-290.
Basow, S.A., & Silberg, N.T. (1987). Student evaluations of college professors: Are female and male professors rated differently? Journal of Educational Psychology, 79, 308-314.
Berry, E. (1989). Taking women professors seriously. Paper presented at the annual meeting of the American Psychological Association, San Francisco, April 1989.
Brewer, M.B., & Crano, W.D. (1994). Social Psychology. St. Paul, MN: West Publishing Co.
Cashin, William. (1990). Students Do Rate Different Academic Fields Differently. In Theall, M. & Franklin J. (Eds), Student Ratings Of Instruction: Issues For Improving Practice. Jossey-Bass, Inc. San Francisco, 1990.
Chandler, T.A. (1978). The questionable status of student evaluations of teaching. Teaching of Psychology, 5, 150-152.
Cohen, P.A. (1981). Student ratings of instruction and student achievement: A meta analysis of multisection validity studies. Review of Educational Research, 51, 281-309.
Cohen, P.A. (1983). Comment on "a selective review of the validity of student ratings of teaching." Journal of Higher Education, 54, 448-458.
Dowell, D.A. & Neal, J.A. (1982). A selective review of the validity of student ratings of teaching. Journal of Higher Education, 53, 51-62.
Dowell, D.A., & Neal, J.A. (1983). The validity and accuracy of student ratings of instruction: A reply to Peter A. Cohen. Journal of Higher Education, 54, 459-463.
Erikson, S.C. (1983). Private measures of good teaching. Teaching of Psychology, 10, 133-136.
Feldman, K.A. (1986). The perceived instructional effectiveness of college teachers as related to their personality and attitudinal characteristics: A review and synthesis. Research in Higher Education, 24, 139-213.
Ferber, M.A., & Huber, J.A. (1975). Sex of student and instructor: A study of student bias. American Journal of Sociology, 80, 949-963.
Ferguson, G. (1981). Statistical analysis in psychology and education. (5th ed.). New York: McGraw-Hill.
Franklin, J., & Theall, M. (1990). Communicating student ratings to decision makers: Design for good practice. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: Issues for improving practice. San Francisco: Jossey-Bass, Inc.
Goodwin, L.D., & Stevens, E.A. (1993). The influence of gender on university faculty members' perceptions of "good" teaching. Journal of Higher Education, 64, 166-185.
Howell, D.C. (1992). Statistical methods for psychology (2nd ed.). Belmont, CA: Duxbury Press.
Kaschak, E. (1978). Sex bias in student evaluations of college instructors. Psychology of Women Quarterly, 2, 235-243.
Kelley, H.H. (1950). The warm-cold variable in first impressions of persons. Journal of Personality, 18, 431-439.
Leventhal, L., Abrami, P.C., & Perry, R.P. (1976). Do teacher rating forms reveal as much about students as about teachers? Journal of Educational Psychology, 68, 441-445.
Lombardo, J. & Tocci, M.E. (1979). Attribution of positive and negative characteristics of instructors as a function of attractiveness and sex of instructor and sex of subject. Perceptual and Motor Skills, 48, 491-494.
McCallum, L.W. (1984). A meta-analysis of course evaluation data and its use in the tenure decision. Research in Higher Education, 21, 150-158.
McKeachie, W. (1987). Can evaluating instruction improve teaching? In L.M. Aleamoni (Ed.), Techniques for evaluating and improving instruction. San Francisco: Jossey-Bass, Inc.
McMurtry, J. (1991). Education and the market model. Journal of Philosophy of Education, 25, 209-217.
McMurtry, J. (1992). Evaluating teaching by evaluating learning. Unpublished manuscript. University of Guelph, Ontario, Canada.
Murray, H.G. (1975). Predicting student ratings of college teaching from peer ratings of personality type. Teaching of Psychology, 2, 66-70.
Murray, H.G. (1978). Teacher ratings, student achievement, and teacher personality traits. Paper read at the annual meeting of the Canadian Psychological Association.
Murray, H.G., Rushton, J.P., & Paunonen, S.V. (1990). Teacher personality and student instructional ratings in six types of university courses. Journal of Educational Psychology, 82, 250-261.
Palmer, J.G., Carliner, J. & Romer, T. (1978). Leniency, learning, and evaluations. Journal of Educational Psychology, 70, 855-863.
Rosenfeld, P. (1987). Instructor's manual to accompany Scarr and Vander Zanden's Understanding Psychology (5th ed.). New York: Random House.
Roueche, J.E. (1981). Transfer and attrition points of view: Don't close the door. Community and Junior College Journal. December/January.
Roueche, S.D., & Roueche, J.E. (1993). Making good on the promise: The view from between a rock and a hard place. American Association of Community Colleges Journal. April/May.
Rushton, J.P., Murray, H.G., & Paunonen, S.V. (1983). Personality, research creativity, and teaching effectiveness in university professors. Scientometrics, 5, 93-116.
Sherman, B.R. & Blackburn, R.T. (1975). Personal characteristics and teaching effectiveness of college faculty. Journal of Educational Psychology, 67, 124-131.
Small, A.C., Hollenbeck, A.R., & Haley, L. (1982). The effect of emotional state on student ratings of instruction. Teaching of Psychology, 9, 205-211.
Sullivan, A.M. & Skanes, G.R. (1974). Validity of student evaluation of teaching and the characteristics of successful instructors. Journal of Educational Psychology, 66, 584-590.
Thompson, G.E. (1988). Difficulties in interpreting course evaluations: Some Bayesian insights. Research in Higher Education, 28, 217-222.
Tomasco, A.T. (1980). Student perceptions of instructional and personality characteristics of faculty: A canonical analysis. Teaching of Psychology, 7, 79-82.
Ware, J.E., & Williams, R.G. (1975). The Dr. Fox effect: A study of lecturer effectiveness and ratings of instruction. Journal of Medical Education, 50, 148-156.
Ware, J.E., & Williams, R.G. (1977). An extended visit with Dr. Fox: Validity of student ratings of instruction after repeated exposure to a lecture. American Educational Research Journal, 14, 449-457.
Ware, J.E., & Williams, R.G. (1980). A reanalysis of the Dr. Fox experiments. Instructional Evaluation, 4, 15-18.
Widmeyer, W.N., & Loy, J.W. (1988). When you're hot you're hot! Warm-cold effects in first impressions of persons and teaching effectiveness. Journal of Educational Psychology, 80, 118-121.
John C Damron, PhD
DOUGLAS COLLEGE
P.O. Box 2503
New Westminster, British Columbia
Canada V3L 5B2 FAX: 527-5095
e-mail: john_damron@mindlink.bc.ca