Monday, October 15, 2012

Dr. Ries and Insensitivity of Tests


According to the author of this week's article, W. James Popham (2011, p. 298), when evaluating a large-scale accountability test one must consider the test's "instructional sensitivity." A test's instructional sensitivity represents the degree to which students' performances on that test accurately reflect the quality of the instruction that was provided specifically to promote students' mastery of whatever is being assessed. An instructionally sensitive test should be capable of distinguishing between strong and weak instruction by allowing us to validly conclude that a set of students' high test scores is meaningfully, but not exclusively, attributable to effective instruction. Similarly, such a test would allow us to accurately infer that a set of students' low test scores is meaningfully, but not exclusively, attributable to ineffective instruction. Popham goes on to suggest that the only practical way to appraise a test's instructional sensitivity is to create panels of 15 to 20 curriculum specialists and teachers who are knowledgeable about the content under study, and to add to that group of specialists several noneducators. Popham recommends that such a panel rely on four evaluative dimensions. In essence, those who judge large-scale accountability tests should consider: 1) the number of curricular aims assessed, 2) the clarity of the assessment targets, 3) the number of items per assessed curricular aim, and 4) the instructional sensitivity of the items.

First, do you agree with Popham that large-scale accountability tests need to be examined more closely by expert panels? Why or why not? Then discuss any one of these four evaluative dimensions suggested by the author and make a case as to why it should or should not be used for the large-scale evaluative purposes discussed here. Taking this thought a step further, can the "four dimensions of evaluation" model discussed in this article be applied to you in your classroom and to your teacher-made assessments? How do you evaluate the "instructional sensitivity" of the tests you create for your students? Do you apply any of the evaluative dimensions to your personal evaluation of tests?

7 comments:

  1. I wholeheartedly disagree with what Mr. Popham says with regard to creating panels of curriculum experts and non-educators to closely examine large-scale accountability tests. While I agree that assessments need to be looked at in terms of the four evaluative dimensions, I don't think that such tests need to be examined by anyone but the teachers, administrators, and whoever is writing that particular curriculum. We don't need non-educators involved in this process.

    With regard to the "instructional sensitivity" of my tests, I believe it is my job as an educator to vary the types of assessments I give, so that all students have a fair chance of doing well on different types of assessments, such as oral or written tests. I already make accommodations for my special-needs students. I don't think we should have to do more than these things!

  2. I found Popham's article interesting, if somewhat "wonky," to read. I do think he makes a very good point about the validity of using standardized tests to judge the quality of instruction. This is such an important issue for all NJ public school teachers, who will soon find that their performance evaluations are based in part on standardized test scores.

    Should there be panels of experts? I would think so; after all, some set of people is designing these tests. I have no idea, for instance, what the process is for developing the new test items for the Common Core Standards. However, I suspect that there are many people involved in the endeavor of test creation, including careful consideration of the design, content, and balance of test items (at least I hope so!). Popham's framework for evaluating individual test items seems reasonable to me. He specifies that the panels should be made up mostly of subject matter experts, including teachers, who know the content very well.

    The review of test "item sensitivity" seems important to me. Popham's idea of examining each test item for the influence of factors other than instruction that affect a student's ability to answer it was right on. We all know that students in higher District Factor Groups generally score higher on the ASK than students in urban districts, and we all know that it's not because of teacher quality (in fact, it may be in spite of it!). We all know that some students are just brighter, and/or better test takers, and they will score higher on these tests. I'm impressed by Popham's effort to align the assessment items with the goals the test is supposed to be measuring (isn't this what we need to do on the tests we design for our students?). Measuring teacher effectiveness on factors that have nothing to do with instruction is actually bizarre. So Popham's idea of carefully considering test items to sift out those that are more likely measuring SES or individual academic aptitude than instruction is essential if we are using these tests to assess instructional effectiveness.

    Popham's premise is closely related to our readings in the Brookhart book this week. If we want to create assessments that measure what our students have learned in our class, we need to ensure we develop assessments that are related to the learning goals. Analogously, if we want to measure teachers on their instruction, we need to make sure that the tools we use for measurement are more related to their teaching than to the demographics of their student populations.

    Replies
    1. I agree with what Popham stated about schools, their tests, and the validity of those tests. He said that large-scale accountability tests have taken precedence in schools, since in some states teachers are held accountable for making sure that their students meet the standards that are set for them to pass the state test. If the students do well, that means they understand what was being taught to them.
      The number of aims and goals that need to be assessed in the curriculum is very important.
      Through experience, all teachers realize that they need to pick and choose the right goals to use instead of all of them. When figuring out what goals or aims to teach for the test, a teacher should pick the most important goals that will be covered on the test. You must look at what you can cover in a week, a few days, or a monthly unit, and also make use of rubrics, benchmarks, skills tests, simple solutions, and weekly spelling and math tests.
      The way I address the tests I use in my own classroom is by creating my own weekly tests for some subjects, depending on which subjects require me to create them. Some subjects already have test generators from which I can make photocopies of the test. Some tests I create on my own from what I am teaching. I make a review sheet for my students of what will be on the tests I am giving them; I am a firm believer in giving my students a review sheet of what they are being taught. I also make flash cards for what I am teaching in Math, Reading, and Spelling and go over them with my students.
      Yes, I do try to use different types of assessments for grading my students. For example, today my class took a vocabulary test, and with one of my students I did a verbal version of the test to make sure they understood what I was teaching.

  3. Essentially, on one level or another, every educator in America has been adversely affected by high-stakes testing. How can one test adequately rank what does or does not happen in a classroom? I think high-stakes testing is merely a snapshot of the learning process of a student at a particular point in time. Our society is so focused on "almighty test scores" that children, and fostering a genuine love of learning, have fallen by the wayside, and it saddens me greatly!
    I contend the last thing we need to bring into education is another outside opinion, especially that of non-educators. It is time educators realized we are the experts in our chosen field of study. Furthermore, how would these non-educators be chosen? I have a funny feeling that educators would have very little to no say in selecting this group, much the same way that we have very little say in what we teach and how we teach it outside of our individual classrooms. We never seem to be included in the policy and politics that affect us.

    Replies
    1. I think standardized tests such as the HSPA are an extremely poor indicator of teacher instruction, especially when the current teacher of the student is held responsible for that student's performance. That student could have received top-notch instruction from the current teacher but poor instruction from prior teachers. So why is only one teacher held accountable? It should impact everyone who ever taught that student. With non-standardized tests, a teacher should make sure he/she analyzes the data to see if there was a misconception during the learning process. If there was, the data should be used to help correct that disconnect.

      I think teacher evaluations should definitely incorporate student achievement in some respect. However, that approach has so many kinks that they need to be worked out before anything is implemented.

  4. Large-scale accountability tests will not only have implications for students; they will have implications for teachers as well. Across the country, they will be used as part of the new teacher evaluation process. After reading the article by Popham, I am not sold on the validity of these tests or on their ability to accurately measure traits that are influenced by instruction. Popham states that "all but a few of the accountability tests now having such a profound impact on our nation's schools are instructionally insensitive," and that "they are patently unsuitable for use in any sensible educational accountability program." This concerns me, to say the least. To think that teachers' compensation will be affected by accountability tests that have no overwhelming proof of validity is frightening. If a test is going to affect a teacher's livelihood, there needs to be clear and present evidence that the test does what it claims to do. The Popham article confirms that there is no clear proof that these tests are a direct reflection of instruction by any one teacher.
    In the section "Items Per Assessed Curricular Aim," Popham points out that these large-scale accountability tests do not have enough items to allow teachers, or anyone else for that matter, to evaluate which, if any, curricular aims were met. One of the main goals of assessment should be not only to evaluate student progress but also to serve as a tool for teachers. A teacher needs to be able to evaluate whether or not the content they are teaching is reaching the students and whether understanding is being achieved. If a teacher can't tell where they may be missing the mark, how can they ever hope to correct the problem and rethink their teaching of that content?
    My disdain for these kinds of tests only grew when I read the Rook article today. The article supports the idea that standardized testing is not fair to students, and I would add that this kind of evaluative practice is then unfair to teachers as well. The research is clear: some students are heavily advantaged by these tests and some are considerably disadvantaged. We cannot choose our students. Teachers have to work with what they are given, and if the students they teach are disadvantaged by a test, then their teaching is probably disadvantaged by that test as well.

  5. (I had to send this on a separate submission - it was too long to submit as one. Sorry guys...) I evaluate the instructional sensitivity of my testing on a regular basis. I always review the tests in an open forum after I have graded them. I evaluate which questions the students did poorly on first. If a large number of students answered a particular question incorrectly, I flag that question and evaluate it. I consider the wording of the question to determine if it is confusing or ambiguous in any way. I ask the students why "they" think they got it wrong. The students are usually honest. If they had not prepared sufficiently, they tell me. If they didn't understand the question, they tell me that as well. I re-word the question in class to see if they can answer it, and if they can, I know I have to change the wording on the test for the next time. I may even give the students credit for that question if it is clear to me that I didn't design the question properly.

    I also look at the median scores of the test. I consider the students' class performance and daily study habits and evaluate whether the tests reflect the same kind of performance. This can be telling in various ways. Students perform poorly for a variety of reasons, and one of them is just a basic lack of preparation. This is not something that a teacher is necessarily directly responsible for. If they chose not to prepare for the test, hopefully that is a lesson learned. However, if a majority of the students perform poorly, you have to assess what the problem is and work to correct it.

    This is a process that I do, and I don't believe that a large-scale accountability test can do what I do. It's that simple. A student can perform poorly on any given day, and the reasons they perform poorly vary. They can perform poorly because they have not eaten breakfast that day. It could be because their parents were arguing the night before. It could be because one of their parents decided to leave town. It could be because they were ill. The list is endless, and the Rook article outlines a host of other issues that affect how students perform on large-scale accountability tests. I don't care what kind of expert panels are summoned to design a test that can measure the effectiveness of a teacher; if it involves the results of one student on one particular day in his/her life, that is not enough for me to buy into it.
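    To make the flagging and median-score steps of that routine concrete, here is a minimal Python sketch of that kind of item analysis. It is not from the Popham article or from any particular gradebook tool; the sample score data, the 50% flag threshold, and the function names are illustrative assumptions.

    from statistics import median

    # Hypothetical results: one dict per student, mapping question number -> 1 (correct) or 0 (incorrect).
    results = [
        {1: 1, 2: 0, 3: 1, 4: 0},
        {1: 1, 2: 0, 3: 1, 4: 1},
        {1: 0, 2: 0, 3: 1, 4: 1},
        {1: 1, 2: 1, 3: 1, 4: 0},
    ]

    FLAG_THRESHOLD = 0.5  # flag any question answered correctly by fewer than half the class

    def percent_correct_by_question(results):
        """Return {question: fraction of students who answered it correctly}."""
        questions = results[0].keys()
        return {q: sum(r[q] for r in results) / len(results) for q in questions}

    def flagged_questions(results, threshold=FLAG_THRESHOLD):
        """List the questions worth reviewing for confusing or ambiguous wording."""
        return [q for q, p in percent_correct_by_question(results).items() if p < threshold]

    def median_score(results):
        """Median number of correct answers per student."""
        return median(sum(r.values()) for r in results)

    print("Percent correct by question:", percent_correct_by_question(results))
    print("Questions to review:", flagged_questions(results))
    print("Median score:", median_score(results))

    A flag only says which questions deserve a second look; deciding whether the wording or the students' preparation was the problem still happens in the open-forum review described above.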
