# So was I an effective teacher?

We just finished with PARCC testing for the year. Oh my gosh, finally. It was a massive effort involving time and technology and a ton of flexibility and creative problem-solving. For eight school days, I had to teach my computer classes either with no computers or with only a half hour of class time. The tests were really long and tiring, and we were fried at the end of each session. I spent a lot of money on cookies and gum to keep the kids' spirits up. Eight full days of testing schedules. I wondered whether the information I'd get from the PARCC tests would be so amazing that it was worth the eight days of testing.

So I put forth a little experiment. In the past few years, we have given two standardized math assessments. We give NWEA MAP, which is an hour-long multiple-choice test you can administer anytime on a flexible schedule. It’s easy and you get results right away. I’ve always valued being able to show a kid just how many points she gained in one year, on the day we take the test.

The other assessment has been (up until we started with PARCC) our state assessment, aligned to the Colorado Academic Standards (now Common Core). The test is called TCAP and takes 5-6 hours to administer for math alone. I bet myself that the data from the 1-hour test was just as good as the data from the 6-hour test. I decided I would scatterplot my students' growth on MAP against their growth on TCAP. If the correlation was strong, we were getting good information from both tests, and maybe that's evidence we could get by with just one of them. You with me?

Growth is ultimately what determines our school’s rating and my rating as a teacher. I’ve always been behind that model, because it doesn’t matter where your kids start at the beginning of the year. If they grow, you’ve taught them, and that’s good. My understanding is that my results from the two standardized tests – MAP and TCAP – are plugged into a formula to give me an Effectiveness Rating.

So. Growth on MAP versus growth on TCAP. For the MAP tests, I see the numbers right away, so I get a raw score in points and just subtract the two spring scores to find the growth. Typical growth is around 6 points. For TCAP, the state uses a “growth model” formula to determine growth from year to year, and then they tell us a child’s growth percentile. Thus, typical growth is in the 50th percentile.

The graph: a scatterplot of each student's MAP growth (raw points) against TCAP growth percentile.

I used the CORREL() function in Excel to find the correlation coefficient. It's 0.24. If there is a correlation, it's really weak.
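For readers without Excel handy, the same Pearson correlation coefficient that CORREL() returns can be computed in a few lines of Python. The student numbers below are made up purely for illustration; the original data isn't published here.

```python
# Hypothetical (MAP growth in points, TCAP growth percentile) pairs.
# These values are invented for illustration, not real student data.
pairs = [(6, 55), (11, 30), (2, 70), (8, 48), (-1, 62),
         (14, 41), (5, 88), (9, 12), (3, 50), (7, 35)]

def pearson(xs, ys):
    """Pearson correlation coefficient, equivalent to Excel's CORREL()."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance term (numerator) and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

map_growth = [p[0] for p in pairs]
tcap_growth = [p[1] for p in pairs]
r = pearson(map_growth, tcap_growth)
print(f"r = {r:.2f}")
```

A value near +1 or -1 means the two growth measures move together; a value near 0, like the 0.24 above, means knowing a student's MAP growth tells you almost nothing about her TCAP growth.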

This is the data that was used to determine my effectiveness rating. Would you determine that I did an effective job?

I made a graph of the previous year's data and found the same thing. Growth by one measure did not predict growth by the other. I took my charts to my friend Kathy to show her. She's just as nerdy as I am about data, and she gets just as intense about results, so I knew she'd be really interested in it.

Me: So I was curious if growth on MAP had anything to do with growth on TCAP, since they’re used for our effectiveness rating. So I scatterplotted MAP against TCAP growth model.

Kathy: They don’t correlate.

Me: Wha… how did you know?

Kathy: I’ve been plotting mine for four years. My students analyze it as a statistics activity. There is no correlation between their MAP growth and TCAP growth.

Me: I’m having a crisis over this. It always meant a lot to me to get my test results.

Kathy: Maybe it’s just you and me. We ought to look at it department-wide.

So I took my charts to one of my administrators and explained what I found.

Me: So there’s no correlation.

Admin: Really?

Me: Look, there’s the correlation coefficient.

Admin: They’re testing different things.

Me: They shouldn’t be testing different things. Right? They should be measuring your growth in math achievement.

Admin: If they’re testing different things, we have to make sure we’re not treating them as if they’re testing the same thing. I’ll show this to the assessment department head. Will you send it to me electronically?

I have always appreciated that the leadership at my school and my district wants to use data correctly as an improvement tool, and they always welcome a critical discussion. I love that we can have that conversation.

He's right: the tests probably are measuring different things, but I honestly do not know what each one measures. There are some key differences between the tests. MAP is all multiple choice; TCAP is partly constructed response, graded with a rubric. MAP is computerized; TCAP has always been on paper. MAP is a general survey of math knowledge and problem-solving; TCAP is standards-based and specific to a grade level. But still, you would think that if a student generally got smarter in math, she would consistently show growth by both measures, and it just isn't true.

I am really wondering if everything I’ve ever thought about math achievement tests is a lie.

One possibility is that PARCC will add clarity to this confusing picture of student achievement growth.

The other possibility is that I’ll get one more data point that doesn’t make any sense, but this time I will have wasted 8 days getting that data point.

Thank you. This is a very interesting read. A study could be done on just this data. You would think better is better no matter what the testing tool.

That’s what I would have thought too. I can’t let it go. We get really stressed out about this data. It really should not be hard to assess a student’s general math (or reading) ability. Give me 10 minutes with a student and I can figure out if they understand math above, at, or below grade level. How on earth are we giving kids hours of tests and getting completely random patterns of data on their math achievement?

We trust our doctors and we trust our lawyers but we live in a society that refuses to trust its teachers.

State tests, at least in NY, are created and graded under a cloud of secrecy. Cut scores are set after the fact to create the illusion of a failing school system, and of course, since everything is secret, the results can't be used to help with student development.

On the teacher evaluation side, there has been plenty of work showing that student test scores are a horrible way to assess teacher effectiveness.

We are fuzzy on how the growth model is calculated as well, and I share your concerns about the usefulness of the data: received too late and in too little detail to help in teaching the kids, and not trustworthy enough to evaluate a teacher.