These are questions and concerns teachers frequently ask. Hopefully these answers help you get on the right track.
Q: What is student growth?
Student growth is a demonstrable change in the knowledge and skills of a student or a group of students, as evidenced by gain and/or attainment on two or more assessments between two or more points in time.
Q: What is a growth assessment?
A growth assessment is defined as a set: a pretest and a posttest. The growth assessment measures student growth over an instructional period.
Q: What is an SLO?
An SLO is a Student Learning Objective. SLOs create a measurement model that enables an evaluator to analyze scores from two or more Type I, II, or III assessments and identify whether one or more pre-established goals have been met through a demonstrated change in a student’s knowledge and skills over time.
Q: Why are we using SLOs?
Q: How many assessments are needed in the process to show growth?
That is a GREAT question. Technically, the answer is that you need at least two: one to get baseline data and one to use for comparison. However, there are many reasons why two assessments may not be enough.
1) With only a pretest and a posttest, you have no way to change the direction of your teaching based on student learning during the instructional process. Not until you are done teaching will you realize that students have not grown as much as you hoped. If you add a midpoint assessment (for a total of three), you have a point at which you can see what is and is not working, who is responding well to instructional tools, and who needs more help. You can pivot your instruction, differentiate, and change course to meet the needs of your students. Could you do this with smaller formative assessments throughout the instructional interval? YES! And I hope you do. It would be unwise not to “check in” on student learning multiple times throughout the instructional interval.
2) Students are people, and data about people is unpredictable. People have variables that are outside of the teacher’s control. For example, a student might score poorly because he didn’t eat breakfast, broke up with his girlfriend, or experienced a death among family or friends. None of these reasons has anything to do with the quality of the teacher, but a posttest score might be low as a result. If you have more than two data points, you have more information to bring to an evaluation conversation.
Q: What is a midpoint or pivot point?
The midpoint, or pivot point, is the halfway point between your pretest and posttest. It is important to understand that at this point you collect formative student learning data, halfway through the evaluation cycle, to assess progress and inform instructional adjustments; this data will not be included in student growth scores for evaluative purposes. With approval from the administrator, SLOs could be adjusted if necessary based on the collected data.
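For teachers who track scores digitally, a midpoint check can be sketched in a few lines of Python. This is a hypothetical illustration, not a district formula: it assumes a student counts as “on track” when their midpoint gain is at least half of the gain their SLO target calls for, and the student names and scores are made up. Because midpoint data is formative, the output is only a prompt for instructional adjustments.

```python
def midpoint_status(pre, mid, target_post):
    """Return 'on track' or 'needs support' for one student.

    Hypothetical rule: on track if the midpoint gain is at least half
    of the gain needed to reach the SLO target score.
    """
    target_gain = target_post - pre
    midpoint_gain = mid - pre
    if midpoint_gain >= target_gain / 2:
        return "on track"
    return "needs support"


# Hypothetical data: student -> (pretest, midpoint score, SLO target score)
students = {
    "Student A": (40, 60, 75),
    "Student B": (55, 58, 80),
    "Student C": (20, 45, 60),
}

for name, (pre, mid, target) in students.items():
    print(f"{name}: {midpoint_status(pre, mid, target)}")
```

Students flagged “needs support” are the ones to look at first when you pivot or differentiate instruction.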
Q: What types of evidence are needed to show growth?
Teachers should gather student scores over multiple assessments to reflect the change in student performance on at least two different assessments. Depending on how you write a growth goal, this data should be presented in a way that shows whether or not you met your goal. Typically, the evidence teachers gather includes the assessments given, student work samples, scores on the assessments within the set, and a spreadsheet or other tool showing growth over time.
Q: How do I collect and track my student data?
It depends on your school district’s requirements. Some districts use simple forms on which you hand-write data; others use data warehousing systems. For most teachers, tracking data starts with a spreadsheet of student scores, which you can sort from highest to lowest during your data analysis process.
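If your spreadsheet tool allows it, the same sort can be scripted. The Python sketch below assumes a simple hypothetical layout (each student’s pretest and posttest percentage) and sorts by gain, highest first; sorting by posttest score instead only requires changing the sort key.

```python
# Hypothetical class data: student -> (pretest, posttest) percentage scores
scores = {
    "Student A": (45, 80),
    "Student B": (70, 78),
    "Student C": (30, 65),
    "Student D": (60, 58),
}


def gains(scores):
    """Return (student, pre, post, gain) rows sorted by gain, highest first.

    Python's sort is stable, so students with equal gains keep their
    original order.
    """
    rows = [(name, pre, post, post - pre) for name, (pre, post) in scores.items()]
    return sorted(rows, key=lambda row: row[3], reverse=True)


for name, pre, post, gain in gains(scores):
    print(f"{name:10} pre={pre:3} post={post:3} gain={gain:+d}")
```

A negative gain (like Student D’s) is worth a closer look before the evaluation conversation.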
Q: How can the district ensure that the evidence is valid and reliable? (i.e., that teachers won’t game the system?)
It is hard to imagine a system that is completely perfect and free of human influence or error. However, there are ways to increase the confidence we can have in the system.
1) Start with valid assessments. What is a valid assessment? It is an assessment that accurately measures what it is intended to measure. That means the questions are written so that students answer correctly or incorrectly because of their knowledge of the content and skill, NOT because of cultural bias, NOT because of excessive wordiness, etc. For example, a question with a very difficult vocabulary word might become invalid because students get it wrong for not knowing that word, not because they don’t understand the concept. The data then measure who knows the word, not what you intended to measure. You can’t prove an assessment is valid until you have historical data, so it is important that we give the assessment to students and examine the scores. “…Validity is concerned with the confidence with which we may draw inferences about student learning from an assessment. Furthermore, validity is not an either/or proposition, instead it is a matter of degree” (Gareis & Grant, 2008, p. 35).
Thus, increasing validity will be an ongoing district process.
2) Start with reliable assessments. What is a reliable assessment? It is an assessment that yields repeatable results: if you give an assessment to a group of students one year, then give it to a similar group of students the next year, you get similar results. Reliability is easier to achieve in selected response (multiple choice) tests and more difficult to ensure with open-ended and performance tests. Rubrics must be created and used so that the grader(s) have a very specific understanding of what each level of the rubric means. Inter-rater reliability, meaning that multiple graders would assign the same score to the same work, is essential. Even more essential is that the individuals grading are consistent between one batch of student work and another batch several months or even a year later. Ensuring repeatable results with the same rubric is ESSENTIAL.
3) Ensure assessments are high quality: Teachers should be able to explain why the assessment set accurately measures student growth in the key areas of their curriculum, and administrators should be able to understand the common elements of quality growth assessment design. Using a district rubric or checklist that looks at alignment, distractor and wrong-answer use, growth design, cognitive demand, validity, and reliability will help indicate the quality of the assessments. “Gaming the system” with unaligned, easy, or schemed assessments can thus be identified and prevented.
4) Make data collection simple: Asking teachers to fill out complex spreadsheets invites unintentional data-entry errors. Scantron systems and other tools that collect data consistently and automatically will minimize both intentional and unintentional errors.
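To put a number on the inter-rater reliability described in point 2, one common option (not one this FAQ prescribes) is to have two graders score the same papers and compute their exact agreement and Cohen’s kappa, which corrects agreement for chance. The Python sketch below assumes rubric scores on a 1-4 scale; the scores are hypothetical, and what counts as “good enough” agreement is a local decision.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of papers on which both graders gave the identical rubric score."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: how often the graders would match if each assigned
    # scores independently at their own observed rates.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both graders gave every paper the same single score
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) from two graders on the same eight papers
grader_1 = [4, 3, 3, 2, 4, 1, 3, 2]
grader_2 = [4, 3, 2, 2, 4, 1, 3, 3]

print(f"Exact agreement: {percent_agreement(grader_1, grader_2):.0%}")
print(f"Cohen's kappa:   {cohens_kappa(grader_1, grader_2):.2f}")
```

Running the same check on one grader’s fall scores and their re-scores of the same papers later in the year is one way to look at the grader-to-self consistency the FAQ calls even more essential.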
Q: How can we ensure validity and reliability between assessments when two teachers are creating and grading the same assessment?
The validity should not be affected by the fact that there is more than one grader. In fact, having multiple minds at the table during the creation process should increase the validity of the assessment and help ensure that it truly measures what it is intended to measure.
Reliability is another issue altogether. When multiple teachers grade assessments, we need to make sure the results are repeatable no matter who grades the test. Start by increasing inter-rater reliability with a “trade and grade” professional development event, where teachers learn how their colleagues would have graded the same questions. Some districts have two graders score the same assessment and average the scores. Other districts don’t allow teachers to grade their own students’ work. Ultimately, if multiple teachers using the same assessment have worked to ensure a high degree of comparability in the way they score, the data produced will be reliable.
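The two-grader approach can be sketched as follows. The averaging comes from the practice described above; the rule that flags papers where the graders differ by more than one rubric point for a third read is a hypothetical addition, and the scores are made up.

```python
def combine_scores(grader_1, grader_2, max_gap=1):
    """Average two graders' scores for each student.

    max_gap is a hypothetical local rule: papers where the graders differ
    by more than max_gap points are flagged for a third read.
    """
    combined = {}
    flagged = []
    for student, score_1 in grader_1.items():
        score_2 = grader_2[student]
        combined[student] = (score_1 + score_2) / 2
        if abs(score_1 - score_2) > max_gap:
            flagged.append(student)
    return combined, flagged


# Hypothetical rubric scores from two graders
first = {"Student A": 4, "Student B": 3, "Student C": 1}
second = {"Student A": 3, "Student B": 3, "Student C": 4}

averages, needs_third_read = combine_scores(first, second)
print(averages)
print("Flag for a third read:", needs_third_read)
```

Flagged papers also make good discussion material for the next “trade and grade” session.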
Q: Will teachers still disaggregate scores by the complexity levels of the assessment? (This question reflected a concern about how labor-intensive the process is.)
How you choose to disaggregate your data will depend on two things:
1) The type of goal you write
2) How you will use the data in instruction
We should understand that the growth portion of the evaluation is typically based on the percentage of students who made an appropriate amount of growth during an instructional interval. Unless your goal is that students will answer more “DOK 4” questions correctly, or more “DOK 2” questions correctly, you do not need to disaggregate this data for the student growth portion of your evaluation. Typical goals are that students will improve on entire standards, skills, or assessments in total. To support these goals, at minimum, you need data that reflects improvement on just that.
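As an illustration of “the percentage of students who made an appropriate amount of growth,” here is a hedged Python sketch. What counts as appropriate growth is set in each SLO; the rule used here (each student should close half the gap between their pretest score and 100) is only one hypothetical example, and the scores are invented.

```python
def growth_target(pre, max_score=100, gap_share=0.5):
    """Hypothetical rule: close gap_share of the gap between pretest and max_score."""
    return pre + gap_share * (max_score - pre)


def percent_meeting_target(scores):
    """Percentage of students whose posttest meets their individual target."""
    met = sum(1 for pre, post in scores.values() if post >= growth_target(pre))
    return 100 * met / len(scores)


# Hypothetical scores: student -> (pretest, posttest)
scores = {
    "Student A": (40, 75),  # target 70: met
    "Student B": (80, 88),  # target 90: not met
    "Student C": (20, 60),  # target 60: met
    "Student D": (50, 70),  # target 75: not met
}

print(f"{percent_meeting_target(scores):.0f}% of students met their growth target")
```

Individual targets like this let high and low scorers both show growth; a single class-wide target is another option your district might use.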
However, keep in mind that part of the reason we examine student growth is to make instructional decisions based on student need and to adjust our work in the classroom so that all students are learning and growing. In the professional practice portion of the evaluation, you also demonstrate your ability to work with assessment, adjust instruction based on student data, and use formative assessments in your reflections on teaching. Thus, disaggregating this data may be one factor that distinguishes the “excellent” teacher from the “proficient” teacher.
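If you do choose to disaggregate by complexity level, the tally is mechanical once each question is tagged with its DOK level. The sketch below assumes a hypothetical five-question assessment and two students’ right/wrong responses; it reports the percentage of correct answers at each DOK level.

```python
from collections import defaultdict

# Hypothetical item map: question number -> DOK level
item_dok = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}


def percent_correct_by_dok(responses, item_dok):
    """responses: one dict per student, mapping question number -> True if correct."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for student in responses:
        for question, is_correct in student.items():
            level = item_dok[question]
            total[level] += 1
            correct[level] += int(is_correct)
    return {level: 100 * correct[level] / total[level] for level in sorted(total)}


# Hypothetical responses from two students
responses = [
    {1: True, 2: True, 3: True, 4: False, 5: False},
    {1: True, 2: False, 3: False, 4: False, 5: False},
]
print(percent_correct_by_dok(responses, item_dok))
```

Once the item map exists, the same tally can be rerun at every data point with no extra hand work.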
Q: How can we create mirrored assessments that become an integrated part of the instruction and assessment process without becoming a burden and a loss of instructional time (i.e., using too many days at the beginning or end of a cycle for assessment)?
Consider that every time you administer an assessment, you give up instructional time. Any assessment that takes instructional time and doesn’t give quality data to improve the way we teach should be seriously examined for its purpose. If you create high-quality, mirrored assessments that become integrated into instruction, they WILL provide great data, and the instructional time lost will be very worthwhile. The key is designing an assessment that provides the RIGHT data, meaning data that can be used to change instruction. It is easy to write an assessment that gives you the wrong kind of data. It is much more complicated to create assessments that give teachers deep insight into student understanding and learning. This hinges on careful and intentional assessment design that considers assessment alignment, intentional question form choices, distractor analysis, intentional question complexity, and a high level of validity and reliability.
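Distractor analysis, one of the design considerations listed above, can start with a simple tally of how often each answer choice was selected on a question. The sketch assumes a four-choice (A-D) item and invented responses. A wrong answer chosen by many students may point to a shared misconception, while a distractor that nobody chooses may not be doing any work.

```python
from collections import Counter


def distractor_report(answers, choices="ABCD"):
    """Percentage of students selecting each answer choice on one question."""
    counts = Counter(answers)
    return {choice: 100 * counts[choice] / len(answers) for choice in choices}


# Hypothetical responses from ten students to one item whose key is "B"
answers = ["A", "B", "B", "C", "B", "B", "D", "B", "A", "B"]
key = "B"

for choice, pct in distractor_report(answers).items():
    marker = "(key)" if choice == key else ""
    print(f"{choice}: {pct:.0f}% {marker}")
```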
Q: Can I write a mirrored assessment that contains “content based” questions? Can these be included as a part of measuring growth?
Of course. You can include virtually anything that you can prove is an important part of your curriculum. The typical problem with content-based questions is that they tend to rely on memorization: for example, vocabulary or the identification of people or dates. When putting together an assessment blueprint, take a moment to plan the percentage of questions at different levels of cognitive demand. If you choose to have 25% of the assessment at the remember, understand and recall (DOK 1) level of complexity, for example, those questions can still ask students for vocabulary or other recalled information. Just realize that it is NOT best practice to have the entire assessment at the lower levels of complexity. Think about ways to reach analysis and evaluation with your content-based questions, and you will be off and running with a high-quality assessment. Typically these questions will be more skill based and may move toward applying your content.
Q: We are concerned that we are testing over material we haven’t taught yet just to see what students know!
And you are! This is how some growth assessments work. To know for sure what a student knows or does not know, sometimes you have to venture into territory they might not know. We see this as a key feature of computer adaptive testing systems such as NWEA’s MAP.
An attainment assessment (the kind we have been giving kids for years) tests material students should already know. It is given at the END of instruction, after you have taught, and it measures how much students learned (or attained). This is great for measuring mastery, but NOT GROWTH.
On the flip side, let’s consider growth. Imagine I’m heading to the pediatrician’s office and want to know how much taller my 3-year-old son has become in the past year. When we arrive, we measure his height. That tells me one thing: how tall he is today. That’s attainment. What I don’t know is how much he has GROWN. The only way to find out is to look at a data point from last year’s pediatrician visit. Then I can compare the two points and see how much he has changed.
In our classrooms, we have to do the same thing! We measure change over time in order to get growth data. Without a baseline (like the first pediatrician visit), we have nothing to compare to. So yes, we have to measure at the beginning of the year, before we have taught the students. This gives us the baseline score for comparison, an ESSENTIAL data point for measuring growth.
You may find that some students surprise you with what they already know, which helps you decide where to move next. Others may surprise you with how much they don’t know and where you have to remediate. Baseline data is formative, and it will tell you how to mold your teaching over the next instructional interval.
It is highly recommended that you have more than two measurement points: get at least one in between. The assessments throughout the instructional interval are the “checkup” to see whether students are growing as you expected (or not growing!) and whether your instructional methods are working as planned. This is how you get the information to mold your teaching, pivot your direction, and differentiate to best meet the needs of all students. This is how you help every student grow to their full potential!
Q: What kind of assessments are appropriate for K-2?
For the youngest students, you may choose different assessment options than for older students. For example, long tests are often not feasible; some teachers have approached this by breaking a test up and giving it to students over several days. Additionally, you may find more recall and memorization tests at this level, with more level one questions such as memorization of numbers, letters, sight words, and math facts. If memorization is an appropriate reflection of what is essential for that class or subject at that age, then it is an appropriate skill to consider for a growth assessment. I would caution you to stay away from assessments that are 100% memorization, because they can leave you with no way to show sustained growth. For example, once a student memorizes the list of sight words early in the year, how can the student demonstrate continued learning, or how does a teacher demonstrate his or her continuing impact on the student? Think about assessments that allow you to reach both high-achieving and low-achieving students.
It may be difficult to use multiple choice for students this young. If you choose to use a selected response method such as multiple choice, it might be preferable to show the question first, then show some answer choices. Some teachers have done this by putting them on separate pages or using another blank page to cover the answers.
It is appropriate to have open-ended questions for these students; just remember to use a rubric. In fact, many teachers of the youngest grades use more open-ended questions than multiple choice questions.
Another thing to consider is giving younger students some or all of the assessment orally. This takes much longer to administer, but the results may be more accurate. This may also require teachers to administer the assessment in a one-on-one setting. Remember that whatever administration technique you choose, you need to be consistent. In other words, if you are doing the whole assessment orally in the fall, you need to repeat the same method and do it orally at the next data point. By changing the administration technique in the middle of your assessment set you will invalidate your data!
Q: Where do we begin as we build these assessments?
Start with a blueprint. Just as an architect and contractor wouldn’t attempt to build a home without a blueprint, a teacher shouldn’t attempt to build an assessment set without a game plan and blueprint. Make a few decisions:
1) What will I measure? Ask yourself: What are the essential skills that are most important in your curriculum that can be measured for sustained growth? What will give me leverage for the next level of learning? What has endurance? Look at your standards, and discuss tools such as the Livebinder scope and sequence by ISBE as well as the PARCC assessment frameworks.
2) How will I measure it? Ask yourself: What types of questions will give me the best data? Will multiple choice provide an accurate representation of student learning? Is that appropriate for this subject or grade level? Remember that multiple choice questions, though the hardest and most time-consuming to write, are the easiest to grade. Will constructed response provide an accurate representation of student learning? Is that appropriate for this subject or grade level? What about performance tasks? Sketch out your plan: look at each skill you will measure and determine what question form you will use and how many questions.
3) What is my goal level of cognitive demand? Ask yourself: what level of difficulty should the questions be written to? Does that range of complexity reflect the range of complexity of the tasks within my classroom and the curriculum? Does this range of cognitive demand allow all students to demonstrate their knowledge (both high achieving and low achieving)?
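Once you have decided on a goal mix of cognitive demand, the blueprint arithmetic is simple: multiply each planned percentage by the number of questions. The Python sketch below assumes a hypothetical plan (25% DOK 1, as in the example earlier in this FAQ, plus invented shares for DOK 2-4) and uses a largest-remainder rule, so the counts always add up to the test length even when the percentages do not divide evenly.

```python
import math


def blueprint_counts(total_questions, dok_shares):
    """Convert planned DOK percentages into whole-number question counts.

    Uses a largest-remainder rule: round every level down, then give the
    leftover questions to the levels with the largest fractional parts.
    """
    if abs(sum(dok_shares.values()) - 1) > 1e-9:
        raise ValueError("DOK shares must add up to 100%")
    raw = {level: total_questions * share for level, share in dok_shares.items()}
    counts = {level: math.floor(value) for level, value in raw.items()}
    leftover = total_questions - sum(counts.values())
    by_remainder = sorted(raw, key=lambda level: raw[level] - counts[level], reverse=True)
    for level in by_remainder[:leftover]:
        counts[level] += 1
    return counts


# Hypothetical plan: 25% DOK 1, 40% DOK 2, 25% DOK 3, 10% DOK 4
plan = {1: 0.25, 2: 0.40, 3: 0.25, 4: 0.10}
print(blueprint_counts(20, plan))
print(blueprint_counts(30, plan))
```

The counts are a starting point; you still decide which skills and question forms fill each slot.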
Q: Are rubrics a good option for measuring growth?
Rubrics are a viable and effective tool for measuring growth over multiple data points.
It is important to recognize, however, that ensuring reliability (repeatability of results) can be more difficult with a rubric-graded open-ended task than with a multiple choice test. Rubrics must be created and implemented so that the grader(s) have a very specific understanding of what each level of the rubric means. Inter-rater reliability, or consistency between multiple graders, is important so that the whole team would agree on the same score for the same work. Even more essential is that the individuals grading are consistent within themselves. That means that when you grade one batch of student work, and then another batch several months or even a year later, your scoring is the same (a three is a three is a three, every time). Ensuring repeatable results with the same rubric is ESSENTIAL to getting useful data for growth.
Q: Can one question be aligned to multiple standards?
It depends on the question. In general, we should proceed with caution.
Questions that are straightforward (especially those at DOK 1 and 2) will generally work best when aligned with only a single standard.
However, other questions that require synthesis (ex: argumentative essay writing) or multiple steps (ex: a multi-step math problem) may easily be aligned to multiple standards. One respected source, Engage New York, has several questions aligned to more than one standard. However, I have never seen more than three standards aligned to a single question; most frequently it is one or two.
What we do want to avoid is a practice we call “standard stuffing.” This is most commonly seen in lesson planning, when a teacher examines an activity and then links every single standard that a student is exposed to in the activity, every single standard the student needed to know, and some that were serious instructional focuses. Standard stuffing leaves a lesson plan with a long list of standards aligned to a single activity. For example, a lesson focused on determining the efficacy of a source may also list standards for citing evidence, language, and writing.
The teacher should identify a “primary” standard first. This is the skill or concept of focus, which will receive explicit instruction and practice. The rest are “secondary” standards, which might be reviewed or touched upon. Often, most secondary standards were primary standards at an earlier point in the school year.
When it comes to assessment, we suggest applying the same thinking. Find a single standard that is primary, and note the others as secondary. We don’t want to standard-stuff our questions. Here’s why: when it comes time to re-teach or adjust curriculum based on assessment results, you want questions to tell you where the students need help. If a single question points to a single standard at a focused DOK level, your instructional response can be very focused. For example, a student getting questions about the cell wrong at DOK 1 needs vocabulary intervention, whereas a student getting questions about the cell wrong at DOK 2 might need to see the interactions between cell organelles and how they work together. These are totally different approaches.
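Tagging each question with one primary standard and a DOK level makes this kind of re-teaching decision easier to support with data. In the sketch below, the standard codes (“CELL-1”, “CELL-2”) are hypothetical placeholders, not real standard identifiers; the function counts a student’s missed questions by (primary standard, DOK level) pair.

```python
from collections import Counter

# Hypothetical item map: question -> (primary standard, DOK level).
# "CELL-1" and "CELL-2" are placeholder codes, not real standard identifiers.
item_map = {
    1: ("CELL-1", 1),
    2: ("CELL-1", 1),
    3: ("CELL-1", 2),
    4: ("CELL-1", 2),
    5: ("CELL-2", 2),
}


def missed_by_standard_and_dok(missed_questions, item_map):
    """Count a student's missed questions per (primary standard, DOK) pair."""
    return Counter(item_map[q] for q in missed_questions)


# Hypothetical student who missed questions 3, 4, and 5
print(missed_by_standard_and_dok([3, 4, 5], item_map))
```

Because each question carries only one primary standard, every miss lands in exactly one bucket, which is what keeps the intervention signal clear.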
So, in short, if too many standards are aligned to a single question it muddies the intervention and re-teaching waters, in my opinion. The right questions can be aligned to multiple standards…but which questions are good candidates is a powerful and important conversation.
Q: Can a teacher create an assessment set and use it for 2 different Student Learning Objectives (SLOs)?
Generally, the answer is “no.” The first reason (1) is simply that you are “putting all your eggs in one basket,” so to speak. The quality of your assessments is checked and double-checked through your review process. However, if students do not display appropriate growth on the assessment tool (due to factors such as the validity of the assessment, the testing environment, or other distractions in students’ lives), both SLOs will be adversely affected.
Another reason to say “no” (2) is that each SLO focuses on a single BIG IDEA, and a single assessment tool is used to measure that idea. Typically that BIG IDEA covers multiple content standards, and if two sets of standards are similar enough to be measured in tandem, one might argue that they are part of the same BIG IDEA. Of course, without talking to the teacher and getting more information, one cannot be 100% sure of how they will implement it. Thus, this recommendation is built upon a myriad of examples of what a typical assessment measuring multiple CCSS looks like.
A third reason to typically say “no” (3) is that PEAC (in Illinois) recommended two SLOs, and the law was written to require two SLOs, to allow teachers to measure two different aspects of their teaching. If the same assessment is used for both, teachers are typically using both SLOs to measure a single aspect of their teaching (ex: answering multiple choice math problems; writing in only one style, such as narrative, without integrating expository and persuasive; or reading only literature, without integrating reading for information).
No SLO can possibly have a learning goal which encompasses the vast amount of material a teacher covers during an instructional interval. Thus, the teacher must focus. But typically we do not have 2 SLOs with the same focus. Each SLO typically has a single learning goal with leverage, endurance and readiness for the next level of learning which the assessment will be designed to measure…and the learning goal on each SLO is typically different.
According to the trainings and documentation on the ISBE website, a single SLO will have a single learning goal. A learning goal is a BIG IDEA (see the work of Grant Wiggins, Larry Ainsworth, and Doug Reeves for more information). A BIG IDEA integrates multiple content standards and multiple units of study (ISBE SLO Guidebook, p. 6).
A BIG IDEA is NOT focused on the minute aspects of a class, but rather on big threads that are integral to student learning. Here are some examples: (A) Analysis of primary sources within time period XXX-XXX through an argumentative paragraph; (B) Reading literature and literary analysis through multiple choice and short answer questions.
Notice that both examples bring in multiple content standards and are consistently worked on over multiple units of study. Example (A) has students reading sources at various time periods (could cover multiple units) and the understanding of history (several C3 standards) combined with reading information and reading charts and graphs (several CCSS RI and RH standards) and communication of ideas (CCSS Writing) brings together multiple standards. They are integrated and not separated into multiple SLOs.
Typically the two SLOs selected by the teacher cover two different learning goals which are not simply two different standards, but rather two different BIG IDEAS. It is also suggested that teachers use the opportunity of two SLOs to measure in different modalities if that is appropriate. For example one might be focused on discussion and speaking and the other on writing.
These big ideas should span long periods of time, because the teacher works on them continuously across several units.
Q: Can I use a 4-6 week unit of study as my instructional interval?
An SLO designed for a single unit can be very problematic and is not considered best practice by many. Thinking about the process of the SLO from the teacher perspective, you will see how there is little time to get everything done in this time range. The baseline assessment, grading and writing of the goal may take a typical teacher a week to complete. The meeting with the evaluator needs to happen next, and that meeting may not happen for a couple of days after the teacher submits the SLO for approval (this isn’t even addressing the fact that any PERA JC that sets a district wide SLO deadline would then require all teachers to choose their first unit of instruction based on deadlines). By the time the SLO is approved (as long as there were no revisions requested by the evaluator) we could be a good 2 weeks into the unit.
The midpoint of a 4-week unit would already be HERE at the same time the goal is only getting approved. YIKES. Then another meeting is needed if the goal might be adjusted. And will the teacher have had ample time to teach, assess formatively, and adjust instruction to the unique needs of the students in that classroom? I think it would be difficult to argue that the teacher has enough time. This is why a longer instructional interval is advised.
From the perspective of the evaluator, the goal of evaluation is to capture the teacher’s impact on learning: the teacher’s professional practice, not in a snapshot but in its long-term impact on kids. If we only look at a short period of time, we run into the same concern as with an observation that occurs on one date and never again: it can be a good snapshot or an inaccurate one. In short, looking at growth over longer periods of time gives a more accurate picture of learning and more valid evaluation data.
Q: With CCSS-aligned writing rubrics, is there a concern in creating two SLOs from the same rubric?
Using the same rubric for two different SLOs raises the concerns mentioned above about needing two distinct SLOs. The two could arguably be different BIG IDEAS, which could easily be made into two SLOs. However, by using the same prompt and the same writing samples, you COULD run into issues related to validity and putting all your eggs in one basket. At the same time, writing takes a long time to grade, these two big ideas are tied together, and it may be an ineffective use of teacher time to grade two essays just to have two different assessments.
Some thoughts on implementation:
In one approach, the teacher widened the scope of the assessment to include 3 components, 2 of which would be aligned to each SLO. Both SLOs would have measurement components from the final paper, but each SLO would also have an additional, unique component.
1) Pre-writing activities focused on organization (graphic organizer, outline, etc.). The teacher may see improvement in a student’s planning but less in the execution of the final draft, and still capture student growth.
2) Pre-writing activities focused on gathering evidence
3) Final essay
SLO #1: Learning goal written around organization linked to assessment (1) and (3)
SLO #2: Learning goal written around use of evidence and support linked to assessment (2) and (3)
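The scoring for this two-SLO design can be sketched as follows. Summing raw rubric points across linked components is one simple, hypothetical choice; if the components use different rubrics or scales, a district might weight or rescale them instead. The scores are invented for one student.

```python
# Hypothetical rubric points for one student on each assessment component,
# at the baseline ("pretest") and final ("posttest") administrations
components = {
    1: {"name": "organization pre-writing", "pretest": 2, "posttest": 4},
    2: {"name": "evidence pre-writing", "pretest": 1, "posttest": 2},
    3: {"name": "final essay", "pretest": 8, "posttest": 12},
}

# Which components feed each SLO, following the design above
slo_components = {
    "SLO #1 (organization)": [1, 3],
    "SLO #2 (evidence and support)": [2, 3],
}


def slo_gain(component_ids, components):
    """Gain on an SLO: total posttest points minus total pretest points
    across the components linked to that SLO (raw, unweighted sums)."""
    pre = sum(components[c]["pretest"] for c in component_ids)
    post = sum(components[c]["posttest"] for c in component_ids)
    return post - pre


for slo, ids in slo_components.items():
    print(f"{slo}: gain of {slo_gain(ids, components)} points")
```

Because the final essay feeds both SLOs, a weak essay day still lowers both gains; the unique pre-writing components are what keep the two SLOs from being identical.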
Anne is an assessment and curriculum specialist best known for her work in assessment design, data analysis and instructional effectiveness. Anne is a sought after speaker in the area of assessment design, curriculum and instruction.