New York State Testing Program (NYSTP)
Frequently Asked Questions
About the Assessments
Q1. Are 3-8 tests standardized?
A1. Yes. A standardized test is one that uses uniform procedures for
administration and scoring to ensure that results from different
people are comparable. The NYSTP requires standardized administration:
tests must be administered exactly the same way each time. Everyone must
be given the same instructions, in sufficient detail that administration
does not differ between settings or between the people who administer
the test.
Q2. Are 3-8 tests norm-referenced or criterion-referenced
tests?
A2. They are criterion-referenced tests because they measure how
well students are meeting the learning standards in English Language
Arts and mathematics. In contrast, a norm-referenced test is designed to
compare students to each other rather than to determine whether a
student meets a certain criterion.
Q3. Is there such a thing as a bad test question? Exactly how is a test
question developed?
A3. A hard test question is not necessarily a bad test question. NYSTP is
designed to determine whether students are meeting the learning
standards. Hard questions are those that only a small percentage of
students answer correctly. A question may be intended to
have this characteristic in order to distinguish between different
levels of student achievement. Test questions, commonly referred to as
items, are developed through a multi-step process guided by industry
standards in assessment and measurement. Teachers serve on the
committees that develop NYS tests to ensure that test questions are
aligned with the State’s Learning Standards and to help specify
performance indicators, question formats, and appropriate content for
the grade levels in which they are experts. This test specification
process leads to the writing of test questions according to
guidelines set by specialists. Written questions are rigorously reviewed
and edited many times during the process, which culminates in a
statistical analysis of how the questions perform on field tests.
For more information, see:
Guide to the Grades 3-8 Testing Program in English Language Arts and Mathematics
http://www.emsc.nysed.gov/3-8/intro07.pdf
Guidelines for Item Writers
http://emsc33.nysed.gov/osa/assesspubs/pubsarch/etsguidelines
Standard Setting and Equating on the New Generation of New York State Assessments
http://www.emsc.nysed.gov/osa/assesspubs/pubsarch/standard%20settingand%20equatingon%20thenew%20generation.pdf
Educational Assessment: Four Principles to Consider
http://www.ctb.com/articles/article_information.jsp?CONTENT%3C%3Ecnt_id=10134198673246869&FOLDER%3C%3Efolder_id=1408474395243877&ASSORTMENT%3C%3East_id=1408474395213825
Q4. Can results of previous ELA and mathematics testing for grades 4 and 8
be compared to results of the new NYSTP for grades 3-8?
A4. A representative of NYSED refers to the “old” and “new”
tests as cousins rather than brothers. The tests do not lend themselves
to absolute comparison because the “old” grade 4 tests included K-4 or
4-8 content and skills, whereas the new 3-8 tests consist primarily of
specific material with a narrower range of content on each test. That
range covers what students learned at the end of the previous
grade level and at the beginning and middle of their current grade
level. Longitudinal and cross-sectional data will
become increasingly available in the coming years and will allow for
year-to-year comparisons.
Interpretation of Results
Q5. What does it mean for an individual student to achieve a
Standard Performance Index (SPI) within the Target Range?
A5. It means that the student demonstrates the expected level of
understanding in the specific ELA Learning Standard or mathematics
Strand. An SPI is a derived score from 0 to 100 that estimates the
number of questions a student would have answered correctly if there
were 100 questions per Learning Standard or Strand. However, as the
tests are currently designed, an SPI may be generated from as few as
four questions. The Target Range for SPI varies across the
Standards and Strands because the number and difficulty of the test
questions vary across these Standards and Strands.
Consequently, the SPI must be interpreted in context rather than as a
stand-alone number. See: Interpreting Student Scores
http://www.emsc.nysed.gov/irts/nystart/2006/InterpretingStudentScores_files/frame.htm
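To illustrate why an SPI has to be read against the Target Range of its own Standard or Strand, the following is a minimal sketch in Python. The function name, the strand labels, and the target ranges are all hypothetical and invented for illustration; actual Target Ranges come from the published NYSTP reports.

```python
def classify_spi(spi, target_low, target_high):
    """Place an SPI relative to a Target Range (all values on the 0-100 scale)."""
    if not 0 <= spi <= 100:
        raise ValueError("SPI must be between 0 and 100")
    if spi < target_low:
        return "below target range"
    if spi > target_high:
        return "above target range"
    return "within target range"


# Hypothetical target ranges, NOT taken from any NYSTP report.
hypothetical_ranges = {
    "Strand A": (45, 65),
    "Strand B": (70, 85),
}

# The same SPI of 60 is read differently depending on the strand.
for strand, (low, high) in hypothetical_ranges.items():
    print(strand, classify_spi(60, low, high))
```

Under these invented ranges, the same SPI falls within the Target Range for one strand and below it for the other, which is the sense in which the number cannot stand alone.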
Q6. Is it appropriate to use the scores of NYSTP as part of the
student’s classroom grade?
A6. No. Local districts may elect to display results on student
report cards. However, the scores from NYSTP are derived scores
and should not be treated as percent correct or averaged with
existing raw score calculations. This is particularly easy to confuse
with secondary level Regents exams, and with elementary and intermediate
social studies and science exams, which use a scale score from 0 to 100.
See: How the Regents Exams are Scored
http://emsc32.nysed.gov/osa/concht/scoring-regents.htm
Q7. The Northeastern Regional Information Center supplies my district with
reports of NYSTP results. At one time we received a report that detailed
which answers individual students selected on each test question. This
report was not distributed to us during the last reporting cycle. Why?
A7. Analysis of distractors, which are the
incorrect response choices included in a multiple-choice item, is best
conducted at the group level. The resulting information is far
more stable in signaling possible areas of concern than an examination
of the incorrect responses of individual students. The JMT Data Analysts
(listed at the end of this document) do not recommend the use of
individual student-by-item analysis. For further information, see Q8.
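As a rough sketch of what a group-level distractor analysis could look like, the Python example below tallies how often each incorrect choice was selected for one multiple-choice item. The function name and the response data are hypothetical, and this is not a description of how the regional reports are actually produced.

```python
from collections import Counter


def distractor_proportions(responses, correct_choice):
    """Return the share of the group that selected each incorrect choice."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    total = len(responses)
    return {
        choice: count / total
        for choice, count in sorted(counts.items())
        if choice != correct_choice
    }


# Hypothetical responses from a group of 25 students; "B" is the keyed answer.
group_responses = ["B"] * 12 + ["A"] * 3 + ["C"] * 9 + ["D"] * 1

for choice, share in distractor_proportions(group_responses, "B").items():
    print(f"Choice {choice}: {share:.0%} of the group")
```

In this invented data set, one distractor draws far more of the group than the others. A pattern like that across a group may point to a shared misconception, which a single student's wrong answer cannot show.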
Q8. How helpful is it to see how an individual student performed on each
test question?
A8. It is far more helpful to review how an individual performed in a
specific learning standard/strand than on a single test question.
Student-by-item data are of very limited use. It is important to realize
that many factors affect how a student performs on a single test,
including factors that are situational rather than related to the
student’s ability. From a statistical and psychometric standpoint,
measurement error for student performance on individual test items is
much greater than for a composite of items (e.g., a specific standard
or strand). Analysis and interpretation of student performance data
should be based on the most reliable index of the student’s ability in
order to avoid misinterpreting the results. Scaled Scores are the most
reliable index of student performance on the New York State tests.
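The claim that a single item carries more measurement error than a composite can be illustrated with a deliberately simplified model. The Python sketch below assumes every item is answered correctly with the same probability and independently of the others (a binomial model), so the standard error of the proportion correct is sqrt(p(1-p)/n). This assumption is made here for illustration only; it is not the scaling method used for the New York State tests, and the function name is hypothetical.

```python
import math


def standard_error_of_proportion(p_correct, n_items):
    """Standard error of the observed proportion correct over n_items,
    assuming independent items with the same probability of success."""
    if not 0 <= p_correct <= 1:
        raise ValueError("p_correct must be between 0 and 1")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return math.sqrt(p_correct * (1 - p_correct) / n_items)


# Compare one item, a four-item strand, and a longer composite
# for a hypothetical student with a 70% chance on each item.
for n in (1, 4, 25):
    se = standard_error_of_proportion(0.7, n)
    print(f"{n:>2} item(s): standard error = {se:.3f}")
```

Under this model the standard error shrinks with the square root of the number of items, which is consistent with the advice to interpret results at the standard, strand, or scaled-score level rather than item by item.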
JMT Data Analysts
Kathleen Maxwell, Capital Region BOCES (kmaxwell@gw.neric.org)
Amy Svirsky, Capital Region BOCES (asvirsky@gw.neric.org)
Stacy Ward, HFM BOCES (sward@hfmboces.org)
Jacqueline Pezzulo, Questar III (jpezzulo@questar.org)
Nicole Catapano, WSWHE BOCES (ncatapano@wswheboces.org)
Katie Jones, WSWHE BOCES (kjones@wswheboces.org)