Accreditation: The granting of recognition of a test or an
examination, usually by an official body such as a government department,
examinations board, etc.
Aggregate: To combine two or more related scores into one
total score.
Alignment: The process of linking content and performance
standards to assessment, instruction, and learning in classrooms. One typical
alignment strategy is the step-by-step development of (a) content standards,
(b) performance standards, (c) assessments, and (d) instruction for classroom
learning.
Assessment grid: A set of assessment
criteria presented in a tabular format.
Benchmark: A detailed, validated description of a specific
level of student performance expected of students at particular ages, grades,
or levels in their development. Benchmarks are often represented by samples of
student work.
Bias: A test or item can be considered to be biased if one
particular section of the candidate population is advantaged or disadvantaged
by some feature of the test or item which is not relevant to what is being
measured. Sources of bias may be connected with gender, age, culture, etc.
Borderline performance: A level of knowledge and skills that
is just barely acceptable for entry into a performance level (e.g., B2-level).
Classical test theory (CTT): CTT refers to a body of
statistical models for test data. The basic notion of CTT is that the observed
score X obtained when a person p is administered a form f of a test is the sum
of a true-score component and an error component. See also Item Response
Theory (IRT).
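The CTT decomposition can be sketched numerically. The following is a minimal illustrative simulation (the true score, error spread, and number of replications are invented for the example): each administration adds random error to a fixed true score, and over many replications the errors average out.

```python
import random

random.seed(1)

# Hypothetical values for illustration: a fixed true score T, with each
# administration producing an observed score X = T + E, where E is
# normally distributed error.
true_score = 60.0
observed = [true_score + random.gauss(0, 5) for _ in range(10_000)]

# With many replications the error component averages towards zero, so
# the mean observed score approximates the true score.
mean_observed = sum(observed) / len(observed)
print(round(mean_observed, 1))
```

The single-administration case shows why one observed score is only an estimate of the true score: any individual X in the list above may differ from 60 by several points.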
Compensatory strategy: A strategy that allows a high level of
competence in one of the components of the assessment to compensate for a low
level in one or more of the other components.
Conjunctive strategy: A strategy that requires attaining some
predefined minimum level of competence for each one of the separate components
to allow the final, summarized result to be judged as acceptable (sufficient).
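The contrast between the two strategies can be sketched as follows. The component scores and thresholds below are invented for illustration; the point is that the same score profile can pass under one strategy and fail under the other.

```python
# Hypothetical component scores (0-100) and thresholds for illustration.
scores = {"reading": 75, "listening": 70, "writing": 45}
overall_pass_mark = 60   # compensatory: only the average matters
component_minimum = 50   # conjunctive: every component must reach this

# Compensatory strategy: strong reading and listening offset the weak
# writing score, because only the combined (average) result is judged.
average = sum(scores.values()) / len(scores)
compensatory_pass = average >= overall_pass_mark

# Conjunctive strategy: the weak writing score alone causes failure,
# because each component must reach the predefined minimum.
conjunctive_pass = all(s >= component_minimum for s in scores.values())

print(compensatory_pass, conjunctive_pass)  # True False
```

Here the candidate's average (about 63) clears the compensatory pass mark, but the writing score of 45 falls below the conjunctive minimum of 50, so the conjunctive decision is negative.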
Construct: A hypothesized ability or mental trait which
cannot necessarily be directly observed or measured; for example, in language
testing, listening ability.
Content standards: Broadly stated expectations of what
students should know and be able to do in particular subjects and grade
levels.
Content validity: A test is said to have content validity if
the items or tasks of which it is made up constitute a representative sample
of items or tasks for the area of knowledge or ability to be tested.
Constructed response (CR): A form of written response to a
test item that involves active production, rather than just choosing from a
number of options.
Cross-language standard setting: A method intended to verify
that examinations in different languages are linked in a comparable way to the
common standards.
Cross validation: The application of a scoring system derived
in one sample to a different sample drawn from the same population.
Cut score (cut-off score): The minimum score a candidate has
to achieve in order to be assigned to a given level or grade in a test or an
examination.
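As a sketch, cut scores partition the score range into levels or grades. The cut scores and level labels below are invented for illustration and do not correspond to any real examination:

```python
# Hypothetical cut scores mapping a raw score to CEFR-style levels,
# listed from highest to lowest.
cut_scores = [(80, "C1"), (65, "B2"), (50, "B1")]

def assign_level(score: int) -> str:
    """Return the highest level whose cut score the candidate reaches."""
    for cut, level in cut_scores:
        if score >= cut:
            return level
    return "below B1"

print(assign_level(67))  # B2
print(assign_level(49))  # below B1
```

A score of 67 reaches the B2 cut score (65) but not the C1 cut score (80), so the candidate is assigned to B2; a score of 49 falls below every cut score.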
Decision validity: The degree to which classification
decisions will be identical in repeated testing with the same examinees.
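One simple way to quantify this is the proportion of examinees who receive the same classification on two test occasions. The classifications below are invented for illustration:

```python
# Hypothetical level classifications of the same six examinees on two
# occasions of the same test.
first = ["B2", "B1", "B2", "C1", "B1", "B2"]
second = ["B2", "B1", "B1", "C1", "B1", "B2"]

# Proportion of examinees classified identically on both occasions.
agreements = sum(a == b for a, b in zip(first, second))
decision_consistency = agreements / len(first)
print(round(decision_consistency, 2))  # 0.83
```

Here five of six classifications agree, giving a consistency of about 0.83; a value of 1.0 would mean every decision was replicated exactly.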
Direct test: A test which measures the productive skills of
speaking or writing, in which performance of the skills itself is directly
measured.
Examinee-centred method: A standard setting method in which
someone who knows examinees well provides a holistic assessment of the level
of their language proficiency, for example a CEFR level.
External validation: Collecting evidence from independent
sources which corroborate the results and conclusions of procedures used.
Familiarisation: Tasks to ensure that all those who will be
involved in the process of relating an examination to the CEFR have an
in-depth knowledge of it.
High-stakes testing: A form of testing with important
consequences for test takers.
Holistic judgment: A way of evaluating student work in which the score
is based on an overall judgment of student performance rather than on specific
separate criteria.
Indirect test: A test or task which attempts to measure the
abilities underlying a language skill, rather than testing performance of the
skill itself. An example is testing writing ability by requiring the candidate
to mark structures used incorrectly in a text.