Accreditation: The granting of official recognition to a test or an examination, usually by a body such as a government department, examinations board, etc.
Aggregate: To combine two or more related scores into one total score.
Alignment: The process of linking content and performance standards to assessment, instruction, and learning in classrooms. One typical alignment strategy is the step-by-step development of (a) content standards, (b) performance standards, (c) assessments, and (d) instruction for classroom learning.
Assessment grid: A set of assessment criteria presented in a tabular format.
Benchmark: A detailed, validated description of a specific level of student performance expected of students at particular ages, grades, or levels in their development. Benchmarks are often represented by samples of student work.
Bias: A test or item can be considered to be biased if one particular section of the candidate population is advantaged or disadvantaged by some feature of the test or item which is not relevant to what is being measured. Sources of bias may be connected with gender, age, culture, etc.
Borderline performance: A level of knowledge and skills that is just barely acceptable for entry into a performance level (e.g., B2-level).
Classical test theory (CTT): CTT refers to a body of statistical models for test data. The basic notion of CTT is that the observed score obtained when a person p is administered form f of a test is the sum of a true-score component and an error component. See also Item Response Theory (IRT).
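The CTT decomposition (observed score = true score + error) can be illustrated with a small simulation. This is a hypothetical sketch: the true scores and the error standard deviation below are invented for illustration only.

```python
import random

random.seed(0)

# Hypothetical: each examinee has a fixed true score T; each
# administration adds random measurement error E, so X = T + E.
true_scores = [60, 70, 80]
error_sd = 5.0

observed = [t + random.gauss(0, error_sd) for t in true_scores]
for t, x in zip(true_scores, observed):
    print(f"true={t}, observed={x:.1f}, error={x - t:+.1f}")
```

Because the error component varies from one administration to the next, repeated testing of the same examinee yields different observed scores around the same true score.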
Compensatory strategy: A strategy that allows a high level of competence in one of the components of the assessment to compensate for a low level in one or more of the other components.
Conjunctive strategy: A strategy that requires a predefined minimum level of competence in each of the separate components before the final, summarized result can be judged acceptable (sufficient).
Construct: A hypothesized ability or mental trait which cannot necessarily be directly observed or measured; for example, in language testing, listening ability.
Content standards: Broadly stated expectations of what students should know and be able to do in particular subjects and grade levels.
Content validity: A test is said to have content validity if the items or tasks of which it is made up constitute a representative sample of items or tasks for the area of knowledge or ability to be tested.
Constructed response (CR): A form of written response to a test item that involves active production, rather than just choosing from a number of options.
Cross-language standard setting: A method intended to verify that examinations in different languages are linked in a comparable way to the common standards.
Cross validation: The application of a scoring system derived in one sample to a different sample drawn from the same population.
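A minimal sketch of cross validation in this sense, with invented score samples: a scoring rule (here, a simple pass cutoff at the median) is derived in one sample and then applied unchanged to a second sample from the same population.

```python
# Hypothetical samples of test scores from the same population.
derivation_sample = [45, 55, 60, 65, 70]
validation_sample = [50, 52, 58, 68, 72]

# Derive a simple scoring rule on the first sample:
# pass at or above the sample median.
cutoff = sorted(derivation_sample)[len(derivation_sample) // 2]

# Apply the rule, unchanged, to the second sample.
pass_rate = sum(s >= cutoff for s in validation_sample) / len(validation_sample)
print(cutoff, pass_rate)  # 60 0.4
```

If the rule generalizes, results in the validation sample should be comparable to those in the derivation sample; a large discrepancy suggests the rule was fitted too closely to the first sample.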
Cut score (cut-off score): The minimum score a candidate has to achieve in order to be assigned to a given level or grade in a test or an examination.
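Assigning candidates to levels by cut scores can be sketched as follows. The cut scores and level labels are hypothetical; a real examination would set them through a standard setting procedure.

```python
from bisect import bisect_right

# Hypothetical cut scores: the minimum score needed for each level.
cut_scores = [30, 50, 70]            # thresholds for A2, B1, B2
levels = ["A1", "A2", "B1", "B2"]

def assign_level(score):
    # bisect_right counts how many cut scores the candidate has reached.
    return levels[bisect_right(cut_scores, score)]

print(assign_level(49))  # A2: reaches the 30 cut but not the 50 cut
print(assign_level(50))  # B1: a score equal to the cut score qualifies
```

Note the boundary convention: a candidate scoring exactly at a cut score is assigned to the higher level, matching the definition of the cut score as the minimum score to achieve.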
Decision validity: The degree to which classification decisions will be identical in repeated testing with the same examinees.
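One simple way to estimate decision validity is the proportion of identical classification decisions across two administrations. The pass/fail decisions below are invented for illustration.

```python
# Hypothetical pass/fail decisions for the same five examinees on
# two administrations of a test.
first  = ["pass", "fail", "pass", "pass", "fail"]
second = ["pass", "fail", "fail", "pass", "fail"]

# Proportion of examinees receiving the same decision both times.
agreement = sum(a == b for a, b in zip(first, second)) / len(first)
print(f"{agreement:.0%}")  # 80%
```

Higher agreement indicates that the classification decisions are reproducible; in practice this raw agreement is often supplemented by chance-corrected indices.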
Direct test: A test which measures the productive skills of speaking or writing, in which performance of the skill itself is directly assessed.
Examinee-centred method: A standard setting method in which someone who knows examinees well provides a holistic assessment of the level of their language proficiency, for example a CEFR level.
External validation: Collecting evidence from independent sources which corroborate the results and conclusions of procedures used.
Familiarisation: Tasks to ensure that all those who will be involved in the process of relating an examination to the CEFR have an in-depth knowledge of it.
High stakes testing: A form of testing with important consequences for test takers.
Holistic judgment: A way of evaluating student work in which the score is based on an overall judgment of the performance rather than on specific separate criteria.
Indirect test: A test or task which attempts to measure the abilities underlying a language skill, rather than testing performance of the skill itself. An example is testing writing ability by requiring the candidate to mark structures used incorrectly in a text.