Frequently asked questions

This FAQ section has been created to help you in the analysis of your examinations and their link to the CEFR

How can I be sure that a student’s performance in speaking or writing is judged at the desired CEFR level?

When judging a student’s performance in speaking or writing we can make use of benchmarks: detailed, validated descriptions of specific levels of student performance that is expected at a particular CEFR level. Such benchmarks are often represented by samples of student work (writing) or videos (speaking).

Benchmarks can be produced in the following way. A coordinator selects a number of oral or written student sample performances to introduce a specific CEFR level. For each sample a group of experts then judges and discusses whether the sample does indeed illustrate the level, and why it is not the level above or below this. After reconsideration the group then votes on the level of each performance. During a last phase the individual group members rate the performances once again and compare their scores.

It must be emphasized here that benchmarking is a group process, rather than one expert showing and telling the other experts which performances best illustrate a performance at the desired CEFR level.

Suggestions for action

Collect a number of relevant student samples.
Select a panel of experts (around 10).
Make sure that the experts are familiar with the CEFR and its descriptors through familiarisation activities.
Collect all materials for the benchmarking session.
Inform panellists beforehand about the details of the benchmarking procedure.
Carry out a benchmarking procedure as suggested in the linking Manual.
Publicize the results of the benchmarking session.

References

Council of Europe (2009), Relating language examinations to the Common European framework of reference for languages: learning, teaching, assessment (CEFR), A Manual, Strasbourg: Council of Europe. See Chapter 5 Standardisation training and benchmarking. See Chapter 5 Standardisation training and benchmarking
Noijons, José & Jana Béresová, Gilles Breton, Gábor Szabó (2011), Relating language examinations to the Common European Framework for Languages: Learning, Teaching, Assessment (CEFR). Highlights from the Manual, Graz: ECML.
Council of Europe (2008), Report on the cross linguistic benchmarking seminar to calibrate examples of spoken productions in English, German, French, Italian, Spanish with regard to the six levels of the CEFR, Strasbourg: Council of Europe.

Is it necessary to provide a context for tasks?

In the case of speaking and writing tasks this question may seem to be rhetorical. In real life, how we speak and write much depends on the circumstances we are placed in. So if speaking and writing tasks are to be authentic, we cannot usually do without providing a context. Generally speaking, most speaking and writing tasks are therefore placed in a context.

In theory it may be possible to ask a student to speak or write about a subject without providing a context. This is often done when students are to give their (personal) opinion about a subject, a phenomenon or an incident. Yet from a CEFR point of view this may be questionable: in our communication with others we need to think about who we are addressing and why. To simply give our/an opinion without thinking of the person we are addressing may be rather counterproductive, it may hurt other people’s feelings or it may simply not be understood or indeed be misunderstood. We must also realize that the aim of assessing speaking or writing in a foreign language is not to test a student’s ability to express a view or an opinion, but rather to test if the student can express that opinion in the foreign language. In other words: we need to test if the student can express a view or an opinion, but we do not assess the content of the message (e.g. facts, data, etc.).

In the case of reading and listening tests, we often find that students are instructed to read a text or listen to a passage without any context given (“Read the following text and answer the questions.”). It cannot be denied that in such cases we may be testing reading or listening. However, from a CEFR point of view we would also expect a context when testing reading and listening: we need to give a reason for reading the text.

Suggestions for action

In testing spoken and written interaction it is advisable to provide the students with contexts that are realistic and link up with their age and their experience in life. We also need to remember that there are cultural differences between speakers and writers from different backgrounds. Students may or may not be comfortable in saying or writing certain things.
In testing spoken production it is also advisable to provide a context based on a printed text or on a visualisation (pictures, photos, etc.). This may help students in constructing a view or an opinion. To avoid us testing the students’ opinions or views or imagination or cultural knowledge, rather than their ability to express them, it is advisable to provide key words or arguments for and against.
In tests of reading or listening, by providing students with authentic texts placed in a realistic context, and with the student being given a purpose to read or listen, we may improve the validity of reading and listening tests.
When providing contexts bear in mind that there are unsuitable contexts such as war, politics, racism (including cultural clichés and stereotyping), sex and sexism (including stereotyping), potentially distressing topics (e.g. death, terminal illness, severe family and social problems, natural disasters, and the object of common phobias such as spiders and snakes – where the treatment might be distasteful), examinations, passing and failing, drugs.

References

Council of Europe (2001), Common European framework of reference for languages: learning, teaching, assessment, Cambridge: Cambridge University Press. See section 4.1 The context of language use and Table 5, External context of use: descriptive categories.
ALTE (2011), Manual for Language Test Development and Examining; For use with the CEFR, Strasbourg: Council of Europe. See Sections 1.4 Ethics and fairness and 3.4.1 Editing new materials. See also Appendix IV, Advice for item writers.

Once an exam is linked to the CEFR, how can I make sure that new versions of the exam are linked to the CEFR as well?

The most obvious answer to this question would be to apply all the steps of linking the new exam to the CEFR: familiarisation, specification, standardization, standard setting and validation. Some steps may be made easier if in a matrix test specifications have been produced: what is tested, how it is tested, the number of items, the item types, the types of texts etc. This is a way to make sure that a test measures the same construct with the same CEFR-related skills as earlier versions.

What is ideally needed is to run a pre-test of the new exam with a representative selection of items of the earlier exam embedded (so-called anchor items). With the help of advanced statistics it would then be possible to set standards comparable to those of the earlier exam. If the earlier standards have been related to the CEFR, one may argue that the new exam is linked to the CEFR. In fact this is part of the linking process of validation.

Suggestions for action

Produce a test matrix, related to the CEFR, with reference to CEFR descriptors and levels, containing test specifications.
Collect data in pre-tests, preferably using anchor items to link exams to each other.
Possibly adapt standards when a new version of the exam proves to be easier or more difficult than the older one.

References

ALTE (2011), Manual for Language Test Development and Examining; For use with the CEFR, Strasbourg: Council of Europe. See Sections 2.4, 2.5 and Appendix VII.

Council of Europe (2009), Relating language examinations to the Common European framework of reference for languages: learning, teaching, assessment (CEFR), A Manual, Strasbourg: Council of Europe. See Chapter 7 Validation.

Noijons, José & Jana Béresová, Gilles Breton, Gábor Szabó (2011), Relating language examinations to the Common European Framework for Languages: Learning, Teaching, Assessment (CEFR). Highlights from the Manual, Graz: ECML. See Chapter 7 Validation.

Can we link our tests to the CEFR levels even though we do not have statistics on student performance on our tests, apart from total scores?

In general the validity and the reliability of a test will be enhanced when on the basis of data of student performance on the test, the test itself and/or the format of the test is adapted (difficulty of tasks, type of tasks, length of the test etc.). The first and most important step in the linking process is to make sure that the test in question is valid (that it indeed tests what it claims to be testing) and reliable (that the testing is consistent). Ideally this is done through pretesting the collection of data on the test. However, there are situations where such a procedure is not possible or too costly, such as in classroom-based testing.

Suggestions for action

If it is not possible to collect evidence that a test is valid and reliable through statistics, we can nevertheless make an attempt to link the test to the CEFR through specification. In fact specification is a phase in the linking process that always needs to be carried out.

The specification phase in the linking process helps to raise awareness among test providers of:

the importance of good content analysis of language examinations to ensure content validity;
the CEFR, especially its descriptor scales;
the rationale for relating language examinations to an international framework like the CEFR; and
ways in which the CEFR can be exploited in planning and describing language examinations.

There are four steps to be taken in the specification phase:

Assuring adequate familiarisation with the CEFR by those describing the test;
Analysing the content of the test in question in relation to the relevant categories of the CEFR ( should an area tested not be covered by the CEFR, the user is asked to describe it);
Profiling the test in relation to the relevant descriptor scales of the CEFR on the basis of this content analysis;
Making a first claim on the basis of this content analysis that an examination or test in question is related to a particular level of the CEFR.

References:

Council of Europe (2009), Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), A Manual, Strasbourg: Council of Europe. Appendix A, sections A2 to A5 provides a helpful set of forms (A1-A24) that can help test developers specify a test in terms of the CEFR.

ALTE (2001). Principles of good practice for ALTE examinations; Revised draft October 2001, www.alte.org

Can we test at more than one CEFR level in one test?

In theory this is possible, in practice this may be challenging. It also depends on the type of test and the skill that is being tested. A test that is (to be) linked to the CEFR is supposed to tap a representative number of descriptors at the desired CEFR level. For each descriptor at each level we would need a sufficient number of items to be able to give a valid judgement on whether the student can do what is described in the descriptor. In practice this may mean that tests of reading and listening would have to be longer than is feasible.

In the case of speaking tests, there are formats such as in the Oral Proficiency Test (OPI), where a trained interlocutor moves from one level to another depending on the proficiency of the candidate. In such tests it would be possible to find out whether the student is able to function at more than one CEFR level. It must be stressed here that interlocutors will need to be thoroughly trained in administering such a test. Generally speaking this would not be within the reach of untrained teachers.

There exist computer-based adaptive tests, in which the students are presented with items at various levels, depending on the responses that they give. In principle a more difficult task (possibly at a higher CEFR level) is presented every time the student gives a correct response. In this way testing time may be reduced considerably. This means that numerous items are needed to create an item bank. However, creating a calibrated item bank is costly, and needs time.

The CEFR is built on the idea that a person who can perform at a given CEFR level can also perform at the level(s) below the given level. A person at B1-level is supposed to be able to perform at levels A2 and A1 as well. However, this does not mean that we can simply assign levels A2 or A1 to the student when he or she has a low score on a B1-level test, for the reasons outlined above.

Suggestions for action

When testing receptive skills such as reading and listening, aim at one level only. If students score high on a one-level test, set them a test at a higher level (and the reverse).

When testing writing, set students a series of tests with tasks that are linked to descriptors at one level only. If students score high on such one-level tests, set them a series of tests at a higher level (and the reverse).

When testing speaking, either train interlocutors to administer OPI type of tests, or do as with administering writing tests (see above).

References

ALTE (2011), Manual for Language Test Development and Examining; For use with the CEFR, Strasbourg: Council of Europe. See Chapter 5, Marking, grading and reporting of results and Appendix VII.

How do we make sure that our pass/fail decisions are related to the CEFR?

In many countries pass/fail scores in exams are laid down in the law or described in the syllabus, without reference to the CEFR. Thus it is possible for students to pass an exam at a given CEFR level without attaining a score that would indicate that students have a proficiency at the desired CEFR level.

For a pass/fail score that is related to the CEFR we need to carry out a standard-setting procedure (for the receptive skills) or a benchmarking procedure (for the productive skills). In these procedures a group of experts determines what minimum score is needed for the students to claim that they have reached the desired level. In the case of speaking or writing tests, performances can be selected by experts that illustrate how students should perform for them to be graded at a specific CEFR level.

It is thus possible for a student to have a score on the exam that indicates two things: (1) the student has or has not passed the exam from a legal perspective and (2) the student has or has not reached the desired CEFR level.

It is often said in syllabuses that the exam is at a given CEFR level. However, if there has been no CEFR-related standard setting or benchmarking, the scores on that exam cannot be said to be related to the CEFR.

Suggestions for action

Redesign the pass/fail decisions in the syllabus in terms of the CEFR.

Carry out standard-setting procedures (receptive skills) and benchmarking procedures (productive skills).

See also FAQs: How can I be sure that a student’s performance in speaking or writing is judged at the desired CEFR level? and How do we determine that a student performance in reading or listening is at a specific CEFR level?

References

Council of Europe (2009), Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), A Manual, Strasbourg: Council of Europe. See Chapter 5 Standardisation training and benchmarking and Chapter 6 Standard setting procedures.

Noijons, José & Jana Béresová, Gilles Breton, Gábor Szabó (2011), Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR). Highlights from the Manual, Graz: ECML. See Chapter 5 Standardisation training and benchmarking and Chapter 6 Standard setting procedures

Council of Europe (2008), Report on the cross linguistic benchmarking seminar to calibrate examples of spoken productions in English, German, French, Italian, Spanish with regard to the six levels of the Common European Framework of Reference for Languages (CEFR), Strasbourg: Council of Europe.

How do we determine that a student performance in reading or listening is at a specific CEFR level?

When grading a student’s performance in reading or listening we may need to set a performance standard. This is the boundary or cut score between two scores on a performance scale. A cut score of 30, for example, says that a score of 30 or more indicates a performance at a particular level (for example B1) while a lower score indicates that the student has not reached the desired level.

There are various ways to set standards. It has been found that applying two or more of such methods may yield the best results. For all these methods a coordinator needs to gather student scores on a reading or listening tests. A number of such methods are described in the linking Manual (see References below). In cases such as reading or listening tests when numerical scores are given, experts estimate at what CEFR level a test taker can be expected to respond correctly to a set of items.

It must be emphasized here that standard setting is a group process, rather than one expert showing and telling the other experts which score is required to determine if a performance is at the desired CEFR level.

Suggestions for action

Collect a relevant set of data of student performance on a test.
Select a panel of experts (around 10).
Make sure that the experts are familiar with the CEFR and its descriptors through familiarization activities.
Collect all materials for the standard-setting session.
Inform panellists beforehand about the details of the standard-setting procedure.
Carry out standard-setting procedures as suggested in the linking Manual.
Publicize the results of the standard-setting session.

References

Council of Europe (2009), Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), A Manual, Strasbourg: Council of Europe. See Chapter 6 Standard setting procedures.

Noijons, José & Jana Béresová, Gilles Breton, Gábor Szabó (2011), Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR). Highlights from the Manual, Graz: ECML. See Chapter 6, Standard setting procedures

Can an exam that tests linguistic competence (language use) be linked to the CEFR?

Even if the CEFR acknowledges that linguistic competence is an important aspect of language competence, it may be difficult to link sections in an exam that test subskills such as grammar or vocabulary to the CEFR. It must be noted that the formulations of the CEFR descriptors for linguistic competence are rather general and can be interpreted in many ways. For some languages (such as French and German) more detailed descriptors have been developed. However, these have not been scaled in the same way as those in the CEFR itself have been.

The problem is that often exams tend to focus on issues in vocabulary and grammar that learners with a specific first language background find difficult when learning a particular foreign language. Such sections in an exam may focus on the structure of a language rather than on the communicative aspects of it. Such sections do not necessarily focus on grammatical constructions that are typical of written texts produced in various contexts at various levels, and the vocabulary going with those. Such sections are not usually linked to specific linguistic CEFR descriptors at various levels.

From a formative point of view such a linguistic focus is understandable. However, in summative situations, if the curriculum and the syllabus claim that students should be able to function at specific CEFR levels at the end of secondary school, then it is to be wondered if an exam that is to be linked to the CEFR should contain (large) sections on linguistic competence.

It can be argued that when testing reading and listening we also test a student’s understanding of the structure and the vocabulary of a language. We may argue the same for tests of speaking and writing: if assessment criteria such as the use of vocabulary and grammatical structures are applied, then there would seem no need to discretely measure vocabulary and structures.

Suggestions for action

Adopt the action-oriented approach of language use as expressed in the CEFR.
When testing reading and listening, focus on understanding the text, its message, its style, its genre, the author’s (speaker’s) intention with reference to relevant CEFR descriptors.
When testing speaking and writing, use formal criteria: accuracy, range, fluency, interaction, coherence, cohesion, with reference to CEFR criteria.

References

Council of Europe (2001), Common European framework of reference for languages: learning, teaching, assessment, Cambridge: Cambridge University Press. See section 5.2.1 Linguistic competences; Table 3.

Council of Europe (2009), Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), A Manual, Strasbourg: Council of Europe. See Tables C1-C4.

How do I know whether a test is at the CEFR level it claims to be at?

Many testing organisations and publishing companies claim that the tests they administer or publish are at a given CEFR level. The validity of such claims may be very important for test takers. On the basis of their results, they may be admitted to further education or hired for a job. There is also a need for institutes and employers to be able to depend on the validity of claims of links to the CEFR and to specific CEFR levels in particular.

Suggestions for action

It is obvious that without sufficient proof of the validity of claims of links to CEFR levels, such claims cannot be trusted. Ideally such proof should be included in the test materials. However, such test materials may potentially refer to documents that are confidential and thus inaccessible to the general public. Yet some published information on linking should be available. Such information may contain evidence of various types:

Through field testing the test has been shown to be valid and reliable;
The test has been specified in terms of the CEFR: it links up with the action-oriented, communicative approach the CEFR takes towards language learning;
The test has been standardized: scores on performances have been linked to levels in the CEFR through benchmarking and standard –setting;
There is independent evidence which corroborates the results of the standardization: empirical validity.

It will not always be easy to find enough evidence of the validity of a test’s claims to links to the CEFR levels. Very often the only validation of such claims is through a specification of the content of the test in terms of the CEFR, such as low-stakes classroom-based tests. For some tests, the resulting evidence may be sufficient. However, for high-stakes tests all the types of evidence as mentioned above are needed for the links to the CEFR level to be called valid.

References

Council of Europe (2009), Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), A Manual, Strasbourg: Council of Europe. See Figure 2.2 and Chapter 7.

Noijons, José & Jana Béresová, Gilles Breton, Gábor Szabó (2011), Relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEFR). Highlights from the Manual, Graz: ECML. See Figure 2.2 and Chapter 7

ALTE (2011), Manual for Language Test Development and Examining; For use with the CEFR, Strasbourg: Council of Europe. See Appendices 1 and VII

Do we need to present authentic real-life texts in reading and listening tests?

Texts that are taken from real life and that have a communicative function link up to the CEFR model of language use and would therefore be welcome in CEFR-based language tests. This is not to say that such texts cannot be edited for technical reasons (texts may be too long for inclusion, incidental words may cause undue problems of understanding at the intended CEFR-level). Such editing is permissible, both from a validity point of view and from a legal point of view, as long as certain rules of good practice are observed.

The selection of listening samples may be problematic for various reasons. Authentic materials may be difficult or expensive to obtain, the sound quality may not be acceptable, the cost of producing varied listening samples may be too high. Yet it must be avoided that listening is tested by making use of reading texts that are read out by one or two actors.

Suggestions for action

Always mention the source of a text. Indicate when changes have been made (“Adapted from …”).
When no author is known or mentioned, it is advisable not to refer to an author in the items. Refer to the text itself, as in: What does the text say about …?
When editing a text, make sure that the original message of the text is maintained.
When editing a text, make sure that there is a clear beginning and ending.
Select listening samples that are in the public domain, on the Internet (e.g. YouTube).
In listening tests: if editing existing texts is necessary, use texts that have originally been spoken.
In recording listening texts: use native speakers to produce oral texts.
Try and produce semi-authentic recordings, where the speakers do not “read out” texts but create utterances spontaneously on the basis of cues.
Try to have a varied group of actors with different accents to read out selected texts.
A certain degree of background noise in recordings is acceptable. In real life most listening activities will be accompanied by background noise.
Select types of texts that go with specific descriptors at specific CEFR levels.
And above all, choose texts appropriate for the age of the test takers.

References

ALTE (2011), Manual for Language Test Development and Examining; For use with the CEFR, Strasbourg: Council of Europe. See section 1.1.3 Operationalising the model, section 3.3.2 Commissioning, section 3.4.1 Editing new materials. See also: Appendix 4, Advice for item writers.

Do we give the same number of points for each item in reading and listening sub tests?

Some items are on occasion given more weight than other items because they are thought to be more difficult than others. If the item consists of a number of operations this is acceptable, if the students know that the item is worth more points. In other cases it is hardly necessary to weight items. We will be able to distinguish good students from less good students because in principle the less good students might give an incorrect response and gain no points on that item.

From a CEFR-point of view there is also an issue in weighting items. If one item is considered to be more difficult than another item, then it must be wondered if that item may be tapping a descriptor at a higher level. As is argued in another FAQ (Can we test at more than one CEFR level in one test?), it is advisable to create homogenous tests aimed at one CEFR-level only.

There is one more issue in weighting items. Item writers or indeed the syllabus itself may claim that certain items are more difficult than other items. Without data on the performance of these items such claims are not valid.

Suggestions for action

In the case of a complex item, try to re-phrase the item and turn it into a set of items.
When complex items need to be used, always inform the students about the number of points that can be gained for such an item.
When testing language skills, aim at one CEFR level only. If students score high on a one-level test, set them a test at a higher level (and the reverse).
When constructing language tests, always try to collect data on item performance, preferably also in a pre-test.

Can tests of different skills be summarized at one CEFR level?

Some language tests claim that they measure a student’s language skills at one or more CEFR levels. One must check what the validity of such a claim would be. We cannot simply “average out” performances in different skills. In real life most learners are better at one skill than another, certainly at the lower CEFR levels. Thus, in a test we may be able to average score points, but we cannot average CEFR levels. We may be able to say that the student is at B2 for reading and at A2 for writing. We cannot then say that the student is at B1 for reading and writing combined.

Table 1 in the CEFR (Common Reference Levels: global scale) is often misunderstood as meaning that what is described for a particular level is what is to be expected from a language user at that level, for all language skills. This table however should be interpreted as a description of what a person can do at that level and with particular skills: he or she can function at the given level for reading and listening, but at another level for speaking and writing.

The Council of Europe has advocated the development of profiles, in which the student’s proficiencies in the various language skills are described. The European Language Portfolio has also adopted this approach.

Suggestions for action

When developing a language test, give separate CEFR level-estimations for each language skill tested.
Make sure that there is a sufficient number of tasks for each language skill to be able to give a valid estimation of the CEFR level.
Encourage students to find out about their strengths and weaknesses for each language skill through tests of specific skills.
Report test results for each language skill separately.

References

Council of Europe (2009), Language Education Policy Profiles - A transversal analysis: trends and issues, Strasbourg: Council of Europe
Council of Europe (1991), European Language Portfolio (ELP), Strasbourg: Council of Europe

Download the document

This initiative is carried out within the framework of a cooperation agreement between the European Centre for Modern Languages and the European Commission, entitled
Innovative Methodologies and Assessment in language learning
www.ecml.at/ec-cooperation