Speaking to mediapersons in New Delhi last week, Prometric's Vice President of Test Development Services Stephen Williams said that he was happy with the content creation and evaluation of CAT 2009. "It went exactly as we would have liked it to," he said.
Prometric India's Managing Director Soumitra Roy was quick to clarify that Mr Williams was speaking only about the academic content and evaluation aspects of CAT 2009 (and not the hardware failures and virus attacks).
Much of what Mr Williams told journalists about the processes and standards used in the creation and evaluation of the CAT question papers has been shared in the public domain before. I will summarize the highlights and some points of interest captured during the interaction.
In summary, during the test development phase, “Prometric works closely with IIM professors along with specially trained subject matter experts from other well-regarded Indian universities. Each exam question is written, edited and reviewed in an iterative process. All modifications and approvals are tracked electronically in an audit trail.”
“After the tests are administered, the raw scores are calculated on the basis of a +3 for a correct answer, a -1 for a wrong answer while un-attempted questions are ignored. After equating the scores (read below), they are linearly scaled to a 0-450 range before being presented to the candidates as test results.”
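The arithmetic of that scoring scheme can be sketched in a few lines. (The +3/-1 scheme and the 0-450 range are from Prometric's description above; the function names and the raw-score bounds used for the linear mapping are illustrative assumptions, since Prometric has not published the exact scaling constants.)

```python
# Sketch of the scoring described above: +3 per correct answer, -1 per
# wrong answer, unattempted questions ignored, then a linear rescaling
# of the (equated) raw score onto the 0-450 reporting range.
# The raw-score bounds below are assumptions, not Prometric's figures.

def raw_score(correct: int, wrong: int) -> int:
    """Raw CAT score: +3 for correct, -1 for wrong, 0 for unattempted."""
    return 3 * correct - 1 * wrong

def scale_to_450(raw: float, raw_min: float, raw_max: float) -> float:
    """Linearly map an equated raw score onto the 0-450 range."""
    return 450 * (raw - raw_min) / (raw_max - raw_min)

# On a 60-question paper, raw scores span -60 (all wrong) to 180 (all correct).
print(raw_score(40, 10))                 # 3*40 - 10 = 110
print(scale_to_450(110, -60, 180))       # → 318.75
```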
Anchor questions, cloned questions and question leakages
“Each section of the paper has 4 anchor items (questions), 2 of which would be exposed for the first time and another two would have been exposed in a previous paper. So a total of 6 items out of the 60 would have been exposed before and another 6 will be exposed once more. What we're doing is creating an equating chain which links all the papers together. Then we go into the post-equating process. First we look at how each item performed, how many people got it right and how many got it wrong. Then we look at the make-up of each particular paper and how that mixture of items performed across the papers. And by using the anchor items, you can make a determination about where that paper falls in difficulty level.”
(Mr Williams refused to divulge whether the anchor questions were shared between question papers of consecutive slots.)
The performance of test-takers on these anchor questions is measured to establish a common difficulty metric between two question papers, and the raw scores are then adjusted accordingly.
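The adjustment can be illustrated with a very simple mean-equating step: if one candidate pool scores lower on the shared anchor items, its paper is presumed harder and its raw scores are shifted up. This is only a sketch of the idea; Prometric has not disclosed its actual psychometric model, and real equating procedures are considerably more sophisticated.

```python
# Hedged sketch of anchor-based equating. If paper B's candidates average
# lower on the common anchor items than paper A's did, paper B is presumed
# harder and its raw scores are shifted up. Simple mean equating only;
# Prometric's real procedure is not public.

def mean_equate(raw_scores_b, anchor_mean_a, anchor_mean_b, scale=1.0):
    """Shift paper B's raw scores toward paper A's difficulty metric.

    anchor_mean_a / anchor_mean_b: mean anchor-item scores of the two
    candidate pools. `scale` converts the anchor-level gap into a
    whole-paper adjustment (an illustrative knob, not a real parameter).
    """
    shift = (anchor_mean_a - anchor_mean_b) * scale
    return [s + shift for s in raw_scores_b]

# Paper B's pool averaged 1.5 points less on the anchors -> shift B up.
print(mean_equate([100, 80, 120], anchor_mean_a=9.0, anchor_mean_b=7.5))
# → [101.5, 81.5, 121.5]
```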
Mr Williams said that a few more questions similar in nature but different in form – called ‘cloned questions’ – were repeated across slots. A simplistic example is an algebraic problem in multiple versions, each version having a different set of substitution values. In reality, the differences between the versions would be a lot less apparent, said Mr Williams.
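The "different substitution values" idea can be sketched as a question template whose parameters are randomized while the underlying solution stays the same. (This is purely hypothetical code to illustrate the concept; it does not reflect Prometric's actual item-generation system.)

```python
import random

# Illustrative sketch of a 'cloned' item: one algebraic template, many
# surface versions differing only in their substitution values.
# (Hypothetical; not Prometric's item bank.)

def clone_question(seed: int):
    rng = random.Random(seed)
    a, b = rng.randint(2, 9), rng.randint(10, 50)
    # Construct the right-hand side so every clone has the same answer, x = 7.
    question = f"If {a}x + {b} = {a * 7 + b}, what is x?"
    return question, 7

for seed in (1, 2, 3):
    print(clone_question(seed))  # three clones, same answer, different values
```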
Anchor questions and cloned questions have a lot to do with the controversy over the ‘leakage’ of CAT 2009 questions on anonymous blogs, Orkut communities and coaching institute channels. For any candidate to gain an advantage in a future test slot, it is important that the leaked questions fall in the anchor or cloned categories.
Mr Williams played down the leakages. According to him, people tend to have nervous mindsets during a test and their ability to remember questions and reproduce them accurately a few hours later is overrated. “We went through the blogs and Orkut communities that were accused of sharing questions and found that very few of them had gotten it right. There was only a perception that the questions on the blogs were the same as those in the test, but they were not,” he said.
Not all candidates are exactly 'nervous', though. Test-preparation institutes, for example, were regularly sending 'proxy candidates' (teachers, content developers) to CAT 2009 slots with the express purpose of memorizing questions and sharing them with their students. These questions were being shared clandestinely in classrooms and secret email lists with test-takers in later slots.
Mr Williams defended this with the explanation that such concerted activity had not worked because "if that were to happen, then statistically we would have noticed strange patterns with the answering of cloned questions, which we didn't."
Mathematically, the extent of the advantage gained by being privy to an anchor question is an interesting probability problem (anyone?).
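For readers who want a starting point, here is a crude Monte Carlo sketch of one version of the problem. Every parameter is an assumption, not a figure from Prometric: 60 questions scored +3/-1, a candidate who attempts everything, answers a leaked item correctly with certainty, and answers any other item correctly with probability p.

```python
import random

# Crude Monte Carlo estimate of the raw-score advantage from knowing some
# leaked anchor/cloned items in advance. All parameters are assumptions:
# 60 questions scored +3/-1, candidate attempts every question, gets known
# items right with certainty and other items right with probability p.

def simulate(n_questions=60, n_known=0, p=0.6, trials=20_000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        score = 3 * n_known  # leaked items: a certain +3 each
        for _ in range(n_questions - n_known):
            score += 3 if rng.random() < p else -1
        total += score
    return total / trials

baseline = simulate(n_known=0)
with_leak = simulate(n_known=12)  # say, 6 anchors + 6 clones leaked
advantage = with_leak - baseline
# Analytically, each known item is worth 3 - (4p - 1) extra points,
# i.e. 12 * (3 - 1.4) = 19.2 at p = 0.6; the simulation should agree.
print(round(advantage, 1))
```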
CAT 2010 will be held in 32 cities, at fewer centers
The IIMs will release the CAT 2010 advertisement at the end of August, said Mr Roy. Prometric plans to use fewer testing centers this year after its disastrous tryst with India's hardware infrastructure during CAT 2009.
“We will only choose test centers that adhere to the norms created by us for ensuring infrastructure quality,” he said.
As the testing window will also be longer (around a month), there will clearly be more slots, more distinct question papers and therefore a larger pool of questions.
Edit (July 15, 2010): Prometric issued a clarification today, which has been incorporated in the story above. The number of anchor questions per paper is not 2+2 but 6+6: each of the three sections carries 4 anchor items (2 newly exposed, 2 previously exposed), so across the 60-question paper a total of 6 items would have been exposed before and another 6 will be exposed once more.