The Multiple Choice Game

Tuesday, Jul 27, 2010
A multiple choice exam is looking for a sweet spot of a certain percentage passing, not too high and not too low.

The multiple choice question may be one of the most despised games ever conceived. The purpose of a multiple choice exam is to exclude people in a quantitative manner, be it for admission into schools, licensing professionals, or limiting the number of high grades in a class. Assessing a person on their individual merits is a time-consuming process, and once a school or class hits a critical mass of students, it isn’t economically reasonable to scrutinize all of them. Let’s say you’ve got 1000 participants and five people reading their results. You can cut time and costs by figuring out a way to neatly get rid of 500 because they scored under a certain amount. At the same time, a multiple choice exam cannot be so difficult that you exclude an excessive number of applicants. Most law schools, for example, have a minimum LSAT score below which denial is automatic and a high score above which acceptance is automatic. Applicants in between those scores are then addressed on an individual level and other factors are introduced. The problem with such a system is that to ensure a multiple choice test produces the right number of passing scores, you have to keep changing the questions.
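
The two-cutoff scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names, the scores, and both cutoff values are made up for the example and do not come from any real school's policy.

```python
from collections import Counter

def triage(score, deny_below, admit_from):
    """Sort one score into a bucket using two cutoffs (both hypothetical)."""
    if score < deny_below:
        return "deny"      # automatic denial, nobody reads the file
    if score >= admit_from:
        return "admit"     # automatic acceptance
    return "review"        # the middle band gets individual attention

def triage_all(scores, deny_below, admit_from):
    """Count how many applicants land in each bucket."""
    return Counter(triage(s, deny_below, admit_from) for s in scores)

# Invented scores and cutoffs, for illustration only.
scores = [142, 150, 155, 158, 161, 165, 168, 172]
counts = triage_all(scores, deny_below=155, admit_from=168)
print(counts["deny"], counts["review"], counts["admit"])
```

Only the "review" bucket costs reader time, which is the whole economic point: moving either cutoff changes how many files the five readers actually have to open.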


An article by the National Center for Fair and Open Testing explains, “multiple-choice items are an inexpensive and efficient way to check on factual (“declarative”) knowledge and routine procedures. However, they are not useful for assessing critical or higher order thinking in a subject, the ability to write, or the ability to apply knowledge or solve problems” (“Multiple-Choice Tests”, Education.com). It’s for that reason that a multiple choice question is always limited in scope: it can only be about basic knowledge of a topic. The formula is to have two answers that are blatantly wrong, one that is kinda right, and one that is the most right. One instructor during a review session for the bar exam pointed out that on average a student will know the correct answer immediately to 25% of the problems, have no clue on 25%, and be able to boil it down to the right and kinda right answer for the other 50%. So the way that you evaluate the difficulty of a multiple choice question is how similar the right and kinda right answers are. A person who can’t boil it down to those two doesn’t know the basic material and shouldn’t pass.
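
The instructor's 25/25/50 split lends itself to a quick back-of-the-envelope calculation. The sketch below rests on assumptions that are not in the original: that a clueless student guesses blindly among the four options, and that the chance of picking the right answer over the kinda right one is a single number, here called `p_pick_right`. The function name is also invented for the example.

```python
def expected_score(known=0.25, clueless=0.25, narrowed=0.50,
                   options=4, p_pick_right=0.5):
    """Expected fraction of questions answered correctly.

    known:        share of questions the student knows outright
    clueless:     share answered by a blind guess among `options` (assumption)
    narrowed:     share boiled down to right vs. kinda right
    p_pick_right: chance of choosing right over kinda right (assumption)
    """
    if abs(known + clueless + narrowed - 1.0) > 1e-9:
        raise ValueError("the three shares must add up to 1")
    return known + clueless / options + narrowed * p_pick_right

# Sweep the one number the student can actually improve.
for p in (0.0, 0.5, 1.0):
    print(p, expected_score(p_pick_right=p))
```

Under these assumptions the expected score runs from about 31% when the student always falls for the kinda right answer, through about 56% on a coin flip, to about 81% when the student always spots the difference. Half the exam hinges on that single distinction.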
  


Right and kinda right are fuzzy concepts. They revolve around checking if a person understands the distinction between knowing something is correct and understanding why something is correct. A good analogy would be the difference between someone saying that water puts out fire and another person saying water puts out fire because it cuts off the oxygen and removes heat from the material. In that sense, people either fail on the facts or on the technicalities when choosing between these two answers. In law exams, someone can have a legal code perfectly memorized but completely miss the point when trying to apply it to a scenario. On the other hand, they can perfectly understand the situation at hand but not know how, in particular, something illegal has occurred. In the example above, I might ask you why a fire went out when Suzie dumped a bucket of water on it. The correct answer obviously needs to be a bit more than, “Because water puts out fire.”


There’s a great post by Jerard Kehoe that outlines a lot of the basic strategies when crafting these questions. The important thing to remember is that a test maker wants a certain percentage to fail, so they are going to have a balance of questions based on how often people get them right on average (“Writing Multiple Choice Exams”, Practical Assessment, Research & Evaluation, 1995). A multiple choice question begins with what is called a “stem”, which will be either an incomplete statement or a direct question. Incorrect answers are called “distractors”. The more information that is in the stem of the problem and not in the answers, the more difficult the question. This is because no matter how good the teacher, there is always a risk of accidentally giving away the answer in the phrasing of the distractors. The words might trigger a latent memory or response instead of a factual understanding by the student.


There are too many tricks to list, but an example would be one in which the question lists a definition and then offers several vocabulary words. That question will fail significantly more students than the reverse situation because they have less to tip them off. Negative statements in multiple choice questions are useful but are very susceptible to bias. This would be the “pick the answer that is most wrong” type of question. Students just instinctively pick “correct” answers, so anytime that a negatively phrased question is introduced, a teacher has to factor in that a larger percentage of people are going to get it wrong. Other examples would be having the correct answer be gibberish but the other choices be factually incorrect or distinguishing between two correct answers by having a factually incorrect statement attached to one of them.


From icanhascheezburger.wordpress.com




The goal of ensuring the test taker’s factual knowledge is being tested rather than luck or deductive reasoning results in one of the weirdest elements of multiple choice tests: there is a “correct” way for people to get a question wrong. The post above explains, “The number of students choosing a distractor should depend only on deficits in the content area which the item targets and should not depend on cue biases or reading comprehension differences in ‘favor’ of the distractor”. In other words, you’re supposed to get it wrong because the kinda right answer is incomplete or factually wrong compared to the others. Things that tip you off to the answer or comprehension differences should be kept to a minimum. The test taker should only know what to do because of prior knowledge of the material.


The biggest design issue with multiple choice tests is that writing a good, coherent multiple choice question is difficult because of the thin line between right and kinda right. Being adept at it means both specializing in the craft itself and having an encyclopedic knowledge of the field being tested. Most exams that I’ve worked with were written by very informed people who wrote frustratingly ambiguous questions. An example of a bad question would be one that distinguished the right and kinda right answer because one used the word “presumed” and the other used “inferred”. While the words certainly have two different meanings, spotting the distinction had nothing to do with the subject matter of the question. Professional exam writers are aware of this problem and now many exams will test a question out before actually counting it. Out of an exam of 100 questions, 10 will be experimental ones that see how many people get it right. Once they’ve got the rough percentage of how many people get it wrong on average, they factor it in with easier and more difficult ones. It goes back to the overall purpose of the exam being to exclude people but not too many people. A multiple choice exam is looking for a sweet spot of a certain percentage passing, not too high and not too low.
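
To make the experimental-question idea concrete, here is a minimal sketch of how a test maker might measure how often people get an unscored question right. Everything in it is hypothetical: the question labels, the response data, and the 30% and 90% thresholds are invented for illustration, and real exam boards use more elaborate statistics than this.

```python
def item_difficulty(responses):
    """Fraction of test takers who answered an item correctly."""
    if not responses:
        raise ValueError("no responses recorded")
    return sum(responses) / len(responses)  # True counts as 1, False as 0

def label_item(p_correct, too_hard=0.30, too_easy=0.90):
    """Rough label for an item; both thresholds are made-up examples."""
    if p_correct < too_hard:
        return "too hard"
    if p_correct > too_easy:
        return "too easy"
    return "usable"

# Invented results for three experimental questions, ten test takers each.
pretest = {
    "Q91": [True] * 10,
    "Q92": [True, False, True, True, False, True, False, True, True, False],
    "Q93": [False, False, True, False, False, False, False, True, False, False],
}

for name, responses in pretest.items():
    p = item_difficulty(responses)
    print(name, p, label_item(p))
```

A question everyone gets right excludes nobody, and one almost everyone misses excludes too many, so in this sketch only the middle item would be a candidate for a scored slot on a later exam.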


From yojo.com



The more the questions are refined to weed people out, the more people will rely on study materials tailored to the exam to pass. These, in turn, break down the scoring process because people who can get access to the materials disproportionately pass and skew the pass rates. The questions must become more difficult to compensate for this. The problem grows because you can only maintain a test that people are breaking with study materials by changing the questions that you use. Since the body of knowledge that you’re testing is already pre-determined, there is no new material to naturally generate new questions. The MCAT can’t suddenly include a non-medical topic just to fix the pass rate. So to keep generating new questions, the tests have to keep finding ways to change the presentation of the material. The solution thus perpetuates the problem: excessively convoluted questions can only be passed through practice because they continually deviate from the subject matter’s normal presentation. This issue is particularly pronounced if the question is presenting extremely implausible hypothetical scenarios that a professional would never encounter in real life. Like, oh say, the MBE on the bar exam. Once a standardized test gets to a point where studying the test is just as important as knowing the material, how much of the practice materials you can afford might as well be the first question.

© 1999-2012 PopMatters.com. All rights reserved.
PopMatters.com™ and PopMatters™ are trademarks
of PopMatters Media, Inc.
