Constructing closed-ended questions to assess higher order thinking

Constructing MC questions to assess higher-order thinking

Multiple-choice questions (MCQs) are useful because they can efficiently assess a large amount of information in a short time, especially for large groups of students. However, a common argument against them is that they only measure simple recall, since the correct option is already provided. In this article, we show that you can also construct MCQs that require higher-order thinking.

What is higher-order thinking?

Higher-order thinking, in Bloom’s taxonomy, refers to the more complex cognitive processes of applying, analyzing, evaluating, and creating knowledge beyond basic remembering and understanding.

MCQs usually assess recall or comprehension, but you can also construct questions that require application and analysis, and sometimes even evaluation.

N.B. Experts often disagree about the categorization of questions. There is often overlap between, for instance, understanding and applying, or applying and analyzing. Treat Bloom’s taxonomy as an aid to help you think about the complexity of your MCQs, not as a straitjacket.

(Source: McGill Teaching & Learning Services workshop, n.d.)

Don’t forget the basics

Writing good MCQs (including plausible distractors) is an art. You want to make sure you’re assessing the desired knowledge/skills rather than reading ability, for instance. Don’t ask trick questions or focus on uncommon exceptions: “When you hear galloping, think of horses, not zebras” (Brame, 2022).

For helpful guidelines on MCQ construction, see the page “2. Constructing” of the UvA Teaching and Learning Centres (TLC).

Common strategies for assessing higher-order thinking with MCQs

Below you will find four common strategies for constructing MCQs that measure higher-order thinking. There are sample questions from various disciplines, often with a brief rationale explaining why this question requires higher-order thinking. For each sample question, the correct answer is indicated in bold.

1. Flip the question

Instead of asking students to choose the correct definition of a term (remembering), present an example and ask the student to identify the rule or concept (understanding or applying).

Example (education)
Original question:

Which of the following best describes what is meant by ‘formative assessment’?

A. is based on the student’s attitudes, interests and values

B. is designed primarily to evaluate learning

C. is usually high‐stakes

D. provides information to modify teaching and learning

Rephrased question:

A teacher uses a strategy called Thumbs Up, Thumbs Down with her students. This illustrates the use of:

A. affective assessment

B. formative assessment

C. diagnostic assessment

D. summative assessment

(Source: O’Leary, 2017, in Scully, 2017)

2. Increase the complexity

Require students to combine pieces of knowledge (understanding, applying, sometimes also analyzing).

Example 1 (statistics)

You have carried out a 3 x 2 ANOVA for independent groups. There were 60 participants with 10 participants randomly assigned to each cell. You have now analysed the data and are checking your work. Which of the following would immediately let you know that you have made an error?

A. You found the total degrees of freedom to be 60.

B. You found the mean square for the error term to be 6.25.

C. You found the F‐statistic for the interaction effect to be 2.34.

D. You found degrees of freedom for the interaction effect to be 2.

(Source: DiBattista, 2011 in Scully, 2017)

Rationale:

This statistics question assesses analysis by requiring students to internally deconstruct the experimental design to calculate expected parameters and compare them against the provided results. It moves beyond recall as students must selectively apply statistical formulas for degrees of freedom (N – 1) to identify a mathematical impossibility within a complex data set.
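The degrees-of-freedom check that the question relies on can be written out as a short calculation. The sketch below assumes the standard formulas for a two-way between-subjects ANOVA with equal cell sizes; the function name `factorial_anova_df` and its parameters are illustrative and do not come from the source question.

```python
def factorial_anova_df(levels_a: int, levels_b: int, n_per_cell: int) -> dict:
    """Degrees of freedom for a two-way between-subjects ANOVA
    with equal cell sizes (illustrative helper, not from the source)."""
    n_cells = levels_a * levels_b
    n_total = n_cells * n_per_cell
    df = {
        "factor_a": levels_a - 1,
        "factor_b": levels_b - 1,
        "interaction": (levels_a - 1) * (levels_b - 1),
        "error": n_total - n_cells,
        "total": n_total - 1,
    }
    # The component degrees of freedom must add up to the total.
    parts = df["factor_a"] + df["factor_b"] + df["interaction"] + df["error"]
    assert parts == df["total"]
    return df


# The design in the question: 3 x 2 cells, 10 participants per cell (N = 60).
print(factorial_anova_df(3, 2, 10))
# {'factor_a': 2, 'factor_b': 1, 'interaction': 2, 'error': 54, 'total': 59}
```

Under these assumptions the total degrees of freedom are 59, so a reported value of 60 cannot be right, while an interaction term with 2 degrees of freedom is exactly what the design predicts. The mean square and F values in options B and C cannot be ruled out from the design alone.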

Example 2 (child development)

Original question:

How old is the average child that draws people in this manner?

A. 18–24 months

B. 3–4 years

C. 5–6 years

D. 7–8 years

Rephrased question:

Based on the developmental stage represented by this drawing, which of the following best represents the vocabulary size of the average child capable of producing this figure?

A. 50–100 words; mostly nouns and simple social expressions.

B. 200–300 words; beginning to use “telegraphic” two-word sentences.

C. 1,000–1,500 words; using complex sentences and basic grammar rules.

D. 2,500–5,000 words; demonstrating mastery of abstract concepts and adult-like syntax.

(Example revised with GenAI, adapted from van der Marel, 2025).

Rationale:

The original question tests simple recognition and recall: the student only needs to match a visual stage of artistic development to a specific age range found in a textbook. The reworked question requires students to combine pieces of knowledge, first identifying the developmental stage and then correlating it with linguistic milestones. It moves beyond simple recall by forcing students to integrate knowledge of motor and cognitive development into a single, cohesive profile of a child (applying, analyzing).

3. Provide a complex case or scenario

Present a context with multiple pieces of information that students must interpret to choose the best plan or strategy (analyzing or evaluating). The context could also include a graph or another visual.

N.B. Questions like these are often text-heavy and may unintentionally measure reading skills under time pressure. Keep the scenario as short and to the point as possible to avoid unduly (dis)advantaging certain students, and make sure enough reading time is built in.

Example 1 (medicine)

You receive a report on the following patients at the beginning of your evening shift. Which patient should you assess first?

A. An 82‐year‐old with pneumonia who seems confused at times

B. A 76‐year‐old patient with cancer with 300 mL remaining of an intravenous infusion

C. A 40‐year‐old who had an emergency appendectomy 8 hours ago

D. An 18‐year‐old with chest tubes for treatment of a pneumothorax following an accident

(Source: Oermann & Gaberson, 2009, in Scully, 2017)

Rationale: This question is complex because students must assess multiple cases in parallel and apply implicit urgency criteria. The higher cognitive level lies in prioritizing based on risk and clinical relevance, not in recognizing individual symptoms. The distractors are convincing because every patient needs care, but only one situation warrants immediate assessment.

Example 2 (medicine)

A 62-year-old woman with a history of confusion and constipation comes to the office for a follow-up visit. Laboratory investigations reveal a serum calcium of 2.9 mmol/L, a creatinine of 146 μmol/L, and a hemoglobin of 108 g/L.

Which one of the following would help confirm the diagnosis?

A. Parathyroid hormone

B. Serum protein electrophoresis

C. 25-OH vitamin D

D. Serum creatinine

E. Abdominal ultrasound

(Source: Touchie, 2012, in McGill, n.d.).

Rationale:

This question is complex because students must integrate multiple symptoms and lab values to form a plausible diagnosis before they can determine which additional test will confirm that diagnosis. The higher cognitive level lies in combining and interpreting data, not in recognizing isolated facts. The distractors are credible because they appear to be diagnostically relevant, but not all of them contribute to confirming the most likely diagnosis.

Example 3 (work and organizational psychology)

Scenario:

A US-based NGO is opening a regional office in a country where LGBTQ+ identities are legally restricted. The NGO has a strict “Authentic Self” policy. They are hiring a Country Director and must choose between a well-connected but conservative local candidate and a US-based LGBTQ+ internal candidate. The NGO faces a limited startup budget and must ensure the office is operational within three months without triggering costly legal interventions from the local government.

Question:

In light of the conflicting demands of corporate values, local legal risks, and budget constraints, which of the following strategies represents the most effective balance for the DEI committee to recommend?

A. Hire the local candidate and mandate a signed contract to uphold global LGBTQ+ policies; however, the NGO lacks the local legal presence to enforce these clauses if violated.

B. Hire the internal US candidate to signal non-negotiable values, despite the high cost of a relocation package and the significant legal risk of the candidate being denied a work visa based on their identity.

C. Hire the local candidate but establish a “Dual-Report” system with a global DEI officer; note that this requires a 15% increase in the administrative budget to fund the additional oversight position.

D. Perform a localized “Risk and Equity Audit” to adapt the policy into a “Safe Harbor” model; this utilizes existing HR resources to protect marginalized staff locally without requiring public disclosure or expensive external oversight.

(Source: own examples created with GenAI, revised by subject specialists)

Rationale:

This question requires evaluation because the student must weigh ethical values against resource and legal feasibility. It critiques “Universalism” by forcing a choice that is legally and financially viable in a “wicked problem” scenario. Option D is correct because it addresses the “duty of care” using existing internal resources, whereas B and C carry high financial or legal risks that might jeopardize the entire expansion.

4. Build in weighing of arguments

Have students assess the relative importance or validity of arguments supporting a given conclusion (evaluating).

Example 1 (education)

Read the following comments a teacher made about testing. Then answer the question below by choosing the best answer.

“Students go to school to learn, not to take tests. In addition, tests cannot be used to indicate a student’s absolute level of learning. All tests can do is rank students in order of achievement, and this relative ranking is influenced by guessing, bluffing, and the subjective opinions of the teacher doing the scoring. The teaching-learning process would benefit if we did away with tests and depended on student self-evaluation.”

Which one of the following propositions is most essential to the final conclusion?

A. Effective self-evaluation does not require the use of tests.

B. Tests place students in rank order only.

C. Test scores are influenced by factors other than achievement.

D. Students do not go to school to take tests.

(Source: Gronlund, 1998, in McGill, n.d.)

Rationale:

Students must first reconstruct the underlying line of reasoning before they can assess the answer alternatives. This requires distinguishing between main and secondary premises and places the cognitive burden not on subject matter, but on logical analysis and argument recognition. The distractors are therefore plausible, because they do appear in the text, but not all of them support the conclusion.

Example 2 (philosophy)

Consider the question, “What is meant by the charge that utilitarianism is too demanding?”

Now suppose the following answer is given:

“Utilitarianism requires moral people to respond to important moral concerns such as helping the less fortunate, while allowing immoral people to pursue their careers, family lives, and personal projects.”

What is wrong with this answer?

A. Nothing – that answer is correct.

B. It falsely describes what utilitarianism requires of moral people.

C. It falsely describes what utilitarianism allows of immoral people.

D. It relies on a false dichotomy between moral people and immoral people.

(Source: Green, n.d., in McGill, n.d.)

Rationale:

This question is complex because students must first correctly reconstruct the original argument against utilitarianism before they can assess the answer. The task is not to recognize a correct definition, but to identify a subtle flaw in the given explanation. The higher cognitive level therefore lies in diagnosing an error in reasoning, not in reproducing knowledge. The distractors are sophisticated in terms of content and require precise conceptual distinction.

Adapting one question to multiple higher levels of thinking

Here is a basic worked example of how you can adapt a question that addresses factual recall so that it assesses progressively higher levels of thinking (using an example from political science).

  1. Remembering: Which of these four scholars is a Realist?
  2. Understanding: Why is X a Realist?
  3. Applying: What would X have to say about event Y?
  4. Analyzing: Why could it be argued that X is not only a Realist, but also an Idealist?
  5. Evaluating: Which assessment of the claim that X is both a Realist and an Idealist is, from perspective Y, the most convincing?

(Source: own examples created with GenAI, revised by subject specialist)

Example 1: Remembering

Which of the following scholars is most closely associated with the development of “Classical Realism”?

A. Alexander Wendt

B. Hans Morgenthau

C. Immanuel Kant

D. Robert Keohane

Focus: Recalling specific facts, names, or basic definitions.

Example 2: Understanding

According to Realist theory, why is the international system characterized by constant competition and the potential for conflict?

A. Because international law is fundamentally flawed and requires reform.

B. Because states prioritize global economic stability over national interests.

C. Because the system is anarchic, meaning there is no central authority to enforce rules or guarantee security.

D. Because democratic states are inherently more aggressive than autocratic ones.

Focus: Grasping the meaning of a concept and explaining “why” or “how.”

Example 3: Applying

In response to a neighboring country’s sudden military buildup, State A decides to increase its own defense budget and seek new military alliances. This behavior is a classic example of which Realist concept applied to a real-world event?

A. Collective Security

B. The Security Dilemma

C. Relative gains

D. Self-help

Focus: Using abstract theoretical concepts to solve a problem or interpret a specific real-world scenario.

Example 4: Analyzing

E.H. Carr is often labeled a “Realist,” yet he argued that a stable international order cannot be sustained by power alone and must include a moral component. Why could it be argued that his work reflects both Realism and elements of Idealism?

A. He believed that international organizations like the League of Nations would eventually replace the state.

B. He argued that power is irrelevant as long as a state has a strong legal framework.

C. He analyzed the reality of power politics while maintaining that “utopian” aspirations for morality are necessary to give power a purpose.

D. He suggested that economic interdependence is the only way to prevent states from engaging in war.

Focus: Breaking down information into parts, identifying motives, or finding evidence to support generalizations (e.g., seeing the overlap between conflicting theories).

Example 5: Evaluating

E.H. Carr is often characterised as both a realist and an idealist. Which assessment of this dual classification can be best justified on the basis of theoretical consistency?

A. The classification is problematic because Carr’s emphasis on power and interests is fundamentally incompatible with idealistic assumptions.

B. The classification is valid because Carr takes power analysis seriously and explicitly gives normative aspirations a place within his theory. 

C. The classification is superfluous because Carr’s work can be placed entirely within the realist paradigm.

D. The classification is misleading because Carr regards idealism exclusively as a historical phase that has been overcome by realism.

Focus: Making and justifying judgments about theories or classifications by weighing arguments and evidence against explicit criteria (e.g., assessing how theoretically consistent or convincing a particular interpretation or dual classification is).

Using alternative closed-ended question formats in ANS Exam

Apart from traditional MCQs, where students pick the one best answer out of a number of options (usually three or four), the UvA digital assessment platform ANS allows you to create many different kinds of closed-ended questions. These formats often allow you to address higher-order thinking skills beyond recall or understanding. Your faculty’s ICTO team can help you work with ANS.

Below are some examples of question formats that allow assessment of higher order thinking:

  • Fill-in-the-blanks: provide a scenario or case and have students fill in one or more blanks. You can choose whether they should fill in a single-word answer themselves, or you can provide a drop-down list with several options.
  • Ranking question: provide a scenario or case and have students put certain information in the correct order, for instance steps that should be taken according to a theory or model.
  • Hotspot question: students have to place (labelled) markers on an image. This gives you a lot of freedom, since the image you upload can be a text or a map, for instance.
  • Matching question: students have to combine theories, concepts, definitions or examples correctly.

Sources

Help and advice

Would you like to discuss which measures suit your course? Then make an appointment with the assessment experts at TLC-FGw.