Assessment Supervisor Guide

In health professions education, creating robust assessment plans is a hallmark of a good curriculum. In the era of competency-based medical education (CBME), assessment has taken on an unparalleled prominence in our educational systems, exemplified by the widespread adoption of entrustable professional activities (EPAs).

Must-Know Concepts - Executive Summary

The following are concepts that every trainee who has completed this block should be able to address or explain.

Concept 1: The Difference between Assessment and Evaluation

  • What is the difference between assessment and evaluation?
  • Assessment is traditionally thought of as being about individuals, while evaluation is thought to be about programs. Why is it important to make this distinction?

Concept 2: Programmatic Assessment

  • What is programmatic assessment?
  • Why is programmatic assessment important for a training program or a medical school? 

Concept 3: Formative vs. Summative Assessment; Assessment of/for/as Learning; High-Stakes vs. Low-Stakes Assessment

  • What is the difference between formative and summative assessment?
  • What is assessment of learning? For learning? As learning?

Concept 4: Modalities of Assessment

  • Reflect on your own training experiences. List and describe all of the assessment modalities you have experienced.
  • Discuss the pros and cons of at least seven types of assessments.

Concept 5: Data-driven decisions on trainee competence

  • In the era of CBME, you are asking faculty members to help you gather evidence of trainee performance throughout their rotations. What are the important considerations for data collection?
  • What is a competence committee and how does it work?
  • What are learning analytics? How do they relate to a trainee's learning?

Concepts in Depth - For each of the above topics, please complete the following grid:

Concept 1: The Difference between Assessment and Evaluation

Suggested prompts:

  • What is the difference between assessment and evaluation?
  • Assessment is traditionally thought of as being about individuals, while evaluation is thought to be about programs. Why is it important to make this distinction?

Suggested activity (optional):

                                            Assessment        Evaluation
What are you making judgements about?
What sorts of outcomes do you measure?
What sort of data would you gather?
How is the data you collected used?

Key readings about this topic that a faculty supervising a trainee should read or be familiar with: 

  • Moore Jr DE. Assessment of Learning and Program Evaluation in Health Professions Education Programs. New Directions for Adult and Continuing Education. 2018 Mar;2018(157):51-64. (Link)
  • Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. The Lancet. 2001 Mar 24;357(9260):945-9. (Link)

Other related concepts that trainee(s) might want to know:

  • Program Evaluation (see curriculum unit)

Concept 2: Programmatic Assessment

Suggested prompts: 

  • What is programmatic assessment?
  • Why is programmatic assessment important for a training program or a medical school? 

Suggested activity (optional):

  • Watch this short trigger video: https://youtu.be/qpkk512krNc
  • Ask the trainee(s) to create a program of assessment for a longitudinal integrated clerkship curriculum on clinical skills.

Key readings about this topic that a faculty supervising a trainee should read or be familiar with: 

  • Van Der Vleuten CP, Schuwirth LW. Assessing professional competence: from methods to programmes. Medical Education. 2005 Mar;39(3):309-17. (Link)
  • Schuwirth L, van der Vleuten C, Durning SJ. What programmatic assessment in medical education can learn from healthcare. Perspectives on Medical Education. 2017 Aug;6(4):211-5. (Link)

Other related concepts that trainee(s) might want to know: 

  • Validity Frameworks (Kane, Messick)
  • Competence Committees

Other suggested readings or resources: 

  • Cees van der Vleuten – Video: Lecture on Programmatic Assessment.
  • Gruppen LD, ten Cate O, Lingard LA, Teunissen PW, Kogan JR. Enhanced requirements for assessment in a competency-based, time-variable medical education system. Academic Medicine. 2018 Mar 1;93(3):S17-21. (Link)

Concept 3: Frameworks for Assessment. Formative vs. Summative, Assessment of/for/as Learning, High-Stakes vs. Low-Stakes Assessment

Suggested prompts:

  • What is the difference between formative and summative assessment?
  • What is assessment of learning? For learning? As learning?
  • What is a high-stakes assessment? What is a low-stakes assessment?

Suggested activity (optional):

  • Host a debate in which you discuss the following resolutions. Pre-assign roles:
    • Be it resolved that there is no such thing as formative assessment.
    • Be it resolved that simulation should NEVER be a place for summative assessment.
    • Be it resolved that in the clinical environment there is no such thing as 'low-stakes' assessment, since patients should be protected from mistakes.

Key readings about this topic that a faculty supervising a trainee should read or be familiar with:

  • van der Vleuten C, Sluijsmans D, Joosten-ten Brinke D. Competence assessment as learner support in education. In: Competence-based Vocational and Professional Education 2017 (pp. 607-630). Springer, Cham. (Link)
  • Hall A, Chaplin T, McColl T, Petrosoniak A, Caners K, Rocca N, Gardner C, Bhanji F, Woods R. Harnessing the power of simulation for assessment: Consensus recommendations for the use of simulation-based assessment in emergency medicine. CJEM. 2020 Mar;22(2):194-203. doi:10.1017/cem.2019.499. https://pubmed.ncbi.nlm.nih.gov/32209155/

Other suggested readings or resources: 

  • Schuwirth LW, van der Vleuten CP. Programmatic assessment and Kane's validity perspective. Medical Education. 2012 Jan;46(1):38-48. (Link)

Concept 4: Modalities of Assessment

Suggested prompts:

  • Reflect on your own training experiences. List and describe all of the assessment modalities you have experienced.
  • Discuss the pros and cons of at least seven types of assessments.

Suggested activity (optional):

  • Case 7 from the Consulting in MedEd cases – A Case of Outlier Residents (Link)

Key readings about this topic that a faculty supervising a trainee should read or be familiar with:

  • Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. The Lancet. 2001 Mar 24;357(9260):945-9. (Link)
  • On qualitative comments: Ginsburg S, Van Der Vleuten CP, Eva KW, Lingard L. Cracking the code: residents' interpretations of written assessment comments. Medical Education. 2017 Apr;51(4):401-10. (Link)

Other related concepts that trainee(s) might want to know:

  • Validity evidence: Dr. Catherine Patocka's February 2023 T-TIME presentation, 'Demystifying Validity Frameworks' (Link)

Other suggested readings or resources:

  • Ginsburg S, van der Vleuten CP, Eva KW. The hidden value of narrative comments for assessment: a quantitative reliability analysis of qualitative data. Academic Medicine. 2017 Nov 1;92(11):1617-21. (Link)
  • Sebok-Syer SS, Klinger DA, Sherbino J, Chan TM. Mixed Messages or Miscommunication? Investigating the Relationship Between Assessors’ Workplace-Based Assessment Scores and Written Comments. Academic Medicine. 2017 Dec 1;92(12):1774-9. (Link)

Concept 5: Determining Competence: Data-driven decisions on trainees

Suggested prompts:

  • Compare and contrast the two dominant validity frameworks in medical education (Kane, Messick)
  • What makes a 'valid' assessment?
    • In the era of CBME, you are asking faculty members to help you gather evidence of trainee performance throughout their rotations. What are the important considerations for data collection? Name some threats to validity when relying on clinical faculty to assess residents.
  • What is a competence committee and how does it work?
  • What are learning analytics? How do they relate to a trainee's learning? (See the sketch after this list.)
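To make "learning analytics" concrete, the minimal Python sketch below aggregates a hypothetical set of workplace-based assessment (WBA) records into per-trainee, per-EPA summaries of the kind a competence committee might review. The record format, field names, and 1-5 entrustment scale are illustrative assumptions, not any particular program's data model.

    # Minimal sketch: aggregate hypothetical WBA records for committee review.
    # Field names and the 1-5 entrustment scale are illustrative assumptions.
    from collections import defaultdict
    from statistics import mean

    # Each record: (trainee, EPA, entrustment score on a 1-5 scale)
    wba_records = [
        ("Resident A", "EPA-1", 3), ("Resident A", "EPA-1", 4),
        ("Resident A", "EPA-2", 2), ("Resident B", "EPA-1", 5),
        ("Resident B", "EPA-2", 4), ("Resident B", "EPA-2", 5),
    ]

    by_trainee_epa = defaultdict(list)
    for trainee, epa, score in wba_records:
        by_trainee_epa[(trainee, epa)].append(score)

    # Summarise the volume and level of evidence per trainee per EPA:
    # the kind of aggregate view a competence committee might examine.
    for (trainee, epa), scores in sorted(by_trainee_epa.items()):
        print(f"{trainee} {epa}: n={len(scores)}, mean={mean(scores):.1f}")

Even this toy aggregation surfaces the two questions committees weigh: how much evidence exists for each EPA, and what level of performance it shows.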

Suggested activity (optional):

Key readings about this topic that a faculty supervising a trainee should read or be familiar with:

  • Chan T, Sebok‐Syer S, Thoma B, Wise A, Sherbino J, Pusic M. Learning Analytics in Medical Education Assessment: The Past, the Present, and the Future. AEM Education and Training. 2018 Apr;2(2):178-87. (Link)
  • Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane's framework. Medical Education. 2015 Jun;49(6):560-75. (Link)
  • Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Advances in Health Sciences Education. 2014 May 1;19(2):233-50. (Link)

Other related concepts that trainee(s) might want to know:

Other suggested readings or resources: 

  • Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. The American Journal of Medicine. 2006 Feb 1;119(2):166.e7-16. (Link)

Messick's five sources of validity evidence. Definitions by David Cook. http://dx.doi.org/10.1007/s10459-013-9458-4

Content evidence:

  • Comprises a description of steps taken to ensure that assessment content (including scenarios, questions, response options, and instructions) reflects the construct it is intended to measure (e.g., "professionalism"). This might involve basing the assessment on prior instruments, obtaining expert review, or using an assessment blueprint.

Response process evidence: 

  • Comprises theoretical and empirical analyses evaluating how well rater or examinee actions (responses) align with the intended construct. This includes assessment security (those who cheat are not responding based on the intended construct), quality control, and analysis of examinees’ or raters’ thoughts or actions during the assessment activity.

Internal structure evidence:

  • Comprises data evaluating the relations among individual assessment items and how these relate to the overarching construct. This most often takes the form of measures of reproducibility (reliability) across items, stations, or raters, but can also include item analysis (item difficulty and item discrimination) and factor analysis.
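As a worked illustration of these statistics, here is a short Python sketch that computes item difficulty, a simple upper-lower discrimination index, and Cronbach's alpha (one common reliability coefficient) from an invented 0/1 (incorrect/correct) response matrix. Both the data and the code are illustrative only.

    from statistics import pvariance

    # Rows are examinees, columns are items; 1 = correct, 0 = incorrect.
    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
    ]
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]

    # Item difficulty: proportion of examinees answering each item correctly.
    difficulty = [sum(row[i] for row in responses) / len(responses)
                  for i in range(n_items)]

    # Item discrimination (upper-lower index): item difficulty in the
    # top-scoring half minus item difficulty in the bottom-scoring half.
    ranked = [row for _, row in sorted(zip(totals, responses), reverse=True)]
    half = len(ranked) // 2
    upper, lower = ranked[:half], ranked[-half:]
    discrimination = [
        sum(r[i] for r in upper) / half - sum(r[i] for r in lower) / half
        for i in range(n_items)
    ]

    # Cronbach's alpha: internal-consistency reliability across items.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(n_items)]
    alpha = (n_items / (n_items - 1)) * (1 - sum(item_vars) / pvariance(totals))

    print("difficulty:", difficulty)
    print("discrimination:", discrimination)
    print(f"alpha: {alpha:.2f}")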

Relations with other variables evidence:

  • Regards the statistical associations between assessment scores and another measure or feature that has a specified theoretical relationship. This relationship might be strongly positive (e.g., two measures that should measure the same construct) or negligible (for measures that should be independent).
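As a minimal sketch of such evidence, the snippet below computes a Pearson correlation between scores on a new assessment and an established measure of the same construct (convergent evidence); the paired scores are invented for illustration.

    from statistics import mean, pstdev

    new_scores = [62, 71, 80, 55, 90, 68]    # hypothetical new assessment
    established = [60, 75, 78, 58, 88, 70]   # hypothetical established measure

    mx, my = mean(new_scores), mean(established)
    cov = mean((x - mx) * (y - my) for x, y in zip(new_scores, established))
    r = cov / (pstdev(new_scores) * pstdev(established))

    # A strong positive r supports convergent evidence; a negligible r
    # would be expected for measures of unrelated constructs.
    print(f"Pearson r = {r:.2f}")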

Consequences evidence:

  • Regards the impact, beneficial or harmful, of the assessment itself and the decisions and actions that result (e.g. remediation following sub-standard performance). This also includes factors that directly influence the rigor of such decisions, such as the definition of the passing score (e.g., at what point is remediation required?) and differences in scores among subgroups where performance ought to be similar (suggesting that decisions may be spurious).
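One such signal can be checked with simple arithmetic: comparing pass rates across subgroups that ought to perform similarly, given a defined cut score. The cut score and grouped scores below are hypothetical.

    # Hypothetical cut score and scores grouped by training site.
    cut_score = 70
    scores_by_group = {
        "Site A": [72, 65, 80, 74, 69],
        "Site B": [71, 75, 68, 82, 77],
    }
    for group, scores in scores_by_group.items():
        pass_rate = sum(s >= cut_score for s in scores) / len(scores)
        print(f"{group}: pass rate {pass_rate:.0%}")
    # A large gap between groups would prompt scrutiny of the cut score
    # and of the decisions that follow from it.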

Kane's Four Inferences. From Lambert W. T. Schuwirth & Cees P. M. van der Vleuten.

From observation to score:

  • When taking a patient's blood pressure (BP), the doctor must convert acoustic signals (Korotkoff sounds) and a visual reading of the sphygmomanometer into a numerical value. The inferences are based on the assumption that the doctor knows when to take the reading, does not let the sphygmomanometer run down too quickly or too slowly, uses the right cuff, and so forth. Only when every aspect of the procedure is performed correctly can a valid inference from observation to score be made.

From observed score to universe score:

  • The next inference refers to whether the observations are sufficiently representative of all possible observations. In our example, this refers to whether one measurement provides sufficient data on which to base a diagnosis. The Dutch guideline, for example, stipulates that hypertension can only be diagnosed if BP is taken twice during one consultation and is repeated during a second consultation.

From universe score to target domain:

  • Now the results of the BP measurements are used to draw conclusions about the cardiovascular status of the patient. This requires heart auscultation, pulse palpation and other results to be incorporated and the results triangulated in order for the conclusions to be valid.

From target domain to construct:

  • The patient’s cardiovascular status can now be used to establish his or her health status, but further information must be obtained from other sources and triangulated to support a more general conclusion.