Assessment Supervisor Guide

In health professions education, a robust assessment plan is a hallmark of a good curriculum. In the era of competency-based medical education (CBME), assessment has taken on unparalleled prominence in our educational systems.

Must-Know Concepts - Executive Summary

The following are concepts that every trainee who has completed this block should be able to address or explain.

Concept 1: Difference between Assessment and Evaluation

  • What is the difference between assessment and evaluation?
  • Assessment is traditionally thought of as being about individuals, whereas evaluation is about programs. Why is it important to maintain this distinction?

Concept 2: Programmatic Assessment

  • What is programmatic assessment?
  • Why is programmatic assessment important for a training program or a medical school? 

Concept 3: Formative vs Summative; Assessment of/for/as Learning; High-Stakes vs Low-Stakes Assessment; Norm-Referenced vs Criterion-Referenced

  • What is the difference between formative vs summative assessment?
  • What is assessment of learning? For learning? As learning?
  • How do norm-referenced assessments differ from criterion-referenced assessments? What are the strengths and weaknesses of both? 

Concept 4: Modalities of Assessment

  • Looking back on your own training experiences, list and describe all of the assessment modalities you have experienced.
  • Discuss the pros and cons of at least seven modalities.

Concept 5: Data-informed assessments of trainee competence

  • In the era of CBME, you are asking faculty members to help you gather evidence of trainee performance throughout their rotations. What considerations are important when collecting such data?
  • What is a competence committee and how does it work?
  • What are learning analytics? How do they relate to a trainee's learning?

Concept 6: Remediation

  • What are the most common reasons for a learner being in difficulty? How can we be proactive in our support of trainees’ learning?
  • What is the difference between an ‘enhanced learning plan’, ‘remediation’, and ‘probation’?
  • What should trigger a trainee to undergo ‘remediation’?
  • How do you set up a remediation plan?
  • How should learners be supported during remediation?

Sessions 1 & 2 (aka Meeting 1) – Concepts 1 & 2 – pre-learning materials:

Video:

Readings:

  • Schuwirth LWT, van der Vleuten CPM. A history of assessment in medical education. Advances in Health Sciences Education. 2020;25(5):1045-1056.
  • Schuwirth LWT, van der Vleuten CPM. General overview of the theories used in assessment: AMEE Guide No. 57. Medical Teacher. 2011;33(10):783-797.
  • Schuwirth LWT, van der Vleuten CPM. Programmatic assessment: from assessment of learning to assessment for learning. Medical Teacher. 2011;33(6):478-485.
  • van der Vleuten CPM, Schuwirth LWT. Assessing professional competence: from methods to programmes. Medical Education. 2005;39(3):309-317. (Link)
  • Schuwirth LWT, van der Vleuten CPM, Durning SJ. What programmatic assessment in medical education can learn from healthcare. Perspectives on Medical Education. 2017;6(4):211-215. (Link)
  • Gruppen LD, ten Cate O, Lingard LA, Teunissen PW, Kogan JR. Enhanced requirements for assessment in a competency-based, time-variable medical education system. Academic Medicine. 2018;93(3S):S17-S21. (Link)

Sessions 3 & 4 (aka Meeting 2) – Concepts 3 & 4, pre-learning materials:

Video:

Readings:

  • Schuwirth LWT, van der Vleuten CPM. Programmatic assessment and Kane’s validity perspective. Medical Education. 2012 Jan;46(1):38-48. (Link)
  • van der Vleuten CPM, Sluijsmans D, Joosten-ten Brinke D. Competence assessment as learner support in education. In: Mulder M, ed. Competence-based Vocational and Professional Education. Springer; 2017: 607-630. (Link)
  • Hall A, Chaplin T, McColl T, et al. Harnessing the power of simulation for assessment: consensus recommendations for the use of simulation-based assessment in emergency medicine. CJEM. 2020;22(2):194-203. doi:10.1017/cem.2019.499. https://pubmed.ncbi.nlm.nih.gov/32209155/
  • Wass V, van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. The Lancet. 2001 Mar 24;357(9260):945-949. (Link)
  • Ginsburg S, van der Vleuten CPM, Eva KW, Lingard L. Cracking the code: residents’ interpretations of written assessment comments. Medical Education. 2017 Apr;51(4):401-410. (Link)
  • Sebok-Syer SS, Klinger DA, Sherbino J, Chan TM. Mixed messages or miscommunication? Investigating the relationship between assessors’ workplace-based assessment scores and written comments. Academic Medicine. 2017 Dec 1;92(12):1774-1779. (Link) 

Sessions 5 & 6 (aka Meeting 3) – Concepts 5 & 6, pre-learning materials:

  • Chan T, Sebok-Syer S, Thoma B, Wise A, Sherbino J, Pusic M. Learning analytics in medical education assessment: the past, the present, and the future. AEM Education and Training. 2018;2(2):178-187. (Link)
  • Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane's framework. Medical Education. 2015 Jun;49(6):560-575. (Link)
  • Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Advances in Health Sciences Education. 2014 May 1;19(2):233-250. (Link)
  • Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. The American Journal of Medicine. 2006;119(2):166.e7-166.e16. (Link)
  • Hauer K, Ciccone MS, Henzel TR, et al. Remediation of the deficiencies of physicians across the continuum from medical school to practice: a thematic review of the literature. Academic Medicine. 2009; 84(12):1822-1832.
  • Teherani A, O'Sullivan PS, Lovett M, Hauer KE. Categorization of unprofessional behaviours identified during administration of and remediation after a comprehensive clinical performance examination using a validated professionalism framework. Medical Teacher. 2009;31(11):1007-1012. doi:10.3109/01421590802642537
  • Shearer C, Bosma M, Bergin F, Sargeant J, Warren A. Remediation in Canadian medical residency programs: established and emerging practices. Medical Teacher. 2019;41(1), 28-35. https://doi.org/10.1080/0142159X.2018.1436164
  • Chou CL, Kalet A, Costa MJ, Cleland J, Winston K. Guidelines: The dos, don’ts and don’t knows of remediation in medical education. Perspectives on Medical Education 2019; 8(6):322-338. https://doi.org/10.1007/s40037-019-00544-5
  • Mills LM, Boscardin C, Joyce EA, ten Cate O, O’Sullivan PS. Emotion in remediation: a scoping review of the medical education literature. Med Educ. 2021;55(12):1350–1362. DOI: 10.1111/medu.14605
  • Cleland J, Cilliers F, van Schalkwyk S. The learning environment in remediation: a review. The Clinical Teacher 2018;15(1):13-18. doi: 10.1111/tct.12739
  • Kalet A, Chou CL, Ellaway RH. To fail is human: remediating remediation in medical education. Perspectives on Medical Education 2017;6(6):418-424. https://doi.org/10.1007/s40037-017-0385-6

Concepts in Depth - For each of the above topics, please complete the following:

Concept 1: Difference between Assessment and Evaluation

Suggested prompts:

  • What is the difference between assessment and evaluation?
  • Assessment is traditionally thought of as being about individuals, whereas evaluation is about programs. Why is it important to maintain this distinction?

Suggested activity (optional):

|                                        | Assessment | Evaluation |
|----------------------------------------|------------|------------|
| What are you making judgements about?  |            |            |
| What sorts of outcomes do you measure? |            |            |
| What sort of data would you gather?    |            |            |
| How is the data you collected used?    |            |            |

Key readings about this topic that a faculty supervising a trainee should read or be familiar with: 

  • Moore Jr DE. Assessment of Learning and Program Evaluation in Health Professions Education Programs. New Directions for Adult and Continuing Education. 2018 Mar;2018(157):51-64. (Link)
  • van der Vleuten CPM, Schuwirth LWT. Assessing professional competence: from methods to programmes. Medical Education. 2005;39(3):309-317.

Other related concepts that trainee(s) might want to know:

  • Program Evaluation (see curriculum unit)

Concept 2: Programmatic Assessment

Suggested prompts: 

  • What is programmatic assessment?
  • Why is programmatic assessment important for a training program or a medical school? 

Suggested activity (optional):

  • Watch this short trigger video: https://youtu.be/qpkk512krNc
  • Ask the trainee(s) to create a program of assessment for a longitudinal integrated clerkship curriculum on clinical skills.

Key readings about this topic that a faculty supervising a trainee should read or be familiar with: 

  • van der Vleuten CPM, Schuwirth LWT. Assessing professional competence: from methods to programmes. Medical Education. 2005;39(3):309-317. (Link)
  • Schuwirth LWT, van der Vleuten CPM, Durning SJ. What programmatic assessment in medical education can learn from healthcare. Perspectives on Medical Education. 2017;6(4):211-215. (Link)

Other related concepts that trainee(s) might want to know: 

  • Validity Frameworks (Kane, Messick)
  • Competence Committees

Other suggested readings or resources: 

  • van der Vleuten CPM – Video: Lecture on Programmatic Assessment.
  • Gruppen LD, ten Cate O, Lingard LA, Teunissen PW, Kogan JR. Enhanced requirements for assessment in a competency-based, time-variable medical education system. Academic Medicine. 2018;93(3S):S17-S21. (Link)

Concept 3: Frameworks for Assessment

  • Formative vs summative
  • Assessment of/for/as learning
  • Criterion-referenced vs norm-referenced
  • High-stakes vs low-stakes assessment

Suggested prompts:

  • What is the difference between formative vs. summative assessment?
  • What is assessment of learning? For learning? As learning?
  • What is the difference between criterion- and norm-referenced assessments? (A brief numeric sketch follows this list.)
  • What is a high-stakes assessment? What is a low-stakes assessment?
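
To ground the criterion- vs norm-referenced distinction, here is a minimal sketch in Python using invented exam scores; the 70% cut score and the cohort data are assumptions for illustration only.

```python
# Minimal sketch: norm-referenced vs criterion-referenced interpretation
# of the same score. All numbers, including the 70% cut score, are
# invented for illustration.
import numpy as np

cohort_scores = np.array([55, 62, 68, 70, 74, 78, 81, 88, 91, 95])
learner_score = 74

# Criterion-referenced: judge the learner against a fixed standard.
cut_score = 70
print("Pass" if learner_score >= cut_score else "Fail")  # -> Pass

# Norm-referenced: judge the learner against the cohort.
percentile = (cohort_scores < learner_score).mean() * 100
print(f"{percentile:.0f}th percentile in the cohort")     # -> 40th percentile
```

Note that the same score of 74 yields a comfortable pass against the criterion but only a middling standing against the norm group; which interpretation matters depends on the decision being made.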

| Assessment | OF Learning                                    | FOR Learning                                                 | AS Learning                                                                     |
|------------|------------------------------------------------|--------------------------------------------------------------|---------------------------------------------------------------------------------|
| Type       | Summative                                      | Formative                                                    | Formative                                                                       |
| What       | Teachers determine progress against a standard | Teacher and peers check progress to determine learning goals | Learners take responsibility for their own learning and explore how to improve |
| Who        | Teacher                                        | Teacher & peers                                              | Learner & peers                                                                 |
| How        | Formal assessments                             | Formal and informal assessments                              | Formal and informal assessments + self-assessments                             |
| When       | Periodic review                                | Ongoing feedback/coaching                                    | Continuous reflection using data                                               |
| Why        | Ranking and reporting                          | Improve learning                                             | Deeper learning / enhance performance                                          |
| Emphasis   | Scoring, grades and competition                | Feedback, support, collaboration                             | Reflection, self-evaluation                                                    |

Source: https://www.harapnuik.org/?p=8475

Suggested activity (optional):

  • Host a debate in which you discuss the following resolutions. Pre-assign roles:
    • Be it resolved that there is no such thing as formative assessment.
    • Be it resolved that simulation should NEVER be used for summative assessment.
    • Be it resolved that in the clinical environment there is no such thing as a 'low-stakes' assessment, since patients must be protected from mistakes.

Key readings about this topic that a faculty supervising a trainee should read or be familiar with:

  • van der Vleuten C, Sluijsmans D, Joosten-ten Brinke D. Competence assessment as learner support in education. In: Competence-based Vocational and Professional Education 2017 (pp. 607-630). Springer, Cham. (Link)
  • Hall A, Chaplin T, McColl T, Petrosoniak A, Caners K, Rocca N, Gardner C, Bhanji F, Woods R. Harnessing the power of simulation for assessment: consensus recommendations for the use of simulation-based assessment in emergency medicine. CJEM. 2020;22(2):194-203. doi:10.1017/cem.2019.499. https://pubmed.ncbi.nlm.nih.gov/32209155/

Other suggested readings or resources: 

  • Schuwirth LWT, van der Vleuten CPM. Programmatic assessment and Kane’s validity perspective. Medical Education. 2012;46(1):38-48. (Link)

Concept 4: Modalities of Assessment

Suggested prompts:

  • Reflect on your own training experiences. List and describe all of the assessment modalities you have experienced.
  • Discuss the pros and cons of at least seven assessment modalities.

Suggested activity (optional):

  • Case 7 from the Consulting in MedEd cases – A Case of Outlier Residents (Link)

Key readings about this topic that a faculty supervising a trainee should read or be familiar with:

  • Wass V, Van der Vleuten C, Shatzer J, Jones R. Assessment of clinical competence. The Lancet. 2001 Mar 24;357(9260):945-9. (Link)
  • (Re: qualitative comments) Ginsburg S, van der Vleuten CPM, Eva KW, Lingard L. Cracking the code: residents’ interpretations of written assessment comments. Medical Education. 2017;51(4):401-410. (Link)

Other related concepts that trainee(s) might want to know:

  • Validity evidence: Dr. Catherine Patocka’s February 2023 T-TIME presentation on ‘Demystifying Validity Frameworks’ (Link)

Other suggested readings or resources:

  • Ginsburg S, van der Vleuten CPM, Eva KW. The hidden value of narrative comments for assessment: a quantitative reliability analysis of qualitative data. Academic Medicine. 2017;92(11):1617-1621. (Link)
  • Sebok-Syer SS, Klinger DA, Sherbino J, Chan TM. Mixed messages or miscommunication? Investigating the relationship between assessors’ workplace-based assessment scores and written comments. Academic Medicine. 2017;92(12):1774-1779. (Link)

Concept 5: Data-informed assessment of trainee competence

Suggested prompts:

  • Compare and contrast the two dominant validity frameworks in medical education (Kane, Messick).
  • What makes a 'valid' assessment?
    • In the era of CBME, you are asking faculty members to help you gather evidence of trainee performance throughout their rotations. What considerations are important when collecting such data? Name some threats to validity that arise when relying on clinical faculty to assess residents.
  • What is a competence committee and how does it work?
  • What are learning analytics? How do they relate to a trainee's learning? (See the brief sketch after this list.)
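
As a deliberately simple illustration of what "learning analytics" can mean in practice, the sketch below (Python, with invented data) trends a trainee's hypothetical entrustment ratings using a rolling mean; the 5-point scale, the window size, and the ratings are all assumptions for demonstration.

```python
# Minimal learning-analytics sketch: trend a trainee's workplace-based
# entrustment ratings over time with a rolling mean. The 5-point scale
# and the ratings are invented, not real assessment data.
import numpy as np

ratings = np.array([2, 3, 2, 3, 3, 4, 3, 4, 4, 5, 4, 5])  # chronological order

window = 4
rolling_mean = np.convolve(ratings, np.ones(window) / window, mode="valid")
print(np.round(rolling_mean, 2))
# An upward-trending rolling mean is one simple signal a competence
# committee might review alongside narrative comments; no single metric
# should stand alone.
```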

Suggested activity (optional):

Key readings about this topic that a faculty supervising a trainee should read or be familiar with:

  • Chan T, Sebok-Syer S, Thoma B, Wise A, Sherbino J, Pusic M. Learning analytics in medical education assessment: the past, the present, and the future. AEM Education and Training. 2018;2(2):178-187. (Link)
  • Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: a practical guide to Kane's framework. Medical Education. 2015;49(6):560-575. (Link)
  • Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Advances in Health Sciences Education. 2014;19(2):233-250. (Link)

Other related concepts that trainee(s) might want to know:

Other suggested readings or resources: 

  • Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. The American Journal of Medicine. 2006;119(2):166.e7-166.e16. (Link)

Concept 6: Remediation

Suggested prompts:

  • What are the most common reasons for a trainee to be in difficulty? How can we be proactive in our support of trainees’ learning?
  • What is the difference between an ‘enhanced learning plan’, ‘remediation’, and ‘probation’?
  • What should trigger a trainee to undergo ‘remediation’?
  • How do you set up a remediation plan?

Suggested activity (optional):

  • Create a fictional trainee in difficulty, then complete a remediation blueprint using the table below.

    | Performance / competency concern | CanMEDS domain | How will you teach & support the trainee? | How will you assess the trainee? |
    |----------------------------------|----------------|-------------------------------------------|----------------------------------|
    |                                  |                |                                           |                                  |
    |                                  |                |                                           |                                  |

Key readings about this topic that a faculty supervising a trainee should read or be familiar with:

  • Hauer K, Ciccone MS, Henzel TR, et al. Remediation of the deficiencies of physicians across the continuum from medical school to practice: a thematic review of the literature. Academic Medicine. 2009; 84(12):1822-1832.
  • Teherani A, O'Sullivan PS, Lovett M, Hauer KE. Categorization of unprofessional behaviours identified during administration of and remediation after a comprehensive clinical performance examination using a validated professionalism framework. Medical Teacher. 2009;31(11):1007-1012. doi:10.3109/01421590802642537
  • Shearer C, Bosma M, Bergin F, Sargeant J, Warren A. Remediation in Canadian medical residency programs: established and emerging practices. Medical Teacher. 2019;41(1), 28-35. https://doi.org/10.1080/0142159X.2018.1436164
  • Chou CL, Kalet A, Costa MJ, Cleland J, Winston K. Guidelines: The dos, don’ts and don’t knows of remediation in medical education. Perspectives on Medical Education 2019; 8(6):322-338. https://doi.org/10.1007/s40037-019-00544-5 

Other related concepts that trainee(s) might want to know:

  • Mills LM, Boscardin C, Joyce EA, ten Cate O, O’Sullivan PS. Emotion in remediation: a scoping review of the medical education literature. Med Educ. 2021;55(12):1350–1362. DOI: 10.1111/medu.14605
  • Cleland J, Cilliers F, van Schalkwyk S. The learning environment in remediation: a review. The Clinical Teacher 2018;15(1):13-18. doi: 10.1111/tct.12739 
  • Kalet A, Chou CL, Ellaway RH. To fail is human: remediating remediation in medical education. Perspectives on Medical Education 2017;6(6):418-424. https://doi.org/10.1007/s40037-017-0385-6 

Other suggested readings or resources: 

  • Local UGME and PGME remediation resources. 

Messick’s Validity Framework. Definitions by David Cook. http://dx.doi.org/10.1007/s10459-013-9458-4

Content evidence:

  • Comprises a description of steps taken to ensure that assessment content (including scenarios, questions, response options, and instructions) reflects the construct it is intended to measure (e.g., ‘‘professionalism’’). This might involve basing the assessment on prior instruments, obtaining expert review, or using an assessment blueprint.

Response process evidence: 

  • Comprises theoretical and empirical analyses evaluating how well rater or examinee actions (responses) align with the intended construct. This includes assessment security (those who cheat are not responding based on the intended construct), quality control, and analysis of examinees’ or raters’ thoughts or actions during the assessment activity.

Internal structure evidence:

  • Comprises data evaluating the relations among individual assessment items and how these relate to the overarching construct. This most often takes the form of measures of reproducibility (reliability) across items, stations, or raters, but can also include item analysis (item difficulty and item discrimination) and factor analysis.
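
To make these quantities concrete, here is a minimal Python sketch computing item difficulty, corrected item-total discrimination, and Cronbach's alpha on a small invented matrix of dichotomous item scores; the data are assumptions for illustration, not drawn from any real assessment.

```python
# Minimal psychometrics sketch: item difficulty, corrected item-total
# discrimination, and Cronbach's alpha. The dichotomous scores below
# (rows = examinees, columns = items) are invented for illustration.
import numpy as np

scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
])

n_items = scores.shape[1]
total = scores.sum(axis=1)

# Item difficulty: proportion of examinees answering each item correctly.
difficulty = scores.mean(axis=0)

# Item discrimination: correlation of each item with the rest-of-test score.
discrimination = np.array([
    np.corrcoef(scores[:, i], total - scores[:, i])[0, 1] for i in range(n_items)
])

# Cronbach's alpha: internal-consistency reliability across the items.
alpha = (n_items / (n_items - 1)) * (
    1 - scores.var(axis=0, ddof=1).sum() / total.var(ddof=1)
)

print("Difficulty:    ", np.round(difficulty, 2))
print("Discrimination:", np.round(discrimination, 2))
print("Cronbach alpha:", round(float(alpha), 2))
```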

Relations with other variables evidence:

  • Regards the statistical associations between assessment scores and another measure or feature that has a specified theoretical relationship. This relationship might be strongly positive (e.g., two measures that should measure the same construct) or negligible (for measures that should be independent).
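
A minimal sketch of this kind of evidence, assuming two invented measures that should tap the same construct (hypothetical OSCE scores and workplace-based assessment ratings):

```python
# Minimal sketch of "relations with other variables" evidence: correlate
# two invented measures that, in theory, tap the same construct.
import numpy as np

osce_scores = np.array([62, 71, 75, 80, 68, 90, 55, 84])          # hypothetical
wba_ratings = np.array([3.1, 3.6, 3.5, 4.2, 3.2, 4.6, 2.8, 4.0])  # hypothetical

r = np.corrcoef(osce_scores, wba_ratings)[0, 1]
print(f"Pearson r = {r:.2f}")
# A strong positive r supports convergent evidence; a near-zero r against a
# theoretically unrelated measure would support discriminant evidence.
```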

Consequences evidence:

  • Regards the impact, beneficial or harmful, of the assessment itself and the decisions and actions that result (e.g. remediation following sub-standard performance). This also includes factors that directly influence the rigor of such decisions, such as the definition of the passing score (e.g., at what point is remediation required?) and differences in scores among subgroups where performance ought to be similar (suggesting that decisions may be spurious).

Kane's Four Inferences. From Lambert W. T. Schuwirth & Cees P. M. van der Vleuten.

From observation to score:

  • When taking a patient’s BP, the doctor must convert acoustic signals (Korotkoff sounds) and a visual reading of the sphygmomanometer into a numerical value. The inferences are based on the assumption that the doctor knows when to take the reading, does not let the sphygmomanometer run down too quickly or too slowly, uses the right cuff, and so forth. Only when every aspect of the procedure is performed correctly can a valid inference from observation to score be made.

From observed score to universe score:

  • The next inference refers to whether the observations are sufficiently representative of all possible observations. In our example, this refers to whether one measurement provides sufficient data on which to base a diagnosis. The Dutch guideline, for example, stipulates that hypertension can only be diagnosed if BP is taken twice during one consultation and is repeated during a second consultation.
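
To make the sampling logic of this inference concrete, here is a minimal Python sketch that encodes the guideline's repeated-measurement rule; the 140/90 mmHg threshold, the averaging rule, and the readings are illustrative assumptions, not clinical guidance.

```python
# Minimal sketch of the "observed score -> universe score" inference: one
# reading is not enough, so the sampling rule is encoded explicitly.
# Threshold, averaging, and data are illustrative assumptions only.
from statistics import mean

# Each consultation holds two (systolic, diastolic) readings, per the guideline.
consultations = [
    [(148, 94), (144, 92)],  # consultation 1
    [(146, 91), (142, 90)],  # consultation 2
]

def visit_mean(readings):
    return mean(s for s, _ in readings), mean(d for _, d in readings)

def diagnose_hypertension(consults, sys_cut=140, dia_cut=90):
    # Refuse the inference if the universe of observations is undersampled.
    if len(consults) < 2 or any(len(v) < 2 for v in consults):
        return None  # insufficient observations for a valid inference
    return all(s >= sys_cut or d >= dia_cut for s, d in map(visit_mean, consults))

print(diagnose_hypertension(consultations))  # -> True for these invented readings
```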

From universe score to target domain:

  • Now the results of the BP measurements are used to draw conclusions about the cardiovascular status of the patient. This requires that heart auscultation, pulse palpation, and other findings be incorporated and triangulated in order for the conclusions to be valid.

From target domain to construct:

  • The patient’s cardiovascular status can now be used to establish his or her health status, but further information must be obtained from other sources and triangulated to support a more general conclusion.