Next-Gen Credentialing: AI, Human Oversight, and Content Creation

AI is being used in physical therapy education and assessment to support tasks like item writing, formative assessment, and simulated learning. This article is based on a presentation at the 2025 Annual Education Meeting by Kimberly Swygert, Colleen Lettvin, Marcia Himes, and Jonathan Bird.

Artificial intelligence (AI) is already influencing how educators and assessment professionals approach content creation, formative learning, and credentialing. In physical therapy education and assessment, these tools offer new opportunities to support efficiency, personalization, and simulation. However, there is also an accountability gap inherent in AI-assisted decision-making. If an AI-generated recommendation leads to harm, responsibility does not rest with the system itself. This reality is particularly salient in healthcare contexts, where professional judgment and ethical responsibility cannot be delegated to a tool.

For educators and regulators, the implication is clear: AI may support work, but it cannot replace professional responsibility. As AI capabilities continue to evolve, the central challenge is not whether to use these tools, but how to apply them thoughtfully and responsibly.

AI in Assessment

Assessment programs depend on validity, reliability, fairness, and defensibility. AI systems lack moral agency, cannot be held accountable for errors, and may produce confident but inaccurate outputs. These limitations underscore why humans must remain “in the loop,” particularly when decisions affect students, licensure candidates, or public safety.

Item writing is a mission-critical activity for educational programs and licensure boards, and it remains heavily dependent on human expertise. High-quality test items require collaboration among subject matter experts (SMEs), psychometricians, test developers, and editors. This process is time-consuming and expensive, but it is essential for producing scores that are valid, reliable, fair, and defensible.

Efforts to use generative AI for item generation date back to the 2010s, when psychometricians and data scientists began experimenting with early language models. At that time, models were smaller, required significant technical expertise, and lacked user-friendly interfaces. While some organizations explored these approaches, overall adoption was limited. The recent emergence of more accessible large language models (LLMs) has dramatically changed that landscape.

With so much new potential, organizations new to AI may find it difficult to know where to start. Rather than beginning with end-to-end item generation, organizations may benefit from a more incremental approach: identifying specific components of item development where generative AI can provide support while humans retain oversight and control of implementation. For example, an SME might write the stem and identify the correct answer, then use an AI system to generate potential distractors or rationales. In other cases, AI might be used to code an item based on an underlying diagnosis or content category. This approach acknowledges both the strengths and limitations of AI. Even with its limitations, generative AI can take on more tedious or error-prone tasks, reducing cognitive load on SMEs and allowing them to focus on higher-level judgment.

Therefore, the key question becomes not whether to use generative AI, but where its use adds value without compromising quality. Organizations should begin by identifying where AI has the greatest potential to add value, whether by improving an existing assessment process or enabling a new type of assessment that would be difficult to create through traditional methods. For example, formative assessment may be a particularly promising area for AI integration. Because formative assessments are low-stakes, they offer opportunities to experiment with new technologies while maintaining appropriate safeguards. In national assessment contexts, AI can support components of case-based formative assessments without altering the underlying concepts being measured.

One example involved AI-generated or AI-enhanced video cases. While variation in video quality might be unacceptable in a high-stakes exam, it may be considered manageable in a formative setting. This distinction highlights how the assessment purpose should guide decisions about AI adoption.

Conversational AI also points to future possibilities. If assessment programs can support back-and-forth video interactions between students and avatars, they may be able to measure concepts that are difficult to assess through traditional formats. These include not only communication skills, but also aspects of clinical reasoning and professional judgment.

At the same time, the rapid pace of technological change poses challenges. In one project, for example, early iterations produced avatars that appeared artificial and emotionally limited, but technological advances led to more realistic portrayals within a short timeframe. Tools considered state-of-the-art early in the year may be outdated by year’s end. This reality complicates long-term planning and reinforces the need for flexible approaches that prioritize frameworks and outcomes over specific technologies.

Generative AI in PT Education

In physical therapy education, generative AI can support formative learning when grounded in vetted resources. In a cardiopulmonary course, for example, educators could use a tool that allows instructors to upload approved course materials (e.g., textbook chapters, case studies, lab handouts) after confirming copyright permissions. Once these materials are uploaded, the AI system is constrained to that content, effectively creating guardrails that reduce the risk of hallucinated or inappropriate information. Students can then interact with the material in ways that support understanding and application.

This approach addresses a core concern: students often “don’t know what they don’t know.” Students may not have the expertise to recognize when AI-generated content is misleading or incorrect. By anchoring AI interactions in vetted sources, educators can provide personalized support while maintaining confidence in the accuracy of the underlying information.

Another type of tool can encourage exploration and comparison across multiple AI models. After students develop their own responses to a clinical case, they can use the tool to see how different generative AI systems would approach the same problem. Students then compare outputs, identify strengths and weaknesses, and refine their original plans using clinical judgment. This structured comparison can move students beyond memorization toward metacognition and critical reasoning. Rather than accepting AI outputs as fact, students can learn to evaluate them, a skill that is becoming essential as AI becomes more integrated into clinical and educational environments. Faculty guidance remains central to this process, particularly in lab settings where discussion and feedback can take place.

Another application involved creating customized AI “bots” to simulate clinical interactions. These bots could be programmed with specific patient characteristics, medical histories, or emotional states, allowing students to practice subjective history-taking in a low-stakes environment. Separate bots could be designed to help students strengthen empathy, communication, and professionalism, areas that are often challenging to teach and assess consistently.

While these simulations do not replace real patient interactions, they offer scalable opportunities for practice and feedback. As with other uses of AI, their effectiveness depends on thoughtful design and clear boundaries.

Generative AI as an Extension of Human Expertise

Across national assessment and classroom contexts, generative AI should be viewed as an extension of human expertise, not a replacement for it. When used thoughtfully, AI can support repetitive tasks, provide individualized feedback, and scale learning opportunities. At the same time, humans remain responsible for defining frameworks, ensuring fairness, and exercising professional judgment.

For regulators, educators, and assessment professionals, the challenge is to balance innovation with responsibility. Formative assessments offer a natural testing ground, while high-stakes contexts demand greater caution. By focusing on where AI adds value and maintaining clear guardrails, we can leverage new tools while upholding the standards that each assessment requires. In doing so, AI becomes not a threat to expertise but a tool that helps extend and elevate it.

Kimberly Swygert

Kimberly Swygert, PhD, is the Director of Test Development Innovations at NBME. Kimberly has more than twenty-five years of experience in the psychometric and health professions education fields across multiple domains, including test development, innovative item development, test construction/scoring, and test security. Her recent work has focused on the potential for NLP and ML to enhance test development and the legal/ethical factors for the implementation of AI in assessment. Kimberly has been a psychometric consultant for over a decade for organizations such as FSBPT and AICPA. She has published chapters on item and test development in recent editions of Assessment for Health Professions Education, the Handbook of Test Development, and the Guidelines for Technology-Based Assessment.

Colleen Lettvin, PT

Colleen Lettvin is the Assessment Content Manager at the Federation of State Boards of Physical Therapy. Colleen oversees the development of content for the National Physical Therapy Examination and other products, such as the Practice Exam & Assessment Tool (PEAT). Before joining FSBPT as a staff member in 2016, Colleen had been actively engaged in item and examination development as an FSBPT volunteer since 2005. Colleen has been a Board-Certified Specialist in Cardiovascular and Pulmonary Physical Therapy by the American Board of Physical Therapy Specialties (ABPTS) since 2010 and received her Master of Science in Physical Therapy from Texas Woman’s University.

Marcia Himes

Marcia K. Himes is Program Director of Physical Therapy at Missouri State University, where she blends her passion for teaching with more than two decades of clinical expertise in cardiopulmonary rehabilitation, geriatrics, and fall-risk prevention. She teaches patient management courses in cardiopulmonary and exercise physiology, preparing students for evidence-based, patient-centered care. A board-certified Cardiovascular & Pulmonary Clinical Specialist, she also mentors students in a thriving pro-bono clinic. Marcia is dedicated to advancing physical therapy through innovative education, collaborative practice, and an unwavering commitment to excellence in patient outcomes.

Jonathan Bird

Jonathan Bird has been a PTA Program Chair and faculty member for more than sixteen years. He has been on the Idaho PT Licensing Board since 2021 and currently serves as Board Chair. He has been an FSBPT PTA Exam writer since 2018 and an EDC-PTA member and chair since 2020. He is currently in clinical practice and also mentors entry-level DPT students.