Trial Outcomes & Findings for Preventing Medication Dispensing Errors in Pharmacy Practice With Interpretable Machine Intelligence (NCT06245044)

Last Updated: 2025-11-26

Results Overview

Difference in task time measured by the number of seconds from starting the task to accepting or rejecting a medication image

Recruitment status

COMPLETED

Study phase

Target enrollment

68 participants

Primary outcome timeframe

Throughout the verification task

Results posted on

2025-11-26

Participant Flow

Participant milestones

Measure	Pharmacists Licensed pharmacists were recruited to participate in a mock verification task. In this crossover design trial, each participant received all three study interventions. The order of the arms was randomized.
No MI help STARTED	68
No MI help COMPLETED	50
No MI help NOT COMPLETED	18
Scenario 1 STARTED	50
Scenario 1 COMPLETED	50
Scenario 1 NOT COMPLETED	0
Scenario 2 STARTED	50
Scenario 2 COMPLETED	50
Scenario 2 NOT COMPLETED	0

Reasons for withdrawal

Measure	Pharmacists Licensed pharmacists were recruited to participate in a mock verification task. In this crossover design trial, each participant received all three study interventions. The order of the arms was randomized.
No MI help Technical issues	18

Baseline Characteristics

Two participants did not indicate their sex on the demographics survey.

Baseline characteristics by cohort

Measure	Pharmacists n=50 Participants Licensed pharmacists in the United States with medication dispensing experience who are 18 or older.
Age, Categorical <=18 years	0 Participants n=50 Participants
Age, Categorical Between 18 and 65 years	50 Participants n=50 Participants
Age, Categorical >=65 years	0 Participants n=50 Participants
Age, Continuous	35.52 years STANDARD_DEVIATION 6.949 • n=50 Participants
Sex: Female, Male Female	34 Participants n=48 Participants • Two participants did not indicate their sex on the demographics survey.
Sex: Female, Male Male	14 Participants n=48 Participants • Two participants did not indicate their sex on the demographics survey.
Ethnicity (NIH/OMB) Hispanic or Latino	2 Participants n=50 Participants
Ethnicity (NIH/OMB) Not Hispanic or Latino	46 Participants n=50 Participants
Ethnicity (NIH/OMB) Unknown or Not Reported	2 Participants n=50 Participants
Race (NIH/OMB) American Indian or Alaska Native	0 Participants n=50 Participants
Race (NIH/OMB) Asian	6 Participants n=50 Participants
Race (NIH/OMB) Native Hawaiian or Other Pacific Islander	0 Participants n=50 Participants
Race (NIH/OMB) Black or African American	1 Participants n=50 Participants
Race (NIH/OMB) White	38 Participants n=50 Participants
Race (NIH/OMB) More than one race	2 Participants n=50 Participants
Race (NIH/OMB) Unknown or Not Reported	3 Participants n=50 Participants
Region of Enrollment United States	50 participants n=50 Participants

PRIMARY outcome

Timeframe: Throughout the verification task

Difference in task time measured by the number of seconds from starting the task to accepting or rejecting a medication image

Outcome measures

Measure	Scenario #1 n=50 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=50 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help n=50 Participants Participants will complete the medication verification task without any MI help
Reaction Time	4727 millisecond (ms) Standard Deviation 1040	4510 millisecond (ms) Standard Deviation 1339	3668 millisecond (ms) Standard Deviation 924

PRIMARY outcome

Timeframe: Throughout the verification task

Difference in detection rate measured by the number of medication verification errors across all participants in the Arm/Group.

Outcome measures

Measure	Scenario #1 n=50 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=50 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help n=50 Participants Participants will complete the medication verification task without any MI help
Decision Accuracy	238 Number of errors	230 Number of errors	291 Number of errors
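
The error counts above are totals per arm. As a rough illustration only, they can be turned into per-trial error rates. The sketch below assumes that all 50 participants completed 100 trials in each arm (the trial count is taken from the outcome descriptions elsewhere on this page) and that each trial contributes at most one error; the record does not state the denominator, so the resulting rates are an approximation and not a reported result.

```python
# Reported totals from the Decision Accuracy row.
ERRORS_BY_ARM = {"Scenario #1": 238, "Scenario #2": 230, "No MI Help": 291}

# Assumptions (not stated in the record for this outcome):
PARTICIPANTS = 50        # n reported for each arm
TRIALS_PER_ARM = 100     # trials per participant per arm


def error_rate(n_errors, participants=PARTICIPANTS, trials=TRIALS_PER_ARM):
    """Return errors divided by the assumed total number of trials."""
    total_trials = participants * trials
    if total_trials <= 0:
        raise ValueError("total number of trials must be positive")
    return n_errors / total_trials


rates = {arm: error_rate(n) for arm, n in ERRORS_BY_ARM.items()}

for arm, rate in rates.items():
    print(f"{arm}: {rate:.2%} of assumed trials")
```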

PRIMARY outcome

Timeframe: After every trial in Scenarios 1 and 2

Population: This Outcome Measure was pre-specified to be only assessed for Scenarios 1 and 2. No data were collected for this Outcome Measure for the "No MI Help" scenario.

Participants will complete 100 mock medication verification trials in each of the study arms (i.e., Scenario 1, Scenario 2, and No Help). After each trial in Scenario 1 and Scenario 2, participants will use a visual analog scale (VAS) to respond to the question: "How much do you trust the AI advice?" The endpoints of the 100-point VAS are 'Not at all' and 'Completely trust'. Participants indicate their level of trust in the MI advice after every trial on a scale from 1-100, with higher scores indicating greater levels of trust. The trust change, as measured by the visual analog scale, will be calculated using the following formula: Trust change (i) = Trust(i) - Trust(i - 1), where i = 2, 3, ..., 100. To compute a single, summarized value for the Trust Change variable within a specific scenario, the individual Trust Change scores from the trials are averaged. This average summarizes how trust shifted over the course of the scenario.
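
The trust change formula can be sketched in a few lines of Python. This is an illustrative reading of the description, not the investigators' analysis code; the function names and the sample ratings are hypothetical. One consequence worth noting: because consecutive differences telescope, the average of the trust changes equals (last rating - first rating) divided by the number of differences.

```python
from statistics import mean


def trust_changes(ratings):
    """Return Trust(i) - Trust(i - 1) for i = 2..n."""
    return [curr - prev for prev, curr in zip(ratings, ratings[1:])]


def mean_trust_change(ratings):
    """Average the per-trial trust changes into one summary value."""
    changes = trust_changes(ratings)
    if not changes:
        raise ValueError("need at least two ratings to compute a change")
    return mean(changes)


# Hypothetical VAS ratings for five consecutive trials (not study data).
ratings = [50, 55, 40, 60, 58]
changes = trust_changes(ratings)
summary = mean_trust_change(ratings)

# The mean of successive differences depends only on the endpoints.
telescoped = (ratings[-1] - ratings[0]) / (len(ratings) - 1)
print(changes, summary, telescoped)
```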

Outcome measures

Measure	Scenario #1 n=50 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=50 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help Participants will complete the medication verification task without any MI help
Trust Change	-0.1715 units on a scale Standard Deviation 21.1415	-0.0552 units on a scale Standard Deviation 19.3931	—

PRIMARY outcome

Timeframe: Post-intervention in Scenarios 1 and 2.

Population: This Outcome Measure was pre-specified to be only assessed for Scenarios 1 and 2. No data were collected for this Outcome Measure for the "No MI Help" scenario.

Trust will be assessed using Muir & Moray's (1996) Trust in Automation scale. Scores range from 0 to 100 with higher scores indicating greater levels of trust.

Outcome measures

Measure	Scenario #1 n=50 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=50 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help Participants will complete the medication verification task without any MI help
Trust	53.65 scores on a scale Standard Deviation 33.12	60.37 scores on a scale Standard Deviation 31.95	—

SECONDARY outcome

Timeframe: Throughout the verification task

Population: Complete eye tracking data was not available for analysis for one participant. The MI plot was pre-specified to be only assessed for Scenarios 1 and 2. No data were collected for the Outcome Measure MI plot for the "No MI Help" scenario.

Participants' eye movements were tracked using a browser-based online eye tracking system. The outcome measure is the difference in cognitive effort as measured by fixation count in the defined areas of interest: fill image, reference image, or MI plot. Higher fixation rates indicate repeated interest in a certain area.

Outcome measures

Measure	Scenario #1 n=49 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=49 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help n=49 Participants Participants will complete the medication verification task without any MI help
Cognitive Effort Area of Interest: Fill Image	3 Number of fixations Interval 2.0 to 6.0	3 Number of fixations Interval 2.0 to 6.0	3 Number of fixations Interval 2.0 to 5.0
Cognitive Effort Area of Interest: Reference Image	2 Number of fixations Interval 1.0 to 3.0	2 Number of fixations Interval 1.0 to 3.0	2 Number of fixations Interval 1.0 to 3.0
Cognitive Effort Area of Interest: MI plot	1 Number of fixations Interval 1.0 to 2.0	2 Number of fixations Interval 1.0 to 2.0	—

SECONDARY outcome

Timeframe: Throughout the verification task

Population: Complete eye tracking data was not available for one participant. The MI plot was pre-specified to be only assessed for Scenarios 1 and 2. No data were collected for the Outcome Measure MI plot for the "No MI Help" scenario.

Participants' eye movements were tracked using a browser-based online eye tracking system. The outcome measure is the difference in cognitive effort as measured by the duration of fixations in the defined areas of interest: fill image, reference image, or MI plot. Longer fixation duration indicates a higher cognitive load.

Outcome measures

Measure	Scenario #1 n=49 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=49 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help n=49 Participants Participants will complete the medication verification task without any MI help
Cognitive Effort Area of Interest: Fill Image	687.5 millisecond (ms) Interval 407.75 to 1253.5	706.0 millisecond (ms) Interval 416.0 to 1250.0	619.5 millisecond (ms) Interval 367.0 to 1088.75
Cognitive Effort Area of Interest: Reference Image	399.0 millisecond (ms) Interval 235.0 to 591.0	419.0 millisecond (ms) Interval 239.0 to 623.5	365.0 millisecond (ms) Interval 229.0 to 531.009
Cognitive Effort Area of Interest: MI Plot	268.0 millisecond (ms) Interval 210.75 to 443.0	299.0 millisecond (ms) Interval 212.0 to 499.0	—

SECONDARY outcome

Timeframe: After completing 100 mock verification trials in each arm

Participants will complete 100 mock medication verification trials in each of the 3 arms. The workload of each arm will be measured by the NASA Task Load Index (TLX). The 5 TLX dimensions assessed are: mental demand, effort, temporal demand, performance, and frustration. For each dimension, participants will indicate their response to a single question. For 4 of the dimensions, the endpoints of the Likert scale are 'very low' and 'very high'. The performance dimension is reverse-scored, and the endpoints are 'perfect' and 'failure'. Participants then complete 10 pairwise comparisons of the dimensions by indicating which dimension they consider to be a more important factor (e.g., effort vs frustration). Each category score multiplied by its respective pairwise comparison count is summed and divided by 10 to get an overall weighted workload score. The result is an overall workload score between 1 and 20, with higher scores indicating higher workload.
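
The weighted workload calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's scoring script: each dimension is assumed to be rated on a 1-20 scale already oriented so that higher means more workload (so the performance item needs no further reversal here), and the weight of a dimension is the number of pairwise comparisons it wins. The function name and all example values are hypothetical.

```python
from itertools import combinations

DIMENSIONS = ("mental demand", "effort", "temporal demand",
              "performance", "frustration")


def weighted_tlx(ratings, choices):
    """Compute the weighted score: sum(rating * wins) / number of pairs.

    ratings: dict mapping each dimension to its rating.
    choices: dict mapping each pair (as a frozenset) to the chosen dimension.
    """
    pairs = {frozenset(p) for p in combinations(DIMENSIONS, 2)}  # 10 pairs
    if set(choices) != pairs:
        raise ValueError("expected exactly one choice per pair of dimensions")
    weights = {d: 0 for d in DIMENSIONS}
    for pair, winner in choices.items():
        if winner not in pair:
            raise ValueError("the chosen dimension must belong to its pair")
        weights[winner] += 1
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / len(pairs)


# Hypothetical ratings (not study data).
ratings = {"mental demand": 10, "effort": 8, "temporal demand": 6,
           "performance": 4, "frustration": 2}
# Hypothetical comparisons: the dimension listed first in DIMENSIONS always wins,
# giving weights 4, 3, 2, 1, 0.
choices = {frozenset((a, b)): a for a, b in combinations(DIMENSIONS, 2)}

score = weighted_tlx(ratings, choices)
print(score)  # (10*4 + 8*3 + 6*2 + 4*1 + 2*0) / 10
```

Because the weights always sum to 10, the weighted score stays within the range of the individual ratings, which is consistent with the stated 1 to 20 range if each dimension is rated from 1 to 20.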

Outcome measures

Measure	Scenario #1 n=50 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=50 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help n=50 Participants Participants will complete the medication verification task without any MI help
Workload	7.5 score on a scale Standard Deviation 3.8	6.9 score on a scale Standard Deviation 3.4	7.4 score on a scale Standard Deviation 3.7

SECONDARY outcome

Timeframe: After completing 100 mock verification trials in each arm

Participants will complete 100 mock medication verification trials in each of the 3 arms (No MI Help, Scenario 1, and Scenario 2). After completing 100 trials, participants will assess the mock verification interface using the System Usability Scale (SUS). The SUS comprises 10 statements; participants indicate their agreement with each using a 5-point Likert scale ranging from strongly agree to strongly disagree. Odd-numbered questions are positively worded and even-numbered questions are reverse-scored. Scores are summed and multiplied by 2.5 to get a final SUS score. SUS scores range from 0 to 100 with higher scores indicating greater usability. An average SUS score is considered to be 68. Anything below 50 is "Not Acceptable". Scores between 51-70 are considered "Marginal", those above 71 are considered "Acceptable", and those at 80 or above are indicative of high usability.
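
The SUS scoring rule can be sketched as below, following the standard SUS convention. The sketch assumes responses are coded 1 (strongly disagree) to 5 (strongly agree), which the description above does not spell out: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The function name and example responses are hypothetical.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for ten responses coded 1..5.

    Assumes 1 = strongly disagree and 5 = strongly agree.
    """
    if len(responses) != 10:
        raise ValueError("the SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items: higher agreement is better
        else:
            total += 5 - response   # even items: reverse-scored
    return total * 2.5


# Hypothetical response patterns (not study data).
neutral = sus_score([3] * 10)                        # every item contributes 2
best = sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1])     # every item contributes 4
worst = sus_score([1, 5, 1, 5, 1, 5, 1, 5, 1, 5])    # every item contributes 0
print(neutral, best, worst)
```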

Outcome measures

Measure	Scenario #1 n=50 Participants MI help will be presented in the form of a pop-up message when the participant's decision differs from the MI's determination.	Scenario #2 n=50 Participants MI help will be displayed concurrently with the filled and reference images.	No MI Help n=50 Participants Participants will complete the medication verification task without any MI help
Usability	70.4 score on a scale Standard Deviation 16.4	73.0 score on a scale Standard Deviation 15.6	72.1 score on a scale Standard Deviation 16.5

Adverse Events

No MI Help

Serious events: 0 serious events

Other events: 0 other events

Deaths: 0 deaths

Scenario #1

Serious events: 0 serious events

Other events: 0 other events

Deaths: 0 deaths

Scenario #2

Serious events: 0 serious events

Other events: 0 other events

Deaths: 0 deaths

Serious adverse events

Adverse event data not reported

Other adverse events

Adverse event data not reported

Additional Information

Dr. Corey Lester

University of Michigan

Phone: 734-647-8849


Results disclosure agreements

Principal investigator is a sponsor employee
Publication restrictions are in place