Current Issue Highlights

Latest Articles

  • Coaching inexperienced clinicians before a high stakes medical procedure: randomized clinical trial

    BMJ. December 28, 2024: 387:e080924

    ABSTRACT

    OBJECTIVE

    To assess whether training provided to an inexperienced clinician just before performing a high stakes procedure can improve procedural care quality, measuring the first attempt success rate of trainees performing infant orotracheal intubation.

    DESIGN

    Randomized clinical trial.

    SETTING

    Single center, quaternary children's hospital in Boston, MA, USA.

    PARTICIPANTS

    A non-crossover, prospective, parallel group, non-blinded trial design was used. Volunteer trainees comprised pediatric anesthesia fellows, residents, and student registered nurse anesthetists from 10 regional training programs during their pediatric anesthesiology rotation. Trainees were block randomized by training roles. Inclusion criteria were trainees intubating infants aged ≤12 months with an American Society of Anesthesiologists physical status classification of I-III. Exclusion criteria were trainees intubating infants with cyanotic congenital heart disease, known or suspected difficult or critical airways, pre-existing abnormal baseline oxygen saturation <96% on room air, endotracheal or tracheostomy tubes in situ, emergency cases, or covid-19 infection.

    INTERVENTIONS

    Trainee treatment group received preoperative just-in-time expert intubation coaching on a manikin within one hour of infant intubation; control group carried out standard practice (receiving unstructured intraoperative instruction by attending pediatric anesthesiologists).

    MAIN OUTCOME MEASURES

    Primary outcome was the first attempt success rate of intraoperative infant intubation. Modified intention-to-treat analysis used generalized estimating equations to account for multiple intubations per trainee participant. Secondary outcomes were complication rates, cognitive load of intubation, and competency metrics.

    RESULTS

    250 trainees were assessed for eligibility; 78 were excluded, 172 were randomized, and 153 were subsequently analyzed. Between 1 August 2020 and 30 April 2022, 153 trainees (83 control, 70 treatment) did 515 intubations (283 control, 232 treatment). In modified intention-to-treat analysis, first attempt success was 91.4% (212/232) in the trainee treatment group and 81.6% (231/283) in the control group (odds ratio 2.42 (95% confidence interval 1.45 to 4.04), P=0.001). Secondary outcomes favored the intervention, showing significance for decreased cognitive load and improved competency. Complications were lower for the intervention than for the control group but the difference was not significant.

    CONCLUSIONS

    Just-in-time training among inexperienced clinicians led to increased first attempt success of infant intubation. Integration of a just-in-time approach into airway management could improve patient safety, and these findings could help to improve high stakes procedures more broadly. Randomized evaluation in other settings is warranted.

    TRIAL REGISTRATION

    ClinicalTrials.gov NCT04472195.
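As a rough check on the primary outcome above, the counts reported in RESULTS (212/232 in the treatment group, 231/283 in the control group) can be turned into a crude odds ratio. The sketch below is illustrative only: it uses a simple Wald interval on the pooled 2 x 2 table and ignores the clustering of intubations within trainees, so it is expected to land near, but not exactly on, the reported estimate from generalized estimating equations. The function name `crude_odds_ratio` is hypothetical.

```python
import math

def crude_odds_ratio(success_a, total_a, success_b, total_b, z=1.96):
    """Unadjusted odds ratio (group A vs group B) with a Wald confidence interval.

    Assumption: observations are treated as independent, which the trial's
    GEE analysis does not assume, so results differ slightly from the paper.
    """
    fail_a = total_a - success_a
    fail_b = total_b - success_b
    odds_ratio = (success_a * fail_b) / (fail_a * success_b)
    # Standard error of log(OR) from the four cell counts
    se_log_or = math.sqrt(1 / success_a + 1 / fail_a + 1 / success_b + 1 / fail_b)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Counts taken from the RESULTS paragraph: treatment first, then control
or_, lower, upper = crude_odds_ratio(212, 232, 231, 283)
print(f"crude OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The crude figure is slightly lower than the adjusted odds ratio of 2.42, and its interval is slightly wider, which is consistent with the reported analysis accounting for repeated intubations per trainee.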

  • Dexterity assessment of hospital workers: prospective comparative study

    BMJ. December 28, 2024: 387:e081814

    ABSTRACT

    OBJECTIVES

    To compare the manual dexterity and composure under pressure of people in different hospital staff roles using a buzz wire game.

    DESIGN

    Prospective, observational, comparative study (Tremor study).

    SETTING

    Leeds Teaching Hospitals NHS Trust, Leeds, UK, during a three week period in 2024.

    PARTICIPANTS

    254 hospital staff members comprising 60 physicians, 64 surgeons, 69 nurses, and 61 non-clinical staff.

    MAIN OUTCOME MEASURES

    Successful completion of the buzz wire game within five minutes and occurrence of swearing and audible noises of frustration.

    RESULTS

    Of the 254 hospital staff who participated, surgeons had significantly higher success rates in completing the buzz wire game within five minutes (84%, n=54) compared with physicians (57%, n=34), nurses (54%, n=37), and non-clinical staff (51%, n=31) (P<0.001). Time-to-event analysis showed that surgeons were quicker to successfully complete the game, independent of age and gender. Surgeons exhibited the highest rate of swearing during the game (50%, n=32), followed by nurses (30%, n=21), physicians (25%, n=15), and non-clinical staff (23%, n=14) (P=0.004). Non-clinical staff showed the highest use of frustration noises (75%), followed by nurses (68%), surgeons (58%), and physicians (52%) (P=0.03).

    CONCLUSIONS

    Surgeons showed greater dexterity, but higher levels of swearing compared with other hospital staff roles, while nurses and non-clinical staff showed the highest rates of audible noises of frustration. The study highlights the diverse skill sets across hospital staff roles. Implementation of a surgical swear jar initiative should be considered for future fundraising events.
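The comparison of completion rates across the four staff groups can be reproduced approximately from the counts in RESULTS. The abstract does not state which test produced P<0.001, so the sketch below assumes a Pearson chi-square test on the 2 x 4 table of successes and failures; the helper names `chi_square_2xk` and `chi_square_sf_df3` are hypothetical, and the closed-form tail probability applies only to 3 degrees of freedom.

```python
import math

def chi_square_2xk(successes, totals):
    """Pearson chi-square statistic for a 2 x k table of successes vs failures."""
    overall_rate = sum(successes) / sum(totals)
    stat = 0.0
    for s, n in zip(successes, totals):
        expected_success = n * overall_rate
        expected_failure = n * (1 - overall_rate)
        stat += (s - expected_success) ** 2 / expected_success
        stat += ((n - s) - expected_failure) ** 2 / expected_failure
    return stat

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Order: surgeons, physicians, nurses, non-clinical staff (counts from the abstract)
completed = [54, 34, 37, 31]
group_sizes = [64, 60, 69, 61]

stat = chi_square_2xk(completed, group_sizes)
p_value = chi_square_sf_df3(stat)  # k - 1 = 3 degrees of freedom for four groups
print(f"chi-square = {stat:.2f}, P = {p_value:.5f}")
```

Under this assumed test the P value falls below 0.001, in line with the reported result, although the study's own analysis may have used a different method.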

  • Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis

    BMJ. December 28, 2024: 387:e081948

    ABSTRACT

    OBJECTIVE

    To evaluate the cognitive abilities of the leading large language models and identify their susceptibility to cognitive impairment, using the Montreal Cognitive Assessment (MoCA) and additional tests.

    DESIGN

    Cross sectional analysis.

    SETTING

    Online interaction with large language models via text based prompts.

    PARTICIPANTS

    Publicly available large language models, or “chatbots”: ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 “Sonnet” (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet).

    ASSESSMENTS

    The MoCA test (version 8.1) was administered to the leading large language models with instructions identical to those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist. Additional assessments included the Navon figure, cookie theft picture, Poppelreuter figure, and Stroop test.

    MAIN OUTCOME MEASURES

    MoCA scores, performance in visuospatial/executive tasks, and Stroop test results.

    RESULTS

    ChatGPT 4o achieved the highest score on the MoCA test (26/30), followed by ChatGPT 4 and Claude (25/30), with Gemini 1.0 scoring lowest (16/30). All large language models showed poor performance in visuospatial/executive tasks. Gemini models failed at the delayed recall task. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test.

    CONCLUSIONS

    With the exception of ChatGPT 4o, almost all large language models subjected to the MoCA test showed signs of mild cognitive impairment. Moreover, as in humans, age is a key determinant of cognitive decline: “older” chatbots, like older patients, tend to perform worse on the MoCA test. These findings challenge the assumption that artificial intelligence will soon replace human doctors, as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients' confidence.

Most Popular Articles