We conducted a prospective, randomized study at 3 medical institutions. Study participants were recruited from the anesthesiology residency programs at the Mount Sinai Health System (New York, New York), The Ohio State University (Columbus, Ohio), and the University of North Carolina (Chapel Hill, North Carolina). Approval from the program for the protection of human subjects at each institution was obtained, and written consent was required of all volunteer subjects.
All participants were anesthesiology residents in good standing at one of the 3 departments.
Residents with any administrative disciplinary actions, hiatuses from clinical practice, or other perceived performance-affecting confounders (e.g., failing in-training examination scores, below-average faculty performance appraisals, loss of training credit) were excluded. We used a previously validated simulation scenario for our study, with performance standards dictating the essential actions that must be performed (or not performed) to avoid major morbidity or mortality, and a behaviorally anchored rating scale (BARS) to measure technical and non-technical performance. Our simulated encounter was modified from one of the Agency for Healthcare Research and Quality (AHRQ)-funded study scenarios designed to gauge performance during a simulated intraoperative hemorrhage, previously described by Weinger et al.19
Our experimental and control study scenarios were identical in all respects except the dialogue and demeanor of the surgeon (Appendix 1).
In brief, the scenarios differed between groups only in the surgeon’s dialogue with the participant and with an actor portraying the circulating nurse. The same actors served as the surgeon and nurse within each departmentally administered scenario to minimize variability between cases.
The intervention group’s surgeon was portrayed as impatient, but not overly aggressive or intimidating (i.e., actors were instructed not to scream, become physically intimidating, or use abusive or inappropriate language). The control group’s surgeon was courteous and straightforward. The dialogue was reviewed by an independent panel of 5 board-certified anesthesiologists at the primary site who were not involved in the study design or execution, to determine whether the scripted dialogue and behaviors were unlikely, likely, or very likely to be encountered by a typical anesthesiology resident during training. Unanimous agreement was achieved after a single round of review, with each panel member rating the dialogue as very likely.
Each scenario was standardized as described by Weinger et al, utilizing a guide that delineated the details of scenario delivery, the scripts, and the “rules” for running each scenario (e.g., the contents of the simulated clinical environment, the evolution of the patient’s medical presentation and responses to interventions, standardized answers to anticipated participant questions, and the criteria defining successful completion of each expected action). Each script outlined the timing and content of key phrases or comments to be made by the actors portraying the surgeon or nurse.
For each study case, high-quality digital video and audio recordings were collected. The videos were anonymized (faces and voices were altered) and then assessed by three board-certified anesthesiologist raters who graded performance independently and were blinded to the source institution. Raters received batches of videos in a predetermined, counterbalanced order. The same rater was not assigned multiple encounters conducted at a single site on the same day, and raters were instructed not to score a performance if they recognized a participant. Four exemplar videos were created showing good and poor performance on either the normal or “rude” scenario. Raters participated in a one-day, in-person training session in which they were instructed on the use of the rating software and practiced viewing and rating the exemplar videos. Rater calibration was assessed during training until the raters’ checklist ratings matched the consensus ratings exactly and their non-technical scores were within 1 point of the consensus rating. Non-technical skills were assessed using the BARS score employed by Weinger et al for this scenario. Performance in each scenario was graded by two attending anesthesiologists; if their scores diverged by more than one point, or disagreed on a yes/no item, a third anesthesiologist graded the video. Dichotomous items were then scored by majority vote, while ANTS-based BARS scores were averaged between the three raters.
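The adjudication rules above can be sketched in pseudocode-style Python. This is a hypothetical illustration of the logic described in the text (majority vote for yes/no checklist items, averaging for scale scores, a third rater triggered by divergence), not the study's actual analysis code; the data shapes and names are assumptions.

```python
# Hypothetical sketch of the rating-adjudication rules: a third rater is
# invoked when the first two raters disagree on any yes/no item or differ
# by more than 1 point on any scale item. Dichotomous items are then
# settled by majority vote; scale scores are averaged across raters.

def needs_third_rater(r1: dict, r2: dict) -> bool:
    """True if any yes/no item disagrees or any scale item differs by >1 point."""
    for item in r1["checklist"]:
        if r1["checklist"][item] != r2["checklist"][item]:
            return True
    for item in r1["bars"]:
        if abs(r1["bars"][item] - r2["bars"][item]) > 1:
            return True
    return False

def consensus(raters: list[dict]) -> dict:
    """Majority vote on checklist items; mean of scale (BARS) scores."""
    checklist = {}
    for item in raters[0]["checklist"]:
        votes = [r["checklist"][item] for r in raters]
        checklist[item] = votes.count(True) > len(votes) / 2
    bars = {}
    for item in raters[0]["bars"]:
        scores = [r["bars"][item] for r in raters]
        bars[item] = sum(scores) / len(scores)
    return {"checklist": checklist, "bars": bars}
```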
Technical performance was measured with: 1) the percentage of the scenario’s checklist actions completed, and 2) holistic ordinal scores of overall technical performance. Behavioral performance was measured with: 1) numerical ratings made using Behaviorally Anchored Rating Scales (BARS) of four categories of skills: Vigilance, Communication, Decision-making, and Teamwork; and 2) holistic ordinal scores of overall behavioral performance. Further details on how this scoring was derived can be found in Weinger et al.
Personality surveys aimed at eliciting perceptions of incivility and/or criticism were collected from participants at least 4 weeks before the encounter to minimize confounding with the simulated intervention. The surveys utilized were the Brief Fear of Negative Evaluation Scale and the Sensitivity to Criticism Scale. The Brief Fear of Negative Evaluation Scale is a validated, widely cited questionnaire designed to measure fear of negative evaluation, often used in the psychological literature as an assessment of social anxiety (Collins 2005, Leary 1983, Weeks 2005). The scale is composed of 12 statements describing fearful or worrisome situations, and respondents indicate the extent to which each item describes themselves on a 5-point Likert scale. The numerical responses are summed to yield a cumulative score.
The Sensitivity to Criticism Scale is a survey designed to measure perceptual and emotional responses to criticism (Atlas 1994). Subjects are asked to imagine themselves in situations spanning a range of domains in which criticism might take place, totaling 30 items. Responses for all items are made on a 7-point Likert scale and summed, resulting in an aggregate index of sensitivity to criticism. Reviewers did not have access to these data during the grading process. All recorded data points were stored on Research Electronic Data Capture (REDCap), a secure web server, for analysis. Task performance items such as blood ordering and fluid administration were assessed via yes/no responses.
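The scoring of both instruments reduces to summing Likert responses, which can be sketched as follows. This is a minimal illustration assuming plain summation; reverse-scoring conventions of the published instruments are omitted for brevity, and the function names are hypothetical.

```python
# Minimal sketch of the survey scoring described above, assuming plain
# summation of Likert responses (reverse-scored items of the published
# instruments are not modeled here).

def bfne_score(responses: list[int]) -> int:
    """Brief Fear of Negative Evaluation: 12 items, each rated 1-5."""
    assert len(responses) == 12 and all(1 <= r <= 5 for r in responses)
    return sum(responses)

def stcs_score(responses: list[int]) -> int:
    """Sensitivity to Criticism Scale: 30 items, each rated 1-7."""
    assert len(responses) == 30 and all(1 <= r <= 7 for r in responses)
    return sum(responses)
```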
Statistical analysis was performed using SPSS (IBM, Armonk, NY) in consultation with a statistician. Results are reported as median [IQR] due to their non-normality. Statistical tests performed are reported with their corresponding results and include Mann-Whitney U tests for non-normally distributed continuous variables as well as chi-square tests for categorical variables. In addition to grouping by intervention, separate statistical analyses were performed between sites to assess for similarity (see Supplemental Table 1). Likewise, Cohen’s kappa statistics were calculated to assess interrater agreement between reviewers. Univariable binary logistic regression was performed to assess the impact of incivility on whether or not the participant met the standard for adequate performance. Multivariable binary logistic regression was then performed to address the same question while controlling for confounders, including gender, age, site, postgraduate year, gender of the surgeon, prior simulation experience, and personality test scores.