Teaching methods used by general surgeons in simulation or formative assessment leading to improved surgical skill acquisition in UK surgical trainees: a scoping review
Abstract
Background:
UK general surgery (GS) training faces persistent challenges despite curriculum reforms, including varied teaching quality, limited simulation teaching, reduced training opportunities and curriculum flaws. Recent national surveys highlight greater dissatisfaction among GS trainees compared with other specialities. The aim of this review is to identify teaching methods used in simulation or formative assessment that lead to improved surgical skill acquisition in GS.
Methods:
A scoping review, following the PRISMA-ScR guidelines, identified studies from the last decade using PubMed/MEDLINE, EMBASE, ERIC and Web of Science. Data were extracted into tables and reviewed using basic numerical analysis. The charted data were analysed using narrative synthesis and categorized using the Kirkpatrick model, with alignment to Miller's Pyramid, to assess the effectiveness of each intervention and the level of competency it demonstrated.
Results:
Thirty-one studies highlighted a range of simulation and assessment strategies. Twenty-six percent were categorized at the lowest level of competency, Kirkpatrick Level 1 (Miller's "Knows"), and showed enhanced trainee confidence. Three percent were graded as Level 4 (Miller's "Does") and demonstrated competency through clinical outcomes. Thirty-five percent used a mix of methods to assess learning on multiple levels simultaneously, allowing in-depth multimodal assessment of learning.
Conclusion:
This paper suggests that an imbalance exists between the quantity of research and the quality of research that formally assesses surgical skill acquisition at higher levels of competency. This points to a demand for higher-quality, reliable research to bridge the current gap and to develop a more competency-based curriculum embedded with validated tools that align with the higher levels of competency and support surgical autonomy and patient safety.
Keywords:
General surgery
simulation
formative assessment
surgical education
scoping review
UK training
Introduction
Over the past two decades, it has become increasingly recognized that the historically rooted "see one, do one, teach one" model is no longer sufficient for preparing modern surgeons(1). In the United Kingdom (UK), the General Surgery (GS) curriculum underwent significant reforms, including a more holistic approach and more flexibility within training pathways. Despite these efforts, challenges persist, including limited low-fidelity simulation practice, an absence of research in surgical education and inconsistent guidance on training(2). The burden on trainees is intensified by financial pressures, staff shortages and a lack of dedicated teaching time, all of which influence motivation and career decisions(3). These systemic issues call for a change from the current curriculum to a more structured, uniform format with standardized, researched, high-quality teaching methods across the UK. Without such modifications, one risks not only trainee burnout and disparities in training but also nationwide inconsistencies in surgical competence and patient care.
Intercollegiate Surgical Curriculum Programme (ISCP)
The national UK surgical training pathway begins with the two-year Foundation Programme (FY1–FY2), followed by Core Surgical Training (CST). The latter also spans two years and gives trainees broad exposure to core technical and non-technical skills. The subsequent entry into GS specialty training (ST3–8) is highly competitive and signals the movement towards independent clinical practice. The training pathway is structured by the ISCP and overseen by the Joint Committee on Surgical Training (JCST).
Competence progression must be demonstrated in key index procedures, such as appendicectomies, emergency laparotomies and hernia repairs, through workplace-based assessments (WBAs) and Procedure-Based Assessments (PBAs). Their purpose is to evaluate operative planning, performance and post-operative reflection. To attain eligibility for unsupervised practice, a trainee must demonstrate these skills to a level 4 PBA(4) in all index procedures and record operations in a logbook, culminating in the award of the Certificate of Completion of Training (CCT).
WBAs currently function as both formative and summative assessments and are suggested to align with Miller's Pyramid [Figure 1], a model of clinical competence that progresses from theoretical assessment to real-life clinical performance(5, 6). However, research suggests that concerns remain over their effectiveness(7), prompting a need for further study.
Fig. 1
Miller’s Pyramid of Clinical Competence
UK GS training Issues
Further training issues include external pressures, most notably the COVID-19 pandemic, which exacerbated the strain on an already stretched system and resulted in significant loss of operative exposure and diminished opportunities for training progression(9). This led to disruptions in 'Annual Review of Competence Progression' outcomes and thus extended training periods(10). A UK study found that 81% of surgical trainees reported a decline in mental well-being during the pandemic(11). This coincided with the broader dissatisfaction demonstrated in the 2023/24 JCST survey, which showed a disparity between GS and other specialities. Only 52% of GS trainees felt that theatre-based quality indicators were met, a further 5% decline from the previous year and the lowest score among all surgical specialties(12). In addition, only 45% felt their clinical goals were achieved, with only vascular surgery scoring worse(12). Despite the formal requirement of two hours of weekly protected teaching time, only 70% of trainees reported this as being met. Similarly, only 69% of GS trainees reported opportunities for mandatory simulation training, in contrast to 92% of Ear, Nose and Throat and 86% of paediatric trainees.
This trainee dissatisfaction and inconsistency demonstrate variability in teaching quality across specialities(12) and thus risk discrepancies in patient outcomes between specialities. These trends demonstrate an urgent need to re-evaluate the GS educational programme.
Surgeons as Educators
Mastery of surgical skills does not develop from theory alone but through repetitive, hands-on practice(13, 14), hence the pivotal role of a senior surgeon in shaping a trainee's progression through direct supervision. In spite of this expected role, surgeons are not required to obtain a qualification in pedagogy(15). NHS England's 2024 review demonstrated that neither the CST nor the ST curriculum mandates such a formal qualification(15, 16), highlighting a systemic gap in the curriculum(17). Combined with the aforementioned curriculum issues, this lack of standardized teaching risks undermining both patient care(18) and the sustainability of surgical training itself(19, 20).
Surgical Education Learning Theories
Understanding the theoretical frameworks that underpin effective teaching is crucial to improving surgical educational outcomes. Kolb's Experiential Learning Theory(21) is particularly suited to surgery, outlining a four-stage cycle: concrete experience, reflective observation, abstract conceptualisation, and active experimentation; demonstrated by a trainee performing a procedure, reflecting on it, identifying improvements, and applying these to future scenarios. Reflection is a central process widely promoted in the medical field and supports the transfer of cognitive to clinical knowledge and thus the progression of learning(22). Dewey's (1933) concept of reflective thinking depicts it as a response to cognitive doubt that prompts problem-solving and has been foundational in defining critical thinking(23). This suggests that critical reflection in surgical training could drive continual self-improvement and adaptation(24, 25). Reflection is seemingly incorporated into the progression of Miller's Pyramid of competency [Figure 1] from cognitive knowledge to clinical performance(5). Further research into curricula and teaching methods should be based on such foundational learning theories and pedagogy.
Gap in Literature
A review of the current literature reveals a significant gap concerning the GS curriculum and effective teaching methods. While some research explores surgical assessment strategies(26), there is a lack of research into teaching methodologies focused on GS trainees within the UK. This is significant given the notable discrepancies in reported satisfaction of GS trainees compared with other specialities(12). In addition, UK training carries its own hurdles that must be researched in isolation from other countries. To date, no scoping or systematic review has comprehensively examined teaching and assessment methods specific to UK GS to ascertain competence and whether the current WBA-based strategies are most appropriate. This review seeks to address this disparity.
Aim, Research Question
Research Question
What teaching methods used by general surgeons in simulation or formative assessment lead to improved surgical skill acquisition in UK surgical trainees?
Objectives
To identify specific teaching methods employed by general surgeons.
To contrast effective and ineffective teaching in terms of improved surgical skill acquisition.
To analyse existing research to inform and optimise future teaching strategies in surgical education.
METHODOLOGY
A scoping review was performed because the limited volume of existing research makes a full systematic review unsuitable(27). This approach allows the mapping and organization of a broad topic whilst also identifying gaps within the current literature(28). The five-stage methodological framework outlined by Arksey and O'Malley(27) was followed: (a) research question identification, (b) identification of relevant studies, (c) study selection, (d) data charting, and (e) collating and reporting the results. The review was conducted in accordance with the PRISMA-ScR guidelines for scoping reviews(29).
Eligibility criteria
Studies were required to meet the eligibility criteria to be included. Firstly, GS trainees in the UK were the target population, as the research question addresses this population specifically. International studies were excluded as they vary in curriculum, funding, training duration, available resources, training equipment and possibly teaching qualifications.
Medical students were also excluded due to discrepancies in their curriculum and learning objectives. The target trainees were defined as postgraduate doctors pursuing a career in GS, including those without an official training post (e.g., trust-grade doctors). Eligible teaching methods encompassed any form of simulation or formative assessment strategy that could lead to surgical skill acquisition.
Search strategy
A systematic search of PubMed/MEDLINE, EMBASE, ERIC and Web of Science was performed on 18 February 2025. The following Boolean search was used to identify papers via Ovid: (GB OR Great Britain OR UK OR United Kingdom) AND (Teaching OR Teach* OR training OR mentoring OR mentor*) AND (General surg* OR colorectal OR HPB OR hepato-pancreato-biliary OR UGI OR Upper gastrointestinal) AND (Simulation OR formative OR assessment) AND (Surg* skill* OR technical OR competence) AND (Surgical trainee* OR surgical resident* OR Student OR graduate). Web of Science was not accessible via Ovid, hence the same search was completed there using the same terms. The search was limited to the last ten years and to the English language. Additional relevant papers were identified through reference tracking to enhance literature saturation. Grey literature (e.g. abstracts) was considered where it suggested innovative topics with robust outcomes.
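For reproducibility, the search can be assembled programmatically from its concept blocks. The sketch below is illustrative only: it reproduces the Boolean structure of the search (OR within each concept block, AND between blocks), while database-specific syntax such as Ovid field tags and limits is deliberately left out, as these are interface-dependent assumptions.

```python
# Illustrative reconstruction of the Boolean search string used via Ovid.
# The six concept blocks are taken verbatim from the search strategy;
# field tags and date/language limits are applied in the database interface.
blocks = [
    ["GB", "Great Britain", "UK", "United Kingdom"],
    ["Teaching", "Teach*", "training", "mentoring", "mentor*"],
    ["General surg*", "colorectal", "HPB", "hepato-pancreato-biliary",
     "UGI", "Upper gastrointestinal"],
    ["Simulation", "formative", "assessment"],
    ["Surg* skill*", "technical", "competence"],
    ["Surgical trainee*", "surgical resident*", "Student", "graduate"],
]

# OR within each concept block, AND between blocks.
query = " AND ".join("(" + " OR ".join(terms) + ")" for terms in blocks)
print(query)
```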
Screening
The identified records were exported as RIS files and imported into the Zotero reference manager(30). This facilitated the upload to Rayyan(31), a research collaboration platform that supports systematic reviewing of papers. The first author (AA) screened the eligibility of the retrieved papers based on title, abstract and then full paper; this was followed by a second blinded reviewer (EH), using the same eligibility criteria, to increase reliability.
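As a simplified stand-in for the duplicate removal performed in Ovid and Rayyan, the sketch below parses exported RIS records and drops entries whose normalised titles match. The minimal parser is an assumption for illustration; real exports carry richer fields (DOI, authors, year) that dedicated tools match on with fuzzier logic.

```python
# A minimal sketch of RIS-based duplicate screening (assumed workflow;
# tools such as Zotero and Rayyan use multi-field, fuzzier matching).
import re

def parse_ris_titles(path: str) -> list[str]:
    """Return the TI (title) field of each record in an RIS export."""
    with open(path, encoding="utf-8") as f:
        records = f.read().split("ER  -")  # "ER" closes each RIS record
    titles = []
    for record in records:
        match = re.search(r"^TI  - (.+)$", record, flags=re.MULTILINE)
        if match:
            titles.append(match.group(1).strip())
    return titles

def deduplicate(titles: list[str]) -> list[str]:
    """Keep the first occurrence of each title, ignoring case/punctuation."""
    seen: set[str] = set()
    unique = []
    for title in titles:
        key = re.sub(r"[^a-z0-9]", "", title.lower())
        if key not in seen:
            seen.add(key)
            unique.append(title)
    return unique
```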
Charting
Data from the selected studies were tabulated in Microsoft Excel and categorized under major subheadings: paper ID, study design, aim and outcomes. A simple numerical analysis was conducted, followed by a qualitative narrative synthesis organizing the studies according to the four levels of training evaluation in Kirkpatrick's framework. This widely used model assesses the effectiveness of educational programs(32), starting with the learners' immediate reaction to learning (level 1), followed by the evaluation of learning by testing the gain of knowledge (level 2). Level 3 evaluates the application of a new skill, and level 4 measures the broader impact of learning, e.g. patient outcomes(32) (Fig. 2).
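The charting step can be illustrated as follows: each study is recorded under the review's subheadings together with its Kirkpatrick level(s), and the simple numerical analysis reduces to a tally over those levels. The field names and the two example rows (drawn from Table 1) are for illustration only, not the actual spreadsheet layout.

```python
# A minimal sketch of the charting table and numerical analysis
# (field names are assumptions; example rows are drawn from Table 1).
from collections import Counter

KIRKPATRICK = {
    1: "Reaction: learners' immediate response to training",
    2: "Learning: tested gain of knowledge",
    3: "Behaviour: application of a new skill",
    4: "Results: broader impact, e.g. patient outcomes",
}

studies = [
    {"paper_id": 33, "design": "Simulation assessment + interview",
     "aim": "Trainee perceptions of laparoscopic simulation", "levels": (1,)},
    {"paper_id": 63, "design": "Case-control study",
     "aim": "Impact of the LAPCO national training programme", "levels": (4,)},
    # ... remaining charted studies
]

# Tally studies by the level(s) they assess, as reported in the Results.
tally = Counter("+".join(map(str, s["levels"])) for s in studies)
print(tally)  # e.g. Counter({'1': 1, '4': 1})
```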
Fig. 2
Kirkpatrick Model(32)
Table 1
Summary of Articles Reviewed categorized by Kirkpatrick level
Level
Study, Year, N, Trainee Level
Study design
Aim
Outcome
1
(33) Aawsaj, Y., et al. (2025), N:10, ST 3–8
Simulation assessment + semi-structured interview
Trainee perceptions of laparoscopic simulation as a summative tool
Simulation built confidence but there was no substitute for live experience. Barriers included NHS pressures, trainer resistance, and dissatisfaction with PBAs.
1
(34) Shalhoub, J., et al. (2017), N:10, CST
Semi Structured interviews
Trainee views on the value and use of PBAs
PBAs are seen as inconsistent and variable by grades. Concerns over validity, especially self-completion. But useful for tracking progress if supported well.
1
(35) Blackhall, V., et al. (2019), N:43, CST
Home simulation + online module; focus group feedback
Identify barriers to engagement with home-based simulation
Engagement limited by motivation, trainer input, and system pressures. Metric feedback felt impersonal; low-fidelity tools and 'tick-box' culture limited perceived value.
1
(36)Singh, P., et al.(2015), N:51, ST1-3
Regional Survey using S-QAT / Likert score
Identify variation in training quality across centers.
Supervision was rated highly, but access to cases, outpatient goals, and teaching varied. Recommendations: protected theatre time and improved training structure.
1
(37) Fleming, C., et al. (2019), N:24, ST?
Descriptive Study with live polling.
Explore opinions on GS fellowships.
Trainees supported fellowships as OOPT, preferred international options, and favoured post-CCT structure within a standardised curriculum.
1
(38)Gaunt, A., et al. (2018), N:42, ST1-8
Qualitative multicentre focus groups
Explore trainees' feedback-seeking behavior.
WBAs driven by self-motives; formative assessment aided self-improvement, but WBAs could limit honest reflection. Authors called for feedback reform to support openness.
1
(39) Room, H.J., et al. (2020), N:39, CST
Simulation camp + satisfaction questionnaire
Teach core surgical trainees basic entry level skills.
Trainees valued the camp’s timing, relevance, and consultant-led feedback. Enhanced confidence and mentorship were valued.
1
(40)Skervin, A.L., and Scott, H.J., et al. (2021), N:74, CST + ST3-8
Self-report questionnaire on use of MR.
Assess MR use amongst trainees and consultants.
MR used by 91.5%, especially for complex cases. Reported benefits included improved focus, clarity, planning, and anticipation.
1 + 2
(41) Shariff, U., et al, (2015), N:59, ST1-8
RCT, post-intervention knowledge test and evaluation
Assess multimedia tool vs. traditional teaching for colorectal surgery
Both groups improved knowledge equally; trainees appreciated the tool for enhancing decision-making and anatomical understanding.
2
(42)Kailavasan, M., et al. (2020), N:93, ST
Simulation bootcamp with abdominal wall model; post-simulation Likert questionnaires
Assess face validity of a low-fidelity simulator for laparoscopic port insertion
Trainees and faculty rated the model highly; it was deemed effective for port training in both urology and GS with no validity concerns.
2
(43)Hand, F., et al. (2017), N:17, ST1-3
Retrospective analysis; similarity between admission and discharge diagnoses
Assess if structured handover reflects diagnostic skill
Diagnostic accuracy improved using the handover tool; scores were based on concordance of key findings, plan and diagnosis.
2
(44)Yiasemidou, M., et al. (2017), N:20, CST
Case controlled study on MR; metrics of simulator: time, motion, safety
Assess impact of 3D visual aids in MR on surgical performance
3D MR improved efficiency (time, movement, path length) vs. controls; no safety differences; authors recommended combining mental imagery with anatomical models
2 + 3
(45)Yule, S., et al. (2015), N:16, ST4-6
RCT on simulation; NOTSS scoring for laparoscopic cholecystectomy
Assess the effect of coaching on non-technical skills in a simulated theatre
Coaching improved NOTSS scores and crisis responses; no change in time + path length.
2 + 3
(46)Ramjeeawon, A., et al. (2020), N:16, ST1-3 & MD students
Simulation with structured debrief; NOTSS, OTAS, STAI
Assess impact of immersive simulation and debrief on teamwork, stress and technical skills.
Teamwork, technical performance, radiation safety, and psychological state improved post-debrief across all training levels.
3
(47) JCST (2023), N:?, ST1-8
Report on improving surgical training (IST) pilot trainee feedback
Summary of UK IST trainee Feedback
English trainees reported fewer ST3 opportunities; concerns less common in Scotland/Wales. Most IST posts are being phased out in GS.
3
(48)Allum, W., et al. (2020), N:20, ST1-8
Report on IST pilot trainee feedback
Identify areas for improvement in IST programme
Recommended dedicated rota time for training and more elective/simulation access in curricula. Trainer time was limited due to service pressure. Suggested ACP/SCP role expansion, requiring training and HEE support.
3
(49)Clarke, R., et al. (2024), N:26, ST3
Prospective analysis; simulation, lectures, labs; self-assessment using Likert feedback
Develop GS bootcamp to support ST3 transition
Trainees reported significant skill improvement in endoscopy, laparoscopy, open surgery, and non-technical areas. Confidence rose post-course, especially in laparoscopic suturing (77%), ulcer repair (69%), and stress management.
3
(50)Clarke, R., et al. (2024), N:25, ST3
-Simulation with pig tissue model
-Feedback questionnaire
Assess face-validity of a low-cost model for teaching acute proctology during the ST3 general surgical bootcamp.
Confidence rose across key proctology tasks; 80% rated the model highly for realism and training value.
3
(51)Metcalfe, K., et al. (2021), N:16, CST
Office admin simulation; post-course questionnaire
Evaluate admin simulation for developing non-technical consultant skills
Trainees rated the pilot programme as useful and felt it prepared them for consultant roles. All supported adding it to regional teaching, finding it relevant and well received.
3
(52)Boyle, M., et al. (2021), N:?, FY/TG
One-day workshop; pre/post-course questionnaires
Assess clinical decision-making and technical readiness for acute surgical care
All trainees found it useful and relevant; authors encourage inclusion in regional training.
3
(53)Hosny, S.G., et al. (2017), N:37, ST3-8
Multinational qualitative study; semi-structured interviews
Identify barriers/facilitators to uptake of simulation programmes.
Simulation valued for safety and assessment, but hindered by time, cost, motivation; trainees less confident in its high-stakes validity.
3
(54)Rajaratnam, V., et al. (2021) NA
Review of modular laparoscopic training; motor learning theories
Propose evidence-based model for surgical skill acquisition.
Advocates low-cost, self-directed model using mental imagery, deliberate practice, and instructional design to build skill with limited simulation.
3 + 4
(55)Shalhoub, J., et al. (2015), N:?, ST1-8
Descriptive analysis; ISCP usage data
Examine WBA use by region, specialty, and training level
WBA use increased sevenfold (2007–2013); CSTs completed more than STs. PBAs were preferred by STs, with regional variation in volume and type.
3 + 4
(56) Brown, C., et al. (2017), N:84, ST3-8
Service evaluation; PBA trajectory vs. case volume
Evaluation of performance trajectory of index procedures in relation to operative experience, indicative numbers, and training time
Learning curves and PBA timing varied. Few PBAs were completed post-CCT or beyond Level 4, limiting assessment beyond minimum competence.
3 + 4
(57)Abdelrahman, T., et al. (2016), N:69, ST3-8
Service evaluation on PBA learning curve gradients of index procedures
Examine relationship between index numbers and PBA attainment in key procedures
Laparotomy targets deemed sufficient; complex cases like Hartmann’s required > 3× target to reach competence. Authors suggest revising JCST indicative numbers.
3 + 4
(58)De Siqueira, J. R., and Gough, M.J., et al. (2016), N:121, ST3-8
Descriptive analysis; ISCP and eLogbook review
Correlate operative volume with trainer-assessed competence
PBA scores aligned with case volume, but many trainees failed to reach Level 4 despite index numbers. Highlights inconsistency in progression and questions validity of certification.
3 + 4
(59)Abdelrahman, T., et al. (2015), N:89, ST??
Descriptive analysis; ISCP data review
Assess whether GS trainees meet CCT operative and academic targets
Most achieved procedural and academic targets. Authors advise early support and simulation use for underperforming trainees.
3 + 4
(60)Elsey, E.J., et al, (2019), N:311, FY1-2&CST
Cohort study; ISCP and e-logbook review
Assess operative experience and competency progression using national data
Trainees progressed to unsupervised basic procedures through training; complex cases required longer. Training data reflects evolving competence and decision-making.
3 + 4
(61) Abdel-dayem, M., et al. (2021), N:35, CST&SHO
Structured modular training; progression metrics and trainee questionnaire
Develop a reproducible laparoscopic Colorectal Surgery (LCS) training model supporting independent practice
98% satisfaction; most achieved independent LCS. Low conversion rates and good outcomes reported. Keen to maintain the programme. Staffing shortages noted as a barrier.
1–4
(62)James, H. K., et al. (2019), N: 2002
Systematic review; assessed cadaveric simulation across Kirkpatrick levels
Evaluate evidence for cadaveric simulation in postgraduate training
Improved confidence, test scores, and procedural skills. Behavioural transfer inconsistent; limited long-term evidence for patient-level impact.
4
(63) Hanna, G.B., et al. (2022), N:108
Case-control study comparing clinical outcomes of colorectal cases performed by LAPCO- vs non-LAPCO-trained surgeons
Examine the impact of the national training programme (LAPCO) on the clinical outcomes of LAPCO-trained surgeons after training completion
Increased rates of laparoscopic colorectal cancer surgery, reduced mortality and morbidity. In-training competency assessment tools predicted clinical performance after training.
A list of definitions can be found in [Appendix 1].
RESULTS
The initial search generated 201 papers and an additional 3 were identified through alternative sources; after removing 48 duplicates in Ovid and 5 in Rayyan, 151 papers remained. After screening the titles and abstracts against the eligibility criteria, 78 and 5 papers were excluded respectively, leaving 67 records for full-text analysis (Fig. 3). A further 36 records were removed during the full-text review, due to incorrect geography (n = 29), incorrect population (n = 2), e.g. medical students, and wrong intervention (n = 5), e.g. endoscopy. Although endoscopy is performed by many GS surgeons, its training follows a different curriculum and was therefore excluded. Studies that included other geographies or populations were only included if the majority of their data were relevant. One urology paper was not excluded, as the procedure tested is equally performed by GS trainees. A vascular paper was not excluded, as the GS and vascular training paths only separated in 2022(64). One scoping review was removed, yet the decision was made to keep one systematic review as it matched this paper’s methodology closely, capturing all Kirkpatrick levels. This left 31 final studies for the scoping review.
Fig. 3
PRISMA diagram of included studies
Study design and setting
The included studies fell into five broad methodological categories: self-report measures, objective or performance-based assessments, descriptive analyses, multi-method designs, and systematic review. Thirteen used self-reported data: three through interviews, seven via questionnaires, two with focus groups and one via polling. Seven studies employed objective assessments: five assessed technical performance, while others used tools such as S-QAT (a self-assessment tool appraising training quality), OSATS (evaluating surgical technical skills), STAI (measuring state and trait anxiety), debrief scores, and NOTSS (assessing non-technical surgical skills). Eight were descriptive in design, including six studies analysing ISCP and/or eLogbook data and two based on training reports. One paper was a systematic review.
Distribution by Kirkpatrick level
The papers were then categorised according to Kirkpatrick's Four Levels of Training Evaluation (Fig. 2); many papers' assessments incorporated multiple levels. The most common category was level 1 with eight papers, followed by eight in level 3 and seven under a combination of levels 3 and 4. Three papers were categorized as level 2. Two fell into level 2/3, and one paper fell into each of the following categories: level 1/2, level 1/2/3/4, and level 4 (Fig. 4).
Fig. 4
Bar chart of study distribution across Kirkpatrick levels
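The distribution reported above can be reproduced directly from the stated counts; the sketch below regenerates the Fig. 4 bar chart. The counts come from the text, while the plotting choices (labels, figure size, output file) are assumptions.

```python
# Reproducing the Fig. 4 distribution from the counts given in the Results.
import matplotlib.pyplot as plt

distribution = {
    "1": 8, "1+2": 1, "2": 3, "2+3": 2,
    "3": 8, "3+4": 7, "4": 1, "1-4": 1,
}
assert sum(distribution.values()) == 31  # the 31 included studies

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(list(distribution), list(distribution.values()))
ax.set_xlabel("Kirkpatrick level(s) assessed")
ax.set_ylabel("Number of studies")
ax.set_title("Study distribution across Kirkpatrick levels")
fig.tight_layout()
fig.savefig("fig4_distribution.png", dpi=300)
```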
Level 1 [Table 1] – Reaction
Eight studies [Figure 4] assessed trainee reactions to simulation or formative assessment interventions, focusing on perceptions, confidence, and engagement rather than measurable outcomes. Data were largely qualitative, drawn from interviews and questionnaires. Trainees often reported increased motivation and confidence, particularly following simulation-based teaching such as bootcamps and skills camps. There was strong support for structured, modular, and mental rehearsal-based approaches(40), though barriers to engagement included service demands, trainer resistance, and a tick-box WBA culture (e.g., 33–35).
Level 2 [Table 1] – Learning
Three studies [Figure 4] measured learning through knowledge gains, e.g., diagnostic skills(43) and 3D visual aids(44). Outcomes were demonstrated with objective metrics such as post-test scores, NOTSS, S-QAT and OSATS. Improvements were seen especially when combined with mental rehearsal(44) or structured debrief(46). Multimedia, 3D visual aids, and mental imagery enhanced performance and confidence and reduced errors(44).
Level 3 [Table 1] – Behaviour
Eight studies assessed the evaluation of skills [Figure 4]. Evidence was largely anecdotal or inferred, based on questionnaires or post-training interviews. There was a consensus in favour of simulation training. Reported changes included improved operative confidence and patient safety in simulations(45, 46, 49–54). Improvements in independence were enhanced further by increased model fidelity(50), dedicated training programmes(51) and deliberate practice(54). However, cost and limited trainer and training time were common barriers(47, 48, 53).
Mixed levels [Table 1]
Eleven papers used multiple methods to assess learning on several evaluation levels simultaneously, allowing in-depth multimodal assessment of learning. Level 3/4, the most common combination, offered the opportunity to evaluate new skills in the clinical field, often via formative assessment PBAs. Competence progression across training years was shown by a gradual decrease in supervision, especially for less complex procedures(55–60). A learner's increased level of autonomy could imply a positive clinical outcome but not guarantee it, hence the level 3/4 grade as opposed to level 4. Level 2/3 studies tested NOTSS skills via objective post-simulation methods; coaching improved real-time decision-making and teamwork(45, 46). A training module was able to evaluate independent skill progression through questionnaires and to test clinical outcome metrics objectively(61).
One cadaveric systematic review(62) also spanned the Kirkpatrick levels, reporting strong trainee approval, knowledge gains, procedural improvement and modest real-world outcome effects, thereby addressing all four levels of the training experience (Fig. 2).
Level 4 [Table 1] – Results
Only one study [Figure 4] evaluated clinical impact in isolation(63). This study examined the impact of a national training programme and noted improved in-training competency and improved real-world patient outcomes. The remaining Level 4 evidence came from studies assessing it in conjunction with other levels.
Strength of Evidence of each Kirkpatrick Level
In this scoping review, the strength of evidence for each Kirkpatrick level was assessed using a combination of methodological rigour, data type, and the depth of outcome measurement reported in the included studies, to indicate which levels and teaching opportunities might best improve surgical skill acquisition [Table 2]. Eight papers were rated low; these sat in Level 1 as they relied on subjective trainee perceptions(33–40). Level 2 papers were categorized as moderate where studies included objective, quantifiable outcomes(41–46). Papers in Level 3 with self-rated outcomes through questionnaires were graded moderate due to their clinical relevance(49–53). However, papers scored high when clinical outcomes were measured directly, which only one paper qualified for(63); it was categorized as Level 4.
Studies spanning multiple levels were assessed contextually, with the highest level of rigorously collected and validated outcome data informing their overall strength. Papers in Level 3/4 were rated moderate-high and relied on large-scale datasets such as national trainee progression records. PBA ratings of independence could indirectly support good clinical outcomes(55–60).
This evaluative strategy aligns with well-established best-practice frameworks for educational evidence synthesis and can therefore be related to effective skill acquisition and to priorities for further research.
Table 2
Kirkpatrick framework summary for surgical training evaluation
Kirkpatrick Level
Focus
Outcome Type
Data Sources
Strength of Evidence
Limitations
Level 1: Reaction
Trainee satisfaction, perception of value
Subjective (opinions, confidence)
Surveys,
focus groups,
interviews,
Likert scales
Low
● No direct measure of learning or competence
● Highly variable; susceptible to social desirability bias
Level 2: Learning
Cognitive or psychomotor gains
Objective (task scores, post-tests, NOTSS)
Simulators,
technical task analysis,
pre/post MCQs,
video scoring
Moderate
● May not translate to behavior
● Studies often lack long-term follow-up
● Mixed fidelity of simulation methods
Level 3: Behavior
Transfer to real-world practice
Observational/self-report, performance logs
Simulation follow-up,
post-course self-assessments
Moderate to high
● Self-reporting bias
● Hard to isolate effect of training alone
● Confidence ≠ competence
Level 4: Results
Institutional/patient outcomes, system change
Clinical outcomes, operative independence, progression data
E-logbook data,
national audits,
PBA records,
completion rates
High
● Resource intensive
● Rarely performed
● Hard to attribute causality
Mixed Levels
Multi-domain outcomes
Subjective + objective + behavioral
Literature reviews,
multi-level studies (e.g., cadaveric sims),
ISCP portfolios.
Varies by level
● Methodological inconsistency; attribution across levels can be imprecise
Discussion
This scoping review aimed to identify which teaching methods used by general surgeons in simulation and formative assessment contribute most effectively to surgical skill acquisition in UK trainees, and which gaps remain in the limited research surrounding this topic. The Kirkpatrick model(32) and Miller's Pyramid(8) provided a valuable conceptual scaffold for interpreting the nature and depth of the learning outcomes revealed in this scoping review: while the Kirkpatrick model enabled categorisation of educational interventions based on their evaluative impact, Miller's framework clarifies the type of competence being developed at each stage. Using Kirkpatrick's model to interpret the outcomes identified an uneven distribution of evidence strength and a dominance of lower-level evaluation. Using Miller's Pyramid, it was still possible to highlight the progression from "knows" and "knows how", through cognitive gains and improved procedural understanding, to "shows how" via simulation or structured assessment(65–67).
Level 1(33–40) outcomes were the most frequently reported, capturing trainee perceptions, satisfaction, and confidence, e.g., via Likert scales or interviews. While many trainees valued simulation such as teaching camps for building motivation and confidence, some questioned its realism and educational value(39). There was strong support for structured, modular, and mental rehearsal (MR) approaches(40), though barriers to engagement included service demands, trainer resistance, and a tick-box WBA culture(35). MR was regarded as valuable across all levels(40).
Although Level 1 outcomes were high in quantity, most were single-centre, underpowered and reliant on subjective tools to determine outcomes. As a result of the subjective nature of their outcomes, they were classified as a weaker form of evidence. Their generalisability was limited, and they failed to assess actual knowledge attainment or long-term clinical relevance.
The deliberate proactive engagement suggested in Kolb's cycle of reflection leads to the transfer of cognitive knowledge to clinical knowledge(22); one could argue that this was executed through self-reported outcomes. Insights into attitude could indicate confidence, motivation and readiness (Kirkpatrick level 1) but offered little measurable indication of learning, and hence could not confirm that knowledge acquisition was achieved (Miller's base).
Level 2(41–46) studies demonstrated more objective improvements across the cognitive domain, particularly with moderate-fidelity simulation and blended training models. Objective learning outcomes were assessed via post-intervention knowledge tests, diagnostic skill assessments, NOTSS, S-QAT and OSATS. They offered a means to demonstrate statistically significant improvements in post-intervention test scores. This level of evaluation provided stronger evidence for the acquisition of knowledge. Significant gains were seen with the support of coaching(45), and 3D visual aids and mental imagery enhanced performance and confidence and reduced errors(44).
The strength of these studies was rated moderate where objective tools were used to assess technical and cognitive acquisition. Limitations included a lack of long-term follow-up, over-reliance on surrogate endpoints such as confidence or task completion, and a lack of skills assessment in the clinical field; mixed simulation fidelity further constrained broader application.
Level 2 tools evaluated knowledge gains and skill acquisition, mapping to Miller's "Knows" when testing knowledge(41–44) and to the "Knows How" stage when testing knowledge through clinical problem solving(45, 46). Level 2 teaching methods can be considered more effective due to their reliance on objective testing to assess knowledge retention, yet they remain flawed by their lack of direct skill assessment.
Level 3(47–54) appraised skill attainment and predominantly explored behaviour change in the workplace, often self-reported through post-course feedback. These studies linked simulation training to improved workplace preparedness, particularly in procedural confidence and clinical decision-making(51). Objective assessment tools such as post-test scores, simulation metrics and timed laparoscopic tasks were occasionally employed to measure time, accuracy and error rates, e.g. in laparoscopic exercises(54). However, most relied on self-reported post-course questionnaires to demonstrate skill improvement, confidence(49, 51–53) and teamwork dynamics(46). Simulation courses led to increased confidence and perceived readiness for registrar-level tasks(49). Simulation was identified as a driver of workplace behavioural change when structured feedback and debrief were included(46).
The evidence was graded moderate, as outcomes were mostly deduced from self-reported growth without objective observational verification. In addition, the fidelity of a simulation could affect the transferability of skill. Furthermore, minimal long-term follow-up undermined the reliability of these papers' findings.
Interventions involving high-fidelity simulation, mental rehearsal, and modular training allowed trainees not only to understand a procedure but to demonstrate it in controlled environments. When learners demonstrate skills in a simulated or structured setting, they "show how" (Miller) and indicate that behaviour could change (Kirkpatrick Level 3). One may interpret this as potential for good learning progression to the intermediate levels, demonstrated by behaviour changes indicative of competence and skill development.
Level 4 was demonstrated in isolation by only one paper(63), which evaluated clinical impact. It highlighted the benefits of a dedicated training programme in laparoscopic colorectal cancer surgery, demonstrating both clinical benefit, through mortality and morbidity reduction, and clinical performance, through assessment tools(63). This extended previous papers in which trainees, almost unanimously, called for dedicated teaching time, trainers and structured programmes incorporated into training(11, 36, 46, 48, 51, 52, 61), and such programmes should be incorporated into further curriculum work.
Despite the low quantity of papers, the sole paper(63) was considered of higher-grade strength due to its high fidelity and thus skill transferability. Measuring clinical patient outcomes is arguably the most reliable way of testing skill outcomes. Reliability was high due to the three-year follow-up monitoring trainees' competency.
Level 4 of the Kirkpatrick model matched the maximum competence ("Does") stage of Miller's Pyramid by demonstrating patient outcomes. It can be assumed to be the most rigorous form of assessment of skill attainment. One can infer that the rarity of such papers is secondary to the logistical, ethical and financial challenges of measuring patient outcomes. Overall, Level 4 papers were underrepresented, reinforcing the need for more longitudinal, mixed-method studies capable of delineating the impact of simulation and formative assessment from training environments to real-world clinical practice.
The majority of papers used multiple methods to address multiple levels of learning, which allowed insights across several levels. For example, Level 2/3 studies(45, 46) combined objective skill-acquisition assessment with measurement of the broader impact of training. The most common form was Level 3/4, which used large-scale national datasets from the ISCP and eLogbook platforms to demonstrate index operative experience and achievement of the level 4 competence that leads to CCT(56–60). Being based on real patient operations, one can indirectly assume that an increased level of independent operating (behaviour) is related to better patient outcomes. Their long-term follow-up was associated with increased reliability. Interestingly, many trainees failed to obtain WBAs indicative of independence in their index procedures, suggesting that indicative numbers alone could fail to confirm true competence, thereby questioning the validity of a CCT based on WBAs(57, 58, 59). Furthermore, WBAs were often pre-filled by the trainees and inconsistently validated by their trainers(38). A recurring concern was the perceived ineffectiveness of WBAs/PBAs, with trainees reporting these often being used as 'tick-box' assessments(35), countering their educational intent.
The evidence strength of mixed-level studies depended on the type of outcomes measured. Level 4 outcomes, alone or with level 3, provided rigorous, generalisable data and insight into the long-term impact on learning owing to their large datasets, duration and reliability. In contrast, these were often associated with subjective or inconsistently validated measures, thereby lowering their evidence strength. If behavioural change through clinical application in real-world settings was demonstrated, then the "Does" level of Miller's Pyramid was reached. However, post-certification (CCT) performance was poorly tracked, making it difficult to assess competency at the ultimate phase of integration into practice. Arguably, mixing the forms of assessment may yield valuable data and is less ethically challenging than pure level 4 studies. Interest persists in creating more higher-level competence-based assessments and in revising current WBAs given the collated negative feedback.
This review has several limitations. Firstly, the inclusion of heterogeneous study designs, populations (e.g., CTs, STs, SAS) and simulation modalities complicated direct comparison. Secondly, many included studies were observational, underpowered, or single-institution, limiting the external validity of findings. Thirdly, the nature of a scoping review precludes formal quality appraisal or meta-analysis. Fourthly, the review focused solely on UK-based data, which, while relevant to the research question, may omit valuable international insights. Finally, reporting bias may exist, as negative or inconclusive studies are less likely to be published.
By linking Kirkpatrick's levels with Miller's ascending hierarchy of competence, the review supports the view that true surgical skill acquisition is multidimensional, requiring not just improved learner satisfaction or knowledge gain but demonstrable procedural capability and transfer into practice. Integrating both models in curriculum and teaching design enables a more comprehensive evaluation of medical education, from trainee satisfaction to patient outcome. Pursuing research into Levels 3 and 4 ("Shows How" and "Does") is likely to be the most effective means of identifying solutions to improving surgical skill acquisition.
Conclusion
This scoping review maps the current landscape of simulation and formative assessment in UK general surgery training, using the Kirkpatrick model to evaluate the effectiveness of educational interventions whilst also describing the progression from knowledge to performance with Miller's Pyramid of competency. Linking Kirkpatrick's model with Miller's Pyramid underscores the need for educational strategies that move beyond early-stage gains to demonstrate actual competence in clinical settings. While simulation and formative assessment improve trainee confidence and technical ability (Levels 1–2, "Knows"/"Knows How"), fewer studies assess behavioural change or real-world performance (Levels 3–4, "Shows How"/"Does"). Future research should prioritise prospective, longitudinal designs capable of tracking performance into independent practice, thereby focusing on the Levels 3–4 ("Shows How"/"Does") criteria, as these demonstrate more ingrained learning for surgical skill acquisition. This form of assessment is already part of the curriculum as WBAs, yet these need further research and refinement given the many concerns raised. Embedding effective, evidence-based strategies into cohesive, competency-based teaching methods may ultimately support a more accountable framework for developing safe and more consistent GS training across the UK.
Competing Interest
I declare there are no financial or non-financial competing interests.
REFERENCE:
1.
Rodriguez-Paz JM, Kennedy M, Salas E, Wu AW, Sexton JB, Hunt EA, Pronovost PJ. Beyond see one, do one, teach one: toward a different training paradigm. Postgrad Med J. 2009;85(1003):244–9.
2.
Welchman SA. Royal College of Surgeons of England. Improving surgical training. Bull R Coll Surg Engl. 2012;94(3):92–4.
3.
Royal College of Surgeons of England. Improving Surgical Training. London: RCS Professional Standards; 2015.
4.
McKee RF. The intercollegiate surgical curriculum programme (ISCP). Surg (Oxford). 2008;26(10):411–6.
5.
Witheridge A, Ferns G, Scott-Smith W. Revisiting Miller's pyramid in medical education: the gap between traditional assessment and diagnostic reasoning. Int J Med Educ. 2019;25(10):191–2.
6.
Aryal K. The usefulness of work-based assessments in higher surgical training: a systematic review. Int J Surg. 2021;94:106127.
7.
Massie J, Ali JM. Workplace-based assessment: a review of user perceptions and strategies to address the identified shortcomings. Adv Health Sci Educ. 2016;21(3):455–73.
8.
Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 Suppl):63–7.
9.
Lund J, Sadler P, McLarty E. The effect of COVID-19 on surgical training. Surg (Oxford). 2021;39(10):829–33.
10.
Barter J. The impact of the COVID-19 pandemic on annual review of competency progression outcomes issued to general surgical trainees. J Surg Educ. 2024;81(8):1119–32.
11.
Clements JM, Burke J, Nally D, et al. COVID-19 impact on surgical training and recovery planning (COVID-STAR): a cross-sectional observational study. Int J Surg. 2021;88:105903.
12.
Joint Committee on Surgical Training. Seventh annual report for the JCST trainee survey [online]. London: JCST; 2024 [cited 2025 June 17]. Available from: https://www.jcst.org/-/media/Files/JCST/Quality-Assurance/Trainee-Survey/Seventh-annual-report-for-the-JCST-trainee-survey.pdf
13.
Blue AV, Griffith CH, Wilson J, Sloan DA, Schwartz RW. Surgical teaching quality makes a difference. Am J Surg. 1999;177(1):86–9.
14.
Budden CR, Svechnikova K, White J. Why do surgeons teach? A qualitative analysis of motivation in excellent surgical educators. Med Teach. 2016;39(2):188–94.
15.
NHS England. 2024/25 Core Surgical Training Portfolio Guidance for Candidates [online]. London: NHS England;2024 [cited 2025 June 17]. Available from: https://medical.hee.nhs.uk/medical-training-recruitment/medical-specialty-training/surgery/core-surgery/core-surgical-training-self-assessment-scoring-guidance-for-candidates
16.
Cook T. General surgery curriculum. London: Joint Committee on Surgical Training; 2021.
17.
Haubert LM, Way D, DePhilip R, Tam M, Bishop J, Jones K, et al. Surgeons as medical school educators: an untapped resource. Anat Sci Educ. 2011;4(4):182–9.
18.
Verrier ED. The surgeon as educator. Thorac Surg Clin. 2019;29(3):227–32.
19.
Sadideen H, Plonczak A, Saadeddin M, Kneebone R. How educational theory can inform the training and practice of plastic surgeons. Plast Reconstr Surg Glob Open. 2018;6(12):2042.
20.
Sheets KJ, Hankin FM, Schwenk TL. Preparing surgery house officers for their teaching role. Am J Surg. 1991;161(4):443–9.
21.
Kolb DA. Experiential learning: Experience as the source of learning and development. Volume 1. Englewood Cliffs, NJ: Prentice-Hall; 1984.
22.
Mamede S, Schmidt HG. The structure of reflective practice in medicine. Med Educ. 2004;38(12):1302–8.
23.
Dewey J. How We Think: A Restatement of the Relation of Reflective Thinking to the Educative Process. Boston, MA: D.C. Heath & Co; 1933.
24.
Boyd EM, Fales AW. Reflective learning: key to learning from experience. J Humanist Psychol. 1983;23:99–117.
25.
Harrison J, Yaffe E. Teacher educators and reflective practice. Becoming a teacher educator: theory and practice. Dordrecht: Springer Netherlands; 2009. pp. 145–61.
26.
Hackney L, O'Neill S, O'Donnell M, Spence R. A scoping review of assessment methods of competence of general surgical trainees. Surgeon. 2023;21(1):60–9.
27.
Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8(1):19–32.
28.
Thomas A, Lubarsky S, Durning SJ, Young ME. Knowledge Syntheses in Medical Education: Demystifying Scoping Reviews. Acad Med. 2017;92(2):161–6.
29.
Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, Moher D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73.
30.
Vanhecke TE. Zotero. J Med Libr Assoc. 2008;96(3):275.
31.
Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan - a web and mobile app for systematic reviews. Syst Rev. 2016;5:210.
32.
Kirkpatrick JD, Kirkpatrick WK. Kirkpatrick's four levels of training evaluation. Alexandria,VA: Association for Talent Development; 2016.
33.
Aawsaj Y, Singh J, Shamim MS. Enhancing Surgical Curriculum: Trainees' Perspectives on Laparoscopic Simulation and Assessment. Cureus. 2025;17(1):77054.
34.
Shalhoub J, Marshall DC, Ippolito K. Perspectives on procedure-based assessments: a thematic analysis of semistructured interviews with UK surgical trainees. BMJ Open. 2017;7:013417.
35.
Blackhall VI, Cleland J, Wilson P, Moug SJ, Walker KG. Barriers and facilitators to deliberate practice using take-home laparoscopic simulators. Surg Endosc. 2019;33(9):2951–9.
36.
Singh P, Aggarwal R, Allen-Mersh T, Darzi AW. Regional Quality Improvement Initiative for Surgical Training. J Am Coll Surg. 2015;221(4):47–8.
37.
Fleming C, et al. Structure and quality assurance of Fellowship Training in General Surgery: consensus recommendations. Int J Surg. 2019;67:101–6.
38.
Gaunt A, Markham DH, Pawlikowska TR. Self-motives in postgraduate trainees’ feedback-seeking behavior: a UK multicenter study. Acad Med. 2018;93(10):1576–83.
39.
Room HJ, Ji C, Kohli S, Choh C, Robinson P, Knight J, et al. Core surgical field camps: a new deanery-based model. Br J Hosp Med. 2020;81(9):1–6.
40.
Skervin AL, Scott HJ. Mental rehearsal: a useful simulation adjunct to surgical training. Surgeon. 2021;19(6):423–9.
41.
Shariff U, Kullar N, Haray PN, Dorudi S, Balasubramanian SP. Multimedia tools for cognitive surgical skill acquisition. Colorectal Dis. 2015;17(5):441–50.
42.
Kailavasan M, Berridge C, Kandaswamy G, Rai B, Wilkinson B, Jain S, Biyani CS, Gowda B. A Low-Cost Synthetic Abdominal Wall Model (Raj Model) for the Training of Laparoscopic Port Insertion. World J Surg. 2020;44(5):1431–5.
43.
Hand F, Stirling A, Felle P, Conlon K, Ridgway P. The e-handover: applications for surgical training. Clin Teach. 2017;14(3):211–5.
44.
Yiasemidou M, Glassman D, Mushtaq F, Athanasiou C, Williams MM, Jayne D, et al. Mental practice with 3D visual aids enhances performance. Surg Endosc. 2017;31(10):4111–7.
45.
Yule S, Parker SH, Wilkinson J, McKinley A, MacDonald J, Neill A, et al. Coaching non-technical skills improves surgical residents’ performance. J Surg Educ. 2015;72(6):1124–30.
46.
Ramjeeawon A, Sharrock AE, Morbi A, Martin G, Riga C, Bicknell C. Simulation training with debrief improves nontechnical skills. J Surg Educ. 2020;77(5):1300–11.
47.
Joint Committee on Surgical Training. Run-through training pilots: current position (April 2023) [online]. London: JCST; 2023. Available from: https://www.jcst.org/-/media/Files/JCST/Key-Documents/Run-through-training-pilots--update.pdf
48.
Allum W. Improving surgical training. Surg (Oxf). 2020;38(10):596–600.
49.
Clarke R, Kuligowska A, Cooper AP, Biyani S. General Surgery Bootcamp: improving ST3 induction. Br J Surg. 2024;111(8):197184.
50.
Clarke R, Kuligowska A, Scott P, Biyani S, Cooper AP. Evaluating a proctology simulation model at ST3 bootcamp. Br J Surg. 2024;111(8):197097.
51.
Metcalfe K, Pollard J, Thomas I. No one told me about this side of the job! a novel teaching method. Br J Surg. 2021;108(2):134014.
52.
Boyle M, Thu K, Mroczek T, Scicluna M, Byrne A. Preparing Foundation trainees for surgical on-calls. Br J Surg. 2021;108(6):259848.
53.
Hosny SG, Johnston MJ, Pucher PH, Erridge S, Darzi A. Barriers to simulation training uptake: a multinational study. J Surg Res. 2017;220:419–26.
54.
Rajaratnam V, Rahman NA, Dong C. Integrating instructional design into surgical training. Ann R Coll Surg Engl. 2021;103(10):718–24.
55.
Shalhoub J, Santos C, Bussey M, Eardley I, Allum W. Use of workplace-based assessments in UK surgical training. J Surg Educ. 2015;72(5):786–94.
56.
Brown C, Abdelrahman T, Patel N, Thomas C, Pollitt MJ, Lewis WG. Operative learning curve trajectory in surgical trainees. Br J Surg. 2017;104(10):1405–11.
57.
Abdelrahman T, Long J, Egan R, Lewis WG. Operative experience vs competence in general surgery. J Surg Educ. 2016;73(4):694–8.
58.
De Siqueira JR, Gough MJ. Correlation between targets and competence in general surgery. Br J Surg. 2016;103(7):921–7.
59.
Abdelrahman T, Thomas C, Santos C, Griffiths G, Lewis W. Counting surgical competence: a GI perspective. Gastroenterology. 2015;148:S–202.
60.
Elsey EJ, Griffiths G, West J, Humes DJ. Changing autonomy in operative experience: a UK cohort study. Ann Surg. 2019;269(3):399–406.
61.
Abdel-Dayem M, Thippeswamy KM, Haray P. Structured modular approach for laparoscopic colorectal surgery. Surg Innov. 2021;28(4):479–84.
62.
James HK, Chapman AW, Pattison GTR, Griffin DR, Fisher JD. Cadaveric simulation in surgical training: a systematic review. Br J Surg. 2019;106(13):1726–34.
63.
Hanna GB, Mackenzie H, Miskovic D, Ni M, Wyles S, et al. Laparoscopic Colorectal Surgery Outcomes Improved After National Training Program (LAPCO) for Specialists in England. Ann Surg. 2022;275(6):1149–55.
64.
2022 ST3 Vascular Surgery Supplementary Applicant Handbook. Summary of changes for 2022 recruitment. London: NHS England; 2022.
65.
Halligan M, Murphy MP, Friedman M. Learning outcomes. In: Certified Healthcare Simulation Educator (CHSE®) Review: Comprehensive Review, PLUS More Than 350 Questions Based on the Latest Exam Blueprint. 2023.p.19:311.
66.
Toy S. Evaluation paradigms. Scholarship in Healthcare: The Health Scholar’s Toolbox. Cham: Springer International Publishing; 2023. pp. 81–101.
67.
Tangpaisarn T, Phrampus PE, O’Donnell JM. Learning theory in healthcare simulation. Navigating Healthcare Simulation: A Practical Guide for Effective Teaching. Cham: Springer Nature Switzerland.; 2025. pp. 9–16.
68.
Arora S, et al. Mental practice enhances surgical technical skills: a randomized controlled study. Ann Surg. 2012;255(6):1181–7.
Appendix. 1- List definitions.
Acronym
Full Term
Definition / Explanation
ACP
Advanced Clinical Practitioner
A healthcare professional with advanced training who supports consultants, including in procedural and perioperative tasks. Plays an increasing role in NHS surgical services.
AES
Assigned Educational Supervisor
A senior consultant responsible for a trainee’s educational development and workplace-based assessment. Oversees the trainee throughout a training placement.
ARCP
Annual Review of Competence Progression
A structured annual assessment process used in UK postgraduate training to determine if a surgical trainee is meeting curriculum milestones and can advance in training. It evaluates workplace-based assessments, logbook data, and supervisor feedback.
ASiT
Association of Surgeons in Training
A UK professional body representing surgical trainees across specialties, involved in education policy, research, and advocacy.
CCT
Certificate of Completion of Training
The formal certification that marks the completion of specialty training, making a surgeon eligible for consultant appointment in the UK.
CEX
Clinical Evaluation Exercise
A type of WBA assessing clinical judgment, decision-making, and professionalism during clinical situations.
CT / CST / ST
Core Trainee / Core Surgical Trainee / Specialty Trainee
UK surgical training stages: CT/CST refers to years 1–2 (basic surgical training) and ST spans years 3–8 (higher specialist training).
DOPS
Direct Observation of Procedural Skills
A type of WBA assessing a trainee’s ability to perform clinical procedures safely and effectively.
eLogbook
Electronic Logbook
A digital database used by trainees to record operations with grade of independence, assisting in certification and performance tracking.
ENT
Ear, Nose and Throat (Otolaryngology)
A surgical specialty focusing on conditions of the head and neck, including sinuses, larynx, and ears.
FY
Foundation Year
The first two years (FY1 and FY2) of postgraduate medical training in the UK, undertaken before specialty training.
GMC
General Medical Council
The UK’s regulatory authority for medical education and practice. It maintains the medical register and sets training standards.
GS
General Surgery
A broad surgical specialty involving the gastrointestinal tract, hernia repair, and emergency abdominal operations.
HEE
Health Education England
A statutory body overseeing education and training for healthcare professionals in England.
ISCP
Intercollegiate Surgical Curriculum Programme
The official online platform and curriculum for surgical training in the UK and Ireland, used to manage portfolios, WBAs, and ARCP submissions.
IST
Improving Surgical Training
A UK initiative to modernise surgical training, aiming for more structured training time, earlier run-through progression, and enhanced simulation use.
JCST
Joint Committee on Surgical Training
The committee that oversees all UK surgical training standards, curriculum development, and trainee certification criteria.
LCS
Laparoscopic Colorectal Surgery
A minimally invasive surgical technique used for procedures involving the colon and rectum. Frequently taught using simulation.
Likert Scale
Likert Psychometric Scale
A psychometric scale used in surveys to measure subjective perceptions, often ranging from “strongly disagree” to “strongly agree”.
mini-CEX
Mini Clinical Evaluation Exercise
A type of WBA assessing clinical skills, often involving case discussions or direct observations of examinations.
MR
Mental Rehearsal
A cognitive training technique where individuals imagine steps of procedures, shown to enhance focus, decision-making, and surgical performance.
NOTSS
Non-Technical Skills for Surgeons
A structured behavioural framework used to assess interpersonal and cognitive skills, e.g., leadership, communication, and situational awareness, in surgical settings.
NHS
National Health Service
The publicly funded healthcare system of the UK, under which most clinical services are delivered.
OOPT
Out of Programme Training
Formal training taken outside the main UK programme, such as fellowships or research, requiring deanery approval.
OSATS
Objective Structured Assessment of Technical Skills
A validated, checklist-based assessment tool used to evaluate technical skill in surgery under observed or simulated conditions.
OTAS
Observational Teamwork Assessment for Surgery
An observational scoring tool used to assess team behaviours and communication during simulated or real surgical procedures.
PBA
Procedure-Based Assessment
A structured, formative WBA used to assess trainees’ level of independence when performing surgical procedures. It is mapped to ISCP levels of competence. Widely used but frequently criticised.
RCT
Randomised Controlled Trial
A study design where participants are randomly assigned to different interventions to test efficacy. Considered a gold standard in research.
SAS
Specialty and Associate Specialist Doctors
Experienced non-training grade doctors who contribute significantly to clinical work and often assist with teaching and service provision.
SCP
Surgical Care Practitioner
A non-physician practitioner trained to assist in surgeries and perioperative care, under consultant supervision.
SHO
Senior House Officer
A former UK junior doctor training grade, now largely replaced by the FY2, CT1–2 and ST1–2 grades. Still used colloquially.
S-QAT
Surgical Quality Assessment Tool
A trainee-reported survey measuring the quality of surgical education across domains like supervision, curriculum delivery, and feedback.
STAI
State-Trait Anxiety Inventory
A validated psychological assessment used to measure acute (state) and baseline (trait) anxiety levels, especially pre- and post-simulation or assessment.
TEVAR
Thoracic Endovascular Aortic Repair
A minimally invasive technique for repairing the thoracic aorta using catheter-guided stent grafts.
VR
Virtual Reality
A computer-generated 3D simulation environment used for immersive surgical training, often with feedback and real-time performance metrics.
WBA
Workplace-Based Assessment
An umbrella term for in-practice assessment methods such as Direct Observation of Procedural Skills (DOPS), Case-Based Discussion (CBD), mini-Clinical Evaluation Exercise (mini-CEX) and Procedure-Based Assessment (PBA). They are used to document day-to-day performance and progression in ISCP.
Appendix 2: Charting Table
Kirkpatrick Level
Study (Ref.), Year, N, Trainee Level
Study design
Aim
Outcome
1
(33) Aawsaj, Y., et al. (2025), N:10, ST3–8
Simulation assessment + semi-structured interview
To explore surgical trainees’ perceptions of using simulated laparoscopic assessment as a summative tool in the UK.
Trainees valued laparoscopic simulation for building confidence and assessment but said it couldn’t replace real experience. Barriers included NHS demands, trainer resistance, limited resources, and frustration with PBAs as the only assessment tool.
1
(34) Shalhoub, J., et al. (2017), N:10, CST
Semi-structured interviews
To understand surgical trainees’ perspectives and identify the significance of PBAs.
Trainees reported inconsistent senior support and noted PBAs’ value varied by training level. They questioned PBA validity, especially self-completion, but acknowledged their usefulness for tracking progress and guiding further learning when properly used.
1
(35) Blackhall, V., et al. (2019), N:43, CST
Home simulation + online module; focus group feedback
To uncover the barriers to engagement with home-based simulation, with a view to developing an improved programme.
4 themes were identified: trainee motivation, feedback quality, trainer involvement, and systemic factors. Trainees disliked impersonal metric feedback, preferring consultant input. A ‘tick-box’ culture, mismatched expectations, and doubts about low-fidelity simulation showed a need for more shared responsibility and a clearer understanding of deliberate practice.
1
(36) Singh, P., et al. (2015), N:51, ST1–3
Regional survey using S-QAT / Likert scoring
To identify variation in training quality across training centres.
Twelve centres reported strong supervision, approachable trainers, and good trainee rapport. However, access to training lists, outpatient goals, and teaching programmes varied. Recommendations included protected theatre time and improved organisation of technical, non-technical, and research training.
1
(37) Fleming, C., et al. (2019), N:24, ST?
Descriptive study with live polling
To provide recommendations on the structure and quality assurance of fellowships in General Surgery.
Trainees saw fellowships as OOPT and supported their inclusion in specialty training. Preferences were split between UK and international options, with the latter praised for structure and case variety. Most favoured post-CCT fellowships, were against national UK selection, and supported a standardised curriculum.
1
(38) Gaunt, A., et al. (2018), N:42, ST1–8
Qualitative multicentre focus groups
To explore trainees’ feedback-seeking behaviour in the postgraduate surgical workplace using a self-motives framework.
Trainees’ feedback-seeking aligned with self-motives theory: WBAs supported self-enhancement, while informal feedback helped with self-improvement. If WBAs were perceived as summative then they hindered openness. Authors called for reform to promote honest, developmental feedback without the fear of judgement.
1
(39) Room, H.J., et al. (2020), N:39, CST
Simulation camp + satisfaction questionnaire
To teach core surgical trainees basic, entry-level skills; training in advanced skills often requires attendance at national fee-paying courses.
All trainees found the field camp highly relevant and reported improved knowledge and excellent faculty feedback. Camps offered advanced skills, consultant-led feedback, and mentorship, and the timing, ahead of ST3 interviews, was well received.
1
(40) Skervin, A.L., Scott, H.J., et al. (2021), N:74, CST & ST3–8
Self-report questionnaire on use of MR
To assess the use of mental rehearsal (MR) amongst general surgical trainees and consultants, and whether MR is an effective adjunct to surgical training.
Mental rehearsal (MR) is used by 91.5% of surgeons across all levels. Though more common for complex cases, it’s also applied to routine ones. MR improves focus, clarity, planning, and error anticipation, with consultants highlighting its value early in training.
1 + 2
(41) Shariff, U., et al. (2015), N:59, ST1–8
RCT; post-intervention knowledge test and evaluation
To determine the effectiveness of a multimedia educational tool developed for an index colorectal surgical procedure.
Both groups improved post-test, with no difference between multimedia and study day formats. Trainees reported better decision-making and anatomy knowledge, viewing the tool as a valuable supplementary resource.
2
(42) Kailavasan, M., et al. (2020), N:93, ST
Simulation bootcamp with abdominal wall model; post-simulation Likert questionnaires
To assess the face validity of a novel low-fidelity abdominal wall simulator for training laparoscopic port insertion at the Urology Simulation Bootcamp course (USBC).
Trainees and experts rated the low-fidelity abdominal wall model positively, with no significant difference in face validity. It was seen as a useful tool for laparoscopic port placement training in both urology and general surgery.
2
(43) Hand, F., et al. (2017), N:17, ST1–3
Retrospective analysis; similarity between admission and discharge diagnoses
To ascertain whether a standardised electronic handover could also be used as a surrogate marker of trainees’ diagnostic skills.
Over six months, trainees improved diagnostic accuracy using a structured handover tool. Performance was measured by key findings, diagnosis, and treatment plan, with points awarded for alignment between initial and discharge diagnoses as a diagnostic skill marker.
2
(44) Yiasemidou, M., et al. (2017), N:20, CST
Case-control study on MR; simulator metrics: time, motion, safety
To test the hypothesis that providing interactive 3D visual aids during mental practice (MP) could facilitate surgical skill performance in laparoscopic cholecystectomy.
Mental practice and 3D simulation improved time, movement, and path length over conventional training, with no safety differences. The authors suggest that combining mental imagery with anatomical variation models boosts preoperative preparation and novice training.
2 + 3
(45) Yule, S., et al. (2015), N:16, ST4–6
RCT on simulation; NOTSS scoring for laparoscopic cholecystectomy
To evaluate the effect of coaching on non-technical skills and performance during laparoscopic cholecystectomy in a simulated theatre.
The intervention group’s NOTSS scores improved significantly, unlike the control group. Coached participants called for help faster in critical scenarios, though time and path length were unchanged. Coaching enhanced non-technical skills in simulated surgery.
2 + 3
(46) Ramjeeawon, A., et al. (2020), N:16, ST1–3 & medical students
Simulation with structured debrief; NOTSS, OTAS, STAI
To assess whether fully immersive simulation with structured debriefing improves lead surgeon teamwork in a standardized TEVAR scenario. Secondary aims: evaluate concurrent improvements in technical skills and radiation safety behaviours.
Structured debriefing significantly improved NOTSS teamwork scores (communication, coordination, and leadership), regardless of trainee grade. Technical skills, radiation safety, and procedure speed improved. Psychological outcomes also benefited, with reduced tension and worry, and increased relaxation and contentment post-debrief.
3
(47) JCST (2023), N:?, ST1–8
Report on Improving Surgical Training (IST) pilot trainee feedback
To summarise IST pilot trainee feedback.
General surgery trainees in England felt disadvantaged by limited ST3 posts under the mixed run-through and uncoupled model. Fewer issues were reported in Scotland and Wales. Pilot programme references are being removed from curricula and GMC materials, except for ongoing pilots in Trauma & Orthopaedics and Paediatric Surgery.
3
(48) Allum, W., et al. (2020), N:20, ST1–8
Report on IST pilot trainee feedback
To summarise surgical training issues within the Improving Surgical Training (IST) pilot programme.
The study recommended 60% of trainee rotas should focus on training, with more access to elective theatre sessions and simulation training. Trainers were limited by clinical demands and poor job planning. Expanding ACP and SCP roles could support rotas but needs equivalent training, funding, and backing from HEE.
3
(49) Clarke, R., et al. (2024), N:26, ST3
Prospective analysis; simulation, lectures, labs; self-assessment using Likert feedback
To create an introductory course or “bootcamp” to assist new ST3s in transitioning from core trainee to General Surgical Registrar level.
Trainees reported significant skill gains in endoscopy, laparoscopy, open surgery, and non-technical areas. Confidence rose from 69% to 100% post-course, especially in laparoscopic suturing (77%), ulcer repair (69%), and stress management.
3
(50) Clarke, R., et al. (2024), N:25, ST3
Simulation with pig tissue model; feedback questionnaire
To assess the face validity of a low-cost model for teaching acute proctology during the ST3 general surgical bootcamp.
Following simulation training, trainees reported increased confidence in rectal exams (72%), banding (80%), and Examination Under Anaesthesia (68%). Most rated the model “Good” or “Excellent” (80%) with high realism, finding the training effective.
3
(51) Metcalfe, K., et al. (2021), N:16, CST
Office admin simulation; post-course questionnaire
To evaluate an “office admin” simulation session as a method for gaining further non-technical skills in surgery and preparing trainees for the consultant role.
Trainees rated the pilot programme as useful and felt it prepared them for consultant roles. All supported adding it to regional teaching, finding it relevant and well received.
3
(52) Boyle, M., et al. (2021), N:?, FY/TG
One-day workshop; pre/post-course questionnaires
To teach assessment of the acute surgical patient, identification of which patients need an operation, and the technical skills to competently assist in theatre.
Trainees reported increased confidence across all areas post-course: on-call management (66% to 100%), theatre decision-making (37.5% to 93.7%), and suturing (37.5% to 100%). The course effectively boosted technical and decision-making skills.
3
(53) Hosny, S.G., et al. (2017), N:37, ST3–8
Multinational qualitative study; semi-structured interviews
To identify barriers and facilitators to the implementation and uptake of surgical simulation training programmes.
Simulation was highly valued but limited by cost, time, and motivation. It was widely seen as improving patient safety and was supported for mandatory assessment. Experts backed its use in competency evaluation, while residents were less confident in its validity.
3
(54) Rajaratnam, V., et al. (2021), N:N/A
Review of modular laparoscopic training; motor learning theories
To review current models of surgical skills acquisition and propose an integrative, process-driven, outcomes-based model for skills acquisition and mastery.
The authors suggest a low-cost, self-directed model using motor imagery, mental practice, and deliberate practice to support skill mastery in times of limited hands-on training. Instructional design can guide scalable, simulator-free programmes as a good alternative.
3 + 4
(55) Shalhoub, J., et al. (2015), N:?, ST1–8
Descriptive analysis; ISCP usage data
To describe the use of WBAs by UK surgical trainees and examine variations by training region, specialty, and level of training.
Validated WBAs per trainee rose over seven-fold from 2007 to 2013, with core trainees completing more than specialty trainees. London and ENT trainees submitted the most. WBA types remained stable: operative and non-operative assessments were evenly split among CSTs, while PBAs were most used by specialty trainees.
3 + 4
(56) Brown, C., et al. (2017), N:84, ST3-8
Service evaluation; PBA trajectory vs. case volume
To evaluate the performance trajectory for general surgery index procedures in relation to operative experience, indicative numbers, and training time among higher surgical trainees in a UK deanery.
Learning curves for emergency laparotomy and Hartmann’s procedure varied by caseload and training time. Timing between PBAs differed across competency levels, indicating inconsistent tracking. Few trainees completed PBAs after reaching level 4, limiting continued skill assessment.
3 + 4
(57) Abdelrahman, T., et al. (2016), N:69, ST3–8
Service evaluation; learning curve gradients related to PBA levels of index procedures
To examine the relationship between operative volume in key indicative procedures and competence levels achieved by general surgery trainees in a higher surgical training programme within a UK deanery.
Only emergency laparotomy had a competence-to-target ratio below 1, suggesting JCST targets are appropriate. For procedures like Hartmann’s, trainees needed over triple the target to reach competence. Authors recommend revising indicative numbers, as current targets may underestimate actual training needs.
3 + 4
(58) De Siqueira, J.R., Gough, M.J., et al. (2016), N:121, ST3–8
Descriptive analysis of the use of WBAs and the eLogbook in UK surgical training; use of ISCP by trainees
To assess the correlation between trainer assessment of competence and completion of indicative numbers.
Operative volume correlated with PBA scores for colectomy and Hartmann’s, but many trainees failed to reach Level 4 despite meeting index numbers/targets. Over half of post-target PBAs scored below Level 4, showing how variable progression can be. The study questions current certification reliability and calls for more robust assessment tools.
3 + 4
(59) Abdelrahman, T., et al. (2015), N:89, ST?
Descriptive analysis of the use of ISCP; percentage of trainees reaching CCT targets
To evaluate the current operative experience achieved by UK gastrointestinal (GI) surgery trainees at CCT, and to determine whether the targets set are achievable.
Most GI trainees met operative targets (63% for total cases, 69% for emergency laparotomy), with higher rates in major subspecialties. Academic goals were also met by most: 88% had ≥3 publications and 94% met presentation targets. Authors recommend early identification of lower performers and targeted simulation to support progress.
3 + 4
(60) Elsey, E.J., et al. (2019), N:311, FY1–2 & CST
Cohort study; review of ISCP and eLogbook records for all UK GS trainees
To quantify operative experience in general surgery training, including key procedures, and track changes in supervision over time; to evaluate whether UK surgical training data can evidence competency progression and entrustment decisions across a full trainee cohort.
Trainees progressed from simple to complex procedures with declining supervision over time. National PBA data showed Level 4 competence for basic procedures by the end of training, while complex ones took longer. The study highlighted how training data reflects evolving competence and decision-making.
3 + 4
(61) Abdel-Dayem, M., et al. (2021), N:35, CST & SHO
Structured modular approach for LCS; assessment of competency-based progression and questionnaire on readiness for independent LCS practice
To create a modular laparoscopic colorectal surgery (LCS) training programme enabling progression from novice to independent operator and trainer, through a reproducible, transferable pathway supporting competency-based trainee development.
The structured training programme had a 98% satisfaction rate, with most trainees planning to adopt it. Low conversion rates (1.5%) and good outcomes were also reported, with many achieving independent LCS skills. The model was seen as adaptable, although 45% stated that staffing levels were a barrier.
1–4
(62) James, H.K., et al. (2019), N:2002
Systematic literature review; evaluation of the current evidence on cadaveric simulation training against the four levels of the Kirkpatrick model
To describe and evaluate the evidence for cadaveric simulation in postgraduate surgical training.
Cadaveric simulation yielded positive learner reactions and post-test knowledge gains. Most studies showed improved procedural performance, though behavioural change and clinical transfer were inconsistent. Level 4 evidence was promising for some tasks, but long-term impact and retention remain unclear.
4
(63) Hanna, G.B., et al. (2022), N:108
Case-control study comparing clinical outcomes of colorectal cases performed by LAPCO-trained vs non-LAPCO surgeons
To examine the impact of the national training programme (LAPCO) on clinical outcomes of operations performed by LAPCO surgeons after training completion.
LAPCO training was associated with increased rates of laparoscopic colorectal cancer surgery and reduced mortality and morbidity. In-training competency assessment tools predicted clinical performance after training.
Total words in MS: 8960
Total words in Title: 24
Total words in Abstract: 248
Total Keyword count: 6
Total Images in MS: 4
Total Tables in MS: 4
Total Reference count: 69