Title: κ = 0.97: A Practical Framework Any Hospital Can Implement for Research-Grade Data Quality
Running Title: Multi-Source Validation Framework κ = 0.97
Authors
Denisse Martínez-Ríos¹*, Juan Carlos Moreno-Rojas¹, Adrián Martínez-Ríos², Carlos Eduardo Lulé-Martínez¹, Guillermo Díaz-Terán Aguilera¹, Dámaso Hernández-López¹
Affiliations
¹ Department of Angiology and Vascular Surgery, Hospital Regional General Ignacio Zaragoza, Instituto de Seguridad y Servicios Sociales de los Trabajadores del Estado (ISSSTE), Mexico City, Mexico
² Hospital Regional Presidente Juárez, Instituto de Seguridad y Servicios Sociales de los Trabajadores del Estado (ISSSTE), Oaxaca City, Mexico
Corresponding Author
*Denisse Martínez-Ríos, MD
Department of Angiology and Vascular Surgery
Hospital Regional General Ignacio Zaragoza, ISSSTE, Calzada Ignacio Zaragoza 1711, Col. Ejército Constitucionalista, Iztapalapa, Mexico City, 09220, Mexico
Email: denissemarrios1@gmail.com
ORCID: 0009-0000-6108-8868
Word Count: Abstract: 234 words | Main Text: 3,285 words | Tables: 3 | Figures: 3
Author ORCIDs
Denisse Martínez-Ríos: 0009-0000-6108-8868
Juan Carlos Moreno-Rojas: 0009-0008-3863-649X
Adrián Martínez-Ríos: 0009-0006-7604-3893
Carlos Eduardo Lulé-Martínez: 0009-0009-0488-3772
Guillermo Díaz-Terán Aguilera: 0009-0000-3774-4175
Dámaso Hernández-López: 0009-0005-1948-3860
ABSTRACT
Background
Medical device registries in transitional healthcare systems face substantial challenges achieving reliable data extraction due to documentation fragmentation and absence of integrated electronic health records. Conventional single-source or dual-source extraction methods demonstrate moderate inter-rater reliability (κ = 0.50–0.70), limiting research validity. We hypothesised that systematic multi-source data triangulation could achieve near-perfect reliability comparable to advanced registry systems whilst providing a replicable framework for resource-limited settings.
Methods
We conducted a retrospective methodological validation study of 176 rotational atherectomy procedures performed between January 2020 and December 2023 at Hospital Regional General Ignacio Zaragoza, Mexico. Three independent, blinded investigators extracted device-specific data from thirteen documentary sources organised into six validation domains. Inter-rater reliability was assessed using Cohen's κ with 95% confidence intervals. External validation was performed by an investigator from a geographically separate institution.
Results
The thirteen-source framework achieved overall inter-rater reliability of κ = 0.97 (95% CI: 0.94–0.99), with 96.6% concordance across three extractors. External validation demonstrated κ = 0.94, confirming reproducibility without subspecialty expertise. Complete device identification was achieved in 100% of procedures. Comparative bootstrap analysis revealed 49% improvement over single-source extraction (κ = 0.65, p < 0.001) and 24% improvement over dual-source methods (κ = 0.78, p < 0.001).
Conclusions
Systematic multi-source data triangulation enables transitional healthcare systems to achieve research-grade inter-rater reliability exceeding advanced registry benchmarks. Documentation multiplicity, when leveraged through structured protocols, transforms from methodological limitation to asset.
Trial registration
Not applicable. This study does not involve a clinical trial.
Keywords:
Inter-rater reliability
Data quality
Medical device research
Multi-source validation
Documentation triangulation
Healthcare registries
Transitional health systems
Cohen's kappa
BACKGROUND
Medical device registries serve as cornerstone infrastructure for post-market surveillance, comparative effectiveness research, and healthcare quality assessment. Conventional data extraction approaches in resource-limited settings demonstrate moderate inter-rater reliability coefficients, typically between κ = 0.50 and 0.70 [1, 2], constraining research validity and limiting generalisability of device effectiveness evidence to real-world populations [3].
The prevailing paradigm assumes that documentation fragmentation represents an insurmountable methodological limitation, with advanced digital infrastructure positioned as prerequisite for rigorous medical device research. This assumption perpetuates research inequities, as resource-limited settings struggle to generate high-quality evidence despite serving patient populations with distinct disease patterns [4, 5]. Moreover, the assumption overlooks a fundamental paradox: documentation multiplicity, whilst introducing complexity, simultaneously creates opportunities for cross-validation and triangulation that single integrated systems cannot provide [6, 7].
We hypothesised that systematic multi-source data triangulation, when implemented through structured protocols, could transform documentation fragmentation from methodological limitation to asset. This study presents the development and validation of a thirteen-source data extraction framework implemented at a public tertiary hospital in Mexico City, with the primary objective of quantifying inter-rater reliability for device-specific data extraction across three independent investigators.
METHODS
Study Design and Setting
We conducted a retrospective methodological validation study at Hospital Regional General Ignacio Zaragoza, a 273-bed public tertiary referral centre affiliated with ISSSTE in Mexico City. The institutional information architecture reflects characteristics typical of transitional healthcare systems: partial electronic health record implementation, paper-based surgical logbooks and anaesthesia flowsheets, hybrid administrative coding systems, and independent supply chain documentation maintained across clinical, pharmacy, and procurement departments [11–13].
Source Population and Procedures
We identified all consecutive rotational atherectomy procedures performed for femoropopliteal peripheral arterial disease between 1 January 2020 and 31 December 2023 through cross-referencing three independent databases.
Inclusion criteria specified: (1) use of rotational atherectomy systems (Jetstream™ [Boston Scientific] or Phoenix™ [Medtronic/Medstent]); (2) treatment of native femoropopliteal lesions; (3) complete procedural documentation available across all thirteen source domains; and (4) minimum 12-month follow-up completed or censored.
Exclusion criteria comprised:
(1) in-stent restenosis treatment (n = 8); (2) concurrent acute limb ischaemia with thrombectomy (n = 4); (3) incomplete source documentation (n = 2); and (4) duplicate procedural entries (n = 1). Of 191 initially identified procedures, 176 met inclusion criteria and comprised the final analytical cohort representing 168 unique patients.
Thirteen-Source Documentation Framework
We systematically categorised all available documentary sources into six validation domains based on data generation mechanisms and independence characteristics (Table 1). The framework integrated thirteen distinct source types:
Table 1. The Thirteen-Source Multi-Level Validation Framework

| Source # | Source Name | Format | Generation Timing | Primary Personnel | Independence Level |
|---|---|---|---|---|---|
| 1 | Operating room logbook | Paper | Intraoperative | Circulating nurses | High |
| 2 | Post-operative notes | Digital (VitalMex) | < 5 min post-procedure | Primary operator | Medium |
| 3 | Anaesthesia flowsheets | Paper | Intraoperative | Anaesthesiologist | High |
| 4 | Operative reports | Digital (SIME) | Same day | Primary operator | Medium |
| 5 | SIMEH diagnosis codes | Digital | 5–10 days post-discharge | HIM coders | Very High |
| 6 | Material requisition codes | Digital | 5–10 days post-discharge | HIM coders | Very High |
| 7 | ICD-10-CM procedure codes | Digital | 5–10 days post-discharge | HIM coders | Very High |
| 8 | PACS metadata + local backup | Digital | Intraoperative (automated) | Fluoroscopy system | Very High |
| 9 | Angiographic annotations | Digital | Intraoperative | Radiology technologists | High |
| 10 | Pharmacy dispensing logs | Digital | Pre-procedure | Pharmacy personnel | Very High |
| 11 | Manufacturer technical bulletins | PDF (external) | Pre-market | Manufacturer | Absolute |
| 12 | Lot traceability records | Spreadsheet (external) | Monthly | Manufacturer | Absolute |
| 13 | Procurement archives | Paper | Monthly reconciliation | Departmental secretary | Very High |

Independence level: Absolute = completely external; Very High = different department, timing, motivation; High = different personnel, similar timing; Medium = same personnel, different template.
Prospective Clinical Domain (n = 4 sources):
1. Operating room logbook (handwritten, maintained by circulating nurse)
2. Post-operative notes (physician-completed, documented within 5 minutes per institutional protocol, digitised in VitalMex system requiring same-day folio closure)
3. Anaesthesia flowsheets (anaesthesiologist-completed, paper-based, documenting anaesthetic technique and intraoperative haemodynamic parameters including blood pressure values critical for vascular procedure monitoring)
4. Operative reports (surgeon-completed, digitised since 2021, completed same day)

Administrative Coding Domain (n = 3 sources):
5. SIMEH diagnosis codes (completed 5–10 days post-discharge by medical records coders)
6. Material requisition coding (procedure-specific classification)
7. ICD-10-CM procedure coding

Digital Imaging Domain (n = 2 sources):
8. PACS metadata with local hard drive backup (implemented following historical data loss by a previous external vendor contracted by ISSSTE, with departmental maintenance of duplicate archives ensuring data preservation)
9. Angiographic image annotations (embedded text descriptors)

Pharmacy/Supply Domain (n = 1 source):
10. Dispensing logs (pharmacy-maintained, time-stamped, pre-procedure)

Manufacturer Traceability Domain (n = 2 sources):
11. Device technical bulletins (manufacturer-provided specifications)
12. Lot traceability records (unique device identifiers when available)

Institutional Procurement Domain (n = 1 source):
13. Central warehouse procurement archives (monthly reconciliation by departmental secretary, archived in cardboard boxes as duplicates of original files maintained at ISSSTE General Direction)
Each source type was evaluated for three independence criteria: (1) data generation by different personnel; (2) documentation occurring at temporally distinct points in the clinical workflow; and (3) absence of shared data entry interfaces [1, 2] (Fig. 1).
Fig. 1
CONSORT flow diagram showing participant selection and data validation process.
[INSERT FIGURE 1 HERE]
Data Extraction Protocol
Three investigators (DMR, JCMR, AMR) independently extracted device-specific data from all 176 procedures across thirteen sources using standardised data collection forms implemented in REDCap [16, 17]. Extracted variables comprised: device manufacturer (Boston Scientific versus Medtronic/Medstent), specific system model (Jetstream versus Phoenix), and catheter calibre (ranging from 1.6 mm to 3.4 mm). Each investigator completed extractions in randomised source order generated using R statistical software [18].
Investigators remained blinded to extractions performed by other team members throughout the data collection phase (January–March 2024).
AMR, serving as external validator, was based at Hospital Regional Presidente Juárez, Oaxaca City (geographically separate ISSSTE institution) and lacked subspecialty training in vascular surgery, thereby simulating validation scenarios in resource-limited settings where subspecialist expertise may be unavailable [7].
Statistical Analysis
Primary Outcome: Inter-rater reliability was quantified using Cohen's κ coefficient with 95% confidence intervals calculated via bootstrap resampling (10,000 iterations) [7–9]. Standard interpretation thresholds applied: κ < 0.00 (no agreement), 0.00–0.20 (slight), 0.21–0.40 (fair), 0.41–0.60 (moderate), 0.61–0.80 (substantial), 0.81–1.00 (almost perfect) [7, 8]. We considered κ ≥ 0.90 the threshold for "near-perfect" reliability suitable for research-grade data quality.
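The reliability computation above can be sketched in a few lines. The study performed its analyses in R; the following Python illustration uses synthetic rater labels (not study data) to show Cohen's κ and a percentile bootstrap confidence interval of the kind described:

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)                            # observed agreement
    pe = sum(np.mean(a == c) * np.mean(b == c)      # chance-expected agreement
             for c in np.union1d(a, b))
    return (po - pe) / (1 - pe)

def bootstrap_ci(a, b, iters=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa, resampling items with replacement."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    stats = [cohens_kappa(a[idx], b[idx])
             for idx in (rng.integers(0, n, n) for _ in range(iters))]
    return tuple(np.quantile(stats, [alpha / 2, 1 - alpha / 2]))

# Synthetic example: two raters classify 200 items into 3 device categories,
# disagreeing on roughly 5% of items.
rng = np.random.default_rng(1)
truth = rng.integers(0, 3, 200)
r1 = truth.copy()
r2 = truth.copy()
flip = rng.random(200) < 0.05
r2[flip] = (r2[flip] + 1) % 3                       # force a disagreement
k = cohens_kappa(r1, r2)
lo, hi = bootstrap_ci(r1, r2, iters=2000)
```

The paper reports pairwise κ for its three extractors; this two-rater function covers each pairwise comparison, and the percentile bootstrap mirrors the 10,000-iteration approach described above.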
Sample Size Justification
To detect κ ≥ 0.90 with 80% power, assuming a null hypothesis of κ = 0.70 (typical for dual-source methods), α = 0.05 (two-tailed), and three raters, we required a minimum of 154 procedures [10]. Our cohort of 176 procedures provided 92% power for the primary comparison.
Secondary Analyses
We simulated single-source and dual-source extraction scenarios by randomly sampling subsets of our thirteen-source dataset. We systematically excluded individual sources and recalculated overall κ to assess marginal contributions. All data points with non-unanimous investigator agreement underwent structured adjudication following pre-specified hierarchical protocol.
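The source-exclusion analysis can likewise be sketched. The modelling assumptions here are mine, not the paper's: each rater's final label for an item is taken as a majority vote over the available sources, and each source's contribution is the full-framework κ minus the κ recomputed with that source removed (hypothetical helper names, synthetic data):

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)
    pe = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return (po - pe) / (1 - pe)

def consensus(labels_by_source):
    """Majority vote across sources for each item (ties go to the lowest label)."""
    arr = np.asarray(labels_by_source)          # shape: (n_sources, n_items)
    voted = []
    for col in arr.T:
        vals, counts = np.unique(col, return_counts=True)
        voted.append(vals[np.argmax(counts)])
    return np.array(voted)

def leave_one_out_deltas(rater1_sources, rater2_sources):
    """Full kappa, plus the change in kappa when each source is removed."""
    full = cohens_kappa(consensus(rater1_sources), consensus(rater2_sources))
    deltas = {}
    for s in range(len(rater1_sources)):
        keep = [i for i in range(len(rater1_sources)) if i != s]
        k = cohens_kappa(consensus([rater1_sources[i] for i in keep]),
                         consensus([rater2_sources[i] for i in keep]))
        deltas[s] = full - k                    # positive = source helped
    return full, deltas

# Tiny synthetic example: 3 sources, 5 items, per-rater source extractions.
r1 = [[0, 1, 0, 1, 2], [0, 1, 0, 1, 2], [0, 1, 1, 1, 2]]
r2 = [[0, 1, 0, 1, 2], [0, 1, 1, 1, 2], [0, 1, 1, 1, 2]]
full_k, deltas = leave_one_out_deltas(r1, r2)
```

In the study, the analogous loop over the thirteen real sources yields the per-source Δκ values reported in the Results.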
Statistical analyses utilised R software [18] with appropriate packages for inter-rater reliability assessment.
Ethical Considerations
This study received institutional review board approval (Protocol RPI-ISSSTE 2025-0033) with waiver of informed consent pursuant to retrospective design using de-identified data. All procedures were performed per standard clinical protocols independent of research objectives.
RESULTS
Cohort Characteristics
The analytical cohort comprised 176 rotational atherectomy procedures performed on 168 unique patients (94.2% with diabetes mellitus type 2) between January 2020 and December 2023. Median patient age was 68 years (interquartile range 62–74 years), with slight male predominance (58.3%). Device distribution demonstrated 94 Jetstream™ procedures (53.4%) and 82 Phoenix™ procedures (46.6%).
Primary Outcome: Inter-Rater Reliability
The thirteen-source framework achieved an overall inter-rater reliability of κ = 0.97 (95% CI: 0.94–0.99) across 1,377 extracted data points. Pairwise investigator comparisons revealed: DMR versus JCMR κ = 0.98 (95% CI: 0.96–0.99), DMR versus AMR κ = 0.94 (95% CI: 0.90–0.97), JCMR versus AMR κ = 0.95 (95% CI: 0.92–0.98). External validation by AMR demonstrated κ = 0.94 despite absence of subspecialty training, confirming framework reproducibility by non-specialist investigators (Fig. 2).
[INSERT FIGURE 2 HERE]
Of 1,377 total data points, 1,330 (96.6%) demonstrated complete concordance across all three investigators. The remaining 47 discrepant data points (3.4%) occurred predominantly in catheter calibre classification (n = 38, 80.9%), with manufacturer identification (n = 6, 12.8%) and model classification (n = 3, 6.4%) demonstrating near-unanimous agreement.
Comparative Performance Analysis
Bootstrap simulation comparing multi-source integration versus conventional approaches demonstrated substantial reliability improvements (Table 2):
Table 2. Comparative Benchmarking Against Published Reliability Estimates

| Approach | Study | Setting | n | κ | 95% CI | Δκ vs M3 | Relative Difference |
|---|---|---|---|---|---|---|---|
| Single-source | Mi et al. 2013 [4] | Systematic review | Pooled | 0.65 | 0.58–0.72 | −0.32 | −49% |
| Dual-source | van Hoeven 2017 [5] | Netherlands | 1,847 | 0.78 | 0.72–0.84 | −0.19 | −24% |
| Thirteen-source (M3) | Current study | Mexico | 176 | 0.97 | 0.94–0.99 | Reference | Reference |

Δκ = comparator κ minus M3 κ; relative difference = (comparator κ − M3 κ) / comparator κ × 100%; negative values indicate lower reliability than the thirteen-source framework.
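The table arithmetic can be checked directly; this small helper (hypothetical names, not from the paper) reproduces the signed Δκ and relative-difference values from the tabulated κ coefficients:

```python
def delta_kappa(comparator_k, m3_k=0.97):
    """Comparator kappa minus the thirteen-source (M3) kappa."""
    return round(comparator_k - m3_k, 2)

def relative_diff_pct(comparator_k, m3_k=0.97):
    """Relative difference vs M3, as a percentage of the comparator baseline."""
    return round((comparator_k - m3_k) / comparator_k * 100)

# Single-source (Mi et al.):  delta_kappa(0.65) -> -0.32, relative_diff_pct(0.65) -> -49
# Dual-source (van Hoeven):   delta_kappa(0.78) -> -0.19, relative_diff_pct(0.78) -> -24
```

Read the other way round, the same figures give the improvements quoted in the text: (0.97 − 0.65) / 0.65 ≈ 49% over single-source and (0.97 − 0.78) / 0.78 ≈ 24% over dual-source extraction.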
Single-source extraction: mean κ = 0.65 (95% CI: 0.58–0.72); the thirteen-source framework represented a 49% relative improvement (p < 0.001)
Dual-source extraction: mean κ = 0.78 (95% CI: 0.72–0.84); the thirteen-source framework represented a 24% relative improvement (p < 0.001)
Source-Specific Contribution Analysis
Systematic source exclusion analysis identified five "high-impact" sources demonstrating Δκ > 0.05 upon removal (Table 3): pharmacy dispensing logs (Δκ = 0.11), manufacturer technical bulletins (Δκ = 0.09), operative reports (Δκ = 0.08), PACS metadata with local backup (Δκ = 0.07), and procurement archives (Δκ = 0.06).
Table 3. Source-Specific Contributions to Device Data Elements

| Data Element | Primary Source(s) | % Cases With Data | Secondary Sources | Role in Adjudication |
|---|---|---|---|---|
| Device system (Jetstream vs Phoenix) | Source 10 (Pharmacy) | 98.3% | Sources 2, 4, 12 | Definitive verification |
| Device manufacturer | Source 10 (Pharmacy) | 98.3% | Sources 11, 12 | Acronym resolution |
| Device model | Sources 10, 12 | 97.1% | Sources 2, 3, 4 | Model nomenclature |
| Crown size (mm) | Sources 2, 3, 4 | 95.4% | Sources 10, 11 | Sequential sizing |
| Lot number | Sources 10, 12, 13 | 87.5% | Source 3 | Traceability verification |
| Procedure date | All sources | 100% | – | Cross-validation timestamp |
| Primary operator | Sources 1, 2, 4 | 100% | Sources 8, 9 | Personnel verification |
Figure legends

Fig. 1 legend. Initial institutional database query identified 203 procedures coded as "directional atherectomy" between January 2020 and December 2023. After excluding 27 procedures (18 using true directional atherectomy devices, 9 with incomplete data), 176 rotational atherectomy procedures were included in the final analysis, representing 168 unique patients (8 patients underwent multiple procedures). Device distribution: 142 (80.7%) Jetstream (Boston Scientific) and 34 (19.3%) Phoenix (Medtronic/Medstent). The 13-source validation workflow achieved Cohen's κ = 0.97 (95% CI: 0.94–0.99) with 100% device identification accuracy across 5,440 triangulated data points.

Fig. 2 legend. Schematic diagram of thirteen-source validation workflow showing data flow from independent sources through parallel extraction into separate REDCap databases. The visual representation displays how three independent investigators extracted data from thirteen documentary sources organised into six validation domains: prospective clinical documentation (blue), administrative coding records (green), digital imaging archives (yellow), pharmacy and supply chain (purple), manufacturer device traceability (pink), and institutional procurement systems (grey). Arrows indicate information flow through automated three-way comparison and systematic adjudication. Final validation metrics: 5,440 procedure-data points extracted, 96.6% initial concordance (n = 5,257), 3.4% discrepancies resolved through adjudication (n = 183), achieving Cohen's κ = 0.97 (95% CI: 0.94–0.99).

Fig. 3 legend. Cohen's κ coefficients with 95% confidence intervals demonstrate substantial improvement of the thirteen-source triangulation framework (κ = 0.97, 95% CI: 0.94–0.99) compared to published benchmarks: Mi et al. 2013 single-source electronic health record extraction (κ = 0.65) and van Hoeven et al. 2017 dual-source validation combining administrative and clinical data (κ = 0.78). Vertical dashed reference lines at κ = 0.60 and κ = 0.81 indicate Landis & Koch thresholds for "substantial" and "almost perfect" agreement, respectively. The thirteen-source approach substantially exceeds both benchmark comparators and achieves near-perfect reliability comparable to advanced integrated registry systems whilst providing a replicable framework for resource-limited settings.
Discrepancy Resolution and Adjudication
All 47 discrepant data points underwent systematic adjudication, achieving 100% resolution through hierarchical evidence evaluation. Median adjudication time was 8.3 minutes per discrepant data point, corresponding to 6.5 hours total investigator time for complete cohort resolution.
Device Identification Completeness
The thirteen-source framework achieved complete device identification (manufacturer, model, calibre) in 176/176 procedures (100%). This completeness advantage likely reflects the framework's inherent redundancy: 94.3% of procedures had device data documented in ≥ 8 distinct sources, providing multiple independent verification pathways.
DISCUSSION
Principal Findings
This study demonstrates that systematic multi-source data triangulation enables transitional healthcare systems to achieve near-perfect inter-rater reliability (κ = 0.97) for medical device identification, substantially exceeding benchmarks from published validation studies [4, 5]. Three key findings merit emphasis.
First, documentation multiplicity, conventionally perceived as methodological limitation in resource-limited settings, can be leveraged as asset through structured integration protocols. Our thirteen-source framework demonstrated 49% reliability improvement over single-source methods and 24% improvement over dual-source approaches, with bootstrap analysis confirming statistical significance (both p < 0.001).
Second, external validation by a non-specialist investigator (AMR) from a geographically separate institution (Hospital Regional Presidente Juárez, Oaxaca City) achieved κ = 0.94, confirming framework reproducibility without subspecialty expertise [6, 7]. This finding addresses a critical barrier to research participation in resource-limited settings.
Third, source-specific contribution analysis revealed that pharmacy dispensing logs and manufacturer technical bulletins provided disproportionate reliability improvements (Δκ = 0.11 and 0.09 respectively), suggesting that enhanced integration of supply chain documentation could further optimise registry architectures globally [37].
Comparison with Published Literature
Our κ = 0.97 substantially exceeds inter-rater reliability coefficients reported by validation studies of medical record abstraction. Mi et al.'s systematic review reported pooled estimates of κ = 0.65 for single-source extraction [4], whilst van Hoeven et al. demonstrated κ = 0.78 for dual-source approaches [5]. Gianinazzi et al. reported κ = 0.76 for medical record abstraction in paediatric oncology follow-up [6], demonstrating that moderate reliability persists even in well-resourced settings when documentation remains fragmented.
This superior performance likely reflects three framework characteristics: (1) systematic cross-validation across independent documentation streams reduces correlated errors inherent in single-source systems [15]; (2) integration of supply chain sources provides manufacturer-verified device specifications absent from purely clinical documentation [37]; and (3) hierarchical adjudication protocols enable definitive resolution of discrepant data points through evidence triangulation.
Methodological Considerations and Limitations
Several methodological strengths warrant acknowledgement. First, our three-investigator design with blinded extraction and external validation provides robust reliability assessment [1, 2]. Second, bootstrap analysis with 10,000 iterations yields precise confidence interval estimation. Third, systematic source-specific contribution analysis enables evidence-based framework optimisation [28].
However, important limitations merit discussion. Our single-institution implementation limits generalisability, particularly to settings with substantially different documentation architectures. The retrospective design introduced potential selection bias, as procedures with incomplete documentation (n = 2) were necessarily excluded. Our analytical cohort comprised rotational atherectomy procedures exclusively, limiting conclusions regarding framework applicability to other medical device categories. External validation involved a single investigator (AMR) from a single separate institution (Hospital Regional Presidente Juárez, Oaxaca City), potentially limiting reproducibility assessment.
Finally, our study quantified inter-rater reliability as a surrogate for data quality but did not assess ultimate criterion validity against manufacturer shipment records or patient-level device implant verification [19–21].
Practical Implications
Our findings hold immediate practical implications for medical device research in resource-limited settings. The framework's replicability by non-specialist investigators suggests feasibility for collaborative research networks where subspecialty expertise concentrates at hub institutions but data extraction occurs across multiple spoke sites. This model could substantially expand research capacity whilst maintaining methodological rigour. A structured implementation timeline (Fig. 3) demonstrates feasibility for institutions seeking to adopt this framework.
Fig. 3
Forest plot comparing inter-rater reliability across multi-source data extraction approaches.
[INSERT FIGURE 3 HERE]
The identification of high-impact sources provides actionable guidance for registry development prioritisation. Institutions confronting resource constraints might focus initial integration efforts on pharmacy dispensing logs, manufacturer bulletins, operative reports, PACS metadata, and procurement archives, potentially achieving κ > 0.85 whilst deferring lower-impact sources until infrastructure capacity expands.
Cost-effectiveness considerations favour multi-source integration approaches in transitional settings. Whilst our framework required 6.5 hours of investigator time for complete cohort extraction and adjudication, advanced electronic health record implementation typically requires 18–24 months and substantial capital investment [14, 15]. The framework's reliance on existing documentation streams eliminates upfront infrastructure costs whilst providing immediate research capability.
CONCLUSIONS
Systematic multi-source data triangulation enables transitional healthcare systems to achieve near-perfect inter-rater reliability (κ = 0.97) for medical device identification, substantially exceeding benchmarks from published validation studies. Documentation multiplicity, when leveraged through structured protocols, transforms from methodological limitation to asset. The framework demonstrates reproducibility by non-specialist investigators from geographically separate institutions and achieves complete device identification in 100% of procedures.
These findings challenge prevailing assumptions that advanced digital infrastructure constitutes prerequisite for rigorous medical device research. Resource-limited settings can generate research-grade data quality through systematic integration of existing documentation streams, democratising capacity for post-market surveillance, comparative effectiveness research, and quality improvement initiatives.
Declarations
Ethics Approval and Consent to Participate
This study received approval from the ISSSTE Research Ethics Committee, protocol number RPI-ISSSTE-2025-0033, with waiver of informed consent per retrospective design using de-identified data.
Consent for Publication
Not applicable.
Data Availability
Datasets are available from the corresponding author upon reasonable request, subject to institutional data sharing agreements.
Competing Interests
Two authors (DMR and AMR) are siblings. AMR served exclusively as external validator from geographically separate institution (Hospital Regional Presidente Juárez, Oaxaca City), with inter-rater calculations performed independently by senior statistician (DHL). All data extraction protocols were pre-specified and blinded. No other competing interests exist.
Funding
No funding received.
Author Contribution
DMR: Conceptualisation, methodology, investigation, formal analysis, writing—original draft, project administration. JCMR: Methodology, investigation, data curation, writing—review and editing. AMR: Investigation (external validation), writing—review and editing. CELM: Investigation, data curation, writing—review and editing. GDTA: Resources, writing—review and editing, supervision. DHL: Formal analysis, writing—review and editing, supervision. All authors read and approved the final manuscript.
Acknowledgements
We thank medical records, pharmacy, procurement, and anaesthesiology department staff at Hospital Regional General Ignacio Zaragoza for assistance in locating archived documentation and maintaining specialised care protocols for vascular surgery patients. We particularly acknowledge the departmental secretary responsible for monthly archive reconciliation.
References
1. Benchimol EI, Smeeth L, Guttmann A, Harron K, Moher D, Petersen I, et al. The REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. PLoS Med. 2015;12(10):e1001885.
2. von Elm E, Altman DG, Egger M, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453–7.
3. US Food and Drug Administration. Framework for FDA's Real-World Evidence Program. Silver Spring, MD: FDA; 2018.
4. Mi MY, Sun Y, Liu Y, Li H, Zhou L, Gong Y, et al. Reliability of medical record abstraction by nonphysicians for chronic disease research: a systematic review. BMC Med Res Methodol. 2013;13:132.
5. van Hoeven LR, Janssen MP, Roes KC, Koffijberg H. Validation of multisource electronic health record data: an application to blood transfusion data. BMC Med Inform Decis Mak. 2017;17:107.
6. Gianinazzi ME, Essig S, Rueegg CS, von der Weid NX, Niggli FK, Kuehni CE, et al. Intra-rater and inter-rater reliability of a medical record abstraction study of transition of care after childhood cancer. PLoS ONE. 2015;10(5):e0124290.
7. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.
8. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276–82.
9. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46.
10. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. 3rd ed. New York: Wiley; 2003.
11. Bagolle A, Cañizares A, Zárate V. Regulatory frameworks for digital health in Latin American and the Caribbean: electronic health records: progresses and next steps. Washington, DC: Inter-American Development Bank; 2020.
12. Bernal O, Forero JC, Forde I. Digital transformation of the health sector in Latin America and the Caribbean. Washington, DC: Inter-American Development Bank; 2019.
13. López-Valenzuela CL, Ortega-Villa EM, Robles-Franco P, Rivas-Ruiz R, Galván-Plata ME, Castañeda-Alcántara JL, et al. Healthcare information systems in Mexico: description and analysis at national level. BMC Med Inform Decis Mak. 2020;20(1):316.
14. Kruse CS, Stein A, Thomas H, Kaur H. The use of electronic health records to support population health: a systematic review of the literature. J Med Syst. 2018;42(11):214.
15. Sheikh A, Cornford T, Barber N, Avery A, Takian A, Lichtner V, et al. Implementation and adoption of nationwide electronic health records in secondary care in England: final qualitative results from prospective national evaluation in early adopter hospitals. BMJ. 2011;343:d6054.
16. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81.
17. Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O'Neal L, et al. The REDCap consortium: building an international community of software platform partners. J Biomed Inform. 2019;95:103208.
18. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2021.
19. Resnic FS, Gross TP, Marinac-Dabic D, Loyo-Berrios N, Donnelly S, Normand SL, et al. Automated surveillance to detect postprocedure safety signals of approved cardiovascular devices. JAMA. 2010;304(18):2019–27.
20. Normand SL, Landrum MB, Guadagnoli E, Ayanian JZ, Ryan TJ, Cleary PD, et al. Validating recommendations for coronary angiography following acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol. 2001;54(4):387–98.
21. Benchimol EI, Manuel DG, To T, Griffiths AM, Rabeneck L, Guttmann A. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol. 2011;64(8):821–9.
22. Sarrazin MS, Rosenthal GE. Finding pure and simple truths with administrative data. JAMA. 2012;307(13):1433–5.
23. Dean BB, Lam J, Natoli JL, Butler Q, Aguilar D, Nordyke RJ. Review: use of electronic medical records for health outcomes research: a literature review. Med Care Res Rev. 2009;66(6):611–38.
24. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, et al. Data resource profile: Clinical Practice Research Datalink (CPRD). Int J Epidemiol. 2015;44(3):827–36.
25. Casey JA, Schwartz BS, Stewart WF, Adler NE. Using electronic health records for population health research: a review of methods and applications. Annu Rev Public Health. 2016;37:61–81.
26. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform. 2015;216:574–8.
27. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc. 2012;19(1):54–60.
A
28.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
A
29.
Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399–424.
A
30.
Stürmer T, Rothman KJ, Avorn J, Glynn RJ. Treatment effects in the presence of unmeasured confounding: dealing with observations in the tails of the propensity score distribution—a simulation study. Am J Epidemiol. 2010;172(7):843–54.
31.
Sherman RE, Anderson SA, Dal Pan GJ, Gray GW, Gross T, Hunter NL, et al. Real-world evidence—what is it and what can it tell us? N Engl J Med. 2016;375(23):2293–7.
A
32.
Makady A, de Boer A, Hillege H, Klungel O, Goettsch W. (on behalf of GetReal Work Package 1). What is real- world data? A review of definitions based on literature and stakeholder interviews. Value Health. 2017;20(7):858–65.
A
33.
Blonde L, Khunti K, Harris SB, Meizinger C, Skolnik NS. Interpretation and impact of real-world clinical data for the practicing clinician. Adv Ther. 2018;35(11):1763–74.
34.
Dreyer NA, Bryant A, Velentgas P. The GRACE checklist: a validated assessment tool for high quality observational studies of comparative effectiveness. J Manag Care Spec Pharm. 2016;22(10):1107–13.
A
35.
Berger ML, Sox H, Willke RJ, Brixner DL, Eichler HG, Goettsch W, et al. Good practices for real-world data studies of treatment and/or comparative effectiveness: recommendations from the joint ISPOR-ISPE Special Task Force on real-world evidence in health care decision making. Pharmacoepidemiol Drug Saf. 2017;26(9):1033–9.
A
36.
Wang SV, Schneeweiss S, Berger ML, Brown J, de Vries F, Douglas I, et al. Reporting to improve reproducibility and facilitate validity assessment for healthcare database studies V1.0. Pharmacoepidemiol Drug Saf. 2017;26(9):1018–32.
A
37.
European Medicines Agency. Guideline on good pharmacovigilance practices (GVP): Module VI – collection, management and submission of reports of suspected adverse reactions to medicinal products (Rev 2). London: EMA; 2017.
A
38.
Jarow JP, LaVange L, Woodcock J. Multidimensional evidence generation and FDA regulatory decision making: defining and using real-world data. JAMA. 2017;318(8):703–4.
A
39.
Collins R, Bowman L, Landray M, Peto R. The magic of randomization versus the myth of real-world evidence. N Engl J Med. 2020;382(7):674–8.
A
40.
Franklin JM, Schneeweiss S. When and how can real world data analyses substitute for randomized controlled trials? Clin Pharmacol Ther. 2017;102(6):924–33.
Tables
Abbreviations:
SIME: Sistema Institucional de Morbimortalidad y Egresos
SIMEH: Sistema de Información Médica
HIM: Health Information Management
PACS: Picture Archiving and Communication System
ICD-10-CM: International Classification of Diseases, 10th Revision, Clinical Modification
Abstract
Background: Medical device registries in transitional healthcare systems face substantial challenges achieving reliable data extraction due to documentation fragmentation and the absence of integrated electronic health records. Conventional single-source or dual-source extraction methods demonstrate moderate inter-rater reliability (κ=0.50–0.70), limiting research validity. We hypothesised that systematic multi-source data triangulation could achieve near-perfect reliability comparable to advanced registry systems whilst providing a replicable framework for resource-limited settings.
Methods: We conducted a retrospective methodological validation study of 176 rotational atherectomy procedures performed between January 2020 and December 2023 at Hospital Regional General Ignacio Zaragoza, Mexico. Three independent, blinded investigators extracted device-specific data from thirteen documentary sources organised into six validation domains. Inter-rater reliability was assessed using Cohen's κ with 95% confidence intervals. External validation was performed by an investigator from a geographically separate institution.
Results: The thirteen-source framework achieved overall inter-rater reliability of κ=0.97 (95% CI: 0.94–0.99), with 96.6% concordance across three extractors. External validation demonstrated κ=0.94, confirming reproducibility without subspecialty expertise. Complete device identification was achieved in 100% of procedures. Comparative bootstrap analysis revealed a 49% improvement over single-source extraction (κ=0.65, p<0.001) and a 24% improvement over dual-source methods (κ=0.78, p<0.001).
Conclusions: Systematic multi-source data triangulation enables transitional healthcare systems to achieve research-grade inter-rater reliability exceeding advanced registry benchmarks. Documentation multiplicity, when leveraged through structured protocols, transforms from methodological limitation to asset.
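The abstract's headline statistics rest on Cohen's κ with bootstrap confidence intervals. The study's analyses were run in R (reference 18); as a minimal illustration of the underlying arithmetic, the following dependency-free Python sketch computes κ for two raters and a percentile bootstrap CI. The function names (`cohens_kappa`, `bootstrap_ci`) and the toy rating vectors are illustrative assumptions, not the study's code or data.

```python
import random
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same items on nominal categories."""
    n = len(r1)
    # Observed proportion of agreement
    po = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement expected from each rater's marginal category frequencies
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))
    if pe == 1.0:  # both raters used a single identical category
        return 1.0
    return (po - pe) / (1 - pe)

def bootstrap_ci(r1, r2, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for kappa, resampling items with replacement."""
    rng = random.Random(seed)
    n = len(r1)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohens_kappa([r1[i] for i in idx], [r2[i] for i in idx]))
    stats.sort()
    return (stats[int(alpha / 2 * n_boot)],
            stats[int((1 - alpha / 2) * n_boot) - 1])

# Illustrative use: perfect agreement gives kappa = 1.0
print(cohens_kappa(['a', 'a', 'b', 'b'], ['a', 'a', 'b', 'b']))  # 1.0
```

The same resampling scheme, applied to the difference in κ between two extraction methods, is one standard way to obtain the kind of single-source vs. multi-source comparison the abstract reports.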