Structuring Trust: A Quantitative and Traceable Framework for Hardware Security Assurance
Shao-Fang Wen 1,2,* and Arvind Sharma 1
1 Department of Information Security and Communication Technology, Norwegian University of Science and Technology, 2815 Gjøvik, Norway
2 Department of Business and IT, University of South-Eastern Norway, 3800 Bø, Norway
* Correspondence: shao-fang.wen@ntnu.no
Abstract
The security assurance of hardware systems is increasingly critical in connected and safety-sensitive infrastructures, where compromised components can endanger human safety or disrupt essential services. However, current assurance practices remain largely qualitative and fragmented across overlapping standards, leading to duplicated evaluation efforts and inconsistent interpretations of system trustworthiness. This paper introduces a structured, multi-level framework that links security requirements to verification evidence through six traceable layers, enabling reproducible and partially quantitative assessment of hardware assurance. The framework supports lifecycle reasoning and transparent traceability, allowing assurance arguments to be consolidated across complex, multi-component systems. Its applicability is demonstrated through a wireless fingertip oximeter used in healthcare infrastructure, illustrating how traceable evidence and quantitative scoring can provide measurable subsystem evaluation and diagnostic insight into system-level resilience.
Keywords: Hardware security assurance; Quantitative assurance; Security metrics; Security evaluation
1. Introduction
The security assurance of hardware-based systems has become a critical concern across safety-critical and connected domains. Medical devices, industrial controllers, and embedded platforms increasingly underpin critical infrastructures whose disruption can affect public safety, healthcare delivery, or industrial continuity. These systems often operate in environments exposed to persistent cyber threats, while regulators demand robust evidence that such components can be trusted to perform securely and reliably [1–3]. Failures in hardware assurance can therefore have cascading consequences, from compromised patient safety in medical contexts to large-scale disruptions across interconnected infrastructure systems. In this setting, assurance is not merely a matter of compliance but of establishing justified confidence that systems have been designed, implemented, and validated against recognized threats [4].
Yet existing assurance practices struggle to provide this confidence in a consistent and transparent way. Many evaluations still rely on qualitative judgments or assessor-dependent reasoning, which introduces subjectivity and undermines reproducibility [5, 6]. Security standards provide authoritative requirements, but when applied in isolation they often overlap, leave gaps, or even contradict one another [7, 8]. Evaluators consequently face duplicated effort and uncertainty about whether assurance outcomes represent a coherent and defensible view of system trustworthiness [9]. This challenge becomes more critical in the context of interconnected infrastructures, where hardware components form part of global supply chains and multi-vendor ecosystems. Defining the scope of evaluation is particularly difficult when system elements are distributed across suppliers, design regions, and fabrication processes [10, 11]. These uncertainties are amplified in distributed integrated-circuit development and component provenance tracking. Finally, most current frameworks remain qualitative, offering little analytical foundation for comparing subsystem assurance or consolidating results into a traceable, system-level perspective [12].
Recent academic contributions have sought to address aspects of these limitations through quantitative metrics [12, 13], hardware-centric evaluation frameworks [14], and leakage quantification techniques [15]. However, these efforts often remain narrowly focused on specific device types or threat models and lack the generalizability required for system-wide assurance across diverse infrastructure domains. Taken together, these gaps reveal four persistent challenges in achieving traceable and reproducible hardware assurance.
First, abstract objectives must be systematically refined into measurable evidence to avoid vague or subjective claims [16–18]. Second, the complexity and heterogeneity of hardware systems, particularly those integrated into interconnected infrastructures, demand explicit and defensible evaluation boundaries that remain meaningful at both component and system levels [19]. Third, authoritative requirements from overlapping standards must be integrated into a traceable and coherent structure while also addressing assurance gaps where existing standards are silent, such as supply chain authenticity, lifecycle obsolescence, and emerging microarchitectural threats [20]. Finally, assurance must evolve beyond qualitative reasoning by adopting analytic methods that enable measurement, comparison across subsystems, and aggregation into a system-level view of infrastructure resilience [21, 22].
As a direct response to these challenges, this paper proposes a traceable, multi-level framework for hardware security assurance that supports consistent and evidence-based reasoning across interconnected systems. The framework is grounded in four design pillars (progression, scoping, traceability, and analytics) and organizes assurance into six levels, from scope definition to verification, to ensure transparency, scalability, and reproducibility. Its application is demonstrated through a safety-critical case study involving a wireless fingertip oximeter, a foundational element of healthcare infrastructure. The framework is further extended with a quantitative methodology that enables measurable subsystem evaluation and aggregation into system-level assurance, thereby exposing residual risks and contributing to overall infrastructure resilience.
The contributions of this paper are threefold. First, it conceptualizes a multi-level assurance framework that operationalizes the principles of progression, scoping, and analytic assurance while integrating authoritative standards and extending them to cover neglected areas such as supply-chain authenticity, lifecycle resilience, and emerging microarchitectural threats. Second, it demonstrates the framework through a medical-infrastructure case study, showing how assurance artifacts can be generated across all levels and where the framework adds value beyond standards-based compliance. Third, it introduces a quantitative scoring and aggregation methodology that transforms assurance evidence into measurable and diagnostic outputs, offering insight into subsystem coverage and system-level dependability that can inform infrastructure protection and regulatory assurance processes.
The remainder of the paper is structured as follows. Section 2 reviews related work and situates the study in existing assurance practices. Section 3 introduces the design pillars, while Section 4 presents the framework architecture. Section 5 outlines the evaluation methodology, and Section 6 reports the case study results. Section 7 extends the framework with a quantitative methodology, and Section 8 discusses contributions, practical implications, limitations, and future work. Section 9 concludes the paper.
2. Theoretical Background
2.1 Security Assurance in Hardware Systems: Definition and Challenges
Security assurance refers to a disciplined process of evaluating and substantiating the trustworthiness of a system, with the objective of demonstrating that defined security requirements are satisfied and that the system can resist relevant threats across its lifecycle [23, 24]. More specifically, it reflects the degree of confidence that these requirements are fulfilled in a substantive manner, while any residual vulnerabilities are either explicitly accepted within an agreed risk tolerance or mitigated through verifiable countermeasures [25, 26]. In practice, assurance means more than demonstrating compliance; it requires showing that a system is effectively protected against threats, weaknesses, and attacks by maintaining appropriate safeguards [21]. At its core, assurance provides justified confidence, supported by structured evidence and defensible reasoning that connect high-level security objectives to observable verification outcomes [20].
When applied to hardware, assurance extends beyond software correctness to encompass the physical and architectural properties of devices. It requires evaluation across multiple layers, from printed circuit boards and chips to firmware, software, and operational environments [27]. Key features include tamper resistance, secure boot, cryptographic acceleration, vulnerability management, and incident response, all of which contribute to resilience against advanced attacks. Hardware assurance must also account for physical exposure, subsystem heterogeneity, extended lifecycles, and supply chain risks. Recent studies highlight these imperatives: Mishra and Sahay [28] survey emerging threats such as fault injection, voltage glitching, and speculative execution flaws, while NIST IR 8517 [29] catalogs failure scenarios tied to integration and lifecycle weaknesses. Taken together, these findings underscore that hardware assurance must integrate both logical and physical dimensions to provide enduring trustworthiness.
While this broader scope defines what hardware assurance entails, its practical realization is constrained by four recurring challenges. First, hardware is uniquely exposed to physical attacks. Adversaries can manipulate voltage, clock signals, or electromagnetic fields to induce faults, and exploit side channels such as power consumption and timing to extract secrets [30, 31]. Addressing these threats requires assurance activities that extend beyond code review into fault injection testing, environmental analysis, and physical validation.
Second, hardware platforms are increasingly heterogeneous, combining processors, secure elements, networking interfaces, and firmware controllers within a single device. Preserving subsystem context while reasoning at the system level is critical, since vulnerabilities often emerge at integration boundaries. Studies emphasize that secure composition demands explicit support for cross-component threats [32, 33], and methods such as layered fault-tree analysis demonstrate how assurance can span hardware and software layers [34].
Third, hardware devices often operate under extended lifecycles. Medical, automotive, and industrial platforms may remain in service for decades, during which threats, cryptographic standards, and regulations evolve. This creates assurance challenges around secure updates, obsolescence, and long-term trust. Studies reveal widespread vulnerabilities in end-of-life devices [34], while frameworks for obsolescence management [35] and regulatory guidance on legacy medical devices [36] stress the need for continuous assurance across the total product lifecycle.
Finally, hardware assurance involves specialized verification methods beyond traditional testing. Techniques such as side-channel analysis, fault injection, and formal verification (e.g., SecVerilog) are required to validate critical properties under adversarial conditions [37]. Yet aligning such diverse forms of evidence into a coherent assurance argument remains unresolved.
These difficulties collectively demonstrate the need for frameworks that can capture subsystem context, integrate it into system-level reasoning, and account for the physical, lifecycle, and verification constraints of hardware assurance. The following subsections examine how current standards and academic work address these problems and why further structuring is still required.
2.2 Role of Standards
Standards form the authoritative foundation for hardware security assurance, setting expectations for the protections and controls required in regulated domains. For example, FIPS 140-3 specifies cryptographic module protections such as tamper resistance, key management, and lifecycle controls [38]. IEC 62304 governs the software lifecycle of medical devices but inevitably extends into the hardware–software interface, where embedded controllers and processors are safety-critical [39]. In the medical domain, ISO 14971 prescribes risk management practices [40], while UL 2900-2-1 defines cybersecurity requirements for network-connectable healthcare products [41]. At a broader organizational level, ISO/IEC 27001 establishes information security management principles that also influence hardware assurance [42]. Together, these standards define the baseline requirements against which security claims are evaluated.
Although authoritative, security standards have evolved largely in isolation, leading to fragmentation, overlaps, and gaps in coverage. For example, mechanisms such as secure boot and cryptographic validation appear in multiple standards but are described with inconsistent terminology and varying rigor, complicating reconciliation in practice. At the same time, important risks are under-specified: supply chain authenticity, lifecycle obsolescence, and resilience to microarchitectural attacks are unevenly treated or absent altogether. Therefore, complete assurance cannot be guaranteed by merely adhering to one or even multiple standards.
The history of hardware vulnerabilities reinforces this point. Side-channel exploits such as Spectre and Meltdown revealed weaknesses in speculative execution that were not previously considered in standards [43]. Likewise, CWE-1247 documents fault injection vulnerabilities from voltage and clock manipulation, showing that certain hardware-level threats extend beyond the scope of existing requirements [44]. Academic studies echo these limitations, noting that standards often function as baselines that are necessarily reactive and incomplete.
In this paper, standards are treated as critical inputs to assurance rather than as complete solutions. They provide the authoritative requirements that must be respected, but the diversity and fragmentation of these documents mean they must be consolidated, reconciled, and aligned to form a coherent assurance argument. This motivates the need for a structured approach, introduced later in this work, that organizes heterogeneous standards into a unified requirements set while preserving their provenance and supporting evidence-based evaluation.
2.3 Related Academic Work
Research into security assurance has yielded a variety of frameworks, methodologies, and surveys, but their coverage of hardware remains fragmented. Several reviews emphasize this gap. Yaacoub et al. [45] highlight that cyber–physical systems security research often addresses narrow aspects such as encryption or network security, without unifying them into a broader assurance methodology. Similarly, Shukla et al. [5] identify that existing assurance approaches for ICT and CPS are either highly abstract, offering conceptual models with limited operational guidance, or highly specific, focusing on isolated domains such as microcontrollers or IoT devices. Both reviews converge on the point that a unified, transferable framework is lacking.
Table 1 summarizes representative academic contributions mapped to key assurance dimensions. At the foundational level, research has examined mechanisms such as tamper resistance, secure boot, cryptographic acceleration, and incident response. Other work has addressed subsystem heterogeneity, proposing compositional methods for preserving security across hardware/software layers. Lifecycle trust has been studied through frameworks for obsolescence management and regulatory guidance on total product lifecycle security in domains such as medical and industrial devices. Verification-oriented contributions include formal methods for information-flow security and systematic adversarial testing against side-channel and fault-injection attacks. More recently, studies have explored quantitative and AI-driven approaches, including resilience scoring, algorithms for quantitative attack-tree analysis, hardware-centric assurance frameworks, and automated traceability techniques.
While these contributions represent significant progress within individual assurance dimensions, they remain fragmented and lack integration. In particular, no existing academic framework consolidates heterogeneous standards, preserves provenance across objectives, requirements, and verification, and enables quantitative aggregation. Addressing this gap is the focus of the framework developed in this paper.
Table 1
Summary of representative academic work in hardware security assurance
| Assurance Dimension | Representative Mechanisms / Focus | Representative Academic Work | Contribution |
|---|---|---|---|
| Foundational Security Mechanisms | Tamper resistance, secure boot, cryptographic acceleration, vulnerability management, risk assessment, incident response | [46], [47], [48], [49], [50], [51] | Hardware Trojan detection; taxonomies of microarchitectural vulnerabilities; framing assurance against physical attacks |
| Subsystem Heterogeneity | Integration of processors, secure elements, networking interfaces, firmware controllers | [32], [52], [53] | Compositional assurance linking hardware/software layers; preserving subsystem context |
| Lifecycle & Long-Term Trust | Update mechanisms, obsolescence management, maintaining trust across medical/industrial/automotive device lifetimes | [35], [36], [54] | Scenario-based obsolescence management; regulatory guidance on total product lifecycle security |
| Verification & Evidence | Formal verification of information flows; side-channel and fault-injection testing | [15], [37], [55], [56] | Formal methods and systematic adversarial testing to produce reproducible evidence |
| Toward Quantitative & Automated | Quantitative scoring, AI-driven requirement elicitation and traceability | [12], [13], [14], [27], [57], [58], [59], [60] | Hardware security scoring framework; AI/ML support for requirement extraction and assurance automation |
2.4 Boundary-Driven System of Interest Model
A central difficulty highlighted in both standards and academic work is that assurance often lacks a clear and defensible definition of system scope. Without explicit boundaries, evaluations risk either including elements irrelevant to the stated objectives or excluding critical dependencies whose compromise would undermine trustworthiness. To address this, the proposed framework is grounded in the Boundary-Driven System of Interest (SoI) Model, originally introduced by Wen and Katt in the context of software security assurance evaluations [25, 61]. The SoI refers to the specific device, component, or system identified by stakeholders as the focus of assurance activities, with its boundary preventing ambiguity and ensuring that evaluation efforts remain directed and relevant.
The SoI Model used in this research is illustrated in Fig. 1. In its extended form, the model is structured around a Component–Environment–Process (C–E–P) triad. The System Component comprises the core hardware under direct evaluation (e.g., chips, firmware, controllers). The System Environment includes integrated or peripheral elements whose trustworthiness is essential but not the primary focus of evaluation (e.g., networks, power supply). The extension introduces a System Process layer, capturing workflows such as provisioning, manufacturing, update deployment, and incident response. These processes directly affect system trustworthiness throughout the lifecycle, as a compromised provisioning pipeline or insecure update service can bypass otherwise robust technical safeguards.
This process dimension addresses a notable gap in existing frameworks, which tend to emphasize technical artifacts while leaving operational workflows implicit. For example, NIST SP 800-193 provides comprehensive guidance on firmware integrity and recovery but does not address how provisioning or update workflows are secured over time. Similarly, the Trusted Computing Group (TCG) [62] device trust model focuses on hardware roots of trust and cryptographic anchors but does not explicitly integrate supply chain or patching practices. PSA Certified defines baseline goals for IoT security, yet it also abstracts away lifecycle processes, assuming these are managed externally.
By explicitly incorporating System Processes (e.g., provisioning, secure manufacturing, update distribution, patching, auditing), the SoI model captures assurance dependencies that persist beyond the initial design phase. These processes are often exploited by adversaries, for example through compromised provisioning pipelines that create rogue device identities, or through weak update signing and version checks that enable rollback attacks. Their evaluation may combine device-level testing (e.g., confirming that only signed updates are accepted) with vendor-provided evidence such as process documentation, logs, or certifications. Including such processes within the assurance boundary ensures that the taxonomy reflects not only static hardware protections but also the dynamic operational workflows that sustain trustworthiness throughout the device lifecycle.
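To make the device-level check concrete, the following Python sketch shows how an evaluator might exercise an update-acceptance policy with genuine, tampered, forged, and rolled-back images. It is a minimal illustration under stated assumptions, not part of the framework: `device_accepts` is a hypothetical model of the device's update check, and an HMAC key stands in for the vendor's asymmetric signing key so the example runs with the Python standard library only. In a real evaluation the same four cases would be submitted to the physical device through its update interface.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical stand-in: a symmetric HMAC key replaces the vendor's real
# asymmetric signing key so the sketch needs only the standard library.
VENDOR_KEY = b"hypothetical-vendor-key"


@dataclass(frozen=True)
class FirmwareImage:
    version: int
    payload: bytes
    tag: bytes  # HMAC-SHA256 over version || payload


def sign(version: int, payload: bytes, key: bytes = VENDOR_KEY) -> FirmwareImage:
    msg = version.to_bytes(4, "big") + payload
    return FirmwareImage(version, payload, hmac.new(key, msg, hashlib.sha256).digest())


def device_accepts(image: FirmwareImage, installed_version: int) -> bool:
    """Hypothetical model of the device: image must be authentic AND strictly newer."""
    msg = image.version.to_bytes(4, "big") + image.payload
    expected = hmac.new(VENDOR_KEY, msg, hashlib.sha256).digest()
    authentic = hmac.compare_digest(expected, image.tag)
    return authentic and image.version > installed_version


def run_update_tests(installed_version: int = 5) -> dict:
    """Evaluator-side test cases; every value should be True for a passing device."""
    genuine = sign(6, b"fw-v6")
    tampered = FirmwareImage(6, b"fw-v6-modified", genuine.tag)  # payload altered
    forged = sign(6, b"fw-v6", key=b"attacker-key")              # wrong signer
    rollback = sign(4, b"fw-v4")                                 # authentic but older
    return {
        "genuine_accepted": device_accepts(genuine, installed_version),
        "tampered_rejected": not device_accepts(tampered, installed_version),
        "forged_rejected": not device_accepts(forged, installed_version),
        "rollback_rejected": not device_accepts(rollback, installed_version),
    }


print(run_update_tests())
```

The rollback case illustrates why a valid signature alone is not sufficient evidence: the older image is authentic, so only the version check rejects it.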
By making boundaries explicit through the C–E–P triad, the SoI model enables assurance that is focused, auditable, and reproducible. It ensures that evaluators distinguish between what is inside the boundary (and thus subject to verification) and what is outside (and thus assumed or constrained). Moreover, by integrating environmental and process considerations alongside component evaluation, the model supports a holistic understanding of system assurance that goes beyond traditional component-centric approaches.
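The C–E–P boundary can also be recorded as a simple data structure, so that the inside/outside distinction is explicit and machine-checkable. The sketch below is a hypothetical Python model, not the paper's tooling; the `Role`, `Element`, and `SystemOfInterest` names are assumptions, and the element names loosely follow the oximeter case study but are illustrative rather than its actual scope definition.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    COMPONENT = "component"      # core hardware under direct evaluation
    ENVIRONMENT = "environment"  # supporting elements, not the primary focus
    PROCESS = "process"          # lifecycle workflows (provisioning, updates, ...)


@dataclass(frozen=True)
class Element:
    name: str
    role: Role
    in_boundary: bool  # inside: subject to verification; outside: assumed or constrained


@dataclass
class SystemOfInterest:
    name: str
    elements: list = field(default_factory=list)

    def add(self, name: str, role: Role, in_boundary: bool = True) -> None:
        self.elements.append(Element(name, role, in_boundary))

    def by_role(self, role: Role) -> list:
        return [e.name for e in self.elements if e.role is role]

    def verification_targets(self) -> list:
        return [e.name for e in self.elements if e.in_boundary]

    def assumptions(self) -> list:
        return [e.name for e in self.elements if not e.in_boundary]


# Hypothetical scope for illustration only
soi = SystemOfInterest("wireless fingertip oximeter")
soi.add("sensor front-end", Role.COMPONENT)
soi.add("MCU firmware", Role.COMPONENT)
soi.add("BLE radio", Role.COMPONENT)
soi.add("battery supply", Role.ENVIRONMENT)
soi.add("paired smartphone app", Role.ENVIRONMENT, in_boundary=False)
soi.add("firmware update distribution", Role.PROCESS)
soi.add("device provisioning", Role.PROCESS)

print(soi.verification_targets())
print(soi.assumptions())
```

Keeping the role and the boundary flag separate lets an environment element be either verified (inside) or documented as an assumption (outside), which is the distinction the evaluator must make explicit.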
3. Design Pillars
The design of a security assurance framework cannot be arbitrary or ad hoc. It must rest on a set of foundational pillars that capture enduring principles of how assurance should be conceived, justified, and practiced. These pillars reflect insights from established theories and prior research, but also respond to the unique challenges of assuring hardware systems. They represent the intellectual backbone of our philosophy of assurance, distilled through academic practice and honed into four interrelated commitments: progression, boundary and unit-based scoping, traceable standards integration, and analytic assurance.
Figure 2 illustrates these four design pillars and their interrelationships, serving as the conceptual foundation of the proposed framework.
Pillar 1: Progression from Abstract to Concrete
Security assurance must proceed as a logical descent from intent to evidence. In requirements engineering, it is well established that abstract objectives must be systematically refined into measurable and testable conditions to achieve completeness and reproducibility [16]. This principle is echoed in the Goal–Question–Metric paradigm [17], where high-level goals are translated into operational questions and further into measurable metrics. Assurance case theory reinforces this by requiring that claims be explicitly linked to supporting evidence in order to maintain credibility and transparency [18]. Without such operationalization, assurance risks remaining abstract, subjective, or unverifiable.
In hardware security assurance, this means that goals such as “preserve firmware integrity” or “protect cryptographic keys” cannot remain aspirational. They must be decomposed into specific requirements and ultimately verified by tests or measurements. By moving step by step from abstract objectives to concrete validation, assurance becomes reproducible across evaluators and resilient against subjective interpretation. This pillar affirms that credibility in assurance depends on making every claim observable, measurable, and testable.
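The descent from objective to requirement to verification can be represented as a small traceability tree, which also makes it easy to flag claims that never reach concrete evidence. The following Python sketch is illustrative only; the class names, the `unverifiable` helper, and the example objectives and requirements are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Verification:
    method: str          # test, measurement, or inspection performed
    pass_criterion: str  # observable condition that decides the outcome


@dataclass
class Requirement:
    text: str
    verifications: list = field(default_factory=list)


@dataclass
class Objective:
    goal: str
    requirements: list = field(default_factory=list)


def unverifiable(objectives):
    """List claims that stop short of concrete, testable evidence."""
    gaps = []
    for obj in objectives:
        if not obj.requirements:
            gaps.append(f"objective without requirements: {obj.goal}")
        for req in obj.requirements:
            if not req.verifications:
                gaps.append(f"requirement without verification: {req.text}")
    return gaps


# Hypothetical example tree
objectives = [
    Objective("Preserve firmware integrity", [
        Requirement("Reject firmware images without a valid vendor signature", [
            Verification("Submit a tampered image through the update interface",
                         "Image is rejected and the current version keeps running"),
        ]),
    ]),
    Objective("Protect cryptographic keys", [
        Requirement("Keys are not readable over the debug port"),
    ]),
]

print(unverifiable(objectives))
```

Here the key-protection requirement is flagged because no test or measurement has yet been attached to it, which is exactly the kind of aspirational claim this pillar rules out.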
Pillar 2: Boundary- and Unit-Based Scoping
Security assurance requires explicit boundaries and identifiable units of evaluation. System boundary theory emphasizes that the scope of evaluation must clearly distinguish what lies inside and outside the domain of assessment [19, 25, 61]. Without such clarity, evaluations risk overlooking critical dependencies or wasting resources on irrelevant elements. Since assurance problems and verification techniques vary at each level, it is equally crucial that assurance be anchored to a tangible item, be it a chip, firmware, or subsystem.
In hardware security, this principle is evident when assessing a cryptographic module. One must decide whether the scope includes only the chip, the supporting board, or even the lifecycle processes around provisioning. Anchoring the evaluation to a well-defined unit ensures that appropriate methods are applied: side-channel testing at the chip level, secure update validation at the firmware level, and supply-chain integrity checks at the process level. This pillar therefore establishes assurance as both defensible in scope and technically relevant in granularity.
Pillar 3: Traceable Standards Integration
Assurance claims must be anchored in recognized standards to avoid subjective or ad hoc justification. Every requirement and verification step should be traceable to an authoritative source, ensuring credibility for evaluators, regulators, and end-users alike [19, 20]. In security-critical contexts, traceability thus provides the auditable chain of reasoning needed to demonstrate that objectives are not arbitrary but grounded in authoritative obligations.
In hardware contexts, this challenge is amplified by overlapping and fragmented standards. For example, FIPS 140-3 specifies requirements for cryptographic modules [38], IEC 62304 governs lifecycle integrity for medical device software [39], and UL 2900-2-1 defines cybersecurity protections for network-connectable healthcare systems [41]. Each of these standards is authoritative in its domain, but when applied together they often duplicate one another or even conflict. Without a structured approach, evaluators risk performing redundant tests or generating inconsistent results.
Traceable standards integration addresses this problem by methodically combining requirements from multiple standards into a single structure while preserving their provenance. Each criterion is thereby anchored in an authoritative source, so evaluators can show not only what has been tested but also why. In this way, transparent alignment with accepted standards strengthens assurance credibility by turning disparate compliance requirements into a cohesive, defensible assurance argument.
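A minimal sketch of provenance-preserving consolidation follows. It assumes an analyst has already mapped each source clause to a normalized mechanism key, which is the hard, judgment-based step; the code then merges clauses that share a key into one unified requirement while keeping every source. The `consolidate` helper is hypothetical, and the standards and clause identifiers (`Standard A`, `clause-1`, and so on) are placeholders, not real clause references.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceClause:
    standard: str  # name of the source standard
    clause: str    # clause identifier within that standard
    text: str      # requirement wording as stated in the source


def consolidate(mapped):
    """Merge clauses sharing a mechanism key; keep every source for provenance.

    mapped: iterable of (mechanism_key, SourceClause) pairs assigned by an analyst.
    """
    unified = defaultdict(list)
    for key, src in mapped:
        unified[key].append(src)
    return dict(unified)


# Placeholder sources, not real clause references
mapped = [
    ("secure_boot", SourceClause("Standard A", "clause-1", "Verify firmware before execution")),
    ("secure_boot", SourceClause("Standard B", "clause-7", "Boot only authenticated code")),
    ("key_zeroization", SourceClause("Standard A", "clause-4", "Zeroize keys on tamper")),
]

for key, sources in consolidate(mapped).items():
    print(key, [f"{s.standard}:{s.clause}" for s in sources])
```

Two differently worded secure-boot clauses collapse into one requirement that is verified once, yet both sources remain attached, so the evaluator can still answer "which obligation does this test satisfy?".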
Pillar 4: Assurance Analytics
Assurance theory emphasizes that confidence in system trustworthiness must rest not only on prescriptive controls but also on systematic reasoning supported by evidence that can be analyzed, compared, and refined over time [20, 21]. Quantitative approaches strengthen this reasoning by applying computational and mathematical techniques to transform assurance from qualitative claims into analyzable metrics. Such metrics are capable of capturing both strengths and weaknesses in a system’s security posture, supporting structured evaluation and improved decision making [12]. Compared to purely qualitative methods, analytics provide clearer, more coherent representations of assurance, enable comparisons across subsystems, and highlight residual gaps that might otherwise remain hidden [22].
In hardware security, assurance must quantify coverage at the subsystem level, where components from diverse supply chain sources introduce varying risks. Analytics enable consistent evaluation of these subsystems, making strengths, weaknesses, and residual gaps visible. Systemic vulnerabilities, such as coupled attack pathways that only manifest at the system level, are then made visible via aggregated data. By turning unit-level evidence into comparable and measurable insight, analytic assurance transforms evaluation from compliance checking into a diagnostic practice that strengthens resilience.
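As an illustration of unit-level analytics, the sketch below computes a per-unit coverage ratio (requirements verified as satisfied over applicable requirements), a criticality-weighted system value, and the weakest unit. This is a hypothetical, deliberately simple scheme chosen for illustration; it is not the scoring and aggregation methodology of Section 7, and the units, verification results, and weights are made-up example values.

```python
def unit_coverage(results):
    """Share of a unit's applicable requirements verified as satisfied.

    results: dict mapping requirement id -> True (satisfied) / False (not satisfied).
    """
    if not results:
        raise ValueError("unit has no applicable requirements")
    return sum(results.values()) / len(results)


def aggregate(units, weights):
    """Criticality-weighted mean coverage plus the weakest unit."""
    coverage = {u: unit_coverage(r) for u, r in units.items()}
    total_weight = sum(weights[u] for u in coverage)
    system = sum(weights[u] * c for u, c in coverage.items()) / total_weight
    weakest = min(coverage, key=coverage.get)
    return coverage, system, weakest


# Hypothetical units, verification results, and weights
units = {
    "sensor_frontend": {"R1": True, "R2": True},
    "firmware": {"R3": True, "R4": False, "R5": True, "R6": True},
    "ble_radio": {"R7": False, "R8": True},
}
weights = {"sensor_frontend": 1, "firmware": 2, "ble_radio": 2}

coverage, system, weakest = aggregate(units, weights)
print(coverage, round(system, 2), weakest)
```

With these values the three units have coverage 1.0, 0.75, and 0.5, the weighted system value is 0.7, and `ble_radio` is the weakest unit. Reporting the minimum alongside the mean is a design choice: a weighted average alone can mask a single poorly covered unit.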
The four pillars of progression, scoping, traceability, and analytics establish a coherent design philosophy for security assurance. ‘Progression’ connects high-level intent to concrete evidence, ‘Scoping’ defines defensible boundaries, ‘Traceability’ anchors claims in authoritative standards, and ‘Analytics’ extend assurance into measurement and diagnostic insight. Collectively, they provide the conceptual foundation for constructing robust frameworks of hardware security assurance. Table 2 summarizes how each pillar directly responds to the challenges outlined in Section 2, while grounding the response in established assurance theory, thereby making explicit the alignment between identified problems, guiding principles, and their theoretical foundations.
Table 2
Design pillars, addressed challenges, and theoretical foundations
| Design Pillar | Addressed Problem / Challenge | Theoretical Anchor |
|---|---|---|
| Pillar 1: Progression from abstract to concrete | Assurance remains qualitative; objectives expressed vaguely and assessed subjectively, undermining reproducibility. | Requirements engineering refinement [16]; Goal–Question–Metric paradigm [17]; Assurance case theory [18] |
| Pillar 2: Boundary- and unit-based scoping | Hardware systems are heterogeneous; unclear scope leads to overlooked dependencies or wasted effort; assurance varies across levels (chip, firmware, subsystem). | System boundary theory [19]; SoI concepts in assurance practice [25, 61]. |
| Pillar 3: Traceable standards integration | Fragmented, overlapping standards (FIPS, IEC, UL, ISO) create duplication, contradictions, and inconsistency in evaluation. | Traceability research [19]; Assurance argumentation [20]. |
| Pillar 4: Assurance analytics | Predominantly qualitative frameworks lack metrics for coverage, aggregation, and residual risk discovery; supply chain complexity exacerbates this gap. | Assurance reasoning with evidence [21]; Standards integration and traceability [20]; [12, 22] |
4. Framework Architecture
The framework architecture embodies the design philosophy by operationalizing the four pillars into a structured model for conducting hardware security assurance. Figure 3 illustrates the multi-level hardware security assurance framework. It is organized along two complementary dimensions: analytical segments, which capture the functional breadth of the assurance process, and hierarchical levels, which provide the analytical depth needed to move from abstract concerns to verifiable evidence. In this section we describe these segments and levels in detail, showing how assurance reasoning flows from scope definition to verification.
4.1 System Modeling
The first segment of the framework, System Modeling, directly reflects the pillar of boundary- and unit-based scoping. It establishes the foundation for assurance by defining the scope of evaluation and the concrete unit under analysis. Without a precise understanding of what the system is and how it is bounded, subsequent assurance activities risk inconsistency, incompleteness, or misinterpretation. This segment therefore provides the essential grounding on which all later activities depend.
4.1.1 Level 1: Assurance Scope
At the broadest level, assurance requires a clear and defensible delineation of the system boundary. Explicitly identifying what is included and excluded avoids ambiguity, prevents scope creep, and ensures that resulting assurance claims can be reproduced and justified.
The assurance scope is defined using the SoI model introduced in Section 2.4. By applying the Component–Environment–Process (C–E–P) triad, evaluators establish which elements fall inside the boundary and how external dependencies and lifecycle processes influence assurance. This boundary-driven definition shapes both the relevance of applicable requirements and the sufficiency of supporting evidence, forming the foundation for all subsequent levels of the framework.
4.1.2 Level 2: Evaluation Unit
Within the defined scope, assurance must be anchored to a concrete evaluation unit. This is the smallest bounded element of the system for which assurance concerns are coherent and verifiable evidence can be obtained. Anchoring assurance at this granularity creates several advantages. First, it prevents vagueness and overlap. When assurance remains at the system-wide level, requirements often become generic or duplicated across subsystems, making both compliance claims and verification results ambiguous. Units localize responsibility, ensuring that each claim is tied to a clear locus of control. Second, it aligns concerns with appropriate evidence. Different parts of a system give rise to different threats and demand different validation methods: firmware requires code-level integrity checks, wireless modules require protocol inspection, and power subsystems require fault-injection testing. By partitioning the system into distinct units, each requirement can be matched with the evidence modality that is both feasible and convincing. Third, it enhances reproducibility and comparability. Evaluation units provide stable reference points, so different assessors can reach consistent conclusions when applying the same framework. This makes assurance reasoning auditable and defensible across organizational or regulatory contexts. Finally, it enables analytic evaluation. Units can be counted, compared, and aggregated, allowing evaluators to measure coverage, detect redundancy, and identify systemic gaps that may not be visible at the system-wide level.
To avoid arbitrary or assessor-dependent decomposition, the framework employs Assurance-Oriented Hardware Decomposition (AHD). AHD is a structured methodology for defining evaluation units by aligning system breakdown with assurance logic rather than with hardware schematics or assessor preference. The process begins with the C–E–P scope from Level 1, which identifies the relevant components, environments, and processes. From this scope, candidate blocks are enumerated and then assessed against a set of split/merge rules (Table 3). Units are split when they present distinct assurance concerns, require fundamentally different forms of evidence, or are separated by a hard trust boundary such as a communication interface, lifecycle transition, or cryptographic root of trust. Units are merged when they share the same assurance concerns, can be verified with common evidence, or lack a stable interface that would justify independent treatment.
By following these rules, AHD ensures that decomposition is neither too fine-grained, where fragmentation results in redundant requirements and undue verification effort, nor too coarse, where requirements become ambiguous and overlap. Instead, it produces a balanced and defensible set of evaluation units: lean enough to keep assurance practical, yet rich enough to capture the diversity of threats, controls, and evidence modalities that matter. Crucially, AHD transforms decomposition into a reproducible and auditable step. Different evaluators applying the same rules are more likely to arrive at consistent unit definitions, strengthening comparability across assessments and regulatory reviews. In this way, AHD turns what is often a subjective modeling choice into a systematic process that underpins the credibility of the assurance framework.
Table 3
AHD split/merge rules for defining evaluation units
| Rule | Checklist Question | If Yes → Action | If No → Action |
| --- | --- | --- | --- |
| Functional Cohesion | Does the component perform multiple independent security-relevant functions? | Split into separate units | Keep as a single unit |
| Interface Boundaries | Does the component expose distinct external interfaces with independent attack surfaces? | Split by interface | Keep unified |
| Implementation Technology | Do the subcomponents rely on fundamentally different technologies (e.g., hardware logic vs. firmware)? | Split to capture technology-specific vulnerabilities | Keep merged |
| Assurance Dependency | Is one subcomponent’s assurance inseparable from another (e.g., cryptographic core and key manager)? | Merge into a single unit | Keep separate |
| Evidence Availability | Can evidence (e.g., tests, certification artifacts) be collected independently for each subcomponent? | Split if evidence sets are distinct | Merge if evidence cannot be separated |
| Criticality | Do parts of the component differ significantly in criticality (e.g., safety-critical vs. non-critical)? | Split to isolate critical from non-critical | Keep merged if criticality is uniform |
| Reusability / Commonality | Is the subcomponent reused across multiple products or contexts, requiring standalone assurance? | Split to allow independent assurance | Keep merged if reuse is not expected |
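The AHD checklist can be read as a small decision procedure. The Python sketch below is one hypothetical encoding of it. The rule identifiers and the `ahd_decision` function are illustrative names. The sketch treats a "No" on a split rule as the checklist's default ("keep") rather than as a vote to merge. The checklist does not say how to resolve a block that triggers both split and merge rules, so in that case the sketch returns `review` instead of guessing a precedence.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of the AHD checklist. Each answer maps to an active
# trigger ("split" or "merge") or to None, meaning the checklist's default
# ("keep as is"), which casts no vote of its own.
RULES: Dict[str, Dict[bool, Optional[str]]] = {
    "functional_cohesion": {True: "split", False: None},
    "interface_boundaries": {True: "split", False: None},
    "implementation_technology": {True: "split", False: None},
    "assurance_dependency": {True: "merge", False: None},
    "evidence_availability": {True: "split", False: "merge"},
    "criticality": {True: "split", False: None},
    "reusability": {True: "split", False: None},
}


def ahd_decision(answers: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Return ('split' | 'merge' | 'keep' | 'review', rules that fired)."""
    missing = set(RULES) - set(answers)
    if missing:
        raise ValueError(f"unanswered checklist questions: {sorted(missing)}")
    fired = {rule: RULES[rule][answers[rule]] for rule in RULES}
    splits = [rule for rule, t in fired.items() if t == "split"]
    merges = [rule for rule, t in fired.items() if t == "merge"]
    if splits and merges:
        return "review", splits + merges  # conflicting triggers: evaluator judgment
    if splits:
        return "split", splits
    if merges:
        return "merge", merges
    return "keep", []


# A candidate block bundling the MCU, its firmware, and the BLE stack.
bundle = {
    "functional_cohesion": True, "interface_boundaries": True,
    "implementation_technology": True, "assurance_dependency": False,
    "evidence_availability": True, "criticality": False, "reusability": False,
}
print(ahd_decision(bundle))
```

The sketch also returns the rules that fired. That list gives each decomposition decision an auditable justification, which supports the reproducibility the rules are intended to provide.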
4.2 Categorization
The Categorization segment embodies the principle of progression (operationalization). By distinguishing between high-level objectives and their systematic refinement into measurable criteria, it ensures that assurance reasoning does not remain abstract but advances toward specificity. This structured descent makes it possible to demonstrate completeness, minimize redundancy, and prepare for traceable mapping to requirements.
4.2.1 Level 3: Security Objective
Security objectives represent the high-level protection goals that express what must be preserved or safeguarded in the system. They are abstract enough to remain relevant across different implementations, yet specific enough to guide subsequent analysis. Objectives frame the central questions of assurance: What assets must be protected? What threats must be mitigated?
For instance, in a secure boot process, an objective might be to preserve the integrity of the firmware against unauthorized modification. Similarly, in a cryptographic module, an objective may focus on ensuring confidentiality of keys and integrity of operations. By articulating objectives explicitly, evaluators can provide a defensible rationale for why assurance activities target specific protections rather than others.
4.2.2 Level 4: Security Criteria
Security criteria refine objectives into structured categories of assurance concern. While objectives provide intent, criteria ensure that the assurance model captures breadth and coverage in a systematic manner. Each criterion translates an objective into a set of measurable domains that can later be mapped to concrete requirements.
For example, the objective of preserving firmware integrity may yield criteria such as cryptographic authenticity, update control, and rollback prevention. Likewise, the objective of protecting cryptographic keys may generate criteria including key storage security, access control, and lifecycle management. By organizing assurance concerns in this way, the model ensures that evaluators can systematically assess completeness, minimize redundancy, and prepare a clear foundation for deriving requirements.
4.3 Requirements
The requirements level operationalizes the pillars of progression and traceable standards integration. Security requirements translate high-level intent into enforceable, testable statements that maintain clear lineage to objectives and criteria. In hardware contexts, such as enforcing firmware integrity through cryptographic validation, requirements demonstrate how abstract concerns are grounded in authoritative mechanisms. Explicit requirements therefore ensure both actionability and defensibility under scrutiny.
4.3.1 Level 5: Security Requirement
Security requirements are concrete controls derived from criteria, prescribing the mechanisms by which higher-level objectives are satisfied. They must be clear, unambiguous, and testable, forming the direct foundation for verification. This specificity distinguishes them from criteria: while criteria frame categories of concern, requirements define the exact protection to be implemented. For example:
The firmware image shall be digitally signed (with the signature computed over its SHA-256 digest) and validated against the public key stored in the secure element before execution.
Different evaluation units yield different classes of requirements. Hardware-oriented units may emphasize tamper resistance or side-channel protection, while firmware units focus on code integrity, input validation, and secure update mechanisms. By tailoring requirements to the nature of each unit, assurance remains contextually appropriate and technically rigorous. Finally, requirements preserve traceability by linking criteria and objectives to concrete implementations, demonstrating that all identified concerns are systematically addressed.
4.4 Verification
The verification level operationalizes the pillar of analytic and proactive assurance. Verification conditions extend beyond confirming compliance; they establish a foundation for quantitative measurement, reproducibility, and systematic identification of assurance gaps. This step enhances assurance credibility by requiring visible and auditable proof, such as runtime monitoring logs, side-channel analysis results, or penetration test reports, while facilitating aggregation across various criteria and objectives. This quantitative readiness ensures that verification outputs can be consistently rolled up, enabling evaluators to reason about the assurance posture not only at the individual unit level but also across the system as a whole.
4.4.1 Level 6: Verification Condition
Verification conditions define the observable and testable evidence required to confirm that a security requirement has been satisfied. Each condition specifies the circumstances under which a requirement is evaluated, the method of validation, and the form of acceptable evidence. In this way, verification conditions serve as the bridge between specification and assurance outcome.
For example, where a requirement mandates firmware signature validation before execution, the verification condition may state:
During system boot, test logs must confirm that the firmware image signature is validated against the manufacturer’s public key before code execution begins.
Similarly, for a requirement on key storage security, a verification condition could require evidence from penetration testing or side-channel analysis demonstrating that secret keys cannot be extracted from the secure element under defined attack conditions.
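A verification condition is most reproducible when its acceptance criterion can be re-run mechanically on collected evidence. As an illustration, the Python sketch below applies the firmware-signature condition above to a boot log that has already been parsed. The log schema (`event`, `key_id`), the event names `SIG_VERIFY_OK` and `EXEC_START`, and the key identifier are all hypothetical. A real device's log format would replace them.

```python
from typing import Dict, List, Optional


def boot_log_satisfies_vc(log: List[Dict[str, str]], expected_key_id: str) -> bool:
    """Check the firmware-signature verification condition against a parsed boot log.

    Hypothetical log schema: each entry has an "event" field; signature
    checks also carry the "key_id" of the public key they used.
    """
    verified_at: Optional[int] = None
    exec_at: Optional[int] = None
    for i, entry in enumerate(log):
        if (verified_at is None and entry.get("event") == "SIG_VERIFY_OK"
                and entry.get("key_id") == expected_key_id):
            verified_at = i
        if exec_at is None and entry.get("event") == "EXEC_START":
            exec_at = i
    # Evidence must show both events, with verification strictly before execution.
    return verified_at is not None and exec_at is not None and verified_at < exec_at


good = [{"event": "ROM_START"},
        {"event": "SIG_VERIFY_OK", "key_id": "mfr-root"},
        {"event": "EXEC_START"}]
wrong_key = [{"event": "SIG_VERIFY_OK", "key_id": "dev-test"}, {"event": "EXEC_START"}]
late_check = [{"event": "EXEC_START"}, {"event": "SIG_VERIFY_OK", "key_id": "mfr-root"}]
print([boot_log_satisfies_vc(log, "mfr-root") for log in (good, wrong_key, late_check)])
# [True, False, False]
```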
Verification conditions thus ensure that each assurance claim is repeatable, transparent, and auditable. They also provide the granularity necessary for independent assessment, allowing external evaluators to replicate tests and confirm results. By articulating verification in this manner, the framework avoids reliance on implicit trust and instead grounds assurance in demonstrable evidence.
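The levels from security objective down to verification condition form a chain that fits in a simple data model. The Python sketch below is a minimal, hypothetical representation of that chain. The class names, identifiers such as `REQ-CPU-01`, and the two helper functions are illustrative and not taken from the framework. `unverified` lists requirements that still lack a verification condition. `trace` walks upward from a verification condition to its requirement, criterion, objective, and evaluation unit. The example content follows the CU-CPU entries reported in Section 6.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VerificationCondition:  # Level 6
    vc_id: str
    circumstance: str  # when the requirement is evaluated
    method: str        # how it is validated
    evidence: str      # form of acceptable evidence


@dataclass
class Requirement:  # Level 5
    req_id: str
    text: str
    sources: List[str] = field(default_factory=list)  # originating standards
    conditions: List[VerificationCondition] = field(default_factory=list)


@dataclass
class Criterion:  # Level 4
    name: str
    requirements: List[Requirement] = field(default_factory=list)


@dataclass
class Objective:  # Level 3, anchored to a Level 2 evaluation unit
    name: str
    unit: str
    criteria: List[Criterion] = field(default_factory=list)


def unverified(objectives: List[Objective]) -> List[str]:
    """Requirements with no verification condition, i.e. claims without evidence."""
    return [r.req_id for o in objectives for c in o.criteria
            for r in c.requirements if not r.conditions]


def trace(objectives: List[Objective], vc_id: str) -> Optional[Tuple[str, str, str, str]]:
    """Walk upward from a verification condition to unit, objective, criterion, requirement."""
    for o in objectives:
        for c in o.criteria:
            for r in c.requirements:
                if any(v.vc_id == vc_id for v in r.conditions):
                    return o.unit, o.name, c.name, r.req_id
    return None


boot_vc = VerificationCondition(
    "VC-CPU-01", "system boot", "boot log inspection",
    "log shows signature verified before execution")
fw = Objective("Preserve integrity of firmware at boot", "CU-CPU", [
    Criterion("Cryptographic authenticity validation", [
        Requirement("REQ-CPU-01", "Verify firmware signature before execution",
                    ["UL 2900-2-1 § 5.2", "FIPS 140-3 § 7.6"], [boot_vc])]),
    Criterion("Rollback prevention", [
        Requirement("REQ-CPU-02", "Enforce monotonic version counters",
                    ["IEC 62304 § 5.7.2"])]),
])
print(unverified([fw]))  # ['REQ-CPU-02']
print(trace([fw], "VC-CPU-01"))
```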
5.1 Evaluation Methodology
This study adopts a Design Science Research (DSR) evaluation perspective [63], in which the proposed framework is assessed by applying it to a representative problem domain. In DSR, the goal of evaluation is not only to demonstrate that an artifact can be instantiated, but also to provide evidence of its utility, rigor, and relevance.
To achieve this, a case study evaluation approach was selected. Case studies are particularly suited for assurance frameworks because they allow for detailed analysis of how abstract structures (scope, objectives, requirements, verification) can be instantiated on real systems with complex assurance demands. Through systematic instantiation, the evaluation provides tangible outputs (e.g., scoped units, consolidated requirements, verification conditions) and enables comparative analysis against baseline practices.
The evaluation logic follows the layered structure of the framework (L1–L6), ensuring traceability from abstract objectives down to testable evidence. This structured alignment allows the framework’s contributions to be assessed in terms of coverage, redundancy reduction, traceability, and contextual fitness.
5.2 Case Study Selection and Rationale
To evaluate the framework under realistic and demanding conditions, a wireless fingertip pulse oximeter was selected as the case study. These devices are widely deployed in clinical and consumer healthcare for continuous monitoring of blood oxygen saturation (SpO₂) and pulse rate. Their safety-critical function means that security failures can directly affect patient outcomes, while their wireless connectivity and integration with mobile applications expose them to cyber-physical threats. This dual character of medical criticality combined with network exposure makes the oximeter a representative and challenging context for assurance evaluation.
The device presents heterogeneous assurance demands across its subsystems. The sensing chain must preserve measurement accuracy and signal integrity under variable conditions; the compute and firmware subsystem must enforce trusted boot and secure update controls; the wireless interface must protect confidentiality and authenticity of transmitted data; and the physical and power subsystem must resist tampering or fault injection that could compromise reliability. Together, these domains provide a comprehensive test of the framework’s ability to structure, consolidate, and verify assurance requirements without loss of contextual relevance.
The oximeter also operates within a layered regulatory environment. Standards such as IEC 60601-1 (safety of medical electrical equipment) and IEC 62304 (medical software lifecycle processes) impose obligations for safety and development practices, while cybersecurity standards such as UL 2900-2-1 (software cybersecurity for healthcare systems) and FIPS 140-3 (cryptographic module validation) introduce additional, partly overlapping requirements. This fragmented landscape underscores the need for traceable standards integration, providing an appropriate stress test for the framework’s ability to harmonize multiple obligations while preserving authoritative lineage.
The system architecture of the wireless fingertip pulse oximeter is shown in Figure 4. The device boundary encloses the sensing chain, the microcontroller and firmware (including secure boot and cryptographic primitives), the BLE radio and stack, the power and physical subsystem, the user interface, and the update/service port. External elements such as the mobile companion application, the clinical/home environment, and cloud or electronic health record (EHR) services are shown for context but lie outside the assurance scope of this study.
5.3 Evaluation Procedure
The evaluation was conducted through a structured, stepwise procedure that instantiated the framework on the wireless fingertip pulse oximeter. The procedure followed the logical progression of the framework, beginning with system modeling and proceeding through categorization, requirements specification, and verification, before concluding with comparative assessment.
Step 1: System Modeling and Evaluation Unit Definition.
The boundary-driven SoI model was applied to define the assurance scope (L1). The scope included the oximeter hardware, firmware, and wireless interface, as well as operational processes such as secure updates, while excluding external mobile applications and cloud services. Within this scope, the AHD method was employed to derive evaluation units (L2). Following the split/merge rules, four units were identified: CU-SENSE (sensing and analog chain), CU-CPU (compute and firmware), CU-BLE (Bluetooth Low Energy communication), and CU-PHYS (power and physical integrity). Each unit was documented through a Unit Contract, specifying its boundary, interfaces, assurance concerns, and evidence modalities.
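Each evaluation unit is documented by a Unit Contract, so the contracts can be cross-checked before the analysis proceeds. The Python sketch below assumes that a contract stores its four fields (boundary, interfaces, assurance concerns, evidence modalities) as sets. It then flags any two units whose boundaries claim the same component. Such overlap is a symptom of the ambiguity that unit-based scoping is meant to prevent. Where possible, the field contents come from Tables 4 and 5. The interface names are hypothetical, and so is the faulty CU-CPU draft that also claims the ADC.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class UnitContract:
    """Hypothetical Unit Contract: boundary, interfaces, concerns, evidence modalities."""
    unit_id: str
    boundary: FrozenSet[str]     # components enclosed by the unit
    interfaces: FrozenSet[str]
    concerns: FrozenSet[str]
    evidence: FrozenSet[str]


def boundary_overlaps(contracts: List[UnitContract]) -> List[Tuple[str, str, List[str]]]:
    """Pairs of units claiming the same component, which would blur responsibility."""
    clashes = []
    for a, b in combinations(contracts, 2):
        shared = a.boundary & b.boundary
        if shared:
            clashes.append((a.unit_id, b.unit_id, sorted(shared)))
    return clashes


sense = UnitContract("CU-SENSE", frozenset({"LEDs", "photodiode", "AFE", "ADC"}),
                     frozenset({"ADC-to-MCU bus"}),
                     frozenset({"measurement accuracy", "signal integrity"}),
                     frozenset({"calibration tests", "error analysis"}))
# A deliberately faulty draft that also claims the ADC.
cpu = UnitContract("CU-CPU", frozenset({"MCU", "firmware", "ADC"}),
                   frozenset({"ADC-to-MCU bus", "update/service port"}),
                   frozenset({"firmware authenticity", "rollback prevention"}),
                   frozenset({"boot logs", "code review"}))
print(boundary_overlaps([sense, cpu]))  # [('CU-SENSE', 'CU-CPU', ['ADC'])]
```

The sketch deliberately allows shared interfaces, because interfaces are where units meet. Only shared components signal an overlap.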
Step 2: Definition of Security Objectives and Criteria.
For each evaluation unit, high-level security objectives (L3) were defined to express what must be preserved or protected. These were then refined into structured security criteria (L4) to ensure systematic coverage. Examples include maintaining firmware integrity and preventing rollback attacks for CU-CPU, ensuring confidentiality and authenticity of wireless transmissions for CU-BLE, and preserving measurement accuracy for CU-SENSE. This step prevented assurance reasoning from remaining abstract and prepared a clear foundation for requirement derivation.
Step 3: Specification of Security Requirements.
Explicit, testable security requirements (L5) were derived from the criteria. To ensure completeness, requirements were populated by consolidating multiple authoritative standards, including IEC 60601-1, IEC 62304, UL 2900-2-1, and FIPS 140-3. Overlaps and redundancies were reconciled while maintaining full traceability to all source standards. Each requirement was explicitly linked to its evaluation unit, ensuring contextual fitness and preserving authoritative lineage.
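Consolidation can be pictured as a grouping operation. Clauses judged equivalent are merged into one requirement, and their sources are accumulated so that lineage is preserved. The Python sketch below is a minimal illustration under stated assumptions. The `consolidate` function and the requirement keys are hypothetical. The judgment that two clauses overlap is supplied as an evaluator-assigned key rather than computed. The clause wording is an illustrative paraphrase, not quoted from the standards. The clause references are those linked to CU-CPU requirements in Table 7.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def consolidate(clauses: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, List[str]]]:
    """Merge overlapping clauses while keeping every source reference.

    Each clause is (standard_reference, requirement_key, text). The key is
    assumed to be assigned by the evaluator when judging that two clauses
    express the same obligation; that judgment is not automated here.
    """
    texts: Dict[str, str] = {}
    sources: Dict[str, set] = defaultdict(set)
    for ref, key, text in clauses:
        texts.setdefault(key, text)  # keep the first phrasing as the draft wording
        sources[key].add(ref)        # lineage to every originating standard
    return {key: (texts[key], sorted(sources[key])) for key in texts}


# Clause wording below is an illustrative paraphrase, not quoted from the standards.
clauses = [
    ("UL 2900-2-1 § 5.2", "fw-signature", "Verify firmware signatures before execution."),
    ("FIPS 140-3 § 7.6", "fw-signature", "Authenticate firmware before it is executed."),
    ("IEC 62304 § 5.7.2", "fw-rollback", "Prevent installation of outdated firmware."),
]
merged = consolidate(clauses)
print(len(clauses) - len(merged))  # 1 redundant clause folded into another
print(merged["fw-signature"][1])   # ['FIPS 140-3 § 7.6', 'UL 2900-2-1 § 5.2']
```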
Step 4: Definition of Verification Conditions.
Verification conditions (L6) were defined to specify the evidence needed to confirm that each requirement was satisfied. For example, CU-CPU requirements on firmware authenticity were linked to boot log inspection and key validation tests, CU-BLE requirements on encryption were linked to packet capture analysis, and CU-PHYS requirements on tamper resistance were linked to physical probing and power fault injection experiments. Verification conditions were defined with sufficient precision to allow reproducibility by independent evaluators.
Step 5: Comparative Assessment Against Baseline Standards.
Finally, the framework’s outputs were compared against a baseline approach in which requirements were managed separately under individual standards. This comparison was used to evaluate the framework’s contribution to coverage, redundancy reduction, traceability, and gap identification.
6. Results & Analysis
The evaluation of the oximeter demonstrates how the proposed framework produces concrete, traceable outputs across all six levels (L1–L6). Beginning with system scoping and decomposition, the framework yields structured evaluation units, objectives, criteria, consolidated requirements, and verification conditions. These results provide evidence that the framework can systematically reconcile heterogeneous standards into a unified assurance structure.
To illustrate this in practice, the framework was instantiated on the wireless fingertip pulse oximeter following the procedure outlined in Section 5.3, yielding outputs at each architectural level that form the basis for the analysis below. These outputs are reproducible artifacts rather than conceptual constructs, showing that the framework can be applied systematically to a realistic device.
Assurance Scope (L1).
Using the C–E–P triad, the assurance scope was explicitly defined. Table 4 summarizes the included and excluded elements. This scoping step ensured that assurance activities were grounded in a clear boundary rather than relying on implicit assumptions.
Table 4
Assurance Scope (L1) for Wireless Fingertip Pulse Oximeter
| Category | In Scope | Out of Scope |
| --- | --- | --- |
| System Component | Optical sensing chain (LEDs, photodiode, AFE + ADC); MCU and firmware; BLE radio; power subsystem; enclosure | Mobile application; cloud services |
| System Environment | Local RF environment; USB/charging interface; clinical/home operating context | Remote networks; hospital IT infrastructure |
| System Process | Secure firmware update; device pairing; provisioning | Manufacturing supply chain |
Evaluation Units (L2).
Applying Assurance-Oriented Hardware Decomposition (AHD), the scoped system was partitioned into four evaluation units. Table 5 lists these units together with their primary assurance concerns and evidence modalities.
Table 5
Evaluation Units (L2) and Assurance Focus
| Evaluation Unit | Description | Primary Assurance Concerns | Evidence Modalities |
| --- | --- | --- | --- |
| CU-SENSE | Optical sensing and analog chain | Measurement accuracy; signal integrity | Calibration tests; error analysis |
| CU-CPU | Microcontroller and firmware | Firmware authenticity; rollback prevention; runtime integrity | Boot logs; cryptographic validation; code review |
| CU-BLE | Bluetooth Low Energy radio and stack | Confidentiality and authenticity of data; secure pairing | Packet capture; encryption/authentication checks |
| CU-PHYS | Power subsystem and enclosure | Resistance to tampering; fault injection resilience | Penetration testing; fault injection logs |
Security Objectives and Criteria (L3–L4).
From the defined units, high-level objectives were articulated and refined into criteria. Table 6 shows an excerpt covering the four evaluation units, illustrating how objectives were translated into systematic assurance categories.
Table 6
Example objectives and criteria per evaluation unit
| Evaluation Unit | Security Objective (L3) | Criteria (L4) |
| --- | --- | --- |
| CU-CPU | Preserve integrity of firmware at boot | Cryptographic authenticity validation; Rollback prevention; Integrity of update mechanism |
| | Ensure confidentiality of cryptographic keys | Secure key storage; Access control enforcement; Protection against leakage through debug interfaces |
| | Maintain runtime integrity of execution | Memory protection (MPU enforcement); Input validation against malformed data; Error handling without leakage |
| CU-SENSE | Preserve accuracy of SpO₂ and pulse measurements | Calibration against reference values; Filtering of ambient noise and artifacts; Secure transfer of raw data to MCU |
| | Ensure integrity of sensor data pipeline | No bypass of AFE/ADC; Error detection in analog–digital conversion; Consistency checks between samples |
| | Protect against spoofing or manipulation | Resistance to external optical interference; Detection of abnormal light injection attempts |
| CU-BLE | Preserve confidentiality of transmitted data | End-to-end encryption (AES-CCM); Key exchange via authenticated pairing; Protection against passive eavesdropping |
| | Ensure authenticity of data and endpoints | Mutual authentication in pairing; Message authentication codes (MACs); Replay attack prevention |
| | Maintain availability of wireless link | Resistance to denial-of-service attempts; Robustness against jamming/interference |
| CU-PHYS | Preserve device availability under power variation | Brownout detection; Controlled shutdown on low battery; Fault-tolerant power regulation |
Security Requirements (L5).
Requirements were derived from authoritative standards such as IEC 62304 (software lifecycle controls), UL 2900-2-1 (secure update mechanisms), and FIPS 140-3 (cryptographic key handling). These were harmonized into a coherent set that preserves lineage to the original sources while reducing redundancy and contradiction. The full consolidated requirement set is provided in Appendix Table A1, representing the assurance obligations defined within the evaluation boundary. Representative excerpts are shown in Table 7, illustrating how high-level objectives and criteria are translated into concrete, testable requirements with explicit linkage to their originating standards, thereby ensuring both traceability and defensibility of the assurance process.
Table 7
Representative mapping of security objectives (L3) and criteria (L4) to concrete requirements (L5)
| Security Objective (L3) | Security Criterion (L4) | Requirement (L5) | Referenced Standard(s) / Act(s) |
| --- | --- | --- | --- |
| CU-CPU – Preserve integrity of firmware at boot | Cryptographic authenticity validation | The device shall verify the digital signature of all firmware images using SHA-256 and RSA-2048 before execution. | UL 2900-2-1 § 5.2; FIPS 140-3 § 7.6 |
| | Rollback prevention | The device shall enforce monotonic version counters to prevent installation of outdated firmware. | IEC 62304 § 5.7.2; FDA Cybersecurity Guidance (2023) |
| | Integrity of update mechanism | Firmware updates shall only be accepted if validated against manufacturer-issued certificates. | UL 2900-2-1 § 6.3; IEC 62304 § 5.8 |
| CU-CPU – Ensure confidentiality of cryptographic keys | Secure key storage | Keys shall be stored only in secure hardware-protected registers and never exposed in system memory. | FIPS 140-3 § 7.7; HIPAA Security Rule § 164.312 |
| | Access control enforcement | Debug interfaces shall require authenticated access before protected areas can be read. | IEC 60601-1 § 14.13; UL 2900-2-1 § 6.4 |
| | Leakage prevention | Keys shall not be observable via debug ports or side channels. | NIST SP 800-57; CWE-1247 |
| CU-BLE – Preserve confidentiality of transmitted data | End-to-end encryption | All BLE packets shall be encrypted and authenticated using AES-CCM with 128-bit keys. | FIPS 140-3 § 7.8; UL 2900-2-1 § 5.4.2 |
| | Secure key exchange | BLE pairing shall employ authenticated LE Secure Connections to prevent man-in-the-middle (MITM) attacks. | Bluetooth Core Specification v5.3; UL 2900-2-1 § 5.4.3 |
| | Replay attack prevention | Devices shall reject packets with reused nonces or counters. | NIST SP 800-57; IEC 62304 § 5.9.1 |
| CU-PHYS – Mitigate fault injection attacks | Power glitch resistance | The device shall detect and reset on abnormal voltage fluctuations. | IEC 60601-1 § 14.8; CWE-1247 |
| | Clock manipulation detection | The MCU shall implement watchdog timers to detect abnormal clock frequencies. | UL 2900-2-1 § 5.6.4 |
| | Key protection under faults | Cryptographic keys shall remain inaccessible under induced faults. | FIPS 140-3 § 7.9; NIST SP 800-160 Vol. 2 |
During this step, the framework also revealed gaps in baseline standards. For example, UL 2900-2-1 specifies secure update authentication but does not explicitly address rollback prevention. The framework resolved this by introducing a monotonic version counter requirement, preserving lineage to both UL 2900-2-1 and FDA Cybersecurity Guidance. Similarly, IEC 62304 requires lifecycle management but does not specify protection against optical spoofing attacks. The framework extended coverage by defining spoofing resistance requirements under CU-SENSE.
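Code makes the gap easy to see. Signature checking alone accepts any validly signed image, including an older one with known vulnerabilities. The Python sketch below layers the monotonic version check on top of authentication. The `RollbackGuard` class is hypothetical. It models the counter as an ordinary attribute, whereas a device would keep it in tamper-resistant storage.

```python
class RollbackGuard:
    """Hypothetical model of a monotonic firmware version counter.

    On a real device the counter would live in tamper-resistant storage
    (e.g. one-time-programmable fuses); here it is an ordinary attribute.
    """

    def __init__(self, installed_version: int) -> None:
        self._counter = installed_version

    @property
    def counter(self) -> int:
        return self._counter

    def accept_update(self, candidate_version: int, signature_valid: bool) -> bool:
        if not signature_valid:
            return False  # authenticity is checked first
        if candidate_version < self._counter:
            return False  # older image: downgrade rejected
        self._counter = candidate_version  # the counter only ever moves forward
        return True


guard = RollbackGuard(installed_version=5)
print(guard.accept_update(6, signature_valid=True))  # True, counter advances to 6
print(guard.accept_update(5, signature_valid=True))  # False, validly signed but older
```

This sketch accepts a re-install of the currently installed version. A stricter policy could reject equal versions as well.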
Verification Conditions (L6).
The final stage defined verification conditions that provide explicit evidence requirements for each security control. While requirements state what protections must exist, verification conditions specify how compliance is demonstrated in practice, grounding assurance in reproducible, testable outcomes. As shown in Table 8, each high-level security objective is systematically decomposed into criteria, requirements, and verification conditions, ensuring that assurance remains traceable to standards and measurable through concrete evaluation activities.
Table 8
Representative decomposition of security objectives (L3) into criteria (L4), requirements (L5), and verification conditions (L6)
| Security Objective (L3) | Security Criterion (L4) | Requirement (L5) | Verification Condition (L6) |
| --- | --- | --- | --- |
| CU-CPU – Preserve integrity of firmware at boot | Cryptographic authenticity validation | The device shall verify the digital signature of all firmware images using SHA-256 and RSA-2048 before execution (UL 2900-2-1, FIPS 140-3). | During boot, system logs must confirm successful signature verification against the manufacturer’s public key before firmware execution. |
| CU-BLE – Preserve confidentiality of transmitted data | End-to-end encryption | All BLE packets shall be encrypted and authenticated using AES-CCM with 128-bit session keys (FIPS 140-3, UL 2900-2-1). | Packet capture analysis must show that all payloads are encrypted and authenticated with valid session keys. |
| CU-SENSE – Ensure accuracy of SpO₂ measurements | Calibration integrity | Sensor calibration data shall be protected against unauthorized modification (IEC 60601-1, FDA Cybersecurity Guidance). | Configuration audit must confirm calibration values match manufacturer specifications and cannot be overwritten without authentication. |
| CU-PHYS – Mitigate fault injection attacks | Power glitch resistance | The device shall detect and reset on abnormal voltage fluctuations (IEC 60601-1, CWE-1247). | Fault injection testing must show that induced glitches result in safe reset without key leakage or execution bypass. |
The results presented above demonstrate that the framework produces structured and traceable assurance artifacts, consolidating requirements from heterogeneous standards and linking them systematically to verification conditions. These outputs strengthen coverage, reduce redundancy, and provide clear traceability across all levels of the framework. At this stage, however, the results remain qualitative: they establish completeness and traceability but do not yet enable comparative measurement or prioritization. This limitation is addressed in Section 7, where the framework is extended with a quantitative methodology for scoring and aggregation.
In addition, applying the framework to the oximeter revealed where existing standards alone would have left critical assurance gaps. The rollback and optical-spoofing cases identified during requirement derivation are the clearest examples. A device that merely authenticates updates, as UL 2900-2-1 requires, remains exposed to downgrade attacks unless rollback prevention is mandated explicitly, and the lifecycle controls of IEC 62304 provide no guidance on optical spoofing resistance in the sensing chain. The monotonic version counter requirement and the spoof-detection criteria under CU-SENSE close these gaps while preserving lineage to their sources.
To illustrate this added value, Table 9 contrasts a baseline standards-only evaluation with the framework outputs. The comparison shows how the framework reconciles overlapping requirements into a single set and adds explicit coverage where standards are silent.
Table 9
Comparison of Standards-Only vs. Framework Outputs (Oximeter Case Study)
| Evaluation Aspect | Standards-Only Approach | Framework Approach (This Paper) |
| --- | --- | --- |
| Firmware update protection | Secure update authentication (UL 2900-2-1) | Consolidated requirement set including explicit rollback prevention (gap filled) |
| BLE confidentiality | Encryption requirement (FIPS 140-3) | Extended requirements covering replay attack prevention and entropy validation |
| Sensor accuracy & spoofing | Not explicitly covered in IEC 62304 / UL 2900 | New assurance criteria for calibration integrity, noise filtering, and spoof detection |
| Requirement redundancy | Multiple overlapping clauses across UL 2900 and FIPS 140-3 | Redundancies reconciled into a single coherent requirement set with preserved traceability |
| Traceability | Requirements scattered across separate documents | Structured end-to-end mapping: Objective → Requirement → Verification Condition |
7. Toward Quantitative Assurance
The framework presented in this study demonstrates that assurance reasoning can be systematically structured across scope definition, categorization, requirements, and verification conditions. While these contributions address long-standing challenges of fragmentation and traceability, they also establish the foundation for a more advanced capability: the quantitative evaluation of assurance posture. This section outlines the rationale, methodological direction, and illustrative examples of how the framework can be extended into a quantitative model.
7.1 Rationale for Quantification
Qualitative consolidation of requirements reduces ambiguity and improves traceability, but it does not enable comparative decision-making across systems, subsystems, or lifecycle stages. Practitioners and regulators frequently face questions such as: Which subsystem presents the greatest residual risk? How does assurance coverage compare across vendors or product generations? What level of confidence can be claimed at the system level? Without quantification, these questions remain largely subjective. A structured scoring model addresses this gap by translating assurance evidence into normalized, comparable metrics that support prioritization, benchmarking, and resource allocation.
7.2 Methodological Building Blocks
The proposed framework is inherently hierarchical, enabling quantitative evaluation through structured scoring and aggregation. Each level of the framework (L1–L6) provides an anchor for mathematical representation. This section formalizes the building blocks of a quantitative methodology.
7.2.1 Verification Scoring
Each verification condition $V_{ijk}$ produces an observable outcome that can be scored. Let

$$ s_{ijk} \in [0, 1] $$

denote the satisfaction value of verification condition $k$ associated with requirement $j$ of criterion $i$. Values may be binary (0 = fail, 1 = pass), ordinal (e.g., 0, 0.5, 1 for partial compliance), or probabilistic (confidence levels derived from statistical testing).
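To illustrate the three value types, mapping a raw outcome to $s_{ijk}$ could be sketched in Python as below. The ordinal labels and the function name `score_vc` are hypothetical. The only assumption the sketch makes is that every outcome, whatever its type, is mapped into $[0, 1]$.

```python
from typing import Union

# Hypothetical ordinal scale (the text mentions 0, 0.5, 1 for partial compliance).
ORDINAL_SCALE = {"fail": 0.0, "partial": 0.5, "pass": 1.0}

def score_vc(outcome: Union[bool, str, float]) -> float:
    """Map a verification-condition outcome to a satisfaction value s_ijk in [0, 1]."""
    if isinstance(outcome, bool):            # binary pass/fail (checked before float,
        return 1.0 if outcome else 0.0       # because bool is a subclass of int)
    if isinstance(outcome, str):             # ordinal label
        if outcome not in ORDINAL_SCALE:
            raise ValueError(f"unknown ordinal label: {outcome!r}")
        return ORDINAL_SCALE[outcome]
    value = float(outcome)                   # probabilistic confidence level
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must lie in [0, 1], got {value}")
    return value

print(score_vc(True), score_vc("partial"), score_vc(0.9))
```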
7.2.2 Requirement Aggregation
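Each security requirement $R_{ij}$ consists of multiple verification conditions. Its score is defined as a weighted sum:

$$ R_{ij} = \sum_{k} \alpha_{ijk}\, s_{ijk} $$

where $\alpha_{ijk} \geq 0$ are local weights for verification conditions, normalized such that:

$$ \sum_{k} \alpha_{ijk} = 1 $$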
7.2.3 Criterion Aggregation
Criteria $C_i$ refine security objectives into structured assurance domains. The score for a criterion is:

$$ C_i = \sum_{j} \beta_{ij}\, R_{ij} $$

with $\sum_{j} \beta_{ij} = 1$. The weights $\beta_{ij}$ capture the relative importance of requirements within the criterion, which may be informed by regulatory mandates, criticality to mission objectives, or known threat profiles.
7.2.4 Objective Aggregation
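At the objective level, scores consolidate criteria into higher-level protection goals:

$$ O_m = \sum_{i \in \mathcal{C}(m)} \gamma_{mi}\, C_i $$

with normalization $\sum_{i \in \mathcal{C}(m)} \gamma_{mi} = 1$, where $\mathcal{C}(m)$ denotes the criteria that refine objective $m$. Objectives thus yield quantifiable indicators of whether high-level security intentions (e.g., firmware integrity, confidentiality of data) are achieved.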
7.2.5 Evaluation Unit Aggregation
Each evaluation unit $u$ encompasses several objectives. The unit-level assurance score is:

$$ U_u = \sum_{m \in \mathcal{O}(u)} \delta_{um}\, O_m $$

with normalization $\sum_{m \in \mathcal{O}(u)} \delta_{um} = 1$, where $\mathcal{O}(u)$ denotes the objectives of unit $u$ and $\delta_{um}$ are weights reflecting the relative importance of objectives within the unit.
Crucially, weights should also be assigned at the evaluation unit level to reflect differences in subsystem criticality. For instance, in a medical oximeter, compromise of the sensing chain (CU-SENSE) directly affects patient safety, whereas compromise of the BLE interface (CU-BLE) primarily affects confidentiality. Thus, evaluation units are assigned criticality weights $w_u$, normalized across all units:

$$ \sum_{u} w_u = 1 $$
7.2.6 System-Level Aggregation
Finally, the system-level assurance score is obtained as a weighted sum of unit scores:

$$ S_{\text{sys}} = \sum_{u} w_u\, U_u $$
This hierarchical aggregation maintains traceability from individual verification conditions to the overall system assurance posture.
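The roll-up in Sections 7.2.2 to 7.2.6 applies the same normalized weighted sum at every level, so it can be sketched as a single recursive function. This Python sketch uses a hypothetical nested-dictionary representation (`weights`, `children`) in which leaves are scores in $[0, 1]$. A leaf may also stand for a pre-scored higher-level node. It is meant as an illustration, not a reference implementation.

```python
import math

def weighted_sum(scores, weights, tol=1e-9):
    """Weighted sum used at every level; weights must be non-negative and sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0, abs_tol=tol):
        raise ValueError(f"weights must be non-negative and sum to 1, got {weights}")
    return sum(w * s for w, s in zip(weights, scores))

def roll_up(node):
    """Aggregate bottom-up. A node is either a leaf score in [0, 1] (e.g. s_ijk)
    or a dict holding child nodes and their local weights."""
    if isinstance(node, (int, float)):
        return float(node)
    return weighted_sum([roll_up(c) for c in node["children"]], node["weights"])

# Hypothetical hierarchy: system -> unit -> objective -> criterion -> requirement -> VCs.
requirement = {"weights": [0.5, 0.5], "children": [1.0, 0.5]}   # alpha_ijk over s_ijk
criterion = {"weights": [1.0], "children": [requirement]}        # beta_ij
objective = {"weights": [1.0], "children": [criterion]}          # gamma_mi
unit_a = {"weights": [1.0], "children": [objective]}             # delta_um
system = {"weights": [0.6, 0.4], "children": [unit_a, 0.9]}      # w_u; 0.9 = pre-scored unit
print(round(roll_up(system), 3))
```

Because each level validates its own weights, a mis-specified weight vector fails at the point where it occurs, which keeps errors traceable to a specific node.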
7.2.7 Rationale for Weighting
Weights at each level ($\alpha_{ijk}$, $\beta_{ij}$, $\gamma_{mi}$, $\delta_{um}$, and $w_u$) may be determined by one or more of the following rationales:
Standards-based weighting. Certain requirements are explicitly prioritized in authoritative standards (e.g., FIPS 140-3 Level 3 physical tamper resistance). The challenge is that standards often differ in emphasis across domains, and direct mapping may over-privilege compliance obligations while underrepresenting emerging threats.
Risk-based weighting. Weights can reflect the impact of a requirement’s failure, derived from likelihood × consequence analysis. This approach aligns with established risk management practices but depends on accurate threat modeling and reliable data, which are often costly or incomplete.
Expert judgment. Domain experts or regulators can assign weights based on experience. While practical, this method is inherently subjective and can be influenced by organizational biases or shifting regulatory priorities, limiting reproducibility.
Uniform weighting. In exploratory phases, assigning equal weights preserves neutrality and avoids subjective disputes. However, this simplicity can mask genuine differences in criticality across requirements or subsystems.
In practice, no single approach is sufficient. A hybrid strategy is often necessary: standards may define minimum priorities, risk assessments can refine criticality, expert elicitation can fill data gaps, and uniform weighting can serve as a baseline for sensitivity analysis. By combining these perspectives, weight assignments can become both more defensible and more adaptable to diverse assurance contexts.
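The following is a minimal sketch of one possible hybrid. It assumes that risk-based weights come from likelihood × consequence ratings and that a uniform baseline serves for sensitivity analysis. All ratings and scores are hypothetical. The convex blending parameter `lam` is an illustrative device and is not part of the framework.

```python
def normalize(raw):
    """Scale non-negative raw values so that they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw weights must have a positive sum")
    return {k: v / total for k, v in raw.items()}

def risk_weights(likelihood, consequence):
    """Risk-based weights: likelihood x consequence per requirement, normalized."""
    return normalize({k: likelihood[k] * consequence[k] for k in likelihood})

def blend(primary, baseline, lam):
    """Convex mix of two weight vectors (lam=1 keeps primary, lam=0 gives baseline)."""
    return {k: lam * primary[k] + (1 - lam) * baseline[k] for k in primary}

def weighted_score(weights, scores):
    return sum(weights[k] * scores[k] for k in scores)

# Hypothetical 1-5 ratings and requirement scores, for illustration only.
likelihood = {"R1": 2, "R2": 4, "R3": 3}
consequence = {"R1": 5, "R2": 3, "R3": 2}
scores = {"R1": 0.95, "R2": 0.60, "R3": 0.80}

risk = risk_weights(likelihood, consequence)   # products 10, 12, 6, divided by 28
uniform = normalize({k: 1.0 for k in likelihood})
for lam in (0.0, 0.5, 1.0):                    # sensitivity sweep from uniform to risk-based
    print(lam, round(weighted_score(blend(risk, uniform, lam), scores), 3))
```

Because the aggregate is linear in the weights, the blended score moves linearly between the uniform and risk-based results, which makes the sensitivity of the outcome to the weighting choice easy to read off.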
7.3 Worked Numerical Example (Oximeter Case)
This example instantiates the quantitative scheme from Section 7.2 on the four evaluation units of the oximeter: CU-SENSE, CU-CPU, CU-BLE, and CU-PHYS. All scores are normalized to $[0, 1]$, and the weights at each level sum to 1.
7.3.1 Evaluation-Unit Criticality Weights
As shown in Table 10, each evaluation unit is assigned a criticality weight based on its relative impact on patient safety and security. These values are primarily risk-based, prioritizing safety impact, but they could alternatively be derived from standards or expert elicitation. The assigned weights ensure that subsystem evaluations contribute proportionally to the aggregated system-level assurance score.
Table 10
Unit-level criticality weights ($w_u$) and rationale.
| Evaluation Unit | Rationale (summary) | Weight ($w_u$) |
|---|---|---|
| CU-SENSE | Direct effect on clinical accuracy (SpO₂), safety-critical | 0.35 |
| CU-CPU | Trusted boot & runtime integrity underpin all controls | 0.30 |
| CU-PHYS | Tamper/faults can cause unsafe behavior or bypasses | 0.20 |
| CU-BLE | Primary confidentiality/authenticity channel | 0.15 |
| Total | | 1.00 |
7.3.2 Detailed Roll-Up for One Unit (CU-CPU)
The CU-CPU evaluation unit of the oximeter is responsible for trusted boot and runtime integrity. Two objectives are considered here: boot integrity and runtime integrity. Each objective is decomposed into requirements and verification conditions, with weighted aggregation applied at each level according to the methodology introduced in Section 7.2.
Objective 1: Boot Integrity
The first objective ensures that firmware is authenticated at boot and that rollback attacks are prevented.
Requirement R1: Signature validation
The verification conditions are scored as 1.0 and 0.90, with weights (0.6, 0.4):

$$ R_1 = 0.6 \times 1.0 + 0.4 \times 0.90 = 0.96 $$

The perfect score reflects tests confirming correct signature validation in all cases, while the slightly lower score represents a minor flaw observed under certain reset conditions.
Requirement R2: Rollback prevention
Verification conditions are scored as 1.0 and 0.50, with equal weights (0.5, 0.5):

$$ R_2 = 0.5 \times 1.0 + 0.5 \times 0.50 = 0.75 $$

The perfect score reflects a robust version counter mechanism, while the lower score represents partial coverage in testing, where rollback protection was bypassed in one simulated update sequence.
Aggregating both requirements with weights (0.7, 0.3) yields:

$$ O_{\text{boot}} = 0.7 \times 0.96 + 0.3 \times 0.75 = 0.897 $$
Objective 2: Runtime Integrity
The second objective focuses on maintaining runtime execution integrity through memory protection and input validation.
Requirement R1: MPU enforcement
Verification condition scores are 0.80 and 0.70, with equal weights (0.5, 0.5):

$$ R_1 = 0.5 \times 0.80 + 0.5 \times 0.70 = 0.75 $$
Requirement R2: Input validation (fuzzing)
Verification condition scores are 0.60 and 0.40, with weights (0.6, 0.4):

$$ R_2 = 0.6 \times 0.60 + 0.4 \times 0.40 = 0.52 $$
Aggregating both requirements with weights (0.6, 0.4) gives:

$$ O_{\text{run}} = 0.6 \times 0.75 + 0.4 \times 0.52 = 0.658 $$
Unit Aggregation
The two objectives (boot integrity and runtime integrity) are then combined at the unit level with weights (0.65, 0.35):

$$ U_{\text{CPU}} = 0.65 \times 0.897 + 0.35 \times 0.658 \approx 0.813 $$
Interpretation. The CU-CPU unit achieves a normalized score of approximately 0.813. This indicates strong assurance coverage, with particular robustness in the boot integrity domain (0.897) and moderate strength in runtime integrity (0.658). The decomposition allows stakeholders to identify that improving runtime controls (e.g., stronger input validation) would raise the overall CPU assurance score.
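The CU-CPU arithmetic can be reproduced in a few lines of Python. The verification-condition scores and weights are the ones stated above. The requirement-to-objective weights (0.7/0.3 and 0.6/0.4) and the objective-to-unit weights (0.65/0.35) are the values consistent with the reported scores of 0.897, 0.658, and 0.813.

```python
def wsum(pairs):
    """Weighted sum of (weight, score) pairs; the weights must sum to 1."""
    total = sum(w for w, _ in pairs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1")
    return sum(w * s for w, s in pairs)

# Objective 1: boot integrity
r_sig = wsum([(0.6, 1.0), (0.4, 0.90)])        # signature validation
r_rollback = wsum([(0.5, 1.0), (0.5, 0.50)])   # rollback prevention
o_boot = wsum([(0.7, r_sig), (0.3, r_rollback)])

# Objective 2: runtime integrity
r_mpu = wsum([(0.5, 0.80), (0.5, 0.70)])       # MPU enforcement
r_fuzz = wsum([(0.6, 0.60), (0.4, 0.40)])      # input validation (fuzzing)
o_run = wsum([(0.6, r_mpu), (0.4, r_fuzz)])

# Unit aggregation
u_cpu = wsum([(0.65, o_boot), (0.35, o_run)])
print(f"boot={o_boot:.3f} runtime={o_run:.3f} CU-CPU={u_cpu:.3f}")
```

Running it prints `boot=0.897 runtime=0.658 CU-CPU=0.813`, matching the hand calculation.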
7.3.3 System-Level Score
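Finally, the unit scores are aggregated using the evaluation-unit criticality weights $w_u$ from Section 7.3.1:

$$ S_{\text{sys}} = 0.35\,U_{\text{SENSE}} + 0.30\,U_{\text{CPU}} + 0.20\,U_{\text{PHYS}} + 0.15\,U_{\text{BLE}} $$

Interpretation. The resulting system-level score indicates strong but improvable assurance. The decomposition preserves traceability: stakeholders can drill down to units (e.g., CU-PHYS at 0.761) and further to specific criteria, requirements, and verification conditions to plan targeted improvements.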
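One way to operationalize this drill-down is to attribute a unit's system-level shortfall $w_u (1 - U_u)$ to its requirements. Each requirement receives the product of the weights along its path, multiplied by its own shortfall $(1 - R)$. The sketch below applies this rule to CU-CPU, using the Section 7.3.2 requirement scores, the requirement and objective weights consistent with the reported CU-CPU scores, and $w_{\text{CPU}} = 0.30$ from Table 10. The attribution rule and the helper name `drill_down` are assumptions; the framework does not prescribe them.

```python
# Requirement scores from Section 7.3.2; requirement/objective weights are the values
# consistent with the reported CU-CPU scores; W_CPU is the Table 10 criticality weight.
W_CPU = 0.30
# (objective, objective weight, requirement, requirement weight, requirement score)
ROWS = [
    ("boot integrity", 0.65, "signature validation", 0.7, 0.96),
    ("boot integrity", 0.65, "rollback prevention", 0.3, 0.75),
    ("runtime integrity", 0.35, "MPU enforcement", 0.6, 0.75),
    ("runtime integrity", 0.35, "input validation", 0.4, 0.52),
]

def drill_down(rows, unit_weight):
    """Attribute the unit's system-level shortfall w_u * (1 - U_u) to requirements.

    Each requirement contributes w_u * w_obj * w_req * (1 - R); the contributions
    sum to w_u * (1 - U_u) because the path weights within the unit sum to 1.
    """
    losses = [
        (req, unit_weight * w_obj * w_req * (1.0 - score))
        for _obj, w_obj, req, w_req, score in rows
    ]
    return sorted(losses, key=lambda item: item[1], reverse=True)

for req, loss in drill_down(ROWS, W_CPU):
    print(f"{req:22s} {loss:.4f}")
```

With these inputs, input validation carries the largest attributable shortfall. This agrees with the interpretation in Section 7.3.2 that stronger input validation would raise the CU-CPU score.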
7.4 Analytics in Assurance Evaluation
While the preceding subsections demonstrated how the framework is instantiated, this section examines the analytic capabilities it enables. A central contribution of the framework is the transformation of assurance from qualitative reasoning into a quantitative and diagnostic process. By structuring evidence as measurable scores, the framework enables evaluation at multiple levels: unit-level, cross-unit, supply chain, and system-level.
7.4.1 Unit-Level Diagnosis
Analytics enables evaluators to identify strengths and weaknesses within individual subsystems by decomposing assurance results to the objective and criterion levels. In CU-SENSE, quantitative assessment shows that SpO₂ Accuracy achieved a strong score (0.88), while Sensor Pipeline Integrity was significantly weaker (0.72), exposing a 0.16 performance gap within the unit. Criterion-level analysis further reveals that Calibration Integrity is highly robust (0.95), whereas Secure Transfer (0.84) and Filtering Validation (0.85) introduce measurable residual weaknesses.
Figure 5 illustrates this hierarchical relationship, with criterion-level scores clustering around their parent objective. By making such differences explicit, unit-level diagnosis uncovers risks that would remain hidden in purely qualitative evaluations. More importantly, it provides evaluators with actionable guidance: targeted assurance effort can be directed toward weaker criteria, while strong components can be leveraged as anchors of trust within the subsystem. This fine-grained diagnostic capability forms the basis for subsequent cross-unit comparisons and, ultimately, system-level resilience analysis.
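Unit-level diagnosis amounts to comparing scores within a unit and ranking its weakest elements. The sketch below uses the CU-SENSE values reported above. The helper names `spread` and `weakest_first` are hypothetical.

```python
# CU-SENSE objective and criterion scores as reported in Section 7.4.1.
objectives = {"SpO2 Accuracy": 0.88, "Sensor Pipeline Integrity": 0.72}
criteria = {"Calibration Integrity": 0.95, "Secure Transfer": 0.84, "Filtering Validation": 0.85}

def spread(scores):
    """Gap between the strongest and weakest element within one unit."""
    return max(scores.values()) - min(scores.values())

def weakest_first(scores):
    """Elements ordered from weakest to strongest, i.e. candidates for targeted effort first."""
    return sorted(scores.items(), key=lambda item: item[1])

print(f"objective gap: {spread(objectives):.2f}")
for name, value in weakest_first(criteria):
    print(f"  {name}: {value:.2f}")
```

The output reports the 0.16 objective gap and lists Secure Transfer as the weakest criterion, followed by Filtering Validation and Calibration Integrity.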
7.4.2 Cross-Unit Analytics
Comparing results across evaluation units provides diagnostic insight into how assurance is distributed within the system. Figure 6 presents boxplots of objective-level scores for CU-CPU, CU-SENSE, CU-PHYS, and CU-BLE. Each boxplot reports the median (horizontal line), interquartile range (IQR; box boundaries), and full dispersion (whiskers). A horizontal dashed line marks the acceptability threshold of 0.75, representing the minimum score at which objectives are considered adequately assured.
The results show that CU-CPU (median = 0.84, IQR = 0.08) and CU-SENSE (median = 0.80, IQR = 0.07) maintain relatively consistent assurance with narrow variability, indicating that computational integrity and sensing functions are both strongly and uniformly supported. In contrast, CU-PHYS (median = 0.73, IQR = 0.15) and CU-BLE (median = 0.74, IQR = 0.14) exhibit wider dispersion, with several objectives falling below the acceptability threshold. This variability highlights weaker assurance for physical protections and communication links.
These differences emphasize that assurance is not evenly distributed across subsystems. By making such disparities explicit, the framework provides a quantitative basis for prioritization, enabling evaluators to allocate resources toward CU-PHYS and CU-BLE in order to strengthen resilience and reduce uncertainty in assurance coverage.
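The per-unit statistics summarized in Figure 6 (median, IQR, and the number of objectives below the 0.75 threshold) could be computed as follows. The objective scores here are hypothetical and do not reproduce Figure 6. The choice of quartile method (`method="inclusive"`, i.e., linear interpolation) is also an assumption; other quartile conventions give slightly different IQR values.

```python
import statistics

THRESHOLD = 0.75  # acceptability threshold stated in Section 7.4.2

def summarize(scores, threshold=THRESHOLD):
    """Median, interquartile range, and below-threshold objectives for one unit."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "median": statistics.median(scores),
        "iqr": q3 - q1,
        "below": [s for s in scores if s < threshold],
    }

# Hypothetical objective-level scores (NOT the data behind Figure 6).
units = {
    "CU-CPU": [0.90, 0.84, 0.86, 0.78],
    "CU-PHYS": [0.62, 0.73, 0.80, 0.70],
}
for unit, scores in units.items():
    s = summarize(scores)
    print(f"{unit}: median={s['median']:.3f} IQR={s['iqr']:.4f} below={len(s['below'])}")
```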
7.4.3 Supply Chain Analytics
Supply-chain dependencies were quantified by assigning each evaluation unit to either Vendor or Supplier control and computing its criticality-weighted share of assurance within that category. Starting from raw shares (percentage of verification conditions per unit), we applied the unit’s criticality weight $w_u$ and renormalized within each category so that the Vendor and Supplier shares each sum to 100% (Appendix B shows the worked calculation for all units).
Figure 7 presents the weighted contributions only. On the Vendor side, CU-CPU accounts for 52.5% of the vendor-controlled assurance and CU-PHYS for 47.5% (weights $w_{\text{CPU}} = 0.30$ and $w_{\text{PHYS}} = 0.20$). On the Supplier side, the elevated weight for CU-SENSE ($w_{\text{SENSE}} = 0.35$) amplifies its influence, yielding 59.4% of supplier-provided assurance versus 40.6% for CU-BLE. Thus, within each category one unit carries the larger share: CU-CPU for Vendor (narrowly) and CU-SENSE for Supplier (clearly). The latter is especially consequential because its high criticality concentrates supplier risk; weaknesses in CU-SENSE would disproportionately depress system-level assurance.
By making these weighted shares explicit, the analysis moves beyond listing responsibilities to measuring where assurance dependence is concentrated. Practically, this supports prioritization: strengthen verification and supplier management around CU-SENSE first, while maintaining vendor focus on CU-CPU. The underlying arithmetic is transparent and reproducible, enabling auditors to trace category-level conclusions back to unit-level evidence.
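The weighting-and-renormalization step can be sketched as below, using the Table 10 weights and the Vendor/Supplier assignment described above. The raw shares are hypothetical and are not the Appendix B values. Equal shares are used here so that the effect of $w_u$ is isolated, which means the output does not reproduce Figure 7.

```python
# Table 10 criticality weights and the Vendor/Supplier assignment from Section 7.4.3.
WEIGHTS = {"CU-CPU": 0.30, "CU-PHYS": 0.20, "CU-SENSE": 0.35, "CU-BLE": 0.15}
CATEGORY = {"CU-CPU": "Vendor", "CU-PHYS": "Vendor", "CU-SENSE": "Supplier", "CU-BLE": "Supplier"}

def weighted_shares(raw_share, weights=WEIGHTS, category=CATEGORY):
    """Weight each unit's raw share by w_u, then renormalize within its category."""
    weighted = {u: raw_share[u] * weights[u] for u in raw_share}
    totals = {}
    for u, value in weighted.items():
        totals[category[u]] = totals.get(category[u], 0.0) + value
    return {u: value / totals[category[u]] for u, value in weighted.items()}

# Hypothetical raw shares (fraction of verification conditions per unit), not Appendix B.
raw = {"CU-CPU": 0.25, "CU-PHYS": 0.25, "CU-SENSE": 0.25, "CU-BLE": 0.25}
shares = weighted_shares(raw)
for unit, share in shares.items():
    print(f"{CATEGORY[unit]:8s} {unit:8s} {share:.1%}")
```

With equal raw shares, the Vendor split reduces to the weight ratio 0.30 : 0.20 (60% / 40%), and the Supplier split reduces to 0.35 : 0.15 (70% / 30%).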
7.4.4 System-Level Resilience
Analytics also enables assurance results to be aggregated into a holistic, system-level view. Figure 8 illustrates this process as a waterfall chart, where each evaluation unit contributes incrementally to the final assurance score. Contributions are computed by weighting each unit’s objective-level scores by its assigned criticality factor and then normalizing them against the system total.
The results show that CU-SENSE increases the system score by +0.11 and CU-BLE by +0.09, together accounting for the majority of the system’s assurance uplift. In contrast, CU-PHYS adds only +0.03, reflecting how weaknesses in physical protections limit its contribution to system-level resilience. By visualizing these increments, the framework confirms the quantitative roll-up while also supporting diagnostic reasoning: subsystem weaknesses are immediately visible in their limited contribution to the overall assurance profile.
Residual risk is expressed as the complement of the system assurance score, $1 - S_{\text{sys}}$. This value provides a forward-looking measure of resilience, highlighting the margin by which the system falls short of full assurance. In practice, this residual risk indicates the extent of additional mitigation or verification required to achieve target assurance levels. By quantifying resilience in this way, analytics shifts the discussion from abstract notions of robustness to concrete, evidence-based metrics that can guide design and assurance planning.
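The following is a minimal waterfall sketch. It assumes that each unit's increment is simply $w_u U_u$. The normalization used for Figure 8 is not fully specified, so these increments will not match the figure. CU-CPU and CU-PHYS use the scores reported in Section 7.3, while CU-SENSE and CU-BLE are hypothetical placeholders.

```python
WEIGHTS = {"CU-SENSE": 0.35, "CU-CPU": 0.30, "CU-PHYS": 0.20, "CU-BLE": 0.15}  # Table 10

def waterfall(unit_scores, weights=WEIGHTS):
    """Per-unit increments w_u * U_u with the running total after each step,
    plus the residual risk 1 - S_sys once all units are added."""
    steps, running = [], 0.0
    for unit, w in weights.items():
        increment = w * unit_scores[unit]
        running += increment
        steps.append((unit, increment, running))
    return steps, 1.0 - running

# CU-CPU and CU-PHYS from Section 7.3; CU-SENSE and CU-BLE are hypothetical placeholders.
scores = {"CU-SENSE": 0.80, "CU-CPU": 0.813, "CU-PHYS": 0.761, "CU-BLE": 0.75}
steps, residual = waterfall(scores)
for unit, inc, total in steps:
    print(f"{unit:8s} +{inc:.3f} -> {total:.3f}")
print(f"residual risk = {residual:.3f}")
```

The final running total equals $S_{\text{sys}}$, and the residual risk is its complement. Because the weights sum to 1, the residual risk also equals $\sum_u w_u (1 - U_u)$, so it can be traced back to individual units.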
8. Discussion
The evaluation demonstrates that the proposed framework delivers structured, traceable, and actionable assurance artifacts, validating its ability to reconcile heterogeneous standards into a unified model. Compared to a standards-only baseline, the framework improves traceability, reduces redundancy, and establishes clear pathways from objectives to verification. Importantly, it also demonstrates readiness for quantitative evaluation, thereby bridging the gap between compliance documentation and measurable assurance posture.
8.1 Contributions and Insights
The evaluation results provide several concrete insights into the contribution of the proposed framework. First, the framework delivers a structured multi-level model (L1–L6) that systematically translates abstract assurance goals into testable outputs. In the oximeter case, this included a scoped system boundary (L1), four evaluation units (L2), mapped objectives and criteria (L3–L4), a consolidated set of 22 requirements (L5), and corresponding verification conditions (L6). This stepwise progression demonstrates that high-level concerns, such as firmware integrity or wireless confidentiality, can be transformed into enforceable technical checks like firmware signature validation and BLE encryption log review. Second, the framework harmonizes heterogeneous standards into a single coherent requirement set, thereby reducing redundancy while preserving traceability. For example, overlapping clauses on firmware update protection from UL 2900-2-1 and FIPS 140-3 were consolidated into a unified requirement for signature validation and rollback prevention, with lineage maintained to both sources. Third, the framework produces actionable outputs by linking requirements directly to verification conditions. In practice, this means that a requirement on BLE session key generation was paired with verification steps such as inspection of entropy sources and review of RNG implementation logs, while tamper detection requirements were connected to observable events like triggered interrupts and system log entries.
8.2 Practical Implications
The framework has several concrete implications for practitioners who work with assurance in safety-critical and security-sensitive domains. For regulators and auditors, the structured multi-level model provides defensible traceability from high-level assurance objectives down to testable verification conditions. This makes it easier to justify certification and compliance claims, as every requirement is anchored in authoritative standards and supported by observable evidence. For industry stakeholders, the framework reduces the burden of managing overlapping standards by consolidating them into a coherent requirement set. This helps avoid redundant audits and conflicting obligations, which are common pain points when demonstrating compliance with multiple regulatory regimes.
For system designers and assurance managers, the framework provides diagnostic visibility into where assurance is strong and where it remains weak. For instance, in the oximeter case, CU-SENSE and CU-BLE scored strongly, while CU-PHYS lagged behind, pointing directly to areas where further investment in tamper and fault-injection protection would yield the greatest improvement. This enables more targeted allocation of assurance resources, moving beyond box-ticking compliance toward evidence-based risk prioritization. Finally, because the framework produces quantitative-ready outputs, it lays the groundwork for integrating assurance results into broader enterprise decision processes, such as system-level risk management or procurement evaluations where transparent, comparable metrics are increasingly valued.
8.3 Comparison to Existing Approaches
The evaluation highlights how the proposed framework advances beyond both standards-based assurance practices and prior academic contributions. Unlike compliance-only approaches, which treat standards as the central or exclusive source of assurance, the framework treats them as one important input among others, integrating them alongside threat modeling, empirical evidence, and quantitative analytics to provide a more comprehensive foundation.
Standards-based approaches (e.g., FIPS 140-3, IEC 62304, UL 2900-2-1) provide authoritative and widely recognized requirements but are developed in isolation. This leads to fragmentation, overlap, and inconsistencies across domains. Practitioners are left to reconcile these differences manually, which is resource-intensive and prone to error. The framework addresses this limitation by systematically integrating requirements into a layered structure, thereby consolidating scope, eliminating redundancies, and enabling explicit cross-standard traceability.
Academic approaches, in contrast, often emphasize taxonomies, checklists, or domain-specific models (e.g., tailored to medical devices, industrial control, or cloud environments). While such contributions provide useful structure, they tend to remain either overly abstract, limiting operational applicability, or narrowly scoped, restricting transferability across domains. The proposed framework avoids this trade-off by being principle-driven yet domain-agnostic. Its demonstration on a wireless fingertip pulse oximeter illustrates transferability, while its design principles (standards integration, verification linkage, boundary clarity) ensure generalizability.
Furthermore, prior research frequently isolates requirement cataloging from verification methods, leaving a gap between what must be assured and how assurance is demonstrated. The proposed framework explicitly unifies these dimensions by linking high-level assurance objectives to concrete verification conditions. This integration enables holistic assurance that is simultaneously comprehensive, verifiable, and adaptable across contexts. Table 11 summarizes the comparison.
A further contribution is that the framework decouples assurance from strict dependence on standards. Standards are treated as one important input, but not as the sole foundation. Alongside compliance obligations, the framework integrates threat modeling, empirical evidence, and quantitative scoring, thereby avoiding the pitfall of compliance-only assurance. This enables evaluators to anticipate emerging risks, compare subsystem assurance in a structured way, and generate evidence in domains where existing standards are incomplete or silent.
Table 11
Comparison with existing approaches
| Approach | Limitations | How the Proposed Framework Advances |
|---|---|---|
| Standards-only assurance (e.g., FIPS 140-3, IEC 62304) | Standards developed in isolation, leading to fragmentation, overlaps, and inconsistencies; compliance effort without coherence; weak traceability across domains | Integrates heterogeneous standards into a layered model; eliminates redundancies; enables explicit cross-standard traceability and coherent assurance arguments |
| Academic taxonomies / checklists | Either overly abstract (difficult to operationalize) or narrowly scoped (domain-specific only); limited transferability; weak or absent verification linkage | Provides a principle-driven but domain-agnostic structure; demonstrated on a medical device but adaptable across domains; embeds verification linkage to improve practical adoption |
| Verification-focused methods (e.g., test suites, fault injection) | Focus narrowly on testing; isolated from higher-level objectives; no systematic scope consolidation; lack requirement-to-verification traceability | Unifies objectives and verification conditions; embeds verification within the assurance structure; ensures that system-level requirements roll down into concrete, testable evidence |
8.4 Limitations
While the evaluation demonstrates the applicability and benefits of the proposed framework, several limitations must be acknowledged. First, the results are based on a single case study involving a wireless fingertip oximeter. Because this device is both safety-critical and heterogeneous, it provides a demanding “stress test” that exercises subsystem heterogeneity, cryptographic protection, lifecycle processes, and wireless security. Nonetheless, evaluation of a single device cannot establish generalizability, and broader validation across domains such as industrial IoT, automotive, and defense platforms will be necessary to confirm transferability and robustness.
Second, the evaluation outcomes reported in Section 6 are primarily qualitative, emphasizing structured outputs, traceability, and consolidation of standards. The quantitative scoring and aggregation introduced in Section 7 should therefore be regarded as a proof-of-concept demonstration rather than a validated assurance metric. Real-world deployment will require empirical calibration, for example through structured expert elicitation, alignment with regulatory benchmarks, or analysis of operational failure and incident data.
The quantitative methodology should accordingly be interpreted as a conceptual framework rather than as a validated assurance metric. Its primary contribution lies in defining the mathematical structure and hierarchical roll-up that enable traceable quantification. The weights and example scores in the case study serve only as illustrative instantiations, intended to show how the methodology operates in practice. As discussed in Section 7.2.7, each weighting rationale (standards-based, risk-based, expert-driven, or uniform) has trade-offs, and no single method is sufficient on its own. A hybrid approach is therefore likely to be required in real-world adoption. Future work must calibrate these parameters using empirical datasets, structured elicitation, or regulatory benchmarks to ensure defensible and reproducible application.
Finally, the present scope of the framework is focused on device-level hardware/software integration. It does not yet extend comprehensively to supply chain dependencies, organizational governance, or socio-technical assurance dimensions. These remain critical in real-world deployments, where trustworthiness is shaped not only by technical protections but also by supplier practices, lifecycle management, and human or organizational factors. Expanding the framework to incorporate these dimensions represents an important avenue for future research.
8.5 Future Work
The evaluation of the framework opens several avenues for further research. First, multi-domain validation is necessary to strengthen generalizability. Applying the framework to other safety-critical domains, such as industrial control systems, automotive, or aerospace, would test its capacity to adapt to different regulatory environments and assurance procedures. Comparative studies across these domains would also support generalization of the analytics and refinement of the framework’s design principles.
Second, the integration of automated and AI-assisted methods represents a promising extension. Automated mapping of requirements, gap analysis, and traceability verification could significantly reduce human effort and increase reproducibility. In particular, AI-based natural language processing could assist in consolidating requirements from large volumes of standards, guidelines, and advisories, thereby enhancing scalability and efficiency.
Third, empirical studies are needed to evaluate the framework in practice and to calibrate the quantitative methodology. On the one hand, structured expert workshops, pilot deployments in industry, or regulatory co-assessments can assess usability and uncover tacit assurance practices not captured in document-based evaluation. On the other hand, defensible calibration of weights and scoring distributions will require systematic data sources. Potential avenues include structured expert elicitation to align priorities, regulatory benchmarks to validate consistency with compliance outcomes, vulnerability datasets such as CWE or CVE to test correlation with observed weaknesses, and operational incident data to establish predictive validity. In parallel, sensitivity analysis and simulated data can be used to examine robustness in the absence of complete empirical datasets. This staged strategy ensures that the framework evolves from a conceptual skeleton into a calibrated methodology grounded in measurable outcomes.
Finally, future research should explore the framework’s role in continuous and adaptive assurance. As systems evolve through frequent updates and dynamic configurations, assurance frameworks must support ongoing verification rather than one-time certification. Investigating how the framework can be integrated into lifecycle assurance processes, or combined with digital twin techniques, would extend its applicability to contemporary, adaptive system contexts.
9. Conclusion
This paper presented a traceable, multi-level framework for hardware security assurance. Grounded in the pillars of progression, scoping, standards integration, and analytic assurance, the framework systematically translates abstract objectives into concrete requirements and verification conditions. In doing so, it establishes a structured path from high-level intent to reproducible evidence, addressing long-standing inconsistencies in fragmented assurance practices.
The framework was instantiated on a wireless fingertip oximeter, a safety-critical device with heterogeneous assurance demands. The evaluation showed how obligations drawn from IEC, UL, and FIPS standards can be consolidated into a coherent requirement set that preserves traceability while reducing redundancy. Representative verification conditions demonstrated how abstract criteria can be linked to observable evidence such as firmware-integrity validation and tamper-event logging. Comparative analysis against a standards-only baseline highlighted improvements in coverage, transparency, and actionability.
The quantitative methodology introduced in Section 7 further showed how structured outputs can support scoring and aggregation, establishing a foundation for measurable assurance posture and infrastructure resilience analysis. While the scoring model remains illustrative rather than empirically calibrated, the results confirm the framework’s readiness for quantitative extension and cross-domain application. Future work will prioritize empirical calibration, multi-domain validation, and extension to supply chain and socio-technical assurance, moving toward a general methodology for system assurance across safety-critical sectors.