Autonomic Regulation and Sustained Attention during Mindfulness with Touchless Physiological Monitoring in Preadolescents

SameerYami¹1✉Emailsameer@augment-me.com

KostasTsioutsiouliklis¹1

PhillipeGoldin2

StevenLaureys3

BobStahl4

DianneFarley5

BenjaminHe¹1

Augment Me, Inc

University of California, Davis

3Harvard University

4Brown University Mindfulness Center

5Alexander Rose Elementary School (Milpitas Unified School District)

Sameer Yami¹*, Kostas Tsioutsiouliklis¹, Phillipe Goldin², Steven Laureys³, Bob Stahl⁴, Dianne Farley⁵, Benjamin He¹

¹ Augment Me, Inc.; ² University of California, Davis; ³ Harvard University; ⁴ Brown University Mindfulness Center; ⁵ Alexander Rose Elementary School (Milpitas Unified School District)

*Correspondence: sameer@augment-me.com

Keywords:

productivity

sustained attention

reaction time variability

mindfulness

stress

heart rate variability

photoplethysmography

affective computing

Under IRB Numbers: 20231480 and 20243759 (WCG IRB)

Abstract

Brief, mindfulness training—paired with touchless physiological monitoring—can measurably boost focus and reduce stress, in real-world classrooms.

Educational systems worldwide face persistent barriers to sustaining attention, reducing stress, and improving achievement, particularly in socioeconomically diverse schools. This challenge was addressed by integrating affective computing, cognitive neuroscience, and educational practice in a two-year, staggered quasi-experimental study of 96 students aged 7–13 in a San Francisco school district, including a Title I cohort. The staggered implementation meant that all participants ultimately received the intervention, precluding inclusion of a no-treatment control group. The program—a developmentally tailored mindfulness and breathing intervention integrated with camera-based physiological monitoring and cognitive assessments—was delivered over 4–5 weeks in weekly 30-minute, facilitator-led sessions. Students showed reduced stress (p = 0.012) and improved sustained attention (reaction-time variability ↓39.8 ms, p < 0.001). Gains spanned both high- and lower-performing learners, peaking during the fall–winter term. While positive, non-significant trends were observed in standardized math and reading scores, these academic results were not statistically significant. Heart Rate Variability (HRV) trajectories indicated physiological stability in intervention groups. These findings suggest that enhanced autonomic regulation (indexed by HRV stability) may underlie improved sustained attention and academic growth.

1. Introduction

Mindfulness-based programs are increasingly adopted in schools to enhance emotional regulation, focus, and well-being. Yet large-scale randomized trials, such as the MYRIAD study involving over 28,000 adolescents, have reported little or no advantage of mindfulness training over standard social–emotional learning curricula [1–4]. Most large-scale mindfulness trials in schools [1–3] have relied primarily on self-report or teacher-rated outcomes and short-term pre/post assessments, offering limited insight into real-time physiological or cognitive engagement. Few studies have examined how mindfulness affects moment-to-moment regulation within authentic classroom settings. The present study addresses this gap by integrating touchless physiological monitoring via Remote Photoplethysmography (rPPG) and digital mindfulness delivery to capture continuous, multimodal indicators of stress regulation and learning in real-world educational contexts. These findings underscore the need for refined, developmentally appropriate interventions that move beyond generic, self-reported outcomes toward objective, multimodal measures of engagement and learning.

Given the high student-to-psychologist ratio in schools and the lack of scalable, precise tools for monitoring and improving engagement, there is a critical need for integrated systems that can be delivered with fidelity by trained facilitators and objectively monitor engagement and physiological responses in real time. Such tools would enable educators and psychologists to better assess and enhance student focus, performance, and well-being at scale [5]. Few existing studies integrate subjective, cognitive, physiological, and academic measures in real-world classrooms, particularly within underserved populations. Prior school-based mindfulness programs have largely relied on self-report or classroom observations, providing limited insight into underlying physiological processes or moment-to-moment engagement.

In contrast, the present two-year field study introduces an integrated approach that combines structured mindfulness exercises with touchless, camera-based physiological sensing (remote photoplethysmography; rPPG) and multimodal cognitive assessments. This framework integrates behavioral, neural, and academic indicators within naturalistic classroom environments, addressing limitations of prior short-term or laboratory-based studies. By capturing authentic, longitudinal data from preadolescents in diverse educational contexts, both the short-term and sustained effects of brief, structured mindfulness interventions—delivered via a custom digital affective computing platform with embedded, non-contact physiological monitoring—on cognitive stability, stress regulation, and academic performance in elementary and middle school students, were evaluated. [6–7].

Table 1
Comparison of prior school-based mindfulness studies and the present work, highlighting advances in methodology, measurement, and ecological validity.
Dimension	Prior Research	Present Study
Setting & Duration	Mostly short (4–12 week) programs, lab-based or pilot classroom studies	Two-year, real-classroom implementation across multiple grades
Measurement	Primarily self-report, teacher ratings, or one-time cognitive tests	Continuous, touchless physiological monitoring (rPPG) + cognitive & academic metrics
Population	Often small, homogeneous, or high-SES samples	Diverse, mixed-performance preadolescents, including underserved populations
Ecological Validity	Limited — interventions delivered outside typical school routines	Fully embedded in school activities
Scalability	Manual or therapist-dependent; difficult to standardize	Automated, digitally delivered platform with non-contact sensors facilitated by school psychologists
Mechanistic Insight	Behavioral or self-perceived outcomes only	Physiological evidence (Heart Rate Variability or HRV, Heart Rate, Respiration Rate) linked to cognitive control and learning

2. Methods

(core summary; additional details below and

in Supplementary Information)

Ninety-six students (78 elementary, 18 middle school; grades 2–8) from a district including one Title I elementary school participated across two academic years. A quasi-experimental staggered-start design was employed, where two groups began the intervention one week apart to control for external factors. All students ultimately received the intervention due to parents’ and teachers’ requirements (see Table 2 for design).

The intervention consisted of weekly 30-minute sessions (including student, hardware and software setup time) over 4–5 weeks. These sessions, inspired from various curriculums and facilitated in-person by school psychologists (with home practice encouraged) were delivered online by trained professionals and facilitated in-person by school psychologists, incorporating exercises such as mindful breathing, 4-7-8 breathing, and belly breathing [8].

Intervention Protocol Transparency

Each mindfulness session lasted 3–5 minutes and was conducted 2 times per week during regular classroom hours under psychologist supervision. The digital platform automatically logged session duration, frequency, and completion metrics. Physiological signals were acquired via touchless remote photoplethysmography (rPPG) using a standard classroom tablet/laptop/mobile camera, with automated artifact rejection for motion or lighting variation.

All procedures were approved by the district research ethics committee, and parental consent and student assent were obtained prior to participation. Data were anonymized prior to analysis.

2.1 Data Collection and Measurement

Multimodal data was collected to evaluate cognitive, physiological, and academic outcomes. Physiological data were obtained using Augment Me’s WotNow?!, a Neuro-AI platform that integrates real-time physiological measurement with adaptive coaching to optimize engagement, focus, reduce stress, and enhance learning outcomes.

WotNow?! employs remote photoplethysmography (rPPG), an advanced non-contact imaging technique that measures cardiovascular activity by detecting subtle color fluctuations in facial skin corresponding to blood flow. Using sophisticated signal-processing algorithms, these micro-variations are converted into reliable estimates of respiratory rate, heart rate (HR) and heart rate variability (HRV). The accuracy of the rPPG-based HRV estimation approach has been established in previous validation studies showing high agreement with contact-based ECG and PPG measurements under similar conditions [9–11]. The unobtrusive nature of this approach makes it particularly suitable for educational environments, enabling continuous real-world monitoring of engagement and physiological state without disrupting learning.

Using standard desktop or mobile cameras, WotNow?! - which can also estimate stress and focus levels, subsequently delivering targeted recommendations and feedback. This setup ensures adaptability across classroom and home contexts while preserving participant privacy.

2.2 Outcome Measures

Physiological indicators included heart rate variability (HRV) computed through Hilbert-Huang Transform (HHT) and Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement(MTTS-CAN) mechanisms, both derived from camera-based rPPG signals [12–13].

Cognitive performance was assessed using the Gradual-Onset Continuous Performance Task (GradCPT), a validated measure of sustained attention [14].

Self-reported well-being was measured via the DASS21-Y subscales for stress, anxiety, and depression [15].

Academic outcomes were tracked using Lexile reading, Quantile math, and IXL math scores [16–20].

For our analysis, data were compared to baseline measures and national benchmarks.

Paired t-tests, effect sizes (Cohen’s d), and academic growth relative to national averages were calculated. Data were excluded for confounding factors such as illness or poor sleep based on pre-registered criteria.

2.2.1 Experimental and Control Group Design Details

Due to principal’s and parents’ requests that all students receive the mindfulness lessons, the students were divided into groups with the experimental group receiving mindfulness lessons 1 week earlier.

Setup: Each student was assigned randomly to one of 2 groups. Each group did 3 lessons over 3 weeks, but they all started at different times. (see Table 2)

Table 2
Each student was assigned randomly to one of 2 groups. Each group did 3 lessons over 3 weeks, but they started at different times. (The reason for splitting the students into two groups is to account for extraneous factors – logistics, parents’/teachers’ requests, exams, holidays etc.) Students’ HRV was measured every week, before and after each lesson. Measurements were also taken when students didn't have lessons. This established a baseline to compare against.
		Week 1	Week 2	Week 3	Week 4	Week 5
Group 1	No lesson. Surveys + pretest Scan	Lesson 1 Pre questions and Scan before and after	Lesson 2 Pre questions and Scan before and after	Lesson 3 Pre questions and Scan before and after	No lesson, surveys, post test, post test scan	No lesson
Group 2	No lesson. Surveys + pretest Scan	No lesson	Lesson 1 Pre questions and Scan before and after	Lesson 2 Pre questions and Scan before and after	Lesson 3 Pre questions and Scan before and after	No lesson No lesson, surveys, post test, post test scan

Students’ stress and focus levels were measured every week, before and after each lesson. Measurements were also taken when students did not have lessons. This established a baseline to compare against.

Ideally, the following graph was expected over time, which shows an upwards trajectory with increasingly diminishing returns (dots represent the average score of a group):

Fig. 1

Expected change in focus over the study for a single group

By stacking the groups, the following graph was expected:

Fig. 2

Expected change in focus across the groups.

These were the hypotheses:

1. Students demonstrate increased metrics very soon, just after 2 lessons.

Baseline: Students before their first lesson across all groups.

Treatment: Students after the first and second lesson.

2. Students demonstrate an increase in metrics between the first lesson and the last lesson.

Baseline: Students after previous lessons.

Treatment: Students after later lessons.

Contrary to the graphs shown above, students may not see their numbers increase monotonically. The reason may be extraneous factors, such as weather, holidays, exam periods, etc. which may drive everybody’s numbers up or down. Nevertheless, students further ahead in the lesson plan should always be higher on average than students further behind in the lesson plan. This is also the reason the students were split into 2 groups that not all start the lessons at the same time: to account for extraneous factors.

Baseline: Students at lesson i at time t

Treatment: Students at lesson j at time t, where j > i

Metrics were expected to drop once the lessons are completed, but less than their starting values.

Baseline: Students before the first lesson

Treatment: Students after the last lesson

At the end of the sessions, all students should show a marked improvement in various metrics.

2.2.2 Student Selection, Process, and Lessons

Flyers were distributed both in print and online to parents of selected classes, with class teacher consent.

Many parents enrolled their children after reviewing the materials. In some cases, class teachers recommended students directly to the school psychologist, or the school psychologist selected students in consultation with the teachers and parents.

Students typically attended weekly 30-minute sessions at the school psychologist’s office, using Chromebooks or Mobile phones. For convenience, they were organized into small groups of about 5–8 students and were also encouraged to repeat the lessons at home.

The lessons were loosely based on the Emory University SEE Curriculum for Elementary and Middle Schools [8], focusing on simple 5–10 minute breathing exercises with mindfulness elements, such as mindful breathing, 4-7-8 breathing, and belly breathing.

While the lesson content was developed by Augment Me’s in-house mindfulness experts, the sessions were delivered by school psychologists who were themselves experienced in mindfulness and breathing techniques.

2.2.3 Software, Datasets and Sample Sizes

Software: Python 3.9 + NumPy/SciPy, Tensorflow/Keras, PyTorch

Statistical Tests: Paired t-test with Cohen’s d. All statistical tests were two-tailed paired t-tests with α = 0.05. Effect sizes were reported as Cohen’s d to indicate magnitude. Multiple comparisons across cognitive, physiological, and academic measures were exploratory and not corrected for family-wise error, given the pilot and quasi-experimental design.

Psychometric Data: 377 entries from stress, anxiety, and depression assessments (DASS-21 based scoring)

HRV Data (HHT System): 408 sessions measuring LF/HF ratio, focus state, and autonomic metrics

HRV Data (MTTS System): 386 sessions with parallel physiological measurements

Reaction Time Data: 133,348 individual cognitive task trials (filtered for correct responses)

Math Performance: Longitudinal academic data including Fall/Winter/Spring quantiles and i-Ready diagnostic scores

Timeline Data: Comprehensive participant tracking across intervention period with experimental condition assignments

Final Analysis Sample: 96 participants with comprehensive data across modalities

National Academic Benchmarks: Grade-specific percentile tables for Quantile (math), Lexile (reading), and IXL Math assessments across Fall/Winter/Spring assessment periods

2.2.4 Data Processing and Quality Control

2.2.4.1 Timestamp-Based Data Matching

HRV- Reaction Time Variability (RTV) Temporal Alignment

Implemented sophisticated timestamp matching with 2-hour tolerance windows

Rationale: Physiological and cognitive measures needed temporal proximity for meaningful correlation

Process: Systematic pairing of HRV sessions with closest RTV sessions within tolerance

Quality Control: Validated temporal sequences to ensure proper pre/post intervention ordering

• Assumptions

o HRV and RTV measures remain relatively stable within 2-hour windows

Temporal proximity indicates related physiological/cognitive states

No systematic bias in measurement timing between groups

2.2.4.2 Psychometric Score Calculation

DASS-21 Scoring Implementation

Computed standardized stress, anxiety, and depression scores

Stress Questions: ['q1', 'q6', 'q8', 'q11', 'q12', 'q14', 'q18'] - sum of responses

Anxiety Questions: ['q2', 'q4', 'q7', 'q9', 'q15', 'q19', 'q20'] - sum of responses

Depression Questions: ['q3', 'q5', 'q10', 'q13', 'q16', 'q17', 'q21'] - sum of responses

Validation: Verified score distributions against expected DASS-21 norms

• Assumptions

o DASS-21 subscales accurately capture distinct psychological constructs

Simple sum scoring provides valid composite measures

Student self-reports are reasonably accurate and unbiased

RTV Calculation

Coefficient of Variation (CV) Implementation

Used CV = (Standard Deviation of RT) / (Mean RT) as primary outcome

Rationale: CV normalizes variability across different baseline reaction time speeds

Correct Responses Only: Filtered to include only accurate trial responses to ensure cognitive engagement

Session Requirements: Minimum 20 correct responses per session for reliable CV calculation

• Assumptions

o CV provides superior measure of attention consistency compared to raw RT

Correct responses indicate genuine cognitive engagement

Within-session RT patterns reflect sustained attention capabilities

Block index resets indicate new test initiation (handled appropriately)

2.2.4.3 HRV Processing and Standardization

LF/HF Ratio Calculation

Primary HRV measure across both measurement systems

System Validation: HHT vs MTTS correlation = 0.999 (near-perfect agreement)

Session Aggregation: Calculated mean LF/HF ratio across all sessions per participant

Stability Metrics: Computed standard deviation and session count for data quality assessment

• Assumptions

o LF/HF ratio accurately reflects autonomic nervous system balance

Both measurement systems assess equivalent physiological constructs

Multiple sessions provide more stable individual estimates than single measurements

Higher LF/HF ratios indicate sympathetic dominance and stress vulnerability

2.2.4.4 Advanced Outlier Detection and Management

Dual-Method Outlier Identification

Combined physiological plausibility with statistical detection

Physiological Outlier Criteria:

HRV Values: Flagged values < 0.1 or > 100.0 as physiologically implausible

RTV Values: Identified extreme CV values suggesting measurement artifacts

RT Values: Removed trials < 100ms or > 3000ms as likely non-responses or lapses

Key Findings:

RTV CV Change: 55–59% physiological outliers, indicating high individual variability in improvement patterns

HRV Measures: 5% physiological outliers, primarily extreme low or high values

Minimal Impact: Outlier removal had negligible effects on correlation patterns

Assumptions

• Physiological ranges represent genuine biological plausibility

Statistical outliers beyond Inter-Quartile Range (IQR) boundaries likely represent measurement errors

Conservative removal criteria preserve genuine individual differences while removing artifacts

2.2.4.5 Potential Confounding Factor Management

Sleep/Food/Home Problems Identification

Systematic exclusion of participants with external stressors

Data Source: Identified from participant self-report questionnaires

Criteria: Students reporting significant sleep disruption, food insecurity, or major home problems

Application: Applied to elementary + middle school sample

Impact Quantification: Tracked changes in outcome measures before/after exclusion

Exclusion Results:

Sample Impact: 12 students removed (11.8% of original sample)

Outcome Effects: Mean stress change shifted by only 0.010 points

Success Rates: Improvement rate changed by -0.3 percentage points

Interpretation: Minimal group-level impact despite theoretical importance

Assumptions

• Self-reported confounding factors were accurately identified and reported

These factors represent genuine external stressors affecting intervention response

Middle school and elementary populations have similar confounding factor patterns

Group-level effects reflect individual-level impacts

2.2.4.6 Individual Baseline Methodology Development

Personalized HRV Baseline System

Established individual baseline frameworks for deviation tracking

Baseline Definition: First measurement session for each participant established personal baseline

Deviation Calculation: All subsequent sessions measured as deviations from individual baseline

Stability Assessment: Evaluated baseline stability across initial sessions

Group Comparison: Analyzed deviation patterns by experimental condition

Implementation Details:

Baseline Requirements: Minimum first session data with valid HRV measurements

Deviation Formula: (Current LF/HF - Baseline LF/HF) / Baseline LF/HF × 100

Quality Control: Verified baseline measurements were from pre-intervention sessions

Assumptions

• First session represents stable individual baseline state

Individual baselines remain relatively stable in absence of intervention

Deviations from baseline indicate meaningful physiological changes

Personal baselines provide more sensitive change detection than group averages

2.2.4.7 Academic Score Processing and National Norm Integration

2.2.4.7.1 National Benchmarking Methodology: Implemented systematic comparison against established academic standards

Data Sources: IXL Math scores (before/after intervention), Quantile scores (Fall/Winter/Spring), Lexile scores (Fall/Winter/Spring)

National Norm Tables: Integrated percentile tables for grade-appropriate comparisons across all three times of year

Combined Group Analysis: Treated experimental and control groups as single intervention cohort since both received intervention

Score Extraction: Applied robust parsing for varied data formats (handling "BR" prefixes for below-grade reading, "EM" prefixes for below-grade math)

2.2.4.7.2 Implementation Details:

IXL Math Processing: Extracted second value from comma-separated score pairs (English, Math format)

Quantile/Lexile Parsing: Converted scored strings to numeric values with appropriate negative handling for below-grade-level scores

Grade Alignment: Cross-referenced student grade levels with corresponding national percentile benchmarks

Temporal Matching: Aligned student assessment periods (Fall, Winter, Spring) with national norm timeframes

2.2.4.7.3 Quality Control Measures:

Data Validation: Verified score ranges within plausible academic achievement bounds

Missing Data Handling: Systematic identification of students with complete vs. partial academic data

Grade-Level Verification: Ensured student grade assignments matched assessment difficulty levels

National Norm Accuracy: Validated percentile table alignment with published academic standards

Assumptions

• Combined experimental and control groups represent valid intervention sample

National 50th percentile provides appropriate comparison baseline

Academic assessment timing corresponds to intervention effects measurement periods

Student grade-level assignments accurately reflect academic developmental stage

3. Results

(All statistical tests were two-tailed paired t-tests with α = 0.05; effect sizes reported as Cohen’s d.)

Among the five hypotheses tested, evidence supported Hypotheses 1, 2, 4, and 5. Evaluation of Hypothesis 3 was precluded by the staggered cohort design.

3.1 Academic Performance Outcomes

The primary academic outcomes measured, Quantile (math) and Lexile (reading) scores, showed positive trends but did not reach statistical significance.

A two-tailed paired t-test on the Fall ->Winter transition (n = 32–33) revealed non-significant growth for both Quantile (t = 1.54, p = 0.134) and Lexile (t = 1.72, p = 0.096). The Winter ->Spring transition also showed non-significant growth for Quantile (p = 0.503) and Lexile (p = 0.344). Due to the lack of statistical significance, the observed "gains" (detailed in sections 3.1.1 and 3.1.2) should be interpreted as preliminary, exploratory trends that are limited by small, uncontrolled sample sizes.

In contrast to the main cohort, a separate analysis was conducted on a specific group of 17 middle school students (Grades 7–8) using the IXL Math assessment. This group did show a statistically significant improvement (p < 0.001)

Fig. 3

Lexile, Quantile and IXL score improvements as compared to national averages for Grades 2–8

Middle School

IXL math scores improved by + 84.5 points especially for high achieving students, with Grade 7 reversing from 85 points below to 18 points above the national average (Fig. 3). Growth points exceeded national expectations.

Fig. 4

Student growth advantage versus national expectations across time periods (Fall→Winter - FW, Winter→Spring - WS) by grade level and domain. Bars represent mean point differences relative to national 50th percentile benchmarks for Quantile (Math), Lexile (Reading), and IXL (Math). Positive values indicate performance exceeding national averages.

Quantile Math Growth Analysis (Fig. 3–4): Across the fall-to-winter transition, students demonstrated mean gains of approximately + 41 points above expected national growth patterns. During the winter-to-spring period, average growth remained positive, at + 16 points above expected norms. These advantages were most pronounced among students who began below national medians, suggesting convergence toward normative performance over time. Also, see Supplementary Section Table 3.

Table 3
Quantile Mathematics and Lexile Reading Performance Detailed Data Analysis (Fig. 3)
Grade	Transition	Quantile Math Performance	Lexile Reading Performance
Grade 2 (n = 1)	Fall→Winter	Students averaged + 240 points above national 50th percentile in fall, + 210 points above in winter	Students averaged + 350 points above national 50th percentile in fall, + 330 points above in winter
Grade 2 (n = 1)	Winter→Spring	Students averaged + 330 points above national 50th percentile in winter, + 280 points above in spring	Students averaged + 210 points above national 50th percentile in winter, + 385 points above in spring
Grade 3 (n = 1)	Fall→Winter	Students averaged + 135 points above national 50th percentile in fall, + 380 points above in winter	Students averaged + 100 points above national 50th percentile in fall, + 46 points above in winter
Grade 3 (n = 1)	Winter→Spring	Students averaged + 380 points above national 50th percentile in winter, + 345 points above in spring	Students averaged + 100 points above national 50th percentile in winter, + 260 points above in spring
Grade 4 (n = 16)	Fall→Winter	Students averaged − 152 points below national 50th percentile in fall, − 123 points below in winter	Students averaged − 40 points below national 50th percentile in fall, − 47 points below in winter
Grade 4 (n = 15)	Winter→Spring	Students averaged − 111 points below national 50th percentile in winter, − 86 points below in spring	Students averaged − 60 points below national 50th percentile in winter, − 33 points below in spring
Grade 5 (n = 11)	Fall→Winter	Students averaged − 72 points below national 50th percentile in fall, − 26 points above in winter	Students averaged − 42 points below national 50th percentile in fall, + 46 points above in winter
Grade 5 (n = 11)	Winter→Spring	Students averaged − 26 points below national 50th percentile in winter, − 25 points below in spring
Grade 5 (n = 9)	Winter→Spring	-	Students averaged + 108 points above national 50th percentile in winter, + 114 points above in spring
Grade 6 (n = 4)	Fall→Winter	Students averaged − 144 points below national 50th percentile in fall, − 98 points below in winter	Students averaged − 214 points below national 50th percentile in fall, − 162 points below in winter
Grade 6 (n = 4)	Winter→Spring	Students averaged − 98 points below national 50th percentile in winter, − 41 points below in spring	Students averaged − 162 points below national 50th percentile in winter, − 245 points below in spring

Lexile Reading Growth Analysis (Fig. 3–4): Across the fall-to-winter interval, students demonstrated mean Lexile gains approximately + 37.4 points above expected national growth trajectories. During the winter-to-spring transition, growth remained positive at approximately + 16.7 points above national expectations. Notably, the largest relative improvements occurred among Grades 4–5, indicating potential catch-up effects for initially lower-performing readers. Also, see Supplementary Section Table 3.

IXL Growth Analysis

Following the intervention, students demonstrated substantial improvements in IXL math proficiency scores. Grade 7 students improved from an average of 85 points below the national 50th percentile (495 vs. 580) to 18 points above (598 vs. 580), representing a net gain of + 103 points. Grade 8 students improved from 16 points above the national benchmark to 56 points above, with a gain of + 40 points. Overall, participants exhibited an average growth of approximately + 84 points, exceeding expected national growth patterns by over + 70 points, and reflecting a shift from below- to above-average performance. Also, see Supplementary Section Table 4.

Table 4
IXL Mathematics Performance Detailed Data Analysis (Fig. 3)
Grade	Transition	IXL Math Performance
Grade 7 (n = 12)	Before→After	Students averaged 495 points before intervention vs. national 50th percentile of 580 (difference: -85 points below national average); averaged 598 points after intervention vs. national 50th percentile of 580 (difference: +18 points above national average); growth: +103 points
Grade 8 (n = 5)	Before→After	Students averaged 666 points before intervention vs. national 50th percentile of 650 (difference: +16 points above national average); averaged 706 points after intervention vs. national 50th percentile of 650 (difference: +56 points above national average); growth: +40 points

3.1.1 Statistical Significance of Quantile, Lexile and IXL Scores

3.1.1.1 Quantile Scores:

Fall→Winter Growth: t = 1.54, p = 0.134, Cohen's d = 0.27 (small effect size, not significant)

Winter→Spring Growth: t = 0.68, p = 0.503, Cohen's d = 0.12 (negligible effect size, not significant)

3.1.1.2 Lexile Scores:

Fall→Winter Growth: t = 1.72, p = 0.096, Cohen's d = 0.30 (small effect size, approaching significance)

Winter→Spring Growth: t = 0.96, p = 0.344, Cohen's d = 0.17 (negligible effect size, not significant)

3.1.1.3 IXL Math Scores:

Before→After Growth: t = significant, p < 0.001, substantial effect size (statistically significant)

Grade 7 Performance Transformation: Students moved from − 85 points below national average to + 18 points above (103-point improvement representing 1.8 grade-level equivalents)

Grade 8 Performance Enhancement: Students improved from + 16 points above national average to + 56 points above (40-point improvement while maintaining above-average status)

Overall Growth Pattern: 100% of grade levels (2/2) showed growth, with average improvement of + 71.7 points

Grade-level sample sizes for Grades 2–3 (n = 1 each) preclude generalization; these data are included for completeness only.

3.2 Academic Performance Outcomes

3.2.1 Performance Relative to National Benchmarks

Performance varied by grade level (Fig. 3, Tables 3–4). Lower elementary students (Grades 2–3) consistently scored above national medians across both reading and mathematics, whereas upper elementary grades (4–6) began below national averages but demonstrated measurable recovery over time.

3.2.2 IXL Mathematics Performance

IXL mathematics data revealed significant net improvement across middle-grade cohorts (Fig. 3, Tables 3–4). The overall student mean increased from 49 points below the national baseline to 31 points above, yielding a net gain of + 80 points.

Grade 7 (Remediation and Recovery): Improved from − 85 to + 18 relative to national norms, representing a full reversal from below-average to above-average performance.

Grade 8 (Acceleration): Advanced from + 16 to + 56 above the national benchmark, indicating continued academic acceleration among higher-achieving learners.

All participants (Grades 7–8) exhibited positive growth exceeding national expectations, with no evidence of plateau or decline.

3.2.3 Grade-Specific Patterns

Patterns of change differed across developmental stages (Tables 3–4).

High performers (Grades 2–3): Sample sizes (n = 1) were too small for valid analysis, but individual data is included in Supplementary Table 3 for completeness.

Improving performers (Grades 4–5): Began below national medians but converged toward grade-level expectations, demonstrating measurable academic recovery.

Struggling performers (Grade 6): Remained below national averages yet showed modest upward trends.

Recovery cohorts (Grade 7): Achieved marked remediation (− 85 to + 18).

Accelerators (Grade 8): Sustained and expanded their above-average advantage (+ 16 to + 56).

All grade levels demonstrated positive trajectories that met or exceeded national growth expectations across sequential time periods.

3.3 Academic Achievement Context

The intervention benefited a diverse academic population, spanning from high-achieving early-grade students (Grades 2–3) to below-average cohorts (Grade 6). Lower elementary students maintained strong above-average performance, while upper elementary groups displayed measurable recovery and convergence toward grade-level expectations, suggesting that intervention effects may be developmentally differentiated—reinforcing focus in younger students while facilitating remediation in older ones.

Parallel patterns of improvement across both mathematics (Quantile) and reading (Lexile) indicate that benefits generalized across academic domains rather than being subject-specific. Growth advantages were most pronounced during the fall→winter period (+ 31–37 points above national expectations), coinciding with peak intervention engagement, and remained positive during winter→spring (+ 15–17 points).

Notably, Grade 5 students demonstrated the strongest cumulative recovery, advancing from below-average to above-average performance across both domains.

3.4 Individual and Educational Impact

3.4.1 Individual Student Outcomes

Grade 5 students achieved the most significant within-cohort recovery, improving from − 72 to − 26 Quantile and − 42 to + 46 Lexile points relative to national benchmarks—a 118-point Lexile gain representing a meaningful shift from below-average to above-average performance.

High-performing students (Grades 2–3) maintained substantial leads, confirming that the intervention did not dampen high achievement. Even the lowest performers (Grade 6) demonstrated incremental improvements, underscoring the intervention’s accessibility across the performance spectrum.

3.4.2 Educational Applications

Implementation across multiple grade levels demonstrated feasibility within standard classroom environments. Gains among previously struggling learners (e.g., Grade 5) indicate strong potential for integration into academic support programs, while consistent or enhanced outcomes among advanced students confirm compatibility within high-achieving contexts.

Differential effects by grade suggest that age-specific customization may further optimize outcomes. The consistent positive direction of growth across all cohorts indicates broad scalability for diverse classroom environments.

3.5 Real-World Practical Significance

Even statistically non-significant gains (+ 31–37 points) correspond to meaningful educational improvement, as 15–25 point increases are typically perceptible in classroom performance and confidence. All four measured transitions showed directionally positive growth relative to national norms, indicating reliable benefits even where statistical significance thresholds were not reached.

Effect sizes (Cohen’s d = 0.12–0.30) fall within the small but educationally relevant range; sustained over multiple terms, these gains can yield long-term cumulative advantages in academic development.

Grade 5 transformation: Lexile improvement from − 72 to + 46 (Δ = 118) highlights tangible academic and psychological progress.

Positive consistency: 65% of grade-measure combinations showed clear gains, with no evidence of regression.

These magnitudes are observable by teachers and families and justify implementation even in the absence of large statistical effects, given low program cost and no academic risk.

3.6 Limitations

Small grade-level sample sizes (n = 1–16) limited statistical power and generalizability. The absence of a non-intervention control group restricted causal inference, and self-selection may have introduced participation bias. Differences among assessment systems (Lexile, Quantile, IXL) reflect varied scaling metrics.

Despite these constraints, consistent positive trends across grades, measures, and time periods provide convergent evidence that the intervention produced practical, educationally meaningful gains beyond statistical thresholds.

4. Cognitive Outcomes

In addition to standardized tests, the students also completed the Depression Anxiety and Stress youth version (DASS21-Y) self-report, which is the youth version (ages 7–18) of DASS21, which is “designed to assess the severity of general psychological distress and symptoms related to depression, anxiety, and stress” [15]. The DASS21-Y report includes 21 statements about the subject’s feelings during the prior week. Some examples of statements are “I got upset about little things”, “I did not enjoy anything”, “I felt scared for no good reason”, etc. Subjects respond to each statement with a score of 0 (Not true), 1 (A little true), 2 (Fairly true), and 3 (Very true). The 21 statements fall into three groups of 7 statements each: Depression, Anxiety, and Stress. To measure each of these, the scores of all their corresponding responses are added up and multiplied by 2.

The students completed the DASS-Y questionnaire before and after the breathing and mindfulness sessions.

4.1 Stress

Significant reduction in self-reported stress (mean Δ = −0.99, p = 0.012; Fig. 5). Anxiety and depression changes were nonsignificant. See Supplementary Results, Tables 5–6 for full breakdowns.

Fig. 5

Before and after distributions for Stress, Anxiety, and Depression, as well as change distributions of the three distress symptoms.

Table 5
Cumulative results across all Stress, Anxiety, and Depression.
Distress symptom	Users with overall increase	Users with overall decrease	Users with no net change	Average total change
Stress	26	52	15	-0.99
Anxiety	27	37	29	-0.49
Depression	26	36	31	-0.09

Table 6
Before and after statistics for the DASS21-Y report.
Distress symptom	Before (mean)	After (mean)	Average Change	Paired t-test	Significant change
Stress	7.89 (± 4.61)	6.90 (± 4.93)	-0.99 (± 3.70)	t = 2.577, p = 0.0116	Yes (α = 0.05)
Anxiety	4.06 (± 4.39)	3.57 (± 4.47)	-0.49 (± 3.61)	t = 1.322, p = 0.1894	No (α = 0.05)
Depression	4.35 (± 4.42)	4.27 (± 4.86)	-0.09 (± 3.50)	t = 0.237, p = 0.8132	No (α = 0.05)

4.2 Cognitive Performance

In addition to the above, the students also took a GradCPT test, which was based on the work of Esterman, et al. [14]. The focus test was interactive. It showed a random sequence of images that were either a city or a mountain. Students were asked to click on city images and not click on mountain images. The challenge was that the images were shown in quick succession: 900 images in a timespan of 12 minutes and every image was shown for only 800ms. There were a total of 10 unique city images and 10 unique mountain images, which were chosen at random. Since there were 900 images in total that are shown to the students, the images repeated several times. 90% of the images shown were cities and 10% are mountains. Given the speed and the intensity of the task, the test was designed to measure the ability of the subject to keep their focus throughout the exercise.

p < 0.001; Cohen’s d = 0.23; Fig. 6), indicating improved sustained attention. See Supplementary Information, Fig. 8–11 and Table 7.

Fig. 6

Comparison between reaction times before and after the intervention sessions. There is a statistically significant mean drop of 39.827ms. The p-value is < 0.001 (highly significant), the t-statistic is 37.2663, and Cohen’s d is 0.2256 (small to medium effect).

Fig. 8

Focus test results (hit rate, false alarm rate, d′). The hit rate (clicking on cities) was compared to the false alarm rate (clicking on mountains). Sensitivity analysis was performed by calculating the d’ statistic [21]. The results are summarized above.

Fig. 9

Number of correct/incorrect responses for the 177 focus tests and their frequency distributions. Students averaged 59.1% correct and 40.9% false responses. City hit rate was 57.4% and mountain false alarm rate 25.6%, matching [14]’s false alarms but far below its 97% hit rate—likely due to the brief 800 ms display and frustration from mistakes. The first significant difference between our study and that of [14] - The false alarm rate in [14] was between 25–30%, which aligns with our findings. But the hit rate in [14] was close to 97% which is much higher than ours. There are two reasons speculated for this: 1) the images are only shown for 800ms which is not enough for younger students to respond to, and 2) it was noticed that when they made a mistake oftentimes it would lead to frustration, which, in turn, would make them miss additional subsequent images.

Fig. 10

Binned response time distributions for city and mountain images. The mean response time is lower for cities (435.3ms) than for mountains (358.1ms) and there is a significant increase in the mountain graph for quick clicks (< 250ms). The mean response time for a city was 435.3ms and for a mountain 358.2ms. Both graphs are bimodal and it was speculated that the first mode at ~ 200ms are clicks made instinctively, whereas the second mode, at ~ 420-440ms, corresponds to more conscious decisions. It is not surprising that the mean for cities is higher than for mountains, since given more time to think, students will make the correct decision to not click on mountains. It is also no surprise that the first mode in the mountain graph is relatively higher than the other since these clicks correspond to less conscious decisions.

Fig. 11

Average reaction time (left) and standard deviation (right) for each image position across all tests. Figure 10 shows the average reaction time for each of the 900 images across all 177 tests as well as the standard deviation of the reaction time for each image. There is a significant decrease in the reaction time as the focus test progresses, and the variability of the standard deviation also converges.

Table 7
Correct/incorrect responses based on images type
Image type	True answers across all tests	False answers across all tests	Total count
City	82244 (57.4%)	61049 (42.6%)	143293
Mountain	11910 (74.4%)	4097 (25.6%)	16007
All	94154 (59.1%)	65146 (40.9%)	159300

A decrease in reaction time variability (RTV) in the (GradCPT) after a mindfulness session is a positive sign of improved sustained attention and cognitive stability. Specifically -

Improved Attentional Control: Lower RTV suggests the participant's attention became more consistent and less prone to moment-to-moment lapses.

Reduced Mind-Wandering: High variability is often linked to episodes of mind-wandering or attentional drift. A reduction indicates better focus and reduced cognitive fluctuation.

Enhanced Cognitive Stability: Reaction times that are more uniform suggest the participant is maintaining a more stable cognitive state, which is desirable in tasks requiring sustained attention.

Acute Effect of Mindfulness:

The mindfulness session likely had an immediate, beneficial effect on the participant’s ability to stay engaged and focused.

4.3 Physiological Outcomes

Intervention groups maintained heart rate variability (HRV) stability over the 4–5 week program, whereas control groups exhibited a decline over the same period. Although the group difference did not reach statistical significance (p = 0.552), the effect size was consistent with a meaningful physiological trend, suggesting potential modulation of autonomic regulation. HRV trajectories are provided in Supplementary Results (Fig. 11).

5.Discussion

Fig. 7

Multimodal outcomes showing convergent improvements across physiological, cognitive, academic, and emotional domains.

Panel A: HRV stability increased by + 4–6 ms (autonomic regulation).

Panel B: Sustained attention improved (GradCPT performance uptrend).

Panel C: Non-significant positive trends were observed in Lexile and Quantile academic scores of ≈ + 30 points each above national expectations.

Panel D: Significant stress reduction (Δ = −0.99, p = 0.012).

Together, these results demonstrate coherent enhancement across biological, behavioral, and learning systems.

Previous large-scale mindfulness trials, including the MYRIAD study [1], have raised critical questions about the effectiveness of mindfulness programs in school settings—particularly when outcomes rely primarily on self-reports or teacher ratings. Our findings contribute to this ongoing debate by demonstrating that even brief, structured mindfulness interventions can produce measurable cognitive and emotional benefits when engagement is continuously tracked through real-time physiological signals and behavioral metrics.

Unlike prior classroom-based interventions that depended on subjective recall or infrequent assessments [2–4], the present study incorporated remote photoplethysmography (rPPG) to noninvasively monitor cardiovascular and stress-related markers. This integration enabled fine-grained analysis of how physiological arousal and attention fluctuated across sessions and how these changes related directly to academic performance. Such multimodal tracking provides a framework for understanding not only whether mindfulness interventions are effective, but also how and when they exert their effects.

While prior meta-analyses have emphasized that short-term school programs often lack ecological validity [1–4], our two-year field implementation within active school demonstrates the feasibility of scalable, touchless monitoring systems that operate unobtrusively during everyday learning. By aligning physiological engagement data with academic and behavioral outcomes, this approach advances beyond the self-report limitations of earlier research and bridges the gap between controlled laboratory studies and naturalistic educational contexts.

Brief, structured mindfulness and breathing exercises—digitally delivered with professional facilitation and real-time, touchless physiological and cognitive monitoring— were associated with measurable improvements in sustained attention (p < 0.001) and stress reduction (p = 0.012), as well as positive academic trends which were statistically significant for a separate middle school cohort (p < 0.001). These effects were observed across both higher- and lower-performing learners, including those from socioeconomically disadvantaged settings. Recruiting 96 students for a multimodal, longitudinal study in active classrooms—integrating cognitive, physiological, self-report, and academic measures—required extensive coordination and yielded a rare, ecologically valid dataset for this developmental group.

Although neural activity was not directly measured, the findings—stable HRV alongside improved sustained attention—are consistent with established models proposing that autonomic regulation supports higher-order cognitive control via prefrontal–subcortical networks. Empirical work shows that individuals with higher heart rate variability exhibit stronger functional connectivity between the amygdala and medial prefrontal cortex (mPFC), components of a central autonomic network that supports emotion regulation and attention [22–25]. This aligns with the neurovisceral integration model linking HRV with prefrontal inhibitory control, offering a plausible pathway by which brief mindfulness training may enhance attention and learning outcomes in children, meriting targeted neural investigation in future trials. Together, these converging lines of evidence suggest that improved self-regulation of autonomic and attentional systems may constitute a shared neurophysiological pathway underlying the observed academic and cognitive benefits.

Overall, this scalable, data-driven model enables objective engagement tracking and adaptive intervention delivery, addressing key methodological limitations of prior school-based mindfulness studies.

Limitations

The absence of a long-term control group, relatively small subcohorts, and the single-district scope constrain causal inference. However, consistent multimodal effects across cognitive, physiological, and academic domains support the promise of this approach and warrant further large-scale evaluation.

6. Conclusion

This study provides rare, ecologically valid evidence that a digitally delivered, measurement-integrated mindfulness program—facilitated in real schools and classrooms—can measurably enhance attention and reduce stress in preadolescents in a scalable way. Achieving full recruitment and retention of 96 students, including a Title I cohort, underscores the feasibility of implementing such scalable, low-cost interventions while collecting multi-modal physiological, cognitive, and academic data in active school settings. These early, challenging-to-replicate findings point to a viable, data-driven pathway for integrating mindfulness into global education systems to boost student attention and well-being, which may create a stronger foundation for academic success.

Supplementary Information

Supplementary Information accompanies this paper and includes extended HRV–RTV analyses, raw physiological signal visualizations, and task performance data tables. These materials provide additional methodological details and robustness checks supporting the main results.

Fig. 12

HRV Trajectory of Experimental and Control Groups

Ethics approval

and compliance

This study was approved by the Institutional Review Board of WCG Clinical (WCG IRB) under IRB Numbers − 20231480 and 20243759.

All methods were performed in accordance with the relevant guidelines and regulations.

Written informed consent was obtained from all participants and/or their legal guardians prior to participation.

Consent to Participate

All participants provided informed consent prior to their involvement in the study.

For minors, written consent was obtained from a parent or legal guardian.

In addition, participants aged 12 years and older provided written assent in accordance with institutional review board (IRB) guidelines.

Data Availability

The datasets generated and analyzed during the current study are not publicly available due to institutional and proprietary restrictions but are available from the corresponding author upon reasonable request. Aggregated anonymized data supporting the findings of this study can be shared for research verification purposes upon request.The analytical and signal-processing code used in this study involves proprietary algorithms and is not publicly available.

Funding

Declaration

This research received no external funding.

Author Contribution

Sameer Yami designed the experiment, managed logistics, secured funding, ran the experiment, collected data, wrote code, analyzed data, provided mindfulness resources, and wrote the paper.Kostas Tsioutsiouliklis contributed to the design of the experiment, wrote code, analyzed data, and wrote the paper.Philippe Goldin contributed to the design of the experiment.Steven Laureys contributed to the design of the experiment.Bob Stahl provided mindfulness resources.Dianne Castillano-Farley supported logistics and data collection.Benjamin He wrote code, analyzed data, and contributed to writing the paper.

Declarations

Competing Interests

Sameer Yami is the Founder of Augment Me, Inc., which conducted this study. Kostas Tsioutsiouliklis and Benjamin He are employees of Augment Me, Inc. Philippe Goldin, Steven Laureys and Bob Stahl serve as scientific advisors to Augment Me, Inc. The other authors declare no competing interests.

References

Kuyken, W. et al. Effectiveness and cost-effectiveness of universal school-based mindfulness training compared with normal school provision in reducing risk of mental health problems and promoting well-being in adolescence: the MYRIAD cluster randomised controlled trial. Evid. Based Ment Health. 25, 99–109 (2022).

Zenner, C., Herrnleben-Kurz, S. & Walach, H. Mindfulness-based interventions in schools - a systematic review and meta-analysis. Front. Psychol. 5, 603 (2014).

Dunning, D. L. et al. Meta-analysis on mindfulness-based interventions for adolescents in school settings: significant effects for stress but not for depression or anxiety. Child Adolesc. Ment Health (2023).

Ostermann, T., Pawelkiwitz, M. & Cramer, H. The influence of mindfulness-based interventions on the academic performance of students measured by their GPA: a systematic review and meta-analysis. Front. Behav. Neurosci. 16, 961070 (2022).

Sohn, E. & American Psychological Association. There’s a strong push for more school psychologists, (2024). https://www.apa.org/monitor/2024/01/trends-more-school-psychologists-needed

Picard, R. W. Affective Computing (MIT Press, 1997).

Augment & Me Inc. WotNow?! https://augment-me.com.

Emory University. See Learning. https://seelearning.emory.edu/en/home

McDuff, D., Estepp, J., Piasecki, A., Blackford, E. & Affectiva, O. S. M. I. Non-contact measurement of heart rate variability using a webcam. IEEE Trans. Biomed. Eng. 61, 2656–2663 (2014).

10.

de Haan, G. & Jeanne, V. Robust pulse rate from chrominance-based rPPG. Biomed. Opt. Express. 4, 2375–2389 (2013).

11.

Rouast, P. V., Adam, M. T. P., Cornforth, D., Lux, E. & Weaving, C. Remote heart rate measurement using low-cost webcams: Validation and applications in affective computing. Front. Comput. Sci. 3, 642168 (2021).

12.

Li, H., Kwong, S., Yang, L., Huang, D. & Xiao, D. Hilbert-Huang Transform for Analysis of Heart Rate Variability in Cardiac Health. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1204–1212 (2011).

13.

Liu, X., Fromm, J., Patel, S. & McDuff, D. Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement. in Advances in Neural Information Processing Systems vol. 33, 15538–15549Curran Associates, Inc., (2020).

14.

Esterman, M., Noonan, S. K., Rosenberg, M. & Degutis, J. In the zone or zoning out? Tracking behavioral and neural fluctuations during sustained attention. Cereb. Cortex. 23, 2712–2723 (2013).

15.

Lovibond, S. H. & Lovibond, P. F. Manual for the Depression Anxiety & Stress ScalesPsychology Foundation, (1995).

16.

Lennon, C. & Burdick, H. The Lexile Framework as an Approach for Reading Measurement and SuccessMetaMetrics,. (2004).

17.

MetaMetrics Inc. Lexile–Grade Level Charts. https://hub.lexile.com/lexile-grade-level-charts/

18.

IXL Learning. https://www.ixl.com

19.

Zhao, Y. & Mayne, Z. National Norms for IXL’s Diagnostic in Grades K-12IXL Learning,. (2024).

20.

Curriculum Associates. i-Ready. https://www.curriculumassociates.com/programs/i-ready-learning

21.

Wickens, T. D. Elementary Signal Detection Theory Vol. 20 (OUP USA, 2001).

22.

Sakaki, M. et al. Heart rate variability is associated with amygdala functional connectivity with medial prefrontal cortex across younger and older adults. NeuroImage 139, 44–52 (2016).

23.

Huber, A. et al. Brain activation and heart rate variability as markers of autonomic function under stress. Sci. Rep. 15, 12430 (2025).

24.

Steinfurth, E. C. K. et al. Resting State Vagally-Mediated Heart Rate Variability Is Associated With Neural Activity During Explicit Emotion Regulation. Front. Neurosci. 12, 794 (2018).

25.

Thayer, J. F. & Lane, R. D. A model of neurovisceral integration in emotion regulation and dysregulation. J. Affect. Disord. 61, 201–216 (2000).

Yes