Detecting sleep using heart rate and motion data from multisensor consumer-grade wearables, relative to wrist actigraphy and polysomnography

Improving Sleep Quality Assessment Using Wearable Sensors by Including Information From Postural/Sleep Position Changes and Body Acceleration: A Comparison of Chest-Worn Sensors, Wrist Actigraphy, and Polysomnography

Journal of Clinical Sleep Medicine, 2017

Study Objectives: To improve sleep quality assessment using a single chest-worn sensor by extracting body acceleration and sleep position changes. Methods: Sleep patterns of 21 participants (50.8 ± 12.8 years, 47.8% female) with self-reported sleep problems were simultaneously recorded using a chest sensor (Chest), an Actiwatch (Wrist), and polysomnography (PSG) during overnight sleep laboratory assessment. An algorithm for Chest was developed to detect sleep/wake epochs based on body acceleration and sleep position/postural change data, which were then used to estimate sleep parameters of interest. Comparisons between Chest and Wrist with respect to PSG were performed. Identification of sleep/wake epochs was assessed by estimating sensitivity, specificity, and accuracy. Agreement between sensor-derived sleep parameters and PSG was assessed using correlation coefficients and Bland-Altman analysis. Results: Chest identified sleep/wake epochs with an accuracy on average 6 percentage points higher than Wrist (85.8% versus 79.8%). Similar trends were observed for sensitivity/specificity values. Correlation between Wrist and PSG was poor for most of the sleep parameters of interest (r = 0.0-0.3); however, correlation between Chest and PSG was moderate to strong (r = 0.4-0.8), with relatively low bias and high precision, reported as bias (precision): 9.2 (13.2) minutes for sleep onset latency; 17.3 (34.8) minutes for total sleep time; 7.5 (29.8) minutes for wake after sleep onset; and 2.0 (7.3)% for sleep efficacy. Conclusions: Combination of sleep postural/position changes and body acceleration improved detection of sleep/wake epochs compared to wrist acceleration alone. The chest sensors also improved estimation of sleep parameters of interest with stronger agreement with PSG. Our findings may expand the application of wearable sensors to clinically assess sleep outside of a sleep laboratory.
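The evaluation steps named in this abstract (epoch-level sensitivity, specificity, and accuracy, plus Bland-Altman bias) can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names are hypothetical, sleep is assumed to be the positive class (label 1) and wake the negative class (label 0), and the "precision" reported in the abstract is assumed here to be the standard deviation of the device-minus-PSG differences.

```python
from statistics import mean, stdev


def epoch_agreement(device, reference):
    """Compare two equal-length sequences of epoch labels (1 = sleep, 0 = wake).

    Sleep is treated as the positive class, so sensitivity is the share of
    reference sleep epochs the device also scored as sleep, and specificity
    is the share of reference wake epochs the device also scored as wake.
    """
    if len(device) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(device, reference))
    tp = sum(1 for d, r in pairs if d == 1 and r == 1)
    tn = sum(1 for d, r in pairs if d == 0 and r == 0)
    fp = sum(1 for d, r in pairs if d == 1 and r == 0)
    fn = sum(1 for d, r in pairs if d == 0 and r == 1)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "accuracy": (tp + tn) / len(pairs),
    }


def bland_altman(device_values, reference_values):
    """Return (bias, sd, limits) for paired per-night measurements.

    bias is the mean device-minus-reference difference, sd is the sample
    standard deviation of those differences, and limits are the usual
    95% limits of agreement, bias +/- 1.96 * sd.
    """
    diffs = [d - r for d, r in zip(device_values, reference_values)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired measurements")
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)
```

With sleep as the positive class, a device that rarely scores wake can still reach high accuracy on a night that is mostly sleep, which is why validation studies report specificity separately.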

A Validation Study of a Commercial Wearable Device to Automatically Detect and Estimate Sleep

Biosensors

The aims of this study were to: (1) compare actigraphy (ACTICAL) and a commercially available sleep wearable (i.e., WHOOP) under two functionalities (i.e., sleep auto-detection (WHOOP-AUTO) and manual adjustment of sleep (WHOOP-MANUAL)) for two-stage categorisation of sleep (sleep or wake) against polysomnography; and (2) compare WHOOP-AUTO and WHOOP-MANUAL for four-stage categorisation of sleep (wake, light sleep, slow wave sleep (SWS), or rapid eye movement sleep (REM)) against polysomnography. Six healthy adults (male: n = 3; female: n = 3; age: 23.0 ± 2.2 yr) participated in the nine-night protocol. Fifty-four sleeps assessed by ACTICAL, WHOOP-AUTO and WHOOP-MANUAL were compared to polysomnography using difference testing, Bland–Altman comparisons, and 30-s epoch-by-epoch comparisons. Compared to polysomnography, ACTICAL overestimated total sleep time (37.6 min) and underestimated wake (−37.6 min); WHOOP-AUTO underestimated SWS (−15.5 min); and WHOOP-MANUAL underestimated wake ...

Detecting sleep outside the clinic using wearable heart rate devices

Scientific Reports

The adoption of multisensor wearables presents the opportunity of longitudinal monitoring of sleep in large populations. Personalized yet device-agnostic algorithms can sidestep laborious human annotations and objectify cross-cohort comparisons. We developed and tested a heart rate-based algorithm that captures inter- and intra-individual sleep differences in free-living conditions and does not require human input. We evaluated it on four study cohorts using different research- and consumer-grade devices for over 2000 nights. Recording periods included both 24 h free-living and conventional lab-based night-only data. We compared our optimized method against polysomnography, sleep diaries and sleep periods produced through a state-of-the-art acceleration based method. Against sleep diaries, the algorithm yielded a mean squared error of 0.04–0.06 and a total sleep time (TST) deviation of −2.70 (± 5.74) and 12.80 (± 3.89) minutes, respectively. When evaluated with PSG lab studie...

Assessing the performance of a commercial multisensory sleep tracker

PLOS ONE, 2020

Wearable sleep technology allows for a less intrusive sleep assessment than PSG, especially in long-term sleep monitoring. Though such devices are less accurate than PSG, sleep trackers may still provide valuable information. This study aimed to validate a commercial sleep tracker, Garmin Vivosmart 4 (GV4), against polysomnography (PSG) and to evaluate intra-device reliability (GV4 vs. GV4). Eighteen able-bodied adults (13 females, M = 56.1 ± 12.0 years) with no self-reported sleep disorders were monitored simultaneously by GV4 and PSG for one night, while intra-device reliability was monitored in one participant for 23 consecutive nights. Intra-device agreement was considered sufficient (observed agreement = 0.85 ± 0.13, Cohen’s kappa = 0.68 ± 0.24). GV4 detected sleep with high accuracy (0.90) and sensitivity (0.98) but low specificity (0.28). Cohen’s kappa was calculated for sleep/wake detection (0.33) and sleep stage detection (0.20). GV4 significantly underestimated time a...
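The gap between the high accuracy and the modest Cohen's kappa reported here follows from how kappa corrects for chance agreement. A minimal Python sketch of the standard two-rater kappa formula is given below; the function name and the toy labels are illustrative assumptions and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each rater's label frequencies.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("sequences must be non-empty and of equal length")
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Toy night (hypothetical): 90 sleep epochs and 10 wake epochs in the reference.
# The device scores every sleep epoch correctly but only 3 of the 10 wake epochs.
reference = ["sleep"] * 90 + ["wake"] * 10
device = ["sleep"] * 90 + ["wake"] * 3 + ["sleep"] * 7
kappa = cohens_kappa(device, reference)
```

In this toy night the raw agreement is 93%, yet most of it is expected by chance because both label sequences are dominated by sleep, so kappa comes out well below the raw agreement.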

Sleep detection with an accelerometer actigraph: comparisons with polysomnography

Physiology & Behavior, 2001

Two validation studies were conducted to optimize the sleep-detection algorithm of the Actillume. The first study used home recordings of postmenopausal women (age range: 51 to 77 years), which were analyzed to derive the optimal algorithm for detecting sleep and wakefulness from wrist activity data, both for nocturnal in-bed recordings and considering the entire 24 h. The second study explored the optimal algorithm to score in-bed recordings of healthy young adults (age range: 19 to 34 years) monitored in the laboratory. In Study I, the algorithm for in-bed recordings (n = 39) showed a minute-by-minute agreement of 85% between Actillume and polysomnography (PSG), a correlation of .98, and a mean measurement error (ME) of 21 min for estimates of sleep duration. Using the same algorithm to score 24-h recordings with Webster's rules, an agreement of 89%, a correlation of .90, and 1 min ME were observed. A different algorithm proved optimal to score in-bed recordings (n = 31) of young adults, yielding an agreement of 91%, a correlation of .92, and an ME of 5 min. The strong correlations and agreements between sleep estimates from Actillume and PSG in both studies suggest that the Actillume can reliably monitor sleep and wakefulness both in community-residing elderly and healthy young adults in the laboratory. However, different algorithms are optimal for individuals with different characteristics.

It is All in the Wrist: Wearable Sleep Staging in a Clinical Population versus Reference Polysomnography

Nature and Science of Sleep, 2021

There is great interest in unobtrusive long-term sleep measurements using wearable devices based on reflective photoplethysmography (PPG). Unfortunately, consumer devices are not validated in patient populations and therefore not suitable for clinical use. Several sleep staging algorithms have been developed and validated based on ECG-signals. However, translation from these techniques to data derived by wearable PPG is not trivial, and requires the differences between sensing modalities to be integrated in the algorithm, or having the model trained directly with data obtained with the target sensor. Either way, validation of PPG-based sleep staging algorithms requires a large dataset containing both gold standard measurements and PPG-sensor in the applicable clinical population. Here, we take these important steps towards unobtrusive, long-term sleep monitoring. Methods: We developed and trained an algorithm based on wrist-worn PPG and accelerometry. The method was validated against reference polysomnography in an independent clinical population comprising 244 adults and 48 children (age: 3 to 82 years) with a wide variety of sleep disorders. Results: The classifier achieved substantial agreement on four-class sleep staging with an average Cohen's kappa of 0.62 and accuracy of 76.4%. For children/adolescents, it achieved even higher agreement with an average kappa of 0.66 and accuracy of 77.9%. Performance was significantly higher in non-REM parasomnias (kappa = 0.69, accuracy = 80.1%) and significantly lower in REM parasomnias (kappa = 0.55, accuracy = 72.3%). A weak correlation was found between age and kappa (ρ = −0.30, p<0.001) and age and accuracy (ρ = −0.22, p<0.001). Conclusion: This study shows the feasibility of automatic wearable sleep staging in patients with a broad variety of sleep disorders and a wide age range. 
Results demonstrate the potential for ambulatory long-term monitoring of clinical populations, which may improve diagnosis, estimation of severity and follow up in both sleep medicine and research.

Validation of Zulu Watch against Polysomnography and Actigraphy for On-Wrist Sleep-Wake Determination and Sleep-Depth Estimation

Sensors

Traditional measures of sleep or commercial wearables may not be ideal for use in operational environments. The Zulu watch is a commercial sleep-tracking device designed to collect longitudinal sleep data in real-world environments. Laboratory testing is the initial step towards validating a device for real-world sleep evaluation; therefore, the Zulu watch was tested against the gold-standard polysomnography (PSG) and actigraphy. Eight healthy, young adult participants wore a Zulu watch and Actiwatch simultaneously over a 3-day laboratory PSG sleep study. The accuracy, sensitivity, and specificity of epoch-by-epoch data were tested against PSG and actigraphy. Sleep summary statistics were compared using paired samples t-tests, intraclass correlation coefficients, and Bland–Altman plots. Compared with either PSG or actigraphy, both the accuracy and sensitivity for Zulu watch sleep-wake determination were >90%, while the specificity was low (~26% vs. PSG, ~33% vs. actigraphy). The ...

Sleep assessment by means of a wrist actigraphy-based algorithm: agreement with polysomnography in an ambulatory study on older adults

Chronobiology International, 2020

The purpose of the present work is to examine, on a clinically diverse population of older adults (N = 46) sleeping at home, the performance of two actigraphy-based sleep tracking algorithms (i.e., Actigraphy-based Sleep algorithm, ACT-S1 and Sadeh's algorithm) compared to manually scored electroencephalography-based PSG (PSG-EEG). ACT-S1 allows for a fully automatic identification of sleep period time (SPT) and, within the identified sleep period, the sleep-wake classification. SPT detected by ACT-S1 did not differ statistically from that obtained with PSG-EEG (bias = −9.98 min; correlation 0.89). In sleep-wake classification on 30-s epochs within the identified sleep period, the new ACT-S1 presented similar or slightly higher accuracy (83-87%), precision (86-89%) and F1 score (90-92%), significantly higher specificity (39-40%), and significantly lower, but still high, sensitivity (96-97%) compared to Sadeh's algorithm, which achieved 99% sensitivity as the only measure better than ACT-S1's. Total sleep times (TST) estimated with ACT-S1 and Sadeh's algorithm were higher than, but still highly correlated with, PSG-EEG's TST. Sleep quality metrics of sleep period efficiency and wake-after-sleep-onset computed by ACT-S1 were not significantly different from PSG-EEG, while the same sleep quality metrics derived by Sadeh's algorithm differed significantly from PSG-EEG. Agreement between ACT-S1 and PSG-EEG was highest when analyzing the subset of subjects with least disrupted sleep (N = 28). These results provide evidence of promising performance of a full automation of the sleep tracking procedure with ACT-S1 on older adults. Future longitudinal validations across specific medical conditions are needed. The algorithm's performance may further improve with integrating multi-sensor information.
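Summary parameters such as TST, wake after sleep onset, and sleep efficiency are typically derived from the epoch-by-epoch sleep/wake sequence. The sketch below shows one common way to compute them in Python, assuming 30-s epochs, labels of 1 for sleep and 0 for wake, and a recording that spans the whole time in bed. Definitions vary between studies, and this is not the ACT-S1 or Sadeh algorithm; the function name and the definitions chosen are assumptions for illustration.

```python
def sleep_summary(epochs, epoch_seconds=30):
    """Derive summary sleep parameters from epoch labels (1 = sleep, 0 = wake).

    Assumed definitions (they differ across studies):
      - sleep onset latency (SOL): time from the first epoch to the first sleep epoch
      - total sleep time (TST): total duration of all sleep epochs
      - wake after sleep onset (WASO): wake time from the first sleep epoch onward
      - sleep efficiency: TST as a percentage of the whole recording
    All durations are returned in minutes.
    """
    if not epochs:
        raise ValueError("epochs must not be empty")
    minutes_per_epoch = epoch_seconds / 60.0
    time_in_bed = len(epochs) * minutes_per_epoch

    first_sleep = next((i for i, label in enumerate(epochs) if label == 1), None)
    if first_sleep is None:
        # No sleep detected: the whole recording counts as latency.
        return {"sol_min": time_in_bed, "tst_min": 0.0,
                "waso_min": 0.0, "efficiency_pct": 0.0}

    sleep_epochs = sum(1 for label in epochs if label == 1)
    wake_after_onset = sum(1 for label in epochs[first_sleep:] if label == 0)
    tst = sleep_epochs * minutes_per_epoch
    return {
        "sol_min": first_sleep * minutes_per_epoch,
        "tst_min": tst,
        "waso_min": wake_after_onset * minutes_per_epoch,
        "efficiency_pct": 100.0 * tst / time_in_bed,
    }
```

Because a device that misses wake epochs inflates TST and deflates WASO at the same time, low specificity at the epoch level shows up directly in these summary parameters.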

A Comparison of Sleep Detection by Wrist Actigraphy, Behavioral Response, and Polysomnography

Sleep, 1997

Two alternative methods for detecting sleep, wrist actigraphy (ACT) and behavioral response monitoring (BRM), were compared to polysomnography (PSG). In the BRM paradigm, a threshold intensity visual or auditory stimulus generated by a palm-top computer was presented about once per minute, and subjects pressed a microswitch if the stimulus was detected. A response within 5 seconds of the stimulus was scored as "wake" and a failure to respond as "sleep." Four males and four females underwent two nights of simultaneous in-home PSG, BRM, and ACT. Each night, subjects underwent a protocol designed to generate five sleep latency trials. Subjects were awakened by alarm clocks at approximately 1-hour intervals and remained awake for 10 minutes before returning to bed for another sleep onset latency (SOL) trial. Minute-by-minute comparisons were made for PSG versus ACT and BRM. All measures were fairly sensitive in detecting sleep, but BRM was more accurate in determining SOL and subsequent wakefulness. Behavioral response monitoring using a tone resulted in more responses and arousals prior to and during light stages of sleep than BRM using a light. It is concluded that BRM has some important advantages as a simple, minimally invasive method for monitoring sleep.
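The BRM scoring rule described in this abstract (a response within 5 seconds of a stimulus is "wake", no response is "sleep") can be sketched as follows. The function name, the use of timestamps in seconds, and the choice to treat a response at exactly 5 seconds as "wake" are assumptions for illustration; the abstract does not specify these details.

```python
from bisect import bisect_left


def score_brm(stimulus_times, response_times, window_seconds=5.0):
    """Label each stimulus "wake" or "sleep" from microswitch responses.

    A stimulus is scored "wake" if any response falls in the interval
    [stimulus, stimulus + window_seconds]; otherwise it is scored "sleep".
    All times are in seconds from the start of the recording.
    """
    responses = sorted(response_times)
    labels = []
    for stimulus in stimulus_times:
        # Index of the first response at or after this stimulus.
        idx = bisect_left(responses, stimulus)
        responded = idx < len(responses) and responses[idx] - stimulus <= window_seconds
        labels.append("wake" if responded else "sleep")
    return labels


# Stimuli presented about once per minute; the response to the second one comes too late.
labels = score_brm([0, 60, 120, 180], [2.0, 66.0, 121.5])
```

With roughly one stimulus per minute, each label can be aligned with the corresponding minute of the PSG record for the minute-by-minute comparison the abstract describes.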

Multi-Night at-Home Evaluation of Improved Sleep Detection and Classification with a Memory-Enhanced Consumer Sleep Tracker

Nature and Science of Sleep

Purpose: To evaluate the benefits of applying an improved sleep detection and staging algorithm on minimally processed multisensor wearable data collected from older generation hardware. Patients and Methods: 58 healthy, East Asian adults aged 23-69 years (M = 37.10, SD = 13.03, 32 males) each underwent 3 nights of PSG at home, wearing 2nd Generation Oura Rings equipped with additional memory to store raw data from accelerometer, infra-red photoplethysmography and temperature sensors. 2-stage and 4-stage sleep classifications using a new machine-learning algorithm (Gen3) trained on a diverse and independent dataset were compared to the existing consumer algorithm (Gen2) for whole-night and epoch-by-epoch metrics. Results: Gen3 outperformed its predecessor with a mean (SD) accuracy of 92.6% (0.04), sensitivity of 94.9% (0.03), and specificity of 78.5% (0.11), corresponding to a 3%, 2.8% and 6.2% improvement from Gen2 across the three nights, with Cohen's d values >0.39, t values >2.69, and p values <0.01. Notably, Gen3 showed robust performance comparable to PSG in its assessment of sleep latency, light sleep, rapid eye movement (REM), and wake after sleep onset (WASO) duration. Participants <40 years of age benefited more from the upgrade, with less measurement bias for total sleep time (TST), WASO, light sleep and sleep efficiency compared to those ≥40 years. Males showed greater improvements in TST and REM sleep measurement bias compared to females, while females benefited more for deep sleep measures compared to males. Conclusion: These results affirm the benefits of applying machine learning and a diverse training dataset to improve sleep measurement of a consumer wearable device. Importantly, collecting raw data with appropriate hardware allows for future advancements in algorithm development or sleep physiology to be retrospectively applied to enhance the value of longitudinal sleep studies.