Scaling AI for expert-level seizure detection in newborns

January 13, 2025

by Robert Hogan

AI research

Neonatal seizures are a medical emergency but often occur without visible symptoms. Seizure recognition requires real-time expert EEG monitoring that is unfortunately unavailable in 82% of hospitals. Failure to treat these seizures in a timely manner is associated with adverse neurodevelopmental outcomes. 

To fill this need, there has been increasing interest in using AI to automate neonatal seizure detection and democratize access to this critical assessment. Our recent research, “Scaling convolutional neural networks achieves expert level seizure detection in neonatal EEG”, published in npj Digital Medicine, presents a breakthrough for this field by achieving expert-level accuracy in neonatal seizure detection for the first time.

The Challenge

Most previous work has been limited in data scale by the scarcity of expert time for EEG annotation. As a result, the datasets used are often very small and annotated with so-called “weak labels”, where a seizure is localised in time but not on specific EEG channels. This has limited the scale, and therefore the quality, of the models that can be trained on this data. While models have improved over time with newer techniques, they have continued to lag behind human expert performance.

Our Approach

We believe the path to success in AI is to scale training data and model size together, as has been borne out in many other fields. To make this possible for neonatal EEG, we built an annotation platform to maximise the efficiency of expert annotators. This allowed us to scale the dataset to over 50,000 hours of EEG recorded from 202 babies. Crucially, we use “strong labels”, which localise the seizure both in time and on specific EEG channels, providing a much better training signal for the model.
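To make the difference between weak and strong labels concrete, here is a minimal sketch in Python. The channel names, the one-value-per-second layout and the helper `to_weak_label` are illustrative assumptions, not the format used in our annotation platform or in the paper.

```python
# Hypothetical strong label: for each EEG channel, a per-second 0/1 seizure mask.
# Channel names and the six-second window are made up for illustration.
strong_label = {
    "F4-C4": [0, 0, 1, 1, 1, 0],
    "C4-O2": [0, 0, 0, 1, 1, 0],
    "F3-C3": [0, 0, 0, 0, 0, 0],
    "C3-O1": [0, 0, 0, 0, 0, 0],
}


def to_weak_label(strong):
    """Collapse per-channel masks into one per-second mask.

    A second counts as seizure if any channel is marked, which throws away
    the information about where on the head the seizure appears.
    """
    return [int(any(per_second)) for per_second in zip(*strong.values())]


weak_label = to_weak_label(strong_label)
print(weak_label)
```

A strong label can always be reduced to a weak one in this way, but not the reverse, which is why the strong label carries more training signal.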

This dataset, combined with a state-of-the-art deep convolutional neural network architecture, allowed us to scale the size of the models we could effectively train.

Big Gains from Scaling

Our hypothesis that scaling dataset size and model size would improve performance was confirmed by experiment. We found that by increasing the number of neonates or the total hours of annotated EEG in the dataset, we could get improvements of around 50% in detection performance.

Similarly, we scaled the model size by nearly three orders of magnitude, from a 38k-parameter model, typical of the literature, up to a 21M-parameter model. We found significant performance gains across a large number of metrics on two held-out test sets.

The resulting model sets a new state of the art for neonatal seizure detection, with an AUC of 0.982 on the open-source benchmark from Helsinki.
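For readers less familiar with the metric, AUC can be read as the probability that the model scores a randomly chosen seizure epoch higher than a randomly chosen non-seizure epoch. The sketch below computes it that way on toy data; the labels and scores are invented for illustration and this is not the evaluation code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = seizure epoch, 0 = non-seizure epoch (invented values).
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of epochs, so it is only suitable for small examples like this one.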

Expert Level Performance

In addition to reporting a variety of metrics on two held-out datasets, we compared our model directly to expert agreement. When reviewing neonatal EEG for seizures, experts do not agree 100% of the time. Without another source of ground truth, this inter-rater agreement sets a ceiling for measuring a model’s performance. To see how the model compares, we conducted a statistical test akin to the Turing test. For a given set of experts, we measure the inter-rater agreement, then compare it to what we would get if we replaced an expert with the AI. If there is no change in inter-rater agreement, we can say the model is expert equivalent. To make this more robust, we do this for each expert in turn and account for variability of the dataset using bootstrap statistics.
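The replace-an-expert idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure from the paper: it uses mean pairwise Cohen's kappa as the agreement statistic, resamples individual epochs in the bootstrap, and runs on simulated annotations. The function names are hypothetical.

```python
import random
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length binary label sequences."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    # If both raters use a single identical class, agreement is trivially perfect.
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)


def mean_agreement(raters):
    """Mean pairwise kappa over a list of raters (assumed agreement statistic)."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def agreement_change(experts, ai):
    """Average change in agreement when the AI replaces each expert in turn."""
    base = mean_agreement(experts)
    deltas = []
    for i in range(len(experts)):
        swapped = experts[:i] + [ai] + experts[i + 1:]
        deltas.append(mean_agreement(swapped) - base)
    return sum(deltas) / len(deltas)


def bootstrap_ci(experts, ai, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the change in agreement.

    Resamples epochs with replacement (an assumption made for simplicity).
    """
    rng = random.Random(seed)
    n = len(ai)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_experts = [[e[j] for j in idx] for e in experts]
        resampled_ai = [ai[j] for j in idx]
        stats.append(agreement_change(resampled_experts, resampled_ai))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Simulated annotations: three experts and an AI, each a noisy copy of a hidden truth.
rng = random.Random(42)
truth = [int(rng.random() < 0.2) for _ in range(300)]


def noisy(labels, flip=0.1):
    """Flip each label independently with probability `flip`."""
    return [y ^ int(rng.random() < flip) for y in labels]


experts = [noisy(truth) for _ in range(3)]
ai = noisy(truth)
lo, hi = bootstrap_ci(experts, ai, n_boot=200)
print(f"Interval for change in agreement: [{lo:.3f}, {hi:.3f}]")
```

If the interval comfortably contains zero, swapping the AI in for an expert makes no detectable difference to agreement in this toy setting. A real analysis would need to choose the agreement statistic and the resampling unit with care.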

In our paper we detail this procedure and show that our largest model is indistinguishable from human experts on two held-out test sets. This is an important milestone for the field and changes how we think about the capabilities of these models. It unlocks new use cases such as large-scale scientific studies of previously unexamined datasets. We’ve already taken our first steps in this area, described in our poster “Spatial density of early seizures in neonatal EEG is predictive of total seizure burden: a large-scale retrospective study”.

Bringing it to the cotside

This new model is one of several AI models that run as part of our short automated assessment of newborn brain health with the Wave device. You can read more about Wave here.