Toward precision EEG: assessing the reliability of individual-level ERPs across EEG systems

Abstract

Introduction:

Event-related potentials (ERPs) are among the most established tools for studying the neural mechanisms of perception and cognition. Advancing toward precision EEG, however, requires a better understanding of how reliable neural markers are at the individual-subject level.

Methods:

We conducted two complementary experiments using an auditory oddball paradigm with three sounds (Standard, Target, and Novel) to examine the reliability of N100 and P300 components. In Experiment 1, we assessed consistency at both the group and individual levels across four EEG systems: one research-grade wired system (BioSemi) and three mobile devices (Smarting, DSI-24, and EPOC X). In Experiment 2, we used a test–retest design to evaluate within-participant reliability over time.

Results:

Results from Experiment 1 show that at the group level, all EEG systems demonstrated the canonical N100 and P300 components; however, the EPOC X system showed a significantly reduced signal-to-noise ratio compared to the others. At the individual level, temporal and spatial clustering analyses showed that N100 and P300 components were detectable in most individuals (70–85%), with additional significant responses appearing outside this range. We further calculated the similarity of individual responses across participants (“typicality index”), which revealed highly consistent responses to Standard and Novel sounds, alongside divergent patterns of responses to Targets. In Experiment 2, results indicated high within-participant consistency of response patterns for all three stimuli, demonstrating that individual ERPs remain reliably stable over time, even when they deviate from canonical group-level patterns.

Conclusion:

The current study contributes to the ongoing discussion regarding the utility and reliability of ERP-based metrics for precision imaging and highlights important methodological considerations for their practical implementation.

1 Introduction

Event-related potentials (ERPs) have long been a cornerstone of cognitive neuroscience, offering a robust, non-invasive, and cost-effective means of examining the neural dynamics that underlie perception and higher-order cognitive functions. Since its inception (Donchin et al., 1973; Hillyard et al., 1973, 1998; Picton et al., 1974; Squires et al., 1975; Sutton et al., 1965), ERP research has relied heavily on averaging responses across individuals to increase the signal-to-noise ratio and to obtain neural responses that are common and generalizable (Luck, 2014; Makeig et al., 2004; Sur and Sinha, 2009). This approach has been hugely successful, leading to some of the most foundational discoveries in cognitive neuroscience and establishing a taxonomy of canonical ‘ERP components’ that have been used for decades to study the mechanisms underlying perceptual and cognitive processes in the human brain (Duncan et al., 2009; Hajcak et al., 2019; Helfrich and Knight, 2019; Luck, 2012, 2014). And yet, relying solely on group averages has its limitations, particularly if we strive to use ERP-based metrics to learn something about individual brains and their idiosyncrasies. This scientific aspiration, sometimes referred to as the precision imaging framework, captures the desire to use neuroimaging tools for individualized assessments of neurocognitive abilities or clinical conditions, with the capacity to inform personalized interventions and to assess treatment efficacy (Dubois and Adolphs, 2016; Gordon et al., 2017).
Indeed, ERP-based measures hold much promise as biomarkers for different neurocognitive conditions (de Aguiar Neto and Rosa, 2019; Keizer, 2021; McLoughlin et al., 2014; O’Sullivan et al., 2006), such as schizophrenia (Johnstone et al., 2013; Ren et al., 2021; Rosburg, 2018; Rosburg et al., 2008; Shen et al., 2020), attention deficit disorder (Gamma and Kara, 2016; Hasler et al., 2016; Kaiser et al., 2020; Slater et al., 2022), post-traumatic stress disorder (Javanbakht et al., 2011; Johnson et al., 2013; Lewine et al., 2002), and anxiety (Al-Ezzi et al., 2020; Howe et al., 2014; Lars Thoma et al., 2020; Turan et al., 2002). The practical appeal of individual ERP-based biomarkers is enhanced by the impressive advances in mobile EEG technology in recent years, which have made neural recording more affordable and accessible, bringing devices directly to the individual – in their home, hospital, clinic, or school (Davidesco et al., 2023; Davidesco et al., 2021; Gillis et al., 2022; Hölle et al., 2021; Hölle and Bleichner, 2023; Mathewson et al., 2024; Sabio et al., 2024; Xu et al., 2022).

However, the utility of individual-level ERPs as biomarkers is complicated by the inherent variability of these responses. As described extensively over the years, the spatiotemporal properties of individual-level ERPs are strongly influenced by multiple factors, including anatomical differences, electrode positioning, and head size (Luck, 2014; Makeig et al., 2004; McCarthy and Wood, 1985; O’Connor et al., 1994). Moreover, identifying specific ERP components in individual responses can be challenging, as it often relies on visual peak-picking or general heuristics that do not always capture the extent of individual variability and lack standardized procedures (Davidesco et al., 2023). In addition, the various ERP components (e.g., N100, MMN, P2, P300, N400, P600) may differ in their detectability across individuals and experimental designs. For example, the P300 component, which is often evoked by surprising events, is deemed more reliable than other components such as the N100 or P2 (Fabiani et al., 1998; Intriligator and Polich, 1995; Polich, 1987, 1997; Segalowitz and Barnes, 1993; Sklare and Lynn, 1984; Tervaniemi et al., 1999). Hence, the assumption that individual ERPs are simply ‘noisy’ versions of group-level ERPs, in terms of time course and spatial topography, does not seem to hold, and this poses a stark barrier for ERP-based precision imaging (Clayson et al., 2019; Fields and Kuperberg, 2020; Hajcak et al., 2017; Höller et al., 2017; Jensen and MacDonald, 2023; Karvelis et al., 2023; Luck and Gaspelin, 2017; Melnik et al., 2017). Specifically, it makes it difficult to distinguish between individual differences that truly reflect variability in neurocognitive abilities or clinical states (and could therefore serve as meaningful biomarkers) and individual differences that simply capture the “natural” variability between healthy brains (Hajcak et al., 2017; Höller et al., 2017).
That said, if we had a good assessment of this “natural” variability in individual-level ERP components, this would substantially advance efforts to assess their utility for precision imaging. The current study is an attempt to do just that – to quantify the similarity and variability in ERP responses across individuals, for specific ERP components and for the entire spatio-temporal pattern (Segalowitz and Barnes, 1993; Tomé et al., 2015).

We focused on one of the most well-studied ERP paradigms, the three-sound auditory oddball task (Fabiani et al., 1987; Hillyard et al., 1973; O’Connor et al., 1994; Squires et al., 1976). In this paradigm, participants listen to a sequence of repeated tones (Standards) and are asked to detect specific deviant tones (Targets). In addition, occasional unexpected Novel sounds are presented (in this case, short ecological sounds such as phone ringtones; Masson and Bidet-Caulet, 2019). This paradigm yields two prominent ERP components: an early N100, which reflects the neural response in early auditory cortices (Coull, 1998; Hillyard et al., 1973, 1998; Oatman and Anderson, 1977), and a later P300 response, which is associated with higher cognitive processes including attention capture, stimulus discrimination, and decision-making (Coull, 1998; Linden, 2005; Ghani et al., 2020; Hillyard et al., 1973; Lee et al., 2014; Polich, 2007; Tomé et al., 2015). While the N100 is considered an obligatory response to all auditory stimuli, the P300 is more selective and is usually observed in response to deviant, target, or surprising events (with potential dissociations between P300 subcomponents; for review, see Polich, 2007). Here, we use this paradigm to assess the similarity and variability in ERP responses to Standard, Target, and Novel sounds in a non-clinical population, focusing on the N100 and P300 components as well as on the full ERP waveform.

We conducted two complementary experiments. In Experiment 1, we assessed the consistency and variability of the time course of neural responses across individuals, and the probability of identifying the N100 and P300 components at the individual level, using group-constrained or fully data-driven analyses. Specifically, we compared results across four EEG systems spanning wired and wireless, research-grade and consumer-grade designs, with different numbers of sensors: BioSemi (64-channel, wired, gel-based), Smarting (24-channel, wireless, semi-dry, by mBrainTrain), DSI-24 (21-channel, wireless, dry, by Wearable Sensing), and EPOC X (14-channel, wireless, semi-dry, by Emotiv). The four systems were selected to represent distinct points along the continuum of EEG acquisition technologies currently used in cognitive neuroscience. They differ in electrode count, sensor layout, and hardware constraints (see Table 1), reflecting the diversity of EEG technologies used in both laboratory and real-world research. BioSemi served as a high-density, wired laboratory reference system, while EPOC X is one of the most commonly used mobile EEG devices in real-world studies (Sabio et al., 2024; Sawangjai et al., 2020). The Smarting and DSI-24 systems occupy an intermediate position, combining research-grade signal quality with portable hardware and moderate channel counts. Including systems with different electrode numbers and configurations allowed us to assess the generalizability of individual-level ERPs and their robustness across contexts and technical specifications.
This work aligns with growing efforts to advance the use of mobile neurotechnologies for clinical and research purposes outside the lab (for comprehensive reviews, see Armand Larsen et al., 2024; D’Angiulli et al., 2022; Hölle et al., 2021; Janssen et al., 2021; Lau-Zhu et al., 2019; Mathewson et al., 2024; Niso et al., 2023; Xu and Zhong, 2018). Experiment 2 was conducted using a research-grade EEG system (BioSemi Active II) and aimed to evaluate the degree of within-subject ERP consistency and how it relates to between-subject variability. Participants completed the same oddball task as in Experiment 1 twice, in a test–retest design, and we quantified the similarity of individual ERP spatio-temporal morphology between runs.

| System | Participants (N; age) | Data transfer | # EEG electrodes | Type of sensors | Sampling rate | Default reference–ground |
|---|---|---|---|---|---|---|
| BioSemi | N = 10; 24.27 ± 3.17 | Wired | 64 | Ag-AgCl electrodes with gel conductor | 1,024 Hz | CMS-DRL |
| Smarting (mBrainTrain) | N = 9; 22.70 ± 2.26 | Bluetooth | 24 | Passive electrodes with saline-soaked sponges as conductors | 250–500 Hz | Cz-AFz |
| DSI-24 (Wearable Sensing) | N = 11; 28.78 ± 5.12 | Bluetooth | 21 | Dry active electrodes | 300 Hz | Linked M1/M2-Fpz |
| EPOC X (Emotiv) | N = 7; 16.00 ± 0.00 | Bluetooth | 14 | Passive electrodes with saline-soaked sponges as conductors | 128 Hz | P3-P4 |

Overview of the four EEG systems used in the study.

The table presents participant demographics for each setup and summarizes key characteristics of each system, including data transfer method, number and type of electrodes, sampling frequency, and default reference and ground locations. Scalp layouts in the right column illustrate the electrode positions for each system (circled in red), overlaid on the scalp layout of the BioSemi channel system (based on the international 10–20 system).

2 Materials and methods – experiment 1

2.1 Participants

EEG was recorded from 37 participants (23 females, 16 males), aged 15 to 32 (see Table 1 for a description of the sample tested with each EEG system). Most participants were right-handed (4 were left-handed), and all reported having no hearing problems and no diagnosis of any neurological or psychiatric condition. The study was approved by the Institutional Review Board (IRB) of Bar-Ilan University. Informed written consent was obtained from all participants. For participants under the age of 18 (see the EPOC X setup), additional ethical approval was granted by the Israeli Ministry of Education, and parental consent was required before each session. All participants were compensated with money, course credits, or small tokens.

2.2 Paradigm

All participants performed an auditory oddball task with three types of auditory stimuli: Standard tones (1,000 Hz, 50 milliseconds duration), Target tones (1,500 Hz, 50 milliseconds duration), and a variety of Novel ecological sounds (300 milliseconds duration), such as phone ringtones, adapted from the work of Masson and Bidet-Caulet (2019). All stimuli included 10-millisecond ramp-up and ramp-down phases.
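The stimulus description above can be sketched in Python as follows. This is a minimal sketch that assumes a 44.1 kHz audio rate and linear ramps, since the text specifies neither the ramp shape nor the audio sampling rate. The Novel sounds themselves (Masson and Bidet-Caulet, 2019) are not reproduced here; the hypothetical `apply_ramps` helper would simply be applied to those waveforms once they are loaded.

```python
import numpy as np

FS_AUDIO = 44100  # assumed audio sampling rate (not stated in the text)


def apply_ramps(signal, fs=FS_AUDIO, ramp_ms=10.0):
    """Apply onset/offset ramps; the linear ramp shape is an assumption."""
    n_ramp = int(round(fs * ramp_ms / 1000.0))
    ramp = np.linspace(0.0, 1.0, n_ramp)
    envelope = np.ones(len(signal))
    envelope[:n_ramp] = ramp          # ramp-up
    envelope[-n_ramp:] = ramp[::-1]   # ramp-down
    return signal * envelope


def make_tone(freq_hz, dur_ms, fs=FS_AUDIO):
    """Pure sine tone of the given frequency and duration, with 10-ms ramps."""
    n = int(round(fs * dur_ms / 1000.0))
    t = np.arange(n) / fs
    return apply_ramps(np.sin(2 * np.pi * freq_hz * t), fs)


standard = make_tone(1000, 50)  # Standard: 1,000 Hz, 50 ms
target = make_tone(1500, 50)    # Target: 1,500 Hz, 50 ms
```

At 44.1 kHz each 50-ms tone is 2,205 samples long, of which 441 samples form each ramp.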

Figure 1 illustrates the structure of the task. Each block contained 80 auditory stimuli: 70% Standard tones, 15% Target tones, and 15% Novel sounds. Participants were instructed to respond with a keyboard button press only to Target tones. Stimulus order was pseudo-randomized with two constraints: (1) at least five Standard tones were presented consecutively at the beginning and the end of each block; (2) at least one Standard tone was interposed between a Target tone and a Novel sound. We used a constant inter-stimulus interval (ISI) of 1,000 milliseconds, and each block lasted ~87 s. The experiment contained one initial training block followed by eight experimental blocks, yielding a total of ~450 Standard tones, 96 Target tones, and 96 Novel sounds. A short break was given after each experimental block, and the onset of the next block was self-paced. The experiment was programmed and presented using the PsychoPy software platform (Peirce, 2007).
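The sequencing constraints above can be illustrated with a simple rejection-sampling sketch. It assumes that constraint (2) forbids only Target–Novel adjacency (in either order) and that the 1,000-ms ISI runs from each stimulus offset to the next onset. Under that reading, a block lasts 80 × 1 s plus 7 s of sound, which matches the ~87 s reported. The function names are illustrative and are not taken from the original PsychoPy script.

```python
import random
from collections import Counter

DUR_MS = {"S": 50, "T": 50, "N": 300}  # Standard, Target, Novel durations (ms)


def make_block(n_trials=80, p_target=0.15, p_novel=0.15, n_edge=5, seed=None):
    """Pseudo-randomized block: Standards at both edges, no Target next to a Novel."""
    rng = random.Random(seed)
    n_target = round(n_trials * p_target)        # 12
    n_novel = round(n_trials * p_novel)          # 12
    n_standard = n_trials - n_target - n_novel   # 56
    middle = ["S"] * (n_standard - 2 * n_edge) + ["T"] * n_target + ["N"] * n_novel
    while True:  # rejection sampling: reshuffle until constraint (2) holds
        rng.shuffle(middle)
        seq = ["S"] * n_edge + middle + ["S"] * n_edge  # constraint (1)
        # assumption: constraint (2) only forbids Target-Novel adjacency
        if all({a, b} != {"T", "N"} for a, b in zip(seq, seq[1:])):
            return seq


def block_duration_s(seq, isi_ms=1000):
    """Block length, assuming the ISI runs from each stimulus offset to the next onset."""
    return sum(DUR_MS[s] + isi_ms for s in seq) / 1000.0


blocks = [make_block(seed=i) for i in range(8)]
totals = Counter(s for block in blocks for s in block)
print(block_duration_s(blocks[0]), dict(totals))
```

Eight blocks yield 448 Standards, 96 Targets, and 96 Novel sounds, consistent with the totals given in the text. Rejection sampling keeps the code short. A constructive placement of deviants would avoid repeated shuffles but is harder to read.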

[Figure 1: schematic of the auditory oddball task]

Experimental design of the auditory oddball paradigm. Each block consists of 80 auditory stimuli in the following division: 70% standard tones (1,000 Hz), 15% target tones (1,500 Hz), and 15% novel ecological sounds (e.g., ringtones). Stimuli were presented with a fixed 1,000 ms ISI. The experiment included one training block and eight experimental blocks.

2.3 EEG recording (per system)

EEG data were recorded with four different systems (N = 7–11 participants per system; see Table 1 for a detailed description). The same oddball paradigm was used for all systems; however, some aspects of the experimental setup varied from system to system as a function of their technical specifications and other experimental constraints. Rather than viewing these variations as an obstacle to comparison, we see them as a crucial component of our overall scientific goal, which is to assess the generalizability of ERP-based markers across contexts and EEG systems. Similarly, our choice to test a relatively small sample with each system aligns with our overarching goal of evaluating the sensitivity for identifying replicable neural responses within individual participants. Our statistical analyses therefore focus primarily on inter-subject statistics, that is, on the similarity between participants, rather than on group-level averaging. Below, we provide details of the specific experimental setup used for each system.

2.3.1 BioSemi

We used the BioSemi Active II EEG system (BioSemi BV, Amsterdam, Netherlands; https://www.biosemi.com/Products_ActiveTwo.htm) as our ‘gold standard’, to which the three mobile systems were compared. We used a gel-based 64-channel system, with Ag-AgCl electrodes positioned according to the 10–20 system and a 1,024 Hz sampling rate. Data were recorded in an electrically shielded, acoustically attenuated room in a lab at Bar-Ilan University. Participants were seated comfortably in front of a computer screen, where visual instructions and a central fixation cross were presented. Auditory stimuli were presented in a free-field manner through a loudspeaker positioned in front of the participant. EEG data were recorded through the Lab Streaming Layer platform (LSL) (Kothe et al., 2014). The audio in the room during the experiment was recorded using an external microphone and also streamed into LSL (via the audio-capture interface; https://github.com/labstreaminglayer/App-AudioCapture) to allow accurate segmentation of the EEG data based on the sound actually presented.

2.3.2 Smarting

The second EEG system tested was a saline-based wireless Smarting system (mBrainTrain LLC, Belgrade, Serbia; https://mbraintrain.com/), with 24 EEG electrodes positioned according to the 10–20 system and a 250 Hz sampling rate. Data were recorded using a setup similar to that used for the BioSemi data, in the same electrically shielded, acoustically attenuated room at Bar-Ilan University. EEG data were wirelessly streamed to the recording computer via Bluetooth, recorded using LSL, and synchronized to the audio recordings from the microphone (described above).

2.3.3 DSI

The third EEG system tested was the Wearable Sensing DSI-24 wireless dry-electrode system (Wearable Sensing, San Diego, CA, USA; https://wearablesensing.com/dsi-24/), featuring 21 active sensors positioned according to the 10–20 system and a 300 Hz sampling rate. These data were collected in a field-based setting, in a quiet but non-shielded classroom at Stanford University, California, USA. Here, participants completed the experimental task on a laptop, and auditory stimuli were delivered in a free-field manner through the laptop’s built-in speakers. The audio in the room during the experiment was recorded using the laptop’s internal microphone. EEG data were wirelessly streamed to the recording computer via Bluetooth, recorded using LSL, and synchronized to the audio recordings from the microphone.

2.3.4 EPOC X

The last EEG system tested was the EPOC X system by Emotiv (Emotiv Inc., San Francisco, CA, USA; https://www.emotiv.com/), which features 14 passive saline-based electrodes, positioned according to the 10–20 system and using a 128 Hz sampling rate. These data were collected in a field-based setting, in a quiet yet non-shielded classroom in a local high school (in-school lab). Participants were 9th-grade students, aged 14–15, and they completed the experiment as part of an ongoing neuroeducation research-practice partnership between the Begin High School in Ramat Gan and the research team from Bar Ilan University (Korisky et al., 2024). In this setup, participants performed the task on a laptop and auditory stimuli were delivered through in-ear headphones (to avoid excessive noise from outside sources). EEG signals were wirelessly streamed to the recording computer via Bluetooth, recorded using LSL, and synchronized to the audio recordings from the microphone.

2.4 Data analysis

2.4.1 Behavior analysis

Key presses were classified as hits, misses, or false alarms as follows: a key press was classified as a ‘hit’ if it occurred 200 to 1,500 milliseconds after the onset of a Target tone; a Target without such a response was classified as ‘missed’. All other key presses were classified as ‘false alarms’, i.e., responses made in error or unrelated to the stimuli.
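The classification rule above can be written as a short function. The sketch rests on two assumptions that the text does not spell out: only the first key press in a Target’s 200–1,500 ms window counts as the hit, and any additional press (including a second press in the same window) counts as a false alarm. All times are in milliseconds.

```python
def classify_responses(target_onsets_ms, press_times_ms, window=(200, 1500)):
    """Return (hits, misses, false_alarms, reaction_times_ms)."""
    presses = sorted(press_times_ms)
    used = set()          # indices of presses already assigned to a Target
    reaction_times = []
    for onset in target_onsets_ms:
        # assumption: the first unused press inside the window is the hit
        match = next(
            (i for i, p in enumerate(presses)
             if i not in used and window[0] <= p - onset <= window[1]),
            None,
        )
        if match is not None:
            used.add(match)
            reaction_times.append(presses[match] - onset)
    hits = len(reaction_times)
    misses = len(target_onsets_ms) - hits
    false_alarms = len(presses) - len(used)  # every press not matched to a Target
    return hits, misses, false_alarms, reaction_times


print(classify_responses([1000, 5000, 9000], [1450, 5100, 7000, 9300]))
# (2, 1, 2, [450, 300])
```

In this example, the press at 5,100 ms comes too early and the press at 7,000 ms comes too late for the second Target. That Target is therefore a miss, and both presses count as false alarms.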

2.4.2 EEG analysis

Data analysis protocols were highly similar across all EEG systems and were implemented with the MATLAB-based FieldTrip toolbox (MathWorks 2021, https://www.mathworks.com; Oostenveld et al., 2011) using identical scripts, with only slight adaptations for system-specific differences (e.g., file format, layout).

2.4.2.1 Preprocessing

Raw EEG data were re-referenced to linked left and right mastoids (BioSemi, Smarting, and EPOC X systems) or to linked left and right earlobes (DSI-24 system, default). Data were then band-pass filtered between 0.5 and 40 Hz, retaining the frequency range associated with auditory ERPs, and were detrended and demeaned. For artifact correction, we used independent component analysis (ICA) to remove ocular and cardiac artifacts (identified through visual inspection of the time courses and spatial distributions of the ICA components).
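The referencing and filtering steps can be sketched in Python/SciPy as follows. This is an illustration only, not the authors’ FieldTrip pipeline. It assumes that data arrive as a channels × samples array and that the mastoid channels carry the hypothetical labels `M1` and `M2` (BioSemi, for instance, records mastoids on external channels). It also assumes a 4th-order Butterworth band-pass applied forward and backward, with detrending and demeaning done before filtering to limit edge transients. ICA-based artifact removal is omitted.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfiltfilt


def preprocess(data, fs, ch_names, ref=("M1", "M2"), band=(0.5, 40.0), order=4):
    """Re-reference to linked mastoids, detrend, demean, and band-pass filter.

    data: channels x samples array; ch_names: list of channel labels.
    The reference labels M1/M2 are hypothetical placeholders.
    """
    ref_idx = [ch_names.index(name) for name in ref]
    data = data - data[ref_idx].mean(axis=0, keepdims=True)  # linked reference
    data = detrend(data, axis=-1, type="linear")             # remove linear drift
    data = data - data.mean(axis=-1, keepdims=True)          # demean
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)                   # zero-phase filtering
```

Second-order sections (`output="sos"`) are used because they stay numerically stable at a 0.5 Hz high-pass edge, where the transfer-function form can become ill-conditioned.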

The onset of each auditory stimulus was identified by combining information from digital triggers sent from PsychoPy with the audio recordings. The continuous EEG data were segmented into epochs from −100 to 700 ms relative to the onset of each sound. Epochs with residual muscle or other artifacts, or with high noise, were identified and rejected based on the standard deviation (STD) of the EEG signal using the ft_rejectvisual function in the FieldTrip toolbox (rejection thresholds were set separately for each EEG system: BioSemi and Smarting, 25 STD; DSI-24, 35 STD; EPOC X, 40 STD). The number of trials rejected for each system and condition is reported in Table 2.
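One way to combine trigger and audio information is to search the recorded audio in a short window after each trigger for the first sample whose amplitude exceeds a threshold, and then cut epochs around those refined onsets. The sketch below assumes the audio has already been resampled onto the EEG sample clock using the LSL timestamps. The threshold and search window are hypothetical parameters, not values reported in the study.

```python
import numpy as np


def refine_onsets(audio, trigger_samples, threshold, search=100):
    """First supra-threshold audio sample after each trigger (falls back to the trigger).

    Assumes `audio` is aligned to the EEG sample clock; `threshold` and
    `search` (in samples) are hypothetical parameters.
    """
    envelope = np.abs(audio)
    onsets = []
    for trig in trigger_samples:
        above = np.flatnonzero(envelope[trig:trig + search] > threshold)
        onsets.append(trig + int(above[0]) if above.size else trig)
    return np.asarray(onsets)


def make_epochs(data, onsets, fs, tmin=-0.1, tmax=0.7):
    """Cut channels x samples data into trials x channels x samples epochs.

    Onsets too close to the recording edges are dropped.
    """
    start = int(round(tmin * fs))
    stop = int(round(tmax * fs))
    kept = [o for o in onsets if o + start >= 0 and o + stop <= data.shape[-1]]
    epochs = np.stack([data[:, o + start:o + stop] for o in kept])
    return epochs, np.asarray(kept)
```

At a 1,000 Hz rate, for example, each −100 to 700 ms epoch spans 800 samples.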

| Condition | BioSemi | Smarting | DSI | EPOC X |
|---|---|---|---|---|
| Standard | 22.2 ± 12.60 (4.9%) | 39.3 ± 27.0 (8.7%) | 12.3 ± 11.57 (2.7%) | 40.8 ± 27.41 (9.1%) |
| Target | 3.7 ± 4.34 (3.9%) | 6.4 ± 5.66 (6.7%) | 1.3 ± 2.11 (1.4%) | 7.1 ± 5.06 (7.4%) |
| Novel | 5.1 ± 5.19 (5.3%) | 6.8 ± 6.55 (7.1%) | 1.2 ± 1.23 (1.3%) | 8.5 ± 6.95 (8.9%) |
| Mean across conditions | 10.4 ± 10.31 (4.7%) | 17.5 ± 18.88 (7.5%) | 4.9 ± 6.40 (1.8%) | 18.8 ± 19.10 (8.5%) |

The number of trials rejected for each condition and EEG recording system (mean ± SD; percentage of rejected trials).
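The rejection step relies on FieldTrip’s interactive `ft_rejectvisual`. A non-interactive approximation could look like the sketch below. It assumes that the reported thresholds refer to a per-trial, per-channel standard deviation of the signal (in µV) and that a trial is rejected when any channel exceeds the threshold. The exact metric used in the original summary view is an assumption here.

```python
import numpy as np

# Thresholds reported in the text; their unit (microvolts) is an assumption
STD_THRESHOLDS = {"BioSemi": 25, "Smarting": 25, "DSI-24": 35, "EPOC X": 40}


def reject_by_std(epochs, threshold):
    """epochs: trials x channels x samples.

    Drops trials whose largest per-channel standard deviation exceeds threshold.
    """
    trial_std = epochs.std(axis=-1).max(axis=1)   # worst channel per trial
    bad = trial_std > threshold
    return epochs[~bad], np.flatnonzero(bad)


rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 5.0, size=(20, 4, 200))
epochs[7, 2] += rng.normal(0.0, 100.0, size=200)   # one artifact-laden trial
clean, rejected = reject_by_std(epochs, STD_THRESHOLDS["BioSemi"])
print(rejected, f"{100 * len(rejected) / len(epochs):.1f}% rejected")
# [7] 5.0% rejected
```

The percentage printed here corresponds to the bracketed values in Table 2, computed per participant and condition.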

2.4.2.2 Group level analysis

Clean epochs were averaged for each participant, separately for Standard, Target, and Novel stimuli. The averages were then low-pass filtered at 12 Hz (4th-order zero-phase Butterworth filter) and baseline-corrected to produce ERPs. The baseline was the pre-stimulus period (−100 to 0 ms) for BioSemi, Smarting, and DSI-24, and 50–150 ms after stimulus onset for EPOC X, owing to a visible delay in data acquisition with that system. Grand-average responses were derived by averaging the ERPs across participants, separately for each system and stimulus type.
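The averaging, low-pass, and baseline steps can be sketched with SciPy’s `butter` and `filtfilt`. Forward-backward filtering of a 4th-order design doubles the effective attenuation, and whether the original FieldTrip call behaved identically is an assumption. The EPOC X case corresponds to passing `baseline=(0.05, 0.15)`.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def compute_erp(epochs, fs, tmin=-0.1, baseline=(-0.1, 0.0), lp_hz=12.0, order=4):
    """Average trials x channels x samples epochs into a baseline-corrected ERP."""
    erp = epochs.mean(axis=0)                       # channels x samples
    b, a = butter(order, lp_hz, btype="low", fs=fs)
    # zero-phase: the filter runs forward and backward (assumed to match the original)
    erp = filtfilt(b, a, erp, axis=-1)
    times = tmin + np.arange(erp.shape[-1]) / fs
    in_base = (times >= baseline[0]) & (times < baseline[1])
    erp = erp - erp[:, in_base].mean(axis=-1, keepdims=True)
    return erp, times


def grand_average(erps):
    """Average per-participant ERPs (participants x channels x samples)."""
    return np.mean(erps, axis=0)
```

Filtering after averaging, as described in the text, is cheaper than filtering every trial. For a linear filter it gives the same result.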

To identify time windows where the ERP significantly deviated from zero, we used a data-driven clustering approach for each system. For each task condition, a one-sample t-test was performed at each time point to identify periods where the response differed significantly from zero (p < 0.05). Next, a temporal clustering permutation test was applied to identify contiguous time windows with significant responses at each electrode (Maris and Oostenveld, 2007). In addition, we visually inspected the grand-average waveforms to identify peaks that represent the N100 and P300 components based on their timing, polarity, and scalp topography.
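The temporal clustering procedure (Maris and Oostenveld, 2007) can be illustrated for a single electrode with a sign-flip permutation scheme, a common choice for one-sample designs. This is a sketch, not the FieldTrip implementation. The cluster statistic (sum of t-values within a cluster), the two-sided point-wise threshold, and the number of permutations are all assumptions.

```python
import numpy as np
from scipy import stats


def runs(mask):
    """(start, stop) index pairs of consecutive True values in a 1-D mask."""
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[::2], edges[1::2]))


def cluster_permutation_test(x, alpha=0.05, n_perm=1000, seed=0):
    """x: participants x time points at one electrode.

    Returns (start, stop, p) for each observed cluster where the mean differs from 0.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # two-sided point-wise threshold

    def t_values(data):
        return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))

    def clusters(t):
        found = []
        for sign in (1, -1):                          # positive and negative clusters
            for start, stop in runs(sign * t > t_crit):
                found.append((start, stop, abs(t[start:stop].sum())))
        return found

    observed = clusters(t_values(x))
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        # sign-flip permutation: an assumed, commonly used scheme
        flips = rng.choice([-1.0, 1.0], size=(n, 1))
        null_max[i] = max((m for _, _, m in clusters(t_values(x * flips))), default=0.0)
    return [(int(s), int(e), (np.sum(null_max >= m) + 1) / (n_perm + 1))
            for s, e, m in observed]
```

Comparing each observed cluster against the distribution of the maximum cluster mass per permutation is what controls the family-wise error across time points.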

To assess the similarity of ERP responses across systems, we calculated the cross-correlation between the grand-average ERP waveform from each mobile EEG system and that obtained with the BioSemi system, which served as the lab-standard reference. This analysis was restricted to ERPs from a single centro-frontal electrode (‘F4’ for EPOC X and ‘Fz’ for all other systems) where the N100 and P300 components were maximal. For each between-system comparison, we extracted the temporal lag corresponding to the maximum correlation and the associated correlation coefficient (Pearson’s r) at this lag.
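The lag-and-correlation step can be sketched as follows. Because the systems were recorded at different sampling rates (see Table 1), the sketch assumes that both grand-average waveforms have first been resampled onto a common time grid of equal length. The ±200 ms search range is a hypothetical choice. A positive lag means that the mobile-system ERP is delayed relative to BioSemi.

```python
import numpy as np


def best_lag_correlation(reference, other, fs, max_lag_ms=200.0):
    """Pearson r between two equal-length ERPs at every lag up to +/- max_lag_ms.

    Returns (lag_ms, r) at the lag with the highest correlation.
    Positive lags mean `other` is delayed relative to `reference`.
    Assumes both waveforms share one sampling rate; max_lag_ms is hypothetical.
    """
    max_lag = int(round(max_lag_ms * fs / 1000.0))
    n = len(reference)
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = reference[:n - lag], other[lag:]
        else:
            a, b = reference[-lag:], other[:n + lag]
        r = np.corrcoef(a, b)[0, 1]   # Pearson r over the overlapping samples
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * 1000.0 / fs, best_r
```

Computing Pearson’s r on the overlapping segment at each lag, rather than using an unnormalized cross-correlation, keeps the returned coefficient on the same scale as the values reported in the text.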

2.4.2.3 Individual level analysis

The individual-level approach aimed to examine whether the N100 and P300 components identified at the group level could also be reliably detected in individual participants’ ERPs. Given the inherent variability in the spatio-temporal morphology of neural responses across individuals, we employed three complementary within-subject analyses that differ in the a priori assumptions they impose from the group-level analysis onto the individual-level data. These analyses were applied to single-trial data from each condition and are reported separately for each EEG system.

The first analysis sought to determine the consistency of the time course of neural responses to each stimulus across individuals, beyond spatial variation between them, and its relation to the group average. To this end, for each participant, one-tailed t-tests were performed on the single-trial EEG data at each electrode and each time point (binned into 20-ms windows to reduce the number of multiple comparisons). Using a statistical threshold of p < 0.05 (uncorrected), we assessed how many participants showed significant responses at any given point in time, both across all electrodes and at specific electrodes. Specifically, we quantified the proportion of participants showing significant neural responses within the time windows identified in the group averages for the N100 and P300 peaks.
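For one participant and condition, the binned single-trial tests and the proportion-of-participants summary could be sketched as below. The sketch makes three assumptions. The tail is chosen per component (negative for N100, positive for P300). Bins are formed from whole samples, so a 20-ms bin is only approximate at rates such as 128 Hz. SciPy’s `ttest_1samp` with its `alternative` argument (SciPy 1.6 or later) stands in for the original MATLAB test.

```python
import numpy as np
from scipy import stats


def binned_tests(trials, fs, tmin=-0.1, bin_ms=20.0, alpha=0.05, tail="negative"):
    """trials: trials x channels x samples for one participant and condition.

    Returns a channels x bins mask of significant bins (uncorrected)
    and the start time of each bin in ms.
    """
    n_per_bin = max(1, int(round(bin_ms * fs / 1000.0)))
    n_bins = trials.shape[-1] // n_per_bin
    usable = trials[..., :n_bins * n_per_bin]
    binned = usable.reshape(*trials.shape[:-1], n_bins, n_per_bin).mean(axis=-1)
    # assumption: tail chosen per component (N100 negative, P300 positive)
    alternative = "less" if tail == "negative" else "greater"
    res = stats.ttest_1samp(binned, 0.0, axis=0, alternative=alternative)
    bin_starts_ms = tmin * 1000.0 + np.arange(n_bins) * n_per_bin * 1000.0 / fs
    return res.pvalue < alpha, bin_starts_ms


def proportion_significant(masks, bin_starts_ms, window_ms):
    """Fraction of participants with any significant bin (any electrode) in the window."""
    in_win = (bin_starts_ms >= window_ms[0]) & (bin_starts_ms < window_ms[1])
    return float(np.mean([mask[:, in_win].any() for mask in masks]))
```

Restricting `proportion_significant` to a single electrode row, instead of using `.any()` over all rows, gives the electrode-specific version of the same count.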

In the second analysis, we used a more rigorous statistical approach to identify the N100 and P300 peaks in individual ERP time courses and to assess the similarity of these time courses across individuals. This analysis was restricted to the single electrode where these responses were maximal in the group averages (Fz or the closest available electrode), to reduce the number of multiple comparisons and to avoid potential bias due to the uneven number of electrodes across EEG systems. For each participant, we averaged the single-trial EEG data in each condition to obtain their personal ERP. We applied a two-tailed t-test to the single trials at each time point and used a temporal clustering permutation test (permutest.m function, MATLAB, https://www.mathworks.com/matlabcentral/fileexchange/71737-permutest) to identify clusters of consecutive time points where the ERP differed significantly from zero (positive or negative, alpha < 0.025). This approach captures each participant’s individual time course of neural responses. We then examined the consistency of ERP time courses across individuals by calculating the pairwise correlations between the ERPs of all participants recorded with the same EEG system, and computed a “typicality score” for each participant: the average of their correlations with all other participants. To assess whether this measure varied across conditions and EEG systems, we fitted a linear mixed-effects model with Condition (Standard, Target, Novel), System (BioSemi, Smarting, DSI-24, EPOC X), and their interaction as fixed effects, and Subject as a random intercept to account for repeated measurements across conditions within participants. Because each participant contributed one typicality value per condition, the random intercept captured within-subject dependence across conditions rather than trial-level variability.
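Once each participant’s ERP (one electrode, one condition, one system) is stacked into a participants × samples array, the typicality score defined above takes only a few lines. This sketch averages the raw Pearson coefficients, as the text describes. Averaging Fisher z-transformed values would be a possible alternative, but the text does not report using it.

```python
import numpy as np


def typicality_scores(erps):
    """Mean Pearson correlation of each participant's ERP with all other participants.

    erps: participants x samples array from a single system and condition.
    """
    r = np.corrcoef(erps)          # participants x participants correlation matrix
    np.fill_diagonal(r, np.nan)    # ignore each participant's self-correlation
    return np.nanmean(r, axis=1)   # plain average of raw r values, as in the text
```

These per-participant scores, one per condition, are the outcome variable entering the mixed-effects model described above.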

Estimated marginal means were computed to evaluate pairwise contrasts between conditions, with correction for multiple comparisons. Model fit was quantified using marginal and conditional R², and the intraclass correlation coefficient (ICC) was computed to estimate the proportion of variance attributable to between-subject differences.
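For a random-intercept model of this kind, marginal and conditional R² and the ICC are commonly defined from the model’s variance components, as written out below. Here $i$ indexes participants, $c$ conditions, and $g(i)$ the EEG system used for participant $i$. $\sigma^2_f$ is the variance of the fixed-effect predictions (Condition, System, and their interaction), $\sigma^2_u$ the between-subject random-intercept variance, and $\sigma^2_\varepsilon$ the residual variance. Whether the reported values were computed with exactly these definitions is an assumption.

```latex
\begin{aligned}
\text{typicality}_{ic} &= \beta_0 + \alpha_c + \gamma_{g(i)} + (\alpha\gamma)_{c\,g(i)} + u_i + \varepsilon_{ic},
\qquad u_i \sim \mathcal{N}(0, \sigma^2_u),\quad \varepsilon_{ic} \sim \mathcal{N}(0, \sigma^2_\varepsilon) \\[4pt]
R^2_{\text{marginal}} &= \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_u + \sigma^2_\varepsilon},
\qquad
R^2_{\text{conditional}} = \frac{\sigma^2_f + \sigma^2_u}{\sigma^2_f + \sigma^2_u + \sigma^2_\varepsilon},
\qquad
\text{ICC} = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_\varepsilon}
\end{aligned}
```

The gap between conditional and marginal R² equals the share of total variance carried by stable participant-level differences, the quantity the ICC expresses relative to non-fixed variance only.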

The third analysis focused on the similarity in the spatial distribution of neural responses across individuals. To reduce the number of multiple comparisons, this analysis was restricted to the two time windows identified in the group-level analysis for the N100 and P300 components (see Table 3). For each participant, we averaged the single-trial EEG data within these time windows at each electrode and applied a one-tailed t-test at each electrode, combined with a spatial-clustering permutation test, to identify clusters of neighboring electrodes where the response differed significantly from zero (negative for N100, positive for P300; alpha < 0.025). Complementing the previous analysis, this approach allowed us to investigate the extent to which individual topographies aligned with group-level patterns.
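The spatial variant of the cluster test can be sketched in the same spirit for one participant and one time window. The electrode adjacency (`neighbours`) is a hypothetical input that would in practice come from each system’s layout. The single-trial sign-flip scheme is an assumed permutation strategy, not a documented detail of the original analysis, and the cluster statistic is the summed t-value.

```python
import numpy as np
from scipy import stats


def spatial_clusters(mask, neighbours):
    """Connected groups of supra-threshold electrodes.

    mask: boolean array over electrodes; neighbours: hypothetical dict
    mapping each electrode index to the set of adjacent electrode indices.
    """
    seen, clusters = set(), []
    for start in map(int, np.flatnonzero(mask)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                       # depth-first search over adjacent electrodes
            ch = stack.pop()
            component.append(ch)
            for nb in neighbours[ch]:
                if mask[nb] and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(component))
    return clusters


def spatial_cluster_test(window_means, neighbours, sign=-1, alpha=0.025,
                         n_perm=1000, seed=0):
    """window_means: trials x electrodes, single-trial amplitudes averaged in a window.

    sign=-1 tests for negativity (N100), sign=+1 for positivity (P300); one-tailed.
    Returns (electrodes, p) for each observed cluster.
    """
    rng = np.random.default_rng(seed)
    n = window_means.shape[0]
    t_crit = stats.t.ppf(1 - alpha, df=n - 1)

    def cluster_masses(x):
        t = sign * x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))
        return [(c, t[c].sum()) for c in spatial_clusters(t > t_crit, neighbours)]

    observed = cluster_masses(window_means)
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n, 1))   # assumed: random sign per trial
        null_max[i] = max((m for _, m in cluster_masses(window_means * flips)),
                          default=0.0)
    return [(c, (np.sum(null_max >= m) + 1) / (n_perm + 1)) for c, m in observed]
```

Passing `sign` instead of writing two functions keeps the N100 and P300 tests identical apart from the expected polarity.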

| System | N100: Standard | N100: Target | N100: Novel | P300: Novel |
|---|---|---|---|---|
| BioSemi | 51–106 (77) | 50–100 (78) | 50–104 (87) | 233–327 (272) |
| Smarting | 40–104 (76) | 48–104 (80) | 32–168 (80) | 252–332 (296) |
| DSI-24 | −16 to 43 (13) | −6 to 50 (23) | −13 to 66 (26) | 213–270 (240) |
| EPOC X | n.s. (277) | n.s. (269) | n.s. (269) | n.s. (497) |

Time windows (in milliseconds) showing significant clusters (p < 0.05, cluster-corrected) corresponding to the N100 and P300 components in response to each task condition, presented separately for each system.

Values in parentheses indicate the peak latency identified through visual inspection.

Together, these analyses provide a basis for comparing and quantifying similarities and differences in ERP responses across participants, conditions, and EEG systems.

3 Results – experiment 1

3.1 Behavior

Analysis of behavioral performance served to confirm that participants were appropriately engaged in the task and to verify that behavioral characteristics were similar across participants tested with different EEG systems. Due to a technical problem during data collection, behavioral data from the EPOC X system were unavailable for two participants, limiting that analysis to five of the seven participants.

The overall hit rate was high (90% on average across systems), and false alarms were rare (2.53 false alarms per participant on average), indicating that participants tested with all systems understood the task and performed it well. Notably, performance was slightly lower for participants tested with the EPOC X, who were high school students tested in an in-school lab environment, whereas participants tested with the other systems were older. Reaction times were longer for data collected with the DSI-24 and EPOC X systems (~684 ms and ~642 ms, respectively), both of which were collected under field conditions, than for the BioSemi and Smarting systems (~452 ms and ~469 ms, respectively), which were collected in a lab setting. These differences tentatively point to effects of the testing environment on performance, which should be taken into consideration, particularly when using mobile testing systems (see Figure 2).

[Figure 2: hit rates and response times per participant for each EEG system]

Behavioral performance on the oddball tasks across EEG systems, showing (A) mean hit rates and (B) response time (ms). The X-axis represents the participant index, ordered by performance for each measurement, with each point corresponding to an individual participant.

3.2 EEG results

3.2.1 Group-level analysis

To identify group-level time windows for the components of interest, the N100 and P300, we first examined the grand-average ERPs elicited by Standard, Target, and Novel sounds for each system. Using cluster-based analysis, we identified time windows in which the signal differed significantly from zero (at electrode ‘Fz’ or the closest available electrode; p < 0.05, temporal-cluster corrected). Visual inspection revealed a replicable, canonical auditory ERP shape in all four systems: an early frontocentral negative peak in response to all stimuli, corresponding to the N100 component, followed by a wide frontocentral positive peak only in response to Novel sounds, corresponding to the P300 component. Results also revealed differences in onset latencies across systems, with the EPOC X showing a pronounced delay. Because stimulus onset was estimated from audio recordings synchronized to the EEG data via LSL, this delay is unlikely to reflect audio playback latency and may instead be attributed to hardware limitations or data transmission latency (Figure 3).

[Figure 3: grand-average ERPs and N100/P300 topographies for each EEG system]

Grand-average responses and topographies of ERPs across systems: (A) BioSemi (n = 10), (B) Smarting, mbt (n = 9), (C) DSI-24, Wearable Sensing (n = 11), (D) EPOC X, EMOTIV (n = 8). Each panel presents the grand-average ERP elicited by the three stimuli (Standard, Target, and Novel sounds) at one selected electrode (‘F4’ for EPOC X, ‘Fz’ for all other systems). Shaded areas around the curves represent the standard error across participants. Time t = 0 represents our best estimate of stimulus onset, which was determined from direct audio recordings synchronized to the EEG recordings via LSL (see Methods). Nonetheless, some timing differences are observed between the systems, most notably greatly delayed responses with the EPOC X system. Thus, for BioSemi, Smarting, and DSI-24, the baseline was computed over the 100 ms preceding stimulus onset, whereas the baseline period for EPOC X was 50 to 150 ms. The colored horizontal lines beneath each figure indicate the time-windows where the ERP in res
