In this study, the frequency dependence of pulsed DPOAE latency in the range of 1 to 14 kHz roughly follows a power law with an exponent of −0.66 to −0.71, where −0.66 is the result of fitting \(\Gamma \), and −0.71 is the result of fitting \(\tau \). Correspondingly, the frequency dependence of the latency expressed in periods has an exponent of 0.29 to 0.34, which may be regarded as a proxy for the increase in gain and frequency tuning of the cochlear amplifier observed from the apex to the base of the cochlea.
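The step from the latency exponent to the period exponent follows from counting latency in stimulus periods, N = τ·f. A minimal sketch, assuming a pure power law τ ∝ f^a (the function name is illustrative and not taken from the study):

```python
def period_exponent(latency_exponent: float) -> float:
    """Exponent of N = tau * f when tau is proportional to f**latency_exponent."""
    # Multiplying a power law in f by f raises its exponent by one.
    return 1.0 + latency_exponent


for a in (-0.66, -0.71):
    print(f"latency exponent {a:+.2f} -> period exponent {period_exponent(a):.2f}")
```

With the two fitted latency exponents, this gives the period exponents of 0.34 and 0.29 quoted in the text.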
Fig. 7
A Dependence of the parameter \(c_1\), representing the baseline value of the maximum active state of the cochlea, on \(c_3\), describing the level dependence of the model function. Linear regression yields \(c_3 = -29.5 + 1.497 \, c_1\). This leads to minimum spread of the latencies at 66.7 dB SPL. B Distribution of the coefficient of the level dependence, \(c_3\), on a logarithmic axis. The vertical red line represents the value of \(c_3\) from Table 1 for the linear fit (15.7 dB)
The range of the above-mentioned numbers reflects different weighting of residuals depending on whether the fit is performed with latency in ms or dB, which is important to consider when comparing to literature values. Figure 5 compares the latency functions from this study to selected data from the literature for a relatively low stimulus level of 40 dB SPL, including SFOAE [41, 42] and DPOAE measurements [41]. For frequencies above 1 kHz, this frequency dependence almost exactly matches the exponent of the frequency dependence for behavioral frequency-tuning in a forward-masking task expressed as tuning-quality factor \(Q_} \approx f^\), as found in [16] for frequencies of 1–8 kHz (see green curve LOS22 in Fig. 5). The overall trend of these OAE latency measures and their comparison to psychophysical tuning estimates is in accordance with the view that frequency selectivity of auditory neural signals and thus psychophysical performance is basically provided by the frequency selectivity of the cochlear filter, at least at low-to-moderate levels and above 1 kHz. This is consistent with findings from a single preparation in a Chinchilla for a basal location [12], and aligns with the concept that the cochlea and the subsequent neural signal processing provide filtering close to the minimum-phase theorem for linear filtering, as has been proposed for a long time [6].
Breaks in Cochlear Scaling
The data of this study indicate that the frequency dependence of pulsed DPOAE latency deviates from a single-exponent power law in the frequency band \(f_2=3\text{–}6\,\mathrm{kHz}\) (mean value: \(c_6=4.7\) kHz). Additionally, there appears to be a noticeable change at \(f_2=1.5\,\mathrm{kHz}\), as the mean pulsed DPOAE latency function, \(\Gamma \), shows a higher slope between 1 and 1.5 kHz than between 1.5 kHz and the transition region at 4.5 kHz for seven out of ten stimulus levels (Fig. 3B). The existence of a major basal-apical break in cochlear scaling has been proposed by several authors, typically claimed to be at 1 kHz in humans (for review, see, e.g., [2, 31, 43]). This study, however, included only one frequency below \(f_2\)=1.5 kHz, so we did not attempt to fit an additional break point.
While earlier studies suggested scale invariance, at least for the basal part of the cochlea, the term “approximate local scaling invariance” [44] is certainly more appropriate and shall be interpreted here as any exponent of the frequency dependence of the periods below 0.3. Since \(2^{0.3} = 1.23\), properties such as filter bandwidth or latency change “only” by 23% over the range of one octave, which might be taken as a reasonable limit for speaking of approximate local scaling invariance. In this sense, the region between \(f_2=3\text{–}6\,\mathrm{kHz}\) would appear to clearly violate approximate local scaling symmetry (Footnote 2). This deviation suggests that, also in the basal half, the cochlea does not adhere to a simple power-law scaling across all frequencies but instead exhibits localized variations in frequency tuning and latency properties.
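The 23% figure corresponds to a power-law exponent of 0.3 evaluated over one octave, i.e., a doubling of frequency. A short sketch of that arithmetic (the helper name is hypothetical):

```python
def change_per_octave(exponent: float) -> float:
    """Relative change of a quantity proportional to f**exponent over one octave."""
    # One octave is a doubling of frequency, so the quantity scales by 2**exponent.
    return 2.0 ** exponent - 1.0


print(f"{100 * change_per_octave(0.3):.0f}% change per octave")
```

Larger local exponents, as in the transition region, give correspondingly larger changes per octave.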
The improved fit of the frequency dependence of latency when adding a tanh-function to fit the transition region, taken together with the high stability of results over three months, suggests that this is not an incidental finding. These deviations from a simple power law differ from those discussed by others. Christensen et al. [33] identified a second break at 2.6 kHz, beyond which the increase in their DPOAE “scaled” phase — a method to reduce the influence of the different frequency ratios they used — diminishes. Their phase, presented as periods on a linear scale, i.e., N, differs from \(\Gamma \) as used in this study. They also noted a break in the corresponding SFOAE measure, which steepens above that transition frequency. After rescaling the data from [33] to \(\Gamma \) (not shown), a segmented linear fit to the SFOAE data would show breaks at 350 Hz (clearly) and at 1.5 kHz (weakly). Overall, their SFOAE curve shows more continuous changes in bending rather than clear breaks.
Fig. 8
Examples of individual-level dependencies of short-pulsed DPOAE latencies. Each panel represents one subject at a selected \(f_2\), each thin line demonstrates one of the seven test sessions, thick lines the curve fits of the level dependence. Blue curves: left ear; red curves: right ear. Inserts show \(\Delta \Gamma _\). For instance, in panel A, the right ear of subject 01 (orange curve fit) displays a latency of 14 periods at \(L_2\) = 65 dB SPL, and 23 periods at 35 dB SPL, corresponding to a change in latency of \(\Delta \Gamma _\)= 4.25 dB relative to the value at 65 dB SPL (see Methods). At all frequencies shown here, subjects have a threshold of < 10 dB HL with the exception of the right ear of subject S05 at 13 kHz. The examples presented in panels A and B show exceptions to the general rule: In each subject, at one frequency, one of both ears exhibited a rather low or “flat” level-dependence, where even at the lowest stimulus levels, the latency was not much higher than at 70 dB SPL. Stimulus parameters used for recording the data shown in panels A and B, and corresponding pure-tone thresholds are given in Table 3. The stability of these measurements over three months indicates that this is not an accidental finding. Panel C presents a more typical example, showing a similar level dependence of latencies in both ears. Panel D illustrates an example at \(f_2\)=13 kHz with more scatter, but consistent level dependence of the latencies. Individual (\(L_2,\,L_1\))-stimulus level pairs and subjective thresholds \(L_}\) in units of dB SPL are given as follows. S01R: (\(L_2\)=35/\(L_1\)=46), (\(L_2\)=65/\(L_1\)=71); \(L_}\)=21.5. S01L: (35/49), (65/66); 10.3. S02R (35/62), (65/74); 10.1. S02L: (35/49), (65/68); 20.2. S05R: (35/71), (65/81); 35.1. S05L: (35/68), (65/82); 20.5. S08R: (35/65), (65/76); 11.6. S08L: (35/66), (65/78); 13.8
For their corresponding DPOAE data, the most notable feature of \(\Gamma \) is a break at around 4 kHz, above which the periods remain constant. Comparing these findings to log-log scaled mean curves recorded at sound-pressure levels of 40 dB SPL (Fig. 5), we could be tempted to identify various weak breaks in the curves. However, no common feature consistently appears in the range of 1–5 kHz across all curves. Although not firmly evidenced, the comparisons in Fig. 5 suggest that determination of latency in the time domain (this study and curves MS16) shares some features, while curves based on the phase-gradient method show different characteristics (curve AGS18). The salient common feature of OAE data, from which one would expect to be able to infer something about cochlear scaling, is a rather constant rise in periods of around 0.3 dB/dB throughout the range of 1–10 kHz.
Weak breaks or perturbations in the frequency dependencies of latencies appear to exist, and in this study, they might even be said to be pronounced and also clearly consistent across the different stimulus levels. Moreover, these breaks remained individually stable over a three-month measurement period. Their exact shape seems to be quite dependent on subject (Fig. 4C, D), analogous, for example, to the spread of fitted break frequencies of [33]. Similarly, when deriving cochlear properties from swept-tone DPOAE phase gradients, the level-dependence is very low [45], while, in contrast, recent studies on SFOAE show clear level-dependence throughout the frequency range of 0.7–8 kHz [46], aligning fairly well with this study for low-to-moderate levels.
To conclude on the scaling break issue, it is conspicuous that the PTC of Oxenham & Shera (curve LOS22 in Fig. 5) also shows a steepening of the slope above 4 kHz, although no “saturation” indicates the end of a transition region. The forward-masking PTC (1–8 kHz) had been measured at 10 dB SL [16]. As threshold pressure at the eardrum rises by 5 to 10 dB from 1 to 8 kHz (cf. Figure 3 in [47]), the PTC data have probably been measured at 15 to 25 dB SPL. When correcting for higher sound pressures at higher frequencies, the level dependence for a fixed stimulus pressure suggests that the \(Q_}\)-values would rise, leading to a better match to the pulsed DPOAE latency frequency dependence. These differences highlight the complexity of directly comparing results across studies and underline the need for cautious interpretation of scaling breaks in cochlear measurements.
Comparison of Pulsed DPOAE Latencies to ABR Latencies
Pulsed DPOAE latencies are compared to those of ABR using data from tone-burst evoked ABR wave V [27] and from click evoked electrocochleographic NAP wave I measurements [7]. While the frequency-dependence of the NAP wave I data matches the general 0.3 dB/dB dependence seen in all OAE data in Fig. 5, the tone-burst ABR wave V data do not. The study of [27] extended that of [26] by varying ramp designs. The data shown belong to tone-burst durations scaled with \(f^\), covering a frequency range of 1–8 kHz. To facilitate comparison, twice the estimate for cochlear forward latency is presented, calculated by subtracting 5 ms from the wave V latency [27]. This adjustment accounts for a 1 ms synaptic delay and a 4 ms interpeak wave I-V latency [26, 27]. These data closely reproduce those of [26].
Table 3 Stimulus-related parameters and individual pure-tone thresholds for panels A and B of Fig. 8
Two curves are shown (RAN13, gray curve, and RAN13 corr., red curve), where the latter includes a correction to estimate the corresponding group delay, because the data given in both papers compute the delay from stimulus onset, as is common usage in audiology. However, in the case of tone-burst stimuli, the group delay is the shift between stimulus and response pattern. Therefore, twice the ramp duration was subtracted from the round-trip latency. The derived round-trip group delay latency aligns reasonably well between 2 and 4 kHz with the OAE latencies shown in Fig. 5 but diverges notably at higher frequencies. At 8 kHz, the highest frequency of the data of [27], the discrepancy at 40 dB SPL amounts to a factor of 1.6 (37.3 vs. 22.8 periods, Fig. 5) or an additional 1.8 ms for the round-trip latency, which is considerable. In the study of [27], ABR latency was also compared to tone-burst OAE (TBOAE). A slightly disproportionate increase of ABR versus TBOAE latency had been noted by [27] themselves, who, using the same stimulus waveforms, measured tone-burst OAE extracted by the nonlinear-residual technique, and derived the energy-weighted group delay as their measure of latency. At 8 kHz and the stimulus level of 40 dB SPL, their wave-V delay was 8.2 ms, resulting in a forward delay estimate of 3.2 ms, and the TBOAE delay was found to be 5 ms, resulting in a forward delay estimate of 2.5 ms. Consequently, the ABR data overestimate the TBOAE latency data if a factor of two is used to convert OAE round-trip delay to ABR forward latency. Their TBOAE latency of 5 ms is also considerably longer than the DPOAE latency of this study (2.8 ms).
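The numbers in this comparison can be retraced with a few lines of arithmetic. The sketch below only restates the conversions described in the text (periods to milliseconds at 8 kHz, the 5 ms subtraction for wave V, and the factor of two for OAE round-trip delay); the function and variable names are illustrative:

```python
def periods_to_ms(periods: float, freq_khz: float) -> float:
    """Convert a latency counted in stimulus periods to milliseconds."""
    return periods / freq_khz


def abr_forward_delay_ms(wave_v_ms: float, offset_ms: float = 5.0) -> float:
    """Forward-delay estimate: wave V latency minus synaptic and I-V delays (5 ms)."""
    return wave_v_ms - offset_ms


F_KHZ = 8.0
abr_periods, oae_periods = 37.3, 22.8  # round-trip values at 40 dB SPL, as quoted

ratio = abr_periods / oae_periods
extra_ms = periods_to_ms(abr_periods - oae_periods, F_KHZ)
abr_forward = abr_forward_delay_ms(8.2)  # wave V delay of 8.2 ms
tboae_forward = 5.0 / 2.0                # half of the 5 ms TBOAE round-trip delay

print(f"ratio {ratio:.2f}, extra round-trip latency {extra_ms:.2f} ms")
print(f"forward delay: ABR {abr_forward:.1f} ms, TBOAE {tboae_forward:.1f} ms")
```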
The authors discuss the reason for overestimating tone-burst OAE latencies, especially at high frequencies and low stimulus levels, i.e., the necessity to resolve the nonlinear residual in an increasingly noisy portion of the settling stimulus signal. A major difference compared to the data of this study lies in the pulse widths used for stimulation; for example, at 1 kHz, the full width at half maximum of our \(f_2\) pulse was 13.1 ms, whereas it was 1.6 ms in the study of [27] (Footnote 3).
The latencies of wave I derived by transtympanic electrocochleography [7], however, were obtained with click stimulation and appropriate high-pass noise masking of the basal emitters, intended to limit the region contributing to the narrow-band electrocochleographic NAP to ½ octave basal to the characteristic frequency place (CF; 3-dB criterion). The latencies were fitted with \(\tau = \tau_0 + 3.4\, f^{-0.77}\), f = 0.45–10 kHz, where \(\tau _0\) = 0.8 ms accounts for the synaptic delay. The exponent of −0.77 is in close agreement with the OAE data shown in Fig. 5. At 10 kHz, the computed latency of 11.6 periods is much lower than in the data of [26] and [27]. Here, a transition region between the data points at 3.6 and 5.3 kHz can be clearly identified.
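As a rough check of the value at 10 kHz, the fit can be evaluated numerically. This sketch assumes that f is in kHz and the latency in ms, that the synaptic delay \(\tau_0\) is excluded, and that the cochlear forward part is doubled to obtain a round-trip value comparable to OAE latencies; these conventions are inferred from the surrounding text rather than stated explicitly:

```python
def nap_round_trip_periods(freq_khz: float,
                           coeff_ms: float = 3.4,
                           exponent: float = -0.77) -> float:
    """Round-trip latency in periods from the wave I power-law fit."""
    # Cochlear forward latency in ms (synaptic delay tau_0 left out, by assumption).
    forward_ms = coeff_ms * freq_khz ** exponent
    # Double for the round trip, then convert ms to periods (f in kHz).
    return 2.0 * forward_ms * freq_khz


print(f"{nap_round_trip_periods(10.0):.2f} periods at 10 kHz")
```

With the rounded coefficient and exponent, this yields a value close to the 11.6 periods quoted in the text; the small difference is presumably due to rounding of the fit parameters.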
Fig. 9
Ramp design and influence on time-domain measures. A Frequency dependence of ramp duration (\(T_\textrm\)) and full width at half maximum (\(T_\textrm\)) of the \(f_2\) short pulse, shown for the post-hoc optimal design (blue/green line), and as used in this study (yellow/red line). B Time course of a DPOAE pulse response, as computed with a nonlinear active model of the cochlea [20, 52], for six different ramp durations \(T_f_2}\)=1–6 ms at \(f_2\)=4 kHz. C Dependence of the latency, according to the definition used for our experimental results, i.e., \(t_}-t_f_2}\) (OD), on ramp duration. In addition, latency is computed for \(t_}-t_f_2}\) (max) and half-maximum values for ramp and pulse response (HM). As the ramps at 4 kHz were excessively shortened by 0.73 ms, the resulting bias due to ramp design corresponds to an additional lag of 0.20 ms or 0.8 cycles and thus does not explain the relatively shorter latency at the beginning of the transition region
Level-dependence also allows a glimpse of what might contribute to the discrepancy between the ABR latencies of Neely and Rasetshwane [26, 27] and the OAE literature. ABR latencies are dominated by the most basal regions of the cochlea, where the inner hair cells first reach the threshold of synaptic firing during the build-up of a tone burst. As the stimulus levels increase, the earliest generators move faster towards the base, i.e., move from a tail-side \(Q_}\) point at low stimulus levels successively to, say, the \(Q_}\) point at a higher level, which is expected to lead to a stronger dependence of latency on stimulus level. In the present study, over a range of 45 dB level variation, the latencies vary by 2.8 dB (Fig. 4A), whereas in the ABR data of [26] and [27] the corresponding change is 6.3 and 6.5 dB, respectively. This notion would imply that frequency-specificity of tone-burst ABR is more reduced with higher stimulus levels than the OAE generation region of the nonlinear-distortion component, and this in turn could be a plausible contributor to the discrepancy in the exponent of the latency dependence as well. However, such concepts are based on steady-state properties such as tuning quality factors, and consequently should preferably be investigated as a transient process in a time-domain model.
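To give a feeling for these dB values, they can be converted to latency ratios. The sketch assumes that latency in dB is defined as 20·log10 of the number of periods; this convention is an assumption, although it is roughly consistent with the example in the caption of Fig. 8 (23 versus 14 periods, stated there as 4.25 dB):

```python
import math


def periods_to_db(periods: float, reference_periods: float) -> float:
    """Latency change in dB, assuming a 20*log10 convention."""
    return 20.0 * math.log10(periods / reference_periods)


def db_to_ratio(change_db: float) -> float:
    """Latency ratio corresponding to a change in dB (same assumed convention)."""
    return 10.0 ** (change_db / 20.0)


print(f"23 vs. 14 periods: {periods_to_db(23, 14):.2f} dB")
for label, change_db in (("DPOAE, this study", 2.8), ("ABR [26]", 6.3), ("ABR [27]", 6.5)):
    print(f"{label}: {change_db} dB -> factor {db_to_ratio(change_db):.2f}")
```

Under this assumption, the DPOAE latencies change by a factor of about 1.4 over the 45 dB range, whereas the ABR latencies roughly double.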
Test-Retest Reliability of Pulsed DPOAE Latencies
The test-retest reliability of pulsed DPOAE latencies might become clinically relevant, in combination with short-pulse DPOAE amplitudes or eventually as a stand-alone measure, for instance, to objectively monitor the function of the cochlear amplifier. To provide a reference range in ten normal-hearing subjects, the test-retest reliability of nonlinear-distortion component latencies \(\tau \) was determined by repeatedly testing the subjects seven times over three months (Fig. 6, Table Suppl.).
There are only a few reports on the test-retest reliability of DPOAE latencies. Mahoney and Kemp [48] reported that the test-retest reliability of DPOAE delays was within 8.5% of the mean at \(f_2\) = 1–6 kHz tested in 12 ears three times by using an \(f_2\) ratio sweep with the \(f_2/f_1\) ratio 1.22–1.26, \(L_1\)/\(L_2\) = 60/45. In the present study, test-retest comparisons of \(\Gamma \) (dB re N) were within 6.0% of the mean for \(f_2\) = 2 kHz and within 9.1% for \(f_2\) = 6 kHz. Dreisbach et al. [49] described the test-retest reliability of DPOAE delays in normal-hearing adults for \(f_2\) = 2–16 kHz using \(f_1\) ratio sweeps with a fixed \(f_2\) and varying \(f_1\), resulting in frequency ratios of \(f_2/f_1\) = 1.05–1.30 and \(L_1\)/\(L_2\) = 60/45. The average group delay differences were 0.28 ms (SD 0.24 ms) at \(f_2\) = 2–8 kHz and 0.22 ms (SD 0.20 ms) at \(f_2\) = 10–16 kHz compared with the present study with 0.29 ms (SD 0.54 ms) at \(f_2\) = 2–8 kHz and 0.21 ms (SD 0.31 ms) at \(f_2\) = 10–14 kHz.
Dreisbach et al. [50] measured DPOAE ratio sweeps four times in 40 cystic fibrosis patients at the two highest frequencies where patients had present DPOAEs. The average absolute difference between trials for group delay at \(f_2\) = 8–16 kHz was 0.23 ms (SD 0.33 ms), with the smallest absolute differences of 0.19 ms occurring at \(f_2\) =14 kHz and the greatest absolute differences of 0.29 ms at 16 kHz. In the present study, the smallest absolute difference occurred at \(f_2\) = 13 kHz with 0.17 ms, and the greatest absolute difference occurred at \(f_2\) = 14 kHz with 0.28 ms within \(f_2\) = 8–14 kHz. The 95% range of data amounted to 0.87 ms at \(f_2\) = 8–16 kHz [50] in comparison with the present study with 0.93 ms at \(f_2\) = 8–14 kHz. In summary, the test-retest reliability of short-pulse DPOAE latencies described in this study seems to be comparable with the test-retest reliability of DPOAE group delays using \(f_1\) ratio sweep and \(f_2\) ratio sweep paradigms based on the literature.
The observation that the absolute differences of \(\Gamma \) show basically only a weak frequency dependence (see Fig. 6B and Table 2), with the median ranging between 0.53 and 0.84 dB for 1–13 kHz, illustrates that the intra-subject reliability, test-retest reliability or stability of the latency is not dominated by noise or systematic properties of the OD algorithm. If it were, then the test-retest reliability would be expected to be primarily constant in terms of absolute differences of \(\tau \), where, however, the correspondent range of medians spans between 0.15 and 1.0 ms. This observation suggests a physiologic cause for latency stability.
Possible Influence of Ramp Durations and Pulse-Width Choice
The question may arise as to whether the transition region observed between 3 and 6 kHz in our data could be attributed to our ramp duration choice. Figure 9A illustrates the ramp durations (\(T_\textrm\)) and pulse widths (\(T_\textrm\)) that would be deemed optimal post-hoc, i.e., following \(\tau \approx f^\), along with those chosen in our experiments. The divergence between the functions can be expressed such that our experimental \(f^\)-dependence overcompensates cochlear dispersion by up to a factor of 1.53 (=1.93 ms/1.26 ms) up to 4 kHz, and then reduces the overcompensation due to the constant ramp duration above 4 kHz, reaching parity with the optimal choice at 7 kHz.
Our latency definition counts the time elapsed between the stimulus reaching steady state and \(\tau _}\). In the mean, amplitude measured at \(\tau _}\) is 1.67 dB below the maximum (or steady state) of the pulse response, i.e., at 83% [51] (see also the example shown in Fig. 2). We now consider two possible scenarios.
Scenario 1
The nonlinear, active amplification in the cochlea responds with a delay, but can ideally follow the ramp form. With a cosine-law ramp, 83% is reached at 0.81 (= 1 − 0.19) of the ramp-up time. Therefore, 19% of the difference between the ideal \(f^\) ramp definition and the experimental one is expected to confound our results, i.e., 0.19 × (2.14 − 1.41) ms = 0.14 ms, corresponding to approximately half a period at 4 kHz. This is expected to lead to an artefactual reduction of the latencies reaching its maximal amount at 4 kHz, thus potentially explaining the transition region, but not to the amount seen in the experiments. First, Eq. 2 leads to a relative latency reduction of 0.29 ms for the \(c_\) parameter set at 4 kHz; thus, only 45% of the latency discrepancy between both model fits at 4 kHz could be explained. Second, the steepness of the latency increase in the transition region is much less explainable by the ramp choice: Between 4 and 6 kHz, the discrepancy between both ramp designs amounts to 2.76 cycles, which at the mean frequency of 5 kHz corresponds to 0.55 ms. However, only 19% of this (equivalent to 0.4 cycle at 5 kHz) is expected to bias our latency measure. This does not explain the transition region, and moreover, the question would arise why there is a second bend at the end of the transition region.
Scenario 2
For instance, we assume that at 1 kHz the cochlea can follow the ramp, but as the ramps become exaggeratedly short up to 4 kHz, the cochlea can no longer follow the transient quickly enough, due to some sort of slew-rate problem. In this case, the delay would be relatively prolonged by maximally (2.14 − 1.41) ms = 0.73 ms at 4 kHz, i.e., 2.9 cycles. Thus, it appears that a slew-rate problem could certainly lead to considerable effects, but at first sight it would explain a relative increase of the latencies up to \(f_2\)=4 kHz, followed by a decrease up to 7 kHz and higher.
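The size of the possible bias in the two scenarios at 4 kHz can be retraced as follows. The sketch only restates the arithmetic of the text, taking the 19% share from Scenario 1 as given; the variable names are illustrative:

```python
OPTIMAL_RAMP_MS = 2.14   # post-hoc optimal ramp duration at 4 kHz
USED_RAMP_MS = 1.41      # ramp duration used in the experiments at 4 kHz
F2_KHZ = 4.0
RAMP_SHARE = 0.19        # share of the ramp difference entering the latency (Scenario 1)


def ms_to_periods(duration_ms: float, freq_khz: float) -> float:
    """Convert a duration in milliseconds to stimulus periods."""
    return duration_ms * freq_khz


ramp_gap_ms = OPTIMAL_RAMP_MS - USED_RAMP_MS

# Scenario 1: the cochlea follows the ramp, so only 19% of the gap biases the latency.
scenario1_ms = RAMP_SHARE * ramp_gap_ms
scenario1_periods = ms_to_periods(scenario1_ms, F2_KHZ)

# Scenario 2: slew-rate limit, so the full gap prolongs the delay.
scenario2_ms = ramp_gap_ms
scenario2_periods = ms_to_periods(scenario2_ms, F2_KHZ)

print(f"Scenario 1: {scenario1_ms:.2f} ms = {scenario1_periods:.1f} periods")
print(f"Scenario 2: {scenario2_ms:.2f} ms = {scenario2_periods:.1f} periods")
```

The two scenarios thus bracket the possible effect between roughly half a period and about three periods at 4 kHz, with opposite signs for the resulting latency bias.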
To further test a possible ramp-design influence, we simulated the influence on latency measures using a one-dimensional hydromechanical nonlinear cochlea model solved in the time domain [20, 52]. This model type replicates the short-wave behavior close to the peak [64], taking into account that pressure variation in the scalae may narrow down to a region in the vicinity of the basilar membrane, a phenomenon also called fluid focusing [53]. The model is coupled to a multi-component oscillator system mimicking realistic middle-ear transmission properties. Six different ramp durations were tested at \(f_2\)=1.5 and 4 kHz. Figure 9B shows exemplarily the DPOAE pulse responses in the ear canal obtained for \(f_2\)=4 kHz. Inspection of Fig. 9B reveals a slew-rate-like phenomenon, because it is clearly seen that for ramps shorter than 4 ms, the DPOAE pulse response increasingly fails to follow the onset with high fidelity.
Figure 9C depicts the dependence of latency on ramp duration for three different latency definitions. For this discussion, the most important is the latency computed as the time elapsed between the end of the stimulus waveform ramp and the OD point of the DPOAE pulse response (Fig. 9C, dashed line with crosses), similar to the experiments discussed here. At 4 kHz, where the ramp duration used in our study was 1.41 ms, whereas 2.14 ms would have been optimal (Fig. 9A), the change in measured latency due to this non-optimal choice corresponds to a potential exaggeration of the latency of 0.20 ms. Contrary to our results, this means that the ramp duration choice would have led to a latency value exaggerated by 0.8 cycles at 4 kHz and an understated estimate for higher frequencies. Thus, following the model, the transition region would even be understated. Note that this type of cochlea model does not represent a quasi-linear approximation of a nonlinear system, but solves the transient response of a nonlinear distributed positive-feedback system in the time domain.
To conclude on this issue, a minor influence of ramp duration choice on the transition region cannot be excluded, but is not expected, given that (1) the combination of ramp design and the definition of the latency using the onset-decomposition algorithm predicts a clearly smaller effect, (2) an explanation involving a slew-rate problem leads to an opposite effect, and (3) the nonlinear, active model predicts a dominance of the slew-rate effect. While a general existence of such a transition region in normal-hearing subjects is questionable, the above considerations and the simulations show that ramp design is unlikely to contribute considerably to the finding that the individual latency functions can show clear transition regions (Fig. 4C, D) that differ inter-individually in amount and position, and are partially so steep that the assumption of scaling symmetry in the above-mentioned sense becomes unrealistic in certain frequency regions.
Another limitation could be seen in using pulse widths that are clearly larger than the expected latency at frequencies above approximately 8 kHz, at least for higher stimulus levels. For \(f_2\ge 4\) kHz, the full width at half maximum of the pulses is 3.2 ms, which corresponds to about 24 periods at 8 kHz and already 30 periods at 10 kHz. The question arises whether a contamination by the coherent-reflection source could skew the latency data. For instance, Fig. 1B from [35] shows a pulse-basis decomposition of a pulse response at 10 kHz. In that example of a high-frequency pulse response, there appears to be a smaller coherent-reflection source with a delay relative to the nonlinear-distortion response of \(\approx \)2.5 ms and a clearly different phase (almost in quadrature). Although the stimulus pulse width is approximately twice the delay between both source contributions, the coherent-reflection source just starts at the onset decomposition time of the sampling algorithm, and thus does not interfere with it. At frequencies higher than 10 kHz, there might have been a risk of falsely sampling an interference state between two sources. On the other hand, we have never encountered amplitudes of a presumed coherent-reflection source at such high frequencies being much larger than the one shown in Fig. 1B from [35]. One has to keep in mind that, using the time-domain method, one would once in a while encounter a destructive phase constellation which, if both contributions have similar amplitude, would always strike the eye, presenting a notch where both response contributions overlap, which we never saw. It is thus deemed improbable that the latency data, even up to 14 kHz, are contaminated to any great extent by interference phenomena between both source contributions.
The Factor of 2 and Whether OAE Are Backpropagated by Compressional or Slowly Traveling Waves
We have used here a factor of 2 to make ABR forward delay comparable to the raw latencies of our own OAE data, as well as those reported by others. This is a choice which we have borrowed from comparisons of tone-burst (TBOAE) and click-evoked otoacoustic emission (TEOAE) latencies to ABR data in the past [26,27,