Development and validation of the dysarthria impact scale: a patient-reported outcome for motor speech disorders

The protocol was approved by the Institutional Review Boards of The University of Melbourne Human Ethics Committee (2024-11758-51349-6) and University of Tübingen (003/2015BO2) in accordance with the Declaration of Helsinki.

Tool development

Rather than undertaking de novo qualitative item generation, we adopted a theory- and literature-driven approach to derive items reflecting key domains of dysarthria impact, supplemented by targeted patient and clinician review to ensure relevance, clarity, and face validity. The COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) risk of bias checklist was used to optimize methodological quality and limit risk of bias of the proposed tool [31, 32]. The a priori clinimetric properties targeted in the tool design focused on integrating adequate reliability and validity, minimizing measurement error, and maximizing responsiveness to symptom severity. It was considered important that the proposed tool be brief, easy to complete, and suitable for use in a variety of clinical settings and languages.

We first conducted a review of the literature to identify comparable patient-reported outcomes in communication research and clinical practice. This included examination of PROMs designed for use in aphasia (language impairment), motor speech disorders (including dysarthria and apraxia), voice, and general communication impairment. This identified several tools with overlapping features and themes relevant to our own tool, including the Communicative Participation Item Bank [33], the Voice Handicap Index [34], and the Dysarthria Impact Profile [23] (see Supplementary Materials S1 for detail). However, no existing tool met all requirements for use. A small purposive sample of individuals with progressive dysarthria (Friedreich ataxia; n = 5) was consulted to review item relevance, wording, and comprehensibility, rather than to undertake primary qualitative item generation (see Supplementary Materials S2). Patient involvement at this stage was limited to feasibility review rather than formal item rating, reflecting pragmatic constraints of a multi-site study. Combining information from existing tools and people living with progressive dysarthria, we focused on developing questions across three established domains of communication: physical functioning, emotional functioning, and functional impact of dysarthria.

The first series of 29 items was created across these three domains. Clinical expert review involved two speech–language pathologists with experience in motor speech disorders, a neurologist, and a general physician. Items were independently rated for relevance, clarity, and redundancy. Iterative consensus discussions were used to remove or refine items judged to be overlapping, ambiguous, or insufficiently aligned with the intended construct, reducing the set from 29 to 25 items. A trial version was then condensed to 22 items following a final review by content experts and a trial with the clinical cohorts (see Supplementary Materials S4 for trial items). Subsequent shorter versions of the tool were also tested, namely the DIS 17-item [DIS-17] and DIS 6-item [DIS-6] versions. Item wording was adapted from existing instruments and refined to ensure consistent directionality, simple syntax, and relevance to dysarthria-related daily communication, rather than voice- or language-specific impairment.

Participants

A total of 244 participants were recruited to the study (52.9% female). Cohorts included individuals with Huntington’s disease (n = 45), Friedreich ataxia (n = 38), Parkinson’s disease (n = 31), spinocerebellar ataxia (n = 21), or head and neck cancer (n = 20) [participants were recruited from the head and neck ward and were included if they presented with any speech impairment resulting from their condition]. The head and neck cancer cohort was included as a non-neurodegenerative comparator group to examine the cross-etiology validity of the DIS in dysarthria arising from structural rather than neurological causes. Other diagnoses included cerebellar ataxia, neuropathy, vestibular areflexia syndrome (CANVAS) or multiple system atrophy-cerebellar (MSA-C) (n = 19). Seventy age- and sex-matched controls also participated. See Table 1 for demographic and clinical information. Recruitment and testing were undertaken in Melbourne, Australia, and Tübingen, Germany. Clinical participants were excluded from the degenerative disease cohorts if they had comorbid neurological diseases known to impact communication (e.g., stroke, multiple sclerosis); clinical symptoms other than those caused by their primary disease; a lack of competency in English or German (depending on site); a history of alcohol or drug abuse that required medical intervention; or a history of learning disability and/or intellectual impairment. Eligibility for inclusion as a healthy control required no family history of neurological disease and unremarkable cognition, speech, and language function, as assessed by a speech pathologist. Cognition was assessed using the Montreal Cognitive Assessment (MoCA) [35]. All subjects provided informed consent to participate in the study.

Table 1 Clinical and demographic information for the cohort

Reference tests and clinically meaningful endpoints

All participants were assessed using the Dysarthria Impact Scale (DIS) (index test). Convergent validity was examined by asking participants to also complete the Voice Handicap Index (VHI) [34]. The VHI was used as a reference test for communication performance. The full version of the VHI is a PRO with 30 items exploring the psychosocial consequences of voice disorders. The Short Form 36 (SF-36) [36] was used as a reference test for general health and well-being. The SF-36 is a PRO examining generic health and quality of life. The VHI and SF-36 were selected as pragmatic external anchors because no single existing instrument captures dysarthria-related impact across neurological diagnoses; these tools were not treated as gold standards, but as complementary reference measures reflecting related psychosocial and health constructs.

Participants also provided speech samples that were recorded for rating and analysis. Samples were acquired to investigate the link between speech production and self-reported speech-related quality of life. Speech was recorded using the Redenlab® Desktop or Online software in a quiet room. Speech tasks included two iterations of a sustained vowel /a:/ and an unprepared monologue of approximately 1 min. Speech samples were rated independently by two expert speech–language pathologists on an ordinal scale of 0–4, where 0 = unremarkable and 4 = severe. Inter-rater agreement was high (ICC = 0.964 for intelligibility and ICC = 0.958 for naturalness scores) and consistent with prior published protocols using this scale [37,38,39]. Concurrent validity was examined by producing objective composite measures of intelligibility and naturalness from the two tasks using Redenlab’s Analyze™ pipeline, as per earlier work [40, 41], and comparing them to DIS scores.

Demonstrating methodological quality of DIS

Although the DIS is not a diagnostic instrument, selected principles from the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies-2) framework [42] were used to structure assessment procedures (e.g., blinding, timing of assessments, and reference test independence) to minimize bias and improve applicability. The two reference tests, VHI and SF-36, and the index test (DIS) were interpreted separately (i.e., expert clinicians rating speech were blinded to DIS outcomes). The index test was administered to all participants irrespective of clinical presentation. The reference tests were applied to all participants who completed the index test where possible. The duration between the index and reference tests was brief (i.e., within the same testing session or day) to minimize opportunities for participant performance or perception to change between tests (e.g., medication for PD can influence performance over the day). We recruited four different clinical populations to ensure a diverse, representative sample of dysarthria was exposed to the index test. Data are presented in detail to adequately describe specific aspects of the index test. The index test can easily be reproduced based on the information provided in this manuscript. Lastly, reasonable definitions of normal/abnormal performance on the index and reference tests are provided.

Statistical analysis

Statistical analyses were conducted using SPSS (IBM SPSS Statistics for Windows, Version 28.0. Armonk, NY: IBM Corp). Validity evidence for the Dysarthria Impact Scale was examined through assessment of convergent validity and discriminative (known-groups) validity. Convergent validity was examined by assessing associations between DIS scores and established patient-reported outcome measures assessing related constructs, including the Voice Handicap Index (VHI) [34] and the SF-36 [36]. Discriminative (known-groups) validity was examined by evaluating the ability of the DIS to differentiate between clinical and control participants using receiver operating characteristic (ROC) analyses, with sensitivity, specificity, and 95% confidence intervals calculated. Cut-off thresholds were set at 1 standard deviation below the mean for healthy controls, which gave greater weight to sensitivity without minimizing the importance of specificity or losing much area under the curve (AUC). Sensitivity scores provided an estimate of the proportion of participants who were identified as presenting with reduced quality of life due to impaired speech. Specificity calculations estimated the proportion of unimpaired participants who were identified as not presenting with reduced QoL. Differences between disease groups were explored with plots and summary statistics. Pearson’s correlations between DIS scores and clinically relevant metrics, such as disease severity and dysarthria severity, were calculated to explore these relationships. Known-groups validity was further examined by comparing DIS performance across neurological diagnostic groups. Internal consistency was evaluated using Cronbach’s alpha for the DIS-22, DIS-17, and DIS-6 total scores in the clinical cohort using complete item responses for each scale. Lower DIS scores indicate worse speech-related quality of life.
Cronbach’s alpha was used descriptively to quantify item inter-relatedness and does not establish structural validity.
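The cut-off and internal-consistency computations described above can be sketched as follows. The function names are illustrative, not part of the study's analysis code (which was run in SPSS); the sketch assumes scores below the cut-off are flagged as impaired, consistent with lower DIS scores indicating worse speech-related quality of life.

```python
import numpy as np

def known_groups_stats(control_scores, clinical_scores):
    """Classify using a cut-off 1 SD below the control mean and
    return (cutoff, sensitivity, specificity)."""
    control = np.asarray(control_scores, dtype=float)
    clinical = np.asarray(clinical_scores, dtype=float)
    cutoff = control.mean() - control.std(ddof=1)
    sensitivity = np.mean(clinical < cutoff)   # impaired correctly flagged
    specificity = np.mean(control >= cutoff)   # unimpaired correctly cleared
    return cutoff, sensitivity, specificity

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array of
    complete item responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)
```

As in the manuscript, alpha here quantifies item inter-relatedness only and says nothing about dimensionality.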

Test–retest reliability

Intra-individual stability and reliability for the DIS were explored by examining agreement between scores provided 1 month apart. Ataxia was selected for test–retest assessment due to its relatively stable short-term clinical profile, allowing evaluation of score stability in the absence of expected clinical change. This short duration was considered long enough to wash out some familiarity effects and brief enough to ensure the disease had not progressed. Agreement was examined using a Bland–Altman plot in addition to correlation, as it was anticipated both samples would be strongly correlated [43]. Only data from the first assessor were included in the final validation data analysis. Repeated assessments were only used for establishing reliability, as determined in our pre-analysis plan.
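A minimal Bland–Altman computation for the test–retest agreement analysis could look like the sketch below, assuming the conventional definition of the 95% limits of agreement as bias ± 1.96 × SD of the paired differences [43]; the function name is illustrative.

```python
import numpy as np

def bland_altman(test1, test2):
    """Bland-Altman statistics for two repeated measurements:
    per-subject means (plot x-axis), mean bias, and 95% limits
    of agreement (bias +/- 1.96 * SD of differences)."""
    t1 = np.asarray(test1, dtype=float)
    t2 = np.asarray(test2, dtype=float)
    diffs = t1 - t2
    means = (t1 + t2) / 2
    bias = diffs.mean()
    half_width = 1.96 * diffs.std(ddof=1)
    return means, bias, bias - half_width, bias + half_width
```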

Item reduction

The original list of items in the DIS included 22 questions. An iterative approach was adopted to reduce the number of items, with the aim of retaining accuracy while reducing completion time and burden on participants. This could then yield a complete set of questions, in addition to a brief version, for use in time-poor clinical settings. At each step, the correlation between the total DIS score and each item was calculated, and the item with the weakest correlation (or more than one item if similarly weak) was eliminated (see Supplementary Materials S3). Sensitivity, specificity, and AUC were calculated at each step to help determine the main candidates for a shorter questionnaire. Principal component analysis (PCA) was initially performed but resulted in the same item reduction.
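The iterative elimination procedure described above can be sketched as follows. `reduce_items` is an illustrative name, and this simplification drops exactly one item per step; the real analysis also recomputed sensitivity, specificity, and AUC at each step and permitted dropping several similarly weak items at once.

```python
import numpy as np

def reduce_items(responses, n_keep):
    """Iteratively drop the item with the weakest item-total
    correlation until n_keep items remain. `responses` is an
    (n_participants, n_items) array; returns indices of the
    retained items in their original column order."""
    data = np.asarray(responses, dtype=float)
    kept = list(range(data.shape[1]))
    while len(kept) > n_keep:
        total = data[:, kept].sum(axis=1)      # current total score
        corrs = [np.corrcoef(data[:, j], total)[0, 1] for j in kept]
        kept.pop(int(np.argmin(corrs)))        # remove the weakest item
    return kept
```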

Calculating minimal detectable change (MDC) and minimal clinically important difference (MCID)

MDC was estimated via within-subject standard deviation (WSSD) values. WSSD was calculated to quantify individual-level test–retest variability for the DIS-17 and DIS-6. The standard deviation of the differences between test and retest scores (SD_diff) was first derived from repeated assessments conducted 1 month apart in the ataxia subgroup. The WSSD was then computed using the standard formula:

WSSD = SD_diff / √2

In the context of test–retest or within-subject variability calculations, the square root of 2 adjusts for the variance of the difference, which is twice the within-subject variance.
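The computation above can be sketched as follows. The MDC95 multiplier (1.96 × √2 × WSSD, equivalent to 1.96 × SD_diff) is a common distribution-based convention and is an assumption here, as the exact multiplier is not stated in the text; the function name is illustrative.

```python
import math
import numpy as np

def wssd_and_mdc95(test_scores, retest_scores):
    """Within-subject SD from paired test-retest differences,
    and a 95% minimal detectable change derived from it."""
    diffs = np.asarray(test_scores, float) - np.asarray(retest_scores, float)
    sd_diff = diffs.std(ddof=1)        # SD of test-retest differences
    wssd = sd_diff / math.sqrt(2)      # within-subject SD
    mdc95 = 1.96 * math.sqrt(2) * wssd # assumed convention; = 1.96 * sd_diff
    return wssd, mdc95
```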

MCID reflects a patient-perceived threshold of benefit and is ideally estimated using an external anchor to capture what patients judge to be a meaningful change. In this study, MCID was estimated using a distribution-based approach because no anchor (e.g., patient-reported global change) was available. Here we used the widely accepted “0.5 × SD” approach [44], which estimates MCID as half of the standard deviation of the group score distribution.
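Under the 0.5 × SD rule cited above, the distribution-based MCID reduces to a one-line computation; `mcid_half_sd` is an illustrative name (the study's analyses were run in SPSS), and the sample SD (ddof=1) is assumed here.

```python
import numpy as np

def mcid_half_sd(scores):
    """Distribution-based MCID: half the sample SD of the
    group score distribution (the 0.5 x SD rule)."""
    return 0.5 * np.asarray(scores, dtype=float).std(ddof=1)
```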

Translation

Materials were translated from English into German, French, Polish, Czech, Portuguese, and Turkish (see Supplementary Materials S4 for alternative language versions). Translation protocols are described in Kraus et al. [45] and included both forward and backward translations required for each language.
