Identifying multiple sclerosis in women of childbearing age in six European countries: a contribution from the ConcePTION project

Multiple sclerosis (MS) is a long-term autoimmune condition affecting around one in 1,000 people worldwide. The prevalence of MS varies within and between countries, being higher in Nordic countries, and has generally increased over the last few decades [1, 2]. Women are two to four times more likely to be affected than men, and are usually diagnosed during their childbearing years, raising the question of the impact of MS and MS treatment on pregnancy [3,4,5]. Pregnant women are usually excluded from clinical trials, resulting in a lack of information on the safety of use of medication during pregnancy. To fill this gap, post-authorisation observational studies play an essential role. In particular, multicenter studies using data from several healthcare data sources are needed, especially for rare diseases such as MS where data is scarce.

As we show in a concurrent work, the choice of the method for assessing prevalence and the length of the lookback both have an impact on MS prevalence estimates (article in print: DOI https://doi.org/10.1007/s10654-025-01243-8). When estimating MS prevalence in a multicentre study with several healthcare data sources, an additional key factor is the algorithm used to identify MS. Algorithms are useful as they may include diagnoses from different sources, such as inpatient, outpatient, primary care, as well as prescription data [6,7,8,9,10].

A wide range of algorithms for identifying MS in administrative healthcare databases has been described in the literature [6,7,8,9,10,11,12,13,14,15]. Capkun et al. tested ten algorithms from the literature in a large US administrative claims database, and the corresponding prevalence estimates ranged from 87 to 212 per 100,000, illustrating the major impact of the choice of the algorithm on prevalence estimate. Based on a comparison with published prevalence, two algorithms appeared to be superior to the others: the first one required two MS diagnoses at least 30 days apart and the second one required at least one principal inpatient MS diagnosis or 2 MS diagnoses at least 30 days apart [7]. However, the choice of the most accurate algorithm can differ depending on the database. In three databases, the preferred algorithm required 3 or more MS-related claims from any combination of inpatient, outpatient, or DMT (Disease-Modifying Therapy) use within 1 year [10]. In Wales, an algorithm requiring either an MS diagnosis code with the disease onset at least six months after the earliest entry in the Welsh Primary Care source, or three MS diagnosis codes, had a sensitivity of 96.8% and a specificity of 99.9% [9]. A less restrictive algorithm, requiring at least one MS record in administrative datasets among medicine prescriptions, hospital discharge and outpatient consultations, was used and validated in several Italian studies, with a sensitivity ranging from 85 to 99% and a specificity ranging from 87.4 to 100% [12,13,14,15]. In France, a comparable algorithm requiring only one event among long-term disease status for MS, MS-related hospital admission or reimbursement for MS-specific DMTs was used in the national health data system [8]. The performance of this algorithm was later evaluated, showing a sensitivity of 87.6% and a specificity of 99.9% [11].

Within the ConcePTION project, we aim to explore the use and safety of MS medications during pregnancy using several European healthcare data sources, and the first step is to identify women with MS in these sources. In this study, we aimed to compare 5 algorithms to identify MS among women of childbearing age in six European healthcare data sources. For this purpose, we assessed MS prevalence using these five algorithms, and compared the prevalence estimates within and across data sources. MS prevalence estimates were then compared with published prevalence. Identifying women with MS is a first step to further study the use of MS medicines in women of childbearing age and pregnant women, and the safety of use of these medicines during pregnancy.

Methods.

Study population

The study population consisted of women aged between 15 and 49 years (i.e. all women of childbearing age including pregnant women), between 2005 and 2019 from six European data sources.

Data sources

The study was conducted using health care data sources from six European countries: Finland, Haute-Garonne (France), Emilia Romagna (Italy), Norway, Valencian Region (Spain) and Wales (UK). Detailed information on the data sources are given in online supplementary Table 1. Briefly, in Finland and Norway, data are from administrative healthcare databases with national coverage including birth, prescription, primary and specialized health care registries. The records from all registries are linkable at the individual level by a unique national person identifier. In Haute-Garonne (France), data are from the population-based EFEMERIS cohort of pregnant women living in Haute-Garonne containing data on pregnancy characteristics, outcomes and child health. In Emilia Romagna (Italy) and Valencian Region (Spain), data originate from regional administrative health registries. They include diagnoses from hospital and specialist care contacts (only for the Italian data source) and drug dispensing data. In Wales (UK), data are linked in the SAIL databank [16, 17]; for this study, hospital admissions data (national coverage) was linked with primary care data, including all prescriptions issued in primary care. Some 85% of Wales’ primary care practices contribute data to SAIL.

The Italian, Norwegian, and Wales data sources provided data on women of childbearing age, with complete data coverage during the study period. The Spanish, Finnish, and French data sources provided data only on pregnant women. In Finland, diagnosis data from patient registries was available continuously during the study period, but prescription data was only available from three months prior to pregnancy until three months after the end of pregnancy. In Valencian Region, diagnosis and prescription data was available continuously from 2013 to 2019. In France, the prescription data was available from 2.5 months prior to the pregnancy until the end of pregnancy and maternal diagnostic data (from inpatient data) was available only during the pregnancy.

Study period

The study period ran from 1st January 2005 to 31st December 2019. Not all the years were available across all data sources: the exact study periods for each data sources are listed in online supplementary Table 2. Wales data source included historical data from 1 January 1998 to 31 December 2004 for women resident in Wales in the study period.

For each data source including women of childbearing age, the cohort entry date was the latest of the four following dates: the date they joined the data source, the date of their 15th birthday, 1st of Jan of the earliest year of data available in the data source or January 1st, 2005. The cohort exit date was the earliest of the four following dates: the date they left the data source, the date of death, the date of their 50th birthday or December 31st, 2019.

For the data sources including only pregnant women, we restricted data collection to 3 months before to 3 months after pregnancy to be homogeneous between these data source: for Valencian Region (Spain) and Finland, the cohort entry date was 3 months before the 1st day of Last Menstrual Period (LMP) of the first pregnancy and the cohort exit date was 3 months after the end of the last pregnancy; in the French data source, the cohort entry date was 2.5 months before LMP of the first pregnancy and the cohort exit date was the end of the last pregnancy. In these data sources, follow-up could contain several observation periods corresponding to the different pregnancies, separated by periods with no data available. We calculated the coverage of the follow-up, corresponding to the percentage of the follow-up during which the woman is observed.

Inclusion criteria

For the data sources including all women of childbearing age (i.e. Italian, Norwegian, and Wales data sources), only women who had complete coverage for at least 365 consecutive days in the study period were eligible.

For the data sources only including pregnant women (i.e. Spanish, Finnish, and French data sources), all complete pregnancy periods lying within the study period for women aged between 15 and 49 years-old during the entire pregnancy period were included in the study. In the Spanish data source, the ConcePTION pregnancy algorithm was used to identify pregnancy episodes, establish the pregnancy type of end and to estimate the pregnancy start date (corresponding to the LMP date) and pregnancy end date [18].

MS identification algorithmsComponents of the MS algorithms

Two types of components were used in the algorithms to identify MS: diagnostic codes recorded in various settings, and medicines prescribed or dispensed. Diagnostic codes (listed in online supplementary Table 3) were classified according to their type: inpatient diagnoses (from patients admitted to hospital), primary care diagnoses, and other diagnoses (including diagnoses made during emergency visit or outpatient care). The second component was medications data: dispensing (or prescription in Wales) of MS DMTs (listed in online supplementary Table 4), distinguishing MS-specific DMTs (the only indication is MS) from non-specific MS DMTs (indications for MS and other diseases). The availability of these data components in the six data sources is shown in Table 1.

Table 1 Availability of algorithms components in data sourcesAlgorithm description

Five algorithms to identify MS (named MS1 to MS5) were developed and the estimated prevalences were compared. The algorithm MS1 identified MS cases based on the presence of at least one MS-related diagnosis (all types of care) or at least one prescription for MS-specific DMT, as proposed by Foulon et al. [8]. The algorithm MS2 required to be positive for MS1 and to have one more MS diagnosis or DMT prescription. Based on the study of Culpepper et al. [10], the algorithm MS3 required to be positive for MS2 and to have one more MS diagnosis or DMT prescription. The algorithm MS4 identified MS based on the presence of at least one inpatient MS-diagnosis or at least two outpatient, unspecified or primary-care MS-diagnoses, as proposed by Capkun et al. 2015 [7]. The algorithm MS5 identified MS based on the presence of at least two MS-related diagnosis (all types of care), as proposed by Capkun et al. 2015 [7]. When multiple diagnoses were required, a minimum of 30 days’ separation was required. Table 2 summarizes the criteria required for each algorithm.

Table 2 Number and type of events required in the algorithms used to identify women with multiple sclerosisStatistical analysisPrevalence of MS

Our other study demonstrated that the choice of method for estimating the prevalence of MS can vary significantly depending on the study population (article in print: DOI https://doi.org/10.1007/s10654-025-01243-8). Consequently, we chose to use two different methods to assess prevalence, depending on if the data source included women of childbearing age or pregnant women.

The date of MS identification was the date when the algorithm criteria were met. For example, with the algorithm MS5, 2 MS diagnoses are needed, the date of identification was therefore the date of the second diagnosis. After MS identification, a woman was considered with MS until the end of her follow-up.

In the data sources with women of childbearing age, an average point prevalence of MS was calculated: a point prevalence was calculated on the 1st day of each month during the given period, and the prevalence on the given period was the average of all these points prevalence. On the first day of each month, the MS prevalence was calculated as follow: number of women in the study and identified with MS before or on the given day divided by the number of women in the study on the given day.

In the data sources confined to pregnant women, data were available only during the pregnancy period, making it difficult to identify the exact date of MS diagnosis. Therefore, the date on which the algorithm criteria were met was very unlikely to be the date of first diagnosis. To overcome this lack of precision, we chose to calculate a period prevalence of MS, a method that did not take time into account. Period prevalence of MS over a given period was calculated as follows: the numerator included all women in the study any time during the given period, having been identified with MS before the end of the given period, and the denominator included all women in the study any time during the given period. In contrast to the average point prevalence, when calculating period prevalence over a given period, identification of MS at the end of the given period will have the same weight as an identification of MS before the given period.

Period prevalence over the entire study period, as well as the percentage of variation between prevalence estimates provided by MS2 to MS5 in comparison to prevalence estimates provided by MS1 were also calculated for all the data sources and available in supplementary Table 5.

95% Confidence interval (95% CI) were calculated using the Wilson score method.

Covariates

Figure 1 illustrates the periods with available data used to identify MS, along with the periods when prevalence stratified by different time intervals and age groups was calculated, by data source.

Prevalence was stratified by five-year intervals (2005–2009,2010–2014,2015–2019). Most data sources covered shorter study periods, resulting in some intervals being less than five years.

Prevalence was also stratified by the age of the woman (15–24,25–29,30–34,35–39,40–49). Results within the 2015–2019 period have been plotted, as this is the most recent period, with the longest lookback available to identify MS.

For the period prevalence, for each age group, women who were into the relevant age group at any time during the period were included in the prevalence calculation. For the average point prevalence, for each age group, women who were into the relevant age group the day of the point prevalence calculation were included in the prevalence calculation.

Software and common data model

All Data Access Providers (DAPs) extracted an instance from their data source that was large enough to support the study design, and mapped them into the ConcePTION Common Data Model (CDM), thus obtaining an instance of the ConcePTION CDM [19]. This enabled the use of standardized analytics and tools across the network. However, the queries to be executed in distributed analyses still needed to be adapted to the diversity of the data source, including whether the data source could include all women or only pregnant women, the specific coding system, and the specific settings where diagnoses are recorded.

The script was developed using R. A script in SAS was developed to cross-check the outputs of the script within the French data source (EFEMERIS).

The DAPs executed the study code locally on their CDM instance. The result of the script was interpreted and if any inconsistencies were found the script was revised. After reviewing the aggregated results, DAPs approved their upload to the remote Research Environment hosted by the anDREa Consortium, that includes the ConcePTION partner University Medical Center Utrecht. This environment, compliant with local General Data Protection Regulation implementations, could be accessed by the principal investigator.

The results from each of the contributing data sources were then combined in tables and figures for this paper. Non-empty cell counts < 5 were shared in masked format.

Comments (0)

No login
gif