RareLink: scalable REDCap-based framework for rare disease interoperability linking international registries to FHIR and Phenopackets

The RareLink framework is open-source, modular software designed for integration with any REDCap instance that is connected to the BioPortal Ontology Services terminology server and has a valid country-specific Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) license (Fig. 1). RareLink consists of five core modules: (i) RareLink Documentation, (ii) RareLink Common Data Model, (iii) RareLink Command-Line Interface (CLI), (iv) RareLink FHIR Module, and (v) RareLink-Phenopackets Module. It is compatible with local REDCap projects and institutional deployments, supporting diverse use cases, including patient registries, study feasibility and cohort generation, and related clinical research activities. While RareLink can be deployed locally and core functions such as the FHIR export pipeline and the semi-automated import operate offline, an internet connection is required to access the documentation, to use ontology services in the Phenopackets pipeline, and to search terms in REDCap instruments. The framework is designed to be user-friendly for both clinical and technical personnel. By integrating the complete ontology-based RD-CDM [14] within REDCap, RareLink provides a flexible framework that enables guided manual data entry for prospective cohorts and semi-automatic import using LinkML-Map [15] for retrospective data sets. Once data is captured, users can export it to validated HL7 FHIR instances and GA4GH Phenopackets through the CLI, which lets them interact with the toFHIR Module and the RareLink-Phenopackets Module (Fig. 1). The Clinical Data Interoperability Services (CDIS) module [16] can be integrated to import existing FHIR instances into REDCap. Although SNOMED CT and several of the other ontologies support multilingual labels, the current RareLink implementation retrieves ontology content from BioPortal using English-language labels only.

Fig. 1: Overview of the entire RareLink framework’s data flow.

The RareLink framework is integrated with a local REDCap instance and preconfigured for the RareLink-CDM, with the option of disease-specific extensions. The RareLink-CDM is equivalent to the ontology-based RD-CDM integrated through its Python package and defined through its LinkML schema and the corresponding REDCap instruments. Utilising the local REDCap API and the RareLink-CLI, the toFHIR and CDIS modules can export to HL7 FHIR International Patient Summary (IPS) and Genomics Reporting resources and import from a corresponding FHIR server, enabling export or record linkage to the European Reference Networks, the EHDS, or other international and domain-specific registries. The RareLink-Phenopackets module exports to GA4GH Phenopackets, allowing for the use of Phenopacket-based analysis software. Additionally, the LinkML-based import mapper can support data import from tabular databases, map it to the corresponding LinkML schema, and enable either subsequent import into REDCap or direct export as Phenopackets. The Manual Data Capture Guide aids with the manual entry of data according to the common data model. API Application Programming Interface. CDIS Clinical Data Interoperability Services. CDM Common Data Model. CLI Command Line Interface. EHDS European Health Data Space. FHIR Fast Healthcare Interoperability Resources. GA4GH Global Alliance for Genomics and Health. IPS International Patient Summary. LinkML Linked Data Modeling Language. RareLink documentation https://rarelink.readthedocs.io/en/latest/index.html. RD-CDM Python Package Index (PyPI) https://pypi.org/project/rd-cdm/. RD-CDM ontology-based rare disease common data model (https://github.com/BIH-CEI/rd-cdm). REDCap Research Electronic Data Capture.

RareLink documentation

The comprehensive RareLink documentation was designed with an emphasis on reusability, scalability, and cross-institutional applicability. Given RareLink’s deployment across multiple countries and institutions, clear and accessible documentation was essential to ensure consistent implementation and independent use. It centralises all RareLink features, guides, and resources, is accessible from any REDCap site, and is structured into five subsections: Background, RareLink Framework, Installation, User Guide, and Additional Information. The Background section provides theoretical context and introduces the ontologies and terminologies employed, such as SNOMED CT, as well as structural data standards such as FHIR and Phenopackets, with references for further reading. The RareLink Framework section offers an overview of the architecture, including the RareLink-CDM and the command-line interface. The Installation section guides users through framework setup, REDCap configuration, data dictionary import, and REDCap API connection. The User Guide section supports manual data capture, semi-automated data entry, use of the Phenopackets and FHIR modules, development of REDCap instruments, and integration of REDCap tools. The Additional Information section includes contribution guidelines, a changelog, FAQs, acknowledgements, licensing, and contact information (https://rarelink.readthedocs.io/en/latest/index.html).

RareLink common data model

We enhanced the ontology-based RD-CDM [14] for its REDCap implementation by developing the RareLink Common Data Model (RareLink-CDM), aligning it with the corresponding FHIR resources and profiles and with Phenopacket blocks. To ensure interoperability, we defined required fields within each section and extended the Measurements section to support linkage with International Patient Summary (IPS) profiles for laboratory, imaging, and procedural data and with the Phenopacket MedicalAction block. Only the Formal Criteria section is mandatory; all other sections are optional but conditionally required: if any field in a section is used, that section’s mandatory elements must be completed. With the exception of Formal Criteria, Personal Information, Patient Status, Consent, and Disability, all sections are implemented as repeating instruments (Fig. 2a–c).
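The conditional-requirement rule can be illustrated with a minimal sketch. The section names, field names, and the `SECTIONS` layout below are hypothetical placeholders chosen for illustration; they are not the actual RareLink-CDM variables, and the real checks are implemented through REDCap settings and the LinkML schema rather than through this function.

```python
from typing import Any, Dict, List

# Hypothetical section layout (not the real RareLink-CDM variable names):
# each section lists its fields and the subset that becomes mandatory
# as soon as any field of the section is used.
SECTIONS: Dict[str, Dict[str, Any]] = {
    "formal_criteria": {
        "fields": ["record_id", "date_of_admission"],
        "mandatory": ["record_id", "date_of_admission"],
        "always_required": True,   # the only unconditionally mandatory section
    },
    "disease": {
        "fields": ["disease_code", "age_at_onset", "verification_status"],
        "mandatory": ["disease_code"],
        "always_required": False,  # optional, but conditionally required
    },
}


def is_filled(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as 'not entered'."""
    return value is not None and str(value).strip() != ""


def missing_mandatory(record: Dict[str, Any]) -> List[str]:
    """Return 'section.field' names that violate the conditional rule."""
    missing: List[str] = []
    for name, spec in SECTIONS.items():
        used = any(is_filled(record.get(f)) for f in spec["fields"])
        if spec["always_required"] or used:
            missing += [
                f"{name}.{f}"
                for f in spec["mandatory"]
                if not is_filled(record.get(f))
            ]
    return missing
```

Under these assumptions, a record that fills `age_at_onset` without `disease_code` is reported as incomplete, whereas a record that leaves the whole disease section empty passes.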

Fig. 2: The RareLink-CDM in REDCap for the evaluation cohort of ten individuals with Kabuki syndrome type 1.

a Overview of all RareLink-CDM sections displayed as standalone data collection instruments within REDCap’s Designer view. b The record status dashboard displaying the ten simulated individuals enrolled in the registry, each with completed data across demographic, consent, disease, genetic, phenotypic, and measurement instruments. c The record dashboard for individual 3, illustrating the use of repeating instances for phenotypic features. d A screenshot of the data entry window for a phenotypic feature of individual 3, including introductory text and links to the documentation and manual data capture guide. In this example, the individual was reported with confirmed, recurrent, and profound hypotonia, with an infantile onset dated 17 September 2017. HP Human Phenotype Ontology.

To promote accurate data capture and reduce entry errors, the REDCap instruments include detailed instructions, branching logic, and embedded links, complemented by the manual data capture guide (Fig. 2d). The LinkML representation of the RareLink-CDM mirrors this structure by grouping repeating instances in its JSON serialisation. Once data is exported as REDCap-JSON, it is automatically processed and validated against the LinkML-JSON schema. This schema [17], along with the corresponding Python classes, includes all REDCap variables, coded terms, and value sets as importable enumerations. The RareLink-CDM automatically obtains its ontology versions from the RD-CDM Python package [18], which in turn retrieves and validates the latest releases from BioPortal.
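The grouping of repeating instances can be sketched as follows. This is an illustrative approximation, not RareLink's own processing code: it relies only on the `redcap_repeat_instrument` and `redcap_repeat_instance` columns that REDCap adds to flat exports of projects with repeating instruments, while the output key `repeated_elements` and the example field names are assumptions made for this sketch.

```python
from typing import Any, Dict, List

REPEAT_KEYS = ("redcap_repeat_instrument", "redcap_repeat_instance")


def group_records(
    rows: List[Dict[str, Any]], id_field: str = "record_id"
) -> List[Dict[str, Any]]:
    """Nest the flat REDCap export rows of each record into one object."""
    grouped: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        rid = str(row[id_field])
        entry = grouped.setdefault(rid, {id_field: rid, "repeated_elements": []})
        # Keep only the fields that were actually filled in on this row.
        data = {
            key: value
            for key, value in row.items()
            if value not in ("", None)
            and key != id_field
            and key not in REPEAT_KEYS
        }
        instrument = row.get("redcap_repeat_instrument") or ""
        if instrument:
            # Row belongs to one instance of a repeating instrument.
            entry["repeated_elements"].append(
                {
                    "redcap_repeat_instrument": instrument,
                    "redcap_repeat_instance": int(row["redcap_repeat_instance"]),
                    "data": data,
                }
            )
        else:
            # Row carries the non-repeating instruments of the record.
            entry.update(data)
    return list(grouped.values())
```

A record with one non-repeating row and two phenotypic-feature rows would thus become a single object with two entries under `repeated_elements`, which is the shape that can then be validated against a JSON schema.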

Command line interface

The RareLink command-line interface (CLI) guides users through the entire workflow from setup to data export, with a design that accommodates those with limited coding experience. It features a user-friendly interface with descriptive commands, embedded links, and contextual hints. Each command includes cross-references and built-in validation to ensure that required steps are completed and configurations are correct. Organised into five command groups, the CLI facilitates onboarding and streamlines usage (Fig. 3a). After setup, the redcap command group enables API-based interaction with local REDCap projects. The fhir and phenopackets command groups provide all necessary steps for exporting data in both formats (Fig. 3b). As adoption of RareLink grows, additional commands will be incorporated based on community feedback to support evolving needs.

Fig. 3: RareLink command-line interface and data flow for the Kabuki Type 1 evaluation cohort.

a The RareLink CLI is organised into five primary command groups, each with further subcommands: framework (global configuration), setup (local installation), redcap (interaction with a REDCap project), fhir (FHIR export via the RareLink-FHIR module) and phenopackets (Phenopacket generation via the RareLink-Phenopackets module). Future releases will extend functionality as requirements evolve. b Key data-flow steps (download-records, phenopackets export, fhir export) are invoked on the CLI after setup (setup keys, fhir setup, and, if necessary, fhir hapi-server and redcap validate-hgvs) to fetch, validate, and transform REDCap data into the LinkML representation of the RareLink-CDM, Phenopackets, and FHIR instances (see Supplementary Fig. 1 for the full console output). The LinkML data can be imported back into REDCap using the redcap upload-records command. If the evaluation cohort had been imported from another retrospective database, LinkML-Map could also have been used. CDM common data model. FHIR Fast Healthcare Interoperability Resources. LinkML Linked Data Modeling Language. REDCap Research Electronic Data Capture.
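The order of the steps named in Fig. 3b can be summarised in a short planning sketch. It only assembles command lines and does not run anything. The executable name `rarelink` and the placement of download-records under the redcap group are assumptions made here; the subcommand names themselves are those listed in the caption, and the documentation remains the reference for exact invocations and options.

```python
import shlex
from typing import List

CLI = "rarelink"  # assumed executable name; verify against the documentation


def plan_export(
    local_fhir_server: bool = False, validate_hgvs: bool = False
) -> List[List[str]]:
    """Return the setup and data-flow commands in the order given in Fig. 3b."""
    steps = [
        [CLI, "setup", "keys"],  # store REDCap and BioPortal credentials
        [CLI, "fhir", "setup"],  # configure the FHIR export
    ]
    if local_fhir_server:
        # Only needed when no remote FHIR server is available.
        steps.append([CLI, "fhir", "hapi-server"])
    if validate_hgvs:
        # Optional check of HGVS strings stored in the REDCap project.
        steps.append([CLI, "redcap", "validate-hgvs"])
    steps += [
        [CLI, "redcap", "download-records"],  # group placement is an assumption
        [CLI, "phenopackets", "export"],
        [CLI, "fhir", "export"],
    ]
    return steps


if __name__ == "__main__":
    for argv in plan_export(local_fhir_server=True):
        print(shlex.join(argv))
```

Keeping the plan as data makes the dependency explicit: both export commands come after the record download, and the optional steps are inserted only when a site needs them.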

RareLink-FHIR-module

All 75 elements mapped to FHIR in the ontology-based RD-CDM [14] are exportable from the RareLink-CDM to the corresponding FHIR resources and profiles that we selected. The export process leverages the toFHIR engine [19], which automatically validates the instances against the IPS v2.0.0 profiles, the Genomics Reporting v3.0.0 profiles, and the Base Resources v4.0.1 structure definitions and can write them to any FHIR server (Fig. 4). These profiles are embedded as dependencies within the RareLink-CDM profiles, ensuring interoperability upon implementation. This functionality facilitates linkage to FHIR repositories and international registries and supports data import via the Clinical Data Interoperability Services module [16]. The export pipeline requires Docker to be installed and running, which institutional software installation restrictions can prevent on some hospital systems. In the absence of a remote FHIR server, a local instance can be set up using a HAPI server. Detailed guidance and all CLI commands are provided in the user guide documentation under the FHIR module section [17]. The RareLink-CDM FHIR implementation guide and specifications are publicly available and hosted through our repository [17].
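To make the notion of profile-based export more concrete, the following sketch builds a FHIR R4 Patient resource that declares the IPS Patient profile in `meta.profile` and wraps it in a transaction Bundle of the kind a FHIR server accepts. It is an illustration only and does not reproduce the toFHIR mappings used by RareLink; the identifier system, the name text, and the birth date are invented for the simulated example, and the resource has not been run through a validator.

```python
import json
from typing import Any, Dict, List

# Canonical URL of the IPS Patient profile.
IPS_PATIENT_PROFILE = "http://hl7.org/fhir/uv/ips/StructureDefinition/Patient-uv-ips"


def make_patient(record_id: str, gender: str, birth_date: str) -> Dict[str, Any]:
    """Build a Patient resource that claims conformance to the IPS profile."""
    return {
        "resourceType": "Patient",
        "id": record_id,
        # Declaring a profile is only a claim; a server or validator still
        # has to check the resource against it.
        "meta": {"profile": [IPS_PATIENT_PROFILE]},
        # Hypothetical identifier system for REDCap record IDs.
        "identifier": [{"system": "urn:example:redcap-record", "value": record_id}],
        "name": [{"text": f"Simulated individual {record_id}"}],
        "gender": gender,          # male | female | other | unknown
        "birthDate": birth_date,   # YYYY-MM-DD
    }


def make_transaction(resources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap resources in a transaction Bundle using idempotent PUT requests."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "resource": res,
                "request": {
                    "method": "PUT",
                    "url": f"{res['resourceType']}/{res['id']}",
                },
            }
            for res in resources
        ],
    }


if __name__ == "__main__":
    bundle = make_transaction([make_patient("3", "female", "2017-01-01")])
    print(json.dumps(bundle, indent=2))
```

Using PUT with a stable resource ID means that repeating an export updates the existing resource instead of creating duplicates, which is one plausible design for re-runnable registry exports.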

Fig. 4: Schematic overview of the RareLink-CDM as both FHIR instances and a Phenopacket.

FHIR instances conform to the HL7 International Patient Summary v2.0.0 profiles for Patient, Condition, Laboratory, Radiology, and Procedure. Genetic findings are captured using the HL7 Genomics Reporting v3.0.0 Genetic Variant and Diagnostic Implication profiles. Additional components, including encounters, phenotypic, and other observations (e.g. age category, gestational age), family history and consent (incorporating ERDRI-CDS elements), utilise FHIR R4 base resources. The RareLink-CDM Phenopacket comprises Individual, VitalStatus, and Disease blocks, together with phenotypic, measurement, and medical action data, and genetic information within the Interpretation and VariantDescriptor blocks. ERDRI-CDS European Rare Disease Infrastructure Common Data Set. HL7 Health Level 7. IPS International Patient Summary. RareLink-CDM RareLink Common Data Model.

RareLink-Phenopackets-module

With the exception of the family history section, all 43 elements mapped to Phenopackets in the ontology-based RD-CDM [14] are exportable from REDCap via the RareLink-CDM in the current version (Fig. 4). Following export, the dataset is processed and validated against the LinkML schema, after which the CLI phenopackets export command generates the corresponding Phenopackets (Fig. 3b). A free-of-charge BioPortal API key is required to retrieve ontology labels during this process. The export engine leverages RareLink’s DataProcessor class to convert REDCap codes into valid OntologyClass elements within the Phenopacket structure. Mapping logic is defined in the BaseMapper class to support the necessary Phenopacket blocks. By integrating mapping, creation, writing, and validation functionalities within a single pipeline, the engine streamlines the export process. While the mappings are preconfigured for the RareLink-CDM, they remain extensible. Developers seeking to adapt the Phenopacket engine for other data models should follow RareLink’s guidelines for building ontology-based REDCap instruments. Detailed setup instructions for extending the engine are provided in the Phenopackets Module section of the documentation [17].
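The conversion of a REDCap code into an OntologyClass element can be sketched as follows. This does not use RareLink's DataProcessor or BaseMapper classes, whose interfaces are not described here. The `prefix_localid` code format is an assumption about how a REDCap choice code might be written, and the small `LABELS` dictionary stands in for the label lookup that the real pipeline performs through BioPortal.

```python
from typing import Any, Dict, Optional

# Local stand-in for a label lookup; the real pipeline queries BioPortal.
LABELS: Dict[str, str] = {"HP:0001252": "Hypotonia"}


def to_curie(redcap_code: str) -> str:
    """Turn 'hp_0001252' (assumed REDCap form) or 'HP:0001252' into a CURIE."""
    for separator in (":", "_"):
        if separator in redcap_code:
            prefix, local_id = redcap_code.split(separator, 1)
            return f"{prefix.upper()}:{local_id}"
    raise ValueError(f"Cannot derive an ontology prefix from {redcap_code!r}")


def to_ontology_class(
    redcap_code: str, labels: Dict[str, str] = LABELS
) -> Dict[str, str]:
    """Build the JSON form of a Phenopacket OntologyClass (id plus label)."""
    curie = to_curie(redcap_code)
    # The fallback label is a choice made for this sketch only.
    return {"id": curie, "label": labels.get(curie, "Unknown")}


def phenotypic_feature(
    redcap_code: str, onset_timestamp: Optional[str] = None
) -> Dict[str, Any]:
    """Build a minimal PhenotypicFeature with an optional onset timestamp."""
    feature: Dict[str, Any] = {"type": to_ontology_class(redcap_code)}
    if onset_timestamp:
        feature["onset"] = {"timestamp": onset_timestamp}
    return feature
```

For the hypotonia example of individual 3 in Fig. 2d, `phenotypic_feature("hp_0001252", "2017-09-17T00:00:00Z")` would yield a feature typed as HP:0001252 with the reported onset date; severity and modifier terms are left out of this sketch.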

Clinical usability

Evaluation of the RareLink framework demonstrated usability and interoperability across diverse REDCap environments. During testing, users highlighted the intuitive documentation, the user-friendly CLI, and the automated export pipelines as key strengths. The system successfully supported the export and analysis of all elements defined in the ontology-based RD-CDM [14] that were mapped to either FHIR or Phenopackets, with the sole exception of the family history section in Phenopackets, demonstrating its adaptability and coverage. Notably, predefining cohort-specific elements and customising the RareLink-CDM improved usability and consistency, facilitating subsequent registry deployment and data analysis. Its integration into the widely adopted REDCap infrastructure was noted as a major advantage, offering a scalable solution as demonstrated in phases two and three. However, REDCap’s typical separation from hospital information systems was identified as a barrier to seamless integration with clinical workflows. Conversely, utilising the RareLink-CDM’s LinkML schema and the RareLink-Phenopackets pipeline independently of REDCap increased efficiency in generating Phenopackets for retrospective cohorts. While the framework’s modular design and extensive guidance support its reusability across various research settings, the evaluation also highlighted challenges for users without a background in coding or interoperability. Users unfamiliar with command-line tools required additional support to operate the framework effectively. Effective clinical usability depended on precise local semantic annotation, with domain experts engaged to ensure that the chosen ontology terms reflected site-specific meaning and supported consistent data interpretation. To address these issues, feedback mechanisms embedded in the documentation and the GitHub repository [17] are actively collecting input for future enhancements.

Deployment teams in South Africa and Japan provided formative feedback, reinforcing that semantic alignment was a prerequisite for meaningful clinical use, and highlighting the framework’s scalability, the value of predefined cohort elements for harmonisation, and the need for tailored onboarding for non-technical users. To date, we have not conducted a combined international cohort analysis, but feedback from deployments is shaping the design and harmonisation of potential cross-site studies. We also note that using RareLink in different settings can introduce bias through mapping choices, local capture heterogeneity, and resolution fallbacks when ontology terms cannot be matched exactly. These risks were addressed through expert-reviewed semantic alignment, iterative refinement based on site feedback, and consistency checks with simulated data.

Canadian Inborn Errors of Immunity National Registry

The RareLink framework demonstrated both scalability and extensibility through its application in the Canadian Inborn Errors of Immunity National Registry (CIEINR) [20], which served as a key use case for evaluating these properties. In this domain-specific implementation, the core RareLink-CDM was extended with additional clinical fields tailored to immunodeficiency-specific needs. Drawing from this experience, we developed guidelines for creating REDCap instruments compatible with the RareLink framework and provided detailed documentation on adapting the Phenopackets module for export. Iterative trial data capture, continuous feedback loops, and refinement processes informed these developments. Evaluation showed that the core RareLink-CDM sufficiently covered key data elements, such as demographics, patient clinical status, and genetic information. Additional domain-specific sections, such as phenotypic features, were incorporated through modular extensions and controlled value sets. Mandatory elements were preserved to maintain interoperability, and rule-based validation ensured consistency across the extended model. The CIEINR is in the implementation stage and proved operational for Phenopackets export in the piloting phase. Details of the development and an analysis of the use of the RareLink framework in this registry will be reported elsewhere. This use case confirmed that RareLink can be adapted for diverse clinical domains while preserving semantic and syntactic integrity.

Evaluation cohort

The evaluation cohort comprised ten simulated individuals with Kabuki syndrome type 1. For each case, data included basic demographics, phenotypic features, genetic findings and selected clinical measurements. Following data entry, records were exported to both FHIR instances and Phenopackets (Fig. 3b). The resulting JSON files are publicly available via our GitHub repository [17]. As illustrated in Supplementary Fig. 2, the exported data maintain consistent semantic representations across formats for each individual, while ensuring syntactic interoperability with the respective data standard.
