Objective. Electrocardiogram (ECG) analysis is vital for diagnosing cardiac conditions and monitoring human physiological states. However, signal perturbations, inconsistent signal quality, and signal interference undermine the reliability of ECG analysis. Despite advances in large language models (LLMs), their use for improving ECG-based physiological measurement remains underexplored. This work aims to develop a multimodal framework that integrates ECG signals with textual instructions for robust denoising and signal quality assessment, enabling effective physiological analysis across diverse tasks. Approach. We propose cross-modal attention (CMA)-ECG, a multimodal framework that combines a hybrid cross-attention mechanism for aligning ECG and text features, task-specific heads, and a pre-trained LLM for contextual enhancement. The framework uses pre-trained LLMs with 7B parameters, balancing accuracy against computational cost for practical deployment. Main results. Extensive experiments on multiple datasets show that CMA-ECG achieves state-of-the-art (SOTA) performance in robustness to perturbations, signal quality assessment, and denoising. Compared with SOTA baselines, CMA-ECG achieves up to 8.8% higher area under the receiver operating characteristic curve (AUROC) in quality assessment and 20% lower mean squared error (MSE) in denoising. Significance. By integrating signal and contextual data, this approach advances ECG analysis and offers a robust solution for reliable physiological monitoring and analysis.
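The abstract does not specify how the hybrid cross-attention mechanism is built. As a rough illustration of the general idea only, here is a minimal sketch of plain single-head scaled dot-product cross-attention in which ECG tokens act as queries over text-instruction tokens. The function names, the choice of ECG as queries and text as keys/values, and the projection matrices `w_q`, `w_k`, `w_v` are all assumptions for illustration, not the CMA-ECG implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(ecg: Matrix, text: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical sketch: ECG tokens (queries) attend over text tokens (keys/values)."""
    q = matmul(ecg, w_q)   # (n_ecg x d)
    k = matmul(text, w_k)  # (n_text x d)
    v = matmul(text, w_v)  # (n_text x d_v)
    d = len(w_q[0])
    scores = matmul(q, transpose(k))  # (n_ecg x n_text) similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    # Each ECG token becomes a weighted mix of the text value vectors
    return matmul(weights, v)


# Toy usage with identity projections (illustrative values, not real features)
I2 = [[1.0, 0.0], [0.0, 1.0]]
ecg = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = cross_attention(ecg, text, I2, I2, I2)
print(out)
```

In a real model the projections would be learned, there would typically be multiple heads, and the fused features would feed task-specific heads; this sketch only shows the attention step that lets signal features draw on textual context.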