Comparative performance of structural aligners in functional domain annotation

Accurate protein domain annotation is essential for inferring protein function, and databases such as Pfam provide sequence-derived signatures for thousands of domain families. Because protein structure is more evolutionarily conserved than sequence, structure-based searches can detect homologous relationships even at low sequence identity (typically below 30%), where pairwise sequence aligners often lose sensitivity. Here, we leverage AlphaFold-derived structures of Pfam domain instances to systematically evaluate structure-based versus sequence-based methods for Pfam annotation.

We benchmarked three structural aligners (Reseek, Foldseek, TM-align) against two sequence-based methods (MMseqs, HMMER) using both exhaustive all-against-all searches and a split-family design that enables direct comparison of pairwise and profile-based ranking performance. We also evaluated residue-level alignment accuracy using Pfam multiple sequence alignments as the reference and investigated whether profile-derived information can improve structural hit ranking.

In all-against-all searches, Reseek achieved the highest sensitivity up to the first false positive (AUC = 0.85), outperforming Foldseek (0.81), TM-align (0.76), and MMseqs (0.46). In split-family evaluation, HMMER remained superior (maximum F1 = 0.991), highlighting the continued strength of sequence-profile approaches for family-level annotation. Performance varied substantially across domain families, with average sequence identity emerging as the strongest predictor of success. Structural aligners consistently produced more accurate residue-level mappings than pairwise sequence methods. Finally, incorporating profile-derived information via rescoring improved structural annotation performance for short domains, suggesting a path toward profile-informed structure-based domain annotation.
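The two headline metrics above can be illustrated with a minimal sketch. It assumes each query's hits are already labeled as true or false positives (same Pfam family or not) and ranked by the aligner's score. The function names, the input layout, and the use of the mean per-query sensitivity as a stand-in for the reported AUC are illustrative assumptions, not the authors' benchmarking code.

```python
from typing import Sequence, Tuple


def sensitivity_up_to_first_fp(ranked_is_tp: Sequence[bool], total_tp: int) -> float:
    """Fraction of a query's true positives ranked above its first false positive.

    ranked_is_tp: hit labels for one query, best-scoring hit first (assumed layout).
    total_tp: number of true positives that exist for this query in the database.
    """
    if total_tp <= 0:
        raise ValueError("total_tp must be positive")
    found = 0
    for is_tp in ranked_is_tp:
        if not is_tp:  # stop counting at the first false positive
            break
        found += 1
    return found / total_tp


def mean_sensitivity(per_query: Sequence[float]) -> float:
    """Average per-query sensitivity; used here as an assumed summary statistic."""
    if not per_query:
        raise ValueError("per_query must not be empty")
    return sum(per_query) / len(per_query)


def max_f1(scored_hits: Sequence[Tuple[float, bool]], total_positives: int) -> float:
    """Best F1 over all score cutoffs, sweeping hits from highest to lowest score.

    scored_hits: (score, is_true_positive) pairs pooled over queries (assumed layout).
    Tied scores are treated as distinct ranks in this sketch.
    """
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    best = 0.0
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
        if tp == 0:
            continue  # F1 is zero until at least one true positive is included
        precision = tp / rank
        recall = tp / total_positives
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Under these assumptions, a query with four family members whose ranked hits are true, true, false, true has a sensitivity of 0.5, because only two of its four true positives precede the first false positive.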
