AMIGO - Guided assignment of C-methyl labelled proteins

AMIGO workflow

This section provides an overview of the accepted protein labelling schemes, the required NMR experiments, and the corresponding input and output data required by AMIGO. Figure 1 illustrates the process using the MILproSVproSAT methyl labelled UDP-glucose pyrophosphorylase enzyme from Leishmania major (LmUGP) protein as example.(Mühlberg et al. 2022).

Isotopic labelling scheme

AMIGO was developed for the backbone-independent assignment of methyl groups in large, densely and stereospecifically methyl-labelled proteins. As shown below, the best results are obtained with [1H,13C] Met-ε, Ile-δ1, Leu-δ2 and Val-γ2 (proS) or Leu-δ1 and Val-γ1 (proR) (MILproS/RVproS/R), Ala-β (MILproS/RVproS/RA) and Thr-γ (MILproS/RVproS/RAT) labelling schemes. However, non-stereospecific labelling such as (M)ILV(AT) are also tolerated, where parenthesis indicate optional amino acid-type labelling.

NMR experiments

At a minimum, AMIGO requires the acquisition of two types of NMR experiments: (i) a single 3D or 4D NMR experiment to obtain short-range spatial restraints (NOEs), and (ii) [1H,13C] HMQC spectra recorded for different labelling schemes to identify the residue type of each methyl-group signal. Additional experiments may be acquired to obtain long-range restraints, such as PCS or PRE data. For the assignment of LmUGP a single 4D HMQC-NOESY-HMQC experiment with a NOE mixing time of 180 ms was sufficient to extract short-range spatial restraints. Amino acid-type identification was achieved using [1H,13C] HMQC spectra of single amino acid-type labelled LmUGP samples. In addition, four PCS datasets were obtained from [1H,13C] HMQC spectra of MILproSVproSAT methyl-labelled LmUGP loaded with either diamagnetic or paramagnetic lanthanide ions. Details about how NMR experiments were set-up for proteins illustrating this manuscript can be found in Table S1.

Input data

AMIGO requires a single list of NOE cross peaks to outline a sparse NOE graph (NG). This list can be obtained from manual or automated inspection of the 3D or 4D NOE spectrum. The list also contains information on residue type for each peak. The algorithm explicitly considers reciprocal NOE cross-peaks for improved accuracy during graph fitting, formally analysed as directed edges in the NG (see Material and Methods for more information). AMIGO also accepts a second list with paramagnetic NMR (pNMR)-based long-range spatial restraints like PCS or PRE, drastically increasing the number of correct assignments and the overall algorithm performance.

A high-resolution structural model or an ensemble of structures is also required to derive a structure graph. For more details on how structural ensembles are handled by AMIGO see SI pg. 12. Additionally, a list containing pre-assignments can be provided for further refinement of the assignment. Detailed information on how to prepare the files and configure AMIGO can be found in the supplementary information.

Output data

The primary output generated by AMIGO is a list of methyl assignments. The robustness of the assignments is evaluated by performing three full-protein replicas with different cut-off distance ranges used for the construction of the structure graphs, as will be explained later. AMIGO also provides a file containing the methyl walks together with graphical representations of NOE and structural graphs indicating the methyl walks. Methyl walk paths can be used to retrace the methyl walks proposed by AMIGO, allowing critical evaluation of the assignments (Fig. S1-S2). If long-range structural restraints are provided, AMIGO generates correlation plots of experimentally and theoretically calculated additional restraints based on the proposed assignment (Fig. S3). Deviations greater than one standard deviation usually indicate incorrect assignments due to, e.g., methyl groups in the structural model deviating from the solution conformation(s). While NOE peak intensities are not directly considered by AMIGO, they can be used by the user to further refine and disambiguate the assignment results.

Fig. 1Fig. 1The alternative text for this image may have been generated using AI.

Typical workflow for automatic methyl resonance assignment with AMIGO. The basis input data required by AMIGO comprises experimental methyl-methyl NOEs from a 3D or, preferably, 4D HMQC-NOESY-HMQC experiment, information on the labelling scheme used and a high-resolution structural model. For increased performance or further refinement of the assignment, long-range structural restraints and pre-assignments can also be included

Rather than treating the methyl assignments as a subgraph isomorphism problem (Pritišanac et al. 2017 a, 2019; Nerli et al. 2021), AMIGO uses a heuristic graph matching algorithm to obtain the best structural graph matching a given NMR data set. A fundamental characteristic of AMIGO is its ability to fully replicate the meticulous methyl walk strategy traditionally used by expert users. Consequently, users can backtrack through assignments with ease, enabling rapid methyl-specific evaluation. As will be explained in the next chapter, a series of weight functions allows users to improve the automated procedure iteratively, in the sense of a decision-making process.

Automated methyl walks

The first step in a (manual) methyl walk is the assignment of one or more methyl signals to be used as starting points for assignment propagation. These so-called seminal assignments are usually obtained by comparing methyl-methyl NOE patterns to a structural model. In some cases, site directed mutagenesis may be helpful to obtain a seminal assignment. Low-abundance amino acids often give rise to unique patterns of methyl-methyl NOE cross-peaks. Therefore, in a first step the low-abundance amino acids and their theoretical NOE cross-peak patterns must be identified from the structural model. It is of advantage to use a starting amino acid methyl group that is characterized by NOEs to as many different amino acid types as possible. Once identified, one of the cross-peaks within this initial NOE cross-peak pattern is tentatively assigned and used as a next starting point. If the assignment is correct the associated NOE pattern should match the theoretical pattern obtained from the structural model. This process is repeated until no more assignments are possible.

Mimicking this strategy, a typical methyl walk performed by AMIGO begins with the automated selection of a node from the NOE graph constructed from the NMR data to serve as seminal assignment. Rather than relying on a single starting point, AMIGO first ranks all methyl-group nodes according to the abundance of the respective amino acids from lowest to highest abundance, and according to the number of NOEs observed for each node from the highest to the lowest number of NOEs. Then, independent methyl walks are initiated using this ranking to select methyl groups for a seminal assignment. A node, υ, and all adjacent nodes are considered as a “closed neighbourhood subgraph”, denoted as N[υ]NOE with the NOE subindex indicating NOESY data. AMIGO uses a rarity score (Eq. 1) to rank methyl groups according to the rarity of their corresponding N[υ]NOE (i.e. from rarest to least rare). Therefore, the methyl group with the highest rarity score will be selected as seminal assignment for the first round. This is illustrated with a methionine (Met) signal M(1) in Fig. 2a, Step 1.

Fig. 2Fig. 2The alternative text for this image may have been generated using AI.

AMIGO emulates the human decision-making during a methyl-walk. Workflow of a typical protein methyl assignment performed by AMIGO

Then, AMIGO explores the structural model to identify the Met residue best matching the above-described requirements. Multiple closed neighbourhoods constructed from the structural model N[υ]struct with variable cut-off distances within a defined range, i.e., from 3 to 9 Å in steps of 0.2 Å, are created for each Met in the structural model. Each N[υ]struct is systematically compared to the experimental neighbourhood subgraph, N[υ]NOE, and the best match is selected. The best match corresponds to the node showing the highest similarity score, as defined in Eq. 4. This is exemplified in Fig. 2, Step 2, where M(1) is assigned to M321 based on the N[υ]struct constructed at 7.2 Å cut-off distance. Similarly, the structural graph is initialized with the N[υ]struct constructed at 7.2 Å around M321.

Once the seminal assignment has been performed, AMIGO advances the methyl walk by identifying the rarest NOE node adjacent to M321. The node from the NOE network showing the highest rarity score (Eq. 1) is then selected. In this example it is a leucine (Leu) L(2) in Fig. 2, Step 3. The corresponding N[υ]NOE is extracted and systematically compared against all possible N[υ]struct at variable cut-off distances around every leucine methyl group adjacent to M321. Once the best match is found (Eq. 4), the NOE node is assigned. In this example L(2) is assigned as L328. The structural graph is again expanded by comparing newly generated subgraphs N[υ]struct (in this example the cut-off turned out to be 6.4 Å) to the next rarest experimental subgraph N[υ]NOE. This process is continued until no further assignments are possible. The procedure can be regarded as a computer-guided methyl walk, accelerating the spectral assignment.

The extent of a methyl walk and the accuracy of the assignments strongly depend on the NOE signal selected as seminal assignment. Therefore, AMIGO performs an exhaustive search by completing as many methyl walks as signals in the methyl-TROSY spectrum are available, using a new methyl group as seminal assignment in each iteration. A total score is calculated for each methyl walk according to Eq. 8, being the sum of all individual similarity scores calculated during individual methyl walks. Once every signal in the methyl-TROSY spectrum has been sampled as seminal assignment, the methyl walk with the highest total score is selected. This procedure allows an extensive methyl coverage, therefore ensuring the closest-to-ground solution (Fig. 2). All methyl groups corresponding to the selected methyl walk (referred to as “cluster”; see Fig. 2a) are removed from the analysis, and a new NOE cross-peak from the pool of unassigned signals is selected according to Eq. 1 as seminal assignment for a new methyl walk. Therefore, a cluster represents a subgraph of the overall NOE graph containing the methyl groups assigned together within the final methyl walk of a given iteration. The assignment is then continued with the remaining methyl groups until no further methyl walks can be obtained. This procedure generates a set of methyl walks covering only methyl groups with NOE cross peaks, namely the complete protein methyl walk (CPMW, see Fig. 2a). The CPMW encompasses all generated assignments, the NOE and the structural graphs. Importantly, structural graphs are constructed using methyl-specific cut-off distances that best match the NOE data.

Confidence levels for the assignments are obtained from three consecutive CPMWs with increasing cut-off ranges used for the construction of the structural graphs (Fig. 2b). Consistent assignments throughout the three iterations are classified as “safe”, while variable assignments are labelled as “ambiguous”. AMIGO will not perform assignments if a minimum “safety” criterion is not met, labelling these methyl groups as “unassigned”. A detailed explanation of these exclusion criteria is provided in the SI, p. 19. Alternatively, the metrics “number of total assignments” and “number of perfect matches” between the NOE and the structure graphs can be used to identify the best CPMW.

Key to this strategy is the use of a highly efficient algorithm able to explore large graphs with relatively low computational cost. AMIGO uses an efficient matching algorithm based on the decomposition and reassembly of complex graphs into their smallest subunits (i.e. graph building blocks or GBB) for improved performance.

The graph building block concept

In a first step, AMIGO converts the list of NOEs into a graph, yielding what will be referred to as “NOE graph” (Fig. 3a). In this graph, resonances corresponding to NOE auto-peaks (also identified in the methyl-TROSY spectrum) are depicted as nodes. Each node is characterized by a primary attribute, namely the amino acid type, which can be supplemented with additional experimental constraints such as PRE or PCS. For clarity, nodes are color-coded according to the amino acid type. The connections between nodes or the edges represent experimental NOE cross-peaks observed in multidimensional NOESY datasets. Arrows originate from the auto-peak and point towards the NOE cross-peaks. Therefore, bidirectional edges reflect a higher level of confidence than unidirectional edges.

Fig. 3Fig. 3The alternative text for this image may have been generated using AI.

Fast graph calculation is achieved by decomposing structure and NOE graphs into GBBs. (a) Creation of NOE graphs. The resonances and corresponding nodes are color-coded and labelled with a capital letter according to their amino acid type and a lower-case letter as an identifier. (b) Creation of a structure-based graph. For each methyl group in the structural model, there is a node in the structure-based graph. Nodes are connected by edges if the methyl groups are close enough in space to allow for methyl-methyl NOEs. Factors other than the inter-nuclear distance can modulate the intensity of NOE signals. Therefore, the cut-off distances used to construct the graphs can be adjusted. Since cut-off distances are individually adjusted for each methyl group, connections between methyl groups in the graph become asymmetrical, symbolized by unidirectional edges. Methyl groups and corresponding nodes are color-coded and labelled with a capital letter according to their amino acid type and an integer as internal an identifier. It should be noted that the integer does not necessarily correspond to the amino acid position in the protein sequence. (c) Example for a set of structure graph-building-blocks (SGBBs). Each SGBB is characterized by an “active node” and a specific cut-off distance that is applied to the corresponding methyl group in the structural model. Note that for the example shown in (b), four methyl groups and two cut-off distances result in 2⁴ possible structure-based graphs. Assigning two arbitrary cut-off distances, d1 and d2 to every node leaves only 2 × 4 SGBBs that are required to represent the complete set of structure-based graphs. (d) The number of possible structural graphs as a function of the number of cut-off distances n. Three cases are considered for two (m = 2, black), three (m = 3, blue) and four (m = 4, grey) methyl groups. (e) The NOE graph can also be broken down into building blocks. All nodes are color-coded according to the amino acid as follows: pink for isoleucine (I), green for alanine (A), and blue for methionine (M)

In contrast, creating a structure graph based on a given structural model requires further considerations. Methyl groups present in the structural model are considered as nodes (see Fig. 3b). As in NOE graphs, each node is characterized by attributes, such as the amino acid type or theoretical restraints (for example, theoretical PRE or PCS). Edges between nodes depend on cut-off distances that correspond to observed NOEs. Because the magnitude of NOEs depends on many factors besides the internuclear distance, (Neuhaus 2011) specific cut-off distances may be required for each individual methyl group. By sampling various sets of cut-off distances for each methyl group, multiple structure graphs can be created and compared with the NOE graph. This variability introduces the challenge of high-dimensional data. For example, consider two possible cut-off distances, d1 and d2, for four methyl groups arranged as in Fig. 3b. Each methyl group can be correlated to either d1 or d2, resulting in 24=16 possible combinations, each representing a distinct structure graph. In general, for n cut-off distances and m methyl groups, nm different structure graphs can be created. As the number of methyl groups or cut-off distances increases, this approach quickly generates a vast number of structure graphs, leading to unmanageable computational times. For instance, for a protein with 100 methyl groups and assuming 5 possible cut-off distances, extrapolation of the example would result in 5100 (approx. 7.89 × 1069) possible structure graphs, a number well beyond the computational capabilities of standard computers.

Here, we introduce a simplification strategy based on graph building blocks (GBBs) to alleviate the computational challenges posed by large numbers of methyl groups or distance cut-offs (Fig. 3c). Each GBB is identified by a methyl group that serves as an “active node.” In the case of structure-based graphs, building blocks are termed SGBBs, and comprise the active node along with all methyl groups within a specific cut-off distance serving as the neighbouring nodes. As previously discussed, while nm structure graphs might be created when sampling all combinations of cut-off distances, the number of possible SGBBs defining any given N[υ]struct is reduced to n x m, hence scaling linearly with the number of methyl groups (Fig. 3d). Moreover, SGBBs can be combined to reconstruct any structure-based graph from the nm set of structure graphs (see Fig. S4). This concept of GBBs naturally extends to corresponding NOE graph building blocks (NGBBs) obtained directly from NMR experiments. For example, in 4D HMQC-NOESY-HMQC experiments, each F3(13C)-F4(1H) plane contains an auto-peak that serves as the active node (Fig. 3a and e). The cross-peaks in such planes result from methyl groups that are close in space, representing the remaining nodes of a particular NGBB. Directed edges in an NGBB correspond to inter-methyl NOEs.

Like SGBBs, NGBBs can be combined to form an overall NOE graph that ideally matches only one of all possible structure-based graphs. Comparison between NGBBs and SGBBs is much more efficient than working with complete graphs, allowing an efficient construction of methyl walks. For a detailed, stepwise explanation on how methyl-walks are constructed from GBBs see SI, p. 14, Figs. S4-S11. In summary, the GBB principle encapsulates the intuitive methyl walk concept while effectively reducing the sampling space.

Benchmarking AMIGO using stereospecifically labelled proteins

For benchmarking, AMIGO has applied to a set of 11 proteins of variable sizes, topologies and oligomeric states covering a range of stereospecific and non-stereospecific isotopic labelling schemes. The collection spans from small to very large NOE networks covering a large range of methyl densities and NOE connectivity degrees, as seen in Fig. 4a (for more details see last chapter). Importantly, AMIGO relies on NOEs for the construction of methyl walks. Therefore, although AMIGO benefits from the inclusion of long-range structural restraints, performing assignments solely on the basis of PCS or PRE data is not possible. Since we prioritise robustness over assignment extension, all methyl signals showing no NOEs were not included in the calculations. Consequently, all reported percentages refer to methyl signals exhibiting at least one NOE cross-peak, unless stated otherwise.

Table 1 Methyl resonance assignment statistics

In a first step, AMIGO has been used for the assignment of large monomeric and dimeric proteins that had been studied in our laboratory and that have been labelled with stereospecific, highly dense MILproSVproSA or MILproSVproSAT labelling schemes. These proteins are the dimeric form of the protruding domain (P-dimer) of the human Norovirus capsid protein (HuNoV Pd) (Müller-Hermes et al. 2020), the corresponding P-dimer of the murine Norovirus capsid protein (MNV Pd) (Maass et al. 2022 b), the UDP-glucose pyrophosphorylase enzyme from Leishmania major (LmUGP) (Mühlberg et al. 2022 b), and the dimeric soluble domain from human blood group B galactosyltransferase B (GTB) (Flügge and Peters 2018) (Table 1 and S1). Assignments were validated by manual assignments and used as ground-truth for benchmarking.

Fig. 4Fig. 4The alternative text for this image may have been generated using AI.

AMIGO performance. (a) Number of methyl-methyl NOEs available for each protein. Note that LmUGP was assigned using NOEs obtained in the open (gray) and closed conformations (dotted bar). (b) AMIGO performance as a function of the total number of labelled methyl groups. Proteins labelled with prochiral Leu and Val labelling (proS) are highlighted in a, b) with a dashed square. (c) Representative example showing the performance of AMIGO as a function of methyl resonances assigned per cluster (blue) and the percentage of methyl groups explored (orange) during the assignment of HuNoV Pd. (d) Methyl groups consistently assigned to the same amino acid in three consecutive runs by AMIGO are indicated as colored spheres, while ambiguously assigned groups are depicted in white. The colors indicate the amino-acid types of the assigned groups, with non-assigned and incorrectly assigned groups indicated in black and red, respectively. (e) Histograms showing methyl-specific cut-off distances selected by AMIGO for the construction of the structure-based graphs. (f) Representation of methyl-specific cut-off distances on crystal structures. Color-code corresponds to 3 Å (white) to 9 Å (red)

The following experimental data were used for the assignment: (i) methyl-TROSY spectral peak lists generated by automated spectral peak picking in CCPNMR (Skinner et al. 2016) including amino acid type classification. (ii) NOESY peak lists generated by manual analysis of 4D HMQC-NOESY-HMQC (4D NOESY) spectra. A single 4D NOESY spectrum with optimized mixing times for maximal NOE signal count was acquired per sample. (iii) MNV Pd, GTB and LmUGP exhibit natural metal-ion binding sites permissive to lanthanoid ions. Therefore, PCSs were extracted from methyl-TROSY spectra and supplied to AMIGO as long-range spatial restraints. The corresponding set of theoretically calculated PCSs was produced using Paramagpy. (Orton et al. 2020) (iv) High-resolution crystal structures of the proteins were obtained from the Protein Data Bank with accession codes 4 × 06 for HuNoV Pd, 6E47 for MNV Pd, 2OEF and 4M2A for the open and closed conformations of LmUGP, and 2RJ1 for GTB. The N-, C-termini and the so-called internal loop from GTB as well as the C-terminus from LmUGP lacked electron density in the crystal structures and were modelled using ModLoop (Fiser and Sali 2003), as previously described in their corresponding publications.

When applied to the viral protein HuNoV Pd (~ 72 kDa), AMIGO correctly assigned 94% of methyl resonances, with no erroneous or ambiguous assignments (Fig. 4b; Table 1). A closer inspection of AMIGO’s assignment strategy reveals that most assignments are concentrated in the top-ranking graph matches (Fig. 4c). Specifically, an analysis of the number of assignments per cluster shows that over 75% of all assignments originate from the first four methyl walk clusters. This pattern is consistently observed across the benchmark set, indicating that focusing on the top-ranking clusters is sufficient to recover most correct assignments, therefore substantially reducing computational time, particularly for large systems.

Fig. 5Fig. 5The alternative text for this image may have been generated using AI.

Detail of assignment of methyl groups in the GTB glycan binding pocket. (a) Sixteen methyl groups near the paramagnetic metal were not assigned in the initial AMIGO run. (b) Setting assigned methyl groups as pre-assignments and re-running AMIGO without PCS allowed to assign 12 out of 16 additional methyl groups. The crystal structure corresponds to 2RJ1, with the C-terminus and the internal loop modelled with ModLoop shown in violet

In the case of MNV Pd (~ 70 kDa), AMIGO assigned 69% of methyl resonances based solely on NOE cross-peaks. MNV Pd contains one metal-binding site per monomer with high affinity for lanthanoids, enabling PCS extraction. Inclusion of PCS extended the assignment to 78% (Fig. 4b; Table 1). LmUGP (monomeric, ~ 62 kDa) represents an interesting case study, as full protein assignment was only achieved through the combination of two 4D NOESY datasets acquired from different protein conformations induced by uridine diphosphate glucose (UDP-Glc) binding (Mühlberg et al. 2022 b). Using either the apo (open) or the UDP-Glc bound (closed) conformation alone enabled assignment of 38% and 66% of methyl signals, respectively. The larger number of assignments in the closed form is explained by the adoption of a more compact structure that facilitates the observation of NOE signals. PCSs could only be obtained from the closed conformation, as lanthanoids coordinate the oxygen atoms of the pyrophosphate group in UDP-Glc. Inclusion of PCS expanded the number of assignments to 81%. Using these assignments as restraints and running again AMIGO on the apo form expanded the number of correct assignments to 86% (Fig. 4b; Table 1).

Finally, we challenged AMIGO with the homodimeric enzyme GTB (~ 70 kDa). The use of NOEs alone yielded 34% assignments, indicating the necessity of additional long-range structural information. GTB requires Mn²⁺ as a cofactor, which can be substituted against lanthanoids to induce PCS. Inclusion of three PCS datasets with varying tensor orientations and magnitudes increased the assignment to 77%. Notably, 16 methyl resonances near the metal-binding site remained unassigned due to internal safeguards in AMIGO that prevent assignments when measured and predicted PCS strongly deviate (Fig. 5a). While useful in previous cases, these safeguards were detrimental here, likely due to a local conformational deviation between the structures observed in solution and in the crystal structure. To resolve this, all previously assigned methyl resonances were fixed as restraints, and AMIGO was re-run without PCS data. This enabled the assignment of 12 out of the 16 previously unassigned methyl signals, increasing the total count to 90% (Figs. 4b and 5b; Table 1). Detailed information about these safeguards and troubleshooting can be found in SI, p. 19.

Applying AMIGO to assign non-stereospecifically labelled proteins

It is important to emphasize that AMIGO was not originally designed to handle non-stereospecifically labelled datasets. In particular, AMIGO cannot differentiate between proS and proR methyl resonances of Leu and Val residues. Consequently, when crystal structures lack the resolution to distinguish between prochiral methyl groups, AMIGO may generate ambiguous or even incorrect assignments. To evaluate this limitation, we challenged AMIGO with a set of seven proteins labelled using non-stereospecific, low-density ILV or ILVA labelling schemes. More details on this dataset originally published with MAGMA (Pritišanac et al. 2017 b) can be found in Table 1 and in Table S1.

Despite this limitation, AMIGO achieved assignment rates in methyl groups showing NOEs of 100% for Ubiquitin (~ 9 kDa), 86% for Msrb (~ 17 kDa), 77% for EIN (~ 28 kDa), 55% for α7α7 (~ 358 kDa), 65% for ATCase (~ 34 kDa), 67% for MBP (~ 41 kDa), and 63% for MSG (~ 81 kDa) (Table 1). Notably, AMIGO preferentially returned “ambiguous” rather than incorrect assignments when chirality could not be resolved from the available data. Detailed analysis revealed that 83% of erroneous assignments involved Leu or Val methyl groups, with the remaining 17% corresponding to isolated Ile residues exhibiting few NOEs to Leu or Val methyls. These results indicate that while AMIGO is not optimal for non-stereospecifically labelled systems, it still offers value as a complementary tool to guide manual curation or validation of assignments obtained via alternative algorithms.

To summarize, AMIGO demonstrates excellent performance in the assignment of stereospecifically labelled methyl groups in large proteins, particularly when combined with complementary long-range structural information such as PCS. This integrative approach is especially valuable for challenging systems where standard approaches may fail. No erroneous assignments were made across any of the stereospecifically labelled proteins when all input parameters were combined, underscoring the robustness of the algorithm. Our results also emphasize the importance of AMIGO’s adaptive algorithm, which allows the use of variable cut-off distances during structure graph construction. This is reflected in Fig. 4e, f, which displays the histogram and topological distribution of methyl-specific cut-off distances selected automatically by AMIGO for optimal graph matching. Clearly, relying on a fixed, global cut-off distance is a simplification that may lead to systematic assignment errors.

Comparison with previous methods - Benchmarking

AMIGO was benchmarked against MAP-XSII, MAGMA, and FLAMEnGO2.0 (Table 1). Comparison with MAUS was not possible, as it requires two NOESY datasets with short and long mixing times to distinguish proS and proR methyl resonances of Leu and Val residues, which were unavailable for our examples. Similarly, MethylFLYA relies on dual NOESY datasets and performs peak picking directly from the spectra, substantially increasing the number of NOESY cross-peaks used for graph construction and thereby preventing a fair comparison focused solely on graph-matching performance. MAGIC was also not included, as it requires NOE signal intensities, which were not available in our datasets.

MAP-XSII utilizes Metropolis Monte Carlo sampling to optimize scoring functions, accepts additional long-range structural constraints (e.g., PCS and PRE), and integrates chemical shift predictions to enhance assignment coverage, albeit at an increased risk of erroneous assignments. In our benchmarking, AMIGO consistently outperformed MAP-XSII on ten out of eleven proteins tested, achieving higher numbers of correct assignments with fewer errors. A similar performance advantage was observed compared to FLAMEnGO2.0. The only exception was α₇α₇, where AMIGO conservatively classified many assignments as ambiguous due to limitations arising from non-stereospecifically labelled data.

The comparison between AMIGO and MAGMA is particularly insightful, as both algorithms depend solely on single NOESY dataset without requiring NOE intensities or chemical shift predictions. For stereospecifically labelled proteins, MAGMA provided comparable results to AMIGO for HuNoV Pd, but it assigned only half the methyl resonances for MNV Pd. Moreover, MAGMA failed to produce results for GTB and LmUGP even after 48 h of runtime on an Intel® Core™ i9-9900 K Processor (3.6 GHz). Regarding non-stereospecifically labelled datasets, AMIGO yielded more correct assignments than MAGMA, although accompanied by some misassignments. Importantly, AMIGO mitigates this trade-off by explicitly reporting ambiguous assignments and providing transparent methyl walk traces for validation purposes.

Taken together, these results establish a solid foundation for exploring how AMIGO’s efficiency and accuracy are influenced by varying the number of methyl groups and labelling schemes, as discussed in the following section.

Performance of AMIGO using experimental and simulated NOE networks from PDB structures

The influence of protein morphology and isotopic labelling schemes on the performance of AMIGO was investigated through extensive simulations. Eight proteins with molecular weights ranging from approximately 28 kDa to 118 kDa were examined, each under four distinct isotopic labelling schemes: MILproSVproS, MILproSVproSA, MILproSVproSAT and MILproRVproRAT, resulting in a total of 32 simulated NOE networks (Table S2). Simulated data graphs were constructed by systematic random removal of edges from the corresponding structure graphs, emulating realistic experimental conditions and resulting in datasets with densities between 1.75 and 2.1 observed NOEs per methyl resonance, as previously described (Nerli et al. 2021) (Fig. 6a, b).

Fig. 6Fig. 6The alternative text for this image may have been generated using AI.

Performance of AMIGO in experimental and synthetic NOE networks as a function of labelling density. (a) Density distribution (defined as number of amino acids/methyl groups) of 10 experimental (orange) and 32 synthetic (blue) NOE graphs used to test AMIGO performance. Labelling schemes explored in the synthetic data are: MILproSVproS (pale blue diamonds), MILproSVproSA (blue squares), MILproSVproSAT and MILproRVproRAT (dark blue triangles). Dashed lines indicate average density distributions for the corresponding labelling schemes. (b) Bar plot showing the distribution of degree connectivity (defined as 2 x number of edges/number of nodes) across experimental and synthetic NOE graphs. (c) Scatter plot displaying the computation time (in minutes) required by AMIGO to perform three exhaustive enumerations of possible methyl assignments, including the optimization pre-runs. The assignment of synthetic MILproR/SVproR/SAT NOE networks from α7α7 required 33 h and are not included in this comparison. (d) Detailed breakdown of run times for synthetic NOE graphs as a function of the methyl labelling scheme. (e) Percentage of correct assignments obtained for methyl groups showing NOEs in synthetic networks as a function of the labelling scheme

The results demonstrated that AMIGO efficiently manages the computational complexity inherent in graph matching under these realistic conditions, providing accurate and reliable methyl assignments within reasonable computation times, even for large proteins employing dense labelling schemes. Although increased labelling density moderately increased computational demands, AMIGO completed most assignments within 5 h using a single AMD EPYC 7302P 16-core processor with 1497.962 MHz CPU. Specifically, simulations consistently yielded high assignment rates (82–97%) across all tested isotopic labelling schemes, including proteins containing up to 245 methyl groups (Fig. 6c-e). Additionally, complete assignments for ten proteins using experimental data were accomplished in less than 24 h, further highlighting the computational practicality of the AMIGO approach.

Comments (0)

No login
gif