Emerging XBB variants in Vietnam show high affinity for hACE2

Since the beginning of 2022, the Omicron variant of the SARS-CoV-2 virus responsible for the COVID-19 pandemic has been the dominant variant. Numerous subvariants of Omicron have been identified, including BA.2.75, BA.4, and BA.5, which are classified as Variants of Concern (VOCs) due to their potential to cause severe illness. Additionally, an emerging variant called XBB has recently caused outbreaks and is believed to be highly lethal. To better understand the XBB variant, we compared the variation in the S protein to that of the other variants. Our initial findings indicate that the XBB variant has spread in Vietnam and carries unique nucleotide changes in the S gene. Specifically, we observed two mutations, G22317T (G252V) and C23123T (P521S), that have resulted in a new subvariant called XBB.1 on the phylogenetic tree. Our analysis also suggests that the XBB variant has a high affinity for hACE2, as indicated by an increase in the interface's number of residues and van der Waals energy. We found that XBB has conserved mutations in RBD that enhance its binding affinity for hACE2. In this report, we noted the mutation V83A, H146Q, and G252V in the NTD increased the binding free energy of the XBB spike protein in the complex with hACE2.


Introduction
In late 2019, a previously unknown virus causing respiratory symptoms surfaced in Wuhan, Hubei Province, China. Molecular analysis identified it as a novel coronavirus, closely related to the one that caused the 2002-2003 SARS outbreak and was named SARS-CoV-2. Possessing distinct advantages over its predecessor, SARS-CoV-2 has rapidly spread worldwide and triggered the COVID-19 pandemic. A defining characteristic of SARS-CoV-2 is its high mutation rate and adaptability in exploiting host cells. Its single-stranded RNA genome encodes four structural genes: nucleocapsid protein (N), spike protein (S), envelope protein (E), and matrix protein (M) [1,2]. Notably, the S gene, which codes for the spike protein, undergoes positive selection, leading to rapid evolution and diversification of the virus during transmission [3]. In contrast, the other structural genes exhibit lower mutation rates [4].
The rapid mutation rate of the S gene can be explained by its vital role in the life cycle of SARS-CoV-2. The S protein not only directly interacts with the hACE2 receptor on the human surface for viral membrane fusion but is also represented as an antigenic factor. Consequently, the S protein was a gold target for vaccine development, monoclonal antibodies, and therapeutic drug [5][6][7]. Studies on the impact of amino acid mutations on the protein's interaction efficiency with hACE2 and antibodies are increasingly famous [8]. The significance of the S gene and protein is understood, which has now classified new variants of SARS-CoV-2 based on the mutation information on the S protein. PANGO Lineages did the same by constructing a secondary database to evaluate the nucleotide changes of the S gene [9]. This shows a trend to assess changes in the S gene and S protein that are essential in preventing the COVID-19 pandemic. There are currently three main groups: VUM (Variants Under Monitoring), VOI (Variants of Interesting), and VOC (Variants of Concern). In particular, the VOC variants are considered the most dangerous and successful variants of SARS-CoV-2 worldwide. Indeed, the first VOC variant, Alpha, which was more infectious than the wildtype (WT) [10], caused intense outbreaks in the UK and several other countries. The following VOC variant is Delta, with characteristic mutations on the S protein that has rapidly become the most dominant variant in the world and caused the deaths of millions of people worldwide, with the United States and India two countries being heavily affected. The success of Delta is attributed to the mutations in the RBD (Receptor Binding Domain) region K417N, L452R, and T478K acting as escape antibodies and increasing binding affinity [11][12][13][14], and in the FCS (Furin Cleavage Site) region, P681R increased the cleavage effectiveness [15,16]. It is shown that mutations appearing on the S protein contribute directly to the infectious ability of SARS-CoV-2.
In late 2021, a new VOC variant called Omicron first appeared in South Africa, carrying over 30 amino acid mutations in the S protein [17]. Shortly after, many studies showed that Omicron was able to escape from mAbs and induce reinfection [18][19][20]. Studies have also demonstrated that Omicron has a better affinity for the hACE2 receptor than Delta to transmit more quickly. As a result, Omicron quickly replaced Delta to become the popular variant worldwide. With its rapid infectious rate and superior mutagenicity, Omicron rapidly proliferated many novel variants, of which BA.

Data collection
The whole genomes of XBB variants were extracted from the GISAID database. The S sequences of Alpha and Delta were selected from our previous publication [21], and the first sequences of BA.2.75, BA.4, and BA.5 variants that appeared in Vietnam were taken from GISAID (accession number: EPI_ISL_16527063, EPI_ISL_13729175, EPI_ISL_15896945 respectively). The S gene was split by the MAFFT ver 2.0 [22] server based on the Wuhan-hu-1 sequence in GenBank (accession number: NC_045512.2). After that, the S gene sequences were performed multiple sequence alignment by the ClustalW program in MEGA11 software [23].

Phylodynamic analysis and haplotype network
Continuous, we analyze the phylodynamic of the S gene based on the Maximum-Likelihood (ML) phylogenetic tree in TreeTime (https://treetime.biozentrum.unibas.ch/) [24] with the GTR model and set up mutation rates at 0,0008/site/year (Base on Nextstrain validation). The cross-validation ML tree was built by IQtree ver 2.2.0 [25] with the GTR model and 10.000 bootstrap values. We used POPART [26] software to identify nucleotide mutation to construct the haplotype network of S gene sequences based on Median Joining Network methods.

Homology modelling and molecular docking
The S gene sequences were translated into protein sequences by MEGA11 software. Then, we used these protein sequences to build the homology three-dimensional (3D) S protein structure in the SWISS-MODELLING server [27] with the S protein (chain A) of crystal Spike -hACE2 complex in PDB (PDB ID: 7T9K) as the template structure. The quality of homology structures were estimated by the Structure Assessment tool , ProSA-web server [28] and ERRAT on the SAVES ver 6.0 package [29].
After building the homology structure, we used these protein models to dock with hACE2 receptor of template structure by HADDOCK (High Ambiguity Driven protein-protein DOCKing) server [30]. To advance the reliability of the docking process, we used Optimize run for bioinformatics and ambiguous restraints (AIRs) features of HADDOCK server. We also used the HDOCK server [31] with the hybrid algorithm docking as the cross-validation of the Spike-hACE2 complex.

Estimating binding affinity and binding free energy
The binding free energy ( ) and binding affinity ( ∆ ) are the primary indexes to determine the effectiveness of protein-protein interactions. To calculate the of spike protein with hACE2 complex (Spike-hACE2 complex), we submitted the complex into the Hawkdock server for examination with Molecular Mechanics Generalized Born Surface Area (MM/GBSA) approach which is the high impact method for protein-protein interactions [32,33]. Hawkdock server minimized the complex in ff02 force fields for 5000 steps, and the distance for van der Waals interactions is 12 Å. The total has indicated details in equation (1). Furthermore, to give more information about the complex, PRODIGY server was used to calculate binding affinity [34,35].

Protein-protein interface network bonding
After molecular docking, these complex structures were examined by the PDBsum server to identify the network bonding of the interface. Then, to determine the positive effect of a mutation that appeared in S protein, we used equals that were calculated in equation (2) which was suggested in the Lei Xu et al. publication [36]. ∆∆ is below 0, meaning the mutation enhances the binding affinity of the spike with hACE, and the opposite ∆∆E is higher than 0.
Finally, we used open-source Pymol software to visualize the complex and estimate the electrostatics of S protein based on the APBS plugin.

Examinaion nucleotide changes of the S gene of XBB variants
The analysis of ten S gene sequences of the XBB variants reveals clustering of XBB and XBB.1 variants through the G22317T (G252V) and C23123T (P521S) mutations, with bootstrap values of 72 and 87, respectively ( Figure 1A). We have also observed high substitution rates of XBB variants since their emergence in Vietnam in November 2022. The haplotype network ( Figure 1B) demonstrates that the G22317T mutation in the S gene led to the cleavage of the XBB variant to XBB.1, followed by further mutations at C23123T, forming a new haplotype group in December 2022. These results indicate that the XBB and XBB.1 variants have the potential to infect the Vietnamese population. The high mutation rate recorded underscores the need to monitor the frequency of the XBB and XBB.1 variants in Vietnam.
To assess the impact of S gene mutations on protein sequences, we compared the mutations characteristic of the XBB variant with the variants identified by the WHO as Variants of Concern (VOCs) currently present in Vietnam, including BA.2.75, BA.4, and BA.5. We found that XBB has five specific mutations in the N-Terminal Domain (NTD) region, namely T21810C (V83A), C22000A (H146Q) C22109G (Q183E), G22317T (G252V) and T22200A (V213E). In the Receptor-Binding Domain (RBD) region, we also note three mutations G22599A (R346T), C22664A (L368I), and T23019C (F486S) (Figure 2), as the specific mutation which split the XBB from BA.2 variant [37]. The XBB variant exhibits more amino acid mutations, with 42 mutations. In the S1 subunit, the NTD region recorded 9 mutations and three deletion mutations, the RBD region had 23 mutations, and the fusion peptide (FP) and connecting peptide (CP) subunits contained 5 mutations, whereas the S2 subunit recorded 5 mutations. The positions of spike protein mutations are illustrated in Figure 2.

Homology spike protein models and protein-protein docking
To generate 3D protein structures, we used the SWISS-MODELLING server to build the homology S protein structures.
These were subsequently evaluated for validation using the Structure Assessment tool, ProSA-web server, and ERRAT of the SAVES ver 6.0 package. Analysis of the homology models revealed that all models had more than 90% acceptable positions on the Ramachandran plot, with an overall quality factor of over 90% on ERRAT. The homology structures were then superimposed to check the angle of the C-alpha of the spike protein with the template structure. The RMSD (Root-mean-square deviation) of the homology structures was found to be 0.01 -0.365 (Å), indicating successful amino acid mutation of the spike protein. All homology models passed our evaluation criteria and are suitable for downstream analysis. The detailed index of homology models is displayed in Table 1. To investigate the interaction between S protein and hACE2, we performed molecular docking on the HADDOCK server, with the HDOCK server used for cross-validation. We selected the best docking model generated by HADDOCK based on the HADDOCK score, while the HDOCK Confidence was used to assess the spike protein's ability to bind to the hACE2 receptor. A HDOCK Confidence value above 0.7 indicates direct interaction between the two proteins, while a value below 0.5 implies that the two proteins cannot bind. Our molecular docking results revealed that the complexes mimicking the binding of the spike protein to hACE2 exhibited HDOCK Confidence values exceeding 0.9, which were even better than those of the WT variant. Moreover, the fluctuations of the simulation Spike-hACE2 complex were below 2.0 (Å). However, the RMSD coefficient established by HADDOCK indicated that the association simulations of Omicron subvariants showed higher variability than the WT variant, possibly due to the large number of protein mutations that affect the protein model's stability during simulation. Overall, our results suggest that the interaction simulations of Spike-hACE2 complex were successfully built. These interaction models will be utilized to evaluate the efficiency of protein-protein interaction. The HADDOCK scores, RMSD, and HDOCK Confidence parameters are presented in Table 2.

Estimate binding free energy and binding affinity of Spike-hACE2 complex
To assess the efficacy of the Spike-hACE2 complex binding, we utilized the binding free energy ( ) and binding affinity ( ∆ ) parameters. We noted BA.  [41]. Our results show that the XBB variant appearing in Vietnam has solid infectious potential and a better affinity for the hACE2 receptor.

Protein-protein network bonding
To evaluate the changes affecting the protein structure in more detail, we analyzed the S protein's interaction surface with hACE2 (interface). Wse analyzed the charge distribution on the S protein surface, showing that the variants enhanced the positive charge expression on the interface (Figure 2). This explains why the entire evaluated variant strongly enhanced ∆ of Spike-hACE2 complex. Here, in terms of the interface of the complex, we find that the surface of the interface has a slight change in favorable charge distribution when compared with BA.2.75, BA.5, and BA.5, which is similar to the decrease in the electrostatics energy of XBB (Table 3) when compared to the other variants. In addition, looking at the amino acids of S protein which are directly interacting with hACE2, we found that the XBB variant has the highest number of residues participating in the interface, 27 residues. In contrast, the variants BA.2.75 had 21 residues, BA.4 had 23 residues, BA.5 had 25 residues, and WT had 21 residues, respectively ( Table 3). The enhancement of amino acids involved in the interaction directly affects the area of the interface. Therefore, the XBB variant has the largest interface area compared to the remaining variants in the study ( Table 3). The enhanced expression of residues in the interface also increases the binding of the spike protein to hACE2. Accordingly, the XBB variant has the most significant number non-bonded contacts at 184 (   The analyzed free energy changes per residue found that the XBB variant preserved most mutations with ∆∆ < 0. In the NTD, there were three new mutations, V83A, H146Q and G252V, which enhance of the Spike-hACE2 complex (Figure 4), showing that these mutations affect spike adhesion to hACE2. Interestingly, the experimental study of Tamura et al. (2022) [42] found that the V83A mutation enhances the transmissibility and fusion capacity of XBB 3.3 times, which is consistent with our prediction based on the V83A mutation. We also checked another new mutation in NTD of XBB, G22317T (G252V), that increased of Spike-hACE2 complex, which was not determined in previous studies [37] [42] [43]. In the RBD region, the XBB variant in Vietnam recorded three new mutations, R346T, L368I, and P521S, in which L368I and P521S were recorded with ∆∆ = 0. R346T and V445P has an ∆∆ > 0, which implies a reduction in the binding free energy of the Spike-hACE2 complex. The reduced binding free energy of R346T and V445P may be due to the contribution of these two mutations to the ability to evade neutralizing antibodies [44] [45]. Other matching types in the F486V and F486S mutations suggested that BA.5 and XBB trade-off immune clearance by decreasing affinity for hACE2 [46] [42]. Likewise, the K417N mutation in variants BA.2.75 and BA.4 has ∆∆ > 0, but the main effect of this mutation is to help the virus escape mAbs [47]. The decrease of K417N because of the substitution of Lys for Asn broke a salt bridge and a hydrogen bond [38]. But our results show that the K417N mutants of BA.5 and XBB are enhanced compared with BA.2 and BA.2.75. This may be due to the enhancement of the interface surface (Table 3), enabling K417N to form non-bonded contacts with Asp30 of hACE2 ( Figure 4). In particular, we further noted the appearance of a new hydrogen bond of XBB compared with BA.5 at the Pro499 position ( Figure 3). Accordingly, the N440K mutation changed Asn440 residues with the uncharged side chain to Lys440 residues with a positive charge to form hydrogen bonds and salt bridge interactions with Glu329 of hACE2 ( Figure 3). We also noted the Q498R and N501Y as mutations that positively affected the interaction efficiency of spike protein with hACE2 previously mentioned in other studies (Figure 3). The remaining XBB variants and BA.2.75, BA.4, and BA.5 have preserved favorable mutations in the FCS (D614G and N679K) and S2 subunit (N764K, D796Y, and N969K).

Figure 5
The ∆∆ per residues of spike protein (This results present for the positive mutation for spike protein with the ∆∆ < 0 in red colors. The ∆∆ means negative effect was colored in blue. The ∆∆ = 0 shows that no changes in the binding free energy were displayed in brown color. And the mutations are not present in the variants were painted by grey color.)
In this study, we highlight the mutation in NTD, V83A, H146Q, and G252V (new mutation) could increase the binding free energy for spike protein in complex with hACE2. Our results also showed that the appearance of new mutations in the S protein of the XBB variant enhanced the contact area of the protein with hACE. The result is an increase in adhesion efficiency through an increase in van der Waals interaction. Although our study has many limitations due to the approach only using various computational tools, these initial results will provide helpful information for further experimental investigations.