Quantitative structure-activity relationship study on the MMP-13 inhibitory activity of fused pyrimidine derivatives possessing a 1,2,4-Triazol-3-yl group as a ZBG

QSAR study has been carried out on the MMP-13 inhibitory activity of fused pyrimidine derivatives possessing a1,2,4triazol-3-yl group as a ZBG in 0Dto 2D-Dragon descriptors. The derived QSAR models have revealed that the number of Sulfur atoms (descriptor nS), Balaban mean square distance index (descriptor MSD), molecular electrotopological variation (descriptor DELS), structural information content index of neighborhood symmetry of 2nd and 3rd order (descriptors SIC2 and SIC3), average valence connectivity index chi-4 (descriptor X4Av) in addition to 1st order Galvez topological charge index (descriptor JGI1) and global topological charge index (descriptor JGT) played a pivotal role in rationalization of MMP-13 inhibition activity of titled compounds. Atomic properties such as mass and volume in terms of atomic properties weighted descriptors MATS5m and MATS3v, and certain atom centred fragments such as CH2RX (descriptor C-006), X--CX--X (descriptor C-044), H attached to heteroatom (descriptor H-050) and H attached to C0(sp3) with 1X attached to next C (descriptor H-052) are also predominant to explain MMP-13 inhibition actions of fused pyrimidines. PLS analysis has also corroborated the dominance of CP-MLR identified descriptors. Applicability domain analysis revealed that the suggested model matches the high-quality parameters with good fitting power and the capability of assessing external data and all of the compounds was within the applicability domain of the proposed model and were evaluated correctly.


Introduction
More than 30 million patients worldwide are affected from osteoarthritis (OA) which is the most common forms of arthritis [1]. The pain and reduced mobility in affected joints due to progressive cartilage damage is the principal morphological characteristic of OA. The lessening of pain and inhibition of inflammation is the only recommended pharmaceutical therapies [2][3][4][5] which include oral treatment with acetaminophen, nonsteroidal anti-inflammatory drugs (NSAIDs), or cyclooxygenase-2 (COX-2) inhibitors and intra-articular injections of hyaluronic acid or corticosteroids. The withdrawal of someCOX-2 selective inhibitors (rofecoxib and valdecoxib) [6,7] rendered an unmet medical need for safe oral disease-modifying osteoarthritis drugs to prevent, slow down, or reverse any advanced cartilage destruction [8,9].
The matrix metalloproteinases (MMPs) are structurally related zinc-dependent endopeptidases which degrade varied extracellular matrix. Among MMPs the hydrolysis of type II collagen, the main structural component of the cartilage matrix, is specifically catalyzed by MMP-13 (collagenase-3) [10,11] and the protein is resistant to most proteases. MMP-13 has high levels of expression at OA chondrocytes as compared to normal chondrocytes [12]. MMP-13 plays a crucial role in the destruction of articular cartilage in OA may be evinced from the findings that in genetically modified mice the regulated expression of human MMP-13 in joint cartilages induced OA [13] and the preferential inhibition of MMP-13 blocked the degradation of explanted human OA cartilage [14]. The inhibition of destruction of cartilage by MMP inhibitors had shown in some animal models of OA in preclinical testing [15]. The concerns of dose limiting toxicity such as skin rash and musculoskeletal side effects (MSS) characterized by joint stiffness and pain has discontinued the most clinical trials of broad-spectrum MMP inhibitors. The reason behind these side effects is unclear pharmacologically [16][17][18] however, it is hypothesized that a nonselective inhibition of other metalloproteinases or a combined inhibition of a series of critical MMPs is the cause of MSS. To avoid undesirable side effects interest has been raised toward potent MMP-13 inhibitors having a high degree of selectivity over other MMPs [19][20][21][22][23][24][25][26][27][28][29]. Most of the MMP inhibitors generally have a P1′ fragment that may be lodged in the S1′ subsite of the enzyme active site and a functional group capable of binding the catalytic zinc ion. The modulation of potency and selectivity against various MMPs is allowed in a number of P1′ fragments but there are only a small identified set of zinc binding groups (ZBGs)employed in the design of selective MMP inhibitors [27,[30][31][32].
The successful cocrystallization of quinazoline-2-carboxamide [23,25,33] and triazole in MMP-13 catalytic domain in high-throughput screening for MMP-13 inhibition allowed the designing of a hybrid molecule which combined the quinazoline with the triazole ZBG. Fused pyrimidine system has been applied successfully by several groups to obtain inhibitory potency and selectivity for MMP-13 inhibition [23,25,[33][34][35][36][37]. A series of novel fused pyrimidine derivatives possessing a 1,2,4-triazol-3-yl group as a ZBG with potent MMP-13 inhibitory activities has been synthesized by Nara et al. [38]. The aim of present communication is to establish the quantitative relationships between the reported activities and molecular descriptors unfolding the substitutional changes in titled compounds.

Biological actions and theoretical molecular descriptors
The reported twenty-nine fused pyrimidine derivatives possessing a 1,2,4-triazol-3-yl group as a ZBG are considered as the data set for present study [38]. These derivatives were evaluated for their MMP-13 inhibitory activities and were reported as IC50. The reported MMP-13 activity on molar basis (as pIC50) along with the structures of these analogues is shown in Table 1. The data set was sub-divided into training set to develop models and test set to validate the models externally. The test set compounds which were selected using an in-house written randomization program, are also mentioned in Table 1. The structures of the all the compounds (listed in Table 1) were drawn in 2D ChemDraw [39] and subjected to energy minimization in the MOPAC using the AM1 procedure for closed shell system after converting these into 3D modules. The energy minimization was carried out to attain a well-defined conformer relationship among the congeners under study. The 0D-to 2D-molecular descriptors of titled compounds was computed using DRAGON software [40]. This software offers a large number of descriptors corresponding to ten different classes of 0D-to 2D-descriptor modules. The different descriptor classes include the constitutional, topological, molecular walk counts, BCUT descriptors, Galvez topological charge indices, 2D-autocorrelations, functional groups, atom-centered fragments, empirical descriptors and the properties describing descriptors. These descriptors offer characteristic structural information specific to the descriptor class. The definition and scope of these descriptor's classes is given in Table 2.  A total number of 492 descriptors, belonging to 0D-to 2D-modules, have been computed to obtain most appropriate models describing the biological activity. Prior to model development procedure, all those descriptors that are inter-correlated beyond 0.90 and showing a correlation of less than 0.1 with the biological endpoints (descriptor versus activity, r < 0.1) were excluded. This procedure has reduced the total descriptors from 492 to 93 as relevant ones to explain the biological actions of titled compounds.

Development and validation of model
The combinatorial protocol in multiple linear regression (CP-MLR) [41][42][43][44][45] and partial least squares (PLS) [46][47][48] procedures were used in the present work for developing QSAR models. The CP-MLR is a "filter"-based variable selection procedure, which employs a combinatorial strategy with MLR to result in selected subset regressions for the extraction of diverse structure-activity models, each having unique combination of descriptors from the generated dataset of the compounds under study. The embedded filters make the variable selection process efficient and lead to unique solution. Fear of "chance correlations" exists where large descriptor pools are used in multilinear QSAR/QSPR studies [49,50]. In view of this, to find out any chance correlations associated with the models recognized in CP-MLR, each cross-validated model has been subjected to randomization test [51,52] by repeated randomization (100 simulation runs) of the biological responses. The datasets with randomized response vector have been reassessed by multiple regression analysis. The resulting regression equations, if any, with correlation coefficients better than or equal to the one corresponding to unscrambled response data were counted. This has been used as a measure to express the percent chance correlation of the model under scrutiny.
Validation of the derived model is necessary to test its prediction and generalization within the study domain. For each model, derived by involving n data points, a number of statistical parameters such as r (the multiple correlation coefficient), s (the standard deviation), F (the F ratio between the variances of calculated and observed activities), and Q 2 LOO (the cross-validated index from leave-one-out procedure) have been obtained to access its overall statistical significance. In case of internal validation, Q 2 LOO is used as a criterion of both robustness and predictive ability of the model. A value greater than 0.5 of Q 2 index suggests a statistically significant model. The predictive power of derived model is based on test set compounds. The model obtained from training set has a reliable predictive power if the value of the r 2 Test (the squared correlation coefficient between the observed and predicted values of compounds from test set) is greater than 0.5. Additional statistical parameters such as, the Akaike's information criterion, AIC [53,54], the Kubinyi function, FIT [55,56] and the Friedman's lack of fit, LOF [57], have also been calculated to further validate the derived models. The AIC takes into account the statistical goodness of fit and the number of parameters that have to be estimated to achieve that degree of fit. The FIT, closely related to the F-value, proved to be a useful parameter for assessing the quality of the models. A model which is derived in k independent descriptors, its F-value will be more sensitive if k is small while it becomes less sensitive if k is large. The FIT, on the other hand, will be less sensitive if k is small whereas it becomes more sensitive if k is large. The model that produces the lowest AIC value and highest FIT value is considered potentially the most useful and the best. The LOF factor takes into account the number of terms used in the equation and is not biased, as are other indicators, toward large number of parameters.

Applicability domain
The usefulness of a model is based on its accurate prediction ability for new congeners. A model is valid only within its training domain and new compounds must be assessed as belonging to the domain before the model is applied. The applicability domain (AD) is evaluated by the leverage values for each compound [58]. A Williams plot (the plot of standardized residuals versus leverage values (h)) is constructed, which can be used for a simple graphical detection of both the response outliers (Y outliers) and structurally influential chemicals (X outliers) in the model. In this plot, the AD is established inside a squared area within ±x standard deviations and a leverage threshold h*, which is generally fixed at 3(k + 1)/n (n is the number of training set compounds and k is the number of model parameters), whereas x = 2 or 3. If the compounds have a high leverage value (h >h*), then the prediction is not trustworthy. On the other hand, when the leverage value of a compound is lower than the threshold value, the probability of accordance between predicted and observed values is as high as that for the training set compounds.

QSAR results
In multi-descriptor class environment, exploring for best model equation(s) along the descriptor class provides an opportunity to unravel the phenomenon under investigation. In other words, the concepts embedded in the descriptor classes relate the biological actions revealed by the compounds. For the purpose of modeling study, 9 compounds have been included in the test set for the validation of the models derived from 20 training set compounds. A total number of 93 significant descriptors from 0D-to 2D-classes have been subjected to CP-MLR analysis with default "filters" set in it. Statistical models in one, two and three descriptors have been derived to achieve the best relationship correlating MMP-13 inhibitory activity. Only one model in one descriptor, nine models in two descriptors and eight models in three descriptors, having r 2 Test> 0.5, were obtained through CP-MLR. The three parameter models have shared 14 descriptors among them. All these 14 descriptors along with their brief meaning, average regression coefficients, and total incidence are listed in Table 3, which will serve as a measure of their estimate across these models. Table 3 Identified descriptors along with their class, physical meaning, average regression coefficient and incidence where n, r, s and F represent respectively the number of data points, the multiple correlation coefficient, the standard deviation and the F-ratio between the variances of calculated and observed activities. In above regression equations, the values given in the parentheses are the standard errors of the regression coefficients. The signs of the regression coefficients suggest the direction of influence of explanatory variables in the models. The positive regression coefficient associated to a descriptor will augment the activity profile of a compound while the negative coefficient will cause detrimental effect to it. In the randomization study (100 simulations per model), none of the identified models has shown any chance correlation.

Descriptor class, average regression coefficient and (incidence)
The descriptors IC3 and PW2 participated in above models are the topological descriptors representing information content index of 3 rd order neighborhood symmetry and path/walk 2 Randic shape index, respectively. The positive influence of descriptors IC3 and PW2 on the activity suggested that higher values of descriptor IC3 and PW2 would be beneficiary to the activity. The other participated descriptor C-006, C-044, H-050 and H-052 belong to atom centered fragment class. The descriptors C-006, C-044 and H-050 shown positive and descriptor H-052 showed negative contribution to the activity. Thus, presence of CH2RX (descriptor C-006), X--CX..X (descriptor C-044) and H attached to heteroatom (descriptor H-050) in addition to absence of H attached to C0(sp3) with 1X attached to next C (descriptor H-052) type structural fragments in a molecular structure would be favorable to the activity.
This model in two descriptors could account for nearly 79% variance in the observed activities. Considering the number of observations models up to three descriptors have been explored and a total number of 8 models having test set r 2 greater than 0.5 were obtained. The representative models in three descriptors are presented below:  These models have accounted for nearly 87% variance in the observed activities. The values greater than 0.5 of Q 2 index is in accordance to a reasonable robust QSAR model. The pIC50values of training set compounds calculated using Eqs. (6) to (9) have been included in Table 1. The models (6) to (9) are validated with an external test set of 9 compounds listed in Table 1. The predictions of the test set compounds based on external validation are found to be satisfactory as reflected in the test set r 2 (r 2 Test) values and the same is reported in Table 1. The plot showing goodness of fit between observed and calculated activities for the training and test set compounds is given in Figure 1.
It is evident from the signs of the regression coefficients that the newly appeared constitutional class descriptor nS, topological class descriptors X4Av and SIC2, 2D autocorrelation descriptor MATS3v contributed positively to the activity. Thus, higher number of Sulfur atoms (descriptor nS) in a molecule and higher values of average valence connectivity index chi-4 (descriptor X4Av), structural information content index of neighborhood symmetry of 2-order (descriptor SIC2) and Moran autocorrelation of lag-3/ weighted by atomic van der Waals volumes (descriptor MATS3v) would be beneficiary to the activity. On the other hand, Galvez charge index JGI1 contributed negatively to the activity suggesting that a lower value of mean topological charge index of order 1 would be favorable to the activity.
A partial least square (PLS) analysis has been carried out on these 14 CP-MLR identified descriptors (Table 3) to facilitate the development of a "single window" structure-activity model. For the purpose of PLS, the descriptors have been autoscaled (zero mean and unit SD) to give each one of them equal weight in the analysis. In the PLS crossvalidation, two components are found to be the optimum for these 14 descriptors and they explained 86.30% variance in the activity. The MLR-like PLS coefficients of these 14 descriptors are given in Table 4.
For the sake of comparison, the plot showing goodness of fit between observed and calculated activities (through PLS analysis) for the training and test set compounds is also given in Figure 1. Figure 2 shows a plot of the fraction contribution of normalized regression coefficients of these descriptors to the activity. The PLS analysis has suggested H-052 as the most determining descriptor for modeling the activity of the compounds (descriptor S. No. 14 in Table 4; Figure 2). The other descriptors in decreasing order of significance are MSD, C-044, H-050, DELS, nS, SIC2, JGI1, C-006, MATS3v, MATS5m, SIC3, JGT and X4Av. Except descriptors MSD, DELS, SIC3, MATS5m and JGT all these descriptors are part of Eqs. (1) to (9) and convey same inference in the PLS model as well. It is inferred from the PLS analysis that a higher values of descriptors DELS (molecular electrotopological variation) and SIC3 (structural information content index of neighborhood symmetry of 3-order) and lower values of descriptors MSD (Balaban mean square distance index), MATS5m (Moran autocorrelation of lag-5/ weighted by atomic masses) and JGT ((global topological charge index) would be advantageous to the activity. It is also observed that PLS model from the dataset devoid of CP-MLR identified 14 descriptors (Table 3) is inferior in explaining the activity of the analogues.  (Table 3) associated with MMP-13 inhibitory activity of fused pyrimidine derivatives

Applicability domain (AD)
On analyzing the model AD in the Williams plot, shown in Figure 3, of the model based on the whole dataset (Table 5), it has appeared that none of the compound was identified as an obvious outlier for the MMP-13 inhibitory activity if the limit of normal values for the Y outliers (response outliers) was set as 2.5 times of standard deviation units.

Figure 3
Williams plot for the training-set and test-set compounds for MMP-13 inhibitory activity. The horizontal dotted line refers to the residual limit (±2.5×standard deviation) and the vertical dotted line represents threshold leverage h* (= 0.413) An outlier to a QSAR is identified normally by having a large standard residual activity and can indicate the limits of applicability of QSAR models. Two compounds listed in Table 1 at S. Nos. 3 and 8 were found to have leverage (h) values greater than the threshold leverage (h*=0.413) suggesting them as chemically influential compounds. For both the training-set and test-set, the suggested model matches the high-quality parameters with good fitting power and the capability of assessing external data. Furthermore, all of the compounds were within the applicability domain of the proposed model and were evaluated correctly.

Conclusion
QSAR study has been carried out on the MMP-13 inhibitory activity of fused pyrimidine derivatives possessing a 1,2,4triazol-3-yl group as a ZBG in 0D-to 2D-Dragon descriptors. The derived QSAR models have revealed that the number of Sulfur atoms (descriptor nS), Balaban mean square distance index (descriptor MSD), molecular electrotopological variation (descriptor DELS), structural information content index of neighborhood symmetry of 2 nd and 3 rd order (descriptors SIC2 and SIC3), average valence connectivity index chi-4 (descriptor X4Av) in addition to 1 st order Galvez topological charge index (descriptor JGI1) and global topological charge index (descriptor JGT) played a pivotal role in rationalization of MMP-13 inhibition activity of titled compounds. Atomic properties such as mass and volume in terms of atomic properties weighted descriptors MATS5m and MATS3v, and certain atom centred fragments such as CH2RX (descriptor C-006), X--CX--X (descriptor C-044), H attached to heteroatom (descriptor H-050) and H attached to C0(sp3) with 1X attached to next C (descriptor H-052) are also predominant to explain MMP-13 inhibition actions of fused pyrimidines.
PLS analysis has also corroborated the dominance of CP-MLR identified descriptors. Applicability domain analysis revealed that the suggested model matches the high-quality parameters with good fitting power and the capability of assessing external data and all of the compounds was within the applicability domain of the proposed model and were evaluated correctly.