CP-MLR derived QSAR rationales for the PPAR agonistic activity of the pyridyloxybenzene-acylsulfonamide derivatives

QSAR rationales have been obtained for the PPAR transactivation activity of pyridyloxybenzene-acylsulfonamides in terms of 0D- to 2D-Dragon descriptors. The descriptors identified in the CP-MLR analysis highlight the roles of atomic mass, van der Waals volume and polarizability through weighted 2D autocorrelations (GATS1v and GATS1p), a modified Burden eigenvalue (BEHm4) and molecular weight (MW). The sums of topological distances between O and S atoms (descriptor T(O..S)) and between N and Cl atoms (descriptor T(N..Cl)), the average connectivity index chi-1 (X1A) and the quadratic index (Qindex) also emerged as important for optimizing PPARγ transactivation. Descriptors RBN and RBF suggest that rotatable bonds in the molecular structure favor PPAR activity. Applicability domain analysis revealed that the proposed model has good fitting power and is capable of predicting external data, and that all of the compounds were within the applicability domain of the model and were evaluated correctly.


Introduction
Type 2 diabetes mellitus (T2DM) is a chronic metabolic disorder characterized by impaired insulin secretion, insulin resistance and hyperglycemia. More than 90% of diabetic patients have T2DM. In the long term, these metabolic disturbances lead to many other disorders, such as dyslipidemia, hypertension and coronary heart disease. A sedentary lifestyle and obesity, in association with these risk factors, increase morbidity and mortality [1]. By the year 2025, the number of T2DM cases is expected to reach 380 million [2]. For the treatment of T2DM, thiazolidinediones (TZDs), represented by pioglitazone and rosiglitazone, emerged as a new class of insulin sensitizers. TZDs were identified and optimized without knowledge of their target protein. In antidiabetic drug discovery and development, the finding that TZDs are high-affinity ligands of the peroxisome proliferator-activated receptor γ (PPARγ) [3] opened the way to extensive research [4][5][6][7][8][9]. Binding of a TZD activates PPARγ, which functions as an essential transcriptional regulator of glucose and lipid homeostasis. PPARγ is the most extensively studied of the three PPAR subtypes (PPARα, PPARβ/δ and PPARγ). Expressed predominantly in adipose tissue, PPARγ regulates the expression of a constellation of genes closely related to adipocyte differentiation, glucose and lipid metabolism, insulin sensitivity, inflammatory responses and cell proliferation [3,10]. The majority of reported PPARγ ligands possess a carboxylic acid or a heterocyclic bioisostere of it, such as TZD, oxazolidinone or tetrazole [11][12][13][14][15][16]. An example of non-TZD, non-carboxylic-acid PPARγ agonists has also been reported [17].

A novel class of pyridyloxybenzene-acylsulfonamides as non-thiazolidinedione (TZD), non-carboxylic-acid-based selective PPARγ agonists has been reported by Rikimaru et al. [18]. The aim of the present communication is to establish quantitative relationships between the reported activities and molecular descriptors that capture the substitutional changes in the title compounds.

Biological actions and theoretical molecular descriptors
The thirty-four reported pyridyloxybenzene-acylsulfonamides constitute the data set for this study [18]. These derivatives were evaluated for their transactivation activity against human PPARγ stably expressed in Chinese hamster ovary (CHO-K1) cells. Transactivation activities were assessed by a luciferase reporter gene assay using (R)-5-(3-{4-[(2-Furan-2-yl-5-methyl-1,3-oxazol-4-yl)methoxy]-3-methoxyphenyl}propyl)-1,3-oxazolidine-2,4-dione [19] as the reference PPAR agonist, and were reported as EC50 values. The general structure of these compounds is given in Figure 1. The structural variations of these analogues, along with their reported pEC50 values (molar basis), are listed in Table 1. The data set was divided into training and test sets, and the models developed from the training set were externally validated with the test set. The test-set compounds were selected using an in-house randomization program; the training- and test-set compounds are also identified in Table 1. The structures of all the data set compounds of Table 1 were drawn in 2D with ChemDraw [20], converted into 3D models and energy minimized in MOPAC using the AM1 procedure for closed-shell systems. The energy minimization was carried out to attain a well-defined conformational relationship among the congeners under study. The 0D- to 2D-molecular descriptors of the title compounds were computed using the DRAGON software [21]. This software offers a large number of descriptors corresponding to ten different classes of 0D- to 2D-descriptor modules: constitutional descriptors, topological descriptors, molecular walk counts, modified Burden eigenvalues, Galvez topological charge indices, 2D-autocorrelations, functional groups, atom-centered fragments, empirical descriptors and molecular properties. Each class offers characteristic structural information specific to it.
The definition and scope of these descriptor classes are given in Table 2. In total, Dragon computed 496 descriptors belonging to the 0D- to 2D-modules, and these were used to search for the most appropriate models describing the biological activity. Prior to model development, descriptors inter-correlated beyond 0.90 (descriptor versus descriptor, r > 0.9) and descriptors showing a correlation of less than 0.1 with the biological endpoint (descriptor versus activity, r < 0.1) were excluded. This reduction left 120 descriptors as significant candidates for explaining the biological activity of the title compounds.
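The two-stage descriptor reduction described above can be sketched with NumPy as follows. The helper `prefilter_descriptors` is hypothetical, not the authors' code. In particular, the text does not say which member of an inter-correlated pair was discarded, so keeping the one more strongly correlated with activity is an assumption.

```python
import numpy as np


def prefilter_descriptors(X, y, inter_cut=0.90, activity_cut=0.10):
    """Return the column indices of descriptor matrix X kept by a two-stage filter.

    Hypothetical sketch: descriptors with |r| < activity_cut against y are dropped,
    then inter-correlated descriptors (|r| > inter_cut) are removed greedily.
    Assumption: of each inter-correlated pair, the descriptor with the weaker
    correlation to activity is the one discarded.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Constant columns have undefined correlations, so they are discarded first.
    candidates = [j for j in range(X.shape[1]) if np.std(X[:, j]) > 0]
    r_act = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in candidates}
    # Stage 1: remove descriptors weakly related to the activity.
    candidates = [j for j in candidates if r_act[j] >= activity_cut]
    # Stage 2: visit descriptors from strongest to weakest activity correlation and
    # keep one only if it is not inter-correlated with any descriptor already kept.
    kept = []
    for j in sorted(candidates, key=lambda c: -r_act[c]):
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= inter_cut for k in kept):
            kept.append(j)
    return sorted(kept)
```

Visiting descriptors in order of decreasing activity correlation makes the result independent of the column order in the Dragon output, which a simple left-to-right scan would not be.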

Development and validation of the model
In the present study, QSAR models were developed using the combinatorial protocol in multiple linear regression (CP-MLR) [22][23][24][25][26]. CP-MLR is a "filter"-based variable selection procedure that combines a combinatorial strategy with MLR to generate selected subset regressions, from which diverse structure-activity models are extracted. Each derived model has a unique combination of descriptors from the descriptor data set generated for the compounds under study. The embedded filters make the variable selection process efficient and lead to a unique solution. Chance correlations are a concern whenever large descriptor pools are used in multilinear QSAR/QSPR studies [27,28]. The risk of chance correlation in the models identified by CP-MLR was addressed with a randomization test [29,30], in which the biological responses of each cross-validated model were repeatedly randomized (100 simulation runs). The data sets with a randomized response vector were reassessed by multiple regression analysis, and the resulting regression equations, if any, with correlation coefficients greater than or equal to that obtained with the unscrambled response data were counted. This count was used to express the percent chance correlation of the model under scrutiny.
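A minimal sketch of the randomization test follows. It assumes that each scrambled response vector is refitted by ordinary least squares on the same descriptor subset; CP-MLR's own implementation may differ, for example by re-running the selection. `multiple_r` and `percent_chance_correlation` are hypothetical helper names; the 100 runs and the "greater than or equal to" criterion follow the text.

```python
import numpy as np


def multiple_r(X, y):
    """Multiple correlation coefficient r of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return np.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


def percent_chance_correlation(X, y, n_runs=100, seed=0):
    """Percentage of response-scrambled fits whose r matches or beats the real model's r."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r_true = multiple_r(X, y)
    hits = 0
    for _ in range(n_runs):
        # Scramble the activity vector while keeping the descriptor matrix fixed.
        y_perm = rng.permutation(y)
        if multiple_r(X, y_perm) >= r_true:
            hits += 1
    return 100.0 * hits / n_runs
```

A value of 0 means that none of the scrambled data sets reproduced the correlation of the real model, which is how "no chance correlation" is read in the results below.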
Validation of a derived model is necessary to test its predictive and generalization ability within the study domain. For each model derived from n data points, a number of statistical parameters, namely r (the multiple correlation coefficient), s (the standard deviation), F (the F-ratio between the variances of calculated and observed activities) and Q²LOO (the cross-validated index from the leave-one-out procedure), were obtained to assess its overall statistical significance. In internal validation, Q²LOO is used as a criterion of both the robustness and the predictive ability of the model; a Q² value greater than 0.5 suggests a statistically significant model. The predictive power of a derived model is judged on the test-set compounds: a model obtained from the training set has reliable predictive power if r²Test (the squared correlation coefficient between the observed and predicted values of the test-set compounds) is greater than 0.5. Additional statistical parameters, namely Akaike's information criterion, AIC [31,32], the Kubinyi function, FIT [33,34], and Friedman's lack of fit, LOF [35], were also calculated to further validate the derived models. The AIC takes into account both the statistical goodness of fit and the number of parameters that must be estimated to achieve that degree of fit. The FIT, closely related to the F-value, has proved to be a useful parameter for assessing model quality. For a model derived with k independent descriptors, the F-value is more sensitive when k is small and less sensitive when k is large, whereas the FIT is less sensitive when k is small and more sensitive when k is large. The model that produces the lowest AIC value and the highest FIT value is considered potentially the most useful and the best.
The LOF factor takes into account the number of terms used in the equation and, unlike some other indicators, is not biased toward a large number of parameters.
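The statistics listed above can be computed as in the following sketch. The AIC uses the form common in QSAR work, RSS(n + p)/(n − p)² with p = k + 1 adjustable parameters, and FIT = r²(n − k − 1)/((n + k²)(1 − r²)). The LOF expression, with c = k + 1 basis terms and smoothing parameter d = 1, is an assumed reading of Friedman's definition, since the text gives no formula. `qsar_statistics` is a hypothetical helper.

```python
import numpy as np


def qsar_statistics(X, y, lof_d=1.0):
    """Fit y = b0 + X b by least squares and return common QSAR statistics.

    Formulas used (a sketch; conventions vary between papers):
      s   = sqrt(RSS / (n - k - 1))
      F   = (r^2 / k) / ((1 - r^2) / (n - k - 1))
      Q2  = 1 - PRESS / SS_tot, PRESS from explicit leave-one-out refits
      AIC = RSS * (n + p) / (n - p)^2, p = k + 1 adjustable parameters
      FIT = r^2 (n - k - 1) / ((n + k^2)(1 - r^2))
      LOF = (RSS / n) / (1 - (c + d * k) / n)^2, c = k + 1 (assumed form)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(((y - A @ coef) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / ss_tot
    # Leave-one-out: refit without compound i, then predict compound i.
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        coef_i, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += float((y[i] - A[i] @ coef_i) ** 2)
    p = k + 1
    return {
        "r": np.sqrt(r2),
        "s": np.sqrt(rss / (n - k - 1)),
        "F": (r2 / k) / ((1.0 - r2) / (n - k - 1)),
        "Q2_LOO": 1.0 - press / ss_tot,
        "AIC": rss * (n + p) / (n - p) ** 2,
        "FIT": r2 * (n - k - 1) / ((n + k ** 2) * (1.0 - r2)),
        "LOF": (rss / n) / (1.0 - (p + lof_d * k) / n) ** 2,
    }
```

Since every leave-one-out residual is at least as large as the corresponding fitted residual, Q²LOO can never exceed r², which is a quick sanity check on any reported pair of values.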

Applicability domain
The usefulness of a model rests on its ability to predict new congeners accurately. A model is valid only within its training domain, and new compounds must be assessed as belonging to that domain before the model is applied to them. The applicability domain (AD) is evaluated from the leverage value of each compound [36]. A Williams plot (standardized residuals versus leverage values, h) is constructed, which allows simple graphical detection of both response outliers (Y outliers) and structurally influential chemicals (X outliers) in the model. In this plot, the AD is the squared area bounded by ±x standard deviations and by a leverage threshold h*, generally fixed at 3(k + 1)/n, where n is the number of training-set compounds, k is the number of descriptors in the model (so that k + 1 is the number of model parameters) and x = 2 or 3. If a compound has a high leverage value (h > h*), its prediction is an extrapolation and is not trustworthy. Conversely, when the leverage value of a compound is below the threshold, the probability of agreement between its predicted and observed values is as high as that for the training-set compounds.
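The leverage and Williams-plot quantities can be sketched as follows, assuming an MLR model with an intercept and residuals standardized by the training-set standard deviation s (other standardizations exist). `williams_plot_data` is a hypothetical helper name.

```python
import numpy as np


def williams_plot_data(X_train, y_train, X_query=None, y_query=None):
    """Leverages, standardized residuals and warning leverage h* = 3(k + 1)/n.

    Sketch only: leverages use the training design matrix with an intercept column,
    h_i = x_i^T (A^T A)^-1 x_i; residuals are divided by the residual standard
    deviation s of the training fit (one common standardization).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    n, k = X_train.shape
    A = np.column_stack([np.ones(n), X_train])
    AtA_inv = np.linalg.inv(A.T @ A)
    coef = AtA_inv @ A.T @ y_train
    s = np.sqrt(((y_train - A @ coef) ** 2).sum() / (n - k - 1))
    h_star = 3.0 * (k + 1) / n

    def leverage(Aq):
        # Row-wise quadratic form x_i^T (A^T A)^-1 x_i.
        return np.einsum("ij,jk,ik->i", Aq, AtA_inv, Aq)

    out = {"h_star": h_star,
           "h_train": leverage(A),
           "std_res_train": (y_train - A @ coef) / s}
    if X_query is not None:
        Xq = np.asarray(X_query, dtype=float)
        Aq = np.column_stack([np.ones(len(Xq)), Xq])
        out["h_query"] = leverage(Aq)
        if y_query is not None:
            out["std_res_query"] = (np.asarray(y_query, dtype=float) - Aq @ coef) / s
    return out
```

A compound would then be flagged as a response outlier if its absolute standardized residual exceeds x (2 or 3), and as structurally influential if its leverage exceeds h*.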

QSAR results
In a multi-descriptor-class environment, model equations built from the descriptor classes offer a way to unravel the phenomenon under study, because the concepts embedded in each descriptor class can be related to the biological actions of the compounds. For modeling, one third of the active compounds (10 of 30) were placed in the test set to validate the models derived from the remaining 20 training-set compounds. The 120 relevant 0D- to 2D-descriptors retained after data reduction were subjected to CP-MLR analysis with its default filter settings. Statistical models with two, three and four descriptors were explored to find the best relationship with PPAR transactivation activity. The two- and three-descriptor models obtained are given below. In these equations, n, r, s and F represent the number of data points, the multiple correlation coefficient, the standard deviation and the F-ratio between the variances of calculated and observed activities, respectively. In these and all subsequent regression equations, the values in parentheses are the standard errors of the regression coefficients. The signs of the regression coefficients indicate the direction in which each explanatory variable influences activity: a positive coefficient means that higher descriptor values enhance the activity, whereas a negative coefficient means that they are detrimental to it. In the randomization study (100 simulations per model), none of the identified models showed any chance correlation.

Table 3. Identified descriptors, along with their class, average regression coefficient and incidence, in modeling the PPAR transactivation activities of pyridyloxybenzene-acylsulfonamides.

[Table 3 body: descriptor class, average regression coefficient and (incidence) of each identified descriptor]

The three-descriptor model accounts for nearly 68% of the variance in the observed activity of the compounds. Given the number of observations in the data set, models with up to four descriptors were explored. This yielded 37 models with r²Test > 0.50. These models were identified in CP-MLR from the pool of 120 descriptors by successively raising the filter-3 threshold as the number of descriptors per equation increased: the optimum r-bar value of the preceding-level model (0.786) was used as the new filter-3 threshold for the next level. Together, these models share 43 descriptors. All of these shared descriptors, with their brief meaning, average regression coefficients and total incidence, are listed in Table 3; these values indicate how each descriptor contributes across the models.
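The successive filter-3 update can be illustrated with a simplified combinatorial search. This sketch is not CP-MLR itself: it imitates only a cap on inter-correlation among the descriptors of a model and an r-bar threshold that is raised, after each model size, to the best r-bar found at that size. It assumes r-bar is the square root of the adjusted R², and the default `max_intercorr` is an illustrative value. CP-MLR as published applies further filters that are omitted here; `r_bar` and `combinatorial_search` are hypothetical names.

```python
import itertools

import numpy as np


def r_bar(X, y):
    """Square root of the adjusted R^2 of an OLS fit with intercept (assumed r-bar definition)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = ((y - A @ coef) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2_adj = 1.0 - (rss / (n - k - 1)) / (ss_tot / (n - 1))
    return float(np.sqrt(max(r2_adj, 0.0)))


def combinatorial_search(X, y, max_size=4, r_bar_start=0.0, max_intercorr=0.3):
    """Enumerate descriptor subsets of growing size with two simplified filters."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    R = np.abs(np.corrcoef(X, rowvar=False))
    threshold = r_bar_start
    accepted = {}
    for size in range(1, max_size + 1):
        level = []
        for subset in itertools.combinations(range(X.shape[1]), size):
            # Filter: descriptors within one model must be weakly inter-correlated.
            if any(R[a, b] > max_intercorr for a, b in itertools.combinations(subset, 2)):
                continue
            rb = r_bar(X[:, list(subset)], y)
            if rb > threshold:  # filter-3-like r-bar threshold
                level.append((rb, subset))
        if level:
            accepted[size] = sorted(level, reverse=True)
            # The best r-bar at this size becomes the threshold for the next size.
            threshold = max(threshold, accepted[size][0][0])
    return accepted
```

Raising the threshold level by level means a larger model survives only if it beats the best smaller model on the adjusted statistic, which penalizes adding descriptors that bring little extra information.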
The selected four-descriptor models for the PPAR transactivation activities of pyridyloxybenzene-acylsulfonamides that emerged from CP-MLR are given below. These models account for nearly 81% of the variance in the observed activities. In the randomization study (100 simulations per model), none of the identified models showed any chance correlation. Q² values greater than 0.5 indicate reasonably robust QSAR models. The pEC50 values of the training-set compounds calculated using Eqs. (4) to (7) and predicted by the LOO procedure are included in Table 4.
Models (4) to (7) were validated with the external test set of 10 compounds listed in Table 1. The test-set predictions are satisfactory, as reflected in the r²Test values reported in Table 4. The goodness of fit between observed and calculated activities for the training- and test-set compounds is shown in Figure 2.
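Following the definition given in the validation section, r²Test is the squared correlation coefficient between observed and predicted test-set activities. A minimal sketch (`r2_test` is a hypothetical name):

```python
import numpy as np


def r2_test(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted test-set activities."""
    r = np.corrcoef(np.asarray(y_obs, dtype=float), np.asarray(y_pred, dtype=float))[0, 1]
    return float(r ** 2)
```

Because it is a squared correlation, this statistic is unchanged if every prediction is shifted by the same constant, so a systematic bias in the test-set predictions does not lower it.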
The descriptors newly appearing in these models are T(N..Cl) and X1A (topological descriptors), RBN and RBF (constitutional descriptors) and GATS1v (a 2D-autocorrelation descriptor). Descriptors T(N..Cl), RBN, RBF and GATS1v are positively correlated with PPAR transactivation, whereas X1A influences it negatively. The signs of the regression coefficients therefore indicate that a higher sum of topological distances between N and Cl atoms (T(N..Cl)), a larger number of rotatable bonds (RBN), a higher rotatable bond fraction (RBF) and a higher Geary autocorrelation of lag 1 weighted by atomic van der Waals volumes (GATS1v) in the molecular structure would be beneficial to the activity, whereas a lower average connectivity index chi-1 (X1A) would be advantageous.

[Figure 2: plot of observed versus calculated activities (Eqs. (4) to (7), respectively) of training- and test-set compounds for PPAR transactivation.]
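To make two of these descriptors concrete, the sketch below computes a sum of topological distances between two element types, as in T(N..Cl), and a lag-1 Geary autocorrelation on small hypothetical hydrogen-depleted graphs with made-up atomic weights. It illustrates the definitions only; Dragon's implementation (for example, hydrogen handling and weight scaling) may differ, so these values are not expected to reproduce Dragon output. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds on the shortest path from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nb in adjacency[atom]:
            if nb not in dist:
                dist[nb] = dist[atom] + 1
                queue.append(nb)
    return dist


def sum_topological_distance(symbols, adjacency, a, b):
    """T(a..b): sum of distances over all (a-atom, b-atom) pairs; assumes a != b."""
    total = 0
    for i, si in enumerate(symbols):
        if si == a:
            dist = topological_distances(adjacency, i)
            total += sum(d for j, d in dist.items() if symbols[j] == b)
    return total


def geary(adjacency, weights, lag=1):
    """Geary coefficient with binary weights for atom pairs at the given topological lag.

    Assumes at least one pair at that lag and non-constant weights.
    """
    n = len(weights)
    mean_w = sum(weights) / n
    variance = sum((w - mean_w) ** 2 for w in weights) / (n - 1)
    num, pairs = 0.0, 0
    for i, j in combinations(range(n), 2):
        if topological_distances(adjacency, i).get(j) == lag:
            num += (weights[i] - weights[j]) ** 2
            pairs += 1
    return (num / (2 * pairs)) / variance
```

For a hypothetical fragment N-C(-Cl)-C-Cl, the two N-to-Cl paths have 2 and 3 bonds, so T(N..Cl) = 5. A Geary value below 1 indicates that bonded atoms carry similar weights; values above 1 indicate dissimilar neighbors.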

Applicability domain (AD)
Analysis of the model AD with the Williams plot (Figure 3) for the model based on the whole data set (Table 5) showed that no compound was an obvious outlier for PPAR transactivation activity when the limit of normal values for Y outliers (response outliers) was set at 3 standard deviation units. Two compounds, S. No. 8 and 27 in Table 1, have leverage values (h) greater than the threshold leverage (h*), identifying them as structurally influential compounds. Because their standardized residuals also lie within ±3 standard deviation units, these two compounds are nevertheless well predicted. For both the training and test sets, the proposed model shows good fitting power and the capability of predicting external data. Furthermore, in terms of their standardized residuals, all of the compounds fall within the applicability domain of the proposed model and were evaluated correctly.

Conclusion
QSAR rationales have been obtained for the PPAR transactivation activity of pyridyloxybenzene-acylsulfonamides in terms of 0D- to 2D-Dragon descriptors. The descriptors identified in the CP-MLR analysis highlight the roles of atomic mass, van der Waals volume and polarizability through weighted 2D autocorrelations (GATS1v and GATS1p), a modified Burden eigenvalue (BEHm4) and molecular weight (MW). The sums of topological distances between O and S atoms (descriptor T(O..S)) and between N and Cl atoms (descriptor T(N..Cl)), the average connectivity index chi-1 (X1A) and the quadratic index (Qindex) also emerged as important for optimizing PPARγ transactivation. Descriptors RBN and RBF suggest that rotatable bonds in the molecular structure favor PPAR activity. Applicability domain analysis showed that the proposed model has good fitting power and can predict external data, and that all of the compounds were within the applicability domain of the model and were evaluated correctly.