CONFPASS: Fast DFT Re-Optimizations of Structures from Conformation Searches
Ching Ching Lam et al. J Chem Inf Model. 2023.
Abstract
CONFPASS (Conformer Prioritizations and Analysis for DFT re-optimizations) has been developed to extract dihedral angle descriptors from conformational searching outputs, perform clustering, and return a priority list for density functional theory (DFT) re-optimizations. Evaluations were conducted with DFT data of the conformers for 150 structurally diverse molecules, most of which are flexible. CONFPASS gives a confidence estimate that the global minimum structure has been found, and based on our dataset, we can have 90% confidence after optimizing half of the force-field (FF) structures. Re-optimizing conformers in order of the FF energy often generates duplicate results; using CONFPASS, the duplication rate is reduced by a factor of 2 for the first 30% of the re-optimizations, which include the global minimum structure about 80% of the time.
Conflict of interest statement
The authors declare no competing financial interest.
Figures
Figure 1
Computational workflow to understand chemical reactivities.
Figure 2
Re-optimizing FF structures at the DFT level can change both their energy ordering and the number of distinct structures.
Figure 3
Profile of the 150 molecules in the DFT dataset. The 150 molecules include the 20 Grayson dataset molecules, 3 radical molecules, and 127 randomly picked molecules from the Hutchison dataset. The histograms show that the 150 organic molecules are generally flexible and structurally diverse in terms of molecular size.
Figure 4
Different methods for generating priority lists from the clustering results. m is the total number of conformers in the conformational searching output file.
Figure 5
Illustration of the pipeline-ascent priority list generating method for a molecule with five conformers: from the clustering result (i.e., lists of conformer clusters) to the final priority list. At each new value of n_clusters, we add the lowest-energy structures from every cluster without a member on the priority list. The blue coloring indicates the most stable conformer from each cluster by FF energy in the list of conformer clusters. The green coloring indicates a conformer that has not appeared before and is added to the priority list at the corresponding n_clusters value.
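The rule stated in the Figure 5 caption can be sketched in a few lines of Python. This is a minimal illustration, not the CONFPASS implementation: the function name `pipeline_ascent`, the input layout (one list of clusters per n_clusters value, in ascending order), the FF energies, and the tie-breaking by FF energy within one n_clusters step are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def pipeline_ascent(
    clusterings: Sequence[Sequence[Sequence[int]]],
    ff_energy: Dict[int, float],
) -> List[int]:
    """Build a priority list from clusterings given in ascending n_clusters."""
    priority: List[int] = []
    listed = set()
    for clusters in clusterings:
        # For every cluster with no member on the priority list yet,
        # take its lowest-FF-energy conformer.
        new = [
            min(cluster, key=lambda c: ff_energy[c])
            for cluster in clusters
            if not listed.intersection(cluster)
        ]
        # Assumption: conformers added at the same n_clusters value
        # are ordered by FF energy.
        for conformer in sorted(new, key=lambda c: ff_energy[c]):
            priority.append(conformer)
            listed.add(conformer)
    return priority


# Hypothetical five-conformer molecule (energies are made up).
energies = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 4.0}
clusterings = [
    [[1, 2, 3, 4, 5]],            # n_clusters = 1
    [[1, 2, 3], [4, 5]],          # n_clusters = 2
    [[1, 2], [3], [4, 5]],        # n_clusters = 3
    [[1], [2], [3], [4, 5]],      # n_clusters = 4
    [[1], [2], [3], [4], [5]],    # n_clusters = 5
]
print(pipeline_ascent(clusterings, energies))
```

In this toy case conformer 4 is promoted ahead of the lower-energy conformers 2 and 3 because it is the first representative of a cluster that is not yet covered.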
Figure 6
Process flowchart for obtaining the DFT list for evaluation. In the above example, conformers E_FF5, E_FF2, and E_FF10 have the same structure after re-optimization at the DFT level, with RMSD values less than 0.005 Å. Conformer E_FF5 has the lowest ΔG(DFT) among the three and thus is prioritized over conformers E_FF2 and E_FF10. The same logic applies to the [E_FF1, E_FF6] cluster.
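The selection step described in the Figure 6 caption can be sketched as follows. The sketch assumes that the grouping of conformers into identical DFT structures (RMSD below 0.005 Å) has already been done; the function name `pick_representatives`, the ΔG(DFT) values, and the choice to return representatives and duplicates as two separate lists are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def pick_representatives(
    groups: Sequence[Sequence[int]],
    dg_dft: Dict[int, float],
) -> Tuple[List[int], List[int]]:
    """Keep the lowest-dG(DFT) member of each group of identical structures."""
    representatives: List[int] = []
    duplicates: List[int] = []
    for group in groups:
        best = min(group, key=lambda c: dg_dft[c])
        representatives.append(best)
        duplicates.extend(c for c in group if c != best)
    # Order the unique structures by their DFT free energy.
    representatives.sort(key=lambda c: dg_dft[c])
    return representatives, duplicates


# Hypothetical data: conformers 5, 2, 10 converge to one DFT structure,
# conformers 1 and 6 to another, conformer 3 stays unique.
groups = [[5, 2, 10], [1, 6], [3]]
dg = {5: 0.000, 2: 0.002, 10: 0.001, 1: 1.200, 6: 1.203, 3: 2.500}
unique, dupes = pick_representatives(groups, dg)
print(unique, dupes)
```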
Figure 7
Tests for evaluating the performance of priority generation approaches in prioritizing the most stable conformer ((A) the global minimum test) and unique conformer structures ((B) the bins test).
Figure 8
Workflow for evaluating pipeline and non-pipeline priority list generation approaches using the DFT dataset.
Figure 9
Histogram of duplicate conformer appearance frequency over r_opt. The pipeline-ascent approach arranges the duplicate conformers toward the back of the priority list.
Figure 10
Global minimum test (P_GMT) and bins test (Δa_bins) results are presented together in a scatter plot. The tests for the random approach were repeated 5 times, which gives the 5 data points in this category. The corresponding hyperparameters (i.e., n, x, or Q) are labeled for selected approaches.
Figure 11
Overall parameter results for the pipeline and non-pipeline approaches are compared with the box plot.
Figure 12
Plot of the mole amount of the last optimized conformer relative to the current global minimum (χ_new) vs the ratio of optimized conformers (r_opt). Each complete χ_new vs r_opt plot gives m sets of a descriptor array and label, where m is the total number of conformers from conformational searches.
Figure 13
p_LR vs r_opt plot from the cross-validation test of the global LR model using input descriptor arrays derived from the default CONFPASS priority list generation approach. p_LR is the probability value from the LR model. A histogram is given for the true and false predictions by their r_opt values.
Figure 14
Percentage confidence (%Conf) vs r_opt plot. The light blue data points come from individual molecules in the DFT dataset. These data points were separated into 100 bins of equal size according to the r_opt values. The mean %Conf was found for each bin and plotted above in black. Examples of the %Conf vs r_opt plot for individual molecules are also presented.
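The binning described in the Figure 14 caption can be illustrated with a short Python sketch. It assumes that r_opt lies in the range 0 to 1 and that "bins of equal size" means bins of equal width; both are readings of the caption rather than confirmed details, and the function name `binned_mean` is hypothetical.

```python
from typing import List, Optional, Sequence


def binned_mean(
    r_opt: Sequence[float],
    conf: Sequence[float],
    n_bins: int = 100,
) -> List[Optional[float]]:
    """Mean %Conf per equal-width r_opt bin; None marks an empty bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for r, c in zip(r_opt, conf):
        # r_opt == 1.0 is placed in the last bin instead of overflowing.
        i = min(int(r * n_bins), n_bins - 1)
        sums[i] += c
        counts[i] += 1
    return [s / k if k else None for s, k in zip(sums, counts)]


# Toy data with 4 bins so the result is easy to check by hand.
means = binned_mean([0.1, 0.2, 0.6, 1.0], [10.0, 30.0, 50.0, 100.0], n_bins=4)
print(means)
```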
Figure 15
Usage recommendations of CONFPASS. m is the total number of conformers and v is the number of re-optimized conformers. We recommend that the cut-off for terminating the re-optimization process should be higher than 80%. Depending on the computer system, the user of CONFPASS may also wish to re-optimize FF structures in batches.
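The batch-wise recommendation in the Figure 15 caption could be organized as the loop below. This is only a sketch of the control flow: `reoptimize_in_batches` and the `confidence_after` callback are hypothetical names, the callback stands in for whatever confidence estimate is available once v conformers have been re-optimized, and the DFT step itself is left out. The cut-off is passed in by the caller; the caption recommends a value above 80%.

```python
from typing import Callable, List, Sequence


def reoptimize_in_batches(
    priority: Sequence[int],
    batch_size: int,
    confidence_after: Callable[[List[int]], float],
    cutoff: float,
) -> List[int]:
    """Walk the priority list in batches until the confidence cut-off is met."""
    done: List[int] = []
    for start in range(0, len(priority), batch_size):
        batch = list(priority[start:start + batch_size])
        # Placeholder: the batch would be sent for DFT re-optimization here.
        done.extend(batch)
        if confidence_after(done) >= cutoff:
            break
    return done


# Toy confidence model (an assumption for the demo only): %Conf = 100 * v / m.
m = 10
processed = reoptimize_in_batches(
    priority=list(range(m)),
    batch_size=3,
    confidence_after=lambda done: 100.0 * len(done) / m,
    cutoff=85.0,
)
print(len(processed))
```

With this toy model the loop stops after the third batch, when 9 of the 10 conformers have been processed.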
Figure 16
CONFPASS at work: application of CONFPASS in exploring the conformational space of 5 from the Hutchison dataset. The black data points on the plots are those available to the user after re-optimizing 42% of the conformers following the default CONFPASS priority list (pipeline-mix, x = 0.8 and Q = 0.2). The complete data table is presented in Table S12.