CONFPASS: Fast DFT Re-Optimizations of Structures from Conformation Searches - PubMed

CONFPASS: Fast DFT Re-Optimizations of Structures from Conformation Searches

Ching Ching Lam et al. J Chem Inf Model. 2023.

Abstract

CONFPASS (Conformer Prioritizations and Analysis for DFT re-optimizations) has been developed to extract dihedral angle descriptors from conformational searching outputs, perform clustering, and return a priority list for density functional theory (DFT) re-optimizations. Evaluations were conducted with DFT data of the conformers for 150 structurally diverse molecules, most of which are flexible. CONFPASS gives a confidence estimate that the global minimum structure has been found, and based on our dataset, we can have 90% confidence after optimizing half of the force-field (FF) structures. Re-optimizing conformers in order of the FF energy often generates duplicate results; using CONFPASS, the duplication rate is reduced by a factor of 2 for the first 30% of the re-optimizations, which include the global minimum structure about 80% of the time.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1

Computational workflow to understand chemical reactivities.

Figure 2

Sometimes, FF structures ordered by energy are re-optimized at a DFT level to a different order and a different number of structures.

Figure 3

Profile of the 150 molecules in the DFT dataset. The 150 molecules include the 20 Grayson dataset molecules, 3 radical molecules, and 127 randomly picked molecules from the Hutchison dataset. The histograms show that the 150 organic molecules are generally flexible and structurally diverse in terms of molecular size.

Figure 4

Different methods for generating priority lists from the clustering results. m is the total number of conformers in the conformational searching output file.

Figure 5

Illustration of the pipeline-ascent priority list generating method for a molecule with five conformers: from the clustering result (i.e., lists of conformer clusters) to the final priority list. At each new value of n_clusters, we add the lowest-energy structures from every cluster without a member on the priority list. The blue coloring indicates the most stable conformer from each cluster by FF energy in the list of conformer clusters. The green coloring indicates a conformer that has not appeared before and is added to the priority list at the corresponding n_clusters value.
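The rule in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the CONFPASS implementation: the function name `pipeline_ascent`, the conformer labels, the FF energies, and the five clusterings are all hypothetical, and ordering the newly added conformers by FF energy within one n_clusters step is an assumption.

```python
def pipeline_ascent(clusterings, ff_energy):
    """Build a priority list from clusterings ordered by increasing n_clusters.

    clusterings: list of clusterings; each clustering is a list of clusters,
                 and each cluster is a list of conformer labels.
    ff_energy:   dict mapping conformer label -> FF energy.
    """
    priority = []
    on_list = set()
    for clusters in clusterings:
        new_members = []
        for cluster in clusters:
            # Only clusters with no member on the priority list contribute.
            if on_list.isdisjoint(cluster):
                # Take the lowest-FF-energy conformer of that cluster.
                new_members.append(min(cluster, key=lambda c: ff_energy[c]))
        # Assumption: within one step, add new conformers by FF energy.
        for conformer in sorted(new_members, key=lambda c: ff_energy[c]):
            priority.append(conformer)
            on_list.add(conformer)
    return priority


# Hypothetical molecule with five conformers (energies in arbitrary units).
ff_energy = {"c1": 0.0, "c2": 0.5, "c3": 1.2, "c4": 2.0, "c5": 3.1}

# Hypothetical clustering results for n_clusters = 1 to 5.
clusterings = [
    [["c1", "c2", "c3", "c4", "c5"]],
    [["c1", "c2", "c3"], ["c4", "c5"]],
    [["c1", "c2"], ["c3"], ["c4", "c5"]],
    [["c1"], ["c2"], ["c3"], ["c4", "c5"]],
    [["c1"], ["c2"], ["c3"], ["c4"], ["c5"]],
]

priority = pipeline_ascent(clusterings, ff_energy)
print(priority)
```

In this toy case the higher-energy conformer c4 is placed second because it is the best representative of a cluster that is not yet on the list, which is how structurally distinct conformers move ahead of likely duplicates.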

Figure 6

Process flowchart for obtaining the DFT list for evaluation. In the above example, conformers E_FF5, E_FF2, and E_FF10 have the same structure after re-optimization at the DFT level, with RMSD values less than 0.005 Å. Conformer E_FF5 has the lowest ΔG(DFT) among the three and thus is prioritized over conformers E_FF2 and E_FF10. The same logic applies to the [E_FF1, E_FF6] cluster.
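A minimal sketch of the duplicate-filtering step described in this caption, assuming the pairwise RMSD values between DFT-optimized structures are already available. The function name `build_dft_list`, the ΔG(DFT) values, and the RMSD values below are hypothetical; only the 0.005 Å threshold comes from the caption. Walking through the conformers in order of increasing ΔG(DFT) means the first member kept from each duplicate group is the one with the lowest ΔG(DFT).

```python
RMSD_CUTOFF = 0.005  # Å; structures closer than this count as duplicates


def build_dft_list(delta_g, rmsd):
    """Return unique conformers ordered by increasing ΔG(DFT).

    delta_g: dict mapping conformer label -> ΔG(DFT).
    rmsd:    function (label_a, label_b) -> RMSD in Å between the
             DFT-optimized structures (hypothetical helper).
    """
    kept = []
    for conformer in sorted(delta_g, key=delta_g.get):
        # Keep a conformer only if it differs from every structure kept so far.
        if all(rmsd(conformer, k) >= RMSD_CUTOFF for k in kept):
            kept.append(conformer)
    return kept


# Hypothetical ΔG(DFT) values (arbitrary units) for six conformers.
delta_g = {
    "E_FF1": 0.30, "E_FF2": 0.02, "E_FF3": 1.20,
    "E_FF5": 0.00, "E_FF6": 0.31, "E_FF10": 0.01,
}

# Hypothetical RMSD values for the pairs that converged to the same structure.
duplicate_pairs = {
    frozenset({"E_FF5", "E_FF2"}): 0.001,
    frozenset({"E_FF5", "E_FF10"}): 0.002,
    frozenset({"E_FF2", "E_FF10"}): 0.002,
    frozenset({"E_FF1", "E_FF6"}): 0.003,
}


def toy_rmsd(a, b):
    # Pairs not listed above are treated as clearly different structures.
    return duplicate_pairs.get(frozenset({a, b}), 1.0)


dft_list = build_dft_list(delta_g, toy_rmsd)
print(dft_list)
```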

Figure 7

Tests for evaluating the performance of priority generation approaches in prioritizing the most stable conformer ((A) the global minimum test) and unique conformer structures ((B) the bins test).

Figure 8

Workflow for evaluating pipeline and non-pipeline priority list generation approaches using the DFT dataset.

Figure 9

Histogram of duplicate conformer appearance frequency over r_opt. The pipeline-ascent approach arranges the duplicate conformers toward the back of the priority list.

Figure 10

Global minimum test (P_GMT) and bins test (Δa_bins) results are presented together in a scatter plot. The tests for the random approach have been repeated 5 times, which contributes to the 5 data points under this category. The corresponding hyperparameters (i.e., n, x, or Q) are labeled for selective approaches.

Figure 11

Overall parameter results for the pipeline and non-pipeline approaches are compared with the box plot.

Figure 12

Plot of the mole amount of the last optimized conformer relative to the current global minimum (χ_new) vs the ratio of optimized conformers (r_opt). Each complete χ_new vs r_opt plot gives m sets of a descriptor array and label, where m is the total number of conformers from conformational searches.

Figure 13

p_LR vs r_opt plot from the cross-validation test of the global LR model using input descriptor arrays derived from the default CONFPASS priority list generation approach. p_LR is the probability value from the LR model. A histogram is given for the true and false predictions by their r_opt values.

Figure 14

Percentage confidence (%Conf) vs r_opt plot. The light blue data points come from individual molecules in the DFT dataset. These data points were separated into 100 bins of equal size according to the r_opt values. The mean %Conf was found for each bin and plotted above in black. Examples of the %Conf vs r_opt plot for individual molecules are also presented.

Figure 15

Usage recommendations of CONFPASS. m is the total number of conformers and v is the number of re-optimized conformers. We recommend that the cut-off for terminating the re-optimization process should be higher than 80%. Depending on the computer system, the user of CONFPASS may also wish to re-optimize FF structures in batches.
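The batch-wise recommendation can be pictured as a simple stopping loop. The sketch below is only an illustration under stated assumptions: `reoptimize` and `confidence` are hypothetical callables standing in for a DFT job and for the CONFPASS confidence estimate, the 90% cut-off and batch size are placeholder values chosen to respect the "higher than 80%" advice, and the toy confidence function is not the CONFPASS model.

```python
def reoptimize_in_batches(priority_list, reoptimize, confidence,
                          cutoff=90.0, batch_size=5):
    """Re-optimize conformers in priority order until %Conf reaches the cut-off.

    priority_list: conformer labels in priority order (m entries).
    reoptimize:    hypothetical callable, label -> re-optimized result.
    confidence:    hypothetical callable, (results, m) -> %Conf (0 to 100).
    """
    m = len(priority_list)
    results = []
    for start in range(0, m, batch_size):
        batch = priority_list[start:start + batch_size]
        results.extend(reoptimize(label) for label in batch)
        # v = len(results) conformers have been re-optimized so far.
        if confidence(results, m) >= cutoff:
            break
    return results


# Toy stand-ins, for illustration only.
def toy_reoptimize(label):
    return label + "_dft"


def toy_confidence(results, m):
    # Placeholder: confidence grows linearly with v/m (NOT the CONFPASS model).
    return 100.0 * len(results) / m


labels = ["conf%d" % i for i in range(1, 11)]  # m = 10
done = reoptimize_in_batches(labels, toy_reoptimize, toy_confidence,
                             cutoff=90.0, batch_size=3)
print(len(done), "of", len(labels), "conformers re-optimized")
```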

Figure 16

CONFPASS at work: applications of CONFPASS in exploring the conformational space of 5 from the Hutchison dataset. The black data points on the plots are those available to the user after re-optimizing 42% of the conformers following the default CONFPASS priority list (pipeline-mix, x = 0.8 and Q = 0.2). The complete data table is presented in Table S12.
