NEWS
- Wrap additional examples in donttest
- Remove excessively sized mock_sim_mats_list.rda
Improvements
- calc_nmis now supports parallel processing, with progress reported through progressr
- batch_snf_subsamples rewritten to parallelize over subsamples rather than cluster solutions; it now reports progress through progressr instead of verbose cat statements
- speed up parallelization test
New data
- New mock data objects named in the format mock_(class name), e.g., mock_data_list and mock_ext_solutions_df
New functions
- Add several new S3 methods for plot, rbind, str, summary, t, c, extraction, merge, assignment, and type-coercion
Bug fixes
- auto_plot output data frame doesn’t duplicate the cluster column
- Error catching: improved checking of data list sub-item names
- Double transposing ext_solutions_df no longer loses the sim_mats_list attribute
Other
- Typo fixes
- Code formatting
- Computationally intensive examples are now wrapped in donttest rather than commented out
- observations(), summary_features(), features(), and uids() marked as internal
Bug fixes
- Fixed rbind for classes solutions_df and ext_solutions_df not preserving the class type of the contained weights_matrix
Print formatting
- Printing solutions_df or ext_solutions_df restricts output to a maximum of 10 lines by default
Other
- update for CRAN resubmission
Bug fixes
- calc_aris (as of v2; v1 is unaffected) incorrectly excluded the first observation from ARI calculations.
- merge.data_list wasn’t properly integrating updated parameter names
- prevent solutions_df and ext_solutions_df from having 0 rows
- Use the solution column in mc_manhattan_plot() when the extended solutions data frame has no MC labels
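For context on the calc_aris fix above: the adjusted Rand index (ARI) compares two label vectors over the same observations, so silently dropping one observation can change the result. The block below is a minimal base R sketch of the standard ARI formula. It is not metasnf's calc_aris() code. The helper ari() and the toy label vectors are hypothetical.

```r
# Adjusted Rand index between two label vectors (Hubert & Arabie form).
# Every observation is used; nothing is dropped from either vector.
ari <- function(x, y) {
  stopifnot(length(x) == length(y))
  tab <- table(x, y)
  choose2 <- function(n) n * (n - 1) / 2
  sum_ij <- sum(choose2(tab))            # pairs together in both partitions
  sum_a <- sum(choose2(rowSums(tab)))    # pairs together in x
  sum_b <- sum(choose2(colSums(tab)))    # pairs together in y
  expected <- sum_a * sum_b / choose2(length(x))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

a <- c(1, 1, 2, 2, 3, 3)
b <- c(2, 1, 2, 2, 3, 3)  # disagrees with a only on observation 1

ari(a, a)          # 1: identical partitions
ari(a, b)          # below 1: the disagreement is counted
ari(a[-1], b[-1])  # 1: dropping observation 1 hides the disagreement
```

In this toy case, excluding the first observation removes the only disagreement between the two partitions, so an ARI below 1 becomes a perfect score.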
Code formatting
- print.solutions_df title was set as print method for weights matrix
- Replace dl_1/dl_2 with x & y for consistency in merge.data_list()
New functions
- Added as.list() for dist_fns_list, clust_fns_list, and data_list objects
Performance improvements
- Converting the weights matrix to a regular matrix prior to printing reduces print time
- same as last commit
- weights matrix rbinding is faster when treated as a matrix
Print formatting
- Deprecation message on generate_settings_matrix needed paste0
- Printing a solutions data frame with more than 10 rows will default to showing 10 rows
- print.solutions_df() misprinted the number of observations in the solutions data frame
OOP
- merge_dls() is superseded by merge.data_list()
Bug fixes
- ext_solutions_df manipulation won’t drop summary_features and features attributes
- estimate_nclust_given_graph has more resiliency to floating point errors through a tryCatch statement during eigengap quality assignment
- Bugfix: estimate_nclust_given_graph has more resiliency to floating point errors through a tryCatch loop updating eigenvalue scaling
- Added functions: dplyr_row_slice() functions for classes solutions_df and ext_solutions_df
Formatting
- Removed debugging dash lines from extend_solutions()
Bug fixes
- extend_solutions was not assigning feature types properly during p-value calculations
- rbind.ext_solutions_df now takes the ... parameter before the reset_indices parameter to avoid errors during calls with unnamed parameters.
- rbind.solutions_df now takes the ... parameter before the reset_indices parameter to avoid errors during calls with unnamed parameters.
- Slicing an snf_config object made the weights matrix lose its class
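The rbind fixes above depend on how R matches arguments. Any formal argument placed after ... can only be matched by its full name, so all unnamed inputs fall into .... The block below is a minimal sketch using two hypothetical functions. These are not metasnf's rbind methods.

```r
# Hypothetical combiners that only illustrate R's argument matching.
combine_named_first <- function(reset_indices = TRUE, ...) {
  list(reset_indices = reset_indices, parts = list(...))
}
combine_dots_first <- function(..., reset_indices = TRUE) {
  list(reset_indices = reset_indices, parts = list(...))
}

x <- data.frame(id = 1:2)
y <- data.frame(id = 3:4)

# Unnamed call: x is matched positionally to reset_indices, leaving only y in ...
bad <- combine_named_first(x, y)
length(bad$parts)   # 1

# With ... first, reset_indices can only be matched by name,
# so both data frames end up in ... and the default is kept.
good <- combine_dots_first(x, y)
length(good$parts)  # 2
good$reset_indices  # TRUE
```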
Breaking changes
- Extensive changes resulting from a transition to R’s S3 OOP system.
Name changes and new classes
- data list (class list) -> (class data_list, list)
- solutions matrix (class data.frame) -> solutions data frame (class solutions_df, data.frame)
- extended solutions matrix (class data.frame) -> extended solutions data frame (class ext_solutions_df, data.frame)
- settings matrix (class data.frame) -> settings data frame (class settings_df, data.frame)
- distance metrics list (class list) -> distance functions list (class dist_fns_list, list)
- clustering algorithms list (class list) -> clustering functions list (class clust_fns_list, list)
- weights matrix (class matrix, array) -> (class weights_matrix, matrix, array)
Function changes
- generate_data_list() -> data_list()
- Functions related to converting a solutions matrix into a data frame of cluster solutions (get_cluster_df(), get_clusters(), get_cluster_solutions()) are now all superseded by custom transposition of solutions_df class objects (i.e., simply call t())
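Replacing the get_cluster_*() helpers with t() works through S3 dispatch. base::t() is generic, so defining a t.<class>() method changes what t() does for objects of that class. The sketch below uses a hypothetical toy_solutions_df class. Its layout (one row per solution, one column per observation) is an assumption made for illustration. It is not metasnf's actual solutions_df structure or its t() method.

```r
# Hypothetical class: one row per cluster solution, one column per observation.
toy_solutions <- structure(
  data.frame(solution = 1:2, uid_a = c(1, 2), uid_b = c(2, 2)),
  class = c("toy_solutions_df", "data.frame")
)

# S3 method: t(x) dispatches here because base::t() is generic.
t.toy_solutions_df <- function(x) {
  class(x) <- "data.frame"  # drop the custom class before ordinary handling
  labels <- x[, setdiff(names(x), "solution"), drop = FALSE]
  mat <- t(as.matrix(labels))  # rows become observations
  colnames(mat) <- paste0("solution_", x$solution)
  data.frame(uid = rownames(mat), mat, row.names = NULL)
}

clusters <- t(toy_solutions)
clusters  # one row per uid, one column per solution
```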
Workflow changes
- Functionality offered by the settings matrix, distance metrics list, clustering algorithms list, weights matrix, and corresponding functions (generate_settings_matrix(), generate_distance_metrics_list(), generate_weights_matrix(), generate_clust_algs_list()) is now all superseded by the single function snf_config() and the snf_config class object it produces
- Following derivation of a split_vector, either by adjusted_rand_index_heatmap() or shiny_annotator(), solutions_df and ext_solutions_df class objects can be annotated with their meta cluster labels using the function label_meta_clusters(). This is necessary prior to usage of get_representative_solutions().
- Functions that convert non-data frame objects, like a data list, to a data frame have been replaced with as.data.frame()
- Requesting that similarity matrices be returned during batch_snf no longer changes the output structure from a solutions data frame to a list of a solutions data frame and a similarity matrix list. Instead, the similarity matrix list is added to the solutions data frame as an attribute and can be extracted using the function sim_mats_list().
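Storing the similarity matrices as an attribute keeps the return value a data frame while it still carries the extra data. The block below is a minimal base R sketch of that pattern. The attribute name and the get_sim_mats() accessor are hypothetical stand-ins, not metasnf's sim_mats_list() implementation.

```r
# Hypothetical results data frame and similarity matrices.
results <- data.frame(solution = 1:2, nclust = c(2L, 3L))
sim_mats <- list(diag(3), matrix(0.5, 3, 3))

# Attach the list as an attribute instead of returning list(results, sim_mats).
attr(results, "sim_mats_list") <- sim_mats

# Toy accessor, similar in spirit to an extractor such as sim_mats_list().
get_sim_mats <- function(x) attr(x, "sim_mats_list")

is.data.frame(results)         # TRUE: the output structure is unchanged
length(get_sim_mats(results))  # 2
```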
Improvements
- Significant speed improvement to the calculate_coclustering() function
- The p-value heatmap now follows a uni-color palette.
- Customized print() functions have been defined for all major metasnf objects.
- Examples have been added to all major metasnf functions.
- update settings matrix vignette to avoid convergence error on some seeds
- inclusion column bugfixes from 1.1.0
- Verbose parameter added to printing functions. By default set to FALSE.
- CRAN compliant @return values in documentation.
Last update before CRAN submission.
Breaking changes
- Changing seed during settings matrix generation has been deprecated. Please manually call set.seed prior to generate_settings_matrix instead.
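Calling set.seed() once before any random generation makes the whole sequence of draws reproducible, which is why a separate seed argument is unnecessary. The block below is a minimal sketch. random_settings() is a hypothetical stand-in for a random settings generator, not generate_settings_matrix().

```r
# Hypothetical random settings generator (not metasnf code).
random_settings <- function(n) {
  data.frame(
    k = sample(10:30, n, replace = TRUE),
    alpha = runif(n, 0.3, 0.8)
  )
}

set.seed(42)
first <- random_settings(5)
set.seed(42)
second <- random_settings(5)

identical(first, second)  # TRUE: same seed, same draws
```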
Other
- Package size reduced by downscaling vignette images
Bug fix
- Function estimate_nclust_given_graph() occasionally yielded incorrect estimates of the number of clusters as a result of improper scaling in metasnf v0.7.0. The scaling should now be corrected.
Breaking changes
- Considerable changes have been made to the co-clustering workflow, including a new heatmap and density plot.
Possible breaking changes
- Occasionally, spectral clustering may yield an n-cluster solution where n differs from the number of clusters requested as a parameter in the spectral clustering function itself. The spectral clustering functions provided in metasnf now report the actual number of clusters in the generated solution rather than the number of clusters that was requested.
Minor changes
- warnings provided when generating a data list with duplicate feature names
- warnings provided when using mc_manhattan_plot() with a data list containing duplicate feature names
- mc_manhattan_plot() parameter rep_solution replaced with the more accurate name extended_solutions_matrix (solutions matrix with _pval columns)
Bug fix
- SNFtool::estimateNumberOfClustersGivenGraph() could occasionally error out when calculating eigenvectors (eigengap heuristic) for a Laplacian with floating point values that were too small. The adapted function estimate_nclust_given_graph() slightly scales up the Laplacian to reduce the risk of encountering this error (presumably without any change to the resulting cluster number estimate)
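For readers unfamiliar with the eigengap heuristic mentioned above: the estimated number of clusters k is the position of the largest gap between consecutive sorted eigenvalues of a graph Laplacian. The block below is a minimal base R sketch of the plain heuristic on the symmetric normalized Laplacian. SNFtool's estimateNumberOfClustersGivenGraph() and metasnf's estimate_nclust_given_graph() use their own variant and scaling, which this sketch does not reproduce. The toy affinity matrix and the eigengap_k() helper are hypothetical.

```r
# Toy affinity matrix: two tight blocks of three observations, weakly linked.
block <- matrix(0.9, 3, 3)
W <- rbind(cbind(block, matrix(0.01, 3, 3)),
           cbind(matrix(0.01, 3, 3), block))
diag(W) <- 0

# Plain eigengap heuristic on L = I - D^(-1/2) W D^(-1/2).
eigengap_k <- function(W, max_k = 5) {
  d_inv_sqrt <- diag(1 / sqrt(rowSums(W)))
  L <- diag(nrow(W)) - d_inv_sqrt %*% W %*% d_inv_sqrt
  ev <- sort(eigen(L, symmetric = TRUE, only.values = TRUE)$values)
  gaps <- diff(ev[1:(max_k + 1)])  # gaps[k] = lambda_(k+1) - lambda_k
  which.max(gaps)                  # k with the largest gap
}

eigengap_k(W)  # 2: one small eigenvalue per block, then a large jump
```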
New functionality
- get_matrix_order has arguments allowing users to control which distance metric and agglomerative hierarchical clustering method are used to sort matrices
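Matrix ordering of this kind typically comes from hierarchical clustering of the rows: compute a distance matrix, cluster it, and read off the dendrogram order. The block below is a minimal sketch using base R's stats::dist() and stats::hclust(), whose method arguments control the same two choices. row_order() is a hypothetical helper. It does not reflect get_matrix_order()'s signature or internals.

```r
set.seed(1)
m <- matrix(runif(30), nrow = 6)

# Hypothetical helper: order rows by an agglomerative clustering dendrogram.
row_order <- function(m, dist_method = "euclidean", hclust_method = "complete") {
  d <- stats::dist(m, method = dist_method)
  hc <- stats::hclust(d, method = hclust_method)
  hc$order  # permutation of row indices following the dendrogram leaves
}

row_order(m)
row_order(m, dist_method = "manhattan", hclust_method = "average")
m_sorted <- m[row_order(m), ]
```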
Minor changes
- More consistent usage of “feature” over “variable” across documentation.
- New mock ABCD dataframes - like the old ones, but without the “abcd_” prefix and with a more accurate “unique_id” UID column rather than “patient”
New functionality
- get_complete_uids quickly pulls UIDs of observations with complete data from a list of dataframes
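The idea behind get_complete_uids can be sketched in base R: keep the UIDs of complete rows in each data frame, then intersect them across data frames. The helper complete_uids(), its uid argument, and the example data are hypothetical and do not reflect the function's actual signature.

```r
# Hypothetical helper: UIDs with no missing values in every data frame.
complete_uids <- function(df_list, uid = "unique_id") {
  per_df <- lapply(df_list, function(df) df[[uid]][stats::complete.cases(df)])
  Reduce(intersect, per_df)
}

cortical <- data.frame(unique_id = c("s1", "s2", "s3"),
                       thickness = c(2.1, NA, 2.4))
income <- data.frame(unique_id = c("s1", "s2", "s3", "s4"),
                     income = c(3, 1, NA, 2))

complete_uids(list(cortical, income))  # "s1"
```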
Bug fix
- extend_solutions doesn’t crash on multi-feature target lists
Minor changes
- Warning message provided when subjects are dropped during generate_data_list()
- New remove_missing parameter for generate_data_list allowing subjects with incomplete data to remain in the data list
Bug fixes
- ensure cluster variable is treated as factor during autoplotting
- bugfix on autoplots built from tibbles rather than dataframes
Improvements
- Added clarity to lp_solutions_matrix error message when the training set is not a subset of the full data list
- generate_data_list list elements are now named after their components
- Added heatmap parameters to increase plotting flexibility
New functionality
- added generic save_plot function and option to pass cluster_df directly into auto_plot (useful for label propagation)
- Added merge_data_lists functionality to horizontally merge data lists
Bug fixes
- extend_solutions() will no longer crash when a data_list has the UID column in a non-first position.
- generate_data_list() enforces the UID column to be in the first position of each dataframe.
New functionality
- auto_plot() will automatically generate bar and/or jitter plots showing how features in a data_list/target_list are distributed across a single cluster solution
New functionality
- shiny_annotator() function can be used to identify indices of meta clusters within an adjusted_rand_index_heatmap
- adjusted_rand_index_heatmap() now has a split_vector parameter that will slice a heatmap into meta clusters
- rename_dl() can be used to rename features in a data_list
- manhattan_plot has been split into var_manhattan_plot (key variable - all variables), esm_manhattan_plot (cluster solutions in an extended solutions matrix to all variables), and mc_manhattan_plot (like esm_manhattan_plot, but at the meta-cluster level)
- get_representative_solutions extracts max-ARI solutions from an extended solutions matrix based on a split_vector containing meta cluster boundaries
- batch_nmi calculates NMI scores (see https://branchlab.github.io/metasnf/articles/nmi_scores.html)
- extend_solutions will only calculate p-value summary measures (min/max/mean) for a data_list passed in as a target_list parameter, but will also accept and calculate p-values for a data_list passed in through the data_list parameter
- Heatmap functions adjusted_rand_index_heatmap and assoc_pval_heatmap have updated parameters to improve ease of use and flexibility (including easier colour control)
Deprecated functions
- get_clustered_subs has been removed (does the same thing as get_cluster_df)
- get_cluster_pval deprecated for calc_assoc_pval
- All functions related to target_lists specifically have been deprecated in favour of simply using generate_data_list() and its corresponding functions
Name changes
- remove_signal has been renamed to linear_adjust to better reflect its function
- summarize_distance_metrics_list has been shortened to summarize_dml
- correlation_pval_heatmap has been renamed to assoc_pval_heatmap
- calc_om_aris has been renamed to calc_aris
New vignettes
- NMI scores: https://branchlab.github.io/metasnf/articles/nmi_scores.html
- Imputations: https://branchlab.github.io/metasnf/articles/imputations.html
Other changes
- Vignettes have been updated
- Warnings are raised if spectral clustering does not generate a cluster solution matching the number of clusters requested
- Chi-squared and extend_solutions p-value calculation warnings are now suppressed
Breaking changes
- All variables and values referencing p-values have been rephrased to end in _pval instead of a mix of p_val, pval, and p.
- Removal of deprecated functions pval_select, p_val_select, top_oms_per_cluster, check_subj_orders_for_lp, get_p, chi_sq_pval,
- Function pval_summaries, which would calculate min/max/mean p-values, has been replaced with summarize_pvals
- train_test_assign now provides results as a named list of subject vectors instead of a data.frame. The keep_split function has been removed accordingly.
Other changes
- sort_subjects parameter added to generate_data_list to allow for sorting of subjects in the data_list
- Fix bug in extend_solutions that incorrectly assigned p-values to variable columns through grep (substring instead of exact match)
- extend_solutions can now also be parallelized (see ?extend_solutions)
- remove_signal function has a sig_digs parameter that can be used to restrict how many significant figures are returned in the resulting residuals
- calc_om_aris is now MUCH faster after removing excessive calls to as.numeric and enabling parallel processing with future.apply. Thanks for the idea, Alper.
- Reformatting of extend_solutions to better handle extreme p-values (e.g. infinity)
- Replacement of p_val_select with pval_select, which can also return negative-log p-values
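The extend_solutions grep bug above is a substring-matching problem: a pattern such as "age" also matches "stage_pval". The block below is a minimal base R illustration with hypothetical column names, comparing grep() with an exact match.

```r
pval_cols <- c("stage_pval", "age_pval", "height_pval")
feature <- "age"

# Substring match: "stage_pval" is caught as well as "age_pval".
grep(feature, pval_cols)                      # 1 2

# Exact match on the expected column name picks only the intended column.
which(pval_cols == paste0(feature, "_pval"))  # 2
```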
Bug fixes
- generate_data_list correctly errors when components are only partially named (resolves https://github.com/BRANCHlab/metasnf/issues/10)
Breaking changes
- lp_row function has been replaced by lp_solutions_matrix. The new function is order agnostic: full data lists can be constructed without any restriction on how training and testing set subjects are sorted. Subjects present in the provided solutions matrix to propagate are assumed to be the training subjects.
New functionality
- calc_om_aris now has a progress parameter. When set to TRUE and used in conjunction with progressr::with_progress(), a progress bar is shown for the calculations. Learn more with ?calc_om_aris.
Bug fixes
- grepl instead of grep used in extend_solutions to reduce errors when no chi-squared warning occurs
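The grepl change matters because grep() returns integer(0) when nothing matches. Using a zero-length value as an if() condition is an error, whereas grepl() always returns TRUE or FALSE. The block below is a minimal base R illustration with a hypothetical warning message.

```r
warning_msg <- "some other warning"

# grep() gives integer(0) here, and if() on a zero-length value is an error.
res <- tryCatch(
  if (grep("Chi-squared", warning_msg)) "chi-squared" else "other",
  error = function(e) "error"
)
res   # "error"

# grepl() returns FALSE, so the condition is always well defined.
res2 <- if (grepl("Chi-squared", warning_msg)) "chi-squared" else "other"
res2  # "other"
```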
Other changes
- A vignette specifically for label propagation has been added
- Full removal of several previously deprecated functions
- Minor source code reformatting
New functionality
- Parallel processing is now available! Check out the vignette here: https://branchlab.github.io/metasnf/articles/parallel_processing.html
Breaking changes
- input_wt and domain_wt are removed from settings_matrix and the rest of the package; weighting at this level is no longer planned. This will result in altered settings matrices, but only superficially: the columns “input_wt” and “domain_wt” will be missing, but they had no effect on SNF prior to this patch anyway.
- keep_split will preserve observations that were assigned a split but were not present in the dataframe being split. Instead of being removed, those observations will have NA values.
Bug fixes
- Fixed fraction_clustered_together crashing when a cluster was assigned to only a single observation
- Fixed fraction_clustered_together not running due to a bracket typo when evaluating the length of the data_list
New functionality
- correlation_pval_heatmap function can have significance stars disabled with the significance_stars parameter
Other changes
- pkgdown site now has google site verification code
Breaking changes
- The original SNFtool function estimateNumberOfClustersGivenGraph has been used up to this point without specifying a parameter for NUMC. Consequently, final similarity matrices clustered with the default methods (spectral clustering based on eigen-gap or rotation cost heuristics) could not yield more than 5 clusters. The default functions have been updated to span 2 to 10 clusters. Users will likely see different clustering results as a result of this change. To replicate the behaviour of default spectral clustering prior to v0.3.0, users should run the following code before the batch_snf command:
clust_algs_list <- generate_clust_algs_list(
"spectral_eigen" = spectral_eigen_classic,
"spectral_rot" = spectral_rot_classic
)
# Adapt below as necessary
solutions_matrix <- batch_snf(
data_list,
settings_matrix,
clust_algs_list = clust_algs_list
)
- Added a “workspace=2e7” parameter to the fisher_exact_pval function to avoid the “FEXACT” error (like here https://github.com/Lagkouvardos/Rhea/issues/17). Impact on results is expected to be negligible.
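The workspace argument belongs to base R's stats::fisher.test(). A larger workspace gives the network algorithm room to handle larger r x c tables, which can avoid the FEXACT error. The block below is a minimal sketch on a hypothetical 3 x 4 contingency table. It is not the body of fisher_exact_pval().

```r
# Hypothetical 3 x 4 contingency table (e.g., cluster x categorical feature).
tbl <- matrix(c(12, 5, 7, 9, 3, 8, 6, 10, 4, 11, 2, 9), nrow = 3)

# workspace is in units of 4 bytes; 2e7 matches the value mentioned above.
res <- stats::fisher.test(tbl, workspace = 2e7)
res$p.value
```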
New functionality
- Function remove_signal() enables linearly correcting a data_list for confounders / unwanted signal. A vignette is available: https://branchlab.github.io/metasnf/articles/confounders.html.
- batch_snf() has a new parameter automatic_standard_normalize that swaps the default numeric distance measures (euclidean) for standard normalized variants.
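Both ideas can be sketched in plain base R. The sketch assumes that linear correction means keeping least-squares residuals and that standard normalization means centring and scaling each column before computing distances. This is not the code behind remove_signal() or automatic_standard_normalize, and the data are simulated.

```r
set.seed(7)
n <- 50
confounder <- rnorm(n)
feature <- 2 * confounder + rnorm(n)

# Linear adjustment: residuals of feature ~ confounder are uncorrelated
# with the confounder by construction of least squares.
adjusted <- stats::residuals(stats::lm(feature ~ confounder))
cor(adjusted, confounder)  # numerically zero

# Standard normalization before Euclidean distances: without it,
# column a (large spread) would dominate column b.
x <- cbind(a = rnorm(n, 100, 20), b = rnorm(n, 0, 1))
d_raw <- stats::dist(x)
d_std <- stats::dist(scale(x))  # each column now has mean 0 and sd 1
```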
Other changes
- Added a NEWS.md file to track changes to the package.