Introduction to the AnVILAz package
Installation
The package is not yet available from Bioconductor.
Install the development version of the AnVILAz package from GitHub with
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager", repos = "https://cran.r-project.org")
BiocManager::install("Bioconductor/AnVILAz")
Once installed, load the package with
library(AnVILAz)
File Management
In this tutorial, we refer to the Azure Blob Storage service as ABS. Within the ABS, we are given access to a container. For more information, see Microsoft's documentation on containers and blobs.
List Azure Blob Storage Container Files
avlist()
The avlist command lists the files in the Blob container on Azure. The same files can also be browsed with the Microsoft Azure Storage Explorer.
[Figure: Azure Storage Explorer]
Uploading a file
As an example, we load the built-in mtcars dataset, take its first six rows, and save them as an .Rda file with save. We can then upload this file to the ABS.
data("mtcars", package = "datasets")
test <- head(mtcars)
save(test, file = "mydata.Rda")
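Before uploading, it can help to confirm locally that the .Rda file restores what you expect. The sketch below uses only base R (no Azure access) and writes to a temporary path rather than mydata.Rda; the names rda_path, restored, and loaded_names are illustrative and not part of AnVILAz.

```r
# Recreate the example object from the tutorial
data("mtcars", package = "datasets")
test <- head(mtcars)

# Write to a temporary file so the working directory is left untouched
rda_path <- file.path(tempdir(), "mydata.Rda")
save(test, file = rda_path)

# load() restores objects into the given environment and
# returns the names of the objects it restored
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)
print(loaded_names)

# The restored data frame should match the original exactly
stopifnot(identical(restored$test, test))
```

Loading into a fresh environment, rather than the global one, avoids silently overwriting an existing object named test.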
Now we can upload the file to the analyses/ folder in the ABS container.
avcopy("mydata.Rda", "analyses/")
We can also use a small log file for demonstration purposes. The jupyter.log file is already present in our workspace directory.
avcopy("jupyter.log", "analyses/")
Deleting a file
We can remove the uploaded file with avremove, giving the path of the .Rda file within the container.
avremove("analyses/mydata.Rda")
Downloading from the ABS
The reverse operation is also possible: pass the remote and local paths as the first and second arguments, respectively.
avcopy("analyses/jupyter.log", "./test/")
Folder-wise upload to ABS
To upload an entire folder, we can use avbackup. Note that in this example the entire test folder becomes a subfolder of the remote analyses folder.
avbackup("./test/", "analyses/")
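The nesting behavior described above resembles what base R's file.copy() does with recursive = TRUE when the destination is an existing directory. The following is a local analogy only, assuming a scratch directory under tempdir(); it does not call avbackup or touch the ABS, and the folder names are placeholders.

```r
# Scratch directories under the session's temporary directory (placeholders)
root <- file.path(tempdir(), "avbackup-demo")
src <- file.path(root, "test")
dest <- file.path(root, "analyses")
dir.create(src, recursive = TRUE, showWarnings = FALSE)
dir.create(dest, recursive = TRUE, showWarnings = FALSE)
writeLines("example log line", file.path(src, "jupyter.log"))

# Copy the folder itself (not just its contents) into the destination
file.copy(src, dest, recursive = TRUE)

# The whole 'test' folder now appears as a subfolder of 'analyses'
list.files(dest, recursive = TRUE, include.dirs = TRUE)
```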
Folder-wise download from ABS
With avrestore, the entire source directory is copied by default to the current working directory ("."), i.e., the base workspace directory.
avrestore("analyses/test")
You can instead download it to another folder by providing a folder name as the second argument.
avrestore("analyses/test", "test")
The DATA tab
mtcars example
First, we create an example dataset to upload to the DATA tab. We create a model_id column from the row names, replacing spaces with hyphens.
library(dplyr)
mtcars_tbl <-
    mtcars |>
    as_tibble(rownames = "model_id") |>
    mutate(model_id = gsub(" ", "-", model_id))
Uploading data
The avtable_import command takes an existing R object (usually a tibble) and uploads it to the DATA tab in the AnVIL user interface. The table argument sets the name of the table. We also need to provide primaryKey, the name of the column whose values uniquely identify each row. Typically, the primaryKey column holds patient identifiers or UUIDs and is the first column of the data.
mtcars_tbl |> avtable_import(table = "testData", primaryKey = "model_id")
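Because avtable_import relies on the primaryKey column to identify rows, a quick local check for missing or duplicated keys before uploading can catch problems early. This is a minimal sketch in base R; is_valid_primary_key() is a hypothetical helper, not an AnVILAz function, and the data frame is rebuilt without dplyr so the block runs on its own.

```r
data("mtcars", package = "datasets")

# Same key construction as the dplyr pipeline above, in base R:
# row names become 'model_id', with spaces replaced by hyphens
mtcars_df <- data.frame(
  model_id = gsub(" ", "-", rownames(mtcars)),
  mtcars,
  row.names = NULL
)

# Hypothetical helper (not part of AnVILAz): TRUE if 'key' is a column
# with no missing and no duplicated values
is_valid_primary_key <- function(df, key) {
  key %in% names(df) &&
    !anyNA(df[[key]]) &&
    !anyDuplicated(df[[key]])
}

is_valid_primary_key(mtcars_df, "model_id")
```

By contrast, is_valid_primary_key(mtcars_df, "cyl") returns FALSE, because the cyl column repeats values and so cannot identify rows uniquely.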
Downloading data
The avtable function pulls the data from the DATA tab and returns it locally as a tibble. It identifies the data by the same type identifier (i.e., the table argument) that was used when the data was uploaded.
model_data <- avtable(table = "testData")
head(model_data)
Delete a row in the table
Specific rows can be deleted with avtable_delete_values. To indicate which rows to delete, provide a value or set of values from the primaryKey column. In this example, we remove the AMC-Javelin entry, leaving 31 of the original 32 records.
avtable_delete_values(table = "testData", values = "AMC-Javelin")
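To see what deleting by primaryKey value means for the table's contents, the same filtering can be reproduced on a local copy of the example data. This base R sketch only mimics the effect; it does not call avtable_delete_values or modify the DATA tab, and values_to_delete is an illustrative name.

```r
data("mtcars", package = "datasets")

# Local copy of the example table, keyed by 'model_id'
mtcars_df <- data.frame(
  model_id = gsub(" ", "-", rownames(mtcars)),
  mtcars,
  row.names = NULL
)

# One key value, or a character vector of several
values_to_delete <- "AMC-Javelin"

# Keep only rows whose key is not in the deletion set
remaining <- mtcars_df[!mtcars_df$model_id %in% values_to_delete, ]

nrow(mtcars_df)  # 32
nrow(remaining)  # 31
```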
Delete entire table
To remove the entire table from the DATA tab, use the avtable_delete function with the corresponding table name.
avtable_delete(table = "testData")
Session information
## R version 4.5.0 RC (2025-04-04 r88126)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] AnVILAz_1.2.0 BiocStyle_2.36.0
##
## loaded via a namespace (and not attached):
## [1] vctrs_0.6.5 httr_1.4.7 cli_3.6.4
## [4] knitr_1.50 rlang_1.1.6 xfun_0.52
## [7] jsonlite_2.0.0 glue_1.8.0 rjsoncons_1.3.2
## [10] htmltools_0.5.8.1 BiocBaseUtils_1.10.0 sass_0.4.10
## [13] rmarkdown_2.29 rappdirs_0.3.3 evaluate_1.0.3
## [16] jquerylib_0.1.4 tibble_3.2.1 fastmap_1.2.0
## [19] yaml_2.3.10 lifecycle_1.0.4 httr2_1.1.2
## [22] bookdown_0.43 BiocManager_1.30.25 compiler_4.5.0
## [25] pkgconfig_2.0.3 digest_0.6.37 R6_2.6.1
## [28] pillar_1.10.2 magrittr_2.0.3 bslib_0.9.0
## [31] tools_4.5.0 AnVILBase_1.2.0 cachem_1.1.0