
SomaDataIO


The SomaDataIO R package loads and exports ‘SomaScan’ data via the Standard BioTools, Inc. structured text file called an ADAT (*.adat). The package also exports auxiliary functions for manipulating, wrangling, and extracting relevant information from an ADAT object once in memory. Basic familiarity with the R environment is assumed, as is the ability to install contributed packages from the Comprehensive R Archive Network (CRAN).

If you run into any issues or problems with SomaDataIO, full documentation of the most recent release can be found at our website of articles and workflows. If the issue persists, we encourage you to consult the issues page and, if appropriate, submit an issue and/or feature request.


Usage

The SomaDataIO package is licensed under the MIT license and is intended solely for research use only (“RUO”) purposes. The code contained herein may not be used for diagnostic, clinical, therapeutic, or other commercial purposes.

Installation

The easiest way to install SomaDataIO is to install directly from CRAN:

install.packages("SomaDataIO")

Alternatively from GitHub:

remotes::install_github("SomaLogic/SomaDataIO")

which installs the most current “development” version from the repository HEAD. To install the most recent release, use:

remotes::install_github("SomaLogic/SomaDataIO@*release")

To install a specific tagged release, use:

remotes::install_github("SomaLogic/SomaDataIO@v5.3.0")

Package Dependencies

The SomaDataIO package was intentionally developed to contain a limited number of dependencies from CRAN. This makes the package more stable to external software design changes but also limits its contained feature set. With this in mind, SomaDataIO aims to strike a balance between long(er)-term stability and a limited set of features. Below are the package dependencies (see also the DESCRIPTION file):

Biobase

The Biobase package is suggested, being required by only two functions, pivotExpressionSet() and adat2eSet(). Biobase must be installed separately from Bioconductor by entering the following from the R Console:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("Biobase", version = remotes::bioc_version())

Information about Bioconductor can be found here: https://bioconductor.org/install/
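As a minimal sketch, assuming Biobase is already installed, the conversion named above can be tried on the bundled example_data object. The call is guarded so that it is skipped when Biobase is missing; the resulting class is taken from Biobase, not from this document.

```r
# Only run the conversion when the suggested package is available
if (requireNamespace("Biobase", quietly = TRUE)) {
  library(SomaDataIO)

  # Convert the bundled example soma_adat into a Biobase object
  eset <- adat2eSet(example_data)
  class(eset)
}
```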

Loading

Upon successful installation, load SomaDataIO as normal:

library(SomaDataIO)

For an index of available commands:

library(help = SomaDataIO)


Objects and Data

The SomaDataIO package comes with five (5) objects available to users to run canned examples (or analyses). They can be accessed once SomaDataIO has been attached via library(). They are:
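The bundled objects can also be listed directly from the installed package. This sketch uses only the base R utils::data() function, so the names it prints come from whichever SomaDataIO version is installed rather than from this document.

```r
# Query the installed package for the datasets it ships with
pkg_data <- utils::data(package = "SomaDataIO")$results

# "Item" holds the object name, "Title" a one-line description
pkg_data[, c("Item", "Title")]
```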


Main (I/O) Features

Loading an ADAT

Loading an ADAT text file is simple using read_adat():

# Note: This system.file() command returns a filepath to the example_data10
# object in the SomaDataIO package
adat_path <- system.file("extdata", "example_data10.adat",
                         package = "SomaDataIO", mustWork = TRUE)
adat_path
#> [1] "/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/SomaDataIO/extdata/example_data10.adat"

# adat_path should be the elaborated path and file name of the *.adat file to
# be loaded into the R workspace from your local file system
# (e.g. adat_path = "PATH_TO_ADAT/my_adat.adat")

my_adat <- read_adat(file = adat_path)

# test object class
is.soma_adat(my_adat)
#> [1] TRUE

# S3 print method (forwards -> tibble)
my_adat
#> ══ SomaScan Data ═══════════════════════════════════════════════════════════════
#>      SomaScan version     V4 (5k)
#>      Signal Space         5k
#>      Attributes intact    ✓
#>      Rows                 10
#>      Columns              5318
#>      Clinical Data        34
#>      Features             5284
#> ── Column Meta ─────────────────────────────────────────────────────────────────
#> ℹ SeqId, SeqIdVersion, SomaId, TargetFullName, Target, UniProt, EntrezGeneID,
#> ℹ EntrezGeneSymbol, Organism, Units, Type, Dilution, PlateScale_Reference,
#> ℹ CalReference, Cal_Example_Adat_Set001, ColCheck,
#> ℹ CalQcRatio_Example_Adat_Set001_170255, QcReference_170255,
#> ℹ Cal_Example_Adat_Set002, CalQcRatio_Example_Adat_Set002_170255, Dilution2
#> ── Tibble ──────────────────────────────────────────────────────────────────────
#> # A tibble: 10 × 5,319
#>    row_names      PlateId  PlateRunDate ScannerID PlatePosition SlideId Subarray
#>
#>  1 258495800012_3 Example… 2020-06-18   SG152144… H9            2.58e11        3
#>  2 258495800004_7 Example… 2020-06-18   SG152144… H8            2.58e11        7
#>  3 258495800010_8 Example… 2020-06-18   SG152144… H7            2.58e11        8
#>  4 258495800003_4 Example… 2020-06-18   SG152144… H6            2.58e11        4
#>  5 258495800009_4 Example… 2020-06-18   SG152144… H5            2.58e11        4
#>  6 258495800012_8 Example… 2020-06-18   SG152144… H4            2.58e11        8
#>  7 258495800001_3 Example… 2020-06-18   SG152144… H3            2.58e11        3
#>  8 258495800004_8 Example… 2020-06-18   SG152144… H2            2.58e11        8
#>  9 258495800001_8 Example… 2020-06-18   SG152144… H12           2.58e11        8
#> 10 258495800004_3 Example… 2020-06-18   SG152144… H11           2.58e11        3
#> # ℹ 5,312 more variables: SampleId , SampleType ,
#> #   PercentDilution , SampleMatrix , Barcode , Barcode2d ,
#> #   SampleName , SampleNotes , AliquotingNotes ,
#> #   SampleDescription , …
#> ════════════════════════════════════════════════════════════════════════════════

Please see the article Loading and Wrangling SomaScan for more details and options.
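Since the package exports ADATs as well as loading them, a round trip is a quick sanity check. The sketch below assumes the package's write_adat() function with an (x, file) calling convention, which is not shown in this section; out_path is a throwaway temporary file introduced purely for illustration.

```r
library(SomaDataIO)

# Load the example ADAT that ships with the package
adat_path <- system.file("extdata", "example_data10.adat",
                         package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(adat_path)

# Export to a temporary file, keeping the ".adat" extension
out_path <- tempfile(fileext = ".adat")
write_adat(my_adat, file = out_path)

# Read the exported file back and compare dimensions with the original
round_trip <- read_adat(out_path)
identical(dim(round_trip), dim(my_adat))
```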

Wrangling

The soma_adat class comes with numerous class-specific S3 methods for the most popular dplyr and tidyr generics.

# see full complement of soma_adat methods
methods(class = "soma_adat")
#>  [1] [              [[             [[<-           [<-            ==
#> [6] $ $<- anti_join arrange count
#> [11] filter full_join getAdatVersion getAnalytes getMeta
#> [16] group_by inner_join is_seqFormat left_join Math
#> [21] median merge mutate Ops print
#> [26] rename right_join row.names<- sample_frac sample_n
#> [31] semi_join separate slice_sample slice summary
#> [36] Summary transform ungroup unite
#> see '?methods' for accessing help and source code
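A short sketch of a few of these methods in use, run against the example file loaded earlier. The PlatePosition values used in the filter are taken from the printed output above; everything else is standard dplyr syntax dispatching to the soma_adat methods.

```r
library(SomaDataIO)

adat_path <- system.file("extdata", "example_data10.adat",
                         package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(adat_path)

# Column names split into analyte features and clinical/sample metadata
analytes <- getAnalytes(my_adat)
meta     <- getMeta(my_adat)
length(analytes)
length(meta)

# dplyr verbs dispatch to the soma_adat methods, so the class is retained
subset_adat <- my_adat |>
  dplyr::filter(PlatePosition %in% c("H9", "H8")) |>
  dplyr::arrange(PlatePosition)

is.soma_adat(subset_adat)
nrow(subset_adat)
```

The two lengths should correspond to the "Features" and "Clinical Data" counts reported by the print method.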

Merging Sample Annotation Data

The example_data object includes some sample annotation data built-in, with the variables Age and Sex included for clinical samples, but in practice ADAT files generally do not have any clinical or sample annotation data fields included.

To merge sample annotation data into an existing soma_adat class object, use the left_join() method. Here, joining the ex_clin_data object adds two additional clinical variables, smoking_status and alcohol_use:

# clin_path should be the elaborated path and file name of the *.csv or
# similar file to be loaded into the R workspace from your local file system
# (e.g. clin_path = "PATH_TO_CLIN/clin_data.csv")
# clin_data <- readr::read_csv(clin_path)

merged_adat <- my_adat |>
  dplyr::left_join(ex_clin_data, by = "SampleId")

merged_adat |>
  dplyr::select(SampleId, Age, Sex, smoking_status, alcohol_use) |>
  head(n = 3)
#> ══ SomaScan Data ═══════════════════════════════════════════════════════════════
#>      SomaScan version     V4 (5k)
#>      Signal Space         5k
#>      Attributes intact    ✓
#>      Rows                 3
#>      Columns              5
#>      Clinical Data        5
#>      Features             0
#> ── Column Meta ─────────────────────────────────────────────────────────────────
#> ℹ SeqId, SeqIdVersion, SomaId, TargetFullName, Target, UniProt, EntrezGeneID,
#> ℹ EntrezGeneSymbol, Organism, Units, Type, Dilution, PlateScale_Reference,
#> ℹ CalReference, Cal_Example_Adat_Set001, ColCheck,
#> ℹ CalQcRatio_Example_Adat_Set001_170255, QcReference_170255,
#> ℹ Cal_Example_Adat_Set002, CalQcRatio_Example_Adat_Set002_170255, Dilution2
#> ── Tibble ──────────────────────────────────────────────────────────────────────
#> # A tibble: 3 × 6
#>   row_names      SampleId   Age Sex   smoking_status alcohol_use
#>
#> 1 258495800012_3 1 76 F Never Yes
#> 2 258495800004_7 2 55 F Never Yes
#> 3 258495800010_8 3 47 M Never No
#> ════════════════════════════════════════════════════════════════════════════════

Please see the article Loading and Wrangling SomaScan for more details about available soma_adat methods.
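After a merge it is worth confirming that no rows were gained or lost and checking which samples had no annotation match. A minimal sketch, assuming SampleId is unique in ex_clin_data; anti_join() is one of the soma_adat methods listed earlier.

```r
library(SomaDataIO)

adat_path <- system.file("extdata", "example_data10.adat",
                         package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(adat_path)

merged_adat <- dplyr::left_join(my_adat, ex_clin_data, by = "SampleId")

# A left join should keep every ADAT row and preserve the soma_adat class
nrow(merged_adat) == nrow(my_adat)
is.soma_adat(merged_adat)

# Rows of the ADAT that have no matching SampleId in the annotation table
unmatched <- dplyr::anti_join(my_adat, ex_clin_data, by = "SampleId")
nrow(unmatched)
```

Unmatched rows are not dropped by left_join(); they simply carry NA in the newly added columns.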

ADAT structure

The soma_adat object also contains specific structural elements that are useful to users. Please also see ?colmeta or ?annotations for further details about these fields.
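One way to look at this structure is to inspect the object's attributes directly. The sketch below uses base R attributes() plus getAnalyteInfo(), which is assumed here to be the package's accessor for per-analyte annotations (see ?getAnalyteInfo to confirm for your installed version); the attribute names printed depend on that version.

```r
library(SomaDataIO)

adat_path <- system.file("extdata", "example_data10.adat",
                         package = "SomaDataIO", mustWork = TRUE)
my_adat <- read_adat(adat_path)

# Names of the attributes carried alongside the data frame itself
names(attributes(my_adat))

# Per-analyte annotation table (assumed accessor, one row per analyte)
anno <- getAnalyteInfo(my_adat)
dim(anno)
```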


Typical ‘SomaScan’ Analysis

This section now lives in individual package articles. For further detail please see:

Note that, in an effort to reduce package size and dependencies, these articles and workflows are only accessible via the SomaDataIO pkgdown website, and are not included with the installed package.


MIT LICENSE