Apply Functions Over Array Margins via Blocking (original) (raw)
seqBlockApply: Apply Functions Over Array Margins via Blocking
Description Usage Arguments Details Value Author(s) See Also Examples
Returns a vector or list of values obtained by applying a function to margins of genotypes and annotations via blocking.
seqBlockApply(gdsfile, var.name, FUN, margin=c("by.variant"), as.is=c("none", "list", "unlist"), var.index=c("none", "relative", "absolute"), bsize=1024L, parallel=FALSE, .useraw=FALSE, .padNA=TRUE, .tolist=FALSE, .progress=FALSE, ...) |
---|
gdsfile | a SeqVarGDSClass object |
---|---|
var.name | the variable name(s), see details |
FUN | the function to be applied |
margin | giving the dimension which the function will be applied over |
as.is | returned value: a list, an integer vector, etc; return nothing by default as.is="none"; as.is can be aconnection object, or a GDS node gdsn.classobject; if "unlist" is used, produces a vector which contains all the atomic components, via unlist(..., recursive=FALSE) |
var.index | if "none" (by default), call FUN(x, ...)without variable index; if "relative" or "absolute", add an argument to the user-defined function FUN likeFUN(index, x, ...) where index is an index of variant starting from 1 if margin = "by.variant": "relative" for indexing in the selection defined by seqSetFilter,"absolute" for indexing with respect to all data |
bsize | block size |
parallel | FALSE (serial processing), TRUE (multicore processing), numeric value or other value; parallel is passed to the argument cl in seqParallel, seeseqParallel for more details. |
.useraw | TRUE, force to use RAW instead of INTEGER for genotypes and dosages; FALSE, use INTEGER; NA, use RAW instead of INTEGER if possible; for genotypes, 0xFF is missing value if RAW is used |
.padNA | TRUE, pad a variable-length vector with NA if the number of data points for each variant is not greater than 1 |
.tolist | if TRUE, return a list of vectors instead of the structure list(length, data) for variable-length data |
.progress | if TRUE, show progress information |
... | optional arguments to FUN |
The variable name should be "sample.id"
, "variant.id"
,"position"
, "chromosome"
, "allele"
, "genotype"
,"annotation/id"
, "annotation/qual"
, "annotation/filter"
,"annotation/info/VARIABLE_NAME"
, or"annotation/format/VARIABLE_NAME"
.
"@genotype"
, "annotation/info/@VARIABLE_NAME"
or"annotation/format/@VARIABLE_NAME"
are used to obtain the index associated with these variables.
"$dosage"
is also allowed for the dosages of reference allele (integer: 0, 1, 2 and NA for diploid genotypes).
"$dosage_alt"
returns a RAW/INTEGER matrix for the dosages of alternative allele without distinguishing different alternative alleles.
"$num_allele"
returns an integer vector with the numbers of distinct alleles.
"$ref"
returns a character vector of reference alleles
"$alt"
returns a character vector of alternative alleles (delimited by comma)
"$chrom_pos"
returns characters with the combination of chromosome and position, e.g., "1:1272721". "$chrom_pos_allele"
returns characters with the combination of chromosome, position and alleles, e.g., "1:1272721_A_G" (i.e., chr:position_REF_ALT).
"$variant_index"
returns the indices of selected variants starting from 1, and "$sample_index"
returns the indices of selected samples starting from 1.
The algorithm is highly optimized by blocking the computations to exploit the high-speed memory instead of disk.
A vector, a list of values or none.
Xiuwen Zheng
[seqApply](/bioc/SeqArray/man/seqApply.html)
, [seqSetFilter](/bioc/SeqArray/man/seqSetFilter.html)
,[seqGetData](/bioc/SeqArray/man/seqGetData.html)
, [seqParallel](/bioc/SeqArray/man/seqParallel.html)
,[seqGetParallel](/bioc/SeqArray/man/seqGetParallel.html)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | # the GDS file (gds.fn <- seqExampleFileName("gds")) # display (f <- seqOpen(gds.fn)) # get 'sample.id (samp.id <- seqGetData(f, "sample.id")) # "NA06984" "NA06985" "NA06986" ... # get 'variant.id' head(variant.id <- seqGetData(f, "variant.id")) # set sample and variant filters set.seed(100) seqSetFilter(f, sample.id=samp.id[c(2,4,6,8,10)], variant.id=sample(variant.id, 10)) # read seqApply(f, "genotype", FUN=print, margin="by.variant") seqApply(f, "genotype", FUN=print, margin="by.variant", .useraw=TRUE) seqApply(f, "genotype", FUN=print, margin="by.sample") seqApply(f, "genotype", FUN=print, margin="by.sample", .useraw=TRUE) # read in block seqGetData(f, "$dosage") seqBlockApply(f, "$dosage", print, bsize=3) seqBlockApply(f, "$dosage", function(x) x, as.is="list", bsize=3) seqBlockApply(f, c(dos="$dosage", pos="position"), print, bsize=3) # close the GDS file seqClose(f) |
---|
SeqArray documentation built on Nov. 8, 2020, 5:08 p.m.