Building an annotation database for metaseqR2
Supported organisms
The following organisms (essentially genome versions) are supported for automatic database builds:
- Human (Homo sapiens) genome version hg38
- Human (Homo sapiens) genome version hg19
- Human (Homo sapiens) genome version hg18
- Mouse (Mus musculus) genome version mm10
- Mouse (Mus musculus) genome version mm9
- Rat (Rattus norvegicus) genome version rn6
- Rat (Rattus norvegicus) genome version rn5
- Fruitfly (Drosophila melanogaster) genome version dm6
- Fruitfly (Drosophila melanogaster) genome version dm3
- Zebrafish (Danio rerio) genome version danRer7
- Zebrafish (Danio rerio) genome version danRer10
- Zebrafish (Danio rerio) genome version danRer11
- Chimpanzee (Pan troglodytes) genome version panTro4
- Chimpanzee (Pan troglodytes) genome version panTro5
- Pig (Sus scrofa) genome version susScr3
- Pig (Sus scrofa) genome version susScr11
- Horse (Equus caballus) genome version equCab2
- Arabidopsis (Arabidopsis thaliana) genome version TAIR10
Using the local database
Set up the database
By default, the database file will be written in the system.file(package="metaseqR2") directory. You can specify another preferred destination for it using the db argument in the function call. If you do that, you will have to supply the localDb argument, pointing to the SQLite database file you created, to every metaseqr2 call you perform. Otherwise, the pipeline will download and use annotations on-the-fly.
In this vignette, we will build a minimal database comprising only the mouse mm10 genome version from Ensembl. The database will be built in a temporary directory inside the session tempdir().
Important note: As the annotation build function makes use of Kent utilities for creating 3’ UTR annotations from RefSeq and UCSC, the latter cannot be built in Windows. Therefore, it is advised to either build the annotation database in a Linux system or use our pre-built databases.
library(metaseqR2)
buildDir <- file.path(tempdir(),"test_anndb")
dir.create(buildDir)
# The location of the custom database
myDb <- file.path(buildDir,"testann.sqlite")
# Since we are using Ensembl, we can also ask for a version
organisms <- list(mm10=100)
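# Note: ifelse() is vectorized over its test, so with a length-1 test only the
# first element ("ensembl") is returned; the build log below reflects this
sources <- ifelse(.Platform$OS.type=="unix",c("ensembl","refseq"),"ensembl")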
# If the example is not running in a multicore system, rc is ignored
buildAnnotationDatabase(organisms,sources,forceDownload=FALSE,db=myDb,rc=0.5)
## Opening metaseqR2 SQLite database /tmp/Rtmpzgy7rG/test_anndb/testann.sqlite
## Retrieving genome information for mm10 from ensembl
## Retrieving gene annotation for mm10 from ensembl version 100
## Using Ensembl host https://apr2020.archive.ensembl.org
## Retrieving transcript annotation for mm10 from ensembl version 100
## Using Ensembl host https://apr2020.archive.ensembl.org
## Merging transcripts for mm10 from ensembl version 100
## Retrieving 3' UTR annotation for mm10 from ensembl version 100
## Using Ensembl host https://apr2020.archive.ensembl.org
## Merging gene 3' UTRs for mm10 from ensembl version 100
## Merging transcript 3' UTRs for mm10 from ensembl version 100
## Retrieving exon annotation for mm10 from ensembl version 100
## Using Ensembl host https://apr2020.archive.ensembl.org
## Retrieving extended exon annotation for mm10 from ensembl version 100
## Using Ensembl host https://apr2020.archive.ensembl.org
## Merging exons for mm10 from ensembl version 100
## Merging exons for mm10 from ensembl version 100
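The build call above selects its sources with ifelse(). As a minimal base R sketch (not taken from the metaseqR2 code), the following compares that pattern with a plain if/else, which is one way to keep both sources on Unix-like systems. The variable names are illustrative only.

```r
# ifelse() returns a result with the same length as its test,
# so a length-1 test can only ever yield one element
viaIfelse <- ifelse(TRUE,c("ensembl","refseq"),"ensembl")

# A plain if/else returns the whole vector of the selected branch
viaIfElse <- if (TRUE) c("ensembl","refseq") else "ensembl"

print(viaIfelse)
print(viaIfElse)

# Platform-dependent selection that keeps both sources on Unix-like systems
onUnix <- .Platform$OS.type == "unix"
sources <- if (onUnix) c("ensembl","refseq") else "ensembl"
```

Passing the longer vector to buildAnnotationDatabase would presumably also build the RefSeq annotation, which takes longer and relies on the Kent utilities mentioned in the note above.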
Use the database
Now that a small database is in place, let’s retrieve some data. Remember that since the built database is not in the default location, we need to pass the database file to each data retrieval function. The annotation is retrieved as a GRanges object by default.
# Load standard annotation based on gene body coordinates
genes <- loadAnnotation(genome="mm10",refdb="ensembl",level="gene",type="gene",
db=myDb)
genes
## GRanges object with 55364 ranges and 4 metadata columns:
## seqnames ranges strand | gene_id
## <Rle> <IRanges> <Rle> | <character>
## ENSMUSG00000102693 chr1 3073253-3074322 + | ENSMUSG00000102693
## ENSMUSG00000064842 chr1 3102016-3102125 + | ENSMUSG00000064842
## ENSMUSG00000051951 chr1 3205901-3671498 - | ENSMUSG00000051951
## ENSMUSG00000102851 chr1 3252757-3253236 + | ENSMUSG00000102851
## ENSMUSG00000103377 chr1 3365731-3368549 - | ENSMUSG00000103377
## ... ... ... ... . ...
## ENSMUSG00000095366 chrY 90752427-90755467 - | ENSMUSG00000095366
## ENSMUSG00000095134 chrY 90753057-90763485 + | ENSMUSG00000095134
## ENSMUSG00000096768 chrY 90784738-90816465 + | ENSMUSG00000096768
## ENSMUSG00000099871 chrY 90837413-90844040 + | ENSMUSG00000099871
## ENSMUSG00000096850 chrY 90838869-90839177 - | ENSMUSG00000096850
## gc_content gene_name biotype
## <numeric> <character> <character>
## ENSMUSG00000102693 34.21 4933401J01Rik TEC
## ENSMUSG00000064842 36.36 Gm26206 snRNA
## ENSMUSG00000051951 38.51 Xkr4 protein_coding
## ENSMUSG00000102851 39.79 Gm18956 processed_pseudogene
## ENSMUSG00000103377 40.79 Gm37180 TEC
## ... ... ... ...
## ENSMUSG00000095366 41.37 Gm21860 lincRNA
## ENSMUSG00000095134 46.85 Mid1-ps1 unprocessed_pseudogene
## ENSMUSG00000096768 46.16 Gm47283 lincRNA
## ENSMUSG00000099871 43.39 Gm21742 unprocessed_pseudogene
## ENSMUSG00000096850 48.87 Gm21748 protein_coding
## -------
## seqinfo: 21 sequences from mm10 genome
# Load standard annotation based on 3' UTR coordinates
utrs <- loadAnnotation(genome="mm10",refdb="ensembl",level="gene",type="utr",
db=myDb)
utrs
## GRanges object with 228087 ranges and 4 metadata columns:
## seqnames ranges strand | transcript_id
## <Rle> <IRanges> <Rle> | <character>
## ENSMUST00000193812 chr1 3074323-3074571 + | ENSMUST00000193812
## ENSMUST00000082908 chr1 3102126-3102374 + | ENSMUST00000082908
## ENSMUST00000162897 chr1 3205652-3205900 - | ENSMUST00000162897
## ENSMUST00000159265 chr1 3206274-3206522 - | ENSMUST00000159265
## ENSMUST00000070533 chr1 3214233-3214481 - | ENSMUST00000070533
## ... ... ... ... . ...
## ENSMUST00000177591 chrY 90816465-90816713 + | ENSMUST00000177591
## ENSMUST00000179077 chrY 90816465-90816713 + | ENSMUST00000179077
## ENSMUST00000238471 chrY 90816466-90816714 + | ENSMUST00000238471
## ENSMUST00000179623 chrY 90838620-90838868 - | ENSMUST00000179623
## ENSMUST00000189352 chrY 90844041-90844289 + | ENSMUST00000189352
## gene_id gene_name biotype
## <character> <character> <character>
## ENSMUST00000193812 ENSMUSG00000102693 4933401J01Rik TEC
## ENSMUST00000082908 ENSMUSG00000064842 Gm26206 snRNA
## ENSMUST00000162897 ENSMUSG00000051951 Xkr4 protein_coding
## ENSMUST00000159265 ENSMUSG00000051951 Xkr4 protein_coding
## ENSMUST00000070533 ENSMUSG00000051951 Xkr4 protein_coding
## ... ... ... ...
## ENSMUST00000177591 ENSMUSG00000096768 Gm47283 lincRNA
## ENSMUST00000179077 ENSMUSG00000096768 Gm47283 lincRNA
## ENSMUST00000238471 ENSMUSG00000096768 Gm47283 lincRNA
## ENSMUST00000179623 ENSMUSG00000096850 Gm21748 protein_coding
## ENSMUST00000189352 ENSMUSG00000099871 Gm21742 unprocessed_pseudogene
## -------
## seqinfo: 21 sequences from mm10 genome
# Load summarized exon annotation used with RNA-Seq analysis
sumEx <- loadAnnotation(genome="mm10",refdb="ensembl",level="gene",type="exon",
summarized=TRUE,db=myDb)
sumEx
## GRanges object with 291497 ranges and 4 metadata columns:
## seqnames ranges strand |
## <Rle> <IRanges> <Rle> |
## ENSMUSG00000102693_MEX_1 chr1 3073253-3074322 + |
## ENSMUSG00000064842_MEX_1 chr1 3102016-3102125 + |
## ENSMUSG00000051951_MEX_1 chr1 3205901-3207317 - |
## ENSMUSG00000051951_MEX_2 chr1 3213439-3216968 - |
## ENSMUSG00000102851_MEX_1 chr1 3252757-3253236 + |
## ... ... ... ... .
## ENSMUSG00000099871_MEX_1 chrY 90837413-90837520 + |
## ENSMUSG00000096850_MEX_1 chrY 90838869-90839177 - |
## ENSMUSG00000099871_MEX_2 chrY 90841657-90841805 + |
## ENSMUSG00000099871_MEX_3 chrY 90842898-90843025 + |
## ENSMUSG00000099871_MEX_4 chrY 90843878-90844040 + |
## exon_id gene_id
## <character> <character>
## ENSMUSG00000102693_MEX_1 ENSMUSG00000102693_M.. ENSMUSG00000102693
## ENSMUSG00000064842_MEX_1 ENSMUSG00000064842_M.. ENSMUSG00000064842
## ENSMUSG00000051951_MEX_1 ENSMUSG00000051951_M.. ENSMUSG00000051951
## ENSMUSG00000051951_MEX_2 ENSMUSG00000051951_M.. ENSMUSG00000051951
## ENSMUSG00000102851_MEX_1 ENSMUSG00000102851_M.. ENSMUSG00000102851
## ... ... ...
## ENSMUSG00000099871_MEX_1 ENSMUSG00000099871_M.. ENSMUSG00000099871
## ENSMUSG00000096850_MEX_1 ENSMUSG00000096850_M.. ENSMUSG00000096850
## ENSMUSG00000099871_MEX_2 ENSMUSG00000099871_M.. ENSMUSG00000099871
## ENSMUSG00000099871_MEX_3 ENSMUSG00000099871_M.. ENSMUSG00000099871
## ENSMUSG00000099871_MEX_4 ENSMUSG00000099871_M.. ENSMUSG00000099871
## gene_name biotype
## <character> <character>
## ENSMUSG00000102693_MEX_1 4933401J01Rik TEC
## ENSMUSG00000064842_MEX_1 Gm26206 snRNA
## ENSMUSG00000051951_MEX_1 Xkr4 protein_coding
## ENSMUSG00000051951_MEX_2 Xkr4 protein_coding
## ENSMUSG00000102851_MEX_1 Gm18956 processed_pseudogene
## ... ... ...
## ENSMUSG00000099871_MEX_1 Gm21742 unprocessed_pseudogene
## ENSMUSG00000096850_MEX_1 Gm21748 protein_coding
## ENSMUSG00000099871_MEX_2 Gm21742 unprocessed_pseudogene
## ENSMUSG00000099871_MEX_3 Gm21742 unprocessed_pseudogene
## ENSMUSG00000099871_MEX_4 Gm21742 unprocessed_pseudogene
## -------
## seqinfo: 21 sequences from mm10 genome
# Load standard annotation based on gene body coordinates from RefSeq
if (.Platform$OS.type=="unix") {
refGenes <- loadAnnotation(genome="mm10",refdb="refseq",level="gene",
type="gene",db=myDb)
refGenes
}
## Getting latest annotation on the fly for mm10 from refseq
## Retrieving genome information for mm10 from refseq
## Retrieving latest gene annotation for mm10 from refseq
## Loading required namespace: RMySQL
## Converting annotation to GenomicRanges object...
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
## Getting DNA sequences...
## Getting GC content...
## GRanges object with 23471 ranges and 4 metadata columns:
## seqnames ranges strand | gene_id gc_content
## <Rle> <IRanges> <Rle> | <character> <numeric>
## NM_001011874 chr1 3214481-3671498 - | NM_001011874 38.56
## NM_001370921 chr1 4119865-4360303 - | NM_001370921 38.35
## NM_011441 chr1 4490927-4497354 - | NM_011441 49.75
## NM_001177658 chr1 4773199-4785726 - | NM_001177658 42.59
## NM_008866 chr1 4807822-4846735 + | NM_008866 40.99
## ... ... ... ... . ... ...
## NM_001160135 chrY 67339048-67340047 - | NM_001160135 43.50
## NM_001037748 chrY 72554831-72581058 + | NM_001037748 35.58
## NM_001160144 chrY 77073913-77076246 - | NM_001160144 39.85
## NR_137283 chrY 90751139-90755050 - | NR_137283 41.51
## NR_137282 chrY 90785441-90816465 + | NR_137282 46.22
## gene_name biotype
## <character> <character>
## NM_001011874 Xkr4 NA
## NM_001370921 Rp1 NA
## NM_011441 Sox17 NA
## NM_001177658 Mrpl15 NA
## NM_008866 Lypla1 NA
## ... ... ...
## NM_001160135 Gm20806 NA
## NM_001037748 Gm20736 NA
## NM_001160144 Gm20816 NA
## NR_137283 G530011O06Rik NA
## NR_137282 Erdr1 NA
## -------
## seqinfo: 21 sequences from mm10 genome
The annotation can also be retrieved as a data frame using asdf=TRUE. The data frame, however, does not contain metadata such as Seqinfo that could be used for any subsequent validations:
# Load standard annotation based on gene body coordinates
genes <- loadAnnotation(genome="mm10",refdb="ensembl",level="gene",type="gene",
db=myDb,asdf=TRUE)
head(genes)
## chromosome start end gene_id gc_content strand gene_name
## 1 chr1 3073253 3074322 ENSMUSG00000102693 34.21 + 4933401J01Rik
## 2 chr1 3102016 3102125 ENSMUSG00000064842 36.36 + Gm26206
## 3 chr1 3205901 3671498 ENSMUSG00000051951 38.51 - Xkr4
## 4 chr1 3252757 3253236 ENSMUSG00000102851 39.79 + Gm18956
## 5 chr1 3365731 3368549 ENSMUSG00000103377 40.79 - Gm37180
## 6 chr1 3375556 3377788 ENSMUSG00000104017 36.99 - Gm37363
## biotype
## 1 TEC
## 2 snRNA
## 3 protein_coding
## 4 processed_pseudogene
## 5 TEC
## 6 TEC
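Once the annotation is a plain data frame, ordinary base R operations apply. The sketch below rebuilds the six rows printed above by hand (so that it runs without the database) and derives gene lengths and biotype counts. The object name genesDf and the added length column are illustrative and not part of the metaseqR2 output.

```r
# The six rows shown by head(genes) above, typed in by hand
genesDf <- data.frame(
    chromosome=rep("chr1",6),
    start=c(3073253,3102016,3205901,3252757,3365731,3375556),
    end=c(3074322,3102125,3671498,3253236,3368549,3377788),
    gene_id=c("ENSMUSG00000102693","ENSMUSG00000064842","ENSMUSG00000051951",
        "ENSMUSG00000102851","ENSMUSG00000103377","ENSMUSG00000104017"),
    gc_content=c(34.21,36.36,38.51,39.79,40.79,36.99),
    strand=c("+","+","-","+","-","-"),
    gene_name=c("4933401J01Rik","Gm26206","Xkr4","Gm18956","Gm37180","Gm37363"),
    biotype=c("TEC","snRNA","protein_coding","processed_pseudogene","TEC","TEC"),
    stringsAsFactors=FALSE
)

# Gene length from 1-based, inclusive coordinates
genesDf$length <- genesDf$end - genesDf$start + 1

# Number of genes per biotype
biotypeCounts <- table(genesDf$biotype)

# Subset to protein-coding genes only
coding <- genesDf[genesDf$biotype == "protein_coding",]

print(genesDf[,c("gene_name","length")])
print(biotypeCounts)
```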
Add a custom annotation
Apart from the supported organisms and databases, you can add a custom annotation. Such an annotation can be:
- A non-supported organism (e.g. an insect or another mammal e.g. dog)
- A modification or further curation you have done to existing/supported annotations
- A supported organism but from a different source
- Any other case where the provided annotations are not adequate
This can be achieved with GTF files, along with some simple metadata that you have to provide for proper import to the annotation database, using the buildCustomAnnotation function. Details on the required metadata can be found in the function’s help page.
Important note: Importing a custom genome annotation directly from UCSC (UCSC SQL database dumps) is not supported in Windows, as the process involves the genePredToGtf utility, which is not available for Windows.
Let’s try a couple of examples. The first one is a custom annotation for the Ebola virus from UCSC:
# Setup a temporary directory to download files etc.
customDir <- file.path(tempdir(),"test_custom")
dir.create(customDir)
# Convert from GenePred to GTF - Unix/Linux only!
if (.Platform$OS.type == "unix" && !grepl("^darwin",R.version$os)) {
# Download data from UCSC
goldenPath="http://hgdownload.cse.ucsc.edu/goldenPath/"
# Gene annotation dump
download.file(paste0(goldenPath,"eboVir3/database/ncbiGene.txt.gz"),
file.path(customDir,"eboVir3_ncbiGene.txt.gz"))
# Chromosome information
download.file(paste0(goldenPath,"eboVir3/database/chromInfo.txt.gz"),
file.path(customDir,"eboVir3_chromInfo.txt.gz"))
# Prepare the build
chromInfo <- read.delim(file.path(customDir,"eboVir3_chromInfo.txt.gz"),
header=FALSE)
chromInfo <- chromInfo[,1:2]
rownames(chromInfo) <- as.character(chromInfo[,1])
chromInfo <- chromInfo[,2,drop=FALSE]
# Conversion from genePred to GTF
genePredToGtfEnv <- Sys.getenv("GENEPREDTOGTF_BINARY")
if (genePredToGtfEnv == "") {
genePredToGtf <- file.path(customDir,"genePredToGtf")
} else {
genePredToGtf <- file.path(genePredToGtfEnv)
}
if (!file.exists(genePredToGtf)) {
download.file(
"http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/genePredToGtf",
genePredToGtf
)
system(paste("chmod 775",genePredToGtf))
}
gtfFile <- file.path(customDir,"eboVir3.gtf")
tmpName <- file.path(customDir,paste(format(Sys.time(),"%Y%m%d%H%M%S"),
"tgtf",sep="."))
command <- paste0(
"zcat ",file.path(customDir,"eboVir3_ncbiGene.txt.gz"),
" | ","cut -f2- | ",genePredToGtf," file stdin ",tmpName,
" -source=eboVir3"," -utr && grep -vP '\t\\.\t\\.\t' ",tmpName," > ",
gtfFile
)
system(command)
# Build with the metadata list filled (you can also provide a version)
buildCustomAnnotation(
gtfFile=gtfFile,
metadata=list(
organism="eboVir3_test",
source="ucsc_test",
chromInfo=chromInfo
),
db=myDb
)
# Try to retrieve some data
eboGenes <- loadAnnotation(genome="eboVir3_test",refdb="ucsc_test",
level="gene",type="gene",db=myDb)
eboGenes
}
## Opening metaseqR2 SQLite database /tmp/Rtmpzgy7rG/test_anndb/testann.sqlite
## Importing GTF /tmp/Rtmpzgy7rG/test_custom/eboVir3.gtf as GTF to make id map
## Making id map
## Importing GTF /tmp/Rtmpzgy7rG/test_custom/eboVir3.gtf as TxDb
## Import genomic features from the file as a GRanges object ... OK
## Prepare the 'metadata' data frame ... OK
## Make the TxDb object ... OK
## Retrieving gene annotation for ebovir3_test from ucsc_test version 20250415 from /tmp/Rtmpzgy7rG/test_custom/eboVir3.gtf
## Retrieving transcript annotation for ebovir3_test from ucsc_test version 20250415
## Retrieving summarized transcript annotation for ebovir3_test from ucsc_test version 20250415
## Retrieving 3' UTR annotation for ebovir3_test from ucsc_test version 20250415
## Retrieving summarized 3' UTR annotation per gene for ebovir3_test from ucsc_test version 20250415
## summarizing UTRs per gene for imported GTF
## Retrieving summarized 3' UTR annotation per transcript for ebovir3_test from ucsc_test version 20250415
## summarizing UTRs per gene for imported GTF
## Retrieving exon annotation for ebovir3_test from ucsc_test version 20250415
## Retrieving summarized exon annotation for ebovir3_test from ucsc_test version 20250415
## summarizing exons per gene for imported GTF
## GRanges object with 9 ranges and 4 metadata columns:
## seqnames ranges strand | gene_id gc_content gene_name
## <Rle> <IRanges> <Rle> | <character> <numeric> <character>
## NP KM034562v1 56-3026 + | NP 50 NP
## VP35 KM034562v1 3032-4407 + | VP35 50 VP35
## VP40 KM034562v1 4390-5894 + | VP40 50 VP40
## GP KM034562v1 5900-8305 + | GP 50 GP
## sGP KM034562v1 5900-8305 + | sGP 50 sGP
## ssGP KM034562v1 5900-8305 + | ssGP 50 ssGP
## VP30 KM034562v1 8288-9740 + | VP30 50 VP30
## VP24 KM034562v1 9885-11518 + | VP24 50 VP24
## L KM034562v1 11501-18282 + | L 50 L
## biotype
## <character>
## NP gene
## VP35 gene
## VP40 gene
## GP gene
## sGP gene
## ssGP gene
## VP30 gene
## VP24 gene
## L gene
## -------
## seqinfo: 1 sequence from ebovir3_test genome
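The chromInfo preparation in the example above turns a UCSC chromInfo dump (columns: chromosome name, size, file name) into a one-column data frame of lengths with chromosome names as row names. The sketch below repeats those steps on a small made-up file, so the resulting structure can be inspected without any download. The sequence names and sizes are invented for illustration.

```r
# Made-up stand-in for a UCSC chromInfo dump (chrom, size, fileName)
chromFile <- tempfile(fileext=".txt")
writeLines(c(
    "chrA\t1000\t/path/to/genome.2bit",
    "chrB\t500\t/path/to/genome.2bit"
),chromFile)

# Same steps as in the example above
chromInfoDemo <- read.delim(chromFile,header=FALSE)
chromInfoDemo <- chromInfoDemo[,1:2]
rownames(chromInfoDemo) <- as.character(chromInfoDemo[,1])
chromInfoDemo <- chromInfoDemo[,2,drop=FALSE]

# One column of lengths, one row per sequence
print(chromInfoDemo)
```

Keeping drop=FALSE matters here: without it, selecting a single column would return a bare vector and the row names carrying the sequence names would be lost.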
Another example is the Atlantic cod from UCSC. The same operating system restrictions apply.
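# Same condition as in the first example: goldenPath is defined there and
# the downloaded genePredToGtf binary is built for Linux
if (.Platform$OS.type == "unix" && !grepl("^darwin",R.version$os)) {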
# Gene annotation dump
download.file(paste0(goldenPath,"gadMor1/database/augustusGene.txt.gz"),
file.path(customDir,"gadMori1_augustusGene.txt.gz"))
# Chromosome information
download.file(paste(goldenPath,"gadMor1/database/chromInfo.txt.gz",sep=""),
file.path(customDir,"gadMori1_chromInfo.txt.gz"))
# Prepare the build
chromInfo <- read.delim(file.path(customDir,"gadMori1_chromInfo.txt.gz"),
header=FALSE)
chromInfo <- chromInfo[,1:2]
rownames(chromInfo) <- as.character(chromInfo[,1])
chromInfo <- chromInfo[,2,drop=FALSE]
# Conversion from genePred to GTF
genePredToGtf <- file.path(customDir,"genePredToGtf")
if (!file.exists(genePredToGtf)) {
download.file(
"http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/genePredToGtf",
genePredToGtf
)
system(paste("chmod 775",genePredToGtf))
}
gtfFile <- file.path(customDir,"gadMori1.gtf")
tmpName <- file.path(customDir,paste(format(Sys.time(),"%Y%m%d%H%M%S"),
"tgtf",sep="."))
command <- paste0(
"zcat ",file.path(customDir,"gadMori1_augustusGene.txt.gz"),
" | ","cut -f2- | ",genePredToGtf," file stdin ",tmpName,
" -source=gadMori1"," -utr && grep -vP '\t\\.\t\\.\t' ",tmpName," > ",
gtfFile
)
system(command)
# Build with the metadata list filled (you can also provide a version)
buildCustomAnnotation(
gtfFile=gtfFile,
metadata=list(
organism="gadMor1_test",
source="ucsc_test",
chromInfo=chromInfo
),
db=myDb
)
# Try to retrieve some data
gadGenes <- loadAnnotation(genome="gadMor1_test",refdb="ucsc_test",
level="gene",type="gene",db=myDb)
gadGenes
}
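The two UCSC examples above assemble the same shell pipeline by hand. As a sketch, a small helper could build the command string once and quote the paths. The function buildGenePredCommand and its argument names are hypothetical and not part of metaseqR2. The pipeline itself (zcat, cut, genePredToGtf, grep) is the one used above.

```r
# Hypothetical helper (not part of metaseqR2): builds the shell command that
# converts a gzipped UCSC genePred dump to GTF, as done in the examples above
buildGenePredCommand <- function(genePredGz,genePredToGtf,gtfFile,source,
    tmpName=tempfile(fileext=".tgtf")) {
    # Quote paths so that spaces or special characters do not break the command
    q <- function(x) shQuote(x,type="sh")
    paste0(
        "zcat ",q(genePredGz),
        " | cut -f2- | ",q(genePredToGtf)," file stdin ",q(tmpName),
        " -source=",source," -utr && grep -vP '\t\\.\t\\.\t' ",q(tmpName),
        " > ",q(gtfFile)
    )
}

# Build (but do not run) the command for the Ebola example; paths are placeholders
cmd <- buildGenePredCommand(
    genePredGz="eboVir3_ncbiGene.txt.gz",
    genePredToGtf="./genePredToGtf",
    gtfFile="eboVir3.gtf",
    source="eboVir3",
    tmpName="tmp.tgtf"
)
cat(cmd,"\n")
```

The resulting string would be passed to system() exactly as in the examples above, on a platform where the genePredToGtf binary can run.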
Another example is the armadillo from Ensembl. This should work irrespective of the operating system. Chromosome information is taken from a BAM file hosted by Ensembl.
# Gene annotation dump from Ensembl
download.file(paste0("ftp://ftp.ensembl.org/pub/release-98/gtf/",
"dasypus_novemcinctus/Dasypus_novemcinctus.Dasnov3.0.98.gtf.gz"),
file.path(customDir,"Dasypus_novemcinctus.Dasnov3.0.98.gtf.gz"))
# Chromosome information will be provided from the following BAM file
# available from Ensembl. We have noticed that on Windows, remote BAM files
# cannot be opened by scanBamParam, so for this example, chromosome length
# information will not be available when running on Windows.
bamForInfo <- NULL
if (.Platform$OS.type == "unix")
bamForInfo <- paste0("ftp://ftp.ensembl.org/pub/release-98/bamcov/",
"dasypus_novemcinctus/genebuild/Dasnov3.broad.Ascending_Colon_5.1.bam")
# Build with the metadata list filled (you can also provide a version)
buildCustomAnnotation(
gtfFile=file.path(customDir,"Dasypus_novemcinctus.Dasnov3.0.98.gtf.gz"),
metadata=list(
organism="dasNov3_test",
source="ensembl_test",
chromInfo=bamForInfo
),
db=myDb
)
# Try to retrieve some data
dasGenes <- loadAnnotation(genome="dasNov3_test",refdb="ensembl_test",
level="gene",type="gene",db=myDb)
dasGenes
A complete build
A fairly complete build (with the latest versions of Ensembl annotations) would look like the following (assuming the default annotation database location):
organisms <- list(
hg18=54,
hg19=75,
hg38=110:111,
mm9=54,
mm10=110:111,
rn5=77,
rn6=110:111,
dm3=77,
dm6=110:111,
danrer7=77,
danrer10=80,
danrer11=110:111,
pantro4=80,
pantro5=110:111,
susscr3=80,
susscr11=110:111,
equcab2=110:111
)
sources <- c("ensembl","ucsc","refseq")
buildAnnotationDatabase(organisms,sources,forceDownload=FALSE,rc=0.5)
The aforementioned complete build can be found here. Complete builds will become available from time to time (e.g. with every new Ensembl release) for users who do not wish to create annotation databases on their own. Root access may be required (depending on the metaseqR2 library location) to place it in the default location where it can be found automatically.
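Before launching a long build, it can help to check the requested organisms list against the supported genome versions listed at the top of this document. The following base R sketch does that. The case-insensitive comparison is an assumption, motivated by the lowercase names used in the complete build above. The requested list and the counting of one build per listed version are also illustrative.

```r
# Genome versions from the list of supported organisms
supported <- c("hg38","hg19","hg18","mm10","mm9","rn6","rn5","dm6","dm3",
    "danRer7","danRer10","danRer11","panTro4","panTro5","susScr3","susScr11",
    "equCab2","TAIR10")

# Illustrative request: dog (canFam3) is not in the supported list
requested <- list(mm10=100,danrer11=110:111,canFam3=100)

# Assumption: names are compared case-insensitively
known <- tolower(names(requested)) %in% tolower(supported)
unknown <- names(requested)[!known]

# Assumption: each listed Ensembl version corresponds to one build
nBuilds <- sum(lengths(requested[known]))

if (length(unknown) > 0)
    message("Not supported for automatic builds: ",
        paste(unknown,collapse=", "))
print(nBuilds)
```

Organisms flagged this way are candidates for the custom annotation route described earlier.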
Annotations on-the-fly
If for some reason you do not want to build and use an annotation database for metaseqR2 analyses (not recommended) or you wish to perform an analysis with an organism that does not yet exist in the database, the loadAnnotation function will perform all required actions (download and create a GRanges object) on-the-fly as long as there is an internet connection.
However, the above function does not handle custom annotations in GTF files. In a scenario where you want to use a custom annotation only once, you should supply the annotation argument to the metaseqr2 function. This argument is almost the same as the metadata argument used in buildCustomAnnotation, augmented by a list member for the GTF file, that is:
annotation <- list(
gtf="PATH_TO_GTF",
organism="ORGANISM_NAME",
source="SOURCE_NAME",
chromInfo="CHROM_INFO"
)
The above argument can be passed to the metaseqr2 call in the respective position.
For further details about custom annotations on the fly, please check the buildCustomAnnotation and importCustomAnnotation functions.
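Since a misspelled member or a wrong GTF path in the annotation list would only surface once the pipeline is running, a quick pre-flight check may be useful. The sketch below is a hypothetical helper (checkAnnotationList is not a metaseqR2 function) that verifies that the four members shown above are present and that the GTF file exists. The demo values are placeholders.

```r
# Hypothetical pre-flight check (not part of metaseqR2)
checkAnnotationList <- function(annotation) {
    required <- c("gtf","organism","source","chromInfo")
    missingFields <- setdiff(required,names(annotation))
    if (length(missingFields) > 0)
        stop("Missing annotation members: ",
            paste(missingFields,collapse=", "))
    if (!file.exists(annotation$gtf))
        stop("GTF file not found: ",annotation$gtf)
    invisible(TRUE)
}

# Demo with an empty placeholder GTF and made-up metadata
gtfDemo <- tempfile(fileext=".gtf")
file.create(gtfDemo)
annotationDemo <- list(
    gtf=gtfDemo,
    organism="myorg_test",
    source="mysource_test",
    chromInfo=data.frame(length=1000,row.names="chrA")
)
checkAnnotationList(annotationDemo)
```

The check only covers the structure of the list. Whether the GTF content itself can be imported is still determined by metaseqR2.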
Session Info
sessionInfo()
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] splines stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] BSgenome.Mmusculus.UCSC.mm10_1.4.3 BSgenome_1.77.0
## [3] rtracklayer_1.69.0 BiocIO_1.19.0
## [5] Biostrings_2.77.0 XVector_0.49.0
## [7] metaseqR2_1.21.0 locfit_1.5-9.12
## [9] limma_3.65.0 DESeq2_1.49.0
## [11] SummarizedExperiment_1.39.0 Biobase_2.69.0
## [13] MatrixGenerics_1.21.0 matrixStats_1.5.0
## [15] GenomicRanges_1.61.0 GenomeInfoDb_1.45.0
## [17] IRanges_2.43.0 S4Vectors_0.47.0
## [19] BiocGenerics_0.55.0 generics_0.1.3
## [21] BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] survcomp_1.59.0 bitops_1.0-9
## [3] filelock_1.0.3 tibble_3.2.1
## [5] R.oo_1.27.0 preprocessCore_1.71.0
## [7] XML_3.99-0.18 lifecycle_1.0.4
## [9] httr2_1.1.2 pwalign_1.5.0
## [11] edgeR_4.7.0 globals_0.16.3
## [13] MASS_7.3-65 lattice_0.22-7
## [15] dendextend_1.19.0 magrittr_2.0.3
## [17] plotly_4.10.4 sass_0.4.10
## [19] rmarkdown_2.29 jquerylib_0.1.4
## [21] yaml_2.3.10 DBI_1.2.3
## [23] RColorBrewer_1.1-3 ABSSeq_1.63.0
## [25] harmonicmeanp_3.0.1 abind_1.4-8
## [27] ShortRead_1.67.0 purrr_1.0.4
## [29] R.utils_2.13.0 rmeta_3.0
## [31] RCurl_1.98-1.17 rappdirs_0.3.3
## [33] lava_1.8.1 seriation_1.5.7
## [35] GenomeInfoDbData_1.2.14 survivalROC_1.0.3.1
## [37] listenv_0.9.1 genefilter_1.91.0
## [39] parallelly_1.43.0 annotate_1.87.0
## [41] permute_0.9-7 DelayedMatrixStats_1.31.0
## [43] codetools_0.2-20 DelayedArray_0.35.0
## [45] DT_0.33 xml2_1.3.8
## [47] SuppDists_1.1-9.9 tidyselect_1.2.1
## [49] futile.logger_1.4.3 UCSC.utils_1.5.0
## [51] viridis_0.6.5 TSP_1.2-4
## [53] rmdformats_1.0.4 BiocFileCache_2.17.0
## [55] webshot_0.5.5 GenomicAlignments_1.45.0
## [57] jsonlite_2.0.0 iterators_1.0.14
## [59] survival_3.8-3 foreach_1.5.2
## [61] tools_4.5.0 progress_1.2.3
## [63] Rcpp_1.0.14 glue_1.8.0
## [65] prodlim_2024.06.25 gridExtra_2.3
## [67] SparseArray_1.9.0 xfun_0.52
## [69] qvalue_2.41.0 ca_0.71.1
## [71] dplyr_1.1.4 HDF5Array_1.37.0
## [73] withr_3.0.2 NBPSeq_0.3.1
## [75] formatR_1.14 BiocManager_1.30.25
## [77] fastmap_1.2.0 latticeExtra_0.6-30
## [79] rhdf5filters_1.21.0 caTools_1.18.3
## [81] digest_0.6.37 R6_2.6.1
## [83] colorspace_2.1-1 RMySQL_0.11.1
## [85] gtools_3.9.5 jpeg_0.1-11
## [87] biomaRt_2.65.0 RSQLite_2.3.9
## [89] R.methodsS3_1.8.2 h5mread_1.1.0
## [91] tidyr_1.3.1 data.table_1.17.0
## [93] prettyunits_1.2.0 httr_1.4.7
## [95] htmlwidgets_1.6.4 S4Arrays_1.9.0
## [97] pkgconfig_2.0.3 gtable_0.3.6
## [99] registry_0.5-1 blob_1.2.4
## [101] hwriter_1.3.2.1 htmltools_0.5.8.1
## [103] bookdown_0.43 log4r_0.4.4
## [105] scales_1.3.0 png_0.1-8
## [107] corrplot_0.95 knitr_1.50
## [109] lambda.r_1.2.4 reshape2_1.4.4
## [111] rjson_0.2.23 curl_6.2.2
## [113] zoo_1.8-14 cachem_1.1.0
## [115] rhdf5_2.53.0 stringr_1.5.1
## [117] KernSmooth_2.23-26 parallel_4.5.0
## [119] AnnotationDbi_1.71.0 vsn_3.77.0
## [121] restfulr_0.0.15 pillar_1.10.2
## [123] grid_4.5.0 vctrs_0.6.5
## [125] gplots_3.2.0 dbplyr_2.5.0
## [127] beachmat_2.25.0 xtable_1.8-4
## [129] evaluate_1.0.3 bsseq_1.45.0
## [131] VennDiagram_1.7.3 GenomicFeatures_1.61.0
## [133] cli_3.6.4 compiler_4.5.0
## [135] futile.options_1.0.1 Rsamtools_2.25.0
## [137] rlang_1.1.6 crayon_1.5.3
## [139] FMStable_0.1-4 future.apply_1.11.3
## [141] heatmaply_1.5.0 interp_1.1-6
## [143] aroma.light_3.39.0 affy_1.87.0
## [145] plyr_1.8.9 pander_0.6.6
## [147] stringi_1.8.7 viridisLite_0.4.2
## [149] deldir_2.0-4 BiocParallel_1.43.0
## [151] assertthat_0.2.1 txdbmaker_1.5.0
## [153] munsell_0.5.1 lazyeval_0.2.2
## [155] DSS_2.57.0 EDASeq_2.43.0
## [157] Matrix_1.7-3 hms_1.1.3
## [159] future_1.40.0 sparseMatrixStats_1.21.0
## [161] bit64_4.6.0-1 ggplot2_3.5.2
## [163] Rhdf5lib_1.31.0 KEGGREST_1.49.0
## [165] statmod_1.5.0 memoise_2.0.1
## [167] affyio_1.79.0 bslib_0.9.0
## [169] bootstrap_2019.6 bit_4.6.0