Effective Genome Size — deepTools 3.5.6 documentation
A number of tools can accept an “effective genome size”. This is defined as the length of the “mappable” genome. There are two common alternative ways to calculate this:
1. The number of non-N bases in the genome.
2. The number of regions (of some size) in the genome that are uniquely mappable (possibly given some maximal edit distance).
Option 1 can be computed using `faCount` from Kent's tools. The effective genome size for a number of genomes using this method is given below:
Genome | Effective size (bp) |
---|---|
GRCh37 | 2864785220 |
GRCh38 | 2913022398 |
T2T/CHM13CAT_v2 | 3117292070 |
GRCm37 | 2620345972 |
GRCm38 | 2652783500 |
GRCm39 | 2654621783 |
dm3 | 162367812 |
dm6 | 142573017 |
GRCz10 | 1369631918 |
GRCz11 | 1368780147 |
WBcel235 | 100286401 |
TAIR10 | 119482012 |
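
To illustrate what the non-N count measures, here is a minimal Python sketch that tallies non-N bases from FASTA-formatted lines. It is not how `faCount` is implemented. The function name `count_non_n_bases` and the toy sequences are hypothetical, and treating lowercase `n` the same as `N` is an assumption of this sketch.

```python
from typing import Iterable


def count_non_n_bases(fasta_lines: Iterable[str]) -> int:
    """Count bases other than N/n across all sequences in FASTA-formatted lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        # Every character that is not N or n counts towards the effective size.
        total += len(line) - line.count("N") - line.count("n")
    return total


# Toy input (hypothetical): two short sequences with runs of N.
example = [">chr1", "ACGTNNNN", "acgtnnAC", ">chr2", "NNNNGGCC"]
print(count_non_n_bases(example))  # 4 + 6 + 4 = 14
```

For a real genome, an open file handle can be passed in place of the list, since the function only iterates over lines.
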
These values are only appropriate if multimapping reads are included. If they are excluded (or any MAPQ filter is applied), then values derived from option 2 are more appropriate. These values depend on the read length. We can approximate them for various read lengths using the khmer program, and `unique-kmers.py` in particular. A table of effective genome sizes for a given read length using this method is provided below:
Read length | GRCh37 | GRCh38 | T2T/CHM13CAT_v2 | GRCm37 | GRCm38 | GRCm39 | dm3 | dm6 | GRCz10 | GRCz11 | WBcel235 | TAIR10 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
50 | 2685511454 | 2701495711 | 2725240337 | 2304947876 | 2308125299 | 2309746861 | 130428510 | 125464678 | 1195445541 | 1197575653 | 95159402 | 114339094 |
75 | 2736124898 | 2747877702 | 2786136059 | 2404646149 | 2407883243 | 2410055689 | 135004387 | 127324557 | 1251132611 | 1250812288 | 96945370 | 115317469 |
100 | 2776919708 | 2805636231 | 2814334875 | 2462480910 | 2467481008 | 2468088461 | 139647132 | 129789773 | 1280188944 | 1280354977 | 98259898 | 118459858 |
150 | 2827436883 | 2862010428 | 2931551487 | 2489384085 | 2494787038 | 2495461690 | 144307658 | 129940985 | 1312207019 | 1311832909 | 98721103 | 118504138 |
200 | 2855463800 | 2887553103 | 2936403235 | 2513019076 | 2520868989 | 2521902382 | 148523810 | 132508963 | 1321355041 | 1322366338 | 98672558 | 117723393 |
250 | 2855044784 | 2898802627 | 2960856300 | 2528988583 | 2538590322 | 2538633971 | 151901455 | 132900923 | 1339205109 | 1342093482 | 101271756 | 119585546 |
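
The idea behind the read-length-dependent values can be sketched as counting the distinct k-mers in the genome, with k set to the read length. The Python example below does this exactly with a set, which is only practical for small inputs. It is not khmer's implementation, which estimates the count so that whole genomes remain tractable. The function name `count_unique_kmers` and the toy sequences are hypothetical, and ignoring reverse complements and skipping any k-mer that contains N are simplifying assumptions of this sketch.

```python
def count_unique_kmers(sequences, k):
    """Return the number of distinct N-free k-mers across all sequences."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if "N" not in kmer:  # assumption: k-mers overlapping N are skipped
                seen.add(kmer)
    return len(seen)


# Toy input (hypothetical): the second sequence shares ACGT with the first.
toy = ["ACGTACGTAC", "NNACGTTT"]
print(count_unique_kmers(toy, 4))  # ACGT, CGTA, GTAC, TACG, CGTT, GTTT -> 6
```

A k-mer that occurs several times is counted only once, so repetitive sequence adds less to the total than its length would suggest.
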