Update subsampling by trvrb · Pull Request #1074 · nextstrain/ncov
added 2 commits
In the current "global" analyses, treating China and India each as just another country in Asia resulted in much lower per-capita sampling rates for those two countries. For example, in the current gisaid/global/6m tree we have 66 viruses from Guatemala (population 17M), 62 viruses from Costa Rica (population 5M), 18 viruses from India (population 1400M) and 21 viruses from China (population 1400M). This is up to a ~1000-fold difference in per-capita sampling intensity (Costa Rica vs. India).
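As a quick check of the arithmetic, the per-capita rates can be recomputed from the counts and populations quoted above. This is only a sketch: the variable names are illustrative and the numbers are copied from the text, not read from the actual tree.

```python
# Virus counts and populations (in millions) as quoted for gisaid/global/6m.
samples = {
    "Guatemala": (66, 17),
    "Costa Rica": (62, 5),
    "India": (18, 1400),
    "China": (21, 1400),
}

# Sampled viruses per million people.
per_million = {country: n / pop for country, (n, pop) in samples.items()}

for country, rate in sorted(per_million.items(), key=lambda kv: -kv[1]):
    print(f"{country:<11} {rate:8.3f} viruses per million")

# Ratio between the most and least intensively sampled country in this list.
fold = per_million["Costa Rica"] / per_million["India"]
print(f"Costa Rica vs India: ~{fold:.0f}-fold")
```

The Costa Rica to India ratio comes out just under 1000, and the Guatemala to India ratio is around 300.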
This commit partially addresses this issue by splitting out China and India into their own buckets when subsampling. This results in buckets of North America (580M), South America (420M), Europe (750M), Africa (1.2B), Oceania (44M), India (1.4B), China (1.4B) and Asia minus India and China (1.8B).
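To see what a purely population-based split across these eight buckets would look like, here is a minimal sketch. The function name and the total of 3000 sequences are hypothetical, and the actual targets in this PR need not be strictly proportional (Oceania, for instance, is set separately).

```python
# Bucket populations in millions, as listed above.
bucket_pop = {
    "North America": 580,
    "South America": 420,
    "Europe": 750,
    "Africa": 1200,
    "Oceania": 44,
    "India": 1400,
    "China": 1400,
    "Asia (minus India and China)": 1800,
}


def proportional_targets(populations, total_sequences):
    """Split total_sequences across buckets in proportion to population."""
    total_pop = sum(populations.values())
    return {
        name: round(total_sequences * pop / total_pop)
        for name, pop in populations.items()
    }


# 3000 is a made-up total, used only for illustration.
targets = proportional_targets(bucket_pop, total_sequences=3000)
for name, n in targets.items():
    print(f"{name:<30} {n}")
```

Because each bucket is rounded independently, the targets may not sum exactly to the requested total.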
Additionally, this commit makes a small correction that reduces Oceania's region count to 20% of that of the other regions, down from the previous 33%.
Within the builds that focus on region=Asia there is currently less intensive per-capita sampling in China and India relative to other countries in Asia. For example, the current gisaid/asia/6m tree has 144 viruses from China (population 1.4B), 96 viruses from India (population 1.4B), 118 viruses from Thailand (population 70M) and 53 viruses from Laos (population 7M). This is a roughly 100-fold difference in per-capita sampling intensity between Laos and India.
This commit splits Asia-focused builds to have 4 geographic buckets rather than the previous 2, arriving at China, India, Asia (minus China and India) and global context.
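The four-bucket split can be pictured as a simple assignment rule. The sketch below is an illustration only: the function name is hypothetical, it assumes each record carries `region` and `country` values, and the real builds express this through the workflow's subsampling configuration rather than a Python function.

```python
def asia_build_bucket(region: str, country: str) -> str:
    """Return the subsampling bucket for a record in an Asia-focused build."""
    if region != "Asia":
        # Everything outside Asia serves as global context.
        return "global context"
    if country in ("China", "India"):
        # China and India each get their own bucket.
        return country
    return "Asia (minus China and India)"


for region, country in [
    ("Asia", "China"),
    ("Asia", "India"),
    ("Asia", "Laos"),
    ("Europe", "France"),
]:
    print(f"{country:<7} -> {asia_build_bucket(region, country)}")
```

Under the previous two-bucket arrangement, the first three records would all have landed in a single Asia bucket.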
This won't fully address differential per-capita sampling intensity in Asia, but is a simple addition that should go a long way.
This commit slightly updates targets for the nextstrain_global subsampling schemes in an attempt to bring realized per-capita sample counts more in line with population size.
trvrb deleted the update-subsampling branch