issue with UCSC liftover conversion

Jill Rabinowitz

Jun 22, 2020, 11:55:32 AM

to gen...@soe.ucsc.edu

Hi there,

I am using UCSC's liftover tool to go from the hg19 build to hg38. There were 2 chromosome positions that failed to convert that were NOT in the text file that I submitted for conversion. I cannot for the life of me figure out why these 2 positions are listed as not converted when they were not in the file I submitted. Can you please help?

NOT in text file that I submitted for conversion:

chr9:8-80000001
chr12:4-48000001

This is the format of the data I submitted:

chr1:86028-86029
chr1:693731-693732
chr1:713092-713093
chr1:714596-714597
chr1:715205-715206
chr1:715265-715266
chr1:715367-715368
chr1:717474-717475
chr1:717485-717486

Best,

Jill

Jairo Navarro Gonzalez

unread,

Jun 30, 2020, 7:14:41 PM6/30/20

to Jill Rabinowitz, gen...@soe.ucsc.edu

Hello Jill,

Thank you for using the UCSC Genome Browser and reporting your issues with the LiftOver tool.

Unfortunately, we are unable to reproduce your issue; we were able to lift the positions you shared just fine. Can you confirm that you are using one of the official Genome Browser mirror sites?

If you are using a text file with the positions that you want to lift, could you share the file with us? If you are concerned with posting the file on a publicly archived forum, you can send the file to genom...@soe.ucsc.edu or directly to me.

I look forward to your reply. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro
UCSC Genome Browser


Jill Rabinowitz

Jun 30, 2020, 7:55:32 PM

to Jairo Navarro Gonzalez, gen...@soe.ucsc.edu

Hi Jairo,

Thank you for your message. So, I actually ended up just using the UCSC liftover tool on my cluster and I did not get those specific errors. However, I have gotten some rather odd output. In particular, there are some chromosome values that UCSC outputs that don't make sense. Here is the formatting I used for the cluster:

chr1 10176 10177 rs367896724
chr1 10234 10235 rs540431307
chr1 10351 10352 rs555500075
chr1 10504 10505 rs548419688
chr1 10505 10506 rs568405545
chr1 10510 10511 rs534229142
chr1 10538 10539 rs537182016
chr1 10541 10542 rs572818783
chr1 10578 10579 rs538322974
chr1 148684033 148684034 rs587751686

And this was the output I got for the last row (rs587751686):

chr1_KI270765v1_alt 144190 144191 rs587751686

Now, for the most part, the liftover worked (I am going from hg19 --> hg38), but I don't understand why I am getting those weird chromosome values and why they aren't just in the unMapped file. Do you have any thoughts on this?

Best,

Jill



Luis Nassar

Jul 3, 2020, 2:14:30 PM

to Jill Rabinowitz, Jairo Navarro Gonzalez, gen...@soe.ucsc.edu

Hello Jill,

That match you see (chr_alt) is a match to an alternate haplotype sequence in hg38 (http://genome.ucsc.edu/FAQ/FAQblat.html#blat1c). Chain files look for the best matches when lifting coordinates (including these alt haplotype sequences, and sometimes fix sequences, when lifting to a newer assembly). This process is not perfect, and if you look at our lift over chains at that position on hg19, you will notice that it has a gap (double line): http://genome-preview.soe.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=lou&hgS_otherUserSessionName=ML25756

Note: That link leads to our development server, as that track is not yet released. Data on our development server is subject to change at any time.

The only exception within that gap is the one match to chr_alt, which is the match lift over uses for your SNP. You may also notice there that remap (NCBI's equivalent of lift over) does have a mapping to the chr1 region.
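
To see how many lifted records ended up on sequences like this, the output file can be screened after the fact. The following is a minimal Python sketch, not a feature of lift over itself. It assumes the UCSC hg38 naming convention in which alt, fix, random and unplaced sequences carry an underscore in their name (for example chr1_KI270765v1_alt); the function name is hypothetical, and the first example row uses placeholder coordinates rather than real lift over output.

```python
def split_by_contig_type(bed_lines):
    """Separate BED records on primary chromosomes from all others.

    Assumption: non-primary sequences (alt, fix, random, chrUn) are
    recognisable by an underscore in the name, as in UCSC hg38 naming.
    """
    primary, other = [], []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        chrom = line.split()[0]
        (other if "_" in chrom else primary).append(line)
    return primary, other


lifted = [
    "chr1\t10176\t10177\trs367896724",  # placeholder coordinates, not real output
    "chr1_KI270765v1_alt\t144190\t144191\trs587751686",  # row from this thread
]
primary, other = split_by_contig_type(lifted)
print(other)
```

Records in the second list are candidates for a second look, for example via an rsID lookup or NCBI's remap.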

When mapping SNPs, the best approach is to look up rsIDs directly on the target assembly (when available). There are two ways to do this in the Genome Browser. The first is to use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgGateway), making the following selections:

[image.png: screenshot of the Table Browser selections]

Then choose "paste list" or "upload list" under "identifiers (names/accessions):". In this case I'll use the example rsID (rs587751686). Then click "get output" (or alternatively give a file name to prompt a download). Results will look as follows:

#chrom chromStart chromEnd name ref altCount alts shiftBases freqSourceCount minorAlleleFreq majorAllele minorAllele maxFuncImpact class ucscNotes _dataOffset _dataLen
chr1 144333635 144333636 rs587751686 A 1 G, 0 12 0.000199681,-inf,-inf,-inf,-inf,7.04325e-05,-inf,-inf,-inf,-inf,-inf,-inf, A,,,,,A,,,,,,, G,,,,,G,,,,,,, 2153 snv rareSome,rareAll, 53239806673 127
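
If you download that output for many rsIDs, the first four columns are already a BED-style record. Here is a small Python sketch for pulling them out. It assumes the downloaded file is tab-separated with a header line starting with "#chrom"; the function name is hypothetical, and the example string is trimmed to the first five columns of the row shown above.

```python
def table_rows_to_bed(text):
    """Extract (chrom, start, end, name) from Table Browser style output.

    Assumption: fields are tab-separated and the header line begins
    with '#chrom'.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if line.startswith("#"):
            fields[0] = fields[0].lstrip("#")  # '#chrom' -> 'chrom'
            header = fields
            continue
        if header is None:
            raise ValueError("expected a '#chrom ...' header line first")
        rec = dict(zip(header, fields))
        rows.append(
            (rec["chrom"], int(rec["chromStart"]), int(rec["chromEnd"]), rec["name"])
        )
    return rows


example = (
    "#chrom\tchromStart\tchromEnd\tname\tref\n"
    "chr1\t144333635\t144333636\trs587751686\tA\n"
)
print(table_rows_to_bed(example))
```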

The second option, if you have rsIDs, is to use the Variant Annotation Integrator (http://genome.ucsc.edu/cgi-bin/hgVai). An example of this can be seen here: https://groups.google.com/a/soe.ucsc.edu/g/genome/c/wNZO2A7k33Q/m/RlDl8s6eAgAJ

To summarize, it is recommended to use rsIDs to map SNPs when available. If those are not available, then lift over may present an alternate solution. Worth considering, however, is that NCBI's remap may be more successful in some cases, and UCSC's lift over in others.
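
As a rough illustration of that order of preference, here is a short Python sketch that takes the rsID-based coordinates when they exist and falls back to the lift over result otherwise. The function and the dictionary layout are hypothetical, and "siteA" is a made-up record with no rsID, included only to show the fallback.

```python
def choose_coordinates(rsid_hits, liftover_hits):
    """Prefer rsID-based coordinates; fall back to lift over results.

    Both arguments map a record name to a (chrom, start, end) tuple.
    Returns name -> (source, coordinates).
    """
    chosen = {}
    for name in sorted(set(rsid_hits) | set(liftover_hits)):
        if name in rsid_hits:
            chosen[name] = ("rsID", rsid_hits[name])
        else:
            chosen[name] = ("liftOver", liftover_hits[name])
    return chosen


rsid_hits = {"rs587751686": ("chr1", 144333635, 144333636)}
liftover_hits = {
    "rs587751686": ("chr1_KI270765v1_alt", 144190, 144191),
    "siteA": ("chr1", 100, 101),  # hypothetical record without an rsID
}
result = choose_coordinates(rsid_hits, liftover_hits)
print(result["rs587751686"])
```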

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute