Sequence comparison of human and mouse genes reveals a homologous block structure in the promoter regions

Comparative Study

Yutaka Suzuki et al. Genome Res. 2004 Sep.

Abstract

Comparative sequence analysis was carried out for the regions adjacent to experimentally validated transcriptional start sites (TSSs), using 3324 pairs of human and mouse genes. We aligned the upstream putative promoter sequences over the 1-kb proximal regions and found that sequence conservation did not extend beyond, on average, 510 bp upstream of the TSSs. This discontinuous pattern of sequence conservation revealed a "block" structure in about one-third of the putative promoter regions. Consistent with this, G+C content and CpG frequency differed significantly between the inside and the outside of the blocks. Within the blocks, the sequence identity was uniformly 65% regardless of block length. About 90% of the previously characterized transcription factor binding sites were located within the blocks. In 46% of the blocks, the 5' ends were bounded by interspersed repetitive elements, some of which may have nucleated the genomic rearrangements. The blocks were shortest in the promoters of genes encoding transcription factors and of genes with brain-specific expression patterns, which suggests that evolutionary diversification of transcriptional modulation is most marked in these groups of genes.
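The comparison of G+C content and CpG frequency inside and outside a block can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `block_start` index, and the definition of CpG frequency as CpG dinucleotides per overlapping dinucleotide position are all assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G or C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def cpg_frequency(seq):
    """CpG dinucleotides per overlapping dinucleotide position (assumed definition)."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg / (len(seq) - 1)


def compare_regions(upstream_seq, block_start):
    """Compare composition outside and inside a conserved block.

    upstream_seq is read 5' to 3' and ends at the TSS; block_start is a
    hypothetical 0-based index of the 5' boundary of the block, so that
    upstream_seq[block_start:] lies inside the block.
    """
    outside = upstream_seq[:block_start]
    inside = upstream_seq[block_start:]
    return {
        "outside": (gc_content(outside), cpg_frequency(outside)),
        "inside": (gc_content(inside), cpg_frequency(inside)),
    }
```

For a real analysis, the two sets of values would be collected over many promoters and compared with a statistical test; the sketch only shows the per-sequence calculation.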

Figures

Figure 1

Sequence identity between human and mouse putative promoter regions (PPRs). Sequence alignments were calculated using LALIGN with the default parameters. The sequence identity was evaluated as the number of aligned nucleotides in the regions of -1000 to +200 (TSS: 0). (A) The average sequence identities were calculated for each region. (B) The PPRs were separated into 200-bp windows at the positions indicated in the inset, and sequence identity was calculated for each window. The plot shows how frequently each window falls into each of the sequence identity groups shown on the horizontal axis.
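A per-window identity calculation of the kind described for panel B might look like the following Python sketch. It assumes the two sequences have already been aligned (for example by LALIGN) and are supplied as equal-length strings with `-` for gaps; windowing over alignment columns and dividing by the window length are simplifying assumptions, and `window_identity` is a hypothetical name.

```python
def window_identity(aln_a, aln_b, window=200):
    """Fraction of identical, non-gap columns in consecutive windows.

    aln_a and aln_b are the two rows of a pairwise alignment, of equal
    length, with '-' marking gaps. Windows are taken over alignment
    columns, which is an assumption of this sketch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    aln_a, aln_b = aln_a.upper(), aln_b.upper()
    identities = []
    for start in range(0, len(aln_a), window):
        col_a = aln_a[start:start + window]
        col_b = aln_b[start:start + window]
        # Count columns where both rows carry the same nucleotide.
        matches = sum(1 for x, y in zip(col_a, col_b) if x == y and x != "-")
        identities.append(matches / len(col_a))
    return identities
```

Binning the returned values (for instance into 10% identity groups) and counting windows per bin would give a frequency distribution comparable in spirit to the one plotted in panel B.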

Figure 2

Sequence alignments of the block structure in PPRs and nongenic regions. (A) Frequency of the blocks belonging to each population. (B) Relation between block length and the average sequence identity within the block. (C) Relation between percentile position within the block and the average sequence identity. (D) Alignment of the nongenic sequences using LALIGN. Sequences ranging from -1 kb to +200 bp of the putative syntenic regions located in nongenic regions, as in the UCSC Genome Browser, were aligned, and the frequencies of the aligned nucleotides were calculated at each position. The vertical axis represents the frequency with which the nucleotide at the indicated position is located within the block. (Note that the vertical axis in Figure 1 represents the frequency of the sequence "identity".)

Figure 3

Sequence alignments around the boundary of the block and around the boundary between the first intron and the second exon, using SSEARCH. (A) Sequences of human and mouse PPRs were aligned using SSEARCH with a 50-bp moving window around the boundary of the block. The broken line represents the boundary of the block calculated using LALIGN. The vertical axis represents the average SSEARCH score calculated for the corresponding position. The horizontal axis represents the position relative to the boundary. (B) Result of an analysis similar to that shown in A, using the sequences proximal to the 5′ end of the second exons. The broken line represents the exon-intron boundary. The horizontal axis represents the position relative to the exon-intron boundary.
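The moving-window scoring described for panel A can be sketched in Python. SSEARCH computes Smith-Waterman local alignment scores; the sketch below substitutes a small Smith-Waterman scorer with linear gap penalties. The scoring values, the function names, and the assumption that the two sequences are already positioned so that equal indices correspond (for example, both centered on the block boundary) are illustrative choices, not the parameters used in the study.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The match/mismatch/gap values are illustrative assumptions, not
    SSEARCH defaults.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def moving_window_scores(seq_a, seq_b, window=50, step=1):
    """Score the same window of both sequences at each offset.

    Assumes seq_a and seq_b are pre-positioned so that index i in one
    corresponds to index i in the other.
    """
    n = min(len(seq_a), len(seq_b))
    return [
        sw_score(seq_a[i:i + window], seq_b[i:i + window])
        for i in range(0, n - window + 1, step)
    ]
```

Averaging these per-position scores over many gene pairs, with positions expressed relative to the boundary, would produce a profile analogous to the one plotted on the vertical axis.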

WEB SITE REFERENCES

    1. http://dbtss.hgc.jp/; DBTSS.
    2. http://fantom.gsc.riken.go.jp/; FANTOM.
    3. ftp://ftp.virginia.edu/pub/fasta/; SSEARCH.
    4. http://genome.ucsc.edu/cgi-bin/hgBlat?command=start; BLAT.
    5. http://genome.ucsc.edu/downloads.html; UCSC Genome Browser.
