Kenta Nakai - Academia.edu

Papers by Kenta Nakai

A study on the application of topic models to motif finding algorithms

BMC Bioinformatics, Dec 1, 2016

Background: Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. Results: The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. Conclusions: The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.
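
The abstract above describes treating sequences as text documents and their k-mers as words. A minimal sketch of that preprocessing step might look like the following. The overlapping-window tokenization, the value of k, and the function names are assumptions made for illustration; the abstract does not specify them, and no topic model is fitted here.

```python
from collections import Counter

def kmer_words(sequence, k):
    """Return the overlapping k-mers of a sequence, in order of position."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def document_term_counts(sequences, k):
    """Build one bag of words (k-mer -> count) per sequence."""
    return [Counter(kmer_words(seq, k)) for seq in sequences]

# Toy input, not taken from the paper's data sets.
sequences = ["ACGTACGT", "TTACGTAA"]
counts = document_term_counts(sequences, k=4)
```

Each Counter plays the role of one document's word counts, which is the kind of input a topic model such as a CTM would typically be trained on.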

Two different classes of co-occurring motif pairs found by a novel visualization method in human promoter regions

BMC Genomics, 2008

Background: It is essential in modern biology to understand how transcriptional regulatory regions are composed of cis-elements, yet we have limited knowledge of, for example, the combinational uses of these elements and their positional distribution. Results: We predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions within -2000 and +1000 bp of transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. In the 8,454 significantly correlated motif pairs, two major classes were observed: 248 pairs in Class 1 were mainly found around TSSs, whereas 4,020 Class 2 pairs appear at rather arbitrary distances from TSSs. These classes are distinct in a number of aspects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor the Class 1 pairs are more li...

WoLF PSORT: protein localization predictor

Nucleic Acids Research, 2007

WoLF PSORT is an extension of the PSORT II program for protein subcellular location prediction. WoLF PSORT converts protein amino acid sequences into numerical localization features; based on sorting signals, amino acid composition and functional motifs such as DNA-binding motifs. After conversion, a simple k-nearest neighbor classifier is used for prediction. Using html, the evidence for each prediction is shown in two ways: (i) a list of proteins of known localization with the most similar localization features to the query, and (ii) tables with detailed information about individual localization features. For convenience, sequence alignments of the query to similar proteins and links to UniProt and Gene Ontology are provided. Taken together, this information allows a user to understand the evidence (or lack thereof) behind the predictions made for particular proteins.

Bayesian Joint Prediction of Associated Transcription Factors in Bacillus Subtilis

Biocomputing 2005, 2004

Sigma factors, often in conjunction with other transcription factors, regulate gene expression in prokaryotes at the transcriptional level. Specific transcription factors tend to co-occur with specific sigma factors. To predict new members of the transcription factor regulon, we applied Bayes rule to combine the Bayesian probability of sigma factor prediction calculated from microarray data and the sigma factor binding sequence motif, the motif score of the transcription factor associated with the sigma factor, the empirically determined distance between the transcription start site and the cis-regulatory region, and the tendency for specific sigma factors and transcription factors to co-occur. By combining these information sources, we improve the accuracy of predicting regulation by transcription factors, and also confirm the sigma factor prediction. We applied our proposed method to all genes in Bacillus subtilis to find currently unknown gene regulations by transcription factors and sigma factors.

Phylogenetic Analysis of Eubacterial Transcriptional Systems Based on the DBTBS Database of B. subtilis Transcription Factors and Promoters

Genome Informatics, 2003

In 1999, we released the DBTBS database of Bacillus subtilis promoters and transcription factors [1]. DBTBS is a reference database containing experimentally characterized transcription factors with their regulated genes as well as their recognition sequences, as reported in the literature. This database is useful to confirm predictions about transcription relatives using B. subtilis data. The latest version of this database shows position specific scoring matrices (PSSMs) to support a function to find putative transcription factor binding sites [2]. One of the problems for using weight matrices or consensus patterns to identify novel recognition positions of known transcription factors is that it often produces a number of false positives. To overcome this problem, the use of sequence conservation information between closely related species, called phylogenetic footprinting, is widely used. For example, we predicted B. subtilis regulons based on the conservation of upstream sequence...

A Database of B. subtilis Promoters and Transcription Factors

Genome Informatics, 1998

Although the number of bacteria with their entire genomic sequence known is increasing, it is essential to study well-known genomes for understanding the 'blueprint' of bacteria more precisely. For this purpose, E. coli and B. subtilis are especially suitable because of their long history of research. Among the various information coded in bacterial genomes, we are interested in the analysis of the transcriptional regulation network. One of our ultimate goals is to predict the expression condition of given ORFs from their upstream sequences. For example, we have developed a prediction system of sigma-dependency of ORFs found in B. subtilis [6] and in E. coli (unpublished). However, to understand the detailed mechanism of transcription regulation, the knowledge of other transcription factors is also crucial. For E. coli, there is such a database called RegulonDB [3] but there is no database containing comprehensive information of transcription in B. subtilis. Thus, we constructed a datab...

Modeling Transcriptional Units of E. coli Genes Using HMM

Genome Informatics, 1998

In recent years, the number of bacteria whose entire genomic sequence is determined is growing rapidly. However, the information which can be derived from computer analyses of them is still limited although the predictive identification of coding regions is performed with relatively high accuracy (see [4], for example). Therefore, we have studied ways to interpret the regulatory information coded in genomic sequences [5, 6]. In this work, we report our first effort to integrate the models for detecting various signals (e.g., promoters and terminators) with our previous model of coding regions, aiming at the recognition of transcriptional units in bacterial genomes.

Determination of the Nucleotide Sequence of Bombyx mori Cytoplasmic Polyhedrosis Virus Segment 9 and Its Expression in BmN4 Cells

Journal of Virology, 1998

Cloning and sequencing of segment 9 of Bombyx mori cytoplasmic polyhedrosis virus (BmCPV) strains H and I were performed. The segment consisted of 1,186 bp harboring 5′ and 3′ noncoding regions and an open reading frame from positions 75 to 1037, encoding a protein with 320 amino acids, termed NS5. Comparison of the nucleotide sequences of NS5 for the two strains indicated 37 point differences resulting in only six amino acid replacements. Homology search showed that NS5 has localized similarities to human poliovirus RNA-dependent RNA polymerase and human rotavirus NS26. By Western blot analysis, NS5 was found in BmCPV-infected midgut cells, but not in polyhedra or virus virions, and was mainly detectable in the nucleus in BmCPV-infected BmN4 cells. Immunoblot analysis with anti-NS5 and antipolyhedrin antibodies displayed marked differences in the period of expression of NS5 and polyhedrin: the polyhedrin molecule was first detected 2 or 3 days after infection with BmCPV, whereas th...

Transcriptional regulation of a horizontally transferred gene from bacterium to chordate

Proceedings. Biological sciences, Dec 28, 2016

The horizontal transfer of genes between distantly related organisms is undoubtedly a major factor in the evolution of novel traits. Because genes are functionless without expression, horizontally transferred genes must acquire appropriate transcriptional regulations in their recipient organisms, although the evolutionary mechanism is not known well. The defining characteristic of tunicates is the presence of a cellulose containing tunic covering the adult and larval body surface. Cellulose synthase was acquired by horizontal gene transfer from Actinobacteria. We found that acquisition of the binding site of AP-2 transcription factor was essential for tunicate cellulose synthase to gain epidermal-specific expression. Actinobacteria have very GC-rich genomes, regions of which are capable of inducing specific expression in the tunicate epidermis as the AP-2 binds to a GC-rich region. Therefore, the actinobacterial cellulose synthase could have been potentiated to evolve its new functi...

ZBTB16 as a Downstream Target Gene of Osterix Regulates Osteoblastogenesis of Human Multipotent Mesenchymal Stromal Cells

Journal of cellular biochemistry, Oct 1, 2016

Human multipotent mesenchymal stromal cells (hMSCs) possess the ability to differentiate into osteoblasts, and they can be utilized as a source for bone regenerative therapy. Osteoinductive pretreatment, which induces the osteoblastic differentiation of hMSCs in vitro, has been widely used for bone tissue engineering prior to cell transplantation. However, the molecular basis of osteoblastic differentiation induced by osteoinductive medium (OIM) is still unknown. Therefore, we used a next-generation sequencer to investigate the changes in gene expression during the osteoblastic differentiation of hMSCs. The hMSCs used in this study possessed both multipotency and self-renewal ability. Whole-transcriptome analysis revealed that the expression of zinc finger and BTB domain containing 16 (ZBTB16) was significantly increased during the osteoblastogenesis of hMSCs. ZBTB16 mRNA and protein expression was enhanced by culturing the hMSCs with OIM. Small interfering RNA (siRNA)-mediated gene...

Genome informatics for data-driven biology

GenomeBiology.com (London. Print), Mar 27, 2002

Discovery of Intermediary Genes between Pathways Using Sparse Regression

PloS one, 2015

The use of pathways and gene interaction networks for the analysis of differential expression experiments has allowed us to highlight the differences in gene expression profiles between samples in a systems biology perspective. The usefulness and accuracy of pathway analysis critically depend on our understanding of how genes interact with one another. That knowledge is continuously improving due to advances in next generation sequencing technologies and in computational methods. While most approaches treat each of them as independent entities, pathways actually coordinate to perform essential functions in a cell. In this work, we propose a methodology based on a sparse regression approach to find genes that act as intermediary to and interact with two pathways. We model each gene in a pathway using a set of predictor genes, and a connection is formed between the pathway gene and a predictor gene if the sparse regression coefficient corresponding to the predictor gene is non-zero. A...

DBTMEE: a database of transcriptome in mouse early embryos

Nucleic acids research, 2015

DBTMEE (http://dbtmee.hgc.jp/) is a searchable and browsable database designed to manipulate gene expression information from our ultralarge-scale whole-transcriptome analysis of mouse early embryos. Since integrative approaches with multiple public analytical data have become indispensable for studying embryogenesis due to technical challenges such as biological sample collection, we intend DBTMEE to be an integrated gateway for the research community. To do so, we combined the gene expression profile with various public resources. Thereby, users can extensively investigate molecular characteristics among totipotent, pluripotent and differentiated cells while taking genetic and epigenetic characteristics into consideration. We have also designed user friendly web interfaces that enable users to access the data quickly and easily. DBTMEE will help to promote our understanding of the enigmatic fertilization dynamics.

Prediction of Transcriptional Terminators in Bacillus subtilis and Related Species

PLoS Computational Biology, 2005

In prokaryotes, genes belonging to the same operon are transcribed in a single mRNA molecule. Transcription starts as the RNA polymerase binds to the promoter and continues until it reaches a transcriptional terminator. Some terminators rely on the presence of the Rho protein, whereas others function independently of Rho. Such Rho-independent terminators consist of an inverted repeat followed by a stretch of thymine residues, allowing us to predict their presence directly from the DNA sequence. Unlike in Escherichia coli, the Rho protein is dispensable in Bacillus subtilis, suggesting a limited role for Rho-dependent termination in this organism and possibly in other Firmicutes. We analyzed 463 experimentally known terminating sequences in B. subtilis and found a decision rule to distinguish Rho-independent transcriptional terminators from non-terminating sequences. The decision rule allowed us to find the boundaries of operons in B. subtilis with a sensitivity and specificity of about 94%. Using the same decision rule, we found an average sensitivity of 94% for 57 bacteria belonging to the Firmicutes phylum, and a considerably lower sensitivity for other bacteria. Our analysis shows that Rho-independent termination is dominant for Firmicutes in general, and that the properties of the transcriptional terminators are conserved. Terminator prediction can be used to reliably predict the operon structure in these organisms, even in the absence of experimentally known operons. Genome-wide predictions of Rho-independent terminators for the 57 Firmicutes are available in the Supporting Information section.
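
The structural description above (an inverted repeat followed by a stretch of thymine residues) can be illustrated with a naive scan. This is only a sketch and is not the decision rule from the paper, which the abstract does not give: the stem length, loop range, T-stretch length, the requirement of a perfect repeat, and the function names are all assumptions chosen for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_hairpin_t_stretch(seq, stem=5, loop_range=(3, 8), min_t=4):
    """Return (start, stem_seq, loop_len) for each perfect inverted repeat
    that is immediately followed by at least min_t thymines."""
    seq = seq.upper()
    hits = []
    last_start = len(seq) - 2 * stem - loop_range[0] - min_t
    for start in range(last_start + 1):
        left = seq[start:start + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            right_start = start + stem + loop
            right_end = right_start + stem
            tail = seq[right_end:right_end + min_t]
            if len(tail) < min_t:
                break  # larger loops would run past the end of the sequence
            right = seq[right_start:right_end]
            if right == reverse_complement(left) and tail == "T" * min_t:
                hits.append((start, left, loop))
    return hits

# Toy sequence: stem GCCGC, 4-nt loop, complementary arm GCGGC, then TTTTT.
example = "CCGCCGCAAAAGCGGCTTTTTGA"
hits = find_hairpin_t_stretch(example)
```

A practical predictor would also need to tolerate mismatches and score stem stability, which this sketch deliberately leaves out.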

HitPredict: a database of quality assessed protein–protein interactions in nine species

Nucleic Acids Research, 2010

Despite the availability of a large number of protein-protein interactions (PPIs) in several species, researchers are often limited to using very small subsets in a few organisms due to the high prevalence of spurious interactions. In spite of the importance of quality assessment of experimentally determined PPIs, a surprisingly small number of databases provide interactions with scores and confidence levels. We introduce HitPredict (http://hintdb.hgc.jp/htp/), a database with quality assessed PPIs in nine species. HitPredict assigns a confidence level to interactions based on a reliability score that is computed using evidence from sequence, structure and functional annotations of the interacting proteins. HitPredict was first released in 2005 and is updated annually. The current release contains 36 930 proteins with 176 983 non-redundant, physical interactions, of which 116 198 (66%) are predicted to be of high confidence.

The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts

Nucleic Acids Research, 2007

Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views: Transcript view and Locus view and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.

Alterations in rRNA-mRNA Interaction during Plastid Evolution

Molecular Biology and Evolution, 2014

Translation initiation depends on the recognition of mRNA by a ribosome. For this to occur, prokaryotes primarily use the Shine-Dalgarno (SD) interaction, where the 3′-tail of small subunit rRNA (core motif: 3′ CCUCC) forms base pairs with a complementary signal sequence in the 5′-untranslated region of mRNA. Here, we examined what happened to SD interactions during the evolution of a cyanobacterial endosymbiont into modern plastids (including chloroplasts). Our analysis of available complete plastid genome sequences revealed that the majority of plastids retained SD interactions but with varying levels of usage. Parallel losses of SD interactions took place in plastids of Chlorophyta, Euglenophyta, and Chromerida/Apicomplexa lineages, presumably related to their extensive reductive evolution. Interestingly, we discovered that the classical SD interaction (3′ CCUCC/5′ GGAGG [rRNA/mRNA]) was replaced by an altered SD interaction (3′ CCCU/5′ GGGA or 3′ CUUCC/5′ GAAGG) through coordinated changes in the sequences of the core rRNA motif and its paired mRNA signal. These changes in plastids of Chlorophyta and Euglenophyta proceeded through intermediate stages that allowed both the classical and altered SD interactions. This coevolution between the rRNA motif and the mRNA signal demonstrates unexpected plasticity in the translation initiation machinery.
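
The rRNA/mRNA motif pairs quoted above follow directly from Watson-Crick complementarity: with the rRNA motif written 3′ to 5′ and the mRNA signal written 5′ to 3′, the two strands pair position by position. The sketch below shows that relationship and a simple exact-match scan of a 5′-untranslated region. The function names and the toy sequence are assumptions for illustration, and G-U wobble pairing is ignored.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def mrna_signal_for(rrna_motif_3to5):
    """Given an rRNA motif written 3'->5', return the complementary
    mRNA signal written 5'->3' (Watson-Crick pairs only)."""
    return "".join(PAIR[base] for base in rrna_motif_3to5.upper())

def find_sd_sites(utr_5to3, rrna_motif_3to5="CCUCC"):
    """Return the 0-based positions of exact matches to the mRNA signal
    in a 5'-UTR; DNA input is accepted and T is read as U."""
    signal = mrna_signal_for(rrna_motif_3to5)
    utr = utr_5to3.upper().replace("T", "U")
    width = len(signal)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == signal]

# Toy 5'-UTR, not taken from any genome analysed in the paper.
sites = find_sd_sites("AAGGAGGUAUAAAUG")
```

Calling `mrna_signal_for` on CCUCC, CCCU, and CUUCC gives GGAGG, GGGA, and GAAGG, matching the classical and altered pairs listed in the abstract.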

Genome-wide demethylation during neural differentiation of P19 embryonal carcinoma cells

Journal of Human Genetics, 2008

Epigenetic regulation including DNA methylation plays an important role in several differentiation processes. We profiled global DNA methylation in the neural differentiation of P19 embryonic carcinoma cells using a microarray-based method called MIAMI. We found a genome-wide demethylation of genes. This suggests demethylation rather than methylation is important in neural differentiation.

The Origin and Evolution of Eukaryotic Protein Kinases

Sequence Comparison of Human and Mouse Genes Reveals a Homologous Block Structure in the Promoter Regions

Genome Research, 2004

Comparative sequence analysis was carried out for the regions adjacent to experimentally validated transcriptional start sites (TSSs), using 3324 pairs of human and mouse genes. We aligned the upstream putative promoter sequences over the 1-kb proximal regions and found that the sequence conservation could not be further extended at, on average, 510 bp upstream positions of the TSSs. This discontinuous manner of the sequence conservation revealed a “block” structure in about one-third of the putative promoter regions. Consistently, we also observed that G+C content and CpG frequency were significantly different inside and outside the blocks. Within the blocks, the sequence identity was uniformly 65% regardless of their length. About 90% of the previously characterized transcription factor binding sites were located within those blocks. In 46% of the blocks, the 5′ ends were bounded by interspersed repetitive elements, some of which may have nucleated the genomic rearrangements. The ...

Research paper thumbnail of A study on the application of topic models to motif finding algorithms

BMC Bioinformatics, Dec 1, 2016

Background: Topic models are statistical algorithms which try to discover the structure of a set ... more Background: Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. Results: The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. Conclusions: The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. 
Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.

Research paper thumbnail of Two different classes of co-occurring motif pairs found by a novel visualization method in human promoter regions

BMC Genomics, 2008

BackgroundIt is essential in modern biology to understand how transcriptional regulatory regions ... more BackgroundIt is essential in modern biology to understand how transcriptional regulatory regions are composed ofcis-elements, yet we have limited knowledge of, for example, the combinational uses of these elements and their positional distribution.ResultsWe predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions within -2000 and +1000 bp of transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. In the 8,454 significantly correlated motif pairs, two major classes were observed: 248 pairs in Class 1 were mainly found around TSSs, whereas 4,020 Class 2 pairs appear at rather arbitrary distances from TSSs. These classes are distinct in a number of aspects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor the Class 1 pairs are more li...

Research paper thumbnail of WoLF PSORT: protein localization predictor

Nucleic Acids Research, 2007

WoLF PSORT is an extension of the PSORT II program for protein subcellular location prediction. W... more WoLF PSORT is an extension of the PSORT II program for protein subcellular location prediction. WoLF PSORT converts protein amino acid sequences into numerical localization features; based on sorting signals, amino acid composition and functional motifs such as DNA-binding motifs. After conversion, a simple k-nearest neighbor classifier is used for prediction. Using html, the evidence for each prediction is shown in two ways: (i) a list of proteins of known localization with the most similar localization features to the query, and (ii) tables with detailed information about individual localization features. For convenience, sequence alignments of the query to similar proteins and links to UniProt and Gene Ontology are provided. Taken together, this information allows a user to understand the evidence (or lack thereof) behind the predictions made for particular proteins.

Research paper thumbnail of Bayesian Joint Prediction of Associated Transcription Factors in Bacillus Subtilis

Biocomputing 2005, 2004

Sigma factors, often in conjunction with other transcription factors, regulate gene expression in... more Sigma factors, often in conjunction with other transcription factors, regulate gene expression in prokaryotes at the transcriptional level. Specific transcription factors tend to co-occur with specific sigma factors. To predict new members of the transcription factor regulon, we applied Bayes rule to combine the Bayesian probability of sigma factor prediction calculated from microarray data and the sigma factor binding sequence motif, the motif score of the transcription factor associated with the sigma factor, the empirically determined distance between the transcription start site to the cis-regulatory region, and the tendency for specific sigma factors and transcription factors to co-occur. By combining these information sources, we improve the accuracy of predicting regulation by transcription factors, and also confirm the sigma factor prediction. We applied our proposed method to all genes in Bacillus subtilis to find currently unknown gene regulations by transcription factors and sigma factors.

Research paper thumbnail of Phylogenetic Analysis of Eubacterial Transcriptional Systems Based on the DBTBS Database of B. subtilis Transcription Factors and Promoters

Genome Informatics, 2003

In 1999, we released the DBTBS database of Bacillus subtilis promoters and transcription factors ... more In 1999, we released the DBTBS database of Bacillus subtilis promoters and transcription factors [1]. DBTBS is a reference database containing experimentally characterized transcription factors with their regulated genes as well as their recognition sequences, as reported in the literature. This database is useful to confirm predictions about transcription relatives using B.subtilis data. The latest version of this database shows position specific scoring matrices (PSSMs) to support a function to find putative transcription factor binding sites [2]. One of the problems for using weight matrices or consensus patterns to identify novel recognition positions of known transcription factors is that it often produces a number of false positives. To overcome this problem, the use of sequence conservation information between closely related species, called phylogenetic footprinting, is widely used. For example, we predicted B. subtilis regulons based on the conservation of upstream sequence...

Research paper thumbnail of A Database of B. subtilis Promoters and Transcription Factors

Genome Informatics, 1998

Although the number of bacteria with their entire genomic sequence known is increasing, it is ess... more Although the number of bacteria with their entire genomic sequence known is increasing, it is essential to study well-known genomes for understanding the 'blueprint` of bacteria more precisely. For this purpose, E. coli and B. subtilis are especially suitable because of their long history of research. Among various information coded in bacterial genomes, we are interested in the analysis of transcriptional regulation network. One of our ultimate goal is to predict the expression condition of given ORFs from their upstream sequences. For example, we have developed a prediction system of sigma-dependency of ORFs found in B. subtilis [6] and in E. coli (unpublished). However, to understand the detailed mechanism of transcription regulation, the knowledge of other transcription factors is also crucial. For E. coli, there is such a database called RegulonDB [3] but there is no databases containing comprehensive information of transcription in B. subtilis. Thus, we constructed a datab...

Research paper thumbnail of Modeling Transcriptional Units of E. coli Genes Using HMM

Genome Informatics, 1998

In recent years, the number of bacteria whose entire genomic sequence is determined is growing rapidly. However, the information which can be derived from computer analyses of them is still limited although the predictive identification of coding regions is performed with relatively high accuracy (see [4], for example). Therefore, we have studied ways to interpret the regulatory information coded in genomic sequences [5, 6]. In this work, we report our first effort to integrate the models for detecting various signals (e.g., promoters and terminators) with our previous model of coding regions, aiming at the recognition of transcriptional units in bacterial genomes.

Research paper thumbnail of Determination of the Nucleotide Sequence of Bombyx mori Cytoplasmic Polyhedrosis Virus Segment 9 and Its Expression in BmN4 Cells

Journal of Virology, 1998

Cloning and sequencing of segment 9 of Bombyx mori cytoplasmic polyhedrosis virus (BmCPV) strains H and I were performed. The segment consisted of 1,186 bp harboring 5′ and 3′ noncoding regions and an open reading frame from positions 75 to 1037, encoding a protein with 320 amino acids, termed NS5. Comparison of the nucleotide sequences of NS5 for the two strains indicated 37 point differences resulting in only six amino acid replacements. Homology search showed that NS5 has localized similarities to human poliovirus RNA-dependent RNA polymerase and human rotavirus NS26. By Western blot analysis, NS5 was found in BmCPV-infected midgut cells, but not in polyhedra or virus virions, and was mainly detectable in the nucleus in BmCPV-infected BmN4 cells. Immunoblot analysis with anti-NS5 and antipolyhedrin antibodies displayed marked differences in the period of expression of NS5 and polyhedrin: the polyhedrin molecule was first detected 2 or 3 days after infection with BmCPV, whereas th...

Research paper thumbnail of Transcriptional regulation of a horizontally transferred gene from bacterium to chordate

Proceedings. Biological sciences, Dec 28, 2016

The horizontal transfer of genes between distantly related organisms is undoubtedly a major factor in the evolution of novel traits. Because genes are functionless without expression, horizontally transferred genes must acquire appropriate transcriptional regulations in their recipient organisms, although the evolutionary mechanism is not known well. The defining characteristic of tunicates is the presence of a cellulose containing tunic covering the adult and larval body surface. Cellulose synthase was acquired by horizontal gene transfer from Actinobacteria. We found that acquisition of the binding site of AP-2 transcription factor was essential for tunicate cellulose synthase to gain epidermal-specific expression. Actinobacteria have very GC-rich genomes, regions of which are capable of inducing specific expression in the tunicate epidermis as the AP-2 binds to a GC-rich region. Therefore, the actinobacterial cellulose synthase could have been potentiated to evolve its new functi...

Research paper thumbnail of ZBTB16 as a Downstream Target Gene of Osterix Regulates Osteoblastogenesis of Human Multipotent Mesenchymal Stromal Cells

Journal of cellular biochemistry, Oct 1, 2016

Human multipotent mesenchymal stromal cells (hMSCs) possess the ability to differentiate into osteoblasts, and they can be utilized as a source for bone regenerative therapy. Osteoinductive pretreatment, which induces the osteoblastic differentiation of hMSCs in vitro, has been widely used for bone tissue engineering prior to cell transplantation. However, the molecular basis of osteoblastic differentiation induced by osteoinductive medium (OIM) is still unknown. Therefore, we used a next-generation sequencer to investigate the changes in gene expression during the osteoblastic differentiation of hMSCs. The hMSCs used in this study possessed both multipotency and self-renewal ability. Whole-transcriptome analysis revealed that the expression of zinc finger and BTB domain containing 16 (ZBTB16) was significantly increased during the osteoblastogenesis of hMSCs. ZBTB16 mRNA and protein expression was enhanced by culturing the hMSCs with OIM. Small interfering RNA (siRNA)-mediated gene...

Research paper thumbnail of Genome informatics for data-driven biology

GenomeBiology.com (London. Print), Mar 27, 2002

Research paper thumbnail of Discovery of Intermediary Genes between Pathways Using Sparse Regression

PloS one, 2015

The use of pathways and gene interaction networks for the analysis of differential expression experiments has allowed us to highlight the differences in gene expression profiles between samples in a systems biology perspective. The usefulness and accuracy of pathway analysis critically depend on our understanding of how genes interact with one another. That knowledge is continuously improving due to advances in next generation sequencing technologies and in computational methods. While most approaches treat each of them as independent entities, pathways actually coordinate to perform essential functions in a cell. In this work, we propose a methodology based on a sparse regression approach to find genes that act as intermediary to and interact with two pathways. We model each gene in a pathway using a set of predictor genes, and a connection is formed between the pathway gene and a predictor gene if the sparse regression coefficient corresponding to the predictor gene is non-zero. A...

Research paper thumbnail of DBTMEE: a database of transcriptome in mouse early embryos

Nucleic acids research, 2015

DBTMEE (http://dbtmee.hgc.jp/) is a searchable and browsable database designed to manipulate gene expression information from our ultralarge-scale whole-transcriptome analysis of mouse early embryos. Since integrative approaches with multiple public analytical data have become indispensable for studying embryogenesis due to technical challenges such as biological sample collection, we intend DBTMEE to be an integrated gateway for the research community. To do so, we combined the gene expression profile with various public resources. Thereby, users can extensively investigate molecular characteristics among totipotent, pluripotent and differentiated cells while taking genetic and epigenetic characteristics into consideration. We have also designed user friendly web interfaces that enable users to access the data quickly and easily. DBTMEE will help to promote our understanding of the enigmatic fertilization dynamics.

Research paper thumbnail of Prediction of Transcriptional Terminators in Bacillus subtilis and Related Species

PLoS Computational Biology, 2005

In prokaryotes, genes belonging to the same operon are transcribed in a single mRNA molecule. Transcription starts as the RNA polymerase binds to the promoter and continues until it reaches a transcriptional terminator. Some terminators rely on the presence of the Rho protein, whereas others function independently of Rho. Such Rho-independent terminators consist of an inverted repeat followed by a stretch of thymine residues, allowing us to predict their presence directly from the DNA sequence. Unlike in Escherichia coli, the Rho protein is dispensable in Bacillus subtilis, suggesting a limited role for Rho-dependent termination in this organism and possibly in other Firmicutes. We analyzed 463 experimentally known terminating sequences in B. subtilis and found a decision rule to distinguish Rho-independent transcriptional terminators from non-terminating sequences. The decision rule allowed us to find the boundaries of operons in B. subtilis with a sensitivity and specificity of about 94%. Using the same decision rule, we found an average sensitivity of 94% for 57 bacteria belonging to the Firmicutes phylum, and a considerably lower sensitivity for other bacteria. Our analysis shows that Rho-independent termination is dominant for Firmicutes in general, and that the properties of the transcriptional terminators are conserved. Terminator prediction can be used to reliably predict the operon structure in these organisms, even in the absence of experimentally known operons. Genome-wide predictions of Rho-independent terminators for the 57 Firmicutes are available in the Supporting Information section.
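
The structural description in this abstract (an inverted repeat followed by a stretch of thymine residues) can be illustrated with a small scan. The sketch below is not the decision rule from the paper, which the abstract does not spell out; it only looks for a perfect inverted repeat followed by a T-rich window. The function names and every threshold (stem length, loop range, window size, minimum T count) are hypothetical values chosen for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def find_candidate_terminators(dna, stem=5, min_loop=3, max_loop=8,
                               t_window=8, min_t=5):
    """Return (start, end) spans of stem-loop-stem regions with a T-rich tail.

    All thresholds are illustrative assumptions, not values from the paper.
    `end` is the index just past the right arm of the inverted repeat.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - 2 * stem - min_loop + 1):
        left = dna[start:start + stem]
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right_end = right_start + stem
            if right_end > len(dna):
                break
            # Perfect inverted repeat: right arm equals revcomp of left arm.
            if dna[right_start:right_end] != reverse_complement(left):
                continue
            # Require a thymine-rich stretch immediately downstream.
            tail = dna[right_end:right_end + t_window]
            if tail.count("T") >= min_t:
                hits.append((start, right_end))
    return hits


# Toy sequence: stem GCCGC, loop TTAA, stem GCGGC, then a run of T.
toy = "AAAA" + "GCCGC" + "TTAA" + "GCGGC" + "TTTTTTTT" + "AC"
print(find_candidate_terminators(toy))
```

A scan like this reports overlapping spans when flanking bases happen to extend the stem, so a practical predictor would need to merge candidates and score them, for example by stem stability, which this sketch does not attempt.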

Research paper thumbnail of HitPredict: a database of quality assessed protein–protein interactions in nine species

Nucleic Acids Research, 2010

Despite the availability of a large number of protein-protein interactions (PPIs) in several species, researchers are often limited to using very small subsets in a few organisms due to the high prevalence of spurious interactions. In spite of the importance of quality assessment of experimentally determined PPIs, a surprisingly small number of databases provide interactions with scores and confidence levels. We introduce HitPredict (http://hintdb.hgc.jp/htp/), a database with quality assessed PPIs in nine species. HitPredict assigns a confidence level to interactions based on a reliability score that is computed using evidence from sequence, structure and functional annotations of the interacting proteins. HitPredict was first released in 2005 and is updated annually. The current release contains 36 930 proteins with 176 983 non-redundant, physical interactions, of which 116 198 (66%) are predicted to be of high confidence.

Research paper thumbnail of The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts

Nucleic Acids Research, 2007

Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views, the Transcript view and the Locus view, and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.

Research paper thumbnail of Alterations in rRNA-mRNA Interaction during Plastid Evolution

Molecular Biology and Evolution, 2014

Translation initiation depends on the recognition of mRNA by a ribosome. For this to occur, prokaryotes primarily use the Shine-Dalgarno (SD) interaction, where the 3′-tail of small subunit rRNA (core motif: 3′ CCUCC) forms base pairs with a complementary signal sequence in the 5′-untranslated region of mRNA. Here, we examined what happened to SD interactions during the evolution of a cyanobacterial endosymbiont into modern plastids (including chloroplasts). Our analysis of available complete plastid genome sequences revealed that the majority of plastids retained SD interactions but with varying levels of usage. Parallel losses of SD interactions took place in plastids of Chlorophyta, Euglenophyta, and Chromerida/Apicomplexa lineages, presumably related to their extensive reductive evolution. Interestingly, we discovered that the classical SD interaction (3′ CCUCC/5′ GGAGG [rRNA/mRNA]) was replaced by an altered SD interaction (3′ CCCU/5′ GGGA or 3′ CUUCC/5′ GAAGG) through coordinated changes in the sequences of the core rRNA motif and its paired mRNA signal. These changes in plastids of Chlorophyta and Euglenophyta proceeded through intermediate stages that allowed both the classical and altered SD interactions. This coevolution between the rRNA motif and the mRNA signal demonstrates unexpected plasticity in the translation initiation machinery.
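
The rRNA/mRNA pairs quoted in this abstract follow directly from base complementarity: because the two strands are antiparallel, an rRNA motif written 3′ to 5′ pairs position by position with the mRNA signal written 5′ to 3′. The sketch below shows that derivation and a naive exact-match search of a 5′ untranslated region. It assumes Watson-Crick pairing only (no G-U wobble), and the function names and the toy UTR are hypothetical, not taken from the paper.

```python
# Watson-Crick RNA pairing only; G-U wobble pairs are deliberately ignored.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def sd_signal(anti_sd_3to5):
    """Given an rRNA motif written 3'->5', return the mRNA signal written 5'->3'."""
    return "".join(RNA_PAIR[base] for base in anti_sd_3to5.upper())


def find_sd_sites(utr_5to3, anti_sd_3to5="CCUCC"):
    """Return start indices of exact matches to the paired signal in a 5' UTR.

    The UTR may be given as DNA or RNA; T is converted to U before searching.
    """
    signal = sd_signal(anti_sd_3to5)
    utr = utr_5to3.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(signal) + 1)
            if utr.startswith(signal, i)]


# The three rRNA motifs named in the abstract and their paired mRNA signals.
for motif in ("CCUCC", "CCCU", "CUUCC"):
    print(motif, "->", sd_signal(motif))

# Hypothetical toy UTR with one classical SD signal upstream of AUG.
print(find_sd_sites("AAUAGGAGGUAACAUG"))
```

Exact matching is the simplest possible criterion; real SD usage analyses typically allow partial pairing and constrain the distance to the start codon, neither of which is modeled here.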

Research paper thumbnail of Genome-wide demethylation during neural differentiation of P19 embryonal carcinoma cells

Journal of Human Genetics, 2008

Epigenetic regulation including DNA methylation plays an important role in several differentiation processes. We profiled global DNA methylation in the neural differentiation of P19 embryonic carcinoma cells using a microarray-based method called MIAMI. We found a genome-wide demethylation of genes. This suggests demethylation rather than methylation is important in neural differentiation.

Research paper thumbnail of The Origin and Evolution of Eukaryotic Protein Kinases

Research paper thumbnail of Sequence Comparison of Human and Mouse Genes Reveals a Homologous Block Structure in the Promoter Regions

Genome Research, 2004

Comparative sequence analysis was carried out for the regions adjacent to experimentally validated transcriptional start sites (TSSs), using 3324 pairs of human and mouse genes. We aligned the upstream putative promoter sequences over the 1-kb proximal regions and found that the sequence conservation could not be further extended at, on average, 510 bp upstream positions of the TSSs. This discontinuous manner of the sequence conservation revealed a “block” structure in about one-third of the putative promoter regions. Consistently, we also observed that G+C content and CpG frequency were significantly different inside and outside the blocks. Within the blocks, the sequence identity was uniformly 65% regardless of their length. About 90% of the previously characterized transcription factor binding sites were located within those blocks. In 46% of the blocks, the 5′ ends were bounded by interspersed repetitive elements, some of which may have nucleated the genomic rearrangements. The ...