Carhta Gene: multipopulation integrated genetic and radiation hybrid mapping
INRA, Biométrie et Intelligence Artificielle/Génétique Cellulaire BP 27, 31326 Castanet-Tolosan Cedex, France
*To whom correspondence should be addressed.
Received: 14 September 2004
Revision received: 03 November 2004
Accepted: 03 November 2004
Published: 14 December 2004
Simon de Givry, Martin Bouchez, Patrick Chabrier, Denis Milan, Thomas Schiex, Carhta Gene: multipopulation integrated genetic and radiation hybrid mapping, Bioinformatics, Volume 21, Issue 8, April 2005, Pages 1703–1704, https://doi.org/10.1093/bioinformatics/bti222
Abstract
Summary: Carhta Gene is an integrated genetic and radiation hybrid (RH) mapping tool which can deal with multiple populations, including mixtures of genetic and RH data. Carhta Gene performs multipoint maximum likelihood estimations with accelerated expectation–maximization algorithms for some pedigrees and has sophisticated algorithms for marker ordering. Dedicated heuristics for framework mapping are also included. Carhta Gene can be used as a C++ library, through a shell command and through a graphical interface. XML output for companion tools is integrated.
Availability: The program is available free of charge, with open source code, from www.inra.fr/bia/T/CarthaGene for Linux, Windows and Solaris machines.
Contact: tschiex@toulouse.inra.fr
INTRODUCTION
Genetic mapping locates polymorphic markers on chromosomes by making use of a probabilistic model of crossing-over. Many genetic mapping tools are available to analyze data from experimental crosses. Most of them are designed to analyze line crosses, one family at a time; few can integrate data from several crosses to build consensus maps (see http://linkage.rockefeller.edu/soft).
Radiation hybrid (RH) mapping is a somatic cell technique that complements genetic mapping, allowing for finer resolution. Most existing RH mapping packages are listed at compgen.rutgers.edu/rhmap.
Although capable of handling line crosses, Carhta Gene has been designed to create consensus maps from multiple populations. Instead of directly integrating existing maps, Carhta Gene computes multipoint maximum likelihood maps that take into account all the available information, which offers additional reliability. It can also integrate RH and genetic data together.
METHODS
Parametric probabilistic HMM-based models for crossover during meiosis (Lander et al., 1987) and for chromosome breakage and retention during RH panel construction (Lange et al., 1995) include parameters such as recombination and breakage probabilities between adjacent markers, as well as the retention probability. Given experimental data, and assuming some marker ordering, the values of these parameters can be estimated by maximizing the probability that the observed data has been generated by the model. This maximized likelihood therefore allows simultaneous evaluation of the assumed marker ordering and estimation of the corresponding parameters (distances). The maximization is done by an expectation–maximization (EM) algorithm. Models for backcross, F2 intercross, recombinant inbred lines (self and sibs) and phase-known outbreds are available. For RH estimation, the equal retention model is used, in both its haploid and diploid forms. The EM forward–backward algorithm used in Carhta Gene has been accelerated by taking into account specific properties of backcross and haploid RH data (see Schiex et al., 2001). Compared with usual EM implementations, the accelerated algorithm can run one or two orders of magnitude faster with no loss of precision.
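To make the link between likelihood, marker order and distances concrete, here is a minimal Python sketch for the simplest case only: backcross data with no missing genotypes. In that case (an assumption of this sketch) no EM iterations are needed, because the likelihood factorizes over adjacent marker pairs and each recombination fraction has a closed-form estimate. The function names and the toy dataset are hypothetical and are not part of the Carhta Gene library.

```python
import math

def count_recombinants(a, b):
    """Number of individuals whose genotype differs between two markers."""
    return sum(x != y for x, y in zip(a, b))

def backcross_loglik(data, order):
    """Multipoint log10-likelihood of a marker order (complete backcross data).

    data  : dict mapping marker name -> list of 0/1 genotypes, one per individual
    order : sequence of marker names
    Returns (log10-likelihood, estimated recombination fractions per interval).
    """
    n = len(data[order[0]])
    loglik = n * math.log10(0.5)      # first marker: each genotype has probability 1/2
    thetas = []
    for left, right in zip(order, order[1:]):
        r = count_recombinants(data[left], data[right])
        theta = min(r / n, 0.5)       # maximum likelihood estimate, capped at 0.5
        thetas.append(theta)
        loglik += (n - r) * math.log10(1 - theta)
        if r > 0:                     # 0 * log(0) is taken as 0
            loglik += r * math.log10(theta)
    return loglik, thetas

# Hypothetical toy dataset: 3 markers typed on 8 backcross individuals.
toy = {
    "M1": [0, 0, 0, 0, 1, 1, 1, 1],
    "M2": [0, 0, 0, 1, 1, 1, 1, 0],
    "M3": [0, 0, 1, 1, 1, 1, 0, 0],
}

good, good_thetas = backcross_loglik(toy, ["M1", "M2", "M3"])
bad, _ = backcross_loglik(toy, ["M1", "M3", "M2"])
print(good_thetas, good > bad)
```

The same call evaluates an order (through the log-likelihood) and estimates its distances (through the recombination fractions). With missing data or other pedigree types, the counts of recombinants are no longer observed directly, which is where the E step of EM comes in.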
Carhta Gene provides two ways to merge data files:
- If one assumes that the data merged represents a single map (same order and distances, either genetic or RH), a so-called genetic merging is done. Untyped markers in a population/panel are considered as missing data and one consensus map is produced. RH and genetic data cannot be merged under this model because they use different distances.
- Otherwise, it is assumed that the data files are representative of maps with a common marker ordering but specific distances per dataset. Here, a single consensus ordering is produced but a specific set of distances is estimated for each model merged. Any type of data, genetic or RH, can be merged here. This is called order merging. These two methods can be combined freely.
For genetic merging, the EM implementation deals with combined datasets by performing an E computation on each dataset and then using an M step that takes into account the merging performed. For datasets merged by order, independent log-likelihoods are obtained per dataset and summed up.
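The difference between the two merging modes can be illustrated on a single marker interval. The sketch below assumes complete backcross data, so that each dataset is summarized by a hypothetical pair (recombinants, individuals) and the E step is trivial; it is an illustration of the idea, not the Carhta Gene implementation.

```python
import math

def interval_loglik(r, n):
    """Maximized log10-likelihood of one interval: r recombinants among n individuals."""
    theta = min(r / n, 0.5)
    ll = (n - r) * math.log10(1 - theta)
    if r > 0:
        ll += r * math.log10(theta)
    return ll, theta

def genetic_merging(counts):
    """One shared theta: recombinants and individuals are pooled across datasets."""
    r = sum(c[0] for c in counts)
    n = sum(c[1] for c in counts)
    return interval_loglik(r, n)

def order_merging(counts):
    """One theta per dataset: independent log-likelihoods are summed."""
    results = [interval_loglik(r, n) for r, n in counts]
    return sum(ll for ll, _ in results), [theta for _, theta in results]

# Hypothetical counts for the same marker pair in two populations.
counts = [(5, 100), (30, 100)]
print(genetic_merging(counts))   # single consensus theta
print(order_merging(counts))     # one theta per population
```

Because order merging has one parameter per dataset, its maximized log-likelihood is never lower than that of genetic merging on the same data; the price is that no single consensus distance is produced.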
The main problem in genetic or RH mapping arises from the number of possible marker orders: for n markers, there exist n!/2 different possible orders. Since the connection between mapping and the traveling salesman problem (TSP) is well known for genetic mapping (Schiex and Gaspin, 1997; www.inra.fr/bia/T/CartaGene) and for RH mapping (Ben-Dor et al., 2000), Carhta Gene relies on this connection and provides extensions of TSP solving algorithms:
- Fast heuristics (e.g. nearest neighbor) using 2-point information for guidance and multipoint estimations for final evaluation.
- More powerful meta-heuristics (e.g. simulated annealing and taboo search) directly using multipoint maximum likelihood as the optimization criterion.
- Even more efficient pure TSP heuristics such as the Lin-Kernighan heuristic (LKH) (Lin and Kernighan, 1973), using the LKH implementation (Helsgaun, 2000, http://www.dat.ruc.dk/keld/research/LKH). These algorithms can use either multiple 2-point maximum likelihood or obligate chromosome breaks (Ben-Dor et al., 2000; Agarwala et al., 2000) to define optimal maps. All the maps identified in this way are then evaluated using a multipoint EM based estimation.
- Finally, dedicated heuristics for framework mapping, duplicated marker detection and map validation.
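As an illustration of the first family of heuristics, the following Python sketch counts the n!/2 possible orders and builds an order with a nearest-neighbor walk guided by 2-point recombination fractions. It assumes complete backcross data coded as 0/1; the function names and the toy dataset are hypothetical, and the final multipoint evaluation mentioned above is left out.

```python
import math

def number_of_orders(n):
    """Distinct orders of n markers; an order and its reverse describe the same map."""
    return math.factorial(n) // 2 if n > 1 else 1

def two_point_theta(a, b):
    """2-point recombination fraction estimate between two markers."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nearest_neighbor_order(data, start):
    """Greedy order: repeatedly append the unplaced marker closest to the last one."""
    remaining = set(data) - {start}
    order = [start]
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic (alphabetical).
        nxt = min(sorted(remaining),
                  key=lambda m: two_point_theta(data[last], data[m]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical toy dataset: 3 markers typed on 8 backcross individuals.
toy = {
    "M3": [0, 0, 1, 1, 1, 1, 0, 0],
    "M1": [0, 0, 0, 0, 1, 1, 1, 1],
    "M2": [0, 0, 0, 1, 1, 1, 1, 0],
}

print(number_of_orders(10))
print(nearest_neighbor_order(toy, "M1"))
```

The result of such a greedy walk depends on the starting marker, which is one reason why fast heuristics of this kind are followed by a multipoint evaluation and complemented by stronger search methods.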
A unique feature of Carhta Gene is that, instead of producing a single supposedly optimal map, it produces an ordered set of alternative maps which allows an estimation of the reliability of ordering of each marker. The set of all these maps can be explored manually and compared graphically (with a Postscript output; Fig. 1). Dedicated automatic tools facilitate the identification of unreliable markers for further analysis.
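The idea of keeping a ranked set of alternative maps, rather than a single best map, can be sketched as follows. This hypothetical example enumerates every order of a very small marker set (feasible only for a handful of markers), scores each one with the closed-form backcross log-likelihood that holds for complete data, and reports the gap to the best map. The names `ranked_maps` and `keep` are assumptions made for this sketch.

```python
import itertools
import math

def loglik(data, order):
    """Multipoint log10-likelihood of an order (complete backcross data, 0/1 coded)."""
    n = len(data[order[0]])
    total = n * math.log10(0.5)
    for left, right in zip(order, order[1:]):
        r = sum(x != y for x, y in zip(data[left], data[right]))
        theta = min(r / n, 0.5)
        total += (n - r) * math.log10(1 - theta)
        if r > 0:
            total += r * math.log10(theta)
    return total

def ranked_maps(data, keep=5):
    """Return up to `keep` orders as (order, loglik, loglik - best), best first."""
    maps = []
    for perm in itertools.permutations(sorted(data)):
        if perm[0] > perm[-1]:
            continue                  # skip the mirror image of an order already kept
        maps.append((loglik(data, perm), perm))
    maps.sort(key=lambda item: -item[0])
    best = maps[0][0]
    return [(order, ll, ll - best) for ll, order in maps[:keep]]

# Hypothetical toy dataset: 3 markers typed on 8 backcross individuals.
toy = {
    "M1": [0, 0, 0, 0, 1, 1, 1, 1],
    "M2": [0, 0, 0, 1, 1, 1, 1, 0],
    "M3": [0, 0, 1, 1, 1, 1, 0, 0],
}

for order, ll, delta in ranked_maps(toy):
    print(order, round(ll, 3), round(delta, 3))
```

A marker whose position changes between maps with a small log-likelihood gap is a candidate unreliable marker; what counts as a small gap is a choice left to the analyst.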
Final maps can be produced in MapMaker (Lander et al., 1987) and XML formats for data exchange with MCQTL (Jourjon et al., 2004), a multiple population QTL mapping software, and BioMercator (Arcade et al., 2004), a software for integrating genetic maps and QTL detected in independent experiments.
IMPLEMENTATION
Carhta Gene is implemented as a C++ library. A programmable Tcl shell command is available for automated mapping, along with a graphical Tcl/Tk interface for interactive mapping. Binaries for Windows, Solaris and Linux are provided with an open source distribution (using the G-Forge site mulcyber.toulouse.inra.fr).
Fig. 1. A typical graphical session of Carhta Gene, displaying possible maps with distances and log-likelihoods.
This work was supported by GENOPLANTE project ‘Integrative Tools for Genetic Mapping’.
REFERENCES
Agarwala, R., Applegate, D.L., Maglott, D., Schuler, G.D., Schaffer, A.A. (2000) A fast and scalable radiation hybrid map construction and integration strategy. Genome Res., 10, 350–364.
Arcade, A., Labourdette, A., Falque, M., Mangin, B., Chardon, F., Charcosset, A., Joets, J. (2004) Biomercator: integrating genetic maps and QTL towards discovery of candidate genes. Bioinformatics, 20, 2324–2326.
Ben-Dor, A., Chor, B., Pelleg, D. (2000) RHO—radiation hybrid ordering. Genome Res., 10, 365–378.
Helsgaun, K. (2000) An effective implementation of the Lin-Kernighan traveling salesman heuristic. Eur. J. Oper. Res., 126, 106–130.
Jourjon, M.-F., Jasson, S., Marcel, J., Ngom, B., Mangin, B. (2004) MCQTL: Multi-allelic QTL mapping in multi-cross design. Bioinformatics, 21, 128–130.
Lander, E.S., Green, P., Abrahamson, J., Barlow, A., Daly, M.J., Lincoln, S.E., Newburg, L. (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics, 1, 174–181.
Lange, K., Boehnke, M., Cox, D.R., Lunetta, K.L. (1995) Statistical methods for polyploid radiation hybrid mapping. Genome Res., 5, 136–150.
Lin, S. and Kernighan, B.W. (1973) An effective heuristic algorithm for the traveling salesman problem. Oper. Res., 21, 498–516.
Schiex, T. and Gaspin, C. (1997) Cartagene: constructing and joining maximum likelihood genetic maps. Proceedings of ISMB'97, Porto Carras, Halkidiki, Greece, pp. 258–267.
Schiex, T., Chabrier, P., Bouchez, M., Milan, D. (2001) Boosting EM for radiation hybrid and genetic mapping. Proceedings of WABI'01, LNCS vol. 2149, pp. 41–51.
© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org