Carhta Gene: multipopulation integrated genetic and radiation hybrid mapping

Journal Article


INRA, Biométrie et Intelligence Artificielle/Génétique Cellulaire BP 27, 31326 Castanet-Tolosan Cedex, France

*To whom correspondence should be addressed.


Received: 14 September 2004
Revision received: 03 November 2004
Accepted: 03 November 2004
Published: 14 December 2004


Simon de Givry, Martin Bouchez, Patrick Chabrier, Denis Milan, Thomas Schiex, Carhta Gene: multipopulation integrated genetic and radiation hybrid mapping, Bioinformatics, Volume 21, Issue 8, April 2005, Pages 1703–1704.

Summary: Carhta Gene is an integrated genetic and radiation hybrid (RH) mapping tool which can deal with multiple populations, including mixtures of genetic and RH data. Carhta Gene performs multipoint maximum likelihood estimations with accelerated expectation–maximization algorithms for some pedigrees and has sophisticated algorithms for marker ordering. Dedicated heuristics for framework mapping are also included. Carhta Gene can be used as a C++ library, through a shell command or through a graphical interface. XML output for companion tools is integrated.

Availability: The program is available free of charge from for Linux, Windows and Solaris machines (with Open Source).



The genetic mapping technique is used to locate polymorphic markers on chromosomes by making use of a probabilistic model of crossing-over. Many genetic mapping tools are available to analyze data in experimental crosses. Most of them are designed to analyze line crosses, one family at a time; few can integrate data from several crosses to build consensus maps (see

Radiation hybrid (RH) mapping is a somatic cell technique that complements the genetic mapping technique, allowing for finer resolution. Most existing RH mapping packages are listed at

Although capable of handling line crosses, Carhta Gene has been designed to create consensus maps from multiple populations. Instead of directly integrating existing maps, Carhta Gene computes multipoint maximum likelihood maps that take into account all the available information, which makes the resulting maps more reliable. It can integrate RH and genetic data together.


Parametric probabilistic models based on hidden Markov models (HMMs) describe crossover during meiosis (Lander et al., 1987) and chromosome breakage and retention during RH panel construction (Lange et al., 1995). Their parameters include the recombination and breakage probabilities between adjacent markers as well as the retention probability. Given experimental data, and assuming some marker ordering, the values of these parameters can be estimated by maximizing the probability that the observed data were generated by the model. This likelihood therefore allows simultaneous evaluation of the assumed marker ordering and estimation of the corresponding parameters (distances). The maximization is done by a so-called expectation–maximization (EM) algorithm. Models for backcross, F2 intercross, recombinant inbred lines (self and sibs) and phase-known outbreds are available. For RH estimation, the equal retention model is used in both its haploid and diploid forms. The EM forward–backward algorithm used in Carhta Gene has been accelerated by taking into account specific properties of backcross and haploid RH data (see Schiex et al., 2001). Compared with usual EM implementations, the accelerated algorithm can run one or two orders of magnitude faster with no loss of precision.
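To make the estimation step concrete, the following is a minimal sketch of a plain (non-accelerated) EM forward–backward estimation for backcross data under a fixed marker order. It is not the Carhta Gene implementation: the function names, the 0/1 genotype coding, the use of `None` for untyped markers, the absence of typing errors and the starting value of 0.1 are all assumptions made for illustration.

```python
import math

def emit(obs, state):
    """P(observation | hidden genotype); assumes no typing error, None = untyped."""
    return 1.0 if obs is None or obs == state else 0.0

def trans(r, s, t):
    """Probability of going from genotype s to genotype t across one interval."""
    return r if s != t else 1.0 - r

def e_step(ind, r):
    """Forward-backward pass for one individual (list of 0, 1 or None).

    Returns (log-likelihood, expected number of recombinations per interval).
    """
    m = len(ind)
    alpha = [[0.5 * emit(ind[0], s) for s in (0, 1)]]
    for k in range(m - 1):
        alpha.append([
            emit(ind[k + 1], t) * sum(alpha[k][s] * trans(r[k], s, t) for s in (0, 1))
            for t in (0, 1)
        ])
    beta = [[1.0, 1.0] for _ in range(m)]
    for k in range(m - 2, -1, -1):
        beta[k] = [
            sum(trans(r[k], s, t) * emit(ind[k + 1], t) * beta[k + 1][t] for t in (0, 1))
            for s in (0, 1)
        ]
    lik = sum(alpha[-1])
    expected = [
        sum(alpha[k][s] * r[k] * emit(ind[k + 1], 1 - s) * beta[k + 1][1 - s]
            for s in (0, 1)) / lik
        for k in range(m - 1)
    ]
    return math.log(lik), expected

def log_likelihood(data, r):
    """Multipoint log-likelihood of the data for recombination fractions r."""
    return sum(e_step(ind, r)[0] for ind in data)

def em(data, n_iter=500, tol=1e-10):
    """Estimate one recombination fraction per interval between adjacent markers."""
    r = [0.1] * (len(data[0]) - 1)          # arbitrary starting point (assumption)
    previous = -math.inf
    for _ in range(n_iter):
        counts = [0.0] * len(r)
        total = 0.0
        for ind in data:                     # E step: expected recombination counts
            ll, expected = e_step(ind, r)
            total += ll
            counts = [c + x for c, x in zip(counts, expected)]
        r = [c / len(data) for c in counts]  # M step: expected recombinant fraction
        if total - previous < tol:
            break
        previous = total
    return r, log_likelihood(data, r)

# Hypothetical data: rows are individuals, columns are markers in the assumed order.
complete = ([[0, 0, 0]] * 3 + [[1, 1, 1]] * 3 + [[0, 1, 1]] * 2
            + [[1, 1, 0]] + [[0, 0, 1]])
with_missing = complete + [[0, None, 1], [1, None, 1]]
```

Calling `em(with_missing)` returns the estimated recombination fractions and the log-likelihood of the assumed order. Comparing that log-likelihood across candidate orders is what allows the order and the distances to be evaluated together.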

Carhta Gene provides two ways to merge data files:

  1. If one assumes that the merged data represent a single map (same order and distances, either genetic or RH), a so-called genetic merging is done. Markers that are untyped in a population/panel are treated as missing data and one consensus map is produced. RH and genetic data cannot be merged under this model because they use different distances.
  2. Otherwise, it is assumed that the data files are representative of maps with a common marker ordering but specific distances per dataset. Here, a single consensus ordering is produced but a specific set of distances is estimated for each dataset merged. Any type of data, genetic or RH, can be merged here. This is called order merging.

These two methods can be combined freely.

For genetic merging, the EM implementation deals with combined datasets by performing an E step on each dataset and then an M step that takes into account the merging performed. For datasets merged by order, independent log-likelihoods are obtained per dataset and summed.
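The summation used for order merging can be sketched as follows. This is an illustration only, under simplifying assumptions that are not taken from the paper: every dataset is a complete backcross (no missing genotypes among its typed markers), so the maximum likelihood recombination fraction of each interval is simply the observed recombinant fraction (capped at 0.5) and no EM is needed. The function names and the dictionary layout are hypothetical.

```python
import math

def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def dataset_loglik(order, dataset):
    """Log-likelihood and distances of one dataset under a shared marker order.

    dataset maps marker name -> list of 0/1 genotypes, one per individual.
    Markers of the order that are not typed in this dataset are skipped.
    """
    typed = [m for m in order if m in dataset]
    n = len(next(iter(dataset.values())))
    loglik = n * math.log(0.5) if typed else 0.0
    distances = {}
    for a, b in zip(typed, typed[1:]):
        k = sum(x != y for x, y in zip(dataset[a], dataset[b]))
        r = min(k / n, 0.5)                  # dataset-specific estimate
        distances[(a, b)] = r
        loglik += xlogy(k, r) + xlogy(n - k, 1.0 - r)
    return loglik, distances

def order_merged_loglik(order, datasets):
    """One common order, one set of distances per dataset, log-likelihoods summed."""
    results = [dataset_loglik(order, d) for d in datasets]
    return sum(ll for ll, _ in results), [dist for _, dist in results]

# Hypothetical datasets sharing some, but not all, markers.
pop_a = {"M1": [0, 0, 0, 0, 1, 1, 1, 1],
         "M2": [0, 0, 0, 1, 1, 1, 1, 1],
         "M3": [0, 0, 1, 1, 1, 1, 1, 0]}
pop_b = {"M1": [0, 0, 1, 1, 0, 1],
         "M3": [0, 1, 1, 1, 0, 1],
         "M4": [0, 1, 1, 0, 0, 1]}

total, per_dataset = order_merged_loglik(["M1", "M2", "M3", "M4"], [pop_a, pop_b])
```

Because the datasets only share the order, each keeps its own distances, which is why genetic and RH data can be combined this way. Genetic merging, by contrast, would pool the expected counts of all datasets into a single M step.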

The main problem in genetic or RH mapping arises from the number of possible marker orders: for n markers, there exist n!/2 different possible orders. The connection between mapping and the traveling salesman problem (TSP) is well known for genetic mapping (Schiex and Gaspin) and for RH mapping (Ben-Dor et al., 2000). Carhta Gene relies on this connection and provides extensions of TSP solving algorithms.
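The n!/2 count and the TSP view of ordering can be illustrated with a short sketch. It assumes an additive cost built from hypothetical pairwise distances between markers, which is a simplification of the multipoint likelihood criterion, and it uses a generic 2-opt local search as a stand-in for TSP heuristics. It does not reproduce the specific algorithms provided by Carhta Gene.

```python
import itertools
import math

def distinct_orders(markers):
    """Enumerate marker orders, counting an order and its reverse only once.

    Requires at least two distinct, comparable marker names.
    """
    for perm in itertools.permutations(markers):
        if perm[0] < perm[-1]:               # keep one of each mirrored pair
            yield perm

def path_cost(order, dist):
    """Sum of pairwise costs between adjacent markers (a Hamiltonian path)."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def brute_force(markers, dist):
    """Exhaustive search; only feasible for a handful of markers."""
    return min(distinct_orders(markers), key=lambda o: path_cost(o, dist))

def two_opt(order, dist):
    """Local search: reverse segments of the order while the cost improves."""
    best = list(order)
    best_cost = path_cost(best, dist)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cost = path_cost(cand, dist)
                if cost < best_cost - 1e-12:
                    best, best_cost, improved = cand, cost, True
    return best, best_cost

# Hypothetical markers placed on a line; the cost is the distance between positions.
positions = {"A": 0.0, "B": 1.0, "C": 3.0, "D": 6.0, "E": 10.0}
dist = {a: {b: abs(pa - pb) for b, pb in positions.items()}
        for a, pa in positions.items()}

n_orders = sum(1 for _ in distinct_orders(positions))   # n!/2 for n = 5
exact = brute_force(list(positions), dist)
heuristic, heuristic_cost = two_opt(["C", "A", "E", "B", "D"], dist)
```

Exhaustive enumeration grows factorially with the number of markers, which is why heuristic search of this kind becomes necessary beyond a few markers.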

A unique feature of Carhta Gene is that, instead of producing a single supposedly optimal map, it produces an ordered set of alternative maps, which allows an estimation of the reliability of the ordering of each marker. The set of all these maps can be explored manually and compared graphically (with a PostScript output; Fig. 1). Dedicated automatic tools facilitate the identification of unreliable markers for further analysis.
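One simple way to maintain such an ordered set of alternative maps is a bounded collection ranked by log-likelihood. The sketch below is a hypothetical illustration: the class name `MapHeap`, its methods and the choice of a binary heap are assumptions, not a description of Carhta Gene's internal data structure.

```python
import heapq

class MapHeap:
    """Keep the k best marker orders seen so far, ranked by log-likelihood."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []      # min-heap of (loglik, order): worst kept map at the root
        self._seen = set()   # orders currently stored, in canonical orientation

    def add(self, order, loglik):
        """Offer a map; return True if it is stored."""
        # An order and its reverse describe the same map.
        key = min(tuple(order), tuple(reversed(order)))
        if key in self._seen:
            return False
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (loglik, key))
        elif loglik > self._heap[0][0]:
            _, dropped = heapq.heapreplace(self._heap, (loglik, key))
            self._seen.discard(dropped)
        else:
            return False
        self._seen.add(key)
        return True

    def ranked(self):
        """Return (order, loglik, difference to the best map), best first."""
        if not self._heap:
            return []
        best = sorted(self._heap, reverse=True)
        top = best[0][0]
        return [(order, loglik, loglik - top) for loglik, order in best]

# Hypothetical usage with made-up log-likelihood values.
maps = MapHeap(capacity=2)
maps.add(["A", "B", "C"], -10.0)
maps.add(["C", "B", "A"], -10.0)   # same map read backwards, ignored
maps.add(["A", "C", "B"], -12.5)
maps.add(["B", "A", "C"], -11.0)   # replaces the worst stored map
```

A small log-likelihood difference between the best map and an alternative that moves a given marker suggests that the position of that marker is poorly supported by the data.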

Final maps can be produced in MapMaker (Lander et al., 1987) and XML formats for data exchange with MCQTL (Jourjon et al., 2004), a multiple population QTL mapping software package, and BioMercator (Arcade et al., 2004), a software package for integrating genetic maps and QTL detected in independent experiments.


Carhta Gene is implemented as a C++ library. A programmable Tcl shell command is available for automated mapping, along with a graphical Tcl/Tk interface for interactive mapping. Binaries for Windows, Solaris and Linux are provided with an open source distribution (using the G-Forge site

Fig. 1. A typical graphical session of Carhta Gene, displaying possible maps with distances and log-likelihoods.

This work was supported by GENOPLANTE project ‘Integrative Tools for Genetic Mapping’.


© The Author 2004. Published by Oxford University Press. All rights reserved.





