Theobroma cacao genome sequenced: Yummier chocolate on the way!
By Mike Hanlon
December 27, 2010
If DNA sequencing never held much relevance for you, consider the benefits likely to flow from the recent sequencing and assembly of the chocolate tree genome. The Theobroma cacao plant is generally regarded as producing the world's finest chocolate, but it is particularly vulnerable to disease and not particularly productive, and is hence shunned by risk-averse growers. It is hoped the research will not only lead to hardier trees by altering the genes, but will also enable the percentages of cocoa butter, flavonoids, antioxidants, terpenoids and hormones to be regulated. The end result is likely to be smoother, more flavorsome, better smelling and even healthier chocolate. Now that's progress!
There is good reason why the Theobroma cacao plant is one of the oldest domesticated tree crops in existence – it produces the finest chocolate. It was domesticated by the Maya in Central America more than 3000 years ago and its beans even constituted part of the Mayan currency at one stage.
That is why an international team of 18 universities led by CIRAD in France worked together to sequence the plant’s DNA. Though the Criollo variety the team sequenced produces the world’s finest cocoa, it accounts for less than five percent of world cocoa production because of its low productivity and susceptibility to fungal diseases and insects. Most growers prefer hybrid cacao trees that are more robust and produce larger quantities of inferior chocolate, so production, and hence income, is more dependable.
Currently, most cacao farmers earn about $2 per day, but producers of fine cacao earn more. Increasing the productivity and ease of growing cacao can help to develop a sustainable cacao economy. Cocoa trees also encourage land rehabilitation and enriched biodiversity because they grow best under a forest canopy.
"Our analysis of the Criollo genome has uncovered the genetic basis of pathways leading to the most important quality traits of chocolate -- oil, flavonoid and terpene biosynthesis," said Siela Maximova, associate professor of horticulture, Penn State, and a member of the research team. "It has also led to the discovery of hundreds of genes potentially involved in pathogen resistance, all of which can be used to accelerate the development of elite varieties of cacao in the future."
Because the Criollo trees are self-pollinating, they are generally highly homozygous, possessing two identical forms of each gene, making this particular variety a good choice for accurate genome assembly.
The team was led by Claire Lanaud of CIRAD and Mark Guiltinan of Penn State, and included scientists from 18 other institutions. The researchers assembled 84 percent of the genome, identifying 28,798 genes that code for proteins. They assigned 82 percent, or 23,529, of these protein-coding genes to one of the 10 chromosomes in the Criollo cacao tree. They also looked at microRNAs, short noncoding RNAs that regulate genes, and found that microRNAs in Criollo are probably major regulators of gene expression.
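The share of protein-coding genes placed on chromosomes follows directly from the two gene counts quoted above. A minimal Python sketch of that arithmetic, using only figures from the article (the variable names are illustrative, not taken from the study):

```python
# Gene counts quoted in the article
total_protein_coding_genes = 28_798
genes_assigned_to_chromosomes = 23_529

# Fraction of protein-coding genes placed on one of the 10 chromosomes
anchored_share = genes_assigned_to_chromosomes / total_protein_coding_genes

# Genes identified but not yet placed on a chromosome
unplaced_genes = total_protein_coding_genes - genes_assigned_to_chromosomes

print(f"Assigned to chromosomes: {anchored_share:.1%}")
print(f"Not yet assigned: {unplaced_genes:,}")
```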
"Interestingly, only 20 percent of the genome was made up of transposable elements, one of the natural pathways through which genetic sequences change," said Guiltinan "They do this by moving around the chromosomes, changing the order of the genetic material. Smaller amounts of transposons than found in other plant species could lead to slower evolution of the chocolate plant, which was shown to have a relatively simple evolutionary history in terms of genome structure."
Guiltinan and his colleagues are interested in specific gene families that could be linked to specific cocoa qualities or disease resistance. They hope that mapping these gene families will lead to a source of genes directly involved in variations in the plant that are useful for accelerating plant breeding programs.
The researchers identified two types of disease resistance genes in the Criollo genome. They compared these to previously identified regions on the chromosomes that correlate with disease resistance – QTLs – and found that the locations of many of the resistance genes correlated with the QTL locations. The team suggests that a functional genomics approach, one that looks at what the genes do, is needed to confirm potential disease resistance genes in the Criollo genome.
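To give a feel for what comparing gene positions with QTL regions involves, here is a minimal Python sketch that flags genes whose coordinates overlap a QTL interval on the same chromosome. It is not the team's method. The gene names, coordinates and helper functions are all hypothetical, and a real analysis would also test whether the overlap is greater than expected by chance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chromosome: int
    start: int  # inclusive position in base pairs
    end: int    # inclusive position in base pairs


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals share a chromosome and at least one position."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def genes_in_qtls(genes: dict[str, Interval], qtls: list[Interval]) -> list[str]:
    """Return the names of genes that overlap at least one QTL interval."""
    return [name for name, gene in genes.items()
            if any(overlaps(gene, qtl) for qtl in qtls)]


# Hypothetical coordinates, invented purely for illustration
genes = {
    "R_gene_A": Interval(1, 1_200, 3_400),
    "R_gene_B": Interval(1, 90_000, 92_500),
    "R_gene_C": Interval(4, 15_000, 18_000),
}
qtls = [Interval(1, 1_000, 5_000), Interval(4, 17_500, 40_000)]

colocated = genes_in_qtls(genes, qtls)
print(colocated)
```

Genes flagged this way are only candidates. As the team notes, functional studies are still needed to confirm what they do.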
Hidden in the genome, the researchers also found genes that code for the production of cocoa butter, a substance highly prized in chocolate making, confectionery, pharmaceuticals and cosmetics. Most cocoa beans are already about 50 percent fat, but these genes, 84 in all, control not only the amount but also the quality of the cocoa butter.
Other genes were found that influence the production of flavonoids, natural antioxidants and terpenoids, hormones, pigments and aromas. Altering the genes for these chemicals might produce chocolate that has better flavors and aromas and is even healthier.
Penn State researchers involved in this study include Guiltinan and Maximova; Yufan Zhang and Zi Shi, graduate students, plant biology; Stephen Schuster, Department of Biochemistry and Molecular Biology; John E. Carlson, School of Forest Resources; and M.J. Axtell and Z. Ma, Department of Biology.
Other researchers involved were from CIRAD; Institut National de la Recherche Agronomique UMR; Genoscope; Centre National de la Recherche Scientifique; Centre National de Genotypage; Universite d'Evry; INRA-CNRS LIPM Laboratoire des Interactions Plantes Micro-organismes; Universite de Perpignan; Unite de Biometrie et d'Intelligence Artificielle; Institut des Sciences du Vegetal; and Chocolaterie Valrhona, all in France.
Also included are researchers from the University of Arizona; Cold Spring Harbor Laboratory; Centre National de la Recherche Agronomique, Ivory Coast; CEPLAC, Brazil; and Centro Nacional de Biotecnologia Agricola, Instituto de Estudios Avanzados, Venezuela.
CIRAD, the Agropolis foundation, the Région Languedoc Roussillon, Agence Nationale de la Recherche (ANR), Valrhona, the Venezuelan Ministry of Science, Technology and Industry, Hershey Corp., the American Cocoa Research Institute Endowment and the National Science Foundation supported this work.
The Theobroma cacao genome sequences are deposited in the EMBL/GenBank/DDBJ databases under accession numbers CACC01000001-CACC01025912. A genome browser can be found here.