Meloidogyne incognita sequencing project
The sequencing project was led by INRA Sophia Antipolis (P. Abad) with bioinformatics support from the INRA Toulouse bioinformatics platform (J. Gouzy). Sequencing and assembly were carried out by Genoscope, the French National Sequencing Center.
Significance as a genetically tractable system
The genome size of M. incognita is estimated at 47-51 Mb (by flow cytometry and reassociation kinetics, respectively). This genome has therefore been thought to be one of the smallest among nematode genomes. Its GC content (33%) and proportion of repetitive sequences (20%) are similar to the values observed in the two sequenced Caenorhabditis species. There is no evidence for large amounts of highly repetitive DNA in M. incognita, and the ribosomal RNA gene repeats are estimated at fewer than 100 copies.
M. incognita reproduces by mitotic parthenogenesis, and more than 95% of the isolates studied have 2n = 36-42 chromosomes, with a haploid number of n = 18. It is therefore generally assumed that these M. incognita isolates are diploid or hypotriploid with aneuploidy. However, M. incognita appears to be remarkably homogeneous and homozygous. Most of the isozyme phenotypes analyzed are so distinctive and conserved that they can be used to differentiate Meloidogyne species. Similarly, isolates of M. incognita from different continents are very difficult to separate on the basis of RFLP or 2D electrophoresis profiles, and RAPD, AFLP and microsatellite markers show a very low level of polymorphism between isolates.
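The step from chromosome counts to ploidy is simple arithmetic: dividing a somatic count by the haploid number n = 18 gives the number of chromosome sets it implies. The sketch below only illustrates that division; the helper names are hypothetical and it makes no cytogenetic claim beyond the counts quoted above.

```python
HAPLOID_N = 18  # haploid chromosome number reported for M. incognita


def ploidy_ratio(somatic_count: int, haploid_n: int = HAPLOID_N) -> float:
    """Number of chromosome sets implied by a somatic chromosome count."""
    return somatic_count / haploid_n


def describe(somatic_count: int) -> str:
    """Classify a count as a whole number of sets or as falling between two."""
    r = ploidy_ratio(somatic_count)
    if r.is_integer():
        return f"{int(r)} complete sets (euploid)"
    lower = int(r)
    return f"between {lower} and {lower + 1} sets (aneuploid)"


# Ends of the reported 2n = 36-42 range.
for count in (36, 42):
    print(count, round(ploidy_ratio(count), 2), describe(count))
```

A count of 36 corresponds exactly to two sets, while counts above it fall between two and three sets, which is why the isolates are described as diploid or hypotriploid with aneuploidy.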
Isolate of M. incognita
The objective of the project is to achieve 10X coverage of the M. incognita genome, using the isolate "Morelos". This isolate was chosen for the following reasons:
- a good-quality full-length cDNA library, with insert sizes ranging from 1 to 3 kb, is available for this isolate
- an EST collection of 30,000 clones is available for this isolate at INRA Sophia-Antipolis
- numerous sequence data are available for this isolate, including dispersed sequences of a few kb each surrounding genes of interest (cellulases, pectinases, and genes involved in pathogenicity and avirulence)
- molecular markers such as microsatellite loci and satellite sequences, useful for contig assignment, are available.
The complete M. incognita sequence was obtained in four steps:
- Shotgun sequencing
Shotgun sequencing of chromosomal DNA from the Morelos isolate was performed to reach 10X genome coverage. The genomic material was prepared at INRA Sophia Antipolis, and Génoscope constructed fragment libraries whose insert sizes were chosen on the basis of previous experience and of the dispersed repeated sequences known in M. incognita. Fragments ranging from 3 to 10 kb were cloned and end-sequenced. This part of the project amounted to approximately 1,000,000 reads.
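The relation between read count and "10X coverage" is the standard Lander-Waterman estimate c = N·L/G (reads × mean read length / genome size), with roughly e^(-c) of bases expected to remain unsampled. The sketch below applies it to the 47-51 Mb genome estimate and the ~1,000,000 shotgun reads; the 500 bp mean usable read length is a hypothetical assumption, since the read length is not stated here.

```python
import math


def coverage(n_reads: int, read_len_bp: float, genome_bp: float) -> float:
    """Expected fold coverage c = N * L / G."""
    return n_reads * read_len_bp / genome_bp


def fraction_uncovered(c: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases hit by no read."""
    return math.exp(-c)


def reads_needed(target_c: float, genome_bp: float, read_len_bp: float) -> int:
    """Number of reads required to reach a target fold coverage."""
    return math.ceil(target_c * genome_bp / read_len_bp)


GENOME_SIZES_BP = (47e6, 51e6)  # flow cytometry / reassociation kinetics estimates
N_READS = 1_000_000             # approximate shotgun read count from the text
READ_LEN_BP = 500               # ASSUMPTION: hypothetical mean usable read length

for g in GENOME_SIZES_BP:
    c = coverage(N_READS, READ_LEN_BP, g)
    print(f"G = {g / 1e6:.0f} Mb: {c:.1f}X, "
          f"unsampled fraction ~ {fraction_uncovered(c):.1e}, "
          f"reads for 10X = {reads_needed(10, g, READ_LEN_BP):,}")
```

Changing `READ_LEN_BP` shows how sensitive the coverage figure is to read length; the estimate also ignores repeats and cloning bias, so real assemblies fall short of the ideal.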
- BAC library
To facilitate the assembly of the small contigs generated by shotgun sequencing, both ends of the clones of a BAC library were sequenced. The library was constructed in collaboration with Génoscope from highly purified DNA extracted at INRA Sophia Antipolis. Génoscope end-sequenced 25,000 BAC clones (50,000 reads).
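BAC ends help assembly because the two ends of one clone come from the same ~100 kb region: if they fall in different contigs, those contigs are probably neighbours. The sketch below groups contigs linked by enough independent BAC clones using union-find. It is a simplified illustration, not Génoscope's scaffolding method; real scaffolders also use end orientation and insert-size distance, and the `min_links` threshold and contig names are hypothetical.

```python
from collections import Counter


def scaffold_groups(bac_end_hits, min_links=2):
    """Group contigs joined by BAC end pairs.

    bac_end_hits: dict clone_id -> (contig of end 1, contig of end 2);
                  None marks an end that could not be placed.
    min_links: hypothetical number of independent clones required to trust a link.
    """
    links = Counter()
    for c1, c2 in bac_end_hits.values():
        if c1 is None or c2 is None or c1 == c2:
            continue  # unplaced end, or both ends in one contig: no new linkage
        links[tuple(sorted((c1, c2)))] += 1

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in links.items():
        if n >= min_links:
            parent[find(a)] = find(b)  # merge the two contig groups

    groups = {}
    for contig in parent:
        groups.setdefault(find(contig), set()).add(contig)
    return sorted(sorted(g) for g in groups.values())


# Toy data: hypothetical clone and contig identifiers.
EXAMPLE_HITS = {
    "bac1": ("ctg1", "ctg2"), "bac2": ("ctg2", "ctg1"),
    "bac3": ("ctg2", "ctg3"), "bac4": ("ctg3", "ctg3"),
    "bac5": ("ctg4", None),   "bac6": ("ctg3", "ctg5"),
    "bac7": ("ctg5", "ctg3"),
}
print(scaffold_groups(EXAMPLE_HITS))
print(scaffold_groups(EXAMPLE_HITS, min_links=1))
```

Requiring more than one supporting clone trades joins for safety: a single chimeric BAC can otherwise merge unrelated contigs, as the `ctg2`-`ctg3` link shows when `min_links=1`.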
- ESTs libraries
Generating a large number of ESTs from full-length cDNAs is needed to facilitate and improve automatic sequence annotation. Although a relatively small number of ESTs is enough to build a training set for gene finders, improving the accuracy of automated gene predictions and validating them require massive EST generation. For that reason, we sequenced a total of 50,000 clones from five different cDNA libraries. In a first step, 10,000 clones were sequenced from each of the cDNA libraries already available. In addition, two cDNA libraries representing different nematode developmental stages (third-stage infective juveniles and females) were constructed for EST analysis. INRA Sophia Antipolis has considerable experience in making full-length cDNA libraries and had previously constructed such a library for second-stage infective M. incognita juveniles. The overall goal was thus to sequence 50,000 clones for EST analysis and genome annotation, with approximately equal distribution across the five libraries. Initial runs were done from the 5' end, and informative clones were then sequenced from the 3' end. The overall project represents a sequencing volume of 1,125,000 reads.
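The read counts given for the sequencing steps can be cross-checked with simple arithmetic. The sketch below assumes, as an inference not stated explicitly in the text, that the 1,125,000-read total covers shotgun, BAC-end and EST sequencing together; under that assumption, the remainder is the number of 3' reads on informative clones.

```python
# Figures stated in the project description.
SHOTGUN_READS = 1_000_000
BAC_CLONES = 25_000
BAC_END_READS = 2 * BAC_CLONES   # both ends of each BAC clone
EST_CLONES = 50_000
N_LIBRARIES = 5
TOTAL_READS = 1_125_000

# Approximately equal distribution across the five cDNA libraries.
clones_per_library = EST_CLONES // N_LIBRARIES

# One initial 5' read per EST clone.
est_5prime_reads = EST_CLONES

# INFERENCE (assumption): if the stated total covers all sequencing steps,
# what remains is the number of 3' reads on "informative" clones.
implied_3prime_reads = (TOTAL_READS - SHOTGUN_READS
                        - BAC_END_READS - est_5prime_reads)

print(f"clones per library: {clones_per_library:,}")
print(f"BAC end reads: {BAC_END_READS:,}")
print(f"implied 3' EST reads: {implied_3prime_reads:,}")
```

If the total was meant for the EST step alone, this reconciliation does not hold, so the implied 3' figure should be read as a consistency check rather than a reported number.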
- Sequence assembly
Génoscope generated the M. incognita genome assembly.
The annotation process started after Génoscope completed the sequencing (EST and genome) and assembly (genome) steps. The INRA Toulouse Bioinformatics platform performed all relevant sequence analyses (EuGene predictor) and provided the annotation framework needed for expert annotation, ensuring the use of international standards in terms of both technical solutions and data representation. In this framework, a web service was developed with the international BioMoby software architecture, and the genome browser is based on Chado/Apollo/GBrowse. Expert annotation is performed with the Apollo software, and the functional annotation protocol uses the Gene Ontology.
The protocol we used to annotate the M. incognita genome comprises nine tasks:
- to build a set of validated full-length genes.
- to optimise EuGene parameters.
This step required at least 600 full-length genes for which both genomic and transcript sequences are available. This key step of the annotation process identifies the EuGene parameters that best weight signal predictions and integrate similarities (ESTs and proteins).
- to build a M. incognita repeat database.
- to predict protein-coding genes using the M. incognita EuGene pipeline.
- to apply the in-house ncRNA detection pipeline and build a M. incognita ncRNA database.
- to perform the automatic protein analyses (InterPro, BLAST against relevant databases, etc.).
- to build a GBrowse genome browser providing the annotators with the tools and results required for expert annotation (Apollo interface).
- to organise training sessions to ensure optimal use of annotation tools.
- to provide support to the expert annotation process.
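The parameter-optimisation task can be pictured as a search that scores each candidate setting against the validated full-length genes. The sketch below uses nucleotide-level sensitivity (Sn = TP / reference coding bases) and specificity (Sp = TP / predicted coding bases), the measures commonly used to evaluate gene finders. The grid search, the (Sn + Sp) / 2 score, and the `predict` callable are hypothetical simplifications; this is not EuGene's actual optimiser or interface.

```python
def coding_positions(exons):
    """Expand (start, end) exon intervals (1-based, inclusive) into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(predicted_exons, reference_exons):
    """Return (sensitivity, specificity) at the nucleotide level."""
    pred = coding_positions(predicted_exons)
    ref = coding_positions(reference_exons)
    tp = len(pred & ref)
    sn = tp / len(ref) if ref else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp


def mean_score(param, predict, reference_genes):
    """Average (Sn + Sp) / 2 over all validated genes for one parameter value."""
    scores = [sum(nucleotide_accuracy(predict(param, gene_id), ref)) / 2
              for gene_id, ref in reference_genes.items()]
    return sum(scores) / len(scores)


def best_parameter(candidates, predict, reference_genes):
    """Pick the candidate that maximises the mean score (simple grid search)."""
    return max(candidates, key=lambda p: mean_score(p, predict, reference_genes))


# Hypothetical validated gene structures (exon coordinates).
REFERENCE_GENES = {"geneA": [(1, 100), (201, 300)], "geneB": [(51, 150)]}


def toy_predict(shift, gene_id):
    """Toy stand-in for a gene finder: shifts every reference exon by `shift`."""
    return [(s + shift, e + shift) for s, e in REFERENCE_GENES[gene_id]]


print(nucleotide_accuracy([(1, 100), (251, 350)], REFERENCE_GENES["geneA"]))
print(best_parameter([-20, 0, 20], toy_predict, REFERENCE_GENES))
```

A larger validated set, like the 600 full-length genes required above, makes such scores less sensitive to any single unusual gene structure.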
Plant-Nematode interaction team, Inra Paca
Creation date: 13 May 2009
Update: 15 January 2010