BioWorld International Correspondent
PARIS - A French private-public consortium has unveiled initial results from a comparison of 67 complete genomes and is making those results available to the international scientific community.
The consortium is made up of the Atomic Energy Commission (CEA), of Paris; Infobiogen (the National Center for Informatics Resources), located at the Genopole, France's national biotechnology science and business park in Evry; and Gene-IT SA, a young Paris-based biotechnology company. It said its analysis makes it easier to identify and compare genes and to understand their development, and that the work is expected ultimately to lead to innovative therapies for various diseases and to new environmental protection methods.
Pointing out that one of the problems in genomics research is interpreting the huge mass of data coming out of international sequencing programs, the consortium said that analyzing how whole genomes are organized makes it possible to place isolated genes in context and thus to tackle problems that were previously inaccessible.
Known as Teraprot, the project combined the computing power of the CEA's Tera supercomputer (the largest in Europe and the fourth largest in the world), Gene-IT's LASSAP software for analyzing complete genome sequences, and Infobiogen's bioinformatics expertise. The Tera computer made it possible to compare every protein deduced from the 67 complete genome sequences with every other, pair by pair, for a total of 22 billion comparisons.
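For readers unfamiliar with all-against-all comparison, the toy sketch below illustrates the idea. It is not LASSAP: the algorithms and parameters Teraprot actually used are not described here, so this assumes a plain Smith-Waterman local alignment with hypothetical scoring values (match +2, mismatch -1, gap -2) and made-up protein names. Each unordered pair is scored once, so n sequences require n(n-1)/2 comparisons, a count that grows quadratically with the number of proteins.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n sequences (each pair compared once)."""
    return n * (n - 1) // 2


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are hypothetical defaults chosen for illustration only.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart at any cell, hence the floor of 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def all_vs_all(proteins: dict[str, str]) -> dict[tuple[str, str], int]:
    """Score every unordered pair of proteins exactly once."""
    return {
        (x, y): sw_score(proteins[x], proteins[y])
        for x, y in combinations(sorted(proteins), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy proteins, not taken from the Teraprot data set.
    toy = {"p1": "MKTAYIAK", "p2": "MKTAYLAK", "p3": "GGSGG"}
    for pair, score in all_vs_all(toy).items():
        print(pair, score)
```

With three toy proteins the loop performs three comparisons; doubling the number of proteins roughly quadruples the work, which is why a genome-scale run calls for a machine of Tera's size.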
The proteome sets comprised three eukaryotes (A. thaliana, S. cerevisiae and S. pombe), 12 archaea (including A. pernix, A. fulgidus and M. jannaschii) and 52 bacteria (such as B. subtilis, E. coli, H. pylori and L. lactis). The entire set of plant proteins was also fed into the system. In all, the data comprised 240,000 protein sequences split into 114 distinct files; for some proteome sets more than one file was created, for example one for chromosomal proteins and one for plasmid proteins.
The proteome sets were compared using LASSAP, which contains an extensive set of high-performance sequence comparison algorithms integrated into the BioWorkFlow environment. BioWorkFlow lets users quickly design and implement professional sequence comparison workflows in a consistent, integrated and scalable manner, and it provides complete sequence database management, precise control over all sequence comparison algorithms and extensive analysis of results.
The results are available on Infobiogen's website at www.infobiogen.fr.