20 similar documents found (search time: 21 ms)
2.
Thomas Bock, Wei-Hua Chen, Alessandro Ori, Nayab Malik, Noella Silva-Martin, Jaime Huerta-Cepas, Sean T. Powell, Panagiotis L. Kastritis, Georgy Smyshlyaev, Ivana Vonkova, Joanna Kirkpatrick, Tobias Doerks, Leo Nesme, Jochen Baßler, Martin Kos, Ed Hurt, Teresa Carlomagno, Anne-Claude Gavin, Orsolya Barabas, Christoph W. Müller, Vera van Noort, Martin Beck, Peer Bork. Nucleic Acids Research, 2014, 42(22):13525-13533
4.
Since the structure of the DNA molecule was identified half a century ago, complete genome sequences have been determined for 37 prokaryotes and several eukaryotes. With the exponential growth of genetic information, bioinformatics has attempted to predict gene locations and functions in silico, prior to experimental confirmation at the bench.
5.
MOTIVATION: Prokaryotic genomes are sequenced and annotated at an increasing rate. Annotation methods vary between sequencing groups, which makes genome comparison difficult and may propagate errors when questionable assignments are transferred from one genome to another. Genome comparison on either a large or a small scale would be facilitated by a single annotation standard that makes transparent why an open reading frame (ORF) is considered to be a gene. RESULTS: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic gene finder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in GC-rich genomes. The fractional difference between annotated and predicted genes confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing from the annotation of some genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than that of the existing annotation.
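The comparison behind these counts can be sketched as follows. This is a hypothetical illustration, not EasyGene or the authors' scoring code: it assumes that an annotated gene and a prediction are the same gene when they share strand and stop coordinate, that a differing start coordinate counts as a start-codon disagreement, and that the net excess of annotated-only genes is normalized by the number of annotated genes (the abstract does not state the normalization). The function name `compare_annotation` and its `threshold` parameter are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical gene key: (strand, stop coordinate); value: start coordinate.
GeneKey = Tuple[str, int]


def compare_annotation(annotated: Dict[GeneKey, int],
                       predicted: Dict[GeneKey, int],
                       threshold: float = 0.05) -> dict:
    """Compare an annotation with gene-finder predictions.

    Genes are matched by stop codon (assumption); a matched pair whose
    start coordinates differ counts as a start-codon disagreement.
    """
    shared = annotated.keys() & predicted.keys()
    annotated_only = len(annotated.keys() - predicted.keys())
    predicted_only = len(predicted.keys() - annotated.keys())
    wrong_start = sum(1 for k in shared if annotated[k] != predicted[k])

    # Assumed normalization: net excess of annotated-only genes
    # relative to the size of the annotation.
    excess = (annotated_only - predicted_only) / len(annotated)
    if excess > threshold:
        verdict = "over-annotated"
    elif excess < -threshold:
        verdict = "under-annotated"
    else:
        verdict = "consistent"
    return {
        "annotated_only": annotated_only,
        "predicted_only": predicted_only,
        "start_disagreement": wrong_start / len(shared) if shared else 0.0,
        "excess": excess,
        "verdict": verdict,
    }


if __name__ == "__main__":
    # Toy data: one annotated gene has no prediction, one start differs.
    annotated = {("+", 300): 1, ("+", 900): 400, ("-", 1500): 1800, ("+", 2500): 2200}
    predicted = {("+", 300): 1, ("+", 900): 430, ("+", 2500): 2200}
    report = compare_annotation(annotated, predicted)
    print(report["verdict"])  # over-annotated
```

In the toy input, one of four annotated genes is unsupported and none of the predictions is missing from the annotation, so the net excess is 25% and the genome is flagged; in a real comparison the matching rule would need to handle overlapping genes and alternative stop codons.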
7.
The evolutionary history of the entire Escherichia coli chromosome was traced by examining the distribution of the approximately 4300 open reading frames (ORFs) of E. coli MG1655 among strains of known genealogical relationships. Using this framework to deduce the incidence of gene transfer and gene loss, a total of 67 events (37 additions and 30 deletions) were required to account for the distribution of all genes now present in the MG1655 chromosome. Nearly 90% of the ORFs were common to all strains examined; however, given the variation in gene content and chromosome size, strains can contain well over a megabase of unique DNA, conferring traits that distinguish them from other members of the species. Moreover, strains vary widely in their frequencies of deletions, which probably accounts for the variation in genome size within the species.
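Counting additions and deletions against a strain genealogy can be illustrated with Fitch parsimony on gene presence/absence. The abstract does not say which inference procedure was used, so this is a swapped-in textbook technique rather than the authors' method. The strain names and tree are hypothetical, the tree must be binary, and an ambiguous state at the root is resolved toward absence, an arbitrary assumption that can shift events between gains and losses without changing their total.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Node of a rooted binary strain tree; leaves carry a strain name."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    fitch: Set[int] = field(default_factory=set)


def leaf(name: str) -> Node:
    return Node(name=name)


def join(left: Node, right: Node) -> Node:
    return Node(children=[left, right])


def _bottom_up(node: Node, present: Dict[str, int]) -> Set[int]:
    """Fitch pass: keep the children's shared states, else their union."""
    if not node.children:
        node.fitch = {present[node.name]}
    else:
        left, right = (_bottom_up(c, present) for c in node.children)
        node.fitch = (left & right) or (left | right)
    return node.fitch


def _top_down(node: Node, parent: Optional[int], counts: Dict[str, int]) -> None:
    """Pick a state per node and count 0->1 gains and 1->0 losses on edges."""
    if parent is not None and parent in node.fitch:
        state = parent
    else:
        # Singleton sets give their only state; an ambiguous root
        # set {0, 1} is resolved toward absence (assumption).
        state = min(node.fitch)
    if parent == 0 and state == 1:
        counts["gains"] += 1
    elif parent == 1 and state == 0:
        counts["losses"] += 1
    for child in node.children:
        _top_down(child, state, counts)


def gains_and_losses(root: Node, present: Dict[str, int]) -> Dict[str, int]:
    """Most-parsimonious gain/loss counts for one gene's presence pattern."""
    _bottom_up(root, present)
    counts = {"gains": 0, "losses": 0}
    _top_down(root, None, counts)
    return counts


if __name__ == "__main__":
    tree = join(join(leaf("MG1655"), leaf("strainB")),
                join(leaf("strainC"), leaf("strainD")))
    pattern = {"MG1655": 1, "strainB": 1, "strainC": 0, "strainD": 0}
    print(gains_and_losses(tree, pattern))  # {'gains': 1, 'losses': 0}
```

This counts changes per gene; the study instead groups genes into transfer and deletion events, so several genes sharing the same inferred branch would count as one event there.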
8.
Rachel Drysdale. Briefings in Functional Genomics and Prot, 2003, 2(2):128-134
The sequence and genome annotations of Drosophila melanogaster were initially published in late 1999 and early 2000. Since then, the Berkeley Drosophila Genome Project (BDGP) has improved the quality of the sequence and FlyBase has reviewed the annotations by hand, together producing an account of the fruit fly genome of the highest quality. This review discusses the main features of this process, both from the point of view of the biology revealed by the end result and from that of the software development that has been central to this genome sequencing and annotation project.
11.
Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome
Bergman CM, Pfeiffer BD, Rincón-Limas DE, Hoskins RA, Gnirke A, Mungall CJ, Wang AM, Kronmiller B, Pacleb J, Park S, Stapleton M, Wan K, George RA, de Jong PJ, Botas J, Rubin GM, Celniker SE. Genome Biology, 2002, 3(12):research0086.1-862
Background
It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined.
Results
We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes than in genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences.
Conclusions
Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.
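The idea of predicting enhancers from clusters of CNCSs can be sketched as a single-linkage grouping of conserved intervals along one sequence. The abstract does not give the clustering criterion, so the maximum gap (`max_gap`) and minimum cluster size (`min_count`) are hypothetical parameters, and the conserved spacing between CNCSs reported by the authors is not modeled here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) of one CNCS in bp


def cluster_cncs(cncs: List[Interval], max_gap: int = 200,
                 min_count: int = 3) -> List[Tuple[int, int, int]]:
    """Group CNCSs separated by at most max_gap bp (single linkage).

    Returns (cluster_start, cluster_end, n_cncs) for every cluster holding
    at least min_count CNCSs; these are the candidate enhancer regions.
    """
    candidates: List[Tuple[int, int, int]] = []
    group: List[Interval] = []
    group_end = 0
    for start, end in sorted(cncs):
        # A gap larger than max_gap closes the current cluster.
        if group and start - group_end > max_gap:
            if len(group) >= min_count:
                candidates.append((group[0][0], group_end, len(group)))
            group = []
        if not group:
            group_end = end
        group.append((start, end))
        # Track the furthest end so nested intervals do not shrink the cluster.
        group_end = max(group_end, end)
    if len(group) >= min_count:
        candidates.append((group[0][0], group_end, len(group)))
    return candidates


if __name__ == "__main__":
    cncs = [(100, 150), (250, 300), (400, 430), (2000, 2050),
            (5000, 5040), (5100, 5150), (5300, 5360)]
    print(cluster_cncs(cncs))  # [(100, 430, 3), (5000, 5360, 3)]
```

The isolated CNCS at 2000-2050 is dropped because it does not reach `min_count`; in practice both thresholds would have to be calibrated against known enhancers.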
16.
MOTIVATION: Regulation of gene expression in space and time directs its localization to a specific subset of cells during development. Systematic determination of the spatiotemporal dynamics of gene expression plays an important role in understanding the regulatory networks that drive development. An atlas of the gene expression patterns of the fruit fly Drosophila melanogaster has been created by whole-mount in situ hybridization; it documents the dynamic changes in gene expression patterns during Drosophila embryogenesis. The spatial and temporal patterns of gene expression are integrated by anatomical terms from a controlled vocabulary that links together intermediate tissues developed from one another. Currently, the terms are assigned to patterns manually. However, the number of patterns generated by high-throughput in situ hybridization is increasing rapidly, so it is tempting to approach this problem with computational methods. RESULTS: In this article, we present a novel computational framework for annotating gene expression patterns using a controlled vocabulary. In the currently available high-throughput data, annotation terms are assigned to groups of patterns rather than to individual images. We propose to extract invariant features from images and to construct pyramid match kernels that measure the similarity between sets of patterns. To exploit the complementary information conveyed by different features and to incorporate the correlation among patterns sharing common structures, we propose efficient convex formulations to integrate the kernels derived from the various features. The proposed framework is evaluated by comparing its annotations with those of human curators, and it shows promising performance in terms of F1 score.
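A pyramid match kernel compares two unordered sets of feature vectors by counting how many points fall into the same histogram cell at successively coarser resolutions, weighting matches found at fine resolution more heavily. The sketch below follows the standard formulation of Grauman and Darrell on integer-valued features in [0, max_range); the invariant image features, their quantization, and the convex combination of several kernels used in the paper are not reproduced, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import Sequence

FeatureSet = Sequence[Sequence[int]]  # a set of d-dimensional feature vectors


def _histogram(points: FeatureSet, side: int) -> Counter:
    """Count points per grid cell of the given side length."""
    return Counter(tuple(c // side for c in p) for p in points)


def _intersection(h1: Counter, h2: Counter) -> int:
    """Histogram intersection: number of matched points at this level."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match(X: FeatureSet, Y: FeatureSet, max_range: int) -> float:
    """Unnormalized pyramid match kernel for features in [0, max_range)."""
    d = len(X[0])
    top = max(1, math.ceil(math.log2(max_range)))  # coarsest level L
    score = 0.0
    previous = 0
    for level in range(top + 1):
        side = 2 ** level
        matched = _intersection(_histogram(X, side), _histogram(Y, side))
        # Only matches that are new at this level count, weighted 1/(d * 2^level).
        score += (matched - previous) / (d * side)
        previous = matched
    return score


def normalized_pyramid_match(X: FeatureSet, Y: FeatureSet, max_range: int) -> float:
    """Scale so that a set compared with itself scores 1.0."""
    return pyramid_match(X, Y, max_range) / math.sqrt(
        pyramid_match(X, X, max_range) * pyramid_match(Y, Y, max_range))


if __name__ == "__main__":
    X = [(0, 0), (5, 5)]
    Y = [(0, 1), (7, 7)]
    print(pyramid_match(X, Y, 8))  # 0.375
```

Here no points coincide at the finest level, one pair matches in 2x2 cells and the second pair only in 4x4 cells, so the later match contributes half as much as the earlier one.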
19.
An integrated genetic linkage map was developed for the turkey (Meleagris gallopavo) that combines the genetic markers from the three previous mapping efforts. The UMN integrated map includes 613 loci arranged into 41 linkage groups. An additional 105 markers are tentatively placed within linkage groups based on two-point LOD scores, and 19 markers remain unlinked. A total of 210 previously unmapped markers have been added to the UMN turkey genetic map. Markers from each of the 20 linkage groups identified in the Roslin map and the 22 linkage groups of the Nte map are incorporated into the new integrated map. The overall map distance contained within the 41 linkage groups is 3,365 cM (sex-averaged), with the largest linkage group (94 loci) measuring 533.1 cM. The average marker interval for the map was 7.86 cM. Sequences of markers included in the new map were compared with the chicken genome sequence using BLASTN. Significant similarity scores were obtained for 95.6% of the turkey sequences, encompassing an estimated 91% of the chicken genome. A physical map of the chicken genome based on the positions of the turkey sequences was built, and 36 of the 41 turkey linkage groups were aligned with the physical map; five linkage groups remain unassigned. Given the close similarities between the turkey and chicken genomes, the chicken genome sequence could serve as a scaffold for a genome sequencing effort in the turkey.
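A two-point LOD score compares the likelihood of linkage at the estimated recombination fraction with the likelihood of free recombination (θ = 0.5). The minimal sketch below assumes phase-known, fully informative meioses in which recombinant and non-recombinant offspring can be counted directly; real turkey reference pedigrees require more general likelihood calculations, and the conventional LOD ≥ 3 threshold mentioned in the comment is not taken from the abstract. The function name is illustrative.

```python
import math
from typing import Tuple


def two_point_lod(recombinants: int, nonrecombinants: int) -> Tuple[float, float]:
    """Return (theta_hat, LOD) for a pair of markers.

    LOD = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ],
    evaluated at the maximum-likelihood estimate theta = R / (R + NR),
    capped at 0.5 (no linkage).
    """
    n = recombinants + nonrecombinants
    if n == 0:
        raise ValueError("need at least one informative meiosis")
    theta = min(recombinants / n, 0.5)
    log_likelihood = 0.0
    # Skip zero-count terms so that theta = 0 does not hit log10(0).
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if nonrecombinants:
        log_likelihood += nonrecombinants * math.log10(1 - theta)
    lod = log_likelihood - n * math.log10(0.5)
    return theta, lod


if __name__ == "__main__":
    theta, lod = two_point_lod(2, 18)
    # A LOD of 3 or more is a common (assumed) threshold for declaring linkage.
    print(f"theta = {theta:.2f}, LOD = {lod:.2f}")  # theta = 0.10, LOD = 3.20
```

Two recombinants in 20 informative meioses give θ = 0.10 and a LOD just above 3; converting θ into map distance in cM would additionally require a mapping function, which the abstract does not specify.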