Similar articles
Found 20 similar articles (search time: 265 ms)
1.
Few plant peptides involved in intercellular communication have been experimentally isolated. Sequence analysis of the Arabidopsis thaliana genome has revealed numerous transmembrane receptors predicted to bind proteinaceous ligands, emphasizing the importance of identifying peptides with signaling function. Annotation of the Arabidopsis genome sequence has made it possible to identify peptide-encoding genes. However, such annotational identification is impeded because small genes are poorly predicted by gene-prediction algorithms, thus prompting the alternative approaches described here. We initially performed a systematic analysis of short polypeptides encoded by annotated genes on two Arabidopsis chromosomes using SignalP to identify potentially secreted peptides. Subsequent homology searches with selected, putatively secreted peptides led to the identification of a potential, large Arabidopsis family of 34 genes. The predicted peptides are characterized by a conserved C-terminal sequence motif and additional primary structure conservation in a core region. The majority of these genes had not previously been annotated. A subset of the predicted peptides shows high overall sequence similarity to Rapid Alkalinization Factor (RALF), a peptide isolated from tobacco. We therefore refer to this peptide family as RALFL for RALF-Like. RT-PCR analysis confirmed that several of the Arabidopsis genes are expressed and that their expression patterns vary. The identification of a large gene family in the genome of the model organism Arabidopsis thaliana demonstrates that a combination of systematic analysis and homology searching can contribute to peptide discovery.

2.
We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of the 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
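The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-derives the figures from the numbers given above; the variable names are illustrative, and the reading that the 5,873 splice forms include the primary model of each of the 2,538 genes is an assumption, adopted because it reproduces the stated total of 61,250 gene models.

```python
# Counts quoted in the abstract (release 3 of the Osa1 annotation).
total_genes = 57_915
te_related = 14_196
non_te = total_genes - te_related  # genes not related to transposable elements

# Annotation categories of the non-TE genes, as quoted in the abstract.
categories = {
    "putative function": 18_545,
    "expressed, no known function": 5_777,
    "hypothetical": 19_397,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {name: percent(count, non_te) for name, count in categories.items()}

# Assumption: the 5,873 splice forms include the primary model of each of the
# 2,538 alternatively spliced genes, so only the surplus adds new gene models.
splice_forms = 5_873
genes_with_splice_forms = 2_538
gene_models = total_genes + (splice_forms - genes_with_splice_forms)

print(non_te, shares, gene_models)
```

Under that reading, the three categories sum to the non-TE gene count and the percentages and gene-model total agree with the abstract.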

3.
Multicellular organisms produce small cysteine-rich antimicrobial peptides as an innate defense against pathogens. While defensins, a well-known class of such peptides, are common among eukaryotes, there are other classes restricted to the plant kingdom. These include thionins, lipid transfer proteins and snakins. In earlier work, we identified several divergent classes of small putatively secreted cysteine-rich peptides (CRPs) in legumes [Graham et al. (2004) Plant Physiol. 135, 1179-97]. Here, we built sequence motif models for each of these classes of peptides, and iteratively searched for related sequences within the comprehensive UniProt protein dataset, the Institute for Genomic Research's 33 plant gene indices, and the entire genomes of the model dicot, Arabidopsis thaliana, and the model monocot and crop species, Oryza sativa (rice). Using this search strategy, we identified approximately 13,000 plant genes encoding peptides with common features: (i) an N-terminal signal peptide, (ii) a small divergent charged or polar mature peptide with conserved cysteines, (iii) a similar intron/exon structure, (iv) spatial clustering in the genomes studied, and (v) overrepresentation in expressed sequences from reproductive structures of specific taxa. The identified genes include classes of defensins, thionins, lipid transfer proteins, and snakins, plus other protease inhibitors, pollen allergens, and uncharacterized gene families. We estimate that these classes of genes account for approximately 2-3% of the gene repertoire of each model species. Although 24% of the genes identified were not annotated in the latest Arabidopsis genome releases (TIGR5, TAIR6), we confirmed expression via RT-PCR for 59% of the sequences attempted. These findings highlight limitations in current annotation procedures for small divergent peptide classes.

4.
Bioactive peptides play critical roles in regulating most biological processes in animals. The elucidation of the amino acid sequence of these regulatory peptides is crucial for our understanding of animal physiology. Most of the (neuro)peptides currently known were identified by purification and subsequent amino acid sequencing. With the entire genome sequence of some animals now available, it has become possible to predict novel putative peptides. In this way, BLAST (Basic Local Alignment Search Tool) analysis of the Drosophila melanogaster genome has allowed annotation of 36 secretory peptide genes so far. Peptide precursor genes are, however, poorly predicted by this algorithm, thus prompting an alternative approach described here. With the described searching program we scanned the Drosophila genome for predicted proteins with the structural hallmarks of neuropeptide precursors. As a result, 76 putative secretory peptide genes were predicted in addition to the 43 annotated ones. These putative (neuro)peptide genes contain conserved motifs reminiscent of known neuropeptides from other animal species. Peptides that display sequence similarities to the mammalian vasopressin, atrial natriuretic peptide, and prolactin precursors and the invertebrate peptides orcokinin, prothoracicotropic hormones, trypsin modulating oostatic factor, and Drosophila immune induced peptides (DIMs), among others, were discovered. Our data hence provide further evidence that many neuropeptide genes were already present in the ancestor of Protostomia and Deuterostomia prior to their divergence. This bioinformatic study opens perspectives for the genome-wide analysis of peptide genes in other eukaryotic model organisms.

5.
6.
7.
The correct annotation of genes encoding the smallest proteins is one of the biggest challenges of genome annotation, and perhaps more importantly, few annotated short open reading frames have been confirmed to correspond to synthesized proteins. We used sequence conservation and ribosome binding site models to predict genes encoding small proteins, defined as having 16–50 amino acids, in the intergenic regions of the Escherichia coli genome. We tested expression of these predicted as well as previously annotated genes by integrating the sequential peptide affinity tag directly upstream of the stop codon on the chromosome and assaying for synthesis using immunoblot assays. This approach confirmed that 20 previously annotated and 18 newly discovered proteins of 16–50 amino acids are synthesized. We summarize the properties of these small proteins; remarkably, more than half of the proteins are predicted to be single‐transmembrane proteins, nine of which we show co‐fractionate with cell membranes.

8.
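Although the gene structure of bacterial genomes is relatively simple, genes can still be missed during annotation. When a potential gene has no significant homologs in high-quality databases, knowledge-based gene prediction methods run into difficulty. Here we search for missed genes by systematically scanning the protein sequence patterns of all possible ORFs in a genome. To test the feasibility of this approach, we systematically analyzed the genome of Corynebacterium glutamicum, an important industrial fermentation microorganism, and found 25 candidate putative genes. They carry significant protein sequence patterns, yet have no significant homologs in Swiss-Prot and remain unannotated in GenBank. Further analysis showed that, of the 25 candidates, 19 are probable genes, 3 are probable pseudogenes, and 3 are putative gene sequences. These results indicate that the approach can be used effectively to search for genes that lack significant homologs.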
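The first step of such an approach, enumerating every possible ORF before any pattern search is applied, might be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the ATG-only start rule, the standard stop codons, and the `min_codons` cutoff are all assumptions, and the protein-pattern scan itself is not shown.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) of ATG...stop ORFs in one reading frame of seq.

    Coordinates are 0-based and half-open, and end includes the stop codon.
    Only the first ATG after the previous stop is used as the start.
    """
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            # Length in codons, not counting the stop codon.
            if (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def six_frame_orfs(seq, min_codons=30):
    """List (strand, frame, start, end) for ORFs in all six reading frames.

    Minus-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    results = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(strand_seq, frame, min_codons):
                results.append((strand, frame, start, end))
    return results
```

Each ORF returned here would then be translated and screened for protein sequence patterns; lowering `min_codons` widens the candidate set at the cost of more spurious ORFs.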

9.
Xing XB, Li QR, Sun H, Fu X, Zhan F, Huang X, Li J, Chen CL, Shyr Y, Zeng R, Li YX, Xie L. Genomics, 2011, 98(5): 343-351
Identifying protein-coding genes in eukaryotic genomes remains a challenge in the post-genome era because of their complex gene models. We applied a proteogenomics strategy to detect un-annotated protein-coding regions in the mouse genome. High-accuracy tandem mass spectrometry (MS/MS) data from diverse mouse samples were generated in house on an LTQ-Orbitrap mass spectrometer. Two searchable diagnostic proteomic datasets were constructed, one with all possible encoding exon junctions and the other with all putative encoding exons, for the discovery of novel exon splicing events and novel uninterrupted protein-coding regions. Altogether, 29,586 unique peptides were identified. When the peptides were aligned back to the mouse genome, the translation of 4471 annotated genes was validated by the known peptides, and 172 genic events were defined in the mouse genome by the novel peptides. The approach in the current work can provide substantial evidence for the annotation of protein-coding genes in eukaryotic genomes.

10.

Background

Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications.

Results

Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5).

Conclusion

Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined, yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.

11.
Large-scale prokaryotic gene prediction and comparison to genome annotation (total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. This makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome comparison, on either a large or a small scale, would be facilitated by a single standard for annotation that makes transparent why an open reading frame (ORF) is considered to be a gene. RESULTS: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted genes confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing in the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.

12.
13.
Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs) that are mostly nested within each other, of which most families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible in this region for most disruptions of synteny with sorghum and rice. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These analyses, performed on ∼1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses.

14.
Prokaryote gene annotation is complicated by large numbers of short open reading frames (ORFs) that arise naturally from genetic code design. Historically, many hypothetical ORFs have been annotated as genes in microbes, usually with an arbitrary length threshold (e.g. greater than 100 codons). Given the use of such thresholds, what is the extent of genuine undiscovered short genes in the current sampling of prokaryote genomes? To assess rigorously the potential under-annotation of short ORFs with homology, we exhaustively compared the polyORFome--all possible ORFs in 64 prokaryotes (53 bacteria and 11 archaea) plus budding yeast--to itself and to all known proteins. The novelty of our analysis is that, firstly, sequence comparisons to/between both annotated and un-annotated ORFs are considered, and secondly, a two-step disabled-homology filter is applied to set aside putative pseudogenes and spurious ORFs. We find that un-annotated homologous short ORFs (uhORFs) correspond to a small but non-negligible fraction of the annotated prokaryote proteomes (0.5-3.8%, depending on selection criteria). Moreover, the disabled-homology filter indicates that about a third of uhORFs correspond to putative pseudogenes or spurious ORFs. Our analysis shows that the use of annotation length thresholds is unnecessary, as there are manageable numbers of short ORF homologies conserved (without disablements) across microbial genomes. Data on uhORFs are available from http://pseudogene.org/polyo

15.
16.
17.
18.
19.
The 2694 ORFs originally annotated as potential genes in the genome of Aeropyrum pernix can be categorized into three clusters (A, B, C), according to their nucleotide composition at three codon positions. Coding potential was found to be responsible for the phenomenon of three clusters in a 9-dimensional space derived from the nucleotide composition of ORFs: ORFs assigned to cluster A are coding ones, while those assigned to clusters B and C are non-coding ORFs. A "codingness" index called the AZ score is defined based on a clustering method used to recognize protein-coding genes in the A. pernix genome. The criterion for a coding or non-coding ORF is based on the AZ score: ORFs with AZ > 0 or AZ < 0 are coding or non-coding, respectively. Consequently, 620 out of 632 ORFs with putative functions based on the original annotation are contained in cluster A and have positive AZ scores. In addition, all 29 ORFs encoding putative or conserved proteins newly added in the RefSeq annotation also have positive AZ scores. Accordingly, the number of re-recognized protein-coding genes in the A. pernix genome is 1610, which is significantly less than the 2694 in the original annotation and also much less than the 1841 in the RefSeq annotation curated by NCBI staff. Annotation information of re-recognized genes and their AZ scores are available at: http://tubic.tju.edu.cn/Aper/.
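As a rough illustration of what "nucleotide composition at three codon positions" can look like as a feature vector, the sketch below tabulates base frequencies per codon position. It is not the authors' method: the abstract does not say how the 9 dimensions are derived or how the AZ score is computed, so dropping one base per position (the four frequencies at a position sum to 1, leaving three independent values) is an assumption, and all function names are hypothetical. Only the sign rule for the AZ score is taken from the abstract.

```python
from collections import Counter

BASES = "ACGT"


def codon_position_composition(orf):
    """Return 12 frequencies: A, C, G, T at codon positions 1, 2 and 3."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    freqs = []
    for pos in range(3):
        counts = Counter(orf[pos:usable:3])
        total = sum(counts[base] for base in BASES)
        freqs.extend(counts[base] / total if total else 0.0 for base in BASES)
    return freqs


def nine_dimensional(freqs):
    """Drop the T frequency at each position (assumed reduction to 9 values)."""
    return [value for index, value in enumerate(freqs) if index % 4 != 3]


def classify_by_az(az_score):
    """Apply the sign rule from the abstract; a score of exactly 0 is undecided."""
    if az_score > 0:
        return "coding"
    if az_score < 0:
        return "non-coding"
    return None


features = nine_dimensional(codon_position_composition("ATGGCATAA"))
print(features)
```

A clustering method would then operate on vectors like `features`, one per ORF; the scoring step that turns cluster membership into an AZ value is left out here because the abstract does not specify it.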

20.