Similar articles
 Found 20 similar articles (search time: 15 ms)
2.
Spermatozoa are central to fertilization and the evolutionary fitness of sexually reproducing organisms. As such, a deeper understanding of sperm proteomes (and associated reproductive tissues) has proven critical to the advancement of the fields of sexual selection and reproductive biology. Due to their extraordinary complexity, proteome depth-of-coverage is dependent on advancements in technology and related bioinformatics, both of which have advanced significantly in the decade since the last Drosophila sperm proteome was published. Here, we provide an updated version of the Drosophila melanogaster sperm proteome (DmSP3) using improved separation and detection methods and an updated genome annotation. Combined with previous versions of the sperm proteome, the DmSP3 contains a total of 3176 proteins, and we provide the first label-free quantitation of the sperm proteome for 2125 proteins. The top 20 most abundant proteins included the structural elements α- and β-tubulins and sperm leucyl-aminopeptidases. Both gene content and protein abundance were significantly reduced on the X chromosome, consistent with prior genomic studies of X chromosome evolution. We identified 9 of the 16 Y-linked proteins, including known testis-specific male fertility factors. We also identified almost one-half of known Drosophila ribosomal proteins in the DmSP3. The role of this subset of ribosomal proteins in sperm is unknown. Surprisingly, our expanded sperm proteome also identified 122 seminal fluid proteins (Sfps), proteins originally identified in the accessory glands. We show that a significant fraction of ‘sperm-associated Sfps’ are recalcitrant to concentrated salt and detergent treatments, suggesting this subclass of Sfps is expressed in testes and may have additional functions in sperm, per se.
Overall, our results add to a growing landscape of both sperm and seminal fluid protein biology and in particular provide quantitative evidence at the protein level for prior findings supporting the meiotic sex-chromosome inactivation model for male-specific gene and X chromosome evolution.

7.
Proteogenomics     
Renuse S  Chaerkady R  Pandey A 《Proteomics》2011,11(4):620-630
The ability to sequence DNA rapidly, inexpensively and in a high-throughput fashion provides a unique opportunity to sequence whole genomes of a large number of species. The cataloging of protein-coding genes from these species, however, remains a non-trivial task with the majority of initial genome annotation dependent on the use of gene prediction algorithms. Recent advances in mass spectrometry-based proteomics now enable generation of accurate and comprehensive protein sequence of tissues and organisms. Proteogenomics allows us to harness the wealth of information available at the proteome level and apply it to the available genomic information of organisms. This includes identifying novel genes and splice isoforms, assigning correct start sites and validating predicted exons and genes. It is also possible to use proteogenomics to identify protein variants that could cause diseases, to identify protein biomarkers and to study genome variation. We anticipate proteogenomics to become a powerful approach that will be routinely employed by 'Genome and Proteome Centers' of the future.

8.
The genome of Mycobacterium tuberculosis (H37Rv) contains 4,019 protein coding genes, of which more than a thousand have been categorized as 'hypothetical', implying that for these not even weak functional associations could be identified so far. We here predict reliable functional indications for half of this large hypothetical orfeome: 497 genes can be annotated based on orthology, and another 125 can be linked to interacting proteins via integrated genomic context analysis and literature mining. The assignments include newly identified clusters of interacting proteins, hypothetical genes that are associated with well known pathways and putative disease-relevant targets. Altogether, we have raised the fraction of the proteome with at least some functional annotation to 88%, which should considerably enhance the interpretation of large-scale experiments targeting this medically important organism.

10.
The first protein map was developed of Synechococcus sp. strain PCC 7942, a model organism for studies of photosynthesis, prokaryotic circadian rhythms, cell division, carbon-concentrating mechanisms, and adaptive responses to a variety of stresses. The proteome was analyzed by two-dimensional gel electrophoresis with subsequent MALDI-TOF mass spectrometry and database analysis. Of the 140 analyzed protein spots, 110 were successfully identified as 62 different proteins, many of which occurred as multiple spots on the gel. The identified proteins participate in the major metabolic and cellular processes in cyanobacterial cells during the exponential growth phase. In addition, 14 proteins which were previously either unknown or considered to be hypothetical were shown to be true gene products in Synechococcus sp. strain PCC 7942. These results may be helpful for the annotation of the recently sequenced genome of this cyanobacterium, as well as for biochemical and physiological studies of Synechococcus.

11.
To better understand adaptation to harsh conditions encountered in hot arid deserts, we report the first complete genome sequence and proteome analysis of a bacterium, Deinococcus deserti VCD115, isolated from Sahara surface sand. Its genome consists of a 2.8-Mb chromosome and three large plasmids of 324 kb, 314 kb, and 396 kb. Accurate primary genome annotation of its 3,455 genes was guided by extensive proteome shotgun analysis. From the large corpus of MS/MS spectra recorded, 1,348 proteins were uncovered and semiquantified by spectral counting. Among the highly detected proteins are several orphans and Deinococcus-specific proteins of unknown function. The alliance of proteomics and genomics high-throughput techniques allowed identification of 15 unpredicted genes and, surprisingly, reversal of incorrectly predicted orientation of 11 genes. Reversal of orientation of two Deinococcus-specific radiation-induced genes, ddrC and ddrH, and identification in D. deserti of supplementary genes involved in manganese import extend our knowledge of the radiotolerance toolbox of Deinococcaceae. Additional genes involved in nutrient import and in DNA repair (i.e., two extra recA, three translesion DNA polymerases, a photolyase) were also identified and found to be expressed under standard growth conditions, and, for these DNA repair genes, after exposure of the cells to UV. The supplementary nutrient import and DNA repair genes are likely important for survival and adaptation of D. deserti to its nutrient-poor, dry, and UV-exposed extreme environment.

12.

Background

To keep genome information accurate and relevant, original genome annotations need to be updated and evaluated regularly. Manual reannotation of genomes is important as it can significantly reduce the propagation of errors and consequently diminish the time spent on mistaken research. For this reason, five years after the initial submission of the Entamoeba histolytica draft genome publication, we have re-examined the original 23 Mb assembly and the annotation of the predicted genes.

Principal Findings

The evaluation of the genomic sequence led to the identification of more than one hundred artifactual tandem duplications that were eliminated by re-assembling the genome. The reannotation was done using a combination of manual and automated genome analysis. The new 20 Mb assembly contains 1,496 scaffolds and 8,201 predicted genes, of which 60% are identical to the initial annotation and the remaining 40% underwent structural changes. Functional classification of 60% of the genes was modified based on recent sequence comparisons and new experimental data. We have assigned putative function to 3,788 proteins (46% of the predicted proteome) based on the annotation of predicted gene families, and have identified 58 protein families of five or more members that share no homology with known proteins and thus could be entamoeba specific. Genome analysis also revealed new features such as the presence of segmental duplications of up to 16 kb flanked by inverted repeats, and the tight association of some gene families with transposable elements.

Significance

This new genome annotation and analysis represents a more refined and accurate blueprint of the pathogen genome, and provides an upgraded tool as reference for the study of many important aspects of E. histolytica biology, such as genome evolution and pathogenesis.

15.

Background

Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals.

Results

We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class-specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a locally shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology-based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage-specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog).
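
The locus counts quoted in the Results paragraph can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary and variable names are hypothetical labels chosen here, and the numbers are simply those stated in the abstract, not data taken from the annotation itself.

```python
# Counts as quoted in the abstract; the dictionary keys are illustrative labels.
locus_classes = {
    "cis-regulatory element loci": 139,
    "lncRNA loci": 58,
    "conflicts of annotation": 11,
    "ncRNA genes": 3183,
}

ncrna_genes = {
    "miRNA": 359,
    "ribozyme": 8,
    "rRNA": 185,
    "snoRNA": 638,
    "snRNA": 1030,
    "tRNA": 810,
    "other classes": 153,
}

# The ncRNA gene classes should add up to the reported number of ncRNA genes.
ncrna_total = sum(ncrna_genes.values())

# The four locus categories should add up to the conservative set.
conservative_total = sum(locus_classes.values())

# The RNA-seq analysis contributed 165 additional miRNA loci.
rnaseq_mirna_loci = 165
overall_total = conservative_total + rnaseq_mirna_loci

print(ncrna_total, conservative_total, overall_total)
```

Under these assumptions the class counts sum to the reported 3,183 ncRNA genes, the four categories sum to the conservative set of 3,391 loci, and adding the RNA-seq miRNA loci gives the overall 3,556.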

Conclusions

We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.

17.
The envelope of Escherichia coli is a complex organelle composed of the outer membrane, periplasm-peptidoglycan layer and cytoplasmic membrane. Each compartment has a unique complement of proteins, the proteome. Determining the proteome of the envelope is essential for developing an in silico bacterial model, for determining cellular responses to environmental alterations, for determining the function of proteins encoded by genes of unknown function and for development and testing of new experimental technologies such as mass spectrometric methods for identifying and quantifying hydrophobic proteins. The availability of complete genomic information has led several groups to develop computer algorithms to predict the proteome of each part of the envelope by searching the genome for leader sequences, β-sheet motifs and stretches of α-helical hydrophobic amino acids. In addition, published experimental data has been mined directly and by machine learning approaches. In this review we examine the somewhat confusing available literature and relate published experimental data to the most recent gene annotation of E. coli to describe the predicted and experimental proteome of each compartment. The problem of characterizing integral versus membrane-associated proteins is discussed. The E. coli envelope proteome provides an excellent test bed for developing mass spectrometric techniques for identifying hydrophobic proteins that have generally been refractory to analysis. We describe the gel based and solution based proteome analysis approaches along with protein cleavage and proteolysis methods that investigators are taking to tackle this difficult problem.

18.
Genome reannotation aims for complete and accurate characterization of gene models and thus is of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to adopt these precious data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on integration of large-scale RNA-seq data and release a new annotation system IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, which is equipped with the improved annotations, bears great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.

20.
A hypothetical protein is predicted to be expressed from an open reading frame without known experimental evidence of translation. Such proteins constitute a substantial fraction of proteomes. Domain extraction from these hypothetical sequences helps to search for protein coding genes for protein structural and functional annotation. We describe the analysis of prediction data in a sequence dataset of hypothetical protein orthologs of Pongo abelii (orangutan) and Sus scrofa (pig). It should be noted that these orangutan-pig orthologs are also non-homologous to human proteins. These predicted data find application in the genome-wide annotation of proteins in poorly understood genomes.

Abbreviations

PDB - Protein Data Bank, DEG - Database of Essential Genes, CDD - Conserved Domain Database, IUCN - International Union for Conservation of Nature.
