While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation, the process of identifying genes and assigning function to each gene in a genome sequence, provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.
Vitis vinifera has been an emblematic plant for humans since the Neolithic period. Human civilization has been shaped by its domestication, as both its medicinal and nutritional values were exploited. It is now cultivated on all habitable continents, and more than 5000 varieties have been developed. A global passion for the art of wine fuels innovation and a profound desire for knowledge of this plant. The genome sequence of a homozygotic cultivar and several RNA‐seq datasets on other varieties have been released, paving the way to further insight into its biology and to tailored improvements of varieties. However, its genome annotation remains unpolished. In this issue of Proteomics, Chapman and Bellgard (Proteomics 2017, 17, 1700197) discuss how proteogenomics can help improve genome annotation. By mining shotgun proteomics data, they defined new protein‐coding genes, refined gene structures, and corrected numerous mRNA splicing events. This stimulating study shows how large international consortia could work together to improve plant and animal genome annotation on a large scale. To achieve this aim, time should be invested in generating comprehensive, high‐quality experimental datasets for a wide range of well‐defined lineages and in exploiting them with pipelines capable of handling very large datasets.
In eukaryotes, mechanisms such as alternative splicing (AS) and alternative translation initiation (ATI) contribute to organismal protein diversity. Specifically, splicing factors play crucial roles in responses to environmental and developmental cues; however, the underlying mechanisms are not well investigated in plants. Here, we report the parallel use of short‐read RNA sequencing, single‐molecule long‐read sequencing, and proteomic identification to unravel AS isoforms and previously unannotated proteins in response to abscisic acid (ABA) treatment. Combining the data from the two sequencing methods, approximately 83.4% of intron‐containing genes were found to be alternatively spliced. Two AS types, alternative first exon (AFE) and alternative last exon (ALE), were more abundant than intron retention (IR); however, in contrast to AS events detected under normal conditions, differentially expressed AS isoforms were more likely to be translated. ABA extensively affects the AS pattern, as indicated by the increasing number of non‐conventional splicing sites. This work also identified thousands of unannotated peptides and proteins produced by ATI, based on mass spectrometry and a virtual peptide library deduced from both strands of coding regions within the Arabidopsis genome. The results enhance our understanding of AS and alternative translation mechanisms under normal conditions and in response to ABA treatment.
Introduction: Lung cancer and related diseases are among the most common causes of death worldwide. Genomics-based biomarkers may not adequately reflect the underlying dynamic molecular mechanisms of functional protein interactions, which are central to a disease. Recent developments in mass spectrometry (MS) have made it possible to analyze disease-relevant proteins expressed in clinical specimens using proteomic approaches.
Areas covered: To understand the molecular mechanisms of lung cancer and its subtypes, chronic obstructive pulmonary disease (COPD), asthma, and other lung diseases, considerable effort has gone into identifying relevant proteins by MS-based clinical proteomic approaches. Since lung cancer is a multifactorial disease that is biologically associated with asthma and COPD among various lung diseases, this study focused on proteomic studies of biomarker discovery using various clinical specimens for lung cancer, COPD, and asthma.
Expert commentary: MS-based exploratory proteomics utilizing clinical specimens, which can incorporate both experimental and bioinformatic analysis of protein-protein interactions and can also adopt proteogenomic approaches, makes it possible to reveal molecular networks that are relevant to a disease subgroup and that could differentiate between drug responders and non-responders, good and poor prognoses, drug resistance, and so on.