Similar literature
 20 similar articles found (search time: 31 ms)
1.
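Advances in prokaryotic proteogenomics   (total citations: 1; self-citations: 0; citations by others: 1)
With the continuing development of genome sequencing technology, large numbers of microbial genome sequences can now be determined accurately in a short time. To further explore genome structure and function, genome annotation algorithms based on sequence features and homology are widely applied to newly sequenced species. However, because of limited sequencing quality and the relatively low accuracy of the algorithms themselves, existing genome annotations contain a considerable proportion of pseudogenes and annotation errors, especially errors in annotated protein N-termini. To compensate for these shortcomings, transcriptome profiling based on microarrays or RNA-seq and proteome profiling based on tandem mass spectrometry can measure the transcription and translation products of genes accurately and at high throughput, allowing predicted gene structures to be validated experimentally. However, the abundant non-coding RNA in prokaryotic cells introduces contaminating data into transcriptome sequencing, which limits its use for genome annotation. By contrast, tandem mass spectrometry-based proteomics can identify a large number of an organism's proteins in a short time, enabling the validation and even correction of annotated genes. It has become an important source of evidence for genome annotation and re-annotation and has given rise to the new research field of "proteogenomics". This review first introduces traditional genome annotation algorithms based on sequence prediction and homology comparison and points out their shortcomings. It then draws on the technical characteristics of transcriptomics and proteomics to analyze the advantages of proteomics for prokaryotic genome annotation and summarizes progress in current large-scale proteogenomic studies. Finally, from an informatics perspective, it identifies problems in using proteomic data for genome re-annotation together with corresponding solutions, and discusses future directions for proteogenomics.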

2.
While genome sequencing efforts reveal the basic building blocks of life, a genome sequence alone is insufficient for elucidating biological function. Genome annotation, the process of identifying genes and assigning function to each gene in a genome sequence, provides the means to elucidate biological function from sequence. Current state-of-the-art high-throughput genome annotation uses a combination of comparative (sequence similarity data) and non-comparative (ab initio gene prediction algorithms) methods to identify protein-coding genes in genome sequences. Because approaches used to validate the presence of predicted protein-coding genes are typically based on expressed RNA sequences, they cannot independently and unequivocally determine whether a predicted protein-coding gene is translated into a protein. With the ability to directly measure peptides arising from expressed proteins, high-throughput liquid chromatography-tandem mass spectrometry-based proteomics approaches can be used to verify coding regions of a genomic sequence. Here, we highlight several ways in which high-throughput tandem mass spectrometry-based proteomics can improve the quality of genome annotations and suggest that it could be efficiently applied during the gene calling process so that the improvements are propagated through the subsequent functional annotation process.

3.
Enzyme function less conserved than anticipated   (total citations: 13; self-citations: 0; citations by others: 13)
The level of sequence similarity that implies similarity in protein structure is well established. Recently, many groups proposed thresholds for similarity in sequence implying similarity in enzymatic function. All previous results suggest the strong conservation of enzymatic function above levels of 50% pairwise sequence identity. Here, I argue that all groups substantially overestimated the conservation of enzyme function because their data sets were either too biased, or too small. An unbiased analysis suggested that less than 30% of the pair fragments above 50% sequence identity have entirely identical EC numbers. Another surprising finding was that even BLAST E-values below 10^(-50) did not suffice to automatically transfer enzyme function without errors. As expected, most misclassifications originated from similarities in relatively short regions and/or from transferring annotations for different domains. Both problems cannot be corrected easily by adjusting the thresholds for automatic transfer of genome annotations. A score relating sequence identity to alignment length (distance from HSSP-threshold) outperformed statistical BLAST scores for high sequence similarity. In particular, the distance score allowed error-free transfer of enzyme function for the 10% most similar enzyme pairs. The results illustrated how difficult it is to assess the conservation of protein function and to guarantee error-free genome annotations, in general: sets with millions of pair comparisons might not suffice to arrive at statistically significant conclusions. In practice, the revised detailed estimates for the sequence conservation of enzyme function may provide important benchmarks for everyday sequence analysis and for more cautious automatic genome annotations.

4.
5.
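Insect genomes and their sizes   (total citations: 5; self-citations: 0; citations by others: 5)
Xue Jian (薛建), Cheng Jia-An (程家安), Zhang Chuan-Xi (张传溪). Acta Entomologica Sinica (《昆虫学报》), 2009, 52(8): 901-906
Differences in insect genome size arise from quantitative differences produced by the amplification, deletion, and divergence of various repetitive sequences in the genome. These differences cause genome size to vary among insect groups, among species, and among populations of the same species. At present, 59 insect species have been included in genome sequencing projects, and complete genome sequences have been reported for six of them (the fruit fly Drosophila melanogaster, the African malaria mosquito Anopheles gambiae, the silkworm Bombyx mori, the western honey bee Apis mellifera, the yellow fever mosquito Aedes aegypti, and the red flour beetle Tribolium castaneum). Genome sizes have been estimated for 725 insect species and range from 0.09 to 16.93 pg (88 to 16,558 Mb). This paper also describes methods for estimating insect genome size and discusses variation in insect genome size and its significance.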
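The picogram and megabase figures quoted in this abstract appear consistent with the commonly used conversion of roughly 978 Mb per picogram of DNA. The sketch below assumes that factor (it is not stated in the abstract itself); the names `MB_PER_PG`, `pg_to_mb`, and `mb_to_pg` are illustrative only.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to about 978 Mb.
# This factor is a widely used convention, not a value taken from the abstract.
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a genome size (C-value) from picograms to megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a genome size from megabases back to picograms."""
    return megabases / MB_PER_PG


if __name__ == "__main__":
    # The two endpoints of the range reported in the abstract.
    for pg in (0.09, 16.93):
        print(f"{pg} pg is about {pg_to_mb(pg):.0f} Mb")
```

Under this assumed factor, the two endpoints round to 88 Mb and 16,558 Mb, matching the range given in the abstract.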

6.
Closely related species of Drosophila tend to have similar genome sizes. The strong imbalance in favor of small deletions relative to insertions implies that the unconstrained DNA in Drosophila is unlikely to be passively inherited from even closely related ancestors, and yet most DNA in Drosophila genomes is intergenic and potentially unconstrained. In an attempt to investigate the maintenance of this intergenic DNA, we studied the evolution of an intergenic locus on the fourth chromosome of the Drosophila melanogaster genome. This 1.2-kb locus is marked by two distinct, large insertion events: a nuclear transposition of a mitochondrial sequence and a transposition of a nonautonomous DNA transposon DNAREP1_DM. Because we could trace the evolutionary histories of these sequences, we were able to reconstruct the length evolution of this region in some detail. We sequenced this locus in all four species of the D. melanogaster species complex: D. melanogaster, D. simulans, D. sechellia, and D. mauritiana. Although this locus is similar in size in these four species, less than 10% of the sequence from the most recent common ancestor remains in D. melanogaster and all of its sister species. This region appears to have increased in size through several distinct insertions in the ancestor of the D. melanogaster species complex and has been shrinking since the split of these lineages. In addition, we found no evidence suggesting that the size of this locus has been maintained over evolutionary time; these results are consistent with the model of a dynamic equilibrium between persistent DNA loss through small deletions and more sporadic DNA gain through less frequent but longer insertions. The apparent stability of genome size in Drosophila may belie very rapid sequence turnover at intergenic loci.

7.
8.
A new potential energy function representing the conformational preferences of sequentially local regions of a protein backbone is presented. This potential is derived from secondary structure probabilities such as those produced by neural network-based prediction methods. The potential is applied to the problem of remote homolog identification, in combination with a distance-dependent inter-residue potential and position-based scoring matrices. This fold recognition jury is implemented in a Java application called JThread. These methods are benchmarked on several test sets, including one released entirely after development and parameterization of JThread. In benchmark tests to identify known folds structurally similar to (but not identical with) the native structure of a sequence, JThread performs significantly better than PSI-BLAST, with 10% more structures identified correctly as the most likely structural match in a fold library, and 20% more structures correctly narrowed down to a set of five possible candidates. JThread also improves the average sequence alignment accuracy significantly, from 53% to 62% of residues aligned correctly. Reliable fold assignments and alignments are identified, making the method useful for genome annotation. JThread is applied to predicted open reading frames (ORFs) from the genomes of Mycoplasma genitalium and Drosophila melanogaster, identifying 20 new structural annotations in the former and 801 in the latter.

9.
Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. To further understand the ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae.
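The contig and scaffold N50 values reported for this assembly are summary statistics over sequence lengths. As a minimal sketch of the usual definition (the length of the sequence at which the cumulative length, counted from the longest sequence downward, first reaches half of the total), the function below uses made-up lengths; `n50` is an illustrative name and the input values are not from the ramie assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be a non-empty collection")


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units), total = 300.
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_contigs))
```

With the toy lengths above, the two longest contigs (80 and 70) already sum to half of the total, so the N50 is 70.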

10.
FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.

11.
12.
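Comparative genomics of the core apoptotic machinery in Drosophila   (total citations: 1; self-citations: 0; citations by others: 1)
Comparative genomics is a major route to inferring regulatory networks from genome sequences. The apoptosis signaling network is a typical example of a regulatory network. EGL1, CED3, CED4, and CED9 in nematodes, together with their homologous proteins in mammals, constitute a conserved core apoptotic machinery. The core apoptotic machinery of Drosophila is currently incomplete: proteins resembling EGL1 and CED9 have not yet been found. Through a series of bioinformatics-based comparative genomic analyses, genes encoding two BCL2/CED9 homologs and one EGL1 homolog were found in the Drosophila genome database, and the Drosophila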

13.
Systematically annotating the function of enzymes that belong to large protein families encoded in a single eukaryotic genome is a very challenging task. We carried out such an exercise to annotate function for the serine-protease family of the trypsin fold in Drosophila melanogaster, with an emphasis on annotating serine-protease homologues (SPHs) that may have lost their catalytic function. Our approach involves data mining and data integration to provide function annotations for 190 Drosophila gene products containing serine-protease-like domains, of which 35 are SPHs. This was accomplished by analysis of structure-function relationships, gene-expression profiles, large-scale protein-protein interaction data, literature mining and bioinformatic tools. We introduce functional residue clustering (FRC), a method that performs hierarchical clustering of sequences using properties of functionally important residues and utilizes the correlation coefficient as a quantitative similarity measure to transfer in vivo substrate specificities to proteases. We show that the efficiency of transfer of substrate-specificity information using this method is generally high. FRC was also applied to Drosophila proteases to assign putative competitive inhibitor relationships (CIRs). Microarray gene-expression data were utilized to uncover a large-scale and dual involvement of proteases in development and in immune response. We found specific recruitment of SPHs and proteases with CLIP domains in immune response, suggesting evolution of a new function for SPHs. We also suggest the existence of separate downstream protease cascades for immune response against bacterial/fungal infections and parasite/parasitoid infections. We verify the quality of our annotations using information from RNAi screens and other evidence types. Utilization of such multi-fold approaches results in a 10-fold increase of function annotation for Drosophila serine proteases and demonstrates value in increasing annotations in multiple genomes.

14.
Tree House Explorer (THEx) is a genome browser that integrates phylogenomic data and genomic annotations into a single interactive platform for combined analysis. THEx allows users to visualize genome-wide variation in evolutionary histories and genetic divergence on a chromosome-by-chromosome basis, with continuous sliding window comparisons to gene annotations, recombination rates, and other user-specified, highly customizable feature annotations. THEx provides a new platform for interactive phylogenomic data visualization to analyze and interpret the diverse evolutionary histories woven throughout genomes. Hosted on Conda, THEx integrates seamlessly into new or pre-existing workflows.

15.
Genome size differences are usually attributed to the amplification and deletion of various repeated DNA sequences, including transposable elements (TEs). Because environmental changes may promote modifications in the amount of these repeated sequences, it has been postulated that when a species colonizes new environments this could be followed by an increase in its genome size. We tested this hypothesis by estimating the genome size of geographically distinct populations of Drosophila ananassae, Drosophila malerkotliana, Drosophila melanogaster, Drosophila simulans, Drosophila subobscura, and Zaprionus indianus, all of which have known colonization capacities. There were no strong statistical differences between continents for most species. However, we found that populations of D. melanogaster from east Africa have smaller genomes than more recent populations. For species in which colonization is a recent event, the differences between genome sizes thus do not seem to be related to colonization history. These findings suggest either that genome size is seldom modified in a significant way during colonization or that it takes time for genome size of invading species to change significantly.

16.
The rat genome project and the resources that it has generated are transforming the translation of rat biology to human medicine. The rat genome was sequenced to a high quality "draft," the structure and location of the genes were predicted, and a global assessment was published (Gibbs RA et al., Nature 428: 493-521, 2004). Since that time, researchers have made use of the genome sequence and annotations and related resources. We take this opportunity to review the currently available rat genome resources and to discuss the progress and future plans for the rat genome.

17.
How incorrect annotations evolve: the case of short ORFs   (total citations: 4; self-citations: 0; citations by others: 4)
The draft of the human genome sequence is still incomplete. The outstanding tasks include filling in some gaps, finalizing the assembly of short sequences, improving sequence accuracy and correctly identifying coding regions. However, a closely related problem that receives little attention is the substantial number of incorrect annotations that have penetrated some of the widely used databases. This article illustrates this problem using the example of ubiquitin genes, and draws some conclusions that apply to false annotations in other short open reading frames (ORFs). Although the focus is on the human genome, other genomes are equally prone to similar propagation of false annotations.

18.
The genome sequence of silkworm, Bombyx mori.   (total citations: 21; self-citations: 0; citations by others: 21)
We performed threefold shotgun sequencing of the silkworm (Bombyx mori) genome to obtain a draft sequence and establish a basic resource for comprehensive genome analysis. By using the newly developed RAMEN assembler, the sequence data derived from whole-genome shotgun (WGS) sequencing were assembled into 49,345 scaffolds that span a total length of 514 Mb including gaps and 387 Mb without gaps. Because the genome size of the silkworm is estimated to be 530 Mb, almost 97% of the genome has been organized in scaffolds, of which 75% has been sequenced. By carrying out a BLAST search for 50 characteristic Bombyx genes and 11,202 non-redundant expressed sequence tags (ESTs) in a Bombyx EST database against the WGS sequence data, we evaluated the validity of the sequence for elucidating the majority of silkworm genes. Analysis of the WGS data revealed that the silkworm genome contains many repetitive sequences with an average length of <500 bp. These repetitive sequences appear to have been derived from truncated transposons, which are interspersed at 2.5- to 3-kb intervals throughout the genome. This pattern suggests that silkworm may have an active mechanism that promotes removal of transposons from the genome. We also found evidence for insertions of mitochondrial DNA fragments at 9 sites. A search for Bombyx orthologs to Drosophila genes controlling sex determination in the WGS data revealed 11 Bombyx genes and suggested that the sex-determining systems differ profoundly between the two species.
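The two percentages in this abstract follow from the three sizes it reports. The short sketch below reproduces that arithmetic from the stated figures (514 Mb with gaps, 387 Mb without gaps, 530 Mb estimated genome size); the function and variable names are illustrative, not taken from the paper.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Figures as stated in the abstract, in megabases.
SCAFFOLD_SPAN_WITH_GAPS_MB = 514
SCAFFOLD_SPAN_WITHOUT_GAPS_MB = 387
ESTIMATED_GENOME_SIZE_MB = 530

# Share of the estimated genome that is organized in scaffolds.
in_scaffolds = percent(SCAFFOLD_SPAN_WITH_GAPS_MB, ESTIMATED_GENOME_SIZE_MB)

# Share of the scaffolded span that is actual sequence rather than gaps.
sequenced = percent(SCAFFOLD_SPAN_WITHOUT_GAPS_MB, SCAFFOLD_SPAN_WITH_GAPS_MB)

if __name__ == "__main__":
    print(f"in scaffolds: {in_scaffolds:.0f}%")
    print(f"sequenced within scaffolds: {sequenced:.0f}%")
```

Rounded to whole percentages, these ratios give the 97% and 75% quoted in the abstract.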

19.
20.
With recent advances in genotyping and sequencing technologies, many disease susceptibility loci have been identified. However, much of the genetic heritability remains unexplained and the replication rate between independent studies is still low. Meanwhile, there have been increasing efforts on functional annotations of the entire human genome, such as the Encyclopedia of DNA Elements (ENCODE) project and other similar projects. It has been shown that incorporating these functional annotations to prioritize genome wide association signals may help identify true association signals. However, to our knowledge, the extent of the improvement when functional annotation data are considered has not been studied in the literature. In this article, we propose a statistical framework to estimate the improvement in replication rate with annotation data, and apply it to Crohn's disease and DNase I hypersensitive sites. The results show that with cell line specific functional annotations, the expected replication rate is improved, but only at modest level.
