Similar literature
20 similar articles found
1.
To address the void in the availability of high-quality proteomic data traversing the animal tree, we have implemented a pipeline for generating de novo assemblies based on publicly available data from the NCBI Sequence Read Archive, yielding a comprehensive collection of proteomes from 100 species spanning 21 animal phyla. We have also created the Animal Proteome Database (AniProtDB), a resource providing open access to this collection of high-quality metazoan proteomes, along with information on predicted proteins and protein domains for each taxonomic classification and the ability to perform sequence similarity searches against all proteomes generated using this pipeline. This solution vastly increases the utility of these data by removing the barrier to access for research groups who do not have the expertise or resources to generate these data themselves and enables the use of data from nontraditional research organisms that have the potential to address key questions in biomedicine.

2.
Here, we used data from complete genomes to comparatively study the metabolism of different species. We built phenetic trees based on the enzymatic functions present in different parts of metabolism. Seven broad metabolic classes, comprising a total of 69 metabolic pathways, were comparatively analyzed for 27 fully sequenced organisms of the domains Eukarya, Bacteria and Archaea. Phylogenetic profiles based on the presence/absence of enzymatic functions for each metabolic class were determined, and distance matrices for all the organisms were then derived from the profiles. Unrooted phenetic trees based upon the matrices revealed the distribution of the organisms according to their metabolic capabilities, reflecting the ecological pressures and adaptations that those species underwent during their evolution. We found that organisms that are closely related in phylogenetic terms could be distantly related metabolically and that the opposite is also true. For example, obligate bacterial pathogens were usually grouped together in our metabolic trees, demonstrating that obligate pathogens share common metabolic features regardless of their diverse phylogenetic origins. The branching order of proteobacteria often did not match their classical phylogenetic classification, and Gram-positive bacteria showed diverse metabolic affinities. Archaea were found to be metabolically as distant from free-living bacteria as from eukaryotes, and sometimes were placed close to the metabolically highly specialized group of obligate bacterial pathogens. Metabolic trees represent an integrative approach for comparing the evolution of metabolism and its correlation with the evolution of the genome, helping to find new relationships in the tree of life.
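
The step from presence/absence profiles to a distance matrix can be illustrated with a minimal sketch. The abstract does not say which distance measure was used, so the Jaccard distance below is an assumption, and the organism names and EC numbers are hypothetical toy data.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles: dict) -> dict:
    """Return {(org1, org2): distance} for every unordered pair of organisms."""
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical toy profiles: enzymatic functions (EC numbers) present
# in one metabolic class for each organism.
profiles = {
    "org_A": {"1.1.1.1", "2.7.1.1", "4.1.2.13"},
    "org_B": {"1.1.1.1", "2.7.1.1"},
    "org_C": {"5.3.1.9"},
}

matrix = distance_matrix(profiles)
```

A matrix of this kind could then be passed to any distance-based tree-building method; which one the authors used is not stated in the abstract.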

3.
Das R, Gerstein M. Proteins, 2004, 55(2): 455-463
We have introduced a method to identify functional shifts in protein families. Our method is based on the calculation of an active-site conservation ratio, which we call the "ASC ratio." For a structurally based alignment of a protein family, this ratio is the average sequence similarity of the active-site region compared to that of the full-length protein. The active-site region is defined as all the residues within a certain radius of the known functionally important groups. Using our method, we have analyzed enzymes of central metabolism from a large number of genomes (35). We found that for most of the enzymes, the active-site region is more highly conserved than the full-length sequence. However, for three tricarboxylic acid (TCA)-cycle enzymes, active-site sequences are considerably more diverged than full-length ones. In particular, we were able to identify in six pathogens a novel isocitrate dehydrogenase that has very low sequence similarity around the active site. Detailed sequence-structure analysis indicates that while the active-site structure of isocitrate dehydrogenase is most likely similar between pathogens and nonpathogens, the unusual sequence divergence could result from an extra domain added at the N-terminus. This domain has a leucine-rich motif similar to one in the Yersinia pestis cytotoxin and may therefore confer additional pathogenic functions.
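
A rough sketch of how a ratio like this might be computed is given below. It assumes that "similarity" means average pairwise percent identity and that gapped positions count as mismatches; the abstract specifies neither, and the sequences and column indices are hypothetical.

```python
from itertools import combinations


def mean_pairwise_identity(seqs, columns):
    """Average fraction of identical, non-gap residues over all sequence pairs."""
    columns = list(columns)
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for s, t in pairs:
        matches = sum(1 for c in columns if s[c] == t[c] and s[c] != "-")
        total += matches / len(columns)
    return total / len(pairs)


def asc_ratio(seqs, active_site_columns):
    """Identity in the active-site columns divided by full-length identity."""
    full = mean_pairwise_identity(seqs, range(len(seqs[0])))
    site = mean_pairwise_identity(seqs, active_site_columns)
    return site / full  # raises ZeroDivisionError if nothing is conserved


# Hypothetical aligned sequences (equal length); columns 0-2 stand in for
# residues within the chosen radius of the functional groups.
aligned = ["ACDEFG", "ACDEYY", "ACDKFG"]
ratio = asc_ratio(aligned, [0, 1, 2])
```

Under these assumptions, a ratio above 1 would correspond to an active site that is more conserved than the protein as a whole, and a ratio below 1 to the diverged case described for the three TCA-cycle enzymes.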

4.
Rai BK, Fiser A. Proteins, 2006, 63(3): 644-661
A major bottleneck in comparative protein structure modeling is the quality of the input alignment between the target sequence and the template structure. A number of alignment methods are available, but none of these techniques produces consistently good solutions in all cases. Alignments produced by alternative methods may be superior in certain segments but inferior in others; therefore, an accurate solution often requires an optimal combination of them. To address this problem, we have developed a new approach, the Multiple Mapping Method (MMM). The algorithm first identifies the alternatively aligned regions from a set of input alignments. These alternatively aligned segments are scored using a composite scoring function, which determines their fitness within the structural environment of the template. The best scoring regions from a set of alternative segments are combined with the core part of the alignments to produce the final MMM alignment. The algorithm was tested on a dataset of 1400 protein pairs using 11 combinations of two to four alignment methods. In all cases MMM showed statistically significant improvement, reducing alignment errors by 3 to 17%. MMM also compared favorably with two alignment meta-servers. The algorithm is computationally efficient; therefore, it is a suitable tool for genome-scale modeling studies.
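
The first step described above, finding the regions where the input alignments disagree, might look like the following sketch. It is only an illustration: the gapped-string representation and the function names are assumptions, and the composite scoring and recombination steps of MMM are not reproduced.

```python
def target_to_template(target_row, template_row):
    """Map each target residue index to its aligned template index (None if gapped)."""
    mapping = []
    j = 0  # index of the next template residue
    for ct, cs in zip(target_row, template_row):
        if ct != "-":
            mapping.append(j if cs != "-" else None)
        if cs != "-":
            j += 1
    return mapping


def variable_segments(alignments):
    """Return (start, end) target-residue ranges where the alignments disagree."""
    maps = [target_to_template(t, s) for t, s in alignments]
    n = len(maps[0])
    segments, start = [], None
    for i in range(n):
        differs = len({m[i] for m in maps}) > 1
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, n - 1))
    return segments


# Two hypothetical alignments of the same target to the same template,
# each given as (gapped target row, gapped template row).
alignments = [
    ("ACDEFG", "ACD-FG"),
    ("ACDEFG", "AC-DFG"),
]
segments = variable_segments(alignments)
```

Residues outside the returned ranges are aligned identically by every input method and would form the "core" that the abstract says is combined with the best-scoring alternative segments.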

5.
6.
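Over the course of long-term evolution, mammalian genomes have come to differ markedly in structure and function, and these differences form the basis of phenotypic evolution. With the completion of genome sequencing for humans and some other mammals, research on mammalian evolution that uses comparative genomics as its main approach has emerged, providing a basis for understanding mammalian evolutionary relationships at the genome level and for revealing the origin and evolution of life. This paper reviews the main research methods of comparative genomics, discusses their application to the study of mammalian evolution, and considers the prospects for mammalian comparative genomics.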

7.
The genus Mycobacterium comprises significant pathogenic species that infect both humans and animals. One species within this genus, Mycobacterium tuberculosis, is the leading cause of human death from bacterial infection. Five mycobacterial genomes belonging to four different species (M. tuberculosis, Mycobacterium bovis, Mycobacterium leprae and Mycobacterium avium ssp. paratuberculosis) have been sequenced to date, and another 14 mycobacterial genomes are at various stages of completion. A comparative analysis of the gene products of key metabolic pathways revealed that the major differences among these species are in the gene products constituting the cell wall and the gene families encoding the acidic glycine-rich (PE/PPE/PGRS) proteins. Mycobacterium leprae has evolved by retaining a minimal gene set for most of the gene families, whereas M. avium ssp. paratuberculosis has acquired some of the virulence factors by lateral gene transfer.

8.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.

9.
Liu J, Rost B. Proteins, 2004, 55(3): 678-688
We developed a method, CHOP, for dissecting proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, most domains spanned on average approximately 100 residues. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid (although not previously described) for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constitute an important resource for functional and structural genomics. Nevertheless, our main motivation for developing CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi- and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multimember clusters in 62 proteomes.

10.
Cai XH, Jaroszewski L, Wooley J, Godzik A. Proteins, 2011, 79(8): 2389-2402
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.

11.
Alignment of protein sequences by their profiles   Total citations: 7, self-citations: 0, citations by others: 7
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
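
The accuracy measure quoted above, the fraction of correctly aligned residues relative to a structure-based reference alignment, can be sketched as follows. The gapped-string representation and the function names are assumptions made for illustration; this is not the SALIGN or MODELLER interface.

```python
def aligned_pairs(row_a: str, row_b: str) -> set:
    """Return the set of (index_in_a, index_in_b) residue pairs aligned to each other."""
    pairs = set()
    i = j = 0  # residue counters for the two sequences
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((i, j))
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def fraction_correct(test, reference):
    """Share of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(aligned_pairs(*test) & ref) / len(ref)


# Hypothetical example: the same two sequences aligned two different ways.
reference = ("ACDE-", "A-DEF")   # stands in for the structure-based alignment
candidate = ("ACDE", "ADEF")     # ungapped alignment from some other method
accuracy = fraction_correct(candidate, reference)
```

In this toy case only the first residue pair agrees with the reference, so one of its three aligned pairs is recovered.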

12.
The advent of deep sequencing technology has unexpectedly advanced our structural understanding of molecules composed of nucleic acids. Significant progress has recently been made in extending chemical methods for probing RNA structure into sequencing-based methods. Herein we review some of the canonical methods to analyze RNA structure, and then we outline how these have been used to probe the structure of many RNAs in parallel. The key is the transformation of structural biology problems into sequencing problems, whereby sequencing output can be interpreted to understand nucleic acid proximity, nucleic acid conformation, or nucleic acid‐protein interactions. Utilizing such technologies in this way promises to provide novel structural insights into the mechanisms that control normal cellular physiology and into how structure could be perturbed in disease.

13.
14.
The rapid increase in protein sequence information from genome sequencing projects demands the use of bioinformatics tools to recognize interesting gene products and their associated functions. Often, multiple algorithms need to be employed to improve the accuracy of predictions, and several structure prediction algorithms are in the public domain. Here, we report the availability of an Integrated Web-server (IWS), an online bioinformatics package dedicated to in silico analysis of protein sequence and structure data. IWS provides a web interface to both in-house and widely accepted programs from major bioinformatics groups, organized as 10 different modules. IWS also provides interactive images for the Analysis Work Flow, which give the user the transparency to carry out analyses by moving across modules seamlessly and to perform predictions rapidly. Availability: IWS is available at http://caps.ncbs.res.in/iws.

15.
Facing the ever-growing list of newly discovered classes of functional RNAs, it can be expected that further types of functional RNAs are still hidden in recently completed genomes. The computational identification of such RNA genes is, therefore, of major importance. While most known functional RNAs have characteristic secondary structures, their free energies are generally not statistically significant enough to distinguish RNA genes from the genomic background. Additional information is required. Considering the wide availability of new genomic data of closely related species, comparative studies seem to be the most promising approach. Here, we show that prediction of consensus structures of aligned sequences can be a significant measure to detect functional RNAs. We report a new method to test multiple sequence alignments for the existence of an unusually structured and conserved fold. We show for alignments of six types of well-known functional RNA that an energy score consisting of free energy and a covariation term significantly improves sensitivity compared to single sequence predictions. We further test our method on a number of non-coding RNAs from Caenorhabditis elegans/Caenorhabditis briggsae and seven Saccharomyces species. Most RNAs can be detected with high significance. We provide a Perl implementation that can be used readily to score single alignments and discuss how the methods described here can be extended to allow for efficient genome-wide screens.

16.
In this study, phylogenetic trees based on two-component system (TCS) gene profiles and on metabolic network gene profiles were constructed and compared with each other to evaluate the evolutionary relationship between the bacterial sensing system and metabolism. The gene profiles of these systems suggested that bacteria employed different evolutionary strategies to optimize the two-component system and the metabolic network. In addition, comparative analysis revealed that the TCS-based tree showed better family grouping than the metabolic-network-based tree, which indicated that the TCS and the metabolic network have been modified via self-evolution and recruitment, respectively.

17.
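Advances in research on the molecular basis and evolutionary patterns of flower color variation   Total citations: 8, self-citations: 1, citations by others: 8
In recent years, the integration of ecology and evolutionary biology, now widespread internationally, has become a trend in the development of biology. Finding suitable biological systems for integrated research from phenotype to genotype is a necessary and exploratory task for deepening this integration. The mechanisms of flower color formation in angiosperms, and the structural and regulatory genes of the relevant metabolic pathways, are already fairly well understood in several model plants. This makes flower color a trait of first choice for research in ecology and evolutionary biology and lays a foundation for further understanding the mechanisms of flower color formation in wild species. This paper focuses on molecular genetic, biochemical, and ecological work on the anthocyanin metabolic pathway in the genus Ipomoea (Convolvulaceae), attempts to provide background knowledge on natural flower color variation from a multidisciplinary perspective, and points out unresolved biological questions and developments that may be expected in the future.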

18.
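Using white-fleshed cassava South China 6068 (华南6068), South China 9 (华南9号), purple-leaf yellow-fleshed cassava BGM019, and pink cassava Mirasol as materials, we examined changes in the expression levels of 14 genes and 4 proteins related to the carotenoid metabolic pathway during the expansion and maturation stages of cassava storage roots. Changes in the β-carotene content of the storage roots were measured by HPLC, and the expression levels of genes and enzymes related to the carotenoid metabolic pathway were analyzed by qRT-PCR and Western blot, respectively. With South China 6068 as the control, the results showed that in South China 9 and purple-leaf yellow-fleshed cassava BGM019 the key carotenoid biosynthesis genes PSY2 and LCYB were expressed at significantly higher levels at maturity than during expansion, whereas the key degradation-related genes CCD1 and NCED3 were expressed at significantly lower levels at maturity than during expansion (P < 0.05). In pink cassava Mirasol, the significant down-regulation of PSY2 and LCYB and the significant up-regulation of CCD1 and NCED3 at maturity (P < 0.05) are among the reasons for the difference in β-carotene content. Analyzing the expression levels of carotenoid metabolic pathway genes in the storage roots of different cassava cultivars (lines) during the expansion and maturation stages helps to elucidate the molecular mechanism of β-carotene accumulation. In addition, Western blot results showed that ascorbate peroxidase, glutathione reductase, superoxide dismutase, and HSP70, although not directly linked to the carotenoid metabolic pathway in storage roots, differed significantly in expression level between the expansion and maturation stages (P < 0.05).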

19.
Many plant genomes display high levels of repetitive sequences. The assembly of these complex genomes using short high‐throughput sequence reads is still a challenging task. Underestimation or disregard of repeat complexity in these datasets can easily misguide downstream analysis. Detection of repetitive regions by k‐mer counting methods has proved to be reliable. Easy‐to‐use applications utilizing k‐mer counting are in high demand, especially in the domain of plants. We present Kmasker plants, a tool, provided as both a command‐line and a web‐based solution, that uses k‐mer count information to assist throughout the analytical workflow for genome data. Besides its core competence of screening and masking repetitive sequences, we have integrated features that enable comparative studies between different cultivars or closely related species, as well as methods that estimate the target specificity of guide RNAs for site‐directed mutagenesis using Cas9 endonuclease. In addition, we have set up a web service for Kmasker plants that maintains pre‐computed indices for 10 of the economically most important cultivated plants. Source code for Kmasker plants has been made publicly available at https://github.com/tschmutzer/kmasker. The web service is accessible at https://kmasker.ipk-gatersleben.de.
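
The general idea of masking repeats by k-mer counts can be shown with a small sketch. This is not Kmasker's implementation: the function names, the threshold, and the choice to count k-mers within the sequence itself (rather than in an index built from sequencing reads) are simplifying assumptions made here.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def mask_repeats(seq: str, k: int, threshold: int) -> str:
    """Replace with 'N' every base covered by a k-mer seen at least `threshold` times."""
    counts = count_kmers(seq, k)
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= threshold:
            for j in range(i, i + k):
                masked[j] = "N"
    return "".join(masked)


# Hypothetical toy sequence in which the 4-mer ACGT occurs twice.
sequence = "ACGTACGTTTGCA"
masked_sequence = mask_repeats(sequence, k=4, threshold=2)
```

For real genomes the counts would normally come from a dedicated k-mer counter run over the read data, and the threshold would be chosen relative to sequencing depth; both are outside the scope of this sketch.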

20.
The completely sequenced archaeal genomes potentially encode, among their many functionally uncharacterized genes, novel enzymes of biotechnological interest. We have developed a prediction method for detection and classification of enzymes from sequence alone (available at http://www.cbs.dtu.dk/services/ArchaeaFun/). The method does not make use of sequence similarity; rather, it relies on predicted protein features like cotranslational and posttranslational modifications, secondary structure, and simple physical/chemical properties.
