Similar articles
 20 similar articles found; search time 46 ms
1.
This paper proposes a new method to identify communities in generally weighted complex networks and applies it to phylogenetic analysis. In this case, weights correspond to the similarity indexes among protein sequences, which can be used for network construction so that the network structure can be analyzed to recover phylogenetically useful information from its properties. The analyses discussed here are mainly based on the modular character of protein similarity networks, explored through the Newman-Girvan algorithm, with the help of the neighborhood matrix. The most relevant networks are found when the network topology changes abruptly, revealing distinct modules related to the sets of organisms to which the proteins belong. Sound biological information can be retrieved by the computational routines used in the network approach, without using biological assumptions other than those incorporated by BLAST. Usually, all the main bacterial phyla and, in some cases, also some bacterial classes corresponded totally (100%) or to a great extent (>70%) to the modules. We checked for internal consistency in the obtained results, and we scored close to 84% of matches for community pertinence when comparisons between the results were performed. To illustrate how to use the network-based method, we employed data for enzymes involved in the chitin metabolic pathway that are present in more than 100 organisms from an original data set containing 1,695 organisms, downloaded from GenBank on May 19, 2007. A preliminary comparison between the outcomes of the network-based method and the results of methods based on Bayesian, distance, likelihood, and parsimony criteria suggests that the former is as reliable as these commonly used methods. We conclude that the network-based method can be used as a powerful tool for retrieving modularity information from weighted networks, which is useful for phylogenetic analysis.
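
The idea of watching the network topology change as the similarity threshold varies can be illustrated with a minimal sketch. This is not the authors' method: it replaces the Newman-Girvan algorithm with plain connected components, and the function name `components_at_threshold`, the protein labels, and the similarity scores are all hypothetical.

```python
def components_at_threshold(similarity, threshold):
    """Return the connected components of the graph that keeps only
    edges whose similarity is >= threshold (union-find, stdlib only)."""
    nodes = sorted({n for pair in similarity for n in pair})
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), weight in similarity.items():
        if weight >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical pairwise similarity scores (percent) between five proteins.
toy_similarity = {
    ("p1", "p2"): 90, ("p1", "p3"): 85, ("p2", "p3"): 88,
    ("p4", "p5"): 92, ("p3", "p4"): 40,
}

# Sweep the threshold and watch the number of modules change.
for t in (30, 50, 95):
    print(t, components_at_threshold(toy_similarity, t))
```

In this toy data the single weak edge between `p3` and `p4` is what holds the network together, so raising the threshold past it splits the graph into two modules; a real analysis would use a proper community-detection step at the chosen threshold.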

2.
Mitochondria originated endosymbiotically from an Alphaproteobacteria-like ancestor. However, it is still uncertain which extant group of Alphaproteobacteria is phylogenetically closer to the mitochondrial ancestor. The proposed groups comprise the order Rickettsiales, the family Rhodospirillaceae, and the genus Rickettsia. In this study, we apply a new complex network approach to investigate the evolutionary origins of mitochondria, analyzing protein sequence modules in a critical network obtained through a critical similarity threshold between the studied sequences. The dataset included three ATP synthase subunits (4, 6, and 9) and their alphaproteobacterial homologs (b, a, and c). In all the subunits, the results gave no support to the hypothesis that Rickettsiales are closely related to the mitochondrial ancestor. Our findings support the hypothesis that mitochondria share a common ancestor with a clade containing all Alphaproteobacteria orders, except Rickettsiales.

3.
Among the multitude of methods available for the study of origin and evolution of various life forms on Earth, the phylogenetic approach, i.e. the delineation of natural genetic relatedness amongst different groups of organisms, has been of particular interest to evolutionary biologists. One approach to analysing phylogeny is the comparison of genome sequences of extant organisms by a variety of computational techniques. These studies rely mostly on the similarity or dissimilarity in global character of the genome in terms of sequence, without any consideration of its structure. In this work, we report a potentially new methodology for the elucidation of molecular phylogeny. The approach considers a structural parameter of the genome, namely its flexibility, and uses it to compare the small subunit ribosomal ribonucleic acid (SSU rRNA) gene from a cross-section of species. We find that the flexibility pattern of the genome is strikingly more similar among organisms that are close in evolutionary distance than among those that are more widely separated. This method of comparison thus might be utilised in constructing phylogenetic trees from flexibility patterns derived from nucleotide sequence.

4.
A total of 48 full-length protein sequences of pectin lyases from different source organisms available in NCBI were subjected to multiple sequence alignment, domain analysis, and phylogenetic tree construction. A phylogenetic tree constructed on the basis of the protein sequences revealed two distinct clusters representing pectin lyases from bacterial and fungal sources. Similarly, the multiple accessions of different source organisms representing bacterial and fungal pectin lyases also formed distinct clusters, showing sequence level homology. The sequence level similarities among different groups of pectinase enzymes, viz. pectin lyase, pectate lyase, polygalacturonase, and pectin esterase, were also analyzed by subjecting a single protein sequence from each group with common source organism to tree construction. Four distinct clusters representing different groups of pectinases with common source organisms were observed, indicating the existing sequence level similarity among them. Multiple sequence alignment of pectin lyase protein sequences of different source organisms along with pectinases with common source organisms revealed a conserved region, indicating homology at sequence level. A conserved domain, Pec_Lyase_C, was frequently observed in the protein sequences of pectin lyases and pectate lyases, while Glyco_hydro_28 domains and Pectate lyase-like β-helix clan domains were frequently observed in polygalacturonases and pectin esterases, respectively. The signature amino acid sequence of 41 amino acids, i.e. TYDNAGVLPITVN-SNKSLIGEGSKGVIKGKGLRIVSGAKNI, related to Pec_Lyase_C is frequently observed in pectin lyase protein sequences and might be related to the structure and enzymatic function.

5.
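Bioinformatic analysis of EST resources of oil crops   Total citations: 1 (self-citations: 0; cited by others: 1)
Using bioinformatics methods, we collected and curated the expressed sequence tag (EST) sequences deposited in GenBank as of May 2008 for eight oil crops: rapeseed, peanut, sesame, soybean, sunflower, castor bean, flax, and oil palm. A total of 1,185,911 EST sequences were obtained. Comprehensive and category-wise analyses were carried out on the Linux operating system using software including Crosmatch, RepeatMasker, Phrap, CAP3, EMBOSS, Blast, EST-pipeline, ORF finder, Interproscan, blast2go, and IdentiCS. A total of 289,892 UniEST sequences were obtained. From the gene annotations of these EST sequences, genes related to lipid metabolism were screened, and on this basis a comparative map of lipid metabolic pathways in oil crops was constructed. This study lays the foundation for building a database of genes related to lipid metabolism in oil crops and for comparing the similarities and differences in lipid metabolism among different oil crops.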

6.
BACKGROUND: Hundreds of genes lacking homology to any protein of known function are sequenced every day. Genome-context methods have proved useful in providing clues about functional annotations for many proteins. However, genome-context methods detect many biological types of functional associations, and do not identify which type of functional association they have found. RESULTS: We have developed two new genome-context-based algorithms. Algorithm 1 extends our previous algorithm for identifying missing enzymes in predicted metabolic pathways (pathway holes) to use genome-context features. The new algorithm has significantly improved scope because it can now be applied to pathway reactions to which sequence similarity methods cannot be applied due to an absence of known sequences for enzymes catalyzing the reaction in other organisms. The new method identifies at least one known enzyme in the top ten hits for 58% of EcoCyc reactions that lack enzyme sequences in other organisms. Surprisingly, the addition of genome-context features does not improve the accuracy of the algorithm when sequences for the enzyme do exist in other organisms. Algorithm 2 uses genome-context methods to predict three distinct types of functional relationships between pairs of proteins: pairs that occur in the same protein complex, the same pathway, or the same operon. This algorithm performs with varying degrees of accuracy on each type of relationship, and performs best in predicting pathway and protein complex relationships.

7.
Nuclear DNA sequence data for diploid organisms are potentially a rich source of phylogenetic information for disentangling the evolutionary relationships of closely related organisms, but present special phylogenetic problems owing to difficulties arising from heterozygosity and recombination. We analyzed allelic relationships for two nuclear gene regions (phosphoenolpyruvate carboxykinase and elongation factor-1a), along with a mitochondrial gene region (NADH dehydrogenase subunit 5), for an assemblage of closely related species of carabid beetles (Carabus subgenus Ohomopterus). We used a network approach to examine whether the nuclear gene sequences provide substantial phylogenetic information on species relationships and evolutionary history. The mitochondrial gene genealogy strongly contradicted the morphological species boundary as a result of introgression of heterospecific mitochondria. Two nuclear gene regions showed high allelic diversity within species, and this diversity was partially attributable to recombination between various alleles and high variability in the intron region. Shared nuclear alleles among species were rare and were considered to represent shared ancestral polymorphism. Despite the presence of recombination, nuclear allelic networks recovered species monophyly more often and presented genetic differentiation patterns (low to high) among species more clearly. Overall, nuclear gene networks provide clear evidence for separate biological species and information on the phylogenetic relationships among closely related carabid beetles.

8.

Background  

Inferring the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees.

9.
In order to simplify and meaningfully categorize large sets of protein sequence data, it is commonplace to cluster proteins based on the similarity of those sequences. However, it quickly becomes clear that the sequence flexibility allowed a given protein varies significantly among different protein families. The degree to which sequences are conserved not only differs for each protein family, but also is affected by the phylogenetic divergence of the source organisms. Clustering techniques that use similarity thresholds for protein families do not always allow for these variations and thus cannot be confidently used for applications such as automated annotation and phylogenetic profiling. In this work, we applied a spectral bipartitioning technique to all proteins from 53 archaeal genomes. Comparisons between different taxonomic levels allowed us to study the effects of phylogenetic distances on cluster structure. Likewise, by associating functional annotations and phenotypic metadata with each protein, we could compare our protein similarity clusters with both protein function and associated phenotype. Our clusters can be analyzed graphically and interactively online.

10.
Insect chitin synthase cDNA sequence, gene organization and expression.   Total citations: 1 (self-citations: 0; cited by others: 1)
Chitin is a major component of the cuticle of arthropods. However, the synthesis of chitin is poorly understood. Feeding larvae of the insect Lucilia cuprina on the fungal chitin synthase competitive inhibitor, nikkomycin Z, resulted in strong concentration-dependent mortality of the larvae (LD50 = 280 nM). This result demonstrates that chitin is an essential component of this insect. The complete cDNA and deduced amino-acid sequences of the first arthropod chitin synthase-like protein, LcCS-1, from the larvae of the insect L. cuprina have been determined. The cDNA sequence is 5757 bp in length and codes for a large complex protein containing 1592 amino acids (Mr = 180 717). Analysis of the whole protein sequence reveals low, but significant, similarity to yeast chitin synthases with stronger areas of conservation centred on local regions implicated in the active sites of the yeast enzymes. Strikingly, LcCS-1 contains 15-18 potential transmembrane segments, indicating that the protein is an integral membrane protein. Two alternative topographical models of LcCS-1 are described, which involve its association with either the plasma membrane or the membrane of intracellular vesicles. LcCS-1 mRNA is produced in all life stages of the insect with expression in the larval stage limited to the integument and trachea. In a third instar larva the mRNA was localized to a single layer of epidermal cells immediately underlying the procuticle region of the integument. cDNA or genomic sequences that are highly related to fragments of LcCS-1 were demonstrated in three insect orders, one arachnid and Caenorhabditis elegans, thereby attesting to the importance of this enzyme in these chitin-producing organisms. Bioinformatics has been used to deduce the gene sequence and organization of the highly homologous Drosophila melanogaster orthologue of LcCS-1, DmCS-1.

11.
Chen L  Vitkup D 《Genome biology》2006,7(2):R17-13
Homology-based methods fail to assign genes to many metabolic activities present in sequenced organisms. To suggest genes for these orphan activities we developed a novel method that efficiently combines local structure of a metabolic network with phylogenetic profiles. We validated our method using known metabolic genes in Saccharomyces cerevisiae and Escherichia coli. We show that our method should be easily transferable to other organisms, and that it is robust to errors in incomplete metabolic networks.

12.
Genome-scale metabolic models are central in connecting genotypes to metabolic phenotypes. However, even for well studied organisms, such as Escherichia coli, draft networks do not contain a complete biochemical network. Missing reactions are referred to as gaps. These gaps need to be filled to enable functional analysis, and gap-filling choices influence model predictions. To investigate whether functional networks existed where all gap-filling reactions were supported by sequence similarity to annotated enzymes, four draft networks were supplemented with all reactions from the Model SEED database for which minimal sequence similarity was found in their genomes. Quadratic programming revealed that the number of reactions that could partake in a gap-filling solution was vast: 3,270 in the case of E. coli, where 72% of the metabolites in the draft network could connect a gap-filling solution. Nonetheless, no network could be completed without the inclusion of orphaned enzymes, suggesting that parts of the biochemistry integral to biomass precursor formation are uncharacterized. However, many gap-filling reactions were well determined, and the resulting networks showed improved prediction of gene essentiality compared with networks generated through canonical gap filling. In addition, gene essentiality predictions that were sensitive to poorly determined gap-filling reactions were of poor quality, suggesting that damage to the network structure resulting from the inclusion of erroneous gap-filling reactions may be predictable.

13.
The dramatic increase in heterogeneous types of biological data, in particular the abundance of new protein sequences, requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity (GPCRs and kinases from humans, and the crotonase superfamily of enzymes), we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.

14.
A combination of algorithms that search RNA sequences for the potential for secondary structure formation, and search large numbers of sequences for structural similarity, was used to search the 5'UTRs of annotated genes in the Escherichia coli genome for regulatory RNA structures. Using this approach, similar RNA structures that regulate genes in the thiamin metabolic pathway were identified. In addition, several putative regulatory structures were discovered upstream of genes involved in other metabolic pathways including glycerol metabolism and ethanol fermentation. The results demonstrate that this computational approach is a powerful tool for discovery of important RNA structures within prokaryotic organisms.

15.
We have developed a phylogeny-based design method that has been used to produce mutated proteins with enhanced thermal stabilities. We previously validated the predictive worth of the method by producing and characterizing mutants in which one original residue or a small number of the original residues had been replaced with the one or the ones found in the phylogenetically predicted “ancestral” sequence. For the current study, this method was used to design a sequence for the deepest nodal position of a phylogenetic tree composed of 16 gyrase B-subunit sequences, which was then synthesized and characterized. The sequence was inferred from the sequences of 16 extant DNA gyrases and 3 extant type VI DNA topoisomerases. Genes encoding the inferred sequence and its N-terminal ATPase domain were PCR constructed and expressed in Escherichia coli. The full-length designed protein is slightly less thermally stable than is subunit B from the extant thermophilic Thermus thermophilus DNA gyrase, whereas the thermal stability of the designed ATPase domain is more similar to that of the T. thermophilus ATPase domain. Moreover, the designed ATPase domain has significant catalytic activity. Therefore, even a small set of homologous amino acid sequences contains sufficient information to design a thermally stable and functional protein. Because the isolated designed ATPase domain is more thermally stable and catalytically active than is the sequence containing the most frequently occurring amino acids among the 16 gyrases, the phylogenetic approach was superior (in this case, at least) to the consensus approach when the same data set was used to predict the two sequences.
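
The "consensus approach" used as the baseline above, i.e. taking the most frequently occurring amino acid at each aligned position, can be sketched in a few lines. The function name `consensus` and the toy alignment are hypothetical, the input is assumed to be already aligned to equal length, and ties are resolved arbitrarily by `Counter.most_common`; none of this reflects the authors' actual pipeline.

```python
from collections import Counter


def consensus(aligned):
    """Return the per-column majority residue of an aligned sequence set."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*aligned) iterates over alignment columns.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


# Hypothetical three-sequence alignment.
toy_alignment = ["MKVL", "MKIL", "MRVL"]
print(consensus(toy_alignment))
```

Unlike this position-wise vote, the phylogenetic approach described in the abstract weights the sequences by their relationships in the tree, which is why the two methods can yield different designed sequences from the same data set.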

16.

Background  

Many aspects of biological functions can be modeled by biological networks, such as protein interaction networks, metabolic networks, and gene coexpression networks. Studying the statistical properties of these networks in turn allows us to infer biological function. Complex statistical network models can potentially more accurately describe the networks, but it is not clear whether such complex models are better suited to find biologically meaningful subnetworks.

17.
《Genomics》2019,111(6):1590-1603
Genomes are not random sequences because natural selection has injected information in biological sequences for billions of years. Inspired by this idea, we developed a simple method to compare genomes considering nucleotide counts in subsequences (blocks) instead of their exact sequences. We introduce the Block Alignment method for comparing two genomes and, based on this comparison method, define a similarity score and a distance. The presented model ignores nucleotide order in the sequence. On the other hand, in this block comparison method, due to exclusion of point mutations and small size variations, there is no need for high-coverage sequencing, which is responsible for the high costs of data production and storage; moreover, the sequence comparisons could be performed with higher speed. Phylogenetic trees of two sets of bacterial genomes were constructed and the results were in full agreement with their already constructed phylogenetic trees. Furthermore, a weighted and directed similarity network of each set of bacterial genomes was inferred ab initio by this model. Remarkably, the communities of these networks are in agreement with the clades of the corresponding phylogenetic trees, which means these similarity networks also contain phylogenetic information about the genomes. Moreover, the block comparison method was used to distinguish rob(15;21)c-associated iAMP21 and sporadic iAMP21 rearrangements in subgroups of chromosome 21 in acute lymphoblastic leukemia. Our results show a meaningful difference between the number of contigs that mapped to chromosomes 15 and 21 in these cases. Furthermore, the presented block alignment model can select the candidate blocks to perform more accurate analysis and is capable of finding conserved blocks on a set of genomes.
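
A rough sketch of comparing sequences by per-block nucleotide counts rather than by exact order is given below. The abstract does not state the similarity score or distance actually used, so the normalised L1 distance here is an assumption chosen only for illustration, as are the names `block_counts` and `block_distance` and the fixed block size.

```python
from collections import Counter


def block_counts(seq, block_size):
    """Split seq into consecutive blocks and count A, C, G, T in each."""
    blocks = [seq[i:i + block_size] for i in range(0, len(seq), block_size)]
    return [tuple(Counter(b)[base] for base in "ACGT") for b in blocks]


def block_distance(seq_a, seq_b, block_size):
    """Mean normalised L1 difference between paired block count vectors.

    Illustrative formula only, not the distance defined in the paper.
    """
    a = block_counts(seq_a, block_size)
    b = block_counts(seq_b, block_size)
    if len(a) != len(b):
        raise ValueError("this sketch assumes the same number of blocks")
    if not a:
        return 0.0
    total = 0.0
    for counts_a, counts_b in zip(a, b):
        diff = sum(abs(x - y) for x, y in zip(counts_a, counts_b))
        size = sum(counts_a) + sum(counts_b)
        total += diff / size if size else 0.0
    return total / len(a)


# Reordering nucleotides inside a block does not change the distance.
print(block_distance("ACGTACGT", "TGCAACGT", 4))
print(block_distance("AAAA", "CCCC", 4))
```

Because only counts are compared, the first pair above is indistinguishable under this measure even though the first blocks differ in order, which mirrors the abstract's statement that the model ignores nucleotide order.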

18.
Textbooks have traditionally held that cephalochordates are the closest relatives of vertebrates among all extant organisms. However, this view was challenged by several recent phylogenetic studies using hundreds of nuclear genes. The researchers suggested that urochordates, not cephalochordates, should be the closest living relatives of vertebrates. In the present study, using data generated from hundreds of mtDNA sequences, we re-evaluate the deuterostome phylogeny in terms of whole mitochondrial genomes (mitogenomes). Our results firmly demonstrate that each of the extant deuterostome phyla and chordate subphyla is monophyletic. However, the results present several alternative phylogenetic trees depending on the sequence datasets used in the analysis. Although no clear phylogenetic relationships are obtained, those trees indicate that the ancient common ancestor diversified rapidly soon after its appearance in the early Cambrian and generated all major deuterostome lineages during a short historical period, which is consistent with the "Cambrian explosion" revealed by paleontologists. It was the subsequent 520 million years of evolution that obscured the phylogenetic relationships of extant deuterostomes. Thus, we conclude that an integrative analysis approach, rather than simply using more DNA sequences, should be employed to address distant evolutionary relationships.

19.
20.
MOTIVATION: Because of the complexity of metabolic networks and their regulation, formal modelling is a useful method to improve the understanding of these systems. An essential step in network modelling is to validate the network model. Petri net theory provides algorithms and methods, which can be applied directly to metabolic network modelling and analysis in order to validate the model. The metabolism between sucrose and starch in the potato tuber is of great research interest. Even if the metabolism is one of the best studied in sink organs, it is not yet fully understood. RESULTS: We provide an approach for model validation of metabolic networks using Petri net theory, which we demonstrate for the sucrose breakdown pathway in the potato tuber. We start with hierarchical modelling of the metabolic network as a Petri net and continue with the analysis of qualitative properties of the network. The results characterize the net structure and give insights into the complex net behaviour.
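
To make the Petri net view of a metabolic reaction concrete, the sketch below models metabolites as places holding tokens and a reaction as a transition that consumes and produces tokens. It is a minimal illustration, not the authors' model: the helper names `enabled` and `fire`, the dictionary layout, and the single-reaction toy net with its initial marking are all hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["consume"].items())


def fire(marking, transition):
    """Return the new marking after firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["consume"].items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition["produce"].items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Toy transition: sucrose + UDP -> UDP-glucose + fructose.
sucrose_synthase = {
    "consume": {"sucrose": 1, "UDP": 1},
    "produce": {"UDP_glucose": 1, "fructose": 1},
}

# Hypothetical initial marking: two sucrose tokens, one UDP token.
start = {"sucrose": 2, "UDP": 1}
after = fire(start, sucrose_synthase)
print(after)
print(enabled(after, sucrose_synthase))
```

After one firing the UDP place is empty, so the transition can no longer fire; detecting such blocked states is one example of the kind of qualitative property a Petri net analysis can check.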
