首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
2.
The coverage and reliability of protein-protein interactions determined by high-throughput experiments still needs to be improved, especially for higher organisms, therefore the question persists, how interactions can be verified and predicted by computational approaches using available data on protein structural complexes. Recently we developed an approach called IBIS (Inferred Biomolecular Interaction Server) to predict and annotate protein-protein binding sites and interaction partners, which is based on the assumption that the structural location and sequence patterns of protein-protein binding sites are conserved between close homologs. In this study first we confirmed high accuracy of our method and found that its accuracy depends critically on the usage of all available data on structures of homologous complexes, compared to the approaches where only a non-redundant set of complexes is employed. Second we showed that there exists a trade-off between specificity and sensitivity if we employ in the prediction only evolutionarily conserved binding site clusters or clusters supported by only one observation (singletons). Finally we addressed the question of identifying the biologically relevant interactions using the homology inference approach and demonstrated that a large majority of crystal packing interactions can be correctly identified and filtered by our algorithm. At the same time, about half of biological interfaces that are not present in the protein crystallographic asymmetric unit can be reconstructed by IBIS from homologous complexes without the prior knowledge of crystal parameters of the query protein.  相似文献   

3.
Studies of the life cycle stages of digeneans and oncophoreans (= monogeneans and cestodes) indicate that these two groups had separate origins from free-living rhabdocoel-like ancestors and that the original single-host life cycles became 2-host cycles through accidental ingestion, in digeneans by free-swimming adults being ingested by vertebrates, and in cestodes by eggs being ingested by invertebrates. In both lines a third host was incorporated as a means of increasing the efficiency of transfer between hosts, in digeneans between the primary mollusc and the secondary vertebrate, and in cestodes between the secondary (“first intermediate”) host and the primary vertebrate host.  相似文献   

4.
5.
6.
The sequencing and analysis of multiple housekeeping genes has been routinely used to phylogenetically compare closely related bacterial isolates. Recent studies using whole-genome alignment (WGA) and phylogenetics from >100 Escherichia coli genomes has demonstrated that tree topologies from WGA and multilocus sequence typing (MLST) markers differ significantly. A nonrepresentative phylogeny can lead to incorrect conclusions regarding important evolutionary relationships. In this study, the Phylomark algorithm was developed to identify a minimal number of useful phylogenetic markers that recapitulate the WGA phylogeny. To test the algorithm, we used a set of diverse draft and complete E. coli genomes. The algorithm identified more than 100,000 potential markers of different fragment lengths (500 to 900 nucleotides). Three molecular markers were ultimately chosen to determine the phylogeny based on a low Robinson-Foulds (RF) distance compared to the WGA phylogeny. A phylogenetic analysis demonstrated that a more representative phylogeny was inferred for a concatenation of these markers compared to all other MLST schemes for E. coli. As a functional test of the algorithm, the three markers (genomic guided E. coli markers, or GIG-EM) were amplified and sequenced from a set of environmental E. coli strains (ECOR collection) and informatically extracted from a set of 78 diarrheagenic E. coli strains (DECA collection). In the instances of the 40-genome test set and the DECA collection, the GIG-EM system outperformed other E. coli MLST systems in terms of recapitulating the WGA phylogeny. This algorithm can be employed to determine the minimal marker set for any organism that has sufficient genome sequencing.  相似文献   

7.
Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and provide an understanding of phylogenetic history of domain families, their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserves the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/REFINER) and will be incorporated into the next release of the Cn3D structure/alignment viewer.  相似文献   

8.
Phylogenetic inference based on matrix representation of trees.   总被引:14,自引:0,他引:14  
Rooted phylogenetic trees can be represented as matrices in which the rows correspond to termini, and columns correspond to internal nodes (elements of the n-tree). Parsimony analysis of such a matrix will fully recover the topology of the original tree. The maximum size of the represented matrix depends only on the number of termini in the tree; for a tree derived from molecular sequences, the represented matrix may be orders of magnitude smaller than the original data matrix. Representations of multiple trees (which may or may not have identical termini) can readily be combined into a single matrix; columns of discrete-character-state data can be added and, if desired, weighted differentially. Parsimony analysis of the resulting composite matrix yields a hybrid supertree which typically provides greater resolution than conventional consensus trees. Use of this method is illustrated with examples involving multiple tRNA genes in organelles and multiple protein-coding genes in eukaryotes.  相似文献   

9.
The use of some multiple-sequence alignments in phylogenetic analysis, particularly those that are not very well conserved, requires the elimination of poorly aligned positions and divergent regions, since they may not be homologous or may have been saturated by multiple substitutions. A computerized method that eliminates such positions and at the same time tries to minimize the loss of informative sites is presented here. The method is based on the selection of blocks of positions that fulfill a simple set of requirements with respect to the number of contiguous conserved positions, lack of gaps, and high conservation of flanking positions, making the final alignment more suitable for phylogenetic analysis. To illustrate the efficiency of this method, alignments of 10 mitochondrial proteins from several completely sequenced mitochondrial genomes belonging to diverse eukaryotes were used as examples. The percentages of removed positions were higher in the most divergent alignments. After removing divergent segments, the amino acid composition of the different sequences was more uniform, and pairwise distances became much smaller. Phylogenetic trees show that topologies can be different after removing conserved blocks, particularly when there are several poorly resolved nodes. Strong support was found for the grouping of animals and fungi but not for the position of more basal eukaryotes. The use of a computerized method such as the one presented here reduces to a certain extent the necessity of manually editing multiple alignments, makes the automation of phylogenetic analysis of large data sets feasible, and facilitates the reproduction of the final alignment by other researchers.  相似文献   

10.
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework--which we refer to as PosetSMC--to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.  相似文献   

11.
Given a family of related sequences, one can first determinealignments between various pairs of those sequences, then constructa simultaneous alignment of all the sequences that is determinedin a natural manner by the set of pairwise alignments. Thisapproach is sometimes effective for exposing the existence andlocations of conserved regions, which can then be aligned bymore sensitive multiple-alignment methods. This paper presentsan efficient algorithm for constructing a multiple alignmentfrom a set of pairwise alignments.  相似文献   

12.
We have developed a quick web-based application for designing conserved genomic PCR and RT-PCR primers from multigenome alignments targeting specific exons or introns. We used Pygr (The Python Graph Database Framework for Bioinformatics) to query intervals from multigenome alignments, which gives us less than a millisecond access to any intervals of any genome within multigenome alignments. PRIMER3 was used to extract optimal primers from a gene of interest. QPRIMER creates an electronic genomic PCR image from a set of conserved primers as well as summary pages for primer alignments and products. QPRIMER supports human, mouse, rat, chicken, dog, zebrafish and fruit fly. Availability: http://www.bioinformatics.ucla.edu/QPRIMER/.  相似文献   

13.

Background  

The alignment of biological sequences is of chief importance to most evolutionary and comparative genomics studies, yet the two main approaches used to assess alignment accuracy have flaws: reference alignments are derived from the biased sample of proteins with known structure, and simulated data lack realism.  相似文献   

14.
A total of 22 genes from the genome of Salinibacter ruber strain M31 were selected in order to study the phylogenetic position of this species based on protein alignments. The selection of the genes was based on their essential function for the organism, dispersion within the genome, and sufficient informative length of the final alignment. For each gene, an individual phylogenetic analysis was performed and compared with the resulting tree based on the concatenation of the 22 genes, which rendered a single alignment of 10,757 homologous positions. In addition to the manually chosen genes, an automatically selected data set of 74 orthologous genes was used to reconstruct a tree based on 17,149 homologous positions. Although single genes supported different topologies, the tree topology of both concatenated data sets was shown to be identical to that previously observed based on small subunit (SSU) rRNA gene analysis, in which S. ruber was placed together with Bacteroidetes. In both concatenated data sets the bootstrap was very high, but an analysis with a gradually lower number of genes indicated that the bootstrap was greatly reduced with less than 12 genes. The results indicate that tree reconstructions based on concatenating large numbers of protein coding genes seem to produce tree topologies with similar resolution to that of the single 16S rRNA gene trees. For classification purposes, 16S rRNA gene analysis may remain as the most pragmatic approach to infer genealogic relationships.  相似文献   

15.
ddbRNA: detection of conserved secondary structures in multiple alignments   总被引:4,自引:0,他引:4  
MOTIVATION: Structured non-coding RNAs (ncRNAs) have a very important functional role in the cell. No distinctive general features common to all ncRNA have yet been discovered. This makes it difficult to design computational tools able to detect novel ncRNAs in the genomic sequence. RESULTS: We devised an algorithm able to detect conserved secondary structures in both pairwise and multiple DNA sequence alignments with computational time proportional to the square of the sequence length. We implemented the algorithm for the case of pairwise and three-way alignments and tested it on ncRNAs obtained from public databases. On the test sets, the pairwise algorithm has a specificity greater than 97% with a sensitivity varying from 22.26% for Blast alignments to 56.35% for structural alignments. The three-way algorithm behaves similarly. Our algorithm is able to efficiently detect a conserved secondary structure in multiple alignments.  相似文献   

16.

Background  

In 2004, Bejerano et al. announced the startling discovery of hundreds of "ultraconserved elements", long genomic sequences perfectly conserved across human, mouse, and rat. Their announcement stimulated a flurry of subsequent research.  相似文献   

17.
The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the WorldWideWeb at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST(R) at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.  相似文献   

18.
Phylogenetic inference under the pure drift model   总被引:1,自引:1,他引:0  
When pairwise genetic distances are used for phylogenetic reconstruction, it is usually assumed that the genetic distance between two taxa contains information about the time after the two taxa diverged. As a result, upon an appropriate transformation if necessary, the distance usually can be fitted to a linear model such that it is expressed as the sum of lengths of all branches that connect the two taxa in a given phylogeny. This kind of distance is referred to as "additive distance." For a phylogenetic tree exclusively driven by random genetic drift, genetic distances related to coancestry coefficients (theta XY) between any two taxa are more suitable. However, these distances are fundamentally different from the additive distance in that coancestry does not contain any information about the time after two taxa split from a common ancestral population; instead, it reflects the time before the two taxa diverged. In other words, the magnitude of theta XY provides information about how long the two taxa share the same evolutionary pathways. The fundamental difference between the two kinds of distances has led to a different algorithm of evaluating phylogenetic trees when theta XY and related distance measures are used. Here we present the new algorithm using the ordinary- least-squares approach but fitting to a different linear model. This treatment allows genetic variation within a taxon to be included in the model. Monte Carlo simulation for a rooted phylogeny of four taxa has verified the efficacy and consistency of the new method. Application of the method to human population was demonstrated.   相似文献   

19.

Background  

Although originally thought to be less frequent in plants than in animals, alternative splicing (AS) is now known to be widespread in plants. Here we report the characteristics of AS in legumes, one of the largest and most important plant families, based on EST alignments to the genome sequences of Medicago truncatula (Mt) and Lotus japonicus (Lj).  相似文献   

20.
The application of Needleman-Wunsch alignment techniques to biological sequences is complicated by two serious problems when the sequences are long: the running time, which scales as the product of the lengths of sequences, and the difficulty in obtaining suitable parameters that produce meaningful alignments. The running time problem is often corrected by reducing the search space, using techniques such as banding, or chaining of high-scoring pairs. The parameter problem is more difficult to fix, partly because the probabilistic model, which Needleman-Wunsch is equivalent to, does not capture a key feature of biological sequence alignments, namely the alternation of conserved blocks and seemingly unrelated nonconserved segments. We present a solution to the problem of designing efficient search spaces for pair hidden Markov models that align biological sequences by taking advantage of their associated features. Our approach leads to an optimization problem, for which we obtain a 2-approximation algorithm, and that is based on the construction of Manhattan networks, which are close relatives of Steiner trees. We describe the underlying theory and show how our methods can be applied to alignment of DNA sequences in practice, successfully reducing the Viterbi algorithm search space of alignment PHMMs by three orders of magnitude.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号