Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
The evolutionary history of living African amphibians remains poorly understood. This study estimates the phylogeny within the frog genera Arthroleptis and Cardioglossa using approximately 2400 bases of mtDNA sequence data (12S, tRNA-Valine, and 16S genes) from half of the described species. Analyses are conducted using parsimony, maximum likelihood, and Bayesian methods. The effect of alignment on phylogeny estimation is explored by separately analyzing alignments generated with different gap costs and a consensus alignment. The consensus alignment results in species paraphyly, low nodal support, and incongruence with the results based on other alignments, which produced largely similar results. Most nodes in the phylogeny are highly supported, yet several topologies are inconsistent with previous hypotheses. The monophyly of Cardioglossa and of miniature species previously assigned to Schoutedenella was further examined using Templeton and Shimodaira–Hasegawa tests. Cardioglossa monophyly is rejected and C. aureoli is transferred to Arthroleptis. These tests do not reject Schoutedenella monophyly, but this hypothesis receives no support from non-parametric bootstrapping or Bayesian posterior probabilities. This phylogeny provides a framework for reconstructing historical biogeography and analyzing the evolution of body size and life history. Direct development and miniaturization appear at the base of Arthroleptis phylogeny concomitant with a range expansion from Central Africa to throughout most of sub-Saharan Africa.

2.
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.
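
As a rough illustration of what a position-specific scoring matrix is, the sketch below derives log-odds scores from one ungapped block of aligned family members. The function name `build_pssm`, the uniform background distribution, and the add-one pseudocount are assumptions made for this example; the abstract does not say how the authors computed their PSSMs.

```python
import math
from collections import Counter

def build_pssm(block, alphabet="ACDEFGHIKLMNPQRSTVWY", pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per column of an aligned block.

    Assumptions (not from the abstract): uniform background frequencies
    and a flat pseudocount added to every residue count.
    """
    if not block:
        raise ValueError("block must contain at least one sequence")
    width = len(block[0])
    if any(len(seq) != width for seq in block):
        raise ValueError("all sequences in the block must have equal length")

    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*block):
        counts = Counter(column)
        # Only residues from the alphabet contribute (gaps etc. are ignored).
        observed = sum(counts[a] for a in alphabet)
        total = observed + pseudocount * len(alphabet)
        pssm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in alphabet
        })
    return pssm

if __name__ == "__main__":
    scores = build_pssm(["ACD", "ACE", "ACD"])
    # Conserved residues score above zero, unobserved ones below zero.
    print(round(scores[0]["A"], 2), round(scores[0]["C"], 2))
```

In an embedded query of the kind the abstract describes, columns like these would stand in for the conserved regions, while the remaining positions keep the plain residues of the representative sequence.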

3.
MOTIVATION: Overlapping gene coding sequences (CDSs) are particularly common in viruses but also occur in more complex genomes. Detecting such genes with conventional gene-finding algorithms can be difficult for several reasons. If an overlapping CDS is on the same read-strand as a known CDS, then there may not be a distinct promoter or mRNA. Furthermore, the constraints imposed by double-coding can result in atypical codon biases. However, these same constraints lead to particular mutation patterns that may be detectable in sequence alignments. RESULTS: In this paper, we investigate several statistics for detecting double-coding sequences with pairwise alignments--including a new maximum-likelihood method. We also develop a model for double-coding sequence evolution. Using simulated sequences generated with the model, we characterize the distribution of each statistic as a function of sequence composition, length, divergence time and double-coding frame. Using these results, we develop several algorithms for detecting overlapping CDSs. The algorithms were tested on known overlapping CDSs and other overlapping open reading frames (ORFs) in the hepatitis B virus (HBV), Escherichia coli and Salmonella typhimurium genomes. The algorithms should prove useful for detecting novel overlapping genes--especially short coding ORFs in viruses. AVAILABILITY: Programs may be obtained from the authors. SUPPLEMENTARY INFORMATION: http://biochem.otago.ac.nz/double.html.

4.
5.
The concept of consensus in multiple sequence alignments (MSAs) has been used to design and engineer proteins previously with some success. However, consensus design implicitly assumes that all amino acid positions function independently, whereas in reality, the amino acids in a protein interact with each other and work cooperatively to produce the optimum structure required for its function. Correlation analysis is a tool that can capture the effect of such interactions. In a previously published study, we made consensus variants of the triosephosphate isomerase (TIM) protein using MSAs that included sequences from both prokaryotic and eukaryotic organisms. These variants were not completely native-like and were also surprisingly different from each other in terms of oligomeric state, structural dynamics, and activity. Extensive correlation analysis of the TIM database has revealed some clues about factors leading to the unusual behavior of the previously constructed consensus proteins. Among other things, we have found that the more ill-behaved consensus mutant had more broken correlations than the better-behaved consensus variant. Moreover, we report three correlation and phylogeny-based consensus variants of TIM. These variants were more native-like than the previous consensus mutants and considerably more stable than a wild-type TIM from a mesophilic organism. This study highlights the importance of choosing the appropriate diversity of MSA for consensus analysis and provides information that can be used to engineer stable enzymes.

6.
Sun Y  Zeng F  Zhang W  Qiao J 《Gene》2012,499(2):288-296
Antibiotic glycosyltransferases (AGts) attach unusual deoxy-sugars to aglycons so that antibiotics can exert their function. It has been reported that polyene macrolide (PEM) AGts have a different evolutionary origin from other polyketide AGts, and our previous analysis suggested that they could be the result of horizontal gene transfer (HGT) from eukaryotes. In this paper, we compared the structures of PEM AGts with structures of eukaryotes and other AGts, and then built models of the representative PEM AGts and GT-1 glycosyltransferases. We also constructed Neighbor-Joining (NJ) trees based on the normalized Root Mean Square (RMS) distance and a Bayesian tree guided by structural alignments, and analyzed several key conserved residues in PEM AGts. The NJ tree showed a close relationship between PEM AGts and eukaryotic glycosyltransferases, and the Bayesian tree further supported their affinity with UDP-glucuronosyltransferases (UGTs). Analysis of key conserved residues showed that PEM AGts may share interaction mechanisms, such as the formation of hydrogen bonds, with eukaryotic glycosyltransferases. Using structure-based phylogenetic approaches, this study provides further support that PEM AGts are the result of HGT between prokaryotes and eukaryotes.

7.
8.
Galtier et al. (Science 1999, 283, 220-221) exploit the correlation between the optimal growth temperature in prokaryotes and the G+C content of rRNAs and establish that the last universal common ancestor (LUCA) lived in a mesophilic environment. This result was achieved by estimating the G+C content of the ancestral sequences of the rRNAs of the LUCA through use of a complex Markov model. I have re-analysed their alignments of the rDNAs with maximum parsimony and I have found that their result is not robust and is, in all likelihood, incorrect. In particular, the rRNA ancestral sequences reconstructed with maximum parsimony from these rDNA alignments as well as those reconstructed after eliminating all the sites that turn out to be ambiguous to the parsimony algorithm and to a site-by-site inspection of these alignments, are such as to suggest that the LUCA lived in a thermophilic or hyperthermophilic environment. This finding is also supported by some tRNA ancestral sequences. The main conclusion of this analysis is that if the LUCA was a progenote then the origin of life might have taken place at a high temperature.
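
The quantity at the center of this debate, the G+C content of a sequence, is simple to compute once a sequence (observed or reconstructed) is in hand. The sketch below is only an illustration of that statistic; the function name `gc_content` and the choice to skip gaps and ambiguity codes are assumptions for this example, and it does not reproduce either the Markov-model or the parsimony reconstruction discussed in the abstract.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a DNA or RNA sequence.

    Gaps and ambiguity codes (e.g. '-', 'N') are ignored, which is an
    assumption of this sketch rather than a rule taken from the abstract.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total

if __name__ == "__main__":
    print(gc_content("GGCCAAUU"))  # half of the bases are G or C
```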

9.
10.
Previously published phylogenetic trees reconstructed on "Rieske protein" sequences frequently are at odds with each other, with those of other subunits of the parent enzymes and with small-subunit rRNA trees. These differences are shown to be at least partially if not completely due to problems in the reconstruction procedures. A major source of erroneous Rieske protein trees lies in the presence of a large, poorly conserved domain prone to accommodate very long insertions in well-defined structural hot spots, substantially hampering multiple alignments. The remaining smaller domain, in contrast, is too conserved to allow distant phylogenies to be deduced with sufficient confidence. Three-dimensional structures of representatives from this protein family are now available from phylogenetically distant species and from diverse enzymes. Multiple alignments can thus be refined on the basis of these structures. We show that structurally guided alignments of Rieske proteins from Rieske-cytochrome b complexes and arsenite oxidases strongly reduce conflicts between resulting trees and those obtained on their companion enzyme subunits. Further problems encountered during this work, mainly consisting of database errors such as wrong annotations and frameshifts, are described. The obtained results are discussed against the background of hypotheses stipulating pervasive lateral gene transfer in prokaryotes.

11.
R Staden 《Nucleic acids research》1982,10(15):4731-4751
This paper describes a computer method for handling gel reading data produced by the shotgun method of DNA sequencing. The method greatly reduces the time the sequencer needs to spend checking and editing his data and yet it produces a consensus sequence for which the accuracy of determination of every base can be clearly shown. The program can take a batch of new gel readings, screen them against vector sequences removing any that match, and then compare and align all the sequences to produce a final consensus. No information is lost in this process as alignments are achieved by making only insertions and because all the individual gel readings are added to a database from which they can be retrieved and displayed lined up one above the other. This allows the user to check on the alignments achieved by the program and if necessary change them. As each gel reading is added to the database the consensus is automatically updated accordingly and used for the next comparisons. This is a much faster process than comparing each new gel against every individual gel in the database.
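
The idea of a consensus with a visible per-base accuracy can be sketched as a column-wise majority vote over readings that are already lined up. This is a minimal illustration, not the program described in the abstract: the function name `build_consensus`, the use of `.` to mark positions a reading does not cover, and the agreement fraction as the accuracy measure are all assumptions made for this example.

```python
from collections import Counter

def build_consensus(reads, uncovered="."):
    """Majority-vote consensus over pre-aligned readings of equal length.

    Returns (consensus string, per-column agreement fractions). Columns
    that no reading covers yield 'N' with an agreement of 0.0.
    """
    if not reads:
        raise ValueError("at least one reading is required")
    width = len(reads[0])
    if any(len(r) != width for r in reads):
        raise ValueError("all readings must be padded to the same length")

    consensus = []
    agreement = []
    for column in zip(*reads):
        covered = [base for base in column if base != uncovered]
        if not covered:
            consensus.append("N")
            agreement.append(0.0)
            continue
        base, count = Counter(covered).most_common(1)[0]
        consensus.append(base)
        agreement.append(count / len(covered))
    return "".join(consensus), agreement

if __name__ == "__main__":
    readings = ["ACGT..", ".CGTA.", "..GAAC"]
    sequence, support = build_consensus(readings)
    print(sequence, [round(s, 2) for s in support])
```

Comparing each new reading against a single consensus like this, rather than against every stored reading, is the source of the speed-up the abstract mentions.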

12.
The quest to discover the variety of ecological niches inhabited by Saccharomyces cerevisiae has led to research in areas as diverse as wineries, oak trees and insect guts. The discovery of fungal communities in the human gastrointestinal tract suggested the host's gut as a potential reservoir for yeast adaptation. Here, we report the existence of yeast populations associated with the human gut (HG) that differ from those isolated from other human body sites. Phylogenetic analysis of 12 microsatellite loci and 1715 combined CDSs from whole-genome sequencing revealed three subclusters of HG strains with further evidence of clonal colonization within the host's gut. The presence of such subclusters was supported by other genomic features, such as copy number variation, absence/introgressions of CDSs and relative polymorphism frequency. Functional analysis of CDSs specific to the different subclusters suggested possible alterations in cell wall composition and sporulation features. The phenotypic analysis combined with immunological profiling of these strains further showed that sporulation was related to strain-specific genomic characteristics in the immune recognition pattern. We conclude that both genetic and environmental factors involved in cell wall remodelling and sporulation are the main drivers of adaptation in S. cerevisiae populations in the human gut.

13.
Evolutionary relationship between K(+) channels and symporters.

14.
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.

15.
The evolutionary relationship within prokaryotes is examined based on signature sequences (defined as conserved inserts or deletions shared by specific taxa) and phylogenies derived from different proteins. Archaebacteria are indicated as being monophyletic by a number of proteins related to the information transfer processes. In contrast, for several other highly conserved proteins, common signature sequences are present in archaebacteria and Gram-positive bacteria, whereas Gram-negative bacteria are indicated as being distinct. For these proteins, archaebacteria do not form a phylogenetically distinct clade but show polyphyletic branching within Gram-positive bacteria. A closer relationship of archaebacteria to Gram-positive bacteria in comparison with Gram-negative bacteria is generally seen for the majority of the available gene/protein sequences. To account for these results and the fact that both archaebacteria and Gram-positive bacteria are prokaryotes surrounded by a single cell membrane, I propose that the primary division within prokaryotes is between monoderm prokaryotes (surrounded by a single membrane) and diderm prokaryotes (i.e. all true Gram-negative bacteria containing both an inner cytoplasmic membrane and an outer membrane). This proposal is consistent with both cell morphology and signature sequences in different proteins. The monophyletic nature of archaebacteria for some genes, and their polyphyletic branching within Gram-positive bacteria as suggested by others, is critically examined, and several explanations, including derivation of archaebacteria from Gram-positive bacteria in response to antibiotic selection pressure, are proposed. Signature sequences in proteins also indicate that the low-G + C Gram-positive bacteria are phylogenetically distinct from the high-G + C Gram-positive group and that the diderm prokaryotes (i.e. Gram-negative bacteria) appear to have evolved from the latter group. 
Protein phylogenies and signature sequences also show that all eukaryotic cells have received significant gene contributions from both an archaebacterium and a Gram-negative eubacterium. Thus, the hypothesis that archaebacteria and eukaryotes shared a common ancestor exclusive of eubacteria is not supported. These observations provide evidence for an alternate view of the evolutionary relationship among living organisms that is different from the currently popular three-domain proposal.

16.
AsMamDB: an alternative splice database of mammals
Ji H  Zhou Q  Wen F  Xia H  Lu X  Li Y 《Nucleic acids research》2001,29(1):260-263

17.
The accuracy of a homology model based on the structure of a distant relative or other topologically equivalent protein is primarily limited by the quality of the alignment. Here we describe a systematic approach for sequence-to-structure alignment, called ‘K*Sync’, in which alignments are generated by dynamic programming using a scoring function that combines information on many protein features, including a novel measure of how obligate a sequence region is to the protein fold. By systematically varying the weights on the different features that contribute to the alignment score, we generate very large ensembles of diverse alignments, each optimal under a particular constellation of weights. We investigate a variety of approaches to select the best models from the ensemble, including consensus of the alignments, a hydrophobic burial measure, low- and high-resolution energy functions, and combinations of these evaluation methods. The effect on model quality and selection resulting from loop modeling and backbone optimization is also studied. The performance of the method on a benchmark set is reported and shows the approach to be effective at both generating and selecting accurate alignments. The method serves as the foundation of the homology modeling module in the Robetta server.

18.
Ribosomal RNA (rRNA) genes are probably the most frequently used data source in phylogenetic reconstruction. Individual columns of rRNA alignments are not independent as a consequence of their highly conserved secondary structures. Unless explicitly taken into account, these correlations can distort the phylogenetic signal and/or lead to gross overestimates of tree stability. Maximum likelihood and Bayesian approaches are of course amenable to using RNA-specific substitution models that treat conserved base pairs appropriately, but require accurate secondary structure models as input. So far, however, no accurate and easy-to-use tool has been available for computing structure-aware alignments and consensus structures that can deal with the large rRNAs. The RNAsalsa approach is designed to fill this gap. Capitalizing on the improved accuracy of pairwise consensus structures and informed by a priori knowledge of group-specific structural constraints, the tool provides both alignments and consensus structures that are of sufficient accuracy for routine phylogenetic analysis based on RNA-specific substitution models. The power of the approach is demonstrated using two rRNA data sets: a mitochondrial rRNA set of 26 Mammalia, and a collection of 28S nuclear rRNAs representative of the five major echinoderm groups.

19.
We present a computational scheme to locally align a collection of RNA sequences using sequence and structure constraints. In addition, the method searches for the resulting alignments with the most significant common motifs, among all possible collections. The first part utilizes a simplified version of the Sankoff algorithm for simultaneous folding and alignment of RNA sequences, but maintains tractability by constructing multi-sequence alignments from pairwise comparisons. The algorithm finds the multiple alignments using a greedy approach and has similarities to both CLUSTAL and CONSENSUS, but the core algorithm assures that the pairwise alignments are optimized for both sequence and structure conservation. The choice of scoring system and the method of progressively constructing the final solution are important considerations that are discussed. Example solutions, and comparisons with other approaches, are provided. The solutions include finding consensus structures identical to published ones.

20.
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
