Similar articles
20 similar articles found (search time: 0 ms)
1.
MOTIVATION: Multiple alignment of highly divergent sequences is a challenging problem for which available programs tend to show poor performance. Generally, this is due to a scoring function that does not describe biological reality accurately enough or a heuristic that cannot explore solution space efficiently enough. In this respect, we present a new program, Align-m, that uses a non-progressive local approach to guide a global alignment. RESULTS: Two large test sets were used that represent the entire SCOP classification and cover sequence similarities between 0 and 50% identity. Performance was compared with the publicly available algorithms ClustalW, T-Coffee and DiAlign. In general, Align-m has comparable or slightly higher accuracy in terms of correctly aligned residues, especially for distantly related sequences. Importantly, it aligns far fewer residues incorrectly, with average differences of over 15% compared with some of the other algorithms. AVAILABILITY: Align-m and the test sets are available at http://bioinformatics.vub.ac.be

2.
SUMMARY: Chimera allows the construction of chimeric protein or nucleic acid sequence files by concatenating sequences from two or more sequence files in PHYLIP format. It allows the user to interactively select genes and species from the input files. The concatenated result is stored in a single output file in PHYLIP or NEXUS format. AVAILABILITY: The computer program, including supporting files and example files, is available from http://www.dalicon.com/chimera/.
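
The concatenation step described above can be sketched roughly as follows. This is not Chimera's own code: the function name `concatenate_alignments`, the dict-based input (species name mapped to aligned sequence, assumed already parsed from the PHYLIP files), and the choice to pad absent species with `?` are all assumptions made for illustration.

```python
def concatenate_alignments(alignments, species=None, missing="?"):
    """Concatenate per-gene alignments given as dicts of species -> aligned sequence.

    Hypothetical helper: `species` selects which taxa to keep; a taxon absent
    from a gene is padded with `missing` characters (an assumed convention).
    """
    if species is None:
        # Default: keep every species seen in any gene, in first-seen order.
        species = []
        for aln in alignments:
            for name in aln:
                if name not in species:
                    species.append(name)
    result = {name: "" for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for name in species:
            # Append this gene's block, or padding if the species lacks it.
            result[name] += aln.get(name, missing * width)
    return result
```

Passing an explicit `species` list mirrors the interactive selection of species mentioned in the abstract; selecting genes amounts to choosing which alignments go into the input list.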

3.
TT virus is a virus distantly related to the Circoviridae family. We report here the complete genome characterization of two European human isolates (T3PB and TUPB) using a new and simple protocol for sequencing GC-rich genomic regions. Sequence analysis confirmed the existence of two major ORFs, of a CAV-like VP2 motif in ORF2 and of potential stem-loop structures in non-coding regions. Phylogenetic analyses based on complete genomic sequences of human isolates suggested that at least three different lineages exist. The first lineage includes genotypes 1, 2, and 3, and two other lineages include viruses related to the Japanese SANBAN and to the North American TUS01 isolates respectively. Sequence comparison made it possible to assign strain T3PB to genotype 3, and strain TUPB to the TUS01 group. Consequently, this study reports the first full-length sequence of a genotype 3 isolate and demonstrates that viruses belonging to the TUS01 lineage are present in the Old World.

4.
We present a novel phylogenetic approach to infer ancestral ontogenies of shape characters described as landmark configurations. The method is rooted in previously published theoretical developments to analyse landmark data in a phylogenetic context with parsimony as the optimality criterion, in this case using the minimization of differences in landmark position to define not only ancestral shapes but also the changes in developmental timing between ancestor–descendant shape ontogenies. Evolutionary changes along the tree represent changes in relative developmental timing between ontogenetic trajectories (possible heterochronic events) and changes in shape within each stage. The method requires the user to determine the shape of the specimens between two standard events, for instance birth and onset of sexual maturity. Once the ontogenetic trajectory is discretized into a series of consecutive stages, the method enables the user to identify changes in developmental timing associated with changes in the offset and/or onset of the shape ontogenetic trajectories. The method is implemented in a C language program called SPASOS. The analysis of two empirical examples (anurans and felids) using this novel method yielded results in agreement with previous hypotheses about shape evolution in these groups based on non-phylogenetic analyses.

5.
We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.

6.
7.
Comparative and phylogenetic analysis of developmental sequences   (Cited by: 3; self-citations: 0; citations by others: 3)
Event pairing has been proposed for the optimization of developmental sequences (event sequences) on a given phylogenetic hypothesis (cladogram) to determine instances of sequence heterochrony. Here, we show that event pairing is faulty, leading to the optimization of impossible hypothetical ancestors, the underestimation of the lengths of the developmental sequences on the tree, and the proposition of synapomorphies that are not supported by the data. When used for phylogenetic analysis, event pairing can even produce cladograms that are inconsistent with the data. These errors are caused by the fact that event pairing treats dependent features as if they were independent. We present a new method for comparative and phylogenetic analysis of developmental sequences that does not exhibit these errors. Our method applies Search-based character optimization and treats the entire developmental sequence as a single character that is then analyzed by using an edit cost function, which specifies the transformation cost between pairs of observed and unobserved character states, and dynamic programming. In other words, the developmental sequence is directly optimized on the tree. We used event pairing as an edit cost function, but others are possible.

8.
Foster KW. Protist, 2003, 154(1): 43-55
The further evolution of informational molecular sequences should depend on the number of viable alternatives possible for the sequences as set by selection, the unrepaired mutation rate, and time. Most biomolecular clocks are based on Kimura's nearly neutral mutation random-drift hypothesis. This clock assumes that informational sequences are in equilibrium, i.e., the nucleotides mutate at a uniform rate and the number of nucleotides unconstrained by selection remains constant. Correcting for deviations from these assumptions should produce a more accurate clock. Informational molecules probably formed from polynucleotides having some other function such as nitrogen or nucleotide storage, thus being initially functionally unselected. At any time the rate of development of functionality in a protein may be expected to be proportional to the number of viable alternatives of sequence in its potentially interacting regions. Assuming the rate of unrepaired mutations is constant, these clocks should exponentially slow as they evolve, each with a different rate toward individual equilibria. Also if the degree of selection changes, its clock rate should change. For a more precise clock two approaches are suggested to estimate these time dependent changes in evolutionary rate. An improved clock could improve estimation of phylogeny and put a time scale on that phylogeny.

9.
Based on the computation of the influence function, a tool to measure the impact of each piece of sampled data on the statistical inference of a parameter, we propose to analyze the support of the maximum-likelihood (ML) tree for each site. We provide a new tool for filtering data sets (nucleotides, amino acids, and others) in the context of ML phylogenetic reconstructions. Because different sites support different phylogenic topologies in different ways, outlier sites, that is, sites with a very negative influence value, are important: they can drastically change the topology resulting from the statistical inference. Therefore, these outlier sites must be clearly identified and their effects accounted for before drawing biological conclusions from the inferred tree. A matrix containing 158 fungal terminals all belonging to Chytridiomycota, Zygomycota, and Glomeromycota is analyzed. We show that removing the strongest outlier from the analysis strikingly modifies the ML topology, with a loss of as many as 20% of the internal nodes. As a result, estimating the topology on the filtered data set results in a topology with enhanced bootstrap support. From this analysis, the polyphyletic status of the fungal phyla Chytridiomycota and Zygomycota is reinforced, suggesting the necessity of revisiting the systematics of these fungal groups. We show the ability of influence function to produce new evolution hypotheses.

10.
11.
Four-cluster analysis: a simple method to test phylogenetic hypotheses   (Cited by: 5; self-citations: 2; citations by others: 3)
A simple statistical test for comparing three alternative phylogenetic hypotheses for four monophyletic groups is presented. This test is based on the minimum-evolution principle, and it does not require any information regarding the branching order within each monophyletic group. It is computationally efficient and can be easily extended to five or more monophyletic groups.

12.
In this study, a simple 4^k-dimension feature representation vector is proposed to reconstruct phylogenetic trees, where k is the length of a word. The vector is composed of elements that characterize the relative difference between a biological sequence and a sequence generated by an independent random process. In addition, the variance of a vector obtained by averaging every column of the feature representation matrix is used to determine an appropriate word length. In our experiments, reliable results were always obtained at word lengths below 7, which keeps the computational complexity low. Phylogenetic trees of 24 transferrins and 48 Hepatitis E viruses reconstructed at word length 6 are in good agreement with previous studies, which shows that our method is efficient and powerful.
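
As a rough illustration of such a word-based feature vector, the sketch below counts k-words over the nucleotide alphabet and compares each observed frequency with the frequency expected if bases were drawn independently. The abstract does not give the exact formula, so the definition used here, (observed - expected) / expected, is an assumption, as are the function name `kmer_feature_vector` and the decision to skip windows containing characters other than A, C, G, T.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_feature_vector(seq, k):
    """Return a 4**k vector of relative deviations of observed k-word
    frequencies from those expected under independent base draws."""
    seq = seq.upper()
    base_counts = {b: seq.count(b) for b in ALPHABET}
    n_bases = sum(base_counts.values())
    # All 4**k words in lexicographic order (AA, AC, AG, AT, CA, ...).
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    n_words = sum(counts[w] for w in words)  # windows made only of A/C/G/T
    if n_bases == 0 or n_words == 0:
        raise ValueError("sequence too short or contains no A/C/G/T")
    vector = []
    for w in words:
        observed = counts[w] / n_words
        expected = 1.0
        for b in w:
            expected *= base_counts[b] / n_bases
        # Assumed definition of "relative difference"; words that cannot
        # occur under the random model are set to 0.0.
        vector.append((observed - expected) / expected if expected > 0 else 0.0)
    return vector
```

Distances between such vectors (for example Euclidean) could then feed any distance-based tree-building method; which distance the authors used is not stated in the abstract.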

13.
Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.
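
Codon-position models give each codon position its own substitution parameters, so a natural first step is to partition the alignment columns by position. The sketch below shows only that partitioning, under the assumptions that the alignment is in frame from its first column and is supplied as a dict of name to sequence; the function name `split_by_codon_position` is hypothetical and no model fitting is attempted.

```python
def split_by_codon_position(alignment):
    """Split an in-frame coding alignment into three sub-alignments,
    one per codon position (keys 1, 2, 3)."""
    parts = {1: {}, 2: {}, 3: {}}
    for name, seq in alignment.items():
        if len(seq) % 3 != 0:
            raise ValueError(f"{name}: length is not a multiple of 3")
        for pos in (1, 2, 3):
            # Every third column, starting at the given codon position.
            parts[pos][name] = seq[pos - 1::3]
    return parts
```

Each sub-alignment could then be assigned its own rate or substitution matrix in whatever phylogenetics package is being used.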

14.
Plant Molecular Biology - Phylogenetic aspects, hotspots of nucleotide divergence, highly divergent genes, and specific RNA editing sites have been identified and characterized in the plastomes of...

15.
A new method of coding polymorphic multistate characters for phylogenetic analysis is presented. By dividing such characters into subcharacters, their frequency distributions can be represented with discrete states. Differential weighting is used to counter the effect of representing one character with multiple characters. The new method, generalized frequency coding (GFC), is potentially superior to previously used methods in that it incorporates more information and is applicable to both qualitative and quantitative characters. When applied to a previously published data set that includes both types of polymorphic multistate characters, the method performed well, as assessed with g1 and nonparametric bootstrap statistics, and gave results congruent with those of other studies. The data set was also used to compare GFC with both gap-weighting and Manhattan distance step matrix coding. On these grounds and for philosophical reasons, we consider GFC to be a better estimator of phylogeny.

16.
The neighbor-joining method: a new method for reconstructing phylogenetic trees   (Cited by: 673; self-citations: 29; citations by others: 673)
A new method called the neighbor-joining method is proposed for reconstructing phylogenetic trees from evolutionary distance data. The principle of this method is to find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total branch length at each stage of clustering of OTUs starting with a starlike tree. The branch lengths as well as the topology of a parsimonious tree can quickly be obtained by using this method. Using computer simulation, we studied the efficiency of this method in obtaining the correct unrooted tree in comparison with that of five other tree-making methods: the unweighted pair group method of analysis, Farris's method, Sattath and Tversky's method, Li's method, and Tateno et al.'s modified Farris method. The new, neighbor-joining method and Sattath and Tversky's method are shown to be generally better than the other methods.
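
The clustering procedure can be sketched as below. This is a minimal illustration rather than the authors' program: it uses the commonly taught Q-criterion form of the pair-selection step, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the row sum of distances, and the function name, the nested-dict distance input, and the internal node labels `node0`, `node1`, ... are assumptions.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from symmetric distances dist[a][b].

    Returns a list of (node_a, node_b, branch_length) edges.
    Internal nodes get hypothetical labels node0, node1, ...
    """
    d = {a: dict(dist[a]) for a in names}  # working copy of the matrix
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing the Q criterion (first pair wins ties).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        edges.append((a, u, la))
        edges.append((b, u, lb))
        # Distances from the new node to every remaining node.
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges
```

For additive (tree-like) distances the returned branch lengths reproduce the input distances along the paths between leaves; with noisy data some branch lengths can come out negative, which this sketch does not correct.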

17.
We have developed a rapid parsimony method for reconstructing ancestral nucleotide states that allows calculation of initial branch lengths that are good approximations to optimal maximum-likelihood estimates under several commonly used substitution models. Use of these approximate branch lengths (rather than fixed arbitrary values) as starting points significantly reduces the time required for iteration to a solution that maximizes the likelihood of a tree. These branch lengths are close enough to the optimal values that they can be used without further iteration to calculate approximate maximum-likelihood scores that are very close to the "exact" scores found by iteration. Several strategies are described for using these approximate scores to substantially reduce times needed for maximum-likelihood tree searches.
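
The abstract does not specify the parsimony procedure, so the sketch below substitutes the standard Fitch bottom-up pass as a stand-in for the general idea of parsimony-based ancestral state reconstruction at a single site. The nested-tuple tree encoding and the function name `fitch` are assumptions, and the approximate branch-length calculation the authors describe is not reproduced.

```python
def fitch(tree, states):
    """Fitch bottom-up pass for one site on a rooted binary tree.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: dict mapping leaf name -> observed nucleotide.
    Returns (candidate_state_set_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: keep the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1
```

Summing the change counts over all sites gives the parsimony length of the tree; per-branch change counts of this kind are one plausible starting point for branch-length estimates, though how the authors derive theirs is not stated here.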

18.
Nucleic acids from an unidentified virus from ringed seals (Phoca hispida) were amplified using sequence-independent PCR, subcloned, and then sequenced. The full genome of a novel RNA virus was derived, identifying the first sequence-confirmed picornavirus in a marine mammal. The phylogenetic position of the tentatively named seal picornavirus 1 (SePV-1) as an outlier to the grouping of parechoviruses was found consistently in alignable regions of the genome. A mean protein sequence identity of only 19.3 to 30.0% was found between the 3D polymerase gene sequence of SePV-1 and those of other picornaviruses. The predicted secondary structure of the short 506-base 5'-untranslated region showed some attributes of a type IVB internal ribosome entry site, and the polyprotein lacked an apparent L peptide, both properties associated with the Parechovirus genus. The presence of two SePV-1 2A genes and of the canonical sequence required for cotranslational cleavage resembled the genetic organization of Ljungan virus. Minor genetic variants were detected in culture supernatants derived from 8 of 108 (7.4%) seals collected in 2000 to 2002, indicating a high prevalence of SePV-1 in this hunted seal population. The high level of genetic divergence of SePV-1 compared to other picornaviruses and its mix of characteristics relative to its closest relatives support the provisional classification of SePV-1 as the prototype for a new genus in the family Picornaviridae.

19.
20.