Similar articles
20 similar articles found
1.
Use of whole genome sequence data to infer baculovirus phylogeny   (cited 18 times: 0 self-citations, 18 by others)
Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance analysis and a novel method developed here, termed neighbor pair analysis. The third set recorded gene content by scoring gene presence or absence in each genome. All three data sets yielded phylogenies supporting the separation of the Nucleopolyhedrovirus (NPV) and Granulovirus (GV) genera, the division of the NPVs into groups I and II, and species relationships within group I NPVs. Generation of phylogenies based on the combined sequences of all 63 shared genes proved to be the most effective approach to resolving the relationships among the group II NPVs and the GVs. The history of gene acquisitions and losses that have accompanied baculovirus diversification was visualized by mapping the gene content data onto the phylogenetic tree. This analysis highlighted the fluid nature of baculovirus genomes, with evidence of frequent genome rearrangements and multiple gene content changes during their evolution. Of more than 416 genes identified in the genomes analyzed, only 63 are present in all nine genomes, and 200 genes are found only in a single genome. Despite this fluidity, the whole genome-based methods we describe are sufficiently powerful to recover the underlying phylogeny of the viruses.
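The gene content character set described in this abstract (presence or absence of each gene in each genome) can be illustrated with a minimal Python sketch. The function names, the input layout (a dictionary of gene-name sets) and the toy gene labels are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter


def gene_content_summary(genomes):
    """Return (core, unique) gene sets.

    genomes: dict mapping a genome name to the set of gene names it carries
    (hypothetical input layout). 'core' genes occur in every genome,
    'unique' genes occur in exactly one genome.
    """
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    n_genomes = len(genomes)
    core = {gene for gene, c in counts.items() if c == n_genomes}
    unique = {gene for gene, c in counts.items() if c == 1}
    return core, unique


def presence_absence_matrix(genomes):
    """Score each gene as 1 (present) or 0 (absent) in each genome."""
    all_genes = sorted(set().union(*genomes.values()))
    matrix = {
        name: [int(gene in genes) for gene in all_genes]
        for name, genes in genomes.items()
    }
    return all_genes, matrix
```

Applied to nine real genomes, the sizes of the two returned sets would correspond to the kind of counts the abstract reports (genes shared by all genomes and genes found in a single genome).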

2.
Phylogeny estimation is crucial in the study of molecular evolution. The increase in the amount of available genomic data facilitates phylogeny estimation from multilocus sequence data. Although maximum likelihood and Bayesian methods are available for phylogeny reconstruction using multilocus sequence data, these methods require heavy computation, and their application is limited to the analysis of a moderate number of genes and taxa. Distance matrix methods present suitable alternatives for analyzing huge amounts of sequence data. However, the manner in which distance methods can be applied to multilocus sequence data remains unknown. Here, we suggest new procedures to estimate molecular phylogeny using multilocus sequence data and evaluate their significance in the framework of the distance method. We found that concatenation of the multilocus sequence data may result in incorrect phylogeny estimation with an extremely high bootstrap probability (BP), which is due to incorrect estimation of the distances and disregard of intergene variation. Therefore, we suggest that the distance matrices for multilocus sequence data be estimated separately and that these matrices be subsequently combined to reconstruct the phylogeny, instead of reconstructing the phylogeny from concatenated sequence data. To calculate the BPs of the reconstructed phylogeny, we suggest that a 2-stage bootstrap procedure be adopted, in which genes are resampled first, followed by resampling of the sequence columns within the resampled genes. By resampling the genes during calculation of BPs, intergene variations are properly considered. Via simulation studies and empirical data analysis, we demonstrate that our 2-stage bootstrap procedures are more suitable than the conventional bootstrap procedure that is adopted after sequence concatenation.
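The two ideas in this abstract, per-gene distance matrices that are then combined and a bootstrap that resamples genes before columns, can be sketched in Python as follows. This is a sketch under stated assumptions: the uncorrected p-distance and the simple averaging of per-gene matrices are placeholders chosen for brevity, since the abstract does not say which distance or combination rule the authors use, and all function names are hypothetical.

```python
import random


def two_stage_bootstrap(genes, rng):
    """Build one bootstrap replicate.

    genes: list of alignments; each alignment is a list of equal-length
    strings, one per taxon, in the same taxon order for every gene.
    """
    replicate = []
    for _ in range(len(genes)):
        alignment = rng.choice(genes)  # stage 1: resample genes with replacement
        n_cols = len(alignment[0])
        # stage 2: resample columns with replacement within the chosen gene
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        replicate.append(["".join(seq[c] for c in cols) for seq in alignment])
    return replicate


def p_distance(a, b):
    """Proportion of differing sites (assumed distance measure)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def combined_distance_matrix(genes):
    """Average the per-gene distance matrices (assumed combination rule)."""
    n_taxa = len(genes[0])
    combined = [[0.0] * n_taxa for _ in range(n_taxa)]
    for alignment in genes:
        for i in range(n_taxa):
            for j in range(n_taxa):
                combined[i][j] += p_distance(alignment[i], alignment[j]) / len(genes)
    return combined
```

In a full analysis, `combined_distance_matrix` would be applied to each replicate from `two_stage_bootstrap`, a tree would be built from each matrix, and the bootstrap probability of a clade would be the fraction of replicate trees containing it.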

3.
4.
Having obtained the amino acid composition of a protein, chemists and molecular biologists may wish to identify the protein from these data alone. In general such data will have errors associated with them and the length of the protein may be known only approximately or not at all. In this paper a method is described which enables searching of protein sequence databases for sequences or fragments of sequences which have a composition similar to the one being sought. Such searches are generally quite discriminating as shown by the examples provided. This method has been implemented as part of the computer program Scrutineer and is being freely distributed. It is simple to use.
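A composition search of this kind can be sketched as a sliding-window comparison. The scoring used here (sum of absolute differences between residue fractions) and the function names are assumptions for illustration only; the abstract does not describe the measure that Scrutineer actually uses.

```python
from collections import Counter


def composition(seq):
    """Fraction of each residue in seq."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def composition_distance(c1, c2):
    """Sum of absolute differences in residue fractions (assumed score)."""
    return sum(abs(c1.get(aa, 0) - c2.get(aa, 0)) for aa in set(c1) | set(c2))


def best_window(seq, target, width):
    """Return (distance, start) of the window whose composition is closest
    to target, or None if seq is shorter than width."""
    best = None
    for start in range(len(seq) - width + 1):
        window = composition(seq[start:start + width])
        d = composition_distance(window, target)
        if best is None or d < best[0]:
            best = (d, start)
    return best
```

Searching a database would amount to calling `best_window` on each entry and ranking entries by the returned distance; trying several values of `width` would be one way to cope with an approximately known protein length.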

5.
6.
Taxonomic affiliations and molecular diversity of 41 heterocystous cyanobacteria representing 12 genera have been assessed on an evolutionary landscape using rbcL gene sequence data-based phylogenomics and evogenomics approaches. Phylogenetic affiliations have clearly demonstrated the polyphyly of the true branching cyanobacteria, along with a frequent intermixing amongst the heterocystous cyanobacteria. The monophyletic origin of the heterocystous cyanobacteria was also quite evident from maximum parsimony and neighbor joining analyses. Incongruence with the traditional scheme of cyanobacterial taxonomy was frequently observed, which argues for amendments to the cyanobacterial classification schemes. Evogenomics analyses of gene sequence data gave a clear indication of the greater evolutionary pace of the unbranched cyanobacteria as compared to the branched forms. It was evident that the order Nostocales would be controlling the future pace of evolution of heterocystous cyanobacteria. The cyanobacterium Nostoc was found to have the greatest genetic heterogeneity amongst the studied genera, along with some evidence of lateral gene transfer events amongst the heterocystous cyanobacteria in the case of the rbcL gene. Thus, heterocystous cyanobacteria were found to be a fast evolving group, with estimates of gene conversion tracts pointing towards the unbranched heterocystous cyanobacteria being at the base of evolutionary diversifications of the complete heterocystous lineage.

7.
Computer programs to analyze DNA and amino acid sequence data   (cited 7 times: 2 self-citations, 5 by others)
Extensive modifications have been incorporated into many of the computer programs written by Staden (1-4) to make them easier to use with DNA and amino acid sequence data. These programs can be easily used by persons with minimal knowledge of computers.

8.
A comparison of phylogenetic network methods using computer simulation   (cited 1 time: 0 self-citations, 1 by others)

Background

We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median-joining and minimum spanning network) compared to standard tree approaches (neighbor-joining and maximum parsimony) in the presence and absence of recombination.

Principal Findings

In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods and no method could accurately estimate branch lengths.

Conclusions

Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and novel algorithms developed in the future.

9.
As a part of the elucidation of the complete amino acid sequence of human phosphoglycerate kinase, 46 tryptic peptides, ranging in length from 1 to 26 residues, were isolated and characterized from the reduced and S-carboxymethylated enzyme. The isolated peptides were subjected to sequence analysis by the modified dansyl-Edman degradation procedure and automated Edman degradation technique. The results, together with the data on cyanogen bromide peptides and two additional tryptic peptides from cyanogen bromide peptides reported in the accompanying paper, established the complete amino acid sequence of human erythrocyte phosphoglycerate kinase.

10.
Phylogeny reconstruction is a difficult computational problem, because the number of possible solutions increases with the number of included taxa. For example, for only 14 taxa, there are more than seven trillion possible unrooted phylogenetic trees. For this reason, phylogenetic inference methods commonly use clustering algorithms (e.g., the neighbor-joining method) or heuristic search strategies to minimize the amount of time spent evaluating nonoptimal trees. Even heuristic searches can be painfully slow, especially when computationally intensive optimality criteria such as maximum likelihood are used. I describe here a different approach to heuristic searching (using a genetic algorithm) that can tremendously reduce the time required for maximum-likelihood phylogenetic inference, especially for data sets involving large numbers of taxa. Genetic algorithms are simulations of natural selection in which individuals are encoded solutions to the problem of interest. Here, labeled phylogenetic trees are the individuals, and differential reproduction is effected by allowing the number of offspring produced by each individual to be proportional to that individual's rank likelihood score. Natural selection increases the average likelihood in the evolving population of phylogenetic trees, and the genetic algorithm is allowed to proceed until the likelihood of the best individual ceases to improve over time. An example is presented involving rbcL sequence data for 55 taxa of green plants. The genetic algorithm described here required only 6% of the computational effort required by a conventional heuristic search using tree bisection/reconnection (TBR) branch swapping to obtain the same maximum-likelihood topology.
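The growth in the number of candidate trees mentioned in this abstract can be checked with the standard counting formulas for fully bifurcating labeled trees: (2n-5)!! unrooted trees and (2n-3)!! rooted trees for n taxa. The Python sketch below uses those textbook formulas; they are not quoted from the abstract, and the function names are hypothetical.

```python
def odd_double_factorial(k):
    """Product 1 * 3 * 5 * ... * k for odd k >= 1."""
    result = 1
    for i in range(3, k + 1, 2):
        result *= i
    return result


def n_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating labeled trees: (2n - 5)!!, n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n_taxa - 5)


def n_rooted_trees(n_taxa):
    """Number of rooted bifurcating labeled trees: (2n - 3)!!, n >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n_taxa - 3)


print(n_unrooted_trees(14))  # 316234143225
print(n_rooted_trees(14))    # 7905853580625
```

Under these formulas, a count above seven trillion is reached by the rooted trees for 14 taxa (equivalently, the unrooted trees for 15 taxa), while the unrooted count for 14 taxa is roughly 3.2 × 10^11. Either way, exhaustive evaluation is impractical at this size, which is the point the abstract makes.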

11.
Calculation of protein extinction coefficients from amino acid sequence data   (cited 128 times: 0 self-citations, 128 by others)
Quantitative study of protein-protein and protein-ligand interactions in solution requires accurate determination of protein concentration. Often, for proteins available only in "molecular biological" amounts, it is difficult or impossible to make an accurate experimental measurement of the molar extinction coefficient of the protein. Yet without a reliable value of this parameter, one cannot determine protein concentrations by the usual UV spectroscopic means. Fortunately, knowledge of amino acid residue sequence and protomer molecular weight (and thus also of amino acid composition) is generally available through the DNA sequence, which is usually accurately known for most such proteins. In this paper we present a method for calculating accurate (to ±5% in most cases) molar extinction coefficients for proteins at 280 nm, simply from knowledge of the amino acid composition. The method is calibrated against 18 "normal" globular proteins whose molar extinction coefficients are accurately known, and the assumptions underlying the method, as well as its limitations, are discussed.
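A composition-based estimate of this kind is usually written as a weighted sum over the residues that absorb near 280 nm (tryptophan, tyrosine and cystine). The Python sketch below is an illustration under assumptions: the default per-residue coefficients (5690, 1280 and 120 M^-1 cm^-1) are commonly quoted values that do not appear in the abstract, so they are exposed as parameters, and the function names are hypothetical.

```python
def extinction_coefficient_280(sequence, n_cystine=0,
                               eps_trp=5690, eps_tyr=1280, eps_cystine=120):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    sequence: one-letter amino acid sequence.
    n_cystine: number of disulfide-bonded cystines (cannot be read from the
    sequence alone, so the caller supplies it).
    The default coefficients are assumed, commonly quoted values.
    """
    seq = sequence.upper()
    return (seq.count("W") * eps_trp
            + seq.count("Y") * eps_tyr
            + n_cystine * eps_cystine)


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: concentration (M) = A / (epsilon * path length)."""
    return a280 / (epsilon * path_cm)
```

For example, a measured absorbance would be converted to a concentration with `molar_concentration(a280, extinction_coefficient_280(seq, n_cystine))`; the ±5% accuracy quoted in the abstract applies to the authors' calibrated method, not to this sketch.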

12.
13.
Nucleic acid sequence database computer system.   (cited 6 times: 3 self-citations, 3 by others)

14.
Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
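The size of a probe mixture "reflecting all codon combinations" follows directly from the redundancy of the genetic code: it is the product of the number of codons for each residue in the peptide. The Python sketch below uses the codon counts of the standard genetic code; the function name is hypothetical, and the calculation is an illustration rather than the procedure used in the paper.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (sums to 61).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide):
    """Number of distinct coding sequences for a peptide, i.e. the number
    of oligonucleotides in a fully degenerate probe mixture."""
    return prod(CODON_COUNTS[aa] for aa in peptide.upper())
```

Peptides rich in Met and Trp keep the mixture small, while Leu, Ser and Arg inflate it quickly (`probe_degeneracy("LSR")` is 216), which is one reason the single "optimal" probe strategy discussed above becomes attractive.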

15.
The complete amino acid sequence of skeletal myoglobin from the Asian elephant (Elephas maximus) is reported. The functional significance of variations seen when this sequence is compared with that of sperm whale myoglobin is explored in the light of the crystallographic model available for the latter molecule. The phylogenetic implications of the elephant myoglobin amino acid sequence are evaluated by using the maximum parsimony technique. A similar analysis is also presented which incorporates all of the proteins sequenced from the elephant. These results are discussed with respect to current views on proboscidean phylogeny.

16.
A lattice model of proteins is introduced. "A protein molecule" is a chain of non-intersecting units of a given length on the two-dimensional square lattice. The copolymeric character of protein molecules is incorporated into the model in the form of specificities of inter-unit interactions. This model proved most effective for studying the statistical mechanical characteristics of protein folding, unfolding and fluctuations. The specificities of inter-unit interactions are shown to be the primary factors responsible for the all-or-none type transition from native to denatured states of globular proteins. The model has been studied by the Monte Carlo method of Metropolis et al., which is here shown to be applicable to approximately simulating a kinetic process. In the strong limit of the specificity of the inter-unit interaction, the native conformation was reached by this method starting from an extended conformation. The possible generalization and application of this method for finding the native conformation of proteins from their amino acid sequence are discussed.

17.
We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD), which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and ϕ,ψ angle assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship.

18.
The phylogenetic position of Annelida as well as its ingroup relationships are a matter of ongoing debate. A molecular phylogenetic study of sedentary polychaete relationships was conducted based on 70 sequences of 18S rRNA, including unpublished sequences of 18 polychaete species. The data set was analysed with maximum parsimony and maximum likelihood methods. Clade robustness was estimated by parsimony-bootstrapping and jackknifing, decay index, and clade support, as well as a posteriori probability tests using Bayesian inference. Irrespective of the applied method, some traditional sedentary polychaete taxa, such as Cirratulidae, Opheliidae, Orbiniidae, Siboglinidae and Spionidae, were recovered by our phylogenetic reconstruction. A close relationship between Orbiniidae and Questa received a particularly strong support. Echiura appears to be a polychaete ingroup taxon which is closely related to Dasybranchus (Capitellidae). As in previous molecular analyses, no support was found for the monophyly of Annelida nor for that of Polychaeta. However, we suggest that an increase in taxon sampling may yield additional resolution in the reconstruction of polychaete ingroup phylogeny, although the difficulties in reconstructing the basal phylogenetic relationships within Annelida may be due to their rapid radiation.

19.
Plastocyanin partial amino acid sequences (40 residues) of five members of the Ranunculaceae were used, together with many other flowering plant plastocyanin sequences already published, to construct dendrograms. On this basis the Ranunculaceae appear more closely related to the Rosaceae and Fabaceae than to the other families investigated. Dendrograms constructed from amino acid sequence data and serological data of five members of the Ranunculaceae were similar.

20.
To analyze the interrelationships between the amino acid sequences of the proteins of hepatitis C virus and the functional characteristics of different variants of this virus, a database of protein functional mapping of hepatitis C virus was developed. The database contains amino acid sequences (both full-length and fragmentary) retrieved from accessible databases and experimental data published in the literature. The database also contains the results of comparison and processing of primary data, including alignments and functional regions. On the basis of these data, variable and conserved regions of the envelope proteins of hepatitis C virus were revealed. Antigenic and functional maps of structural and nonstructural proteins of the virus were constructed. The most variable region of the envelope protein E2 (HVR1) was analysed. It is assumed that the conservation of some amino acid positions of HVR1 is related to the functions of this region.
