首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.

Background

Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix reflects the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, sub-optimal pre-calculated generic matrices are typically used for protein-based phylogeny. Sequence availability has now grown to a point where problem-specific rate matrices can often be calculated if the computational cost can be controlled.

Results

The most time consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is the alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments prior to estimating amino acid replacement rates. Experiments with different large data sets showed that the FastMG procedure was an order of magnitude faster than without splitting. Importantly, there was no apparent loss in matrix quality if an appropriate splitting procedure is used.

Conclusions

FastMG is a simple, fast and accurate procedure to estimate amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships for specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com.  相似文献   

2.
SUMMARY: IQPNNI is a program to infer maximum-likelihood phylogenetic trees from DNA or protein data with a large number of sequences. We present an improved and MPI-parallel implementation showing very good scaling and speed-up behavior.  相似文献   

3.
Wu M  Eisen JA 《Genome biology》2008,9(10):R151
The explosive growth of genomic data provides an opportunity to make increased use of protein markers for phylogenetic inference. We have developed an automated pipeline for phylogenomic analysis (AMPHORA) that overcomes the existing bottlenecks limiting large-scale protein phylogenetic inference. We demonstrated its high throughput capabilities and high quality results by constructing a genome tree of 578 bacterial species and by assigning phylotypes to 18,607 protein markers identified in metagenomic data collected from the Sargasso Sea.  相似文献   

4.
We have developed a program for the fast and accurate detection of spontaneous synaptic events. The algorithm identifies each event of which the slope and amplitude which meet criteria. The significant feature of this algorithm is its stepwise and exploratory search for the onset and the peak points. During the first step, the program employing the algorithm makes a rough estimate of the candidate for a synaptic event, and determines a 'temporary' onset data point. The next step is the detection of the true onset data point and 'temporary' peak data point, which probably exist several points after the temporary onset data point. The third step is a backward search to detect the true peak data point. The final step is to check whether the amplitude of the detected event exceeds the threshold. This stepwise and shuttlewise search allows for the accurate detection of the peak points. Using this program, we succeeded in detecting an increased frequency and amplitude of spontaneous excitatory postsynaptic currents in chick cerebral neurons following the application of 12-O-tetradecanoyl-phorbol-13-acetate (TPA). In addition, we demonstrated that the program employing the algorithm was able to be used for the detection of extracellular action potentials.  相似文献   

5.
In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates based upon distances derived from gene/protein sequence data. Simulation studies indicate that this technique is accurate compared with other methods and robust to missing sequence data. The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches. Inclusion of the protein rates estimated from the DistR method into the ML calculation of trees as a branch length multiplier resulted in a significantly improved fit as measured by the Akaike Information Criterion (AIC). Furthermore, bootstrap support for the ML topology was significantly greater when protein rates were used, and some evident errors in the concatenated ML tree topology (i.e., without protein rates) were corrected. [Bayesian credible intervals; DistR method; multigene phylogeny; PHYML; rate heterogeneity.].  相似文献   

6.
7.
8.
Likelihood methods and methods using invariants are procedures for inferring the evolutionary relationships among species through statistical analysis of nucleic acid sequences. A likelihood-ratio test may be used to determine the feasibility of any tree for which the maximum likelihood can be computed. The method of linear invariants described by Cavender, which includes Lake's method of evolutionary parsimony as a special case, is essentially a form of the likelihood-ratio method. In the case of a small number of species (four or five), these methods may be used to find a confidence set for the correct tree. An exact version of Lake's asymptotic chi 2 test has been mentioned by Holmquist et al. Under very general assumptions, a one-sided exact test is appropriate, which greatly increases power.  相似文献   

9.
The calculation of maximum likelihood pedigrees for related organisms using genotypic data is considered. The problem is formulated so that the domain of optimization is a permutation space. This is a feature shared by the travelling salesman problem, for which simulated annealing is known to be effective. Using this technique it is found that pedigrees can be reconstructed with minimal error using genotypic data of a quality currently realizable. In complex pedigrees accurate reconstruction can be done with no a priori age or sex information. For smaller numbers of individuals a method of efficiently enumerating all admissible pedigrees of nonzero likelihood is given.  相似文献   

10.
11.
12.
DECANI  JOHN S. 《Biometrika》1972,59(1):131-135
  相似文献   

13.
A branch search algorithm for maximum likelihood paired comparison ranking   总被引:2,自引:0,他引:2  
  相似文献   

14.
15.
A maximum likelihood approach to two-dimensional crystals   总被引:1,自引:0,他引:1  
Maximum likelihood (ML) processing of transmission electron microscopy images of protein particles can produce reconstructions of superior resolution due to a reduced reference bias. We have investigated a ML processing approach to images centered on the unit cells of two-dimensional (2D) crystal images. The implemented software makes use of the predictive lattice node tracking in the MRC software, which is used to window particle stacks. These are then noise-whitened and subjected to ML processing. Resulting ML maps are translated into amplitudes and phases for further processing within the 2dx software package. Compared with ML processing for randomly oriented single particles, the required computational costs are greatly reduced as the 2D crystals restrict the parameter search space. The software was applied to images of negatively stained or frozen hydrated 2D crystals of different crystal order. We find that the ML algorithm is not free from reference bias, even though its sensitivity to noise correlation is lower than for pure cross-correlation alignment. Compared with crystallographic processing, the newly developed software yields better resolution for 2D crystal images of lower crystal quality, and it performs equally well for well-ordered crystal images.  相似文献   

16.
Summary A non-linear method of ordinating vegetation samples based on the fitting of bell-shaped response curves is lescribed. For each species two Gaussian curves were itted, one to quantitative values, where the species was present, the other to probabilities of absence. A maximum likelihood approach was then used to obtain a best approximation of the positions of the samples along a one-dimensional gradient. By an iterative process successively better approximations were obtained.The method was successful in recovering gradients based on hypothetical data. With two sets of real data the gradient produced was more ecologically satisfying and far less distorted than that revealed by principal component analysis.  相似文献   

17.
E G Williamson  M Slatkin 《Genetics》1999,152(2):755-761
We develop a maximum-likelihood framework for using temporal changes in allele frequencies to estimate the number of breeding individuals in a population. We use simulations to compare the performance of this estimator to an F-statistic estimator of variance effective population size. The maximum-likelihood estimator had a lower variance and smaller bias. Taking advantage of the likelihood framework, we extend the model to include exponential growth and show that temporal allele frequency data from three or more sampling events can be used to test for population growth.  相似文献   

18.
Stochastic models of nucleotide substitution are playing an increasingly important role in phylogenetic reconstruction through such methods as maximum likelihood. Here, we examine the behaviour of a simple substitution model, and establish some links between the methods of maximum parsimony and maximum likelihood under this model.  相似文献   

19.

Background

Haplotype assembly, reconstructing haplotypes from sequence data, is one of the major computational problems in bioinformatics. Most of the current methodologies for haplotype assembly are designed for diploid individuals. In recent years, genomes having more than two sets of homologous chromosomes have attracted many research groups that are interested in the genomics of disease, phylogenetics, botany and evolution. However, there is still a lack of methods for reconstructing polyploid haplotypes.

Results

In this work, the minimum error correction with genotype information (MEC/GI) model, an important combinatorial model for haplotyping a single individual, is used to study the triploid individual haplotype reconstruction problem. A fast and accurate enumeration-based algorithm enumeration haplotyping triploid with least difference (EHTLD) is proposed for solving the MEC/GI model. The EHTLD algorithm tries to reconstruct the three haplotypes according to the order of single nucleotide polymorphism (SNP) loci along them. When reconstructing a given SNP site, the EHTLD algorithm enumerates three kinds of SNP values in terms of the corresponding site’s genotype value, and chooses the one, which leads to the minimum difference between the reconstructed haplotypes and the sequenced fragments covering that SNP site, to fill the SNP loci being reconstructed.

Conclusion

Extensive experimental comparisons were performed between the EHTLD algorithm and the well known HapCompass and HapTree. Compared with algorithms HapCompass and HapTree, the EHTLD algorithm can reconstruct more accurate haplotypes, which were proven by a number of experiments.
  相似文献   

20.
In phylogenetic inference by maximum-parsimony (MP), minimum-evolution (ME), and maximum-likelihood (ML) methods, it is customary to conduct extensive heuristic searches of MP, ME, and ML trees, examining a large number of different topologies. However, these extensive searches tend to give incorrect tree topologies. Here we show by extensive computer simulation that when the number of nucleotide sequences (m) is large and the number of nucleotides used (n) is relatively small, the simple MP or ML tree search algorithms such as the stepwise addition (SA) plus nearest neighbor interchange (NNI) search and the SA plus subtree pruning regrafting (SPR) search are as efficient as the extensive search algorithms such as the SA plus tree bisection-reconnection (TBR) search in inferring the true tree. In the case of ME methods, the simple neighbor-joining (NJ) algorithm is as efficient as or more efficient than the extensive NJ+TBR search. We show that when ME methods are used, the simple p distance generally gives better results in phylogenetic inference than more complicated distance measures such as the Hasegawa-Kishino-Yano (HKY) distance, even when nucleotide substitution follows the HKY model. When ML methods are used, the simple Jukes-Cantor (JC) model of phylogenetic inference generally shows a better performance than the HKY model even if the likelihood value for the HKY model is much higher than that for the JC model. This indicates that at least in the present case, selecting of a substitution model by using the likelihood ratio test or the AIC index is not appropriate. When n is small relative to m and the extent of sequence divergence is high, the NJ method with p distance often shows a better performance than ML methods with the JC model. However, when the level of sequence divergence is low, this is not the case.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号