Similar literature
 20 similar articles found
1.
A comparison of phylogenetic network methods using computer simulation   Total citations: 1 (self: 0, others: 1)

Background

We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median joining and minimum spanning network) compared to standard tree approaches (neighbor-joining and maximum parsimony) in the presence and absence of recombination.

Principal Findings

In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods and no method could accurately estimate branch lengths.

Conclusions

Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and novel algorithms developed in the future.

2.

Background

Modern dairy cattle breeding goals include several production and more and more functional traits. Estimated breeding values (EBV) that are combined in the total merit index usually come from single-trait models or from multivariate models for groups of traits. In most cases, a multivariate animal model based on phenotypic data for all traits is not feasible and approximate methods based on selection index theory are applied to derive the total merit index. Therefore, the objective of this study was to compare a full multitrait animal model with two approximate multitrait models and a selection index approach based on simulated data.

Methods

Three production and two functional traits were simulated to mimic the national Austrian Brown Swiss population. The reference method for derivation of the total merit index was a multitrait evaluation based on all phenotypic data. Two of the approximate methods were variations of an approximate multitrait model that used either yield deviations or de-regressed breeding values. The final method was an adaptation of the selection index method that is used in routine evaluations in Austria and Germany. Three scenarios with respect to residual covariances were set up: residual covariances were equal to zero, or half of or equal to the genetic covariances.

Results

Results of both approximate multitrait models were very close to those of the reference method, with rank correlations of 1. Both methods were nearly unbiased. Rank correlations for the selection index method showed good results when residual covariances were zero but correlations with the reference method decreased when residual covariances were large. Furthermore, EBV were biased when residual covariances were high.

Conclusions

We applied an approximate multitrait two-step procedure to yield deviations and de-regressed breeding values, which led to nearly unbiased results. De-regressed breeding values gave even slightly better results. Our results confirmed that ignoring residual covariances when a selection index approach is applied leads to remarkable bias. This could be relevant in terms of selection accuracy. Our findings suggest that the approximate multitrait approach applied to de-regressed breeding values can be used in routine genetic evaluation.

3.
MOTIVATION: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, allows the identification of sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. RESULTS: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns within all considered settings.

4.
5.
A comparison of somatotype methods   Total citations: 8 (self: 0, others: 8)
In order to compare Parnell's and Heath's somatotype methods, the authors independently somatotyped a series of 59 adult male and 61 adult female subjects (1) using the criteria of Heath's method, (2) using the criteria of Parnell's method, and (3) taking into consideration tentatively adapted Parnell criteria in addition to Heath's criteria. The authors conclude that when they use similar rating criteria their mean differences are smaller, their overall correlations are similar, and their percentage agreements to a half-unit are higher (96%) than for comparisons reported by other investigators. The study considers the potentially important relationships of measurements of subcutaneous fat to ratings of the first component. The similarity of distributions of subcutaneous fat measurements and of first component ratings in selected samples suggests important interrelationships among ratings of the first component, height/weight ratios and subcutaneous fat measurements. The authors feel: (1) that Parnell's method fails to modify the basic weaknesses in Sheldon's somatotype method; and (2) that analyses of the anthropometric data basic to Parnell's method, if guided by the criteria of Heath's method, will further objectify and simplify Heath's method, will improve agreement among independent raters, and will increase the usefulness of somatotyping as a research instrument.

6.
Three widely used methods for measuring total soil CO2 evolution are evaluated, including the dynamic CO2 absorption method, the static CO2 absorption method and the closed-chamber method. The study covers laboratory experiments, numerical experiments with a simulation model and field measurements. The results are used to perform an error analysis. The aim of this error analysis is to indicate the impact of each method on the CO2 dynamics during the measurement, and to select the most suitable method for frequent field usage. Laboratory experiments and simulation results show that the dynamic CO2 absorption method has the potential to absorb all CO2 evolving at the soil surface. The results also prove that the method has only a minor impact on the CO2 concentration-depth gradient and the CO2 efflux. The static CO2 absorption method underestimates the soil CO2 evolution, because the absorption velocity is too low, due to slow diffusion processes. Measurements with the closed-chamber method are based on an increasing concentration with time under a closed cover. However, the accumulation of gas alters the concentration gradient in the soil profile and thus causes a rapidly decreasing efflux during the measurement. A commonly used mathematical procedure, which corrects for the altered concentration gradient, does not yield the exact surface efflux, because the effect of increasing storage in the soil profile is not incorporated. Field measurements of CO2 evolution, using the closed-chamber method and the dynamic CO2 absorption method, confirm the trends that have been predicted by the simulation model. The results of this study indicate that the dynamic CO2 absorption method is accurate. As it is cheap and simple, it is suitable for the study of temporal and spatial dynamics of CO2 evolution from the soil.

7.
Okaty BW  Sugino K  Nelson SB 《PloS one》2011,6(1):e16493
Expression profiling of restricted neural populations using microarrays can facilitate neuronal classification and provide insight into the molecular bases of cellular phenotypes. Due to the formidable heterogeneity of intermixed cell types that make up the brain, isolating cell types prior to microarray processing poses steep technical challenges that have been met in various ways. These methodological differences have the potential to distort cell-type-specific gene expression profiles insofar as they may insufficiently filter out contaminating mRNAs or induce aberrant cellular responses not normally present in vivo. Thus we have compared the repeatability, susceptibility to contamination from off-target cell types, and evidence for stress-responsive gene expression of five different purification methods: Laser Capture Microdissection (LCM), Translating Ribosome Affinity Purification (TRAP), Immunopanning (PAN), Fluorescence Activated Cell Sorting (FACS), and manual sorting of fluorescently labeled cells (Manual). We found that all methods obtained comparably high levels of repeatability; however, data from LCM and TRAP showed significantly higher levels of contamination than the other methods. While PAN samples showed higher activation of apoptosis-related, stress-related and immediate early genes, samples from FACS and Manual studies, which also require dissociated cells, did not. Given that TRAP targets actively translated mRNAs, whereas other methods target all transcribed mRNAs, observed differences may also reflect translational regulation.

8.
The dynamic and static properties of molecular dynamics simulations using various methods for treating solvent were compared. The SH3 protein domain was chosen as a test case because of its small size and high surface-to-volume ratio. The simulations were analyzed in structural terms by examining crystal packing, distribution of polar residues, and conservation of secondary structure. In addition, the "essential dynamics" method was applied to compare each of the molecular dynamics trajectories with a full solvent simulation. This method proved to be a powerful tool for the comparison of large concerted atomic motions in SH3. It identified methods of simulation that yielded significantly different dynamic properties compared to the full solvent simulation. Simulating SH3 using the stochastic dynamics algorithm with a vacuum (reduced charge) force field produced properties close to those of the full solvent simulation. The application of a recently described solvation term did not improve the dynamic properties. The large concerted atomic motions in the full solvent simulation as revealed by the essential dynamics method were analyzed for possible biological implications. Two loops, which have been shown to be involved in ligand binding, were seen to move in concert to open and close the ligand-binding site.

9.
Evaluation and comparison of gene clustering methods in microarray analysis   Total citations: 4 (self: 0, others: 4)
MOTIVATION: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis is found useful for discovering groups of correlated genes potentially co-regulated or associated with the disease or conditions under investigation. Many clustering methods including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. RESULTS: In this paper, six gene clustering methods are evaluated by simulated data from a hierarchical log-normal model with various degrees of perturbation as well as four real datasets. A weighted Rand index is proposed for measuring similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods in the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform other clustering methods both in simulated and real data while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight into the complicated gene clustering problem of expression profile and serves as a practical guideline for routine microarray cluster analysis.
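For orientation, the plain (unweighted) Rand index that the weighted variant builds on can be sketched as below. The abstract does not give the weighting scheme or how scattered genes are handled, so this is only the textbook index; the function name `rand_index` and the label-list input format are assumptions made for illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain Rand index: fraction of item pairs on which two clusterings agree.

    This is NOT the weighted index proposed in the paper; it is the standard
    unweighted version, shown only as a baseline illustration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    # A pair agrees when both clusterings put it together, or both split it.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Cluster labels are arbitrary here: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` compares identical partitions under different names, so the index depends only on which items are grouped together.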

10.
The sequences of a 51-kb region containing the cluster of five rat gamma-crystallin-coding genes (CRYG) and of a 7-kb region surrounding the sixth rat CRYG gene were determined. Approximately 78% of the total sequence represents intergenic DNA. We also sequenced 22 kb of DNA from the human CRYG gene cluster. All CRYG genes are associated with CpG-rich regions. The sequence similarity between the human and rat gene regions drops sharply (to 65%) in intronic and 3'-flanking regions but decreases only gradually in the 5'-flanking region. Highly conserved regions (greater than 80%) are found as far upstream as 1.5 kb. Overall intergenic distances are conserved. The human region contains much more repetitive DNA (24% vs. 10%) but less simple-sequence (sps) DNA (0.7% vs. 4%) than the rat region. Almost all repeats and spsDNA elements are located in the intergenic region. The location of repetitive and spsDNA differs between the orthologous regions and these elements were probably inserted after the evolutionary separation of rat and man. The Alu repeats in man and the B3 repeats in the rat are close copies of their respective consensus sequences and bordered by virtually perfect repeats. In contrast, the B1 and B2 repeats in the rat have diverged considerably from the consensus sequence and the surrounding direct repeats are usually imperfect. Thus the dispersion of the B1 and B2 repeats in the rat probably preceded that of the B3 repeats. Within the rat genomic region the spacing of Z-DNA elements is surprisingly regular: they are located about 12 kb apart. A search for putative matrix-associated regions suggests that the rat CRYG gene cluster is organized into two chromosomal domains.

11.

Background

The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented.

Results

TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. 
The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms.

Conclusions

TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.

12.
Summary: Techniques for transforming intact tissues of cereals were evaluated for their efficacy in transforming immature embryos and Type II callus of maize (Zea mays L.). The techniques used were particle bombardment, tissue electroporation, tissue electrophoresis, and silicon carbide fibers. Each method was assessed in terms of transient β-glucuronidase (GUS) expression. High levels of GUS expression were observed in A188 Type II callus using both tissue electroporation and particle bombardment, with means of 417.8 and 954.5 blue expression units (beu) per g fresh weight (FW) callus, respectively. Only particle bombardment resulted in high transient gene expression in immature embryos, with a mean transformation frequency of 34.8 beu per embryo. Very low levels of GUS expression were achieved with silicon carbide-mediated gene transfer, even when employing tissues used in the original publication (Black Mexican Sweet suspension cells). GUS expression was not obtained following tissue electrophoretic gene delivery.

13.
SUMMARY 1. Zooplankton production in a eutrophic reservoir was estimated by three common methods.
2. Estimates of daily production from the growth increment method and the birth and death rate versions of the biomass turnover method were poorly correlated (r = 0.58–0.60). Estimates of daily production rates from the above two versions of the biomass turnover method were strongly correlated (r = 0.90).
3. The mortality rate version of the biomass turnover method is illogical and yields anomalous results.
4. The growth increment method assumes steady state conditions and zero deaths within each stage and hence calculates potential production for each stage.
5. Estimates from a new computer simulation (PROD) were strongly correlated with (r = 0.92) but lower than those from the growth increment method. Estimates from PROD were more poorly correlated (r = 0.78) with those from the biomass turnover method.
6. There is a strong need for improved methods for estimating secondary production; computer based methods would seem to be the most promising.

14.
A comparison of two life table methods   Total citations: 1 (self: 0, others: 1)
J W Kuzma 《Biometrics》1967,23(1):51-64

15.
A survey of multiple sequence comparison methods   Total citations: 7 (self: 0, others: 7)
Multiple sequence comparison refers to the search for similarity in three or more sequences. This article presents a survey of the exhaustive (optimal) and heuristic (possibly sub-optimal) methods developed for the comparison of multiple macromolecular sequences. Emphasis is given to the different approaches of the heuristic methods. Four distance measures derived from information engineering and genetic studies are introduced for the comparison between two alignments of sequences. The use of entropy, which plays a central role in information theory as a measure of information, choice and uncertainty, is proposed as a simple measure for the evaluation of the optimality of an alignment in the absence of any a priori knowledge about the structures of the sequences being compared. This article also gives two examples of comparison between alternative alignments of the same set of 5S RNAs as obtained by several different heuristic methods.
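One common way to score an alignment with entropy is to sum the Shannon entropy of each column, so that lower totals indicate more conserved columns. The sketch below illustrates that idea only; the abstract does not state the survey's exact definition, and the function names, the base-2 logarithm, and the choice to count the gap character as an ordinary symbol are all assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(rows):
    """Sum of per-column entropies for an alignment given as equal-length strings.

    Gap characters such as '-' are counted like any other symbol
    (an illustrative assumption, not taken from the survey).
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return sum(column_entropy(col) for col in zip(*rows))
```

Under this scoring, two alternative alignments of the same sequences can be compared by calling `alignment_entropy` on each and preferring the lower total, with the caveat that a real comparison would also need a policy for gaps.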

16.
We compare the performance of Nm estimates based on FST and RST obtained from microsatellite data using simulations of the stepwise mutation model with range constraints in allele size classes. The results of the simulations suggest that the use of microsatellite loci can lead to serious overestimations of Nm, particularly when population sizes are large (N > 5000) and range constraints are high (K < 20). The simulations also indicate that, when population sizes are small (N /= 50) and many loci (nl ≥ 20), RST performs better than FST for most of the parameter space. However, FST-based estimates are always better than RST when sample sizes are moderate or small (ns
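As background for how an Nm estimate is obtained from FST, the classical relation under Wright's island model is FST ≈ 1 / (4Nm + 1), which rearranges to Nm ≈ (1/FST − 1) / 4. The sketch below applies that rearrangement; whether the study uses exactly this estimator is an assumption, and the function name `nm_from_fst` is hypothetical.

```python
def nm_from_fst(fst):
    """Estimate Nm from FST via the island-model relation FST = 1 / (4*Nm + 1).

    Assumes Wright's infinite-island model at equilibrium; this is the
    textbook rearrangement, not necessarily the estimator used in the study.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0
```

Because Nm grows without bound as FST approaches zero, small downward errors in a low FST translate into large overestimates of Nm, which is consistent with the kind of overestimation the abstract reports.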

17.
Model-free methods are introduced to determine quantities pertaining to protein domain motions from normal mode analyses and molecular dynamics simulations. For the normal mode analysis, the methods are based on the assumption that in low frequency modes, domain motions can be well approximated by modes of motion external to the domains. To analyze the molecular dynamics trajectory, a principal component analysis tailored specifically to analyze interdomain motions is applied. A method based on the curl of the atomic displacements is described, which yields a sharp discrimination of domains, and which defines a unique interdomain screw-axis. Hinge axes are defined and classified as twist or closure axes depending on their direction. The methods have been tested on lysozyme. A remarkable correspondence was found between the first normal mode axis and the first principal mode axis, with both axes passing within 3 Å of the alpha-carbon atoms of residues 2, 39, and 56 of human lysozyme, and near the interdomain helix. The axes of the first modes are overwhelmingly closure axes. A lesser degree of correspondence is found for the second modes, but in both cases they are more twist axes than closure axes. Both analyses reveal that the interdomain connections allow only these two degrees of freedom, one more than provided by a pure mechanical hinge. Proteins 27:425–437, 1997. © 1997 Wiley-Liss, Inc.

18.
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes (), and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units ≤ 4N(e)), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the correlation between the phylogenetic accuracy and the posterior probability values obtained from BEST is high, although posterior probabilities are overestimated when the prior distribution for is misspecified.

19.
A comparison of some methods of cluster analysis   Total citations: 13 (self: 0, others: 13)
J C Gower 《Biometrics》1967,23(4):623-637

20.
Summary: A comparison among various forms of half-diallel analysis was made. The different half-diallel techniques used were: Griffing's model I, methods 2 and 4; Morley-Jones' model; Walters and Morton's model; and Gardner and Eberhart's model. All these methods of diallel analysis were found to be interrelated. However, as Gardner and Eberhart's model partitioned heterosis into different components as well as gave information about combining ability, this method certainly had some advantages over the others. The results further indicated the possibility of dominance variance being confounded with the additive variance of general combining ability.
