20 similar articles found (search time: 8 ms)
1.
Background
We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median joining and minimum spanning network) compared to standard tree approaches (neighbor-joining and maximum parsimony) in the presence and absence of recombination.
Principal Findings
In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods and no method could accurately estimate branch lengths.
Conclusions
Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and novel algorithms developed in the future.
2.
Christina Pfeiffer Birgit Fuerst-Waltl Hermann Schwarzenbacher Franz Steininger Christian Fuerst 《Genetics Selection Evolution》2015,47(1)
Background
Modern dairy cattle breeding goals include several production traits and an increasing number of functional traits. Estimated breeding values (EBV) that are combined in the total merit index usually come from single-trait models or from multivariate models for groups of traits. In most cases, a multivariate animal model based on phenotypic data for all traits is not feasible and approximate methods based on selection index theory are applied to derive the total merit index. Therefore, the objective of this study was to compare a full multitrait animal model with two approximate multitrait models and a selection index approach based on simulated data.
Methods
Three production and two functional traits were simulated to mimic the national Austrian Brown Swiss population. The reference method for derivation of the total merit index was a multitrait evaluation based on all phenotypic data. Two of the approximate methods were variations of an approximate multitrait model that used either yield deviations or de-regressed breeding values. The final method was an adaptation of the selection index method that is used in routine evaluations in Austria and Germany. Three scenarios with respect to residual covariances were set up: residual covariances were equal to zero, half of the genetic covariances, or equal to the genetic covariances.
Results
Results of both approximate multitrait models were very close to those of the reference method, with rank correlations of 1. Both methods were nearly unbiased. Rank correlations for the selection index method showed good results when residual covariances were zero, but correlations with the reference method decreased when residual covariances were large. Furthermore, EBV were biased when residual covariances were high.
Conclusions
We applied an approximate multitrait two-step procedure to yield deviations and de-regressed breeding values, which led to nearly unbiased results. De-regressed breeding values gave slightly better results. Our results confirmed that ignoring residual covariances when a selection index approach is applied leads to considerable bias. This could be relevant in terms of selection accuracy. Our findings suggest that the approximate multitrait approach applied to de-regressed breeding values can be used in routine genetic evaluation.
3.
A systematic comparison and evaluation of biclustering methods for gene expression data [Cited by: 9; self-citations: 0; citations by others: 9]
Prelić A Bleuler S Zimmermann P Wille A Bühlmann P Gruissem W Hennig L Thiele L Zitzler E 《Bioinformatics (Oxford, England)》2006,22(9):1122-1129
MOTIVATION: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, allows the identification of sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. RESULTS: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns within all considered settings.
4.
5.
A comparison of somatotype methods [Cited by: 8; self-citations: 0; citations by others: 8]
In order to compare Parnell's and Heath's somatotype methods, the authors independently somatotyped a series of 59 adult male and 61 adult female subjects, (1) using the criteria of Heath's method, (2) using the criteria of Parnell's method, and (3) taking into consideration tentatively adapted Parnell criteria in addition to Heath's criteria. The authors conclude that when they use similar rating criteria their mean differences are smaller, their overall correlations are similar, and their percentage agreements to a half-unit are higher (96%) than for comparisons reported by other investigators. The study considers the potentially important relationships of measurements of subcutaneous fat to ratings of the first component. The similarity of distributions of subcutaneous fat measurements and of first component ratings in selected samples suggests important interrelationships among ratings of the first component, height/weight ratios and subcutaneous fat measurements. The authors feel: (1) that Parnell's method fails to modify the basic weaknesses in Sheldon's somatotype method; and (2) that analyses of the anthropometric data basic to Parnell's method, if guided by the criteria of Heath's method, will further objectify and simplify Heath's method, will improve agreement among independent raters, and will increase the usefulness of somatotyping as a research instrument.
6.
A comparison of field methods for measuring soil carbon dioxide evolution: Experiments and simulation [Cited by: 7; self-citations: 0; citations by others: 7]
Three widely used methods for measuring total soil CO2 evolution are evaluated, including the dynamic CO2 absorption method, the static CO2 absorption method and the closed chamber method. The study covers laboratory experiments, numerical experiments with a simulation model and field measurements. The results are used to perform an error analysis. The aim of this error analysis is to indicate the impact of each method on the CO2 dynamics during the measurement, and to select the most suitable method for frequent field usage.
Laboratory experiments and simulation results show that the dynamic CO2 absorption method has the potential to absorb all CO2 evolving at the soil surface. The results also prove that the method has only a minor impact on the CO2 concentration-depth gradient and the CO2 efflux. The static CO2 absorption method underestimates the soil CO2 evolution, because the absorption velocity is too low, due to slow diffusion processes. Measurements with the closed-chamber method are based on an increasing concentration with time under a closed cover. However, the accumulation of gas alters the concentration gradient in the soil profile and thus causes a rapidly decreasing efflux during the measurement. A commonly used mathematical procedure, which corrects for the altered concentration gradient, does not yield the exact surface efflux, because the effect of increasing storage in the soil profile is not incorporated. Field measurements of CO2 evolution, using the closed-chamber method and the dynamic CO2 absorption method, confirm the trends that have been predicted by the simulation model. The results of this study indicate that the dynamic CO2 absorption method is accurate. As it is cheap and simple, it is suitable for the study of temporal and spatial dynamics of CO2 evolution from the soil.
7.
Expression profiling of restricted neural populations using microarrays can facilitate neuronal classification and provide insight into the molecular bases of cellular phenotypes. Due to the formidable heterogeneity of intermixed cell types that make up the brain, isolating cell types prior to microarray processing poses steep technical challenges that have been met in various ways. These methodological differences have the potential to distort cell-type-specific gene expression profiles insofar as they may insufficiently filter out contaminating mRNAs or induce aberrant cellular responses not normally present in vivo. Thus we have compared the repeatability, susceptibility to contamination from off-target cell types, and evidence for stress-responsive gene expression of five different purification methods: Laser Capture Microdissection (LCM), Translating Ribosome Affinity Purification (TRAP), Immunopanning (PAN), Fluorescence Activated Cell Sorting (FACS), and manual sorting of fluorescently labeled cells (Manual). We found that all methods obtained comparably high levels of repeatability; however, data from LCM and TRAP showed significantly higher levels of contamination than the other methods. While PAN samples showed higher activation of apoptosis-related, stress-related and immediate early genes, samples from FACS and Manual studies, which also require dissociated cells, did not. Given that TRAP targets actively translated mRNAs, whereas other methods target all transcribed mRNAs, observed differences may also reflect translational regulation.
8.
A comparison of structural and dynamic properties of different simulation methods applied to SH3. [Cited by: 3; self-citations: 2; citations by others: 1]
D M van Aalten A Amadei R Bywater J B Findlay H J Berendsen C Sander P F Stouten 《Biophysical journal》1996,70(2):684-692
The dynamic and static properties of molecular dynamics simulations using various methods for treating solvent were compared. The SH3 protein domain was chosen as a test case because of its small size and high surface-to-volume ratio. The simulations were analyzed in structural terms by examining crystal packing, distribution of polar residues, and conservation of secondary structure. In addition, the "essential dynamics" method was applied to compare each of the molecular dynamics trajectories with a full solvent simulation. This method proved to be a powerful tool for the comparison of large concerted atomic motions in SH3. It identified methods of simulation that yielded significantly different dynamic properties compared to the full solvent simulation. Simulating SH3 using the stochastic dynamics algorithm with a vacuum (reduced charge) force field produced properties close to those of the full solvent simulation. The application of a recently described solvation term did not improve the dynamic properties. The large concerted atomic motions in the full solvent simulation as revealed by the essential dynamics method were analyzed for possible biological implications. Two loops, which have been shown to be involved in ligand binding, were seen to move in concert to open and close the ligand-binding site.
9.
Thalamuthu A Mukhopadhyay I Zheng X Tseng GC 《Bioinformatics (Oxford, England)》2006,22(19):2405-2412
MOTIVATION: Microarray technology has been widely applied in biological and clinical studies for simultaneous monitoring of gene expression in thousands of genes. Gene clustering analysis has proved useful for discovering groups of correlated genes potentially co-regulated or associated with the disease or conditions under investigation. Many clustering methods including hierarchical clustering, K-means, PAM, SOM, mixture model-based clustering and tight clustering have been widely used in the literature. Yet no comprehensive comparative study has been performed to evaluate the effectiveness of these methods. RESULTS: In this paper, six gene clustering methods are evaluated on simulated data from a hierarchical log-normal model with various degrees of perturbation as well as on four real datasets. A weighted Rand index is proposed for measuring similarity of two clustering results with possible scattered genes (i.e. a set of noise genes not being clustered). Performance of the methods on the real data is assessed by a predictive accuracy analysis through verified gene annotations. Our results show that tight clustering and model-based clustering consistently outperform other clustering methods on both simulated and real data, while hierarchical clustering and SOM perform among the worst. Our analysis provides deep insight into the complicated gene clustering problem of expression profiles and serves as a practical guideline for routine microarray cluster analysis.
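For orientation, the sketch below computes the ordinary (unweighted) Rand index between two clusterings. It is not the weighted variant proposed in the abstract, whose weighting scheme and handling of scattered genes are not described here; the function name `rand_index` and the label encoding are assumptions made for illustration only.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain Rand index: fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items in
    the same cluster, or both put them in different clusters.
    NOTE: this is the classical index, not the weighted index of the abstract.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two items are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The index depends only on which items are grouped together, so relabelling the clusters (for example swapping cluster 0 and cluster 1) leaves the value unchanged.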
10.
Nucleotide sequence of the rat gamma-crystallin gene region and comparison with an orthologous human region [Cited by: 5; self-citations: 0; citations by others: 5]
The sequences of a 51-kb region containing the cluster of five rat gamma-crystallin-coding genes (CRYG) and of a 7-kb region surrounding the sixth rat CRYG gene were determined. Approximately 78% of the total sequence represents intergenic DNA. We also sequenced 22 kb of DNA from the human CRYG gene cluster. All CRYG genes are associated with CpG-rich regions. The sequence similarity between the human and rat gene regions drops sharply (to 65%) in intronic and 3'-flanking regions but decreases only gradually in the 5'-flanking region. Highly conserved regions (greater than 80%) are found as far upstream as 1.5 kb. Overall intergenic distances are conserved. The human region contains much more repetitive DNA (24% vs. 10%) but less simple-sequence (sps) DNA (0.7% vs. 4%) than the rat region. Almost all repeats and spsDNA elements are located in the intergenic region. The location of repetitive and spsDNA differs between the orthologous regions and these elements were probably inserted after the evolutionary separation of rat and man. The Alu repeats in man and the B3 repeats in the rat are close copies of their respective consensus sequences and bordered by virtually perfect repeats. In contrast, the B1 and B2 repeats in the rat have diverged considerably from the consensus sequence and the surrounding direct repeats are usually imperfect. Thus the dispersion of the B1 and B2 repeats in the rat probably preceded that of the B3 repeats. Within the rat genomic region the spacing of Z-DNA elements is surprisingly regular: they are located about 12 kb apart. A search for putative matrix-associated regions suggests that the rat CRYG gene cluster is organized into two chromosomal domains.
11.
Background
The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented.
Results
TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are shown to be more accurate for TurboFold than for alternative methods that estimate base pairing probabilities. TurboFold-MEA, which uses base pairing probabilities from TurboFold in a maximum expected accuracy algorithm for secondary structure prediction, has accuracy comparable to the best performing secondary structure prediction methods. 
The computational and memory requirements for TurboFold are modest and, in terms of sequence length and number of sequences, scale much more favorably than joint alignment and folding algorithms.
Conclusions
TurboFold is an iterative probabilistic method for predicting secondary structures for multiple RNA sequences that efficiently and accurately combines the information from the comparative analysis between sequences with the thermodynamic folding model. Unlike most other multi-sequence structure prediction methods, TurboFold does not enforce strict commonality of structures and is therefore useful for predicting structures for homologous sequences that have diverged significantly. TurboFold can be downloaded as part of the RNAstructure package at http://rna.urmc.rochester.edu.
12.
E. M. Southgate M. R. Davey J. B. Power R. J. Westcott 《In vitro cellular & developmental biology. Plant》1998,34(3):218-224
Summary
Techniques for transforming intact tissues of cereals were evaluated for their efficacy in transforming immature embryos and Type II callus of maize (Zea mays L.). The techniques used were particle bombardment, tissue electroporation, tissue electrophoresis, and silicon carbide fibers. Each method was assessed in terms of transient β-glucuronidase (GUS) expression. High levels of GUS expression were observed in A188 Type II callus using both tissue electroporation and particle bombardment, with means of 417.8 and 954.5 blue expression units (beu) per g fresh weight (FW) callus, respectively. Only particle bombardment resulted in high transient gene expression in immature embryos, with a mean transformation frequency of 34.8 beu per embryo. Very low levels of GUS expression were achieved with silicon carbide-mediated gene transfer, even when employing tissues used in the original publication (Black Mexican Sweet suspension cells). GUS expression was not obtained following tissue electrophoretic gene delivery.
13.
A comparison of zooplankton production estimates obtained from three commonly used methods and a computer simulation program [Cited by: 2; self-citations: 0; citations by others: 2]
C. R. KING 《Freshwater Biology》1988,20(1):117-126
SUMMARY
1. Zooplankton production in a eutrophic reservoir was estimated by three common methods.
2. Estimates of daily production from the growth increment method and the birth and death rate versions of the biomass turnover method were poorly correlated (r = 0.58–0.60). Estimates of daily production rates from the above two versions of the biomass turnover method were strongly correlated (r = 0.90).
3. The mortality rate version of the biomass turnover method is illogical and yields anomalous results.
4. The growth increment method assumes steady state conditions and zero deaths within each stage and hence calculates potential production for each stage.
5. Estimates from a new computer simulation (PROD) were strongly correlated with (r = 0.92), but lower than, those from the growth increment method. Estimates from PROD were more poorly correlated (r = 0.78) with those from the biomass turnover method.
6. There is a strong need for improved methods for estimating secondary production; computer based methods would seem to be the most promising.
14.
15.
A survey of multiple sequence comparison methods [Cited by: 7; self-citations: 0; citations by others: 7]
Multiple sequence comparison refers to the search for similarity in three or more sequences. This article presents a survey of the exhaustive (optimal) and heuristic (possibly sub-optimal) methods developed for the comparison of multiple macromolecular sequences. Emphasis is given to the different approaches of the heuristic methods. Four distance measures derived from information engineering and genetic studies are introduced for the comparison between two alignments of sequences. The use of entropy, which plays a central role in information theory as a measure of information, choice and uncertainty, is proposed as a simple measure for the evaluation of the optimality of an alignment in the absence of any a priori knowledge about the structures of the sequences being compared. This article also gives two examples of comparison between alternative alignments of the same set of 5S RNAs as obtained by several different heuristic methods.
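As a rough illustration of how entropy can score an alignment, the sketch below sums the Shannon entropy of every alignment column, so that lower totals indicate more uniform columns. The exact formulation used in the article is not given in the abstract; the function names, the use of base-2 logarithms, and the treatment of the gap character as an ordinary symbol are all assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(alignment):
    """Sum of column entropies for equal-length aligned sequences.

    Assumption: gaps ('-') are counted like any other symbol.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return sum(column_entropy(col) for col in zip(*alignment))
```

Under these assumptions, two candidate alignments of the same sequences can be compared by calling `alignment_entropy` on each and preferring the one with the lower total.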
16.
A comparison of two indirect methods for estimating average levels of gene flow using microsatellite data [Cited by: 1; self-citations: 0; citations by others: 1]
We compare the performance of Nm estimates based on FST and RST obtained from microsatellite data using simulations of the stepwise mutation model with range constraints in allele size classes. The results of the simulations suggest that the use of microsatellite loci can lead to serious overestimations of Nm, particularly when population sizes are large (N > 5000) and range constraints are high (K < 20). The simulations also indicate that, when population sizes are small (N = 500) and migration rates are moderate (Nm ≈ 2), violations of the assumption used to derive the Nm estimators lead to biased results. Under ideal conditions, i.e. large sample sizes (ns ≥ 50) and many loci (nl ≥ 20), RST performs better than FST for most of the parameter space. However, FST-based estimates are always better than RST when sample sizes are moderate or small (ns = 10) and the number of loci scored is low (nl < 20). These are the conditions under which many real investigations are carried out and therefore we conclude that in many cases the most conservative approach is to use FST.
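Indirect estimates of this kind commonly rest on Wright's island-model approximation FST ≈ 1/(4Nm + 1), which rearranges to Nm ≈ (1/FST − 1)/4. The sketch below applies that rearrangement; whether the study used exactly this estimator is an assumption, and the function name `nm_from_fst` is hypothetical.

```python
def nm_from_fst(fst):
    """Indirect gene-flow estimate from FST under Wright's island model.

    Rearranges FST ~= 1 / (4*Nm + 1) into Nm ~= (1/FST - 1) / 4.
    Assumption: this textbook relation stands in for the estimator
    discussed in the abstract, which does not state its formula.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0
```

Because Nm grows without bound as FST approaches zero, small downward errors in a low FST translate into large overestimates of Nm, which is consistent with the sensitivity the abstract reports.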
17.
Model-free methods are introduced to determine quantities pertaining to protein domain motions from normal mode analyses and molecular dynamics simulations. For the normal mode analysis, the methods are based on the assumption that in low frequency modes, domain motions can be well approximated by modes of motion external to the domains. To analyze the molecular dynamics trajectory, a principal component analysis tailored specifically to analyze interdomain motions is applied. A method based on the curl of the atomic displacements is described, which yields a sharp discrimination of domains, and which defines a unique interdomain screw-axis. Hinge axes are defined and classified as twist or closure axes depending on their direction. The methods have been tested on lysozyme. A remarkable correspondence was found between the first normal mode axis and the first principal mode axis, with both axes passing within 3 Å of the alpha-carbon atoms of residues 2, 39, and 56 of human lysozyme, and near the interdomain helix. The axes of the first modes are overwhelmingly closure axes. A lesser degree of correspondence is found for the second modes, but in both cases they are more twist axes than closure axes. Both analyses reveal that the interdomain connections allow only these two degrees of freedom, one more than provided by a pure mechanical hinge. Proteins 27:425–437, 1997. © 1997 Wiley-Liss, Inc.
18.
Numerous simulation studies have investigated the accuracy of phylogenetic inference of gene trees under maximum parsimony, maximum likelihood, and Bayesian techniques. The relative accuracy of species tree inference methods under simulation has received less study. The number of analytical techniques available for inferring species trees is increasing rapidly, and in this paper, we compare the performance of several species tree inference techniques at estimating recent species divergences using computer simulation. Simulating gene trees within species trees of different shapes and with varying tree lengths (T) and population sizes, and evolving sequences on those gene trees, allows us to determine how phylogenetic accuracy changes in relation to different levels of deep coalescence and phylogenetic signal. When the probability of discordance between the gene trees and the species tree is high (i.e., T is small and/or the population size is large), Bayesian species tree inference using the multispecies coalescent (BEST) outperforms other methods. The performance of all methods improves as the total length of the species tree is increased, which reflects the combined benefits of decreasing the probability of discordance between species trees and gene trees and gaining more accurate estimates for gene trees. Decreasing the probability of deep coalescences by reducing the population size also leads to accuracy gains for most methods. Increasing the number of loci from 10 to 100 improves accuracy under difficult demographic scenarios (i.e., coalescent units ≤ 4N(e)), but 10 loci are adequate for estimating the correct species tree in cases where deep coalescence is limited or absent. In general, the correlation between the phylogenetic accuracy and the posterior probability values obtained from BEST is high, although posterior probabilities are overestimated when the prior distribution for the population size parameter is misspecified.
19.
A comparison of some methods of cluster analysis [Cited by: 13; self-citations: 0; citations by others: 13]
J C Gower 《Biometrics》1967,23(4):623-637
20.
M. Singh R. K. Singh 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1984,67(4):323-326
Summary
A comparison among various forms of half-diallel analysis was made. The half-diallel techniques used were: Griffing's model I, methods 2 and 4; Morley-Jones' model; Walters and Morton's model; and Gardner and Eberhart's model. All these methods of diallel analysis were found to be interrelated. However, because Gardner and Eberhart's model partitioned heterosis into different components and also gave information about combining ability, this method had some advantages over the others. The results further indicated the possibility of dominance variance being confounded with the additive variance of general combining ability.