Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, toward the eventual goal of estimating a Tree of Life containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets at the low end of this range. One approach to estimating a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the best-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.

2.
A model for accurate drift estimation in streams
1. This paper explores the experimental difficulties involved with the use of drift nets in small streams, and outlines a method whereby the estimation of drift density (number of specimens m−3 of water) can be improved.
2. Changes in the filtering efficiency of the net caused by trapping of organic debris ('clogging') have the effect of reducing net entrance velocities, causing errors in the calculation of sampled water volume, and thus of drift density. A model of the reduction in net entrance velocity, based on empirical measurements of trapped debris, is developed.
3. Cross-sectional velocity calculations suggest that errors can also be introduced into drift density calculations by positioning sampling nets only on the bed. A method to allow for this effect is demonstrated.
4. As adjustments to the calculation of sampled volume are required when sampling in rivers that undergo marked changes in discharge during the sampling period, a method whereby these effects can be accommodated to improve drift density estimations is also outlined.
5. The results of this study imply that theoretical links between flow hydraulics and short-term drift behaviour are poorly understood.
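The drift-density calculation outlined above can be sketched in a few lines. This is a minimal illustration, not the paper's model: the function name, the net dimensions, and the single filtering-efficiency factor standing in for the empirically derived clogging correction are all assumptions.

```python
# Hedged sketch: drift density from a net sample, with a simple
# efficiency correction for clogging. The paper derives its correction
# from measured trapped debris; the factor here is illustrative.

def drift_density(n_specimens, entrance_velocity_ms, net_area_m2,
                  duration_s, filtering_efficiency=1.0):
    """Specimens per cubic metre of water filtered.

    filtering_efficiency < 1 models a clogged net whose effective
    entrance velocity is reduced (e.g. 0.8 = a 20% reduction).
    """
    effective_velocity = entrance_velocity_ms * filtering_efficiency
    volume_m3 = effective_velocity * net_area_m2 * duration_s
    return n_specimens / volume_m3

# Ignoring clogging overstates the filtered volume and therefore
# understates drift density:
raw = drift_density(120, 0.30, 0.05, 3600)             # assumes no clogging
corrected = drift_density(120, 0.30, 0.05, 3600, 0.8)  # 20% velocity loss
```

With the toy numbers above, the clogging-corrected density is higher than the uncorrected one, which is the direction of error the paper describes.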

3.
A simple method for accurate estimation of apoptotic cells
A simple, sensitive, and reliable "DNA diffusion" assay for the quantification of apoptosis is described. Human lymphocytes and human lymphoblastoid cells, MOLT-4, were exposed to 0, 12.5, 25, 50, or 100 rad of X-rays. After 24 h of incubation, cells were mixed with agarose, microgels were made, and cells were lysed in high salt and detergents. DNA was precipitated in microgels by ethanol. Staining of DNA was done with an intense fluorescent dye, YOYO-1. Apoptotic cells show a halo of granular DNA with a hazy outer boundary. Necrotic cells, resulting from hyperthermia treatment, on the other hand, show an unusually large homogeneous nucleus with a clearly defined boundary. The number of cells with apoptotic and necrotic appearance can be scored and quantified by using a fluorescent microscope. Results were compared with other methods of apoptosis measurement: morphological estimations of apoptosis and DNA ladder pattern formation in regular agarose gel electrophoresis. Validation of the technique was done using some known inducers of apoptosis and necrosis (hyperthermia, hydrogen peroxide, mitoxantrone, novobiocin, and sodium ascorbate).

4.
Rapid and accurate estimation of release conditions in the javelin throw
We have developed a system to measure initial conditions in the javelin throw rapidly enough to be used by the thrower for feedback in performance improvement. The system consists of three subsystems whose main tasks are: (A) acquisition of automatically digitized high-speed (200 Hz) video x, y position data for the first 0.1-0.2 s of the javelin flight after release; (B) estimation of five javelin release conditions from the x, y position data; and (C) graphical presentation to the thrower of these release conditions and a simulation of the subsequent flight, together with optimal conditions and flight for the same release velocity. The estimation scheme relies on a simulation model and is at least an order of magnitude more accurate than previously reported measurements of javelin release conditions. The system provides, for the first time in any throwing event, the ability to critique nearly instantly, in a precise and quantitative manner, the crucial factors in the throw that determine the range. This should be expected to lead to much greater control and consistency of throwing variables by athletes who use the system, and could even lead to the evolution of new throwing techniques.
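A drag-free sketch of subsystem (B): with gravity removed, both coordinates of the early flight are straight lines in time, so the release velocity components fall out of two least-squares line fits. The paper's actual scheme relies on a fuller simulation model; the function names and the synthetic 200 Hz data below are illustrative assumptions.

```python
import math

G = 9.81  # m/s^2

def fit_line(ts, vals):
    """Ordinary least-squares slope and intercept of vals against ts."""
    n = len(ts)
    mt = sum(ts) / n
    mv = sum(vals) / n
    slope = (sum((t - mt) * (v - mv) for t, v in zip(ts, vals))
             / sum((t - mt) ** 2 for t in ts))
    return slope, mv - slope * mt

def release_conditions(ts, xs, ys):
    """Estimate release speed and angle from early-flight positions.

    Gravity is added back to y before fitting, so both coordinates
    reduce to straight lines in t (drag neglected over 0.1-0.2 s).
    """
    vx, _ = fit_line(ts, xs)
    vy, _ = fit_line(ts, [y + 0.5 * G * t * t for t, y in zip(ts, ys)])
    return math.hypot(vx, vy), math.degrees(math.atan2(vy, vx))

# Synthetic 200 Hz data for a 25 m/s throw at 35 degrees:
ts = [i / 200 for i in range(30)]
vx0 = 25 * math.cos(math.radians(35))
vy0 = 25 * math.sin(math.radians(35))
xs = [vx0 * t for t in ts]
ys = [vy0 * t - 0.5 * G * t * t for t in ts]
speed, angle = release_conditions(ts, xs, ys)
```

On noise-free synthetic data the fit recovers the release speed and angle exactly; the value of the paper's simulation-based scheme is robustness on real, noisy digitized video.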

5.
6.
Affymetrix SNP arrays have been widely used for single-nucleotide polymorphism (SNP) genotype calling and DNA copy number variation inference. Although numerous methods have achieved high accuracy in these fields, most studies have paid little attention to the modeling of hybridization of probes to off-target allele sequences, which can affect the accuracy greatly. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We study sequence binding through binding free energy and then binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the estimation of ACs at a given SNP through statistical regression. Furthermore, we demonstrate with cell-line data of known true copy numbers that the PICR model can achieve reasonable accuracy in copy number estimation at a single SNP locus, by using the ratio of the estimated AC of each sample to that of the reference sample, and can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even across array platforms.
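The regression step can be illustrated in miniature: given per-probe binding affinities for the two allele sequences, the two allelic concentrations are the least-squares solution of a linear model for probe intensity. The affinities and numbers below are invented for illustration; PICR derives real affinities from binding free energies.

```python
def estimate_acs(intensities, aff_a, aff_b):
    """Least-squares estimate of the two allelic concentrations (cA, cB)
    from the linear model  intensity_i ~ cA*aff_a[i] + cB*aff_b[i],
    solved via the 2x2 normal equations."""
    saa = sum(a * a for a in aff_a)
    sbb = sum(b * b for b in aff_b)
    sab = sum(a * b for a, b in zip(aff_a, aff_b))
    sai = sum(a * i for a, i in zip(aff_a, intensities))
    sbi = sum(b * i for b, i in zip(aff_b, intensities))
    det = saa * sbb - sab * sab
    return (sbb * sai - sab * sbi) / det, (saa * sbi - sab * sai) / det

# Hypothetical affinities for four probes; note the off-target
# (mismatch) affinities are nonzero, which is the HWMMN effect.
aff_a = [1.0, 0.2, 0.8, 0.1]
aff_b = [0.1, 0.9, 0.3, 1.0]
# A sample whose true allelic concentrations are cA = 5, cB = 2:
intensities = [5 * a + 2 * b for a, b in zip(aff_a, aff_b)]
ca, cb = estimate_acs(intensities, aff_a, aff_b)
```

Copy number inference then uses the ratio of each sample's estimated AC to that of the reference sample, as described in the abstract.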

7.
Phi-values provide an important benchmark for the comparison of experimental protein folding studies to computer simulations and theories of the folding process. Despite the growing importance of phi measurements, however, formulas to quantify the precision with which phi is measured have received little discussion. Moreover, a commonly employed method for the determination of standard errors on phi estimates assumes that estimates of the changes in free energy of the transition and folded states are independent. Here we demonstrate that this assumption is usually incorrect and that this typically leads to the underestimation of phi precision. We derive an analytical expression for the precision of phi estimates (assuming linear chevron behavior) that explicitly takes this dependence into account. We also describe an alternative method that implicitly corrects for the effect. By simulating experimental chevron data, we show that both methods accurately estimate phi confidence intervals. We also explore the effects of the commonly employed techniques of calculating phi from kinetics estimated at non-zero denaturant concentrations and via the assumption of parallel chevron arms. We find that these approaches can produce significantly different estimates for phi (again, even for truly linear chevron behavior), indicating that they are not equivalent, interchangeable measures of transition state structure. Lastly, we describe a Web-based implementation of the above algorithms for general use by the protein folding community.
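A sketch of the point being made: first-order error propagation for phi as a ratio of two free-energy changes, with the covariance term retained. The specific numbers are hypothetical, and the paper's own analytical expression (for linear chevrons) is not reproduced here.

```python
import math

def phi_stderr(ddg_ts, ddg_eq, var_ts, var_eq, cov):
    """First-order error propagation for phi = ddg_ts / ddg_eq,
    keeping the covariance term that is dropped when the two
    free-energy changes are (wrongly) assumed independent."""
    phi = ddg_ts / ddg_eq
    rel_var = (var_ts / ddg_ts ** 2 + var_eq / ddg_eq ** 2
               - 2.0 * cov / (ddg_ts * ddg_eq))
    return phi, abs(phi) * math.sqrt(rel_var)

# Hypothetical estimates (kcal/mol and kcal^2/mol^2). A positive
# covariance shrinks the error bar relative to assuming independence:
phi, se_dep = phi_stderr(0.9, 1.8, 0.01, 0.02, 0.008)
_, se_indep = phi_stderr(0.9, 1.8, 0.01, 0.02, 0.0)
```

With these numbers the independence assumption (cov = 0) inflates the standard error, i.e. here it would overstate the uncertainty; the paper shows the dependence typically leads to misestimated phi precision.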

8.
Zhang SD. PLoS ONE. 2011;6(4):e18874.
BACKGROUND: Biomedical researchers are now often faced with situations where it is necessary to test a large number of hypotheses simultaneously, e.g., in comparative gene expression studies using high-throughput microarray technology. To properly control false positive errors, the FDR (false discovery rate) approach has become widely used in multiple testing. Accurate estimation of the FDR requires that the proportion of true null hypotheses be accurately estimated. To date, many methods for estimating this quantity have been proposed. Typically, when a new method is introduced, some simulations are carried out to show its improved accuracy, but these simulations are often limited to covering only a few points in the parameter space. RESULTS: Here I have carried out extensive in silico experiments to compare some commonly used methods for estimating the proportion of true null hypotheses. The coverage of these simulations over the parameter space is unprecedentedly thorough compared to typical simulation studies in the literature, which enables globally valid conclusions about the performance of these different methods. It was found that a very simple method gives the most accurate estimation over a dominantly large area of the parameter space. Given its simplicity and its overall superior accuracy, I recommend its use as the first choice for estimating the proportion of true null hypotheses in multiple testing.
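The abstract does not name the "very simple method" it recommends. One widely used simple estimator of the proportion of true nulls is Storey's lambda-based estimator, shown here purely as an illustration of the quantity being estimated, not as the paper's chosen method.

```python
import random

def pi0_storey(pvalues, lam=0.5):
    """Storey-style estimator of the proportion of true null
    hypotheses: under the null, p-values are uniform on [0, 1], so
    the density above lam is dominated by true nulls."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / ((1.0 - lam) * m))

# 800 uniform "null" p-values mixed with 200 tiny "signal" p-values;
# the true proportion of nulls is 0.8:
random.seed(0)
pvals = [random.random() for _ in range(800)] + [1e-6] * 200
pi0 = pi0_storey(pvals)
```

An accurate pi0 estimate then feeds directly into FDR estimation, which is the dependence the BACKGROUND paragraph describes.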

9.
10.
An experimental-numerical study was performed to investigate the relationships between computed tomography (CT)-density and ash density, and between ash density and apparent density for bone tissue, to evaluate their influence on the accuracy of subject-specific FE models of human bones. Sixty cylindrical bone specimens were examined. CT-densities were computed from CT images while apparent and ash densities were measured experimentally. The CT/ash-density and ash/apparent-density relationships were calculated. Finite element models of eight human femurs were generated considering these relationships to assess their effect on strain prediction accuracy. CT and ash density were linearly correlated (R(2)=0.997) over the whole density range but not equivalent (intercept < 0, slope > 1). A constant ash/apparent-density ratio (0.598+/-0.004) was found for cortical bone. A lower ratio, with a larger dispersion, was found for trabecular bone (0.459+/-0.100), but it became less dispersed, and equal to that of cortical tissue, when testing smaller trabecular specimens (0.598+/-0.036). This suggests that an experimental error occurred in apparent-density measurements for large trabecular specimens and a constant ratio can be assumed valid for the whole density range. Introducing the obtained relationships in the FE modelling procedure improved strain prediction accuracy (R(2)=0.95, RMSE=7%). The results suggest that: (i) a correction of the densitometric calibration should be used when evaluating bone ash-density from clinical CT scans, to avoid ash-density underestimation and overestimation for low- and high-density bone tissue, respectively; (ii) the ash/apparent-density ratio can be assumed constant in human femurs and (iii) the correction improves significantly the model accuracy and should be considered in subject-specific bone modelling.
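The two relationships can be wired together in a few lines. The calibration coefficients below are invented placeholders (the abstract reports only that the intercept is negative and the slope above 1, with R(2)=0.997); the ash/apparent ratio 0.598 is the cortical value it reports.

```python
def ash_from_ct(rho_ct, intercept=-0.09, slope=1.14):
    """Linear CT-to-ash density calibration (g/cm^3). The paper reports
    a linear relationship with intercept < 0 and slope > 1; these
    particular coefficients are illustrative placeholders."""
    return intercept + slope * rho_ct

def apparent_from_ash(rho_ash, ratio=0.598):
    """Apparent density from ash density via the constant
    ash/apparent-density ratio reported for human femurs."""
    return rho_ash / ratio

# Map an illustrative CT density through both relationships, as a
# subject-specific FE modelling pipeline would:
rho_ash = ash_from_ct(1.0)
rho_app = apparent_from_ash(rho_ash)
```

In an FE pipeline, rho_app would then feed a density-modulus law to assign element material properties; that last step is outside this abstract.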

11.
This essay draws attention to the Prison Reentry Industry's potential to create unprecedented employment opportunities for formerly incarcerated individuals. Situated at the intersection of money, programming, and state-sponsored surveillance, the Prison Reentry Industry (PRI) is notable for its implication in prolonging and deepening people's entrenchment in the criminal justice/reentry matrix; however, the burgeoning of a "reentry industry" also ensures the growth of employment positions tailored perfectly to the experiences of formerly incarcerated individuals. The PRI's production of significant employment opportunities for certain members of the formerly incarcerated population turns social science research on incarceration and employment on its head. It is by now almost conventional wisdom that a criminal history stands as a barrier to employment, but the PRI's potential to create a substantial job market for formerly incarcerated people may engender an extraordinary outcome in which a criminal record represents, for some, the factor leading to entrée into the ranks of the employed.

12.
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling it as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimation errors of the EMimpute methods are compared with those produced by our novel methods. The results indicate that, on average, the estimates from our best-performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
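A minimal sketch of the baseline being compared against (KNNimpute-style nearest-neighbour imputation); LSimpute's least-squares estimators themselves are not reproduced here, and the inverse-distance weighting and toy matrix are illustrative assumptions.

```python
import math

def knn_impute(matrix, k=2):
    """Minimal KNNimpute-style sketch: a missing entry (None) is
    replaced by a distance-weighted average of that column over the k
    rows most similar to the incomplete row (Euclidean distance on the
    columns both rows observe)."""
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = []
            for r, other in enumerate(matrix):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if not nearest:
                continue  # no informative neighbour: leave the gap
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            filled[i][j] = (sum(w * val for w, (_, val) in zip(weights, nearest))
                            / sum(weights))
    return filled

# Row 1 is nearly identical to row 0, so its missing value should land
# near row 0's value in that column (2.0):
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [9.0, 9.5, 10.0],
    [9.1, 9.4, 10.2],
]
imputed = knn_impute(data, k=2)
```

LSimpute improves on this baseline by exploiting correlations among both genes (rows) and arrays (columns) through least-squares regression rather than plain neighbour averaging.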

13.
New applications of DNA and RNA sequencing are expanding the field of biodiversity discovery and ecological monitoring, yet questions remain regarding precision and efficiency. Due to primer bias, the ability of metabarcoding to accurately depict biomass of different taxa from bulk communities remains unclear, while PCR-free whole mitochondrial genome (mitogenome) sequencing may provide a more reliable alternative. Here, we used a set of documented mock communities comprising 13 species of freshwater macroinvertebrates of estimated individual biomass, to compare the detection efficiency of COI metabarcoding (three different amplicons) and shotgun mitogenome sequencing. Additionally, we used individual COI barcoding and de novo mitochondrial genome sequencing, to provide reference sequences for OTU assignment and metagenome mapping (mitogenome skimming), respectively. We found that, even though both methods occasionally failed to recover very low abundance species, metabarcoding was less consistent, by failing to recover some species with higher abundances, probably due to primer bias. Shotgun sequencing results provided highly significant correlations between read number and biomass in all but one species. Conversely, the read-biomass relationships obtained from metabarcoding varied across amplicons. Specifically, we found significant relationships for eight of 13 (amplicons B1FR-450 bp, FF130R-130 bp) or four of 13 (amplicon FFFR, 658 bp) species. Combining the results of all three COI amplicons (multiamplicon approach) improved the read-biomass correlations for some of the species. Overall, mitogenomic sequencing yielded more informative predictions of biomass content from bulk macroinvertebrate communities than metabarcoding. However, for large-scale ecological studies, metabarcoding currently remains the most commonly used approach for diversity assessment.
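The read-biomass relationships reported above come down to correlating per-species read counts with estimated biomass. A self-contained Pearson correlation, with invented toy numbers, sketches that calculation; the study's actual counts and significance tests are not reproduced.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-species biomass (mg) and mitogenome read counts for
# five species of a mock community:
biomass = [2.0, 5.5, 1.2, 8.0, 3.3]
reads = [210, 560, 140, 820, 335]
r = pearson(biomass, reads)
```

A near-linear read-biomass relationship (r close to 1) is what makes PCR-free mitogenome read counts usable as a biomass proxy; primer bias in metabarcoding is what degrades this relationship per amplicon.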

14.
The tube building polychaete Lanice conchilega is a common and ecologically important species in intertidal and shallow subtidal sands. It builds a characteristic tube with ragged fringes and can retract rapidly into its tube to depths of more than 20 cm. Therefore, it is very difficult to sample L. conchilega individuals, especially with a Van Veen grab. Consequently, many studies have used tube counts as estimates of real densities. This study reports on some aspects to be considered when using tube counts as a density estimate of L. conchilega, based on intertidal and subtidal samples. Due to its accuracy and independence of sampling depth, the tube method is considered the prime method to estimate the density of L. conchilega. However, caution is needed when analyzing samples with fragile young individuals and samples from areas where temporary physical disturbance is likely to occur.

15.
Given the absence of universal marker genes in the viral kingdom, researchers typically use BLAST (with stringent E-values) for taxonomic classification of viral metagenomic sequences. Since the majority of metagenomic sequences originate from hitherto unknown viral groups, using stringent E-values results in most sequences remaining unclassified. Furthermore, using less stringent E-values results in a high number of incorrect taxonomic assignments. The SOrt-ITEMS algorithm provides an approach to address the above issues. Based on alignment parameters, SOrt-ITEMS follows an elaborate work-flow for assigning reads originating from hitherto unknown archaeal/bacterial genomes. In SOrt-ITEMS, alignment parameter thresholds were generated by observing patterns of sequence divergence within and across various taxonomic groups belonging to the bacterial and archaeal kingdoms. However, many taxonomic groups within the viral kingdom lack a typical Linnean-like taxonomic hierarchy. In this paper, we present ProViDE (Program for Viral Diversity Estimation), an algorithm that uses a customized set of alignment parameter thresholds, specifically suited for viral metagenomic sequences. These thresholds capture the pattern of sequence divergence and the non-uniform taxonomic hierarchy observed within and across various taxonomic groups of the viral kingdom. Validation results indicate that the percentage of 'correct' assignments by ProViDE is around 1.7 to 3 times higher than that by the widely used similarity-based method MEGAN. The misclassification rate of ProViDE is around 3 to 19% (as compared to 5 to 42% by MEGAN), indicating significantly better assignment accuracy. The ProViDE software and a supplementary file (containing the supplementary figures and tables referred to in this article) are available for download from http://metagenomics.atc.tcs.com/binning/ProViDE/
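A toy sketch of threshold-based rank assignment, the general idea behind both SOrt-ITEMS and ProViDE: the more divergent the best alignment, the shallower the taxonomic rank at which a read may safely be assigned. ProViDE's real thresholds are empirically derived over several alignment parameters and per taxonomic group; the single percent-identity cutoffs below are invented for illustration only.

```python
# Illustrative (invented) percent-identity cutoffs, ordered deepest first:
LEVELS = [(90.0, "species"), (80.0, "genus"), (60.0, "family")]

def assign_level(percent_identity):
    """Deepest taxonomic rank at which a read may be assigned, given
    the percent identity of its best alignment; reads too divergent
    for any cutoff stay unassigned rather than being misassigned."""
    for threshold, level in LEVELS:
        if percent_identity >= threshold:
            return level
    return "unassigned"
```

The trade-off the abstract describes is visible even in this toy: raising the cutoffs leaves more reads unassigned, while lowering them inflates incorrect assignments.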

16.
17.
18.
Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted, but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller, more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees, and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense: for all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair, but rather due to the particular divide-and-conquer realignment techniques employed.

19.
MOTIVATION: Time-series measurements of metabolite concentration have become increasingly more common, providing data for building kinetic models of metabolic networks using ordinary differential equations (ODEs). In practice, however, such time-course data are usually incomplete and noisy, and the estimation of kinetic parameters from these data is challenging. Practical limitations due to data and computational aspects, such as solving stiff ODEs and finding global optimal solution to the estimation problem, give motivations to develop a new estimation procedure that can circumvent some of these constraints. RESULTS: In this work, an incremental and iterative parameter estimation method is proposed that combines and iterates between two estimation phases. One phase involves a decoupling method, in which a subset of model parameters that are associated with measured metabolites, are estimated using the minimization of slope errors. Another phase follows, in which the ODE model is solved one equation at a time and the remaining model parameters are obtained by minimizing concentration errors. The performance of this two-phase method was tested on a generic branched metabolic pathway and the glycolytic pathway of Lactococcus lactis. The results showed that the method is efficient in getting accurate parameter estimates, even when some information is missing.
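The decoupling phase can be sketched for the simplest possible case, a single reaction dM/dt = k*S with both S and M measured: finite-difference slopes of the product are regressed on the substrate, so k is estimated without solving any ODE. The function and synthetic data are illustrative assumptions; the paper's method iterates between this slope-error phase and a concentration-error phase on the full model.

```python
import math

def estimate_rate_constant(ts, s_vals, m_vals):
    """Slope-error sketch of the decoupling idea: approximate dM/dt by
    finite differences and fit dM/dt = k * S by least squares (through
    the origin), so no ODE needs to be solved while estimating k."""
    slopes, mids = [], []
    for i in range(len(ts) - 1):
        dt = ts[i + 1] - ts[i]
        slopes.append((m_vals[i + 1] - m_vals[i]) / dt)
        mids.append(0.5 * (s_vals[i + 1] + s_vals[i]))
    return sum(s * m for s, m in zip(slopes, mids)) / sum(m * m for m in mids)

# Synthetic time course for dM/dt = k*S with S(t) = exp(-t) and k = 2,
# which gives M(t) = 2*(1 - exp(-t)):
ts = [0.05 * i for i in range(40)]
s = [math.exp(-t) for t in ts]
m = [2.0 * (1.0 - math.exp(-t)) for t in ts]
k_hat = estimate_rate_constant(ts, s, m)
```

The small residual error in k_hat is pure discretization error from the finite differences; on noisy real data this phase only initializes the parameters, which is why the second, concentration-error phase follows.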

20.
An innovative approach is proposed for the estimation of hourly average solar radiation on a tilted surface. The proposed approach, which is based on artificial neural networks and problem decomposition, demonstrates the following characteristics: accuracy (superior estimation is attained over both conventional approaches and theoretical models); simplicity and efficiency (a small training set is employed, and the training/test patterns involve few easily obtainable parameters); and generalization capability and robustness to noise. Apart from being of interest in meteorology, the accurate and efficient estimation of hourly average solar radiation on a tilted surface is especially important in solar energy applications.
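As a sketch of the kind of model involved: the forward pass of a one-hidden-layer network with sigmoid units, the sort of small feed-forward model that maps a few easily obtainable inputs to an estimate of tilted-surface radiation. The inputs and weights below are hypothetical and untrained; the paper's actual architecture, decomposition, and input parameters are not given in the abstract.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network: sigmoid hidden
    units, linear output (suitable for a regression target such as
    hourly average radiation)."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# Toy normalized inputs (hypothetical: e.g. hour of day, horizontal
# radiation, tilt angle) and untrained toy weights:
x = [0.5, 0.3, 0.8]
y = mlp_forward(x,
                w_hidden=[[0.2, -0.1, 0.4], [0.7, 0.1, -0.3]],
                b_hidden=[0.0, 0.1],
                w_out=[1.5, -0.5],
                b_out=0.2)
```

Training such a network (e.g. by gradient descent on squared error against measured radiation) is what the paper's "small training set" refers to; only the inference step is sketched here.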
