Similar Articles
20 similar articles found (search time: 31 ms)
1.
Methods of identification and spectral estimation are applied to the tachogram, i.e. the time series of cycle-by-cycle R-R interval durations measured on the ECG signal from cardiological patients in ambulatory rehabilitation training after episodes of myocardial infarction or ischemic disease. The batch least squares method is used to identify the series as an AR process of 5th order; the whiteness test and Rissanen's optimization criterion are also satisfied. The clinical information is thereby highly compressed into the pole diagram and the Maximum Entropy Spectrum (MES) estimated from the AR coefficients. Experimental results in a restricted set of patients confirm the feasibility of new instrumentation design criteria for non-conventional R-R interval parametrisation, subsequent diagnostic classification and beat prediction. Finally, some preliminary considerations on the capabilities of the introduced methods highlight the role of computerized techniques in recognizing fundamental patterns of physiopathological heart rate variability that conventional methods of ECG analysis cannot detect reliably.
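
As a rough, stand-alone illustration of this kind of AR parametrisation (not the authors' implementation), the Python sketch below fits an AR(5) model to a synthetic R-R series by batch least squares and derives a maximum-entropy spectrum from the coefficients; the `rr` series and all numbers are invented for the example.

```python
import numpy as np

# Hypothetical R-R interval series (seconds); real data would come from an ECG.
rng = np.random.default_rng(0)
rr = 0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * np.arange(300)) + 0.02 * rng.standard_normal(300)

def fit_ar_ls(x, order=5):
    """Fit an AR(order) model by batch least squares: x[t] = sum_k a_k x[t-k] + e[t]."""
    x = x - x.mean()
    X = np.column_stack([x[order - k - 1: len(x) - k - 1] for k in range(order)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ a
    return a, resid.var()

def ar_spectrum(a, sigma2, nfreq=256):
    """Maximum-entropy (AR) spectrum computed from the estimated coefficients."""
    freqs = np.linspace(0, 0.5, nfreq)          # cycles per beat
    k = np.arange(1, len(a) + 1)
    denom = np.abs(1 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a) ** 2
    return freqs, sigma2 / denom

a, sigma2 = fit_ar_ls(rr, order=5)
freqs, spec = ar_spectrum(a, sigma2)
print("AR coefficients:", np.round(a, 3))
```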

2.
3.
Although much of the information regarding genes' expression is encoded in the genome, deciphering this information has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably to BT's 73% accuracy. The fact that our approach did not use position and orientation information of the predicted binding sites but achieved a higher prediction accuracy motivated us to investigate several biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.
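
A minimal sketch of the classification setup described here, using scikit-learn's GaussianNB on made-up motif-score features; in the actual study the features were BT's sequence-motif matching scores and the labels were expression clusters.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Hypothetical data: rows = genes, columns = motif-matching scores,
# y = expression-cluster label. Real scores would come from promoter scans.
rng = np.random.default_rng(1)
X = rng.random((500, 40))          # 500 genes, 40 motif scores
y = rng.integers(0, 5, size=500)   # 5 expression clusters

clf = GaussianNB()
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"5-fold CV accuracy: {acc:.2f}")
```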

4.
This article investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exists for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of conceptual assumptions that can affect the model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our cross-validation study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods, with improvements ranging from 45 to 73% over other method classes. This study also compares BMA's predictive performance to other ensemble-based techniques and demonstrates that BMA can outperform these approaches, with improvements ranging from 27 to 60%. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy. Proteins 2014; 82:354–363. © 2013 Wiley Periodicals, Inc.
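
The following sketch illustrates the general BMA idea of weighting component predictors by an approximate posterior model probability. It uses a simple Gaussian-likelihood weighting on invented pKa numbers and is not the specific BMA implementation used in the study.

```python
import numpy as np

# Hypothetical: preds[m, i] = pKa predicted by method m for residue i,
# obs[i] = measured pKa on a calibration subset. All values are made up.
rng = np.random.default_rng(2)
obs = rng.normal(6.0, 1.5, size=30)
preds = obs + rng.normal(0, [[0.3], [0.6], [1.0]], size=(3, 30))  # 3 methods

def bma_weights(preds, obs):
    """Approximate posterior model weights from Gaussian log-likelihoods (equal priors)."""
    n = obs.size
    sigma2 = ((preds - obs) ** 2).mean(axis=1)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    w = np.exp(loglik - loglik.max())
    return w / w.sum()

w = bma_weights(preds, obs)
bma_pred = w @ preds            # weighted-average prediction for each residue
print("model weights:", np.round(w, 3))
```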

5.
Instead of assessing the overall fit of candidate models, as traditional model selection criteria do, the focused information criterion concentrates directly on the parameter of primary interest and aims to select the model with the minimum estimated mean squared error for the estimate of the focused parameter. In this article we apply the focused information criterion to personalized medicine. By using individual-level information from clinical observations, demographics, and genetics, we obtain personalized predictive models to make prognoses and diagnoses individually. Accounting for the heterogeneity among individuals helps reduce prediction uncertainty and improve prediction accuracy. Two real data examples from biomedical research are studied as illustrations.

6.
A prediction model for the growth of plant height is developed, using polynomials in time to describe the growth rate. The growth rate is affected by forcing factors through the polynomial coefficients. A random slope model is used to describe the difference in growth rate among plants grown under similar conditions. Maximum likelihood estimates of the model parameters are obtained, and a selection procedure is employed to choose the model complexity, using Schwarz's Bayesian criterion as a measure of predictive power. The procedure is applied to data sets for greenhouse-grown poinsettias. The use of polynomials to describe the time effects on the growth rate makes the strategy versatile, and it can be used to predict the growth of many different crops. Many forcing factors of different types can be incorporated simultaneously in the model. Confidence in the predictions is also quantified, which is important when the results are applied in a practical situation, e.g. in climate control of commercial greenhouses.
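
A simplified sketch of the model-complexity selection step, assuming made-up height measurements and ignoring the random-slope (per-plant) component and the forcing factors; it only shows how Schwarz's Bayesian criterion (BIC) can choose the polynomial order.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical plant-height data: height measured weekly for 20 plants.
rng = np.random.default_rng(3)
t = np.tile(np.arange(1, 13), 20)                       # 12 weeks x 20 plants
height = 5 + 2.2 * t - 0.05 * t**2 + rng.normal(0, 1.5, t.size)

best = None
for degree in range(1, 5):                              # candidate polynomial orders
    X = sm.add_constant(np.column_stack([t**d for d in range(1, degree + 1)]))
    fit = sm.OLS(height, X).fit()
    if best is None or fit.bic < best[1]:
        best = (degree, fit.bic)
print(f"degree selected by Schwarz/BIC: {best[0]} (BIC = {best[1]:.1f})")
```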

7.
Multilocus coalescent methods for inferring species trees or historical demographic parameters typically require the assumption that gene trees for sampled SNPs or DNA sequence loci are conditionally independent given their species tree. In practice, researchers have used different criteria to delimit "independent loci." One criterion identifies sampled loci as being independent of each other if they undergo Mendelian independent assortment (IA criterion). O'Neill et al. (2013, Molecular Ecology, 22, 111–129) used this approach in their phylogeographic study of the North American tiger salamander species complex. In two other studies, researchers developed a pair of related methods that employ an independent genealogies criterion (IG criterion), which considers the effects of population-level recombination on correlations between the gene trees of intrachromosomal loci. Here, I explain these three methods, illustrate their use with example data, and evaluate their efficacies. I show that the IA approach is more conservative, simpler to use, and requires fewer assumptions than the IG approaches. However, IG approaches can identify much larger numbers of independent loci than the IA method, which in turn allows researchers to obtain more precise and accurate estimates of species trees and historical demographic parameters. A disadvantage of the IG methods is that they require an estimate of the population recombination rate. Despite their drawbacks, IA and IG approaches provide molecular ecologists with promising a priori methods for selecting SNPs or DNA sequence loci that are likely to meet the independence assumption in coalescent-based phylogenomic studies.

8.
Five methods to assess the percolation rate from alternative earthen final covers (AEFCs) are described in the context of the precision with which the percolation rate can be estimated: trend analysis, tracer methods, the water balance method, Darcy's Law calculations, and lysimetry. Trend evaluation of water content data is the least precise method because it cannot be used alone to assess the percolation rate. The precision of percolation rates estimated using tracer methods depends on the tracer concentration, the percolation rate, and the sensitivity of the chemical extraction and analysis methods. Percolation rates determined using the water balance method have a precision of approximately 100 mm/yr in humid climates and 50 mm/yr in semiarid and drier climates, which is too large to demonstrate that an AEFC is meeting a typical equivalency criterion (30 mm/yr or less); in most cases, the precision will be much poorer. Percolation rates computed using Darcy's Law with measured profiles of water content and matric suction typically have a precision that is about two orders of magnitude (or more) greater than the computed percolation rate, so the Darcy's Law method can only be used for performance assessment if the estimated percolation rate is much smaller than the equivalency criterion and preferential flow is not present. Lysimetry provides the most precise estimates of percolation rate, but the precision depends on the method used to measure the collected water. The lysimeter used in the Alternative Cover Assessment Program (ACAP), which is described in this paper, can be used to estimate percolation rates with a precision between 0.00004 and 0.5 mm/yr, depending on the measurement method and the flow rates.
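
A back-of-the-envelope sketch of the Darcy's Law approach mentioned above, with invented conductivity, suction, and elevation values; in practice K would come from a measured K(θ) relationship and the head values from sensors in the cover profile.

```python
# Minimal Darcy's Law sketch for percolation through a cover profile.
# All numbers are illustrative.
K = 1e-7                             # unsaturated hydraulic conductivity, cm/s
h_upper, h_lower = -150.0, -120.0    # matric suction head at two sensors, cm
z_upper, z_lower = -50.0, -100.0     # sensor elevations (below surface), cm

# Total head H = matric head + elevation; Darcy flux magnitude |q| = K * |dH/dz|
H_upper = h_upper + z_upper
H_lower = h_lower + z_lower
q = K * abs(H_upper - H_lower) / abs(z_upper - z_lower)   # cm/s

percolation_mm_per_yr = q * 10 * 3600 * 24 * 365          # convert cm/s to mm/yr
print(f"estimated percolation: {percolation_mm_per_yr:.1f} mm/yr")
```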

9.
Rates of molecular evolution vary widely between lineages, but quantification of how rates change has proven difficult. Recently proposed estimation procedures have mainly adopted highly parametric approaches that model rate evolution explicitly. In this study, a semiparametric smoothing method is developed using penalized likelihood. A saturated model in which every lineage has a separate rate is combined with a roughness penalty that discourages rates from varying too much across a phylogeny. A data-driven cross-validation criterion is then used to determine an optimal level of smoothing. This criterion is based on an estimate of the average prediction error associated with pruning lineages from the tree. The methods are applied to three data sets of six genes across a sample of land plants. Optimally smoothed estimates of absolute rates entailed 2- to 10-fold variation across lineages.
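
The sketch below illustrates the generic penalized-likelihood idea (a fit-to-data term plus a roughness penalty, with the smoothing level chosen by cross-validation) on a one-dimensional series; the actual method operates on rates across a phylogeny and prunes lineages rather than points, so this is only an analogy.

```python
import numpy as np

# Noisy signal standing in for lineage-specific rates (values invented).
rng = np.random.default_rng(4)
y = np.sin(np.linspace(0, 3, 60)) + rng.normal(0, 0.3, 60)

def penalized_fit(y, lam):
    """Minimize ||y - f||^2 + lam * ||D2 f||^2 (roughness penalty on 2nd differences)."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

def loo_cv_error(y, lam):
    """Leave-one-out prediction error, analogous to the pruning criterion in the paper."""
    err = 0.0
    for i in range(len(y)):
        f_del = penalized_fit(np.delete(y, i), lam)
        # predict the held-out point from its neighbours' fitted values
        pred = f_del[max(i - 1, 0): i + 1].mean()
        err += (y[i] - pred) ** 2
    return err / len(y)

lams = [0.1, 1, 10, 100]
best_lam = min(lams, key=lambda l: loo_cv_error(y, l))
print("smoothing level chosen by cross-validation:", best_lam)
```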

10.
Tao Sun & Ying Ding, Biometrics 2023, 79(3): 2677–2690
Alzheimer's disease (AD) is a progressive and polygenic disorder that affects millions of individuals each year. Given that few effective treatments for AD exist to date, it is highly desirable to develop an accurate model that predicts the full disease progression profile from an individual's genetic characteristics, for early prevention and clinical management. This work uses data from all four phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, comprising 1740 individuals with 8 million genetic variants. We tackle several challenges in these data: large-scale genetic information, an interval-censored outcome due to intermittent assessments, and left truncation in one study phase (ADNIGO). Specifically, we first develop a semiparametric transformation model for interval-censored and left-truncated data and estimate its parameters through a sieve approach. We then propose a computationally efficient generalized score test to identify variants associated with AD progression. Next, we implement a novel neural network for interval-censored data (NN-IC) to construct a prediction model using the top variants identified by the genome-wide test. Comprehensive simulation studies show that NN-IC outperforms several existing methods in terms of prediction accuracy. Finally, we apply NN-IC to the full ADNI data and successfully identify subgroups with differential progression risk profiles. Data used in the preparation of this article were obtained from the ADNI database.

11.
A recurring methodological problem in the evaluation of the predictive validity of selection methods is that the values of the criterion variable are available for selected applicants only. This so-called range restriction problem causes biased population estimates. Correction methods for direct and indirect range restriction scenarios have been widely studied for continuous criterion variables, but not for dichotomous ones. The few existing approaches are inapplicable because they do not consider the unknown base rate of success. Hence, there is a lack of scientific research on suitable correction methods and on the systematic analysis of their accuracy in the case of a naturally or artificially dichotomous criterion. We aim to overcome this deficiency by viewing the range restriction problem as a missing data mechanism. We used multiple imputation by chained equations to generate complete criterion data before estimating the predictive validity and the base rate of success. Monte Carlo simulations were conducted to investigate the accuracy of the proposed correction as a function of selection ratio, predictive validity, and base rate of success in an experimental design. In addition, we compared our proposed missing data approach with Thorndike's well-known correction formulas, which so far have only been used in the case of continuous criterion variables. The results show that the missing data approach is more accurate in estimating the predictive validity than Thorndike's correction formulas. The accuracy of our proposed correction increases as the selection ratio and the correlation between predictor and criterion increase. Furthermore, the missing data approach provides a valid estimate of the unknown base rate of success. On the basis of our findings, we argue for the use of multiple imputation by chained equations in the evaluation of the predictive validity of selection methods when the criterion is dichotomous.
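
A toy simulation of the proposed correction, using scikit-learn's IterativeImputer as a single chained-equations imputation; a full analysis would pool several imputations and use a model suited to a binary criterion. All data are simulated.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Simulated selection scenario (all numbers invented): applicants are selected on
# the predictor x; the dichotomous criterion y is observed for selected applicants only.
rng = np.random.default_rng(5)
n = 2000
x = rng.standard_normal(n)
y = (0.6 * x + rng.standard_normal(n) > 0).astype(float)   # true success indicator
selected = x > np.quantile(x, 0.7)                          # 30% selection ratio

y_obs = y.copy()
y_obs[~selected] = np.nan                                   # range-restricted criterion

# Chained-equations imputation of the missing criterion values
data = np.column_stack([x, y_obs])
imputed = IterativeImputer(sample_posterior=True, random_state=0).fit_transform(data)
y_imp = np.clip(np.round(imputed[:, 1]), 0, 1)              # back to a 0/1 criterion

print("restricted validity :", np.corrcoef(x[selected], y[selected])[0, 1].round(2))
print("corrected validity  :", np.corrcoef(x, y_imp)[0, 1].round(2))
print("estimated base rate :", y_imp.mean().round(2))
```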

12.
Quantitative predictions in computational life sciences are often based on regression models. The advent of machine learning has led to highly accurate regression models that have gained widespread acceptance. While statistical methods are available to estimate the global performance of regression models on a test or training dataset, it is often not clear how well this performance transfers to other datasets or how reliable an individual prediction is, a fact that often reduces a user's trust in a computational method. In analogy to the concept of an experimental error, we sketch how estimators for individual prediction errors can be used to provide confidence intervals for individual predictions. Two novel statistical methods, named CONFINE and CONFIVE, can estimate the reliability of an individual prediction based on the local properties of nearby training data. The methods can be applied equally to linear and non-linear regression methods with very little computational overhead. We compare our confidence estimators with other existing confidence and applicability domain estimators on two biologically relevant problems (MHC–peptide binding prediction and quantitative structure-activity relationship (QSAR)). Our results suggest that the proposed confidence estimators perform comparably to or better than previously proposed estimation methods. Given a sufficient amount of training data, the estimators provide error estimates of high quality. In addition, we observed that the quality of estimated confidence intervals is predictable. We discuss how confidence estimation is influenced by noise, the number of features, and the dataset size. Estimating the confidence of individual predictions in terms of error intervals represents an important step from plain, non-informative predictions towards transparent and interpretable predictions that will help to improve the acceptance of computational methods in the biological community.
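
The sketch below conveys the underlying idea of local, training-data-based error estimates, using a simple nearest-neighbour residual quantile; it is not the CONFINE or CONFIVE algorithm itself, and all data and model choices are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import Ridge

# Synthetic regression problem standing in for, e.g., MHC-peptide binding prediction.
rng = np.random.default_rng(6)
X_train = rng.random((300, 10))
y_train = X_train @ rng.random(10) + rng.normal(0, 0.2, 300)
x_query = rng.random((1, 10))

model = Ridge().fit(X_train, y_train)
resid = np.abs(y_train - model.predict(X_train))   # training residuals (optimistic)

nn = NearestNeighbors(n_neighbors=20).fit(X_train)
_, idx = nn.kneighbors(x_query)
local_err = np.quantile(resid[idx[0]], 0.9)        # 90th percentile of nearby residuals

pred = model.predict(x_query)[0]
print(f"prediction {pred:.2f} +/- {local_err:.2f} (local error estimate)")
```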

13.
P. Hansen, Bioacoustics 2013, 22(1): 61–78

A common task for researchers of animal vocalisations is statistically comparing repertoires, or sets of vocalisations. We evaluated five methods of comparing repertoires of ‘codas’, short repeated patterns of clicks, recorded from sperm whale (Physeter macrocephalus) groups. Three of the methods involved classification of codas: human observer classification, k-means cluster analysis using Calinski and Harabasz's (1974) criterion to determine k, and a divisive k-means clustering procedure using Duda and Hart's (1973) criterion to determine k. The two other methods used multivariate distances to calculate similarity measures between coda repertoires. When used on a sample coda dataset, observer classification failed to produce consistent results. Calinski and Harabasz's criterion did not provide a clear signal for determining the number of coda classes (k). Divisive clustering using Duda and Hart's criterion performed satisfactorily and, encouragingly, gave similar results to the multivariate similarity measures when used on our data. However, the relative performance of the k-means techniques is likely data dependent, so no one method is likely to perform best in all circumstances; results should therefore be checked to ensure they extract logical clusters. Using these techniques concurrently with multivariate measures allows relatively robust conclusions about repertoire similarity to be drawn while minimising uncertainties due to the questionable validity of classifications.
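
A minimal sketch of the k-means-plus-Calinski-Harabasz step on invented coda features, using scikit-learn; the real features would be measured inter-click intervals.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

# Hypothetical coda features: 4 inter-click intervals for 200 codas (values invented).
rng = np.random.default_rng(7)
codas = np.vstack([rng.normal(m, 0.05, (50, 4)) for m in (0.2, 0.4, 0.6, 0.9)])

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(codas)
    scores[k] = calinski_harabasz_score(codas, labels)

best_k = max(scores, key=scores.get)
print("k chosen by the Calinski-Harabasz criterion:", best_k)
```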

14.

Key message

We propose a criterion to predict genomic selection efficiency in structured populations. This criterion is useful for defining an optimal calibration set and for estimating prediction reliability in multiparental populations.

Abstract

Genomic selection refers to the use of genotypic information for predicting the performance of selection candidates. It has been shown that prediction accuracy depends on various parameters, including the composition of the calibration set (CS). Assessing the expected accuracy of a given prediction scenario is of the highest importance: it can be used to optimize CS sampling before collecting phenotypes, and once the breeding values are predicted it informs breeders about the reliability of these predictions. Different criteria have been proposed to optimize CS sampling in highly diverse panels, which can be useful for screening collections of genotypes. But plant breeders often work on structured material such as biparental or multiparental populations, for which these criteria are less well adapted. We derived from the generalized coefficient of determination (CD) theory different criteria to optimize CS sampling and to assess the reliability associated with predictions in structured populations. These criteria were evaluated on two nested association mapping (NAM) populations and two highly diverse panels of maize. They were effective for sampling optimized CSs in most situations. They could also at least partly estimate the reliability associated with predictions between NAM families, but they could not capture differences in the reliability of predicting NAM families when the highly diverse panels were used as calibration sets. We illustrate that the CD criteria can be adapted to various prediction scenarios, including inter- and intra-family predictions, resulting in higher prediction accuracies.

15.
I show how one can estimate the shape of a thermal performance curve using information theory. This approach ranks plausible models by their Akaike information criterion (AIC), a measure of a model's ability to describe the data discounted by the model's complexity. I analyze previously published data to demonstrate how one applies this approach to describe a thermal performance curve. This example analysis produced two interesting results. First, a model with a very high r2 (a modified Gaussian function) appeared to overfit the data. Second, the model favored by information theory (a Gaussian function) has been used widely in optimality studies of thermal performance curves. Finally, I discuss the choice between regression and ANOVA when comparing thermal performance curves and highlight a superior method called template mode of variation. Much progress can be made by abandoning traditional methods in favor of one that combines information theory with template mode of variation.
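
A small sketch of the AIC comparison described here, fitting a Gaussian and a modified-Gaussian performance curve to invented temperature-performance data with scipy; the functional forms are stand-ins and not necessarily those used in the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Invented performance-vs-temperature data just to illustrate the AIC comparison.
temp = np.arange(10, 41, 2.0)
perf = np.exp(-0.5 * ((temp - 30) / 6) ** 2) + np.random.default_rng(8).normal(0, 0.05, temp.size)

def gaussian(t, a, mu, sigma):
    return a * np.exp(-0.5 * ((t - mu) / sigma) ** 2)

def modified_gaussian(t, a, mu, sigma, c):
    return a * np.exp(-0.5 * (np.abs(t - mu) / sigma) ** c)

def aic(y, yhat, k):
    """AIC for least-squares fits: n*log(RSS/n) + 2k (additive constant dropped)."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

for name, f, p0 in [("Gaussian", gaussian, (1, 30, 5)),
                    ("modified Gaussian", modified_gaussian, (1, 30, 5, 2))]:
    popt, _ = curve_fit(f, temp, perf, p0=p0, bounds=(0, np.inf))
    print(name, "AIC =", round(aic(perf, f(temp, *popt), len(popt)), 1))
```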

16.

Background  

Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data.
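
A compact illustration of the issue being evaluated: reporting the tuned classifier's own CV score is optimistic, whereas nested cross-validation evaluates the whole tuning procedure on held-out folds. The data and model are synthetic scikit-learn stand-ins.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=200, n_features=50, n_informative=5, random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(), param_grid, cv=5)

# Biased estimate: the same CV folds are used both to tune and to evaluate.
search.fit(X, y)
print("CV score of the tuned classifier (optimistic):", round(search.best_score_, 3))

# Nested CV: an outer loop evaluates the entire tuning procedure on held-out folds.
nested = cross_val_score(search, X, y, cv=5)
print("nested CV estimate:", round(nested.mean(), 3))
```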

17.
For current state-of-the-art methods, the accuracy of membrane protein topology prediction has been reported to be above 80%. However, this performance has only been observed on small and possibly biased data sets obtained from protein structures or biochemical assays. Here, we test a number of topology predictors on an "unseen" set of proteins of known structure and also on four "genome-scale" data sets, including one recent large set of experimentally validated human membrane proteins with glycosylated sites. The set of glycosylated proteins is also used to examine the ability of prediction methods to separate membrane from nonmembrane proteins. The results show that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method that combines several of the other prediction methods. The best methods for distinguishing membrane from nonmembrane proteins belong to the "Phobius" group of predictors. We further observe that the high accuracies reported on the smaller benchmark sets are not quite maintained in larger-scale benchmarks. Instead, we estimate the performance of the best prediction methods for eukaryotic membrane proteins to be between 60% and 70%. The low agreement between predictions from different methods calls into question earlier estimates of the global properties of the membrane proteome. Finally, we suggest a pipeline to estimate these properties using a combination of the best predictors that could be applied in large-scale proteomics studies of membrane proteins.

18.
Borchers DL & Efford MG, Biometrics 2008, 64(2): 377–385
Live-trapping capture-recapture studies of animal populations with fixed trap locations inevitably have a spatial component: animals close to traps are more likely to be caught than those far away. This is not addressed in conventional closed-population estimates of abundance, and without the spatial component, rigorous estimates of density cannot be obtained. We propose new, flexible capture-recapture models that use the capture locations to estimate animal locations and spatially referenced capture probability. The models are likelihood-based and hence allow use of Akaike's information criterion or other likelihood-based methods of model selection. Density is an explicit parameter, and the evaluation of its dependence on spatial or temporal covariates is therefore straightforward. Additional (nonspatial) variation in capture probability may be modeled as in conventional capture-recapture. The method is tested by simulation, using a model in which capture probability depends only on location relative to traps. Point estimators are found to be unbiased and standard error estimators almost unbiased. The method is used to estimate the density of Red-eyed Vireos (Vireo olivaceus) from mist-netting data from the Patuxent Research Refuge, Maryland, U.S.A. Estimates agree well with those from an existing spatially explicit method based on inverse prediction. A variety of additional spatially explicit models are fitted; these include models with temporal stratification, behavioral response, and heterogeneous animal home ranges.

19.
Huang Y & Pepe MS, Biometrika 2009, 96(4): 991–997
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods.
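
A minimal sketch of what a predictiveness curve displays, computed from invented cohort-style predicted risks; the paper's contribution, the semiparametric estimation under case-control sampling, is not reproduced here.

```python
import numpy as np

# Invented predicted risks for a cohort of 1000 subjects.
rng = np.random.default_rng(9)
risk = 1 / (1 + np.exp(-(rng.standard_normal(1000) * 1.2 - 1)))

v = np.linspace(0.01, 0.99, 99)
curve = np.quantile(risk, v)          # R(v): risk at the v-th population quantile

# A wider spread of the curve means better risk stratification, e.g.:
print("risk at 10th percentile:", round(curve[9], 3))
print("risk at 90th percentile:", round(curve[89], 3))
```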

20.
Survival prediction from high-dimensional genomic data depends on a proper regularization method. With an increasing number of such methods proposed in the literature, comparative studies are called for, and some have been performed. However, there is currently no consensus on which prediction assessment criterion should be used for time-to-event data. Without firm knowledge of whether the choice of evaluation criterion affects the conclusions about which regularization method performs best, these comparative studies may be of limited value. In this paper, four evaluation criteria are investigated: the log-rank test for two groups, the area under the time-dependent ROC curve (AUC), an R2-measure based on the Cox partial likelihood, and an R2-measure based on the Brier score. The criteria are compared according to how they rank six widely used regularization methods that are based on the Cox regression model, namely univariate selection, principal components regression (PCR), supervised PCR, partial least squares regression, ridge regression, and the lasso. Based on our application to three microarray gene expression data sets, we find that the results obtained from the widely used log-rank test deviate from the other three criteria studied. For future studies, where one might also want to include non-likelihood or non-model-based regularization methods, we argue in favor of AUC and the R2-measure based on the Brier score, as these neither rely on an arbitrary split into two groups nor depend on the Cox partial likelihood.
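
As a small illustration of the log-rank criterion discussed above (the one the paper finds least satisfactory), the sketch below splits simulated patients at the median of a predicted risk score and tests for a survival difference with lifelines; all data are simulated and the splitting step is exactly the arbitrariness the paper criticizes.

```python
import numpy as np
from lifelines.statistics import logrank_test

# Simulated risk scores and survival times (higher risk -> shorter survival).
rng = np.random.default_rng(10)
risk = rng.standard_normal(200)
time = rng.exponential(scale=np.exp(-0.7 * risk))
event = rng.random(200) < 0.8                          # ~20% censoring

high = risk > np.median(risk)                          # arbitrary two-group split
res = logrank_test(time[high], time[~high],
                   event_observed_A=event[high],
                   event_observed_B=event[~high])
print("log-rank p-value:", round(res.p_value, 4))
```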
