Similar Articles
1.
Man Jin, Yixin Fang. Biometrics, 2011, 67(1): 124-132
Summary: In family studies, canonical discriminant analysis can be used to find linear combinations of phenotypes that exhibit high ratios of between-family to within-family variability. With large numbers of phenotypes, however, canonical discriminant analysis may overfit. To estimate the predicted ratios associated with the coefficients obtained from canonical discriminant analysis, two methods are developed: one based on bias correction and the other on cross-validation. Because cross-validation is computationally intensive, an approximation to it is also developed. Furthermore, these methods can be applied to perform variable selection in canonical discriminant analysis. The proposed methods are illustrated with simulation studies and applications to two real examples.
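As a rough illustration of the quantity being maximized, the sketch below computes the leading canonical discriminant direction as the top generalized eigenvector of the between-family scatter matrix B relative to the within-family scatter matrix W, and returns the in-sample ratio a'Ba / a'Wa. It uses sums of squares rather than variance components and is a generic textbook computation, not the authors' bias-corrected or cross-validated estimator; the function names and data are hypothetical.

```python
import numpy as np

def scatter_matrices(X, groups):
    """Within-family (W) and between-family (B) scatter matrices of an (n, p) matrix."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        d = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += Xg.shape[0] * (d @ d.T)
    return W, B

def leading_discriminant(X, groups):
    """Coefficients a maximizing a'Ba / a'Wa, and that (in-sample) ratio."""
    W, B = scatter_matrices(X, groups)
    # Whiten with the Cholesky factor of W so the problem becomes a symmetric eigenproblem
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    M = L_inv @ B @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    a = L_inv.T @ eigvecs[:, -1]           # map the top eigenvector back
    ratio = float(a @ B @ a) / float(a @ W @ a)
    return a, ratio
```

Because a is chosen on the same data used to compute the ratio, the returned ratio tends to overstate what the combination achieves on new families; that optimism is what the bias-correction and cross-validation estimates in the abstract are meant to address.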

2.
Multiple diagnostic tests and risk factors are commonly available for many diseases. This information can be either redundant or complementary. Combining them may improve diagnostic/predictive accuracy, but may also unnecessarily increase complexity, risks, and/or costs. The accuracy gained by including additional variables can be evaluated by the increase in the area under the receiver-operating characteristic curve (AUC) with and without the new variable(s). In this study, we derive a new test statistic to accurately and efficiently determine the statistical significance of this incremental AUC under a multivariate normality assumption. Through a suitable linear transformation of all diagnostic variables, our test links the AUC difference to a quadratic form of the standardized mean shift, weighted by the inverse covariance matrix. The distribution of the quadratic estimator is related to the multivariate Behrens–Fisher problem. We provide explicit expressions for the estimator, its approximate non-central F-distribution, the type I error rate, and a sample size formula. Simulation studies show that the new test maintains prespecified type I error rates and reasonable statistical power at practical sample sizes. We use data from the Study of Osteoporotic Fractures as an application example to illustrate our method.
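For intuition about why an AUC difference reduces to quadratic forms, a well-known binormal result (Su and Liu, 1993) states that when the marker vectors of diseased and non-diseased subjects are multivariate normal with mean difference δ and covariances Σ₀ and Σ₁, the best linear combination has AUC Φ(√(δᵀ(Σ₀ + Σ₁)⁻¹δ)). The sketch below uses that closed form to compute the population-level AUC increment from adding variables. It is a plug-in illustration under that assumption, not the authors' test statistic or its F approximation, and the function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def best_linear_auc(delta, sigma0, sigma1):
    """AUC of the optimal linear combination under multivariate normality."""
    delta = np.asarray(delta, dtype=float)
    pooled = np.asarray(sigma0, dtype=float) + np.asarray(sigma1, dtype=float)
    q = float(delta @ np.linalg.solve(pooled, delta))  # delta' (S0 + S1)^{-1} delta
    return norm_cdf(sqrt(q))

def incremental_auc(delta, sigma0, sigma1, base_idx):
    """AUC using all variables minus AUC using only the variables in base_idx."""
    idx = np.asarray(base_idx)
    d = np.asarray(delta, dtype=float)
    s0 = np.asarray(sigma0, dtype=float)
    s1 = np.asarray(sigma1, dtype=float)
    full = best_linear_auc(d, s0, s1)
    reduced = best_linear_auc(d[idx], s0[np.ix_(idx, idx)], s1[np.ix_(idx, idx)])
    return full - reduced
```

A new variable with no mean shift that is independent of the existing ones leaves the quadratic form, and hence the AUC, unchanged; the increment is positive only when the added variable shifts the standardized mean difference.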

3.
We compare several nonparametric and parametric weighting methods for adjusting for the effect of strata, focusing on their use in receiver-operating characteristic (ROC) analysis. Nonparametrically, the rank-based van Elteren test and inverse-variance (IV) weighting of the area under the ROC curve (AUC) are examined. Parametrically, the stratified t-test and the IV-weighted AUC method are applied under a binormal monotone transformation model. Stratum-specific, pooled, and adjusted AUC estimates are obtained. We illustrate and compare these weighting methods using a multicenter diagnostic trial and extensive Monte Carlo simulations.
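Nonparametric inverse-variance weighting of stratum AUCs can be sketched as follows: estimate each stratum's AUC by the Mann-Whitney statistic, estimate its variance, and average with weights 1/variance. The Hanley-McNeil (1982) variance approximation used here is an assumption for illustration; the paper may use a different variance estimator, and the function names are hypothetical.

```python
import numpy as np

def mann_whitney_auc(pos, neg):
    """Empirical AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

def hanley_mcneil_var(auc, n_pos, n_neg):
    """Approximate variance of an empirical AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    return (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
            + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)

def iv_weighted_auc(strata):
    """strata: list of (positive_scores, negative_scores), one pair per stratum."""
    aucs, weights = [], []
    for pos, neg in strata:
        a = mann_whitney_auc(pos, neg)
        v = hanley_mcneil_var(a, len(pos), len(neg))
        aucs.append(a)
        weights.append(1.0 / v)  # assumes v > 0 (stratum AUC not exactly 0 or 1)
    aucs, weights = np.array(aucs), np.array(weights)
    adjusted = float((weights * aucs).sum() / weights.sum())
    se = float(np.sqrt(1.0 / weights.sum()))  # standard error of the weighted mean
    return aucs, adjusted, se
```

Strata with more subjects, or with AUCs nearer 0 or 1, get smaller variances and therefore more weight in the adjusted estimate.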

4.
Recognition of the importance of cross-validation (‘any technique or instance of assessing how the results of a statistical analysis will generalize to an independent dataset’; Wiktionary, en.wiktionary.org) is one reason that the U.S. Securities and Exchange Commission requires all investment products to carry some variation of the disclaimer ‘Past performance is no guarantee of future results.’ Even a cursory examination of financial behaviour, however, shows that this warning is regularly ignored, even by those who understand what an independent dataset is. In the natural sciences, the analogue of predicting future returns for an investment strategy is predicting how well a particular algorithm will perform on new data. Once again, the key to an unbiased assessment of future performance is testing with independent data, that is, data that were in no way involved in developing the method in the first place. A ‘gold-standard’ approach to cross-validation is to divide the data into two parts, one used to develop the algorithm and the other used to test its performance. Because this approach substantially reduces the sample size available for constructing the algorithm, researchers often try other variations of cross-validation to achieve the same ends. As Anderson illustrates in this issue of Molecular Ecology Resources, however, not all attempts at cross-validation produce the desired result. Anderson used simulated data to evaluate the performance of several software programs designed to identify subsets of loci that are effective for assigning individuals to their population of origin based on multilocus genetic data. Such programs are likely to become increasingly popular as researchers seek to streamline routine analyses by focusing on small sets of loci that contain most of the desired signal.
Anderson found that although some of the programs attempted cross-validation, none met the ‘gold standard’ of using truly independent data, and all therefore produced overly optimistic assessments of the power of the selected set of loci, a phenomenon known as ‘high-grading bias.’
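The following toy simulation illustrates high-grading bias. It is not Anderson's simulation: the ‘loci’ are hypothetical noise-only features and a simple nearest-centroid rule stands in for an assignment method. Loci are chosen because they happen to separate two groups in the training data, so accuracy measured on those same data looks good, while accuracy on an independent holdout set stays near chance (0.5).

```python
import numpy as np

def nearest_centroid_accuracy(X_fit, y_fit, X_eval, y_eval):
    """Fit class centroids on (X_fit, y_fit) and score accuracy on (X_eval, y_eval)."""
    c0 = X_fit[y_fit == 0].mean(axis=0)
    c1 = X_fit[y_fit == 1].mean(axis=0)
    d0 = ((X_eval - c0) ** 2).sum(axis=1)
    d1 = ((X_eval - c1) ** 2).sum(axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y_eval).mean())

def high_grading_demo(n_train=40, n_test=200, n_loci=1000, k=10, seed=0):
    """Select k loci with no real signal on training data, then score them two ways."""
    rng = np.random.default_rng(seed)
    # Both hypothetical populations are drawn from the same distribution
    X_train = rng.normal(size=(n_train, n_loci))
    y_train = np.repeat([0, 1], n_train // 2)
    X_test = rng.normal(size=(n_test, n_loci))
    y_test = np.repeat([0, 1], n_test // 2)
    # Keep the k loci whose group means differ most in the training data
    gap = np.abs(X_train[y_train == 1].mean(axis=0) - X_train[y_train == 0].mean(axis=0))
    top = np.argsort(gap)[-k:]
    # Biased: evaluate on the same samples used to choose the loci
    in_sample = nearest_centroid_accuracy(X_train[:, top], y_train, X_train[:, top], y_train)
    # Gold standard: evaluate on data never used in selection or fitting
    holdout = nearest_centroid_accuracy(X_train[:, top], y_train, X_test[:, top], y_test)
    return in_sample, holdout
```

With these defaults the in-sample accuracy is typically far above 0.5 even though the data contain no population signal, whereas the holdout accuracy hovers around 0.5.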

5.
In model building and model evaluation, cross-validation is a frequently used resampling method. Unfortunately, it can be quite time-consuming. In this article, we discuss a much faster approximation that can be used in generalized linear models and Cox's proportional hazards model with a ridge penalty term. The approximation is based on a Taylor expansion around the estimate of the full model, so all cross-validated estimates are approximated without refitting the model. The tuning parameter can then be chosen from these approximations and optimized in less time. The method is most accurate when approximating leave-one-out cross-validation results for large data sets, which is precisely the most computationally demanding situation. To demonstrate its performance, the method is applied to several microarray data sets. An R package, penalized, which implements the method, is available on CRAN.
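The idea of approximating each leave-one-out fit from the full-data estimate can be sketched as a single Newton step from β̂ on the objective with observation i removed. To keep the example short it uses Gaussian ridge regression, where the loss is quadratic and the step is therefore exact; for the GLM and Cox models in the abstract the same step is only a Taylor approximation. This is a sketch, not the code in the penalized package, and the function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate minimizing 0.5*||y - X b||^2 + 0.5*lam*||b||^2 (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def approx_loo_coefs(X, y, lam):
    """Approximate every leave-one-out estimate by one Newton step from the full fit."""
    beta = ridge_fit(X, y, lam)
    hessian = X.T @ X + lam * np.eye(X.shape[1])  # Hessian of the full penalized loss
    resid = y - X @ beta
    loo = np.empty((X.shape[0], X.shape[1]))
    for i in range(X.shape[0]):
        xi = X[i]
        # Dropping observation i removes x_i x_i^T from the Hessian and leaves
        # a gradient of x_i * r_i at the full-data estimate.
        h_minus_i = hessian - np.outer(xi, xi)
        loo[i] = beta - np.linalg.solve(h_minus_i, xi * resid[i])
    return loo
```

The leave-one-out prediction errors y_i - x_iᵀβ̂₋ᵢ can then be summed to score each candidate λ without refitting. In a GLM, the Hessian would carry observation weights (for example μᵢ(1 - μᵢ) in logistic regression) and the step would no longer be exact.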

6.
DNA-based protocols are the standard methods for diagnosing infected plant material. However, these methods are time-consuming, require trained personnel, and their efficacy depends on the sampling procedure. In comparison, recognition methods based on volatile compound emissions are less precise but allow non-destructive mass screening of bulk samples, and may be used to direct molecular diagnosis. In this study, the analysis of volatile compounds is used to discriminate between fire blight (Erwinia amylovora) and blossom blight (Pseudomonas syringae pv. syringae) on apple propagation material. Possible marker compounds were identified by gas chromatography–mass spectrometry (GC-MS) and proton transfer reaction time-of-flight mass spectrometry (PTR-ToF-MS). In addition, two commercial electronic noses were used for diagnosis. After preliminary validation in vitro, a diagnostic protocol was successfully developed and scaled up to real nursery conditions on cold-stored, asymptomatic dormant plants.

7.
8.
Predicting the effect of missense variations on protein stability and dynamics is important for understanding their role in disease and the link between protein structure and function. Approaches to estimate these changes have been proposed, but most consider only single-point missense variants and a static state of the protein, and those that incorporate dynamics are computationally expensive. Here we present DynaMut2, a web server that combines Normal Mode Analysis (NMA) methods, which capture protein motion, with our graph-based signatures, which represent the wild-type environment, to investigate the effects of single and multiple point mutations on protein stability and dynamics. DynaMut2 accurately predicted the effects of missense mutations on protein stability, achieving Pearson's correlations of up to 0.72 (RMSE: 1.02 kcal/mol) for single-point and 0.64 (RMSE: 1.80 kcal/mol) for multiple-point missense mutations across 10-fold cross-validation and independent blind tests. For single-point mutations, DynaMut2 achieved performance comparable to other methods when predicting changes in Gibbs free energy (ΔΔG) and in melting temperature (ΔTm). We anticipate that our tool will be valuable for studying protein flexibility and the role of variants in disease. DynaMut2 is freely available as a web server and API at http://biosig.unimelb.edu.au/dynamut2 .
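The two evaluation metrics quoted above, Pearson's correlation and RMSE between predicted and experimental ΔΔG values, can be computed as in this generic sketch. It is not DynaMut2's evaluation code; the function name and example values are hypothetical.

```python
import numpy as np

def pearson_and_rmse(predicted, experimental):
    """Pearson's r and root-mean-square error between predictions and measurements."""
    p = np.asarray(predicted, dtype=float)
    e = np.asarray(experimental, dtype=float)
    r = float(np.corrcoef(p, e)[0, 1])
    rmse = float(np.sqrt(np.mean((p - e) ** 2)))
    return r, rmse
```

The two metrics are complementary: predictions that are all off by a constant 1 kcal/mol still give r = 1 but an RMSE of 1 kcal/mol, which is why both are usually reported.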

9.
LncRNAs and miRNAs are key molecules in the competing endogenous RNA (ceRNA) mechanism, and their interactions have been found to play important roles in gene regulation. As a complement to the identification of lncRNA-miRNA interactions from CLIP-seq experiments, in silico prediction can select the most promising candidates for experimental validation. Although developing computational tools to predict lncRNA-miRNA interactions is of great importance for deciphering the ceRNA mechanism, little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA-miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network with similarities based on the expression profiles of lncRNAs and miRNAs. Based on this integrated data, a linear neighbour representation method was introduced to construct a prediction model. To evaluate the prediction performance of the proposed model, k-fold cross-validation was performed. LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 in 2-fold, 5-fold and 10-fold cross-validation, respectively. A series of comparison experiments with other methods was also conducted, and the results showed that our method is feasible and effective for predicting lncRNA-miRNA interactions by combining different types of useful side information. We anticipate that LNRLMI could be a useful tool for predicting the non-coding RNA regulatory networks in which lncRNAs and miRNAs are involved.

10.
Exosomes are small membrane vesicles released by many cells. These vesicles can mediate cellular communication by transmitting active molecules, including long non-coding RNAs (lncRNAs). In this study, we aimed to identify a panel of serum exosomal lncRNAs for the diagnosis and recurrence prediction of bladder cancer (BC). Expression of 11 candidate exosomal lncRNAs was measured by quantitative real-time PCR in a training set (n = 200) and an independent validation set (n = 320). A three-lncRNA panel (PCAT-1, UBC1 and SNHG16) identified by a multivariate logistic regression model provided high diagnostic accuracy for BC, with areas under the receiver-operating characteristic curve (AUCs) of 0.857 and 0.826 in the training and validation sets, respectively, significantly higher than that of urine cytology. The corresponding AUCs of this panel for patients with Ta, T1 and T2-T4 tumours were 0.760, 0.827 and 0.878, respectively. In addition, Kaplan-Meier analysis showed that non-muscle-invasive BC (NMIBC) patients with high UBC1 expression had significantly lower recurrence-free survival (P = 0.01). Multivariate Cox analysis demonstrated that UBC1 was independently associated with tumour recurrence in NMIBC (P = 0.018). Our study suggests that lncRNAs in serum exosomes may serve as promising diagnostic and prognostic biomarkers for BC.
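The Kaplan-Meier comparison above rests on the product-limit estimator S(t) = ∏_{t_j ≤ t} (1 - d_j / n_j), where d_j recurrences occur among the n_j patients still at risk at event time t_j. A minimal sketch follows; the function name and data are hypothetical, and the test that produced the reported P-value is not shown.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times: follow-up times; events: 1 = recurrence observed, 0 = censored.
    Returns the distinct event times and S(t) just after each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                # still under follow-up at t
        d = np.sum((times == t) & (events == 1))    # recurrences at t
        s *= 1.0 - d / at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

Patients censored at an event time are conventionally counted as still at risk at that time, which is what the `times >= t` comparison does.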

11.
In this study, an improved oligonucleotide-based DNA microarray for the genoserotyping of Escherichia coli is presented. Primers and probes for an additional 70 O antigen groups were developed. The microarray was transferred to a new platform, the ArrayStrip format, which allows high-throughput testing in 96-well format and fully automated microarray analysis. Thus, starting from a single colony, it is possible to determine, within a few hours and in a single experiment, 94 of the more than 180 known O antigen groups as well as 47 of the 53 different H antigens. The microarray was initially validated with a set of defined reference strains that had previously been serotyped by conventional agglutination in various reference centers. For further validation, 180 clinical E. coli isolates of human origin (from urine samples, blood cultures, bronchial secretions, and wound swabs) and 53 E. coli isolates from cattle, pigs, and poultry were used. A high degree of concordance between classical antibody-based serotyping and DNA-based genoserotyping was demonstrated both for the new 70 O antigen groups and for the field strains of human and animal origin. This oligonucleotide array is therefore a user-friendly diagnostic tool that is more efficient than classical serotyping by agglutination. Furthermore, the tests can be performed in almost any routine laboratory and are easily expanded and standardized.

12.
Early detection is vital for improving 5-year survival in patients with gastric cancer (GC). Numerous studies indicate that circulating long non-coding RNAs (lncRNAs) can be used to diagnose malignant tumours. This study aimed to investigate the capacity of novel lncRNAs to diagnose GC. A lncRNA microarray assay was used to screen for lncRNAs differentially expressed between the plasma of patients with GC and that of healthy controls. Plasma samples from 100 GC patients and healthy controls were used to construct a multigene panel. An additional 50 pairs of GC patients and healthy controls were used to evaluate the diagnostic accuracy of the panel. Expression levels of lncRNAs were quantified by real-time polymerase chain reaction. The receiver operating characteristic curve and the area under the curve (AUC) were used to estimate diagnostic capacity. We identified three lncRNAs, CTC-501O10.1, AC100830.4 and RP11-210K20.5, that were up-regulated in the plasma of GC patients, with AUCs of 0.724, 0.730 and 0.737, respectively (P < .01). Based on a logistic regression model, the combined AUC of the three lncRNAs was 0.764. The AUC of the panel was 0.700 in the validation cohort. These findings indicate that plasma lncRNAs can serve as potential biomarkers for the detection of GC.

13.
Modeling plant growth using functional traits is important for understanding the mechanisms that underpin growth and for making predictions in new situations. We use three data sets on plant height over time and two validation methods, in-sample model fit and leave-one-species-out cross-validation, to evaluate the predictive performance of non-linear growth models based on functional traits. In-sample measures of model fit differed substantially from out-of-sample predictive performance; the best-fitting models were rarely the best predictive models. Careful selection of predictor variables reduced bias in parameter estimates, and there was no single best model across our three data sets. Testing and comparing multiple model forms is therefore important. We developed an R package with a formula interface for straightforward fitting and validation of hierarchical, non-linear growth models. Our intent is to encourage thorough testing of multiple growth model forms and a greater emphasis on assessing model fit relative to a model's purpose.
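Leave-one-species-out cross-validation can be sketched as below: each species is held out in turn, the model is fit on the remaining species, and prediction error is measured on the held-out one. To keep the example short, a least-squares polynomial (a straight line by default) stands in for the paper's hierarchical non-linear growth models; the function names and data are hypothetical, and this is not the API of the authors' R package.

```python
import numpy as np

def rmse(a, b):
    """Root-mean-square difference between two arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def in_sample_vs_loso(x, y, species, degree=1):
    """Compare in-sample RMSE with leave-one-species-out RMSE for a polynomial fit."""
    x, y, species = map(np.asarray, (x, y, species))
    coefs = np.polyfit(x, y, degree)
    in_sample = rmse(np.polyval(coefs, x), y)
    preds = np.empty_like(y, dtype=float)
    for s in np.unique(species):
        held_out = species == s
        c = np.polyfit(x[~held_out], y[~held_out], degree)  # fit without species s
        preds[held_out] = np.polyval(c, x[held_out])        # predict species s
    return in_sample, rmse(preds, y)
```

When one species departs from the others, the in-sample error hides this because that species helped fit the model, while the held-out error exposes it.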

14.
Accumulating experimental evidence has demonstrated that microRNAs (miRNAs) have a major impact on numerous critical biological processes and are associated with various complex human diseases. Nevertheless, predicting disease-related miRNAs remains difficult. In this paper, we developed a Kernel Fusion-based Regularized Least Squares model for MiRNA-Disease Association prediction (KFRLSMDA), which applies a kernel fusion technique to fuse similarity matrices and then uses regularized least squares to predict potential miRNA-disease associations. To evaluate the effectiveness of KFRLSMDA, we adopted leave-one-out cross-validation (LOOCV) and 5-fold cross-validation and compared KFRLSMDA with 10 previous computational models (MaxFlow, MiRAI, MIDP, RKNNMDA, MCMDA, HGIMDA, RLSMDA, HDMP, WBSMDA and RWRMDA). KFRLSMDA outperformed these models, achieving AUCs of 0.9246 in global LOOCV and 0.8243 in local LOOCV, and an average AUC of 0.9175 ± 0.0008 in 5-fold cross-validation. In addition, 96%, 100% and 90% of the top 50 predicted miRNAs for breast neoplasms, colon neoplasms and oesophageal neoplasms, respectively, were confirmed by experimental discoveries. We also predicted miRNAs related to hepatocellular cancer after removing all of its known related miRNAs, and 98% of the top 50 predicted miRNAs were verified. Furthermore, we predicted miRNAs related to lymphoma using the data set in an old version of the HMDD database, and 80% of the top 50 predicted miRNAs were confirmed. We therefore conclude that KFRLSMDA has reliable prediction performance.
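A generic version of the two steps named above, fusing several similarity (kernel) matrices and then scoring associations by regularized least squares, is sketched below. The fixed convex-combination fusion, the closed-form RLS predictor F = K(K + ηI)⁻¹Y, and the averaging of miRNA-side and disease-side scores are assumptions for illustration; KFRLSMDA's actual kernels, weights and combination rule may differ, and the function names are hypothetical.

```python
import numpy as np

def fuse_kernels(kernels, weights):
    """Convex combination of similarity matrices (weights normalized to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def rls_scores(K, Y, eta=1.0):
    """Regularized least squares predictor F = K (K + eta I)^{-1} Y."""
    n = K.shape[0]
    return K @ np.linalg.solve(K + eta * np.eye(n), Y)

def association_scores(K_mirna, K_disease, Y, eta=1.0):
    """Y is the (miRNA x disease) 0/1 matrix of known associations."""
    f_m = rls_scores(K_mirna, Y, eta)          # smooth along miRNA similarity
    f_d = rls_scores(K_disease, Y.T, eta).T    # smooth along disease similarity
    return 0.5 * (f_m + f_d)
```

Unknown pairs are then ranked by their scores; a pair scores highly when the miRNA resembles miRNAs already linked to the disease, or the disease resembles diseases already linked to the miRNA.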

15.
Neisseria meningitidis, a human-specific bacterial pathogen, causes bacterial meningitis by invading the meninges (the outer lining of the central nervous system). The capsular polysaccharide distinguishes the various serogroups of N. meningitidis and can be used as an antigen to elicit an immune response. A computational approach identified candidate T-cell epitopes from the outer membrane protein PorB of N. meningitidis (MC58): 273KGLVDDADI282 in loop VII and 170GRHNSESYH179 in loop IV, both located on the exposed surface of immunogenic loops of the class 3 outer membrane protein allele of N. meningitidis. One of them, KGLVDDADI, is used here to design a diagnostic tool for N. meningitidis strain MC58 based on a molecularly imprinted piezoelectric sensor (molecularly imprinted polymer-quartz crystal microbalance). Methacrylic acid, ethylene glycol dimethacrylate and azobisisobutyronitrile were used as the functional monomer, cross-linker and initiator, respectively. The epitope can simultaneously bind to methacrylic acid and fit into the shape-selective cavities. Extracting the epitope from the grafted polymeric film generated shape-selective, sensitive sites on the electrochemical quartz crystal microbalance crystal, i.e., an epitope-imprinted polymer. Imprinting was characterized by atomic force microscopy. The epitope-imprinted sensor selectively bound N. meningitidis proteins present in the blood serum of patients suffering from brain fever. Thus, the fabricated sensor can be used as a diagnostic tool for meningitis.

16.
Microcalcifications are an early mammographic sign of breast cancer and a target for stereotactic breast needle biopsy. Here, we develop and compare different approaches to building Raman classification algorithms to diagnose invasive and in situ breast cancer, fibrocystic change and fibroadenoma, all of which can be associated with microcalcifications. In this study, Raman spectra were acquired from tissue cores obtained from fresh breast biopsies and analyzed using a constituent-based breast model. Diagnostic algorithms based on the breast model fit coefficients were devised using logistic regression, C4.5 decision tree classification, k-nearest neighbor (k-NN) and support vector machine (SVM) analysis, and subjected to leave-one-out cross-validation. The best-performing algorithm was based on SVM analysis (with a radial basis function kernel), which yielded a positive predictive value of 100% and a negative predictive value of 96% for cancer diagnosis. Importantly, these results demonstrate that Raman spectroscopy provides adequate diagnostic information for lesion discrimination even in the presence of microcalcifications, which to the best of our knowledge has not been previously reported. (© 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
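Leave-one-out cross-validation and the predictive values reported above can be sketched with the simplest of the listed classifiers, k-nearest neighbor. The feature matrix (standing in for breast-model fit coefficients), the labels (1 = cancer) and k are hypothetical, and the SVM that performed best in the study is not reproduced here.

```python
import numpy as np

def loocv_knn_predictions(X, y, k=3):
    """Predict each sample from its k nearest neighbors among the other samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    preds = np.empty_like(y)
    for i in range(len(y)):
        d = np.sum((X - X[i]) ** 2, axis=1)
        d[i] = np.inf                          # leave sample i out
        nearest = np.argsort(d)[:k]
        preds[i] = int(y[nearest].mean() > 0.5)  # majority vote (odd k avoids ties)
    return preds

def ppv_npv(y_true, y_pred):
    """Positive and negative predictive values (undefined if a denominator is 0)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp / (tp + fp), tn / (tn + fn)
```

Because every sample is classified by a model that never saw it, the resulting predictive values are not inflated by fitting and evaluating on the same spectrum.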

17.
Recently, microRNAs (miRNAs) have been confirmed to be important molecules in many crucial biological processes and are therefore related to various complex human diseases. However, previous methods for predicting miRNA–disease associations have their own deficiencies. We therefore developed a prediction method called deep representations-based miRNA–disease association (DRMDA) prediction. The original miRNA–disease association data were extracted from the HMDD database. A stacked auto-encoder, a greedy layer-wise unsupervised pre-training algorithm and a support vector machine were implemented to predict potential associations. We compared DRMDA with five previous classical prediction models (HGIMDA, RLSMDA, HDMP, WBSMDA and RWRMDA) in global leave-one-out cross-validation (LOOCV), local LOOCV and fivefold cross-validation. The AUCs achieved by DRMDA were 0.9177, 0.8339 and 0.9156 ± 0.0006 in these three tests, respectively. In further case studies, we predicted the top 50 potential miRNAs for colon neoplasms, lymphoma and prostate neoplasms, and 88%, 90% and 86% of the predicted miRNAs, respectively, could be verified by experimental evidence. In conclusion, DRMDA is a promising prediction method that could identify potential and novel miRNA–disease associations.

18.
Inferring potential drug indications, for either novel or approved drugs, is a key step in drug development. Previous computational methods in this domain have focused on either drug repositioning or matching drug and disease gene expression profiles. Here, we present a novel method for the large-scale prediction of drug indications (PREDICT) that can handle both approved drugs and novel molecules. Our method is based on the observation that similar drugs are indicated for similar diseases, and uses multiple drug–drug and disease–disease similarity measures for the prediction task. On cross-validation, it obtains high specificity and sensitivity (AUC = 0.9) in predicting drug indications, surpassing existing methods. We validate our predictions by their overlap with drug indications currently under clinical trials, and by their agreement with tissue-specific expression information on the drug targets. We further show that disease-specific genetic signatures can be used to accurately predict drug indications for new diseases (AUC = 0.92). This lays the computational foundation for future personalized drug treatments, in which gene expression signatures from individual patients would replace the disease-specific signatures.
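The guiding observation, that similar drugs are indicated for similar diseases, can be turned into a score for a candidate drug-disease pair by looking for known indications that are close on both axes. The sketch below uses, as an assumption, the geometric mean of one drug-drug and one disease-disease similarity, maximized over known indications; in a method like PREDICT several such features (one per pair of similarity measures) would feed a classifier, which is not shown. The function name and matrices are hypothetical.

```python
import numpy as np

def pair_feature(drug_sim, disease_sim, known_pairs, drug, disease):
    """max over known indications (d2, i2) != (drug, disease) of sqrt(S_drug * S_disease)."""
    best = 0.0
    for d2, i2 in known_pairs:
        if (d2, i2) == (drug, disease):
            continue  # a known pair must not vouch for itself
        score = np.sqrt(drug_sim[drug, d2] * disease_sim[disease, i2])
        best = max(best, float(score))
    return best
```

The geometric mean rewards candidates that are similar to a known indication on both the drug side and the disease side, and it is zero whenever either similarity is zero.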

19.
Evaluating the classification accuracy of a candidate biomarker signaling the onset or status of disease is essential for medical decision making. A good biomarker would accurately identify the patients who are likely to progress or die at a particular time in the future, or who are in urgent need of active treatment. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are commonly used to assess the performance of a candidate biomarker. In many cases, the standard simple random sampling (SRS) design used for biomarker validation studies is costly and inefficient. To improve efficiency and reduce the cost of biomarker validation, marker-dependent sampling (MDS) may be used. In an MDS design, the selection of patients whose true survival time is assessed depends on the result of a biomarker assay. In this article, we introduce a nonparametric estimator of the time-dependent AUC under an MDS design. The consistency and asymptotic normality of the proposed estimator are established. Simulations show that the proposed estimator is unbiased and that the MDS design yields a significant efficiency gain over the SRS design.
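To make the target quantity concrete: with marker M and survival time T, one common (cumulative/dynamic) definition of the time-dependent AUC is AUC(t) = P(M_i > M_j | T_i ≤ t, T_j > t). The sketch below estimates it from fully observed times by comparing every case (T ≤ t) with every control (T > t), optionally weighting each subject by the inverse of its selection probability, a standard way to correct for marker-dependent sampling. Ignoring censoring and using this weighting are assumptions; this is not the authors' estimator, and the function name is hypothetical.

```python
import numpy as np

def time_dependent_auc(marker, time, t, weights=None):
    """Weighted P(M_case > M_control), with cases T <= t and controls T > t.

    weights: optional inverse selection probabilities (1 / pi_i); default all 1.
    Ties in the marker count 1/2. Assumes no censoring before t.
    """
    m = np.asarray(marker, dtype=float)
    T = np.asarray(time, dtype=float)
    w = np.ones_like(m) if weights is None else np.asarray(weights, dtype=float)
    case, ctrl = T <= t, T > t
    mc, mn = m[case][:, None], m[ctrl][None, :]
    pair_w = w[case][:, None] * w[ctrl][None, :]   # weight of each case-control pair
    concordant = (mc > mn) + 0.5 * (mc == mn)
    return float((pair_w * concordant).sum() / pair_w.sum())
```

Under simple random sampling all weights are 1 and this reduces to the usual Mann-Whitney comparison of cases and controls at time t; under an MDS design, subjects from under-sampled marker strata count proportionally more.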

20.
