Similar articles
 20 similar articles found
1.
Summary When several diagnostic tests are available, one can combine them to achieve better diagnostic accuracy. This article considers the optimal linear combination that maximizes the area under the receiver operating characteristic curve (AUC); the estimates of the combination's coefficients can be obtained via a nonparametric procedure. However, for estimating the AUC associated with the estimated coefficients, the apparent estimation by re‐substitution is too optimistic. To adjust for the upward bias, several methods are proposed. Among them the cross‐validation approach is especially advocated, and an approximated cross‐validation is developed to reduce the computational cost. Furthermore, these proposed methods can be applied for variable selection to select important diagnostic tests. The proposed methods are examined through simulation studies and applications to three real examples.
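As a rough illustration of the quantity being maximized, the empirical AUC of a linear combination of tests can be computed as a Mann-Whitney statistic. The sketch below is not the article's estimator or its bias correction; the function names `empirical_auc` and `combine`, the coefficients, and the toy data are all assumptions made for illustration.

```python
from itertools import product

def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of the AUC: the fraction of (diseased, healthy)
    pairs in which the diseased score is higher, counting ties as one half."""
    wins = 0.0
    for d, h in product(diseased, healthy):
        if d > h:
            wins += 1.0
        elif d == h:
            wins += 0.5
    return wins / (len(diseased) * len(healthy))

def combine(tests, coefs):
    """Linear combination score for one subject's test results."""
    return sum(c * x for c, x in zip(coefs, tests))

# Toy data (invented for illustration): two test results per subject.
diseased = [(2.0, 1.0), (3.0, 0.5), (1.5, 2.0)]
healthy = [(1.0, 1.0), (3.0, 0.5), (0.5, 1.5)]
coefs = (1.0, 1.0)  # hypothetical coefficients, not estimated from the data

auc = empirical_auc([combine(t, coefs) for t in diseased],
                    [combine(t, coefs) for t in healthy])
```

If the coefficients were instead estimated from these same subjects, the AUC computed this way would be the re-substitution estimate that the abstract describes as too optimistic; a cross-validated version would estimate the coefficients on one part of the data and evaluate `empirical_auc` on the rest.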

2.
Sexual dimorphisms in shell-bearing snails expressed by characteristic traits of their respective shells would offer the possibility for a lot of studies about gender distribution in populations, species, etc. In this study, the seven main shell characters of the snail Cochlostoma septemspirale were measured in both sexes: (1) height and (2) width of the shell, (3) height and (4) width of the aperture, (5) width of the last whorl, (6) rib density on the last whorl, and (7) intensity of the reddish or brown pigments forming three bands over the shell. The variation of size and shape was explored with statistical methods adapted to principal components analysis (PCA) and linear discriminant analysis (LDA). In particular, we applied some multivariate morphometric tools for the analysis of ratios that have been developed only recently, that is, the PCA ratio spectrum, allometry ratio spectrum, and LDA ratio extractor. The overall separation of the two sexes was tested with LDA cross validation. The results show that there is a sexual dimorphism in the size and shape of shells. Females are more slender than males and are characterised by larger size, a slightly reduced aperture height but larger shell height and whorl width. Therefore they have a considerably larger shell volume (about one fifth) in the part above the aperture. Furthermore, the last whorl of females is slightly less strongly pigmented and mean rib density slightly higher. All characters overlap quite considerably between sexes. However, by using cross validation based on the 5 continuous shell characters more than 90% of the shells can be correctly assigned to each sex.

3.
An albino strain, which had originated from Okinawa, Japan, and a normally coloured strain, which had originated from West Africa, have been used to study density‐dependent morphometric phase characteristics and their changes in adults of the migratory locust, Locusta migratoria (L.). By repeated crossings we also obtained congenic albinos and normal phenotypes and investigated their morphometrics with increasing West African genome, eventually reaching 99.6% West African and 0.4% Okinawa gene pool. The data were analysed by the classical morphometric ratios (F/C and E/F; F = length of the hind femur, C = maximum width of the head, E = length of the fore wings), as well as by canonical discriminant (multivariate) analysis. The latter was based on measurements of F, C and E (as above), as well as of M (minimum width of the pronotum) and H (maximum height of the pronotum). Okinawa albinos showed more solitarious morphometrics and a smaller amplitude of morphometric phase change than West African normal phenotypes. Both the morphometric ratios and the canonical discriminant analysis demonstrated clearly that these differences were caused primarily by the strain (Okinawa vs. West African). However, the pigmentation (albino vs. normal colouration) also affected morphometric phase differences; albinos showed more solitarious morphometrics and somewhat more restricted morphometric phase change than congenic normal phenotypes. The effect of the pigmentation was considerably smaller than that of the strain. The results refute Nolte's claim that albino locusts constitute an extreme solitarious phase, even under crowding. However, Nolte's less extreme claim, that albino locusts have more solitarious morphometrics than normally coloured locusts, is validated by the present results.

4.
Mammalian interphase chromosomes fold into a multitude of loops to fit the confines of cell nuclei, and looping is tightly linked to regulated function. Chromosome conformation capture (3C) technology has significantly advanced our understanding of this structure‐to‐function relationship. However, all 3C‐based methods rely on chemical cross‐linking to stabilize spatial interactions. This step remains a “black box” as regards the biases it may introduce, and some discrepancies between microscopy and 3C studies have now been reported. To address these concerns, we developed “i3C”, a novel approach for capturing spatial interactions without a need for cross‐linking. We apply i3C to intact nuclei of living cells and exploit native forces that stabilize chromatin folding. Using different cell types and loci, computational modeling, and a methylation‐based orthogonal validation method, “TALE‐iD”, we show that native interactions resemble cross‐linked ones, but display improved signal‐to‐noise ratios and are more focal on regulatory elements and CTCF sites, while strictly abiding by topologically associating domain restrictions.

5.
Capsule Discriminant functions based on morphometric variables provide a reliable method for sex identification of free‐living and hacked young Ospreys.

Aims To describe an easy, accurate and low‐cost method for sex determination of fully grown nestling and fledgling Ospreys Pandion haliaetus based on morphometric measurements.

Methods Four different measurements were taken in 114 birds (40–73 days old) and a DNA analysis, using PCR amplification, was carried out for sex identification. A forward stepwise discriminant analysis was performed to build the best explanatory discriminant models, which were subsequently validated using statistics and external samples.

Results Our best discriminant function retained forearm and tarsus as the best predictor variables and classified 95.1% of the sample correctly, supported also by external cross‐validations with both hacked and free‐living birds. Moreover, a discriminant function with only forearm as predictor showed a similar high correct classification power (93.4%).

Conclusions These discriminant functions can be used as a reliable and immediate method for sex determination of young Ospreys since they showed high discriminant accuracy, close to that of molecular procedures, and were supported by external cross‐validations, both for free‐living and hacked birds. Thus, these morphometric measurements should be considered as standard tools for future scientific studies and management of Osprey populations.

6.
This study explores various options available for choosing the number of principal coordinates m in the canonical analysis of principal coordinates ‘CAP’, a useful procedure that has wide‐ranging application wherever multivariate data sets are collected or generated. Choosing too few coordinates (small m) in this constrained (i.e. hypothesis‐based) ordination procedure may lead to inadequate separation of the groups (when used as a canonical discriminant analysis) or to inadequate correlation between explanatory and response variables (when used as a canonical correlations analysis), whereas choosing too many (large m) may lead to overparameterization, resulting in overfitting of the data and spurious relationships. It is shown here that the optimum number of principal coordinates is simply the one that results in the smallest P value in the canonical analysis carried out using permutations. For data in which more than one m value results in the same minimum P value, m should be chosen from that set to be the number of principal coordinates that minimizes the leave‐one‐out residual sum of squares. This choice of m provides suitable solutions for each of the 17 case studies investigated here (which yielded 17 canonical discriminant analyses and 7 canonical correlation analyses).
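The selection rule stated in this abstract (smallest permutation P value, with ties broken by the smallest leave-one-out residual sum of squares) can be sketched in a few lines. The function name `choose_m` and the candidate values below are hypothetical; computing the P values and residual sums of squares themselves is outside this sketch.

```python
def choose_m(results):
    """results: iterable of (m, p_value, loo_rss) tuples.

    Pick the m with the smallest permutation P value; among candidates that
    share that minimum, pick the smallest leave-one-out residual sum of squares.
    """
    return min(results, key=lambda r: (r[1], r[2]))[0]

# Invented values for illustration: m = 3 and m = 4 tie on the P value,
# so the leave-one-out residual sum of squares decides between them.
candidates = [(2, 0.012, 41.7), (3, 0.001, 38.2), (4, 0.001, 35.9), (5, 0.004, 37.0)]
best_m = choose_m(candidates)
```

Exact ties are plausible here because permutation P values take only a discrete set of values determined by the number of permutations.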

7.
Previous studies have shown that both single nucleotide polymorphisms (SNPs) and questionnaire-based methods can be used for twin zygosity determination, but few validation studies have been conducted using Chinese populations. In the current study, we recruited 192 same-sex Chinese adult twin pairs to evaluate the validity of using a genetic marker-based method and a questionnaire-based method in zygosity determination. We considered the relatedness analysis based on genotyping of more than 0.6 million SNPs as the gold standard for zygosity determination. After quality control, qualified twins were left for relatedness analysis based on identical by descent calculation. Then those same-sex twin pairs were included in the zygosity questionnaire validation analysis. A logistic regression model was applied to assess the discriminant ability of age, sex and the three questions in zygosity determination. Leave-one-out cross-validation was used as a measurement of internal validation. The results of zygosity determination based on 65 SNPs in the 450k methylation array were all consistent with genotyping. Age, gender, and the questions on appearance confused by strangers and previously perceived zygosity constituted the most predictive model, with a consistency rate of 0.8698 and a cross-validation predictive error of 0.1347. For twin studies with genotyping and/or the 450k methylation array, there would be no need to conduct other zygosity testing, for cost reasons.

8.
Recently, microRNAs (miRNAs) have been confirmed to be important molecules within many crucial biological processes and therefore related to various complex human diseases. However, previous methods of predicting miRNA–disease associations have their own deficiencies. Under this circumstance, we developed a prediction method called deep representations‐based miRNA–disease association (DRMDA) prediction. The original miRNA–disease association data were extracted from HDMM database. Meanwhile, stacked auto‐encoder, greedy layer‐wise unsupervised pre‐training algorithm and support vector machine were implemented to predict potential associations. We compared DRMDA with five previous classical prediction models (HGIMDA, RLSMDA, HDMP, WBSMDA and RWRMDA) in global leave‐one‐out cross‐validation (LOOCV), local LOOCV and fivefold cross‐validation, respectively. The AUCs achieved by DRMDA were 0.9177, 0.8339 and 0.9156 ± 0.0006 in the three tests above, respectively. In further case studies, we predicted the top 50 potential miRNAs for colon neoplasms, lymphoma and prostate neoplasms, and 88%, 90% and 86% of the predicted miRNAs can be verified by experimental evidence, respectively. In conclusion, DRMDA is a promising prediction method which could identify potential and novel miRNA–disease associations.

9.
The spherical truncation of electrostatic interactions between amino acids makes it possible to break down long-range spatial electrostatic interactions, resulting in short-range interactions. As a result, a Markov Chain model may be used to calculate the probabilities with which the effect of a given interaction reaches amino acids at different distances within the backbone. The entropies of a Markov Chain model of this type may then be used to codify information about the spatial distribution of charges in the protein used in this study exploring the structure-activity relationship. In this paper, a linear discriminant analysis is reported, which correctly classified 92.3% of the 26 proteins under investigation in training and leave-one-out cross validation, purely for illustrative purposes. Classification was carried out for three possible activities: lysozymes, dihydrofolate reductases, and alcohol dehydrogenases. The discriminant analysis equations were contracted into two canonical roots. These simple canonical roots have high regression coefficients (R(c1)=0.903 and R(c2)=0.70). Root1 explains the biological activity of alcohol dehydrogenases while Root2 discriminates between lysozymes and dihydrofolate reductases. It was possible to profile the effect of core, middle, and surface amino acids on biological activity. In contrast, a model considering classic physicochemical parameters such as polarizability, refractivity, and partition coefficient correctly classifies only 80.8% of the proteins.

10.
Genetic architecture fundamentally affects the way that traits evolve. However, the mapping of genotype to phenotype includes complex interactions with the environment or even the sex of an organism that can modulate the expressed phenotype. Line‐cross analysis is a powerful quantitative genetics method to infer genetic architecture by analysing the mean phenotype value of two diverged strains and a series of subsequent crosses and backcrosses. However, it has been difficult to account for complex interactions with the environment or sex within this framework. We have developed extensions to line‐cross analysis that allow for gene by environment and gene by sex interactions. Using extensive simulation studies and reanalysis of empirical data, we show that our approach can account for both unintended environmental variation when crosses cannot be reared in a common garden and can be used to test for the presence of gene by environment or gene by sex interactions. In analyses that fail to account for environmental variation between crosses, we find that line‐cross analysis has low power and high false‐positive rates. However, we illustrate that accounting for environmental variation allows for the inference of adaptive divergence, and that accounting for sex differences in phenotypes allows practitioners to infer the genetic architecture of sexual dimorphism.

11.
Understanding how species are distributed in the environment is increasingly important for natural resource management, particularly for keystone and habitat‐forming species, and those of conservation concern. Habitat suitability models are fundamental to developing this understanding; however, their use in management continues to be limited due to often‐vague model objectives and inadequate evaluation methods. Along the Northeast Pacific coast, canopy kelps (Macrocystis pyrifera and Nereocystis luetkeana) provide biogenic habitat and considerable primary production to nearshore ecosystems. We investigated the distribution of these species by examining a series of increasingly complex habitat suitability models ranging from process‐based models based on species’ ecology to complex generalised additive models applied to purpose‐collected survey data. Seeking empirical limits to model complexity, we explored the relationship between model complexity and forecast skill, measured using both cross‐validation and independent data evaluation. Our analysis confirmed the importance of predictors used in models of coastal kelp distributions developed elsewhere (i.e. depth, bottom type, bottom slope, and exposure); it also identified additional important factors including salinity, and potential interactions between exposure and salinity, and slope and tidal energy. Comparative results showed how cross‐validation can lead to over‐fitting, while independent data evaluation clearly identified the appropriate model complexity for generating habitat forecasts. Our results also illustrate that, depending on the evaluation data, predictions from simpler models can out‐perform those from more complex models. Collectively, the insights from evaluating multiple models with multiple data sets contribute to the holistic assessment of model forecast skill. The continued development of methods and metrics for evaluating model forecasts with independent data, and the explicit consideration of model objectives and assumptions, promise to increase the utility of model forecasts to decision makers.

12.
In model building and model evaluation, cross‐validation is a frequently used resampling method. Unfortunately, this method can be quite time consuming. In this article, we discuss an approximation method that is much faster and can be used in generalized linear models and Cox's proportional hazards model with a ridge penalty term. Our approximation method is based on a Taylor expansion around the estimate of the full model. In this way, all cross‐validated estimates are approximated without refitting the model. The tuning parameter can now be chosen based on these approximations and can be optimized in less time. The method is most accurate when approximating leave‐one‐out cross‐validation results for large data sets, which is originally the most computationally demanding situation. In order to demonstrate the method's performance, it will be applied to several microarray data sets. An R package penalized, which implements the method, is available on CRAN.
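The idea of obtaining leave-one-out results from a single full-data fit can be illustrated in the simplest setting: linear ridge regression with one predictor and no intercept, where the leverage identity makes the shortcut exact. This is a swapped-in special case, not the Taylor-expansion approximation the article develops for generalized linear and Cox models, and it does not use the penalized package; all function names and data below are hypothetical.

```python
def ridge_loo_residuals(x, y, lam):
    """Leave-one-out residuals for one-predictor ridge regression without an
    intercept, computed from the single full-data fit (no refitting)."""
    d = sum(xi * xi for xi in x) + lam               # penalized sum of squares
    beta = sum(xi * yi for xi, yi in zip(x, y)) / d  # full-data ridge estimate
    # h_ii = x_i^2 / d is the leverage of observation i; dividing the ordinary
    # residual by (1 - h_ii) gives the leave-one-out residual.
    return [(yi - xi * beta) / (1.0 - xi * xi / d) for xi, yi in zip(x, y)]

def ridge_loo_residuals_refit(x, y, lam):
    """The same residuals by brute force: refit once per left-out observation."""
    out = []
    for i in range(len(x)):
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        beta = sum(a * b for a, b in zip(xs, ys)) / (sum(a * a for a in xs) + lam)
        out.append(y[i] - x[i] * beta)
    return out

# Invented data for illustration.
x = [0.5, 1.0, 1.5, 2.0, 3.0]
y = [0.4, 1.1, 1.3, 2.2, 2.9]
fast = ridge_loo_residuals(x, y, lam=0.5)
slow = ridge_loo_residuals_refit(x, y, lam=0.5)
```

The two lists agree up to floating-point rounding. A tuning parameter could then be chosen by minimizing the sum of squared leave-one-out residuals over a grid of `lam` values, at the cost of one fit per value rather than one fit per observation per value.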

13.
Over the past century, studies of human pigmentary disorders along with mouse and zebrafish models have shed light on the many cellular functions associated with visible pigment phenotypes. This has led to numerous genes annotated with the ontology term “pigmentation” in independent human, mouse, and zebrafish databases. Comparisons among these datasets revealed that each is individually incomplete in documenting all genes involved in integument‐based pigmentation phenotypes. Additionally, each database contained inherent species‐specific biases in data annotation, and the term “pigmentation” did not solely reflect integument pigmentation phenotypes. This review presents a comprehensive, cross‐species list of 650 genes involved in pigmentation phenotypes that was compiled with extensive manual curation of genes annotated in OMIM, MGI, ZFIN, and GO. The resulting cross‐species list of genes both intrinsic and extrinsic to integument pigment cells provides a valuable tool that can be used to expand our knowledge of complex, pigmentation‐associated pathways.

14.
Summary High‐dimensional data such as microarrays have brought us new statistical challenges. For example, using a large number of genes to classify samples based on a small number of microarrays remains a difficult problem. Diagonal discriminant analysis, support vector machines, and k‐nearest neighbor have been suggested as among the best methods for small sample size situations, but none was found to be superior to others. In this article, we propose an improved diagonal discriminant approach through shrinkage and regularization of the variances. The performance of our new approach along with the existing methods is studied through simulations and applications to real data. These studies show that the proposed shrinkage‐based and regularization diagonal discriminant methods have lower misclassification rates than existing methods in many cases.

15.
An initial linkage analysis of the alcoholism phenotype as defined by DSM-III-R criteria and alcoholism defined by DSM-IV criteria showed many, sometimes striking, inconsistencies. These inconsistencies are greatly reduced by making the definition of alcoholism more specific. We defined new phenotypes combining the alcoholism definitions and the latent variables, defining an individual as affected if that individual is alcoholic under one of the definitions (either DSM-III-R or DSM-IV), and indicated having a symptom defined by one of the latent variables. This was done for each of the two alcoholism definitions and five latent variables, selected from a canonical discriminant analysis indicating they formed significant groupings using the electrophysiological variables. We found that linkage analyses utilizing these latent variables were much more robust and consistent than the linkage results based on DSM-III-R or DSM-IV criteria for definition of alcoholism. We also performed linkage analyses on two first principal components derived phenotypes, one derived from the electrophysiological variables, and the other derived from the latent variables. A region on chromosome 2 at 250 cM was found to be linked to both of these derived phenotypes. Further examination of the SNPs in this region identified several haplotypes strongly associated with these derived phenotypes.

16.
Recognition of the importance of cross‐validation (‘any technique or instance of assessing how the results of a statistical analysis will generalize to an independent dataset’; Wiktionary, en.wiktionary.org) is one reason that the U.S. Securities and Exchange Commission requires all investment products to carry some variation of the disclaimer, ‘Past performance is no guarantee of future results.’ Even a cursory examination of financial behaviour, however, demonstrates that this warning is regularly ignored, even by those who understand what an independent dataset is. In the natural sciences, an analogue to predicting future returns for an investment strategy is predicting power of a particular algorithm to perform with new data. Once again, the key to developing an unbiased assessment of future performance is through testing with independent data—that is, data that were in no way involved in developing the method in the first place. A ‘gold‐standard’ approach to cross‐validation is to divide the data into two parts, one used to develop the algorithm, the other used to test its performance. Because this approach substantially reduces the sample size that can be used in constructing the algorithm, researchers often try other variations of cross‐validation to accomplish the same ends. As illustrated by Anderson in this issue of Molecular Ecology Resources, however, not all attempts at cross‐validation produce the desired result. Anderson used simulated data to evaluate performance of several software programs designed to identify subsets of loci that can be effective for assigning individuals to population of origin based on multilocus genetic data. Such programs are likely to become increasingly popular as researchers seek ways to streamline routine analyses by focusing on small sets of loci that contain most of the desired signal. Anderson found that although some of the programs made an attempt at cross‐validation, all failed to meet the ‘gold standard’ of using truly independent data and therefore produced overly optimistic assessments of power of the selected set of loci—a phenomenon known as ‘high grading bias.’

17.
Mumps is an acute infectious childhood disease caused by mumps virus (MuV), a member of genus Rubulavirus, family Paramyxoviridae. Based on the genetic variability in small hydrophobic (SH) genes, currently MuVs have been divided into twelve confirmed genotypes designated as A-L and one proposed genotype, M. Despite a successful vaccination program, a few genotypes are observed to co-circulate amongst the vaccinated population. Furthermore, lack of cross protection between different genotypes is reported and hence, as a part of epidemiological surveillance, WHO has recommended genotyping of MuV. Currently genotyping is carried out using molecular phylogeny analysis (MPA) of SH genes and no genotyping server is available for MuV. The present study reports development of a genotyping server for the same, which employs three independent methods. The server uses two conventional methods viz., BLAST, MPA and a novel method based on Return Time Distribution (RTD), which is developed in-house. A server for genotyping of mumps virus is developed and made available at http://bioinfo.net.in/muv/homepage.html. RTD-based alignment-free method was initially developed for MPA and is applied for genotyping of MuV for the first time. It is found to have 98.95% accuracy when measured using leave-one-out cross validation method on reference and test datasets. In addition to RTD, the server also implements BLAST and MPA for genotyping of MuV. All the three methods were found to be highly reliable as evident from consensus predictions. A server for genotyping of MuV, which implements sequence-based bioinformatics approaches is developed and validated using SH gene sequences of known genotypes. This server will be useful for epidemiological surveillance and to monitor the circulation of MuV genotypes within and across geographic areas. This will also facilitate phylodynamics studies of mumps viruses.

18.
Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches are increasingly accounting for such dependencies. However, when performing cross‐validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause for the poor performance of uncorrected (random) cross‐validation, noted often by modellers, is dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provides ample opportunity for overfitting with non‐causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross‐validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolations by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may also improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non‐random and blocked cross‐validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross‐validation is nearly universally more appropriate than random cross‐validation if the goal is predicting to new data or predictor space, or for selecting causal predictors. We recommend that block cross‐validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
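The contrast between random and block splitting can be made concrete with a small fold generator that holds out one whole block at a time, so that no block contributes observations to both the training and the test set. This is a minimal sketch under the assumption that each observation already carries a block label (for example a site, year, or clade); the name `block_folds` and the labels are hypothetical, and choosing the blocks themselves is the hard part the abstract warns about.

```python
def block_folds(block_ids):
    """Yield (train_idx, test_idx) pairs, holding out one whole block at a time."""
    for held_out in sorted(set(block_ids)):
        test = [i for i, b in enumerate(block_ids) if b == held_out]
        train = [i for i, b in enumerate(block_ids) if b != held_out]
        yield train, test

# Invented block labels for six observations (e.g. sampling sites).
sites = ["A", "A", "B", "B", "B", "C"]
folds = list(block_folds(sites))
```

Here `folds` contains one split per site. A model would be fitted on the `train` indices and evaluated on the `test` indices of each split, and the errors averaged across splits.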

19.
Modeling plant growth using functional traits is important for understanding the mechanisms that underpin growth and for predicting new situations. We use three data sets on plant height over time and two validation methods—in‐sample model fit and leave‐one‐species‐out cross‐validation—to evaluate non‐linear growth model predictive performance based on functional traits. In‐sample measures of model fit differed substantially from out‐of‐sample model predictive performance; the best fitting models were rarely the best predictive models. Careful selection of predictor variables reduced the bias in parameter estimates, and there was no single best model across our three data sets. Testing and comparing multiple model forms is important. We developed an R package with a formula interface for straightforward fitting and validation of hierarchical, non‐linear growth models. Our intent is to encourage thorough testing of multiple growth model forms and an increased emphasis on assessing model fit relative to a model's purpose.

20.
Several market research studies have shown that consumers are primarily concerned with the provenance of the food they eat. Among the available identification methods, only DNA‐based techniques appear able to completely prevent frauds. In this study, a new method to discriminate among different bovine breeds and assign new individuals to groups was developed. Bulls of three cattle breeds farmed in Italy – Holstein, Brown, and Simmental – were genotyped using the 50K SNP Illumina BeadChip. Multivariate canonical discriminant analysis was used to discriminate among breeds, and discriminant analysis (DA) was used to assign new observations. This method was able to completely identify the three groups at chromosome level. Moreover, a genome‐wide analysis developed using 340 linearly independent SNPs yielded a significant separation among groups. Using the reduced set of markers, the DA was able to assign 30 independent individuals to the proper breed. Finally, a set of 48 highly discriminant SNPs was selected and used to develop a new run of the analysis. Again, the procedure was able to significantly identify the three breeds and to correctly assign new observations. These results suggest that an assay with the selected 48 SNPs could be used to routinely track monobreed products.
