Similar Articles
20 similar articles found
1.
MOTIVATION: Discriminant analysis for high-dimensional and low-sample-sized data has become a hot research topic in bioinformatics, mainly motivated by its importance and challenge in applications to tumor classifications for high-dimensional microarray data. Two of the popular methods are the nearest shrunken centroids, also called predictive analysis of microarray (PAM), and shrunken centroids regularized discriminant analysis (SCRDA). Both methods are modifications to the classic linear discriminant analysis (LDA) in two aspects tailored to high-dimensional and low-sample-sized data: one is the regularization of the covariance matrix, and the other is variable selection through shrinkage. In spite of their usefulness, there are potential limitations with each method. The main concern is that both PAM and SCRDA are possibly too extreme: the covariance matrix in the former is restricted to be diagonal, while in the latter there is barely any restriction. Based on the biology of gene functions and given the feature of the data, it may be beneficial to estimate the covariance matrix as an intermediate between the two; furthermore, more effective shrinkage schemes may be possible. RESULTS: We propose modified LDA methods to integrate biological knowledge of gene functions (or variable groups) into classification of microarray data. Instead of simply treating all the genes independently or imposing no restriction on the correlations among the genes, we group the genes according to their biological functions extracted from existing biological knowledge or data, and propose regularized covariance estimators that encourage between-group gene independence and within-group gene correlations while maintaining the flexibility of any general covariance structure. Furthermore, we propose a shrinkage scheme on groups of genes that tends to retain or remove a whole group of genes altogether, in contrast to the standard shrinkage on individual genes.
We show that one of the proposed methods performed better than PAM and SCRDA in a simulation study and several real data examples.
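As an illustrative aside (not the authors' code), the soft-thresholding step that PAM and the proposed group-wise scheme build on can be sketched as follows; the data, class labels, and `delta` value below are made up, and the real method additionally standardizes deviations by pooled within-class variance:

```python
# A pure-Python sketch of the nearest-shrunken-centroids idea behind PAM:
# each class centroid is pulled toward the overall centroid by soft-
# thresholding, so genes that barely differ between classes are zeroed out;
# the shrinkage thus doubles as gene selection.

def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def soft_threshold(x, delta):
    """Shrink x toward 0 by delta; values within delta of 0 become exactly 0."""
    if x > delta:
        return x - delta
    if x < -delta:
        return x + delta
    return 0.0

def shrunken_centroids(X, y, delta):
    overall = mean(X)
    cents = {}
    for label in sorted(set(y)):
        cls = [row for row, lab in zip(X, y) if lab == label]
        dev = [ci - oi for ci, oi in zip(mean(cls), overall)]
        cents[label] = [oi + soft_threshold(d, delta)
                        for oi, d in zip(overall, dev)]
    return cents

# Toy data: gene 0 separates the classes, gene 1 is pure noise.
X = [[5.0, 1.0], [6.0, 1.1], [0.0, 1.0], [1.0, 0.9]]
y = [0, 0, 1, 1]
cents = shrunken_centroids(X, y, delta=0.5)  # gene 1 collapses to its overall mean
```

After shrinkage, both class centroids agree on the noise gene, so it no longer influences classification.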

2.
Incorporating prior information into the analysis of contingency tables   (Total citations: 1; self-citations: 0; citations by others: 1)
M W Knuiman  T P Speed 《Biometrics》1988,44(4):1061-1071
Contingency tables are often analyzed using log-linear models and in some situations prior information on the value of parameters in the log-linear model is available. In this article we describe a prior-posterior procedure that incorporates prior information directly into the analysis through a multivariate normal prior for the log-linear parameters. The mode and curvature of the posterior density are proposed as summary statistics.

3.
Stable isotopes are a powerful tool for ecologists, often used to assess contributions of different sources to a mixture (e.g. prey to a consumer). Mixing models use stable isotope data to estimate the contribution of sources to a mixture. Uncertainty associated with mixing models is often substantial, but has not yet been fully incorporated in models. We developed a Bayesian mixing model that estimates probability distributions of source contributions to a mixture while explicitly accounting for uncertainty associated with multiple sources, fractionation and isotope signatures. This model also allows for optional incorporation of informative prior information in analyses. We demonstrate our model using a predator–prey case study. Accounting for uncertainty in mixing model inputs can change the variability, magnitude and rank order of estimates of prey (source) contributions to the predator (mixture). Isotope mixing models need to fully account for uncertainty in order to accurately estimate source contributions.
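A minimal Monte Carlo sketch of the general mixing-model idea (two sources, one isotope; not the authors' full Bayesian model): the mixture signature is a weighted average of source signatures, and sampling uncertain source values propagates uncertainty into the estimated source proportion. All numbers below are made up for illustration.

```python
import random

random.seed(0)

def source_proportion_samples(mix, src_a, src_b, sd_a, sd_b, n=10000):
    """Sample the proportion p of source A under mix = p*a + (1-p)*b,
    with Gaussian uncertainty on each source signature."""
    samples = []
    for _ in range(n):
        a = random.gauss(src_a, sd_a)   # uncertain signature of source A
        b = random.gauss(src_b, sd_b)   # uncertain signature of source B
        if abs(a - b) < 1e-6:
            continue                    # sources indistinguishable; skip draw
        p = (mix - b) / (a - b)
        samples.append(min(max(p, 0.0), 1.0))  # clamp to a valid proportion
    return samples

# Mixture exactly halfway between the two source means, so p should be ~0.5.
ps = source_proportion_samples(mix=-20.0, src_a=-25.0, src_b=-15.0,
                               sd_a=1.0, sd_b=1.0)
mean_p = sum(ps) / len(ps)
```

The spread of `ps` (not just its mean) is the point of the exercise: source uncertainty alone already produces a distribution of feasible contributions, which is what the Bayesian model formalizes.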

4.
In classification, prior knowledge is incorporated in a Bayesian framework by assuming that the feature-label distribution belongs to an uncertainty class of feature-label distributions governed by a prior distribution. A posterior distribution is then derived from the prior and the sample data. An optimal Bayesian classifier (OBC) minimizes the expected misclassification error relative to the posterior distribution. From an application perspective, prior construction is critical. The prior distribution is formed by mapping a set of mathematical relations among the features and labels, the prior knowledge, into a distribution governing the probability mass across the uncertainty class. In this paper, we consider prior knowledge in the form of stochastic differential equations (SDEs). We consider a vector SDE in integral form involving a drift vector and dispersion matrix. Having constructed the prior, we develop the optimal Bayesian classifier between two models and examine, via synthetic experiments, the effects of uncertainty in the drift vector and dispersion matrix. We apply the theory to a set of SDEs for the purpose of differentiating the evolutionary history between two species.

5.
In randomized studies with missing outcomes, non-identifiable assumptions are required to hold for valid data analysis. As a result, statisticians have been advocating the use of sensitivity analysis to evaluate the effect of varying assumptions on study conclusions. While this approach may be useful in assessing the sensitivity of treatment comparisons to missing data assumptions, it may be dissatisfying to some researchers/decision makers because a single summary is not provided. In this paper, we present a fully Bayesian methodology that allows the investigator to draw a 'single' conclusion by formally incorporating prior beliefs about non-identifiable, yet interpretable, selection bias parameters. Our Bayesian model provides robustness to prior specification of the distributional form of the continuous outcomes.

6.
MOTIVATION: Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering. RESULTS: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them to either one of the clusters obtained in the first step or some new clusters. A simulation study and an application to gene function prediction for yeast demonstrate the advantage of our proposal over the standard method.
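The shrinkage distance metric described in this abstract can be sketched in a few lines; the shrinkage weight `r` below is an assumed tuning parameter, not a value from the paper:

```python
# A hedged sketch of the shrinkage distance idea: an expression-based distance
# between two genes is shrunk toward 0 if and only if they share an annotated
# function; unrelated pairs keep their original distance.
def shrunken_distance(d_expr, shares_function, r=0.5):
    """Shrink expression-based distance d_expr by factor (1 - r) for
    functionally related gene pairs (0 <= r <= 1)."""
    return (1.0 - r) * d_expr if shares_function else d_expr

d_related = shrunken_distance(2.0, shares_function=True, r=0.5)     # shrunk
d_unrelated = shrunken_distance(2.0, shares_function=False, r=0.5)  # unchanged
```

Plugged into K-medoids or hierarchical clustering, this biases functionally related genes into the same cluster without forcing them together (`r < 1` keeps expression information in play).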

7.
The success of pro-active management of invasive plants depends on the ability to rapidly detect invasive populations and individuals. However, the factors important for detection depend on the spatial scale examined. We propose a protocol for developing risk maps at national, landscape, and local scales to improve detection rates of invasive plant species. We test this approach in the context of developing an eradication plan for the invasive tree Acacia stricta in South Africa. At a national scale we used bioclimatic models coupled with the most likely sites of introduction (i.e. forestry nursery plantations) to identify areas where national-scale surveillance should be focussed. At the landscape and local scales we correlated the presence of A. stricta populations to various attributes. Regional populations were found in forestry plantations only, and mostly on highly used graded roads along which seeds are spread by road maintenance vehicles. Locally, previously recorded plant localities accurately predicted individuals in subsequent surveys. Using these variables, we produced a map of high-risk areas that facilitated targeted searches, which reduced the required search effort by ca. 83%, and developed recommendations for site-specific surveying. With the high visibility of plants, and relatively small seed banks, long-term annual clearing should achieve eradication. We propose that such multi-scale risk mapping is valuable for prioritising management and surveillance efforts, though caution that the approach is correlative and so it does not represent all the sites that can be invaded.

8.
We provide a review of multicriteria decision-making (MCDM) methods that may potentially be used during systematic conservation planning for the design of conservation area networks (CANs). We review 26 methods and present the core ideas of 19 of them. We suggest that the computation of the non-dominated set (NDS) be the first stage of any such analysis. This process requires only that alternatives be qualitatively ordered by each criterion. If the criteria can also be similarly ordered, at the next stage, Regime is the most appropriate method to refine the NDS. If the alternatives can also be given quantitative values by the criteria, Goal Programming will prove useful in many contexts. If both the alternatives and the criteria can be quantitatively evaluated, and the criteria are independent of each other but may be compounded, then multi-attribute value theory (MAVT) should be used (with preferences conveniently elicited by a modified Analytic Hierarchy Process (mAHP) provided that the number of criteria is not large).
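The first-stage computation the review recommends, extracting the non-dominated set, can be sketched as follows (the scores are illustrative, not data from the review; only a per-criterion ordering is actually required):

```python
# Compute the non-dominated (Pareto) set of alternatives scored on several
# criteria, where higher is better on every criterion.
def dominates(a, b):
    """a dominates b if a is at least as good on every criterion
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_set(alternatives):
    return [a for a in alternatives
            if not any(dominates(b, a) for b in alternatives if b != a)]

# Four candidate conservation-area networks scored on two criteria;
# (1, 1) is dominated (e.g. by (2, 2)) and drops out.
alts = [(3, 1), (1, 3), (2, 2), (1, 1)]
nds = non_dominated_set(alts)
```

Everything in `nds` represents a genuine trade-off between criteria; later stages (Regime, Goal Programming, MAVT) only need to rank within this reduced set.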

9.
10.
11.
During the past thirty years, natural selection due to predation has been investigated with regard to prey motion in three areas that are relevant to the evolution of mimicry: (1) anti-apostatic selection, (2) locomotor mimicry, and (3) escape mimicry. Anti-apostatic selection, or selection against the odd individuals, arises when prey are at very high densities or when prey are Müllerian mimics. When prey are at high densities, motion of the prey increases selection against odd individuals. When the prey are Müllerian mimics, motion may also play an important role in strengthening selection against odd individuals. This may explain locomotor mimicry between Müllerian mimics. Locomotor mimicry arises when two distantly related prey species appear alike in behaviour, and there is a corresponding suite of morphological, physiological, and biomechanical traits that the prey have in common. Locomotor mimicry has been demonstrated in Müllerian mimics. It is also predicted to occur in Batesian mimics but with important limitations due to selection by the predator for the prey to maintain the ability to escape if detected. Locomotor mimicry may also occur between palatable species that are alike as a result of unprofitable prey (or escape) mimicry. Escape mimicry arises when prey are difficult to capture. By frustration learning, the predator associates the colour of the prey with unprofitability. In all three instances, dissimilarity in colour or motion probably increases selection against the odd individual. In addition, the interaction of colour and motion gives rise to greater reliability of the signals to a specialist predator. However, for a generalist predator, multiple component signals of the prey lead to errors in signal perception and greater risk of cheating. This revised version was published online in July 2006 with corrections to the Cover Date.

12.
The study of nematode genomes over the last three decades has relied heavily on the model organism Caenorhabditis elegans, which remains the best-assembled and annotated metazoan genome. This is now changing as a rapidly expanding number of nematodes of medical and economic importance have been sequenced in recent years. The advent of sequencing technologies to achieve the equivalent of the $1000 human genome promises that every nematode genome of interest will eventually be sequenced at a reasonable cost. As the sequencing of species spanning the nematode phylum becomes a routine part of characterizing nematodes, the comparative approach and the increasing use of ecological context will help us to further understand the evolution and functional specializations of any given species by comparing its genome to that of other closely and more distantly related nematodes. We review the current state of nematode genomics and discuss some of the highlights that these genomes have revealed and the trend and benefits of ecological genomics, emphasizing the potential for new genomes and the exciting opportunities this provides for nematological studies.

13.
Bayesian methods allow borrowing of historical information through prior distributions. The concept of prior effective sample size (prior ESS) facilitates quantification and communication of such prior information by equating it to a sample size. Prior information can arise from historical observations; thus, the traditional approach identifies the ESS with such a historical sample size. However, this measure is independent of newly observed data, and thus would not capture an actual “loss of information” induced by the prior in case of prior-data conflict. We build on a recent work to relate prior impact to the number of (virtual) samples from the current data model and introduce the effective current sample size (ECSS) of a prior, tailored to the application in Bayesian clinical trial designs. Special emphasis is put on robust mixture, power, and commensurate priors. We apply the approach to an adaptive design in which the number of recruited patients is adjusted depending on the effective sample size at an interim analysis. We argue that the ECSS is the appropriate measure in this case, as the aim is to save current (as opposed to historical) patients from recruitment. Furthermore, the ECSS can help overcome lack of consensus in the ESS assessment of mixture priors and can, more broadly, provide further insights into the impact of priors. An R package accompanies the paper.
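For intuition, the traditional prior-ESS idea the paper departs from reads as follows in the conjugate Beta-binomial case (a toy illustration, not the paper's ECSS):

```python
# A toy illustration of the traditional prior-ESS idea: a Beta(a, b) prior
# acts like a + b pseudo-observations, because the posterior after s successes
# in n trials is Beta(a + s, b + n - s). The ECSS proposed in the paper
# instead measures the prior's impact relative to the currently observed data.
def beta_posterior(a, b, successes, n):
    """Conjugate update of a Beta(a, b) prior with binomial data."""
    return a + successes, b + (n - successes)

prior_ess = 4 + 6                            # Beta(4, 6): ~10 pseudo-observations
post_a, post_b = beta_posterior(4, 6, successes=7, n=10)
posterior_mean = post_a / (post_a + post_b)  # (4 + 7) / (10 + 10) = 0.55
```

Note how the historical-ESS figure of 10 is fixed before any data arrive, which is exactly the property the ECSS is designed to avoid under prior-data conflict.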

14.
15.
16.
Researchers interested in the association of a predictor with an outcome will often collect information about that predictor from more than one source. Standard multiple regression methods allow estimation of the effect of each predictor on the outcome while controlling for the remaining predictors. The resulting regression coefficient for each predictor has an interpretation that is conditional on all other predictors. In settings in which interest is in comparison of the marginal pairwise relationships between each predictor and the outcome separately (e.g., studies in psychiatry with multiple informants or comparison of the predictive values of diagnostic tests), standard regression methods are not appropriate. Instead, the generalized estimating equations (GEE) approach can be used to simultaneously estimate, and make comparisons among, the separate pairwise marginal associations. In this paper, we consider maximum likelihood (ML) estimation of these marginal relationships when the outcome is binary. ML enjoys benefits over GEE methods in that it is asymptotically efficient, can accommodate missing data that are ignorable, and allows likelihood-based inferences about the pairwise marginal relationships. We also explore the asymptotic relative efficiency of ML and GEE methods in this setting.

17.
Predicting clinical variables from whole‐brain neuroimages is a high‐dimensional problem that can potentially benefit from feature selection or extraction. Penalized regression is a popular embedded feature selection method for high‐dimensional data. For neuroimaging applications, spatial regularization using the ℓ1 or ℓ2 norm of the image gradient has shown good performance, yielding smooth solutions in spatially contiguous brain regions. Enormous resources have been devoted to establishing structural and functional brain connectivity networks that can be used to define spatially distributed yet related groups of voxels. We propose using the fused sparse group lasso (FSGL) penalty to encourage structured, sparse, and interpretable solutions by incorporating prior information about spatial and group structure among voxels. We present optimization steps for FSGL penalized regression using the alternating direction method of multipliers algorithm. With simulation studies and in application to real functional magnetic resonance imaging data from the Autism Brain Imaging Data Exchange, we demonstrate conditions under which fusion and group penalty terms together outperform either of them alone.
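A minimal sketch of evaluating an FSGL-style penalty (a made-up 1-D "image" and illustrative weights; the paper's exact formulation and its ADMM solver are not reproduced here):

```python
import math

# Three-part penalty on a coefficient vector: an l1 sparsity term, a fusion
# (total-variation-like) term on adjacent-coefficient differences, and a group
# term summing l2 norms over predefined voxel groups.
def fsgl_penalty(beta, groups, lam1=1.0, lam_f=1.0, lam_g=1.0):
    sparsity = sum(abs(b) for b in beta)
    fusion = sum(abs(beta[i + 1] - beta[i]) for i in range(len(beta) - 1))
    group = sum(math.sqrt(sum(beta[i] ** 2 for i in g)) for g in groups)
    return lam1 * sparsity + lam_f * fusion + lam_g * group

beta = [0.0, 1.0, 1.0, 0.0]   # a sparse, piecewise-constant coefficient "image"
groups = [[0, 1], [2, 3]]     # two hypothetical connectivity-defined groups
val = fsgl_penalty(beta, groups)
```

Minimizing a loss plus this penalty favors solutions that are sparse (l1), spatially smooth (fusion), and active group-by-group rather than voxel-by-voxel (group term).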

18.
19.

Background

Reconstruction of protein-protein interaction or metabolic networks based on expression data often involves in silico predictions, while on the other hand, there are unspecific networks of in vivo interactions derived from knowledge bases. We analyze networks designed to come as close as possible to data measured in vivo, both with respect to the set of nodes which were taken to be expressed in experiment as well as with respect to the interactions between them, which were taken from manually curated databases.

Results

A signaling network derived from the TRANSPATH database and a metabolic network derived from KEGG LIGAND are each filtered onto expression data from breast cancer (SAGE) considering different levels of restrictiveness in edge and vertex selection. We perform several validation steps, in particular we define pathway over-representation tests based on refined null models to recover functional modules. The prominent role of the spindle checkpoint-related pathways in breast cancer is exhibited. High-ranking key nodes cluster in functional groups retrieved from literature. Results are consistent between several functional and topological analyses and between signaling and metabolic aspects.

Conclusions

This construction involved as a crucial step the passage to a mammalian protein identifier format as well as to a reaction-based semantics of metabolism. This yielded good connectivity but also led to the need to perform benchmark tests to exclude loss of essential information. Such validation, albeit tedious due to limitations of existing methods, turned out to be informative, and in particular provided biological insights as well as information on the degrees of coherence of the networks despite fragmentation of experimental data. Key node analysis exploited the networks to identify potentially interesting proteins in view of drug target prediction.

20.
In the assessment of clinical utility of biomarkers, case-control studies are often undertaken based on existing serum samples. A common assumption made in these studies is that higher levels of the biomarker are associated with increased disease risk. In this article, we consider methods of analysis in which monotonicity is incorporated in associating the biomarker and the clinical outcome. We consider the roles of discrimination versus association and assess methods for both goals. In addition, we propose a semiparametric isotonic regression model for binary data and describe a simple estimation procedure as well as attendant inferential procedures. We apply the various methodologies to data from a prostate cancer study involving a serum biomarker.
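The monotonicity constraint at the heart of isotonic regression is typically enforced by the pool-adjacent-violators algorithm (PAVA); a minimal sketch on made-up rates (the paper's semiparametric model builds on this primitive but is not reproduced here):

```python
# Pool-adjacent-violators: fit a non-decreasing sequence to observed values
# (e.g. disease rates ordered by increasing biomarker level) by repeatedly
# averaging adjacent blocks that violate monotonicity.
def pava(y, w=None):
    """Return the non-decreasing weighted least-squares fit to y."""
    w = list(w) if w is not None else [1.0] * len(y)
    vals, wts, counts = [], [], []
    for yi, wi in zip(y, w):
        vals.append(yi); wts.append(wi); counts.append(1)
        # pool the last two blocks while they violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            tot = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / tot
            wts[-2] = tot; counts[-2] += counts[-1]
            vals.pop(); wts.pop(); counts.pop()
    fit = []
    for v, c in zip(vals, counts):
        fit.extend([v] * c)
    return fit

rates = [0.1, 0.3, 0.2, 0.5]   # observed rates by increasing biomarker level
fit = pava(rates)              # -> [0.1, 0.25, 0.25, 0.5]
```

The dip at the third value is averaged away with its neighbor, yielding the closest monotone fit in least squares.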
