Similar literature
1.
Sufficient dimension reduction via Bayesian mixture modeling   Total citations: 1 (self-citations: 0, citations by others: 1)
Reich BJ  Bondell HD  Li L 《Biometrics》2011,67(3):886-895
Dimension reduction is central to an analysis of data with many predictors. Sufficient dimension reduction aims to identify the smallest possible number of linear combinations of the predictors, called the sufficient predictors, that retain all of the information in the predictors about the response distribution. In this article, we propose a Bayesian solution for sufficient dimension reduction. We directly model the response density in terms of the sufficient predictors using a finite mixture model. This approach is computationally efficient and offers a unified framework to handle categorical predictors, missing predictors, and Bayesian variable selection. We illustrate the method using both a simulation study and an analysis of an HIV data set.

2.
In data analysis using dimension reduction methods, the main goal is to summarize how the response is related to the covariates through a few linear combinations. One key issue is to determine the number of independent, relevant covariate combinations, which is the dimension of the sufficient dimension reduction (SDR) subspace. In this work, we propose an easily applied approach to conduct inference for the dimension of the SDR subspace, based on augmentation of the covariate set with simulated pseudo-covariates. Applying the partitioning principle to the possible dimensions, we use rigorous sequential testing to select the dimensionality, by comparing the strength of the signal arising from the actual covariates to that appearing to arise from the pseudo-covariates. We show that under a “uniform direction” condition, our approach can be used in conjunction with several popular SDR methods, including sliced inverse regression. In these settings, the test statistic asymptotically follows a beta distribution and therefore is easily calibrated. Moreover, the family-wise type I error rate of our sequential testing is rigorously controlled. Simulation studies and an analysis of newborn anthropometric data demonstrate the robustness of the proposed approach, and indicate that the power is comparable to or greater than the alternatives.

3.
Lu W  Li L 《Biometrics》2011,67(2):513-523
Methodology of sufficient dimension reduction (SDR) has offered an effective means to facilitate regression analysis of high-dimensional data. When the response is censored, however, most existing SDR estimators cannot be applied, or require some restrictive conditions. In this article, we propose a new class of inverse censoring probability weighted SDR estimators for censored regressions. Moreover, regularization is introduced to achieve simultaneous variable selection and dimension reduction. Asymptotic properties and empirical performance of the proposed methods are examined.

4.
Dimension reduction in regression without matrix inversion   Total citations: 3 (self-citations: 0, citations by others: 3)
Regressions in which the fixed number of predictors p exceeds the number of independent observational units n occur in a variety of scientific fields. Sufficient dimension reduction provides a promising approach to such problems, by restricting attention to d < n linear combinations of the original p predictors. However, standard methods of sufficient dimension reduction require inversion of the sample predictor covariance matrix. We propose a method for estimating the central subspace that eliminates the need for such inversion and is applicable regardless of the (n, p) relationship. Simulations show that our method compares favourably with standard large sample techniques when the latter are applicable. We illustrate our method with a genomics application.

5.
We introduce a non-parametric approach using bootstrap-assisted correspondence analysis to identify and validate genes that are differentially expressed in factorial microarray experiments. Model comparison showed that although both parametric and non-parametric methods capture the different profiles in the data, our method is less prone to false-positive results due to dimension reduction in data analysis.

6.
Zeng  Peng 《Biometrika》2008,95(2):469-479
The central subspace and central mean subspace are two important targets of sufficient dimension reduction. We propose a weighted chi-squared test to determine their dimensions based on matrices whose column spaces are exactly equal to the central subspace or the central mean subspace. The asymptotic distribution of the test statistic is obtained. Simulation examples are used to demonstrate the performance of this test.

7.
8.
Yoo  Jae Keun; Cook  R. Dennis 《Biometrika》2007,94(1):231-242
The aim of this article is to develop optimal sufficient dimension reduction methodology for the conditional mean in multivariate regression. The context is roughly the same as that of a related method by Cook & Setodji (2003), but the new method has several advantages. It is asymptotically optimal in the sense described herein and its test statistic for dimension always has a chi-squared distribution asymptotically under the null hypothesis. Additionally, the optimal method allows tests of predictor effects. A comparison of the two methods is provided.

9.
A model free approach to combining biomarkers   Total citations: 1 (self-citations: 0, citations by others: 1)
For most diseases, single biomarkers do not have adequate sensitivity or specificity for practical purposes. We present an approach to combine several biomarkers into a composite marker score without assuming a model for the distribution of the predictors. Using sufficient dimension reduction techniques, we replace the original markers with a lower-dimensional version, obtained through linear transformations of markers that contain sufficient information for regression of the predictors on the outcome. We combine the linear transformations using their asymptotic properties into a scalar diagnostic score via the likelihood ratio statistic. The performance of this score is assessed by the area under the receiver operating characteristic (ROC) curve, a popular summary measure of the discriminatory ability of a single continuous diagnostic marker for binary disease outcomes. An asymptotic chi-squared test for assessing individual biomarker contribution to the diagnostic score is also derived.
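The area under the ROC curve used to assess a scalar diagnostic score can be computed directly from its rank interpretation: it is the probability that a randomly chosen diseased subject scores higher than a randomly chosen healthy one, with ties counted as one half. The sketch below is illustrative only; the function name `auc` and the toy scores are assumptions, not part of the paper, and it says nothing about how the authors build their composite score.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair-counting identity.

    scores: diagnostic score per subject (higher = more likely diseased).
    labels: 1 for diseased, 0 for healthy (hypothetical coding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject in each class")
    # Count concordant pairs; a tied pair contributes one half.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: three of the four diseased/healthy pairs are ordered correctly.
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pair-counting form is quadratic in the sample size, which is fine for small studies; a rank-based version would scale better for large ones.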

10.
Wenjing Wang  Xin Zhang  Lexin Li 《Biometrics》2019,75(4):1109-1120
Motivated by brain connectivity analysis and many other network data applications, we study the problem of estimating covariance and precision matrices and their differences across multiple populations. We propose a common reducing subspace model that leads to substantial dimension reduction and efficient parameter estimation. We explicitly quantify the efficiency gain through an asymptotic analysis. Our method is built upon and further extends a nascent technique, the envelope model, which adopts a generalized sparsity principle. This distinguishes our proposal from most existing covariance and precision estimation methods that assume element-wise sparsity. Moreover, unlike most existing solutions, our method can naturally handle both covariance and precision matrices in a unified way, and work with matrix-valued data. We demonstrate the efficacy of our method through intensive simulations, and illustrate the method with an autism spectrum disorder data analysis.

11.
We propose a new method for selection of the most informative variables from the set of variables which can be measured directly. The information is measured by metrics similar to those used in experimental design theory, such as determinant of the dispersion matrix of prediction or various functions of its eigenvalues. The basic model admits both population variability and observational errors, which allows us to introduce algorithms based on ideas of optimal experimental design. Moreover, we can take into account cost of measuring various variables which makes the approach more practical. It is shown that the selection of optimal subsets of variables is invariant to scale transformations unlike other methods of dimension reduction, such as principal components analysis or methods based on direct selection of variables, for instance principal variables and battery reduction. The performance of different approaches is compared using the clinical data.

12.
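Visualization of gene expression profile data based on manifold learning   Total citations: 2 (self-citations: 0, citations by others: 2)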
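Visualizing gene expression profiles is essentially a dimension reduction problem for high-dimensional data. We apply manifold learning algorithms to the dimension reduction and visualization of gene expression profiles, and discuss the applicability of typical manifold learning algorithms (Isomap and LLE) to expression profile data. The quality of the dimension reduction is evaluated quantitatively using within-class/between-class distances. Dimension reduction analysis of two typical microarray data sets (a colon cancer gene expression data set and an acute leukemia gene expression data set) shows that the intrinsic dimension of both data sets is below 3, so manifold learning methods can be used to visualize them in a low-dimensional projection space. Compared with the projections produced by traditional dimension reduction methods (such as PCA and MDS), the Isomap manifold learning method gives better visualization results.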
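One way to read the within-class/between-class distance criterion is as a ratio of average pairwise distances in the embedded space: the smaller the ratio, the more tightly each class clusters relative to the gap between classes. The sketch below assumes that reading; the abstract does not give the exact formula, so the function name `within_between_ratio` and the choice of plain Euclidean averages are assumptions.

```python
import math
from itertools import combinations


def within_between_ratio(points, labels):
    """Mean within-class distance divided by mean between-class distance.

    points: low-dimensional coordinates, one tuple per sample.
    labels: class label per sample (e.g. tumour vs normal).
    Lower values suggest better class separation in the embedding.
    """
    within, between = [], []
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        (within if a == b else between).append(d)
    if not within or not between:
        raise ValueError("need same-class and different-class pairs")
    return (sum(within) / len(within)) / (sum(between) / len(between))


if __name__ == "__main__":
    embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(within_between_ratio(embedding, [0, 0, 1, 1]))  # well separated
    print(within_between_ratio(embedding, [0, 1, 0, 1]))  # classes interleaved
```

Computing the same ratio on 2D projections from different methods gives a simple numeric basis for comparing them, under the assumed definition.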

13.
Environmental DNA (eDNA) metabarcoding provides an efficient approach for documenting biodiversity patterns in marine and terrestrial ecosystems. The complexity of these data prevents current methods from extracting and analyzing all the relevant ecological information they contain, and new methods may provide better dimensionality reduction and clustering. Here we present two new deep learning-based methods that combine different types of neural networks (NNs) to ordinate eDNA samples and visualize ecosystem properties in a two-dimensional space: the first is based on variational autoencoders and the second on deep metric learning. The strength of our new methods lies in the combination of two inputs: the number of sequences found for each molecular operational taxonomic unit (MOTU) detected and their corresponding nucleotide sequence. Using three different datasets, we show that our methods accurately represent several biodiversity indicators in a two-dimensional latent space: MOTU richness per sample, sequence α-diversity per sample, Jaccard's and sequence β-diversity between samples. We show that our nonlinear methods are better at extracting features from eDNA datasets while avoiding the major biases associated with eDNA. Our methods outperform traditional dimension reduction methods such as Principal Component Analysis, t-distributed Stochastic Neighbour Embedding, Nonmetric Multidimensional Scaling and Uniform Manifold Approximation and Projection for dimension reduction. Our results suggest that NNs provide a more efficient way of extracting structure from eDNA metabarcoding data, thereby improving their ecological interpretation and thus biodiversity monitoring.

14.
The fractal dimension D may be calculated in many ways, since its strict definition, the Hausdorff definition, is too complicated for practical estimation. In this paper we perform a comparative study of ten methods of fractal analysis of time series. In Benoit, a commercial program for fractal analysis, five methods of computing fractal dimension of time series (rescaled range analysis, power spectral analysis, roughness-length, variogram methods and wavelet method) are available. We have implemented some other algorithms for calculating D: Higuchi's fractal dimension, relative dispersion analysis, running fractal dimension, method based on mathematical morphology and method based on intensity differences. For biomedical signals results obtained by means of different algorithms are different, but consistent.

15.
Extensions to gene set enrichment   Total citations: 2 (self-citations: 0, citations by others: 2)
MOTIVATION: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture changes in the expression of pre-defined sets of genes. We propose a number of extensions to GSEA, including the use of different statistics to describe the association between genes and phenotypes of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets with correlated expression. We also address issues that arise when gene sets overlap. RESULTS: Our proposals extend the range of applicability of GSEA and allow for adjustments based on other covariates. We have provided a well-defined procedure to address interpretation issues that can arise when gene sets have substantial overlap. We have shown how standard dimension reduction methods, such as PCA, can be used to help further interpret GSEA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

16.
We present a novel method for finding low-dimensional views of high-dimensional data: Targeted Projection Pursuit. The method proceeds by finding projections of the data that best approximate a target view. Two versions of the method are introduced; one version based on Procrustes analysis and one based on an artificial neural network. These versions are capable of finding orthogonal or non-orthogonal projections, respectively. The method is quantitatively and qualitatively compared with other dimension reduction techniques. It is shown to find 2D views that display the classification of cancers from gene expression data with a visual separation equal to, or better than, existing dimension reduction techniques. AVAILABILITY: source code, additional diagrams, and original data are available from http://computing.unn.ac.uk/staff/CGJF1/tpp/bioinf.html

17.
In functional genomics it is more rule than exception that experimental designs are used to generate the data. The samples of the resulting data sets are thus organized according to this design and for each sample many biochemical compounds are measured, e.g. typically thousands of gene-expressions or hundreds of metabolites. This results in high-dimensional data sets with an underlying experimental design. Several methods have recently become available for analyzing such data while utilizing the underlying design. We review these methods by putting them in a unifying and general framework to facilitate understanding the (dis-)similarities between the methods. The biological question dictates which method to use and the framework allows for building new methods to accommodate a range of such biological questions. The framework is built on well known fixed-effect ANOVA models and subsequent dimension reduction. We present the framework both in matrix algebra and in more insightful geometrical terms. We show the workings of the different special cases of our framework with a real-life metabolomics example from nutritional research and a gene-expression example from the field of virology.

18.
A neural network has been used to reduce the dimensionality of multivariate data sets to produce two-dimensional (2D) displays of these sets. The data consisted of physicochemical properties for sets of biologically active molecules calculated by computational chemistry methods. Previous work has demonstrated that these data contain sufficient relevant information to classify the compounds according to their biological activity. The plots produced by the neural network are compared with results from two other techniques for linear and nonlinear dimension reduction, and are shown to give comparable and, in one case, superior results. Advantages of this technique are discussed.

19.
Robust PCA and classification in biosciences   Total citations: 7 (self-citations: 0, citations by others: 7)
MOTIVATION: Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach that is based on the mean and the sample covariance matrix of the data is very sensitive to outliers. Also, classification methods based on this covariance matrix do not give good results in the presence of outlying measurements. RESULTS: First, we propose a robust PCA (ROBPCA) method for high-dimensional data. It combines projection-pursuit ideas with robust estimation of low-dimensional data. We also propose a diagnostic plot to display and classify the outliers. This ROBPCA method is applied to several bio-chemical datasets. In one example, we also apply a robust discriminant method on the scores obtained with ROBPCA. We show that this combination of robust methods leads to better classifications than classical PCA and quadratic discriminant analysis. AVAILABILITY: All the programs are part of the Matlab Toolbox for Robust Calibration, available at http://www.wis.kuleuven.ac.be/stat/robust.html.
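The sensitivity of classical PCA to outliers can be seen in two dimensions, where the direction of the first principal component has a closed form in terms of the sample covariance entries. The sketch below only illustrates that sensitivity; it is not ROBPCA and does not use the Matlab toolbox mentioned above. The function name `first_pc_angle` and the toy data are assumptions made for illustration.

```python
import math


def first_pc_angle(points):
    """Angle (degrees) of the first principal axis of 2D data.

    Uses the classical mean and sample covariance, then the closed-form
    principal-axis angle 0.5 * atan2(2 * sxy, sxx - syy).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


if __name__ == "__main__":
    # Toy data spread along the x-axis with small alternating vertical jitter.
    clean = [(float(i), 0.1 if i % 2 == 0 else -0.1) for i in range(-10, 11)]
    contaminated = clean + [(0.0, 100.0)]  # one gross outlier
    print(abs(first_pc_angle(clean)))         # close to 0 degrees
    print(abs(first_pc_angle(contaminated)))  # close to 90 degrees
```

A single contaminated observation rotates the leading component by about a right angle here, which is the kind of failure that robust estimators of location and scatter are designed to avoid.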

20.
NMR spectroscopy is central to atomic resolution studies in biology and chemistry. Key to this approach are multidimensional experiments. Obtaining such experiments with sufficient resolution, however, is a slow process, in part since each time increment in every indirect dimension needs to be recorded twice, in quadrature. We introduce a modified compressed sensing (CS) algorithm enabling reconstruction of data acquired with random acquisition of quadrature components in gradient-selection NMR. We name this approach random quadrature detection (RQD). Gradient-selection experiments are essential to the success of modern NMR and with RQD, a 50 % reduction in the number of data points per indirect dimension is possible, by only acquiring one quadrature component per time point. Using our algorithm (CSRQD), high quality reconstructions are achieved. RQD is modular and combined with non-uniform sampling we show that this provides increased flexibility in designing sampling schedules leading to improved resolution with increasing benefits as dimensionality of experiments increases, with particular advantages for 4- and higher dimensional experiments.
