Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Canonical correlation analysis (CCA) describes the associations between two sets of variables by maximizing the correlation between linear combinations of the variables in each dataset. However, in high‐dimensional settings where the number of variables exceeds the sample size or when the variables are highly correlated, traditional CCA is no longer appropriate. This paper proposes a method for sparse CCA. Sparse estimation produces linear combinations of only a subset of variables from each dataset, thereby increasing the interpretability of the canonical variates. We consider the CCA problem from a predictive point of view and recast it into a regression framework. By combining an alternating regression approach with a lasso penalty, we induce sparsity in the canonical vectors. We compare its performance with that of other sparse CCA techniques in different simulation settings and illustrate its usefulness on a genomic dataset.

2.
3.
Zhang X  Huang S  Sun W  Wang W 《Genetics》2012,190(4):1511-1520
Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. In a typical eQTL study, the huge number of genetic markers and expression traits and their complicated correlations present a challenging multiple-testing correction problem. The resampling-based test using permutation or bootstrap procedures is a standard approach to address the multiple-testing problem in eQTL studies. A brute-force application of the resampling-based test to large-scale eQTL data sets is often computationally infeasible. Several computationally efficient methods have been proposed to calculate approximate resampling-based P-values. However, these methods rely on certain assumptions about the correlation structure of the genetic markers, which may not be valid for certain studies. We propose a novel algorithm, rapid and exact multiple testing correction by resampling (REM), to address this challenge. REM calculates the exact resampling-based P-values in a computationally efficient manner. The computational advantage of REM lies in its strategy of pruning the search space by skipping genetic markers whose upper bounds on test statistics are small. REM does not rely on any assumption about the correlation structure of the genetic markers. It can be applied to a variety of resampling-based multiple-testing correction methods including permutation and bootstrap methods. We evaluate REM on three eQTL data sets (yeast, inbred mouse, and human rare variants) and show that it achieves accurate resampling-based P-value estimation with much less computational cost than existing methods. The software is available at http://csbio.unc.edu/eQTL.
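For orientation, the brute-force permutation baseline that the abstract describes as computationally expensive can be sketched as follows. This is a minimal illustration of a max-statistic permutation correction for a single trait, not the REM algorithm itself (REM's pruning by upper bounds is not shown). The function names `pearson_r` and `max_stat_permutation_pvalues`, the choice of absolute Pearson correlation as the test statistic, and the parameters `n_perm` and `seed` are all assumptions made for this sketch.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_stat_permutation_pvalues(markers, trait, n_perm=1000, seed=0):
    """Family-wise adjusted p-values from the permutation null of the maximum statistic.

    markers: list of genotype vectors (one per marker), each as long as `trait`.
    Every permutation recomputes the statistic for every marker, which is
    exactly the cost that makes the brute-force approach expensive at scale.
    """
    rng = random.Random(seed)
    observed = [abs(pearson_r(m, trait)) for m in markers]
    exceed = [0] * len(markers)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-trait link
        max_null = max(abs(pearson_r(m, shuffled)) for m in markers)
        for j, obs in enumerate(observed):
            if max_null >= obs:
                exceed[j] += 1
    # The +1 terms keep the estimate strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Because the null distribution is built from the maximum over all markers, the correlation structure among markers is respected without any modelling assumption, at the price of `n_perm` full scans of the marker set.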

4.
The discovery of quantitative trait loci (QTL) in model organisms has relied heavily on the ability to perform controlled breeding to generate genotypic and phenotypic diversity. Recently, we and others have demonstrated the use of an existing set of diverse inbred mice (referred to here as the mouse diversity panel, MDP) as a QTL mapping population. The use of the MDP population has many advantages relative to traditional F2 mapping populations, including increased phenotypic diversity, a higher recombination frequency, and the ability to collect genotype and phenotype data in community databases. However, these methods are complicated by population structure inherent in the MDP and the lack of an analytical framework to assess statistical power. To address these issues, we measured gene expression levels in hypothalamus across the MDP. We then mapped these phenotypes as quantitative traits with our association algorithm, resulting in a large set of expression QTL (eQTL). We utilized these eQTL, and specifically cis-eQTL, to develop a novel nonparametric method for association analysis in structured populations like the MDP. These eQTL data confirmed that the MDP is a suitable mapping population for QTL discovery and that eQTL results can serve as a gold standard for relative measures of statistical power.

5.
A statistical framework for expression quantitative trait loci mapping   Cited by: 1 (self-citations: 0, citations by others: 1)
Chen M  Kendziorski C 《Genetics》2007,177(2):761-771

6.
DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an “expression quantitative trait locus” (eQTL). Whereas the impact of cellular context on expression levels in general is well established, a lot less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how “dynamic eQTL” were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total number of 3,941 eQTL we detected 2,729 static eQTL, 1,187 eQTL were conditionally active in one or several cell types, and 70 eQTL affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus, demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation. 
The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.

7.
8.
9.
This article presents a novel algorithm that efficiently computes L1 penalized (lasso) estimates of parameters in high‐dimensional models. The lasso has the property that it simultaneously performs variable selection and shrinkage, which makes it very useful for finding interpretable prediction rules in high‐dimensional data. The new algorithm is based on a combination of gradient ascent optimization with the Newton–Raphson algorithm. It is described for a general likelihood function and can be applied in generalized linear models and other models with an L1 penalty. The algorithm is demonstrated in the Cox proportional hazards model, predicting survival of breast cancer patients using gene expression data, and its performance is compared with competing approaches. An R package, penalized, that implements the method, is available on CRAN.
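The claim that the lasso performs variable selection and shrinkage at the same time is easiest to see in the textbook special case of an orthonormal design, where minimizing (1/2)||y - Xb||^2 + lam * ||b||_1 reduces to soft-thresholding each least-squares coefficient. The sketch below shows only that special case. It is not the gradient ascent and Newton–Raphson algorithm of the abstract, and it does not reflect the interface of the penalized package; the names `soft_threshold` and `lasso_orthonormal` are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution for an orthonormal design, given the least-squares coefficients.

    Assumes the objective (1/2) * ||y - X b||^2 + lam * ||b||_1 with X^T X = I,
    so each coordinate can be solved independently.
    """
    return [soft_threshold(b, lam) for b in ols_coefs]


# Small coefficients are set to zero (selection); large ones are pulled
# toward zero by lam (shrinkage).
print(lasso_orthonormal([3.0, -0.5, -2.0, 0.2], lam=1.0))  # [2.0, 0.0, -1.0, 0.0]
```

For general designs and likelihoods no such closed form exists, which is why iterative algorithms such as the one described in the abstract are needed.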

10.
11.
12.
In optimization, the dimension of the problem may severely, sometimes exponentially, increase optimization time. Parametric function approximators (FAPPs) have been suggested to overcome this problem. Here, a novel FAPP, cost component analysis (CCA), is described. In CCA, the search space is resampled according to the Boltzmann distribution generated by the energy landscape. That is, CCA converts the optimization problem to density estimation. The structure of the induced density is searched by independent component analysis (ICA). The advantage of CCA is that each independent ICA component can be optimized separately. In turn, (i) CCA intends to partition the original problem into subproblems and (ii) separating (partitioning) the original optimization problem into subproblems may serve interpretation. Most importantly, (iii) CCA may give rise to high gains in optimization time. Numerical simulations illustrate the working of the algorithm.

13.
We have recently shown that an energy penalty for the incorporation of residual tensorial constraints into molecular structure calculations can be formulated without the explicit knowledge of the Saupe orientation tensor (Moltke and Grzesiek, J. Biomol. NMR, 1999, 15, 77–82). Here we report the implementation of such an algorithm into the program X-PLOR. The new algorithm is easy to use and has good convergence properties. The algorithm is used for the structure refinement of the HIV-1 Nef protein using 252 dipolar coupling restraints. The approach is compared to the conventional penalty function with explicit knowledge of the orientation tensor's amplitude and rhombicity. No significant differences are found with respect to speed, Ramachandran core quality or coordinate precision.

14.
Using information from allele-specific gene expression (ASE) can improve the power to map gene expression quantitative trait loci (eQTLs). However, such practice has been limited, partly due to computational challenges and a lack of clarity about the size of the power gain or about new findings beyond improved power. We have developed geoP, a computationally efficient method to estimate permutation p-values, which makes it computationally feasible to perform eQTL mapping with ASE counts for large cohorts. We have applied geoP to map eQTLs in 28 human tissues using the data from the Genotype-Tissue Expression (GTEx) project. We demonstrate that using ASE data not only substantially improves the power to detect eQTLs, but also allows us to quantify individual-specific genetic effects, which can be used to study the variation of eQTL effect sizes with respect to other covariates. We also compared two popular methods for eQTL mapping with ASE: TReCASE and RASQUAL. TReCASE is at least ten times faster than RASQUAL, and it provides more robust type I error control.

15.
A scaffold is a three-dimensional matrix that provides a structural base to fill a tissue lesion and provides cells with a suitable environment for proliferation and differentiation. Cell-seeded scaffolds can be implanted immediately or be cultured in vitro for a period of time before implantation. To obtain uniform cell growth throughout the entire volume of the scaffolds, an optimal strategy for seeding cells into scaffolds is important. We propose an efficient and accurate numerical scheme for a mathematical model to predict the growth and distribution of cells in scaffolds. The proposed numerical algorithm is a hybrid method which uses both finite difference approximations and analytic closed-form solutions. The effects of each parameter in the mathematical model are numerically investigated. Moreover, we propose an optimization algorithm which finds the best set of model parameters, that is, the set that minimizes a discrete l2 error between numerical and experimental data. Using the mathematical model and its efficient and accurate numerical simulations, we could interpret experimental results and identify dominating mechanisms.

16.

Background

For treating a complex disease such as cancer, we need effective means to control the biological network that underlies the disease. However, biological networks are typically robust to external perturbations, making it difficult to beneficially alter the network dynamics by controlling a single target. In fact, multi-target therapeutics are often more effective than monotherapies, and combinatory drugs are now commonly used for treating various diseases. A practical challenge in combination therapy is that the number of possible drug combinations increases exponentially with the number of drugs, which makes the prediction of the optimal drug combination a difficult combinatorial optimization problem. Recently, a stochastic optimization algorithm called the Gur Game algorithm was proposed for drug optimization, which was shown to be very efficient in finding potent drug combinations.

Results

In this paper, we propose a novel stochastic optimization algorithm that can be used for effective optimization of combinatory drugs. The proposed algorithm analyzes how the concentration change of a specific drug affects the overall drug response, thereby making an informed guess on how the concentration should be updated to improve the drug response. We evaluated the performance of the proposed algorithm based on various drug response functions, and compared it with the Gur Game algorithm.

Conclusions

Numerical experiments clearly show that the proposed algorithm significantly outperforms the original Gur Game algorithm, in terms of reliability and efficiency. This enhanced optimization algorithm can provide an effective framework for identifying potent drug combinations that lead to optimal drug response.

17.
Liu B  de la Fuente A  Hoeschele I 《Genetics》2008,178(3):1763-1776
Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.

18.
Wei Zou  Zhao-Bang Zeng 《Genetica》2009,137(2):125-134
To find the correlations between genome-wide gene expression variations and sequence polymorphisms in inbred cross populations, we developed a statistical method to claim expression quantitative trait loci (eQTL) in a genome. The method is based on multiple interval mapping (MIM), a model selection procedure, and uses false discovery rate (FDR) to measure the statistical significance of the large number of eQTL. We compared our method with a similar procedure proposed by Storey et al. and found that our method can be more powerful. We identified the features in the two methods that resulted in different statistical powers for eQTL detection, and confirmed them by simulation. We organized our computational procedure in an R package which can estimate FDR for positive findings from similar model selection procedures. The R package, MIM-eQTL, can be found at .
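As background on how an FDR criterion ranks a large number of findings, the standard Benjamini–Hochberg adjustment can be sketched in a few lines. This is a generic illustration only: the abstract does not state which FDR estimator the MIM-based procedure uses, so the code below should not be read as the authors' method or as the interface of the MIM-eQTL package. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    The adjusted value for the k-th smallest p-value is the minimum of
    p_(j) * m / j over all ranks j >= k, where m is the number of tests.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Findings whose adjusted value falls below a chosen level (for example 0.05) are declared significant, with the expected proportion of false positives among them controlled at that level under the procedure's assumptions.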

19.
20.