Similar Articles
20 similar articles found.
1.
Ye C  Cui Y  Wei C  Elston RC  Zhu J  Lu Q 《Human heredity》2011,71(3):161-170

2.
Fesel C 《PloS one》2012,7(3):e33990
Many multifactorial biologic effects, particularly in the context of complex human diseases, are still poorly understood. At the same time, the systematic acquisition of multivariate data has become increasingly easy. The use of such data to analyze and model complex phenotypes, however, remains a challenge. Here, a new analytic approach termed coreferentiality is described, together with an appropriate statistical test. Coreferentiality is the indirect relation of two variables of functional interest with respect to whether they parallel each other in their respective relatedness to multivariate reference data, which can be informative for a complex effect or phenotype. It is shown that the power of coreferentiality testing is comparable to that of multiple regression analysis, is sufficient even when the reference data are informative only to a relatively small extent of 2.5%, and clearly exceeds the power of simple bivariate correlation testing. Thus, coreferentiality testing exploits the increased power of multivariate analysis, but in order to address a more straightforwardly interpretable bivariate relatedness. Systematic application of this approach could substantially improve the analysis and modeling of complex phenotypes, particularly in human studies, where addressing functional hypotheses by direct experimentation is often difficult.

3.
Wang L  Zhou J  Qu A 《Biometrics》2012,68(2):353-360
We consider the penalized generalized estimating equations (GEEs) for analyzing longitudinal data with high-dimensional covariates, which often arise in microarray experiments and large-scale health studies. Existing high-dimensional regression procedures often assume independent data and rely on the likelihood function. Construction of a feasible joint likelihood function for high-dimensional longitudinal data is challenging, particularly for correlated discrete outcome data. The penalized GEE procedure only requires specifying the first two marginal moments and a working correlation structure. We establish the asymptotic theory in a high-dimensional framework where the number of covariates p(n) increases as the number of clusters n increases, and p(n) can reach the same order as n. One important feature of the new procedure is that the consistency of model selection holds even if the working correlation structure is misspecified. We evaluate the performance of the proposed method using Monte Carlo simulations and demonstrate its application using a yeast cell-cycle gene expression data set.
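The abstract notes that the penalized GEE needs only the first two marginal moments plus a working correlation structure. As a hedged illustration (not the authors' code), the sketch below builds three common working correlation choices and the corresponding working covariance V = φ A^(1/2) R A^(1/2). The function names, the Bernoulli variance function μ(1 − μ), and the default ρ are assumptions chosen for a binary outcome.

```python
import math


def working_correlation(n_times, structure="exchangeable", rho=0.5):
    """Build an n_times x n_times working correlation matrix R (list of lists)."""
    if structure == "independence":
        # Observations within a cluster are treated as uncorrelated.
        return [[1.0 if i == j else 0.0 for j in range(n_times)] for i in range(n_times)]
    if structure == "exchangeable":
        # Every pair of distinct observations in a cluster shares the same correlation rho.
        return [[1.0 if i == j else rho for j in range(n_times)] for i in range(n_times)]
    if structure == "ar1":
        # Correlation decays geometrically with the time lag |i - j|.
        return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]
    raise ValueError(f"unknown structure: {structure}")


def working_covariance(mu, R, phi=1.0):
    """V = phi * A^(1/2) R A^(1/2), assuming A = diag(mu * (1 - mu)) for a binary outcome."""
    sd = [math.sqrt(m * (1.0 - m)) for m in mu]
    n = len(mu)
    return [[phi * sd[i] * R[i][j] * sd[j] for j in range(n)] for i in range(n)]


# Example: AR(1) working correlation for one cluster observed at 3 time points.
R = working_correlation(3, "ar1", rho=0.5)
V = working_covariance([0.2, 0.5, 0.7], R)
```

Only the marginal mean and variance enter V; the abstract's point is that model-selection consistency is claimed to hold even when the chosen R is misspecified.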

4.
Medicago truncatula has become a model system to study legume biology. It is imperative that detailed growth characteristics of the most commonly used cultivar, line A17 cv Jemalong, be documented. Such analysis creates a basis to analyze phenotypic alterations due to genetic lesions or environmental stress and is essential to characterize gene function and its relationship to morphological development. We have documented morphological development of M. truncatula to characterize its temporal developmental growth pattern; developed a numerical nomenclature coding system that identifies stages in morphological development; tested the coding system to identify phenotypic differences under phosphorus (P) and nitrogen (N) deprivation; and created visual models using the L-system formalism. The numerical nomenclature coding system, based on a series of defined growth units, represents incremental steps in morphological development. Included is a decimal component dividing growth units into nine substages. A measurement component helps distinguish alterations that may be missed by the coding system. Growth under N and P deprivation produced morphological alterations that were distinguishable using the coding system and its measurement component. N and P deprivation resulted in delayed leaf development and expansion, delayed axillary shoot emergence and elongation, decreased leaf and shoot size, and altered root growth. The timing and frequency of flower emergence in P-deprived plants were affected. This numerical coding system may be used as a standardized method to analyze phenotypic variation in M. truncatula due to nutrient stress, genetic lesions, or other factors and should allow valid growth comparisons across geographically distant laboratories.

5.
Extracting features from high-dimensional data is a critically important task for pattern recognition and machine learning applications. High-dimensional data typically have many more variables than observations and contain significant noise, missing components, or outliers. Features extracted from high-dimensional data need to be discriminative and sparse, and to capture the essential characteristics of the data. In this paper, we present a way of constructing multivariate features and then classifying the data into the proper classes. The resulting small subset of features is nearly the best in the sense of Greenshtein's persistence; however, the estimated feature weights may be biased. We take a systematic approach to correcting these biases. We use conjugate gradient-based primal-dual interior-point techniques for large-scale problems. We apply our procedure to microarray gene analysis. The effectiveness of our method is confirmed by experimental results.
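The abstract says the selected features are near-optimal but their estimated weights may be biased, without stating the correction used. Assuming an ℓ1 (lasso-type) penalty, the sketch below illustrates where such bias comes from and one common remedy that is swapped in here, not necessarily the authors' approach: refit least squares on the selected support. It further assumes an orthonormal design (Xᵀ X = I), where the lasso reduces to soft-thresholding; the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso estimate of one coefficient when the design is orthonormal (X^T X = I)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_then_refit(z, lam):
    """z: least-squares estimates X^T y under an orthonormal design.

    Returns (lasso, refit): the lasso estimates, and least-squares estimates
    refitted on the support the lasso selected.
    """
    lasso = [soft_threshold(zj, lam) for zj in z]
    # Under an orthonormal design, refitting least squares on the selected
    # features simply restores z_j for every selected j, removing the shrinkage.
    refit = [zj if bj != 0.0 else 0.0 for zj, bj in zip(z, lasso)]
    return lasso, refit


lasso, refit = lasso_then_refit([3.0, -0.5, 2.0], lam=1.0)
print(lasso)  # [2.0, 0.0, 1.0]: selected weights shrunk toward zero by lam
print(refit)  # [3.0, 0.0, 2.0]: same support, shrinkage removed
```

The penalty both selects features and pulls every surviving weight toward zero by λ; separating selection from estimation is what removes that systematic bias in this simplified setting.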

6.
A modification of the principal component test is presented. It uses a weighted combination of the sums of squares for different principal components and is thus more powerful in high-dimensional settings with small sample sizes. Under usual normality assumptions, a rotation test is proposed which enables an exact conditional parametric test. The procedure is demonstrated with microarray data for the bacterial composition in the rhizosphere of different potato cultivars. In simulation studies, the power of the proposed statistic is compared with the competing multivariate parametric tests.

7.
8.
Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes; it uses the elastic-net penalty, which is a mixture of L1- and L2-norm penalties. Similar to the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, and highly correlated genes can be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.
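To make the elastic-net penalty concrete, here is a minimal coordinate-descent sketch for the uncensored linear-regression case that the abstract uses as its point of comparison. It is not the doubly penalized Buckley-James estimator: in a Buckley-James scheme, censored responses would first be imputed and a penalized regression of this kind refitted, and that imputation step is omitted. The (lam, alpha) parameterization of the two-dimensional tuning parameter is an assumption (one common convention), and `elastic_net_cd` is a hypothetical name.

```python
import math


def soft_threshold(z, gamma):
    """sign(z) * max(|z| - gamma, 0)."""
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).

    X is a list of n rows with p features each. Assumes no all-zero columns
    when lam * (1 - alpha) == 0, otherwise the update would divide by zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j's term added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            # L1 part soft-thresholds; L2 part inflates the denominator (ridge-type shrinkage).
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
beta = elastic_net_cd(X, y, lam=0.1, alpha=0.5)
```

With alpha = 1 the update is the lasso; with alpha = 0 it is ridge regression. The L2 part is what lets groups of highly correlated genes enter or leave the model together.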

9.

Background  

Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.
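The weighted averaging over contending models can be sketched as follows. This assumes the widely used BIC approximation to posterior model probabilities, p(M_k | D) ∝ exp(−BIC_k / 2), with equal prior probabilities for all models; whether the iterative BMA algorithm computes its weights exactly this way is not stated in the abstract. The function names are hypothetical.

```python
import math


def bma_weights(bic_values):
    """Approximate posterior model probabilities from BIC values (equal model priors assumed)."""
    best = min(bic_values)
    # Subtract the minimum BIC before exponentiating to avoid underflow.
    raw = [math.exp(-0.5 * (b - best)) for b in bic_values]
    total = sum(raw)
    return [r / total for r in raw]


def bma_predict(predictions, weights):
    """Weighted average of per-model predictions (e.g., predicted risk scores for one patient)."""
    return sum(w * p for w, p in zip(weights, predictions))


# Three candidate survival models with hypothetical BIC values.
weights = bma_weights([100.0, 102.0, 110.0])
risk = bma_predict([0.8, 0.6, 0.1], weights)
```

A BIC difference of 2 corresponds to a weight ratio of e, so models far from the best one contribute almost nothing; this is what lets the averaged model stay concentrated on a small number of genes.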

10.

Background  

The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate whether high dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also investigate the effectiveness of some strategies that are available to overcome the effect of class imbalance.
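The bias toward the majority class, and one simple corrective strategy, can be illustrated in a few lines. Random undersampling of the majority class is shown as a generic example; the abstract does not name the strategies it evaluates, so this should not be read as one of them. The helper names are hypothetical.

```python
import random
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each class that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class keeps as many samples as the smallest class."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# 8 majority-class and 2 minority-class samples.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
# Always predicting the majority class gives 80% accuracy but 0% minority recall.
recalls = per_class_recall(y, [0] * 10)
X_bal, y_bal = random_undersample(X, y)
```

Per-class recall exposes what overall accuracy hides. Undersampling discards information, which matters when samples are already scarce relative to the number of variables.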

11.
The experimental meaning of the phenomenological differential equations for a competing population is reviewed. It is concluded that it is virtually impossible to construct differential equations precise enough for studying stability. We consider instead a method of phenomenological analysis that can be applied to a set of population curves. We suggest an ecological index, calculated from the population curves, which indicates a group property of the entire system. As a function of time, the index is presumably insensitive to Volterra-type fluctuations. A marked increase in the index's value, however, indicates a marked change in the environmental conditions. It is not easy to deduce the group property from the population curves alone, because a change in population is in general due to the superposition of external disturbances and Volterra fluctuations.

12.
The Tat protein export system serves to export folded proteins harboring an N-terminal twin arginine signal peptide across the cytoplasmic membrane. In this study, we have used gene expression profiling of Escherichia coli supported by phenotypic analysis to investigate how cells respond to a defect in the Tat pathway. Previous work has demonstrated that strains mutated in genes encoding essential Tat pathway components are defective in the integrity of their cell envelope because of the mislocalization of two amidases involved in cell wall metabolism (Ize, B., Stanley, N. R., Buchanan, G., and Palmer, T. (2003) Mol. Microbiol. 48, 1183-1193). To distinguish between genes that are differentially expressed specifically because of the cell envelope defect and those that result from other effects of the tatC deletion, we also analyzed two different transposon mutants of the ΔtatC strain that have their outer membrane integrity restored. Approximately 50% of the genes that were differentially expressed in the tatC mutant are linked to the envelope defect, with the products of many of these genes involved in self-defense or protection mechanisms, including the production of exopolysaccharide. Among the changes that were not explicitly linked to envelope integrity, we characterized a role for the Tat system in iron acquisition and copper homeostasis. Finally, we have demonstrated that overproduction of the Tat substrate SufI saturates the Tat translocon and produces effects on global gene expression that are similar to those resulting from the ΔtatC mutation.

13.
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another, akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. AVAILABILITY: Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
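The claim that the Cox partial likelihood can be "kernelized" can be made concrete: replace the linear predictor βᵀxᵢ with f(xᵢ) = Σₖ αₖ K(xₖ, xᵢ). The sketch below evaluates this kernelized partial log-likelihood; every αₖ enters, which is the dense-solution issue the abstract raises. It implements neither of the two proposed sparse methods and is unrelated to the authors' R package. The RBF kernel, the assumption of no tied event times, and the function names are all illustrative choices.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kernel_cox_loglik(alpha, X, times, events, kernel=rbf_kernel):
    """Cox partial log-likelihood with f(x_i) = sum_k alpha_k * K(x_k, x_i).

    events[i] is 1 for an observed failure and 0 for a censored time.
    Assumes no tied event times.
    """
    n = len(X)
    f = [sum(alpha[k] * kernel(X[k], X[i]) for k in range(n)) for i in range(n)]
    loglik = 0.0
    for i in range(n):
        if events[i]:
            # Risk set: every subject still under observation at time t_i.
            risk = [math.exp(f[j]) for j in range(n) if times[j] >= times[i]]
            loglik += f[i] - math.log(sum(risk))
    return loglik


X = [[0.0], [1.0], [2.0]]
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]  # the third subject is censored
ll = kernel_cox_loglik([0.3, -0.2, 0.1], X, times, events)
```

Because f depends on the data only through K, any positive-definite kernel can replace the inner product; sparsity would require most αₖ to be exactly zero, which maximizing this likelihood alone does not produce.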

14.
15.
In high-throughput -omics studies, markers identified from analysis of single data sets often suffer from a lack of reproducibility because of sample limitation. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of data and heterogeneity among studies. In this article, for marker selection in integrative analysis of data from multiple heterogeneous studies, we propose a 2-norm group bridge penalization approach. This approach can effectively identify markers with consistent effects across multiple studies and accommodate the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency property. Simulations and applications in cancer profiling studies show satisfactory performance of the proposed approach.
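A sketch of how a 2-norm group bridge penalty couples the studies, assuming it takes the form λ Σⱼ ‖βⱼ‖₂^γ with 0 < γ < 1, where βⱼ collects marker j's coefficients across all studies. The exact form and tuning in the article may differ, and the function names are hypothetical.

```python
import math


def group_bridge_penalty(beta, lam, gamma=0.5):
    """lam * sum_j ||beta_j||_2 ** gamma, where beta[j][m] is marker j's coefficient in study m."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma should lie in (0, 1)")
    total = 0.0
    for coefs in beta:
        # The 2-norm pools marker j's effects across studies into a single group size.
        total += math.sqrt(sum(b * b for b in coefs)) ** gamma
    return lam * total


def selected_markers(beta):
    """Markers whose coefficient group is not identically zero across studies."""
    return [j for j, coefs in enumerate(beta) if any(b != 0.0 for b in coefs)]


# Two markers measured in two studies: marker 0 has study-specific effects, marker 1 is dropped.
beta = [[3.0, 4.0], [0.0, 0.0]]
pen = group_bridge_penalty(beta, lam=2.0, gamma=0.5)
```

Because the penalty acts on the whole group, a marker is kept or removed in all studies at once (consistent selection), while the individual coefficients inside a kept group can still differ between studies (heterogeneity). The concave exponent γ < 1 is what allows entire groups to be set exactly to zero.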

16.
Kollmus  Heike  Fuchs  Helmut  Lengger  Christoph  Haselimashhadi  Hamed  Bogue  Molly A.  Östereicher  Manuela A.  Horsch  Marion  Adler  Thure  Aguilar-Pimentel  Juan Antonio  Amarie  Oana Veronica  Becker  Lore  Beckers  Johannes  Calzada-Wack  Julia  Garrett  Lillian  Hans  Wolfgang  Hölter  Sabine M.  Klein-Rodewald  Tanja  Maier  Holger  Mayer-Kuckuk  Philipp  Miller  Gregor  Moreth  Kristin  Neff  Frauke  Rathkolb  Birgit  Rácz  Ildikó  Rozman  Jan  Spielmann  Nadine  Treise  Irina  Busch  Dirk  Graw  Jochen  Klopstock  Thomas  Wolf  Eckhard  Wurst  Wolfgang  Yildirim  Ali Önder  Mason  Jeremy  Torres  Arturo  Balling  Rudi  Mehaan  Terry  Gailus-Durner  Valerie  Schughart  Klaus  Hrabě de Angelis  Martin 《Mammalian genome》2020,31(1):30-48
Mammalian Genome - The collaborative cross (CC) is a large panel of mouse-inbred lines derived from eight founder strains (NOD/ShiLtJ, NZO/HILtJ, A/J, C57BL/6J, 129S1/SvImJ, CAST/EiJ, PWK/PhJ, and...

17.
18.
19.
High-dimensional biomarker data are often collected in epidemiological studies when assessing the association between biomarkers and human disease is of interest. We develop a latent class modeling approach for joint analysis of high-dimensional semicontinuous biomarker data and a binary disease outcome. To model the relationship between complex biomarker expression patterns and disease risk, we use latent risk classes to link the 2 modeling components. We characterize complex biomarker-specific differences through biomarker-specific random effects, so that different biomarkers can have different baseline (low-risk) values as well as different between-class differences. The proposed approach also accommodates data features that are common in environmental toxicology and other biomarker exposure data, including a large number of biomarkers, numerous zero values, and a complex mean-variance relationship in the biomarker levels. A Monte Carlo EM (MCEM) algorithm is proposed for parameter estimation. Both the MCEM algorithm and model selection procedures are shown to work well in simulations and applications. In applying the proposed approach to an epidemiological study that examined the relationship between environmental polychlorinated biphenyl (PCB) exposure and the risk of endometriosis, we identified a highly significant overall effect of PCB concentrations on the risk of endometriosis.

20.
The Self-Organizing Map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, an intuitive and effective SOM projection method is proposed for mapping high-dimensional data onto a two-dimensional grid structure with a growing self-organizing mechanism. In the learning phase, a growing SOM is trained, with the growing cell structure used as the baseline framework. In the ordination phase, the new projection method maps the input vectors onto the structure of the SOM without having to plot the weight values, which makes the data easy to visualize. The projection method is demonstrated on four different data sets, including a data set of 118 patents and a data set of 399 chemical abstracts related to polymer cements, with promising results and a significantly reduced network size.
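For readers unfamiliar with the SOM itself, the sketch below shows the standard fixed-grid online update and the plain best-matching-unit projection that methods like this one build on. It does not implement the growing mechanism or the paper's new projection method; the learning rate, Gaussian neighbourhood width, and function names are illustrative assumptions.

```python
import math


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest (squared Euclidean) to x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def som_step(weights, grid, x, lr=0.5, sigma=1.0):
    """One online SOM update; weights are modified in place.

    weights: one weight vector per map unit.
    grid: (row, col) position of each unit on the 2-D map.
    """
    bmu = best_matching_unit(weights, x)
    br, bc = grid[bmu]
    for k, (r, c) in enumerate(grid):
        # Gaussian neighbourhood: units near the BMU on the grid move furthest toward x.
        h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
        weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
    return bmu


def project(weights, grid, x):
    """Plain SOM projection: the grid position of x's best-matching unit."""
    return grid[best_matching_unit(weights, x)]


weights = [[0.0, 0.0], [1.0, 1.0]]
grid = [(0, 0), (0, 1)]
bmu = som_step(weights, grid, [0.2, 0.2])
```

Repeating `som_step` over the data with a slowly decreasing `lr` and `sigma` gives the usual training loop. Projecting every input to its BMU position is the baseline view that the proposed ordination phase aims to improve on.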

