Found 20 similar documents (search time: 17 ms)
1.
2.
3.
4.
5.
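Driven by the needs of research on environmental change and microbial communities, high-throughput omics technologies have been rapidly developed and applied in recent years. Among them, metagenomics based on sequencing and microarray technologies is a key omics technology and the most mature one, and it underpins most of the others. By comparison, metatranscriptomics, metaproteomics, and metametabolomics have so far achieved only limited success, but they already show promising potential. All omics technologies depend on bioinformatics, which makes bioinformatics a major technical bottleneck in their application. These new omics technologies have had a revolutionary impact on environmental microbiology and have greatly enriched our understanding of the genetic resources and functional activities of environmental microorganisms.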
6.
7.
The importance of lipids for cell function and health has been widely recognized; for example, a disorder in the lipid composition of cells has been related to cardiovascular disease (CVD) caused by atherosclerosis. Lipidomics analyses are characterized by a large, though not huge, number of mutually correlated measured variables, and the associations of these variables with outcomes are potentially complex. Differential network analysis provides a formal statistical method, capable of inferential analysis, for examining differences in the network structures of the lipids under two biological conditions. It also helps to identify potential relationships that require further biological investigation. We provide a recipe for conducting a permutation test on association scores resulting from partial least squares regression with multiply imputed lipidomic data from the LUdwigshafen RIsk and Cardiovascular Health (LURIC) study, paying particular attention to the left-censored missing values that are typical of a wide range of data sets in the life sciences. Left-censored missing values are low-level concentrations that are known to lie somewhere between zero and a lower limit of quantification. To make full use of the LURIC data with the missing values, we utilize state-of-the-art multiple imputation techniques and propose solutions to the challenges that incomplete data sets bring to differential network analysis. The customized network analysis helps us to understand the complexities of the underlying biological processes by identifying lipids and lipid classes that interact with each other, and by recognizing the most important differentially expressed lipids between two subgroups of coronary artery disease (CAD) patients: those who had a fatal CVD event and those who remained stable during a two-year follow-up.
8.
Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene and protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information when analyzing high-dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data while at the same time identifying disease susceptibility genes. We employ the gradient descent boosting procedure to build an additive tree model and propose a new algorithm that utilizes the network structure in fitting small tree weak learners. We illustrate by simulation studies and a real data example that, by making use of the network information, NetBoosting outperforms a few existing methods in terms of prediction accuracy and variable selection.
9.
Silvia Pineda, Francisco X. Real, Manolis Kogevinas, Alfredo Carrato, Stephen J. Chanock, Núria Malats, Kristel Van Steen. PLoS Genetics, 2015, 11(12)
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise, such as data heterogeneity, the small number of individuals relative to the number of parameters, multicollinearity, and the difficulty of interpreting and validating results owing to their complexity and the lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method that simultaneously assesses significance and corrects for multiple testing using the MaxT algorithm. The method was applied with penalized regression methods (LASSO and ENET) to explore relationships between common genetic variants, DNA methylation, and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose reduces computational time and is flexible and easy to implement when analyzing several types of omics data.
Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions.
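The permutation-based MaxT idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: as a stated simplification, the per-model LASSO/ENET fit is replaced by the absolute Pearson correlation between each feature and the outcome, and the function names (`abs_corr`, `maxt_adjusted_pvalues`) are hypothetical. The sketch shows only the single-step MaxT adjustment: each observed statistic is compared with the permutation distribution of the maximum statistic over all tests.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def maxt_adjusted_pvalues(features, y, n_perm=500, seed=0):
    """Single-step MaxT adjusted p-values (illustrative statistic: |correlation|).

    features: list of equally long numeric lists, one per test.
    y: outcome values, permuted to generate the null distribution.
    """
    rng = random.Random(seed)
    observed = [abs_corr(f, y) for f in features]
    exceed = [0] * len(features)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any feature/outcome association
        max_stat = max(abs_corr(f, y_perm) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # The +1 terms count the observed data as one of the permutations.
    return [(1 + e) / (1 + n_perm) for e in exceed]
```

Because every test is compared against the same maximum, the adjusted p-values control the family-wise error rate without a separate correction step, which is the property the abstract refers to as assessing significance and correcting for multiple testing at the same time.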
10.
11.
12.
The use of high-throughput techniques to generate large volumes of protein-protein interaction (PPI) data has increased the need for methods that systematically and automatically suggest functional relationships among proteins. In a yeast PPI network, previous work has shown that the local connection topology, particularly for two proteins sharing an unusually large number of neighbors, can predict functional association. In this study we improved the prediction scheme by developing a new algorithm and applied it to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on detecting function-associated protein pairs. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as benchmarks to compare and evaluate functional relevance. The application of our algorithms to human PPI data yielded 4,233 significant functional associations among 1,754 proteins. Further functional comparisons between them allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins, with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and made functional inferences from a detailed analysis of one subcluster highly enriched in the TGF-β signaling pathway (P < 10^-50). Analysis of another four subclusters also suggested potential new players in six signaling pathways that are worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.
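A common way to quantify "an unusually large number of shared neighbors" is a hypergeometric tail probability. The sketch below uses that score as an assumption; the abstract does not state which statistic the authors use, and it does not reproduce their hub-correction algorithm. The function name `shared_neighbor_pvalue` and the adjacency format (a dict mapping every node to a set of neighbors) are illustrative choices.

```python
from math import comb


def shared_neighbor_pvalue(adj, a, b):
    """Return (k, p) for proteins a and b.

    k is the number of shared neighbors; p is the hypergeometric probability
    of seeing at least k shared neighbors if b's neighbors were drawn at
    random from all other proteins. adj maps each node to a set of neighbors
    and must contain every node as a key.
    """
    n_total = len(adj) - 2          # candidate neighbors, excluding a and b
    na = adj[a] - {a, b}
    nb = adj[b] - {a, b}
    k = len(na & nb)
    denom = comb(n_total, len(nb))  # ways to choose b's neighbors
    tail = sum(
        comb(len(na), i) * comb(n_total - len(na), len(nb) - i)
        for i in range(k, min(len(na), len(nb)) + 1)
    )
    return k, tail / denom


# Toy network: A and B share all three of their neighbors among 8 candidates.
toy = {f"n{i}": set() for i in range(1, 9)}
toy["A"] = {"n1", "n2", "n3"}
toy["B"] = {"n1", "n2", "n3"}
for n in ("n1", "n2", "n3"):
    toy[n] = {"A", "B"}
shared, p_value = shared_neighbor_pvalue(toy, "A", "B")
```

A small p-value flags the pair as a candidate functional association. Hub proteins inflate such scores because they share neighbors with almost everything, which is the effect the abstract's algorithm is designed to measure and reduce.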
13.
14.
Yun-gang Luo, Defeng Wang, Kai Liu, Jian Weng, Yuefeng Guan, Kate C. C. Chan, Winnie C. W. Chu, Lin Shi. PLoS ONE, 2015, 10(9)
Childhood obstructive sleep apnea (OSA) is a sleep disorder commonly affecting school-aged children and is characterized by repeated episodes of blockage of the upper airway during sleep. In this study, we performed a graph theoretical analysis of the brain morphometric correlation network in 25 OSA patients (OSA group; 5 females; mean age, 10.1 ± 1.8 years) and investigated the topological alterations in global and regional properties compared with 20 healthy control individuals (CON group; 6 females; mean age, 10.4 ± 1.8 years). A structural correlation network based on regional gray matter volume was constructed separately for each group. Our results revealed a significantly decreased mean local efficiency in the OSA group over the density range of 0.32–0.44 (p < 0.05). Regionally, the OSA group showed a tendency toward decreased betweenness centrality in the left angular gyrus, and a tendency toward decreased degree in the right lingual and inferior frontal (orbital part) gyrus (p < 0.005, uncorrected). We also found that the network hubs in the OSA and control groups were distributed differently. To the best of our knowledge, this is the first study that characterizes the brain structural network in OSA patients and investigates the alteration of topological properties of the gray matter volume structural network. This study may help to provide new evidence for understanding the neuropathophysiology of OSA from a topological perspective.
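The mean local efficiency reported in this abstract is a standard graph metric. A minimal sketch for a binary, undirected network is given below, assuming the network is stored as a dict that maps each region to the set of its neighbors; the function names are hypothetical, and the thresholding of the correlation matrix to a given density is not shown.

```python
from collections import deque


def global_efficiency(adj, nodes):
    """Average inverse shortest-path length within the subgraph induced by nodes."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0.0
    for source in nodes:
        # Breadth-first search restricted to the induced subgraph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable pairs contribute 0 (infinite distance).
        total += sum(1.0 / d for t, d in dist.items() if t != source)
    return total / (n * (n - 1))


def mean_local_efficiency(adj):
    """Mean over nodes of the efficiency of each node's neighborhood subgraph."""
    return sum(global_efficiency(adj, adj[v]) for v in adj) / len(adj)


# Two toy networks: a fully connected graph and a simple path.
complete4 = {i: {j for j in range(4) if j != i} for i in range(4)}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
```

Local efficiency measures how well a node's neighbors stay connected to each other when the node itself is removed, so a lower mean value indicates less fault-tolerant local wiring.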
16.
Summary. We consider the analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For ease of interpretation of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form, and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED in SAS. Our simulation study demonstrates the robustness of our method with respect to misspecification of the regression model as well as of the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice.
17.
Maike Ahrens, Michael Turewicz, Swaantje Casjens, Caroline May, Beate Pesch, Christian Stephan, Dirk Woitalla, Ralf Gold, Thomas Brüning, Helmut E. Meyer, Jörg Rahnenführer, Martin Eisenacher. PLoS ONE, 2013, 8(11)
Detection of as yet unknown subgroups showing differential gene or protein expression is a frequent goal in the analysis of modern molecular data. Applications range from cancer biology through developmental biology to toxicology. Often a control group and an experimental group are compared, and subgroups can be characterized by differential expression for only a subgroup-specific set of genes or proteins. Finding such genes and the corresponding patient subgroups can help in understanding pathological pathways, in diagnosis, and in defining drug targets. The size of the subgroup and the type of differential expression determine the optimal strategy for subgroup identification. To date, commonly used software packages provide hardly any statistical tests and methods for the detection of such subgroups. Different univariate methods for subgroup detection are characterized and compared, both on simulated and on real data. We present an advanced design for simulation studies: data are simulated under different distributional assumptions for the expression of the subgroup, and performance results are compared against theoretical upper bounds. For each distribution, different degrees of deviation from the majority of observations are considered for the subgroup. We evaluate classical approaches as well as various new suggestions in the context of omics data, including outlier sum, PADGE, and kurtosis. We also propose the new FisherSum score. ROC curve analysis and AUC values are used to quantify the ability of the methods to distinguish between genes or proteins with and without certain subgroup patterns. In general, FisherSum for small subgroups and the t-test for large subgroups achieve the best results. We apply each method to a case-control study on Parkinson's disease and underline the biological benefit of the new method.
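Of the univariate scores named in this abstract, kurtosis is the simplest to illustrate. The sketch below computes the sample excess kurtosis of one gene's expression values, on the assumption that a small aberrant subgroup produces a heavy tail and therefore a high score. It does not implement outlier sum, PADGE, or the authors' FisherSum score, and the function name `excess_kurtosis` is hypothetical.

```python
from statistics import mean


def excess_kurtosis(values):
    """Sample excess kurtosis m4 / m2**2 - 3, using biased central moments.

    Raises ValueError if the values have zero variance.
    """
    mu = mean(values)
    m2 = mean((v - mu) ** 2 for v in values)
    m4 = mean((v - mu) ** 4 for v in values)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return m4 / m2 ** 2 - 3.0


# Balanced two-level expression: no small subgroup, light tails.
balanced = [0.0, 1.0] * 10
# Same range of typical values plus a small subgroup (2 of 20) far above the rest.
with_subgroup = [0.0] * 9 + [1.0] * 9 + [10.0, 10.0]

score_balanced = excess_kurtosis(balanced)
score_subgroup = excess_kurtosis(with_subgroup)
```

Ranking genes by such a score favors those with a few extreme samples, which is consistent with the abstract's finding that specialized scores help for small subgroups while a plain t-test is preferable when the subgroup is large.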
18.
19.