Similar Documents
20 similar documents found (search time: 15 ms)
1.
2.
Litter decomposition rate (k) is typically estimated from proportional litter mass loss data using models that assume constant, normally distributed errors. However, such data often show non-normal errors with reduced variance near bounds (0 or 1), potentially leading to biased k estimates. We compared the performance of nonlinear regression using the beta distribution, which is well suited to bounded data and this type of heteroscedasticity, with standard nonlinear regression (normal errors) on simulated and real litter decomposition data. Although the beta model often provided better fits to the simulated data (based on the corrected Akaike Information Criterion, AICc), standard nonlinear regression was robust to violation of homoscedasticity and gave k estimates that were as accurate as, or more accurate than, those from nonlinear beta regression. Our simulation results also suggest that k estimates will be most accurate when study length captures mid- to late-stage decomposition (50–80% mass loss) and the number of measurements through time is ≥5. Regression method and data transformation choices had the smallest impact on k estimates during mid- and late-stage decomposition. Estimates of k were more variable among methods and generally less accurate during early and end-stage decomposition. With real data, neither model was predominantly best; in most cases the models were indistinguishable based on AICc and gave similar k estimates. However, when decomposition rates were high, normal and beta model k estimates often diverged substantially. We therefore recommend a pragmatic approach in which both models are compared and the better one is selected for a given data set. Alternatively, both models may be used via model averaging to develop weighted parameter estimates. We provide code to perform nonlinear beta regression with freely available software.
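A minimal sketch (not the authors' released code) of the two approaches compared above: fitting mu(t) = exp(-k t) to proportion-mass-remaining data by maximum likelihood with beta-distributed errors, versus ordinary normal-error nonlinear least squares. The toy data and the mean-precision parameterization of the beta distribution are illustrative assumptions.

```python
# Sketch: nonlinear beta regression vs. normal-error least squares for k.
import numpy as np
from scipy import stats, optimize

t = np.array([0.25, 0.5, 1.0, 1.5, 2.0])          # years (illustrative)
y = np.array([0.82, 0.65, 0.48, 0.33, 0.21])      # proportion mass remaining

def negloglik(params):
    log_k, log_phi = params                        # log scale keeps k, phi > 0
    mu = np.exp(-np.exp(log_k) * t)                # single-pool decay model
    phi = np.exp(log_phi)                          # beta precision parameter
    a, b = mu * phi, (1.0 - mu) * phi              # beta shape parameters
    return -np.sum(stats.beta.logpdf(y, a, b))

fit = optimize.minimize(negloglik, x0=[np.log(0.5), np.log(10.0)],
                        method="Nelder-Mead")
k_hat, phi_hat = np.exp(fit.x)
print(f"beta-regression k = {k_hat:.3f} per year, precision phi = {phi_hat:.1f}")

# For comparison, ordinary (normal-error) nonlinear least squares:
from scipy.optimize import curve_fit
(k_ls,), _ = curve_fit(lambda t, k: np.exp(-k * t), t, y, p0=[0.5])
print(f"least-squares k = {k_ls:.3f} per year")
```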

3.
Gene-gene interactions may play an important role in the genetics of a complex disease. Detection and characterization of gene-gene interactions is a challenging issue that has stimulated the development of various statistical methods to address it. In this study, we introduce a method to measure gene interactions using entropy-based statistics from a contingency table of trait and genotype combinations, together with a graph-based exploration procedure. We propose a standardized relative information gain (RIG) measure to evaluate the interactions between single nucleotide polymorphism (SNP) combinations. To identify kth-order interactions, contingency tables of trait and genotype combinations of k SNPs are constructed, from which RIGs are calculated. The RIGs are standardized using the mean and standard deviation from permuted datasets, and SNP combinations yielding a high standardized RIG are selected as candidate gene-gene interactions. The standardized RIG makes it possible to detect high-order interactions and to compare interaction strengths between different orders. We have applied the proposed standardized entropy-based method to two types of data sets, from a simulation study and a real genetic association study, and compared it with the multifactor dimensionality reduction (MDR) method through power analysis of eight different genetic models with varying penetrance rates, numbers of SNPs, and sample sizes. Our method successfully identifies genetic associations and gene-gene interactions in both simulated and real genetic data. Simulation results suggest that the proposed entropy-based method is better able to detect high-order interactions and is superior to the MDR method in most cases. The proposed method is well suited for detecting interactions without main effects as well as for models including main effects.
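A minimal sketch of an entropy-based interaction screen in the spirit described above; the exact RIG normalization (division by the trait entropy) and the permutation scheme used here are illustrative assumptions, not necessarily those of the paper.

```python
# Sketch: standardized relative information gain for SNP-pair screening.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, p = 500, 6
geno = rng.integers(0, 3, size=(n, p))            # SNP genotypes coded 0/1/2
# XOR-style interaction between SNP 0 and SNP 1 (no main effects) plus 10% noise
signal = ((geno[:, 0] > 0) ^ (geno[:, 1] > 0)).astype(int)
trait = np.where(rng.random(n) < 0.9, signal, 1 - signal)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    pr = counts / counts.sum()
    return -np.sum(pr * np.log2(pr))

def joint_labels(cols):
    # collapse a genotype combination into a single categorical label
    return np.array(["-".join(map(str, row)) for row in cols])

def rig(trait, cols):
    g = joint_labels(cols)
    tg = np.array([f"{a}|{b}" for a, b in zip(trait, g)])
    gain = entropy(trait) + entropy(g) - entropy(tg)   # mutual information
    return gain / entropy(trait)                       # relative information gain

def standardized_rig(trait, cols, n_perm=200):
    obs = rig(trait, cols)
    null = np.array([rig(rng.permutation(trait), cols) for _ in range(n_perm)])
    return (obs - null.mean()) / null.std()

# screen all 2-SNP (second-order) combinations
scores = {pair: standardized_rig(trait, geno[:, list(pair)])
          for pair in combinations(range(p), 2)}
best = max(scores, key=scores.get)
print("top pair:", best, "standardized RIG =", round(scores[best], 2))
```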

4.
Gene–gene and gene–environment interactions govern a substantial portion of the variation in complex traits and diseases. Conventionally, a set of either unrelated or family samples is used to detect such interactions; even when both kinds of data are available, the unrelated and family samples are analyzed separately, potentially leading to a loss in statistical power. In this report, to detect gene–gene interactions we propose a generalized multifactor dimensionality reduction method that unifies analyses of nuclear families and unrelated subjects within the same statistical framework. We used principal components as genetic background controls against population stratification, and when sibling data are included, within-family controls were used to correct for potential spurious association at the tested loci. Through comprehensive simulations, we demonstrate that the proposed method can markedly increase power by pooling unrelated and offspring samples together, compared with individual analysis strategies and Fisher's p value combination method, while retaining a controlled type I error rate in the presence of population structure. In application to a real dataset, we detected one significant tetragenic interaction among CHRNA4, CHRNB2, BDNF, and NTRK2 associated with nicotine dependence in the Study of Addiction: Genetics and Environment sample, suggesting a biological role of these genes in the development of nicotine dependence.

5.
A challenge for physiologists and neuroscientists is to map information transfer between components of the systems they study at different scales, in order to derive important knowledge about structure and function from analysis of the recorded dynamics. The components of physiological networks often interact nonlinearly and through mechanisms that are, in general, not completely known. It is therefore safer if the method chosen for analyzing these interactions does not rely on any model or assumption about the nature of the data and their interactions. Transfer entropy has emerged as a powerful tool to quantify directed dynamical interactions. In this paper we compare different approaches to estimating transfer entropy, some already proposed and some novel, and present their implementation in a freeware MATLAB toolbox. Applications to simulated and real data are presented.
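A minimal sketch of the quantity being estimated, using a naive plug-in (histogram) transfer entropy estimator on discretized signals; the toolbox itself is a MATLAB implementation offering several estimators beyond this simple one, so the code below is only an illustration of the concept.

```python
# Sketch: plug-in transfer entropy with history length 1, in bits.
import numpy as np

def transfer_entropy(x, y, bins=4):
    """Estimate TE from x to y (history length 1) by discretizing both signals."""
    def disc(s):  # equal-frequency binning
        edges = np.quantile(s, np.linspace(0, 1, bins + 1)[1:-1])
        return np.searchsorted(edges, s)
    xd, yd = disc(x), disc(y)
    yf, yp, xp = yd[1:], yd[:-1], xd[:-1]          # y future, y past, x past

    def H(*vars_):                                  # joint plug-in entropy
        joint = np.stack(vars_, axis=1)
        _, counts = np.unique(joint, axis=0, return_counts=True)
        pr = counts / counts.sum()
        return -np.sum(pr * np.log2(pr))

    # TE = H(yf|yp) - H(yf|yp,xp) = H(yf,yp) - H(yp) + H(yp,xp) - H(yf,yp,xp)
    return H(yf, yp) - H(yp) + H(yp, xp) - H(yf, yp, xp)

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = np.empty_like(x)
y[0] = 0.0
for t in range(1, len(x)):                          # y is driven by past x
    y[t] = 0.6 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.standard_normal()

print("TE x->y:", round(transfer_entropy(x, y), 3))
print("TE y->x:", round(transfer_entropy(y, x), 3))  # should be near zero
```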

6.
In order to identify genes involved in complex diseases, it is crucial to study genetic interactions at the systems biology level. By utilizing modern high-throughput microarray technology, it has become feasible to obtain gene expression data and turn them into knowledge that explains the regulatory behavior of genes. In this study, an unsupervised nonlinear model was proposed to infer gene regulatory networks on a genome-wide scale. The proposed model consists of two components, a robust correlation estimator and a nonlinear recurrent model. The robust correlation estimator was used to initialize the parameters of the nonlinear recurrent curve-fitting model, and the initialized model was then used to fit the microarray data and to simulate the underlying nonlinear regulatory mechanisms in biological organisms. The proposed algorithm was applied to infer the regulatory mechanisms of the general network in Saccharomyces cerevisiae and the pulmonary disease pathways in Homo sapiens. The algorithm requires no prior biological knowledge to predict linkages between genes. The prediction results were checked against true positive links obtained from the YEASTRACT database, the TRANSFAC database, and the KEGG database. By checking the results against known interactions, we showed that the proposed algorithm could recover meaningful pathways, many of which are supported by the existing literature.

7.
High-throughput quantification of genetically coherent units (GCUs) is essential for deciphering population dynamics and species interactions within a community of microbes. Current techniques for microbial community analyses are, however, not suitable for this kind of high-throughput application. Here, we demonstrate the use of multivariate statistical analysis of complex DNA sequence electropherograms for the effective and accurate estimation of relative genotype abundance in cell samples from mixed microbial populations. The procedure is no more labor-intensive than standard automated DNA sequencing and provides a very effective means of quantitative data acquisition from experimental microbial communities. We present results with the Campylobacter jejuni strain-specific marker gene gltA, as well as the 16S rRNA gene, which is a universal marker across bacterial assemblages. The statistical models computed for these genes are applied to genetic data from two different experimental settings, namely, a chicken infection model and a multispecies anaerobic fermentation model, demonstrating collection of time series data from model bacterial communities. The method presented here is, however, applicable to any experimental scenario where the interest is quantification of GCUs in genetically heterogeneous DNA samples.
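As an illustration of the general idea only (calibrate a multivariate model that maps mixed-template sequence signals to known mixing fractions, then predict relative abundance in unknown mixtures), here is a minimal sketch using partial least squares regression; the synthetic "electropherogram" signals and the choice of PLS are assumptions for illustration, not the statistical model used in the paper.

```python
# Sketch: calibrate a multivariate regression on mixed-template traces.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
n_pos = 120                                          # signal positions per trace
profile_a = rng.random(n_pos)                        # pure-strain reference signal A
profile_b = rng.random(n_pos)                        # pure-strain reference signal B

def mixed_trace(frac_a, noise=0.05):
    """Synthetic trace from a mixture containing fraction frac_a of strain A."""
    return frac_a * profile_a + (1 - frac_a) * profile_b + rng.normal(0, noise, n_pos)

# calibration set: traces from mixtures of known composition
fractions = np.linspace(0, 1, 21)
X_cal = np.vstack([mixed_trace(f) for f in fractions])
pls = PLSRegression(n_components=3).fit(X_cal, fractions)

# estimate the composition of unknown mixed samples
unknown = np.vstack([mixed_trace(0.3), mixed_trace(0.75)])
print("estimated fraction of strain A:", pls.predict(unknown).ravel().round(2))
```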

8.
For most common diseases with heritable components, no single or small set of single-nucleotide polymorphisms (SNPs) explains most of the variance in these disorders. Instead, much of the variance may be caused by interactions (epistasis) among multiple SNPs or by interactions with environmental conditions. We present a new, powerful statistical model for analyzing and interpreting genomic data that influence multifactorial phenotypic traits with a complex and likely polygenic inheritance. The new method is based on Markov chain Monte Carlo (MCMC) and allows identification of sets of SNPs and environmental factors that, when combined, increase disease risk or change the distribution of a quantitative trait. Using simulations, we show that the MCMC method can detect disease association when multiple interacting SNPs are present in the data. When applying the method to real large-scale data from a Danish population-based cohort, multiple interactions are identified that substantially affect serum triglyceride levels in the study individuals. The method is designed for quantitative traits but can also be applied to qualitative traits. It is computationally feasible even for a large number of possible interactions and differs fundamentally from most previous approaches by entertaining nonlinear interactions and by directly addressing the multiple-testing problem.

9.
Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insight into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual-sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge. Here we introduce global importance analysis (GIA), a model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses about putative patterns and their interactions with other patterns, as well as to map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a convolutional network, which we call ResidualBind, and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that, in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC bias.
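A minimal sketch of the GIA procedure: embed a putative pattern into many background sequences and measure the population-level shift in model predictions. The prediction function below is a hand-crafted stand-in, not ResidualBind; the motif, sequence length, and embedding position are illustrative assumptions.

```python
# Sketch: global importance analysis with a stand-in prediction function.
import numpy as np

rng = np.random.default_rng(2)
ALPHABET = np.array(list("ACGU"))

def one_hot(seqs):
    return (seqs[..., None] == ALPHABET).astype(np.float32)

def model_predict(x):
    # Stand-in for a trained network: a score that "likes" the pattern UGCAUG.
    motif = one_hot(np.array([list("UGCAUG")]))[0]
    return np.array([np.max([np.sum(s[i:i + 6] * motif)
                             for i in range(len(s) - 5)]) for s in x])

def global_importance(pattern, n=500, length=50, position=20):
    background = ALPHABET[rng.integers(0, 4, size=(n, length))]
    with_pattern = background.copy()
    with_pattern[:, position:position + len(pattern)] = list(pattern)
    base = model_predict(one_hot(background))
    emb = model_predict(one_hot(with_pattern))
    return np.mean(emb - base)                      # population-level effect size

print("global importance of UGCAUG:", round(global_importance("UGCAUG"), 3))
print("global importance of a scrambled control:",
      round(global_importance("GAUCUG"), 3))
```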

10.
Pulmonary hypertension (PH) is a debilitating vascular disease that leads to pulmonary artery (PA) stiffening, which is a predictor of patient mortality. During PH development, PA stiffening adversely affects right ventricular function. PA stiffening has been investigated through the arterial nonlinear elastic response during mechanical testing in a canine PH model; however, only circumferential properties were reported, and only in the absence of chronic PH-induced PA remodeling. Remodeling can alter arterial nonlinear elastic properties via chronic changes in extracellular matrix (ECM) content and geometry. Here, we used an established constitutive model to demonstrate and differentiate between strain-stiffening, which is due to nonlinear elasticity, and remodeling-induced stiffening, which is due to ECM and geometric changes, in a canine model of chronic thromboembolic PH (CTEPH). To do this, circumferential and axial tissue strips of large extralobar PAs from control and CTEPH tissues were tested in uniaxial tension, and data were fit to a phenomenological constitutive model. Strain-induced stiffening was evident from mechanical testing as nonlinear elasticity in both directions and computationally by a high correlation coefficient between the mechanical data and model (R² = 0.89). Remodeling-induced stiffening was evident from a significant increase in the constitutive model stress parameter, which correlated with increased PA collagen content and decreased PA elastin content as measured histologically. The ability to differentiate between strain- and remodeling-induced stiffening in vivo may lead to tailored clinical treatments for PA stiffening in PH patients.
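A minimal sketch of fitting uniaxial stress-stretch data to a simple exponential strain-stiffening law, separating a stiffness-scale parameter (sensitive to remodeling) from a nonlinearity parameter (strain-stiffening); this functional form and the synthetic data are assumptions for illustration, not the constitutive model used in the study.

```python
# Sketch: fit an illustrative strain-stiffening law to two synthetic data sets.
import numpy as np
from scipy.optimize import curve_fit

def stress(stretch, A, B):
    # A scales overall stiffness (remodeling/ECM content); B controls
    # the strain-stiffening (nonlinear elastic) response.
    return A * (np.exp(B * (stretch - 1.0)) - 1.0)

stretch = np.linspace(1.0, 1.4, 15)
control = stress(stretch, 20.0, 6.0) + np.random.default_rng(3).normal(0, 2, 15)
cteph   = stress(stretch, 55.0, 6.5) + np.random.default_rng(4).normal(0, 2, 15)

for name, data in [("control", control), ("CTEPH", cteph)]:
    (A, B), _ = curve_fit(stress, stretch, data, p0=[10.0, 5.0])
    resid = data - stress(stretch, A, B)
    r2 = 1 - np.sum(resid**2) / np.sum((data - data.mean())**2)
    print(f"{name}: A = {A:.1f} (scale), B = {B:.2f} (stiffening), R^2 = {r2:.2f}")
```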

11.
With big data becoming widely available in healthcare, machine learning algorithms such as random forest (RF), which ignores time-to-event information, and random survival forest (RSF), which handles right-censored data, are used for individual risk prediction as alternatives to the Cox proportional hazards (Cox-PH) model. We aimed to systematically compare RF and RSF with Cox-PH. RSF with three split criteria [log-rank (RSF-LR), log-rank score (RSF-LRS), and maximally selected rank statistics (RSF-MSR)], RF, Cox-PH, and Cox-PH with splines (Cox-S) were evaluated through a simulation study based on real data. One hundred eighty scenarios were investigated, assuming different associations between the predictors and the outcome (linear, linear with interactions, nonlinear, nonlinear with interactions), training sample sizes (500/1000/5000), censoring rates (50%/75%/93%), hazard functions (increasing/decreasing/constant), and numbers of predictors (seven, or 15 including noise variables). Performance was evaluated with the time-dependent area under the curve and the integrated Brier score. In all scenarios, RF had the worst performance. In scenarios with a low number of events (⩽70), Cox-PH was at least noninferior to RSF, whereas under the linearity assumption it outperformed RSF. In the presence of interactions, RSF performed better than Cox-PH as the number of events increased, whereas Cox-S reached at least similar performance to RSF under nonlinear effects. RSF-LRS performed slightly worse than RSF-LR and RSF-MSR when noise variables and interaction effects were included. When applied to real data, models incorporating survival time performed better. Although RSF algorithms are a promising alternative to the conventional Cox-PH model as data complexity increases, they require a higher number of events for training. In time-to-event analysis, algorithms that consider survival time should be used.
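A minimal sketch of this kind of comparison, assuming the scikit-survival package; the dataset, split, hyperparameters, and evaluation grid are illustrative choices, not those of the simulation study.

```python
# Sketch: Cox-PH vs. random survival forest, scored with time-dependent AUC
# and integrated Brier score (assuming scikit-survival is installed).
import numpy as np
from sklearn.model_selection import train_test_split
from sksurv.datasets import load_gbsg2
from sksurv.preprocessing import OneHotEncoder
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import cumulative_dynamic_auc, integrated_brier_score

X, y = load_gbsg2()                                 # breast cancer cohort
X = OneHotEncoder().fit_transform(X)                # encode categorical predictors
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

cox = CoxPHSurvivalAnalysis().fit(X_tr, y_tr)
rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=15,
                           random_state=0).fit(X_tr, y_tr)

# evaluation grid: quartile range of test follow-up times, so that the grid
# lies inside the follow-up range of both splits
time_field = y_te.dtype.names[1]
times = np.percentile(y_te[time_field], np.linspace(25, 75, 6))

for name, model in [("Cox-PH", cox), ("RSF", rsf)]:
    risk = model.predict(X_te)                      # higher score = higher risk
    _, mean_auc = cumulative_dynamic_auc(y_tr, y_te, risk, times)
    surv = np.vstack([fn(times) for fn in model.predict_survival_function(X_te)])
    ibs = integrated_brier_score(y_tr, y_te, surv, times)
    print(f"{name}: mean time-dependent AUC = {mean_auc:.3f}, IBS = {ibs:.3f}")
```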

12.
Cen Wu & Yuehua Cui, Human Genetics, 2013, 132(12): 1413–1425
The genetic influences on complex disease traits generally depend on the joint effects of multiple genetic variants, environmental factors, and their interplay. Gene × environment (G × E) interactions play vital roles in determining an individual's disease risk, but the underlying genetic machinery is poorly understood. Traditional analysis assuming a linear relationship between genetic and environmental factors, along with their interactions, is commonly pursued under a regression-based framework to examine G × E interactions. This assumption, however, can be violated due to nonlinear responses of genetic variants to environmental stimuli. As an extension of our previous work on continuous traits, we propose a flexible varying-coefficient model for the detection of nonlinear G × E interactions with binary disease traits. The varying coefficients are approximated by a non-parametric regression function through which one can assess the nonlinear response of genetic factors to environmental changes. A group of statistical tests is proposed to elucidate various mechanisms of G × E interaction. The utility of the proposed method is illustrated via simulation and real data analysis with an application to type 2 diabetes.
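A minimal sketch of a varying-coefficient G × E analysis for a binary trait, approximating the genetic effect β(E) with a quadratic basis and testing it against a constant genetic effect with a likelihood-ratio test; the basis choice, simulated data, and test are illustrative assumptions (the paper uses a non-parametric approximation and a group of dedicated tests).

```python
# Sketch: logit P(D=1) = a0 + a1*E + (b0 + b1*E + b2*E^2)*G
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(8)
n = 2000
G = rng.integers(0, 3, n)                           # genotype coded 0/1/2
E = rng.uniform(-2, 2, n)                           # environmental exposure
beta_E = 0.8 * np.sin(E)                            # true nonlinear genetic effect
p = 1 / (1 + np.exp(-(-0.5 + 0.3 * E + beta_E * G)))
D = rng.binomial(1, p)                              # binary disease status

def fit(columns):
    X = sm.add_constant(np.column_stack(columns))
    return sm.Logit(D, X).fit(disp=0)

full = fit([E, G, G * E, G * E**2])                 # varying coefficient (quadratic)
reduced = fit([E, G])                               # constant genetic effect, no G x E
lr = 2 * (full.llf - reduced.llf)                   # likelihood-ratio test, 2 df
print(f"LR = {lr:.1f}, p = {stats.chi2.sf(lr, df=2):.2g}")
```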

13.
In this article, we develop an admixture F model (AFM) for the estimation of population-level coancestry coefficients from neutral molecular markers. In contrast to the previously published F model, the AFM enables disentangling small population size and lack of migration as causes of genetic differentiation behind a given level of FST. We develop a Bayesian estimation scheme for fitting the AFM to multiallelic data acquired from a number of local populations. We demonstrate the performance of the AFM using simulated data sets and real data on ninespine sticklebacks (Pungitius pungitius) and common shrews (Sorex araneus). The results show that the parameterization of the AFM conveys more information about the evolutionary history than a simple summary parameter such as FST. The methods are implemented in the R package RAFM.

14.
Although several model-based methods are promising for the identification of influential single factors and multi-factor interactions, few are widely used in real applications because most of the model-selection procedures are complex and/or computationally infeasible for high-dimensional data. In particular, the ability of these methods to reveal more true factors and fewer false ones often relies heavily on selecting appropriate values of tuning parameters, which remains a difficult task for practical analysts. This article provides a simple algorithm, modified from stepwise forward regression, for the identification of influential factors. Instead of keeping the identified factors in subsequent models for adjustment, as in stepwise regression, we propose to subtract the effects of identified factors in each run and always fit a single-term model to the effect-subtracted responses. The computation is lighter because the proposed method only involves calculating a simple test statistic, and it can therefore be applied to screen ultrahigh-dimensional data for important single factors and multi-factor interactions. Most importantly, we propose a novel stopping rule that uses a constant threshold for the simple test statistic, in contrast to conventional stepwise regression with an AIC or BIC criterion. Extensive simulation studies confirm that the new algorithm is competitive with several methods available in R packages, including the popular group lasso, sure independence screening, Bayesian quantitative trait locus mapping methods, and others. Findings from two real data examples, including a genome-wide association study, demonstrate the additional useful information about high-order interactions that can be gained by implementing the proposed algorithm.
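A minimal sketch of the effect-subtraction idea described above: at each step, single-term regressions are fit to the current effect-subtracted response, the factor with the largest test statistic is selected and its fitted effect subtracted, and screening stops when no statistic exceeds a constant threshold. The threshold value and toy data are illustrative assumptions.

```python
# Sketch: forward screening by effect subtraction with a constant threshold.
import numpy as np

def screen(X, y, threshold=4.0, max_steps=20):
    resid = y - y.mean()
    selected = []
    for _ in range(max_steps):
        stats = []
        for j in range(X.shape[1]):                 # single-term model per factor
            x = X[:, j] - X[:, j].mean()
            beta = x @ resid / (x @ x)
            e = resid - beta * x
            se = np.sqrt(e @ e / (len(y) - 2) / (x @ x))
            stats.append(abs(beta / se))
        j_best = int(np.argmax(stats))
        if stats[j_best] < threshold:
            break                                    # constant-threshold stopping rule
        selected.append(j_best)
        x = X[:, j_best] - X[:, j_best].mean()
        resid = resid - (x @ resid / (x @ x)) * x    # subtract the identified effect
    return selected

rng = np.random.default_rng(5)
n, p = 300, 1000                                     # ultrahigh-dimensional toy data
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 1.0 * X[:, 3] * X[:, 7] + rng.standard_normal(n)
X_aug = np.column_stack([X, X[:, 3] * X[:, 7]])      # add one candidate interaction
print("selected columns:", screen(X_aug, y))         # expect 3, 7 and column 1000
```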

15.
Time hierarchies, arising as a result of interactions between a system's components, are a ubiquitous property of dynamical biological systems. In addition, biological systems have been attributed switch-like properties that modulate the response to various stimuli across different organisms and environmental conditions. Establishing the interplay between these features of system dynamics is therefore a challenging question of practical interest in biology. Existing methods are suitable for systems with one stable steady state employed as a well-defined reference. In such systems, the characterization of time hierarchies has already been used to determine the components that contribute to the dynamics of biological systems. However, the application of these methods to bistable nonlinear systems is impeded by their inherent dependence on the reference state, which in this case is no longer unique. Here, we extend the applicability of reference-state analysis by proposing, analyzing, and applying a novel method that allows investigation of time hierarchies in systems exhibiting bistability. The proposed method is in turn used to identify the components, other than reactions, that determine the systemic dynamical properties. We demonstrate that in biological systems of varying levels of complexity, spanning different biological levels, the method can be effectively employed for model simplification while ensuring preservation of qualitative dynamical properties (i.e., bistability). Finally, by establishing a connection between techniques from nonlinear dynamics and multivariate statistics, the proposed approach provides the basis for extending reference-based analysis to bistable systems.

16.
17.
18.

Background  

It is hypothesized that common, complex diseases may be due to complex interactions between genetic and environmental factors, which are difficult to detect in high-dimensional data using traditional statistical approaches. Multifactor Dimensionality Reduction (MDR) is the most commonly used data-mining method to detect epistatic interactions. In all data-mining methods, it is important to consider internal validation procedures to obtain prediction estimates to prevent model over-fitting and reduce potential false positive findings. Currently, MDR utilizes cross-validation for internal validation. In this study, we incorporate the use of a three-way split (3WS) of the data in combination with a post-hoc pruning procedure as an alternative to cross-validation for internal model validation to reduce computation time without impairing performance. We compare the power to detect true disease-causing loci using MDR with both 5- and 10-fold cross-validation to MDR with 3WS for a range of single-locus and epistatic disease models. Additionally, we analyze a dataset in HIV immunogenetics to demonstrate the results of the two strategies on real data.
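A minimal sketch of three-way-split internal validation in general terms: candidate SNP-pair models are fit on a training split, ranked on a testing split, and the selected model is assessed once on a held-out validation split. The simple tree classifier stands in for MDR, and the split fractions and toy data are illustrative assumptions.

```python
# Sketch: generic three-way split (train / test / validation) model selection.
import numpy as np
from itertools import combinations
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
n, p = 600, 10
X = rng.integers(0, 3, size=(n, p))                  # SNP genotypes 0/1/2
y = (X[:, 2] + X[:, 5] >= 3).astype(int) ^ rng.binomial(1, 0.15, n)  # noisy 2-locus signal

# 60% training, 20% testing, 20% validation
X_tmp, X_val, y_tmp, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

def fit_pair(pair):
    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(X_tr[:, list(pair)], y_tr)               # fit on the training split
    return clf, clf.score(X_te[:, list(pair)], y_te) # rank on the testing split

ranked = sorted(((fit_pair(pair), pair) for pair in combinations(range(p), 2)),
                key=lambda item: item[0][1], reverse=True)
(best_clf, test_acc), best_pair = ranked[0]
val_acc = best_clf.score(X_val[:, list(best_pair)], y_val)  # final, unbiased estimate
print(f"best pair {best_pair}: testing accuracy {test_acc:.2f}, "
      f"validation accuracy {val_acc:.2f}")
```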

19.
Hypertension is reported in approximately a quarter of the population worldwide and is the leading biomedical risk factor for mortality. In the vasculature, hypertension is associated with endothelial dysfunction and increased inflammation, leading to atherosclerosis and various disease states such as chronic kidney disease [2], stroke [3], and heart failure [4]. An initial step in the vascular inflammation that leads to atherogenesis is the adhesion cascade, which involves the rolling, tethering, adherence, and subsequent transmigration of leukocytes through the endothelium. Recruitment and accumulation of leukocytes at the endothelium is mediated by upregulation of adhesion molecules such as vascular cell adhesion molecule-1 (VCAM-1), intercellular adhesion molecule-1 (ICAM-1), and E-selectin, as well as by increased cytokine and chemokine release and upregulation of reactive oxygen species [5]. In vitro methods such as static adhesion assays help to determine the mechanisms involved in cell-to-cell adhesion and allow analysis of cell adhesion molecules. Previous in vitro studies have demonstrated that acute increases in pressure on the endothelium can lead to monocyte adhesion and to upregulation of adhesion molecules and inflammatory markers [6]; however, as with many in vitro assays, these experiments have not been performed in real time under physiological flow conditions, nor with whole blood. Therefore, in vivo assays are increasingly utilised in animal models to demonstrate vascular inflammation and plaque development. Intravital microscopy is now widely used to assess leukocyte adhesion, rolling, migration, and transmigration [7-9]. In vivo studies that combine the effects of pressure with leukocyte-endothelial adhesion are less extensive. One such study examined the real-time effects of flow and shear on arterial growth and remodelling, but inflammatory markers were only assessed via immunohistochemistry [10]. Here we present a model for recording leukocyte adhesion in real time in intact pressurised blood vessels using whole blood perfusion. The methodology is a modification of an ex vivo vessel chamber perfusion model [9] that enables real-time analysis of leukocyte-endothelial adhesive interactions in intact vessels. Our modification enables manipulation of the intraluminal pressure up to 200 mmHg, allowing study not only under physiological flow conditions but also under controlled pressure conditions. While pressure myography systems have previously been used to observe vessel wall and lumen diameter [11] as well as vessel contraction, this is the first demonstration of leukocyte-endothelial interactions in real time in such a system. Here we demonstrate the technique using carotid arteries harvested from rats and cannulated in a custom-made flow chamber coupled to a fluorescence microscope. The vessel chamber is equipped with a large bottom coverglass, allowing a large-diameter objective lens with a short working distance to image the vessel. Furthermore, selected agonists and/or antagonists can be used to further investigate the mechanisms controlling cell adhesion. Advantages of this method over intravital microscopy include the absence of invasive surgery, so a higher throughput can be obtained. This method also enables localised inhibitor treatment of the desired vessel, whereas intravital microscopy only allows systemic inhibitor treatment.

20.
Gene regulatory networks are a crucial aspect of systems biology, describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison with the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated against the literature: 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data, and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.
