20 similar documents found; search took 0 ms.
1.
Shah M Corbeil J 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(1):14-26
We propose a general theoretical framework for analyzing differentially expressed genes and behavior patterns from two homogeneous short time-course data sets. The framework generalizes the recently proposed Hilbert-Schmidt Independence Criterion (HSIC)-based framework, adapting it to the time-series scenario by using tensor analysis for data transformation. The proposed framework yields criteria that identify both the differentially expressed genes and the time-course patterns of interest between two time-series experiments without requiring explicit clustering of the data. The results obtained by applying the framework with a linear kernel formulation to various data sets are biologically meaningful and consistent with published studies.
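The HSIC statistic at the core of this framework can be illustrated with its standard biased empirical estimator, tr(KHLH)/(n-1)^2, using linear kernels as in the abstract. The sketch below is a minimal plain-Python version for one gene's expression vector against a condition label. It does not implement the authors' tensor-based extension to time courses, and the function names (`linear_kernel`, `center`, `hsic_linear`) and example values are illustrative assumptions.

```python
def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def center(K):
    """Return H K H with H = I - (1/n) 1 1^T (double-centring of a Gram matrix)."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic_linear(X, Y):
    """Biased empirical HSIC with linear kernels: tr(K H L H) / (n - 1)^2."""
    n = len(X)
    Kc = center(linear_kernel(X))
    L = linear_kernel(Y)
    # tr(K H L H) = tr((H K H) L) by cyclicity of the trace.
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / (n - 1) ** 2


# One gene measured in four samples versus a two-condition label (toy values).
expr = [[1.0], [2.0], [3.0], [4.0]]
labels = [[1.0], [1.0], [-1.0], [-1.0]]
print(hsic_linear(expr, labels))
```

With linear kernels on scalar inputs the estimate reduces to the squared sample covariance, so larger values flag genes whose expression tracks the condition label.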
2.
3.
Using the modelling of solute transport in flowing media as an example, this paper outlines the main aspects of a systematic approach to the modelling of natural systems from experimental time-series data. The objective of the approach, which exploits sophisticated methods of recursive parameter estimation, is to produce a parametrically efficient, data-based model that is both physically meaningful and statistically well defined. Although the proposed methodology has its origins in systems and control theory and may be unfamiliar to some natural scientists, it has been developed and refined for use with natural environmental systems over the past 20 years and has wide application potential in areas such as biology and ecology. In this sense, the paper is intended to introduce the more general reader to the topic, in the hope that the tutorial review and practical examples will stimulate interest and encourage reference to the many publications cited in the paper. The practical examples concern the modelling of pollutant dispersion in stream channels; phloem translocation and carbon partitioning in plants; and rainfall-streamflow modelling in a river catchment.
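Recursive parameter estimation can be illustrated by its simplest member, recursive least squares (RLS), which updates the parameter estimates one observation at a time. The sketch below assumes a hypothetical first-order discrete-time model y[t] = a*y[t-1] + b*u[t-1] (for example, a tracer concentration driven by an input u). It is only the basic RLS recursion, not necessarily the estimator used in the paper, and the model, names and numbers are illustrative.

```python
import math


def rls_first_order(y, u, p0=1e6):
    """Recursive least squares for the hypothetical model y[t] = a*y[t-1] + b*u[t-1]."""
    theta = [0.0, 0.0]                  # running estimates of [a, b]
    P = [[p0, 0.0], [0.0, p0]]          # large initial P = vague prior on theta
    for t in range(1, len(y)):
        phi = [y[t - 1], u[t - 1]]      # regressor vector
        Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = 1.0 + phi[0] * Pphi[0] + phi[1] * Pphi[1]
        gain = [Pphi[0] / denom, Pphi[1] / denom]
        err = y[t] - (phi[0] * theta[0] + phi[1] * theta[1])  # one-step-ahead error
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        # P <- P - gain * phi^T P; P is symmetric, so phi^T P equals Pphi^T.
        P = [[P[i][j] - gain[i] * Pphi[j] for j in range(2)] for i in range(2)]
    return theta


# Noise-free synthetic data with a = 0.8, b = 0.5 (illustrative values only).
u = [math.sin(0.7 * t) + math.cos(1.3 * t) for t in range(100)]
y = [0.0]
for t in range(1, 100):
    y.append(0.8 * y[-1] + 0.5 * u[t - 1])
a_hat, b_hat = rls_first_order(y, u)
print(a_hat, b_hat)
```

Because each update uses only the newest observation, the same loop can run online as data arrive; a large `p0` simply tells the filter to trust the data over the zero starting guess.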
4.
Short tandem repeats, specifically microsatellites, are widely used genetic markers, are associated with human genetic diseases, and play an important role in various regulatory mechanisms and in evolution. Despite their importance, much remains unknown about their mutational dynamics. The increasing availability of genome data has led to several in silico studies of microsatellite evolution, which have produced a vast range of algorithms and software for tandem repeat detection. Documentation of these tools is often sparse, or provided in a format that is impenetrable to most biologists without an informatics background. This article introduces the major concepts behind repeat-detecting software that are essential for informed tool selection. We reflect on issues such as parameter settings and program bias, as well as redundancy filtering and efficiency, using examples from the currently available range of programs, to provide an integrated comparison and practical guide to microsatellite-detecting programs.
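To make the parameter-setting and redundancy issues concrete, here is a deliberately naive detector for perfect microsatellites. It is a sketch under stated assumptions, not any published tool: `max_unit` (longest repeat unit tried) and `min_copies` (minimum number of adjacent copies) are hypothetical parameters, imperfect repeats are ignored, and redundancy is filtered by always reporting the shortest qualifying unit (so `AAAAAA` is reported as `A` x6 rather than `AA` x3).

```python
def find_microsatellites(seq, max_unit=6, min_copies=3):
    """Report perfect tandem repeats as (start, unit, copies) tuples.

    Hypothetical parameters: max_unit is the longest repeat unit tried and
    min_copies the minimum number of adjacent copies required.
    """
    hits = []
    i, n = 0, len(seq)
    while i < n:
        found = None
        for k in range(1, max_unit + 1):          # shortest unit first: redundancy filter
            if i + k > n:
                break
            unit = seq[i:i + k]
            copies, j = 1, i + k
            while seq[j:j + k] == unit:           # count adjacent exact copies
                copies += 1
                j += k
            if copies >= min_copies:
                found = (i, unit, copies)
                break
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]         # jump past the whole repeat
        else:
            i += 1
    return hits


print(find_microsatellites("GCACACACAT"))
```

Changing `min_copies` or `max_unit` changes which loci are reported, which is one simple way in which program bias arises; many tools allow the threshold to depend on unit length instead of using one value for all.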
5.
MOTIVATION: Inferring genetic networks from time-series expression data has attracted a great deal of interest. In most cases, however, the number of genes exceeds the number of data points, which in principle makes it impossible to recover the underlying networks. To address this dimensionality problem, we apply the subset selection method to a linear system of difference equations. Previous approaches assign the single most likely combination of regulators to each target gene, which often causes over-fitting of the small number of data points. RESULTS: Here, we propose a new algorithm, named LEARNe, which merges the predictions from all the combinations of regulators that have a certain level of likelihood. LEARNe provides more accurate and robust predictions of the structure of genetic networks under the linear system model than previous methods. We tested LEARNe by reconstructing the SOS regulatory network of Escherichia coli and the cell cycle regulatory network of yeast from real experimental data, where LEARNe also performed better than previous methods. AVAILABILITY: The MATLAB codes are available upon request from the authors.
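The idea of merging predictions over all sufficiently likely regulator combinations, rather than keeping only the single best one, can be sketched as model averaging over subsets. The code below is not LEARNe. It is a hypothetical illustration that fits each subset of up to `max_size` regulators by least squares (no intercept), scores it with AIC (an assumption; the paper's likelihood criterion may differ), keeps subsets within `delta_aic` of the best, and reports for each candidate regulator the AIC-weighted fraction of kept subsets that contain it.

```python
import itertools
import math


def solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rss_for_subset(X, y, subset):
    """Least-squares fit of y on the columns in `subset` (no intercept); return the RSS."""
    T = len(y)
    A = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in subset] for i in subset]
    b = [sum(X[t][i] * y[t] for t in range(T)) for i in subset]
    coef = solve(A, b)
    return sum((y[t] - sum(c * X[t][i] for c, i in zip(coef, subset))) ** 2
               for t in range(T))


def regulator_scores(X, y, max_size=2, delta_aic=4.0):
    """AIC-weighted share of near-best regulator subsets that contain each gene."""
    T, p = len(y), len(X[0])
    fits = []
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(p), size):
            rss = max(rss_for_subset(X, y, subset), 1e-12)   # floor avoids log(0)
            fits.append((T * math.log(rss / T) + 2 * size, subset))
    best = min(aic for aic, _ in fits)
    kept = [(math.exp(-0.5 * (aic - best)), s) for aic, s in fits if aic - best <= delta_aic]
    total = sum(w for w, _ in kept)
    return [sum(w for w, s in kept if g in s) / total for g in range(p)]


# Rows are time points t = 0..5; columns are candidate regulators g0, g1, g2.
X = [[1, 2, 1], [2, 1, 3], [3, 4, 2], [5, 3, 4], [8, 6, 3], [13, 5, 5]]
# Target expression at t + 1, generated here as 0.9 * g0(t) for illustration.
y_next = [0.9 * row[0] for row in X]
scores = regulator_scores(X, y_next)
print(scores)
```

Here g0 receives a score of 1 because every retained subset contains it, while g1 and g2 receive smaller, nonzero scores from the two-regulator subsets that also include g0. The floor on the residual sum of squares only guards against log(0) on this noise-free toy data.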
6.
Bonhoeffer S Barbour AD De Boer RJ 《Proceedings. Biological sciences / The Royal Society》2002,269(1503):1887-1893
To develop a better understanding of the evolutionary dynamics of HIV drug resistance, it is necessary to accurately quantify the in vivo fitness costs of resistance mutations. However, the reliable estimation of such fitness costs is riddled with both theoretical and experimental difficulties. Experimental fitness assays typically suffer from the shortcoming that they are based on in vitro data. Fitness estimates based on the mathematical analysis of in vivo data, however, are often questionable because the underlying assumptions are not fulfilled. In particular, the assumption that the replication rate of the virus population is constant in time is frequently grossly violated. Extending recent work of Marée and colleagues, we present here a new approach that corrects for time-dependent viral replication in time-series data for growth competition of mutants. This approach allows a reliable estimation of the relative replicative capacity (with confidence intervals) of two competing virus variants growing within the same patient, using longitudinal data for the total plasma virus load, the relative frequency of the two variants and the death rate of infected cells. We assess the accuracy of our method using computer-generated data. An implementation of the developed method is freely accessible on the Web (http://www.eco.ethz.ch/fitness.html).
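For orientation, the conventional estimate that this approach improves upon assumes constant replication rates. In that case the log-ratio of mutant frequency, ln(f/(1-f)), changes linearly in time, and its slope estimates the difference in net growth rates between the two variants. The sketch below computes only that baseline slope by least squares; it does not include the paper's correction, which additionally uses total plasma virus load and the death rate of infected cells. Names and example values are illustrative.

```python
import math


def constant_rate_selection_coefficient(times, freq_mutant):
    """OLS slope of ln(f / (1 - f)) against time (assumes constant replication rates)."""
    z = [math.log(f / (1.0 - f)) for f in freq_mutant]
    n = len(times)
    t_bar, z_bar = sum(times) / n, sum(z) / n
    sxy = sum((t - t_bar) * (zi - z_bar) for t, zi in zip(times, z))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


# Synthetic frequencies whose log-ratio grows by exactly 0.05 per day (illustrative).
times = [0, 5, 10, 20, 30]
freqs = [1.0 / (1.0 + 9.0 * math.exp(-0.05 * t)) for t in times]
print(constant_rate_selection_coefficient(times, freqs))
```

When the replication rate varies over time (for example after a change in therapy), this slope no longer has a single fitness interpretation, which is the situation the abstract describes.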
7.
Spatial interactions are key determinants in the dynamics of many epidemiological and ecological systems; therefore it is important to use spatio-temporal models to estimate essential parameters. However, spatially-explicit data sets are rarely available; moreover, fitting spatially-explicit models to such data can be technically demanding and computationally intensive. Thus non-spatial models are often used to estimate parameters from temporal data. We introduce a method for fitting models to temporal data in order to estimate parameters which characterise spatial epidemics. The method uses semi-spatial models and pair approximation to take explicit account of spatial clustering of disease without requiring spatial data. The approach is demonstrated for data from experiments with plant populations invaded by a common soilborne fungus, Rhizoctonia solani. Model inferences concerning the number of sources of disease and primary and secondary infections are tested against independent measures from spatio-temporal data. The applicability of the method to a wide range of host-pathogen systems is discussed.
8.
Yao Q Tong H Finkenstädt B Stenseth NC 《Proceedings. Biological sciences / The Royal Society》2000,267(1460):2459-2467
In many studies in ecology, epidemiology, biomedicine and other fields, we are typically confronted with panels of short time series for which we wish to obtain a biologically meaningful grouping. Here, we propose a bootstrap approach to test whether the regression functions or the variances of the error terms in a family of stochastic regression models are the same. Our general setting includes panels of time-series models as a special case. We rigorously justify the use of the test by investigating its asymptotic properties, both theoretically and through simulations. The latter confirm that for finite sample sizes, the bootstrap provides a better approximation than classical asymptotic theory. We then apply the proposed tests to the mink-muskrat data across 81 trapping regions in Canada. Ecologically interpretable groupings are obtained, which serve as a necessary first step before a fuller biological and statistical analysis of the food chain interaction.
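The resampling logic of such a test can be sketched in a much simpler parametric setting. The code below is a stand-in built on assumptions, not the authors' procedure: it tests whether two series share one AR(1) coefficient (rather than a whole regression function), estimates that coefficient under the null by pooling, regenerates both series from centred residuals, and reports a bootstrap p-value. Function names and defaults (`n_boot`, `seed`) are hypothetical.

```python
import random


def ar1_coef(series_list):
    """Least-squares AR(1) coefficient (no intercept), pooled over one or more series."""
    num = sum(s[t] * s[t - 1] for s in series_list for t in range(1, len(s)))
    den = sum(s[t - 1] ** 2 for s in series_list for t in range(1, len(s)))
    return num / den


def bootstrap_equal_ar1(y1, y2, n_boot=499, seed=0):
    """Residual-bootstrap p-value for H0: y1 and y2 share one AR(1) coefficient."""
    rng = random.Random(seed)
    stat = (ar1_coef([y1]) - ar1_coef([y2])) ** 2
    a0 = ar1_coef([y1, y2])                        # pooled estimate under H0
    resid = []
    for y in (y1, y2):
        r = [y[t] - a0 * y[t - 1] for t in range(1, len(y))]
        mean_r = sum(r) / len(r)
        resid.append([e - mean_r for e in r])      # centred residuals
    exceed = 0
    for _b in range(n_boot):
        sims = []
        for y, r in zip((y1, y2), resid):
            sim = [y[0]]                           # start from the observed first value
            for _ in range(len(y) - 1):
                sim.append(a0 * sim[-1] + rng.choice(r))
            sims.append(sim)
        if (ar1_coef([sims[0]]) - ar1_coef([sims[1]])) ** 2 >= stat:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


y = [1.0, 0.5, 0.8, -0.2, 0.3, 1.1, 0.4, -0.6, 0.2, 0.9]
print(bootstrap_equal_ar1(y, y, n_boot=99))
```

Because the statistic is a squared difference, comparing a series with itself gives p = 1; small p-values suggest that, under this simplified model, the two series should not be grouped together.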
9.
Background
The increasing availability of time-series expression data opens up new possibilities for studying functional linkages of genes. Present methods for inferring functional linkages between genes from expression data are mainly based on point-to-point comparison. Change trends between consecutive time points in time-series data have so far not been well explored.
10.
Helmut Vogel 《Journal of molecular evolution》1975,6(4):271-283
The measures of compositional nonrandomness discussed here, with respect to their physical significance and their power to detect evolutionarily significant variations, are (see article) (p_i: a priori probability for amino acid i; n_i: its number of occurrences in a protein of length L). As a concrete example, the p_i are here assumed to represent equal frequencies of all non-stop codons. For each quantity, four levels are defined: the base level, with optimal (i.e. minimal nonrandomness) composition, admitting non-integer values of n_i; the integer level, with optimal integer composition; the noise level, represented by a typical random chain; and the real protein level. On all these levels, S, the measure with the most direct physical meaning, shows the smoothest behavior, with the smallest relative fluctuations and thus the highest resolution.
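Under the stated assumption that all 61 non-stop codons are equally frequent, each a priori probability p_i is the number of sense codons for amino acid i divided by 61. The sketch below derives the p_i from the standard genetic code and draws a random chain of the kind used for the noise level. It does not reproduce the nonrandomness measures themselves (including S), which are given in the article; the function names are illustrative.

```python
import random
from collections import Counter

# Standard genetic code (NCBI table 1): codons ordered TTT, TTC, TTA, TTG, TCT, ...
# with bases in the order T, C, A, G; '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(
    ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)}


def prior_probabilities():
    """p_i = (number of sense codons for amino acid i) / 61."""
    counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
    n_sense = sum(counts.values())          # 61 non-stop codons
    return {aa: c / n_sense for aa, c in counts.items()}


def random_chain(length, seed=0):
    """Random protein: each residue translated from a uniformly drawn sense codon."""
    rng = random.Random(seed)
    sense = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
    return "".join(CODON_TABLE[rng.choice(sense)] for _ in range(length))


p = prior_probabilities()
print(p["L"], p["W"], random_chain(20))
```

For example, leucine, serine and arginine each have six codons, so p_i = 6/61 for each, while methionine and tryptophan have p_i = 1/61; the expected number of residues of type i in a random chain of length L is L * p_i.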
11.
Background
Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods.
Methods
In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait-by-line combinations as separate but correlated traits, and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). Each of the three lines had 1004 to 1023 training animals and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values.
Results
When the training dataset included only data from the evaluated line, non-linear models yielded at best similar accuracy to linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction.
Conclusions
Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait-by-line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-014-0075-3) contains supplementary material, which is available to authorized users.
12.
MOTIVATION: In the process of developing risk prediction models, various steps of model building and model selection are involved. If this process is not adequately controlled, overfitting may result in serious overoptimism leading to potentially erroneous conclusions. METHODS: For right censored time-to-event data, we estimate the prediction error for assessing the performance of a risk prediction model (Gerds and Schumacher, 2006; Graf et al., 1999). Furthermore, resampling methods are used to detect overfitting and resulting overoptimism and to adjust the estimates of prediction error (Gerds and Schumacher, 2007). RESULTS: We show how and to what extent the methodology can be used in situations characterized by a large number of potential predictor variables where overfitting may be expected to be overwhelming. This is illustrated by estimating the prediction error of some recently proposed techniques for fitting a multivariate Cox regression model applied to the data of a prognostic study in patients with diffuse large-B-cell lymphoma (DLBCL). AVAILABILITY: Resampling-based estimation of prediction error curves is implemented in an R package called pec available from the authors.
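The prediction error measure associated with Graf et al. (1999) is, as far as we can tell, the time-dependent Brier score for censored data, in which censoring is handled by inverse probability of censoring weighting (IPCW) with a Kaplan-Meier estimate G of the censoring distribution. The sketch below evaluates that score at a single time point in plain Python. It is not the pec package (which is written for R), it omits the resampling-based overoptimism correction, ties are handled in the simplest possible way, and all names are illustrative.

```python
def censoring_survival(times, events):
    """Kaplan-Meier estimate G(s) of P(censoring time > s), as a step function."""
    steps = []
    g = 1.0
    for u in sorted(set(t for t, e in zip(times, events) if e == 0)):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(s, left_limit=False):
        value = 1.0
        for u, g_u in steps:                      # steps are in increasing time order
            if u < s or (u == s and not left_limit):
                value = g_u
        return value

    return G


def brier_score(t, times, events, surv_prob):
    """IPCW Brier score at time t; surv_prob[i] is the predicted P(T_i > t)."""
    G = censoring_survival(times, events)
    total = 0.0
    for ti, ei, si in zip(times, events, surv_prob):
        if ti <= t and ei == 1:                   # event observed by t: target is 0
            total += si ** 2 / G(ti, left_limit=True)
        elif ti > t:                              # still event-free at t: target is 1
            total += (1.0 - si) ** 2 / G(t)
        # subjects censored before t contribute nothing; the weights compensate
    return total / len(times)


print(brier_score(2.5, [1, 2, 3, 4], [1, 1, 1, 1], [0.2, 0.3, 0.9, 0.8]))
```

Evaluating `brier_score` over a grid of time points t traces out a prediction error curve, which is the kind of curve the abstract refers to.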
13.
Signal transduction networks are crucial for inter- and intra-cellular signaling. Signals are often transmitted via covalent modification of protein structure, with phosphorylation/dephosphorylation as the primary example. In this paper, we apply a recently described method of computational algebra to the modeling of signaling networks, based on time-course protein modification data. Computational algebraic techniques are employed to construct next-state functions. A Monte Carlo method is used to approximate the Deegan-Packel Index of Power corresponding to the respective variables. The Deegan-Packel Index of Power is used to conjecture dependencies in the cellular signaling networks. We apply this method to two examples of protein modification time-course data available in the literature. These experiments identified protein carbonylation upon exposure of cells to sub-lethal concentrations of copper. We demonstrate that this method can identify protein dependencies that might correspond to regulatory mechanisms to shut down glycolysis in a reverse, step-wise fashion in response to copper-induced oxidative stress in yeast. These examples show that the computational algebra approach can identify dependencies that may outline signaling networks involved in the response of glycolytic enzymes to the oxidative stress caused by copper.
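The Deegan-Packel index itself is easy to state for a simple (yes/no) game: if M is the set of minimal winning coalitions, each coalition in M shares one unit of power equally among its members, and a player's index is the average of these shares over M. The sketch below computes it exactly by enumeration, which is feasible only for a handful of variables (hence the need for a Monte Carlo approximation on real networks). Reading a monotone Boolean next-state function as a game, where a coalition "wins" when setting its members to 1 makes the function 1, is an assumption made for illustration; the paper's construction may differ.

```python
import itertools


def minimal_winning_coalitions(players, wins):
    """Winning coalitions that contain no smaller winning coalition."""
    winning = [frozenset(c)
               for r in range(1, len(players) + 1)
               for c in itertools.combinations(players, r)
               if wins(frozenset(c))]
    return [c for c in winning if not any(other < c for other in winning)]


def deegan_packel(players, wins):
    """DP_i = (1/|M|) * sum over minimal winning coalitions S containing i of 1/|S|."""
    mwc = minimal_winning_coalitions(players, wins)
    return {p: sum(1.0 / len(s) for s in mwc if p in s) / len(mwc) for p in players}


# Hypothetical next-state function f = x1 AND (x2 OR x3), read as a simple game.
print(deegan_packel([1, 2, 3], lambda c: 1 in c and (2 in c or 3 in c)))
```

For f = x1 AND (x2 OR x3) the minimal winning coalitions are {x1, x2} and {x1, x3}, so x1 gets index 0.5 and x2, x3 get 0.25 each, which would point to x1 as the dominant input.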
14.
Individual organisms are affected by various natural and anthropogenic environmental factors throughout their life history, and this is reflected in the way population abundance fluctuates. Consequently, observed population dynamics are often produced by the superimposition of multiple environmental signals, which complicates the analysis of population time series. Here, a multivariate time-series method called maximum autocorrelation factor analysis (MAFA) was used to extract underlying signals from multiple population time series. The extracted signals were compared with environmental variables suspected to affect the populations. Finally, a simple multiple regression analysis was applied to the same data set, and its results were compared with those from MAFA. The signals extracted with MAFA were strongly associated with the environmental variables, suggesting that they represent environmental factors. With the multiple regression analysis, on the other hand, one of the important signals was not identifiable, revealing a shortcoming of the conventional approach. MAFA summarizes data based on their lag-one autocorrelation. This allows the identification of underlying signals that had only a small effect on population abundance during the observation period. It also uses multiple time series collected in parallel, which makes it possible to analyze short time series effectively. In this study, annual counts of spawning adult Chinook salmon at various locations within the Klamath Basin, California, were analyzed.
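A standard way to compute maximum autocorrelation factors is as a generalized eigenproblem: whiten the series with their covariance matrix, then diagonalize the covariance of the lag-one differences; the directions with the smallest difference variance have the highest lag-one autocorrelation. The NumPy sketch below follows that recipe. It is not the code used in the study, and the synthetic data (one sinusoidal "environmental" signal mixed into three noisy series) are purely illustrative.

```python
import numpy as np


def maf(Y):
    """Maximum autocorrelation factors of Y (rows = years, columns = series).

    Returns (factors, autocorr): factor time series ordered from highest to lowest
    approximate lag-one autocorrelation, and those autocorrelations.
    """
    Y = np.asarray(Y, dtype=float)
    Y = Y - Y.mean(axis=0)
    S = np.cov(Y, rowvar=False)                      # covariance of the series
    S_d = np.cov(np.diff(Y, axis=0), rowvar=False)   # covariance of lag-one differences
    evals, evecs = np.linalg.eigh(S)
    W = evecs / np.sqrt(evals)                       # whitening: W.T @ S @ W = I
    lam, V = np.linalg.eigh(W.T @ S_d @ W)           # eigenvalues in ascending order
    # For a stationary series var(diff) ~ 2 var(z) (1 - rho), so rho ~ 1 - lam / 2.
    return Y @ (W @ V), 1.0 - lam / 2.0


# Illustrative data: one smooth common signal plus independent noise in 3 series.
rng = np.random.default_rng(0)
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 50)
Y = np.outer(signal, [1.0, 0.8, -0.5]) + 0.3 * rng.standard_normal((200, 3))
factors, autocorr = maf(Y)
print(autocorr)
```

The first column of `factors` should track the shared smooth signal, while the last factors are dominated by serially uncorrelated noise.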
15.
16.
Stephen F Madden Susan B Carpenter Ian B Jeffery Harry Björkbacka Katherine A Fitzgerald Luke A O'Neill Desmond G Higgins 《BMC bioinformatics》2010,11(1):257
Background
MicroRNAs (miRNAs) are non-coding RNAs that regulate gene expression by binding to the messenger RNA (mRNA) of protein coding genes. They control gene expression by either inhibiting translation or inducing mRNA degradation. A number of computational techniques have been developed to identify the targets of miRNAs. In this study we used predicted miRNA-gene interactions to analyse mRNA gene expression microarray data to predict miRNAs associated with particular diseases or conditions.
17.
A central issue in cognitive neuroscience is determining which cortical areas are involved in managing information processing in a cognitive task and understanding their temporal interactions. Since the transfer of information in the form of electrical activity from one cortical region will in turn evoke electrical activity in other regions, the analysis of temporal synchronization provides a tool for understanding neuronal information processing between cortical regions. We adopt a method for revealing time-dependent functional connectivity, applying statistical analyses of phases to recover the information flow and the functional connectivity between cortical regions from data with high temporal resolution. We further develop an evaluation method for these techniques based on two kinds of model networks, consisting of coupled Rössler attractors or of coupled stochastic Ornstein–Uhlenbeck systems. The implemented time-dependent coupling includes uni- and bi-directional connectivities as well as time-delayed feedback. The synchronization dynamics of these networks are analyzed using the mean phase coherence, which is based on averaging over phase differences, and the general synchronization index, which is based on the Shannon entropy. Combining these with a parametric time delay forms the basis of a connectivity pattern that includes the temporal and time-lagged dynamics of the synchronization between two sources. We model and discuss potential artifacts. We find that the general phase measures are remarkably stable and produce highly comparable results for stochastic and periodic systems. Moreover, the method proves useful for identifying brief periods of phase coupling and delays. We therefore propose that the method is useful as a basis for generating potential functional connectivity models.
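The mean phase coherence is R = |(1/N) sum_t exp(i(phi1(t) - phi2(t)))|, which equals 1 when the phase difference is constant and is near 0 when the difference is spread uniformly around the circle. The sketch below assumes the instantaneous phases have already been extracted (for example via the Hilbert transform, not shown) and adds a crude scan over time lags as a stand-in for the parametric time delay. The entropy-based general synchronization index is not implemented, and the function names are illustrative.

```python
import cmath
import math


def mean_phase_coherence(phi1, phi2):
    """R = |mean over t of exp(i * (phi1[t] - phi2[t]))|, a value in [0, 1]."""
    z = sum(cmath.exp(1j * (a - b)) for a, b in zip(phi1, phi2))
    return abs(z) / len(phi1)


def lagged_coherence(phi1, phi2, max_lag):
    """R between phi1[t] and phi2[t + lag] for lag = 0..max_lag (crude delay scan)."""
    n = len(phi1)
    return [mean_phase_coherence(phi1[:n - lag], phi2[lag:n]) for lag in range(max_lag + 1)]


# Constant phase lag gives R = 1; phase differences spread evenly around the circle give R ~ 0.
phi_a = [0.1 * t for t in range(100)]
print(mean_phase_coherence(phi_a, [p + 0.5 for p in phi_a]))
print(mean_phase_coherence([2 * math.pi * k / 8 for k in range(8)], [0.0] * 8))
```

The lag at which `lagged_coherence` peaks gives a rough estimate of the delay between two sources, under the assumption that the coupling is stable over the analysed window.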
18.
Mario PL Calus Heyun Huang Addie Vereijken Jeroen Visscher Jan ten Napel Jack J Windig 《Genetics Selection Evolution》2014,46(1)
Background
The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction.
Methods
Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we evaluated the following genomic prediction models: genome-enabled BLUP (GBLUP), ridge regression BLUP (RRBLUP), principal component analysis followed by ridge regression (RRPCA), BayesC and Bayesian stochastic search variable selection. Prediction accuracy was measured as the correlation between predicted breeding values and observed phenotypes divided by the square root of the heritability. The data used concerned laying hens with phenotypes for number of eggs in the first production period and known genotypes. The hens were from two closely-related brown layer lines (B1 and B2), and a third distantly-related white layer line (W1). Lines had 1004 to 1023 training animals and 238 to 240 validation animals. Training datasets consisted of animals of either single lines, or a combination of two or all three lines, and had 30 508 to 45 974 segregating single nucleotide polymorphisms.
Results
Genomic prediction models yielded accuracies 0.13 to 0.16 higher than pedigree-based BLUP. When the line itself was excluded from the training dataset, genomic predictions were generally inaccurate. Use of multiple lines marginally improved prediction accuracy for B2 but did not affect or slightly decreased prediction accuracy for B1 and W1. Differences between models were generally small except for RRPCA, which gave considerably higher accuracies for B2. Correlations between genomic predictions from different methods were higher than 0.96 for W1 and higher than 0.88 for B1 and B2. The greater differences between methods for B1 and B2 were probably due to the lower accuracy of predictions for B1 (~0.45) and B2 (~0.40) compared to W1 (~0.76).
Conclusions
Multi-line genomic prediction did not affect or slightly improved prediction accuracy for closely-related lines. For distantly-related lines, multi-line genomic prediction yielded similar or slightly lower accuracies than single-line genomic prediction. Bayesian variable selection and GBLUP generally gave similar accuracies. Overall, RRPCA yielded the greatest accuracies for two lines, suggesting that using PCA helps to alleviate the “n ≪ p” problem in genomic prediction.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-014-0057-5) contains supplementary material, which is available to authorized users.
19.
20.
Non-allelic homologous recombination (NAHR) is a common mechanism for generating genome rearrangements and is implicated in numerous genetic disorders, but its detection in high-throughput sequencing data poses a serious challenge. We present a probabilistic model of NAHR and demonstrate its ability to find NAHR in low-coverage sequencing data from 44 individuals. We identify NAHR-mediated deletions or duplications in 109 of 324 potential NAHR loci in at least one of the individuals. These calls segregate by ancestry, are more common in closely spaced repeats, often result in duplicated genes or pseudogenes, and affect highly studied genes such as GBA and CYP2E1.
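As a point of comparison, one simple ingredient of probabilistic deletion and duplication calling is a read-depth likelihood. The sketch below is a generic Poisson model over copy numbers 1, 2 and 3 in a single region, with a hypothetical prior. It is not the NAHR model described here, which we do not attempt to reproduce, and it ignores read-mapping ambiguity between the paralogous repeats that mediate NAHR, likely a key difficulty in practice.

```python
import math


def copy_number_posterior(read_count, expected_diploid, priors=None):
    """Posterior over copy numbers {1, 2, 3} for one region under a Poisson depth model."""
    if priors is None:
        priors = {1: 0.05, 2: 0.90, 3: 0.05}      # hypothetical prior
    log_post = {}
    for cn, prior in priors.items():
        lam = expected_diploid * cn / 2.0          # expected reads scale with copy number
        log_lik = read_count * math.log(lam) - lam - math.lgamma(read_count + 1)
        log_post[cn] = math.log(prior) + log_lik
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {cn: math.exp(v - top) / norm for cn, v in log_post.items()}


post = copy_number_posterior(read_count=8, expected_diploid=20.0)
print(max(post, key=post.get))
```

With low coverage the expected counts are small and the posterior stays close to the prior, which illustrates why read depth alone is weak evidence in low-coverage data.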