Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
2.

Background  

Dynamic programming is a widely used programming technique in bioinformatics. In sharp contrast to the simplicity of textbook examples, implementing a dynamic programming algorithm for a novel and non-trivial application is a tedious and error-prone task. The algebraic dynamic programming approach seeks to alleviate this situation by clearly separating the dynamic programming recurrences from the scoring schemes.
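The separation described in this abstract can be illustrated with a small sketch. This is not the authors' framework: the `Algebra` type, the `align` function, and the two example algebras are hypothetical names chosen for illustration. The recurrence for global alignment is written once, and the scoring scheme is passed in as a separate object.

```python
from typing import Callable, NamedTuple


class Algebra(NamedTuple):
    """Hypothetical scoring scheme: how to evaluate each case and how to choose."""
    empty: Callable    # value of aligning two empty suffixes
    match: Callable    # (a, b, rest) -> value when a is aligned with b
    delete: Callable   # (a, rest) -> value when a is aligned with a gap
    insert: Callable   # (b, rest) -> value when a gap is aligned with b
    choose: Callable   # combines the list of candidate values


def align(x: str, y: str, alg: Algebra):
    """Global alignment recurrence, written once and independent of scoring."""
    n, m = len(x), len(y)
    table = [[None] * (m + 1) for _ in range(n + 1)]
    # Fill from the end so that every suffix result is ready when needed.
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            cands = []
            if i == n and j == m:
                cands.append(alg.empty())
            if i < n and j < m:
                cands.append(alg.match(x[i], y[j], table[i + 1][j + 1]))
            if i < n:
                cands.append(alg.delete(x[i], table[i + 1][j]))
            if j < m:
                cands.append(alg.insert(y[j], table[i][j + 1]))
            table[i][j] = alg.choose(cands)
    return table[0][0]


# Scoring scheme 1: unit-cost edit distance (minimise the number of edits).
unit_cost = Algebra(
    empty=lambda: 0,
    match=lambda a, b, rest: rest + (0 if a == b else 1),
    delete=lambda a, rest: rest + 1,
    insert=lambda b, rest: rest + 1,
    choose=min,
)

# Scoring scheme 2: count all possible alignments (same recurrence, new algebra).
count = Algebra(
    empty=lambda: 1,
    match=lambda a, b, rest: rest,
    delete=lambda a, rest: rest,
    insert=lambda b, rest: rest,
    choose=sum,
)

print(align("kitten", "sitting", unit_cost))
print(align("a", "b", count))
```

Swapping `unit_cost` for `count` changes what is computed without touching the recurrence, which is the kind of reuse the separation is meant to enable.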

3.
Genetic programming is a technique that can be used to tackle the hugely demanding data-processing problems encountered in the natural sciences. Application of genetic programming to a problem using parasites as biological tags demonstrates its potential for developing explanatory models using data that are both complex and noisy.

4.
In the present paper, we describe how a directed graph was constructed and then searched for the optimum path using a dynamic programming approach, based on the secondary structure propensity of short protein sequences derived from a training data set. The protein secondary structure was predicted in this way. The average three-state accuracy of the algorithm was 76.70%.

5.
Well‐intentioned environmental management can backfire, causing unforeseen damage. To avoid this, managers and ecologists seek accurate predictions of the ecosystem‐wide impacts of interventions, given small and imprecise datasets, which is an incredibly difficult task. We generated and analysed thousands of ecosystem population time series to investigate whether fitted models can help decision‐makers select interventions. Using these time‐series data (sparse and noisy datasets drawn from deterministic Lotka‐Volterra systems with two to nine species, of known network structure), dynamic model forecasts of whether a species’ future population will be positively or negatively affected by rapid eradication of another species were correct more than 70% of the time. Although a 70% correct classification rate is only slightly better than an uninformative prediction (50%), this classification accuracy can feasibly be improved by increasing monitoring accuracy and frequency. Our findings suggest that models may not need to produce well‐constrained predictions before they can inform decisions that improve environmental outcomes.

6.
Classification of newly determined protein structures is important in understanding their function and mechanism of action. Currently available methods employ a global structure alignment strategy and are computationally expensive. We propose a two-step methodology: a quick screen that significantly reduces the number of candidate structures, followed by global structure alignment of the query structure with the reduced set. We represent a protein structure as a sequence of local structures, codified in the form of geometric invariants. Geometric invariants are quantities that remain unchanged under transformations such as translation and rotation. Protein structures represented as multi-attribute sequences are aligned via dynamic programming to identify close neighbors of the query structure. The query structure is then compared with this reduced dataset using conventional structure comparison methods to predict its functional class. For a typical protein structure, the screening method was able to reduce the Protein Data Bank to a mere 200 proteins while preserving the structurally closest neighbor in the reduced set. This resulted in a 30- to 60-fold improvement in execution time. We present the results of a leave-one-out classification experiment on ASTRAL-95 domains and a comparison with the SCOP classification hierarchy.

7.
For years, we have been building models of gene regulatory networks (GRNs), and recent advances in molecular biology have shed some light on new structural and dynamical properties of such highly complex systems. In this work, we propose a novel timing of updates in random and scale-free Boolean networks, inspired by recent findings in molecular biology. This update sequence is neither fully synchronous nor asynchronous, but rather takes into account the sequence in which genes affect each other. We have used both Kauffman's original model and Aldana's extension, which takes into account the structural properties of known parts of actual GRNs, where the degree distribution is right-skewed and long-tailed. The computer simulations of the dynamics of the new model compare favorably to the original ones and show biologically plausible results in terms of both the number and the length of attractors. We have complemented this study with a complete analysis of our systems’ stability under transient perturbations, which is one of the defining attributes of biological networks. Results are encouraging, as our model shows comparable and usually even better behavior than preceding ones without losing the attractive simplicity of Boolean networks.
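To make the distinction between update timings concrete, the sketch below contrasts a fully synchronous update with a one-node-at-a-time sequential update on a tiny Boolean network. It does not reproduce the update scheme proposed in the abstract, whose details are not given here; the three-gene ring, the rule set, and all function names are assumptions for illustration only.

```python
def sync_step(state, rules):
    """Synchronous update: every node reads the old state."""
    return tuple(rule(state) for rule in rules)


def sequential_step(state, rules, order):
    """Sequential update: nodes are updated one by one in `order`,
    and each node sees the changes already made in this sweep."""
    s = list(state)
    for i in order:
        s[i] = rules[i](tuple(s))
    return tuple(s)


def attractor_length(state, step):
    """Iterate a deterministic step until a state repeats; return the cycle length."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state)
        t += 1
    return t - seen[state]


# Hypothetical three-gene negative feedback ring:
# gene 0 is repressed by gene 2, gene 1 copies gene 0, gene 2 copies gene 1.
rules = [
    lambda s: int(not s[2]),
    lambda s: s[0],
    lambda s: s[1],
]

start = (0, 0, 0)
print(attractor_length(start, lambda s: sync_step(s, rules)))
print(attractor_length(start, lambda s: sequential_step(s, rules, (0, 1, 2))))
```

In this toy network the same rules yield attractors of different lengths under the two schemes, which is why the choice of update timing matters for the attractor statistics the abstract reports.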

8.
A protein structure comparison method is described that allows the generation of large populations of high-scoring alternate alignments. This was achieved by incorporating a random element into an iterative double dynamic programming algorithm. The maximum scores from repeated comparisons of a pair of structures converged on a value that was taken as the global maximum. This lay 15% over the score obtained from the single fixed (unrandomized) calculation. The effect of the gap penalty was observed through the shift of the alignment populations, characterized by their alignment length and root-mean-square deviation (RMSD). The best (lowest RMSD) values found in these populations provided a base-line against which other methods were compared.

9.
10.
Longitudinal data analysis using generalized linear models   Total citations: 186 (self-citations: 0, citations by others: 186)

11.
Lam Tran  Kevin He  Di Wang  Hui Jiang 《Biometrics》2023,79(2):1280-1292
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
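As background for the linear-model case, the standard closed form for the LOOCV error of ordinary least squares is shown below. It is the textbook identity, not the weighted formulation proposed in the abstract; the symbols $X$ (design matrix), $y_i$ (responses), $\hat{y}_i$ (fitted values from the full fit), and $h_{ii}$ (leverages) are introduced here for illustration.

```latex
\mathrm{LOOCV}
  = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_{ii}} \right)^{2},
\qquad
h_{ii} = \left[ X \left( X^{\top} X \right)^{-1} X^{\top} \right]_{ii}
```

Because the identity needs only a single fit, the LOOCV error becomes an explicit function of the fitted quantities, which is what makes it a tractable optimization target.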

12.
It is widely acknowledged that the analysis of comparative data from related species should be performed taking into account their phylogenetic relationships. We introduce a new method, based on the use of generalized estimating equations (GEE), for the analysis of comparative data. The principle is to incorporate, in the modelling process, a correlation matrix that specifies the dependence among observations. This matrix is obtained from the phylogenetic tree of the studied species. Using this approach, a variety of distributions (discrete or continuous) can be analysed using a generalized linear modelling framework, phylogenies with multichotomies can be analysed, and there is no need to estimate ancestral character states. A simulation study showed that the proposed approach has good statistical properties, with a type-I error rate close to the nominal 5% and statistical power to detect correlated evolution between two characters that increases with the strength of the correlation. The proposed approach performs well for the analysis of discrete characters. We illustrate our approach with some data on macro-ecological correlates in birds. Some extensions of the use of GEE are discussed.

13.
Xikang Feng  Lingxi Chen  Yuhao Qing  Ruikang Li  Chaohui Li  Shuai Cheng Li 《BMC genomics》2021,22(5):1-13
Background

All diseases that involve genetic material, including cancer and infection, undergo genetic evolution and give rise to heterogeneity. Although these illnesses are biologically very different, the ability to perform phylogenetic retrodiction from genomic reads is common to them, and thus tree-based principles and assumptions are shared. Just as the different frequencies of tumor genomic variants presuppose the existence of multiple tumor clones and provide a handle to computationally infer them, we postulate that the different variant frequencies in viral reads offer the means to infer multiple co-infecting sublineages.

Results

We present a common methodological framework to infer the phylogenomics from genomic data, be it reads of SARS-CoV-2 from multiple COVID-19 patients or bulk DNAseq of the tumor of a cancer patient. We describe the Concerti computational framework for inferring phylogenies in each of the two scenarios. To demonstrate the accuracy of the method, we reproduce some known results in both scenarios. We also make some additional discoveries.

Conclusions

Concerti successfully extracts and integrates information from multi-point samples, enabling the discovery of clinically plausible phylogenetic trees that capture the heterogeneity known to exist both spatially and temporally. These models can have direct therapeutic implications by highlighting the “birth” of clones that may harbor resistance mechanisms to treatment, the “death” of subclones with drug targets, and the acquisition of functionally pertinent mutations in clones that may have seemed clinically irrelevant. Specifically, in this paper we uncover new potential parallel mutations in the evolution of the SARS-CoV-2 virus. In the context of cancer, we identify new clones harboring therapy-resistant mutations.


14.
Data analysis, not data production, is becoming the bottleneck in gene expression research. Data integration is necessary to cope with an ever-increasing amount of data, to cross-validate noisy data sets, and to gain broad interdisciplinary views of large biological data sets. New Internet resources may help researchers to combine data sets across different gene expression platforms. However, noise and disparities in experimental protocols strongly limit data integration. A detailed review of four selected studies reveals how some of these limitations may be circumvented and illustrates what can be achieved through data integration.

15.
16.
Many of the statistical techniques commonly used in ecology assume independence among responses. However, there are many marine mammal survey techniques, such as those involving time series or subgroups, which result in correlations within the data. Generalized estimating equations (GEEs) take such correlations into account and are an extension of generalized linear models. This study demonstrates the application of GEEs by modeling temporal variation in bottlenose dolphin presence from sightings data. Since dolphins could remain in the study area for several hours, resulting in temporal autocorrelation, an autoregressive correlation structure was used within the GEE, with each cluster representing hours within a day of survey effort. The results of the GEE model showed that there was significant diel, tidal, and interannual variation in the presence of dolphins. Dolphins were most likely to be seen in the early morning and during the summer months. Dolphin presence generally peaked during low tide, but this varied among years. There was a significantly lower probability of dolphins being present in 2003 than in 2004, but not between 2004 and the other years (1991, 1992, and 2002). GEE‐model fitting packages are now readily available, making this a valuable, versatile tool for marine mammal biologists.
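The autoregressive correlation structure mentioned in this abstract is commonly taken to be first-order (AR(1)), in which the correlation between two observations in a cluster decays with their separation. The sketch below builds such a working correlation matrix; the function name, the assumption of first-order structure, and the example value of `alpha` are illustrative and are not taken from the study.

```python
def ar1_working_correlation(n_obs, alpha):
    """Return the n_obs x n_obs AR(1) working correlation matrix,
    where entry (j, k) equals alpha ** |j - k|."""
    return [[alpha ** abs(j - k) for k in range(n_obs)] for j in range(n_obs)]


# Hypothetical cluster: four hourly observations within one survey day.
for row in ar1_working_correlation(4, 0.5):
    print(row)
```

Observations one hour apart share correlation `alpha`, those two hours apart `alpha ** 2`, and so on, so a single parameter describes the within-day dependence.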

17.
Comparative genome hybridization (CGH) to DNA microarrays (array CGH) is a technique capable of detecting deletions and duplications in genomes at high resolution. However, array CGH studies of the human genome noting false negative and false positive results using large insert clones as probes have raised important concerns regarding the suitability of this approach for clinical diagnostic applications. Here, we adapt the Smith–Waterman dynamic-programming algorithm to provide a sensitive and robust analytic approach (SW-ARRAY) for detecting copy-number changes in array CGH data. In a blind series of hybridizations to arrays consisting of the entire tiling path for the terminal 2 Mb of human chromosome 16p, the method identified all monosomies between 267 and 1567 kb with a high degree of statistical significance and accurately located the boundaries of deletions in the range 267–1052 kb. The approach is unique in offering both a nonparametric segmentation procedure and a nonparametric test of significance. It is scalable and well-suited to high resolution whole genome array CGH studies that use array probes derived from large insert clones as well as PCR products and oligonucleotides.
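One way to picture a Smith–Waterman-style scan over array CGH data is as a search for the highest-scoring contiguous run of probes along a chromosome. The sketch below is a simplified illustration under that assumption, not the published SW-ARRAY implementation: the function name, the `threshold` offset, and the example log ratios are hypothetical, and the significance test described in the abstract is omitted.

```python
def best_segment(log_ratios, threshold=0.2, sign=-1):
    """Find the highest-scoring contiguous segment of probes.

    Each probe scores sign * log_ratio - threshold, so with sign=-1 a
    strongly negative log ratio (a candidate deletion) scores positively.
    The running score is clipped at zero, as in local alignment.
    Returns (score, (start, end)) with inclusive indices, or (0.0, None).
    """
    best_score, best_span = 0.0, None
    h, start = 0.0, 0
    for i, r in enumerate(log_ratios):
        if h == 0.0:
            start = i  # a new candidate segment begins here
        h = max(0.0, h + (sign * r - threshold))
        if h > best_score:
            best_score, best_span = h, (start, i)
    return best_score, best_span


# Hypothetical log2 ratios along a tiling path; probes 2 to 4 look deleted.
ratios = [0.0, 0.1, -0.9, -1.1, -1.0, 0.05, 0.0]
score, span = best_segment(ratios)
print(span, round(score, 2))
```

Setting `sign=1` would score gains instead of losses; the threshold controls how far a probe must deviate from zero before it contributes to a segment.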

18.
19.
20.