Similar articles
 20 similar articles found
1.
It has been hypothesized that a large fraction of the 24% noncoding DNA in R. prowazekii consists of degraded genes. This hypothesis has been based on the relatively high G+C content of noncoding DNA. However, a comparison with other genomes also having a low overall G+C content shows that this argument would also apply to other bacteria. To test this hypothesis, we study the coding potential in sets of genes, pseudogenes, and intergenic regions. We find that the correlation function and the χ²-measure are clearly indicative of the coding function of genes and pseudogenes. However, both coding potentials give almost no indication of a preexisting reading frame in the remaining 23% of noncoding DNA. We simulate the degradation of genes due to single-nucleotide substitutions and insertions/deletions and quantify the number of mutations required to remove indications of the reading frame. We discuss a reduced selection pressure as another possible origin of this comparatively large fraction of noncoding sequences. Received: 27 December 1999 / Accepted: 5 July 2000

2.
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%. Proteins 2013. © 2012 Wiley Periodicals, Inc.

3.
Currently used techniques for the analysis of single-molecule trajectories exploit only a small part of the information available in the data. Here, we apply a Bayesian inference scheme to trajectories of confined receptors that are targeted by pore-forming toxins to extract the two-dimensional confining potential that restricts the motion of the receptor. The receptor motion is modeled by the overdamped Langevin equation of motion. The method uses most of the information stored in the trajectory and converges quickly onto inferred values, while providing the uncertainty on the determined values. The inference is performed on the polynomial expansion of the potential and on the diffusivities that have been discretized on a mesh. Numerical simulations are used to test the scheme and quantify the convergence toward the input values for forces, potential, and diffusivity. Furthermore, we show that the technique outperforms the classical mean-square-displacement technique when forces act on confined molecules because the typical mean-square-displacement analysis does not account for them. We also show that the inferred potential represents the input potentials better than the potential extracted from the position distribution based on Boltzmann statistics, which assumes statistical equilibrium.
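To make the comparison in the abstract above concrete, here is a minimal sketch of an overdamped Langevin trajectory in a harmonic well and of the classical mean-square displacement (MSD). It is not the authors' inference scheme: the harmonic potential, the Euler-Maruyama discretization, and the names `simulate_confined`, `mean_square_displacement`, and `k_over_gamma` are all assumptions chosen for illustration.

```python
import math
import random


def simulate_confined(n_steps, dt, D, k_over_gamma, seed=0):
    """Euler-Maruyama steps of dx = -(k/gamma) * x * dt + sqrt(2 * D * dt) * N(0, 1).

    Assumes an isotropic harmonic confining potential (illustrative choice);
    the same update is applied independently to x and y.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    traj = [(x, y)]
    sigma = math.sqrt(2.0 * D * dt)  # standard deviation of the diffusive step
    for _ in range(n_steps):
        x += -k_over_gamma * x * dt + sigma * rng.gauss(0.0, 1.0)
        y += -k_over_gamma * y * dt + sigma * rng.gauss(0.0, 1.0)
        traj.append((x, y))
    return traj


def mean_square_displacement(traj, lag):
    """Time-averaged squared displacement of a 2D trajectory at one lag."""
    if not 1 <= lag < len(traj):
        raise ValueError("lag must be between 1 and len(traj) - 1")
    n = len(traj) - lag
    total = 0.0
    for i in range(n):
        dx = traj[i + lag][0] - traj[i][0]
        dy = traj[i + lag][1] - traj[i][1]
        total += dx * dx + dy * dy
    return total / n
```

For a confined trajectory the MSD levels off at long lags, but the curve does not expose the restoring force itself, which is the limitation the abstract points to.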

4.
5.
We assess the similarity of base substitution processes, described by empirically derived 4 × 4 matrices, using chi-square homogeneity tests. Such significance analyses allow us to assess variation in sequence evolution across sites and we apply them to matrices derived from noncoding sites in different contexts in grass chloroplast DNA. We show that there is statistically significant variation in rates and patterns of mutation among noncoding sites in different contexts and then demonstrate a similar and significant influence of context on substitutions at fourfold degenerate sites of coding regions from grass chloroplast DNA. These results show that context has the same general effect on substitution bias in coding and noncoding DNA: the A+T content of flanking bases is correlated with rate of substitution, transition bias, and GC → AT pressure, while the number of flanking pyrimidines on a single strand is correlated with a mutational bias, or skew, toward pyrimidines. Despite the similarity in general trends, however, when we compare coding and noncoding matrices we find that there is a statistically significant difference between them even when we control for context. Most noticeably, fourfold degenerate sites in coding sequences are undergoing substitution at a higher rate and there are also significant differences in the relationship between pyrimidine skew and the number of flanking pyrimidines. Possible reasons for the differences between coding and noncoding sites are discussed. Furthermore, our analysis illustrates a simple statistical way of comparing substitution processes across sites, allowing us to better study variation in evolutionary processes across a genome. [Reviewing Editor: Dr. Martin Kreitman]
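As an illustration of the kind of test named in the abstract above, the following sketch computes a chi-square homogeneity statistic for two tables of substitution counts. It assumes each cell of the table is treated as one category and that the two tables are independent samples; the abstract does not say how the authors arranged their matrices, so the function name and this layout are hypothetical.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Chi-square statistic and degrees of freedom for two count tables.

    Each table is a list of rows of non-negative counts with the same shape.
    Cells that are empty in both tables are ignored (assumption).
    """
    flat_a = [x for row in counts_a for x in row]
    flat_b = [x for row in counts_b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("tables must have the same number of cells")
    n_a, n_b = sum(flat_a), sum(flat_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("each table needs at least one observation")
    n = n_a + n_b
    stat = 0.0
    used = 0
    for a, b in zip(flat_a, flat_b):
        pooled = a + b
        if pooled == 0:
            continue  # category unobserved in both samples
        used += 1
        exp_a = n_a * pooled / n  # expected count under homogeneity
        exp_b = n_b * pooled / n
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, used - 1  # df = (2 samples - 1) * (categories - 1)
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom to obtain a p-value.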

6.
T.L. Sitnikova, A.A. Zharkikh. Bio Systems 1993, 30(1-3):113-135
This work is an attempt to study the structural features and evolutionary patterns of nucleotide sequences by analyzing their 1- through 4-plet frequencies and the statistical relations between them. We present the mathematical apparatus for this analysis. In particular, we introduce criteria to estimate the degree of homogeneity of L-plet composition in a given set of sequences and the dependence of the L-plet frequencies on the composition of lower orders. We apply these criteria to the study of eubacteria, mitochondria and chloroplasts. We demonstrate that L-plet frequencies are quite useful for revealing evolutionary relationships between DNA sequences and that a non-random distribution is more typical of doublets than of triplets. Non-randomness of triplet composition is more characteristic of coding than of non-coding regions, while no significant differences in dinucleotide composition can be observed. The results can be used to reveal possible mechanisms of the codon usage phenomenon.
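The L-plet frequencies discussed above can be counted as in the sketch below. The ratio of an observed doublet frequency to the product of its single-base frequencies is one common way to relate an L-plet to lower-order composition; it is an assumption here, not necessarily the criterion the authors introduce, and both function names are hypothetical.

```python
from collections import Counter


def lplet_frequencies(seq, L):
    """Relative frequencies of overlapping L-plets, skipping ambiguous bases."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + L]
        for i in range(len(seq) - L + 1)
        if set(seq[i:i + L]) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {plet: c / total for plet, c in counts.items()}


def doublet_odds_ratios(seq):
    """Observed doublet frequency over the product of single-base frequencies."""
    f1 = lplet_frequencies(seq, 1)
    f2 = lplet_frequencies(seq, 2)
    return {d: f / (f1[d[0]] * f1[d[1]]) for d, f in f2.items()}
```

A ratio near 1 means the doublet occurs about as often as its base composition alone would predict; values far from 1 indicate non-random doublet usage.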

7.
Many clonal organisms experience occasional events of sexual recombination, with profound consequences for their population dynamics and evolutionary trajectories. With the recent development of polymorphic genetic markers and new statistical methods, we now have an unprecedented ability to detect recombination in organisms that are thought to reproduce strictly, or essentially, asexually. However, it is not always obvious which methodology to apply. Consequently, biologists might decide how to analyse their data without clear guidelines. Here, we discuss the available methods, focusing on those best suited when working with limited genetic information, such as a few genetic markers or DNA sequences. We conclude by commenting on the prospects offered by some recent conceptual advances and the access to high throughput technologies in an increasing number of model organisms.

8.
9.
10.
11.
MOTIVATION: Biologists usually work with textual DNA sequences (successions of A, C, G and T). This representation allows biologists to study the syntax and other linguistic properties of DNA sequences. Nevertheless, such a linear coding offers only a local, one-dimensional view of the molecule. The 3D structure of DNA is known to be very important in many essential biological mechanisms. By using 3D conformation models, one is able to construct a 3D trajectory of a naked DNA molecule. The various studies that we performed showed that two very different textual DNA sequences can have similar 3D structures. RESULTS: In this article, we present new research on 3D pattern matching for DNA sequences. The aim of this work is to enhance conventional pattern matching analyses with 3D-augmented criteria. We have developed an algorithm, based on 3D trajectories, which compares angles formed by these trajectories and thus quantifies the difference between two 3D DNA sequences. The analysis proceeds from a global scale to a local one. AVAILABILITY: Available on request from the authors.

12.
We study the coding potential of human DNA sequences, using the positional asymmetry function (D(p)) and the positional information function (I(q)). Both D(p) and I(q) are based on the positional dependence of single nucleotide frequencies. We investigate the accuracy of D(p) and I(q) in distinguishing coding and non-coding DNA as a function of the parameters p and q, respectively, and explore at which parameters p(opt) and q(opt) both D(p) and I(q) distinguish coding and non-coding DNA most accurately. We compare our findings with classically used parameter values and find that optimized coding potentials yield accuracies comparable to those of classical frame-independent coding potentials trained on prior data. We find that p(opt) and q(opt) vary only slightly with the sequence length.

13.
It has recently been proposed that variation in DNA methylation at specific genomic locations may play an important role in the development of complex diseases such as cancer. Here, we develop 1- and 2-group multiple testing procedures for identifying and quantifying regions of DNA methylation variability. Our method is the first genome-wide statistical significance calculation for increased or differential variability, as opposed to the traditional approach of testing for mean changes. We apply these procedures to genome-wide methylation data obtained from biological and technical replicates and provide the first statistical proof that variably methylated regions exist and are due to interindividual variation. We also show that differentially variable regions in colon tumor and normal tissue show enrichment of genes regulating gene expression, cell morphogenesis, and development, supporting a biological role for DNA methylation variability in cancer.

14.
15.
We present an event tree analysis for studying the dynamics of Hodgkin-Huxley (HH) neuronal networks. Our study relies on a coarse-grained projection to event trees and to the event chains that comprise these trees by using a statistical collection of spatial-temporal sequences of relevant physiological observables (such as sequences of spikes from multiple neurons). This projection can retain information about network dynamics that covers multiple features, swiftly and robustly. We demonstrate that for even small differences in inputs, some dynamical regimes of HH networks contain sufficiently higher order statistics as reflected in event chains within the event tree analysis. Therefore, this analysis is effective in discriminating small differences in inputs. Moreover, we use event trees to analyze the results computed from an efficient library-based numerical method proposed in our previous work, where a pre-computed high resolution data library of typical neuronal trajectories during the interval of an action potential (spike) allows us to avoid resolving the spikes in detail. In this way, we can evolve the HH networks using time steps one order of magnitude larger than the typical time steps used for resolving the trajectories without the library, while achieving comparable statistical accuracy in terms of average firing rate and power spectra of voltage traces. Our numerical simulation results show that the library method is efficient in the sense that the results generated by using this numerical method with much larger time steps contain sufficiently high order statistical structure of firing events that are similar to the ones obtained using a regular HH solver. We use our event tree analysis to demonstrate these statistical similarities.

16.
We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data that the method outperforms Blast searches as a measure of confidence and can help eliminate 80% of all false assignments based on the best Blast hit. However, the most important advance of the method is that it provides statistically meaningful measures of confidence. We apply the method to a re-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are composed of a combination of both Neanderthal and modern human DNA.

17.
In previous notes, we have described both mathematical properties of potential (n-switches) and potential-Hamiltonian (Liénard systems) continuous differential systems, and also biological applications, especially those concerning primitive cyclic RNAs related to the genetic code. In the present note, we give a general definition of a potential automaton, and we show that a discrete Hopfield-like system already introduced by Goles et al. is a good candidate for such a potential automaton: it has a Lyapunov functional that decreases on its trajectories and whose time derivative is just its discrete velocity. Then we apply this new notion of potential automaton to the genetic code. We show in particular that the consideration of only physicochemical properties of amino acids, like their molecular weight, hydrophobicity and ability to create hydrogen bonds, suffices to build a potential decreasing on trajectories corresponding to the synonymy classes of the genetic code. Such an 'a minima' construction reinforces the classical stereochemical hypothesis about the origin of the genetic code and authorizes new views about the optimality of its synonymy classes.

18.
Sorensen D. Genetica 2009, 136(2):319-332
A remarkable research impetus has taken place in statistical genetics since the last World Conference. This has been stimulated by breakthroughs in molecular genetics, automated data-recording devices and computer-intensive statistical methods. The latter were revolutionized by the bootstrap and by Markov chain Monte Carlo (McMC). In this overview a number of specific areas are chosen to illustrate the enormous flexibility that McMC has provided for fitting models and exploring features of data that were previously inaccessible. The selected areas are inferences of the trajectories over time of genetic means and variances, models for the analysis of categorical and count data, the statistical genetics of a model postulating that environmental variance is partly under genetic control, and a short discussion of models that incorporate massive genetic marker information. We provide an overview of the application of McMC to study model fit, and finally, a discussion is presented on the development of efficient McMC updating schemes for non-standard models.

19.
We report here the release of a web-based tool (MDDNA) to study and model the fine structural details of DNA on the basis of data extracted from a set of molecular dynamics (MD) trajectories of DNA sequences involving all the unique tetranucleotides. The dynamic web interface can be employed to analyze the first neighbor sequence context effects on the 10 unique dinucleotide steps of DNA. Functionality is included to build all atom models of any user-defined sequence based on the MD results. The backend of this interface is a relational database storing the conformational details of DNA obtained in 39 different MD simulation trajectories comprising all the 136 unique tetranucleotide steps. Examples of the use of this data to predict DNA structures are included. Availability: http://humphry.chem.wesleyan.edu:8080/MDDNA. Supplementary information: Supplementary data including color figures are available at Bioinformatics online.

20.
Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posterior probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested the limits of the approach through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium perfringens ε-toxin (CPεT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPεT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data. The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to classify and compare different SMT experiments.
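The information criteria named in the abstract above have standard textbook definitions, sketched below for a model with `k` free parameters fitted to `n` data points with maximized log-likelihood `log_likelihood`. The helper names, the model labels, and the parameter counts in the usage note are hypothetical; the abstract does not give the authors' likelihoods or their exact decision thresholds.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return AIC, BIC and small-sample corrected AIC (AICc); lower is better."""
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    if n - k - 1 > 0:
        aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    else:
        aicc = math.inf  # correction undefined when n <= k + 1
    return {"AIC": aic, "BIC": bic, "AICc": aicc}


def preferred_model(fits, n, criterion="BIC"):
    """Pick the model with the lowest criterion value.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    """
    return min(fits, key=lambda m: information_criteria(*fits[m], n)[criterion])
```

For example, `preferred_model({"free": (-120.0, 1), "harmonic": (-100.0, 3)}, n=200)` selects `"harmonic"`, because the gain in likelihood outweighs the BIC penalty for two extra parameters. A two-stage scheme like the one described would call this once with BIC on free versus confined models and again with AIC on the confined candidates.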
