Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Choong MK, Yan H. Bioinformation 2008, 2(7): 273-278
This paper presents a new method for exon detection in DNA sequences based on multi-scale parametric spectral analysis. Forward-backward linear prediction (FBLP) with the singular value decomposition (SVD) algorithm, FBLP-SVD, is applied to the double-base curves (DB-curves) of a DNA sequence using variable moving-window sizes to estimate the signal spectrum at multiple scales. Simulations on short human genes ranging from 11 bp to 2032 bp show that the proposed method outperforms the classical Fourier transform method. The multi-scale approach is shown to be more effective than a single scale with a fixed window size. In addition, the method is flexible because it requires no training data.

2.
In this article, we introduce drifting Markov models (DMMs), which are inhomogeneous Markov models designed for modeling the heterogeneities of sequences (in our case DNA or protein sequences) in a more flexible way than homogeneous Markov chains or even hidden Markov models (HMMs). We focus here on the polynomial drift: the transition matrix varies in a polynomial way. To show the reliability of our models on DNA, we exhibit high similarities between the probability distributions of nucleotides obtained by our models and the frequencies of these nucleotides computed using a sliding window. In a further step, these DMMs can be used as the states of an HMM: on each of its segments, the observed process can be modeled by a drifting Markov model. Searching for rare words in DNA sequences remains possible with DMMs and, according to the fits provided, DMMs turn out to be a powerful tool for this purpose. The software is available on request from the author. It will soon be integrated into the seq++ library (http://stat.genopole.cnrs.fr/seqpp/).
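
To make the idea of a drifting transition matrix concrete, here is a minimal sketch of the simplest case, a degree-1 (linear) drift, under the assumption that the matrix at position t of a sequence of length n is a convex combination of a start matrix and an end matrix. The function name `linear_drift_matrix` and the example matrices are hypothetical and are not taken from the article or from the seq++ library.

```python
def linear_drift_matrix(pi0, pi1, t, n):
    """Transition matrix at position t (0 <= t <= n) under an assumed linear drift.

    pi0 and pi1 are row-stochastic matrices (lists of rows) for the start and
    the end of the sequence. The result interpolates between them entry by entry.
    """
    if n <= 0 or not 0 <= t <= n:
        raise ValueError("need n > 0 and 0 <= t <= n")
    w = t / n  # relative position along the sequence, between 0 and 1
    return [
        [(1 - w) * a + w * b for a, b in zip(row0, row1)]
        for row0, row1 in zip(pi0, pi1)
    ]
```

Because each row of the result is a convex combination of two probability vectors, it is itself a probability vector, so the interpolated matrix stays a valid transition matrix at every position.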

3.
Stochastic models for heterogeneous DNA sequences   (total citations: 10; self-citations: 0; citations by others: 10)
The composition of naturally occurring DNA sequences is often strikingly heterogeneous. In this paper, the DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain. The model used is a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987). A smoothing algorithm is described which can be used to reconstruct the hidden process and produce graphic displays of the compositional structure of a sequence. The problem of parameter estimation is approached using likelihood methods, and an EM algorithm for approximating the maximum likelihood estimate is derived. The methods are applied to sequences from yeast mitochondrial DNA, human and mouse mitochondrial DNAs, a human X chromosomal fragment and the complete genome of bacteriophage lambda.
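
The smoothing step can be illustrated with the standard forward-backward recursion for a discrete hidden Markov model. This is a generic textbook sketch, not the specific algorithm derived in the paper. The function name `smooth`, the integer encoding of nucleotides, and all parameter values in the usage note are assumptions made for illustration.

```python
def smooth(obs, init, trans, emit):
    """Posterior probabilities P(state at t | whole sequence) for a discrete HMM.

    obs   : list of symbol indices (for DNA, e.g. A=0, C=1, G=2, T=3)
    init  : init[s] is the probability of starting in state s
    trans : trans[r][s] is the probability of moving from state r to state s
    emit  : emit[s][x] is the probability that state s emits symbol x
    """
    n, k = len(obs), len(init)

    # Forward pass, normalised at each step to avoid numerical underflow.
    alpha = []
    for t in range(n):
        if t == 0:
            raw = [init[s] * emit[s][obs[0]] for s in range(k)]
        else:
            raw = [
                sum(alpha[-1][r] * trans[r][s] for r in range(k)) * emit[s][obs[t]]
                for s in range(k)
            ]
        total = sum(raw)
        alpha.append([v / total for v in raw])

    # Backward pass, also normalised at each step.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        raw = [
            sum(trans[s][r] * emit[r][obs[t + 1]] * beta[t + 1][r] for r in range(k))
            for s in range(k)
        ]
        total = sum(raw)
        beta[t] = [v / total for v in raw]

    # Combine both passes and renormalise to obtain the smoothed posteriors.
    posterior = []
    for a, b in zip(alpha, beta):
        raw = [x * y for x, y in zip(a, b)]
        total = sum(raw)
        posterior.append([v / total for v in raw])
    return posterior
```

With two hypothetical hidden states, one AT-rich and one GC-rich, plotting the first column of the returned posteriors along the sequence gives the kind of compositional display the abstract mentions.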

4.
There is a growing and significant demand for reliable, simple and sensitive methods for repeated scanning of a given gene or gene fragment for detection and characterization of mutations. Solid-phase sequencing by single-base primer extension of nested GBA™ primers on miniaturized DNA arrays can be used to effectively scan targeted sequences for missense, insertion and deletion mutations. This paper describes the use of N-GBA arrays designed to scan the sequence of a 33-base region of exon 8 of the p53 gene (codons 272-282) encompassing a hot spot for mutations associated with the development of cancer. Synthetic DNA templates containing various missense, insertion and deletion mutations, as well as DNA prepared from pancreatic and biliary tumor cells, were genotyped using the exon 8 arrays.

5.
Directed evolution experiments rely on the cyclical application of mutagenesis, screening and amplification in a test tube. They have led to the creation of novel proteins for a wide range of applications. However, directed evolution currently requires an uncertain, typically large, number of labor-intensive and expensive experimental cycles before proteins with improved function are identified. This paper introduces predictive models for quantifying the outcome of the experiments, aiding in the setup of directed evolution to maximize the chances of obtaining DNA sequences encoding enzymes with improved activities. Two methods of DNA manipulation are analysed: error-prone PCR and DNA recombination. Error-prone PCR is a DNA replication process that intentionally introduces copying errors by imposing mutagenic reaction conditions. The proposed model calculates the probability of producing a specific nucleotide sequence after a number of PCR cycles. DNA recombination methods rely on the mixing and concatenation of genetic material from a number of parent sequences. This paper focuses on modeling a specific DNA recombination protocol, DNA shuffling. Three aspects of the DNA shuffling procedure are modeled: the fragment size distribution after random fragmentation by DNase I, the assembly of DNA fragments, and the probability of assembling specific sequences or combinations of mutations. Results obtained with the proposed models compare favorably with experimental data.

6.
7.
We propose a new method for tumor classification from gene expression data, which consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, a support vector machine is used to classify the modeling data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.

8.
The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences.
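
The classical lumpability condition mentioned above can be checked directly on a transition matrix: a first-order chain is (strongly) lumpable with respect to a partition if, for every block, all states in that block have the same total probability of jumping into each block. The sketch below covers only this classical criterion, not the two-level lumped processes proposed in the paper. The function name `lumped_matrix` and the tolerance are hypothetical choices.

```python
def lumped_matrix(P, partition, tol=1e-9):
    """Return the lumped transition matrix, or None if P is not lumpable.

    P         : row-stochastic matrix as a list of rows
    partition : list of blocks, each block a list of state indices
    tol       : numerical tolerance when comparing block-to-block probabilities
    """
    lumped = []
    for block in partition:
        # For each state in the block, total probability of entering every block.
        rows = [
            [sum(P[state][target] for target in other) for other in partition]
            for state in block
        ]
        reference = rows[0]
        for row in rows[1:]:
            if any(abs(a - b) > tol for a, b in zip(reference, row)):
                return None  # two states of one block disagree: not lumpable
        lumped.append(reference)
    return lumped
```

For nucleotides indexed A=0, C=1, G=2, T=3, calling `lumped_matrix(P, [[0, 2], [1, 3]])` tests whether the chain can be reduced to a purine/pyrimidine chain without losing the Markov property.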

9.
Law NF, Cheng KO, Siu WC. Bioinformation 2006, 1(7): 242-246
Z-curve features are among the most popular features used in exon/intron classification. We showed that although both the Z-curve and Fourier approaches are based on detecting 3-periodicity in coding regions, there are significant differences in their spectral formulation. From the spectral formulation of the Z-curve, we obtained three modified sequences that characterize different biological properties. Spectral analysis of the modified sequences showed a much more prominent 3-periodicity peak in coding regions than the Fourier approach. For long sequences, prominent peaks at 2π/3 are observed in coding regions, whereas for short sequences, clearly discernible peaks are still visible. Better classification can be obtained using spectral features derived from the modified sequences.
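
For readers unfamiliar with the Z-curve, the sketch below computes its three coordinates from cumulative base counts, using the standard definition: x contrasts purines with pyrimidines, y contrasts amino with keto bases, and z contrasts weak with strong hydrogen bonding. It does not reproduce the modified sequences derived in the paper. The function name `z_curve` is a hypothetical choice.

```python
def z_curve(seq):
    """Return the Z-curve as a list of (x, y, z) points, one per position."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            raise ValueError(f"unexpected symbol: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((
            (a + g) - (c + t),  # x: purines (A, G) minus pyrimidines (C, T)
            (a + c) - (g + t),  # y: amino (A, C) minus keto (G, T)
            (a + t) - (c + g),  # z: weak (A, T) minus strong (C, G) pairing
        ))
    return points
```

Each coordinate is a cumulative sequence, so spectral features are usually computed on it, or on its increments, within a sliding window.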

10.
The comparative and evolutionary analysis of molecular data has allowed researchers to tackle biological questions that have long remained unresolved. The evolution of DNA and amino acid sequences can now be modeled accurately enough that the information conveyed can be used to reconstruct the past. The methods to infer phylogeny (the pattern of historical relationships among lineages of organisms and/or sequences) range from the simplest, based on parsimony, to more sophisticated and highly parametric ones based on likelihood and Bayesian approaches. In general, molecular systematics provides a powerful statistical framework for hypothesis testing and the estimation of evolutionary processes, including the estimation of divergence times among taxa. The field of molecular systematics has experienced a revolution in recent years, and, although there are still methodological problems and pitfalls, it has become an essential tool for the study of evolutionary patterns and processes at different levels of biological organization. This review aims to present a brief synthesis of the approaches and methodologies that are most widely used in the field of molecular systematics today, as well as indications of future trends and state-of-the-art approaches.

11.
To maximize the information commonly collected from otoliths, the effect of DNA extraction on the estimation of age with otoliths was evaluated by comparing sagittal otolith samples from common coral trout (Plectropomus leopardus) for clarity and ageing discrepancies in DNA-extracted and untreated control otoliths. The DNA extraction process had no significant effect, indicating that archived otoliths can be used as a source of DNA while retaining their utility for age estimation.

12.
A new method is proposed for estimation of polymerase activities using fluorescence detection during isothermal reaction. The method allows simultaneous determination of DNA-dependent DNA polymerase and 5'-3'-exonuclease activities using amplifiers equipped with an optical module for fluorescence detection under real-time conditions. Different primer-template combinations used as polymerase substrates were compared. Primer elongation (polymerase reaction) is detected by changes in SYBR Green I fluorescence upon binding to dsDNA during the reaction; nuclease activities are detected by changes in fluorescence due to cleavage of the probe, which contains the reporter fluorophore and fluorescence quencher and is hybridized in advance to the single-stranded region of the template. It was also shown that the method can be used for determination of relative activities of DNA polymerase preparations, estimation of temperature-time dissociation parameters of polymerase complexes with specific antibodies to its active center, and analysis of the effects of inhibitors and activators of different nature on the reaction rates of dsDNA polymerization and 5'-3'-exonuclease cleavage by polymerase. The method can also be used for estimation of endonuclease activities of DNA polymerases.

13.
14.
High-efficiency thermal asymmetric interlaced (HE-TAIL) PCR is a modified thermal asymmetric interlaced (TAIL) method for finding unknown genomic DNA sequences adjacent to known sequences in GC-rich plant DNA. Necessary modifications to obtain high-efficiency amplification of flanking sequences are the inclusion of 2 control reactions during tertiary cycling and the design of long gene-specific primers, which can be used during single-step annealing-extension PCR. The modified protocol is suitable to walk from short known sequences, such as sequence-tagged sites (STS), expressed sequence tags (EST), or short exon sequences, and enables researchers to clone full-length open reading frames (ORFs) without library screening. Moreover, the HE-TAIL method can be used to identify DNA sequences flanking T-DNA insertions or to isolate promoter regions. Although individual steps are limited to about 4 kb, multiple steps can be done to walk upstream or downstream of known regions.

15.
Many signal-processing-based methods for finding hidden periodicities in DNA sequences have primarily focused on assigning numerical values to the symbolic DNA sequence and then applying spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate these repeats. The key results pertaining to this approach are, however, obtained using a very specific symbolic-to-numerical map, namely the so-called Voss representation. An important research problem is therefore to quantify the sensitivity of these results to the choice of the symbolic-to-numerical map. In this article, a novel algebraic approach to the periodicity detection problem is presented that provides a natural framework for studying the role of the symbolic-to-numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that comprises most of the widely used mappings in the literature as special cases, shows that the DNA spectrum is in fact invariant under all these mappings, and generates a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic-to-numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are totally independent of each other. Sophisticated digital filters and/or alternate fast data transforms such as the discrete cosine and sine transforms can therefore always be incorporated in the periodicity detection scheme regardless of the choice of the symbolic-to-numerical map. Although the newly proposed framework is matrix based, identification of these periodicities can be achieved at a low computational cost.
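
As a point of reference for the Voss representation, the sketch below maps a DNA string to four binary indicator sequences, sums their DFT power at a given frequency bin, and compares the period-3 bin with the average over all non-zero bins. It is a plain, quadratic-time illustration of the conventional approach, not the matrix-based framework of the article. The function names and the peak-to-mean score are assumptions made for this example.

```python
import cmath


def voss_indicators(seq):
    """Map a DNA string to four 0/1 indicator sequences, one per nucleotide."""
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient (direct summation)."""
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(signal)
    )
    return abs(coeff) ** 2


def dna_spectrum(seq, k):
    """Total power at bin k, summed over the four indicator sequences."""
    return sum(dft_power(u, k) for u in voss_indicators(seq).values())


def period3_score(seq):
    """Ratio of the power at bin N/3 to the mean power over bins 1..N-1."""
    n = len(seq) - len(seq) % 3  # truncate so that N/3 is an integer bin
    if n < 3:
        raise ValueError("sequence too short")
    seq = seq[:n]
    peak = dna_spectrum(seq, n // 3)
    mean = sum(dna_spectrum(seq, k) for k in range(1, n)) / (n - 1)
    if mean < 1e-12:  # flat spectrum, e.g. a homopolymer
        return 0.0
    return peak / mean
```

A score well above 1 suggests a period-3 component in the window. In practice the score would be computed over a sliding window, which is the short-time DFT setting the abstract refers to.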

16.
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%. Proteins 2013. © 2012 Wiley Periodicals, Inc.

17.
We have compared two statistical methods of estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences, which have been proposed by Templeton (1993) and Bandelt et al. (1995). Monte Carlo simulations were used for generating DNA sequence data. Different evolutionary scenarios were simulated and the estimation procedures were evaluated. We have found that for both methods (i) the estimates are insensitive to demographic parameters and (ii) the standard deviations of the estimates are too high for these methods to be reliably used in practice.

18.
Although gene amplification, a process that is markedly enhanced in tumor cells, has been studied in many different cell systems, there is still controversy about the mechanism(s) involved in this process. It is still unclear what happens to the DNA sequences that become amplified, whether they remain present at their original location (conservative gene amplification) or whether gene amplification necessarily results in a deletion at the original location (non-conservative gene amplification). We have studied gene amplification in a human osteosarcoma cell line, starting from a cell clone which contains only one copy of a plasmid integrate. Independent amplificants, originating from this clone and containing elevated plasmid copy numbers, were isolated and analyzed. Based on previous observations, encompassing the persistence of single-copy DNA sequences besides amplified DNA sequences clustered at a different location in the independent amplificants, we proposed an amplification pathway including a local duplication step and transposition of the duplicated DNA to other chromosomal positions. Now we have extended our study to more independent amplificants. We prove that the single-copy plasmid-containing chromosomes in the different amplificants and the single-copy plasmid-containing chromosome in the original parental cell clone are indeed identical, namely a translocation chromosome composed of at least three parts of which two originate from chromosomes 14 and 17. We show that the unit of amplification and the unit of the proposed transposition event are at least 1.5 Mb. We also demonstrate that the amplified DNA sequences, present at genomic locations other than the original single-copy DNA sequences, are preferentially associated with chromosome 16. We find that the amplified DNA sequences are often located at or near a site of chromosome translocation involving chromosome 16. In one cell clone we detect the amplified DNA sequences in most of the cells to be located within a complete chromosome 16 while in a minority of cells the amplified sequences are located at or near a breakpoint on a translocation chromosome 16. This indicates that this amplification region is highly unstable and frequently gives rise to translocation events.

19.
20.
We have developed a novel technique for specific amplification of rare methylated DNA fragments in a high background of unmethylated sequences that avoids the need for bisulphite conversion. The methylation-dependent restriction enzyme GlaI is used to selectively cut methylated DNA. Targeted fragments are then tagged using specially designed ‘helper’ oligonucleotides that are also used to maintain selection in subsequent amplification cycles in a process called ‘helper-dependent chain reaction’. The process uses disabled primers called ‘drivers’ that can only prime on each cycle if the helpers recognize specific sequences within the target amplicon. In this way, selection for the sequence of interest is maintained throughout the amplification, preventing amplification of unwanted sequences. Here we show how the method can be applied to methylated Septin 9, a promising biomarker for early diagnosis of colorectal cancer. The GlaI digestion and subsequent amplification can all be done in a single tube. A detection sensitivity of 0.1% methylated DNA in a background of unmethylated DNA was achieved, which was similar to that of the well-established Heavy Methyl method, which requires bisulphite-treated DNA.
