Similar documents
1.
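Three-base periodicity is a principal feature of the coding regions of most genomic sequences. This paper proposes a fast method that computes the Fourier spectrum only at the frequency point 1/3, uses it to analyze the three-base periodicity of DNA sequences, and then applies wavelet-transform filtering at a certain scale to predict the coding regions of DNA sequences. Theoretical analysis and extensive computer experiments confirm the effectiveness of the method, with good prediction results. The method is fast, requires no training set, and does not depend on information from existing databases.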
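The core computation described above, the spectral power at frequency 1/3 only, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes the common Voss binary-indicator encoding (one 0/1 sequence per nucleotide), a sliding window whose length is a multiple of 3 so that 1/3 is an exact DFT bin, and prefix sums so that all windows cost O(N) in total. The wavelet filtering step is not shown, and the function name `period3_power` is hypothetical.

```python
import numpy as np


def period3_power(seq: str, window: int = 351) -> np.ndarray:
    """Power at frequency 1/3 for every window of `window` bases (hypothetical helper).

    Assumes Voss binary indicators; the result is summed over A, C, G and T.
    """
    seq = seq.upper()
    n = len(seq)
    if window % 3 != 0 or window > n:
        raise ValueError("window must be a multiple of 3 and no longer than seq")
    codes = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    # omega**j with omega = exp(-2*pi*i/3): the DFT kernel at frequency 1/3
    phase = np.exp(-2j * np.pi * np.arange(n) / 3)
    power = np.zeros(n - window + 1)
    for base in "ACGT":
        u = (codes == ord(base)).astype(float)  # binary indicator sequence
        prefix = np.concatenate(([0.0], np.cumsum(u * phase)))
        # Each window sum differs from the windowed DFT value only by a
        # unit-modulus phase factor, so the squared magnitudes are identical.
        x = prefix[window:] - prefix[:-window]
        power += np.abs(x) ** 2
    return power
```

For a perfectly periodic toy sequence such as `"ATG" * 200` with `window=300`, every window has power 30000 (100 in-phase matches for each of three bases), whereas `"A" * 600` gives essentially zero because the phases cancel over whole periods.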

2.
Mapping nucleotide sequences onto a "DNA walk" produces a novel representation of DNA that can then be studied quantitatively using techniques derived from fractal landscape analysis. We used this method to analyze 11 complete genomic and cDNA myosin heavy chain (MHC) sequences belonging to 8 different species. Our analysis suggests an increase in fractal complexity for MHC genes with evolution, with vertebrate > invertebrate > yeast. The increase in complexity is measured by the presence of long-range power-law correlations, which are quantified by the scaling exponent alpha. We develop a simple iterative model, based on known properties of polymeric sequences, that generates long-range nucleotide correlations from an initially noncorrelated coding region. This new model, as well as the DNA walk analysis, supports the intron-late theory of gene evolution.
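A DNA walk and its scaling exponent alpha can be sketched as below. This assumes the purine/pyrimidine rule used in early DNA-walk studies (pyrimidine C/T steps +1, purine A/G steps -1) and estimates alpha with first-order detrended fluctuation analysis (DFA), one standard way to quantify long-range power-law correlations; whether it matches the paper's exact procedure is an assumption, and the function names and box sizes are hypothetical. An uncorrelated sequence should give alpha close to 0.5, while alpha > 0.5 indicates long-range correlations.

```python
import numpy as np


def dna_walk(seq: str) -> np.ndarray:
    """Cumulative displacement y(n): pyrimidine (C, T) = +1, purine (A, G) = -1."""
    steps = [1 if c in "CT" else -1 for c in seq.upper() if c in "ACGT"]
    return np.cumsum(steps)


def dfa_alpha(seq: str, box_sizes=(8, 16, 32, 64, 128)) -> float:
    """Estimate alpha from F(n) ~ n**alpha using linear detrending in each box."""
    y = dna_walk(seq).astype(float)
    fluctuations = []
    for n in box_sizes:
        n_boxes = len(y) // n
        if n_boxes < 2:
            raise ValueError(f"sequence too short for box size {n}")
        x = np.arange(n)
        residual_sq = []
        for b in range(n_boxes):
            segment = y[b * n:(b + 1) * n]
            slope, intercept = np.polyfit(x, segment, 1)  # local linear trend
            residual_sq.append(np.mean((segment - (slope * x + intercept)) ** 2))
        fluctuations.append(np.sqrt(np.mean(residual_sq)))
    # alpha is the slope of log F(n) versus log n
    alpha, _ = np.polyfit(np.log(box_sizes), np.log(fluctuations), 1)
    return float(alpha)
```

For example, `dna_walk("CTAG")` returns `[1, 2, 1, 0]`. The paper's iterative model and its evolutionary comparison are not reproduced here.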

3.
High-efficiency thermal asymmetric interlaced (HE-TAIL) PCR is a modified thermal asymmetric interlaced (TAIL) method for finding unknown genomic DNA sequences adjacent to known sequences in GC-rich plant DNA. Necessary modifications to obtain high-efficiency amplification of flanking sequences are the inclusion of 2 control reactions during tertiary cycling and the design of long gene-specific primers, which can be used during single-step annealing-extension PCR. The modified protocol is suitable for walking from short known sequences, such as sequence-tagged sites (STS), expressed sequence tags (EST), or short exon sequences, and enables researchers to clone full-length open reading frames (ORFs) without library screening. Moreover, the HE-TAIL method can be used to identify DNA sequences flanking T-DNA insertions or to isolate promoter regions. Although individual steps are limited to about 4 kb, multiple steps can be performed to walk upstream or downstream of known regions.

4.
The palindrome is one class of symmetrical duplications with reverse-complementary characters, which is widely distributed in many organisms. Graphical representation of DNA sequences provides a simple way of viewing and comparing various genomic structures. Through 3-D DNA walk analysis, the similarities and differences in nucleotide composition, as well as the evolutionary relationship between the human and chimpanzee MAGE/CSAG palindromes, can be clearly revealed. Further wavelet analysis indicated that the duplicated segments have irregular patterns compared with their surrounding sequences. However, sequence similarity analysis suggests that the human and chimpanzee MAGE/CSAG palindromes may share a common ancestor. Based on the specific distribution and orientation of the repeated sequences, a simple possible evolutionary model of the palindromes is suggested, which may help us better understand the evolutionary course of the genes and the symmetrical sequences.

5.
Challenging tasks are encountered in the field of bioinformatics, and choosing a mapping technique for genomic sequences is one of the most fastidious. It has been shown that a judicious choice helps in examining the distribution of periodic patterns that concord with the underlying structure of genomes. Despite that, the search for a coding technique that can highlight all the information contained in DNA has not yet attracted the attention it deserves. In this paper, we propose a new mapping technique based on chaos game theory that we call the frequency chaos game signal (FCGS). The particularity of FCGS coding resides in exploiting the statistical properties of the genomic sequence itself, which may reflect important structural and organizational features of DNA. To demonstrate the usefulness of the FCGS approach in detecting different local periodic patterns, we use wavelet analysis, because it provides access to information that can be obscured by other time-frequency methods such as Fourier analysis. Specifically, we apply the continuous wavelet transform (CWT) with the complex Morlet wavelet as the mother wavelet. Scalograms for the organism Caenorhabditis elegans (C. elegans) exhibit a multitude of periodic organizations of specific DNA sequences.
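To make the idea of a frequency-based chaos game signal concrete, here is a minimal sketch. It assumes, rather than reproduces, the FCGS definition: the order-k frequency chaos game representation is taken to be the relative frequency of each k-mer in the sequence, and the signal replaces each successive k-mer by that frequency. The function name is hypothetical, and the subsequent CWT with a complex Morlet wavelet (available, for example, as `pywt.cwt` with a `cmor` wavelet in PyWavelets) is not shown.

```python
from collections import Counter

import numpy as np


def fcgs(seq: str, k: int = 2) -> np.ndarray:
    """Signal of k-mer relative frequencies along the sequence (assumed FCGS-style mapping)."""
    seq = seq.upper()
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)  # order-k frequency table (FCGR-like)
    total = len(kmers)
    # Replace each successive k-mer by its relative frequency in the whole sequence
    return np.array([counts[m] / total for m in kmers])
```

For instance, `fcgs("ACGACG", k=2)` yields `[0.4, 0.4, 0.2, 0.4, 0.4]`, since AC and CG each occur twice among five dinucleotides and GA once.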

6.
Soybean is believed to be a diploidized tetraploid generated from an allotetraploid ancestor. In this study, we used hypomethylated genomic DNA as a source of probes to investigate the genomic structure and methylation patterns of duplicated sequences. Forty-five genomic clones from Phaseolus vulgaris and 664 genomic clones from Glycine max were used to examine the duplicated regions in the soybean genome. Southern analysis of genomic DNA using probes from both sources revealed that greater than 15% of the hypomethylated genomic regions were only present once in the soybean genome. The remaining ca. 85% of the hypomethylated regions comprise duplicated or middle repetitive DNA sequences. If only the ratio of single to duplicate probe patterns is considered, it appears that 25% of the single-copy sequences have been lost. By using a subset of probes that only detected duplicated sequences, we examined the methylation status of the homeologous genomes with the restriction enzymes MspI and HpaII. We found that in all cases both copies of these regions were hypomethylated, although there were examples of low-level methylation. It appears that duplicate sequences are being eliminated in the diploidization process. Our data reveal no evidence that duplicated sequences are being silenced by inactivation correlated with methylation patterns.

7.
The hybridization of human DNA with three non-cross-hybridizing monomers (68 bp in length) of the heterochromatic Sau3A family of DNA repeats indicates the coexistence, within a Sau3A-positive genomic block, of divergent Sau3A units as well as of unrelated sequences. To gain some insight into the structure of these human heterochromatic DNA regions, three previously cloned Sau3A-positive genomic fragments (with a total length of approximately 1900 base pairs (bp)) were sequenced. The analysis of the sequences showed the presence of clustered Sau3A units with different degrees of divergence and of two DNA regions of approximately 100 bp and 291 bp in length, unrelated to the family of repeats. A consensus sequence derived from the 24 identified Sau3A monomers presents, among highly variable regions, two less variant regions of 8 bp and 10 bp in length, respectively. The Sau3A-unrelated DNA fragment 291 bp in length, used as a probe on genomic DNA digested with a series of restriction enzymes, defines a "new" family of DNA repeats possessing periodicities for HaeIII (HaeIII family). Sau3A and HaeIII repeats display a high degree of linkage in a collection of Sau3A-positive genomic recombinant phages.

8.
Analyses of genomic DNA sequences have shown in previous works that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) result in fact from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called 'wavelet transform microscope' reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (< 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and, to a lesser extent, in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in the DNA genomes of eukaryotic viruses but not of bacterial viruses. There is one exception: the genomes of poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus, do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of any of the examined RNA viruses, with the exception of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods that are well suited to performing a statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].

9.

Background

Array-based comparative genomic hybridization (array CGH) is a highly efficient technique that allows the simultaneous measurement of genomic DNA copy number at hundreds or thousands of loci and the reliable detection of local one-copy-level variations. Characterizing these DNA copy number changes is important both for the basic understanding of cancer and for its diagnosis. To develop effective methods for identifying aberrant regions from array CGH data, much recent research has focused on both smoothing-based and segmentation-based data processing. In this paper, we propose an approach based on the stationary wavelet packet transform to smooth array CGH data. Our purpose is to remove CGH noise across the whole frequency range while preserving the true signal, using a bivariate model.

Results

For both synthetic and real CGH data, the stationary wavelet packet transform (SWPT) is the best-suited wavelet transform for analyzing the CGH signal across the whole frequency range. We also introduce a new bivariate shrinkage model that captures the relationship between noisy CGH coefficients at two scales of the SWPT. Before smoothing, symmetric extension is applied as a preprocessing step to preserve information at the borders.

Conclusion

We have designed SWPT and SWPT-Bi, which use the stationary wavelet packet transform with hard thresholding and with the new bivariate shrinkage estimator, respectively, to smooth array CGH data. We demonstrate the effectiveness of our approach through theoretical and experimental exploration of a set of array CGH data, including both synthetic and real data. The comparison results show that our method outperforms previous approaches.

10.
We present an algorithm to detect protein sub-structural motifs from the primary sequence. The input to the algorithm is a set of aligned protein sequences. It uses wavelet transforms to decompose protein sequences represented numerically by different indices (such as the polarity, accessible surface area, or electron-ion interaction potential of the amino acids). The numerical representation of a protein sequence has significant correlation with its biological activity; thus, common motifs are expected to be observable from the wavelet spectrum. The decomposed signals are then up-sampled, and similarity search techniques are used to identify similar regions across all the proteins at multiple scales. Results indicate that wavelet transform techniques are a promising approach for rapid motif detection.
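The encode, decompose and up-sample steps described above might look like the following sketch. It swaps in the Kyte-Doolittle hydropathy scale as the numerical index (the paper names polarity, accessible surface area and electron-ion interaction potential, whose tables are not reproduced here), uses an averaging Haar approximation, and up-samples by repetition. Measuring agreement across aligned proteins by the per-position spread of their profiles is a hypothetical simplification of the paper's similarity search, and all function names are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy index (swapped in for illustration)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq: str) -> np.ndarray:
    """Map a gap-free protein sequence to a numerical signal."""
    return np.array([KYTE_DOOLITTLE[aa] for aa in seq.upper()], dtype=float)


def haar_profiles(x: np.ndarray, levels: int) -> list:
    """Haar approximations at levels 1..levels, each up-sampled back to len(x)."""
    profiles, a = [], x.astype(float)
    for j in range(1, levels + 1):
        if a.size % 2:
            a = np.append(a, a[-1])  # pad odd lengths by repeating the last value
        a = (a[0::2] + a[1::2]) / 2  # averaging (scaled) Haar approximation
        # each level-j coefficient covers 2**j consecutive residues
        profiles.append(np.repeat(a, 2 ** j)[: x.size])
    return profiles


def cross_protein_spread(aligned: list, level: int) -> np.ndarray:
    """Per-position standard deviation of the level-`level` profiles of
    equal-length aligned sequences; low values flag candidate common motifs."""
    stack = np.array([haar_profiles(encode(s), level)[-1] for s in aligned])
    return stack.std(axis=0)
```

For example, `haar_profiles(np.array([1.0, 3.0, 5.0, 7.0]), 2)` returns `[2, 2, 6, 6]` at level 1 and `[4, 4, 4, 4]` at level 2.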

11.
Identifying protein-coding regions in DNA sequences is an active problem in computational biology. In this study, we present a self-adaptive spectral rotation (SASR) approach that visualizes coding regions in DNA sequences based on the triplet periodicity property, without any preceding training process. It is intended to support rough coding-region prediction when the extra information needed to train other leading methods is unavailable. In this approach, at each position in the DNA sequence, a Fourier spectrum is calculated from the posterior subsequence. From these spectra, a random walk in the complex plane is generated as the SASR's graphical output. Applications of SASR to real DNA data show that patterns in the graphical output reveal the locations of coding regions and the frame shifts between them: arcs indicate coding regions, stable points indicate non-coding regions, and the shapes of corners reveal frame shifts. Tests on a genomic data set from Saccharomyces cerevisiae reveal that the graphical patterns for coding and non-coding regions differ greatly, so that coding regions can be visually distinguished. Meanwhile, a time-cost test shows that SASR can be easily implemented with a computational complexity of O(N).

12.
In the Suppressor of Underreplication (SuUR) mutant strain of Drosophila melanogaster, the heterochromatin of polytene chromosomes is not underreplicated and, as a consequence, a number of beta-heterochromatic regions acquire a banded structure. The chromocenter does not form in these polytene chromosomes, and heterochromatic regions, normally part of the chromocenter, become accessible to cytological analysis. We generated four genomic DNA libraries from specific heterochromatic regions by microdissection of polytene chromosomes. In situ hybridization of individual libraries onto SuUR polytene chromosomes shows that repetitive DNA sequences spread into the neighboring euchromatic regions. This observation allows the localization of eu-heterochromatin transition zones on polytene chromosomes. We find that genomic scaffolds from the eu-heterochromatin transition zones are enriched in repetitive DNA sequences homologous to those flanking the suppressor of forked gene [su(f) repeat]. We isolated and sequenced about 300 clones from the heterochromatic DNA libraries obtained. Most of the clones contain repetitive DNA sequences; however, some of the clones have unique DNA sequences shared with parts of unmapped genomic scaffolds. Hybridization of these clones onto SuUR polytene chromosomes allowed us to assign the cytological localizations of the corresponding genomic scaffolds within heterochromatin. Our results demonstrate that the SuUR mutant makes it possible to map heterochromatic scaffolds on polytene chromosomes.

13.
MOTIVATION: At a recent meeting, the wavelet transform was depicted as a small child kicking back at its father, the Fourier transform. Wavelets are more efficient and faster than Fourier methods in capturing the essence of data. Nowadays there is a growing interest in using wavelets in the analysis of biological sequences and molecular biology-related signals. RESULTS: This review is intended to summarize the potential of state-of-the-art wavelets, and in particular wavelet statistical methodology, in different areas of molecular biology: genome sequence, protein structure and microarray data analysis. I conclude by discussing the use of wavelets in modeling biological structures.

14.
It has been established that the precise positioning of nucleosomes on genomic DNA can be achieved, at least for a minority of them, through sequence-dependent processes. However, to what extent DNA sequences play a role in the positioning of the major part of nucleosomes is still debated. The aim of the present study is to examine to what extent long-range correlations (LRC) are related to the presence of nucleosomes. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. The exploration of a number of eukaryotic and bacterial genomes through the optics of the so-called "wavelet transform microscope" reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. Here, we focus on the existence of LRC in the small-scale regime (10-200 bp), which are actually observed in eukaryotic genomes, in contrast to their absence in eubacterial genomes. Analysis of viral DNA genomes shows that, like their hosts' genomes, eukaryotic viruses present LRC but eubacterial viruses do not. There is one exception: the genomes of poxviruses (Vaccinia and Melanoplus sanguinipes), which do not replicate in the cell nucleus and do not exhibit LRC. No small-scale LRC are detected in the genomes of any of the examined RNA viruses, with the exception of retroviruses. These results, together with the observation of LRC between particular sequence motifs known to participate in the formation of nucleosomes (e.g. AA dinucleotides), strongly suggest that the 10-200 bp LRC are a signature of the sequence dependence of nucleosome positioning. Finally, we discuss possible interpretations of these LRC in terms of the physical mechanisms that might govern the positioning and the dynamics of the nucleosomes along the DNA chain through cooperative processes.

15.
The distribution of interspersed repetitive DNA sequences in the human genome
The distribution of interspersed repetitive DNA sequences in the human genome has been investigated, using a combination of biochemical, cytological, computational, and recombinant DNA approaches. "Low-resolution" biochemical experiments indicate that the general distribution of repetitive sequences in human DNA can be adequately described by models that assume a random spacing, with an average distance of 3 kb. A detailed "high-resolution" map of the repetitive sequence organization along 400 kb of cloned human DNA, including 150 kb of DNA fragments isolated for this study, is consistent with this general distribution pattern. However, a higher frequency of spacing distances greater than 9.5 kb was observed in this genomic DNA sample. While the overall repetitive sequence distribution is best described by models that assume a random distribution, an analysis of the distribution of Alu repetitive sequences appearing in the GenBank sequence database indicates that there are local domains with varying Alu placement densities. In situ hybridization to human metaphase chromosomes indicates that local density domains for Alu placement can be observed cytologically. Centric heterochromatin regions, in particular, are at least 50-fold underrepresented in Alu sequences. The observed distribution for repetitive sequences in human DNA is the expected result for sequences that transpose throughout the genome, with local regions of "preference" or "exclusion" for integration.
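The random-spacing model above can be made concrete with a short calculation. Assuming that "random spacing" means repeat positions form a Poisson process along the DNA (an interpretation, not a statement from the study), the distance D between neighboring repeats is exponentially distributed with the 3 kb mean, which gives the expected share of spacings longer than 9.5 kb:

```latex
P(D > d) = e^{-d/\bar{d}}, \qquad \bar{d} = 3\ \text{kb},
\qquad P(D > 9.5\ \text{kb}) = e^{-9.5/3} \approx e^{-3.17} \approx 0.042
```

Under this assumption roughly 4% of spacings would exceed 9.5 kb, which is the baseline against which the study's observed excess of such long spacings can be compared.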

16.
Hypersensitive (HS) sites in genomic sequences are reliable markers of DNA regulatory regions that control gene expression. Annotation of regulatory regions is important in understanding phenotypical differences among cells and diseases linked to pathologies in protein expression. Several computational techniques are devoted to mapping out regulatory regions in DNA by initially identifying HS sequences. Statistical learning techniques like Support Vector Machines (SVM), for instance, are employed to classify DNA sequences as HS or non-HS. This paper proposes a method to automate the basic steps in designing an SVM that improves the accuracy of such classification. The method proceeds in two stages and makes use of evolutionary algorithms. An evolutionary algorithm first designs optimal sequence motifs to associate explicit discriminating feature vectors with input DNA sequences. A second evolutionary algorithm then designs SVM kernel functions and parameters that optimally separate the HS and non-HS classes. Results show that this two-stage method significantly improves SVM classification accuracy. The method promises to be generally useful in automating the analysis of biological sequences, and we post its source code on our website.
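The step of associating explicit feature vectors with DNA sequences can be sketched as counting occurrences of a motif set. In the paper the motifs are designed by an evolutionary algorithm; here they are simply supplied by the caller, and the names `count_overlapping` and `motif_features` are hypothetical. The resulting matrix is the kind of input an SVM classifier would take, for example scikit-learn's `SVC`, whose `kernel` and `C` parameters roughly correspond to what the second evolutionary stage tunes.

```python
import numpy as np


def count_overlapping(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i))


def motif_features(seqs: list, motifs: list) -> np.ndarray:
    """One row per sequence, one column per motif, normalized by sequence length."""
    rows = []
    for s in seqs:
        s = s.upper()
        rows.append([count_overlapping(s, m.upper()) / max(len(s), 1) for m in motifs])
    return np.array(rows)
```

For example, `motif_features(["AAAT"], ["AA", "AT", "GG"])` gives `[[0.5, 0.25, 0.0]]`: AA occurs twice (overlapping), AT once and GG never in a sequence of length 4.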

17.
Calculations of DNA angular parameters in 50 eukaryotic sequences reveal regions of large conformational deviations from ideal DNA around regulatory sites. Frequently, discrete peaks of structural variation are present upstream of genes. Known regulatory regions often include variants of consensus sequences. Thus, imprecise sequences and structures are recognized within large genomic stretches. The existence of structurally wrinkled regions in the vicinity of regulatory sequences is likely to facilitate greatly their recognition by proteins and enzymes.

18.
"Minghui 63" is the restorer line for a number of the most important commercial hybrid rice varieties in China. To facilitate long-term genetic analysis and molecular cloning of the superior genes in the genome of "Minghui 63", the authors have constructed a large-insert genomic DNA library using the bacterial artificial chromosome (BAC) cloning vector pBeloBAC11. Size-fractionated HindIII digests of genomic DNA were ligated to the BAC vector, and the ligation mixture was used to transform the bacterial strain DH10B. A total of over 26,000 clones were obtained, with an average insert size of about 150 kb, ranging from 90 to 240 kb. These clones thus represent 9× rice haploid genome equivalents. The library is now being used for physical mapping of several genomic regions for map-based gene cloning.
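The 9× coverage figure follows from the clone count and the average insert size. Assuming a rice haploid genome size of about 430 Mb (a commonly cited estimate that the abstract does not state):

```latex
\text{coverage} \approx \frac{N_{\text{clones}} \times \bar{L}_{\text{insert}}}{G}
= \frac{26\,000 \times 150\ \text{kb}}{430\,000\ \text{kb}}
= \frac{3.9 \times 10^{6}\ \text{kb}}{4.3 \times 10^{5}\ \text{kb}} \approx 9.1
```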

19.
20.
C. Nobile, G. Romeo. Genomics, 1988, 3(3): 272-274
A method for partial digestion of total human DNA with restriction enzymes has been developed on the basis of a principle already utilized by P.A. Whittaker and E. Southern (1986, Gene 41: 129-134) for the analysis of phage lambda recombinants. Total human DNA irradiated with UV light of 254 nm is partially digested by restriction enzymes that recognize sequences containing adjacent thymidines, because of TT dimer formation. The products resulting from partial digestion of specific genomic regions are detected in Southern blots by unique genomic DNA probes with high reproducibility. This procedure is rapid and simple to perform because the same conditions of UV irradiation are used for different enzymes and probes. It is shown that restriction site polymorphisms occurring in the genomic regions analyzed are recognized by the "allelic" partial digest patterns they determine.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417