Similar articles
 20 similar articles found
1.
The chaos game representation (CGR) is a scatter plot derived from a DNA sequence, with each point of the plot corresponding to one base of the sequence. If the DNA sequence were a random collection of bases, the CGR would be a uniformly filled square; conversely, any patterns visible in the CGR represent some pattern (information) in the DNA sequence. In this paper, patterns previously observed in a variety of DNA sequences are explained solely in terms of nucleotide, dinucleotide and trinucleotide frequencies.
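
The CGR iteration is short enough to sketch. The Python below is a minimal illustration, not code from the paper: the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right) is an assumed, commonly used convention, and the helper names are hypothetical. It also shows why k-mer frequencies can explain CGR patterns: from the k-th base onward, the cell a point falls into on a 2^k × 2^k grid is fixed by the last k bases read.

```python
# Minimal chaos game representation (CGR) sketch.
# Assumed corner convention: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return one (x, y) point per base; each point is the midpoint between
    the previous point and the corner of the current base."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError for non-ACGT symbols
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def grid_counts(points, k):
    """Count points per cell of a 2^k x 2^k grid (row = y, column = x).
    From the k-th point on, each cell corresponds to exactly one k-mer."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in points:
        grid[min(int(y * n), n - 1)][min(int(x * n), n - 1)] += 1
    return grid


pts = cgr_points("ACGT")
print(pts[0])               # (0.25, 0.25): halfway from the centre to corner A
print(grid_counts(pts, 1))  # [[1, 1], [1, 1]]: one point per quadrant
```

Plotting `cgr_points` for a real sequence gives the scatter plot; `grid_counts` at k = 2 or 3 gives the dinucleotide or trinucleotide frequency tables that the abstract uses to explain the patterns.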

2.
A human myosin heavy-chain gene, cloned in λ Charon 4A phage (as a clone designated λgMHC-1), was shown to code for a cardiac myosin heavy chain of the β type. The 5' end of the 14,200-base-pair genomic DNA clone is located in the head region of the myosin chain. The 3' end was shown to extend to the COOH terminus and includes the 3'-nontranslated sequence of the corresponding mRNA. λgMHC-1 was identified as coding for a cardiac β-myosin heavy chain by heteroduplex mapping, using rabbit genomic cardiac myosin heavy-chain DNA as a probe, and by DNA sequence analysis of three selected subregions of the cloned DNA, including the 3'-nontranslated sequence. The S1 nuclease protection technique demonstrated that the β-myosin heavy-chain gene is transcribed in human heart muscle; the same technique showed that it is also expressed in human skeletal muscle.

3.
J Jakubowski, K Kornfeld. Genetics 1999, 153(2): 743-752
Ras-mediated signaling is required for induction of vulval cell fates during Caenorhabditis elegans development. By screening for suppressors of the multivulva phenotype caused by constitutively active let-60 ras, we identified the mutation n2527. To clone the gene affected by n2527, we developed a method for high-resolution mapping. We took advantage of the genomic DNA sequence of the N2 strain by using DNA sequencing to scan for single-nucleotide polymorphisms (SNPs) at defined genomic positions of the RC301 strain. An average of one polymorphism per 1.4 kb was detected in predicted intergenic regions. Because of this high frequency, DNA sequencing is an efficient method to scan for SNPs. By alternating between identifying SNPs and mapping n2527 using selected recombinants, we generated an SNP map of progressively higher density. An intensive search for SNPs resulted in a local map with an average marker spacing of approximately 4 kb. This was used to map n2527 to a 9.6-kb interval. The small size of this interval made it feasible to use DNA sequencing to identify the molecular lesion. In principle, this approach can be used for high-resolution mapping of any C. elegans mutation. Furthermore, this approach can be applied to other species as the genomic sequence becomes available. The n2527 mutation affects a previously uncharacterized gene that we named cdf-1, as it encodes a predicted protein with significant similarity to members of the cation diffusion facilitator family.
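
The bookkeeping in the alternating SNP-identification and recombinant-mapping loop can be sketched as interval intersection. This is a hypothetical illustration, not the authors' procedure: each informative recombinant is assumed to have already been scored as placing the mutation either left or right of a particular SNP, and `narrow_interval` is a made-up helper name. The positions in the usage lines are invented.

```python
def narrow_interval(constraints, start, end):
    """Intersect recombinant-derived constraints into the smallest interval.

    constraints: iterable of (side, position) pairs, where side == "right"
    means the mutation maps to the right of the SNP at `position` and
    side == "left" means it maps to the left (hypothetical encoding).
    """
    lo, hi = start, end
    for side, pos in constraints:
        if side == "right":
            lo = max(lo, pos)
        elif side == "left":
            hi = min(hi, pos)
        else:
            raise ValueError(f"unknown side: {side!r}")
    if lo >= hi:
        raise ValueError("constraints are inconsistent")
    return lo, hi


# Invented positions (bp): each new SNP scored in a recombinant can tighten a bound.
calls = [("right", 10_000), ("left", 40_000), ("right", 21_000), ("left", 33_000)]
print(narrow_interval(calls, 0, 100_000))  # (21000, 33000)
```

A denser SNP map helps because the bounds can only move to positions where a marker exists; with markers every few kilobases, the interval can shrink to a size that is practical to sequence.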

4.
Jiming Jiang, Bikram S Gill. Génome 2006, 49(9): 1057-1068
Fluorescence in situ hybridization (FISH), which allows direct mapping of DNA sequences on chromosomes, has become the most important technique in plant molecular cytogenetics research. Repetitive DNA sequence can generate unique FISH patterns on individual chromosomes for karyotyping and phylogenetic analysis. FISH on meiotic pachytene chromosomes coupled with digital imaging systems has become an efficient method to develop physical maps in plant species. FISH on extended DNA fibers provides a high-resolution mapping approach to analyze large DNA molecules and to characterize large genomic loci. FISH-based physical mapping provides a valuable complementary approach in genome sequencing and map-based cloning research. We expect that FISH will continue to play an important role in relating DNA sequence information to chromosome biology. FISH coupled with immunoassays will be increasingly used to study features of chromatin at the cytological level that control expression and regulation of genes.

5.
Species-specific patterns of DNA bending and sequence.   Citations: 16 (self-citations: 6, others: 10)
Nucleotide sequences in the GenEMBL database were analyzed with strategies designed to reveal species-specific patterns of DNA bending and DNA sequence. The results uncovered striking species-dependent patterns of bending, with more variation among individual organisms than between prokaryotes and eukaryotes. The frequency of bent sites in sequences from different bacteria was related to genomic A + T content, and this relationship was confirmed by electrophoretic analysis of genomic DNA. However, base composition was not an accurate predictor of DNA bending in eukaryotes. Sequences from C. elegans exhibited the highest frequency of bent sites in the database, and the RNA polymerase II locus from the nematode was the most bent gene in GenEMBL. Bent DNA extended throughout most introns and gene-flanking segments from C. elegans, whereas exon regions lacked A-tract bending characteristics. Independent evidence for the strong bending character of this genome came from electrophoretic studies, which revealed that a large number of fragments from C. elegans DNA exhibited anomalous gel mobilities compared with genomic fragments from over 20 other organisms. The prevalence of bent sites in this genome enabled us to selectively detect C. elegans sequences in a computer search of the database, using as probes C. elegans introns, bending elements, and a 20-nucleotide consensus sequence for bent DNA. This approach also provided additional examples of species-specific sequence patterns in eukaryotes: (A)≥10 and (A·T)≥5 tracts are prevalent throughout the untranslated DNA of D. discoideum and P. falciparum, respectively. These results provide new insight into the organization of eukaryotic DNA because they show that species-specific patterns of simple sequences are found in introns and in other untranslated regions of the genome.
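
Scanning a sequence for the simple tracts named in this abstract takes only regular expressions. This sketch is not the authors' database-search method: reading the (A·T)≥5 tract as an alternating AT repeat of at least five units is an assumption, and `find_tracts` and `tract_density` are hypothetical helper names. Detecting A-tracts that are phased with the helical repeat, which is what actually bends DNA, would need additional spacing rules not sketched here.

```python
import re

# Assumed reading of the abstract's tract notation:
#   (A)>=10   -> a run of at least 10 A (or 10 T, i.e. an A run on the other strand)
#   (A.T)>=5  -> an alternating AT/TA repeat of at least 5 units
TRACT_PATTERNS = {
    "A/T run >= 10": re.compile(r"A{10,}|T{10,}"),
    "AT repeat >= 5": re.compile(r"(?:AT){5,}|(?:TA){5,}"),
}


def find_tracts(seq):
    """Return sorted (start, end, kind) tuples for simple-sequence tracts."""
    seq = seq.upper()
    hits = []
    for kind, pattern in TRACT_PATTERNS.items():
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.end(), kind))
    return sorted(hits)


def tract_density(seq):
    """Tracts per kb: a crude statistic for comparing genomes or regions."""
    return 1000 * len(find_tracts(seq)) / len(seq) if seq else 0.0


print(find_tracts("GGG" + "A" * 12 + "CC"))  # [(3, 15, 'A/T run >= 10')]
```

Comparing `tract_density` between untranslated regions of different species is the kind of comparison the abstract describes, although the published analysis used database probes rather than this density statistic.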

6.
With the completion of the genomes of humans and a few model organisms, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can readily identify structures and extract features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones. Repeats often have to be masked before protein-coding regions along a DNA sequence can be identified or before redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence-time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence-time statistics; it is largely species-independent and works well on very short sequences. Efficient codon indices are key elements of successful gene-finding algorithms and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6N bytes of memory and computational time of order N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N, without any prior knowledge of the consensus sequences of those features; this makes whole-genome sequence analysis feasible on a PC.
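
The recurrence time of a k-mer is the distance back to its previous occurrence. The dictionary-based scan below is a simplified sketch of that idea, not the authors' implementation: it makes no attempt to match the memory bound quoted in the abstract, and `multiple_of_3_fraction` is a crude, hypothetical stand-in for a codon index rather than the index the paper derives.

```python
from collections import Counter


def recurrence_times(seq, k):
    """For each k-mer occurrence, the distance back to the previous
    occurrence of the same k-mer; first occurrences contribute nothing."""
    last_seen = {}
    times = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times.append(i - last_seen[word])
        last_seen[word] = i
    return times


def recurrence_histogram(seq, k):
    """Counts of each recurrence time; peaks at t hint at period-t repeats."""
    return Counter(recurrence_times(seq, k))


def multiple_of_3_fraction(seq, k=3):
    """Share of recurrence times divisible by 3 (roughly 1/3 for unstructured
    sequence). A crude illustration, NOT the paper's codon index."""
    times = recurrence_times(seq, k)
    return sum(1 for t in times if t % 3 == 0) / len(times) if times else 0.0


print(recurrence_times("ACGACGACG", 3))  # [3, 3, 3, 3]
```

Tandem repeats show up as a sharp peak in the histogram at the repeat period, while the excess of recurrence times divisible by 3 reflects the codon structure that a codon index exploits.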

7.
8.
Forward genetic screens provide a powerful approach for inferring gene function on the basis of the phenotypes associated with mutated genes. However, determining the causal mutation by traditional mapping and candidate gene sequencing is often the rate-limiting step, especially when analyzing many mutants. We report two genomic approaches for more rapidly determining the identity of the affected genes in Caenorhabditis elegans mutants. First, we report our use of restriction site-associated DNA (RAD) polymorphism markers for rapidly mapping mutations after chemical mutagenesis and mutant isolation. Second, we describe our use of genomic interval pull-down sequencing (GIPS) to selectively capture and sequence megabase-sized portions of a mutant genome. Together, these two methods provide a rapid and cost-effective approach for positional cloning of C. elegans mutant loci, and are also applicable to other genetic model systems.
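
The RAD-marker idea can be sketched as extracting the sequence next to each restriction site and comparing the resulting tag sets between two strains. In this hypothetical Python sketch, the SbfI site CCTGCAGG and the 20-bp tag length are assumptions (the abstract names no enzyme), matching is exact, and real RAD data would come from sequencing reads rather than from an assembled genome.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def rad_tags(genome, site="CCTGCAGG", tag_len=20):
    """Return the set of tags read outward from every restriction site.

    Each tag starts at the recognition site and extends tag_len bases into
    one flank (the left flank is read on the reverse strand)."""
    tags = set()
    start = genome.find(site)
    while start != -1:
        end = start + len(site)
        if start + tag_len <= len(genome):
            tags.add(genome[start:start + tag_len])        # right flank
        if end - tag_len >= 0:
            tags.add(revcomp(genome[end - tag_len:end]))   # left flank
        start = genome.find(site, start + 1)
    return tags


def polymorphic_tags(genome_a, genome_b, **kwargs):
    """Tags present in only one of the two strains: candidate markers."""
    return rad_tags(genome_a, **kwargs) ^ rad_tags(genome_b, **kwargs)
```

A SNP that falls inside a tag, or that creates or destroys a site, makes the tag strain-specific; scoring such tags in pooled mutant progeny is what turns them into mapping markers.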

9.
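Three-base periodicity is the main characteristic of the coding regions of most genomic sequences. This paper proposes a fast method that computes the Fourier spectrum only at the 1/3 frequency point and uses it to analyze the three-base periodicity of DNA sequences; wavelet-transform filtering at a chosen scale is then used to predict the coding regions of DNA sequences. Theoretical analysis and extensive computer experiments confirm the effectiveness of the method, which gives good prediction results. The method is fast, requires no training set, and does not depend on information from existing databases.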
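
Because e^{-2πin/3} takes only three values, the Fourier component at frequency 1/3 can be computed from base counts at the three codon phases, which is one way to see why evaluating a single frequency point is fast. The Python sketch below illustrates that reading and is not the paper's code: it uses the common per-base indicator-sequence mapping (an assumption), omits the wavelet-filtering step, and its window and step sizes are arbitrary.

```python
import cmath

W = cmath.exp(-2j * cmath.pi / 3)  # twiddle factor at frequency 1/3


def period3_power(seq):
    """Sum over A, C, G, T of |U_b(1/3)|^2, where U_b is the DFT of the
    0/1 indicator sequence of base b, evaluated at frequency 1/3."""
    counts = {b: [0, 0, 0] for b in "ACGT"}
    for n, base in enumerate(seq.upper()):
        if base in counts:
            counts[base][n % 3] += 1  # W**n depends only on n mod 3
    return sum(abs(c[0] + c[1] * W + c[2] * W ** 2) ** 2
               for c in counts.values())


def period3_profile(seq, window=120, step=3):
    """(window start, power) pairs for a sliding window; high values
    suggest coding regions. The window and step are illustrative defaults."""
    return [(i, period3_power(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]
```

Each window costs O(window) additions with no FFT, and because shifting the window only multiplies each U_b by a power of W, the power does not depend on which codon phase the window starts in.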

10.
Genomics 2020, 112(2): 1847-1852
A novel method is proposed to detect acceptor and donor splice sites using chaos game representation and an artificial neural network. To achieve high accuracy, the inputs to the neural network (the feature vector) should reflect the true nature of the DNA segments. It is therefore important to have a one-to-one numerical representation, i.e. the feature vector should be able to represent the original data. Chaos game representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a position on the plane in a one-to-one manner. Using CGR, a DNA sequence can be mapped to a numerical sequence that reflects the true nature of the original sequence. In this research, we propose using CGR as the feature input to a neural network to detect splice sites in the NN269 dataset. Computational experiments indicate that this approach gives good accuracy while being simpler than other methods in the literature, with only one neural network component. The code and data for our method are available at: https://github.com/thoang3/portfolio/tree/SpliceSites_ANN_CGR.
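
One way to turn a splice-site window into a one-to-one numerical input is to flatten its CGR coordinates. This Python sketch is hypothetical: the corner assignment is an assumed convention, the exact features used with NN269 are not stated in the abstract, and the neural network itself is not shown. The `decode` helper demonstrates the one-to-one property by recovering the sequence from the features.

```python
# Assumed corner convention (the abstract does not specify one).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
# Quadrant of a point, keyed by (x >= 0.5, y >= 0.5), under the same convention.
QUADRANT_BASE = {(False, False): "A", (False, True): "C",
                 (True, True): "G", (True, False): "T"}


def cgr_features(seq):
    """Flatten the CGR coordinates of a fixed-length window into a
    feature vector [x1, y1, x2, y2, ...] for a classifier."""
    x, y = 0.5, 0.5
    features = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        features.extend((x, y))
    return features


def decode(features):
    """Recover the sequence: each point lies strictly inside the quadrant
    of the base that produced it, so the mapping is one-to-one."""
    points = zip(features[0::2], features[1::2])
    return "".join(QUADRANT_BASE[(x >= 0.5, y >= 0.5)] for x, y in points)


window = "ACGTTGCAAG"
vec = cgr_features(window)
print(len(vec), decode(vec) == window)  # 20 True
```

Each point also encodes the bases before it at finer scales, so neighbouring coordinates carry context about the preceding bases, which a plain one-hot encoding does not provide.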

11.
R-ISSR as a new tool for genomic fingerprinting, mapping, and gene tagging   Citations: 1 (self-citations: 0, others: 1)
In the present study, we propose and test the concept of R-ISSR, a new tool for genomic fingerprinting, mapping, and gene tagging. The concept is based on the fact that primers for inter-simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD) analysis elicit different genomic information, so combining these two kinds of primers in the same polymerase chain reaction (PCR) might reveal new genomic loci that neither technique detects alone. The feasibility of this tool was first simulated electronically with sequence analysis software and Arabidopsis chromosome sequence. Next, different combinations of ISSR and RAPD primers were applied in real PCRs to detect new genomic loci in two maize lines (Q319 and 1145). Sequencing gels were used to separate the PCR products and showed good resolving power compared with agarose gels. RAPD primers could be successfully combined with ISSR primers to detect new genomic loci, and applied in a new way to genomic mapping, fingerprinting, and gene tagging.
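
The electronic-simulation step can be approximated by an exact-match virtual PCR: any primer in the reaction can prime from the top strand, any can prime from the bottom strand, and a product forms when two sites face each other within a size limit. This hypothetical sketch ignores mismatches, melting temperature, and product competition, so it only illustrates why a mixed ISSR + RAPD reaction can yield loci that neither primer yields alone. The primers and template in the usage lines are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(template, primer):
    """Exact-match sites: starts on the top strand, and end coordinates of
    reverse-complement matches (primer annealing to the bottom strand)."""
    rc = revcomp(primer)
    span = range(len(template) - len(primer) + 1)
    top = [i for i in span if template.startswith(primer, i)]
    bottom = [i + len(primer) for i in span if template.startswith(rc, i)]
    return top, bottom


def virtual_pcr(template, primers, max_len=3000):
    """Products (start, end) from any pair of primers in the reaction."""
    starts, ends = [], []
    for p in primers:
        top, bottom = binding_sites(template, p)
        starts += top
        ends += bottom
    return sorted({(s, e) for s in starts for e in ends if 0 < e - s <= max_len})


def new_loci(template, issr, rapd, **kwargs):
    """Products seen only when both primers share one reaction."""
    combined = set(virtual_pcr(template, [issr, rapd], **kwargs))
    alone = (set(virtual_pcr(template, [issr], **kwargs))
             | set(virtual_pcr(template, [rapd], **kwargs)))
    return sorted(combined - alone)


issr = "GAGAGAGAGAGAGAGAC"  # invented ISSR-style primer
rapd = "GTGACGTAGG"         # invented RAPD-style primer
template = "T" * 10 + issr + "C" * 50 + revcomp(rapd) + "T" * 10
print(new_loci(template, issr, rapd))  # [(10, 87)]
```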

12.
The Tc1 transposon of Caenorhabditis elegans always integrates into the sequence TA, but some TA sites are preferred over others. We investigated a TA target site from the gpa-2 gene of C. elegans that was previously found to be preferred (hot) for Tc1 integration in vivo. This site with its immediate flanks was cloned into a plasmid and remained hot in vitro, showing that sequences immediately adjacent to the TA dinucleotide determine this target choice. Further deletion mapping and mutagenesis showed that a 4-bp sequence on one side of the TA is sufficient to make a site hot; this sequence fits the previously identified Tc1 consensus sequence for integration well. In addition, we found a second type of hot site, which is preferred for integration only when the target DNA is supercoiled, not when it is relaxed. Excision frequencies were relatively independent of the flanking sequences. The distribution of Tc1 insertions into a plasmid was similar whether we used nuclear extracts or purified Tc1 transposase in vitro, showing that the Tc1 transposase is the protein responsible for target choice.

13.
The two strands of a DNA molecule with a repetitive sequence can pair in many different basepairing patterns. For perfectly periodic sequences, early bulk experiments by Pörschke indicate the existence of a sliding process that permits rapid transitions between different relative strand positions. Here, we use a detailed theoretical model to study the basepairing dynamics of periodic and nearly periodic DNA. As suggested by Pörschke, DNA sliding is mediated by basepairing defects (bulge loops), which can diffuse along the DNA. Moreover, a shear force f on opposite ends of the two strands yields a characteristic dynamic response: an outward average sliding velocity v ∼ 1/N is induced in a double strand of length N, provided f is larger than a threshold f_c. Conversely, if the strands are initially misaligned, they realign even against an external force f < f_c. These dynamics effectively result in viscoelastic behavior of DNA under shear forces, with properties that are programmable through the choice of DNA sequence. We find that a small number of mutations in periodic sequences does not prevent DNA sliding but introduces a time delay in the dynamic response. We clarify the mechanism for the time delay and describe it quantitatively within a phenomenological model. Based on our findings, we suggest new dynamical roles for DNA in artificial nanoscale devices. The basepairing dynamics described here are also relevant to the extension of repetitive sequences in genomic DNA.

14.
A wavelet transform of the DNA "walk" constructed from a genomic sequence offers a direct visualization of short and long-range patterns in nucleotide sequences. We study sequences that encode diverse biological functions, taken from a variety of genomes. Pattern irregularities in the transform are frequently associated with sequences of biological interest. Exonic regions, for example, visualize differently under wavelet analysis than introns, and ribosomal RNA regions display distinct universal signatures. DNA walk wavelet analysis can provide a sensitive and rapid assessment of the putative biological significance of genomic DNA.
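
A DNA walk and one scale of a wavelet transform can be sketched in a few lines. The purine/pyrimidine step rule and the choice of the Haar wavelet are assumptions for illustration; the abstract does not say which walk rule or wavelet was used, and the helper names are hypothetical.

```python
def dna_walk(seq):
    """Cumulative walk: +1 for a purine (A, G), -1 for a pyrimidine (C, T);
    other symbols leave the position unchanged."""
    steps = {"A": 1, "G": 1, "C": -1, "T": -1}
    position, walk = 0, []
    for base in seq.upper():
        position += steps.get(base, 0)
        walk.append(position)
    return walk


def haar_detail(signal, scale):
    """Undecimated Haar detail coefficients at one scale: the mean of the
    next `scale` values minus the mean of the current `scale` values."""
    coeffs = []
    for t in range(len(signal) - 2 * scale + 1):
        left = sum(signal[t:t + scale])
        right = sum(signal[t + scale:t + 2 * scale])
        coeffs.append((right - left) / scale)
    return coeffs


walk = dna_walk("AACT")
print(walk)                  # [1, 2, 1, 0]
print(haar_detail(walk, 1))  # [1.0, -1.0, -1.0]
```

Computing `haar_detail` at several scales (for example 1, 2, 4, 8, ...) and stacking the rows gives a scalogram-like image in which local compositional changes, such as exon/intron boundaries, appear as features at small scales and long-range trends at large scales. This direct version costs O(N·scale) per scale; prefix sums would make each scale O(N).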

15.
We have investigated the target choice of the related transposable elements Tc1 and Tc3 of the nematode C. elegans. The exact locations of 204 independent Tc1 insertions and 166 Tc3 insertions in a 1-kbp region of the genome were determined. There was no phenotypic selection for the insertions. All insertions were into the sequence TA. Both elements strongly prefer certain positions in the 1-kbp region. Hot sites for integration are neither clustered nor regularly spaced. The orientation of the integrated transposon has no effect on the distribution pattern. We tested several explanations for the target-site preference. If simple structural features of the DNA (e.g. bends) marked hot sites, we would expect the patterns of the two related transposons Tc1 and Tc3 to be similar; however, we found them to be completely different. Furthermore, the sequence at the donor site has no effect on the choice of the new insertion site: the insertion pattern of a transposon jumping from a transgenic donor site is identical to that of transposons jumping from endogenous genomic donor sites. The most likely explanation for the target choice is therefore that the transposase recognizes the primary sequence of the target site. However, alignment of the Tc1 and Tc3 integration sites does not reveal a strong consensus sequence for either transposon.

16.
The sequencing of the human genome is well under way. Technology has advanced such that the total genomic sequence is attainable, along with an extensive catalogue of genes from comprehensive cDNA libraries. With the recent completion of the Saccharomyces cerevisiae sequencing project and the imminent completion of that of Caenorhabditis elegans, the most frequently asked question is how much sequence data alone can tell us. The answer is that a DNA sequence taken in isolation from a single organism reveals very little. The vast majority of DNA in most organisms is noncoding. Protein-coding sequences, or genes, cannot function as isolated units without interacting with noncoding DNA and neighboring genes. This genomic environment is specific to each organism. To understand it, we need to look at similar genes in different organisms and determine how their function and position have changed over the course of evolution. By understanding evolutionary processes, we can gain greater insight into what makes a gene and into the wider processes of genetics and inheritance. Comparative genomics (with model organisms), once the poor relation of the human genome project, is starting to provide the key to unlocking the DNA code.

17.
Identifying protein-coding regions in DNA sequences is an active problem in computational biology. In this study, we present a self-adaptive spectral rotation (SASR) approach that visualizes coding regions in DNA sequences based on their triplet periodicity, without any prior training. It is intended to help with rough prediction of coding regions when the extra information needed to train other leading methods is unavailable. In this approach, a Fourier spectrum is calculated at each position in the DNA sequence from the subsequence that follows it. From these spectra, a random walk in the complex plane is generated as the SASR's graphic output. Applications of SASR to real DNA data show that patterns in the graphic output reveal the locations of coding regions and the frame shifts between them: arcs indicate coding regions, stable points indicate non-coding regions, and the shapes of corners reveal frame shifts. Tests on a genomic data set from Saccharomyces cerevisiae show that the graphic patterns for coding and non-coding regions differ greatly, so coding regions can be distinguished visually. A timing test further shows that SASR is easy to implement and has computational complexity O(N).
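
The link between the 1/3-frequency Fourier component and a walk in the complex plane can be illustrated with a running partial DFT. This is a simplified sketch, not the SASR algorithm: it applies no self-adaptive rotation, and the complex base mapping is a hypothetical choice. In a region with triplet periodicity, each codon adds roughly the same complex step, so the path drifts steadily; when composition is uniform across codon positions, the steps cancel and the path stays near a point; a frame shift multiplies later steps by a cube root of unity, turning the drift direction by 120° and producing a corner.

```python
import cmath

W = cmath.exp(-2j * cmath.pi / 3)  # cube root of unity for period 3
# Hypothetical complex mapping of bases (not taken from the SASR paper).
MAPPING = {"A": 1, "C": -1, "G": 1j, "T": -1j}


def rotation_walk(seq, mapping=MAPPING):
    """Running 1/3-frequency partial sum z_n = sum_{m<n} u(m) * W**m,
    returned as a list of points starting at the origin."""
    z = 0j
    path = [z]
    for n, base in enumerate(seq.upper()):
        # reduce the exponent mod 3 so rounding error does not build up
        z += mapping.get(base, 0) * W ** (n % 3)
        path.append(z)
    return path


coding_like = rotation_walk("ACG" * 100)  # steady drift: path[300] == 100 * path[3]
flat = rotation_walk("AAA" * 100)         # steps cancel: path stays at the origin
```

Plotting the real and imaginary parts of the path gives the kind of picture the abstract describes; a random non-coding sequence would wander only about as √N, much more slowly than the linear drift of a strongly periodic region.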

18.
We describe a rapid and cost-effective technique for the in vitro removal of introns and other unwanted regions from genomic DNA to generate a single sequence of continuous coding capacity, for cases where the tissues required for RNA extraction and complementary DNA synthesis are unavailable. Based on an overlapping fusion-PCR strategy, we name this procedure SPLICE (for swift PCR for ligating in vitro constructed exons). As proof of principle, we used SPLICE to generate a single piece of DNA containing the coding region of a five-exon gene, the short-wavelength-sensitive 1 (SWS1) opsin gene, from genomic DNA extracted from the brown lemur, Eulemur fulvus, in only two short rounds of PCR. Where the genomic structure and sequence are known, this technique may be universally applied to any gene expressed in any organism to generate a practical unit for investigating the function of a particular gene of interest. In this report, we provide a detailed protocol, experimental considerations, and suggestions for troubleshooting.
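
The overlap-extension principle behind SPLICE can be sketched as primer design at each exon-exon junction. A reverse primer for the upstream exon and a forward primer for the downstream exon each carry a 5' tail taken from the neighbouring exon, so the first-round products share an overlap and can be fused in the second round. This Python sketch is hypothetical: the tail and annealing lengths are arbitrary, no Tm or secondary-structure checks are made, and it does not reproduce the protocol's published primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def junction_primers(exons, anneal=20, overlap=15):
    """For each junction between consecutive exons, return a pair
    (reverse primer for the upstream exon, forward primer for the
    downstream exon); each primer's 5' tail comes from the other exon."""
    pairs = []
    for upstream, downstream in zip(exons, exons[1:]):
        forward = upstream[-overlap:] + downstream[:anneal]
        reverse = revcomp(upstream[-anneal:] + downstream[:overlap])
        pairs.append((reverse, forward))
    return pairs


def primers_fit_cds(exons, pairs):
    """Check that every junction primer matches the spliced coding sequence."""
    cds = "".join(exons)
    return all(revcomp(rev) in cds and fwd in cds for rev, fwd in pairs)
```

Because each primer spans a junction, it matches the spliced sequence but not the genomic template across the intron, which is what lets the second-round PCR stitch the exons together into one continuous coding sequence.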

19.
An estimated 80% of genomic DNA in eukaryotes is packaged as nucleosomes, which, together with the remaining interstitial linker regions, generate higher order chromatin structures [1]. Nucleosome sequences isolated from diverse organisms exhibit ∼10 bp periodic variations in AA, TT and GC dinucleotide frequencies. These sequence elements generate intrinsically curved DNA and help establish the histone-DNA interface. We investigated an important unanswered question concerning the interplay between chromatin organization and genome evolution: do the DNA sequence preferences inherent to the highly conserved histone core exert detectable natural selection on genomic divergence and polymorphism? To address this hypothesis, we isolated nucleosomal DNA sequences from Drosophila melanogaster embryos and examined the underlying genomic variation within and between species. We found that divergence along the D. melanogaster lineage is periodic across nucleosome regions with base changes following preferred nucleotides, providing new evidence for systematic evolutionary forces in the generation and maintenance of nucleosome-associated dinucleotide periodicities. Further, Single Nucleotide Polymorphism (SNP) frequency spectra show striking periodicities across nucleosomal regions, paralleling divergence patterns. Preferred alleles occur at higher frequencies in natural populations, consistent with a central role for natural selection. These patterns are stronger for nucleosomes in introns than in intergenic regions, suggesting selection is stronger in transcribed regions where nucleosomes undergo more displacement, remodeling and functional modification. In addition, we observe a large-scale (∼180 bp) periodic enrichment of AA/TT dinucleotides associated with nucleosome occupancy, while GC dinucleotide frequency peaks in linker regions. 
Divergence and polymorphism data also support a role for natural selection in the generation and maintenance of these super-nucleosomal patterns. Our results demonstrate that nucleosome-associated sequence periodicities are under selective pressure, implying that structural interactions between nucleosomes and DNA sequence shape sequence evolution, particularly in introns.
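
The ~10-bp dinucleotide periodicity can be checked on aligned nucleosomal sequences by computing a positional frequency profile and its autocorrelation. This is a hypothetical sketch, not the authors' pipeline: it assumes the sequences are already aligned (for example, on the nucleosome dyad) and trimmed to a common frame, and the helper names are made up. A profile with ~10-bp periodicity shows positive autocorrelation near lag 10 and negative values near lag 5.

```python
def dinucleotide_profile(seqs, dinucleotides=("AA", "TT")):
    """Fraction of aligned sequences carrying one of `dinucleotides`
    at each position (positions beyond the shortest sequence are ignored)."""
    length = min(len(s) for s in seqs)
    wanted = set(dinucleotides)
    return [sum(s[i:i + 2].upper() in wanted for s in seqs) / len(seqs)
            for i in range(length - 1)]


def autocorrelation(values, lag):
    """Sample autocorrelation of a profile at one lag."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(len(values) - lag))
    return cov / var
```

Running `dinucleotide_profile` with `("GC",)` instead of the default gives the GC profile; the same autocorrelation at lags near 180 bp over longer aligned regions would probe the super-nucleosomal pattern mentioned above.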

20.
Locating protein-coding regions in genomic DNA is a critical step in accessing the information generated by large-scale sequencing projects. Current methods for gene detection depend on statistical measures of content differences between coding and noncoding DNA, in addition to recognition of promoters, splice sites, and other regulatory sites. Here we explore the potential value of recurrent amino acid sequence patterns 3-19 amino acids long as a content statistic for gene finding. A finite mixture model incorporating these patterns can partially discriminate protein sequences that have no (detectable) known homologs from randomized versions of these sequences, and from short (≤50 amino acids) non-coding segments extracted from the S. cerevisiae genome. The mixture-model scores for a collection of human exons were not correlated with GENSCAN scores, suggesting that adding our protein pattern recognition module to current gene recognition programs may improve their performance.
