Similar articles
20 similar articles found.
1.
Genomes contain various types of repetitive sequences. They may be used as probes for seeking genome rearrangements because they are rather free from natural selection if they are located in intergenic regions. In this study, we searched for tandem repeats (TRs) in 44 prokaryotic genomes by the color-coding method and sought signs of genome rearrangements by detailed analysis of the detected TRs. We found 13,542 tandem repeats in the 44 prokaryotic genomes in total, ranging from several tens to one thousand per genome. The results of statistical analysis show that TRs tend to occur in regions of high base-composition bias in some genomes. Moreover, we recognized characteristic distribution patterns of equivalent TR-pairs in 12 genomes, which are expected to indicate the occurrence of whole-genome duplication (WGD) in these genomes. It is demonstrated that TRs could indeed be used for seeking genome rearrangements. Although it has not been made clear at this time whether or not WGD has occurred in prokaryotic genomes, the results of the analyses of equivalent TR-pairs in this study are thought to be evidence of WGD in these genomes.

2.

Background  

Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed.

3.
Tandem repeats (TRs) represent one of the most prevalent features of genomic sequences. Due to their abundance and functional significance, a plethora of detection tools has been devised over the last two decades. Despite the longstanding interest, TR detection is still not resolved. Our large-scale tests reveal that current detectors produce different, often nonoverlapping inferences, reflecting characteristics of the underlying algorithms rather than the true distribution of TRs in genomic data. Our simulations show that the power of detecting TRs depends on the degree of their divergence and on repeat characteristics such as the length of the minimal repeat unit and the number of units in tandem. To reconcile the diverse predictions of current algorithms, we propose and evaluate several statistical criteria for measuring the quality of predicted repeat units. In particular, we propose a model-based phylogenetic classifier, entailing a maximum-likelihood estimation of the repeat divergence. Applied in conjunction with state-of-the-art detectors, our statistical classification scheme for inferred repeats allows false-positive predictions to be filtered out. Since different algorithms appear to specialize in predicting TRs with certain properties, we advise applying multiple detectors with subsequent filtering to obtain the most complete set of genuine repeats.

4.
The presence of repeated sequences is a fundamental feature of genomes. Tandemly repeated DNA appears in both eukaryotic and prokaryotic genomes; it is associated with various regulatory mechanisms and plays an important role in genomic fingerprinting. In this paper, we describe mreps, a powerful software tool for fast identification of tandemly repeated structures in DNA sequences. mreps is able to identify all types of tandem repeats within a single run on a whole genomic sequence. It has a resolution parameter that allows the program to identify 'fuzzy' repeats. We introduce the main algorithmic solutions behind mreps, describe its usage, give some execution time benchmarks and present several case studies to illustrate its capabilities. The mreps web interface is accessible through http://www.loria.fr/mreps/.

5.
Tandem repeats finder: a program to analyze DNA sequences. (Cited 66 times in total: 3 self-citations, 63 citations by others)
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.
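To make the definition of a tandem repeat concrete, the following is a minimal Python sketch that finds exact tandem repeats only. It is not the Tandem Repeats Finder algorithm: that program handles approximate copies using percent identity, indel frequency and statistical criteria, none of which is modeled here. The function name and the parameters `max_period` and `min_copies` are hypothetical, chosen for illustration.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, period, copies) for each maximal exact periodic run.

    Illustrative only: exact matches, no mismatches or indels, and a run
    with period p is also reported for multiples of p if long enough.
    """
    hits = []
    n = len(seq)
    for period in range(1, max_period + 1):
        i = 0
        while i + period < n:
            # Extend while each base equals the base one period later.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            # The periodic stretch covers seq[i : j + period].
            copies = (j - i + period) // period
            if copies >= min_copies:
                hits.append((i, period, copies))
            # Any start inside this run would give a shorter run; skip it.
            i = j + 1
    return hits


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GGCAGCAGCAGTT"))
```

In the example string, the unit GCA begins at index 1 and occurs three times in a row, so the sketch reports a single hit with period 3.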

6.
After the dog genome was sequenced, an increasing number of studies involving genetic research of dogs have been conducted to understand gene functions and mammalian evolution. To study the genetic diversity in dogs and other mammals, genetic markers linked to function and conserved in wide lineages are necessary. Thus far, few polymorphic markers have been used in dogs. In this study, we surveyed the entire dog genome and predicted a total of 109 tandem repeats (TRs) located in protein-coding regions that may be polymorphic according to our prediction model. We selected 10 TRs that may be related to neurophysiology and neural development, and tested them in 167 individuals of 8 dog breeds: 5 European dog breeds (Beagle, Golden Retriever, Labrador Retriever, German Shepherd, and Toy Poodle) and 3 Japanese dog breeds (Japanese Spitz, Shiba, and Shikoku). Among the tested TRs, nine were polymorphic, indicating that 90% of the TRs were successfully predicted to be polymorphic. PCR fragments of the TRs were amplified from dog brain cDNA, showing their expression in the dog brain. Our results provide abundant opportunities for the study of phenotypic variations in dogs, and our prediction method for variable number of tandem repeats (VNTRs) can be applied to any other animal genome sequences for the survey of functional and polymorphic markers.

7.
Tandem repeats occur frequently in biological sequences. They are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat in a sliding window. In this article, we focus on the case in which an unknown number of tandem repeat segments of the same pattern are dispersed throughout a sequence. We construct a probabilistic generative model for the tandem repeats, where the sequence pattern is represented by a motif matrix. A Bayesian approach is adopted to compute this model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution in order to infer both the motif matrix of the tandem repeats and the locations of the repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. Experiments on both synthetic data and real data show that this new approach is powerful in detecting dispersed short tandem repeats. As far as we know, it is the first work to adopt RJMCMC algorithms in the detection of tandem repeats.

8.
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs, as it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, but also analyzes multiple files with multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user-defined parameters/options let researchers use different minimum motif repeat search criteria for varying motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some of the tandem repeats play important roles in the regulation of gene expression, whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in number of repeats as well as repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, thus differing from the untranscribed repeats found in genomes.

9.
MOTIVATION: One of the most interesting features of genomes (both coding and non-coding regions) is the presence of relatively short tandemly repeated DNA sequences known as tandem repeats (TRs). We developed a new PC-based stand-alone software analysis program, combining sequence motif searches with keywords such as organs, tissues, cell lines or development stages, for finding exact, inexact and compound TRs. Tandem Repeats Analyzer 1.5 (TRA) has several advanced repeat search parameters/options over other repeat finder programs, as it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files but also analyzes multiple files with multiple sequences. Advanced user-defined parameters/options let researchers use different search criteria for varying motif lengths simultaneously. The outputs show statistical results to be evaluated by the user. The discovery of TRs in ESTs could be useful both for gene mapping and association studies and for discovering TRs located in coding regions of important genes that are expressed under various conditions of environment, stress, organ, tissue and development stage. RESULTS: In this paper, we demonstrated applications of TRA using 175 899 EST sequences for three Arabidopsis spp. downloaded from GenBank. The EST-SSRs/ESTs ratios were found to be 43.1%, 15.3% and 2.34% in A.lyrata, A.thaliana and A.halleri, respectively. Analysis revealed that organs, tissues and development stages possessed different amounts of repeats and repeat compositions. This indicated that the distribution of TRs among the tissues or organs may not be random, differing from the untranscribed repeats found in genomes. AVAILABILITY: The program can be obtained free by anonymous FTP from ftp.akdeniz.edu.tr/Araclar/TRA.

10.

Background  

The analysis of inter-Alu PCR patterns obtained from human genomic DNA samples is a promising technique for the simultaneous analysis of many genomic loci flanked by Alu repetitive sequences in order to detect the presence of genetic polymorphisms. Inter-Alu PCR products may be separated and analyzed by capillary electrophoresis using an automatic sequencer that generates a complex pattern of peaks. We propose an algorithmic method based on the Haar-Walsh Wavelet Packet Transformation (WPT) for efficient detection of fingerprint-type patterns generated by PCR-based methodologies. We have tested our algorithmic approach on inter-Alu patterns obtained from the genomic DNA of three pairs of monozygotic twins, expecting that the inter-Alu patterns within each twin pair will show differences due to unavoidable experimental variability. In contrast, the differences among samples from different twin pairs are expected to originate from genetic variability. Our goal is to automatically detect regions in the inter-Alu pattern likely to be associated with the presence of genetic polymorphisms.

11.
Genome variation studies in Plasmodium falciparum have focused on SNPs and, more recently, large-scale copy number polymorphisms and ectopic rearrangements. Here, we examine another source of variation: variable number tandem repeats (VNTRs). Interspersed low complexity features, including the well-studied P. falciparum microsatellite sequences, are commonly classified as VNTRs; however, this study is focused on longer coding VNTR polymorphisms, a small class of copy number variations. Selection against frameshift mutation is a main constraint on tandem repeats (TRs) in coding regions, while limited propagation of TRs longer than 975 nt total length is a minor restriction in coding regions. Comparative analysis of three P. falciparum genomes reveals that more than 9% of all P. falciparum ORFs harbor VNTRs, much more than has been reported for any other species. Moreover, genotyping of VNTR loci in a drug-selected line, progeny of a genetic cross, and 334 field isolates demonstrates broad variability in these sequences. Functional enrichment analysis of ORFs harboring VNTRs identifies stress and DNA damage responses along with chromatin modification activities, suggesting an influence on genome mutability and functional variation. Analysis of the repeat units and their flanking regions in both P. falciparum and Plasmodium reichenowi sequences implicates a replication slippage mechanism in the generation of TRs from an initially unrepeated sequence. VNTRs can contribute to rapid adaptation by localized sequence duplication. They also can confound SNP-typing microarrays or mapping short-sequence reads and therefore must be accounted for in such analyses.

12.
The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of the complete set of a K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of the ‘magnifying glass’ effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is convenient for use, in particular, in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes).

13.
Enrichment of four tandem repeats of guanine (G) rich and cytosine (C) rich sequences in functionally important regions of the human genome forebodes the biological implications of four-stranded DNA structures, such as G-quadruplex and i-motif, that can form in these sequences. However, there have been few reports on the intramolecular formation of non-B DNA structures in fewer than four tandem repeats of G or C rich sequences. Here, using mechanical unfolding at the single-molecule level, electrophoretic mobility shift assay (EMSA), circular dichroism (CD), and ultraviolet (UV) spectroscopy, we report an intramolecularly folded non-B DNA structure in three tandem cytosine rich repeats, 5'-TGTC4ACAC4TGTC4ACA (ILPR-I3), in the human insulin linked polymorphic region (ILPR). The thermal denaturation analyses of the sequences with systematic C to T mutations have suggested that the structure is linchpinned by a stack of hemiprotonated cytosine pairs between two terminal C4 tracts. Mechanical unfolding and Br2 footprinting experiments on a mixture of the ILPR-I3 and a 5'-C4TGT fragment have further indicated that the structure serves as a building block for intermolecular i-motif formation. The existence of such a conformation under acidic or neutral pH complies with the strand-by-strand folding pathway of ILPR i-motif structures.

14.
Much attention has been devoted to identifying genomic patterns underlying the evolution of the human brain and its emergent advanced cognitive capabilities, which lie at the heart of differences distinguishing humans from chimpanzees, our closest living relatives. Here, we identify two particular intragene repeat structures of noncoding human DNA, spanning as much as a hundred kilobases, that are present in the human genome but are absent from the genomes of the chimpanzee and other nonhuman primates. Using our novel computational method Global Repeat Map, we examine tandem repeat structure in human and chimpanzee chromosome 1. In human chromosome 1, we find three higher order repeats (HORs), two of them novel and not reported previously, whereas in chimpanzee chromosome 1, we find only one HOR, a 2mer alphoid HOR instead of the human alphoid 11mer HOR. In human chromosome 1, we identify an HOR based on a 39-bp primary repeat unit, with secondary, tertiary, and quartic repeat units, fully embedded in the human hornerin gene, which is related to regenerating and psoriatic skin. Such an HOR is not found in chimpanzee chromosome 1. We find a remarkable human 3mer HOR organization based on the ~1.6-kb primary repeat unit, fully embedded within the neuroblastoma breakpoint family genes, which are related to the function of the human brain. Such HORs are not present in chimpanzees. In general, we find that human-chimpanzee differences are much larger for tandem repeats, in particular for HORs, than for gene sequences. This may be of great significance in light of recent studies that are beginning to reveal the large-scale regulatory architecture of the human genome, in particular the role of noncoding sequences. We hypothesize about the possible importance of human accelerated HOR patterns as components in the multilayered regulatory network of gene expression.

15.
Ames D, Murphy N, Helentjaris T, Sun N, Chandler V. Genetics, 2008, 179(3): 1693-1704
Using the compiled human genome sequence, we systematically cataloged all tandem repeats with periods between 20 and 2000 bp and defined two subsets whose consensus sequences were found at either single-locus tandem repeats (slTRs) or multilocus tandem repeats (mlTRs). Parameters compiled for these subsets provide insights into mechanisms underlying the creation and evolution of tandem repeats. Both subsets of tandem repeats are nonrandomly distributed in the genome, being found at higher frequency at many but not all chromosome ends; internal clusters of mlTRs were also observed. Despite the integral role of recombination in the biology of tandem repeats, recombination hotspots colocalized only with shorter microsatellites and not the longer repeats examined here. An increased frequency of slTRs was observed near imprinted genes, consistent with a functional role, while both slTRs and mlTRs were found more frequently near genes implicated in triplet expansion diseases, suggesting a general instability of these regions. Using our collated parameters, we identified 2230 slTRs as candidates for highly informative molecular markers.

16.
We explored the possibilities of whole-genome duplication (WGD) in prokaryotic species, where we performed statistical analyses of the configurations of the central angles between homologous tandem repeats (TRs) on the circular chromosomes. At first, we detected TRs on their chromosomes and identified equivalent tandem repeat pairs (ETRPs); here, an ETRP is defined as a pair of tandem repeats sequentially similar to each other. Then we carried out statistical analyses of the central angle distributions of the de...
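As a rough illustration of the quantity discussed above, the following Python sketch computes a central angle between two positions on a circular chromosome. The abstract is truncated and does not give the authors' exact definition, so this assumes the angle is the one subtended by the shorter arc between the two loci; the function name and arguments are hypothetical.

```python
def central_angle(pos_a, pos_b, genome_length):
    """Angle in degrees (0 to 180) between two loci on a circular genome.

    Assumption: the angle is measured along the shorter of the two arcs
    joining the loci; positions are base-pair coordinates.
    """
    d = abs(pos_a - pos_b) % genome_length
    shorter_arc = min(d, genome_length - d)
    return 360.0 * shorter_arc / genome_length


if __name__ == "__main__":
    # Two loci on opposite sides of a 5 Mb circular chromosome.
    print(central_angle(0, 2_500_000, 5_000_000))
```

Under this definition, loci that sit diametrically opposite each other give 180 degrees, and loci that are neighbors across the coordinate origin give an angle close to zero.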

17.
Expansion or shrinkage of existing tandem repeats (TRs) associated with various biological processes has been actively studied in both prokaryotic and eukaryotic genomes, while their origin and biological implications remain mostly unknown. Here we describe various duplications (de novo TRs) that occurred in the coding region of a β-lactamase gene, where a conserved structure called the omega loop is encoded. These duplications, which occurred under selection using ceftazidime, conferred substrate spectrum extension to include the antibiotic. Under selective pressure with one of the original substrates (amoxicillin), a high level of reversion occurred in the mutant β-lactamase genes, completing a cycle back to the original substrate spectrum. The de novo TRs coupled with reversion constitute a genetic toggling mechanism enabling reversible switching between the two phases of the substrate spectrum of β-lactamases. This toggle exemplifies the effective adaptation of de novo TRs for enhanced bacterial survival. We found pairs of direct repeats that mediated the DNA duplication (TR formation). In addition, we found different duos of sequences that mediated the DNA duplication. These novel elements, which we named SCSs (same-strand complementary sequences), were also found associated with β-lactamase TR mutations from clinical isolates. Both direct repeats and SCSs had a high correlation with TRs in diverse bacterial genomes throughout the major phylogenetic lineages, suggesting that they comprise a fundamental mechanism shaping bacterial evolution.

18.
Assessments of DNA inhomogeneities in yeast chromosome III. (Cited 6 times in total: 3 self-citations, 3 citations by others)
With the sequencing of the first complete eukaryotic chromosome, chromosome III of yeast (YCIII), of length 315 kb, several types of questions concerning chromosomal organization and the heterogeneity of eukaryotic DNA sequences can be approached. We have undertaken extensive analysis of YCIII with the goals of: (1) discerning patterns and anomalies in the occurrences of short oligonucleotides; (2) characterizing the nature and locations of significant direct and inverted repeats; (3) delimiting regions unusually rich in particular base types (e.g., G+C, purines); and (4) analyzing the distributions of markers of interest, e.g., δ (delta) elements, ARS (autonomous replicating sequences), special oligonucleotides, close repeats and close dyad pairings, and gene sequences. YCIII reveals several distinctive sequence features, including: (i) a relative abundance of significant local and global repeats highlighting five genes containing substantial close or tandem DNA repeats; (ii) an anomalous distribution of delta elements involving two clusters and a long gap; (iii) a significantly even distribution of ARS; (iv) a relative increase in the frequency of T runs and AT iterations downstream of genes and A runs upstream of genes; and (v) two regions of complex repetitive sequences and anomalous DNA composition, 29000-31000 and 291000-295000, the latter centered at the HMRa locus. Interpretations of these findings for chromosomal organization and implications for regulation of gene expression are discussed.

19.

Background

Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved for numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within a distance of 2^-L of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations.
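The suffix property stated above can be checked with a minimal Python sketch of the classic two-dimensional CGR for DNA. The corner assignment (A, C, G, T on the corners of the unit square), the starting point at the center, and the use of the maximum coordinate difference as the distance are assumptions made for this illustration, not details taken from the report.

```python
# Assumed corner layout for the unit square; other layouts are possible.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_point(seq, start=(0.5, 0.5)):
    """Iterate the CGR map: move halfway toward the corner of each base."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def max_coordinate_distance(p, q):
    """Largest absolute difference between the coordinates of two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


if __name__ == "__main__":
    a = cgr_point("ACGTTGCA")
    b = cgr_point("TTTTTGCA")  # shares the 5-long suffix "TTGCA" with a
    print(max_coordinate_distance(a, b), 2 ** -5)
```

Each step halves the influence of everything that came before, so after a shared suffix of length L the two points can differ by at most 2^-L in either coordinate.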

Results

The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm.
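For readers unfamiliar with the Rabin-Karp algorithm mentioned above, here is a minimal Python sketch using a conventional polynomial rolling hash. It does not reproduce the CGR-based hash described in the report; the `base` and `mod` values are arbitrary illustrative choices.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all start indices at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    p_hash = t_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % mod
        t_hash = (t_hash * base + ord(text[k])) % mod
    matches = []
    for i in range(n - m + 1):
        # Compare strings only when hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches


if __name__ == "__main__":
    print(rabin_karp("GCAGCAGCA", "GCA"))
```

The window hash is updated in constant time per position, which is the property a CGR coordinate would need to provide in order to stand in for the hash.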

Conclusions

The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.

20.
An efficient algorithm for detecting approximate tandem repeats in genomic sequences is presented. The algorithm is based on innovative statistical criteria to detect candidate regions which may include tandem repeats; these regions are subsequently verified by alignments based on dynamic programming. No prior information about the period size or pattern is needed. Also, the algorithm is virtually capable of detecting repeats with any period. An implementation of the algorithm is compared with the two state-of-the-art tandem repeats detection tools to demonstrate its effectiveness both on natural and synthetic data. The algorithm is available at www.cs.brown.edu/people/domanic/tandem/.
