Similar literature
 20 similar articles found (search time: 62 ms)
1.
Tandem repeats occur frequently in biological sequences. They are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat in a sliding window. In this article, we focus on the case in which an unknown number of tandem repeat segments of the same pattern are dispersed throughout a sequence. We construct a probabilistic generative model for the tandem repeats, where the sequence pattern is represented by a motif matrix. A Bayesian approach is adopted for inference under this model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution in order to infer both the motif matrix of the tandem repeats and the locations of the repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. Experiments on both synthetic and real data show that this new approach is powerful in detecting dispersed short tandem repeats. As far as we know, this is the first work to adopt RJMCMC algorithms for the detection of tandem repeats.

2.
MOTIVATION: A tandem repeat in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats occur in the genomes of both eukaryotic and prokaryotic organisms. They are important in numerous fields including disease diagnosis, mapping studies, human identity testing (DNA fingerprinting), sequence homology and population studies. Although tandem repeats have been used by biologists for many years, there are few tools available for performing an exhaustive search for all tandem repeats in a given sequence. RESULTS: In this paper we describe an efficient algorithm for finding all tandem repeats within a sequence, under the edit distance measure. The contributions of this paper are two-fold: theoretical and practical. We present a precise definition for tandem repeats over the edit distance and an efficient, deterministic algorithm for finding these repeats. AVAILABILITY: The algorithm has been implemented in C++, and the software is available upon request and can be used at http://www.sci.brooklyn.cuny.edu/~sokol/trepeats. The use of this tool will assist biologists in discovering new ways that tandem repeats affect both the structure and function of DNA and protein molecules.

3.
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, but also analyzes multiple files containing multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user-defined parameters/options let researchers apply different minimum motif repeat search criteria to different motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some tandem repeats play important roles in the regulation of gene expression, whereas others have no known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in the number of repeats as well as in repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, thus differing from the untranscribed repeats found in genomes.
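
The idea of applying a different minimum copy number to each motif length can be illustrated with a short sketch. This is not E-TRA's code or interface: the function names and the `min_copies_by_len` parameter are hypothetical, and the search below handles only exact (perfect) repeats over the alphabet A, C, G, T using a regular-expression backreference.

```python
import re


def is_primitive(motif):
    # A motif is primitive if it is not itself a repetition of a shorter unit
    # (so "AA" is rejected as a dinucleotide motif; it is a run of "A").
    return motif not in (motif + motif)[1:-1]


def find_exact_tandem_repeats(seq, min_copies_by_len):
    """Return sorted (start, motif, copies) tuples for exact tandem repeats.

    min_copies_by_len is a hypothetical parameter mapping motif length to the
    minimum number of contiguous copies required for that length.
    """
    seq = seq.upper()
    hits = []
    for k, min_copies in sorted(min_copies_by_len.items()):
        # One captured unit of length k, followed by at least
        # (min_copies - 1) further identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                copies = (m.end() - m.start()) // k
                hits.append((m.start(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    # Require at least 3 copies for both di- and trinucleotide motifs.
    for hit in find_exact_tandem_repeats("TTACACACACGGATGATGATGCC", {2: 3, 3: 3}):
        print(hit)
```

Because the scan is left to right and non-overlapping, the reported motif is the leftmost phase of the repeat (here `GAT` rather than `ATG`); a real tool would likely normalise motifs to a canonical rotation.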

4.
Yang J, Meng Q, Liu XQ. Molecular Microbiology, 2004, 51(4):1185-1192
Protein splicing inteins can be as small as approximately 130 aa, or up to approximately 600 aa when harbouring an endonuclease domain. Here we report the identification and characterization of an unusually large intein, 1650 aa long and the largest of the known inteins, encoded by the replicative DNA helicase gene dnaB of the oceanic N2-fixing cyanobacterium Trichodesmium erythraeum. This Ter DnaB-1 intein co-exists with a 177-aa mini-intein in the same host protein and harbours large tandem repeats in which an 84-aa sequence is repeated 16 times. Comparison between these tandem repeats and the recently reported tandem repeats of the Ter DnaE-1 intein revealed differences and similarities. The two sets of tandem repeats, residing in different inteins of different host proteins, differ by 50% in size and have little sequence similarity. Tandem repeats in the Ter DnaB-1 intein were required for protein splicing activity when tested in Escherichia coli, in contrast to the tandem repeats of the Ter DnaE-1 intein, which inhibited protein splicing. On the other hand, the tandem repeats of both inteins are located in the same corresponding region of the intein sequence and have the same number of repeating units. These observations suggest that the two sets of tandem repeats could be related but have diverged greatly in size, sequence and effect on protein splicing. Alternatively, they could have independent origins but evolved certain similarities because of common constraints in structure and maintenance.

5.
Tandem repeats finder: a program to analyze DNA sequences.   Total citations: 66 (self-citations: 3; citations by others: 63)
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.
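
To make "percent identity between adjacent pattern copies" concrete, the sketch below scores a candidate period by comparing each base with the base one period downstream. This is a deliberately simplified illustration and not the Tandem Repeats Finder algorithm: it assumes the period is already known, ignores indels entirely, and the function name is hypothetical.

```python
def adjacent_copy_identity(seq, period):
    """Fraction of positions i where seq[i] equals seq[i + period].

    Simplifying assumptions: substitutions only (no indels), and the
    candidate period is supplied by the caller.
    """
    if period <= 0 or period >= len(seq):
        raise ValueError("period must be between 1 and len(seq) - 1")
    pairs = len(seq) - period
    matches = sum(seq[i] == seq[i + period] for i in range(pairs))
    return matches / pairs


if __name__ == "__main__":
    perfect = "ACGACGACGACG"      # four exact copies of ACG
    mutated = "ACGACGACTACG"      # one substitution in the third copy
    print(adjacent_copy_identity(perfect, 3))
    print(adjacent_copy_identity(mutated, 3))
```

A single substitution lowers the score twice, once against the preceding copy and once against the following one, which is one reason a statistical model of matches between adjacent copies is more informative than a simple count of mismatches against a consensus.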

6.
MOTIVATION: One of the main tasks of DNA sequence analysis is the identification of repetitive patterns. DNA symbol repetitions play a key role in a number of applications, including prediction of gene and exon locations, identification of diseases, reconstruction of human evolutionary history and DNA forensics. RESULTS: A new approach towards identification of tandem repeats in DNA sequences is proposed. The approach is a refinement of a previously considered method based on the complex periodicity transform. The refinement is obtained, among other things, by mapping DNA symbols to pure quaternions. This mapping results in an enhanced, symbol-balanced sensitivity of the transform to DNA patterns, and an unambiguous threshold selection criterion. The computational efficiency of the transform is further improved, and the coupling of the computation with the period value is removed, thereby facilitating parallel implementation of the algorithm. Additionally, a post-processing stage is inserted into the algorithm, enabling unambiguous display of results in a convenient graphical format. Comparison of the quaternionic periodicity transform with two well-known pattern detection techniques shows that the new approach is competitive with these two techniques in the detection of exact and approximate repeats.

7.
Micro- and minisatellites constitute an essential part of DNA with low sequence complexity and perform a number of important functions. The TandemSWAN program was used to search the human genome for tandem repeats with a repeat unit length of up to 70 bp, including repeats with a large number of nucleotide substitutions. It was shown that, for a significant fraction of the minisatellites found by the program with a repeat unit length of less than 25 bp, a shorter repeated motif can be discerned in the sequence, which is often similar to the sequence of microsatellites occurring widely in the human genome. A model of the hierarchical origin of minisatellites in the human genome was proposed.

8.
Presence of a Simple Tandem Repeat in the ITS1 Region of the Xylariales   Total citations: 1 (self-citations: 0; citations by others: 1)
A Simple Tandem Repeat sequence of 11 nucleotides has been found in the ITS1 region of the rDNA of members of the order Xylariales. The number of repetitions detected ranged from one to six, and they could be found in pure tandem or interspersed. The same core sequences have also been found in DNA from other organisms, although usually not repeated in tandem. These repetitions could have been generated by slipped strand mispairing. The presence of this sequence increases the normal rate of divergence in the ITS1 of the Xylariales. The phylogenetic implications of the presence of this sequence in the molecular taxonomy of Xylariales are also discussed. Received: 19 October 2000 / Accepted: 21 December 2000

9.
Tandem repeat sequences are frequently associated with gene silencing phenomena. The Arabidopsis thaliana FWA gene contains two tandem repeats and is an efficient target for RNA-directed de novo DNA methylation when it is transformed into plants. We showed that the FWA tandem repeats are necessary and sufficient for de novo DNA methylation and that repeated character rather than intrinsic sequence is likely important. Endogenous FWA can adopt either of two stable epigenetic states: methylated and silenced or unmethylated and active. Surprisingly, we found small interfering RNAs (siRNAs) associated with FWA in both states. Despite this, only the methylated form of endogenous FWA could recruit further RNA-directed DNA methylation or cause efficient de novo methylation of transgenic FWA. This suggests that RNA-directed DNA methylation occurs in two steps: first, the initial recruitment of the siRNA-producing machinery, and second, siRNA-directed DNA methylation either in cis or in trans. The efficiency of this second step varies depending on the nature of the siRNA-producing locus, and at some loci, it may require pre-existing chromatin modifications such as DNA methylation itself. Enhancement of RNA-directed DNA methylation by pre-existing DNA methylation could create a self-reinforcing system to enhance the stability of silencing. Tandem repeats throughout the Arabidopsis genome produce siRNAs, suggesting that repeat acquisition may be a general mechanism for the evolution of gene silencing.

10.
MOTIVATION: Tandem repeats are associated with disease genes, play an important role in evolution and are important in genomic organization and function. Although much research has been done on short perfect patterns of repeats, there has been less focus on imperfect repeats. Thus, there is an acute need for a tandem repeats database that provides reliable and up-to-date information on both perfect and imperfect tandem repeats in the human genome and relates these to disease genes. RESULTS: This paper presents a web-accessible relational tandem repeats database that relates tandem repeats to gene locations and disease genes of the human genome. In contrast to other available databases, this database identifies both perfect and imperfect repeats with unit lengths of 1-2000 bp. The utility of this database is illustrated by analysing these repeats for their distribution and frequencies across chromosomes and genomic locations and between protein-coding and non-coding regions. The applicability of this database to identifying diseases associated with previously uncharacterized tandem repeats is demonstrated.

11.
Boeva VA, Fridman MV, Makeev VIu. Biofizika, 2006, 51(4):650-655
Micro- and minisatellites constitute an essential part of DNA with low sequence complexity and carry out several important functions. A search for tandem repeats in the human genome with a repeat unit length of up to 70 bp, including repeats with a great number of nucleotide substitutions, was performed using the TandemSWAN program. It was shown that, for a considerable number of minisatellites with a repeat unit length of less than 25 nt, a shorter repeating motif can be distinguished in the sequence of the repeat, which is often similar to the sequence of microsatellites widely occurring in the human genome. A model of the hierarchical origin of minisatellites in the human genome is suggested.

12.
We obtained the complete mitochondrial genome of U. thibetanus mupinensis by sequencing PCR fragments obtained with 18 primers that we designed. The results indicate that the mtDNA is 16,868 bp in size and encodes 13 protein genes, 22 tRNA genes, and 2 rRNA genes, with an overall H-strand base composition of 31.2% A, 25.4% C, 15.5% G and 27.9% T. The control region (CR), located between tRNA-Pro and tRNA-Phe, is 1422 bp in size, constitutes 8.43% of the whole genome, has a GC content of 51.9%, and contains one 6 bp tandem repeat and two 10 bp tandem repeats identified using the Tandem Repeats Finder. The U. thibetanus mupinensis mitochondrial genome shares high similarity with those of three other Ursidae: U. americanus (91.46%), U. arctos (89.25%) and U. maritimus (87.66%).
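
The proportions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (genome length, control region length, H-strand base composition); the variable and function names are illustrative.

```python
# Figures quoted in the abstract above.
GENOME_LENGTH_BP = 16868
CONTROL_REGION_BP = 1422
H_STRAND_COMPOSITION = {"A": 31.2, "C": 25.4, "G": 15.5, "T": 27.9}


def percent_of_genome(region_bp, genome_bp):
    """Length of a region expressed as a percentage of the genome length."""
    return 100.0 * region_bp / genome_bp


cr_share = percent_of_genome(CONTROL_REGION_BP, GENOME_LENGTH_BP)
composition_total = sum(H_STRAND_COMPOSITION.values())

if __name__ == "__main__":
    print(f"control region share of genome: {cr_share:.2f}%")
    print(f"sum of base percentages: {composition_total:.1f}%")
```

The control region share comes out at 8.43%, matching the abstract, and the four base percentages sum to 100%.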

13.
Repetitive DNA sequences in the bovine corticotropin-beta-lipotropin precursor gene region have been mapped and subjected to nucleotide sequence analysis. Two of the four repetitive DNA segments found are located in the 5'-flanking region, and one each within the intervening sequences. Each repetitive DNA segment contains one to three highly homologous unit sequences with an approximate length of 120 base pairs. All the unit sequences are flanked on the 3' side by tandem repeats. There are about 10^5 copies of the repetitive DNA in the bovine genome. Comparison of the bovine repetitive sequences with those of other mammalian species reveals the presence of a homologous segment of approximately 40 base pairs. This segment and the region preceding it in the bovine repetitive DNA exhibit sequence homology with the region encompassing the origin of DNA replication in papovaviruses.

14.
A family of four satellite DNAs has been characterized in the genome of the bivalve mollusc, Donax trunculus. All share HindIII sites, a similar monomer length of about 160 base pairs (bp), and the related oligonucleotide motifs GGTCA and GGGTTA, repeated six to 15 times within the repetitive units. The motif GGTCA is common to all members of the satellite family. It is present in three of them in both orientations, interspersed within nonrepetitive DNA sequences. The hexanucleotide GGGTTA appears to be the main building element of one of the satellites forming a prominent subrepeat structure in conjunction with the 5-bp motif. The former has been also found in perfect tandem repeats in a junction region adjacent to the proper satellite sequence. Southern analysis has revealed that (GGGTTA)n and/or related sequences are abundant and widely distributed in the D. trunculus genome. The distribution observed is consistent with the concurrence of the scattering of short sequence motifs throughout the genome and the spread of longer DNA segments, with concomitant formation of satellite monomer repeats. Both kinds of dispersion may have contributed to the observed complex arrangement of the HindIII satellite DNA family in Donax. Received: 28 May 1996 / Accepted: 30 July 1996

15.
Human mammary cells present on the cell surface a polymorphic epithelial mucin (PEM) which is developmentally regulated and aberrantly expressed in tumors. PEM carries tumor-associated epitopes recognized by the monoclonal antibodies HMFG-1, HMFG-2, and SM-3. Previously isolated partial cDNA clones revealed that the core protein contained a large domain consisting of variable numbers of 20-amino acid repeat units. We now report the full sequence for PEM, as deduced from cDNA sequences. The encoded protein consists of three distinct regions: the amino terminus consisting of a putative signal peptide and degenerate repeats; the major portion of the protein which is the tandem repeat region; the carboxyl terminus consisting of degenerate tandem repeats and a unique sequence containing a transmembrane sequence and a cytoplasmic tail. Potential O-glycosylation sites (serines or threonines) make up more than one-fourth of the amino acids. Length variations in the tandem repeat result in PEM being an expressed variable number tandem repeat locus. Tandem repeats appear to be a general characteristic of mucin core proteins.

16.
Simple sequence repeats (SSRs) are omnipresent in prokaryotes and eukaryotes and are found throughout the genome, in both protein-coding and noncoding regions. In the present study, the whole-genome sequences of seven chromosomes (Shigella flexneri 2a str. 301 and 2457T, Shigella sonnei, Escherichia coli K12, Mycobacterium tuberculosis, Mycobacterium leprae and Staphylococcus saprophyticus) were downloaded from the GenBank database to determine the abundance, distribution and composition of SSRs, and to compare the tandem repeats in each real genome with those in a randomized genome (generated with a sequence shuffling tool). The data obtained show that: (i) tandem repeats are widely distributed throughout the genomes; (ii) SSRs are differentially distributed between coding and noncoding regions in the investigated Shigella genomes; (iii) the total frequency of SSRs is higher in noncoding regions than in coding regions; (iv) in all investigated chromosomes, the ratio of trinucleotide SSRs is much higher in the real genomes than in the randomized genomes, whereas that of dinucleotide SSRs is lower; (v) the ratio of total and mononucleotide SSRs is higher in the real genome than in the randomized genome for E. coli K12, S. flexneri str. 301 and S. saprophyticus, lower for S. flexneri str. 2457T, S. sonnei and M. tuberculosis, and approximately the same for M. leprae; (vi) the frequency of codon repetitions varies considerably depending on the type of encoded amino acid.
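
The comparison between a real genome and a shuffled ("randomized") one can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: the abstract does not name the shuffling tool or its SSR thresholds, so the mononucleotide-preserving shuffle, the exact-repeat definition, and all function names and parameters here are hypothetical.

```python
import random
import re


def is_primitive(motif):
    # Reject motifs that are repetitions of a shorter unit (e.g. "AA", "ACAC").
    return motif not in (motif + motif)[1:-1]


def count_ssrs(seq, unit_len, min_copies):
    """Count non-overlapping exact SSRs with the given unit length."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return sum(1 for m in pattern.finditer(seq.upper()) if is_primitive(m.group(1)))


def shuffle_sequence(seq, rng):
    """Return a permutation of seq; base composition is preserved."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def real_vs_shuffled(seq, unit_len, min_copies, n_shuffles=100, seed=0):
    """Return (SSR count in seq, mean SSR count over shuffled copies)."""
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    real = count_ssrs(seq, unit_len, min_copies)
    shuffled_total = sum(
        count_ssrs(shuffle_sequence(seq, rng), unit_len, min_copies)
        for _ in range(n_shuffles)
    )
    return real, shuffled_total / n_shuffles


if __name__ == "__main__":
    toy = "ACACACTTGGATGATGAT"
    print("dinucleotide SSRs (real, shuffled mean):", real_vs_shuffled(toy, 2, 3))
    print("trinucleotide SSRs (real, shuffled mean):", real_vs_shuffled(toy, 3, 3))
```

A ratio of the real count to the shuffled mean above 1 would indicate over-representation relative to chance for that unit length; on real genomes the shuffle would typically be applied per chromosome, and a shuffle that preserves dinucleotide frequencies may be a stricter null model.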

17.
Exploiting a serendipitously observed bovine male-specific signal, generated by the mouse pSP64.2.5EI minisatellite probe, we have cloned a bovine (Bos taurus) Y-specific sequence: btDYZ-1. This sequence is composed of 60 tandem repetitions of a motif consisting of two parts: a 40-bp-long unit, showing a mean divergence of 27% between repeats, separated from the next repeat by a TG-rich stretch varying in length between 12 and 63 bp. The number of copies of this repeated motif has been estimated at 6 × 10^4 per male genome. As a consequence, the corresponding satellite, DYZ-1, might represent approximately 1/20 of the bovine Y chromosome. btDYZ-1 has been mapped by in situ hybridization to the pericentric region of the Y chromosome. It is characterized by a substantial genetic polymorphism and has been shown to be conserved within the Bos and Bison genera of the Bovinae subfamily. This sequence is being used to develop a sexing procedure for bovine preimplantation embryos based on the polymerase chain reaction.

18.

Background

Polymorphic tandem repeat typing is a new generic technology which has proven to be very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila and Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. The development of an assay then requires the evaluation of tandem repeat polymorphism on well-selected sets of isolates. In the case of major human pathogens, such as S. aureus, more than one strain is being sequenced, so that the tandem repeats most likely to be polymorphic can now be selected in silico based on genome sequence comparison.

Results

In addition to the previously described general Tandem Repeats Database, we have developed a tool to automatically identify tandem repeats of a different length in the genome sequence of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The results of the comparisons are parsed in a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length, predicted size difference, etc. Comparisons are available for 16 bacterial species, and the orthopox viruses, including the variola virus and three of its close neighbors.

Conclusions

We present an internet-based resource to help develop and perform tandem-repeat-based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats differing between different genome sequences from the same species. The "Blast in the Tandem Repeats Database" facilitates the search for a known tandem repeat and the prediction of amplification product sizes. The "Bacterial Genotyping Page" is a service for strain identification at the subspecies level.

19.
Tandem repeats within the inverted terminal repetition of vaccinia virus DNA   Total citations: 23 (self-citations: 0; citations by others: 23)
Wittek R, Moss B. Cell, 1980, 21(1):277-284
A tandemly repeated sequence within the genome of vaccinia virus is cut into fragments of approximately 70 bp by Hinf I, Taq I or Mbo II. The 70 bp repetition was localized within the much larger (10,300 bp) inverted terminal repetition by restriction analysis of cloned DNA fragments and by hybridization of the purified 70 bp repeat to vaccinia virus DNA restriction fragments. The molar abundance of the 70 bp fragment corresponds to a 30-fold repetition at each end of the genome. The repeating restriction endonuclease sites were mapped by agarose gel electrophoresis of partial Hinf I digests of the terminally labeled cloned DNA fragment. The first of 13 repetitive Hinf I sites occurred approximately 150 bp from the end of the cloned DNA. After an intervening sequence of approximately 435 bp, a second series of 17 repetitive Hinf I sites occurred. The DNA between the two blocks of repetitions has a unique sequence containing single Dde I, Alu I and Sau 3A sites. Tandem repeats within the inverted terminal repetition could serve to accelerate self-annealing of single strands of DNA to form circular structures during replication.

20.
Zhang D, Yang Q, Ding Y, Cao X, Xue Y, Cheng Z. Genomics, 2008, 92(2):107-114
Tandem repetitive sequences are DNA motifs common in the genomes of eukaryotic species and are often embedded in heterochromatic regions. In most eukaryotes, ribosomal genes, as well as centromeres and telomeres or subtelomeres, are associated with abundant tandem arrays of repetitive sequences and typically represent the final barriers to completion of whole-genome sequencing. The nature of these repeats makes it difficult to estimate their actual sizes. In this study, combining the two cytological techniques DNA fiber-FISH and pachytene chromosome FISH allowed us to characterize the tandem repeats distributed genome wide in Antirrhinum majus and identify four types of tandem repeats, 45S rDNA, 5S rDNA, CentA1, and CentA2, representing the major tandem repetitive components, which were estimated to have a total length of 18.50 Mb and account for 3.59% of the A. majus genome. FISH examination revealed that all the tandem repeats correspond to heterochromatic knobs along the pachytene chromosomes. Moreover, the methylation status of the tandem repeats was investigated in both somatic cells and pollen mother cells from anther tissues using an antibody against 5-methylcytosine combined with sequential FISH analyses. Our results showed that these repeats were hypomethylated in anther tissues, especially in the pollen mother cells at pachytene stage.
