Similar Articles
20 similar articles found.
3.
Simple sequence repeats (SSRs) are found in most organisms. They play a major role in studies of genetic diversity and are useful as diagnostic markers for many diseases. The simple sequence repeats database (SSRD) for the human genome was created to provide easy access to such repeats for analysis and to help researchers understand their biological significance. The data include the abundance and distribution of SSRs in the coding and non-coding regions of the genome, as well as their association with the UTRs of genes. The exact locations of repeats with respect to genomic regions (such as UTRs, exons, introns or intergenic regions) and their association with STS markers are also highlighted. The resource will facilitate repeat sequence analysis in the human genome and the understanding of the functional and evolutionary significance of simple sequence repeats. SSRD is available through two websites, http://www.ccmb.res.in/ssr and http://www.ingenovis.com/ssr.
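SSRD reports where each repeat falls (UTR, exon, intron or intergenic region). A minimal sketch of that classification step, assuming repeats are given by their start coordinates and the annotation is a list of half-open (start, end, label) intervals on one chromosome; the interval format, the `PRIORITY` order and the function names are hypothetical, not SSRD's actual schema.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical annotation format: half-open (start, end, label) intervals on one chromosome.
Region = Tuple[int, int, str]

# Assumed precedence when features overlap, e.g. a UTR lying inside an exon interval.
PRIORITY = {"UTR": 0, "exon": 1, "intron": 2}


def classify_position(pos: int, regions: List[Region]) -> str:
    """Return the highest-priority feature covering pos, or 'intergenic' if none does."""
    hits = [label for start, end, label in regions if start <= pos < end]
    if not hits:
        return "intergenic"
    return min(hits, key=lambda label: PRIORITY[label])


def tally_repeats(repeat_starts: Iterable[int], regions: List[Region]) -> Counter:
    """Count repeats per region class, classifying each repeat by its start coordinate."""
    return Counter(classify_position(pos, regions) for pos in repeat_starts)


if __name__ == "__main__":
    # Toy annotation and repeat coordinates, for illustration only.
    annotation = [(100, 200, "exon"), (100, 130, "UTR"), (200, 500, "intron"), (500, 600, "exon")]
    print(tally_repeats([50, 110, 150, 250, 550, 900], annotation))
```

The linear scan per repeat is adequate for a toy annotation; a genome-scale version would more likely use a sorted interval index.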

5.
The mouse genomic locus containing the oncogene c-mos was analyzed for repetitive DNA sequences. We found a single B1 repeat 10 kb upstream of c-mos and three B1 repeats 0.6 kb, 2.7 kb and 5.4 kb downstream of it. The B1 repeat closest to c-mos contains an internal 7-bp duplication and an 18-bp insertion. Located between the last two B1 repeats is a copy of a novel mouse repeat. Sequence comparison of three copies of this novel repeat family shows that they a) contain a conserved BglII site, b) are approximately 420 bp long, c) possess internal 50-bp polypurine tracts, and d) have structural characteristics of transposable elements. They are present at about 1500 copies per haploid genome in the mouse, but are not detectable in the DNA of other mammals. The BglII repeat downstream of c-mos is interrupted by a single 632-bp LTR element. We estimate that approximately 1200 copies of this element are present per haploid genome in BALB/c mice. It shares sequence homology in the R-U5 region with an LTR element found in 129/J mice.

7.
In a recent study, we reported that the combined average mutation rate of 10 di-, 6 tri-, and 8 tetranucleotide repeats in Drosophila melanogaster was 6.3 × 10⁻⁶ mutations per locus per generation, substantially below the rates of microsatellites in mammals studied to date (10⁻² to 10⁻⁵ per locus per generation). To obtain a more precise estimate of the mutation rate for dinucleotide repeat motifs alone, we assayed 39 new dinucleotide microsatellite loci in the mutation-accumulation lines from our earlier study. Our estimate of the mutation rate for a total of 49 dinucleotide repeats is 9.3 × 10⁻⁶ per locus per generation, only slightly higher than the estimate from our earlier study. We also estimated the relative difference in microsatellite mutation rate among di-, tri-, and tetranucleotide repeats in the D. melanogaster genome using a method based on population variation, and found that tri- and tetranucleotide repeats mutate 6.4 and 8.4 times more slowly, respectively, than dinucleotide repeats. The slower mutation rates of tri- and tetranucleotide repeats appear to be associated with the relatively short repeat unit length of these motifs in the D. melanogaster genome. A positive correlation between repeat unit length and allelic variation suggests that mutation rate increases as the repeat unit length of microsatellites increases.
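The rates above are expressed per locus per generation. A minimal sketch of the usual mutation-accumulation estimator, assuming every locus was scored in every line and all lines were propagated for the same number of generations; the function names and the counts in the usage line are hypothetical, not data from the study.

```python
def mutation_rate(mutations: int, loci: int, lines: int, generations: int) -> float:
    """Mutations per locus per generation from a mutation-accumulation experiment.

    Assumes each of `loci` loci was genotyped in each of `lines` lines after
    `generations` generations of accumulation. Whether the denominator should
    also count both alleles of a diploid is a design choice left out here.
    """
    locus_generations = loci * lines * generations
    if locus_generations <= 0:
        raise ValueError("loci, lines and generations must all be positive")
    return mutations / locus_generations


def fold_slower(reference_rate: float, other_rate: float) -> float:
    """How many times more slowly `other_rate` mutates than `reference_rate`."""
    return reference_rate / other_rate


if __name__ == "__main__":
    # Hypothetical: 3 mutations across 10 loci, 50 lines, 100 generations.
    print(mutation_rate(3, 10, 50, 100))  # 6e-05
```

With the hypothetical counts, 3 mutations over 10 × 50 × 100 = 50,000 locus-generations gives 6 × 10⁻⁵.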

9.
The interspersed repeat content of mammalian genomes has been best characterized in human, mouse and cow. In this study, we carried out de novo identification of repeated elements in the equine genome and identified previously unknown elements present at low copy number. The equine genome contains typical eutherian mammal repeats, but also has a significant number of hybrid repeats in addition to clade-specific Long Interspersed Nuclear Elements (LINEs). Equus caballus clade-specific LINE 1 (L1) repeats can be classified into approximately five subfamilies, three of which have undergone significant expansion. There are 1115 full-length copies of these equine L1 elements, but of the 103 presumptive active copies, 93 fall within a single subfamily, indicating a rapid recent expansion of this subfamily. We also analysed both interspersed and simple sequence repeats (SSRs) genome-wide, finding that some repeat classes are spatially correlated with each other as well as with G+C content and gene density. Based on these spatial correlations, we have confirmed that recently described ancestral vs. clade-specific genome territories can be defined by their repeat content. The clade-specific Short Interspersed Nuclear Element correlations were scattered over the genome and appear to have been extensively remodelled. In contrast, territories enriched for ancestral repeats tended to be contiguous domains. To determine whether the latter territories were evolutionarily conserved, we compared these results with a similar analysis of the human genome, and observed similar ancestral repeat-enriched domains. These results indicate that ancestral, evolutionarily conserved mammalian genome territories can be identified on the basis of repeat content alone. Interspersed repeats of different ages appear to be analogous to geologic strata, allowing identification of ancient vs. newly remodelled regions of mammalian genomes.
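The spatial correlations described above compare repeat density with G+C content along the genome. A minimal sketch of one way to compute such a correlation, assuming fixed non-overlapping windows, repeats represented by their start coordinates and a plain Pearson coefficient; the window scheme and function names are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import Iterable, List, Sequence


def gc_fraction(window: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases in a window."""
    window = window.upper()
    acgt = sum(window.count(base) for base in "ACGT")
    return (window.count("G") + window.count("C")) / acgt if acgt else 0.0


def tile(seq: str, size: int) -> List[str]:
    """Non-overlapping windows of `size` bases; a trailing partial window is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def counts_per_window(starts: Iterable[int], seq_len: int, size: int) -> List[int]:
    """Number of repeat start coordinates falling in each full window."""
    counts = [0] * (seq_len // size)
    for pos in starts:
        idx = pos // size
        if 0 <= idx < len(counts):
            counts[idx] += 1
    return counts


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series of at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy sequence of three 8-bp windows and hypothetical repeat start positions.
    seq = "AAAAAAAA" + "GGCCGGCC" + "ATGCATGC"
    gc = [gc_fraction(w) for w in tile(seq, 8)]
    density = counts_per_window([9, 11, 13, 18], len(seq), 8)
    print(pearson(gc, density))
```

Real analyses would use much larger windows (tens to hundreds of kilobases) and would likely prefer a rank correlation when densities are skewed.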

10.
Number, distribution and density of microsatellite repeats in the mouse genome.
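The authors analyzed the number and density of each type of microsatellite (repeats with units of 1-6 bases) on each chromosome of the mouse genome and in its introns, exons and intergenic regions. SSRs account for about 2.85% of the mouse genome; of these, 46.2% lie in intergenic regions, 4.75% in exons and 49.05% in introns, i.e. non-coding regions are rich in microsatellites. The number of microsatellites is related to the size of the chromosome or genic region, but their density is not closely related to that size. In the exonic regions of chromosome 4, all six types of SSR are less abundant than in other regions. A, T, AC, AG, AT, AAC, AAG, AGG, AAAC, AAAG, AAAT, AACC, AAAAC, AAAAG, AAAAT, AAACC, AAAGG, AAGAG, AAAAAC, AAAAAG, AAAAAT, AAAGAG, ACACAT, ACAGAG, ACAGGC and ACATAT are the predominant SSR types in the mouse genome, whereas some SSRs with 5-base repeat units are absent from one or even several chromosomes of the mouse genome.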
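The abstract separates the number of SSRs in a region from their density. A minimal sketch of both summaries, assuming per-region SSR counts and region sizes in base pairs are already available; the function names and the numbers in the example are hypothetical, not the study's data.

```python
from typing import Dict


def region_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all SSRs that fall in each region class."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no SSRs to summarize")
    return {region: 100.0 * n / total for region, n in counts.items()}


def density_per_mb(counts: Dict[str, int], sizes_bp: Dict[str, int]) -> Dict[str, float]:
    """SSRs per megabase of sequence in each region class."""
    return {region: n / (sizes_bp[region] / 1_000_000) for region, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts and region sizes, for illustration only.
    counts = {"exon": 40, "intron": 400, "intergenic": 360}
    sizes = {"exon": 2_000_000, "intron": 20_000_000, "intergenic": 30_000_000}
    print(region_shares(counts))   # exon 5.0, intron 50.0, intergenic 45.0 (%)
    print(density_per_mb(counts, sizes))  # exon 20.0, intron 20.0, intergenic 12.0 per Mb
```

In this toy example introns hold ten times as many SSRs as exons but at the same density, which is the distinction the abstract draws between number and density.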

11.
Tandem repeats occur frequently in biological sequences. They are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat in a sliding window. In this article, we focus on the case in which an unknown number of tandem repeat segments of the same pattern are dispersed throughout a sequence. We construct a probabilistic generative model for the tandem repeats, in which the sequence pattern is represented by a motif matrix. A Bayesian approach is adopted to estimate this model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution in order to infer both the motif matrix of the tandem repeats and the locations of the repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem posed by the variable number of repeat segments. Experiments on both synthetic and real data show that this new approach is powerful in detecting dispersed short tandem repeats. To our knowledge, this is the first work to adopt RJMCMC algorithms in the detection of tandem repeats.
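In a model of this kind, each candidate repeat segment is scored by how well its bases fit the motif matrix relative to the background sequence. A minimal sketch of that scoring term only, assuming independent motif columns and a fixed background distribution; the matrix values, `CA_MOTIF`, `UNIFORM` and the function name are hypothetical, and the MCMC/RJMCMC machinery that samples the matrix and the segment positions is not shown.

```python
import math
from typing import Dict, List

# A motif matrix: one probability distribution over A/C/G/T per motif column.
MotifMatrix = List[Dict[str, float]]

# Hypothetical example: a matrix favouring the dinucleotide "CA".
CA_MOTIF: MotifMatrix = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def segment_log_odds(segment: str, motif: MotifMatrix, background: Dict[str, float]) -> float:
    """Log-likelihood ratio of `segment` as back-to-back motif copies vs. background.

    Column i % width of the motif matrix emits base i of the segment; only
    segments made of whole motif copies are accepted.
    """
    width = len(motif)
    if len(segment) % width:
        raise ValueError("segment length must be a multiple of the motif width")
    return sum(
        math.log(motif[i % width][base] / background[base])
        for i, base in enumerate(segment.upper())
    )


if __name__ == "__main__":
    print(segment_log_odds("CACACA", CA_MOTIF, UNIFORM))  # positive: repeat-like
    print(segment_log_odds("GTGTGT", CA_MOTIF, UNIFORM))  # negative: background-like
```

Inside a sampler, a log-odds term like this would typically contribute to the acceptance ratio when a move adds, removes or shifts a segment.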

12.
Whole-genome comparison has revealed the presence of short sequence repeats (also called mycobacterial interspersed repeat units and variable-number tandem repeat units) that are used in genotyping schemes. In this study, we used deletion analysis, single nucleotide polymorphism data and spoligotypes from data published by others to investigate the evolution of selected repeats that are common to the majority of established schemes. Analysis of the number of repeats per locus in over 400 isolates revealed that the general global trend appears to be a loss of repeats in modern strains compared with ancestral strains.

15.
Abundant repetitive DNA sequences are an enigmatic part of the human genome. Despite increasing evidence on the functionality of DNA repeats, their biological role is still elusive and frequently debated. Macrosatellites are the largest of the tandem DNA repeats, located on one or multiple chromosomes. The contribution of macrosatellites to genome regulation and human health has been demonstrated for the D4Z4 macrosatellite repeat array on chromosome 4q35. Reduced copy number of D4Z4 repeats is associated with local euchromatinization and the onset of facioscapulohumeral muscular dystrophy. Although the roles of other macrosatellite families remain rather obscure, their diverse functions within the genome are gradually being revealed. In this review, we outline structural and functional features of coding and noncoding macrosatellite repeats, and highlight recent findings that bring these sequences into the spotlight of genome organization and disease development.

16.
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ to find simple exact tandem repeats as well as non-simple repeats. E-TRA offers several advanced repeat search parameters/options compared with other repeat finder programs: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, but also analyzes multiple files containing multiple sequences. E-TRA searches for tandem repeat motifs from one to one thousand nucleotides long. Advanced user-defined parameters/options let researchers apply different minimum repeat-count criteria to different motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some tandem repeats play important roles in the regulation of gene expression, whereas others have no known biological function as yet. Nevertheless, they have proven very useful in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns ranging from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and developmental stages differed in the number of repeats as well as in repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, and thus differs from that of the untranscribed repeats found in genomes.
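E-TRA lets users set a different minimum number of copies for each motif length. A minimal sketch of that idea for perfect tandem repeats, using backreferences in Python's `re` module; the `min_copies` mapping and the function names are assumptions for illustration, not E-TRA's interface, and E-TRA's non-simple repeat and keyword features are not modelled.

```python
import re
from typing import Dict, List, Tuple


def is_primitive(motif: str) -> bool:
    """True unless motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_exact_repeats(seq: str, min_copies: Dict[int, int]) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats, sorted by start.

    `min_copies` maps each motif length to the minimum number of consecutive
    copies required for that length.
    """
    seq = seq.upper()
    hits = []
    for length, copies in sorted(min_copies.items()):
        if length < 1 or copies < 2:
            raise ValueError("motif length must be >= 1 and copies >= 2")
        # ([ACGT]{L}) captures one motif copy; \1{c-1,} demands c-1 more identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (length, copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'ATAT', which is really a dinucleotide repeat
                hits.append((m.start(), motif, len(m.group(0)) // length))
    return sorted(hits)


if __name__ == "__main__":
    # Stricter threshold for mononucleotides than for dinucleotides.
    print(find_exact_repeats("GGACACACACTTTTTT", {1: 6, 2: 3}))
```

Because `finditer` returns non-overlapping leftmost matches, this sketch can miss repeats whose best phase starts inside an earlier match; a production tool would scan more carefully.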

17.
An algorithm for approximate tandem repeats.
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g., abcabc. An approximate single tandem repeat is one in which the two substrings are similar but not identical, e.g., abcdaacd. In this paper we consider two similarity criteria: the Hamming distance (k mismatches) and the edit distance (k differences). For a string S of length n and an integer k, our algorithm reports all locally optimal approximate repeats r = uv for which the Hamming distance between u and v is at most k, in O(nk log(n/k)) time, or all those for which the edit distance between u and v is at most k, in O(nk log k log(n/k)) time. This paper concentrates on a more general type of repeat called multiple tandem repeats. A multiple tandem repeat in a sequence S is a (periodic) substring r of S of the form r = u^a u', where u is a prefix of r, a is the number of repetitions of u, and u' is a prefix of u. An approximate multiple tandem repeat is a multiple repeat with errors: the repeated subsequences are similar but not identical. We precisely define approximate multiple repeats and present an algorithm that finds all repeats that satisfy our definition. The time complexity of the algorithm, when searching for repeats with up to k errors in a string S of length n, is O(nka log(n/k)), where a is the maximum number of periods in any reported repeat. We present some experimental results concerning the performance and sensitivity of our algorithm. Finding repeats within a string is a computational problem with important applications in molecular biology. Both exact and inexact repeats occur frequently in the genome, and certain repeats in the human genome are known to be related to diseases.
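The Hamming-distance definition of an approximate single tandem repeat can be checked directly by brute force. A minimal sketch, assuming plain Python strings; this is a naive baseline swapped in for illustration, not the paper's O(nk log(n/k)) algorithm, it does not filter for locally optimal repeats, and the function names are hypothetical.

```python
from typing import Iterator, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def approximate_square_repeats(
    s: str, k: int, min_half: int = 1
) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, half_length, mismatches) for every substring uv of s with
    |u| == |v| >= min_half and Hamming(u, v) <= k.

    Brute force, O(n^3) in the worst case, with no local-optimality filtering:
    a baseline for checking a faster method, not the algorithm in the abstract.
    """
    n = len(s)
    for start in range(n):
        for half in range(min_half, (n - start) // 2 + 1):
            u = s[start:start + half]
            v = s[start + half:start + 2 * half]
            d = hamming(u, v)
            if d <= k:
                yield start, half, d


if __name__ == "__main__":
    # The abstract's example: abcd vs. aacd differ at one position.
    print(list(approximate_square_repeats("abcdaacd", k=1, min_half=4)))  # [(0, 4, 1)]
```

A baseline like this is mainly useful for validating a faster implementation on short random strings.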

18.
Mononucleotide repeats (MNRs) are abundant in eukaryotic genomes and exhibit a high degree of length variability due to insertion and deletion events. However, the relationship between these repeats and mutation rates in surrounding sequences has not been systematically investigated. We have analyzed the frequency of single nucleotide polymorphisms (SNPs) at positions close to and within MNRs in the human genome. Overall, we find a 2- to 4-fold increase in SNP frequency at positions immediately adjacent to the boundaries of MNRs, relative to that at more distant bases. This relationship exhibits a strong asymmetry between the 3' and 5' ends of repeat tracts and depends on the repeat motif, length and orientation of surrounding repeats. Our analysis suggests that the incorporation or exclusion of bases adjacent to the boundary of the repeat through substitutions, in which these nucleotides mutate towards or away from the base present within the repeat, respectively, may be another mechanism by which MNRs expand and contract in the human genome.
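The boundary analysis described here can be outlined as: locate mononucleotide runs, then measure how often positions at each distance outside a run carry a SNP, separately for each flank. A minimal sketch, assuming a reference sequence string and a set of 0-based SNP coordinates; the run-length threshold, the side labels and the function names are hypothetical choices, and no correction is made for flanks that overlap other runs.

```python
import re
from typing import Dict, Iterable, List, Tuple


def mononucleotide_runs(seq: str, min_len: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) coordinates of single-base runs at least min_len long."""
    if min_len < 2:
        raise ValueError("a repeat needs at least two copies of the base")
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def snp_rate_by_offset(
    seq: str, snp_positions: Iterable[int], min_len: int = 6, max_offset: int = 5
) -> Dict[Tuple[str, int], float]:
    """Fraction of flanking positions carrying a SNP, by side and distance.

    Offset 1 is the base immediately outside a run. "5prime"/"3prime" refer to
    the forward strand of `seq`; the strand of the repeat itself is ignored.
    """
    runs = mononucleotide_runs(seq, min_len)
    snps = set(snp_positions)
    rates: Dict[Tuple[str, int], float] = {}
    for side in ("5prime", "3prime"):
        for offset in range(1, max_offset + 1):
            examined = hits = 0
            for start, end in runs:
                pos = start - offset if side == "5prime" else end - 1 + offset
                if 0 <= pos < len(seq):
                    examined += 1
                    hits += int(pos in snps)
            rates[(side, offset)] = hits / examined if examined else 0.0
    return rates


if __name__ == "__main__":
    # Toy sequence with an A6 run at positions 2-7 and a hypothetical SNP at position 8.
    print(snp_rate_by_offset("CGAAAAAATGC", {8}, min_len=6, max_offset=2))
```

Keeping the two flanks separate is what allows a 5'/3' asymmetry like the one reported above to show up at all.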

20.
Recently transposed Alu repeats result from multiple source genes.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417