Similar Documents
1.
Abstract

Identifying and predicting the structural characteristics of novel repeats throughout the genome can provide insight into biological function. Specific repeats are believed to have a biological significance that is reflected in their distribution patterns. We have developed ‘GenomeMark,’ a computer program that detects and statistically analyzes candidate repeats. Specifically, ‘GenomeMark’ identifies the periodic distribution of unique words and calculates their χ² and Z-score values. Using ‘GenomeMark,’ we identified novel sequence words present in tandem throughout genomes. We found that these sequences have distinctive spacer-sequence distributions and that many were genome-specific, validating the genome signature theory. Further analysis confirmed that many of these sequences have a specific biological function. The program is available from the authors upon request and is freely available to non-commercial and academic users.
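The two statistics named above can be illustrated with a minimal Python sketch. It assumes an independent-base background model for the Z-score (a binomial approximation that ignores overlapping occurrences) and a chi-square test of uniform placement across equal-sized bins; the function names and the binning scheme are hypothetical and are not GenomeMark's actual method.

```python
import math
from collections import Counter


def word_zscore(seq: str, word: str) -> float:
    """Z-score of a word's count under an independent-base background model.

    Binomial approximation that ignores overlapping occurrences, so it is
    only a rough screen for over- or under-represented words.
    """
    seq, word = seq.upper(), word.upper()
    k = len(word)
    n_pos = len(seq) - k + 1
    observed = sum(1 for i in range(n_pos) if seq[i:i + k] == word)
    base_freq = Counter(seq)
    p = 1.0
    for base in word:
        p *= base_freq[base] / len(seq)  # probability of the word at one position
    expected = n_pos * p
    sd = math.sqrt(expected * (1.0 - p))
    if sd == 0:
        raise ValueError("zero variance under the background model")
    return (observed - expected) / sd


def bin_chi_square(seq: str, word: str, n_bins: int) -> float:
    """Chi-square statistic testing whether word occurrences are spread evenly over n_bins."""
    seq, word = seq.upper(), word.upper()
    k = len(word)
    starts = [i for i in range(len(seq) - k + 1) if seq[i:i + k] == word]
    if not starts:
        return 0.0
    bin_size = len(seq) / n_bins
    counts = [0] * n_bins
    for s in starts:
        counts[min(int(s // bin_size), n_bins - 1)] += 1
    expected = len(starts) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)
```

For the periodic sequence `"ACGT" * 25`, `word_zscore(..., "AC")` is about 7.8, while the chi-square for the same word over five bins is 0 because its occurrences are perfectly even.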

2.
Grover D, Kannan K, Brahmachari SK, Mukerji M. Genetica, 2005, 124(2-3): 273-289
Elucidation of the complete nucleotide sequence of the human genome has revealed that coding sequences, which store the information needed to synthesize functional proteins, occupy only 2% of the genome. The remaining 98%, apart from a few regulatory sequences, has been referred to as non-functional or junk DNA and consists of many kinds of repeat elements. In fact, the human genome is the most repeat-rich genome sequenced so far: more than half of it is occupied by such sequences. Determining the significance of these repeats in the human genome has become the focus of many studies worldwide, especially after genome sequencing revealed no significant difference in coding regions between lower eukaryotes and humans. In this article, we focus on Alu repeats, primate-specific elements with many interesting biological properties. Moreover, Alu elements are the repeats with the highest copy number in the human genome. We highlight different facets of their interaction with the genome and the changing paradigms regarding their role in genome organization.

3.
We have isolated four repetitive DNA fragments from maize DNA. Only one of these sequences showed homology to sequences in the EMBL database, even though each has an estimated copy number of 3 × 10⁴ to 5 × 10⁴ per haploid genome. Hybridization of the four repeats to maize mitotic chromosomes showed that the sequences are evenly dispersed throughout most, but not all, of the maize genome, whereas hybridization to yeast colonies containing random maize DNA fragments inserted into yeast artificial chromosomes (YACs) indicated considerable clustering of the repeats at a local level. We have exploited the distribution of the repeats to produce repetitive-sequence fingerprints of individual YAC clones. These fingerprints not only provide information about the occurrence and organization of the repetitive sequences within the maize genome but can also be used to determine the organization of overlapping maize YAC clones within contiguous fragments (contigs). Key words: maize, repetitive DNA, YACs.
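As a rough illustration of how repeat fingerprints might be compared to find overlapping clones, the sketch below scores clone pairs by the Jaccard similarity of their hybridizing bands. The band representation (probe name, fragment size in kb), the 0.3 threshold and the clone names are all hypothetical; the study's own contig-building procedure is not described in the abstract.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared bands divided by all bands seen in either clone."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_overlaps(fingerprints: dict, threshold: float = 0.3) -> list:
    """Return (clone_a, clone_b, score) for clone pairs that may overlap.

    fingerprints maps a clone name to a set of (repeat probe, fragment size in kb)
    bands; both this representation and the threshold are hypothetical.
    """
    pairs = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        score = jaccard(fingerprints[name_a], fingerprints[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, round(score, 3)))
    return pairs
```

Pairs passing the threshold are only candidates; ordering them into a contig would need additional consistency checks.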

4.
Novel functional role of CA repeats and hnRNP L in RNA stability   (Total citations: 6; self-citations: 1; citations by others: 5)
CA dinucleotide repeat sequences are very common in the human genome. We have recently demonstrated that the polymorphic CA repeats in intron 13 of the human endothelial nitric oxide synthase (eNOS) gene function as an unusual, length-dependent splicing enhancer. The activity of this CA-repeat enhancer requires specific binding of hnRNP L. Here we show that in the absence of bound hnRNP L, the pre-mRNA is cleaved directly upstream of the CA repeats, and that the addition of recombinant hnRNP L restores RNA stability. The CA repeats are both necessary and sufficient for this specific cleavage in the adjacent 5' RNA sequence. We conclude that, in addition to its role as a splicing activator, hnRNP L can act in vitro as a sequence-specific RNA protection factor. Given the abundance of CA repeat sequences in the human genome, this may represent a novel and generally important role for this abundant hnRNP protein.
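Because the enhancer effect is length-dependent, a natural first step is to locate perfect (CA)n tracts and record their copy number. This minimal regular-expression sketch assumes a five-copy minimum, which is an arbitrary illustrative cut-off; coordinates are 0-based and half-open.

```python
import re


def find_ca_repeats(seq: str, min_copies: int = 5) -> list:
    """Return (start, end, copies) for perfect (CA)n tracts, 0-based half-open."""
    pattern = re.compile(r"(?:CA){%d,}" % min_copies)
    # Each match is a maximal run of CA units; copies = tract length / 2.
    return [(m.start(), m.end(), (m.end() - m.start()) // 2)
            for m in pattern.finditer(seq.upper())]
```

A tract that begins with A (for example `ACACAC...`) is reported from its first C, so the copy number always counts complete CA units.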

5.
MOTIVATION: Complex genomes contain numerous repeated sequences, and genomic duplication is believed to be a main evolutionary mechanism for acquiring new functions. Several tools are available for de novo repeat identification, and many approaches exist for clustering homologous protein sequences. We present an efficient new approach to identify and cluster homologous DNA sequences with high accuracy at the level of whole genomes, excluding low-complexity repeats, tandem repeats and annotated interspersed repeats. We also determine the boundaries of each group member so that it closely represents a biological unit, e.g. a complete gene or a partial gene encoding a protein domain. RESULTS: We developed a program called HomologMiner that identifies homologous groups in genome sequences that have been properly marked for low-complexity repeats and annotated interspersed repeats. We applied it to the whole genomes of human (hg17), macaque (rheMac2) and mouse (mm8). The groups obtained include gene families (e.g. the olfactory receptor gene family and zinc finger families), unannotated interspersed repeats and additional homologous groups resulting from recent segmental duplications. Our program incorporates several new methods: a new abstract definition of consistent duplicate units, a new criterion to remove moderately frequent tandem repeats, and new algorithmic techniques. We also provide a preliminary analysis of the output for these three genomes and show several applications, including identifying the boundaries of tandem gene clusters and novel interspersed repeat families. AVAILABILITY: All programs and datasets are downloadable from www.bx.psu.edu/miller_lab.
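HomologMiner expects genome sequences in which low-complexity and interspersed repeats are already marked. Assuming the common soft-masking convention (masked bases in lowercase, as in UCSC genome downloads), the unmasked stretches that remain available for homology search can be extracted as follows; the function name and the treatment of N as masked are illustrative choices, not part of HomologMiner.

```python
def unmasked_segments(seq: str, min_len: int = 1) -> list:
    """Return 0-based half-open (start, end) runs of unmasked bases.

    Lowercase bases are treated as soft-masked and N as hard-masked.
    """
    segments = []
    start = None
    for i, base in enumerate(seq):
        unmasked = base.isupper() and base != "N"
        if unmasked and start is None:
            start = i  # a new unmasked run begins
        elif not unmasked and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(seq) - start >= min_len:
        segments.append((start, len(seq)))  # run reaching the sequence end
    return segments
```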

6.
MOTIVATION: One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, a deviation in either direction from the expected number of repeats may indicate the presence of a biological signal. Linguistic complexity measures the repetitiveness of a genomic text, and potential regulatory sites may be discovered by constructing typical patterns of complexity distribution. RESULTS: We developed software for fast calculation of the linguistic complexity of DNA sequences. Our program uses suffix trees to count the number of distinct subwords present in genomic sequences, which allows linguistic complexity to be calculated in time linear in genome size. The measure was applied to the complete genome of Haemophilus influenzae. Complexity maps along the entire genome were obtained using sliding windows of 40, 100 and 2000 nucleotides. This approach provided an efficient way to detect simple sequence repeats in this genome. In addition, local profiles of complexity distribution around translation start sites were constructed for 21 complete prokaryotic genomes. We hypothesize that complexity profiles correspond to evolutionary relationships between organisms. We found fundamental differences between the profiles of GC-rich and other (non-GC-rich) genomes. We also found characteristic differences among the profiles of AT-rich genomes, which probably reflect species-specific variation in translational regulation. AVAILABILITY: The program is available upon request from Alexander Bolshoy or at http://csweb.haifa.ac.il/library/#complex.
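Linguistic complexity is commonly defined as the number of distinct substrings actually present in a sequence divided by the maximum number possible for a sequence of that length over a four-letter alphabet. The sketch below uses that formulation, assuming it matches the paper's, and counts substrings with Python sets instead of suffix trees, so it is far slower than the linear-time method described and only practical for short windows such as the 40-nt one.

```python
def linguistic_complexity(window: str, alphabet_size: int = 4) -> float:
    """Distinct substrings observed / maximum possible, summed over all lengths."""
    n = len(window)
    if n == 0:
        raise ValueError("empty window")
    observed = 0
    possible = 0
    for k in range(1, n + 1):
        observed += len({window[i:i + k] for i in range(n - k + 1)})
        # At most alphabet_size**k words of length k, and at most n - k + 1 positions.
        possible += min(alphabet_size ** k, n - k + 1)
    return observed / possible


def complexity_profile(seq: str, window: int = 40, step: int = 1) -> list:
    """Linguistic complexity of each sliding window along seq."""
    return [linguistic_complexity(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

A homopolymer window such as `"AAAA"` scores 0.4, the minimum for length 4, while `"ACGT"` scores 1.0; simple sequence repeats therefore show up as dips in the profile.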

7.
Three different repeat sequences have been mapped within the cloned EcoRI fragments that contain the adult beta-globin genes of the BALB/c (Hbbd) mouse. One sequence, "a", occurs 1.5-2 kb 3' to the beta-major gene. A second, "b", is found 4 kb 5' and 7.5 kb 3' to the beta-minor gene. The 14 kb EcoRI fragment bearing the beta-minor gene carries at least one additional repetitive element, "c". Probing a BALB/c DNA library with each repeat showed that these sequences are moderately to highly repetitive and extensively interspersed with each other throughout the genome. In addition, repeats "a" and "b" are preferentially found in satellite and main-band DNA, respectively. The occurrence of these repeats elsewhere in the beta-globin cluster was demonstrated by probing the non-adult globin clones with each repeat. The arrangement of these repeats around the non-adult genes is 5'-"b"-"b"-epsilon y-beta h1-beta h2-"c"-beta h3-3'. Probing the C57BL/10 (Hbbs) adult-gene clones with these repeats showed that the distribution of these sequences in the adult region is essentially the same in the two haplotypes.

8.
All organisms studied to date show a differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea: the complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to pentanucleotide repeats. The number of repeats was determined for the complete chromosome sequences and separately for the coding and non-coding sequences. Unlike what has been found for other groups of organisms, SSRs are abundant in the coding regions of some archaeal genomes. Dinucleotide repeats were rare, and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism-specific. In general, repeats are short, and CG-rich repeats are present in Archaea with CG-rich genomes. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density was inversely correlated with the CG content of the genome.
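The coding versus non-coding comparison comes down to SSR density (repeats per kb) in each class of interval, and the GC relationship to a correlation across genomes. In this minimal sketch, SSR start positions are assumed to come from an SSR finder such as SPUTNIK (its output format is not modeled here), intervals are 0-based, half-open and non-overlapping, and an SSR is assigned to the region containing its first base; all function names are hypothetical.

```python
import math


def density_per_kb(ssr_starts, regions):
    """SSRs per kb, counting SSRs whose first base lies in the non-overlapping regions."""
    total_len = sum(end - start for start, end in regions)
    hits = sum(1 for s in ssr_starts if any(a <= s < b for a, b in regions))
    return 1000.0 * hits / total_len


def complement(regions, genome_len):
    """Intervals not covered by the (coding) regions, i.e. the non-coding part."""
    gaps, prev = [], 0
    for start, end in sorted(regions):
        if start > prev:
            gaps.append((prev, start))
        prev = max(prev, end)
    if prev < genome_len:
        gaps.append((prev, genome_len))
    return gaps


def pearson(xs, ys):
    """Pearson correlation, e.g. between genome GC content and SSR density."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Comparing `density_per_kb(starts, coding)` with `density_per_kb(starts, complement(coding, genome_len))` gives the per-genome contrast; `pearson` is then applied across genomes.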

9.
Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker-assisted selection of crop plants, and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both prokaryotes and eukaryotes. They are distributed almost at random throughout the genome and range from mononucleotide to trinucleotide repeats. They also occur as longer tracts (> 6 repeat units). Most computer programs that find SSRs do not report their exact positions. We wrote SSRscanner, a computer program that determines the distribution, frequency and exact location of each SSR in the genome. SSRscanner is user-friendly: it can search for repeats of any length and reports their exact positions on the chromosome and their frequency of occurrence in the sequence.
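A minimal sketch of the kind of report described above: every perfect repeat with its chromosome, 1-based start and end, motif and copy number, plus a per-motif frequency count. The minimum copy numbers are hypothetical, motifs that are themselves repeats of a shorter unit (such as ATAT) are skipped so a tract is only reported under its shortest repeat unit, and this is not SSRscanner's own code.

```python
from collections import Counter


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def scan_ssrs(chrom: str, seq: str, min_copies=None) -> list:
    """Report perfect SSRs as (chrom, start, end, motif, copies), 1-based inclusive."""
    if min_copies is None:
        # Hypothetical thresholds: minimum copies per motif length (mono- to hexa-).
        min_copies = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits = []
    for k, needed in sorted(min_copies.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            j = i + k
            while seq[j:j + k] == motif:  # extend copy by copy
                j += k
            copies = (j - i) // k
            if copies >= needed and is_primitive(motif) and set(motif) <= set("ACGT"):
                hits.append((chrom, i + 1, j, motif, copies))  # 0-based i -> 1-based
                i = j  # resume after this tract
            else:
                i += 1
    return hits


def motif_frequencies(hits) -> Counter:
    """Number of reported tracts per motif."""
    return Counter(motif for _, _, _, motif, _ in hits)
```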

Availability


10.
Full-length L1 elements have been shown to possess, at their 5' end, tandem repeats called "A" or "F" types. By sequencing the 5' region of two large L1 copies that did not hybridize to A or F probes, we have identified a new sequence that is found at the 5' end of many L1 elements and that we call "V." The element characterized has no 200-bp tandem repetitive structure, and the new 5' sequence is not similar to the A or F sequences. The study of the relationships between the V and L1 sequences has shown that only half of the V (i.e., V-specific 5') sequences in the genome are linked to the 5' end of L1 copies. In related rodent species, a comparative study by Southern blot and PCR analysis of the V sequence suggests that this L1 subfamily has an ancient origin and that V sequence isolated from the remainder of the L1 element has been amplified during the evolution of the mouse genome.

11.
DNA, chromosomes, and in situ hybridization.   (Total citations: 6; self-citations: 0; citations by others: 6)
Trude Schwarzacher. Genome, 2003, 46(6): 953-962
In situ hybridization is a powerful and unique technique that correlates the molecular information of a DNA sequence with its physical location along chromosomes and genomes. It thus provides valuable information about the physical map position of sequences and is often the only means of determining the abundance and distribution of the repetitive sequences that make up the majority of most genomes. Repeated DNA sequences, composed of units from a few to a thousand base pairs in size, occur in blocks (tandem or satellite repeats) or are dispersed (including transposable elements) throughout the genome. They are often the most variable components of a genome, frequently species-specific and occasionally chromosome-specific. Their variability arises through amplification, diversification and dispersion, as well as homogenization and loss; there is a remarkable correlation between molecular sequence features and chromosomal organization, including the length of repeat units, their higher-order structures, chromosomal locations and dispersion mechanisms. Our understanding of the structure, function, organization and evolution of genomes and their evolving repetitive components has enabled many new cytogenetic applications in both medicine and agriculture, particularly in diagnosis and plant breeding.

12.
A recombinant library of human DNA sequences was screened with a segment of simian virus 40 (SV40) DNA that spans the viral origin of replication. One hundred and fifty phage that hybridized to this probe were isolated. Restriction enzyme and hybridization analyses indicated that these sequences were partially homologous to one another. Direct DNA sequencing of two such SV40-hybridizing segments indicated that they did not form a highly conserved family of sequences but rather a set of DNA fragments containing repetitive regions of high guanine plus cytosine content. These sequences were not members of the previously described Alu family of repeats and hybridized to SV40 DNA more strongly than do Alu family members. Computer analyses showed that the human DNA segments contained multiple homologies with sequences throughout the SV40 origin region, although sequences on the late side of the viral origin contained the strongest cross-hybridizing sequences. Because of the number and complexity of the matches detected, we could not determine unambiguously which of the many possible heteroduplexes between these DNAs was thermodynamically most favored. No hybridization of these human DNA sequences to any other segment of the SV40 genome was detected. In contrast, the isolated human DNA segments cross-hybridized with many sequences within the human genome. We tested for the presence of several functional domains on two of these human DNA fragments. One SV40-hybridizing fragment, SVCR29, contained a sequence that enhanced the efficiency of thymidine kinase transformation in human cells by approximately 20-fold. This effect was orientation-independent when the sequence was present at the 3' end of the chicken thymidine kinase gene. We propose that this segment of DNA contains a sequence analogous to the 72-base-pair repeats of SV40.
The existence of such an "activator" element in cellular DNA raises the possibility that families of these sequences may exist in the mammalian genome.

13.
MOTIVATION: Tandemly organized repetitive sequences (satellite DNA) are widespread in complex eukaryotic genomes. In plants, satellite repeats often represent a substantial part of nuclear DNA, but little is known about the molecular mechanisms of their amplification and their possible role(s) in genome evolution and function. Unfortunately, addressing these questions by characterizing the general sequence properties of known satellite repeats has been hindered by the difficulty of obtaining a complete and unbiased set of sequence data for this analysis. This is mainly due to the presence in public databases of multiple entries of homologous sequences and of single entries that contain more than one repeated unit (monomer). RESULTS: We have established a computer database specialized for plant satellite repeats (PlantSat) that integrates sequence data available from various resources with supplementary information, including repeat consensus sequences, abundances and chromosomal localizations. The sequences are stored as individual repeat monomers grouped into families, which simplifies their computer analysis and makes it more accurate. Using this feature, we have performed a basic sequence analysis of the whole set of plant satellite repeats with respect to monomer length and nucleotide composition. The analysis revealed several preferred monomer length ranges (approximately 165 bp and its multiples) and an over-representation of the AA/TT dinucleotide in the repeats. We also detected an enrichment of satellite DNA sequences for the motif CAAAA, which is thought to be involved in the breakage and reunion of repeated sequences.
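An AA/TT over-representation of this kind is typically measured with a dinucleotide odds ratio, rho(XY) = f(XY) / (f(X) f(Y)), where values above 1 indicate over-representation. The sketch below pools counts over individual monomers, mirroring how PlantSat stores repeats, and adds reverse complements so that AA and TT are scored together; whether the PlantSat analysis used exactly this statistic is an assumption.

```python
from collections import Counter


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dinucleotide_odds_ratios(monomers, both_strands: bool = True) -> dict:
    """rho(XY) = f(XY) / (f(X) * f(Y)), pooled over individual repeat monomers.

    Dinucleotides spanning two monomers are not counted.
    """
    seqs = [m.upper() for m in monomers]
    if both_strands:
        seqs += [revcomp(m) for m in seqs]
    mono, di = Counter(), Counter()
    for seq in seqs:
        mono.update(seq)
        di.update(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = sum(mono.values()), sum(di.values())
    return {xy: (count / n_di) / ((mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono))
            for xy, count in di.items()}
```

With `both_strands=True`, AA and TT receive identical ratios by construction, which matches reporting them as a single AA/TT class.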

14.
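Simple sequence repeats, also known as microsatellites, have been applied successfully in genomic and evolutionary studies of many eukaryotes, prokaryotes and viruses, but microsatellites in bacteriophages have so far received little study. We therefore performed a comprehensive analysis of microsatellites and compound microsatellites (two or more directly adjacent microsatellites) in 60 genomes of the order Caudovirales, observing a total of 11,874 microsatellites and 449 compound microsatellites across these 60 genomes. Correlation analysis showed a positive linear correlation between the number of microsatellites and genome size (ρ = 0.899, P < 0.01). The reference sequences contained fewer microsatellites than the corresponding random sequences; this anomaly arises mainly because the reference sequences contain fewer mononucleotide and dinucleotide repeats. A/T and AT/TA repeats were the predominant mononucleotide and dinucleotide repeat types, so the GC content of mononucleotide repeats was markedly lower than that of the corresponding sequences; by contrast, the GC content of the dinucleotide and trinucleotide repeats did not differ significantly from that of the corresponding reference sequences. These results for Caudovirales genomes differ to some extent from those reported for other organisms and help in understanding the distribution, evolution and biological function of microsatellites in Caudovirales.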
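The comparison of observed microsatellite counts with random sequences can be sketched with a composition-preserving shuffle: count mononucleotide runs in the real genome, then average the count over shuffled copies with the same base composition. The six-base minimum run length, the number of shuffles and the shuffling model are all assumptions; the study's own random-sequence procedure is not specified in the abstract.

```python
import random
import re


def count_mono_runs(seq: str, min_len: int = 6) -> int:
    """Number of mononucleotide runs of at least min_len bases."""
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return sum(1 for _ in re.finditer(pattern, seq.upper()))


def shuffle_control(seq: str, n_shuffles: int = 100, seed: int = 0, min_len: int = 6):
    """Return (observed count, mean count over composition-preserving shuffles)."""
    rng = random.Random(seed)  # seeded for reproducibility
    bases = list(seq.upper())
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        total += count_mono_runs("".join(bases), min_len)
    return count_mono_runs(seq, min_len), total / n_shuffles
```

An observed count below the shuffled mean corresponds to the under-representation of mononucleotide repeats reported for the reference genomes.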

15.
The large-scale bacterial artificial chromosome (BAC) end-sequencing project of Nile tilapia (Oreochromis niloticus) has generated extensive sequence data that allowed us to examine the repeat content of this fish genome and to build a repeat library specific to this species. This library was established from Tilapiini repeat sequences in GenBank, sequences orthologous to the zebrafish repeat library in Repbase, and novel repeats detected by genome analysis using the MIRA assembler. We estimate that repeats constitute about 14% of the tilapia genome, and we also estimate the occurrence of the different repeats based on Basic Local Alignment Search Tool (BLAST) searches within the database of known tilapia sequences. The frequent occurrence of novel repeats in the tilapia genome indicates the importance of masking with a species-specific repeat library before sequence analysis. A web tool based on the RepeatMasker software was designed to assist tilapia genomics.
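A repeat-content estimate such as the 14% above is the share of sequence covered by repeat annotations, and overlapping hits must be merged first so that no base is counted twice. This minimal sketch assumes the masker output has already been converted into half-open (start, end) intervals on a single sequence; it does not parse any particular RepeatMasker file format.

```python
def merge_intervals(intervals) -> list:
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def repeat_fraction(intervals, seq_len: int) -> float:
    """Fraction of the sequence covered by at least one repeat interval."""
    return sum(end - start for start, end in merge_intervals(intervals)) / seq_len
```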

16.
MOTIVATION: Low-complexity or cryptically simple sequences are widespread in proteins, but their evolution and function are poorly understood. To date, methods for detecting low complexity in proteins have been aimed at filtering such regions out before sequence homology searches rather than at analyzing the regions themselves. However, many of these regions are encoded by non-repetitive DNA sequences and may therefore result from selection acting on protein structure and/or function. RESULTS: We have developed a new tool, based on the SIMPLE algorithm, that quantifies the amount of simple sequence in proteins and determines which short motifs show clustering above a given threshold. By modifying the sensitivity of the program, simple sequence content can be studied at various levels, from highly organised tandem structures to complex combinations of repeats. We compare the relative amount of simplicity in different functional groups of yeast proteins and determine the level of clustering of the different amino acids in these proteins. AVAILABILITY: The program is available on request or online at http://www.biochem.ucl.ac.uk/bsm/SIMPLE.
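The idea of scoring how strongly short motifs cluster can be illustrated with a simplified score: for every position and motif length up to three residues, count further copies of the same motif within a fixed window, then divide by the average score of shuffled versions of the same sequence. This is only a sketch in the spirit of SIMPLE, not its published scoring scheme, and the window size, motif lengths and shuffle count are assumptions.

```python
import random


def simplicity_score(seq: str, max_motif: int = 3, half_window: int = 10) -> float:
    """Average number of extra copies of each short motif found near where it occurs."""
    seq = seq.upper()
    total = 0
    for i in range(len(seq)):
        for k in range(1, max_motif + 1):
            if i + k > len(seq):
                break
            motif = seq[i:i + k]
            lo = max(0, i - half_window)
            hi = min(len(seq) - k, i + half_window)
            total += sum(1 for j in range(lo, hi + 1)
                         if j != i and seq[j:j + k] == motif)
    return total / len(seq)


def relative_simplicity(seq: str, n_shuffles: int = 20, seed: int = 0, **kwargs) -> float:
    """Score of the real sequence divided by the mean score of shuffled copies."""
    rng = random.Random(seed)
    residues = list(seq.upper())
    shuffled_total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, random order
        shuffled_total += simplicity_score("".join(residues), **kwargs)
    return simplicity_score(seq, **kwargs) / (shuffled_total / n_shuffles)
```

Values well above 1 indicate that motifs are more clustered than the amino acid composition alone would predict, for example in a polyglutamine stretch.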

17.
Telomeric repeats, represented by the hexamer (TTAGGG)n at chromosome termini, are required for correct chromosome function and stability. At the same time, interstitial telomeric sequences (ITSs) located far from the chromosome ends are known in several mammalian genomes, including the human genome. These repeats are thought to mark the sites of ancestral chromosome fusions or other chromosome rearrangements. Exact localization of all interstitial telomeric sequences in the genome could greatly improve our understanding of the mechanisms of karyotype evolution and species origin. We have developed software to search for interstitial telomeric sequences in complete mammalian genome sequences and demonstrated the evolutionary significance of these repeats using human chromosome 2 as an example. The results and supplementary materials are available at the site of the Institute of Cytology and Genetics: http://www.bionet.nsc.ru/labs/theorylabmain/orlov/telomere/.
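Locating interstitial telomeric sequences amounts to finding (TTAGGG)n tracts, on either strand, that lie away from the chromosome ends. The sketch below handles only perfect tracts; the four-copy minimum and the 10 kb end buffer are hypothetical parameters, and real ITSs are often degenerate and would need an approximate search.

```python
import re


def find_its(seq: str, min_copies: int = 4, end_buffer: int = 10_000) -> list:
    """Return sorted (start, end, strand) for perfect telomeric tracts away from the ends.

    Coordinates are 0-based half-open; '-' marks the CCCTAA (reverse-strand) form.
    """
    seq = seq.upper()
    hits = []
    for unit, strand in (("TTAGGG", "+"), ("CCCTAA", "-")):
        for m in re.finditer("(?:%s){%d,}" % (unit, min_copies), seq):
            # Keep only tracts that stay end_buffer bases clear of both ends.
            if m.start() >= end_buffer and m.end() <= len(seq) - end_buffer:
                hits.append((m.start(), m.end(), strand))
    return sorted(hits)
```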

18.
MOTIVATION: Microsatellites, also known as simple sequence repeats, are tandem repeats of nucleotide motifs 1-6 bp in size, found in every genome known so far. Their importance in genomes is well known. Microsatellites are associated with various disease genes, have been used as molecular markers in linkage analysis and DNA fingerprinting studies, and also seem to play an important role in genome evolution. It is therefore important to study the distribution, enrichment and polymorphism of microsatellites in genomes of interest. The prerequisite for this is a computational tool for extracting microsatellites (perfect as well as imperfect) and related information from whole-genome sequences. Examination of the available tools revealed certain gaps, which prompted us to develop a new tool. RESULTS: To screen genome sequences efficiently for microsatellites (perfect as well as imperfect), we developed a new tool called IMEx (Imperfect Microsatellite Extractor). IMEx uses a simple string-matching algorithm with a sliding-window approach to screen DNA sequences for microsatellites and reports the motif, copy number, genomic location, nearby genes, mutational events and many other features useful for in-depth studies. IMEx is more sensitive, efficient and useful than the widely used available tools. IMEx is available as a stand-alone program and as a web server. AVAILABILITY: A World Wide Web server and the stand-alone program are freely available at http://203.197.254.154/IMEX/ or http://www.cdfd.org.in/imex.
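Reporting imperfect microsatellites requires deciding how much imperfection a tract may contain. One simple approach, shown in this sketch, is to extend a tract copy by copy, accept copies with at most one substitution while the overall mismatch rate stays under a limit, and trim the tract back to its last perfect copy. The 10% limit is an assumption, insertions and deletions are not handled, and this is not IMEx's own algorithm.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extend_imperfect(seq: str, start: int, motif: str, max_imperfection: float = 0.1):
    """Extend a tract of motif copies from start, allowing substituted copies.

    Returns (end, copies, mismatches) for the longest extension that ends on a
    perfect copy; end is 0-based exclusive.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    pos, copies, mismatches = start, 0, 0
    best = (start, 0, 0)
    while pos + k <= len(seq):
        d = hamming(seq[pos:pos + k], motif)
        # Stop at a copy with more than one substitution, or when the running
        # mismatch rate per base would exceed the limit.
        if d > 1 or (mismatches + d) / ((copies + 1) * k) > max_imperfection:
            break
        copies += 1
        mismatches += d
        pos += k
        if d == 0:
            best = (pos, copies, mismatches)  # tract may only end on a perfect copy
    return best
```

For example, six `AC` copies, one `AG` and five more `AC` copies are reported as a single 12-copy tract with one mismatch.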

19.
Although most non-long terminal repeat (non-LTR) retrotransposons insert throughout the host genome, many non-LTR elements in the R1 clade insert into specific sites within their target sequence. Four R1-clade families have distinct target specificity: R1 and RT insert into specific sites of the 28S rDNA, and TRAS and SART insert into different sites within the (TTAGG)(n) telomeric repeats. To study the evolutionary history of target specificity in R1-clade retrotransposons, we extensively screened for novel representatives of the clade from various insects by in silico searches and degenerate polymerase chain reaction (PCR) cloning. We found four novel sequence-specific elements: Waldo (WaldoAg1, 2 and WaldoFs1) inserts into ACAY repeats, Mino (MinoAg1) into AC repeats, R6 into another specific site of the 28S rDNA, and R7 into a specific site of the 18S rDNA. In contrast, several elements (HOPE, WISHBm1, HidaAg1, NotoAg1, KagaAg1, Ha1Fs1) have lost target-sequence specificity, although some of them have preferred target sequences. Phylogenetic trees based on the RT and EN domains of each element showed that (1) three rDNA-specific elements, RT, R6 and R7, diverged from Waldo; (2) elements with similar target sequences are phylogenetically related; and (3) target specificity in the R1 clade was acquired once and thereafter altered and lost several times independently. These data indicate that target specificity in R1-clade retroelements has changed during evolution and is more divergent than has been assumed so far.

20.
Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species.
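A common way to turn all-versus-all alignments of repeat copies into candidate families is single-linkage clustering, which a union-find structure implements compactly. The sketch below is a generic illustration, not one of the clustering programs used in the REPET pipeline; the input format and the 80% identity cut-off are assumptions.

```python
def cluster_repeats(n_copies: int, hits, min_identity: float = 80.0) -> list:
    """Group repeat copies 0..n_copies-1 into families by single linkage.

    hits: iterable of (copy_a, copy_b, percent_identity) from all-vs-all alignments.
    """
    parent = list(range(n_copies))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, identity in hits:
        if identity >= min_identity:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # join the two families

    families = {}
    for copy in range(n_copies):
        families.setdefault(find(copy), []).append(copy)
    return sorted(families.values())
```

Single linkage chains copies through intermediates (0 joins 2 via 1 in the test below), which is one reason different clustering tools can disagree and why reconciling their results, as described above, is useful.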
