Similar Literature
 Found 20 similar articles (search time: 15 ms)
1.
Epstein-Barr virus DNA is known to have partially homologous segments, designated DL and DR, near the left and right ends of the long unique region (Raab-Traub et al., Cell 22:257-267, 1980). DL and DR are each partially composed of tandem direct repeat sequences. DL contains 11 to 14 repeats of a 124-base-pair sequence designated IR2. DR contains approximately 30 direct repeats of a 103-base-pair sequence designated IR4. The DL and DR sequences have colinear partial homology for approximately 2.4 and 1.5 kilobase pairs to the right of IR2 and IR4, respectively. IR2 and IR4 are similar sequences and evolved in part from a common ancestor. Both sequences are 84% guanine and cytosine and have limited homology to Epstein-Barr virus IR1 and to the herpes simplex virus type 1 inverted terminal repeat "a" sequence. IR2 encodes part of an abundant 2.5-kilobase persistent early EBV RNA expressed in productively infected cells, but does not encode part of the 3-kilobase Epstein-Barr virus RNA which is transcribed from the adjacent IR1-U2 region of the Epstein-Barr virus genome in latently infected cells.

2.
The BamHI K region of Epstein-Barr virus DNA is transcribed in latently infected cells from Burkitt tumors and in growth-transformed B-lymphocytes latently infected with Epstein-Barr virus. We determined the nucleotide sequence of a 1,153-base pair HinfI fragment in BamHI fragment K from the B95-8 Epstein-Barr virus isolate. The fragment contains a remarkable 708-base pair simple sequence repeat array, designated IR3, which is composed of only three nucleotide triplet elements: GGG, GCA, and GGA. The triplets are organized into three repeat units: GCAGGA, GCAGGAGGA, and GGGGCAGGA. Immediately 3' of IR3 are tandem nearly perfect direct repeats of two different 24-base pair sequences. IR3 is conserved at a colinear position in the DNAs of other Epstein-Barr virus isolates, and a homologous sequence maps at the same location in the genome of a genetically related baboon herpesvirus, herpesvirus papio. IR3 is transcribed from left to right in latently infected, growth-transformed IB4 cells. It encodes part of a 2.0-kilobase exon of the 3.7-kilobase cytoplasmic polyadenylated RNA previously detected in IB4 cells (van Santen et al., Proc. Natl. Acad. Sci. U.S.A. 78:1930-1934, 1981). IR3 also encodes parts of 2.4- and 1.0-kilobase RNAs in productively infected B95-8 cells.

3.

Background

Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data.

Results

Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution.

Conclusions

While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.

4.
G. S. Wilkinson, F. Mayer, G. Kerth, B. Petri. Genetics, 1997, 146(3): 1035-1048
Analysis of mitochondrial DNA control region sequences from 41 species of bats representing 11 families revealed that repeated sequence arrays near the tRNA-Pro gene are present in all vespertilionine bats. Across 18 species tandem repeats varied in size from 78 to 85 bp and contained two to nine repeats. Heteroplasmy ranged from 15% to 63%. Fewer repeats among heteroplasmic than homoplasmic individuals in a species with up to nine repeats indicates selection may act against long arrays. A lower limit of two repeats and more repeats among heteroplasmic than homoplasmic individuals in two species with few repeats suggests length mutations are biased. Significant regressions of heteroplasmy, θ and π, on repeat number further suggest that repeat duplication rate increases with repeat number. Comparison of vespertilionine bat consensus repeats to mammal control region sequences revealed that tandem repeats of similar size, sequence and number also occur in shrews, cats and bighorn sheep. The presence of two conserved protein-binding sequences in all repeat units indicates that convergent evolution has occurred by duplication of functional units. We speculate that D-loop region tandem repeats may provide signal redundancy and a primitive repair mechanism in the event of somatic mutations to these binding sites.

5.
Tandemly repeated sequences are a major component of the eukaryotic genome. Although the general characteristics of tandem repeats have been well documented, the processes involved in their origin and maintenance remain unknown. In this study, a region on the paternal sex ratio (PSR) chromosome was analyzed to investigate the mechanisms of tandem repeat evolution. The region contains a junction between a tandem array of PSR2 repeats and a copy of the retrotransposon NATE, with other dispersed repeats (putative mobile elements) on the other side of the element. Little similarity was detected between the sequence of PSR2 and the region of NATE flanking the array, indicating that the PSR2 repeat did not originate from the underlying NATE sequence. However, a short region of sequence similarity (11/15 bp) and an inverted region of sequence identity (8 bp) are present on either side of the junction. These short sequences may have facilitated nonhomologous recombination between NATE and PSR2, resulting in the formation of the junction. Adjacent to the junction, the three most terminal repeats in the PSR2 array exhibited a higher sequence divergence relative to internal repeats, which is consistent with a theoretical prediction of the unequal exchange model for tandem repeat evolution. Other NATE insertion sites were characterized which show proximity to both tandem repeats and complex DNAs containing additional dispersed repeats. An "accretion model" is proposed to account for this association by the accumulation of mobile elements at the ends of tandem arrays and into "islands" within arrays. Mobile elements inserting into arrays will tend to migrate into islands and to array ends, due to the turnover in the number of intervening repeats. Received: 18 August 1997 / Accepted: 18 September 1998

6.
The repetitive sequence PisTR-A has an unusual organization in the pea (Pisum sativum) genome, being present both as short dispersed repeats and as long arrays of tandemly arranged satellite DNA. Cloning, sequencing and FISH analysis of both PisTR-A variants revealed that the former occurs in the genome embedded within the sequence of Ty3/gypsy-like Ogre elements, whereas the latter forms homogenized arrays of satellite repeats at several genomic loci. The Ogre elements carry the PisTR-A sequences in their 3′ untranslated region (UTR) separating the gag-pol region from the 3′ LTR. This region was found to be highly variable among pea Ogre elements, and includes a number of other tandem repeats along with or instead of PisTR-A. Bioinformatic analysis of LTR-retrotransposons mined from available plant genomic sequence data revealed that the frequent occurrence of variable tandem repeats within 3′ UTRs is a typical feature of the Tat lineage of plant retrotransposons. Comparison of these repeats to known plant satellite sequences uncovered two other instances of satellites with sequence similarity to Tat-like retrotransposon 3′ UTR regions. These observations suggest that some retrotransposons may significantly contribute to satellite DNA evolution by generating a library of short repeat arrays that can subsequently be dispersed through the genome and eventually further amplified and homogenized into novel satellite repeats.

7.
Human mammary cells present on the cell surface a polymorphic epithelial mucin (PEM) which is developmentally regulated and aberrantly expressed in tumors. PEM carries tumor-associated epitopes recognized by the monoclonal antibodies HMFG-1, HMFG-2, and SM-3. Previously isolated partial cDNA clones revealed that the core protein contained a large domain consisting of variable numbers of 20-amino acid repeat units. We now report the full sequence for PEM, as deduced from cDNA sequences. The encoded protein consists of three distinct regions: the amino terminus consisting of a putative signal peptide and degenerate repeats; the major portion of the protein which is the tandem repeat region; the carboxyl terminus consisting of degenerate tandem repeats and a unique sequence containing a transmembrane sequence and a cytoplasmic tail. Potential O-glycosylation sites (serines or threonines) make up more than one-fourth of the amino acids. Length variations in the tandem repeat result in PEM being an expressed variable number tandem repeat locus. Tandem repeats appear to be a general characteristic of mucin core proteins.

8.
9.
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA offers several advanced repeat search parameters/options compared to other repeat finder programs: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, but also analyzes multiple files containing multiple sequences. The tandem repeat motif lengths that E-TRA can find range from one to one thousand. Advanced user-defined parameters/options let researchers use different minimum motif repeat search criteria for varying motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some of the tandem repeats play important roles in the regulation of gene expression whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in number of repeats as well as repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, thus differing from the un-transcribed repeats found in genomes.
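The idea of applying a different minimum repeat count to each motif length, as the abstract above describes, can be illustrated with a minimal Python sketch. This is not E-TRA's code or interface: the function name `find_exact_tandem_repeats` and the `min_repeats_by_len` mapping are hypothetical, and the regular-expression approach is only one simple way to find exact tandem repeats.

```python
import re


def find_exact_tandem_repeats(seq, min_repeats_by_len):
    """Return (start, motif, copies) for exact tandem repeats in seq.

    min_repeats_by_len is a hypothetical mapping of motif length to the
    minimum number of copies required for that length.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_copies in sorted(min_repeats_by_len.items()):
        # (.{k}) captures a candidate motif; \1{n,} demands n more exact copies.
        pattern = re.compile(r"(.{%d})\1{%d,}" % (motif_len, max(min_copies - 1, 0)))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" when searching for dinucleotides).
            if (motif + motif).find(motif, 1) != motif_len:
                continue
            copies = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, copies))
    return hits


# At least 3 copies for dinucleotide and trinucleotide motifs.
print(find_exact_tandem_repeats("TTACACACACGGCAGCAGCAGTT", {2: 3, 3: 3}))
```

Because matches are reported from the leftmost position, a repeat may be named by any rotation of its motif (here `GCA` rather than `CAG`); a fuller tool would likely normalize motif rotations before counting.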

10.
11.
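Species Differences and Biological Functions of Tandem Repeat Sequences   Total citations: 13 (self-citations: 0, citations by others: 13)
GAO Huan (高焕), KONG Jie (孔杰). Zoological Research (动物学研究), 2005, 26(5): 555-564
Tandem repeats are repetitive sequences composed of a core repeat unit of roughly 1-200 bases repeated many times in a head-to-tail arrangement. They are widespread in the genomes of eukaryotes and some prokaryotes, and show specificity with respect to species and base composition. At the whole-genome level, the predominant repeat types differ from one genome to another. Even within a single repeat type, the different repeat motif classes (such as AT and AC) vary greatly in their occurrence in the genome. These repeat types and motif classes also show species- and base-composition-related differences among the chromosomes of a single species, and between the coding and non-coding regions of genes. These differences reveal the complexity of the origin and evolution of repeat sequences, which may involve multiple mechanisms and factors and are closely related to biological function. In addition, repeat-analysis software and statistical criteria still have open issues regarding algorithms, repeat length, and repeat perfection, which require further discussion. The evolutionary relationships among tandem repeats themselves, their evolutionary status at the whole-genome level, their biological functions in the genome, and the construction and application of repeat sequence databases will be the main topics of future research.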

12.
Proteins that share even low sequence homologies are known to adopt similar folds. The beta-propeller structural motif is one such example. Identifying sequences that adopt a beta-propeller fold is useful to annotate protein structure and function. Often, tandem sequence repeats provide the necessary signal for identifying beta-propellers in proteins. In our recent analysis to identify cell surface proteins in archaeal and bacterial genomes, we identified some proteins that contain novel tandem repeats "LVIVD", "RIVW" and "LGxL". In this work, based on protein fold predictions and three-dimensional comparative modeling methods, we predicted that these repeat types fold as beta-propellers. Further, the evolutionary trace analysis of all proteins constituting amino acid sequence repeats in beta-propellers suggests that the novel repeats have diverged from a common ancestor.

13.
We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of dimer.
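A length distribution of the kind analyzed in the abstract above can be tabulated with a short Python sketch. The function name `dimer_run_length_counts` is hypothetical and this is not the authors' pipeline; it simply counts, for each of the 16 dimers, how many maximal runs of that dimer have each copy number.

```python
import re
from collections import Counter
from itertools import product


def dimer_run_length_counts(seq):
    """Map each of the 16 dimers to a Counter {copies: number_of_runs}."""
    seq = seq.upper()
    counts = {}
    for first, second in product("ACGT", repeat=2):
        dimer = first + second
        # Greedy, non-overlapping runs of one or more copies of the dimer.
        runs = re.findall("(?:%s)+" % dimer, seq)
        counts[dimer] = Counter(len(run) // 2 for run in runs)
    return counts


counts = dimer_run_length_counts("ACACACTTACAC")
print(dict(counts["AC"]))  # one run of 3 copies and one run of 2 copies
```

To compare the two shapes the abstract distinguishes, one would typically plot these counts against copy number: an exponential distribution is roughly a straight line on semi-logarithmic axes, while a power-law tail is roughly straight on log-log axes.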

14.
In higher eukaryotes, the 5S ribosomal DNA (5S rDNA) is organized in tandem arrays with repeat units composed of a coding region and a non-transcribed spacer sequence (NTS). These tandem arrays can be found on either one or more chromosome pairs. 5S rDNA copies from the tilapia fish, Oreochromis niloticus, were cloned and the nucleotide sequences of the coding region and of the non-transcribed spacer were determined. Moreover, the genomic organization of the 5S rDNA tandem repeats was investigated by fluorescence in situ hybridization (FISH) and Southern blot hybridization. Two 5S rDNA classes, one consisting of 1.4-kb repeats and another one with 0.5-kb repeats, were identified and designated 5S rDNA type I and type II, respectively. An inverted 5S rRNA gene and a 5S rRNA putative pseudogene were also identified inside the tandem repeats of 5S rDNA type I. FISH permitted the visualization of the 5S rRNA genes at three chromosome loci, one of them consisting of arrays of the 5S rDNA type I, and the two others corresponding to arrays of the 5S rDNA type II. The two classes of the 5S rDNA, the presence of pseudogenes, and the inverted genes observed in the O. niloticus genome might be a consequence of the intense dynamics of the evolution of these tandem repeat elements.

15.
Analysis of the genomes of different bovine herpesvirus 1 strains revealed a UL terminal HindIII fragment differing in size (from 2.4 to 2.8 kilobases). This fragment polymorphism occurred in the DNA of a wild-type isolate, in highly passaged, apathogenic tissue culture derivatives, and in plaque-purified substrains. This heterogeneity was due to variations in the copy number of a 14-base-pair tandem repeat comprising the base sequence 5'-GCTCCTCCTCCCTC-3', which also exists, with some differences, in other short reiteration sequences of herpes simplex virus type 1, Epstein-Barr virus, and related human cellular DNA. Furthermore, the tandem repeat array was located in close proximity to the left end of the viral genome and may functionally be involved in viral replication.
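As a rough sense of scale for the abstract above, the reported fragment sizes imply the following spread in copy number. This is only a back-of-the-envelope estimate: it assumes the 2.4 and 2.8 kilobase figures are exact and that the whole size difference comes from the 14-base-pair repeat.

```latex
\[
\Delta n \;\approx\; \frac{2800\ \text{bp} - 2400\ \text{bp}}{14\ \text{bp per copy}}
\;=\; \frac{400}{14} \;\approx\; 29\ \text{copies}
\]
```

Since fragment sizes read from gels are approximate, the true difference between the shortest and longest arrays could plausibly be a few copies more or fewer.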

16.
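A Suffix Array (后缀列) Based Technique for Finding Maximal Tandem Repeats in Gene Sequences   Total citations: 1 (self-citations: 0, citations by others: 1)
Repeat sequence analysis plays an important role in whole-genome research, and its first task is to identify and locate all repeat structures in a DNA sequence. This paper proposes a new algorithm, based on a simple data structure, the suffix array (后缀数), for finding all maximal tandem repeats in a given DNA sequence. On the basis of this algorithm, an effective and practical program, RepLocate, was written, and examples of its application to known DNA sequences are given.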
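The task described in the abstract above, locating every maximal tandem repeat in a sequence, can be made concrete with a small Python sketch. It is not the RepLocate algorithm and does not use a suffix-based index: it is a plain quadratic scan, written under the assumption that a maximal tandem repeat is a stretch with smallest period p, at least 2p long, that cannot be extended in either direction. The function name `maximal_tandem_repeats` is hypothetical.

```python
def maximal_tandem_repeats(seq, max_period=None):
    """Return sorted (start, end, period) with seq[start:end] a maximal repeat.

    Brute-force O(n * max_period) scan; a suffix-based index would be
    used instead for genome-scale input.
    """
    n = len(seq)
    max_period = max_period or n // 2
    found = []
    for p in range(1, max_period + 1):
        i = 0
        while i + p < n:
            if seq[i] != seq[i + p]:
                i += 1
                continue
            start = i
            while i + p < n and seq[i] == seq[i + p]:
                i += 1
            # Positions start..i-1 each equal the character p later, so
            # seq[start:i+p] has period p; require at least two full copies.
            if i - start >= p:
                unit = seq[start:start + p]
                # Keep only primitive units so each repeat is reported
                # once, at its smallest period.
                if (unit + unit).find(unit, 1) == p:
                    found.append((start, i + p, p))
    return sorted(found)


print(maximal_tandem_repeats("GATTACACACAG"))
```

The returned intervals are half-open, so `seq[start:end]` is the repeat itself and `(end - start) / period` gives the possibly fractional copy number.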

17.
Cheng ZJ, Murata M. Genetics, 2003, 164(2): 665-672
From a wild diploid species that is a relative of wheat, Aegilops speltoides, a 301-bp repeat containing 16 copies of a CAA microsatellite was isolated. Southern blot and fluorescence in situ hybridization revealed that approximately 250 bp of the sequence is tandemly arrayed at the centromere regions of A- and B-genome chromosomes of common wheat and rye chromosomes. Although the DNA sequence of this 250-bp repeat showed no notable homology in the databases, the flanking or intervening sequences between the repeats showed high homologies (>82%) to two separate sequences of the gag gene and its upstream region in cereba, a Ty3/gypsy-like retroelement of Hordeum vulgare. Since the amino acid sequence deduced from the 250 bp with seven CAAs showed some similarity (approximately 53%) to that of the gag gene, we concluded that the 250-bp repeats had also originated from the cereba-like retroelements in diploid wheat such as Ae. speltoides and had formed tandem arrays, whereas the 300-bp repeats were dispersed as a part of cereba-like retroelements. This suggests that some tandem repeats localized at the centromeric regions of cereals and other plant species originated from parts of retrotransposons.

18.
Li J, He S, Zhang L, Hu Y, Yang F, Ma L, Huang J, Li L. Protoplasma, 2012, 249(1): 207-215
Some reports have shown that nucleolar organizer regions are located at the telomeric region and have a structural connection with telomeres at the cellular level in many organisms. In this study, we found that all 45S ribosomal DNA (rDNA) signals were located at telomeric regions on the chromosomes in Chrysanthemum segetum L., and the 45S rDNA showed distinct signal patterns on different metaphase chromosome spreads. The bicolor fluorescence in situ hybridization experiment on the extended fibers revealed that telomere repeats were structurally connected with or interspersed into rDNA sequences. The close cytological structure relation between rDNA and telomere sequences led us to use PCR with combinations of the telomere primer and the rDNA primer to obtain some fragments, which were flanked by different rDNA and telomere primer sequences. One representative clone CHS2 contains closely connected rDNA and telomere sequences, suggesting that the telomere sequence invaded into the conserved rDNA sequence. In addition, the sequences of some PCR clones were flanked by the single telomeric primer sequence or the rDNA primer sequence. These results suggested that homologous recombination occurred between tandem repeat units of rDNA sequences or telomere repeats at the chromosome terminus.

19.
BACKGROUND: Triplet repeat sequences are of considerable biological importance as the expansion of such tandem arrays can lead to the onset of a range of human diseases. Such sequences can self-pair via mismatch alignments to form higher order structures that have the potential to cause replication blocks, followed by strand slippage and sequence expansion. The all-purine d(GGA)n triplet repeat sequence is of particular interest because purines can align via G.G, A.A and G.A mismatch formation. RESULTS: We have solved the structure of the uniformly 13C,15N-labeled d(G1-G2-A3-G4-G5-A6-T7) sequence in 10 mM Na+ solution. This sequence adopts a novel twofold-symmetric duplex fold where interlocked V-shaped arrowhead motifs are aligned solely via interstrand G1.G4, G2.G5 and A3.A6 mismatch formation. The tip of the arrowhead motif is centered about the p-A3-p step, and symmetry-related local parallel-stranded duplex domains are formed by the G1-G2-A3 and G4-G5-A6 segments of partner strands. CONCLUSIONS: The purine-rich (GGA)n triplet repeat sequence is dispersed throughout the eukaryotic genome. Several features of the arrowhead duplex motif for the (GGA)2 triplet repeat provide a unique scaffold for molecular recognition. These include the large localized bend in the sugar-phosphate backbones, the segmental parallel-stranded alignment of strands and the exposure of the Watson-Crick edges of several mismatched bases.

20.
The structure of eight satellite DNA molecules containing a junction between tandem arrays of different repeated sequences is described. In one class of junctions there was an abrupt switch with the juxtaposition of two satellite arrays. These arrays were closely related and the periodicity of repeats was maintained in phase across the junction. These arrays usually showed extreme homogeneity in their repeating sequences. A second class of junctions was more complex, and in two cases may have arisen by the insertion of a mobile element into a satellite array. A novel mechanism of satellite formation is proposed to explain the precision of junctions and sequence similarities of neighboring satellite arrays. Homogeneous satellite arrays would be generated enzymatically by synthesis of a repeat using the preceding repeat as template. Occasional errors in copying of the template, either single base changes or misreading the length of the repeat unit, would lead to abrupt switches in the repeating sequence.
