Similar Documents
1.
Short interspersed repetitive elements (SINEs) are widely distributed among the genomes of eukaryotes. We proposed previously that a SINE should be defined by the presence of a region homologous to a tRNA or to 7SL RNA, together with A-box and B-box promoter sequences, in order to distinguish SINEs from other short repetitive sequences, such as short segments of LINEs (long interspersed repetitive elements; Okada et al. Gene 205, 229–243, 1997). Numerous SINE sequences have been deposited to date in DNA databases. In some cases, however, designation of a particular sequence is problematic when the short repetitive sequence has been defined as a SINE without reference to the presence or absence of promoter elements specific for RNA polymerase III. We demonstrate here that four different sequences, namely, ARE1p, ARE2p, CetSINE1, and CetSINE2, each of which has been reported as a SINE, are, in fact, only partial sequences of members of a new subfamily of L1. We also demonstrate that members of this subfamily are distributed specifically among the genomes of cetartiodactyls. Received: 3 May 2000 / Accepted: 22 August 2000

2.
During routine screens of the NCBI databases using human repetitive elements, we discovered an improbably high level of nucleotide identity across a broad range of phyla. To ascertain whether databases containing DNA sequences, genome assemblies and trace archive reads were contaminated with human sequences, we performed an in-depth search for sequences of human origin in non-human species. Using a primate-specific SINE, AluY, we screened 2,749 non-primate public databases from NCBI, Ensembl, JGI, and UCSC and found 492 to be contaminated with human sequence. These represent species ranging from bacteria (B. cereus) to plants (Z. mays) to fish (D. rerio), with examples found in most phyla. The identification of such extensive contamination of human sequence across databases and sequence types warrants caution within the sequencing community in future sequencing efforts, such as human re-sequencing. We discuss issues this may raise and present data that give insight into how this contamination may be occurring.
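The AluY screen above amounts to asking how much of a probe sequence reappears in each database record. The abstract does not name the search tool or thresholds, so the sketch below is not the authors' pipeline: it is a minimal k-mer containment check, assuming a hypothetical probe string, an arbitrary k of 11 and an arbitrary cutoff of 0.8. In practice an alignment-based search such as BLAST would normally be used.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def probe_containment(probe: str, record: str, k: int = 11) -> float:
    """Fraction of the probe's k-mers that also occur in the record."""
    probe_kmers = kmers(probe, k)
    if not probe_kmers:  # probe shorter than k
        return 0.0
    return len(probe_kmers & kmers(record, k)) / len(probe_kmers)


def flag_contaminated(probe: str, records: dict, k: int = 11,
                      threshold: float = 0.8) -> list:
    """IDs of records whose containment score reaches the (arbitrary) threshold."""
    return [rid for rid, seq in records.items()
            if probe_containment(probe, seq, k) >= threshold]


# Hypothetical stand-in for an AluY fragment; not a real Alu sequence.
probe = "ACGTACGGTTCAGCATTAGC"
print(flag_contaminated(probe, {"r1": "TTT" + probe, "r2": "C" * 40}))
```

Containment (rather than symmetric similarity) is used because a short probe is expected to sit inside a much longer contaminated record.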

3.
A repeating element of DNA has been isolated and sequenced from the genome of Bordetella pertussis. Restriction map analysis of this element shows single internal ClaI, SphI, BstEII and SalI sites. Over 40 DNA fragments that hybridize to the repetitive DNA sequence are seen in ClaI digests of B. pertussis genomic DNA. Sequence analysis reveals that the repeat has properties consistent with bacterial insertion sequence (IS) elements. These properties include its length of 1,053 bp, its multiple copy number and the presence of 28-bp near-perfect inverted repeats at its termini. Unlike most IS elements, the presence of this element in the B. pertussis genome is not associated with a short duplication in the target DNA sequence. This repeating element is not found in the genomes of B. parapertussis or B. bronchiseptica. Analysis of a DNA fragment adjacent to one copy of the repetitive DNA sequence has identified a different repeating element, which is found in nine copies in B. parapertussis and four copies in B. pertussis, suggesting that there may be other repeating DNA elements in the different Bordetella species. Computer analysis of the B. pertussis repetitive DNA element revealed no significant nucleotide homology between it and any other bacterial transposable element, suggesting that this repetitive sequence is specific to B. pertussis.
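Terminal inverted repeats like the 28-bp ends described for this element can be checked by comparing the first bases of the element with the reverse complement of its last bases. A minimal sketch, assuming the element boundaries are already known and using a hypothetical tolerance of up to 3 mismatches to stand in for "near-perfect":

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_ir_mismatches(element: str, ir_len: int = 28) -> int:
    """Count mismatches between the left end and the reverse complement
    of the right end; 0 means a perfect terminal inverted repeat."""
    if len(element) < 2 * ir_len:
        raise ValueError("element shorter than two inverted repeats")
    left = element[:ir_len].upper()
    right_rc = reverse_complement(element[-ir_len:]).upper()
    return sum(a != b for a, b in zip(left, right_rc))


def has_near_perfect_irs(element: str, ir_len: int = 28,
                         max_mismatches: int = 3) -> bool:
    """True if the terminal IRs differ at no more than max_mismatches
    positions (the tolerance of 3 is a hypothetical choice)."""
    return terminal_ir_mismatches(element, ir_len) <= max_mismatches
```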

4.
Huntley MA, Golding GB. Proteins 2002, 48(1):134–140
Simple sequences are abundant in the proteins that have been sequenced to date, but unusual protein features such as simple sequences are not present at the same high frequency in structural databases. A subset of these simple sequences, those of a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) revealed a large deficiency of low-complexity, highly repetitive protein repeats. Using simulated databases built from comparable samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, we show that the PDB contains significantly less highly repetitive simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for the few PDB sequences that do contain highly repetitive simple sequence are examined in detail, the tertiary structure of the simple-sequence regions is found to be unknown in most cases. This scarcity of simple sequence, both in the PDB and in the available structural information, suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.
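The abstract does not say how low-complexity regions were detected. As a stand-in, the sketch below flags windows with low Shannon entropy of residue composition; this is a simpler measure than dedicated tools such as SEG, and the window size of 12 and the entropy cutoff of 2.2 bits are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    if not window:
        return 0.0
    n = len(window)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_windows(seq: str, window: int = 12,
                           max_entropy: float = 2.2) -> list:
    """Start positions of windows at or below the (arbitrary) entropy cutoff."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) <= max_entropy]
```

A homopolymeric run such as a poly-Q stretch scores 0 bits, while a window using all 20 amino acids equally would reach log2(20), about 4.32 bits.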

5.
We have characterized a family of repetitive DNA elements in the beta-globin locus of the goat. These sequences are structurally analogous to the Alu families of repeats of other mammals. Repetitive elements are located both in the intervening sequences and in the intergenic regions of the goat beta-globin locus. Nucleotide sequence analysis of five repetitive elements located within the large intervening sequence of the beta-like globin genes and four repeats located 5' to the major developmentally regulated beta-globin genes has allowed us to define a consensus sequence for this family of repeats.
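A consensus sequence of this kind is typically built column by column from an alignment of the individual copies. The sketch below assumes the copies are already aligned (gaps written as "-"), takes the most frequent symbol when it occurs in more than half of the sequences, writes N otherwise, and drops columns where a gap wins; these rules are illustrative choices, not the procedure described in the paper.

```python
from collections import Counter


def majority_consensus(aligned: list, min_fraction: float = 0.5,
                       gap: str = "-") -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    A column's most frequent symbol is used if it occurs in more than
    min_fraction of the sequences, otherwise N; gap-majority columns are dropped.
    """
    if not aligned:
        raise ValueError("no sequences given")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    consensus = []
    for column in zip(*(s.upper() for s in aligned)):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            consensus.append("N")
        elif symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)
```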

6.
During recloning of the Nicotiana tabacum L. repetitive sequence R8.3 in Escherichia coli, a modified clone that differed from the original by the insertion of an IS10 sequence was unintentionally produced. The insert was flanked by a 9-bp direct repeat derived from the R8.3 sequence; a 9-bp duplication of the acceptor DNA at the insertion site is characteristic of IS10 transposition events. A database search using the FASTA program showed IS10 and other prokaryotic IS elements inserted into numerous eukaryotic clones. Unexpectedly, IS10, which is not a natural component of the E. coli genome, appeared to be by far the most frequent contaminant of DNA databases among the several IS sequences tested. In the GenEMBL database, the IS10 query sequence yielded positive scores with more than 500 eukaryotic clones. Insertions of shortened IS10 sequences retaining only one intact terminal inverted repeat were commonly found. Most full-length IS10 insertions (32 of the 40 analyzed) were flanked by 9-bp direct repeats with the consensus 5'-NPuCNN-NGPyN-3' and a strong preference for 5'-TGCTNA-GNN-3'. One insertion was flanked by an inverted repeat more than 400 bp long. PCR amplification and Southern analysis revealed the presence of IS10 sequences in E. coli strains commonly used for DNA cloning, including some reported to be Tn10-free. No IS10-specific PCR product was obtained with N. tabacum or human DNA. Our data suggest that transposition of IS10 elements may accompany cloning steps, particularly into large BAC vectors, which might lead to the relatively frequent contamination of DNA databases by this bacterial sequence. We estimate that approximately one in every thousand eukaryotic clones in the databases is contaminated by IS-derived sequences. We recommend checking submitted sequences for the presence of IS10 and other IS elements; in addition, DNA databases should be corrected by removing contaminating IS sequences.
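Two of the checks described above are easy to mechanize: whether an insertion is flanked by a 9-bp direct repeat, and whether that repeat fits the reported consensus. Reading the hyphen in 5'-NPuCNN-NGPyN-3' as a visual separator and writing Pu as R and Py as Y gives the 9-position IUPAC pattern NRCNNNGYN. The sketch assumes the insertion coordinates are already known (0-based, end-exclusive); the function names and the test sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the IS10 target consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC pattern (A/C/G/T/R/Y/N only) to a regular expression."""
    return "".join(IUPAC[base] for base in pattern.upper())


# 5'-NPuCNN-NGPyN-3' with Pu -> R and Py -> Y (hyphen read as a separator).
IS10_TARGET = re.compile(iupac_to_regex("NRCNNNGYN"))


def flanking_direct_repeat(seq: str, ins_start: int, ins_end: int,
                           tsd_len: int = 9):
    """Return the duplicated target site if the tsd_len bases just before
    seq[ins_start:ins_end] equal the tsd_len bases just after it, else None."""
    if ins_start < tsd_len or ins_end + tsd_len > len(seq):
        return None
    left = seq[ins_start - tsd_len:ins_start].upper()
    right = seq[ins_end:ins_end + tsd_len].upper()
    return left if left == right else None


def matches_is10_consensus(site: str) -> bool:
    """True if a 9-bp site fits the reported IS10 target consensus."""
    return IS10_TARGET.fullmatch(site.upper()) is not None
```

The preferred site 5'-TGCTNA-GNN-3' from the abstract, for example TGCTAAGCA, satisfies the broader NRCNNNGYN pattern.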

7.
Carrot is the most economically important member of the Apiaceae family and a major source of provitamin A carotenoids in the human diet. However, carrot molecular resources are relatively underdeveloped, hampering a number of genetic studies. Here, we report on the synthesis and characterization of a bacterial artificial chromosome (BAC) library of carrot. The library is 17.3-fold redundant and consists of 92,160 clones with an average insert size of 121 kb. To provide an overview of the composition and organization of the carrot nuclear genome, we generated and analyzed 2,696 BAC-end sequences (BES) from nearly 2,000 BACs, totaling 1.74 Mb of BES. This analysis revealed that 14% of the BES consists of known repetitive elements, with transposable elements representing more than 80% of this fraction. Eleven novel carrot repetitive elements were identified, covering 8.5% of the BES. Analysis of microsatellites showed a comparatively low frequency of these elements in the carrot BES. Comparisons of the translated BES with protein databases indicated that approximately 10% of the carrot genome represents coding sequences. Moreover, among eight dicot species used for comparison, carrot BES had the highest homology to protein-coding sequences from tomato. This deep-coverage library will aid carrot breeding and genetics. Electronic supplementary material: The online version of this article (doi:) contains supplementary material, which is available to authorized users. Nucleotide sequence data reported are available in the DDBJ/EMBL/GenBank databases under the accession numbers FJ147695–FJ150390.
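Library redundancy is conventionally estimated as the number of clones N times the mean insert size L̄, divided by the haploid genome size G. Applying that standard formula to the figures above is a worked check under that assumption only: the abstract does not state which genome-size estimate the authors used, so the G below is merely what the formula would imply. The same arithmetic gives the mean BAC-end read length.

```latex
\begin{aligned}
\text{redundancy} &= \frac{N\,\bar{L}}{G} \\
N\,\bar{L} &= 92{,}160 \times 121\ \text{kb} = 11{,}151{,}360\ \text{kb} \approx 11.15\ \text{Gb} \\
G &\approx \frac{11.15\ \text{Gb}}{17.3} \approx 645\ \text{Mb} \\
\bar{\ell}_{\text{BES}} &= \frac{1.74\ \text{Mb}}{2{,}696} \approx 645\ \text{bp}
\end{aligned}
```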

8.
9.
Total polysomal RNA from Xenopus laevis stage 40 embryos was probed for the presence of repetitive sequences by Northern blot analysis with a genomic DNA fragment that had previously been shown to contain several repetitive sequence elements (Spohr et al., 1981). The analysis revealed that various presumptive mRNAs contain sequences complementary to the repetitive probe. Consequently, a cDNA library was constructed and screened with the same probe. Forty-eight positive recombinants containing eukaryotic inserts of 300–700 base pairs were isolated, and one such clone was characterized in detail. Analysis of its nucleotide sequence revealed the presence of an open reading frame for 118 amino acids. Comparing the nucleotide sequences located 3′ to this presumptive protein-coding region with the sequence of the genomic DNA fragment used as a probe clearly identifies the repetitive element in the cloned cDNA and defines its exact location. This analysis further shows that one portion of the repeated sequence is highly conserved in the two members of this repetitive sequence family, whereas the other part is more divergent. In this area, blocks of oligonucleotides are scattered between nonhomologous DNA stretches. The frequency of the presumptive mRNAs that carry repetitive elements homologous to the repetitive probe used is suggested to be close to that of rare mRNAs.
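An open reading frame of 118 amino acids spans 118 × 3 + 3 = 357 nucleotides including the stop codon. Locating such an ORF in a cDNA can be sketched as a scan of the three forward reading frames for an ATG followed in frame by a stop codon. This is a minimal illustration under stated assumptions: only the given strand is scanned (a full search would also scan the reverse complement), the standard stop codons are assumed, and ORFs that run off the end without a stop are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> str:
    """Longest ATG...stop ORF in the three forward frames ('' if none).

    The first ATG after a stop opens an ORF; ORFs lacking a stop are ignored.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best


def orf_length_aa(orf: str) -> int:
    """Number of encoded amino acids, excluding the stop codon."""
    return len(orf) // 3 - 1 if orf else 0
```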

10.
11.
The Bombyx fibroin gene has a discrete mosaic structure of various repetitive sequences, which may have evolved through various repeat arrangements. Detailed sequence analysis of the fibroin gene, including its coding and noncoding regions, revealed that the whole sequence can be arranged as an array of short repetitive sequences. A portion of the fibroin gene intron is an interspersed repetitive element. We cloned a 1.5-kb DNA fragment of the Bombyx genome that contains interspersed elements homologous to the intron sequence. Sequence comparison between the intron and the 1.5-kb fragment shows that partial duplication has occurred frequently during evolution, and the resulting repetitive blocks of short motif sequences are abundant in the genome. These facts suggest that tandem duplication of short motif sequences is an important rearrangement in the genomic evolution of the fibroin gene. Offprint requests to: S. Ichimura

12.
The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression by controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb, a specialized, non-redundant database of 5' and 3' untranslated sequences of eukaryotic mRNAs. UTRdb entries are enriched with specialized information not present in the primary databases, including the presence of nucleotide sequence patterns already shown by experimental analysis to have a functional role. All these patterns have been collected in the UTRsite database, so that any input sequence can be searched for the presence of annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements. All internet resources implemented for the retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/
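Searching an input sequence for annotated functional motifs, as UTRsite allows, reduces at its simplest to pattern scanning. The motif table below is a hypothetical toy set containing two well-known 3' UTR elements, the AUUUA pentamer of AU-rich elements and the AAUAAA polyadenylation signal; it does not reproduce UTRsite's actual patterns or syntax.

```python
import re

# Hypothetical toy motif table (RNA alphabet); UTRsite's real patterns are richer.
MOTIFS = {
    "ARE_pentamer": "AUUUA",   # core of AU-rich elements
    "PAS_hexamer": "AAUAAA",   # canonical polyadenylation signal
}


def scan_utr(utr: str) -> list:
    """Return (motif_name, 0-based position) hits, sorted by position.

    DNA input is converted to RNA (T -> U); the zero-width lookahead
    lets overlapping occurrences be reported.
    """
    utr = utr.upper().replace("T", "U")
    hits = []
    for name, motif in MOTIFS.items():
        for match in re.finditer(f"(?={re.escape(motif)})", utr):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: hit[1])
```

Overlap matters here: in AUUUAUUUA the two ARE pentamers share a central A, and a plain non-lookahead search would report only the first.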

13.
Fields CA, Grady DL, Moyzis RK. Genomics 1992, 13(2):431–436
Fifteen examples of the transposon-like human element (THE) LTR and thirteen examples of the MstII interspersed repeat are aligned to generate new consensus sequences for these human repetitive elements. The consensus sequences of these elements are very similar, indicating that they compose subfamilies of a single human interspersed repetitive sequence family. Members of this highly polymorphic repeat family have been mapped to at least 11 chromosomes. Seven examples of the THE internal sequence are also aligned to generate a new consensus sequence for this element. Estimates of the abundance of this repetitive sequence family, derived from both hybridization analysis and frequency of occurrence in GenBank, indicate that THE-LTR/MstII sequences are present every 100-3000 kb in human DNA. The widespread occurrence of members of this family makes them useful landmarks, like Alu, L1, and (GT)n repeats, for physical and genetic mapping of human DNA.
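A spacing estimate converts directly into an approximate copy number by dividing genome size by spacing. Taking the haploid human genome as roughly 3.1 × 10^9 bp (an outside figure, not from the abstract), one element every 100–3000 kb corresponds to roughly one thousand to thirty thousand copies per haploid genome:

```latex
n \approx \frac{G}{d}, \qquad
\frac{3.1 \times 10^{6}\ \text{kb}}{100\ \text{kb}} = 31{,}000, \qquad
\frac{3.1 \times 10^{6}\ \text{kb}}{3000\ \text{kb}} \approx 1{,}030
```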

14.
The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression by controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb (http://bigarea.area.ba.cnr.it:8000/BioWWW/#UTRdb), a specialized, non-redundant database of 5' and 3' untranslated sequences of eukaryotic mRNAs. UTRdb entries are enriched with specialized information not present in the primary databases, including the presence of nucleotide sequence patterns already shown by experimental analysis to have a functional role. All these patterns have been collected in the UTRsite database, so that any input sequence can be searched for the presence of annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements.

15.
16.
Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, mapping exons, revealing differential splicing, and similar tasks. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large, highly spliced genes, which often span 50,000–100,000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 to search EST databases, and Artemis to display the findings graphically. Gene2EST combines these components into a Web resource targeted at researchers who wish to study one or a few genes in a high level of detail.
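RepeatMasker identifies repeats by comparing the query with a library of known repeat sequences; the step that matters for the downstream BLAST search is the masking itself. The sketch below assumes the repeat coordinates are already available (0-based, end-exclusive intervals, values hypothetical) and either hard-masks them with N or soft-masks them as lowercase, the two output styles commonly used by repeat-masking tools.

```python
def mask_intervals(seq: str, intervals, soft: bool = False) -> str:
    """Mask 0-based, end-exclusive intervals in seq.

    Hard masking replaces bases with 'N'; soft masking lowercases them.
    Intervals may overlap and are clipped to the sequence bounds.
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)
```

Hard masking removes repeats from the search entirely, while soft masking keeps the bases visible so that a search tool configured to ignore lowercase for seeding can still extend alignments through them.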

17.
Tautz D. Nucleic Acids Research 1989, 17(16):6463–6471
Short simple sequence stretches occur as highly repetitive elements in all eukaryotic genomes and, to some extent, also in prokaryotes and eubacteria. They are thought to arise by slippage-like events acting on randomly occurring, internally repetitive sequence stretches. This predicts that they should generally be hypervariable in length. I have used the polymerase chain reaction (PCR) to show that several randomly chosen simple sequence loci, with different nucleotide compositions and from different species, show extensive length polymorphism. These simple sequence length polymorphisms (SSLP) may be usefully exploited for identity testing, population studies, linkage analysis and genome mapping.
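Length variability at simple sequence loci is what SSLP typing measures. A minimal scanner for perfect tandem repeats, assuming hypothetical thresholds (repeat units of 1 to 6 bp, at least 4 copies), reports each run at the shortest repeating unit that qualifies; comparing the copy numbers found in two alleles then gives the length polymorphism. Real microsatellite finders also handle imperfect repeats, which this sketch does not.

```python
def repeat_count(seq: str, start: int, k: int) -> int:
    """Number of consecutive copies of seq[start:start + k] beginning at start."""
    motif = seq[start:start + k]
    count, j = 0, start
    while len(motif) == k and seq[j:j + k] == motif:
        count += 1
        j += k
    return count


def find_ssrs(seq: str, max_unit: int = 6, min_repeats: int = 4) -> list:
    """(start, unit, copies) for perfect tandem repeats, scanning left to right.

    Shorter units are tried first, so a poly-A run is reported as 'A',
    not 'AA'. The thresholds are hypothetical defaults.
    """
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        step = 1
        for k in range(1, max_unit + 1):
            count = repeat_count(seq, i, k)
            if count >= min_repeats:
                hits.append((i, seq[i:i + k], count))
                step = k * count  # skip past the whole run
                break
        i += step
    return hits


# Two hypothetical alleles of one locus differing by three CA units.
print(find_ssrs("GG" + "CA" * 8 + "TT"), find_ssrs("GG" + "CA" * 11 + "TT"))
```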

18.
Eller CD, Regelson M, Merriman B, Nelson S, Horvath S, Marahrens Y. Gene 2007, 390(1–2):153–165
Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (> 400-bp) repetitive sequences (“repeats”), including Long Interspersed Nuclear Element-1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes.
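Describing a gene's repetitive sequence environment starts with counting repeats near the gene. The abstract does not define the window, so the sketch below is a hypothetical version: it counts elements of one family (for example Alu) overlapping the gene plus a fixed flank on each side and reports them per kilobase. The flank size, tuple layout and coordinates are assumptions, with all coordinates 0-based, end-exclusive and on the same chromosome.

```python
def repeat_density(gene_start: int, gene_end: int, repeats,
                   family: str = "Alu", flank: int = 50_000) -> float:
    """Repeats of one family per kb in [gene_start - flank, gene_end + flank).

    repeats: iterable of (family, start, end) tuples, 0-based and end-exclusive,
    all on the gene's chromosome. A repeat counts if it overlaps the window.
    """
    win_start = max(0, gene_start - flank)
    win_end = gene_end + flank
    n = sum(1 for fam, start, end in repeats
            if fam == family and start < win_end and end > win_start)
    return n / ((win_end - win_start) / 1000)
```

Computing this for both Alu and L1 around each gene would yield the kind of per-gene features that could then be compared between housekeeping and tissue-specific gene sets; that comparison is not shown here.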

19.
20.
Why repetitive DNA is essential to genome function
There are clear theoretical reasons and many well-documented examples which show that repetitive DNA is essential for genome function. Generic repeated signals in the DNA are necessary to format expression of unique coding sequence files and to organise additional functions essential for genome replication and accurate transmission to progeny cells. Repetitive DNA sequence elements are also fundamental to the cooperative molecular interactions forming nucleoprotein complexes. Here, we review the surprising abundance of repetitive DNA in many genomes, describe its structural diversity, and discuss dozens of cases where the functional importance of repetitive elements has been studied in molecular detail. In particular, the fact that repeat elements serve either as initiators or boundaries for heterochromatin domains and provide a significant fraction of scaffolding/matrix attachment regions (S/MARs) suggests that the repetitive component of the genome plays a major architectonic role in higher-order physical structuring. Employing an information science model, the 'functionalist' perspective on repetitive DNA leads to new ways of thinking about the systemic organisation of cellular genomes and provides several novel possibilities involving repeat elements in evolutionarily significant genome reorganisation. These ideas may facilitate the interpretation of comparisons between sequenced genomes, where the repetitive DNA component is often greater than the coding sequence component.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417