Similar articles
2.
During recloning of Nicotiana tabacum L. repetitive sequence R8.3 in Escherichia coli, a modified clone that differed from the original by the insertion of an IS10 sequence was unintentionally produced. The insert was flanked by a 9-bp direct repeat derived from the R8.3 sequence; such a 9-bp duplication of acceptor DNA at the insertion site is characteristic of IS10 transposition events. A database search using the FASTA program showed IS10 and other prokaryotic IS elements inserted into numerous eukaryotic clones. Unexpectedly, IS10, which is not a natural component of the E. coli genome, appeared to be by far the most frequent contaminant of DNA databases among the several IS sequences tested. In the GenEMBL database, the IS10 query sequence yielded positive scores with more than 500 eukaryotic clones. Insertions of shortened IS10 sequences retaining only one intact terminal inverted repeat were commonly found. Most full-length IS10 insertions (32 of 40 analyzed) were flanked by 9-bp direct repeats with the consensus 5'-NPuCNN-NGPyN-3' and a strong preference for 5'-TGCTNA-GNN-3'. One insertion was flanked by an inverted repeat more than 400 bp long. PCR amplification and Southern analysis revealed IS10 sequences in E. coli strains commonly used for DNA cloning, including some reported to be Tn10-free. No IS10-specific PCR product was obtained with N. tabacum or human DNA. Our data suggest that transposition of IS10 elements may accompany cloning steps, particularly into large BAC vectors, which might lead to the relatively frequent contamination of DNA databases by this bacterial sequence. It is estimated that approximately one in every thousand eukaryotic clones in the databases is contaminated by IS-derived sequences. We recommend checking submitted sequences for the presence of IS10 and other IS elements. In addition, DNA databases should be corrected by removing contaminating IS sequences.
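Checking a putative IS10 hit for the hallmark described above (a 9-bp direct repeat of target DNA on both sides of the insert, fitting 5'-NPuCNN-NGPyN-3') can be sketched in Python. This is a minimal sketch, not the authors' pipeline: it assumes the insertion boundaries are already known (for example from a FASTA hit against IS10), and the helper names `target_site_duplication` and `matches_is10_consensus` are hypothetical.

```python
import re

# IUPAC symbols needed for the IS10 consensus: R = purine (Pu), Y = pyrimidine (Py).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# 5'-NPuCNN-NGPyN-3' rewritten with IUPAC letters (Pu -> R, Py -> Y),
# hyphen dropped: nine positions in total.
IS10_TSD_CONSENSUS = "NRCNNNGYN"


def iupac_regex(pattern):
    """Compile an IUPAC nucleotide pattern into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in pattern.upper()))


def target_site_duplication(seq, ins_start, ins_end, k=9):
    """Return the k-bp direct repeat flanking seq[ins_start:ins_end], or None.

    A target-site duplication means the k bases immediately 5' of the
    insertion are identical to the k bases immediately 3' of it.
    """
    if ins_start < k:
        return None
    left = seq[ins_start - k:ins_start]
    right = seq[ins_end:ins_end + k]
    return left if len(right) == k and left == right else None


def matches_is10_consensus(repeat):
    """True if a 9-bp repeat fits the 5'-NPuCNN-NGPyN-3' consensus."""
    return iupac_regex(IS10_TSD_CONSENSUS).fullmatch(repeat.upper()) is not None


if __name__ == "__main__":
    flank = "TGCTTAGCA"                                # fits the preferred 5'-TGCTNA-GNN-3'
    seq = "AAAA" + flank + "G" * 10 + flank + "CCCC"   # toy 10-bp "insert"
    repeat = target_site_duplication(seq, 13, 23)
    print(repeat, matches_is10_consensus(repeat))      # TGCTTAGCA True
```

Working on known coordinates keeps the check independent of whichever search tool found the hit.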

3.
A 190 bp insertion is associated with the white-eosin mutation in Drosophila melanogaster. This insertion is a member of a family of transposable elements, the pogo elements, which belongs to the same class as the P and hobo elements of D. melanogaster. Strains typically have many copies of a 190 bp element, 10–15 elements 1.1–1.5 kb in size, and several copies of a 2.1 kb element. The smaller elements all appear to be derived from the largest by single internal deletions, so all elements share terminal sequences. They either always insert at the dinucleotide TA and have perfect 21 bp terminal inverted repeats, or have 22 bp inverted repeats and produce no duplication upon insertion. Analysis by DNA blotting of their distribution and occupancy of insertion sites in different strains suggests that they may be less mobile than P or hobo. The DNA sequence of the largest element has two long open reading frames on one strand, which are joined by splicing, as indicated by cDNA analysis. RNAs of this strand are made whose sizes are similar to the major size classes of elements. A protein predicted by the DNA sequence has significant homology with a human centromere-associated protein, CENP-B. Homologous sequences were not detected in other Drosophila species, suggesting that this transposable element family may be restricted to D. melanogaster.
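One way to see that the two descriptions of pogo ends above are compatible: if an element with perfect 21 bp terminal inverted repeats sits inside a duplicated TA, moving each boundary one base outward (taking the A of the left TA and the T of the right TA into the element) gives 22 bp inverted repeats and leaves no duplicated flank. That reading is an interpretation, illustrated here with a toy sequence rather than the real pogo ends; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tir_length(element, max_len=50):
    """Length of the longest perfect terminal inverted repeat.

    n qualifies when the first n bases equal the reverse complement of the
    last n bases; if n qualifies, every shorter length does too, so the scan
    can stop at the first failure.
    """
    best = 0
    for n in range(1, min(max_len, len(element) // 2) + 1):
        if element[:n] != revcomp(element[-n:]):
            break
        best = n
    return best


def flanked_by_ta(locus, start, end):
    """True if locus[start:end] sits between two copies of the dinucleotide TA."""
    return start >= 2 and locus[start - 2:start] == "TA" and locus[end:end + 2] == "TA"


if __name__ == "__main__":
    tir = "CAGTATAAGGCCTTACGGTCA"                  # toy 21-bp terminal repeat
    element = tir + "AAAAAAAA" + revcomp(tir)
    locus = "GGGTA" + element + "TACCC"            # element inserted into a duplicated TA
    start, end = 5, 5 + len(element)
    print(tir_length(element), flanked_by_ta(locus, start, end))  # 21 True
    print(tir_length(locus[start - 1:end + 1]))                   # 22
```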

4.
HERV-H, a family of endogenous retroviral elements that has undergone successive expansions in the human genome, includes sequences that are expressed in placenta and T cells. Using a PCR approach targeting HERV-H on DNA from human monochromosomal somatic cell hybrids, we identified 8 new HERV-H sequences on the X chromosome and one novel HERV-H element, HY-1, the first such element reported on the Y chromosome, and compared these with sequences in the nucleotide sequence database. Phylogenetic analysis indicated that clone HX-1 and BAC clone 523A23 on the X chromosome are closely related to the sequence of DJ088A21 on human chromosome 7q31. This finding allows us to speculate that HERV-H elements may have evolved by intra-chromosomal spread. Our data may be relevant to an understanding of human genomic plasticity.

5.
We describe the further development of a widely used package of DNA and protein sequence analysis programs for microcomputers (1,2,3). The package now provides a screen-oriented user interface and an enhanced working environment with powerful formatting, disk access, and memory management tools. The new GenBank floppy disk database is supported transparently to the user, and a similar version of the NBRF protein database is provided. The programs can use sequence file annotation to annotate printouts automatically and to translate or extract specified regions from sequences by name. The sequence comparison programs can now perform a 5000 × 5000 bp analysis in 12 minutes on an IBM PC. A program to locate potential protein coding regions in nucleic acids, a digitizer interface, and other additions are also described.
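The abstract does not say how the coding-region program works, so the following is only a generic stand-in: a Python sketch that scans all six reading frames for ATG-to-stop open reading frames. The 100-codon default threshold and the function names are assumptions; coding-region finders of that period also used statistics such as codon preference, which this sketch does not attempt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def _orfs_one_strand(s, min_codons):
    """Yield (start, end) of ATG...stop ORFs in the three frames of s."""
    for frame in range(3):
        start = None
        for i in range(frame, len(s) - 2, 3):
            codon = s[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:   # codons before the stop
                    yield start, i + 3
                start = None


def find_orfs(seq, min_codons=100):
    """ORFs on both strands as (strand, start, end) in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", a, b) for a, b in _orfs_one_strand(seq, min_codons)]
    # Reverse-strand hits are mapped back so that seq[start:end] covers the ORF.
    hits += [("-", n - b, n - a) for a, b in _orfs_one_strand(revcomp(seq), min_codons)]
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 5 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=5))            # [('+', 2, 23)]
    print(find_orfs(revcomp(toy), min_codons=5))   # [('-', 2, 23)]
```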

7.
Eight terminally deleted Drosophila melanogaster chromosomes have now been found to be "healed." In each case, the healed chromosome end had acquired sequence from the HeT DNA family, a complex family of repeated sequences found only in telomeric and pericentric heterochromatin. The sequences were apparently added by transposition events involving no sequence homology. We now report that the sequences transposed in healing these chromosomes identify a novel transposable element, HeT-A, which makes up a subset of the HeT DNA family. Addition of HeT-A elements to broken chromosome ends appears to be polar. The proximal junction between each element and the broken chromosome end is an oligo(A) tract beginning 54 nucleotides downstream from a conserved AATAAA sequence on the strand running 5' to 3' from the chromosome end. The distal (telomeric) ends of HeT-A elements are variably truncated; however, we have not yet been able to determine the extreme distal sequence of a complete element. Our analysis covers approximately 2,600 nucleotides of the HeT-A element, beginning with the oligo(A) tract at one end. Sequence similarity is high (greater than 75% between all elements studied). Sequence may be conserved for DNA structure rather than for protein coding; even the most recently transposed HeT-A elements lack significant open reading frames in the region studied. Instead, the elements exhibit conserved short-range sequence repeats and periodic long-range variation in base composition. These conserved features suggest that HeT-A elements, although transposable elements, may have a structural role in telomere organization or maintenance.
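The junction rule described above (an oligo(A) tract beginning 54 nucleotides downstream of a conserved AATAAA) translates directly into a scan. A minimal Python sketch; how the 54 nt is counted is not spelled out here, so measuring from the first base after AATAAA is an assumption, as are the minimum tract length and the function name `polya_junctions`.

```python
import re


def polya_junctions(seq, offset=54, min_run=5):
    """Find AATAAA signals followed by an oligo(A) tract starting `offset` nt later.

    Assumption: the offset is counted from the first base after AATAAA to the
    first A of the tract. Returns (signal_start, tract_start, tract_length).
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"(?=AATAAA)", seq):   # lookahead allows overlapping signals
        pos = m.start() + 6 + offset
        run = re.match(r"A+", seq[pos:])
        # The tract must really begin here, not be the middle of a longer A-run.
        if run and len(run.group()) >= min_run and seq[pos - 1] != "A":
            hits.append((m.start(), pos, len(run.group())))
    return hits


if __name__ == "__main__":
    toy = "GC" + "AATAAA" + "C" * 54 + "A" * 12 + "G"
    print(polya_junctions(toy))   # [(2, 62, 12)]
```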

9.
SeqState     
Choosing and designing primers based on available DNA sequence data and statistical contrasting of domains or structural features is a common routine among molecular biologists. Currently available, free software tools were found to lack desirable features related to these tasks. This was the motivation for developing a new program, SeqState. SeqState locates regions that remain to be sequenced in phylogenetic DNA datasets, evaluates user-provided primers and selects primers best suited to fill gaps in the sequences. If the primers provided by the user are unsuitable, new primers are designed. Primers can be loaded from a primer database, be supplied as part of the alignment or be entered manually. The position of internal primers is automatically localised in the loaded data file. Primers can be edited, and changes and new primers can be saved to the database. Primer sheets allow the user to view internal dimers, complements to a second primer, mismatches to all loaded sequences, and other primer characteristics. Calculation of various sequence statistics can be requested for the whole dataset or parts thereof (character sets), with standard errors estimated by bootstrapping. Insertion-deletion events can be evaluated statistically and encoded for subsequent phylogenetic analysis according to several published coding principles.
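Two of the primer-sheet checks listed above (mismatches against every loaded sequence, and complementarity between primers) are simple enough to sketch. This is an illustrative Python sketch, not SeqState's algorithm: it places primers without gaps on the forward strand only, drops alignment gaps before scanning, and the 4-base 3'-end window for dimer detection is an arbitrary assumption; all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_site(primer, target):
    """(mismatches, position) of the best ungapped placement, or None if target is too short."""
    target = target.replace("-", "")   # assumption: ignore alignment gaps
    best = None
    for i in range(len(target) - len(primer) + 1):
        mm = sum(p != t for p, t in zip(primer, target[i:i + len(primer)]))
        if best is None or mm < best[0]:
            best = (mm, i)
    return best


def mismatch_table(primer, sequences):
    """Best placement of the primer on every loaded sequence (forward strand only)."""
    return {name: best_site(primer, seq) for name, seq in sequences.items()}


def three_prime_dimer(p1, p2, k=4):
    """True if the last k bases of p1 can base-pair perfectly somewhere in p2.

    Use three_prime_dimer(p, p) as a crude self-dimer check.
    """
    return revcomp(p1[-k:]) in p2


if __name__ == "__main__":
    seqs = {"taxonA": "TTACGTACGTAA", "taxonB": "TTACGAACGTAA"}
    print(mismatch_table("ACGTACG", seqs))            # {'taxonA': (0, 2), 'taxonB': (1, 2)}
    print(three_prime_dimer("ACGTTTGG", "GGCCAATT"))  # True: 3' TTGG pairs with CCAA
```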

10.
DNA sequence variation in the chalcone synthase (Chs) and Apetala3 gene promoters from 22 cruciferous plant species was analyzed to identify putative conserved regulatory elements. Our comparative approach confirmed the existence of numerous conserved sequences that may act as regulatory elements in both promoters. To confirm the correct identification of a well-conserved UV-light-responsive promoter region, a subset of Chs promoter fragments was tested in Arabidopsis thaliana protoplasts. All promoters displayed similar light responsiveness, indicating the general functional relevance of the conserved regulatory element. In addition to known regulatory elements, other highly conserved regions were detected that are likely to be of functional importance. Phylogenetic trees based on DNA sequences from both promoters (gene trees) were compared with the hypothesized phylogenetic relationships (species trees) of these taxa. The data derived from both promoter sequences were congruent with the phylogenies obtained from coding regions of other nuclear genes and from chloroplast DNA sequences, indicating that promoter sequence evolution generally reflects species phylogeny. Our study also demonstrates the great value of comparative genomics and phylogenetics as a basis for functional analysis of promoter action and gene regulation.
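Locating conserved stretches in a promoter alignment, the first step described above, can be approximated with a sliding-window identity score. A minimal Python sketch assuming the promoters are already aligned to equal length; the window size, threshold, scoring, and function names are illustrative choices rather than the method used in the study.

```python
from collections import Counter


def column_identity(column):
    """Fraction of sequences carrying the most common base; gaps never count as agreement."""
    bases = [c for c in column if c != "-"]
    if not bases:
        return 0.0
    return Counter(bases).most_common(1)[0][1] / len(column)


def conserved_regions(alignment, window=8, threshold=0.9):
    """Merge all windows whose mean column identity reaches `threshold`."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    scores = [column_identity(col) for col in zip(*alignment)]
    regions = []
    for i in range(length - window + 1):
        if sum(scores[i:i + window]) / window >= threshold:
            if regions and i <= regions[-1][1]:
                regions[-1][1] = i + window   # extend an overlapping or adjacent region
            else:
                regions.append([i, i + window])
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    promoters = ["ACGTCACGTGATCA", "TTGACACGTGCGTA", "GAACCACGTGTACG"]
    print(conserved_regions(promoters, window=6, threshold=1.0))   # [(4, 10)]
```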

11.
Lowary and Widom selected, from random sequences, those that form exceptionally stable nucleosomes, including clone 601, the current champion among strong nucleosome (SN) sequences. This unique sequence database (LW sequences) carries sequence elements that confer stability on the nucleosomes formed on these sequences and thus may serve as a source of information on the structure of an "ideal" or close-to-ideal nucleosome DNA sequence. An important clue is also provided by the crystallographic study of clone 601 nucleosomes by Vasudevan and coauthors. It demonstrated that YR·YR dinucleotide stacks (primarily TA·TA) follow one another at distances of 10 or 11 bases or multiples thereof, such that they are all located at the interface between the DNA and the histone octamer. By combining this information with an alignment of the YR-containing 10-mers and 11-mers from the LW sequences, the bendability matrices of stable nucleosome DNA are derived. The matrices suggest that periodically repeated TA (YR), RR, and YY dinucleotides are the main sequence features of SNs. This consensus coincides with the one for recently discovered SNs with visibly periodic DNA sequences. Thus, the experimentally observed stable LW nucleosomes and the computationally derived SNs appear to represent the same entity: exceptionally stable SNs.
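The periodicity reported for clone 601 (YR steps, mainly TA, recurring every 10 or 11 bases or multiples thereof) suggests a simple check on any candidate sequence. The Python sketch below lists YR steps and scores how many consecutive spacings are in helical phase; it is not the bendability-matrix method of the paper, and the names and the 10k..11k phasing rule are assumptions.

```python
YR_STEPS = {"CA", "CG", "TA", "TG"}   # pyrimidine followed by purine


def step_positions(seq, steps=YR_STEPS):
    """0-based positions of the chosen dinucleotide steps."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in steps]


def in_phase_fraction(positions, max_turns=3):
    """Fraction of consecutive step spacings within 10k..11k for k = 1..max_turns.

    Treating, e.g., 21 (= 10 + 11) as in phase is an assumption of this sketch.
    """
    allowed = {s for k in range(1, max_turns + 1) for s in range(10 * k, 11 * k + 1)}
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        return 0.0
    return sum(g in allowed for g in gaps) / len(gaps)


if __name__ == "__main__":
    toy = "TA" + "A" * 8 + "TA" + "A" * 9 + "TA" + "A" * 8 + "TA"
    pos = step_positions(toy)
    print(pos, in_phase_fraction(pos))   # [0, 10, 21, 31] 1.0
```

Passing `steps={"TA"}` restricts the scan to the TA·TA stacks that dominate in the crystal structure.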

12.
We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared with nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The element usually occurred once or twice in a gene, either in an intron or in the 5' or 3' flanking region. It did not share any similarity with known short interspersed nucleotide elements (SINEs) or with currently known gene regulatory elements. The element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. The element may have a cis-acting functional role in the genome.
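Comparing a viral flanking sequence against database entries, as described above, rests on local alignment. As an illustration of that idea, here is a linear-gap Smith-Waterman scorer in Python with a toy screening loop. Large database searches normally rely on faster heuristics such as FASTA or BLAST; the scoring parameters, the score cut-off, and the names `smith_waterman` and `screen` are assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with linear gaps: (score, end_in_a, end_in_b)."""
    prev = [0] * (len(b) + 1)
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


def screen(query, database, min_score):
    """Names and scores of database entries whose best local score reaches min_score."""
    hits = []
    for name, seq in database.items():
        score = smith_waterman(query, seq)[0]
        if score >= min_score:
            hits.append((name, score))
    return sorted(hits, key=lambda h: -h[1])


if __name__ == "__main__":
    db = {"gene_x": "GGGACGTACGTGGG", "gene_y": "TTTTTTTT"}
    print(screen("ACGTACGT", db, min_score=10))   # [('gene_x', 16)]
```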

13.
Emerson RO, Thomas JH. Journal of Virology, 2011, 85(22): 12043-12052
SCAN is a protein domain frequently found at the N termini of proteins encoded by mammalian tandem zinc finger (ZF) genes, whose structure is known to be similar to that of retroviral gag capsid domains and whose multimerization has been proposed as a model for retroviral assembly. We report that the SCAN domain is derived from the C-terminal portion of the gag capsid (CA) protein from the Gmr1-like family of Gypsy/Ty3-like retrotransposons. On the basis of sequence alignments and phylogenetic distributions, we show that the ancestral host SCAN domain (ESCAN for extended SCAN) was exapted from a full-length CA gene from a Gmr1-like retrotransposon at or near the root of the tetrapod animal branch. A truncated variant of ESCAN that corresponds to the annotated SCAN domain arose shortly thereafter and appears to be the only form extant in mammals. The Anolis lizard has a large number of tandem ZF genes with N-terminal ESCAN or SCAN domains. We predict DNA binding sites for all Anolis ESCAN-ZF and SCAN-ZF proteins and demonstrate several highly significant matches to Anolis Gmr1-like sequences, suggesting that at least some of these proteins target retroelements. SCAN is known to mediate protein dimerization, and the CA protein multimerizes to form the core retroviral and retrotransposon capsid structure. We speculate that the SCAN domain originally functioned to target host ZF proteins to retroelement capsids.

14.
Staden R. Nucleic Acids Research, 1982, 10(15): 4731-4751
This paper describes a computer method for handling gel reading data produced by the shotgun method of DNA sequencing. The method greatly reduces the time the sequencer needs to spend checking and editing his data and yet it produces a consensus sequence for which the accuracy of determination of every base can be clearly shown. The program can take a batch of new gel readings, screen them against vector sequences removing any that match, and then compare and align all the sequences to produce a final consensus. No information is lost in this process as alignments are achieved by making only insertions and because all the individual gel readings are added to a database from which they can be retrieved and displayed lined up one above the other. This allows the user to check on the alignments achieved by the program and if necessary change them. As each gel reading is added to the database the consensus is automatically updated accordingly and used for the next comparisons. This is a much faster process than comparing each new gel against every individual gel in the database.
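Two steps of the procedure described above, screening readings against vector and deriving a consensus from readings lined up one above the other, can be sketched in Python. This is not Staden's program: it flags vector readings by shared k-mers (k = 12 is an arbitrary choice), assumes each reading's offset on the contig is already known, and recomputes the consensus from scratch rather than updating it incrementally; the function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_vector(readings, vector, k=12):
    """Keep only readings that share no k-mer with the vector sequence."""
    vector_kmers = kmers(vector, k)
    return [r for r in readings if not kmers(r, k) & vector_kmers]


def consensus(placed):
    """Majority consensus from (offset, reading) pairs lined up on one contig.

    Columns with a tied vote are reported as N; uncovered columns as '-'.
    """
    length = max(off + len(r) for off, r in placed)
    columns = [Counter() for _ in range(length)]
    for off, reading in placed:
        for i, base in enumerate(reading):
            columns[off + i][base] += 1
    out = []
    for col in columns:
        ranked = col.most_common()
        if not ranked:
            out.append("-")
        elif len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
            out.append("N")
        else:
            out.append(ranked[0][0])
    return "".join(out)


if __name__ == "__main__":
    vector = "GATCGATCGGCCTTAA" * 2
    reads = ["GATCGATCGGCCTTAAC", "ACGTACGTACGTACGT"]
    print(screen_vector(reads, vector))                              # ['ACGTACGTACGTACGT']
    print(consensus([(0, "ACGTAC"), (2, "GTACGG"), (4, "ACGGTT")]))  # ACGTACGGTT
```

Keeping every placed reading, rather than only the consensus, mirrors the point made above that no information is lost and alignments can be inspected and changed.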

18.
Sampling strategies for distances between DNA sequences (cited 2 times; self-citations: 0; citations by others: 2)
Weir BS, Basten CJ. Biometrics, 1990, 46(3): 551-582
An international effort is now underway to obtain the DNA sequence for the entire human genome (Watson and Jordan, 1989, Genomics 5, 654-656; Barnhart, 1989, Genomics 5, 657-660). This Human Genome Initiative will generate sequence data from several species other than humans, and will result in several copies per species of at least some regions of the genome. Although the project has generated much interest, it is but one aspect of the widespread effort to generate DNA sequence data. Published sequences are collected in common databases, and release 63 of GenBank in March 1990 contained 40,127,752 bases from 33,337 reported sequences (News from GenBank 3; Mountain View, California: Intelligenetics, Inc., 1990). Large though this database is, it is only about 1% of the number of bases in the human genome. Interpretations of data of such magnitude are going to require the collaborative efforts of biometricians and molecular biologists, and an aim of this paper is to show that there is also a role for readers of this journal in the design of surveys of DNA sequences. Discussion here will center on the use of sequence data in evolutionary studies, where some region of DNA is sequenced in several different species. The object is to infer the evolutionary history of that particular region, or of the species themselves. Statistical issues in the very important studies on sequences to locate and characterize regions responsible for human diseases will not be addressed here. We will discuss appropriate ways of measuring distances between DNA sequences and of predicting the sampling properties of the distances. There are procedures for inferring evolutionary histories for a set of elements that depend on a matrix of distances between each pair of elements, and the precision of resulting trees must be influenced by the precision of the distances. 
We will show that account needs to be taken of two sampling processes--the sampling of sequences by the investigator ("statistical sampling"), and the sampling of genetic material involved in the formation of offspring from a parental population ("genetic sampling").
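For the "statistical sampling" component, a standard textbook example is the Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3) with its large-sample (delta-method) variance p(1 - p) / [n(1 - 4p/3)^2], which treats the n compared sites as a random sample. The Python sketch below is illustrative only: it is not necessarily the distance measure analyzed by Weir and Basten, it ignores the genetic-sampling component entirely, and the function names are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites among aligned positions where both have A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(x != y for x, y in pairs)
    return diffs / len(pairs), len(pairs)


def jukes_cantor(a, b):
    """JC69 distance and its approximate variance from sampling n sites."""
    p, n = p_distance(a, b)
    w = 1 - 4 * p / 3
    if w <= 0:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    d = -0.75 * math.log(w)
    var = p * (1 - p) / (n * w ** 2)   # delta-method variance over sampled sites
    return d, var


if __name__ == "__main__":
    d, var = jukes_cantor("A" * 90 + "C" * 10, "A" * 100)
    print(round(d, 4), round(var, 6))   # 0.1073 0.001198
```

Because the variance shrinks only as 1/n, doubling the number of sequenced sites roughly halves it; genetic sampling adds variation that more sites alone cannot remove.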

19.
In this study, we introduce a novel bioinformatics program, Spore-associated Symbiotic Microbes Position-specific Function (SeSaMe PS Function), for position-specific functional analysis of short sequences derived from metagenome sequencing data of arbuscular mycorrhizal fungi. The unique advantage of the program lies in databases built from genus-specific sequence properties derived from protein secondary structure, namely amino acid usage, codon usage, and the codon contexts of 3-codon DNA 9-mers. SeSaMe PS Function searches a query sequence against a reference sequence database, identifies 3-codon DNA 9-mers with structural roles, and creates a comparative dataset containing the codon usage biases of these 9-mers across 54 bacterial and fungal genera. The program applies correlation principal component analysis together with K-means clustering to the comparative dataset. 3-codon DNA 9-mers that cluster alone or with only a few other members are often structurally and functionally distinctive sites that provide useful insights into important molecular interactions. The program provides a versatile means of studying the functions of short sequences from metagenome sequencing and has a wide spectrum of applications. SeSaMe PS Function is freely accessible at www.fungalsesame.org.
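The 3-codon DNA 9-mers at the core of the method can be enumerated straightforwardly. In this Python sketch the 9-mers are taken in frame, sliding one codon at a time (an assumption about how the program defines them), and each 9-mer is summarized by the mean frequency of its three codons in a per-genus codon-usage table. This summary and all function names are hypothetical, and the PCA and K-means steps are not shown.

```python
from collections import Counter


def codons(cds):
    """Complete in-frame codons of a coding sequence (a trailing partial codon is dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def three_codon_9mers(cds):
    """In-frame 9-mers (three consecutive codons), sliding one codon at a time."""
    cs = codons(cds)
    return ["".join(cs[i:i + 3]) for i in range(len(cs) - 2)]


def codon_usage(cds):
    """Relative frequency of each codon in a coding sequence."""
    counts = Counter(codons(cds))
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def ninemer_profile(ninemer, usage_by_genus):
    """Hypothetical feature vector: mean frequency of the 9-mer's codons in each genus."""
    cs = codons(ninemer)
    return {genus: sum(usage.get(c, 0.0) for c in cs) / len(cs)
            for genus, usage in usage_by_genus.items()}


if __name__ == "__main__":
    cds = "ATGAAACCCGGGTAA"
    print(three_codon_9mers(cds))   # ['ATGAAACCC', 'AAACCCGGG', 'CCCGGGTAA']
```

A matrix of such profiles (9-mers by genera) is the kind of input a correlation PCA followed by K-means would operate on.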

20.
MOTIVATION: A large body of evidence suggests that protein structural information is frequently encoded in local sequences; sequence-structure relationships derived from local structure/sequence analyses could significantly enhance the capacities of protein structure prediction methods. In this paper, the predictive capacity of a database (LSBSP2) that organizes local sequence-structure relationships encoded in local structures with two consecutive secondary structure elements is tested with two computational procedures for protein structure prediction. The goal is twofold: to test the folding hypothesis that local structures are determined by local sequences, and to enhance our capacity to predict protein structures from their amino acid sequences. RESULTS: The LSBSP2 database contains a large set of sequence profiles derived from exhaustive pairwise structural alignments of local structures with two consecutive secondary structure elements. One computational procedure uses the PSI-BLAST alignment program to predict local structures for test sequence fragments by matching them onto the sequence profiles in the LSBSP2 database. The results show that 54% of the test sequence fragments were predicted with local structures that closely match their native local structures. The other computational procedure is a filter system that removes as many false positives as possible from a set of PSI-BLAST hits. An assessment with a large set of non-redundant protein structures shows that the PSI-BLAST + filter system improves prediction specificity by up to two-fold over PSI-BLAST alone for distantly related protein pairs. Tests with these two computational procedures demonstrate that local sequence-structure relationships can indeed enhance our capacity for protein structure prediction.
The results also indicate that local sequences encoded with strong local structure propensities play an important role in determining the native-state folding topology.


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd. Beijing ICP Registration No. 09084417