Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
The frequencies of "words", oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence "texts". Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
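
The idea of correlating word vocabularies can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not define "contrast", so the sketch assumes that the contrast of a word is its observed count divided by the count expected from independent base frequencies, and that the similarity is the Pearson correlation of the two contrast vectors. The function names `contrast_vocabulary` and `linguistic_similarity` and the default word length `k=2` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def contrast_vocabulary(seq, k=2):
    """Map every k-letter word to observed/expected count.

    Assumption (not from the abstract): the expected count is derived
    from independent mononucleotide frequencies of the same sequence.
    """
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than the word length")
    n_words = len(seq) - k + 1
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    counts = Counter(seq[i:i + k] for i in range(n_words))
    vocab = {}
    for word in map("".join, product("ACGT", repeat=k)):
        expected = n_words
        for base in word:
            expected *= base_freq[base]
        vocab[word] = counts[word] / expected if expected > 0 else 0.0
    return vocab


def linguistic_similarity(seq_a, seq_b, k=2):
    """Pearson correlation of two contrast vocabularies (illustrative)."""
    va = contrast_vocabulary(seq_a, k)
    vb = contrast_vocabulary(seq_b, k)
    words = sorted(va)
    xs = [va[w] for w in words]
    ys = [vb[w] for w in words]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant vocabulary has no defined correlation; report 0.0.
    return cov / (sx * sy) if sx and sy else 0.0
```

Under these assumptions a sequence compared with itself scores 1.0, and the score is symmetric in its two arguments. Longer words or a Markov-based expectation would be natural variations, but nothing in the abstract fixes those choices.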

3.
Sequence alignment is an important bioinformatics tool for identifying homology, but searching against the full set of available sequences is likely to result in many hits to poorly annotated sequences providing very little information. Consequently, we often want alignments against a specific subset of sequences: for instance, we are looking for sequences from a particular species, sequences that have known 3D structures, sequences that have a reliable (curated) function annotation, and so on. Although such subset databases are readily available, they only represent a small fraction of all sequences. Thus, the likelihood of finding close homologs for query sequences is smaller, and the alignments will in general have lower scores. This makes it difficult to distinguish hits to homologous sequences from random hits to unrelated sequences. Here, we propose a method that addresses this problem by first aligning query sequences against a large database representing the corpus of known sequences, and then constructing indirect (or transitive) alignments by combining the results with alignments from the large database against the desired target database. We compare the results to direct pairwise alignments, and show that our method gives us higher sensitivity alignments against the target database.
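
The two-step combination described here can be sketched at the level of hit tables. The abstract does not say how the two alignments are merged, so this Python sketch makes a simplifying assumption: a transitive hit is scored as the weaker of its two links, and the best intermediate is kept for each query and target pair. The function name `transitive_hits` and the nested-dictionary layout are hypothetical; a real implementation would also have to compose the alignment coordinates.

```python
def transitive_hits(query_to_db, db_to_target):
    """Combine query->intermediate and intermediate->target hits.

    query_to_db:  {query_id: {intermediate_id: score}}
    db_to_target: {intermediate_id: {target_id: score}}

    Assumption (not from the abstract): a transitive hit scores
    min(score1, score2), and the best intermediate wins.
    """
    result = {}
    for query, intermediates in query_to_db.items():
        best = {}
        for mid, score1 in intermediates.items():
            for target, score2 in db_to_target.get(mid, {}).items():
                score = min(score1, score2)
                if score > best.get(target, float("-inf")):
                    best[target] = score
        result[query] = best
    return result


# Hypothetical hit tables: q1 reaches t1 through two intermediates.
hits = transitive_hits(
    {"q1": {"m1": 90, "m2": 40}},
    {"m1": {"t1": 70}, "m2": {"t1": 80, "t2": 30}},
)
```

Taking the minimum keeps a chain only as strong as its weakest link, which is one conservative way to avoid rewarding a strong first hit followed by a weak second one.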

4.
5.
Repetitive DNA sequences near immunoglobulin genes in the mouse genome (Steinmetz et al., 1980a,b) were characterized by restriction mapping and hybridization. Six sequences were determined that turned out to belong to a new family of dispersed repetitive DNA. From the sequences, which are called R1 to R6, a 475 base-pair consensus sequence was derived. The R family is clearly distinct from the mouse B1 family (Krayev et al., 1980). According to saturation hybridization experiments, there are about 100,000 R sequences per haploid genome, and they are probably distributed throughout the genome. The individual R sequences have an average divergence from the consensus sequence of 12.5%, which is largely due to point mutations and, among those, to transitions. Some R sequences are severely truncated. The R sequences extend into A-rich sequences and are flanked by short direct repeats. Also, two large insertions in the R2 sequence are flanked by direct repeats. In the neighbourhood of and within R sequences, stretches of DNA have been identified that are homologous to parts of small nuclear RNA sequences. Mouse satellite DNA-like sequences and members of the B1 family were also found in close proximity to the R sequences. The dispersion of R sequences within the mouse genome may be a consequence of transposition events. The possible role of the R sequences in recombination and/or gene conversion processes is discussed.

6.
The arrangements of inverted-repeated and repeated DNA sequences in the human genome have been investigated by an electron microscope method. The arrangement of the interspersed repeated DNA sequences is found to be similar to the corresponding arrangement found in Xenopus. This arrangement consists of 300-nucleotide-long repeated DNA sequences interspersed with roughly gene-size single-copy DNA sequences. The inverted-repeated sequences are also 300 nucleotides in length and are interspersed with the other DNA sequence classes. Most inverted-repeated sequences (64%) are spaced by another sequence which is recognized by electron microscopy as a single-stranded loop in a hairpin structure. The average length of this spacer loop is 1.6 kilobases. Although some pairs of inverted-repeated sequences are clustered, most seem to be randomly distributed throughout the genome. The average distance separating two pairs of inverted-repeated sequences is 10 to 20 kilobases. The interspersed repeated sequences and inverted-repeated sequences are arranged simultaneously in a portion of the human genome resulting in an interspersion of all three sequence classes.

7.
DNA condensation with polyamines. II. Electron microscopic studies   Total citations: 24, self-citations: 0, citations by others: 24
Approximately 75% of the wheat and rye genomes consist of repeated sequence DNA. Three-quarters of the non-repeated or few copy sequences in wheat are less than 1000 base-pairs long, whilst in rye approximately half of the non-repeated or few copy sequences are in this size class. Most of the remaining non-repeated or few copy sequences appear to be a few thousand base-pairs long. In this paper a somewhat novel approach has been used to quantitatively analyse the linear organisation of the large proportion of repeated sequence DNA as well as the non-repeated DNA in the wheat and rye genomes. Repeated sequences in the genomes of oats, barley, wheat and rye have been used as probes to distinguish and isolate four different groups of repeated sequences and their neighbouring sequences from the wheat and rye genomes. Radioactively labelled wheat or rye DNA fragments ranging from 200 to over 9000 nucleotides long were incubated separately with large excesses of denatured unlabelled oats, barley, wheat and rye DNAs to Cot values which enable all the repeated sequences of the unlabelled DNA to renature. The following parameters were then determined from the proportions of total labelled DNA in fragments which had at least partially renatured. (1) The proportions of the repeated sequences in the labelled DNAs that were able to hybridise to each unlabelled DNA; (2) the mean distance apart of the hybridising sequences on the longer labelled fragments; and (3) the proportion of the genome in which the hybridising sequences were concentrated.
Analysis of these results, together with those of separate experiments designed to quantitatively estimate the nature of sequences unable to reanneal with the repeated sequences of each of the probe DNAs, have enabled schematic maps to be drawn which show how the repeated and non-repeated sequences are arranged in the wheat and rye genomes. Both genomes are constructed from millions of relatively short sequences, most of them considerably shorter than 3000 base-pairs. This structure was recognised because adjacent sequences can be distinguished by their frequency of repetition (i.e. repeated or non-repeated) or by their evolutionary origin. Approximately 40 to 45% of the wheat genome and 30 to 35% of the rye genome consists of short non-repeated sequences interspersed between short repeated sequences. Approximately 50% of the wheat genome and 60% of the rye genome consists of tandemly arranged repeated sequences of different evolutionary origins. It is postulated that much of this complex repeated sequence DNA could have arisen from amplification of compound sequences, each containing repeated and non-repeated sequence DNA. Short repeated sequences with a number average length of around 200 base-pairs and which occupy about 20% of the wheat and rye genomes are related to repeated sequences also found in oats and barley. They are concentrated in 60 to 70% of the wheat and rye genomes, being interspersed with different short repeated sequences and a significant proportion of the short non-repeated sequences. Rye chromosomes contain more DNA than wheat chromosomes. This is principally, but not entirely, due to additional repeated sequence DNA. Many quantitative changes appear to have occurred in both genomes, possibly affecting most families of repeated sequences, since wheat and rye diverged from a common ancestor.
Both species contain species-specific repeated sequences (24% of rye genome; 16% of wheat genome) but a large proportion of these are closely interspersed with repeated sequences found in both genomes.

8.
A cDNA-AFLP experiment was designed to identify and clone nucleotide sequences induced during seed germination in Arabidopsis thaliana. Sequences corresponding to known genes involved in processes important for germination, such as mitochondrial biogenesis, protein synthesis and cell cycle progression, were isolated. Other sequences correspond to Arabidopsis BAC clones in regions where genes have not been annotated. Notably, a number of the sequences cloned did not correspond to available sequences in the databases from the Arabidopsis genome, but instead present significant similarity with DNA from other organisms, for example fish species; among them, some may encode transposons. A number of the sequences isolated showed no significant similarity with any sequences in the public databases. Oligonucleotides derived from these new sequences were used to amplify genomic DNA of Arabidopsis. Expression analysis of representative sequences is presented. This work suggests that, during germination, there may be a massive transposon mobilization that may be useful in the annotation of new genome sequences and identification of regulatory mechanisms.

9.
10.
11.
Sim KL, Creamer TP. Proteins, 2004, 54(4): 629-638
Protein simple sequences, a subset of low-complexity sequences, are regions of sequence highly enriched in one or a few residue types. Simple sequences are exceedingly common, the average being more than one per protein sequence. Despite being so common, such sequences are not well-studied. The simple sequences that have been subjected to detailed study are often found to possess important functions. Here we present a survey of protein simple sequences, generally enriched in a single residue type, with the aim of studying their conservation. We find that the majority of such simple sequences are not conserved. However, conserved protein simple sequences are relatively common, with approximately 11% of the surveyed protein families possessing a conserved simple sequence. The data obtained in this study support the idea that simple sequences are conserved for functional reasons. Such functions can range from substrate binding, to mediating protein-protein interactions, to structural integrity. A perhaps surprising finding is that the residue enriching a conserved simple sequence is itself not necessarily conserved. Neither is the length of many of the highly conserved simple sequences. In the few cases where structural and functional data are available it is found that the conserved simple sequences are consistent with both local structure and function. The data presented support the idea that protein simple sequences can be conserved and have important roles in protein structure and function.

12.
Certain human DNA sequences are much less methylated at CpG sites in sperm than in various adult somatic tissues. The DNA of term placenta displays intermediate levels of methylation at these sequences (Sp-0.3 sequences). We report here that pluripotent embryonal carcinoma (EC) cells derived from testicular germ cell tumors are hypermethylated at the three previously cloned Sp-0.3 sequences and seven newly isolated sequences that exhibit sperm-specific hypomethylation. In contrast to their hypermethylation in EC cells, the Sp-0.3 sequences are hypomethylated in a line of yolk sac carcinoma cells, which like placenta, represent an extraembryonic lineage. These DNA sequences, therefore, appear to be subject to coordinate changes in their methylation during differentiation, probably early in embryogenesis, despite their diversity in copy number (1 to 10^4) and primary structure. Two of these Sp-0.3 sequences are highly homologous to DNA sequences in human chromosomal regions that might be recombination hotspots, namely, a cryptic satellite DNA sequence at a fragile site and the downstream region of the beta-globin gene cluster.

13.
Abstract

The frequencies of “words”, oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence “texts”. Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.

14.
Repetitive extragenic palindromic (REP) sequences are highly conserved inverted repeats present in up to 1000 copies on the Escherichia coli chromosome. We have shown both in vivo and in vitro that REP sequences can stabilize upstream mRNA by blocking the processive action of 3'→5' exonucleases. In a number of operons, mRNA stabilization by REP sequences plays an important role in the control of gene expression. Furthermore, differential mRNA stability mediated by the REP sequences can be responsible for differential gene expression within polycistronic operons. Despite the key role of REP sequences in mRNA stability and gene expression in a number of operons, several lines of evidence suggest that this is unlikely to be the primary reason for the exceptionally high degree of sequence conservation between REP sequences. Other possible functions for REP sequences are discussed. We propose that REP sequences may be a prokaryotic equivalent of 'selfish DNA' and that gene conversion may play a role in the evolution and maintenance of REP sequences.

15.
Monkey mummy bones and teeth originating from the North Saqqara Baboon Galleries (Egypt), soft tissue from a mummified baboon in a museum collection, and nineteenth/twentieth-century skin fragments from mangabeys were used for DNA extraction and PCR amplification of part of the mitochondrial 12S rRNA gene. Sequences aligning with the 12S rRNA gene were recovered but were only distantly related to contemporary monkey mitochondrial 12S rRNA sequences. However, many of these sequences were identical or closely related to human nuclear DNA sequences resembling mitochondrial 12S rRNA (isolated from a cell line depleted in mitochondria) and therefore have to be considered contamination. Subsequently in a separate study we were able to recover genuine mitochondrial 12S rRNA sequences from many extant species of nonhuman Old World primates and sequences closely resembling the human nuclear integrations. Analysis of all sequences by the neighbor-joining (NJ) method indicated that mitochondrial DNA sequences and their nuclear counterparts can be divided into two distinct clusters. One cluster contained all contemporary cytoplasmic mitochondrial DNA sequences and approximately half of the monkey nuclear mitochondrial-like sequences. A second cluster contained most human nuclear sequences and the other half of monkey nuclear sequences with a separate branch leading to human and gorilla mitochondrial and nuclear sequences. Sequences recovered from ancient materials were equally divided between the two clusters. These results constitute a warning when working with ancient DNA or performing phylogenetic analysis using mitochondrial DNA as a target sequence: nuclear counterparts of mitochondrial genes may lead to faulty interpretation of results. Correspondence to: A.C. van der Kuyl

16.
Isolation of human sequences that replicate autonomously in human cells.   Total citations: 41, self-citations: 17, citations by others: 24
We have isolated a heterogeneous collection of human genomic sequences which replicate autonomously when introduced into human cells. The novel strategy for the isolation of these sequences involved cloning random human DNA fragments into a defective Epstein-Barr virus vector. This vector alone was unable to replicate in human cells, but appeared to provide for the nuclear retention of linked DNA. The human sequences persist in a long-term replication assay (greater than 2 months) in the presence of the viral nuclear retention sequences. Using a short-term (4-day) assay, we showed that the human sequences are able to replicate in the absence of all viral sequences. The plasmids bearing human sequences were shown to replicate based on the persistence of MboI-sensitive plasmid DNA in the long-term assay and the appearance of DpnI-resistant DNA in the short-term assay. The human sequences were shown to be responsible for the replication activity and may represent authentic human origins of replication.

17.
The shufflon of plasmid R64 consists of four DNA segments separated and flanked by seven sfx recombination sites. Rci-mediated recombination between any inverted sfx sequences causes inversion of the DNA segments independently or in groups. The R64 shufflon selects one of seven pilV genes encoding type IV pilus adhesins, in which the N-terminal region is constant, while the C-terminal regions are variable. The R64 sfx sequences are asymmetric. The sfx central region and right arm sequences are conserved, but left arm sequences are not. Here we constructed a symmetric sfx sequence, in which the sfx left arm sequence was changed to the inverted repeat of the right arm sequence and made artificial shufflon segments carrying symmetric sfx sequences in inverted or direct orientations. The symmetric sfx sequence exhibited the highest inversion frequency in a shufflon segment flanked by two inverted sfx sequences. Rci-dependent deletion of a shufflon segment flanked by two direct symmetric sfx sequences was observed, suggesting that asymmetry of R64 sfx sequences inhibits recombination between direct sfx sequences. In addition, intermolecular recombination between symmetric sfx sequences was also observed. The extra C-terminal domain of Rci was shown to be essential for inversion of the R64 shufflon using asymmetric sfx sequences but not essential for recombination using symmetric sfx sequences, suggesting that the Rci C-terminal segment helps the binding of Rci to asymmetric sfx sequences. Rci protein lacking the C-terminal domain bound to both arms of symmetric sfx sequence but only to the right arm of asymmetric sfx sequence.

18.
The nucleic acid sequences found in DNA and RNA from rat cells which are homologous to Kirsten sarcoma virus have been characterized. The homologous sequences are present in multiple copies per diploid rat cellular genome in a variety of different rat cellular DNAs. In certain cells that constitutively express only low levels of sequences homologous to Kirsten sarcoma virus, bromodeoxyuridine treatment leads to the expression of high levels of these sequences in RNA. Supernatants from cell lines producing the sequences homologous to Kirsten sarcoma virus contain high levels of these sequences which are purified to the same degree as the previously known rat type C viral nucleic acid sequences by type C particles being released from such cells. The results indicate that the sequences in rat cells homologous to Kirsten sarcoma virus have three characteristics of known mammalian type C viruses, and suggest that at least part of Kirsten sarcoma virus rat-derived sequences represent a distinct class of endogenous rat type C virus that has no detectable homology to the other known class of endogenous rat type C virus.

19.
The RNA genome of the Moloney isolate of murine sarcoma virus (M-MSV) consists of two parts: a sarcoma-specific region with no homology to known leukemia viral RNAs, and a shared region present also in Moloney murine leukemia virus RNA. Complementary DNA was isolated which was specific for each part of the M-MSV genome. The DNA of a number of mammalian species was examined for the presence of nucleotide sequences homologous with the two M-MSV regions. Both sets of viral sequences had homologous nucleotide sequences present in normal mouse cellular DNA. MSV-specific sequences found in mouse cellular DNA closely matched those nucleotide sequences found in M-MSV as seen by comparisons of thermal denaturation profiles. In all normal mouse cells tested, the cellular set of M-MSV-specific nucleotide sequences was present in DNA as one to a few copies per cell. The rate of base substitution of M-MSV nucleotide sequences was compared with the rate of evolution of both unique sequences and the hemoglobin gene of various species. Conservation of MSV-specific nucleotide sequences among species was similar to that of mouse globin gene(s) and greater than that of average unique cellular sequences. In contrast, cellular nucleotide sequences that are homologous to the M-MSV-murine leukemia virus "common" nucleotide region were present in multiple copies in mouse cells and were less well matched, as seen by reduced melting profiles of the hybrids. The cellular common nucleotide sequences diverged very rapidly during evolution, with a base substitution rate similar to that reported for some primate and avian endogenous virogenes. The observation that two sets of covalently linked viral sequences evolved at very different rates suggests that the origin of M-MSV may be different from endogenous helper viruses and that cellular sequences homologous to MSV-specific nucleotide sequences may be important to survival.

20.
We describe a new computer program that identifies conserved secondary structures in aligned nucleotide sequences of related single-stranded RNAs. The program employs a series of hash tables to identify and sort common base paired helices that are located in identical positions in more than one sequence. The program gives information on the total number of base paired helices that are conserved between related sequences and provides detailed information about common helices that have a minimum of one or more compensating base changes. The program is useful in the analysis of large biological sequences. We have used it to examine the number and type of complementary segments (potential base paired helices) that can be found in common among related random sequences similar in base composition to 16S rRNA from Escherichia coli. Two types of random sequences were analyzed. One set consisted of sequences that were independent but they had the same mononucleotide composition as the 16S rRNA. The second set contained sequences that were 80% similar to one another. Different results were obtained in the analysis of these two types of random sequences. When 5 sequences that were 80% similar to one another were analyzed, significant numbers of potential helices with two or more independent base changes were observed. When 5 independent sequences were analyzed, no potential helices were found in common. The results of the analyses with random sequences were compared with the number and type of helices found in the phylogenetic model of the secondary structure of 16S ribosomal RNA. Many more helices are conserved among the ribosomal sequences than are found in common among similar random sequences. In addition, conserved helices in the 16S rRNAs are, on the average, longer than the complementary segments that are found in comparable random sequences. The significance of these results and their application in the analysis of long non-ribosomal nucleotide sequences is discussed. 
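
The hash-table approach in this abstract can be illustrated with a minimal Python sketch. It is not the published program: it assumes that a helix is a run of `length` consecutive antiparallel base pairs (Watson-Crick plus G-U), that aligned sequences share column numbering, and that a helix is "common" when the same start and end columns pair in more than one sequence. The names `helices` and `conserved_helices` and the default parameters are hypothetical.

```python
from collections import defaultdict

# Allowed pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}


def helices(seq, length=3, min_loop=3):
    """Return {(i, j)} where seq[i + k] pairs with seq[j - k]
    for k = 0 .. length - 1, leaving at least min_loop unpaired bases."""
    found = set()
    n = len(seq)
    for i in range(n):
        for j in range(i + 2 * length + min_loop - 1, n):
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(length)):
                found.add((i, j))
    return found


def conserved_helices(aligned_seqs, length=3, min_loop=3):
    """Hash helices by alignment position and keep those that occur
    in more than one sequence. Gap characters never pair."""
    table = defaultdict(list)
    for index, seq in enumerate(aligned_seqs):
        rna = seq.upper().replace("T", "U")
        for key in helices(rna, length, min_loop):
            table[key].append(index)
    return {key: ids for key, ids in table.items() if len(ids) > 1}


# Toy alignment: the second sequence keeps the helix through a
# compensating change (G-C becomes C-G at the middle pair).
common = conserved_helices(["GGGAAAACCC", "GCGAAAACGC", "AAAAAAAAAA"])
```

Keying the table on alignment positions means that a lookup is constant time per helix, so the cost is dominated by enumerating candidate helices in each sequence. Detecting compensating changes would need the paired bases stored alongside each entry, which this sketch omits.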
