Similar literature
20 similar articles found (search time: 671 ms)
1.
The frequencies of "words", oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence "texts". Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
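The idea of comparing sequences by their word frequencies can be illustrated with a minimal sketch. It assumes plain k-mer frequency vectors compared by Pearson correlation; this is a simplification and not the contrast-vocabulary construction the abstract refers to. The function names and the default word length `k=3` are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer over ACGT (illustrative helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only words made of A, C, G, T are counted; windows with other symbols are ignored.
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def word_similarity(a, b, k=3):
    """Assumed stand-in for a 'linguistic similarity': correlation of k-mer frequencies."""
    return pearson(kmer_frequencies(a, k), kmer_frequencies(b, k))
```

A single number per sequence pair is returned, which is the property the abstract emphasizes; how closely it tracks the published measure is not claimed here.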

2.
Abstract

The frequencies of “words”, oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence “texts”. Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.

3.
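Objective: To study the genetic homology of HBV strains from different sources. Methods: Twenty-four complete genome sequences of HBV strains submitted from various regions of China were retrieved from GenBank. The complete genome sequences were compared for homology using ClustalW 1.83, and a phylogenetic tree was constructed and its features analyzed. Results: The complete genome sequences of HBV strains from different regions were not identical, nor were those of strains from the same region, and some differed greatly. Conclusion: The complete genome sequences of HBV strains show great heterogeneity both between regions and within regions.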

4.
5.
Repetitive extragenic palindromic (REP) sequences are highly conserved inverted repeats present in up to 1000 copies on the Escherichia coli chromosome. We have shown both in vivo and in vitro that REP sequences can stabilize upstream mRNA by blocking the processive action of 3'→5' exonucleases. In a number of operons, mRNA stabilization by REP sequences plays an important role in the control of gene expression. Furthermore, differential mRNA stability mediated by the REP sequences can be responsible for differential gene expression within polycistronic operons. Despite the key role of REP sequences in mRNA stability and gene expression in a number of operons, several lines of evidence suggest that this is unlikely to be the primary reason for the exceptionally high degree of sequence conservation between REP sequences. Other possible functions for REP sequences are discussed. We propose that REP sequences may be a prokaryotic equivalent of 'selfish DNA' and that gene conversion may play a role in the evolution and maintenance of REP sequences.

6.
The arrangements of inverted-repeated and repeated DNA sequences in the human genome have been investigated by an electron microscope method. The arrangement of the interspersed repeated DNA sequences is found to be similar to the corresponding arrangement found in Xenopus. This arrangement consists of 300-nucleotide-long repeated DNA sequences interspersed with roughly gene-size single-copy DNA sequences. The inverted-repeated sequences are also 300 nucleotides in length and are interspersed with the other DNA sequence classes. Most inverted-repeated sequences (64%) are spaced by another sequence which is recognized by electron microscopy as a single-stranded loop in a hairpin structure. The average length of this spacer loop is 1.6 kilobases. Although some pairs of inverted-repeated sequences are clustered, most seem to be randomly distributed throughout the genome. The average distance separating two pairs of inverted-repeated sequences is 10 to 20 kilobases. The interspersed repeated sequences and inverted-repeated sequences are arranged simultaneously in a portion of the human genome resulting in an interspersion of all three sequence classes.

7.
The diversity of serine proteases secreted from Chrysomya bezziana larvae was investigated biochemically and by PCR and sequence analysis. Cation-exchange chromatography of purified larval serine proteases resolved four trypsin-like activities and three chymotrypsin-like activities as discerned by kinetic studies with benzoyl-Arg-p-nitroanilide and succinyl-Ala-Ala-Pro-Phe-p-nitroanilide. Amino-terminal sequencing of the three most abundant fractions gave two sequences, which were homologous to other Dipteran trypsins and chymotrypsins. Analysis of products generated by PCR of cDNA from whole larvae using specific primers based on the amino-terminal sequences and generic serine protease primers identified 22 different sequences, while phylogenetic analysis of the deduced amino acid sequences differentiated two trypsin-like and four chymotrypsin-like families. Phylogenetic comparisons with Dipteran and mammalian serine protease sequences showed that all the Chrysomya bezziana sequences clustered with Dipteran sequences. The Chrysomya bezziana chymotrypsin-like sequences segregated within a Dipteran cluster of chymotrypsin sequences, but were well dispersed amongst these sequences. The largest Chrysomya bezziana serine protease family, the trypB family, clustered tightly as a group, and was closely related to a Lucilia cuprina trypsin but distinct from Drosophila melanogaster alpha and beta trypsins. The trypB family contains ten highly homologous sequences and probably represents an example of concerted evolution of a trypsin gene in Chrysomya bezziana.

8.
The chromatin structure encompassing the lysozyme gene domain in hen oviduct nuclei was studied by measuring the partitioning of coding and flanking sequences during chromatin fractionation and by analyzing the nucleosome repeat in response to micrococcal nuclease digestion. Following micrococcal nuclease digestion, nuclei were sedimented to obtain a chromatin fraction released during digestion (S1) and then lysed in tris(hydroxymethyl)aminomethane-(ethylenedinitrilo)tetraacetic acid-[ethylenebis(oxyethylenenitrilo)]tetraacetic acid and centrifuged again to yield a second solubilized chromatin fraction (S2) and a pelleted fraction (P2). By dot-blot hybridization with 14 specific probes, it is found that the fractionation procedure defines three classes of sequences within the lysozyme gene domain. The coding sequences, which partition with fraction P2, are flanked by class I flanking sequences, which partition with fractions S1 and P2 and which extend over 11 kilobases (kb) on the 5' side and probably over about 4 kb on the 3' side. The partitioning of class II flanking sequences, which are located distal of class I flanking sequences, is different from that of class I flanking sequences. Coding sequences lack a canonical nucleosome repeat, class I flanking sequences possess a disturbed nucleosome repeat, and class II flanking sequences generate an extended nucleosomal ladder. Coding and class I flanking sequences are more readily digested by micrococcal nuclease than class II flanking sequences and the inactive beta A-globin gene. In hen liver, where the lysozyme gene is inactive, coding and class I flanking sequences fractionate into fractions S2 and P2. (ABSTRACT TRUNCATED AT 250 WORDS)

9.
The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 × 10^4 recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the gamma subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the alpha subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the alpha-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms.

10.
The shufflon of plasmid R64 consists of four DNA segments separated and flanked by seven sfx recombination sites. Rci-mediated recombination between any inverted sfx sequences causes inversion of the DNA segments independently or in groups. The R64 shufflon selects one of seven pilV genes encoding type IV pilus adhesins, in which the N-terminal region is constant, while the C-terminal regions are variable. The R64 sfx sequences are asymmetric. The sfx central region and right arm sequences are conserved, but left arm sequences are not. Here we constructed a symmetric sfx sequence, in which the sfx left arm sequence was changed to the inverted repeat of the right arm sequence and made artificial shufflon segments carrying symmetric sfx sequences in inverted or direct orientations. The symmetric sfx sequence exhibited the highest inversion frequency in a shufflon segment flanked by two inverted sfx sequences. Rci-dependent deletion of a shufflon segment flanked by two direct symmetric sfx sequences was observed, suggesting that asymmetry of R64 sfx sequences inhibits recombination between direct sfx sequences. In addition, intermolecular recombination between symmetric sfx sequences was also observed. The extra C-terminal domain of Rci was shown to be essential for inversion of the R64 shufflon using asymmetric sfx sequences but not essential for recombination using symmetric sfx sequences, suggesting that the Rci C-terminal segment helps the binding of Rci to asymmetric sfx sequences. Rci protein lacking the C-terminal domain bound to both arms of symmetric sfx sequence but only to the right arm of asymmetric sfx sequence.

11.
Repetitive DNA sequences near immunoglobulin genes in the mouse genome (Steinmetz et al., 1980a,b) were characterized by restriction mapping and hybridization. Six sequences were determined that turned out to belong to a new family of dispersed repetitive DNA. From the sequences, which are called R1 to R6, a 475 base-pair consensus sequence was derived. The R family is clearly distinct from the mouse B1 family (Krayev et al., 1980). According to saturation hybridization experiments, there are about 100,000 R sequences per haploid genome, and they are probably distributed throughout the genome. The individual R sequences have an average divergence from the consensus sequence of 12.5%, which is largely due to point mutations and, among those, to transitions. Some R sequences are severely truncated. The R sequences extend into A-rich sequences and are flanked by short direct repeats. Also, two large insertions in the R2 sequence are flanked by direct repeats. In the neighbourhood of and within R sequences, stretches of DNA have been identified that are homologous to parts of small nuclear RNA sequences. Mouse satellite DNA-like sequences and members of the B1 family were also found in close proximity to the R sequences. The dispersion of R sequences within the mouse genome may be a consequence of transposition events. The possible role of the R sequences in recombination and/or gene conversion processes is discussed.

12.
Sequence alignment is an important bioinformatics tool for identifying homology, but searching against the full set of available sequences is likely to result in many hits to poorly annotated sequences providing very little information. Consequently, we often want alignments against a specific subset of sequences: for instance, we are looking for sequences from a particular species, sequences that have known 3D structures, sequences that have a reliable (curated) function annotation, and so on. Although such subset databases are readily available, they only represent a small fraction of all sequences. Thus, the likelihood of finding close homologs for query sequences is smaller, and the alignments will in general have lower scores. This makes it difficult to distinguish hits to homologous sequences from random hits to unrelated sequences. Here, we propose a method that addresses this problem by first aligning query sequences against a large database representing the corpus of known sequences, and then constructing indirect (or transitive) alignments by combining the results with alignments from the large database against the desired target database. We compare the results to direct pairwise alignments, and show that our method gives us higher sensitivity alignments against the target database.
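The two-step idea described above (query against a large database, then large database against the target database) can be sketched as a join over two hit tables. The abstract does not state how indirect alignments are scored, so the rule used here, taking the weaker of the two link scores and keeping the best intermediate per target, is purely an assumption for illustration. The function name and the dictionary layout are hypothetical.

```python
def transitive_hits(query_to_db, db_to_target):
    """Combine query->intermediate and intermediate->target hits.

    query_to_db:  {intermediate_id: score} for one query (assumed layout)
    db_to_target: {intermediate_id: {target_id: score}} (assumed layout)
    Returns {target_id: (score, intermediate_id)} with the best indirect hit.
    """
    best = {}
    for mid, s1 in query_to_db.items():
        for target, s2 in db_to_target.get(mid, {}).items():
            # Assumed scoring rule: an indirect hit is only as strong as its weakest link.
            score = min(s1, s2)
            if score > best.get(target, (float("-inf"), None))[0]:
                best[target] = (score, mid)
    return best
```

In practice the scores would come from an alignment tool's output; here they are just numbers supplied by the caller.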

13.
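A preliminary study of coding sequences lacking 3-base periodicity   Total citations: 4, self-citations: 0, citations by others: 4
Fourier spectrum analysis of 120 relatively short coding sequences (<1,200 bp) shows that 3-base periodicity is not always present in short coding sequences. Statistical analysis suggests that whether a coding sequence has 3-base periodicity is related to the base composition and distribution of the sequence, to the choice and order of amino acids in the encoded protein, and to the usage of synonymous codons. In general, A+U content is higher than G+C content in non-period-3 sequences, and the opposite holds for period-3 sequences; bases are distributed more evenly over the three codon positions in non-period-3 sequences than in period-3 sequences; and the codon and amino acid usage bias of non-period-3 sequences is smaller than that of period-3 sequences. These phenomena should be fully taken into account when Fourier analysis is used to predict genes and exons in DNA sequences.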
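As a rough illustration of what a period-3 Fourier test looks like, the sketch below encodes a sequence as four binary indicator sequences (one per base), sums their spectral power at frequency 1/3, and divides by the mean power over all non-zero frequencies. The mean is obtained from Parseval's identity rather than a full transform. The function name, the trimming to a multiple of three, and the use of this particular ratio are assumptions; the abstract does not specify its exact statistic or threshold.

```python
import cmath
from math import pi


def period3_signal_to_noise(seq):
    """Power at frequency 1/3 relative to mean non-DC power (illustrative statistic)."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so that 1/3 is an exact DFT frequency
    seq = seq[:n]
    if n < 3:
        raise ValueError("sequence must contain at least 3 bases")
    # exp(-2*pi*i*k/3) only takes three values, indexed by position mod 3.
    phase = [cmath.exp(-2j * pi * r / 3) for r in range(3)]
    peak = 0.0
    non_dc_power = 0.0
    for base in "ACGT":
        count = seq.count(base)
        x = sum(phase[i % 3] for i, ch in enumerate(seq) if ch == base)
        peak += abs(x) ** 2
        # Parseval: total power of a 0/1 indicator is n*count; DC power is count**2.
        non_dc_power += n * count - count ** 2
    mean_power = non_dc_power / (n - 1)
    return peak / mean_power if mean_power else 0.0
```

A sequence with strongly position-dependent base usage gives a large ratio, while one whose bases are spread evenly over the three codon positions gives a ratio near zero, which mirrors the contrast the abstract draws between period-3 and non-period-3 sequences.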

14.
The isochore structure of the nuclear genome of angiosperms described by Salinas et al. (1) was confirmed by using a different experimental approach, namely by showing that the GC levels of coding sequences from both dicots and Gramineae are linearly correlated with GC levels of the corresponding flanking sequences. The compositional distributions of homologous coding sequences from several orders of dicots and from Gramineae were also studied and shown to mimic the compositional distributions previously seen (1) for coding sequences in general, most coding sequences from Gramineae being much higher in GC than those of the dicots explored. These differences were even stronger for third codon positions and led to striking codon usages for many coding sequences, especially in the case of Gramineae.

15.
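Using the genomes of 7 archaea, 46 bacteria, and 10 eukaryotes as samples, and taking into account both short-range and long-range correlations between bases, dinucleotide frequencies at different positions were obtained for codon pairs in coding sequences and for triplet pairs in intergenic sequences, and phylogenetic relationships based on coding sequences and on intergenic sequences were constructed from them. Whether clustering used coding-sequence or intergenic-sequence information, the archaea and the eukaryotes were each grouped on a single branch, indicating that the choice of clustering parameters is appropriate. Pairwise comparison with the phylogeny constructed from amino acid sequences shows that, for most Firmicutes, evolution differs considerably between coding sequences and intergenic sequences, and between coding sequences and amino acid sequences. The analysis suggests that a more natural phylogeny can be obtained only by considering the evolutionary information of all three kinds of sequence together.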
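One way to read "dinucleotide frequencies at different positions in codon pairs" is sketched below: for every pair of adjacent codons, pair the base at position i of the first codon with the base at position j of the second, giving nine position pairs that range from immediate neighbours to bases five positions apart. This interpretation, the function name, and the output layout are assumptions; the abstract does not define the feature precisely.

```python
from collections import Counter


def codon_pair_dinucleotide_freqs(cds):
    """Dinucleotide frequencies for each (i, j) position pair of adjacent codons.

    Returns {(i, j): {dinucleotide: frequency}} with i, j in 0..2 (assumed layout).
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[k:k + 3] for k in range(0, usable, 3)]
    counts = {(i, j): Counter() for i in range(3) for j in range(3)}
    for first, second in zip(codons, codons[1:]):
        for i in range(3):
            for j in range(3):
                counts[(i, j)][first[i] + second[j]] += 1
    freqs = {}
    for pos, counter in counts.items():
        total = sum(counter.values())
        freqs[pos] = {d: c / total for d, c in counter.items()} if total else {}
    return freqs
```

The resulting vectors could then be fed to any distance-based clustering method; which method the authors used is not stated in the abstract.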

16.
Three clones of non-repetitive sequences and six clones containing repetitive sequences were obtained from micronuclear DNA of Tetrahymena thermophila. All the non-repetitive and three repetitive sequences had the same organization in micro- and macronuclear DNAs as revealed by blot hybridization. On the other hand, the remaining three clones with repetitive sequences had apparently different organization in the two nuclear DNAs. All these repetitive sequences showed a smear on the blot in addition to a number of discrete bands when micronuclear DNA was digested with EcoRI. In macronuclear DNAs, these sequences invariably became one or two bands and the smear disappeared. We conclude that, when a macronucleus develops from a micronucleus, the non-repetitive sequences amplify by more than 20 times with relatively few rearrangements, whereas some selected portions of repeated and/or repeat-contiguous sequences are amplified with rather extensive reorganization.

17.
Mouse satellite DNA sequences isolated by centrifugation in Cs2SO4-Ag+ gradients are analyzed for buoyant density by CsCl density gradients and for their content of fast reassociating sequences by denaturation and partial reassociation. Our data suggest that in Cs2SO4 gradients silver ions separate a satellite band which contains both fast reassociating G+C rich sequences and slow reassociating, A+T rich DNA sequences.

18.
Cot analysis shows that the haploid Drosophila genome contains 12% rapidly reassociating, highly reiterated DNA, 12% middle repetitive DNA with an average reiteration frequency of 70, and 70% single-copy DNA. The distribution of the middle repetitive sequences in the genome has been studied by an examination in the electron microscope of the structures obtained when middle repetitive sequences present on large DNA strands reassociate and by the hydroxyapatite binding methods developed by Davidson et al. (1973). At least one third by weight of the middle repetitive sequences are interspersed in single-copy sequences. These interspersed middle repetitive sequences have a fairly uniform distribution of lengths from less than 0.5 to 13 kb, with a number average value of 5.6 kb. The average distance between middle repetitive sequences is greater than 13 kb. The data do not exclude the possibility that essentially all of the middle repetitive sequences have the interspersion pattern described above; however, it is possible that some of the middle repetitive sequences of Drosophila are clustered in stretches of length much greater than 13 kb. The interspersion pattern of the middle repetitive sequences in Drosophila is quite different from that which occurs in the sea urchin, in Xenopus, in rat, and probably many other higher eucaryotes.

19.
Activation of the transformation potential of the cellular fps gene   Total citations: 27, self-citations: 0, citations by others: 27
D A Foster, M Shibuya, H Hanafusa. Cell, 1985, 42(1):105-115
Chicken cellular-fps (c-fps) sequences were substituted for viral-fps (v-fps) sequences in two retroviral genome structures, one that expressed a c-fps gene product that was indistinguishable from the normal c-fps gene product expressed in chicken bone marrow cells, and another that expressed a gag-fps fusion protein. When c-fps gene sequences (without linked gag gene sequences) were expressed at high levels in a viral vector, no transformation of fibroblasts was detected. It was previously demonstrated that the corresponding v-fps sequences could transform fibroblasts. When the same c-fps sequences were expressed in a form linked to gag gene sequences, transformation of fibroblasts and induction of tumors were observed. The data suggest that the c-fps gene product lacks transformation potential by itself even when overexpressed and that the transformation potential of the c-fps gene can be activated by either mutation (or mutations) in the fps coding region or by fusion with viral gag gene sequences.

20.
Gene, 1996, 172(1):GC33-GC41
We have developed a fast heuristic algorithm for multiple sequence alignment which provides near-to-optimal results for sufficiently homologous sequences. The algorithm makes use of the standard dynamic programming procedure by applying it to all pairs of sequences. The resulting score matrices for pair-wise alignment give rise to secondary matrices containing the additional charges imposed by forcing the alignment path to run through a particular vertex. Such a constraint corresponds to slicing the sequences at the positions defining that vertex, and aligning the remaining pairs of prefix and suffix sequences separately. From these secondary matrices, one can compute, for any given family of sequences, suitable positions for cutting all of these sequences simultaneously, thus reducing the problem of aligning a family of n sequences of average length l in a Divide and Conquer fashion to aligning two families of n sequences of approximately half that length. In this paper, we explain the method for the case of 3 sequences in detail, and we demonstrate its potential and its limits by discussing its behaviour for several test families. A generalization for aligning more than 3 sequences is outlined, and some actual alignments constructed by our algorithm for various user-defined parameters are presented.
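The "secondary matrix" of additional charges for a single pair of sequences can be sketched as follows. The example assumes a simple maximizing global alignment score (match +1, mismatch -1, linear gap -1), which is an illustrative choice and not the scoring scheme of the paper. For each vertex (i, j), the charge is the optimal score minus the best score of any alignment forced to pass through that vertex, computed from a forward and a reverse dynamic programming pass. Function names are hypothetical.

```python
def nw_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score matrix: F[i][j] is the best score for a[:i] vs b[:j]."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F


def additional_charge(a, b):
    """C[i][j] = optimal score minus best score of an alignment cut at vertex (i, j)."""
    n, m = len(a), len(b)
    forward = nw_matrix(a, b)
    # Aligning the reversed sequences gives suffix scores: reverse[n-i][m-j] is a[i:] vs b[j:].
    reverse = nw_matrix(a[::-1], b[::-1])
    best = forward[n][m]
    return [[best - (forward[i][j] + reverse[n - i][m - j]) for j in range(m + 1)]
            for i in range(n + 1)]
```

Vertices with zero charge lie on some optimal alignment path, so cutting both sequences there loses nothing for that pair. Combining such matrices across all pairs to choose one simultaneous cut, as the abstract describes, is not shown here.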
