Similar literature
 20 similar articles found.
1.
The frequencies of "words", oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence "texts". Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
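
The comparison described in this abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the abstract does not define how a "contrast vocabulary" is computed, so the sketch substitutes plain relative frequencies of overlapping k-letter words and a Pearson correlation between the two frequency vectors. The function names and the default `k=3` are hypothetical choices, not taken from the paper.

```python
from collections import Counter
from itertools import product
import math


def word_frequencies(seq, k=3):
    """Relative frequency of every k-letter word over ACGT, in a fixed order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count overlapping words; words with other symbols (e.g. N) are ignored.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no k-letter words over ACGT")
    return [counts[w] / total for w in words]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def vocabulary_similarity(seq_a, seq_b, k=3):
    """Assumed stand-in for the 'linguistic similarity': correlate word frequencies."""
    return pearson(word_frequencies(seq_a, k), word_frequencies(seq_b, k))
```

Under these assumptions a sequence compared with itself scores 1, and unrelated sequences score lower. Reproducing the published similarity values would require the paper's own definition of contrast.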

2.
The frequencies of “words”, oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence “texts”. Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.

3.
Internal repeats in protein sequences have wide-ranging implications for the structure and function of proteins. A keen analysis of the repeats in protein sequences may help us to better understand the structural organization of proteins and their evolutionary relations. In this paper, a mathematical method for searching for latent periodicity in protein sequences is developed. Using this method, we identified simple sequence repeats in the alkaline proteases and found that the sequences could show the same periodicity as their tertiary structures. This result may help us to reduce difficulties in the study of the relationship between sequences and their structures.

4.
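To study the multifractal properties of genomic sequences in depth, 12 relatively long DNA sequences were first selected, and each was converted into a corresponding time series according to its coding and noncoding segments. Multifractal Hurst analysis was then performed on the 12 time series: their Hurst exponents were computed and used to analyze the self-similarity of the sequences. The resulting Hurst exponents were further compared with the one-dimensional DNA walk model. All 12 sequences were found to exhibit long-range correlations, indicating that long-range correlation is indeed present in DNA sequences.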
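
The two steps named in this abstract, turning coding/noncoding annotation into a time series and estimating a Hurst exponent, can be sketched as follows. This is a hedged illustration only: the abstract does not give the mapping or the estimator, so the sketch assumes a +1/-1 encoding and the classical rescaled-range (R/S) estimate, which is a monofractal method and not the multifractal analysis the authors report. All names and the `min_window` parameter are hypothetical.

```python
import math
import random


def coding_series(length, coding_segments):
    """+1 inside coding segments, -1 elsewhere (0-based, end-exclusive intervals)."""
    series = [-1] * length
    for start, end in coding_segments:
        for i in range(start, min(end, length)):
            series[i] = 1
    return series


def rescaled_range(chunk):
    """R/S statistic of one window, or None if the window is constant."""
    mean = sum(chunk) / len(chunk)
    running, walk = 0.0, []
    for x in chunk:
        running += x - mean          # cumulative deviation from the window mean
        walk.append(running)
    spread = max(walk) - min(walk)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / len(chunk))
    return spread / std if std > 0 else None


def hurst_rs(series, min_window=8):
    """Slope of log(mean R/S) against log(window size), windows doubling in size."""
    xs, ys = [], []
    n = min_window
    while n <= len(series) // 2:
        values = [rescaled_range(series[i:i + n])
                  for i in range(0, len(series) - n + 1, n)]
        values = [v for v in values if v]      # drop constant windows
        if values:
            xs.append(math.log(n))
            ys.append(math.log(sum(values) / len(values)))
        n *= 2
    if len(xs) < 2:
        raise ValueError("series too short for an R/S estimate")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Usage on a synthetic uncorrelated series (not real genomic data).
rng = random.Random(1)
h_random = hurst_rs([rng.choice((-1, 1)) for _ in range(512)])
```

An exponent near 0.5 is the usual reference point for an uncorrelated series, and values clearly above it are conventionally read as long-range persistence. Short series bias the R/S estimate, so a result from this sketch is only a rough indication.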

5.
We developed a new method which searches for sequence segments responsible for the recognition of a given chemical structure. These segments are detected as those locally conserved among a sequence to be analyzed (target sequence) and a set of sequences (reference sequences). Reference sequences are the sequences of functionally related proteins, ligands of which contain a common chemical substructure in their molecular structures. 'Similarity graphing' cuts target sequences into segments, aligns them pairwise with the reference sequences, calculates the degree of similarity for each alignment, and shows graphically the cumulative similarity values along the target sequence. Any locally conserved regions, short or long in length and weak or strong in similarity, are detected at their optimal conditions by adjusting three parameters. The 'enzyme-reaction database' contains chemical structures and their related enzymes. When a chemical substructure is input into the database, sequences of the enzymes related to the input substructure are systematically searched from the NBRF sequence database and output as reference sequences. Examples of analysis using similarity graphing in combination with the enzyme-reaction database showed great potential for the systematic analysis of the relationships between sequences and molecular recognition in protein engineering.

6.
Primers for PCR amplification of partial (1,102 of 1,680 bp) formyltetrahydrofolate synthetase (FTHFS) gene sequences were developed and tested. Partial FTHFS sequences were successfully amplified from DNA from pure cultures of known acetogens, from other FTHFS-producing organisms, from the roots of the smooth cordgrass, Spartina alterniflora, and from fresh horse manure. The amplimers recovered were cloned, their nucleotide sequences were determined, and their translated amino acid sequences were used to construct phylogenetic trees. We found that FTHFS sequences from homoacetogens formed a monophyletic cluster that did not contain sequences from nonhomoacetogens and that FTHFS sequences appear to be informative regarding major physiological features of FTHFS-producing organisms.

7.
Certain human DNA sequences are much less methylated at CpG sites in sperm than in various adult somatic tissues. The DNA of term placenta displays intermediate levels of methylation at these sequences (Sp-0.3 sequences). We report here that pluripotent embryonal carcinoma (EC) cells derived from testicular germ cell tumors are hypermethylated at the three previously cloned Sp-0.3 sequences and seven newly isolated sequences that exhibit sperm-specific hypomethylation. In contrast to their hypermethylation in EC cells, the Sp-0.3 sequences are hypomethylated in a line of yolk sac carcinoma cells, which, like placenta, represent an extraembryonic lineage. These DNA sequences, therefore, appear to be subject to coordinate changes in their methylation during differentiation, probably early in embryogenesis, despite their diversity in copy number (1 to 10^4) and primary structure. Two of these Sp-0.3 sequences are highly homologous to DNA sequences in human chromosomal regions that might be recombination hotspots, namely, a cryptic satellite DNA sequence at a fragile site and the downstream region of the beta-globin gene cluster.

8.
Primers for PCR amplification of partial (1,102 of 1,680 bp) formyltetrahydrofolate synthetase (FTHFS) gene sequences were developed and tested. Partial FTHFS sequences were successfully amplified from DNA from pure cultures of known acetogens, from other FTHFS-producing organisms, from the roots of the smooth cordgrass, Spartina alterniflora, and from fresh horse manure. The amplimers recovered were cloned, their nucleotide sequences were determined, and their translated amino acid sequences were used to construct phylogenetic trees. We found that FTHFS sequences from homoacetogens formed a monophyletic cluster that did not contain sequences from nonhomoacetogens and that FTHFS sequences appear to be informative regarding major physiological features of FTHFS-producing organisms.

9.
Two programs written in BASIC are used for the teaching of protein synthesis. Students may individually and at their own pace test their knowledge of base pairing using one of the programs. Again individually, students then investigate the process of protein synthesis using randomly generated DNA sequences produced by the second program. Students are thereby reinforcing their understanding of base pairing, complementary sequences, triplet codons, and relating nucleotide sequences to amino acid sequences. The final part of the second exercise introduces a known genetic defect of man.
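
The concepts these exercises cover (base pairing, complementary strands, triplet codons, and translation of a randomly generated DNA sequence) can be sketched in a few lines. The original programs were written in BASIC and are not reproduced here. The Python below is an independent illustration using the standard genetic code, and every function name in it is hypothetical.

```python
import random

# Standard genetic code, codons ordered by the bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(dna):
    """Base-pair every position (A-T, C-G); the strand is not reversed."""
    return "".join(PAIR[base] for base in dna)


def random_dna(n_codons, rng):
    """Random coding-strand DNA of n_codons triplets."""
    return "".join(rng.choice("ACGT") for _ in range(3 * n_codons))


def translate(coding_strand):
    """Translate a coding strand codon by codon ('*' marks a stop codon)."""
    usable = len(coding_strand) - len(coding_strand) % 3
    return "".join(CODON_TABLE[coding_strand[i:i + 3]]
                   for i in range(0, usable, 3))


# Usage: a student-style exercise on a random sequence.
exercise = random_dna(5, random.Random(0))
protein = translate(exercise)
```

The sketch translates the coding strand directly. A classroom version might go through the mRNA step (T replaced by U) to make transcription explicit.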

10.
The genomes of barley and wheat, two of the world's most important crops, are very large and complex due to their high content of repetitive DNA. In order to obtain a whole-genome sequence sample, we performed two runs of 454 (GS20) sequencing on genomic DNA of barley cv. Morex, which yielded approximately 1% of a haploid genome equivalent. Almost 60% of the sequences comprised known transposable element (TE) families, and another 9% represented novel repetitive sequences. We also discovered high amounts of low-complexity DNA and non-genic low-copy DNA. We identified almost 2300 protein coding gene sequences and more than 660 putative conserved non-coding sequences. Comparison of the 454 reads with previously published genomic sequences suggested that TE families are distributed unequally along chromosomes. This was confirmed by in situ hybridizations of selected TEs. A comparison of these data for the barley genome with a large sample of publicly available wheat sequences showed that several TE families that are highly abundant in wheat are absent from the barley genome. This finding implies that the TE composition of their genomes differs dramatically, despite their very similar genome size and their close phylogenetic relationship.

11.
D. A. Rouch, R. A. Skurray. Gene, 1989, 76(2): 195-205
The nucleotide sequences for the IS257 family of insertion sequences from Staphylococcus aureus were compared with those of the ISS1 family from Streptococcus lactis and the IS15 family which is widespread amongst Gram-negative bacteria. These elements have a striking degree of similarity in both their putative transposase polypeptide sequences and their nucleotide sequences (40 to 64% between pairs), including 12 out of 14 bp conservation in their terminal inverted repeats. The evolutionary distance between the IS15 family and the IS257 and ISS1 families of Gram-positive origin is approximately twice that between the IS257 and ISS1 families. Analysis of base substitutions in the three sequences has provided insights into the effect of selection for the G + C content of immigrant genes to conform to that of their hosts, and into the evolution of biases in overall amino acid composition of cellular proteins in prokaryotes and eukaryotes. The IS257, ISS1, IS15 families form a superfamily of insertion sequences that has been involved in the spread of a number of antimicrobial resistance determinants in Gram-positive and Gram-negative pathogens.

12.
Public gene sequence databases have become important research tools to understand viruses and other organisms. Evidence suggests that the identifying information for some of the sequences in these databases might not belong to the sequences they are associated with. We developed two tests to conduct a comprehensive analysis of all published sequences of the hemagglutinin and neuraminidase genes of avian influenza viruses (AIVs) to identify sequences that may have been misclassified. One test identified sequence pairs with highly similar nucleotide sequences despite a difference of several years between their sampling dates. Another test, which was applied to samples sequenced and deposited more than once, detected sequences with more nucleotide differences to their own than to their closest relatives. All sequences identified as misclassified were further traced to relevant publications to assess the likelihood of contamination and determine if any conclusions were associated with the use of these sequences. Our results suggested that among 4040 published gene sequences examined, approximately 0.8% might be misclassified and that publications using these sequences may include inaccurate statements. Findings from this report suggest that using laboratory-adapted strains and handling multiple samples simultaneously increases the risk of contamination. The tests reported here may be useful for screening new submissions to public sequence databases.

13.
This report deals with the study of compositional properties of human gene sequences evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor base dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters here termed triplet composons. The triplets containing the same set of distinct bases in whatever order and number form a triplet composon. The reading of the DNA sequence is performed starting at any letter of the initial triplet and then moving, triplet-to-triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content in terms of the composon usage frequencies of the gene sequences shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even as their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans have no counterpart in mouse, whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
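
The compression from 64 triplets to 14 composons follows directly from the definition given in this abstract: a composon is the set of distinct bases in a triplet, and there are 4 one-base sets, 6 two-base sets, and 4 three-base sets. A minimal Python sketch, assuming a sliding window of step 1 for the "overlapping" reading (the abstract does not spell out the exact reading scheme) and using hypothetical function names:

```python
from collections import Counter
from itertools import product


def composon(triplet):
    """Label a triplet by its set of distinct bases, e.g. 'GAG' -> 'AG'."""
    return "".join(sorted(set(triplet)))


# Enumerate all 64 triplets and collect the distinct composon labels.
ALL_COMPOSONS = sorted({composon("".join(t)) for t in product("ACGT", repeat=3)})


def composon_usage(seq):
    """Relative usage of each composon over all overlapping triplets of seq."""
    counts = Counter(composon(seq[i:i + 3]) for i in range(len(seq) - 2))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than one triplet")
    return {label: counts[label] / total for label in ALL_COMPOSONS}


# Usage: the four triplets ACG, CGT, GTA, TAC map to four different composons.
usage = composon_usage("ACGTAC")
```

Because the label ignores base order and multiplicity, the usage vector does not depend on the reading frame, which matches the frame-independent analysis the abstract describes.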

14.
Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories is the most common method for associating functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment.

15.
16.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.
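
The graph formulation in this abstract can be made concrete for the simplest case, preserving dinucleotide usage only. Each base is a vertex, each adjacent pair of bases is a directed edge, and a shuffled sequence is a random Eulerian walk over the same edges. The Python sketch below is an illustration and not the paper's published procedure: it assumes a rejection step that picks a random "last exit" edge per vertex and retries until those edges all lead to the final base, and the function names are hypothetical. The trinucleotide and codon-preserving variants are not covered.

```python
import random
from collections import Counter, defaultdict


def _reaches(vertex, target, last_edge):
    """Follow last-exit edges from vertex; True if they lead to target."""
    seen = set()
    while vertex != target:
        if vertex in seen:
            return False               # cycle that never reaches the target
        seen.add(vertex)
        vertex = last_edge[vertex]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Random permutation of seq with exactly the same dinucleotide counts."""
    if len(seq) < 3:
        return seq
    first, final = seq[0], seq[-1]
    edges = defaultdict(list)          # base -> list of bases that follow it
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    vertices = set(seq)
    # Pick one last-exit edge per vertex (except the final base) and retry
    # until these edges connect every vertex to the final base.
    while True:
        last_edge = {v: rng.choice(edges[v]) for v in vertices if v != final}
        if all(_reaches(v, final, last_edge) for v in last_edge):
            break
    # Shuffle the remaining edges of each vertex; the last-exit edge goes last.
    order = {}
    for v in vertices:
        rest = list(edges[v])
        if v != final:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v != final:
            rest.append(last_edge[v])
        order[v] = rest
    # Walk the graph from the first base, consuming each vertex's edges in order.
    used = Counter()
    current, out = first, [first]
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        out.append(nxt)
        current = nxt
    return "".join(out)


# Usage with a seeded generator so the result is reproducible.
original = "ACGTTGCAACGGTTAACCGT"
shuffled = dinucleotide_shuffle(original, random.Random(0))
```

The shuffled sequence keeps the first and last base and every dinucleotide count of the original, so a distance computed against such shuffles controls for dinucleotide composition, which is the comparison the abstract calls for.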

17.
Two lambda phage clones carrying mitochondrial-DNA-like (mtDNA-like) sequences isolated from a human gene library were named Lm E-1 and Lm C-2, and their DNA structures were characterized. Lm E-1 contains about 0.4 kb of DNA homologous to the 5' portion of the mitochondrial 16S ribosomal RNA (rRNA) gene, and Lm C-2 contains a 1.6 kb DNA homologous to the 3' portion of the 12S rRNA gene and to almost all of the 16S rRNA gene. Comparisons of their nucleotide sequences with those of the corresponding regions of the human mtDNA revealed no detectable DNA rearrangement, and their homologies to the human mtDNA are 84% and 80%, respectively. There are neither terminal repeats in the nuclear mtDNA-like sequences nor duplications of the nuclear DNAs flanking the mtDNA-like sequences. The evolutionary relationship between these two human nuclear mtDNA-like sequences and the human and bovine mtDNAs is discussed.

18.
The Afa-family sequences in wheat-related species, Triticeae, are tandem repetitive sequences of 340 bp. All the analyzed Triticeae species carried the sequences in their genomes, though the copy numbers varied about 100-fold among the species. The nucleotide fragments amplified by PCR were cloned and sequenced, and their behavior in the evolution of Triticeae was analyzed by the neighbor-joining (NJ) method. The sequences in genomes with many copies of this family clustered at independent branches of the phylogenetic tree, whereas the sequences in genomes with a few copies did not. This may suggest that Afa-family sequences had amplified several times in the evolution of Triticeae, each using a limited number of different master copies. In addition, the sequences of the A and B genomes of hexaploid common wheat indicated that the Afa-family sequences had not evolved in a concerted manner between the genomes. Furthermore, the sequences of each chromosome of the D genome of this species indicated that the sequences had amplified all over the D-genome chromosomes in a short period. Received: 1 September 1997 / Accepted: 19 January 1998

19.
A few foldback (FB) transposable elements have, between their long terminal inverted repeats, central loop sequences which have been shown to be different from FB inverted repeat sequences. We have investigated loop sequences from two such FB elements by analyzing their genomic distribution and sequence conservation and, in particular, by determining if they are normally associated with FB elements. One of these FB loop sequences seems to be present in a few conserved copies found adjacent to FB inverted repeat sequences, suggesting that it represents an integral component of some FB elements. The other loop sequence is less well-conserved and not usually associated with FB inverted repeats. This sequence is a member of another family of transposable elements, the HB family, and was found inserted in an FB element only by chance. We compare the complete DNA sequences of two HB elements and examine the ends of four HB elements.

20.