Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
We describe a new computer program that identifies conserved secondary structures in aligned nucleotide sequences of related single-stranded RNAs. The program employs a series of hash tables to identify and sort common base paired helices that are located in identical positions in more than one sequence. The program gives information on the total number of base paired helices that are conserved between related sequences and provides detailed information about common helices that have a minimum of one or more compensating base changes. The program is useful in the analysis of large biological sequences. We have used it to examine the number and type of complementary segments (potential base paired helices) that can be found in common among related random sequences similar in base composition to 16S rRNA from Escherichia coli. Two types of random sequences were analyzed. One set consisted of sequences that were independent but they had the same mononucleotide composition as the 16S rRNA. The second set contained sequences that were 80% similar to one another. Different results were obtained in the analysis of these two types of random sequences. When 5 sequences that were 80% similar to one another were analyzed, significant numbers of potential helices with two or more independent base changes were observed. When 5 independent sequences were analyzed, no potential helices were found in common. The results of the analyses with random sequences were compared with the number and type of helices found in the phylogenetic model of the secondary structure of 16S ribosomal RNA. Many more helices are conserved among the ribosomal sequences than are found in common among similar random sequences. In addition, conserved helices in the 16S rRNAs are, on the average, longer than the complementary segments that are found in comparable random sequences. The significance of these results and their application in the analysis of long non-ribosomal nucleotide sequences is discussed. 
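The hash-table idea in entry 1 can be sketched roughly as follows. This is an illustration only, not the authors' program: it assumes the sequences are already aligned and gap-free, allows Watson-Crick and G-U pairs, and keys each helix by a hypothetical `(start, end, length)` tuple so that helices in identical positions collide in one dictionary. The function names and the `min_len` and `min_loop` parameters are assumptions.

```python
from collections import defaultdict

# Allowed base pairs (Watson-Crick plus G-U wobble); an assumption of this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_helices(seq, min_len=3, min_loop=3):
    """Map (i, j, length) -> list of base pairs for each maximal helix in seq."""
    n = len(seq)
    helices = {}
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if (seq[i], seq[j]) not in PAIRS:
                continue
            # Skip pairs that merely continue a helix starting further out.
            if i > 0 and j + 1 < n and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while bases pair and a loop of min_loop bases remains.
            while ((j - length) - (i + length) > min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                helices[(i, j, length)] = [(seq[i + k], seq[j - k]) for k in range(length)]
    return helices

def common_helices(seqs, **kwargs):
    """Hash helices by position and keep those found in more than one sequence."""
    table = defaultdict(list)
    for idx, seq in enumerate(seqs):
        for key, pairs in find_helices(seq, **kwargs).items():
            table[key].append((idx, pairs))
    return {key: hits for key, hits in table.items() if len(hits) > 1}

def compensating_changes(pairs_a, pairs_b):
    """Count paired positions where both bases differ but pairing is kept."""
    return sum(1 for p, q in zip(pairs_a, pairs_b) if p[0] != q[0] and p[1] != q[1])
```

Calling `common_helices` on a list of aligned RNA strings returns the shared helix positions, and `compensating_changes` can then be applied to any two hits at the same position.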

2.
Sim KL  Creamer TP 《Proteins》2004,54(4):629-638
Protein simple sequences, a subset of low-complexity sequences, are regions of sequence highly enriched in one or a few residue types. Simple sequences are exceedingly common, the average being more than one per protein sequence. Despite being so common, such sequences are not well-studied. The simple sequences that have been subjected to detailed study are often found to possess important functions. Here we present a survey of protein simple sequences, generally enriched in a single residue type, with the aim of studying their conservation. We find that the majority of such simple sequences are not conserved. However, conserved protein simple sequences are relatively common, with approximately 11% of the surveyed protein families possessing a conserved simple sequence. The data obtained in this study support the idea that simple sequences are conserved for functional reasons. Such functions can range from substrate binding, to mediating protein-protein interactions, to structural integrity. A perhaps surprising finding is that the residue enriching a conserved simple sequence is itself not necessarily conserved. Neither is the length of many of the highly conserved simple sequences. In the few cases where structural and functional data are available it is found that the conserved simple sequences are consistent with both local structure and function. The data presented support the idea that protein simple sequences can be conserved and have important roles in protein structure and function.

3.
Disease-resistance related sequences in common bean.   Total citations: 11 (self-citations: 0; citations by others: 11)
Primers based on a conserved nucleotide binding site (NBS) found in several cloned plant disease resistance genes were used to amplify DNA fragments from the genome of common bean (Phaseolus vulgaris). Cloning and sequence analysis of these fragments uncovered eight unique classes of disease-resistance related sequences. All eight classes contained the conserved kinase 2 motif, and five classes contained the kinase 3a motif. Gene expression was noted for five of the eight classes of sequences. A clone from the SB3 class mapped 17.8 cM from the Ur-6 gene that confers resistance to several races of the bean rust pathogen Uromyces appendiculatus. Linkage mapping identified microclusters of disease-resistance related sequence in common bean, and sequences mapped to four linkage groups in one population. Comparison with similar sequences from soybean (Glycine max) revealed that any one class of common bean disease-resistance related sequences was more identical to a soybean NBS-containing sequence than to the sequence of another common bean class.

4.
We developed a new method that searches for sequence segments responsible for the recognition of a given chemical structure. These segments are detected as those locally conserved between a sequence to be analyzed (the target sequence) and a set of sequences (the reference sequences). Reference sequences are the sequences of functionally related proteins whose ligands contain a common chemical substructure. 'Similarity graphing' cuts the target sequence into segments, aligns them pairwise with each reference sequence, calculates the degree of similarity for each alignment, and plots the cumulative similarity values along the target sequence. Any locally conserved regions, short or long in length and weak or strong in similarity, are detected at their optimal conditions by adjusting three parameters. The 'enzyme-reaction database' contains chemical structures and their related enzymes. When a chemical substructure is input into the database, sequences of the enzymes related to the input substructure are systematically searched from the NBRF sequence database and output as reference sequences. Examples of analysis using similarity graphing in combination with the enzyme-reaction database showed great potential for the systematic analysis of the relationships between sequences and molecular recognition in protein engineering.

5.
Variation in gene expression may give rise to a significant fraction of inter-individual phenotypic variation. Studies searching for the underlying genetic controls for such variation have been conducted in model organisms and humans in recent years. In our previous effort of assessing conserved underlying haplotype patterns across ethnic populations, we constructed common haplotypes using SNPs having conserved linkage disequilibrium (LD) across ethnic populations. These common haplotypes cluster into a simple evolutionary structure based on their frequencies, defining only up to three conserved clusters termed 'haplotype frameworks'. One intriguing preliminary finding was that a significant portion of reported variants strongly associated with cis-regulation tags these globally conserved haplotype frameworks. Here we expand the investigation by collecting genes showing stringently determined cis-association between genotypes and expression phenotypes from major studies. We conducted phylogenetic analysis of current major haplotypes along with the corresponding haplotypes derived from chimpanzee reference sequences. Our analysis reveals that, for the vast majority of such cis-regulatory genes, the tagging SNPs showing the strongest association also tag the haplotype lineages directly separated from ancestry, inferred from either chimpanzee reference sequences or the allele frequency-derived haplotype frameworks, suggesting that the differentially expressed phenotypes were evolved relatively early in human history. Such evolutionary signatures provide keys for a more effective identification of globally-conserved candidate regulatory haplotypes across human genes in future epidemiologic and pharmacogenetic studies.

6.
Goto N  Kurokawa K  Yasunaga T 《Gene》2007,401(1-2):172-180
To date, the complete genome sequences of more than 250 organisms have been determined. This information can now be used to determine whether there exist any invariant sequences that are conserved among all organisms, from bacteria to plants, animals, and humans. The existence of invariant sequences would strongly suggest that these sequences have been inherited unchanged from the last common ancestor of all life, and that they have essential functions. We have developed a new software program to identify invariant sequences conserved among the currently sequenced genomes and applied this analysis to the complete genome sequences of 266 organisms. We have identified 3 invariant DNA sequences longer than or equal to 11 bp and 6 invariant amino acid sequences longer than or equal to 6 aa. The longest invariant DNA sequence, AAGTCGTACAAGGT (15 bp), was found in the 16S/18S rRNA gene. Two 8 aa sequences, GHVDHGKT in IF2 and EF-Tu and DTPGHVDF in EF-G, were the longest invariant amino acid sequences detected. These sequences could be essential elements from the genome of the last common ancestor and may have remained unchanged throughout evolution.
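One simple way to picture the search for invariant sequences in entry 6 is as a set intersection of fixed-length substrings (k-mers). The sketch below is an assumption-laden illustration, not the software described in the abstract: the function name `invariant_kmers` is hypothetical, only one strand is considered, and whole k-mer sets are held in memory, which would not scale to hundreds of complete genomes without further engineering.

```python
def invariant_kmers(sequences, k):
    """Return the set of length-k substrings present in every input sequence."""
    shared = None
    for seq in sequences:
        # All k-mers of this sequence (forward strand only in this sketch).
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        shared = kmers if shared is None else shared & kmers
        if not shared:
            break  # nothing can be invariant once the intersection is empty
    return shared or set()
```

Running the function with decreasing `k` until the result is non-empty would give the longest invariant substring under these assumptions.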

7.
The tomato Pto gene encodes a serine/threonine kinase (STK) whose molecular characterization has provided valuable insights into the disease resistance mechanism of tomato and it is considered as a promising candidate for engineering broad-spectrum pathogen resistance in this crop. In this study, a pair of degenerate primers based on conserved subdomains of plant STKs similar to the tomato Pto protein was used to amplify similar sequences in banana. A fragment of approximately 550 bp was amplified, cloned and sequenced. The sequence analysis of several clones revealed 13 distinct sequences highly similar to STKs. Based on their significant similarity with the tomato Pto protein (BLASTX E value <3e-53), seven of them were classified as Pto resistance gene candidates (Pto-RGCs). Multiple sequence alignment of the banana Pto-RGC products revealed that these sequences contain several conserved subdomains present in most STKs and also several conserved residues that are crucial for Pto function. Moreover, the phylogenetic analysis showed that the banana Pto-RGCs were clustered with Pto suggesting a common evolutionary origin with this R gene. The Pto-RGCs isolated in this study represent a valuable sequence resource that could assist in the development of disease resistance in banana.

8.
The complete nucleotide sequences of the genomes of the type 2 (P712, Ch, 2ab) and type 3 (Leon 12a1b) poliovirus vaccine strains were determined. Comparison of the sequences with the previously established genome sequence of type 1 (LS-c, 2ab) poliovirus vaccine strain revealed that 71% of the nucleotides in the genome RNAs were common, that the 5' and 3' termini of the genomes were highly homologous, and that more than 80% of the nucleotide differences in the coding region occurred in the third letter position of in-phase codons, resulting in a low frequency of amino acid difference. These results strongly suggested that the serotypes of poliovirus derived from a common prototype. A comparison of the amino acid sequences predicted from the genome sequences showed highest variation in the capsid protein region, whereas non-structural proteins are highly conserved. Initiation of polyprotein synthesis occurs in all three strains more than 740 nucleotides downstream from the 5' end. An analysis of the non-coding region suggests that small peptides that could potentially originate from this region are conserved. The amino acid sequences immediately surrounding the cleavage signals, however, show a higher than average degree of variation. The analysis of the amino acid sequences of the capsid protein VP1 of all serotypes has led to the prediction of potential antigenic sites on the virion involved in neutralization.
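The third-codon-position statistic reported in entry 8 can be computed for any pair of aligned coding sequences with a few lines of code. The sketch below is a minimal illustration under stated assumptions: both inputs are in-frame, gap-free and of equal length, and the function names are hypothetical rather than taken from the study.

```python
def codon_position_differences(cds_a, cds_b):
    """Count nucleotide differences at codon positions 1, 2 and 3."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("expected equal-length, in-frame coding sequences")
    counts = [0, 0, 0]
    for idx, (a, b) in enumerate(zip(cds_a, cds_b)):
        if a != b:
            counts[idx % 3] += 1  # idx % 3 == 2 is the third codon position
    return counts

def third_position_fraction(cds_a, cds_b):
    """Fraction of all differences that fall on the third codon position."""
    counts = codon_position_differences(cds_a, cds_b)
    total = sum(counts)
    return counts[2] / total if total else 0.0
```

A fraction close to 1 means most substitutions sit at the third position, where they are often synonymous, which is the pattern the abstract links to a low frequency of amino acid differences.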

9.
Functional RNA structures tend to be conserved during evolution. This finding is, for example, exploited by comparative methods for RNA secondary structure prediction that currently provide the state of the art in terms of prediction accuracy. We here provide strong evidence that homologous RNA genes not only fold into similar final RNA structures, but that their folding pathways also share common transient structural features that have been evolutionarily conserved. For this, we compile and investigate a non-redundant data set of 32 sequences with known transient and final RNA secondary structures and devise a dedicated computational analysis pipeline.

10.
11.
Sequence analysis of plant disease resistance genes shows similarity among themselves, with the presence of conserved motifs common to the nucleotide‐binding site (NBS). Oligonucleotide degenerate primers designed from the conserved NBS motifs encoded by several plant disease resistance genes were used to amplify resistance gene analogues (RGAs) corresponding to the NBS sequences from the genomic DNA of various plant species. Using specific primers designed from the conserved NBS regions, 22 RGAs were cloned and sequenced from pearl millet (Pennisetum glaucum L. Br.). Phylogenetic analysis of the predicted amino acid sequences grouped the RGAs into nine distinct classes. GenBank database searches with the consensus protein sequences of each of the nine classes revealed their conserved NBS domains and similarity to other known R genes of various crop species. One RGA 213 was mapped onto LG1 and LG7 in the pearl millet linkage map. This is the first report of the isolation and characterization of RGAs from pearl millet, which will facilitate the improvement of marker‐assisted breeding strategies.

12.
The RNA recognition motif (RRM) is one of the most common eukaryotic protein motifs. RRM sequences form a conserved globular structure known as the RNA-binding domain (RBD) or the ribonucleoprotein domain. Many proteins that contain RRM sequences bind RNA in a sequence-specific manner. To investigate the basis for the RNA-binding specificity of RRMs, we subjected 330 aligned RRM sequences to covariance analysis. The analysis revealed a single network of covariant amino acid pairs comprising the buried core of the RBD and a surface patch. Structural studies have implicated a subset of these residues in RNA binding. The covariance linkages identify a larger set of amino acid residues, including some not directly in contact with bound RNA, that may influence RNA-binding specificity.
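The abstract in entry 12 does not say which covariance statistic was used. As one common stand-in, the sketch below scores a pair of alignment columns by mutual information; this choice, the function name `mutual_information`, and the treatment of the alignment as a plain list of equal-length strings are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(alignment, col_i, col_j):
    """Mutual information (in bits) between two columns of an alignment."""
    n = len(alignment)
    pair_counts = Counter((seq[col_i], seq[col_j]) for seq in alignment)
    counts_i = Counter(seq[col_i] for seq in alignment)
    counts_j = Counter(seq[col_j] for seq in alignment)
    mi = 0.0
    for (a, b), c in pair_counts.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (counts_i[a] * counts_j[b]))
    return mi
```

Columns that always change together score high, while a fully conserved column scores zero against every other column, so in practice a network of covariant pairs would be read off the highest-scoring column pairs.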

13.
Molecular evolution of the mammalian ribosomal protein gene, RPS14   Total citations: 4 (self-citations: 0; citations by others: 4)
Ribosomal protein S14 genes (RPS14) in eukaryotic species from protozoa to primates exhibit dramatically different intron-exon structures yet share homologous polypeptide-coding sequences. To recognize common features of RPS14 gene architectures in closely related mammalian species and to evaluate similarities in their noncoding DNA sequences, we isolated the intron-containing S14 locus from Chinese hamster ovary (CHO) cell DNA by using a PCR strategy and compared it with human RPS14. We found that rodent and primate S14 genes are composed of identical protein-coding exons interrupted by introns at four conserved DNA sites. However, the structures of corresponding CHO and human RPS14 introns differ significantly. Nonetheless, individual intron splice donor, splice acceptor, and upstream flanking motifs have been conserved within mammalian S14 homologues as well as within RPS14 gene fragments PCR amplified from other vertebrate genera (birds and bony fish). Our data indicate that noncoding, intronic DNA sequences within highly conserved, single-copy ribosomal protein genes are useful molecular landmarks for phylogenetic analysis of closely related vertebrate species.

14.
15.
A new approach is proposed for determining common RNA secondary structures within a set of homologous RNAs. The approach is a combination of phylogenetic and thermodynamic methods which is based on the prediction of optimal and suboptimal secondary structures, topological similarity searches and phylogenetic comparative analysis. The optimal and suboptimal RNA secondary structures are predicted by energy minimization. Structural comparison of the predicted RNA secondary structures is used to find conserved structures that are topologically similar in all these homologous RNAs. The validity of the conserved structural elements found is then checked by phylogenetic comparison of the sequences. This procedure is used to predict common structures of ribonuclease P (RNAase P) RNAs.

16.
Proteins in the intracellular lipid-binding protein (iLBP) family show remarkably high structural conservation despite their low sequence identity. A multiple-sequence alignment using 52 sequences of iLBP family members revealed 15 fully conserved positions, with a disproportionately high number of these (n=7) located in the relatively small helical region. The conserved positions displayed high structural conservation based on comparisons of known iLBP crystal structures. It is striking that the beta-sheet domain had few conserved positions, despite its high structural conservation. This observation prompted us to analyze pair-wise interactions within the beta-sheet region to ask whether structural information was encoded in interacting amino acid pairs. We conducted this analysis on the iLBP family member, cellular retinoic acid-binding protein I (CRABP I), whose folding mechanism is under study in our laboratory. Indeed, an analysis based on a simple classification of hydrophobic and polar amino acids revealed a network of conserved interactions in CRABP I that cluster spatially, suggesting a possible nucleation site for folding. Significantly, a small number of residues participated in multiple conserved interactions, suggesting a key role for these sites in the structure and folding of CRABP I. The results presented here correlate well with available experimental evidence on folding of CRABPs and their family members and suggest future experiments. The analysis also shows the usefulness of considering pair-wise conservation based on a simple classification of amino acids, in analyzing sequences and structures to find common core regions among homologues.

17.
MOTIVATION: RNA structure motifs contained in mRNAs have been found to play important roles in regulating gene expression. However, identification of novel RNA regulatory motifs using computational methods has not been widely explored. Effective tools for predicting novel RNA regulatory motifs based on genomic sequences are needed. RESULTS: We present a new method for predicting common RNA secondary structure motifs in a set of functionally or evolutionarily related RNA sequences. This method is based on comparison of stems (palindromic helices) between sequences and is implemented by applying graph-theoretical approaches. It first finds all possible stable stems in each sequence and compares stems pairwise between sequences by some defined features to find stems conserved across any two sequences. Then by applying a maximum clique finding algorithm, it finds all significant stems conserved across at least k sequences. Finally, it assembles in topological order all possible compatible conserved stems shared by at least k sequences and reports a number of the best assembled stem sets as the best candidate common structure motifs. This method does not require prior structural alignment of the sequences and is able to detect pseudoknot structures. We have tested this approach on some RNA sequences with known secondary structures, in which it is capable of detecting the real structures completely or partially correctly and outperforms other existing programs for similar purposes. AVAILABILITY: The algorithm has been implemented in C++ in a program called comRNA, which is available at http://ural.wustl.edu/softwares.html
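The clique step described in entry 17 can be illustrated with a small graph sketch. This is not the comRNA implementation: it assumes stems have already been found and compared, represents each stem as a hypothetical `(sequence_index, stem_label)` tuple, takes as input only pairs of similar stems from different sequences, and enumerates maximal cliques with a plain Bron-Kerbosch recursion rather than whatever clique finder the program actually uses.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later cliques at this level
    yield from expand(set(), set(adjacency), set())

def conserved_stem_groups(similar_pairs, k):
    """Groups of mutually similar stems that span at least k sequences."""
    adjacency = {}
    for u, v in similar_pairs:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return [clique for clique in maximal_cliques(adjacency)
            if len({seq_index for seq_index, _ in clique}) >= k]
```

Each returned clique is a set of stems, one per sequence under the stated input assumption, that are all pairwise similar; assembling compatible cliques into a full motif is a further step not covered here.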

18.

Background

This study examines the structural features and phylogeny of the α subunits of 69 full-length NifD (MoFe subunit), VnfD (VFe subunit), and AnfD (FeFe subunit) sequences.

Methodology/Principal Findings

The analyses of this set of sequences included BLAST scores, multiple sequence alignment, examination of patterns of covariant residues, phylogenetic analysis and comparison of the sequences flanking the conserved Cys and His residues that attach the FeMo cofactor to NifD and that are also conserved in the alternative nitrogenases. The results show that NifD nitrogenases fall into two distinct groups. Group I includes NifD sequences from many genera within Bacteria, including all nitrogen-fixing aerobes examined, as well as strict anaerobes and some facultative anaerobes, but no archaeal sequences. In contrast, Group II NifD sequences were limited to a small number of archaeal and bacterial sequences from strict anaerobes. The VnfD and AnfD sequences fall into two separate groups, more closely related to Group II NifD than to Group I NifD. The pattern of perfectly conserved residues, distributed along the full length of the Group I and II NifD, VnfD, and AnfD, confirms unambiguously that these polypeptides are derived from a common ancestral sequence.

Conclusions/Significance

There is no indication of a relationship between the patterns of covariant residues specific to each of the four groups discussed above that would give indications of an evolutionary pathway leading from one type of nitrogenase to another. Rather the totality of the data, along with the phylogenetic analysis, is consistent with a radiation of Group I and II NifDs, VnfD and AnfD from a common ancestral sequence. All the data presented here strongly support the suggestion made by some earlier investigators that the nitrogenase family had already evolved in the last common ancestor of the Archaea and Bacteria.

19.
Amino acid sequences for 11 acetohydroxy acid synthase (EC 4.1.3.18; AHS) polypeptides with experimentally established activity were chosen for computational comparisons to detect conserved local information associated with reaction specificity for each sequence. Windowed analysis by Pearson product moment cross-correlation of six amino acid sidechain properties revealed locally conserved segments common to all proteins with AHS activity. Seven information segments were detected in the same arrangement in sequences for the large subunit polypeptides of prokaryotes, and in the sequences for single polypeptides of eukaryotic AHS. The information segments were numbered 1-7 according to sequential position, and sequence features such as cofactor binding sites were defined for specific segments. Extension of the information segment analysis to seven other proteins of the pyruvate decarboxylase superfamily permitted use of the content and organization of information segments to recognize four classes of enzyme reaction specificity. Estimates of information entropy, based upon a state space defined by reaction specificity, directly reflected the known reaction complexity for all but one enzyme examined. Our data suggest that development of information-segment models for enzyme superfamilies may improve the accuracy of inferring protein activity from sequence.
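A windowed Pearson correlation of the kind mentioned in entry 19 can be sketched for a single sidechain property. The abstract uses six properties and does not name them; the sketch below substitutes the Kyte-Doolittle hydropathy scale as an assumed example, compares two sequences at the same offsets without any alignment or gaps, and uses hypothetical function names and a hypothetical default window size.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values; an assumed stand-in for one sidechain property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def pearson(x, y):
    """Pearson product-moment correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def windowed_correlation(seq_a, seq_b, window=5):
    """Correlation of property profiles in each window shared by both sequences."""
    prof_a = [HYDROPATHY[res] for res in seq_a]
    prof_b = [HYDROPATHY[res] for res in seq_b]
    n = min(len(prof_a), len(prof_b))
    return [pearson(prof_a[i:i + window], prof_b[i:i + window])
            for i in range(n - window + 1)]
```

Runs of windows with correlation near 1 would be candidates for locally conserved segments; repeating the calculation per property and combining the results is left open here because the source does not describe how that was done.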

20.
Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of “phylogenetic shadowing” methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features.
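The two scaling rules in entry 20 can be reproduced with a deliberately crude toy calculation, which is not the model from the paper. Assume each neutral site stays unchanged in one comparative genome with probability exp(-D), where D is the distance in substitutions per site, and that genomes and sites are independent. A neutral window of L sites then looks perfectly conserved in all N genomes with probability exp(-N * L * D), and keeping that false-positive probability at or below a threshold alpha gives N >= ln(1/alpha) / (L * D). The function name and default threshold below are hypothetical.

```python
from math import ceil, log

def genomes_needed(feature_len, distance, alpha=0.001):
    """Smallest N with exp(-N * feature_len * distance) <= alpha (toy model)."""
    if feature_len <= 0 or distance <= 0 or not 0 < alpha < 1:
        raise ValueError("feature_len and distance must be positive, 0 < alpha < 1")
    return ceil(log(1 / alpha) / (feature_len * distance))
```

Under these assumptions, doubling the feature length or doubling the distance roughly halves the number of genomes required. The toy model ignores saturation and shared phylogenetic history, so it is only meaningful at short distances, which is also where the abstract states the inverse scaling with distance.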
