Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
When investigators undertake searches of DNA databases, they normally discard large numbers of alignments that demonstrate very weak resemblances to each other, retaining only those that show statistically significant levels of resemblance. We show here that a great deal of information can be extracted from these weak alignments by examining them en masse. This is done by building three-dimensional similarity landscapes from the alignments, landscapes that reveal whether an unusual number of individually nonsignificant alignments tend to match up to a particular region of the query sequence being searched. The power of the search is increased by the use of libraries consisting entirely of introns or of exons. We show that (1) similarity landscapes with a variety of features can be generated from both intron and exon libraries, using introns or exons as query sequences; (2) the landscape features are real and not a statistical artifact; (3) well-known protein motifs used as query sequences can generate various landscape features; and (4) there is some evidence for resemblances between short regions of sequence carried by introns and exons. One possible interpretation of these results is that both introns and exons may have been built up during their evolution from short regions of sequence that as a result are now widely distributed throughout eukaryotic genomes. Such an interpretation would imply that these short regions have common ancestry. Alternatively, the wide sharing of short pieces of DNA may reflect regions with particular structural properties that have arisen through convergent evolution. The similarity-landscape approach can be used to detect such widespread structural motifs and sequence motifs in the genome that might be missed by less-global searches. 
It can also be used in conjunction with algorithms developed for detecting significant multiple alignments by isolating promising subsets of the databases that can be examined in more detail. Correspondence to: C. Wills
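
The idea of pooling weak alignments can be illustrated with a minimal sketch. It assumes each alignment is reduced to a hypothetical `(start, end, score)` tuple on the query and that the landscape's axes are query position, score bin, and hit count; the abstract does not specify the axes or the binning, so `similarity_landscape` and `score_bins` are illustrative names, not the authors' method.

```python
def similarity_landscape(query_length, alignments, score_bins):
    """Count alignments covering each query position, per score bin.

    alignments: iterable of (start, end, score); 0-based, end exclusive.
    score_bins: ascending lower bin edges (hypothetical binning scheme).
    Returns one row per bin, each a list of per-position counts.
    """
    landscape = [[0] * query_length for _ in score_bins]
    for start, end, score in alignments:
        # Pick the highest bin whose lower edge does not exceed the score.
        row = None
        for i, edge in enumerate(score_bins):
            if score >= edge:
                row = i
        if row is None:
            continue  # score falls below the lowest bin edge
        for pos in range(max(start, 0), min(end, query_length)):
            landscape[row][pos] += 1
    return landscape


# Toy data: three weak hits pile up on positions 2-3 of a 10-residue query.
hits = [(0, 4, 12.0), (2, 6, 15.0), (3, 8, 31.0), (2, 5, 14.0)]
grid = similarity_landscape(10, hits, score_bins=[10.0, 30.0])
```

A peak in a low-score row, such as positions 2 and 3 here, is the kind of feature that no single alignment would reveal on its own.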

2.
Kagiampakis I, Jin H, Kim S, Vannucci M, LiWang PJ, Tsai J. Biochemistry 2008, 47(40): 10637-10648
In the chemokine family, we characterize two examples of evolutionarily conserved unfavorable sequence motifs that affect quaternary structure. In contrast to the straightforward action of favorable sequences, these unfavorable motifs produce interactions disfavoring one outcome to indirectly promote another one but should not be confused with the broad sampling produced by negative selection and/or design. To identify such motifs, we developed a statistically validated computational method combining structure and phylogeny. This approach was applied in an analysis of the alternate forms of homodimerization exhibited in the chemokine family. While the chemokine family exhibits the same tertiary fold, members of certain subfamilies, including CXCL8, form a homodimer across the beta1 strand whereas members of other subfamilies, including CCL4 and CCL2, form a homodimer on the opposite side of the chemokine fold. These alternate dimerization states suggest that CCL4 and CCL2 contain specific sequences that disfavor CXCL8 dimerization. Using our computational approach, we identified two evolutionarily conserved sequence motifs in the CC subfamilies: a drastic two-residue deletion (DeltaRV) and a simple point mutation (V27R). Cloned into the CXCL8 background, these two motifs were experimentally proven to confer a monomeric state. NMR analyses indicate that these variants are structured in solution and retain the chemokine fold. Structurally, the motifs retain a chemokine tertiary fold while introducing unfavorable quaternary interactions that inhibit CXCL8 dimerization. In demonstrating the success of our computational method, our results argue that these unfavorable motifs have been evolutionarily conserved to specifically disfavor one dimerization state and, as a result, indirectly contribute to favoring another.

6.
Hydroxyproline (Hyp)-rich glycoproteins (HRGPs) participate in all aspects of plant growth and development. HRGPs are generally highly O-glycosylated through the Hyp residues, which means carbohydrates help define the interactive molecular surface and, hence, HRGP function. The Hyp contiguity hypothesis predicts that contiguous Hyp residues are sites of HRGP arabinosylation, whereas clustered noncontiguous Hyp residues are sites of galactosylation, giving rise to the arabinogalactan heteropolysaccharides that characterize the arabinogalactan-proteins. Early tests of the hypothesis using synthetic genes encoding only clustered noncontiguous Hyp in the sequence (serine [Ser]-Hyp-Ser-Hyp)(n) or contiguous Hyp in the series (Ser-Hyp-Hyp)(n) and (Ser-Hyp-Hyp-Hyp-Hyp)(n) confirmed that arabinogalactan polysaccharide was added only to noncontiguous Hyp, whereas arabinosylation occurred on contiguous Hyp. Here, we extended our tests of the codes that direct arabinogalactan polysaccharide addition to Hyp by building genes encoding the repetitive sequences (alanine [Ala]-proline [Pro]-Ala-Pro)(n), (threonine [Thr]-Pro-Thr-Pro)(n), and (valine [Val]-Pro-Val-Pro)(n), and expressing them in tobacco (Nicotiana tabacum) Bright-Yellow 2 cells as fusion proteins with green fluorescent protein. All of the Pro residues in the (Ala-Pro-Ala-Pro)(n) fusion protein were hydroxylated and, consistent with the hypothesis, every Hyp residue was glycosylated with arabinogalactan polysaccharide. In contrast, 20% to 30% of Pro residues remained non-hydroxylated in the (Thr-Pro-Thr-Pro)(n) and (Val-Pro-Val-Pro)(n) fusion proteins. Furthermore, although 50% to 60% of the Hyp residues were glycosylated with arabinogalactan polysaccharide, some remained non-glycosylated or were arabinosylated. These results suggest that the amino acid side chains of flanking residues influence the extent of Pro hydroxylation and Hyp glycosylation and may explain why isolated noncontiguous Hyp in extensins do not acquire an arabinogalactan polysaccharide but are arabinosylated or remain non-glycosylated.

7.
Motif3D is a web-based protein structure viewer designed to allow sequence motifs, and in particular those contained in the fingerprints of the PRINTS database, to be visualised on three-dimensional (3D) structures. Additional functionality is provided for the rhodopsin-like G protein-coupled receptors, enabling fingerprint motifs of any of the receptors in this family to be mapped onto the single structure available, that of bovine rhodopsin. Motif3D can be used via the web interface available at: http://www.bioinf.man.ac.uk/dbbrowser/motif3d/motif3d.html.

9.
PCR primers of arbitrary nucleotide sequence have identified DNA polymorphisms useful for genetic mapping in a large variety of organisms. Although technically very powerful, the use of arbitrary primers for genome mapping has the disadvantage of characterizing DNA sequences of unknown function. Thus, there is no reason to anticipate that DNA fragments amplified by use of arbitrary primers will be enriched for either transcribed or promoter sequences that may be conserved in evolution. For these reasons, we modified the arbitrarily primed PCR method by using oligonucleotide primers derived from conserved promoter elements and protein motifs. Twenty-nine of these primers were tested individually and in pairwise combinations for their ability to amplify genomic DNA from a variety of species including various inbred strains of laboratory mice and Mus spretus. Using recombinant inbred strains of mice, we determined the chromosomal location of 27 polymorphic fragments in the mouse genome. The results demonstrated that motif sequence-tagged PCR products are reliable markers for mapping the mouse genome and that motif primers can also be used for genomic fingerprinting of many divergent species.

10.
Scrutineer is an interactive, user-friendly program designed to search for motifs, patterns and profiles in the Swissprot, Protein Identification Resource (PIR) or SeqDb protein sequence databases. Basic capabilities include (i) searches for strings of amino acids with multiple choices at a given position; (ii) searches for strings including variable-length segments and delocalized constraints; (iii) searches over subsets of a database or particular regions within each sequence (e.g. N-terminal one-third); (iv) searches involving secondary structure predictions, physicochemical characteristics, and the like; and (v) searches using aligned sequences as targets with various optional weighting schemes. The various search criteria and hits can be combined and complex targets located. Once the data are loaded into virtual memory, all occurrences in PIR release 22.0 (3.7 × 10^6 amino acids) of a given short string of amino acids (e.g. a hexamer) are found in ~36 s. Scrutineer can also describe the entire database, user-specified hits, user-defined regions of sequence and all hits. The source code and accompanying manual are being freely distributed.
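
Capabilities (i) to (iii) map naturally onto regular expressions. The sketch below is not Scrutineer's interface; it is an illustration using Python's standard `re` module, where `find_motif`, the `region` callback, and the toy sequences are all hypothetical. A character class such as `[ST]` gives multiple choices at one position, `.{2,4}` gives a variable-length segment, and the `pos`/`endpos` arguments restrict the search to part of each sequence.

```python
import re


def find_motif(pattern, sequences, region=None):
    """Return (name, start, matched_text) for every non-overlapping hit.

    sequences: dict mapping a sequence name to its amino acid string.
    region: optional function mapping sequence length to (lo, hi) bounds,
            used to restrict the search to part of each sequence.
    """
    compiled = re.compile(pattern)
    hits = []
    for name, seq in sequences.items():
        lo, hi = region(len(seq)) if region else (0, len(seq))
        for match in compiled.finditer(seq, lo, hi):
            hits.append((name, match.start(), match.group()))
    return hits


# Toy sequences, invented for illustration only.
toy_db = {"toyA": "MKCAACGHNKS", "toyB": "AAAAACPPPPPC"}

# Two cysteines separated by a variable-length segment of 2 to 4 residues.
all_hits = find_motif(r"C.{2,4}C", toy_db)

# Same search, restricted to the N-terminal one-third of each sequence.
nterm_hits = find_motif(r"C.{2,4}C", toy_db, region=lambda n: (0, n // 3))
```

Here `all_hits` contains the single match `("toyA", 2, "CAAC")`; `toyB` has five residues between its cysteines, so it falls outside the allowed range, and the N-terminal restriction leaves no hits at all.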

12.

Background  

Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features.
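
The abstract does not describe its randomization procedure in detail. One common form such a test can take is a label-permutation test, sketched here under that assumption: activities are shuffled across oligonucleotides, and the motif's observed association is compared with the shuffled distribution. The function name, the mean-difference statistic, and the toy data are all hypothetical.

```python
import random


def motif_permutation_test(oligos, activities, motif, n_perm=2000, seed=0):
    """Two-sided permutation test for a motif/activity association.

    Returns (observed mean difference, permutation p-value).
    Assumes at least one oligo contains the motif and at least one does not.
    """
    has_motif = [motif in oligo for oligo in oligos]

    def mean_diff(values):
        inside = [v for v, h in zip(values, has_motif) if h]
        outside = [v for v, h in zip(values, has_motif) if not h]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = mean_diff(activities)
    rng = random.Random(seed)
    shuffled = list(activities)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if abs(mean_diff(shuffled)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data, invented for illustration: five oligos carry the motif TCCC.
toy_oligos = ["ATCCCG", "TCCCAA", "GGTCCC", "CTCCCT", "ATCCCA",
              "AAGGTT", "GATTAC", "CGCGAA", "TTAGGA", "ACGTAC"]
toy_activity = [0.9, 0.8, 0.85, 0.95, 0.7, 0.2, 0.3, 0.1, 0.25, 0.15]
observed, p_value = motif_permutation_test(toy_oligos, toy_activity, "TCCC")
```

Because the test is two-sided, it flags motifs associated with either high or low activity; motifs that pass could then serve as binary features for a classifier such as a support vector machine.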

13.
An automatic procedure is proposed to identify, from the protein sequence database, conserved amino acid patterns (or sequence motifs) that are exclusive to a group of functionally related proteins. This procedure is applied to the PIR database and a dictionary of sequence motifs that relate to specific superfamilies constructed. The motifs have a practical relevance in identifying the membership of specific superfamilies without the need to perform sequence database searches in 20% of newly determined sequences. The sequence motifs identified represent functionally important sites on protein molecules. When multiple blocks exist in a single motif they are often close together in the 3-D structure. Furthermore, occasionally these motif blocks were found to be split by introns when the correlation with exon structures was examined.

15.
We present the complete nucleotide sequence of a Drosophila alpha-amylase gene and its flanking regions, as determined by cDNA and genomic sequence analysis. This gene, unlike its mammalian counterparts, contains no introns. Nevertheless the insect and mammalian genes share extensive nucleotide similarity and the insect protein contains the four amino acid sequence blocks common to all alpha-amylases. In Drosophila melanogaster, there are two closely-linked copies of the alpha-amylase gene and they are divergently transcribed. In the 5'-regions of the two gene-copies we find high sequence divergence, yet the typical eukaryotic gene expression motifs have been maintained. The 5'-terminus of the alpha-amylase mRNA, as determined by primer extension analysis, maps to a characteristic Drosophila sequence motif. Additional conserved elements upstream of both genes may also be involved in amylase gene expression which is known to be under complex controls that include glucose repression.

16.
Essential genes code for fundamental cellular functions required for the viability of an organism. For this reason, essential genes are often highly conserved across organisms. However, this is not always the case: orthologues of genes that are essential in one organism are sometimes not essential in other organisms or are absent from their genomes. This suggests that, in the course of evolution, essential genes can be rendered non-essential. How can a gene become non-essential? Here we used genetic manipulation to deplete the products of 26 different essential genes in Escherichia coli. This depletion results in a lethal phenotype, which could often be rescued by the overexpression of a non-homologous, non-essential gene, most likely through replacement of the essential function. We also show that, in a smaller number of cases, the essential genes can be fully deleted from the genome, suggesting that complete functional replacement is possible. Finally, we show that essential genes whose function can be replaced in the laboratory are more likely to be non-essential or not present in other taxa. These results are consistent with the notion that patterns of evolutionary conservation of essential genes are influenced by their compensability, that is, by how easily they can be functionally replaced, for example through increased expression of other genes.

19.
CompariMotif is a novel tool for making motif-motif comparisons, identifying and describing similarities between regular expression motifs. CompariMotif can identify a number of different relationships between motifs, including exact matches, variants of degenerate motifs and complex overlapping motifs. Motif relationships are scored using shared information content, allowing the best matches to be easily identified in large comparisons. Many input and search options are available, enabling a list of motifs to be compared to itself (to identify recurring motifs) or to datasets of known motifs. AVAILABILITY: CompariMotif can be run online at http://bioware.ucd.ie/ and is freely available for academic use as a set of open source Python modules under a GNU General Public License from http://bioinformatics.ucd.ie/shields/software/comparimotif/
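
To make "shared information content" concrete, here is a toy sketch of one way two fixed-length regular-expression motifs could be compared. It is not CompariMotif's code or scoring scheme: the per-position information measure (1 for a fixed residue, 0 for a wildcard), the use of the minimum at each aligned position, and all function names are assumptions made for illustration.

```python
import math

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_motif(motif):
    """Parse a fixed-length motif such as 'R[KR].S' into residue sets."""
    positions, i = [], 0
    while i < len(motif):
        if motif[i] == "[":
            close = motif.index("]", i)
            positions.append(set(motif[i + 1:close]))
            i = close + 1
        elif motif[i] == ".":
            positions.append(set(AMINO_ACIDS))
            i += 1
        else:
            positions.append({motif[i]})
            i += 1
    return positions


def information(residues):
    """Assumed measure: 1.0 for one fixed residue, 0.0 for a full wildcard."""
    return 1.0 - math.log(len(residues)) / math.log(len(AMINO_ACIDS))


def best_shared_information(motif_a, motif_b):
    """Slide motif_b along motif_a and return the best compatible score."""
    a, b = parse_motif(motif_a), parse_motif(motif_b)
    best = 0.0
    for offset in range(-(len(b) - 1), len(a)):
        score, compatible = 0.0, True
        for j, residues_b in enumerate(b):
            i = offset + j
            if 0 <= i < len(a):
                if not (a[i] & residues_b):
                    compatible = False  # no residue satisfies both motifs
                    break
                score += min(information(a[i]), information(residues_b))
        if compatible:
            best = max(best, score)
    return best


exact = best_shared_information("RKS", "RKS")
degenerate = best_shared_information("R.S", "RKS")
unrelated = best_shared_information("AAA", "CCC")
```

Under these assumptions an exact match of three fixed positions scores 3.0, the degenerate variant `R.S` scores 2.0 against `RKS` because the wildcard contributes nothing, and incompatible motifs score 0.0.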

20.
The context-dependent expression of genes is the core for biological activities, and significant attention has been given to identification of various factors contributing to gene expression at genomic scale. However, so far this type of analysis has been focused either on relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on correlation between mRNA abundance and non-random features in coding sequences (e.g., codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal to investigate how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together. Using the AlignACE program, 442 over-represented motifs were identified from the upstream 100 bp region of 293 genes located in the known regulons. Regression of mRNA expression data against the measures of coding and non-coding sequence features indicated that 54.1% of the variations in mRNA abundance can be explained by the presence of upstream motifs, while coding sequences alone contribute to 29.7% of the variations in mRNA abundance. Interestingly, most of the contribution from coding sequences overlaps with that from upstream motifs; thereby a total of 60.3% of the variations in mRNA abundance can be explained when coding and non-coding information was included. This result demonstrates that upstream regulatory motifs and coding sequence information contribute to the overall mRNA expression in a combinatorial rather than an additive manner.
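
The overlap can be made explicit with a short worked calculation. It assumes the three percentages are coefficients of determination from regressions on upstream motifs alone, coding features alone, and both together, and that the shared part follows from a simple inclusion-exclusion partition; the abstract does not state that the authors partitioned the variance this way.

```latex
\begin{align*}
R^2_{\text{upstream}} &= 0.541, \quad R^2_{\text{coding}} = 0.297, \quad R^2_{\text{both}} = 0.603 \\
\text{shared} &= R^2_{\text{upstream}} + R^2_{\text{coding}} - R^2_{\text{both}} = 0.541 + 0.297 - 0.603 = 0.235 \\
\text{unique to upstream motifs} &= R^2_{\text{both}} - R^2_{\text{coding}} = 0.603 - 0.297 = 0.306 \\
\text{unique to coding sequence} &= R^2_{\text{both}} - R^2_{\text{upstream}} = 0.603 - 0.541 = 0.062
\end{align*}
```

On this reading, about 0.235 of the 0.297 explained by coding sequences (roughly 79%) is shared with upstream motifs, which is why the combined model explains 60.3% rather than the 83.8% a purely additive contribution would give.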


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号