Similar literature
 20 similar articles found (search time: 31 ms)
1.
MOTIVATION: Short linear peptide motifs mediate protein-protein interaction and cell compartment targeting, and represent the sites of post-translational modification. The identification of functional motifs by conventional sequence searches, however, is hampered by the short length of the motifs, which results in a large number of hits of which only a small portion are functional. RESULTS: We have developed a procedure for the identification of functional motifs, which scores pattern conservation in homologous sequences by explicitly taking into account the sequence similarity to the query sequence. To further improve this method, sequence filters have been optimized to mask those sequence regions containing few or no linear motifs. The performance of this approach was verified by measuring its ability to identify 576 experimentally validated motifs among a total of 15 563 instances in a set of 415 protein sequences. Compared to a random selection procedure, the joint application of sequence filters and the novel scoring scheme resulted in a 9-fold enrichment of validated functional motifs on the first rank. In addition, only half as many hits need to be investigated to recover 75% of the functional instances in our dataset. Therefore, this motif-scoring approach should be helpful to guide experiments because it allows focusing on those short linear peptide motifs that have a high probability of being functional.

2.
3.
Peptidomics is a challenging field in which to create a link between genomic information and biological function through biochemical analysis of expressed peptides, including precise identification of post-translational modifications and proteolytic processing. We found that secreted peptides in Arabidopsis plants diffuse into the medium of whole-plant submerged cultures, and can be effectively identified by o-chlorophenol extraction followed by LC-MS analysis. Using this system, we first confirmed that a 12-amino-acid mature CLE44 peptide accumulated at a considerable level in the culture medium of transgenic plants overexpressing CLE44. Next, using an in silico approach, we identified a novel gene family encoding small secreted peptides that exhibit significant sequence similarity within the C-terminal short conserved domain. We determined that the mature peptide encoded by At1g47485, a member of this gene family, is a 15-amino-acid peptide containing two hydroxyproline residues derived from the conserved domain. This peptide, which we have named CEP1, is mainly expressed in the lateral root primordia and, when overexpressed or externally applied, significantly arrests root growth. CEP1 is a candidate for a novel peptide plant hormone.

4.
Amphibian tachykinin precursor   Total citations: 1, self-citations: 0, citations by others: 1
The precursor of amphibian tachykinin has not been found, although more than 30 tachykinins have been isolated from amphibians since 1964. In this report, two tachykinin-like peptides are identified from the skin secretions of the frog Odorrana grahami. Their amino acid sequences are DDTEDLANKFIGLM-NH2 (named tachykinin OG1) and DDASDRAKKFYGLM-NH2, the latter being identical to ranamargarin found in Rana margaretae; both carry the conserved FXGLM-NH2 C-terminal consensus motif. By cDNA cloning, their precursors were screened from the skin cDNA library of O. grahami. The precursors are composed of 61 amino acid (aa) residues, including a signal peptide followed by an acidic spacer peptide and one copy of the mature tachykinin-like peptide. Their overall structure differs from that of other tachykinin precursors, such as the human protachykinin 1 precursor, which contains 143 aa including one copy each of substance P (SP) and neurokinin A (NKA), and the ascidian tachykinin 1 precursor, which contains 164 aa including two copies of tachykinin-like peptides. The current results demonstrate that the biosynthesis mode of tachykinins in amphibians is different from that in other animals.
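The C-terminal consensus mentioned in this abstract can be checked against the two reported sequences with a short script. This is only an illustrative sketch: the function name `has_tachykinin_motif` is hypothetical, and the C-terminal amide is ignored because it is not part of the one-letter sequence.

```python
import re

# Consensus stated in the abstract: F-X-G-L-M at the C-terminus.
# The amide group (-NH2) is a modification, not a residue, so it is not matched here.
TACHYKININ_CTERM = re.compile(r"F.GLM$")

# Sequences exactly as given in the abstract.
peptides = {
    "tachykinin OG1": "DDTEDLANKFIGLM",
    "ranamargarin": "DDASDRAKKFYGLM",
}


def has_tachykinin_motif(seq: str) -> bool:
    """Return True if the sequence ends with the FXGLM consensus."""
    return TACHYKININ_CTERM.search(seq) is not None


for name, seq in peptides.items():
    print(name, has_tachykinin_motif(seq))
```

Both peptides end in F, one variable residue, then GLM, so each satisfies the consensus.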

5.
In order to use DNA sequences for specimen identification (e.g., barcoding, fingerprinting), an algorithm to compare query sequences with a reference database is needed. Precision and accuracy of query sequence identification were estimated for hierarchical clustering (parsimony and neighbor joining), similarity methods (BLAST, BLAT and megaBLAST), combined clustering/similarity methods (BLAST/parsimony and BLAST/neighbor joining), diagnostic methods (DNA–BAR and DOME ID), and a new method (ATIM). We offer two novel alignment-free algorithmic solutions (DOME ID and ATIM) to identify query sequences for the purposes of DNA barcoding. Publicly available gymnosperm nrITS 2 and plastid matK sequences were used as test data sets. On the test data sets, almost all of the methods were able to accurately identify sequences to genus; however, no method was able to accurately identify query sequences to species at a frequency that would be considered useful for routine specimen identification (42–71% unambiguously correct). Clustering methods performed the worst (perhaps due to alignment issues). Similarity methods, ATIM, DNA–BAR, and DOME ID all performed at approximately the same level. Given the relative precision of the algorithms (median = 67% unambiguous), the low accuracy of species-level identification observed could be ascribed to the lack of correspondence between patterns of allelic similarity and species delimitations. Application of DNA barcoding to sequences of CITES listed cycads (Cycadopsida) provides an example of the potential application of DNA barcoding to enforcement of conservation laws. © The Willi Hennig Society 2006.

6.
We report the use of chemical derivatization with MALDI-MS/MS analysis for de novo sequence analysis. Using three frequently used homology-based search algorithms, we were able to identify more than 40 proteins from banana, a nonmodel plant with an unsequenced genome. Furthermore, this approach allowed the identification of different isoforms. We also observed that the identification score obtained with the MS-Blast algorithm varied according to the position of the peptide sequences in the query.

7.
MOTIVATION: The discovery of solid-binding peptide sequences is accelerating along with their practical applications in biotechnology and materials sciences. A better understanding of the relationships between the peptide sequences and their binding affinities or specificities will enable further design of novel peptides with selected properties of interest both in engineering and medicine. RESULTS: A bioinformatics approach was developed to classify peptides selected by in vivo techniques according to their inorganic solid-binding properties. Our approach performs all-against-all comparisons of experimentally selected peptides with short amino acid sequences that were categorized for their binding affinity and scores the alignments using sequence similarity scoring matrices. We generated novel scoring matrices that optimize the similarities within the strong-binding peptide sequences and the differences between the strong- and weak-binding peptide sequences. Using the scoring matrices thus generated, a given peptide is classified based on the sequence similarity to a set of experimentally selected peptides. We demonstrate the new approach by classifying experimentally characterized quartz-binding peptides and computationally designing new sequences with specific affinities. Experimental verifications of binding of these computationally designed peptides confirm our predictions with high accuracy. We further show that our approach is a general one and can be used to design new sequences that bind to a given inorganic solid with predictable and enhanced affinity.

8.
We describe two novel sequence similarity search algorithms, FASTS and FASTF, that use multiple short peptide sequences to identify homologous sequences in protein or DNA databases. FASTS searches with peptide sequences of unknown order, as obtained by mass spectrometry-based sequencing, evaluating all possible arrangements of the peptides. FASTF searches with mixed peptide sequences, as generated by Edman sequencing of unseparated mixtures of peptides. FASTF deconvolutes the mixture, using a greedy heuristic that allows rapid identification of high scoring alignments while reducing the total number of explored alternatives. Both algorithms use the heuristic FASTA comparison strategy to accelerate the search but use alignment probability, rather than similarity score, as the criterion for alignment optimality. Statistical estimates are calculated using an empirical correction to a theoretical probability. These calculated estimates were accurate within a factor of 10 for FASTS and 1000 for FASTF on our test dataset. FASTS requires only 15-20 total residues in three or four peptides to robustly identify homologues sharing 50% or greater protein sequence identity. FASTF requires about 25% more sequence data than FASTS for equivalent sensitivity, but additional sequence data are usually available from mixed Edman experiments. Thus, both algorithms can identify homologues that diverged 100 to 500 million years ago, allowing proteomic identification from organisms whose genomes have not been sequenced.

9.
10.
The zebrafish, Danio rerio, has three types of pigment cells (melanophores, xanthophores and iridophores) and, in adult fish, these cells are organized into a stripe pattern. The mechanisms underlying formation of the stripe pattern are largely unknown. We report here the identification and characterization of a novel dominant zebrafish mutation, hagoromo (hag), which was generated by insertional mutagenesis using a pseudotyped retrovirus. The hag mutation caused disorganized stripe patterns. Two hag mutant alleles were isolated independently and proviruses were located within the fifth intron of a novel gene, which we named hag, encoding an F-box/WD40-repeat protein. The hag gene was mapped to linkage group (LG)13, close to fgf8 and pax2.1. Amino acid sequence similarity, conserved exon-intron boundaries and conserved synteny indicated that zebrafish hag is an ortholog of mouse Dactylin, the gene mutated in the Dactylaplasia (Dac) mouse [1]. The Dac mutation is dominant and causes defects in digit formation in fore- and hindlimbs. This study revealed that the hag locus is important for pattern formation in fish but is involved in distinct morphogenetic events in different vertebrates.

11.
GlobPlot: Exploring protein sequences for globularity and disorder   Total citations: 2, self-citations: 0, citations by others: 2
A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) which are important for protein function. We present here a new tool for discovery of such unstructured, or disordered, regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency within the query protein for order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, and also apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in the design of constructs corresponding to globular proteins, as needed for many biochemical studies, particularly structural biology. GlobPlot has a pipeline interface, GlobPipe, for the advanced user to do whole proteome analysis. GlobPlot can also be used as a generic infrastructure package for graphical display of any possible propensity.

12.
Immune responses contribute to the pathogenesis of vitiligo and target melanoma, which is sometimes associated with vitiligo-like depigmentation in melanoma patients. We analyzed the sera from patients with vitiligo and cutaneous melanoma for reactivity toward tyrosinase peptide sequences 1) endowed with a low level of similarity to the human proteome, and 2) potentially able to bind HLA-DR1 antigens. We report that the tyrosinase autoantigen was immunorecognized with the same molecular pattern by sera from vitiligo and melanoma patients. Five autoantigen peptides composed the immunodominant anti-tyrosinase response: aa95-104 FMGFNCGNCK; aa175-182 LFVWMHYY; aa176-190 FVWMHYYVSMDALLG; aa222-236 IQKLTGDENFTIPYW; and aa233-247 IPYWDWRDAEKCDIC. All five antigenic peptides were characterized by being (or containing) a sequence with a low level of similarity to the self proteome. Sera from healthy subjects were responsive to aa95-104 FMGFNCGNCK, aa222-236 IQKLTGDENFTIPYW, and aa233-247 IPYWDWRDAEKCDIC, but did not react with the aa175-182 LFVWMHYY and aa176-190 FVWMHYYVSMDALLG peptide sequences containing the copper-binding His180 and the oculocutaneous albinism I-A variant position F176. Our results indicate a clear-cut link between peptide immunogenicity and a low similarity level of the corresponding amino acid sequence, and are an example of a comparative analysis that might make it possible to comprehensively distinguish the epitopic peptide sequences within a disease from those associated with natural autoantibodies. In particular, these data, for the first time, delineate the linear B epitope pattern on the tyrosinase autoantigen and provide definitive evidence of humoral immune responses against tyrosinase.

13.
Li L, Wu C, Huang H, Zhang K, Gan J, Li SS 《Nucleic acids research》2008,36(10):3263-3273
Systematic identification of binding partners for modular domains such as Src homology 2 (SH2) is important for understanding the biological function of the corresponding SH2 proteins. We have developed a World Wide Web-accessible computer program dubbed SMALI, for scoring matrix-assisted ligand identification, for SH2 domains and other signaling modules. The current version of SMALI harbors 76 unique scoring matrices for SH2 domains derived from screening oriented peptide array libraries. These scoring matrices are used to search a protein database for short peptides preferred by an SH2 domain. An experimentally determined cut-off value is used to normalize an SMALI score, thereby allowing for direct comparison of peptide-binding potential across different SH2 domains. SMALI employs scoring matrices distinct from those of Scansite, a popular motif-scanning program. Moreover, SMALI contains built-in filters for phosphoproteins, Gene Ontology (GO) correlation and colocalization of subject and query proteins. Compared to Scansite, SMALI exhibited improved accuracy in identifying binding peptides for SH2 domains. Applying SMALI to a group of SH2 domains identified hundreds of interactions that overlap significantly with known networks mediated by the corresponding SH2 proteins, suggesting SMALI is a useful tool for facile identification of signaling networks mediated by modular domains that recognize short linear peptide motifs.
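The idea of scanning a protein with a position-specific scoring matrix and normalizing by a cut-off can be sketched as follows. This is not the SMALI implementation: the matrix values, the cut-off, the three-residue window after the tyrosine, and the names `TOY_MATRIX`, `TOY_CUTOFF` and `scan` are all invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy matrix for positions +1..+3 after a (phospho)tyrosine, written Y here.
# Real matrices would come from oriented peptide array library screens; these numbers are invented.
TOY_MATRIX: List[Dict[str, float]] = [
    {"E": 2.0, "D": 1.0},  # position +1
    {"E": 1.5, "N": 1.0},  # position +2
    {"I": 2.5, "L": 1.5},  # position +3
]
TOY_CUTOFF = 4.0  # hypothetical stand-in for an experimentally determined cut-off


def scan(protein: str, matrix: List[Dict[str, float]], cutoff: float) -> List[Tuple[int, str, float]]:
    """Score every Y-anchored window; keep hits whose normalized score is >= 1."""
    hits = []
    width = len(matrix)
    for i, residue in enumerate(protein):
        # Skip non-tyrosine positions and windows that would run past the end.
        if residue != "Y" or i + width >= len(protein):
            continue
        raw = sum(matrix[k].get(protein[i + 1 + k], 0.0) for k in range(width))
        normalized = raw / cutoff  # dividing by the cut-off puts domains on a common scale
        if normalized >= 1.0:
            hits.append((i, protein[i:i + width + 1], normalized))
    return hits


print(scan("AAYEEIAAYQQQ", TOY_MATRIX, TOY_CUTOFF))
```

Because each domain's raw score is divided by its own cut-off, a normalized score of 1.0 means "at threshold" for every domain, which is what makes scores comparable between matrices.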

14.
15.
We consider the problem of similarity queries in biological network databases. Given a database of networks, a similarity query returns all the database networks whose similarity (i.e. alignment score) to a given query network is at least a specified similarity cutoff value. Alignment of two networks is a very costly operation, which makes exhaustive comparison of all the database networks with a query impractical. To tackle this problem, we develop a novel indexing method, named RINQ (Reference-based Indexing for Biological Network Queries). Our method uses a set of reference networks to eliminate a large portion of the database quickly for each query. A reference network is a small biological network. We precompute and store the alignments of all the references with all the database networks. When our database is queried, we align the query network with all the reference networks. Using these alignments, we calculate a lower bound and an approximate upper bound to the alignment score of each database network with the query network. With the help of upper and lower bounds, we eliminate the majority of the database networks without aligning them to the query network. We also quickly identify a small portion of these as guaranteed to be similar to the query. We perform pairwise alignment only for the remaining networks. We also propose a supervised method to pick references that have a large chance of filtering the unpromising database networks. Extensive experimental evaluation suggests that (i) our method reduced the running time of a single query on a database of around 300 networks from over 2 days to only 8 h; (ii) our method outperformed the state-of-the-art methods Closure Tree and SAGA by a factor of three or more; and (iii) our method successfully identified statistically and biologically significant relationships across networks and organisms.
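The filter-then-verify idea behind reference-based indexing can be illustrated with a generic metric-space version. This sketch swaps in a different, simpler setting than the abstract's: strings stand in for networks, Levenshtein distance stands in for the costly alignment, and the bounds come from the triangle inequality rather than from RINQ's own alignment-score bounds. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a true metric, standing in for the expensive comparison."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(database: List[str], references: List[str]) -> Dict[str, List[int]]:
    """Precompute the distance from every database item to every reference."""
    return {x: [edit_distance(x, r) for r in references] for x in database}


def range_query(query: str, database: List[str], references: List[str],
                index: Dict[str, List[int]], radius: int) -> Tuple[List[str], int]:
    """Return (items within radius of the query, number of direct comparisons made)."""
    q_to_ref = [edit_distance(query, r) for r in references]
    matches, compared = [], 0
    for x in database:
        # Triangle inequality: |d(q,r) - d(r,x)| <= d(q,x) <= d(q,r) + d(r,x)
        lower = max(abs(q - d) for q, d in zip(q_to_ref, index[x]))
        upper = min(q + d for q, d in zip(q_to_ref, index[x]))
        if lower > radius:
            continue            # pruned without a direct comparison
        if upper <= radius:
            matches.append(x)   # accepted without a direct comparison
            continue
        compared += 1           # undecided: fall back to the expensive comparison
        if edit_distance(query, x) <= radius:
            matches.append(x)
    return matches, compared


database = ["ACGT", "ACGA", "TTTT", "ACGTACGT", "GGGGGGGG"]
references = ["ACGT", "GGGG"]
index = build_index(database, references)
print(range_query("ACGG", database, references, index, radius=1))
```

The pay-off depends on how well the references separate the database; the abstract's supervised reference selection addresses that choice, which this sketch leaves to the caller.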

16.
Lectins are proteins with the ability to bind reversibly and non-enzymatically to a specific carbohydrate. They are involved in numerous biological processes and show enormous biotechnological potential. Among plant lectins, the hevein domain is extremely common, being observed in several kinds of lectins. Moreover, this domain is also observed in an important class of antimicrobial peptides named hevein-like peptides. Owing to the high conservation of their cysteine residues, hevein-like peptides can be mined from sequence databases. By using the pattern CX(4,5)CC[GS]X(2)GXCGX[GST]X(2,3)[FWY]C[GS]X[AGS], novel hevein-like peptide precursors were found in three different plants: Oryza sativa, Vitis vinifera and Selaginella moellendorffii. In addition, a hevein-like peptide precursor from the phytopathogenic fungus Phaeosphaeria nodorum was also identified. The molecular models indicate that they have the same scaffold as other hevein-like peptides, composed of an antiparallel β-sheet and short helices. Nonetheless, the fungal hevein-like peptide probably has a different disulfide bond pattern. Despite this difference, the complexes between peptide and N,N,N-triacetylglucosamine are stable, according to molecular dynamics simulations. This is the first report of a hevein-like peptide from an organism outside the plant kingdom. The exact role of a hevein-like peptide in fungal biology remains to be clarified, while in plants such peptides are clearly involved in defense. In summary, the data reported here clearly show that an in silico strategy could lead to the identification of novel hevein-like peptides that could be used as biotechnological tools in the fields of health and agribusiness.
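The search pattern quoted in this abstract uses X for any residue and X(m,n) for a run of m to n residues, which maps directly onto a regular expression. The following is a minimal sketch of such a conversion; the function name `pattern_to_regex` and the test sequence are made up for illustration, and the abstract does not say which tool the authors actually used for the search.

```python
import re

# Pattern exactly as quoted in the abstract.
PATTERN = "CX(4,5)CC[GS]X(2)GXCGX[GST]X(2,3)[FWY]C[GS]X[AGS]"


def pattern_to_regex(pattern: str) -> str:
    """Translate the X / X(n) / X(m,n) wildcard notation into regex syntax."""
    # X(4,5) -> .{4,5} and X(2) -> .{2}
    regex = re.sub(r"X\((\d+(?:,\d+)?)\)", r".{\1}", pattern)
    # Any remaining bare X is a single wildcard position.
    return regex.replace("X", ".")


HEVEIN_RE = re.compile(pattern_to_regex(PATTERN))

# Synthetic sequence built position by position to satisfy the pattern (not a real protein).
synthetic = ("MK" + "C" + "AAAA" + "CC" + "G" + "AA" + "G" + "A" + "CG" + "A"
             + "G" + "AA" + "F" + "C" + "G" + "A" + "A" + "KK")

match = HEVEIN_RE.search(synthetic)
print(pattern_to_regex(PATTERN))
print(match.start() if match else None)
```

Bracketed residue classes such as [GS] already mean the same thing in regex syntax, so only the wildcard notation needs rewriting.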

17.
18.
U Buwitt, T Flohr, E C Böttger 《The EMBO journal》1992,11(2):489-496
Here we report the molecular cloning of several related human cDNAs from which a full-length sequence can be determined. The cDNAs encode a 2.8 kb mRNA that is strongly induced by interferon (IFN) gamma and the expression of which is not cell-restricted but observed in fibroblasts, macrophages and epithelial cells. The deduced amino acid sequence predicts a protein of 471 amino acids with high sequence similarity to a previously identified rabbit peptide chain release factor. Functional studies to demonstrate release factor activity showed that the protein encoded by this cDNA inhibited the readthrough activity of a yeast UGA suppressor tRNA in an in vitro translation system. The identification of this novel cDNA implies that translational control by IFN-induced proteins may not be restricted to the initial steps of protein synthesis but may also act by regulation of peptide chain termination.

19.
Little DP 《PloS one》2011,6(8):e20552
For DNA barcoding to succeed as a scientific endeavor, an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within-taxon variability. Here, I present a novel alignment-free sequence identification algorithm, BRONX, that accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within-taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus level for all query types.
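The general idea of locating variable segments through invariant flanks, and scoring them against the variants observed within each taxon, can be sketched in a few lines. This is a toy illustration, not the BRONX algorithm: the flank pairs are assumed to be known in advance, matching is exact, the hierarchic treatment of taxa is omitted, and the function names and all sequences are invented.

```python
import re
from typing import Dict, List, Optional, Tuple


def extract(seq: str, left: str, right: str) -> Optional[str]:
    """Return the segment between two invariant flanks, or None if the flanks are absent."""
    m = re.search(re.escape(left) + "(.*?)" + re.escape(right), seq)
    return m.group(1) if m else None


def identify(query: str, flanks: List[Tuple[str, str]],
             references: Dict[str, List[str]]) -> List[str]:
    """Return the taxa whose observed variants match the most variable regions of the query."""
    scores = {}
    for taxon, seqs in references.items():
        score = 0
        for left, right in flanks:
            segment = extract(query, left, right)
            if segment is None:
                continue  # imperfect overlap: this region is simply not scored
            # Within-taxon variability: every variant seen in this taxon counts as a match.
            observed = {extract(s, left, right) for s in seqs}
            if segment in observed:
                score += 1
        scores[taxon] = score
    best = max(scores.values())
    return sorted(t for t, s in scores.items() if s == best and best > 0)


# Invented toy data: two flank pairs, two taxa, two observed variants in Taxon A.
flanks = [("ACGT", "TTGA"), ("GGCC", "CATG")]
references = {
    "Taxon A": ["ACGTAATTGAGGCCTCCATG", "ACGTAGTTGAGGCCTCCATG"],
    "Taxon B": ["ACGTCCTTGAGGCCGACATG"],
}

print(identify("ACGTAGTTGAGGCCTCCATG", flanks, references))  # full-length query
print(identify("GGCCGACATG", flanks, references))            # short query covering one region
```

No multiple-sequence alignment is built at any point; a short query that covers only one flank pair is still scored on that region alone.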

20.
Microbial communities are of great environmental, medical, and industrial significance. To date, biomolecular methods to study communities have focused on identifying species, with limited capabilities to reveal functions. Proteomics has the potential to yield functional information about these communities, but the application of proteomic methods to complex mixtures of unsequenced organisms is in its infancy. In this study, 2DE, MALDI-TOF/TOF MS, and de novo peptide sequencing were used for the separation and identification of proteins differentially expressed over time following exposure of a bacterial community to an inhibitory level of cadmium. Significant community proteome responses after 0.25, 1, 2, and 3 h of exposure to cadmium were observed, with more than 100 protein expression changes detected at each time point. Several temporal responses were observed, and the most common expression pattern was immediate up- or down-regulation within 15 min of shock followed by maintenance of that level. More than 100 unique differentially expressed proteins were identified through database searching and de novo sequencing. Proteins of importance in the cadmium shock included ATPases, oxidoreductases, and transport proteins. The ability of proteomics to detect the differential regulation of these proteins even during short cadmium exposures shows that it is a powerful tool in explaining cellular mechanisms for a mixed culture. This is the first report of the large-scale identification of proteins involved in the dynamic response of a community of unsequenced bacteria using de novo sequencing.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号