Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Most homologous pairs of proteins have no significant sequence similarity to each other and are not identified by direct sequence comparison or profile-based strategies. However, multiple sequence alignments of low similarity homologues typically reveal a limited number of positions that are well conserved despite diversity of function. It may be inferred that conservation at most of these positions is the result of the importance of the contribution of these amino acids to the folding and stability of the protein. As such, these amino acids and their relative positions may define a structural signature. We demonstrate that extraction of this fold template provides the basis for the sequence database to be searched for patterns consistent with the fold, enabling identification of homologs that are not recognized by global sequence analysis. The fold template method was developed to address the need for a tool that could comprehensively search the midnight and twilight zones of protein sequence similarity without reliance on global statistical significance. Manual implementations of the fold template method were performed on three folds--immunoglobulin, c-lectin and TIM barrel. Following proof of concept of the template method, an automated version of the approach was developed. This automated fold template method was used to develop fold templates for 10 of the more populated folds in the SCOP database. The fold template method developed three-dimensional structural motifs or signatures that were able to return a diverse collection of proteins, while maintaining a low false positive rate. Although the results of the manual fold template method were more comprehensive than the automated fold template method, the diversity of the results from the automated fold template method surpassed those of current methods that rely on statistical significance to infer evolutionary relationships among divergent proteins.

2.
Short motifs are known to play diverse roles in proteins, such as in mediating the interactions with other molecules, binding to membranes, or conducting a specific biological function. Standard approaches currently employed to detect short motifs in proteins search for enrichment of amino acid motifs considering mostly the sequence information. Here, we present a new approach to search for common motifs (protein signatures) which share both physicochemical and structural properties, looking simultaneously at different features. Our method takes as an input an amino acid sequence and translates it to a new alphabet that reflects its intrinsic structural and chemical properties. Using the MEME search algorithm, we identified the protein signatures within subsets of proteins which encompass common sequence and structural information. We demonstrated that we can detect enriched structural motifs, such as the amphipathic helix, from large datasets of linear sequences, as well as predicting common structural properties (such as disorder, surface accessibility, or secondary structures) of known functional motifs. Finally, we applied the method to the yeast protein interactome and identified novel putative interacting motifs. We propose that our approach can be applied for de novo protein function prediction given either sequence or structural information. Proteins 2013; © 2012 Wiley Periodicals, Inc.

3.

Background  

Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets.
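The scoring idea in this abstract (binomial statistics used to pick a similarity cutoff) can be illustrated with a minimal sketch. This is not SIMBAL's or PPP's actual code: the function names, the 0/1 labelling of hits, and the background fraction `p` are all assumptions made for illustration. The sketch scans every prefix of a similarity-ranked hit list and keeps the depth at which the number of hits from the "positive" partition is least likely under a binomial null.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_cutoff(ranked_labels, p):
    """Scan prefixes of a ranked hit list (best hit first).

    ranked_labels: 1 if the hit comes from the positive partition, else 0
                   (hypothetical encoding chosen for this sketch).
    p: assumed background fraction of positives in the training set.
    Returns (depth, positives_in_prefix, tail_probability) for the prefix
    with the smallest binomial tail probability.
    """
    best = None
    positives = 0
    for depth, label in enumerate(ranked_labels, start=1):
        positives += label
        prob = binomial_tail(positives, depth, p)
        if best is None or prob < best[2]:
            best = (depth, positives, prob)
    return best


# Toy ranked list: positives cluster at the top, so a shallow cutoff wins.
ranked = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
depth, positives, prob = best_cutoff(ranked, p=0.3)
print(depth, positives, prob)
```

Applying the same scan to hit lists generated from short windows of a query sequence, rather than the whole protein, is one way to read "locally within a protein sequence"; that windowing step is not shown here.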

4.
Nucleic acid sequences from genome sequencing projects are submitted as raw data, from which biologists attempt to elucidate the function of the predicted gene products. The protein sequences are stored in public databases, such as the UniProt Knowledgebase (UniProtKB), where curators try to add predicted and experimental functional information. Protein function prediction can be done using sequence similarity searches, but an alternative approach is to use protein signatures, which classify proteins into families and domains. The major protein signature databases are available through the integrated InterPro database, which provides a classification of UniProtKB sequences. As well as characterization of proteins through protein families, many researchers are interested in analyzing the complete set of proteins from a genome (i.e. the proteome), and there are databases and resources that provide non-redundant proteome sets and analyses of proteins from organisms with completely sequenced genomes. This article reviews the tools and resources available on the web for single and large-scale protein characterization and whole proteome analysis.

5.
Karlin D, Belshaw R. PLoS One. 2012;7(3):e31719
Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11-16aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

6.
Fröhlich H. PLoS One. 2011;6(10):e25364
Diagnostic and prognostic biomarkers for cancer based on gene expression profiles are viewed as a major step towards a better personalized medicine. Many studies using various computational approaches have been published in this direction during the last decade. However, when comparing different gene signatures for related clinical questions often only a small overlap is observed. This can have various reasons, such as technical differences of platforms, differences in biological samples or their treatment in lab, or statistical reasons because of the high dimensionality of the data combined with small sample size, leading to unstable selection of genes. In conclusion retrieved gene signatures are often hard to interpret from a biological point of view. We here demonstrate that it is possible to construct a consensus signature from a set of seemingly different gene signatures by mapping them on a protein interaction network. Common upstream proteins of close gene products, which we identified via our developed algorithm, show a very clear and significant functional interpretation in terms of overrepresented KEGG pathways, disease associated genes and known drug targets. Moreover, we show that such a consensus signature can serve as prior knowledge for predictive biomarker discovery in breast cancer. Evaluation on different datasets shows that signatures derived from the consensus signature reveal a much higher stability than signatures learned from all probesets on a microarray, while at the same time being at least as predictive. Furthermore, they are clearly interpretable in terms of enriched pathways, disease associated genes and known drug targets. In summary we thus believe that network based consensus signatures are not only a way to relate seemingly different gene signatures to each other in a functional manner, but also to establish prior knowledge for highly stable and interpretable predictive biomarkers.

7.
Here we have identified HIV-1 B clade Envelope (Env) amino acid signatures from early in infection that may be favored at transmission, as well as patterns of recurrent mutation in chronic infection that may reflect common pathways of immune evasion. To accomplish this, we compared thousands of sequences derived by single genome amplification from several hundred individuals that were sampled either early in infection or were chronically infected. Samples were divided at the outset into hypothesis-forming and validation sets, and we used phylogenetically corrected statistical strategies to identify signatures, systematically scanning all of Env. Signatures included single amino acids, glycosylation motifs, and multi-site patterns based on functional or structural groupings of amino acids. We identified signatures near the CCR5 co-receptor-binding region, near the CD4 binding site, and in the signal peptide and cytoplasmic domain, which may influence Env expression and processing. Two signature patterns associated with transmission were particularly interesting. The first was the most statistically robust signature, located in position 12 in the signal peptide. The second was the loss of an N-linked glycosylation site at positions 413-415; the presence of this site has been recently found to be associated with escape from potent and broad neutralizing antibodies, consistent with enabling a common pathway for immune escape during chronic infection. Its recurrent loss in early infection suggests it may impact fitness at the time of transmission or during early viral expansion. The signature patterns we identified implicate Env expression levels in selection at viral transmission or in early expansion, and suggest that immune evasion patterns that recur in many individuals during chronic infection when antibodies are present can be selected against when the infection is being established prior to the adaptive immune response.

8.
Artificial selection has greatly improved beef production performance and changed its genetic basis. High-density SNP markers provide a way to track these changes and use selective signatures to search for the genes associated with artificial selection. In this study, we performed extended haplotype homozygosity (EHH) tests based on Illumina BovineSNP50 (54 K) Chip data from 942 Simmental cattle to identify significant core regions containing selective signatures, then verified the biological significance of these identified regions based on some commonly used bioinformatics analyses. A total of 224 regions over the whole genome in Simmental cattle showing the highest significance and containing some important functional genes, such as GHSR, TG and CACNA2D1, were chosen. We also observed some significant terms in the enrichment analyses of second-level GO terms and KEGG pathways, indicating that these genes are associated with economically relevant cattle traits. This is the first detection of selection signatures in Simmental cattle. Our findings significantly expand the selection signature map of the cattle genome, and identify functional candidate genes under positive selection for future genetic research.
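For readers unfamiliar with the EHH statistic named in this abstract, a minimal sketch follows. It uses the standard definition (the probability that two randomly chosen chromosomes carrying the core allele are identical across the extended region), but it is a simplification and not the study's pipeline: the core is a single marker rather than a multi-SNP core region, haplotypes are plain strings of alleles, and extension runs in one direction only. All names are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, extent):
    """Extended haplotype homozygosity for one core allele.

    haplotypes: list of equal-length allele strings, one per chromosome.
    core_index: position of the core marker (single SNP in this sketch).
    core_allele: allele that defines the carrier chromosomes.
    extent: number of markers to extend to the right of the core.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # homozygosity is undefined for fewer than two carriers
    # Group carriers by their haplotype from the core out to `extent`.
    groups = Counter(h[core_index : core_index + extent + 1] for h in carriers)
    identical_pairs = sum(comb(count, 2) for count in groups.values())
    return identical_pairs / comb(n, 2)


# Toy data: four chromosomes carry allele "1" at the core (index 0).
haps = ["1000", "1000", "1001", "1010", "0111"]
for extent in range(4):
    print(extent, ehh(haps, core_index=0, core_allele="1", extent=extent))
```

EHH starts at 1 at the core and decays with distance as recombination and mutation break up the haplotype; an allele whose EHH stays unusually high for its frequency is the kind of signal such tests look for.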

9.
Multidimensional omic datasets often have correlated features leading to the possibility of discovering multiple biological signatures with similar predictive performance for a phenotype. However, their exploration is limited by low sample size and the exponential nature of the combinatorial search leading to high computational cost. To address these issues, we have developed an algorithm muSignAl (multiple signature algorithm) which selects multiple signatures with similar predictive performance while systematically bypassing the requirement of exploring all the combinations of features. We demonstrated the workflow of this algorithm with an example proteomics dataset. muSignAl is applicable in various bioinformatics-driven explorations, such as understanding the relationship between multiple biological feature sets and phenotypes, and discovery and development of biomarker panels while providing the opportunity of optimising their development cost with the help of equally good multiple signatures. Source code of muSignAl is freely available at https://github.com/ShuklaLab/muSignAl.

10.
Mooney SD, Liang MH, DeConde R, Altman RB. Proteins. 2005;61(4):741-747
A primary challenge for structural genomics is the automated functional characterization of protein structures. We have developed a sequence-independent method called S-BLEST (Structure-Based Local Environment Search Tool) for the annotation of previously uncharacterized protein structures. S-BLEST encodes the local environment of an amino acid as a vector of structural property values. It has been applied to all amino acids in a nonredundant database of protein structures to generate a searchable structural resource. Given a query amino acid from an experimentally determined or modeled structure, S-BLEST quickly identifies similar amino acid environments using a K-nearest neighbor search. In addition, the method gives an estimation of the statistical significance of each result. We validated S-BLEST on X-ray crystal structures from the ASTRAL 40 nonredundant dataset. We then applied it to 86 crystallographically determined proteins in the protein data bank (PDB) with unknown function and with no significant sequence neighbors in the PDB. S-BLEST was able to associate 20 proteins with at least one local structural neighbor and identify the amino acid environments that are most similar between those neighbors.
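The core lookup described here (a residue's local environment encoded as a vector, then a K-nearest neighbor search) can be sketched in a few lines. This is only an illustration under stated assumptions: the three-component vectors, the residue labels, and the use of Euclidean distance are invented for the example and are not taken from S-BLEST, which also reports a statistical significance estimate that is not modelled here.

```python
import heapq
from math import dist


def nearest_environments(query, database, k=3):
    """Return the k database entries closest to `query`.

    query: tuple of structural property values for one residue.
    database: dict mapping a residue label to its property vector
              (labels and vectors are hypothetical in this sketch).
    Returns a list of (distance, label) pairs, nearest first.
    """
    scored = ((dist(query, vec), label) for label, vec in database.items())
    return heapq.nsmallest(k, scored)


# Hypothetical environment vectors, e.g. (burial, polarity, local density).
environments = {
    "protA:HIS57": (0.9, 0.1, 0.30),
    "protB:HIS41": (0.8, 0.2, 0.30),
    "protC:LEU10": (0.1, 0.9, 0.70),
}
for distance, label in nearest_environments((0.9, 0.1, 0.35), environments, k=2):
    print(label, round(distance, 3))
```

A linear scan like this is fine for a toy database; a resource covering every residue in a nonredundant structure set would more plausibly use a spatial index, which is beyond this sketch.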

11.
The superfamily of ferritin-like proteins has recently expanded to include a phylogenetically distinct class of proteins termed DPS-like (DPSL) proteins. Despite their distinct genetic signatures, members of this subclass share considerable similarity to previously recognized DPS proteins. Like DPS, these proteins are expressed in response to oxidative stress, form dodecameric cage-like particles, preferentially utilize H₂O₂ in the controlled oxidation of Fe²⁺, and possess a short N-terminal extension implicated in stabilizing cellular DNA. Given these extensive similarities, the functional properties responsible for the preservation of the DPSL signature in the genomes of diverse prokaryotes have been unclear. Here, we describe the crystal structure of a DPSL protein from the thermoacidophilic archaeon Sulfolobus solfataricus. Although the overall fold of the polypeptide chain and the oligomeric state of this protein are indistinguishable from those of authentic DPS proteins, several important differences are observed. First, rather than a ferroxidase site at the subunit interface, as is observed in all other DPS proteins, the ferroxidase site in SsDPSL is buried within the four-helix bundle, similar to bacterioferritin. Second, the structure reveals a channel leading from the exterior surface of SsDPSL to the bacterioferritin-like dimetal binding site, possibly allowing divalent cations and/or H₂O₂ to access the active site. Third, a pair of cysteine residues unique to DPSL proteins is found adjacent to the dimetal binding site juxtaposed between the exterior surface of the protein and the active site channel. The cysteine residues in this thioferritin motif may play a redox active role, possibly serving to recycle iron at the ferroxidase center.

12.
Gulls are the primary hosts of H13 and H16 avian influenza viruses (AIVs). The molecular basis for this host restriction is only partially understood. In this study, amino acid sequences from Eurasian gull H13 and H16 AIVs and Eurasian AIVs (non-H13, non-H16) were compared to determine if specific signatures are present only in the internal proteins of H13 and H16 AIVs, using a bioinformatics approach. Amino acids identified in an initial analysis performed on 15 selected sequences were checked against a comprehensive set of AIV sequences retrieved from GenBank to verify them as H13 and H16 specific signatures. Analysis of protein similarities and prediction of subcellular localization signals were performed to search for possible functions associated with the confirmed signatures. H13 and H16 AIV specific signatures were found in all the internal proteins examined, but most were found in the non-structural protein 1 (NS1) and in the nucleoprotein. A putative functional signature was predicted to be present in the nuclear export protein. Moreover, it was predicted that the NS1 of H13 and H16 AIVs lacks one of the nuclear localization signals present in NS1 of other AIV subtypes. These findings suggest that the signatures found in the internal proteins of H13 and H16 viruses are possibly related to host restriction.

13.
Horizontal DNA transfer is an important factor of evolution and participates in biological diversity. Unfortunately, the location and length of horizontal transfers (HTs) are known for very few species. The usage of short oligonucleotides in a sequence (the so-called genomic signature) has been shown to be species-specific even in DNA fragments as short as 1 kb. The genomic signature is therefore proposed as a tool to detect HTs. Since DNA transfers originate from species with a signature different from those of the recipient species, the analysis of local variations of signature along the recipient genome may allow for detecting exogenous DNA. The strategy consists of (i) scanning the genome with a sliding window and calculating the corresponding local signature, (ii) evaluating its deviation from the signature of the whole genome, and (iii) looking for similar signatures in a database of genomic signatures. A total of 22 prokaryote genomes are analyzed in this way. It has been observed that atypical regions make up ~6% of each genome on average. Most of the claimed HTs as well as new ones are detected. The origin of putative DNA transfers is looked for among ~12,000 species. Donor species are proposed and sometimes strongly suggested, considering similarity of signatures. Among the species studied, Bacillus subtilis, Haemophilus influenzae and Escherichia coli are investigated by many authors and give the opportunity to perform a thorough comparison of most of the bioinformatics methods used to detect HTs.
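Steps (i) and (ii) of the strategy in this abstract can be sketched as follows. The abstract does not state the oligonucleotide length, window size, or deviation measure, so the choices below (dinucleotides, a non-overlapping 40-base window for the toy input, and the sum of absolute frequency differences) are assumptions made purely for illustration. Step (iii), the lookup against a database of genomic signatures, is not shown.

```python
from collections import Counter
from itertools import product


def signature(seq, k=2):
    """Relative frequency of every k-mer over A, C, G, T (overlapping counts)."""
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # k-mers with other symbols are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


def window_deviations(genome, window, step, k=2):
    """Slide a window along the genome and score each window's deviation
    from the whole-genome signature (L1 distance, an assumed measure)."""
    reference = signature(genome, k)
    results = []
    for start in range(0, len(genome) - window + 1, step):
        local = signature(genome[start : start + window], k)
        deviation = sum(abs(a - b) for a, b in zip(local, reference))
        results.append((start, deviation))
    return results


# Toy genome: a compositionally atypical insert at positions 200-279.
toy_genome = "ACGT" * 50 + "GGGGCCCC" * 10 + "ACGT" * 50
scores = window_deviations(toy_genome, window=40, step=40)
most_atypical = max(scores, key=lambda item: item[1])
print(most_atypical)
```

On real genomes the window would be on the order of kilobases, in line with the abstract's remark that signatures remain species-specific in fragments as short as 1 kb, and windows whose deviation exceeds some threshold would be the candidates passed to step (iii).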

14.
A 9000-Mr protein isolated from a 60% ethanolic extract of soybean (Glycine max) seeds has been characterized and fully sequenced. The protein consists of 80 amino acid residues with four disulfide bonds. It contains a large number of hydrophobic residues and lacks methionine, phenylalanine, tryptophan, lysine and histidine residues. The protein readily crystallizes from water but is quite soluble in aqueous organic solvents like 95% 1-propanol. It aggregates to form large molecules (above 80 kDa) under ordinary denaturing conditions, such as 6 M guanidine·HCl and 8 M urea. Sequence analysis showed that the amino-terminal four-fifths is extremely hydrophobic and most of the acidic residues exist as their amide forms, and only the carboxyl-terminal short segment is rather hydrophilic. A computer search for homology detected an unexpected similarity of this protein to rat prolactin; however, its significance could not be assessed and this protein appears to represent a hitherto unknown protein family. Although no biochemical activity could be detected, the existence in relatively high abundance (approx. 200 mg from 1 kg seeds) of this novel protein may suggest its physiological significance in the plant.

15.

Background  

Molecular signatures are sets of genes, proteins, genetic variants or other variables that can be used as markers for a particular phenotype. Reliable signature discovery methods could yield valuable insight into cell biology and mechanisms of human disease. However, it is currently not clear how to control error rates such as the false discovery rate (FDR) in signature discovery. Moreover, signatures for cancer gene expression have been shown to be unstable, that is, difficult to replicate in independent studies, casting doubts on their reliability.

16.
Identification of functional open reading frames in chloroplast genomes   (total citations: 7; self-citations: 0; citations by others: 7)
Wolfe KH, Sharp PM. Gene. 1988;66(2):215-222
We have used a rapid computer dot-matrix comparison method to identify all DNA regions which have been evolutionarily conserved between the completely sequenced chloroplast genomes of tobacco and a liverwort. Analysis of these regions reveals 74 homologous open reading frames (ORFs) which have been conserved as to length and amino acid sequence; these ORFs also have an excess of nucleotide substitutions at silent sites of codons. Since the nonfunctional parts of these genomes have become saturated with mutations and show no sequence similarity whatsoever, the homologous ORFs are almost certainly functional. A further four pairs of ORFs show homology limited to only a short part of their putative gene products. Amino acid sequence identities range between 50 and 99%; some chloroplast proteins are seen to be among the most slowly evolving of all known proteins. A search of the nucleotide and amino acid sequence databanks has revealed several previously unidentified genes in chloroplast sequences from other species, but no new homologies to prokaryotic genes.
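A dot-matrix comparison of the kind mentioned in this abstract can be sketched briefly. The abstract gives no details of the authors' program, so this is a generic illustration with assumed parameters: a dot is placed wherever two fixed-length words match exactly, and conserved regions show up as runs of dots along a diagonal. The function names and the word length are hypothetical.

```python
def dot_matrix(seq_a, seq_b, word=3):
    """Return the set of (i, j) positions where the `word`-length
    substrings starting at seq_a[i] and seq_b[j] are identical."""
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j : j + word], []).append(j)
    return {
        (i, j)
        for i in range(len(seq_a) - word + 1)
        for j in index.get(seq_a[i : i + word], [])
    }


def longest_diagonal(dots):
    """Length (in dots) of the longest unbroken diagonal run."""
    best = 0
    for i, j in dots:
        if (i - 1, j - 1) in dots:
            continue  # not the start of a run
        length = 1
        while (i + length, j + length) in dots:
            length += 1
        best = max(best, length)
    return best


# Toy sequences sharing the segment "GACCGT".
dots = dot_matrix("TTGACCGTAA", "CCGACCGTGG", word=3)
print(sorted(dots), longest_diagonal(dots))
```

Indexing the words of one sequence first keeps the comparison close to linear in sequence length for distinct words, which is one common way to make such a scan fast on genome-sized inputs.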

17.
Aggregate signatures allow anyone to combine different signatures signed by different signers on different messages into a short signature. An ideal aggregate signature scheme is an identity-based aggregate signature (IBAS) scheme that supports full aggregation since it can reduce the total transmitted data by using an identity string as a public key and anyone can freely aggregate different signatures. Constructing a secure IBAS scheme that supports full aggregation in bilinear maps is an important open problem. Recently, Yuan et al. proposed such a scheme and claimed its security in the random oracle model under the computational Diffie-Hellman assumption. In this paper, we show that there is an efficient forgery on their IBAS scheme and that their security proof has a serious flaw.

18.
Mucosal transmission of the human immunodeficiency virus (HIV) results in a bottleneck in viral genetic diversity. Gnanakaran and colleagues used a computational strategy to identify signature amino acids at particular positions in Envelope that were associated either with transmitted sequences sampled very early in infection, or sequences sampled during chronic infection. Among the strongest signatures observed was an enrichment for the stable presence of histidine at position 12 at transmission and in early infection, and a recurrent loss of histidine at position 12 in chronic infection. This amino acid lies within the leader peptide of Envelope, a region of the protein that has been shown to influence envelope glycoprotein expression and virion infectivity. We show a strong association between a positively charged amino acid like histidine at position 12 in transmitted/founder viruses with more efficient trafficking of the nascent envelope polypeptide to the endoplasmic reticulum and higher steady-state glycoprotein expression compared to viruses that have a non-basic position 12 residue, a substitution that was enriched among viruses sampled from chronically infected individuals. When expressed in the context of other viral proteins, transmitted envelopes with a basic amino acid position 12 were incorporated at higher density into the virus and exhibited higher infectious titers than did non-signature envelopes. These results support the potential utility of using a computational approach to examine large viral sequence data sets for functional signatures and indicate the importance of Envelope expression levels for efficient HIV transmission.

19.
20.

Background

Highly parallel analysis of gene expression has recently been used to identify gene sets or ‘signatures’ to improve patient diagnosis and risk stratification. Once a signature is generated, traditional statistical testing is used to evaluate its prognostic performance. However, due to the dimensionality of microarrays, this can lead to false interpretation of these signatures.

Principal Findings

A method was developed to test batches of a user-specified number of randomly chosen signatures in patient microarray datasets. The percentage of randomly generated signatures yielding prognostic value was assessed using ROC analysis by calculating the area under the curve (AUC) in six publicly available cancer patient microarray datasets. We found that a signature consisting of randomly selected genes has an average 10% chance of reaching significance when assessed in a single dataset, but can range from 1% to ∼40% depending on the dataset in question. Increasing the number of validation datasets markedly reduces this number.

Conclusions

We have shown that the use of an arbitrary cut-off value for evaluation of signature significance is not suitable for this type of research, but should be defined for each dataset separately. Our method can be used to establish and evaluate signature performance of any derived gene signature in a dataset by comparing its performance to thousands of randomly generated signatures. It will be of most interest for cases where few data are available and testing in multiple datasets is limited.
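The comparison against randomly generated signatures described in this abstract can be sketched as a small permutation-style test. The details below are assumptions rather than the authors' implementation: a signature is scored per patient as the mean expression of its genes, the AUC is computed by pairwise comparison, and the empirical p-value uses a +1 correction so it is never exactly zero. All function names and the toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def signature_scores(expr, genes):
    """Score each patient as the mean expression over the signature genes."""
    return [sum(row[g] for g in genes) / len(genes) for row in expr]


def empirical_p(expr, labels, candidate, n_random=1000, seed=0):
    """Compare a candidate signature with random signatures of equal size.

    expr: list of per-patient expression rows (one value per gene).
    candidate: list of gene column indices forming the signature.
    Returns (candidate AUC, fraction of random signatures doing as well).
    """
    rng = random.Random(seed)
    n_genes = len(expr[0])
    observed = auc(signature_scores(expr, candidate), labels)
    as_good = 0
    for _ in range(n_random):
        rand_genes = rng.sample(range(n_genes), len(candidate))
        if auc(signature_scores(expr, rand_genes), labels) >= observed:
            as_good += 1
    return observed, (as_good + 1) / (n_random + 1)


# Toy dataset: 6 patients, 5 genes; only gene 0 tracks the outcome label.
labels = [1, 1, 1, 0, 0, 0]
expr = [[float(y), 0.3, 0.3, 0.3, 0.3] for y in labels]
print(empirical_p(expr, labels, candidate=[0], n_random=200))
```

Because the null distribution is rebuilt from each dataset's own genes and patients, the resulting threshold is dataset-specific, which matches the abstract's conclusion that a single arbitrary cut-off is unsuitable.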
