Similar Articles
20 similar articles found
1.

Background  

A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern arises because one or more key functional and/or structural residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues that are structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed by a standard pattern construction procedure.
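PROSITE patterns are essentially restricted regular expressions, so their specificity and sensitivity depend entirely on which residues the pattern spells out. The Python sketch below is a simplified converter written for illustration, not the ScanProsite implementation: it translates basic PROSITE syntax into a Python regex and scans an invented toy sequence with the C2H2 zinc-finger pattern PS00028 as it is commonly quoted. A `>` inside brackets (e.g. `[G>]`) is deliberately not supported.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into an equivalent Python regular expression.

    Covers x, [..] (allowed), {..} (forbidden), (n)/(n,m) repeats and the
    < / > terminal anchors. A '>' inside brackets, e.g. [G>], is not handled.
    """
    elements = pattern.strip().rstrip(".").split("-")
    parts = []
    for element in elements:
        prefix = suffix = ""
        if element.startswith("<"):           # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):             # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # single residue or [..] class
        if lo is not None:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)

# C2H2 zinc-finger pattern (PROSITE PS00028) scanned against a toy sequence.
c2h2 = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
hit = re.search(c2h2, "MSCAACKKKFSRSDELTRHQRTHGG")
print(c2h2, hit.span() if hit else None)
```

Every position in the converted regex is a hard constraint: a true family member lacking one listed residue is missed, and any unrelated sequence that happens to satisfy all of them is reported.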

2.

Background  

Leucine-rich repeats are one of the more common modules found in proteins. The leucine-rich repeat consensus motif is LxxLxLxxNxLxxLxxLxxLxx, where the first 11–12 residues are highly conserved and the remainder of the repeat can vary in size. Leucine-rich repeat proteins have been subdivided into seven subfamilies, none of which include members of the epidermal growth factor receptor or insulin receptor families, despite the similarity between the 3D structure of the L domains of the type I insulin-like growth factor receptor and that of some leucine-rich repeat proteins.
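As a rough illustration of how the consensus can serve as a search pattern, the Python sketch below scans a sequence for a strict reading of the conserved N-terminal segment, LxxLxLxxNxL (x = any residue). Requiring L at every conserved position and the 22-residue test repeat are assumptions made for illustration; real LRRs commonly tolerate other hydrophobic residues at the leucine positions, so a practical scan would use looser character classes.

```python
import re

# Strict reading of the conserved N-terminal segment of the LRR consensus,
# LxxLxLxxNxL (11 residues, x = any residue). The lookahead lets matches overlap.
LRR_CORE = re.compile(r"(?=(L..L.L..N.L))")

def find_lrr_cores(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every strict LxxLxLxxNxL occurrence."""
    return [(m.start(), m.group(1)) for m in LRR_CORE.finditer(seq)]

# Hypothetical 22-residue repeat following LxxLxLxxNxLxxLxxLxxLxx exactly.
repeat = "LKELDLSGNKLTSLPELFALGS"
protein = "MA" + repeat * 2
print(find_lrr_cores(protein))
```

Only the conserved 11-residue core is matched; the variable remainder of each repeat is left unconstrained, in line with the description above.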

3.

Background  

Inferences about protein function are often made based on sequence homology to other gene products of known activities. This approach is valuable for small families of conserved proteins but can be difficult to apply to large superfamilies of proteins with diverse functions. In this study we examined sequence homology among members of the DJ-1/ThiJ/PfpI superfamily, which includes DJ-1, a human protein of unclear function associated with inherited Parkinson's disease.

4.

Background  

The kelch motif is an ancient and evolutionarily widespread sequence motif 44–56 amino acids in length. It occurs as five to seven repeats that form a β-propeller tertiary structure. More than 28 kelch-repeat proteins have been sequenced and functionally characterised from diverse organisms ranging from viruses, plants and fungi to mammals, and expressed sequence tag, domain and genome databases make it evident that many additional hypothetical proteins contain kelch repeats. In general, kelch-repeat β-propellers are involved in protein-protein interactions; however, the modest sequence identity between kelch motifs, the diversity of domain architectures, and the partial information on this protein family in any single species all make it difficult to develop a coherent view of the kelch-repeat domain and the kelch-repeat protein superfamily. To understand the complexity of this superfamily, we have used bioinformatics to analyse the complement of kelch-repeat proteins encoded in the human genome and have compared it with the kelch-repeat proteins encoded in other sequenced genomes.

5.

Background  

The functional selection and three-dimensional structural constraints acting on proteins in nature are often reflected in the retention of significant sequence similarity between proteins of similar fold and function, despite poor sequence identity. Organizing structure-based sequence alignments of distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information returned from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database provides continuously updated structural alignments for evolutionarily related but sequentially distant proteins.

6.

Background  

Protein phosphatase 1 (PP1) is involved in diverse cellular processes and is targeted to substrates through interaction with many different protein binding partners. Based on sequence analysis, PP1 catalytic subunits (PP1c) fall into PP1α and PP1β subfamilies; however, very few PP1c-binding proteins have been shown to discriminate between PP1α and PP1β.

7.

Background  

Detecting homology between remotely related protein families is an important problem in computational biology, since the biological properties of uncharacterized proteins can often be inferred from those of homologous proteins. Many existing approaches address this problem by measuring the similarity between proteins through sequence or structural alignment. However, these methods do not exploit collective aspects of the protein space; their scores are often noisy and frequently fail to recognize distantly related protein families.

8.

Background

Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, but current approaches for predicting function also need to be improved. One particular problem is overly specific function prediction, which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity.

Methodology

Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity.

Significance

Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value <1e−62, non-redundant NCBI protein database (NRDB)) and only a small likelihood of a specific function match for proteins with low sequence similarity (bit score <54.6, e-value >1e−05, NRDB). For intermediate sequence similarity, our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function and predicted functions, ranging from general to very specific, for thousands of these proteins. We also applied the model to a data set of proteins whose specific functions had previously been assigned electronically. We show that, on average, these prior function predictions are more specific (quite possibly overly specific) than the predictions of our model, which is based on proteins with experimentally determined function.
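In BLAST, bit scores and e-values are linked by the standard relation E = m·n·2^(−S′), where S′ is the bit score and m·n is the effective search space (query length times database length). The Python sketch below uses that relation to check the two threshold pairs quoted above; treating both pairs as coming from searches of similar size is a simplifying assumption. Because E falls as the bit score rises, the high-similarity range corresponds to e-values below 1e−62 and the low-similarity range to e-values above 1e−05.

```python
import math

def evalue_from_bits(bits: float, search_space: float) -> float:
    """E = m*n * 2**(-S'), with S' the bit score and m*n the effective search space."""
    return search_space * 2.0 ** (-bits)

def implied_search_space(bits: float, evalue: float) -> float:
    """Invert the same relation: m*n = E * 2**S'."""
    return evalue * 2.0 ** bits

# (bit score, e-value) threshold pairs quoted in the text, both for NRDB searches.
thresholds = {"high": (244.7, 1e-62), "low": (54.6, 1e-5)}
for label, (bits, e) in thresholds.items():
    log_space = math.log10(implied_search_space(bits, e))
    print(f"{label}: bits={bits}, E={e:g}, implied m*n ~ 10^{log_space:.2f}")
```

Both pairs imply a search space of roughly 10^11.5 (query length times database length), so they describe the same scale of NRDB search and are mutually consistent.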

9.

Background  

Concerted evolution occurs in multigene families and is characterized by stretches of homogeneity and higher sequence similarity between paralogues than between orthologues. Here we identify human gene pairs that have undergone concerted evolution, caused by ongoing gene conversion, since at least the human-mouse divergence. Our strategy involved the identification of duplicated genes with greater similarity within a species than between species. These genes were required to be present in multiple mammalian genomes, suggesting duplication early in mammalian divergence. To eliminate genes that have been conserved due to strong purifying selection, our analysis also required at least one intron to have retained high sequence similarity between paralogues.
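The core criterion above, paralogues within a species being more similar to each other than to their orthologues in another species, can be sketched in Python as follows. This is a simplification for illustration only: the sequences are invented and assumed to be pre-aligned, identity is a plain fraction of matching ungapped sites, and the additional filters described in the text (presence in several mammalian genomes, high intronic similarity) are not modelled.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues over positions where neither aligned sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def within_exceeds_between(h1: str, h2: str, m1: str, m2: str) -> bool:
    """True if human paralogues h1 and h2 are more similar to each other
    than either is to its mouse orthologue (m1 for h1, m2 for h2)."""
    within = identity(h1, h2)
    between = max(identity(h1, m1), identity(h2, m2))
    return within > between

# Toy pre-aligned sequences (hypothetical, for illustration only).
h1, h2 = "ACGTACGTAC", "ACGTACGTAA"
m1, m2 = "ACGAACTTAC", "ACCTACGAAA"
print(within_exceeds_between(h1, h2, m1, m2))
```

Without gene conversion, each paralogue would usually stay closer to its own orthologue, and the function would return False.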

10.

Background  

Sequence-related families of genes and proteins are common in bacterial genomes; in Escherichia coli they constitute over half of the genome. The presence of protein families and superfamilies suggests a history of gene duplication and divergence during evolution. The size and functional composition of genome-encoded protein families reflect the metabolic potential of the organisms in which they are found. Comparing the protein families of different organisms gives insight into their functional differences and similarities.

11.

Background  

Geminiviruses (family Geminiviridae) are small single-stranded (ss) DNA viruses infecting plants. Their virion morphology is unique in the known viral world – two incomplete T = 1 icosahedra are joined together to form twinned particles. Geminiviruses utilize a rolling-circle mode to replicate their genomes. A limited sequence similarity between the three conserved motifs of the rolling-circle replication initiation proteins (RCR Reps) of geminiviruses and plasmids of Gram-positive bacteria allowed Koonin and Ilyina to propose that geminiviruses descend from bacterial replicons.

12.

Background  

New techniques for determining relationships between biomolecules of all types – genes, proteins, noncoding DNA, metabolites and small molecules – are now making a substantial contribution to the widely discussed explosion of facts about the cell. The data generated by these techniques promote a picture of the cell as an interconnected information network, with molecular components linked with one another in topologies that can encode and represent many features of cellular function. This networked view of biology brings the potential for systematic understanding of living molecular systems.

13.

Background  

Qualitative pathogen resistance in both dicotyledonous and monocotyledonous plants has been attributed to the action of resistance (R) genes, including those encoding nucleotide-binding site–leucine-rich repeat (NBS-LRR) proteins and receptor-like kinase enzymes. This study describes the large-scale isolation and characterisation of candidate R genes from perennial ryegrass. The analysis was based on the availability of an expressed sequence tag (EST) resource and a functionally integrated bioinformatics database.

14.

Background  

Protein structures have conserved features (motifs) that have a significant influence on protein function. These motifs can be found in sequence as well as in 3D space. Understanding these fragments is essential for 3D structure prediction, modelling and drug design. The Protein Data Bank (PDB) is the source of this information; however, current search tools offer only limited 3D options for integrating protein sequence with 3D structure.

15.

Background  

We present Pegasys – a flexible, modular and customizable software system that facilitates the execution of heterogeneous biological sequence analysis tools and the integration of the data they produce.

16.

Background  

Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data, which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification.

17.

Background  

Probabilistic models for sequence comparison (such as hidden Markov models and pair hidden Markov models for proteins and mRNAs, or their context-free grammar counterparts for structural RNAs) often assume a fixed degree of divergence. Ideally we would like these models to be conditional on evolutionary divergence time.
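To illustrate what making a model conditional on divergence time means, the Python sketch below uses the Jukes–Cantor (JC69) substitution model, chosen here only as the simplest example and not as the method of the work summarised above. Its substitution probabilities are an explicit function of the divergence d (expected substitutions per site, i.e. rate times time), so the likelihood of an aligned pair can be evaluated at any d and maximised over it. Gap-free DNA alignments and the toy pair are assumptions; a pair HMM would additionally need time-dependent insertion and deletion parameters.

```python
import math

def jc69(d: float) -> list[list[float]]:
    """JC69 substitution matrix P(b | a, d) for divergence d in expected substitutions per site."""
    decay = math.exp(-4.0 * d / 3.0)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def pair_log_likelihood(x: str, y: str, d: float) -> float:
    """log P(x, y | d) for a gap-free aligned DNA pair under JC69 with uniform base frequencies."""
    idx = {base: i for i, base in enumerate("ACGT")}
    p = jc69(d)
    return sum(math.log(0.25 * p[idx[a]][idx[b]]) for a, b in zip(x, y))

x, y = "ACGTACGTAC", "ACGTACGAAT"   # toy aligned pair, 2 of 10 sites differ
grid = [i / 1000 for i in range(1, 3001)]
d_ml = max(grid, key=lambda d: pair_log_likelihood(x, y, d))
d_closed = -0.75 * math.log(1 - 4 / 3 * 0.2)   # closed-form JC69 distance for p = 0.2
print(round(d_ml, 3), round(d_closed, 3))
```

The grid maximum agrees with the closed-form JC69 distance. A model with a single fixed substitution matrix would instead score this pair identically however long ago the two sequences diverged.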

18.

Background  

Most non-coding RNA families exert their function by means of a conserved, common secondary structure. The Rfam database contains more than five hundred structurally annotated RNA families. Unfortunately, searching for new family members using covariance models (CMs) is very time-consuming. Filtering approaches that use sequence conservation to reduce the number of CM searches are fast, but it is unknown how much they sacrifice.

19.

Background  

The mechanism by which duplicate genes originate – whether by duplication of a whole genome or of a genomic segment – influences their genetic fates. To study events that trigger duplicate gene persistence after whole genome duplication in vertebrates, we have analyzed molecular evolution and expression of hundreds of persistent duplicate gene pairs in allopolyploid clawed frogs (Xenopus and Silurana). We collected comparative data that allowed us to tease apart the molecular events that occurred soon after duplication from those that occurred later on. We also quantified expression profile divergence of hundreds of paralogs during development and in different tissues.
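The text does not say how expression profile divergence was measured. One common choice, assumed here purely for illustration, is 1 − r, where r is the Pearson correlation between two paralogues' expression levels across tissues or developmental stages. The profiles in this Python sketch are hypothetical.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)

def expression_divergence(x: list[float], y: list[float]) -> float:
    """1 - r: 0 for perfectly co-varying paralogues, up to 2 for opposite profiles."""
    return 1.0 - pearson(x, y)

# Hypothetical expression levels of three genes across four tissues or stages.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]     # same shape, scaled
gene_c = [4.0, 3.0, 2.0, 1.0]     # reversed
print(expression_divergence(gene_a, gene_b), expression_divergence(gene_a, gene_c))
```

Correlation ignores absolute level (gene_b is gene_a doubled yet shows zero divergence), so an analysis concerned with dosage may need a magnitude-based measure alongside it.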

20.

Background  

Repeat-induced point mutation (RIP) is a fungal-specific genome defence mechanism that alters the sequences of repetitive DNA, thereby inactivating coding genes. Repeated DNA sequences align between mating and meiosis and both sequences undergo C:G to T:A transitions. In most fungi these transitions preferentially affect CpA di-nucleotides thus altering the frequency of certain di-nucleotides in the affected sequences. The majority of previously published in silico analyses were limited to the comparison of ratios of pre- and post-RIP di-nucleotides in putatively RIP-affected sequences – so-called RIP indices. The analysis of RIP is significantly more informative when comparing sequence alignments of repeated sequences. There is, however, a dearth of bioinformatics tools available to the fungal research community for alignment-based RIP analysis of repeat families.
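The ratio-based RIP indices mentioned above are usually computed as the product index TpA/ApT, which RIP raises, and the substrate index (CpA + TpG)/(ApC + GpT), which RIP lowers. The Python sketch below computes both for a single sequence. The toy sequences and the "mutate every top-strand CpA" simulation are illustrative assumptions, and because the thresholds used to call a sequence RIP-affected vary between studies, none are hard-coded.

```python
from collections import Counter

def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides on the given strand, e.g. 'CA' for CpA."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def rip_indices(seq: str) -> tuple[float, float]:
    """Return (product index TpA/ApT, substrate index (CpA + TpG)/(ApC + GpT))."""
    c = dinucleotide_counts(seq)
    product = c["TA"] / c["AT"] if c["AT"] else float("inf")
    denom = c["AC"] + c["GT"]
    substrate = (c["CA"] + c["TG"]) / denom if denom else float("inf")
    return product, substrate

before = "CAGTACATG"
after = before.replace("CA", "TA")   # toy RIP: C->T at every top-strand CpA
print(rip_indices(before), rip_indices(after))
```

After the toy mutation the product index rises from 1.0 to 1.5 and the substrate index falls from 1.5 to 1.0. Because each sequence is reduced to two ratios, the indices cannot show which positions were mutated, which is the gap that alignment-based analysis of repeat families fills.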


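Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Science and Technology Development Co., Ltd.)  京ICP备09084417号 (Beijing ICP registration No. 09084417)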