Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Fibrous proteins found in natural materials such as silk fibroins, spider silks, and viral spikes increasingly serve as a source of inspiration for the design of novel, artificial fibrous materials. The fiber protein from the adenovirus has previously served as a model for the design of artificial, self-assembling fibers. The fibrous shaft of this protein consists of 15-amino-acid sequence repeats that fold into a triple β-spiral motif in their native context. Recombinant proteins based on multimers of simplified consensus shaft repeats were previously reported to form self-assembling fibrils from which filaments could be spun. Here, we describe the structural characterization of these fibrils; X-ray fiber diffraction, Raman spectroscopy, and Congo Red binding strongly suggest an amyloid-type structure for these fibrils, with β-strands arranged perpendicular to the fibril axis. This amyloid structure is distinct from the native β-spiral fold, and similar to amyloid structures formed by short, synthetic peptides corresponding to shaft sequences. We discuss implications for the rational design of novel fibrous materials, based on crystal structure information and knowledge of folding and assembly pathways of natural fibrous proteins.

2.
3.
Cytosine methylation at symmetrical CpG and CpNpG sequences plays a key role in the epigenetic control of plant growth and development; yet, the way in which the methylation signal is interpreted into a functional state has not been elucidated. In animals, the methylation signal is recognized by methyl-CpG-binding domain (MBD) proteins that specifically bind methylated CpG dinucleotides. In Arabidopsis thaliana, 12 putative MBD proteins were identified and classified into seven subclasses. Here, we characterized six MBD proteins representing four subclasses (II, III, IV, and VI) of the Arabidopsis MBD family. We found that AtMBD7 (subclass VI), a unique protein containing a double MBD motif, as well as AtMBD5 and AtMBD6 (subclass IV), specifically bind symmetrically methylated CpG sites. The MBD motif derived from AtMBD6, but not from AtMBD2, was sufficient for binding methylated CpG dinucleotides. AtMBD6 precipitated histone deacetylase (HDAC) activity from the leaf nuclear extract. The examined AtMBD proteins neither bound methylated CpNpG sequences nor displayed DNA demethylase activity. Our results suggest that AtMBD5, AtMBD6, and AtMBD7 are likely to function in Arabidopsis plants as mediators of CpG methylation, linking DNA methylation-induced gene silencing with histone deacetylation.

4.
In this paper, we present the protein classification based on structural trees (PCBOST). This is a novel hierarchical classification of proteins that is primarily based on similarity of overall folds of proteins as well as on the modeled folding pathways of proteins. Amino acid sequences, functions of proteins and their evolutionary relationship are not taken into account in this classification. To date the database includes 3847 proteins and domains grouped into six categories having structural similarity and forming six structural trees (total 10,547 PDB-entries). The work on extension of the database and construction of novel structural trees is in progress. The service is free for all users and available at the URL <http://strees.protres.ru/>.

5.
Gromiha MM, Suwa M. Proteins. 2006;63(4):1031-1037
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying OMPs from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, etc. for discriminating OMPs. We found that most of the machine learning techniques discriminate OMPs with similar accuracy. The neural network-based method could discriminate the OMPs from other proteins [globular/transmembrane helical (TMH)] at the fivefold cross-validation accuracy of 91.0% in a dataset of 1,088 proteins. The accuracy of discriminating globular proteins is 88.8% and that of TMH proteins is 93.7%. Further, the neural network method is tested with globular proteins belonging to 30 different folding types and it could successfully exclude 95% of the considered proteins. The proteins with SAM domain such as knottins, rubredoxin, and thioredoxin folds are eliminated with 100% accuracy. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.

6.
Hornet silk, a fibrous protein in the cocoon produced by the larva of the vespa, is composed of four major proteins. In this study, we constructed silk-gland cDNA libraries from larvae of the hornet Vespa simillima xanthoptera Cameron and deduced the full amino acid sequences of the four hornet silk proteins, which were named Vssilk 1-4 in increasing order of molecular size. Portions of the amino acid sequences of the four proteins were confirmed by matrix-assisted laser desorption/ionization-time of flight/mass spectrometry (MALDI-TOF/MS) and N-terminal protein sequencing. The primary sequences of the four Vssilk proteins (1-4) were highly divergent, but the four proteins had some common properties: (i) the amino acid compositions of all four proteins were similar to each other in that the well-defined and characteristic repetitive patterns present in most of the known silk proteins were absent; and (ii) the characteristics of the amino acid sequences of the four proteins were also similar in that Ser-rich structures such as sericin were localized at both ends of the chains and Ala-rich structures such as fibroin were found in the center. These characteristic primary structures might be responsible for the coexisting alpha-helix and beta-sheet conformations that make up the unique secondary structure of hornet silk proteins in the native state. Because heptad repeat sequences of hydrophobic residues are present in the Ala-rich region, we believe that the Ala-rich region of hornet silk predominantly forms a coiled coil with an alpha-helix conformation.

7.
Amyloid fibrils are fibrous beta-structures that derive from abnormal folding and assembly of peptides and proteins. Despite a wealth of structural studies on amyloids, the nature of the amyloid structure remains elusive; possible connections to natural, beta-structured fibrous motifs have been suggested. In this work we focus on understanding amyloid structure and formation from sequences of a natural, beta-structured fibrous protein. We show that short peptides (25 to 6 amino acids) corresponding to repetitive sequences from the adenovirus fiber shaft have an intrinsic capacity to form amyloid fibrils as judged by electron microscopy, Congo Red binding, infrared spectroscopy, and x-ray fiber diffraction. In the presence of the globular C-terminal domain of the protein that acts as a trimerization motif, the shaft sequences adopt a triple-stranded, beta-fibrous motif. We discuss the possible structure and arrangement of these sequences within the amyloid fibril, as compared with the one adopted within the native structure. A 6-amino acid peptide, corresponding to the last beta-strand of the shaft, was found to be sufficient to form amyloid fibrils. Structural analysis of these amyloid fibrils suggests that perpendicular stacking of beta-strand repeat units is an underlying common feature of amyloid formation.

8.
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space have been limited to simple model proteins and have led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular, > 100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed. Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.
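For readers unfamiliar with the TM-score mentioned in item 8, the standard definition (Zhang and Skolnick) is reproduced here as a reference. The symbols are not taken from the abstract: $L_{\text{target}}$ is the length of the target structure, $L_{\text{ali}}$ the number of aligned residue pairs, and $d_i$ the distance between the $i$-th aligned pair after superposition.

```latex
\mathrm{TM\text{-}score} = \max\left[ \frac{1}{L_{\text{target}}} \sum_{i=1}^{L_{\text{ali}}} \frac{1}{1 + \left( d_i / d_0 \right)^2} \right],
\qquad
d_0 = 1.24 \sqrt[3]{L_{\text{target}} - 15} - 1.8
```

The maximum is taken over superpositions. Scores lie in (0, 1], and values above roughly 0.5 are commonly read as indicating the same fold; how the authors set their clustering threshold is not stated in the abstract.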

9.
G-quartets are square planar arrangements of four guanine bases, which can form extraordinarily stable stacks when present in nucleic acid sequences. Such G-quadruplex structures were long regarded as an in vitro phenomenon, but the widespread presence of suitable sequences in genomes and the identification of proteins that stabilize, modify or resolve these nucleic acid structures have provided circumstantial evidence for their physiological relevance. The therapeutic potential of small molecules that can stabilize or disrupt G-quadruplex structures has invigorated the field in recent years. Here we review some of the key observations that support biological functions for G-quadruplex DNA as well as the techniques and tools that have enabled researchers to probe these structures and their interactions with proteins and small molecules.

10.
For a minimalist model of protein folding, which we introduced recently, we investigate various methods to obtain folding sequences. A detailed study of random sequences shows that, for this model, such sequences usually do not fold to their ground states during simulations. Straightforward techniques for the construction of folding sequences, based solely on the target structure, fail. We describe in detail an optimization algorithm, based on genetic algorithms, for the “simulated breeding” of folding sequences in this model. We find that, for any target structure studied, there is not only a single folding sequence but a patch of sequences in sequence space that fold to this structure. In addition, we show that, much as in real proteins, nonhomologous sequences may fold to the same target structure. © 1997 John Wiley & Sons, Inc.

11.
MOTIVATION: Tandem repeats (TRs) are associated with human disease, play a role in evolution and are important in regulatory processes. Despite their importance, locating and characterizing these patterns within anonymous DNA sequences remains a challenge. In part, the difficulty is due to imperfect conservation of patterns and complex pattern structures. We study recognition algorithms for two complex pattern structures: variable length tandem repeats (VLTRs) and multi-period tandem repeats (MPTRs). RESULTS: We extend previous algorithmic research to a class of regular tandem repeats (RegTRs). We formally define RegTRs, as well as two important subclasses: VLTRs and MPTRs. We present algorithms for identification of TRs in these classes. Furthermore, our algorithms identify degenerate VLTRs and MPTRs: repeats containing substitutions, insertions and deletions. To illustrate our work, we present results of our analysis for two difficult regions in cattle and human data which reflect practical occurrences of these subclasses in GenBank sequence data. In addition, we show the applicability of our algorithmic techniques for identifying Alu sequences, gene clusters and other distant regions of similarity. We illustrate this with an example from yeast chromosome I.

12.
The "thread keratins (TK)" alpha and gamma have so far been considered highly specialized intermediate filament (IF) proteins restricted to hagfish. From lamprey, we have now sequenced five novel IF proteins closely related to TKalpha and TKgamma, respectively. Moreover, we have detected corresponding sequences in EST and genomic databases of teleosts and amphibians. The structure of the TKalpha genes and the positions of their deduced amino acid sequences in a phylogenetic tree clearly support their classification as type II keratins. The genes encoding TKgamma show a structure typical for type III IF proteins, whereas their positions in phylogenetic trees favor a close relationship to the type I keratins. Considering that most keratin-like sequences detected in the lancelet also exhibit a gene structure typical for type III IF proteins, it seems likely that the keratin gene(s) originated from an ancient type III IF protein gene. According to EST analyses, the expression of the thread keratins in teleost fish and amphibians may be particularly restricted to larval stages, which, in conjunction with the observed absence of TKalpha and TKgamma genes in any of the available Amniota databases, indicates a thread keratin function closely related to larval development in an aquatic environment.

13.
14.

Background  

Remote homology detection is a hard computational problem. Most approaches have trained computational models by using either full protein sequences or multiple sequence alignments (MSA), including all positions. However, when we deal with proteins in the "twilight zone" we can observe that only some segments of sequences (motifs) are conserved. We introduce a novel logical representation that allows us to represent physico-chemical properties of sequences, conserved amino acid positions and conserved physico-chemical positions in the MSA. From this, Inductive Logic Programming (ILP) finds the most frequent patterns (motifs) and uses them to train propositional models, such as decision trees and support vector machines (SVM).

15.
Background

Woody plants (trees and shrubs) play an important role in terrestrial ecosystems, but their size and longevity make them difficult subjects for traditional experiments. In the last 20 years functional–structural plant models (FSPMs) have evolved: they consider the interplay between plant modular structure, the immediate environment and internal functioning. However, computational constraints and data deficiency have long been limiting factors in a broader application of FSPMs, particularly at the scale of forest communities. Recently, terrestrial laser scanning (TLS) has emerged as an invaluable tool for capturing the 3-D structure of forest communities, thus opening up exciting opportunities to explore and predict forest dynamics with FSPMs.

Scope

The potential synergies between TLS-derived data and FSPMs have yet to be fully explored. Here, we summarize recent developments in FSPM and TLS research, with a specific focus on woody plants. We then evaluate the emerging opportunities for applying FSPMs in an ecological and evolutionary context, in light of TLS-derived data, with particular consideration of the challenges posed by scaling up from individual trees to whole forests. Finally, we propose guidelines for incorporating TLS data into the FSPM workflow to encourage overlap of practice amongst researchers.

Conclusions

We conclude that TLS is a feasible tool to help shift FSPMs from an individual-level modelling technique to a community-level one. The ability to scan multiple trees, of multiple species, in a short amount of time, is paramount to gathering the detailed structural information required for parameterizing FSPMs for forest communities. Conventional techniques, such as repeated manual forest surveys, have their limitations in explaining the driving mechanisms behind observed patterns in 3-D forest structure and dynamics. Therefore, other techniques are valuable to explore how forests might respond to environmental change. A robust synthesis between TLS and FSPMs provides the opportunity to virtually explore the spatial and temporal dynamics of forest communities.

16.
In this article, we present some simple yet effective statistical techniques for analysing and comparing large DNA sequences. These techniques are based on frequency distributions of DNA words in a large sequence, and have been packaged into a software package called SWORDS. Using sequences available in public domain databases hosted on the Internet, we demonstrate how SWORDS can be conveniently used by molecular biologists and geneticists to unmask biologically important features hidden in large sequences and assess their statistical significance.
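The idea of a frequency distribution of DNA words can be illustrated with a minimal Python sketch. This is not the SWORDS implementation, whose internals the abstract does not describe; the function name `word_frequencies` and the choice of overlapping words restricted to A, C, G and T are assumptions made purely for illustration.

```python
from collections import Counter


def word_frequencies(sequence: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping DNA words of length k.

    Hypothetical helper for illustration only; words containing
    characters other than A, C, G, T (for example N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


if __name__ == "__main__":
    # Seven overlapping 2-letter words: AC, CG, GT, TA, AC, CG, GT
    print(word_frequencies("ACGTACGT", 2))
```

Comparing two sequences would then amount to comparing two such distributions with a statistic of one's choice; which statistics SWORDS uses is not stated in the abstract.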

17.
High-resolution structure determination of soluble globular proteins relies heavily on x-ray crystallography techniques. Such an approach is often ineffective for investigations into the structure of fibrous proteins as these proteins generally do not crystallize. Thus investigations into fibrous protein structure have relied on less direct methods such as x-ray fiber diffraction and circular dichroism. Ultraviolet linear dichroism has the potential to provide additional information on the structure of such biomolecular systems. However, existing systems are not optimized for the requirements of fibrous proteins. We have designed and built a low-volume (200 μL), low-wavelength (down to 180 nm), low-pathlength (100 μm), high-alignment flow-alignment system (couette) to perform ultraviolet linear dichroism studies on the fibers formed by a range of biomolecules. The apparatus has been tested using a number of proteins for which longer wavelength linear dichroism spectra had already been measured. The new couette cell has also been used to obtain data on two medically important protein fibers, the all-β-sheet amyloid fibers of the Alzheimer's derived protein Aβ and the long-chain assemblies of α1-antitrypsin polymers.

18.
Homology-derived secondary structure of proteins (HSSP) is a well-known database of multiple sequence alignments (MSAs) which merges information on protein sequences and their three-dimensional structures. It is available for all proteins whose structure is deposited in the PDB. It is also used by STING and (Java)Protein Dossier to calculate and present relative entropy as a measure of the degree of conservation for each residue of proteins whose structure has been solved and deposited in the PDB. However, if STING and (Java)Protein Dossier are to provide support for analysis of protein structures modeled in computers or being experimentally solved but not yet deposited in the PDB, then we need a new method for building alignments having a flavor of HSSP alignments (myMSAr). The present study describes a new method and its corresponding databank (SH2QS--database of sequences homologue to the query [structure-having] sequence). Our main interest in making myMSAr was to measure the degree of residue conservation for a given query sequence, regardless of whether it has a corresponding structure deposited in the PDB. In this study, we compare the measurement of residue conservation provided by corresponding alignments produced by HSSP and SH2QS. As a case study, we also present two biologically relevant examples, the first one highlighting the equivalence of analysis of the degree of residue conservation by using HSSP or SH2QS alignments, and the second one presenting the degree of residue conservation for a structure modeled in a computer, which, as a consequence, does not have an alignment reported by HSSP.
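Relative entropy as a per-residue conservation measure can be sketched as follows. This is a generic illustration, not the STING or SH2QS code: the function name `column_relative_entropy`, the uniform background distribution, and the decision to ignore gap characters are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: every amino acid equally likely (1/20).
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_relative_entropy(column, background=UNIFORM_BACKGROUND):
    """Relative entropy (in bits) of one alignment column vs. a background.

    Gaps ('-') are ignored. Higher values mean the column's residue
    distribution departs further from the background, i.e. the
    position is more conserved.
    """
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(
        (count / n) * math.log2((count / n) / background[residue])
        for residue, count in Counter(residues).items()
    )


if __name__ == "__main__":
    print(column_relative_entropy("AAAA"))       # fully conserved column
    print(column_relative_entropy(AMINO_ACIDS))  # maximally variable column
```

With a uniform background, a fully conserved column scores log2(20) bits and a column containing each amino acid once scores 0; a background estimated from real sequence data would shift these values.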

19.
Structural genomics projects represent major undertakings that will change our understanding of proteins. They generate unique datasets that, for the first time, present a standardized view of proteins in terms of their physical and chemical properties. By analyzing these datasets here, we are able to discover correlations between a protein's characteristics and its progress through each stage of the structural genomics pipeline, from cloning, expression, and purification to, ultimately, structural determination. First, we use tree-based analyses (decision trees and random forest algorithms) to discover the most significant protein features that influence a protein's amenability to high-throughput experimentation. Based on this, we identify potential bottlenecks in various stages of the structural genomics process through specialized "pipeline schematics". We find that the properties of a protein that are most significant are: (i) whether it is conserved across many organisms; (ii) the percentage composition of charged residues; (iii) the occurrence of hydrophobic patches; (iv) the number of binding partners it has; and (v) its length. Conversely, a number of other properties that might have been thought to be important, such as nuclear localization signals, are not significant. Thus, using our tree-based analyses, we are able to identify combinations of features that best differentiate the small group of proteins for which a structure has been determined from all the currently selected targets. This information may prove useful in optimizing high-throughput experimentation. Further information is available from http://mining.nesg.org/.

20.
Abstract

The exact role of high density lipoprotein in atheroprotection is not yet well understood due to its complex nature; it comprises more than ten subclasses that vary in size, composition, and function. Isolation and characterization of these subclasses is an important step for further studies addressing their functions in health and disease. In this work, we present a novel method that is relatively simple and efficient for isolation of high density lipoprotein subclasses. The method depends on fractional filtration of the subclasses through a preformed gel membrane system under the effect of an electric field, where the stepwise isolation of the subclasses depends on differences in their rates of migration in polyacrylamide gel. Using this design, we were able to isolate seven high density lipoprotein subclasses with relative molecular masses of 42,000–50,000; 71,000; 103,000; 124,000; 150,000; 182,000; and 219,000. All the subclasses contained apolipoprotein A-I, phosphatidylcholine, sphingomyelin, free cholesterol, esterified cholesterol, and triacylglycerols. Some fractions of some samples contained the apolipoproteins A-II, C-I, C-II, C-III, and E. A subclass of molecular mass of 106,000 was identified and isolated from a healthy young subject that contained albumin and apoA-I with some free and esterified cholesterol, but with no triacylglycerols. This electrofiltration technique offers a novel tool for isolating pure native high density lipoprotein subclasses in a concentrated form that can be used directly for detailed studies of their physicochemical and physiological properties.
