Similar articles
20 similar articles found
1.
Anan'ko GG. Genetika 2002, 38(4):554-567
The distribution of paralogous domains in relation to the boundaries of functional systems (FSs) is examined. It was found that the frequencies of particular domain types in genes for the hemostasis and complement FSs by far exceeded the frequencies expected on the assumption of their random distribution in the genome, i.e., the domains were not randomly distributed in relation to the FS boundaries. For instance, it was shown that approximately 50% of the total mRNA of genes for the hemostasis and complement FSs encodes 20 domain types repeated on average 2.7 (from 2 to 115) times. Thus, the present structure of the FS associations plays a key role in the formation of new associations in the evolution of the system. Possible causes and mechanisms of the accumulation of paralogous genes and domains in these systems are discussed. The distribution asymmetry may be explained by the systemic character of the organization (system connectivity). Since any structural innovation must be included in the scheme of the present associations, the new protein must contain at least one functional site complementary to sites of the molecules already functioning in the system. The mechanism of preference for "own" domains probably consists in the fixation, via selection, of the shortest of the many possible pathways by which the new functional structure can form. This mechanism must promote the accumulation in the FS of copies of already functioning structures (genes, domains) that can adapt relatively rapidly to perform the new function.

2.
Current methods for identification of domains within protein sequences require either structural information or the identification of homologous domain sequences in different sequence contexts. Knowledge of structural domain boundaries is important for fold recognition experiments and structural determination by X-ray crystallography or nuclear magnetic resonance spectroscopy using the divide-and-conquer approach. Here, a new and conceptually simple method for the identification of structural domain boundaries in multiple protein sequence alignments is presented. Analysis of covariance at positions within the alignment is first used to predict 3D contacts. By the nature of the domain as an independent folding unit, inter-domain predicted contacts are fewer than intra-domain predicted contacts. By analysing all possible domain boundaries and constructing a smoothed profile of predicted contact density (PCD), true structural domain boundaries are predicted as local profile minima associated with low PCD. A training data set is constructed from 52 non-homologous two-domain protein sequences of known 3D structure and used to determine optimal parameters for the profile analysis. The alignments in the training data set contained 48 ± 17 (mean ± SD) sequences and lengths of 257 ± 121 residues. Of the 47 alignments yielding predictions, 35% of true domain boundaries are predicted to within 15 amino acids by the local profile minimum with the lowest profile value. Including predictions from the second- and third-lowest local minima increases the correct domain boundary coverage to 60%, whereas the lowest five local minima cover 79% of correct domain boundaries. Through further profile analysis, criteria are presented which reliably identify subsets of more accurate predictions. Retrospective analysis of CASP3 targets shows predictions of sufficient accuracy to enable dramatically improved fold recognition results. Finally, a prediction is made for geminivirus AL1 protein which is in full agreement with biochemical data, yielding a plausible, novel threading result.
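
The profile idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact PCD definition, so the sketch assumes that the density at a candidate boundary is simply the number of predicted contacts that span it, smoothed with a moving average. The function names, the `window` and `margin` parameters, and the toy contact list are all hypothetical.

```python
def contact_density_profile(length, contacts, window=5):
    """Smoothed count of predicted contacts spanning each candidate boundary.

    Assumption (not from the source): the density at boundary b is the
    number of contacts (i, j) with i < b <= j, i.e. the contacts that
    would become inter-domain if the chain were cut just before residue b.
    """
    raw = [0.0] * (length + 1)
    for i, j in contacts:
        lo, hi = min(i, j), max(i, j)
        for b in range(lo + 1, hi + 1):
            raw[b] += 1
    # Moving-average smoothing; windows are truncated at the chain ends.
    half = window // 2
    smooth = []
    for b in range(length + 1):
        seg = raw[max(0, b - half): b + half + 1]
        smooth.append(sum(seg) / len(seg))
    return smooth


def local_minima(profile, margin=10):
    """Interior local minima of the profile, lowest profile value first.

    The termini are skipped (margin) because few contacts span them
    regardless of domain structure.
    """
    minima = [b for b in range(margin, len(profile) - margin)
              if profile[b] < profile[b - 1] and profile[b] <= profile[b + 1]]
    return sorted(minima, key=lambda b: profile[b])


# Toy example: two 50-residue "domains" with contacts only inside each one.
toy_contacts = [(i, i + 10) for i in range(0, 40)] + \
               [(i, i + 10) for i in range(50, 90)]
toy_profile = contact_density_profile(100, toy_contacts)
predicted = local_minima(toy_profile)
```

In this toy case the only interior minimum lies at the junction between the two blocks of contacts. Taking the second- and third-lowest minima as additional candidates, as the abstract describes, would correspond to reading further entries from the sorted list.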

3.
A database of 452 two-domain proteins with less than 25% homology was constructed. One half of the database was used to obtain statistics on the appearance of amino acid residues at domain boundaries. Small and hydrophilic residues (proline, glycine, asparagine, glutamic acid, arginine, etc.) occurred more often at domain boundaries than in total proteins. Hydrophobic residues (tryptophan, methionine, phenylalanine, etc.) were rarer at domain boundaries than in total proteins. Probability scales of amino acid appearance in boundary-flanking regions were constructed with these statistics and used to predict the domain boundaries in proteins of the other half of the database. The probability scale obtained by averaging the appearance of amino acids over an 8-residue region (±4 residues from the real domain boundaries) yielded the best results: domain boundaries were predicted within 40 residues of the real boundary in 57% of proteins and within 20 residues of the real boundary in 41% of proteins. The probability scale was used to predict the domain boundaries in proteins with unknown structures (CASP6).
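
As a rough illustration of how such a scale might be applied, the sketch below averages a per-residue propensity over the 8 residues flanking each candidate boundary (4 on each side) and picks the position with the highest average. The scale values in `TOY_SCALE` are invented for the example; only their signs follow the abstract (Pro, Gly, Asn, Glu, Arg enriched; Trp, Met, Phe depleted). The function names and the `margin` parameter are likewise assumptions, not part of the published method.

```python
# Hypothetical propensity values; the published scale is not given in the abstract.
TOY_SCALE = {"P": 1.0, "G": 1.0, "N": 0.5, "E": 0.5, "R": 0.5,
             "W": -1.0, "M": -1.0, "F": -1.0}


def boundary_profile(sequence, scale, flank=4):
    """Mean propensity over the 2*flank residues around each boundary.

    Boundary k is the cut between residue k-1 and residue k, so the
    window covers residues k-flank .. k+flank-1 (8 residues for flank=4).
    """
    values = [scale.get(aa, 0.0) for aa in sequence]
    profile = {}
    for k in range(flank, len(values) - flank + 1):
        seg = values[k - flank: k + flank]
        profile[k] = sum(seg) / len(seg)
    return profile


def predict_boundary(sequence, scale, flank=4, margin=20):
    """Boundary with the highest averaged propensity, ignoring the termini.

    Assumes the sequence is longer than 2 * margin residues.
    """
    profile = boundary_profile(sequence, scale, flank)
    candidates = [k for k in profile if margin <= k <= len(sequence) - margin]
    return max(candidates, key=lambda k: profile[k])


# Toy sequence: two hydrophobic blocks joined by a Pro/Gly-rich segment.
toy_sequence = "W" * 30 + "PGPGPGPG" + "F" * 30
toy_boundary = predict_boundary(toy_sequence, TOY_SCALE)
```

The window is centred on the cut between two residues rather than on a residue, which is one way to read "±4 residues from the real domain boundary" as an 8-residue region.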

4.
Structure of the gene for human coagulation factor V. Cited by: 22 (self-citations: 0; citations by others: 22)
L D Cripe, K D Moore, W H Kane. Biochemistry 1992, 31(15):3777-3785
Activated factor V (Va) serves as an essential protein cofactor for the conversion of prothrombin to thrombin by factor Xa. Analysis of the factor V cDNA indicates that the protein contains several types of internal repeats with the following domain structure: A1-A2-B-A3-C1-C2. In this report we describe the isolation and characterization of genomic DNA coding for human factor V. The factor V gene contains 25 exons which range in size from 72 to 2820 bp. The structure of the gene for factor V is similar to the previously characterized gene for factor VIII. Based on the aligned amino acid sequences of the two proteins, 21 of the 24 intron-exon boundaries in the factor V gene occur at the same location as in the factor VIII gene. In both genes, the junctions of the A1-A2 and A2-A3 domains are each encoded by a single exon. In contrast, the boundaries between domains A3-C1 and C1-C2 occur at intron-exon boundaries, which is consistent with evolution through domain duplication and exon shuffling. The connecting region or B domain of factor V is encoded by a single large exon of 2820 bp. The corresponding exon of the factor VIII gene contains 3106 bp. The 5' and 3' ends of both of these exons encode sequences homologous to the carboxyl-terminal end of domain A2 and the amino-terminal end of domain A3 in ceruloplasmin. There is otherwise no homology between the B domain exons. (ABSTRACT TRUNCATED AT 250 WORDS)

5.
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS and on a database that contains different domain definitions than those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.

6.
Time-correlated atomic motions were used to characterize protein domain boundaries from atomic coordinates generated by molecular dynamics simulations. A novel application of the dynamical cross-correlation matrix (DCCM) analysis tool was used to help identify putative protein domains. In implementing this new approach, several DCCM maps were calculated, each using a different coordinate reference frame from which protein domain boundaries and protein domain residue constituents could be identified. Cytochrome P450BM-3, from Bacillus megaterium, was used as the model protein in this study. The analyses indicated that the simulated protein comprises three distinct domain regions; in contrast, only two protein domains were identified in the original crystal structure report. Specifically, the DCCM analyses showed that the F-G helix region was a separate domain entity and not a part of the alpha domain, as previously designated. The simulations demonstrated that the domain motions of the F-G helix region affected both the size and shape of the enzyme active site, and that the dynamics of the F-G helix domain could possibly control access of substrate to the binding pocket.
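
For readers unfamiliar with the DCCM, the sketch below computes it in its usual textbook form, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²><Δr_j²>), where Δr_i is the displacement of atom i from its mean position and the averages run over trajectory frames. It assumes the frames have already been superposed onto a common reference frame, and it does not reproduce the multiple-reference-frame procedure described in the abstract. The function name and the toy trajectory are hypothetical.

```python
import math


def dccm(trajectory):
    """Dynamical cross-correlation matrix from a list of frames.

    trajectory[t][i] is the (x, y, z) position of atom i in frame t.
    Assumes every atom moves in at least one frame; a completely static
    atom would have zero variance and cause a division by zero.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    # Mean position of each atom over the trajectory.
    mean = [[sum(frame[i][d] for frame in trajectory) / n_frames
             for d in range(3)] for i in range(n_atoms)]
    # Displacement of each atom from its mean, per frame.
    disp = [[[frame[i][d] - mean[i][d] for d in range(3)]
             for i in range(n_atoms)] for frame in trajectory]

    def cov(i, j):
        # Time average of the dot product of the two displacement vectors.
        return sum(sum(f[i][d] * f[j][d] for d in range(3))
                   for f in disp) / n_frames

    var = [cov(i, i) for i in range(n_atoms)]
    return [[cov(i, j) / math.sqrt(var[i] * var[j])
             for j in range(n_atoms)] for i in range(n_atoms)]


# Toy trajectory: atoms 0 and 1 move together along x, atom 2 moves oppositely.
toy_trajectory = [[(t, 0.0, 0.0), (5.0 + t, 0.0, 0.0), (10.0 - t, 0.0, 0.0)]
                  for t in (-1.0, 0.0, 1.0)]
toy_matrix = dccm(toy_trajectory)
```

Entries close to +1 indicate atoms that move in concert and entries close to -1 indicate anti-correlated motion; blocks of mutually correlated residues in such a matrix are what a domain assignment of this kind would look for.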

7.
In this article, we present a de novo method for predicting protein domain boundaries, called OPUS-Dom. The core of the method is a novel coarse-grained folding method, VECFOLD, which constructs low-resolution structural models from a target sequence by folding a chain of vectors representing the predicted secondary-structure elements. OPUS-Dom generates a large ensemble of folded structure decoys by VECFOLD and labels the domain boundaries of each decoy by a domain parsing algorithm. Consensus domain boundaries are then derived from the statistical distribution of the putative boundaries and three empirical sequence-based domain profiles. OPUS-Dom generally outperformed several state-of-the-art domain prediction algorithms over various benchmark protein sets. Even though each VECFOLD-generated structure contains large errors, collectively these structures provide a more robust delineation of domain boundaries. The success of OPUS-Dom suggests that the arrangement of protein domains is more a consequence of limited coordination patterns per domain, arising from tertiary packing of secondary-structure segments, than of sequence-specific constraints.

8.
We have created a database of 452 two-domain proteins with less than 25% homology. Using one half of this set, we obtained statistics on the appearance of amino acid residues at the domain boundaries of multidomain proteins. Small and hydrophilic amino acids (proline, glycine, asparagine, glutamic acid, arginine and others) appear at domain boundaries more often than in the whole protein. Conversely, hydrophobic amino acid residues (tryptophan, methionine, phenylalanine and others) appear at domain boundaries more rarely. The scales of amino acid appearance in boundary regions derived from these statistics were used to calculate domain boundaries in the proteins of the second half of the database. The probability scale obtained by averaging the appearance of amino acid residues over a domain boundary region of 8 residues (±4 residues from the real domain boundary) gives the best result: for 57% of proteins the predicted boundary was closer than 40 residues to the boundary assigned from three-dimensional structures, and for 41% it was closer than 20 residues to the real boundary. The probability scale was used to predict domain boundaries for proteins with unknown three-dimensional structure (international competition CASP6).

9.
We have examined the structure of S-layers isolated from Sulfolobus acidocaldarius using atomic force microscopy (AFM) and transmission electron microscopy (TEM). From the AFM images, we were able to directly observe individual dimers of the crystal, defects in the crystal structure, and twin boundaries. We have identified two types of boundaries, one defined by a mirror plane and the other by a glide plane. This work shows that twin boundaries are highly structured regions that are directly related to the organization of units within each crystal domain. Projection maps from TEM images have shown that there are significant differences in the final average maps has allowed us to relate high magnification views obtained by AFM to the relatively high resolution information obtained by electron microscopy and image processing.

10.
The elucidation of the domain content of a given protein sequence in the absence of determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully the delineation of continuous domains can be accomplished in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken with a view to measuring the usefulness of these prediction methods in terms of their application to fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top scoring predictions. We have implemented a new continuous domain identification method using the alignment of predicted secondary structures of target sequences against observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for domain number and location of domain boundaries was correct for 24% of the multidomain set (±20 residues). These results have been put into context in relation to the results obtained from the other prediction methods assessed.

11.
Domain size distributions can predict domain boundaries. Cited by: 8 (self-citations: 0; citations by others: 8)
MOTIVATION: The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Structural domains are furthermore formed from a single-chain continuous segment in over 80% of instances. These observations imply that some choices of domain boundaries on an otherwise uncharacterized sequence are more likely than others, based solely on the size and segment number of predicted domains. This property might be used to guess the locations of protein domain boundaries. RESULTS: To test this possibility we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long and that guessing domain boundaries in this way can improve the sensitivity of threading analysis.
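
The enumeration step can be sketched in a much-simplified form. The example below considers only splits of a chain into two continuous domains and scores each split by the product of the likelihoods of the two resulting sizes. The Gaussian size model and its parameters (`mean=150`, `sd=60`) are placeholders chosen for illustration; the abstract does not state the actual distribution, and the published model also accounts for segment number, which is ignored here.

```python
import math


def log_size_likelihood(size, mean=150.0, sd=60.0):
    """Log-density of a domain size under a toy Gaussian model.

    The mean and sd are hypothetical, not values from the source.
    """
    return (-0.5 * ((size - mean) / sd) ** 2
            - math.log(sd * math.sqrt(2.0 * math.pi)))


def rank_two_domain_splits(length, min_size=40):
    """Candidate cut points for a two-domain split, most likely first.

    A cut at position c gives domains of sizes c and length - c; both
    must be at least min_size residues long.
    """
    scored = [(log_size_likelihood(c) + log_size_likelihood(length - c), c)
              for c in range(min_size, length - min_size + 1)]
    return [c for _, c in sorted(scored, reverse=True)]


toy_ranking = rank_two_domain_splits(300)
```

With a symmetric size model the top-ranked cut is always the midpoint of the chain, which makes the sketch easy to check but also shows why a realistic model would need an empirically derived, skewed size distribution and a way to compare hypotheses with different numbers of domains.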

12.
The community of an individual: implications for the community concept. Cited by: 1 (self-citations: 0; citations by others: 1)
V. Thomas Parker. Oikos 2004, 104(1):27-34
The concept of the ecological community is examined from the perspective of its criteria and domain. The multiple definitions and uses of this concept indicate a variety of scales and approaches. In this paper, a core definition of the minimal criteria and domain is proposed. Using those criteria, a model of the ecological community is developed based on a focal individual and its interactions with other individuals. In order to increase the scale of the domain of this approach, additional criteria are required. This model is used to explore characteristics of the minimum domain and larger scales of the community concept. The structure that emerges emphasizes context dependency and the potential for indeterminacy for most types of interactions. A prominent historical argument, the nature of boundaries between communities, has no relevance in this model.

13.
George RA, Heringa J. Proteins 2002, 48(4):672-681
Protein sequences containing more than one structural domain are problematic when used in homology searches where they can either stop an iterative database search prematurely or cause an explosion of a search to common domains. We describe a method, DOMAINATION, that infers domains and their boundaries in a query sequence from local gapped alignments generated using PSI-BLAST. Through a new technique to recognize domain insertions and permutations, DOMAINATION submits delineated domains as successive database queries in further iterative steps. Assessed over a set of 452 multidomain proteins, the method predicts structural domain boundaries with an overall accuracy of 50% and improves finding distant homologies by 14% compared with PSI-BLAST. DOMAINATION is available as a web based tool at http://mathbio.nimr.mrc.ac.uk, and the source code is available from the authors upon request.

14.
The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detect domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within ±20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two-domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity. Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. As a first-line predictor, domain meta-predictors could yield improved results with Armadillo predictions.
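
The general pipeline described above (amino acid index, smoothed profile, Z-scores) might look roughly like the following. This is a sketch under stated assumptions and not the Armadillo code: the index values in `TOY_INDEX` are invented (the DLI itself is not given in the abstract), and the window length, Z-score cutoff and function names are hypothetical.

```python
import statistics

# Hypothetical index: only Pro and Gly are given a linker propensity here.
TOY_INDEX = {"P": 1.0, "G": 1.0}


def smoothed_profile(sequence, index, window=7):
    """Moving average of the per-residue index values along the sequence."""
    half = window // 2
    raw = [index.get(aa, 0.0) for aa in sequence]
    profile = []
    for k in range(len(raw)):
        seg = raw[max(0, k - half): k + half + 1]
        profile.append(sum(seg) / len(seg))
    return profile


def linker_regions(sequence, index, window=7, z_cut=1.5):
    """Contiguous residue ranges whose profile Z-score is at least z_cut.

    Returns a list of (start, end) pairs with inclusive, 0-based indices.
    """
    profile = smoothed_profile(sequence, index, window)
    mu = statistics.mean(profile)
    sigma = statistics.pstdev(profile)
    if sigma == 0:
        return []  # flat profile: nothing stands out
    flagged = [(v - mu) / sigma >= z_cut for v in profile]
    regions, start = [], None
    # The trailing False closes a region that runs to the end of the chain.
    for k, is_high in enumerate(flagged + [False]):
        if is_high and start is None:
            start = k
        elif not is_high and start is not None:
            regions.append((start, k - 1))
            start = None
    return regions


toy_sequence = "A" * 40 + "PGPGPGPGPG" + "A" * 40
toy_regions = linker_regions(toy_sequence, TOY_INDEX)
```

Because the profile is built from an arbitrary residue-to-number mapping, a different index (for example an entropy-based one, as the abstract mentions) could be swapped in by passing a different dictionary.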

15.
Parisien M, Major F. Proteins 2005, 61(3):545-558
Systematic protein folding studies depend on protein three-dimensional structure annotation, the assignment of amino acid structural types from atomic coordinates. Significant stabilizing factors between adjacent beta-sheet peptide chains have recently been characterized and were not considered during the development of previously published annotation methods. To produce an accurate beta-sheet domain catalog and to encompass the full beta-sheet spectacle, we developed a method, beta-Spider, which evaluates a packing energy between adjacent peptide chains in accordance with the newly discovered stabilizing factors. While considering important energetic factors, our approach also minimizes the use of subjective criteria, such as (phi,psi) boundaries and sets of H-bonding motifs that are used in other existing methods. As a result of the application of beta-Spider to a set of available high-resolution X-ray crystal structures, we present here a new beta-sheet catalog that differs considerably from the one produced by the most acclaimed DSSP method. The catalog includes new H-bonding motifs that were never reported.

16.
The conformational variability of the collagen triple helix was studied with the methods of molecular mechanics. The Rich-Crick model with one hydrogen bond per tripeptide fragment or the model with two hydrogen bonds per tripeptide fragment were used for tripeptides forming the primary structure of the protein. Imino acid and amino acid residues were located in the second position of the tripeptide fragments in the first and second cases, respectively. Conformations on domain boundaries, which had alternating structures with one and two hydrogen bonds per tripeptide, were particularly studied. Essentially all types of collagen backbone composed of amino acid residues most frequently occurring in this protein were considered. A new model was suggested that combined elements of the Rich-Crick model and our new approach. This was shown to be stereochemically valid, energetically advantageous, and consistent with the experimental data. It was conclusively demonstrated that the primary structure of collagen determines its tertiary structure.

17.
The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, our ability to predict domain boundaries rapidly and accurately from sequence information alone is both essential and critical from the viewpoint of gene function annotation. Anyone attempting to predict domain boundaries for a single protein sequence is invariably confronted with a plethora of databases that contain boundary information available from the internet and a variety of methods for domain boundary prediction. How are these derived and how well do they work? What definition of 'domain' do they use? We will first clarify the different definitions of protein domains, and then describe the available public databases with domain boundary information. Finally, we will review existing domain boundary prediction methods and discuss their strengths and weaknesses.

18.
Protein domains exist by themselves or in combination with other domains to form complex multidomain proteins. Defining domain boundaries in proteins is essential for understanding their evolution and function but is not trivial. More specifically, partitioning domains that interact by forming a single β-sheet is known to be particularly troublesome for automatic structure-based domain decomposition pipelines. Here, we study edge-to-edge β-strand interactions between domains in a protein chain, to help define the boundaries for some more difficult cases where a single β-sheet spanning over two domains gives an appearance of one. We give a number of examples where β-strands belonging to a single β-sheet do not belong to a single domain and highlight the difficulties of automatic domain parsers on these examples. This work can be used as a baseline for defining domain boundaries in homologous proteins or proteins with similar domain interactions in the future.

19.
The production of diffraction-quality crystals remains a difficult obstacle on the road to high-resolution structural characterization of proteins. This is primarily a result of the empirical nature of the process. Although crystallization is not predictable, factors inhibiting it are well established. First, crystal formation is always entropically unfavorable. Reducing the entropic cost of crystallizing a given protein is thus desirable. It is common practice to map boundaries and remove unstructured regions surrounding the folded protein domain. However, a problem arises when flexible regions are not at the boundaries but within a domain. Such regions cannot be deleted without adding new restraints to the domain. We encountered this problem during an attempt to crystallize the beta subunit of the eukaryotic signal recognition particle (SRbeta), bearing a long and flexible internal loop. Native SRbeta did not crystallize. However, after circularly permuting the protein by connecting the spatially close N and C termini with a short heptapeptide linker GGGSGGG and removing 26 highly flexible loop residues within the domain, we obtained diffraction-quality crystals. This protein-engineering method is simple and should be applicable to other proteins, especially because N and C termini of protein domains are often close in space. The success of this method profits from prior knowledge of the domain fold, which is becoming increasingly common in today's postgenomic era.

20.
The conservation of hox genes as well as their genomic organization across the phyla suggests that this system of anterior–posterior axis formation arose early during evolution and has come under strong selection pressure. Studies in the split Hox cluster of Drosophila have shown that proper expression of hox genes is dependent on chromatin domain boundaries that prevent inappropriate interactions among different types of cis-regulatory elements. To investigate whether boundary function and its role in the regulation of hox genes are conserved in insects with intact Hox clusters, we used an algorithm to locate potential boundary elements in the Hox complex of the mosquito Anopheles gambiae. Several potential boundary elements were identified that could be tested for their functional conservation. Comparative analysis revealed that, as in Drosophila, the bithorax region in A. gambiae contains an extensive array of boundaries and enhancers organized into domains. We analysed a subset of candidate boundary elements and show that they function as enhancer blockers in Drosophila. The functional conservation of boundary elements from mosquito in fly suggests that regulation of hox genes involving chromatin domain boundaries is an evolutionarily conserved mechanism and points to an important role of such elements in key developmentally regulated loci.
