Similar Documents
1.
Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains of modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high-throughput biotechnologies. We show that alignment methods have progressed significantly and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions that reflect functional specificities, or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.

2.
The soluble methane monooxygenase (sMMO) hydroxylase is a prototypical member of the class of proteins with non-heme carboxylate-bridged diiron sites. The sMMO subclass of enzyme systems has several distinguishing characteristics, including the ability to catalyze hydroxylation or epoxidation chemistry, a multisubunit hydroxylase containing diiron centers in its alpha subunits, and the requirement for a coupling protein for optimal activity. Sequence homology alignment of known members of the sMMO family was performed in an effort to identify protein regions giving rise to these unique features. DNA sequencing of the Methylococcus capsulatus (Bath) sMMO genes confirmed previously identified sequencing errors and corrected two additional errors, each of which was confirmed by at least one independent method. Alignments of homologous proteins from sMMO, phenol hydroxylase, toluene 2-, 3-, and 4-monooxygenases, and alkene monooxygenase systems revealed an interesting set of absolutely conserved amino-acid residues, including previously unidentified residues located outside the diiron active site of the hydroxylase. By mapping these residues onto the M. capsulatus (Bath) sMMO hydroxylase crystal structure, functional and structural roles were proposed for the conserved regions. Analysis of the active site showed a highly conserved hydrogen-bonding network on one side of the diiron cluster but little homology on the opposite side, where substrates are presumed to bind. It is suggested that conserved residues on the hydroxylase surface may be important for protein-protein interactions with the reductase and coupling ancillary proteins and/or serve as part of an electron-transfer pathway. A possible mechanism is proposed by which binding of the coupling protein at the surface of the hydroxylase might transfer information to the diiron active site in the interior.
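
Identifying "absolutely conserved" residues in an alignment amounts to finding columns in which every sequence carries the same amino acid. The following is a minimal Python sketch of that scan, assuming the alignment is given as equal-length strings with "-" for gaps; the function name and toy sequences are hypothetical and do not reproduce the authors' actual pipeline.

```python
def absolutely_conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue.

    Assumes all rows are equal-length aligned strings using '-' for gaps;
    a column consisting only of gaps is not counted as conserved.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(lengths.pop()):
        residues = {seq[col] for seq in alignment}
        # Exactly one residue type and no gap in this column.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


toy_alignment = ["MKE-HLS", "MRE-HIS", "MKEAHVS"]  # hypothetical toy data
print(absolutely_conserved_columns(toy_alignment))  # [0, 2, 4, 6]
```

Mapping the returned column indices back to residue numbers in one reference sequence (for example, the one with a crystal structure) would be the next step before placing them on a structure.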

3.
Human DNA polymerase nu (pol nu) is one of three A family polymerases conserved in vertebrates. Although its biological functions are unknown, pol nu has been implicated in DNA repair and in translesion DNA synthesis (TLS). Pol nu lacks intrinsic exonucleolytic proofreading activity and discriminates poorly against misinsertion of dNTP opposite template thymine or guanine, implying that it should copy DNA with low base substitution fidelity. To test this prediction and to comprehensively examine pol nu DNA synthesis fidelity as a clue to its function, here we describe human pol nu error rates for all 12 single base-base mismatches and for insertion and deletion errors during synthesis to copy the lacZ alpha-complementation sequence in M13mp2 DNA. Pol nu copies this DNA with average single-base insertion and deletion error rates of 7 × 10⁻⁵ and 17 × 10⁻⁵, respectively. This accuracy is comparable to that of replicative polymerases in the B family, lower than that of its A family homolog, human pol gamma, and much higher than that of Y family TLS polymerases. In contrast, the average single-base substitution error rate of human pol nu is 3.5 × 10⁻³, which is inaccurate compared to the replicative polymerases and comparable to Y family polymerases. Interestingly, the vast majority of errors made by pol nu reflect stable misincorporation of dTMP opposite template G, at average rates that are much higher than for homologous A family members. This pol nu error is especially prevalent in sequence contexts wherein the template G is preceded by a C-G or G-C base pair, where error rates can exceed 10%. Amino acid sequence alignments based on the structures of more accurate A family polymerases suggest substantial differences in the O-helix of pol nu that could contribute to this unique error signature.

4.
5.
Errors in genotyping data have been shown to have a significant effect on the estimation of recombination fractions in high-resolution genetic maps. Previous estimates of errors in existing databases have been limited to the analysis of relatively few markers and have suggested rates in the range 0.5%-1.5%. The present study capitalizes on the fact that within the Centre d'Etude du Polymorphisme Humain (CEPH) collection of reference families, 21 individuals are members of more than one family, with separate DNA samples provided by CEPH for each appearance of these individuals. By comparing the genotypes of these individuals in each of the families in which they occur, an estimated error rate of 1.4% was calculated for all loci in the version 4.0 CEPH database. Removing those individuals who were clearly identified by CEPH as appearing in more than one family resulted in a 3.0% error rate for the remaining samples, suggesting that some error checking of the identified repeated individuals may occur prior to data submission. An error rate of 3.0% for version 4.0 data was also obtained for four chromosome 5 markers that were retyped through the entire CEPH collection. The effects of these errors on a multipoint map were significant, with a total sex-averaged length of 36.09 cM with the errors, and 19.47 cM with the errors corrected. Several statistical approaches to detect and allow for errors during linkage analysis are presented. One method, which identified families containing possible errors on the basis of the impact on the maximum lod score, showed particular promise, especially when combined with the limited retyping of the identified families. The impact of the demonstrated error rate in an established genotype database on high-resolution mapping is significant, raising the question of the overall value of incorporating such existing data into new genetic maps.
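
The duplicate-sample idea above can be illustrated by comparing two independently typed DNA samples of the same person locus by locus. Below is a minimal Python sketch, assuming genotypes are unordered allele pairs and missing calls are `None`; the helper names are hypothetical, and the conversion from discordance to a per-genotype error rate rests on an added assumption (each copy wrong independently with probability e, every error detectable), not on the estimator the authors actually used.

```python
import math


def discordance_rate(calls_a, calls_b):
    """Fraction of loci typed in both samples whose genotypes disagree.

    Each call is an unordered allele pair such as (2, 5), or None if missing.
    """
    compared = discordant = 0
    for g1, g2 in zip(calls_a, calls_b):
        if g1 is None or g2 is None:
            continue  # skip loci missing in either sample
        compared += 1
        if sorted(g1) != sorted(g2):  # allele order carries no information
            discordant += 1
    if compared == 0:
        raise ValueError("no loci typed in both samples")
    return discordant / compared


def per_genotype_error(d):
    """Hypothetical conversion assuming d = 1 - (1 - e)**2; solves for e."""
    return 1 - math.sqrt(1 - d)


sample_1 = [(1, 2), (3, 3), (2, 4), None, (1, 1)]  # toy data
sample_2 = [(2, 1), (3, 4), (2, 4), (5, 5), (1, 1)]
d = discordance_rate(sample_1, sample_2)
print(d)  # 0.25
print(per_genotype_error(d))
```

Because two samples can disagree when either one is wrong, the per-genotype rate under this assumption is roughly half the discordance rate when errors are rare.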

6.
We have previously examined characteristics of maternal chromosomes 21 that exhibited a single recombination on 21q and proposed that certain recombination configurations are risk factors for either meiosis I (MI) or meiosis II (MII) nondisjunction. The primary goal of this analysis was to examine characteristics of maternal chromosomes 21 that exhibited multiple recombinant events on 21q to determine whether additional risk factors or mechanisms are suggested. In order to identify the origin (maternal or paternal) and stage (MI or MII) of the meiotic errors, as well as placement of recombination, we genotyped over 1,500 SNPs on 21q. Our analyses included 785 maternal MI errors, 87 of which exhibited two recombinations on 21q, and 283 maternal MII errors, 81 of which exhibited two recombinations on 21q. Among MI cases, the average location of the distal recombination was proximal to that of normally segregating chromosomes 21 (35.28 vs. 38.86 Mb), a different pattern from that seen for single events and one that suggests an association with genomic features. For MII errors, the most proximal recombination was closer to the centromere than that on normally segregating chromosomes 21, and this proximity was associated with increasing maternal age. This pattern is the same as that seen among MII errors that exhibit only one recombination. These findings are important as they help us better understand mechanisms that may underlie both age-related and nonage-related meiotic chromosome mal-segregation.

7.
Rho family GTPases are ideal candidates to regulate aspects of cytoskeletal dynamics downstream of axon guidance receptors. To examine the in vivo role of Rho GTPases in midline guidance, dominant negative (dn) and constitutively active (ct) forms of Rho, Drac1, and Dcdc42 are expressed in the Drosophila CNS. When expressed alone, only ctDrac and ctDcdc42 cause axons in the pCC/MP2 pathway to cross the midline inappropriately. Heterozygous loss of Roundabout enhances the ctDrac phenotype and causes errors in embryos expressing dnRho or ctRho. Homozygous loss of Son-of-Sevenless (Sos) also enhances the ctDrac phenotype and causes errors in embryos expressing either dnRho or dnDrac. CtRho suppresses the midline crossing errors caused by loss of Sos. CtDrac and ctDcdc42 phenotypes are suppressed by heterozygous loss of Profilin, but strongly enhanced by coexpression of constitutively active myosin light chain kinase (ctMLCK), which increases myosin II activity. Expression of ctMLCK also causes errors in embryos expressing either dnRho or ctRho. Our data confirm that Rho family GTPases are required for regulation of actin polymerization and/or myosin activity and that this is critical for the response of growth cones to midline repulsive signals. Midline repulsion appears to require down-regulation of Drac1 and Dcdc42 and activation of Rho.

8.
Landan G, Graur D. Gene. 2009;441(1-2):141-147.
We characterize pairwise and multiple sequence alignment (MSA) errors by comparing true alignments from simulations of sequence evolution with reconstructed alignments. The vast majority of reconstructed alignments contain many errors. Error rates rapidly increase with sequence divergence; thus, for even intermediate degrees of sequence divergence, more than half of the columns of a reconstructed alignment may be expected to be erroneous. In closely related sequences, most errors consist of the erroneous positioning of a single indel event, and their effect is local. As sequences diverge, errors become more complex as a result of the simultaneous mis-reconstruction of many indel events, and the lengths of the affected MSA segments increase dramatically. We found a systematic bias towards underestimation of the number of gaps, which leads to the reconstructed MSA being on average shorter than the true one. Alignment errors are unavoidable even when the evolutionary parameters are known in advance. Correct reconstruction can only be guaranteed when the likelihood of the true alignment is uniquely optimal. However, true alignment features are very frequently sub-optimal or co-optimal, with the result that optimal albeit erroneous features are incorporated into the reconstructed MSA. Progressive MSA uses a guide tree in the reconstruction of MSAs. The quality of the guide tree was found to affect MSA error levels only marginally.
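
Comparing a reconstructed alignment with the true one requires a notion of when a column is "correct". One common convention, assumed here and not necessarily the exact measure Landan and Graur used, describes each column by which residue of each sequence it holds (or a gap) and counts a reconstructed column as erroneous if that exact column never occurs in the true alignment. A minimal Python sketch with hypothetical function names:

```python
def column_signatures(alignment):
    """Describe each column as a tuple of per-sequence residue indices (None = gap)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("rows must have equal length")
    counters = [0] * len(alignment)  # next residue index for each sequence
    columns = []
    for col in range(len(alignment[0])):
        signature = []
        for row, seq in enumerate(alignment):
            if seq[col] == "-":
                signature.append(None)
            else:
                signature.append(counters[row])
                counters[row] += 1
        columns.append(tuple(signature))
    return columns


def column_error_fraction(true_aln, reconstructed_aln):
    """Fraction of reconstructed columns that do not occur in the true alignment."""
    true_cols = set(column_signatures(true_aln))
    rec_cols = column_signatures(reconstructed_aln)
    wrong = sum(1 for c in rec_cols if c not in true_cols)
    return wrong / len(rec_cols)


true_aln = ["ACG-T", "A-GCT"]        # toy "true" alignment with two indels
reconstructed = ["ACGT", "AGCT"]      # gaps omitted, so the MSA is shorter
print(column_error_fraction(true_aln, reconstructed))  # 0.5
```

The toy reconstruction also shows the bias described above: by placing too few gaps, it is shorter than the true alignment and misaligns the residues between the two indels.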

9.
Chen H, Kihara D. Proteins. 2008;71(3):1255-1274.
Error in protein tertiary structure prediction is unavoidable, but most current prediction algorithms do not report it explicitly. The estimated error of a predicted structure is crucial information for experimental biologists who use a prediction model to design and interpret experiments. Here, we propose a method to estimate errors in predicted structures based on the stability of the optimal target-template alignment when compared with a set of suboptimal alignments. The stability of the optimal alignment is quantified by an index named the SuboPtimal Alignment Diversity (SPAD). We implemented SPAD in a profile-based threading algorithm and investigated how well SPAD indicates errors in threading models using a large benchmark dataset of 5232 alignments. SPAD correlates very well not only with alignment shift errors but also with structure-level errors: the root mean square deviation (RMSD) of predicted structure models from the native structures (i.e. global errors), and local errors at each residue position. We further compared SPAD with seven other quality measures (six sequence alignment-based measures and one atomic statistical potential, discrete optimized protein energy (DOPE)) in terms of the correlation coefficient with global and local structure-level errors. In terms of correlation with the RMSD of structure models, when a target and a template are in the same SCOP family, sequence identity showed the best correlation with RMSD; at the superfamily level, SPAD was best; and at the fold level, DOPE was best. However, in a head-to-head comparison, SPAD outperforms the other measures. Next, SPAD is compared with three other measures of local errors; in this comparison, SPAD was best at the family, superfamily and fold levels. Using the discovered correlation, we have also used SPAD to predict the global and local errors of our predicted structures of CASP7 targets. Finally, we propose a sausage representation of predicted tertiary structures that intuitively indicates the predicted structure and its estimated error range simultaneously.

10.
The nosology of the inborn errors of myelin metabolism has been stymied by the lack of molecular genetic analysis. Historically, Pelizaeus-Merzbacher disease has encompassed a host of neurologic disorders that present with a deficit of myelin, the membrane elaborated by glial cells that encircles and successively enwraps axons. We describe here a Pelizaeus-Merzbacher pedigree of the classical type, with X-linked inheritance, a typical clinical progression, and a pathologic loss of myelinating cells and myelin in the central nervous system. To discriminate variants of Pelizaeus-Merzbacher disease, a set of oligonucleotide primers was constructed to polymerase-chain-reaction (PCR) amplify and sequence the gene encoding proteolipid protein (PLP), a structural protein that comprises half of the protein of the myelin sheath. The PLP gene in one of two affected males and the carrier mother of this family exhibited a single base difference in the more than 2 kb of the PLP gene sequenced, a C→T transition that would create a serine substitution for proline at the carboxy end of the protein. Our results delineate the clinical features of Pelizaeus-Merzbacher disease, define the possible molecular pathology of this dysmyelinating disorder, and address the molecular classification of inborn errors of myelin metabolism. Patients with the classical form (type I) and the more severely affected, connatal variant of Pelizaeus-Merzbacher disease (type II) would be predicted to display mutation at the PLP locus. The other variants (types III-VI), which have sometimes been categorized as Pelizaeus-Merzbacher disease, may represent mutations in genes encoding other structural myelin proteins or proteins critical to myelination.

11.
Zou G, Pan D, Zhao H. Genetics. 2003;164(3):1161-1173.
The identification of genotyping errors is an important issue in mapping complex disease genes. Although it is common practice to genotype multiple markers in a candidate region in genetic studies, the potential benefit of jointly analyzing multiple markers to detect genotyping errors has not been investigated. In this article, we discuss genotyping error detection for a set of tightly linked markers in nuclear families, with the objective of identifying families likely to have genotyping errors at one or more markers. We make use of the fact that recombination is a very unlikely event among these markers. We first show that, with family trios, no extra information can be gained by jointly analyzing markers if no phase information is available, and error detection rates are usually low if Mendelian consistency is used as the only standard for checking errors. However, for nuclear families with more than one child, error detection rates can be greatly increased by considering more markers. Error detection rates also increase with the number of children in each family. Because families displaying Mendelian consistency may still have genotyping errors, we calculate the probability that a family displaying Mendelian consistency has correct genotypes. These probabilities can help identify families that, although showing Mendelian consistency, may have genotyping errors. In addition, we examine the benefit of available haplotype frequencies in the general population for genotyping error detection. We show that both error detection rates and the probability that an observed family displaying Mendelian consistency has correct genotypes can be greatly increased when such additional information is available.
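
The baseline that the abstract improves upon, Mendelian consistency at a single autosomal locus, can be stated precisely: a child's genotype must be formable from one paternal and one maternal allele. A minimal Python sketch, assuming genotypes are unordered allele pairs and both parents are typed; the function names are hypothetical, and the authors' multi-marker, haplotype-aware method is not reproduced here.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """True if the child's unordered genotype can be built from one allele of each parent."""
    possible = {tuple(sorted((p, m))) for p, m in product(father, mother)}
    return tuple(sorted(child)) in possible


def family_consistent(father, mother, children):
    """Single-locus check for a nuclear family: every child must pass."""
    return all(mendelian_consistent(father, mother, c) for c in children)


print(mendelian_consistent((1, 2), (3, 3), (3, 1)))  # True
print(mendelian_consistent((1, 2), (3, 3), (2, 2)))  # False: mother carries no allele 2
```

As the abstract notes, passing this check does not guarantee that the genotypes are correct: for example, an error that turns a true (1, 3) child into (2, 3) under the parents above would go undetected.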

12.
Jörnvall H. FEBS Letters. 1999;456(1):85-88.
Motifer is a software tool able to find very distant homologues of an amino acid query sequence directly in nucleotide databases. It focuses searches on a specific amino acid pattern, scoring the matching and intervening residues as specified by the user. The program has been developed for searching databases of expressed sequence tags (ESTs), but it is also well suited to searching genomic sequences. The query sequence can be a variable pattern with alternative amino acids or gaps, and the sequences searched can contain introns or sequencing errors with accompanying frame shifts. Other features include options to generate a searchable output, set the maximal sequencing error frequency, limit searches to given species, or exclude already known matches. Motifer can find sequence homologues that other search algorithms would deem unrelated or would miss because of sequencing errors or an excessive number of other homologues. The ability of Motifer to find relatives of a given sequence is exemplified by searches for members of the transforming growth factor-beta family and for proteins containing a WW-domain. The functions aimed at enhancing EST searches are illustrated by the 'in silico' cloning of a novel cytochrome P450 enzyme.
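
The core idea of searching nucleotide data with an amino acid pattern that allows alternative residues and variable-length gaps can be approximated by translating each reading frame and applying a regular expression. This is a simplified Python sketch under stated assumptions: it uses the standard genetic code, scans only the three forward frames, and does not model introns, frame shifts, or Motifer's residue scoring; the function names and the example pattern are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_motif(dna, pattern):
    """Yield (frame, amino_acid_offset, matched_peptide) for the three forward frames."""
    regex = re.compile(pattern)
    for frame in range(3):
        protein = translate(dna[frame:])
        for m in regex.finditer(protein):
            yield frame, m.start(), m.group()


# Pattern: C, then 2-4 arbitrary residues, then C, then one of L/I/V/M.
hits = list(find_motif("ATGTGTGCTAAATGCCTG", r"C.{2,4}C[LIVM]"))
print(hits)  # [(0, 1, 'CAKCL')]
```

A fuller tool would also scan the reverse complement and tolerate insertions or deletions in the DNA, which is precisely what a plain regular expression over fixed frames cannot do.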

13.
The identification of catalytic residues is an essential step in the functional characterization of enzymes. We present a purely structural approach to this problem, motivated by the difficulty evolution-based methods have in annotating structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and is comparable to classifiers that use both structural and evolutionary features. In addition to evaluating the performance of catalytic residue identification, we also present detailed case studies of three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, the database we designed, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
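
Step (2) above, under-sampling the majority (non-catalytic) class before training, can be sketched in a few lines. This minimal Python version assumes binary labels (1 = catalytic, 0 = non-catalytic) and a hypothetical `ratio` parameter giving the number of negatives kept per positive; the authors' actual sampling ratio and procedure are not stated in the abstract.

```python
import random


def undersample_majority(samples, labels, ratio=1.0, seed=0):
    """Keep every positive (label 1) and a random subset of negatives (label 0).

    ratio: number of negatives to keep per positive (capped at what is available).
    Returns (samples, labels) with positives first, then the kept negatives.
    """
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    kept_negatives = rng.sample(negatives, n_keep)
    new_samples = positives + kept_negatives
    new_labels = [1] * len(positives) + [0] * n_keep
    return new_samples, new_labels


residues = list(range(100))            # toy residue identifiers
is_catalytic = [1] * 5 + [0] * 95      # heavy class imbalance, as in enzymes
X, y = undersample_majority(residues, is_catalytic, ratio=2.0)
print(len(X), sum(y))  # 15 5
```

Step (3), penalizing errors on the minority class more heavily, is usually expressed through per-class misclassification costs; for instance, scikit-learn's `SVC` exposes a `class_weight` argument for this, though the abstract does not say which SVM implementation the authors used.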

14.
The purpose of this work is the development of a family-based association test that allows for random genotyping errors and missing data and makes use of information on affected and unaffected pedigree members. We derive the conditional likelihood functions of the general nuclear family for the following scenarios: complete parental genotype data and no genotyping errors; only one genotyped parent and no genotyping errors; no parental genotype data and no genotyping errors; and no parental genotype data with genotyping errors. We find maximum likelihood estimates of the marker locus parameters, including the penetrances and population genotype frequencies, under the null hypothesis that all penetrance values are equal and under the alternative hypothesis. We then compute the likelihood ratio test. We perform simulations to assess the adequacy of the central chi-square distribution approximation when the null hypothesis is true. We also perform simulations to compare the power of the TDT and this likelihood-based method. Finally, we apply our method to 23 SNPs genotyped in nuclear families from a recently published study of idiopathic scoliosis (IS). Our simulations suggest that this likelihood ratio test statistic follows a central chi-square distribution with 1 degree of freedom under the null hypothesis, even in the presence of missing data and genotyping errors. The power comparison shows that this likelihood ratio test is more powerful than the original TDT for the simulations considered. For the IS data, marker rs7843033 shows the most significant evidence under our method (p = 0.0003), which is consistent with a previous report that found rs7843033 to have the second most significant TDTae p value among a set of 23 SNPs.
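
For reference, the original TDT against which the likelihood-based method is compared has a simple closed form: among heterozygous parents, let b count transmissions of the test allele to affected children and c count transmissions of the other allele; the statistic (b - c)²/(b + c) is referred to a chi-square distribution with 1 degree of freedom. A minimal Python sketch with hypothetical function names and counts; it implements the classic TDT only, not the authors' error-tolerant likelihood ratio test, although the same chi-square(1) tail applies to that statistic under the null according to their simulations.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classic (McNemar-form) TDT: (b - c)**2 / (b + c).

    transmitted (b): test allele passed from a heterozygous parent to an affected child.
    untransmitted (c): the other allele passed instead.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(statistic):
    """Upper-tail p-value for chi-square(1), using P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2))


stat = tdt_statistic(30, 15)     # hypothetical counts
print(stat)                      # 5.0
print(chi2_1df_pvalue(stat))     # roughly 0.025
```

The closed form for the chi-square(1) tail avoids a dependency on a statistics library; `scipy.stats.chi2.sf(x, 1)` would give the same value.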

15.
It has long been known that insufficient consideration of spatial autocorrelation leads to unreliable hypothesis‐tests and inaccurate parameter estimates. Yet, ecologists are confronted with a confusing array of methods to account for spatial autocorrelation. Although Beale et al. (2010) provided guidance for continuous data on regular grids, researchers still need advice for other types of data in more flexible spatial contexts. In this paper, we extend the work of Beale et al. (2010) to count data on both regularly‐ and irregularly‐spaced plots, the latter being commonly encountered in ecological studies. Through a simulation‐based approach, we assessed the accuracy and the type I errors of two frequentist and two Bayesian ready‐to‐use methods in the family of generalized mixed models, with distance‐based or neighbourhood‐based correlated random effects. In addition, we tested whether the methods are robust to spatial non‐stationarity, and to over‐ and under‐dispersion, both typical features of species distribution count data that violate standard regression assumptions. In the simplest of our simulated datasets, the two frequentist methods gave inflated type I errors, while the two Bayesian methods provided satisfactory results. When facing real‐world complexities, the distance‐based Bayesian method (MCMC with Langevin–Hastings updates) performed best of all. We hope that, in the light of our results, ecological researchers will feel more comfortable including spatial autocorrelation in their analyses of count data.

16.
Microtubules (MTs) play an important role in cell division, and their functions are regulated by a set of microtubule-associated proteins (MAPs). Tubulin polymerization promoting protein family member 3 (TPPP3), also known as p20, is a new member of the tubulin polymerization promoting protein (TPPP) family. Previous studies have demonstrated that TPPP3 specifically binds to MTs and positively regulates MT assembly, which leads to significant ultrastructural alterations of the MT network. However, the physiological function of TPPP3 is still largely unknown. In the present study, we showed that knockdown of endogenous TPPP3 by RNA interference (RNAi) suppressed cell proliferation and induced cell cycle arrest in HeLa cells. Furthermore, we showed that depletion of TPPP3 caused mitotic abnormalities, such as the formation of multipolar spindles and chromosome segregation errors, leading to apoptosis in HeLa cells. Our study suggests that TPPP3 plays a crucial role in mitosis by regulating centrosome amplification and/or spindle translocation.

17.
The crystal structure of Cel44A, one of the enzymatic components of the cellulosome of Clostridium thermocellum, was solved at a resolution of 0.96 Å. This enzyme belongs to glycoside hydrolase family (GH family) 44. The structure reveals that Cel44A consists of a TIM-like barrel domain and a beta-sandwich domain. The wild-type and the E186Q mutant structures complexed with substrates suggest that two glutamic acid residues, Glu186 and Glu359, are the active residues of the enzyme. Biochemical experiments were performed to confirm this idea. The structural features indicate that GH family 44 belongs to clan GH-A and that the reaction catalyzed by Cel44A is retaining-type hydrolysis. The stereochemical course of hydrolysis was confirmed by a ¹H NMR experiment using the reduced cellooligosaccharide as a substrate.

18.
Lectins form a class of proteins that have evolved a specialized carbohydrate-binding function. Based on amino acid sequence analysis, several lectin families have been described, and a lectin domain, the (QxW)3 domain, was discussed recently based on 11 family members. In this paper, the (QxW)3 domain family is extended to 45 sequences, several of which have very low sequence identity with the previously known members of the family. A hidden Markov model was used to identify the most divergent members of the family. The expanded set of sequences gives us a more complete appreciation of the conserved features, and the lack thereof, in this lectin family. This, in turn, provides new insights into the structural and functional properties of the individual family members.

19.
Parrots (Psittaciformes) show many highly distinctive features of head morphology. Jaw and tongue musculature have been investigated in seven other orders, for most of which parrot affinities have been postulated. The functional properties and evolution of various modifications found in parrots are discussed. Several features seen in the Tooth-billed pigeon (Didunculus strigirostris) show a significant trend towards conditions in parrots, favouring the view that the Columbiformes are the order most closely related to the Psittaciformes. These features also set Didunculus apart from other pigeons, and it is strongly urged that it be given full family rank.

20.
Molecular Evolution of the Myeloperoxidase Family
Animal myeloperoxidase and its relatives constitute a diverse protein family, which includes myeloperoxidase, eosinophil peroxidase, thyroid peroxidase, salivary peroxidase, lactoperoxidase, ovoperoxidase, peroxidasin, peroxinectin, cyclooxygenase, and others. The members of this protein family share a catalytic domain of about 500 amino acid residues in length, although some members have distinctive mosaic structures. To investigate the evolution of the protein family, we performed a comparative analysis of its members, using the amino acid sequences and the coordinate data available today. The results obtained in this study are as follows: (1) 60 amino acid sequences belonging to this family were collected by database searching. We found a new member of the myeloperoxidase family derived from a bacterium. This is the first report of a bacterial member of this family. (2) An unrooted phylogenetic tree of the family was constructed according to the alignment. Considering the branching pattern in the obtained phylogenetic tree, together with the mosaic features in the primary structures, 60 members of the myeloperoxidase family were classified into 16 subfamilies. (3) We found two molecular features that distinguish cyclooxygenase from the other members of the protein family. (4) Several structurally deviated segments were identified by a structural comparison between cyclooxygenase and myeloperoxidase. Some of the segments seemed to be associated with the functional and/or structural differences between the enzymes. Received: 25 January 2000 / Accepted: 19 July 2000
