Similar articles
1.
Skrabanek L  Niv MY 《Proteins》2008,72(4):1138-1147
Sequence signature databases such as PROSITE, which include protein pattern motifs indicative of a protein's function, are widely used for function prediction studies, cellular localization annotation, and sequence classification. Correct annotation relies on high precision of the motifs. We present a new and general approach for increasing the precision of established protein pattern motifs by including secondary structure constraints (SSCs). We use Scan2S, the first sequence motif-scanning program to optionally include SSCs, to augment PROSITE pattern motifs. The constraints were derived from either the DSSP secondary structure assignment or the PSIPRED predictions for PROSITE-documented true positive hits. The secondary structure-augmented motifs were scanned against all SwissProt sequences, for which secondary structure predictions were precalculated. Against this dataset, motifs with PSIPRED-derived SSCs exhibited improved performance over motifs with DSSP-derived constraints. The precision of 763 of the 782 PSIPRED-augmented motifs remained unchanged or increased compared to the original motifs; 26 motifs showed an absolute precision increase of 10-30%. We provide the complete set of augmented motifs and the Scan2S program at http://physiology.med.cornell.edu/go/scan2s. Our results suggest a general protocol for increasing the precision of protein pattern detection via the inclusion of SSCs.
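The idea of filtering pattern hits with a secondary structure constraint can be sketched as follows. This is a minimal illustration and not the Scan2S implementation: the function names `prosite_to_regex`, `scan` and `precision` are hypothetical, the constraint is assumed to be a regular expression over a per-residue H/E/C string, and the toy sequence and structure string are invented for the example. Only the basic PROSITE pattern elements (`x`, `[..]`, `{..}`, repeat counts, `<` and `>` anchors) are handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (e.g. 'N-{P}-[ST]-{P}') into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":                      # 'x' means any residue
            core = "."
        elif core.startswith("{"):           # {ABC} means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex fragments.
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


def scan(sequence, ss, pattern, ss_constraint=None):
    """Return (start, matched_text) for every hit, including overlapping ones.

    ss is a per-residue secondary structure string of the same length as sequence.
    ss_constraint (assumed format) is a regex that the structure string of the
    hit must match in full; None disables the structural filter.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in regex.finditer(sequence):
        start, end = m.start(1), m.end(1)
        if ss_constraint is None or re.fullmatch(ss_constraint, ss[start:end]):
            hits.append((start, sequence[start:end]))
    return hits


def precision(true_positives, false_positives):
    """Fraction of reported hits that are true positives."""
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    seq = "AANGSAANPTANWSG"          # invented toy sequence
    ss = "HHHHHHHCCCCCCCC"           # invented toy structure string (H = helix, C = coil)
    print(scan(seq, ss, "N-{P}-[ST]-{P}"))          # sequence-only scan
    print(scan(seq, ss, "N-{P}-[ST]-{P}", "C+"))    # keep only hits lying entirely in coil
```

In this toy case the sequence-only scan reports two hits, and requiring the whole hit to lie in a coil region discards the one inside the helix, which is how a structural constraint can remove false positives and raise precision.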

2.
MOTIVATION: Increase the discriminatory power of PROSITE profiles to facilitate function determination and provide biologically relevant information about domains detected by profiles for the annotation of proteins. SUMMARY: We have created a new database, ProRule, which contains additional information about PROSITE profiles. Notably, ProRule contains the positions of structurally and/or functionally critical amino acids, as well as the conditions they must fulfill to play their biological role. These supplementary data should help function determination and annotation of the UniProt Swiss-Prot knowledgebase. ProRule also contains information about the domain detected by the profile in the Swiss-Prot line format. Hence, ProRule can be used to make Swiss-Prot annotation more homogeneous and consistent. The format of ProRule can be extended to provide information about combinations of domains. AVAILABILITY: ProRule can be accessed through ScanProsite at http://www.expasy.org/tools/scanprosite. A file containing the rules will be made available under the PROSITE copyright conditions on our ftp site (ftp://www.expasy.org/databases/prosite/) by the next PROSITE release.

3.
4.
5.
True positive hits of PROSITE sequence patterns are expected to have a characteristic three-dimensional structure. The combined sequence-structure attributes of PROSITE patterns can be used for function prediction of an uncharacterized protein with known primary and 3D structure, a situation that might arise in structural genomics projects. We have found specific examples of true hits of PROSITE patterns displaying structural plasticity by assuming significantly different local conformations, depending upon the context. Our work highlights the importance of taking into account all the known distinct conformations of PROSITE patterns when creating a sensitive 3D template for the pattern for use in functional annotation.

6.
Human embryogenesis involves the complex yet coordinated development of different organs and tissues, which is regulated by the spatiotemporal expression of many genes. Deciphering the gene regulation profile is essential for understanding the molecular basis of human embryo development. While molecular and genetic studies in mouse have served as a valuable tool to understand mammalian development, significant differences exist between human and mouse development at the morphological and genomic levels. Thus it is important to carry out research directly on human embryonic development. Here we review some recent studies on gene regulation during human embryogenesis, with particular focus on the period of organogenesis, which had not been well studied previously. We highlight a gene expression database of human embryos from the 4th to the 9th week. The analysis of gene regulation during this period reveals that genes functioning in a given developmental process tend to be coordinately regulated during human embryogenesis. This feature allows us to use this database to identify new genes important for a particular developmental process or pathway and to deduce the potential function of a novel gene during organogenesis. Such a gene expression atlas should serve as an important resource for the molecular study of human development and pathogenesis.

7.
The identification of MHC-restricted epitopes is an important goal in peptide-based vaccine and diagnostic development. As wet lab experiments for identifying MHC-binding peptides are expensive and time consuming, in silico tools have been developed as fast alternatives, although with low performance. In the present study, we used IEDB training and blind validation datasets for the prediction of peptide binding to fourteen human MHC class I and II molecules using Gibbs motif sampler, weight matrix and artificial neural network methods. Compared with the MHC class I predictors based on sequence weighting (Aroc=0.95 and CC=0.56) and an artificial neural network (Aroc=0.73 and CC=0.25), the MHC class II predictor based on the Gibbs sampler did not perform well (Aroc=0.62 and CC=0.19). The predictive accuracy of the Gibbs motif sampler in identifying the 9-mer core of a peptide binding to DRB1 alleles is also limited (40%), although above random prediction (14%). Therefore, the size of the dataset (training and validation) and the correct identification of the binding core are the two main factors limiting the performance of MHC class II binding peptide prediction. Overall, these data suggest that there is substantial room to improve the quality of the core predictions using novel approaches that capture features of MHC-peptide interactions distinct from those used by current approaches.
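The two figures of merit quoted above can be computed as in the sketch below. It assumes that Aroc denotes the area under the ROC curve and CC the Pearson correlation coefficient between predicted and measured values; the abstract does not spell out either definition, and the function names and toy numbers are illustrative only.

```python
from math import sqrt


def roc_auc(scores, labels):
    """Area under the ROC curve.

    Computed as the fraction of (binder, non-binder) pairs in which the
    binder receives the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    predicted = [0.9, 0.8, 0.4, 0.3]      # invented prediction scores
    is_binder = [1, 0, 1, 0]              # invented binder / non-binder labels
    print(roc_auc(predicted, is_binder))  # 3 of 4 pairs ranked correctly -> 0.75
```

An Aroc of 0.5 corresponds to random ranking and 1.0 to perfect separation of binders from non-binders, which is why a value of 0.62 is read as weak performance.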

8.
Distribution patterns along a slope and vertical root distribution were compared among seven major woody species in a secondary forest of the warm-temperate zone in central Japan in relation to differences in soil moisture profiles through a growing season among different positions along the slope. Pinus densiflora, Juniperus rigida, Ilex pedunculosa and Lyonia ovalifolia, growing mostly on the upper part of the slope with shallow soil depth had shallower roots. Quercus serrata and Quercus glauca, occurring mostly on the lower slope with deep soil showed deeper rooting. Styrax japonica, mainly restricted to the foot slope, had shallower roots in spite of growing on the deepest soil. These relations can be explained by the soil moisture profile under drought at each position on the slope. On the upper part of the slope and the foot slope, deep rooting brings little advantage in water uptake from the soil due to the total drying of the soil and no period of drying even in the shallow soil, respectively. However, deep rooting is useful on the lower slope where only the deep soil layer keeps moist. This was supported by better diameter growth of a deep-rooting species on deeper soil sites than on shallower soil sites, although a shallow-rooting species showed little difference between them.

9.
Several studies have shown that microbial action produces many of the compounds responsible for human odour. In this paper, we compare the pattern of microbial profiles and that of chemical profiles of human axillary odour by using multivariate pattern matching techniques. Approximately 200 subjects from Carinthia, Austria, participated in the study. The microbial profiles were represented by denaturing gradient gel electrophoresis (DGGE) analysis and the axillary odour profiles were determined in sweat samples collected by a stir-bar sampling device and analysed by gas chromatography/mass spectrometry (GC/MS). Both qualitative and quantitative distance metrics were used to construct dissimilarity matrices between samples, which were then used to represent the patterns of these two types of profiles. The distance matrices were then compared by using the Mantel test and the Procrustean test. The results show that on the overall dataset there is no strong correlation between microbial and chemical profiles. When the data are split into family groups, correlations vary according to family, with estimated p-values for the null hypothesis (no correlation) ranging from 0.00 to 0.90. When 32 subjects who followed four basic rules of behaviour were selected, the estimated p-values are 0.00 using qualitative and <0.01 using quantitative distance metrics, providing strong evidence of a connection between the microbial and chemical signatures.
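A permutation-based Mantel test of the kind mentioned above can be sketched as follows. This is a generic textbook version, not the authors' code: it assumes two symmetric distance matrices over the same samples, uses the Pearson correlation of their upper triangles as the statistic, and estimates a one-sided p-value by permuting the sample order of one matrix. The function names, the permutation count and the toy data are illustrative assumptions.

```python
import random
from math import sqrt


def _pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def _upper_triangle(matrix, order):
    """Upper-triangle entries of matrix after reordering its rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p): observed matrix correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    flat_a = _upper_triangle(dist_a, identity)
    observed = _pearson(flat_a, _upper_triangle(dist_b, identity))
    at_least_as_large = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)               # relabel the samples of the second matrix
        if _pearson(flat_a, _upper_triangle(dist_b, order)) >= observed:
            at_least_as_large += 1
    # Count the observed arrangement itself so the estimate is never exactly zero.
    return observed, (at_least_as_large + 1) / (permutations + 1)


if __name__ == "__main__":
    points = [0, 1, 3, 7, 15]            # invented one-dimensional toy samples
    dist = [[abs(a - b) for b in points] for a in points]
    print(mantel(dist, dist))            # identical matrices give r close to 1
```

Rows and columns are permuted together because the entries of a distance matrix are not independent, so an ordinary correlation test on the flattened matrices would overstate significance. With this estimator the smallest attainable p-value is 1/(permutations + 1), so a reported 0.00 is best read as "below the resolution of the permutation count".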

10.
The UNIQEM database, designed to accumulate general microbiological data, is currently used to store and make available information about microorganisms studied and maintained at the Institute of Microbiology, Russian Academy of Sciences. UNIQEM can accumulate and maintain list-form information on a wide range of microorganisms (a property database) and facilitates collecting, processing, and publishing diverse data relating to these microorganisms and their properties (a catalogue database). The database supports the retrieval of microorganisms by specifying an arbitrary set of their properties and has the potential to evolve into a computer instrument for unattended identification of microorganisms. UNIQEM: Akhlynin, D.S. and Gal’chenko, V.F., 1998.

11.
This study was designed to investigate the relationship between traditional skeletal cephalometric measurement and Fourier analysis of the lateral soft-tissue profile. A random sample of 121 untreated subjects of European descent, with wide ranges of malocclusions and underlying facial patterns, was selected in the Orthodontic Unit at the University of Melbourne. Lateral cephalograms were available for all subjects. Both traditional lateral cephalometric analysis and Fourier soft-tissue profile analysis were carried out. Multivariate statistical analysis among 11 hard-tissue cephalometric measurements and the first 50 Fourier harmonics was then performed. This analysis formed the basis for a subsequently proposed soft-tissue prediction model. From this model, 50 predicted x- and y-harmonics were generated for each subject in the total sample. Calculation of Pearson's correlation coefficients between the actual and predicted harmonics revealed strong relationships for many of the lower-order harmonics. To further test the model, the prediction coefficients derived from all 121 subjects were then used to make predictions for the first 50 x- and y-harmonics for a subgroup of 10 independent test subjects. Once again, Pearson's correlations between the actual and predicted harmonics of the test model revealed strong associations in the lower-order harmonics. Superimposition of the actual and predicted soft-tissue outlines, however, revealed that much actual detail in the region between the nose and the chin was still lost using the predicted Fourier harmonics. This suggests that soft-tissue prediction based on this Fourier test model, while already useful in forensic facial reconstruction, may not yet be appropriate for useful diagnosis and planning in clinical disciplines.

12.
13.
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity, as quantified by environment-specific substitution scores, can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these; it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole-structure quality did not affect the quality of loop predictions. Proteins 2010. © 2009 Wiley-Liss, Inc.
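The RMSD figures quoted above measure how far a predicted loop lies from the native one. A minimal sketch of the calculation is given below; it assumes that the two loops contain the same atoms in the same order and are already in a common coordinate frame (for example after superimposing the rest of the structure), and it does not reproduce the exact protocol or atom selection used in the FREAD benchmark. The function name and coordinates are illustrative.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    No superposition is performed: both coordinate sets are assumed to be
    expressed in the same frame already.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return sqrt(squared / len(coords_a))


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]        # invented coordinates in angstroms
    predicted = [(3.0, 4.0, 0.0), (4.5, 4.0, 0.0)]     # every atom displaced by (3, 4, 0)
    print(rmsd(native, predicted))                     # each atom is 5 A away, so RMSD = 5.0
```

Because the deviations are squared before averaging, a few badly placed atoms dominate the value, which is one reason long loops tend to show much larger RMSDs than short ones.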

14.
The French Mediterranean zone is one of the most species-rich in the country, with 3200 species and many endemics. Because a database is a useful synthetic tool for storing and managing data, an ecological Mediterranean flora database was created. Built around five tables, BASECO allows a variety of queries about the botanical and ecological characteristics of about 1800 plants. The database was implemented in Access, a relational database management system. Each species is identified by a code and is characterised by several qualitative traits, each with several modalities, relating to morphology, reproduction, life forms and biogeographical distribution. As far as possible, each trait is documented from one or two pre-defined reference botanical handbooks. The database has many possible uses, even at large scales, and can reveal patterns that are hard to detect with the taxonomic approach alone.

15.
The detection of genes that show similar profiles under different experimental conditions is often an initial step in inferring the biological significance of such genes. Visualization tools are used to identify genes with similar profiles in microarray studies. Given the large number of genes recorded in microarray experiments, gene expression data are generally displayed on a low dimensional plot, based on linear methods. However, microarray data show nonlinearity, due to high-order terms of interaction between genes, so alternative approaches, such as kernel methods, may be more appropriate. We introduce a technique that combines kernel principal component analysis (KPCA) and Biplot to visualize gene expression profiles. Our approach relies on the singular value decomposition of the input matrix and incorporates an additional step that involves KPCA. The main properties of our method are the extraction of nonlinear features and the preservation of the input variables (genes) in the output display. We apply this algorithm to colon tumor, leukemia and lymphoma datasets. Our approach reveals the underlying structure of the gene expression profiles and provides a more intuitive understanding of the gene and sample association.

16.
Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered specificity-engineering efforts. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.

17.
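Synonymous codon usage patterns link nucleotides to amino acids; their diversity mediates the ribosome scanning rate and expands the amount of genetic information a gene can store. With the application of new technologies, it has been found that specific codons and codon binding strength can regulate the ribosome scanning rate and influence protein conformation. Synonymous codon usage patterns affect the ribosome scanning rate in multiple ways and at different stages, and also affect the stability of the mRNA itself. This review briefly describes how codon usage patterns regulate translational elongation of the polypeptide chain as the ribosome scans and translates mRNA, offering ideas and concepts that may guide the optimization of high-level protein expression in bioengineering.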

18.
In the medical domain, rule-based classification models are valuable because they produce comprehensible models that account for their predictions. It is desirable to know not only the classification decisions but also what leads to them. In this paper, we propose a novel dynamic quantitative rule-based classification model, DQB, which integrates quantitative association rule mining and the Artificial Bee Colony (ABC) algorithm to give users an accurate, understandable and interpretable classifier built from class quantitative association rules. As far as we know, this is the first attempt to apply the ABC algorithm to mining quantitative rule-based classifier models, and the first attempt to use quantitative rule-based classification models for classifying microarray gene expression profiles. We also developed a new dynamic local search strategy, DLS, which improves the local search of the ABC algorithm. The performance of the proposed model was compared with well-known quantitative-based classification methods and bio-inspired meta-heuristic classification algorithms, using six gene expression profiles for binary and multi-class cancer datasets. From the results, it can be concluded that DQB achieves a considerable increase in classification accuracy compared with other available algorithms in the literature, and that it provides an interpretable model for biologists. This confirms the significance of the proposed algorithm in constructing a rule-based classifier model, and accordingly shows that these rules capture high-quality, meaningful knowledge extracted from the training set, where all subsets of quantitative rules report close to 100% classification accuracy with a minimum number of genes. Notably, to the best of our knowledge, several new genes were discovered that have not been reported in any past studies. Regarding applicability, based on the results obtained from microarray gene expression analysis, we conclude that DQB can be adopted in different real-world applications with some modifications.

19.
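The cigarette beetle, Lasioderma serricorne, is an important stored-product pest. Long-term chemical control has led it to develop resistance to several conventional fumigants, but it remains susceptible to the new fumigant ethyl formate. Clarifying the role of its cytochrome P450 reductase (CPR) in the metabolic detoxification of ethyl formate is therefore important for monitoring resistance to this agent and for delaying the onset and development of resistance. This study aimed to clone the LsCPR gene of L. serricorne and to analyze its molecular characteristics and expression profile, laying a foundation for further clarifying its role in the detoxification metabolism of ethyl formate. The open reading frame (ORF) of LsCPR was cloned by RT-PCR, and the structure, characteristics and phylogenetic relationships of the encoded protein were analyzed with bioinformatics software. Real-time quantitative PCR was used to determine the expression patterns of LsCPR at different developmental stages (early-instar larvae, late-instar larvae, pupae, early adults and late adults), in different larval tissues (integument, gut, fat body and Malpighian tubules), and after ethyl formate fumigation stress. The ORF of LsCPR is 2,046 bp (GenBank accession no. MZ423209) and encodes 681 amino acids, with the conserved domains typical of insect CPR genes, including the FMN, NADPH and FAD regions. Phylogenetic analysis showed that LsCPR clusters with the CPRs of other coleopteran insects and is most closely related to the CPR of the red flour beetle, Tribolium castaneum. LsCPR was expressed at all developmental stages, with higher levels in late-instar larvae; within larvae it was expressed mainly in the gut, followed by the fat body and Malpighian tubules, with the lowest level in the integument. After 24 h of treatment with ethyl formate at the LC10, LC30 and LC50 concentrations, LsCPR expression was up-regulated with increasing concentration and was significantly higher than in the control; after treatment with ethyl formate at LC50 for different durations (3, 6, 12, 24 and 48 h), LsCPR was up-regulated, with expression peaking at 24 h. LsCPR is inferred to be a candidate gene involved in the metabolism of ethyl formate in L. serricorne.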

20.
Wang X 《RNA (New York, N.Y.)》2008,14(6):1012-1017
MicroRNAs (miRNAs) are short noncoding RNAs that are involved in the regulation of thousands of gene targets. Recent studies indicate that miRNAs are likely to be master regulators of many important biological processes. Due to their functional importance, miRNAs are under intense study at present, and many studies have been published in recent years on miRNA functional characterization. The rapid accumulation of miRNA knowledge makes it challenging to properly organize and present miRNA function data. Although several miRNA functional databases have been developed recently, this remains a major bioinformatics challenge for the miRNA research community. Here, we describe a new online database system, miRDB, for miRNA target prediction and functional annotation. A flexible web search interface was developed for the retrieval of target prediction results, which were generated with a new bioinformatics algorithm we developed recently. Unlike most other miRNA databases, miRNA functional annotations in miRDB are presented with a primary focus on mature miRNAs, which are the functional carriers of miRNA-mediated gene expression regulation. In addition, a wiki editing interface was established to allow anyone with Internet access to contribute to miRNA functional annotation. This is a new attempt to develop an interactive community-annotated miRNA functional catalog. All data stored in miRDB are freely accessible at http://mirdb.org.
