20 similar documents found (search time: 0 ms)
MOTIVATION: Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires the identification of homologous genes and then the interpretation of their homology relationships (orthology or paralogy). To provide help in this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases. RESULTS: First, we have developed and implemented an algorithm to infer speciation and duplication events by comparison of gene and species trees (tree reconciliation). Second, we have developed a general method to search our databases for gene families whose tree topology matches a particular tree pattern. This algorithm of unordered tree pattern matching has been implemented in the FamFetch graphical interface. With the help of a graphical editor, the user can specify the topology of the tree pattern, and set constraints on its nodes and leaves. Then, this pattern is compared with all the phylogenetic trees of the database, to retrieve the families in which one or several occurrences of this pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.
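
The tree reconciliation step described in this abstract can be illustrated with the textbook LCA-mapping approach. This is a minimal sketch and not necessarily the algorithm the authors implemented: trees are assumed to be nested Python tuples, and the names `build_index`, `lca`, `reconcile` and the toy gene names are hypothetical. Each internal gene-tree node is mapped to the last common ancestor (LCA) of its children's species; a node whose mapping equals the mapping of one of its children is labeled a duplication, otherwise a speciation.

```python
def build_index(species_tree):
    """Return parent and depth tables for every node of a nested-tuple tree."""
    parent, depth = {}, {}
    stack = [(species_tree, None, 0)]
    while stack:
        node, par, d = stack.pop()
        parent[node] = par
        depth[node] = d
        if isinstance(node, tuple):  # internal node: visit its children
            for child in node:
                stack.append((child, node, d + 1))
    return parent, depth


def lca(a, b, parent, depth):
    """Walk the deeper node upward until both nodes coincide."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def reconcile(gene_tree, gene_to_species, species_tree):
    """Label each internal gene-tree node as 'speciation' or 'duplication'."""
    parent, depth = build_index(species_tree)
    events = []

    def visit(node):
        if not isinstance(node, tuple):  # leaf: look up the gene's species
            return gene_to_species[node]
        child_maps = [visit(child) for child in node]
        mapped = child_maps[0]
        for other in child_maps[1:]:
            mapped = lca(mapped, other, parent, depth)
        # Duplication if the node maps to the same species node as a child.
        kind = "duplication" if mapped in child_maps else "speciation"
        events.append((node, kind))
        return mapped

    visit(gene_tree)
    return events  # post-order list of (gene-tree node, event kind)


# Toy example (hypothetical genes): two human/mouse gene pairs plus a chicken gene.
species_tree = (("human", "mouse"), "chicken")
gene_to_species = {"h1": "human", "m1": "mouse", "h2": "human",
                   "m2": "mouse", "c1": "chicken"}
gene_tree = ((("h1", "m1"), ("h2", "m2")), "c1")
events = reconcile(gene_tree, gene_to_species, species_tree)
```

In the toy example, the node joining the two human/mouse pairs maps to the same species node as both of its children, so it is reported as a duplication; under that labeling, h1 and m1 would be orthologs while h1 and m2 would be paralogs.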

Identification of anonymous proteins from two-dimensional (2-D) gels by peptide mass fingerprinting is one area of proteomics that can greatly benefit from a simple, automated workflow to minimize sample contamination and facilitate high-throughput sample processing. In this investigation we outline a workflow employing robotic automation at each step subsequent to 2-D gel electrophoresis. As proof-of-concept, 96 protein spots from a 2-D gel were analyzed using this approach. Whole protein (1 mg) from mature, dry soybean (Glycine max [L.] Merr.) cv. Jefferson seed was resolved by high resolution 2-D gel electrophoresis. Approximately 150 proteins were observed after staining with Coomassie Blue. The rather low number of detected proteins was due to the fact that the dynamic range of protein expression was greater than 100-fold. The most abundant proteins were seed storage proteins which in total represented over 60% of soybean seed protein. Using peptide mass fingerprinting 44 protein spots were identified. Identification of soybean proteins was greatly aided by the use of annotated, contiguous Expressed Sequence Tag (EST) databases which are available for public access (UniGene, ftp.ncbi.nih.gov/repository/UniGene/). Searches were orders of magnitude faster when compared to searches of unannotated EST databases and resulted in a higher frequency of valid, high-scoring matches. Some abundant, non seed storage proteins identified in this investigation include an isoelectric series of sucrose binding proteins, alcohol dehydrogenase and seed maturation proteins. This survey of anonymous seed proteins will serve as the basis for future comparative analysis of seed-filling in soybean as well as comparisons with other soybean varieties.

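Genome-wide identification and expression analysis of the LBD gene family in common tobacco   Cited by: 2 (self-citations: 0; by others: 2)
LBD is a gene family whose members contain the LOB (lateral organ boundaries) domain and play a very important role in plant development. Using bioinformatic methods, the LBD genes in the common tobacco genome were identified on the basis of Arabidopsis LBD gene sequences, and the family members were analyzed for sequence features, phylogeny and expression profiles. The results show that the common tobacco genome contains 98 LBD genes with a relatively simple gene structure, generally 1 to 3 exons. The LBD gene family can be divided into two classes, I and II. Both classes contain the conserved CX_2CX_6CX_3C domain, but class II lacks the "coiled-coil" secondary structure formed by LX_6LX_3LX_6L. Based on a phylogenetic tree built together with the Arabidopsis LBD proteins, the family can be further divided into five subfamilies (Ia, Ib, Ic, Id and II). Alignment of the LBD genes against expressed sequence tags (ESTs) showed that 36 genes have EST support. Analysis of EST, microarray and transcriptome data showed that the LBD genes have different tissue expression patterns, with some genes showing tissue specificity. These results lay a foundation for further study of the function of the LBD gene family in common tobacco.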

Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
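
The randomized-database idea in this abstract can be sketched in a few lines. The abstract does not state the estimator, so the formula below (twice the number of accepted decoy hits divided by all accepted hits, a common convention for combined forward-plus-decoy searches) is an assumption, as are the function names and the toy scores.

```python
import random


def reversed_decoy(sequence):
    """Build a decoy entry by reversing a protein sequence."""
    return sequence[::-1]


def reshuffled_decoy(sequence, rng):
    """Build a decoy entry by shuffling residues (composition is preserved)."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def estimate_false_positive_rate(hits, threshold):
    """Estimate the false positive rate among hits scoring >= threshold.

    `hits` is a list of (score, is_decoy) pairs from a combined search of a
    forward database with a decoy database appended. Assumed estimator:
    2 * decoy hits / all accepted hits, on the premise that a wrong match is
    equally likely to land in the forward or the decoy half.
    """
    accepted = [is_decoy for score, is_decoy in hits if score >= threshold]
    if not accepted:
        return 0.0
    return min(1.0, 2 * sum(accepted) / len(accepted))


# Toy, made-up search results: (score, matched a decoy entry?)
hits = [(50, False), (45, False), (40, True), (38, False), (20, True)]
rate_loose = estimate_false_positive_rate(hits, threshold=35)
rate_strict = estimate_false_positive_rate(hits, threshold=44)
decoy = reshuffled_decoy("PEPTIDE", random.Random(0))
```

Raising the threshold until the estimated rate drops below a chosen target is one way to set study-specific acceptance criteria of the kind the abstract argues for.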

The ProDom database is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases. An associated database, ProDom-CG, has been derived as a restriction of ProDom to completely sequenced genomes. The ProDom construction method is based on iterative PSI-BLAST searches and multiple alignments are generated for each domain family. The ProDom web server provides the user with a set of tools to visualise multiple alignments, phylogenetic trees and domain architectures of proteins, as well as a BLAST-based server to analyse new sequences for homologous domains. The comprehensive nature of ProDom makes it particularly useful to help sustain the growth of InterPro.

Systematic and fully automated identification of protein sequence patterns.   Cited by: 4 (self-citations: 0; by others: 4)
We present an efficient algorithm to systematically and automatically identify patterns in protein sequence families. The procedure is based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automated discovery of patterns in 974 PROSITE families (the complete subset of PROSITE families which are defined by patterns and contain DR records). Splash generates patterns with better specificity and undiminished sensitivity, or vice versa, in 28% of the families; identical statistics were obtained in 48% of the families, worse statistics in 15%, and mixed behavior in the remaining 9%. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with the corresponding PROSITE pattern. The procedure is sufficiently rapid to enable its use for daily curation of existing motif and profile databases. In addition, our results show that the statistical significance of discovered patterns correlates well with their biological significance. The trypsin subfamily of serine proteases is used to illustrate this method's ability to exhaustively discover all motifs in a family that are statistically and biologically significant. Finally, we discuss applications of sequence patterns to multiple sequence alignment and the training of more sensitive score-based motif models, akin to the procedure used by PSI-BLAST. All results are available at http://www.research.ibm.com/spat/.

The alignment of homologous sequences with each other and their display has proved a difficult task, despite a frequent requirement for this process. HOMED enables related sequences to be edited and listed in parallel with each other. The editor function uses a full screen editor which emulates the text editors KED and EDT (on PDP-11 and VAX-11 respectively) and which can be adapted to emulate other text editors. This emulation has been adopted to simplify user learning of editing functions. HOMED provides functions for listing the sequences in a variety of formats and for generating a consensus sequence as well as providing a series of tools for maintenance of the sequence database. HOMED has been implemented in Pascal in a modular fashion to enhance portability. Received on November 27, 1986; accepted on January 8, 1987
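
Generating a consensus sequence from aligned sequences, one of the functions this abstract mentions, can be sketched as a column-wise majority vote. The abstract does not describe HOMED's actual rule (and HOMED itself is written in Pascal), so the function name `consensus`, the gap character and the choice to ignore gaps when voting are assumptions made for illustration.

```python
from collections import Counter


def consensus(aligned_sequences, gap="-"):
    """Return the most common non-gap residue in each alignment column.

    All sequences must already be aligned to the same length. A column that
    contains only gaps yields a gap. Ties go to the residue seen first.
    """
    if len({len(seq) for seq in aligned_sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    result = []
    for column in zip(*aligned_sequences):
        counts = Counter(residue for residue in column if residue != gap)
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)


# Toy alignment of three related sequences.
alignment = ["ACGT-A", "ACGTTA", "AGGT-A"]
result = consensus(alignment)
```

Ignoring gaps means a residue present in a single sequence can still appear in the consensus; a stricter variant would let the gap character take part in the vote.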

HOMED: a homologous sequence editor   Cited by: 5 (self-citations: 0; by others: 5)

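The conserved-region sequences of the LEAFY (LFY) gene, which plays an important role in plant flowering, were analyzed and a pair of PCR primers, each 23 bp long, was designed. Using mango genomic DNA as the template, a DNA fragment of 822 bp was amplified by PCR and cloned into the pGEM-T Easy vector. Sequencing and sequence analysis showed that a fragment from the 3' end of the mango LFY homolog (miLFY) had been obtained. The fragment contains one 415 bp intron, and its coding region encodes 135 amino acids. The sequence has been deposited in GenBank (accession no. AY189684). A homology search in GenBank showed that its amino acid sequence is 74% to 97% homologous to the amino acid sequences of LFY homologs from other plants, suggesting that they have similar functions.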

In this work, the commonly used algorithms for mass spectrometry based protein identification, Mascot, MS-Fit, ProFound and SEQUEST, were studied with respect to the selectivity and sensitivity of their searches. The influence of various search parameters was also investigated. Approximately 6600 searches were performed using different search engines with several search parameters to establish a statistical basis. The applied mass spectrometric data set was chosen from a current proteome study. The huge amount of data could only be handled with computational assistance. We present a software solution for fully automated triggering of several peptide mass fingerprinting (PMF) and peptide fragmentation fingerprinting (PFF) algorithms. The development of this high-throughput method made an intensive evaluation based on data acquired in a typical proteome project possible. Previous evaluations of PMF and PFF algorithms were mainly based on simulations.

A database (DB) describing the relationships between species and their metabolites would be useful for metabolomics research, because it targets systematic analysis of enormous numbers of organic compounds with known or unknown structures in metabolomics. We constructed an extensive species-metabolite DB for plants, the KNApSAcK Core DB, which contains 101,500 species-metabolite relationships encompassing 20,741 species and 50,048 metabolites. We also developed a search engine within the KNApSAcK Core DB for use in metabolomics research, making it possible to search for metabolites based on an accurate mass, molecular formula, metabolite name or mass spectra in several ionization modes. We also have developed databases for retrieving metabolites related to plants used for a range of purposes. In our multifaceted plant usage DB, medicinal/edible plants are related to the geographic zones (GZs) where the plants are used, their biological activities, and formulae of Japanese and Indonesian traditional medicines (Kampo and Jamu, respectively). These data are connected to the species-metabolites relationship DB within the KNApSAcK Core DB, keyed via the species names. All databases can be accessed via the website http://kanaya.naist.jp/KNApSAcK_Family/. KNApSAcK WorldMap DB comprises 41,548 GZ-plant pair entries, including 222 GZs and 15,240 medicinal/edible plants. The KAMPO DB consists of 336 formulae encompassing 278 medicinal plants; the JAMU DB consists of 5,310 formulae encompassing 550 medicinal plants. The Biological Activity DB consists of 2,418 biological activities and 33,706 pairwise relationships between medicinal plants and their biological activities. Current statistics of the binary relationships between individual databases were characterized by the degree distribution analysis, leading to a prediction of at least 1,060,000 metabolites within all plants. 
In the future, the study of metabolomics will need to take this huge number of metabolites into consideration.
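
Searching for metabolites by accurate mass, as this abstract describes, usually means matching a query mass against stored monoisotopic masses within a tolerance. The sketch below is not the KNApSAcK implementation or API: the tiny in-memory table, the parts-per-million (ppm) tolerance and the function name `search_by_mass` are assumptions, and the three compounds are well-known examples chosen only for illustration.

```python
# Illustrative table: (name, molecular formula, monoisotopic mass in Da).
METABOLITES = [
    ("glucose", "C6H12O6", 180.0634),
    ("caffeine", "C8H10N4O2", 194.0804),
    ("quercetin", "C15H10O7", 302.0427),
]


def search_by_mass(query_mass, tolerance_ppm=5.0, table=METABOLITES):
    """Return (name, formula, mass, error_ppm) for entries within tolerance.

    The mass error is expressed in parts per million of the stored mass,
    and results are sorted by absolute error, best match first.
    """
    matches = []
    for name, formula, mass in table:
        error_ppm = (query_mass - mass) / mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            matches.append((name, formula, mass, error_ppm))
    return sorted(matches, key=lambda entry: abs(entry[3]))


# A measured neutral mass close to caffeine.
found = search_by_mass(194.0800)
```

A ppm tolerance scales with the mass being measured, which is why it is often preferred over a fixed absolute window; a real search would also have to account for the adduct of each ionization mode before comparing masses.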

Some widely used standard protocols for the separation of phenylthiohydantoin amino acid derivatives by reverse-phase gradient HPLC do not provide separation of the phenylthiohydantoin derivative of tryptophan (PTH-Trp) from diphenylurea (DPU), a by-product generated during Edman degradation of proteins in variable amounts. Furthermore, PTH-Trp is usually recovered in low yield under typical experimental conditions used with automated sequencing equipment. These factors may compromise the unambiguous assignment of tryptophan residues in automated protein sequence analysis, especially when sequencing is performed at high sensitivity. We devised a reverse-phase HPLC method which allows the separation of DPU and PTH-Trp and therefore the correct assignment of PTH-Trp. The method is based on a modification of the HPLC gradient used to elute and separate all PTH amino acids of interest. With Applied Biosystems Model 477A protein sequencers with on-line PTH amino acid identification, the correct assignment of tryptophan was consistent and reproducible even when sequencing at very high sensitivity (5 pmol).

B A Burkhart, L C Skow, M Negishi. Gene, 1990, 87(2): 205-211
Steroid 15 alpha-hydroxylase (P450 15 alpha) activity is concomitant with the expression of two types of mRNA in the mouse liver. Two discrete genes, designated 15 alpha oh-1 and 15 alpha oh-2, that encode the two mRNAs were recovered from total genomic libraries of the inbred mouse strains 129/J and C57Bl/6J and identified by cDNA hybridization, restriction-site analysis and partial nucleotide sequence. Both genes are approx. 9 kb long and share significant homology, including flanking regions, over a region of at least 30 kb. The two distinct 15 alpha oh genes are members of a larger family of homologous genes and/or pseudogenes of unknown function. The most extensive sequence homology among family members is in the 3' portion of the gene, with progressively less homology toward the 5' end. The far 5' portions of 15 alpha oh-1 and 15 alpha oh-2 are very similar to one another but there is no observed homology with other genes of the family. The two 15 alpha oh genes and the homologous family have been localized to mouse chromosome 7 by somatic cell hybrid mapping. Analysis of a restriction fragment length polymorphism in recombinant inbred mice shows a close linkage of 15 alpha oh-1 and 15 alpha oh-2 with the Coh locus.
