Similar articles
 20 similar articles found (search time: 15 ms)
1.
Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years.

2.
ADSP - a new package for computational sequence analysis   Cited by: 3 (self-citations: 0, citations by others: 3)
A new protein sequence analysis package, ADSP, is described, of which the SOMAP Screen-Oriented Multiple Alignment Procedure forms an integral part. ADSP (Algorithms and Data Structures for Protein sequence analysis) incorporates facilities to generate potent pattern-recognition discriminators and offers four algorithms with which to scan any NBRF format sequence database: the package has been designed, in particular, to interface with the OWL composite sequence database, one of the largest, distributed non-redundant sources of sequence data of its kind. The system incorporates a powerful method for compound feature analysis, which provides the basis for characterizing and predicting the occurrence of complete protein superfamilies and for pinpointing the emergence of related subfamilies. Used iteratively, the approach allows diagnostic performance to be rigorously refined and its efficacy to be assessed both qualitatively and quantitatively, and results in the generation of refined structural or functional features suitable for entry into a database: this compilation of characteristic signatures is distinct from, but complementary to, widely used compendia of pattern templates such as PROSITE.

3.
Artemis is a widely used software tool for annotating and viewing sequence data. No database is required to use Artemis. Instead, individual sequence data files can be analysed with little or no formatting, making it particularly suited to the study of small genomes and chromosomes, and straightforward for a novice user to get started. Since its release in 1999, Artemis has been used to annotate a diverse collection of prokaryotic and eukaryotic genomes, ranging from Streptomyces coelicolor to, more recently, a large proportion of the Plasmodium falciparum genome. Artemis allows annotated genomes to be easily browsed and makes it simple to add useful biological information to raw sequence data. This paper gives an overview of some of the features of Artemis, including how it facilitates manual gene prediction and how it can provide an overview of entire chromosomes or small compact genomes, which is useful for uncovering unusual features such as pathogenicity islands.

4.
5.
PANZEA is the first public database for studying maize genomic diversity. It was initiated as a repository of genomic diversity for an NSF Plant Genome project on 'Maize Evolutionary Genomics'. PANZEA is hosted at the Bioinformatics Research Center, North Carolina State University, and is open to the public (http://statgen.ncsu.edu/panzea). PANZEA is designed to capture the interrelationships between germplasm, molecular diversity, phenotypic diversity and genome structure. It has the ability to store, integrate and visualize DNA sequence, enzymatic, SSR (simple sequence repeat) marker, germplasm and phenotypic data. The relational data model is selected and implemented in Oracle. An automated DNA sequence data submission tool has been created that allows project researchers to remotely submit their DNA sequence data directly to PANZEA. On-line database search forms and reports have been created to allow users to search or download germplasm, DNA sequence, gene/locus data and much more, directly from the web.

6.
Kumar D  Mittal Y 《Bioinformation》2011,6(3):134-136
Lectins are a class of carbohydrate-binding proteins widely recognized to play crucial roles in many cell-cell recognition events that trigger important cellular processes. Their members are diverse in protein structure, carbohydrate affinity and specificity, broader biological roles and potential applications. To make effective use of these diverse data, an animal lectin database, 'AnimalLectinDb', has been developed as a first step, with information on taxonomy, structure, domain architecture, molecular sequence, carbohydrate structure and blood group specificity. It is expected to be of high value not only for basic study in lectin biology but also for advanced research pursuing applications in biotechnology, immunology, and clinical practice. AVAILABILITY: The database is available for free at http://www.research-bioinformatics.in.

7.
8.
9.
A strategy has been developed for the construction of a validated, comprehensive composite protein sequence database. Entries are amalgamated from primary source databases by a largely automated set of processes in which redundant and trivially different entries are eliminated. A modular approach has been adopted to allow scientific judgement to be used at each stage of database processing and amalgamation. Source databases are assigned a priority depending on the quality of sequence validation and commenting. Rejection of entries from the lower priority database, in each pairwise comparison of databases, is carried out according to optionally defined redundancy criteria based on sequence segment mismatches. Efficient algorithms for this methodology are embodied in the COMPO software system. COMPO has been applied for over 2 years in construction and regular updating of the OWL composite protein sequence database from the source databases NBRF-PIR, SWISS-PROT, a GenBank translation retrieved from the feature tables, NBRF-NEW, NEWAT86, PSD-KYOTO and the sequences contained in the Brookhaven protein structure databank. OWL is part of the ISIS integrated data resource of protein sequence and structure [Akrigg et al. (1988) Nature, 335, 745-746]. The modular nature of the integration process greatly facilitates the frequent updating of OWL following releases of the source databases. The extent of redundancy in these sources is revealed by the comparison process. The advantages of a robust composite database for sequence similarity searching and information retrieval are discussed.

10.
11.
《Genomics》2022,114(3):110348
Single nucleotide polymorphisms (SNPs) are widely used in genetic research and molecular breeding. To date, the genomes of many vegetable crops have been assembled, and hundreds of core germplasms for each vegetable have been sequenced. However, these data are not currently easily accessible because they are stored on different public databases. Therefore, a vegetable crop SNP database should be developed that hosts SNPs demonstrated to have a high success rate in genotyping for genetic research (herein, “alpha SNPs”). We constructed a database (VegSNPDB, http://www.vegsnpdb.cn/) containing the sequence data of 2032 germplasms from 16 vegetable crop species. VegSNPDB hosts 118,725,944 SNPs, of which 4,877,305 are alpha SNPs. SNPs can be searched by chromosome number, position, SNP type, genetic population, or specific individuals, as well as by the values of MAF, PIC, and heterozygosity. We hope that VegSNPDB will become an important SNP database for the vegetable research community.
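
The three search statistics named above (MAF, PIC, and heterozygosity) can be illustrated with a minimal sketch. The abstract does not say how VegSNPDB computes them, so the following assumes the commonly used definitions: minor allele frequency as the smallest allele frequency, PIC as 1 - Σp_i² - Σ_{i<j} 2p_i²p_j², and heterozygosity as the observed fraction of heterozygous genotypes. The function name `snp_stats` and the two-character genotype encoding are hypothetical.

```python
from collections import Counter


def snp_stats(genotypes):
    """Return (MAF, PIC, observed heterozygosity) for one SNP.

    genotypes: list of 2-character diploid calls such as "AA", "AG", "GG"
    (an assumed encoding, not VegSNPDB's actual format).
    """
    # Count alleles across all diploid genotype calls.
    alleles = Counter(allele for g in genotypes for allele in g)
    total = sum(alleles.values())
    freqs = [n / total for n in alleles.values()]

    # Minor allele frequency: 0 when the site is monomorphic.
    maf = min(freqs) if len(freqs) > 1 else 0.0

    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    sum_sq = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = 1 - sum_sq - cross

    # Observed heterozygosity: fraction of calls with two different alleles.
    het = sum(1 for g in genotypes if g[0] != g[1]) / len(genotypes)
    return maf, pic, het


maf, pic, het = snp_stats(["AA", "AG", "GG", "AG"])
print(maf, pic, het)  # 0.5 0.375 0.5
```

For a biallelic SNP the PIC expression reduces to 1 - p² - q² - 2p²q², so it peaks at 0.375 when both alleles are equally frequent, as in the example.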

12.
From its origin, the PIR has aspired to support research in computational biology and genomics through the compilation of a comprehensive, quality controlled and well-organized protein sequence information resource. The resource originated with the pioneering work of the late Margaret O. Dayhoff in the early 1960s. Since 1988, the Protein Sequence Database has been maintained collaboratively by PIR-International, an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. The work of the resource is widely distributed and is available on the World Wide Web, via FTP, E-mail server, CD-ROM and magnetic media. It is widely redistributed and incorporated into many other protein sequence data compilations including SWISS-PROT and the Entrez system of the NCBI.

13.
A computerized database containing DNA sequence information regarding human HPRT mutants has been created. The database itself is in the dBASE format and contains information on about 1500 mutants. In addition, an IBM PC compatible software package to analyze the information in the database has been developed. Both the database and software are freely available via the Internet.

14.
The Enzymes and Metabolic Pathways database (EMP) is an encoding of the contents of over 10 000 original publications on the topics of enzymology and metabolism. This large body of information has been transformed into a queryable database. An extraction of over 1800 pictorial representations of metabolic pathways from this collection is freely available on the World Wide Web. We believe that this collection will play an important role in the interpretation of genetic sequence data, as well as offering a meaningful framework for the integration of many other forms of biological data.

15.
The Los Alamos hepatitis C sequence database   Cited by: 6 (self-citations: 0, citations by others: 6)
MOTIVATION: The hepatitis C virus (HCV) is a significant threat to public health worldwide. The virus is highly variable and evolves rapidly, making it an elusive target for the immune system and for vaccine and drug design. At present, some 30 000 HCV sequences have been published. A central website that provides annotated sequences and analysis tools will be helpful to HCV scientists worldwide. RESULTS: The HCV sequence database collects and annotates sequence data and provides them to the public via a website that contains a user-friendly search interface and a large number of sequence analysis tools, based on the model of the highly regarded Los Alamos HIV database. The HCV sequence database was officially launched in September 2003. Since then, its usage has steadily increased and is now at an average of approximately 280 visits per day from distinct IP addresses. AVAILABILITY: The HCV website can be accessed via http://hcv.lanl.gov and http://hcv-db.org.

16.
Zeng MS  Li DJ  Liu QL  Song LB  Li MZ  Zhang RH  Yu XJ  Wang HM  Ernberg I  Zeng YX 《Journal of virology》2005,79(24):15323-15330
To date, the only entire Epstein-Barr virus (EBV) genomic sequence available in the database is the prototype B95.8, which was derived from an individual with infectious mononucleosis. A causative link between EBV and nasopharyngeal carcinoma (NPC), a disease with a distinctly high incidence in southern China, has been widely investigated. However, no full-length analysis of any substrain of EBV from this area has been reported. In this study, we analyzed the entire genomic sequence of an EBV strain from a patient with NPC in Guangdong, China. This EBV strain was termed GD1 (Guangdong strain 1), and the full-length sequence of GD1 was submitted to the GenBank database. The assigned accession number is AY961628. The entire GD1 sequence is 171,656 bp in length, with 59.5% G+C content and 40.5% A+T content. We detected many sequence variations in GD1 compared to prototypical strain B95.8, including 43 deletion sites, 44 insertion sites, and 1,413 point mutations. Furthermore, we evaluated the frequency of some of these GD1 mutations in Cantonese NPC patients and found them to be highly prevalent. These findings suggest that GD1 is highly representative of the EBV strains isolated from NPC patients in Guangdong, China, an area with the highest incidence of NPC in the world. Furthermore, these findings provide the second full-length sequence analysis of any EBV strain as well as the first full-length sequence analysis of an NPC-derived EBV strain.

17.
A computerized database containing DNA sequence information regarding human p53 mutants has been created. The database itself is in the dBASE format and contains information on nearly 3000 mutants. In addition, an IBM PC compatible software package to analyze the information in the database has been developed. Both the database and software are freely available via the Internet.

18.
The KEGG pathway maps are widely used as a reference data set for inferring high-level functions of the organism or the ecosystem from its genome or metagenome sequence data. The KEGG modules, which are tighter functional units often corresponding to subpathways in the KEGG pathway maps, are designed for better automation of genome interpretation. Each KEGG module is represented by a simple Boolean expression of KEGG Orthology (KO) identifiers (K numbers), enabling automatic evaluation of the completeness of genes in the genome. Here we focus on metabolic functions and introduce reaction modules for improving annotation and signature modules for inferring metabolic capacity. We also describe how genome annotation is performed in KEGG using the manually created KO database and the computationally generated SSDB database. The resulting KEGG GENES database with KO (K number) annotation is a reference sequence database to be compared for automated annotation and interpretation of newly determined genomes.
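
The idea of evaluating module completeness from a Boolean expression of K numbers can be sketched as follows. This is only an illustration under stated assumptions: the nested-tuple representation, the function name `is_complete`, and the identifiers `K_a`, `K_b`, `K_c` are hypothetical placeholders, and the sketch does not parse KEGG's own module definition syntax.

```python
def is_complete(expr, present):
    """Evaluate a Boolean expression of K numbers against a genome.

    expr: either a K number (str) or a tuple ("and", ...) / ("or", ...)
          whose remaining items are sub-expressions (assumed encoding).
    present: set of K numbers annotated in the genome.
    """
    if isinstance(expr, str):
        # Leaf: the ortholog is either annotated in the genome or not.
        return expr in present
    op, *args = expr
    if op == "and":
        # Every step of the module must be satisfied.
        return all(is_complete(a, present) for a in args)
    if op == "or":
        # Alternative orthologs can fill the same step.
        return any(is_complete(a, present) for a in args)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical two-step module: step 1 needs K_a, step 2 needs K_b or K_c.
module = ("and", "K_a", ("or", "K_b", "K_c"))
print(is_complete(module, {"K_a", "K_c"}))  # True
print(is_complete(module, {"K_b", "K_c"}))  # False
```

Keeping the expression as data rather than code means the same evaluator can be run over every module for every genome, which is the kind of automation the abstract describes.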

19.

Background  

In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences.
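
A Monte-Carlo Z-score of the kind mentioned above can be sketched by comparing the observed Smith-Waterman score with the scores obtained after repeatedly shuffling one sequence. The sketch below is a simplification, not the procedure used by CluSTr or Protein World: it assumes a plain match/mismatch score with a linear gap penalty instead of a substitution matrix, and the function names, parameter values, and number of shuffles are all illustrative choices.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def z_score(a, b, n_shuffles=100, seed=0):
    """Z = (observed - mean(shuffled)) / stdev(shuffled)."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        # Shuffling keeps the composition of b but destroys its order.
        rng.shuffle(letters)
        scores.append(sw_score(a, "".join(letters)))
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        # Degenerate case: all shuffled scores are identical.
        return float("inf") if observed > mu else 0.0
    return (observed - mu) / sigma


seq = "MKTAYIAKQRQISFVKSHFSRQ"
print(z_score(seq, seq))
```

Because the shuffled sequences preserve residue composition, the Z-score measures how much of the observed score depends on residue order rather than on composition alone.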

20.
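Introduction to the UniProt protein database   Cited by: 1 (self-citations: 0, citations by others: 1)
罗静初 (Luo Jingchu) 《生物信息学》 (Chinese Journal of Bioinformatics) 2019,17(3):131-144
UniProt (https://www.uniprot.org/) is an internationally renowned protein database comprising three main parts: the UniProtKB knowledgebase, the UniParc archive, and the UniRef reference sequence sets. UniProtKB is the core of UniProt; in addition to protein sequence data, it contains a large amount of annotation. UniProtKB is divided into two sections, Swiss-Prot and TrEMBL. The more than 500,000 sequences in Swiss-Prot are all manually reviewed and annotated, whereas the more than 140 million sequences in TrEMBL are translated from protein-coding sequences in the EMBL nucleotide sequence database and annotated by computer according to defined rules. UniParc merges records of the same protein stored in different databases into a single record to avoid redundancy and assigns each sequence a unique identifier. UniRef groups the sequences in UniProtKB and UniParc by degree of similarity into three data sets: UniRef100, UniRef90, and UniRef50. The UniProt website offers users an efficient, practical advanced search system and extensive help documentation. UniProt publishes a new release every 4 weeks together with statistical reports, from which users can learn basic information such as the size and update status of the database, its data categories, and its species distribution, and can view statistics on general annotation, sequence feature annotation, and database cross-references. UniProt is currently the non-redundant protein sequence database with the most complete sequence data and the richest annotation worldwide; since its creation at the beginning of this century, it has provided a valuable resource for the life sciences.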
