Found 20 similar articles (search time: 0 ms)
1.
With PACRAT (Patterns, Analyses, Correlations. Remote Archive Testbed) we present an online database solution to the problem of accessing high-confidence sequences with specific relationships to classes of genes, such as upstream intergenic regions attached to tRNA genes. In addition, the software contains a data warehousing and analysis-facilitating suite to streamline the process of analyzing the collected data. An unexpected additional benefit of the system is that it also provides easy access to sequences of lower confidence, and may help with tasks such as resolving ORF-call conflicts in genomic annotation projects.
2.
Daniel Luis Notari Aurione Molin Vanessa Davanzo Douglas Picolotto Helena Graziottin Ribeiro Scheila de Avila e Silva 《Bioinformation》2014,10(6):381-383
A whole genome contains not only coding regions but also non-coding regions. Intergenic regions are located between the end of one coding region and the beginning of the next. For this reason, information about the gene regulation process lies in intergenic regions. There is no easy way to obtain intergenic regions from currently available databases. IntergenicDB was developed to integrate data on intergenic regions and their related gene information from NCBI databases. The main goal of IntergenicDB is to offer a user-friendly database of intergenic sequences from bacterial genomes.
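As a minimal sketch of the idea in this abstract, intergenic regions can be derived as the gaps between consecutive coding regions once the gene coordinates are sorted. The function name `intergenic_regions` and the 1-based inclusive coordinate convention are assumptions made for illustration; this is not IntergenicDB's own code, and it treats the sequence as linear rather than circular.

```python
def intergenic_regions(genes):
    """Return the gaps between consecutive genes.

    genes: iterable of (start, end) tuples using 1-based inclusive
    coordinates on a linear sequence (an assumed convention).
    """
    regions = []
    prev_end = None
    for start, end in sorted(genes):
        # A gap exists only if this gene starts past the furthest end seen so far.
        if prev_end is not None and start > prev_end + 1:
            regions.append((prev_end + 1, start - 1))
        # Track the furthest end so overlapping genes do not create false gaps.
        prev_end = end if prev_end is None else max(prev_end, end)
    return regions


if __name__ == "__main__":
    print(intergenic_regions([(1, 100), (150, 300), (250, 400), (500, 600)]))
```

Tracking the furthest gene end, rather than only the previous gene's end, keeps overlapping or nested genes from producing spurious intergenic intervals.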
Availability
http://intergenicdb.bioinfoucs.com/
3.
David Goudenège Stéphane Avner Céline Lucchetti-Miganeh Frédérique Barloy-Hubler 《BMC microbiology》2010,10(1):88
Background
The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes) but the experimental determination of the sub-cellular localization of proteomes is laborious and expensive. A fast and low-cost alternative approach is in silico prediction, based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods that address various localization features with diverse specificities and sensitivities. As a result, exploiting these computer resources to predict protein localization accurately involves querying all tools and comparing every prediction output; this is a painstaking task. Therefore, we developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs concerning complete prokaryotic proteomes.
4.
ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes (total citations: 1; self-citations: 0; citations by others: 1)
A new system, ZCURVE 1.0, for finding protein-coding genes in bacterial and archaeal genomes has been proposed. The current algorithm, which is based on the Z curve representation of the DNA sequences, emphasizes the global statistical features of protein-coding genes by taking the frequencies of bases at three codon positions into account. In ZCURVE 1.0, since only 33 parameters are used to characterize the coding sequences, it gives better consideration to both typical and atypical cases, whereas in Markov-model-based methods, e.g. Glimmer 2.02, thousands of parameters are trained, which may result in less adaptability. To compare the performance of the new system with that of Glimmer 2.02, both systems were run on 18 genomes not annotated by the Glimmer system. Comparisons were also performed for predicting some function-known genes by both systems. The average accuracy of the two systems is well matched; however, ZCURVE 1.0 has more accurate gene start prediction, a lower additional prediction rate and higher accuracy for the prediction of horizontally transferred genes. It is shown that the joint application of both systems greatly improves gene-finding results. For a typical genome, e.g. Escherichia coli, the system ZCURVE 1.0 takes approximately 2 min on a Pentium III 866 PC without any human intervention. The system ZCURVE 1.0 is freely available at: http://tubic.tju.edu.cn/Zcurve_B/.
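The role of base frequencies at the three codon positions can be illustrated with a short sketch. It computes the standard Z curve components (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding) separately for each codon position, giving nine values. The function name is hypothetical, the abstract does not itemize the 33 parameters, and this is not the ZCURVE 1.0 implementation.

```python
def zcurve_codon_components(seq):
    """Return 9 values: (x, y, z) for each of the three codon positions.

    seq is assumed to be an in-frame DNA string over A, C, G, T.
    """
    seq = seq.upper()
    components = []
    for phase in range(3):
        bases = seq[phase::3]  # every base at this codon position
        n = len(bases)
        if n == 0:
            components.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (bases.count(b) / n for b in "ACGT")
        components.extend([
            (a + g) - (c + t),  # x: purines vs. pyrimidines
            (a + c) - (g + t),  # y: amino vs. keto bases
            (a + t) - (g + c),  # z: weak vs. strong hydrogen bonding
        ])
    return components


if __name__ == "__main__":
    print(zcurve_codon_components("ATGATGATG"))
```

Each value lies in [-1, 1], so a candidate open reading frame maps to a small fixed-length feature vector regardless of its length.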
5.
Background
Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences.
6.
Sridhar J Sabarinathan R Balan SS Rafi ZA Gunasekaran P Sekar K 《Genomics, Proteomics & Bioinformatics》2011,9(4-5):179-182
In the past few decades, scientists from all over the world have taken a keen interest in novel functional units such as small regulatory RNAs, small open reading frames, pseudogenes, transposons, integrase binding attB/attP sites, and repeat elements within the bacterial intergenic regions (IGRs), and in the analysis of those "junk" regions for genomic complexity. Here we have developed a web server, named Junker, to facilitate the in-depth analysis of IGRs by examining their length distribution, four-quadrant plots, GC percentage and repeat details. Upon selection of a particular bacterial genome, the physical genome map is displayed as multiple loci with options to view any locus of interest in detail. In addition, an IGR statistics module has been created and implemented in the web server to analyze the length distribution of the IGRs and to understand the disordered grouping of IGRs across the genome by generating the four-quadrant plots. The proposed web server is freely available at the URL http://pranag.physics.iisc.ernet.in/junker/.
7.
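With the continuing development of bioinformatics and biotechnology, the data held in bioinformatics databases are growing exponentially. Understanding the biological knowledge they contain and revealing the underlying laws of biology will become an important topic in future natural science research. This article briefly introduces the use of the bioinformatics databases most commonly used abroad in recent years, and describes in some detail how to carry out sequence analysis.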
8.
Park GW Kwon KH Kim JY Lee JH Yun SH Kim SI Park YM Cho SY Paik YK Yoo JS 《Proteomics》2006,6(4):1121-1132
In shotgun proteomics, proteins can be fractionated by 1-D gel electrophoresis and digested into peptides, followed by liquid chromatography to separate the peptide mixture. Mass spectrometry generates hundreds of thousands of tandem mass spectra from these fractions, and proteins are identified by database searching. However, the search scores are usually not sufficient to distinguish the correct peptides. In this study, we propose a confident protein identification method for high-throughput analysis of the human proteome. To build a filtering protocol for database searching, we chose Pseudomonas putida KT2440 as a reference because this bacterial proteome contains fewer modifications and is simpler than the human proteome. First, the P. putida KT2440 proteome was filtered by reversed sequence database search and correlated with the molecular weight at the 1-D gel band positions. The characterization protocol was then applied to determine the criteria for clustering the human plasma proteome into three different groups. This protein filtering method, based on bacterial proteome data analysis, represents a rapid way to generate a higher-confidence protein list for the human proteome, which includes some heavily modified and cleaved proteins.
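The reversed sequence database search mentioned above can be sketched as follows: each protein sequence is reversed to create a decoy entry, and the fraction of decoy hits above a score threshold gives a rough estimate of the false discovery rate. The `REV_` prefix, the function names, and the decoys-over-targets estimate are assumptions chosen for illustration (one common convention); the abstract does not state the authors' exact filtering rule.

```python
def reversed_decoys(proteins):
    """Build decoy entries by reversing each protein sequence.

    proteins: dict mapping protein name -> amino acid sequence.
    The "REV_" name prefix is a hypothetical labeling convention.
    """
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(hits, threshold):
    """Estimate the false discovery rate at a score threshold.

    hits: iterable of (score, is_decoy) pairs from a search against
    target plus decoy sequences. Returns decoys / targets among hits
    scoring at or above the threshold (0.0 if no target passes).
    """
    hits = list(hits)
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    print(reversed_decoys({"P1": "MKT"}))
    print(estimate_fdr([(50, False), (40, False), (45, True), (10, True)], 30))
```

Raising the threshold until the estimated rate falls below a chosen level is one way such a filter could be tuned on a simpler reference proteome before it is applied to a more complex one.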
9.
10.
Background
Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures prevents further elucidation of the differences among genomes. It remains of interest to develop new ab initio algorithms that not only predict genes accurately but also facilitate comparative studies of prokaryotic genomes.
11.
RiceGAAS: an automated annotation system and database for rice genome sequence (total citations: 27; self-citations: 0; citations by others: 27)
Katsumi Sakata Yoshiaki Nagamura Hisataka Numa Baltazar A. Antonio Hideki Nagasaki Atsuko Idonuma Wakako Watanabe Yuji Shimizu Ikuo Horiuchi Takashi Matsumoto Takuji Sasaki Kenichi Higo 《Nucleic acids research》2002,30(1):98-102
An extensive effort of the International Rice Genome Sequencing Project (IRGSP) has resulted in rapid accumulation of genome sequence, and >137 Mb has already been made available to the public domain as of August 2001. This requires a high-throughput annotation scheme to extract biologically useful and timely information from the sequence data on a regular basis. A new automated annotation system and database called Rice Genome Automated Annotation System (RiceGAAS) has been developed to execute a reliable and up-to-date analysis of the genome sequence as well as to store and retrieve the results of annotation. The system has the following functional features: (i) collection of rice genome sequences from GenBank; (ii) execution of gene prediction and homology search programs; (iii) integration of results from various analyses and automatic interpretation of coding regions; (iv) re-execution of analysis, integration and automatic interpretation with the latest entries in reference databases; (v) integrated visualization of the stored data using web-based graphical view. RiceGAAS also has a data submission mechanism that allows public users to perform fully automated annotation of their own sequences. The system can be accessed at http://RiceGAAS.dna.affrc.go.jp/.
12.
13.
Araújo LV Soares MA Oliveira SM Chequer P Tanuri A Sabino EC Ferreira JE 《Genetics and molecular research : GMR》2006,5(1):203-215
We developed a database system for collaborative HIV analysis (DBCollHIV) in Brazil. The main purpose of our DBCollHIV project was to develop an HIV-integrated database system with analytical bioinformatics tools that would support the needs of Brazilian research groups for data storage and sequence analysis. Whenever authorized by the principal investigator, this system also allows the integration of data from different studies and/or the release of the data to the general public. The development of a database that combines sequences associated with clinical/epidemiological data is difficult without the active support of interdisciplinary investigators. A functional database that securely stores data and helps the investigator to manipulate their sequences before publication would be an attractive tool for investigators depositing their data and collaborating with other groups. DBCollHIV allows investigators to manipulate their own datasets, as well as integrating molecular and clinical HIV data, in an innovative fashion.
14.
rpoB sequence analysis as a novel basis for bacterial identification (total citations: 12; self-citations: 0; citations by others: 12)
Comparison of the sequences of conserved genes, most commonly those encoding 16S rRNA, is used for bacterial genotypic identification. Among some taxa, such as the Enterobacteriaceae, variation within this gene does not allow confident species identification. We investigated the usefulness of RNA polymerase beta-subunit encoding gene (rpoB) sequences as an alternative tool for universal bacterial genotypic identification. We generated a database of partial rpoB for 14 Enterobacteriaceae species and then assessed the intra- and interspecies divergence between the rpoB and the 16S rRNA genes by pairwise comparisons. We found that levels of divergence between the rpoB sequences of different strains were markedly higher than those between their 16S rRNA genes. This higher discriminatory power was further confirmed by assigning 20 blindly selected clinical isolates to the correct enteric species on the basis of rpoB sequence comparison. Comparison of rpoB sequences from Enterobacteriaceae was also used as the basis for their phylogenetic analysis and demonstrated the genus Klebsiella to be polyphyletic. The trees obtained with rpoB were more compatible with the currently accepted classification of Enterobacteriaceae than those obtained with 16S rRNA. These data indicate that rpoB is a powerful identification tool, which may be useful for universal bacterial identification.
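A pairwise comparison of this kind can be sketched as an uncorrected percent divergence between two aligned sequences. The abstract does not say how divergence was calculated, so the measure used here (mismatches over compared positions, skipping gap columns) and the function name are assumptions for illustration only.

```python
def percent_divergence(seq_a, seq_b):
    """Uncorrected percent divergence between two aligned sequences.

    Both sequences must already be aligned to the same length, with
    "-" marking gaps. Columns containing a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gap columns
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


if __name__ == "__main__":
    print(percent_divergence("ACGTACGT", "ACGTTCGA"))
```

Running the same function over every pair of strains for each gene would yield the two divergence distributions that the abstract compares.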
15.
An integrated view of bacterial and archaeal diversity in saline soil habitats is essential for understanding the biological and ecological processes and exploiting potential of microbial resources from such environments. This study examined the collective bacterial and archaeal diversity in saline soils using a meta-analysis approach. All available 16S rDNA sequences recovered from saline soils were retrieved from publicly available databases and subjected to phylogenetic and statistical analyses. A total of 9,043 bacterial and 1,039 archaeal sequences, each longer than 250 bp, were examined. The bacterial sequences were assigned into 5,784 operational taxonomic units (OTUs, based on ≥97 % sequence identity), representing 24 known bacterial phyla, with Proteobacteria (44.9 %), Actinobacteria (12.3 %), Firmicutes (10.4 %), Acidobacteria (9.0 %), Bacteroidetes (6.8 %), and Chloroflexi (5.9 %) being predominant. Lysobacter (12.8 %) was the dominant bacterial genus in saline soils, followed by Sphingomonas (4.5 %), Halomonas (2.5 %), and Gemmatimonas (2.5 %). Archaeal sequences were assigned to 602 OTUs, primarily from the phyla Euryarchaeota (88.7 %) and Crenarchaeota (11.3 %). Halorubrum and Thermofilum were the dominant archaeal genera in saline soils. Rarefaction analysis indicated that less than 25 % of bacterial diversity, and approximately 50 % of archaeal diversity, in saline soil habitats has been sampled. This analysis of the global bacterial and archaeal diversity in saline soil habitats can guide future studies to further examine the microbial diversity of saline soils.
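To illustrate what a rarefaction analysis computes, the sketch below uses the classic analytical formula for the expected number of OTUs observed in a random subsample of n sequences drawn without replacement. The abstract does not state which rarefaction method or software was used, so this formula and the function name are assumptions for illustration.

```python
from math import comb


def expected_otus(counts, n):
    """Expected number of OTUs in a random subsample of n sequences.

    counts: number of sequences assigned to each OTU.
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N is the
    total number of sequences and N_i the size of OTU i.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total sequence count")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when fewer than n sequences lie outside OTU i,
    # meaning that OTU is certain to appear in the subsample.
    return sum(1 - comb(total - c, n) / denom for c in counts)


if __name__ == "__main__":
    otu_sizes = [2, 1, 1]
    for n in range(sum(otu_sizes) + 1):
        print(n, expected_otus(otu_sizes, n))
```

Plotting the expected OTU count against n gives a rarefaction curve; a curve that is still rising steeply at the full sample size suggests that much of the diversity remains unsampled.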
16.
17.
18.
In silico proteomics complements computational genomics in characterizing genome evolution. Here we examine cluster patterns in archaeal and bacterial proteomes using compositional properties of protein sequences in contrast to the traditionally used sequence alignment procedures. Application of standard Principal Component Analysis to the multi-dimensional data identified cluster patterns. Two types of cluster patterns exist in bacterial proteomes. Proteomes of type I have one major cluster with a few isolated points in space, revealing an underlying largely homogeneous compositional structure. In type II proteomes two clusters of protein distribution were discernible. The two clusters differ in size and were separated from each other although the boundary was somewhat fuzzy. Proteins falling in the major cluster were labeled as 'typical' and proteins of the minor cluster were called 'atypical'. The atypical proteins were mapped to Clusters of Orthologous Groups (COGs). Species distribution in COG maps with respect to atypical proteins illuminated the biological relationships of extreme diversity among the archaeal members and of diversity among bacteria in relation to their niche. Amino acids that were over-represented in the atypical proteins had higher biosynthetic cost compared to 'typical' ribosomal proteins. However, archaea and bacteria economize by preferring the less costly amino acid to others closely related in chemical structure. Further, over-representation of serine in atypical proteins of archaeal members suggests re-examining these proteomes for the presence of Serine/Threonine phosphatases and kinases in Archaea. Our computational procedure can serve as a useful addition to the existing tools for carrying out in silico proteomics.
19.
20.
SEQ is an interactive, self-documenting computer program that contains procedures for the analysis of nucleotide sequences and the manipulation of such sequences to allow the simulation and prediction of the results of recombinant DNA experiments.