20 similar articles found.
1.
2.
3.
James Carson, Tao Ju, Musodiq Bello, Christina Thaller, Joe Warren, Ioannis A. Kakadiaris, Wah Chiu, Gregor Eichele. Methods (San Diego, Calif.). 2010;50(2):85-95.
Massive amounts of image data have been collected, and continue to be generated, to represent cellular gene expression throughout the mouse brain. Critical to exploiting this key effort of the post-genomic era is the ability to place these data into a common spatial reference that enables rapid interactive queries, analysis, data sharing, and visualization. In this paper, we present a set of automated protocols for generating and annotating gene expression patterns suitable for establishing a database. The steps include imaging tissue slices, detecting cellular gene expression levels, spatial registration with an atlas, and textual annotation. Using high-throughput in situ hybridization to generate serial sets of tissue sections displaying gene expression, we applied this process to establish a database representing over 200 genes in the postnatal day 7 mouse brain. Data generated with this protocol are well suited for interactive comparison, analysis, querying, and visualization.
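The expression-detection and textual-annotation steps can be pictured with a small sketch. It assumes cells have already been segmented and registered to atlas regions, and it uses hypothetical intensity cut-offs; the region names, thresholds and level labels are illustrative, not the protocol's actual parameters.

```python
from collections import Counter, defaultdict

# Hypothetical intensity cut-offs on a 0-255 grayscale; NOT taken from the paper.
LEVELS = [(200, "strong"), (120, "moderate"), (60, "weak")]


def expression_level(intensity):
    """Bin a cell's mean in situ signal intensity into a discrete expression level."""
    for cutoff, label in LEVELS:
        if intensity >= cutoff:
            return label
    return "none"


def annotate_regions(cells):
    """Label each atlas region with its most frequent expression level.

    `cells` is an iterable of (region, mean_intensity) pairs, assumed to be
    already registered to the atlas.
    """
    counts = defaultdict(Counter)
    for region, intensity in cells:
        counts[region][expression_level(intensity)] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}


cells = [("cortex", 230), ("cortex", 210), ("cortex", 50),
         ("hippocampus", 130), ("hippocampus", 70), ("hippocampus", 125)]
print(annotate_regions(cells))  # {'cortex': 'strong', 'hippocampus': 'moderate'}
```

A majority vote per region is only one way to turn cell-level calls into text; a real annotation could also report the fraction of cells at each level.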
4.
5.
Rajkumar Sasidharan, Tamás Nepusz, David Swarbreck, Eva Huala, Alberto Paccanaro. Nucleic acids research. 2012;40(19):e152.
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels, and propagate functional annotation seamlessly across periodic genome updates. GFam is a hybrid approach: it uses a greedy algorithm to chain component domains from the InterPro annotation provided by InterPro's 12 member resources, then performs a sequence-based connected component analysis of unannotated sequence regions to derive a consensus domain architecture for each sequence, and finally generates families based on common architectures. For the Arabidopsis proteome, this integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single constituent database within InterPro. The true power of GFam lies in maximizing the annotation provided by the different InterPro data sources, which offer resource-specific coverage of different regions of a sequence. GFam's higher sequence and residue coverage can be useful for genome annotation, comparative genomics, and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
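The domain-chaining and family-building steps described above can be sketched as follows. This is a minimal sketch, not GFam's code: the highest-score-first greedy rule, the half-open coordinates, the tuple layout and the domain IDs (`PF1`, `SM1`, ...) are all assumptions, and the connected-component analysis of unannotated regions is omitted.

```python
from collections import defaultdict


def chain_domains(hits):
    """Greedily chain non-overlapping domain hits on one sequence.

    `hits` holds (domain_id, start, end, score) tuples with half-open
    [start, end) coordinates. Higher-scoring hits are accepted first; a hit
    is skipped if it overlaps any hit already accepted.
    """
    chosen = []
    for dom, start, end, score in sorted(hits, key=lambda h: -h[3]):
        if all(end <= s or start >= e for _, s, e, _ in chosen):
            chosen.append((dom, start, end, score))
    return sorted(chosen, key=lambda h: h[1])  # N- to C-terminal order


def residue_coverage(domains, length):
    """Fraction of residues covered by at least one chained domain."""
    covered = set()
    for _, start, end, _ in domains:
        covered.update(range(start, end))
    return len(covered) / length


def group_families(proteins):
    """Group sequences that share the same chained domain architecture.

    `proteins` maps seq_id -> list of (domain_id, start, end, score) hits.
    """
    families = defaultdict(list)
    for seq_id, hits in proteins.items():
        architecture = tuple(h[0] for h in chain_domains(hits))
        families[architecture].append(seq_id)
    return dict(families)


proteins = {  # hypothetical hits from different member databases
    "p1": [("PF1", 0, 50, 30.0), ("SM1", 40, 90, 10.0), ("PF2", 60, 120, 25.0)],
    "p2": [("PF2", 70, 130, 20.0), ("PF1", 5, 60, 28.0)],
    "p3": [("PF3", 0, 40, 5.0)],
}
print(group_families(proteins))
# {('PF1', 'PF2'): ['p1', 'p2'], ('PF3',): ['p3']}
print(round(residue_coverage(chain_domains(proteins["p1"]), 150), 3))  # 0.733
```

Because coverage is a fraction, a gain such as the reported 7.2 "percentage points" is a difference of two such fractions times 100, not a relative percentage.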
6.
The draft sequence of several complete protozoan genomes is now available and genome projects are ongoing for a number of other species. Different strategies are being implemented to identify and annotate protein coding and RNA genes in these genomes, as well as study their genomic architecture. Since the genomes vary greatly in size, GC-content, nucleotide composition, and degree of repetitiveness, genome structure is often a factor in choosing the methodology utilised for annotation. In addition, the approach taken is dictated, to a greater or lesser extent, by the particular reasons for carrying out genome-wide analyses and the level of funding available for projects. Nevertheless, these projects have provided a plethora of material that will aid in understanding the biology and evolution of these parasites, as well as identifying new targets that can be used to design urgently required drug treatments for the diseases they cause.
7.
MOTIVATION: A number of free-standing programs have been developed to help researchers find potential coding regions and deduce gene structure in long stretches of what is essentially 'anonymous DNA'. Because these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to ensure that all potential coding regions within a previously identified critical region are found. RESULTS: We have developed a gene identification tool called GeneMachine, which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed to determine whether a previously characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules is used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written in ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and a graphical viewer. The main feature of GeneMachine is that the process is fully automated: the user only needs to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be added to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. AVAILABILITY: GeneMachine is freely available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.
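The rationale for querying several predictors, so that every potential coding region is caught, can be illustrated by a small merging step. This sketch is not GeneMachine's Perl code and does not run the programs; it assumes their outputs have already been parsed into half-open (start, end) intervals, and the coordinates below are made up.

```python
def merge_predictions(predictions):
    """Union of coding-region predictions from several programs.

    `predictions` maps a program name to a list of half-open (start, end)
    intervals. Overlapping intervals are merged, and each merged region
    records which programs support it, so no region predicted by any single
    program is lost.
    """
    intervals = sorted(
        (start, end, program)
        for program, regions in predictions.items()
        for start, end in regions
    )
    merged = []
    for start, end, program in intervals:
        if merged and start < merged[-1][1]:
            m_start, m_end, programs = merged[-1]
            merged[-1] = (m_start, max(m_end, end), programs | {program})
        else:
            merged.append((start, end, {program}))
    return merged


predictions = {  # hypothetical parsed outputs
    "GENSCAN": [(100, 250), (900, 1000)],
    "MZEF": [(120, 260)],
    "FGENES": [(500, 600)],
}
for start, end, programs in merge_predictions(predictions):
    print(start, end, sorted(programs))
# 100 260 ['GENSCAN', 'MZEF']
# 500 600 ['FGENES']
# 900 1000 ['GENSCAN']
```

Keeping the set of supporting programs lets a user prioritize regions called by several methods while still seeing those called by only one.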
8.
9.
Uncertainty and inconsistency in gene structure annotation remain limitations on research in the genome era, frustrating both biologists and bioinformaticians, who must sort out annotation errors for their genes of interest or generate trustworthy datasets for algorithm development. It is unrealistic to expect software solutions that would solve all of these problems in the near future. The issue is all the more urgent as more species are sequenced and analyzed by comparative genomics: erroneous annotations could easily propagate, whereas correct annotations in one species will greatly facilitate the annotation of novel genomes. We propose a dynamic, economically feasible solution to the annotation predicament: broad-based, web-technology-enabled community annotation, a prototype of which is now in use for Arabidopsis.
10.
Kossenkov A, Manion FJ, Korotkov E, Moloshok TD, Ochs MF. Bioinformatics (Oxford, England). 2003;19(5):675-676.
The automated sequence annotation pipeline (ASAP) is designed to ease routine investigation of new functional annotations for unknown sequences, such as expressed sequence tags (ESTs), by querying web-accessible resources and maintaining a local database. The system makes it easy to use the output of one search as the input for a new search and to filter results. The database stores query formats, parameters, and the information needed to parse data from web sites, so format information can be updated easily should a site modify the format of a query or of a returned web page.
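The two ideas in this abstract, storing each site's query and page format in a database and feeding one search's filtered output into the next, can be sketched with the standard library. The site names, URL templates, regular expressions and the stubbed `fetch` function are hypothetical; this is not ASAP's actual schema or code, and no network requests are made.

```python
import re
import sqlite3


def make_format_db():
    """In-memory table holding, per site, a query URL template and a hit regex."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE formats (site TEXT PRIMARY KEY, url TEXT, hit_pattern TEXT)"
    )
    return db


def set_format(db, site, url, hit_pattern):
    """Add or update a site's format; parsing code does not change when a site does."""
    db.execute(
        "INSERT OR REPLACE INTO formats VALUES (?, ?, ?)", (site, url, hit_pattern)
    )


def parse_hits(db, site, page):
    """Extract hits from a returned page using the stored pattern."""
    row = db.execute(
        "SELECT hit_pattern FROM formats WHERE site = ?", (site,)
    ).fetchone()
    if row is None:
        raise KeyError(site)
    return re.findall(row[0], page)


def run_chain(db, sites, query, fetch, keep=lambda hit: True):
    """Feed the filtered hits of each search in as queries to the next one."""
    inputs = [query]
    for site in sites:
        (url,) = db.execute("SELECT url FROM formats WHERE site = ?", (site,)).fetchone()
        outputs = []
        for item in inputs:
            page = fetch(url.format(query=item))
            outputs.extend(hit for hit in parse_hits(db, site, page) if keep(hit))
        inputs = outputs
    return inputs


db = make_format_db()
set_format(db, "siteA", "https://a.example/search?q={query}", r"HIT:(\w+)")
set_format(db, "siteB", "https://b.example/lookup?id={query}", r"GO:\d{7}")

pages = {  # stand-in for real HTTP responses
    "https://a.example/search?q=EST1": "HIT:P1 HIT:P2",
    "https://b.example/lookup?id=P1": "GO:0008150",
    "https://b.example/lookup?id=P2": "GO:0003674 GO:0005575",
}
print(run_chain(db, ["siteA", "siteB"], "EST1", pages.__getitem__,
                keep=lambda hit: hit != "P2"))  # ['GO:0008150']
```

If `siteA` later changed its page layout, a single `set_format` call with a new pattern would update parsing without touching `run_chain`.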
11.
Describing the determinants of robustness of biological systems has become one of the central questions in systems biology. Despite increasing research efforts, it has proven difficult to arrive at a unifying definition of this important concept. We argue that this is due to the multifaceted nature of robustness and the possibility of formally capturing it at different levels of systemic formalism (e.g., topology and dynamic behavior). Here we provide a comprehensive review of the existing definitions of robustness pertaining to metabolic networks. As kinetic approaches have been reviewed thoroughly elsewhere, we focus on definitions of robustness proposed within graph-theoretic and constraint-based formalisms.
12.
13.
Mass spectrometry (MS) combined with database searching has become the preferred method for identifying proteins present in cell or tissue samples. The technique enables large-scale proteome analyses of species whose genomes have already been sequenced, and searching mass spectrometric data against protein databases composed of annotated genes is widely practiced. However, this technique has two issues: wrong annotations in protein databases reduce the accuracy of protein identification, and only proteins that have already been annotated can be identified. We propose a new framework that detects correct ORFs by integrating MS/MS proteomic data mapping with a knowledge-based system for translation initiation sites. This technique can correct predicted coding sequences and can also identify novel genes. Our computational system first performs probabilistic peptide matching against all possible translational frames using MS/MS data, then searches for discriminative DNA patterns around the detected peptides, and finally integrates these findings using empirical knowledge stored in knowledge bases to obtain correct ORFs. Using the photosynthetic bacterium Synechocystis sp. PCC6803 as a sample prokaryote, we found 14 N-terminus annotation errors and several new candidate genes.
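The first two stages of this framework can be sketched as a six-frame translation with peptide mapping, followed by a search for an upstream start codon. This is a simplified stand-in, not the authors' system: exact string matching replaces their probabilistic peptide matching, and the "most upstream in-frame ATG before a stop" rule is an assumed heuristic in place of their knowledge base of initiation-site patterns. The toy sequence and peptide are made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), amino acids listed in TCAG codon order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame):
    """Translate one reading frame; codons with non-ACGT bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Exact matches of a peptide in all six frames.

    Returns (strand, frame, nt_start) tuples; nt_start is measured on the
    strand's own 5'->3' sequence (the reverse complement for '-').
    """
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for frame in range(3):
            protein = translate(seq, frame)
            idx = protein.find(peptide)
            while idx != -1:
                hits.append((strand, frame, frame + 3 * idx))
                idx = protein.find(peptide, idx + 1)
    return hits


def upstream_start(strand_seq, nt_start):
    """Walk back codon by codon from a peptide hit; return the most upstream
    in-frame ATG reached before an in-frame stop codon, or None."""
    best = None
    pos = nt_start
    while pos >= 0:
        codon = strand_seq[pos:pos + 3]
        if codon in STOPS:
            break
        if codon == "ATG":
            best = pos
        pos -= 3
    return best


genome = "TAAATGAAAATGGCTTGG"
print(map_peptide(genome, "AW"))   # [('+', 0, 12)]
print(upstream_start(genome, 12))  # 3
```

If an existing annotation placed the start at position 9, the peptide evidence plus this rule would suggest an N-terminal extension to position 3, which is the kind of N-terminus correction the abstract reports.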
14.
An integrated computational pipeline and database to support whole-genome sequence annotation
Mungall CJ, Misra S, Berman BP, Carlson J, Frise E, Harris N, Marshall B, Shu S, Kaminker JS, Prochnik SE, Smith CD, Smith E, Tupy JL, Wiel C, Rubin GM, Lewis SE. Genome biology. 2002;3(12):research0081.1-8111.
We describe our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable, expandable architecture.
15.
Protein surface analysis for function annotation in high-throughput structural genomics pipeline (total citations: 3; self-citations: 0; citations by others: 3)
Binkowski TA, Joachimiak A, Liang J. Protein science: a publication of the Protein Society. 2005;14(12):2972-2981.
Structural genomics (SG) initiatives are expanding the universe of protein fold space by rapidly determining structures of proteins that were intentionally selected for low sequence similarity to proteins of known structure. Often these proteins have no associated biochemical or cellular functions. The success of SG has accelerated the deposition of novel structures. In some cases, structural bioinformatics analysis of these novel structures has provided specific functional assignments. However, this approach has also revealed limitations in the functional analysis of uncharacterized proteins using traditional sequence and backbone-structure methods. A novel method, named pvSOAR (pocket and void Surface of Amino Acid Residues), was developed to compare the protein surfaces of geometrically defined pockets and voids. pvSOAR was able to detect previously unrecognized functional relationships between surface features of proteins. In this study, pvSOAR is applied to several structural genomics proteins. We examined the surfaces of YecM, BioH, and RpiB from Escherichia coli, as well as the CBS domains of inosine-5'-monophosphate dehydrogenase from Streptococcus pyogenes, the conserved hypothetical protein Ta549 from Thermoplasma acidophilum, and the CBS domain protein mt1622 from Methanobacterium thermoautotrophicum, with the goal of inferring their biochemical functions.
16.
17.
Since the structure of the DNA molecule was identified half a century ago, the complete genome sequence has been determined for 37 prokaryotes and several eukaryotes. With the exponential growth of genetic information, bioinformatics has attempted to predict gene locations and functions in cyberspace prior to experimental confirmation at the bench.
18.
Predicting the biological function of all the genes of an organism is one of the fundamental goals of computational systems biology. In the last decade, high-throughput experimental methods for studying the functional interactions between gene products (GPs) have been combined with computational approaches based on Bayesian networks for data integration. These approaches produce an interaction network whose weighted links represent the likelihood of a functional connection between two GPs. This weighted network can then be used to predict annotations for functionally uncharacterized GPs. Here we introduce Weighted Network Predictor (WNP), a novel algorithm for predicting the function of biologically uncharacterized GPs. Tests conducted on simulated data show that WNP outperforms five other state-of-the-art methods in both specificity and sensitivity and that it better exploits and propagates the functional and topological information of the network. We apply our method to Saccharomyces cerevisiae and Arabidopsis thaliana networks and predict Gene Ontology functions for about 500 and 10,000 uncharacterized GPs, respectively.
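The idea of propagating annotations over a weighted functional-linkage network can be illustrated with a simple weighted neighbour vote. This is a generic baseline for the kind of prediction described, not the WNP algorithm itself; the scoring rule, the threshold, the node names and the edge weights are assumptions, and the GO IDs are used only as example labels.

```python
from collections import defaultdict


def predict_annotations(edges, annotations, threshold=0.5):
    """Score terms for unannotated nodes by weighted neighbour voting.

    `edges` holds (u, v, weight) tuples of an undirected weighted network;
    `annotations` maps annotated nodes to sets of terms. A term's score is
    the summed weight of links to neighbours carrying it, divided by the
    node's total link weight; terms scoring >= threshold are returned.
    """
    adjacency = defaultdict(list)
    for u, v, weight in edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    predictions = {}
    for node, neighbours in adjacency.items():
        if annotations.get(node):
            continue  # already characterized
        total = sum(weight for _, weight in neighbours)
        if total <= 0:
            continue
        scores = defaultdict(float)
        for neighbour, weight in neighbours:
            for term in annotations.get(neighbour, ()):
                scores[term] += weight
        predictions[node] = {
            term: score / total
            for term, score in scores.items()
            if score / total >= threshold
        }
    return predictions


edges = [("g1", "a", 0.9), ("g1", "b", 0.6), ("g1", "c", 0.1)]
annotations = {"a": {"GO:0006412"}, "b": {"GO:0006412", "GO:0005840"},
               "c": {"GO:0016301"}}
print(predict_annotations(edges, annotations))
```

Here the uncharacterized node `g1` receives `GO:0006412` with a score of about 0.94, because its two strongest links point to neighbours carrying that term. A method such as WNP would additionally exploit wider network topology rather than direct neighbours alone.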
19.
Chenggang Yu, Nela Zavaljevski, Valmik Desai, Seth Johnson, Fred J Stevens, Jaques Reifman. BMC bioinformatics. 2008;9(1):52.
Background
Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeline for Protein Annotation) that has these capabilities.
20.
Chi-Ching Lee, Yi-Ping Phoebe Chen, Tzu-Jung Yao, Cheng-Yu Ma, Wei-Cheng Lo, Ping-Chiang Lyu, Chuan Yi Tang. Gene. 2013.
Sequencing microbial genomes is important because microbes carry antibiotic and pathogenic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes that are still being sequenced. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) that integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects can be submitted to the server as contigs or scaffolds, and it returns functional annotations and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for analyzing ongoing genome projects. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences, and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.
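A quick, compositional first pass at GI prediction can be sketched with a sliding-window GC-content scan. This is a swapped-in and much simpler heuristic, not GI-POP's SVM-based GI-GPS module: genomic islands often differ in base composition from the host genome, so windows whose GC content deviates from the genome average are flagged. The window size, step and deviation threshold are hypothetical defaults.

```python
def gc_fraction(seq):
    """Fraction of G/C bases in a sequence (0.0 for an empty one)."""
    if not seq:
        return 0.0
    return sum(base in "GCgc" for base in seq) / len(seq)


def candidate_islands(genome, window=5000, step=2500, min_dev=0.08):
    """Flag windows whose GC content deviates from the genome average.

    Windows with |GC(window) - GC(genome)| >= min_dev are flagged, and
    overlapping or adjacent flagged windows are merged into half-open
    (start, end) regions.
    """
    mean_gc = gc_fraction(genome)
    flagged = [
        (start, start + window)
        for start in range(0, len(genome) - window + 1, step)
        if abs(gc_fraction(genome[start:start + window]) - mean_gc) >= min_dev
    ]
    merged = []
    for start, end in flagged:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


toy = "GC" * 50 + "AT" * 50 + "GC" * 50  # AT-rich insert in a GC-rich toy genome
print(candidate_islands(toy, window=20, step=20, min_dev=0.5))  # [(100, 200)]
```

A learned classifier such as an SVM can combine many such profile features at once, which is why a single GC threshold should be read only as an illustration of the signal being exploited.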