Found 20 similar articles (search time: 0 ms)
1.
Protein surface analysis for function annotation in high-throughput structural genomics pipeline (total citations: 3, self-citations: 0, citations by others: 3)
Binkowski TA Joachimiak A Liang J 《Protein science : a publication of the Protein Society》2005,14(12):2972-2981
Structural genomics (SG) initiatives are expanding the universe of protein fold space by rapidly determining structures of proteins that were intentionally selected on the basis of low sequence similarity to proteins of known structure. Often these proteins have no associated biochemical or cellular functions. The SG success has resulted in an accelerated deposition of novel structures. In some cases the structural bioinformatics analysis applied to these novel structures has provided specific functional assignment. However, this approach has also uncovered limitations in the functional analysis of uncharacterized proteins using traditional sequence and backbone structure methodologies. A novel method, named pvSOAR (pocket and void Surface of Amino Acid Residues), of comparing the protein surfaces of geometrically defined pockets and voids was developed. pvSOAR was able to detect previously unrecognized and novel functional relationships between surface features of proteins. In this study, pvSOAR is applied to several structural genomics proteins. We examined the surfaces of YecM, BioH, and RpiB from Escherichia coli as well as the CBS domains from inosine-5'-monophosphate dehydrogenase from Streptococcus pyogenes, conserved hypothetical protein Ta549 from Thermoplasma acidophilum, and CBS domain protein mt1622 from Methanobacterium thermoautotrophicum with the goal of inferring information about their biochemical function.
2.
3.
4.
5.
6.
7.
Ricardo Moreira Borges Fernanda das Neves Costa Fernanda O. Chagas Andrew Magno Teixeira Jaewon Yoon Márcio Barczyszyn Weiss Camila Manoel Crnkovic Alan Cesar Pilon Bruno C. Garrido Luz Adriana Betancur Abel M. Forero Leonardo Castellanos Freddy A. Ramos Mônica T. Pupo Stefan Kuhn 《Phytochemical analysis : PCA》2023,34(1):48-55
8.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data (total citations: 3, self-citations: 0, citations by others: 3)
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
9.
High-throughput sequencing and genotyping methods are dramatically increasing the number of observable genetic intraspecies differences that can be exploited as genetic markers. In addition, automated phenotyping platforms and "omics" profiling technologies further enlarge the set of quantifiable macroscopic and molecular traits at an ever-increasing pace. Combined, both lines of technological advances create unparalleled opportunities to identify candidate gene regions and, ideally, even single genes responsible for observed variations in a particular trait via association studies. However, as of yet, this new potential is not sufficiently matched by enabling software solutions to easily exploit this wealth of genotype/phenotype information. We have developed Matapax, a Web-based platform to address this need. Initially, we built the infrastructure to support association studies in Arabidopsis (Arabidopsis thaliana) based on several genotyping efforts covering up to 1,375 Arabidopsis accessions. Based on the user-supplied trait information, associated single-nucleotide polymorphism markers and single-nucleotide polymorphism-harboring or -neighboring genes are identified using both the GAPIT and EMMA libraries developed for R. Additional interrogation is facilitated by displaying candidate regions and genes in a genome browser and by providing relevant annotation information. In the future, we plan to broaden the scope of organisms to other plant species as more genotype/phenotype information becomes available. Matapax is freely available at http://matapax.mpimp-golm.mpg.de and can be accessed using any internet browser.
10.
Shi W Zhan C Ignatov A Manjasetty BA Marinkovic N Sullivan M Huang R Chance MR 《Structure (London, England : 1993)》2005,13(10):1473-1486
A high-throughput method for measuring transition metal content based on quantitation of X-ray fluorescence signals was used to analyze 654 proteins selected as targets by the New York Structural GenomiX Research Consortium. Over 10% showed the presence of transition metal atoms in stoichiometric amounts; these totals as well as the abundance distribution are similar to those of the Protein Data Bank. Bioinformatics analysis of the identified metalloproteins in most cases supported the metalloprotein annotation; identification of the conserved metal binding motif was also shown to be useful in verifying structural models of the proteins. Metalloproteomics provides a rapid structural and functional annotation for these sequences and is shown to be approximately 95% accurate in predicting the presence or absence of stoichiometric metal content. The project's goal is to assay at least 1 member from each Pfam family; approximately 500 Pfam families have been characterized with respect to transition metal content so far.
11.
An integrated computational pipeline and database to support whole-genome sequence annotation
Mungall CJ Misra S Berman BP Carlson J Frise E Harris N Marshall B Shu S Kaminker JS Prochnik SE Smith CD Smith E Tupy JL Wiel C Rubin GM Lewis SE 《Genome biology》2002,3(12):research0081.1-8111
We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture.
12.
Background
Pathogen diagnostic assays based on polymerase chain reaction (PCR) technology provide high sensitivity and specificity. However, the design of these diagnostic assays is computationally intensive, requiring high-throughput methods to identify unique PCR signatures in the presence of an ever increasing availability of sequenced genomes.
13.
Two-stage clustering (TSC): a pipeline for selecting operational taxonomic units for the high-throughput sequencing of PCR amplicons (total citations: 1, self-citations: 0, citations by others: 1)
Clustering 16S/18S rRNA amplicon sequences into operational taxonomic units (OTUs) is a critical step for the bioinformatic analysis of microbial diversity. Here, we report a pipeline for selecting OTUs with a relatively low computational demand and a high degree of accuracy. This pipeline is referred to as two-stage clustering (TSC) because it divides tags into two groups according to their abundance and clusters them sequentially. The more abundant group is clustered using a hierarchical algorithm similar to that in ESPRIT, which has a high degree of accuracy but is computationally costly for large datasets. The rarer group, which includes the majority of tags, is then heuristically clustered to improve efficiency. To further improve the computational efficiency and accuracy, two preclustering steps are implemented. To maintain clustering accuracy, all tags are grouped into an OTU depending on their pairwise Needleman-Wunsch distance. This method not only improved the computational efficiency but also mitigated the spurious OTU estimation from 'noise' sequences. In addition, OTUs clustered using TSC showed comparable or improved performance in beta-diversity comparisons compared to existing OTU selection methods. This study suggests that the distribution of sequencing datasets is a useful property for improving the computational efficiency and increasing the clustering accuracy of the high-throughput sequencing of PCR amplicons. The software and user guide are freely available at http://hwzhoulab.smu.edu.cn/paperdata/.
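The two-stage idea in the abstract above can be illustrated with a minimal sketch. It is not the TSC software: the function names, the abundance cutoff, the distance threshold, the unit-cost alignment scoring, and the use of single-linkage merging for the abundant group are all assumptions made for illustration, and the preclustering steps are omitted.

```python
from collections import Counter


def nw_distance(a, b):
    """Global-alignment (Needleman-Wunsch) cost with unit mismatch and gap
    penalties, scaled by the longer sequence length (assumed scoring)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)


def two_stage_cluster(reads, abundance_cutoff=2, max_dist=0.15):
    """Split unique tags by abundance, then cluster the two groups in turn."""
    counts = Counter(reads)
    abundant = [s for s, n in counts.items() if n >= abundance_cutoff]
    rare = [s for s, n in counts.items() if n < abundance_cutoff]

    # Stage 1 (assumed variant): single-linkage merging of abundant tags,
    # implemented with union-find over all pairs within max_dist.
    parent = {s: s for s in abundant}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(abundant):
        for b in abundant[i + 1:]:
            if nw_distance(a, b) <= max_dist:
                parent[find(a)] = find(b)

    groups = {}
    for s in abundant:
        groups.setdefault(find(s), []).append(s)
    otus = list(groups.values())

    # Stage 2 (assumed heuristic): each rare tag joins the first OTU whose
    # first member is within max_dist; otherwise it starts a new OTU.
    for s in rare:
        for otu in otus:
            if nw_distance(s, otu[0]) <= max_dist:
                otu.append(s)
                break
        else:
            otus.append([s])
    return otus
```

The expensive all-against-all comparison is confined to the abundant tags, while each rare tag is compared against only one representative per OTU, which is where the saving described in the abstract would come from.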
14.
15.
Sarmah R Sahu J Dehury B Sarma K Sahoo S Sahu M Barooah M Sen P Modi MK 《Bioinformation》2012,8(4):206-208
With the advent of high-throughput sequencing technology, sequences from many genomes are being deposited to public databases at a brisk rate. Open access to large amounts of expressed sequence tag (EST) data in the public databases has provided a powerful platform for simple sequence repeat (SSR) development in species where sequence information is not available. SSRs are markers of choice for their high reproducibility, abundant polymorphism and high inter-specific transferability. The mining of SSRs from ESTs requires different high-throughput computational tools that need to be executed individually, which is computationally intensive and time consuming. To reduce the time lag and to streamline the cumbersome process of SSR mining from ESTs, we have developed a user-friendly, web-based EST-SSR pipeline, EST-SSR-MARKER PIPELINE (ESMP). This pipeline integrates EST pre-processing, clustering, assembly and subsequently mining of SSRs from assembled EST sequences. The mining of SSRs from ESTs provides valuable information on the abundance of SSRs in ESTs and will facilitate the development of markers for genetic analysis and related applications such as marker-assisted breeding. AVAILABILITY: The database is available for free at http://bioinfo.aau.ac.in/ESMP.
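As a rough illustration of the SSR-mining step alone (not of ESMP itself, whose implementation is not described here), the sketch below scans a sequence for perfect tandem repeats with a regular expression. The function name and the minimum repeat counts per motif length are hypothetical defaults chosen for the example.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_repeats maps motif length to the minimum number of repeats;
    the defaults below are illustrative assumptions.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit
            # (e.g. "AA" or "ATAT"): such a motif occurs inside
            # its own doubling with both end characters removed.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

In a pipeline of the kind described, a function like this would be applied to each assembled EST contig after pre-processing and clustering; imperfect and compound repeats would need additional handling.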
16.
Juan Falgueras Antonio J Lara Noé Fernández-Pozo Francisco R Cantón Guillermo Pérez-Trabado M Gonzalo Claros 《BMC bioinformatics》2010,11(1):38
Background
High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms.
17.
18.
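Ci bamboo (慈竹) is one of the dominant clumping bamboo species native to Sichuan, China. Its fiber length and quality are excellent, making it a good raw material for papermaking, textiles, and other industries. In this study, the Illumina HiSeq™ 2000 platform was used for transcriptome analysis of Ci bamboo shoots 10, 50, 100, and 150 cm in height, yielding 69.28 M reads. De novo assembly and clustering produced 111,137 non-redundant unigenes, of which 63,094 were annotated in the COG, GO, KEGG, Swiss-Prot, and Nr databases. These unigenes not only have general functions, such as transcription and signal transduction, but are also involved in sucrose transport and metabolism and in the biosynthesis of secondary metabolites and the cell wall. Cellulose synthase genes were differentially expressed among shoots of different heights, and genes that may regulate Ci bamboo growth and development as well as cellulose and lignin biosynthesis were identified, providing a theoretical basis for the improvement of Ci bamboo varieties.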
19.
20.
Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline (total citations: 1, self-citations: 0, citations by others: 1)
Motivation: The quest for high-throughput proteomics has revealed a number of challenges in recent years. Whilst substantial improvements in automated protein separation with liquid chromatography and mass spectrometry (LC/MS), aka shotgun proteomics, have been achieved, large-scale open initiatives such as the Human Proteome Organization (HUPO) Brain Proteome Project have shown that maximal proteome coverage is only possible when LC/MS is complemented by 2D gel electrophoresis (2-DE) studies. Moreover, both separation methods require automated alignment and differential analysis to relieve the bioinformatics bottleneck and so make high-throughput protein biomarker discovery a reality. The purpose of this article is to describe a fully automatic image alignment framework for the integration of 2-DE into a high-throughput differential expression proteomics pipeline. Results: The proposed method is based on robust automated image normalization (RAIN) to circumvent the drawbacks of traditional approaches. These use symbolic representation at the very early stages of the analysis, which introduces persistent errors due to inaccuracies in modelling and alignment. In RAIN, a third-order volume-invariant B-spline model is incorporated into a multi-resolution schema to correct for geometric and expression inhomogeneity at multiple scales. The normalized images can then be compared directly in the image domain for quantitative differential analysis. Through evaluation against an existing state-of-the-art method on real and synthetically warped 2D gels, the proposed analysis framework demonstrates substantial improvements in matching accuracy and differential sensitivity. High-throughput analysis is established through an accelerated GPGPU (general purpose computation on graphics cards) implementation.
Availability: Supplementary material, software and images used in the validation are available at http://www.proteomegrid.org/rain/ Contact: g.z.yang{at}imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Associate Editor: David Rocke