
1.

TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation

Markus Wolfien Christian Rimmbach Ulf Schmitz Julia Jeannine Jung Stefan Krebs Gustav Steinhoff Robert David Olaf Wolkenhauer 《BMC bioinformatics》2016,17(1):1-18


2.

An integrated Arabidopsis annotation database for Affymetrix Genechip data analysis, and tools for regulatory motif searches

Ghassemian M Waner D Tchieu J Gribskov M Schroeder JI 《Trends in plant science》2001,6(10):448-449

Genome-scale sequencing projects have provided the essential information required for the construction of entire genome chips or microarrays for RNA expression studies. The Arabidopsis and rice genomes have been sequenced and whole-genome oligonucleotide arrays are being manufactured. These should soon become available to researchers. Expression studies using genomic-scale expression arrays are providing us with a vast quantity of information at a rapid pace. The rate-limiting step in this type of experiment is not the data generation step but rather the data analysis component. We report improvements that should facilitate the analysis of Affymetrix Genechip expression data.

3.

EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

Javier Forment Francisco Gilabert Antonio Robles Vicente Conejero Fernando Nuez Jose M Blanca 《BMC bioinformatics》2008,9(1):5

Background

Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation.

4.

A web-based platform for rice microarray annotation and data analysis

Chen D Zhang F Yuan C Lu J Li X Chen M 《Science China Life Sciences》2010,53(12):1467-1473

Rice (Oryza sativa) feeds over half of the global population. A web-based integrated platform for rice microarray annotation and data analysis in various biological contexts is presented, which provides a convenient query for comprehensive annotation compared with similar databases. Coupled with existing rice microarray data, it provides online analysis methods from the perspective of bioinformatics. This comprehensive bioinformatics analysis platform is composed of five modules, including data retrieval, microarray annotation, sequence analysis, results visualization and data analysis. The BioChip module facilitates the retrieval of microarray data information via identifiers of “Probe Set ID”, “Locus ID” and “Analysis Name”. The BioAnno module is used to annotate the gene or probe set based on the gene function, the domain information, the KEGG biochemical and regulatory pathways and the potential microRNA which regulates the genes. The BioSeq module lists all of the related sequence information by a microarray probe set. The BioView module provides various visual results for the microarray data. The BioAnaly module is used to analyze the rice microarray’s data set.

5.

ASAP: automated sequence annotation pipeline for web-based updating of sequence information with a local dynamic database

Kossenkov A Manion FJ Korotkov E Moloshok TD Ochs MF 《Bioinformatics (Oxford, England)》2003,19(5):675-676

The automated sequence annotation pipeline (ASAP) is designed to ease routine investigation of new functional annotations on unknown sequences, such as expressed sequence tags (ESTs), through querying of web-accessible resources and maintenance of a local database. The system allows easy use of the output from one search as the input for a new search, as well as the filtering of results. The database is used to store formats and parameters and information for parsing data from web sites. The database permits easy updating of format information should a site modify the format of a query or of a returned web page.

6.

Proteomic analysis of log to stationary growth phase Lactobacillus plantarum cells and a 2-DE database (total citations: 2; self-citations: 0; citations by others: 2)

Cohen DP Renes J Bouwman FG Zoetendal EG Mariman E de Vos WM Vaughan EE 《Proteomics》2006,6(24):6485-6493

Lactobacillus plantarum is part of the natural microbiota of many food fermentations as well as the human gastro-intestinal tract. The cytosolic fraction of the proteome of L. plantarum WCFS1, whose genome has been sequenced, was studied. 2-DE was used to investigate the proteins from the cytosolic fraction isolated from mid- and late-log, early- and late-stationary phase cells to generate reference maps of different growth conditions offering more knowledge of the metabolic behavior of this bacterium. From this fraction, a total of 200 protein spots were identified by MALDI-MS and a proteome production map was constructed to facilitate further studies such as detection of suitable biomarkers for specific growth conditions. More than half (57%) of the identified proteins were predicted to be involved in metabolic pathways of the bacterium. The protein profile changed during the growth of the bacteria such that 29% of the identified proteins involved in anabolic pathways were at least twofold up-regulated throughout the mid- and late-exponential and early-stationary phases. In the late-stationary phase, six proteins involved in stress or with a potential role for survival during starvation were up-regulated significantly.

7.

Optimizing transformations for automated, high throughput analysis of flow cytometry data

Greg Finak Juan-Manuel Perez Andrew Weng Raphael Gottardo 《BMC bioinformatics》2010,11(1):546

Background

In a high throughput setting, effective flow cytometry data analysis depends heavily on proper data preprocessing. While usual preprocessing steps of quality assessment, outlier removal, normalization, and gating have received considerable scrutiny from the community, the influence of data transformation on the output of high throughput analysis has been largely overlooked. Flow cytometry measurements can vary over several orders of magnitude, cell populations can have variances that depend on their mean fluorescence intensities, and may exhibit heavily-skewed distributions. Consequently, the choice of data transformation can influence the output of automated gating. An appropriate data transformation aids in data visualization and gating of cell populations across the range of data. Experience shows that the choice of transformation is data specific. Our goal here is to compare the performance of different transformations applied to flow cytometry data in the context of automated gating in a high throughput, fully automated setting. We examine the most common transformations used in flow cytometry, including the generalized hyperbolic arcsine, biexponential, linlog, and generalized Box-Cox, all within the BioConductor flowCore framework that is widely used in high throughput, automated flow cytometry data analysis. All of these transformations have adjustable parameters whose effects upon the data are non-intuitive for most users. By making some modelling assumptions about the transformed data, we develop maximum likelihood criteria to optimize parameter choice for these different transformations.
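
To make the role of an adjustable transformation parameter concrete, the sketch below applies a hyperbolic arcsine transform to a few intensity values spanning several orders of magnitude. It is only an illustration: the parameterization `asinh(a + b*x) + c`, the function name `arcsinh_transform`, the sample values and the choice `b=0.01` are assumptions made for this example, and the maximum likelihood parameter optimization described in the abstract is not reproduced here.

```python
import math


def arcsinh_transform(x, a=0.0, b=1.0, c=0.0):
    """Hyperbolic arcsine transform asinh(a + b*x) + c (assumed parameterization)."""
    return math.asinh(a + b * x) + c


# Hypothetical fluorescence intensities, including a negative (compensated) value.
raw = [-50.0, 0.0, 10.0, 1_000.0, 100_000.0]

# A small b keeps the transform roughly linear near zero and logarithmic for large values.
transformed = [arcsinh_transform(v, b=0.01) for v in raw]

for before, after in zip(raw, transformed):
    print(f"{before:>10.1f} -> {after:.4f}")
```

Unlike a plain logarithm, the transform is defined for zero and negative inputs, which is one reason it is commonly considered for this kind of data; how to choose `b` for a given data set is exactly the question the abstract raises.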

8.

RAPYD--rapid annotation platform for yeast data

Schneider J Blom J Jaenicke S Linke B Brinkrolf K Neuweger H Tauch A Goesmann A 《Journal of biotechnology》2011,155(1):118-126

Lower eukaryotes of the kingdom Fungi include a variety of biotechnologically important yeast species that have been in the focus of genome research for more than a decade. Due to the rapid progress in ultra-fast sequencing technologies, the amount of available yeast genome data increases steadily. Thus, an efficient bioinformatics platform is required that covers genome assembly, eukaryotic gene prediction, genome annotation, comparative yeast genomics, and metabolic pathway reconstruction. Here, we present a bioinformatics platform for yeast genomics named RAPYD addressing the key requirements of extensive yeast sequence data analysis. The first step is a comprehensive regional and functional annotation of a yeast genome. A region prediction pipeline was implemented to obtain reliable and high-quality predictions of coding sequences and further genome features. Functions of coding sequences are automatically determined using a configurable prediction pipeline. Based on the resulting functional annotations, a metabolic pathway reconstruction module can be utilized to rapidly generate an overview of organism-specific features and metabolic blueprints. In a final analysis step, shared and divergent features of closely related yeast strains can be explored using the comparative genomics module. An in-depth application example of the yeast Meyerozyma guilliermondii illustrates the functionality of RAPYD. A user-friendly web interface is available at https://rapyd.cebitec.uni-bielefeld.de.

9.

ESTAP--an automated system for the analysis of EST data (total citations: 2; self-citations: 0; citations by others: 2)

Mao C Cushman JC May GD Weller JW 《Bioinformatics (Oxford, England)》2003,19(13):1720-1722

The EST Analysis Pipeline (ESTAP) is a set of analytical procedures that automatically verify, cleanse, store and analyze ESTs generated on high-throughput platforms. It uses a relational database to store sequence data and analysis results, which facilitates both the search for specific information and statistical analysis. ESTAP provides for easy viewing of the original and cleansed data, as well as the analysis results via a Web browser. It also allows the data owner to submit selected sequences to dbEST in a semi-automated fashion.

10.

TRAP: automated classification, quantification and annotation of tandemly repeated sequences

Sobreira TJ Durham AM Gruber A 《Bioinformatics (Oxford, England)》2006,22(3):361-362

TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.

11.

Correlation network analysis for data integration and biomarker selection

Adourian A Jennings E Balasubramanian R Hines WM Damian D Plasterer TN Clish CB Stroobant P McBurney R Verheij ER Bobeldijk I van der Greef J Lindberg J Kenne K Andersson U Hellmold H Nilsson K Salter H Schuppe-Koistinen I 《Molecular bioSystems》2008,4(3):249-259


12.

PhyloGena--a user-friendly system for automated phylogenetic annotation of unknown sequences (total citations: 1; self-citations: 0; citations by others: 1)

Hanekamp K Bohnebeck U Beszteri B Valentin K 《Bioinformatics (Oxford, England)》2007,23(7):793-801

MOTIVATION: Phylogenomic approaches towards functional and evolutionary annotation of unknown sequences have been suggested to be superior to those based only on pairwise local alignments. User-friendly software tools making the advantages of phylogenetic annotation available for the ever widening range of bioinformatically uninitiated biologists involved in genome/EST annotation projects are, however, not available. We were particularly confronted with this issue in the annotation of sequences from different groups of complex algae originating from secondary endosymbioses, where the identification of the phylogenetic origin of genes is often more problematic than in taxa well represented in the databases (e.g. animals, plants or fungi). RESULTS: We present a flexible pipeline with a user-friendly, interactive graphical user interface running on desktop computers that automatically performs a basic local alignment search tool (BLAST) search of query sequences, selects a representative subset of them, then creates a multiple alignment from the selected sequences, and finally computes a phylogenetic tree. The pipeline, named PhyloGena, uses public domain software for all standard bioinformatics tasks (similarity search, multiple alignment, and phylogenetic reconstruction). As the major technological innovation, selection of a meaningful subset of BLAST hits was implemented using logic programming, mimicking the selection procedure (BLAST tables, multiple alignments and phylogenetic trees) are displayed graphically, allowing the user to interact with the pipeline and deduce the function and phylogenetic origin of the query. PhyloGena thus makes phylogenomic annotation available also for those biologists without access to large computing facilities and with little informatics background. Although phylogenetic annotation is particularly useful when working with composite genomes (e.g. from complex algae), PhyloGena can be helpful in expressed sequence tag and genome annotation also in other organisms. AVAILABILITY: PhyloGena (executables for LINUX and Windows 2000/XP as well as source code) is available by anonymous ftp from http://www.awi.de/en/phylogena.

13.

Evolutionarily conserved substrate substructures for automated annotation of enzyme superfamilies

Chiang RA Sali A Babbitt PC 《PLoS computational biology》2008,4(8):e1000142

The evolution of enzymes affects how well a species can adapt to new environmental conditions. During enzyme evolution, certain aspects of molecular function are conserved while other aspects can vary. Aspects of function that are more difficult to change or that need to be reused in multiple contexts are often conserved, while those that vary may indicate functions that are more easily changed or that are no longer required. In analogy to the study of conservation patterns in enzyme sequences and structures, we have examined the patterns of conservation and variation in enzyme function by analyzing graph isomorphisms among enzyme substrates of a large number of enzyme superfamilies. This systematic analysis of substrate substructures establishes the conservation patterns that typify individual superfamilies. Specifically, we determined the chemical substructures that are conserved among all known substrates of a superfamily and the substructures that are reacting in these substrates and then examined the relationship between the two. Across the 42 superfamilies that were analyzed, substantial variation was found in how much of the conserved substructure is reacting, suggesting that superfamilies may not be easily grouped into discrete and separable categories. Instead, our results suggest that many superfamilies may need to be treated individually for analyses of evolution, function prediction, and guiding enzyme engineering strategies. Annotating superfamilies with these conserved and reacting substructure patterns provides information that is orthogonal to information provided by studies of conservation in superfamily sequences and structures, thereby improving the precision with which we can predict the functions of enzymes of unknown function and direct studies in enzyme engineering. Because the method is automated, it is suitable for large-scale characterization and comparison of fundamental functional capabilities of both characterized and uncharacterized enzyme superfamilies.

14.

CHIKVPRO - a protein sequence annotation database for chikungunya virus

Mishra AK Jain CK Agrawal A Jain SJ Dudha N Kumar K Sharma SK Gupta S 《Bioinformation》2010,5(1):4-6

In the recent past, there has been a resurgence of interest in Chikungunya virus (CHIKV), attributed to massive outbreaks of Chikungunya fever in the South-East Asia Region. This is reflected in a substantial increase in the submission of CHIKV genome sequences to the NCBI (National Center for Biotechnology Information) database. We hereby present a database, "CHIKVPRO", containing structural and functional annotation of Chikungunya virus proteins (25 strains) submitted to the NCBI repository. The CHIKV genome encodes 9 proteins: 4 non-structural and 5 structural. The CHIKVPRO database aims to provide the virology community with a single accession authoritative resource for the CHIKV proteome, with reference to physicochemical and molecular properties, proteolytic cleavage sites, hydrophobicity, transmembrane prediction, and classification into functional families using SVMProt and other Expasy tools. AVAILABILITY: The database is freely available at http://www.chikvpro.info/

15.

Proteomic map and database of lymphoblastoid proteins

Caron M Imam-Sghiouar N Poirier F Le Caër JP Labas V Joubert-Caron R 《Journal of chromatography. B, Analytical technologies in the biomedical and life sciences》2002,771(1-2):197-209


16.

An automated system designed for large scale NMR data deposition and annotation: application to over 600 assigned chemical shift data entries to the BioMagResBank from the Riken Structural Genomics/Proteomics Initiative internal database

Kobayashi N Harano Y Tochio N Nakatani E Kigawa T Yokoyama S Mading S Ulrich EL Markley JL Akutsu H Fujiwara T 《Journal of biomolecular NMR》2012,53(4):311-320

Biomolecular NMR chemical shift data are key information for the functional analysis of biomolecules and the development of new techniques for NMR studies utilizing chemical shift statistical information. Structural genomics projects are major contributors to the accumulation of protein chemical shift information. The management of the large quantities of NMR data generated by each project in a local database and the transfer of the data to the public databases are still formidable tasks because of the complicated nature of NMR data. Here we report an automated and efficient system developed for the deposition and annotation of a large number of data sets including ¹H, ¹³C and ¹⁵N resonance assignments used for the structure determination of proteins. We have demonstrated the feasibility of our system by applying it to over 600 entries from the internal database generated by the RIKEN Structural Genomics/Proteomics Initiative (RSGI) to the public database, BioMagResBank (BMRB). We have assessed the quality of the deposited chemical shifts by comparing them with those predicted from the PDB coordinate entry for the corresponding protein. The same comparison for other matched BMRB/PDB entries deposited from 2001-2011 has been carried out and the results suggest that the RSGI entries greatly improved the quality of the BMRB database. Since the entries include chemical shifts acquired under strikingly similar experimental conditions, these NMR data can be expected to be a promising resource to improve current technologies as well as to develop new NMR methods for protein studies.

17.

The comparative analysis of statistics, based on the likelihood ratio criterion, in the automated annotation problem

Andrey M Leontovich Konstantin Y Tokmachev Hans C van Houwelingen 《BMC bioinformatics》2008,9(1):31

Background

This paper discusses the problem of automated annotation. It is a continuation of the previous work on the A⁴-algorithm (Adaptive algorithm of automated annotation) developed by Leontovich and others.

18.

Extent and diversity of human alternative splicing established by complementary database annotation and microarray analysis

Bingham JL Carrigan PE Miller LJ Srinivasan S 《Omics : a journal of integrative biology》2008,12(1):83-92

Alternative splicing generates functional diversity in higher organisms through alternative first and last exons, skipped and included exons, intron retentions, and alternative donor and acceptor sites. In large-scale microarray studies in humans and the mouse, emphasis so far has been placed on exon-skip events, leaving the prevalence and importance of other splice types largely unexplored. Using a new human splice variant database and a genome-wide microarray to probe thousands of splice events of each type, we measured differential expression of splice types across six pairs of diverse cell lines and validated the database annotation process. Results suggest that splicing in humans is more complex than simple exon-skip events, which account for a minority of splicing differences. The relative frequency of differential expression of the splice types correlates with what is found by our annotation efforts. In conclusion, alternative splicing in human cells is considerably more complex than the canonical example of the exon skip. The complementary approaches of genome-wide annotation of alternative splicing in human and design of genome-wide splicing microarrays to measure differential splicing in biological samples provide a powerful high-throughput tool to study the role of alternative splicing in human biology.

19.

Regularization network-based gene selection for microarray data analysis

Zhou X Mao KZ 《International journal of neural systems》2006,16(5):341-352

Microarray data contains a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100). This presents problems for discriminant analysis of microarray data. One way to alleviate the problem is to reduce the dimensionality of the data by selecting genes that are important to the discriminant problem. Gene selection can be cast as a feature selection problem in the context of pattern classification. Feature selection approaches are broadly grouped into filter methods and wrapper methods. The wrapper method outperforms the filter method but at the cost of more intensive computation. In the present study, we proposed a wrapper-like gene selection algorithm based on the Regularization Network. Compared with the classical wrapper method, the computational cost of our gene selection algorithm is significantly reduced, because the evaluation criterion we proposed does not demand repeated training in the leave-one-out procedure.
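
For readers unfamiliar with the filter/wrapper distinction drawn above, here is a minimal sketch of a filter method: each gene is scored independently of any classifier and the top-scoring genes are kept. It is not the Regularization Network algorithm from the abstract; the signal-to-noise score, the function names `signal_to_noise` and `rank_genes`, and the toy expression values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Score one gene by the class-mean difference relative to within-class spread."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    # The small constant avoids division by zero for constant genes.
    return abs(mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg) + 1e-12)


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names, ranked by their individual scores."""
    scores = {gene: signal_to_noise(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical expression values: three genes measured in six samples.
expression = {
    "gene_a": [5.1, 4.9, 5.0, 1.0, 1.2, 0.9],
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "gene_c": [3.0, 3.5, 2.5, 2.0, 2.4, 1.6],
}
labels = [1, 1, 1, 0, 0, 0]

selected = rank_genes(expression, labels, top_k=2)
print(selected)
```

A wrapper method would instead train and evaluate a classifier for each candidate gene subset, which is where the extra computational cost mentioned in the abstract comes from.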

20.

An automated annotation tool for genomic DNA sequences using GeneScan and BLAST (total citations: 1; self-citations: 0; citations by others: 1)

Lynn AM Jain CK Kosalai K Barman P Thakur N Batra H Bhattacharya A 《Journal of genetics》2001,80(1):9-16

Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.