Similar Articles
2.
Patel RK, Jain M. PLoS ONE, 2012, 7(2): e30619
Next-generation sequencing (NGS) technologies provide a high-throughput means of generating large amounts of sequence data. However, quality control (QC) of the sequence data generated by these technologies is extremely important for meaningful downstream analysis. Moreover, highly efficient and fast processing tools are required to handle the large volume of data. Here, we have developed an application, NGS QC Toolkit, for quality checking and for filtering high-quality data. This toolkit is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application are implemented in the Perl programming language. The toolkit comprises user-friendly tools for QC of sequencing data generated on the Roche 454 and Illumina platforms, plus additional tools to aid QC (a sequence format converter and trimming tools) and analysis (statistics tools). A variety of options is provided to facilitate QC with user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data, enabling better downstream analysis.
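The core of read-level quality filtering can be sketched in Python as follows. This is not code from NGS QC Toolkit (which is written in Perl); it is a minimal sketch assuming FASTQ input with Phred+33 quality encoding, and the cutoffs (keep a read if at least 70% of its bases have a Phred quality of 20 or more) and all function names are illustrative assumptions.

```python
from typing import Iterator, List, Tuple

FastqRecord = Tuple[str, str, str, str]

def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]

def passes_qc(quality: str, min_quality: int = 20, min_fraction: float = 0.7) -> bool:
    """Illustrative rule: keep a read if enough of its bases reach min_quality."""
    scores = phred_scores(quality)
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_quality)
    return good / len(scores) >= min_fraction

def filter_fastq(lines: List[str], **cutoffs) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records (header, sequence, '+', quality) that pass QC."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if passes_qc(qual, **cutoffs):
            yield header, seq, plus, qual

if __name__ == "__main__":
    reads = ["@r1", "ACGT", "+", "IIII",   # 'I' decodes to Q40
             "@r2", "ACGT", "+", "!!!!"]   # '!' decodes to Q0
    print([rec[0] for rec in filter_fastq(reads)])  # ['@r1']
```

Real tools typically combine such a per-read rule with trimming of low-quality ends; the single fraction-based rule here keeps the idea visible.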

3.
The MicroCore toolkit is an open-source suite of analysis programs for microarray and proteomics data, written exclusively in Java. MicroCore provides a flexible and extensible environment for interpreting functional genomics data through visualization. The first version of the application (downloadable from the MicroCore website: http://www.ucl.ac.uk/oncology/MicroCore/microcore.htm) implements two programs, PIMs (protein interaction maps) and MicroExpress, and is soon to be followed by an extended version that will also feature a fuzzy k-means clustering application and a Java-based R plug-in for microarray analysis. PIMs and MicroExpress provide a simple yet powerful way of graphically relating large quantities of expression data from multiple experiments to cellular pathways and biological processes in a statistically meaningful way.
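MicroCore's fuzzy k-means application is only announced above, and MicroCore itself is written in Java. For readers unfamiliar with the technique, the standard fuzzy c-means (also called fuzzy k-means) updates can be sketched in Python as follows; the fuzzifier m = 2, the deterministic farthest-point seeding and all names are assumptions, not MicroCore's design.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def _farthest_point_init(points: List[Point], c: int) -> List[List[float]]:
    """Deterministic seeding (an assumption): start from the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(math.dist(p, v) for v in centers))
        centers.append(list(far))
    return centers

def fuzzy_c_means(points: List[Point], c: int, m: float = 2.0, max_iter: int = 100,
                  tol: float = 1e-6) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1."""
    centers = _farthest_point_init(points, c)
    exponent = 2.0 / (m - 1.0)
    memberships: List[List[float]] = []
    for _ in range(max_iter):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        memberships = []
        for x in points:
            dists = [math.dist(x, v) for v in centers]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a center: give it crisp membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            memberships.append(row)
        # Center step: mean of all points weighted by u_ij^m.
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([sum(w * x[k] for w, x in zip(weights, points)) / total
                                for k in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Unlike hard k-means, every expression profile keeps a graded membership in every cluster, which is what makes the fuzzy variant attractive for genes that participate in several programs.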

4.

Background  

Several isolated tools exist for partial analysis of microarray expression data. To provide an integrative, easy-to-use and automated toolkit for analyzing Affymetrix microarray expression data, we have developed Array2BIO, an application that couples several analytical methods into a single web-based utility.

5.
One of the challenges to the effective use of cDNA microarray analysis in mouse models of oncogenesis is the choice of a critical set of probes that are informative for human disease. Given the thousands of genes with a potential role in human oncogenesis and the hundreds of thousands of mouse sequences available for use as probes, selecting an informative set of mouse probes can be an overwhelming task. We have developed a web-based sequence mining tool using DataBase Independent (DBI) Perl to annotate publicly available sequences. The Mouse Oncochip Design Tool uses the Mouse Genome Database (MGD), developed and maintained by The Jackson Laboratory, as its source of mouse DNA sequences; the database contains over 380 000 sequences. The output list is ordered, using a candidate set of human oncogenes, so that the genes most likely to be informative in a mouse model of human cancer are presented first: mouse sequences representing genes homologous to a member of the human oncogene set are listed at the top. In addition, the tool provides a set of links to information on clone source and gene function. Contact: http://nciarray.nci.nih.gov/cgi-bin/me/mouse_design.cgi
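The ordering rule described above (homologs of a human oncogene set listed first) amounts to a stable two-group sort. A minimal Python sketch, assuming each mouse sequence has already been mapped to at most one human homolog symbol; the data layout and names are hypothetical and not taken from the Mouse Oncochip Design Tool, which was written in Perl.

```python
from typing import Iterable, List, Optional, Tuple

Probe = Tuple[str, Optional[str]]  # (mouse sequence ID, human homolog symbol or None)

def order_probes(probes: List[Probe], oncogenes: Iterable[str]) -> List[Probe]:
    """Put probes whose human homolog is in the oncogene set first.

    sorted() is stable, so probes within each group keep their input order.
    """
    onco = {g.upper() for g in oncogenes}

    def rank(probe: Probe) -> int:
        homolog = probe[1]
        return 0 if homolog is not None and homolog.upper() in onco else 1

    return sorted(probes, key=rank)
```

Keeping the sort stable means any prior ordering (for example, by annotation quality) survives within each group.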

8.
SUMMARY: Visual programming offers an intuitive means of combining known analysis and visualization methods into powerful applications. The system presented here enables users who are not programmers to manage microarray and genomic data flow and to customize their analyses by combining common data analysis tools to fit their needs. AVAILABILITY: http://www.ailab.si/supp/bi-visprog SUPPLEMENTARY INFORMATION: http://www.ailab.si/supp/bi-visprog.

10.
SUMMARY: New bacterial genomes are currently being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible, easy-to-maintain structure for storing sequence data and the results of bioinformatic analyses. More than 150 sequenced bacterial genomes are now available, yet comparisons of properties across taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations such as curvature and stacking energy, and DNA composition measures such as base skews, oligo skews and repeats at the local and global level, are just a few of the analyses presented on the CBS Genome Atlas web page. Complex analyses, changing methods and the frequent addition of new models all require a dynamic database layout. Using basic tools such as the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution is a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database is connected to the CBS web server via PHP4 to present dynamic web content to users outside the center. This solution is tightly fitted to the existing server infrastructure, and the approach proposed here may serve as a template for other research groups facing similar database issues. AVAILABILITY: A web-based user interface, dynamically linked to the Genome Atlas Database, can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/.
SUPPLEMENTARY INFORMATION: This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
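Two of the simpler measures mentioned above, AT content and a base skew, can be computed either globally or in sliding windows (the local level). The Python sketch below uses the common GC-skew definition (G - C)/(G + C); the window size, step and function names are illustrative assumptions and not part of the Genome Atlas pipeline, which the abstract describes as built on Make, csh, Perl and MySQL.

```python
from typing import Callable, List

def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (N and other codes are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("A") + s.count("T")) / acgt if acgt else 0.0

def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); returns 0.0 when the window contains no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0

def sliding(seq: str, window: int, step: int,
            measure: Callable[[str], float]) -> List[float]:
    """Apply a measure to successive windows to build a local profile."""
    return [measure(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]
```

For example, `sliding("GGGGCCCC", 4, 4, gc_skew)` gives `[1.0, -1.0]`; in bacterial chromosomes, sign changes in such a profile are often used to locate the replication origin and terminus.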

12.
We present a software package, Genquire, that allows visualization, querying, hand editing, and de novo markup of complete or partially annotated genomes. The system is written in Perl/Tk and uses, where possible, existing BioPerl data models and methods for representation and manipulation of the sequence and annotation objects. An adaptor API is provided to allow Genquire to display a wide range of databases and flat files, and a plugins API provides an interface to other sequence analysis software. AVAILABILITY: Genquire v3.03 is open-source software. The code is available for download and/or contribution at http://www.bioinformatics.org/Genquire

14.
Adjustment of systematic microarray data biases
MOTIVATION: Systematic differences due to experimental features of microarray experiments are present in most large microarray data sets. Many different experimental features can cause biases, including different sources of RNA, different production lots of microarrays, or different microarray platforms. These systematic effects present a substantial hurdle to the analysis of microarray data. RESULTS: We present a new method for identifying and adjusting systematic biases present within microarray data sets. Our approach is based on modern statistical discrimination methods and is shown to be very effective in removing systematic biases present in a previously published breast tumor cDNA microarray data set. The new method, Distance Weighted Discrimination (DWD), is shown to be better than Support Vector Machines and Singular Value Decomposition for adjusting systematic microarray effects. In addition, it is shown to be of general use as a tool for discriminating systematic problems present in microarray data sets, including the merging of two breast tumor data sets generated on different microarray platforms. AVAILABILITY: Matlab software to perform DWD can be retrieved from https://genome.unc.edu/pubsup/dwd/
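As we understand projection-based batch adjustment of this kind, the adjustment step is: find a direction that separates the two batches, project each array onto it, and shift each batch so that its mean projection is zero. In the Python sketch below each sample is a vector of gene expression values. The direction is simply the normalized difference of batch means, a deliberate simplification standing in for the DWD direction (which requires solving an optimization problem); all names are hypothetical, and this is not the authors' Matlab software.

```python
import math
from typing import List

Vector = List[float]

def mean_vector(samples: List[Vector]) -> Vector:
    """Per-gene mean across the samples of one batch."""
    return [sum(col) / len(samples) for col in zip(*samples)]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def mean_difference_direction(batch_a: List[Vector], batch_b: List[Vector]) -> Vector:
    """Stand-in for the DWD direction: the normalized difference of batch means.
    (Real DWD solves an optimization problem; this is only a placeholder.)"""
    diff = [a - b for a, b in zip(mean_vector(batch_a), mean_vector(batch_b))]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("batch means coincide; no direction to adjust along")
    return [d / norm for d in diff]

def adjust_batch(samples: List[Vector], w: Vector) -> List[Vector]:
    """Shift every sample along w so the batch's mean projection onto w is zero."""
    shift = sum(dot(s, w) for s in samples) / len(samples)
    return [[x - shift * wi for x, wi in zip(s, w)] for s in samples]
```

Only the component along the chosen direction is changed, so variation orthogonal to it (where biological signal is hoped to live) is left untouched; the quality of the adjustment therefore depends entirely on how well the direction captures the batch effect.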

15.
SUMMARY: Inferring genetic network architecture from time-series data generated by high-throughput experimental technologies, such as cDNA microarrays, can help us understand the system behavior of living organisms. We have developed an interactive tool, GeneNetwork, which provides four reverse-engineering models and three data interpolation approaches to infer relationships between genes. GeneNetwork enables a user to readily reconstruct genetic networks from microarray data without intimate knowledge of the mathematical models. A simple graphical user interface enables rapid, intuitive mapping and analysis of the reconstructed network, allowing biologists to explore gene relationships at the system level. AVAILABILITY: Download from http://genenetwork.sbl.bc.sinica.edu.tw/. SUPPLEMENTARY INFORMATION: Supplementary documentation of the algorithms for the four approaches can be downloaded from the same location.
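GeneNetwork's four reverse-engineering models and three interpolation approaches are not named in the abstract, so the Python sketch below is only a generic illustration of the two ingredients: piecewise-linear interpolation to fill in unsampled time points, and a simple time-lagged correlation rule that proposes an edge from gene X to gene Y when X at time t correlates with Y at time t+1. Whether either technique is among GeneNetwork's options is an assumption, and the 0.9 threshold and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def interpolate(times: Sequence[float], values: Sequence[float],
                new_times: Sequence[float]) -> List[float]:
    """Piecewise-linear interpolation of one gene's time series."""
    out = []
    for t in new_times:
        for i in range(len(times) - 1):
            t0, t1 = times[i], times[i + 1]
            if t0 <= t <= t1:
                frac = (t - t0) / (t1 - t0)
                out.append(values[i] + frac * (values[i + 1] - values[i]))
                break
        else:
            raise ValueError(f"time {t} is outside the sampled range")
    return out

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sxy / (sx * sy) if sx and sy else 0.0

def lagged_correlation_edges(profiles: Dict[str, List[float]],
                             threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Propose regulator -> target when regulator(t) correlates with target(t+1)."""
    edges = []
    for reg, x in profiles.items():
        for tgt, y in profiles.items():
            if reg != tgt:
                r = pearson(x[:-1], y[1:])
                if abs(r) >= threshold:
                    edges.append((reg, tgt, round(r, 3)))
    return edges
```

Interpolating first gives the lagged comparison evenly spaced time points to work with; with few samples, correlation-based edges like these are best read as hypotheses rather than established regulatory links.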

16.
Genomics, 2020, 112(1): 286-288
Synteny and collinearity analysis is a standard investigative strategy used in many comparative genomic studies to understand genomic conservation and evolution. Currently, most visualization toolkits for synteny and collinearity do not emphasize the graphical representation of results; in particular, they lack extensible vector-graphics output formats. This limitation becomes more apparent as third-generation sequencing produces high-throughput data that require higher-resolution images. We developed VGSC2, the second version of the web-based vector graph toolkit for genome synteny and collinearity analysis. The updated version supports four types of plots for synteny and collinearity, and three types of plots for gene family evolution research. Using web-based technologies, VGSC2 provides an easy-to-use interface that renders homologous genomic results as vector graphics such as SVG, EPS and PDF, together with an online editor. VGSC2 is open source and freely available online through the web server at http://bio.njfu.edu.cn/vgsc2.
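The abstract describes VGSC2 as a visualization toolkit, so the following is only a conceptual aid for what a collinear block is: homologous gene pairs (anchors) whose positions advance together in both genomes. The Python sketch below is a deliberately simplified, same-orientation-only chaining rule; the gap limit, the minimum block size and the names are assumptions, and dedicated synteny tools use more elaborate scoring and also detect inverted blocks.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (gene index in genome A, gene index in genome B)

def collinear_blocks(anchors: List[Anchor], min_size: int = 3,
                     max_gap: int = 1) -> List[List[Anchor]]:
    """Group homologous gene pairs into same-orientation collinear runs.

    Consecutive anchors (sorted by position in genome A) stay in one block when
    both coordinates advance by 1..max_gap genes; shorter runs are discarded.
    """
    ordered = sorted(anchors)
    blocks: List[List[Anchor]] = []
    current: List[Anchor] = ordered[:1]
    for prev, cur in zip(ordered, ordered[1:]):
        da, db = cur[0] - prev[0], cur[1] - prev[1]
        if 0 < da <= max_gap and 0 < db <= max_gap:
            current.append(cur)
        else:
            if len(current) >= min_size:
                blocks.append(current)
            current = [cur]
    if len(current) >= min_size:
        blocks.append(current)
    return blocks
```

Blocks found this way are exactly what synteny plots draw as ribbons or diagonal segments between two genomes.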

18.
Although minimum information about a microarray experiment (MIAME) standards have helped to increase the value of the microarray data deposited in public databases such as ArrayExpress and Gene Expression Omnibus (GEO), limited means have been available to assess the quality of these data or to identify the procedures used to normalize and transform raw data. The EMERALD FP6 Coordination Action was designed to deliver approaches for assessing and enhancing the overall quality of microarray data and to disseminate these approaches to the microarray community through an extensive series of workshops, tutorials and symposia. Tools were developed for assessing data quality and were used to demonstrate how removing poor-quality data can improve the power of statistical analyses and facilitate the joint analysis of multiple microarray data sets. These quality-metrics tools have been disseminated through publications and through the software package arrayQualityMetrics. Within the framework provided by the Ontology for Biomedical Investigations, an ontology was developed to describe data transformations, and a software ontology was developed for gene expression analysis software. In addition, the consortium has advocated the development and use of external reference standards in microarray hybridizations, and it created the Molecular Methods (MolMeth) database, which provides a central source of methods and protocols focusing on microarray-based technologies.
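A distance-based array quality check can be sketched as follows: compute a distance between every pair of arrays and flag arrays that are far from all the others. The distance (mean absolute difference) and the outlier rule (Tukey's upper fence on each array's summed distance) are assumptions chosen for illustration; this is Python, not the R code of arrayQualityMetrics, and it does not reproduce that package's exact metrics.

```python
import statistics
from typing import List, Sequence

def distance_matrix(arrays: List[Sequence[float]]) -> List[List[float]]:
    """Mean absolute difference between every pair of arrays."""
    n = len(arrays)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = sum(abs(a - b) for a, b in zip(arrays[i], arrays[j]))
            d[i][j] = d[j][i] = diff / len(arrays[i])
    return d

def outlier_arrays(arrays: List[Sequence[float]]) -> List[int]:
    """Flag arrays whose summed distance to all others exceeds Tukey's upper fence."""
    totals = [sum(row) for row in distance_matrix(arrays)]
    q1, _, q3 = statistics.quantiles(totals, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return [i for i, total in enumerate(totals) if total > fence]
```

Summing each array's distances to all others turns a pairwise matrix into one score per array, so a single aberrant hybridization stands out even when the remaining arrays differ modestly among themselves.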

19.
We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) that enables the probabilistic analysis of cDNA microarray data, and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called Latent Process Decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for interpreting measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it can objectively assess the optimal number of sample clusters. Second, it represents samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced-space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster; for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show that this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several cancer microarray data sets. For these data sets, it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision, in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes known to be medically significant.
To demonstrate its wider applicability, we also evaluate its performance on a yeast microarray data set.
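The mixture representation at the heart of LPD can be illustrated by evaluating how well a given set of process weights explains one sample. In this Python sketch we assume, as our reading of the model, that each gene's expression under each latent process is Gaussian with its own mean and standard deviation. Parameter estimation (the variational methods mentioned above) and the prior over the weights are not shown, all names are hypothetical, and this is not the authors' software.

```python
import math
from typing import List, Sequence

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def sample_log_likelihood(expr: Sequence[float], theta: Sequence[float],
                          mu: List[List[float]], sigma: List[List[float]]) -> float:
    """Log-likelihood of one sample under a mixture over latent processes.

    expr[g]      expression of gene g in this sample
    theta[k]     this sample's weight on process k (weights sum to 1)
    mu[g][k]     mean of gene g under process k
    sigma[g][k]  standard deviation of gene g under process k
    Each gene is treated as an independent mixture of per-process Gaussians.
    """
    total = 0.0
    for g, x in enumerate(expr):
        mix = sum(t * normal_pdf(x, mu[g][k], sigma[g][k]) for k, t in enumerate(theta))
        total += math.log(mix)
    return total
```

Because theta is per sample and shared across genes, one sample can be explained partly by several processes at once, which is the contrast with hard cluster assignment drawn in the abstract.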

20.
Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.
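memes itself is an R interface to MEME Suite tools, so the following is only a conceptual illustration, in Python, of the log-odds scoring that motif scanners commonly use to locate motif occurrences in a sequence. The uniform background, the pseudocount of 0.01 and the function names are assumptions, and the sketch does not reproduce any MEME Suite program's scoring or p-value computation.

```python
import math
from typing import Dict, List, Optional, Tuple

Column = Dict[str, float]  # base -> probability at one motif position

def log_odds_matrix(ppm: List[Column], background: Optional[Column] = None,
                    pseudocount: float = 0.01) -> List[Column]:
    """Convert a position probability matrix into log2-odds scores."""
    bg = background or {b: 0.25 for b in "ACGT"}
    pwm = []
    for col in ppm:
        total = sum(col.get(b, 0.0) + pseudocount for b in "ACGT")
        pwm.append({b: math.log2((col.get(b, 0.0) + pseudocount) / total / bg[b])
                    for b in "ACGT"})
    return pwm

def best_site(seq: str, pwm: List[Column]) -> Tuple[int, float]:
    """Return (start, score) of the highest-scoring window; seq must be ACGT only."""
    width = len(pwm)
    best: Optional[Tuple[int, float]] = None
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width].upper()
        score = sum(col[base] for col, base in zip(pwm, window))
        if best is None or score > best[1]:
            best = (i, score)
    if best is None:
        raise ValueError("sequence is shorter than the motif")
    return best
```

The pseudocount keeps unobserved bases from producing a log of zero, so a single mismatch lowers a site's score instead of ruling it out entirely.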


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417