Similar literature
1.
NMPP: a user-customized NimbleGen microarray data processing pipeline   Total citations: 1 (self-citations: 0, citations by others: 1)
The NMPP package is a bundle of user-customized tools based on established algorithms and methods to process self-designed NimbleGen microarray data. It features a command-line-based integrative processing procedure that comprises five major functional components, namely the raw microarray data parsing and integrating module, the array spatial effect smoothing and visualization module, the probe-level multi-array normalization module, the gene expression intensity summarization module and the gene expression status inference module. AVAILABILITY: http://plantgenomics.biology.yale.edu/nmpp

2.
3.
Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at: www.bioconductor.org.
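The idea of a variance-stabilizing transformation in the abstract above can be illustrated with a small sketch. It assumes the commonly used additive-plus-multiplicative noise model, in which the variance at true intensity u is (c1·u + c2)² + c3; the matching transformation is then an arcsinh. The function names and the parameter values c1, c2, c3 are hypothetical and chosen only for illustration; they are not the fitted values or the interface of the lumi package.

```python
import math


def noise_variance(u, c1, c2, c3):
    """Assumed noise model: variance grows quadratically with intensity u."""
    return (c1 * u + c2) ** 2 + c3


def arcsinh_vst(y, c1, c2, c3):
    """Transformation h(y) whose derivative is 1 / sqrt(noise_variance(y))."""
    return math.asinh((c1 * y + c2) / math.sqrt(c3)) / c1


def approx_sd_after_transform(y, c1, c2, c3, eps=1e-2):
    """Delta-method estimate of the standard deviation after transformation.

    The slope of h at y is estimated by a central difference and multiplied
    by the standard deviation that the noise model assigns to intensity y.
    """
    slope = (arcsinh_vst(y + eps, c1, c2, c3)
             - arcsinh_vst(y - eps, c1, c2, c3)) / (2 * eps)
    return slope * math.sqrt(noise_variance(y, c1, c2, c3))


# Hypothetical model parameters, for illustration only.
C1, C2, C3 = 0.1, 2.0, 400.0

for intensity in (50.0, 500.0, 5000.0, 50000.0):
    raw_sd = math.sqrt(noise_variance(intensity, C1, C2, C3))
    vst_sd = approx_sd_after_transform(intensity, C1, C2, C3)
    print(f"intensity={intensity:>8.0f}  raw sd={raw_sd:>8.1f}  sd after VST={vst_sd:.4f}")
```

Under this assumed model the raw standard deviation grows with intensity, while the approximate standard deviation after transformation stays close to 1 across the whole range, which is what "stabilizing" the variance means.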

4.

Background  

High-throughput methods that allow for measuring the expression of thousands of genes or proteins simultaneously have opened new avenues for studying biochemical processes. While the noisiness of the data necessitates extensive pre-processing of the raw data, the high dimensionality requires effective statistical analysis methods that facilitate the identification of crucial biological features and relations. For these reasons, the evaluation and interpretation of expression data is a complex, labor-intensive multi-step process. While a variety of tools for normalizing, analysing, or visualizing expression profiles have been developed in recent years, most of these tools offer only functionality for accomplishing certain steps of the evaluation pipeline.

5.
Feather mites are among the most common and diverse ectosymbionts of birds, yet basic questions such as the nature of their relationship remain largely unanswered. One reason for feather mites being understudied is that their morphological identification is often virtually impossible when using female or young individuals. Even for adult male specimens this task is tedious and requires advanced taxonomic expertise, thus hampering large-scale studies. In addition, molecular-based methods are challenging because the low DNA amounts usually obtained from these tiny mites do not reach the levels required for high-throughput sequencing. This work aims to overcome these issues by using a DNA metabarcoding approach to accurately identify and quantify the feather mite species present in a sample. DNA metabarcoding is a widely used molecular technique that takes advantage of high-throughput sequencing methodologies to assign the taxonomic identity to all the organisms present in a complex sample (i.e., a sample made up of multiple specimens that are hard or impossible to individualise). We present a high-throughput method for feather mite identification using a fragment of the COI gene as marker and Illumina MiSeq technology. We tested this method by performing two experiments plus a field test over a total of 11,861 individual mites (5360 of which were also morphologically identified). In the first experiment, we tested the probability of detecting a single feather mite in a heterogeneous pool of non-conspecific individuals. In the second experiment, we made 2 × 2 combinations of species and studied the relationship between the proportion of individuals of a given species in a sample and the proportion of sequences retrieved to test whether DNA metabarcoding can reliably quantify the relative abundance of mites in a sample. Here we also tested the efficacy of degenerate primers (i.e., a mixture of similar primers that differ in one or several bases that are designed to increase the chance of annealing) and investigated the relationship between the number of mismatches and PCR success. Finally, we applied our DNA metabarcoding pipeline to a total of 6501 unidentified and unsorted feather mite individuals sampled from 380 European passerine birds belonging to 10 bird species (field test). Our results show that this proposed pipeline is suitable for correct identification and quantitative estimation of the relative abundance of feather mite species in complex samples, especially when dealing with a moderate number (> 30) of individuals per sample.

6.
We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data. In a completely automated fashion, Tilescope will normalize signals between channels and across arrays, combine replicate experiments, score each array element, and identify genomic features. The program is designed with a modular, three-tiered architecture, facilitating parallelism, and a graphic user-friendly interface, presenting results in an organized web page, downloadable for further analysis.

7.
8.
Measurements of gene expression from microarray experiments are highly dependent on experimental design. Systematic noise can be introduced into the data at numerous steps. On Illumina BeadChips, multiple samples are assayed in an ordered series of arrays. Two experiments were performed using the same samples but different hybridization designs. An experiment confounding genotype with BeadChip and treatment with array position was compared to another experiment in which these factors were randomized to BeadChip and array position. An ordinal effect of array position on intensity values was observed in both experiments. We demonstrate that there is an increased rate of false-positive results in the confounded design and that attempts to correct for confounded effects by statistical modeling reduce power of detection for true differential expression. Simple analysis models without post hoc corrections provide the best results possible for a given experimental design. Normalization improved differential expression testing in both experiments but randomization was the most important factor for establishing accurate results. We conclude that lack of randomization cannot be corrected by normalization or by analytical methods. Proper randomization is essential for successful microarray experiments.
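As an illustration of the randomized design described in the abstract above, the following sketch assigns samples to BeadChips and array positions at random, so that neither genotype nor treatment lines up systematically with chip or position. It is a minimal sketch of complete randomization only; the sample names, the six arrays per chip, and the function name are assumptions made for the example and do not come from the study.

```python
import random


def randomize_layout(sample_ids, arrays_per_chip, seed):
    """Randomly assign each sample to a (chip, position) slot.

    Samples are shuffled with a seeded generator so the layout can be
    reproduced, then dealt out chip by chip in the shuffled order.
    Chip and position numbers start at 1.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    layout = {}
    for slot, sample in enumerate(shuffled):
        chip, position = divmod(slot, arrays_per_chip)
        layout[sample] = (chip + 1, position + 1)
    return layout


# Hypothetical design: 2 genotypes x 2 treatments x 3 replicates = 12 samples.
samples = [
    f"{genotype}_{treatment}_rep{replicate}"
    for genotype in ("G1", "G2")
    for treatment in ("control", "treated")
    for replicate in (1, 2, 3)
]

layout = randomize_layout(samples, arrays_per_chip=6, seed=2024)
for sample in sorted(layout, key=layout.get):
    chip, position = layout[sample]
    print(f"chip {chip}, position {position}: {sample}")
```

Recording the seed alongside the layout keeps the assignment auditable. A blocked design that balances genotype and treatment within each chip is a possible refinement, but it is not shown here.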

9.
SUMMARY: Microarray data management and processing (MAD) is a set of Windows integrated software for microarray analysis. It consists of a relational database for data storage with many user-interfaces for data manipulation, several text file parsers and Microsoft Excel macros for automation of data processing, and a generator to produce text files that are ready for cluster analysis. AVAILABILITY: Executable is available free of charge on http://pompous.swmed.edu. The source code is also available upon request.

10.

Background  

Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets.

11.
Gene expression microarrays and oligonucleotide GeneChips have provided biologists with a means of measuring, in a single experiment, the expression levels of entire genomes under a variety of conditions. As with any nascent field, there is no single accepted method for analyzing the new data types, with new methods appearing monthly. Investigators using the new technology must constantly seek access to the latest tools and explore their data in multiple ways. The functional genomics data pipeline provides an integrated, extendable analysis environment permitting multiple, simultaneous analyses to be automatically performed and provides a web server and interface for presenting results. AVAILABILITY: Source code and executables are available under the GNU public license at http://bioinformatics.fccc.edu/

12.
13.
14.

Background:

The tasks in BioCreative II were designed to approximate some of the laborious work involved in curating biomedical research papers. The approach to these tasks taken by the University of Edinburgh team was to adapt and extend the existing natural language processing (NLP) system that we have developed as part of a commercial curation assistant. Although this paper concentrates on using NLP to assist with curation, the system can be equally employed to extract types of information from the literature that is immediately relevant to biologists in general.

Results:

Our system was among the highest performing on the interaction subtasks, and competitive performance on the gene mention task was achieved with minimal development effort. For the gene normalization task, a string matching technique that can be quickly applied to new domains was shown to perform close to average.

Conclusion:

The technologies being developed were shown to be readily adapted to the BioCreative II tasks. Although high performance may be obtained on individual tasks such as gene mention recognition and normalization, and document classification, tasks in which a number of components must be combined, such as detection and normalization of interacting protein pairs, are still challenging for NLP systems.

15.
Microtubules are polar filaments built from αβ-tubulin heterodimers that exhibit a range of architectures in vitro and in vivo. Tubulin heterodimers are arranged helically in the microtubule wall but many physiologically relevant architectures exhibit a break in helical symmetry known as the seam. Noisy 2D cryo-electron microscopy projection images of pseudo-helical microtubules therefore depict distinct but highly similar views owing to the high structural similarity of α- and β-tubulin. The determination of the αβ-tubulin register and seam location during image processing is essential for alignment accuracy that enables determination of biologically relevant structures. Here we present a pipeline designed for image processing and high-resolution reconstruction of cryo-electron microscopy microtubule datasets, based in the popular and user-friendly RELION image-processing package, Microtubule RELION-based Pipeline (MiRP). The pipeline uses a combination of supervised classification and prior knowledge about geometric lattice constraints in microtubules to accurately determine microtubule architecture and seam location. The presented method is fast and semi-automated, producing near-atomic resolution reconstructions with test datasets that contain a range of microtubule architectures and binding proteins.

16.
The Illumina Infinium HumanMethylation27 BeadChip (Illumina 27k) microarray is a high-throughput platform capable of interrogating the human DNA methylome. In a search for autosomal sex-specific DNA methylation using this microarray, we discovered autosomal CpG loci showing significant methylation differences between the sexes. However, we found that the majority of these probes cross-reacted with sequences from sex chromosomes. Moreover, we determined that 6-10% of the microarray probes are non-specific and map to highly homologous genomic sequences. Using probes targeting different CpGs that are exact duplicates of each other, we investigated the precision of these repeat measurements and concluded that the overall precision of this microarray is excellent. In addition, we identified a small number of probes targeting CpGs that include single-nucleotide polymorphisms. Overall, our findings address several technical issues associated with the Illumina 27k microarray that, once considered, will enhance the analysis and interpretation of data generated from this platform.

17.
In genome‐wide association studies, quality control (QC) of genotypes is important to avoid spurious results. It is also important to maintain long‐term data integrity, particularly in settings with ongoing genotyping (e.g. estimation of genomic breeding values). Here we discuss snpqc, a fully automated pipeline to perform QC analyses of Illumina SNP array data. It applies a wide range of common quality metrics with user‐defined filtering thresholds to generate a comprehensive QC report and a filtered dataset, including a genomic relationship matrix, ready for further downstream analyses, which makes it amenable to integration in high‐throughput environments. snpqc also builds a database to store genotypic, phenotypic and quality metrics to ensure data integrity and the option of integrating more samples from subsequent runs. The program is generic across species and array designs, providing a convenient interface between the genotyping laboratory and downstream genome‐wide association study or genomic prediction.
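The kind of threshold-based filtering mentioned in the abstract above can be sketched with two widely used SNP quality metrics, call rate and minor allele frequency (MAF). This is a generic illustration and not the snpqc interface: the function names, the genotype coding (0, 1, 2 copies of one allele, `None` for a missing call), the toy data, and the threshold values are all assumptions made for the example.

```python
def snp_metrics(genotypes):
    """Return (call_rate, minor_allele_frequency) for one SNP.

    Genotypes are coded as 0, 1 or 2 copies of the counted allele;
    None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    allele_freq = sum(called) / (2 * len(called))
    return call_rate, min(allele_freq, 1 - allele_freq)


def filter_snps(snp_table, min_call_rate=0.95, min_maf=0.01):
    """Apply user-defined thresholds; return kept SNP ids and a QC report."""
    kept, report = [], {}
    for snp_id, genotypes in snp_table.items():
        call_rate, maf = snp_metrics(genotypes)
        passed = call_rate >= min_call_rate and maf >= min_maf
        report[snp_id] = {"call_rate": call_rate, "maf": maf, "passed": passed}
        if passed:
            kept.append(snp_id)
    return kept, report


# Toy data for ten samples (hypothetical).
toy_snps = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],              # well called, polymorphic
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],              # monomorphic
    "snp_c": [0, 1, None, None, 2, 1, None, 0, 1, 1],     # many missing calls
}

kept, report = filter_snps(toy_snps, min_call_rate=0.9, min_maf=0.05)
for snp_id, metrics in report.items():
    print(snp_id, metrics)
print("kept:", kept)
```

In this toy data only `snp_a` passes both filters: `snp_b` fails the MAF threshold because it is monomorphic, and `snp_c` fails the call-rate threshold.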

18.

Background  

Microarrays depend on appropriate probe design to deliver the promise of accurate genome-wide measurement. Probe design, ideally, produces a unique probe-target match with homogeneous duplex stability over the complete set of probes. Much of microarray pre-processing is concerned with adjusting for non-ideal probes that do not report target concentration accurately. Cross-hybridizing probes (non-unique), probe composition and structure, as well as platform effects such as instrument limitations, have been shown to affect the interpretation of signal. Data cleansing pipelines seldom filter specifically for these constraints, relying instead on general statistical tests to remove the most variable probes from the samples in a study. This adjusts probes contributing to ProbeSet (gene) values in a study-specific manner. We refer to the complete set of factors as biologically applied filter levels (BaFL) and have assembled an analysis pipeline for managing them consistently. The pipeline and associated experiments reported here examine the outcome of comprehensively excluding probes affected by known factors on inter-experiment target behavior consistency.

19.
DNA methylation, an important type of epigenetic modification in humans, participates in crucial cellular processes, such as embryonic development, X-inactivation, genomic imprinting and chromosome stability. Several platforms have been developed to study genome-wide DNA methylation. Many investigators in the field have chosen the Illumina Infinium HumanMethylation microarray for its ability to reliably assess DNA methylation following sodium bisulfite conversion. Here, we analyzed methylation profiles of 489 adult males and 357 adult females generated by the Infinium HumanMethylation450 microarray. Among the autosomal CpG sites that displayed significant methylation differences between the two sexes, we observed a significant enrichment of cross-reactive probes co-hybridizing to the sex chromosomes with more than 94% sequence identity. This could lead investigators to mistakenly infer the existence of significant autosomal sex-associated methylation. Using sequence identity cutoffs derived from the sex methylation analysis, we concluded that 6% of the array probes can potentially generate spurious signals because of co-hybridization to alternate genomic sequences highly homologous to the intended targets. Additionally, we discovered probes targeting polymorphic CpGs that overlapped SNPs. The methylation levels detected by these probes are simply the reflection of underlying genetic polymorphisms but could be misinterpreted as true signals. The existence of probes that are cross-reactive or that target polymorphic CpGs in the Illumina HumanMethylation microarrays can confound data obtained from such microarrays. Therefore, investigators should exercise caution when significant biological associations are found using these array platforms. A list of all cross-reactive probes and polymorphic CpGs identified by us is annotated in this paper.

20.