Similar Literature
20 similar records found (search time: 46 ms)
2.
Qianxing Mo  Faming Liang 《Biometrics》2010,66(4):1284-1294
Summary ChIP-chip experiments are procedures that combine chromatin immunoprecipitation (ChIP) and DNA microarray (chip) technology to study a variety of biological problems, including protein–DNA interaction, histone modification, and DNA methylation. The most important feature of ChIP-chip data is that the intensity measurements of probes are spatially correlated, because the DNA fragments are hybridized to neighboring probes in the experiments. We propose a simple but powerful Bayesian hierarchical approach to ChIP-chip data through an Ising model with high-order interactions. The proposed method naturally takes into account the intrinsic spatial structure of the data and can be used to analyze data from multiple platforms with different genomic resolutions. The model parameters are estimated using the Gibbs sampler. The proposed method is illustrated using two publicly available data sets from the Affymetrix and Agilent platforms, and is compared with three alternative Bayesian methods, namely the Bayesian hierarchical model, the hierarchical gamma mixture model, and the TileMap hidden Markov model. The numerical results indicate that the proposed method performs as well as the other three methods on data from Affymetrix tiling arrays, but significantly outperforms them on data from Agilent promoter arrays. In addition, we find that the proposed method has better operating characteristics in terms of sensitivity and false discovery rate under various scenarios.

3.
Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.

4.
Together with the widely used Affymetrix microarrays, the recently introduced Illumina platform has become a cost-effective alternative for genome-wide studies. To efficiently use data from both array platforms, there is a pressing need for methods that allow systematic integration of multiple datasets, especially when the number of samples is small. To address these needs, we introduce a meta-analytic procedure for combining Affymetrix and Illumina data in the context of detecting differentially expressed genes between the platforms. We first investigate the effect of different expression change estimation procedures within the platforms on the agreement of the most differentially expressed genes. Using the best estimation methods, we then show the benefits of the integrative analysis in producing reproducible results across bootstrap samples. In particular, we demonstrate its biological relevance in identifying small but consistent changes during T helper 2 cell differentiation.
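A common way to combine per-gene expression-change estimates from two platforms is inverse-variance weighting; the fixed-effect sketch below is a generic version, not necessarily the authors' exact meta-analytic procedure:

```python
import numpy as np

def fixed_effect_meta(effects, variances):
    """Fixed-effect (inverse-variance) meta-analysis across platforms.

    effects, variances: (n_platforms, n_genes) arrays of per-gene
    expression-change estimates and their sampling variances.
    Returns per-gene (combined effect, standard error, z-score).
    """
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    combined = (weights * effects).sum(axis=0) / weights.sum(axis=0)
    se = np.sqrt(1.0 / weights.sum(axis=0))
    return combined, se, combined / se

# one gene measured on two platforms with equal precision
combined, se, z = fixed_effect_meta([[2.0], [2.0]], [[1.0], [1.0]])
```

Pooling halves the variance relative to a single platform, which is why small but consistent changes become detectable after integration.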

5.
To facilitate collaborative research efforts between multi-investigator teams using DNA microarrays, we identified sources of error and data variability between laboratories and across microarray platforms, and methods to accommodate this variability. RNA expression data were generated in seven laboratories, which compared two standard RNA samples using 12 microarray platforms. At least two standard microarray types (one spotted, one commercial) were used by all laboratories. Reproducibility for most platforms within any laboratory was typically good, but reproducibility between platforms and across laboratories was generally poor. Reproducibility between laboratories increased markedly when standardized protocols were implemented for RNA labeling, hybridization, microarray processing, data acquisition and data normalization. Reproducibility was highest when analysis was based on biological themes defined by enriched Gene Ontology (GO) categories. These findings indicate that microarray results can be comparable across multiple laboratories, especially when a common platform and set of procedures are used.

6.

Background

The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly.

Results

We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers.

Conclusions

Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
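The cost arithmetic behind such benchmarks is straightforward: classic EC2 on-demand billing charged per instance-hour, rounded up. A sketch with an illustrative placeholder rate (not the study's actual instance prices):

```python
import math

def ec2_run_cost(runtime_hours, n_instances, hourly_rate):
    """Classic EC2 on-demand billing: each instance's runtime is
    rounded up to whole hours, then multiplied by the hourly rate."""
    return math.ceil(runtime_hours) * n_instances * hourly_rate

# e.g. 15 eight-core instances (120 CPUs) running for 21.5 h at a
# placeholder rate of $0.17 per instance-hour
cost = ec2_run_cost(21.5, 15, 0.17)
```

With these assumed numbers the run comes in at roughly $56, consistent in scale with the "under 24 hours for less than $60" figure above.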

7.
Factors affecting the type and frequency of germline mutations in animals are of significant interest from health and toxicology perspectives. However, studies in this field have been limited by the use of markers with low detection power or uncertain relevance to phenotype. Whole genome sequencing (WGS) is now a potential option to directly determine germline mutation type and frequency in family groups at all loci simultaneously. Medical studies have already capitalized on WGS to identify novel mutations in human families for clinical purposes, such as identifying candidate genes contributing to inherited conditions. However, WGS has not yet been used in any studies of vertebrates that aim to quantify changes in germline mutation frequency as a result of environmental factors. WGS is a promising tool for detecting mutation induction, but it is currently limited by several technical challenges. Perhaps the most pressing issue is sequencing error rates that are currently high in comparison to the intergenerational mutation frequency. Different platforms and depths of coverage currently result in a range of 10 to 10³ false positives for every true mutation. In addition, the cost of WGS is still relatively high, particularly when comparing mutation frequencies among treatment groups with even moderate sample sizes. Despite these challenges, WGS offers the potential for unprecedented insight into germline mutation processes. Refinement of available tools and emergence of new technologies may be able to provide the improved accuracy and reduced costs necessary to make WGS viable in germline mutation studies in the very near future. To streamline studies, researchers may use multiple family triads per treatment group and sequence a targeted (reduced) portion of each genome with high (20–40×) depth of coverage. We are optimistic about the application of WGS for quantifying germline mutations, but caution researchers regarding the resource-intensive nature of the work using existing technology.

8.

Background

Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

Results

Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base-calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.

Conclusions

Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.

9.

Background  

Many researchers are concerned with the comparability and reliability of microarray gene expression data. Recent completion of the MicroArray Quality Control (MAQC) project provides a unique opportunity to assess reproducibility across multiple sites and comparability across multiple platforms. The MAQC analysis presented to support the conclusion of inter- and intra-platform comparability/reproducibility of microarray gene expression measurements is inadequate. We evaluate the reproducibility/comparability of the MAQC data for 12,901 common genes in four titration samples generated from five high-density one-color microarray platforms and the TaqMan technology. We discuss some of the problems with the use of the correlation coefficient as a metric to evaluate inter- and intra-platform reproducibility, and with the percentage of overlapping genes (POG) as a measure for evaluating a gene selection procedure, as used by MAQC.

10.
Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.

11.
The utility of previously generated microarray data is severely limited owing to small study size, leading to under-powered analysis, and failure of replication. Multiplicity of platforms and various sources of systematic noise limit the ability to compile existing data from similar studies. We present a model for transformation of data across different generations of Affymetrix arrays, developed using previously published datasets describing technical replicates performed with two generations of arrays. The transformation is based upon a probe set-specific regression model, generated from replicate measurements across platforms, performed using correlation coefficients. The model, when applied to the expression intensities of 5069 shared, sequence-matched probe sets in three different generations of Affymetrix Human oligonucleotide arrays, showed significant improvement in inter-generation correlations between sample-wide means and individual probe set pairs. The approach was further validated by an observed reduction in Euclidean distance between signal intensities across generations for the predicted values. Finally, application of the model to independent, but related datasets resulted in improved clustering of samples based upon their biological, as opposed to technical, attributes. Our results suggest that this transformation method is a valuable tool for integrating microarray datasets from different generations of arrays.
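A per-probe-set linear mapping of the kind described can be sketched with ordinary least squares (a simplified stand-in; the published model's correlation-coefficient-based construction is not reproduced here):

```python
import numpy as np

def fit_probe_transforms(old_gen, new_gen):
    """Least-squares slope/intercept per shared probe set, mapping
    old-generation intensities onto the new generation.

    old_gen, new_gen: (n_samples, n_probesets) arrays of replicate
    measurements of the same samples on both array generations.
    """
    x = np.asarray(old_gen, dtype=float)
    y = np.asarray(new_gen, dtype=float)
    xm, ym = x.mean(axis=0), y.mean(axis=0)
    slope = ((x - xm) * (y - ym)).sum(axis=0) / ((x - xm) ** 2).sum(axis=0)
    intercept = ym - slope * xm
    return slope, intercept

# toy example: the new generation reads exactly 2*old + 1 for one probe set
old = np.array([[1.0], [2.0], [3.0], [4.0]])
new = 2.0 * old + 1.0
slope, intercept = fit_probe_transforms(old, new)
predicted = slope * 5.0 + intercept   # transform an unseen old-generation value
```

Fitting one slope/intercept pair per probe set is what lets the transformation absorb probe-specific platform effects instead of applying one global correction.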

12.
Genomic measures of inbreeding based on identity-by-descent (IBD) segments are increasingly used to measure inbreeding, and are mostly estimated on SNP arrays and whole-genome sequencing (WGS) data. However, some software tools commonly used for their estimation assume that genomic positions which have not been genotyped are nonvariant. This may be true for WGS data, but not for reduced genomic representations, and can lead to spurious IBD segment estimates. In this project, we simulated the outputs of WGS, two SNP arrays of different sizes, and RAD-sequencing for three populations with different sizes and histories. We compare the results of IBD segment estimation with two tools: runs of homozygosity (ROHs) estimated with PLINK and homozygous-by-descent (HBD) segments estimated with RZooRoH. We demonstrate that to obtain meaningful estimates of inbreeding, RZooRoH requires an 11-fold lower SNP density than PLINK: ranks of inbreeding coefficients were conserved among individuals above 22 SNPs/Mb for PLINK and 2 SNPs/Mb for RZooRoH. We also show that in populations with simple demographic histories, the distributions of ROHs and HBD segments are correctly estimated with both SNP arrays and WGS. PLINK correctly estimated the distribution of ROHs at SNP densities above 22 SNPs/Mb, while RZooRoH correctly estimated the distribution of HBD segments at SNP densities above 11 SNPs/Mb. However, in a population with a more complex demographic history, RZooRoH estimated the distribution of IBD segments better than PLINK, even with WGS data. Consequently, we advise researchers to use either methods relying on excess homozygosity averaged across SNPs or model-based HBD segment-calling methods for inbreeding estimation.
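Whichever tool calls the segments, the genomic inbreeding coefficient itself reduces to the summed segment length over the genome length; a minimal sketch (function name and the example genome size are illustrative):

```python
def f_roh(segment_lengths_bp, genome_length_bp):
    """F_ROH: fraction of the (autosomal) genome covered by ROH/HBD segments.

    segment_lengths_bp: iterable of segment lengths in base pairs, e.g.
    parsed from PLINK's .hom output (KB column * 1000) or RZooRoH's HBD table.
    """
    return sum(segment_lengths_bp) / genome_length_bp

# e.g. 250 Mb of ROH on a 2.5 Gb autosomal genome
f = f_roh([100_000_000, 150_000_000], 2_500_000_000)
```

Because the coefficient is a ratio of lengths, missed or spurious segments propagate directly into it — which is why the segment-calling step compared above matters so much.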

14.
Fan X  Shao L  Fang H  Tong W  Cheng Y 《PloS one》2011,6(1):e16067
High-throughput microarray technology has been widely applied in biological and medical decision-making research during the past decade. However, the diversity of platforms has made it a challenge to re-use and/or integrate datasets generated in different experiments or labs for constructing array-based diagnostic models. Using large toxicogenomics datasets generated using both Affymetrix and Agilent microarray platforms, we carried out a benchmark evaluation of cross-platform consistency in multiple-class prediction using three widely-used machine learning algorithms. After an initial assessment of model performance on different platforms, we evaluated whether predictive signature features selected in one platform could be directly used to train a model in the other platform and whether predictive models trained using data from one platform could predict datasets profiled using the other platform with comparable performance. Our results established that it is possible to successfully apply multiple-class prediction models across different commercial microarray platforms, offering a number of important benefits such as accelerating the possible translation of biomarkers identified with microarrays to clinically-validated assays. However, this investigation focuses on a technical platform comparison and is actually only the beginning of exploring cross-platform consistency. Further studies are needed to confirm the feasibility of microarray-based cross-platform prediction, especially using independent datasets.

15.
Using a novel approach combining four complementary metabolomic and mineral platforms with genome-wide genotyping at 1536 single nucleotide polymorphism (SNP) loci, we have investigated the extent of biochemical and genetic diversity in three commercially-relevant waxy rice cultivars important to food production in the Lao People's Democratic Republic (PDR). Following cultivation with different nitrogen fertiliser regimes, multiple metabolomic data sets, including minerals, were produced and analysed using multivariate statistical methods to reveal the degree of similarity between the genotypes and to identify discriminatory compounds supported by multiple technology platforms. Results revealed little effect of nitrogen supply on metabolites related to quality, despite known yield differences. All platforms revealed unique metabolic signatures for each variety and many discriminatory compounds could be identified as being relevant to consumers in terms of nutritional value and taste or flavour. For each platform, metabolomic diversity was highly associated with genetic distance between the varieties. This study demonstrates that multiple metabolomic platforms have potential as phenotyping tools to assist breeders in their quest to combine key yield and quality characteristics. This better enables rice improvement programs to meet different consumer and farmer needs, and to address food security in rice-consuming countries.

16.
MOTIVATION: An increasingly common application of gene expression profile data is the reverse engineering of cellular networks. However, common procedures to normalize expression profiles generated using the Affymetrix GeneChips technology were originally developed for a rather different purpose, namely the accurate measure of differential gene expression between two or more phenotypes. As a result, current evaluation strategies lack comprehensive metrics to assess the suitability of available normalization procedures for reverse engineering and, in general, for measuring correlation between the expression profiles of a gene pair. RESULTS: We benchmark four commonly used normalization procedures (MAS5, RMA, GCRMA and Li-Wong) in the context of established algorithms for the reverse engineering of protein-protein and protein-DNA interactions. Replicate sample, randomized and human B-cell data sets are used as an input. Surprisingly, our study suggests that MAS5 provides the most faithful cellular network reconstruction. Furthermore, we identify a crucial step in GCRMA responsible for introducing severe artifacts in the data leading to a systematic overestimate of pairwise correlation. This has key implications not only for reverse engineering but also for other methods, such as hierarchical clustering, relying on accurate measurements of pairwise expression profile correlation. We propose an alternative implementation to eliminate such side effect.

17.
Metabolomic profiling is a powerful approach to characterize human metabolism and help understand common disease risk. Although multiple high-throughput technologies have been developed to assay the human metabolome, no technique is capable of capturing the entire human metabolism. Large-scale metabolomics data are being generated in multiple cohorts, but the datasets are typically profiled using different metabolomics platforms. Here, we compared analyses across two of the most frequently used metabolomic platforms, Biocrates and Metabolon, with the aim of assessing how complementary metabolite profiles are across platforms. We profiled serum samples from 1,001 twins using both targeted (Biocrates, n = 160 metabolites) and non-targeted (Metabolon, n = 488 metabolites) mass spectrometry platforms. We compared metabolite distributions and performed genome-wide association analyses to identify shared genetic influences on metabolites across platforms. Comparison of 43 metabolites named for the same compound on both platforms indicated strong positive correlations, with few exceptions. Genome-wide association scans with high-throughput metabolic profiles were performed for each dataset and identified genetic variants at 7 loci associated with 16 unique metabolites on both platforms. The 16 metabolites showed consistent genetic associations and appear to be robustly measured across platforms. These included both metabolites named for the same compound across platforms as well as unique metabolites, of which 2 (nonanoylcarnitine (C9) [Biocrates]/Unknown metabolite X-13431 [Metabolon] and PC aa C28:1 [Biocrates]/1-stearoylglycerol [Metabolon]) are likely to represent the same or related biochemical entities. The results demonstrate the complementary nature of both platforms, and can be informative for future studies of comparative and integrative metabolomics analyses in samples profiled on different platforms.

18.
In the Cancer Genome Atlas (TCGA) project, gene expression of the same set of samples is measured multiple times on different microarray platforms. There are two main advantages to combining these measurements. First, we have the opportunity to obtain a more precise and accurate estimate of expression levels than using the individual platforms alone. Second, the combined measure simplifies downstream analysis by eliminating the need to work with three sets of expression measures and to consolidate results from the three platforms. We propose to use factor analysis (FA) to obtain a unified gene expression measure (UE) from multiple platforms. The UE is a weighted average of the three platforms, and is shown to perform well in terms of accuracy and precision. In addition, the FA model produces parameter estimates that allow the assessment of the model fit. The R code is provided in File S2. Gene-level FA measurements for the TCGA data sets are available from http://tcga-data.nci.nih.gov/docs/publications/unified_expression/.
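A one-factor model of this kind — each platform's measurement loading on a single latent expression level — can be sketched with scikit-learn's FactorAnalysis on simulated data (an illustrative stand-in, not the authors' R implementation; the loadings and noise level are made up):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_samples = 200
latent = rng.normal(size=n_samples)          # true expression level per sample
loadings = np.array([1.0, 0.8, 1.2])         # assumed per-platform scaling
noise = rng.normal(scale=0.3, size=(n_samples, 3))
X = latent[:, None] * loadings + noise       # one gene, three platforms

# one-factor fit: the unified expression (UE) estimate per sample
ue = FactorAnalysis(n_components=1, random_state=0).fit_transform(X).ravel()
r = abs(np.corrcoef(ue, latent)[0, 1])       # sign of the factor is arbitrary
```

The factor scores are effectively a precision-weighted average of the platforms, so noisier platforms automatically contribute less to the unified measure.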

19.
Ciliates are unicellular eukaryotes with separate germline and somatic genomes and diverse life cycles, which make them a unique model to improve our understanding of population genetics through the detection of genetic variations. However, traditional sequencing methods cannot be directly applied to ciliates because the majority are uncultivated. Single-cell whole-genome sequencing (WGS) is a powerful tool for studying genetic variation in microbes, but no studies have been performed in ciliates. We compared the use of single-cell WGS and bulk DNA WGS to detect genetic variation, specifically single nucleotide polymorphisms (SNPs), in the model ciliate Tetrahymena thermophila. Our analyses showed that (i) single-cell WGS has excellent performance regarding mapping rate and genome coverage but lower sequencing uniformity compared with bulk DNA WGS due to amplification bias (which was reproducible); (ii) false-positive SNP sites detected by single-cell WGS tend to occur in genomic regions with particularly high sequencing depth and a high rate of C:G to T:A base changes; (iii) SNPs detected in three or more cells should be reliable (a detection efficiency of 83.4–97.4% was obtained for combined data from three cells). This analytical method could be adapted to measure genetic variation in other ciliates and broaden research into ciliate population genetics.
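The "detected in three or more cells" rule amounts to a simple consensus filter over per-cell call sets; a sketch (the (chrom, pos, alt) tuple is an assumed site representation, not a format from the study):

```python
from collections import Counter

def consensus_snps(cell_callsets, min_cells=3):
    """Keep SNP sites called in at least min_cells single-cell call sets.

    cell_callsets: list of sets of (chrom, pos, alt) tuples, one per cell.
    """
    counts = Counter(site for calls in cell_callsets for site in set(calls))
    return {site for site, count in counts.items() if count >= min_cells}

cells = [
    {("chr1", 100, "A"), ("chr1", 200, "T")},   # cell 1
    {("chr1", 100, "A")},                       # cell 2
    {("chr1", 100, "A"), ("chr2", 5, "G")},     # cell 3
]
reliable = consensus_snps(cells)
```

Requiring independent support from several amplified cells is what suppresses the amplification-bias false positives described in point (ii).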

20.
DNA microarray technologies have evolved rapidly to become a key high-throughput technology for the simultaneous measurement of the relative expression levels of thousands of individual genes. However, despite the widespread adoption of DNA microarray technology, there remains considerable uncertainty and scepticism regarding data obtained using these technologies. Comparing results from seemingly identical experiments from different laboratories or even from different days can prove challenging; these challenges increase further when data from different array platforms need to be compared. To comply with emerging regulations, the quality of the data generated from array experiments needs to be clearly demonstrated. This review describes several initiatives that aim to improve confidence in data generated by array experiments, including initiatives to develop standards for data reporting and storage, external spike-in controls, quality control procedures, best practice guidelines, and quality metrics.
