Similar literature
 20 similar documents found; search took 15 ms
1.
During the last several years, high-density genotyping SNP arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each of the identified genetic variants only explains a very small fraction of the underlying genetic contribution to the studied phenotypic trait. Moreover, discordance observed in results between independent GWAS indicates the potential for Type I and II errors. High reliability of genotyping technology is needed to have confidence in using SNP data and interpreting GWAS results. Therefore, reproducibility of two widely used genotyping technology platforms from Affymetrix and Illumina was assessed by analyzing four technical replicates from each of the six individuals in five laboratories. Genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms was observed. Moreover, arrays with low quality data were detected when comparing genotyping data from technical replicates, but they could not be detected according to vendors' quality control (QC) suggestions. Our results demonstrated the technical reliability of currently available genotyping platforms but also indicated the importance of incorporating some technical replicates for genotyping QC in order to improve the reliability of GWAS results. The impact of discordant genotypes on association analysis results was simulated and could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e. the odds ratio) and the minor allele frequencies are low.
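The replicate-based QC idea in this abstract can be illustrated with a minimal sketch. It assumes genotype calls are stored as one list per array, in the same SNP order, with `None` marking a no-call; the function names and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance(calls_a, calls_b, missing=None):
    """Fraction of SNPs with identical genotype calls in two replicates.

    SNPs with a missing call in either replicate are skipped.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")
    compared = 0
    matched = 0
    for a, b in zip(calls_a, calls_b):
        if a == missing or b == missing:
            continue
        compared += 1
        if a == b:
            matched += 1
    if compared == 0:
        raise ValueError("no SNPs with calls in both replicates")
    return matched / compared


def pairwise_concordance(replicates):
    """Concordance for every pair of named replicate arrays."""
    return {
        (n1, n2): concordance(replicates[n1], replicates[n2])
        for n1, n2 in combinations(sorted(replicates), 2)
    }


# Toy data (hypothetical): calls for 5 SNPs on three replicate arrays.
replicates = {
    "rep1": ["AA", "AB", "BB", "AB", "AA"],
    "rep2": ["AA", "AB", "BB", "AB", "AA"],
    "rep3": ["AA", "BB", "BB", None, "AA"],
}
rates = pairwise_concordance(replicates)
for pair, rate in rates.items():
    print(pair, f"{rate:.2%}")
```

An array whose concordance with its sibling replicates is clearly lower than the other pairs (here `rep3`) would be a candidate for exclusion, which is the kind of check the abstract argues vendor QC alone can miss.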

2.

Background  

Comparison of data produced on different microarray platforms often shows surprising discordance. It is not clear whether this discrepancy is caused by noisy data or by improper probe matching between platforms. We investigated whether the significant level of inconsistency between results produced by alternative gene expression microarray platforms could be reduced by stringent sequence matching of microarray probes. We mapped the short oligo probes of the Affymetrix platform onto cDNA clones of the Stanford microarray platform. Affymetrix probes were reassigned to redefined probe sets if they mapped to the same cDNA clone sequence, regardless of the original manufacturer-defined grouping. The NCI-60 gene expression profiles produced by Affymetrix HuFL platform were recalculated using these redefined probe sets and compared to previously published cDNA measurements of the same panel of RNA samples.

3.
Are data from different gene expression microarray platforms comparable?   Total citations: 8, self-citations: 0, citations by others: 8
Many commercial and custom-made microarray formats are routinely used for large-scale gene expression surveys. Here, we sought to determine the level of concordance between microarray platforms by analyzing breast cancer cell lines with in situ synthesized oligonucleotide arrays (Affymetrix HG-U95v2), commercial cDNA microarrays (Agilent Human 1 cDNA), and custom-made cDNA microarrays from a sequence-validated 13K cDNA library. Gene expression data from the commercial platforms showed good correlations across the experiments (r = 0.78-0.86), whereas the correlations between the custom-made and either of the two commercial platforms were lower (r = 0.62-0.76). Discrepant findings were due to clone errors on the custom-made microarrays, old annotations, or unknown causes. Even within a platform, there can be several ways to analyze data that may influence the correlation between platforms. Our results indicate that combining data from different microarray platforms is not straightforward. Variability of the data represents a challenge for developing future diagnostic applications of microarrays.

4.
Genomics, 2020, 112(2): 1437-1443
Background: Whole Exome Sequencing (WES) utilises overlapping fragments prone to sequencing artefacts. Saliva, a non-invasive source of DNA, has been successfully used in WES studies on various platforms. This study explored the validity and quality of DNA sourced from saliva compared to whole blood on an Ion Platform. Methods: DNA was extracted from both sample types from four individuals. WES, performed on the Ion Proton platform, was assessed for quality metrics (Depth, Genotyping Quality, etc.) and variant identification for the same source sample-pairs. Results: No significant differences in quality metrics were identified between data obtained from whole blood and saliva samples, with several saliva samples demonstrating higher coverage depth. Variants within the same sample, from the two genomic DNA sources, had an average concordance similar to other studies and platforms with different chemistry. Conclusion: Saliva-extracted DNA provides comparable sequencing quality to whole blood for WES on Ion Torrent Platforms.

5.
In this article, we describe ednaoccupancy, an R package for fitting Bayesian, multiscale occupancy models. These models are appropriate for occupancy surveys that include three nested levels of sampling: primary sample units within a study area, secondary sample units collected from each primary unit and replicates of each secondary sample unit. This design is commonly used in occupancy surveys of environmental DNA (eDNA). ednaoccupancy allows users to specify and fit multiscale occupancy models with or without covariates, to estimate posterior summaries of occurrence and detection probabilities, and to compare different models using Bayesian model-selection criteria. We illustrate these features by analysing two published data sets: eDNA surveys of a fungal pathogen of amphibians and eDNA surveys of an endangered fish species.

6.
7.
We have conducted a study to compare the variability in measured gene expression levels associated with three types of microarray platforms. Total RNA samples were obtained from liver tissue of four male mice, two each from inbred strains A/J and C57BL/6J. The same four samples were assayed on Affymetrix Mouse Genome Expression Set 430 GeneChips (MOE430A and MOE430B), spotted cDNA microarrays, and spotted oligonucleotide microarrays using eight arrays of each type. Variances associated with measurement error were observed to be comparable across all microarray platforms. The MOE430A GeneChips and cDNA arrays had higher precision across technical replicates than the MOE430B GeneChips and oligonucleotide arrays. The Affymetrix platform showed the greatest range in the magnitude of expression levels followed by the oligonucleotide arrays. We observed good concordance in both estimated expression level and statistical significance of common genes between the Affymetrix MOE430A GeneChip and the oligonucleotide arrays. Despite their apparently high precision, cDNA arrays showed poor concordance with other platforms.

8.
The ability to generate whole genome data is rapidly becoming commoditized. For example, a mammalian sized genome (~3 Gb) can now be sequenced using approximately ten lanes on an Illumina HiSeq 2000. Since lanes from different runs are often combined, verifying that each lane in a genome's build is from the same sample is an important quality control. We sought to address this issue in a post hoc bioinformatic manner, instead of using upstream sample or "barcode" modifications. We rely on the inherent small differences between any two individuals to show that genotype concordance rates can be effectively used to test if any two lanes of HiSeq 2000 data are from the same sample. As proof of principle, we use recent data from three different human samples generated on this platform. We show that the distributions of concordance rates are non-overlapping when comparing lanes from the same sample versus lanes from different samples. Our method proves to be robust even when different numbers of reads are analyzed. Finally, we provide a straightforward method for determining the gender of any given sample. Our results suggest that examining the concordance of detected genotypes from lanes purported to be from the same sample is a relatively simple approach for confirming that combined lanes of data are of the same identity and quality.

9.

Background

The use of mass spectrometry to investigate disease-associated proteins among thousands of candidates simultaneously creates challenges with the evaluation of operational and biological variation. Traditional statistical methods, which evaluate reproducibility of a single feature, are likely to provide an inadequate assessment of reproducibility. This paper proposes a systematic approach for the evaluation of the global reproducibility of multidimensional mass spectral data at the post-identification stage.

Methods

The proposed systematic approach combines dimensional reduction and permutation to test and summarize the reproducibility. First, principal component analysis is applied to the mean quantities from identified features of paired replicated samples. An eigenvalue test is used to identify the number of significant principal components which reflect the underlying correlation pattern of the multiple features. Second, a simulation-based permutation test is applied to the derived paired principal components. Third, a modified form of Bland Altman or MA plot is produced to visualize agreement between the replicates. Last, a discordance index is used to summarize the agreement.

Results

Application of this method to data from both a cardiac liquid chromatography tandem mass spectrometry experiment with iTRAQ labeling and simulation experiments derived from an ovarian cancer SELDI-MS experiment demonstrate that the proposed global reproducibility test is sensitive to the simulated systematic bias when the sample size is above 15. The two proposed test statistics (max t statistics and a sign score statistic) for the permutation tests are shown to be reliable.

Conclusion

The methodology presented in this paper provides a systematic approach for the global measurement of reproducibility in clinical proteomic studies.
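The permutation step described under Methods can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the principal components have already been computed, that the input is a table of paired differences (one row per sample, one column per component), and that the null hypothesis is tested by flipping the sign of each sample's differences at random. The names `max_abs_t` and `sign_flip_test` and the toy data are hypothetical.

```python
import math
import random
import statistics


def max_abs_t(diffs):
    """Largest absolute one-sample t statistic over the columns of diffs."""
    n = len(diffs)
    best = 0.0
    for column in zip(*diffs):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        if sd == 0:
            # Constant column: t is unbounded unless the mean is zero.
            t = math.inf if mean != 0 else 0.0
        else:
            t = abs(mean) / (sd / math.sqrt(n))
        best = max(best, t)
    return best


def sign_flip_test(diffs, n_perm=2000, seed=0):
    """Permutation p-value for 'no systematic difference between replicates'.

    Swapping the two replicates of a sample flips the sign of all of that
    sample's differences, so each permutation draws one sign per row.
    """
    rng = random.Random(seed)
    observed = max_abs_t(diffs)
    hits = 0
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * d for d in row])
        if max_abs_t(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 16 samples, 2 components, with a systematic
# shift between replicates on the first component.
biased = [[1.0 + 0.1 * i, 0.05 * ((i * 7) % 5 - 2)] for i in range(16)]
observed_t, p_value = sign_flip_test(biased)
print(f"max |t| = {observed_t:.2f}, p = {p_value:.4f}")
```

Taking the maximum over components gives a single global statistic, so one p-value summarises reproducibility across all retained components instead of testing each feature separately.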

10.
Microarrays are used to study gene expression in a variety of biological systems. A number of different platforms have been developed, but few studies exist that have directly compared the performance of one platform with another. The goal of this study was to determine array variation by analyzing the same RNA samples with three different array platforms. Using gene expression responses to benzo[a]pyrene exposure in normal human mammary epithelial cells (NHMECs), we compared the results of gene expression profiling using three microarray platforms: photolithographic oligonucleotide arrays (Affymetrix), spotted oligonucleotide arrays (Amersham), and spotted cDNA arrays (NCI). While most previous reports comparing microarrays have analyzed pre-existing data from different platforms, this comparison study used the same sample assayed on all three platforms, allowing for analysis of variation from each array platform. In general, poor correlation was found with corresponding measurements from each platform. Each platform yielded different gene expression profiles, suggesting that while microarray analysis is a useful discovery tool, further validation is needed to extrapolate results for broad use of the data. Also, microarray variability needs to be taken into consideration, not only in the data analysis but also in specific probe selection for each array type.

11.
Recently, a number of collaborative large-scale mouse mutagenesis programs have been launched. These programs aim for a better understanding of the roles of all individual coding genes and the biological systems in which these genes participate. In international efforts to share phenotypic data among facilities/institutes, it is desirable to integrate information obtained from different phenotypic platforms reliably. Since the definitions of specific phenotypes often depend on a tacit understanding of concepts that tends to vary among different facilities, it is necessary to define phenotypes based on the explicit evidence of assay results. We have developed a website termed PhenoSITE (Phenome Semantics Information with Terminology of Experiments: http://www.gsc.riken.jp/Mouse/), in which we are trying to integrate phenotype-related information using an experimental-evidence-based approach. The site's features include (1) a baseline database for our phenotyping platform; (2) an ontology associating international phenotypic definitions with experimental terminologies used in our phenotyping platform; (3) a database for standardized operation procedures of the phenotyping platform; and (4) a database for mouse mutants using data produced from the large-scale mutagenesis program at RIKEN GSC. We have developed two types of integrated viewers to enhance the accessibility to mutant resource information. One viewer depicts a matrix view of the ontology-based classification and chromosomal location of each gene; the other depicts ontology-mediated integration of experimental protocols, baseline data, and mutant information. These approaches rely entirely upon experiment-based evidence, ensuring the reliability of the integrated data from different phenotyping platforms.

12.
There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar biological samples, e.g. for the analysis of cellular response to perturbations over time or for the discovery of protein biomarkers from clinical samples. Technical limitations of current proteomic platforms such as limited reproducibility and low throughput make this a challenging task. A new LC-MS-based platform is able to generate complex peptide patterns from the analysis of proteolyzed protein samples at high throughput and represents a promising approach for quantitative proteomics. A crucial component of the LC-MS approach is the accurate evaluation of the abundance of detected peptides over many samples and the identification of peptide features that can stratify samples with respect to their genetic, physiological, or environmental origins. We present here a new software suite, SpecArray, that generates a peptide versus sample array from a set of LC-MS data. A peptide array stores the relative abundance of thousands of peptide features in many samples and is in a format identical to that of a gene expression microarray. A peptide array can be subjected to an unsupervised clustering analysis to stratify samples or to a discriminant analysis to identify discriminatory peptide features. We applied the SpecArray to analyze two sets of LC-MS data: one was from four repeat LC-MS analyses of the same glycopeptide sample, and another was from LC-MS analysis of serum samples of five male and five female mice. We demonstrate through these two study cases that the SpecArray software suite can serve as an effective software platform in the LC-MS approach for quantitative proteomics.

13.
14.
BACKGROUND: Diagnostic discordance for osteoporosis is the observation that the T-score of an individual patient varies from one key measurement site to another, falling into two different diagnostic categories identified by the World Health Organization (WHO) classification system. This study was conducted to evaluate the presence and risk factors for this phenomenon in a large sample of the Iranian population. METHODS: Demographic data, anthropometric measurements, and risk factors for osteoporosis were derived from a database on 4229 patients referred to a community-based outpatient osteoporosis testing center from 2000 to 2003. Dual-energy X-ray absorptiometry (DXA) was performed on L1-L4 lumbar spine and total hip for all cases. Minor discordance was defined as present when the difference between two sites was no more than one WHO diagnostic class. Major discordance was present when one site was osteoporotic and the other was normal. Subjects with incomplete data were excluded. RESULTS: In 4188 participants (3848 female, mean age 53.4 ± 11.8 years), major discordance, minor discordance, and concordance of T-scores were seen in 2.7%, 38.9% and 58.3%, respectively. In multivariate logistic regression analysis, older age, menopause, obesity, and belated menopause were recognized as risk factors and hormone replacement therapy as a protective factor against T-score discordance. CONCLUSION: The high prevalence of T-score discordance may lead to problems in interpretation of the densitometry results for some patients. This phenomenon should be regarded as a real and prevalent finding and physicians should develop a specific strategy for approaching these patients.
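The discordance definitions in the Methods can be written down as a short sketch. It assumes the standard WHO T-score cut-offs (normal at -1.0 or above, osteopenia between -1.0 and -2.5, osteoporosis at -2.5 or below), which the abstract references but does not spell out; the function names are hypothetical.

```python
def who_class(t_score):
    """WHO diagnostic class: 0 = normal, 1 = osteopenia, 2 = osteoporosis."""
    if t_score <= -2.5:
        return 2
    if t_score < -1.0:
        return 1
    return 0


def discordance(spine_t, hip_t):
    """Classify a patient by the gap between the two sites' WHO classes."""
    gap = abs(who_class(spine_t) - who_class(hip_t))
    # gap 0: same class; gap 1: adjacent classes; gap 2: normal vs osteoporotic.
    return ("concordance", "minor discordance", "major discordance")[gap]


# Hypothetical lumbar spine / total hip T-score pairs.
for spine, hip in [(-0.5, -0.8), (-1.5, -0.5), (-2.7, -0.3)]:
    print(spine, hip, discordance(spine, hip))
```

Encoding the classes as ordered integers makes the abstract's "no more than one diagnostic class" rule a simple absolute difference.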

15.
16.
Adjustment of systematic microarray data biases   Total citations: 6, self-citations: 0, citations by others: 6
MOTIVATION: Systematic differences due to experimental features of microarray experiments are present in most large microarray data sets. Many different experimental features can cause biases including different sources of RNA, different production lots of microarrays or different microarray platforms. These systematic effects present a substantial hurdle to the analysis of microarray data. RESULTS: We present here a new method for the identification and adjustment of systematic biases that are present within microarray data sets. Our approach is based on modern statistical discrimination methods and is shown to be very effective in removing systematic biases present in a previously published breast tumor cDNA microarray data set. The new method of 'Distance Weighted Discrimination (DWD)' is shown to be better than Support Vector Machines and Singular Value Decomposition for the adjustment of systematic microarray effects. In addition, it is shown to be of general use as a tool for the discrimination of systematic problems present in microarray data sets, including the merging of two breast tumor data sets completed on different microarray platforms. AVAILABILITY: Matlab software to perform DWD can be retrieved from https://genome.unc.edu/pubsup/dwd/

17.
We have assessed the utility of RNA titration samples for evaluating microarray platform performance and the impact of different normalization methods on the results obtained. As part of the MicroArray Quality Control project, we investigated the performance of five commercial microarray platforms using two independent RNA samples and two titration mixtures of these samples. Focusing on 12,091 genes common across all platforms, we determined the ability of each platform to detect the correct titration response across the samples. Global deviations from the response predicted by the titration ratios were observed. These differences could be explained by variations in relative amounts of messenger RNA as a fraction of total RNA between the two independent samples. Overall, both the qualitative and quantitative correspondence across platforms was high. In summary, titration samples may be regarded as a valuable tool, not only for assessing microarray platform performance and different analysis methods, but also for determining some underlying biological features of the samples.

18.
Many platforms for genome-wide analysis of gene expression contain 'redundant' measures for the same gene. For example, the most highly utilized platforms for gene expression microarrays, Affymetrix GeneChip® arrays, have as many as ten or more probe sets for some genes. Occasionally, individual probe sets for the same gene report different trends in expression across experimental conditions, a situation that must be resolved in order to accurately interpret the data. We developed an algorithm, SCOREM, for determining the level of agreement between such probe sets, utilizing a statistical test of concordance, Kendall's W coefficient of concordance, and a graph-searching algorithm for the identification of concordant probe sets. We also present methods for consolidating concordant groups into a single value for their corresponding gene and for post hoc analysis of discordant groups. By combining statistical consolidation with sequence analysis, SCOREM possesses the unique ability to identify biologically meaningful discordant behaviors, including differing behaviors in alternate RNA isoforms and tissue-specific patterns of expression. When consolidating concordant behaviors, SCOREM outperforms other methods in detecting both differential expression and overrepresented functional categories.
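Kendall's W, the concordance statistic named in this abstract, can be computed with a short sketch. This is the textbook formula without a tie correction, W = 12S / (m²(n³ - n)) for m probe sets ranked over n conditions, and it is not the SCOREM implementation; the function names and the example profiles are hypothetical.

```python
def ranks(values):
    """Ranks 1..n of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def kendalls_w(profiles):
    """Kendall's W for m expression profiles measured over n conditions."""
    m = len(profiles)
    n = len(profiles[0])
    if m < 2 or n < 2 or any(len(p) != n for p in profiles):
        raise ValueError("need at least 2 profiles of equal length >= 2")
    rank_rows = [ranks(p) for p in profiles]
    totals = [sum(column) for column in zip(*rank_rows)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical probe sets for one gene across four conditions.
same_trend = [[1, 2, 3, 4], [10, 20, 30, 40], [0.1, 0.5, 0.7, 0.9]]
opposite = [[1, 2, 3, 4], [4, 3, 2, 1]]
print(kendalls_w(same_trend), kendalls_w(opposite))
```

W ranges from 0 (no agreement in rank order) to 1 (identical rank order), so probe sets that report the same trend across conditions score near 1 regardless of their absolute intensities.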

19.
Our goal in this paper is to show an analytical workflow for selecting protein biomarker candidates from SELDI-MS data. The clinical question at issue is to enable prediction of the complete remission (CR) duration for acute myeloid leukemia (AML) patients. This would facilitate disease prognosis and make individual therapy possible. SELDI-mass spectrometry proteomics analyses were performed on blast cell samples collected from AML patients pre-chemotherapy. Although the biobank available included approximately 200 samples, only 58 were available for analysis. The presented workflow includes sample selection, experimental optimization, repeatability estimation, data preprocessing, data fusion, and feature selection. Specific difficulties have been the small number of samples and the skewed distribution of the CR duration among the patients. Further, we had to deal with both noisy SELDI-MS data and a diverse patient cohort. This has been handled by sample selection and several methods for data preprocessing and feature detection in the analysis workflow. Four conceptually different methods for peak detection and alignment were considered, as well as two diverse methods for feature selection. The peak detection and alignment methods included the recently developed annotated regions of significance (ARS) method, the SELDI-MS software Ciphergen Express which was regarded as the standard method, segment-wise spectral alignment by a genetic algorithm (PAGA) followed by binning, and, finally, binning of raw data. In the feature selection, the "standard" Mann-Whitney U test was compared with a hierarchical orthogonal partial least-squares (O-PLS) analysis approach. The combined information from all these analyses gave a collection of 21 protein peaks. These were regarded as the most promising and robust biomarker candidates since they were picked out as significant features in several of the models.
The chosen peaks will now be our first choice for the continuing work on protein identification and biological validation. The identification will be performed by chromatographic purification and MALDI MS/MS. Thus, we have shown that the use of several data handling methods can improve a protein profiling workflow from experimental optimization to a predictive model. The framework of this methodology should be seen as general and could be used with one-dimensional spectral omics data other than SELDI-MS, provided an adequate number of samples is available.

20.
Profiling of mRNA abundances with high-throughput platforms such as microarrays and RNA-seq has become an important tool in both basic and biomedical research. However, these platforms remain prone to systematic errors and have challenges in clinical and industrial applications. As a result, it is standard practice to validate a subset of key results using alternate technologies. Similarly, clinical and industrial applications typically involve transitions from a high-throughput discovery platform to medium-throughput validation ones. These medium-throughput validation platforms have high technical reproducibility, reduced sample input needs, and low sensitivity to sample quality (e.g., for processing FFPE specimens). Unfortunately, while medium-throughput platforms have proliferated, there are no comprehensive comparisons of them. Here we fill that gap by comparing two key medium-throughput platforms (NanoString's nCounter Analysis System and ABI's OpenArray System) to gold-standard quantitative real-time RT-PCR. We quantified 38 genes and positive and negative controls in 165 samples. Signal:noise ratios, correlations, dynamic range, and detection accuracy were compared across platforms. All three measurement technologies showed good concordance, but with divergent price/time/sensitivity trade-offs. This study provides the first detailed comparison of medium-throughput RNA quantification platforms and provides a template and a standard data set for the evaluation of additional technologies.
