1.

Background

High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data.

Results

We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset.
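The DEG-elimination idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how potential DEGs are flagged, so the hypothetical helper `deg_filtered_factors` simply drops the genes with the largest absolute log fold change (an assumed `drop_fraction`). Plain library-size scaling stands in for each package's own normalization method.

```python
from math import log2
from statistics import mean


def size_factors(counts, keep=None):
    """Library-size scaling factors over the kept genes, scaled to mean 1."""
    rows = counts if keep is None else [counts[i] for i in keep]
    totals = [sum(col) for col in zip(*rows)]
    avg = mean(totals)
    return [t / avg for t in totals]


def deg_filtered_factors(counts, group_a, group_b, drop_fraction=0.25):
    """Hypothetical helper: recompute factors after dropping putative DEGs.

    counts: list of genes, each a list of per-sample tag counts.
    group_a, group_b: sample (column) indices of the two conditions.
    drop_fraction: assumed share of genes treated as potential DEGs.
    """
    first_pass = size_factors(counts)

    def abs_lfc(row):
        norm = [c / f for c, f in zip(row, first_pass)]
        # A pseudo-count of 0.5 avoids taking the log of zero.
        a = mean(norm[j] for j in group_a) + 0.5
        b = mean(norm[j] for j in group_b) + 0.5
        return abs(log2(a / b))

    # Rank genes from least to most differential (a stand-in for a DEG test).
    order = sorted(range(len(counts)), key=lambda i: abs_lfc(counts[i]))
    n_keep = len(counts) - int(len(counts) * drop_fraction)
    keep = sorted(order[:n_keep])
    return size_factors(counts, keep), keep


# Toy data: three unchanged genes and one strongly up-regulated gene.
counts = [[10, 10, 10, 10], [20, 20, 20, 20], [30, 30, 30, 30], [5, 5, 500, 500]]
factors, kept = deg_filtered_factors(counts, group_a=[0, 1], group_b=[2, 3])
```

In the toy data, the single up-regulated gene inflates the library sizes of the second group. Once it is excluded, the remaining genes give equal factors for all four samples, which is the behaviour the strategy aims for.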

Conclusion

Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data.

2.
3.
The present study was conducted to identify and characterize thermophilic bacteria isolated from various hot springs in Turkey by using phenotypic and genotypic methods, including fatty acid methyl ester and rep-PCR profiling and 16S rRNA sequencing. The fatty acid analysis showed the presence of 17 different fatty acids in the 15 bacterial strains examined in this study. Six fatty acids, 15:0 iso, 15:0 anteiso, 16:0, 16:0 iso, 17:0 iso, and 17:0 anteiso, were present in all strains. The bacterial strains were classified into three phenotypic groups based on fatty acid profiles, which were confirmed by genotypic methods such as 16S rRNA sequence analysis and rep-PCR genomic fingerprint profiles. After evaluating several primer sets targeting the repetitive DNA elements REP, ERIC, BOX and (GTG)5, the (GTG)5 and BOXA1R primers were found to be the most reliable for identification and taxonomic characterization of thermophilic bacteria in the genera Geobacillus, Anoxybacillus and Bacillus. Therefore, rep-PCR fingerprinting using the (GTG)5 and BOXA1R primers can be considered a promising genotypic tool for the identification and characterization of thermophilic bacteria from species to strain level.

4.
5.
The transfected cell microarray is a promising method for accelerating the functional exploration of the genome, giving information about protein function in the living cell. The microarrays consist of clusters of cells (spots) overexpressing or silencing a particular gene product. The phenotypic consequences of such perturbations can then be detected using cell-based assays. The focus of the present study was to establish an experimental design and a robust analysis approach for fluorescence intensity data, and to address the use of replicates for studying regulation of gene expression with varying complexity and effect size. Our analysis pipeline includes measurement of fluorescence intensities, normalization strategies using negative control spots and internal control plasmids, and linear regression (ANOVA) modelling for estimating biological effects and calculating P-values for comparisons of interest. Our results show the potential of transfected cell microarrays in studying complex regulation of gene expression by enabling measurement of biological responses in cells with overexpression and downregulation of specific gene products, combined with the possibility of assaying the effects of external stimuli. Simulation experiments show that transfected cell microarrays can be used to reliably detect even quantitatively minor biological effects by including several technical and experimental replicates.

6.
Molecules produced by Rhizobium meliloti increase respiration of alfalfa (Medicago sativa L.) roots. Maximum respiratory increases, measured either as CO2 evolution or as O2 uptake, were elicited in roots of 3-d-old seedlings by 16 h of exposure to living or dead R. meliloti cells at densities of 10⁷ bacteria/mL. Excising roots after exposure to bacteria and separating them into root-tip- and root-hair-containing segments showed that respiratory increases occurred only in the root-hair region. In such assays, CO2 production by segments with root hairs increased by as much as 100% in the presence of bacteria. Two partially purified compounds from R. meliloti 1021 increased root respiration at very low, possibly picomolar, concentrations. One factor, peak B, resembled known pathogenic elicitors because it produced a rapid (15-min), transitory increase in respiration. A second factor, peak D, was quite different because root respiration increased slowly for 8 h and was maintained at the higher level. These molecules differ from lipo-chitin oligosaccharides active in root nodulation for the following reasons: (a) they do not curl alfalfa root hairs, (b) they are synthesized by bacteria in the absence of known plant inducer molecules, and (c) they are produced by a mutant R. meliloti that does not synthesize known lipo-chitin oligosaccharides. The peak-D compound(s) may benefit both symbionts by increasing CO2, which is required for growth of R. meliloti, and possibly by increasing the energy that is available in the plant to form root nodules.

7.
The proper identification of differentially methylated CpGs is central in most epigenetic studies. The Illumina HumanMethylation450 BeadChip is widely used to quantify DNA methylation; nevertheless, the design of an appropriate analysis pipeline faces severe challenges due to the convolution of biological and technical variability and the presence of a signal bias between Infinium I and II probe design types. Despite recent attempts to investigate how to analyze DNA methylation data with such an array design, it has not been possible to perform a comprehensive comparison between different bioinformatics pipelines due to the lack of appropriate data sets having both a large sample size and a sufficient number of technical replicates. Here we perform such a comparative analysis, targeting the problems of reducing the technical variability, eliminating the probe design bias and reducing the batch effect by exploiting two unpublished data sets, which included technical replicates and were profiled for DNA methylation on peripheral blood, monocytes or muscle biopsies. We evaluated the performance of different analysis pipelines and demonstrated that: (1) it is critical to correct for the probe design type, since the amplitude of the measured methylation change depends on the underlying chemistry; (2) the effect of different normalization schemes is mixed, and the most effective methods in our hands were quantile normalization and Beta Mixture Quantile dilation (BMIQ); (3) it is beneficial to correct for batch effects. In conclusion, our comparative analysis using a comprehensive data set suggests an efficient pipeline for proper identification of differentially methylated CpGs using the Illumina 450K arrays.

8.
The coupling of chromosome conformation capture (3C) with next-generation sequencing technologies enables the high-throughput detection of long-range genomic interactions, via the generation of ligation products between DNA sequences that are closely juxtaposed in vivo. These interactions involve promoter regions, enhancers and other regulatory and structural elements of chromosomes and can reveal key details of the regulation of gene expression. 3C-seq is a variant of the method for the detection of interactions between one chosen genomic element (viewpoint) and the rest of the genome. We present r3Cseq, an R/Bioconductor package designed to perform 3C-seq data analysis in a number of different experimental designs. The package reads a common aligned read input format, provides data normalization, allows the visualization of candidate interaction regions and detects statistically significant chromatin interactions, thus greatly facilitating hypothesis generation and the interpretation of experimental results. We further demonstrate its use on a series of real-world applications.

9.
The identification of peptides and proteins from fragmentation mass spectra is a very common approach in the field of proteomics. Contemporary high-throughput peptide identification pipelines can quickly produce large quantities of MS/MS data that contain valuable knowledge about the actual physicochemical processes involved in peptide fragmentation, which can be extracted through extensive data mining studies. As these studies attempt to exploit the intensity information contained in the MS/MS spectra, a critical step required for a meaningful comparison of this information between MS/MS spectra is peak intensity normalization. Here we describe a procedure for quantifying the efficiency of different published normalization methods in terms of the quartile coefficient of dispersion (qcod) statistic. The quartile coefficient of dispersion is applied to measure the dispersion of the peak intensities between redundant MS/MS spectra, allowing the quantification of the differences in computed peak intensity reproducibility between the different normalization methods. We demonstrate that our results are independent of the data set used in the evaluation procedure, allowing us to provide generic guidance on the choice of normalization method to apply in a certain MS/MS pipeline application.
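The quartile coefficient of dispersion mentioned above has a standard definition, (Q3 − Q1) / (Q3 + Q1). A minimal sketch follows. The function name `qcod` and the choice of Python's default quartile interpolation are assumptions made here for illustration; the abstract does not say how quartiles were computed.

```python
from statistics import quantiles


def qcod(intensities):
    """Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1).

    intensities: normalized intensities of one peak across redundant spectra.
    Lower values mean the normalization made the peak more reproducible.
    """
    q1, _, q3 = quantiles(intensities, n=4)  # default 'exclusive' method
    return (q3 - q1) / (q3 + q1)


# Hypothetical intensities of the same peak in seven redundant spectra.
print(qcod([1, 2, 3, 4, 5, 6, 7]))  # Q1 = 2, Q3 = 6, so 4 / 8 = 0.5
```

Because the statistic is a ratio of quartiles, it is dimensionless and resistant to outlying peaks, which makes it convenient for comparing normalization methods that rescale intensities differently.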

10.
The net ecosystem exchange (NEE) of forests represents the balance of gross primary productivity (GPP) and respiration (R). Methods to estimate these two components from eddy covariance flux measurements are usually based on a functional relationship between respiration and temperature that is calibrated for night‐time (respiration) fluxes and subsequently extrapolated using daytime temperature measurements. However, respiration fluxes originate from different parts of the ecosystem, each of which experiences its own course of temperature. Moreover, if the temperature–respiration function is fitted to combined data from different stages of biological development or seasons, a spurious temperature effect may be included that will lead to overestimation of the direct effect of temperature and therefore to overestimates of daytime respiration. We used the EUROFLUX eddy covariance data set for 15 European forests and pooled data per site, month and for conditions of low and sufficient soil moisture, respectively. We found that using air temperature (measured above the canopy) rather than soil temperature (measured 5 cm below the surface) yielded the most reliable and consistent exponential (Q10) temperature–respiration relationship. A fundamental difference in air temperature‐based Q10 values for different sites, times of year or soil moisture conditions could not be established; all were in the range 1.6–2.5. However, base respiration (R0, i.e. respiration rate scaled to 0°C) did vary significantly among sites and over the course of the year, with increased base respiration rates during the growing season. We used the overall mean Q10 of 2.0 to estimate annual GPP and R. Testing suggested that the uncertainty in total GPP and R associated with the method of separation was generally well within 15%. 
For the sites investigated, we found a positive relationship between GPP and R, indicating that there is a latitudinal trend in NEE because the absolute decrease in GPP towards the pole is greater than in R.
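The exponential (Q10) relationship used in this abstract can be written as R(T) = R0 · Q10^(T/10), with R0 the base respiration at 0°C. The sketch below applies it with the reported overall mean Q10 of 2.0. The function names are hypothetical, and the sign convention NEE = R − GPP (positive NEE meaning net carbon release) is an assumption, since the abstract does not state one.

```python
def respiration(r0, temp_c, q10=2.0):
    """Respiration at air temperature temp_c (deg C).

    r0: base respiration, i.e. the rate scaled to 0 deg C.
    q10: factor by which respiration rises per 10 deg C of warming.
    """
    return r0 * q10 ** (temp_c / 10.0)


def gpp_from_nee(nee, r):
    """Gross primary productivity, assuming NEE = R - GPP."""
    return r - nee


# With Q10 = 2.0, respiration doubles between 0 and 10 deg C.
r_day = respiration(r0=1.0, temp_c=10.0)  # 2.0
gpp = gpp_from_nee(nee=-4.0, r=r_day)     # 6.0
```

In practice R0 would be fitted to night-time fluxes for each site and period, which is where the abstract reports the significant variation.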

11.
12.
13.
The effect of resource pulses, such as rainfall events, on soil respiration plays an important role in controlling grassland carbon balance, but how shifts in the long-term precipitation regime regulate the rain pulse effect on soil respiration is still unclear. We first quantified the influence of rainfall events on soil respiration based on a two-year (2006 and 2009) continuously measured soil respiration data set in a temperate steppe in northern China. In 2006 and 2009, soil carbon release induced by rainfall events contributed about 44.5% (83.3 g C m⁻²) and 39.6% (61.7 g C m⁻²) to the growing-season total soil respiration, respectively. The pulse effect of a rainfall event on soil respiration can be accurately predicted by a water status index (WSI), which is the product of rainfall event size and the ratio of antecedent soil temperature to moisture at a depth of 10 cm (r² = 0.92, P < 0.001) through the growing season. This indicates that the pulse effect can be enhanced not only by larger individual rainfall events but also by a higher soil temperature/moisture ratio, which is usually associated with longer dry spells. We then analyzed a long-term (1953–2009) precipitation record in the experimental area. We found that both extreme heavy rainfall events (>40 mm per event) and long dry spells (>5 days) during the growing seasons increased from 1953 to 2009. This suggests that the shift in precipitation regime has increased the contribution of the rain pulse effect to growing-season total soil respiration in this region. These findings highlight the importance of incorporating precipitation regime shifts and their impacts on the rain pulse effect into future predictions of the grassland carbon cycle under climate change.
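The water status index defined in this abstract is a simple product, so it can be sketched directly. The function name, argument names and units below are assumptions for illustration. The abstract gives neither the units of soil moisture nor the regression coefficients that link WSI to the respiration pulse, so only the index itself is computed.

```python
def water_status_index(rain_mm, soil_temp_c, soil_moisture):
    """WSI = rainfall event size * (antecedent soil temperature / moisture).

    rain_mm: size of the rainfall event (assumed to be in mm).
    soil_temp_c: antecedent soil temperature at 10 cm depth (assumed deg C).
    soil_moisture: antecedent soil moisture at 10 cm depth (assumed units).
    """
    if soil_moisture <= 0:
        raise ValueError("soil_moisture must be positive")
    return rain_mm * (soil_temp_c / soil_moisture)


# The same 10 mm event scores higher on warm, dry soil than on cool, wet soil.
dry = water_status_index(10.0, 20.0, 5.0)   # 40.0
wet = water_status_index(10.0, 10.0, 20.0)  # 5.0
```

The comparison shows the point made in the text: for a fixed event size, a higher temperature-to-moisture ratio, typical after a long dry spell, raises the index.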

14.
Penny D, Stowe BB. Plant Physiology, 1966, 41(2): 360-365
Biologically active lipids increase the growth of pea stem sections within 3 hours; at the same time, their respiration increases and their growth rate exceeds that of the intact plant. The greater final length of the intact internode is due to a longer growth period.

Both active and inactive lipids are rapidly taken up and enter all major metabolic fractions: among centrifugal fractions methyl oleate tends to label those that contain metabolically active membranes. It is concluded that lipids active in the bioassay are probably the effective molecules at the subcellular site of action.

No direct effect of lipids on isolated mitochondria could be shown. The respiration of stem tissue was not influenced by dinitrophenol and carbonyl cyanide m-chlorophenyl hydrazone, although dinitrophenol inhibited growth. Lipid-induced respiration was sensitive to these agents as well as to cyanide, indicating that cytochrome oxidase is probably involved.

The promotion of growth and respiration by lipids is not linked to protein synthesis, since actinomycin D, puromycin and cycloheximide failed to inhibit the respiratory increase even though strongly limiting amino acid incorporation into protein. It is most likely that the effect of lipids on growth is due to their promotion of respiration.


15.

Background  

Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data.

16.
We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging because different sequencing experiments may generate quite different total numbers of reads. To overcome these difficulties, we use a log-linear model with a new approach to normalization. We derive a novel procedure to estimate the false discovery rate (FDR). Our method can be applied to data with quantitative, two-class, or multiple-class outcomes, and the computation is fast even for large data sets. We study the accuracy of our approaches for significance calculation and FDR estimation, and we demonstrate that our method has potential advantages over existing methods that are based on a Poisson or negative binomial model. In summary, this work provides a pipeline for the significance analysis of sequencing data.

17.
Markov models of ion channel dynamics have evolved as experimental advances have improved our understanding of channel function. Past studies have examined limited sets of various topologies for Markov models of channel dynamics. We present a systematic method for identification of all possible Markov model topologies using experimental data for two types of native voltage-gated ion channel currents: mouse atrial sodium currents and human left ventricular fast transient outward potassium currents. Successful models identified with this approach have certain characteristics in common, suggesting that aspects of the model topology are determined by the experimental data. Incorporating these channel models into cell and tissue simulations to assess model performance within protocols that were not used for training provided validation and further narrowing of the number of acceptable models. The success of this approach suggests a channel model creation pipeline may be feasible where the structure of the model is not specified a priori.

18.
19.
20.
The hop plant (Humulus lupulus L.), cultivated primarily for its use in the brewing industry, is faced with a variety of diseases, including severe vascular diseases such as Verticillium wilt, against which no effective protection is available. Understanding disease resistance with tools such as differential gene expression studies is an important objective in research on plant defense mechanisms. In this study, we evaluated twenty-three reference genes for RT-qPCR expression studies on hop under biotic stress conditions. The candidate genes were validated on susceptible and resistant hop cultivars sampled at three different time points after infection with Verticillium albo-atrum. The stability of expression and the number of genes required for accurate normalization were assessed by three different Excel-based approaches (geNorm v.3.5 software, NormFinder, and RefFinder). High consistency was found among them, identifying the same six best reference genes (YLS8, DRH1, TIP41, CAC, POAC and SAND) and five least stably expressed genes (CYCL, UBQ11, POACT, GAPDH and NADH). The candidate genes in different experimental subsets/conditions resulted in different rankings. A combination of the two best reference genes, YLS8 and DRH1, was used for normalization of RT-qPCR data of the gene of interest (PR-1) implicated in biotic stress of hop. We outlined the differences between normalized and non-normalized values and the importance of RT-qPCR data normalization. The high correlation obtained among data standardized with different sets of reference genes confirms the suitability of the reference genes selected for normalization. Lower correlations between normalized and non-normalized data may reflect different quantity and/or quality of RNA samples used in RT-qPCR analyses.
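Normalizing a gene of interest against two reference genes can be sketched as follows. This is an illustration under stated assumptions, not the procedure reported in the study. It uses a common convention in which the normalization factor is the geometric mean of the reference genes' relative quantities, and it assumes an amplification efficiency of 2 (perfect doubling per cycle). The function names and Ct values are hypothetical.

```python
from statistics import geometric_mean


def relative_quantity(ct, efficiency=2.0):
    """Convert a quantification cycle (Ct) to a relative quantity."""
    return efficiency ** (-ct)


def normalized_expression(target_ct, reference_cts, efficiency=2.0):
    """Target quantity divided by the geometric mean of reference quantities."""
    factor = geometric_mean(
        [relative_quantity(ct, efficiency) for ct in reference_cts]
    )
    return relative_quantity(target_ct, efficiency) / factor


# Hypothetical Ct values: PR-1 as target, YLS8 and DRH1 as references.
value = normalized_expression(target_ct=20.0, reference_cts=[22.0, 24.0])
# The reference geometric mean corresponds to Ct 23, so value is about 2**3 = 8.
```

Using more than one reference gene dampens the effect of any single reference that shifts under the experimental condition, which is why the number of genes required is assessed alongside their stability.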
