Similar articles
 Found 20 similar articles (search time: 859 ms)
1.
Bisulfite sequencing (BS-seq) is the gold standard for studying genome-wide DNA methylation. We developed MOABS to increase the speed, accuracy, statistical power and biological relevance of BS-seq data analysis. MOABS detects differential methylation with 10-fold coverage at single-CpG resolution based on a Beta-Binomial hierarchical model and is capable of processing two billion reads in 24 CPU hours. Here, using simulated and real BS-seq data, we demonstrate that MOABS outperforms other leading algorithms, such as Fisher’s exact test and BSmooth. Furthermore, MOABS analysis can be easily extended to differential 5hmC analysis using RRBS and oxBS-seq. MOABS is available at http://code.google.com/p/moabs/.

2.
DNA methylation plays a crucial role in higher organisms. Coupling bisulfite treatment with next generation sequencing enables the interrogation of 5-methylcytosine sites in the genome. However, bisulfite conversion introduces mismatches between the reads and the reference genome, which makes mapping of Illumina and SOLiD reads slow and inaccurate. BatMeth is an algorithm that integrates novel Mismatch Counting, List Filtering, Mismatch Stage Filtering and Fast Mapping onto Two Indexes components to improve unique mapping rate, speed and precision. Experimental results show that BatMeth is faster and more accurate than existing tools. BatMeth is freely available at http://code.google.com/p/batmeth/.

3.
Bisulfite treatment can be used to ascertain the methylation states of individual cytosines in DNA. Ideally, bisulfite treatment deaminates unmethylated cytosines to uracils, and leaves 5-methylcytosines unchanged. Two types of bisulfite-conversion error occur: inappropriate conversion of 5-methylcytosine to thymine, and failure to convert unmethylated cytosine to uracil. Conventional bisulfite treatment requires hours of exposure to low-molarity, low-temperature bisulfite (‘LowMT’) and, sometimes, thermal denaturation. An alternate, high-molarity, high-temperature (‘HighMT’) protocol has been reported to accelerate conversion and to reduce inappropriate conversion. We used molecular encoding to obtain validated, individual-molecule data on failed- and inappropriate-conversion frequencies for LowMT and HighMT treatments of both single-stranded and hairpin-linked oligonucleotides. After accounting for bisulfite-independent error, we found that: (i) inappropriate-conversion events accrue predominantly on molecules exposed to bisulfite after they have attained complete or near-complete conversion; (ii) the HighMT treatment is preferable because it yields greater homogeneity among sites and among molecules in conversion rates, and thus yields more reliable data; (iii) different durations of bisulfite treatment will yield data appropriate to address different experimental questions; and (iv) conversion errors can be used to assess the validity of methylation data collected without the benefit of molecular encoding.

4.
DNA methylation is a chemical modification of cytosine bases that is pivotal for gene regulation, cellular specification and cancer development. Here, we describe an R package, methylKit, that rapidly analyzes genome-wide cytosine epigenetic profiles from high-throughput methylation and hydroxymethylation sequencing experiments. methylKit includes functions for clustering, sample quality visualization, differential methylation analysis and annotation features, thus automating and simplifying many of the steps for discerning statistically significant bases or regions of DNA methylation. Finally, we demonstrate methylKit on breast cancer data, in which we find statistically significant regions of differential methylation and stratify tumor subtypes. methylKit is available at http://code.google.com/p/methylkit.

5.
6.

Background  

Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation.

7.
Epigenetics, 2013, 8(2): 94-100
Differential denaturation during PCR can be used to selectively amplify unmethylated DNA from a methylated DNA background. The use of differential denaturation in PCR is particularly suited to amplification of undermethylated sequences following treatment with bisulphite, since bisulphite selectively converts cytosines to uracil while methylated cytosines remain unreactive. Thus amplicons derived from unmethylated DNA retain fewer cytosines, and their lower G + C content allows them to be amplified at lower melting temperatures, while limiting amplification of the corresponding methylated amplicons (Bisulphite Differential Denaturation PCR, BDD-PCR). Selective amplification of unmethylated DNA of four human genomic regions from three genes, GSTP1, BRCA1 and MAGE-A1, is demonstrated with selectivity observed at a ratio of down to one unmethylated molecule in 10^5 methylated molecules. BDD-PCR has the potential to be used to selectively amplify and detect aberrantly demethylated genes, such as oncogenes, in cancers. Additionally, BDD-PCR can be effectively utilised to improve the specificity of methylation-specific PCR (MSP) by limiting amplification of DNA that is not fully converted, thus preventing non-conversion from being misinterpreted as methylation.

8.

Background

Whole-genome sequencing of bisulfite-converted DNA (‘methylC-seq’) provides comprehensive information on DNA methylation. An important application of these whole-genome methylation maps is classifying each position as a methylated or non-methylated nucleotide. A widely used current method for this purpose, the so-called binomial method, is intuitive and straightforward, but lacks power when the sequence coverage and the genome-wide methylation level are low. These problems present a particular challenge when analyzing sparsely methylated genomes, such as those of many invertebrates and plants.

Results

We demonstrate that the number of sequence reads per position from methylC-seq data displays a large variance and can be modeled as a shifted negative binomial distribution. We also show that DNA methylation levels of adjacent CpG sites are correlated, and this similarity in local DNA methylation levels extends several kilobases. Taking these observations into account, we propose a new method based on Bayesian classification to infer DNA methylation status while considering the neighborhood DNA methylation levels of a specific site. We show that our approach has higher sensitivity and better classification performance than the binomial method via multiple analyses, including computational simulations, Area Under Curve (AUC) analyses, and improved consistencies across biological replicates. This method is especially advantageous in the analyses of sparsely methylated genomes with low coverage.

Conclusions

Our method improves the existing binomial method for binary methylation calls by utilizing a posterior odds framework and incorporating local methylation information. This method should be widely applicable to the analyses of methylC-seq data from diverse sparsely methylated genomes. Bis-Class and example data are provided at a dedicated website (http://bibs.snu.ac.kr/software/Bisclass).

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-608) contains supplementary material, which is available to authorized users.
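
The binomial method that entry 8 takes as its baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not the Bis-Class method or its authors' code: a site is called methylated when the number of reads showing an unconverted cytosine is unlikely to have arisen from errors alone. The function names, the error rate of 0.01 and the significance threshold of 0.01 are hypothetical values chosen for illustration.

```python
from math import comb


def binomial_pvalue(methylated_reads: int, coverage: int, error_rate: float) -> float:
    """Upper-tail probability P(X >= methylated_reads) for X ~ Binomial(coverage, error_rate).

    error_rate is the assumed probability that an unmethylated cytosine is
    read as methylated (for example through failed bisulfite conversion).
    """
    if not 0 <= methylated_reads <= coverage:
        raise ValueError("methylated_reads must lie between 0 and coverage")
    return sum(
        comb(coverage, k) * error_rate**k * (1 - error_rate) ** (coverage - k)
        for k in range(methylated_reads, coverage + 1)
    )


def call_methylated(
    methylated_reads: int,
    coverage: int,
    error_rate: float = 0.01,  # assumed value, not taken from the abstract
    alpha: float = 0.01,       # assumed significance threshold
) -> bool:
    """Call a site methylated if errors alone are unlikely to explain the reads."""
    return binomial_pvalue(methylated_reads, coverage, error_rate) < alpha


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, 10, binomial_pvalue(k, 10, 0.01), call_methylated(k, 10))
```

Under these assumed parameters, one methylated read out of ten gives a tail probability of about 0.096, so the site is not called methylated, whereas two out of ten (about 0.004) is. With a single read of coverage no call is possible at this threshold, which is the loss of power at low coverage that the abstract describes.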

9.

Background

Methylenetetrahydrofolate reductase (MTHFR) is an important enzyme of folate and methionine metabolism, making it crucial for DNA synthesis and methylation. The objective of this study was to analyze MTHFR gene 677C>T polymorphism in infertile male individuals from North India, followed by a meta-analysis on our data and published studies.

Methodology/Principal Findings

We undertook genotyping on a total of 837 individuals including well characterized infertile (N = 522) and confirmed fertile (N = 315) individuals. The SNP was typed by direct DNA sequencing. Chi square test was done for statistical analysis. Published studies were searched using appropriate keywords. Source of data collection for meta-analysis included ‘Pubmed’, ‘Ovid’ and ‘Google Scholar’. Those studies analyzing 677C>T polymorphism in male infertility and presenting all relevant data were included in meta-analysis. The genotype data for infertile subjects and fertile controls was extracted from each study. Chi square test was done to obtain odds ratio (OR) and p-value. Meta-analysis was performed using Comprehensive Meta-analysis software (Version 2). The frequency of mutant (T) allele (p = 0.0025) and genotypes (CT+TT) (p = 0.0187) was significantly higher in infertile individuals in comparison to fertile controls in our case-control study. The overall summary estimate (OR) for allele and genotype meta-analysis were 1.304 (p = 0.000), 1.310 (p = 0.000), respectively, establishing significant association of 677C>T polymorphism with male infertility.

Conclusions/Significance

The 677C>T substitution is strongly associated with male infertility in the Indian population. Allele and genotype meta-analysis also supported its strong correlation with male infertility, thus establishing it as a risk factor.

10.
MOTIVATION: Methylation of cytosines in DNA plays an important role in the regulation of gene expression, and the analysis of methylation patterns is fundamental for the understanding of cell differentiation, aging processes, diseases and cancer development. Such analysis has been limited, because technologies for detailed and efficient high-throughput studies have not been available. We have developed a novel quantitative methylation analysis algorithm and workflow based on direct DNA sequencing of PCR products from bisulfite-treated DNA with high-throughput sequencing machines. This technology is a prerequisite for success of the Human Epigenome Project, the first large genome-wide sequencing study for DNA methylation in many different tissues. Methylation in tissue samples, which are mixtures of different cells, is quantitative information represented by cytosine/thymine proportions after bisulfite conversion of unmethylated cytosines to uracil and PCR. Calculating quantitative methylation information from base proportions represented by different dye signals in four-dye sequencing trace files requires a specific algorithm that handles imbalanced and overscaled signals, incomplete conversion, quality problems and basecaller artifacts. RESULTS: The algorithm we developed has several key properties: it analyzes trace files from PCR products of bisulfite-treated DNA sequenced directly on ABI machines; it yields quantitative methylation measurements for individual cytosine positions after alignment with genomic reference sequences, signal normalization and estimation of effectiveness of bisulfite treatment; it works in a fully automated pipeline including data quality monitoring; it is efficient and avoids the usual cost of multiple sequencing runs on subclones to estimate DNA methylation. The power of our new algorithm is demonstrated with data from two test systems based on mixtures with known base compositions and defined methylation. In addition, the applicability is proven by identifying CpGs that are differentially methylated in real tissue samples.
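
The idea of reading methylation from cytosine/thymine proportions in entry 10 can be illustrated with a small sketch. It is not the published algorithm, which also normalizes trace signals and handles basecaller artifacts. It assumes already-normalized C and T signals at one cytosine position, and it assumes that methylated cytosines are never converted while unmethylated cytosines are converted with probability `conversion_efficiency`. Under that assumption the observed C fraction is m + (1 - m)(1 - e), which is solved for the methylation level m. The function and parameter names are hypothetical.

```python
def methylation_level(
    c_signal: float,
    t_signal: float,
    conversion_efficiency: float = 1.0,  # assumed estimate of bisulfite efficiency
) -> float:
    """Estimate the methylated fraction at one cytosine from C and T signals.

    observed = m + (1 - m) * (1 - e)  =>  m = (observed - (1 - e)) / e
    The result is clamped to [0, 1] because noise can push it slightly outside.
    """
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("C and T signals must sum to a positive value")
    if not 0 < conversion_efficiency <= 1:
        raise ValueError("conversion_efficiency must be in (0, 1]")
    observed = c_signal / total
    corrected = (observed - (1 - conversion_efficiency)) / conversion_efficiency
    return min(1.0, max(0.0, corrected))


if __name__ == "__main__":
    print(methylation_level(30, 70))        # complete conversion assumed
    print(methylation_level(30, 70, 0.95))  # corrected for 95% efficiency
```

With complete conversion the estimate is simply C / (C + T); with lower efficiency, part of the C signal is attributed to unconverted unmethylated cytosines, so the corrected estimate is lower.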

11.
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.

12.
With the aim of understanding the relationship between genetic and phenotypic variations in cultivated tomato, single nucleotide polymorphism (SNP) markers covering the whole genome of cultivated tomato were developed and genome-wide association studies (GWAS) were performed. The whole genomes of six tomato lines were sequenced with the ABI-5500xl SOLiD sequencer. Sequence reads covering ∼13.7× of the genome for each line were obtained, and mapped onto the tomato reference genome (SL2.40) to detect ∼1.5 million SNP candidates. Of the identified SNPs, 1.5% were considered to confer gene functions. In the subsequent Illumina GoldenGate assay for 1536 SNPs, 1293 SNPs were successfully genotyped, and 1248 showed polymorphisms among 663 tomato accessions. The whole-genome linkage disequilibrium (LD) analysis detected highly biased LD decays between euchromatic (58 kb) and heterochromatic regions (13.8 Mb). Subsequent GWAS identified SNPs that were significantly associated with agronomical traits, with SNP loci located near genes that were previously reported as candidates for these traits. This study demonstrates that attractive loci can be identified by performing GWAS with a large number of SNPs obtained from re-sequencing analysis.

13.
PLoS ONE, 2014, 9(4)
We present a draft assembly of the genome of European pear (Pyrus communis) ‘Bartlett’. Our assembly was developed employing second generation sequencing technology (Roche 454), from single-end, 2 kb, and 7 kb insert paired-end reads using Newbler (version 2.7). It contains 142,083 scaffolds greater than 499 bases (maximum scaffold length of 1.2 Mb) and covers a total of 577.3 Mb, representing most of the expected 600 Mb Pyrus genome. A total of 829,823 putative single nucleotide polymorphisms (SNPs) were detected using re-sequencing of ‘Louise Bonne de Jersey’ and ‘Old Home’. A total of 2,279 genetically mapped SNP markers anchor 171 Mb of the assembled genome. Ab initio gene prediction combined with prediction based on homology searching detected 43,419 putative gene models. Of these, 1219 proteins (556 clusters) are unique to European pear compared to 12 other sequenced plant genomes. Analysis of the expansin gene family provided an example of the quality of the gene prediction and an insight into the relationships among one class of cell wall related genes that control fruit softening in both European pear and apple (Malus×domestica). The ‘Bartlett’ genome assembly v1.0 (http://www.rosaceae.org/species/pyrus/pyrus_communis/genome_v1.0) is an invaluable tool for identifying the genetic control of key horticultural traits in pear and will enable the wide application of marker-assisted and genomic selection that will enhance the speed and efficiency of pear cultivar development.

14.
High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis in the non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA; http://ctrad-csi.nus.edu.sg/gbsa), a free open-source software capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all the genomic regions, for which methylation studies could lead to the discovery of new regulatory mechanisms.

15.
16.
DNA methylation pattern mapping is heavily studied in normal and diseased tissues. A variety of methods have been established to interrogate the cytosine methylation patterns in cells. Reduced representation of whole genome bisulfite sequencing was developed to detect quantitative base pair resolution cytosine methylation patterns at GC-rich genomic loci. This is accomplished by combining the use of a restriction enzyme followed by bisulfite conversion. Enhanced Reduced Representation Bisulfite Sequencing (ERRBS) increases the biologically relevant genomic loci covered and has been used to profile cytosine methylation in DNA from human, mouse and other organisms. ERRBS initiates with restriction enzyme digestion of DNA to generate low molecular weight fragments for use in library preparation. These fragments are subjected to standard library construction for next generation sequencing. Bisulfite conversion of unmethylated cytosines prior to the final amplification step allows for quantitative base resolution of cytosine methylation levels in covered genomic loci. The protocol can be completed within four days. Despite low complexity in the first three bases sequenced, ERRBS libraries yield high quality data when using a designated sequencing control lane. Mapping and bioinformatics analysis is then performed and yields data that can be easily integrated with a variety of genome-wide platforms. ERRBS can utilize small input material quantities making it feasible to process human clinical samples and applicable in a range of research applications. The video produced demonstrates critical steps of the ERRBS protocol.

17.

Background

Highly parallel sequencing technologies have become important tools in the analysis of sequence polymorphisms on a genomic scale. However, the development of customized software to analyze data produced by these methods has lagged behind.

Methods/Principal Findings

Here I describe a tool, ‘galign’, designed to identify polymorphisms between sequence reads obtained using Illumina/Solexa technology and a reference genome. The ‘galign’ alignment tool does not use Smith-Waterman matrices for sequence comparisons. Instead, a simple algorithm comparing parsed sequence reads to parsed reference genome sequences is used. ‘galign’ output is geared towards immediate user application, displaying polymorphism locations, nucleotide changes, and relevant predicted amino-acid changes for ease of information processing. To do so, ‘galign’ requires several accessory files easily derived from an annotated reference genome. Direct sequencing as well as in silico studies demonstrate that ‘galign’ provides lesion predictions comparable in accuracy to available prediction programs, accompanied by greater processing speed and more user-friendly output. We demonstrate the use of ‘galign’ to identify mutations leading to phenotypic consequences in C. elegans.

Conclusion/Significance

Our studies suggest that ‘galign’ is a useful tool for polymorphism discovery, and is of immediate utility for sequence mining in C. elegans.

18.

Background

Bisulfite sequencing using next generation sequencers yields genome-wide measurements of DNA methylation at single nucleotide resolution. Traditional aligners are not designed for mapping bisulfite-treated reads, where the unmethylated Cs are converted to Ts. We have developed BS Seeker, an approach that converts the genome to a three-letter alphabet and uses Bowtie to align bisulfite-treated reads to a reference genome. It uses sequence tags to reduce mapping ambiguity. Post-processing of the alignments removes non-unique and low-quality mappings.

Results

We tested our aligner on synthetic data, a bisulfite-converted Arabidopsis library, and human libraries generated from two different experimental protocols. We evaluated the performance of our approach and compared it to other bisulfite aligners. The results demonstrate that among the aligners tested, BS Seeker is more versatile and faster. When mapping to the human genome, BS Seeker generates alignments significantly faster than RMAP and BSMAP. Furthermore, BS Seeker is the only alignment tool that can explicitly account for tags which are generated by certain library construction protocols.

Conclusions

BS Seeker provides fast and accurate mapping of bisulfite-converted reads. It can work with BS reads generated from the two different experimental protocols, and is able to efficiently map reads to large mammalian genomes. The Python program is freely available at http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html.
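
The three-letter-alphabet idea described in entry 18 can be sketched in a few lines. This is a toy illustration, not BS Seeker's code: an exact substring search (`str.find`) stands in for the Bowtie alignment, only forward-strand reads are handled, and sequence tags, mismatches and non-unique hits are ignored. The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def c_to_t(seq: str) -> str:
    """In-silico bisulfite conversion: collapse C into T (three-letter alphabet)."""
    return seq.upper().replace("C", "T")


def map_and_call(read: str, reference: str) -> Optional[Tuple[int, Dict[int, bool]]]:
    """Map a forward-strand bisulfite read and call cytosine methylation.

    Both sequences are converted to the three-letter alphabet before matching,
    so C/T differences caused by bisulfite conversion do not count as
    mismatches. The original (unconverted) bases are then compared: a
    reference C read as C is called methylated, a reference C read as T is
    called unmethylated. Returns (position, {reference_index: is_methylated}),
    or None if the converted read does not occur in the converted reference.
    """
    read = read.upper()
    reference = reference.upper()
    pos = c_to_t(reference).find(c_to_t(read))  # stand-in for a real aligner
    if pos < 0:
        return None
    calls: Dict[int, bool] = {}
    for offset, (ref_base, read_base) in enumerate(zip(reference[pos:], read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[pos + offset] = read_base == "C"
    return pos, calls


if __name__ == "__main__":
    print(map_and_call("ATGTCTGA", "GGACGTCCGA"))
```

In this example the read matches the reference at position 2 once both are converted, and the cytosine at reference index 6 is called methylated while those at indices 3 and 7 are called unmethylated. Matching in the reduced alphabet is what makes the mapping insensitive to methylation state, at the cost of lower sequence complexity and more ambiguous hits.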

19.
Structural variations (SVs) play a crucial role in genetic diversity. However, the alignments of reads near/across SVs are made inaccurate by the presence of polymorphisms. BatAlign is an algorithm that integrates two strategies called ‘Reverse-Alignment’ and ‘Deep-Scan’ to improve the accuracy of read-alignment. In our experiments, BatAlign was able to obtain the highest F-measures in read-alignments on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets. On real data, the alignments of BatAlign were able to recover 4.3% more PCR-validated SVs with 73.3% fewer calls. These results suggest that BatAlign is effective in detecting SVs and other polymorphic variants accurately using high-throughput data. BatAlign is publicly available at https://goo.gl/a6phxB.

20.

Background

Repetitive behaviours (RB) in patients with Gilles de la Tourette syndrome (GTS) are frequent. However, a controversy persists whether they are manifestations of obsessive-compulsive disorder (OCD) or correspond to complex tics.

Methods

166 consecutive patients with GTS aged 15–68 years were recruited and submitted to extensive neurological, psychiatric and psychological evaluations. RB were evaluated by the YBOCS symptom checklist and Mini International Neuropsychiatric Interview (M.I.N.I), and classified on the basis of a semi-directive psychiatric interview as compulsions or tics.

Results

RB were present in 64.4% of patients with GTS (107/166) and categorised into 3 major groups: a ‘tic-like’ group (24.3%–40/166) characterised by RB such as touching, counting, ‘just right’ and symmetry searching; an ‘OCD-like’ group (20.5%–34/166) with washing and checking rituals; and a ‘mixed’ group (13.2%–22/166) with both ‘tic-like’ and ‘OCD-like’ types of RB present in the same patient. In 6.3% of patients, RB could not be classified into any of these groups and were thus considered ‘undetermined’.

Conclusions

The results confirm the phenomenological heterogeneity of RB in GTS patients and allow two types to be distinguished: tic-like behaviours, which are very likely an integral part of GTS; and OCD-like behaviours, which can be considered a comorbid condition of GTS and were correlated with higher scores of complex tics, more frequent neuroleptic and SSRI treatment, and less successful socio-professional adaptation. We suggest that a meticulous semiological analysis of RB in GTS patients will help to tailor treatment and allow better classification of patients for future pathophysiologic studies.

Trial Registration

ClinicalTrials.gov NCT00169351
