Similar articles
1.
Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.

2.
SUMMARY: Accurate and complete mapping of short-read sequencing to a reference genome greatly enhances the discovery of biological results and improves statistical predictions. We recently presented RNA-MATE, a pipeline for the recursive mapping of RNA-Seq datasets. With the rapid increase in genome re-sequencing projects, progression of available mapping software and the evolution of file formats, we now present X-MATE, an updated version of RNA-MATE, capable of mapping both RNA-Seq and DNA datasets and with improved performance, output file formats, configuration files, and flexibility in core mapping software. AVAILABILITY: Executables, source code, junction libraries, test data and results and the user manual are available from http://grimmond.imb.uq.edu.au/X-MATE/.

3.
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics to mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.

4.
The nucleosome is the fundamental packing unit of the eukaryotic genome, and CpG methylation is an epigenetic modification associated with gene repression and silencing. We investigated nucleosome assembly mediated by histone chaperone Nap1 and the effects of CpG methylation based on three-color single molecule FRET measurements, which enabled direct monitoring of histone binding in the context of DNA wrapping. According to our observation, (H3-H4)2 tetramer incorporation must precede H2A-H2B dimer binding, which is independent of DNA termini wrapping. Upon CpG methylation, (H3-H4)2 tetramer incorporation and DNA termini wrapping are facilitated, whereas proper incorporation of H2A-H2B dimers is inhibited. We suggest that these changes are due to rigidified DNA and increased random binding of histones to DNA. According to the results, CpG methylation expedites nucleosome assembly in the presence of abundant DNA and histones, which may help facilitate gene packaging in chromatin. The results also indicate that the slowest steps in nucleosome assembly are DNA termini wrapping and tetramer positioning, both of which are affected heavily by changes in the physical properties of DNA.

5.
We offer a guide to de novo genome assembly using sequence data generated by the Illumina platform for biologists working with fungi or other organisms whose genomes are less than 100 Mb in size. The guide requires no familiarity with sequencing assembly technology or associated computer programs. It defines commonly used terms in genome sequencing and assembly; provides examples of assembling short-read genome sequence data for four strains of the fungus Grosmannia clavigera using four assembly programs; gives examples of protocols and software; and presents a commented flowchart that extends from DNA preparation for submission to a sequencing center, through to processing and assembly of the raw sequence reads using freely available operating systems and software.

6.
Advances in DNA sequencing have made it feasible to gather genomic data for non‐model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD‐HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated data sets, varying assembly parameters. We found substantial variation in software performance across simulations and parameter settings. ABySS failed to recover any true genome fragments, and Velvet and VSEARCH performed poorly for most simulations. Stacks and Stacks2 produced accurate assemblies of simulations containing SNPs, but the addition of insertion and deletion mutations decreased their performance. CD‐HIT was the only assembler that consistently recovered a high proportion of true genome fragments. Here, we demonstrate the substantial difference in the accuracy of assemblies from different software programs and the importance of comparing assemblies that result from different parameter settings.

7.

Background

DNA methylation plays crucial roles in epigenetic gene regulation in normal development and disease pathogenesis. Efficient and accurate quantification of DNA methylation at single base resolution can greatly advance the knowledge of disease mechanisms and be used to identify potential biomarkers. We developed an improved pipeline based on reduced representation bisulfite sequencing (RRBS) for cost-effective genome-wide quantification of DNA methylation at single base resolution. A selection of two restriction enzymes (TaqαI and MspI) enables a more unbiased coverage of genomic regions of different CpG densities. We further developed a highly automated software package to analyze bisulfite sequencing results from the Solexa GAIIx system.

Results

With two sequencing lanes, we were able to quantify ~1.8 million individual CpG sites at a minimum sequencing depth of 10. Overall, about 76.7% of CpG islands, 54.9% of CpG island shores and 52.2% of core promoters in the human genome were covered with at least 3 CpG sites per region.

Conclusions

With this new pipeline, it is now possible to perform whole-genome DNA methylation analysis at single base resolution for a large number of samples for understanding how DNA methylation and its changes are involved in development, differentiation, and disease pathogenesis.
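
The depth threshold quoted in the Results above can be illustrated with a minimal sketch. It is not the authors' software package: the function name, the input layout (site ID mapped to methylated and unmethylated read counts) and the example counts are all hypothetical, chosen only to show how a per-CpG methylation level is commonly computed and filtered at a minimum depth of 10.

```python
from typing import Dict, Tuple

# Minimum sequencing depth quoted in the abstract's Results section.
MIN_DEPTH = 10


def methylation_levels(
    counts: Dict[str, Tuple[int, int]],
    min_depth: int = MIN_DEPTH,
) -> Dict[str, float]:
    """Return the methylation level for each CpG site with enough coverage.

    `counts` maps a site ID to (methylated reads, unmethylated reads).
    This layout is an assumption made for the example, not a format
    taken from the pipeline described in the abstract.
    """
    levels = {}
    for site, (meth, unmeth) in counts.items():
        depth = meth + unmeth
        # Keep only sites covered by at least `min_depth` reads.
        if depth >= min_depth:
            levels[site] = meth / depth
    return levels


# Hypothetical read counts for three CpG sites.
example_counts = {
    "chr1:1000": (8, 2),   # depth 10, kept
    "chr1:1050": (3, 4),   # depth 7, dropped
    "chr1:2000": (0, 15),  # depth 15, kept, fully unmethylated
}
levels = methylation_levels(example_counts)
```

Sites below the threshold are dropped rather than reported with a noisy estimate, which is the usual reason for quoting CpG counts at a stated minimum depth.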

10.
We have discovered a distinct DNA-methylation boundary at a site between 650 and 800 nucleotides upstream of the CGG repeat in the first exon of the human FMR1 gene. This boundary, identified by bisulfite sequencing, is present in all human cell lines and cell types, irrespective of age, gender, and developmental stage. The same boundary is found also in different mouse tissues, although sequence homology between human and mouse in this region is only 46.7%. This boundary sequence, in both the unmethylated and the CpG-methylated modes, binds specifically to nuclear proteins from human cells. We interpret this boundary as carrying a specific chromatin structure that delineates a hypermethylated area in the genome from the unmethylated FMR1 promoter and protecting it from the spreading of DNA methylation. In individuals with the fragile X syndrome (FRAXA), the methylation boundary is lost; methylation has penetrated into the FMR1 promoter and inactivated the FMR1 gene. In one FRAXA genome, the upstream terminus of the methylation boundary region exhibits decreased methylation as compared to that of healthy individuals. This finding suggests changes in nucleotide sequence and chromatin structure in the boundary region of this FRAXA individual. In the completely de novo methylated FMR1 promoter, there are isolated unmethylated CpG dinucleotides that are, however, not found when the FMR1 promoter and upstream sequences are methylated in vitro with the bacterial M-SssI DNA methyltransferase. They may arise during de novo methylation only in DNA that is organized in chromatin and be due to the binding of specific proteins.

11.
ABSTRACT: BACKGROUND: The assembly of next-generation short-read sequencing data can result in a fragmented non-contiguous set of genomic sequences. Therefore a common step in a genome project is to join neighbouring sequence regions together and fill gaps. This scaffolding step is non-trivial and requires manually editing large blocks of nucleotide sequence. Joining these sequences together also hides the source of each region in the final genome sequence. Taken together these considerations may make reproducing or editing an existing genome scaffold difficult. METHODS: The software outlined here, "Scaffolder," is implemented in the Ruby programming language and can be installed via the RubyGems software management system. Genome scaffolds are defined using YAML - a data format which is both human and machine-readable. Command line binaries and extensive documentation are available. RESULTS: This software allows a genome build to be defined in terms of the constituent sequences using a relatively simple syntax. This syntax further allows unknown regions to be specified and additional sequence to be used to fill known gaps in the scaffold. Defining the genome construction in a file makes the scaffolding process reproducible and easier to edit compared with large FASTA nucleotide sequences. CONCLUSIONS: Scaffolder is easy-to-use genome scaffolding software which promotes reproducibility and continuous development in a genome project. Scaffolder can be found at http://next.gs.

13.
Although plastid genome (plastome) structure is highly conserved across most seed plants, investigations during the past two decades have revealed several disparately related lineages that experienced substantial rearrangements. Most plastomes contain a large inverted repeat and two single-copy regions, and a few dispersed repeats; however, the plastomes of some taxa harbour long repeat sequences (>300 bp). These long repeats make it challenging to assemble complete plastomes using short-read data, leading to misassemblies and consensus sequences with spurious rearrangements. Single-molecule, long-read sequencing has the potential to overcome these challenges, yet there is no consensus on the most effective method for accurately assembling plastomes using long-read data. We generated a pipeline, plastid Genome Assembly Using Long-read data (ptGAUL), to address the problem of plastome assembly using long-read data from Oxford Nanopore Technologies (ONT) or Pacific Biosciences platforms. We demonstrated the efficacy of the ptGAUL pipeline using 16 published long-read data sets. We showed that ptGAUL quickly produces accurate and unbiased assemblies using only ~50× coverage of plastome data. Additionally, we deployed ptGAUL to assemble four new Juncus (Juncaceae) plastomes using ONT long reads. Our results revealed many long repeats and rearrangements in Juncus plastomes compared with basal lineages of Poales. The ptGAUL pipeline is available on GitHub: https://github.com/Bean061/ptgaul.

14.
Large-insert genome analysis (LIGAN) is a broadly applicable, high-throughput technology designed to characterize genome-scale structural variation. Fosmid paired-end sequences and DNA fingerprints from a query genome are compared to a reference sequence using the Genomic Variation Analysis (GenVal) suite of software tools to pinpoint locations of insertions, deletions, and rearrangements. Fosmids spanning regions that contain new structural variants can then be sequenced. Clonal pairs of Pseudomonas aeruginosa isolates from four cystic fibrosis patients were used to validate the LIGAN technology. Approximately 1.5 Mb of inserted sequences were identified, including 743 kb containing 615 ORFs that are absent from published P. aeruginosa genomes. Six rearrangement breakpoints and 220 kb of deleted sequences were also identified. Our study expands the “genome universe” of P. aeruginosa and validates a technology that complements emerging, short-read sequencing methods that are better suited to characterizing single-nucleotide polymorphisms than structural variation.

15.
Site-specific hypermethylation of tumor suppressor genes accompanied by genome-wide hypomethylation are epigenetic hallmarks of malignancy. However, the molecular mechanisms that drive these linked changes in DNA methylation remain obscure. DNA methyltransferase 1 (DNMT1), the principal enzyme responsible for maintaining methylation patterns, is commonly dysregulated in tumors. Replication foci targeting sequence (RFTS) is an N-terminal domain of DNMT1 that inhibits DNA-binding and catalytic activity, suggesting that RFTS deletion would result in a gain of DNMT1 function. However, a substantial body of data suggested that RFTS is required for DNMT1 activity. Here, we demonstrate that deletion of RFTS alters DNMT1-dependent DNA methylation during malignant transformation. Compared to full-length DNMT1, ectopic expression of hyperactive DNMT1-ΔRFTS caused greater malignant transformation and enhanced promoter methylation with condensed chromatin structure that silenced DAPK and DUOX1 expression. Simultaneously, deletion of RFTS impaired DNMT1 chromatin association with pericentromeric Satellite 2 (SAT2) repeat sequences and produced DNA demethylation at SAT2 repeats and globally. To our knowledge, RFTS-deleted DNMT1 is the first single factor that can reprogram focal hypermethylation and global hypomethylation in parallel during malignant transformation. Our evidence suggests that the RFTS domain of DNMT1 is a target responsible for epigenetic changes in cancer.
