Similar literature
1.
Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to “phase 3 finished” status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long-reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly, (publicly available at https://sourceforge.net/projects/pb-jelly/) automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides “lift-over” co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long-reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long-reads we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long-reads we addressed 97% of gaps and closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. 
pseudoobscura draft assembly and shown to be dependent on initial reference quality.
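
The gap and coverage figures quoted in this abstract can be made concrete with a small sketch. It assumes, as is the common convention for draft assemblies, that gaps appear as runs of N in the scaffold sequence, and it treats coverage as total mapped read bases divided by assembly length. The function names `find_gaps` and `mean_coverage` are hypothetical and are not part of PBJelly.

```python
import re

def find_gaps(scaffold, min_len=1):
    """Return half-open (start, end) intervals for every run of N/n.

    Assumption: gaps are encoded as runs of N, which is conventional
    for draft assemblies but is not stated in the abstract.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", scaffold)
        if m.end() - m.start() >= min_len
    ]

def mean_coverage(mapped_read_lengths, assembly_length):
    """Mean depth: total mapped read bases divided by assembly length."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    return sum(mapped_read_lengths) / assembly_length

if __name__ == "__main__":
    toy_scaffold = "ACGTNNNNACGNNA"  # toy sequence, not real data
    print(find_gaps(toy_scaffold))
    print(mean_coverage([1000, 2000, 3000], 1500))
```

A gap counts as "addressed" in the abstract when reads reach it; deciding that requires read alignments, which this sketch deliberately leaves out.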

2.
Spontaneous mutations play a central role in evolution. Despite their importance, mutation rates are some of the most elusive parameters to measure in evolutionary biology. The combination of mutation accumulation (MA) experiments and whole-genome sequencing now makes it possible to estimate mutation rates by directly observing new mutations at the molecular level across the whole genome. We performed an MA experiment with the social amoeba Dictyostelium discoideum and sequenced the genomes of three randomly chosen lines using high-throughput sequencing to estimate the spontaneous mutation rate in this model organism. The mitochondrial mutation rate of 6.76×10⁻⁹, with a Poisson confidence interval of 4.1×10⁻⁹ to 9.5×10⁻⁹, per nucleotide per generation is slightly lower than estimates for other taxa. The mutation rate estimate for the nuclear DNA of 2.9×10⁻¹¹, with a Poisson confidence interval ranging from 7.4×10⁻¹³ to 1.6×10⁻¹⁰, is the lowest reported for any eukaryote. These results are consistent with low microsatellite mutation rates previously observed in D. discoideum and low levels of genetic variation observed in wild D. discoideum populations. In addition, D. discoideum has been shown to be quite resistant to DNA damage, which suggests an efficient DNA-repair mechanism that could be an adaptation to life in soil and frequent exposure to intracellular and extracellular mutagenic compounds. The social aspect of the life cycle of D. discoideum and a large portion of the genome under relaxed selection during vegetative growth could also select for a low mutation rate. This hypothesis is supported by a significantly lower mutation rate per cell division in multicellular eukaryotes compared with unicellular eukaryotes.
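
The abstract reports per-nucleotide, per-generation rates with Poisson confidence intervals. A minimal sketch of how such an interval can be obtained is given below. It assumes an exact (Garwood-type) interval on the observed mutation count, divided by the number of site-generations surveyed. The function names and the example inputs are hypothetical, and the abstract does not say which interval construction the authors used.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def _bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f > 0 at lo and f < 0 at hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided interval for a Poisson mean given k observed events."""
    hi = k + 10.0 * math.sqrt(k + 1) + 20.0  # generous upper search bound
    if k == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= k) equals alpha / 2.
        lower = _bisect(
            lambda lam: alpha / 2 - (1.0 - poisson_cdf(k - 1, lam)), 0.0, hi
        )
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    upper = _bisect(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, hi)
    return lower, upper

def mutation_rate_with_ci(n_mutations, sites, generations, alpha=0.05):
    """Rate per site per generation with its interval (hypothetical helper)."""
    denom = sites * generations
    lower, upper = exact_poisson_ci(n_mutations, alpha)
    return n_mutations / denom, lower / denom, upper / denom

if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the study.
    print(mutation_rate_with_ci(1, 10**6, 100))
```

For a single observed event the exact 95% limits are about 0.025 and 5.57 times the point estimate. The nuclear interval quoted above has a similar relative width, which is a consistency observation only and not a statement about the authors' data.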

3.
Organellar DNA sequences are widely used in evolutionary and population genetic studies; however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to simultaneously sequence multiple genomes using the Illumina Genome Analyzer. We PCR-amplified ~120 kb plastomes from eight species (seven Pinus, one Picea) in 35 reactions. Pooled products were ligated to modified adapters that included 3 bp indexing tags and samples were multiplexed at four genomes per lane. Tagged microreads were assembled by de novo and reference-guided assembly methods, using previously published Pinus plastomes as surrogate references. Assemblies for these eight genomes are estimated at 88–94% complete, with an average sequence depth of 55× to 186×. Mononucleotide repeats interrupt contig assembly with increasing repeat length, and we estimate that the limit for their assembly is 16 bp. Comparisons to 37 kb of Sanger sequence show a validated error rate of 0.056%, and conspicuous errors are evident from the assembly process. This efficient sequencing approach yields high-quality draft genomes and should have immediate applicability to genomes with comparable complexity.

4.
Aging Cell, 2022, 21(6)
DNA methylation (DNAm) has been reported to be associated with many diseases and with mortality. We hypothesized that the integration of DNAm with clinical risk factors would improve mortality prediction. We performed an epigenome-wide association study of whole blood DNAm in relation to mortality in 15 cohorts (n = 15,013). During a mean follow-up of 10 years, there were 4314 deaths from all causes including 1235 cardiovascular disease (CVD) deaths and 868 cancer deaths. Ancestry-stratified meta-analysis of all-cause mortality identified 163 CpGs in European ancestry (EA) and 17 in African ancestry (AA) participants at P < 1 × 10⁻⁷, of which 41 (EA) and 16 (AA) were also associated with CVD death, and 15 (EA) and 9 (AA) with cancer death. We built DNAm-based prediction models for all-cause mortality that predicted mortality risk after adjusting for clinical risk factors. The mortality prediction model trained by integrating DNAm with clinical risk factors showed an improvement in prediction of cancer death with a 5% increase in the C-index in a replication cohort, compared with the model including clinical risk factors alone. Mendelian randomization identified 15 putatively causal CpGs in relation to longevity, CVD, or cancer risk. For example, cg06885782 (in KCNQ4) was positively associated with risk for prostate cancer (Beta = 1.2, P_MR = 4.1 × 10⁻⁴) and negatively associated with longevity (Beta = −1.9, P_MR = 0.02). Pathway analysis revealed that genes associated with mortality-related CpGs are enriched for immune- and cancer-related pathways. We identified replicable DNAm signatures of mortality and demonstrated the potential utility of CpGs as informative biomarkers for prediction of mortality risk.

5.

Background

Free circulating DNA (fcDNA) has many potential clinical applications, due to the non-invasive way in which it is collected. However, because of the low concentration of fcDNA in blood, genome-wide analysis carries many technical challenges that must be overcome before fcDNA studies can reach their full potential. There are currently no definitive standards for fcDNA collection, processing and whole-genome sequencing. We report novel detailed methodology for the capture of high-quality methylated fcDNA, library preparation and downstream genome-wide Next-Generation Sequencing. We also describe the effects of sample storage, processing and scaling on fcDNA recovery and quality.

Results

Use of serum versus plasma, and storage of blood prior to separation resulted in genomic DNA contamination, likely due to leukocyte lysis. Methylated fcDNA fragments were isolated from 5 donors using a methyl-binding protein-based protocol and appear as a discrete band of ~180 bases. This discrete band allows minimal sample loss at the size restriction step in library preparation for Next-Generation Sequencing, allowing for high-quality sequencing from minimal amounts of fcDNA. Following sequencing, we obtained 37×10⁶ to 86×10⁶ unique mappable reads, representing more than 50% of total mappable reads. The methylation status of 9 genomic regions as determined by DNA capture and sequencing was independently validated by clonal bisulphite sequencing.

Conclusions

Our optimized methods provide high-quality methylated fcDNA suitable for whole-genome sequencing, and allow good library complexity and accurate sequencing, despite using less than half of the recommended minimum input DNA.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-476) contains supplementary material, which is available to authorized users.

6.
To improve the metagenomic analysis of complex microbiomes, we have repurposed restriction endonucleases as methyl specific DNA binding proteins. As an example, we use DpnI immobilized on magnetic beads. The ten minute extraction technique allows specific binding of genomes containing the DpnI Gm6ATC motif common in the genomic DNA of many bacteria including γ-proteobacteria. Using synthetic genome mixtures, we demonstrate 80% recovery of Escherichia coli genomic DNA even when only femtogram quantities are spiked into 10 µg of human DNA background. Binding is very specific with less than 0.5% of human DNA bound. Next Generation Sequencing of input and enriched synthetic mixtures results in over 100-fold enrichment of target genomes relative to human and plant DNA. We also show comparable enrichment when sequencing complex microbiomes such as those from creek water and human saliva. The technique can be broadened to other restriction enzymes allowing for the selective enrichment of trace and unculturable organisms from complex microbiomes and the stratification of organisms according to restriction enzyme enrichment.

7.
Recent whole-genome analysis suggests that lateral gene transfer by bacteriophages has contributed significantly to the genetic diversity of bacteria. To accurately determine the frequency of phage-mediated gene transfer, we employed cycling primed in situ amplification-fluorescent in situ hybridization (CPRINS-FISH) and investigated the movement of the ampicillin resistance gene among Escherichia coli cells mediated by phage at the single-cell level. Phages P1 and T4 and the newly isolated E. coli phage EC10 were used as vectors. The transduction frequencies determined by conventional plating were 3 × 10⁻⁸ to 2 × 10⁻⁶, 1 × 10⁻⁸ to 4 × 10⁻⁸, and <4 × 10⁻⁹ to 4 × 10⁻⁸ per PFU for phages P1, T4, and EC10, respectively. The frequencies of DNA transfer determined by CPRINS-FISH were 7 × 10⁻⁴ to 1 × 10⁻³, 9 × 10⁻⁴ to 3 × 10⁻³, and 5 × 10⁻⁴ to 4 × 10⁻³ for phages P1, T4, and EC10, respectively. Direct viable counting combined with CPRINS-FISH revealed that more than 20% of the cells carrying the transferred gene retained their viabilities. These results revealed that the difference in the number of viable cells carrying the transferred gene and the number of cells capable of growth on the selective medium was 3 to 4 orders of magnitude, indicating that phage-mediated exchange of DNA sequences among bacteria occurs with unexpectedly high frequency.

8.
A plasmid marker rescue system based on restoration of the nptII gene was established in Streptococcus gordonii to study the transfer of bacterial and transgenic plant DNA by transformation. In vitro studies revealed that the marker rescue efficiency depends on the type of donor DNA. Plasmid and chromosomal DNA of bacteria as well as DNA of transgenic potatoes were transferred with efficiencies ranging from 8.1 × 10⁻⁶ to 5.8 × 10⁻⁷ transformants per nptII gene. Using a 792-bp amplification product of nptII the efficiency was strongly decreased (9.8 × 10⁻⁹). In blood sausage, marker rescue using plasmid DNA was detectable (7.9 × 10⁻¹⁰), whereas in milk, heat-inactivated horse serum (HHS) had to be added to obtain an efficiency of 2.7 × 10⁻¹¹. No marker rescue was detected in extracts of transgenic potatoes despite addition of HHS. In vivo transformation of S. gordonii LTH 5597 was studied in monoassociated rats by using plasmid DNA. No marker rescue could be detected in vivo, although transformation was detected in the presence of saliva and fecal samples supplemented with HHS. It was also shown that plasmid DNA persists in rat saliva permitting transformation for up to 6 h of incubation. It is suggested that the lack of marker rescue is due to the absence of competence-stimulating factors such as serum proteins in rat saliva.

9.
Using the genomic sequences of the Drosophila melanogaster subgroup, the pattern of gene duplications was investigated with special attention to interlocus gene conversion. Our fine-scale analysis with careful visual inspections enabled accurate identification of a number of duplicated blocks (genomic regions). The orthologous parts of those duplicated blocks were also identified in the D. simulans and D. sechellia genomes, by which we were able to clearly classify the duplicated blocks into post- and pre-speciation blocks. We found 31 post-speciation duplicated genes, from which the rate of gene duplication (from one copy to two copies) is estimated to be 1.0×10⁻⁹ per single-copy gene per year. The role of interlocus gene conversion was observed in several respects in our data: (1) synonymous divergence between a duplicated pair is overall very low. Consequently, the gene duplication rate would be seriously overestimated by counting duplicated genes with low divergence; (2) the sizes of young duplicated blocks are generally large. We postulate that the degeneration of gene conversion around the edges could explain the shrinkage of “identifiable” duplicated regions; and (3) elevated paralogous divergence is observed around the edges in many duplicated blocks, supporting our gene conversion–degeneration model. Our analysis demonstrated that gene conversion between duplicated regions is a common and genome-wide phenomenon in the Drosophila genomes, and that its role should be especially significant in the early stages of duplicated genes. Based on a population genetic prediction, we applied a new genome-scan method to test for signatures of selection for neofunctionalization and found a strong signature in a pair of transporter genes.

10.
Sex differences in schizophrenia are well known, but their genetic basis has not been identified. We performed a genome-wide association scan for schizophrenia in an Ashkenazi Jewish population using DNA pooling. We found a female-specific association with rs7341475, a SNP in the fourth intron of the reelin (RELN) gene (p = 2.9 × 10⁻⁵ in women), with a significant gene-sex effect (p = 1.8 × 10⁻⁴). We studied rs7341475 in four additional populations, totaling 2,274 cases and 4,401 controls. A significant effect was observed only in women, replicating the initial result (p = 2.1 × 10⁻³ in women; p = 4.2 × 10⁻³ for gene-sex interaction). Based on all populations the estimated relative risk of women carrying the common genotype is 1.58 (p = 8.8 × 10⁻⁷; p = 1.6 × 10⁻⁵ for gene-sex interaction). The female-specific association between RELN and schizophrenia is one of the few examples of a replicated sex-specific genetic association in any disease.

11.

Background

DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a “genomic signature.” The genomic signatures can be used to phylogenetically classify organisms from arbitrary sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th order Markov model) as well as genomic signatures normalized by smaller DNA words (1st and 2nd order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted, with intra-genomic signature variance as the response variable, to a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies, as predictors.
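
The notions of a zero-order normalized signature and of oligonucleotide usage variance can be sketched as follows. This is a simplified illustration under stated assumptions: expected tetranucleotide frequencies come from a zero-order model (the product of single-base frequencies), and the variance is taken as the mean squared difference between observed and expected frequencies over all 256 words. The function names are hypothetical and the authors' exact OUV formula may differ.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(BASES)
    )
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}

def zero_order_expected(seq, k=4):
    """Expected k-mer frequencies if bases were independent (0th order)."""
    base = kmer_freqs(seq, 1)
    expected = {}
    for word in map("".join, product(BASES, repeat=k)):
        p = 1.0
        for b in word:
            p *= base.get(b, 0.0)
        expected[word] = p
    return expected

def usage_variance(seq, k=4):
    """Mean squared gap between observed and expected k-mer frequencies.

    Assumption: a simplified stand-in for OUV, not the published formula.
    """
    observed = kmer_freqs(seq, k)
    expected = zero_order_expected(seq, k)
    return sum(
        (observed.get(w, 0.0) - expected[w]) ** 2 for w in expected
    ) / len(expected)

if __name__ == "__main__":
    print(usage_variance("ACGT" * 50))  # strongly non-random toy sequence
```

Computing the same quantity in sliding windows along a genome and taking its spread would give one possible measure of intra-genomic signature variance.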

Principal Findings

Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

Conclusions

Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV) and aerobiosis, while oligonucleotide usage bias (OUV) is associated with genomic GC content, aerobiosis and habitat.

12.
We developed a method for aptamer identification without in vitro selection. We have previously obtained several aptamers, which may fold into the G-quadruplex (G4) structure, against target proteins; therefore, we hypothesized that the G4 structure would be an excellent scaffold for aptamers to recognize the target protein. Moreover, the G4-forming sequence contained in the promoter region of insulin can reportedly bind to insulin. We thus expected that G4 DNAs, which are contained in promoter regions, could act as DNA aptamers against their gene products. We designated this aptamer identification method as “G4 promoter-derived aptamer selection (G4PAS).” Using G4PAS, we identified vascular endothelial growth factor 165 (VEGF165), platelet-derived growth factor-AA (PDGF-AA), and RB1 DNA aptamers. Surface plasmon resonance (SPR) analysis revealed that the dissociation constant (K_d) values of VEGF165, PDGF-AA, and RB1 DNA aptamers were 1.7 × 10⁻⁷ M, 6.3 × 10⁻⁹ M, and 4.4 × 10⁻⁷ M, respectively. G4PAS is a simple and rapid method of aptamer identification because it involves only binding analysis of G4 DNAs to the target protein. In the human genome, over 40% of promoters contain one or more potential G4 DNAs. G4PAS could therefore be applied to identify aptamers against target proteins that contain G4 DNAs on their promoters.

13.
We describe methods for rapid sequencing of the entire human mitochondrial genome (mtgenome), which involve long-range PCR for specific amplification of the mtgenome, pyrosequencing, quantitative mapping of sequence reads to identify sequence variants and heteroplasmy, as well as de novo sequence assembly. These methods have been used to study 40 publicly available HapMap samples of European (CEU) and African (YRI) ancestry to demonstrate a sequencing error rate <5.63×10⁻⁴, nucleotide diversity of 1.6×10⁻³ for CEU and 3.7×10⁻³ for YRI, patterns of sequence variation consistent with earlier studies, but a higher rate of heteroplasmy varying between 10% and 50%. These results demonstrate that next-generation sequencing technologies allow interrogation of the mitochondrial genome in greater depth than previously possible, which may be of value in biology and medicine.
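
Nucleotide diversity, quoted above for the CEU and YRI samples, is conventionally the average number of differences per site between all pairs of sequences in a sample. The sketch below applies that standard definition to pre-aligned, equal-length sequences. The function name and toy input are hypothetical, and real data would also need handling of gaps and ambiguous bases.

```python
from itertools import combinations

def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site across all sequence pairs."""
    n = len(aligned_seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(aligned_seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)

if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACGT"]  # toy alignment, not real data
    print(nucleotide_diversity(toy))
```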

14.
In utero smoke (IUS) exposure has been shown to have detrimental effects on lung function and to be associated with persistent wheezing and asthma in children. One potential mechanism of IUS effects could be alterations in DNA methylation, which may have life-long implications. The goal of this study was to examine the association between DNA methylation and nicotine exposure in fetal lung and placental tissue in early development; nicotine exposure in this analysis represents a likely surrogate for in utero smoke. We performed an epigenome-wide analysis of DNA methylation in fetal lung tissue (n = 85, 41 smoke exposed (48%), 44 controls) and the corresponding placental tissue samples (n = 80, 39 smoke exposed (49%), 41 controls) using the Illumina HumanMethylation450 BeadChip array. Differential methylation analyses were conducted to evaluate the variation associated with nicotine exposure. The most significant CpG sites in the fetal lung analysis mapped to the PKP3 (P = 2.94 × 10⁻³), ANKRD33B (P = 3.12 × 10⁻³), CNTD2 (P = 4.9 × 10⁻³) and DPP10 (P = 5.43 × 10⁻³) genes. In the placental methylome, the most significant CpG sites mapped to the GTF2H2C and GTF2H2D genes (P = 2.87 × 10⁻⁶ to 3.48 × 10⁻⁵). One hundred and one unique CpG sites with P-values < 0.05 were concordant between lung and placental tissue analyses. Gene Set Enrichment Analysis demonstrated enrichment of specific disorders, such as asthma and immune disorders. Our findings demonstrate an association between in utero nicotine exposure and variable DNA methylation in fetal lung and placental tissues, suggesting a role for DNA methylation variation in the fetal origins of chronic diseases.

15.
Next-generation sequencing (NGS) technologies have transformed genomic research and have the potential to revolutionize clinical medicine. However, the background error rates of sequencing instruments and limitations in targeted read coverage have precluded the detection of rare DNA sequence variants by NGS. Here we describe a method, termed CypherSeq, which combines double-stranded barcoding error correction and rolling circle amplification (RCA)-based target enrichment to vastly improve NGS-based rare variant detection. The CypherSeq methodology involves the ligation of sample DNA into circular vectors, which contain double-stranded barcodes for computational error correction and adapters for library preparation and sequencing. CypherSeq is capable of detecting rare mutations genome-wide as well as those within specific target genes via RCA-based enrichment. We demonstrate that CypherSeq is capable of correcting errors incurred during library preparation and sequencing to reproducibly detect mutations down to a frequency of 2.4 × 10⁻⁷ per base pair, and report the frequency and spectra of spontaneous and ethyl methanesulfonate-induced mutations across the Saccharomyces cerevisiae genome.

16.
Complexes of cationic liposomes with DNA are promising tools to deliver genetic information into cells for gene therapy and vaccines. Electrostatic interaction is thought to be the major force in lipid–DNA interaction, while lipid-base binding and the stability of cationic lipid–DNA complexes have been the subject of more debate in recent years. The aim of this study was to examine the complexation of calf-thymus DNA with cholesterol (Chol), 1,2-dioleoyl-3-trimethylammonium-propane (DOTAP), dioctadecyldimethylammoniumbromide (DDAB) and dioleoylphosphatidylethanolamine (DOPE), at physiological condition, using constant DNA concentration and various lipid contents. Fourier transform infrared (FTIR), UV-visible, circular dichroism spectroscopic methods and atomic force microscopy were used to analyse lipid-binding site, the binding constant and the effects of lipid interaction on DNA stability and conformation. Structural analysis showed a strong lipid–DNA interaction via major and minor grooves and the backbone phosphate group with overall binding constants of K_Chol = 1.4 (±0.5) × 10⁴ M⁻¹, K_DDAB = 2.4 (±0.80) × 10⁴ M⁻¹, K_DOTAP = 3.1 (±0.90) × 10⁴ M⁻¹ and K_DOPE = 1.45 (±0.60) × 10⁴ M⁻¹. The order of stability of lipid–DNA complexation is DOTAP>DDAB>DOPE>Chol. Hydrophobic interactions between lipid aliphatic tails and DNA were observed. Chol and DOPE induced a partial B to A-DNA conformational transition, while a partial B to C-DNA alteration occurred for DDAB and DOTAP at high lipid concentrations. DNA aggregation was observed at high lipid content.

17.
Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.
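
The accuracy figure ">QV50" can be read on the Phred scale, where a quality value QV corresponds to an error probability of 10^(-QV/10). The sketch below assumes that the abstract's QV is Phred-scaled, which is the usual convention. The function names are hypothetical.

```python
import math

def qv_to_error_prob(qv):
    """Error probability implied by a Phred-scaled quality value."""
    return 10.0 ** (-qv / 10.0)

def error_prob_to_qv(p):
    """Phred-scaled quality value for an error probability 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)

if __name__ == "__main__":
    print(qv_to_error_prob(50))
    print(error_prob_to_qv(1e-5))
```

Under this reading, QV50 corresponds to roughly one expected error per 100,000 bases, so a reconstructed genome of about 9 kb would be expected to contain fewer than one error.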

18.
To meet the new challenge of generating the draft sequences of mammalian genomes, we describe the development of a novel high throughput 96-well method for the purification of plasmid DNA template using size-fractionated, acid-washed glass beads. Unlike most previously described approaches, the current method has been designed and optimized to facilitate the direct binding of alcohol-precipitated plasmid DNA to glass beads from alkaline lysed bacterial cells containing the insoluble cellular aggregate material. Eliminating the tedious step of separating the cleared lysate significantly simplifies the method and improves throughput and reliability. During a 4 month period of 96-capillary DNA sequencing of the Rattus norvegicus genome at the Baylor College of Medicine Human Genome Sequencing Center, the average success rate and read length derived from >1 800 000 plasmid DNA templates prepared by the direct lysis/glass bead method were 82.2% and 516 bases, respectively. The cost of this direct lysis/glass bead method in September 2001 was ~10 cents per clone, which is a significant cost saving in high throughput genomic sequencing efforts.

19.
Aging Cell, 2021, 20(6)
Clonal hematopoiesis of indeterminate potential (CHIP) is a common precursor state for blood cancers that most frequently occurs due to mutations in the DNA-methylation modifying enzymes DNMT3A or TET2. We used DNA-methylation array and whole-genome sequencing data from four cohorts together comprising 5522 persons to study the association between CHIP, epigenetic clocks, and health outcomes. CHIP was strongly associated with epigenetic age acceleration, defined as the residual after regressing epigenetic clock age on chronological age, in several clocks, ranging from 1.31 years (GrimAge, p < 8.6 × 10⁻⁷) to 3.08 years (EEAA, p < 3.7 × 10⁻¹⁸). Mutations in most CHIP genes except DNA-damage response genes were associated with increases in several measures of age acceleration. CHIP carriers with mutations in multiple genes had the largest increases in age acceleration and decrease in estimated telomere length. Finally, we found that ~40% of CHIP carriers had acceleration >0 in both Hannum and GrimAge (referred to as AgeAccelHG+). This group was at high risk of all-cause mortality (hazard ratio 2.90, p < 4.1 × 10⁻⁸) and coronary heart disease (CHD) (hazard ratio 3.24, p < 9.3 × 10⁻⁶) compared to those who were CHIP−/AgeAccelHG−. In contrast, the other ~60% of CHIP carriers who were AgeAccelHG− were not at increased risk of these outcomes. In summary, CHIP is strongly linked to age acceleration in multiple clocks, and the combination of CHIP and epigenetic aging may be used to identify a population at high risk for adverse outcomes and who may be a target for clinical interventions.
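
The abstract defines epigenetic age acceleration as the residual after regressing epigenetic clock age on chronological age. A minimal sketch of that definition, using ordinary least squares with a single predictor, is shown below. The function name and the toy ages are hypothetical, and the study may have included further covariates that are not modeled here.

```python
def age_acceleration(chronological, epigenetic):
    """Residuals of a simple linear regression of clock age on actual age.

    A positive residual means the clock reads older than expected for
    that person's chronological age.
    """
    n = len(chronological)
    if n != len(epigenetic) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(chronological) / n
    mean_y = sum(epigenetic) / n
    sxx = sum((x - mean_x) ** 2 for x in chronological)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(chronological, epigenetic)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [
        y - (intercept + slope * x)
        for x, y in zip(chronological, epigenetic)
    ]

if __name__ == "__main__":
    # Toy ages in years, not real data.
    print(age_acceleration([40, 50, 60], [42, 50, 64]))
```

With two clocks, the AgeAccelHG+ group described above would be the individuals whose residual is positive for both the Hannum and the GrimAge fit.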

20.