首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 765 毫秒
1.

Background  

Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, Tracembler, that facilitates in silico chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive starting with reads that significantly match sequence seeds supplied by the user.  相似文献   

2.

Background  

Complete sequencing and annotation of the 96.2 kb Bacillus anthracis plasmid, pXO2, predicted 85 open reading frames (ORFs). Bacillus cereus and Bacillus thuringiensis isolates that ranged in genomic similarity to B. anthracis, as determined by amplified fragment length polymorphism (AFLP) analysis, were examined by PCR for the presence of sequences similar to 47 pXO2 ORFs.  相似文献   

3.

Background

Haplotypes are important for assessing genealogy and disease susceptibility of individual genomes, but are difficult to obtain with routine sequencing approaches. Experimental haplotype reconstruction based on assembling fragments of individual chromosomes is promising, but with variable yields due to incompletely understood parameter choices.

Results

We parameterize the clone-based haplotyping problem in order to provide theoretical and empirical assessments of the impact of different parameters on haplotype assembly. We confirm the intuition that long clones help link together heterozygous variants and thus improve haplotype length. Furthermore, given the length of the clones, we address how to choose the other parameters, including number of pools, clone coverage and sequencing coverage, so as to maximize haplotype length. We model the problem theoretically and show empirically the benefits of using larger clones with moderate number of pools and sequencing coverage. In particular, using 140 kb BAC clones, we construct haplotypes for a personal genome and assemble haplotypes with N50 values greater than 2.6 Mb. These assembled haplotypes are longer and at least as accurate as haplotypes of existing clone-based strategies, whether in vivo or in vitro.

Conclusions

Our results provide practical guidelines for the development and design of clone-based methods to achieve long range, high-resolution and accurate haplotypes.  相似文献   

4.

Background  

Group A streptococcal (GAS) infections can lead to the development of severe post-infectious sequelae, such as rheumatic fever (RF) and rheumatic heart disease (RHD). RF and RHD are a major health concern in developing countries, and in indigenous populations of developed nations. The majority of GAS isolates are M protein-nontypeable (MNT) by standard serotyping. However, GAS typing is a necessary tool in the epidemiologically analysis of GAS and provides useful information for vaccine development. Although DNA sequencing is the most conclusive method for M protein typing, this is not a feasible approach especially in developing countries. To overcome this problem, we have developed a polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP)-based assay for molecular typing the M protein gene (emm) of GAS.  相似文献   

5.

Background

Identity by descent (IBD) has played a fundamental role in the discovery of genetic loci underlying human diseases. Both pedigree-based and population-based linkage analyses rely on estimating recent IBD, and evidence of ancient IBD can be used to detect population structure in genetic association studies. Various methods for detecting IBD, including those implemented in the soft- ware programs fastIBD and GERMLINE, have been developed in the past several years using population genotype data from microarray platforms. Now, next-generation DNA sequencing data is becoming increasingly available, enabling the comprehensive analysis of genomes, in- cluding identifying rare variants. These sequencing data may provide an opportunity to detect IBD with higher resolution than previously possible, potentially enabling the detection of disease causing loci that were previously undetectable with sparser genetic data.

Results

Here, we investigate how different levels of variant coverage in sequencing and microarray genotype data influences the resolution at which IBD can be detected. This includes microarray genotype data from the WTCCC study, denser genotype data from the HapMap Project, low coverage sequencing data from the 1000 Genomes Project, and deep coverage complete genome data from our own projects. With high power (78%), we can detect segments of length 0.4 cM or larger using fastIBD and GERMLINE in sequencing data. This compares to similar power to detect segments of length 1.0 cM or higher with microarray genotype data. We find that GERMLINE has slightly higher power than fastIBD for detecting IBD segments using sequencing data, but also has a much higher false positive rate.

Conclusion

We further quantify the effect of variant density, conditional on genetic map length, on the power to resolve IBD segments. These investigations into IBD resolution may help guide the design of future next generation sequencing studies that utilize IBD, including family-based association studies, association studies in admixed populations, and homozygosity mapping studies.  相似文献   

6.

Background  

Amplified fragment length polymorphism (AFLP) is a PCR-based technique that involves restriction of genomic DNA followed by ligation of adaptors to the fragments generated and selective PCR amplification of a subset of these fragments. The amplified fragments are separated on a sequencing gel and visualized by autoradiography or fluorescent sequencing equipment. AFLP allows high-resolution genotyping but the lack of a format for databasing and comparison of AFLP fingerprint profiles limits its wider applications in profiling large numbers of biological samples.  相似文献   

7.
8.

Background  

The genes of plants can be up- or down-regulated during viral infection to influence the replication of viruses. Identification of these differentially expressed genes could shed light on the defense systems employed by plants and the mechanisms involved in the adaption of viruses to plant cells. Differential gene expression in Nicotiana benthamiana plants in response to infection with Bamboo mosaic virus (BaMV) was revealed using cDNA-amplified fragment length polymorphism (AFLP).  相似文献   

9.
10.

Background

Massively parallel sequencing systems continue to improve on data output, while leaving labor-intensive library preparations a potential bottleneck. Efforts are currently under way to relieve the crucial and time-consuming work to prepare DNA for high-throughput sequencing.

Methodology/Principal Findings

In this study, we demonstrate an automated parallel library preparation protocol using generic carboxylic acid-coated superparamagnetic beads and polyethylene glycol precipitation as a reproducible and flexible method for DNA fragment length separation. With this approach the library preparation for DNA sequencing can easily be adjusted to a desired fragment length. The automated protocol, here demonstrated using the GS FLX Titanium instrument, was compared to the standard manual library preparation, showing higher yield, throughput and great reproducibility. In addition, 12 libraries were prepared and uniquely tagged in parallel, and the distribution of sequence reads between these indexed samples could be improved using quantitative PCR-assisted pooling.

Conclusions/Significance

We present a novel automated procedure that makes it possible to prepare 36 indexed libraries per person and day, which can be increased to up to 96 libraries processed simultaneously. The yield, speed and robust performance of the protocol constitute a substantial improvement to present manual methods, without the need of extensive equipment investments. The described procedure enables a considerable efficiency increase for small to midsize sequencing centers.  相似文献   

11.

Background  

Despite increasing popularity and improvements in terminal restriction fragment length polymorphism (T-RFLP) and other microbial community fingerprinting techniques, there are still numerous obstacles that hamper the analysis of these datasets. Many steps are required to process raw data into a format ready for analysis and interpretation. These steps can be time-intensive, error-prone, and can introduce unwanted variability into the analysis. Accordingly, we developed T-REX, free, online software for the processing and analysis of T-RFLP data.  相似文献   

12.

Background  

Fast-growing Eucalyptus grandis trees are one of the most efficient producers of wood in South Africa. The most serious problem affecting the quality and yield of solid wood products is the occurrence of end splitting in logs. Selection of E. grandis planting stock that exhibit preferred wood qualities is thus a priority of the South African forestry industry. We used microarray-based DNA-amplified fragment length polymorphism (AFLP) analysis in combination with expression profiling to develop fingerprints and profile gene expression of wood-forming tissue of seven different E. grandis trees.  相似文献   

13.

Background

The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, which consists of a 5-Mb genome comprising two circular chromosomes.

Results

We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58× coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bps, each of which was from a single chromosome, with 73× coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons.

Conclusions

PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of “finished grade” because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-699) contains supplementary material, which is available to authorized users.  相似文献   

14.

Background  

We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe.  相似文献   

15.

Background

Advances in Illumina DNA sequencing technology have produced longer paired-end reads that increasingly have sequence overlaps. These reads can be merged into a single read that spans the full length of the original DNA fragment, allowing for error correction and accurate determination of read coverage. Extant merging programs utilize simplistic or unverified models for the selection of bases and quality scores for the overlapping region of merged reads.

Results

We first examined the baseline quality score - error rate relationship using sequence reads derived from PhiX. In contrast to numerous published reports, we found that the quality scores produced by Illumina were not substantially inflated above the theoretical values, once the reference genome was corrected for unreported sequence variants. The PhiX reads were then used to create empirical models of sequencing errors in overlapping regions of paired-end reads, and these models were incorporated into a novel merging program, NGmerge. We demonstrate that NGmerge corrects errors and ambiguous bases better than other merging programs, and that it assigns quality scores for merged bases that accurately reflect the error rates. Our results also show that, contrary to published analyses, the sequencing errors of paired-end reads are not independent.

Conclusions

We provide a free and open-source program, NGmerge, that performs better than existing read merging programs. NGmerge is available on GitHub (https://github.com/harvardinformatics/NGmerge) under the MIT License; it is written in C and supported on Linux.
  相似文献   

16.

Background  

Among Coffea species, C. canephora has the widest natural distribution area in tropical African forests. It represents a good model for analyzing the geographical distribution of diversity in relation to locations proposed as part of the "refuge theory". In this study, we used both microsatellite (simple sequence repeat, SSR) and restriction fragment length polymorphism (RFLP) markers to investigate the genetic variation pattern of C. canephora in the Guineo-Congolean distribution zone.  相似文献   

17.

Background  

We investigated the molecular basis of primary open-angle glaucoma (POAG) using Opticin (OPTC) as a candidate gene on the basis of its expression in the trabecular meshwork cells involved in the disease pathogenesis. Two hundred POAG patients and 100 controls were enrolled in this study. The coding sequence of OPTC was amplified by PCR from genomic DNA of POAG patients, followed by SSCP, DHPLC and DNA sequencing. Subsequent bioinformatic analysis, site-directed mutagenesis, quantitative RT-PCR and western blot experiments were performed to address the functional significance of a 'silent' change in the OPTC coding region while screening for mutations in POAG patients.  相似文献   

18.

Background

DNA sequencing technologies deviate from the ideal uniform distribution of reads. These biases impair scientific and medical applications. Accordingly, we have developed computational methods for discovering, describing and measuring bias.

Results

We applied these methods to the Illumina, Ion Torrent, Pacific Biosciences and Complete Genomics sequencing platforms, using data from human and from a set of microbes with diverse base compositions. As in previous work, library construction conditions significantly influence sequencing bias. Pacific Biosciences coverage levels are the least biased, followed by Illumina, although all technologies exhibit error-rate biases in high- and low-GC regions and at long homopolymer runs. The GC-rich regions prone to low coverage include a number of human promoters, so we therefore catalog 1,000 that were exceptionally resistant to sequencing. Our results indicate that combining data from two technologies can reduce coverage bias if the biases in the component technologies are complementary and of similar magnitude. Analysis of Illumina data representing 120-fold coverage of a well-studied human sample reveals that 0.20% of the autosomal genome was covered at less than 10% of the genome-wide average. Excluding locations that were similar to known bias motifs or likely due to sample-reference variations left only 0.045% of the autosomal genome with unexplained poor coverage.

Conclusions

The assays presented in this paper provide a comprehensive view of sequencing bias, which can be used to drive laboratory improvements and to monitor production processes. Development guided by these assays should result in improved genome assemblies and better coverage of biologically important loci.  相似文献   

19.

Background  

Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores.  相似文献   

20.

Background  

"Candidatus Phytoplasma aurantifolia", is the causative agent of witches' broom disease in Mexican lime trees (Citrus aurantifolia L.), and is responsible for major losses of Mexican lime trees in Southern Iran and Oman. The pathogen is strictly biotrophic, and thus is completely dependent on living host cells for its survival. The molecular basis of compatibility and disease development in this system is poorly understood. Therefore, we have applied a cDNA- amplified fragment length polymorphism (AFLP) approach to analyze gene expression in Mexican lime trees infected by " Ca. Phytoplasma aurantifolia".  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号