Similar literature
1.
2.
3.

Background

Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons.
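
The link between coverage depth and heterozygote calling reported above can be illustrated with a toy binomial model. This is a minimal sketch and not the pipeline used in the study: it assumes each read at a heterozygous site independently carries the variant allele with probability `alt_fraction`, and that a hypothetical caller needs at least `min_reads` reads of each allele. The function names and the calling rule are assumptions made for illustration.

```python
from math import comb


def het_detection_prob(depth, alt_fraction=0.5, min_reads=2):
    """Probability that a heterozygous site shows at least `min_reads`
    reads of BOTH alleles at the given depth (toy binomial model)."""
    total = 0.0
    for alt in range(min_reads, depth - min_reads + 1):
        total += (comb(depth, alt)
                  * alt_fraction ** alt
                  * (1 - alt_fraction) ** (depth - alt))
    return total


def min_depth_for(target, alt_fraction=0.5, min_reads=2, max_depth=200):
    """Smallest depth whose detection probability reaches `target`,
    or None if no depth up to `max_depth` does."""
    for depth in range(1, max_depth + 1):
        if het_detection_prob(depth, alt_fraction, min_reads) >= target:
            return depth
    return None


if __name__ == "__main__":
    for depth in (8, 11, 12, 20):
        print(depth, round(het_detection_prob(depth), 4))
    # A reference-biased allele balance (alt_fraction < 0.5) raises the depth needed.
    print(min_depth_for(0.99), min_depth_for(0.99, alt_fraction=0.45))
```

Under these assumptions the required depth depends on the calling rule and on the allele balance, so the toy model should not be expected to reproduce the 11× figure exactly. It only shows why a reference-biased allele balance pushes the required coverage upward.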

4.
We have developed DNA microarrays containing stem–loop DNA probes with short single-stranded overhangs immobilized on a Packard HydroGel chip, a 3-dimensional porous gel substrate. Microarrays were fabricated by immobilizing self-complementary single-stranded oligonucleotides, which adopt a partially duplex structure upon denaturing and re-annealing. Hybridization of single-stranded DNA targets to such arrays is enhanced by contiguous stacking interactions with stem–loop probes and is highly sequence specific. Subsequent enzymatic ligation of the targets to the probes followed by stringent washing further enhances the mismatched base discrimination. We demonstrate here that these microarrays provide excellent specificity, with signal-to-background ratios ranging from 10- to 300-fold. In a comparative study, we demonstrated that HydroGel arrays display 10–30 times higher hybridization signals than some solid surface DNA microarrays. Using Sanger sequencing reactions, we have also developed a method for preparing nested 3′-deletion sets from a target and evaluated the use of stem–loop DNA arrays for detecting p53 mutations in the deletion set. The stem–loop DNA array format is simple, robust and flexible in design, and is thus potentially useful in various DNA diagnostic tests.

5.
Gene structure conservation aids similarity based gene prediction
One of the primary tasks in deciphering the functional contents of a newly sequenced genome is the identification of its protein coding genes. Existing computational methods for gene prediction include ab initio methods which use the DNA sequence itself as the only source of information, comparative methods using multiple genomic sequences, and similarity based methods which employ the cDNA or protein sequences of related genes to aid the gene prediction. We present here an algorithm implemented in a computer program called Projector which combines comparative and similarity approaches. Projector employs similarity information at the genomic DNA level by directly using known genes annotated on one DNA sequence to predict the corresponding related genes on another DNA sequence. It therefore makes explicit use of the conservation of the exon–intron structure between two related genes in addition to the similarity of their encoded amino acid sequences. We evaluate the performance of Projector by comparing it with the program Genewise on a test set of 491 pairs of independently confirmed mouse and human genes. It is more accurate than Genewise for genes whose proteins are <80% identical, and is suitable for use in a combined gene prediction system where other methods identify well conserved and non-conserved genes, and pseudogenes.

6.
Next-generation sequencing generates high-resolution, high-throughput sequence data for any genome at a rapidly declining cost. This opens opportunities for population-based genetic and genomic analyses. In many applications, whole-genome sequencing or re-sequencing is unnecessary or ruled out by budget limits. Reduced Representation Genome Sequencing (RRGS), which sequences only a small proportion of the genome of interest, has been proposed for these situations. Several forms of RRGS have been proposed and implemented in the literature. When applied to plant or crop species, current RRGS protocols share a key drawback: a high proportion (up to 60%) of the sequence reads generated may be of non-genomic origin, attributable to chloroplast DNA or rRNA genes, which leaves the sequencing experiment with exceptionally low efficiency. We recommend and discuss here the design of optimized simplified genomic DNA and bisulfite sequencing strategies, which may greatly improve the efficiency of sequencing experiments by reducing the share of undesirable reads to less than 10% of all sequence reads. The optimized RAD-seq and RRBS-seq methods are potentially useful for sequence variant screening and genotyping in large plant/crop populations.

8.
9.
10.
11.
12.
To meet the needs of large-scale genomic/genetic studies, the next-generation massively parallelized sequencing technologies provide high throughput, low cost and low labor-intensive sequencing service, with subsequent bioinformatic software and laboratory methods developed to expand their applications in various types of research. PCR-based genomic/genetic studies, which have significant usage in association studies like cancer research, haven’t benefited much from those next-generation sequencing technolo...

13.
14.
In sequencing-by-hybridization methods, the nucleotide sequence of a nucleic acid is reconstructed by overlapping oligonucleotides capable of hybridizing with the nucleic acid. In their present form, the methods are hardly suitable for sequencing long nucleic acid molecules because non-unique overlaps occur between the oligonucleotides, and, as with conventional sequencing methods, an individual molecule must be obtained. In the method described here, most ambiguities in reconstructing a sequence from its constituent oligonucleotides are eliminated by preparing nested partials of the nucleic acid and surveying them separately on oligonucleotide arrays. This enables longer nucleic acids to be sequenced, and results in a high redundancy of the input data, allowing most hybridization errors to be eliminated by algorithmic means. Furthermore, large pools of nucleic acid strands can be sequenced directly, without isolating individual strands.

15.
We present here the sequence and characterization of various minisatellite-like tandem repeat loci isolated from the genome of Atlantic salmon (Salmo salar). Their diversity of sequence and lack of core motifs common to minisatellites of other species suggest the presence of numerous and previously unidentified simple sequence repeat families in this salmonid. Evidence for their ubiquity was provided by screening of a salmon genomic library. Southern blot analysis of the phylogenetic distribution of a subset of the minisatellites found one sequence to be pervasive among vertebrates, others present only in Salmoninae or Salmonidae species, and one amplified only in Atlantic salmon. There is evidence for the positioning of microsatellite and minisatellite arrays in close proximity at many loci. Furthermore, one tandem repeat appears to have been inserted into the transposase coding region of a copy of the Tc1 transposon-like element recently identified in salmonids. Received: 9 October 1996 / Accepted: 20 May 1997

16.
Sequencing by hybridization (SBH) is a DNA sequencing technique, in which the sequence is reconstructed using its k-mer content. This content, which is called the spectrum of the sequence, is obtained by hybridization to a universal DNA array. Standard universal arrays contain all k-mers for some fixed k, typically 8 to 10. Currently, in spite of its promise and elegance, SBH is not competitive with standard gel-based sequencing methods. This is due to two main reasons: lack of tools to handle realistic levels of hybridization errors and an inherent limitation on the length of uniquely reconstructible sequence by standard universal arrays. In this paper, we deal with both problems. We introduce a simple polynomial reconstruction algorithm which can be applied to spectra from standard arrays and has provable performance in the presence of both false negative and false positive errors. We also propose a novel design of chips containing universal bases that differs from the one proposed by Preparata et al. (1999). We give a simple algorithm that uses spectra from such chips to reconstruct with high probability random sequences of length lower only by a squared log factor compared to the information theoretic bound. Our algorithm is very robust to errors and has a provable performance even if there are both false negative and false positive errors. Simulations indicate that its sensitivity to errors is also very small in practice.
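
The notion of a spectrum, and the limit on unique reconstruction mentioned above, can be made concrete with a small sketch. It is not the polynomial algorithm or the universal-base chip design of the paper: it assumes an error-free spectrum, k of at least 2, and a known target length, and it simply enumerates every sequence whose k-mer set matches by brute-force extension. The function names are hypothetical.

```python
def spectrum(seq, k):
    """Set of all k-mers occurring in seq (an ideal, error-free SBH spectrum)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstructions(spec, length):
    """All sequences of the given length whose spectrum equals `spec`.

    Brute-force extension one base at a time; exponential in the worst
    case, so only suitable for toy inputs. Assumes k >= 2.
    """
    k = len(next(iter(spec)))
    results = set()

    def extend(seq):
        if len(seq) == length:
            if spectrum(seq, k) == spec:
                results.add(seq)
            return
        suffix = seq[-(k - 1):]
        for base in "ACGT":
            if suffix + base in spec:
                extend(seq + base)

    for start in spec:
        extend(start)
    return sorted(results)


if __name__ == "__main__":
    # No repeated (k-1)-mer: the spectrum determines the sequence.
    print(reconstructions(spectrum("ACGTTGCA", 3), 8))
    # "AC" occurs three times: several sequences share one spectrum.
    print(reconstructions(spectrum("GACTACGACC", 3), 10))
```

The second call returns more than one candidate because the repeated 2-mer lets the segments between its occurrences be reordered without changing the set of 3-mers. This is the ambiguity that limits the length of uniquely reconstructible sequences on standard arrays.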

17.

Motivation

Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations such as novel insertions or highly diverged sequence from low coverage NGS data are still lacking.

Results

We present LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes using a mismatch sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner, while it performs de novo assemblies of insertions and highly polymorphic target regions subsequent to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same number of long insertions as state-of-the-art NGS assemblers, LOCAS showed the best results regarding contig size, error rate and runtime.

Conclusion

LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.

18.
Diagnostic re-sequencing plays a central role in medical and evolutionary genetics. In this report we describe a process that applies fluorescence-based re-sequencing and an integrated set of analysis tools to automate and simplify the identification of DNA variations using the human mitochondrial genome as a model system. Two programs used in genome sequence analysis (Phred, a base-caller, and Phrap, a sequence assembler) are applied to assess the quality of each base call across the sequence. Potential DNA variants are automatically identified and 'tagged' by comparing the assembled sequence with a reference sequence. We also show that employing the Consed program to display a set of highly annotated reference sequences greatly simplifies data analysis by providing a visual database containing information on the location of the PCR primers, coding and regulatory sequences and previously known DNA variants. Among the 12 genomes sequenced, 378 variants, including 29 new variants, were identified, along with two heteroplasmic sites automatically detected by the PolyPhred program. Overall we document the ease and speed of performing high quality and accurate fluorescence-based re-sequencing on long tracts of DNA as well as the application of new approaches to automatically find and view DNA variants among these sequences.
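
Per-base quality assessment of the kind described above rests on the standard Phred scale, Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The sketch below converts between the two and flags low-quality positions. The threshold of 20 and the function names are assumptions chosen for illustration, not settings taken from the report.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def low_quality_positions(qualities, threshold=20):
    """0-based positions whose quality falls below the (assumed) threshold."""
    return [i for i, q in enumerate(qualities) if q < threshold]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
    print(low_quality_positions([40, 12, 30, 19]))
```

A score of 20 corresponds to an expected error rate of 1 in 100 and a score of 30 to 1 in 1,000, which is why a cutoff on this scale is a common way to decide which base calls to trust when comparing against a reference.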

19.
In order to develop a large set of single-nucleotide polymorphisms (SNPs) in Cryptomeria japonica, for a wide range of applications, we adopted a systematic EST (expressed sequence tags) re-sequencing approach. We examined a group of four genotypes comprising parents of a mapping population as well as representatives of two main lines from natural populations. We sequenced 5,170 gene fragments, representing analysis of over 1.3 Mb of DNA sequences in C. japonica. This analysis leads to the discovery of 13,413 SNPs in 3,744 amplicons, with an average of one SNP for every 101.0 bp (one SNP for every 78.3 bp in introns and for every 106.7 bp in exon regions). Nucleotide diversity in C. japonica (π = 0.0045) was found to be similar to values recorded in highly polymorphic forest tree species such as pine. We also validated the use of the SNPs as molecular markers for genetic diversity studies using the high throughput SNP genotyping platform GoldenGate. From 1,536 candidate SNP sites tested, 1,164 (75.8 %) were confirmed to be polymorphic. We anticipate that the genome-wide SNP markers reported here will be useful for evaluating the species’ range-wide genetic structure and in marker-assisted selection used as part of the C. japonica tree improvement program.
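
Nucleotide diversity, as reported above, is conventionally the average number of differences per site between pairs of sequences. The following is a minimal sketch of that definition on a toy alignment, assuming equal-length, gap-free sequences. The toy data and function names are invented for illustration and are not taken from the study. The last function reproduces the validation rate quoted in the abstract from its two counts.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site over all sequence pairs.

    Assumes an ungapped alignment of equal-length sequences.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def validation_rate(confirmed, tested):
    """Percentage of tested candidate sites confirmed polymorphic."""
    return 100 * confirmed / tested


if __name__ == "__main__":
    toy_alignment = [  # hypothetical data, four haplotypes of 10 bp
        "ACGTACGTAC",
        "ACGTACGTAC",
        "ACGAACGTAC",
        "ACGAACGTTC",
    ]
    print(nucleotide_diversity(toy_alignment))
    print(round(validation_rate(1164, 1536), 1))
```

The toy alignment is far more variable than real data. A value such as 0.0045 corresponds to roughly one pairwise difference every 220 bp.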

20.
Kim S, Zhao K, Jiang R, Molitor J, Borevitz JO, Nordborg M, Marjoram P. Genetics 2006, 173(2):1125-1133
We develop methods for exploiting "single-feature polymorphism" data, generated by hybridizing genomic DNA to oligonucleotide expression arrays. Our methods enable the use of such data, which can be regarded as very high density, but imperfect, polymorphism data, for genomewide association or linkage disequilibrium mapping. We use a simulation-based power study to conclude that our methods should have good power for organisms like Arabidopsis thaliana, in which linkage disequilibrium is extensive, the reason being that the noisiness of single-feature polymorphism data is more than compensated for by their great number. Finally, we show how power depends on the accuracy with which single-feature polymorphisms are called.
