Similar Documents
20 similar documents found (search time: 15 ms).
1.
    
The computer program exonsampler automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for exon capture, an efficient next‐generation gene sequencing method. Exons can be sampled from a list of gene name abbreviations (e.g. IFNG, TLR1) or from genes spaced evenly across chromosomes. The program outputs a list of genomic coordinates (a BED file) as well as the corresponding sequences in FASTA format. User‐adjustable parameters include the minimum and maximum acceptable exon length, the maximum number of exonic base pairs (bp) to sample per gene, and the maximum total bp for the entire collection. Very large exons can be partially sampled. The program can preferentially sample upstream (5′) exons, downstream (3′) exons, both external exons, or all internal exons. It is written in Python using freely available libraries. We describe the use of exonsampler to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon‐capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes and ~16 000 exons evenly spaced genomewide. We prioritized the collection of 5′ exons to facilitate the discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection.
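The selection rules described for exonsampler (length limits, per-gene and total bp caps, partial sampling of large exons, 5′ preference) can be illustrated with a small greedy selector. This is a minimal sketch under stated assumptions, not exonsampler's actual code: the `Exon` class, its `rank` field (1 = most upstream exon), the default thresholds, and the choice to truncate long exons at their lower coordinate (ignoring strand) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    gene: str
    rank: int    # hypothetical field: 1 = most upstream (5') exon of the gene

    @property
    def length(self) -> int:
        return self.end - self.start


def sample_exons(exons, min_len=100, max_len=1000,
                 max_bp_per_gene=1500, max_total_bp=3_000_000):
    """Hypothetical greedy selector that favours 5' exons and enforces bp caps."""
    chosen, per_gene, total = [], {}, 0
    # Within each gene, visit exons from 5' to 3' so upstream exons fill the budget first.
    for ex in sorted(exons, key=lambda e: (e.gene, e.rank)):
        if ex.length < min_len:
            continue  # too short to capture
        # Truncate long exons (partial sampling) and respect the per-gene cap.
        # Truncation keeps the lower-coordinate end and ignores strand (simplification).
        take = min(ex.length, max_len, max_bp_per_gene - per_gene.get(ex.gene, 0))
        if take < min_len or total + take > max_total_bp:
            continue  # remaining budget too small for this exon
        chosen.append(Exon(ex.chrom, ex.start, ex.start + take, ex.gene, ex.rank))
        per_gene[ex.gene] = per_gene.get(ex.gene, 0) + take
        total += take
    return chosen


def to_bed(exons):
    """Render selected exons as BED lines: chrom, start, end, name."""
    return "\n".join(f"{e.chrom}\t{e.start}\t{e.end}\t{e.gene}_exon{e.rank}"
                     for e in exons)


# Hypothetical toy annotation (coordinates invented for illustration).
annotation = [
    Exon("chr1", 1000, 1200, "IFNG", 1),  # 200 bp, kept whole
    Exon("chr1", 2000, 2050, "IFNG", 2),  # 50 bp, below min_len
    Exon("chr1", 3000, 5000, "IFNG", 3),  # 2000 bp, truncated to max_len
    Exon("chr2", 100, 400, "TLR1", 1),    # 300 bp, kept whole
]
selected = sample_exons(annotation)
print(to_bed(selected))
```

Sorting by gene and then by exon rank is what makes the 5′ preference work: when a gene's bp cap is reached, it is the downstream exons that get dropped or shortened.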

2.
3.
    
Sequence capture technologies, pioneered in mammalian genomes, enable the resequencing of targeted genomic regions. Most capture protocols require blocking DNA, which can be challenging to produce in large quantities. A blocker‐free, two‐stage capture protocol was developed using NimbleGen arrays: the first capture depletes the library of repetitive sequences, and the second enriches for target loci. This strategy was used to resequence the non‐repetitive portions of an approximately 2.2 Mb chromosomal interval and of a set of 43 genes dispersed across the 2.3 Gb maize genome. The approach achieved approximately 1800–3000‐fold enrichment and 80–98% coverage of targeted bases. More than 2500 SNPs were identified in target genes. Low rates of false‐positive SNP predictions were obtained, even in the presence of captured paralogous sequences. Importantly, novel sequences could be recovered from non‐reference alleles. Because novel repeat‐subtraction and target capture arrays can be designed for any species, this technology is broadly accessible.
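The enrichment figures above are ratios. One common way to compute fold enrichment, assumed here because the abstract does not give the exact formula used, divides the fraction of reads that map to the target after capture by the fraction of the genome the target occupies (the on-target fraction expected without capture). A minimal sketch; the function name and the input numbers are hypothetical and not taken from the study.

```python
def fold_enrichment(on_target_reads: int, total_reads: int,
                    target_bp: int, genome_bp: int) -> float:
    """Fold enrichment = (fraction of reads on target) / (fraction of genome targeted).

    One common definition; the study may have computed enrichment differently.
    """
    if total_reads <= 0 or genome_bp <= 0 or target_bp <= 0:
        raise ValueError("read counts and sizes must be positive")
    observed = on_target_reads / total_reads   # on-target fraction after capture
    expected = target_bp / genome_bp           # on-target fraction with no capture
    return observed / expected


# Hypothetical numbers, not from the study: half of all reads land on a
# 1 Mb target in a 2.3 Gb genome.
print(f"{fold_enrichment(500_000, 1_000_000, 1_000_000, 2_300_000_000):.0f}-fold")
```

Because the expected fraction shrinks as the genome grows, the same on-target rate yields a much higher fold enrichment in a large genome such as maize than in a small one.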

4.
Screening large numbers of target regions in multiple DNA samples for sequence variation is an important application of next-generation sequencing, but an efficient method to enrich the samples in parallel has yet to be reported. We describe an advanced method that pools DNA samples using indexes (barcodes) before target enrichment to facilitate this type of experiment. Sequencing libraries for multiple individual DNA samples, each incorporating a unique 6-bp index, are combined in equal quantities, enriched using a single in-solution target enrichment assay and sequenced in a single reaction. Sequence reads are then sorted by index, allowing each sample to be analyzed individually. We show that indexing the samples does not affect the efficiency of the enrichment reaction. For pools of three and nine indexed HapMap DNA samples, the method was highly accurate for SNP identification: even at sequence coverage as low as 8x, 99% of SNP calls were concordant with known genotypes. Within a single experiment, this method can sequence the exonic regions of hundreds of genes in tens of samples for sequence and structural variation, using as little as 1 μg of input DNA per sample.
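Sorting pooled reads back into samples by their 6-bp index is usually called demultiplexing. The sketch below shows the idea under stated assumptions: each read is a hypothetical `(read_id, index_seq, insert_seq)` tuple, up to one mismatch is tolerated (an assumption, not a parameter from the study), and reads matching no index or more than one index are set aside as "undetermined". The index sequences are invented and chosen to differ at every position.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatches=1):
    """Assign each (read_id, index_seq, insert_seq) to the sample whose index matches.

    Reads matching no index, or more than one, go to 'undetermined'.
    """
    bins = defaultdict(list)
    for read_id, index_seq, insert_seq in reads:
        hits = [sample for sample, idx in sample_indexes.items()
                if hamming(index_seq, idx) <= max_mismatches]
        key = hits[0] if len(hits) == 1 else "undetermined"
        bins[key].append((read_id, insert_seq))
    return dict(bins)


# Hypothetical 6-bp indexes; each pair differs at all six positions.
indexes = {"sample_A": "ACGTAC", "sample_B": "TGCATG", "sample_C": "GATCGA"}
reads = [
    ("r1", "ACGTAC", "TTGACA"),  # exact match to sample_A
    ("r2", "ACGTAA", "GGCATT"),  # one mismatch, still sample_A
    ("r3", "TGCATG", "CCATGA"),  # exact match to sample_B
    ("r4", "CCCCCC", "AATTCC"),  # matches no index
]
for sample, assigned in sorted(demultiplex(reads, indexes).items()):
    print(sample, [read_id for read_id, _ in assigned])
```

Choosing indexes that are far apart in Hamming distance is what makes single-mismatch tolerance safe: a sequencing error in the index cannot move a read into another sample's bin.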

5.
Several protein-coding genes from land plant chloroplasts have been shown to contain introns. Most of these introns resemble fungal mitochondrial group II introns: they share considerable nucleotide sequence homology at their 5′ and 3′ ends, and they can readily be folded into the six hairpins characteristic of the predicted secondary structure of mitochondrial group II introns. Recently, some mitochondrial group II introns have been shown to self-splice in vitro in the absence of protein cofactors. However, evidence presented in this overview suggests that this is probably not the case for chloroplast introns and that trans-acting factors are almost certainly involved in their processing reactions. Abbreviations: kbp, kilobase pairs; ORF, open reading frame; pre-RNA, precursor ribonucleic acid.

6.
7.
8.
    
Sequence capture studies produce rich data sets comprising hundreds to thousands of targeted genomic regions, and these are superseding Sanger-based data sets composed of a few well-known loci with historical uses in phylogenetics (‘legacy loci’). However, integrating sequence capture and Sanger-based data sets is of interest, because legacy loci can include different types of loci (e.g. mitochondrial and nuclear) across a potentially larger sample of species from past studies. Sequence capture data sets include nontargeted sequences, and there has been recent interest in extracting legacy loci from invertebrate data sets. Here, we use published legacy data from leaf-footed bugs (Hemiptera: Coreoidea) to recover 15 mitochondrial and seven nuclear legacy loci from off-target sequences in a sequence capture data set, explore approaches to improve legacy locus recovery, and combine these loci with sequence capture data for phylogenetic analysis. Two nuclear loci were found to be already targeted by sequence capture baits. Most of the remaining loci were successfully recovered from off-target sequences, but recovery varied greatly. Additionally, supplementing complete mitogenomes with additional reference mitochondrial sequences from a genetic repository did not improve recovery for most of our taxa; however, supplementing these reference sequences with extracted legacy loci offered ≥6% improvement across taxa for a given mitochondrial locus (with negligible improvement for nuclear loci). Phylogenetic analysis of the combined legacy and sequence capture data produced a topology generally congruent with recent studies, although support was lower. Thus, future studies may use the approaches described here to integrate legacy data with newly generated sequence capture data sets at no added expense.

9.
Neurological disorders comprise a variety of complex diseases of the central nervous system, which can be roughly classified as neurodegenerative diseases and psychiatric disorders. Basic and translational research on neurological disorders has been hindered by the difficulty of accessing the pathological center (i.e., the brain) in living patients. Rapid advances in sequencing and array technologies have made it possible to investigate disease mechanisms and biomarkers from a systems perspective. In this review, we discuss recent progress in the discovery of novel risk genes, treatment targets and peripheral biomarkers using genomic technologies. Our major focus is on two of the most heavily investigated neurological disorders, namely Alzheimer's disease and autism spectrum disorder.

10.
    
With the rapid increase in genetic data produced by new sequencing technologies, a myriad of new ways to study genomic patterns in nonmodel organisms are now possible. Because genome assembly remains a complicated procedure, and because the functional role of much of the genome is unclear, focusing on SNP genotyping from expressed sequences provides a cost‐effective way to reduce complexity while retaining functionally relevant information. This review summarizes current methods, identifies ways in which expressed sequence data benefit population genomic inference, and explores how current practitioners evaluate and overcome commonly encountered challenges. We focus particularly on the additional power of functional analysis provided by expressed sequence data and on how these analyses go beyond the allele pattern data available from nonfunctional genomic approaches. The massive data sets generated by these approaches create opportunities as well as problems, especially false positives. We discuss methods available to validate results from expressed SNP genotyping assays and new approaches that sidestep the use of mRNA, and we review follow‐up experiments that can focus on evolutionary mechanisms acting across the genome.

11.
12.
    
Reduced-representation genome sequencing, such as restriction‐site‐associated DNA (RAD) sequencing, is increasingly used to identify and genotype large numbers of single‐nucleotide polymorphisms (SNPs) in model and nonmodel species. Using RAD sequencing, we generated a unique resource of novel SNP markers for the European eel, which were simultaneously identified and scored in a genome‐wide scan of 30 individuals. Although genomic resources for this species are increasingly becoming available, including the recent release of a draft genome, no genome‐wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4779 different contigs and 19 703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Diversity varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long‐term effective population size was estimated to range between 132 000 and 1 320 000, much higher than previous estimates based on microsatellite loci. The SNP resource, consisting of 82 425 loci and 376 918 associated SNPs, provides a valuable tool for future population genetics and genomics studies and allows specific genes and particularly interesting regions of the eel genome to be targeted.
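The link between nucleotide diversity and long-term effective population size can be made explicit with the standard neutral expectation for a diploid population, π ≈ 4·Ne·μ, so Ne ≈ π / (4μ). The abstract does not state the mutation rate it used; assuming per-site, per-generation rates of 10⁻⁸ and 10⁻⁹ (an assumption on our part) reproduces the reported range of roughly 132 000 to 1 320 000 from π = 0.00529. A minimal sketch:

```python
def effective_population_size(pi: float, mu: float) -> float:
    """Ne from nucleotide diversity under the neutral expectation pi = 4 * Ne * mu (diploid)."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4.0 * mu)


PI_EEL = 0.00529  # average nucleotide diversity reported in the abstract
for mu in (1e-8, 1e-9):  # assumed per-site, per-generation mutation rates
    print(f"mu={mu:g}: Ne ~ {effective_population_size(PI_EEL, mu):,.0f}")
```

The tenfold spread in the estimate comes entirely from the tenfold uncertainty in μ; π itself is the same in both cases.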

13.
    

14.
    
• Bread wheat (Triticum aestivum; Poaceae) is a crop plant of great importance. It provides nearly 20% of the world's daily food supply measured by calorie intake, similar to the share provided by rice. Wheat yields have doubled over the last 40 years owing to a combination of advanced agronomic practice and germplasm improved through selective breeding. More recently, yield growth has been less dramatic, and a significant improvement in wheat production will be required to meet the demand of a growing human population.
• Next-generation sequencing (NGS) technologies are revolutionizing biology and can be applied to address critical issues in plant biology. They can produce draft genome sequences at a significantly lower cost and in a shorter timeframe than traditional technologies. In addition, NGS technologies can be used to assess gene structure and expression and, importantly, to identify heritable genome variation underlying important agronomic traits.
• This review provides an overview of the wheat genome and NGS technologies, details some of the problems in applying NGS technology to wheat, and describes how NGS technologies are starting to impact wheat crop improvement.

15.
16.
17.
18.
    
Next-generation sequencing (NGS) technologies have revolutionized biological research by greatly increasing data generation while decreasing the time to data output. For many ecologists and evolutionary biologists, the research opportunities afforded by NGS are substantial; even for taxa lacking genomic resources, large-scale genome-level questions can now be addressed, opening up many new avenues of research. Although the rapid and massive sequencing afforded by NGS increases the scope and scale of many research objectives, whole-genome sequencing is often unwarranted and unnecessarily complex for specific research questions. Recently developed targeted sequence enrichment, coupled with NGS, is a useful strategy for generating data to answer questions in ecology and evolutionary biology. This combination of technologies offers researchers a simple, relatively efficient and effective way to isolate and analyze a few to hundreds, or even thousands, of genes or genomic regions from few to many samples. These strategies can be applied to questions at both the infra- and interspecific levels, including those involving parentage, gene flow, divergence, phylogenetics, reticulate evolution, and many more. Here we provide a brief overview of targeted sequence enrichment and emphasize how this technology increases our ability to address a wide range of questions of interest to ecologists and evolutionary biologists, particularly those working with taxa for which few genomic resources are available.

19.
20.
    
Single nucleotide polymorphisms (SNPs) are rapidly replacing anonymous markers in population genomic studies, but their use in nonmodel organisms is hampered by the scarcity of cost‐effective approaches for uncovering genome‐wide variation in a comprehensive subset of individuals. Screening only one or a few individuals induces ascertainment bias. To discover SNPs for a population genomic study of the Pyrenean rocket (Sisymbrium austriacum subsp. chrysanthum), we used a pooled RAD‐PE (Restriction site Associated DNA Paired‐End sequencing) approach. RAD tags were generated from the PstI‐digested pooled genomic DNA of 12 individuals sampled across the species' distribution range and paired‐end sequenced using Illumina technology to produce ~24.5 Mb of sequence, covering ~7% of the species' genome. Sequences were assembled into ~76 000 contigs with a mean length of 323 bp (N50 = 357 bp, sequencing depth = 24x). In all, >15 000 SNPs were called, of which 47% were annotated in putative genic regions based on homology with the Arabidopsis thaliana genome. Gene Ontology (GO) slim categorization showed that the identified SNPs covered extant genic variation well. Validation of 300 SNPs on a larger set of individuals using a KASPar assay (success rate: 87%) confirmed the utility of pooled RAD‐PE as an inexpensive genome‐wide SNP discovery technique. In addition to SNPs, we discovered >600 putative SSR markers.
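The N50 reported for the assembly summarizes contig lengths: it is the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of that conventional definition; the function name and the toy contig lengths are hypothetical and not taken from the study.

```python
def n50(lengths):
    """Smallest contig length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly (lengths in bp, invented for illustration).
print(n50([100, 200, 300, 400]))
```

Unlike the mean contig length (323 bp above), N50 weights long contigs more heavily, which is why it is usually somewhat larger than the mean for a typical assembly.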


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  Beijing ICP Registration No. 09084417