Similar articles
 Found 20 similar articles; search took 15 ms
1.
2.
3.
4.
The computer program exonsampler automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next‐generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User‐adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of exonsampler to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon‐capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16 000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection.
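
The user-adjustable limits described in this abstract (minimum and maximum exon length, a bp budget per gene, and a total bp budget) can be illustrated with a minimal sketch. This is not exonsampler's actual code or interface: the `Exon` class and the `sample_exons` and `to_bed` functions are hypothetical names chosen for illustration, and the sketch simply skips exons that do not fit, whereas the abstract says the real tool can partially sample very large exons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record with BED-style coordinates (0-based, end-exclusive)."""
    gene: str
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def sample_exons(exons, min_len, max_len, max_bp_per_gene, max_total_bp):
    """Greedily keep exons, in input order, that respect all four limits."""
    bp_per_gene = {}
    total_bp = 0
    chosen = []
    for exon in exons:
        # Reject exons outside the acceptable length window.
        if not (min_len <= exon.length <= max_len):
            continue
        used = bp_per_gene.get(exon.gene, 0)
        # Reject exons that would exceed the per-gene or the overall budget.
        if used + exon.length > max_bp_per_gene:
            continue
        if total_bp + exon.length > max_total_bp:
            continue
        bp_per_gene[exon.gene] = used + exon.length
        total_bp += exon.length
        chosen.append(exon)
    return chosen


def to_bed(exons):
    """Format the selection as tab-separated BED lines (chrom, start, end, name)."""
    return "\n".join(f"{e.chrom}\t{e.start}\t{e.end}\t{e.gene}" for e in exons)
```

Because the selection is greedy in input order, listing each gene's exons from the 5 prime end first would favour upstream exons, which loosely mirrors the prioritisation the abstract describes.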

5.
Orchids are known for their beauty and complexity of flower and ecological strategies. The evolution in orchid floral morphology, structure, and physiological properties has held the fascination of botanists for centuries, from Darwin through to the present. In floral studies, MADS‐box genes contributing to the now famous ABCDE model of floral organ identity control have dominated conceptual thinking. The sophisticated orchid floral organization offers an opportunity to discover new variant genes and different levels of complexity to the ABCDE model. Recently, several remarkable research reports on orchid MADS‐box genes, especially B‐class MADS‐box genes, have revealed the evolutionary track and important functions on orchid floral development. Diversification and fixation of both paleoAP3 gene sequences and expression profiles might be explained by subfunctionalization and even neofunctionalization. Knowledge about MADS‐box genes encoding ABCDE functions in orchids will give insights into the highly evolved floral morphogenetic networks of orchids.

6.
Characterization of highly duplicated genes, such as genes of the major histocompatibility complex (MHC), where multiple loci often co‐amplify, has until recently been hindered by insufficient read depths per amplicon. Here, we used ultra‐deep Illumina sequencing to resolve genotypes at exon 3 of MHC class I genes in the sedge warbler (Acrocephalus schoenobaenus). We sequenced 24 individuals in two replicates and used this data, as well as a simulated data set, to test the effect of amplicon coverage (range: 500–20 000 reads per amplicon) on the repeatability of genotyping using four different genotyping approaches. A third replicate employed unique barcoding to assess the extent of tag jumping, that is swapping of individual tag identifiers, which may confound genotyping. The reliability of MHC genotyping increased with coverage and approached or exceeded 90% within‐method repeatability of allele calling at coverages of >5000 reads per amplicon. We found generally high agreement between genotyping methods, especially at high coverages. High reliability of the tested genotyping approaches was further supported by our analysis of the simulated data set, although the genotyping approach relying primarily on replication of variants in independent amplicons proved sensitive to repeatable errors. According to the most repeatable genotyping method, the number of co‐amplifying variants per individual ranged from 19 to 42. Tag jumping was detectable, but at such low frequencies that it did not affect the reliability of genotyping. We thus demonstrate that gene families with many co‐amplifying genes can be reliably genotyped using HTS, provided that there is sufficient per amplicon coverage.
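
One simple way to put a number on the repeatability of allele calling between two replicates is sketched below. The abstract does not state the exact formula the authors used, so the definition here (alleles called in both replicates divided by alleles called in either, pooled over individuals) is an assumption, and `allele_repeatability` is a hypothetical helper name.

```python
def allele_repeatability(rep1, rep2):
    """Return shared / total allele calls across individuals present in both replicates.

    rep1 and rep2 map an individual ID to the set of allele names called
    for that individual in the respective replicate.
    """
    shared = 0
    total = 0
    for individual in rep1.keys() & rep2.keys():
        # Alleles called in both replicates count as repeatable.
        shared += len(rep1[individual] & rep2[individual])
        # Alleles called in at least one replicate form the denominator.
        total += len(rep1[individual] | rep2[individual])
    return shared / total if total else float("nan")
```

Computing this value separately for subsamples of increasing read depth would give a coverage-versus-repeatability curve of the kind the abstract summarises.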

7.
Establishing the sex of individuals in wild systems can be challenging and often requires genetic testing. Genotyping‐by‐sequencing (GBS) and other reduced‐representation DNA sequencing (RRS) protocols (e.g., RADseq, ddRAD) have enabled the analysis of genetic data on an unprecedented scale. Here, we present a novel approach for the discovery and statistical validation of sex‐specific loci in GBS data sets. We used GBS to genotype 166 New Zealand fur seals (NZFS, Arctocephalus forsteri) of known sex. We retained monomorphic loci as potential sex‐specific markers in the locus discovery phase. We then used (i) a sex‐specific locus threshold (SSLT) to identify significantly male‐specific loci within our data set; and (ii) a significant sex‐assignment threshold (SSAT) to confidently assign sex in silico, based on the presence or absence of significantly male‐specific loci, to individuals in our data set treated as unknowns (98.9% accuracy for females; 95.8% for males, estimated via cross‐validation). Furthermore, we assigned sex to 86 individuals of true unknown sex using our SSAT and assessed the effect of SSLT adjustments on these assignments. From 90 verified sex‐specific loci, we developed a panel of three sex‐specific PCR primers that we used to ascertain sex independently of our GBS data, which we show amplify reliably in at least two other pinniped species. Using monomorphic loci normally discarded from large SNP data sets is an effective way to identify robust sex‐linked markers for nonmodel species. Our novel pipeline can be used to identify and statistically validate monomorphic and polymorphic sex‐specific markers across a range of species and RRS data sets.
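
The two-stage idea in this abstract (first find loci that look male-specific, then assign sex from their presence or absence) can be sketched as follows. This is a deliberately simplified illustration and not the authors' pipeline: the abstract describes statistically derived thresholds (SSLT and SSAT), whereas the fixed cut-offs `min_male_fraction` and `min_hits`, and both function names, are assumptions made here.

```python
def male_specific_loci(presence, sex, min_male_fraction=0.9):
    """Return loci seen in no female and in at least min_male_fraction of males.

    presence maps a locus ID to the set of individuals in which it was sequenced;
    sex maps an individual ID to "M" or "F".
    """
    males = {ind for ind, s in sex.items() if s == "M"}
    females = {ind for ind, s in sex.items() if s == "F"}
    loci = []
    for locus, individuals in presence.items():
        # Any occurrence in a known female disqualifies the locus.
        if individuals & females:
            continue
        if males and len(individuals & males) / len(males) >= min_male_fraction:
            loci.append(locus)
    return sorted(loci)


def assign_sex(individual_loci, male_loci, min_hits):
    """Call an individual male if it carries at least min_hits male-specific loci."""
    hits = len(set(individual_loci) & set(male_loci))
    return "M" if hits >= min_hits else "F"
```

A count-based rule like this cannot separate a true female from a male with poor sequencing coverage, which is one reason a statistically validated threshold, as the abstract describes, is preferable in practice.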

8.
The Périgord black truffle (Tuber melanosporum Vittad.), considered a gastronomic delicacy worldwide, is an ectomycorrhizal filamentous fungus that is ecologically important in Mediterranean French, Italian and Spanish woodlands. In this study, we developed a novel resource of single nucleotide polymorphisms (SNPs) for T. melanosporum using Illumina high‐throughput resequencing. The genome from six T. melanosporum geographical accessions was sequenced to a depth of approximately 20×. These geographical accessions were selected from different populations within the northern and southern regions of the geographical species distribution. Approximately 80% of the reads for each of the six resequenced geographical accessions mapped against the reference T. melanosporum genome assembly, estimating the core genome size of this organism to be approximately 110 Mbp. A total of 442 326 SNPs corresponding to 3540 SNPs/Mbp were identified as being included in all seven genomes. The SNPs occurred more frequently in repeated sequences (85%), although 4501 SNPs were also identified in the coding regions of 2587 genes. Using the ratio of nonsynonymous mutations per nonsynonymous site (pN) to synonymous mutations per synonymous site (pS) and Tajima's D index scanning the whole genome, we were able to identify genomic regions and genes potentially subjected to positive or purifying selection. The SNPs identified represent a valuable resource for future population genetics and genomics studies.

9.
DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well‐equipped molecular laboratory and is time‐consuming, and/or expensive. We here address these issues by developing a barcoding pipeline for Oxford Nanopore MinION™ and demonstrating that one flow cell can generate barcodes for ~500 specimens despite the high basecall error rates of MinION™ reads. The pipeline overcomes these errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. Consensus barcodes are overall mismatch‐free but retain indel errors that are concentrated in homopolymeric regions. They are addressed with an optional error correction pipeline that is based on conserved amino acid motifs from publicly available barcodes. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent three different stages of MinION™ development. They generated data for (i) 511 specimens of a mixed Diptera sample, (ii) 575 specimens of ants and (iii) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION™ barcodes for 490 of the 511 specimens which were assessed against reference Sanger barcodes (N = 471). Overall, the MinION™ barcodes have an accuracy of 99.3%–100% with the number of ambiguous bases after correction ranging from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that it requires ~2 hr of sequencing to gather all information needed for obtaining reliable barcodes for most specimens (>90%). We estimate that up to 1,000 barcodes can be generated in one flow cell and that the cost per barcode can be
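
The consensus step mentioned in this abstract, where many error-prone reads of the same amplicon are summarised into one barcode, can be illustrated with a column-wise majority vote. This is only a sketch of the general idea, not the published pipeline: it assumes the reads have already been aligned to equal length with `-` marking gaps, and the `consensus` function name and the strict-majority rule are choices made here.

```python
from collections import Counter


def consensus(aligned_reads):
    """Build a majority-rule consensus from pre-aligned, equal-length reads.

    A column contributes its most common symbol only if more than half of the
    reads agree; otherwise it becomes the ambiguity code "N". Columns whose
    majority symbol is a gap ("-") are dropped from the output.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    bases = []
    for column in zip(*aligned_reads):
        symbol, count = Counter(column).most_common(1)[0]
        if count * 2 <= len(column):
            bases.append("N")   # no strict majority in this column
        elif symbol != "-":
            bases.append(symbol)
    return "".join(bases)
```

Random substitution errors are outvoted once enough reads cover a position, which fits the abstract's observation that consensus barcodes are largely mismatch-free. Systematic indels in homopolymer runs recur across reads, so a vote of this kind would not remove them.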

10.
11.
The European rabbit (Oryctolagus cuniculus) is a domesticated species with one of the broadest ranges of economic and scientific applications and fields of investigation. Rabbit genome information and assembly are available (oryCun2.0), but so far few studies have investigated its variability, and no large‐scale discovery of polymorphisms has yet been published for this species. Here, we sequenced two reduced representation libraries (RRLs) to identify single nucleotide polymorphisms (SNPs) in the rabbit genome. Genomic DNA of 10 rabbits belonging to different breeds was pooled and digested with two restriction enzymes (HaeIII and RsaI) to create two RRLs which were sequenced using the Ion Torrent Personal Genome Machine. The two RRLs produced 2 917 879 and 4 046 871 reads, for a total of 280.51 Mb (248.49 Mb with quality >20) and 417.28 Mb (360.89 Mb with quality >20) respectively of sequenced DNA. About 90% and 91% respectively of the obtained reads were mapped on the rabbit genome, covering a total of 15.82% of the oryCun2.0 genome version. The mapping and ad hoc filtering procedures allowed us to reliably call 62 491 SNPs. SNPs in a few genomic regions were validated by Sanger sequencing. The Variant Effect Predictor Web tool was used to map SNPs on the current version of the rabbit genome. The obtained results will be useful for many applied and basic research programs for this species and will contribute to the development of cost‐effective solutions for high‐throughput SNP genotyping in the rabbit.

12.
13.
14.
Barley (Hordeum vulgare L.) is a major cereal grain widely used for livestock feed, brewing malts and human food. Grain yield is the most important breeding target for genetic improvement and largely depends on optimal timing of flowering. Little is known about the allelic diversity of genes that underlie flowering time in domesticated barley, the genetic changes that have occurred during breeding, and their impact on yield and adaptation. Here, we report a comprehensive genomic assessment of a worldwide collection of 895 barley accessions based on the targeted resequencing of phenology genes. A versatile target‐capture method was used to detect genome‐wide polymorphisms in a panel of 174 flowering time‐related genes, chosen based on prior knowledge from barley, rice and Arabidopsis thaliana. Association studies identified novel polymorphisms that accounted for observed phenotypic variation in phenology and grain yield, and explained improvements in adaptation as a result of historical breeding of Australian barley cultivars. We found that 50% of the genetic variants associated with grain yield and 67% of the plant height variation were also associated with phenology. The precise identification of favourable alleles provides a genomic basis to improve barley yield traits and to enhance adaptation for specific production areas.

15.
16.
Single nucleotide polymorphisms (SNPs) are essential to the understanding of population genetic variation and diversity. Here, we performed restriction‐site‐associated DNA sequencing (RAD‐seq) on 72 individuals from 13 Chinese indigenous and three introduced chicken breeds. A total of 620 million reads were obtained using an Illumina Hiseq2000 sequencer. An average of 75 587 SNPs were identified from each individual. Further strict filtering validated 28 895 candidate SNPs for all populations. When compared with the NCBI dbSNP (chicken_9031), 15 404 SNPs were new discoveries. In this study, RAD‐seq was performed for the first time on chickens, indicating its remarkable effectiveness and its potential applications in genetic analysis and in breeding techniques for whole‐genome selection in chicken and other agricultural animals.

17.
This study investigated the birth of a brownbanded bamboo shark Chiloscyllium punctatum at the Steinhart Aquarium. Genetic analyses suggest this is the longest documented case of sperm storage for any species of shark (45 months).

18.
19.
Despite the importance of polyploidy and the increasing availability of new genomic data, there remain important gaps in our knowledge of polyploid population genetics. These gaps arise from the complex nature of polyploid data (e.g. multiple alleles and loci, mixed inheritance patterns, association between ploidy and mating system variation). Furthermore, many of the standard tools for population genetics that have been developed for diploids are often not feasible for polyploids. This review aims to provide an overview of the state‐of‐the‐art in polyploid population genetics and to identify the main areas where further development of molecular techniques and statistical theory is required. We review commonly used molecular tools (amplified fragment length polymorphism, microsatellites, Sanger sequencing, next‐generation sequencing and derived technologies) and the challenges associated with their use in polyploid populations: that is, allele dosage determination, null alleles, difficulty of distinguishing orthologues from paralogues and copy number variation. In addition, we review the approaches that have been used for population genetic analysis in polyploids and their specific problems. These problems are in most cases directly associated with dosage uncertainty and the problem of inferring allele frequencies and assumptions regarding inheritance. This leads us to conclude that for advancing the field of polyploid population genetics, the highest priority should be given to development of new molecular approaches that allow efficient dosage determination, and to further development of analytical approaches to circumvent dosage uncertainty and to accommodate ‘flexible’ modes of inheritance. In addition, there is a need for more simulation‐based studies that test what kinds of biases could result from both existing and novel approaches.

20.
Type specimens have high scientific importance because they provide the only certain connection between the application of a Linnean name and a physical specimen. Many other individuals may have been identified as a particular species, but their linkage to the taxon concept is inferential. Because type specimens are often more than a century old and have experienced conditions unfavourable for DNA preservation, success in sequence recovery has been uncertain. This study addresses this challenge by employing next‐generation sequencing (NGS) to recover sequences for the barcode region of the cytochrome c oxidase 1 gene from small amounts of template DNA. DNA quality was first screened in more than 1800 century‐old type specimens of Lepidoptera by attempting to recover 164‐bp and 94‐bp reads via Sanger sequencing. This analysis permitted the assignment of each specimen to one of three DNA quality categories – high (164‐bp sequence), medium (94‐bp sequence) or low (no sequence). Ten specimens from each category were subsequently analysed via a PCR‐based NGS protocol requiring very little template DNA. It recovered sequence information from all specimens with average read lengths ranging from 458 bp to 610 bp for the three DNA categories. By sequencing ten specimens in each NGS run, costs were similar to Sanger analysis. Future increases in the number of specimens processed in each run promise substantial reductions in cost, making it possible to anticipate a future where barcode sequences are available from most type specimens.
