Similar articles
 Found 20 similar articles; search time 31 ms
1.
Copy number variants (CNVs) are thought to play an important role in the predisposition to autism spectrum disorder (ASD). However, their relatively low frequency and widespread genomic distribution complicate their accurate characterization and utilization for clinical genetics purposes. Here we present a comprehensive analysis of multi-study, genome-wide CNV data from AutDB (http://mindspec.org/autdb.html), a genetic database that accommodates detailed annotations of published scientific reports of CNVs identified in ASD individuals. Overall, we evaluated 4,926 CNVs in 2,373 ASD subjects from 48 scientific reports, encompassing ∼2.12×10⁹ bp of genomic data. Remarkable variation was seen in CNV size, with duplications being significantly larger than deletions (P = 3×10⁻¹⁰⁵; Wilcoxon rank sum test). Examination of the CNV burden across the genome revealed 11 loci with a significant excess of CNVs among ASD subjects (P < 7×10⁻⁷). Altogether, these loci covered 15,610 kb of the genome and contained 166 genes. Remarkable variation was seen both in locus size (20–4950 kb) and in gene content, with seven multigenic (≥3 genes) and four monogenic loci. CNV data from control populations were used to further refine the boundaries of these ASD susceptibility loci. Interestingly, our analysis indicates that 15q11.2-13.3, a genomic region prone to chromosomal rearrangements of various sizes, contains three distinct ASD susceptibility CNV loci that vary in their genomic boundaries, CNV types, inheritance patterns, and overlap with CNVs from control populations. In summary, our analysis of AutDB CNV data provides valuable insights into the genomic characteristics of ASD susceptibility CNV loci and could therefore be utilized in various clinical settings and facilitate future genetic research of this disorder.
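The size comparison in the abstract above relies on a Wilcoxon rank sum test. The following is a minimal sketch of that test in pure Python, using the normal approximation without a tie correction for the variance. The function name and the CNV sizes in the usage note are hypothetical and are not taken from AutDB.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p). Tied values receive midranks; the variance is not
    corrected for ties, which is a simplification.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Positions i..j (0-based) share the average of ranks i+1..j+1.
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1

    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p
```

For example, `rank_sum_test([850, 1200, 640, 2300, 1500], [120, 300, 95, 410, 220])` compares five made-up duplication sizes (kb) with five made-up deletion sizes; a positive `z` means the first sample tends to be larger.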

2.

Background

By examining the genotype calls generated by the 1000 Genomes Project we discovered that the human reference genome GRCh37 contains almost 20,000 loci in which the reference allele has never been observed in healthy individuals and around 70,000 loci in which it has been observed only in the heterozygous state.

Results

We show that a large fraction of these rare reference allele (RRA) loci belongs to coding, functional and regulatory elements of the genome and could be linked to rare Mendelian disorders as well as cancer. We also demonstrate that classical germline and somatic variant calling tools are not capable of recognizing the rare allele when present in these loci. To overcome such limitations, we developed a novel tool, named RAREVATOR, that is able to identify and call the rare allele in these genomic positions. Using a small cancer dataset, we compared our tool with two state-of-the-art callers and found that RAREVATOR identified more than 1,500 germline and 22 somatic RRA variants that were missed by the two methods and that belong to significantly mutated pathways.

Conclusions

These results show that, to date, around 100,000 loci of the human genome have gone uninvestigated in re-sequencing experiments based on the GRCh37 assembly and that our tool can fill the gap left by other methods. Moreover, the investigation of the latest version of the human reference genome, GRCh38, showed that although the GRC corrected almost all insertions and a small part of SNVs and deletions, a large number of functionally relevant RRAs still remain unchanged. For this reason, future resequencing experiments based on GRCh38 will also benefit from RAREVATOR analysis results. RAREVATOR is freely available at http://sourceforge.net/projects/rarevator.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1481-9) contains supplementary material, which is available to authorized users.
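The two classes of loci described in the Background above (reference allele never observed, or observed only in the heterozygous state) can be illustrated with a small sketch. It assumes VCF-style genotype strings such as `0/1`, where `0` denotes the reference allele. The function name and return labels are hypothetical, and this is not RAREVATOR's implementation.

```python
def classify_locus(genotypes):
    """Classify one locus from a list of VCF-style genotype strings.

    "0" is the reference allele; "." marks a missing call.
    """
    # Accept both unphased ("/") and phased ("|") genotypes.
    alleles = [g.replace("|", "/").split("/") for g in genotypes]
    called = [a for a in alleles if "." not in a]
    if not called:
        return "no_calls"
    ref_seen = any("0" in a for a in called)
    ref_homozygous = any(all(x == "0" for x in a) for a in called)
    if not ref_seen:
        return "never_observed"   # reference allele absent from every sample
    if not ref_homozygous:
        return "het_only"         # reference allele seen only in heterozygotes
    return "hom_observed"         # at least one homozygous-reference sample
```

Applied to every locus of a genotype panel, the first two labels would correspond to the roughly 20,000 and 70,000 loci the abstract reports, under the assumption that the panel contains only healthy individuals.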

3.
Genomic enrichment methods and next-generation sequencing produce uneven coverage for the portions of the genome (the loci) they target; this information is essential for ascertaining the suitability of each locus for further analysis. lociNGS is a user-friendly accessory program that takes multi-FASTA formatted loci, next-generation sequence alignments and demographic data as input and collates, displays and outputs information about the data. Summary information includes the parameters coverage per locus, coverage per individual and number of polymorphic sites, among others. The program can output the raw sequences used to call loci from next-generation sequencing data. lociNGS also reformats subsets of loci in three commonly used formats for multi-locus phylogeographic and population genetics analyses – NEXUS, IMa2 and Migrate. lociNGS is available at https://github.com/SHird/lociNGS and is dependent on installation of MongoDB (freely available at http://www.mongodb.org/downloads). lociNGS is written in Python and is supported on MacOSX and Unix; it is distributed under a GNU General Public License.
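The summary statistics named above (coverage per locus, coverage per individual, number of polymorphic sites) can be sketched as follows. This is an illustration only: the input shapes and function names are assumptions and do not reflect lociNGS's own code or its MongoDB schema.

```python
from collections import Counter


def summarize_coverage(alignments):
    """Count aligned reads per locus and per individual.

    `alignments` is a list of (locus, individual) pairs, one per read
    (an assumed, simplified input format).
    """
    per_locus = Counter(locus for locus, _ in alignments)
    per_individual = Counter(individual for _, individual in alignments)
    return per_locus, per_individual


def count_polymorphic_sites(sequences):
    """Count alignment columns with more than one distinct base.

    `sequences` are equal-length aligned strings; gaps and N are ignored.
    """
    polymorphic = 0
    for column in zip(*sequences):
        bases = set(column) - {"-", "N"}
        if len(bases) > 1:
            polymorphic += 1
    return polymorphic
```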

4.
Clinically relevant features of monogenic diseases, including severity of symptoms and age of onset, can vary widely in response to environmental differences as well as to the presence of genetic modifiers affecting the trait’s penetrance and expressivity. While a better understanding of modifier loci could lead to treatments for Mendelian diseases, the rarity of individuals harboring both a disease-causing allele and a modifying genotype hinders their study in human populations. We examined the genetic architecture of monogenic trait modifiers using a well-characterized yeast model of the human Mendelian disease classic galactosemia. Yeast strains with loss-of-function mutations in the yeast ortholog (GAL7) of the human disease gene (GALT) fail to grow in the presence of even small amounts of galactose due to accumulation of the same toxic intermediates that poison human cells. To isolate and individually genotype large numbers of the very rare (∼0.1%) galactose-tolerant recombinant progeny from a cross between two gal7Δ parents, we developed a new method, called “FACS-QTL.” FACS-QTL improves upon the currently used approaches of bulk segregant analysis and extreme QTL mapping by requiring less genome engineering and strain manipulation as well as maintaining individual genotype information. Our results identified multiple distinct solutions by which the monogenic trait could be suppressed, including genetic and nongenetic mechanisms as well as frequent aneuploidy. Taken together, our results imply that the modifiers of monogenic traits are likely to be genetically complex and heterogeneous.

5.
Structural genomic variations play an important role in human disease and phenotypic diversity. With the rise of high-throughput sequencing tools, mate-pair/paired-end/single-read sequencing has become an important technique for the detection and exploration of structural variation. Several analysis tools exist to handle different parts and aspects of such sequencing-based structural variation analysis pipelines. A comprehensive analysis platform to handle all steps, from processing the sequencing data to the discovery and visualization of structural variants, is missing. The ViVar platform is built to handle the discovery of structural variants, from depth-of-coverage analysis and aberrant read pair clustering to split read analysis. ViVar provides you with powerful visualization options, enables easy reporting of results and offers better usability and data management. The platform facilitates the processing, analysis and visualization of structural variation based on massively parallel sequencing data, enabling the rapid identification of disease loci or genes. ViVar allows you to scale your analysis with your work load over multiple (cloud) servers, has user access control to keep your data safe and is easily expandable as analysis techniques advance. URL: https://www.cmgg.be/vivar/

6.
Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/.

7.
Whole-genome sequencing of Mauritian cynomolgus macaques reveals novel candidate loci for controlling simian immunodeficiency virus replication. See related Research, http://genomebiology.com/2014/15/11/478

8.
In response to DNA damage, two general but fundamental processes occur in the cell: (1) a DNA lesion is recognized and repaired, and (2) concomitantly, the cell halts the cell cycle to provide a window of opportunity for repair to occur. An essential factor for a proper DNA-damage response is the heterotrimeric protein complex Replication Protein A (RPA). Of particular interest is hyperphosphorylation of the 32-kDa subunit, called RPA2, on its serine/threonine-rich amino (N) terminus following DNA damage in human cells. The unstructured N-terminus is often referred to as the phosphorylation domain and is conserved among eukaryotic RPA2 subunits, including Rfa2 in Saccharomyces cerevisiae. An aspartic acid/alanine-scanning and genetic interaction approach was utilized to delineate the importance of this domain in budding yeast. It was determined that the Rfa2 N-terminus is important for a proper DNA-damage response in yeast, although its phosphorylation is not required. Subregions of the Rfa2 N-terminus important for the DNA-damage response were also identified. Finally, an Rfa2 N-terminal hyperphosphorylation-mimetic mutant behaves similarly to another Rfa1 mutant (rfa1-t11) with respect to genetic interactions, DNA-damage sensitivity, and checkpoint adaptation. Our data indicate that post-translational modification of the Rfa2 N-terminus is not required for cells to deal with “repairable” DNA damage; however, post-translational modification of this domain might influence whether cells proceed into M-phase in the continued presence of unrepaired DNA lesions as a “last-resort” mechanism for cell survival.

9.
High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis in the non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA; http://ctrad-csi.nus.edu.sg/gbsa), a free open-source software capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all the genomic regions, for which methylation studies could lead to the discovery of new regulatory mechanisms.
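Methylation scoring at nucleotide resolution commonly rests on the fact that bisulfite treatment converts unmethylated cytosines, which are then read as T, while methylated cytosines are still read as C. The sketch below computes the resulting per-site and per-region methylation levels. The function names are hypothetical, and the abstract does not say whether GBSA scores methylation in exactly this way.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine.

    C reads indicate a protected (methylated) cytosine; T reads indicate
    a converted (unmethylated) one. Returns None when there is no coverage.
    """
    total = c_reads + t_reads
    return c_reads / total if total else None


def region_methylation(sites):
    """Pooled methylation level over a list of (c_reads, t_reads) sites.

    Pooling read counts weights each site by its coverage; a region can
    be a gene or any gene-independent interval.
    """
    c_total = sum(c for c, _ in sites)
    t_total = sum(t for _, t in sites)
    return methylation_level(c_total, t_total)
```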

10.
The power of population genetic analyses is often limited by sample size resulting from constraints in financial resources and time to genotype large numbers of individuals. This particularly applies to nonmodel species where detailed genomic knowledge is lacking. Next‐generation sequencing technology using primers ‘tagged’ with an individual barcode of a few nucleotides offers the opportunity to genotype hundreds of individuals at several loci in parallel (Binladen et al. 2007; Meyer et al. 2008). The large number of sequence reads can also be used to identify artefacts by frequency distribution thresholds intrinsically determined for each run and data set. In Babik et al. (2009), next‐generation deep sequencing was used to genotype several major histocompatibility complex (MHC) class IIB loci of the European bank vole (Fig. 1). Their approach can be useful for many researchers working with complex multiallelic templates and large sample sizes.
Figure 1. Hypothetical example of parallel genotyping of two individuals using individually bar‐coded primers. Polymerase chain reactions (PCRs) are performed separately for each individual using a forward primer with a unique Tag‐sequence of four nucleotides. After sequencing of pooled PCR products, sequences can be sorted by their forward primer Tag (Tag‐sorting error rate was estimated < 0.1%). Rare sequences most likely represent artefacts and due to the large amount of sequences obtained (up to 10⁶) the artefact threshold can be determined intrinsically for each data set and was estimated to be around 3% in the case of bank vole MHC class IIB genes (Babik et al. 2009). Photos by Gabriela Bydlon.
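The workflow in the figure caption (sort pooled reads by a four-nucleotide forward-primer tag, then discard rare sequences as artefacts) can be sketched as below. The function names and tag table are hypothetical. The 3% default mirrors the value reported for bank vole MHC class IIB genes, whereas in practice the threshold is determined for each data set.

```python
from collections import Counter, defaultdict

TAG_LENGTH = 4  # the caption describes tags of four nucleotides


def sort_by_tag(reads, tag_to_individual):
    """Group reads by their leading tag; reads with unknown tags are dropped."""
    by_individual = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LENGTH], read[TAG_LENGTH:]
        individual = tag_to_individual.get(tag)
        if individual is not None:
            by_individual[individual].append(insert)
    return by_individual


def call_alleles(sequences, artefact_threshold=0.03):
    """Keep sequence variants whose frequency reaches the threshold."""
    counts = Counter(sequences)
    total = len(sequences)
    return {seq for seq, n in counts.items() if n / total >= artefact_threshold}
```

Sorting first and thresholding per individual keeps a variant that is rare in the pooled run but common in one animal from being discarded.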

11.
12.
Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein–chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/.

13.

Background

Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality.

Results

In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS.

Conclusions

By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-439) contains supplementary material, which is available to authorized users.

14.

Background

Over the past years, reports have indicated that honey bee populations are declining and that infestation by an ecto-parasitic mite (Varroa destructor) is one of the main causes. Selective breeding of resistant bees can help to prevent losses due to the parasite, but it requires that a robust breeding program and genetic evaluation are implemented. Genomic selection has emerged as an important tool in animal breeding programs and simulation studies have shown that it yields more accurate breeding value estimates, higher genetic gain and low rates of inbreeding. Since genomic selection relies on marker data, simulations conducted on a genomic dataset are a pre-requisite before selection can be implemented. Although genomic datasets have been simulated in other species undergoing genetic evaluation, simulation of a genomic dataset specific to the honey bee is required since this species has a distinct genetic and reproductive biology. Our software program was aimed at constructing a base population by simulating a random mating honey bee population. A forward-time population simulation approach was applied since it allows modeling of genetic characteristics and reproductive behavior specific to the honey bee.

Results

Our software program yielded a genomic dataset for a base population in linkage disequilibrium. In addition, information was obtained on (1) the position of markers on each chromosome, (2) allele frequency, (3) χ² statistics for Hardy-Weinberg equilibrium, (4) a sorted list of markers with a minor allele frequency less than or equal to the input value, (5) average r² values of linkage disequilibrium between all pairs of simulated marker loci for all generations and (6) average r² value of linkage disequilibrium in the last generation for selected markers with the highest minor allele frequency.

Conclusion

We developed a software program that takes into account the genetic and reproductive biology specific to the honey bee and that can be used to constitute a genomic dataset compatible with the simulation studies necessary to optimize breeding programs. The source code together with an instruction file is freely accessible at http://msproteomics.org/Research/Misc/honeybeepopulationsimulator.html
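Two of the statistics listed in the Results above, the χ² statistic for Hardy-Weinberg equilibrium and the r² measure of linkage disequilibrium, have standard textbook definitions for biallelic markers. The sketch below implements those definitions. The function names and input formats are assumptions, and this is not the simulator's own code.

```python
def hwe_chi_square(n_hom_a, n_het, n_hom_b):
    """Chi-square statistic for Hardy-Weinberg equilibrium at one marker.

    Arguments are observed genotype counts (AA, AB, BB) in diploids.
    """
    n = n_hom_a + n_het + n_hom_b
    p = (2 * n_hom_a + n_het) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_a, n_het, n_hom_b)
    # Skip classes with zero expectation (monomorphic markers).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    `haplotypes` is a list of (a, b) pairs with alleles coded 0 or 1.
    Returns None when either locus is monomorphic (r^2 is undefined).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else None
```

Averaging `ld_r_squared` over marker pairs, generation by generation, would give a quantity of the kind listed under items (5) and (6).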

15.
16.
17.
Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).

18.
We describe methclone, a novel method to identify epigenetic loci that harbor large changes in the clonality of their epialleles (epigenetic alleles). Methclone efficiently analyzes genome-wide DNA methylation sequencing data. We quantify the changes using a composition entropy difference calculation and also introduce a new measure of global clonality shift, loci with epiallele shift per million loci covered, which enables comparisons between different samples to gauge overall epiallelic dynamics. Finally, we demonstrate the utility of methclone in capturing functional epiallele shifts in leukemia patients from diagnosis to relapse. Methclone is open-source and freely available at https://code.google.com/p/methclone.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0472-5) contains supplementary material, which is available to authorized users.

19.
20.
The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods, and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
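The idea of separating true rare variants from sequencing errors can be illustrated with a deliberately simplified stand-in: treat every read at a position as having the same error rate and ask how surprising the observed number of non-reference bases is under a binomial model. This is an assumption made for illustration and is not LoFreq's actual model, which the abstract describes only as using run-specific error rates. The function names, `alpha` and `n_tests` are hypothetical.

```python
import math


def binom_log_pmf(i, n, p):
    """Log of the binomial probability of i successes in n trials (0 < p < 1)."""
    return (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )


def variant_p_value(alt_count, depth, error_rate):
    """P(at least alt_count errors among depth reads) under the error model."""
    # Summing the upper tail directly in log space avoids overflow at
    # high depth and keeps precision for very small p-values.
    tail = sum(
        math.exp(binom_log_pmf(i, depth, error_rate))
        for i in range(alt_count, depth + 1)
    )
    return min(1.0, tail)


def is_variant(alt_count, depth, error_rate, alpha=0.05, n_tests=1):
    """Call a variant if the p-value passes a Bonferroni-style threshold."""
    return variant_p_value(alt_count, depth, error_rate) < alpha / n_tests
```

At a depth of 10,000 and an assumed error rate of 0.001, about 10 erroneous non-reference bases are expected, so 10 observations are unremarkable, while 40 observations are very unlikely to be errors alone.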
