1.

Assessing cell-specific effects of genetic variations using tRNA microarrays

Polte Christine Wedemeyer Daniel Oliver Kathryn E. Wagner Johannes Bijvelds Marcel J. C. Mahoney John de Jonge Hugo R. Sorscher Eric J. Ignatova Zoya 《BMC genomics》2019,20(8):1-3

Background

Short-read resequencing of genomes produces abundant information on the genetic variation of individuals. Because these variants are so numerous, they are rarely exhaustively validated. Furthermore, low levels of undetected variant miscalling will have a systematic and disproportionate impact on the interpretation of individual genome sequence information, especially should these also be carried through into reference databases of genomic variation.

Results

We find that sequence variation from short-read sequence data is subject to recurrent-yet-intermittent miscalling that occurs in a sequence intrinsic manner and is very sensitive to sequence read length. The miscalls arise from difficulties aligning short reads to redundant genomic regions, where the rate of sequencing error approaches the sequence diversity between redundant regions. We find the resultant miscalled variants to be sensitive to small sequence variations between genomes, and they are thereby often intrinsic to an individual, pedigree, strain or human ethnic group. In human exome sequences, we identify 2–300 recurrent false positive variants per individual, almost all of which are present in public databases of human genomic variation. From the exomes of non-reference strains of inbred mice, we identify 3–5000 recurrent false positive variants per mouse, a number that increases with greater distance between an individual mouse strain and the reference C57BL6 mouse genome. We show that recurrently miscalled variants may be reproduced for a given genome from repeated simulation rounds of read resampling, realignment and recalling. As such, it is possible to identify more than two-thirds of false positive variation from only ten rounds of simulation.

Conclusion

Identification and removal of recurrent false positive variants from specific individual variant sets will improve overall data quality. The variant miscalls that arise are highly sequence intrinsic and are often specific to an individual, pedigree or ethnicity. Further, read length is a strong determinant of whether given false variants will be called for any given genome, which has profound significance for cohort studies that pool datasets collected and sequenced at different points in time.
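The idea of recovering recurrent miscalls from repeated simulation rounds can be pictured with a small sketch. It assumes the read resampling, realignment and recalling have already been run elsewhere and that each round yields a set of called variants; the tuple layout `(chrom, pos, ref, alt)`, the function name `recurrent_calls` and the `min_rounds` threshold are illustrative assumptions, not details given in the abstract.

```python
from collections import Counter


def recurrent_calls(call_sets, min_rounds=2):
    """Return variants that were called in at least `min_rounds` rounds.

    `call_sets` is an iterable of per-round collections of hashable
    variant keys (a hypothetical (chrom, pos, ref, alt) tuple here).
    """
    counts = Counter()
    for calls in call_sets:
        # set() ensures a variant is counted at most once per round
        counts.update(set(calls))
    return {variant for variant, n in counts.items() if n >= min_rounds}


# Toy call sets from three hypothetical simulation rounds
rounds = [
    {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
    {("chr1", 1000, "A", "G"), ("chr3", 42, "G", "A")},
    {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
]
recurrent = recurrent_calls(rounds, min_rounds=2)
```

In this toy input the `chr3` call appears in only one round and is dropped, while the other two calls recur and would be candidates for removal from that individual's variant set.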

2.

Impacts of Variation in the Human Genome on Gene Regulation

Rajini R. Haraksingh Michael P. Snyder 《Journal of molecular biology》2013

3.

MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions

Garla V Kong Y Szpakowski S Krauthammer M 《Bioinformatics (Oxford, England)》2011,27(3):416-418

4.

Genome-wide associations of gene expression variation in humans

Stranger BE Forrest MS Clark AG Minichiello MJ Deutsch S Lyle R Hunt S Kahl B Antonarakis SE Tavaré S Deloukas P Dermitzakis ET 《PLoS genetics》2005,1(6):e78

The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
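Two of the multiple-test corrections named in this abstract can be sketched briefly. The block below shows the textbook Bonferroni threshold and the Benjamini-Hochberg step-up procedure as one common way to control the false discovery rate; the abstract does not say which FDR procedure the authors used, and the function names, the `alpha` default and the example p-values are assumptions for illustration. The permutation approach is not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Mark a test significant when p <= alpha / m, with m tests in total."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= k * alpha / m
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Every test ranked at or below the cutoff is declared a discovery
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values, e.g. from five SNP-expression association tests
example_p = [0.001, 0.012, 0.025, 0.045, 0.20]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
```

With these made-up values Bonferroni keeps only the smallest p-value, while the step-up procedure keeps the three smallest, which illustrates why the choice of correction affects how many associations are reported.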

5.

Copy number variation across European populations

Chen W Hayward C Wright AF Hicks AA Vitart V Knott S Wild SH Pramstaller PP Wilson JF Rudan I Porteous DJ 《PloS one》2011,6(8):e23087

Genome analysis provides a powerful approach to test for evidence of genetic variation within and between geographical regions and local populations. Copy number variants, which comprise insertions, deletions and duplications of genomic sequence, provide one such convenient and informative source. Here, we investigate copy number variants from genome wide scans of single nucleotide polymorphisms in three European population isolates, the island of Vis in Croatia, the islands of Orkney in Scotland and the South Tyrol in Italy. We show that whereas the overall copy number variant frequencies are similar between populations, their distribution is highly specific to the population of origin, a finding which is supported by evidence for increased kinship correlation for specific copy number variants within populations.

6.

GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data

Edoardo Giacopuzzi Niko Popitsch Jenny C Taylor 《Nucleic acids research》2022,50(5):2522

Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.

7.

The Biological Significance of Multi-copy Regions and Their Impact on Variant Discovery

Jing Sun Yanfang Zhang Minhui Wang Qian Guan Xiujia Yang Jin Xia Ou Mingchen Yan Chengrui Wang Yan Zhang Zhi-Hao Li Chunhong Lan Chen Mao Hong-Wei Zhou Bingtao Hao Zhenhai Zhang 《Genomics, Proteomics & Bioinformatics》2020,18(5):516-524

Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, to what extent the genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs) with a minimum length of 300 bp, each of which occurs at least twice in the human genome. The 151,749 genomic loci (multi-copy regions, or MCRs) harboring these MCSs account for 1.98% of the genome and are distributed unevenly across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related cellular component terms and various enzymatic activities in the GO biological function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution must be taken when identifying genetic variants in the MCRs via HTS technologies.

8.

A genome-wide comparison of the functional properties of rare and common genetic variants in humans

Zhu Q Ge D Maia JM Zhu M Petrovski S Dickson SP Heinzen EL Shianna KV Goldstein DB 《American journal of human genetics》2011,88(4):407-468

One of the longest running debates in evolutionary biology concerns the kind of genetic variation that is primarily responsible for phenotypic variation in species. Here, we address this question for humans specifically from the perspective of population allele frequency of variants across the complete genome, including both coding and noncoding regions. We establish simple criteria to assess the likelihood that variants are functional based on their genomic locations and then use whole-genome sequence data from 29 subjects of European origin to assess the relationship between the functional properties of variants and their population allele frequencies. We find that for all criteria used to assess the likelihood that a variant is functional, the rarer variants are significantly more likely to be functional than the more common variants. Strikingly, these patterns disappear when we focus on only those variants in which the major alleles are derived. These analyses indicate that the majority of the genetic variation in terms of phenotypic consequence may result from a mutation-selection balance, as opposed to balancing selection, and have direct relevance to the study of human disease.

9.

BlackOPs: increasing confidence in variant detection through mappability filtering

Christopher R. Cabanski Matthew D. Wilkerson Matthew Soloway Joel S. Parker Jinze Liu Jan F. Prins J. S. Marron Charles M. Perou D. Neil Hayes 《Nucleic acids research》2013,41(19):e178

Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
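Filtering a call set against a blacklist of positions and alleles, as described above, amounts to a set-membership check. The sketch below is not BlackOPs itself and does not reproduce its file formats or options; the dictionary fields `chrom`, `pos` and `alt`, the function name and the example records are all hypothetical.

```python
def filter_against_blacklist(variants, blacklist):
    """Split variants into (kept, removed) by blacklist membership.

    `variants` is a list of dicts with hypothetical keys
    'chrom', 'pos' and 'alt'; `blacklist` is a set of
    (chrom, pos, alt) tuples flagged as mismapping artifacts.
    """
    kept, removed = [], []
    for variant in variants:
        key = (variant["chrom"], variant["pos"], variant["alt"])
        if key in blacklist:
            removed.append(variant)
        else:
            kept.append(variant)
    return kept, removed


# Hypothetical blacklist and sample calls
blacklist = {("chr7", 55249071, "T"), ("chr1", 12345, "G")}
calls = [
    {"chrom": "chr7", "pos": 55249071, "alt": "T"},
    {"chrom": "chr7", "pos": 55249071, "alt": "A"},  # same position, other allele
    {"chrom": "chr2", "pos": 999, "alt": "C"},
]
kept, removed = filter_against_blacklist(calls, blacklist)
```

Keying on the allele as well as the position means a different alternate allele at a blacklisted site is retained, which matches the abstract's description of a blacklist of "positions and alleles".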

10.

Which Genetic Variants in DNase-Seq Footprints Are More Likely to Alter Binding?

Gregory A. Moyerbrailean Cynthia A. Kalita Chris T. Harvey Xiaoquan Wen Francesca Luca Roger Pique-Regi 《PLoS genetics》2016,12(2)

11.

Prioritization of regulatory variants with tissue-specific function in the non-coding regions of human genome

Shengcheng Dong Alan P Boyle 《Nucleic acids research》2022,50(1):e6

Understanding the functional consequences of genetic variation in the non-coding regions of the human genome remains a challenge. We introduce here a computational tool, TURF, to prioritize regulatory variants with tissue-specific function by leveraging evidence from functional genomics experiments, including over 3000 functional genomics datasets from the ENCODE project provided in the RegulomeDB database. TURF is able to generate prediction scores at both organism and tissue/organ-specific levels for any non-coding variant on the genome. We show that TURF has an overall top performance in prediction by using validated variants from MPRA experiments. We also demonstrate how TURF can pick out the regulatory variants with tissue-specific function from a candidate list from association studies. Furthermore, we found that various GWAS traits showed enrichment of regulatory variants predicted by TURF scores in the trait-relevant organs, which indicates that these variants can be a valuable source for future studies.

12.

Targeted Resequencing of the Pericentromere of Chromosome 2 Linked to Constitutional Delay of Growth and Puberty

Diana L. Cousminer Jaakko T. Leinonen Antti-Pekka Sarin Himanshu Chheda Ida Surakka Karoliina Wehkalampi Pekka Ellonen Samuli Ripatti Leo Dunkel Aarno Palotie Elisabeth Widén 《PloS one》2015,10(6)

Constitutional delay of growth and puberty (CDGP) is the most common cause of pubertal delay. CDGP is defined as the proportion of the normal population who experience pubertal onset at least 2 SD later than the population mean, representing 2.3% of all adolescents. While adolescents with CDGP spontaneously enter puberty, they are at risk for short stature, decreased bone mineral density, and psychosocial problems. Genetic factors contribute heavily to the timing of puberty, but the vast majority of CDGP cases remain biologically unexplained, and there is no definitive test to distinguish CDGP from pathological absence of puberty during adolescence. Recently, we published a study identifying significant linkage between a locus at the pericentromeric region of chromosome 2 (chr 2) and CDGP in Finnish families. To investigate this region for causal variation, we sequenced chr 2 between the genomic coordinates of 79–124 Mb (genome build GRCh37) in the proband and affected parent of the 13 families contributing most to this linkage signal. One gene, DNAH6, harbored 6 protein-altering low-frequency variants (< 6% in the Finnish population) in 10 of the CDGP probands. We sequenced an additional 135 unrelated Finnish CDGP subjects and utilized the unique Sequencing Initiative Suomi (SISu) population reference exome set to show that while 5 of these variants were present in the CDGP set, they were also present in the Finnish population at similar frequencies. Additional variants in the targeted region could not be prioritized for follow-up, possibly due to gaps in sequencing coverage or lack of functional knowledge of non-genic genomic regions. Thus, despite having a well-characterized sample collection from a genetically homogeneous population with a large population-based reference sequence dataset, we were unable to pinpoint variation in the linked region predisposing delayed puberty. 
This study highlights the difficulties of detecting genetic variants under linkage regions for complex traits and suggests that advancements in annotation of gene function and regulatory regions of the genome will be critical for solving the genetic background of complex phenotypes like CDGP.
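The figure of 2.3% quoted in this abstract for onset at least 2 SD later than the mean corresponds to the upper tail of a normal distribution. A quick check, assuming pubertal timing is approximately normally distributed (an assumption made here for the arithmetic, not a claim from the abstract), with `upper_tail_normal` as an illustrative helper name:

```python
import math


def upper_tail_normal(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Fraction of a normal population lying more than 2 SD above the mean
fraction = upper_tail_normal(2.0)
percent = round(fraction * 100, 1)
```

Under that assumption the tail probability is about 0.0228, which rounds to the 2.3% stated in the abstract.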

13.

Rare and common regulatory variation in population-scale sequenced human genomes

Montgomery SB Lappalainen T Gutierrez-Arcelus M Dermitzakis ET 《PLoS genetics》2011,7(7):e1002144

Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

14.

Inferring short tandem repeat variation from paired-end short reads

Minh Duc Cao Edward Tasker Kai Willadsen Michael Imelfort Sailaja Vishwanathan Sridevi Sureshkumar Sureshkumar Balasubramanian Mikael Bodén 《Nucleic acids research》2014,42(3):e16

The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method’s ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome, which are likely to be polymorphic. In addition, our predicted polymorphic repeats also included the only known repeat expansion in A. thaliana, suggesting an ability to discover potential unstable repeats.

15.

MosaicBase: A Knowledgebase of Postzygotic Mosaic Variants in Noncancer Disease-related and Healthy Human Individuals

《Genomics, Proteomics & Bioinformatics》2020,18(2):140-149

Mosaic variants resulting from postzygotic mutations are prevalent in the human genome and play important roles in human diseases. However, except for cancer-related variants, there is no collection of postzygotic mosaic variants in noncancer disease-related and healthy individuals. Here, we present MosaicBase, a comprehensive database that includes 6698 mosaic variants related to 266 noncancer diseases and 27,991 mosaic variants identified in 422 healthy individuals. Genomic and phenotypic information of each variant was manually extracted and curated from 383 publications. MosaicBase supports the query of variants with Online Mendelian Inheritance in Man (OMIM) entries, genomic coordinates, gene symbols, or Entrez IDs. We also provide an integrated genome browser for users to easily access mosaic variants and their related annotations for any genomic region. By analyzing the variants collected in MosaicBase, we find that mosaic variants that directly contribute to disease phenotype show features distinct from those of variants in individuals with mild or no phenotypes, in terms of their genomic distribution, mutation signatures, and fraction of mutant cells. MosaicBase will not only assist clinicians in genetic counseling and diagnosis but also provide a useful resource to understand the genomic baseline of postzygotic mutations in the general human population. MosaicBase is publicly available at http://mosaicbase.com/ or http://49.4.21.8:8000.

16.

Functional constraint and small insertions and deletions in the ENCODE regions of the human genome

Clark TG Andrew T Cooper GM Margulies EH Mullikin JC Balding DJ 《Genome biology》2007,8(9):R180

Background

We describe the distribution of indels in the 44 Encyclopedia of DNA Elements (ENCODE) regions (about 1% of the human genome) and evaluate the potential contributions of small insertion and deletion polymorphisms (indels) to human genetic variation. We relate indels to known genomic annotation features and measures of evolutionary constraint.

17.

Genomics of Divergence along a Continuum of Parapatric Population Differentiation

Philine G. D. Feulner Frédéric J. J. Chain Mahesh Panchal Yun Huang Christophe Eizaguirre Martin Kalbe Tobias L. Lenz Irene E. Samonte Monika Stoll Erich Bornberg-Bauer Thorsten B. H. Reusch Manfred Milinski 《PLoS genetics》2015,11(2)

The patterns of genomic divergence during ecological speciation are shaped by a combination of evolutionary forces. Processes such as genetic drift, local reduction of gene flow around genes causing reproductive isolation, hitchhiking around selected variants, variation in recombination and mutation rates are all factors that can contribute to the heterogeneity of genomic divergence. On the basis of 60 fully sequenced three-spined stickleback genomes, we explore these different mechanisms explaining the heterogeneity of genomic divergence across five parapatric lake and river population pairs varying in their degree of genetic differentiation. We find that divergent regions of the genome are mostly specific for each population pair, while their size and abundance are not correlated with the extent of genome-wide population differentiation. In each pair-wise comparison, an analysis of allele frequency spectra reveals that 25–55% of the divergent regions are consistent with a local restriction of gene flow. Another large proportion of divergent regions (38–75%) appears to be mainly shaped by hitchhiking effects around positively selected variants. We provide empirical evidence that alternative mechanisms determining the evolution of genomic patterns of divergence are not mutually exclusive, but rather act in concert to shape the genome during population differentiation, a first necessary step towards ecological speciation.

18.

Efficient and Comprehensive Representation of Uniqueness for Next-Generation Sequencing by Minimum Unique Length Analyses

Helena Storvall Daniel Ramsköld Rickard Sandberg 《PloS one》2013,8(1)

19.

MLGA—a rapid and cost-efficient assay for gene copy-number analysis

Magnus Isaksson Johan Stenberg Fredrik Dahl Ann-Charlotte Thuresson Marie-Louise Bondeson Mats Nilsson 《Nucleic acids research》2007,35(17):e115

Structural variation is an important cause of genetic variation. Whole genome analysis techniques can efficiently identify copy-number variable regions but there is a need for targeted methods, to verify and accurately size variable regions, and to diagnose large sample cohorts. We have developed a technique based on multiplex amplification of size-coded selectively circularized genomic fragments, which is robust, cheaper and more rapid than current multiplex targeted copy-number assays.

20.

Genome‐wide variation within and between wild and domestic yak

Kun Wang Quanjun Hu Hui Ma Lizhong Wang Yongzhi Yang Wenchun Luo Qiang Qiu 《Molecular ecology resources》2014,14(4):794-801

The yak is one of the few animals that can thrive in the harsh environment of the Qinghai‐Tibetan Plateau and adjacent Alpine regions. Yak provides essential resources allowing Tibetans to live at high altitudes. However, genetic variation within and between wild and domestic yak remains unknown. Here, we present a genome‐wide study of the genetic variation within and between wild and domestic yak. Using next‐generation sequencing technology, we resequenced three wild and three domestic yak with a mean of fivefold coverage using our published domestic yak genome as a reference. We identified a total of 8.38 million SNPs (7.14 million novel), 383 241 InDels and 126 352 structural variants between the six yak. We observed higher linkage disequilibrium in domestic yak than in wild yak and a modest but distinct genetic divergence between these two groups. We further identified more than a thousand potential selected regions (PSRs) for the three domestic yak by scanning the whole genome. These genomic resources can be further used to study genetic diversity and select superior breeds of yak and other bovid species.