Found 20 similar articles (search time: 15 ms)
1.
Ananyo Choudhury Scott Hazelhurst Ayton Meintjes Ovokeraye Achinike-Oduaran Shaun Aron Junaid Gamieldien Mahjoubeh Jalali Sefid Dashti Nicola Mulder Nicki Tiffin Michèle Ramsay 《BMC genomics》2014,15(1)
Background
Population differentiation is the result of demographic and evolutionary forces. Whole genome datasets from the 1000 Genomes Project (October 2012) provide an unbiased view of genetic variation across populations from Europe, Asia, Africa and the Americas. Common population-specific SNPs (MAF > 0.05) reflect a deep history and may have important consequences for health and wellbeing. Their interpretation is contextualised by currently available genome data.
Results
The identification of common population-specific (CPS) variants (SNPs and SSV) is influenced by admixture and the sample size under investigation. Analysis of nine of the populations in the 1000 Genomes Project (2 African, 2 Asian (including a merged Chinese group) and 5 European) revealed that the African populations (LWK and YRI), followed by the Japanese (JPT), have the highest number of CPS SNPs, in concordance with their histories and given the populations studied. Using two methods, sliding 50-SNP and 5-kb windows, the CPS SNPs showed distinct clustering across large genome segments and little overlap of clusters between populations. iHS enrichment score and population branch statistic (PBS) analyses suggest that selective sweeps are unlikely to account for the clustering and population specificity. Of interest is the association of clusters with nearby recombination hotspots. Functional analysis of genes associated with the CPS SNPs revealed over-representation of genes in pathways associated with neuronal development, including axonal guidance signalling and CREB signalling in neurones.
Conclusions
Common population-specific SNPs are non-randomly distributed throughout the genome and are significantly associated with recombination hotspots. Since the variant alleles of most CPS SNPs are the derived allele, they likely arose in the specific population after a split from a common ancestor. Their proximity to genes involved in specific pathways, including neuronal development, suggests evolutionary plasticity of selected genomic regions. Contrary to expectation, selective sweeps did not play a large role in the persistence of population-specific variation. This suggests a stochastic process towards population-specific variation which reflects demographic histories and may have some interesting implications for health and susceptibility to disease.
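As an illustrative sketch only (not the authors' pipeline), the common population-specific filter and the 50-SNP sliding-window count described in the Results above might look like this in Python. The function names, the rule that the variant allele must be absent from every other population, and the one-SNP window step are assumptions made for the example.

```python
from typing import Dict, List


def is_cps(freqs: Dict[str, float], pop: str, maf_min: float = 0.05) -> bool:
    """Flag a SNP as common and population-specific for `pop`.

    freqs maps population code -> variant allele frequency.
    Assumption: "population-specific" means the variant allele has
    frequency 0 in every other population listed in `freqs`.
    """
    f = freqs[pop]
    common = min(f, 1.0 - f) > maf_min
    private = all(v == 0.0 for p, v in freqs.items() if p != pop)
    return common and private


def window_counts(flags: List[bool], size: int = 50) -> List[int]:
    """Count flagged SNPs in every window of `size` consecutive SNPs.

    Windows slide one SNP at a time (an assumption), so the result has
    len(flags) - size + 1 entries, or none if there are too few SNPs.
    """
    if len(flags) < size:
        return []
    count = sum(flags[:size])
    counts = [count]
    for i in range(size, len(flags)):
        # Add the SNP entering the window, drop the one leaving it.
        count += flags[i] - flags[i - size]
        counts.append(count)
    return counts
```

Windows with unusually high counts would then be candidates for the clusters the abstract mentions; how "unusually high" is judged is left open here.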
The online version of this article (doi:10.1186/1471-2164-15-437) contains supplementary material, which is available to authorized users.
2.
3.
4.
Can Alkan Pinar Kavak Mehmet Somel Omer Gokcumen Serkan Ugurlu Ceren Saygi Elif Dal Kuyas Bugra Tunga Güngör S Cenk Sahinalp Nesrin Özören Cemalettin Bekpen 《BMC genomics》2014,15(1)
Background
Turkey is a crossroads of major population movements throughout history and has been a hotspot of cultural interactions. Several studies have investigated the complex population history of Turkey through a limited set of genetic markers. However, to date, no study has assessed genetic variation at the whole genome level using whole genome sequencing. Here, we present whole genome sequences of 16 Turkish individuals resequenced at high coverage (32× to 48×).
Results
We show that the genetic variation of the contemporary Turkish population clusters with South European populations, as expected, but also shows signatures of relatively recent contribution from ancestral East Asian populations. In addition, we document a significant enrichment of non-synonymous private alleles, consistent with recent observations in European populations. A number of variants associated with skin color and total cholesterol levels show frequency differentiation between the Turkish populations and European populations. Furthermore, we have analyzed the 17q21.31 inversion polymorphism region (MAPT locus) and found an increased allele frequency of 31.25% for the H1/H2 inversion polymorphism, compared with about 25% in European populations.
Conclusion
This study provides the first map of common genetic variation from 16 western Asian individuals and thus helps fill an important geographical gap in analyzing natural human variation and human migration. Our data will help develop population-specific experimental designs for studies investigating disease associations and demographic history in Turkey.
The online version of this article (doi:10.1186/1471-2164-15-963) contains supplementary material, which is available to authorized users.
5.
Jeremy T Howard Christian Maltecca Mekonnen Haile-Mariam Ben J Hayes Jennie E Pryce 《BMC genomics》2015,16(1)
Background
Dairy cattle breeding objectives are in general similar across countries, but environment and management conditions may vary, giving rise to slightly different selection pressures applied to a given trait. This potentially leads to different selection pressures on loci across the genome that, if large enough, may give rise to differential regions with high levels of homozygosity. The objective of this study was to characterize differences and similarities in the location and frequency of homozygosity-related measures of Jersey dairy cows and bulls from the United States (US), Australia (AU) and New Zealand (NZ).
Results
The populations consisted of a subset of genotyped Jersey cows born in US (n = 1047) and AU (n = 886) and Jersey bulls progeny tested from the US (n = 736), AU (n = 306) and NZ (n = 768). Differences and similarities across populations were characterized using a principal component analysis (PCA) and a run of homozygosity (ROH) statistic (ROH45), which counts the frequency of a single nucleotide polymorphism (SNP) being in a ROH of at least 45 SNPs. Regions that exhibited high frequencies of ROH45 and those that had significantly different ROH45 frequencies between populations were investigated for their association with milk yield traits. Within sex, the PCA revealed slight differentiation between the populations, with the greatest occurring between the US and NZ bulls. Regions with high levels of ROH45 for all populations were detected on BTA3 and BTA7 while several other regions differed in ROH45 frequency across populations, the largest number occurring for the US and NZ bull contrast. In addition, multiple regions with different ROH45 frequencies across populations were found to be associated with milk yield traits.
Conclusion
Multiple regions exhibited differential ROH45 across AU, NZ and US cow and bull populations; one interpretation is that these locations of the genome are undergoing differential directional selection. Two regions on BTA3 and BTA7 had high ROH45 frequencies across all populations and will be investigated further to determine the gene(s) undergoing directional selection.
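The ROH45 statistic described in the Results above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded 0/1/2 along one chromosome, any heterozygous call (1) is assumed to end a run, missing calls are not handled, and the function names are hypothetical.

```python
from typing import List


def roh_membership(genotypes: List[int], min_snps: int = 45) -> List[bool]:
    """Mark SNPs lying in a run of >= min_snps consecutive homozygous calls.

    genotypes: one animal's calls along a chromosome, coded 0/1/2
    (copies of one allele); a call of 1 is heterozygous and ends a run.
    """
    in_roh = [False] * len(genotypes)
    start = 0  # index where the current homozygous run began
    for i in range(len(genotypes) + 1):
        # A run ends at a heterozygous call or at the end of the chromosome.
        if i == len(genotypes) or genotypes[i] == 1:
            if i - start >= min_snps:
                for j in range(start, i):
                    in_roh[j] = True
            start = i + 1
    return in_roh


def roh_frequency(animals: List[List[int]], min_snps: int = 45) -> List[float]:
    """Per-SNP fraction of animals whose SNP falls inside a qualifying run."""
    masks = [roh_membership(g, min_snps) for g in animals]
    n = len(animals)
    return [sum(column) / n for column in zip(*masks)]
```

Computing `roh_frequency` separately for each population and comparing the per-SNP values is one way the between-population contrasts mentioned above could be set up.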
The online version of this article (doi:10.1186/s12864-015-1352-4) contains supplementary material, which is available to authorized users.
6.
Bing Guo Paul L Greenwood Linda M Cafe Guanghong Zhou Wangang Zhang Brian P Dalrymple 《BMC genomics》2015,16(1)
Background
This study aimed to identify markers for muscle growth rate and the different cellular contributors to cattle muscle and to link the muscle growth rate markers to specific cell types.
Results
The expression of two groups of genes in the longissimus muscle (LM) of 48 Brahman steers of similar age, significantly enriched for “cell cycle” and “ECM (extracellular matrix) organization” Gene Ontology (GO) terms, was correlated with average daily gain/kg liveweight (ADG/kg) of the animals. However, expression of the same genes was only partly related to growth rate across a time course of postnatal LM development in two cattle genotypes, Piedmontese x Hereford (high muscling) and Wagyu x Hereford (high marbling). The deposition of intramuscular fat (IMF) altered the relationship between the expression of these genes and growth rate. K-means clustering across the development time course with a large set of genes (5,596) with similar expression profiles to the ECM genes was undertaken. The locations in the clusters of published markers of different cell types in muscle were identified and used to link clusters of genes to the cell type most likely to be expressing them. Overall correspondence between published cell type expression of markers and predicted major cell types of expression in cattle LM was high. However, some exceptions were identified: expression of SOX8, previously attributed to muscle satellite cells, was correlated with angiogenesis. Analysis of the clusters and cell types suggested that the “cell cycle” and “ECM” signals were from the fibro/adipogenic lineage. Significant contributions to these signals from the muscle satellite cells, angiogenic cells and adipocytes themselves were not as strongly supported. Based on the clusters and cell type markers, sets of five genes predicted to be representative of fibro/adipogenic precursors (FAPs) and endothelial cells, and/or ECM remodelling and angiogenesis were identified.
Conclusions
Gene sets and gene markers for the analysis of many of the major processes/cell populations contributing to muscle composition and growth have been proposed, enabling a consistent interpretation of gene expression datasets from cattle LM. The same gene sets are likely to be applicable in other cattle muscles and in other species.
The online version of this article (doi:10.1186/s12864-015-1403-x) contains supplementary material, which is available to authorized users.
7.
Background
Top-down mass spectrometry plays an important role in intact protein identification and characterization. Top-down mass spectra are more complex than bottom-up mass spectra because they often contain many isotopomer envelopes from highly charged ions, which may overlap with one another. As a result, spectral deconvolution, which converts a complex top-down mass spectrum into a monoisotopic mass list, is a key step in top-down spectral interpretation.
Results
In this paper, we propose a new scoring function, L-score, for evaluating isotopomer envelopes. By combining L-score with MS-Deconv, a new software tool, MS-Deconv+, was developed for top-down spectral deconvolution. Experimental results showed that MS-Deconv+ outperformed existing software tools in top-down spectral deconvolution.
Conclusions
L-score shows high discriminative ability in identification of isotopomer envelopes. Using L-score, MS-Deconv+ reports many correct monoisotopic masses missed by other software tools, which are valuable for proteoform identification and characterization.
The online version of this article (doi:10.1186/1471-2164-15-1140) contains supplementary material, which is available to authorized users.
8.
Arianne C Richard Paul A Lyons James E Peters Daniele Biasci Shaun M Flint James C Lee Eoin F McKinney Richard M Siegel Kenneth GC Smith 《BMC genomics》2014,15(1)
Background
Although numerous investigations have compared gene expression microarray platforms, preprocessing methods and batch correction algorithms using constructed spike-in or dilution datasets, there remains a paucity of studies examining the properties of microarray data using diverse biological samples. Most microarray experiments seek to identify subtle differences between samples with variable background noise, a scenario poorly represented by constructed datasets. Thus, microarray users lack important information regarding the complexities introduced in real-world experimental settings. The recent development of a multiplexed, digital technology for nucleic acid measurement enables counting of individual RNA molecules without amplification and, for the first time, permits such a study.
Results
Using a set of human leukocyte subset RNA samples, we compared previously acquired microarray expression values with RNA molecule counts determined by the nCounter Analysis System (NanoString Technologies) in selected genes. We found that gene measurements across samples correlated well between the two platforms, particularly for high-variance genes, while genes deemed unexpressed by the nCounter generally had both low expression and low variance on the microarray. Confirming previous findings from spike-in and dilution datasets, this “gold-standard” comparison demonstrated signal compression that varied dramatically by expression level and, to a lesser extent, by dataset. Most importantly, examination of three different cell types revealed that noise levels differed across tissues.
Conclusions
Microarray measurements generally correlate with relative RNA molecule counts within optimal ranges but suffer from expression-dependent accuracy bias and precision that varies across datasets. We urge microarray users to consider expression-level effects in signal interpretation and to evaluate noise properties in each dataset independently.
The online version of this article (doi:10.1186/1471-2164-15-649) contains supplementary material, which is available to authorized users.
9.
10.
Background
The Tibetan pig is one of the domestic animals indigenous to the Qinghai-Tibet Plateau. Several geographically isolated pig populations are distributed throughout the Plateau. It remains an open question whether these populations have experienced different demographic histories and have evolved independent adaptive loci for the harsh environment of the Plateau. To address these questions, we investigated ~40,000 genetic variants across the pig genome in a broad panel of 678 individuals from 5 Tibetan geographic populations and 34 lowland breeds.
Results
Using a series of population genetic analyses, we show that Tibetan pig populations have marked genetic differentiation. Tibetan pigs appear to form 3 independent populations corresponding to the Tibetan, Gansu and Sichuan & Yunnan locations. Each population is more genetically similar to its geographic neighbors than to any of the other Tibetan populations. By applying a locus-specific branch length test, we identified both population-specific and -shared candidate genes under selection in Tibetan pigs. These genes, such as PLA2G12A, RGCC, C9ORF3, GRIN2B, GRID1 and EPAS1, are involved in high-altitude physiology including angiogenesis, pulmonary hypertension, oxygen intake, defense response and erythropoiesis. A majority of these genes have not been implicated in previous studies of highlanders and high-altitude animals.
Conclusion
Tibetan pig populations have experienced substantial genetic differentiation. Historically, Tibetan pigs likely had admixture with neighboring lowland breeds. During the long history of colonization in the Plateau, Tibetan pigs have developed a complex biological adaptation mechanism that could be different from that of Tibetans and other animals. Different Tibetan pig populations appear to have both distinct and convergent adaptive loci for the harsh environment of the Plateau.
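The abstract above does not spell out how the locus-specific branch length (LSBL) test was computed. As a hedged sketch, the usual three-population form is LSBL_A = (d_AB + d_AC - d_BC) / 2, with pairwise Fst as the distance d. The simple two-population Fst estimator below, the function names and the choice of comparison populations are all assumptions made for illustration.

```python
def fst(p1: float, p2: float) -> float:
    """Simple two-population Fst from allele frequencies: (H_T - H_S) / H_T.

    Assumption: equal weighting of the two populations and no
    sample-size correction; published analyses often use other estimators.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled sample
    if h_t == 0:
        return 0.0  # locus is monomorphic in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t


def lsbl(p_a: float, p_b: float, p_c: float) -> float:
    """Branch length leading to population A at one locus.

    Large values mean A's allele frequency has diverged from both B and C.
    """
    return (fst(p_a, p_b) + fst(p_a, p_c) - fst(p_b, p_c)) / 2
```

For example, `lsbl(0.9, 0.1, 0.1)` assigns the whole divergence to population A, while `lsbl(0.1, 0.9, 0.1)` assigns none to it, because there the change lies on B's branch.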
The online version of this article (doi:10.1186/1471-2164-15-834) contains supplementary material, which is available to authorized users.
11.
Ana M Perez O’Brien Daniela Höller Solomon A Boison Marco Milanesi Lorenzo Bomba Yuri T Utsunomiya Roberto Carvalheiro Haroldo HR Neves Marcos VB da Silva Curtis P VanTassell Tad S Sonstegard Gábor Mészáros Paolo Ajmone-Marsan Fernando Garcia Johann Sölkner 《Genetics Selection Evolution》2015,47(1)
Background
Nelore and Gir are the two most important indicine cattle breeds for production of beef and milk in Brazil. Historical records state that these breeds were introduced in Brazil from the Indian subcontinent, crossed to local taurine cattle in order to quickly increase the population size, and then backcrossed to the original breeds to recover indicine adaptive and productive traits. Previous investigations based on sparse DNA markers detected taurine admixture in these breeds. High-density genome-wide analyses can provide high-resolution information on the genetic composition of current Nelore and Gir populations, estimate more precisely the levels and nature of taurine introgression, and shed light on their history and the strategies that were used to expand these breeds.
Results
We used the high-density Illumina BovineHD BeadChip with more than 777 K single nucleotide polymorphisms (SNPs) that were reduced to 697 115 after quality control filtering to investigate the structure of Nelore and Gir populations and seven other worldwide populations for comparison. Multidimensional scaling and model-based ancestry estimation clearly separated the indicine, European taurine and African taurine ancestries. The average level of taurine introgression in the autosomal genome of Nelore and Gir breeds was less than 1% but was 9% for the Brahman breed. Analyses based on the mitochondrial SNPs present in the Illumina BovineHD BeadChip did not clearly differentiate taurine and indicine haplotype groupings.
Conclusions
The low level of taurine ancestry observed for both Nelore and Gir breeds confirms the historical records of crossbreeding and supports a strong directional selection against taurine haplotypes via backcrossing. Random sampling in production herds across the country and subsequent genotyping would be useful for a more complete view of the admixture levels in the commercial Nelore and Gir populations.
The online version of this article (doi:10.1186/s12711-015-0109-5) contains supplementary material, which is available to authorized users.
12.
Shengjie Yang Yiyuan Liu Ning Jiang Jing Chen Lindsey Leach Zewei Luo Minghui Wang 《BMC genomics》2014,15(1)
Background
While the possible sources underlying the so-called ‘missing heritability’ evident in current genome-wide association studies (GWAS) of complex traits have been actively pursued in recent years, resolving this mystery remains a challenging task. Studying heritability of genome-wide gene expression traits can shed light on the goal of understanding the relationship between phenotype and genotype. Here we used microarray gene expression measurements of lymphoblastoid cell lines and genome-wide SNP genotype data from 210 HapMap individuals to examine the heritability of gene expression traits.
Results
Heritability levels for expression of 10,720 genes were estimated by applying variance component model analyses and 1,043 expression quantitative trait loci (eQTLs) were detected. Our results indicate that gene expression traits display a bimodal distribution of heritability, with one peak close to 0% and the other approaching 100%. Such a pattern of the within-population variability of gene expression heritability is common among different HapMap populations of unrelated individuals but different from that obtained in the CEU and YRI trio samples. Higher heritability levels are shown by housekeeping genes and genes associated with cis eQTLs. Both cis and trans eQTLs make comparable cumulative contributions to the heritability. Finally, we modelled gene-gene interactions (epistasis) for genes with multiple eQTLs and revealed that epistasis was not prevalent in all genes but made a substantial contribution in explaining total heritability for some genes analysed.
Conclusions
We utilised a mixed effect model analysis for estimating genetic components from population based samples. On the basis of analyses of genome-wide gene expression from four HapMap populations, we demonstrated detailed exploitation of the distribution of genetic heritabilities for expression traits from different populations, and highlighted the importance of studying interaction at the gene expression level as an important source of variation underlying missing heritability.
The online version of this article (doi:10.1186/1471-2164-15-13) contains supplementary material, which is available to authorized users.
13.
Background
In invertebrates, genes belonging to dynamically regulated functional categories appear to be less methylated than “housekeeping” genes, suggesting that DNA methylation may modulate gene expression plasticity. To date, however, experimental evidence to support this hypothesis across different natural habitats has been lacking.
Results
Gene expression profiles were generated from 30 pairs of genetically identical fragments of the coral Acropora millepora reciprocally transplanted between distinct natural habitats for 3 months. Gene expression was analyzed in the context of normalized CpG content, a well-established signature of historical germline DNA methylation. Genes with weak methylation signatures were more likely to demonstrate differential expression based on both transplant environment and population of origin than genes with strong methylation signatures. Moreover, the magnitude of expression differences due to environment and population was greater for genes with weak methylation signatures.
Conclusions
Our results support a connection between differential germline methylation and gene expression flexibility across environments and populations. Studies of phylogenetically basal invertebrates such as corals will further elucidate the fundamental functional aspects of gene body methylation in Metazoa.
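Normalized CpG content, mentioned in the Results above, is commonly computed as an observed/expected ratio (CpG o/e). The sketch below uses one widely used form, (number of CpG dinucleotides × sequence length) / (number of C × number of G); whether the authors used exactly this normalization is an assumption, and the function name is hypothetical. Because methylated cytosines in CpG context tend to mutate away over evolutionary time, a low ratio is usually read as a strong historical methylation signature and a high ratio as a weak one.

```python
def cpg_oe(seq: str) -> float:
    """Observed/expected CpG ratio for one gene sequence.

    Uses (N_CpG * L) / (N_C * N_G). Returns 0.0 when the sequence
    contains no C or no G, where the ratio is undefined.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    n_cpg = s.count("CG")
    return n_cpg * len(s) / (n_c * n_g)
```

Genes could then be grouped by this ratio (for example, below versus above some threshold) before comparing how often each group shows differential expression; the threshold itself would have to come from the data.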
The online version of this article (doi:10.1186/1471-2164-15-1109) contains supplementary material, which is available to authorized users.
14.
Michael A Quail Miriam Smith David Jackson Steven Leonard Thomas Skelly Harold P Swerdlow Yong Gu Peter Ellis 《BMC genomics》2014,15(1)
Background
A minor but significant fraction of samples subjected to next-generation sequencing methods are either mixed up or cross-contaminated. These events can lead to false or inconclusive results. We have therefore developed SASI-Seq, a process whereby a set of uniquely barcoded DNA fragments are added to samples destined for sequencing. From the final sequencing data, one can verify that all the reads derive from the original sample(s) and not from contaminants or other samples.
Results
By adding a mixture of three uniquely barcoded amplicons, of different sizes spanning the range of insert sizes one would normally use for Illumina sequencing, at a spike-in level of approximately 0.1%, we demonstrate that these fragments remain intimately associated with the sample. They can be detected following even the tightest size selection regimes or exome enrichment and can report the occurrence of sample mix-ups and cross-contamination.
As a consequence of this work, we have designed a set of 384 eleven-base Illumina barcode sequences that are at least 5 changes apart from each other, allowing for single-error correction and very low levels of barcode misallocation due to sequencing error.
Conclusion
SASI-Seq is a simple, inexpensive and flexible tool that enables sample assurance, allows deconvolution of sample mix-ups and reports levels of cross-contamination between samples throughout NGS workflows.
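The barcode property described in the Results above (every pair at least 5 changes apart, allowing single-error correction) can be checked and used roughly as follows. This is a sketch under assumptions: "changes" is interpreted as substitutions only (Hamming distance), the three toy barcodes are invented for the example and are not from the published set, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(barcodes: List[str]) -> int:
    """Smallest pairwise Hamming distance in the barcode set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign(read: str, barcodes: List[str], max_errors: int = 1) -> Optional[str]:
    """Return the single barcode within max_errors substitutions of `read`.

    Returns None when no barcode, or more than one, is that close.
    """
    hits = [b for b in barcodes if hamming(read, b) <= max_errors]
    return hits[0] if len(hits) == 1 else None


# Toy eleven-base barcodes (hypothetical), pairwise distance >= 5.
TOY_BARCODES = ["AAAAAAAAAAA", "CCCCCAAAAAA", "AAAAAACCCCC"]
```

With a minimum distance of 5, a read carrying one substitution is still at least 4 changes from every other barcode, so `assign` recovers it unambiguously; a read with two substitutions is rejected rather than given to the wrong sample.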
The online version of this article (doi:10.1186/1471-2164-15-110) contains supplementary material, which is available to authorized users.
15.
16.
17.
18.
19.
Wenlian Qiao Gerald Quon Elizabeth Csaszar Mei Yu Quaid Morris Peter W. Zandstra 《PLoS computational biology》2012,8(12)
The cellular composition of heterogeneous samples can be predicted using an expression deconvolution algorithm to decompose their gene expression profiles based on pre-defined, reference gene expression profiles of the constituent populations in these samples. However, the expression profiles of the actual constituent populations are often perturbed from those of the reference profiles due to gene expression changes in cells associated with microenvironmental or developmental effects. Existing deconvolution algorithms do not account for these changes and give incorrect results when benchmarked against those measured by well-established flow cytometry, even after batch correction was applied. We introduce PERT, a new probabilistic expression deconvolution method that detects and accounts for a shared, multiplicative perturbation in the reference profiles when performing expression deconvolution. We applied PERT and three other state-of-the-art expression deconvolution methods to predict cell frequencies within heterogeneous human blood samples that were collected under several conditions (uncultured mono-nucleated and lineage-depleted cells, and culture-derived lineage-depleted cells). Only PERT's predicted proportions of the constituent populations matched those assigned by flow cytometry. Genes associated with cell cycle processes were highly enriched among those with the largest predicted expression changes between the cultured and uncultured conditions. We anticipate that PERT will be widely applicable to expression deconvolution strategies that use profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular phenotypic identity.
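To make the baseline idea concrete, here is a minimal sketch of plain reference-based expression deconvolution, the kind of method the abstract says PERT improves on. It is not PERT and does not model the multiplicative perturbation. It fits non-negative weights by least squares using multiplicative updates and then rescales them to proportions. The function name, the toy profiles and the iteration count are assumptions.

```python
from typing import List


def deconvolve(mixture: List[float], references: List[List[float]],
               iters: int = 500) -> List[float]:
    """Estimate cell-type proportions in a mixed expression profile.

    mixture: expression per gene in the heterogeneous sample.
    references: one non-negative expression profile per cell type,
    each the same length as `mixture`.
    Minimises ||sum_k w_k * references[k] - mixture||^2 subject to
    w >= 0 with multiplicative updates, then normalises w to sum to 1.
    """
    k = len(references)
    w = [1.0 / k] * k
    # Precompute R^T m and R^T R, which are all the updates need.
    rtm = [sum(r * m for r, m in zip(ref, mixture)) for ref in references]
    rtr = [[sum(a * b for a, b in zip(ri, rj)) for rj in references]
           for ri in references]
    for _ in range(iters):
        w = [w[i] * rtm[i] / max(sum(rtr[i][j] * w[j] for j in range(k)), 1e-12)
             for i in range(k)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


# Toy example: two hypothetical cell types mixed 30% / 70%.
REFS = [[10.0, 0.0, 5.0], [0.0, 8.0, 5.0]]
MIX = [0.3 * a + 0.7 * b for a, b in zip(REFS[0], REFS[1])]
```

On this toy input `deconvolve(MIX, REFS)` recovers proportions close to 0.3 and 0.7. If the true constituent profiles differ from `REFS`, as the abstract describes for cultured cells, the same procedure would return biased proportions, which is the failure mode PERT is meant to address.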
20.
Xianxian Liu Xinwei Xiong Jie Yang Lisheng Zhou Bin Yang Huashui Ai Huanban Ma Xianhua Xie Yixuan Huang Shaoming Fang Shijun Xiao Jun Ren Junwu Ma Lusheng Huang 《Genetics Selection Evolution》2015,47(1)