Similar articles
 20 similar articles found; search time 15 ms
1.
One of the main caveats of association studies is possible bias due to population stratification. Existing methods rely on model-based approaches such as STRUCTURE and ADMIXTURE or on principal component analysis, as in EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms in order to identify homogeneous subgroups or communities, which can further be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, we show in simulations that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches.
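As a rough illustration of the graph view described in this abstract, the sketch below groups individuals by the connected components of a thresholded similarity graph. This is a deliberately simplified stand-in, not the authors' triad construction or their community detection algorithm; the similarity measure, the `threshold` value, and the toy genotypes are all assumptions made for illustration.

```python
from itertools import combinations


def similarity(a, b):
    """Fraction of sites at which two genotype vectors agree (hypothetical measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def communities(genotypes, threshold=0.8):
    """Group individuals by connected components of a thresholded similarity graph."""
    names = list(genotypes)
    neighbors = {name: set() for name in names}
    # Connect every pair whose similarity reaches the (assumed) threshold.
    for u, v in combinations(names, 2):
        if similarity(genotypes[u], genotypes[v]) >= threshold:
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen, groups = set(), []
    for start in names:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy genotypes coded as 0/1/2 copies of an allele (made-up data).
toy = {
    "a1": [0, 0, 1, 2, 0], "a2": [0, 0, 1, 2, 1],
    "b1": [2, 1, 0, 0, 2], "b2": [2, 1, 0, 0, 2],
}
print(communities(toy))
```

The resulting group labels could then be encoded as categorical covariates in a regression model, which is the role the abstract assigns to the detected communities.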

2.
The prediction of phenotypic traits using high-density genomic data has many applications, such as the selection of plants and animals of commercial interest, and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.
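The abstract does not say which FST estimator was used. As a minimal sketch, the function below computes a textbook Wright-style FST between two populations from per-locus allele frequencies, as a ratio of sums over biallelic loci, assuming equal population sizes. The function name and inputs are hypothetical.

```python
def fst(freqs_a, freqs_b):
    """Wright-style FST between two populations from per-locus allele frequencies.

    Assumes biallelic loci and equally weighted populations (an illustrative
    simplification, not necessarily the estimator used in the paper).
    """
    numerator = denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        h_total = 2 * p_bar * (1 - p_bar)                         # pooled heterozygosity
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2    # mean within-population
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


print(fst([0.2, 0.8], [0.2, 0.8]))  # identical populations
print(fst([1.0], [0.0]))            # fixed for opposite alleles
```

Under the relationship reported above, a training population with a larger FST to the target would be expected to give a proportionally lower predictive correlation.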

3.
Human-altered environments often challenge native species with a complex spatial distribution of resources. Hostile landscape features can inhibit animal movement (i.e., genetic exchange), while other landscape attributes facilitate gene flow. The genetic attributes of organisms inhabiting such complex environments can reveal the legacy of their movements through the landscape. Thus, by evaluating landscape attributes within the context of genetic connectivity of organisms within the landscape, we can elucidate how a species has coped with the enhanced complexity of human-altered environments. In this research, we utilized genetic data from eastern chipmunks (Tamias striatus) in conjunction with spatially explicit habitat attribute data to evaluate the realized permeability of various landscape elements in a fragmented agricultural ecosystem. To accomplish this we 1) used logistic regression to evaluate whether land cover attributes were most often associated with the matrix between or habitat within genetically identified populations across the landscape, and 2) utilized spatially explicit habitat attribute data to predict genetically derived Bayesian probabilities of population membership of individual chipmunks in an agricultural ecosystem. Consistency between the results of the two approaches with regard to facilitators and inhibitors of gene flow in the landscape indicates that this is a promising new way to utilize both landscape and genetic data to gain a deeper understanding of human-altered ecosystems.

4.
In the context of genetics and breeding research on multiple phenotypic traits, reconstructing the directional or causal structure between phenotypic traits is a prerequisite for quantifying the effects of genetic interventions on the traits. Current approaches mainly exploit the genetic effects at quantitative trait loci (QTLs) to learn about causal relationships among phenotypic traits. A requirement for using these approaches is that at least one unique QTL has been identified for each trait studied. However, in practice, especially for molecular phenotypes such as metabolites, this prerequisite is often not met due to limited sample sizes, high noise levels and small QTL effects. Here, we present a novel heuristic search algorithm called the QTL+phenotype supervised orientation (QPSO) algorithm to infer causal directions for edges in undirected phenotype networks. The two main advantages of this algorithm are: first, it does not require QTLs for each and every trait; second, it takes into account associated phenotypic interactions in addition to detected QTLs when orienting undirected edges between traits. We evaluate and compare the performance of QPSO with another state-of-the-art approach, the QTL-directed dependency graph (QDG) algorithm. Simulation results show that our method has broader applicability and leads to more accurate overall orientations. We also illustrate our method with a real-life example involving 24 metabolites and a few major QTLs measured on an association panel of 93 tomato cultivars. Matlab source code implementing the proposed algorithm is freely available upon request.

5.
6.
7.

Background

Exactly how human tumors grow is uncertain because serial observations are impractical. One approach to reconstructing the histories of individual human cancers is to analyze the current genomic variation among their cells. The greater the variation, on average, the greater the time since the last clonal evolution cycle (“a molecular clock hypothesis”). Here we analyze passenger DNA methylation patterns from opposite sides of 12 primary human colorectal cancers (CRCs) to evaluate whether the variation (pairwise distances between epialleles) is consistent with a single clonal expansion after transformation.

Methodology/Principal Findings

Data from 12 primary CRCs are compared to epigenomic data simulated under a single clonal expansion for a variety of possible growth scenarios. We find that for many different growth rates, a single clonal expansion can explain the population variation in 11 out of 12 CRCs. In eight CRCs, the cells from different glands are all equally distantly related, and cells sampled from the same tumor half appear no more closely related than cells sampled from opposite tumor halves. In these tumors, growth appears consistent with a single “symmetric” clonal expansion. In three CRCs, the variation in epigenetic distances was different between sides, but this asymmetry could be explained by a single clonal expansion with one region of a tumor having undergone more cell divisions than the other. The variation in one CRC was complex and inconsistent with a simple single clonal expansion.

Conclusions

Rather than a series of clonal expansions after transformation, these results suggest that the epigenetic variation of present-day cancer cells in primary CRCs can almost always be explained by a single clonal expansion.
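The comparison of "pairwise distances between epialleles" within and between tumor halves can be sketched as follows. This assumes each epiallele is a string of 0/1 methylation states at the same CpG sites and uses Hamming distance. The encoding, the function names, and the toy patterns are assumptions for illustration, not the study's data or pipeline.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of CpG sites at which two epialleles differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(patterns):
    """Mean pairwise distance among epialleles sampled from one tumor half."""
    pairs = list(combinations(patterns, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(left, right):
    """Mean pairwise distance between epialleles from opposite tumor halves."""
    pairs = list(product(left, right))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Made-up methylation patterns (1 = methylated, 0 = unmethylated).
left_half = ["1100", "1000", "1110"]
right_half = ["0011", "0111", "0001"]

print(mean_within(left_half), mean_within(right_half))
print(mean_between(left_half, right_half))
```

In the "symmetric" tumors described above, the within-half and between-half means would be expected to be similar; the toy data here are deliberately asymmetric to make the two quantities easy to tell apart.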

8.
We describe a new way to develop evidence of causes of biological effects using field-based species sensitivity distributions (SSDs) and show how evidence can be compared when genera or effect endpoints are different among potentially causal agents. To evaluate if a cause is sufficient to elicit an effect, we developed a general SSD. A cause was judged sufficient if the intensity of the stressor at the site predicted the observed proportion of extirpation. To evaluate if an effect is specific to a cause, we developed site-specific SSDs using field-based effect levels of genera occurring in the locality of the study. An effect was judged specific to a cause if susceptible genera were absent and tolerant genera were present. Field-based SSDs were used to assess nutrients and conductivity. Other associations were used to assess metals, sediment, dissolved oxygen, and temperature. A case study at Pigeon Roost Creek, Tennessee, USA, illustrates how the SSDs are used to infer multiple causes. A weight-of-evidence analysis identified nutrients and sediment as probable causes, but another unidentified agent appears to be acting as well. This inferential approach has broad application and the causal models for conductivity, nutrients, and deposited sediment can be used at other locations.

9.
Although there are many methods available for inferring copy-number variants (CNVs) from next-generation sequence data, there remains a need for a system that is computationally efficient but that retains good sensitivity and specificity across all types of CNVs. Here, we introduce a new method, estimation by read depth with single-nucleotide variants (ERDS), and use various approaches to compare its performance to other methods. We found that for common CNVs and high-coverage genomes, ERDS performs as well as the best method currently available (Genome STRiP), whereas for rare CNVs and high-coverage genomes, ERDS performs better than any available method. Importantly, ERDS accommodates both unique and highly amplified regions of the genome and does so without requiring separate alignments for calling CNVs and other variants. These comparisons show that for genomes sequenced at high coverage, ERDS provides a computationally convenient method that calls CNVs as well as or better than any currently available method.

10.
We analyzed the tissue carbon, nitrogen, and sulfur stable isotope contents of macrofaunal communities associated with vestimentiferan tubeworms and bathymodiolin mussels from the Gulf of Mexico lower continental slope (970-2800 m). Shrimp in the genus Alvinocaris associated with vestimentiferans from shallow (530 m) and deep (1400-2800 m) sites were used to test the hypothesis that seep animals derive a greater proportion of their nutrition from seeps (i.e. a lower proportion from the surface) at greater depths. To account for spatial variability in the inorganic source pool, we used the differences between the mean tissue δ13C and δ15N of the shrimp in each collection and the mean δ13C and δ15N values of the vestimentiferans from the same collection, since vestimentiferans are functionally autotrophic and serve as a baseline for environmental isotopic variation. There was a significant negative relationship between this difference and depth for both δ13C and δ15N (p = 0.02 and 0.007, respectively), which supports the hypothesis of higher dependence on seep nutrition with depth. The small polychaete worm Protomystides sp. was hypothesized to be a blood parasite of the vestimentiferan Escarpia laminata. There was a highly significant linear relationship between the δ13C values of Protomystides sp. and the E. laminata individuals to which they were attached across all collections (p < 0.001) and within a single collection (p = 0.01), although this relationship was not significant for δ15N and δ34S. We made several other qualitative inferences with respect to the feeding biology of the taxa occurring in these lower slope seeps, some of which have not been described prior to this study.

11.
12.
The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse—which today is reflected by shorter, older ancestry tracts—consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse—reflected by longer, younger tracts—is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. 
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.

13.
Inter-simple sequence repeat (ISSR) markers were used to analyze genetic diversity and relatedness of 15 germplasms of Fagopyrum tataricum. Samples representing 75 individuals were collected from a range of altitudes in the Western Himalaya. The 13 ISSR primers revealed 98.1% polymorphism among populations, whereas average polymorphism was extremely low (2.18%) within populations. The coefficient of population differentiation was 0.9750, with limited gene flow (Nm) of 0.0128. The average PIC value of the ISSR markers was high (0.812), with a marker ratio of 0.65 and marker index of 6.66. The genetic diversity of F. tataricum was significantly correlated with altitude and gene diversity, Shannon’s index, and the percentage of polymorphic bands. The genetic diversity among populations showed a broad genetic base and provided a developmental strategy for crop improvement.
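The abstract does not state how gene flow was estimated from the differentiation coefficient. One commonly used estimator, Nm = 0.5 (1 - Gst) / Gst, happens to reproduce the reported figure, so the sketch below uses it as an assumption rather than as the authors' confirmed method. The function name is hypothetical.

```python
def gene_flow(gst):
    """Estimate Nm from Gst using Nm = 0.5 * (1 - Gst) / Gst (assumed estimator)."""
    if not 0 < gst <= 1:
        raise ValueError("Gst must lie in (0, 1]")
    return 0.5 * (1 - gst) / gst


# Coefficient of population differentiation reported in the abstract.
print(round(gene_flow(0.9750), 4))  # 0.0128
```

Values of Nm far below 1, as here, are conventionally read as very limited migration between populations, which matches the abstract's description of limited gene flow.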

14.
Ecological risk assessments of chemicals are often based on simple measurements of toxicity in individuals. However, the protection goals are often set at the population and community levels. Population models may be a useful tool to extrapolate from individual-level measurements to population-level endpoints. In the present study, the population growth rate (λ) was calculated for three sets of full life-cycle data (Tetranychus urticae exposed to agrimek, and Daphnia pulex exposed to spinosad and diazinon). The results were compared to λ from population models, where survival and/or reproduction were adjusted according to 4 d of data from the same life-cycle data. This was done to determine whether truncated demographic data can give results similar to those obtained with full life-cycle data. The resulting correlations were strong when both effects on survival and reproduction were included in the model (p < .001, 0.93 < R² < 1.00). There were also strong correlations in several cases when only effects on survival or reproduction were considered, although the total risk to the population tended to be underestimated. The results of the present study show that population models can be useful to extrapolate truncated data on the individual level to more ecologically relevant population-level endpoints.
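The abstract does not specify the population model. A common way to obtain a population growth rate λ from age-specific survival and reproduction is as the dominant eigenvalue of a Leslie matrix. The sketch below does this by power iteration, with made-up vital rates; the function name, the age-class structure, and the numbers are all illustrative assumptions.

```python
def growth_rate(fecundity, survival, iterations=1000):
    """Dominant eigenvalue (lambda) of a Leslie matrix, found by power iteration.

    fecundity[i]: offspring per individual in age class i per time step.
    survival[i]:  probability of surviving from age class i to i + 1
                  (one entry fewer than fecundity).
    """
    population = [1.0] * len(fecundity)
    lam = 0.0
    for _ in range(iterations):
        newborns = sum(f * n for f, n in zip(fecundity, population))
        projected = [newborns] + [s * n for s, n in zip(survival, population[:-1])]
        total = sum(projected)
        if total == 0:
            return 0.0
        lam = total / sum(population)
        population = [n / total for n in projected]  # renormalize to avoid overflow
    return lam


control = growth_rate([1.0, 2.0], [0.5])
exposed = growth_rate([1.0, 2.0], [0.25])  # hypothetical toxicant halves survival
print(control, exposed)
```

Adjusting only `survival` or only `fecundity` in such a model mirrors the comparison in the abstract, where considering one effect alone tended to underestimate the total risk to the population.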

15.
Computational gene regulation models provide a means for scientists to draw biological inferences from time-course gene expression data. Based on the state-space approach, we developed a new modeling tool for inferring gene regulatory networks, called time-delayed Gene Regulatory Network (tdGRN). tdGRN takes time-delayed regulatory relationships into consideration when developing the model. In addition, a priori biological knowledge from genome-wide location analysis is incorporated into the structure of the gene regulatory network. tdGRN is evaluated on both an artificial dataset and a published gene expression data set. It not only determines regulatory relationships that are known to exist but also uncovers potential new ones. The results indicate that the proposed tool is effective in inferring gene regulatory relationships with time delay. tdGRN is complementary to existing methods for inferring gene regulatory networks. The novel part of the proposed tool is that it is able to infer time-delayed regulatory relationships.

16.
Comparative genome-scale analyses of protein-coding gene sequences are employed to examine evidence for whole-genome duplication and horizontal gene transfer. For this purpose, an orthogroup should be delineated to infer evolutionary history regarding each gene, and results of all orthogroup analyses need to be integrated to infer a genome-scale history. An orthogroup is a set of genes descended from a single gene in the last common ancestor of all species under consideration. However, such analyses confront several problems: 1) Analytical pipelines to infer all gene histories with methods comparing species and gene trees are not fully developed, and 2) without detailed analyses within orthogroups, evolutionary events of paralogous genes in the same orthogroup cannot be distinguished for genome-wide integration of results derived from multiple orthogroup analyses. Here I present an analytical pipeline, ORTHOSCOPE* (star), to infer evolutionary histories of animal/plant genes from genome-scale data. ORTHOSCOPE* estimates a tree for a specified gene, detects speciation/gene duplication events that occurred at nodes belonging to only one lineage leading to a species of interest, and then integrates results derived from gene trees estimated for all query genes in genome-wide data. Thus, ORTHOSCOPE* can be used to detect species nodes just after whole-genome duplications as a first step of comparative genomic analyses. Moreover, by examining the presence or absence of genes belonging to species lineages with dense taxon sampling available from the ORTHOSCOPE web version, ORTHOSCOPE* can detect genes lost in specific lineages and horizontal gene transfers. This pipeline is available at https://github.com/jun-inoue/ORTHOSCOPE_STAR.

17.
In the south of France, Leishmania infantum is responsible for numerous cases of canine leishmaniasis (CanL), sporadic cases of human visceral leishmaniasis (VL) and rare cases of cutaneous and muco-cutaneous leishmaniasis (CL and MCL, respectively). Several endemic areas have been clearly identified in the south of France including the Pyrénées-Orientales, Cévennes (CE), Provence (P), Alpes-Maritimes (AM) and Corsica (CO). Within these endemic areas, the two cities of Nice (AM) and Marseille (P), which are located 150 km apart, and their surroundings, concentrate the greatest number of French autochthonous leishmaniasis cases. In this study, 270 L. infantum isolates from an extended time period (1978–2011) from four endemic areas, AM, P, CE and CO, were assessed using Multi-Locus Microsatellite Typing (MLMT). MLMT revealed a total of 121 different genotypes with 91 unique genotypes and 30 repeated genotypes. Substantial genetic diversity was found with a strong genetic differentiation between the Leishmania populations from AM and P. However, exchanges were observed between these two endemic areas in which it seems that strains spread from AM to P. The genetic differentiations in these areas suggest strong epidemiological structuring. A model-based analysis using STRUCTURE revealed two main populations: population A (consisting of samples primarily from the P and AM endemic areas with MON-1 and non-MON-1 strains) and population B consisting of only MON-1 strains essentially from the AM endemic area. For four patients, we observed several isolates from different biological samples which provided insight into disease relapse and re-infection. These findings shed light on the transmission dynamics of parasites in humans. However, further data are required to confirm this hypothesis based on a limited sample set. This study represents the most extensive population analysis of L. infantum strains using MLMT conducted in France.

18.
Air pollution has been associated with increased systemic inflammation markers. We developed a new pathway analysis approach to investigate whether gene variants within relevant pathways (oxidative stress, endothelial function, and metal processing) modified the association between particulate air pollution and fibrinogen, C-reactive protein (CRP), intercellular adhesion molecule-1 (ICAM-1), and vascular cell adhesion molecule-1 (VCAM-1). Our study population consisted of 822 elderly participants of the Normative Aging Study (1999–2011). To investigate the role of biological mechanisms and to reduce the number of comparisons in the analysis, we created pathway-specific scores using gene variants related to each pathway. To select the most appropriate gene variants, we used the least absolute shrinkage and selection operator (Lasso) to relate independent outcomes representative of each pathway (8-hydroxydeoxyguanosine for oxidative stress, augmentation index for endothelial function, and patella lead for metal processing) to gene variants. A high genetic score corresponds to a higher allelic risk profile. We fit mixed-effects models to examine modification by the genetic score of the weekly air pollution association with the outcome. Among participants with higher genetic scores within the oxidative stress pathway, we observed significant associations between particle number and fibrinogen, while we did not find any association among participants with lower scores (p_interaction = 0.04). Compared to individuals with low genetic scores of metal processing gene variants, participants with higher scores had greater effects of particle number on fibrinogen (p_interaction = 0.12), CRP (p_interaction = 0.02), and ICAM-1 (p_interaction = 0.08). This two-stage penalization method is easy to implement and can be used for large-scale genetic applications.

19.
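A preliminary study of methods for analyzing dominant-marker data in population genetic structure research   Total citations: 37, self-citations: 0, citations by others: 37
Qian Wei, Ge Song. 《遗传学报》 (Acta Genetica Sinica), 2001, 28(3): 244-255
To compare the suitability of different statistical parameters when dominant markers are applied to studies of population genetic structure, we used RAPD to analyze the genetic structure of 100 individuals of the wild rice Oryza granulata from five populations in China. For measuring the level of genetic diversity within populations, the percentage of polymorphic bands (PPB) underestimates the amount of genetic variation and is less useful than the Shannon diversity index and Nei's gene diversity index; when Nei's index is used, the Lynch-Milligan correction is unnecessary. For analyzing genetic relationships among individuals, pairwise Mantel tests among the matrices of 17 genetic similarity indices all showed highly significant correlations (r>0.95, t>t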
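The three diversity measures compared in this abstract can be sketched for a 0/1 band matrix as follows. The sketch assumes Hardy-Weinberg equilibrium, so the null-allele frequency is the square root of the band-absent fraction, and it applies no Lynch-Milligan correction. The function names and the toy matrix are hypothetical, not the study's data.

```python
import math


def ppb(bands):
    """Percentage of polymorphic bands: loci where both band states occur."""
    loci = list(zip(*bands))
    polymorphic = sum(1 for locus in loci if 0 < sum(locus) < len(locus))
    return 100.0 * polymorphic / len(loci)


def null_allele_freq(locus):
    """q = sqrt(fraction of band-absent individuals), assuming Hardy-Weinberg."""
    return math.sqrt(locus.count(0) / len(locus))


def nei_diversity(locus):
    """Nei's gene diversity h = 1 - p^2 - q^2 at one dominant locus."""
    q = null_allele_freq(locus)
    p = 1 - q
    return 1 - p * p - q * q


def shannon_index(locus):
    """Shannon index I = -sum(x * ln x) over the two estimated allele frequencies."""
    q = null_allele_freq(locus)
    return -sum(x * math.log(x) for x in (1 - q, q) if x > 0)


# Rows are individuals, columns are RAPD bands (1 = present, 0 = absent); made-up data.
bands = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0]]
loci = list(zip(*bands))
print(ppb(bands))
print([round(nei_diversity(locus), 4) for locus in loci])
print([round(shannon_index(locus), 4) for locus in loci])
```

PPB only records whether a locus varies at all, whereas the other two indices also weight how evenly the alleles are distributed, which is one way to read the abstract's point that PPB underestimates variation.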

20.
Barley (Hordeum vulgare L.) is an important economic crop for food, feed and industrial raw materials. In the present research, 112 barley landraces from the Shanghai region were genotyped using genotyping-by-sequencing (GBS), and the genetic diversity and population structure were analyzed. The results showed that 210,268 Single Nucleotide Polymorphisms (SNPs) were present in total, and the average polymorphism information content (PIC) was 0.1642. Genetic diversity and population structure analyses suggested that these barley landraces were differentiated and could be divided into three sub-groups, with morphological traits of row-type and adherence of the hulls the main distinguishing factors between groups. Genotypes with similar or duplicated names were also investigated according to their genetic backgrounds and seed appearances. This study provided valuable information on barley landraces from the Shanghai region, and showed that all these barley landraces should be protected and used for future breeding programs.
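For reference, the polymorphism information content of a marker can be computed from its allele frequencies with the standard formula PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The sketch below applies it to made-up SNP frequencies and averages across loci. The function names and the example frequencies are illustrative, not values from the study.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content from the allele frequencies at one locus."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


def mean_pic(loci):
    """Average PIC across loci, each given as a list of allele frequencies."""
    return sum(pic(freqs) for freqs in loci) / len(loci)


# Hypothetical biallelic SNPs: (major, minor) allele frequencies.
snps = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
print(pic([0.5, 0.5]))  # 0.375, the maximum for a biallelic marker
print(mean_pic(snps))
```

Because a biallelic SNP cannot exceed a PIC of 0.375, an average of 0.1642 as reported above implies that many of the SNPs have low minor-allele frequencies.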
