Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such as array data from genome-wide association studies, a cost-effective approach is to sequence a subset of the study sample and then to impute the rest of the study sample, using the sequenced subset as a reference panel. The use of such an internal reference panel identifies population-specific variants and avoids the problem of a substantial mismatch in ancestry background between the study population and the reference population. To efficiently select an internal panel, we introduce an idea of phylogenetic diversity from mathematical phylogenetics and comparative genomics. We propose the “most diverse reference panel”, defined as the subset with the maximal “phylogenetic diversity”, thereby incorporating individuals that span a diverse range of genotypes within the sample. Using data both from simulations and from the 1000 Genomes Project, we show that the most diverse reference panel can substantially improve the imputation accuracy compared to randomly selected reference panels, especially for the imputation of rare variants. The improvement in imputation accuracy holds across different marker densities, reference panel sizes, and lengths for the imputed segments. We thus propose a novel strategy for planning sequencing studies on samples with existing genotype data.

2.
Genotype-Imputation Accuracy across Worldwide Human Populations (total citations: 2; self-citations: 0; citations by others: 2)
A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the “portability” of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified “optimal” mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial “SNP chip,” again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.

3.
Microsatellite instability (MSI) occurs in over 90% of Lynch syndrome cancers and is considered a hallmark of the disease. MSI is an early event in colon tumor development, but screening polyps for MSI remains controversial because of reduced sensitivity compared to more advanced neoplasms. To increase sensitivity, we investigated the use of a novel type of marker consisting of long mononucleotide repeat (LMR) tracts. Adenomas from 160 patients, ranging in age from 29 to 55 years, were screened for MSI using the new markers and compared with current marker panels and immunohistochemistry standards. Overall, 15 tumors were scored as MSI-High using the LMRs compared to 9 for the NCI panel and 8 for the MSI Analysis System (Promega). This difference represents at least a 1.7-fold increase in detection of MSI-High lesions over currently available markers. Moreover, the number of MSI-positive markers per sample and the size of allelic changes were significantly greater with the LMRs (p = 0.001), which increased confidence in MSI classification. The overall sensitivity and specificity of the LMR panel for detection of mismatch repair deficient lesions were 100% and 96%, respectively. In comparison, the sensitivity and specificity of the MSI Analysis System were 67% and 100%; and for the NCI panel, 75% and 97%. The difference in sensitivity between the LMR panel and the other panels was statistically significant (p<0.001). The increased sensitivity for detection of MSI-High phenotype in early colorectal lesions with the new LMR markers indicates that MSI screening for the early detection of Lynch syndrome might be feasible.
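The fold increase quoted in the abstract above can be checked directly from the three MSI-High counts it reports. The following is only a minimal arithmetic sketch; the dictionary keys and the helper name `fold_increase` are illustrative and do not come from the study.

```python
# MSI-High counts as reported in the abstract (LMR panel, NCI panel, Promega system).
msi_high = {"LMR": 15, "NCI": 9, "Promega": 8}


def fold_increase(new: int, old: int) -> float:
    """Ratio of lesions detected by the new panel to those detected by an existing one."""
    return new / old


fold_vs_nci = fold_increase(msi_high["LMR"], msi_high["NCI"])          # 15 / 9
fold_vs_promega = fold_increase(msi_high["LMR"], msi_high["Promega"])  # 15 / 8

# The smaller of the two ratios is the "at least" figure.
minimum_fold = min(fold_vs_nci, fold_vs_promega)
print(round(minimum_fold, 1))
```

The smaller ratio (15/9, roughly 1.67) rounds to the "at least 1.7-fold" figure in the text; the comparison against the Promega system gives 15/8 = 1.875.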

4.
The application of Next-Generation Sequencing for studying the genetics of papillary thyroid carcinomas (PTC) has recently revealed new somatic mutations and gene fusions as potential new tumor-initiating events in patients without any known driver lesion. Gene and miRNA expression analyses defined clinically relevant subclasses correlated to tumor progression. In addition, it has been shown that tumor driver mutations in BRAF, and RET rearrangements - altogether termed “BRAF-like” carcinomas - have a very similar expression pattern and constitute a distinct category. Conversely, “RAS-like” carcinomas have a different genomic, epigenomic, and proteomic profile. These findings justify the need to reconsider PTC classification schemes.

5.
Nowadays, depression is a major issue in public health. Because of the partial overlap between the brain structures involved in depression, olfaction and emotion, the study of olfactory function could be a relevant way to find specific cognitive markers of depression. This study aims at determining whether olfactory impairments are state or trait markers of a major depressive episode (MDE) through the study of the olfactory parameters involving the central olfactory pathway. In a pilot study, we prospectively evaluated 18 depressed patients during acute episodes of depression and 6 weeks after antidepressant treatment (escitalopram) against 54 healthy volunteers, matched by age, gender and smoking status. We investigated the participants’ abilities to identify odors (single odors and in binary mixture), to evaluate and discriminate the odors’ intensity, and to determine the hedonic valence of odors. The results revealed an “olfactory anhedonia”, expressed by a decrease of the hedonic score for highly emotional odorants, as a potential state marker of MDE. Moreover, these patients experienced an “olfactory negative alliesthesia” during the odor intensity evaluation and failed to correctly identify two odorants with opposite valences in a binary iso-mixture, which constitute potential trait markers of the disease. This study provides preliminary evidence for olfactory impairments associated with MDE (state marker) that are persistent after the clinical improvement of depressive symptoms (trait marker). These results could be explained by the chronicity of depression and/or by the impact of the therapeutic means used (antidepressant treatment). They need to be confirmed, particularly those obtained in a complex olfactory environment, which corresponds to a more objective daily-life situation.

6.
Imputation of genotypes in a study sample can make use of sequenced or densely genotyped external reference panels consisting of individuals that are not from the study sample. It also can employ internal reference panels, incorporating a subset of individuals from the study sample itself. Internal panels offer an advantage over external panels because they can reduce imputation errors arising from genetic dissimilarity between a population of interest and a second, distinct population from which the external reference panel has been constructed. As the cost of next-generation sequencing decreases, internal reference panel selection is becoming increasingly feasible. However, it is not clear how best to select individuals to include in such panels. We introduce a new method for selecting an internal reference panel, minimizing the average distance to the closest leaf (ADCL), and compare its performance relative to an earlier algorithm: maximizing phylogenetic diversity (PD). Employing both simulated data and sequences from the 1000 Genomes Project, we show that ADCL provides a significant improvement in imputation accuracy, especially for imputation of sites with low-frequency alleles. This improvement in imputation accuracy is robust to changes in reference panel size, marker density, and length of the imputation target region.
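To make the ADCL criterion named above concrete, here is a minimal sketch under stated assumptions. It works on a symmetric pairwise distance matrix rather than on a phylogenetic tree, averages over all individuals (panel members contribute zero), and uses a simple greedy heuristic. None of these choices is taken from the abstract, and the function names `adcl` and `greedy_adcl_panel` are hypothetical, so this should not be read as the authors' algorithm.

```python
from typing import List, Sequence


def adcl(dist: Sequence[Sequence[float]], panel: Sequence[int]) -> float:
    """Average, over all individuals, of the distance to the closest panel member."""
    n = len(dist)
    return sum(min(dist[i][j] for j in panel) for i in range(n)) / n


def greedy_adcl_panel(dist: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedy heuristic (an assumption): repeatedly add the individual that lowers ADCL most."""
    n = len(dist)
    panel: List[int] = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in panel]
        best = min(candidates, key=lambda c: adcl(dist, panel + [c]))
        panel.append(best)
    return panel


# Toy data: two tight clusters, {0, 1} and {2, 3}, that are far from each other.
toy = [
    [0.0, 1.0, 9.0, 9.0],
    [1.0, 0.0, 9.0, 9.0],
    [9.0, 9.0, 0.0, 1.0],
    [9.0, 9.0, 1.0, 0.0],
]
panel = greedy_adcl_panel(toy, 2)
print(panel, adcl(toy, panel))
```

On this toy matrix the heuristic picks one individual from each cluster, which gives a much lower average distance than a panel drawn from a single cluster (0.5 versus 4.5).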

7.
With the astonishing rate at which genomic and metagenomic sequence data sets are accumulating, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as “marker” genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities (e.g., construction of species trees, phylogeny-based assignment of metagenomic sequence reads to taxonomic groups, phylogeny-based assessment of alpha- and beta-diversity of microbial communities from metagenomic data). We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identification of the best candidates for such markers. The criteria we focused on included universality across the taxa of interest, ability to be used to produce robust phylogenetic trees that reflect as much as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels including 40 for “all bacteria and archaea”, 114 for “all bacteria” (greatly expanding on the ∼30 commonly used), and 100s to 1000s for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than possible previously.

8.
Fibroblast growth factor receptors (FGFRs) are activated by mutation and overexpressed in bladder cancers (BCs), and FGFR inhibitors are currently being evaluated in clinical trials in BC patients. However, BC cells display marked heterogeneity in their responses to FGFR inhibitors, and the biological mechanisms underlying this heterogeneity are not well defined. Here we used a novel inhibitor of FGFRs 1–3 and RNAi to determine the effects of inhibiting FGFR1 or FGFR3 in a panel of human BC cell lines. We observed that FGFR1 was expressed in BC cells that also expressed the “mesenchymal” markers ZEB1 and vimentin, whereas FGFR3 expression was restricted to the E-cadherin- and p63-positive “epithelial” subset. Sensitivity to the growth-inhibitory effects of BGJ-398 was also restricted to the “epithelial” BC cells and it correlated directly with FGFR3 mRNA levels but not with the presence of activating FGFR3 mutations. In contrast, BGJ-398 did not strongly inhibit proliferation but did block invasion in the “mesenchymal” BC cells in vitro. Similarly, BGJ-398 did not inhibit primary tumor growth but blocked the production of circulating tumor cells (CTCs) and the formation of lymph node and distant metastases in mice bearing orthotopically implanted “mesenchymal” UM-UC3 cells. Together, our data demonstrate that FGFR1 and FGFR3 have largely non-overlapping roles in regulating invasion/metastasis and proliferation in distinct “mesenchymal” and “epithelial” subsets of human BC cells. The results suggest that the tumor EMT phenotype will be an important determinant of the biological effects of FGFR inhibitors in patients.

9.

Background

The diagnostic approach to dizzy, older patients is not straightforward as many organ systems can be involved and evidence for diagnostic strategies is lacking. A first differentiation in diagnostic subtypes or profiles may guide the diagnostic process of dizziness and can serve as a classification system in future research. In the literature this has been done, but based on pathophysiological reasoning only.

Objective

To establish a classification of diagnostic profiles of dizziness based on empirical data.

Design

Cross-sectional study.

Participants and Setting

417 consecutive patients of 65 years and older presenting with dizziness to 45 primary care physicians in the Netherlands from July 2006 to January 2008.

Methods

We performed tests, including patient history, and physical and additional examination, previously selected by an international expert panel and based on an earlier systematic review. We used the results of these tests in a principal component analysis for exploration, data-reduction and finally differentiation into diagnostic dizziness profiles.

Results

Demographic data and the results of the tests yielded 221 variables, of which 49 contributed to the classification of dizziness into six diagnostic profiles, which may be named as follows: “frailty”, “psychological”, “cardiovascular”, “presyncope”, “non-specific dizziness” and “ENT”. These explained 32% of the variance.

Conclusions

Empirically identified components classify dizziness into six profiles. This classification takes into account the heterogeneity and multicausality of dizziness; it may serve as a starting point for research on diagnostic strategies and can be a first step in an evidence-based diagnostic approach to dizzy older patients.

10.
Whole-genome radiation hybrid (RH) panels have been constructed for several species, including cattle. RH panels have proven to be an extremely powerful tool to construct high-density maps, which is an essential step in the identification of genes controlling important traits, and they can be used to establish high-resolution comparative maps. Although bovine RH panels can be used with ovine markers to construct sheep RH maps based on bovine genome organization, only some (c. 50%) of the markers available in sheep can be successfully mapped in the bovine genome. So, with the development of genomics and genome sequencing projects, there is a need for a high-resolution RH panel in sheep to map ovine markers. Consequently, we have constructed a 12 000-rad ovine whole-genome RH panel. Two hundred and eight hybrid clones were produced, of which 90 were selected based on their retention frequency. The final panel had an average marker retention frequency of 31.8%. The resolution of this 12 000-rad panel (SheepRH) was estimated by constructing an RH framework map for a 23-Mb region of sheep chromosome 18 (OAR18) that contains a QTL for scrapie susceptibility.

11.
Animal hybridization is well documented, but evolutionary outcomes and conservation priorities often differ for natural and anthropogenic hybrids. Among primates, an order with many endangered species, the two contexts can be hard to disentangle from one another, which carries important conservation implications. Callithrix marmosets give us a unique glimpse of genetic hybridization effects under distinct natural and human-induced contexts. Here, we use a panel of 44 autosomal microsatellite markers to examine genome-wide admixture levels and introgression at a natural C. jacchus and C. penicillata species border along the São Francisco River in NE Brazil and in an area of Rio de Janeiro state where humans introduced these species exotically. Additionally, we describe for the first time autosomal genetic diversity in wild C. penicillata and expand previous C. jacchus genetic data. We characterize admixture within the natural zone as bimodal, where hybrid ancestry is biased toward one parental species or the other. We also show evidence that São Francisco River islands are gateways for bidirectional gene flow across the species border. In the anthropogenic zone, marmosets essentially form a hybrid swarm with intermediate levels of admixture, likely from the absence of strong physical barriers to interspecific breeding. Our data show that while hybridization can occur naturally, the presence of physical, even if leaky, barriers to hybridization is important for maintaining species genetic integrity. Thus, we suggest further study of hybridization under different contexts to set well-informed conservation guidelines for hybrid populations that often fit somewhere between “natural” and “man-made.”

12.
The shortnose sturgeon, Acipenser brevirostrum, oft considered a phylogenetic relic, is listed as an “endangered species threatened with extinction” in the US and “Vulnerable” on the IUCN Red List. Effective conservation of A. brevirostrum depends on understanding its diversity and evolutionary processes, yet challenges associated with the polyploid nature of its nuclear genome have heretofore limited population genetic analysis to maternally inherited haploid characters. We developed a suite of polysomic microsatellite DNA markers and characterized a sample of 561 shortnose sturgeon collected from major extant populations along the North American Atlantic coast. The 181 alleles observed at 11 loci were scored as binary loci and the data were subjected to multivariate ordination, Bayesian clustering, hierarchical partitioning of variance, and among-population distance metric tests. The methods uncovered moderately high levels of gene diversity suggesting population structuring across and within three metapopulations (Northeast, Mid-Atlantic, and Southeast) that encompass seven demographically discrete and evolutionarily distinct lineages. The predicted groups are consistent with previously described behavioral patterns, especially dispersal and migration, supporting the interpretation that A. brevirostrum exhibit adaptive differences based on watershed. Combined with results of prior genetic (mitochondrial DNA) and behavioral studies, the current work suggests that dispersal is an important factor in maintaining genetic diversity in A. brevirostrum and that the basic unit for conservation management is arguably the local population.

13.
Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been dealt with as a potential confounder, and various methods exist to “correct” for population stratification. However, these methods induce a mean correction that does not account for heterogeneity of marker effects. The animal breeding literature offers a few recent studies that consider modeling genetic heterogeneity in multibreed data, using multivariate models. However, these methods have received little attention in plant breeding, where population structure can take different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations, using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations varied from null to moderate, depending on the genetic distance between subpopulations and traits. Our assessment of prediction accuracy features cases where ignoring population structure leads to a parsimonious, more powerful model as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either the A- or the W-GBLUP.

14.
The brain is a large-scale complex network often referred to as the “connectome”. Cognitive functions and information processing are mainly based on the interactions between distant brain regions. However, most of the “feature extraction” methods used in the context of Brain Computer Interface (BCI) ignore the possible functional relationships between different signals recorded from distinct brain areas. In this paper, the functional connectivity quantified by the phase locking value (PLV) was introduced to characterize the evoked responses (ERPs) obtained for target and non-target visual stimuli. We also tested the possibility of using the functional connectivity in the context of the “P300 speller”. The proposed approach was compared to well-known state-of-the-art methods for the “P300 speller”, mainly peak picking, the area, time/frequency-based features, xDAWN spatial filtering and stepwise linear discriminant analysis (SWLDA). The electroencephalographic (EEG) signals recorded from ten subjects were analyzed offline. The results indicated that phase synchrony offers relevant information for classification in a P300 speller. High synchronization between the brain regions was clearly observed during target trials, whereas no significant synchronization was detected for non-target trials. The results also showed that phase synchrony provides higher performance than some existing methods for letter classification in a P300 speller, principally when a large number of trials is available. Finally, we tested the possible combination of both approaches (classical features and phase synchrony). Our findings showed an overall improvement of the performance of the P300 speller when using peak picking, the area and frequency-based features. Performance similar to that of xDAWN and SWLDA was obtained when using a large number of trials.
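The phase locking value mentioned above is commonly defined as the magnitude of the mean unit phasor of the phase difference between two signals, PLV = |(1/N) Σ exp(i(φ_a − φ_b))|. The sketch below assumes that standard definition and assumes the instantaneous phases have already been extracted (for example with a Hilbert transform, which is not shown). The function name and the synthetic phase series are illustrative and are not taken from the paper.

```python
import cmath
import math
from typing import Sequence


def phase_locking_value(phase_a: Sequence[float], phase_b: Sequence[float]) -> float:
    """PLV of two equally long phase series (radians): |mean(exp(i * (a - b)))|."""
    if len(phase_a) != len(phase_b) or len(phase_a) == 0:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b))
    return abs(total) / len(phase_a)


# Synthetic phases for one channel (illustrative values only).
channel_a = [0.1 * k for k in range(100)]

# A second channel with a constant phase lag: perfectly locked.
locked_b = [p - math.pi / 4 for p in channel_a]
plv_locked = phase_locking_value(channel_a, locked_b)

# A second channel whose lag sweeps evenly around the circle: no locking.
spread_b = [p - 2 * math.pi * k / 100 for k, p in enumerate(channel_a)]
plv_spread = phase_locking_value(channel_a, spread_b)

print(plv_locked, plv_spread)
```

A constant lag gives a PLV of essentially 1, while phase differences spread evenly around the circle cancel out and give a PLV close to 0; values for real EEG trials would fall between these extremes.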

15.
The extremes of phenotype displayed by the domestic dog, as well as the largest number of naturally occurring inherited diseases in any mammalian species except man (>450), have generated a large interest in genomic linkage mapping in the species. Marker sets for linkage mapping should ideally show both high levels of polymorphism among the target group of animals and an even spacing of markers across the whole genome. Currently a microsatellite marker set known as Minimal Screening Set 2 (MSS2) is widely used. Here, we have extended this marker set by filling in gaps as noted from the marker positions in the CanFam genome assembly (1.0) and the 5000cR radiation hybrid (RH) map. An additional 183 markers have been positioned to increase the coverage of the MSS2 set wherever it contains a gap >9 Mb or 1000(5000) RH units. We have called the marker set derived from the MSS2 set and these 183 markers MSS3. The average physical spacing of markers in the complete 507-marker MSS3 set is 5 Mb, whereas the average heterozygosity of the 183 new markers on a panel of 10 dogs of differing breeds is 0.74. This marker group will allow genome-wide scans in the dog to be conducted at close to 5 cM resolution.

16.
17.
Micropathogens (viruses, bacteria, fungi, parasitic protozoa) share a common trait, which is partial clonality, with wide variance in the respective influence of clonality and sexual recombination on the dynamics and evolution of taxa. The discrimination of distinct lineages and the reconstruction of their phylogenetic history are key information to infer their biomedical properties. However, the phylogenetic picture is often clouded by occasional events of recombination across divergent lineages, limiting the relevance of classical phylogenetic analysis and dichotomic trees. We have applied a network analysis based on graph theory to illustrate the relationships among genotypes of Trypanosoma cruzi, the parasitic protozoan responsible for Chagas disease, to identify major lineages and to unravel their past history of divergence and possible recombination events. At the scale of T. cruzi subspecific diversity, graph theory-based networks applied to 22 isoenzyme loci (262 distinct Multi-Locus-Enzyme-Electrophoresis, MLEE) and 19 microsatellite loci (66 Multi-Locus-Genotypes, MLG) fully confirm the high clustering of genotypes into major lineages or “near-clades”. The release of the dichotomic constraint associated with phylogenetic reconstruction usually applied to multilocus data allows identifying putative hybrids and their parental lineages. Reticulate topology suggests a slightly different history for some of the main “near-clades”, and a possibly more complex origin for the putative hybrids than hitherto proposed. Finally, the sub-network of the near-clade T. cruzi I (28 MLG) shows a clustering subdivision into three differentiated lesser near-clades (“Russian doll pattern”), which confirms the hypothesis recently proposed by other investigators. The present study broadens and clarifies the hypotheses previously obtained from classical markers on the same sets of data, which demonstrates the added value of this approach. This underlines the potential of graph theory-based network analysis for describing the nature and relationships of major pathogens, thereby opening stimulating prospects to unravel the organization, dynamics and history of major micropathogen lineages.

18.
Hepatocellular carcinoma (HCC) is one of the most heterogeneous cancers, as reflected by its multiple grades and the difficulty of subtyping it. In this study, we integrated copy number variation, DNA methylation, mRNA, and miRNA data with the developed “cluster of cluster” method and classified 256 HCC samples from TCGA (The Cancer Genome Atlas) into five major subgroups (S1-S5). We observed that this classification was associated with specific mutations and protein expression, and we detected that each subgroup had distinct molecular signatures. The subclasses were associated not only with survival but also with clinical observations. S1 was characterized by bulk amplification on 8q24, TP53 mutation, low lipid metabolism, highly expressed onco-proteins, attenuated tumor suppressor proteins and a worse survival rate. S2 and S3 were characterized by telomere hypomethylation and a low expression of TERT and DNMT1/3B. Compared to S2, S3 was associated with less copy number variation and some good prognosis biomarkers, including CRP and CYP2E1. In contrast, the mutation rate of CTNNB1 was higher in S3. S4 was associated with bulk amplification and various molecular characteristics at different biological levels. In summary, we classified the HCC samples into five subgroups using multiple “-omics” data. Each subgroup had a distinct survival rate and molecular signature, which may provide information about the pathogenesis of subtypes in HCC.

19.
20.
Characterizing the spatial patterns of genetic diversity in human populations has a wide range of applications, from detecting genetic mutations associated with disease to inferring human history. Current approaches, including the widely used principal-component analysis, are not suited for the analysis of linked markers, and local and long-range linkage disequilibrium (LD) can dramatically reduce the accuracy of spatial localization when unaccounted for. To overcome this, we have introduced an approach that performs spatial localization of individuals on the basis of their genetic data and explicitly models LD among markers by using a multivariate normal distribution. By leveraging external reference panels, we derive closed-form solutions to the optimization procedure to achieve a computationally efficient method that can handle large data sets. We validate the method on empirical data from a large sample of European individuals from the POPRES data set, as well as on a large sample of individuals of Spanish ancestry. First, we show that by modeling LD, we achieve accuracy superior to that of existing methods. Importantly, whereas other methods show decreased performance when dense marker panels are used in the inference, our approach improves in accuracy as more markers become available. Second, we show that accurate localization of genetic data can be achieved with only a part of the genome, and this could potentially enable the spatial localization of admixed samples that have a fraction of their genome originating from a given continent. Finally, we demonstrate that our approach is resistant to distortions resulting from long-range LD regions; such distortions can dramatically bias the results when unaccounted for.
