Similar articles
 Found 20 similar articles (search time: 15 ms)
1.

   

We present psi-square, a program for searching the space of gene vectors. The program starts with a gene vector, i.e., the set of measurements associated with a gene. It finds similar vectors, derives a probabilistic model of them, and repeats the search using this model as the query, updating the model and searching again until convergence. When applied to three different pathway-discovery problems, psi-square was generally more sensitive and sometimes more specific than the ad hoc methods previously developed for each of these problems.
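
The search-and-update loop described above can be sketched as follows. This is only an illustration, not the psi-square implementation: the function name, the input layout, and the use of a mean vector with a Euclidean cutoff in place of a probabilistic model are all assumptions.

```python
import math

def iterative_search(vectors, query, radius, max_iter=50):
    """Repeat search and model update until the hit set stops changing.

    vectors: dict mapping gene name -> list of measurements (assumed layout)
    query:   name of the starting gene
    radius:  Euclidean cutoff, standing in for a probabilistic score threshold
    """
    model = list(vectors[query])   # the initial model is the query vector itself
    hits = {query}
    for _ in range(max_iter):
        # Search step: collect every vector close enough to the current model.
        new_hits = {name for name, v in vectors.items()
                    if math.dist(v, model) <= radius}
        new_hits.add(query)        # the query always stays in the hit set
        if new_hits == hits:       # converged: the search returned the same set
            break
        hits = new_hits
        # Update step: the model becomes the per-dimension mean of the hits.
        model = [sum(vectors[n][i] for n in hits) / len(hits)
                 for i in range(len(model))]
    return sorted(hits)

# Toy data (hypothetical): g3 is only reached after the model has moved.
toy = {"g1": [1.0, 1.0], "g2": [1.5, 1.0], "g3": [2.0, 1.0], "g4": [9.0, 9.0]}
print(iterative_search(toy, "g1", radius=0.8))  # ['g1', 'g2', 'g3']
```

In the toy run, g3 lies outside the cutoff of the original query but inside the cutoff of the updated model, which is the kind of gain an iterated search can offer over a single pass.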

2.
3.
We systematically analyzed the relationships between gene fitness profiles (co-fitness) and drug inhibition profiles (co-inhibition) from several hundred chemogenomic screens in yeast. Co-fitness predicted gene functions distinct from those derived from other assays and identified conditionally dependent protein complexes. Co-inhibitory compounds were weakly correlated by structure and therapeutic class. We developed an algorithm predicting protein targets of chemical compounds and verified its accuracy with experimental testing. Fitness data provide a novel, systems-level perspective on the cell.
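
One plausible reading of co-fitness is the correlation between two genes' fitness profiles across screens. The sketch below assumes Pearson correlation and a hypothetical `profiles` dictionary; the abstract does not state which similarity measure the authors used.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cofitness_partners(profiles, gene, top=3):
    """Rank all other genes by correlation with `gene`, highest first.

    profiles: dict mapping gene name -> fitness scores, one per screen
              (hypothetical layout)
    """
    scores = [(other, pearson(profiles[gene], p))
              for other, p in profiles.items() if other != gene]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]
```

The same function applied to compound profiles instead of gene profiles would give a co-inhibition ranking under the same assumption.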

4.
Learning module networks from genome-wide location and expression data   (Total citations: 6; self-citations: 0; citations by others: 6)
Xu X, Wang L, Ding D. FEBS Letters, 2004, 578(3): 297-304

5.
This paper compares pollen spectra derived from modified Tauber traps and moss samples from a selection of woodland types from Bulgaria, the Czech Republic, Georgia, Greece, Poland, Switzerland and Wales. The study examines the representation of individual taxa in the two sampling media and aims to ascertain the duration of pollen deposition captured by a moss. The latter aim was pursued through the calculation of dissimilarity indexes to assess how many years of pollen deposited in a pollen trap yield percentage values that are most similar to those obtained from the moss. The results are broadly scattered, with the majority of moss samples being most similar to several years of pollen deposition in the adjacent trap. For a selection of samples, a comparison of the pollen accumulation rate in pollen traps with the pollen concentration in the moss per unit surface indicates that the entrapment and/or preservation of individual pollen types in the moss differs from that in the pollen trap. A comparison of the proportion of different taxa in the moss with the pollen spectrum of 2 years of pollen deposition in the trap also revealed large differences. There is a tendency for bisaccate grains such as Pinus and Picea to have a higher representation in moss than in traps, but there is considerable regional variation. The results indicate that pollen proportions from moss samples often represent the pollen deposition of one area over several years. However, bisaccate pollen grains tend to be over-represented in moss samples compared to both pollen traps and, potentially, lake sediments.
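
The comparison of a moss sample with one, two, three or more pooled years of trap data can be sketched as below. The abstract does not name the dissimilarity index, so half the summed absolute difference in percentages is used here as an assumption, and the function names and input layout are hypothetical.

```python
def percentages(counts):
    """Convert raw pollen counts per taxon into percentages of the total."""
    total = sum(counts.values())
    return {taxon: 100.0 * c / total for taxon, c in counts.items()}

def dissimilarity(p, q):
    """Half the summed absolute percentage difference (0 = identical spectra)."""
    taxa = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)

def best_matching_years(moss_counts, trap_by_year):
    """Find how many pooled trap years most resemble the moss sample.

    trap_by_year: list of per-year count dicts, most recent year first.
    Returns (number of years, dissimilarity at that number of years).
    """
    moss = percentages(moss_counts)
    pooled = {}
    best_k, best_d = None, float("inf")
    for k, year in enumerate(trap_by_year, start=1):
        # Pool this year's counts with all more recent years.
        for taxon, c in year.items():
            pooled[taxon] = pooled.get(taxon, 0) + c
        d = dissimilarity(moss, percentages(pooled))
        if d < best_d:
            best_k, best_d = k, d
    return best_k, best_d
```

Pooling raw counts before converting to percentages weights each year by its pollen yield, which is one of several reasonable choices.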

6.
We investigated the ability of several principal components analysis (PCA)-based strategies to detect and control for population stratification using data from a multi-center study of epithelial ovarian cancer among women of European-American ethnicity. These include a correction based on an ancestry informative markers (AIMs) panel designed to capture European ancestral variation and corrections utilizing un-thinned genome-wide SNP data; case-control samples were drawn from four geographically distinct North-American sites. The AIMs-only and genome-wide first principal components (PC1) both corresponded to the previously described North or Northwest-Southeast axis of European variation. We found that the genome-wide PCA captured this primary dimension of variation more precisely and identified additional axes of genome-wide variation of relevance to epithelial ovarian cancer. Associations evident between the genome-wide PCs and study site corroborate North American immigration history and suggest that undiscovered dimensions of variation lie within Northern Europe. The structure captured by the genome-wide PCA was also found within control individuals and did not reflect the case-control variation present in the data. The genome-wide PCA highlighted three regions of local LD, corresponding to the lactase (LCT) gene on chromosome 2, the human leukocyte antigen system (HLA) on chromosome 6 and to a common inversion polymorphism on chromosome 8. These features did not compromise the efficacy of PCs from this analysis for ancestry control. This study concludes that although AIMs panels are a cost-effective way of capturing population structure, genome-wide data should preferably be used when available.

7.
Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function than proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study we show that the integrated model increases recall by 18% compared to using PPI data alone at 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
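
The neighbor assumption behind the functional linkage graph can be illustrated with a simple neighbor vote. This is a deliberately reduced stand-in for the probabilistic model described in the abstract; the function name, the graph layout, and the term labels in the usage example are hypothetical.

```python
def neighbor_vote(graph, annotations, protein):
    """Score each term by the fraction of a protein's neighbors that carry it.

    graph:       dict mapping protein -> set of neighboring proteins
    annotations: dict mapping protein -> set of known function terms
    Returns a dict mapping term -> fraction of neighbors annotated with it
    (empty if the protein has no neighbors or none of them are annotated).
    """
    neighbors = graph.get(protein, set())
    counts = {}
    for n in neighbors:
        for term in annotations.get(n, set()):
            counts[term] = counts.get(term, 0) + 1
    return {term: c / len(neighbors) for term, c in counts.items()}

# Hypothetical example: two of P1's three neighbors share term_A.
graph = {"P1": {"P2", "P3", "P4"}}
annotations = {"P2": {"term_A"}, "P3": {"term_A", "term_B"}, "P4": set()}
scores = neighbor_vote(graph, annotations, "P1")
```

An integrated predictor in the spirit of the abstract would combine such graph-based scores with evidence from domains, phenotypes and localization; how those sources are weighted is not specified in the source.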

8.
9.
Over the past two decades many quantitative trait loci (QTL) have been detected; however, very few have been incorporated into breeding programs. The recent development of genome-wide association studies (GWAS) in plants provides the opportunity to detect QTL in germplasm collections such as unstructured populations from breeding programs. The overall goal of the barley Coordinated Agricultural Project was to conduct GWAS with the intent to couple QTL detection and breeding. The basic idea is that breeding programs generate a vast amount of phenotypic data which, combined with cheap genotyping, should make it possible to use GWAS to detect QTL that are immediately accessible to and usable by breeding programs. There are several constraints on using breeding program-derived phenotype data for GWAS, namely limited population size and unbalanced data sets. We chose the highly heritable trait heading date to study these two variables. We examined 766 spring barley breeding lines (panel #1) grown in balanced trials and a subset of 384 spring barley breeding lines (panel #2) grown in balanced and unbalanced trials. In panel #1, we detected three major QTL for heading date that have been detected in previous bi-parental mapping studies. Simulation studies showed that population sizes greater than 384 individuals are required to consistently detect QTL. We also showed that unbalanced data sets from panel #2 can be used to detect the three major QTL. However, unbalanced data sets resulted in an increase in the false-positive rate. Interestingly, one-step analysis performed better than two-step analysis in reducing the false-positive rate. The results of this work show that it is possible to use phenotypic data from breeding programs to detect QTL, but that careful consideration of population size and experimental design is required.

10.
11.
12.
13.
Genome-tagged mice (GTM): two sets of genome-wide congenic strains   (Total citations: 6; self-citations: 0; citations by others: 6)
An important approach for understanding complex disease risk using the mouse is to map and ultimately identify the genes conferring risk. Genes contributing to complex traits can be mapped to chromosomal regions using genome scans of large mouse crosses. Congenic strains can then be developed to fine-map a trait and to ascertain the magnitude of the genotype effect in a chromosomal region. Congenic strains are constructed by repeated backcrossing to the background strain with selection at each generation for the presence of a donor chromosomal region, a time-consuming process. One approach to accelerate this process is to construct a library of congenic strains encompassing the entire genome of one strain on the background of the other. We have employed marker-assisted breeding to construct two sets of overlapping congenic strains, called genome-tagged mice (GTMs), that span the entire mouse genome. Both congenic GTM sets contain more than 60 mouse strains, each with on average a 23-cM introgressed segment (range 8 to 58 cM). C57BL/6J was utilized as a background strain for both GTM sets with either DBA/2J or CAST/Ei as the donor strain. The background and donor strains are genetically and phenotypically divergent. The genetic basis for the phenotypic strain differences can be rapidly mapped by simply screening the GTM strains. Furthermore, the phenotype differences can be fine-mapped by crossing appropriate congenic mice to the background strain, and complex gene interactions can be investigated using combinations of these congenics.

14.
15.
Probabilistic tests of topology offer a powerful means of evaluating competing phylogenetic hypotheses. The performance of the nonparametric Shimodaira-Hasegawa (SH) test, the parametric Swofford-Olsen-Waddell-Hillis (SOWH) test, and Bayesian posterior probabilities was explored for five data sets for which all the phylogenetic relationships are known with a very high degree of certainty. These results are consistent with previous simulation studies that have indicated a tendency for the SOWH test to be prone to generating Type 1 errors because of model misspecification coupled with branch length heterogeneity. These results also suggest that the SOWH test may accord overconfidence in the true topology when the null hypothesis is in fact correct. In contrast, the SH test was observed to be much more conservative, even under high substitution rates and branch length heterogeneity. For some of those data sets where the SOWH test proved misleading, the Bayesian posterior probabilities were also misleading. The results of all tests were strongly influenced by the exact substitution model assumptions. Simple models, especially those that assume rate homogeneity among sites, had a higher Type 1 error rate and were more likely to generate misleading posterior probabilities. For some of these data sets, the commonly used substitution models appear to be inadequate for estimating appropriate levels of uncertainty with the SOWH test and Bayesian methods. Reasons for the differences in statistical power between the two maximum likelihood tests are discussed and are contrasted with the Bayesian approach.

16.
Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast   (Total citations: 16; self-citations: 0; citations by others: 16)
Haploinsufficiency is defined as a dominant phenotype in diploid organisms that are heterozygous for a loss-of-function allele. Despite its relevance to human disease, neither the extent of haploinsufficiency nor its precise molecular mechanisms are well understood. We used the complete set of Saccharomyces cerevisiae heterozygous deletion strains to survey the genome for haploinsufficiency via fitness profiling in rich (YPD) and minimal media to identify all genes that confer a haploinsufficient growth defect. This assay revealed that approximately 3% of the approximately 5900 genes tested are haploinsufficient for growth in YPD. This class of genes is functionally enriched for metabolic processes carried out by molecular complexes such as the ribosome. Much of the haploinsufficiency in YPD is alleviated by slowing the growth rate of each strain in minimal media, suggesting that certain gene products are rate limiting for growth only in YPD. Overall, our results suggest that the primary mechanism of haploinsufficiency in yeast is insufficient protein production. We discuss the relevance of our findings in yeast to human haploinsufficiency disorders.

17.
Systematic nonrandom mating in populations results in genetic stratification and is predominantly caused by geographic separation, providing the opportunity to infer individuals' birthplace from genetic data. Such inference has been demonstrated for individuals' country of birth, but here we use data from the Northern Finland Birth Cohort 1966 (NFBC1966) to investigate the characteristics of genetic structure within a population and subsequently develop a method for inferring location to a finer scale. Principal component analysis (PCA) shows that while the first PCs are particularly informative for location, there is also location information in the higher-order PCs, but it cannot be captured by a linear model. We introduce a new method, pcLOCATE, which is able to exploit this information to improve the accuracy of location inference. pcLOCATE uses individuals' PC values to estimate the probability of birth in each town and then averages over all towns to give an estimated longitude and latitude of birth using a fully Bayesian model. We apply pcLOCATE to the NFBC1966 data to estimate parental birthplace, testing with successively more PCs and finding the model with the top 23 PCs most accurate, with a median distance of 23 km between the estimated and the true location. pcLOCATE predicts the most recent residence of NFBC1966 individuals to a median distance of 47 km. We also apply pcLOCATE to Indian individuals from the London Life Sciences Prospective Population Study (LOLIPOP) data, and find that birthplace is predicted to a median distance of 54 km from the true location. A method with such accuracy is potentially valuable in population genetics and forensics.
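
The two-stage idea described above (a probability per town, then a probability-weighted average of town coordinates) can be sketched as follows. This is not pcLOCATE itself: the isotropic Gaussian likelihood around each town's mean PC vector, the uniform prior over towns, the `sigma` parameter and the input layout are all simplifying assumptions.

```python
import math

def estimate_location(pcs, towns, sigma=1.0):
    """Return a (longitude, latitude) estimate from an individual's PC values.

    pcs:   list of PC values for one individual
    towns: list of (longitude, latitude, mean_pcs) tuples (hypothetical layout)
    sigma: assumed common spread of PC values within a town
    """
    # Log-likelihood of the PC values under each town's Gaussian (up to a constant).
    log_w = [-sum((x - m) ** 2 for x, m in zip(pcs, mean)) / (2 * sigma ** 2)
             for _, _, mean in towns]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    # With a uniform prior, normalized weights are the posterior town probabilities.
    lon = sum(w * t[0] for w, t in zip(weights, towns)) / total
    lat = sum(w * t[1] for w, t in zip(weights, towns)) / total
    return lon, lat
```

Averaging longitude and latitude directly is reasonable only over a small region such as the one studied; over large areas a spherical mean would be more appropriate.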

18.
The power of genome-wide SNP association studies is limited, among other factors, by the large number of false positive test results. To provide a remedy, we combined SNP association analysis with the pathway-driven gene set enrichment analysis (GSEA), recently developed to facilitate handling of genome-wide gene expression data. The resulting GSEA-SNP method rests on the assumption that SNPs underlying a disease phenotype are enriched in genes constituting a signaling pathway or those with a common regulation. Besides improving power for association mapping, GSEA-SNP may facilitate the identification of disease-associated SNPs and pathways, as well as the understanding of the underlying biological mechanisms. GSEA-SNP may also help to identify markers with weak effects, undetectable in association studies without pathway consideration. The program is freely available and can be downloaded from our website.
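
The enrichment idea can be illustrated with the unweighted running-sum statistic from classic GSEA, applied to a ranked SNP list. Whether GSEA-SNP uses exactly this form is not stated in the abstract; the function name and inputs below are assumptions.

```python
def enrichment_score(ranked_snps, gene_set_snps):
    """Unweighted Kolmogorov-Smirnov-style running-sum enrichment score.

    ranked_snps:   SNP identifiers sorted from strongest to weakest association
    gene_set_snps: set of SNPs mapped to the pathway or gene set of interest
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [snp in gene_set_snps for snp in ranked_snps]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one SNP inside and one outside the set")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for a set member, down for a non-member.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A score near 1 means the set's SNPs cluster at the top of the ranking. In practice its significance would be judged against scores from permuted data, a step omitted here.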

19.
Estimating pairwise correlation from replicated genome-scale (a.k.a. OMICS) data is fundamental to clustering functionally relevant biomolecules into a cellular pathway. The popular Pearson correlation coefficient estimates bivariate correlation by averaging over replicates. It is not completely satisfactory since it introduces strong bias while reducing variance. We propose a new multivariate correlation estimator that models all replicates as independent and identically distributed (i.i.d.) samples from the multivariate normal distribution. We derive the estimator by maximizing the likelihood function. For small sample data, we provide a resampling-based statistical inference procedure, and for moderate to large sample data, we provide an asymptotic statistical inference procedure based on the Likelihood Ratio Test (LRT). We demonstrate advantages of the new multivariate correlation estimator over the Pearson bivariate correlation estimator using simulations and real-world data analysis examples. AVAILABILITY: The estimator and statistical inference procedures have been implemented in an R package 'CORREP' that is available from CRAN [http://cran.r-project.org] and Bioconductor [http://www.bioconductor.org/]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.

Background

Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate this data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by identifying the most probable candidate genes linked to genetic markers of the disease or phenotype under investigation. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes which may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on preclinical studies and phase I clinical trials.

Methods

We previously used Gentrepid (http://www.gentrepid.org) as a platform to predict 1,497 candidate genes for the seven complex diseases considered in the Wellcome Trust Case-Control Consortium genome-wide association study, namely Type 2 Diabetes, Bipolar Disorder, Crohn's Disease, Hypertension, Type 1 Diabetes, Coronary Artery Disease and Rheumatoid Arthritis. Here, we adopted a simple approach to integrate drug data from three publicly available drug databases (the Therapeutic Target Database, the Pharmacogenomics Knowledgebase and DrugBank) with candidate gene predictions from Gentrepid at the systems level.

Results

Using the publicly available drug databases as sources of drug-target association data, we identified a total of 428 candidate genes as novel therapeutic targets for the seven phenotypes of interest, and 2,130 drugs feasible for repositioning against the predicted novel targets.

Conclusions

By integrating genetic, bioinformatic and drug data, we have demonstrated that currently available drugs may be repositioned as novel therapeutics for the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate ground-breaking results in genetics to clinical treatments.
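
The integration step described in the Methods amounts to joining predicted candidate genes with known drug targets. A minimal sketch is given below; the function name, the dictionary layouts and any gene or drug names passed in are hypothetical, and this does not reproduce the Gentrepid pipeline or the schemas of the three drug databases.

```python
def repositioning_candidates(candidate_genes, drug_targets):
    """List drugs whose known targets overlap a disease's candidate genes.

    candidate_genes: dict mapping disease -> set of predicted candidate genes
    drug_targets:    dict mapping drug -> set of genes the drug is known to target
    Returns a dict mapping disease -> {drug: sorted list of shared genes}.
    """
    result = {}
    for disease, genes in candidate_genes.items():
        for drug, targets in drug_targets.items():
            shared = genes & targets   # candidate genes that are also drug targets
            if shared:
                result.setdefault(disease, {})[drug] = sorted(shared)
    return result
```

Diseases with no overlapping drug are simply absent from the result, so the output size directly reflects how many candidate targets are already druggable.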
