Similar articles
 20 similar articles found (search time: 15 ms)
1.
Frequent hitters are compounds that are detected as a "hit" in multiple high-throughput screening (HTS) assays. Such behavior is specific (e.g., target family related) or unspecific (e.g., reactive compounds) or can result from a combination of such behaviors. Detecting such hits while predicting the underlying reason behind their promiscuous behavior is desirable because it provides valuable information not only about the compounds themselves but also about the assay methodology and target classes at hand. This information can also greatly reduce cost and time during HTS hit profiling. The present study exemplifies how to mine large HTS data repositories, such as the one at Boehringer Ingelheim, to identify frequent hitters, gain further insights into the causes of promiscuous behavior, and generate models for predicting promiscuous compounds. Applications of this approach are demonstrated using two recent large-scale HTS assays. The authors believe this analysis and its concrete applications are valuable tools for streamlining and accelerating decision-making processes during the course of hit discovery.
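
The definition above (a compound that is a hit in multiple assays) can be illustrated with a minimal sketch. It is not the authors' mining procedure: the function name `frequent_hitters`, the `min_fraction` threshold, and the toy assay data are all hypothetical, and no attempt is made to model the cause of promiscuity.

```python
from collections import Counter


def frequent_hitters(hits_by_assay, min_fraction=0.5):
    """Return {compound: hit fraction} for compounds that are hits in at
    least `min_fraction` of the assays (threshold is an assumption)."""
    n_assays = len(hits_by_assay)
    counts = Counter()
    for hits in hits_by_assay.values():
        counts.update(set(hits))  # count each compound at most once per assay
    return {
        compound: n / n_assays
        for compound, n in counts.items()
        if n / n_assays >= min_fraction
    }


# Hypothetical hit lists for four assays.
example = {
    "assay_A": ["cpd1", "cpd2"],
    "assay_B": ["cpd1", "cpd3"],
    "assay_C": ["cpd1", "cpd2"],
    "assay_D": ["cpd4"],
}
result = frequent_hitters(example, min_fraction=0.5)
```

In practice the threshold would depend on how many assays each compound was actually tested in; this sketch assumes every compound was tested in every assay.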

2.
Thomas SC, Hill WG. Genetics 2000, 155(4): 1961-1972
Previous techniques for estimating quantitative genetic parameters, such as heritability in populations where exact relationships are unknown but are instead inferred from marker genotypes, have used data from individuals on a pairwise level only. At this level, families are weighted according to the number of pairs within which each family appears, hence by size rather than information content, and information from multiple relationships is lost. Estimates of parameters are therefore not the most efficient achievable. Here, Markov chain Monte Carlo techniques have been used to partition the population into complete sibships, including, if known, prior knowledge of the distribution of family sizes. These pedigrees have then been used with restricted maximum likelihood under an animal model to estimate quantitative genetic parameters. Simulations to compare the properties of parameter estimates with those of existing techniques indicate that the use of sibship reconstruction is superior to earlier methods, having lower mean square errors and showing nonsignificant downward bias. In addition, sibship reconstruction allows the estimation of population allele frequencies that account for the relationships within the sample, so prior knowledge of allele frequencies need not be assumed. Extensions to these techniques allow reconstruction of half sibships when some or all of the maternal genotypes are known.

3.
A test statistic to detect errors in sib-pair relationships.   Total citations: 4 (self-citations: 2, citations by others: 2)
Several authors have proposed algorithms to detect Mendelian errors in human genetic linkage data. Most currently available methods use likelihood-based methods on multiplex family data to identify typing or pedigree errors. These algorithms cannot be applied in many sib-pair collections, because of lack of parental-genotype information. Nonetheless, misspecifying the relationships between individuals has serious consequences for sib-pair linkage studies: false relationships bias the statistics designed to identify linkage with disease phenotypes. To test the hypothesis that two individuals are sibs, we propose a test statistic based on the summation, over a large number of genetic markers, of the number of alleles shared identical by state by a pair of individuals, for each marker. The test statistic has an approximately normal distribution under the null hypothesis, and extreme negative values correspond to nonsib pairs. Power and significance studies show that the test statistic calculated by use of 50 unlinked markers has 96% power to detect half-sibs and has 100% power to detect unrelated individuals as not full-sib pairs, with a 5% false-positive rate. Furthermore, extreme positive values of the test statistic identify sibs as MZ twins.
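
The statistic described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published formula: the per-marker null means and variances (`null_means`, `null_vars`) would in practice be derived from allele frequencies under the full-sib hypothesis, and here they are simply supplied as hypothetical inputs. Function names are also hypothetical.

```python
import math


def ibs_shared(g1, g2):
    """Number of alleles (0, 1 or 2) two genotypes share identical by state."""
    a, b = sorted(g1), sorted(g2)
    if a == b:
        return 2
    return 1 if set(a) & set(b) else 0


def sib_z_statistic(pair_genotypes, null_means, null_vars):
    """Standardized sum of IBS sharing over markers.

    null_means / null_vars are the expected value and variance of the IBS
    count at each marker under the full-sib null (assumed to be given).
    """
    total = sum(ibs_shared(g1, g2) for g1, g2 in pair_genotypes)
    mean = sum(null_means)
    var = sum(null_vars)  # markers assumed unlinked, so variances add
    return (total - mean) / math.sqrt(var)


# Hypothetical genotypes for one pair at three markers.
pairs = [((1, 2), (1, 2)), ((1, 1), (1, 2)), ((1, 2), (3, 4))]
z = sib_z_statistic(pairs, null_means=[1.5, 1.5, 1.5], null_vars=[0.25, 0.25, 0.5])
```

Consistent with the abstract, a strongly negative value would point away from a full-sib relationship and a strongly positive value toward MZ twins; the cutoffs themselves are not given in the source.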

4.
Following the success of small-molecule high-throughput screening (HTS) in drug discovery, other large-scale screening techniques are currently revolutionizing the biological sciences. Powerful new statistical tools have been developed to analyze the vast amounts of data in DNA chip studies, but have not yet found their way into compound screening. In HTS, characterization of single-point hit lists is often done only in retrospect after the results of confirmation experiments are available. However, for prioritization, for optimal use of resources, for quality control, and for comparison of screens it would be extremely valuable to predict the rates of false positives and false negatives directly from the primary screening results. Making full use of the available information about compounds and controls contained in HTS results and replicated pilot runs, the Z score and from it the p value can be estimated for each measurement. Based on this consideration, we have applied the concept of p-value distribution analysis (PVDA), which was originally developed for gene expression studies, to HTS data. PVDA allowed prediction of all relevant error rates as well as the rate of true inactives, and excellent agreement with confirmation experiments was found.
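
The chain from measurement to Z score to p value can be sketched as below. This is only a rough illustration: the abstract does not specify how the Z score is computed, so the sketch assumes it is standardized against neutral control wells, and the last function is a simple stand-in for estimating the fraction of true inactives from the p-value distribution (the share of p-values above a cutoff `lam`, rescaled), not the PVDA procedure itself. All names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def z_scores(measurements, neutral_controls):
    """Standardize each measurement against the neutral controls (assumption)."""
    mu = mean(neutral_controls)
    sigma = stdev(neutral_controls)
    return [(x - mu) / sigma for x in measurements]


def one_sided_p_values(zs):
    """Upper-tail p-values under a standard normal null."""
    std_normal = NormalDist()
    return [1.0 - std_normal.cdf(z) for z in zs]


def estimate_true_inactive_fraction(p_values, lam=0.5):
    """Stand-in estimator: inactive compounds give roughly uniform p-values,
    so the share of p-values above `lam` is rescaled by 1 / (1 - lam)."""
    above = sum(p > lam for p in p_values)
    return min(1.0, above / ((1.0 - lam) * len(p_values)))


# Hypothetical control wells and two compound measurements.
zs = z_scores([100, 104], neutral_controls=[98, 100, 102])
ps = one_sided_p_values(zs)
```

The one-sided test assumes that larger signal means more activity; an inhibition assay read out as signal loss would use the lower tail instead.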

5.
The ability to infer relationships between groups of sequences, either by searching for their evolutionary history or by comparing their sequence similarity, can be a crucial step in hypothesis testing. Interpreting relationships of human immunodeficiency virus type 1 (HIV-1) sequences can be challenging because of their rapidly evolving genomes, but it may also lead to a better understanding of the underlying biology. Several studies have focused on the evolution of HIV-1, but there is little information to link sequence similarities and evolutionary histories of HIV-1 to the epidemiological information of the infected individual. Our goal was to correlate patterns of HIV-1 genetic diversity with epidemiological information, including risk and demographic factors. These correlations were then used to predict epidemiological information through analyzing short stretches of HIV-1 sequence. Using standard phylogenetic and phenetic techniques on 100 HIV-1 subtype B sequences, we were able to show some correlation between the viral sequences and the geographic area of infection and the risk of men who engage in sex with men. To help identify more subtle relationships between the viral sequences, the method of multidimensional scaling (MDS) was performed. That method identified statistically significant correlations between the viral sequences and the risk factors of men who engage in sex with men and individuals who engage in sex with injection drug users or use injection drugs themselves. Using tree construction, MDS, and newly developed likelihood assignment methods on the original 100 samples we sequenced, and also on a set of blinded samples, we were able to predict demographic/risk group membership at a rate statistically better than by chance alone. Such methods may make it possible to identify viral variants belonging to specific demographic groups by examining only a small portion of the HIV-1 genome. Such predictions of demographic epidemiology based on sequence information may become valuable in assigning different treatment regimens to infected individuals.

6.
Numerous family studies have been performed to assess the associations between cancer incidence and genetic and non-genetic risk factors and to quantitatively evaluate the cancer risk attributable to these factors. However, mathematical models that account for a measured hereditary susceptibility gene have not been fully explored in family studies. In this report, we proposed statistical approaches to precisely model a measured susceptibility gene fitted to family data and simultaneously determine the combined effects of individual risk factors and their interactions. Our approaches are structured for age-specific risk models based on Cox proportional hazards regression methods. They are useful for analyses of families and extended pedigrees in which measured risk genotypes are segregated within the family and are robust even when the genotypes are available only in some members of a family. We exemplified these methods by analyzing six extended pedigrees ascertained through soft-tissue sarcoma patients with p53 germ-line mutations. Our analyses showed that germ-line p53 mutations and sex had significant interaction effects on cancer risk. Our proposed methods in family studies are accurate and robust for assessing age-specific cancer risk attributable to a measured hereditary susceptibility gene, providing valuable inferences for genetic counseling and clinical management.

7.
Ozmeriç N, Preus HR, Olsen I. Anaerobe 1999, 5(6): 571-577
Porphyromonas gingivalis and Prevotella intermedia are black-pigmented, putative periodontopathogenic bacteria considered to cause some forms of periodontal disease. Porphyromonas gingivalis and P. intermedia can be transmitted between humans and produce periodontal disease in susceptible hosts. In this article, studies using molecular typing methods for determining the transmission of black-pigmented, putative periodontopathogens between family members are reviewed. As individuals living close to each other are more prone to transmit bacteria, the studies on transmission of periodontopathogens have been performed on family members. It has been shown that black-pigmented bacteria are not only transferred between spouses but also between parents and child. Since only a limited number of studies have been done, longitudinal and controlled studies should be carried out to elucidate further the transmittance potential of these bacteria.

8.
9.
Combining different sources of information is essential for a complete understanding of the process of genetic differentiation between species. The Iberian and North African wall lizard (Podarcis) species complex has been the object of several studies regarding morphological and mitochondrial DNA variation but, so far, no large-scale survey of nuclear variation within this group has been accomplished. In this study, ten polymorphic allozyme loci were studied in 569 individuals collected across the Iberian Peninsula and North Africa. The obtained data were analysed using both conventional population genetic tools and recent Bayesian model-based clustering methods. Our results show that there are several well-differentiated entities corroborating the major splits observed in mtDNA analyses. These groups correspond not only to the fully recognized species Podarcis bocagei, Podarcis carbonelli, and Podarcis vaucheri but also to multiple forms within the polytypic Podarcis hispanica, all of which have a similar level of differentiation to that observed between the acknowledged species. However, relationships between forms are weakly supported both by population and individual clustering methods, suggesting a scenario of a rapid diversification that contrasts to the clear bifurcating model assumed from previous mtDNA analyses. Individual multilocus analyses report few individuals misassigned or apparently admixed, some of which are most likely explained by the persistence of high levels of ancestral polymorphism. Other admixed individuals, however, are probably the result of limited levels of gene flow between forms. © 2007 The Linnean Society of London, Biological Journal of the Linnean Society, 2007, 91, 121–133.

10.
11.
YV Sun. Human Genetics 2012, 131(10): 1677-1686
Millions of genetic variants have been assessed for their effects on the trait of interest in genome-wide association studies (GWAS). The complex traits are affected by a set of inter-related genes. However, the typical GWAS only examine the association of a single genetic variant at a time. The individual effects of a complex trait are usually small, and the simple sum of these individual effects may not reflect the holistic effect of the genetic system. High-throughput methods enable genomic studies to produce a large amount of data to expand the knowledge base of the biological systems. Biological networks and pathways are built to represent the functional or physical connectivity among genes. Integrated with GWAS data, the network- and pathway-based methods complement the approach of single genetic variant analysis, and may improve the power to identify trait-associated genes. Taking advantage of the biological knowledge, these approaches are valuable to interpret the functional role of the genetic variants, and to further understand the molecular mechanism influencing the traits. The network- and pathway-based methods have demonstrated their utilities, and will be increasingly important to address a number of challenges facing the mainstream GWAS.

12.
Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, to what extent the genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs) with a minimum length of 300 bp, each of which occurs at least twice in the human genome. The 151,749 genomic loci (multi-copy regions, or MCRs) harboring these MCSs account for 1.98% of the genome and are distributed unevenly across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related cellular component terms and various enzymatic activities in the GO biological function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution must be taken when identifying genetic variants in the MCRs via HTS technologies.

13.
Osteopetrosis is a rare, genetically heterogeneous disorder of bone metabolism characterized by increased skeleton density. In the past, standard methods for genetic diagnosis of osteopetrosis have primarily been performed by candidate gene screening and positional cloning. However, these methods are time- and labor-intensive, and the genetic basis of approximately 30% of the cases is yet to be elucidated. Here, we employed whole exome sequencing of two affected individuals from an osteopetrosis family to identify a candidate mutation in CLCN7 (Y99C). It was identified from a total of 1757 and 1728 genetic variations found in either patient, which were then distilled using filtering strategies and confirmed using Sanger sequencing. We identified this mutation in six family members, but not in population-matched controls. This mutation was previously found in osteopetrosis patients by other researchers. Our evolutionary analysis also indicated that it is under extremely high selective pressure, and is likely to be critical for the correct function of ClC-7, and thus is likely to be the responsible cause of disease. Collectively, our data further indicate that mutation (Y99C) may be a cause of osteopetrosis, and highlight the use of whole exome sequencing as a valuable approach to identifying disease mutations in a cost- and time-efficient manner.

14.
Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next-generation sequence data in family samples. Our framework is able to identify variant sites accurately and to assign individual genotypes, and can handle de novo mutation events, increasing the sensitivity and specificity of variant calling and de novo mutation detection. Through simulations we show explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants, and have high specificity for detecting de novo mutation events. The improvement in accuracy using our methods over the standard approach is particularly pronounced for low-frequency variants. Furthermore, the family-aware calling framework dramatically reduces Mendelian inconsistencies and is beneficial for family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases.

15.
Background: The success of collapsing methods which investigate the combined effect of rare variants on complex traits has so far been limited. The manner in which variants within a gene are selected prior to analysis has a crucial impact on this success, which has resulted in analyses conventionally filtering variants according to their consequence. This study investigates whether an alternative approach to filtering, using annotations from recently developed bioinformatics tools, can aid these types of analyses in comparison to conventional approaches. Conclusion: Incorporating variant annotations from non-coding bioinformatics tools should prove to be a valuable asset for rare variant analyses in the future. Filtering by variant consequence is only possible in coding regions of the genome, whereas utilising non-coding bioinformatics annotations provides an opportunity to discover unknown causal variants in non-coding regions as well. This should allow studies to uncover a greater number of causal variants for complex traits and help elucidate their functional role in disease.

16.
With the exponentially increasing amount of information in the biomedical field, the significance of advanced information retrieval and information extraction, as well as the role of databases, has been increasing. PRIME is an integrated gene/protein informatics database based on natural language processing. It provides automatically extracted protein/family/gene/compound interaction information including both physical and genetic interactions, gene ontology based functions, and graphic pathway viewers. Gene/protein/family names and functional terms are recognized based on dictionaries developed in our laboratory. The interaction and functional information are extracted by syntactic dependencies and various phrase patterns. We have included about 920,000 (non-redundant) protein interactions and 360,000 annotated gene-function relationships for major eukaryotes. By combining the sequence and text information, the pathway comparison between two organisms and simple pathway deduction based on other organism interaction data, and pathway filtering using tissue expression data, are also available. This database is accessible at http://prime.ontology.ims.u-tokyo.ac.jp:8081.

17.
Relative-pair designs are routinely employed in linkage studies of complex genetic diseases and quantitative traits. Valid application of these methods requires correct specification of the relationships of the pairs. For example, within a sibship, presumed full sibs actually might be MZ twins, half sibs, or unrelated. Misclassification of half-sib pairs or unrelated individuals as full sibs can result in reduced power to detect linkage. When other family members, such as parents or additional siblings, are available, incorrectly specified relationships usually will be detected through apparent incompatibilities with Mendelian inheritance. Without other family members, sibling relationships cannot be determined absolutely, but they still can be inferred probabilistically if sufficient genetic marker data are available. In this paper, we describe a simple likelihood ratio method to infer the true relationship of a putative sibling pair. We explore the number of markers required to accurately infer relationships typically encountered in a sib-pair study, as a function of marker allele frequencies, marker spacing, and genotyping error rate, and we conclude that very accurate inference of relationships can be achieved, given the marker data from even part of a genome scan. We compare our method to related methods of relationship inference that have been suggested. Finally, we demonstrate the value of excluding non-full sibs in a genetic linkage study of non-insulin-dependent diabetes mellitus.

18.
Genome-wide association studies (GWAS) using family data involve association analyses between hundreds of thousands of markers and a trait for a large number of related individuals. The correlations among relatives bring statistical and computational challenges when performing these large-scale association analyses. Recently, several rapid methods accounting for both within- and between-family variation have been proposed. However, these techniques mostly model the phenotypic similarities in terms of genetic relatedness. The familial resemblances in many family-based studies such as twin studies are not only due to the genetic relatedness, but also derive from shared environmental effects and assortative mating. In this paper, we propose 2 generalized least squares (GLS) models for rapid association analysis of family-based GWAS, which accommodate both genetic and environmental contributions to familial resemblance. In our first model, we estimated the joint genetic and environmental variations. In our second model, we estimated the genetic and environmental components separately. Through simulation studies, we demonstrated that our proposed approaches are more powerful and computationally efficient than a number of existing methods are. We show that estimating the residual variance-covariance matrix in the GLS models without SNP effects does not lead to an appreciable bias in the p values as long as the SNP effect is small (i.e. accounting for no more than 1% of trait variance).

19.
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with a similar accuracy as a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.

20.
Studies of relatedness have been crucial in molecular ecology over the last decades. Good evidence of this is the fact that studies of population structure, evolution of social behaviours, genetic diversity and quantitative genetics all involve relatedness research. The main aim of this article was to review the most common graphical methods used in allele sharing studies for detecting and identifying family relationships. Both IBS‐ and IBD‐based allele sharing studies are considered. Furthermore, we propose two additional graphical methods from the field of compositional data analysis: the ternary diagram and scatterplots of isometric log‐ratios of IBS and IBD probabilities. We illustrate all graphical tools with genetic data from the HGDP‐CEPH diversity panel, using mainly 377 microsatellites genotyped for 25 individuals from the Maya population of this panel. We enhance all graphics with convex hulls obtained by simulation and use these to confirm the documented relationships. The proposed compositional graphics are shown to be useful in relatedness research, as they also single out the most prominent related pairs. The ternary diagram is advocated for its ability to display all three allele sharing probabilities simultaneously. The log‐ratio plots are advocated as an attempt to overcome the problems with the Euclidean distance interpretation in the classical graphics.
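
The isometric log-ratio idea mentioned above can be illustrated with a short sketch. It maps the three allele-sharing probabilities (sharing 0, 1 or 2 alleles), which sum to one, onto two unconstrained coordinates suitable for an ordinary scatterplot. The particular basis used here is one common textbook choice and is an assumption; the abstract does not say which basis the article uses, and the function name `ilr_ibs` is hypothetical.

```python
import math


def ilr_ibs(p0, p1, p2):
    """Isometric log-ratio coordinates of the composition (p0, p1, p2).

    Uses a standard sequential basis (an assumption, not taken from the
    source); all parts must be strictly positive.
    """
    if min(p0, p1, p2) <= 0:
        raise ValueError("all parts must be strictly positive")
    z1 = math.sqrt(1 / 2) * math.log(p0 / p1)
    z2 = math.sqrt(2 / 3) * math.log(math.sqrt(p0 * p1) / p2)
    return z1, z2


# Hypothetical sharing probabilities for one pair of individuals.
z1, z2 = ilr_ibs(0.25, 0.25, 0.5)
```

Because logarithms are undefined at zero, pairs with an observed sharing proportion of exactly zero would need some form of zero replacement before the transformation; how the article handles this is not stated in the abstract.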
