Similar articles
 20 similar articles found.
1.
2.

Background

Knowing which proteins exist in a given organism or cell type, and how these proteins interact with each other, is necessary for understanding biological processes at the whole-cell level. The determination of protein-protein interaction (PPI) networks has been the subject of extensive research. Despite the development of reasonably successful methods, serious technical difficulties still exist. In this paper we present DomainGA, a quantitative computational approach that uses information about domain-domain interactions to predict the interactions between proteins.

Results

DomainGA is a multi-parameter optimization method in which the available PPI information is used to derive a quantitative scoring scheme for the domain-domain pairs. The resulting domain interaction scores are then used to predict whether a pair of proteins interacts. Using the yeast PPI data and a series of tests, we show that the DomainGA method is robust and insensitive to the selection of the parameter sets, score ranges, and detection rules. Our DomainGA method achieves very high explanation ratios for the positive and negative PPIs in yeast. Based on our cross-verification tests on human PPIs, a comparison of the optimized scores with the structurally observed domain interactions obtained from the iPFAM database, and a sensitivity and specificity analysis, we conclude that our DomainGA method shows great promise to be applicable across multiple organisms.

Conclusion

We envision DomainGA as the first step of a multi-tier approach to constructing organism-specific PPIs. As it is based on fundamental structural information, the DomainGA approach can be used to create potential PPIs, and the accuracy of the constructed interaction template can be further improved using complementary methods. Explanation ratios obtained in the reported test case studies clearly show that the false prediction rates of the template networks constructed using the DomainGA scores are reasonably low, and the erroneous predictions can be filtered further using supplementary approaches such as those based on literature search or other prediction methods.
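The abstract above does not spell out the detection rules, so the following is only a minimal sketch of how domain-domain scores might be turned into a protein-level prediction. The "best domain pair reaches a threshold" rule, the threshold value, the function name `predict_interaction`, and the domain labels and scores are all assumptions made for illustration; they are not taken from the DomainGA paper, and the scores here are made up rather than optimized.

```python
from itertools import product

def predict_interaction(domains_a, domains_b, scores, threshold=0.5):
    """Return (predicted, best_score) for two proteins given their domain lists.

    Assumed rule (hypothetical): the proteins are predicted to interact if
    the best-scoring domain pair between them reaches the threshold.
    Domain pairs missing from `scores` count as 0.0.
    """
    best = max(
        (scores.get(frozenset((da, db)), 0.0)
         for da, db in product(domains_a, domains_b)),
        default=0.0,  # used when either protein has no annotated domains
    )
    return best >= threshold, best

# Hypothetical, hand-written scores; keys are unordered domain pairs.
example_scores = {
    frozenset(("D1", "D2")): 0.9,
    frozenset(("D3", "D2")): 0.2,
}

if __name__ == "__main__":
    print(predict_interaction(["D1", "D3"], ["D2"], example_scores))
```

Using `frozenset` keys makes the score lookup symmetric, so the order in which the two proteins are passed does not matter.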

3.
Plant biologists in fields of ecology, evolution, genetics and breeding frequently use multivariate methods. This paper illustrates Principal Component Analysis (PCA) and Gabriel's biplot as applied to microarray expression data from plant pathology experiments.

4.
Summary: The precision of estimates of genetic variances and covariances obtained from multivariate selection experiments of various designs is discussed. The efficiencies of experimental designs are compared using criteria based on a confidence region of the estimated genetic parameters, with estimation using both responses and selection differentials and offspring-parent regression. A good selection criterion is shown to be to select individuals as parents using an index of the sums of squares and crossproducts of the phenotypic measurements. Formulae are given for the optimum selection proportion when the relative numbers of individuals in the parent and progeny generations are fixed or variable. Although the optimum depends on a priori knowledge of the genetic parameters to be estimated, the designs are very robust to poor estimates. For bivariate uncorrelated data, the variance of the estimated genetic parameters can be reduced by approximately 0.4 relative to designs of a more conventional nature when half of the individuals are selected on one trait and half on the other trait. There are larger reductions in variances if the traits are correlated.

5.
MOTIVATION: Microarray experiments often involve hundreds or thousands of genes. In a typical experiment, only a fraction of genes are expected to be differentially expressed; in addition, the measured intensities among different genes may be correlated. Depending on the experimental objectives, sample size calculations can be based on one of three specified measures: sensitivity, true discovery and accuracy rates. The sample size problem is formulated as finding the number of arrays needed to achieve the desired fraction of the specified measure at the desired family-wise power, for the given type I error and (standardized) effect size. RESULTS: We present a general approach for estimating sample size under independent and equally correlated models using binomial and beta-binomial models, respectively. The sample sizes needed for a two-sample z-test are computed; the computed theoretical numbers agree well with the Monte Carlo simulation results. However, under more general correlation structures, the beta-binomial model can underestimate the needed samples by about 1-5 arrays. CONTACT: jchen@nctr.fda.gov.
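The independent-genes case described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes every truly changed gene has the same standardized effect size `delta`, uses the usual normal-theory power of a two-sided two-sample z-test (ignoring the far tail), and treats the number of detected genes as binomial. The function names, the example inputs and the search bound `n_max` are hypothetical.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def per_gene_power(n, delta, alpha):
    """Approximate power of a two-sided two-sample z-test, n arrays per group."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(delta * sqrt(n / 2) - z_crit)

def prob_fraction_detected(m1, power, lam):
    """P(at least a fraction lam of m1 independent changed genes is detected)."""
    k_min = ceil(lam * m1 - 1e-9)  # small guard against float round-off
    return sum(
        comb(m1, k) * power**k * (1 - power) ** (m1 - k)
        for k in range(k_min, m1 + 1)
    )

def arrays_per_group(m1, delta, alpha, lam, target, n_max=1000):
    """Smallest n per group reaching the target probability (independence assumed)."""
    for n in range(2, n_max + 1):
        if prob_fraction_detected(m1, per_gene_power(n, delta, alpha), lam) >= target:
            return n
    raise ValueError("target not reached within n_max arrays per group")

if __name__ == "__main__":
    # Hypothetical inputs: 20 changed genes, effect size 2, alpha 0.001,
    # detect at least 90% of them with probability at least 0.9.
    print(arrays_per_group(m1=20, delta=2.0, alpha=0.001, lam=0.9, target=0.9))
```

The equally correlated case in the abstract would replace the binomial sum with a beta-binomial one; that step is left out here because the abstract does not give its parameterization.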

6.
We have developed a complete statistical model for the analysis of tumor-specific gene expression profiles. The approach provides investigators with a global overview of large-scale gene expression data, indicating aspects of the data that relate to tumor phenotype, but also summarizing the uncertainties inherent in classification of tumor types. We demonstrate the use of this method in the context of a gene expression profiling study of 27 human breast cancers. The study is aimed at defining molecular characteristics of tumors that reflect estrogen receptor status. In addition to good predictive performance with respect to pure classification of the expression profiles, the model also uncovers conflicts in the data with respect to the classification of some of the tumors, highlighting them as critical cases for which additional investigations are appropriate.

7.
Moulting rate estimates of Temora longicornis and Pseudocalanus elongatus in short incubation experiments in the laboratory were compared with shipboard experiments in the North Sea. Absence of food, rotation of bottles and (an extremely small) container size did not affect the moulting rate in well-fed copepods. A minor bias was found due to gentle handling. High variance was due to the use of different samples, one before and one after incubation, which is part of the so-called ‘Kimmerer method’. This error can be reduced by monitoring the new production of exuviae in a single sample, which makes sorting of individuals unnecessary, while conserving the estimate of moulting rate in individual stages.

8.
MOTIVATION: Numerical output of spotted microarrays displays censoring of pixel intensities at some software-dependent threshold. This reduces the quality of gene expression data, because it seriously violates the linearity of expression with respect to signal intensity. Statistical methods based on typically available spot summaries, together with some parametric assumptions, can suggest ways to correct for this defect. RESULTS: A maximum likelihood approach is suggested together with a sensible approximation to the joint density of the mean, median and variance, which are typically available to the biological end-user. The method 'corrects' the gene expression values for pixel censoring. A by-product of our approach is a comparison between several two-parameter models for pixel intensity values. It suggests that pixels separated by one or two other pixels can be considered independent draws from a Lognormal or a Gamma distribution. AVAILABILITY: The R/S-Plus code is available at http://www.stats.gla.ac.uk/~microarray/software.

9.
The complete nucleotide sequence of chloroplast DNA from a liverwort, Marchantia polymorpha, has made clear the entire gene organization of the chloroplast genome. Quite a few genes encoding components of photosynthesis and protein synthesis machinery have been identified by comparative computer analysis. Other genes involved in photosynthesis, respiratory electron transport, and membrane-associated transport in chloroplasts were predicted by the amino acid sequence homology and secondary structure of gene products. Thirty-three open reading frames in the liverwort chloroplast genome remain unidentified. However, most of these open reading frames are also conserved in the chloroplast genomes of two species, a liverwort, Marchantia polymorpha, and tobacco, Nicotiana tabacum, indicating their active functions in chloroplasts.
Abbreviations: bp, base pair; kDa, kilodalton; IR, inverted repeat; ORF, open reading frame; DALA, δ-aminolevulinate

10.
Relative expression ratios are commonly estimated in real-time qPCR studies by comparing the quantification cycle for the target gene with that for a reference gene in the treatment samples, normalized to the same quantities determined for a control sample. For the “standard curve” design, where data are obtained for all four of these at several dilutions, nonlinear least squares can be used to assess the amplification efficiencies (AE) and the adjusted ΔΔCq and its uncertainty, with automatic inclusion of the effect of uncertainty in the AEs. An algorithm is illustrated for the KaleidaGraph program.
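The quantities named above can be illustrated with a small sketch. Note that it swaps in a simpler technique than the abstract describes: each standard curve is fitted by ordinary linear least squares of Cq on log10 dilution, and the ratio uses the widely used efficiency-adjusted formula, with no uncertainty propagation. It is not the nonlinear least-squares algorithm of the paper, and the function names and numbers are hypothetical.

```python
def efficiency_from_standard_curve(log10_dilution, cq):
    """Estimate amplification efficiency from a linear fit of Cq on log10 dilution.

    For ideal doubling per cycle the slope is about -3.32 and the returned
    efficiency is about 2.
    """
    n = len(cq)
    mean_x = sum(log10_dilution) / n
    mean_y = sum(cq) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_dilution, cq))
    sxx = sum((x - mean_x) ** 2 for x in log10_dilution)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)

def relative_expression(cq_target_treat, cq_ref_treat,
                        cq_target_ctrl, cq_ref_ctrl,
                        e_target=2.0, e_ref=2.0):
    """Efficiency-adjusted expression ratio (treatment relative to control)."""
    target_term = e_target ** (cq_target_ctrl - cq_target_treat)
    ref_term = e_ref ** (cq_ref_ctrl - cq_ref_treat)
    return target_term / ref_term

if __name__ == "__main__":
    # Made-up Cq values: with both efficiencies equal to 2 this reduces to
    # 2 ** (-ddCq), where ddCq = (22 - 18) - (24 - 18) = -2.
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

When both efficiencies are exactly 2, the result is the familiar 2^(-ΔΔCq); unequal efficiencies change the ratio, which is why the abstract stresses estimating the AEs together with ΔΔCq.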

11.
Biokinetic parameters are usually calculated from slopes and intercepts taken from plots of experimental data. One response at a time is plotted and used for parameter estimation. Aside from problems that may be caused by transformations made when the data are plotted, this approach has the weakness of not using all the data simultaneously when there is more than one response. This paper shows how multiresponse biological data can be handled to get parameter estimates that are much more precise than those obtained using conventional methods.

12.
13.
Using statistical methods, the designs of multifraction experiments which are likely to give the most precise estimate of the alpha-beta ratio in the linear-quadratic model are investigated. The aim of the investigation is to try to understand what features of an experimental design make it efficient for estimating alpha/beta rather than to recommend a specific design. A plot of the design on an nd² versus nd graph is suggested, and this graph is called the design plot. The best designs are those which have a large spread in the isoeffect direction in the design plot, which means that a wide range of doses per fraction should be used. For binary response assays, designs with expected response probabilities near to 0.5 are most efficient. Furthermore, dose points with expected response probabilities outside the range 0.1 to 0.9 contribute negligibly to the efficiency with which alpha/beta can be estimated. For "top-up" experiments, the best designs are those which replace as small a portion as possible of the full experiment with the top-up scheme. In addition, from a statistical viewpoint, it makes no difference whether a single large top-up dose or several smaller top-up doses are used; however, other considerations suggest that two or more top-up doses may be preferable. The practical realities of designing experiments as well as the somewhat idealized statistical considerations are discussed.
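The design plot can be illustrated with a short sketch. It assumes the standard linear-quadratic form, effect = n(αd + βd²), for n fractions of dose d, so that schedules with equal effect fall on a straight line with slope -α/β in the nd² versus nd plane. The parameter values and function names below are hypothetical, and the noise-free line fit is only a demonstration, not the estimation procedure studied in the paper.

```python
from math import sqrt

def isoeffective_dose_per_fraction(n, effect, alpha, beta):
    """Dose per fraction d solving n * (alpha * d + beta * d**2) = effect."""
    return (-alpha + sqrt(alpha**2 + 4.0 * beta * effect / n)) / (2.0 * beta)

def design_points(fractions, effect, alpha, beta):
    """Design-plot coordinates (nd, nd^2) for isoeffective schedules."""
    points = []
    for n in fractions:
        d = isoeffective_dose_per_fraction(n, effect, alpha, beta)
        points.append((n * d, n * d * d))
    return points

def alpha_beta_from_isoeffect_line(points):
    """Least-squares slope of nd^2 on nd; alpha/beta is minus that slope."""
    m = len(points)
    mean_x = sum(x for x, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx

if __name__ == "__main__":
    # Hypothetical parameters with alpha/beta = 10 (in dose units).
    pts = design_points([1, 2, 5, 10, 30], effect=3.0, alpha=0.3, beta=0.03)
    print(alpha_beta_from_isoeffect_line(pts))
```

A wide range of fraction numbers spreads the points along the isoeffect line, which is the property the abstract identifies as making a design efficient for estimating alpha/beta.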

14.
The quantification of gene expression by real-time polymerase chain reaction (PCR) has revolutionized the field of gene expression analysis. Due to its sensitivity and flexibility, it is becoming the method of choice for many investigators. However, good normalization protocols still have to be implemented to facilitate data exchange and comparison. We have designed primers for 10 unrelated genes and developed a simple protocol to detect genes with stable expression that are suitable for use as endogenous reference genes in the normalization of gene expression data obtained by real-time PCR. Using this protocol, we were able to identify human proteasome subunit Y as a reliable endogenous reference gene for human umbilical vein endothelial cells treated for up to 18 h with TNFalpha, IL-4, or IFNgamma and for B cells isolated from healthy controls and patients suffering from IgA nephropathy. Other optional endogenous reference genes that can be considered are phosphomannomutase (PPMM) and actin for endothelial cells and glyceraldehyde-3-phosphate dehydrogenase and PPMM for B cells.

15.
Discovery of single nucleotide polymorphisms (SNPs) requires analysis of redundant sequences such as those available in large public databases. The ability to detect SNPs, especially those of low frequency, is dependent on the depth and scale of the discovery effort. Large numbers of SNPs have been identified by mining large-scale EST surveys and whole genome sequencing projects. These surveys however are subject to ascertainment bias and the inherent errors in large-scale single pass sequencing efforts. For example, the number of steps involved in the construction and sequencing of cDNA libraries make ESTs highly error prone, resulting in an increased frequency of nonvalid SNPs obtained in these surveys. Sequences of mtDNA genes are often incorporated into cDNA libraries as an artifact of the library construction process and are typically either subtracted from cDNA libraries or are considered superfluous when evaluating the information content of EST datasets. Sequences of mtDNA genes provide a unique resource for the analysis of SNP parameters in EST projects. This study uses sequences from four turkey muscle cDNA libraries to demonstrate how mtDNA sequences gleaned from collections of ESTs can be used to estimate SNP parameters and thus help predict the validity of SNPs.

16.
17.
Using DNA microarrays to study gene expression in closely related species
MOTIVATION: Comparisons of gene expression levels within and between species have become a central tool in the study of the genetic basis for phenotypic variation, as well as in the study of the evolution of gene regulation. DNA microarrays are a key technology that enables these studies. Currently, however, microarrays are only available for a small number of species. Thus, in order to study gene expression levels in species for which microarrays are not available, researchers face three sets of choices: (i) use a microarray designed for another species, but only compare gene expression levels within species, (ii) construct a new microarray for every species whose gene expression profiles will be compared or (iii) build a multi-species microarray with probes from each species of interest. Here, we use data collected using a multi-primate cDNA array to evaluate the reliability of each approach. RESULTS: We find that, for inter-species comparisons, estimates of expression differences based on multi-species microarrays are more accurate than those based on multiple species-specific arrays. We also demonstrate that within-species expression differences can be estimated using a microarray for a closely related species, without discernible loss of information. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

18.
19.
20.
Gene flow from crops to wild related species has recently been a focus of risk-assessment studies of the ecological consequences of growing transgenic crops. However, experimental studies addressing this question are usually temporally or spatially limited. Indirect population-structure approaches can provide more global estimates of gene flow, but their assumptions appear inappropriate in an agricultural context. In an attempt to help the committees providing advice on the release of transgenic crops, we present a new method to estimate the quantity of genes migrating from crops to populations of related wild plants by way of pollen dispersal. This method provides an average estimate at a landscape level. Its originality lies in measuring the inverse gene flow, i.e. gene flow from the wild plants to the crop. Such gene flow results in an observed level of impurities from wild plants in crop seeds. This level of impurity is usually known by the seed producers and, in any case, is easier to measure than a direct screen of wild populations because crop seeds are abundant and their genetic profile is known. By assuming that wild and cultivated plants have a similar individual pollen dispersal function, we infer the level of pollen-mediated gene flow from a crop to the surrounding wild populations from this observed level of impurity. We present an example for sugar beet data. Results suggest that under conditions of seed production in France (isolation distance of 1,000 m) wild beets produce high numbers of seeds fathered by cultivated plants. Received: 5 February 2001 / Accepted: 26 March 2001
