Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
2.
Data integration procedures combine heterogeneous data sets into predictive models, but they are limited to data explicitly related to the target object type, such as genes. Collage is a new data fusion approach to gene prioritization. It considers data sets of various association levels with the prediction task, utilizes collective matrix factorization to compress the data, and chaining to relate different object types contained in a data compendium. Collage prioritizes genes based on their similarity to several seed genes. We tested Collage by prioritizing bacterial response genes in Dictyostelium as a novel model system for prokaryote-eukaryote interactions. Using 4 seed genes and 14 data sets, only one of which was directly related to the bacterial response, Collage proposed 8 candidate genes that were readily validated as necessary for the response of Dictyostelium to Gram-negative bacteria. These findings establish Collage as a method for inferring biological knowledge from the integration of heterogeneous and coarsely related data sets.
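The seed-based ranking step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each gene already has a numeric profile vector (in Collage these would come from the collective matrix factorization, which is not reproduced here) and it assumes cosine similarity as the similarity measure, which the abstract does not specify. The names `cosine` and `prioritize` and the toy profiles are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prioritize(profiles, seeds):
    """Rank non-seed genes by their mean similarity to the seed genes.

    profiles: dict mapping gene name -> profile vector (hypothetical latent features)
    seeds:    collection of gene names used as seeds
    """
    candidates = [gene for gene in profiles if gene not in seeds]
    score = {
        gene: sum(cosine(profiles[gene], profiles[s]) for s in seeds) / len(seeds)
        for gene in candidates
    }
    return sorted(candidates, key=score.get, reverse=True)


# Toy profiles, invented for illustration only.
toy_profiles = {
    "seedA": [1.0, 0.0],
    "seedB": [0.9, 0.1],
    "g1": [1.0, 0.05],
    "g2": [0.0, 1.0],
}
print(prioritize(toy_profiles, {"seedA", "seedB"}))
```

Averaging over seeds is one simple aggregation choice; other choices (for example, the maximum similarity to any seed) would fit the same description.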

3.
4.

Background

Large amounts of microarray expression data have been generated for the Apicomplexan parasite Toxoplasma gondii in an effort to identify genes critical for virulence or developmental transitions. However, researchers’ ability to analyze this data is limited by the large number of unannotated genes, including many that appear to be conserved hypothetical proteins restricted to Apicomplexa. Further, differential expression of individual genes is not always informative and often relies on investigators to draw big-picture inferences without the benefit of context. We hypothesized that customization of gene set enrichment analysis (GSEA) to T. gondii would enable us to rigorously test whether groups of genes serving a common biological function are co-regulated during the developmental transition to the latent bradyzoite form.

Results

Using publicly available T. gondii expression microarray data, we created Toxoplasma gene sets related to bradyzoite differentiation, oocyst sporulation, and the cell cycle. We supplemented these with lists of genes derived from community annotation efforts that identified contents of the parasite-specific organelles, rhoptries, micronemes, dense granules, and the apicoplast. Finally, we created gene sets based on metabolic pathways annotated in the KEGG database and Gene Ontology terms associated with gene annotations available at http://www.toxodb.org. These gene sets were used to perform GSEA analysis using two sets of published T. gondii expression data that characterized T. gondii stress response and differentiation to the latent bradyzoite form.

Conclusions

GSEA provides evidence that cell cycle regulation and bradyzoite differentiation are coupled. Δgcn5A mutants unable to induce bradyzoite-associated genes in response to alkaline stress have different patterns of cell cycle and bradyzoite gene expression from stressed wild-type parasites. Extracellular tachyzoites resemble a transitional state that differs in gene expression from both replicating intracellular tachyzoites and in vitro bradyzoites by expressing genes that are enriched in bradyzoites as well as genes that are associated with the G1 phase of the cell cycle. The gene sets we have created are readily modified to reflect ongoing research and will aid researchers’ ability to use a knowledge-based approach to data analysis facilitating the development of new insights into the intricate biology of Toxoplasma gondii.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-515) contains supplementary material, which is available to authorized users.

5.
6.
7.
One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It makes no assumptions about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
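The decomposition described in this abstract (one regression problem per target gene, with input-gene importances read as putative regulatory links) can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: to stay dependency-free it replaces Random Forests and Extra-Trees with an ensemble of randomized depth-one trees (stumps), and it measures importance as the fraction of target variance removed by each split. The function names and the simulated data are hypothetical.

```python
import random
from statistics import pvariance


def stump_importances(inputs, target, n_trees=300, rng=None):
    """Score each input column by the variance reduction of randomized stumps.

    inputs: list of samples, each a list of input-gene values
    target: list of target-gene values, one per sample
    """
    if rng is None:
        rng = random.Random(0)
    n_samples, n_inputs = len(inputs), len(inputs[0])
    total_ss = pvariance(target) * n_samples  # total sum of squares of the target
    scores = [0.0] * n_inputs
    if total_ss == 0:
        return scores
    for _ in range(n_trees):
        j = rng.randrange(n_inputs)  # random input gene
        column = [row[j] for row in inputs]
        threshold = rng.uniform(min(column), max(column))  # random cut point
        left = [t for x, t in zip(column, target) if x <= threshold]
        right = [t for x, t in zip(column, target) if x > threshold]
        if not left or not right:
            continue
        split_ss = pvariance(left) * len(left) + pvariance(right) * len(right)
        # Dividing by total_ss keeps scores comparable across target genes.
        scores[j] += (total_ss - split_ss) / total_ss
    return [s / n_trees for s in scores]


def rank_links(expression, n_trees=300, seed=0):
    """Return (input_gene, target_gene, score) triples, highest score first.

    expression: list of samples, each a list with one value per gene
    """
    rng = random.Random(seed)
    n_genes = len(expression[0])
    links = []
    for target_gene in range(n_genes):  # one regression problem per gene
        input_genes = [g for g in range(n_genes) if g != target_gene]
        inputs = [[sample[g] for g in input_genes] for sample in expression]
        target = [sample[target_gene] for sample in expression]
        scores = stump_importances(inputs, target, n_trees, rng)
        links.extend((g, target_gene, s) for g, s in zip(input_genes, scores))
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Calling `rank_links` on a samples-by-genes table returns directed candidate links; in practice the stump ensemble would be swapped for a full tree-ensemble regressor with a feature-importance measure.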

8.

Background

The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set.

Results

In this work we propose a new gene set analysis method that computes a gene set score as the mean of absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize the genes appearing in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we called our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results.

Conclusions

PADOG significantly improves gene set ranking and boosts sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG was implemented as an R package available at: http://bioinformaticsprb.med.wayne.edu/PADOG/ or http://www.bioconductor.org.
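The scoring rule stated in the Results (mean of absolute weighted moderated t-scores, with genes that occur in many gene sets down-weighted) can be sketched as follows. This is an illustration under assumptions, not the PADOG package: moderated t-scores are taken as given input, the weight formula (ranging from 1 for the most frequent gene to 2 for the least frequent) is an assumption about the exact form, and the permutation-based standardization a full analysis would need is omitted. All names and numbers are hypothetical.

```python
from collections import Counter
from math import sqrt


def gene_weights(gene_sets):
    """Weight genes by how rarely they appear across gene sets.

    Assumed form: w(g) = 1 + sqrt((max_f - f(g)) / (max_f - min_f)),
    where f(g) is the number of gene sets containing gene g.
    """
    freq = Counter(g for genes in gene_sets.values() for g in set(genes))
    lo, hi = min(freq.values()), max(freq.values())
    if hi == lo:  # every gene is equally frequent
        return {g: 1.0 for g in freq}
    return {g: 1.0 + sqrt((hi - f) / (hi - lo)) for g, f in freq.items()}


def gene_set_scores(t_scores, gene_sets):
    """Mean of |t| * weight over the genes of each set that have a t-score."""
    weights = gene_weights(gene_sets)
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in t_scores]
        if present:
            scores[name] = sum(abs(t_scores[g]) * weights[g] for g in present) / len(present)
    return scores


# Toy input, invented for illustration only.
toy_sets = {"A": ["g1", "g2"], "B": ["g2", "g3"], "C": ["g2", "g4"]}
toy_t = {"g1": 1.0, "g2": -2.0, "g3": 0.5, "g4": 3.0}
print(gene_set_scores(toy_t, toy_sets))
```

In the toy input, `g2` belongs to all three sets and therefore contributes with the minimum weight, while the set-specific genes count double.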

9.
10.
Our ability to engineer organisms with new biosynthetic pathways and genetic circuits is limited by the availability of protein characterization data and the cost of synthetic DNA. With new tools for reading and writing DNA, there are opportunities for scalable assays that more efficiently and cost-effectively mine for biochemical protein characteristics. To that end, we have developed the Multiplex Library Synthesis and Expression Correction (MuLSEC) method for rapid assembly, error correction, and expression characterization of many genes as a pooled library. This methodology enables gene synthesis from microarray-synthesized oligonucleotide pools with a one-pot technique, eliminating the need for robotic liquid handling. Post assembly, the gene library is subjected to an ampicillin-based quality control selection, which serves as both an error correction step and a selection for proteins that are properly expressed and folded in E. coli. Next-generation sequencing of post-selection DNA enables quantitative analysis of gene expression characteristics. We demonstrate the feasibility of this approach by building and testing over 90 genes for empirical evidence of soluble expression. This technique reduces the problem of part characterization to multiplex oligonucleotide synthesis and deep sequencing, two technologies under extensive development with projected cost reduction.

11.
Epithelial morphogenesis is characterized by an exquisite control of cell shape and position. Progression through dorsal closure in Drosophila gastrulation depends on the ability of Rap1 GTPase to signal through the adherens junctional multidomain protein Canoe. Here, we provide genetic evidence that epithelial Rap activation and Canoe effector usage are conferred by the Drosophila PDZ-GEF (dPDZ-GEF) exchange factor. We demonstrate that dPDZ-GEF/Rap/Canoe signaling modulates cell shape and apicolateral cell constriction in embryonic and wing disc epithelia. In dPDZ-GEF mutant embryos with strong dorsal closure defects, cells in the lateral ectoderm fail to properly elongate. Postembryonic dPDZ-GEF mutant cells generated in mosaic tissue display a striking extension of lateral cell perimeters in the proximity of junctional complexes, suggesting a loss of normal cell contractility. Furthermore, our data indicate that dPDZ-GEF signaling is linked to myosin II function. Both dPDZ-GEF and cno show strong genetic interactions with the myosin II-encoding gene, and myosin II distribution is severely perturbed in epithelia of both mutants. These findings provide the first insight into the molecular machinery targeted by Rap signaling to modulate epithelial plasticity. We propose that dPDZ-GEF-dependent signaling functions as a rheostat linking Rap activity to the regulation of cell shape in epithelial morphogenesis at different developmental stages.

12.
New genes originate frequently across diverse taxa. Given that genetic networks are typically composed of robust, co-evolved interactions, the emergence of new genes raises an intriguing question: how do new genes interact with pre-existing genes? Here, we show that a recently originated gene rapidly evolved new gene networks and impacted sex-biased gene expression in Drosophila. This 4–6 million-year-old factor, named Zeus for its role in male fecundity, originated through retroposition of a highly conserved housekeeping gene, Caf40. Zeus acquired male reproductive organ expression patterns and phenotypes. Comparative expression profiling of mutants and closely related species revealed that Zeus has recruited a new set of downstream genes, and shaped the evolution of gene expression in the germline. Comparative ChIP-chip revealed that the genomic binding profile of Zeus diverged rapidly from that of Caf40. These data demonstrate, for the first time, how a new gene quickly evolved novel networks governing essential biological processes at the genomic level.

13.
In plants, chalcones are precursors for a large number of flavonoid-derived plant natural products and are converted to flavanones by chalcone isomerase or nonenzymatically. Chalcones are synthesized from tyrosine and phenylalanine via the phenylpropanoid pathway involving phenylalanine ammonia lyase (PAL), cinnamate-4-hydroxylase (C4H), 4-coumarate:coenzyme A ligase (4CL), and chalcone synthase (CHS). To produce flavanones in Escherichia coli, three sets of an artificial gene cluster were constructed, each containing three genes of heterologous origins, namely PAL from the yeast Rhodotorula rubra, 4CL from the actinomycete Streptomyces coelicolor A3(2), and CHS from the licorice plant Glycyrrhiza echinata. The three sets were constructed as follows: (i) PAL, 4CL, and CHS were placed in that order under the control of the T7 promoter (PT7) and the ribosome-binding sequence (RBS) in the pET vector, where the initiation codons of 4CL and CHS overlapped with the termination codons of the preceding genes; (ii) the three genes were transcribed by a single PT7 in front of PAL, and each of the three contained the RBS at appropriate positions; and (iii) all three genes contained both PT7 and the RBS. These pathways bypassed C4H, a cytochrome P-450 hydroxylase, because the bacterial 4CL enzyme ligated coenzyme A to both cinnamic acid and 4-coumaric acid. E. coli cells containing the gene clusters produced two flavanones, pinocembrin from phenylalanine and naringenin from tyrosine, in addition to their precursors, cinnamic acid and 4-coumaric acid. Of the three sets, the third gene cluster conferred on the host the highest ability to produce the flavanones. This is a new metabolic engineering technique for the production in bacteria of a variety of compounds of plant and animal origin.

14.
The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in-house gene expression databases that benefit from coherent data sets, and which are constructed to facilitate a particular analytic method, but rather to complement these by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, which were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
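The relationship among the three entities described in this abstract can be modeled as a small data structure. The sketch below only mirrors the wording of the abstract (a platform lists probes, a sample references exactly one platform, a series groups samples); the class fields, the validation rule, and the accession strings are hypothetical and do not reflect GEO's actual schema or file formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Platform:
    """A list of probes defining which molecules may be detected."""
    accession: str  # hypothetical identifier
    probes: List[str]


@dataclass
class Sample:
    """Abundance measurements generated on exactly one platform."""
    accession: str  # hypothetical identifier
    platform: Platform
    abundance: Dict[str, float]  # probe name -> measured abundance

    def __post_init__(self):
        # Illustrative consistency check: every measured probe must exist on the platform.
        unknown = set(self.abundance) - set(self.platform.probes)
        if unknown:
            raise ValueError(f"probes not on platform {self.platform.accession}: {sorted(unknown)}")


@dataclass
class Series:
    """A group of samples that together make up an experiment."""
    accession: str  # hypothetical identifier
    title: str
    samples: List[Sample] = field(default_factory=list)


platform = Platform("PLATFORM_1", ["probe_a", "probe_b"])
series = Series(
    "SERIES_1",
    "toy experiment",
    [
        Sample("SAMPLE_1", platform, {"probe_a": 1.5, "probe_b": 0.2}),
        Sample("SAMPLE_2", platform, {"probe_a": 0.9}),
    ],
)
print(len(series.samples), series.samples[0].platform.accession)
```

Holding a reference to the `Platform` object inside each `Sample` reflects the statement that a sample references a single platform, while a series may still mix samples from different platforms.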

15.
16.

Background  

A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. The development of computational methods to decode the interactions among cis-regulatory elements has been slow, in part, because it is difficult to know, without extensive experimental validation, whether a particular method identifies the correct cis-regulatory interactions that underlie a given set of expression data. There is an urgent need for test expression data in which the interactions among cis-regulatory sites that produce the data are known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence.

17.
18.
Inferring gene regulatory networks (GRNs) is a major issue in systems biology, which explicitly characterizes regulatory processes in the cell. The Path Consistency Algorithm based on Conditional Mutual Information (PCA-CMI) is a well-known method in this field. In this study, we introduce a new algorithm (IPCA-CMI) and apply it to a number of gene expression data sets in order to evaluate the accuracy of the algorithm in inferring GRNs. IPCA-CMI can be categorized as a hybrid method, using PCA-CMI and the Hill-Climbing algorithm (based on the MIT score). The conditional dependence between variables is determined by the conditional mutual information test, which can take into account both linear and nonlinear gene relations. IPCA-CMI uses a score-and-search method and defines a selected set of variables that is adjacent to one of X or Y. This set is used to determine the dependency between X and Y. This method is compared with the method of evaluating dependency in PCA-CMI, in which the set of variables adjacent to both X and Y is selected. The merits of IPCA-CMI are evaluated by applying this algorithm to the DREAM3 Challenge data sets with n variables and n samples () and to experimental data from Escherichia coli containing 9 variables and 9 samples. Results indicate that applying IPCA-CMI improves the precision of learning the structure of the GRNs in comparison with that of PCA-CMI.
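The conditional mutual information test at the core of this abstract can be illustrated for the simplest case of conditioning on a single variable. The sketch assumes jointly Gaussian data, for which mutual information reduces to -0.5 * ln(1 - r^2) with r the (partial) correlation; this Gaussian form is an assumption made for the example, and the search over adjacent variable sets and the Hill-Climbing step are not reproduced. Function names and the simulated data are hypothetical.

```python
import random
from math import log, sqrt


def correlation(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _mi_from_r(r):
    # Clip so that a perfect correlation does not produce log(0).
    r2 = min(r * r, 1.0 - 1e-12)
    return -0.5 * log(1.0 - r2)


def gaussian_mi(x, y):
    """Mutual information I(X;Y) under a Gaussian assumption."""
    return _mi_from_r(correlation(x, y))


def gaussian_cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z) for one conditioning variable Z."""
    rxy, rxz, ryz = correlation(x, y), correlation(x, z), correlation(y, z)
    denom = sqrt(max((1.0 - rxz ** 2) * (1.0 - ryz ** 2), 1e-12))
    partial = (rxy - rxz * ryz) / denom  # partial correlation of X and Y given Z
    return _mi_from_r(partial)


def simulate(n=500, seed=0):
    """Toy data: X and Y both depend on Z but not directly on each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [v + rng.gauss(0, 0.3) for v in z]
    y = [v + rng.gauss(0, 0.3) for v in z]
    return x, y, z


x, y, z = simulate()
print(gaussian_mi(x, y), gaussian_cmi(x, y, z))
```

In the simulated data the unconditional dependence between X and Y is strong, but it largely disappears once Z is conditioned on; a path-consistency procedure would use a threshold on such values to decide whether to keep the X-Y edge.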

19.
Rho family GTPases act as molecular switches to regulate a range of physiological functions, including the regulation of the actin-based cytoskeleton, membrane trafficking, cell morphology, nuclear gene expression, and cell growth. Rho function is regulated by its ability to bind GTP and by its localization. We previously demonstrated functional and physical interactions between Rho3 and the clathrin-associated adaptor protein-1 (AP-1) complex, which revealed a role of Rho3 in regulating Golgi/endosomal trafficking in fission yeast. Sip1, a conserved AP-1 accessory protein, recruits the AP-1 complex to the Golgi/endosomes through physical interaction. In this study, we showed that Sip1 is required for Rho3 localization. First, overexpression of rho3+ suppressed defective membrane trafficking associated with sip1-i4 mutant cells, including defects in vacuolar fusion, Golgi/endosomal trafficking and secretion. Notably, Sip1 interacted with Rho3, and GFP-Rho3, similar to Apm1-GFP, did not properly localize to the Golgi/endosomes in sip1-i4 mutant cells at 27°C. Interestingly, the C-terminal region of Sip1 is required for its localization to the Golgi/endosomes, because Sip1-i4-GFP protein failed to properly localize to Golgi/endosomes, whereas the fluorescence of Sip1ΔN mutant protein co-localized with that of FM4-64. Consistently, in the sip1-i4 mutant cells, which lack the C-terminal region of Sip1, binding between Apm1 and Rho3 was greatly impaired, presumably due to mislocalization of these proteins in the sip1-i4 mutant cells. Furthermore, the interaction between Apm1 and Rho3 as well as Rho3 localization to the Golgi/endosomes were significantly rescued in sip1-i4 mutant cells by the expression of Sip1ΔN. Taken together, these results suggest that Sip1 recruits Rho3 to the Golgi/endosomes through physical interaction and enhances the formation of the Golgi/endosome AP-1/Rho3 complex, thereby promoting crosstalk between AP-1 and Rho3 in the regulation of Golgi/endosomal trafficking in fission yeast.

20.
H-NS family proteins, bacterial xenogeneic silencers, play central roles in genome organization and in the regulation of foreign genes. It is thought that gene repression is directly dependent on the DNA binding modes of H-NS family proteins. These proteins form lateral protofilaments along DNA. Under specific environmental conditions they switch to bridging two DNA duplexes. This switching is a direct effect of environmental conditions on electrostatic interactions between the oppositely charged DNA binding and N-terminal domains of H-NS proteins. The Pseudomonas lytic phage LUZ24 encodes the protein gp4, which modulates the DNA binding and function of the H-NS family protein MvaT of Pseudomonas aeruginosa. However, the mechanism by which gp4 affects MvaT activity remains elusive. In this study, we show that gp4 specifically interferes with the formation and stability of the bridged MvaT–DNA complex. Structural investigations suggest that gp4 acts as an ‘electrostatic zipper’ between the oppositely charged domains of MvaT protomers, and stabilizes a structure resembling their ‘half-open’ conformation, resulting in relief of gene silencing and adverse effects on P. aeruginosa growth. The ability to control H-NS conformation and thereby its impact on global gene regulation and growth might open new avenues to fight Pseudomonas multidrug resistance.
