Similar articles
1.
Combined analysis of the microarray and drug-activity datasets has the potential to reveal valuable knowledge about various relations among gene expressions and drug activities in the malignant cell. In this paper, we apply Bayesian networks, a tool for compact representation of the joint probability distribution, to such analysis. To alleviate the data dimensionality problem, the huge datasets were condensed using a feature abstraction technique. The proposed analysis method was applied to the NCI60 dataset (http://discover.nci.nih.gov) consisting of gene expression profiles and drug activity patterns on human cancer cell lines. The Bayesian networks, learned from the condensed dataset, identified most of the salient pairwise correlations and some known relationships among several features in the original dataset, confirming the effectiveness of the proposed feature abstraction method. Also, a survey of the recent literature confirms that several relationships appearing in the learned Bayesian network are biologically meaningful.

2.
MOTIVATION: Discrimination between two classes such as normal and cancer samples and between two types of cancers based on gene expression profiles is an important problem which has practical implications as well as the potential to further our understanding of gene expression of various cancer cells. Classification or discrimination of more than two groups or classes (multi-class) is also needed. The need for multi-class discrimination methodologies is apparent in many microarray experiments where various cancer types are considered simultaneously. RESULTS: Thus, in this paper we present the extension of the classification methodology proposed earlier by Nguyen and Rocke (2002b; Bioinformatics, 18, 39-50) to classify cancer samples from multiple classes. The methodologies proposed in this paper are applied to four gene expression data sets with multiple classes: (a) a hereditary breast cancer data set with (1) BRCA1-mutation, (2) BRCA2-mutation and (3) sporadic breast cancer samples, (b) an acute leukemia data set with (1) acute myeloid leukemia (AML), (2) T-cell acute lymphoblastic leukemia (T-ALL) and (3) B-cell acute lymphoblastic leukemia (B-ALL) samples, (c) a lymphoma data set with (1) diffuse large B-cell lymphoma (DLBCL), (2) B-cell chronic lymphocytic leukemia (BCLL) and (3) follicular lymphoma (FL) samples, and (d) the NCI60 data set with cell lines derived from cancers of various sites of origin. In addition, we evaluated the classification algorithms and examined the variability of the error rates using simulations based on randomization of the real data sets. We note that other methods for multi-class prediction have been proposed recently; our approach follows the line of Nguyen and Rocke (2002b; Bioinformatics, 18, 39-50). CONTACT: dnguyen@stat.tamu.edu; dmrocke@ucdavis.edu

3.
This paper examines a new technique for the visualization of and the interaction with trees, objects frequently used to convey hierarchical relationships in biological data. Motivated by the quality of 2D tree interaction, we adapt the planar tree-of-life metaphor to a virtual, semi-immersive 3D environment. A 3D environment extends the utility of this metaphor by allowing the user to view an entire data set in a single screen. Interrogation of the tree is implemented using 3D input devices. This real-time interrogation of the tree itself provides a quick means by which to qualitatively analyse the hierarchical data. In this paper, we describe the techniques underlying the implementation of such an environment. We conclude by considering the utility of tree metaphors as a basis for the representation of highly dimensional data sets. AVAILABILITY: Arbor3D (source code, a binary executable for SGI IRIX 6.4, Perl parsers, and sample Newick data files) is available via the Internet (http://xian.tamu.edu/Arbor3D/). Arbor3D can be displayed in "CAVE simulator" mode on an SGI workstation screen, or as an interactive virtual environment on a projection workbench. CONTACT: druths@rice.edu; echen@cs.rice.edu; leland@xian.tamu.edu

4.
The essence of a living cell is adaptation to a changing environment, and a central goal of modern cell biology is to understand adaptive change under normal and pathological conditions. Because the number of components is large, and processes and conditions are many, visual tools are useful in providing an overview of relations that would otherwise be far more difficult to assimilate. Historically, representations were static pictures, with genes and proteins represented as nodes, and known or inferred correlations between them (links) represented by various kinds of lines. The modern challenge is to capture functional hierarchies and adaptation to environmental change, and to discover pathways and processes embedded in known data, but not currently recognizable. Among the tools being developed to meet this challenge is VisANT (freely available at http://visant.bu.edu) which integrates, mines and displays hierarchical information. Challenges to integrating modeling (discrete or continuous) and simulation capabilities into such visual mining software are briefly discussed.

5.
MOTIVATION: A major problem of pattern classification is estimation of the Bayes error when only small samples are available. One way to estimate the Bayes error is to design a classifier based on some classification rule applied to sample data, estimate the error of the designed classifier, and then use this estimate as an estimate of the Bayes error. Relative to the Bayes error, the expected error of the designed classifier is biased high, and this bias can be severe with small samples. RESULTS: This paper provides a correction for the bias by subtracting a term derived from the representation of the estimation error. It does so for Boolean classifiers, these being defined on binary features. Although the general theory applies to any Boolean classifier, a model is introduced to reduce the number of parameters. A key point is that the expected correction is conservative. Properties of the corrected estimate are studied via simulation. The correction applies to binary predictors because they are mathematically identical to Boolean classifiers. In this context the correction is adapted to the coefficient of determination, which has been used to measure nonlinear multivariate relations between genes and design genetic regulatory networks. An application using gene-expression data from a microarray experiment is provided on the website http://gspsnap.tamu.edu/smallsample/ (user: 'smallsample', password: 'smallsample').

6.
The gene that encodes the alpha-isoform of phosphatidylinositol 3-kinase (PIK3Ca) is frequently mutated in human cancers. We profiled the mutation status of the PIK3Ca gene in the National Cancer Institute (NCI)-60 panel of human cancer cell lines maintained by the Developmental Therapeutics Program of the NCI. Mutation hotspots on the gene were PCR amplified and sequenced, and the trace data were analyzed with software designed to detect mutations. Seven of the cell lines tested have PIK3Ca mutations: two lines derived from breast cancer, two from colon cancer, two from ovarian cancer, and one from lung cancer. BRAF and EGFR genes were normal in the PIK3Ca mutant lines. Two of the cell lines with mutant PIK3Ca also have a mutant version of the KRAS gene. The mutation status was correlated with array-based gene expression that is publicly available for the NCI-60 cell lines. We found increased expression levels for estrogen receptor (ER) and ERBB2 in PIK3Ca mutant lines. The PIK3Ca mutation status was also correlated with compound screening data for the cell lines. PIK3Ca-mutant cell lines were relatively more sensitive than PIK3Ca-normal cell lines to the ER inhibitor tamoxifen and the AKT inhibitor triciribine, among other compounds. The results provide insights into the role of mutant PIK3Ca in oncogenic signaling and allow preliminary identification of novel targets for therapeutic intervention in cancers harboring PIK3Ca mutations.

7.
MOTIVATION: The existence of several technologies for measuring gene expression makes the question of cross-technology agreement of measurements an important issue. Cross-platform utilization of data from different technologies has the potential to reduce the need to duplicate experiments but requires corresponding measurements to be comparable. METHODS: A comparison of mRNA measurements of 2895 sequence-matched genes in 56 cell lines from the standard panel of 60 cancer cell lines from the National Cancer Institute (NCI 60) was carried out by calculating the correlation between matched measurements and the concordance between clusters from two high-throughput DNA microarray technologies, Stanford type cDNA microarrays and Affymetrix oligonucleotide microarrays. RESULTS: In general, corresponding measurements from the two platforms showed poor correlation. Clusters of genes and cell lines were discordant between the two technologies, suggesting that relative intra-technology relationships were not preserved. GC-content, sequence length, average signal intensity, and an estimator of cross-hybridization were found to be associated with the degree of correlation. This suggests that gene-specific, or more correctly probe-specific, factors influence measurements differently in the two platforms, implying a poor prognosis for a broad utilization of gene expression measurements across platforms.

8.
For anticancer drug therapy, it is critical to kill those cells with highest tumorigenic potential, even when they comprise a relatively small fraction of the overall tumor cell population. We have used the established NCI/DTP 60 cell line growth inhibition assay as a platform for exploring the relationship between chemical structure and growth inhibition in both tumorigenic and non-tumorigenic cancer cell lines. Using experimental measurements of “take rate” in ectopic implants as a proxy for tumorigenic potential, we identified eight chemical agents that appear to strongly and selectively inhibit the growth of the most tumorigenic cell lines. Biochemical assay data and structure-activity relationships indicate that these compounds act by inhibiting tubulin polymerization. Yet, their activity against tumorigenic cell lines is more selective than that of the other microtubule inhibitors in clinical use. Biochemical differences in the tubulin subunits that make up microtubules, or differences in the function of microtubules in mitotic spindle assembly or cell division may be associated with the selectivity of these compounds.

9.
MOTIVATION: We present an application of Bayesian variable selection to the novel detection of sequence elements that confer negative design to protein structure and function. As an illustration, we analyze the different dimer interfaces between the CXCL8 chemokine family and the CCL4 and CCL2 chemokine families to discover the changes that disfavor CXCL8 quaternary structure. RESULTS: In comparison with known experimental results, our method identifies evolutionarily conserved sequence changes in the CC families that inhibit CXCL8 quaternary structure. Therefore, we find positive selection of negative design elements. Furthermore, our approach predicts that a two-residue deletion conserved in the CCL4 chemokine family disfavors CXCL8 dimerization. AVAILABILITY: The Matlab code for the Bayesian variable selection is freely available at http://stat.tamu.edu/~mvannucci/webpages/codes.html

10.
Cluster analysis of gene-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure a general dependence among gene random variables. We propose a novel clustering strategy based on minimizing mutual information among gene clusters. Simulated annealing is employed to solve the optimization problem. Bootstrap techniques are employed to get more accurate estimates of mutual information when the data sample size is small. Moreover, we propose to combine the mutual information criterion and traditional distance criteria such as the Euclidean distance and the fuzzy membership metric in designing the clustering algorithm. The performances of the new clustering methods are compared with those of some existing methods, using both synthesized data and experimental data. It is seen that the clustering algorithm based on a combined metric of mutual information and fuzzy membership achieves the best performance. The supplemental material is available at www.gspsnap.tamu.edu/gspweb/zxb/glioma_zxb.
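
To make the mutual information criterion in this abstract concrete, the following is a minimal sketch of a plug-in estimate of I(X;Y) between two expression vectors. The abstract does not say which estimator the authors used, so the equal-width binning, the `bins` parameter and the function name are assumptions for illustration only; the bootstrap refinement and the simulated annealing search are not shown.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=4):
    """Plug-in estimate of I(X;Y) in bits from equal-width binned samples.

    Hypothetical helper: the binning scheme is an assumption, not the
    estimator described in the paper.
    """
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum of p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over observed bin pairs.
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


if __name__ == "__main__":
    profile = [0, 1, 2, 3, 4, 5, 6, 7]
    print(mutual_information(profile, profile))            # identical profiles
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2))  # independent
```

With few samples a plug-in estimate like this is biased, which is presumably why the abstract mentions bootstrap techniques for small sample sizes.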

11.
The NCI60 database is the largest available collection of compounds with measured anti-cancer activity. The strengths and limitations of using the NCI60 database as a source of new anti-cancer agents are explored and discussed in relation to previous studies. We selected a subset of 2333 compounds with reliable experimental half-maximal growth inhibition (GI50) values for 30 cell lines from the NCI60 data set and evaluated their growth inhibitory effect (chemosensitivity) with respect to tissue of origin. This was done by identifying natural clusters in the chemosensitivity data set and in a data set of expression profiles of 1901 genes for the corresponding tumor cell lines. Five clusters were identified based on the gene expression data using self-organizing maps (SOM), comprising leukemia, melanoma, ovarian and prostate, basal breast, and luminal breast cancer cells, respectively. The strong difference in gene expression between basal and luminal breast cancer cells was reflected clearly in the chemosensitivity data. Although most compounds in the data set were of low potency, high efficacy compounds that showed specificity with respect to tissue of origin could be found. Furthermore, eight potential topoisomerase II inhibitors were identified using a structural similarity search. Finally, a set of genes with expression profiles that were significantly correlated with anti-cancer drug activity was identified. Our study demonstrates that the combined data sets, which provide comprehensive information on drug activity and gene expression profiles of the tumor cell lines studied, are useful for identifying potential new active compounds.

12.
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA consistently yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/.

13.

Background  

In the clinical context, samples assayed by microarray are often classified by cell line or tumour type and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems where three tumour types and nine cell lines respectively must be identified. We apply an evolutionary algorithm to identify the near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step whereby the most informative genes are selected from the genes assayed.

14.
Over the past few decades, panels of human cancer cell lines have made a significant contribution to the discovery and development of anticancer drugs. The National Cancer Institute 60 (NCI60), which consists of 60 cell lines from various human cancer types, remains the most powerful human cancer cell line panel for high throughput screening of anticancer drugs. The development of JFCR39, comprising a panel of 39 human cancer cell lines coupled with a drug-activity database, was based on NCI60. Like NCI60, JFCR39 not only provides disease-oriented information but can also predict the action mechanism or molecular target of a given antitumor agent by utilizing the COMPARE algorithm. The molecular targets of ZSTK474 as well as several other antitumor agents have been identified by using JFCR39 and some of these compounds have since entered clinical trials. In this review, we describe human cancer cell line panels, particularly JFCR39, and their application in the discovery and/or development of anticancer drug candidates.

15.
We utilize the secondary structural properties of the 28S rRNA D2–D10 expansion segments to hypothesize a multiple sequence alignment for major lineages of the hymenopteran superfamily Ichneumonoidea (Braconidae, Ichneumonidae). The alignment consists of 290 sequences (originally analyzed in Belshaw and Quicke, Syst Biol 51:450–477, 2002) and provides the first global alignment template for this diverse group of insects. Predicted structures for these expansion segments as well as for over half of the 18S rRNA are given, with highly variable regions characterized and isolated within conserved structures. We demonstrate several pitfalls of optimization alignment and illustrate how these are potentially addressed with structure-based alignments. Our global alignment is presented online at http://hymenoptera.tamu.edu/rna with summary statistics, such as basepair frequency tables, along with novel tools for parsing structure-based alignments into input files for most commonly used phylogenetic software. These resources will be valuable for hymenopteran systematists, as well as researchers utilizing rRNA sequences for phylogeny estimation in any taxon. We explore the phylogenetic utility of our structure-based alignment by examining a subset of the data under a variety of optimality criteria using results from Belshaw and Quicke (2002) as a benchmark. Access to online data: http://hymenoptera.tamu.edu/rna; username, ichs; password, ichzzz

16.
To determine cancer pathway activities in nine types of primary tumors and NCI60 cell lines, we applied an in silico approach by examining gene signatures reflective of consequent pathway activation using gene expression data. Supervised learning approaches predicted that the Ras pathway is active in ~70% of lung adenocarcinomas but inactive in most squamous cell carcinomas, pulmonary carcinoids, and small cell lung carcinomas. In contrast, the TGF-β, TNF-α, Src, Myc, E2F3, and β-catenin pathways are inactive in lung adenocarcinomas. We predicted an active Ras, Myc, Src, and/or E2F3 pathway in significant percentages of breast cancer, colorectal carcinoma, and gliomas. Our results also suggest that Ras may be the most prevailing oncogenic pathway. Additionally, many NCI60 cell lines exhibited a gene signature indicative of an active Ras, Myc, and/or Src, but not E2F3, β-catenin, TNF-α, or TGF-β pathway. To our knowledge, this is the first comprehensive survey of cancer pathway activities in nine major tumor types and the most widely used NCI60 cell lines. The "gene expression pathway signatures" we have defined could facilitate the understanding of molecular mechanisms in cancer development and provide guidance to the selection of appropriate cell lines for cancer research and pharmaceutical compound screening.

17.
Path matching and graph matching in biological networks.
We develop algorithms for the following path matching and graph matching problems: (i) given a query path p and a graph G, find a path p' that is most similar to p in G; (ii) given a query graph G0 and a graph G, find a graph G0' that is most similar to G0 in G. In these problems, p and G0 represent a given substructure of interest to a biologist, and G represents a large network in which the biologist desires to find a related substructure. These algorithms allow the study of common substructures in biological networks in order to understand how these networks evolve both within and between organisms. We reduce the path matching problem to finding a longest weighted path in a directed acyclic graph and show that the problem of finding top k suboptimal paths can be solved in polynomial time. This is in contrast with most previous approaches, which used exponential time algorithms to find simple paths and are practical only when the paths are short. We reduce the graph matching problem to finding highest scoring subgraphs in a graph and give an exact algorithm to solve the problem when the query graph G0 is of moderate size. This eliminates the need for less accurate heuristic or randomized algorithms. We show that our algorithms are able to extract biologically meaningful pathways from protein interaction networks in the DIP database and metabolic networks in the KEGG database. Software programs implementing these techniques (PathMatch and GraphMatch) are available at http://faculty.cs.tamu.edu/shsze/pathmatch and http://faculty.cs.tamu.edu/shsze/graphmatch.
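
The core subproblem named in this abstract, finding a longest weighted path in a directed acyclic graph, can be solved by dynamic programming over a topological order. The sketch below illustrates only that generic step. It is not the PathMatch implementation: the construction of the DAG from the query path and the network is not described in the abstract, and the edge-dictionary input format and function name are assumptions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_weighted_path(edges):
    """Return (score, path) for the highest-scoring path in a DAG.

    `edges` maps (u, v) -> weight. A path may start at any node with
    score 0, so negative-weight prefixes are never included.
    Hypothetical helper for illustration only.
    """
    preds = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)

    best = {}  # node -> (best score of a path ending here, predecessor)
    for node in TopologicalSorter(preds).static_order():
        best[node] = (0.0, None)
        for p in preds[node]:
            candidate = best[p][0] + edges[(p, node)]
            if candidate > best[node][0]:
                best[node] = (candidate, p)

    # Trace back from the best end node.
    end = max(best, key=lambda n: best[n][0])
    score, path = best[end][0], []
    while end is not None:
        path.append(end)
        end = best[end][1]
    return score, path[::-1]


if __name__ == "__main__":
    dag = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 2.5, ("c", "d"): 1.0}
    print(longest_weighted_path(dag))  # (4.0, ['a', 'b', 'c', 'd'])
```

Each edge is examined once, so the running time is linear in the size of the DAG, in contrast to the exponential-time search for simple paths in a general graph that the abstract mentions.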

18.
A series of 10 derivatives 2-6 issued from the fusion of various five-membered heterocycles to cyclopenta[c]thiophene were evaluated for potential anticancer activity in the NCI's in vitro human disease-oriented tumor cell line screening panel that consisted of 60 human tumor cell lines arranged in nine subpanels, representing diverse histologies. Among these tested compounds, four were found to be cytotoxic allowing us to point out some structure-activity relationships. The oxazolidinone derivatives 2a-c displayed further in vivo antitumor activity in the hollow fiber assay and standard xenograft testing developed at the NCI.

19.
Simvastatin and lovastatin are statins traditionally used for lowering serum cholesterol levels. However, there is evidence indicating their potential chemotherapeutic characteristics in cancer. In this study, we used bioinformatic analysis of publicly available data in order to systematically identify the genes involved in resistance to cytotoxic effects of these two drugs in the NCI60 cell line panel. We used the pharmacological data available for all the NCI60 cell lines to classify simvastatin or lovastatin resistant and sensitive cell lines, respectively. Next, we performed whole-genome single marker case-control association tests for the lovastatin and simvastatin resistant and sensitive cells using their publicly available Affymetrix 125K SNP genomic data. The results were then evaluated using RNAi methodology. After correction of the p-values for multiple testing using False Discovery Rate, our results identified three genes (NRP1, COL13A1, MRPS31) and six genes (EAF2, ANK2, AKAP7, STEAP2, LPIN2, PARVB) associated with resistance to simvastatin and lovastatin, respectively. Functional validation using RNAi confirmed that silencing of EAF2 expression modulated the response of HCT-116 colon cancer cells to both statins. In summary, we have successfully utilized the publicly available data on the NCI60 cell lines to perform whole-genome association studies for simvastatin and lovastatin. Our results indicated genes involved in the cellular response to these statins and siRNA studies confirmed the role of EAF2 in response to these drugs in HCT-116 colon cancer cells.
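
The multiple-testing step in this abstract can be illustrated with a short sketch of a False Discovery Rate adjustment. The abstract does not name the procedure, so the Benjamini-Hochberg step-up method used here is an assumption, as are the function name and the example p-values, which are made up purely to show the arithmetic.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumed procedure: the abstract only says "False Discovery Rate".
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative p-values only, not data from the study.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

A marker would then be reported as significant when its adjusted value falls below the chosen FDR level, for example 0.05.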

20.