Similar Documents
20 similar documents found.
1.
In cancer research, high-throughput genomic studies have been extensively conducted to search for markers associated with cancer diagnosis, prognosis and variation in response to treatment. In this article, we analyze cancer prognosis studies and investigate ranking markers based on their marginal prognostic power. To avoid ambiguity, we focus on microarray gene expression studies, where genes are the markers, but note that the methodology and results are applicable to other high-throughput studies. The objectives of this study are twofold. First, we investigate ranking markers under three commonly adopted semiparametric models, namely the Cox, accelerated failure time and additive risk models. Data analysis shows that the ranking may vary significantly across models. Second, we describe a nonparametric concordance measure, which has roots in the time-dependent ROC (receiver operating characteristic) framework and relies on much weaker assumptions than the semiparametric models. Simulations show that ranking by the concordance measure is not sensitive to model specification, whereas ranking under the semiparametric models is. In data analysis, the concordance measure generates rankings significantly different from those under the semiparametric models.
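The marker-ranking idea can be sketched with Harrell's concordance index (C-index), a simpler relative of the time-dependent ROC-based measure described in the abstract; it is not the authors' estimator. The sketch assumes each gene's expression is used directly as a risk score, that `time` holds follow-up times and `event` is 1 for an observed death and 0 for censoring, and it skips pairs with tied times for simplicity.

```python
from itertools import combinations


def c_index(marker, time, event):
    """Harrell's concordance index for one marker.

    A pair is usable when the shorter follow-up time ends in an observed event;
    it is concordant when that subject also has the higher marker value.
    Tied marker values count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue  # simplification: ignore pairs with tied times
        a, b = (i, j) if time[i] < time[j] else (j, i)  # a leaves follow-up first
        if not event[a]:
            continue  # a was censored first, so we cannot tell who failed first
        usable += 1
        if marker[a] > marker[b]:
            concordant += 1.0
        elif marker[a] == marker[b]:
            concordant += 0.5
    return concordant / usable if usable else float("nan")


def rank_markers(expr, time, event):
    """Rank genes by |C - 0.5| so strong risk and strong protective genes both rank high."""
    score = {gene: abs(c_index(x, time, event) - 0.5) for gene, x in expr.items()}
    return sorted(score, key=score.get, reverse=True)
```

Ranking by |C - 0.5| is an illustrative choice: a gene whose high expression predicts longer survival (C near 0) is treated as just as informative as one predicting shorter survival (C near 1).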

2.
Recent technology has made it possible to perform multi-platform genomic profiling (e.g. DNA methylation (DM) and gene expression (GE)) of biological samples simultaneously, resulting in so-called ‘multi-dimensional genomic data’. Such data provide unique opportunities to study the coordination between regulatory mechanisms on multiple levels. However, methods for the integrative analysis of multi-dimensional genomic data to discover combinatorial patterns are currently lacking. Here, we adopt a joint matrix factorization technique to address this challenge. This method projects multiple types of genomic data onto a common coordinate system, in which heterogeneous variables weighted highly in the same projected direction form a multi-dimensional module (md-module). Genomic variables in such modules are characterized by significant correlations and likely functional associations. We applied this method to the DM, GE and microRNA expression data of 385 ovarian cancer samples from The Cancer Genome Atlas project. The md-modules revealed perturbed pathways that would have been overlooked with only a single type of data, uncovered associations between different layers of cellular activity and allowed the identification of clinically distinct patient subgroups. Our study provides a useful protocol for uncovering hidden patterns and their biological implications in multi-dimensional ‘omic’ data.
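A joint matrix factorization of this kind can be sketched as joint nonnegative matrix factorization, in which every data type X_I (samples × features) is approximated as W H_I with a single sample-factor matrix W shared across types. This is a minimal sketch under stated assumptions, not necessarily the authors' exact algorithm: the inputs are assumed to be nonnegative (expression data would first need shifting or splitting into positive and negative parts), and the z-score rule in `md_modules` for picking highly weighted variables is a hypothetical choice.

```python
import numpy as np


def joint_nmf(mats, k, n_iter=500, seed=0, eps=1e-9):
    """Factor every nonnegative X_I (samples x features_I) as X_I ~ W @ H_I,
    with one W shared across data types, using Lee-Seung multiplicative
    updates on sum_I ||X_I - W H_I||_F^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((mats[0].shape[0], k))
    Hs = [rng.random((k, X.shape[1])) for X in mats]
    for _ in range(n_iter):
        # type-specific loadings, with the shared W held fixed
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps) for X, H in zip(mats, Hs)]
        # shared sample factors, pooled over all data types
        num = sum(X @ H.T for X, H in zip(mats, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W = W * num / den
    return W, Hs


def md_modules(Hs, z=2.0):
    """For each component c, collect from every data type the variables whose
    loading exceeds mean + z * sd of that component's loadings (hypothetical rule)."""
    return [
        [np.flatnonzero(H[c] > H[c].mean() + z * H[c].std()) for H in Hs]
        for c in range(Hs[0].shape[0])
    ]
```

Because W is shared, a variable from one data type and a variable from another land in the same module only when both load heavily on the same sample pattern, which is the sense in which the modules are "multi-dimensional".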

3.
In identifying subgroups of a heterogeneous disease or condition, it is often desirable to identify both the observations and the features that differ between subgroups. For instance, there may be a subgroup of individuals with a certain disease who differ from the rest of the population only in the expression profile of a subset of genes. Identifying the subgroup of patients and the subset of genes could lead to better-targeted therapy. We can represent the subgroup of individuals and genes as a bicluster, a submatrix U of a larger data matrix X, such that the features and observations in U differ from those not contained in U. We present a novel two-step method, SC-Biclust, for identifying U. In the first step, the observations in the bicluster are identified to maximize the sum of the weighted between-cluster feature differences. In the second step, features in the bicluster are identified based on their contribution to the clustering of the observations. This versatile method can be used to identify biclusters that differ on the basis of feature means, feature variances or more general differences. The bicluster identification accuracy of SC-Biclust is illustrated through several simulation studies. Application of SC-Biclust to pain research illustrates its ability to identify biologically meaningful subgroups.

4.

Background  

When predictive survival models are built from high-dimensional data, there are often additional covariates, such as clinical scores, that must be included in the final model. While there are several techniques for fitting sparse high-dimensional survival models by penalized parameter estimation, none allows for explicit consideration of such mandatory covariates.
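One simple way to keep mandatory covariates in a sparse model is to exempt them from the penalty. The sketch below illustrates that idea with a squared-error lasso solved by cyclic coordinate descent rather than a penalized Cox model, purely to keep the example short; the `penalty_factor` argument is a name chosen for illustration, and `X` and `y` are assumed to be centred so no intercept is fitted.

```python
import numpy as np


def lasso_cd(X, y, lam, penalty_factor, n_sweeps=200):
    """Cyclic coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * sum_j penalty_factor[j] * |b_j|.
    Columns with penalty_factor 0 (mandatory covariates) are never shrunk.
    Assumes centred X and y (illustrative simplification: no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    resid = y.astype(float).copy()
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * b[j]  # partial residual without feature j
            z = X[:, j] @ resid / n
            # soft-thresholding; a zero threshold gives the plain least-squares update
            b[j] = np.sign(z) * max(abs(z) - lam * penalty_factor[j], 0.0) / col_ss[j]
            resid -= X[:, j] * b[j]
    return b
```

With a very large `lam`, every penalized coefficient is driven to zero while the mandatory ones are still estimated, which is the behaviour the abstract asks for.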

5.
We propose a statistical method for uncovering gene pathways that characterize cancer heterogeneity. To incorporate knowledge of the pathways into the model, we define a set of pathway activities from microarray gene expression data based on Sparse Probabilistic Principal Component Analysis (SPPCA). A pathway activity logistic regression model is then formulated for the cancer phenotype. To select pathway activities related to binary cancer phenotypes, we use the elastic net for parameter estimation and derive a model selection criterion for choosing the tuning parameters involved in model estimation. Our proposed method can also reverse-engineer gene networks based on the identified multiple pathways, which enables us to discover novel gene-gene associations related to the cancer phenotypes. We illustrate the whole process of the proposed method through the analysis of breast cancer gene expression data.
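The elastic-net step can be illustrated with a small proximal-gradient solver for penalized logistic regression. Here the columns of `X` stand in for pathway activity scores (computing them with SPPCA is not shown), `y` is a 0/1 phenotype, and `lam`, `alpha`, `lr` and `n_iter` are illustrative settings rather than values from the study, which chooses its tuning parameters with a dedicated model selection criterion.

```python
import numpy as np


def enet_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, n_iter=2000):
    """Proximal gradient descent for
        mean logistic loss + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2),
    with an unpenalised intercept b0. Returns (b0, b)."""
    n, p = X.shape
    b0, b = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ b)))
        resid = prob - y  # gradient of the loss w.r.t. the linear predictor
        b0 -= lr * resid.mean()
        grad = X.T @ resid / n + lam * (1 - alpha) * b  # smooth part: loss + ridge
        z = b - lr * grad
        # proximal step for the L1 part: soft-thresholding sets weak coefficients exactly to 0
        b = np.sign(z) * np.maximum(np.abs(z) - lr * lam * alpha, 0.0)
    return b0, b
```

Pathway activities whose coefficients end at exactly zero are the ones the elastic net deselects; the ridge part (`1 - alpha`) keeps correlated activities from being dropped arbitrarily.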

6.

Background

Genomic selection methods require dense and widespread genotyping data, posing a particular challenge if both sexes are subject to intense selection (e.g., aquaculture species). This study focuses on alternative low-cost genomic selection methods (IBD-GS) that use selective genotyping with sparse marker panels to estimate identity-by-descent relationships through linkage analysis. Our aim was to evaluate the potential of these methods in selection programs for continuous traits measured on sibs of selection candidates in a typical aquaculture breeding population.

Methods

Phenotypic and genomic data were generated by stochastic simulation, assuming low to moderate heritabilities (0.10 to 0.30) for a Gaussian trait measured on sibs of the selection candidates in a typical aquaculture breeding population that consisted of 100 families (100 training animals and 20 selection candidates per family). Low-density marker genotype data (~ 40 markers per Morgan) were used to trace genomic identity-by-descent relationships. Genotyping was restricted to selection candidates from 30 phenotypically top-ranking families and varying fractions of their phenotypically extreme training sibs. All phenotypes were included in the genetic analyses. Classical pedigree-based and IBD-GS models were compared based on realized genetic gain over one generation of selection.

Results

Genetic gain increased substantially (13 to 32%) with IBD-GS compared with classical selection and was greatest at higher heritability. Most of the extra gain from IBD-GS was already obtained by genotyping the 5% phenotypically most extreme sibs within the pre-selected families. Additional genotyping further increased genetic gains, but the increments were small when going from genotyping the 20% most extreme sibs to genotyping all phenotyped sibs. The success of IBD-GS with sparse and selective genotyping can be explained by the fact that within-family haplotype blocks are accurately traced even at low marker densities and that most of the within-family variance for normally distributed traits is captured by a small proportion of phenotypically extreme sibs.

Conclusions

IBD-GS was substantially more effective than classical selection, even when based on very few markers and combined with selective genotyping of small fractions of the population. The study shows that low-cost GS programs can be successful by combining sparse and selective genotyping with pedigree and linkage information.
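The selective genotyping design can be sketched as a two-stage filter: keep the families with the highest mean sib phenotype, then within each kept family pick the sibs that deviate most from the family mean in either direction. The function name, its defaults (30 families, 5% of sibs) and the use of absolute deviation from the family mean to define "most extreme" are assumptions for illustration; the simulation's actual selection rule may differ.

```python
from statistics import mean


def select_for_genotyping(pheno, n_families=30, extreme_frac=0.05):
    """pheno maps family id -> list of sib phenotypes.
    Returns family id -> sorted indices of the sibs chosen for genotyping."""
    # Stage 1: keep the n_families families with the highest mean sib phenotype
    top = sorted(pheno, key=lambda fam: mean(pheno[fam]), reverse=True)[:n_families]
    chosen = {}
    for fam in top:
        values = pheno[fam]
        centre = mean(values)
        m = max(1, round(len(values) * extreme_frac))
        # Stage 2 (assumed rule): largest absolute deviation from the family mean,
        # so both phenotypic tails are represented
        order = sorted(range(len(values)), key=lambda i: abs(values[i] - centre), reverse=True)
        chosen[fam] = sorted(order[:m])
    return chosen
```

With 100 training sibs per family and the default fraction, five sibs per selected family would be genotyped.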

7.
Chen M, Wang K, Zhang L, Li C, Yang Y. PLoS ONE 2011, 6(12): e28552
Urine has emerged as an attractive biofluid for the noninvasive detection of prostate cancer (PCa), and there is a strong imperative to discover candidate urinary markers for the clinical diagnosis and prognosis of PCa. The rising flood of omics profiles presents immense opportunities for identifying prospective biomarkers. Here we present a simple and efficient strategy to derive candidate urine markers for prostate tumors by mining cancer genomic profiles from public databases. Prostate, bladder and kidney are the three major tissues from which cellular material could be released into urine. To identify urinary markers specific for PCa, upregulated entities that might be shed in exosomes from bladder cancer and kidney cancer are first excluded. Through ontology-based filtering and further assessment, a reduced list of 19 entities encoding urinary proteins was derived as putative PCa markers. Among them, pathway enrichment analysis identified 10 entities closely associated with tumor cell growth and development. Further, using the 10 entities as seeds, we constructed a protein-protein interaction (PPI) subnetwork and suggested a few urine markers as preferred prognostic markers to monitor the invasion and progression of PCa. Our approach can be used to discover and prioritize potential markers present in a variety of body fluids for a spectrum of human diseases.

8.
Adenocarcinoma of the pancreas is a significant cause of cancer mortality, and up to 10% of cases appear to be familial. Heritable genomic copy number variants (CNVs) can modulate gene expression and predispose to disease. Here, we identify candidate predisposition genes for familial pancreatic cancer (FPC) by analyzing germline losses or gains present in one or more high-risk patients and absent in a large control group. A total of 120 FPC cases and 1,194 controls were genotyped on the Affymetrix 500K array, and 36 cases and 2,357 controls were genotyped on the Affymetrix 6.0 array. CNVs were detected with multiple computational algorithms and partially validated by quantitative PCR. We found no significant difference between the germline CNV profiles of cases and controls. A total of 93 non-redundant FPC-specific CNVs (53 losses and 40 gains) were identified in 50 cases, each CNV present in a single individual. FPC-specific CNVs overlapped the coding regions of 88 RefSeq genes. Several of these genes have been reported to be differentially expressed and/or affected by copy number alterations in pancreatic adenocarcinoma. Further investigation in high-risk subjects may elucidate the role of one or more of these genes in genetic predisposition to pancreatic cancer.
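The case-specific filtering step can be sketched with simple interval overlap. This assumes half-open genomic coordinates and treats a case CNV as FPC-specific when it overlaps no control CNV of the same type; published analyses often use stricter criteria such as reciprocal overlap thresholds, so the rule here is illustrative. Gene names in the test data are hypothetical.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open interval [start, end)
    end: int
    kind: str   # "loss" or "gain"


class Gene(NamedTuple):
    chrom: str
    start: int  # coding-region start, 0-based
    end: int    # coding-region end, exclusive
    name: str


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def case_specific(case_cnvs, control_cnvs):
    """Case CNVs that overlap no control CNV of the same kind (assumed criterion)."""
    return [c for c in case_cnvs
            if not any(c.kind == k.kind and overlaps(c, k) for k in control_cnvs)]


def genes_hit(cnvs, genes):
    """Sorted names of genes whose coding region overlaps at least one CNV."""
    return sorted({g.name for g in genes if any(overlaps(c, g) for c in cnvs)})
```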

9.
10.
11.
Cell 2023, 186(1): 47-62.e16

12.
Copy number alterations (CNAs) can be observed in most cancer patients. Several oncogenes and tumor suppressor genes with CNAs have been identified in different kinds of tumor. However, a systematic survey of CNA-affected functions is still lacking. By employing systems biology approaches, instead of examining individual genes, we directly identified functional hotspots on the human genome. A total of 838 hotspots with 540 enriched Gene Ontology functions were identified on the human genome. Array CGH (aCGH) data from 76 hepatocellular carcinoma (HCC) tumors were employed in this study. A total of 150 regions putatively affected by CNAs, together with the functions they encode, were identified. Our results indicate that two immune-related hotspots had copy number alterations in most patients, and our data implied that these immune-related regions might be involved in HCC oncogenesis. We also identified 39 hotspots whose copy number status was associated with patient survival. Our data implied that copy number alterations of these regions may contribute to the dysregulation of the encoded functions. These results further demonstrate that our method enables researchers to survey the biological functions of CNAs and to construct regulatory hypotheses at the pathway and functional levels.
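One common way to flag a functional hotspot is to test whether genes annotated with a given GO term are over-represented among consecutive genes in a genomic window, using a hypergeometric tail probability. The sketch below is illustrative only: the window size, significance threshold and gene identifiers are assumptions, the abstract does not state that this test was used, and no multiple-testing correction is applied across windows or terms.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes in total, n annotated, N drawn)."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def scan_hotspots(genes, annotated, window=10, alpha=1e-3):
    """Slide a window of `window` consecutive genes along one chromosome
    (genes given in genomic order) and report (start, hits, p) for windows
    over-represented for a single GO term whose genes are in `annotated`."""
    M = len(genes)
    n = sum(g in annotated for g in genes)
    hotspots = []
    for start in range(M - window + 1):
        k = sum(g in annotated for g in genes[start:start + window])
        p = hypergeom_sf(k, M, n, window)
        if p < alpha:
            hotspots.append((start, k, p))
    return hotspots
```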

13.
14.
Malignant mesothelioma (MM) is an aggressive and therapy-resistant neoplasm arising from pleural mesothelial cells and is usually associated with long-term asbestos exposure. Recent studies suggest that tumors contain cancer stem cells (CSCs), whose stem cell characteristics are thought to confer therapy resistance. However, whether MM cells have any stem cell characteristics is not known. To understand the molecular basis of MM, we first performed serial transplantation of surgical samples into NOD/SCID mice and established new cell lines. Next, we performed marker analysis of the MM cell lines and found that many of them contain side population (SP) cells and express several putative CSC markers, such as CD9, CD24 and CD26. Interestingly, expression of CD26 closely correlated with that of CD24 in some cases. Sorting and culture assays revealed that SP and CD24+ cells proliferated in an asymmetric cell division-like manner. In addition, CD9+ and CD24+ cells had a higher potential than marker-negative cells to generate spheroid colonies in stem cell medium. Moreover, these marker-positive cells showed a clear tendency to generate larger tumors in the mouse transplantation assay. Taken together, our data suggest that SP, CD9, CD24 and CD26 are CSC markers of MM and could be used as novel therapeutic targets.

15.
The Salmonella genomic island 1 is an integrative mobilizable element (cited 6 times: 0 self-citations, 6 by others)
Salmonella genomic island 1 (SGI1) is a genomic island containing an antibiotic resistance gene cluster, identified in several Salmonella enterica serovars. The SGI1 antibiotic resistance gene cluster, which is a complex class 1 integron, confers the common multidrug resistance phenotype of epidemic S. enterica Typhimurium DT104. The occurrence of SGI1 in S. enterica serovars Typhimurium, Agona, Paratyphi B, Albany, Meleagridis and Newport indicates its potential for horizontal transfer. Here, we report that SGI1 could be conjugally transferred from S. enterica donor strains to non-SGI1 S. enterica and Escherichia coli recipient strains, where it integrated into the recipient chromosome in a site-specific manner. First, an extrachromosomal circular form of SGI1, which forms through specific recombination between the left and right ends of the integrated SGI1, was identified by PCR. Chromosomal excision of SGI1 was found to require the SGI1-encoded integrase, which shows similarity to the lambdoid integrase family. Second, conjugal transfer of SGI1 required the presence of a helper plasmid: the conjugative IncC plasmid R55 could mobilize SGI1 in trans from the donor to the recipient strains. In this way, conjugal transfer of SGI1 occurred at a frequency of 10^-5 to 10^-6 transconjugants per donor. No transconjugants were obtained with an SGI1 donor lacking the int integrase gene. Third, chromosomal integration of SGI1 occurred via site-specific recombination between an 18 bp sequence found in the circular form of SGI1 and a similar 18 bp sequence at the 3' end of the thdF gene in the S. enterica and E. coli chromosomes. SGI1 appeared to be transmissible only in the presence of additional conjugative functions provided in trans. SGI1 can thus be classified within the group of integrative mobilizable elements (IMEs).

16.
With the mounting availability of genomic and phenotypic databases, data integration and mining become increasingly challenging. While efforts have been made to analyze prokaryotic phenotypes, current computational technologies either lack the high-throughput capacity needed for genome-scale analysis or are limited in their ability to integrate and mine data across different scales of biology. Consequently, simultaneous analysis of associations among genomes, phenotypes and gene functions has not been possible. Here, we developed a high-throughput computational approach and demonstrated for the first time the feasibility of integrating large quantities of prokaryotic phenotypes with genomic datasets for mining across multiple scales of biology (protein domains, pathways, molecular functions and cellular processes). Applying this method to 59 fully sequenced prokaryotic species, we identified the genetic basis and molecular mechanisms underlying phenotypes in bacteria. We identified 3,711 significant correlations between 1,499 distinct Pfam domains and 63 phenotypes, comprising 2,650 positive correlations and 1,061 anti-correlations. Manual evaluation of a random sample of these significant correlations showed a minimal precision of 30% (95% confidence interval: 20%-42%; n = 50). We stratified the 478 most significant predictions and subjected 100 of them to manual evaluation, of which 60 were corroborated in the literature. We furthermore uncovered 10 significant correlations between phenotypes and KEGG pathways, eight of which were corroborated in the evaluation, and 309 significant correlations between phenotypes and 166 GO concepts, evaluated using a random sample (minimal precision = 72%; 95% confidence interval: 60%-80%; n = 50). Additionally, we conducted a novel large-scale phenomic visualization analysis to provide insight into the modular nature of common molecular mechanisms that span multiple biological scales and are reused by related phenotypes (metaphenotypes).
We propose that this method elucidates which classes of molecular mechanisms are associated with phenotypes or metaphenotypes, and that it holds promise for facilitating a computable systems biology approach to genomic and biomedical research.
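The reported precision estimates are binomial proportions from samples of n = 50, so a 30% minimal precision corresponds to 15 corroborated predictions. The Wilson score interval below is one standard way to attach a 95% confidence interval to such a proportion. The abstract does not say which method was used, and for 15 of 50 this sketch gives roughly 19% to 44%, slightly wider than the reported 20% to 42%, so it should not be read as reproducing the authors' calculation.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples such as n = 50.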

17.
Identifying large-scale structural variation in cancer genomes remains a challenge. Current methods rely on alignments to a reference genome, which can be a poor fit for highly variant and complex tumor genomes. To address this challenge, we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By aligning unmapped reads to these models, we show that our method can help to identify large-scale variations that were previously missed.
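The idea of using breakpoints as alternative references can be sketched by building a short contig that joins the sequence on each side of a breakpoint and then checking which unmapped reads share an exact k-mer spanning the junction. This is a simplified illustration, not the authors' aligner: it handles only same-strand joins, ignores reverse-complemented reads and sequencing errors, and the function names and the default `k` are assumptions.

```python
def junction_contig(ref_a, pos_a, ref_b, pos_b, flank=100):
    """Model sequence for a breakpoint joining ref_a[:pos_a] to ref_b[pos_b:]
    (same strand only). Returns (contig, index of the junction in the contig)."""
    left = ref_a[max(0, pos_a - flank):pos_a]
    right = ref_b[pos_b:pos_b + flank]
    return left + right, len(left)


def junction_kmers(contig, junction, k):
    """k-mers of the contig that start before the junction and end after it."""
    first = max(0, junction - k + 1)
    last = min(junction - 1, len(contig) - k)
    return {contig[i:i + k] for i in range(first, last + 1)}


def supporting_reads(reads, contig, junction, k=25):
    """Reads that share at least one exact junction-spanning k-mer with the model."""
    kmers = junction_kmers(contig, junction, k)
    return [read for read in reads
            if any(read[i:i + k] in kmers for i in range(len(read) - k + 1))]
```

A read supporting the variant carries sequence from both sides of the breakpoint, which is exactly what a junction-spanning k-mer captures; reads from the unrearranged reference do not.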

18.
Gastric cancer (GC) is a prevalent malignant cancer of the digestive system, and novel diagnostic and prognostic biomarkers for GC are urgently needed. The aim of this study was to identify potential long noncoding RNAs (lncRNAs) associated with the pathogenesis and prognosis of GC. Raw noncoding RNA microarray data (GSE53137, GSE70880 and GSE99417) were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes between GC and adjacent normal gastric tissue samples were screened by an integrated analysis of multiple gene expression profiles after gene reannotation and batch normalization, and were further confirmed with The Cancer Genome Atlas (TCGA) database. Competing endogenous RNA (ceRNA) network construction, Gene Ontology term and Kyoto Encyclopedia of Genes and Genomes pathway enrichment, and survival analysis were applied to identify hub lncRNAs and discover potential biomarkers related to the diagnosis and prognosis of GC. A total of 246 integrated differential genes, including 15 lncRNAs and 241 messenger RNAs (mRNAs), were obtained by intersecting the differential genes from the GEO and TCGA databases. The ceRNA network comprised three lncRNAs (UCA1, HOTTIP and HMGA1P4), 26 microRNAs (miRNAs) and 72 mRNAs. Functional analysis revealed that the three lncRNAs were mainly involved in the cell cycle and cellular senescence. Survival analysis showed that HMGA1P4 was significantly associated with overall survival. For the first time, we identified that HMGA1P4, a target of miR-301b/miR-508, is involved in the cell cycle and senescence by regulating CCNA2 in GC. Finally, upregulation of the three lncRNAs in GC tissues was validated. Thus, the three lncRNAs UCA1, HOTTIP and HMGA1P4 may contribute to GC development, and their potential functions might be associated with the prognosis of GC.

19.
Development of 1,030 genomic SSR markers in switchgrass (cited once: 0 self-citations, 1 by others)
Switchgrass (Panicum virgatum L.), native to the tall grass prairies of North America, has been grown for soil conservation and herbage production in the USA and has recently been widely recognized as a promising dedicated cellulosic bioenergy crop. Large numbers of codominant molecular markers, including simple sequence repeats (SSRs), are required to construct linkage maps and implement molecular breeding strategies for developing superior switchgrass cultivars. The objectives of this study were (1) to identify SSR-containing clones and design PCR primer pairs (PPs) from SSR-enriched genomic libraries, and (2) to validate and characterize the designed SSR PPs. Five SSR-enriched genomic libraries were constructed using genomic DNA of ‘SL93 7 × 15’, a switchgrass genotype selected from an Oklahoma State University (OSU) southern lowland breeding population. A total of 3,046 clones from four libraries enriched in (CA/TG)n, (GA/TC)n, (CAG/CTG)n and (AAG/CTT)n repeats were sequenced at the OSU Core Facility. From these sequences, we isolated 1,300 unique SSR-containing clones, from which we designed 1,398 PPs using SSR Locator V.1 software. Among the designed PPs, 1,030 (73.7%) amplified reproducible, strong bands of the expected fragment size, and 802 detected polymorphic alleles in SL93 7 × 15 and ‘NL94 16 × 13’, the two parents of a mapping population. All four libraries contained a high rate of perfect SSR repeat types, ranging from 62.7 to 76.2%. Polymorphism of the effective SSR markers was also tested in two lowland cultivars, ‘Alamo’ and ‘Kanlow’, and two upland cultivars, ‘Blackwell’ and ‘Dacotah’. The developed SSR markers should be useful in genetic and breeding research in switchgrass.
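Scanning sequenced clones for perfect di- and trinucleotide repeats, such as the (CA/TG)n and (AAG/CTT)n motifs targeted by the libraries above, can be sketched with a backreference regular expression. The minimum copy numbers in `DEFAULT_MIN_COPIES` are hypothetical, and the sketch neither reproduces SSR Locator nor designs primers.

```python
import re

# Hypothetical default thresholds: minimum tandem copies per motif length.
DEFAULT_MIN_COPIES = {2: 6, 3: 5}


def find_ssrs(seq, min_copies=None):
    """Return sorted (start, motif, copies) tuples for perfect di- and
    trinucleotide repeats in a DNA sequence."""
    if min_copies is None:
        min_copies = DEFAULT_MIN_COPIES
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_copies.items()):
        # a k-mer followed by at least n - 1 exact copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip mononucleotide runs such as AAAAAAAAAAAA
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

The motif is reported in the phase where the repeat first starts, so an (AAG)n repeat may come back as a rotation such as GAA depending on the flanking sequence.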

20.
Large-scale genomic and proteomic analyses have provided a wealth of information on biologically relevant systems, and the ability to analyze this information is crucial to uncovering important biological relationships. However, it has proven difficult to compare large datasets from different sources because individual laboratories and database systems assign different gene and protein identifiers. Here, we describe the design of a fully automated BLAST program (BlastPro) that facilitates rapid comparison of large protein-protein, nucleotide-nucleotide or nucleotide-protein datasets from numerous independent studies. Using this system, we compared several published genomic and proteomic databases for proteins that are upregulated in highly motile, metastatic tumor cells. Analysis of five independent studies comprising more than 1 × 10^6 genomic sequences and more than 1,000 proteins revealed that the cytoskeleton-associated protein alpha-actinin is increased at both the mRNA and protein levels in metastatic breast, prostate and skin cancer cells. Interestingly, spatial analysis of alpha-actinin expression revealed that it is amplified 8-fold in the leading pseudopodium compared with the cell body compartment of migrating cells. These findings indicate that amplification of alpha-actinin and its localization to the leading pseudopodium are potential biomarkers of cancer progression to a more metastatic phenotype. Together, our results demonstrate that the BlastPro system can be used to compare large genomic and proteomic datasets to reveal important biological relationships, including those associated with cancer progression.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417