Similar literature
 20 similar documents found; search took 15 ms
1.

Background

Human cancers are complex ecosystems composed of cells with distinct molecular signatures. Such intratumoral heterogeneity poses a major challenge to cancer diagnosis and treatment. Recent advances in single-cell techniques such as scRNA-seq have brought unprecedented insights into cellular heterogeneity. A resulting computational challenge is to cluster high-dimensional, noisy datasets that contain substantially fewer cells than genes.

Methods

In this paper, we introduce a consensus clustering framework, conCluster, for cancer subtype identification from single-cell RNA-seq data. Using an ensemble strategy, conCluster fuses multiple basic partitions into consensus clusters.
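As a rough illustration of fusing basic partitions into consensus clusters, the sketch below builds a co-association matrix and labels its connected components. This is a generic evidence-accumulation scheme, not conCluster's actual algorithm, which the abstract does not specify; the function names, the 0.5 threshold, and the toy partitions are all assumptions made for the example.

```python
import numpy as np

def coassociation(partitions):
    """Fraction of base partitions in which each pair of cells shares a cluster."""
    partitions = np.asarray(partitions)          # shape: (n_partitions, n_cells)
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)

def consensus_labels(partitions, threshold=0.5):
    """Link cells co-clustered in more than `threshold` of the partitions,
    then label the connected components of the resulting graph."""
    linked = coassociation(partitions) > threshold
    n = linked.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue                              # already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:                              # depth-first traversal
            i = stack.pop()
            for j in np.flatnonzero(linked[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical base partitions of six cells (one row per clustering run).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # same grouping, different label names
    [0, 0, 1, 1, 2, 2],   # a noisier partition
]
labels = consensus_labels(base_partitions)
```

Because the co-association matrix compares cluster membership rather than label values, the second partition agrees with the first even though its label names are swapped.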

Results

Applied to real cancer scRNA-seq datasets, conCluster can more accurately detect cancer subtypes than the widely used scRNA-seq clustering methods. Further, we conducted co-expression network analysis for the identified melanoma subtypes.

Conclusions

Our analysis demonstrates that these subtypes exhibit distinct gene co-expression networks and significant gene sets with different functional enrichment.

2.

Background

New technologies for acquisition of genomic data, while offering unprecedented opportunities for genetic discovery, also impose severe burdens of interpretation and penalties for multiple testing.

Methods

The Pathway-based Analyses Group of the Genetic Analysis Workshop 19 (GAW19) sought reduction of multiple-testing burden through various approaches to aggregation of high-dimensional data in pathways informed by prior biological knowledge.

Results

Experimental methods tested included the use of "synthetic pathways" (random sets of genes) to estimate power and false-positive error rate of methods applied to simulated data; data reduction via independent components analysis, single-nucleotide polymorphism (SNP)-SNP interaction, and use of gene sets to estimate genetic similarity; and general assessment of the efficacy of prior biological knowledge to reduce the dimensionality of complex genomic data.

Conclusions

The work of this group explored several promising approaches to managing high-dimensional data, with the caveat that these methods are necessarily constrained by the quality of external bioinformatic annotation.

3.

Background

Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene–disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expenses.

Methods

We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data of humans and data of other nonhuman species, are integrated in our model. Firstly, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Secondly, we develop modified model II with personal heterogeneous regularization to enhance the accuracy of aforementioned models. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used.

Results

We compared the results of PCFM with those of four state-of-the-art approaches. The results show that PCFM performs better than the other advanced approaches.

Conclusions

The PCFM model can be leveraged for predictions of disease genes, especially for new human genes or diseases with no known relationships.

4.

Introduction

Infiltrating gliomas are primary brain tumors that express significant biological and clinical heterogeneity in adults, which complicates their treatment and prognosis. Characterization of tumor subtypes using spectroscopic analysis may assist in predicting malignant transformation and quantification of response to therapy.

Study objective

To implement an automated algorithm that classifies metabolomic profiles in order to determine glioma pathological grade and predict malignant progression, using spectra obtained by high-resolution magic angle spinning (HR-MAS) spectroscopy of patient-derived tissue samples.

Methods

237 image-guided tissue samples were obtained from 152 patients who underwent surgery for newly diagnosed or recurrent glioma and analyzed via HR-MAS spectroscopy. Orthogonal projection to latent structures discriminant analysis was used as a classifier and the variable-influence-on-projection values were evaluated to identify signature spectral regions.

Results

The accuracy of classifiers developed for discriminating glioma subtypes was 68% for newly diagnosed grade II versus III samples; 86 and 92% for new and recurrent grade III versus IV, respectively; 95% for newly diagnosed grade II versus IV; and 88% for recurrent grade II versus IV lesions. Classifiers distinguished between samples from newly diagnosed vs. recurrent lesions with an accuracy of 78% for grade III and 99% for grade IV glioma.

Conclusion

Classifying metabolomic profiles for new and recurrent glioma without prior assumptions regarding spectral components identified candidate in vivo biomarkers for use in assessing changes that are likely to impact treatment decisions.

5.

Background

An important step toward understanding the biological mechanisms underlying a complex disease is a refined understanding of its clinical heterogeneity. Relating clinical and molecular differences may allow us to define more specific subtypes of patients that respond differently to therapeutic interventions.

Results

We developed a novel unbiased method called diVIsive Shuffling Approach (VIStA) that identifies subgroups of patients by maximizing the difference in their gene expression patterns. We tested our algorithm on 140 subjects with Chronic Obstructive Pulmonary Disease (COPD) and found four distinct, biologically and clinically meaningful combinations of clinical characteristics that are associated with large gene expression differences. The dominant characteristic in these combinations was the severity of airflow limitation. Other frequently identified measures included emphysema, fibrinogen levels, phlegm, BMI and age. A pathway analysis of the differentially expressed genes in the identified subtypes suggests that VIStA is capable of capturing specific molecular signatures within each group.

Conclusions

The introduced methodology allowed us to identify combinations of clinical characteristics that correspond to clear gene expression differences. The resulting subtypes for COPD contribute to a better understanding of its heterogeneity.

6.

Background

The current literature establishes the importance of gene functional category and expression in promoting or suppressing duplicate gene loss after whole genome doubling in plants, a process known as fractionation. Inspired by studies that have reported gene expression to be the dominating factor in preventing duplicate gene loss, we analyzed the relative effect of functional category and expression.

Methods

We use multivariate methods to study data sets on gene retention, function and expression in rosids and asterids to estimate effects and assess their interaction.

Results

Our results suggest that the effects of functional category and expression on duplicate gene retention during fractionation are independent and have no statistical interaction.

Conclusion

In plants, functional category is the more dominant factor in explaining duplicate gene loss.

7.

Background

Sets of genes that are known to be associated with each other can be used to interpret microarray data. This gene set approach to microarray data analysis can illustrate patterns of gene expression which may be more informative than analyzing the expression of individual genes. Various statistical approaches exist for the analysis of gene sets. There are three main classes of these methods: over-representation analysis, functional class scoring, and pathway topology based methods.

Methods

We propose weighted hypergeometric and weighted chi-squared methods in order to assign a rank to the degree to which each gene participates in the enrichment. Each gene is assigned a weight determined by the absolute value of its log fold change, which is then raised to a certain power. The power value can be adjusted as needed. Datasets from the Gene Expression Omnibus are used to test the method. The significantly enriched pathways are validated through searching the literature in order to determine their relevance to the dataset.
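The weighting step described above can be sketched as follows. The abstract does not give the formula for the weighted hypergeometric or weighted chi-squared statistics, so this example only shows the gene weights (absolute log fold change raised to an adjustable power), a ranking of genes by weight, and the standard unweighted hypergeometric tail probability as a baseline. The function names, gene identifiers, and numbers are hypothetical.

```python
from math import comb

def gene_weights(log_fold_changes, power=1.0):
    """Weight each gene by |log fold change| raised to `power`."""
    return {gene: abs(lfc) ** power for gene, lfc in log_fold_changes.items()}

def hypergeom_tail(N, K, n, k):
    """Unweighted baseline: P(X >= k) when n genes are drawn from N,
    of which K belong to the pathway."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical log fold changes for four genes.
log_fc = {"A": 2.0, "B": -1.0, "C": 0.5, "D": -0.5}
weights = gene_weights(log_fc, power=2.0)

# Genes ordered by how strongly they would contribute under this weighting.
ranked = sorted(weights, key=weights.get, reverse=True)

# Baseline p-value: all 3 selected genes fall in a 3-gene pathway out of 10 genes.
p_value = hypergeom_tail(N=10, K=3, n=3, k=3)
```

Raising the power above 1 concentrates the weight on genes with the largest fold changes, while a power of 0 gives every gene a weight of 1, which reduces the weighting to the unweighted case.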

Results

Although these methods detect fewer significantly enriched pathways, they can potentially produce more relevant results. Furthermore, we compare the results of different enrichment methods on a set of microarray studies all containing data from various rodent neuropathic pain models.

Discussion

Our method is able to produce more consistent results than other methods when evaluated on similar datasets. It can also potentially detect relevant pathways that are not identified by the standard methods. However, the lack of biological ground truth makes validating the method difficult.

8.

Background

Cerebral infarction arising from different causes seems to differ in fibrinogen levels, so the current work explores the relationship between fibrinogen level and the subtypes of the TOAST criteria in the acute stage of ischemic stroke.

Methods

The study subjects were 577 patients with acute ischemic stroke treated in our hospital from December 2008 to December 2010. Blood samples taken within 72 hours of onset were assayed for fibrinogen (PT-der). The selected patients were classified according to the TOAST criteria to study the distribution of fibrinogen levels across the stroke subtypes.

Results

The differences in the distribution of fibrinogen levels among the subtypes were not statistically significant.

Conclusions

In the acute stage of ischemic stroke, fibrinogen level was not related to the subtypes of the TOAST criteria.

9.

Background

An artificial neural network approach was chosen to model the outcome of the complex signaling pathways in the gastro-intestinal tract and other peripheral organs that eventually produce the satiety feeling in the brain upon feeding.

Methods

A multilayer feed-forward neural network was trained with sets of experimental data relating concentration-time courses of plasma satiety hormones to Visual Analog Scale (VAS) scores. The network successfully predicted VAS responses from sets of satiety hormone data obtained in experiments using different food compositions.

Results

The correlation coefficients for the predicted VAS responses for test sets having i) a full set of three satiety hormones, ii) a set of only two satiety hormones, and iii) a set of only one satiety hormone were 0.96, 0.96, and 0.89, respectively. The predicted VAS responses discriminated the satiety effects of high satiating food types from less satiating food types both in orally fed and ileal infused forms.

Conclusions

From this application of artificial neural networks, one may conclude that neural network models are very suitable to describe situations where behavior is complex and incompletely understood. However, training data sets that fit the experimental conditions need to be available.

10.

Background

Diagnostic accuracy of lymphoma, a heterogeneous cancer, is essential for patient management. Several ancillary tests including immunophenotyping, and sometimes cytogenetics and PCR are required to aid histological diagnosis. In this proof of principle study, gene expression microarray was evaluated as a single platform test in the differential diagnosis of common lymphoma subtypes and reactive lymphadenopathy (RL) in lymph node biopsies.

Methods

116 lymph node biopsies diagnosed as RL, classical Hodgkin lymphoma (cHL), diffuse large B cell lymphoma (DLBCL) or follicular lymphoma (FL) were assayed by mRNA microarray. Three supervised classification strategies (global multi-class, local binary-class and global binary-class classifications) using diagonal linear discriminant analysis were performed on training sets of array data, and the classification error rates were calculated by leave-one-out cross-validation. The independent error rate was then evaluated by testing the identified gene classifiers on an independent (test) set of array data.
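The combination of diagonal linear discriminant analysis and leave-one-out cross-validation can be sketched as below. This is a minimal textbook version that assumes equal class priors and a pooled per-gene variance. It is not the study's pipeline (gene selection and the three classification strategies are omitted), and the function names and toy expression values are hypothetical.

```python
import numpy as np

def dlda_fit(X, y):
    """Estimate class means and a pooled within-class variance for each gene."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    residuals = X - means[np.searchsorted(classes, y)]
    variances = (residuals ** 2).sum(axis=0) / (len(y) - len(classes))
    return classes, means, variances

def dlda_predict(model, x, eps=1e-8):
    """Assign x to the class with the smallest variance-scaled squared distance."""
    classes, means, variances = model
    scores = (((x - means) ** 2) / (variances + eps)).sum(axis=1)
    return classes[np.argmin(scores)]

def loocv_error(X, y):
    """Leave each sample out once, refit on the rest, and count misclassifications."""
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = dlda_fit(X[keep], y[keep])
        errors += int(dlda_predict(model, X[i]) != y[i])
    return errors / n

# Hypothetical expression values: 8 samples, 2 genes, 2 well-separated classes.
X = np.array([[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.2, 0.0],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1], [5.2, 5.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
error_rate = loocv_error(X, y)
```

The model is refit inside every fold, so the held-out sample never influences the means or variances used to classify it. Any gene selection step would need to be repeated inside the loop for the same reason.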

Results

The binary classifications provided prediction accuracies, between a subtype of interest and the remaining samples, of 88.5%, 82.8%, 82.8% and 80.0% for FL, cHL, DLBCL, and RL respectively. Identified gene classifiers include LIM domain only-2 (LMO2), Chemokine (C-C motif) ligand 22 (CCL22) and Cyclin-dependent kinase inhibitor-3 (CDK3) specifically for FL, cHL and DLBCL subtypes respectively.

Conclusions

This study highlights the ability of gene expression profiling to distinguish lymphoma from reactive conditions and classify the major subtypes of lymphoma in a diagnostic setting. A cost-effective single platform "mini-chip" assay could, in principle, be developed to aid the quick diagnosis of lymph node biopsies with the potential to incorporate other pathological entities into such an assay.

11.

Background

With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, there is still a lack of high-quality annotation methods. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data.

Results

We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes on multiple bootstrap-resampled datasets. The proposed method is applied to p53 cell line, colon cancer and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power and reduce the inconsistency of annotations.

Conclusions

A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.

12.
13.
14.

Background

Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding these activities and aid in functional annotation. However, little work has been done on predicting heme binding residues from protein sequence information.

Methods

We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis.

Results

Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests.

Conclusions

The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.

15.
16.

Background

Integrative analysis on multi-omics data has gained much attention recently. To investigate the interactive effect of gene expression and DNA methylation on cancer, we propose a directed random walk-based approach on an integrated gene-gene graph that is guided by pathway information.

Methods

Our approach first extracts a single pathway profile matrix out of the gene expression and DNA methylation data by performing the random walk over the integrated graph. We then apply a denoising autoencoder to the pathway profile to further identify important pathway features and genes. The extracted features are validated in the survival prediction task for breast cancer patients.
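To make the random-walk step concrete, the sketch below runs a random walk with restart over a small directed graph and returns a converged weight for each node. It uses a common textbook formulation, not necessarily the exact update used in the study, and it omits the pathway profile construction and the denoising autoencoder. The restart probability of 0.7, the function name, and the three-node graph are assumptions.

```python
import numpy as np

def directed_random_walk(adj, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate w <- (1 - r) * M^T w + r * w0 until the L1 change is below tol.

    adj[i, j] = 1 means a directed edge from node i to node j.
    w0 holds the initial node weights (for example, per-gene statistics).
    """
    adj = np.asarray(adj, dtype=float)
    out_degree = adj.sum(axis=1, keepdims=True)
    # Row-normalise the adjacency matrix; nodes without outgoing edges keep zero rows.
    M = np.divide(adj, out_degree, out=np.zeros_like(adj), where=out_degree > 0)
    w0 = np.asarray(w0, dtype=float)
    w0 = w0 / w0.sum()
    w = w0.copy()
    for _ in range(max_iter):
        w_next = (1 - restart) * (M.T @ w) + restart * w0
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next
    return w

# Hypothetical three-gene cycle: 0 -> 1 -> 2 -> 0, with all initial weight on gene 0.
adjacency = [[0, 1, 0],
             [0, 0, 1],
             [1, 0, 0]]
node_weights = directed_random_walk(adjacency, w0=[1.0, 0.0, 0.0])
```

In this toy graph the weight decays along the direction of the edges, so gene 0 ends with the largest weight and gene 2 with the smallest. In a graph with nodes that have no outgoing edges, some weight is lost at each step and the result no longer sums to 1.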

Results

The results show that the proposed method substantially improves the survival prediction performance compared to that of other pathway-based prediction methods, revealing that the combined effect of gene expression and methylation data is well reflected in the integrated gene-gene graph combined with pathway information. Furthermore, we show that our joint analysis on the methylation features and gene expression profile identifies cancer-specific pathways with genes related to breast cancer.

Conclusions

In this study, we proposed a directed random walk (DRW)-based method on an integrated gene-gene graph with expression and methylation profiles in order to utilize the interactions between them. The results showed that the constructed integrated gene-gene graph can successfully reflect the combined effect of methylation features on gene expression profiles. We also found that the features selected by the denoising autoencoder (DA) effectively capture topologically important pathways and genes specifically related to breast cancer.

17.

Background

Genome-scale metabolic models provide an opportunity for rational approaches to studies of the different reactions taking place inside the cell. The integration of these models with gene regulatory networks is a hot topic in systems biology. The methods developed to date focus mostly on resolving the metabolic elements and use fairly straightforward approaches to assess the impact of genome expression on the metabolic phenotype.

Results

We present here a method for integrating the reverse engineering of gene regulatory networks into these metabolic models. We applied our method to a high-dimensional gene expression data set to infer a background gene regulatory network. We then compared the resulting phenotype simulations with those obtained by other relevant methods.

Conclusions

Our method outperformed the other approaches tested and was more robust to noise. We also illustrate the utility of this method for studies of a complex biological phenomenon, the diauxic shift in yeast.

18.

Introduction

Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.

Objectives

Assessment of the quality of raw data processing in untargeted metabolomics.

Methods

Five published untargeted metabolomics studies were reanalyzed.

Results

Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.

Conclusion

Incomplete raw data processing shows unexplored potential of current and legacy data.

19.

Background

Existing clustering approaches for microarray data do not adequately differentiate between subsets of co-expressed genes. We devised a novel approach that integrates expression and sequence data in order to generate functionally coherent and biologically meaningful subclusters of genes. Specifically, the approach clusters co-expressed genes on the basis of similar content and distributions of predicted statistically significant sequence motifs in their upstream regions.

Results

We applied our method to several sets of co-expressed genes and were able to define subsets with enrichment in particular biological processes and specific upstream regulatory motifs.

Conclusions

These results show the potential of our technique for functional prediction and regulatory motif identification from microarray data.

20.

Background

Tuberculosis (TB) is a contagious infectious disease caused by Mycobacterium tuberculosis (Mtb). With two million deaths per year, it has the highest mortality rate among bacterial infections. The only available vaccine against TB is the BCG vaccine. BCG is effective against TB in childhood; however, due to some limitations, it is not sufficiently effective in adults. In addition, BCG cannot produce an adequately protective response against reactivation of latent infections.

Objective

In the present study, we review the most recent findings on the contribution of the HspX protein to vaccines against tuberculosis.

Methods

Therefore, many attempts have been made to improve BCG or to find its replacement. Most of the subunit vaccines for TB in various phases of clinical trials were constructed as prophylactic vaccines using Mtb proteins expressed in the replicating stage. These vaccines might prevent active TB but not reactivation of latent tuberculosis infection (LTBI). A literature search was performed on various online databases (PubMed, Scopus, and Google Scholar) regarding the roles of HspX protein in tuberculosis vaccines.

Results

Ideal subunit post-exposure vaccines should target all forms of TB infection, including active symptomatic and dormant (latent) asymptomatic forms. Among the antigens used in these subunit vaccines, HspX is the most important latent-phase antigen of M. tuberculosis and elicits a strong immunological response. Many studies have evaluated the immunogenicity of this protein with the aim of improving TB vaccines.

Conclusion

According to the studies, HspX protein is a good candidate for development of subunit vaccines against TB infection.
