Similar Documents
1.
Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of samples to variables poses a risk of overfitting. Thus, in this context, feature selection methods become crucial for selecting relevant genes and, hence, improving classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not significantly improve the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call “relative Signal-to-Noise ratio” (rSNR). More precisely, the rSNR ranks genes by their specificity to an experimental condition, comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within the experimental condition of interest and high variation across experimental conditions are ranked higher and help improve classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR generally performed better than the other methods.
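The abstract describes the rSNR only qualitatively. The following Python sketch illustrates the ranking idea under an explicit assumption: the score is taken to be the spread of per-condition mean expression (extrinsic variation) divided by the spread of replicates within the condition of interest (intrinsic variation). The function names `rsnr_like_score` and `rank_genes` and this exact ratio are hypothetical, not the published rSNR formula.

```python
from statistics import mean, pstdev

def rsnr_like_score(expr_by_condition, condition_of_interest, eps=1e-9):
    """Hypothetical rSNR-style score for one gene (NOT the published formula).

    expr_by_condition maps a condition name to that gene's replicate values.
    Extrinsic variation: spread of the per-condition means.
    Intrinsic variation: spread of replicates within the condition of interest.
    """
    condition_means = [mean(values) for values in expr_by_condition.values()]
    extrinsic = pstdev(condition_means)
    intrinsic = pstdev(expr_by_condition[condition_of_interest])
    return extrinsic / (intrinsic + eps)  # eps avoids division by zero

def rank_genes(data, condition_of_interest):
    """Return gene names sorted from highest to lowest score."""
    scores = {gene: rsnr_like_score(conds, condition_of_interest)
              for gene, conds in data.items()}
    return sorted(scores, key=scores.get, reverse=True)

data = {
    # Stable within "heat" and shifted relative to "control": should rank high.
    "geneA": {"heat": [5.0, 5.1, 4.9], "control": [1.0, 1.2, 0.8]},
    # Noisy within "heat" and same mean in both conditions: should rank low.
    "geneB": {"heat": [1.0, 4.0, 7.0], "control": [3.5, 4.5, 4.0]},
}
print(rank_genes(data, "heat"))  # ['geneA', 'geneB']
```

The top-ranked genes would then be passed to a classifier as the selected features.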

2.
Identifying subspace gene clusters from gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted from the block-diagonal representation matrix obtained using LRR, and they capture the intrinsic patterns of genes with similar functions well. Meanwhile, the parameter of LRR balances the effect of noise, so the method is capable of extracting useful information from data with a high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions even when their expression profiles are not similar. It can also assign one gene to several clusters. Moreover, our method is robust to noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the LRR-based method outperformed existing methods for identifying subspace gene clusters.

3.
We propose permutation tests based on the pairwise distances between microarrays to compare location, variability, or equivalence of gene expression between two populations. For these tests the entire microarray or some pre-specified subset of genes is the unit of analysis. The pairwise distances only have to be computed once so the procedure is not computationally intensive despite the high dimensionality of the data. An R software package, permtest, implementing the method is freely available from the Comprehensive R Archive Network at http://cran.r-project.org.
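As an illustration of why the distances need to be computed only once, here is a minimal Python sketch of a distance-based permutation test for a difference in location. The statistic (mean between-group distance minus mean within-group distance), the Euclidean metric, and the function names are assumptions made for this example. They are not taken from the permtest package, and the variability and equivalence tests mentioned in the abstract are not covered.

```python
import math
import random
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(arrays):
    """All pairwise distances between arrays (each array is a list of gene values)."""
    n = len(arrays)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = euclidean(arrays[i], arrays[j])
    return d

def between_minus_within(d, labels):
    """Assumed statistic: mean between-group distance minus mean within-group distance."""
    between, within = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(d[i][j])
    return sum(between) / len(between) - sum(within) / len(within)

def permutation_test(arrays, labels, n_perm=2000, seed=0):
    """One-sided permutation p-value; the distance matrix is built once and reused."""
    d = distance_matrix(arrays)
    observed = between_minus_within(d, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # relabel arrays; distances themselves never change
        if between_minus_within(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Invented data: four arrays per group, three genes each
arrays = [
    [0.1, 0.0, 0.2], [0.0, 0.1, 0.1], [0.2, 0.1, 0.0], [0.1, 0.2, 0.1],  # group A
    [5.0, 5.1, 4.9], [5.1, 5.0, 5.2], [4.9, 5.0, 5.1], [5.0, 4.8, 5.0],  # group B
]
labels = ["A"] * 4 + ["B"] * 4
stat, p = permutation_test(arrays, labels)
print(f"statistic = {stat:.2f}, p = {p:.3f}")
```

Each permutation only re-reads the precomputed matrix, so the cost per permutation depends on the number of arrays, not on the number of genes.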

4.
High-altitude hypoxia (reduced inspired oxygen tension due to decreased barometric pressure) exerts severe physiological stress on the human body. Two high-altitude regions where humans have lived for millennia are the Andean Altiplano and the Tibetan Plateau. Populations living in these regions exhibit unique circulatory, respiratory, and hematological adaptations to life at high altitude. Although these responses have been well characterized physiologically, their underlying genetic basis remains unknown. We performed a genome scan to identify genes showing evidence of adaptation to hypoxia. We looked across each chromosome to identify genomic regions with previously unknown function with respect to altitude phenotypes. In addition, groups of genes functioning in oxygen metabolism and sensing were examined to test the hypothesis that particular pathways have been involved in genetic adaptation to altitude. Applying four population genetic statistics commonly used for detecting signatures of natural selection, we identified selection-nominated candidate genes and gene regions in these two populations (Andeans and Tibetans) separately. The Tibetan and Andean patterns of genetic adaptation are largely distinct from one another, with both populations showing evidence of positive natural selection in different genes or gene regions. Interestingly, one gene previously known to be important in cellular oxygen sensing, EGLN1 (also known as PHD2), shows evidence of positive selection in both Tibetans and Andeans. However, the pattern of variation for this gene differs between the two populations. Our results indicate that several key HIF-regulatory and targeted genes are responsible for adaptation to high altitude in Andeans and Tibetans, and several different chromosomal regions are implicated in the putative response to selection. 
These data suggest a genetic role in high-altitude adaptation and provide a basis for future genotype/phenotype association studies necessary to confirm the role of selection-nominated candidate genes and gene regions in adaptation to altitude.

5.
High Throughput Biological Data (HTBD) requires detailed analysis methods, and from a life science perspective, these analysis results make most sense when interpreted within the context of biological pathways. Bayesian Networks (BNs) capture both linear and nonlinear interactions and handle stochastic events in a probabilistic framework that accounts for noise, making them viable candidates for HTBD analysis. We have recently proposed an approach, called Bayesian Pathway Analysis (BPA), for analyzing HTBD using BNs, in which known biological pathways are modeled as BNs and the pathways that best explain the given HTBD are found. BPA uses fold change information to obtain an input matrix to score each pathway modeled as a BN. Scoring is achieved using the Bayesian-Dirichlet Equivalent method, and significance is assessed by randomization via bootstrapping of the columns of the input matrix. In this study, we improve on the BPA system by optimizing the steps involved in “Data Preprocessing and Discretization”, “Scoring”, “Significance Assessment”, and “Software and Web Application”. We tested the improved system on synthetic data sets and achieved over 98% accuracy in identifying the active pathways. The overall approach was applied to real cancer microarray data sets in order to investigate the pathways that are commonly active in different cancer types. We compared our findings on the real data sets with those of a related approach called Signaling Pathway Impact Analysis (SPIA).

6.
Both genetic drift and natural selection cause the frequencies of alleles in a population to vary over time. Discriminating between these two evolutionary forces, based on a time series of samples from a population, remains an outstanding problem with increasing relevance to modern data sets. Even in the idealized situation when the sampled locus is independent of all other loci, this problem is difficult to solve, especially when the size of the population from which the samples are drawn is unknown. A standard χ2-based likelihood-ratio test was previously proposed to address this problem. Here we show that the χ2-test of selection substantially underestimates the probability of type I error, leading to more false positives than indicated by its P-value, especially at stringent P-values. We introduce two methods to correct this bias. The empirical likelihood-ratio test (ELRT) rejects neutrality when the likelihood-ratio statistic falls in the tail of the empirical distribution obtained under the most likely neutral population size. The frequency increment test (FIT) rejects neutrality if the distribution of normalized allele-frequency increments exhibits a mean that deviates significantly from zero. We characterize the statistical power of these two tests for selection, and we apply them to three experimental data sets. We demonstrate that both ELRT and FIT have power to detect selection in practical parameter regimes, such as those encountered in microbial evolution experiments. Our analysis applies to a single diallelic locus, assumed independent of all other loci, which is most relevant to full-genome selection scans in sexual organisms, and also to evolution experiments in asexual organisms as long as clonal interference is weak. Different techniques will be required to detect selection in time series of cosegregating linked loci.
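The frequency increment test can be sketched in a few lines of Python. The sketch assumes the rescaling Y_i = (v_i - v_{i-1}) / sqrt(2 v_{i-1} (1 - v_{i-1}) (t_i - t_{i-1})), which is our reading of the FIT normalization, under which neutral increments have mean zero; the function names and the trajectory are hypothetical. It returns only the one-sample t statistic and its degrees of freedom, to be compared against a Student's t distribution, because the Python standard library has no t-distribution CDF.

```python
import math
from statistics import mean, stdev

def frequency_increments(freqs, times):
    """Rescaled increments (assumed normalization):
    Y_i = (v_i - v_{i-1}) / sqrt(2 * v_{i-1} * (1 - v_{i-1}) * (t_i - t_{i-1}))
    """
    ys = []
    for i in range(1, len(freqs)):
        v_prev, v = freqs[i - 1], freqs[i]
        if not 0 < v_prev < 1:
            raise ValueError("allele frequency must lie strictly between 0 and 1")
        dt = times[i] - times[i - 1]
        ys.append((v - v_prev) / math.sqrt(2 * v_prev * (1 - v_prev) * dt))
    return ys

def fit_statistic(freqs, times):
    """One-sample t statistic for mean(Y) = 0 and its degrees of freedom (len(Y) - 1)."""
    ys = frequency_increments(freqs, times)
    if len(ys) < 2:
        raise ValueError("need at least three sampled time points")
    t = mean(ys) / (stdev(ys) / math.sqrt(len(ys)))
    return t, len(ys) - 1

# Hypothetical allele-frequency trajectory sampled every 10 generations
t, df = fit_statistic([0.10, 0.18, 0.30, 0.45, 0.60], [0, 10, 20, 30, 40])
print(f"t = {t:.1f} with {df} degrees of freedom")
```

A steadily rising allele gives consistently positive increments and hence a large t; under drift alone the increments scatter around zero. As the abstract notes, this applies to a single diallelic locus treated as independent of all other loci.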

7.
G Yu, W Yao, J Wang, X Ma, W Xiao, H Li, D Xia, Y Yang, K Deng, H Xiao, B Wang, X Guo, W Guan, Z Hu, Y Bai, H Xu, J Liu, X Zhang, Z Ye. PLoS ONE, 2012, 7(8): e42377

Background

Long noncoding RNAs (lncRNAs) are an important class of pervasive genes involved in a variety of biological functions. They are aberrantly expressed in many types of cancers. In this study, we described lncRNA profiles in 6 pairs of human renal clear cell carcinoma (RCCC) and corresponding adjacent nontumorous tissues (NT) by microarray.

Methodology/Principal Findings

With abundant and varied probes covering 33,045 lncRNAs on our microarray, 17,157 lncRNAs could be detected as expressed at a measurable level. From the data, we found thousands of lncRNAs that were differentially expressed (≥2-fold change) in RCCC tissues compared with NT, and 916 lncRNAs that were differentially expressed in five or more of the six RCCC samples. Compared with NT, many lncRNAs were significantly up-regulated or down-regulated in RCCC. Our data showed that down-regulated lncRNAs were more common than up-regulated ones. ENST00000456816, X91348, BC029135, and NR_024418 were evaluated by qPCR in sixty-three pairs of RCCC and NT samples. All four lncRNAs were aberrantly expressed in RCCC compared with matched histologically normal renal tissues.

Conclusions/Significance

Our study is the first to determine genome-wide lncRNA expression patterns in RCCC by microarray. The results showed that clusters of lncRNAs were aberrantly expressed in RCCC compared with NT samples, suggesting that lncRNAs differentially expressed between tumor and normal tissues may play a partial or key role in tumor development. Taken together, this study may provide potential targets for future treatment of RCCC and novel insights into cancer biology.
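The filtering criteria in the Methodology/Principal Findings paragraph (at least a 2-fold change, in five or more of six tumor/normal pairs) can be expressed as a short Python filter. The probe names and values below are invented for illustration, the function name is hypothetical, and requiring the change to go in the same direction in every qualifying pair is an assumption, since the abstract does not say how direction was handled.

```python
import math

def consistently_changed(tumor, normal, min_fold=2.0, min_pairs=5):
    """Return {probe: "up" or "down"} for probes changed >= min_fold in >= min_pairs pairs.

    tumor and normal map a probe ID to per-patient expression values on a linear
    scale; position k in both lists belongs to the same tumor/normal pair.
    Assumption: the change must have the same direction in the qualifying pairs.
    """
    threshold = math.log2(min_fold)
    hits = {}
    for probe, t_vals in tumor.items():
        log_fc = [math.log2(t / n) for t, n in zip(t_vals, normal[probe])]
        up = sum(fc >= threshold for fc in log_fc)
        down = sum(fc <= -threshold for fc in log_fc)
        if up >= min_pairs:
            hits[probe] = "up"
        elif down >= min_pairs:
            hits[probe] = "down"
    return hits

# Invented values for six tumor/normal pairs
tumor = {"probe_1": [8, 9, 10, 8, 12, 1], "probe_2": [1, 1, 1, 1, 2, 1], "probe_3": [5] * 6}
normal = {"probe_1": [2, 3, 4, 2, 3, 1], "probe_2": [4, 5, 3, 4, 5, 2], "probe_3": [5] * 6}
print(consistently_changed(tumor, normal))  # {'probe_1': 'up', 'probe_2': 'down'}
```

Working in log2 space makes the up and down thresholds symmetric: a 2-fold increase is +1 and a 2-fold decrease is -1.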

8.
9.
Cancer is thought to be caused by a sequence of multiple genetic and epigenetic alterations which occur in one or more of the genes controlling cell cycle progression and signal transduction. The complexity of carcinogenic mechanisms leads to heterogeneity in the molecular phenotype, pathology, and prognosis of cancers.

10.
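Microarray technology is at the core of the biotechnology revolution: it allows researchers to monitor the expression levels of thousands of genes simultaneously and has been widely applied in medical research. How to mine useful information from massive amounts of gene expression data and interpret it biologically is an important challenge in the field of gene expression profile analysis. Signaling pathway analysis has become the main approach for studying differential expression between phenotypes in gene chip data. Because it treats each signaling pathway as a whole as the object of study, the results it yields are more scientific and accurate. In this paper, we briefly describe the development of pathway-based gene set enrichment analysis methods over the past 10 years, divide them into three stages, and briefly summarize the foundations and characteristics of the methods in each stage.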

11.
Identifying perturbed or dysregulated pathways is critical to understanding the biological processes that change within an experiment. Previous methods identified important pathways that are significantly enriched among differentially expressed genes; however, these methods cannot account for small, coordinated changes in gene expression that amass across a whole pathway. In order to overcome this limitation, we use microarray gene expression data to identify pathway perturbation based on pathway correlation profiles. By identifying the distribution of gene-gene pair correlations within a pathway, we can rank the pathways based on the level of perturbation and dysregulation. We have shown this successfully for differences between two experimental conditions in Escherichia coli and changes within time series data in Saccharomyces cerevisiae, as well as two estrogen receptor response classes of breast cancer. Overall, our method made significant predictions as to the pathway perturbations that are involved in the experimental conditions.
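A minimal Python sketch of correlation-profile scoring follows. It computes the Pearson correlation of every gene pair in a pathway under each of two conditions and, as an assumed summary of how far the correlation distribution shifts, ranks pathways by the mean absolute change in pairwise correlation. The scoring rule, gene names, and pathway names are hypothetical; the published method may compare the full correlation distributions differently.

```python
import math
from itertools import combinations

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def pair_correlations(expr, genes):
    """Pearson correlation for every gene pair in a pathway, keyed by (gene_a, gene_b)."""
    return {(a, b): pearson(expr[a], expr[b]) for a, b in combinations(genes, 2)}

def perturbation_score(expr1, expr2, genes):
    """Hypothetical score: mean absolute change in pairwise correlation between conditions."""
    c1, c2 = pair_correlations(expr1, genes), pair_correlations(expr2, genes)
    return sum(abs(c1[pair] - c2[pair]) for pair in c1) / len(c1)

def rank_pathways(expr1, expr2, pathways):
    scores = {name: perturbation_score(expr1, expr2, genes) for name, genes in pathways.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Invented data: g3 flips from co-varying with g1/g2 to anti-correlated in condition 2
cond1 = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 5, 7],
         "g4": [4, 3, 2, 1], "g5": [8, 6, 4, 2]}
cond2 = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [7, 5, 3, 1],
         "g4": [4, 3, 2, 1], "g5": [8, 6, 4, 2]}
pathways = {"P1": ["g1", "g2", "g3"], "P2": ["g4", "g5"]}
order, scores = rank_pathways(cond1, cond2, pathways)
print(order)  # ['P1', 'P2']
```

Because the score depends on how genes co-vary rather than on per-gene fold changes, a pathway can rank highly even when no single gene passes a differential-expression cutoff.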

12.
Hilar cholangiocarcinoma (HCCA) is an invasive hepatic malignancy that is difficult to biopsy; therefore, novel markers of HCCA prognosis are needed. Here, the level of canonical Wnt activation in patients with HCCA, intrahepatic cholangiocarcinoma (IHCC), and congenital choledochal cysts (CCC) was compared to understand the role of Wnt signaling in HCCA. Pathology specimens from HCCA (n=129), IHCC (n=31), and CCC (n=45) patients were used to construct tissue microarrays. Wnt2, Wnt3, β-catenin, TCF4, c-Myc, and cyclin D1 were detected by immunohistochemistry. Parallel correlation analysis was used to analyze differences in protein levels between the HCCA, IHCC, and CCC groups. Univariate and multivariate analyses were used to determine independent predictors of successful resection and prognosis in the HCCA group. The protein levels of Wnt2, β-catenin, TCF4, c-Myc, and cyclin D1 were significantly higher in HCCA than in IHCC or CCC. Wnt signaling activation (Wnt2+, Wnt3+, nuclear β-catenin+, nuclear TCF4+) was significantly greater in HCCA tissues than in CCC tissues. Univariable analyses indicated that cyclin D1 expression, Wnt signaling activation, and partial Wnt activation (Wnt2+ or Wnt3+, and nuclear β-catenin+ or nuclear TCF4+) predicted successful resection, but only cyclin D1 expression remained significant in multivariable analyses. Only partial Wnt activation was an independent predictor of survival time. Proteins in the canonical Wnt signaling pathway were present at higher levels in HCCA and correlated with tumor resectability and patient prognosis. These results suggest that Wnt pathway analysis may be a useful marker for clinical outcome in HCCA.
Key words: Hilar cholangiocarcinoma, Wnt signaling pathway, tissue microarray, β-catenin, c-Myc, cyclin D1

13.
In this article we propose two practical types of designs for large time-course, dual-channel microarray experiments. One type consists of several interwoven loops, and the other type combines reference and loop designs. By representing the experiment as a graph, where the timepoints are nodes and the arrays are edges, we demonstrate how the time contrasts between any two timepoints can be estimated, provided that there is a path of edges linking them. In addition, we give a general formula for the variance of such contrasts. The efficiency of the proposed designs is evaluated by estimating the variances of the log-ratios of the comparisons of interest.
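The graph view of a design can be made concrete with a short Python sketch: each array is an edge carrying a log-ratio between two timepoints, and a contrast between any two connected timepoints is estimated by chaining log-ratios along a path found by breadth-first search. This single-path estimator and the function name `path_contrast` are illustrative assumptions, not the paper's general estimator. If each array's log-ratio has independent error variance σ², the estimate from a path of k arrays has variance kσ².

```python
from collections import deque

def path_contrast(arrays, start, end):
    """Estimate mu[end] - mu[start] by summing log-ratios along a shortest path.

    arrays: list of (t_from, t_to, log_ratio), where log_ratio estimates mu[t_to] - mu[t_from].
    Returns (estimate, number_of_arrays_on_path), or None if no path links the timepoints.
    """
    graph = {}
    for a, b, m in arrays:
        graph.setdefault(a, []).append((b, m))    # traverse the array forwards
        graph.setdefault(b, []).append((a, -m))   # or backwards with the sign flipped
    queue = deque([(start, 0.0, 0)])
    seen = {start}
    while queue:
        node, total, n_edges = queue.popleft()
        if node == end:
            return total, n_edges
        for nxt, m in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, total + m, n_edges + 1))
    return None

# Hypothetical noise-free loop design T1 -> T2 -> T3 -> T4 -> T1 for true levels 0, 1, 3, 2
arrays = [("T1", "T2", 1.0), ("T2", "T3", 2.0), ("T3", "T4", -1.0), ("T4", "T1", -2.0)]
print(path_contrast(arrays, "T1", "T3"))  # (3.0, 2)
print(path_contrast(arrays, "T1", "T5"))  # None: no array links T5 to the loop
```

In a loop, the two directions around the loop use disjoint arrays and so give two independent estimates of the same contrast; a least-squares fit over the whole graph would combine them, whereas this sketch keeps only the shortest path.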

14.
15.
16.
Numerous prognostic gene expression signatures for breast cancer have been generated previously, with little overlap and limited insight into the biology of the disease. Here we introduce a novel algorithm named SCoR (Survival analysis using Cox proportional hazard regression and Random resampling) that applies random resampling and clustering methods to identify gene features correlated with time-to-event data. This approach is shown to reduce the overfitting noise involved in microarray data analysis and to discover functional gene sets linked to patient survival. SCoR independently identified a common poor-prognosis signature composed of cell proliferation genes in six out of eight breast cancer datasets. Furthermore, a sequential SCoR analysis on highly proliferative breast cancers repeatedly identified T/B cell markers as favorable prognostic factors. In glioblastoma, SCoR identified a common good-prognosis signature of chromosome 10 genes from two gene expression datasets (TCGA and REMBRANDT), recapitulating the fact that loss of one copy of chromosome 10 (which harbors the tumor suppressor PTEN) is linked to poor survival in glioblastoma patients. SCoR also identified prognostic genes on sex chromosomes in lung adenocarcinomas, suggesting that patient gender might be used to predict outcome in this disease. These results demonstrate the power of SCoR to identify common and biologically meaningful prognostic gene expression signatures.

17.
In mammalian cells, transcribed enhancers (TrEns) play important roles in the initiation of gene expression and the maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions is how the genomic characteristics of enhancers relate to enhancer activities. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving room to explore the enhancers’ DNA code more systematically. To address this problem, we developed a novel computational framework, Transcribed Enhancer Landscape Search (TELS), aimed at identifying predictive cell type/tissue-specific motif signatures of TrEns. As a case study, we used TELS to compile a comprehensive catalog of motif signatures for all known TrEns identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that optimized combinations of different short motifs characterize cell type/tissue-specific TrEns. Our study is the first to report combinations of motifs that maximize classification performance in distinguishing TrEns exclusively transcribed in one cell type/tissue from TrEns exclusively transcribed in different cell types/tissues. Moreover, we also report 31 motif signatures predictive of enhancers’ broad activity. TELS code and materials are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

18.
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates.

19.
Random nuclear restriction fragment length polymorphisms (RFLPs) were used to assess similarities and relationships among open-pollinated (OP) populations of the cultivated bulb onion (Allium cepa). Seventeen OP populations and 2 inbreds of contrasting daylength response [termed by convention as long (LD) and short (SD) day], 1 shallot (A. cepa var. ascalonicum), and one cultivar of bunching onion (Allium fistulosum) were examined with 104 cDNA clones and two to four restriction enzymes. Sixty (58%) clones detected at least 1 polymorphic fragment scorable among the OP populations and were used for analyses. The average number of polymorphic fragments per polymorphic probe-enzyme combination was 1.9, reflecting that numerous monomorphic fragments were usually present. Similarities were estimated as the proportion of polymorphic fragments shared by 2 populations. Average similarity values among LD, among SD, and between LD and SD OP populations were 0.79, 0.67, and 0.68, respectively. Relationships among the OP populations were estimated by parsimony, cluster analysis of similarities using the unweighted-pair-group method (UPGMA), and multivariate analysis using principal components. Parsimony analysis generated a strict consensus tree that grouped all but 1 LD onion with unresolved relationships to the SD OP populations. The UPGMA analysis placed together the LD storage OP populations. Principal component analysis grouped all but 2 LD onions; the other OP populations were dispersed. The results suggest that LD and SD onions do not represent distinct germ plasm, but that LD storage onions represent a derived group selected for production at higher latitudes. If it is assumed that the sampled populations are representative of all onion OP populations, the lower similarities among SD OP populations indicate that their collection and maintenance in germ plasm collections is important for the preservation of genetic diversity.
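The similarity and UPGMA steps can be illustrated in Python. The abstract defines similarity as the proportion of polymorphic fragments shared by two populations; here that proportion is assumed to be shared fragments divided by fragments present in either population, and distance is taken as 1 - similarity. Population names and fragment sets below are invented for illustration, the function names are hypothetical, and the parsimony and principal component analyses are not sketched.

```python
from itertools import combinations

def similarity(frags_a, frags_b):
    """Assumed definition: fragments shared by both populations / fragments seen in either."""
    union = frags_a | frags_b
    return len(frags_a & frags_b) / len(union) if union else 1.0

def upgma(profiles):
    """Cluster populations by UPGMA on distance = 1 - similarity.

    profiles maps a population name to the set of polymorphic fragments it carries.
    Returns the tree as nested tuples.
    """
    size = {name: 1 for name in profiles}  # current cluster -> number of populations in it
    dist = {frozenset((a, b)): 1.0 - similarity(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}
    while len(size) > 1:
        closest = min(dist, key=dist.get)
        a, b = sorted(closest, key=str)    # deterministic child order
        merged = (a, b)
        del dist[closest]
        for c in list(size):
            if c == a or c == b:
                continue
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            # Size-weighted average, so every original population counts equally
            dist[frozenset((merged, c))] = (size[a] * d_ac + size[b] * d_bc) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

# Invented fragment sets for two hypothetical LD and two hypothetical SD populations
profiles = {
    "LD1": {1, 2, 3, 4},
    "LD2": {1, 2, 3, 5},
    "SD1": {1, 6, 7, 8},
    "SD2": {1, 6, 7, 9},
}
print(upgma(profiles))  # (('LD1', 'LD2'), ('SD1', 'SD2'))
```

Weighting merged distances by cluster size, so that each original population contributes equally, is what distinguishes UPGMA from its weighted counterpart, WPGMA, which simply averages the two merged clusters.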

20.

Background

DNA word frequencies, normalized for genomic AT content, are remarkably stable within prokaryotic genomes and are therefore said to reflect a “genomic signature.” The genomic signatures can be used to phylogenetically classify organisms from arbitrarily sampled DNA. Genomic signatures can also be used to search for horizontally transferred DNA or DNA regions subjected to special selection forces. Thus, the stability of the genomic signature can be used as a measure of genomic homogeneity. The factors associated with the stability of the genomic signatures are not known, and this motivated us to investigate further. We analyzed the intra-genomic variance of genomic signatures based on AT content normalization (0th order Markov model) as well as genomic signatures normalized by smaller DNA words (1st and 2nd order Markov models) for 636 sequenced prokaryotic genomes. Regression models were fitted with intra-genomic signature variance as the response variable and, as predictors, a set of factors representing genomic properties such as genomic AT content, genome size, habitat, phylum, oxygen requirement, optimal growth temperature, and oligonucleotide usage variance (OUV, a measure of oligonucleotide usage bias), measured as the variance between genomic tetranucleotide frequencies and Markov chain approximated tetranucleotide frequencies.

Principal Findings

Regression analysis revealed that OUV was the most important factor (p<0.001) determining intra-genomic homogeneity as measured using genomic signatures. This means that the less random the oligonucleotide usage is in the sense of higher OUV, the more homogeneous the genome is in terms of the genomic signature. The other factors influencing variance in the genomic signature (p<0.001) were genomic AT content, phylum and oxygen requirement.

Conclusions

Genomic homogeneity in prokaryotes is intimately linked to genomic GC content, oligonucleotide usage bias (OUV), and aerobiosis, while OUV is associated with genomic GC content, aerobiosis, and habitat.
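The 0th-order genomic signature and OUV can be sketched in Python as below. Assumptions: expected tetranucleotide frequencies come from AT content alone, with p(A) = p(T) = AT/2 and p(G) = p(C) = (1 - AT)/2; only the given strand is counted (no reverse-complement pooling); and OUV is taken as the mean squared difference between observed and expected frequencies across all 256 words. The 1st- and 2nd-order Markov normalizations used in the study are not shown, and the function names are hypothetical.

```python
import random
from itertools import product

WORDS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides

def tetra_frequencies(seq):
    """Observed relative frequency of each tetranucleotide on the given strand."""
    seq = seq.upper()
    counts = dict.fromkeys(WORDS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no valid tetranucleotide")
    return {w: c / total for w, c in counts.items()}

def zero_order_expected(seq):
    """Expected tetranucleotide frequencies from AT content alone (0th-order Markov)."""
    seq = seq.upper()
    at = sum(seq.count(b) for b in "AT") / sum(seq.count(b) for b in "ACGT")
    base_p = {"A": at / 2, "T": at / 2, "G": (1 - at) / 2, "C": (1 - at) / 2}
    expected = {}
    for w in WORDS:
        p = 1.0
        for base in w:
            p *= base_p[base]
        expected[w] = p
    return expected

def genomic_signature(seq):
    """Observed/expected ratio for each tetranucleotide with nonzero expectation."""
    obs, exp = tetra_frequencies(seq), zero_order_expected(seq)
    return {w: obs[w] / exp[w] for w in WORDS if exp[w] > 0}

def oligo_usage_variance(seq):
    """Assumed OUV: mean squared difference between observed and expected frequencies."""
    obs, exp = tetra_frequencies(seq), zero_order_expected(seq)
    return sum((obs[w] - exp[w]) ** 2 for w in WORDS) / len(WORDS)

rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
repetitive_seq = "ACGT" * 100
print(oligo_usage_variance(random_seq) < oligo_usage_variance(repetitive_seq))  # True
```

A random sequence uses words close to their AT-based expectation and so has a low OUV, whereas a repetitive one concentrates on a few words and has a high OUV, i.e. less random oligonucleotide usage. Intra-genomic signature variance could then be estimated by computing `genomic_signature` on sliding windows and comparing each window with the whole-genome signature.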


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417