首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
2.

Background  

Alternative splicing is an important gene regulation mechanism. It is estimated that about 74% of multi-exon human genes have alternative splicing. High throughput tandem (MS/MS) mass spectrometry provides valuable information for rapidly identifying potentially novel alternatively-spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched.  相似文献   

3.

Background  

Classification studies using gene expression datasets are usually based on small numbers of samples and tens of thousands of genes. The selection of those genes that are important for distinguishing the different sample classes being compared, poses a challenging problem in high dimensional data analysis. We describe a new procedure for selecting significant genes as recursive cluster elimination (RCE) rather than recursive feature elimination (RFE). We have tested this algorithm on six datasets and compared its performance with that of two related classification procedures with RFE.  相似文献   

4.

Background  

Human cells of various tissue types differ greatly in morphology despite having the same set of genetic information. Some genes are expressed in all cell types to perform house-keeping functions, while some are selectively expressed to perform tissue-specific functions. In this study, we wished to elucidate how proteins encoded by human house-keeping genes and tissue-specific genes are organized in human protein-protein interaction networks. We constructed protein-protein interaction networks for different tissue types using two gene expression datasets and one protein-protein interaction database. We then calculated three network indices of topological importance, the degree, closeness, and betweenness centralities, to measure the network position of proteins encoded by house-keeping and tissue-specific genes, and quantified their local connectivity structure.  相似文献   

5.

Background  

Microarray technology has made it possible to simultaneously measure the expression levels of large numbers of genes in a short time. Gene expression data is information rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. Clustering is an important tool for finding groups of genes with similar expression patterns in microarray data analysis. However, hard clustering methods, which assign each gene exactly to one cluster, are poorly suited to the analysis of microarray datasets because in such datasets the clusters of genes frequently overlap.  相似文献   

6.
A first-draft human protein-interaction map   总被引:3,自引:2,他引:1       下载免费PDF全文

Background

Protein-interaction maps are powerful tools for suggesting the cellular functions of genes. Although large-scale protein-interaction maps have been generated for several invertebrate species, projects of a similar scale have not yet been described for any mammal. Because many physical interactions are conserved between species, it should be possible to infer information about human protein interactions (and hence protein function) using model organism protein-interaction datasets.

Results

Here we describe a network of over 70,000 predicted physical interactions between around 6,200 human proteins generated using the data from lower eukaryotic protein-interaction maps. The physiological relevance of this network is supported by its ability to preferentially connect human proteins that share the same functional annotations, and we show how the network can be used to successfully predict the functions of human proteins. We find that combining interaction datasets from a single organism (but generated using independent assays) and combining interaction datasets from two organisms (but generated using the same assay) are both very effective ways of further improving the accuracy of protein-interaction maps.

Conclusions

The complete network predicts interactions for a third of human genes, including 448 human disease genes and 1,482 genes of unknown function, and so provides a rich framework for biomedical research.
  相似文献   

7.

Background  

Gene expression measurements from breast cancer (BrCa) tumors are established clinical predictive tools to identify tumor subtypes, identify patients showing poor/good prognosis, and identify patients likely to have disease recurrence. However, diverse breast cancer datasets in conjunction with diagnostic clinical arrays show little overlap in the sets of genes identified. One approach to identify a set of consistently dysregulated candidate genes in these tumors is to employ meta-analysis of multiple independent microarray datasets. This allows one to compare expression data from a diverse collection of breast tumor array datasets generated on either cDNA or oligonucleotide arrays.  相似文献   

8.

Background  

The extended use of microarray technologies has enabled the generation and accumulation of gene expression datasets that contain expression levels of thousands of genes across tens or hundreds of different experimental conditions. One of the major challenges in the analysis of such datasets is to discover local structures composed by sets of genes that show coherent expression patterns across subsets of experimental conditions. These patterns may provide clues about the main biological processes associated to different physiological states.  相似文献   

9.

Background  

DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Clustering the genes on the basis of similarity of their expression profiles can simplify the data, and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods, CRC, k-means, ISA and memISA, are used upon three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed.  相似文献   

10.

Background  

Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger proteome in human. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%.  相似文献   

11.

Background  

A number of completely sequenced eukaryotic genome data are available in the public domain. Eukaryotic genes are either 'intron containing' or 'intronless'. Eukaryotic 'intronless' genes are interesting datasets for comparative genomics and evolutionary studies. The SEGE database containing a collection of eukaryotic single exon genes is available. However, SEGE is derived using GenBank. The redundant, incomplete and heterogeneous qualities of GenBank data are a bottleneck for biological investigation in comparative genomics and evolutionary studies. Such studies often require representative gene sets from each genome and this is possible only by deriving specific datasets from completely sequenced genome data. Thus Genome SEGE, a database for 'intronless' genes in completely sequenced eukaryotic genomes, has been constructed.  相似文献   

12.

Background  

Microarray technology is generating huge amounts of data about the expression level of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge, and to fully understand such datasets, it is essential to include external biological information about genes and gene products to the analysis of expression data. However, most of the current approaches to analyze microarray datasets are mainly focused on the analysis of experimental data, and external biological information is incorporated as a posterior process.  相似文献   

13.

Background  

Synthetic lethality experiments identify pairs of genes with complementary function. More direct functional associations (for example greater probability of membership in a single protein complex) may be inferred between genes that share synthetic lethal interaction partners than genes that are directly synthetic lethal. Probabilistic algorithms that identify gene modules based on motif discovery are highly appropriate for the analysis of synthetic lethal genetic interaction data and have great potential in integrative analysis of heterogeneous datasets.  相似文献   

14.

Background  

Cells dynamically adapt their gene expression patterns in response to various stimuli. This response is orchestrated into a number of gene expression modules consisting of co-regulated genes. A growing pool of publicly available microarray datasets allows the identification of modules by monitoring expression changes over time. These time-series datasets can be searched for gene expression modules by one of the many clustering methods published to date. For an integrative analysis, several time-series datasets can be joined into a three-dimensional gene-condition-time dataset, to which standard clustering or biclustering methods are, however, not applicable. We thus devise a probabilistic clustering algorithm for gene-condition-time datasets.  相似文献   

15.

Background  

Time course microarray profiles examine the expression of genes over a time domain. They are necessary in order to determine the complete set of genes that are dynamically expressed under given conditions, and to determine the interaction between these genes. Because of cost and resource issues, most time series datasets contain less than 9 points and there are few tools available geared towards the analysis of this type of data.  相似文献   

16.

Background  

RNA-mediated interference (RNAi)-based functional genomics is a systems-level approach to identify novel genes that control biological phenotypes. Existing computational approaches can identify individual genes from RNAi datasets that regulate a given biological process. However, currently available methods cannot identify which RNAi screen "hits" are novel components of well-characterized biological pathways known to regulate the interrogated phenotype. In this study, we describe a method to identify genes from RNAi datasets that are novel components of known biological pathways. We experimentally validate our approach in the context of a recently completed RNAi screen to identify novel regulators of melanogenesis.  相似文献   

17.

Background  

Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination), suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes.  相似文献   

18.

Background  

Identifying candidate genes in genetic networks is important for understanding regulation and biological function. Large gene expression datasets contain relevant information about genetic networks, but mining the data is not a trivial task. Algorithms that infer Bayesian networks from expression data are powerful tools for learning complex genetic networks, since they can incorporate prior knowledge and uncover higher-order dependencies among genes. However, these algorithms are computationally demanding, so novel techniques that allow targeted exploration for discovering new members of known pathways are essential.  相似文献   

19.
20.

Background  

Due to the large number of genes in a typical microarray dataset, feature selection looks set to play an important role in reducing noise and computational cost in gene expression-based tissue classification while improving accuracy at the same time. Surprisingly, this does not appear to be the case for all multiclass microarray datasets. The reason is that many feature selection techniques applied on microarray datasets are either rank-based and hence do not take into account correlations between genes, or are wrapper-based, which require high computational cost, and often yield difficult-to-reproduce results. In studies where correlations between genes are considered, attempts to establish the merit of the proposed techniques are hampered by evaluation procedures which are less than meticulous, resulting in overly optimistic estimates of accuracy.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号