Similar articles
20 similar articles found
1.

Background

Microarray gene expression data are accumulating in public databases. The expression profiles contain valuable information for understanding human gene expression patterns. However, the effective use of public microarray data requires integrating the expression profiles from heterogeneous sources.

Results

In this study, we have compiled a compendium of microarray expression profiles of various human tissue samples. The microarray raw data generated in different research laboratories have been obtained and combined into a single dataset after data normalization and transformation. To demonstrate the usefulness of the integrated microarray data for studying human gene expression patterns, we have analyzed the dataset to identify potential tissue-selective genes. A new method has been proposed for genome-wide identification of tissue-selective gene targets using both microarray intensity values and detection calls. The candidate genes for brain, liver and testis-selective expression have been examined, and the results suggest that our approach can select some interesting gene targets for further experimental studies.

Conclusion

A computational approach has been developed in this study for combining microarray expression profiles from heterogeneous sources. The integrated microarray data can be used to investigate tissue-selective expression patterns of human genes.

2.

Background

Phylogenetic comparative methods (PCMs) have been applied widely in analyzing data from related species but their fit to data is rarely assessed.

Question

Can one determine whether any particular comparative method is typically more appropriate than others by examining comparative data sets?

Data

I conducted a meta-analysis of 122 phylogenetic data sets found by searching all papers in JEB, Blackwell Synergy and JSTOR published in 2002–2005 for the purpose of assessing the fit of PCMs. The number of species in these data sets ranged from 9 to 117.

Analysis Method

I used the Akaike information criterion to compare PCMs, and then fit PCMs to bivariate data sets through REML analysis. Correlation estimates between two traits and bootstrapped confidence intervals of correlations from each model were also compared.
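As a rough illustration of the model-comparison step, the sketch below computes AIC = 2k − 2 ln L for several candidate models and ranks them from best to worst. The model names, log-likelihoods, and parameter counts are made-up placeholders, not values from the study, and the helper names `aic` and `rank_by_aic` are hypothetical.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(models: dict[str, tuple[float, int]]) -> list[tuple[str, float]]:
    """Return (name, AIC) pairs sorted from best (lowest AIC) to worst."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
candidate_models = {
    "non_phylogenetic": (-52.0, 2),
    "independent_contrasts": (-51.5, 2),
    "ornstein_uhlenbeck": (-51.2, 3),
}

ranking = rank_by_aic(candidate_models)
best_name, best_aic = ranking[0]
```

A lower AIC indicates a better trade-off between fit and complexity, so in this toy input the extra parameter of the third model is not repaid by its slightly higher likelihood.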

Conclusions

For phylogenies of fewer than one hundred taxa, the Independent Contrast method and the independent, non-phylogenetic models provide the best fit. For bivariate analysis, correlations from different PCMs are qualitatively similar, so actual correlations from real data seem to be robust to the PCM chosen for the analysis. Therefore, researchers might apply the PCM they believe best describes the evolutionary mechanisms underlying their data.

3.

Background

Metabolic disorders such as obesity and diabetes are diseases that develop gradually over time in an individual through the perturbation of genes. Systematic experiments tracking disease progression at the gene level are usually conducted, yielding temporal microarray data. Methods are needed to analyze such complex data and extract important proteins that could be involved in the temporal progression of the data and hence of the disease.

Results

In the present study, we considered temporal microarray data from an experiment conducted to study the development of obesity and diabetes in mice. We used these data along with an available protein-protein interaction network to find a network of interactions between proteins that reproduces the data at the next time point from the data at the previous time point. We show that the resulting network can be mined to identify critical nodes involved in the temporal progression of perturbations. We further show that published algorithms can be applied to such a connected network to mine important proteins, and that the outputs of the published algorithms overlap with those of our algorithms. The importance of the identified set of proteins was supported by the literature and was further validated by comparison with the positive genes dataset from the OMIM database, which showed significant overlap.

Conclusions

The critical proteins identified by the algorithms can be hypothesized to play an important role in the temporal progression of the data.

4.

Background

Meaningful exchange of microarray data is currently difficult because it is rare that published data provide sufficient information depth or are even in the same format from one publication to another. Only when data can be easily exchanged will the entire biological community be able to derive the full benefit from such microarray studies.

Results

To this end we have developed three key ingredients towards standardizing the storage and exchange of microarray data. First, we have created a minimal information for the annotation of a microarray experiment (MIAME)-compliant conceptualization of microarray experiments modeled using the unified modeling language (UML) named MAGE-OM (microarray gene expression object model). Second, we have translated MAGE-OM into an XML-based data format, MAGE-ML, to facilitate the exchange of data. Third, some of us are now using MAGE (or its progenitors) in data production settings. Finally, we have developed a freely available software tool kit (MAGE-STK) that eases the integration of MAGE-ML into end users' systems.

Conclusions

MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.

5.

Background

Many common diseases arise from an interaction between environmental and genetic factors. Our knowledge of gene-environment interactions is growing, but frameworks for linking gene-environment interactions to disease using preexisting, publicly available data have been lacking. Integrating freely available environment-gene interaction and disease phenotype data would allow hypothesis generation for potential environmental associations with disease.

Methods

We integrated publicly available disease-specific gene expression microarray data and curated chemical-gene interaction data to systematically predict environmental chemicals associated with disease. We derived chemical-gene signatures for 1,338 chemical/environmental chemicals from the Comparative Toxicogenomics Database (CTD). We associated these chemical-gene signatures with differentially expressed genes from datasets found in the Gene Expression Omnibus (GEO) through an enrichment test.
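The abstract does not say which enrichment test was used. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for gene-set overlap), the function below computes the probability of observing at least a given overlap between a chemical-gene signature and a list of differentially expressed genes. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, signature: int, de_genes: int, overlap: int) -> float:
    """Upper-tail probability P(X >= overlap).

    X is the number of signature genes found when `de_genes` genes are drawn
    without replacement from `universe` genes, `signature` of which belong
    to the chemical-gene signature.
    """
    max_overlap = min(signature, de_genes)
    total = comb(universe, de_genes)
    tail = sum(
        comb(signature, k) * comb(universe - signature, de_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes measured, 150 in the chemical signature,
# 400 differentially expressed, 12 shared between the two lists.
p_value = hypergeom_enrichment_p(universe=20_000, signature=150, de_genes=400, overlap=12)
```

With 1,338 signatures tested against each dataset, the resulting p-values would normally need a multiple-testing correction before interpretation.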

Results

We were able to verify our analytic method by accurately identifying chemicals applied to samples and cell lines. Furthermore, we were able to predict known and novel environmental associations with prostate, lung, and breast cancers, such as estradiol and bisphenol A.

Conclusions

We have developed a scalable and statistical method to identify possible environmental associations with disease using publicly available data and have validated some of the associations in the literature.

7.

Introduction

Untargeted metabolomics is a powerful tool for biological discoveries. To analyze the complex raw data, significant advances in computational approaches have been made, yet it is not clear how exhaustive and reliable the data analysis results are.

Objectives

Assessment of the quality of raw data processing in untargeted metabolomics.

Methods

Five published untargeted metabolomics studies were reanalyzed.

Results

Omissions of at least 50 relevant compounds from the original results as well as examples of representative mistakes were reported for each study.

Conclusion

Incomplete raw data processing shows unexplored potential of current and legacy data.

8.

Background and Aims

The green algal class Chlorophyceae comprises five orders (Chlamydomonadales, Sphaeropleales, Chaetophorales, Chaetopeltidales and Oedogoniales). Attempts to resolve the relationships among these groups have met with limited success. Studies of single genes (18S rRNA, 26S rRNA, rbcL or atpB) have largely failed to unambiguously resolve the relative positions of Oedogoniales, Chaetophorales and Chaetopeltidales (the OCC taxa). In contrast, recent genomics analyses of plastid data from OCC exemplars provided a robust phylogenetic analysis that supports a monophyletic OCC alliance.

Methods

An ITS2 data set was assembled to independently test the OCC hypothesis and to evaluate the performance of these data in assessing green algal phylogeny at the ordinal or class level. Sequence-structure analysis designed for use with ITS2 data was employed for phylogenetic reconstruction.

Key Results

Results of this study yielded trees that were, in general, topologically congruent with the results from the genomic analyses, including support for the monophyly of the OCC alliance.

Conclusions

Not all nodes from the ITS2 analyses exhibited robust support, but our investigation demonstrates that sequence-structure analyses of ITS2 provide a taxon-rich means of testing phylogenetic hypotheses at high taxonomic levels. Thus, the ITS2 data, in the context of sequence-structure analysis, provide an economical supplement or alternative to the single-marker approaches used in green algal phylogeny.

9.

Background

The variability in the clinical phenotype of Parkinson’s disease seems to suggest the existence of several subtypes of the disease. To test this hypothesis we performed a cluster analysis using data assessing both motor and non-motor symptoms in a large cohort of newly diagnosed untreated PD patients.

Methods

We collected data on demographic, motor, and the whole complex of non-motor symptoms from 100 consecutive newly diagnosed untreated outpatients. Statistical cluster analysis allowed the identification of different subgroups, which have been subsequently explored.

Results

The data-driven approach identified four distinct groups of patients, which we have labeled: 1) Benign Pure Motor; 2) Benign mixed Motor-Non-Motor; 3) Non-Motor Dominant; and 4) Motor Dominant.

Conclusion

Our results confirmed the existence of different subgroups of early PD patients. Cluster analysis revealed the presence of distinct subtypes of patients profiled according to the relevance of both motor and non-motor symptoms. Identification of such subtypes may have important implications for generating pathogenetic hypotheses and therapeutic strategies.

10.

Background

Researchers working in public health are confronted with large volumes of data on various aspects of entomology and epidemiology. Extracting the relevant information from these data requires a dedicated database management system. In this paper, we describe the use of the database we developed on lymphatic filariasis.

Methods

This database application was developed using the Model View Controller (MVC) architecture, with MySQL as the database and a web-based interface. We collected data on filariasis from the Karimnagar, Chittoor, and East and West Godavari districts of Andhra Pradesh, India, and incorporated them into the database.

Conclusion

This database stores the collected data, retrieves information, and produces various combinational reports on filarial aspects, which in turn will help public health officials understand the burden of disease in a particular locality. This information is likely to play an important role in decision making for effective control of filarial disease and integrated vector management operations.

11.

Background

Standardized schemas, databases, and public data repositories are needed for the studies of malaria vectors that encompass a remarkably diverse array of designs and rapidly generate large data volumes, often in resource-limited tropical settings lacking specialized software or informatics support.

Results

Data from the majority of mosquito studies conformed to a generic schema, with data collection forms recording the experimental design, sorting of collections, details of sample pooling or subdivision, and additional observations. Generically applicable forms with standardized attribute definitions enabled rigorous, consistent data and sample management with generic software and minimal expertise. Use of the forms now spans 20 experiments, 8 projects, and 15 users at 3 research and control institutes in 3 African countries, resulting in 11 peer-reviewed publications.

Conclusion

We have designed a generic data schema that can be used to develop paper-based or electronic data collection forms, depending on the availability of resources. We have developed paper-based data collection forms that can be used to collect data from the majority of entomological studies across multiple study areas using standardized data formats. Data recorded on these forms can be entered into and linked with any relational database software. These informatics tools are recommended because they save medical entomologists time, improve data quality, and ensure that data collected and shared across multiple studies are in standardized formats, thereby increasing research output.

12.

Context

A better understanding of “patient pathway” thanks to data analysis can lead to better treatments for patients. The ClinMine project, supported by the French National Research Agency (ANR), aims at proposing, from various case studies, algorithmic and statistical models able to handle this type of pathway data, focusing primarily on hospital data.

Methods

This article presents two of these case studies, focusing on the integration of temporal data into the analysis. The first examines the hypothesis that some aspects of the patient pathway can be described, or even predicted, from the hospital's management of medical mail. To this end, a specific functional data analysis was conducted, and several types of patients were detected. The second case study deals with the detection of profiles through biclustering of the patients. The difficulty of simultaneously handling heterogeneous data, including temporal data, is discussed and a method is proposed.

Results

Experiments were conducted on real data from a hospital. Results on these data show the effectiveness of the two proposed methods.

Conclusion

The ClinMine project aimed to handle hospital data in order to provide a better understanding of the “patient pathway”. The two methods proposed here can deal simultaneously with heterogeneous data, including temporal aspects, and provide information for understanding the “patient pathway” (identification of interesting clusters of patients).

13.

Background

Calls have been made for increased access to individual participant data (IPD) from clinical trials, to ensure that complete evidence is available. However, despite the obvious benefits, progress towards this is frustratingly slow. In the meantime, many systematic reviews have already collected IPD from clinical trials. We propose that a central repository for these IPD should be established to ensure that these datasets are safeguarded and made available for use by others, building on the strengths and advantages of the collaborative groups that have been brought together in developing the datasets.

Objective

Evaluate the level of support, and identify major issues, for establishing a central repository of IPD.

Design

On-line survey with email reminders.

Participants

71 reviewers affiliated with the Cochrane Collaboration's IPD Meta-analysis Methods Group were invited to participate.

Results

30 (42%) invitees responded: 28 (93%) had been involved in an IPD review and 24 (80%) had been involved in a randomised trial. 25 (83%) agreed that a central repository was a good idea and 25 (83%) agreed that they would provide their IPD for central storage. Several benefits of a central repository were noted: safeguarding and standardisation of data, increased efficiency of IPD meta-analyses, knowledge advancement, and facilitating future clinical, and methodological research. The main concerns were gaining permission from trial data owners, uncertainty about the purpose of the repository, potential resource implications, and increased workload for IPD reviewers. Restricted access requiring approval, data security, anonymisation of data, and oversight committees were highlighted as issues under governance of the repository.

Conclusion

There is support in this community of IPD reviewers, many of whom are also involved in clinical trials, for storing IPD in a central repository. Results from this survey are informing further work by our group, currently underway, on developing a repository of IPD.

14.

Background

In recent years, both single-nucleotide polymorphism (SNP) array and functional magnetic resonance imaging (fMRI) have been widely used for the study of schizophrenia (SCZ). In addition, a few studies have been reported integrating both SNPs data and fMRI data for comprehensive analysis.

Methods

In this study, a novel sparse representation based variable selection (SRVS) method was proposed and tested on a simulation data set to demonstrate its multi-resolution properties. The SRVS method was then applied to an integrative analysis of two different SCZ data sets, a single-nucleotide polymorphism (SNP) data set and a functional magnetic resonance imaging (fMRI) data set, including 92 cases and 116 controls. Biomarkers for the disease were identified and validated with a multivariate classification approach followed by leave-one-out (LOO) cross-validation. We then compared the results with those of a previously reported sparse representation based feature selection method.
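The validation step can be illustrated with a generic leave-one-out loop. This is a sketch under stated assumptions: the nearest-centroid classifier is a simple stand-in chosen for brevity (it is not SRVS, nor the authors' multivariate classifier), and the toy features and labels are invented.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose feature-wise mean is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data (hypothetical): two selected features per subject, two classes.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = ["control", "control", "control", "case", "case", "case"]
accuracy = leave_one_out_accuracy(features, labels)
```

In practice, any feature selection would usually be repeated inside each fold, so that the held-out subject never influences which biomarkers are chosen.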

Results

Results showed that biomarkers from our proposed SRVS method gave significantly higher classification accuracy in discriminating SCZ patients from healthy controls than those from the previously reported sparse representation method. Furthermore, using biomarkers from both data sets led to better classification accuracy than using a single type of biomarker, which suggests the advantage of integrative analysis of different types of data.

Conclusions

The proposed SRVS algorithm is effective in identifying significant biomarkers for complex diseases such as SCZ. Integrating different types of data (e.g. SNP and fMRI data) may identify complementary biomarkers that improve the diagnostic accuracy for the disease.

15.

Background  

Recent advances in automation technologies have enabled the use of flow cytometry for high throughput screening, generating large complex data sets often in clinical trials or drug discovery settings. However, data management and data analysis methods have not advanced sufficiently far from the initial small-scale studies to support modeling in the presence of multiple covariates.

17.

Background

Over the past several decades the efforts to improve maternal survival and the consequent demand for accurate estimates of maternal mortality have increased. However, measuring maternal mortality remains a difficult task especially in developing countries with weak information systems. Sibling histories included in household surveys (most notably the Demographic and Health Surveys (DHS)) have emerged as an important source of maternal mortality data. Data have been mainly collected from women and have not been widely collected from men due to concerns about data quality. We assess data quality of histories obtained from men and the potential to improve the efficiency of surveys measuring maternal mortality by collecting such data.

Methods and Findings

We used data from 10 Demographic and Health Surveys (DHS) that have included a full sibling history in both their women’s and men’s questionnaires. We estimated adult and maternal mortality indicators from histories obtained from men and women. We assessed the completeness and accuracy of these histories using several indicators of data quality. Our study finds that mortality estimates based on sibling histories obtained from men do not systematically or significantly differ from those obtained from women. Quality indicators were similar when comparing data from men and women. Pooling data obtained from men and women produced narrower confidence intervals.

Conclusion

From experience across nine developing countries, sibling history data obtained from men appear to be a reliable source of information on adult and maternal mortality. Given that there are no significant differences between mortality estimates based on data obtained from men and women, data can be pooled to increase efficiency. This finding improves the feasibility for countries to generate robust empirical estimates of adult and maternal mortality from surveys. Further, we recommend that male sibling histories be collected from all sample households rather than from a subsample.

18.

Background

Data from biological samples and medical evaluations plays an essential part in clinical decision making. This data is equally important in clinical studies and it is critical to have an infrastructure that ensures that its quality is preserved throughout its entire lifetime. We are running a 5-year longitudinal clinical study, KOL-Örestad, with the objective to identify new COPD (Chronic Obstructive Pulmonary Disease) biomarkers in blood. In the study, clinical data and blood samples are collected from both private and public health-care institutions and stored at our research center in databases and biobanks, respectively. The blood is analyzed by Mass Spectrometry and the results from this analysis then linked to the clinical data.

Method

We built an infrastructure that allows us to efficiently collect and analyze the data. We chose REDCap as the EDC (Electronic Data Capture) tool for the study because of its short setup time, ease of use, and flexibility. REDCap allows users to easily design data collection modules based on existing templates. In addition, it provides two ways to import batches of data: through a web API (Application Programming Interface) and by uploading CSV (Comma Separated Values) files.

Results

We created a software tool, DART (Data Rapid Translation), that translates our biomarker data into a format that fits REDCap's CSV templates. DART can also be configured to work with many other data formats. We use DART to import our clinical chemistry data into the REDCap database.
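The kind of translation described here can be sketched as a column-mapping step. This is an illustrative guess at the general idea, not DART's actual implementation: the function name, the source column names, and the target field names are all hypothetical, and a real REDCap import template defines its own field names.

```python
import csv
import io

# Hypothetical mapping from source column names to import-template field names.
FIELD_MAP = {"PatientID": "record_id", "CRP (mg/L)": "crp_mg_l", "Sampled": "sample_date"}


def translate_csv(source_text: str, field_map: dict[str, str]) -> str:
    """Rename mapped columns, drop unmapped ones, and return template-shaped CSV text."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(field_map.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Strip stray whitespace so values match the template's expected format.
        writer.writerow({target: row[source].strip() for source, target in field_map.items()})
    return out.getvalue()


# Made-up laboratory export with one extra column that the template does not use.
source = "PatientID,CRP (mg/L),Sampled,Operator\n001, 4.2 ,2013-05-02,ab\n"
translated = translate_csv(source, FIELD_MAP)
```

Keeping the mapping in a plain dictionary is one way such a tool could be made configurable for other data formats, since only the mapping changes between sources.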

Conclusion

We have shown that a powerful and internationally adopted EDC tool such as REDCap can be extended so that it can be used efficiently in proteomic studies. In our study, we accomplish this by using DART to translate our clinical chemistry data to a format that fits the templates of REDCap.
