Similar articles
 Found 20 similar articles (search time: 281 ms)
1.
Molecular and functional profiling of cancer cell lines is subject to laboratory‐specific experimental practices and data analysis protocols. The current challenge therefore is how to make an integrated use of the omics profiles of cancer cell lines for reliable biological discoveries. Here, we carried out a systematic analysis of nine types of data modalities using meta‐analysis of 53 omics studies across 12 research laboratories for 2,018 cell lines. To account for a relatively low consistency observed for certain data modalities, we developed a robust data integration approach that identifies reproducible signals shared among multiple data modalities and studies. We demonstrated the power of the integrative analyses by identifying a novel driver gene, ECHDC1, with tumor suppressive role validated both in breast cancer cells and patient tumors. The multi‐modal meta‐analysis approach also identified synthetic lethal partners of cancer drivers, including a co‐dependency of PTEN deficient endometrial cancer cells on RNA helicases.

2.
3.
Algorithms for active module identification (AMI) are central to analysis of omics data. Such algorithms receive a gene network and nodes' activity scores as input and report subnetworks that show significant over‐representation of accrued activity signal (“active modules”), thus representing biological processes that presumably play key roles in the analyzed conditions. Here, we systematically evaluated six popular AMI methods on gene expression and GWAS data. We observed that GO terms enriched in modules detected on the real data were often also enriched on modules found on randomly permuted data. This indicated that AMI methods frequently report modules that are not specific to the biological context measured by the analyzed omics dataset. To tackle this bias, we designed a permutation‐based method that empirically evaluates GO terms reported by AMI methods. We used the method to fashion five novel AMI performance criteria. Last, we developed DOMINO, a novel AMI algorithm, that outperformed the other six algorithms in extensive testing on GE and GWAS data. Software is available at https://github.com/Shamir‐Lab.
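
The permutation idea in the abstract above can be illustrated with a generic empirical p-value. This is only a minimal sketch, not the authors' method or DOMINO code: it assumes that an enrichment score for one GO term has already been computed on the real data and on each permuted dataset, and the function name `empirical_p_value` is hypothetical.

```python
def empirical_p_value(observed_score, permuted_scores):
    """Share of permuted scores at least as large as the observed score.

    The +1 in numerator and denominator is the usual correction that
    keeps the estimate away from exactly zero.
    """
    at_least_as_large = sum(1 for s in permuted_scores if s >= observed_score)
    return (at_least_as_large + 1) / (len(permuted_scores) + 1)


# Toy numbers (assumed, not from the paper): one term's score on real
# data versus its scores on four permuted datasets.
specific_term = empirical_p_value(5.0, [1.0, 2.0, 3.0, 4.0])
unspecific_term = empirical_p_value(2.0, [1.0, 2.0, 3.0, 4.0])
```

A term whose score on permuted data is often as high as on the real data gets a large empirical p-value, which matches the abstract's notion of a term that is not specific to the measured biological context.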

4.
5.

Motivation

In mass spectrometry-based proteomics, XML formats such as mzML and mzXML provide an open and standardized way to store and exchange the raw data (spectra and chromatograms) of mass spectrometric experiments. These file formats are being used by a multitude of open-source and cross-platform tools which allow the proteomics community to access algorithms in a vendor-independent fashion and perform transparent and reproducible data analysis. Recent improvements in mass spectrometry instrumentation have increased the data size produced in a single LC-MS/MS measurement and put substantial strain on open-source tools, particularly those that are not equipped to deal with XML data files that reach dozens of gigabytes in size.

Results

Here we present a fast and versatile parsing library for mass spectrometric XML formats available in C++ and Python, based on the mature OpenMS software framework. Our library implements an API for obtaining spectra and chromatograms under memory constraints using random access or sequential access functions, allowing users to process datasets that are much larger than system memory. For fast access to the raw data structures, small XML files can also be completely loaded into memory. In addition, we have improved the parsing speed of the core mzML module by over 4-fold (compared to OpenMS 1.11), making our library suitable for a wide variety of algorithms that need fast access to dozens of gigabytes of raw mass spectrometric data.

Availability

Our C++ and Python implementations are available for the Linux, Mac, and Windows operating systems. All proposed modifications to the OpenMS code have been merged into the OpenMS mainline codebase and are available to the community at https://github.com/OpenMS/OpenMS.
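
The sequential-access idea described in the Results above (processing spectra without loading the whole file) can be sketched with the Python standard library alone. This is not the OpenMS or pyOpenMS API; `iter_spectrum_ids` is a hypothetical helper, and `SAMPLE` is a toy fragment that only mimics the element names of mzML rather than a schema-valid file.

```python
import io
import xml.etree.ElementTree as ET


def iter_spectrum_ids(source):
    """Yield the id of each <spectrum> element, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Elements are namespaced, so compare on the local tag name only.
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            yield elem.get("id")
            # Drop the element's children and text once it has been handled;
            # the emptied element itself stays attached to its parent.
            elem.clear()


# Toy input (assumed for illustration only).
SAMPLE = b"""<?xml version="1.0"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="r1">
    <spectrumList count="2">
      <spectrum id="scan=1" index="0"/>
      <spectrum id="scan=2" index="1"/>
    </spectrumList>
  </run>
</mzML>"""

ids = list(iter_spectrum_ids(io.BytesIO(SAMPLE)))
```

Because the generator handles one element per step, peak data of earlier spectra need not stay in memory. Random access, which the abstract also mentions, would additionally need an index of byte offsets and is not shown here.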

6.
  1. Metadata plays an essential role in the long‐term preservation, reuse, and interoperability of data. Nevertheless, creating useful metadata can be sufficiently difficult and weakly enough incentivized that many datasets may be accompanied by little or no metadata. One key challenge is, therefore, how to make metadata creation easier and more valuable. We present a solution that involves creating domain‐specific metadata schemes that are as complex as necessary and as simple as possible. These goals are achieved by co‐development between a metadata expert and the researchers (i.e., the data creators). The final product is a bespoke metadata scheme into which researchers can enter information (and validate it) via the simplest of interfaces: a web browser application and a spreadsheet.
  2. We provide the R package dmdScheme (dmdScheme: An R package for working with domain specific MetaData schemes (Version v0.9.22), 2019) for creating a template domain‐specific scheme. We describe how to create a domain‐specific scheme from this template, including the iterative co‐development process, and the simple methods for using the scheme, and simple methods for quality assessment, improvement, and validation.
  3. The process of developing a metadata scheme following the outlined approach was successful, resulting in a metadata scheme which is used for the data generated in our research group. The validation quickly identifies forgotten metadata, as well as inconsistent metadata, therefore improving the quality of the metadata. Multiple output formats are available, including XML.
  4. Making the provision of metadata easier while also ensuring high quality must be a priority for data curation initiatives. We show how both objectives are achieved by close collaboration between metadata experts and researchers to create domain‐specific schemes. A near‐future priority is to provide methods to interface domain‐specific schemes with general metadata schemes, such as the Ecological Metadata Language, to increase interoperability.

The article describes a methodology to develop, enter, and validate domain specific metadata schemes which is suitable to be used by nonmetadata specialists. The approach uses an R package which forms the backend of the processing of the metadata, uses spreadsheets to enter the metadata, and provides a server based approach to distribute and use the developed metadata schemes.

7.
Type 2 diabetes mellitus (T2DM) is an independent risk factor of Alzheimer's disease (AD). Therefore, identifying periphery biomarkers correlated with mild cognitive impairment (MCI) is of importance for early diagnosis of AD. Here, we performed platelet proteomics in T2DM patients with MCI (T2DM‐MCI) and without MCI (T2DM‐nMCI). Pearson analysis of the omics data with MMSE (mini‐mental state examination), Aβ1‐42/Aβ1‐40 (β‐amyloid), and rGSK‐3β(T/S9) (total to Serine‐9‐phosphorylated glycogen synthase kinase‐3β) revealed that mitophagy/autophagy‐, insulin signaling‐, and glycolysis/gluconeogenesis pathways‐related proteins were most significantly involved. Among them, only the increase of optineurin, an autophagy‐related protein, was simultaneously correlated with the reduced MMSE score, and the increased Aβ1‐42/Aβ1‐40 and rGSK‐3β(T/S9), and the optineurin alone could discriminate T2DM‐MCI from T2DM‐nMCI. Combination of the elevated platelet optineurin and rGSK‐3β(T/S9) enhanced the MCI‐discriminating efficiency with AUC of 0.927, specificity of 86.7%, sensitivity of 85.3%, and accuracy of 0.859, which is promising for predicting cognitive decline in T2DM patients.

8.
9.
Accurate measurements of cellular protein concentrations are invaluable to quantitative studies of gene expression and physiology in living cells. Here, we developed a versatile mass spectrometric workflow based on data‐independent acquisition proteomics (DIA/SWATH) together with a novel protein inference algorithm (xTop). We used this workflow to accurately quantify absolute protein abundances in Escherichia coli for > 2,000 proteins over > 60 growth conditions, including nutrient limitations, non‐metabolic stresses, and non‐planktonic states. The resulting high‐quality dataset of protein mass fractions allowed us to characterize proteome responses from a coarse (groups of related proteins) to a fine (individual) protein level. Hereby, a plethora of novel biological findings could be elucidated, including the generic upregulation of low‐abundant proteins under various metabolic limitations, the non‐specificity of catabolic enzymes upregulated under carbon limitation, the lack of large‐scale proteome reallocation under stress compared to nutrient limitations, as well as surprising strain‐dependent effects important for biofilm formation. These results present valuable resources for the systems biology community and can be used for future multi‐omics studies of gene regulation and metabolic control in E. coli.

10.
11.
There is a need to store very large numbers of conventional human pluripotent stem cell (hPSC) lines for their off‐the‐shelf usage in stem cell therapy. Therefore, it is valuable to generate “universal” or “hypoimmunogenic” hPSCs with gene‐editing technology by knocking out or in immune‐related genes. A few universal or hypoimmunogenic hPSC lines should be enough to store for their off‐the‐shelf usage. Here, we overview and discuss how to prepare universal or hypoimmunogenic hPSCs and their disadvantages. β2‐Microglobulin‐knockout hPSCs did not harbour human leukocyte antigen (HLA)‐expressing class I cells but rather activated natural killer (NK) cells. To avoid NK cell and macrophage activities, homozygous hPSCs expressing a single allele of an HLA class I molecule, such as HLA‐C, were developed. Major HLA class I molecules were knocked out, and PD‐L1, HLA‐G and CD47 were knocked in hPSCs using CRISPR/Cas9 gene editing. These cells escaped activation of not only T cells but also NK cells and macrophages, generating universal hPSCs.

12.
13.
A growing food demand and advanced agricultural techniques increasingly affect farmland ecosystems, threatening invertebrate populations with cascading effects along the food chain upon insectivorous vertebrates. Supporting farmland biodiversity thus optimally requires the delineation of species hotspots at multiple trophic levels to prioritize conservation management. The goal of this study was to investigate the links between grassland management intensity and orthopteran density at the field scale and to upscale this information to the landscape in order to guide management action at landscape scale. More specifically, we investigated the relationships between grassland management intensity, floral indicator species, and orthopteran abundance in grasslands with different land use in the SW Swiss Alps. Field vegetation surveys of indicator plant species were used to generate a management intensity proxy, to which field assessments of orthopterans were related. Orthopteran abundance showed a hump‐shaped response to management intensity, with low values in intensified, nutrient‐rich grasslands and in nutrient‐poor, xeric grasslands, while it peaked in middle‐intensity grasslands. Combined with remote‐sensed data about grassland gross primary productivity, the above proxy was used to build landscape‐wide, spatially explicit projections of the potential distribution of orthopteran‐rich grasslands as possible foraging grounds for insectivorous vertebrates. This spatially explicit multitrophic approach enables the delineation of focal farmland areas in order to prioritize conservation action.

14.
15.
16.
In the past two decades, our ability to study cellular and molecular systems has been transformed through the development of omics sciences. While unlimited potential lies within massive omics datasets, the success of omics sciences to further our understanding of human disease and/or translating these findings to clinical utility remains elusive due to a number of factors. A significant limiting factor is the integration of different omics datasets (i.e., integromics) for extraction of biological and clinical insights. To this end, the National Cancer Institute (NCI) and the National Heart, Lung and Blood Institute (NHLBI) organized a joint workshop in June 2012 with the focus on integration issues related to multi-omics technologies that needed to be resolved in order to realize the full utility of integrating omics datasets by providing a glimpse into the disease as an integrated “system”. The overarching goals were to (1) identify challenges and roadblocks in omics integration, and (2) facilitate the full maturation of ‘integromics’ in biology and medicine. Participants reached a consensus on the most significant barriers for integrating omics sciences and provided recommendations on viable approaches to overcome each of these barriers within the areas of technology, bioinformatics and clinical medicine.

17.

Background  

Many proteomics initiatives require integration of all information with uniform criteria, from collection of samples and data display to publication of experimental results. Integrating and exchanging data of different formats and structures poses a great challenge. XML technology is promising for this task because of its simplicity and flexibility. Nasopharyngeal carcinoma (NPC) is one of the most common cancers in southern China and Southeast Asia, with marked geographic and racial differences in incidence. Although some cancer proteome databases now exist, there is still no NPC proteome database.

18.
While insulin‐like growth factor‐1 (IGF‐1) is a well‐established modulator of aging and longevity in model organisms, its role in humans has been controversial. In this study, we used the UK Biobank (n = 440,185) to resolve previous ambiguities in the relationship between serum IGF‐1 levels and clinical disease. We examined prospective associations of serum IGF‐1 with mortality, dementia, vascular disease, diabetes, osteoporosis, and cancer, finding two generalized patterns: First, IGF‐1 interacts with age to modify risk in a manner consistent with antagonistic pleiotropy; younger individuals with high IGF‐1 are protected from disease, while older individuals with high IGF‐1 are at increased risk for incident disease or death. Second, the association between IGF‐1 and risk is generally U‐shaped, indicating that both high and low levels of IGF‐1 may be detrimental. With the exception of a more uniformly positive relationship between IGF‐1 and cancer, these effects were remarkably consistent across a wide range of conditions, providing evidence for a unifying pathway that determines risk for most age‐associated diseases. These data suggest that IGF‐1 signaling could be harmful in older adults, who may actually benefit from the attenuation of biological growth pathways.

19.
Outbreaks of infectious viruses resulting from spillover events from bats have brought much attention to bat‐borne zoonoses, which has motivated increased ecological and epidemiological studies on bat populations. Field sampling methods often collect pooled samples of bat excreta from plastic sheets placed under‐roosts. However, positive bias is introduced because multiple individuals may contribute to pooled samples, making studies of viral dynamics difficult. Here, we explore the general issue of bias in spatial sample pooling using Hendra virus in Australian bats as a case study. We assessed the accuracy of different under‐roost sampling designs using generalized additive models and field data from individually captured bats and pooled urine samples. We then used theoretical simulation models of bat density and under‐roost sampling to understand the mechanistic drivers of bias. The most commonly used sampling design estimated viral prevalence 3.2 times higher than individual‐level data, with positive bias 5–7 times higher than other designs due to spatial autocorrelation among sampling sheets and clustering of bats in roosts. Simulation results indicate that using a stratified random design to collect 30–40 pooled urine samples from 80 to 100 sheets, each with an area of 0.75–1 m², would allow estimation of true prevalence with minimum sampling bias and false negatives. These results show that widely used under‐roost sampling techniques are highly sensitive to viral presence, but lack specificity, providing limited information regarding viral dynamics. Improved estimation of true prevalence can be attained with minor changes to existing designs such as reducing sheet size, increasing sheet number, and spreading sheets out within the roost area. Our findings provide insight into how spatial sample pooling is vulnerable to bias for a wide range of systems in disease ecology, where optimal sampling design is influenced by pathogen prevalence, host population density, and patterns of aggregation.
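
The positive bias from pooling described above can be illustrated with a simple probability calculation. This is a deliberately simplified sketch, not the study's simulation model: it assumes each bat is infected independently with the same prevalence, that a sheet collects urine from a fixed number of bats, and that the assay is perfect. Clustering and spatial autocorrelation, which the abstract names as drivers of bias, are ignored, and both function names are hypothetical.

```python
def pooled_positive_probability(prevalence, n_contributors):
    """Chance that a pooled sample contains at least one infected bat."""
    return 1.0 - (1.0 - prevalence) ** n_contributors


def individual_prevalence_from_pools(pooled_fraction, n_contributors):
    """Invert the relation above to recover individual-level prevalence."""
    return 1.0 - (1.0 - pooled_fraction) ** (1.0 / n_contributors)


# Assumed toy values: 10% individual prevalence, three bats per sheet.
pooled = pooled_positive_probability(0.10, 3)
recovered = individual_prevalence_from_pools(pooled, 3)
```

Under these assumptions, the share of positive sheets exceeds the individual-level prevalence whenever more than one bat contributes to a sheet, which is consistent with the abstract's suggestion that smaller sheets reduce bias.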

20.