Similar Literature
20 similar records found.
1.
2.
Node-Link diagrams make it possible to take a quick glance at how nodes (or actors) in a network are connected by edges (or ties). A conventional network diagram of a “contact tree” maps out a root and branches that represent the structure of nodes and edges, often without further specifying leaves or fruits that would have grown from small branches. By furnishing such a network structure with leaves and fruits, we reveal details about “contacts” in our ContactTrees upon which ties and relationships are constructed. Our elegant design employs a bottom-up approach that resembles a recent attempt to understand subjective well-being by means of a series of emotions. Such a bottom-up approach to social-network studies decomposes each tie into a series of interactions or contacts, which can help deepen our understanding of the complexity embedded in a network structure. Unlike previous network visualizations, ContactTrees highlight how relationships form and change based upon interactions among actors, as well as how relationships and networks vary by contact attributes. Based on a botanical tree metaphor, the design is easy to construct and the resulting tree-like visualization can display many properties at both tie and contact levels, thus recapturing a key ingredient missing from conventional techniques of network visualization. We demonstrate ContactTrees using data sets consisting of up to three waves of 3-month contact diaries over the 2004-2012 period, and discuss how this design can be applied to other types of datasets.  相似文献   

3.
We present a consensus classification of life to embrace the more than 1.6 million species already provided by more than 3,000 taxonomists’ expert opinions in a unified and coherent, hierarchically ranked system known as the Catalogue of Life (CoL). The intent of this collaborative effort is to provide a hierarchical classification serving not only the needs of the CoL’s database providers but also the diverse public-domain user community, most of whom are familiar with the Linnaean conceptual system of ordering taxon relationships. This classification is neither phylogenetic nor evolutionary but instead represents a consensus view that accommodates taxonomic choices and practical compromises among diverse expert opinions, public usages, and conflicting evidence about the boundaries between taxa and the ranks of major taxa, including kingdoms. Certain key issues, some not fully resolved, are addressed in particular. Beyond its immediate use as a management tool for the CoL and ITIS (Integrated Taxonomic Information System), it is immediately valuable as a reference for taxonomic and biodiversity research, as a tool for societal communication, and as a classificatory “backbone” for biodiversity databases, museum collections, libraries, and textbooks. Such a modern comprehensive hierarchy has not previously existed at this level of specificity.  相似文献   

4.
Data-Driven Method to Estimate Nonlinear Chemical Equivalence
There is a great need to express the impacts of chemicals found in the environment in terms of effects from alternative chemicals of interest. Methods currently employed in fields such as life-cycle assessment, risk assessment, mixtures toxicology, and pharmacology rely mostly on heuristic arguments to justify the use of linear relationships in the construction of “equivalency factors,” which aim to model these concentration-concentration correlations. However, the use of linear models, even at low concentrations, oversimplifies the nonlinear nature of the concentration-response curve, thereby introducing error into calculations involving these factors. We address this problem by reporting a method to determine a concentration-concentration relationship between two chemicals based on the full extent of experimentally derived concentration-response curves. Although this method can be easily generalized, we develop and illustrate it from the perspective of toxicology, in which we provide equations relating the sigmoid and non-monotone, or “biphasic,” responses typical of the field. The resulting concentration-concentration relationships are manifestly nonlinear for nearly any chemical level, even at the very low concentrations common to environmental measurements. We demonstrate the method using real-world examples of toxicological data that may exhibit sigmoid and biphasic mortality curves. Finally, we use our models to calculate equivalency factors, and show that traditional results are recovered only when the concentration-response curves are “parallel,” a condition that has been noted before but which we formalize here by providing mathematical conditions on the validity of this approach.
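To make the nonlinear equivalence idea concrete, the sketch below maps concentrations of one chemical to the concentrations of another that produce the same response, by inverting a sigmoid (Hill-type) concentration-response curve. This illustrates only the general principle, not the authors' equations: the two-parameter Hill form and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def hill_response(c, ec50, n):
    """Fraction of maximal response at concentration c for a Hill (sigmoid) curve."""
    return c**n / (ec50**n + c**n)

def equivalent_concentration(c_a, ec50_a, n_a, ec50_b, n_b):
    """Concentration of chemical B producing the same response as c_a of chemical A.

    Obtained by inverting B's Hill curve at the response level produced by A;
    the mapping is nonlinear in c_a unless the two curves are "parallel".
    """
    r = hill_response(c_a, ec50_a, n_a)
    return ec50_b * (r / (1.0 - r)) ** (1.0 / n_b)

# Hypothetical parameters for two chemicals (illustrative values only).
c_a = np.logspace(-3, 1, 9)                      # concentrations of chemical A
c_b_equiv = equivalent_concentration(c_a, ec50_a=0.5, n_a=1.2,
                                     ec50_b=2.0, n_b=2.5)
ratio = c_b_equiv / c_a                          # a linear "equivalency factor" would be constant
for ca, cb, fac in zip(c_a, c_b_equiv, ratio):
    print(f"A = {ca:8.4f}  ->  equivalent B = {cb:8.4f}  (ratio {fac:6.3f})")
```

When the two curves share the same Hill slope (n_a equal to n_b), the printed ratio becomes the constant ec50_b/ec50_a, recovering the linear equivalency factor that the abstract describes as valid only for parallel curves.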

5.
It is generally accepted that the number of neurons in a given brain area far exceeds the number of neurons needed to carry any specific function controlled by that area. For example, motor areas of the human brain contain tens of millions of neurons that control the activation of tens or at most hundreds of muscles. This massive redundancy implies the covariation of many neurons, which constrains the population activity to a low-dimensional manifold within the space of all possible patterns of neural activity. To gain a conceptual understanding of the complexity of the neural activity within a manifold, it is useful to estimate its dimensionality, which quantifies the number of degrees of freedom required to describe the observed population activity without significant information loss. While there are many algorithms for dimensionality estimation, we do not know which are well suited for analyzing neural activity. The objective of this study was to evaluate the efficacy of several representative algorithms for estimating the dimensionality of linearly and nonlinearly embedded data. We generated synthetic neural recordings with known intrinsic dimensionality and used them to test the algorithms’ accuracy and robustness. We emulated some of the important challenges associated with experimental data by adding noise, altering the nature of the embedding of the low-dimensional manifold within the high-dimensional recordings, varying the dimensionality of the manifold, and limiting the amount of available data. We demonstrated that linear algorithms overestimate the dimensionality of nonlinear, noise-free data. In cases of high noise, most algorithms overestimated the dimensionality. We thus developed a denoising algorithm based on deep learning, the “Joint Autoencoder”, which significantly improved subsequent dimensionality estimation. Critically, we found that all algorithms failed when the intrinsic dimensionality was high (above 20) or when the amount of data used for estimation was low. Based on the challenges we observed, we formulated a pipeline for estimating the dimensionality of experimental neural data.  相似文献   
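The failure mode reported for linear estimators can be reproduced in a few lines: a one-dimensional latent variable embedded nonlinearly in ten recorded dimensions already fools a PCA-based variance-threshold estimate, and noise inflates it further. This is a minimal synthetic sketch, not the authors' benchmark or their Joint Autoencoder; the embedding, noise level, and 95% threshold are arbitrary choices.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# A 1-D latent variable nonlinearly embedded in 10 "recorded" dimensions.
t = rng.uniform(-1, 1, size=(5000, 1))
latent = np.hstack([t, np.sin(3 * t), t**2])   # nonlinear features of the 1-D manifold
mixing = rng.normal(size=(3, 10))
data = latent @ mixing                          # 5000 x 10 noise-free "recordings"

def pca_dimensionality(x, variance_threshold=0.95):
    """Linear estimate: number of PCs needed to explain the given variance fraction."""
    pca = PCA().fit(x)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

print("true intrinsic dimensionality:", 1)
print("PCA estimate (noise-free, nonlinear embedding):", pca_dimensionality(data))

# Adding observation noise inflates the linear estimate further.
noisy = data + 0.5 * rng.normal(size=data.shape)
print("PCA estimate with noise:", pca_dimensionality(noisy))
```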

6.
Cancer is a genetic disease that develops through a series of somatic mutations, a subset of which drive cancer progression. Although cancer genome sequencing studies are beginning to reveal the mutational patterns of genes in various cancers, identifying the small subset of “causative” mutations from the large subset of “non-causative” mutations, which accumulate as a consequence of the disease, is a challenge. In this article, we present an effective machine learning approach for identifying cancer-associated mutations in human protein kinases, a class of signaling proteins known to be frequently mutated in human cancers. We evaluate the performance of 11 well known supervised learners and show that a multiple-classifier approach, which combines the performances of individual learners, significantly improves the classification of known cancer-associated mutations. We introduce several novel features related specifically to structural and functional characteristics of protein kinases and find that the level of conservation of the mutated residue at specific evolutionary depths is an important predictor of oncogenic effect. We consolidate the novel features and the multiple-classifier approach to prioritize and experimentally test a set of rare unconfirmed mutations in the epidermal growth factor receptor tyrosine kinase (EGFR). Our studies identify T725M and L861R as rare cancer-associated mutations inasmuch as these mutations increase EGFR activity in the absence of the activating EGF ligand in cell-based assays.  相似文献   
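The multiple-classifier idea, combining several supervised learners so that their pooled vote outperforms any single one, can be sketched with scikit-learn's soft-voting ensemble. The data below are synthetic placeholders for the per-mutation features described in the abstract (e.g., residue conservation at several evolutionary depths); the specific learners and settings are illustrative, not the eleven used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for per-mutation features such as conservation
# scores and structural context; the minority class plays the role of
# cancer-associated mutations.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)

learners = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("nb", GaussianNB()),
]

# Soft voting averages the predicted class probabilities of the individual learners.
ensemble = VotingClassifier(estimators=learners, voting="soft")

for name, clf in learners + [("ensemble", ensemble)]:
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:10s}  mean ROC AUC = {auc:.3f}")
```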

7.
The Chemical Master Equation (CME) is a cornerstone of stochastic analysis and simulation of models of biochemical reaction networks. Yet direct solutions of the CME have remained elusive. Although several approaches overcome the infinite-dimensional nature of the CME through projections or other means, a common feature of proposed approaches is their susceptibility to the curse of dimensionality, i.e., the exponential growth in memory and computational requirements with the number of problem dimensions. We present a novel approach that has the potential to “lift” this curse of dimensionality. The approach is based on the use of the recently proposed Quantized Tensor Train (QTT)-formatted numerical linear algebra for the low-parametric numerical representation of tensors. The QTT decomposition admits both algorithms for basic tensor arithmetic, with complexity scaling linearly in the dimension (number of species) and sub-linearly in the mode size (maximum copy number), and a numerical tensor-rounding procedure that is stable and quasi-optimal. We show how the CME can be represented in QTT format, then use the exponentially converging hp-discontinuous Galerkin discretization in time to reduce the CME evolution problem to a set of QTT-structured linear equations to be solved at each time step using an algorithm based on Density Matrix Renormalization Group (DMRG) methods from quantum chemistry. Our method automatically adapts the “basis” of the solution at every time step, guaranteeing that it is large enough to capture the dynamics of interest but no larger than necessary, as this would increase the computational complexity. Our approach is demonstrated by applying it to three different examples from systems biology: an independent birth-death process, an enzymatic futile cycle, and a stochastic switch model. The numerical results on these examples demonstrate that the proposed QTT method achieves dramatic speedups and several orders of magnitude storage savings over direct approaches.
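For readers unfamiliar with the object being solved, the sketch below writes down the CME generator for the simplest of the three examples, a birth-death process, truncates the state space, and propagates the probability vector with a dense matrix exponential. This is the kind of direct approach the paper improves upon, not the QTT method itself; the rate constants and truncation level are arbitrary.

```python
import numpy as np
from scipy.linalg import expm

# Chemical master equation for a birth-death process X -> X+1 (rate k) and
# X -> X-1 (rate gamma * x), truncated at a maximum copy number N.
k, gamma, N = 10.0, 1.0, 60

# Generator A such that dp/dt = A p, with p_x(t) = P(X(t) = x), x = 0..N.
A = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    if x < N:
        A[x + 1, x] += k          # birth: leave state x into x+1
        A[x, x] -= k
    if x > 0:
        A[x - 1, x] += gamma * x  # death: leave state x into x-1
        A[x, x] -= gamma * x

p0 = np.zeros(N + 1)
p0[0] = 1.0                       # start with zero molecules
p_t = expm(A * 2.0) @ p0          # probability distribution at t = 2

print("mean copy number at t=2:", np.dot(np.arange(N + 1), p_t))
print("stationary mean (k/gamma):", k / gamma)
# For d species the truncated state space has (N+1)**d states; this exponential
# growth is the curse of dimensionality the QTT representation is designed to avoid.
```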

8.
Increasing energy and housing demands are impacting wildlife populations throughout western North America. Greater sage-grouse (Centrocercus urophasianus), a species known for its sensitivity to landscape-scale disturbance, inhabits the same low-elevation sage-steppe in which much of this development is occurring. Wyoming has committed to maintaining sage-grouse populations through conservation easements and policy changes that conserve high-bird-abundance “core” habitat and encourage development in less sensitive landscapes. In this study, we built new predictive models of oil and gas, wind, and residential development and applied build-out scenarios to simulate future development and measure the efficacy of conservation actions for maintaining sage-grouse populations. Our approach predicts sage-grouse population losses averted through conservation action and quantifies the return on investment for different conservation strategies. We estimate that without conservation, sage-grouse populations in Wyoming will decrease under our long-term scenario by 14–29% (95% CI: 4–46%). However, a conservation strategy that includes the “core area” policy and $250 million in targeted easements could reduce these losses to 9–15% (95% CI: 3–32%), cutting anticipated losses by roughly half statewide and nearly two-thirds within sage-grouse core breeding areas. The core area policy is the single most important component, and targeted easements are complementary to the overall strategy. There is considerable uncertainty around the magnitude of our estimates; however, the relative benefit of different conservation scenarios remains comparable because potential biases and assumptions are consistently applied regardless of the strategy. There is early evidence, based on a 40% reduction in leased hectares inside core areas, that the Wyoming policy is reducing the potential for future fragmentation inside core areas. Our framework using build-out scenarios to anticipate species declines provides estimates that could be used by decision makers to determine whether expected population losses warrant ESA listing.

9.
Determinants of cooperation include ingroup vs. outgroup membership, and individual traits, such as prosociality and trust. We investigated whether these factors can be overridden by beliefs about people’s trust. We manipulated the information players received about each other’s level of general trust, “high” or “low”. These levels were either measured (Experiment 1) or just arbitrarily assigned labels (Experiment 2). Players’ choices whether to cooperate or defect in a stag hunt (or an assurance game)—where it is mutually beneficial to cooperate, but costly if the partner should fail to do so—were strongly predicted by what they were told about the other player’s trust label, as well as by what they were told that the other player was told about their own label. Our findings demonstrate the importance for cooperation in a risky coordination game of both first- and second-order beliefs about how much people trust each other. This supports the idea that institutions can influence cooperation simply by influencing beliefs.  相似文献   
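The strategic structure at stake, a stag hunt in which cooperating is mutually best but costly if the partner defects, can be made concrete with a small payoff table and the belief threshold above which cooperation maximizes expected payoff. The payoffs below are illustrative only, not those used in the experiments.

```python
# Stag hunt (assurance game): cooperating is mutually best, but risky if the
# partner defects. Payoffs for the row player are illustrative only.
payoff = {
    ("C", "C"): 4,  # both cooperate
    ("C", "D"): 0,  # I cooperate, partner defects: worst outcome
    ("D", "C"): 3,  # I defect while partner cooperates
    ("D", "D"): 3,  # both defect: safe but inferior
}

def expected_payoff(action, p_partner_cooperates):
    p = p_partner_cooperates
    return p * payoff[(action, "C")] + (1 - p) * payoff[(action, "D")]

# Cooperating is the better reply only if the belief that the partner will
# cooperate exceeds a threshold -- which is why information about the partner's
# "trust" label (and about what they were told about ours) can tip the decision.
for belief in [0.2, 0.5, 0.8]:
    ec, ed = expected_payoff("C", belief), expected_payoff("D", belief)
    best = "cooperate" if ec > ed else "defect"
    print(f"belief={belief:.1f}: E[C]={ec:.2f}, E[D]={ed:.2f} -> {best}")
```

With these numbers the threshold belief is 0.75, so only the most optimistic player cooperates; changing either player's beliefs about the other's trust shifts which side of the threshold they fall on.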

10.
11.
High Throughput Biological Data (HTBD) require detailed analysis methods, and from a life-science perspective these analysis results make the most sense when interpreted within the context of biological pathways. Bayesian Networks (BNs) capture both linear and nonlinear interactions and handle stochastic events in a probabilistic framework that accounts for noise, making them viable candidates for HTBD analysis. We have recently proposed an approach, called Bayesian Pathway Analysis (BPA), for analyzing HTBD using BNs in which known biological pathways are modeled as BNs and the pathways that best explain the given HTBD are found. BPA uses fold-change information to obtain an input matrix with which to score each pathway modeled as a BN. Scoring is achieved using the Bayesian-Dirichlet Equivalent method, and significance is assessed by randomization via bootstrapping of the columns of the input matrix. In this study, we improve on the BPA system by optimizing the steps involved in “Data Preprocessing and Discretization”, “Scoring”, “Significance Assessment”, and “Software and Web Application”. We tested the improved system on synthetic data sets and achieved over 98% accuracy in identifying the active pathways. The overall approach was applied to real cancer microarray data sets in order to investigate the pathways that are commonly active in different cancer types. We compared our findings on the real data sets with those of a relevant approach called Signaling Pathway Impact Analysis (SPIA).
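Of the steps listed, the significance assessment is the easiest to sketch: compare a pathway's score on the real input matrix against scores obtained after bootstrapping its columns, which breaks the coordinated structure the score rewards. The scoring function below is a stand-in for the Bayesian-Dirichlet Equivalent score, and the per-gene column resampling is one plausible reading of the randomization scheme; both are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pathway_score(data):
    """Placeholder for the Bayesian-Dirichlet Equivalent (BDe) network score.

    Here it simply rewards coordinated behaviour of the pathway's genes; the
    real BPA system scores a Bayesian network built from the pathway structure.
    """
    return np.abs(np.corrcoef(data)).mean()

def significance_by_column_randomization(data, n_boot=1000):
    """Empirical p-value: fraction of column-bootstrapped datasets scoring as high."""
    observed = pathway_score(data)
    n_samples = data.shape[1]
    null_scores = np.empty(n_boot)
    for b in range(n_boot):
        # Resample columns (samples) with replacement, independently per gene,
        # which destroys the gene-gene dependence structure.
        shuffled = np.stack([row[rng.integers(0, n_samples, n_samples)] for row in data])
        null_scores[b] = pathway_score(shuffled)
    return observed, (null_scores >= observed).mean()

# Toy fold-change matrix: 5 pathway genes x 30 samples, with the first three
# genes deliberately correlated so the pathway looks "active".
base = rng.normal(size=30)
data = np.vstack([base + 0.3 * rng.normal(size=30) for _ in range(3)] +
                 [rng.normal(size=30) for _ in range(2)])
score, p = significance_by_column_randomization(data)
print(f"observed score = {score:.3f}, empirical p-value = {p:.3f}")
```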

12.
The omnipresent need for optimisation requires constant improvement of companies’ business processes (BPs). The risk of implementing an inappropriate BP is usually minimised by simulating the newly developed BP under various initial conditions and “what-if” scenarios. Effective business process simulation software (BPSS) is a prerequisite for accurate analysis of a BP. Characterising a BPSS tool is a challenging task because the selection criteria are complex, including the quality of visual aspects, simulation capabilities, statistical facilities, quality of reporting, etc. Under such circumstances, making an optimal decision is difficult, so various decision support models are employed to aid BPSS tool selection. The currently established decision support models are either proprietary or comprise only a limited subset of criteria, which affects their accuracy. Addressing this issue, this paper proposes a new hierarchical decision support model for ranking BPSS tools based on their technical characteristics, employing the DEX and qualitative-to-quantitative (QQ) methodologies. The decision expert can thus supply the required information in a systematic and user-friendly manner. The proposed approach makes three significant contributions. First, the proposed hierarchical model is easily extended by adding new criteria to the hierarchical structure. Second, a fully operational decision support system (DSS) tool that implements the proposed hierarchical model is presented. Finally, the effectiveness of the proposed hierarchical model is assessed by comparing the resulting rankings of BPSS tools with currently available results.
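A hierarchical decision model of this kind can be pictured as aggregation of leaf-level grades up a criteria tree. The sketch below is a deliberately simplified, purely numeric stand-in: the criteria, weights, grades, and tool names are invented, and the actual model uses DEX's qualitative aggregation rules together with the QQ (qualitative-to-quantitative) method rather than fixed weights.

```python
# Toy hierarchical scoring of business process simulation software (BPSS) tools.
# All criteria, weights, and grades are illustrative placeholders.
hierarchy = {
    "visual aspects": {"weight": 0.3,
                       "children": {"diagram quality": 0.6, "animation": 0.4}},
    "simulation capabilities": {"weight": 0.4,
                                "children": {"stochastic modelling": 0.5,
                                             "what-if scenarios": 0.5}},
    "reporting": {"weight": 0.3,
                  "children": {"statistical facilities": 0.5, "report quality": 0.5}},
}

# Leaf grades on a 1 (poor) .. 5 (excellent) scale for two hypothetical tools.
grades = {
    "Tool A": {"diagram quality": 4, "animation": 3, "stochastic modelling": 5,
               "what-if scenarios": 4, "statistical facilities": 3, "report quality": 4},
    "Tool B": {"diagram quality": 5, "animation": 4, "stochastic modelling": 3,
               "what-if scenarios": 3, "statistical facilities": 4, "report quality": 3},
}

def score(tool_grades):
    """Aggregate leaf grades bottom-up into a single numeric ranking score."""
    total = 0.0
    for branch in hierarchy.values():
        branch_score = sum(w * tool_grades[leaf] for leaf, w in branch["children"].items())
        total += branch["weight"] * branch_score
    return total

for tool, g in grades.items():
    print(f"{tool}: {score(g):.2f}")
print("ranking:", sorted(grades, key=lambda t: score(grades[t]), reverse=True))
```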

13.
Intravaginal rings (IVRs) are currently under investigation as devices for the delivery of agents to protect against the sexual transmission of HIV and STIs, as well as pregnancy. To assist product developers in creating highly acceptable rings, we sought to identify characteristics that intravaginal ring users consider when making decisions about ring use or non-use. We conducted four semi-structured focus groups with 21 women (aged 18–45) who reported using an IVR in the past 12 months. Participants manipulated four prototype rings in their hands, discussed ring materials, dimensionality, and “behavior,” and shared perceptions and appraisals. Five salient ring characteristics were identified: 1) appearance of the rings’ surfaces, 2) tactile sensations of the cylinder material, 3) materials properties, 4) diameter of the cylinder, and 5) ring circumference. Pliability (or flexibility) was generally considered the most important mechanical property. Several ring properties (e.g., porousness, dimensionality) were associated with perceptions of efficacy. Women also revealed user behaviors that may impact the effectiveness of certain drugs, such as removing, rinsing and re-inserting the ring while bathing, and removing the ring during sexual encounters. As product developers explore IVRs as prevention delivery systems, it is critical to balance product materials and dimensions with use parameters to optimize drug delivery and the user experience. It is also critical to consider how user behaviors (e.g., removing the ring) might impact drug delivery.  相似文献   

14.
Stimulus dimensionality-reduction methods in neuroscience seek to identify a low-dimensional space of stimulus features that affect a neuron’s probability of spiking. One popular method, known as maximally informative dimensions (MID), uses an information-theoretic quantity known as “single-spike information” to identify this space. Here we examine MID from a model-based perspective. We show that MID is a maximum-likelihood estimator for the parameters of a linear-nonlinear-Poisson (LNP) model, and that the empirical single-spike information corresponds to the normalized log-likelihood under a Poisson model. This equivalence implies that MID does not necessarily find maximally informative stimulus dimensions when spiking is not well described as Poisson. We provide several examples to illustrate this shortcoming, and derive a lower bound on the information lost when spiking is Bernoulli in discrete time bins. To overcome this limitation, we introduce model-based dimensionality reduction methods for neurons with non-Poisson firing statistics, and show that they can be framed equivalently in likelihood-based or information-theoretic terms. Finally, we show how to overcome practical limitations on the number of stimulus dimensions that MID can estimate by constraining the form of the non-parametric nonlinearity in an LNP model. We illustrate these methods with simulations and data from primate visual cortex.  相似文献   
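The claimed equivalence, that MID's single-spike information objective behaves like the normalized Poisson log-likelihood of an LNP model, can be illustrated by simulating an LNP neuron and evaluating that log-likelihood for different candidate filters. The simulation below assumes an exponential nonlinearity with known parameters and a made-up filter shape; it is a toy check of the likelihood's sensitivity to the filter, not the estimator from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a linear-nonlinear-Poisson (LNP) neuron: project the stimulus onto a
# filter, pass it through a nonlinearity, then draw spike counts from a Poisson law.
T, D, dt = 20000, 12, 0.01
stimulus = rng.normal(size=(T, D))
true_filter = np.sin(np.linspace(0, np.pi, D))
true_filter /= np.linalg.norm(true_filter)
rate = np.exp(1.5 * stimulus @ true_filter - 1.0)        # exponential nonlinearity
spikes = rng.poisson(rate * dt)

def poisson_loglik(filt, a=1.5, b=-1.0):
    """Poisson log-likelihood of the spike train under an LNP model with filter `filt`."""
    lam = np.exp(a * stimulus @ filt + b) * dt
    return np.sum(spikes * np.log(lam) - lam)

random_filter = rng.normal(size=D)
random_filter /= np.linalg.norm(random_filter)

print("log-likelihood, true filter  :", poisson_loglik(true_filter))
print("log-likelihood, random filter:", poisson_loglik(random_filter))
# Per the abstract, MID's single-spike information corresponds (after
# normalization) to this same Poisson log-likelihood, which is why MID acts as
# a maximum-likelihood estimator of the LNP filter.
```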

15.

Background

Graphical representation of data is one of the most easily comprehended forms of explanation. The current study describes a simple visualization tool which may allow greater understanding of medical and epidemiological data.

Method

We propose a simple tool for visualization of data, known as a “quilt plot”, that provides an alternative to presenting large volumes of data as frequency tables. Data from the Australian Needle and Syringe Program survey are used to illustrate “quilt plots”.

Conclusion

Visualization of large volumes of data using “quilt plots” enhances interpretation of medical and epidemiological data. Such intuitive presentations are particularly useful for the rapid assessment of problems in the data that cannot be readily identified by manual review. We recommend that, where possible, “quilt plots” be used along with traditional quantitative assessments of the data as an exploratory data analysis tool.
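In practice a “quilt plot” is a color-coded grid drawn over a frequency table. A minimal matplotlib sketch is shown below; the sites, years, and counts are invented stand-ins, not data from the Australian Needle and Syringe Program survey.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative frequency table (rows: sites, columns: years); values are made up.
rows = ["Site A", "Site B", "Site C", "Site D"]
cols = ["2008", "2009", "2010", "2011"]
freq = np.array([[12, 18, 25, 31],
                 [40, 38, 35, 30],
                 [ 5,  9, 14, 22],
                 [28, 27, 29, 26]])

fig, ax = plt.subplots()
im = ax.imshow(freq, cmap="viridis")           # the "quilt": one colored cell per count
ax.set_xticks(range(len(cols)))
ax.set_xticklabels(cols)
ax.set_yticks(range(len(rows)))
ax.set_yticklabels(rows)
for i in range(len(rows)):                     # overlay the counts themselves
    for j in range(len(cols)):
        ax.text(j, i, freq[i, j], ha="center", va="center", color="white")
fig.colorbar(im, ax=ax, label="frequency")
plt.show()
```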

16.
Little is known about the history and population structure of our closest living relatives, the chimpanzees, in part because of an extremely poor fossil record. To address this, we report the largest genetic study of the chimpanzees to date, examining 310 microsatellites in 84 common chimpanzees and bonobos. We infer three common chimpanzee populations, which correspond to the previously defined labels of “western,” “central,” and “eastern,” and find little evidence of gene flow between them. There is tentative evidence for structure within western chimpanzees, but we do not detect distinct additional populations. The data also provide historical insights, demonstrating that the western chimpanzee population diverged first, and that the eastern and central populations are more closely related in time.  相似文献   

17.
18.
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or “breeding” values of individuals are generated by substitution effects, which involve both “biological” additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the “genotypic” value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.  相似文献   
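The genomic relationship matrices described can be computed directly from a 0/1/2 genotype matrix. The sketch below follows the standard parameterization (genotypes centered by twice the allele frequency for the additive matrix G, and a −2p², 2pq, −2q² dominance-deviation coding for D); scaling details may differ from the paper's exact definitions, and the genotypes are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy genotype matrix: n individuals x m SNPs, coded 0/1/2 copies of the reference allele.
n, m = 200, 1000
p_true = rng.uniform(0.1, 0.9, size=m)
M = rng.binomial(2, p_true, size=(n, m)).astype(float)

p = M.mean(axis=0) / 2.0                 # observed allele frequencies
q = 1.0 - p

# Additive (breeding-value) relationship matrix: center genotypes by 2p.
Z = M - 2.0 * p
G = Z @ Z.T / np.sum(2.0 * p * q)

# Dominance-deviation coding: 0 copies -> -2p^2, 1 copy -> 2pq, 2 copies -> -2q^2.
W = np.where(M == 0, -2.0 * p**2,
     np.where(M == 1, 2.0 * p * q, -2.0 * q**2))
D = W @ W.T / np.sum((2.0 * p * q) ** 2)

# Both matrices average ~1 on the diagonal by construction and can enter a mixed
# model as covariance structures for breeding values and dominance deviations.
print("mean diagonal of G:", G.diagonal().mean())
print("mean diagonal of D:", D.diagonal().mean())
```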

19.
New generation vaccines are in demand to include only the key antigens sufficient to confer protective immunity among the plethora of pathogen molecules. In the last decade, large-scale genomics-based technologies have emerged. Among them, the Reverse Vaccinology approach was successfully applied to the development of an innovative vaccine against Neisseria meningitidis serogroup B, now available on the market under the commercial name BEXSERO® (Novartis Vaccines). The limiting step of such approaches is the number of antigens to be tested in in vivo models. Several laboratories have been trying to refine the original approach in order to identify the relevant antigens straight from the genome. Here we report a new bioinformatics tool that takes a first step in this direction. The tool has been developed by identifying structural/functional features recurring in known bacterial protective antigens, the so-called “Protectome space,” and using such “protective signatures” for protective antigen discovery. In particular, we applied this new approach to Staphylococcus aureus and Group B Streptococcus and show that not only were already known protective antigens re-discovered, but two new protective antigens were also identified.

Although vaccines based on attenuated pathogens, as pioneered by Louis Pasteur, have been shown to be extremely effective, safety and technical reasons recommend that new generation vaccines include a few selected pathogen components which, in combination with immunostimulatory molecules, can induce long-lasting protective responses. Such an approach implies that the key antigens sufficient to confer protective immunity are singled out among the plethora of pathogen molecules. As it turns out, the search for such protective antigens can be extremely complicated. Genomic technologies have opened the way to new strategies in vaccine antigen discovery (1, 2, 3). Among them, Reverse Vaccinology (RV) has proved to be highly effective, as demonstrated by the fact that a new Serogroup B Neisseria meningitidis (MenB) vaccine, incorporating antigens selected by RV, is now available to defeat meningococcal meningitis (4, 5). In essence, RV is based on the simple assumption that cloning all annotated proteins/genes and screening them against a robust and reliable surrogate-of-protection assay must lead to the identification of all protective antigens. Because most of the assays available for protective antigen selection involve animal immunization and challenge, the number of antigens to be tested represents a severe bottleneck of the entire process. For this reason, despite the fact that RV is a brute-force, inclusive approach (a “test-all-to-lose-nothing” type of approach), in their pioneering work of MenB vaccine discovery Pizza and co-workers did not test the entire collection of MenB proteins but rather restricted their analysis to the ones predicted to be surface-localized. This was based on the evidence that for an anti-MenB vaccine to be protective, bactericidal antibodies must be induced, a property that only surface-exposed antigens have. For the selection of surface antigens, Pizza and co-workers mainly used PSORT and other available tools such as MOTIFS and FINDPATTERNS to find proteins carrying localization-associated features such as transmembrane domains, leader peptides, and lipobox and outer-membrane anchoring motifs. In the end, 570 proteins were selected and entered the still very labor-intensive screening phase.

Over the last few years, our laboratories have been trying to move to more selective strategies. Our ultimate goal, which we like to refer to as the “Holy Grail of Vaccinology,” is to identify protective antigens by “simply” scanning the genome sequence of any given pathogen, thus avoiding time-consuming “wet science” and moving straight “from genome to the clinic” (6). With this objective in mind, we have developed a series of proteomics-based protocols that, in combination with bioinformatics tools, have substantially reduced the number of antigens to be tested in the surrogate-of-protection assays (7, 8). In particular, we have recently described a three-technology strategy that narrows the number of antigens to be tested in animal models down to fewer than ten (9). However, this strategy still requires high-throughput experimental activities. Therefore, the availability of in silico tools that selectively and accurately single out relevant categories of antigens among the complexity of pathogen components would greatly facilitate the vaccine discovery process.

In the present work, we describe a new bioinformatics approach that brings an additional contribution to our “from genome to clinic” goal. The approach has been developed on the basis of the assumption that protective antigens are protective in that they have specific structural/functional features (“protective signatures”) that distinguish them from immunologically irrelevant pathogen components. These features have been identified by using existing databases and prediction tools, such as Pfam and SMART. Our approach focuses on a protein’s biological role rather than its localization: it is completely unbiased with respect to protein localization, and it leads to the identification of both surface-exposed and secreted antigens (which are the majority in extracellular bacteria) as well as cytoplasmic protective antigens (for instance, antigens that elicit interferon-γ-producing CD4+ T cells, thus potentiating the killing activity of phagocytic cells toward intracellular pathogens). Should these assumptions be valid, protective signatures (PS) could be identified if: (1) all known protective antigens are compiled to create what we refer to as “the Protectome space,” and (2) the Protectome is subjected to computer-assisted scrutiny using selected tools. Once signatures are identified, novel protective antigens of a pathogen of interest should be identifiable by scanning its genome sequence in search of proteins that carry one or more protective signatures. A similar attempt has been reported (10), where the discrimination of protective versus nonprotective antigens was attempted using statistical methods based on amino acid compositional analysis and auto cross-covariance. This model was implemented in a server for the prediction of vaccine candidates, Vaxijen (www.darrenflower.info/Vaxijen); however, the selection criteria applied are still too general, leading to a list of candidates that includes ca. 30% of the total genome ORFs, very similar to the number of antigens predicted by classical RV based on the presence of localization signals.

Here we show that Protectome analysis unravels specific signatures embedded in protective antigens, most of them related to the biological role/function of the proteins. These signatures narrow the candidate list down to ca. 3% of the total ORF content and can be exploited for protective antigen discovery.

Indeed, the strategy was validated by demonstrating that well-characterized vaccine components could be identified by scanning the genome sequences of the corresponding pathogens for the presence of the PS. Furthermore, when the approach was applied to Staphylococcus aureus and Streptococcus agalactiae (Group B Streptococcus, GBS), not only were already known protective antigens rediscovered, but two new protective antigens were also identified.
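The signature-scanning step itself is simple once the "protective signatures" are in hand: annotate each predicted protein with its Pfam/SMART domains and keep those carrying at least one signature. In the sketch below every identifier, domain name, and the signature list is a hypothetical placeholder; only the matching logic reflects the described approach.

```python
# Toy sketch of signature-based antigen prioritization: domain annotations per
# protein (as would come from a Pfam/SMART scan) are matched against a list of
# "protective signatures" learned from known protective antigens (the Protectome).
# All identifiers below are hypothetical placeholders.

protective_signatures = {"PF_toxin_like", "PF_LPXTG_anchor", "PF_fibronectin_binding"}

proteome_annotations = {
    "SA_0001": {"PF_ribosomal_L2"},
    "SA_0145": {"PF_LPXTG_anchor", "PF_fibronectin_binding"},
    "SA_0378": {"PF_ABC_transporter"},
    "SA_0912": {"PF_toxin_like"},
    "SA_1450": set(),
}

def candidate_antigens(annotations, signatures):
    """Return proteins carrying at least one protective signature, with the hits."""
    hits = {}
    for protein, domains in annotations.items():
        matched = domains & signatures
        if matched:
            hits[protein] = sorted(matched)
    return hits

for protein, matched in candidate_antigens(proteome_annotations, protective_signatures).items():
    print(protein, "->", ", ".join(matched))
# Only a few percent of ORFs typically carry such signatures, which is what
# narrows the list relative to surface-localization-based reverse vaccinology.
```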

20.
Data visualization is essential to discover patterns and anomalies in large high-dimensional datasets. New dimensionality reduction techniques have thus been developed for visualizing omics data, in particular from single-cell studies. However, jointly showing several types of data, for example, single-cell expression and gene networks, remains a challenge. Here, we present U-CIE, a visualization method that encodes arbitrary high-dimensional data as colors using a combination of dimensionality reduction and the CIELAB color space to retain the original structure to the extent possible. U-CIE first uses UMAP to reduce high-dimensional data to three dimensions, partially preserving distances between entities. Next, it embeds the resulting three-dimensional representation within the CIELAB color space. This color model was designed to be perceptually uniform, meaning that the Euclidean distance between any two points should correspond to their relative perceptual difference. The combination of UMAP and CIELAB thus results in a color encoding that captures much of the structure of the original high-dimensional data. We illustrate its broad applicability by visualizing single-cell data on a protein network and metagenomic data on a world map and on scatter plots.
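The two-stage encoding, UMAP to three dimensions followed by embedding in CIELAB, can be sketched with the umap-learn and scikit-image packages. The min-max rescaling into CIELAB ranges used below is a crude stand-in for U-CIE's gamut fitting, and the digits dataset merely stands in for single-cell expression profiles.

```python
import numpy as np
import umap                          # umap-learn
from skimage import color            # scikit-image
from sklearn.datasets import load_digits

# High-dimensional data standing in for single-cell expression profiles.
X, y = load_digits(return_X_y=True)

# Step 1: UMAP down to three dimensions.
embedding = umap.UMAP(n_components=3, random_state=42).fit_transform(X)

# Step 2: rescale the 3-D coordinates into (roughly) the CIELAB ranges.
# This simple min-max mapping is a stand-in for U-CIE's more careful gamut fitting.
mins, maxs = embedding.min(axis=0), embedding.max(axis=0)
unit = (embedding - mins) / (maxs - mins)
lab = np.empty_like(unit)
lab[:, 0] = 20 + 60 * unit[:, 0]          # L*: keep away from pure black/white
lab[:, 1] = -60 + 120 * unit[:, 1]        # a*
lab[:, 2] = -60 + 120 * unit[:, 2]        # b*

# Step 3: convert CIELAB to displayable sRGB colors (out-of-gamut values are clipped).
rgb = color.lab2rgb(lab.reshape(-1, 1, 3)).reshape(-1, 3)

# Each row of `rgb` encodes that row's position in the original high-dimensional
# space and can be used to color network nodes, map markers, or scatter points.
print(rgb[:5])
```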
