Similar articles
 Found 20 similar articles; search time: 31 ms
1.
The last 10 years have seen the rise of many technologies that produce an unprecedented amount of genome-scale data from many organisms. Although the research community has been successful in exploring these data, many challenges still persist. One of them is the effective integration of such data sets directly into approaches based on mathematical modeling of biological systems. Applications in cancer are a good example. The bridge between information and modeling in cancer can be achieved by two major types of complementary strategies. First, there is a bottom-up approach, in which data generate information about the structure of, and relationships between, the components of a given system. In addition, there is a top-down approach, where cybernetic and systems-theoretical knowledge is used to create models that describe the mechanisms and dynamics of the system. These approaches can also be linked to yield multi-scale models combining detailed mechanism and wide biological scope. Here we give an overall picture of this field and discuss possible strategies to approach the major challenges ahead.

2.
A systems genetics approach combining pathway analysis of quantitative trait loci (QTL) and gene expression information has provided strong evidence for common pathways associated with genetic resistance to internal parasites. Gene data, collected from published QTL regions in sheep, cattle, mice, rats and humans, together with microarray data from sheep, were converted to human Entrez Gene IDs and compared to the KEGG pathway database. Selection of pathways from QTL data was based on a selection index that ensured that the selected pathways occurred in all species and in the majority of the projects, both overall and within species. Pathways containing both up- and down-regulated genes, primarily up-regulated genes, or primarily down-regulated genes were selected from gene expression data. After the data sets were compared independently, the pathways from each data set were compared and the common set of pathways and genes was identified. Comparisons within data sets identified 21 pathways from QTL data and 66 pathways from gene expression data. Both selected sets were enriched with pathways involved in immune function, disease and cell responses to signals. The analysis identified 14 pathways common between QTL and gene expression data, four of them directly associated with IFNγ or MHCII, with 31 common genes, including three MHCII genes. In conclusion, a systems genetics approach combining data from multiple QTL and gene expression projects led to the discovery of common pathways associated with genetic resistance to internal parasites. This systems genetics approach may prove significant for the discovery of candidate genes for many other multifactorial, economically important traits.
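The core comparison step, intersecting the pathway sets selected independently from QTL and gene-expression data, can be sketched as follows (the pathway names below are invented placeholders, not the study's actual KEGG pathways):

```python
def common_pathways(*pathway_sets):
    """Return pathways present in every input set."""
    result = set(pathway_sets[0])
    for s in pathway_sets[1:]:
        result &= set(s)
    return result

# Hypothetical selections from the two data types:
qtl = {"Antigen processing", "Cytokine signaling", "T cell receptor"}
expr = {"Cytokine signaling", "T cell receptor", "Apoptosis"}

print(sorted(common_pathways(qtl, expr)))
# → ['Cytokine signaling', 'T cell receptor']
```

The same function also covers the within-data-set step, since it accepts one set per project and keeps only pathways shared by all of them.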

3.
Genetic network reverse engineering has been an area of intensive research within the systems biology community during the last decade. With many techniques currently available, the task of validating them and choosing the best one for a given problem is a complex issue. Current practice has been to validate an approach on in-silico synthetic data sets and, wherever possible, on real data sets with known ground truth. In this study, we highlight a major issue: validating reverse-engineering algorithms on small benchmark networks very often yields inferred networks that are not statistically better than a randomly picked network. Another important issue is that with short time series, a small variation in the pre-processing procedure can yield large differences in the inferred networks. To demonstrate these issues, we have selected as our case study the IRMA in-vivo synthetic yeast network recently published in Cell. Using Fisher's exact test, we show that many results reported in the literature on reverse-engineering this network are not significantly better than random. The discussion is further extended to some other networks commonly used for validation purposes in the literature. The results presented in this study emphasize that studies carried out using small genetic networks are likely to be trivial, making it imperative that larger real networks be used for validation and benchmarking purposes. If smaller networks are considered, then the results should be interpreted carefully to avoid overconfidence.  This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
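The randomness check described here can be reproduced with a one-sided Fisher's exact test on the overlap between inferred and true edges, computed directly from the hypergeometric distribution. The counts below are a made-up illustration, not the IRMA results:

```python
from math import comb

def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test: probability of recovering at
    least k true edges when n predicted edges are drawn at random
    from N possible pairs, K of which are true."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / total

# Hypothetical 5-gene network: 20 possible directed edges, 8 of them
# true. An algorithm predicts 8 edges and recovers 5 true ones:
p = fisher_right_tail(5, 8, 8, 20)
# p ≈ 0.113: on a network this small, even 5/8 correct edges is
# not significantly better than a randomly picked network.
```

This is exactly the kind of result the abstract warns about: an overlap that looks impressive on a small benchmark can fail the significance test.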

4.
Journal of Physiology, 2013, 107(5): 369-398
An important property of visual systems is to be simultaneously both selective to specific patterns found in the sensory input and invariant to possible variations. Selectivity and invariance (tolerance) are opposing requirements. It has been suggested that they could be joined by iterating a sequence of elementary selectivity and tolerance computations. It is, however, unknown what should be selected or tolerated at each level of the hierarchy. We approach this issue by learning the computations from natural images. We propose and estimate a probabilistic model of natural images that consists of three processing layers. Two natural image data sets are considered: image patches, and complete visual scenes downsampled to the size of small patches. For both data sets, we find that in the first two layers, simple and complex cell-like computations are performed. In the third layer, we mainly find selectivity to longer contours; for patch data, we further find some selectivity to texture, while for the downsampled complete scenes, some selectivity to curvature is observed.

5.
6.
Yi G, Shi JQ, Choi T. Biometrics, 2011, 67(4): 1285-1294
A model based on a Gaussian process (GP) prior and a kernel covariance function can be used to fit nonlinear data with multidimensional covariates. It has been used as a flexible nonparametric approach for curve fitting, classification, clustering and other statistical problems, and has been widely applied to complex nonlinear systems in many different areas, particularly in machine learning. However, the model becomes challenging for large-scale and high-dimensional data sets, for example the meat data discussed in this article, which have 100 highly correlated covariates. For such data, the model suffers from high variance in parameter estimation and large predictive errors, and the computation is numerically unstable. In this article, a penalized likelihood framework is applied to GP-based models. Different penalties are investigated, and their suitability to the characteristics of GP models is discussed. Asymptotic properties are also discussed, along with the relevant proofs. Several applications to real biomechanical and bioinformatics data sets are reported.
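A minimal sketch of GP regression may help fix ideas: the posterior mean is k*ᵀ(K + σ²I)⁻¹y, where the diagonal σ² term already acts as a ridge-like stabiliser (a crude stand-in for the penalised-likelihood machinery of the article; the kernel, length-scale, and data here are invented):

```python
from math import exp

def rbf(x1, x2, ell=1.0):
    """Squared-exponential kernel (one illustrative choice)."""
    return exp(-0.5 * (x1 - x2) ** 2 / ell ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, x_star, noise=1e-2):
    """Posterior mean k_*^T (K + noise*I)^{-1} y; the noise term
    stabilises the solve, much as a penalty would."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    return sum(rbf(x_star, a) * w for a, w in zip(xs, alpha))
```

With training data (0, 0), (1, 1), (2, 4), `gp_predict` returns values close to the observations at the training points; the article's contribution is choosing and analysing penalties far more principled than this fixed jitter.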

7.
Methods to handle missing data have been an area of statistical research for many years, but little has been done within the context of pedigree analysis. In this paper we present two methods for imputing missing data for polygenic models using family data. The imputation schemes take familial relationships into account and use the observed familial information for the imputation: a traditional multiple imputation approach, and a multiple imputation (data augmentation) approach within a Gibbs sampler. We used both the Genetic Analysis Workshop 13 simulated missing-phenotype and complete-phenotype data sets to illustrate the two methods, examining the phenotypic trait systolic blood pressure and the covariate gender at time point 11 (1970) for Cohort 1 and time point 1 (1971) for Cohort 2. Comparing the results for three replicates of complete and missing data incorporating multiple imputation, we find that multiple imputation via a Gibbs sampler produces more accurate results. We therefore recommend the Gibbs sampler for imputation purposes because of the ease with which it can be extended to more complicated models, the consistency of its results, and its accounting of the variation due to imputation.
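A toy version of the data-augmentation step: the missing phenotype is repeatedly drawn from a conditional distribution centred on the current family mean, and each draw is fed back into the next update. The family structure, residual standard deviation, and values are invented for illustration; a real polygenic model would condition on the full pedigree covariance:

```python
import random

def gibbs_impute(observed, missing_key, sd=5.0, n_iter=2000, seed=7):
    """Toy data-augmentation for one missing phenotype: draw it from a
    normal centred on the current family mean, include the draw when
    recomputing that mean, and average the post-burn-in draws."""
    random.seed(seed)
    values = dict(observed)          # observed relatives only, at first
    draws = []
    for _ in range(n_iter):
        mu = sum(values.values()) / len(values)
        values[missing_key] = random.gauss(mu, sd)
        draws.append(values[missing_key])
    burn = n_iter // 2               # discard burn-in draws
    return sum(draws[burn:]) / (n_iter - burn)

# Hypothetical systolic blood pressures (mmHg) for observed relatives:
family = {"father": 130.0, "mother": 120.0, "sib": 125.0}
estimate = gibbs_impute(family, "child")
```

Averaging over retained draws, rather than taking a single draw, is what propagates the imputation uncertainty that the abstract credits the Gibbs approach with tracking.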

8.
In just the past 20 years systematics has progressed from the sequencing of individual genes for a few taxa to routine sequencing of complete plastid and even nuclear genomes. Recent technological advances have made it possible to compile very large data sets, the analyses of which have in turn provided unprecedented insights into phylogeny and evolution. Indeed, this narrow window of a few decades will likely be viewed as a golden era in systematics. Relationships have been resolved at all taxonomic levels across all groups of photosynthetic life. In the angiosperms, problematic deep-level relationships have either been largely resolved, or will be resolved within the next several years. The same large data sets have also provided new insights into the many rapid radiations that have characterized angiosperm evolution. For example, all of the major lineages of angiosperms likely arose within a narrow window of just a few million years. At the population level, the ease of DNA sequencing has given new life to phylogeographic studies, and microsatellite analyses have become more commonplace, with a concomitant impact on conservation and population biology. With the wealth of sequence data soon to be available, we are on the cusp of assembling the first semi-comprehensive tree of life for many of the 15,000 genera of flowering plants and indeed for much of green life. Accompanying these opportunities are also enormous new computational/informatic challenges including the management and phylogenetic analysis of such large, sometimes fragmentary data sets, and visualization of trees with thousands of terminals.

9.
A new method for analyzing steady-state enzyme kinetic data is presented. The technique, which is based on numerical differentiation of the complete reaction curve, has several advantages over initial-velocity and integrated Michaelis-Menten equation methods. The differentiated data are fit to the differential equation describing the appropriate kinetic scheme. This approach is particularly valuable in cases of strong competitive product inhibition and of changing concentrations of active enzyme. The method assumes a reversible reaction and is applicable to a very wide variety of steady-state kinetic schemes. A particular advantage of this approach over integrated methods is that it is independent of [S0] and hence of errors in [S0]. The combination of complete progress curves and computer analysis makes this approach very efficient with respect to both time and materials. Running on an IBM PC XT or equivalent microcomputer with an 8087 coprocessor, the analyses are very fast, the complete process usually finishing in a minute or two. The utility of the technique is demonstrated by application to both simulated and real data. We show that differentiation of the progress curve for the ribonuclease-catalyzed hydrolysis of 2',3'-cyclic cytidine monophosphate reveals strong product inhibition by 3'-CMP, and that this product inhibition accounts for the large discrepancies reported in the literature for the value of Km for this substrate. The method was also applied to determine the rate of reactivation of beta-lactamase which had been reversibly inactivated by cloxacillin. Since large numbers of data points are required for the numerical differentiation, the method has become practical only with the advent of computer-acquired data systems.
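The central idea, differentiate the progress curve numerically and then fit the rate law to the resulting (S, v) pairs, can be sketched for the simple irreversible Michaelis-Menten case. This toy omits the reversible reaction and product inhibition that the article's method handles, and the parameters are invented:

```python
def simulate_progress(S0, Vmax, Km, dt=0.001, t_end=5.0):
    """Euler-integrate dS/dt = -Vmax*S/(Km+S) to mimic a recorded
    progress curve (no product inhibition in this toy version)."""
    S, curve = S0, []
    for _ in range(int(t_end / dt) + 1):
        curve.append(S)
        S -= dt * Vmax * S / (Km + S)
    return curve, dt

def rates_from_curve(curve, dt):
    """Central-difference numerical differentiation: (S, v) pairs."""
    return [(curve[i], -(curve[i + 1] - curve[i - 1]) / (2 * dt))
            for i in range(1, len(curve) - 1)]

def fit_mm(pairs):
    """Least-squares fit of the Lineweaver-Burk line
    1/v = (Km/Vmax)*(1/S) + 1/Vmax."""
    xs = [1.0 / s for s, v in pairs]
    ys = [1.0 / v for s, v in pairs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope / intercept, 1.0 / intercept   # (Km, Vmax)

curve, dt = simulate_progress(S0=10.0, Vmax=2.0, Km=1.0)
Km_est, Vmax_est = fit_mm(rates_from_curve(curve, dt))
```

On this noise-free curve the fit recovers Km ≈ 1 and Vmax ≈ 2; as the abstract notes, the method only became practical once computer-acquired data supplied the dense sampling that numerical differentiation needs.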

10.
Chromatin immunoprecipitation (ChIP) followed by deep sequencing can now easily be performed across different conditions, time points and even species. However, analyzing such data is not trivial and standard methods are as yet unavailable. Here we present a protocol to systematically compare ChIP-sequencing (ChIP-seq) data across conditions. We first describe technical guidelines for data preprocessing, read mapping, read-density visualization and peak calling. We then describe methods and provide code with specific examples to compare different data sets across species and across conditions, including a threshold-free approach to measure global similarity, a strategy to assess the binary conservation of binding events and measurements for quantitative changes of binding. We discuss how differences in binding can be related to gene functions, gene expression and sequence changes. Once established, this protocol should take about 2 d to complete and be generally applicable to many data sets.
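A threshold-free global similarity measure can be as simple as a correlation between binned read densities from the two conditions, with no peak calling involved. The bin counts below are invented, and real pipelines typically normalise for depth and log-transform first:

```python
def pearson(a, b):
    """Pearson correlation between two equal-length count vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Read counts per genomic bin under two conditions (invented numbers):
cond1 = [3, 18, 5, 2, 40, 4, 1, 22]
cond2 = [4, 15, 6, 3, 35, 5, 2, 19]

similarity = pearson(cond1, cond2)
```

Because every bin contributes, this avoids the peak-calling threshold whose choice would otherwise dominate a binary overlap comparison.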

11.
Roshan et al. recently described a "divide-and-conquer" technique for parsimony analysis of large data sets, Rec-I-DCM3, and stated that it compares very favorably to results using the program TNT. Their technique is based on selecting subsets of taxa to create reduced data sets or subproblems, finding most-parsimonious trees for each reduced data set, recombining all parts together, and then performing global TBR swapping on the combined tree. Here, we contrast this approach to sectorial searches, a divide-and-conquer algorithm implemented in TNT. This algorithm also uses a guide tree to create subproblems, with the first-pass state sets of the nodes that join the selected sectors with the rest of the topology; this allows exact length calculations for the entire topology (that is, any solution N steps shorter than the original, for the reduced subproblem, must also be N steps shorter for the entire topology). We show here that, for sectors of similar size analyzed with the same search algorithms, subdividing data sets with sectorial searches produces better results than subdividing with Rec-I-DCM3. Roshan et al.'s claim that Rec-I-DCM3 outperforms the techniques in TNT stemmed from a poor experimental design and the algorithmic settings used for the TNT runs. In particular, for finding trees at or very close to the minimum known length of the analyzed data sets, TNT clearly outperforms Rec-I-DCM3. Finally, we show that the performance of Rec-I-DCM3 is bound by the efficiency of the TBR implementation for the complete data set, as this method behaves (after some number of iterations) more as a technique for cyclic perturbations and improvements than as a divide-and-conquer strategy.

12.
Species identification through DNA barcoding or metabarcoding has become a key approach for biodiversity evaluation and ecological studies. However, the rapid accumulation of barcoding data has created some difficulties: for instance, global queries against a large reference library can take a very long time. We here devise a two-step searching strategy to speed up identification for such queries. It first uses a Hidden Markov Model (HMM) algorithm to narrow the search scope to genus level and then determines the corresponding species using minimum genetic distance. Moreover, using a fuzzy membership function, our approach also estimates the credibility of the assignment result for each query. To perform this task, we developed a new software pipeline, FuzzyID2, using Python and C++. Performance of the new method was assessed using eight empirical data sets ranging from 70 to 234,535 barcodes. Five data sets (four animal, one plant) deployed the conventional barcode approach, one used metabarcodes, and two were eDNA-based. The results showed mean accuracies of generic and species identification of 98.60% (with a minimum of 95.00% and a maximum of 100.00%) and 94.17% (with a range of 84.40%-100.00%), respectively. Tests with simulated NGS sequences based on realistic eDNA and metabarcode data demonstrated that FuzzyID2 achieved a significantly higher identification success rate than the commonly used Blast method, and the TIPP method tends to find many fewer species than either FuzzyID2 or Blast. Furthermore, data sets with tens of thousands of barcodes need only a few seconds for each query assignment using FuzzyID2. Our approach provides an efficient and accurate species identification protocol for biodiversity-related projects with large DNA sequence data sets.
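The second step of the two-step search, minimum-genetic-distance matching within the genus selected by the HMM, can be sketched with uncorrected p-distances. The reference library, sequences, and genus names below are invented; FuzzyID2's HMM scoring and fuzzy credibility estimate are not reproduced here:

```python
def p_distance(a, b):
    """Proportion of mismatched sites between two aligned sequences."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)

# Tiny invented reference library keyed by (genus, species):
reference = {
    ("GenusA", "sp1"): "ACGTACGTAC",
    ("GenusA", "sp2"): "ACGTACGTTC",
    ("GenusB", "sp3"): "TTGTACCTAG",
}

def assign(query, refs, genus=None):
    """Step two of the two-step search: minimum-distance species match,
    optionally restricted to the genus chosen by the first (HMM) step."""
    pool = {k: v for k, v in refs.items()
            if genus is None or k[0] == genus}
    best = min(pool, key=lambda k: p_distance(query, pool[k]))
    return best, p_distance(query, pool[best])

best, dist = assign("ACGTACGTAA", reference, genus="GenusA")
```

Restricting the pool to one genus is what makes the second step cheap: the expensive global comparison is replaced by a scan over a handful of congeneric references.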

13.
Absolute protein concentration determination is becoming increasingly important in a number of fields including diagnostics, biomarker discovery and systems biology modeling. The recently introduced quantification concatamer methodology provides a novel approach to performing such determinations, and it has been applied to both microbial and mammalian systems. While a number of software tools exist for performing analyses of quantitative data generated by related methodologies such as SILAC, there is currently no analysis package dedicated to the quantification concatamer approach. Furthermore, most tools that are currently available in the field of quantitative proteomics do not manage storage and dissemination of such data sets.

14.
A PCR-based approach to sequencing complete mitochondrial genomes is described along with a set of 86 primers designed primarily for avian mitochondrial DNA (mtDNA). This PCR-based approach allows an accurate determination of complete mtDNA sequences that is faster than sequencing cloned mtDNA. The primers are spaced at about 500-base intervals along both DNA strands. Many of the primers incorporate degenerate positions to accommodate variation in mtDNA sequence among avian taxa and to reduce the potential for preferential amplification of nuclear pseudogenes. Comparison with published vertebrate mtDNA sequences suggests that many of the primers will have broad taxonomic utility. In addition, these primers should make available a wider variety of mitochondrial genes for studies based on smaller data sets.

15.
Biophysical Journal, 2021, 120(20): 4472-4483
Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances lead to ever-larger data sets, tools for rapid analysis and identification of molecules exhibiting the behavior of interest are increasingly important. In many cases the underlying mechanism is unknown, making unsupervised techniques desirable. The divisive segmentation and clustering (DISC) algorithm is one such unsupervised method that idealizes noisy SM time series much faster than computationally intensive approaches without sacrificing accuracy. However, DISC relies on a user-selected objective criterion (OC) to guide its estimation of the ideal time series. Here, we explore how different OCs affect DISC's performance for data typical of SM fluorescence imaging experiments. We find that OCs differing in their penalty for model complexity each optimize DISC's performance for time series with different properties such as signal/noise and number of sample points. Using a machine learning approach, we generate a decision boundary that allows unsupervised selection of OCs based on the input time series to maximize performance for different types of data. This is particularly relevant for SM fluorescence data sets, which often have signal/noise near the derived decision boundary and include time series of nonuniform length because of stochastic bleaching. Our approach, AutoDISC, allows unsupervised per-molecule optimization of DISC, which will substantially assist in the rapid analysis of high-throughput SM data sets with noisy samples and nonuniform time windows.
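Why the choice of objective criterion matters can be illustrated with two standard penalties, AIC and BIC, applied to hypothetical fits of the same trace with increasing numbers of idealised states. The residual sums of squares below are invented, and these two OCs are generic examples rather than DISC's actual criteria:

```python
from math import log

def aic(rss, n, k):
    """Akaike information criterion for a least-squares fit."""
    return n * log(rss / n) + 2 * k

def bic(rss, n, k):
    """Bayesian information criterion: heavier complexity penalty."""
    return n * log(rss / n) + k * log(n)

def best_model(fits, n, criterion):
    """fits: list of (n_states, rss); return n_states minimising OC."""
    return min(fits, key=lambda f: criterion(f[1], n, f[0]))[0]

# Hypothetical residual sums of squares for idealising a 1000-point
# trace with 1 to 4 states (invented numbers):
fits = [(1, 250.0), (2, 100.0), (3, 98.0), (4, 97.5)]
```

On these numbers AIC, with its lighter penalty, accepts the marginal improvement of a fourth state while BIC stops at three, which is exactly the kind of disagreement that motivates selecting the OC per molecule.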

16.
17.

Background  

With the advent of high-throughput biotechnology data-acquisition platforms such as microarrays, SNP chips and mass spectrometers, data sets with many more variables than observations are now routinely being collected. Finding relationships between response variables of interest and variables in such data sets is an important problem, akin to finding needles in a haystack. Whilst methods for a number of response types have been developed, a general approach has been lacking.

18.
R Cohen, J M Claverie. Biopolymers, 1975, 14(8): 1701-1716
The first detailed application of a recently published very general approach to chemical equilibria during sedimentation is presented. As a consequence of the very extensive theoretical treatment, made possible by this approach, the active enzyme analytical centrifugation method can now be used under a far wider set of conditions than before, including the study of many interacting active molecule systems. It has also been shown that this method is as precise as the more conventional ones.

19.
For understanding the computation and function of single neurons in sensory systems, one needs to investigate how sensory stimuli are related to a neuron’s response and which biological mechanisms underlie this relationship. Mathematical models of the stimulus–response relationship have proved very useful in approaching these issues in a systematic, quantitative way. A starting point for many such analyses has been provided by phenomenological “linear–nonlinear” (LN) models, which comprise a linear filter followed by a static nonlinear transformation. The linear filter is often associated with the neuron’s receptive field. However, the structure of the receptive field is generally a result of inputs from many presynaptic neurons, which may form parallel signal processing pathways. In the retina, for example, certain ganglion cells receive excitatory inputs from ON-type as well as OFF-type bipolar cells. Recent experiments have shown that the convergence of these pathways leads to intriguing response characteristics that cannot be captured by a single linear filter. One approach to adjust the LN model to the biological circuit structure is to use multiple parallel filters that capture ON and OFF bipolar inputs. Here, we review these new developments in modeling neuronal responses in the early visual system and provide details about one particular technique for obtaining the required sets of parallel filters from experimental data.
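The LN structure and its multi-pathway extension can be sketched in a few lines: a linear filter, a static rectifying nonlinearity, and then two parallel rectified pathways whose sum responds to both increments and decrements. The single-sample kernels below are trivial placeholders, not filters estimated from data:

```python
def linear_filter(stimulus, kernel):
    """Causal convolution of a stimulus with a temporal filter."""
    return [sum(kernel[i] * stimulus[t - i]
                for i in range(len(kernel)) if t - i >= 0)
            for t in range(len(stimulus))]

def rectify(xs):
    """Static nonlinearity: half-wave rectification."""
    return [max(0.0, x) for x in xs]

def ln_response(stimulus, kernel):
    """Single-pathway LN model: linear filter, then rectification."""
    return rectify(linear_filter(stimulus, kernel))

def on_off_response(stimulus, on_kernel, off_kernel):
    """Two parallel pathways, each rectified before summation."""
    on = ln_response(stimulus, on_kernel)
    off = ln_response(stimulus, off_kernel)
    return [a + b for a, b in zip(on, off)]

# A brightening and a darkening step; placeholder one-sample kernels:
response = on_off_response([0, 1, 0, -1, 0], [1.0], [-1.0])
```

The combined model fires for both the positive and the negative stimulus step, whereas either single-pathway LN model alone responds to only one of them, which is the behaviour a single linear filter cannot capture.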

20.

Background

The skeleton of complex systems can be represented as networks where vertices represent entities, and edges represent the relations between these entities. Often it is impossible, or expensive, to determine the network structure by experimental validation of the binary interactions between every vertex pair. It is usually more practical to infer the network from surrogate observations. Network inference is the process by which an underlying network of relations between entities is determined from indirect evidence. While many algorithms have been developed to infer networks from quantitative data, less attention has been paid to methods which infer networks from repeated co-occurrence of entities in related sets. This type of data is ubiquitous in the field of systems biology and in other areas of complex systems research. Hence, such methods would be of great utility and value.

Results

Here we present a general method for network inference from repeated observations of sets of related entities. Given experimental observations of such sets, we infer the underlying network connecting these entities by generating an ensemble of networks consistent with the data. The frequency of occurrence of a given link throughout this ensemble is interpreted as the probability that the link is present in the underlying real network conditioned on the data. Exponential random graphs are used to generate and sample the ensemble of consistent networks, and we take an algorithmic approach to numerically execute the inference method. The effectiveness of the method is demonstrated on synthetic data before employing this inference approach to problems in systems biology and systems pharmacology, as well as to construct a co-authorship collaboration network. We predict direct protein-protein interactions from high-throughput mass-spectrometry proteomics, integrate data from Chip-seq and loss-of-function/gain-of-function followed by expression data to infer a network of associations between pluripotency regulators, extract a network that connects 53 cancer drugs to each other and to 34 severe adverse events by mining the FDA’s Adverse Events Reporting Systems (AERS), and construct a co-authorship network that connects Mount Sinai School of Medicine investigators. The predicted networks and online software to create networks from entity-set libraries are provided online at http://www.maayanlab.net/S2N.
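A much-simplified stand-in for the approach described above: score each pair of entities by how often it co-occurs across the observed sets and keep pairs above a threshold. The real method instead samples an ensemble of exponential random graphs to turn these counts into calibrated link probabilities; the sets below are invented:

```python
from itertools import combinations
from collections import Counter

def cooccurrence_network(observed_sets, min_count=2):
    """Link two entities if they co-occur in at least `min_count` of
    the observed sets (a crude proxy for the ensemble frequency that
    the paper interprets as a link probability)."""
    counts = Counter()
    for s in observed_sets:
        for a, b in combinations(sorted(s), 2):
            counts[(a, b)] += 1
    return {pair for pair, c in counts.items() if c >= min_count}

# Invented observations, e.g. proteins pulled down together:
sets = [{"A", "B", "C"}, {"A", "B"}, {"B", "C", "D"}, {"A", "D"}]
network = cooccurrence_network(sets, min_count=2)
```

Here only the repeatedly co-occurring pairs survive, while pairs seen together once are discarded as likely indirect associations.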

Conclusions

The network inference method presented here can be applied to resolve different types of networks in current systems biology and systems pharmacology as well as in other fields of research.
