With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatases (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
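The indexing step described in this abstract can be illustrated with a minimal sketch. It is not SECOM's implementation: the abstract does not specify the hash seed function, so contiguous k-mers are assumed here as a stand-in, and the function name `seed_graph`, the parameter `k`, and the toy sequences are hypothetical. The community-finding and percolation steps are not shown.

```python
from collections import defaultdict
from itertools import combinations

def seed_graph(proteins, k=3):
    """Return {(name_a, name_b): number of distinct k-mers shared by both}.

    Assumption: plain contiguous k-mers stand in for the hash seeds.
    """
    index = defaultdict(set)  # k-mer -> names of proteins that contain it
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    weights = defaultdict(int)  # edge -> count of shared seeds
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Toy sequences (hypothetical): p1 and p2 share the seeds "KVL" and "VLA".
proteins = {"p1": "MKVLAA", "p2": "GGKVLA", "p3": "TTTTTT"}
graph = seed_graph(proteins)
```

Because every protein is visited once to build the index, no all-against-all alignment is needed; only proteins that land in the same seed bucket are ever paired.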



A systems biology interpretation of genome-scale RNA interference (RNAi) experiments is complicated by scope, experimental variability and network signaling robustness. Over-representation approaches (ORA), such as the hypergeometric test or z-score, are an established statistical framework used to associate RNA interference effectors with biologically annotated gene sets or pathways. These methods, however, do not directly take advantage of our growing understanding of the interactome. Furthermore, these methods can miss partial pathway activation and may be biased by protein complexes. Here we present a novel ORA, protein interaction permutation analysis (PIPA), that takes advantage of canonical pathways and established protein interactions to identify pathways enriched for protein interactions connecting RNAi hits.
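For readers unfamiliar with the hypergeometric form of ORA mentioned in this abstract, the following is a minimal sketch of the upper-tail test. It illustrates the established baseline only, not PIPA; the function name and the toy numbers are assumptions.

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k): chance of seeing at least k pathway genes among n hits.

    N: genes screened, K: genes annotated to the pathway,
    n: RNAi hits, k: hits that fall in the pathway.
    """
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy numbers (hypothetical): 20 genes screened, 5 in the pathway,
# 4 hits, 2 of which are pathway members.
p = hypergeom_pvalue(N=20, K=5, n=4, k=2)
```

The test treats genes as independent draws, which is why, as the abstract notes, it cannot use interaction data and can be biased by protein complexes whose members are hit together.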

Heterochrony is important as a potential mechanism of evolutionary change. However, the analysis of developmental timing data within a phylogenetic framework to identify important shifts has proven difficult. In particular, analytical problems with sequence (event) heterochrony revolve around the lack of an absolute time frame in development to allow standardization of timing data across species. An important breakthrough in this regard is the method of "event-pairing," which compares the relative timing of developmental events in a pairwise fashion. The resulting event-pair-encoded data can be mapped onto a phylogeny, which can provide important biological information. However, event-paired data are cumbersome to work with and lack a rigorous quantitative framework under which to analyze them. Critically, the otherwise advantageous relativity of event-pairing prevents an assessment of whether one or both events in a single event-pair have changed position during evolutionary history. Building on the method of event-pairing, we describe a protocol whereby event-pair transformations along a given branch are analyzed en bloc. Our method of "event-pair cracking" thereby allows developmental timing data to be analyzed quantitatively within a phylogenetic framework to infer key heterochronic shifts. We demonstrate the utility of event-pair cracking through a worked example and show how it provides a set of desired features identified by previous authors.
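Event-pair encoding, on which this abstract builds, can be sketched briefly. The 0/1/2 coding below (earlier, simultaneous, later) is assumed as a common convention and is not taken from the abstract; the event names and ranks are hypothetical, and the cracking analysis itself is not implemented.

```python
from itertools import combinations

def event_pairs(ranks):
    """Code each event pair (a, b): 0 if a precedes b, 1 if simultaneous, 2 if a follows b."""
    codes = {}
    for a, b in combinations(sorted(ranks), 2):
        if ranks[a] < ranks[b]:
            codes[(a, b)] = 0
        elif ranks[a] == ranks[b]:
            codes[(a, b)] = 1
        else:
            codes[(a, b)] = 2
    return codes

# Hypothetical developmental sequences (rank order of events) at two nodes.
ancestor = {"eye": 1, "heart": 2, "limb": 3}
descendant = {"eye": 1, "heart": 3, "limb": 2}

ancestor_codes = event_pairs(ancestor)
descendant_codes = event_pairs(descendant)

# Event pairs whose code changed along the branch.
changed = [pair for pair, code in ancestor_codes.items()
           if descendant_codes[pair] != code]
```

Here only the heart/limb pair changes, and the pair alone cannot say whether the heart moved later or the limb moved earlier. That ambiguity is the limitation the abstract says en bloc analysis of all changed pairs is meant to address.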

The Memory-Prediction Framework (MPF) and its Hierarchical-Temporal Memory implementation (HTM) have been widely applied to unsupervised learning problems, for both classification and prediction. To date, there has been no attempt to incorporate MPF/HTM in reinforcement learning or other adaptive systems; that is, to use knowledge embodied within the hierarchy to control a system, or to generate behaviour for an agent. This problem is interesting because the human neocortex is believed to play a vital role in the generation of behaviour, and the MPF is a model of the human neocortex. We propose some simple and biologically-plausible enhancements to the Memory-Prediction Framework. These cause it to explore and interact with an external world, while trying to maximize a continuous, time-varying reward function. All behaviour is generated and controlled within the MPF hierarchy. The hierarchy develops from a random initial configuration by interaction with the world and reinforcement learning only. Among other demonstrations, we show that a 2-node hierarchy can learn to successfully play "rocks, paper, scissors" against a predictable opponent.

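Advances in genome-scale metabolic network research
Genome-scale metabolic networks start from the genome sequence and combine gene, protein and metabolic databases with experimental data to study an organism's metabolic processes quantitatively from a systems perspective and to understand the interactions among its components. Such network models are of theoretical and practical importance both for basic research on life processes and for the construction of superior engineered strains. Drawing on the authors' own research experience, this review gives a fairly detailed introduction to genome-scale metabolic networks, from reconstruction through simulation to application, and discusses some current difficulties and future research directions.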

Migrations, i.e. the recurring, roundtrip movement of animals between distant and distinct habitats, occur among diverse metazoan taxa. Although traditionally linked to avoidance of food shortages, predators or harsh abiotic conditions, there is increasing evidence that parasites may have played a role in the evolution of migration. On the one hand, selective pressures from parasites can favour migratory strategies that allow either avoidance of infections or recovery from them. On the other hand, infected animals incur physiological costs that may limit their migratory abilities, affecting their speed, the timing of their departure or arrival, and/or their condition upon reaching their destination. During migration, reduced immunocompetence as well as exposure to different external conditions and parasite infective stages can influence infection dynamics. Here, we first explore whether parasites represent extra costs for their hosts during migration. We then review how infection dynamics and infection risk are affected by host migration, thereby considering parasites as both causes and consequences of migration. We also evaluate the comparative evidence testing the hypothesis that migratory species harbour a richer parasite fauna than their closest free-living relatives, finding general support for the hypothesis. Then we consider the implications of host migratory behaviour for parasite ecology and evolution, which have received much less attention. Parasites of migratory hosts may achieve much greater spatial dispersal than those of non-migratory hosts, expanding their geographical range, and providing more opportunities for host-switching. Exploiting migratory hosts also exerts pressures on the parasite to adapt its phenology and life-cycle duration, including the timing of major developmental, reproduction and transmission events. 
Natural selection may even favour parasites that manipulate their host's migratory strategy in ways that can enhance parasite transmission. Finally, we propose a simple integrated framework based on eco-evolutionary feedbacks to consider the reciprocal selection pressures acting on migratory hosts and their parasites. Host migratory strategies and parasite traits evolve in tandem, each acting on the other along two-way causal paths and feedback loops. Their likely adjustments to predicted climate change will be understood best from this coevolutionary perspective.

Connecting the nonlinear and often counterintuitive physiological effects of multiple environmental drivers to the emergent impacts on ecosystems is a fundamental challenge. Unfortunately, the disconnect between the way “stressors” (e.g., warming) are considered in organismal (physiological) and ecological (community) contexts continues to hamper progress. Environmental drivers typically elicit biphasic physiological responses, where performance declines at levels above and below some optimum. It is also well understood that species exhibit highly variable response surfaces to these changes so that the optimum level of any environmental driver can vary among interacting species. Thus, species interactions are unlikely to go unaltered under environmental change. However, while these nonlinear, species‐specific physiological relationships between environment and performance appear to be general, rarely are they incorporated into predictions of ecological tipping points. Instead, most ecosystem‐level studies focus on varying levels of “stress” and frequently assume that any deviation from “normal” environmental conditions has similar effects, albeit with different magnitudes, on all of the species within a community. We consider a framework that realigns the positive and negative physiological effects of changes in climatic and nonclimatic drivers with indirect ecological responses. Using a series of simple models based on direct physiological responses to temperature and ocean pCO2, we explore how variation in environment‐performance relationships among primary producers and consumers translates into community‐level effects via trophic interactions. These models show that even in the absence of direct mortality, mismatched responses resulting from often subtle changes in the physical environment can lead to substantial ecosystem‐level change.

The land-sparing versus land-sharing debate centers around how different intensities of habitat use can be coordinated to satisfy competing demands for biodiversity persistence and food production in agricultural landscapes. We apply the broad concepts from this debate to the sea and propose it as a framework to inform marine zoning based on three possible management strategies, establishing: no-take marine reserves, regulated fishing zones, and unregulated open-access areas. We develop a general model that maximizes standing fish biomass, given a fixed management budget while maintaining a minimum harvest level. We find that when management budgets are small, sea-sparing is the optimal management strategy because for all parameters tested, reserves are more cost-effective at increasing standing biomass than traditional fisheries management. For larger budgets, the optimal strategy switches to sea-sharing because, at a certain point, further investing to grow the no-take marine reserves reduces catch below the minimum harvest constraint. Our intention is to illustrate how general rules of thumb derived from plausible, single-purpose models can help guide marine protected area policy under our novel sparing and sharing framework. This work is the beginning of a basic theory for optimal zoning allocations and should be considered complementary to the more specific spatial planning literature for marine reserves as nations expand their marine protected area estates.

The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR, evaluating their performance to predict outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16–18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
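The simplest method in this comparison, p-value thresholding, can be sketched as a weighted sum of allele dosages. This is an illustration under stated assumptions, not the study's pipeline: clumping is assumed to have been done already, and the variant IDs, effect sizes, p-values and dosages are all hypothetical.

```python
def polygenic_score(dosages, variants, p_threshold):
    """Sum beta * dosage over variants whose GWAS p-value is <= p_threshold.

    Assumption: `variants` has already been clumped (LD-pruned).
    """
    return sum(
        v["beta"] * dosages[v["id"]]
        for v in variants
        if v["p"] <= p_threshold
    )

# Hypothetical summary statistics and one individual's allele dosages (0, 1 or 2).
variants = [
    {"id": "rs1", "beta": 0.5, "p": 1e-8},
    {"id": "rs2", "beta": -0.25, "p": 0.01},
    {"id": "rs3", "beta": 1.0, "p": 0.5},
]
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}

strict = polygenic_score(dosages, variants, p_threshold=5e-8)
lenient = polygenic_score(dosages, variants, p_threshold=0.05)
```

The score depends on which threshold is chosen, which is why the abstract compares strategies such as 10-fold cross-validation for selecting it.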

Systems-level design of cell metabolism is becoming increasingly important for renewable production of fuels, chemicals, and drugs. Computational models are improving in the accuracy and scope of predictions, but are also growing in complexity. Consequently, efficient and scalable algorithms are increasingly important for strain design. Previous algorithms helped to consolidate the utility of computational modeling in this field. To meet intensifying demands for high-performance strains, both the number and variety of genetic manipulations involved in strain construction are increasing. Existing algorithms have experienced combinatorial increases in computational complexity when applied toward the design of such complex strains. Here, we present EMILiO, a new algorithm that increases the scope of strain design to include reactions with individually optimized fluxes. Unlike existing approaches that would experience an explosion in complexity to solve this problem, we efficiently generated numerous alternate strain designs producing succinate, l-glutamate and l-serine. This was enabled by successive linear programming, a technique new to the area of computational strain design.

This article applies distributed artificial intelligence to the real-time planning and control of flexible manufacturing systems (FMS) consisting of asynchronous manufacturing cells. A knowledge-based approach is used to determine the course of action, resource sharing, and processor assignments. Within each cell there is an embedded automatic planning system that executes dynamic scheduling and supervises manufacturing operations. Because of the decentralized control, real-time task assignments are carried out by a negotiation process among cell hosts. The negotiation process is modeled by augmented Petri nets (the combination of production rules and Petri nets) and is executed by a distributed, rule-based algorithm.

We present version 2 of the SPINE system for structural proteomics. SPINE is available over the web at http://nesg.org. It serves as the central hub for the Northeast Structural Genomics Consortium, allowing collaborative structural proteomics to be carried out in a distributed fashion. The core of SPINE is a laboratory information management system (LIMS) for key bits of information related to the progress of the consortium in cloning, expressing and purifying proteins and then solving their structures by NMR or X-ray crystallography. Originally, SPINE focused on tracking constructs, but, in its current form, it is able to track target sample tubes and store detailed sample histories. The core database comprises a set of standard relational tables and a data dictionary that form an initial ontology for proteomic properties and provide a framework for large-scale data mining. Moreover, SPINE sits at the center of a federation of interoperable information resources. These can be divided into (i) local resources closely coupled with SPINE that enable it to handle less standardized information (e.g. integrated mailing and publication lists), (ii) other information resources in the NESG consortium that are inter-linked with SPINE (e.g. crystallization LIMS local to particular laboratories) and (iii) international archival resources that SPINE links to and passes on information to (e.g. TargetDB at the PDB).

MOTIVATION: Currently available methods for the prediction of subcellular location of mitochondrial proteins rely largely on the presence of mitochondrial targeting signals in the protein sequences. However, a large fraction of mitochondrial proteins lack such signals, making those tools ineffective for genome-scale prediction of mitochondria-targeted proteins. Here, we propose a method for genome-scale prediction of nucleus-encoded mitochondrial proteins. The new method, MITOPRED, is based on the Pfam domain occurrence patterns and the amino acid compositional differences between mitochondrial and non-mitochondrial proteins. RESULTS: MITOPRED could predict mitochondrial proteins with 100% specificity at a 44% sensitivity rate and with 67% specificity at 99% sensitivity. Additionally, it was sufficiently robust to predict mitochondrial proteins across different eukaryotic species with similar accuracy. Based on the Matthews correlation coefficient measure, the prediction performance of MITOPRED is clearly superior (0.73) to those of the two popular methods TargetP (0.51) and PSORT (0.53). Using this method, we predicted the nucleus-encoded mitochondrial proteins from six complete genomes (three invertebrate, two vertebrate and one plant species) and estimated the total number in each genome. In human, our method estimated the existence of 1362 mitochondrial proteins corresponding to 4.8% of the total proteome. AVAILABILITY: The MITOPRED program is freely accessible at http://mitopred.sdsc.edu. Source code is available on request from the authors. SUPPLEMENTARY INFORMATION: Training data sets are also available at http://mitopred.sdsc.edu
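The Matthews correlation coefficient used above to compare predictors is computed from the four cells of a confusion matrix. The sketch below shows the standard formula; the counts are hypothetical and are not MITOPRED results.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts for a binary mitochondrial/non-mitochondrial call.
score = mcc(tp=40, tn=45, fp=5, fn=10)
```

Unlike specificity or sensitivity quoted alone, the coefficient uses all four counts, so it gives a single figure that does not depend on which operating point a method reports.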

Sequences of 16 NAD and/or NADP-linked aldehyde oxidoreductases are aligned, including representative examples of all aldehyde dehydrogenase forms with wide substrate preferences as well as additional types with distinct specificities for certain metabolic aldehyde intermediates, particularly semialdehydes, yielding pairwise identities from 15 to 83%. Eleven of 23 invariant residues are glycine and three are proline, indicating evolutionary restraint against alteration of peptide chain-bending points. Additionally, another 66 positions show high conservation of residue type, mostly hydrophobic residues. Ten of these occur in predicted beta-strands, suggesting important interior-packing interactions. A single invariant cysteine residue is found, further supporting its catalytic role. A previously identified essential glutamic acid residue is conserved in all but methyl malonyl semialdehyde dehydrogenase, which may relate to formation by that enzyme of a CoA ester as a product rather than a free carboxylate species. Earlier, similarity to a GXGXXG segment expected in the NAD-binding site was noted from alignments with fewer sequences. The same region continues to be indicated, although now only the first glycine residue is strictly conserved and the second (usually threonine) is not present at all, suggesting greater variance in coenzyme-binding interactions.

Genome-scale metabolic models have been recognised as useful tools for better understanding living organisms’ metabolism. merlin (https://www.merlin-sysbio.org/) is an open-source and user-friendly resource that hastens the models’ reconstruction process, conjugating manual and automatic procedures, while leveraging the user's expertise with a curation-oriented graphical interface. An updated and redesigned version of merlin is herein presented. Since 2015, several features have been implemented in merlin, along with deep changes in the software architecture, operational flow, and graphical interface. The current version (4.0) includes the implementation of novel algorithms and third-party tools for genome functional annotation, draft assembly, model refinement, and curation. Such updates increased the user base, resulting in multiple published works, including genome metabolic (re-)annotations and model reconstructions of multiple (lower and higher) eukaryotes and prokaryotes. merlin version 4.0 is the only tool able to perform template-based and de novo draft reconstructions, while achieving competitive performance compared to state-of-the-art tools both for well and less-studied organisms.

Putative synapomorphy assessment (primary homology assessment) is distinct for DNA strings having a codon structure (hereafter, coding DNA) versus those lacking it (hereafter, non-coding DNA). The first requires the identification of a reading frame and of usually few in-frame insertions and deletions. In non-coding DNA, where length variation is much more common, putative synapomorphy assessment is considerably less straightforward and highly depends on the alignment method. Appreciating the existence of evolutionary constraints, alignments that consider patterns associated with specific putative evolutionary events are favored. Once the sequences have been aligned, the postulated putative evolutionary events need to be coded as an additional step. In order for the alignments and the alignment coding to be falsifiable, they should be carried out using justified and explicitly formulated criteria. Alternative coding methods for the most common patterns present in alignments of non-coding DNA are discussed here. Simpler putative synapomorphy assessment will not always correlate to more reliable phylogenetic information because simplicity does not necessarily correlate to the degree of homoplasy. The use of non-coding DNA can result in more laborious coding, but at the same time in more corroborated hypotheses, mirroring their accuracy for phylogenetic inference.



Advances in bioinformatic techniques and analyses have led to the availability of genome-scale metabolic reconstructions. The size and complexity of such networks often means that their potential behaviour can only be analysed with constraint-based methods. Whilst requiring minimal experimental data, such methods are unable to give insight into cellular substrate concentrations. Instead, the long-term goal of systems biology is to use kinetic modelling to characterize fully the mechanics of each enzymatic reaction, and to combine such knowledge to predict system behaviour.

The photophysical properties of 1,1′-dimethyl-4,4′-dipyridinium (methyl viologen, MV2+) intercalated within zirconium phosphate (ZrP) were investigated. The intercalation of MV2+ within ZrP was achieved by ion-exchange using a hydrated form of ZrP with six water molecules per formula unit and an interlayer distance of 10.3 Å. The intercalation yields a new phase with an interlayer distance up to 10.6 Å. The MV2+-exchanged ZrP material was characterized using elemental analysis, XRPD and IR data. The MV2+-exchanged ZrP materials show a red shift in the UV-Vis spectra in contrast with solution. The photoexcitation of nitrogen-purged, MV2+-exchanged ZrP water suspensions with UV light leads to fluorescence emission with a maximum at 337 nm. The photoexcitation of MV2+-exchanged ZrP suspensions without nitrogen purging yields two fluorescence emissions with maxima at 337 and 450 nm. The emission in the visible region can be attributed to a photodecomposition product. The fluorescence quantum yields indicate that the emission of MV2+-exchanged ZrP is of the same order of magnitude as that of MV2+ in water, indicating a strong deactivation of the excited state by non-radiative pathways.

