Similar literature
20 similar documents found.
1.
Social unrest is endemic in many societies, and recent news has drawn attention to happenings in Latin America, the Middle East, and Eastern Europe. Civilian populations mobilize, sometimes spontaneously and sometimes in an organized manner, to raise awareness of key issues or to demand changes in governing or other organizational structures. It is of key interest to social scientists and policy makers to forecast civil unrest using indicators observed on media such as Twitter, news, and blogs. We present an event forecasting model using a notion of activity cascades in Twitter (proposed by Gonzalez-Bailon et al., 2011) to predict the occurrence of protests in three countries of Latin America: Brazil, Mexico, and Venezuela. The basic assumption is that the emergence of a suitably detected activity cascade is a precursor or a surrogate to a real protest event that will happen “on the ground.” Our model supports the theoretical characterization of large cascades using spectral properties and uses properties of detected cascades to forecast events. Experimental results on many datasets, including the recent June 2013 protests in Brazil, demonstrate the effectiveness of our approach.

2.
Fan M, Wong KC, Ryu T, Ravasi T, Gao X. PLoS ONE 2012, 7(6): e39475
With rapid advances in the development of DNA sequencing technologies, a plethora of high-throughput genome and proteome data from a diverse spectrum of organisms have been generated. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. Traditional database-based domain prediction methods cannot identify novel domains, however, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the shared hash seeds between the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph. A backward graph percolation algorithm that efficiently identifies the domains is proposed. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. When compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains, while it was also three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatase (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
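The indexing step described in this abstract can be illustrated with a minimal sketch. It assumes plain contiguous k-mers as the seeds, because the abstract does not specify SECOM's actual hash seed function; the function name `seed_graph`, the parameter `k`, and the toy sequences are hypothetical. The community-finding and percolation steps are not shown.

```python
from collections import defaultdict
from itertools import combinations


def seed_graph(proteins, k=3):
    """Index proteins by k-mer seeds and weight each protein pair
    by the number of distinct seeds the two sequences share.

    Assumption: contiguous k-mers stand in for SECOM's hash seeds.
    """
    index = defaultdict(set)  # seed -> names of proteins containing it
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    weights = defaultdict(int)  # (protein_a, protein_b) -> shared seed count
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy sequences (hypothetical): p1 and p2 share the seeds TAY, AYI and YIA.
proteins = {"p1": "MKTAYIAK", "p2": "GGTAYIAQ", "p3": "WWWWHHHH"}
graph = seed_graph(proteins)
print(graph)  # {('p1', 'p2'): 3}
```

Because every protein is visited once per seed, the index avoids the all-against-all alignment that the abstract identifies as the bottleneck; only proteins that actually share a seed ever become a pair.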

3.
We study properties of multidomain proteins from a graph theoretical perspective. In particular, we demonstrate connections between properties of the domain overlap graph and certain variants of Dollo parsimony models. We apply our graph theoretical results to address several interrelated questions: do proteins acquire new domains infrequently, or often enough that the same combinations of domains will be created repeatedly through independent events? Once domain architectures are created do they persist? In other words, is the existence of ancestral proteins with domain compositions not observed in contemporary proteins unlikely? Our experimental results indicate that independent merges of domain pairs are not uncommon in large superfamilies.

4.
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.

5.
An algorithm for determining protein domain structure is proposed. Domain structures obtained by applying the algorithm have been compared with available data. The method is based on an entirely physical model of van der Waals interactions that, as illustrated in this work, reflects the distribution of electron density. Various levels of hierarchy in the protein spatial structure are discerned by analyzing the interaction energy between structural units of different scales. Thus the level of energy hierarchy plays the role of the sole parameter, and the method obviates the use of complicated geometrical criteria with numerous fitting parameters. The algorithm readily and accurately locates domains formed by continuous segments of the protein chain as well as those comprising non-sequential segments, and it sets no limit on the number of segments in a domain. We have analyzed 309 protein structures. Among 277 structures for which our results could be compared with the domain definitions made in other works, 243 showed complete or partial coincidence, and in only 34 cases did the domain structures prove substantially different. The domains delineated with our approach may coincide with the reference definition at different levels of the globule hierarchy. Along with defining the domain structure, our approach allows one to consider the protein spatial structure in terms of the spatial distribution of the interaction energy in order to establish the correspondence between the hierarchy of energy distribution and the hierarchy of structural elements.

6.
Parsimony methods infer phylogenetic trees by minimizing the number of character changes required to explain observed character states. From the perspective of applicability of parsimony methods, it is important to assess whether the characters used to infer phylogeny are likely to provide a correct tree. We introduce a graph theoretical characterization that helps to assess whether a given set of characters is appropriate to use with parsimony methods. Given a set of characters and a set of taxa, we construct a network called the character overlap graph. We show that the character overlap graph for characters that are appropriate to use in parsimony methods is characterized by significant under-representation of subnetworks known as holes, and provide a validation for this observation. This characterization explains the success in constructing evolutionary trees using parsimony methods for some characters (e.g., protein domains) and the lack of such success for other characters (e.g., introns). In the latter case, understanding the obstacles to applying parsimony methods directly has led us to a new approach for detecting inconsistent and/or noisy data. Namely, we introduce the concept of stable characters, which is similar to but less restrictive than the well-known concept of pairwise compatible characters. Application of this approach to introns produces an evolutionary tree consistent with the Coelomata hypothesis.
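The overlap graph and its holes can be illustrated with a short sketch. The abstract does not define the graph's edges, so this assumes that two characters are joined whenever at least one taxon possesses both, and it only looks for the smallest holes, chordless cycles of length four. The function names and the toy taxa are hypothetical.

```python
from itertools import combinations


def overlap_graph(taxa):
    """Return the edge set of the character overlap graph.

    Assumption: characters a and b are adjacent if some taxon has both.
    """
    edges = set()
    for chars in taxa.values():
        for a, b in combinations(sorted(chars), 2):
            edges.add((a, b))
    return edges


def has_edge(edges, a, b):
    return (min(a, b), max(a, b)) in edges


def chordless_squares(edges):
    """List holes of length four: 4-cycles with neither diagonal present."""
    nodes = sorted({n for e in edges for n in e})
    found = []
    for a, b, c, d in combinations(nodes, 4):
        # Four nodes admit exactly three distinct cyclic orderings.
        for w, x, y, z in ((a, b, c, d), (a, b, d, c), (a, c, b, d)):
            sides = ((w, x), (x, y), (y, z), (z, w))
            is_cycle = all(has_edge(edges, p, q) for p, q in sides)
            has_chord = has_edge(edges, w, y) or has_edge(edges, x, z)
            if is_cycle and not has_chord:
                found.append((w, x, y, z))
    return found


# Toy data (hypothetical): four taxa whose character pairs form a square.
taxa = {"t1": {"A", "B"}, "t2": {"B", "C"}, "t3": {"C", "D"}, "t4": {"A", "D"}}
print(chordless_squares(overlap_graph(taxa)))  # [('A', 'B', 'C', 'D')]
```

Adding a taxon that carries both A and C introduces a chord and removes the hole, which is the sense in which holes signal characters that overlap in a way a single tree cannot explain.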

7.
The number of amino acid residues in the S1 ribosomal protein varies widely among bacteria: from 111 residues in Spiroplasma kunkelii to 863 in Treponema pallidum. The architecture of this protein is traditionally (in particular, because its spatial structure is unknown) represented as repeated S1 domains, the copy number of which depends on the protein length. Data on the copy number and boundaries of these domains are available in specialized databases, such as SMART, Pfam, and PROSITE; however, these data can differ considerably for the same object. In this work, we used an approach based on analysis of predicted secondary structure (PsiPred program). This allowed us to detect the structural domains in S1 protein sequences; their copy number varied from one to six. Alignment of the S1 proteins containing different numbers of domains with the S1 RNA-binding domain of Escherichia coli polynucleotide phosphorylase revealed a domain within this family that displays maximal homology to the E. coli domain. This conserved domain migrates along the chain, and its location in the proteins with different numbers of domains follows a certain pattern. As in the S1 domain of polynucleotide phosphorylase, residues Phe19, Phe22, His34, Asp64, and Arg68 in this conserved domain are clustered on the surface to form an RNA-binding site.

8.

Background

Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology), for example. However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task.

Results

We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach “CODAC” (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe “GODomainMiner” for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively.

Conclusions

These associations could potentially be used to annotate many of the protein chains in the Protein Databank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
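The common-neighbour idea behind this approach can be sketched as follows. The sketch assumes a tripartite graph in which proteins link to GO terms on one side and to domains on the other, so the common neighbours of a GO term and a domain are the proteins carrying both. The fold-enrichment score (observed over expected under independence) is an assumption chosen for illustration; the abstract does not state the statistic CODAC uses. All identifiers in the toy data are hypothetical.

```python
def common_neighbour_scores(go_of, domains_of):
    """Score each (GO term, domain) pair by the proteins they share.

    Returns {(go, domain): (shared_count, fold_enrichment)}.
    Assumption: enrichment = shared / (|P_go| * |P_domain| / N).
    """
    proteins = set(go_of) & set(domains_of)
    n = len(proteins)

    by_go, by_dom = {}, {}
    for p in proteins:
        for g in go_of[p]:
            by_go.setdefault(g, set()).add(p)
        for d in domains_of[p]:
            by_dom.setdefault(d, set()).add(p)

    scores = {}
    for g, pg in by_go.items():
        for d, pd in by_dom.items():
            shared = len(pg & pd)  # common neighbours in the protein layer
            if shared:
                expected = len(pg) * len(pd) / n
                scores[(g, d)] = (shared, shared / expected)
    return scores


# Toy annotations (hypothetical identifiers).
go_of = {"P1": {"GO:a"}, "P2": {"GO:a"}, "P3": {"GO:b"}, "P4": {"GO:b"}}
domains_of = {"P1": {"D1"}, "P2": {"D1"}, "P3": {"D2"}, "P4": {"D1", "D2"}}
scores = common_neighbour_scores(go_of, domains_of)
print(scores[("GO:b", "D2")])  # (2, 2.0)
```

In this toy case, P4 carries two domains, so its GO term alone cannot say which domain is responsible; pooling evidence across proteins favours D2 over D1 for GO:b, which is the ambiguity the Background section describes.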

9.
Social media are increasingly reflecting and influencing the behavior of other complex systems. In this paper we investigate the relations between the well-known micro-blogging platform Twitter and financial markets. In particular, we consider, over a period of 15 months, the Twitter volume and sentiment about the 30 stock companies that form the Dow Jones Industrial Average (DJIA) index. We find a relatively low Pearson correlation and Granger causality between the corresponding time series over the entire time period. However, we find a significant dependence between the Twitter sentiment and abnormal returns during the peaks of Twitter volume. This is valid not only for the expected Twitter volume peaks (e.g., quarterly announcements), but also for peaks corresponding to less obvious events. We formalize the procedure by adapting the well-known “event study” from economics and finance to the analysis of Twitter data. The procedure makes it possible to automatically identify events as Twitter volume peaks, to compute the prevailing sentiment (positive or negative) expressed in tweets at these peaks, and finally to apply the “event study” methodology to relate them to stock returns. We show that the sentiment polarity of Twitter peaks implies the direction of cumulative abnormal returns. The amount of cumulative abnormal returns is relatively low (about 1–2%), but the dependence is statistically significant for several days after the events.
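The cumulative abnormal return at the core of an event study can be sketched in a few lines. This assumes the simple market-adjusted model, in which the abnormal return on day t is the stock's return minus the index return; the abstract does not say which expected-return model the authors used. The function name and the sample returns are hypothetical.

```python
def cumulative_abnormal_returns(stock, market, event_day, window):
    """Running sum of abnormal returns from event_day onward.

    Assumption: abnormal return = stock return - market return
    (market-adjusted model), not necessarily the paper's model.
    """
    car, total = [], 0.0
    last = min(event_day + window, len(stock))
    for t in range(event_day, last):
        total += stock[t] - market[t]
        car.append(total)
    return car


# Hypothetical daily returns; the event (a Twitter volume peak) is day 1.
stock = [0.00, 0.01, 0.02, 0.01]
market = [0.00, 0.00, 0.01, 0.01]
car = cumulative_abnormal_returns(stock, market, event_day=1, window=3)

# A positive final value would match a positive-sentiment peak.
direction = "positive" if car[-1] > 0 else "negative"
print(direction)  # positive
```

Comparing the sign of the final cumulative value with the prevailing sentiment at each detected peak is one way to read the abstract's claim that sentiment polarity implies the direction of the returns.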

10.
11.
Src homology-3 (SH3) domains mediate important protein-protein interactions in a variety of normal and pathological cellular processes, thus providing an attractive target for the selective interference of SH3-dependent signaling events that govern these processes. Most SH3 domains recognize proline-rich peptides with low affinity and poor selectivity, and the goal to design potent and specific ligands for various SH3 domains remains elusive. Better understanding of the molecular basis for SH3 domain recognition is needed in order to design such ligands with potency and specificity. In this report, we seek to define a clear recognition preference of the specificity pocket of the Abl SH3 domain using targeted synthetic peptide libraries. High-resolution affinity panning coupled with mass spectrometric readout allows for quick identification of Trp as the preferred fourth residue in the decapeptide ligand APTWSPPPPP, which binds to Abl SH3 four times stronger than does the decapeptide containing Tyr or Phe in the fourth position. This finding is in contrast to several reports that Tyr is the only residue selected from phage displayed peptide libraries that interacts with the specificity pocket of Abl SH3. This simple, unbiased approach can fine-tune the affinity and selectivity of both natural and unnatural SH3 ligands whose consensus binding sequence has been pre-defined by combinatorial library methods.

12.
Background

Prescribing errors are a major source of morbidity and mortality and represent a significant patient safety concern. Evidence suggests that trainee doctors are responsible for most prescribing errors. Understanding the factors that influence prescribing behavior may lead to effective interventions to reduce errors. Existing investigations of prescribing errors have been based on Human Error Theory but not on other relevant behavioral theories. The aim of this study was to apply a broad theory-based approach using the Theoretical Domains Framework (TDF) to investigate prescribing in the hospital context among a sample of trainee doctors.

Method

Semistructured interviews, based on 12 theoretical domains, were conducted with 22 trainee doctors to explore views, opinions, and experiences of prescribing and prescribing errors. Content analysis was conducted, followed by applying relevance criteria and a novel stage of critical appraisal, to identify which theoretical domains could be targeted in interventions to improve prescribing.

Results

Seven theoretical domains met the criteria of relevance: "social professional role and identity," "environmental context and resources," "social influences," "knowledge," "skills," "memory, attention, and decision making," and "behavioral regulation." From critical appraisal of the interview data, "beliefs about consequences" and "beliefs about capabilities" were also identified as potentially important domains. Interrelationships between domains were evident. Additionally, the data supported theoretical elaboration of the domain behavioral regulation.

Conclusions

In this investigation of hospital-based prescribing, participants' attributions about causes of errors were used to identify domains that could be targeted in interventions to improve prescribing. In a departure from previous TDF practice, critical appraisal was used to identify additional domains that should also be targeted, despite participants' perceptions that they were not relevant to prescribing errors. These were beliefs about consequences and beliefs about capabilities. Specifically, in the light of the documented high error rate, beliefs that prescribing errors were not likely to have consequences for patients and that trainee doctors are capable of prescribing without error should also be targeted in an intervention. This study is the first to suggest critical appraisal for domain identification and to use interview data to propose theoretical elaborations and interrelationships between domains.

13.
Twitter is a major social media platform in which users send and read messages (“tweets”) of up to 140 characters. In recent years this communication medium has been used by those affected by crises to organize demonstrations or find relief. Because traffic on this media platform is extremely heavy, with hundreds of millions of tweets sent every day, it is difficult to differentiate between times of turmoil and times of typical discussion. In this work we present a new approach to addressing this problem. We first assess several possible “thermostats” of activity on social media for their effectiveness in finding important time periods. We compare methods commonly found in the literature with a method from economics. By combining methods from computational social science with methods from economics, we introduce an approach that can effectively locate crisis events in the mountains of data generated on Twitter. We demonstrate the strength of this method by using it to locate the social events relating to the Occupy Wall Street movement protests at the end of 2011.

14.
RhoA-activated kinases (ROCKs) are potent effectors of RhoA signaling for regulation of the cytoskeleton. ROCKs have been shown to be localized to several different subcellular locations, suggesting that their localization is context specific and regulated. However, the signaling mechanisms that control ROCK localization have not been clearly described. In this study we measured ROCKII localization following stimulation with the chemokine CXCL12 or adhesion to collagen 1. Strikingly, each of these extracellular signals targeted ROCKII to membrane protrusions. We further determined that both RhoA and PI3-kinase signaling are required for these stimuli to induce efficient membrane localization. Furthermore, we used a mutational approach to examine two separate domains predicted to respond to these localization signals: the Rho Binding Domain (RBD) and the Pleckstrin Homology (PH) domain. Unexpectedly, we found that these two domains work synergistically to lead to membrane localization. This suggests a novel mechanism for controlling ROCKII localization at the membrane, in which the ROCKII C-terminus acts as a coincidence detector for spatial regulatory signals. In other words, efficient membrane targeting requires the ROCKII RBD to receive the RhoA signal and the PH domain to receive the phospholipid signal.

15.
Sistla RK, K V B, Vishveshwara S. Proteins 2005, 59(3): 616-626
We present a novel method for the identification of structural domains and domain interface residues in proteins by a graph spectral method. This method converts the three-dimensional structure of the protein into a graph by using atomic coordinates from the PDB file. Domain definitions are obtained by constructing either a protein backbone graph or a protein side-chain graph. The graph is constructed based on the interactions between amino acid residues in the three-dimensional structure of the protein. The spectral parameters of such a graph contain information regarding the domains and subdomains in the protein structure. This is based on the fact that the interactions among amino acids are more numerous within a domain than across domains. This is evident in the spectra of the protein backbone and the side-chain graphs, thus differentiating the structural domains from one another. Further, residues that occur at the interface of two domains can also be easily identified from the spectra. This method is simple, elegant, and robust. Moreover, a single numeric computation yields both the domain definitions and the interface residues.
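A generic form of spectral partitioning can be sketched as follows. The abstract does not give the matrix or weighting the authors use, so this assumes an unweighted residue contact graph and splits it by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue), found here by shifted power iteration in plain Python. The function name and the six-node toy graph are hypothetical.

```python
def fiedler_partition(adj, iterations=200):
    """Split a connected graph in two by the sign of its Fiedler vector.

    adj maps each node to the set of its neighbours. Power iteration on
    (shift*I - L), with the constant vector projected out at every step,
    converges to the eigenvector of the second-smallest eigenvalue of
    the graph Laplacian L.
    """
    nodes = sorted(adj)
    n = len(nodes)
    # 2 * max degree bounds the largest Laplacian eigenvalue from above.
    shift = 2 * max(len(adj[v]) for v in nodes)
    # Deterministic, zero-mean starting vector.
    x = {v: i - (n - 1) / 2 for i, v in enumerate(nodes)}
    for _ in range(iterations):
        y = {}
        for v in nodes:
            laplacian_xv = len(adj[v]) * x[v] - sum(x[u] for u in adj[v])
            y[v] = shift * x[v] - laplacian_xv
        mean = sum(y.values()) / n
        norm = sum((y[v] - mean) ** 2 for v in nodes) ** 0.5
        x = {v: (y[v] - mean) / norm for v in nodes}
    negative = {v for v in nodes if x[v] < 0}
    return negative, set(nodes) - negative


# Toy contact graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part_a, part_b = fiedler_partition(adj)
print(sorted(part_a), sorted(part_b))  # [0, 1, 2] [3, 4, 5]
```

The two densely connected triangles fall on opposite sides, mirroring the abstract's premise that contacts are denser within a domain than across domains. The nodes whose vector components lie closest to zero (2 and 3 here) sit at the cut, which is one way interface residues could be read from the same computation.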

16.
PDZ domains are modular protein interaction domains that are present in metazoans and bacteria. These domains possess unique structural features that allow them to interact with the C-terminal residues of their ligands. The Escherichia coli essential periplasmic protein DegP contains two PDZ domains attached to the C-terminal end of the protease domain. In this study we examined the role of each PDZ domain in the protease and chaperone activities of this protein. Specifically, DegP mutants with either one or both PDZ domains deleted were generated and tested to determine their protease and chaperone activities, as well as their abilities to sequester unfolded substrates. We found that the PDZ domains in DegP have different roles; the PDZ1 domain is essential for protease activity and is responsible for recognizing and sequestering unfolded substrates through C-terminal tags, whereas the PDZ2 domain is mostly involved in maintaining the hexameric cage of DegP. Interestingly, neither of the PDZ domains was required for the chaperone activity of DegP. In addition, we found that the loops connecting the protease domain to PDZ1 and connecting PDZ1 to PDZ2 are also essential for the protease activity of the hexameric DegP protein. New insights into the roles of the PDZ domains in the structure and function of DegP are provided. These results imply that DegP recognizes substrate molecules targeted for degradation and substrate molecules targeted for refolding in different manners and suggest that the substrate recognition mechanisms may play a role in the protease-chaperone switch, dictating whether the substrate is degraded or refolded.

17.
Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in the PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more pressing. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph theoretical approach. This algorithm is based on a center-based clustering approach. For constructing initial clusters, members of an independent dominating set for the graph representation of a protein are considered as the centers. A distance matrix is then defined for these clusters. To obtain final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm, subject to some thresholds. The thresholds are computed using a training set consisting of 50 protein chains. The algorithm is implemented in C++ and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with those of seven automatic methods on five publicly available benchmarks. The results show that ProDomAs outperforms the other methods on these benchmarks. The performance of ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. Proteins 2014; 82:1937–1946. © 2014 Wiley Periodicals, Inc.
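The initial clustering step can be illustrated with a small sketch. It relies on the fact that any maximal independent set of a graph is also a dominating set, and builds one greedily. How ProDomAs actually selects its independent dominating set is not stated in the abstract, so the greedy highest-degree-first order, the function names, and the toy path graph are assumptions. The merging stage is not shown.

```python
def independent_dominating_set(adj):
    """Greedily build a maximal independent set, which is also dominating.

    adj maps each node to the set of its neighbours. Nodes are visited in
    order of decreasing degree (ties broken by node id); this order is an
    assumption, not the published procedure.
    """
    centers, covered = [], set()
    for v in sorted(adj, key=lambda v: (-len(adj[v]), v)):
        if v not in covered:  # v is not a center and touches no center
            centers.append(v)
            covered.add(v)
            covered.update(adj[v])
    return centers


def initial_clusters(adj, centers):
    """Attach every non-center node to one adjacent center."""
    clusters = {c: {c} for c in centers}
    for v in sorted(adj):
        if v not in clusters:
            home = min(c for c in centers if c in adj[v])
            clusters[home].add(v)
    return clusters


# Toy graph (hypothetical): a path of five residues, 1-2-3-4-5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
centers = independent_dominating_set(adj)
print(centers)  # [2, 4]
print(initial_clusters(adj, centers))  # {2: {1, 2, 3}, 4: {4, 5}}
```

Independence keeps the centers from crowding one another, and domination guarantees that every residue has a center next to it, so each residue lands in some initial cluster before any merging begins.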

18.
Detecting spreading outbreaks in social networks with sensors is of great significance in applications. Inspired by the formation mechanism of humans’ physical sensations to external stimuli, we propose a new method to detect the influence of spreading by constructing excitable sensor networks. Exploiting the amplifying effect of excitable sensor networks, our method can better detect small-scale spreading processes. At the same time, it can also distinguish large-scale diffusion instances due to the self-inhibition effect of excitable elements. Through simulations of diverse spreading dynamics on typical real-world social networks (Facebook, coauthor, and email social networks), we find that the excitable sensor networks are capable of detecting and ranking spreading processes in a much wider range of influence than other commonly used sensor placement methods, such as random, targeted, acquaintance and distance strategies. In addition, we validate the efficacy of our method with diffusion data from a real-world online social system, Twitter. We find that our method can detect more spreading topics in practice. Our approach provides a new direction in spreading detection and should be useful for designing effective detection methods.

19.
Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% and 34.3% of fusion and fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ~9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.

20.
Motivation

Unravelling the rules underlying protein-protein and protein-ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains which focus their binding targets on short protein sequences and play a key role in the frame of protein-protein interactions. High-throughput techniques permit the whole proteome scanning of each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for the development of in silico experiments to validate experimental results and of computational tools for the inference of domain-peptide interactions.

Results

We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind proline-rich short sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique aimed at representing binding information on the basis of the domain-peptide contact residues in complexes of known structure. Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the 'curse of dimension'. Our results display an accuracy >90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method, moreover, shows a generalization capability, inferring specificity of unknown SH3 domains displaying some degree of similarity with the known data.
