Similar documents
1.
2.
  1. A time‐consuming challenge faced by camera trap practitioners is the extraction of meaningful data from images to inform ecological management. An increasingly popular solution is automated image classification software. However, most solutions are not sufficiently robust to be deployed on a large scale because they lack location invariance when models are transferred between sites. This prevents optimal use of ecological data and results in significant expenditure of time and resources to annotate images and retrain deep learning models.
  2. We present a method ecologists can use to develop optimized location invariant camera trap object detectors by (a) evaluating publicly available image datasets characterized by high intradataset variability in training deep learning models for camera trap object detection and (b) using small subsets of camera trap images to optimize models for high accuracy domain‐specific applications.
  3. We collected and annotated three datasets of images of striped hyena, rhinoceros, and pigs from the image‐sharing websites Flickr and iNaturalist (FiN) to train three object detection models. We compared the performance of these models with that of three models trained on the Wildlife Conservation Society and Camera CATalogue datasets, when tested on out‐of‐sample Snapshot Serengeti datasets. We then increased FiN model robustness by infusing small subsets of camera trap images into training.
  4. In all experiments, the mean Average Precision (mAP) of the FiN‐trained models was significantly higher (82.33%–88.59%) than that achieved by the models trained only on camera trap datasets (38.5%–66.74%). Infusion further improved mAP by 1.78%–32.08%.
  5. Ecologists can use FiN images to train deep learning object detection solutions for camera trap image processing, yielding location invariant, robust, out‐of‐the‐box software. Models can be further optimized by infusing 5%–10% camera trap images into the training data. This would allow AI technologies to be deployed on a large scale in ecological applications. Datasets and code related to this study are open source and available from this repository: https://doi.org/10.5061/dryad.1c59zw3tx.
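The abstract reports performance as mean Average Precision (mAP) but does not state which AP variant or IoU threshold was used. As a minimal sketch, assuming detections have already been matched to ground-truth boxes (so each detection is simply flagged as a true or false positive), the following computes all-point interpolated AP for one class and averages per-class APs into mAP. The function names are hypothetical and are not taken from the study's released code.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for a single class.

    scores: detection confidences; is_true_positive: per-detection match flags
    (IoU matching assumed to be done upstream); num_ground_truth: must be > 0.
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP: unweighted mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, three detections scored 0.9, 0.8 and 0.7, of which the first and third are correct against two ground-truth objects, give an AP of 5/6.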

3.
The currently available protein sequence data far exceed the experimental capacity to annotate their function, so in silico annotation, i.e. annotation using computational methods, is becoming increasingly important. Such annotation is inevitably a prediction, but it can be an important starting point for further experimental studies. Here we present SDPsite, a method for predicting protein functional sites based on the identification of protein specificity determinants. Taking as input a protein sequence alignment and a phylogenetic tree, the algorithm predicts conserved positions and specificity determinants, maps them onto the protein's 3D structure, and searches for clusters of the predicted positions. Comparison of the predictions with experimental data, and with the performance of several other methods for predicting functional sites, shows that SDPsite agrees well with experiment and outperforms most previously available methods. SDPsite is publicly available at http://bioinf.fbb.msu.ru/SDPsite.
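SDPsite's specificity-determinant step depends on grouping sequences by the phylogenetic tree and is not reproduced here. As a simpler illustration of only the "conserved positions" idea, the sketch below scores each alignment column by Shannon entropy and keeps low-entropy columns. The 0.5-bit cutoff, the gap characters and the function names are assumptions for illustration, not SDPsite's parameters.

```python
import math
from collections import Counter

GAP_CHARS = "-."  # assumed gap symbols


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa not in GAP_CHARS]
    if not residues:
        return float("nan")  # all-gap column: entropy undefined
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conserved_positions(alignment, max_entropy=0.5):
    """0-based indices of columns whose entropy is at most max_entropy bits."""
    length = len(alignment[0])
    positions = []
    for j in range(length):
        h = column_entropy([seq[j] for seq in alignment])
        if h <= max_entropy:  # NaN comparisons are False, so all-gap columns are skipped
            positions.append(j)
    return positions
```

A fully conserved column scores 0 bits and a column with four different residues scores 2 bits, so only near-invariant columns pass the cutoff.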

4.
Amid the COVID‐19 crisis, we made a sizeable effort to collect a large number of experimentally validated drug–virus association entries from the literature by text mining, and built a human drug–virus association database. To the best of our knowledge, it is the largest publicly available drug–virus database so far. Next, we developed a novel weight regularization matrix factorization approach, termed WRMF, for in silico drug repurposing by integrating three networks: the known drug–virus association network, the drug–drug chemical structure similarity network, and the virus–virus genomic sequence similarity network. Specifically, WRMF adds a weight to each training sample to reduce the influence of negative samples (i.e. drug–virus pairs with no known association). A comparison on the curated drug–virus database shows that WRMF outperforms a few state‐of‐the‐art methods. In addition, we used two other public datasets (Cdataset and HMDD V2.0) to assess WRMF's performance. A case study also demonstrated the accuracy and reliability of WRMF in inferring potential drugs for the novel virus. In summary, we offer a useful toolkit comprising a novel drug–virus association database and a powerful method, WRMF, to repurpose potential drugs for new viruses.
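The core idea described for WRMF, down-weighting unobserved (negative) drug–virus pairs during factorization, can be sketched with a generic weighted matrix factorization trained by stochastic gradient descent. This is not the published WRMF: it omits the drug–drug and virus–virus similarity networks, and the rank, negative weight, learning rate, regularization and function names below are arbitrary assumptions.

```python
import random


def weighted_mf(Y, rank=2, neg_weight=0.1, lam=0.01, lr=0.05, epochs=500, seed=0):
    """Factorize a 0/1 drug-by-virus matrix Y into U @ V^T with per-entry weights.

    Known associations (1) get weight 1.0; unobserved pairs (0) get neg_weight,
    so they pull predictions toward 0 less strongly than positives pull toward 1.
    """
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                w = 1.0 if Y[i][j] else neg_weight
                err = Y[i][j] - sum(U[i][k] * V[j][k] for k in range(rank))
                for k in range(rank):
                    u, v = U[i][k], V[j][k]  # cache old values for a simultaneous update
                    U[i][k] += lr * (w * err * v - lam * u)
                    V[j][k] += lr * (w * err * u - lam * v)
    return U, V


def weighted_loss(Y, U, V, neg_weight=0.1):
    """Weighted squared reconstruction error (regularization term omitted)."""
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            w = 1.0 if y else neg_weight
            pred = sum(a * b for a, b in zip(U[i], V[j]))
            total += w * (y - pred) ** 2
    return total
```

After training, a high predicted score for an unobserved pair (i, j) would be read as a candidate drug for that virus; the low negative weight is what keeps such pairs from being forced to zero.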

5.
Temperature is a fundamental environmental factor that shapes the evolution of organisms. Learning thermal determinants of protein sequences in evolution thus has profound significance for basic biology, drug discovery, and protein engineering. Here, we use a data set of over 3 million BRENDA enzymes labeled with optimal growth temperatures (OGTs) of their source organisms to train a deep neural network model (DeepET). The protein‐temperature representations learned by DeepET provide a temperature‐related statistical summary of protein sequences and capture structural properties that affect thermal stability. For prediction of enzyme optimal catalytic temperatures and protein melting temperatures via a transfer learning approach, our DeepET model outperforms classical regression models trained on rationally designed features and other deep‐learning‐based representations. DeepET thus holds promise for understanding enzyme thermal adaptation and guiding the engineering of thermostable enzymes.

6.

Background  

The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties.

7.
8.
A current goal in medical genetics is to identify single amino acid substitutions (SAPs), also known as non-synonymous SNPs (nsSNPs), and to classify them as functional (deleterious) or non-functional (neutral). The primary aim is to elucidate the mechanisms through which functional SAPs exert their effects, and ultimately to interrogate this information for association with complex phenotypes. This work focuses on the coagulation factors of the coagulation cascade pathway, which plays a vital role in maintaining hemostasis in the human body. We developed an integrated coagulation variation database, CoagVDb, which draws on biological information from public databases such as NCBI, OMIM, UniProt and PDB, together with SAP data (rsIDs/variants). CoagVDb, enriched with computational prediction scores, classifies SAPs as either deleterious or tolerated. It also incorporates various other properties, such as amino acid composition, secondary structure elements, solvent accessibility, ordered/disordered regions, conservation, and the presence of disulfide bonds. This specialized database integrates prediction scores from different computational methods with gene, protein, and disease information. We hope our database will act as a useful reference resource for hematologists to reveal protein structure–function relationships and disease genotype–phenotype correlations.

Electronic supplementary material

The online version of this article (doi:10.1186/s40659-015-0028-5) contains supplementary material, which is available to authorized users.

9.
Connecting protein sequence to function is becoming increasingly relevant as high-throughput sequencing studies accumulate large amounts of genomic data. To go beyond existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether their function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between ancestry and functional signal is decoupled. The amino acids at these sites are masked from recognition by other prediction tools, yet they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that underwent inversion after gene duplication, mostly located in the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications for protein functional annotation and sequence evolution. The method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.

10.
Restriction‐site‐associated DNA sequencing (RADseq) has become an accessible way to obtain genome‐wide data in the form of single‐nucleotide polymorphisms (SNPs) for phylogenetic inference. Nonetheless, how differences among RADseq methods influence phylogenetic estimation is poorly understood, because most comparisons have relied on conceptual predictions rather than empirical tests. We examine how differences between ddRAD and 2bRAD data influence phylogenetic estimation in two non‐model frog groups. We compare the impact of method choice on phylogenetic information, missing data, and allelic dropout at different sequencing depths. Given that researchers must balance input (funding, time) with output (amount and quality of data), we also compare laboratory effort, computational time, monetary costs, and the repeatability of library preparation and sequencing. Both 2bRAD and ddRAD estimated well‐supported trees, even at low sequencing depths, and had comparable amounts of missing data, patterns of allelic dropout, and phylogenetic signal. Compared to ddRAD, 2bRAD produced more repeatable datasets, had simpler laboratory protocols, and had an overall faster bioinformatics assembly. However, 2bRAD data yielded many fewer parsimony‐informative sites per SNP when processed with native pipelines, highlighting a need for further investigation into the effects of each pipeline on the resulting datasets. Our study underscores the importance of comparing RADseq methods on empirical datasets, in terms of both expected results and theoretical performance, before undertaking costly experiments.
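Part of the 2bRAD versus ddRAD comparison rests on counting parsimony-informative sites. Under the standard definition, a site is parsimony-informative when at least two character states each occur in at least two taxa. A minimal sketch, assuming an alignment of equal-length strings and treating N, - and ? as missing; other IUPAC ambiguity codes are naively counted as distinct states here, which real pipelines may handle differently.

```python
from collections import Counter

MISSING = set("N-?")  # assumed missing-data symbols


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(s.upper() for s in column if s.upper() not in MISSING)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def count_informative_sites(alignment):
    """Count parsimony-informative columns across equal-length sequences."""
    return sum(is_parsimony_informative(col) for col in zip(*alignment))
```

A column such as AAGG is informative, whereas AAAG (a singleton) and AAGN (missing data leaves only one G) are not.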

11.
Endozoochory, a mutualistic interaction between plants and frugivores, is one of the key processes responsible for maintaining tropical biodiversity. Islands, which host a smaller subset of plants and frugivores than mainland communities, offer an interesting setting for understanding how plant–frugivore communities are organized relative to mainland sites. We examined the relative influence of functional traits and phylogenetic relationships on plant–seed disperser interactions at an island site and a mainland site. The island site allowed us to investigate the organization of the plant–seed disperser community in the natural absence of key frugivore groups of the Asian tropics (bulbuls and barbets). The endemic Narcondam Hornbill was the most abundant frugivore on the island and played a central role in the community. Species strength of frugivores (a measure of the relevance of frugivores for plants) was positively associated with their abundance. Among plants, figs had the highest species strength and played a central role in the community. The island–mainland comparison revealed that the island plant–seed disperser community was more asymmetric, connected, and nested than the mainland community. Neither phylogenetic relationships nor functional traits (after controlling for phylogenetic relationships) explained the patterns of interactions between plants and frugivores on the island or the mainland, pointing toward the diffuse nature of plant–frugivore interactions. This diffuseness is likely a consequence of plasticity in foraging behavior and of trait convergence, which help govern the interactions between plants and frugivores. This is one of the few studies to compare plant–seed disperser communities between a tropical island and the mainland, and it demonstrates the key role played by a point‐endemic frugivore in seed dispersal on an island.
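Species strength is commonly defined from dependences: the dependence of plant i on frugivore j is the fraction of plant i's recorded interactions that involve j, and a frugivore's species strength is the sum of these dependences over all plants. A minimal sketch of that definition, assuming a plant-by-frugivore matrix of interaction frequencies; the abstract does not state exactly how the study computed it, and the function name is hypothetical.

```python
def frugivore_strength(matrix):
    """Species strength of each frugivore (column) in a plant-by-frugivore matrix.

    matrix[i][j]: interaction frequency between plant i and frugivore j.
    """
    n_frugivores = len(matrix[0])
    strength = [0.0] * n_frugivores
    for row in matrix:
        total = sum(row)
        if total == 0:
            continue  # a plant with no recorded interactions contributes nothing
        for j, a in enumerate(row):
            strength[j] += a / total  # dependence of this plant on frugivore j
    return strength
```

Because each interacting plant's dependences sum to 1, the strengths of all frugivores add up to the number of plants with at least one interaction, which makes the measure easy to sanity-check.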

12.
High‐resolution experimental structural determination of protein–protein interactions has led to valuable mechanistic insights, yet, owing to the massive number of interactions and to experimental limitations, there is a need for computational methods that can accurately model their structures. Here we explore the use of the recently developed deep learning method AlphaFold to predict structures of protein complexes from sequence. Using a benchmark of 152 diverse heterodimeric protein complexes, we tested multiple implementations and parameters of AlphaFold for accuracy. Remarkably, in many cases (43%) AlphaFold generated near‐native models (medium or high accuracy by Critical Assessment of PRedicted Interactions [CAPRI] criteria) as top‐ranked predictions, greatly surpassing the performance of unbound protein–protein docking (9% success rate for near‐native top‐ranked models); however, AlphaFold modeling of the antibody–antigen complexes within our set was unsuccessful. We identified sequence and structural features associated with lack of AlphaFold success, and we also investigated the impact of the multiple sequence alignment input. Benchmarking of a multimer‐optimized version of AlphaFold (AlphaFold‐Multimer) on a set of recently released antibody–antigen structures confirmed a low success rate for antibody–antigen complexes (11%), and we found that T cell receptor–antigen complexes are likewise not accurately modeled by that algorithm, showing that adaptive immune recognition poses a challenge for the current AlphaFold algorithm and model. Overall, our study demonstrates that end‐to‐end deep learning can accurately model many transient protein complexes, and highlights areas of improvement for future developments to reliably model any protein–protein interaction of interest.
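The "near-native" success rate above counts targets whose top-ranked model reaches medium or high CAPRI accuracy. The sketch below uses the commonly cited cascaded CAPRI thresholds on fraction of native contacts (fnat), ligand RMSD and interface RMSD (both in Å). Treat these thresholds as an assumption to check against the CAPRI evaluation protocol; the abstract does not name the scoring tool the study used, and the function names are hypothetical.

```python
def capri_class(fnat, l_rmsd, i_rmsd):
    """Classify one model with cascaded CAPRI-style thresholds (assumed values)."""
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "acceptable"
    return "incorrect"


def top1_success_rate(top_models, accepted=("high", "medium")):
    """Fraction of targets whose top-ranked model reaches an accepted class.

    top_models: one (fnat, l_rmsd, i_rmsd) tuple per benchmark target.
    """
    hits = sum(1 for m in top_models if capri_class(*m) in accepted)
    return hits / len(top_models)
```

Counting only "high" and "medium" as hits mirrors the near-native criterion quoted in the abstract; adding "acceptable" would give a more lenient success rate.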

13.
Interactions between small molecules and proteins play critical roles in regulating and facilitating diverse biological functions, yet our ability to accurately re-engineer the specificity of these interactions using computational approaches has been limited. One main difficulty, in addition to inaccuracies in energy functions, is the exquisite sensitivity of protein–ligand interactions to subtle conformational changes, coupled with the computational problem of sampling the large conformational space defined by the degrees of freedom of ligands, amino acid side chains, and the protein backbone. Here, we describe two benchmarks for evaluating the accuracy of computational approaches for re-engineering protein–ligand interactions: (i) prediction of enzyme specificity-altering mutations and (ii) prediction of sequence tolerance in ligand binding sites. After finding that current state-of-the-art “fixed backbone” design methods perform poorly on these tests, we develop a new “coupled moves” design method in the program Rosetta that couples changes to protein sequence with alterations in both protein side-chain and protein backbone conformations, and allows for changes in ligand rigid-body and torsion degrees of freedom. We show significantly increased accuracy in predicting both ligand specificity-altering mutations and binding site sequences. These methodological improvements should be useful for many applications of protein–ligand design. The approach also provides insights into the role of subtle conformational adjustments that enable functional changes not only in engineering applications but also in natural protein evolution.

14.
Proteins play important roles in living organisms, and their function is directly linked with their structure. Due to the growing gap between the number of proteins being discovered and the number that have been functionally characterized (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are also presented. In parallel, the paper reviews how the features used by these algorithms have evolved, from classical physicochemical properties and amino acid composition up to text-derived features from the biomedical literature and learned feature representations using autoencoders, together with feature selection and dimensionality reduction techniques. Success stories in the application of these techniques to both general and specific protein function prediction are discussed.
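Amino acid composition, one of the classical features mentioned above, is the relative frequency of each of the 20 standard residues in a sequence, giving a fixed-length vector regardless of sequence length. A minimal sketch with a hypothetical function name; non-standard letters such as X are simply ignored here, which is one of several possible conventions.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """20-dimensional vector of relative frequencies of the standard residues."""
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AA]
    if not residues:
        return [0.0] * len(STANDARD_AA)  # no standard residues: all-zero vector
    n = len(residues)
    return [residues.count(aa) / n for aa in STANDARD_AA]
```

Vectors like this can be fed directly to the classical classifiers the review discusses, such as logistic regression or support vector machines.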

15.
16.
Many proteins involved in signal transduction contain peptide recognition modules (PRMs) that recognize short linear motifs (SLiMs) within their interaction partners. Here, we used large‐scale peptide‐phage display methods to derive optimal ligands for 163 unique PRMs representing 79 distinct structural families. We combined the new data with previous data that we collected for the large SH3, PDZ, and WW domain families to assemble a database containing 7,984 unique peptide ligands for 500 PRMs representing 82 structural families. For 74 PRMs, we acquired enough new data to map the specificity profiles in detail and derived position weight matrices and binding specificity logos based on multiple peptide ligands. These analyses showed that optimal peptide ligands resembled peptides observed in existing structures of PRM‐ligand complexes, indicating that a large majority of the phage‐derived peptides are likely to target natural peptide‐binding sites and could thus act as inhibitors of natural protein–protein interactions. The complete dataset has been assembled in an online database (http://www.prm‐db.org) that will enable many structural, functional, and biological studies of PRMs and SLiMs.
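Position weight matrices of the kind described here can be derived from aligned peptide ligands by counting residues at each position, adding pseudocounts, and taking log-odds against a background distribution. The sketch below assumes pre-aligned peptides of equal length, a pseudocount of 1 and a uniform background over the 20 amino acids; the study's actual alignment, weighting and background model are not given in the abstract, and the function names are hypothetical.

```python
import math

AA = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(peptides, pseudocount=1.0):
    """Log2-odds PWM (one dict per position) from equal-length aligned peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to equal length")
    background = 1.0 / len(AA)  # assumed uniform background
    pwm = []
    for pos in range(length):
        column = [p[pos] for p in peptides]
        total = len(column) + pseudocount * len(AA)
        pwm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AA
        })
    return pwm


def score_peptide(pwm, peptide):
    """Sum of per-position log-odds scores for a candidate peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))
```

Positive entries mark residues enriched over background at a position; the same per-position frequencies are what a binding specificity logo visualizes.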

17.
Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore of interest to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as affinity purification coupled with mass spectrometry (APMS), these large-scale datasets often suffer from high false positive and false negative rates. Here, we present a computational method that predicts co-complexed protein pair (CCPP) relationships using kernel methods from heterogeneous data sources. We show that a diffusion kernel based on random walks on the full network topology yields good performance in predicting CCPPs from protein interaction networks. In the setting of direct ranking, a diffusion kernel performs much better than the mutual clustering coefficient. In the setting of SVM classifiers, a diffusion kernel performs much better than a linear kernel. We also show that combining complementary information improves the performance of our CCPP recognizer. A summation of three diffusion kernels based on two-hybrid, APMS, and genetic interaction networks and three sequence kernels achieves better performance than either the sequence kernels or the diffusion kernels alone. Inclusion of additional features achieves a still better ROC50 of 0.937. Assuming a negative-to-positive ratio of 600:1, the final classifier achieves 89.3% coverage at an estimated false discovery rate of 10%. Finally, we applied our prediction method to two recently described APMS datasets. We find that our predicted positives are highly enriched with CCPPs that are identified by both datasets, suggesting that our method successfully identifies true CCPPs. An SVM classifier trained from heterogeneous data sources provides accurate predictions of CCPPs in yeast.
This computational approach thus provides an inexpensive way to identify protein complexes that extends and complements high-throughput experimental data.
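A diffusion kernel on a graph is typically defined as K = exp(βH), where H = A - D is the negative graph Laplacian (A the adjacency matrix, D the diagonal degree matrix) and β controls how far the random-walk diffusion spreads. A minimal NumPy sketch using an eigendecomposition of the symmetric H; β = 0.5 is an arbitrary assumption, and the study's pairwise kernels for SVM classification of protein pairs are not reproduced here.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """K = exp(beta * (A - D)) for an undirected graph given as an adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    H = A - np.diag(A.sum(axis=1))  # negative graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(H)  # H is symmetric, so eigh applies
    # Matrix exponential via the eigendecomposition H = V diag(lambda) V^T.
    return eigvecs @ np.diag(np.exp(beta * eigvals)) @ eigvecs.T
```

Each row of K sums to 1 because H annihilates the all-ones vector, and K[i, j] decays with network distance, so it can serve directly as a similarity score for ranking candidate protein pairs.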

18.
Mitogen‐activated protein kinases (MAPKs) are broadly used regulators of cellular signaling. However, how these enzymes can be involved in such a broad spectrum of physiological functions is not understood. Systematic discovery of MAPK networks, both experimentally and in silico, has been hindered because MAPKs bind to other proteins with low affinity and mostly in less‐characterized disordered regions. We used a structurally consistent model of kinase‐docking motif interactions to facilitate the discovery of short functional sites in the structurally flexible and functionally under‐explored part of the human proteome, and applied experimental tools specifically tailored to detect low‐affinity protein–protein interactions to validate them in vitro and in cell‐based assays. The combined computational and experimental approach enabled the identification of many novel MAPK‐docking motifs that had eluded other large‐scale protein–protein interaction screens. The analysis produced an extensive list of independently evolved linear binding motifs from a functionally diverse set of proteins. These all target, with characteristic binding specificity, an ancient protein interaction surface on three evolutionarily related but physiologically clearly distinct MAPKs (JNK, ERK, and p38). This inventory of human protein kinase binding sites was compared with those of other organisms to examine how kinase‐mediated partnerships evolved over time. The analysis suggests that most human MAPK‐binding motifs are surprisingly recent evolutionary inventions and that the newly found links highlight previously hidden roles of MAPKs. We propose that short MAPK‐binding stretches are created in disordered protein segments in a variety of ways and that they represent a major resource for ancient signaling enzymes to acquire new regulatory roles.
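The study found docking motifs with a structure-based model, which a regular expression cannot replace. As a rough illustration of what such a short linear motif looks like in sequence, the sketch below scans for a simplified D-motif consensus: two or three basic residues (R/K), a spacer of 2 to 6 residues, then a hydrophobic-x-hydrophobic pair. The pattern, the choice of L/I/V as the hydrophobic set and the function name are simplifying assumptions; curated motif resources define more specific docking-motif classes.

```python
import re

# Simplified D-motif consensus (assumption): 2-3 basic residues, a 2-6 residue
# spacer, then hydrophobic-x-hydrophobic with "hydrophobic" taken as L/I/V.
D_MOTIF = re.compile(r"[RK]{2,3}.{2,6}[LIV].[LIV]")


def find_d_motifs(sequence):
    """Return (0-based start, matched text) for non-overlapping candidate motifs."""
    return [(m.start(), m.group()) for m in D_MOTIF.finditer(sequence.upper())]
```

A pattern this permissive will produce many false positives across a proteome, which is one reason the study combined structural modeling with experimental validation.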

19.
Land‐use intensification is the main driver of the catastrophic decline of insect pollinators. However, land‐use intensification includes multiple processes that act across various scales and should affect pollinator guilds differently depending on their ecology. We aimed to reveal how two main pollinator guilds, wild bees and hoverflies, respond to different land‐use intensification measures, that is, arable field cover (AFC), landscape heterogeneity (LH), and the functional flower composition of local plant communities as a measure of habitat quality. We sampled wild bees and hoverflies using pan traps on 22 dry grassland sites within a highly intensified landscape (NE Germany) in three campaigns. We estimated AFC and LH within consecutive radii (60–3000 m) around the dry grassland sites and estimated the local functional flower composition. Wild bee species richness and abundance were positively affected by LH and negatively by AFC at small scales (140–400 m). In contrast, hoverflies were positively affected by AFC and negatively by LH at larger scales (500–3000 m), where the two landscape parameters were negatively correlated with each other. At small spatial scales, though, LH had a positive effect on hoverfly abundance. Functional flower diversity had no positive effect on pollinators, but conspicuous flowers seemed to increase hoverfly abundance. In conclusion, landscape parameters affect the two pollinator guilds in opposite ways and at different scales. The correlation between landscape parameters may influence the observed relationships between landscape parameters and pollinators. Hence, the effects of land‐use intensification seem to be highly landscape‐specific.
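Measuring a landscape variable within consecutive radii is often used to find a "scale of effect", the radius at which the variable best explains a response. A deliberately simple sketch, assuming one predictor value per site per radius and using absolute Pearson correlation as the selection criterion; the abstract does not state the statistical models used, and a real analysis would normally use a regression model suited to the response (e.g. counts). All names and numbers here are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scale_of_effect(predictor_by_radius, response):
    """Radius whose predictor values correlate most strongly (in absolute value)
    with the per-site response, plus all per-radius correlations."""
    correlations = {r: pearson(vals, response) for r, vals in predictor_by_radius.items()}
    best = max(correlations, key=lambda r: abs(correlations[r]))
    return best, correlations
```

For example, with hypothetical AFC values at 140 m and 3000 m for four sites, the function reports whichever radius tracks bee abundance more closely; opposite-signed correlations at different radii would mirror the guild-specific patterns described above.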

20.
Predictome: a database of putative functional links between proteins
The current deluge of genomic sequences has spawned the creation of tools capable of making sense of the data. Computational and high-throughput experimental methods for generating links between proteins have recently been emerging. These methods effectively act as hypothesis machines, allowing researchers to screen large sets of data to detect interesting patterns that can then be studied in greater detail. Although the potential use of these putative links in predicting gene function has been demonstrated, a central repository for all such links across many genomes would maximize their usefulness. Here we present Predictome, a database of predicted links between the proteins of 44 genomes, based on three computational methods (chromosomal proximity, phylogenetic profiling and domain fusion) and on large-scale experimental screens of protein–protein interaction data. Combining data from various predictive methods in one database allows them to be compared with each other and their correlation with known pathway information to be visualized. As a repository for such data, Predictome is an ongoing resource for the community, providing functional relationships among proteins as new genomic data emerge. Predictome is available at http://predictome.bu.edu.
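Phylogenetic profiling links proteins whose presence/absence patterns across a set of genomes are identical or nearly so, on the reasoning that functionally coupled proteins tend to be gained and lost together. A minimal sketch, assuming each protein's profile is a string of 0/1 flags over the same ordered list of genomes and using Hamming distance with a hypothetical threshold; Predictome's exact linking criteria are not given in the abstract.

```python
from itertools import combinations


def phylogenetic_profile_links(profiles, max_distance=0):
    """Link protein pairs whose presence/absence profiles differ in at most
    max_distance genomes (Hamming distance).

    profiles: dict mapping protein name -> string of '0'/'1' flags, one per genome.
    Returns (protein_a, protein_b, distance) tuples in name order.
    """
    links = []
    for a, b in combinations(sorted(profiles), 2):
        distance = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if distance <= max_distance:
            links.append((a, b, distance))
    return links
```

With 44 genomes, each profile would be a 44-character string; loosening max_distance trades precision for more predicted links.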


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417