
1.

Predicting protein-protein interactions using signature products (Total citations: 6; self-citations: 0; citations by others: 6)

Martin S Roe D Faulon JL 《Bioinformatics (Oxford, England)》2005,21(2):218-226


2.

Assessing computational methods of cis-regulatory module prediction

Su J Teichmann SA Down TA 《PLoS computational biology》2010,6(12):e1001020


3.

An assessment of the uses of homologous interactions

Saeed R Deane C 《Bioinformatics (Oxford, England)》2008,24(5):689-695

MOTIVATION: Protein-protein interactions have proved to be a valuable starting point for understanding the inner workings of the cell. Computational methodologies have been built which both predict interactions and use interaction datasets in order to predict other protein features. Such methods require gold standard positive (GSP) and negative (GSN) interaction sets. Here we examine and demonstrate the usefulness of homologous interactions in predicting good quality positive and negative interaction datasets. RESULTS: We generate GSP interaction sets as subsets from experimental data using only interaction and sequence information. We can therefore produce sets for several species (many of which at present have no identified GSPs). Comprehensive error rate testing demonstrates the power of the method. We also show how the use of our datasets significantly improves the predictive power of algorithms for interaction prediction and function prediction. Furthermore, we generate GSN interaction sets for yeast and examine the use of homology along with other protein properties such as localization, expression and function. Using a novel method to assess the accuracy of a negative interaction set, we find that the best single selector for negative interactions is a lack of co-function. However, an integrated method using all the characteristics shows significant improvement over any current method for identifying GSN interactions. The nature of homologous interactions is also examined and we demonstrate that interologs are found more commonly within species than across species. CONCLUSION: GSP sets built using our homologous verification method are demonstrably better than standard sets in terms of predictive ability. We can build such GSP sets for several species. When generating GSNs we show a combination of protein features and lack of homologous interactions gives the highest quality interaction sets. 
AVAILABILITY: GSP and GSN datasets for all the studied species can be downloaded from http://www.stats.ox.ac.uk/~deane/HPIV.

4.

Fine scale waterbody data improve prediction of waterbird occurrence despite coarse species data (Total citations: 1; self-citations: 0; citations by others: 1)

Petra Šímová Vítězslav Moudrý Jan Komárek Karel Hrach Marie‐Josée Fortin 《Ecography》2019,42(3):511-520

While modelling habitat suitability and species distribution, ecologists must deal with issues related to the spatial resolution of species occurrence and environmental data. Because the spatial resolution of species and environmental datasets ranges from centimeters to hundreds of kilometers, choosing the optimal combination of resolutions is important for achieving the highest possible prediction accuracy. We evaluated how the spatial resolution of land cover/waterbody datasets (meters to 1 km) affects waterbird habitat suitability models based on atlas data (grid cell of 12 × 11 km). We hypothesized that the area, perimeter and number of waterbodies computed from high resolution datasets would explain distributions of waterbirds better because coarse resolution datasets omit small waterbodies affecting species occurrence. Specifically, we investigated which spatial resolution of waterbodies better explains the distribution of seven waterbirds nesting on ponds/lakes with areas ranging from 0.1 ha to hundreds of hectares. Our results show that the area and perimeter of waterbodies derived from high resolution datasets (raster data with 30 m resolution, vector data corresponding with map scale 1:10 000) explain the distribution of the waterbirds better than those calculated using less accurate datasets despite the coarse grain of the species data. Taking into account the spatial extent (global vs regional) of the datasets, we found the Global Inland Waterbody Dataset to be the most suitable for modelling distribution of waterbirds. In general, we recommend using land cover data of a resolution sufficient to capture the smallest patches of the habitat suitable for a given species’ presence for both fine and coarse grain habitat suitability and distribution modelling.

5.

Identification of cis-regulatory modules encoding temporal dynamics during development

Delphine Potier Denis Seyres Céline Guichard Magali Iche-Torres Stein Aerts Carl Herrmann Laurent Perrin 《BMC genomics》2014,15(1)


6.

Decoding the genome with an integrative analysis tool: combinatorial CRM Decoder

Kang K Kim J Chung JH Lee D 《Nucleic acids research》2011,39(17):e116

The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called ‘trace code’, and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named ‘multi-functional CRM’, suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.

7.

Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D. pseudoobscura (Total citations: 2; self-citations: 0; citations by others: 2)

Grad YH Roth FP Halfon MS Church GM 《Bioinformatics (Oxford, England)》2004,20(16):2738-2750


8.

Identification of cis-regulatory modules for adeno-associated virus-based cell-type-specific targeting in the retina and brain

Cheng-Hui Lin Yue Sun Candace S.Y. Chan Man-Ru Wu Lei Gu Alexander E. Davis Baokun Gu Wenlin Zhang Bogdan Tanasa Lei R. Zhong Mark M. Emerson Lu Chen Jun B. Ding Sui Wang 《The Journal of biological chemistry》2022,298(4)


9.

Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

Manikandan Narayanan Adrian Vetta Eric E. Schadt Jun Zhu 《PLoS computational biology》2010,6(4)


10.

Exploring and linking biomedical resources through multidimensional semantic spaces

Berlanga Rafael Jiménez-Ruiz Ernesto Nebot Victoria 《BMC bioinformatics》2012,13(1):1-17


11.

Analysing and Correcting the Differences between Multi-Source and Multi-Scale Spatial Remote Sensing Observations

Yingying Dong Ruisen Luo Haikuan Feng Jihua Wang Jinling Zhao Yining Zhu Guijun Yang 《PloS one》2014,9(11)

Differences exist among analysis results of agriculture monitoring and crop production based on remote sensing observations, which are obtained at different spatial scales from multiple remote sensors in same time period, and processed by same algorithms, models or methods. These differences can be mainly quantitatively described from three aspects, i.e. multiple remote sensing observations, crop parameters estimation models, and spatial scale effects of surface parameters. Our research proposed a new method to analyse and correct the differences between multi-source and multi-scale spatial remote sensing surface reflectance datasets, aiming to provide references for further studies in agricultural application with multiple remotely sensed observations from different sources. The new method was constructed on the basis of physical and mathematical properties of multi-source and multi-scale reflectance datasets. Theories of statistics were involved to extract statistical characteristics of multiple surface reflectance datasets, and further quantitatively analyse spatial variations of these characteristics at multiple spatial scales. Then, taking the surface reflectance at small spatial scale as the baseline data, theories of Gaussian distribution were selected for multiple surface reflectance datasets correction based on the above obtained physical characteristics and mathematical distribution properties, and their spatial variations. This proposed method was verified by two sets of multiple satellite images, which were obtained in two experimental fields located in Inner Mongolia and Beijing, China with different degrees of homogeneity of underlying surfaces. 
Experimental results indicate that differences of surface reflectance datasets at multiple spatial scales could be effectively corrected over non-homogeneous underlying surfaces, which provides a database for further multi-source and multi-scale crop growth monitoring and yield prediction, and their corresponding consistency analysis evaluation.

12.

Classification of small molecules by two- and three-dimensional decomposition kernels

Ceroni A Costa F Frasconi P 《Bioinformatics (Oxford, England)》2007,23(16):2038-2045

MOTIVATION: Several kernel-based methods have been recently introduced for the classification of small molecules. Most available kernels on molecules are based on 2D representations obtained from chemical structures, but far less work has focused so far on the definition of effective kernels that can also exploit 3D information. RESULTS: We introduce new ideas for building kernels on small molecules that can effectively use and combine 2D and 3D information. We tested these kernels in conjunction with support vector machines for binary classification on the 60 NCI cancer screening datasets as well as on the NCI HIV data set. Our results show that 3D information leveraged by these kernels can consistently improve prediction accuracy in all datasets. AVAILABILITY: An implementation of the small molecule classifier is available from http://www.dsi.unifi.it/neural/src/3DDK.

13.

Cost effective prediction of the eutrophication status of lakes and reservoirs

A. CATHERINE D. MOUILLOT N. ESCOFFIER C. BERNARD M. TROUSSELLIER 《Freshwater Biology》2010,55(11):2425-2435

1. Eutrophication is a serious threat in many parts of the world, and identifying the environmental factors that determine the spatial distribution of eutrophicated waterbodies as well as the development of management tools is a challenge. 2. In this study, data from the Ile‐de‐France region were analysed to determine if catchment scale environmental variables could predict concentrations of chlorophyll a (used as a proxy for eutrophication status) of artificial lakes and reservoirs. 3. Generalised additive models (GAM) and random forest models (RF) displayed greater predictive power than generalised linear models, indicating the importance of non‐monotonic relationships. Using RF modelling, very high predictive accuracy was achieved for both continuous and binomial (eutrophic or not) response variables (continuous: R² = 0.715; binomial: kappa = 0.764, 89% of waterbodies were accurately predicted). The better predictive power and robustness of RF versus GAM was attributed to the former's ability to better handle complex interactions between predictors and to account for threshold effects. 4. Our results confirmed the close link between the water quality of lakes and reservoirs and the characteristics of their catchments. Moreover, we also showed that (i) simple (e.g. linear and/or monotonic) relationships between catchment land use and water quality were only found for sub‐regional datasets, and (ii) land use needs to be considered in association with complementary environmental variables (hydromorphological variables) to best assess its impact on water quality.
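
The kappa statistic quoted in this abstract corrects raw agreement for the agreement expected by chance. The following is a minimal sketch of Cohen's kappa for a binary (eutrophic or not) classifier. The function name and the confusion-matrix counts are hypothetical and are not data from the study.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from the four cells of a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    # Observed agreement: fraction of cases on the diagonal.
    observed = (tp + tn) / n
    # Chance agreement: product of the marginal rates for each class.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_yes + p_no
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for 100 waterbodies (illustration only).
kappa = cohen_kappa(tp=45, fp=5, fn=6, tn=44)
print(round(kappa, 3))
```

With these made-up counts, 89 of 100 cases are classified correctly, yet kappa is lower than the raw accuracy because half of that agreement would be expected by chance alone.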

14.

Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation (Total citations: 43; self-citations: 1; citations by others: 42)

Steven J. Phillips Miroslav Dudík 《Ecography》2008,31(2):161-175

Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time‐consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use “default settings”, tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence‐only data. We evaluate our method on independently collected high‐quality presence‐absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce “hinge features” that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore “background sampling” strategies that cope with sample selection bias and decrease model‐building time. 
Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence‐only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) “target‐group” background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.
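
The abstract does not define hinge features. As an illustration only, the sketch below assumes the common reading of a forward hinge: zero up to a knot, then rising linearly to 1 at the maximum of the variable. A threshold (step) feature is shown for contrast. The function names, the knot and the temperature values are hypothetical, and this is not the Maxent implementation.

```python
def threshold_feature(x, knot):
    """Step feature: 0 at or below the knot, 1 above it."""
    return 1.0 if x > knot else 0.0


def forward_hinge(x, knot, x_max):
    """Assumed hinge form: 0 up to the knot, then linear up to 1 at x_max."""
    if x_max <= knot:
        raise ValueError("x_max must be greater than knot")
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)


# Hypothetical environmental variable (e.g. temperature) with a knot at 10
# and an observed maximum of 20.
for value in (5, 10, 15, 20):
    print(value, threshold_feature(value, 10), forward_hinge(value, 10, 20))
```

Compared with the step feature, the hinge lets a fitted response change gradually above the knot, which is one way to read the "more complex relationships" mentioned in the abstract.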

15.

Brachyury, Foxa2 and the cis-Regulatory Origins of the Notochord

Diana S. José-Edwards Izumi Oda-Ishii Jamie E. Kugler Yale J. Passamaneck Lavanya Katikala Yutaka Nibu Anna Di Gregorio 《PLoS genetics》2015,11(12)


16.

Increasing confidence of protein interactomes using network topological metrics (Total citations: 4; self-citations: 0; citations by others: 4)

Chen J Hsu W Lee ML Ng SK 《Bioinformatics (Oxford, England)》2006,22(16):1998-2004

MOTIVATION: Experimental limitations in high-throughput protein-protein interaction detection methods have resulted in low quality interaction datasets that contained sizable fractions of false positives and false negatives. Small-scale, focused experiments are then needed to complement the high-throughput methods to extract true protein interactions. However, the naturally vast interactomes would require much more scalable approaches. RESULTS: We describe a novel method called IRAP* as a computational complement for repurification of the highly erroneous experimentally derived protein interactomes. Our method involves an iterative process of removing interactions that are confidently identified as false positives and adding interactions detected as false negatives into the interactomes. Identification of both false positives and false negatives is performed in IRAP* using interaction confidence measures based on network topological metrics. Potential false positives are identified amongst the detected interactions as those with very low computed confidence values, while potential false negatives are discovered as the undetected interactions with high computed confidence values. Our results from applying IRAP* on large-scale interaction datasets generated by the popular yeast-two-hybrid assays for yeast, fruit fly and worm showed that the computationally repurified interaction datasets contained potentially lower fractions of false positive and false negative errors based on functional homogeneity. AVAILABILITY: The confidence indices for PPIs in yeast, fruit fly and worm as computed by our method can be found at our website http://www.comp.nus.edu.sg/~chenjin/fpfn.
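
The iterative remove-and-add loop described in this abstract can be sketched as follows. This is not IRAP*: the abstract does not specify its confidence measure, so a shared-neighbour (Jaccard) score is swapped in as a simple topological stand-in. The function names, thresholds and toy network are all hypothetical.

```python
from itertools import combinations


def jaccard_confidence(adj, u, v):
    """Stand-in confidence: overlap of the two proteins' other neighbours."""
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def repurify(edges, low=0.1, high=0.6, max_rounds=10):
    """Iteratively drop low-confidence edges and add high-confidence non-edges."""
    # Undirected edges as frozensets; self-loops are ignored.
    current = {frozenset(e) for e in edges if e[0] != e[1]}
    for _ in range(max_rounds):
        adj = {}
        for edge in current:
            u, v = tuple(edge)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        # Putative false positives: detected edges with very low confidence.
        removed = {e for e in current
                   if jaccard_confidence(adj, *tuple(e)) < low}
        # Putative false negatives: undetected pairs with high confidence.
        added = {frozenset(pair) for pair in combinations(sorted(adj), 2)
                 if frozenset(pair) not in current
                 and jaccard_confidence(adj, *pair) > high}
        if not removed and not added:
            break  # the network has stabilised
        current = (current - removed) | added
    return current


# Toy network: a near-clique on A-D missing A-D, plus a pendant edge D-X.
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "X")]
for edge in sorted(tuple(sorted(e)) for e in repurify(toy)):
    print(edge)
```

On this toy input the pendant edge D-X is dropped and the missing edge A-D is added, after which no further changes occur. The `max_rounds` cap is a design choice to guard against oscillation, since nothing guarantees convergence for arbitrary thresholds.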

17.

Multiple pheromone types and other extensions to the Ant-Miner classification rule discovery algorithm (Total citations: 1; self-citations: 0; citations by others: 1)

Khalid M. Salama Ashraf M. Abdelbar Alex A. Freitas 《Swarm Intelligence》2011,5(3-4):149-182

Ant-Miner is an ant-based algorithm for the discovery of classification rules. This paper proposes five extensions to Ant-Miner: (1) we utilize multiple types of pheromone, one for each permitted rule class, i.e. an ant first selects the rule class and then deposits the corresponding type of pheromone; (2) we use a quality contrast intensifier to magnify the reward of high-quality rules and to penalize low-quality rules in terms of pheromone update; (3) we allow the use of a logical negation operator in the antecedents of constructed rules; (4) we incorporate stubborn ants, an ACO variation in which an ant is allowed to take into consideration its own personal past history; (5) we use an ant colony behavior in which each ant is allowed to have its own values of the α and β parameters (in a sense, to have its own personality). Empirical results on 23 datasets show improvements in the algorithm's performance in terms of predictive accuracy and simplicity of the generated rule set.

18.

Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection

Sun H Guns T Fierro AC Thorrez L Nijssen S Marchal K 《Nucleic acids research》2012,40(12):e90


19.

Genetic algorithms applied to multi-class prediction for the analysis of gene expression data (Total citations: 9; self-citations: 0; citations by others: 9)

Ooi CH Tan P 《Bioinformatics (Oxford, England)》2003,19(1):37-44

MOTIVATION: An important challenge in the use of large-scale gene expression data for biological classification occurs when the expression dataset being analyzed involves multiple classes. Key issues that need to be addressed under such circumstances are the efficient selection of good predictive gene groups from datasets that are inherently 'noisy', and the development of new methodologies that can enhance the successful classification of these complex datasets. METHODS: We have applied genetic algorithms (GAs) to the problem of multi-class prediction. A GA-based gene selection scheme is described that automatically determines the members of a predictive gene group, as well as the optimal group size, that maximizes classification success using a maximum likelihood (MLHD) classification method. RESULTS: The GA/MLHD-based approach achieves higher classification accuracies than other published predictive methods on the same multi-class test dataset. It also permits substantial feature reduction in classifier genesets without compromising predictive accuracy. We propose that GA-based algorithms may represent a powerful new tool in the analysis and exploration of complex multi-class gene expression data. AVAILABILITY: Supplementary information, data sets and source codes are available at http://www.omniarray.com/bioinformatics/GA.

20.

Spatial genetic structure and landscape connectivity in black bears: Investigating the significance of using different land cover datasets and classifications in landscape genetics analyses

Hope M. Draheim Jennifer A. Moore Scott R. Winterstein Kim T. Scribner 《Ecology and evolution》2021,11(2):978

Landscape genetic analyses allow detection of fine‐scale spatial genetic structure (SGS) and quantification of effects of landscape features on gene flow and connectivity. Typically, analyses require generation of resistance surfaces. These surfaces characteristically take the form of a grid with cells that are coded to represent the degree to which landscape or environmental features promote or inhibit animal movement. How accurately resistance surfaces predict association between the landscape and movement is determined in large part by (a) the landscape features used, (b) the resistance values assigned to features, and (c) how accurately resistance surfaces represent landscape permeability. Our objective was to evaluate the performance of resistance surfaces generated using two publicly available land cover datasets that varied in how accurately they represent the actual landscape. We genotyped 365 individuals from a large black bear population (Ursus americanus) in the Northern Lower Peninsula (NLP) of Michigan, USA at 12 microsatellite loci, and evaluated the relationship between gene flow and landscape features using two different land cover datasets. We investigated the relative importance of land cover classification and accuracy on landscape resistance model performance. We detected local spatial genetic structure in Michigan's NLP black bears and found roads and land cover were significantly correlated with genetic distance. We observed similarities in model performance when different land cover datasets were used despite 21% dissimilarity in classification between the two land cover datasets. However, we did find the performance of land cover models to predict genetic distance was dependent on the way the land cover was defined. Models in which land cover was finely defined (i.e., eight land cover classes) outperformed models where land cover was defined more coarsely (i.e., habitat/non‐habitat or forest/non‐forest). 
Our results show that landscape genetic researchers should carefully consider how land cover classification changes inference in landscape genetic studies.