Similar literature
 20 similar articles found
1.

Background

Genomic islands (GIs) are clusters of alien genes that are found in some bacterial genomes but are not seen in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory.

Results

In this paper, we combined multiple GI-associated features, and applied and compared various machine learning approaches to evaluate classification accuracy on GI datasets from three genera (Salmonella, Staphylococcus and Streptococcus) and on a mixed dataset of all three genera. The experimental results showed that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms (AdaBoost, bagging, MultiBoost and random forest) to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.

Conclusions

We conclude that decision tree-based ensemble algorithms could accurately classify GIs and non-GIs, and we recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.
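
As a rough illustration of the ensemble idea described above, the following sketch applies bagging to very simple base classifiers. It is not the authors' method: the paper used J48 decision trees with several ensemble algorithms, whereas this sketch substitutes a one-split decision stump as the base learner. The function names (`train_stump`, `bagging`) and the single-feature toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def train_stump(rows, labels):
    """Fit a depth-1 decision tree (a stand-in for J48, not J48 itself)."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            # Try both label assignments for the two sides of the split.
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[feature] <= threshold else right) != label
                    for row, label in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, left, right)
    _, feature, threshold, left, right = best
    return lambda row: left if row[feature] <= threshold else right


def bagging(rows, labels, n_models=15, seed=0):
    """Train stumps on bootstrap resamples and combine them by majority vote."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        models.append(
            train_stump([rows[i] for i in sample], [labels[i] for i in sample])
        )

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (hypothetical): one numeric feature per region, 1 = GI, 0 = non-GI.
rows = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
labels = [0, 0, 0, 1, 1, 1]
classify = bagging(rows, labels)
```

Each stump sees a different bootstrap resample, so individual stumps may disagree; the majority vote is what smooths out those differences.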

2.
A genomic island (GI) is a chunk of DNA sequence in a genome whose origin can be traced back to other organisms or viruses. The detection of GIs plays an indispensable role in biomedical research because GIs are closely associated with special functions; pathogenicity islands, for example, are disease-causing GIs. It is also important to visualize genomic islands together with their supporting features in the genome. We have developed a program, Genomic Island Visualization (GIV), which displays the locations of genomic islands in a genome, as well as the corresponding supportive feature information for GIs. GIV was implemented in C++, and was compiled and executed on Linux/Unix operating systems.

Availability

GIV is freely available for non-commercial use at http://www5.esu.edu/cpsc/bioinfo/software/GIV

3.
Che D  Hasan MS  Wang H  Fazekas J  Huang J  Liu Q 《Bioinformation》2011,7(6):311-314
Genomic islands (GIs) are genomic regions that were originally transferred from other organisms. The detection of genomic islands in genomes can lead to many applications in industrial, medical and environmental contexts. Existing computational tools for GI detection suffer from either low recall or low precision, leaving room for improvement. In this paper, we report the development of our Ensemble algorithm for Genomic Island Detection (EGID). EGID takes the prediction results of existing computational tools, filters them, and generates consensus predictions. Performance comparisons between our ensemble algorithm and existing programs have shown that our ensemble algorithm is better than any other program. EGID was implemented in Java, and was compiled and executed on Linux operating systems. EGID is freely available at http://www5.esu.edu/cpsc/bioinfo/software/EGID.
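
The abstract above does not spell out how EGID builds its consensus, so the following is only a minimal sketch of one plausible voting scheme, under stated assumptions: each tool reports GIs as inclusive `(start, end)` coordinates, intervals from the same tool do not overlap, and a position is kept when at least `min_votes` tools cover it. The function name `consensus_regions` and the coordinates are hypothetical.

```python
from collections import defaultdict


def consensus_regions(predictions, min_votes):
    """Return regions covered by at least `min_votes` tools.

    predictions: one list per tool of inclusive (start, end) intervals.
    Assumes intervals reported by a single tool do not overlap each other.
    """
    # Net change in the number of supporting tools at each coordinate.
    delta = defaultdict(int)
    for tool_intervals in predictions:
        for start, end in tool_intervals:
            delta[start] += 1
            delta[end + 1] -= 1

    regions = []
    depth = 0
    region_start = None
    for pos in sorted(delta):
        new_depth = depth + delta[pos]
        if depth < min_votes <= new_depth:
            region_start = pos  # support rises to the threshold here
        elif new_depth < min_votes <= depth:
            regions.append((region_start, pos - 1))  # support drops below it
        depth = new_depth
    return regions


# Hypothetical predictions from three tools on the same genome.
tool_predictions = [[(100, 500)], [(300, 700)], [(450, 900)]]
majority = consensus_regions(tool_predictions, min_votes=2)
unanimous = consensus_regions(tool_predictions, min_votes=3)
```

Raising `min_votes` trades recall for precision: here the two-tool consensus spans positions 300 to 700, while requiring all three tools narrows it to 450 to 500.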

4.
Hasan MS  Liu Q  Wang H  Fazekas J  Chen B  Che D 《Bioinformation》2012,8(4):203-205
Genomic islands (GIs) are genomic regions that originate from other organisms through a process known as horizontal gene transfer (HGT). Detection of GIs plays a significant role in biomedical research since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that combines the results of existing tools for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that automatically collects the input genomes from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. AVAILABILITY: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST.

5.

Background

It has been noted that many bacterial virulence factor genes are located within genomic islands (GIs; clusters of genes in a prokaryotic genome of probable horizontal origin). However, such studies have been limited to single genera or isolated observations. We have performed the first large-scale analysis of multiple diverse pathogens to examine this association. We additionally identified genes found predominantly in pathogens, but not non-pathogens, across multiple genera using 631 complete bacterial genomes, and we identified common trends in virulence for genes in GIs. Furthermore, we examined the relationship between GIs and clustered regularly interspaced palindromic repeats (CRISPRs) proposed to confer resistance to phage.

Methodology/Principal Findings

We show quantitatively that GIs disproportionately contain more virulence factors than the rest of a given genome (p<1E-40 using three GI datasets) and that CRISPRs are also over-represented in GIs. Virulence factors in GIs and pathogen-associated virulence factors are enriched for proteins having more “offensive” functions, e.g. active invasion of the host, and are disproportionately components of type III/IV secretion systems or toxins. Numerous hypothetical pathogen-associated genes were identified, meriting further study.

Conclusions/Significance

This is the first systematic analysis across diverse genera indicating that virulence factors are disproportionately associated with GIs. “Offensive” virulence factors, as opposed to host-interaction factors, may more often be a recently acquired trait (on an evolutionary time scale detected by GI analysis). Newly identified pathogen-associated genes warrant further study. We discuss the implications of these results, which cement the significant role of GIs in the evolution of many pathogens.

6.
Zhang  Yanlin  Liu  Weiwei  Lin  Yu  Ng  Yen Kaow  Li  Shuaicheng 《BMC genomics》2019,20(2):129-141
Background

Recent advances in genome analysis have established that chromatin has preferred 3D conformations, which bring distant loci into contact. Identifying these contacts is important for us to understand possible interactions between these loci. This has motivated the creation of the Hi-C technology, which detects long-range chromosomal interactions. Distance geometry-based algorithms, such as ChromSDE and ShRec3D, have been able to utilize Hi-C data to infer 3D chromosomal structures. However, these algorithms, being matrix-based, are space- and time-consuming on very large datasets. A human genome of 100 kilobase resolution would involve ∼30,000 loci, requiring gigabytes just in storing the matrices.
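
The "gigabytes" figure above can be checked with a quick back-of-the-envelope calculation. This sketch assumes a dense square matrix with one 8-byte floating-point entry per pair of loci; the entry size is an assumption, not something stated in the abstract.

```python
loci = 30_000          # approximate locus count quoted in the abstract
bytes_per_entry = 8    # assumption: one float64 per matrix entry

# A dense loci x loci distance (or contact) matrix.
matrix_bytes = loci * loci * bytes_per_entry
matrix_gigabytes = matrix_bytes / 1e9
```

Under these assumptions a single dense matrix occupies about 7.2 GB, and a matrix-based method typically needs more than one such matrix in memory at once.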

Results

We propose a succinct representation of the distance matrices which tremendously reduces the space requirement. We give a complete solution, called SuperRec, for the inference of chromosomal structures from Hi-C data, by iteratively solving the large-scale weighted multidimensional scaling problem.
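
For orientation, weighted multidimensional scaling is commonly stated as the minimization of a weighted stress function. The form below is the textbook objective, given here as an assumption; the abstract does not state the exact objective SuperRec optimizes. Here $x_i \in \mathbb{R}^3$ is the inferred position of locus $i$, $d_{ij}$ is the target distance between loci $i$ and $j$ derived from Hi-C contact data, and $w_{ij} \ge 0$ is the weight given to that pair.

```latex
\min_{x_1, \dots, x_n \in \mathbb{R}^3}
  \sum_{i < j} w_{ij} \left( \lVert x_i - x_j \rVert - d_{ij} \right)^2
```

Setting $w_{ij} = 0$ for pairs without a reliable distance estimate removes them from the sum, which is one reason a sparse or succinct representation of the distances can save space.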

Conclusions

SuperRec runs faster than earlier systems without compromising on result accuracy. The SuperRec package can be obtained from http://www.cs.cityu.edu.hk/~shuaicli/SuperRec.


7.
Sequencing of microbial genomes is important because microbes carry antibiotic and pathogenetic activities. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes that are still being sequenced. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). The draft genomes of ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which provides functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

8.
Background: Left ventricular assist devices (LVADs) provide support for patients with end-stage heart failure. The aims of this study were to determine whether baseline analysis and early trends in routine laboratory data, platelet activity, and thromboinflammatory biomarkers following LVAD implantation reveal trends that predict personalized risks of one-year gastrointestinal (GI) bleeding, stroke, pump thrombosis, drive-line infections and mortality in patients on LVAD support.

Methods: We performed an observational study at the University of Kentucky with 61 participants who underwent first-time LVAD implantation. Blood was collected at baseline and post-op days 0, 1, 3 and 6 as well as clinical follow-up. Demographics, clinical characteristics, one-year adverse events and routine laboratory data were collected from electronic medical records. Platelet function and plasma biomarkers were profiled.

Results: Evaluation of routine laboratory results revealed that sustained thrombocytopenia and increased mean platelet volume (MPV) were associated with development of GI bleeding and mortality. Platelet function at follow-up visit predicted one-year bleeding events. Thrombotic biomarker sCD40L strongly predicted one-year GI bleeding at baseline before implantation and within the first week following LVAD implant.

Conclusions: Early trends in routine bloodwork and platelet function may serve as novel signatures of patients at risk to experience adverse events.


9.
Abstract

The existence and identity of non-Watson-Crick base pairs (bps) within RNA bulges, internal loops, and hairpin loops cannot reliably be predicted by existing algorithms. We have developed the Isfold (Isosteric Folding) program as a tool to examine patterns of nucleotide substitutions from sequence alignments or mutation experiments and identify plausible bp interactions. We infer these interactions based on the observation that each non-Watson-Crick bp has a signature pattern of isosteric substitutions where mutations can be made that preserve the 3D structure. Isfold produces a dynamic representation of predicted bps within defined motifs in order of their probabilities. The software was developed under Windows XP, and is capable of running on PC and MAC with Matlab 7.1 (SP3) or higher. A PC standalone version that does not require Matlab also is available. This software and a user manual are freely available at www.ucsf.edu/frankel/isfold.

10.
Capsule Differences in Cork Oak Quercus suber and Holm Oak Quercus rotundifolia dominance had little influence on bird communities though bark-gleaners showed a foraging preference for Cork Oak.

Aims Examine the use of Cork and Holm Oak trees by insectivorous birds in Mediterranean oak woodlands.

Methods Point-counts were used to compare species abundance among Cork Oak-dominated, Holm Oak-dominated and mixed woodlands. Focal foraging observations were used to evaluate the use of Cork and Holm Oaks in the three habitats and to relate tree characteristics with the foraging time of foliage- and bark-gleaners.

Results Bird densities in the three habitats were not different for most foliage- and bark-gleaners. Tree preference index values and foraging time per tree showed no significant differences between tree species and foraging guilds; however, bark-gleaners had positive index values for Cork Oak in the three habitats. The foraging time of foliage- and bark-gleaners on both tree species showed a positive relationship with characteristics associated with arthropod abundance.

Conclusion Cork and Holm Oak trees are equally preferred by foliage-gleaners but bark-gleaners moderately preferred Cork Oak. Characteristics regarding morphology, phenology and physiological condition of trees can be used to predict habitat quality for insectivorous forest birds in Mediterranean oak woodlands.


11.
Human Genome Project Information http://www.ornl.gov/TechResources/Human_Genome/home.html

Access Excellence http://www.accessexcellence.org/

Genetics Science Learning Centre at the Eccles Institute of Human Genetics, University of Utah http://gslc.genetics.utah.edu/

Blazing a Genetic Trail http://www.hhmi.org/GeneticTrail/

A Hypermedia Glossary of Genetic Terms prepared and presented by Birgid Schlindwein http://www.weihenstephan.de/~schlind/genglos.html

Virtual Flylab http://vcourseware5.calstatela.edu/VirtualFlyLab/IntroVflyLab.html

Web watch topics coming up…

12.
13.
Background: Most work in Amazonia has concentrated on dense lowland evergreen rain forest, a vegetation type with >40% cover. Large parts of southern Amazonia are covered by open evergreen lowland rain forest, physiognomically dominated by high abundance of palms. This vegetation type has received relatively little attention so far. Understanding the key predictors of above-ground biomass (AGB) across scales is important to accurately quantify the impacts of land cover change on the terrestrial carbon budget.

Aims: We assessed the structure of southern Amazonian forests, Brazil, to quantify the relative importance of variation in AGB caused by the abundance/density of palm species and by forest structure.

Methods: We stratified the landscape into units that were homogeneous in vegetation type and elevation, and used these units to guide plot establishment. We used the variation partitioning technique to decompose the relative contribution of forest structure and palm abundance.

Results: AGBcommunity (including trees, palms and lianas) and AGBtree (excluding palms and lianas) significantly decreased with increasing abundance of palms. Attalea speciosa, a large-leaved palm species, was the most important for explaining the variance of AGB. The total variance of AGBtree was partially explained by a redundant effect of A. speciosa and trees (28%) and by trees alone (62%), based on models of basal area. The redundant effect, along with additional analyses, indicated (1) competition between A. speciosa and small trees and (2) covariation between A. speciosa and large trees.

Conclusions: The abundance of palms plays a minor but significant role in predicting the AGB at the local scale in southern Amazonia.


14.
Capsule: Wood Warblers Phylloscopus sibilatrix showed significant selection for tree species and woodland characteristics at staging and wintering sites in sub-Saharan Africa.

Aims: To investigate home range size, habitat and tree species selection of Wood Warblers at a staging site in Burkina Faso (Koubri) and a wintering site in Ghana (Pepease).

Methods: Comparing habitat recorded at locations of radio-tagged birds and at control points, we investigated whether there was habitat and tree species selection. We also compared home range size of individual birds between the two sites.

Results: Home range size did not differ between the two sites. There was significant selection for tree species at both Koubri and Pepease: Anogeissus leiocarpus and Albizia zygia, respectively. At Koubri, there was significant avoidance of the most common tree species (Azadirachta indica, Mangifera indica (both non-native), Vitellaria paradoxa and Acacia spp.). In addition, there was a preference for taller trees and greater tree density at both sites. However, the probability of a point being used declined with increasing number of taller (>14 m) trees.

Conclusion: Fine-scale selection of woodland habitats suggests that Wood Warblers are likely to suffer the consequences of ongoing land-use change in their West African wintering grounds.


15.
Capsule: Nestling Southern Grey Shrikes Lanius meridionalis show a high prevalence of haemosporidian parasites including five lineages described here for the first time.

Aims: To examine the prevalence of various haemosporidian lineages in nestlings of three separated Iberian populations of the Southern Grey Shrike.

Methods: Blood samples were taken from nestling Southern Grey Shrikes from three agroecosystem areas in the Iberian Peninsula. Parasites were detected from blood samples using polymerase chain reaction screening.

Results: Nestlings were parasitized by 11 different lineages belonging to the genera Haemoproteus (3.8%), Plasmodium (0.5%) and Leucocytozoon (1.8%), including five new undescribed lineages. These are among the highest prevalence levels of haemosporidian parasites (7.4%) for nestlings of passerine birds.

Conclusion: Our findings suggest that the distribution of avian haemosporidians is determined by complex effects including climate and biogeography. Most parasite lineages were not universally spread across shrike populations, despite being otherwise widespread both geographically and taxonomically.


16.
Capsule: Automated acoustic recording can be used as a valuable survey technique for Capercaillie Tetrao urogallus leks, improving the quality and quantity of field data for this endangered bird species. However, more development work and testing against traditional methods are needed to establish optimal working practices.

Aims: This study aims to determine whether Capercaillie vocalizations can be recognized in lek recordings, whether this can be automated using readily available software, and whether the number of calls resulting varies with location, weather conditions, date and time of day.

Methods: Unattended recording devices and semi-automated call classification software were used to record and analyse the display calls of Capercaillie at three known lek sites in Scotland over a two-week period.

Results: Capercaillie calls were successfully and rapidly identified within a data set that included the vocalizations of other bird species and environmental noise. Calls could be readily recognized to species level using a combination of unsupervised software and manual analysis. The number of calls varied by time and date, by recorder/microphone location at the lek site, and with weather conditions. This information can be used to better target future acoustic monitoring and improve the quality of existing traditional lek surveys.

Conclusion: Bioacoustic methods provide a practical and cost-effective way to determine habitat occupancy and activity levels by a vocally distinctive bird species. Following further testing alongside traditional counting methods, it could offer a significant new approach towards more effective monitoring of local population levels for Capercaillie and other species of conservation concern.


17.
Capsule: The Red-backed Shrike Lanius collurio and the Barred Warbler Sylvia nisoria had similar habitat preferences and their territories often overlapped. However, we found that Red-backed Shrikes were more flexible in habitat choice whilst Barred Warblers had more specific requirements.

Aim: We aimed to analyse and compare distribution and habitat preferences of Red-backed Shrikes and Barred Warblers breeding sympatrically in semi-natural landscape in a wetland/farmland mosaic.

Methods: We examined habitat availability and use by the two species within their breeding territories to identify differences in habitat selection.

Results: Territories of both species were similar in habitat composition and used levees, bushes, fallow areas and single trees. However, the spatial characteristics of the territories differed between species. Red-backed Shrikes used a wider range of sizes and shapes of habitat patches, whilst Barred Warblers preferred a more complex landscape structure and a higher diversity of habitat types. We also found that areas of 71% of Barred Warbler and 34% of Red-backed Shrike territories overlapped.

Conclusion: Whilst both species showed similar habitat choices, they appeared to differ significantly in terms of landscape structure: Red-backed Shrikes were more flexible and less selective than Barred Warblers in their habitat choice.


18.
Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data to evaluate the quality of each candidate ensemble. To combine the base classifiers' decisions into the ensemble's output, we used the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) − k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI-Machine Learning repository, one Alzheimer’s disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction and we expect that the proposed GA-EoC would perform consistently in other cases.
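
The majority voting step named above, and the kind of score a search over classifier subsets could use, can be sketched as follows. This is an illustration only and not GA-EoC itself: the function names (`majority_vote`, `ensemble_accuracy`), the 0/1 inclusion mask and the toy predictions are hypothetical, and plain accuracy on a fixed set of predictions stands in for the 10-fold cross-validation described in the abstract.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (equal length) sample by sample.

    Ties go to the label seen first among the votes for that sample.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def ensemble_accuracy(mask, predictions, truth):
    """Score the sub-ensemble selected by a 0/1 mask over the classifier pool."""
    chosen = [preds for keep, preds in zip(mask, predictions) if keep]
    voted = majority_vote(chosen)
    return sum(v == t for v, t in zip(voted, truth)) / len(truth)


# Toy pool of three base classifiers predicting four samples (hypothetical).
pool = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
truth = [1, 0, 1, 1]

full_ensemble = ensemble_accuracy((1, 1, 1), pool, truth)
second_only = ensemble_accuracy((0, 1, 0), pool, truth)
```

A genetic algorithm would treat each mask as a chromosome and use a score like `ensemble_accuracy` as its fitness; in this toy pool the full three-member vote scores 1.0 while the second classifier alone scores 0.5.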

19.
Introduction: Within the last decade, the study of microbial communities has gained increasing research interest also driven by the recognition of the important role of these consortia in human health and disease. Metaproteomics, the analysis of the entire set of proteins from all microorganisms present in one ecosystem, has become a prominent technique for studying the relation between taxonomic diversity and functional profile of microbial communities.

Areas covered: The aim of this review is to address opportunities and challenges of metaproteomics from a computational perspective. Appealing to an audience of microbial ecologists and proteomic researchers alike, we provide an overview on state-of-the-art software and databases by which metaproteome data can be readily analyzed.

Expert commentary: While tailored protein databases, combined search algorithms and iterative workflows are means to improve the identification yield, software tools for taxonomic and functional analysis are challenged by the vast amount of unannotated sequences in metaproteomics.


20.