Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Hasan MS  Liu Q  Wang H  Fazekas J  Chen B  Che D 《Bioinformation》2012,8(4):203-205
Genomic islands (GIs) are genomic regions that originate from other organisms and are acquired through a process known as horizontal gene transfer (HGT). Detection of GIs plays a significant role in biomedical research because such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that combines the results of the existing tools into an ensemble for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that collects the input genomes automatically from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. AVAILABILITY: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST.
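The abstract above does not say how EGID merges the output of the individual tools. As a rough illustration of the general idea of ensemble prediction only, the sketch below keeps the regions on which at least `min_votes` tools agree. The function names, the half-open `(start, end)` interval format and the voting rule are all assumptions made for this example and are not taken from GIST or EGID.

```python
from itertools import groupby


def _merge(intervals):
    """Merge overlapping or touching (start, end) intervals from ONE tool,
    so that a tool can never cast more than one vote for a position."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def consensus_islands(predictions, min_votes=2):
    """Hypothetical voting scheme (not EGID's documented algorithm).

    predictions: one list of half-open (start, end) intervals per tool.
    Returns the regions covered by at least `min_votes` tools.
    """
    events = []
    for tool_intervals in predictions:
        for start, end in _merge(tool_intervals):
            events.append((start, 1))   # a tool starts voting here
            events.append((end, -1))    # a tool stops voting here
    events.sort()

    regions, depth, open_at = [], 0, None
    # Apply all events at the same coordinate together, then check the vote count.
    for pos, group in groupby(events, key=lambda e: e[0]):
        depth += sum(delta for _, delta in group)
        if depth >= min_votes and open_at is None:
            open_at = pos
        elif depth < min_votes and open_at is not None:
            regions.append((open_at, pos))
            open_at = None
    return regions
```

With `min_votes=1` the function returns the union of all predictions; raising the threshold trades sensitivity for agreement between tools.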

2.
3.
The adaptability of pathogenic bacteria to hosts is influenced by the genomic plasticity of the bacteria, which can be increased by such mechanisms as horizontal gene transfer. Pathogenicity islands play a major role in this type of gene transfer because they are large, horizontally acquired regions that harbor clusters of virulence genes that mediate the adhesion, colonization, invasion, immune system evasion, and toxigenic properties of the acceptor organism. Currently, pathogenicity islands are mainly identified in silico based on various characteristic features: (1) deviations in codon usage, G+C content or dinucleotide frequency and (2) insertion sequences and/or tRNA genetic flanking regions together with transposase coding genes. Several computational techniques for identifying pathogenicity islands exist. However, most of these techniques are only directed at the detection of horizontally transferred genes and/or the absence of certain genomic regions of the pathogenic bacterium in closely related non-pathogenic species. Here, we present a novel software suite designed for the prediction of pathogenicity islands (pathogenicity island prediction software, or PIPS). In contrast to other existing tools, our approach is capable of utilizing multiple features for pathogenicity island detection in an integrative manner. We show that PIPS provides better accuracy than other available software packages. As an example, we used PIPS to study the veterinary pathogen Corynebacterium pseudotuberculosis, in which we identified seven putative pathogenicity islands.
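One of the features listed above, deviation in G+C content, can be illustrated with a short sliding-window sketch. This is not the PIPS method, which integrates several features; the window size, step and z-score cutoff are assumptions chosen only for the example.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window, step, z_cutoff=2.0):
    """Return (start, end, gc) for windows whose G+C content deviates from the
    mean over all windows by more than `z_cutoff` standard deviations.

    The cutoff is an illustrative assumption, not a published threshold.
    """
    starts = range(0, len(genome) - window + 1, step)
    values = [gc_fraction(genome[s:s + window]) for s in starts]
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(s, s + window, gc)
            for s, gc in zip(starts, values)
            if abs(gc - mu) / sigma > z_cutoff]
```

A compositional signal like this is only suggestive on its own, which is why the abstract stresses combining it with codon usage, flanking tRNA genes and transposase genes.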

4.
IslandPath: aiding detection of genomic islands in prokaryotes   Total citations: 11, self-citations: 0, citations by others: 11
Genomic islands (clusters of genes of potential horizontal origin in a prokaryotic genome) are frequently associated with a particular adaptation of a microbe that is of medical, agricultural or environmental importance, such as antibiotic resistance, pathogen virulence, or metal resistance. While many sequence features associated with such islands have been adopted separately in applications for analysis of genomic islands, including pathogenicity islands, there is no single application that integrates multiple features for island detection. IslandPath is a network service which incorporates multiple DNA signals and genome annotation features into a graphical display of a bacterial or archaeal genome, to aid the detection of genomic islands. AVAILABILITY: This application is available at http://www.pathogenomics.sfu.ca/islandpath and the source code is freely available, under GNU public licence, from the authors. SUPPLEMENTARY INFORMATION: An online help file, which includes analyses of the utility of IslandPath, can be found at http://www.pathogenomics.sfu.ca/islandpath/current/islandhelp.html

5.
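Horizontal gene transfer enables the exchange of DNA between individual bacteria of different species and genera, which enhances bacterial adaptability to the environment and is one of the major routes of bacterial evolution. Genomic islands are important vehicles of horizontal gene transfer: mobile genomic islands can integrate into the host chromosome, excise under specific conditions, and then transfer to a new host by transformation, conjugation or transduction. Genomic islands carry a variety of biological functions, such as antibiotic resistance, pathogenicity, degradation of xenobiotics, and heavy-metal resistance. Their transfer leads to the wide dissemination of variable genes among bacteria of different species and genera; for example, the spread of virulence and drug-resistance genes has given rise to multidrug-resistant bacteria that threaten human health. The transfer of genomic islands is mediated by integrases and is regulated by a variety of transcription factors. This article reviews the structural features of bacterial genomic islands, their transfer and regulatory mechanisms, and their prediction, and concludes that elucidating the transfer of genomic islands and its regulatory mechanisms is an important strategy for curbing their spread.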

6.
Microbial genes that are “novel” (no detectable homologs in other species) have become of increasing interest as environmental sampling suggests that there are many more such novel genes in yet-to-be-cultured microorganisms. By analyzing known microbial genomic islands and prophages, we developed criteria for systematic identification of putative genomic islands (clusters of genes of probable horizontal origin in a prokaryotic genome) in 63 prokaryotic genomes, and then characterized the distribution of novel genes and other features. All but a few of the genomes examined contained significantly higher proportions of novel genes in their predicted genomic islands compared with the rest of their genome (Paired t test = 4.43E-14 to 1.27E-18, depending on method). Moreover, the reverse observation (i.e., higher proportions of novel genes outside of islands) never reached statistical significance in any organism examined. We show that this higher proportion of novel genes in predicted genomic islands is not due to less accurate gene prediction in genomic island regions, but likely reflects a genuine increase in novel genes in these regions for both bacteria and archaea. This represents the first comprehensive analysis of novel genes in prokaryotic genomic islands and provides clues regarding the origin of novel genes. Our collective results imply that there are different gene pools associated with recently horizontally transmitted genomic regions versus regions that are primarily vertically inherited. Moreover, there are more novel genes within the gene pool associated with genomic islands. Since genomic islands are frequently associated with a particular microbial adaptation, such as antibiotic resistance, pathogen virulence, or metal resistance, this suggests that microbes may have access to a larger “arsenal” of novel genes for adaptation than previously thought.

7.
Genome evolution in prokaryotes is assisted by integration of gene pools from phages and plasmids. Regions downstream of tRNAs and tmRNAs are considered hot spots for the integration of these gene pools or genomic islands. To date, genomic islands have been identified only at tRNA/tmRNA genes in the enterobacterial genomes. The present work reports 10 distinct small RNAs as potent integration sites for genomic islands. A known tool, tRNAcc 1.0, was used to identify genomic islands associated with the small RNAs c0362, oxyS, ryaA, rybB, rybD, ryeB, ryeE, rtT, sraE and tmRNA. The coordinates of 25 such small RNA-associated genomic islands in three E. coli (strains CFT073, EDL933 and K12) and one Shigella flexneri (strain 301) genomes are presented. Moreover, cross-verification of the genomic sequences encoded within the identified genomic islands in a horizontal gene transfer database, together with GenBank annotation features and atypical sequence compositions, supports our results. In addition, all 25 identified genomic integration sites exhibit genomic block rearrangements with respect to the associated small RNA. Similar to tRNAs/tmRNAs, the downstream regions of the small RNAs are found to be hot spots of integration.

8.

Background

Genomic prediction of breeding values from dense single nucleotide polymorphism (SNP) genotypes is used for livestock and crop breeding, and can also be used to predict disease risk in humans. For some traits, the most accurate genomic predictions are achieved with non-linear estimates of SNP effects from Bayesian methods that treat SNP effects as random effects from a heavy-tailed prior distribution. These Bayesian methods are usually implemented via Markov chain Monte Carlo (MCMC) schemes to sample from the posterior distribution of SNP effects, which is computationally expensive. Our aim was to develop an efficient expectation–maximisation algorithm (emBayesR) that gives estimates of SNP effects and accuracies of genomic prediction similar to those of the MCMC implementation of BayesR (a Bayesian method for genomic prediction), but with greatly reduced computation time.

Methods

emBayesR is an approximate EM algorithm that retains the BayesR model assumption with SNP effects sampled from a mixture of normal distributions with increasing variance. emBayesR differs from other proposed non-MCMC implementations of Bayesian methods for genomic prediction in that it estimates the effect of each SNP while allowing for the error associated with estimation of all other SNP effects. emBayesR was compared to BayesR using simulated data, and real dairy cattle data with 632 003 SNPs genotyped, to determine if the MCMC and the expectation-maximisation approaches give similar accuracies of genomic prediction.

Results

We were able to demonstrate that allowing for the error associated with estimation of other SNP effects when estimating the effect of each SNP in emBayesR improved the accuracy of genomic prediction over emBayesR without including this error correction, with both simulated and real data. When averaged over nine dairy traits, the accuracy of genomic prediction with emBayesR was only 0.5% lower than that from BayesR. However, emBayesR reduced computing time up to 8-fold compared to BayesR.

Conclusions

The emBayesR algorithm described here achieved similar accuracies of genomic prediction to BayesR for a range of simulated and real 630 K dairy SNP data. emBayesR needs less computing time than BayesR, which will allow it to be applied to larger datasets.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0082-4) contains supplementary material, which is available to authorized users.

9.
RESULTS: CpGProD is an application for identifying mammalian promoter regions associated with CpG islands in large genomic sequences. Although it is strictly dedicated to this particular promoter class corresponding to approximately 50% of the genes, CpGProD exhibits a higher sensitivity and specificity than other tools used for promoter prediction. Notably, CpGProD uses different parameters according to species (human, mouse) studied. Moreover, CpGProD predicts the promoter orientation on the DNA strand. AVAILABILITY: http://pbil.univ-lyon1.fr/software/cpgprod.html SUPPLEMENTARY INFORMATION: http://pbil.univ-lyon1.fr/software/cpgprod.html

10.
A genomic island (GI) is a chunk of DNA sequence in a genome whose origin can be traced back to other organisms or viruses. The detection of GIs plays an indispensable role in biomedical research because GIs are closely associated with special functionalities; disease-causing GIs, for example, are known as pathogenicity islands. It is also very important to visualize genomic islands, as well as the supporting features corresponding to the genomic islands in the genome. We have developed a program, Genomic Island Visualization (GIV), which displays the locations of genomic islands in a genome, as well as the corresponding supportive feature information for GIs. GIV was implemented in C++, and was compiled and executed on Linux/Unix operating systems.

Availability

GIV is freely available for non-commercial use at http://www5.esu.edu/cpsc/bioinfo/software/GIV

11.
Accuracy of genomic selection in European maize elite breeding populations   Total citations: 1, self-citations: 0, citations by others: 1
Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping at unreplicated field trials in 3–4 locations. Because up to three generations per year are feasible for maize, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.
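The fivefold cross validation mentioned above can be sketched generically as follows. The abstract does not define how accuracy was computed, so this example assumes the plain Pearson correlation between predicted and observed values in each held-out fold, averaged over folds. The `fit` and `predict` callables are hypothetical placeholders for any prediction model (the study used random regression BLUP, which is not implemented here).

```python
import random
from math import sqrt


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cross_validated_accuracy(markers, phenotypes, fit, predict, k=5, seed=0):
    """Mean held-out correlation over k folds.

    fit(train_markers, train_phenotypes) -> model      (user supplied, hypothetical)
    predict(model, test_markers) -> list of predictions (user supplied, hypothetical)
    """
    n = len(phenotypes)
    accuracies = []
    for test in k_fold_indices(n, k, seed):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        model = fit([markers[i] for i in train], [phenotypes[i] for i in train])
        predictions = predict(model, [markers[i] for i in test])
        accuracies.append(pearson(predictions, [phenotypes[i] for i in test]))
    return sum(accuracies) / k
```

Each individual is predicted exactly once from a model that never saw its phenotype, which is what keeps the reported accuracy from being inflated by overfitting.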

12.
Random forests for genomic data analysis   Total citations: 1, self-citations: 0, citations by others: 1
Chen X  Ishwaran H 《Genomics》2012,99(6):323-329
Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.

13.

Background  

Genomic islands are regions of bacterial genomes that have been acquired by horizontal transfer and often contain blocks of genes that function together for specific processes. Recently, it has become clear that the impact of genomic islands on the evolution of different bacterial species is significant and represents a major force in establishing bacterial genomic variation. However, the study of genomic island evolution has been mostly performed at the sequence level using computer software or hybridization analysis to compare different bacterial genomic sequences. We describe here a novel experimental approach to study the evolution of species-specific bacterial genomic islands that identifies island genes that have evolved in such a way that they are differentially-expressed depending on the bacterial host background into which they are transferred.

14.

Background

Genomic islands play an important role in medical, methylation and biological studies. To explore the region, we propose a CpG islands prediction analysis platform for genome sequence exploration (CpGPAP).

Results

CpGPAP is a web-based application that provides a user-friendly interface for predicting CpG islands in genome sequences or in user input sequences. The prediction algorithms supported in CpGPAP include complementary particle swarm optimization (CPSO), a complementary genetic algorithm (CGA) and other methods (CpGPlot, CpGProD and CpGIS) found in the literature. The CpGPAP platform is easy to use and has three main features: (1) selection of the prediction algorithm; (2) graphic visualization of results; and (3) application of related tools and dataset downloads. These features allow the user to easily view CpG island results and download the relevant island data. CpGPAP is freely available at http://bio.kuas.edu.tw/CpGPAP/.

Conclusions

The platform's supported algorithms (CPSO and CGA) provide a higher sensitivity and a higher correlation coefficient when compared to CpGPlot, CpGProD, CpGIS, and CpGcluster over an entire chromosome.
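The abstract does not spell out the criteria the listed predictors use to score a candidate region. As background, a CpG island is usually described by three quantities: length, G+C fraction and the ratio of observed to expected CpG dinucleotides. The sketch below computes the latter two. The default thresholds (at least 200 bp, G+C at least 50%, observed/expected at least 0.6) are commonly cited values and are assumptions here, not the parameters of CpGPAP, CPSO or CGA.

```python
def cpg_stats(seq):
    """Return (gc_fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: (C * G) / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply illustrative, commonly cited thresholds (assumed, not from the source)."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and obs_exp >= min_obs_exp
```

Optimization-based predictors such as those named above search for the window boundaries; a scoring function of this kind only evaluates a window once it has been proposed.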

15.
Artificial selection has proven to be effective at altering the performance of animal production systems. Nevertheless, selection based on assessment of the genetic superiority of candidates is suboptimal as a result of errors in the prediction of genetic merit. Conventional breeding programs may extend phenotypic measurements on selection candidates to include correlated indicator traits, or delay selection decisions well beyond puberty so that phenotypic performance can be observed on progeny or other relatives. Extending the generation interval to increase the accuracy of selection reduces annual rates of gain compared to accurate selection and use of parents of the next generation as soon as they reach breeding age. Genomic prediction aims at reducing prediction errors at breeding age by exploiting information on the transmission of chromosome fragments from parents to selection candidates, in conjunction with knowledge on the value of every chromosome fragment. For genomic prediction to influence beef cattle breeding programs and the rate or cost of genetic gains, training analyses must be undertaken, and genomic prediction tools made available for breeders and other industry stakeholders. This paper reviews the nature or kind of studies currently underway, the scope or extent of some of those studies, and comments on the likely predictive value of genomic information for beef cattle improvement.

16.

Background

A major part of horizontal gene transfer that contributes to the diversification and adaptation of bacteria is facilitated by genomic islands. The evolution of these islands is poorly understood. Some progress was made with the identification of a set of phylogenetically related genomic islands among the Proteobacteria, recognized from the investigation of the evolutionary origins of a Haemophilus influenzae antibiotic resistance island, namely ICEHin1056. More clarity comes from this comparative analysis of seven complete sequences of the ICEHin1056 genomic island subfamily.

Results

These genomic islands have core and accessory genes in approximately equal proportion, with none demonstrating recent acquisition from other islands. The number of variable sites within core genes is similar to that found in the host bacteria. Furthermore, the GC content of the core genes is similar to that of the host bacteria (38% to 40%). Most of the core gene content is formed by the syntenic type IV secretion system dependent conjugative module and replicative module. GC content and lack of variable sites indicate that the antibiotic resistance genes were acquired relatively recently. An analysis of conjugation efficiency and antibiotic susceptibility demonstrates that phenotypic expression of genomic island-borne genes differs between different hosts.

Conclusion

Genomic islands of the ICEHin1056 subfamily have a longstanding relationship with H. influenzae and H. parainfluenzae and are co-evolving as semi-autonomous genomes within the 'supragenomes' of their host species. They have promoted bacterial diversity and adaptation through becoming efficient vectors of antibiotic resistance by the recent acquisition of antibiotic resistance transposons.

17.
Infection with Mycobacterium avium subsp. paratuberculosis causes Johne's disease in cattle and is also implicated in cases of Crohn's disease in humans. Another closely related strain, M. avium subsp. avium, is a health problem for immunocompromised patients. To understand the molecular pathogenesis of M. avium subspecies, we analyzed the genome contents of isolates collected from humans and domesticated or wildlife animals. Comparative genomic hybridizations indicated distinct lineages for each subspecies where the closest genomic relatedness existed between M. avium subsp. paratuberculosis isolates collected from human and clinical cow samples. Genomic islands (n = 24) comprising 846 kb were present in the reference M. avium subsp. avium strain but absent from 95% of M. avium subsp. paratuberculosis isolates. Additional analysis identified a group of 18 M. avium subsp. paratuberculosis-associated islands comprising 240 kb that were absent from most of the M. avium subsp. avium isolates. Sequence analysis of DNA regions flanking the genomic islands identified three large inversions in addition to several small inversions that could play a role in regulation of gene expression. Analysis of genes encoded in the genomic islands reveals factors that are probably important for various mechanisms of virulence. Overall, M. avium subsp. avium isolates displayed a higher level of genomic diversity than M. avium subsp. paratuberculosis isolates. Among M. avium subsp. paratuberculosis isolates, those from wildlife animals displayed the highest level of genomic rearrangements that were not observed in other isolates. The presented findings will affect the future design of diagnostics and vaccines for Johne's and Crohn's diseases and provide a model for genomic analysis of closely related bacteria.

18.

Background

Genomic islands (GIs) are clusters of alien genes that are present in some bacterial genomes but are not seen in the genomes of other strains within the same genus. The detection of GIs is extremely important to the medical and environmental communities. Despite the discovery of GI-associated features, accurate detection of GIs is still far from satisfactory.

Results

In this paper, we combined multiple GI-associated features, and applied and compared various machine learning approaches to evaluate the classification accuracy of GI datasets on three genera (Salmonella, Staphylococcus and Streptococcus) and on a mixed dataset of all three genera. The experimental results showed that, in general, the decision tree approach outperformed the other machine learning methods according to five performance evaluation metrics. Using J48 decision trees as base classifiers, we further applied four ensemble algorithms, including adaBoost, bagging, multiboost and random forest, to the same datasets. We found that, overall, these ensemble classifiers could improve classification accuracy.

Conclusions

We conclude that decision tree-based ensemble algorithms could accurately classify GIs and non-GIs, and recommend the use of these methods for future GI data analysis. The software package for detecting GIs can be accessed at http://www.esu.edu/cpsc/che_lab/software/GIDetector/.

19.
Uropathogenic Escherichia coli (UPEC) strains are responsible for the majority of uncomplicated urinary tract infections, which can present clinically as cystitis or pyelonephritis. UPEC strain CFT073, isolated from the blood of a patient with acute pyelonephritis, was most cytotoxic and most virulent in mice among our strain collection. Based on the genome sequence of CFT073, microarrays were utilized in comparative genomic hybridization (CGH) analysis of a panel of uropathogenic and fecal/commensal E. coli isolates. Genomic DNA from seven UPEC (three pyelonephritis and four cystitis) isolates and three fecal/commensal strains, including K-12 MG1655, was hybridized to the CFT073 microarray. The CFT073 genome contains 5,379 genes; CGH analysis revealed that 2,820 (52.4%) of these genes were common to all 11 E. coli strains, yet only 173 UPEC-specific genes were found by CGH to be present in all UPEC strains but in none of the fecal/commensal strains. When the sequences of three additional sequenced UPEC strains (UTI89, 536, and F11) and a commensal strain (HS) were added to the analysis, 131 genes present in all UPEC strains but in no fecal/commensal strains were identified. Seven previously unrecognized genomic islands (>30 kb) were delineated by CGH in addition to the three known pathogenicity islands. These genomic islands comprise 672 kb of the 5,231-kb (12.8%) genome, demonstrating the importance of horizontal transfer for UPEC and the mosaic structure of the genome. UPEC strains contain a greater number of iron acquisition systems than do fecal/commensal strains, which is reflective of the adaptation to the iron-limiting urinary tract environment. Each strain displayed distinct differences in the number and type of known virulence factors. The large number of hypothetical genes in the CFT073 genome, especially those shown to be UPEC specific, strongly suggests that many urovirulence factors remain uncharacterized.

20.

Background

Dominance effects may play an important role in the genetic variation of complex traits. Full-featured and easy-to-use computing tools for genomic prediction and variance component estimation of additive and dominance effects using genome-wide single nucleotide polymorphism (SNP) markers are necessary to understand the contribution of dominance to a complex trait and to utilize dominance for selecting individuals with favorable genetic potential.

Results

The GVCBLUP package is a shared memory parallel computing tool for genomic prediction and variance component estimation of additive and dominance effects using genome-wide SNP markers. This package currently has three main programs (GREML_CE, GREML_QM, and GCORRMX) and a graphical user interface (GUI) that integrates the three main programs with an existing program for the graphical viewing of SNP additive and dominance effects (GVCeasy). The GREML_CE and GREML_QM programs offer complementary computing advantages with identical results for genomic prediction of breeding values, dominance deviations and genotypic values, and for genomic estimation of additive and dominance variances and heritabilities using a combination of expectation-maximization (EM) algorithm and average information restricted maximum likelihood (AI-REML) algorithm. GREML_CE is designed for large numbers of SNP markers and GREML_QM for large numbers of individuals. Test results showed that GREML_CE could analyze 50,000 individuals with 400 K SNP markers and GREML_QM could analyze 100,000 individuals with 50K SNP markers. GCORRMX calculates genomic additive and dominance relationship matrices using SNP markers. GVCeasy is the GUI for GVCBLUP integrated with an existing software tool for the graphical viewing of SNP effects and a function for editing the parameter files for the three main programs.

Conclusion

The GVCBLUP package is a powerful and versatile computing tool for assessing the type and magnitude of genetic effects affecting a phenotype by estimating whole-genome additive and dominance heritabilities, for genomic prediction of breeding values, dominance deviations and genotypic values, for calculating genomic relationships, and for research and education in genomic prediction and estimation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-270) contains supplementary material, which is available to authorized users.
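The GVCBLUP abstract above states that GCORRMX calculates genomic additive and dominance relationship matrices from SNP markers, but does not give the formulas. As a hedged illustration of the additive case only, the sketch below uses one widely used definition: genotypes coded 0/1/2 are centred by twice the allele frequency, and the cross-product is scaled by 2 Σ p(1 − p). Whether GCORRMX uses exactly this definition is an assumption; the function name and input layout are hypothetical.

```python
def additive_relationship_matrix(genotypes):
    """Genomic additive relationship matrix under one common definition
    (assumed here, not confirmed as the GCORRMX formula).

    genotypes: list of individuals, each a list of SNP genotypes coded as
               0, 1 or 2 copies of the counted allele.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic; matrix is undefined")
    # Centre each genotype by its expected value 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]
```

This pure-Python version is quadratic in the number of individuals and is meant only to show the arithmetic; datasets of the size quoted in the abstract would need optimized linear algebra.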
