Similar articles
 Found 20 similar articles (search time: 593 ms)
1.
Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still requires substantial investment in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency compared with currently available packages for variant detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.

2.
There are four sequenced and publicly available plant genomes to date. With many more slated for completion, one challenge will be to use comparative genomic methods to detect novel evolutionary patterns in plant genomes. This research requires sequence alignment algorithms to detect regions of similarity within and among genomes. However, different alignment algorithms are optimized for identifying different types of homologous sequences. This review focuses on plant genome evolution and provides a tutorial for using several sequence alignment algorithms and visualization tools to detect useful patterns of conservation: conserved non-coding sequences, false positive noise, subfunctionalization, synteny, annotation errors, inversions and local duplications. Our tutorial encourages the reader to experiment online with the reviewed tools as a companion to the text.

3.
Traditional phylogenetic analysis is based on multiple sequence alignment. As genome sequencing projects advance worldwide, more and more completely sequenced genomes are becoming available. However, traditional sequence alignment tools cannot cope with large-scale genome sequences. The development of new algorithms that infer phylogenetic relationships from whole-genome information without alignment therefore represents a new direction for phylogenetic study in the post-genome era. In the present study, a novel algorithm based on BBC (base-base correlation) is proposed to analyze the phylogenetic relationships of HEV (hepatitis E virus). When 48 HEV genome sequences are analyzed, the phylogenetic tree constructed with the BBC algorithm is consistent with that of previous work. Compared with sequence alignment methods, the BBC algorithm calculates evolutionary distances between whole genome sequences more rapidly and requires no human intervention such as gene identification or parameter selection. The BBC algorithm can serve as an alternative for rapidly constructing phylogenetic trees and inferring evolutionary relationships.
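
The abstract above does not give the BBC formula, so the following is only a rough sketch of an alignment-free, correlation-based feature vector. It assumes one plausible formulation: for each ordered base pair (a, b), sum over gaps 1..max_gap the term p_ab(gap) * log2(p_ab(gap) / (p_a * p_b)), which yields a 16-dimensional vector per genome. The function names `bbc_vector` and `bbc_distance`, the `max_gap` parameter, and the use of Euclidean distance are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def bbc_vector(seq, max_gap=3):
    """Return a 16-dimensional base-base correlation vector (assumed formulation)."""
    seq = seq.upper()
    n = len(seq)
    base_counts = Counter(c for c in seq if c in BASES)
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    freq = {b: base_counts[b] / total for b in BASES}

    scores = dict.fromkeys(product(BASES, repeat=2), 0.0)
    for gap in range(1, max_gap + 1):
        n_pairs = n - gap
        # Count every pair of characters separated by exactly `gap` positions.
        pairs = Counter((seq[i], seq[i + gap]) for i in range(n_pairs))
        for (a, b), count in pairs.items():
            if a in BASES and b in BASES:
                p_ab = count / n_pairs
                scores[(a, b)] += p_ab * math.log2(p_ab / (freq[a] * freq[b]))
    return [scores[pair] for pair in product(BASES, repeat=2)]


def bbc_distance(seq_a, seq_b, max_gap=3):
    """Euclidean distance between two vectors (the distance choice is an assumption)."""
    return math.dist(bbc_vector(seq_a, max_gap), bbc_vector(seq_b, max_gap))
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method. No alignment step is involved, which is the property the abstract emphasizes.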

4.

Background  

An increasing number of whole viral and bacterial genomes are being sequenced and deposited in public databases. In parallel with the mounting interest in whole genomes, the number of whole-genome analysis software tools is also increasing. GeneOrder was originally developed to analyze genes between two genomes, allowing visualization of gene order and synteny comparisons for any small genomes. It was originally developed for comparing virus, mitochondrion and chloroplast genomes. It has now been extended to small bacterial genomes of less than 2 Mb.

5.

Background  

It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirement of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or, more generally, "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs and are not of interest. For many downstream analyses, we need to eliminate weak, low-scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total score is maximized while respecting the duplication history of the genomes being compared. We call this "quota-based" screening of synteny blocks, in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events.

6.
ABSTRACT: BACKGROUND: The availability of a large number of recently sequenced vertebrate genomes opens new avenues to integrate cytogenetics and genomics in comparative and evolutionary studies. Cytogenetic mapping offers an alternative means of identifying conserved synteny shared by distinct genomes and of defining genome regions that remain poorly characterized even after wide-ranging nucleotide sequencing efforts. An efficient way to perform comparative cytogenetic mapping is to map BAC clones by fluorescence in situ hybridization. In this report, to address the knowledge gap on genome evolution in cichlid fishes, BAC clones from an Oreochromis niloticus library covering linkage groups (LG) 1, 3, 5, and 7 were mapped onto the chromosomes of 9 African cichlid species. The cytogenetic mapping data were also integrated with BAC-end sequence information from O. niloticus and compared against the genomes of other fish species and vertebrates. RESULTS: The location of BACs from LG1, 3, 5, and 7 revealed strong chromosomal conservation among the analyzed cichlid genomes, providing evidence of synteny among the markers of each LG. Comparative in silico analysis also identified large genomic blocks that were conserved in distantly related fish groups and also in other vertebrates. CONCLUSIONS: Although it has been suggested that fish genomes are plastic, with high rates of chromosomal rearrangement and probably low rates of synteny conservation, our results show that large syntenic chromosome segments have been conserved during evolution, at least for the markers considered. Additionally, our cytogenetic mapping efforts, integrated with genomic approaches, offer a new perspective for addressing important questions about chromosome evolution in fishes.

7.
High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, and visualization and post-processing of the alignment, including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines to highlight the fact that the choice of tools significantly affects the final results.

8.

Background  

Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionarily conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements is one of the central goals in comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes.
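
To illustrate how block detection depends on such parameters, here is a minimal sketch of one simple aggregation strategy: chaining marker pairs into a block while consecutive markers stay within a maximum gap in both genomes and keep the same orientation. The function `synteny_blocks` and its parameters `max_gap` and `min_size` are hypothetical names chosen for this example. They are not taken from any tool mentioned in the abstract.

```python
def synteny_blocks(anchors, max_gap=2, min_size=2):
    """Group (index_in_A, index_in_B) marker pairs into collinear blocks.

    Illustrative sketch only: a block is extended while the next marker is
    within max_gap positions in both genomes and keeps the same orientation.
    """
    blocks, current, direction = [], [], 0
    for anchor in sorted(anchors):
        if current:
            da = anchor[0] - current[-1][0]
            db = anchor[1] - current[-1][1]
            step = 1 if db > 0 else -1
            close = 0 < da <= max_gap and 0 < abs(db) <= max_gap
            if close and direction in (0, step):
                current.append(anchor)
                direction = step
                continue
            # The chain is broken: keep the finished block if it is large enough.
            if len(current) >= min_size:
                blocks.append(current)
        current, direction = [anchor], 0
    if len(current) >= min_size:
        blocks.append(current)
    return blocks
```

Running the same marker set with `max_gap=1` and `max_gap=2` can give different blocks, which is the kind of parameter sensitivity the abstract describes.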

9.
10.

Background  

The recent availability of an expanding collection of genome sequences, driven by technological advances, has facilitated comparative genomics and in particular the identification of synteny among multiple genomes. However, the development of effective and easy-to-use methods for identifying such conserved gene clusters among multiple genomes (synteny blocks) lags behind, as does the development of databases that host synteny blocks from various groups of species (especially eukaryotes) and also allow users to run synteny-identification programs.

11.
During the last decade, technological improvements led to the development of large sets of plant genomic resources, permitting the emergence of high-resolution comparative genomic studies. Synteny-based identification of seven shared duplications in cereals led to the modeling of a common ancestral genome structure of 33.6 Mb structured in five protochromosomes containing 9138 protogenes and provided new insights into the evolution of cereal genomes from their extinct ancestors. Recent palaeogenomic data indicate that whole genome duplications were a driving force in the evolutionary success of cereals over the last 50 to 70 million years. Finally, detailed synteny and duplication relationships led to an improved representation of cereal genomes in concentric circles, thus providing a new reference tool for improved gene annotation and cross-genome marker development.

12.
The classic algorithms of Needleman-Wunsch and Smith-Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). To process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces that are suitable for Needleman-Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. As the number of alignment programs applied on a whole genome scale continues to increase, so does the disagreement in their results. The alignments produced by different programs vary greatly, especially in non-coding regions of eukaryotic genomes where the biologically correct alignment is hard to find. Parametric alignment is one possible remedy. This methodology resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. This alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters.
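
For readers unfamiliar with the Needleman-Wunsch step mentioned above, the following is a minimal sketch of the dynamic program with fixed parameters. It computes only the optimal global alignment score and uses a simple linear gap penalty. The function name and the default `match`, `mismatch` and `gap` values are illustrative assumptions, not the scoring model used by the authors.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the DP table.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Each choice of `match`, `mismatch` and `gap` fixes one point in parameter space, and the optimal alignment may change as these values change. Parametric alignment, as described in the abstract, characterizes the optimal alignments over all parameter values instead of committing to a single setting.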

13.
MOTIVATION: As more whole genome sequences become available, comparing multiple genomes at the sequence level can provide insight leading to new biological discoveries. However, genome comparison poses significant challenges. One is the demand for computational resources owing to the large volume of genome data. More importantly, since the choice of genomes to be compared is entirely subjective, there are too many possible choices for genome comparison. For these reasons, there is a pressing need for bioinformatics systems for comparing multiple genomes in which users can freely choose the genomes to be compared. RESULTS: PLATCOM (Platform for Computational Comparative Genomics) is an integrated system for the comparative analysis of multiple genomes. The system is built on several public databases, and a suite of genome analysis applications is provided as exemplary genome data mining tools over these internal databases. Researchers are able to visually investigate genomic sequence similarities, conserved gene neighborhoods, conserved metabolic pathways and putative gene fusion events among a set of selected genomes. AVAILABILITY: http://platcom.informatics.indiana.edu/platcom

14.
Fast algorithms for large-scale genome alignment and comparison (cited 35 times: 5 self-citations, 30 citations by others)
We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller eukaryotic and prokaryotic genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more distantly related genomes by detecting protein sequence homology. This extension to MUMmer aligns two genomes after translating the sequence in all six reading frames, extracts all matching protein sequences and then clusters together matches. This method has been applied to both incomplete and complete genome sequences in order to detect regions of conserved synteny, in which multiple proteins from one organism are found in the same order and orientation in another. The system code is being made freely available by the authors.
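
The six-frame translation step described above can be sketched as follows. This is not MUMmer code. It is a small standalone illustration that assumes the standard genetic code, and the helper names `reverse_complement`, `translate` and `six_frame_translate` are hypothetical. Codons containing characters other than A, C, G or T are translated as "X", which is an assumption made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AMINO)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons of seq; unknown codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translate(seq):
    """Return translations of the three forward and three reverse frames."""
    strands = (seq.upper(), reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]
```

Matching the resulting protein strings between two genomes, rather than the DNA itself, is what lets the comparison tolerate synonymous nucleotide changes in more distantly related genomes.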

15.
We describe EnteriX, a suite of three web-based visualization tools for graphically portraying alignment information from comparisons among several fixed and user-supplied sequences from related enterobacterial species, anchored on a reference genome (http://bio.cse.psu.edu/). The first visualization, Enteric, displays stacked pairwise alignments between a reference genome and each of the related bacteria, represented schematically as PIPs (Percent Identity Plots). Encoded in the views are large-scale genomic rearrangement events and functional landmarks. The second visualization, Menteric, computes and displays 1 Kb views of nucleotide-level multiple alignments of the sequences, together with annotations of genes, regulatory sites and conserved regions. The third, a Java-based tool named Maj, displays alignment information in two formats, corresponding roughly to the Enteric and Menteric views, and adds zoom-in capabilities. The uses of such tools are diverse, from examining the multiple sequence alignment to infer conserved sites with potential regulatory roles, to scrutinizing the commonalities and differences between the genomes for pathogenicity or phylogenetic studies. The EnteriX suite currently includes >15 enterobacterial genomes, generates views centered on four different anchor genomes and provides support for including user sequences in the alignments.

16.
17.
We have previously described a bioinformatics pipeline identifying comparative anchor-tagged sequence (CATS) loci, combined with design of intron-spanning primers. The derived anchor markers defining the linkage position of homologous genes are essential for evaluating genome conservation among related species and facilitate transfer of genetic and genome information between species. Here we validate this global approach in the common bean and in the AA genome complement of the allotetraploid peanut. We present the successful conversion of approximately 50% of the bioinformatics-defined primers into legume anchor markers in bean and diploid Arachis species. One hundred and four new loci representing single-copy genes were added to the existing bean map. These new legume anchor-marker loci enabled the alignment of genetic linkage maps through corresponding genes and provided an estimate of the extent of synteny and collinearity. Extensive macrosynteny between Lotus and bean was uncovered on 8 of the 11 bean chromosomes and large blocks of macrosynteny were also found between bean and Medicago. This suggests that anchor markers can facilitate a better understanding of the genes and genetics of important traits in crops with largely uncharacterized genomes using genetic and genome information from related model plants.

18.
Sex chromosome differentiation began early during mammalian evolution. The karyotype of almost all placental mammals living today includes a pair of heterosomes: XX in females and XY in males. The genomes of different species may contain homologous synteny blocks indicating that they share a common ancestry. One of the tools used for their identification is the Zoo-FISH technique. The aim of the study was to determine whether the sex chromosomes of some members of the Canidae family (the domestic dog, the red fox, the arctic fox, an interspecific arctic fox × red fox hybrid, and the Chinese raccoon dog) are evolutionarily conserved. Comparative cytogenetic analysis by Zoo-FISH using painting probes specific to domestic dog heterosomes was performed. The results show the presence of homologous synteny covering the entire structures of the X and the Y chromosomes. This suggests that sex chromosomes are conserved in the Canidae family. The data obtained through Zoo-FISH karyotype analysis complement information obtained using other comparative genomics methods, giving a more complete depiction of genome evolution.

19.
Genomic screens for small RNA candidates in Enterobacteriaceae genomes were carried out with existing small RNA sequences, conserved flanking genes, and genomic backbone information. The small RNA sequences and contexts from E. coli K12 formed the basis of the search. Sequence identity identified 117 additional small RNA homologs in related genomes. Motifs of continuous sequence stretches added another 48 sRNA regions, termed partial homologs. However, this study is unique in identifying 160 nonhomologous sRNA loci in related genomes based on the conserved flanking gene synteny and the backbone retention information obtained from KEGG-SSDB. Gene synteny and genomic backbone continuity were observed to be correlated with all of the sRNAs in related genomes. This search is the first of its kind toward identification of functionally important regions using gene order and backbone information. A disruption in flanking gene order or genomic backbone indicates a possible hotspot for alien gene pool integration. This study reports both the occurrence of multiple copies of an sRNA and the co-occurrence of different sRNAs between a pair of conserved flanking genes. In general, synteny and genomic backbone retention information can be added as additional search criteria toward the design of precise bioinformatics tools for sRNA identification, gene identification, and gene functional annotation in related genomes.

20.
Non-circular plots of whole genomes are natural representations of genomic data aligned along all chromosomes. Currently, there is no specialized graphical user interface (GUI) designed to produce non-circular whole genome diagrams, and the use of existing tools requires considerable coding effort from users. Moreover, such tools also require improvement, including the addition of new functionalities. To address these issues, we developed a new R/Shiny application, named shinyChromosome, as a GUI for the interactive creation of non-circular whole genome diagrams. shinyChromosome can be easily installed on personal computers for individual use as well as on local or public servers for community use. Publication-quality images can be readily generated and annotated from user input using diverse widgets. shinyChromosome is deployed at http://150.109.59.144:3838/shinyChromosome/, http://shinyChromosome.ncpgr.cn, and https://yimingyu.shinyapps.io/shinyChromosome for online use. The source code and manual of shinyChromosome are freely available at https://github.com/venyao/shinyChromosome.
