Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
《Genomics》2019,111(6):1298-1305
Based on the k-mer model for protein sequences, a novel k-mer natural vector method is proposed to characterize the features of k-mers in a protein sequence, taking both the numbers and the distributions of k-mers into account. It is proved that the relationship between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily, without requiring evolutionary models or human intervention. In addition, there is no criterion for choosing a suitable k, and k has a great influence on both the results and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. The results of phylogenetic analysis on three protein datasets demonstrate that the new method can precisely describe the evolutionary relationships of proteins and greatly improve computing efficiency.
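
The abstract above describes a vector built from the numbers and distributions of k-mers. A minimal Python sketch of that idea follows, assuming the usual natural-vector ingredients (count, mean position, and a normalized second central moment per k-mer). The function name `kmer_natural_vector` and the normalization by the number of k-mer positions are illustrative assumptions; the paper's exact definition, and its compound vector over several k, may differ.

```python
from collections import defaultdict


def kmer_natural_vector(seq: str, k: int) -> dict[str, tuple[int, float, float]]:
    """Map each k-mer in seq to (count, mean position, normalized 2nd central moment)."""
    n = len(seq) - k + 1  # number of k-mer start positions
    positions = defaultdict(list)
    for i in range(n):
        positions[seq[i:i + k]].append(i + 1)  # 1-based start position

    vector = {}
    for kmer, pos in positions.items():
        count = len(pos)
        mean = sum(pos) / count
        # Assumed normalization: divide by count and by the number of positions.
        d2 = sum((p - mean) ** 2 for p in pos) / (count * n)
        vector[kmer] = (count, mean, d2)
    return vector


# Hypothetical toy sequence, for illustration only.
nv = kmer_natural_vector("MKVLMKV", 2)
```

Here `nv["MK"]` holds the count, mean start position, and spread of the 2-mer `MK`; k-mers that never occur are simply absent from the dictionary.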

2.
The order of genes in the genomes of species can change during evolution and can provide information about their phylogenetic relationship. An interesting method to infer the phylogenetic relationship from the gene orders is to use different types of rearrangement operations and to find possible rearrangement scenarios using these operations. One of the most common rearrangement operations is the reversal, which reverses the order of a subset of neighboring genes. In this paper, we study the problem of finding the ancestral gene order for three species represented by their gene orders. The rearrangement scenario should use a minimal number of reversals and no other rearrangement operations. This problem is called the Median problem and is known to be NP-complete. We describe a heuristic algorithm for finding solutions to the Median problem that searches for rearrangement scenarios with the additional property that gene groups should not be destroyed by reversal operations. The concept of conserved intervals for signed permutations is used to describe such gene groups. We show experimentally, for different types of test problems, that the proposed algorithm produces very good results compared to other algorithms for the Median problem. We also integrate our reversal selection procedure into the well-known MGR and GRAPPA algorithms and show that they achieve a significant speedup while obtaining solutions of the same quality as the original algorithms on the test problems.
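
The reversal operation mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes the standard signed-permutation convention, in which a reversal inverts the order of a contiguous block of genes and flips the orientation (sign) of each; the function name `apply_reversal` is illustrative and not taken from the paper, MGR, or GRAPPA.

```python
def apply_reversal(perm: list[int], i: int, j: int) -> list[int]:
    """Reverse the segment perm[i..j] (inclusive, 0-based) and flip each gene's sign."""
    if not 0 <= i <= j < len(perm):
        raise ValueError("need 0 <= i <= j < len(perm)")
    segment = [-g for g in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


order = [1, -3, -2, 4]
print(apply_reversal(order, 1, 2))  # [1, 2, 3, 4]
```

Applying the same reversal twice restores the original gene order, which is why reversal scenarios can be searched in either direction.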

3.
The surprising fact that global statistical properties computed on a genome-wide scale may reveal species information was first observed in studies of dinucleotide frequencies. Here we look at the same phenomenon with a totally different statistical approach. We show that patterns in the short-range statistical correlations in DNA sequences serve as evolutionary fingerprints of eukaryotes. All chromosomes of a species display the same characteristic pattern, markedly different from those of other species. Because of this correlation pattern, the chromosomes of a species are sorted onto the same branch of a phylogenetic tree. The average correlation between nucleotides at a distance k is quantified in two independent ways: (i) by estimating it from a higher-order Markov process and (ii) by computing the mutual information function at a distance k. We show how the quality of phylogenetic reconstruction depends on the range of correlation strengths and on the length of the underlying sequence segment. This concept of the correlation pattern as a phylogenetic signature of eukaryote species combines two rather distant domains of research, namely phylogenetic analysis based on molecular observation and the study of the correlation structure of DNA sequences.
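
Approach (ii) in this abstract, the mutual information function at distance k, can be illustrated with a short Python sketch. It uses the plug-in estimate I(k) = Σ p_ab(k) log2(p_ab(k) / (p_a p_b)), with marginals taken from the same symbol pairs; this estimator choice and the name `mutual_information` are assumptions, and the paper may apply corrections (for example for finite sample size) that are not shown here.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Mutual information (in bits) between symbols k positions apart in seq."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    if not pairs:
        raise ValueError("sequence must be longer than k")
    total = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)   # marginal counts of the first symbol
    right = Counter(b for _, b in pairs)  # marginal counts of the second symbol

    info = 0.0
    for (a, b), count in joint.items():
        p_ab = count / total
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        info += p_ab * math.log2(p_ab * total * total / (left[a] * right[b]))
    return info
```

Evaluating the function for k = 1, 2, 3, ... on each chromosome gives a correlation profile of the kind the abstract treats as a species signature.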

4.
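As more and more genomes are sequenced, alignment-free phylogenetic analysis based on whole genomes has become an active research topic. The nucleotide composition of genomes is not identical across species or individuals. The information carried by DNA sequences, the language of heredity, is largely reflected in their k-mer frequencies. Phylogenetic trees based on the k-mer frequencies of genome sequences therefore offer a new perspective on the relationships among species. In this paper we define an information parameter based on k-mer frequencies, use it to characterize genome sequences, compute the distances between the information parameters of different genomes, and construct a phylogenetic tree for 84 viruses with the neighbor-joining method. The resulting tree agrees to a large extent with existing phylogenetic trees.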
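
The pipeline in this abstract (k-mer frequencies, then pairwise distances, then neighbor joining) can be sketched up to the distance step in Python. The abstract does not define its information parameter here, so the sketch substitutes plain relative k-mer frequencies and Euclidean distance as stand-ins; the function names and the `ACGT` default alphabet are likewise assumptions.

```python
import math
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int, alphabet: str = "ACGT") -> list[float]:
    """Relative frequency of every possible k-mer, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)  # skips k-mers with other symbols, e.g. N
    return [counts[m] / total if total else 0.0 for m in kmers]


def distance_matrix(genomes: dict[str, str], k: int) -> dict[tuple[str, str], float]:
    """Euclidean distance between the k-mer frequency profiles of each genome pair."""
    profiles = {name: kmer_frequencies(seq, k) for name, seq in genomes.items()}
    names = list(profiles)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for idx, a in enumerate(names)
        for b in names[idx + 1:]
    }
```

The resulting pairwise distances are the kind of input a neighbor-joining implementation would take to build the tree.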

5.
MOTIVATION: Alternative splicing is currently seen to explain the vast disparity between the number of predicted genes in the human genome and the highly diverse proteome. The mapping of expressed sequence tag (EST) consensus sequences derived from the GeneNest database onto the genome provides an efficient way of predicting exon-intron boundaries, gene structure and alternative splicing events. However, the alternative splicing events are obscured by a large number of putatively artificial exon boundaries arising from genomic contamination or alignment errors. The current work describes a methodology for associating quality values with the predicted exon-intron boundaries. High-quality exon-intron boundaries are used to predict constitutive and alternative splicing ranked by confidence values, aiming to facilitate large-scale analysis of alternative splicing and of splicing in general. RESULTS: Applying the current methodology, constitutive splicing is observed in 33,270 EST clusters, of which 45% are alternatively spliced. The classification derived from the computed confidence values for 17 of these splice events agrees in most cases (15/17) with RT-PCR experiments performed on 40 different tissue samples. As an application of the confidence measure, an evaluation of the distribution of alternative splicing revealed that the majority of variants correspond to the coding regions of the genes. However, a significant fraction still maps to non-coding regions, indicating a functional relevance of alternative splicing in untranslated regions. AVAILABILITY: The predicted alternative splice variants are visualized in the SpliceNest database at http://splicenest.molgen.mpg.de

6.
7.
Clark AG 《Cell》2008,134(3):388-389
Next-generation sequencing methods use massively parallel detection of short sequencing reactions, making them ideal for the analysis of ancient DNA. In this issue, Green et al. (2008) exploit this feature to infer the complete mitochondrial genome sequence of one Neanderthal and place bounds on its time of common ancestry with modern humans.

8.
Genome sequences and great expectations   Total citations: 2, self-citations: 1, citations by others: 1
Iliopoulos I  Tsoka S  Andrade MA  Janssen P  Audit B  Tramontano A  Valencia A  Leroy C  Sander C  Ouzounis CA 《Genome biology》2001,2(1):interactions0001.1-interactions00013
To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function.

10.
Genome sequences of Halobacterium species   Total citations: 1, self-citations: 1, citations by others: 0

11.
《Genomics》2022,114(4):110414
Classification of viruses into their taxonomic ranks (e.g., order, family, and genus) provides a framework to organize an abundant population of viruses. Next-generation metagenomic sequencing technologies have led to a rapid increase in viral sequencing data, which require bioinformatics tools for taxonomic analysis. Many metagenomic taxonomy classifiers have been developed to study microbiomes, but it is particularly challenging to assign the taxonomy of diverse virus sequences, and there is a growing need for dedicated methods that are optimized to classify virus sequences into their taxa. For taxonomic classification of viruses from metagenomic sequences, we developed VirusTaxo using diverse (e.g., 402 DNA and 280 RNA) genera of viruses. VirusTaxo has an average accuracy of 93% at genus-level prediction in DNA and RNA viruses. VirusTaxo outperformed existing taxonomic classifiers of viruses, assigning taxonomy to a larger fraction of metagenomic contigs than other methods. Benchmarking of VirusTaxo on a collection of SARS-CoV-2 sequencing libraries and metavirome datasets suggests that VirusTaxo can characterize virus taxonomy from highly diverse contigs and provide a reliable decision on the taxonomy of viruses.

12.

Background

With the advent of low-cost, fast sequencing technologies, metagenomic analyses have become possible. However, the large data volumes gathered by these techniques and the unpredictable diversity captured in them remain a challenge for computational biology.

Results

In this paper we address the problem of rapid taxonomic assignment with small and adaptive data models (< 5 MB) and present the accelerated k-mer explorer (AKE). Acceleration in AKE’s taxonomic assignments is achieved by a special machine learning architecture, which is well suited to modeling data collections that are intrinsically hierarchical. We report reasonably good classification accuracy for ranks down to order, observed in a study on real-world data (Acid Mine Drainage, Cow Rumen).

Conclusion

We show that the execution time of this approach is orders of magnitude shorter than that of competing approaches and that accuracy is comparable. The tool is presented to the public as a web application (url: https://ani.cebitec.uni-bielefeld.de/ake/, username: bmc, password: bmcbioinfo).

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0384-0) contains supplementary material, which is available to authorized users.

13.
14.
Evaluation and improvements in the automatic alignment of protein sequences   Total citations: 6, self-citations: 0, citations by others: 6
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of the five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. Including information about the secondary structure of one of the proteins, in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.

15.
The Taiwanese (Formosan) macaque (Macaca cyclopis) is the only nonhuman primate endemic to Taiwan. This primate species is valuable for evolutionary studies and as a subject in medical research. However, only partial fragments of the mitochondrial genome (mitogenome) of this species have been sequenced, let alone its nuclear genome. We employed next-generation sequencing to generate 2 x 90 bp paired-end reads, followed by reference-assisted de novo assembly with a multiple k-mer strategy, to characterize the M. cyclopis mitogenome. We compared the assembled mitogenome with those of other macaque species for phylogenetic analysis. Our results show that the M. cyclopis mitogenome consists of 16,563 nucleotides encoding 13 protein-coding genes, 2 ribosomal RNAs and 22 transfer RNAs. Phylogenetic analysis indicates that M. cyclopis is most closely related to M. mulatta lasiota (Chinese rhesus macaque), supporting the notion of an Asian continental origin of M. cyclopis proposed in previous studies based on partial mitochondrial sequences. Our work presents a novel approach for assembling a mitogenome that combines de novo genome assembly with the assistance of a reference genome. The availability of the complete Taiwanese macaque mitogenome will facilitate the study of primate evolution and the characterization of genetic variation for the potential use of this species as a nonhuman primate model in medical research.

16.
The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium.

17.
Genome structure and divergence of nucleotide sequences in echinodermata   Total citations: 1, self-citations: 0, citations by others: 1
The arrangement of repetitive and single-copy DNA sequences has been studied in the DNA of some species of Echinodermata: sea urchin, starfishes and sea cucumber. Comparison of the reassociation kinetics of short and long DNA fragments indicates that the pattern of DNA sequence organization of all these species is similar to the so-called Xenopus pattern characteristic of the genomes of most animals and plants. However, substantial variations have been found in the amount of repetitive nucleotide sequences in the DNA of different species and in the length of DNA regions containing adjacent single-copy and repetitive sequences. Measurements of the size of S1-nuclease-resistant reassociated repetitive DNA sequences show that the ratio between long and short repetitive DNA sequences varies among species. The degree of divergence of short and long repetitive DNA sequences and single-copy DNA was studied by molecular hybridization of the sea urchin Strongylocentrotus intermedius 3H-DNA with the DNA of other species and by determination of the thermostability of the hybridized molecules so obtained. All three fractions of S. intermedius DNA contain sequences homologous to the DNA of the other echinoderm species studied. The results suggest that short repetitive DNA sequences are those which have been most highly conserved throughout the evolution of Echinodermata. A new hypothesis is proposed to explain the nature of the evolutionary changes in DNA sequence interspersion patterns.

18.
Guo M  Han X  Jin T  Zhou L  Yang J  Li Z  Chen J  Geng B  Zou Y  Wan D  Li D  Dai W  Wang H  Chen Y  Ni P  Fang C  Yang R 《Journal of bacteriology》2012,194(14):3740-3741
Most of the species in the family Planctomycetaceae are of interest for their eukaryotic-like cell structures and characteristics of resistance to extreme environments. Here, we report draft genome sequences of three aquatic parasitic species of this family, Singulisphaera acidiphila (DSM 18658T), Schlesneria paludicola (DSM 18645T), and Zavarzinella formosa (DSM 19928T).

19.
20.
The implications of genome analysis for evolutionary theory and systematics are treated. The precise relationship between the theoretical and operational definitions of chromosome homology is shown to be uncertain. It is pointed out that genera defined by genome analysis may be either monophyletic or non-monophyletic, and that the genus is not a basic unit of evolution. Characters obtained by genome analysis may be useful in a phylogenetic context, provided they are treated as all other characters.
