Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Biological data mining using kernel methods can be improved by a task-specific choice of the kernel function. Oligo kernels for genomic sequence analysis have proven to have a high discriminative power and to provide interpretable results. Oligo kernels that consider subsequences of different lengths can be combined and parameterized to increase their flexibility. For adapting these parameters efficiently, gradient-based optimization of the kernel-target alignment is proposed. The power of this new, general model selection procedure and the benefits of fitting kernels to problem classes are demonstrated by adapting oligo kernels for bacterial gene start detection.
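
As an illustration of the quantity being optimized, the following minimal sketch computes the kernel-target alignment of a weighted sum of kernel matrices. It assumes labels in {-1, +1} and uses the common definition A(K, yy^T) = <K, yy^T>_F / (||K||_F ||yy^T||_F). The function names `combine_kernels` and `kernel_target_alignment` are hypothetical, and the gradient-based optimization of the oligo kernel parameters described in the abstract is not reproduced here.

```python
import math


def combine_kernels(kernels, weights):
    """Weighted sum of precomputed kernel matrices (lists of lists)."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


def kernel_target_alignment(K, y):
    """Alignment between kernel matrix K and labels y in {-1, +1}."""
    n = len(y)
    # Frobenius inner product <K, y y^T>
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    # Frobenius norm of K
    k_norm = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    # ||y y^T||_F equals n when every label is +1 or -1
    return num / (k_norm * n)


# Toy example: two base kernels standing in for oligo kernels of two lengths
y = [1, 1, -1]
ideal = [[a * b for b in y] for a in y]  # the "target" kernel y y^T
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
combined = combine_kernels([ideal, identity], [0.7, 0.3])
alignment = kernel_target_alignment(combined, y)
```

The alignment equals 1 when the kernel matrix is proportional to yy^T, so a parameter search would aim to move the combined kernel toward that value.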

2.
Gu X. Genetics, 2007, 175(4): 1813-1822
In this article, we develop an evolutionary model for protein sequence evolution. Gene pleiotropy is characterized by K distinct but correlated components (molecular phenotypes) that affect the organismal fitness. These K molecular phenotypes are under stabilizing selection with microadaptation (SM) due to random optima shifts, the SM model. Random coding mutations generate a correlated distribution of K molecular phenotypes. Under this SM model, we further develop a statistical method to estimate the "effective" number of molecular phenotypes (K(e)) of the gene. Therefore, for the first time we can empirically evaluate gene pleiotropy from the protein sequence analysis. Case studies of vertebrate proteins indicate that K(e) is typically approximately 6-9. We demonstrate that the newly developed SM model of protein evolution may provide a basis for exploring genomic evolution and correlations.

3.
Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA, and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence detect noncoding RNA regions from genome sequences. A novel kernel function, stem kernel, for the discrimination and detection of functional RNA sequences using support vector machines (SVMs) is proposed. The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm is developed to calculate the stem kernels based on dynamic programming. The stem kernels are then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures. This is because the string kernel is proven to work for the remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel in order to find novel RNA families from genome sequences.
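
The stem kernel itself is not specified in enough detail here to reproduce, but the all-subsequences kernel that it extends has a well-known dynamic-programming form. The sketch below is an illustration of that baseline only: it counts matching (possibly non-contiguous) subsequence occurrences of two strings, including the empty subsequence. The function name `all_subsequences_kernel` is hypothetical, and base pairing and stem structures are not modeled.

```python
def all_subsequences_kernel(s, t):
    """Inner product of subsequence-count feature vectors of s and t."""
    n, m = len(s), len(t)
    # dp[i][j] is the kernel value for the prefixes s[:i] and t[:j].
    # With an empty prefix only the empty subsequence is shared, so the value is 1.
    dp = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Subsequences of s[:i] that do not use the character s[i-1]
            total = dp[i - 1][j]
            # Subsequences that end with s[i-1], matched to each equal t[k-1]
            for k in range(1, j + 1):
                if t[k - 1] == s[i - 1]:
                    total += dp[i - 1][k - 1]
            dp[i][j] = total
    return dp[n][m]


# Toy RNA fragments
similarity = all_subsequences_kernel("GGCAU", "GCAU")
```

This direct version runs in O(nm^2) time; it is meant to show the recursion rather than to be efficient.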

4.
A method for refining the beginnings of genes and a search for shifts of the reading frame is proposed. The method is based on a comparison of nucleotide and amino acid sequences of homologous genes of related organisms. The algorithm is based on the fact that the rate of changes in the protein-coding regions of the genome is substantially lower than that of noncoding regions. A modification of the Smith-Waterman algorithm is proposed, which makes it possible to align the amino acid sequences obtained by formal translation of the starting nucleotide sequences by taking into account a possible shift of the reading frame. The algorithm has been implemented in the package of ORTOLOGATOR-GeneCorrector programs. Testing the program showed that the approach enables one to detect a wrong annotation of the beginnings in 1% of genes (even in well-studied organisms such as Escherichia coli) and identify several (approximately 10) shifts of the open reading frame. Thus, the algorithm can be used at both the initial and final stages of analysis of the genome.
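
For orientation, the sketch below shows the standard, unmodified Smith-Waterman local alignment score that the proposed method builds on. The scoring values (match, mismatch, linear gap penalty) and the function name `smith_waterman_score` are assumptions chosen for illustration; the frame-shift-aware modification described in the abstract is not implemented here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


# Toy amino acid sequences: the shared segment "MKV" drives the score
score = smith_waterman_score("XXMKVXX", "MKV")
```

A frame-shift-aware variant would additionally need transitions between the three reading frames of the translated nucleotide sequence, which is the part specific to the method above.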

5.
Linking molecular evolution to biological function is a long-standing challenge in evolutionary biology. Some of the best examples of this involve opsins, the genes that encode the molecular basis of light reception. In this issue of Molecular Ecology, three studies examine opsin gene sequence, expression and repertoire to determine how natural selection has shaped the visual system. First, Escobar-Camacho et al. (2017) use opsin repertoire and expression in three Amazonian cichlid species to show that a shift in sensitivity towards longer wavelengths is coincident with the long-wavelength-dominated Amazon basin. Second, Stieb et al. (2017) explore opsin sequence and expression in reef-dwelling damselfish and find that UV- and long-wavelength vision are both important, but likely for different ecological functions. Lastly, Suvorov et al. (2017) study an expansive opsin repertoire in the insect order Odonata and find evidence that copy number expansion is consistent with the permanent heterozygote model of gene duplication. Together these studies emphasize the utility of opsin genes for studying both the local adaptation of sensory systems and, more generally, gene family evolution.

6.
Polymerase chain reaction-based assays provide rapid, simple, and sensitive detection of bacterial genes, but are not without their drawbacks. This review summarizes the principal advantages and disadvantages of PCR-based bacterial gene detection, provides guidelines for the development and validation of new PCR assays, and describes potential pitfalls that may be encountered and how these can be avoided.

7.
Emerging known and unknown pathogens create profound threats to public health. Platforms for rapid detection and characterization of microbial agents are critically needed to prevent and respond to disease outbreaks. Available detection technologies cannot provide broad functional information about known or novel organisms. As a step toward developing such a system, we have produced and tested a series of high-density functional gene arrays to detect elements of virulence and antibiotic resistance mechanisms. Our first-generation array targets genes from Escherichia coli strains K12 and CFT073, Enterococcus faecalis and Staphylococcus aureus. We determined optimal probe design parameters for gene family detection and discrimination. When tested with organisms at varying phylogenetic distances from the four target strains, the array detected orthologs for the majority of targeted gene families present in bacteria belonging to the same taxonomic family. In combination with whole-genome amplification, the array detects femtogram concentrations of purified DNA, either spiked into an aerosol sample background, or in combinations from one or more of the four target organisms. This is the first report of a high-density NimbleGen microarray system targeting microbial antibiotic resistance and virulence mechanisms. By targeting virulence gene families as well as genes unique to specific biothreat agents, these arrays will provide important data about the pathogenic potential and drug resistance profiles of unknown organisms in environmental samples.

8.
9.
The discovery of regulatory motifs embedded in upstream regions of plants is a particularly challenging bioinformatics task. Previous studies have shown that motifs in plants are short compared with those found in vertebrates. Furthermore, plant genomes have undergone several diversification mechanisms such as genome duplication events which impact the evolution of regulatory motifs. In this article, a systematic phylogenomic comparison of upstream regions is conducted to further identify features of the plant regulatory genomes, the component of genomes regulating gene expression, to enable future de novo discoveries. The findings highlight differences in upstream region properties between major plant groups and the effects of divergence times and duplication events. First, clear differences in upstream region evolution can be detected between monocots and dicots, thus suggesting that a separation of these groups should be made when searching for novel regulatory motifs, particularly since universal motifs such as the TATA box are rare. Second, investigating the decay rate of significantly aligned regions suggests that a divergence time of ~100 mya sets a limit for reliable conserved non-coding sequence (CNS) detection. Insights presented here will set a framework to help identify embedded motifs of functional relevance by understanding the limits of bioinformatics detection for CNSs.

10.
It is known that while the programs used to find genes in prokaryotic genomes reliably map protein-coding regions, they often fail in the exact determination of gene starts. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and identification of frame-shifts are important problems of the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for correction of gene starts and identification of frame-shifts in prokaryotic genomes. The algorithm is based on the comparison of nucleotide and protein sequences of homologous genes from related organisms, using the assumption that the rate of evolutionary changes in protein-coding regions is lower than that in non-coding regions. A dynamic programming algorithm is used to align protein sequences obtained by formal translation of genomic nucleotide sequences. The possibility of frame-shifts is taken into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, depending on the genome, 1-10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.
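
The "formal translation" step can be illustrated with a short sketch that translates a nucleotide sequence in each of the three forward reading frames using the standard genetic code. The names `translate` and `three_frame_translations` are hypothetical, unknown codons are rendered as "X" by assumption, and the alignment and frame-shift detection steps of the algorithm are not shown.

```python
# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[p:p + 3], "X")  # "X" for codons with unknown bases
        for p in range(frame, len(dna) - 2, 3)
    )


def three_frame_translations(dna):
    """Translations of dna in the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


frames = three_frame_translations("ATGAAATAA")
```

After a single-base insertion or deletion, the protein downstream of the error continues in a different one of these frames, which is why an aligner that can switch frames is able to localize frame-shifts.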

11.
Evolutionary origins of bacterial bioluminescence (total citations: 5; self-citations: 0; citations by others: 5)
In bacteria, most genes required for the bioluminescence phenotype are contained in lux operons. Sequence alignments of several lux gene products show the existence of at least two groups of paralogous products. The alpha- and beta-subunits of bacterial luciferase and the non-fluorescent flavoprotein are paralogous, and two antennae proteins (lumazine protein and yellow fluorescence protein) are paralogous with riboflavin synthetase. Models describing the evolution of these paralogous proteins are suggested, as well as a postulate for the identity of the gene encoding a protobioluminescent luciferase.

12.
Isolation of bacterial endophytes from germinated maize kernels (total citations: 1; self-citations: 0; citations by others: 1)
The germination of surface-sterilized maize kernels under aseptic conditions proved to be a suitable method for isolation of kernel-associated bacterial endophytes. Bacterial strains identified by partial 16S rRNA gene sequencing as Pantoea sp., Microbacterium sp., Frigoribacterium sp., Bacillus sp., Paenibacillus sp., and Sphingomonas sp. were isolated from kernels of 4 different maize cultivars. Genus Pantoea was associated with a specific maize cultivar. The kernels of this cultivar were often overgrown with the fungus Lecanicillium aphanocladii; however, those exhibiting Pantoea growth were never colonized with it. Furthermore, the isolated bacterial strain inhibited fungal growth in vitro.

13.
14.
15.
Evolutionary optimization of fluorescent proteins for intracellular FRET (total citations: 17; self-citations: 0; citations by others: 17)
Fluorescent proteins that exhibit Förster resonance energy transfer (FRET) have made a strong impact as they enable measurement of molecular-scale distances through changes in fluorescence. FRET-based approaches have enabled otherwise intractable measurements of molecular concentrations, binding interactions and catalytic activity, but are limited by the dynamic range and sensitivity of the donor-acceptor pair. To address this problem, we applied a quantitative evolutionary strategy using fluorescence-activated cell sorting to optimize a cyan-yellow fluorescent protein pair for FRET. The resulting pair, CyPet-YPet, exhibited a 20-fold ratiometric FRET signal change, as compared to threefold for the parental pair. The optimized FRET pair enabled high-throughput flow cytometric screening of cells undergoing caspase-3-dependent apoptosis. The CyPet-YPet energy transfer pair provides substantially improved sensitivity and dynamic range for a broad range of molecular imaging and screening applications.
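
As background for why FRET reports molecular-scale distances, the standard Förster relation (not stated in the abstract, and given here only as general context) expresses the transfer efficiency E in terms of the donor-acceptor distance r and the Förster radius R_0 of the pair:

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}}
```

The sixth-power dependence makes E change steeply when r is close to R_0, which is the range in which a donor-acceptor pair is most sensitive to distance.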

16.
Escherichia coli express many types of O antigen, present in the outer membrane of the Gram-negative bacterial cell wall. O-antigen biosynthesis genes are clustered together and differences seen in O-antigen types are due to genetic variation within this gene cluster. Sequencing of the E. coli O4 O-antigen gene cluster revealed a similar gene order and high levels of similarity to that of E. coli O26, indicating a common ancestor. These lateral transfer events observed within O-antigen gene clusters may occur as part of the evolution of the pathogenic clones.

17.
One of the main obstacles to the widespread use of artificial neural networks is the difficulty of adequately defining values for their free parameters. This article discusses how Radial Basis Function (RBF) networks can have their parameters defined by genetic algorithms. To this end, it presents an overall view of the problems involved and the different approaches used to genetically optimize RBF networks. A new strategy to optimize RBF networks using genetic algorithms is proposed, which includes a new representation, a new crossover operator and the use of a multiobjective optimization criterion. Experiments using a benchmark problem are performed and the results achieved using this model are compared to those achieved by other approaches.

18.
Polyketide synthases (PKS) perform a stepwise biosynthesis of diverse carbon skeletons from simple activated carboxylic acid units. The products of the complex pathways possess a wide range of pharmaceutical properties, including antibiotic, antitumor, antifungal, and immunosuppressive activities. We have performed a comprehensive phylogenetic analysis of multimodular and iterative PKS of bacteria and fungi and of the distinct types of fatty acid synthases (FAS) from different groups of organisms based on the highly conserved ketoacyl synthase (KS) domains. Apart from enzymes that meet the classification standards we have included enzymes involved in the biosynthesis of mycolic acids, polyunsaturated fatty acids (PUFA), and glycolipids in bacteria. This study has revealed that PKS and FAS have passed through a long joint evolution process, in which modular PKS have a central position. They appear to have derived from bacterial FAS and primary iterative PKS and, in addition, share a common ancestor with animal FAS and secondary iterative PKS. Furthermore, we have carried out a phylogenomic analysis of all modular PKS that are encoded by the complete eubacterial genomes currently available in the database. The phylogenetic distribution of acyltransferase and KS domain sequences revealed that multiple gene duplications, gene losses, as well as horizontal gene transfer (HGT) have contributed to the evolution of PKS I in bacteria. The impact of these factors seems to vary considerably between the bacterial groups. Whereas in actinobacteria and cyanobacteria the majority of PKS I genes may have evolved from a common ancestor, several lines of evidence indicate that HGT has strongly contributed to the evolution of PKS I in proteobacteria. 
Discovery of new evolutionary links between PKS and FAS and between the different PKS pathways in bacteria may help us in understanding the selective advantage that has led to the evolution of multiple secondary metabolite biosyntheses within individual bacteria.

19.
20.
MOTIVATION: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS: We introduce two classes of kernel functions that are constructed by combining sequence profiles with new and existing approaches for determining the similarity between pairs of protein sequences. These kernels are constructed directly from these explicit protein similarity measures and employ effective profile-to-profile scoring schemes for measuring the similarity between pairs of proteins. Experiments with remote homology detection and fold recognition problems show that these kernels are capable of producing results that are substantially better than those produced by all of the existing state-of-the-art SVM-based methods. In addition, the experiments show that these kernels, even when used in the absence of profiles, produce results that are better than those produced by existing non-profile-based schemes. AVAILABILITY: The programs for computing the various kernel functions are available on request from the authors.
