Found 20 similar articles (search time: 0 ms)
1.
Eight years after publication of the Arabidopsis genome sequence and two years before completing the first phase of an international effort to characterize the function of every Arabidopsis gene, plant biologists remain unable to provide a definitive answer to the following basic question: what is the minimal gene set required for normal growth and development? The purpose of this review is to summarize different strategies employed to identify essential genes in Arabidopsis, an important component of the minimal gene set in plants, to present an overview of the datasets and specific genes identified to date, and to discuss the prospects for future saturation of this important class of genes. The long-term goal of this collaborative effort is to facilitate basic research in plant biology and complement ongoing research with other model organisms.
2.
Identifying bacterial genes and endosymbiont DNA with Glimmer (total citations: 11; self-citations: 0; citations by others: 11)
MOTIVATION: The Glimmer gene-finding software has been successfully used for finding genes in bacteria, archaea and viruses representing hundreds of species. We describe several major changes to the Glimmer system, including improved methods for identifying both coding regions and start codons. We also describe a new module of Glimmer that can distinguish host and endosymbiont DNA. This module was developed in response to the discovery that eukaryotic genome sequencing projects sometimes inadvertently capture the DNA of intracellular bacteria living in the host. RESULTS: The new methods dramatically reduce the rate of false-positive predictions, while maintaining Glimmer's 99% sensitivity rate at detecting genes in most species, and they find substantially more correct start sites, as measured by comparisons to known and well-curated genes. We show that our interpolated Markov model (IMM) DNA discriminator correctly separated 99% of the sequences in a recent genome project that produced a mixture of sequences from the bacterium Prochloron didemni and its sea squirt host, Lissoclinum patella. AVAILABILITY: Glimmer is OSI Certified Open Source and available at http://cbcb.umd.edu/software/glimmer.
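The abstract above describes an interpolated Markov model (IMM) discriminator that separates host from endosymbiont sequences. As a rough illustration of the underlying idea only, the sketch below trains two fixed-order Markov models and assigns a sequence to whichever model gives it the higher log-likelihood. This is a simplification and not Glimmer's IMM, which interpolates across several context lengths; the function names, the order `k`, the pseudocount and the toy sequences are all assumptions made for the example.

```python
import math
from collections import defaultdict


def train_markov(seqs, k=2, alphabet="ACGT", pseudo=1.0):
    """Return a log-probability function for a fixed-order-k Markov model.

    Simplified stand-in for an interpolated Markov model: one context length only.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(k, len(s)):
            counts[s[i - k:i]][s[i]] += 1.0

    def logp(context, base):
        # Unseen contexts fall back to a uniform distribution via the pseudocount.
        seen = counts.get(context, {})
        total = sum(seen.values()) + pseudo * len(alphabet)
        return math.log((seen.get(base, 0.0) + pseudo) / total)

    return logp


def log_likelihood_ratio(seq, logp_a, logp_b, k=2):
    """Sum of per-base log-odds; positive values favour model A."""
    return sum(
        logp_a(seq[i - k:i], seq[i]) - logp_b(seq[i - k:i], seq[i])
        for i in range(k, len(seq))
    )


def classify(seq, logp_a, logp_b, k=2, labels=("host", "endosymbiont")):
    """Assign seq to the class whose model explains it better."""
    score = log_likelihood_ratio(seq, logp_a, logp_b, k)
    return labels[0] if score > 0 else labels[1]


# Toy training data (hypothetical): AT-rich "host" vs GC-rich "endosymbiont".
host_model = train_markov(["ATATATATATAT", "TATATATATA"])
endo_model = train_markov(["GCGCGCGCGC", "CGCGCGCGCG"])
print(classify("ATATATAT", host_model, endo_model))
print(classify("GCGCGCGC", host_model, endo_model))
```

A real discriminator would be trained on far longer sequences and would need a rule for scores close to zero; the sign-based decision here is only the simplest possible choice.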
3.
Background
Evolution of metabolism occurs through the acquisition and loss of genes whose products act as enzymes in metabolic reactions, and from a presumably simple primordial metabolism the organisms living today have evolved complex and highly variable metabolisms. We have studied this phenomenon by comparing the metabolic networks of 134 bacterial species with known phylogenetic relationships, and by studying a neutral model of metabolic network evolution.
4.
Essential genes, those indispensable for the survival of an organism, play a key role in the emerging field of synthetic biology. Characterization of functions encoded by essential genes not only has important practical implications, such as in identifying antibiotic drug targets, but can also enhance our understanding of basic biology, such as functions needed to support cellular life. Enzymes are critical for almost all cellular activities. However, essential genes have not been systematically examined from the aspect of enzymes and the chemical reactions that they catalyze. Here, by comprehensively analyzing essential genes in 14 bacterial genomes in which large-scale gene essentiality screens have been performed, we found that enzymes are enriched in essential genes. Among essential enzymes, ligases (especially those forming carbon-oxygen bonds and carbon-nitrogen bonds), nucleotidyltransferases and phosphotransferases are overrepresented, while oxidoreductases are underrepresented. Furthermore, essential enzymes tend to associate with more gene ontology domains. These results, from the aspect of chemical reactions, provide further insights into the understanding of functions needed to support natural cellular life, as well as synthetic cells, and provide additional parameters that can be integrated into gene essentiality prediction algorithms.
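The abstract reports that enzymes are enriched among essential genes but does not state which statistical test was used. One common way to quantify such enrichment is a one-sided hypergeometric (Fisher-type) test, and the sketch below is offered under that assumption. The function name and all counts are hypothetical and are not data from the study.

```python
from math import comb


def enrichment_p_value(n_genes, n_enzymes, n_essential, n_essential_enzymes):
    """One-sided hypergeometric p-value: P(X >= n_essential_enzymes).

    X is the number of enzymes among n_essential genes drawn at random,
    without replacement, from a genome of n_genes containing n_enzymes.
    """
    upper = min(n_enzymes, n_essential)
    denom = comb(n_genes, n_essential)
    # Exact integer arithmetic for the tail; a single division at the end.
    tail = sum(
        comb(n_enzymes, i) * comb(n_genes - n_enzymes, n_essential - i)
        for i in range(n_essential_enzymes, upper + 1)
    )
    return tail / denom


# Hypothetical genome: 4000 genes, 1000 enzymes, 300 essential genes,
# of which 120 are enzymes (75 would be expected by chance).
print(enrichment_p_value(4000, 1000, 300, 120))
```

A small p-value here would indicate that enzymes occur among essential genes more often than random sampling predicts; applying the same test per enzyme class would additionally call for a multiple-testing correction.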
5.
6.
Genetic relationships and population structure of 8 horse breeds in the Czech and Slovak Republics were investigated using
classification methods for breed discrimination. To demonstrate genetic differences among these breeds, we used genetic information
— genotype data of microsatellite markers and classification algorithms — to perform a probabilistic prediction of an individual’s
breed. In total, 932 unrelated animals were genotyped for 17 microsatellite markers recommended by the ISAG for parentage
testing (AHT4, AHT5, ASB2, HMS3, HMS6, HMS7, HTG4, HTG10, VHL20, HTG6, HMS2, HTG7, ASB17, ASB23, CA425, HMS1, LEX3). Algorithms
of classification methods — J48 (decision trees); Naive Bayes, Bayes Net (probability predictors); IB1, IB5 (instance-based
machine learning methods); and JRip (decision rules) — were used for analysis of their classification performance and of results
of classification on this genotype dataset. Selected classification methods (Naive Bayes, Bayes Net, IB1), based on machine
learning and principles of artificial intelligence, appear usable for these tasks.
7.
Background
Biological systems are often modular: they can be decomposed into nearly-independent structural units that perform specific functions. The evolutionary origin of modularity is a subject of much current interest. Recent theory suggests that modularity can be enhanced when the environment changes over time. However, this theory has not yet been tested using biological data.
8.
Background
A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism.
9.
Zoltán Dezső Yuri Nikolsky Tatiana Nikolskaya Jeremy Miller David Cherba Craig Webb Andrej Bugrim 《BMC systems biology》2009,3(1):36
Background
The identification of key target nodes within complex molecular networks remains a common objective in scientific research. The results of pathway analyses are usually sets of fairly complex networks or functional processes that are deemed relevant to the condition represented by the molecular profile. To be useful in a research or clinical laboratory, the results need to be translated to the level of testable hypotheses about individual genes and proteins within the condition of interest.
10.
11.
ABSTRACT: BACKGROUND: Abattoir-detected pathologies are of crucial importance to both pig production and food safety. Usually, more than one pathology coexists in a pig herd, although it often remains unknown how these different pathologies interrelate. Identification of the associations between different pathologies may facilitate an improved understanding of their underlying biological linkage, and support veterinarians in encouraging control strategies aimed at reducing the prevalence of not just one, but two or more conditions simultaneously. RESULTS: Multi-dimensional machine learning methodology was used to identify associations between ten typical pathologies in 6485 batches of slaughtered finishing pigs, assisting the comprehension of their biological association. Pathologies potentially associated with septicaemia (e.g. pericarditis, peritonitis) appear interrelated, suggesting on-going bacterial challenges by pathogens such as Haemophilus parasuis and Streptococcus suis. Furthermore, hepatic scarring appears interrelated with both milk spot livers (Ascaris suum) and bacteria-related pathologies, suggesting a potential multi-pathogen nature for this pathology. CONCLUSIONS: The application of novel multi-dimensional machine learning methodology provided new insights into how typical pig pathologies are potentially interrelated at batch level. The methodology presented is a powerful exploratory tool to generate hypotheses, applicable to a wide range of studies in veterinary research.
12.
Identifying emerging viral pathogens and characterizing their transmission is essential to developing effective public health measures in response to an epidemic. Phylogenetics, though currently the most popular tool used to characterize the likely host of a virus, can be ambiguous when studying species very distant from known species and when there is very little reliable sequence information available in the early stages of the outbreak of disease. Motivated by an existing framework for representing biological sequence information, we learn sparse, tree-structured models, built from decision rules based on subsequences, to predict viral hosts from protein sequence data using popular discriminative machine learning tools. Furthermore, the predictive motifs robustly selected by the learning algorithm are found to show strong host-specificity and occur in highly conserved regions of the viral proteome.
13.
Graph-based methods have been widely used for the analysis of biological networks. Their application to metabolic networks has been much discussed, in particular noting that an important weakness in such methods is that reaction stoichiometry is neglected. In this study, we show that reaction stoichiometry can be incorporated into path-finding approaches via mixed-integer linear programming. This major advance at the modeling level results in improved prediction of topological and functional properties in metabolic networks.
14.
Jutta Gebert Susanne Motameny Ulrich Faigle Christian V Forst Rainer Schrader 《Journal of computational biology》2008,15(2):185-194
In order to understand the behavior of a gene regulatory network, it is essential to know the genes that belong to it. Identifying the correct members (e.g., in order to build a model) is a difficult task even for small subnetworks. Usually only a few members of a network are known and one needs to guess the missing members based on experience or informed speculation. It is beneficial if one can additionally rely on experimental data to support this guess. In this work we present a new method based on formal concept analysis to detect unknown members of a gene regulatory network from gene expression time series data. We show that formal concept analysis is able to find a list of candidate genes for inclusion in a partially known basic network. This list can then be reduced by a statistical analysis so that the resulting genes interact strongly with the basic network and therefore should be included when modeling the network. The method has been applied to the DNA repair system of Mycobacterium tuberculosis. In this application, our method produces comparable results to an already existing method of component selection while it is applicable to a broader range of problems.
15.
16.
Yan Lin 《Biochemical and biophysical research communications》2010,396(2):472-476
Essential genes, indispensable genes for an organism’s survival, encode functions that are considered a foundation of life. Based on those experimentally determined for 10 bacteria, we find that essential genes are more preferentially situated at the leading strand than at the lagging strand, for all the 10 genomes studied, confirming previous findings based on either smaller datasets or putatively assigned ones by homology search. Furthermore, we find that rather than all essential genes, only those with the COG functional category of information storage and processing (J, K and L), and subcategories D (cell cycle control), M (cell wall biogenesis), O (posttranslational modification), C (energy production and conversion), G (carbohydrate transport and metabolism), E (amino acid transport and metabolism) and F (nucleotide transport and metabolism) are preferentially situated at the leading strand. In contrast, the strand-bias for essential genes in other COG functional subcategories is not statistically significant. These results suggest that the remarkable strand-bias of the distribution of essential genes is mainly relevant to the aforementioned functionalities, which, therefore, likely play a key role in shaping the gene strand-bias in bacterial genomes.
17.
Jun Liao Manfred K Warmuth Sridhar Govindarajan Jon E Ness Rebecca P Wang Claes Gustafsson Jeremy Minshull 《BMC biotechnology》2007,7(1):16
Background
Altering a protein's function by changing its sequence allows natural proteins to be converted into useful molecular tools. Current protein engineering methods are limited by a lack of high throughput physical or computational tests that can accurately predict protein activity under conditions relevant to its final application. Here we describe a new synthetic biology approach to protein engineering that avoids these limitations by combining high throughput gene synthesis with machine learning-based design algorithms.
18.
In the last decade, bacterial taxonomy witnessed a huge expansion. The swift pace of bacterial species (re-)definitions has a serious impact on the accuracy and completeness of first-line identification methods. Consequently, back-end identification libraries need to be synchronized with the List of Prokaryotic names with Standing in Nomenclature. In this study, we focus on bacterial fatty acid methyl ester (FAME) profiling as a broadly used first-line identification method. From the BAME@LMG database, we have selected FAME profiles of individual strains belonging to the genera Bacillus, Paenibacillus and Pseudomonas. Only those profiles resulting from standard growth conditions have been retained. The corresponding data set covers 74, 44 and 95 validly published bacterial species, respectively, represented by 961, 378 and 1673 standard FAME profiles. Through the application of machine learning techniques in a supervised strategy, different computational models have been built for genus and species identification. Three techniques have been considered: artificial neural networks, random forests and support vector machines. Nearly perfect identification has been achieved at genus level. Notwithstanding the known limited discriminative power of FAME analysis for species identification, the computational models have resulted in good species identification results for the three genera. For Bacillus, Paenibacillus and Pseudomonas, random forests yielded sensitivity values of 0.847, 0.901 and 0.708, respectively. The random forests models outperform those of the other machine learning techniques. Moreover, our machine learning approach also outperformed the Sherlock MIS (MIDI Inc., Newark, DE, USA). These results show that machine learning proves very useful for FAME-based bacterial species identification.
Besides good bacterial identification at species level, speed and ease of taxonomic synchronization are major advantages of this computational species identification strategy.
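The abstract above quotes one sensitivity value per genus without saying how the per-species results were aggregated. As an illustration only, the sketch below computes sensitivity (recall) for each class and then takes the unweighted mean over classes; that averaging choice is an assumption, and the function names and toy labels are hypothetical rather than taken from the study.

```python
from collections import Counter


def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) per class: correctly identified / all true members."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


def macro_sensitivity(y_true, y_pred):
    """Unweighted mean of the per-class sensitivities."""
    scores = per_class_sensitivity(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels (hypothetical, not data from the study).
truth = ["B. subtilis", "B. subtilis", "B. cereus", "B. cereus", "B. cereus"]
predicted = ["B. subtilis", "B. cereus", "B. cereus", "B. cereus", "B. subtilis"]
print(per_class_sensitivity(truth, predicted))
print(macro_sensitivity(truth, predicted))
```

Averaging per class rather than per profile keeps species with many profiles from dominating the score, which matters when class sizes are as uneven as the profile counts quoted above suggest.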
19.
Shriyaa Mittal 《Molecular simulation》2018,44(11):891-904
Abstract: Molecular dynamics (MD) simulations are critical to understanding the movements of proteins in time. Yet, MD simulations are limited by the availability of high-resolution protein structures, accuracy of the underlying force-field, computational expense, and difficulty in analysing big data-sets. Machine learning algorithms are now routinely used to circumvent many of these limitations and computational biophysicists are continuously making progress in developing novel applications. Here, we discuss some of these methods, varying from traditional dimensionality reduction approaches to more recent abstractions such as transfer learning and reinforcement learning, and how they have been used to deal with the challenges in MD. We conclude with the prospective issues in the application of machine learning methods in MD, to increase accuracy and efficiency of protein dynamics studies in general.
20.
Baumgartner C Böhm C Baumgartner D Marini G Weinberger K Olgemöller B Liebl B Roscher AA 《Bioinformatics (Oxford, England)》2004,20(17):2985-2996
MOTIVATION: During the Bavarian newborn screening programme all newborns have been tested for about 20 inherited metabolic disorders. Owing to the amount and complexity of the generated experimental data, machine learning techniques provide a promising approach to investigate novel patterns in high-dimensional metabolic data which form the source for constructing classification rules with high discriminatory power. RESULTS: Six machine learning techniques have been investigated for their classification accuracy focusing on two metabolic disorders, phenylketonuria (PKU) and medium-chain acyl-CoA dehydrogenase deficiency (MCADD). Logistic regression analysis led to superior classification rules (sensitivity >96.8%, specificity >99.98%) compared to all investigated algorithms. Including novel constellations of metabolites into the models, the positive predictive value could be strongly increased (PKU 71.9% versus 16.2%, MCADD 88.4% versus 54.6% compared to the established diagnostic markers). Our results clearly prove that the mined data confirm the known and indicate some novel metabolic patterns which may contribute to a better understanding of newborn metabolism.
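The abstract reports very high specificity alongside much lower positive predictive values, which follows from the rarity of the disorders being screened. The sketch below shows the standard Bayes relation between sensitivity, specificity, prevalence and positive predictive value. The sensitivity and specificity are the lower bounds quoted in the abstract, while the prevalence of 1 in 10,000 is a hypothetical value chosen for illustration, so the result is not expected to reproduce the study's figures.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(disease | positive test), from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity bounds come from the abstract;
# the prevalence (1 in 10,000) is an assumed, illustrative value.
ppv = positive_predictive_value(0.968, 0.9998, 1e-4)
print(f"PPV = {ppv:.3f}")
```

At low prevalence the false-positive term dominates the denominator, so even small gains in specificity raise the positive predictive value considerably.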