Similar articles
 20 similar articles found (search time: 15 ms)
1.
Analysis of gene deletions is a fundamental approach for investigating gene function. We evaluated an algorithm that uses classification techniques to predict the phenotypic effects of gene deletions in yeast. We used a modified simulated annealing algorithm for feature selection and weighting. The selected features with high weights were phylogenetic conservation scores for bacteria, fungi (excluding Ascomycota), Ascomycota (excluding Saccharomyces cerevisiae), plants, and mammals, degree of paralogy, and number of protein-protein interactions. Classification was performed with weighted k-nearest neighbor and support vector machine algorithms. To demonstrate how this approach might complement existing experimental procedures, we applied our algorithm to predict essential genes and genes causing morphological alterations in yeast.
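
The classification step in abstract 1 can be illustrated with a minimal sketch of a feature-weighted k-nearest-neighbor vote. This is not the authors' implementation: the function names, the toy data, and the hand-picked feature weights are all assumptions (the abstract obtains its weights by simulated annealing, which is not reproduced here), and "weighted" is taken to mean per-feature weights in the distance.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance with one (assumed) weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def weighted_knn_predict(train, query, weights, k=3):
    """Majority vote among the k training points closest to the query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: weighted_distance(item[0], query, weights))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two features per gene, e.g. a conservation
# score and an interaction count (values invented for illustration).
train = [
    ([0.0, 0.0], "nonessential"),
    ([0.0, 1.0], "nonessential"),
    ([5.0, 5.0], "essential"),
    ([5.0, 6.0], "essential"),
]
prediction = weighted_knn_predict(train, [4.5, 5.0], weights=[1.0, 1.0], k=3)
```

Setting a feature's weight to zero removes it from the distance, so in this sketch feature selection and feature weighting are expressed through the same vector.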

2.
Complexity in the nervous system is established by developmental genetic programs, maintained by differential genetic profiles and sculpted by experiential and environmental influence over gene expression. Determining how specific genes define neuronal phenotypes, shape circuit connectivity and regulate circuit function is essential for understanding how the brain processes information, directs behavior and adapts to changing environments. Mouse genetics has contributed greatly to current percepts of gene‐circuit interfaces in behavior, but considerable work remains. Large‐scale initiatives to map gene expression and connectivity in the brain, together with advanced techniques in molecular genetics, now allow detailed exploration of the genetic basis of nervous system function at the level of specific circuit connections. In this review, we highlight several key advances for defining the function of specific genes within a neural network.

3.
An extension of the selection differential in the Robertson–Price equation for the mean phenotype in an age‐structured population is provided. Temporal changes in the mean phenotype caused by transient fluctuations in the age‐distribution and variation in mean phenotype among age classes, which can mistakenly be interpreted as selection, will disappear if reproductive value weighting is applied. Changes in any weighted mean phenotype in an age‐structured population may be decomposed into between‐ and within‐age class components. Using reproductive value weighting, the between‐age class component becomes pure noise, generated by previous genetic drift or fluctuating selection. This component, which we call transient quasi‐selection, can therefore be omitted when estimating age‐specific selection on fecundity or viability within age classes. The final response can be computed at the time of selection, but cannot be observed until lifetime reproduction is realized unless the heritability is one. The generality of these results is illustrated further by our derivation of the selection differential for the continuous time age‐structured model with general age‐dependent weights. A simple simulation example as well as estimation of selection components in a house sparrow population illustrates the applicability of the theory to analyze selection on the mean phenotype in fluctuating age‐structured populations.

4.
In population‐based case‐control studies, it is of great public‐health importance to estimate the disease incidence rates associated with different levels of risk factors. This estimation is complicated by the fact that in such studies the selection probabilities for the cases and controls are unequal. A further complication arises when the subjects who are selected into the study do not participate (i.e. become nonrespondents) and nonrespondents differ systematically from respondents. In this paper, we show how to account for unequal selection probabilities as well as differential nonresponses in the incidence estimation. We use two logistic models, one relating the disease incidence rate to the risk factors, and one modelling the predictors that affect the nonresponse probability. After estimating the regression parameters in the nonresponse model, we estimate the regression parameters in the disease incidence model by a weighted estimating function that weights a respondent's contribution to the likelihood score function by the inverse of the product of his/her selection probability and his/her model‐predicted response probability. The resulting estimators of the regression parameters and the corresponding estimators of the incidence rates are shown to be consistent and asymptotically normal with easily estimated variances. Simulation results demonstrate that the asymptotic approximations are adequate for practical use and that failure to adjust for nonresponses could result in severe biases. An illustration with data from a cardiovascular study that motivated this work is presented.
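
The weighting in abstract 4 can be sketched as follows: each respondent's contribution to the logistic score is multiplied by the inverse of (selection probability × model-predicted response probability). This is a minimal illustration under stated assumptions, not the paper's estimator: the function names and the row layout are hypothetical, and solving the score equation and estimating variances are left out.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def response_probability(intercept, coefs, covariates):
    """Predicted response probability from an (assumed) fitted nonresponse model."""
    return logistic(intercept + sum(c * x for c, x in zip(coefs, covariates)))

def ipw_weight(selection_prob, response_prob):
    """Inverse of the product of selection and response probabilities."""
    if not (0.0 < selection_prob <= 1.0 and 0.0 < response_prob <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return 1.0 / (selection_prob * response_prob)

def weighted_score(rows, beta0, beta):
    """Weighted logistic score vector for the disease model.

    rows is a list of (y, x, weight) with y in {0, 1} and x a list of
    risk-factor values. The first component belongs to the intercept.
    """
    score = [0.0] * (len(beta) + 1)
    for y, x, weight in rows:
        residual = y - logistic(beta0 + sum(b * xi for b, xi in zip(beta, x)))
        score[0] += weight * residual
        for j, xi in enumerate(x):
            score[j + 1] += weight * residual * xi
    return score
```

In this sketch, a respondent sampled with probability 0.5 whose predicted response probability is 0.5 receives weight 4, standing in for the similar subjects who were not selected or did not respond.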

5.
Theory predicts that speciation‐with‐gene‐flow is more likely when the consequences of selection for population divergence transition from mainly direct effects of selection acting on individual genes to a collective property of all selected genes in the genome. Thus, understanding the direct impacts of ecologically based selection, as well as the indirect effects due to correlations among loci, is critical to understanding speciation. Here, we measure the genome‐wide impacts of host‐associated selection between hawthorn and apple host races of Rhagoletis pomonella (Diptera: Tephritidae), a model for contemporary speciation‐with‐gene‐flow. Allele frequency shifts of 32 455 SNPs induced in a selection experiment based on host phenology were genome wide and highly concordant with genetic divergence between co‐occurring apple and hawthorn flies in nature. This striking genome‐wide similarity between experimental and natural populations of R. pomonella underscores the importance of ecological selection at early stages of divergence and calls for further integration of studies of eco‐evolutionary dynamics and genome divergence.

6.
Genomics 2020, 112(3): 2524-2534
The development of embryonic cells involves several continuous stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed through a powerful Monte Carlo feature selection method for the generation of a feature list. Second, incremental feature selection was applied to the list by incorporating two classification algorithms: support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Through SVM, we extracted several latent gene biomarkers, indicating the stages of embryonic cells, and constructed an optimal SVM classifier that produced a nearly perfect classification of embryonic cells. Furthermore, the RIPPER algorithm yielded some interesting rules, suggesting different expression patterns for different stages.

7.
8.
Computational prediction of RNA‐binding residues is helpful in uncovering the mechanisms underlying protein‐RNA interactions. Traditional algorithms applied either a feature‐ or a template‐based prediction strategy in isolation to recognize these crucial residues, which could restrict their predictive power. To improve RNA‐binding residue prediction, herein we propose the first integrative algorithm termed RBRDetector (RNA‐Binding Residue Detector) by combining these two strategies. We developed a feature‐based approach that is an ensemble learning predictor comprising multiple structure‐based classifiers, in which well‐defined evolutionary and structural features in conjunction with sequential or structural microenvironment were used as the inputs of support vector machines. Meanwhile, we constructed a template‐based predictor to recognize the putative RNA‐binding regions by structurally aligning the query protein to the RNA‐binding proteins with known structures. The final RBRDetector algorithm is an ingenious fusion of our feature‐ and template‐based approaches based on a piecewise function. By validating our predictors with diverse types of structural data, including bound and unbound structures, native and simulated structures, and protein structures binding to different RNA functional groups, we consistently demonstrated that RBRDetector not only had clear advantages over its component methods, but also significantly outperformed the current state‐of‐the‐art algorithms. Nevertheless, the major limitation of our algorithm is that it performed relatively well on DNA‐binding proteins and thus incorrectly predicted the DNA‐binding regions as RNA‐binding interfaces. Finally, we implemented the RBRDetector algorithm as a user‐friendly web server, which is freely accessible at http://ibi.hzau.edu.cn/rbrdetector. Proteins 2014; 82:2455–2471. © 2014 Wiley Periodicals, Inc.

9.
Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of samples to variables poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call “relative Signal-to-Noise ratio” (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.
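
The ranking idea in abstract 9 (low variation within the condition of interest, high variation across conditions) can be sketched as a simple ratio. The abstract does not give the rSNR formula, so the score below is an assumed stand-in, not the published definition: extrinsic variation is taken as the standard deviation of per-condition means, intrinsic variation as the standard deviation within the target condition, and `eps` is a hypothetical guard against division by zero.

```python
from statistics import mean, pstdev

def specificity_score(expr_by_condition, target, eps=1e-9):
    """Assumed ratio of across-condition to within-condition variation.

    expr_by_condition maps a condition name to the expression values
    of one gene measured under that condition.
    """
    intrinsic = pstdev(expr_by_condition[target])
    condition_means = [mean(values) for values in expr_by_condition.values()]
    extrinsic = pstdev(condition_means)
    return extrinsic / (intrinsic + eps)

def rank_genes(data, target):
    """Gene names sorted from most to least condition-specific."""
    return sorted(data, key=lambda gene: specificity_score(data[gene], target), reverse=True)

# Hypothetical toy data: gene A is stable within each condition but
# differs between them; gene B varies within conditions only.
data = {
    "A": {"c1": [1.0, 1.0], "c2": [5.0, 5.0]},
    "B": {"c1": [1.0, 5.0], "c2": [1.0, 5.0]},
}
ranking = rank_genes(data, "c1")
```

A feature selector built on this sketch would keep the top-ranked genes for the classifier; how many to keep is a separate choice that the abstract does not specify.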

10.
A vast number of human cell lines are available for cell culture model‐based studies, and as such the potential exists for discrepancies in findings due to cell line selection. To investigate this concept, the authors determine the relative protein abundance profiles of a panel of eight diverse, but commonly studied human cell lines. This panel includes HAP1, HEK293T, HeLa, HepG2, Jurkat, Panc1, SH‐SY5Y, and SVGp12. A mass spectrometry‐based proteomics workflow designed to enhance quantitative accuracy while maintaining analytical depth is used. To this end, this strategy leverages TMTpro16‐based sample multiplexing, high‐field asymmetric ion mobility spectrometry, and real‐time database searching. The data show that the differences in the relative protein abundance profiles reflect cell line diversity. The authors also determine several hundred proteins to be highly enriched for a given cell line, and perform gene ontology and pathway analysis on these cell line‐enriched proteins. An R Shiny application is designed to query protein abundance profiles and retrieve proteins with similar patterns. The workflows used herein can be applied to additional cell lines to aid cell line selection for addressing a given scientific inquiry or for improving an experimental design.

11.
Osteoarthritis (OA) significantly influences the quality of life of people around the world. It is urgent to find an effective way to understand the genetic etiology of OA. We used weighted gene coexpression network analysis (WGCNA) to explore the key genes involved in the subchondral bone pathological process of OA. Fifty gene expression profiles of GSE51588 were downloaded from the Gene Expression Omnibus database. The OA‐associated genes and gene ontologies were acquired from JuniorDoc. Weighted gene coexpression network analysis was used to find disease‐related networks based on 21756 gene expression correlation coefficients, hub‐genes with the highest connectivity in each module were selected, and the correlation between module eigengene and clinical traits was calculated. The genes in the traits‐related gene coexpression modules were subject to functional annotation and pathway enrichment analysis using ClusterProfiler. A total of 73 gene modules were identified, of which 12 modules were found with high connectivity with clinical traits. Five modules were found with enriched OA‐associated genes. Moreover, 310 OA‐associated genes were found, and 34 of them were among hub‐genes in each module. Consequently, enrichment results indicated some key metabolic pathways, such as extracellular matrix (ECM)‐receptor interaction (hsa04512), focal adhesion (hsa04510), the phosphatidylinositol 3'‐kinase (PI3K)‐Akt signaling pathway (PI3K‐AKT) (hsa04151), transforming growth factor beta pathway, and Wnt pathway. We intended to identify some core genes, collagen (COL)6A3, COL6A1, ITGA11, BAMBI, and HCK, which could influence downstream signaling pathways once they were activated. In this study, we identified important genes within key coexpression modules, which associate with a pathological process of subchondral bone in OA. Functional analysis results could provide important information to understand the mechanism of OA.

12.
13.
In recent years, proteomics has become increasingly important to functional genomics. Although a large amount of data is generated by high throughput large‐scale techniques, a connection of these mostly heterogeneous data from different analytical platforms and of different experiments is limited. Data mining procedures and algorithms are often insufficient to extract meaningful results from large datasets and therefore limit the exploitation of the generated biological information. In our proteomic core facility, which almost exclusively focuses on 2‐DE/MS‐based proteomics, we developed a proteomic database custom tailored to our needs aiming at connecting MS protein identification information to 2‐DE derived protein expression profiles. The tools developed should not only enable an automatic evaluation of single experiments, but also link multiple 2‐DE experiments with MS‐data on different levels and thereby help to create a comprehensive network of our proteomics data. Therefore the key feature of our “PROTEOMER” database is its high cross‐referencing capacity, enabling integration of a wide range of experimental data. To illustrate the workflow and utility of the system, two practical examples are provided to demonstrate that proper data cross‐referencing can transform information into biological knowledge.

14.
Gene expression profiling has gradually become a routine procedure for disease diagnosis and classification. In the past decade, many computational methods have been proposed, resulting in great improvements on various levels, including feature selection and algorithms for classification and clustering. In this study, we present iPcc, a novel method from the feature extraction perspective to further propel gene expression profiling technologies from bench to bedside. We define ‘correlation feature space’ for samples based on the gene expression profiles by iterative employment of Pearson’s correlation coefficient. Numerical experiments on both simulated and real gene expression data sets demonstrate that iPcc can greatly highlight the latent patterns underlying noisy gene expression data and thus greatly improve the robustness and accuracy of the algorithms currently available for disease diagnosis and classification based on gene expression profiles.
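
The "iterative employment of Pearson's correlation coefficient" in abstract 14 can be read as follows: describe each sample by its correlations with all samples, then repeat the step on the resulting matrix. The sketch below follows that reading only; the function names, the fixed iteration count `n_iter`, and the absence of any stopping rule are assumptions, not details taken from iPcc.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (norm_u * norm_v)

def correlation_features(samples):
    """Represent each sample by its correlations with every sample."""
    return [[pearson(a, b) for b in samples] for a in samples]

def iterate_correlation(samples, n_iter=2):
    """Apply the correlation transform n_iter times (assumed fixed count)."""
    features = samples
    for _ in range(n_iter):
        features = correlation_features(features)
    return features

# Hypothetical toy profiles: samples 0 and 1 rise across genes,
# samples 2 and 3 fall.
samples = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 9.0],
    [4.0, 3.0, 2.0, 1.0],
    [9.0, 6.0, 4.0, 2.0],
]
features = iterate_correlation(samples, n_iter=2)
```

After the transform, each sample has as many features as there are samples, regardless of the number of genes, so downstream classifiers operate on a square matrix.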

15.
16.
Modern biology has been heavily influenced by the gene‐centric concept. Paradoxically, this very concept – on which bioresearch is based – is challenged by the success of gene‐based research in terms of explaining evolutionary theory. To overcome this major roadblock, it is essential to establish new theories, to not only solve the key puzzles presented by the gene‐centric concept, but also to provide a conceptual framework that allows the field to grow. This paper discusses a number of paradoxes and illustrates how they can be addressed by the genome‐centric concept in order to further resynthesize evolutionary theory. In particular, methodological breakthroughs that analyze genome evolution are discussed. The multiple interactions among different levels of a complex system provide the key to understanding the relationship between self‐organization and natural selection. Darwinian natural selection applies to the biological level due to its unique genetic and heterogeneous features, but does not simply or directly apply to either the lower non‐living level or higher intellectual society level. At the complex bio‐system level, the genome context (the entire package of genes and their genomic physical relationship or genomic topology), not the individual genes, defines the system and serves as the principal selection platform for evolution.

17.
Identifying reproducible yet relevant protein features in proteomics data is a major challenge. Analysis at the level of protein complexes can resolve this issue and we have developed a suite of feature‐selection methods collectively referred to as Rank‐Based Network Analysis (RBNA). RBNAs differ in their individual statistical test setup but are similar in the sense that they deploy rank‐defined weights among proteins per sample. This procedure is known as gene fuzzy scoring. Currently, no RBNA exists for paired‐sample scenarios where both control and test tissues originate from the same source (e.g. same patient). It is expected that paired tests, when used appropriately, are more powerful than approaches intended for unpaired samples. We report that the class‐paired RBNA, PPFSNET, dominates in both simulated and real data scenarios. Moreover, for the first time, we explicitly incorporate batch‐effect resistance as an additional evaluation criterion for feature‐selection approaches. Batch effects are class irrelevant variations arising from different handlers or processing times, and can obfuscate analysis. We demonstrate that PPFSNET and an earlier RBNA, PFSNET, are particularly resistant against batch effects, and only select features strongly correlated with class but not batch.

18.
RNA silencing is a complex of mechanisms that regulate gene expression through small RNA molecules. The microRNA (miRNA) pathway is the most common of these in mammals. Genome‐encoded miRNAs suppress translation in a sequence‐specific manner and facilitate shifts in gene expression during developmental transitions. Here, we discuss the role of miRNAs in oocyte‐to‐zygote transition and in the control of pluripotency. Existing data suggest a common principle involving miRNAs in defining pluripotent and differentiated cells. RNA silencing pathways also rapidly evolve, resulting in many unique features of RNA silencing in different taxonomic groups. This is exemplified in the mouse model of oocyte‐to‐zygote transition, in which the endogenous RNA interference pathway has acquired a novel role in regulating protein‐coding genes, while the miRNA pathway has become transiently suppressed.

19.
20.
Complex proteoforms contain various primary structural alterations resulting from variations in genes, RNA, and proteins. Top‐down mass spectrometry is commonly used for analyzing complex proteoforms because it provides whole sequence information of the proteoforms. Proteoform identification by top‐down mass spectral database search is a challenging computational problem because the types and/or locations of some alterations in target proteoforms are in general unknown. Although spectral alignment and mass graph alignment algorithms have been proposed for identifying proteoforms with unknown alterations, they are extremely slow to align millions of spectra against tens of thousands of protein sequences in high throughput proteome level analyses. Many software tools in this area combine efficient protein sequence filtering algorithms and spectral alignment algorithms to speed up database search. As a result, the performance of these tools heavily relies on the sensitivity and efficiency of their filtering algorithms. Here, we propose two efficient approximate spectrum‐based filtering algorithms for proteoform identification. We evaluated the performances of the proposed algorithms and four existing ones on simulated and real top‐down mass spectrometry data sets. Experiments showed that the proposed algorithms outperformed the existing ones for complex proteoform identification. In addition, combining the proposed filtering algorithms and mass graph alignment algorithms identified many proteoforms missed by ProSightPC in proteome‐level proteoform analyses.
