Similar articles
 20 similar articles found (search time: 31 ms)
1.
As a result of genome, EST and cDNA sequencing projects, there are huge numbers of predicted and/or partially characterised protein sequences compared with a relatively small number of proteins with experimentally determined function and structure. Thus, considerable attention is focused on the accurate prediction of gene function and structure from sequence by using bioinformatics. In the course of our analysis of genomic sequence from Fugu rubripes, we identified a novel gene, SAND, with significant sequence identity to hypothetical proteins predicted in Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans, a Drosophila melanogaster gene, and mouse and human cDNAs. Here we identify a further SAND homologue in human and Arabidopsis thaliana by use of standard computational tools. We describe the genomic organisation of SAND in these evolutionarily divergent species and identify sequence homologues from EST database searches confirming the expression of SAND in over 20 different eukaryotes. We confirm the expression of two different SAND paralogues in mammals and determine expression of one SAND in other vertebrates and eukaryotes. Furthermore, we predict structural properties of SAND, and characterise conserved sequence motifs in this protein family.

2.
Identification of small nucleolar RNAs (snoRNAs) in genomic sequences has been challenging due to the relative paucity of sequence features. Many current prediction algorithms rely on detection of snoRNA motifs complementary to target sites in snRNAs and rRNAs. However, recent discovery of snoRNAs without apparent targets requires development of alternative prediction methods. We present an approach that combines rule-based filters and a Bayesian Classifier to identify a class of snoRNAs (H/ACA) without requiring target sequence information. It takes advantage of unique attributes of their genomic organization and improved species-specific motif characterization to predict snoRNAs that may otherwise be difficult to discover. Searches in the genomes of Caenorhabditis elegans and the closely related Caenorhabditis briggsae suggest that our method performs well compared to recent benchmark algorithms. Our results illustrate the benefits of training gene discovery engines on features restricted to particular phylogenetic groups and the utility of incorporating diverse data types in gene prediction.

3.
Large genomic sequencing projects of pathogens, as well as of the human genome, have produced immense genomic and proteomic data that are very beneficial for novel target identification in pathogens. The subtractive genomic approach is one of the most useful strategies for identifying potential targets. The approach works by subtracting the genes or proteins that are homologous between host and pathogen and identifying the set of genes or proteins that are essential for the pathogen and present exclusively in it. Here the subtractive genomic approach is employed to identify novel targets in Salmonella typhi. The pathogen has 4718 proteins, of which 300 are found to be essential (“indispensable to support cellular life”) in the pathogen with no human homolog. Metabolic pathway analysis of these 300 essential proteins revealed that 149 proteins are exclusively involved in several metabolic pathways of S. typhi. Eight metabolic pathways, comprising 27 enzymes unique to the pathogen, are found to be present exclusively in the pathogen. Thus, these 27 proteins may serve as prospective drug targets. Sub-cellular localization prediction for the 300 essential proteins reveals that 11 proteins lie on the outer membrane of the pathogen and could be probable vaccine candidates.
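
The subtraction step described in this abstract can be sketched as a simple set filter. This is only an illustration under stated assumptions: the function name `subtract_host_homologs` and the toy protein identifiers are hypothetical, and in practice the essential set and the host-homolog set would come from upstream analyses (for example, essentiality annotations and similarity searches) that the abstract does not detail.

```python
def subtract_host_homologs(pathogen_proteins, essential_ids, host_homolog_ids):
    """Keep pathogen proteins that are essential and have no host homolog.

    All three arguments are sets of protein identifiers. How each set is
    derived is outside this sketch and is an assumption of the example.
    """
    return sorted(
        pid
        for pid in pathogen_proteins
        if pid in essential_ids and pid not in host_homolog_ids
    )


# Toy identifiers (hypothetical, not taken from the study).
pathogen = {"P1", "P2", "P3", "P4", "P5"}
essential = {"P1", "P2", "P4"}
host_homologs = {"P2", "P5"}

candidates = subtract_host_homologs(pathogen, essential, host_homologs)
print(candidates)
```

In the toy data, P2 is essential but has a host homolog, so it is removed; only the essential, pathogen-exclusive proteins remain as candidate targets.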

4.
Gene identification in novel eukaryotic genomes by self-training algorithm   Cited by: 8 (self-citations: 0; citations by others: 8)
Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, genomic organization of novel eukaryotic genomes is diverse and ab initio gene finding tools tuned up for previously studied species are rarely suitable for efficacious gene hunting in DNA sequences of a new genome. Gene identification methods based on cDNA and expressed sequence tag (EST) mapping to genomic DNA or those using alignments to closely related genomes rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes for estimating gene model parameters. In practice, none of these types of data may be available in sufficient amount until rather late stages of the novel genome sequencing. Nevertheless, we have shown that gene finding in eukaryotic genomes could be carried out in parallel with statistical model estimation directly from yet anonymous genomic DNA. The suggested method of parallelizing gene prediction with model parameter estimation follows the path of the iterative Viterbi training. Rounds of genomic sequence labeling into coding and non-coding regions are followed by rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods where the supervised model training precedes the gene prediction step. Several novel genomes have been analyzed and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.
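
The alternation between sequence labeling and parameter estimation described above can be sketched with a deliberately tiny two-state hidden Markov model. This is a minimal illustration of generic iterative Viterbi training, not the authors' gene model: the two states, the single-nucleotide emissions, the pseudocount, and all function names are assumptions made for the example, and the dynamically changing parameter restrictions mentioned in the abstract are omitted.

```python
import math

STATES = ("coding", "noncoding")
ALPHABET = "ACGT"


def viterbi(seq, start, trans, emit):
    """Return the most probable state path for a non-empty seq (log-space)."""
    score = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}]
    back = []
    for ch in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            prev = max(STATES, key=lambda p: score[-1][p] + math.log(trans[p][s]))
            row[s] = score[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][ch])
            ptr[s] = prev
        score.append(row)
        back.append(ptr)
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def reestimate(seq, path, pseudo=1.0):
    """Re-estimate transition and emission tables from one labeling.

    The pseudocount keeps every probability positive so logs stay finite.
    """
    emit = {s: {c: pseudo for c in ALPHABET} for s in STATES}
    trans = {s: {t: pseudo for t in STATES} for s in STATES}
    for ch, s in zip(seq, path):
        emit[s][ch] += 1
    for a, b in zip(path, path[1:]):
        trans[a][b] += 1
    for table in (emit, trans):
        for s in STATES:
            total = sum(table[s].values())
            for key in table[s]:
                table[s][key] /= total
    return trans, emit


def viterbi_training(seq, start, trans, emit, rounds=10):
    """Alternate labeling and re-estimation until the labeling stops changing."""
    path = None
    for _ in range(rounds):
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
        trans, emit = reestimate(seq, path)
    return path, trans, emit


# Toy sequence and initial guesses (hypothetical values).
seq = "GCGCGCGCGC" + "ATATATATAT"
start = {"coding": 0.5, "noncoding": 0.5}
trans = {"coding": {"coding": 0.9, "noncoding": 0.1},
         "noncoding": {"coding": 0.1, "noncoding": 0.9}}
emit = {"coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "noncoding": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

path, trans_hat, emit_hat = viterbi_training(seq, start, trans, emit)
print(path)
```

A realistic gene finder would use a far richer state structure (exons, introns, splice sites, codon-position-dependent emissions); the sketch only shows why no pre-labeled training set is needed once labeling and estimation are interleaved.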

5.
One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group features that are related to the phenotypes. In addition, the prediction mean square errors are also smaller than those of the component-wise boosting procedure. We demonstrate the application of the methods to pathway-based analysis of microarray gene expression data of breast cancer. Results from analysis of a breast cancer microarray gene expression data set indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance are important to breast cancer-specific survival.

6.
7.
Germ Cell Tumors (GCT) have a high cure rate, but we currently lack the ability to accurately identify the small subset of patients who will die from their disease. We used a combined genomic and expression profiling approach to identify genomic regions and underlying genes that are predictive of outcome in GCT patients. We performed array-based comparative genomic hybridization (CGH) on 53 non-seminomatous GCTs (NSGCTs) treated with cisplatin-based chemotherapy and defined altered genomic regions using Circular Binary Segmentation. We identified 14 regions associated with two-year disease-free survival (2yDFS) and 16 regions associated with five-year disease-specific survival (5yDSS). From corresponding expression data, we identified 101 probe sets that showed significant changes in expression. We built several models based on these differentially expressed genes, then tested them in an independent validation set of 54 NSGCTs. These predictive models correctly classified outcome in 64–79.6% of patients in the validation set, depending on the endpoint utilized. Survival analysis demonstrated a significant separation of patients with good versus poor predicted outcome when using a combined gene set model. Multivariate analysis using clinical risk classification with the combined gene model indicated that they were independent prognostic markers. This novel set of predictive genes from altered genomic regions is almost entirely independent of our previously identified set of predictive genes for patients with NSGCTs. These genes may aid in the identification of the small subset of patients who are at high risk of poor outcome.

8.
Yi G, Jung J. Bioinformation, 2011, 7(5): 251-256
Identifying genomic regions that descended from a common ancestor helps us study gene function and genome evolution. In distantly related genomes, clusters of homologous gene pairs are used as evidence in function prediction, operon detection, etc. Currently, many computational methods have been proposed that define gene clusters to identify gene families and operons. However, most of those algorithms are only applicable to small data sets. We developed an efficient gene clustering algorithm that can be applied to hundreds of genomes at the same time. This approach allows for large-scale study of evolutionary relationships of gene clusters and study of operon formation and destruction. An analysis of proposed algorithms shows that more biological insight can be obtained by analyzing gene clusters across hundreds of genomes, which can help us understand operon occurrences, gene orientations and gene rearrangements.

9.
10.
11.
Non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) present in bacteria and fungi are the major multi-modular enzyme complexes which synthesize secondary metabolites like the pharmacologically important antibiotics and siderophores. Each of the multiple modules of an NRPS activates a different amino or aryl acid, followed by their condensation to synthesize a linear or cyclic natural product. The studies on NRPS domains, the knowledge of their gene cluster architecture and tailoring enzymes have helped in the in silico genetic screening of the ever-expanding sequenced microbial genomic data for the identification of novel NRPS/PKS clusters and thus deciphering novel non-ribosomal peptides (NRPs). The adenylation domain is an integral part of the NRPSs and is the substrate-selecting unit for the final assembled NRP. In some cases, it also requires a small protein, the MbtH homolog, for its optimum activity. The presence of putative adenylation domains and MbtH homologs in a sequenced genome can help identify novel secondary metabolite producers. The role of the adenylation domain in the NRPS gene clusters and its characterization as a tool for the discovery of novel cryptic NRPS gene clusters are discussed.

12.
With the recent emphasis on the importance of personalized genomic medicine, studies have performed prognostic stratification using gene signatures in cancers. However, these studies have not considered gene networks with clinical data. Therefore, this study aimed to develop a novel prognostic score using grouped variable selection for patients with osteosarcoma. We assessed messenger RNA (mRNA) expression and clinical data from Gene Expression Omnibus to develop a novel prognostic scoring system for patients with osteosarcoma. Variable selection using Network-Regularized high-dimensional Cox-regression analysis with information regarding gene networks obtained from six large pathway databases was performed. We determined the risk score based on the linear combination of regression coefficients and mRNA expression values. Log-rank test, Uno's c-index, and area under the curve (AUC) values were determined to evaluate the discriminatory power between the low- and high-risk groups. A recently reported next-generation Connectivity Map was used to identify future therapeutic targets for osteosarcoma. Our novel model had significantly high discriminatory power in predicting overall survival. An optimal c-index of 0.967 was obtained and time-dependent receiver operating characteristic analysis revealed an acceptable predictive value of AUC between 0.953 and 1.000. Knockdown of BACE2 or ING2 and linifanib treatment may improve the prognosis of patients with osteosarcoma. This novel prognostic scoring system would not only facilitate a more accurate prediction of patient prognosis, but also contribute to the selection of suitable therapeutic alternatives for osteosarcoma patients.
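
The risk score mentioned in this abstract, a linear combination of regression coefficients and expression values, can be sketched as follows. The gene names, coefficient values, patient identifiers and the median cutoff used to split low- and high-risk groups are all assumptions made for illustration; the abstract does not state how the cutoff was chosen.

```python
from statistics import median


def risk_score(coefficients, expression):
    """Sum of coefficient * expression over the genes in the signature."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    The median split is an assumption of this sketch, not the study's rule.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical fitted Cox coefficients and expression profiles.
coefficients = {"GENE_A": 0.5, "GENE_B": -1.0}
patients = {
    "p1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "p2": {"GENE_A": 4.0, "GENE_B": 0.5},
    "p3": {"GENE_A": 1.0, "GENE_B": 2.0},
}

scores = {pid: risk_score(coefficients, expr) for pid, expr in patients.items()}
groups = stratify(scores)
print(scores)
print(groups)
```

A positive coefficient raises the score as expression increases, and a negative coefficient lowers it, which is why p2 ends up in the high-risk group in the toy data.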

13.
14.
Genomics, 2020, 112(3): 2233-2240
MicroRNA-like small RNAs (milRNAs) with a length of 21–22 nucleotides are a type of small non-coding RNA that was first found in Neurospora crassa in 2010. Identifying milRNAs of species without genomic information is a difficult problem. Here, knowledge-based energy features are developed to identify milRNAs by incorporating a k-mer scheme and a distance-dependent pair potential. Compared with the k-mer scheme, the features developed here can alleviate the inherent curse of dimensionality in the k-mer scheme once k becomes large. In addition, milRNApredictor built on the novel features performs comparably to the k-mer scheme, and achieves a sensitivity of 74.21% and a specificity of 75.72% based on 10-fold cross-validation. Furthermore, for novel miRNA prediction, there exists high overlap of results from milRNApredictor and the state-of-the-art mirnovo. However, milRNApredictor is simpler to use, with reduced requirements of input data and dependencies. Taken together, milRNApredictor can be used to de novo identify fungal milRNAs and other very short small RNAs of non-model organisms.
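
To make the dimensionality issue concrete, the following sketch computes plain k-mer frequency features for an RNA sequence. It illustrates only the baseline k-mer scheme that the abstract compares against, not the knowledge-based energy features of milRNApredictor; the function name `kmer_features` and the example sequence are hypothetical.

```python
from itertools import product


def kmer_features(seq, k, alphabet="ACGU"):
    """Return the relative frequency of every possible k-mer in seq.

    The feature vector always has len(alphabet) ** k entries, so it grows
    exponentially with k regardless of how short the sequence is.
    """
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-alphabet symbols
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


features = kmer_features("ACGUACGU", 2)
print(len(features))
for k in (1, 2, 3, 4, 5):
    print(k, len(kmer_features("ACGUACGU", k)))
```

For a 21 to 22 nucleotide sequence there are at most about 20 windows, so once k reaches 3 or more most of the 4^k features are necessarily zero, which is the sparsity problem the abstract refers to.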

15.
DNA microarray technology is a versatile platform that allows rapid genetic analysis to take place on a genome-wide scale and has revolutionized the way cancers are studied. This platform has enabled researchers to characterize mechanisms central to tumorigenesis and understand important molecular events in the multi-step tumor progression model of cutaneous melanoma and other cancers. In melanoma, multiple global gene expression profiling studies using various DNA microarray platforms and various experimental designs have been performed. Each study has been able to capture and characterize either the involvement of a novel pathway or a novel cause-effect relationship. The use of microarrays to define subclasses, to identify differentially regulated genes within a mutational context, and to analyze epigenetically regulated genes has resulted in an unprecedented understanding of the biology of cutaneous melanoma that may lead to more accurate diagnosis, more comprehensive prognosis and prediction, and more effective therapeutic interventions. Related DNA microarray platforms like array-comparative genomic hybridization (CGH) have also been instrumental in identifying many non-random chromosomal alterations; however, studies identifying validated targets as a result of CGH are limited. Thus, there exists significant opportunity to discover novel melanoma genes and translate such discoveries into meaningful clinical endpoints. In this review, we focus on various DNA microarray-based studies performed in cutaneous melanoma and summarize our current understanding of the genetics and biology of melanoma progression derived from accumulating genomic information.

16.
It has previously been shown that cDNA hybridization selection can identify and recover novel genes from large cloned genomic DNA such as cosmids or YACs. In an effort to identify candidate genes for hemochromatosis, this technique was applied to a 320-kb YAC containing the HLA-A gene. A short fragment cDNA library derived from human duodenum was selected with the YAC DNA. Ten novel gene fragments were isolated, characterized, and localized on the physical map of the YAC.

17.
Complex traits such as susceptibility to diseases are determined in part by variants at multiple genetic loci. Genome-wide association studies can identify these loci, but most phenotype-associated variants lie distal to protein-coding regions and are likely involved in regulating gene expression. Understanding how these genetic variants affect complex traits depends on the ability to predict and test the function of the genomic elements harboring them. Community efforts such as the ENCODE Project provide a wealth of data about epigenetic features associated with gene regulation. These data enable the prediction of testable functions for many phenotype-associated variants.

18.
Genome evolution in prokaryotes is assisted by integration of gene pools from phages and plasmids. Regions downstream of tRNAs and tmRNAs are considered hot spots for the integration of these gene pools or genomic islands. To date, genomic islands have been identified only at tRNA/tmRNA genes in the enterobacterial genomes. The present work reports 10 distinct small RNAs as potent integration sites for genomic islands. A known tool, tRNAcc 1.0, has been used to identify genomic islands associated with small RNAs c0362, oxyS, ryaA, rybB, rybD, ryeB, ryeE, rtT, sraE and tmRNA. The coordinates of 25 such small RNA associated genomic islands in three E. coli (strains: CFT073, EDL933 and K12) and Shigella flexneri (strain: 301) genomes are presented. Moreover, cross-verification of the genomic sequences encoded within the identified genomic islands against a horizontal gene transfer database, GenBank annotation features and atypical sequence compositions supports our results. In addition, all 25 identified genomic integration sites exhibit genomic block rearrangements with respect to the associated small RNA. Similar to tRNAs/tmRNAs, the downstream regions of the small RNAs are found to be hotspots of integration.

19.
Microbial genes that are “novel” (no detectable homologs in other species) have become of increasing interest as environmental sampling suggests that there are many more such novel genes in yet-to-be-cultured microorganisms. By analyzing known microbial genomic islands and prophages, we developed criteria for systematic identification of putative genomic islands (clusters of genes of probable horizontal origin in a prokaryotic genome) in 63 prokaryotic genomes, and then characterized the distribution of novel genes and other features. All but a few of the genomes examined contained significantly higher proportions of novel genes in their predicted genomic islands compared with the rest of their genome (Paired t test = 4.43E-14 to 1.27E-18, depending on method). Moreover, the reverse observation (i.e., higher proportions of novel genes outside of islands) never reached statistical significance in any organism examined. We show that this higher proportion of novel genes in predicted genomic islands is not due to less accurate gene prediction in genomic island regions, but likely reflects a genuine increase in novel genes in these regions for both bacteria and archaea. This represents the first comprehensive analysis of novel genes in prokaryotic genomic islands and provides clues regarding the origin of novel genes. Our collective results imply that there are different gene pools associated with recently horizontally transmitted genomic regions versus regions that are primarily vertically inherited. Moreover, there are more novel genes within the gene pool associated with genomic islands. Since genomic islands are frequently associated with a particular microbial adaptation, such as antibiotic resistance, pathogen virulence, or metal resistance, this suggests that microbes may have access to a larger “arsenal” of novel genes for adaptation than previously thought.
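
The paired comparison reported in this abstract (proportion of novel genes inside predicted islands versus the rest of the same genome) can be illustrated with a hand-computed paired t statistic. The proportions below are invented toy values, the function name is hypothetical, and only the test statistic is computed; converting it to a p-value requires the t distribution with n - 1 degrees of freedom, which is left out of this standard-library sketch.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(in_islands, outside_islands):
    """Paired t statistic: mean difference divided by its standard error.

    Each position in the two lists must refer to the same genome.
    """
    diffs = [a - b for a, b in zip(in_islands, outside_islands)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy proportions of novel genes for three genomes (hypothetical values).
in_islands = [0.30, 0.25, 0.40]
outside_islands = [0.10, 0.15, 0.20]

t_stat = paired_t_statistic(in_islands, outside_islands)
print(round(t_stat, 3))
```

Pairing by genome matters because genomes differ widely in their overall proportion of novel genes; the test looks only at the within-genome difference.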

20.
Breast cancer is the most common malignancy in women worldwide. With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed for more personalized treatment and disease management. Towards this goal, we have developed a novel computational model for breast cancer prognosis by combining the Pathway Deregulation Score (PDS) based pathifier algorithm, Cox regression and L1-LASSO penalization method. We trained the model on a set of 236 patients with gene expression data and clinical information, and validated the performance on three diversified testing data sets of 606 patients. To evaluate the performance of the model, we conducted survival analysis of the dichotomized groups, and compared the areas under the curve based on the binary classification. The resulting prognosis genomic model is composed of fifteen pathways (e.g. P53 pathway) that had previously reported cancer relevance, and it successfully differentiated relapse in the training set (log rank p-value = 6.25e-12) and three testing data sets (log rank p-value<0.0005). Moreover, the pathway-based genomic models consistently performed better than gene-based models on all four data sets. We also find strong evidence that combining genomic information with clinical information improved the p-values of prognosis prediction by at least three orders of magnitude in comparison to using either genomic or clinical information alone. In summary, we propose a novel prognosis model that harnesses the pathway-based dysregulation as well as valuable clinical information. The selected pathways in our prognosis model are promising targets for therapeutic intervention.
