Similar Documents
20 similar documents found (search time: 15 ms)
1.
The availability of complete genome sequences of many bacterial species is facilitating numerous computational approaches for understanding bacterial genomes. One of the major incentives behind the genome sequencing of many pathogenic bacteria is the desire to better understand their diversity and to develop new approaches for controlling human diseases caused by these microorganisms. This task has become even more urgent with the rapid evolution of antibiotic resistance among many bacterial pathogens. Novel drug targets are required in order to design new antimicrobials against antibiotic-resistant pathogens. The complete genome sequences of an ever-increasing number of pathogenic microbes constitute an invaluable resource and provide lead information on potential drug targets. This review focuses on in silico analyses of microbial genomes and their host-specific adaptations, with specific reference to genome architecture, design, evolution, and trends in the computational identification of microbial drug targets. These trends underscore the utility of genomic data for systematic in silico drug target identification in the post-genomic era.

2.
Protein-protein interactions (PPIs) are crucial to most biochemical processes in humans. Although many human PPIs have been identified experimentally, their number is still limited compared with the number of available human protein sequences. Recently, many computational methods have been proposed to facilitate the recognition of novel human PPIs. However, existing methods concentrated only on information about individual PPIs and ignored the systemic characteristics of protein-protein interaction networks (PINs). In this study, a new method was proposed that combines the global information of PINs with protein sequence information. A random forest (RF) algorithm was used to develop the prediction model, which achieved a high accuracy of 91.88%. Furthermore, the RF model was tested on three independent datasets with good performance, suggesting that our method is a useful tool both for identifying PPIs and for investigating PINs.

3.
The field of computational biology has been revolutionized by recent advances in genomics. The completion of a number of genome projects, including that of the human genome, has paved the way toward a variety of challenges and opportunities in bioinformatics and biological systems engineering. One of the first challenges has been the determination of the structures of proteins encoded by the individual genes. This problem, which represents the progression from sequence to structure (genomics to structural genomics), has been widely known as the structure-prediction-in-protein-folding problem. We present the development and application of ASTRO-FOLD, a novel and complete approach for the ab initio prediction of protein structures given only the amino acid sequences of the proteins. The approach exhibits many novel components and the merits of its application are examined for a suite of protein systems, including a number of targets from several critical-assessment-of-structure-prediction experiments.

4.
High-throughput genome sequencing continues to accelerate the rate at which complete genomes are available for biological research. Many of these new genome sequences have little or no genome annotation currently available and hence rely upon computational predictions of protein coding genes. Evidence of translation from proteomic techniques could facilitate experimental validation of protein coding genes, but the techniques for whole genome searching with MS/MS data have not been adequately developed to date. Here we describe GENQUEST, a novel method using peptide isoelectric focusing and accurate mass to greatly reduce the peptide search space, making fast, accurate, and sensitive whole human genome searching possible on common desktop computers. In an initial experiment, almost all exonic peptides identified in a protein database search were identified when searching genomic sequence. Many peptides identified exclusively in the genome searches were incorrectly identified or could not be experimentally validated, highlighting the importance of orthogonal validation. Experimentally validated peptides exclusive to the genomic searches can be used to reannotate protein coding genes. GENQUEST represents an experimental tool that can be used by the proteomics community at large for validating computational approaches to genome annotation.
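
To illustrate how accurate mass alone narrows the peptide search space, here is a minimal sketch assuming unmodified tryptic peptides and a 10 ppm precursor tolerance (both illustrative choices, not GENQUEST's actual parameters). The helpers `tryptic_peptides`, `peptide_mass` and `filter_by_mass` are hypothetical; the isoelectric-focusing (pI) filter and the translation of genomic sequence into protein fragments are omitted.

```python
import re

# Standard monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_peptides(protein):
    """Simplified trypsin rule: cleave after K or R unless the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def filter_by_mass(candidates, observed_mass, tol_ppm=10.0):
    """Keep candidates whose mass lies within tol_ppm of the observed precursor mass."""
    tol = observed_mass * tol_ppm / 1e6
    return [p for p in candidates if abs(peptide_mass(p) - observed_mass) <= tol]


# Hypothetical translated fragment; only peptides matching the observed mass survive.
peptides = tryptic_peptides("MKAPKRPGR")
print(filter_by_mass(peptides, peptide_mass("APK")))
```

Here the fragment digests to MK, APK and RPGR (no cleavage before P), and only APK survives the mass filter. In a genome-scale search the same filter would be applied to every translated peptide before any MS/MS spectrum matching.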

5.
The preponderance of evidence implicates protein misfolding in many unrelated human diseases. In all cases, normal, correctly folded proteins transform from their proper native structure into an abnormal beta-rich structure known as an amyloid fibril. Here we introduce a computational algorithm to detect nonnative (hidden) sequence propensity for amyloid fibril formation. Analyzing sequence-structure relationships in terms of tertiary contact (TC), we find that the hidden beta-strand propensity of a query local sequence can be quantitatively estimated from the secondary structure preferences of template sequences of known secondary structure found in regions of high TC. The present method correctly pinpoints the minimal peptide fragment shown experimentally to be the likely local mediator of amyloid fibril formation in beta-amyloid peptide, islet amyloid polypeptide (hIAPP), alpha-synuclein, and human acetylcholinesterase (AChE). It also found previously unrecognized beta-strand propensities in the prototypical helical protein myoglobin, which has been reported to be amyloidogenic. Analysis of 2358 nonhomologous protein domains provides compelling evidence that most proteins contain sequences with significant hidden beta-strand propensity. The present method may find utility in many medically relevant applications, such as the engineering of protein sequences and the discovery of therapeutic agents that specifically target these sequences for the prevention and treatment of amyloid diseases.

6.
7.
Recent analyses of complete genome sequences have revealed that many genomes have been duplicated in their evolutionary past. Such events have been associated with important biological transitions, major leaps in evolution and adaptive radiations of species. Here, we consider recently developed computational methods to detect such ancient large-scale gene duplication events. Several new approaches have been used to show that large-scale gene duplications are more common than previously thought.

8.
Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Because their complex structures and properties make them difficult to analyze experimentally, computational methods have emerged as a powerful tool for studying membrane proteins. Traditional features for protein sequence analysis, based on amino acid type, composition, and pair composition, are limited in their ability to capture higher-order sequence patterns. Recently, multiple sequence alignments (MSAs) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the substantial computational resources required to generate MSA-based features can be a major bottleneck for many applications. Several methods and tools, including heuristics and approximate algorithms, have been developed to accelerate MSA generation and reduce its computational cost. Additionally, PLMs such as BERT have shown great potential for generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges of feature generation. Overall, advances in computational methods and tools provide a promising avenue for gaining deeper insight into the function and properties of membrane proteins, which can have significant implications for drug discovery and personalized medicine.

9.

Background  

Protein remote homology detection is a central problem in computational biology. Most recent methods train support vector machines to discriminate between related and unrelated sequences and these studies have introduced several types of kernels. One successful approach is to base a kernel on shared occurrences of discrete sequence motifs. Still, many protein sequences fail to be classified correctly for a lack of a suitable set of motifs for these sequences.
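
The abstract does not specify which motif kernel is meant, so as a stand-in, here is a sketch of the closely related k-mer spectrum kernel, in which every contiguous k-mer plays the role of a "motif" and the kernel value is the inner product of the two sequences' k-mer count vectors. This is a swapped-in technique for illustration, not the motif kernel itself; the function names and the default k = 3 are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every contiguous k-mer (the stand-in 'motifs') in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of k-mer count vectors: only shared k-mers contribute."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[m] for m, n in ca.items() if m in cb)


def normalized_kernel(a, b, k=3):
    """Cosine-normalized kernel in [0, 1], so long sequences do not dominate."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


# Two hypothetical sequences that differ at one position.
print(normalized_kernel("MKVLAAGIV", "MKVLSAGIV"))
```

A Gram matrix of pairwise `normalized_kernel` values could then be handed to any SVM implementation that accepts precomputed kernels. The limitation the abstract raises shows up directly here: if two homologs share no k-mer (no common "motif"), the kernel value is 0 and they look unrelated.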

10.
Protein design has become a powerful approach for understanding the relationship between amino acid sequence and 3-dimensional structure. In the past 5 years, there have been many breakthroughs in the development of computational methods that allow the selection of novel sequences given the structure of a protein backbone. Successful design of protein scaffolds has now paved the way for new endeavors to design function. The ability to design sequences compatible with a fold may also be useful in structural and functional genomics by expanding the range of proteins used for fold recognition and for the identification of functionally important domains from multiple sequence alignments.

11.
12.
The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel with experimental efforts, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer function from homologous proteins do not help much for these uncharacterized structures, because they lack apparent structural or sequence similarity to known proteins. Here, we briefly review two avenues of computational function prediction, namely structure-based and sequence-based methods. The focus is on our recent local structure-based and sequence-based methods, which can effectively extract functional information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. We hope that, combined with experimental methods, computational methods will make a leading contribution to the functional elucidation of protein structures.

13.
In the cell, protein folding into stable globular conformations is in competition with aggregation into non-functional and usually toxic structures, since the biophysical properties that promote folding also tend to favor intermolecular contacts, leading to the formation of β-sheet-enriched insoluble assemblies. The formation of protein deposits is linked to at least 20 different human disorders, ranging from dementia to diabetes. Furthermore, protein deposition inside cells represents a major obstacle to the biotechnological production of polypeptides. Importantly, the aggregation behavior of polypeptides appears to be strongly influenced by the intrinsic properties encoded in their sequences, specifically by the presence of selective short regions with high aggregation propensity. This allows computational methods to analyze the aggregation properties of proteins without prior structural information. Applications range from the identification of individual amyloidogenic regions in disease-linked polypeptides to the analysis of the aggregation properties of complete proteomes. Herein, we review these theoretical approaches and illustrate how they have become important and useful tools for understanding the molecular mechanisms underlying protein aggregation.
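
Sequence-based scanners of this kind commonly slide a short window along the chain and flag windows whose average residue propensity exceeds a cutoff. The sketch below shows only that windowing pattern, using the Kyte-Doolittle hydrophobicity scale purely as a stand-in, since the abstract names no scale; real aggregation predictors use their own propensity values, and the window length and threshold here are arbitrary assumptions.

```python
# Kyte-Doolittle hydrophobicity values, used here only as a stand-in propensity scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def window_scores(seq, window=5, scale=KD):
    """Mean scale value of every window; non-standard residues raise KeyError."""
    values = [scale[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hot_spots(seq, window=5, threshold=2.0, scale=KD):
    """Return (start, fragment, score) for windows scoring at or above threshold."""
    return [
        (i, seq[i:i + window], round(s, 2))
        for i, s in enumerate(window_scores(seq, window, scale))
        if s >= threshold
    ]


# Toy sequence: a hydrophobic LVFF stretch flanked by charged residues.
print(hot_spots("DDDDKLVFFDDDD", window=4))
```

With a 4-residue window, only the LVFF stretch (starting at index 4... more precisely index 5, counting from 0) clears the cutoff; the flanking charged residues pull every other window below it. Applying the same scan to every sequence in a proteome is what makes whole-proteome surveys cheap.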

14.
With the continuing accomplishments of the human genome project, high-throughput strategies to identify DNA sequences that are important in mammalian gene regulation are becoming increasingly feasible. In contrast to the historic, labour-intensive, wet-laboratory methods for identifying regulatory sequences, many modern approaches are heavily focused on the computational analysis of large genomic data sets. Data from inter-species genomic sequence comparisons and genome-wide expression profiling, integrated with various computational tools, are poised to contribute to the decoding of genomic sequence and to the identification of those sequences that orchestrate gene regulation. In this review, we highlight several genomic approaches that are being used to identify regulatory sequences in mammalian genomes.

15.
The human p107 protein shares many structural and functional features with the retinoblastoma gene product and the retinoblastoma-related p130 protein. In this study, we have cloned the gene encoding the p107 protein and elucidated its complete intron-exon organization. The gene contains 22 exons spanning over 100 kilobase pairs of genomic DNA. Individual exons range in length from 50 to 840 base pairs. The exon arrangement of the p107 gene is rather similar to those of other members of the gene family, especially the p130 gene, whereas intron lengths vary extensively. This study will provide a molecular basis for comprehensive screening for p107 mutations using genomic DNA from human malignancies. We also show the detailed structure of an intragenic deletion of the p107 gene found in a human B-cell lymphoma cell line, KAL-1, which was shown to have arisen by homologous recombination between two directly repeated Alu family sequences.

16.
The RNases P and MRP are involved in tRNA and rRNA processing, respectively. In eukaryotes, both enzymes are composed of an RNA molecule and 9–12 protein subunits. Most of the protein subunits are shared between RNases P and MRP. Here we performed a computational analysis of the protein subunits in a broad range of eukaryotic organisms using profile-based searches and phylogenetic methods. A number of novel homologues were identified, giving rise to a more complete inventory of RNase P/MRP proteins. We present evidence of a relationship between fungal Pop8 and the protein subunit families Rpp14/Pop5, as well as between fungal Pop6 and metazoan Rpp25. These relationships further emphasize a structural and functional similarity between the yeast and human P/MRP complexes. We have also identified novel P and MRP RNAs, and analysis of all available sequences revealed a K-turn motif in a large number of these RNAs. We suggest that this motif is a binding site for the Pop3/Rpp38 proteins, and we discuss other structural features of the RNA subunit and possible relationships to the protein subunit repertoire.

17.
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) in genomic sequences. Despite much progress in the computational identification of protein coding regions over the last two decades, the performance and efficiency of prediction methods still need to be improved. In addition, it is indispensable to develop different prediction methods, since combining them may greatly improve prediction accuracy. In this paper, a new method for predicting protein coding regions is developed based on the fact that most exon sequences exhibit 3-base periodicity, whereas intron sequences lack this feature. The method computes the 3-base periodicity and the background noise of stepwise DNA segments of the target sequence from the nucleotide distributions in the three codon positions. Exons and introns can then be identified from trends in the ratio of 3-base periodicity to background noise along the sequence. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
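
One common way to formalize this, assumed here rather than taken from the paper, is the spectral view. For each nucleotide, let x, y and z be its counts at codon positions 1, 2 and 3 of a segment; then x² + y² + z² - xy - yz - zx equals the discrete Fourier power of that nucleotide's indicator sequence at period 3 (for segment lengths divisible by 3), and by Parseval's theorem the mean power over all frequencies, a natural "background", equals that nucleotide's count. Summing over A, C, G and T gives the ratio below. The paper's exact definitions of periodicity and noise may differ, and the window and step sizes are illustrative.

```python
NUCLEOTIDES = "ACGT"


def periodicity_ratio(segment):
    """Ratio of 3-base periodicity to background noise for one DNA segment."""
    segment = segment.upper()
    periodicity = 0
    for base in NUCLEOTIDES:
        # Occurrences of this base at codon positions 1, 2 and 3 of the segment.
        x, y, z = (segment[pos::3].count(base) for pos in range(3))
        # |x + y*w + z*w^2|^2 with w = exp(2*pi*i/3): spectral power at period 3.
        periodicity += x * x + y * y + z * z - x * y - y * z - z * x
    # Mean power over all frequencies (Parseval) equals the number of A/C/G/T.
    noise = sum(segment.count(base) for base in NUCLEOTIDES)
    return periodicity / noise if noise else 0.0


def scan(sequence, window=120, step=3):
    """Ratio for stepwise segments, returned as (start, ratio) pairs."""
    return [
        (start, periodicity_ratio(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


print(periodicity_ratio("ATG" * 10))  # strongly periodic toy segment
print(periodicity_ratio("A" * 30))    # no positional bias at all
```

A perfectly periodic segment scores high while a homopolymer scores zero, because its base counts are identical at all three codon positions. Along a real gene, one would look for runs of windows with elevated ratios as candidate exons, as the abstract describes.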

18.
A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from the single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the difference in length distribution between GPCRs and non-GPCRs has also been exploited to improve prediction performance. Test results on annotated human protein sequences demonstrate that this system achieves good performance for both the prediction and the classification of human GPCRs.
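
A minimal sketch of the composition features described above: normalized single amino acid (20), dipeptide (400) and tripeptide (8000) frequencies, with sequence length appended as one extra feature. Appending raw length is an assumption about how the length difference might be used; the paper's statistical feature selection and the SVM itself are not shown, and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical


def composition(seq, k):
    """Normalized frequencies of all 20**k k-peptides, in lexicographic order."""
    seq = seq.upper()
    total = len(seq) - k + 1
    counts = dict.fromkeys(("".join(p) for p in product(AMINO_ACIDS, repeat=k)), 0)
    for i in range(max(total, 0)):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing non-standard residues
            counts[kmer] += 1
    return [c / total if total > 0 else 0.0 for c in counts.values()]


def feature_vector(seq):
    """20 + 400 + 8000 composition features plus sequence length (8421 values)."""
    return (composition(seq, 1) + composition(seq, 2)
            + composition(seq, 3) + [float(len(seq))])


vec = feature_vector("MNGTEGPNFYVPFSNKTGVV")  # hypothetical N-terminal fragment
print(len(vec))
```

Because the tripeptide block alone has 8000 mostly zero entries, selecting a statistically significant subset, as the abstract describes, is what keeps such a vector practical for SVM training.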

19.
Recently a number of computational approaches have been developed for the prediction of protein–protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.
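
The abstract does not list its protocols; phylogenetic profiling is one widely used genomic-context method and serves here only as an illustration. Each protein gets a presence/absence vector across a set of complete genomes, and pairs with identical or nearly identical profiles are predicted to be functionally linked. The profiles, protein names and the `max_diff` cutoff below are hypothetical.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def predict_links(profiles, max_diff=0):
    """Predict a link for every protein pair whose profiles differ in at most max_diff genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_diff
    ]


# Hypothetical profiles over five genomes (1 = homolog present, 0 = absent).
profiles = {
    "protA": (1, 0, 1, 1, 0),
    "protB": (1, 0, 1, 1, 0),
    "protC": (0, 1, 1, 0, 1),
}
print(predict_links(profiles))              # identical profiles only
print(predict_links(profiles, max_diff=1))  # tolerate one mismatch
```

Only protA and protB are linked in both calls, since protC disagrees with each of them in four of the five genomes. Because profiles need only genome content, not experiments, this kind of method scales to every protein in every sequenced genome.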

20.
Grover D, Kannan K, Brahmachari SK, Mukerji M. Genetica, 2005, 124(2-3): 273-289
Elucidation of the complete nucleotide sequence of the human genome has revealed that coding sequences, which store the information needed to synthesize functional proteins, occupy only 2% of the genome. The remaining 98%, barring a few regulatory sequences, has been referred to as non-functional or junk DNA and consists of many kinds of repeat elements. In fact, the human genome is the most repeat-rich genome sequenced so far, with more than half of it occupied by such sequences. Determining the significance of these repeats in the human genome has become the focus of many studies worldwide, especially after genome sequencing revealed no significant difference in coding regions between lower eukaryotes and humans. In this article, we focus on Alu repeats, which are primate-specific elements with many interesting biological properties. Moreover, they are the repeats with the highest copy number in the human genome. We highlight different facets of their interaction with the genome and the changing paradigms regarding their role in genome organization.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417