Similar articles
 20 similar articles found (search time: 31 ms)
1.
Quantitative trait locus (QTL) analysis is a powerful method for localizing disease genes, but identifying the causal gene remains difficult. Rodent models of disease facilitate QTL gene identification, and causal genes underlying rodent QTL are often associated with the corresponding human diseases. Recently developed bioinformatics methods, including comparative genomics, combined cross analysis, interval-specific and genome-wide haplotype analysis, followed by sequence and expression analysis, each facilitated by public databases, provide new tools for narrowing rodent QTLs. Here we discuss each tool, illustrate its application and generate a bioinformatics strategy for narrowing QTLs. Combining these bioinformatics tools with classical experimental methods should accelerate QTL gene identification.

2.

Background  

The structural genomics centers provide hundreds of protein structures of unknown function. Developing methods that determine protein function automatically is therefore imperative. The function of a protein can be determined by studying the network of its physical interactions. In this context, identifying a potential binding site between proteins is of primary interest. In the literature, methods for predicting the location of a potential binding site are generally based on classification tools. The aim of this paper is to show that regression tools are more efficient than classification tools for patch-based binding site predictors. For this purpose, we developed a patch-based binding site localization method usable with either regression or classification tools.

3.
With the availability of the nearly complete genomic sequence of C. elegans, the first multicellular organism to be sequenced, molecular biology has definitely entered the postgenomic era. Annotation of the genomic sequence, which refers to identifying the genes and other biologically relevant sections of the genome, is an important and nontrivial next step. A first-pass annotation will be necessarily incomplete but will drive further biological experiments, which in turn will help to annotate the genome better. Given the scale of the genome sequence analysis, it is clear that the annotation should be automated as much as possible without sacrificing the quality of analysis. In this work, we outline our approach to identifying the protein kinases of C. elegans from the genomic sequence. We describe new tools we have developed for analysis, management and visualization of genomic data. By developing modular and scalable solutions, this study has provided a framework for future analysis of the Drosophila and human genomes.

4.
Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based feature generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with feature generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine.
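
The "traditional" features named in the abstract above (amino acid composition and pair composition) can be illustrated with a short sketch. This is not code from the review; the function names and the choice to ignore non-standard residues are assumptions made for the example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> list[float]:
    """Fraction of each standard residue in seq (length-20 vector)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_composition(seq: str) -> list[float]:
    """Fraction of each adjacent residue pair in seq (length-400 vector)."""
    seq = seq.upper()
    # Count only pairs in which both residues are standard (assumption).
    pairs = Counter(
        seq[i : i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    )
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("sequence contains no standard residue pairs")
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]
```

Both vectors have a fixed length regardless of sequence length, which is convenient for classical classifiers. They only see single residues or adjacent pairs, which is the limitation the abstract attributes to such features when higher-order patterns matter.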

5.
6.
Commonly, 16S ribosomal RNA (16S rRNA) sequence analysis has been used for identifying enteric bacteria. However, it may not always be applicable for distinguishing closely related bacteria. Therefore, we selected gyrB genes that encode the subunit B protein of DNA gyrase (a topoisomerase type II protein) as target genes. The molecular evolution rate of gyrB genes is higher than that of 16S rRNA, and gyrB genes are distributed universally among bacterial species. Microarray technology includes the methods of arraying cDNA or oligonucleotides on substrates such as glass slides while acquiring large amounts of information simultaneously. Thus, it is possible to identify the enteric bacteria easily using microarray technology. We devised a simple method of rapidly identifying bacterial species through the combined use of gyrB genes and microarrays. Closely related bacteria were not identified at the species level using 16S rRNA sequence analysis, whereas they were identified at the species level based on the reaction patterns of oligonucleotides on our microarrays using gyrB genes.

7.
Protein structure prediction   Cited by: 2 (self-citations: 0, citations by others: 2)
The prediction of protein structure, based primarily on sequence and structure homology, has become an increasingly important activity. Homology models have become more accurate and their range of applicability has increased. Progress has come, in part, from the flood of sequence and structure information that has appeared over the past few years, and also from improvements in analysis tools. These include profile methods for sequence searches, the use of three-dimensional structure information in sequence alignment and new homology modeling tools, specifically in the prediction of loop and side-chain conformations. There have also been important advances in understanding the physical-chemical basis of protein stability and the corresponding use of physical-chemical potential functions to distinguish correctly folded from incorrectly folded protein conformations.

8.

Background  

MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data.

9.
The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application designed for comparative analysis of homologous gene sequences either from multigene families or from different species with a special emphasis on inferring evolutionary relationships and patterns of DNA and protein evolution. In addition to the tools for statistical analysis of data, MEGA provides many convenient facilities for the assembly of sequence data sets from files or web-based repositories, and it includes tools for visual presentation of the results obtained in the form of interactive phylogenetic trees and evolutionary distance matrices. Here we discuss the motivation, design principles and priorities that have shaped the development of MEGA. We also discuss how MEGA might evolve in the future to assist researchers in their growing need to analyze large data sets using new computational methods.

10.
A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence database for occurrences of a given motif described by a position-specific frequency matrix. RESULTS: We describe Find Individual Motif Occurrences (FIMO), a software tool for scanning DNA or protein sequences with motifs described as position-specific scoring matrices. The program computes a log-likelihood ratio score for each position in a given sequence database, uses established dynamic programming methods to convert this score to a P-value and then applies false discovery rate analysis to estimate a q-value for each position in the given sequence. FIMO provides output in a variety of formats, including HTML, XML and several Santa Cruz Genome Browser formats. The program is efficient, allowing for the scanning of DNA sequences at a rate of 3.5 Mb/s on a single CPU. Availability and Implementation: FIMO is part of the MEME Suite software toolkit. A web server and source code are available at http://meme.sdsc.edu.
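
The first step described in the abstract above, scoring each position with a log-likelihood ratio, can be sketched as follows. This is an illustration only and not FIMO's implementation: the function names, the pseudocount value and the handling of ambiguous letters are assumptions, and the conversion of scores to P-values and q-values is left out.

```python
import math


def log_odds_matrix(freqs, background, pseudocount=0.01):
    """Turn per-position letter frequencies into log2-odds scores.

    freqs: list of dicts, one per motif position, mapping letter -> frequency.
    background: dict mapping letter -> background probability.
    pseudocount: small value added to every letter to avoid log(0) (assumption).
    """
    pssm = []
    for column in freqs:
        total = sum(column.values()) + pseudocount * len(background)
        pssm.append({
            letter: math.log2(
                ((column.get(letter, 0.0) + pseudocount) / total)
                / background[letter]
            )
            for letter in background
        })
    return pssm


def scan(sequence, pssm):
    """Yield (start, score) for every window as wide as the motif."""
    width = len(pssm)
    for start in range(len(sequence) - width + 1):
        window = sequence[start : start + width]
        if any(letter not in pssm[0] for letter in window):
            continue  # skip windows containing letters outside the alphabet
        yield start, sum(col[letter] for col, letter in zip(pssm, window))
```

For a toy three-column motif that strongly prefers T, A, T and a uniform DNA background, the highest-scoring window in `GGTATCC` starts at index 2 (0-based), where the sequence reads `TAT`.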

11.
Advancements in sequencing technologies have empowered recent efforts to identify polymorphisms and mutations on a global scale. The large number of variations and mutations found in these projects requires high-throughput tools to identify those that are most likely to have an impact on function. Numerous computational tools exist for predicting which mutations are likely to be functional, but none specifically attempts to identify mutations that result in hyperactivation or gain-of-function. Here we present a modified version of the SIFT (Sorting Intolerant from Tolerant) algorithm that utilizes protein sequence alignments with homologous sequences to identify functional mutations based on evolutionary fitness. We show that this bi-directional SIFT (B-SIFT) is capable of identifying experimentally verified activating mutants from multiple datasets. B-SIFT analysis of large-scale cancer genotyping data identified potential activating mutations, some of which are supported by detailed structural evidence that we provide. B-SIFT could prove to be a valuable tool for efforts in protein engineering as well as in identification of functional mutations in cancer.

12.
ProteoMix is a suite of Java programs for identifying, annotating and predicting regions of interest in large sets of amino acid sequences, according to systematic and consistent criteria. It is based on two concepts: (1) the integration of results from different sequence analysis tools increases the prediction reliability; and (2) the integration protocol is critical and needs to be easily adaptable in a case-by-case manner. ProteoMix was designed to analyze simultaneously multiple protein sequences using several bioinformatics tools, merge the results of the analyses using logical functions and display them on an integrated viewer. In addition, new sequences can be added seamlessly to an analysis performed on an initial set of sequences. ProteoMix has a modular design, and bioinformatics tools are run on remote servers accessed using the Internet Simple Object Access Protocol (SOAP), ensuring the swift implementation of additional tools. ProteoMix has a user-friendly interactive graphical user interface environment and runs on PCs with Microsoft OS. AVAILABILITY: ProteoMix is freely available for academic users at http://bio.gsc.riken.jp/ProteoMix/
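
The idea in the abstract above of merging the results of several tools "using logical functions" can be pictured with a small sketch. ProteoMix itself is a Java suite and its actual merge protocol is not described here, so the following is only a hypothetical illustration: predictions are assumed to be lists of inclusive (start, end) residue ranges, and all function names are invented for the example.

```python
def to_positions(regions):
    """Expand inclusive (start, end) regions into a set of residue positions."""
    return {pos for start, end in regions for pos in range(start, end + 1)}


def to_regions(positions):
    """Collapse a set of positions into sorted inclusive (start, end) regions."""
    regions = []
    for pos in sorted(positions):
        if regions and pos == regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], pos)  # extend the current run
        else:
            regions.append((pos, pos))  # start a new run
    return regions


def merge_and(*predictions):
    """Regions predicted by every tool (logical AND)."""
    return to_regions(set.intersection(*(to_positions(p) for p in predictions)))


def merge_or(*predictions):
    """Regions predicted by at least one tool (logical OR)."""
    return to_regions(set.union(*(to_positions(p) for p in predictions)))
```

With one hypothetical tool reporting `[(5, 20), (40, 50)]` and another reporting `[(15, 45)]`, `merge_and` gives `[(15, 20), (40, 45)]` and `merge_or` gives `[(5, 50)]`. An AND merge favors reliability, while an OR merge favors coverage, which is one way to read the abstract's point that the integration protocol needs to be adaptable case by case.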

13.
We analyze the effect of different environmental conditions, sequence lengths and starting configurations on the folding and unfolding pathways of small peptides exhibiting beta turns. We use chignolin and a sequence of peptide G as examples. A variety of different analysis tools allows us to characterize the changes in the folding pathways. It is observed that different harmonic modes dominate not only for different conditions but also for different starting points. The modes remain essentially very similar but their relative importance varies. A detailed analysis from diverse viewpoints including the influence of the particular amino acid sequence, conformational aspects as well as the associated motions yields a global picture that is consistent with experimental evidence and theoretical studies published elsewhere. Patterns of modes that remain stable over a range of temperatures might serve as an additional diagnostic to identify conformations that have reliably adopted a native fold. This could aid in reconstructing the folding process of a complete protein by identifying conformationally determined regions.

14.
With the completion of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, the pace is much slower in determining their biological attributes. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. This unbalanced situation, which has critically limited our ability to make timely use of the newly discovered proteins for basic research and drug development, has called for developing computational methods or high-throughput automated tools for quickly and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. Indeed, during the last two decades or so, many methods in this regard have been established in the hope of bridging such a gap. In the course of developing these methods, the following often needed to be considered: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we discuss each of the five procedures, with a special focus on the introduction of pseudo amino acid composition (PseAAC), its different modes and applications as well as its recent development, particularly in how to use the general formulation of PseAAC to reflect the core and essential features that are deeply hidden in complicated protein sequences.
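
To make the PseAAC idea in the abstract above concrete, here is a minimal sketch assuming the commonly cited form of the method: 20 composition terms followed by `lam` sequence-order correlation terms, all divided by a common denominator. It is a simplification and not the review's code. It uses a single residue property (Kyte-Doolittle hydropathy) where the published formulation combines several, and the names `pseaac`, `lam` and `weight` are chosen for the example.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used here as the only residue property (assumption).
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4, "H": -3.2,
    "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5,
    "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}

# Standardize the property to zero mean and unit variance over the 20 residues.
_MU = mean(HYDROPATHY.values())
_SD = pstdev(HYDROPATHY.values())
STANDARDIZED = {a: (v - _MU) / _SD for a, v in HYDROPATHY.items()}


def pseaac(seq, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector."""
    seq = seq.upper()
    if any(r not in STANDARDIZED for r in seq):
        raise ValueError("sequence contains non-standard residues")
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    # Conventional amino acid composition (sums to 1).
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # k-th tier correlation: mean squared property difference of residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [
            (STANDARDIZED[seq[i]] - STANDARDIZED[seq[i + k]]) ** 2
            for i in range(n - k)
        ]
        thetas.append(sum(diffs) / (n - k))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

Unlike plain amino acid composition, the extra `lam` terms depend on residue order: `AAKK` and `AKAK` have identical composition but different vectors under this sketch. The components sum to 1 by construction.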

15.
AMIGene: Annotation of MIcrobial Genes   Cited by: 11 (self-citations: 0, citations by others: 11)
AMIGene (Annotation of MIcrobial Genes) is an application for automatically identifying the most likely coding sequences (CDSs) in a large contig or a complete bacterial genome sequence. The first step in AMIGene is dedicated to the construction of Markov models that fit the input genomic data (i.e. the gene model), followed by the combination of well-known gene-finding methods and a heuristic approach for the selection of the most likely CDSs. The web interface allows the user to select one or several gene models applied to the analysis of the input sequence by the AMIGene program and to visualize the list of predicted CDSs graphically and in a downloadable text format. The AMIGene web site is accessible at the following address: http://www.genoscope.cns.fr/agc/tools/amigene/index.html (Contact: sbocs@genoscope.cns.fr).

16.
Exact identification of complementarity determining regions (CDRs) is crucial for understanding and manipulating antigenic interactions. One way to do this is by marking residues on the antibody that interact with B cell epitopes on the antigen. This, of course, requires identification of B cell epitopes, which could be done by marking residues on the antigen that bind to CDRs, thus requiring identification of CDRs. To circumvent this vicious circle, existing tools for identifying CDRs are based on sequence analysis or general biophysical principles. Often, these tools, which are based on partial data, fail to agree on the boundaries of the CDRs. Herein we present an automated procedure for identifying CDRs and B cell epitopes using consensus structural regions that interact with the antigens in all known antibody-protein complexes. Consequently, we provide the first comprehensive analysis of all CDR-epitope complexes of known three-dimensional structure. The CDRs we identify only partially overlap with the regions suggested by existing methods. We found that the general physicochemical properties of both CDRs and B cell epitopes are rather peculiar. In particular, only four amino acids account for most of the sequence of CDRs, and several types of amino acids almost never appear in them. The secondary structure content and the conservation of B cell epitopes are found to be different than previously thought. These characteristics of CDRs and epitopes may be instrumental in choosing which residues to mutate in experimental search for epitopes. They may also assist in computational design of antibodies and in predicting B cell epitopes.

17.
Molecular methods, including conventional PCR, real-time PCR, denaturing gradient gel electrophoresis, fluorescent fragment detection PCR, and fluorescent in situ hybridization, have all been developed for use in identifying and studying the distribution of the toxic dinoflagellates Pfiesteria piscicida and P. shumwayae. Application of the methods has demonstrated a worldwide distribution of both species and provided insight into their environmental tolerance range and temporal changes in distribution. Genetic variability among geographic locations generally appears low in rDNA genes, and detection of the organisms in ballast water is consistent with rapid dispersal or high gene flow among populations, but additional sequence data are needed to verify this hypothesis. The rapid development and application of these tools serve as a model for study of other microbial taxa and provide a basis for future development of tools that can simultaneously detect multiple targets.

18.
19.
20.
As experimental technologies for characterization of proteomes emerge, bioinformatic analysis of the data becomes essential. Separation and identification technologies currently based on two-dimensional gels/mass spectrometry provide the inherent analytical power required. This strategy involves protein spot digestion and accurate mass mapping together with computational interrogation of available databases for protein functional identification. When either no exact match is found or when the possible matches only partially account for molecular weights actually observed, peptide sequencing by tandem mass spectrometry has emerged as the methodology of choice to provide the basic additional information required. To evaluate the capabilities of bioinformatics methods employed for identifying homologs of a protein of interest, we attempted to identify the major proteins from the 20 S proteasome of Trypanosoma brucei using sequence information determined using mass spectrometry. The results suggest that neither the traditional query engines, BLAST and FASTA, nor specialized software developed for analysis of sequence information obtained by mass spectrometry is able to identify even closely related sequences at statistically significant scores. To address this deficit, new bioinformatics approaches were developed for concomitant use of the multiple fragments of short sequence typically available from methods of tandem mass spectrometry. These approaches rely on the occurrence of congruence across searches of multiple fragments from a single protein. This method resulted in sharply better statistical significance values for correct hits in the database output relative to that achieved for independent searches using single sequence fragments.

