Similar literature
1.
This review considers the computational prediction of functionally related proteins by comparative genomics. Growing capabilities for genome sequencing have generated sequences for millions of genes. However, the functions of the majority of these genes remain unknown and can be determined experimentally for only a few of them. Therefore, accurate and robust methods for in silico prediction (annotation) of gene functions are needed. We describe the main techniques of comparative genomics, including the standard method based on transferring functions between homologous sequences as well as context-based methods such as phylogenetic profiles and gene-neighbor approaches. Modern methods of comparative genomics yield correct functional annotations for more than half of all proteins in an organism.
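The phylogenetic-profile idea mentioned in this abstract can be illustrated with a small sketch. It assumes that each protein is described by a presence/absence vector across a set of genomes and that proteins with near-identical vectors are candidates for a functional link. The protein names, the profiles, and the `max_distance` cutoff are all hypothetical; the review itself does not prescribe this particular distance or threshold.

```python
from itertools import combinations

# Hypothetical presence/absence data: 1 means a homolog of the protein
# was found in that genome, 0 means it was not (six genomes per profile).
PROFILES = {
    "protA": (1, 0, 1, 1, 0, 1),
    "protB": (1, 0, 1, 1, 0, 1),
    "protC": (0, 1, 0, 1, 1, 0),
    "protD": (1, 0, 1, 0, 0, 1),
}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=1):
    """Return protein pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        if hamming(prof_a, prof_b) <= max_distance:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    for pair in linked_pairs(PROFILES):
        print(pair)
```

Hamming distance is used here only because it is the simplest choice; published methods often use mutual information or tree-aware scores instead.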

2.
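Structural genomics research and nuclear magnetic resonance   Total citations: 4, self-citations: 0, citations by others: 4
The completion of genomic DNA sequencing projects for a variety of organisms has brought structural biology into the era of structural genomics. Structural genomics is the systematic determination of the structures of all genome products. It uses high-throughput target selection, expression, and purification together with structure determination and computational analysis to provide an experimentally determined structure or a good theoretical model for every protein product of a genome, which will accelerate research across the life sciences. Advances in bioinformatics, genetic engineering, and structure determination techniques have made structural genomics research feasible. Recent methodological progress in nuclear magnetic resonance has made it a key method for high-throughput structure analysis in structural genomics.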

3.
A team at the Lawrence Livermore National Laboratory (LLNL) was given the task of using computational tools to speed up the development of DNA diagnostics for pathogen detection. This work will be described in another paper in this issue (see pages 133-149). To achieve this goal it was necessary to understand the merits and limitations of the various available comparative genomics tools. A review of some recent tools for multisequence/genome alignment and substring comparison is presented, within the general framework of applicability to a large-scale application. We note that genome alignments are important for many applications, of which pathogen detection is only one. Understanding gene function, gene regulation, gene networks, phylogenetic studies and other aspects of evolution all depend on accurate nucleic acid and protein sequence alignment. Selecting appropriate tools can make a large difference in the quality of results obtained and the effort required.

4.
Computational genomics of noncoding RNA genes   Total citations: 26, self-citations: 0, citations by others: 26
Eddy SR. Cell, 2002, 109(2): 137-140
The number of known noncoding RNA genes is expanding rapidly. Computational analysis of genome sequences, which has been revolutionary for protein gene analysis, should also be able to address questions of the number and diversity of noncoding RNA genes. However, noncoding RNAs present computational genomics with a new set of challenges.

5.
The problem of rational target selection for protein structure determination in structural genomics projects on microbes is addressed. A flexible computational procedure is described that directly incorporates the whole body of annotation available in the PEDANT genome database into the sequence clustering and selection process in order to identify proteins that are likely to possess currently unknown structural domains. Filtering out gene products based on predicted structural features, such as known three-dimensional structures and transmembrane regions, reduces the complexity of neighbor relationships between sequences and all but eliminates the need for further partitioning of single-linkage clusters into disjoint protein groups corresponding to homologous families. The results of a large-scale computational experiment in which exemplary target selection for 32 prokaryotic genomes was conducted are presented.

6.
One of the most complicated remaining problems of molecular-phylogenetic analysis is choosing an appropriate genome region. Ideally, such a region should have two specific properties: (i) results of analysis using this region should be similar to the results of multigene analysis using the maximal number of regions; (ii) the region should be arranged compactly and be significantly shorter than the multigene set. The second condition is necessary to facilitate sequencing and to extend the set of taxa under analysis, the number of which is also crucial for molecular phylogenetic analysis. Such regions have been found for some groups of animals and have been designated "lucky genes". We have carried out a computational experiment on 41 complete chloroplast genomes of flowering plants, aimed at finding a "lucky gene" for reconstructing their phylogeny. It is shown that the phylogenetic tree inferred from a combination of translated nucleotide sequences of genes encoding subunits of plastid RNA polymerase is closest to the tree constructed using all protein-coding sites of the chloroplast genome. The only node for which a contradiction is observed is unstable across the different types of analysis. For all the other genes or their combinations, the agreement is significantly worse. The RNA polymerase genes are compactly arranged in the genome and are fourfold shorter than the total length of the protein-coding genes used for phylogenetic analysis. This combination of features makes this group of genes the main candidate for the role of "lucky gene" in studying the phylogeny of flowering plants.

7.
The FKBP protein family has prolyl isomerase activity and is related in function to cyclophilins. FKBPs are known to be involved in many biological processes, including hormone signaling, plant growth, and stress responses, by acting as chaperones or by isomerizing proline residues during protein folding. The availability of the complete peach genome sequence allowed the identification of 21 FKBP genes by HMMER and BLAST analyses. Scaffold locations of these FKBP genes in the peach genome were determined, and the protein domain and motif organization of peach FKBPs were analyzed. The phylogenetic relationships between peach FKBPs were also assessed. Expression profiles revealed that most peach FKBPs were expressed in all tissues, while a few were expressed specifically in some tissues. These data could contribute to a better understanding of the complex regulation of the peach FKBP gene family and provide valuable information for further research in peach functional genomics.

8.
The relationship between ascomycetes and basidiomycetes, the two main phyla of non-flagellated fungi, has rarely been investigated. In this study, we performed a comparative genomics analysis of genome sequences of 55 ascomycete and 26 basidiomycete species and detected 81 universal markers, 875 homologous genes and a conserved contig in the glucose-regulated protein gene. In dendrograms based on simple sequence repeat markers and homologous genes, ascomycetes and basidiomycetes formed distinct clusters, with each set of taxa having a high coefficient of relatedness. Ascomycetes and basidiomycetes also constituted distinct groups in a phylogenetic tree based on a conserved contig in the glucose-regulated protein gene. These results provide evidence that basidiomycetes may be derived from ascomycetes but are definitely genetically differentiated at the genomic level. The phylogenetic relationships of ascomycetes and basidiomycetes uncovered in this study provide new insights for future research related to fungal classification and evolution.

9.
MOTIVATION: The evolution of viruses is very rapid, and in addition to local point mutations (insertion, deletion, substitution) it includes frequent recombinations, genome rearrangements and horizontal transfers of genetic material (HGTs). Evolutionary analysis of viral sequences is therefore complicated for two main reasons. First, due to HGTs and recombinations, the right model of evolution is a network and not a tree. Second, due to genome rearrangements, an alignment of the input sequences is not guaranteed. These facts encourage the development of methods for inferring phylogenetic networks that do not require aligned sequences as input. RESULTS: In this work, we present the first computational approach that deals with both genome rearrangements and horizontal gene transfers and does not require a multiple alignment as input. We formalize a new set of computational problems that involve analyzing such complex models of evolution. We investigate their computational complexity and devise algorithms for solving them. Moreover, we demonstrate the viability of our methods on several synthetic datasets as well as four biological datasets. AVAILABILITY: The code is available from the authors upon request.

10.
11.
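To clarify the characteristics and sequence variation of chloroplast genomes in the genus Yucca, we carried out a comparative chloroplast genomics analysis of Yucca species and constructed a phylogenetic tree based on chloroplast genomes. The chloroplast genome sequence of Y. treculeana was obtained by high-throughput sequencing and combined with the published Yucca chloroplast genomes. Bioinformatics methods were used for a comparative genomic study of the complete chloroplast genomes of six Yucca species, covering basic structure, repeat sequences, contraction and expansion of boundaries, and sequence variation, followed by phylogenetic analysis. The results show that the six chloroplast genomes are similar in size and in gene types and numbers, and that genome structure is fairly conserved among species. Multiple repeat sequences were detected in the Yucca chloroplast genomes; the SSR loci consist mostly of mononucleotide, dinucleotide, and tetranucleotide repeats and favor the bases A and T. Using a nucleotide diversity threshold of π ≥ 0.008, three highly variable regions were identified in the six genomes: psbK-psbI-trnS-GCU, rpl20-rps12, and ccsA-ndhD. Phylogenetic relationships inferred from the complete chloroplast genomes and from the LSC+SSC regions were largely consistent and resolved the relationships among the six species, with Y. treculeana most closely related to Y. queretaroensis. This study sequenced the chloroplast genome of Y. treculeana, revealed the chloroplast genome characteristics and sequence variation of six Yucca species, and clarified the relationships among them; the results provide a reference for future development of molecular markers and phylogenetic studies in Yucca.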
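The nucleotide diversity screen (π ≥ 0.008) in this abstract can be sketched as follows. The abstract does not say how π was computed or which software was used, so this is only a minimal illustration that assumes the common definition of π as the average number of differences per site over all pairs of aligned sequences. The function names and the toy alignment window are hypothetical, and gaps are treated like any other character for simplicity.

```python
from itertools import combinations

# Threshold quoted in the abstract for calling a region highly variable.
PI_THRESHOLD = 0.008


def nucleotide_diversity(aligned_seqs):
    """Average number of differences per site over all sequence pairs."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(aligned_seqs, 2)
    )
    n_pairs = len(aligned_seqs) * (len(aligned_seqs) - 1) // 2
    return total_diffs / (n_pairs * length)


def is_highly_variable(aligned_seqs, threshold=PI_THRESHOLD):
    """True when the window's diversity meets or exceeds the threshold."""
    return nucleotide_diversity(aligned_seqs) >= threshold


if __name__ == "__main__":
    # Hypothetical alignment window from three genomes.
    window = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
    print(nucleotide_diversity(window), is_highly_variable(window))
```

In practice such a screen is usually run over sliding windows along a whole-genome alignment, with the window and step sizes chosen by the analyst.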

12.
There are a large number of ‘non‐family’ (NF) genes that do not cluster into families with three or more members per genome. While gene families have been extensively studied, a systematic analysis of NF genes has not been reported. We performed comparative studies on NF genes in 14 plant species. Based on the clustering of protein sequences, we identified ~94 000 NF genes across these species that were divided into five evolutionary groups: Viridiplantae wide, angiosperm specific, monocot specific, dicot specific, and species specific. Our analysis revealed that the NF genes resulted largely from less frequent gene duplications and/or a higher rate of gene loss after segmental duplication relative to genes in both low‐copy‐number families (LF; 3–10 copies per genome) and high‐copy‐number families (HF; >10 copies). Furthermore, we identified functions enriched in the NF gene set as compared with the HF genes. We found that NF genes were involved in essential biological processes shared by all plant lineages (e.g. photosynthesis and translation), as well as gene regulation and stress responses associated with phylogenetic diversification. In particular, our analysis of an Arabidopsis protein–protein interaction network revealed that hub proteins (the 10% of proteins with the most connections) were over‐represented in the NF set relative to the HF set. This research highlights the roles that NF genes may play in evolutionary and functional genomics research.
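The three copy-number classes defined in this abstract (NF: fewer than three members per genome; LF: 3–10 copies; HF: more than 10 copies) can be written as a small classifier. This is a sketch of the thresholds as stated in the abstract only; the function name and the cluster sizes are hypothetical, and the paper's actual procedure for assigning classes across 14 species may involve more than a single per-genome count.

```python
from collections import Counter


def copy_number_class(copies_per_genome):
    """Map a cluster's copy number in one genome to NF, LF, or HF."""
    if copies_per_genome < 1:
        raise ValueError("a gene cluster has at least one member")
    if copies_per_genome < 3:
        return "NF"  # non-family: fewer than three members
    if copies_per_genome <= 10:
        return "LF"  # low-copy-number family: 3-10 copies
    return "HF"      # high-copy-number family: more than 10 copies


# Hypothetical cluster sizes for one genome.
cluster_sizes = {"c1": 1, "c2": 2, "c3": 3, "c4": 10, "c5": 11, "c6": 42}
class_counts = Counter(copy_number_class(n) for n in cluster_sizes.values())

if __name__ == "__main__":
    print(dict(class_counts))
```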

13.
Cross-species comparative genomics approaches have been employed to map and clone many important disease resistance (R) genes from Solanum species, especially wild relatives of potato and tomato. These efforts will increase with the recent release of the potato genome sequence and the impending release of the tomato genome sequence. Most R genes belong to the prominent nucleotide binding site-leucine rich repeat (NBS-LRR) class, and conserved NBS-LRR protein motifs enable a survey of the R gene space of a plant genome by generation of resistance gene analogs (RGA), polymerase chain reaction fragments derived from R genes. We generated a collection of 97 RGA from the disease-resistant wild potato S. bulbocastanum, complementing smaller collections from other Solanum species. To further comparative genomics approaches, we combined all known Solanum RGA and cloned solanaceous NBS-LRR gene sequences, nearly 800 sequences in total, into a single meta-analysis. We defined R gene diversity bins that reflect both evolutionary relationships and DNA cross-hybridization results. The resulting framework is amendable and expandable, providing the research community with a common vocabulary for present and future study of R gene lineages. Through a series of sequence and hybridization experiments, we demonstrate that all tested R gene lineages are of ancient origin, are shared between Solanum species, and can be successfully accessed via comparative genomics approaches.

14.
Functional and structural genomics using PEDANT   Total citations: 11, self-citations: 0, citations by others: 11
MOTIVATION: Enormous demand for fast and accurate analysis of biological sequences is fuelled by the pace of genome analysis efforts. There is also an acute need for reliable, up-to-date genomic databases integrating both functional and structural information. Here we describe the current status of the PEDANT software system for high-throughput analysis of large biological sequence sets and the genome analysis server associated with it. RESULTS: The principal features of PEDANT are: (i) completely automatic processing of data using a wide range of bioinformatics methods, (ii) manual refinement of annotation, (iii) automatic and manual assignment of gene products to a number of functional and structural categories, (iv) extensive hyperlinked protein reports, and (v) advanced DNA and protein viewers. The system is easily extensible and allows custom methods, databases, and categories to be included with minimal or no programming effort. PEDANT is actively used as a collaborative environment to support several on-going genome sequencing projects. The main purpose of the PEDANT genome database is to quickly disseminate well-organized information on completely sequenced and unfinished genomes. It currently includes 80 genomic sequences and in many cases serves as the only source of exhaustive information on a given genome. The database also acts as a vehicle for a number of research projects in bioinformatics. Using SQL queries, it is possible to correlate a large variety of pre-computed properties of gene products encoded in complete genomes with each other and compare them with data sets of special scientific interest. In particular, the availability of structural predictions for over 300 000 genomic proteins makes PEDANT the most extensive structural genomics resource available on the web.

15.
MOTIVATION: Practitioners of comparative genomics face huge analytical challenges as whole genome sequences and functional/expression data accumulate. Furthermore, the field would greatly benefit from a better integration of this wealth of data with evolutionary concepts. RESULTS: Here, we present MANTIS, a relational database for the analysis of (i) gains and losses of genes on specific branches of the metazoan phylogeny, (ii) reconstructed genome content of ancestral species and (iii) over- or under-representation of functions/processes and tissue specificity of gained, duplicated and lost genes. MANTIS estimates the most likely positions of gene losses on the true phylogeny using a maximum-likelihood function. A user-friendly interface and an extensive query system allow users to investigate questions pertaining to gene identity, phylogenetic mapping and function/expression parameters. AVAILABILITY: MANTIS is freely available at http://www.mantisdb.org and constitutes the missing link between multi-species genome comparisons and functional analyses.

16.
17.
Comparisons of mitochondrial gene sequences and gene arrangements can be informative for reconstructing high-level phylogenetic relationships. We determined the complete sequence of the mitochondrial genome of Siphonodentalium lobatum (Mollusca, Scaphopoda). With only 13,932 bases, it is the shortest molluscan mitochondrial genome reported so far. The genome contains the usual 13 protein-coding genes, two rRNA and 22 tRNA genes. The ATPase subunit 8 gene is exceptionally short. Several transfer RNAs show truncated TψC arms or DHU arms. The gene arrangement of S. lobatum is markedly different from all other known molluscan mitochondrial genomes and shows low similarity even to an unpublished gene order of a dentaliid scaphopod. Phylogenetic analyses of all available complete molluscan mitochondrial genomes based on amino acid sequences of 11 protein-coding genes yield trees with low support for the basal branches. None of the traditionally accepted molluscan taxa and phylogenies are recovered in all analyses, except for the euthyneuran Gastropoda. S. lobatum appears as the sister taxon to two of the three bivalve species. We conclude that the deep molluscan phylogeny is probably beyond the resolution of mitochondrial protein sequences. Moreover, assessing the phylogenetic signal in gene order data requires a much larger taxon sample than is currently available, given the exceptional diversity of this character set in the Mollusca.

18.
The complete genome sequences for human, Drosophila melanogaster and Arabidopsis thaliana have been reported recently. With the availability of complete sequences for many bacteria and archaea, and five eukaryotes, comparative genomics and sequence analysis are enabling us to identify counterparts of many human disease genes in model organisms, which in turn should accelerate the pace of research and drug development to combat human diseases. Continuous improvement of specialized protein databases, together with sensitive computational tools, has enhanced the power and reliability of computational prediction of protein function.

19.
20.

Background

Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because they increase in computational complexity quickly with the number of sequences.

Results

Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours.
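The alignment-free idea described here, representing each sequence by its k-mers and comparing the resulting vectors, can be sketched in a few lines. This is a deliberately simplified illustration and not Nephele's implementation: it uses plain normalized k-mer frequencies rather than the complete composition vector algorithm, it uses cosine distance as an assumed distance measure, and it omits the affinity propagation clustering and the Hadoop parallelization. The function names and the choice k=3 are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(seq, k=3):
    """Sparse vector of normalized k-mer frequencies for one sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}  # sequence shorter than k
    return {kmer: n / total for kmer, n in counts.items()}


def cosine_distance(u, v):
    """1 minus cosine similarity of two sparse vectors (dicts)."""
    dot = sum(w * v.get(kmer, 0.0) for kmer, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy sequences; no alignment step is needed.
    a = kmer_vector("ACGTACGTACGT")
    b = kmer_vector("ACGTACGAACGT")
    print(cosine_distance(a, b))
```

A matrix of such pairwise distances is the kind of input that a clustering step could then group into genotypes.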

Conclusions

We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome scale sequence coverage.
