Similar literature
20 similar articles found.
1.
The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus’ evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
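The idea of a mutation-annotated tree can be illustrated with a small sketch. The code below is not the MAT file format or the matUtils API; the `Node` class, the helper names, and the toy mutations and clade label are all hypothetical. It only shows how labeling branches with mutations lets a sample's alleles and clade be recovered by walking from the root to a leaf.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (illustrative only)."""
    name: str
    # Mutations on the branch leading to this node, written as ref-pos-alt.
    mutations: List[str] = field(default_factory=list)
    parent: Optional["Node"] = None
    clade: Optional[str] = None  # label stored at a clade root, if any

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        return child


def genotype(node: Optional[Node]) -> Dict[int, str]:
    """Collect alleles at mutated sites along the root-to-node path.

    A later mutation at the same position overrides an earlier one.
    """
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    alleles: Dict[int, str] = {}
    for n in reversed(path):  # walk from the root down to the node
        for mut in n.mutations:
            pos, alt = int(mut[1:-1]), mut[-1]
            alleles[pos] = alt
    return alleles


def clade_of(node: Optional[Node]) -> Optional[str]:
    """Return the nearest clade label on the path towards the root."""
    while node is not None:
        if node.clade is not None:
            return node.clade
        node = node.parent
    return None


# Toy tree: one labeled internal node with two samples below it.
root = Node("root")
internal = root.add_child(Node("internal_1", ["C241T", "A23403G"], clade="toy_clade_A"))
s1 = internal.add_child(Node("sample_1", ["G28881A"]))
s2 = internal.add_child(Node("sample_2", ["T241C"]))  # second change at site 241

print(genotype(s1))  # {241: 'T', 23403: 'G', 28881: 'A'}
print(genotype(s2))  # {241: 'C', 23403: 'G'}
print(clade_of(s2))  # toy_clade_A
```

Storing each mutation once on the branch where it arose, rather than once per sample, is what keeps this kind of encoding compact when many samples share ancestry.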

2.
3.
Human mitochondrial DNA (mtDNA) encodes a set of 37 genes which are essential structural and functional components of the electron transport chain. Variations in these genes have been implicated in a broad spectrum of diseases and are extensively reported in literature and various databases. In this study, we describe MitoLSDB, an integrated platform to catalogue disease association studies on mtDNA (http://mitolsdb.igib.res.in). The main goal of MitoLSDB is to provide a central platform for direct submissions of novel variants that can be curated by the Mitochondrial Research Community. MitoLSDB provides access to standardized and annotated data from literature and databases encompassing information from 5231 individuals, 675 populations and 27 phenotypes. This platform is developed using the Leiden Open (source) Variation Database (LOVD) software. MitoLSDB houses information on all 37 genes in each population, amounting to 132397 variants, of which 5147 are unique. For each variant, its genomic location as per the Revised Cambridge Reference Sequence, codon and amino acid change for variations in protein-coding regions, frequency, disease/phenotype, population, reference and remarks are also listed. MitoLSDB curators have also reported errors documented in literature, which include 94 phantom mutations, 10 NUMTs, six documentation errors and one artefactual recombination. MitoLSDB is the largest repository of mtDNA variants systematically standardized and presented using the LOVD platform. We believe that this is a good starting resource to curate mtDNA variants and will facilitate direct submissions enhancing data coverage, annotation in context of pathogenesis and quality control by ensuring non-redundancy in reporting novel disease associated variants.

4.
An important goal in molecular biology is to understand functional changes upon single-point mutations in proteins. Doing so through a detailed characterization of structure spaces and underlying energy landscapes is desirable but continues to challenge methods based on Molecular Dynamics. In this paper we propose a novel algorithm, SIfTER, which is based instead on stochastic optimization to circumvent the computational challenge of exploring the breadth of a protein’s structure space. SIfTER is a data-driven evolutionary algorithm, leveraging experimentally-available structures of wildtype and variant sequences of a protein to define a reduced search space from where to efficiently draw samples corresponding to novel structures not directly observed in the wet laboratory. The main advantage of SIfTER is its ability to rapidly generate conformational ensembles, thus allowing mapping and juxtaposing landscapes of variant sequences and relating observed differences to functional changes. We apply SIfTER to variant sequences of the H-Ras catalytic domain, due to the prominent role of the Ras protein in signaling pathways that control cell proliferation, its well-studied conformational switching, and abundance of documented mutations in several human tumors. Many Ras mutations are oncogenic, but detailed energy landscapes have not been reported until now. Analysis of SIfTER-computed energy landscapes for the wildtype and two oncogenic variants, G12V and Q61L, suggests that these mutations cause constitutive activation through two different mechanisms. G12V directly affects binding specificity while leaving the energy landscape largely unchanged, whereas Q61L has pronounced, starker effects on the landscape. An implementation of SIfTER is made available at http://www.cs.gmu.edu/~ashehu/?q=OurTools. 
We believe SIfTER is useful to the community to answer the question of how sequence mutations affect the function of a protein, when there is an abundance of experimental structures that can be exploited to reconstruct an energy landscape that would be computationally impractical to do via Molecular Dynamics.

5.
ChiloKey is a matrix-based, interactive key to all 179 species of Geophilomorpha (Chilopoda) recorded from Europe, including species of uncertain identity and those whose morphology is only partially known. The key is intended to assist in identification of subadult and adult specimens, by means of microscopy and simple dissection techniques whenever necessary. The key is freely available through the web at: http://www.biologia.unipd.it/chilokey/ and at http://www.interactive-keys.eu/chilokey/.

6.
Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0524-x) contains supplementary material, which is available to authorized users.

7.
More reliable and faster prediction methods are needed to interpret enormous amounts of data generated by sequencing and genome projects. We have developed a new computational tool, PON-P2, for classification of amino acid substitutions in human proteins. The method is a machine learning-based classifier and groups the variants into pathogenic, neutral and unknown classes, on the basis of a random forest probability score. PON-P2 is trained using pathogenic and neutral variants obtained from VariBench, a database for benchmark variation datasets. PON-P2 utilizes information about evolutionary conservation of sequences, physical and biochemical properties of amino acids, GO annotations and, if available, functional annotations of variation sites. Extensive feature selection was performed to identify 8 informative features among altogether 622 features. PON-P2 consistently showed superior performance in comparison to existing state-of-the-art tools. In the 10-fold cross-validation test, its accuracy and MCC are 0.90 and 0.80, respectively, and in the independent test, they are 0.86 and 0.71, respectively. The coverage of PON-P2 is 61.7% in the 10-fold cross-validation and 62.1% in the test dataset. PON-P2 is a powerful tool for screening harmful variants and for ranking and prioritizing experimental characterization. It is very fast, making it capable of analyzing large variant datasets. PON-P2 is freely available at http://structure.bmc.lu.se/PON-P2/.
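The accuracy, MCC and coverage figures quoted above can be made concrete with a short sketch. The probability thresholds used here to assign the "unknown" class are an illustrative assumption and not PON-P2's actual decision rule, and the scores and labels are made-up toy data. The sketch only shows how the three metrics relate when a classifier is allowed to leave some variants unclassified.

```python
import math


def classify(prob_pathogenic, low=0.3, high=0.7):
    """Three-class call from a probability score (thresholds are hypothetical)."""
    if prob_pathogenic >= high:
        return "pathogenic"
    if prob_pathogenic <= low:
        return "neutral"
    return "unknown"


def evaluate(probs, truth):
    """Return (coverage, accuracy, MCC); accuracy and MCC use classified variants only."""
    tp = tn = fp = fn = 0
    for p, is_pathogenic in zip(probs, truth):
        call = classify(p)
        if call == "unknown":
            continue  # unclassified variants lower coverage, not accuracy
        if call == "pathogenic":
            if is_pathogenic:
                tp += 1
            else:
                fp += 1
        else:
            if is_pathogenic:
                fn += 1
            else:
                tn += 1
    classified = tp + tn + fp + fn
    coverage = classified / len(probs)
    acc = (tp + tn) / classified if classified else 0.0
    # Matthews correlation coefficient from the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coverage, acc, mcc


# Toy scores and true labels (1 = pathogenic, 0 = neutral).
probs = [0.95, 0.80, 0.55, 0.10, 0.20, 0.75, 0.45, 0.05]
truth = [1, 1, 1, 0, 0, 0, 1, 0]

coverage, acc, mcc = evaluate(probs, truth)
print(f"coverage={coverage:.2f} accuracy={acc:.2f} MCC={mcc:.2f}")
# coverage=0.75 accuracy=0.83 MCC=0.71
```

In this toy setup, widening the "unknown" band would typically raise accuracy and MCC on the remaining variants while lowering coverage, which is why the three numbers are best read together.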

8.
Inherited haemoglobinopathies are the most common monogenic diseases, with millions of carriers and patients worldwide. At present, we know several hundred disease-causing mutations on the globin gene clusters, in addition to numerous clinically important trans-acting disease modifiers encoded elsewhere and a multitude of polymorphisms with relevance for advanced diagnostic approaches. Moreover, new disease-linked variations are discovered every year that are not included in traditional and often functionally limited locus-specific databases. This paper presents IthaGenes, a new interactive database of haemoglobin variations, which stores information about genes and variations affecting haemoglobin disorders. In addition, IthaGenes organises phenotype, relevant publications and external links, while embedding the NCBI Sequence Viewer for graphical representation of each variation. Finally, IthaGenes is integrated with the companion tool IthaMaps for the display of corresponding epidemiological data on distribution maps. IthaGenes is incorporated in the ITHANET community portal and is free and publicly available at http://www.ithanet.eu/db/ithagenes.

9.

Background

Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often have a heterogeneous mix of various subpopulations with different variants, making assembly extremely difficult using existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts SVs, accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis.

Results

By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare against the lead competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR is shown to successfully estimate heterogeneity percentages in both cases, and demonstrates an improved efficiency and better ability to handle tandem duplications.

Conclusion

SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-84) contains supplementary material, which is available to authorized users.

10.
Viral phylodynamics is defined as the study of how epidemiological, immunological, and evolutionary processes act and potentially interact to shape viral phylogenies. Since the coining of the term in 2004, research on viral phylodynamics has focused on transmission dynamics in an effort to shed light on how these dynamics impact viral genetic variation. Transmission dynamics can be considered at the level of cells within an infected host, individual hosts within a population, or entire populations of hosts. Many viruses, especially RNA viruses, rapidly accumulate genetic variation because of short generation times and high mutation rates. Patterns of viral genetic variation are therefore heavily influenced by how quickly transmission occurs and by which entities transmit to one another. Patterns of viral genetic variation will also be affected by selection acting on viral phenotypes. Although viruses can differ with respect to many phenotypes, phylodynamic studies have to date tended to focus on a limited number of viral phenotypes. These include virulence phenotypes, phenotypes associated with viral transmissibility, cell or tissue tropism phenotypes, and antigenic phenotypes that can facilitate escape from host immunity. Due to the impact that transmission dynamics and selection can have on viral genetic variation, viral phylogenies can therefore be used to investigate important epidemiological, immunological, and evolutionary processes, such as epidemic spread [2], spatio-temporal dynamics including metapopulation dynamics [3], zoonotic transmission, tissue tropism [4], and antigenic drift [5]. The quantitative investigation of these processes through the consideration of viral phylogenies is the central aim of viral phylodynamics.
This is a “Topic Page” article for PLOS Computational Biology.

11.
In France, Bacillus anthracis subgroup B2 strains do not metabolize starch or glycogen but can use gluconate, whereas subgroup A1 strains show the inverse pattern. Functional genetic analysis revealed that mutations in the amyS and gntK genes encoding an alpha-amylase and a gluconate kinase, respectively, were responsible for these phenotypes.
Bacillus anthracis, the etiological agent of anthrax, is a gram-positive, aerobic soil bacterium. Multilocus variable-number tandem repeat analysis of a collection of French isolates shows that the main B. anthracis groups described worldwide, A (subgroup A1) and B (subgroup B2), are represented (1, 2). Subgroup B2 isolates are the most common isolates in France and are found particularly in southern mountain regions, but they are extremely rare elsewhere in the world. Biochemical characterization of French isolates indicates that subgroup A1 and B2 strains have different carbohydrate utilization patterns (P. Vaissaire, A. Fouet, K. L. Smith, C. Keys, C. Le Doujet, P. Sylvestre, M. Levy, P. Keim, and M. Mock, presented at the 5th International Conference on Anthrax and 3rd International Workshop on the Molecular Biology of Bacillus cereus, B. anthracis and B. thuringiensis, 30 March to 3 April 2003, Nice, France). French subgroup A1 strains metabolize starch and glycogen but not gluconate, and the inverse is true for subgroup B2 strains. The genomes of several B. anthracis strains are available on the NCBI website (http://www.ncbi.nlm.nih.gov/), and two of these strains, Ames and CNEVA, are representative of groups A and B, respectively. We compared the genomic sequences of Ames and CNEVA to identify mutations that may affect metabolic activities involved in the phenotypic differences.
The Kegg pathway database (http://www.genome.jp/kegg/pathway.html) was used to select enzyme activities involved in the metabolic pathways for starch, glycogen, and gluconate.
BLAST analysis of the corresponding open reading frame in the Ames (subgroup A3) and CNEVA (subgroup B2) genomes was then used to identify the selected genes that were interrupted or mutated. The functions and localizations of these open reading frames were then investigated with the Pfam (http://pfam.sanger.ac.uk/), CDD (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml), SMART (http://smart.embl-heidelberg.de/), SignalP (http://www.cbs.dtu.dk/services/SignalP/), and TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/) search programs. A single-base deletion in the amyS gene (BA3551) encoding an alpha-amylase linked to starch and glycogen metabolism was found in the CNEVA genome. The wild-type AmyS protein contains 513 amino acids, and its predicted molecular mass is 58.4 kDa. In subgroup B2, there is a frameshift due to deletion of an adenosine in the 7th position of the nucleotide sequence that leads to a premature stop codon in the 13th position. In the Ames genome, a single-base substitution was found in the gntK gene (BA0162) encoding a gluconate kinase linked to gluconate metabolism. The predicted wild-type GntK protein contains 511 amino acids, and its predicted molecular mass is 56.7 kDa. The mutation identified is a cytosine-to-adenosine substitution at position 530 of the nucleotide sequence that leads to a premature stop codon at amino acid position 176. We confirmed the presence of these two mutations in the other B. anthracis subgroup genomes accessible in the NCBI unfinished microbial genome database and sequenced 12 isolates with various genotypes belonging to subgroups A1 and B2 (6 isolates in each subgroup) originating from outbreaks that occurred in different regions of France over the last 15 years. These analyses revealed that the deletion in amyS is restricted to strains belonging to group B subgroups, whereas the substitution in gntK is restricted to strains belonging to group A subgroups. 
The mutations identified in amyS and gntK both result in premature stop codons that lead to a loss of the enzymatic activities and may thus account for the observed phenotypic differences between subgroup A1 and B2 strains. We therefore focused on these two genes and used French strains 9602R and RA3R belonging to subgroups A1 and B2, respectively, for further analysis.

12.
Cytokines are subdivided into 12 sub-families and are described as multi-functional molecules that play important roles in the host defense system against pathogens, in homeostasis, tissue repair, cell growth and development. CytokineDB is an annotated database that collects biological information on the human cytokine family and will be periodically updated with new biological information. This database is freely available online and can be accessed at the URL: http://www.cro-m.eu/CytokineDB/

13.
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search of causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
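The alignment-based score described above, the change in similarity to a homolog before and after introducing a variation, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a plain global alignment with a simple match/mismatch/linear-gap scheme rather than the substitution matrix and scoring details of the actual PROVEAN tool, and the function names and toy peptide sequences are hypothetical.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch) with a toy scoring scheme."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def delta_score(query, variant, homolog):
    """Similarity to the homolog after the variation minus similarity before it."""
    return align_score(variant, homolog) - align_score(query, homolog)


query = "MKTAYIAK"
homolog = "MKTAYIAK"          # toy homolog identical to the query

substitution = "MKTAWIAK"     # single substitution Y5W
deletion = "MKTAIAK"          # in-frame deletion of Y5

print(delta_score(query, substitution, homolog))  # -2
print(delta_score(query, deletion, homolog))      # -3
```

Because the score is computed on whole-sequence alignments, the same function handles substitutions, insertions and deletions; a strongly negative value indicates that the variant has moved away from the homolog.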

14.
Saccharomyces cerevisiae Spt6 protein is a conserved chromatin factor with several distinct functional domains, including a natively unstructured 30-residue N-terminal region that binds competitively with Spn1 or nucleosomes. To uncover physiological roles of these interactions, we isolated histone mutations that suppress defects caused by weakening Spt6:Spn1 binding with the spt6-F249K mutation. The strongest suppressor was H2A-N39K, which perturbs the point of contact between the two H2A-H2B dimers in an assembled nucleosome. Substantial suppression also was observed when the H2A-H2B interface with H3-H4 was altered, and many members of this class of mutations also suppressed a defect in another essential histone chaperone, FACT. Spt6 is best known as an H3-H4 chaperone, but we found that it binds with similar affinity to H2A-H2B or H3-H4. Like FACT, Spt6 is therefore capable of binding each of the individual components of a nucleosome, but unlike FACT, Spt6 did not produce endonuclease-sensitive reorganized nucleosomes and did not displace H2A-H2B dimers from nucleosomes. Spt6 and FACT therefore have distinct activities, but defects can be suppressed by overlapping histone mutations. We also found that Spt6 and FACT together are nearly as abundant as nucleosomes, with ∼24,000 Spt6 molecules, ∼42,000 FACT molecules, and ∼75,000 nucleosomes per cell. Histone mutations that destabilize interfaces within nucleosomes therefore reveal multiple spatial regions that have both common and distinct roles in the functions of these two essential and abundant histone chaperones. We discuss these observations in terms of different potential roles for chaperones in both promoting the assembly of nucleosomes and monitoring their quality.

15.
In this study, we analyse the relevance of harvestmen distribution data derived from opportunistic, unplanned, and non-standardised collection events in an area in the north of the Iberian Peninsula. Using specimens deposited in the BOS Arthropod Collection at the University of Oviedo, we compared these data with data from planned, standardised, and periodic collections with pitfall traps in several locations in the same area. The Arthropod Collection, begun in 1977, includes specimens derived from both sampling types, and its recent digitisation allows for this type of comparative analysis. Therefore, this is the first data-paper employing a hybrid approach, wherein subset metadata are described alongside a comparative analysis. The full dataset can be accessed through Spanish GBIF IPT at http://www.gbif.es:8080/ipt/archive.do?r=Bos-Opi, and the metadata of the unplanned collection events at http://www.gbif.es:8080/ipt/resource.do?r=bos-opi_unplanned_collection_events. We have mapped the data on the 18 harvestmen species included in the unplanned collections and provided records for some species in six provinces for the first time. We have also provided the locations of Phalangium opilio in eight provinces without published records. These results highlight the importance of digitising data from unplanned biodiversity collections, as well as those derived from planned collections, especially in scarcely studied groups and areas.

16.
We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment (previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches), yet it can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.

17.
Copper is a micronutrient essential for growth due to its role as a cofactor in enzymes involved in respiration, defense against oxidative damage, and iron uptake. Yet too much of a good thing can be lethal, and yeast cells typically do not have tolerance to copper levels much beyond the concentration in their ancestral environment. Here, we report a short-term evolutionary study of Saccharomyces cerevisiae exposed to levels of copper sulfate that are inhibitory to the initial strain. We isolated and identified adaptive mutations soon after they arose, reducing the number of neutral mutations, to determine the first genetic steps that yeast take when adapting to copper. We analyzed 34 such strains through whole-genome sequencing and by assaying fitness within different environments; we also isolated a subset of mutations through tetrad analysis of four lines. We identified a multilayered evolutionary response. In total, 57 single base-pair mutations were identified across the 34 lines. In addition, gene amplification of the copper metallothionein protein, CUP1-1, was rampant, as was chromosomal aneuploidy. Four other genes received multiple, independent mutations in different lines (the vacuolar transporter genes VTC1 and VTC4; the plasma membrane H+-ATPase PMA1; and MAM3, a protein required for normal mitochondrial morphology). Analyses indicated that mutations in all four genes, as well as CUP1-1 copy number, contributed significantly to explaining variation in copper tolerance. Our study thus finds that evolution takes both common and less trodden pathways toward evolving tolerance to an essential, but highly toxic, micronutrient.

18.
PathVisio is a commonly used pathway editor, visualization and analysis software. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. Those powerful, visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper was cited more than 170 times and PathVisio was used in many different biological studies. As an online editor PathVisio is also integrated in the community curated pathway database WikiPathways.
Here we present the third version of PathVisio with the newest additions and improvements of the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a new powerful extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application.
PathVisio can be downloaded from http://www.pathvisio.org and in 2014 PathVisio 3 was downloaded over 5,500 times. There are already more than 15 plugins available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available on https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers on https://groups.google.com/forum/#!forum/wikipathways-devel.
This is a PLOS Computational Biology software article.

19.

Background

With the advance of next generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, it has been found that a non-negligible proportion of the identified indel variants might be false positives due to sequencing errors, artifacts caused by ambiguous alignments, and annotation errors.

Results

In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first applies indel position information to form candidate redundant groups, then performs indel mutations to the reference genome to generate corresponding indel variant substrings. Finally the indel variant substrings in the same candidate redundant groups are compared in a pairwise fashion to identify redundant indels. We applied our pipeline to check for redundancy in the human indels in dbSNP. Our pipeline identified approximately 8% redundancy in insertion type indels, 12% in deletion type indels, and overall 10% for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated indel size distribution and adjacent indel distance distribution for a better understanding of the mechanisms generating indel variants.
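The three pipeline steps described above (grouping candidates by position, applying each indel to the reference, and comparing the resulting substrings pairwise) can be sketched as below. This is not the Vindel implementation: the `Indel` record, the grouping rule (same type and length, start positions within `max_gap`), the `flank` size and the toy reference are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Indel(NamedTuple):
    name: str
    pos: int    # 0-based position on the reference
    kind: str   # "ins" or "del"
    seq: str    # inserted or deleted bases


def apply_indel(ref, pos, kind, seq):
    """Return the sequence obtained by applying one indel to ref."""
    if kind == "ins":
        return ref[:pos] + seq + ref[pos:]
    if kind == "del":
        if ref[pos:pos + len(seq)] != seq:
            raise ValueError("deleted bases do not match the reference")
        return ref[:pos] + ref[pos + len(seq):]
    raise ValueError(f"unknown indel kind: {kind}")


def find_redundant(ref, indels, max_gap=10, flank=5):
    """Report pairs of indels that produce the same variant substring."""
    # Step 1: candidate groups share indel type and length.
    groups = defaultdict(list)
    for ind in indels:
        groups[(ind.kind, len(ind.seq))].append(ind)

    redundant = []
    for members in groups.values():
        members = sorted(members, key=lambda ind: ind.pos)
        for a, b in combinations(members, 2):
            if b.pos - a.pos > max_gap:
                continue  # too far apart to be the same event
            # Step 2: mutate a shared reference window with each indel.
            lo = max(0, a.pos - flank)
            hi = min(len(ref), b.pos + len(b.seq) + flank)
            window = ref[lo:hi]
            sub_a = apply_indel(window, a.pos - lo, a.kind, a.seq)
            sub_b = apply_indel(window, b.pos - lo, b.kind, b.seq)
            # Step 3: identical variant substrings mean redundant indels.
            if sub_a == sub_b:
                redundant.append((a.name, b.name))
    return redundant


reference = "ACGTTTTGCA"
indels = [
    Indel("d1", 3, "del", "T"),  # delete the first T of the run
    Indel("d2", 6, "del", "T"),  # delete the last T of the same run
    Indel("d3", 2, "del", "G"),  # a different deletion
    Indel("i1", 3, "ins", "T"),  # insert T at the start of the run
    Indel("i2", 7, "ins", "T"),  # insert T at the end of the run
]

pairs = find_redundant(reference, indels)
print(pairs)  # [('d1', 'd2'), ('i1', 'i2')]
```

The toy example shows the typical source of redundancy: inside a homopolymer run, the same insertion or deletion can be reported at several positions even though the resulting sequence is identical.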

Conclusions

Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels is redundant with respect to those already in the database of interest, such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in the current indel annotation. Statistical results show that the pipeline detects indel redundancy consistently across all 22 autosomes. Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server http://bioinformatics.cs.vt.edu/zhanglab/indelRedundant.php.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0359-1) contains supplementary material, which is available to authorized users.

20.
Selenoproteins are proteins containing an uncommon amino acid selenocysteine (Sec). Sec is inserted by a specific translational machinery that recognizes a stem-loop structure, the SECIS element, at the 3′ UTR of selenoprotein genes and recodes a UGA codon within the coding sequence. As UGA is normally a translational stop signal, selenoproteins are generally misannotated and designated tools have to be developed for this class of proteins. Here, we present two new computational methods for selenoprotein identification and analysis, which we provide publicly through the web servers at http://gladyshevlab.org/SelenoproteinPredictionServer or http://seblastian.crg.es. SECISearch3 replaces its predecessor SECISearch as a tool for prediction of eukaryotic SECIS elements. Seblastian is a new method for selenoprotein gene detection that uses SECISearch3 and then predicts selenoprotein sequences encoded upstream of SECIS elements. Seblastian is able to both identify known selenoproteins and predict new selenoproteins. By applying these tools to diverse eukaryotic genomes, we provide a ranked list of newly predicted selenoproteins together with their annotated cysteine-containing homologues. An analysis of a representative candidate belonging to the AhpC family shows how the use of Sec in this protein evolved in bacterial and eukaryotic lineages.
