Similar literature
 20 similar articles found
1.
2.
All protein sequences in the SWISS-PROT database were analyzed for the occurrence of single amino acid repeats, tandem oligopeptide repeats, and periodically conserved amino acids. Single amino acid repeats of glutamine, serine, glutamic acid, glycine, and alanine appear to be tolerated to a considerable extent in many proteins. Tandem oligopeptide repeats of different types, with varying levels of conservation, were detected in several proteins and were particularly conspicuous in structural and cell surface proteins. Repeated sequence patterns may be a mechanism that provides regular arrays of spatial and functional groups, useful for structural packing or for one-to-one interactions with target molecules. To facilitate further exploration, a database of Tandem Repeats in Protein Sequences (TRIPS) has been developed and is available at http://www.ncl-india.org/trips.
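As an illustration of what scanning for single amino acid repeats can look like, here is a minimal Python sketch. It is not the method used to build TRIPS: the function name `single_residue_repeats`, the default threshold of five residues, and the example sequence are all assumptions made for this example.

```python
import re

def single_residue_repeats(seq, min_len=5):
    """Return (residue, start, length) for every run of one amino acid
    repeated at least min_len times (min_len is a hypothetical threshold)."""
    # (.) captures one residue; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(seq)]

# Made-up sequence: a run of 7 Q, a run of 5 A, and a run of 3 S (too short to report).
example = "MK" + "Q" * 7 + "LST" + "A" * 5 + "G" + "SSS" + "K"
print(single_residue_repeats(example))
```

Tandem oligopeptide repeats with imperfect conservation would need an alignment-based approach rather than an exact regular expression.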

3.
MOTIVATION: One of the most interesting features of genomes (both coding and non-coding regions) is the presence of relatively short, tandemly repeated DNA sequences known as tandem repeats (TRs). We developed a new PC-based stand-alone analysis program that combines sequence motif searches with keywords such as organs, tissues, cell lines or development stages to find exact, inexact and compound TRs. Tandem Repeats Analyzer 1.5 (TRA) offers several advanced repeat search parameters and options not found in other repeat finders: it accepts GenBank, FASTA and expressed sequence tag (EST) sequence files and can also analyze multiple files containing multiple sequences. Advanced user-defined parameters let researchers apply different search criteria to motifs of different lengths simultaneously. The output includes statistical results for the user to evaluate. Discovering TRs in ESTs could be useful for gene mapping and association studies, and for finding TRs located in coding regions of important genes that are expressed under various conditions of environment, stress, organ, tissue and development stage. RESULTS: In this paper, we demonstrate applications of TRA using 175,899 EST sequences from three Arabidopsis spp. downloaded from GenBank. The EST-SSR/EST ratios were 43.1%, 15.3% and 2.34% in A. lyrata, A. thaliana and A. halleri, respectively. The analysis revealed that organs, tissues and development stages differed in the number and composition of repeats. This indicates that the distribution of TRs among tissues or organs may not be random, unlike that of the untranscribed repeats found in genomes. AVAILABILITY: The program can be obtained free of charge by anonymous FTP from ftp.akdeniz.edu.tr/Araclar/TRA.
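The idea of applying a different search criterion to each motif length, and of reporting the share of ESTs that carry a repeat, can be sketched as follows. This is a rough illustration for exact repeats only, not TRA's algorithm: the thresholds in `MIN_COPIES`, the function names and the three toy sequences are assumptions.

```python
import re

# Hypothetical criteria: motif length in bp -> minimum number of tandem copies.
MIN_COPIES = {2: 6, 3: 4, 4: 3}

def find_exact_repeats(seq, min_copies=MIN_COPIES):
    """Return (motif, start, copies) for exact tandem repeats in a DNA string."""
    hits = []
    for k, n in sorted(min_copies.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each repeat is reported once.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((motif, m.start(), len(m.group(0)) // k))
    return hits

def repeat_fraction(sequences):
    """Fraction of sequences that contain at least one exact tandem repeat."""
    with_repeat = sum(1 for s in sequences if find_exact_repeats(s))
    return with_repeat / len(sequences)

ests = ["GG" + "AT" * 6 + "CC", "ACGTACGGTCA", "T" + "CAG" * 4 + "G"]
print(find_exact_repeats(ests[0]))
print(repeat_fraction(ests))
```

Inexact and compound repeats, which the abstract also mentions, would need mismatch-tolerant matching and are left out of this sketch.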

4.
GoPipe: Gene Ontology annotation and statistical analysis of batch sequences   Cited by: 7 (self-citations: 0; citations by others: 7)
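With the arrival of the post-genomic era, batch sequencing, especially EST sequencing, has become routine work in ordinary laboratories. These new sequences often require batch Gene Ontology (GO) annotation followed by statistical analysis. At present, however, no software other than GoBlet is suited to batch GO annotation of unknown sequences, and GoBlet still has many shortcomings because it limits upload size and relies on BLAST alone as its prediction tool. We developed a software package, GoPipe, which annotates sequences by integrating BLAST and InterProScan results and provides tools for further statistical comparison. The main program accepts any number of BLAST and InterProScan result files and performs, in order, text parsing, data integration, redundancy removal, statistical analysis and display. Statistical tools are also provided to compare the GO distributions of different inputs in order to mine their biological meaning. In addition, in intersection mode the program takes the intersection of the InterProScan and BLAST results; on the test data set its precision reached 99.1%, far exceeding the GO-prediction precision of InterProScan alone, while sensitivity decreased only slightly. Its high precision, speed and flexibility make it an ideal tool for batch Gene Ontology annotation of unknown sequences. The package is freely available at http://gopipe.fishgenome.org/ or from the authors on request.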
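The intersection mode described above can be pictured with a small Python sketch. It assumes the BLAST and InterProScan results have already been parsed into dictionaries that map a sequence ID to a set of GO terms; the function names, data layout and example annotations are hypothetical and do not reflect GoPipe's actual code or file formats.

```python
def intersect_annotations(blast, interpro):
    """Keep, for each sequence, only the GO terms reported by both sources."""
    result = {}
    for seq_id in blast.keys() & interpro.keys():
        common = blast[seq_id] & interpro[seq_id]
        if common:
            result[seq_id] = common
    return result

def precision_and_sensitivity(predicted, truth):
    """Term-level precision and sensitivity against a reference annotation."""
    true_pos = sum(len(predicted.get(s, set()) & terms) for s, terms in truth.items())
    n_predicted = sum(len(terms) for terms in predicted.values())
    n_true = sum(len(terms) for terms in truth.values())
    precision = true_pos / n_predicted if n_predicted else 0.0
    sensitivity = true_pos / n_true if n_true else 0.0
    return precision, sensitivity

# Made-up annotations for three sequences.
blast = {"seq1": {"GO:0003677", "GO:0005634"}, "seq2": {"GO:0016301"}}
interpro = {"seq1": {"GO:0003677"}, "seq2": {"GO:0005524"}, "seq3": {"GO:0005515"}}
truth = {"seq1": {"GO:0003677", "GO:0005634"}, "seq2": {"GO:0016301"}}

combined = intersect_annotations(blast, interpro)
print(combined)
print(precision_and_sensitivity(combined, truth))
```

The toy data show the trade-off the abstract reports: requiring agreement between the two sources removes unsupported terms (raising precision) at the cost of dropping some correct ones (lowering sensitivity).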

5.

Background

Polymorphic tandem repeat typing is a new generic technology that has proved very efficient for bacterial pathogens such as B. anthracis, M. tuberculosis, P. aeruginosa, L. pneumophila and Y. pestis. The previously developed tandem repeats database takes advantage of the release of genome sequence data for a growing number of bacteria to facilitate the identification of tandem repeats. Developing an assay then requires evaluating tandem repeat polymorphism on well-selected sets of isolates. For major human pathogens such as S. aureus, more than one strain is being sequenced, so the tandem repeats most likely to be polymorphic can now be selected in silico by genome sequence comparison.

Results

In addition to the previously described general Tandem Repeats Database, we have developed a tool that automatically identifies tandem repeats whose length differs between the genome sequences of two (or more) closely related bacterial strains. Genome comparisons are pre-computed. The comparison results are parsed into a database, which can be conveniently queried over the internet according to criteria of practical value, including repeat unit length and predicted size difference. Comparisons are available for 16 bacterial species and for the orthopoxviruses, including the variola virus and three of its close neighbors.

Conclusions

We present an internet-based resource to help develop and perform tandem-repeat-based bacterial strain typing. The tools accessible at http://minisatellites.u-psud.fr now comprise four parts. The Tandem Repeats Database enables the identification of tandem repeats across entire genomes. The Strain Comparison Page identifies tandem repeats that differ between genome sequences from the same species. The "Blast in the Tandem Repeats Database" page facilitates the search for a known tandem repeat and the prediction of amplification product sizes. The "Bacterial Genotyping Page" is a service for strain identification at the subspecies level.
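The strain comparison described in the Results, with queries by repeat unit length and predicted size difference, might be modeled roughly as below. This is a sketch under stated assumptions, not the site's implementation: the `RepeatLocus` record, the function names, and the locus and strain names are invented, and copy numbers are taken as whole numbers for simplicity.

```python
from dataclasses import dataclass

@dataclass
class RepeatLocus:
    name: str
    unit_length: int   # repeat unit length in bp
    copies: dict       # strain name -> copy number at this locus

def predicted_size_difference(locus, strain_a, strain_b):
    """Predicted difference in repeat array length (bp) between two strains."""
    return abs(locus.copies[strain_a] - locus.copies[strain_b]) * locus.unit_length

def select_loci(loci, strain_a, strain_b, min_unit_length=1, min_size_difference=1):
    """Names of loci that meet both query criteria for the given strain pair."""
    return [
        locus.name
        for locus in loci
        if locus.unit_length >= min_unit_length
        and predicted_size_difference(locus, strain_a, strain_b) >= min_size_difference
    ]

loci = [
    RepeatLocus("locus_1", 9, {"strainA": 5, "strainB": 8}),    # differs by 27 bp
    RepeatLocus("locus_2", 12, {"strainA": 3, "strainB": 3}),   # identical
    RepeatLocus("locus_3", 6, {"strainA": 10, "strainB": 9}),   # differs by 6 bp
]
print(select_loci(loci, "strainA", "strainB"))
print(select_loci(loci, "strainA", "strainB", min_unit_length=9))
```

Loci that differ in silico are only candidates; as the Background notes, polymorphism still has to be evaluated on well-selected sets of isolates.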

6.
Summary: Using the polymorphic DNA probes ChdTC-15, ChdTC-114, pYNH24, and λTM-18, a DNA profiling system was developed that verified the identities of individual cultured cell lines held in the Japanese cell banks JCRB, RCB, and IFO. These highly polymorphic DNA probes include both VNTR (Variable Number of Tandem Repeats) sequences and substantial lengths of unique regions. In the mixed-probe system, four to eight distinct bands can be used for cell line identification. These bands were spread over a wide range of molecular sizes and were stable and reproducible under stringent Southern blot hybridization conditions. Because the DNA profile was specific to each individual human cell line, it is useful not only for authenticating existing cultured cell lines but also for monitoring their identity during propagation in a laboratory and for confirming that newly established lines are unique.

7.
Combined evidence annotation of transposable elements in genome sequences   Cited by: 1 (self-citations: 0; citations by others: 1)
Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated "TE models" in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other species in the genus Drosophila.
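One simple way to picture combining evidence from several programs is to merge overlapping hits on a sequence and record which programs support each merged region. The sketch below does only that; the abstract does not describe the pipeline's actual combination rules, so the merging criterion, the function name and the coordinates are assumptions for illustration.

```python
def merge_evidence(hits):
    """Merge overlapping (start, end, source) hits on one sequence.

    Coordinates are inclusive. Returns a list of (start, end, sources),
    where sources is the set of programs supporting the merged region.
    """
    merged = []
    for start, end, source in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it and add this source.
            prev_start, prev_end, sources = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), sources | {source})
        else:
            merged.append((start, end, {source}))
    return merged

# Made-up hits from four of the programs named in the abstract.
hits = [
    (100, 500, "RepeatMasker"),
    (450, 900, "RECON"),
    (2000, 2300, "BLASTER"),
    (120, 480, "TE-HMM"),
]
for region in merge_evidence(hits):
    print(region)
```

Regions supported by several independent sources could then be given priority during manual curation, while single-source regions are flagged for closer inspection.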

8.
Cross-contamination of eukaryotic cell lines used in biomedical research represents a highly relevant problem. Analysis of repetitive DNA sequences, such as Short Tandem Repeats (STR), or Simple Sequence Repeats (SSR), is a widely accepted, simple, and commercially available technique to authenticate cell lines. However, it provides only qualitative information that depends on the extent of reference databases for interpretation. In this work, we developed and validated a rapid and routinely applicable method for evaluation of cell culture cross-contamination levels based on mass spectrometric fingerprints of intact mammalian cells coupled with artificial neural networks (ANNs). We used human embryonic stem cells (hESCs) contaminated by either mouse embryonic stem cells (mESCs) or mouse embryonic fibroblasts (MEFs) as a model. We determined the contamination level using a mass spectra database of known calibration mixtures that served as training input for an ANN. The ANN was then capable of correct quantification of the level of contamination of hESCs by mESCs or MEFs. We demonstrate that MS analysis, when linked to proper mathematical instruments, is a tangible tool for unraveling and quantifying heterogeneity in cell cultures. The analysis is applicable in routine scenarios for cell authentication and/or cell phenotyping in general.

9.
The aim of the present study was to evaluate the efficiency of three methods for determining the molecular diversity of 34 Mycobacterium avium subsp. paratuberculosis (MAP) strains isolated from 17 cattle herds. The methods applied were analysis of sequence polymorphism in the mononucleotide (G1 and G2) and trinucleotide (GGT) Short Sequence Repeats (SSR), and determination of size polymorphism in 9 different Mycobacterial Interspersed Repetitive Units (MIRU) and 6 Variable Number Tandem Repeats (VNTR). Sequence analysis of the SSRs of the 34 isolates showed 4, 6, and 2 alleles of the G1, G2, and GGT repeats, respectively. Amplification of the 9 MIRU units revealed only two discriminatory genotyping systems (MIRU2 and MIRU3). Of the 6 VNTR PCR differentiation methods, only one could be recommended for genotyping purposes. The profile 7g-12g-4ggt-II-b-2 of the combined system G1-G2-GGT-MIRU2-MIRU3-VNTR1658 dominated among the examined isolates and was detected in 14.7% of them. In this study, certain repetitive loci of the SSR, MIRU, and VNTR techniques showed greater potential than others for the characterization of MAP isolates. The recommended loci can be used for the epidemiological tracing of MAP field strains and to determine the relationships between isolates in different herds.

10.
New approach for isolation of VNTR markers.   Cited by: 18 (self-citations: 3; citations by others: 15)
Elsewhere we have reported an efficient method for isolating VNTR (Variable Number of Tandem Repeats) markers. Several of the VNTR markers isolated in those experiments were sequenced, and a DNA sequence of 9 bp (GNNGTGGG) emerged as an apparent consensus sequence for VNTR markers. To confirm this result and to develop more VNTR markers, we synthesized nine different 18-base-long oligonucleotides whose sequences each included GNNGTGGG. When 102 cosmid clones selected by these oligonucleotides were tested for polymorphism, 34 (33%) of them showed multiallelic VNTR polymorphisms (average heterozygosity 68%). This procedure represents a new and efficient approach for isolating additional VNTR markers and supports the idea that the GNNGTGGG sequence may play an important role in the generation of the multiallelic systems within the human genome.
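To show how a consensus with ambiguous positions such as GNNGTGGG can be located in a sequence, here is a small Python sketch. It treats N as any of A, C, G or T, uses the consensus exactly as printed above, and searches the forward strand only; the function names and the test sequence are made up for this example. It also rechecks the reported proportion of polymorphic clones.

```python
import re

# Minimal IUPAC table: only the symbols needed for this consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

def consensus_to_regex(consensus):
    """Compile a consensus string into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus)
    return re.compile("(?=(%s))" % body)  # lookahead keeps matches zero-width

def find_consensus(seq, consensus="GNNGTGGG"):
    """Return 0-based start positions of the consensus on the forward strand."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(seq.upper())]

# Made-up sequence containing two instances of the consensus.
seq = "TT" + "GACGTGGG" + "AA" + "GTTGTGGG" + "CC"
print(find_consensus(seq))

# 34 of 102 selected cosmid clones were polymorphic.
print(round(34 / 102 * 100))
```

A fuller search would also scan the reverse complement, since a probe can hybridize to either strand.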

11.
Because genetically monomorphic bacterial pathogens harbour little DNA sequence diversity, most current genotyping techniques used to study the epidemiology of these organisms are based on mobile or repetitive genetic elements. Molecular markers commonly used in these bacteria include Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and Variable Number Tandem Repeats (VNTR). These methods are also increasingly being applied to phylogenetic and population genetic studies. Using the Mycobacterium tuberculosis complex (MTBC) as a model, we evaluated the phylogenetic accuracy of CRISPR- and VNTR-based genotyping, which in MTBC are known as spoligotyping and Mycobacterial Interspersed Repetitive Units (MIRU)-VNTR-typing, respectively. We used as a gold standard the complete DNA sequences of 89 coding genes from a global strain collection. Our results showed that phylogenetic trees derived from these multilocus sequence data were highly congruent and statistically robust, irrespective of the phylogenetic methods used. By contrast, corresponding phylogenies inferred from spoligotyping or 15-loci-MIRU-VNTR were incongruent with respect to the sequence-based trees. Although 24-loci-MIRU-VNTR performed better, it was still unable to detect all strain lineages. The DNA sequence data showed virtually no homoplasy, but the opposite was true for spoligotyping and MIRU-VNTR, which was consistent with high rates of convergent evolution and the low statistical support obtained for phylogenetic groupings defined by these markers. Our results also revealed that the discriminatory power of the standard 24 MIRU-VNTR loci varied by strain lineage. Taken together, our findings suggest strain lineages in MTBC should be defined based on phylogenetically robust markers such as single nucleotide polymorphisms or large sequence polymorphisms, and that for epidemiological purposes, MIRU-VNTR loci should be used in a lineage-dependent manner. Our findings have implications for strain typing in other genetically monomorphic bacteria.

12.
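With the rapid growth of influenza virus genome sequencing data, mining the biological information contained in these large genomic data sets has become a research focus. Building an automated, integrated and information-based sequence database system on the basis of influenza epidemiological data from China is of great practical value for the rapid batch translation, annotation, storage, querying and analysis of influenza virus genomes. By integrating a series of software tools and toolkits and combining them with other functions developed in-house, our group added fine-grained screening rules for selecting the best-matching translation and annotation on top of two key reference data sets maintained at the back end, and built an automated system for influenza virus genome storage, automated translation, accurate protein sequence annotation, homologous sequence alignment and phylogenetic tree analysis. The results show that, given influenza virus gene sequences in fasta format entered through the Web interface, the system runs a Blast homology search against the reference segment data set (blastdb.fasta) and identifies the virus type (A, B or C), subtype and gene segment (segments 1 to 8). It then queries the back-end reference data set of gene segments used for translation and annotation to obtain a set of peptides, and calls ProSplign in a loop to make predictions with them. Using the fine-grained screening rules, it selects the translation product that best matches the input sequence as the predicted protein for that sequence, writes it out in common formats such as gbk, asn and fasta, and reports the sequence length, whether the sequence is full length, and the virus type, subtype and segment. On this basis, additional functions such as phylogenetic tree analysis and display and genome data storage were developed in-house, yielding a Web-based system for the automated translation and annotation of influenza virus genomes. This study suggests that the system, which tightly integrates a series of software tools with its own annotation and translation database files, covers sequence storage, translation, annotation, analysis and display, and can fully meet China's need for shared, localized, integrated and automated handling of high-throughput genetic testing data.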

13.
The widespread use of mass spectrometry for protein identification has created a demand for computationally efficient methods of matching mass spectrometry data to protein databases. A search using X!Tandem, a popular and representative program, can require hours or days to complete, particularly when missed cleavages and post-translational modifications are considered. Existing techniques for accelerating X!Tandem by employing parallelism are unsatisfactory for a variety of reasons. The paper describes a parallelization of X!Tandem, called X!!Tandem, that shows excellent speedups on commodity hardware and produces the same results as the original program. Furthermore, the parallelization technique used is unusual and potentially useful for parallelizing other complex programs.

14.
gff2aplot: Plotting sequence comparisons   Cited by: 1 (self-citations: 0; citations by others: 1)
SUMMARY: gff2aplot is a program to visualize the alignment of two sequences together with their annotations. Input for the program consists of single or multiple files in GFF-format which specify the alignment coordinates and annotation features of both sequences. Output is in PostScript format of any size. The features to be displayed are highly customizable to meet user specific needs. The program serves to generate print-quality images for comparative genome sequence analysis. AVAILABILITY: gff2aplot is freely available under the GNU software licence and can be downloaded from the address specified below. Supplementary information: http://genome.imim.es/software/gfftools/GFF2APLOT.html
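For readers unfamiliar with the input format, the sketch below parses GFF text into records using the standard tab-separated columns (seqname, source, feature, start, end, score, strand, frame, and an optional final attributes column). It is not part of gff2aplot; the `GffRecord` type, the `parse_gff` function and the two example lines are assumptions, and in particular the way gff2aplot itself encodes alignments in GFF is not shown in the abstract.

```python
from typing import NamedTuple, Optional

class GffRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]   # None when the column holds "."
    strand: str
    frame: str
    attributes: str          # empty string when the column is absent

def parse_gff(text):
    """Parse GFF text into a list of GffRecord, skipping comments and blanks."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError("expected at least 8 tab-separated columns: %r" % line)
        score = None if cols[5] == "." else float(cols[5])
        attributes = cols[8] if len(cols) > 8 else ""
        records.append(GffRecord(cols[0], cols[1], cols[2], int(cols[3]),
                                 int(cols[4]), score, cols[6], cols[7], attributes))
    return records

# Hypothetical input: one annotation feature and one alignment block.
example = "\n".join([
    "##gff-version 2",
    "seqA\tannot\texon\t100\t250\t.\t+\t0\tgene1",
    "seqA\talign\thsp\t120\t240\t87.5\t+\t.\tseqB",
])
records = parse_gff(example)
for rec in records:
    print(rec.feature, rec.start, rec.end, rec.score)
```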

15.
MOTIVATION: Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. RESULTS: The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. AVAILABILITY: BFAB is available at http://mips.gsf.de/proj/bfab

16.
MOTIVATION: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. RESULTS: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. AVAILABILITY: Software available from http://www.russet.org.uk.
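The abstract does not say which semantic similarity measure was used, so the following is only an illustration of the general idea, using one common choice: an information-content measure in which two terms are as similar as their most informative shared ancestor. The toy ontology, the annotation counts and all function names are invented for this example.

```python
import math

# Toy ontology: term -> list of parent terms (hypothetical, not a real ontology).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical number of database entries annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalysis": 4, "dna_binding": 1, "rna_binding": 1}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def term_probability(term):
    """Probability that an annotation falls on the term or one of its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return covered / total

def semantic_similarity(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(-math.log(term_probability(t)) for t in common)

print(semantic_similarity("dna_binding", "rna_binding"))
print(semantic_similarity("dna_binding", "catalysis"))
```

Terms that share only the root score zero, while terms under a rarer shared parent score higher, which is what lets annotation be compared in a graded way, much as sequences are.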

17.
Although the transport of model proteins across the mammalian ER can be reconstituted with purified Sec61p complex, TRAM, and signal recognition particle receptor, some substrates, such as the prion protein (PrP), are inefficiently or improperly translocated using only these components. Here, we purify a factor needed for proper translocation of PrP and identify it as the translocon-associated protein (TRAP) complex. Surprisingly, TRAP also stimulates vectorial transport of many, but not all, other substrates in a manner influenced by their signal sequences. Comparative analyses of several natural signal sequences suggest that a dependence on TRAP for translocation is not due to any single physical parameter, such as hydrophobicity of the signal sequence. Instead, a functional property of the signal, efficiency of its post-targeting role in initiating substrate translocation, correlates inversely with TRAP dependence. Thus, maximal translocation independent of TRAP can only be achieved with a signal sequence, such as the one from prolactin, whose strong interaction with the translocon mediates translocon gating shortly after targeting. These results identify the TRAP complex as a functional component of the translocon and demonstrate that it acts in a substrate-specific manner to facilitate the initiation of protein translocation.

18.
Towards multidimensional genome annotation   Cited by: 1 (self-citations: 0; citations by others: 1)
Our information about the gene content of organisms continues to grow as more genomes are sequenced and gene products are characterized. Sequence-based annotation efforts have led to a list of cellular components, which can be thought of as a one-dimensional annotation. With growing information about component interactions, facilitated by the advancement of various high-throughput technologies, systemic, or two-dimensional, annotations can be generated. Knowledge about the physical arrangement of chromosomes will lead to a three-dimensional spatial annotation of the genome and a fourth dimension of annotation will arise from the study of changes in genome sequences that occur during adaptive evolution. Here we discuss all four levels of genome annotation, with specific emphasis on two-dimensional annotation methods.

19.
Polymorphic minisatellites, also known as variable number of tandem repeats (VNTRs), are tandem repeat regions that show variation in the number of repeat units among chromosomes in a population. Currently, there are no general methods for predicting which minisatellites have a high probability of being polymorphic, given their sequence characteristics. An earlier approach focused on potentially highly polymorphic and hypervariable minisatellites, which make up only a small fraction of all minisatellites in the human genome. We have developed a model, based on available minisatellite and VNTR sequence data, that predicts the probability that a minisatellite (unit size ≥ 6 bp) identified by the computer program Tandem Repeats Finder is polymorphic (a VNTR). According to the model, minisatellites with a high copy number and a high degree of sequence similarity are most likely to be VNTRs. This approach was used to scan the draft sequence of the human genome for VNTRs. A total of 157,549 minisatellite repeats were found, of which 29,224 are predicted to be VNTRs. Contrary to previous results, VNTRs appear to be widespread and abundant throughout the human genome, with an estimated density of 9.1 VNTRs/Mb.
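The counts reported above can be related to each other with a few lines of arithmetic. The sketch below only restates the figures given in the abstract; the variable names are arbitrary, and the sequence length it derives is merely what the two reported numbers imply, not a value stated by the authors.

```python
minisatellites_found = 157_549   # minisatellite repeats found in the draft genome
predicted_vntrs = 29_224         # of which predicted to be polymorphic
density_per_mb = 9.1             # reported VNTR density

# Share of detected minisatellites predicted to be VNTRs.
predicted_fraction = predicted_vntrs / minisatellites_found

# Sequence length implied by the count and the density together.
implied_length_mb = predicted_vntrs / density_per_mb

print(f"{predicted_fraction:.1%} of minisatellites predicted polymorphic")
print(f"about {implied_length_mb:.0f} Mb of sequence implied")
```

So a little under a fifth of the detected minisatellites are predicted to be polymorphic, and the density corresponds to roughly 3.2 Gb of scanned sequence.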

20.
The functional annotation of new protein sequences is a major bottleneck for genomic science. The best way to suggest the function of a protein from its sequence is to find a related one for which biological information is available. Current alignment algorithms display a list of protein sequence stretches that show significant similarity to different protein targets, ordered by their respective mathematical scores. However, statistical and biological significance do not always coincide, so rearranging the program output according to biological characteristics rather than mathematical scoring alone would help functional annotation. A new method is described that predicts the putative function of a protein by integrating the results of the PSI-BLAST program with a fuzzy logic algorithm. Several protein sequence characteristics were tested for their ability to rearrange a PSI-BLAST profile in closer accordance with biological function. Four of them (amino acid content, matched segment length, and hydropathic and flexibility profiles) contributed positively, once integrated by a fuzzy logic algorithm into a program, BYPASS, to the accurate prediction of the function of a protein from its sequence. Antonio Gómez and Juan Cedano contributed equally to this work.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号