Similar literature
 20 similar documents found; search took 15 ms
1.
Sequence annotation is essential for genomics-based research. Investigators of a specific genomic region who have developed abundant local discoveries such as genes and genetic markers, or have collected annotations from multiple resources, can be overwhelmed by the difficulty in creating local annotation and the complexity of integrating all the annotations. Presenting such integrated data in a form suitable for data mining and high-throughput experimental design is even more daunting. DNannotator, a web application, was designed to perform batch annotation on a sizeable genomic region. It takes annotation source data, such as SNPs, genes, primers, and so on, prepared by the end-user and/or a specified target of genomic DNA, and performs de novo annotation. DNannotator can also robustly migrate existing annotations in GenBank format from one sequence to another. Annotation results are provided in GenBank format and in tab-delimited text, which can be imported and managed in a database or spreadsheet and combined with existing annotation as desired. Graphic viewers, such as Genome Browser or Artemis, can display the annotation results. Reference data (reports on the process) facilitating the user's evaluation of annotation quality are optionally provided. DNannotator can be accessed at http://sky.bsd.uchicago.edu/DNannotator.htm.

2.
We describe a rapid and cost-effective technique for the in vitro removal of introns and other unwanted regions from genomic DNA to generate a single sequence of continuous coding capacity, where tissues required for RNA extraction and complementary DNA synthesis are unavailable. Based on an overlapping fusion-PCR strategy, we name this procedure SPLICE (for swift PCR for ligating in vitro constructed exons). As proof-of-principle, we used SPLICE successfully to generate a single piece of DNA containing the coding region of a five-exon gene, the short-wavelength-sensitive 1 (SWS1) opsin gene, from genomic DNA extracted from the brown lemur, Eulemur fulvus, in only two short rounds of PCR. Where the genomic structure and sequence are known, this technique may be universally applied to any gene expressed in any organism to generate a practical unit for investigating the function of a particular gene of interest. In this report, we provide a detailed protocol, experimental considerations, and suggestions for troubleshooting.

3.
The Homeodomain Resource is a comprehensive collection of sequence, structure and genomic information on the homeodomain protein family. Available through the Resource are both full-length and domain-only sequence data, as well as X-ray and NMR structural data for proteins and protein-DNA complexes. Also available is information on human genetic diseases and disorders in which proteins from the homeodomain family play an important role; genomic information includes relevant gene symbols, cytogenetic map locations, and specific mutation data. Search engines are provided to allow users to easily query the component databases and assemble specialized data sets. The Homeodomain Resource is available through the World Wide Web at http://genome.nhgri.nih.gov/homeodomain

4.
ANTHEPROT is a fully interactive program devoted to the analysis of protein structures using a graphics workstation. It presents four options. The first option can predict secondary structures using five methods, and hydrophobicity, solvent accessibility, flexibility and antigenicity profiles using eighteen scales. The user may introduce his own scales. The results displayed on the screen can be easily analyzed. The second option is for representing results concerning up to eight proteins by one method. To compare these proteins, it is possible to align the profiles or the predicted secondary structure according to various motifs. The secondary structure deduced from crystallographic data may also be introduced. The third option is designed to compare the primary structure of two proteins and to visualize on the screen regions that exhibit similarity. Six different comparison matrices may be used, but the user can also introduce his own matrices. The last option is for studying the proteolytic peptides resulting from a chemical or enzymatic digestion of a given protein. It is possible to analyze the protein cleavage using eleven chemical reagents or enzymes. The results are displayed on the screen as an RP-HPLC chromatogram.

5.
6.
A Monte Carlo method has been developed for generating the conformations of short single-stranded DNAs from arbitrary starting states. The chain conformers are constructed from energetically favorable arrangements of the constituent mononucleotides. Minimum energy states of individual dinucleotide monophosphate molecules are identified using a torsion angle minimizer. The glycosyl and acyclic backbone torsions of the dimers are allowed to vary, while the sugar rings are held fixed in one of the two preferred puckered forms. A total of 108 conformationally distinct states per dimer are considered in this first stage of minimization. The torsion angles within 5 kcal/mole of the global minimum in the resulting optimized states are then allowed to vary by ±10° in an effort to estimate the breadth of the different local minima. The energies of a total of 2187 (3⁷) angle combinations are examined per local conformational minimum. Finally, the energies of all dinucleotide conformers are scaled so that the populations of differently puckered sugar rings in the theoretical sample match those found in NMR solution studies. This last step is necessitated by limitations in the theoretical methods to predict DNA sugar puckering accurately. The conformer populations of the individual acyclic torsion angles in the composite dimer ensembles are found to be in good agreement with the distributions of backbone conformations deduced from NMR coupling constants and the frequencies of glycosyl conformations in X-ray crystal structures, suggesting that the low energy states are reasonable. The low energy dimer forms (consisting of 150–325 conformational states per dimer step) are next used as variables in a Monte Carlo algorithm, which generates the conformations of single-stranded d(CXₙG) chains, where X = A, T and n = 3, 4, 5. The oligonucleotides are built sequentially from the 5′ end of the chain using random numbers to select the conformations of overlapping dimer units. The simulations are very fast, involving a total of 10⁶ conformations per chain sequence. The potential errors in the buildup procedure are minimized by taking advantage of known rotational interdependences in the sugar–phosphate backbone. The distributions of oligonucleotide conformations are examined in terms of the magnitudes, positions, and orientations of the end-to-end vectors of the chains. The differences in overall flexibility and extension of the oligomers are discussed in terms of the conformations of the constituent dinucleotide steps, while the general methodology is discussed and compared with other nucleic acid model building techniques. © 1993 John Wiley & Sons, Inc.
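
The count of 2187 angle combinations per local minimum equals 3 to the 7th power, which is consistent with seven variable torsion angles each sampled at three values (the minimum and ±10° around it). The abstract does not spell out this breakdown, so the sketch below is an assumption about how such a grid could be enumerated, not the authors' code; the function name `torsion_grid` is hypothetical.

```python
from itertools import product


def torsion_grid(minimum, step=10.0):
    """Yield every combination of torsions offset by -step, 0 or +step.

    `minimum` is a sequence of torsion angles (degrees) at a local minimum.
    Sampling three values per angle is an assumption made for illustration.
    """
    offsets = (-step, 0.0, step)
    for combo in product(offsets, repeat=len(minimum)):
        yield tuple(angle + delta for angle, delta in zip(minimum, combo))


if __name__ == "__main__":
    # Seven placeholder torsions, all set to 0 degrees for the demonstration.
    grid = list(torsion_grid([0.0] * 7))
    print(len(grid))
```

In a real calculation each generated tuple would be passed to an energy function; that step is omitted here because the abstract does not describe the force field.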

7.
The Homeodomain Resource is an annotated collection of non-redundant protein sequences, three-dimensional structures and genomic information for the homeodomain protein family. Release 3.0 contains 795 full-length homeodomain-containing sequences, 32 experimentally-derived structures and 143 homeo-box loci implicated in human genetic disorders. Entries are fully hyperlinked to facilitate easy retrieval of the original records from source databases. A simple search engine with a graphical user interface is provided to query the component databases and assemble customized data sets. A new feature for this release is the addition of DNA recognition sites for all human homeodomain proteins described in the literature. The Homeodomain Resource is freely available through the World Wide Web at http://genome.nhgri.nih.gov/homeodomain.

8.
PepLine is a fully automated software tool that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped onto the six-frame translations of genomic sequences (second module), giving hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component to allow the whole pipeline to proceed in a fully automated manner using raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana envelope chloroplast samples. Our results demonstrate that PepLine competed with protein database search software and was fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for the detection of the intron/exon structure of genes.
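
The second module's idea, mapping a sequence tag onto six-frame translations, can be illustrated with a minimal sketch. This is not PepLine's implementation: real PSTs carry flanking mass information, whereas the code below assumes a plain amino-acid substring match, uses the standard genetic code, and introduces the hypothetical helper names `six_frame_translations` and `find_tag`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame names (+1..+3, -1..-3) to their protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, template in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


def find_tag(dna, tag):
    """List the frames whose translation contains the amino-acid tag."""
    return [name for name, protein in six_frame_translations(dna).items()
            if tag in protein]


if __name__ == "__main__":
    print(six_frame_translations("ATGGCCAAGTAA"))
    print(find_tag("ATGGCCAAGTAA", "WP"))
```

Hits found this way would still need to be clustered by genomic position to suggest coding regions, which is the role the abstract assigns to the third module.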

9.
QGENE: software for marker-based genomic analysis and breeding   Cited 15 times in total (0 self-citations, 15 citations by others)
Efficient use of DNA markers for genomic research and crop improvement will depend as much on computational tools as on laboratory technology. The large size and multidimensional character of marker datasets invite novel approaches to data visualization. Described here is a software application embodying two design principles: conventional reduction of raw genetic marker data to numerical summary statistics, and fast, interactive graphical display of both data and statistics. The program performs various analyses for mapping quantitative-trait loci in real or simulated datasets and other analyses in aid of phenotypic and marker-assisted breeding. Functionality is described and some output is illustrated.

10.
MOTIVATION: MethylCoder is a software program that generates per-base methylation data given a set of bisulfite-treated reads. It provides the option to use either of two existing short-read aligners, each with different strengths. It accounts for soft-masked alignments and overlapping paired-end reads. MethylCoder outputs data in text and binary formats in addition to the final alignment in SAM format, so that common high-throughput sequencing tools can be used on the resulting output. It is more flexible than existing software and competitive in terms of speed and memory use. AVAILABILITY: MethylCoder requires only a Python interpreter and a C compiler to run. Extensive documentation and the full source code are available under the MIT license at: https://github.com/brentp/methylcode. CONTACT: bpederse@gmail.com.
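
As a rough illustration of what "per-base methylation data" means, the sketch below counts, at each reference cytosine, how many aligned reads still show C (protected from bisulfite conversion, so methylated) versus T (converted, so unmethylated). It is not MethylCoder's code: it assumes ungapped forward-strand alignments supplied as `(start, sequence)` pairs, ignores soft-masking and paired-end overlap, and the function name `per_base_methylation` is hypothetical.

```python
from collections import defaultdict


def per_base_methylation(reference, reads):
    """Return {position: methylated fraction} for covered reference cytosines.

    `reads` is an iterable of (start, sequence) tuples: 0-based, ungapped,
    forward-strand alignments. This input format is a simplifying assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [C count, T count]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue
            if base == "C":      # unconverted: evidence of methylation
                counts[pos][0] += 1
            elif base == "T":    # converted: evidence of no methylation
                counts[pos][1] += 1
    return {pos: c / (c + t) for pos, (c, t) in sorted(counts.items())}


if __name__ == "__main__":
    reference = "ACGTCA"
    reads = [(0, "ACGTTA"), (0, "ACGTTA"), (1, "CGTC")]
    print(per_base_methylation(reference, reads))
```

Reverse-strand reads would need the complementary G/A comparison, which is left out to keep the example short.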

11.
MOTIVATION: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes. RESULTS: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software. AVAILABILITY: The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/. CONTACT: Pierre.Rouze@gengenp.rug.ac.be.

12.

Background  

Repeat-rich regions such as centromeres receive less attention than their gene-rich euchromatic counterparts because the former are difficult to assemble and analyze. Our objectives were to 1) map all ten centromeres onto the maize genetic map and 2) characterize the sequence features of maize centromeres, each of which spans several megabases of highly repetitive DNA. Repetitive sequences can be mapped using special molecular markers that are based on PCR with primers designed from two unique "repeat junctions". Efficient screening of large amounts of maize genome sequence data for repeat junctions, as well as for key centromere sequence features, required the development of specific annotation software.

13.
Che D  Hasan MS  Wang H  Fazekas J  Huang J  Liu Q 《Bioinformation》2011,7(6):311-314
Genomic islands (GIs) are genomic regions that were originally transferred from other organisms. The detection of genomic islands in genomes can lead to many applications in industrial, medical and environmental contexts. Existing computational tools for GI detection suffer from either low recall or low precision, leaving room for improvement. In this paper, we report the development of our Ensemble algorithm for Genomic Island Detection (EGID). EGID takes the prediction results of existing computational tools, filters them and generates consensus predictions. Performance comparisons between our ensemble algorithm and existing programs have shown that our ensemble algorithm is better than any other program. EGID was implemented in Java, and was compiled and executed on Linux operating systems. EGID is freely available at http://www5.esu.edu/cpsc/bioinfo/software/EGID.
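
One simple way to build a consensus from several predictors is position-wise voting: keep the stretches of the genome that at least a chosen number of tools call an island. The abstract does not describe EGID's actual filtering or voting rules, so the sketch below is only an assumed illustration of the ensemble idea; `consensus_islands` and `min_votes` are hypothetical names.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def consensus_islands(predictions, min_votes=2):
    """Return regions predicted by at least `min_votes` tools.

    `predictions` holds one list of inclusive (start, end) intervals per tool.
    """
    events = []
    for tool_intervals in predictions:
        # Merge first so a single tool never votes twice for one position.
        for start, end in merge_intervals(tool_intervals):
            events.append((start, 1))       # coverage rises at start
            events.append((end + 1, -1))    # and falls just past the end
    events.sort()

    result, votes, open_start = [], 0, None
    for pos, delta in events:
        before = votes
        votes += delta
        if before < min_votes <= votes:
            open_start = pos
        elif votes < min_votes <= before:
            result.append((open_start, pos - 1))
    return merge_intervals(result)


if __name__ == "__main__":
    tool_a = [(100, 200)]
    tool_b = [(150, 250)]
    tool_c = [(400, 500)]
    print(consensus_islands([tool_a, tool_b, tool_c], min_votes=2))
```

Raising `min_votes` trades recall for precision, which is the tension between individual tools that the abstract points to.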

14.
The past decade has witnessed the construction of linkage and physical maps defining quantitative trait loci (QTL) in various domesticated species. Targeted chromosomal regions are being further characterized through the construction of bacterial artificial chromosome (BAC) contigs in order to isolate and characterize genes contributing towards phenotypic variation. Whole-genome BAC contigs are also being constructed that will serve as the tiling path for genomic sequencing. Harvesting this genetic information for biological gain requires either genetic selection or the production of genetically modified animals. This latter approach, when coupled with nuclear transfer technology (NT), provides "clones" of genetically modified animals. However, to date, the production of genetically modified animals has been limited to either microinjection of small gene constructs into embryos with random insertion or complex gene constructs designed to knock-out targeted gene expression. Neither of these approaches provides for introducing directed genetic manipulation allowing for allelic substitution [knock-in], subsequent analyses of gene expression, and cloning. An alternative approach utilizing genomic sequence information and recombineering to direct gene targeting of specific porcine BACs is presented here.

15.
We present a software system, BASIO, that allows one to segment a sequence into regions with homogeneous nucleotide composition at a desired length scale. The system can work with an arbitrary alphabet and therefore can be applied to various (e.g. protein) sequences. Several sequences of complete genomes of eukaryotes are used to demonstrate the efficiency of the software. AVAILABILITY: The BASIO suite is available for non-commercial users free of charge as a set of executables and accompanying segmentation scenarios from http://www.imb.ac.ru/compbio/basio. To obtain the source code, contact the authors.

16.
17.
Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it employs all genes present in an unknown sequence as classifiers, weighting each gene based on its (predetermined) classifying power at a given taxonomic level and frequency of horizontal gene transfer. MyTaxa also implements a novel classification scheme based on the genome-aggregate average amino acid identity concept to determine the degree of novelty of sequences representing uncharacterized taxa, i.e. whether they represent novel species, genera or phyla. Application of MyTaxa on in silico generated (mock) and real metagenomes of varied read length (100–2000 bp) revealed that it correctly classified at least 5% more sequences than any other tool. The analysis also showed that ∼10% of the assembled sequences from human gut metagenomes represent novel species with no sequenced representatives, several of which were highly abundant in situ such as members of the Prevotella genus. Thus, MyTaxa can find several important applications in microbial identification and diversity studies.
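
The idea of letting every gene on a sequence vote, with each vote weighted by that gene's classifying power, can be sketched as a weighted tally. The weights and the scoring rule below are placeholders chosen for illustration; the abstract does not give MyTaxa's actual weighting scheme, and `classify_by_weighted_votes` is a hypothetical name.

```python
from collections import defaultdict


def classify_by_weighted_votes(gene_hits):
    """Pick the taxon with the largest summed weight.

    `gene_hits` is an iterable of (taxon, weight) pairs, one per gene found on
    the query sequence. Returns (best_taxon, share_of_total_weight).
    """
    scores = defaultdict(float)
    for taxon, weight in gene_hits:
        scores[taxon] += weight
    if not scores:
        raise ValueError("no gene hits to classify")
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / total


if __name__ == "__main__":
    # Made-up weights: higher means the gene is assumed more informative.
    hits = [
        ("Prevotella", 0.9),
        ("Bacteroides", 0.4),
        ("Prevotella", 0.7),
        ("Bacteroides", 0.3),
    ]
    print(classify_by_weighted_votes(hits))
```

A low winning share could be treated as a signal that the sequence is ambiguous or novel, although the abstract attributes novelty detection to a separate scheme based on average amino acid identity.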

18.
Hasan MS  Liu Q  Wang H  Fazekas J  Chen B  Che D 《Bioinformation》2012,8(4):203-205
Genomic Islands (GIs) are genomic regions that originate from other organisms and are acquired through a process known as Horizontal Gene Transfer (HGT). Detection of GIs plays a significant role in biomedical research since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that combines the results of existing tools into an ensemble for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that collects the input genomes automatically from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. AVAILABILITY: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST.

19.
Xu H  Yang L  Xu P  Tao Y  Ma Z 《Proteomics》2007,7(2):177-179
cTrans is a comprehensive utility used to generate polypeptide databases from cDNA sequences. The goal is achieved by integrating four main functions: retrieving sequences of species of interest from packages downloaded from dbEST of GenBank, format conversion, checking for and deleting vector and adaptor contamination, and translating the cDNA sequences in all six frames and selecting specific translations for database construction according to a user-defined length threshold. In addition, this utility is also applicable to cDNA sequences produced by users themselves.

20.

Background  

We present Pegasys, a flexible, modular and customizable software system that facilitates the execution of, and data integration from, heterogeneous biological sequence analysis tools.
