Found 20 similar articles; search took 31 ms
1.
Background
The number of gene sequences that are available for comparative genomics approaches is increasing extremely quickly. A current challenge is to handle this huge number of sequences in order to build families of homologous sequences in a reasonable time.
2.
Kana Shimizu Yoichi Muraoka Shuichi Hirose Kentaro Tomii Tamotsu Noguchi 《BMC bioinformatics》2007,8(1):78
Background
Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. We know the structures of far more ordered proteins than disordered proteins. The structural distribution of proteins in nature can therefore be inferred to differ from that of proteins whose structures have been determined experimentally. We know many more protein sequences than we do protein structures, and many of the known sequences can be expected to be those of disordered proteins. Thus it would be efficient to use the information of structure-unknown proteins in order to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using a spectral graph transducer and training with a huge number of structure-unknown sequences as well as structure-known sequences.
3.
Background
Existing sequence alignment algorithms assume that similarities between DNA or amino acid sequences are linearly ordered. That is, stretches of similar nucleotides or amino acids are in the same order in both sequences. Recombination perturbs this order. An algorithm that can reconstruct sequence similarity despite rearrangement would be helpful for reconstructing the evolutionary history of recombined sequences.
4.
Background
Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data.
5.
6.
Background
Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows the use of in silico methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases.
7.
8.
Background
The cataloging of marine prokaryotic DNA sequences is fundamental to bioprospecting and to the development of evolutionary and speciation models. However, the large number of DNA sequences used to quantify prokaryotic biodiversity requires proper tools for storing, managing and analyzing these data for research purposes.
9.
MatGAT: An application that generates similarity/identity matrices using protein or DNA sequences (total citations: 1; self-citations: 0; citations by others: 1)
Background
The rapid increase in the amount of protein and DNA sequence information available has become almost overwhelming to researchers. So much information is now accessible that high-quality, functional gene analysis and categorization has become a major goal for many laboratories. To aid in this categorization, there is a need for non-commercial software that is able to both align sequences and also calculate pairwise levels of similarity/identity.
10.
Li Yu Dan Peng Jiang Liu Pengtao Luan Lu Liang Hang Lee Muyeong Lee Oliver A Ryder Yaping Zhang 《BMC evolutionary biology》2011,11(1):92
Background
Mustelidae, the largest and most diverse family of the order Carnivora, comprises eight subfamilies. Phylogenetic relationships among these Mustelidae subfamilies have remained contentious in recent years. One of the main reasons is that the mustelids represent a typical example of rapid evolutionary radiation and recent speciation events. Prior investigations have concentrated on the application of different mitochondrial (mt) sequence and nuclear protein-coding data; herein we employ 17 nuclear non-coding loci (>15 kb), in conjunction with mt complete genome data (>16 kb), to clarify these enigmatic problems.
11.
Philipp N Seibel Tobias Müller Thomas Dandekar Jörg Schultz Matthias Wolf 《BMC bioinformatics》2006,7(1):498-7
Background
In sequence analysis, the multiple alignment forms the foundation of all subsequent analyses. Errors in an alignment could strongly influence all succeeding analyses and therefore could lead to wrong predictions. Hand-crafted and hand-improved alignments are necessary and have become good common practice. For RNA sequences, often the primary sequence as well as a secondary structure consensus is well known, e.g., the cloverleaf structure of the tRNA. Recently, some alignment editors have been proposed that are able to include and model both kinds of information. However, with the advent of a large number of reliable RNA sequences together with their solved secondary structures (available from e.g. the ITS2 Database), we are faced with the problem of handling sequences and their associated secondary structures synchronously.
12.
Background
Viroids, satellite RNAs, satellite viruses and the human hepatitis delta virus form the 'brotherhood' of the smallest known infectious RNA agents, known as the subviral RNAs. For most of these species, it is generally accepted that characteristics such as cell movement, replication, host specificity and pathogenicity are encoded in their RNA sequences and their resulting RNA structures. Although many sequences are indexed in publicly available databases, these sequence annotation databases do not provide the advanced search and data manipulation capabilities needed for identifying and characterizing subviral RNA motifs.
13.
Background
The Internal Transcribed Spacer (ITS) regions of fungal ribosomal DNA (rDNA) are highly variable sequences of great importance in distinguishing fungal species by PCR analysis. Previously published PCR primers available for amplifying these sequences from environmental samples provide varying degrees of success at discriminating against plant DNA while maintaining a broad range of compatibility. Typically, it has been necessary to use multiple primer sets to accommodate the range of fungi under study, potentially creating artificial distinctions for fungal sequences that amplify with more than one primer set.
14.
Background
A necessary step for a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on the annotation of genome sequences, which involves two successive steps: the prediction of coding sequences (CDS) and their function assignment. The annotation process takes time. The available methods often encounter difficulties when dealing with unfinished, error-containing genomic sequences.
15.
Background
When several hundred or several thousand sequences, such as epidemic virus sequences or homologous/orthologous sequences of some large gene families, are aligned to reconstruct their epidemiological history or phylogenies, analyzing and visualizing the alignment results becomes a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences.
16.
Background
Recent, rapid growth in the quantity of available genomic data has generated many protein sequences that are not yet biochemically classified. Thus, the prediction of biochemical function based on structural motifs is an important task in post-genomic analysis. The InterPro databases are a major resource for protein function information. For optimal results, these databases should be searched at regular intervals, since they are frequently updated.
17.
Background
A number of the deeper divergences in the placental mammal tree are still inconclusively resolved despite extensive phylogenomic analyses. A recent analysis of 200 kbp of protein coding sequences yielded only limited support for the relationships among Laurasiatheria (cow, dog, bat and shrew), probably because the divergences occurred only within a few million years of each other. It is generally expected that increasing the amount of data and improving the taxon sampling enhance the resolution of narrow divergences. Therefore these and other difficult splits were examined by phylogenomic analysis of the hitherto largest sequence alignment. The increasingly complete genome data of placental mammals also allowed the development of a novel and stringent data search method.
18.
Maha Bouzid Kevin M Tyler Richard Christen Rachel M Chalmers Kristin Elwin Paul R Hunter 《BMC microbiology》2010,10(1):213
Background
Cryptosporidium is a protozoan parasite that causes diarrheal illness in a wide range of hosts including humans. Two species, C. parvum and C. hominis, are of primary public health relevance. Genome sequences of these two species are available and show only 3-5% sequence divergence. We investigated this sequence variability, which could correspond either to sequence gaps in the published genome sequences or to the presence of species-specific genes. Comparative genomic tools were used to identify putative species-specific genes and a subset of these genes was tested by PCR in a collection of Cryptosporidium clinical isolates and reference strains.
19.
Weizhong Li 《BMC bioinformatics》2009,10(1):359
Background
The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand.
20.