20 similar articles found.
1.
2.
An information-based sequence distance and its application to whole mitochondrial genome phylogeny (Cited by: 12 total; 0 self-citations; 12 by others)
Li M Badger JH Chen X Kwong S Kearney P Zhang H 《Bioinformatics (Oxford, England)》2001,17(2):149-154
MOTIVATION: Traditional sequence distances require an alignment and therefore are not directly applicable to the problem of whole genome phylogeny where events such as rearrangements make full length alignments impossible. We present a sequence distance that works on unaligned sequences using the information theoretical concept of Kolmogorov complexity and a program to estimate this distance. RESULTS: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals. AVAILABILITY: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
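As a rough illustration of an alignment-free, compression-based distance, the sketch below computes the normalized compression distance (NCD) with Python's general-purpose `zlib` compressor. This is a stand-in technique chosen for illustration: it is not the distance defined in the abstract, and `zlib` is not the GenCompress program the authors provide. The function names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    Used here as a crude, computable proxy for Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two unaligned sequences.

    Values near 0 suggest the sequences share most of their information;
    values near 1 suggest they share little.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # cost of describing both together
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A pairwise matrix of such distances over a set of genomes could then be passed to a standard tree-building method. A DNA-specific compressor would likely give better estimates than `zlib`, whose 32 KB window is small relative to whole genomes.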
3.
Ermakova EO Nurtdinov RN Gelfand MS 《Journal of bioinformatics and computational biology》2007,5(5):991-1004
Over 50% of donor splice sites in the human genome have a potential alternative donor site at a distance of three to six nucleotides. Conservation of these potential sites is determined by the consensus requirements and by their exonic or intronic location. Several hundred pairs of overlapping sites are confirmed to be alternatively spliced as both sites in a pair are supported by a protein, by a full-length mRNA, or by expressed sequence tags (ESTs) from at least two independent clone libraries. Overlapping sites may clash with consensus requirements. Pairs with a site shift of four nucleotides are the most abundant, despite the frameshift in the protein-coding region that they introduce. The site usage in pairs is usually uneven, and the major site is more frequently conserved in other mammalian genomes. Overlapping alternative donor sites and acceptor sites may have different functional roles: alternative splicing of overlapping acceptor sites leads mainly to microvariations in protein sequences; whereas alternative donor sites often lead to frameshifts and thus either yield major differences in the protein sequence and structure, or generate nonsense-mediated decay-inducing mRNA isoforms likely involved in regulated unproductive splicing pathways.
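The frameshift remark above follows from simple arithmetic: a splice-site shift keeps the downstream reading frame only when its length is a multiple of three. The short sketch below checks this for the three-to-six-nucleotide range mentioned in the abstract. The function name `frame_effect` is hypothetical, and the check ignores other effects such as a shifted segment introducing a stop codon.

```python
def frame_effect(shift_nt: int) -> str:
    """Classify a splice-site shift by its effect on the reading frame."""
    # Codons are three nucleotides long, so only multiples of 3 keep the frame.
    return "in-frame" if shift_nt % 3 == 0 else "frameshift"


# Shifts of 3 to 6 nucleotides, the range discussed in the abstract.
effects = {shift: frame_effect(shift) for shift in range(3, 7)}
for shift, effect in effects.items():
    print(f"shift of {shift} nt: {effect}")
```

Under this rule, shifts of three and six nucleotides add or remove whole codons, while shifts of four and five nucleotides change the frame.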
4.
Kang Yu Dongcheng Liu Wenying Wu Wenlong Yang Jiazhu Sun Xin Li Kehui Zhan Dangqun Cui Hongqing Ling Chunming Liu Aimin Zhang 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2017,130(1):53-70
Key message
An integrated genetic map was constructed for einkorn wheat A genome and provided valuable information for QTL mapping and genome sequence anchoring.
Abstract
Wheat is one of the most widely grown food grain crops in the world. The construction of a genetic map is a key step to organize biologically or agronomically important traits along the chromosomes. In the present study, an integrated linkage map of einkorn wheat was developed using 109 recombinant inbred lines (RILs) derived from an inter sub-specific cross, KT1-1 (T. monococcum ssp. boeoticum) × KT3-5 (T. monococcum ssp. monococcum). The map contains 926 molecular markers assigned to seven linkage groups, and covers 1,377 cM with an average marker interval of 1.5 cM. A quantitative trait locus (QTL) analysis of five agronomic traits identified 16 stable QTL on all seven chromosomes, except 6A. The total phenotypic variance explained by these stable QTL using multiple regressions varied across environments from 8.8 to 87.1% for days to heading, 24.4–63.0% for spike length, 48.2–79.6% for spikelet number per spike, 13.1–48.1% for plant architecture, and 12.2–26.5% for plant height, revealing that much of the RIL phenotypic variation had been genetically dissected. Co-localizations of closely linked QTL for different traits were frequently observed, especially on 3A and 7A. The QTL on 3A, 5A and 7A were closely associated with Eps-Am3, Vrn1 and Vrn3 loci, respectively. Furthermore, this genetic map facilitated the anchoring of 237 T. urartu scaffolds onto seven chromosomes with a physical length of 26.15 Mb. This map and the QTL data provide valuable genetic information to dissect important agronomic and developmental traits in diploid wheat and contribute to the genetic ordering of the genome assembly.
5.
Wallqvist A Fukunishi Y Murphy LR Fadel A Levy RM 《Bioinformatics (Oxford, England)》2000,16(11):988-1002
MOTIVATION: Sequence alignment techniques have been developed into extremely powerful tools for identifying the folding families and function of proteins in newly sequenced genomes. For a sufficiently low sequence identity it is necessary to incorporate additional structural information to positively detect homologous proteins. We have carried out an extensive analysis of the effectiveness of incorporating secondary structure information directly into the alignments for fold recognition and identification of distant protein homologs. A secondary structure similarity matrix based on a database of three-dimensionally aligned proteins was first constructed. An iterative application of dynamic programming was used which incorporates linear combinations of amino acid and secondary structure sequence similarity scores. Initially, only primary sequence information is used. Subsequently contributions from secondary structure are phased in and new homologous proteins are positively identified if their scores are consistent with the predetermined error rate. RESULTS: We used the SCOP40 database, where only PDB sequences that have 40% homology or less are included, to calibrate homology detection by the combined amino acid and secondary structure sequence alignments. Combining predicted secondary structure with sequence information results in an 8-15% increase in homology detection within SCOP40 relative to the pairwise alignments using only amino acid sequence data at an error rate of 0.01 errors per query; a 35% increase is observed when the actual secondary structure sequences are used. Incorporating predicted secondary structure information in the analysis of six small genomes yields an improvement in the homology detection of approximately 20% over SSEARCH pairwise alignments, but no improvement in the total number of homologs detected over PSI-BLAST, at an error rate of 0.01 errors per query.
However, because the pairwise alignments based on combinations of amino acid and secondary structure similarity are different from those produced by PSI-BLAST and the error rates can be calibrated, it is possible to combine the results of both searches. An additional 25% relative improvement in the number of genes identified at an error rate of 0.01 is observed when the data is pooled in this way. Similarly for the SCOP40 dataset, PSI-BLAST detected 15% of all possible homologs, whereas the pooled results increased the total number of homologs detected to 19%. These results are compared with recent reports of homology detection using sequence profiling methods. AVAILABILITY: Secondary structure alignment homepage at http://lutece.rutgers.edu/ssas CONTACT: anders@rutchem.rutgers.edu; ronlevy@lutece.rutgers.edu Supplementary Information: Genome sequence/structure alignment results at http://lutece.rutgers.edu/ss_fold_predictions.
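The idea of scoring an alignment with a linear combination of amino acid and secondary structure similarity can be sketched with a small global-alignment routine. Everything specific here is an assumption made for illustration: the match/mismatch scores of +1/-1 stand in for the substitution and secondary structure similarity matrices described in the abstract, the linear gap penalty is arbitrary, and the names `combined_score` and `align_score` are hypothetical. The error-rate calibration step is not reproduced.

```python
def combined_score(aa1, aa2, ss1, ss2, weight):
    """Linear combination of amino acid and secondary structure similarity.

    weight = 0.0 uses sequence only; weight = 1.0 uses structure only.
    The +1/-1 scores are placeholders, not real similarity matrices.
    """
    aa = 1.0 if aa1 == aa2 else -1.0
    ss = 1.0 if ss1 == ss2 else -1.0
    return (1.0 - weight) * aa + weight * ss


def align_score(seq_a, ss_a, seq_b, ss_b, weight=0.0, gap=-2.0):
    """Global alignment score by dynamic programming (Needleman-Wunsch style)."""
    n, m = len(seq_a), len(seq_b)
    prev = [j * gap for j in range(m + 1)]  # first row: leading gaps only
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + combined_score(
                seq_a[i - 1], seq_b[j - 1], ss_a[i - 1], ss_b[j - 1], weight
            )
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]


# Phase in the secondary structure contribution step by step.
for w in (0.0, 0.5, 1.0):
    print(w, align_score("AAAA", "HHHH", "GGGG", "HHHH", weight=w))
```

In this toy case the two sequences share no residues but have identical secondary structure strings, so the score rises as the structure weight increases, which mirrors how structural information can expose similarity that sequence alone misses.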
6.
7.
David J Griffiths 《Genome biology》2001,2(6):reviews1017.1-reviews1017.5
8.
Using comparative genomics to reorder the human genome sequence into a virtual sheep genome (Cited by: 3 total; 1 self-citation; 3 by others)
Dalrymple BP Kirkness EF Nefedov M McWilliam S Ratnakumar A Barris W Zhao S Shetty J Maddox JF O'Grady M Nicholas F Crawford AM Smith T de Jong PJ McEwan J Oddy VH Cockett NE;International Sheep Genomics Consortium 《Genome biology》2007,8(7):R152-20
Background
Is it possible to construct an accurate and detailed subgene-level map of a genome using bacterial artificial chromosome (BAC) end sequences, a sparse marker map, and the sequences of other genomes?
Results
A sheep BAC library, CHORI-243, was constructed and the BAC end sequences were determined and mapped with high sensitivity and low specificity onto the frameworks of the human, dog, and cow genomes. To maximize genome coverage, the coordinates of all BAC end sequence hits to the cow and dog genomes were also converted to the equivalent human genome coordinates. The 84,624 sheep BACs (about 5.4-fold genome coverage) with paired ends in the correct orientation (tail-to-tail) and spacing, combined with information from sheep BAC comparative genome contigs (CGCs) built separately on the dog and cow genomes, were used to construct 1,172 sheep BAC-CGCs, covering 91.2% of the human genome. Clustered non-tail-to-tail and outsize BACs located close to the ends of many BAC-CGCs linked BAC-CGCs covering about 70% of the genome to at least one other BAC-CGC on the same chromosome. Using the BAC-CGCs, the intrachromosomal and interchromosomal BAC-CGC linkage information, human/cow and vertebrate synteny, and the sheep marker map, a virtual sheep genome was constructed. To identify BACs potentially located in gaps between BAC-CGCs, an additional set of 55,668 sheep BACs were positioned on the sheep genome with lower confidence. A coordinate conversion process allowed us to transfer human genes and other genome features to the virtual sheep genome to display on a sheep genome browser.
Conclusion
We demonstrate that limited sequencing of BACs combined with positioning on a well assembled genome and integrating locations from other less well assembled genomes can yield extensive, detailed subgene-level maps of mammalian genomes, for which genomic resources are currently limited.
9.
Summary: The nucleotide sequence of the RNA of the bacteriophage MS2 was examined by computer for internal patterns. We used a technique which analyzes a nucleotide sequence as a Markov chain. This led us to discover patterns within the translated and untranslated regions of the RNA in addition to those patterns formed by the codons. One of the more surprising results of this analysis was the discovery that the non-coding sequences in the genome are as highly ordered, although in a different sense, as the genes themselves. Also of interest was the discovery that the codon frequency distributions for the three genes are similar.
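Treating a nucleotide sequence as a Markov chain can be illustrated by estimating transition probabilities between adjacent bases. The sketch below assumes a first-order chain over the RNA alphabet; the abstract does not state the chain order or the statistics the authors used, so this is only one plausible reading, and the function name `transition_probabilities` is hypothetical.

```python
from collections import Counter


def transition_probabilities(seq: str, alphabet: str = "ACGU") -> dict:
    """Estimate first-order Markov transition probabilities P(next | current).

    Returns a nested dict: probs[current][next]. Bases that never appear
    as a 'current' state (no outgoing transitions) are omitted.
    """
    pair_counts = Counter(zip(seq, seq[1:]))  # counts of adjacent base pairs
    probs = {}
    for a in alphabet:
        total = sum(pair_counts[(a, b)] for b in alphabet)
        if total:
            probs[a] = {b: pair_counts[(a, b)] / total for b in alphabet}
    return probs


# Toy example on a short RNA string (not real MS2 sequence).
for base, row in transition_probabilities("AAUAGCAU").items():
    print(base, row)
```

Comparing such estimated probabilities with those expected from base composition alone is one way to detect the kind of internal ordering the abstract describes, in both coding and non-coding regions.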
10.
Tsuge Y Suzuki N Ninomiya K Inui M Yukawa H 《Bioscience, biotechnology, and biochemistry》2007,71(7):1683-1690
A new functional Corynebacterium glutamicum insertion sequence (IS) element, IS13655, was isolated using a suicide vector. The IS element was 1,293 bp in size and contained 26-bp imperfect inverted repeats (IRs) and 3-bp target site duplication as direct repeats (DRs). IS13655 harbored two ORFs with high similarity to the transposase of IS1206, an IS3 family element. IS13655 revealed relatively high transposition efficiency, with low target site selectivity along the Corynebacterium glutamicum R genome, making it a potentially useful genetic engineering tool.
11.
Our study aimed to examine whether the human genome encodes previously unreported cysteine cathepsins. To this end, we used analyses of the genome sequence and mRNA expression levels. The program TBLASTN was employed to scan the draft sequence of the human genome for the 11 known cysteine cathepsins. The cathepsin-like segments in the genome were inspected, filtered, and annotated. In addition to the known cysteine cathepsins, the scan identified three pseudogenes, closely related to cathepsin L, on chromosome 10, as well as two remote homologs, tubulointerstitial protein antigen and tubulointerstitial protein antigen-related protein. No new members of the family were identified. mRNA expression profiles for 10 known human cysteine cathepsins showed varying expression levels in 46 different human tissues and cell lines. No expression of any of the three cathepsin L-like pseudogenes was found. Based on these results, it is likely that to date all human cysteine cathepsins are known.
12.
One theory formalised in 1970 proposes that the complexity of vertebrate genomes originated by means of genome duplication at the base of the vertebrate lineage. Since then, the theory has remained both popular and controversial. Here we review the theory, and present preliminary results from our analysis of duplications in the draft human genome sequence. We find evidence for extensive duplication of parts of the genome. We also question the validity of the 'parsimony test' that has been used in other analyses.
13.
Nucleotide sequence of human endogenous retrovirus genome related to the mouse mammary tumor virus genome. (Cited by: 30 total; 17 self-citations; 30 by others)
We determined the complete nucleotide sequence of the human endogenous retrovirus genome HERV-K10 isolated as the sequence homologous to the Syrian hamster intracisternal A-particle (type A retrovirus) genome. HERV-K10 is 9,179 base pairs long with long terminal repeats of 968 base pairs at both ends; a sequence 290 base pairs long, however, was found to be deleted. It was concluded that a composite genome having the 290-base-pair fragment is the prototype HERV-K provirus gag (666 codons), protease (334 codons), pol (937 codons), and env (618 codons) genes. The size of the protease gene product of HERV-K is essentially the same as that of A- and D-type oncoviruses but nearly twice that of other retroviruses. A comparison of the deduced amino acid sequences encoded by the pol region showed HERV-K to be closely related to types A and D retroviruses and even more so to type B retrovirus. It was noted that the env gene product of HERV-K structurally resembles the mouse mammary tumor virus (type B retrovirus) env protein, and the possible expression of the HERV-K env gene in human breast cancer cells is discussed.
14.
Once thought to be impossible or a waste of resources, the initial high-volume stages of sequencing the human genome have been completed.
15.
Song H Kim TY Choi BK Choi SJ Nielsen LK Chang HN Lee SY 《Applied microbiology and biotechnology》2008,79(2):263-272
This study presents a novel methodology for the development of a chemically defined medium (CDM) using a genome-scale metabolic network and flux balance analysis. The genome-based in silico analysis identified two amino acids and four vitamins as non-substitutable essential compounds to be supplemented to a minimal medium for the sustainable growth of Mannheimia succiniciproducens, while no substitutable essential compounds were identified. The in silico predictions were verified by cultivating the cells on a CDM containing the six non-substitutable essential compounds, and were further confirmed by the absence of cell growth on CDM lacking any one of the non-substitutable essentials. An optimal CDM for the enhancement of cell growth and succinic acid production, as a target product, was formulated with a single-addition technique. The fermentation on the optimal CDM increased the succinic acid productivity by 36%, the final succinic acid concentration by 17%, and the succinic acid yield on glucose by 15% compared to the cultivation using a complex medium. The optimal CDM also lowered the sum of the amounts of by-products (acetic, formic, and lactic acids) by 30%. The strategy reported in this paper should be generally applicable to the development of CDMs for other organisms whose genome sequences are available.
16.
The DNA sequence of the human cytomegalovirus genome. (Cited by: 14 total; 0 self-citations; 14 by others)
A T Bankier S Beck R Bohni C M Brown R Cerny M S Chee C A Hutchison T Kouzarides J A Martignetti E Preddie 《DNA sequence》1991,2(1):1-12
In the first part of this article we review what has been learnt from the analysis of the sequence of HCMV. A summary of this information is presented in the form of an updated map of the viral genome. HCMV is representative of a major lineage of herpesviruses distinct from previously sequenced members of this viral family and demonstrates striking differences in genetic content and organization. The virus encodes approximately 200 genes, including nine gene families, a large number of glycoprotein genes, and homologues of the human HLA class I and G protein-coupled receptor genes. The HCMV sequence thus provides a sound basis for future molecular studies of this highly complex eukaryotic virus. The second part discusses the practical rate of DNA sequencing as deduced from this and other studies. The 229 kilobase pair DNA genome of human cytomegalovirus (HCMV) strain AD169 is the largest contiguous sequence determined to date, and as such provides a realistic benchmark for assessing the practical rate of DNA sequencing as opposed to theoretical calculations which are usually much greater. The sequence was determined manually and we assess the impact of new developments in DNA sequencing.
17.
18.
19.
Human DNA was fractionated by centrifugation in Cs2SO4 density gradients containing 3,6-bis(acetatomercurimethyl)dioxane (BAMD). Fractions were investigated in their analytical CsCl profiles and a number of specific sequences were localized in them. The results so obtained led to an improved understanding of the organization of nucleotide sequences in the human genome, as well as to the discovery that a class of DNA having a very high G + C content and not represented in the mouse genome, is particularly rich in genes and interspersed repetitive sequences.