Similar Articles
20 similar articles found
1.
We present SequenceMatrix, software that is designed to facilitate the assembly and analysis of multi‐gene datasets. Genes are concatenated by dragging and dropping FASTA, NEXUS, or TNT files with aligned sequences into the program window. A multi‐gene dataset is concatenated and displayed in a spreadsheet; each sequence is represented by a cell that provides information on sequence length, number of indels, the number of ambiguous bases ("Ns"), and the availability of codon information. Alternatively, GenBank numbers for the sequences can be displayed and exported. Matrices with hundreds of genes and taxa can be concatenated within minutes and exported in TNT, NEXUS, or PHYLIP formats, preserving both character set and codon information for TNT and NEXUS files. SequenceMatrix also creates taxon sets listing taxa with a minimum number of characters or gene fragments, which helps assess preliminary datasets. Entire taxa, whole gene fragments, or individual sequences for a particular gene and species can be excluded from export. Data matrices can be re‐split into their component genes and the gene fragments can be exported as individual gene files. SequenceMatrix also includes two tools that help to identify sequences that may have been compromised through laboratory contamination or data management error. One tool lists identical or near‐identical sequences within genes, while the other compares the pairwise distance pattern of one gene against the pattern for all remaining genes combined. SequenceMatrix is Java‐based and compatible with the Microsoft Windows, Apple MacOS X and Linux operating systems. The software is freely available from http://code.google.com/p/sequencematrix/ . © The Willi Hennig Society 2010.
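The concatenation step described above can be sketched as follows. This is a minimal illustration, not SequenceMatrix's own code: it assumes each gene is already aligned and supplied as a Python dict of taxon → sequence, pads taxa missing from a gene with `?`, and records each gene's span as a 1‐based NEXUS `CHARSET`. The function names `concatenate` and `to_nexus` are hypothetical.

```python
def concatenate(genes, missing="?"):
    """Concatenate per-gene alignments into one matrix (illustrative sketch).

    genes: dict gene name -> dict taxon -> aligned sequence.
    Returns (matrix, charsets): matrix maps taxon -> concatenated sequence;
    charsets maps gene name -> (start, end), 1-based and inclusive.
    """
    taxa = sorted({t for aln in genes.values() for t in aln})
    parts = {t: [] for t in taxa}
    charsets, pos = {}, 1
    for name, aln in genes.items():
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {name!r} is not a single-length alignment")
        length = lengths.pop()
        for t in taxa:
            # Taxa absent from this gene are padded with missing-data symbols.
            parts[t].append(aln.get(t, missing * length))
        charsets[name] = (pos, pos + length - 1)
        pos += length
    return {t: "".join(p) for t, p in parts.items()}, charsets


def to_nexus(matrix, charsets):
    """Render the matrix and character sets as a simple NEXUS string."""
    nchar = len(next(iter(matrix.values())))
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={nchar};",
        "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "  MATRIX",
    ]
    lines += [f"    {taxon} {seq}" for taxon, seq in matrix.items()]
    lines += ["  ;", "END;", "BEGIN SETS;"]
    lines += [f"  CHARSET {g} = {a}-{b};" for g, (a, b) in charsets.items()]
    lines.append("END;")
    return "\n".join(lines)
```

Taxon names containing spaces or punctuation would need quoting in real NEXUS output; the sketch does not handle that.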

2.
This study evaluates the phylogeny of ray‐finned fishes (Actinopterygii) by combining most of the available information (44 markers from nuclear and mitochondrial DNA and 274 morphological characters). The molecular partition of the dataset was produced through a pipeline (GB‐to‐TNT) that allows large matrices to be built quickly from GenBank‐format files. The analysed dataset has 8104 species, including representatives of all orders and 95% of the 475 families of Actinopterygii, making it the most diverse phylogenetic dataset analysed to date for this clade of fishes. The morphological characters analysed are features historically considered diagnostic for families or orders that can be unequivocally coded from the literature. Analyses use parsimony under several weighting schemes. General results agree with previous classifications, especially for groups with better gene sampling and those long thought (from morphological evidence) to be monophyletic. Many clades have low support, and some orders are not recovered as monophyletic. Additional data and synthetic studies of homology are needed to obtain synapomorphies and diagnoses for most clades.
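The parsimony criterion used in these analyses can be illustrated with Fitch's algorithm, which counts the minimum number of state changes an unordered character requires on a given binary tree. The sketch below is a generic textbook version, not TNT's implementation; the nested‐tuple tree format and the per‐character `weights` list are assumptions, and TNT's implied weighting is not reproduced.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one unordered character (Fitch's algorithm).

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping each leaf name to its character state.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # leaf: its state set is just its state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children can agree: no change needed
            return common
        changes += 1                       # disjoint state sets cost one change
        return left | right

    visit(tree)
    return changes


def weighted_length(tree, matrix, weights):
    """Tree length summed over characters, each multiplied by a fixed weight.

    matrix: dict leaf -> sequence of states (one state per character).
    weights: one numeric weight per character (hypothetical fixed weights).
    """
    return sum(
        w * fitch_length(tree, {leaf: seq[i] for leaf, seq in matrix.items()})
        for i, w in enumerate(weights)
    )
```

For example, on the tree `(("A", "B"), ("C", "D"))` a character with states 0, 0, 1, 1 costs one change, while 0, 1, 1, 0 costs two.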

3.
4.
5.
Phylogenetic studies of ciliates are mainly based on primary sequence information from nuclear genes. Some regions of the small subunit ribosomal RNA (SSU‐rRNA) gene have distinctive secondary structures, which have demonstrated value as phylogenetic/taxonomic characters. In the current work, we predict the secondary structures of four variable regions (V2, V4, V7 and V9) in the SSU‐rRNA gene of 45 urostylids. Structure comparisons indicate that the V4 region is the most effective in revealing interspecific relationships, while the V9 region appears suitable at the family level or higher. The V2 region also offers some taxonomic information but is too conserved to reflect phylogenetic relationships at the family level or below, at least for urostylids. The V7 region is the least informative. We constructed several phylogenetic trees, based both on the primary sequence alignment and on an alignment refined according to the secondary structures. The results suggest that including secondary structure information in phylogenetic analyses provides additional insights into phylogenetic relationships. Using urostylid ciliates as an example, we show that secondary structure information yields a better understanding of their relationships, for example the generic relationships within the family Pseudokeronopsidae.
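Predicted secondary structures such as these are commonly written in dot‐bracket notation, where matching parentheses mark paired bases. As a hedged sketch (not the authors' workflow), the code below recovers base pairs from such a string and labels each column as stem or loop, the kind of annotation that can guide a structure‐aware alignment. The function names and example structures are hypothetical, and pseudoknots (extra bracket types) are not handled.

```python
def dot_bracket_pairs(structure):
    """Return 0-based base pairs (i, j) encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)                      # open a pair
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))       # close the most recent open
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def helix_mask(structure):
    """Per-column labels: 'S' for paired (stem) positions, 'L' for unpaired (loop)."""
    paired = {k for pair in dot_bracket_pairs(structure) for k in pair}
    return "".join("S" if i in paired else "L" for i in range(len(structure)))
```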

6.
Every protein destined to receive the glycophosphatidylinositol (GPI) anchor post‐translational modification has a C‐terminal GPI‐anchor attachment signal sequence. This signal peptide varies in length, composition, and hydrophobicity. Apart from predictions based on an upstream amino acid triplet termed ω→ω + 2, which designates the site of GPI attachment, there is no information on how the efficiencies of different native signal sequences compare in the transamidation reaction that substitutes the GPI anchor for the C‐terminal peptide. In this study we used the placental alkaline phosphatase (PLAP) minigene, miniPLAP, and replaced its native 3′ end sequence, encoding ω‐2 to the C‐terminus, with the corresponding C‐terminal sequences of nine other human GPI‐anchored proteins. The resulting chimeras were then introduced into an in vitro microsomal processing system, in which the cleavages converting the nascent preproprotein into mature product could be followed by immunoprecipitation and SDS–PAGE. The results showed that the native signals differed markedly in transamidation efficiency: the signals of three proteins outperformed the others in GPI‐anchor addition, while those of two proteins were poorer substrates for the GPI transamidase. The data additionally indicated that the hierarchical order of transamidation efficiency did not depend solely on the combination of permissible residues at ω→ω + 2. J. Cell. Biochem. 84: 68–83, 2002. © 2001 Wiley‐Liss, Inc.

7.
This study examines the utility of morphology and DNA barcoding for species identification of freshwater fishes from north‐central Nigeria. We compared molecular data (mitochondrial cytochrome c oxidase subunit I (COI) sequences) from 136 de novo samples of 53 morphologically identified species with sequences in the GenBank and BOLD databases. Using a DNA sequence similarity‐based identification technique (≥97% cutoff), 50 (94.30%) and 24 (45.30%) species were identified to species level using GenBank and BOLD, respectively. Furthermore, we identified taxonomic problems in 26 (49.00%) of the morphologically identified species. There were also four (7.10%) cases of DNA barcoding mismatch, in which our query sequence matched sequences with different species names in GenBank and BOLD. Using DNA barcode reference data, we also identified four unknown fish samples collected from fishermen to species level. Our Neighbor‐joining (NJ) tree analysis recovers several clusters of conspecific sequences with strong bootstrap support (≥95%). The analysis uncovers two well‐supported lineages within Schilbe intermedius. Bayesian phylogenetic analyses of Nigerian S. intermedius with other sequences from GenBank recover four lineages. Evidence of genetic structuring is consistent with geographic regions of sub‐Saharan Africa. Thus, cryptic lineage diversity may reflect species' adaptive responses to local environmental conditions. Finally, our study underscores the importance of combining morphology and DNA barcoding in species identification. Although a complete DNA barcode reference library for Nigerian ichthyofauna will facilitate species identification and diversity studies, taxonomic revision of DNA sequences submitted to databases, alongside voucher specimens, is necessary for a reliable taxonomic and diversity inventory.
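The similarity‐based identification rule (assign the best‐matching reference species only if identity reaches 97%) can be sketched as below. This is a simplified illustration under stated assumptions: the query and reference COI sequences are already aligned to equal length, identity is computed over columns where neither sequence has a gap, and ambiguous bases simply count as mismatches. It is not the search method GenBank or BOLD actually use (those rely on their own similarity search engines), and the function names are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(x == y for x, y in compared) / len(compared)


def identify(query, references, cutoff=97.0):
    """Best-hit identification with a similarity cutoff.

    references: list of (species name, aligned sequence).
    Returns (species, identity) for the best hit if it reaches the cutoff,
    otherwise (None, identity of the best hit).
    """
    best_species, best_id = None, 0.0
    for species, seq in references:
        pid = percent_identity(query, seq)
        if pid > best_id:
            best_species, best_id = species, pid
    return (best_species, best_id) if best_id >= cutoff else (None, best_id)
```

With a 40‐nt query, one mismatch gives 97.5% identity (identified) and two mismatches give 95.0% (left unidentified).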

8.
9.
Sampling oribatid mites over large areas using conventional methods is expensive and time‐consuming, which constrains their use in environmental monitoring programs. We used samples collected in 38 plots of 3.75 ha spread over 30,000 ha in an Amazonian savanna to evaluate the reduction in costs and person‐hours for sampling and sorting, and to develop cost‐effective protocols. Ten samples per plot were collected and extracted using a Berlese‐Tullgren apparatus. In the laboratory, samples were reduced to 50, 25, 12.5, and 6.25 percent of the initial content. Field‐effort reduction was estimated by reducing the number of subsamples per plot. Dissimilarity matrices were generated using the Bray–Curtis, Sørensen, and Chao–Sørensen indices. Correlations between each reduced‐effort dissimilarity matrix and the matrix from 100 or 50 percent sorting were used as an index of how much information was retained under reduced‐effort sampling and could still be used in multivariate analyses. The effects of most predictor variables on mite composition were detected at every level of sample reduction. The intensive sampling was insufficient to reveal the full oribatid‐mite fauna of the savanna; as more plots were sampled, more species were recorded. Our data support subsampling protocols for biodiversity assessment of oribatid mites in savanna that increase field and laboratory efficiency while optimizing both the taxonomic and ecological aspects of the investigation.
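The comparison of full‐effort and reduced‐effort dissimilarity matrices can be sketched as follows. This is a minimal illustration, assuming each plot is a list of species abundances in a fixed species order; it computes Bray–Curtis (abundance) and Sørensen (presence/absence) dissimilarities between plots and a plain Pearson correlation between the upper triangles of two matrices. Chao–Sørensen is omitted, the function names are hypothetical, and significance would need a permutation (Mantel) test, which is not shown.

```python
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(x) + sum(y)
    return sum(abs(a - b) for a, b in zip(x, y)) / total if total else 0.0


def sorensen(x, y):
    """Sorensen dissimilarity (b + c) / (2a + b + c) on presence/absence."""
    sx = {i for i, v in enumerate(x) if v > 0}
    sy = {i for i, v in enumerate(y) if v > 0}
    a, b, c = len(sx & sy), len(sx - sy), len(sy - sx)
    denom = 2 * a + b + c
    return (b + c) / denom if denom else 0.0


def upper_triangle(plots, metric):
    """Flattened upper triangle of the plot-by-plot dissimilarity matrix."""
    return [metric(plots[i], plots[j]) for i, j in combinations(range(len(plots)), 2)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def retained_information(full, reduced, metric=bray_curtis):
    """Correlation between full-effort and reduced-effort dissimilarities."""
    return pearson(upper_triangle(full, metric), upper_triangle(reduced, metric))
```

Because Bray–Curtis is unchanged when every count is scaled by the same factor, a perfectly proportional subsample would retain a correlation of 1; real subsamples lose rare species and fall below that.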

10.
Proteomics approaches using MS in combination with affinity purification have emerged as powerful tools for studying protein‐protein interactions. Here we exploit the specificity of the sortase A transpeptidation reaction to prepare affinity matrices in which a protein bait is covalently linked to the matrix via a short C‐terminal linker region. Because of this site‐directed immobilization, the bait remains functionally accessible to protein interactions. We applied this approach in SILAC‐based pull‐down experiments, which demonstrate its suitability.

11.
Version 1.5 of the computer program TNT completely integrates landmark data into phylogenetic analysis. Landmark data consist of coordinates (in two or three dimensions) for the terminal taxa; TNT reconstructs shapes for the internal nodes such that the differences between ancestor and descendant shapes, summed over all tree branches, are minimized; this sum is used as the tree score. Landmark data can be analysed alone or in combination with standard characters; all applicable commands and options in TNT can be used transparently after reading a landmark data set. The program continues to implement all the types of analyses in former versions, including discrete and continuous characters (which can now be read at any scale and are automatically rescaled by TNT). Using the algorithms described in this paper, searches for landmark data can be made tens to hundreds of times faster than was previously possible (from T to 3T times faster, where T is the number of taxa), making phylogenetic analysis of landmarks feasible even on standard personal computers.
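The tree score described above (ancestor‐to‐descendant shape differences summed over all branches) can be sketched for the case where every node's landmark configuration is already known. This is an assumption‐laden illustration, not TNT's algorithm: it takes the difference for each branch to be the sum of Euclidean distances between corresponding landmarks, assumes the configurations are already superimposed, and does not perform the optimization of internal‐node shapes that TNT carries out. Names are hypothetical.

```python
import math


def tree_score(edges, shapes):
    """Sum over branches of landmark displacements between parent and child.

    edges: iterable of (parent, child) node names.
    shapes: dict node -> list of landmark coordinates (2-D or 3-D tuples),
            with landmarks listed in the same order for every node.
    """
    score = 0.0
    for parent, child in edges:
        a, b = shapes[parent], shapes[child]
        if len(a) != len(b):
            raise ValueError(f"{parent} and {child} have different landmark counts")
        # Each landmark contributes the straight-line distance it moves on this branch.
        score += sum(math.dist(p, q) for p, q in zip(a, b))
    return score
```

For a root at landmarks (0, 0), (1, 1) with children at (3, 4), (1, 1) and (0, 1), (1, 1), the score is 5 + 0 + 1 + 0 = 6.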

12.
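Determining, from a scientific name, a taxonomy ID, or any nucleic acid or protein sequence, which species it belongs to and what its detailed taxonomic classification is, is one of the most basic and important steps in bioinformatic analysis. However, both the analysis and the retrieval of results are performed manually, which is time‐consuming, laborious, and error‐prone. This study addresses how to retrieve species information from the NCBI website automatically or in batches. By analysing NCBI online BLAST results and the characteristics of the page source, we wrote automation scripts in Perl that retrieve taxonomic information for query or alignment results in batches. The Perl scripts developed here solve the problem of retrieving taxonomic information automatically or in batches after sequences are aligned online at NCBI. They are applicable to scientific names, taxonomy IDs, and arbitrary nucleic acid or protein sequences of bacteria, fungi, animals, plants, and other organisms, and can serve as a reference for colleagues analysing biological data.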
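Batch retrieval of taxonomic information can also be sketched with NCBI's E‐utilities rather than by parsing BLAST web pages as the authors' Perl scripts do; this is a swapped‐in technique, written in Python, and not the method described above. The sketch only builds an EFetch URL for the `taxonomy` database and parses the returned XML (`TaxaSet/Taxon` records with `LineageEx` ancestors); the function names are hypothetical and the network request itself is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_taxonomy_url(taxids):
    """URL requesting XML records for several taxonomy IDs in one batch."""
    query = {"db": "taxonomy", "id": ",".join(map(str, taxids)), "retmode": "xml"}
    return EFETCH + "?" + urlencode(query)


def parse_taxonomy(xml_text):
    """Extract name, rank and ranked lineage from a TaxaSet XML document."""
    records = []
    for taxon in ET.fromstring(xml_text).findall("Taxon"):  # top-level records only
        lineage = {
            node.findtext("Rank"): node.findtext("ScientificName")
            for node in taxon.iterfind("LineageEx/Taxon")
            # Skip unranked ancestors so the dict keys are real ranks.
            if node.findtext("Rank") not in (None, "no rank", "clade")
        }
        records.append({
            "taxid": taxon.findtext("TaxId"),
            "name": taxon.findtext("ScientificName"),
            "rank": taxon.findtext("Rank"),
            "lineage": lineage,
        })
    return records
```

In use, the URL would be fetched (for example with `urllib.request.urlopen`) and the response body passed to `parse_taxonomy`; NCBI limits request rates, so large batches should be throttled.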

13.
A widely used algorithm for computing an optimal local alignment between two sequences requires a parameter set with a substitution matrix and gap penalties. It is recognized that a proper parameter set should be selected to suit the level of conservation between sequences. We describe an algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences. In the algorithm, a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances. The evolutionary distance of the selected substitution matrix is defined as the distance of the computed alignment. To show the effects of gap penalties on alignments and their distances and help select appropriate gap penalties, alignments and their distances are computed at various gap penalties. The algorithm has been implemented as a computer program named SimDist. The SimDist program was compared with an existing local alignment program named SIM for finding reciprocally best-matching pairs (RBPs) of sequences in each of 100 protein families, where RBPs are commonly used as an operational definition of orthologous sequences. SimDist produced more accurate results than SIM on 50 of the 100 families, whereas both programs produced the same results on the other 50 families. SimDist was also used to compare three types of substitution matrices in scoring 444,461 pairs of homologous sequences from the 100 families.
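The matrix‐selection idea (score the same pair under matrices for several evolutionary distances and keep the one giving the highest local alignment score) can be sketched as follows. This is not SimDist: it uses Smith–Waterman with a single linear gap penalty instead of separate gap‐open/extension penalties, and the "matrices" are hypothetical toy match/mismatch scorers keyed by distance. Real PAM‐style matrices must be on a comparable scale for the raw scores to be compared meaningfully.

```python
def sw_score(a, b, score, gap):
    """Smith-Waterman local alignment score with a linear gap penalty (gap > 0)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,                                   # start a new local alignment
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # (mis)match
                prev[j] - gap,                       # gap in b
                cur[j - 1] - gap,                    # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


def toy_matrix(match, mismatch):
    """Hypothetical stand-in for a substitution matrix: one match/mismatch score."""
    return lambda x, y: match if x == y else mismatch


def select_matrix(a, b, matrices, gap):
    """Return (distance, score) of the matrix giving the highest local score.

    matrices: dict evolutionary distance -> scoring function.
    """
    return max(
        ((d, sw_score(a, b, m, gap)) for d, m in matrices.items()),
        key=lambda item: item[1],
    )
```

Following the abstract, the distance of the selected matrix would then be reported as the distance of the alignment.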

14.
MFG‐E8 was initially identified as a principal component of the Milk Fat Globule, a membrane‐encased collection of proteins and triglycerides that buds from the apical surface of mammary epithelia during lactation. It has since been independently identified in many species by many investigators and given a variety of names, including p47, lactadherin, rAGS, PAS6/7, and BA‐46. The acronym SED1 was proposed to bring cohesion to this nomenclature, based on its being a Secreted protein that contains two distinct functional domains: an N‐terminal domain with two EGF repeats, the second of which has an integrin‐binding RGD motif, and a C‐terminal domain with two Discoidin/F5/8C domains that bind to anionic phospholipids and/or extracellular matrices. SED1/MFG‐E8 is now known to participate in a wide variety of cellular interactions, including phagocytosis of apoptotic lymphocytes and other apoptotic cells, adhesion between sperm and the egg coat, repair of intestinal mucosa, mammary gland branching morphogenesis, and angiogenesis, among others. This article explores the various roles proposed for SED1/MFG‐E8, as well as its provocative therapeutic potential. J. Cell. Biochem. 106: 957–966, 2009. © 2009 Wiley‐Liss, Inc.

15.
The typical wet‐lab user often annotates smaller sequences in the GenBank format, but the resulting files are not accepted for database submission by NCBI. This makes submitting such annotations cumbersome. Here we present "GB2sequin", an easy-to-use web application that converts custom annotations in the GenBank format into the NCBI direct submission format Sequin. Additionally, the program generates a "five-column, tab-delimited feature table" and a FASTA file, which are required for submission through BankIt or for updating an existing GenBank entry. We developed "GB2sequin" specifically for the regular wet‐lab researcher, with a strong focus on user‐friendliness and flexibility. The application is equipped with an intuitive graphical interface and comprehensive documentation. It can be used to prepare any GenBank file for database submission and is freely available online at https://chlorobox.mpimp-golm.mpg.de/GenBank2Sequin.html.
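The "five‐column, tab‐delimited feature table" mentioned above starts with a `>Feature SeqId` line; each feature then occupies a line with start, stop and feature key (columns 1 to 3), followed by qualifier lines whose name and value sit in columns 4 and 5. The sketch below writes that layout from a simple list of features. It is an illustration, not GB2sequin's code: the input structure and function name are hypothetical, and multi‐interval (joined) features and partial‐end markers are not handled.

```python
def feature_table(seq_id, features):
    """Render features as an NCBI-style five-column feature table.

    features: list of (start, stop, key, [(qualifier, value), ...]) with
    1-based coordinates; start > stop would denote the minus strand.
    """
    lines = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        lines.append(f"{start}\t{stop}\t{key}")          # columns 1-3
        for qualifier, value in qualifiers:
            lines.append(f"\t\t\t{qualifier}\t{value}")  # columns 4-5
    return "\n".join(lines) + "\n"
```

For example, a gene and its CDS spanning positions 1 to 300 become two feature lines, each followed by its `gene` or `product` qualifier line.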

16.
Clostridium histolyticum collagenase causes extensive degradation of collagen in connective tissue, resulting in gas gangrene. The C‐terminal collagen‐binding domain (CBD) of these enzymes is the minimal segment required for binding to a collagen fibril. CBD binds unidirectionally to the undertwisted C‐terminus of triple‐helical collagen. Here, we examine whether CBD can also target undertwisted regions in the middle of the triple helix. Collagenous peptides with an additional undertwisted region were synthesized by introducing a Gly → Ala substitution ([(POG)xPOA(POG)y]3, where x + y = 9 and x > 3). ¹H–¹⁵N heteronuclear single quantum coherence nuclear magnetic resonance (HSQC NMR) titration studies with ¹⁵N‐labeled CBD demonstrated that the minicollagen binds to a cleft 10 Å wide and 25 Å long. Six collagenous peptides, each labeled with a nitroxide radical, were then titrated with ¹⁵N‐labeled CBD. CBD binds either to the Gly → Ala substitution site or to the C‐terminus of each minicollagen. Small‐angle X‐ray scattering measurements revealed that CBD prefers to bind the Gly → Ala site over the C‐terminus. The HSQC NMR spectra of ¹⁵N‐labeled minicollagen and of minicollagen with undertwisted regions were unaffected by titration with unlabeled CBD. The results imply that CBD binds to the undertwisted region of the minicollagen but does not actively unwind the triple helix.

17.
A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that much of the information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.
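Ranking linked records with PageRank can be sketched with the standard power iteration. The review does not specify how the link graph is built, so the node labels below (a specimen code linking to sequences, which link to a publication) are hypothetical; records cited by many others accumulate higher scores. Dangling nodes (no outgoing links) have their score redistributed uniformly, one common convention.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links: dict node -> list of nodes it links to.
    Returns dict node -> score; scores sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Score held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank
```

For instance, with `{"specimen:X": ["seq:A", "seq:B"], "seq:A": ["paper:P"], "seq:B": ["paper:P"]}`, the publication node receives the highest score (about 0.47).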

18.
Y.-P. Tian, X.-P. Zhu, J.-L. Liu, X.-Q. Yu, J. Du, J. Kreuze, X.-D. Li. Journal of Phytopathology, 2007, 155(6): 333–341
Turnip mosaic virus (TuMV; genus Potyvirus, family Potyviridae) causes great losses to cruciferous crop production worldwide. The 3′‐terminal genomic sequences of eight TuMV isolates from eastern China were compared with those of 74 other Chinese TuMV isolates of known host origin that are available in GenBank and were isolated during the past 25 years. The reported sequences of the eight TuMV isolates are 1125 or 1126 nucleotides (nt) long, excluding the poly(A) tail. They all contain one partial open reading frame of 912 nt, encoding 304 amino acids, followed by a stop codon and a non‐translated region of 209–210 nt. Phylogenetic analyses showed that Chinese TuMV isolates clustered into three groups: basal‐BR, Asian‐BR and world‐B. The ratios of non‐synonymous to synonymous substitutions and the amino acid alignment provided evidence for purifying (negative) selection in TuMV populations in China.
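The non‐synonymous/synonymous ratio used as evidence of purifying selection can be approximated with a simplified Nei–Gojobori‐style count, sketched below. Several simplifications are assumptions of this sketch, not of the study: stop‐codon mutations count as non‐synonymous when computing sites, codons differing at more than one position are skipped rather than averaged over pathways, and no Jukes–Cantor correction is applied, so the result is a pN/pS proportion ratio rather than a corrected dN/dS. A ratio well below 1 is consistent with purifying selection. Function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}  # standard code


def syn_sites(codon):
    """Synonymous sites: fraction of the 3 alternatives per position that keep the amino acid."""
    aa, s = CODE[codon], 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i] and CODE[codon[:i] + b + codon[i + 1:]] == aa:
                s += 1 / 3
    return s


def ng_counts(seq1, seq2):
    """Return (S, N, Sd, Nd): synonymous/non-synonymous sites and differences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3].upper(), seq2[k:k + 3].upper()
        # Skip codons with gaps/ambiguities and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        if sum(x != y for x, y in zip(c1, c2)) == 1:  # single-difference codons only
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
    return S, N, Sd, Nd


def pn_ps(seq1, seq2):
    """Uncorrected pN/pS; NaN when no synonymous differences are observed."""
    S, N, Sd, Nd = ng_counts(seq1, seq2)
    pS = Sd / S if S else 0.0
    return (Nd / N) / pS if pS > 0 else float("nan")
```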

19.
Multigene and genomic data sets have become commonplace in phylogenetics, but many existing tools are not designed for such data sets, which often makes analysis time‐consuming and tedious. Here, we present PhyloSuite, a user‐friendly, cross‐platform, open‐source, stand‐alone desktop workflow platform with a Python graphical user interface, dedicated to streamlining molecular sequence data management and evolutionary phylogenetics studies. It uses a plugin‐based system that integrates several phylogenetic and bioinformatic tools, thereby streamlining the entire procedure from data acquisition to phylogenetic tree annotation (in combination with iTOL). It has the following features: (a) a point‐and‐click and drag‐and‐drop graphical user interface; (b) a workplace to manage and organize molecular sequence data and analysis results; (c) GenBank entry extraction and comparative statistics; and (d) a phylogenetic workflow with batch processing capability, comprising sequence alignment (mafft and macse), alignment optimization (trimAl, HmmCleaner and Gblocks), data set concatenation, selection of the best partitioning scheme and best evolutionary model (PartitionFinder and modelfinder), and phylogenetic inference (MrBayes and iq‐tree). PhyloSuite is designed for both beginners and experienced researchers, allowing the former to get started quickly with phylogenetic analysis and the latter to conduct, store and manage their work in a streamlined way, spending more time investigating scientific questions instead of transferring files from one software program to another.

20.
Tandem MS (MS2) quantification using the series of N‐ and C‐terminal fragment ion pairs generated from isobaric‐labelled peptides has recently been regarded as an accurate strategy in quantitative proteomics. However, the presence of multiplexed terminal fragment ions in MS2 spectra may reduce the efficiency of peptide identification, resulting in lower identification scores or even incorrect assignments. To address this issue, we developed a quantitative software tool, isobaric tandem MS quantification (ITMSQ), to improve isobaric MS2 quantification based on N‐ and C‐terminal fragment ion pairs. A spectrum splitting module was designed to separate the MS2 spectra from different samples, increasing the accuracy of both identification and quantification. ITMSQ offers a convenient interface through which parameters can be changed along with the labelling method, and the result files and all intermediate files can be exported. We analysed HeLa samples labelled by in vivo terminal amino acid labelling and found that the numbers of quantified proteins and peptides increased by 13.64% and 27.52%, respectively, after spectrum splitting. In conclusion, ITMSQ provides an accurate and reliable quantitative solution for isobaric MS2 quantitative methods based on N‐ and C‐terminal fragment ion pairs.
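The idea of splitting a multiplexed MS2 spectrum into per‐sample peak lists can be sketched as pairing fragment peaks that are separated by a fixed mass offset between two label channels. This is a hypothetical illustration, not ITMSQ's algorithm: it assumes a two‐plex design, singly charged fragments, a single known offset `delta` (in Da) between the channels' fragment ions, and an absolute tolerance; the light‐channel peak of each pair goes to one sample, the heavy one to the other, and the median intensity ratio summarizes the quantification.

```python
import bisect
from statistics import median


def pair_fragment_ions(peaks, delta, tol=0.01):
    """Pair peaks (mz, intensity) whose m/z differ by delta within tol (Da)."""
    peaks = sorted(peaks)
    mzs = [mz for mz, _ in peaks]
    used, pairs = set(), []
    for i, (mz, _) in enumerate(peaks):
        if i in used:
            continue
        lo = bisect.bisect_left(mzs, mz + delta - tol)
        hi = bisect.bisect_right(mzs, mz + delta + tol)
        candidates = [j for j in range(lo, hi) if j != i and j not in used]
        if not candidates:
            continue
        # Take the partner closest to the expected m/z.
        j = min(candidates, key=lambda k: abs(mzs[k] - (mz + delta)))
        used.update((i, j))
        pairs.append((peaks[i], peaks[j]))
    return pairs


def split_spectrum(peaks, delta, tol=0.01):
    """Split paired peaks into light/heavy channel lists plus a median heavy/light ratio."""
    pairs = pair_fragment_ions(peaks, delta, tol)
    light = [p for p, _ in pairs]
    heavy = [q for _, q in pairs]
    ratios = [q[1] / p[1] for p, q in pairs if p[1] > 0]
    return light, heavy, (median(ratios) if ratios else None)
```

Unpaired peaks (such as a fragment with no partner at the expected offset) are left out of both channels in this sketch.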
