Similar articles (20 found)
1.

Background  

Large databases of genetic data are often biased in their representation. Thus, selection of genetic data with desired properties, such as evolutionary representation or shared genotypes, is problematic. Selection on the basis of epidemiological variables may not achieve the desired properties. Available automated approaches to the selection of influenza genetic data make a tradeoff between speed and simplicity on the one hand and control over quality and contents of the dataset on the other hand. A poorly chosen dataset may be detrimental to subsequent analyses.

2.
Cho SY  Park KS  Shim JE  Kwon MS  Joo KH  Lee WS  Chang J  Kim H  Chung HC  Kim HO  Paik YK 《Proteomics》2002,2(9):1104-1113
We describe an integrated proteome database, termed Yonsei Proteome Research Center Proteome Database (YPRC-PDB), which can store, retrieve and analyze various information including two-dimensional electrophoresis (2-DE) images and associated spot information that were obtained during studies of hepatocellular carcinoma (HCC). YPRC-PDB is also designed to perform as a laboratory information management system that manages sample information, clinical background, conditions of both sample preparation and 2-DE, and entire sets of experimental results. It also features a query system and data-mining applications that can automatically analyze expression level changes of a specific protein and link them directly to clinical information. The user interface is web-based, so that the results from other laboratories can be shared effectively. In particular, the master gel image query is equipped with a graphic tool that can easily identify the relationship between the specific pathological stage of HCC and expression levels of a potential marker protein on the master gel image. Thus, YPRC-PDB is a versatile integrated database suitable for subsequent analyses. The information in YPRC-PDB is updated easily and it is available to authorized users on the World Wide Web (http://yprcpdb.proteomix.org/~damduck/).

3.
An alternative-exon database and its statistical analysis   (Cited by: 19 total, 0 self-citations, 19 by others)
We compiled a comprehensive database of alternative exons from the literature and analyzed them statistically. Most alternative exons are cassette exons and are expressed in more than two tissues. Of all exons whose expression was reported to be specific for a certain tissue, the majority were expressed in the brain. Whereas the length of constitutive exons follows a normal distribution, the distribution of alternative exons is skewed toward smaller ones. Furthermore, alternative-exon splice sites deviate more from the consensus: their 3' splice sites are characterized by a higher purine content in the polypyrimidine stretch, and their 5' splice sites deviate from the consensus sequence mostly at the +4 and +5 positions. Furthermore, for exons expressed in a single tissue, adenosine is more frequently used at the -3 position of the 3' splice site. In addition to the known AC-rich and purine-rich exonic sequence elements, sequence comparison using a Gibbs algorithm identified several motifs in exons surrounded by weak splice sites and in tissue-specific exons. Together, these data indicate a combinatorial effect of weak splice sites, atypical nucleotide usage at certain positions, and functional enhancers as an important contribution to alternative-exon regulation.

4.
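Zhao Rui (赵锐)  Qian Zhen (钱震)  Ren Shuangxi (任双喜) 《生物信息学》 (Chinese Journal of Bioinformatics) 2009,7(2):143-145,149
We design a web-based database model for storing and annotating massive amounts of DNA data. The process has three parts: first, the database framework is built; next, the raw genome sequence data are annotated in batches and exported in a valid format for import into the database; finally, a friendly user interface provides online reading, querying and annotation of the genomic data. The database is intended to solve the problem of effectively storing and managing the large volume of genome sequences that are being generated and await analysis.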

5.
We present our experience of building biological databases. Such databases have most aspects in common with other complex databases in other fields. We do not believe that biological data are that different from complex data in other fields. Our experience has led us to emphasise simplicity and conservative technology choices when building these databases. This is a short paper of advice that we hope is useful to people designing their own biological database.

6.
An object-oriented database system has been developed which is being used to store protein structure data. The database can be queried using the logic programming language Prolog or the query language Daplex. Queries retrieve information by navigating through a network of objects which represent the primary, secondary and tertiary structures of proteins. Routines written in both Prolog and Daplex can integrate complex calculations with the retrieval of data from the database, and can also be stored in the database for sharing among users. Thus object-oriented databases are better suited to prototyping applications and answering complex queries about protein structure than relational databases. This system has been used to find loops of varying length and anchor positions when modelling homologous protein structures.

7.
A new, efficient method for the assembly of protein tertiary structure from known, loosely encoded secondary structure restraints and sparse information about exact side chain contacts is proposed and evaluated. The method is based on a new, very simple method for the reduced modeling of protein structure and dynamics, where the protein is described as a lattice chain connecting side chain centers of mass rather than Cαs. The model has implicit built-in multibody correlations that simulate short- and long-range packing preferences, hydrogen bonding cooperativity and a mean force potential describing hydrophobic interactions. Due to the simplicity of the protein representation and definition of the model force field, the Monte Carlo algorithm is at least an order of magnitude faster than previously published Monte Carlo algorithms for structure assembly. In contrast to existing algorithms, the new method requires a smaller number of tertiary restraints for successful fold assembly; on average, one for every seven residues as compared to one for every four residues. For example, for smaller proteins such as the B domain of protein G, the resulting structures have a coordinate root mean square deviation (cRMSD), which is about 3 Å from the experimental structure; for myoglobin, structures whose backbone cRMSD is 4.3 Å are produced, and for a 247-residue TIM barrel, the cRMSD of the resulting folds is about 6 Å. As would be expected, increasing the number of tertiary restraints improves the accuracy of the assembled structures. The reliability and robustness of the new method should enable its routine application in model building protocols based on various (very sparse) experimentally derived structural restraints. Proteins 32:475–494, 1998. © 1998 Wiley-Liss, Inc.
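To put the restraint densities quoted above in concrete terms, the short sketch below converts "one restraint per k residues" into an approximate restraint count for the 247-residue TIM barrel mentioned in the abstract. The function name and the choice to round up are assumptions made for illustration; the abstract reports only the average densities.

```python
import math


def restraints_needed(n_residues: int, residues_per_restraint: int) -> int:
    """Approximate number of tertiary restraints for a chain of n_residues.

    Rounding up is an assumption; the abstract only gives average densities.
    """
    return math.ceil(n_residues / residues_per_restraint)


# 247-residue TIM barrel from the abstract
tim_new_method = restraints_needed(247, 7)  # one restraint per seven residues
tim_older_methods = restraints_needed(247, 4)  # one restraint per four residues
print(tim_new_method, tim_older_methods)
```

Under these assumptions the new method would need a little over half as many restraints as the earlier algorithms for the same chain.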

8.
At first mostly dedicated to molecular analysis, microfluidic systems are rapidly expanding their range of applications towards cell biology, thanks to their ability to control the mechanical, biological and fluidic environment at the scale of the cells. A number of new concepts based on microfluidics were indeed proposed in the last ten years for cell sorting. For many of these concepts, progress remains to be made regarding automation, standardization, or throughput, but it is now clear that microfluidics will make a major contribution to the field, from fundamental research to point-of-care diagnosis. We present here an overview of cell sorting in microfluidics, with an emphasis on circulating tumor cells. Sorting principles are classified in two main categories: methods based on physical properties of the cells, such as size, deformability, electric or optical properties, and methods based on biomolecular properties, notably specific surface antigens. We document potential applications, discuss the main advantages and limitations of different approaches, and tentatively outline the main remaining challenges in this fast-evolving field.

9.
Research in the genomic sciences is confronted with the volume of sequencing and resequencing data increasing at a higher pace than that of data storage and communication resources, shifting a significant part of research budgets from the sequencing component of a project to the computational one. Hence, being able to efficiently store sequencing and resequencing data is a problem of paramount importance. In this article, we describe GReEn (Genome Resequencing Encoding), a tool for compressing genome resequencing data using a reference genome sequence. It overcomes some drawbacks of the recently proposed tool GRS: it can compress sequences that GRS cannot handle, it runs faster, and it achieves compression gains of over 100-fold for some sequences. This tool is freely available for non-commercial use at ftp://ftp.ieeta.pt/~ap/codecs/GReEn1.tar.gz.
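The general idea of compressing a resequenced genome against a reference can be illustrated with a deliberately simple toy. The sketch below is not GReEn's algorithm, which the abstract does not describe; it only stores the positions where a target sequence differs from the reference. It assumes equal-length sequences with substitutions only, and all function names and example sequences are hypothetical.

```python
def encode_against_reference(reference: str, target: str) -> list[tuple[int, str]]:
    """Return (position, base) pairs where target differs from reference.

    Toy assumption: both sequences have the same length (substitutions only).
    """
    if len(reference) != len(target):
        raise ValueError("this toy sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode_with_reference(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Rebuild the target sequence from the reference plus the stored differences."""
    seq = list(reference)
    for pos, base in diffs:
        seq[pos] = base
    return "".join(seq)


reference = "ACGTACGTACGT"  # made-up reference sequence
target = "ACGTTCGTACGA"     # made-up individual sequence with two substitutions
diffs = encode_against_reference(reference, target)
print(diffs)  # [(4, 'T'), (11, 'A')]
print(decode_with_reference(reference, diffs) == target)  # True
```

Because two individuals of the same species share most of their sequence, the list of differences is typically far smaller than the sequence itself, which is what makes reference-based tools attractive. Real tools must also handle insertions, deletions and entropy coding of the differences.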

10.
With the advent of DNA sequencing technologies, more and more reference genome sequences are available for many organisms. Analyzing sequence variation and understanding its biological importance are becoming a major research aim. However, how to store and process the huge amount of eukaryotic genome data, such as those of the human, mouse and rice, has become a challenge to biologists. Currently available bioinformatics tools used to compress genome sequence data have some limitations, such as the requirement of the reference single nucleotide polymorphisms (SNPs) map and information on deletions and insertions. Here, we present a novel compression tool for storing and analyzing Genome ReSequencing data, named GRS. GRS is able to process the genome sequence data without the use of the reference SNPs and other sequence variation information and automatically rebuild the individual genome sequence data using the reference genome sequence. When its performance was tested on the first Korean personal genome sequence data set, GRS was able to achieve ~159-fold compression, reducing the size of the data from 2986.8 to 18.8 MB. While being tested against the sequencing data from rice and Arabidopsis thaliana, GRS compressed the 361.0 MB rice genome data to 4.4 MB, and the A. thaliana genome data from 115.1 MB to 6.5 KB. This de novo compression tool is available at http://gmdd.shgmo.org/Computational-Biology/GRS.
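The fold-compression figures can be checked directly from the file sizes quoted in the abstract. The sketch below is plain arithmetic. The helper name is hypothetical, and the MB-to-KB conversion factor of 1024 is an assumption, since the abstract does not state which convention it uses.

```python
def fold_compression(original_size: float, compressed_size: float) -> float:
    """Ratio of original to compressed size (both in the same unit)."""
    return original_size / compressed_size


KB_PER_MB = 1024  # assumption: binary convention; the abstract does not say

korean = fold_compression(2986.8, 18.8)              # both in MB
rice = fold_compression(361.0, 4.4)                  # both in MB
thaliana = fold_compression(115.1 * KB_PER_MB, 6.5)  # both in KB

print(f"Korean genome: {korean:.1f}-fold")
print(f"Rice genome: {rice:.1f}-fold")
print(f"A. thaliana genome: {thaliana:.0f}-fold")
```

The first ratio agrees with the ~159-fold figure reported for the Korean genome. The rice data compress roughly 82-fold, and the A. thaliana ratio is on the order of 18,000-fold under either MB-to-KB convention.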

11.
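Macro- and microstructure of the leaf organs of Shaniodendron subaequale and their systematic significance   (Cited by: 2 total, 0 self-citations, 2 by others)
In Shaniodendron subaequale (Chang) Deng, Wei et Wang, the leaf epidermal hairs are stellate, the stomatal apparatus is of the paracytic, mesoperigenous type, the leaf-margin teeth are of the fothergilloid type, the palisade tissue of the mesophyll is one cell layer thick, and the node is trilacunar with three leaf traces. Between the stem node and the leaf, the vascular bundles follow a pattern of separation, merging and renewed separation, developing through successive merging and successive separation. These results further support the independence of the genus Shaniodendron and also indicate, from one perspective, that it belongs to the tribe Fothergilleae in the narrow sense.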

12.
The manufacture and use of a whole-genome microarray is a complex process and it is essential that all data surrounding the process is stored, is accessible and can be easily associated with the data generated following hybridization and scanning. As part of a program funded by the Wellcome Trust, the Bacterial Microarray Group at St. George's Hospital Medical School (BmuG@S) will generate whole-genome microarrays for 12 bacterial pathogens for use in collaboration with specialist research groups. BmuG@S will collaborate with these groups at all levels, including the experimental design, methodology and analysis. In addition, we will provide informatic support in the form of a database system (BmuG@Sbase). BmuG@Sbase will provide access through a web interface to the microarray design data and will allow individual users to store their data in a searchable, secure manner. Tools developed by BmuG@S in collaboration with specific research groups investigating analysis methodology will also be made available to those groups using the arrays and submitting data to BmuG@Sbase.

13.
Review of the structure of the symphysis pubis, based on my extensive study of the pelvic joints ('31), shows changes from age, function, pregnancy hormones and stress of parturition. Primary physiologic shearing clefts and secondary traumatic clefts in cartilage are more frequent in females. Interdigitations in the young osteocartilaginous border secure the vulnerable growth cartilage against increasing shearing forces. The retropubic eminence, ligamentous or cartilaginous, forms earlier in females; in males it forms later, owing to bony lipping secondary to extrusion of disc cartilage. Ovarian and placental hormones in pregnancy cause remodeling and resorption of the posterior margin of the pubic facet and adjacent cortex, making a (variably) deep bony groove for greatly hypertrophied transverse ligaments. Delivery of a mature infant produces traumatic changes leading to extrusion of torn fibrocartilage in any direction, progressively loosening the symphysis, producing cartilage nodules, cysts and reactive bone formation. Degenerative arthritis of older age is more frequent in parous females.

14.
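Design and implementation of a sample-set database for protein secondary structure prediction   (Cited by: 1 total, 0 self-citations, 1 by others)
Zhang Ning (张宁)  Zhang Tao (张涛) 《生物信息学》 (Chinese Journal of Bioinformatics) 2006,4(4):163-166
Database technology was applied to the processing and analysis of sample sets for protein secondary structure prediction, and a sample-set database for secondary structure prediction was built. The construction of the database is described using the CB513 sample set as an example. A sample database not only makes data easier to store, manage and retrieve, but can also carry out some simple sequence analyses, replacing much of the programming that was previously required. This greatly improves working efficiency and reduces errors.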

15.
Development of an efficient cell-free translation system from mammalian cells is an important goal. We examined whether supplementation of HeLa cell extracts with any translation initiation factor or translational regulator could enhance protein synthesis. eIF2 (eukaryotic translation initiation factor 2) and eIF2B augmented translation of capped, uncapped and encephalomyocarditis virus-internal ribosome entry site-promoted mRNAs. eIF4E specifically stimulated capped mRNA translation, while p97, a homologue of the C-terminal two-thirds of eIF4G, increased uncapped mRNA translation. When the HeLa cell extract was supplemented with a combination of eIF2, eIF2B, and p97, the capacity to synthesize a protein from an uncapped mRNA became comparable to that from the capped counterpart stimulated with a combination of eIF2, eIF2B, and eIF4E. A dialysis method rendered the HeLa cell extract capable of synthesizing proteins for 36 h, and the yield was augmented when supplemented with initiation factors. In contrast, the productivity of a rabbit reticulocyte lysate was not enhanced by this method. Collectively, the translation factor-supplemented HeLa cell extract should become an important tool for the production of recombinant proteins.

16.

Background  

It is useful to develop a tool that would effectively describe protein mutation matrices specifically geared towards the identification of mutations that produce either wanted or unwanted effects, such as an increase or decrease in affinity, or a predisposition towards misfolding. Here, we describe a tool where such mutations are efficiently identified, categorized and visualized. To categorize the mutations, amino acids in a mutation matrix are arranged according to one of three sets of physicochemical characteristics, namely hydrophilicity, size and polarizability, and charge and polarity. The magnitude and frequencies of mutations for an alignment are subsequently described using color information and scaling factors.

17.
Protein phosphorylation, one of the most important protein post-translational modifications, is involved in various biological processes, and the identification of phosphorylation peptides (phosphopeptides) and their corresponding phosphorylation sites (phosphosites) will facilitate the understanding of the molecular mechanism and function of phosphorylation. Mass spectrometry (MS) provides a high-throughput technology that enables the identification of large numbers of phosphosites. PhoPepMass is designed to assist human phosphopeptide identification from MS data based on a specific database of phosphopeptide masses and a multivariate hypergeometric matching algorithm. It contains 244,915 phosphosites from several public sources. Moreover, the accurate masses of peptides and fragments with phosphosites were calculated. It is the first database that provides a systematic resource for the query of phosphosites on peptides and their corresponding masses. This allows researchers to search certain proteins of which phosphosites have been reported, to browse detailed phosphopeptide and fragment information, to match masses from MS analyses with defined threshold to the corresponding phosphopeptide, and to compare proprietary phosphopeptide discovery results with results from previous studies. Additionally, a database search software is created and a “two-stage search strategy” is suggested to identify phosphopeptides from tandem mass spectra of proteomics data. We expect PhoPepMass to be a useful tool and a source of reference for proteomics researchers. PhoPepMass is available at https://www.scbit.org/phopepmass/index.html.
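Matching observed masses to a table of theoretical phosphopeptide masses "with defined threshold" can be sketched as a simple tolerance search. The example below is an illustration only: it is not PhoPepMass's multivariate hypergeometric algorithm, the function names are hypothetical, the 10 ppm default tolerance is an assumption, and the candidate masses are made-up placeholder values rather than real database entries.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed_masses, candidates, tol_ppm=10.0):
    """Return (observed_mass, candidate_name) pairs within tol_ppm of each other.

    candidates maps a peptide label to its theoretical mass in Da.
    """
    hits = []
    for obs in observed_masses:
        for name, theo in candidates.items():
            if abs(ppm_error(obs, theo)) <= tol_ppm:
                hits.append((obs, name))
    return hits


# Placeholder values for illustration only, not real phosphopeptide masses.
candidates = {"peptide_A+1P": 1000.0000, "peptide_B+1P": 1500.0000}
observed = [1000.0050, 1500.0300, 1200.0000]

hits = match_masses(observed, candidates)
print(hits)  # [(1000.005, 'peptide_A+1P')]
```

Here the first observed mass is about 5 ppm from its candidate and is accepted, the second is about 20 ppm away and is rejected, and the third has no nearby candidate. A linear scan is used for clarity; a sorted table with binary search would be a natural choice for a database of this size.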

18.
A mouse ENU-mutagenesis program at RIKEN GSC has been initiated to conduct a large-scale, genome-wide, early- and late-onset phenotypic screen of mutant mice. We screened about a hundred mice every week with a comprehensive set of phenotype assays including behavioral tests based on a modified SHIRPA protocol, blood tests (both clinical biochemical testing and hemogram), and measurement of locomotor activity in their home cages. To manage the entire program, we developed a client/server architecture database system and named it MUSDB (Mutagenesis Universal Support DataBase). It manages mouse husbandry, mating protocols, procedures for ENU injection and phenotypic screens, phenotype inheritance tests, preservation of sperm and organs, and other materials generated during the program. We have implemented MUSDB in quite a large-scale system that includes 150 client computers. It has helped reduce typographical errors and provided simple and efficient operation via its front-end user interface. It significantly contributed to the communication within and between workgroups in the program and to the accumulation of various phenotypic and inheritance data.

19.
Over the past decades, genetic studies in rodent models of human multifactorial disorders have led to the detection of numerous chromosomal regions associated with disease phenotypes. Owing to the complex control of these phenotypes and the size of the disease loci, identifying the underlying genes requires further analyses in new original models, including chromosome substitution (consomic) and congenic lines, derived to evaluate the phenotypic effects of disease susceptibility loci and fine-map the disease genes. We have developed a relational database (MACS) specifically designed for the genetic marker-assisted production of large series of rodent consomic and congenic lines (speed congenics), the organization of their genetic and phenotypic characterizations, and the acquisition and archiving of both genetic and phenotypic data. This database, originally optimized for the production of rat congenics, can also be applied to mouse mapping projects. MACS represents an essential system for significantly improving efficiency and accuracy in investigations of multiple consomic and congenic lines simultaneously derived for different disease loci, and ultimately cloning genes underlying complex phenotypes.

20.
An efficient system for small protein expression and refolding   (Cited by: 1 total, 0 self-citations, 1 by others)
The low expression yield and poor refolding efficiency of small recombinant proteins expressed in Escherichia coli have continued to hinder the large-scale purification of such proteins for structural and biological investigations. A system based on a small fusion partner, the B1 domain of Streptococcal protein G (GB1), was utilized to overcome this problem. We have tested this system on a small cysteine-rich toxin, mutant myotoxin alpha (MyoP20G). The highly expressed fusion protein was refolded using an unfolding/refolding protocol. Due to the small size of GB1, we were able to monitor the unfolding/refolding status by heteronuclear single quantum coherence (HSQC) NMR spectroscopy. The final product yielded well-resolved NMR spectra, with a topology corresponding to the natural product. We conclude that GB1 not only increases the expression level but also enhances the refolding of small proteins.
