Similar articles
Found 20 similar articles (search time: 15 ms)
1.
2.
Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ for finding simple exact tandem repeats as well as non-simple repeats. E-TRA has several advanced repeat search parameters/options compared to other repeat finder programs: it not only accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, but also analyzes multiple files with multiple sequences. The minimum and maximum tandem repeat motif lengths that E-TRA finds vary from one to one thousand. Advanced user-defined parameters/options let researchers apply different minimum repeat-count criteria to different motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some tandem repeats play important roles in the regulation of gene expression, whereas others do not have any known biological function as yet. Nevertheless, they have proven to be very beneficial in DNA profiling and genetic linkage analysis studies. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and different developmental stages differed in the number of repeats as well as in repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, thus differing from the un-transcribed repeats found in genomes.
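
The idea of applying a different minimum repeat count to each motif length can be illustrated with a minimal Python sketch. This is not E-TRA's implementation: the function name `find_exact_tandem_repeats`, the regular-expression approach, and the threshold values are all assumptions made for illustration, since the abstract does not state the program's defaults.

```python
import re

def find_exact_tandem_repeats(seq, min_motif=1, max_motif=6, min_repeats=None):
    """Return (start, motif, copies) tuples for exact tandem repeats in seq.

    min_repeats maps motif length -> minimum number of copies; the values
    below are hypothetical, not E-TRA's defaults.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5}
    default_min = 4  # assumed threshold for motif lengths not listed above
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        need = min_repeats.get(k, default_min)
        # A k-letter motif followed by at least (need - 1) identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, need - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT").
            if any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

example = "GG" + "CA" * 7 + "TT" + "AGC" * 5 + "G"
print(find_exact_tandem_repeats(example))
```

With the assumed thresholds, the example sequence yields one dinucleotide repeat (CA, 7 copies) and one trinucleotide repeat (AGC, 5 copies). The sketch covers simple repeats only; non-simple (compound) repeats and keyword filtering by tissue or organ are not modeled.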

3.
Expressed sequence tags (ESTs) have been widely used in gene survey research in recent years. The EST Pipeline System, software developed by Hangzhou Genomics Institute (HGI), can automatically analyze EST sequence sets of different scales by suitable methods. All the analysis reports, including those of vector masking, sequence assembly, gene annotation, Gene Ontology classification, and some other analyses, can be browsed and searched as well as downloaded in Excel format from the web interface, sparing researchers routine data processing so that they can focus on the biological rules embedded in the data.

4.
Computational techniques have been adopted in medical and biological systems for a long time. There is no doubt that the development and application of computational methods will be of great help in better understanding biomedical and biological functions. Large numbers of datasets have been produced by biomedical and biological experiments and simulations. In order for researchers to gain knowledge from the original data, a nontrivial transformation is necessary, which is regarded as a critical link in the chain of knowledge acquisition, sharing, and reuse. Challenges that have been encountered include: how to efficiently and effectively represent human knowledge in formal computing models, how to take advantage of semantic text mining techniques rather than traditional syntactic text mining, and how to handle security issues during knowledge sharing and reuse. This paper summarizes the state of the art in these research directions. We aim to provide readers with an introduction to the major computing themes applicable to medical and biological research.

5.
Plant genomics projects involving model species and many agriculturally important crops are resulting in a rapidly increasing database of genomic and expressed DNA sequences. The publicly available collection of expressed sequence tags (ESTs) from several grass species can be used in the analysis of both structural and functional relationships in these genomes. We analyzed over 260000 EST sequences from five different cereals for their potential use in developing simple sequence repeat (SSR) markers. The frequency of SSR-containing ESTs (SSR-ESTs) in this collection varied from 1.5% for maize to 4.7% for rice. In addition, we identified several ESTs that are related to the SSR-ESTs by BLAST analysis. The SSR-ESTs and the related sequences were clustered within each species in order to reduce the redundancy and to produce a longer consensus sequence. The consensus and singleton sequences from each species were pooled and clustered to identify cross-species matches. Overall a reduction in the redundancy by 85% was observed when the resulting consensus and singleton sequences (3569) were compared to the total number of SSR-EST and related sequences analyzed (24606). This information can be useful for the development of SSR markers that can amplify across the grass genera for comparative mapping and genetics. Functional analysis may reveal their role in plant metabolism and gene evolution.

6.
Modelling and simulation techniques are valuable tools for the understanding of complex biological systems. The design of a computer model necessarily has many diverse inputs, such as information on the model topology, reaction kinetics and experimental data, derived either from the literature, databases or direct experimental investigation. In this review, we describe different data resources, standards and modelling and simulation tools that are relevant to integrative systems biology.

7.
With its predicted proteome of 1550 proteins (data set Etalon) Helicobacter pylori 26695 represents a perfect model system of medium complexity for investigating basic questions in proteomics. We analyzed urea‐solubilized proteins by 2‐DE/MS (data set 2‐DE) and by 1‐DE‐LC/MS (Supprot); proteins insoluble in 9 M urea but solubilized by SDS (Pellet); proteins precipitating in the Sephadex layer at the application side of IEF (Sephadex) by 1‐DE‐LC/MS; and proteins precipitating close to the application side within the IEF gel by LC/MS (Startline). The experimental proteomics data of H. pylori comprising 567 proteins (protein coverage: 36.6%) were stored in the Proteome Database System for Microbial Research ( http://www.mpiib‐berlin.mpg.de/2D‐PAGE/ ), which gives access to raw mass spectra (MALDI‐TOF/TOF) in T2D format, as well as to text files of peak lists. For data mining the protein mapping and comparison tool PROMPT ( http://webclu.bio.wzw.tum.de/prompt/ ) was used. The percentage of proteins with transmembrane regions, relative to all proteins detected, was 0, 0.2, 0, 0.5, 3.8 and 6.3% for 2‐DE, Supprot, Startline, Sephadex, Pellet, and Etalon, respectively. 2‐DE does not separate membrane proteins because they are insoluble in 9 M urea/70 mM DTT and 2% CHAPS. SDS solubilizes a considerable portion of the urea‐insoluble proteins and makes them accessible for separation by SDS‐PAGE and LC. The 2‐DE/MS analysis with urea‐solubilized proteins and the 1‐DE‐LC/MS analysis with the urea‐insoluble protein fraction (Pellet) are complementary procedures in the pursuit of a complete proteome analysis. Access to the PROMPT‐generated diagrams in the Proteome Database allows the mining of experimental data with respect to other functional aspects.

8.
We present an EST library, chloroplast genome sequence, and nuclear microsatellite markers that were developed for the semi-domesticated oilseed crop noug (Guizotia abyssinica) from Ethiopia. The EST library consists of 25 711 Sanger reads, assembled into 17 538 contigs and singletons, of which 4781 were functionally annotated using the Arabidopsis Information Resource (TAIR). The age distribution of duplicated genes in the EST library shows evidence of two paleopolyploidizations, a pattern that noug shares with several other species in the Heliantheae tribe (Compositae family). From the EST library, we selected 43 microsatellites and then designed and tested primers for their amplification. The number of microsatellite alleles varied between 2 and 10 (average 4.67), and the average observed and expected heterozygosities were 0.49 and 0.54, respectively. The chloroplast genome was sequenced de novo using Illumina’s sequencing technology and completed with traditional Sanger sequencing. No large re-arrangements were found between the noug and sunflower chloroplast genomes, but 1.4% of sites have indels and 1.8% show sequence divergence between the two species. We identified 34 tRNAs, 4 rRNA sequences, and 80 coding sequences, including one region (trnH-psbA) with 15% sequence divergence between noug and sunflower that may be particularly useful for phylogeographic studies in noug and its wild relatives.

9.
Chu CK  Feng LL  Wouters MA 《Proteins》2005,60(4):577-583
Structural data mining studies attempt to deduce general principles of protein structure from solved structures deposited in the protein data bank (PDB). The entire database is unsuitable for such studies because it is not representative of the ensemble of protein folds. Given that novel folds continue to be unearthed, some folds are currently unrepresented in the PDB while other folds are overrepresented. Overrepresentation can easily be avoided by filtering the dataset. PDB_SELECT is a well-used representative subset of the PDB that has been deduced by sequence comparison. Specifically, structures with sequences that exhibit a pairwise sequence identity above a threshold value are weeded from the dataset. Although length criteria for pairwise alignments have a structural basis, this automated method of pruning is essentially sequence-based and runs into problems in the twilight zone, possibly resulting in some folds being overrepresented. The value-added structure databases SCOP and CATH are also a potential source of a nonredundant dataset. Here we compare the sequence-derived dataset PDB_SELECT with the structural databases SCOP (Structural Classification Of Proteins) and CATH (Class-Architecture-Topology-Homology). We show that some folds remain overrepresented in the PDB_SELECT dataset while other folds are not represented at all. However, SCOP and CATH also have their own problems such as the labor-intensiveness of the update process and the problem of determining whether all folds are equally or sufficiently distant. We discuss areas where further work is required.
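
The threshold-based pruning described above can be sketched as a simple greedy filter. This is an illustrative assumption, not the PDB_SELECT procedure: the names `similarity` and `greedy_nonredundant` are hypothetical, and `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for a real pairwise alignment identity, which it does not compute.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude stand-in for pairwise sequence identity (NOT a real alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def greedy_nonredundant(sequences, threshold=0.25):
    """Keep a sequence only if it is below the threshold against all kept ones.

    sequences: dict mapping an identifier to an amino-acid string.
    threshold: assumed cut-off; the abstract does not give a value.
    """
    kept = []
    for name, seq in sequences.items():
        if all(similarity(seq, sequences[k]) < threshold for k in kept):
            kept.append(name)
    return kept

toy = {
    "a": "MKTAYIAKQR",
    "b": "MKTAYIAKQR",  # identical to "a", so it is weeded out
    "c": "GGGGSSSSPP",  # shares no residues with "a", so it is kept
}
print(greedy_nonredundant(toy))
```

Because the result depends on the order in which sequences are visited and on the similarity measure alone, a filter of this kind has no notion of fold, which is the limitation the abstract raises for sequence-based pruning in the twilight zone.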

10.
We isolated and characterized 52 novel microsatellite markers from Florida largemouth bass, Micropterus salmoides floridanus, for use in conservation, management and population genetic studies. Markers were assessed in M. s. floridanus from peninsular Florida (n = 30) and averaged eight alleles per locus with observed heterozygosity of 0.57 (range 0–0.97). Cross‐taxa amplification was successful among 88% of tested congeners. These polymorphic and potentially taxon‐diagnostic markers contribute to the limited number of microsatellites currently available for micropterids and specifically M. s. floridanus.

11.
There have been no molecular genetic examinations of mating systems in the Australian varanid lizards (genus Varanus) despite their high species diversity, the abundance of some species and difficulties with direct observation of behaviourally cryptic species in the field. We developed 10 polymorphic microsatellite loci and assessed their utility in a range of varanid species. Observed heterozygosities in the three species assessed ranged from 30% to 100%. These loci should be useful for investigation of population structure, gene flow and mating systems in Varanus acanthurus, V. baritji and V. tristis and may also be of use in other varanid species.

12.
A comprehensive but simple‐to‐use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology.

13.
High levels of genetic diversity are generated in Haemophilus influenzae populations through DNA repeat-mediated phase variation and recombination with DNA fragments acquired by uptake from the external milieu. Conversely, multiple pathways for maintenance of the genome sequence are encoded in H. influenzae genomes. In Escherichia coli, mutations in single-stranded-DNA exonucleases destabilise tandem DNA repeats whilst inactivation of recG can stabilise repeat tracts. These enzymes also have varying effects on recombination. Deletion mutations were constructed in H. influenzae genes encoding homologs of ExoI, RecJ and RecG whilst ExoVII was refractory to mutation. Inactivation of RecJ and RecG, but not ExoI, increased sensitivity to irradiation with ultraviolet light. An increase in spontaneous mutation rate was not observed in single mutants but only when both RecJ and ExoI were mutated. None of the single- or double-mutations increased or decreased the rates of slippage in tetranucleotide repeat tracts. Furthermore, the exonuclease mutants did not exhibit significant defects in horizontal gene transfer. We conclude that RecJ, ExoI and RecG are required for maintenance of the H. influenzae genome but none of these enzymes influence the generation of genetic diversity through mutations in the tetranucleotide repeat tracts of this species.

14.
《Genetics》2022,220(4)
WormBase (www.wormbase.org) is the central repository for the genetics and genomics of the nematode Caenorhabditis elegans. We provide the research community with data and tools to facilitate the use of C. elegans and related nematodes as model organisms for studying human health, development, and many aspects of fundamental biology. Throughout our 22-year history, we have continued to evolve to reflect progress and innovation in the science and technologies involved in the study of C. elegans. We strive to incorporate new data types and richer data sets, and to provide integrated displays and services that avail the knowledge generated by the published nematode genetics literature. Here, we provide a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including exciting new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum. Concurrently, we continue the integration and harmonization of infrastructure, processes, and tools with the Alliance of Genome Resources, of which WormBase is a founding member.

15.
16.
Soybean is one of the most economically important crops in the world. Soybean seeds have abundant protein and lipid content and very high economic value. In this study, a total of 184 seed-specific genes were obtained using online microarray databases, DDD, and RNA-seq data. The previously reported seed-specific genes in soybean were compared with the 184 seed-specific genes analyzed in this paper. Of the screened genes, 26 were common to both previous reports and the current screening. Meanwhile, 90 of the 184 genes have homologous counterparts in Arabidopsis, among which 24 have seed-specific expression, as indicated by microarray data for Arabidopsis. Furthermore, promoter analysis showed that almost all seed-specific genes contain at least one element related to seed specificity. The seed-specific element Skn-1 motif exists in most, if not all, of the seed-specific genes screened. Five genes were randomly selected from the pool of 184 soybean seed-specific genes, and their expression was quantified using quantitative real-time polymerase chain reaction (qRT-PCR) to further confirm the specificity of the screened genes. The results indicated that all five genes showed seed-specific expression. Moreover, the identification of genes with seed-specific expression in this study provides information valuable to the in-depth study of soybean.

17.
Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest.

Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects.


18.
In this paper we deal with the concept of the health smart home (HSH), designed to follow dependent people at home in order to avoid hospitalisation, limiting hospital stays to short acute care or fast, specific diagnostic investigations. For elderly people, the project of such an HSH has been called AISLE (Apartment with Intelligent Sensors for Longevity Effectiveness). For this purpose, a system with three levels of automatic measurement, covering (1) the circadian activity, (2) the vegetative state, and (3) some state variables specific to certain organs involved in precise diseases, has been developed within the framework of a 'Health Integrated Smart Home Information System' (HIS2). HIS2 is an experimental platform for technological development and clinical evaluation, intended to ensure the medical security and quality of life of patients who need home-based medical monitoring. Location sensors are placed in each room of the HIS2, allowing the monitoring of the patient's successive daily activity phases within the home environment. We sample on an hourly schedule to detect weak variations of the nycthemeral rhythms. Based on numerous measurements, we establish a mean value with confidence limits for the activity variables in normal behaviour, which makes it possible to detect, for example, a sudden abnormal event (like a fall) as well as a chronic pathological activity (like pollakiuria), and which allows us to define a canonical domain within which the patient's activity is qualified as 'predictable'. Alerts are set off if the patient's activity deviates from this predictable canonical domain. Moreover, we can follow the cardio-respiratory state by measuring the intensity of the respiratory sinus arrhythmia in order to quantify the integrity of the bulbar vegetative system, and we finally propose to carefully watch abnormal signs like arterial pressure or the presence of plasma proteins in the expired air flow for the early detection of hypertension or pulmonary oedema, respectively.
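
The alerting principle in this abstract (an hourly mean with confidence limits, and an alert when activity leaves that domain) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `build_baseline` and `is_alert`, the normal-approximation multiplier `z = 1.96`, and the toy activity counts are hypothetical and are not taken from the HIS2 system.

```python
from statistics import mean, stdev

def build_baseline(history, z=1.96):
    """Compute (low, high) limits per hour from past activity counts.

    history: dict mapping hour of day -> list of past observations.
    z: assumed multiplier for the confidence limits (normal approximation).
    """
    limits = {}
    for hour, values in history.items():
        m, s = mean(values), stdev(values)
        limits[hour] = (m - z * s, m + z * s)
    return limits

def is_alert(limits, hour, value):
    """Return True when the observation falls outside the hourly limits."""
    low, high = limits[hour]
    return not (low <= value <= high)

# Hypothetical sensor counts: little movement at 03:00, regular activity at 09:00.
history = {
    3: [0, 1, 0, 1, 0, 1],
    9: [10, 12, 11, 9, 10, 8],
}
limits = build_baseline(history)
print(is_alert(limits, 3, 6))   # unusually high night-time activity
print(is_alert(limits, 9, 10))  # typical morning activity
```

A sudden event such as a fall would appear as a single out-of-range hour, whereas a chronic change would appear as repeated out-of-range hours; distinguishing the two would need logic beyond this sketch.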

19.
The literature concerning the tissue culture of Taxus sp. as an alternative source for taxoid production is reviewed. The aim of this review is to summarize and discuss the progress achieved with the approaches and methods used for the establishment of various Taxus culture systems, the methods used for the evaluation of taxoid production, the multiple factors which control taxoid production and the feasibility of the in vitro production of taxoids on a commercial scale.

20.
