Similar literature
20 similar documents found.
1.
2.
The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in-house gene expression databases, which benefit from coherent data sets and are constructed to facilitate a particular analytic method, but rather to complement them by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, which were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
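
The relationship between the three entities can be pictured with a minimal sketch. This is not GEO's actual schema: the class names mirror the abstract, but every field name (`accession`, `probes`, `abundance`) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """A list of probes defining which molecules may be detected."""
    accession: str          # hypothetical identifier field
    probes: list[str]


@dataclass
class Sample:
    """Abundance data for one sample, referencing exactly one platform."""
    accession: str
    platform: Platform
    abundance: dict[str, float]   # probe name -> measured abundance

    def __post_init__(self):
        # A sample may only report values for probes on its platform.
        unknown = set(self.abundance) - set(self.platform.probes)
        if unknown:
            raise ValueError(f"probes not on platform: {sorted(unknown)}")


@dataclass
class Series:
    """Groups samples into the data set that makes up one experiment."""
    accession: str
    samples: list[Sample] = field(default_factory=list)

    def platforms(self) -> set[str]:
        # A series may mix samples measured on different platforms.
        return {s.platform.accession for s in self.samples}
```

Keeping the platform as a separate object lets many samples share one probe list, which matches the abstract's description of a sample that "references a single platform".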

3.
4.
A proposal for a flow cytometric data file standard   Cited by: 1 (self-citations: 0, citations by others: 1)
R F Murphy, T M Chused. Cytometry, 1984, 5(5):553-555
The increasing complexity of multiparameter data collection and analysis in flow cytometry and the development of relatively inexpensive arc-lamp-based flow cytometers, which increases the probability that laboratories or institutions may have more than one type of instrument, create a need for shareable analysis programs and for the transport of flow cytometric data files within an installation or from one institution to another. To address this need, we propose a standard file format to be used for all flow cytometric data. The general principles of this proposal are: (1) The data file will contain a minimum of three segments, TEXT, DATA, and ANALYSIS; (2) The TEXT and ANALYSIS segments consist of KEYWORDS, which are the names of data fields, and their values; (3) All TEXT is encoded in ASCII; (4) KEYWORDS and their values may be of any length; (5) Certain KEYWORDS will be standard, i.e., having specified formats to be recognized by all programs. The structure of the DATA segment will be uniquely defined by the values of KEYWORDS in the TEXT area. It may be in any bit resolution, facilitating compatibility between machines with different word lengths and/or allowing bit compression of the data. The structured nature of the TEXT area should facilitate management of flow cytometric data using existing database management systems. The proposed file format has been implemented on VAX, PDP-11, and HP9920 based flow cytometry data acquisition systems.
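
The idea that KEYWORDS in the TEXT segment define how the DATA segment is read can be sketched as follows. The abstract does not give the byte layout, so everything concrete here is an assumption: the TEXT segment is taken to be ASCII with its first character acting as the delimiter between keywords and values, and the keyword names `NPAR`, `NEVENTS` and `BITS` are hypothetical. The sketch also handles only whole-byte widths, whereas the proposal allows any bit resolution.

```python
def parse_text_segment(text: str) -> dict[str, str]:
    """Parse delimiter-separated KEYWORD/value pairs into a dict.

    Assumed layout: the first character is the delimiter, followed by
    alternating keywords and values of arbitrary length.
    """
    delim = text[0]
    fields = text[1:].rstrip(delim).split(delim)
    if len(fields) % 2:
        raise ValueError("keyword without a value")
    return dict(zip(fields[0::2], fields[1::2]))


def unpack_data(data: bytes, keywords: dict[str, str]) -> list[list[int]]:
    """Decode DATA as unsigned big-endian integers, one row per event.

    The layout is taken entirely from the (hypothetical) keywords.
    """
    n_par = int(keywords["NPAR"])        # parameters per event
    n_events = int(keywords["NEVENTS"])  # number of events
    n_bytes = int(keywords["BITS"]) // 8  # whole-byte widths only
    events = []
    for e in range(n_events):
        row = []
        for p in range(n_par):
            start = (e * n_par + p) * n_bytes
            row.append(int.from_bytes(data[start:start + n_bytes], "big"))
        events.append(row)
    return events
```

For example, a TEXT segment of `/NPAR/2/NEVENTS/2/BITS/16/` tells the reader to interpret eight DATA bytes as two events of two 16-bit parameters each.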

5.
6.
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data, which is based on the so-called path-distance. At each step the algorithm partitions the data into two clusters, and no prior assumptions on the structure of the clusters are required. It assigns each object (gene or sample) to only one cluster and gives the global optimum for the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, showing the robustness of the algorithm.

7.
8.
9.
The program d'plus calculates accuracy (sensitivity) and response-bias parameters using Signal Detection Theory, Choice Theory, and 'nonparametric' models. It is appropriate for data from one-interval, two- and three-interval forced-choice, same-different, ABX, and oddity experimental paradigms.
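
For the simplest case, a one-interval (yes/no) task under the equal-variance Gaussian model, sensitivity and bias follow from the hit rate H and false-alarm rate F as d' = z(H) - z(F) and c = -(z(H) + z(F))/2. The sketch below shows that textbook calculation only. It is not the d'plus program's code, and the function name and argument names are hypothetical.

```python
from statistics import NormalDist


def yes_no_dprime(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) for a one-interval yes/no task.

    Assumes the equal-variance Gaussian model. Rates of exactly 0 or 1
    have no finite z-score, so inv_cdf raises StatisticsError for them;
    callers need to apply their own correction in that case.
    """
    z = NormalDist().inv_cdf
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion
```

With 80 hits, 20 misses, 20 false alarms and 80 correct rejections, the two rates are symmetric around 0.5, so the criterion is zero and d' is about 1.68.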

10.
11.
A modification and extension of the computer program REVCUT (Blumenthal et al., Nucl. Acids Res. 10, 91-101 (1982)) is described. The new program searches for restriction endonuclease recognition sites that are not coding DNA sequences of a protein of known amino acid sequence using bit patterns. The modifications make the program more accurate and extend the range of the restriction endonucleases.

12.
A computer program is described for the rapid calculation of least squares solutions for data fitted to different functions normally used in reassociation and hybridization kinetic measurements. The equations for the fraction not reacted as a function of Cot follow: first order, exp(-kCot); second order, (1 + kCot)^-1; variable order, (1 + kCot)^-n; approximate fraction of DNA sequence remaining single stranded, (1 + kCot)^-0.44; and a function describing the pairing of tracer when the rate constant for the tracer (k) is distinct from the driver rate constant (kd): (formula: see text). Several components may be used for most of these functional forms. The standard deviations of the individual parameters at the solutions are calculated.
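
The four closed-form expressions listed above translate directly into code. The sketch below is illustrative and is not the described program. The tracer/driver function is omitted because its formula is not given in the abstract, and the multi-component `mixture` helper assumes (hypothetically) that components combine as a fraction-weighted sum of second-order terms.

```python
import math


def first_order(cot: float, k: float) -> float:
    """Fraction not reacted: exp(-k * Cot)."""
    return math.exp(-k * cot)


def second_order(cot: float, k: float) -> float:
    """Fraction not reacted: (1 + k * Cot)^-1."""
    return 1.0 / (1.0 + k * cot)


def variable_order(cot: float, k: float, n: float) -> float:
    """Fraction not reacted: (1 + k * Cot)^-n."""
    return (1.0 + k * cot) ** (-n)


def single_stranded(cot: float, k: float) -> float:
    """Approximate fraction of sequence still single stranded."""
    return (1.0 + k * cot) ** (-0.44)


def mixture(cot: float, components: list[tuple[float, float]]) -> float:
    """Assumed multi-component form: sum of fraction * second-order term.

    components is a list of (fraction, k) pairs.
    """
    return sum(f * second_order(cot, k) for f, k in components)


def sum_of_squares(observed, cots, model, *params) -> float:
    """Least-squares objective for one model and parameter set."""
    return sum((y - model(c, *params)) ** 2 for y, c in zip(observed, cots))
```

A fitting routine would minimize `sum_of_squares` over the parameters; at Cot = 1/k the second-order form gives exactly one half, which is the usual definition of Cot½.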

13.
14.
A tutorially-assisted, interactive program, written for a Digital Equipment Corporation LAB-11 minicomputer (PDP-11/20), is described which allows a user to fit (with or without automatic estimation of initial parameter values), by a method of nonlinear least squares, any one of seven different types of probability density functions (p.d.f.'s) to an empirical frequency distribution; the latter may be input to the program or formed by the program whenever it is furnished a series of times between events. The iteratively-obtained, "best fit" p.d.f. is displayed on a two-color, point-plot display against the background of a point-plot histogram. By selecting any one of nine output modes, the user is allowed: (1) to view histograms successively on the point-plot display, (2) to generate selected p.d.f.'s, (3) to "force" p.d.f.'s having known parameters through the histogram data, (4) to obtain chi-square (χ²) and Kolmogorov-Smirnov estimates of the goodness of fit to the data, and (5) to apply a special test [Williams and Kloot, 1953] in order to determine whether the least squares estimates of two candidate models are statistically different. The resident driver program and the four overlayable program segments are written in standard FORTRAN IV, except for two plot routines, which are written in PDP-11 assembly language.

15.
Many models for inference of population genetic parameters are based on the assumption that the data set at hand consists of groups displaying within-group Hardy-Weinberg equilibrium at individual loci and linkage equilibrium between loci. This assumption is commonly violated by the presence of within-group spatial structure arising from nonrandom mating of individuals due to isolation by distance (IBD). This paper proposes a model and simulation method implemented in a computer program to flexibly simulate data displaying such patterns. The program permits displaying of smooth spatial variations of allele frequencies due to IBD and more abrupt variations due to the presence of strong barriers to gene flow. It is useful in assessing the performance of various statistical inference methods and in designing spatial sampling schemes. This is shown by a simulation study aimed at assessing the extent to which IBD patterns affect the accuracy of cluster inferences performed in models assuming panmixia. The program is also used to study the effects of the spatial sampling scheme (e.g. sampling individuals in clumps or uniformly across the spatial domain). The accuracy of such inferences is assessed in terms of the number of inferred populations, the assignment of individuals to populations and the location of borders between populations. The effect of spatial sampling was weak, while the effect of IBD may be substantial, leading to the inference of spurious populations, especially when IBD was strong with respect to the size of the sampling domain. The model and program are new and have been embedded in the R package Geneland, for user convenience and compliance with existing data formats.

16.
A computer algorithm has been developed which identifies tRNA genes and tRNA-like structures in DNA sequences. The program searches the sequence string for specific base positions that correspond to the invariant and semi-invariant bases found in tRNAs. The tRNA nature of the sequence is confirmed by the presence of complementary base pairing at the tRNA's calculated 5' and 3' ends (which in situ constitutes the amino-acyl stem region). The program achieves greater than 96% accuracy when run against known tRNA sequences in the GenBank database. The program is modular and is readily modified to allow searching either a file or a database. The program is written in "C" and operates on a D.E.C. VAX 750. The utility of the algorithm is demonstrated by the identification of a distinctive tRNA structure in an intron of a published bovine hemoglobin gene.
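
The confirmation step, checking for complementary pairing between the candidate's 5' and 3' ends, can be sketched as below. This is an illustration under stated assumptions rather than the published algorithm: the stem length of 7, the threshold of 6 pairs, and the decision to accept G-T (standing in for the G-U wobble pair) are all choices made here, and the function names are hypothetical.

```python
# Watson-Crick pairs plus G-T, which stands in for the G-U wobble pair
# when the search runs over DNA (an assumption of this sketch).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def stem_pairs(five_prime: str, three_prime: str) -> int:
    """Count base pairs between two strands read in antiparallel order."""
    return sum((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


def looks_like_acceptor_stem(seq: str, start: int, end: int,
                             stem_len: int = 7, min_pairs: int = 6) -> bool:
    """Test whether seq[start:end] has paired ends.

    start and end are the calculated 5' and 3' boundaries of the stem,
    given as a half-open slice of the sequence.
    """
    five = seq[start:start + stem_len]
    three = seq[end - stem_len:end]
    return stem_pairs(five, three) >= min_pairs
```

Allowing one mismatch by default keeps the check tolerant of the imperfect stems seen in real sequences; tightening `min_pairs` trades sensitivity for specificity.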

17.
We used site-directed mutagenesis to determine the minimum number of PDGF B residues needed to convert PDGF A to a potently transforming PDGF B-like molecule. Substitution of two PDGF B subdomains, 106-115 and 135-144, was found to be critical. These substitutions were sufficient to broaden the ability of PDGF A to activate beta as well as alpha platelet-derived growth factor (PDGF) receptors and increase its transforming efficiency to that of PDGF B. Within subdomain I, either PDGF B residues Arg-109 and Asn-115 or Arg-109, Leu-110, and Arg-113, in combination with subdomain II PDGF B residues Asn-136, Arg-137, and Arg-142, were identified as being essential. Those mutants with transforming ability comparable with PDGF B showed significantly lower efficiencies of beta receptor triggering. Thus, our studies identify a small number of PDGF B amino acids indispensable for beta PDGF receptor interaction and suggest that a low level of beta PDGF receptor activation is sufficient to dramatically increase PDGF transforming efficiency in NIH 3T3 cells.

18.
The program, which is written in FORTRAN, estimates haplotype frequencies in two-locus and three-locus genetic systems from population diploid data. It is based on the gene counting method, which leads to maximum likelihood estimates, and can be used whenever the possible antigens (one or more) on each chromosome can be specified for each person and for each locus, i.e., ABO-like systems and inclusions are permitted. The number of alleles per locus may be rather large, and both grouped and ungrouped data can be used. Log likelihoods are calculated on the basis of various assumptions, so that likelihood ratio tests can be carried out.
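
Gene counting can be illustrated on the smallest case: two loci, each with two codominant alleles (A/a and B/b). Only double heterozygotes are ambiguous, since they may carry either AB/ab or Ab/aB, so each iteration splits them according to the current frequency estimates and then recounts. The sketch below covers only this reduced case, not the multi-allele, three-locus or ABO-like situations the program handles, and all names in it are hypothetical.

```python
HAPS = ("AB", "Ab", "aB", "ab")

# Genotype (copies of A, copies of B) -> the haplotype pair it implies.
# The double heterozygote (1, 1) is ambiguous and handled separately.
UNAMBIGUOUS = {
    (2, 2): ("AB", "AB"), (2, 1): ("AB", "Ab"), (2, 0): ("Ab", "Ab"),
    (1, 2): ("AB", "aB"), (1, 0): ("Ab", "ab"),
    (0, 2): ("aB", "aB"), (0, 1): ("aB", "ab"), (0, 0): ("ab", "ab"),
}


def gene_counting(counts: dict, iterations: int = 200) -> dict:
    """Estimate haplotype frequencies from genotype counts.

    counts maps (copies of A, copies of B) to a number of individuals.
    """
    total = 2 * sum(counts.values())      # two haplotypes per person
    fixed = dict.fromkeys(HAPS, 0.0)      # counts that need no guessing
    for genotype, (h1, h2) in UNAMBIGUOUS.items():
        n = counts.get(genotype, 0)
        fixed[h1] += n
        fixed[h2] += n
    n_dh = counts.get((1, 1), 0)          # double heterozygotes
    freq = dict.fromkeys(HAPS, 0.25)      # uniform starting point
    for _ in range(iterations):
        # Expected share of double heterozygotes carrying AB/ab.
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        share = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = dict(fixed)
        expected["AB"] += n_dh * share
        expected["ab"] += n_dh * share
        expected["Ab"] += n_dh * (1 - share)
        expected["aB"] += n_dh * (1 - share)
        # Recount: new frequencies are the expected haplotype counts.
        freq = {h: expected[h] / total for h in HAPS}
    return freq
```

A fixed iteration count keeps the sketch short; a fuller version would stop when the log likelihood changes by less than a tolerance, which also yields the value needed for likelihood ratio tests.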

19.
It has been suggested that the physician would be aided in his interpretation of clinical laboratory results if the data were reported in terms of percentile ranking in a reference population as well as in conventional units. This paper describes a FORTRAN program for estimating the necessary reference percentile values by means of regression over 20% of the distribution rather than by interpolation between adjacent points. The regression estimates are shown to have a negligible bias and to be somewhat more stable than those obtained by interpolation.

20.