Similar literature
 Found 20 similar documents; search took 9 ms
1.
The programs offer the possibility of comparing pairs of homologous sequences in order to find the percentage of homology, the number of identical and deviating nucleotides and of transitions and transversions and, derived from these, KNUC values according to Kimura (1) and the corresponding standard error sigmaK. The sequences can be printed in pairs underneath each other; homologies are indicated by asterisks between the identical nucleotides. Out of a set of homologous sequences stored on a disk, any number of sequences can be compared in pairs in this way, and a matrix containing either the percentage-of-homology values, the number of deviating nucleotides or the KNUC values together with the corresponding standard errors can be sent to screen, printer or disk. A program will be available soon which creates a dendrogram representing the similarity between the sequences by use of an average linkage clustering method deduced from this matrix. The programs are written for Apple II computers using UCSD-PASCAL and for Sirius I/Victor 9000 computers using TURBO-PASCAL.
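
The pairwise comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the original PASCAL code: it assumes that "KNUC according to Kimura" refers to Kimura's two-parameter distance, K = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the fractions of transitions and transversions, and that the sequences are already aligned, ungapped and contain only A, C, G and T. The function name `compare_pair` and the returned field names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_pair(seq_a, seq_b):
    """Compare two pre-aligned, ungapped nucleotide sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(seq_a)
    transitions = 0
    transversions = 0
    marks = []  # one asterisk per identical position, as in the printed output
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            marks.append("*")
            continue
        marks.append(" ")
        pair = {a, b}
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / n
    q = transversions / n
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the two-parameter estimate")
    # Assumed formula: Kimura two-parameter distance and its standard error.
    k = max(0.0, -0.5 * math.log(w1 * math.sqrt(w2)))
    a_coef = 1.0 / w1
    b_coef = 0.5 * (1.0 / w1 + 1.0 / w2)
    variance = (a_coef ** 2 * p + b_coef ** 2 * q - (a_coef * p + b_coef * q) ** 2) / n
    return {
        "identical": n - transitions - transversions,
        "transitions": transitions,
        "transversions": transversions,
        "percent_homology": 100.0 * (n - transitions - transversions) / n,
        "k": k,
        "sigma_k": math.sqrt(max(0.0, variance)),
        "marks": "".join(marks),
    }
```

Calling `compare_pair` on every pair from a stored set would fill the matrix of homology percentages, differences or K values that the abstract mentions.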

2.
3.
4.
The program ‘MacStAn’ for the Apple Macintosh generates random sequences and can analyze their tendency to form secondary structure or translation products as well as their mono-, di- and trinucleotide composition. Generation of random sequences is versatile in that one can (i) predefine the G + C content, maximal base repetitions and constant regions; (ii) preset the entire dinucleotide composition; or (iii) shuffle an existing sequence. The program constitutes an integrated package with a graphical user interface, full-featured editing, saving, printing, text import and export, dot plot and sequence alignment.
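
Two of the generation modes listed above, a predefined G + C content and shuffling of an existing sequence, together with k-mer composition counting, can be sketched in a few lines. This is an illustrative sketch only; the function names are hypothetical, and it does not implement the limits on base repetitions, constant regions or preset dinucleotide composition that the abstract also mentions.

```python
import random
from collections import Counter


def random_sequence(length, gc_content, rng):
    """Draw each base independently; it is G or C with probability gc_content."""
    bases = []
    for _ in range(length):
        if rng.random() < gc_content:
            bases.append(rng.choice("GC"))
        else:
            bases.append(rng.choice("AT"))
    return "".join(bases)


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq, preserving its base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def composition(seq, k):
    """Count overlapping k-mers (k=1 mono-, k=2 di-, k=3 trinucleotides)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

Passing a seeded `random.Random` instance as `rng` makes the generated sequences reproducible.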

5.
DNA Strider is a new integrated DNA and Protein sequence analysis program written in the C language for the Macintosh Plus, SE and II computers. It has been designed as an easy to learn and use program as well as a fast and efficient tool for day-to-day sequence analysis work. The program consists of a multi-window sequence editor and of various DNA and Protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can simultaneously handle 6 sequences of any type, up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated, such as a graphic restriction map, a hydrophobicity profile and the CAI plot (codon adaptation index according to Sharp and Li). The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can therefore be computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds, and its multiline restriction map obtained in a scrolling window within 5 seconds.

6.
The rapidly growing body of sequenced DNA demands efficient computer programs for its analysis and storage. The program described in this paper, SEQ-ED, has been designed to handle a large number of DNA sequences up to 200 kilobases [kb] long stored in a sequence library. In order to minimize the required storage space, the sequences are stored in a compressed format using three binary digits per base. In the development of this program, special care has been given to make it easy to use for molecular biologists without any previous computer experience. Received on September 10, 1984; accepted on October 30, 1984
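
Storing a sequence with three binary digits per base, as described above, might look like the following sketch. The abstract does not give the actual SEQ-ED code table or file layout, so the assignment of codes (including the use of a fifth symbol N) and the function names `pack` and `unpack` are assumptions made purely for illustration.

```python
# Hypothetical 3-bit code table; three bits leave room for up to 8 symbols.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
BASES = {code: base for base, code in CODES.items()}


def pack(seq):
    """Pack a sequence into bytes using 3 bits per base.

    Returns (number_of_bases, packed_bytes); the length is needed to unpack.
    """
    bits = 0
    for base in seq:
        bits = (bits << 3) | CODES[base]
    n_bytes = (3 * len(seq) + 7) // 8  # round up to whole bytes
    return len(seq), bits.to_bytes(n_bytes, "big")


def unpack(length, data):
    """Recover the original sequence from its packed form."""
    bits = int.from_bytes(data, "big")
    bases = []
    for i in range(length):
        shift = 3 * (length - 1 - i)
        bases.append(BASES[(bits >> shift) & 0b111])
    return "".join(bases)
```

At 3 bits per base, eight bases fit in three bytes instead of the eight bytes needed for one character per base.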

7.
8.
Summary: A method for detecting homology between two protein or nucleic acid sequences which require insertions or deletions for optimum alignment has been devised for use with a computer. Sequences are assessed for possible relationship by Monte Carlo methods involving comparisons between the alignment of the real sequences and alignments of randomly scrambled sequences of the same composition as the real sequences, each alignment having the optimum number of gaps. As each gap is successively introduced into a comparison (real or random) a maximum score is determined from the similarity of the aligned residues. From the distribution of the maximum alignment scores of randomly scrambled sequences having the same number of gaps, the percentage of random comparisons having higher scores is determined, and the smallest of these percentage levels for each pair of sequences (real or random) indicates the optimum alignment. The fraction of the comparisons of random sequences having percentage levels at their optimum alignment below that of the real sequence comparison at its optimum estimates the probability that such an alignment might have arisen by chance. Related sequences are detected since their optimum alignment score, by virtue of a contribution from ancestral homology in addition to optimised random considerations, occupies a more extreme position in the appropriate frequency distribution of scores than do the majority of optimum scores of randomly scrambled sequences in their appropriate distributions. Application of this optimum match method of sequence comparison shows that the sensitivity of the maximum match method of Needleman and Wunsch (1970) decreases quite dramatically with sequence comparisons which require only a few gaps for a reasonable alignment, or when sequences differ greatly in length.
The maximum match method as applied by Barker and Dayhoff (1972) has the additional disadvantage that deletions which have occurred in the longer of two homologous protein sequences further decrease the sensitivity of detection of relationship. The constrained match method of Sankoff and Cedergren (1973) is seen to be misleading since large increments in the alignment score from added gaps do not necessarily result in a high total alignment score required to demonstrate sequence homology.
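
The core Monte Carlo idea in this abstract, comparing the real alignment score with scores from scrambled sequences of the same composition, can be sketched as below. This is a deliberately simplified illustration and not the published procedure: it uses a single global alignment score with a fixed linear gap penalty instead of evaluating each number of gaps separately, and the scoring values and function names are assumptions.

```python
import random


def nw_score(a, b, match=1, mismatch=0, gap=-1):
    """Global (Needleman-Wunsch style) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def scrambled(seq, rng):
    """Randomly permute seq, keeping its composition unchanged."""
    items = list(seq)
    rng.shuffle(items)
    return "".join(items)


def chance_fraction(a, b, trials, rng):
    """Fraction of scrambled comparisons scoring at least as high as the real one."""
    real = nw_score(a, b)
    hits = 0
    for _ in range(trials):
        if nw_score(scrambled(a, rng), scrambled(b, rng)) >= real:
            hits += 1
    return real, hits / trials
```

A small fraction suggests that the real alignment score is unlikely to arise by chance from sequences of the same composition.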

9.
10.
A program was written in GFA-BASIC for the Atari ST microcomputer aimed at drawing two-dimensional homology ‘dotplot’ patterns for two protein or DNA sequences. The program, built around a machine-code subroutine, communicates interactively with the user by means of a multi-button dialogue panel and mouse-directed input. A 1000 x 1000 sequence comparison with a 14:21 stringency window takes 12 s.
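
A windowed dot plot of the kind described can be sketched as follows. The abstract does not define the "14:21 stringency window"; this sketch assumes it means that a dot is drawn when at least 14 of 21 consecutive positions match. The function names are hypothetical, and the naive triple loop is for clarity rather than speed.

```python
def dotplot(seq_a, seq_b, window, min_matches):
    """Return (i, j) window start positions whose windows share >= min_matches identities."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.append((i, j))
    return dots


def render(dots, rows, cols):
    """Draw the dots as lines of text, one row per position in the first sequence."""
    grid = [["."] * cols for _ in range(rows)]
    for i, j in dots:
        grid[i][j] = "*"
    return ["".join(row) for row in grid]
```

Under the assumption above, the comparison in the abstract would correspond to `dotplot(a, b, window=21, min_matches=14)`; related regions show up as diagonal runs of dots.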

11.
MOTIVATION: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283, Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to protein sequence clustering. Here we present several new programs using the same algorithm, including cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database and cd-hit-est-2d compares two nucleotide datasets. All these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on the popular sequence comparison and database search tools, such as BLAST.

12.
13.

Background  

A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications.

14.
Presently the sequences of more than 150 different kinds of proteins and nucleic acids are known from the many thousands thought to exist in all living creatures. Some few of these have occupied much the same functional niche within the living cell from near the beginning of life. In three of these latter, sequence evidence pointing to duplications of genetic material in a primitive ancestor is available and in the fourth other evidence suggests it. Such a duplication, shared by the many descendant species, permits us to locate the point of earliest time on an evolutionary tree and to infer the actual order of subsequent evolutionary events. The amounts of change which have occurred in each descendant line can be estimated with good confidence. Some inferences can be made of the structure of the ancestral duplicated sequence, the evolutionary mechanisms which have been operative on it, and the functional capacity of the organism in which it originated. We will describe new, sensitive, objective methods for establishing the probable common ancestry of very distantly related sequences and the quantitative evolutionary change which has taken place. These methods will be applied to the four families, and evolutionary trees will be derived where possible. Of the three families containing duplications of genetic material, two are nucleic acids: transfer RNA and 5S ribosomal RNA. Both of these structures are functional in the synthesis of coded proteins, and prototypes must have been present in the cell at the inception of the fundamental coding process that all living things share. There are many types of tRNA which recognise the various nucleotide triplets and the 20 amino acids. These types are thought to have arisen as a result of many gene duplications. Relationships among these types will be discussed.
The 5S ribosomal RNA, presently functional in both eukaryotes and prokaryotes, is very likely descended from an early form incorporating almost a complete duplication of genetic material. The amount of evolution in the various lines can again be compared. The other two families containing duplications are proteins: ferredoxin and cytochrome c. Ferredoxin from photosynthetic and nonphotosynthetic bacteria shows clear evidence of a duplication of genetic material. This duplication is very possibly shared by the ferredoxin from plant plastids and the related adrenodoxin from mammalian mitochondria. If so, a chronology of the details of evolution of these groups can be inferred. From these examples of protein and nucleic acid sequence, we conclude that the amount of change in the bacterial lines is less than that in the eukaryote lines. Even though mutant bacteria are easily produced in the laboratory, though their evolutionary adaptation to new drugs is very rapid, and though new virulent strains often appear spontaneously, nevertheless the sequences of ancient structures in the wild types have changed less than those in the eukaryote lines. Cytochrome c sequences from many eukaryotes and the closely related cytochrome c2 from Rhodospirillum rubrum are known. Other types of cytochrome, such as c551 and c553, are probably related to these through gene duplication. Knowledge of enough of these structures to establish an early duplication will provide a time orientation for the cytochrome c evolutionary tree. This quantitative tree now contains sequences from animals, fungi, green plants, protozoa, and bacteria, examples from all five biological kingdoms.

15.
16.
17.
We have developed a collection of programs for manipulation and analysis of nucleotide and protein sequences. The package was written in Fortran 77 on a Sirius1/Victor microcomputer and can be easily implemented on a large variety of other computers. Some of the programs have already been adapted for use on a Vax 11. Our aim was to develop programs consisting of small, comprehensible and well documented units that have very fast execution times and are comfortably interactive. The package is therefore suitable for individual modifications, even with little understanding of computer languages.

18.
One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, it had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes.

19.
EZ-FIT, an interactive microcomputer software package, has been developed for the analysis of enzyme kinetic and equilibrium binding data. EZ-FIT was designed as a user-friendly menu-driven package that has the facility for data entry, editing, and filing. Data input permits the conversion of cpm, dpm, or optical density to molar per minute per milligram protein. Data can be fit to any of 14 model equations including Michaelis-Menten, Hill, isoenzyme, inhibition, dual substrate, agonist, antagonist, and modified integrated Michaelis-Menten. The program uses the Nelder-Mead simplex and Marquardt nonlinear regression algorithms sequentially. A report of the results includes the parameter estimates with standard errors, a Student t test to determine the accuracy of the parameter values, a Runs statistic test of the residuals, identification of outlying data, an Akaike information criterion test for goodness-of-fit, and, when the experimental variance is included, a chi-square statistic test for goodness-of-fit. Several different graphs can be displayed: an X-Y, a Scatchard, an Eadie-Hofstee, a Lineweaver-Burk, a semilogarithmic, and a residual plot. A data analysis report and graphs are designed to evaluate the goodness-of-fit of the data to a particular model.
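
As a small illustration of the Michaelis-Menten model and the Lineweaver-Burk plot named above, the sketch below estimates Vmax and Km by ordinary linear regression on the double-reciprocal transform, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax. Note that this is not the fitting method the abstract describes: EZ-FIT is stated to use Nelder-Mead simplex followed by Marquardt nonlinear regression, and the linearised fit is swapped in here only because it fits in a few dependency-free lines. The function names are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Reaction velocity at substrate concentration s."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate, velocity):
    """Estimate (vmax, km) from a least-squares line through (1/s, 1/v)."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in velocity]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals km / vmax
    intercept = mean_y - slope * mean_x   # equals 1 / vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km
```

With noisy data the reciprocal transform distorts the error structure, which is one reason a nonlinear fit such as the one the abstract describes is generally preferred for the final estimates.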

20.
Citrinin lowered contents of chlorophyll, carotenoids, proteins and nucleic acids during seed germination of maize cv. Suwan composite. The inhibitory effect was concentration dependent. Acknowledgements: The authors are thankful to Head, University Department of Botany, Bhagalpur University for providing laboratory facilities and to Prof. J.V.V. Dogra for his help in gel electrophoresis. One of the authors (GP) is also thankful to the CSIR, New Delhi for financial assistance (Project No. 9/24/(17)EMR-I).
