Sequence Analysis Tools


Shows a graphical overview of hydrophobicity and side chain volume along a protein sequence.
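
A hydrophobicity profile of this kind is usually a sliding-window average over a per-residue scale. The sketch below is only an illustration: the Kyte-Doolittle scale, the function name hydropathy_profile and the default window size are assumptions, since the description does not say which scale or window the tool uses.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the tool's own scale is not stated).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along seq."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # A KeyError here signals a non-standard residue code.
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical example sequence, printed as window start and mean value.
    for start, value in enumerate(hydropathy_profile("MKTLLLAVAVIGGSDDKKE", window=5)):
        print(start, round(value, 2))
```

Plotting these values against the window position gives the kind of overview described above. A side chain volume profile could be computed the same way with a different per-residue table.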



This server compares a single sequence (COILS) or a sequence alignment (PCOILS) to a database of known coiled coils and derives a similarity score. From this score, the program calculates the probability that the sequence adopts a coiled-coil conformation. An external COILS server is also available at ISREC.



Prediction of functional residues in protein multiple sequence alignments.



De novo repeat identification in protein sequences by HMM-HMM comparison. HHrep is extremely sensitive in detecting very diverged repeat units but also handles nearly identical repeats. It builds a profile HMM from the query sequence by several rounds of PSI-BLAST and compares this HMM with itself. HHrep can also exploit the redundancy in a set of pairwise alignments ("transitivity") with the help of its "Merge alignments" function. It returns a profile-profile dotplot for immediate visual identification of the repeat structure, as well as pairwise self-alignments and an alignment of the repeat units.



HHrepID is a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that were not previously known to possess internal sequence symmetry, such as TIM barrels and outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to HHrep, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different repeats within complicated architectures or multiple domains through automatic domain boundary detection; and (5) has improved sensitivity through a new approach to assessing statistical significance.



MARCOIL is a hidden Markov model-based program that predicts the existence and location of potential coiled-coil domains in protein sequences. An external MARCOIL homepage is also available.



REPPER is a server that detects and analyses regions with short gapless REPeats in protein sequences. It finds PERiodicities by Fourier Transform (FTwin) and internal homology analysis (REPwin). The exact locations of the periodic patterns are traced by a sliding window. The output is complemented by coiled coil prediction (COILS) and optionally by secondary structure prediction according to Jones (PSIPRED).
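
The idea of finding periodicities by Fourier transform can be sketched as follows. This is not the FTwin implementation: the binary hydrophobic/non-hydrophobic encoding, the residue set HYDROPHOBIC, the scanned period range and all function names are assumptions made for illustration only.

```python
import cmath
import math

# Assumed set of hydrophobic residues; REPPER's own encoding is not described here.
HYDROPHOBIC = set("AILMFVWY")


def periodicity_spectrum(seq, periods):
    """Return {period: power} for a binary hydrophobicity signal of seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    signal = [1.0 if aa in HYDROPHOBIC else 0.0 for aa in seq.upper()]
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]  # remove the constant component
    spectrum = {}
    for period in periods:
        # Fourier coefficient evaluated at frequency 1/period.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * j / period)
            for j, x in enumerate(centered)
        )
        spectrum[period] = abs(coeff) ** 2
    return spectrum


def dominant_period(seq, min_period=2.0, max_period=10.0, step=0.1):
    """Return the scanned period with the highest spectral power."""
    n_steps = int(round((max_period - min_period) / step))
    periods = [round(min_period + i * step, 6) for i in range(n_steps + 1)]
    spectrum = periodicity_spectrum(seq, periods)
    return max(spectrum, key=spectrum.get)


if __name__ == "__main__":
    # Idealized heptad pattern with hydrophobic residues at positions a and d.
    print(dominant_period("LKELEKK" * 6))
```

For this idealized heptad pattern the strongest peak falls at 3.5 residues (7/2), the periodicity typically associated with coiled coils. Applying the same calculation inside a sliding window would localize where along the sequence the periodic pattern occurs.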



TPRpred uses profile representations of known repeats to detect Tetratrico Peptide Repeats (TPRs), Pentatrico Peptide Repeats (PPRs) and SEL1-like repeats in the query sequence and computes the statistical significance of their occurrence.


External servers for predicting functional sites


Identification of functionally and structurally important residues in protein sequences. This server scores residues by a combination of evolutionary conservation and solvent accessibility prediction. (No structure required.)
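
As a rough illustration of the evolutionary conservation component only, the sketch below scores each alignment column by normalized Shannon entropy. This is a simple stand-in and not the scoring method of the server described above; the function name column_conservation and the gap handling are assumptions, and solvent accessibility prediction is not covered.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each column of an MSA: 1.0 = invariant, lower = more variable."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # entropy of 20 equally frequent amino acids
    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]  # ignore gaps (assumption)
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = sum(
            -(count / total) * math.log2(count / total)
            for count in Counter(residues).values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment: column 0 is invariant, column 2 fully variable.
    print(column_conservation(["ACD", "ACE", "AGF"]))
```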



Identification of functionally important residues in protein structures. Very similar to ConSeq, but the scores are also mapped onto the 3D protein structure.



Identification of functionally important residues in protein structures. Like ConSurf, the "Evolutionary Trace" method calculates a score from a multiple sequence alignment (MSA) and maps the scores onto the 3D protein structure. The score, however, combines conservation with a subtyping score: columns that are conserved only WITHIN some clades of the phylogenetic tree underlying the MSA count as partially conserved and also tend to get good scores. (These residues are thought to specify 'subtypes' within the family of homologs.)
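
The distinction between globally conserved columns and columns conserved only within clades can be illustrated with a toy classifier. This is a simplified sketch and not the Evolutionary Trace scoring scheme: the clade labels are supplied by hand rather than derived from a phylogenetic tree, and the function name classify_columns and the three labels are assumptions.

```python
def classify_columns(alignment, clades):
    """Label each MSA column as 'conserved', 'clade-specific' or 'variable'.

    alignment: list of equal-length aligned sequences.
    clades: one clade label per sequence (hand-assigned in this sketch).
    """
    if len(alignment) != len(clades):
        raise ValueError("need exactly one clade label per sequence")
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for column in zip(*alignment):
        if len(set(column)) == 1:
            labels.append("conserved")  # same residue in every sequence
            continue
        residues_by_clade = {}
        for aa, clade in zip(column, clades):
            residues_by_clade.setdefault(clade, set()).add(aa)
        if all(len(residues) == 1 for residues in residues_by_clade.values()):
            labels.append("clade-specific")  # invariant within each clade
        else:
            labels.append("variable")
    return labels


if __name__ == "__main__":
    # Hypothetical toy alignment with two clades, "a" and "b".
    msa = ["GKA", "GKC", "GRD", "GRE"]
    print(classify_columns(msa, ["a", "a", "b", "b"]))
```

In the toy alignment, the second column (K in clade "a", R in clade "b") is the kind of position that would count as partially conserved and could mark a subtype.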


External function prediction servers


Gene/protein annotation tool that can find the gene cluster a gene/protein belongs to, find similar clusters in other organisms (with similar genes highlighted in similar colors), provide a gene's upstream and downstream regions, and much more. For an introductory tutorial, click "FIG tutorials" and then "Why Use the SEED".



Predict functional and physical associations (protein-protein interactions) between proteins. Underlying STRING is a database of protein-protein associations from diverse sources: genomic context, high-throughput experiments, coexpression, and literature mining.



Predict the subcellular localization of a protein based on its amino acid sequence (e.g. cytosol, nucleus, mitochondria). Predictions are based on both known sorting signal motifs and some correlative sequence features such as amino acid content. The overall prediction accuracy is claimed to be over 80%.


External de-novo repeat detection servers


Fast automatic server for de novo repeat detection that determines optimal repeat boundaries and uses iterative alignment of the query protein sequence to the repeat profile(s). Good for short composition-biased repeats and complex repeat architectures involving many different types of repeats.



Tracking Repeats Using Significance and Transitivity. De novo repeat detection server based on sequence-sequence comparison that is able to exploit the information from transitivity. Fast.



De novo repeat detection from the same group as TRUST.


Internal Repeat Finder

De novo repeat detection by sequence-sequence self-alignment.
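
The principle of sequence-sequence self-alignment can be shown with a deliberately minimal sketch: compare the sequence with shifted copies of itself and look for the shift with the most identical residue pairs. This is gapless and identity-based, unlike real repeat finders, which typically use substitution scores, gaps and significance estimates; the function names are assumptions.

```python
def self_match_by_offset(seq, min_offset=2):
    """Return {offset: fraction of identical pairs seq[i] == seq[i + offset]}."""
    seq = seq.upper()
    scores = {}
    for offset in range(min_offset, len(seq) // 2 + 1):
        pairs = len(seq) - offset
        matches = sum(1 for i in range(pairs) if seq[i] == seq[i + offset])
        scores[offset] = matches / pairs
    return scores


def best_offset(seq, min_offset=2):
    """Return the shift with the highest identity fraction (a candidate repeat length)."""
    scores = self_match_by_offset(seq, min_offset)
    if not scores:
        raise ValueError("sequence too short for the requested offsets")
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical sequence built from three exact copies of an 8-residue unit.
    print(best_offset("MKTAYIAG" * 3))
```

Each offset corresponds to one off-diagonal of a self-comparison dotplot, so a high score at a given offset is the numeric counterpart of a visible diagonal line.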