The Robetta server provides automated
computational tools for predicting and analyzing protein structures
Protein identification
SWISS-PROT or TrEMBL : find the
complete amino acid sequence of a protein given its ID
AACompIdent : identify
a protein by its amino acid composition
MultiIdent : identify
proteins with pI, Mw, amino acid composition, sequence tag and peptide
mass fingerprinting data
PeptIdent : identify
proteins using peptide mass fingerprinting data and experimentally measured
pI and Mw; user-specified peptide masses are compared with the theoretical
peptides calculated for all proteins in SWISS-PROT, making extensive use
of database annotations
TagIdent : identify
proteins with pI, Mw and sequence tag, or generate a list of proteins close
to a given pI and Mw
FindPept : identify
peptides that result from unspecific cleavage of proteins from their experimental
masses, taking into account artefactual chemical modifications, post-translational
modifications (PTM) and protease autolytic cleavage
PepMAPPER : peptide mass
fingerprinting tool from UMIST, UK
Mascot
: peptide mass fingerprint, sequence query and MS/MS ion search from Matrix
Science Ltd., London
PepSea
: protein identification by peptide mapping or peptide sequencing from
Protana, Denmark
PeptideSearch
: peptide mass fingerprint tool from EMBL Heidelberg
ProteinProspector : a variety
of tools from UCSF (MS-Fit, MS-Tag, MS-Digest, etc.) for mining sequence
databases in conjunction with mass spectrometry experiments [Mirrors at
Joint ProteomicS Laboratory, Ludwig
Institute Melbourne (Australia)]
PROWL : protein chemistry and
mass spectrometry resource from Rockefeller and NY Universities
CombSearch : an experimental
unified interface to query several protein identification tools accessible
on the web
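Peptide mass fingerprinting tools such as PeptIdent and Mascot compare measured peptide masses with the masses of theoretical peptides from an in silico digest. A minimal sketch of that comparison, assuming trypsin as the protease (cleavage after Lys or Arg unless the next residue is Pro), no missed cleavages, no modifications and singly protonated [M+H]+ ions; the function names and the 0.02 Da tolerance are illustrative assumptions, not parameters of any of these tools:

```python
# Monoisotopic residue masses in Da (free amino acid minus one water).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O
PROTON = 1.00728   # mass of a proton


def tryptic_digest(seq):
    """Cleave after K or R unless followed by P (simplified trypsin rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide without a final K/R
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def match_masses(observed, seq, tol=0.02):
    """Return (observed mass, peptide) pairs whose theoretical [M+H]+ lies within tol Da."""
    theoretical = [(mh_plus(p), p) for p in tryptic_digest(seq)]
    return [(m, p) for m in observed for mass, p in theoretical if abs(m - mass) <= tol]
```

Real search engines also handle missed cleavages and modifications and score matches statistically; this sketch shows only the mass comparison itself.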
Primary structure analysis
ProtParam : physico-chemical
parameters of a protein sequence (amino-acid and atomic compositions, pI,
extinction coefficient, etc.)
Compute pI/Mw : compute
the theoretical pI and Mw from a SWISS-PROT or TrEMBL entry
or for a user sequence
DrawHCA : draw
an HCA (Hydrophobic Cluster Analysis) plot of a protein sequence [mirror]
Structural Alignment Program for Proteins (StrAP)
by Christoph Gille, Group for Protein Structure Theory, Institute for Biochemistry,
Medical School Charité of the Humboldt University Berlin
FindMod : predict potential
protein post-translational modifications and potential single amino acid
substitutions in peptides. Experimentally measured peptide masses are compared
with the theoretical peptides calculated from a specified SWISS-PROT entry
or from a user-entered sequence, and mass differences are used to better
characterize the protein of interest.
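ProtParam and Compute pI/Mw derive such parameters from the sequence alone. As a hedged sketch: the average molecular weight can be taken as the sum of average residue masses plus one water, and the molar extinction coefficient at 280 nm estimated from Trp, Tyr and cystine counts using the per-residue coefficients ProtParam documents (5500, 1490 and 125 M^-1 cm^-1). The residue table and function names are illustrative assumptions; pI is not attempted because it depends on the chosen pKa set.

```python
# Average residue masses in Da (free amino acid minus one water), as commonly tabulated.
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01524  # average mass of H2O


def average_mw(seq):
    """Average molecular weight (Da) of an unmodified polypeptide: residues plus one water."""
    return sum(AVG[aa] for aa in seq.upper()) + WATER_AVG


def extinction_280(seq, all_cys_paired=True):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    If all_cys_paired is True, every pair of Cys is counted as one cystine;
    otherwise all Cys are assumed reduced and contribute nothing.
    """
    seq = seq.upper()
    cystines = seq.count("C") // 2 if all_cys_paired else 0
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * cystines
```

Reporting both values of the flag mirrors the two extinction coefficients (all Cys paired vs. all Cys reduced) that such tools commonly list.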
Glycosylation
NetOGlyc 3.1 Server
: prediction of mucin-type O-glycosylation sites in mammalian proteins
NetNGlyc 1.0 Server
: prediction of N-glycosylation sites in human proteins
DGPI : prediction of GPI-anchor
and cleavage sites [Mirror site]
GlycoMod : predict possible
oligosaccharide structures that occur on proteins from their experimentally
determined masses (can be used for free or derivatized oligosaccharides
and for glycopeptides)
GlycanMass
: calculate the mass of an oligosaccharide structure
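NetNGlyc predicts N-glycosylation sites by examining the sequence context of Asn-Xaa-Ser/Thr sequons with neural networks. A sequon (with Xaa not Pro) is required but not sufficient for N-glycosylation, so locating sequons is only the first step. A minimal sketch that lists candidate sequons and makes no prediction about whether they are actually glycosylated; the pattern constant and function name are illustrative assumptions:

```python
import re

# N-X-S/T with X not proline; the lookahead keeps overlapping sequons (e.g. "NNST").
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
```

For example, `find_sequons("NNST")` reports both Asn residues, because each starts its own sequon.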
Phosphorylation
NetPhos : prediction
of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins
Jpred : a consensus
method for protein secondary structure prediction at EBI
nnPredict
: University of California at San Francisco (UCSF)
PredictProtein
: PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from
Columbia University
Coils :
prediction of coiled coil regions in proteins (Lupas's method) at EMBnet-CH
Paircoil : prediction
of coiled coil regions in proteins (Berger's method)
Multicoil : prediction
of 2- and 3-stranded coiled coils
PREDATOR
: protein secondary structure prediction from single or multiple sequences
at EMBL (Argos' group)
PSA : BioMolecular
Engineering Research Center (BMERC) / Boston
PSIpred : various protein
structure prediction methods at Brunel University
ProTherm
: Thermodynamic Database for wild-type and mutant proteins: Gibbs free
energy, enthalpy, heat capacity, transition temperature, secondary structure,
accessibility of wild type residues, experimental conditions (pH, temperature
etc.), measurements and methods used for each entry, and activity information
(Km and kcat etc.)
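Several of the quantities ProTherm stores (transition temperature, enthalpy, heat capacity and Gibbs free energy of unfolding) are linked by the Gibbs-Helmholtz relation. A sketch, assuming a reversible two-state transition and a temperature-independent heat capacity change; the function name and example numbers are illustrative and are not ProTherm data:

```python
import math


def unfolding_dg(t, tm, dh_m, dcp):
    """Gibbs free energy of unfolding (kJ/mol) at temperature t (K).

    tm   : transition midpoint (K), where the free energy of unfolding is zero
    dh_m : unfolding enthalpy at tm (kJ/mol)
    dcp  : heat capacity change on unfolding (kJ/mol/K), assumed constant

    dG(T) = dH_m * (1 - T/Tm) - dCp * ((Tm - T) + T * ln(T/Tm))
    """
    return dh_m * (1 - t / tm) - dcp * ((tm - t) + t * math.log(t / tm))
```

With Tm = 330 K, dH_m = 400 kJ/mol and dCp = 6 kJ/mol/K, the function returns 0 at 330 K and about 29 kJ/mol at 298.15 K.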
Quantifying the similarity of the 3D structures of a pair of proteins
Dali e-mail server at EMBL-EBI
computes the distances between corresponding locations within each of the proteins
and then compares these distances instead of superimposing coordinates. There is
little point in comparing the exact coordinates of atoms in proteins for structure
comparisons, since proteins are flexible and structures fluctuate about a mean position.
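Distance-based comparison works because intramolecular distances do not change when a structure is rotated or translated, so no superposition is needed. A minimal sketch, assuming the residue correspondence is already known and using a plain mean absolute difference between the two distance matrices; Dali itself searches for the alignment and uses its own weighted similarity score, which this does not reproduce. Function names are illustrative:

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between C-alpha coordinates given as (x, y, z)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_matrix_difference(coords_a, coords_b):
    """Mean absolute difference between the intramolecular distance matrices
    of two structures whose residues are already aligned one-to-one."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of aligned residues")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)
```

Rotating and translating one copy of a structure leaves the score at (numerically) zero, while changing its internal geometry does not.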
PRobability of IDEntity
(PRIDE) between 3D domains (or whole structures), estimated by a probabilistic
approach based on Cα-Cα
distance comparison [ref]
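A simplified, hypothetical analogue of comparing Cα-Cα distance distributions: for each sequence separation n, collect the distances between Cα(i) and Cα(i+n), bin them, and measure how much the two structures' histograms overlap. The published PRIDE method turns such comparisons into a probability with a statistical test, which this sketch does not do; the separation range 3 to 30, the 1 Å bins, the overlap measure and the function names are all assumptions:

```python
import math


def separation_histogram(coords, n, bin_width=1.0, n_bins=60):
    """Normalized histogram of C-alpha(i) to C-alpha(i+n) distances.
    Distances beyond the last bin are collected in the last bin."""
    counts = [0] * n_bins
    pairs = len(coords) - n
    for i in range(pairs):
        d = math.dist(coords[i], coords[i + n])
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    return [c / pairs for c in counts]


def pride_like_similarity(coords_a, coords_b, separations=range(3, 31)):
    """Mean histogram overlap over separations: 1.0 for identical distributions,
    0.0 for completely disjoint ones."""
    scores = []
    for n in separations:
        if len(coords_a) <= n or len(coords_b) <= n:
            continue  # not enough residues for this separation
        ha = separation_histogram(coords_a, n)
        hb = separation_histogram(coords_b, n)
        scores.append(sum(min(x, y) for x, y in zip(ha, hb)))
    if not scores:
        raise ValueError("structures are too short for the requested separations")
    return sum(scores) / len(scores)
```

Because only distances are used, like the Dali-style comparison, the score needs neither a superposition nor a residue-by-residue alignment.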
Another method superimposes each protein on the other and measures how closely they match.
It is not heuristic; instead, it exhaustively computes all the ways each
protein can rotate and translate in space and finds the optimal solutions.
When the algorithm fails to find a good alignment, the researchers say
it is certain that none exists. The geometric constraints that the bonds of
a protein impose on its 3D structure limit the ways it can align, shrinking the
scope of the problem somewhat. In addition, the algorithm could also be
used to tackle the open problem of detecting the multiple 3D conformations
a protein can adopt that may prove equally viable from a biological point
of view. The problem it solves is not NP-complete (non-deterministic
polynomial-time complete; no polynomial-time algorithm is known for such
problems, and known exact algorithms take time that grows exponentially with
the size of the problem, so they quickly become impractical). Rather, it can
be solved in polynomial time, which makes it computationally feasible:
computers could become fast enough to solve it within minutes in 10 or so years.
For now, however, the algorithm is too slow to be a useful everyday tool: analyzing even
a single pair of relatively small proteins took a day on
a multiprocessor machine, whereas common protein structure analysis programs
take at most a few minutes.
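A toy illustration of exhaustive (non-heuristic) search over rigid-body transformations, assuming the residue correspondence is already known. For a fixed correspondence the least-squares optimal translation superimposes the two centroids, so both structures are centered first; then every rotation on a coarse Euler-angle grid is tried and the lowest RMSD is kept. This is not the published algorithm (which also searches over residue correspondences); the grid step and function names are illustrative assumptions. The number of grid points, (360/step)^3, grows polynomially as the step shrinks, which is why finer grids stay feasible but slow.

```python
import math
from itertools import product


def rotation(a, b, c):
    """Rotation matrix Rz(a) @ Ry(b) @ Rx(c), angles in radians."""
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)
    return [
        [ca * cb, ca * sb * sc - sa * cc, ca * sb * cc + sa * sc],
        [sa * cb, sa * sb * sc + ca * cc, sa * sb * cc - ca * sc],
        [-sb, cb * sc, cb * cc],
    ]


def centered(coords):
    """Translate coordinates so that their centroid is at the origin."""
    n = len(coords)
    cx, cy, cz = (sum(p[k] for p in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def rmsd(p, q):
    """Root-mean-square deviation between two equally long coordinate lists."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(p, q)) / len(p))


def grid_search_superposition(coords_a, coords_b, step_deg=30):
    """Try every rotation on the angle grid; return (best RMSD, best Euler angles)."""
    a, b = centered(coords_a), centered(coords_b)
    angles = [math.radians(d) for d in range(0, 360, step_deg)]
    best = (float("inf"), None)
    for ang in product(angles, repeat=3):
        r = rotation(*ang)
        rotated = [tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3)) for p in a]
        score = rmsd(rotated, b)
        if score < best[0]:
            best = (score, ang)
    return best
```

A rotated and translated copy is recovered with near-zero RMSD when its rotation lies on the grid, whereas a mirror image cannot be superimposed by any proper rotation.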
aMAZE is a system for representing
and analyzing molecular interactions and cellular processes
BindingDB is a public,
web-accessible database of measured binding affinities for biomolecules,
genetically or chemically modified biomolecules, and synthetic compounds. The
database currently contains data generated by isothermal titration calorimetry
(ITC) and enzyme inhibition (Enz. Inhib.) methods [ref1, ref2, ref3]
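Measured affinities such as those collected in BindingDB are related by standard thermodynamics rather than anything specific to the database. A sketch of three common conversions, assuming a 1 M standard state and, for the Cheng-Prusoff step, competitive inhibition; the function names are illustrative:

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def dg_from_kd(kd_molar, temp_k=298.15):
    """Standard binding free energy (kJ/mol) from a dissociation constant in M: dG = RT ln Kd."""
    return R_KJ * temp_k * math.log(kd_molar)


def minus_t_ds(dg, dh):
    """Entropic term -T*dS (kJ/mol) from dG and an ITC-measured dH, using dG = dH - T*dS."""
    return dg - dh


def ki_cheng_prusoff(ic50, substrate, km):
    """Ki of a competitive inhibitor from its IC50: Ki = IC50 / (1 + [S]/Km).
    Ki has the units of IC50; substrate and km must share units."""
    return ic50 / (1 + substrate / km)
```

For example, Kd = 1 nM at 298.15 K corresponds to a binding free energy of about -51.4 kJ/mol.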
DisEMBL Protein Disorder Prediction/Predictor
predicts the probability that particular regions within a protein sequence
will be disordered (unstructured). Such unstructured regions frequently
contain short motifs involved in protein-protein interactions and targeting.
Because these regions may also affect protein stability and expression,
DisEMBL may be useful in designing protein constructs
Gene Ontology (GO) Consortium
(the why, what and where of a protein's role in cells)
International
Regulome Consortium (IRC), Ottawa, Canada (including scientists from
Canada, France, the United Kingdom, Singapore, Italy, and the United States)
aims to characterize the protein components of transcriptional complexes
containing all potential transcription factors and to identify and validate
the complete set of their binding sites and corresponding target genes
Minimotif
Miner (MnM) analyzes protein queries for the presence of short functional
motifs that, in at least one protein, have been demonstrated to be involved
in posttranslational modifications, binding to other proteins, nucleic
acids, or small molecules, or protein trafficking [ref1,
ref2]
Prolysis
(proteases, proteasome and protease inhibitors, analyzers for proteolytic
fragments)
Scansite (search for binding motifs
and phosphorylation sites) by Michael B. Yaffe and Lewis Cantley
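Minimotif Miner and Scansite both look for short sequence motifs in a query protein. The simplest form of that idea is a regular-expression scan. The sketch below uses a small, hypothetical motif table (a commonly cited PKA consensus, the PxxP core of many SH3-binding sites and the C-terminal KDEL ER-retention signal) and reports every hit, overlapping ones included; real tools add curated motif libraries, context and scoring that this does not attempt:

```python
import re

# Hypothetical, illustrative motif table: name -> regular expression over one-letter codes.
MOTIFS = {
    "PKA_consensus": r"RR.[ST]",    # commonly cited PKA consensus R-R-x-S/T
    "SH3_PxxP": r"P..P",            # minimal core of many SH3-binding sites
    "ER_retention_KDEL": r"KDEL$",  # C-terminal ER-retention signal
}


def scan_motifs(seq, motifs=MOTIFS):
    """Return (motif name, 1-based start, matched text) for every hit, overlaps included."""
    seq = seq.upper()
    hits = []
    for name, pattern in motifs.items():
        # A capturing group inside a zero-width lookahead lets matches overlap.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits
```

For example, `scan_motifs("MRRASPAAPKDEL")` reports one hit for each of the three motifs.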
SenseLab contains databases
and tools that provide insight into neuronal processes using the olfactory
system as a model. There are 3 neuronal databases and 2 olfactory databases
with tools for analysis. Some of the databases require that the NEURON
simulation software be installed.
Molecular
Interaction Map of Macrophage : 506 reactions and 678 species. The
breakdown of the species shown on this map is as follows: 363 proteins,
15 ions, 135 simple molecules, 113 oligomers, and 39 genes. The species
count also includes 11 degradation products and 2 unknown molecules.
The nucleotides, ROS, carbohydrates, lipids, coenzymes, peptides, and amino
acids are all shown as "simple molecules" in this version. Among 363 protein
species, we identified 281 molecules, that is, 6 G protein subunits, 121
enzymes (including 47 kinases), 40 receptors, 7 ion channels, 40 transcription
factors and their cofactors, 7 transporters, 14 cytokines, and 46 adaptor
proteins. During the construction of the map, there arose unclear cases
for the specific expression of a gene in RAW 264.7 cells as well as for
the specific occurrence of protein-protein interactions. For example, the
cross talk between NF-κB and PPARγ was shown in pluripotent mesenchymal
stem cells, but it is not clear if such cross talk also exists in macrophages.
In addition, there are conflicts among published papers and possible alternative
explanations for certain interactions because of the varied experimental
systems studied. For legacy data that depend fully on published literature,
there are no clear means for making decisions on such cases. Therefore,
we have taken a heuristic approach. First, we ensure that we incorporate
molecules and interactions that are certain to exist and well agreed upon
in the community. This can be determined on the basis of consistency among
research papers as well as numbers of review papers. In cases where several
interacting partners for one protein are reported, priority is given to
those with demonstrated biological activity in macrophages. For example,
there are several reported ligands for PPARγ, but only 9-HODE and 13-HODE
are represented because they have known effects in macrophages. Selections
of this kind are made because of space constraints on the map. However,
when space permits, all possible interactions are included. Second, when
ambiguity exists between papers based on in vivo and in vitro
experiments, we opt for conclusions from in vivo experiments. When certain
interactions are only ambiguously reported, or not reported but known to
exist in a variety of different cell types, we look at reports that use
the embryologically nearer cell types to monocyte-derived macrophages,
such as bone marrow stem cells, to increase the reliability of the map.
Although some interactions are incorporated hypothetically, such as the
IRAK1-TRAF6-TAK1-TAB1-TAB2 cascade, molecules and interactions that do
not meet the criteria described above are not included in version 1.0 of
the map. In the future we hope to develop a consistent methodology to score
the reliability of the map based on legacy data and recalibration based
on controlled comprehensive measurements. The version 1.0 map is intended
to be comprehensive but not necessarily exhaustive. To create an exhaustive
map we need hard evidence on which proteins exist in RAW 264.7 cells as
well as which genes are expressed. However, the presence of a protein can
sometimes show a paradoxical relationship with gene expression, thus it
is important that we directly assess the presence of signaling proteins
using direct measurement and not just by inference. In addition, detailed
time-course measurements of protein levels combined with phosphoprotein
assays and shRNAi data generated by AfCS labs will help researchers to
reproduce and analyze dynamics of the network. We will periodically update
the map on the AfCS Web site based on the most current data available.
Our next step will be a systems-level analysis using real data retrieved
from quantitative experiments (1). To do this, we need a highly reliable
data set that includes expression levels, time course information, and
results of perturbation effects. Quantitative modeling, simulation, and
analysis of a particular focused subset of the system, as seen in the FXM
(Focus on X Module) project, will be the next step. Based on this map,
we are going to create another smaller but more detailed model, including
isoforms. Using the SBW-SBML platform, we can easily share and revise this
map, which will facilitate sharing and exchange of views in this project.
The other aspect of our future plan is to incorporate data derived from
the RIKEN FANTOM3 project, which measures the expression profile of all
transcription start sites, including those of noncoding RNAs. In contrast to
FXM, which focuses on specific cascades measured in depth, FANTOM3-based
analysis will be genome-wide but limited to expression profiles. Our challenge
is to create a system of model-based analysis methods that can accommodate
these two extremes.
TRANSPATH Signal Transduction Browser
is an information system on gene-regulatory pathways, and an extension
module to the TRANSFAC database
Two-dimensional polyacrylamide gel electrophoresis
(2-D PAGE)
Mouse Proteome Project at
the University of Toronto in Canada pinpointed more than 3,200 proteins in six organs.
The project's database indicates whether each protein is present in four
cellular compartments, such as the cytoplasm and mitochondria.
Organelle Map Database
from the Max Planck Institute for Biochemistry in Martinsried, Germany,
focuses on the mouse liver and holds results from a method called protein
correlation profiling. The site maps some 1400 proteins to 10 cellular
locations.