Abstract: We present a program
for the prediction of biological activity
spectra for drug-like organic substances. New lead compounds can be found on the
basis of predicted biological activity spectra. In-house and Internet versions
of the PASS program are discussed.
Keywords: biological activity spectrum, computer-aided
prediction, computer system PASS (Prediction of Activity Spectra for
Substances), applications in computer-aided drug discovery, prediction via
Internet.
Introduction
Most known biologically active substances have many
different biological activities, which cause both main (therapeutic) and
supplementary (side) actions. Some of these activities are found during the
initial preclinical study; others, unfortunately, are found too late, in clinical
trials (see, for example, the activities of fluorouracil in Table 1).
Sometimes, many years after a drug is first launched, additional activities
are discovered that become the basis for a new therapeutic application (see some
examples in Table 2).
Most computer-aided drug-discovery methods are used to study a single activity,
or only a few activities, of a compound class [1-5]. A program that
simultaneously predicts pharmacological effects, mechanisms of action, and specific
toxicities on the basis of the 2D chemical structure is the tool of choice for
getting an early indication of whether a compound could be a potential lead.
Victor Avidon proposed this idea more than 35 years ago [6, 7].
Within the framework of the national registration system of the USSR, this
technology was first developed and tested on new chemical compounds synthesized in the USSR [8, 9].
The program has been revised several times. The theoretical analysis went
through several approaches, and the accumulated experience of finding new leads
allows constant improvement [10-14].
The PASS team continuously collects and evaluates information about new
pharmaceutical substances and lead compounds in order to update the PASS
training set and to extend the predictive abilities of PASS to new chemical
classes and novel biological activities.
Figure 1. Increase over the years in the number of compounds abstracted for the knowledge base
Figure 2. Increase over the years in the number of predictable activities
The current version of PASS predicts about 4130 pharmacological
effects, mechanisms of action, and other effects (see Table 1) [15].
We provide a list of activities.
In the following we describe the methods used in PASS, give examples of
practical applications, and explain how you can evaluate PASS yourself by using it
online as a demo version or as an evaluation version.
Number | Area | Examples
261 | pharmacotherapeutic actions | Anxiolytic
66 | anti-infective actions | Antileishmanial
72 | actions blocking a certain process | Apoptosis antagonist
40 | actions stimulating a certain process | Apoptosis agonist
140 | actions blocking activity of a certain endogenous substance | Acetylcholine antagonist
71 | actions simulating activity of a certain endogenous substance | Acetylcholine agonist
5 | actions blocking a release of a certain endogenous substance | Cytochrome C release inhibitor
9 | actions stimulating a release of a certain endogenous substance | Acetylcholine release stimulant
9 | actions blocking an uptake of a certain endogenous substance | Adenosine uptake inhibitor
2219 | actions inhibiting a certain enzyme | 12 Lipoxygenase inhibitor
41 | actions stimulating action of a certain enzyme | ATPase stimulant
268 | actions blocking a certain receptor | 5 Hydroxytryptamine 1 antagonist
121 | actions stimulating a certain receptor | 5 Hydroxytryptamine 1 agonist
28 | actions blocking a certain channel | Chloride channel antagonist
5 | actions stimulating a certain channel | Calcium channel agonist
28 | actions blocking a certain transporter | GABA transporter 1 inhibitor
128 | being a substrate of a certain metabolic enzyme | CYP3A4 substrate
24 | actions inhibiting a certain metabolic enzyme | CYP3A4 inhibitor
13 | actions inducing a certain metabolic enzyme | CYP3A4 inducer
28 | actions inhibiting a certain protein | Collagen inhibitor
8 | actions inhibiting an expression of a certain transcription factor | Transcription factor Rho inhibitor
2 | actions stimulating an expression of a certain transcription factor | TP53 expression enhancer
389 | actions that cause a certain adverse/toxic effect | Carcinogen
Table 1. List of biological effects
Presentation of biological activities in PASS
Let us define biological activity as the result of a compound's
interaction with a biological entity. In clinical studies the entity is the human organism. In preclinical testing it
can be an animal (in vivo) or an experimental model (in vitro). The biological activity
depends on the compound's structure, charge distribution, physico-chemical properties, and more. It also depends on
the biological entity (species, sex, age, etc.) and on the mode of treatment
(dose, route, etc.). Any biologically active compound reveals a wide spectrum of
different effects. Some of them are useful in the treatment of diseases, but others cause various side and toxic effects.
All activities caused by the compound are considered to be the "biological
activity spectrum of the substance".
If the experimental conditions cannot be defined narrowly,
i.e. if differences in species, sex, age, dose, route, etc. are
neglected, the biological activity can be identified only qualitatively
("yes"/"no", "active"/"inactive"). Thus, the "biological activity spectrum" is defined as
an "intrinsic" property of a compound that depends only on its structure and
physico-chemical characteristics. This qualitative presentation makes it possible to integrate
information on biologically active compounds collected from many
different sources into the general PASS training set. Any property of chemical
compounds that is determined by their structural peculiarities can be used
for prediction by PASS. It has been shown that the applicability of PASS is broader
than the prediction of biological activities. For instance, this approach was
successfully used to predict drug-likeness, a general property of organic molecules
(Anzali et al., 2001).
Chemical structure description in PASS
The 2D structural formulae of
compounds were chosen as the basis for the description of chemical structure because
this is the only information available at the early stage of research. Thus,
using the structural formula as input data, one can obtain estimates of
biological activity profiles even for virtual molecules, prior to their chemical
synthesis and biological testing.
Many different characteristics of
chemical compounds can be calculated on the basis of structural formulae. In the
earliest versions of PASS (Poroikov et al., 1993; Filimonov et al., 1995;
Filimonov and Poroikov, 1996) we used the Substructure Superposition Fragment
Notation (SSFN) codes (Avidon et al., 1982). However, SSFN, like many other
structural descriptors, reflects a human abstraction of chemical structure
rather than the nature of ligand-target interactions, which are the molecular
mechanisms of biological activities.
The Multilevel Neighbourhoods
of Atoms (MNA) descriptors (Filimonov et al., 1999) have certain
advantages in comparison with SSFN. These descriptors are based on a
molecular structure representation that includes the hydrogen atoms
according to the valences and partial charges of other atoms and does
not specify the types of bonds. MNA descriptors are generated as a
recursively defined sequence:
-
the zero-level MNA descriptor
for each atom is the mark A of the atom itself;
-
any next-level MNA descriptor for the atom is the
sub-structure notation A(D1D2...Di...), where Di is
the previous-level MNA descriptor for the i-th immediate
neighbour of the atom A.
The mark of an atom may include
not only the atomic type but also any additional information about the
atom. In particular, if the atom is not included in a ring, it is
marked by "-". The neighbour descriptors D1D2...Di...
are arranged in a unique
lexicographic order. The iterative process of MNA descriptor generation can
be continued to cover the first, second, etc. neighbourhoods of each atom.
The molecular structure is
represented in PASS by the set of unique MNA descriptors of the 1st and
2nd levels (Figure 3). The substances are considered to be equivalent in
PASS if they have the same set of MNA descriptors. Since MNA descriptors
do not represent the stereochemical peculiarities of a molecule, the
substances whose structures differ only stereochemically, are formally
considered as equivalent.
1st level:
HC
HO
CHCC
CHCN
CCCC
CCOO
NCC
OHC
OC
2nd level:
C(C(CC-H)C(CC-C)-H(C))
C(C(CC-H)C(CN-H)-H(C))
C(C(CC-H)C(CN-H)-C(C-O-O))
C(C(CC-H)N(CC)-H(C))
C(C(CC-C)N(CC)-H(C))
N(C(CN-H)C(CN-H))
-H(C(CC-H))
-H(C(CN-H))
-H(-O(-H-C))
-C(C(CC-C)-O(-H-C)-O(-C))
-O(-H(-O)-C(C-O-O))
-O(-C(C-O-O))
Figure 3. Structural formula of nicotinic acid and its MNA descriptors of the 1st and 2nd levels
New QNA (Quantitative
Neighbourhoods of Atoms) descriptors were recently developed, which
allow the analysis of quantitative structure-activity relationships (Filimonov
et al., 2009).
Mathematical Approach
The PASS algorithm for predicting biological activity
spectra is based on Bayesian estimates of the probabilities that a
molecule belongs to the classes of active and inactive compounds,
respectively. The mathematical method is described in several publications (Lagunin
et al., 2000; Stepanchikova et al., 2003; Poroikov and Filimonov, 2005;
Filimonov and Poroikov, 2006; Filimonov and Poroikov, 2008), and its details
will not be discussed here. Only the general description necessary for
interpreting prediction results is presented below.
Since the main purpose of PASS is the
prediction of activity spectra for new molecules, the general principle of the
PASS algorithm is to exclude from the SAR Base any substances that are
equivalent to the substance under prediction.
The structural formula of the molecule
for which the PASS prediction is to be carried out is presented as a MOL file (for
a set of molecules, as an SDFile). The predicted activity spectrum is presented
in PASS as a list of activities with the probabilities "to be active" Pa and "to
be inactive" Pi calculated for each activity (Figure xxx). The list is arranged
in descending order of Pa-Pi; thus, the more probable activities appear at the
top of the list. Only activities with Pa>Pi are considered possible for a
particular compound. The list can be shortened at any desired cutoff value,
but Pa>Pi is used by default. If the user chooses a rather high value of Pa as the
cutoff for selecting probable activities, the chance of confirming the predicted
activities by experiment is also high, but many existing activities will be
lost. For instance, if Pa>90% is used as a cutoff, about 90% of real activities
will be lost; for Pa>80%, the portion of lost activities is 80%, etc.
It is necessary to keep in mind that
the probability Pa reflects the similarity of the molecule under prediction to the
structures of the molecules that are most typical in the sub-set of "actives" in
the training set. Therefore, there is usually no direct correlation between the
Pa values and quantitative characteristics of activities.
Even an active and potent compound
whose structure does not resemble the typical structures of "actives" from the
training set may obtain a low Pa value in the prediction (even negative
Pa-Pi values may be observed). This follows from the way the
estimates are constructed: the values of Pa for "actives" and of Pi for
"inactives" are distributed uniformly.
Taking this into
account, the following interpretation of prediction results is possible.
If, for instance, Pa=0.9, then for 90% of "actives" from the training
set the estimates are lower than for this compound, and only
for 10% of "actives" are these values higher. If one rejects the
suggestion that this compound is active, one makes a wrong decision
with probability 0.9. If Pa<0.5 but Pa>Pi, then for more than half
of the "actives" from the training set the estimates are higher
than for this compound. If one rejects the suggestion that this
compound is active, one makes a wrong decision with probability less
than 0.5. In such a case the probability of confirming this kind of activity
by experiment is small, but if it is confirmed, the chance is more than 50%
that this structure is highly novel and may become a new chemical entity (NCE).
If the predicted biological
activity spectrum is wide, the structure of the compound is quite
simple, and does not contain peculiarities, which are responsible for
the selectivity of its biological action.
If it appears that the
structure under prediction contains several new MNA descriptors (in
comparison with the descriptors from the compounds of the training set),
then the structure has low similarity with any structure from the
training set, and the results of prediction should be considered as
rather rough estimates.
Based on these criteria, one
may choose which activities have to be tested for the studied compounds
on the basis of compromise between the novelty of expected
pharmacological action and the risk to obtain the negative result in
experimental testing. Certainly, one could also take into account a
particular interest to some kinds of activity, experimental facilities,
etc.
We have developed a special application,
CWM Lead Finder, which uses clustering algorithms to match
the biological spectra of a set of compounds with
known biological activity against those of a set of untested compounds.
Mathematical Approach
The accuracy and efficiency of more than 200 different mathematical
approaches were tested to select the most relevant algorithms [16].
One of the methods that provides a satisfactory quality of prediction is described
below in more detail.
Definitions:
n is the total number of compounds in the training set;
ni is the number of compounds, that have the descriptor i;
nj is the number of compounds, that reveal the activity j;
nij is the number of compounds, that have both the descriptor i and the activity
j;
pj = nj/n is the estimate of a priori probability of activity
j;
pij = nij/ni is the estimate of the conditional probability of the activity j
for the descriptor i;
m is the number of descriptors for the compound under prediction;
ri = ni/(ni + 0.5/m) is a regulating factor;
Prj is the initial estimate of the probability of the
activity j for the compound under prediction;
CPj is the cutting point;
E1j(CPj) is the estimate of 1st kind error probability;
E2j(CPj) is the estimate of 2nd kind error probability;
The 1st kind error is observed when the compound under
prediction is actually active but Prj < CPj;
The 2nd kind error is observed when the compound under prediction is
actually inactive but Prj > CPj.
LOO is the leave-one-out procedure.
For each compound in the training set the values n and ni are changed to
n-1 and ni-1, and, when the compound has activity j, nj and nij are changed to
nj-1 and nij-1; the estimates Prj are then calculated.
MEP is the maximal error of prediction (see below).
Algorithm of Prediction
Structural descriptors are generated for the compound under prediction.
The following values are calculated for each activity:
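\[ u_j = \sum_i \arcsin\{ r_i (2 p_{ij} - 1) \}, \qquad v_j = \sum_i \arcsin\{ r_i (2 p_j - 1) \} \]
\[ s_j = \sin(u_j / m), \qquad t_j = \sin(v_j / m) \]
\[ Pr_j = \frac{1}{2} \left( 1 + \frac{s_j - t_j}{1 - s_j t_j} \right) \]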
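As a rough illustration, the calculation of Prj from the definitions above can be sketched in Python. This is only a sketch under stated assumptions, not the PASS implementation: the function name `initial_estimate` and the input layout (one `(n_i, n_ij)` pair per descriptor of the compound) are invented for the example, and descriptors that never occur in the training set (ni = 0) are rejected because the text does not say how they are handled.

```python
import math


def initial_estimate(n, n_j, descriptor_counts):
    """Initial estimate Pr_j for one activity j.

    n                 -- total number of compounds in the training set
    n_j               -- number of compounds that reveal activity j
    descriptor_counts -- list of (n_i, n_ij) pairs, one per descriptor of the
                         compound under prediction (hypothetical input layout)
    """
    m = len(descriptor_counts)
    p_j = n_j / n                      # a priori probability of activity j
    u = 0.0
    v = 0.0
    for n_i, n_ij in descriptor_counts:
        if n_i <= 0:
            # Assumption: unseen descriptors are not handled in this sketch.
            raise ValueError("descriptor must occur in the training set")
        r_i = n_i / (n_i + 0.5 / m)    # regulating factor
        p_ij = n_ij / n_i              # conditional probability of j given i
        u += math.asin(r_i * (2 * p_ij - 1))
        v += math.asin(r_i * (2 * p_j - 1))
    s = math.sin(u / m)
    t = math.sin(v / m)
    return (1 + (s - t) / (1 - s * t)) / 2


# Descriptors that are no more frequent among actives than the prior.
neutral = initial_estimate(100, 10, [(10, 1), (20, 2)])
# Descriptors that occur mostly in active compounds.
enriched = initial_estimate(100, 10, [(10, 9), (20, 18)])
```

When every conditional estimate pij equals the prior pj, the two sums coincide and the estimate is 0.5; descriptors that are enriched among the actives push it above 0.5.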
Validation criteria: The LOO estimates of
Prj are calculated for each compound in the training set.
The estimates of E1j(CPj) and E2j(CPj) are calculated for each activity. The
cross point CPj*, at which
E1j(CPj*) = E2j(CPj*),
is calculated. The maximal error of prediction MEP is:
MEPj = E1j(CPj*) = E2j(CPj*)
Results of the prediction:
The probability to be active is:
Pa = E1j(Prj)
The probability to be inactive is:
Pi = E2j(Prj)
The result for the prediction is presented as the list of
activities with appropriate Pa and Pi, sorted in descending order of the
difference (Pa-Pi)>0.
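One possible reading of Pa = E1j(Prj) and Pi = E2j(Prj) is sketched below in Python. It assumes that E1j and E2j are estimated as plain empirical fractions over the LOO estimates of the training set, which the text does not state explicitly; the function name `pa_pi` and the sample numbers are hypothetical.

```python
from bisect import bisect_left, bisect_right


def pa_pi(pr, loo_actives, loo_inactives):
    """Empirical Pa and Pi for an initial estimate pr of one activity.

    loo_actives   -- LOO estimates Pr_j of training compounds that are active
    loo_inactives -- LOO estimates Pr_j of training compounds that are inactive
    Assumption: E1 and E2 are taken as simple empirical fractions.
    """
    actives = sorted(loo_actives)
    inactives = sorted(loo_inactives)
    # E1(pr): share of actives whose estimate lies below pr
    pa = bisect_left(actives, pr) / len(actives)
    # E2(pr): share of inactives whose estimate lies above pr
    pi = (len(inactives) - bisect_right(inactives, pr)) / len(inactives)
    return pa, pi


# Hypothetical LOO estimates for a single activity.
example = pa_pi(0.7, [0.2, 0.4, 0.6, 0.8], [0.1, 0.2, 0.3, 0.5])
```

With these sample values, three of the four actives have a lower estimate than the compound and no inactive has a higher one, so Pa = 0.75 and Pi = 0.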
Figure 1. The Process of PASS Development
PASS Elements
The principal elements of PASS include the Training Set, the Chemical Structure
Description, the Biological Activity Description, and the Mathematical
Approach. They are described in more detail below.
The Training Set
The PASS 1.611 training set consists of about 46,000
biologically active compounds, of which about 15,000 are already
launched drugs and about 13,000 are drug candidates now under clinical or
advanced preclinical testing. Since 1972 this training set has been compiled from
many sources, including publications, patents, databases, private communications, etc.
For the majority of compounds included in the training set,
the biological activity spectrum was studied in detail.
In PASS Pro customers can easily create their own training
set. A training set consists of an SDFile with the field activity_prediction.
This file is read into PASS. It takes about 5 minutes to read a training set of
1000 compounds.
Chemical Structure Description
The structure of a compound is described by descriptors that we
call Multilevel Neighborhoods of Atoms (MNA). The general idea is illustrated
below for ethanol; the three columns give the descriptors of levels 0, 1, and 2 for each atom.
0 | 1 | 2
C | C(HHHC) | C(H(C)H(C)H(C)C(HHCO))
C | C(HHCO) | C(H(C)H(C)C(HHHC)O(HC))
O | O(HC) | O(H(O)C(HHCO))
H | H(C) | H(C(HHHC))
H | H(C) | H(C(HHHC))
H | H(C) | H(C(HHHC))
H | H(C) | H(C(HHCO))
H | H(C) | H(C(HHCO))
H | H(O) | H(O(HC))
At the first step, the 1st- and 2nd-level neighborhoods of the
atoms are generated, as shown above. At the second step, the duplicate MNA
descriptors are eliminated, which leaves the following unique 2nd-level descriptors:
H(C(HHHC))
H(C(HHCO))
H(O(HC))
C(H(C)H(C)H(C)C(HHCO))
C(H(C)H(C)C(HHHC)O(HC))
O(H(O)C(HHCO))
Despite the possibility to continue the procedure calculating
the 3rd, 4th, etc. levels of atoms' neighborhoods, only descriptors of the 1st
and 2nd levels are used, because this approximation is shown to provide the best
quality of prediction.
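The two steps can be sketched in a few lines of Python. This is a minimal illustration rather than the PASS code: the function name `mna_descriptors`, the atom and bond lists, and in particular the neighbour ordering (H before C before O, chosen only so that the output matches the ethanol table) are assumptions; ring marks and multi-letter element symbols are not handled.

```python
# Assumed neighbour ordering, chosen to reproduce the ethanol example.
ELEMENT_ORDER = {"H": 0, "C": 1, "O": 2}


def mna_descriptors(atoms, bonds, level):
    """Return one MNA descriptor per atom for the requested level.

    atoms -- list of single-letter element symbols
    bonds -- list of (index, index) pairs; bond types are ignored
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    current = list(atoms)  # level 0: the atom marks themselves
    for _ in range(level):
        following = []
        for i, atom in enumerate(atoms):
            # Previous-level descriptors of the immediate neighbours,
            # put into a fixed order so the notation is unique.
            parts = sorted(
                (current[j] for j in neighbours[i]),
                key=lambda d: (ELEMENT_ORDER[d[0]], d),
            )
            following.append(atom + "(" + "".join(parts) + ")")
        current = following
    return current


# Ethanol, CH3-CH2-OH: atoms 0-1 are C, 2 is O, 3-8 are H.
ethanol_atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
ethanol_bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

level1 = mna_descriptors(ethanol_atoms, ethanol_bonds, 1)
# Second step: eliminate duplicates by collecting the descriptors in a set.
unique_level2 = set(mna_descriptors(ethanol_atoms, ethanol_bonds, 2))
```

Under these assumptions the nine atoms of ethanol yield the six unique 2nd-level descriptors listed above.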
Biological Activity Description
Let us define biological activity as the result of a compound's
interaction with a biological entity. In clinical studies the entity is the human organism. In preclinical testing it
can be an animal (in vivo) or an experimental model (in vitro). The biological activity
depends on the compound's structure, charge distribution, physico-chemical properties, and more.
Quality of Prediction
The quality of prediction can be calculated by leave-one-out cross
validation. Each compound in turn is removed from the training set,
and its activity spectrum is predicted on the basis of the
remaining part of the training set. The result is compared with the known activity
of the compound, and the maximal error of prediction (MEP) is calculated and
averaged over all compounds and activities.
This error is about 0.15 for the current version of PASS. The average accuracy of prediction
using LOO cross-validation is therefore about 0.85.
Such accuracy is sufficient for practical use, especially taking
into account that a random guess would give 1/1000 = 0.001 for 1000
activities.
Interpretation of the Prediction's Results
The total
number of MNA descriptors of the compound and the number of new MNA descriptors
are shown with the result. If the number of new descriptors is more than 3, the result
of the prediction may be questionable.
Pa and Pi are estimates of the probability that the
compound is active and inactive, respectively. The values vary from 0 to 1. Only activities with Pa
> Pi are considered possible for a particular compound.
If Pa > 0.7, the chance of finding the activity experimentally
is high, but in many cases the compound may turn out to be a close analogue of
known pharmaceutical agents.
If 0.5 < Pa < 0.7, the chance of finding the activity experimentally is lower, but the compound is
probably not so similar to known pharmaceutical
agents.
If Pa < 0.5, the chance of finding the activity experimentally
is lower still, but the chance of finding a structurally new
compound increases.
Thus, one may choose which activities to test in
one's compounds on the basis of a compromise between the expected novelty of the
pharmacological agent and the risk of getting too many false positives.
By using a biological profile, i.e. several important activities considered together, one
can increase the quality of prediction considerably.
Various Applications of PASS
It takes about 30 minutes to calculate the biological
activity spectra for 100,000 compounds on an ordinary IBM PC (Pentium, 500 MHz). One can therefore use
PASS effectively for predicting the activities of many compounds from large in-house and commercial databases.
At NCI the PASS parameters are provided with each compound by
updating the database periodically. One can also calculate the PASS parameters
on the fly. This has the advantage that one also gets a prediction for
compounds found in online searches at CAS or at DiscoveryGate.
PASS can be usefully applied for:
- Revealing new effects and mechanisms of action for old substances in corporate and private databases.
- Finding new leads among the compounds from in-house and commercial databases.
- Selecting the most prospective compounds for high-throughput screening from a set of available samples.
- Determining relevant screens for a particular compound.
Revealing New Effects and Mechanisms of Action
This application is illustrated below by the example
of predicting the biological activity spectrum of the well-known cerebrotonic drug
Cavinton (Vinpocetine), which was launched by Gedeon Richter (Hungary) more than twenty
years ago. Its predicted biological activity spectrum is
given below.
Predicted biological activity spectrum for Cavinton
45 Descriptors, 0 New Descriptors, 47 Predicted Activities
No | Pa | Pi | Activity | Experiment | Reference
1 | 0.929 | 0.004 | Peripheral vasodilator | |
2 | 0.900 | 0.000 | Multiple sclerosis treatment | |
3 | 0.855 | 0.005 | Vasodilator | + | [17, 18]
4 | 0.844 | 0.003 | Abortion inducer | + | [17]
5 | 0.812 | 0.001 | Antineoplastic enhancer | |
6 | 0.760 | 0.006 | Coronary vasodilator | + | [19]
7 | 0.732 | 0.007 | Spasmogenic | |
8 | 0.700 | 0.036 | Antihypoxic | + | [17, 20, 21]
9 | 0.650 | 0.004 | Lipid peroxidase inhibitor | + | [22, 23]
10 | 0.648 | 0.008 | Cognition disorders treatment | + | [17, 24, 25]
11 | 0.656 | 0.021 | Antiischemic | + | [17, 26-28]
12 | 0.577 | 0.013 | Acute neurologic disorders treatment | + | [17, 18]
13 | 0.540 | 0.039 | Spasmolytic | + | [18]
14 | 0.519 | 0.026 | Antianginal agent | |
15 | 0.486 | 0.037 | Antihypertensive | + | [18]
16 | 0.449 | 0.035 | Antiarrhythmic | + | [29]
17 | 0.432 | 0.063 | Sympatholytic | |
18 | 0.438 | 0.077 | Sedative | + | [18]
19 | 0.500 | 0.152 | Antiinflammatory, Pancreatic | |
20 | 0.328 | 0.020 | Antidepressant, Imipramin-like | |
21 | 0.300 | 0.010 | Thrombolytic | + | [17, 18, 20]
22 | 0.342 | 0.075 | Psychotropic | + | [18]
23 | 0.276 | 0.023 | Alpha 2 adrenoreceptor antagonist | + | [30]
24 | 0.273 | 0.029 | Anesthetic intravenous | |
25 | 0.547 | 0.304 | Vascular (peripheral) disease treatment | |
26 | 0.225 | 0.006 | Antineoplastic Alkaloid | |
27 | 0.291 | 0.086 | Cholinergic antagonist | |
28 | 0.263 | 0.066 | Benzodiazepine agonist partial | |
29 | 0.417 | 0.238 | Insulin promoter | |
30 | 0.222 | 0.045 | MAO-A inhibitor | |
31 | 0.353 | 0.188 | Cardiovascular analeptic | |
32 | 0.249 | 0.100 | Narcotic antagonist | |
33 | 0.300 | 0.161 | Acetylcholine release stimulant | |
34 | 0.236 | 0.104 | Antitumor-cytostatic | |
35 | 0.271 | 0.165 | Antiparkinsonian, rigidity relieving | |
36 | 0.218 | 0.127 | Antidepressant | |
37 | 0.247 | 0.157 | Analeptic | |
38 | 0.211 | 0.126 | Potassium channel antagonist | |
39 | 0.243 | 0.158 | Antiparkinsonian, tremor relieving | |
40 | 0.333 | 0.258 | 5 Hydroxytryptamine 3 agonist | |
41 | 0.233 | 0.172 | Respiratory analeptic | |
42 | 0.242 | 0.184 | Antipsoriatic | |
43 | 0.131 | 0.081 | Analgesic, opioid | |
44 | 0.147 | 0.128 | N-cholinergic agonist | |
45 | 0.285 | 0.267 | cAMP phosphodiesterase inhibitor | + | [17]
46 | 0.175 | 0.162 | Anesthetic general | |
47 | 0.375 | 0.370 | Male reproductive dysfunction treatment | |
Cavinton has been used in medical practice for more than twenty years. The activities found in
preclinical testing and clinical trials
during this period are compared with the result of the prediction. According to the
available literature, only 16 of the 47 predicted activities of Cavinton have been
found so far. These activities are marked by "+" in the table above.
In particular, PASS predicts the vasodilator
and spasmolytic activities (Pa=0.855 and 0.540). This corresponds to the well-known pharmacological effects of
Cavinton: it causes vasodilatation and
increases brain blood flow and metabolism. Antihypoxic and antiischemic
effects are also predicted for Cavinton (Pa=0.700 and 0.656, respectively), and Cavinton is used for these purposes. Cavinton is also predicted to be a lipid
peroxidase inhibitor (Pa=0.650), an agent for cognition disorders treatment
(0.648), an agent for acute neurologic disorders treatment (0.577), etc. Cavinton
has all these activities.
The predicted biological activity spectrum of Cavinton
suggests several new applications of the
substance. Among them are: Multiple sclerosis treatment (Pa=0.900); Antineoplastic
enhancer (0.812), Antineoplastic Alkaloid (0.225) and Antitumor-cytostatic
(0.236); Antiparkinsonian, rigidity relieving (0.271) and Antiparkinsonian,
tremor relieving (0.243); etc. Multiple sclerosis treatment and Antineoplastic
enhancer are predicted with high probability, whereas the other additionally
predicted activities listed here have relatively small values of Pa.
Similarly, the predicted activity spectrum of any compound
provides ideas for further testing. As a result, new effects and
mechanisms may be found for old substances. By varying the cutoff value of Pa, one
may choose the desired level of novelty against the acceptable risk of a negative
result.
Finding Potential New Leads
A researcher can define desirable
and undesirable activities for a compound and then select matching compounds from
a set of structures with the help of PharmaExpert. For example, among
the 15630 compounds of the ChemStar database (http://www.chemstar-ru.com),
959 compounds are predicted as
Endothelin antagonist, 236 compounds as Angiotensin II antagonist, and 57 compounds
as Angiotensin converting enzyme inhibitor.
If the purpose of the study is to
find compounds with a dual mechanism of antihypertensive effect, e.g.
Angiotensin converting enzyme inhibitor + Endothelin antagonist, only 11
compounds are predicted to have both activities. The best of these hits has
Pa=0.170 (Endothelin antagonist) and Pa=0.244 (Angiotensin converting enzyme
inhibitor). Based on this result, one may decide either to test these 11
compounds or to carry out the prediction and selection for compounds from
another database. In either case, by varying the cutoff value of Pa it is possible to
choose compounds with lower or higher novelty (see: Interpretation
of the Prediction's Results).
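Selecting compounds that are predicted to have several activities at once amounts to a simple filter over the predicted spectra. The Python sketch below is only an illustration and does not reproduce PharmaExpert: the function name `select_multi_activity`, the data layout, the compound identifiers, and all Pi values are hypothetical (only the two Pa values of the best hit come from the text).

```python
def select_multi_activity(predictions, required, pa_cutoff=0.0):
    """Return compounds predicted to have every activity in `required`.

    predictions -- dict: compound id -> dict: activity -> (Pa, Pi)
    required    -- activity names that must all be predicted (Pa > Pi)
    pa_cutoff   -- optional minimum Pa for each required activity
    Hits are ordered by their weakest required Pa, best first.
    """
    hits = []
    for compound, spectrum in predictions.items():
        values = [spectrum.get(activity) for activity in required]
        if all(v is not None and v[0] > v[1] and v[0] > pa_cutoff for v in values):
            hits.append((min(v[0] for v in values), compound))
    return [compound for _, compound in sorted(hits, reverse=True)]


ACE = "Angiotensin converting enzyme inhibitor"
ET = "Endothelin antagonist"

# Hypothetical predictions; Pi values are invented for the example.
predictions = {
    "cmpd-1": {ET: (0.170, 0.090), ACE: (0.244, 0.050)},
    "cmpd-2": {ET: (0.300, 0.100)},                        # lacks ACE
    "cmpd-3": {ET: (0.120, 0.200), ACE: (0.400, 0.010)},   # Pa < Pi for ET
}

dual_hits = select_multi_activity(predictions, [ACE, ET])
```

Raising `pa_cutoff` shortens the hit list in favour of compounds that resemble known agents more closely, in line with the trade-off between novelty and risk discussed above.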
Selecting the Most Prospective Compounds for High-throughput Screening
Sometimes one is interested in activities that are not yet
included in PASS, and the data needed to train one's own knowledge
base for PASS Pro are not available. In such cases two other strategies are
suitable.
The first strategy is based on the hypothesis that the more
activities are predicted for a compound, the higher the chance of finding
some useful pharmacological action for this compound. For each compound the following value is
calculated:
\[ P = \frac{1}{n} \sum \frac{Pa}{Pa + Pi} \]
where the sum runs over the biological activities under consideration and n is
their number.
All compounds are arranged in descending order of their P
values, and only the compounds with the highest values of P are selected for screening.
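The first strategy can be sketched in Python as follows. The function names and the sample spectra are hypothetical, and the sketch assumes that Pa + Pi is greater than zero for every activity considered.

```python
def screening_score(spectrum):
    """P = (1/n) * sum of Pa / (Pa + Pi) over the n activities considered.

    spectrum -- list of (Pa, Pi) pairs, one per activity.
    Assumption: Pa + Pi > 0 for every pair.
    """
    return sum(pa / (pa + pi) for pa, pi in spectrum) / len(spectrum)


def rank_for_screening(spectra):
    """Return compound ids in descending order of their P values."""
    return sorted(spectra, key=lambda cid: screening_score(spectra[cid]), reverse=True)


# Hypothetical spectra for three compounds over the same two activities.
spectra = {
    "A": [(0.9, 0.1), (0.5, 0.5)],   # P = (0.9 + 0.5) / 2 = 0.7
    "B": [(0.2, 0.6), (0.1, 0.3)],   # P = (0.25 + 0.25) / 2 = 0.25
    "C": [(0.8, 0.2), (0.9, 0.1)],   # P = (0.8 + 0.9) / 2 = 0.85
}

ranking = rank_for_screening(spectra)
```

Only the compounds at the head of `ranking` would then go forward to screening.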
The second strategy is based on the hypothesis that the more "novel"
a compound is, the higher the probability of finding an NCE. Thus, the compounds with the
highest number of new descriptors are selected.
Both strategies were tested on datasets of 10,000 to
70,000 compounds, and their efficacy was shown [31].
Determining Relevant Screens for a Particular Compound
Testing can be
organized in descending order of the difference (Pa-Pi) for the different activities.
Cavinton, for example, should be
studied in the following tests: Peripheral vasodilator (0.929-0.004), Multiple
sclerosis treatment (0.900-0.000), Vasodilator (0.855-0.005), Abortion inducer
(0.844-0.003), Antineoplastic enhancer (0.812-0.001), Coronary vasodilator
(0.760-0.006), etc.
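The ordering of tests is a plain sort on Pa-Pi, restricted to activities with Pa > Pi. A minimal Python sketch, with a hypothetical function name and the Cavinton values quoted above entered in arbitrary order:

```python
def rank_screens(spectrum):
    """Return (activity, Pa, Pi) tuples with Pa > Pi, by descending Pa - Pi."""
    possible = [(name, pa, pi) for name, (pa, pi) in spectrum.items() if pa > pi]
    return sorted(possible, key=lambda row: row[1] - row[2], reverse=True)


# Values taken from the predicted spectrum of Cavinton.
cavinton = {
    "Coronary vasodilator": (0.760, 0.006),
    "Male reproductive dysfunction treatment": (0.375, 0.370),
    "Vasodilator": (0.855, 0.005),
    "Antineoplastic enhancer": (0.812, 0.001),
    "Peripheral vasodilator": (0.929, 0.004),
    "Abortion inducer": (0.844, 0.003),
    "Multiple sclerosis treatment": (0.900, 0.000),
}

test_order = [name for name, _, _ in rank_screens(cavinton)]
```

The resulting order starts with Peripheral vasodilator and ends with Male reproductive dysfunction treatment, matching the order of the table.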
In this case both the safety and the efficacy of a new compound will be
characterized more comprehensively. Moreover, it has been shown that the economic
viability of this approach to testing is more than 500% [32].
Experimental Verification
The predictions of PASS have been confirmed by
experiment. Some examples are given below.
The activity spectra were predicted for 300 new chemical
compounds synthesized in the Chemical-Pharmaceutical Research Institute (Novokuznetsk).
Twenty compounds were selected for testing as probable antiulcer agents.
Nine compounds were synthesized and tested. Potent antiulcer activity was
found for 5 of these compounds. These new antiulcer agents are NCEs [33].
The economic advantage in this study is about (300/20) x 100 = 1500%.
The activity spectra were predicted for 520 new chemical
compounds synthesized in the Institute of Organic Chemistry of the Russian Academy
of Sciences (Moscow). Fourteen compounds were selected for testing as the
most prospective. The results of 22 experiments on 5
different kinds of activity coincided with the predictions in 20 cases, so the accuracy
of prediction is about 90%.
Based on the predicted biological activity spectra of about
20 macroheterocyclic compounds, 2 antitumor leads were found [34].
New antibacterial agents were found based on the biological activity spectra of derivatives of
1-amino-4-(5-aryloxazolyl-2)-butadienes-1,3 [35].
Analgesic, antiinflammatory, antioxidant and some additional
activities were predicted and confirmed by experiment for some thiazole
derivatives [36].
These and some other examples demonstrate that the
approach of predicting many biological activities simultaneously can be
effectively applied to compounds from different chemical series to find various
pharmacological actions.
Naturally, the PASS approach has some limitations:
- The PASS approach can be applied only to so-called "drug-like" substances.
- PASS can be applied only to activities for which the training set includes no fewer than 5 active compounds.
- Although the accuracy of PASS predictions is significantly higher than that of a random guess, PASS cannot predict the activity spectrum of essentially new compounds that have no descriptors in the training set.
- In some cases PASS predicts both agonist and antagonist (stimulator and blocker) actions simultaneously. Only experiments can then clarify the intrinsic activity of the compound, but it probably has an affinity for the corresponding receptor (enzyme).
Using PASS
via Internet
Since July 1998 PASS is open for free testing via
Internet (http://www.ibmh.msk.su/PASS/default.htm).
Anyone who would like to obtain additional information about the biological
potential of her compound may fill the registration form and send the
structure file in ISIS (MDL Information
Systems, Inc.) "MOLl" format.
Such files can be prepared, for example, with the chemical editor
ISIS/Draw (MDL Information Systems, Inc.).
ISIS/Draw is available free for personal or non-commercial use from the MDL web
site http://www.mdl.com.
The molfile can be prepared with ISIS/Draw by drawing the structure
using the menu options and
the mouse. Then choose "Edit" → "Select All".
Once the whole molecule is selected, choose
"File" → "Export" →
"Molfile". Save the file to disk under a name
of your choice.
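Before uploading, it can help to confirm that the exported file looks like a well-formed molfile. The sketch below is written in Python and assumes the V2000 flavour of the MDL molfile format (three header lines, then a counts line holding the atom and bond counts in its first two three-character fields). The function name `check_molfile` is hypothetical, and the check is not something the PASS server is known to perform.

```python
def check_molfile(text):
    """Return (n_atoms, n_bonds) for a V2000 molfile, or raise ValueError."""
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("a molfile needs three header lines and a counts line")
    counts = lines[3]
    # Assumption: the file is V2000 and says so on the counts line.
    if "V2000" not in counts:
        raise ValueError("counts line does not declare V2000")
    try:
        n_atoms = int(counts[0:3])  # columns 1-3: number of atoms
        n_bonds = int(counts[3:6])  # columns 4-6: number of bonds
    except ValueError:
        raise ValueError("atom/bond counts are not integers") from None
    # The atom block and bond block follow the counts line directly.
    if len(lines) < 4 + n_atoms + n_bonds:
        raise ValueError("file is shorter than its counts line declares")
    if not any(line.strip() == "M  END" for line in lines):
        raise ValueError("missing 'M  END' terminator")
    return n_atoms, n_bonds
```

Calling `check_molfile(open(path).read())` on the saved file either returns the declared atom and bond counts or raises a `ValueError` naming the first problem found.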
When the molfile is prepared and the registration form in the Internet version of PASS is filled in, click
"Browse", select the molfile, click "Submit now", and
wait for the result. In case of any problem, please send an e-mail to pass@ibmh.msk.su.
Conclusions
A new program has been developed for predicting the biological activity
of drug-like compounds on the basis of their 2D structure. It can be applied
effectively to finding new leads. This is demonstrated by examples
of both compounds with known activities and newly synthesized structures studied
as potential pharmacological agents. The program can be tested easily via the
Internet, by downloading a light version, or by applying for an evaluation of
the full version. Please send an email to us.
Acknowledgments
We gratefully acknowledge MDL
Information Systems, Inc. for providing ISIS/Host, ISIS/Base and the MDDR
database used in this study.
This is an edited version of the original
paper, A. Kos 3.2.03.
TABLE 1. FLUOROURACIL'S KNOWN ACTIVITIES (CAS NO. 51-21-8)
Activity | Publ. Year
Antineoplastic | 1962
Embryotoxic | 1966
Antiviral | 1971
Thymidine Triphosphate Synthesis Inhibition | 1977
RNA Synthesis Inhibition | 1980
Protein Synthesis Inhibition | 1987
Lipid Metabolism Regulator | 1990
Immunosuppressant | 1990
Antimetabolite | 1991
Antiviral (AIDS) | 1996
. . . | ?
. . . | ?
TABLE 2. EXAMPLES OF PHARMACEUTICALS WHOSE ADDITIONALLY DISCOVERED
ACTIVITY WAS USED FOR ANOTHER INDICATION
Pharmaceutical | Therapeutic Effect | Year
Acetazolamide | Diuretic | 1954
Acetazolamide | Antiepileptic | 1956
Valproate | Anxiolytic | 1961
Valproate | Antiepileptic | 1989
Levamisole | Anthelmintic | 1968
Levamisole | Immunostimulant | 1980
Alprostadil | Antiaggregant | 1988
Alprostadil | Erectant | 1994
Aspirin | Analgesic | 1899
Aspirin | Antiaggregant | 1971
? | ? | ?
References
[1] Wermuth C.G., ed., Medicinal chemistry
in practice, Academic Press, London, 1996, 968 pp.
[2] Van de Waterbeemd H., ed.,
Structure-property correlations in drug research, Landes, Austin, 1996,
210 pp.
[3] Dean P.M., Molecular similarity in drug
design, Blackie Academic, London, 1995.
[4] Livingstone D., Data analysis for
chemists. Applications to QSAR and Chemical Product Design, Oxford Science
Publ., Oxford, 1995, 239 pp.
[5] Kubinyi H., ed., 3D QSAR in drug
design, Escom, Leiden, 1993, 759 pp.
[6] Avidon V., Criteria for similarity
assessment of chemical structures and the basics of informational language for
development of informational-logical system on biologically active compounds.
Chem. & Pharmaceut. J. (Rus.), 1974, 8 (8), 22-25.
[7] Piruzyan L.A., Avidon V.V., Rozenblit
A.B., et al. Statistical analysis of the information file on biologically
active compounds. I. Data base on the structure and activity of biologically
active compounds. Chem. & Pharmaceut. J. (Rus.), 1977, 11 (4),
35-40.
[8] Piruzyan L.A., Rudzit E.A. The
methodical approaches to study biological activity of chemical compounds.
Chem. & Pharmaceut. J. (Rus.), 1976, 10 (8), 21-27.
[9] Burov Yu.V., Korolchenko L.V., Poroikov
V.V. National system for registration and biological testing of chemical
compounds: facilities for new drugs' search. Bull. Natl. Center for
Biologically Active Compounds (Rus.), 1990, No. 1, 4-25.
[10] Filimonov D.A., Poroikov V.V.,
Karaicheva E.I., et al. Computer-aided prediction of biological
activity spectra of chemical substances on the basis of their structural
formulae: computerized system PASS. Experimental and Clinical Pharmacology (Rus),
1995, 58 (2), 56-62.
[11] Filimonov D.A., Poroikov V.V. PASS:
Computerized prediction of biological activity spectra for chemical
substances. Bioactive Compound Design: Possibilities for Industrial Use, BIOS
Scientific Publishers, Oxford, 1996, p.47-56.
[12] Poroikov V.V., Filimonov D.A.
Computerized prediction of biological activity spectra for chemical substance
- new approach to effective drug design. In: QSAR and Molecular Modelling
Concepts, Computational Tools and Biological Applications. Barcelona: Prous
Science Publishers, 1996, p.49-50.
[13] Poroikov V.V., Filimonov D.A.,
Stepanchikova A.V., et al. Optimization of synthesis and pharmacological
testing of new compounds based on computerized prediction of their biological
activity spectra. Chem. & Pharmaceut. J. (Rus), 1996, 30 (9),
20-23. (English translation by Consultants Bureau, New York: Pharmaceutical
Chemistry Journal, 1996, 30 (9), 570-573).
[14] Poroikov V.V. PASS, a program for the
prediction of activity spectra from molecular structure. Newsletter of The
QSAR and Modelling Society, 1997, No. 8, 12-15.
[15] Gloriozova T.A., Filimonov D.A.,
Lagunin A.A., Poroikov V.V. Testing of computer system for prediction of
biological activity spectra PASS on the set of new chemical compounds. Chem.
& Pharmaceut. J. (Rus), 1996, In press.
[16] Filimonov D.A. Comparison of
Algorithms for Computer Prediction of Biological Activity Spectra for Chemical
Compounds on the Basis of Their Structural Formulae. II Rus. Natl. Congress
"Man and Drugs", Moscow, Abstracts, 1995, 62-63.
[17] Summary of Cavinton (Vinpocetine)
Gedeon Richter, Budapest-Hungary, 1994-06-07.
[18] Mashkovskii M.D. The Pharmaceuticals,
Medicine, Moscow, 1997, v.1,
399-400.
[19] VIDAL. Pharmaceuticals in Russia.
Moscow, AstraPharmService, 1997.
[20] Kiss B., Karpati E. Acta Pharm. Hung., 1996, 66 (5), 213-224.
[21] Plotnikova T.M., Plotnikov M.V.,
Bazhenova T.G. Bull. Exp. Biol. Med., 1991,
111 (2), 170-172.
[22] Karmazsin L., Olah V. A., Balla G.,
Makay A. Acta Paediatr. Hung. 1990, 30 (2),
217-224.
[23] Suno M., Nagaoka A. Nippon Yakurigaku
Zasshi, 1988, 91 (5), 295-299.
[24] Boda J., Karsay K., Czako L., Fugi
S., Kovacs A., Koncz I., Maczko P. A. Ther. Hung., 1989, 37 (3), 176-180.
[25] Molnar P., Gaal L. Eur. J. Pharmacol., 1992, 215 (1), 17-22.
[26] Kiss B., Karpati E. Acta Pharm. Hung., 1996, 66 (5), 213-224.
[27] Hadjiev D., Yancheva S.
Arzneimittelforschung, 1976, 26 (10A), 1947-1950.
[28] Rischke R., Krieglstein J.
Pharmacology, 1990, 41 (3), 153-160.
[29] Karpati E., Szporny L.
Arzneimittelforschung, 1976, 26 (10A), 1908-1912.
[30] Paulo T., Toth P.T., Nguyen T.T.,
Forgacs L., Torok T.L., Magyar K. J. Pharm. Pharmacol., 1986, 38 (9), 668-673.
[31] Poroikov V.V., Filimonov D.A.,
Stepanchikova A.V. Biological Activity Spectra Prediction as a Tool to Select
the Most Prospective Compounds from Commercial and In-House Databases. Abstr.
Intern. Med. Chem. Symp., Seoul, 1997, P.143.
[32] Poroikov V.V., Filimonov D.A.,
Boudunova A.P. Computer Assisted Prediction of Biological Activity Spectra:
Estimating the Effectivity of Use in High Throughput Screening. Abstr. XIVth
International Symposium on Medicinal Chemistry, Maastricht, the Netherlands, 1996,
P-3.05.
[33] Trapkov V.A., Budunova A.P., Burova
O.A., Filimonov D.A., Poroikov V.V. Discovery of New Antiulcer Agents by
Computer Aided Prediction of Biological Activity. Problems in Medical
Chemistry (Moscow), 1997, 43 (1), 41-57.
[34] Islyaikin M.K., Danilova E.A., Kudrik
E.V., Smirnov R.P., Boudunova A.P., Kinzirskii A.S. Synthesis and study of
antitumor action of macroheterocyclic compounds and their complexes with
metals. Chemical & Pharmaceutical J. (Rus), 1997, 31 (8), 19-22.
[35] Maiboroda D.A., Babaev E.V.,
Goncharenko L.V. Synthesis and study of spectral and pharmacological
properties of 1-amino-4-(5-aryloxazolyl-2)-butadienes-1,3. Chemical &
Pharmaceutical J. (Rus), 1998, 32 (6), 24-28.
[36] Geronikaki A., Poroikov V.,
Hadjipavlou-Litina D., Mgonzo R., Filimonov D., Lagunin A. Synthesis, computer
assisted prediction of biological activity spectra and experimental testing of
new thiazole derivatives. Quantitative Structure-Activity Relationships, 1998,
In press.
Anzali S., Barnickel G., Cezanne B., Krug M., Filimonov
D., Poroikov V. (2001). Discriminating between drugs and nondrugs by
Prediction of Activity Spectra for Substances (PASS). J. Med. Chem. 44:
2432-2437.
Avidon V.V. (1974). Criteria for the comparison of
chemical structures and principles of construction of an information
language for a logical information system for biologically active compounds.
Pharm-Chem. J. (Rus). 8: 22-25.
Avidon V.V., Arolovich V.S., Kozlova S.P., Piruzian L.A.
(1978a). Statistical study of information file on biologically active
compounds. II. Choice of decision rule for biological activity prediction.
Pharm-Chem. J. (Rus). 12: 88-93.
Avidon V.V., Arolovich V.S., Kozlova S.P., Piruzian L.A.
(1978b). Statistical investigation of large volumes of data with respect to
the biological activity of compounds III. Selection of a determinant for
predicting biological activity. Pharm-Chem. J. (Rus). 12: 99–106.
Avidon V.V., Pomerantsev I.A., Rozenblit A.B., Golender
V.E. (1982). Structure-activity relationship oriented languages for chemical
structure representation. J. Chem. Inf. Comput. Sci. 22: 207-214.
Avidon V.V., Arolovich V.S., Blinova V.G., Freidina A.M.
(1983). Statistical investigation of the data file on biologically active
compounds. V. Allowance for the novelty of the chemical structure in the
prediction of the biological activity by an improved method of substructural
analysis. Pharm-Chem. J. (Rus). 17: 59-62.
Burov Yu.V., Poroikov V.V., Korolchenko L.V. (1990).
National system for registration and biological testing of chemical
compounds: facilities for new drugs search. Bull. Natl. Cent. Biol. Active
Compnds (Rus.). No. 1: 4-25.
Delmas F., Di Giorgio C., Robin M., Azas N., Gasquet M.,
Detang C., Costa M., Timon-David P., Galy J.P. (2002). In vitro activities
of position 2 substitution-bearing 6-nitro- and 6-aminobenzothiazoles and
their corresponding anthranilic acid derivatives against Leishmania infantum
and Trichomonas vaginalis. Antimicrob. Agents Chemother. 46: 2588–2594.
Di Giorgio C., Delmas F., Filloux N., Robin M., Seferian
L., Azas N., Gasquet M., Costa M., Timon-David P., Galy J.P. (2003). In
vitro activities of 7-substituted 9-chloro and 9-amino-2-methoxyacridines
and their bis- and tetra-acridine complexes against Leishmania infantum.
Antimicrob. Agents Chemother. 47: 174–180.
Di Giorgio C., Delmas F., Ollivier E., Elias R., Balansard
G., Timon-David P. (2004). In vitro activity of the beta-carboline alkaloids
harmane, harmine, and harmaline toward parasites of the species Leishmania
infantum. Exp. Parasitol. 106: 67–74.
Dolzhenko A.V., Kolotova N.V., Koz'minykh V.O., Vasilyuk
M.V., Kotegov V.P., Novoselova G.N., Syropyatov B.Ya., Vakhrin M.I. (2003).
Substituted amides and hydrazides of dicarboxylic acids. Part 14. Synthesis
and antimicrobial and antiinflammatory activity of 4-antipyrylamides,
2-thiazolylamides, and 1-triazolylamides of some dicarboxylic acids.
Pharm-Chem. J. 37: 149–151.
Filimonov D.A., Poroikov V.V., Karaicheva E.I., Kazarian
R.K., Budunova A.P., Mikhailovskii E.M., Rudnitskikh A.V., Goncharenko L.V.,
Burov Yu.V. (1995). Computer-aided prediction of biological activity spectra
of chemical substances on the basis of their structural formulae:
computerized system PASS. Exper. Clin. Pharmacol. (Rus). 58: 56-62.
Filimonov D.A., Poroikov V.V. (1996). PASS: computerized
prediction of biological activity spectra for chemical substances. In:
Bioactive Compound Design: Possibilities for Industrial Use, BIOS Scientific
Publishers, Oxford (UK), pp.47-56.
Filimonov D., Poroikov V., Borodina Yu., Gloriozova T.
(1999). Chemical Similarity Assessment through multilevel neighborhoods of
atoms: definition and comparison with the other descriptors. J. Chem. Inf.
Comput. Sci. 39: 666-670.
Filimonov D.A., Poroikov V.V. (2006). Prediction of
biological activity spectra for organic compounds. Russian Chemical Journal,
50 (2), 66-75
Filimonov D.A., Poroikov V.V. (2008). Probabilistic
approach in activity prediction. In: Chemoinformatics Approaches to Virtual
Screening. Eds. Alexandre Varnek and Alexander Tropsha. Cambridge (UK): RSC
Publishing, 182-216.
Filimonov D.A., Zakharov A.V., Lagunin A.A., Poroikov V.V.
(2009). QNA based “Star Track” QSAR approach. SAR & QSAR Environ. Res. 20:
679-709.
Geronikaki A., Babaev E., Dearden J., Dehaen W., Filimonov
D., Galaeva I., Krajneva V., Lagunin A., Macaev F., Molodavkin G., Poroikov
V., Saloutin V., Stepanchikova A., Voronina T. (2004). Design of new
anxiolytics: from computer prediction to synthesis and biological
evaluation. Bioorg. Med. Chem. 12: 6559-6568.
Geronikaki A., Druzhilovsky D., Zakharov A., Poroikov V.
(2008a). Computer-aided predictions for medicinal chemistry via Internet.
SAR & QSAR Environ. Res. 19: 27-38.
Geronikaki A.A., Lagunin A.A., Hadjipavlou-Litina D.I.,
Elefteriou P.T., Filimonov D.A., Poroikov V.V., Alam I., Saxena A.K.
(2008b). Computer-aided discovery of anti-inflammatory thiazolidinones with
dual cyclooxygenase/lipoxygenase inhibition. J. Med. Chem. 51: 1601-1609.
Goel R.K., Kumar V., Mahajan M.P. (2005). Quinazolines
revisited: search for novel anxiolytic and GABAergic agents. Bioorg .Med.
Chem. Lett. 15: 2145–2148.
Golender V.E., Rozenblit A.E. (1978). Computer Methods for
Drug Design. Riga: Zinatne, 232 pp.
Golender V.E., Rosenblit A.B. (1983). Logical and
Combinatorial Algorithms for Drug Design, Research Studies Press,
Wiley&Sons, 352 pp.
Labanauskas L., Brukstus A., Udrenaite E., Bucinskaite V.,
Susvilo I., Urbelis G. (2005). Synthesis and anti-inflammatory activity of
1-acylaminoalkyl-3,4-dialkoxybenzene derivatives. Il Farmaco. 60: 203–207.
Lagunin A., Stepanchikova A., Filimonov D., Poroikov V.
(2000). PASS: prediction of activity spectra for biologically active
substances. Bioinformatics. 16: 747-748.
Lagunin A.A., Gomazkov O.A., Filimonov D.A., Gureeva T.A.,
Dilakyan E.A., Kugaevskaya E.V., Elisseeva Yu.E., Solovyeva N.I., Poroikov
V.V. (2003). Computer-aided selection of potential antihypertensive
compounds with dual mechanisms of action. J. Med. Chem. 46: 3326-3332.
PASS program package, © Filimonov D.A., Poroikov V.V.,
Gloziozova T.A., Lagunin A.A. Russian State Patent Agency, N 2006613275 of
15.09.2006.
PharmaExpert program package, © Lagunin A.A., Poroikov
V.V., Filimonov D.A., Gloziozova T.A. Russian State Patent Agency, N
2006613590 of 16.10.2006.
Poroikov V.V., Filimonov D.A., Boudunova A.P. (1993).
Comparison of the Results of Prediction of the Spectra of Biological
Activity of Chemical Compounds by Experts and the PASS System. Automat
Document Math Linguistics. 27: 40-43.
Poroikov V.V., Filimonov D.A., Borodina Yu.V., Lagunin A.A.,
Kos A. (2000). Robustness of biological activity spectra predicting by
computer program PASS for non-congeneric sets of chemical compounds. J.
Chem. Inform. Comput. Sci. 40: 1349-1355.
Poroikov V., Akimov D., Shabelnikova E., Filimonov D.
(2001). Top 200 medicines: can new actions be discovered through
computer-aided prediction? SAR and QSAR in Environmental Research, 12 (4),
327-344.
Poroikov V.V., Filimonov D.A. (2002). How to acquire new
biological activities in old compounds by computer prediction. J. Comput.
Aid. Molec. Des., 16 (11), 819-824.
Poroikov V.V., Filimonov D.A., Ihlenfeldt W.-D.,
Gloriozova T.A., Lagunin A.A., Borodina Yu.V., Stepanchikova A.V., Nicklaus
M.C. (2003). PASS Biological Activity Spectrum Predictions in the Enhanced
Open NCI Database Browser. J. Chem. Inform. Comput. Sci. 43: 228-236.
Poroikov V., Filimonov D. (2005). PASS: Prediction of
Biological Activity Spectra for Substances. In: Predictive Toxicology. Ed.
by Christoph Helma. Taylor & Francis, 459-478.
Poroikov V., Lagunin A., Filimonov D. (2005).
PharmaExpert: diseases, targets and ligands – three in one. QSAR and
Molecular Modelling in Rational Design of Bioactive Molecules. Eds. Esin Aki
Sener, Ismail Yalcin, Ankara (Turkey), CADD & D Society, 514-515.
Poroikov V., Filimonov D., Lagunin A., Gloriozova T.,
Zakharov A. (2007). PASS: Identification of probable targets and mechanisms
of toxicity. SAR & QSAR in Environmental Research., 18 (1-2), 101-110.
Sadym A., Lagunin A., Filimonov D., Poroikov V. (2003).
Prediction of biological activity spectra via Internet. SAR & QSAR Environ.
Res. 14: 339-347.
Stepanchikova A.V., Lagunin A.A., Filimonov D.A., Poroikov
V.V. (2003). Prediction of biological activity spectra for substances:
Evaluation on the diverse set of drugs-like structures. Cur. Med. Chem. 10:
225-233.