AKos
Consulting & Solutions GmbH
a chemoinformatics company

 

 

Abstract: We present a program for the prediction of biological activity spectra for drug-like organic substances. New lead compounds can be found on the basis of predicted biological activity spectra. In-house and Internet versions of the PASS program are discussed.

Keywords: biological activity spectrum, computer-aided prediction, computer system PASS (Prediction of Activity Spectra for Substances), applications in computer-aided drug discovery, prediction via Internet.

Introduction

Most known biologically active substances have many different biological activities, which cause both main (therapeutic) and supplementary (side) actions. Some of these activities are found during the initial preclinical study; others are unfortunately found too late, in clinical trials (see, for example, Fluorouracil's activities in Table 1). Sometimes additional activities are discovered many years after the first launch of a drug and become the basis for a new therapeutic application (see some examples in Table 2).

Most computer-aided drug-discovery methods are used to study a single activity, or only a few activities, of a compound class [1-5]. A program that simultaneously predicts pharmacological effects, mechanisms of action, and specific toxicities on the basis of the 2D chemical structure is the tool of choice for an early indication of whether a compound could be a potential lead.

Victor Avidon proposed this idea more than 35 years ago [6, 7]. Within the framework of the national registration system of the USSR, this technology was developed and tested on new chemical compounds synthesized in the USSR [8, 9]. The program has been revised several times. The theoretical analysis went through several approaches, and the accumulated experience of finding new leads allows constant improvement [10-14].

 

The PASS team continuously collects and evaluates information about new pharmaceutical substances and lead compounds in order to update the PASS training set and to extend the predictive abilities of PASS to new chemical classes and novel biological activities:

 

 


Figure 1. Increase of the number of compounds over years that are abstracted for the knowledge base

Figure 2. Increase of the number of predictable activities over years

The current version of PASS predicts about 4130 pharmacological effects, mechanisms of action, and other effects (see Table 1) [15]. We provide a list of activities.

In the following we describe the methods used in PASS, give examples of practical applications, and show how you can evaluate PASS yourself by using it online as a demo version or as an evaluation version.

 

 

Number | Area | Example
261 | pharmacotherapeutic actions | Anxiolytic
66 | anti-infective actions | Antileishmanial
72 | actions blocking a certain process | Apoptosis antagonist
40 | actions stimulating a certain process | Apoptosis agonist
140 | actions blocking activity of a certain endogenous substance | Acetylcholine antagonist
71 | actions simulating activity of a certain endogenous substance | Acetylcholine agonist
5 | actions blocking a release of a certain endogenous substance | Cytochrome C release inhibitor
9 | actions stimulating a release of a certain endogenous substance | Acetylcholine release stimulant
9 | actions blocking an uptake of a certain endogenous substance | Adenosine uptake inhibitor
2219 | actions inhibiting a certain enzyme | 12 Lipoxygenase inhibitor
41 | actions stimulating action of a certain enzyme | ATPase stimulant
268 | actions blocking a certain receptor | 5 Hydroxytryptamine 1 antagonist
121 | actions stimulating a certain receptor | 5 Hydroxytryptamine 1 agonist
28 | actions blocking a certain channel | Chloride channel antagonist
5 | actions stimulating a certain channel | Calcium channel agonist
28 | actions blocking a certain transporter | GABA transporter 1 inhibitor
128 | being a substrate of a certain metabolic enzyme | CYP3A4 substrate
24 | actions inhibiting a certain metabolic enzyme | CYP3A4 inhibitor
13 | actions inducing a certain metabolic enzyme | CYP3A4 inducer
28 | actions inhibiting a certain protein | Collagen inhibitor
8 | actions inhibiting an expression of a certain transcription factor | Transcription factor Rho inhibitor
2 | actions stimulating an expression of a certain transcription factor | TP53 expression enhancer
389 | actions that cause a certain adverse/toxic effect | Carcinogen

Table 1. List of biological effects

Presentation of biological activities in PASS

Let us define biological activity as the result of a compound's interaction with a biological entity. In clinical studies the entity is the human organism. In preclinical testing it can be animals (in vivo) or experimental models (in vitro). The biological activity depends on a compound's structure, charge distribution, physico-chemical properties, and more. The activity also depends on the biological entity (species, sex, age, etc.), on the mode of treatment (dose, route), etc. Any biologically active compound reveals a wide spectrum of different effects. Some of them are useful in the treatment of diseases, but others cause various side and toxic effects. All activities caused by the compound are considered to be the "biological activity spectrum of the substance".

If the experimental conditions cannot be defined narrowly, i.e. if differences in species, sex, age, dose, route, etc. are neglected, the biological activity can be identified only qualitatively (“yes”/“no”, “active”/“inactive”). Thus, the "biological activity spectrum" is defined as an "intrinsic" property of a compound that depends only on its structure and physico-chemical characteristics. The qualitative presentation allows information on biologically active compounds collected from many different sources to be integrated into the general PASS training set. Any property of chemical compounds that is determined by their structural peculiarities can be used for prediction by PASS. It has been shown that the applicability of PASS is broader than the prediction of biological activities. For instance, this approach was successfully used to predict such a general property of organic molecules as drug-likeness (Anzali et al., 2001).

Chemical structure description in PASS.

The 2D structural formulae of compounds were chosen as the basis for the description of chemical structure because this is the only information available at the early stage of research. Thus, using the structural formula as input data, one can obtain estimates of biological activity profiles even for virtual molecules, prior to their chemical synthesis and biological testing.

Many different characteristics of chemical compounds can be calculated on the basis of structural formulae. In the earliest versions of PASS (Poroikov et al., 1993; Filimonov et al., 1995; Filimonov and Poroikov, 1996) we used the Substructure Superposition Fragment Notation (SSFN) codes (Avidon et al., 1982). However, SSFN, like many other structural descriptors, reflects a human abstraction of chemical structure rather than the nature of ligand-target interactions, which are the molecular mechanisms of biological activities.

 

The Multilevel Neighbourhoods of Atoms (MNA) descriptors (Filimonov et al., 1999) have certain advantages in comparison with SSFN. These descriptors are based on a molecular structure representation that includes the hydrogen atoms according to the valences and partial charges of the other atoms and does not specify the types of bonds. MNA descriptors are generated as a recursively defined sequence:

  • the zero-level MNA descriptor for each atom is the mark A of the atom itself;

  • any next-level MNA descriptor for the atom A is the substructure notation A(D1D2...Di...), where Di is the previous-level MNA descriptor for the i-th immediate neighbour of the atom A.

The mark of an atom may include not only the atomic type but also any additional information about the atom. In particular, if the atom is not included in a ring, it is marked by “-”. The neighbour descriptors D1D2...Di... are arranged in a unique lexicographic order. The iterative process of MNA descriptor generation can be continued to cover the first, second, etc. neighbourhoods of each atom.

The molecular structure is represented in PASS by the set of unique MNA descriptors of the 1st and 2nd levels (Figure 3). The substances are considered to be equivalent in PASS if they have the same set of MNA descriptors. Since MNA descriptors do not represent the stereochemical peculiarities of a molecule, the substances whose structures differ only stereochemically, are formally considered as equivalent.

1st level: HC; HO; CHCC; CHCN; CCCC; CCOO; NCC; OHC; OC

2nd level: C(C(CC—H)C(CC—C)—H(C)); C(C(CC—H)C(CN—H)—H(C)); C(C(CC—H)C(CN—H)—C(C—O—O)); C(C(CC—H)N(CC)—H(C)); C(C(CC—C)N(CC)—H(C)); N(C(CNH)C(CNH)); H(C(CCH)); H(C(CNH)); H(O(HC)); —C(C(CC—C)—O(—H—C)—O(—C)); O(H(O)C(COO)); O(C(COO))

Figure 3. Structural formula of nicotinic acid and its MNA descriptors of the 1st (left column) and 2nd (right column) levels

New QNA (Quantitative Neighbourhoods of Atoms) descriptors were recently developed, which allow the analysis of quantitative structure-activity relationships (Filimonov et al., 2009).

Mathematical Approach

The PASS algorithm for predicting biological activity spectra is based on Bayesian estimates of the probabilities that a molecule belongs to the classes of active and inactive compounds, respectively. The mathematical method is described in several publications (Lagunin et al., 2000; Stepanchikova et al., 2003; Poroikov and Filimonov, 2005; Filimonov and Poroikov, 2006; Filimonov and Poroikov, 2008), and its details will not be discussed here. Only the general description necessary for interpreting the prediction results is presented below.

Since the main purpose of PASS is the prediction of activity spectra for new molecules, the general principle of the PASS algorithm is to exclude from the SAR Base those substances that are equivalent to the substance under prediction.

The structural formula of the molecule for which the PASS prediction is to be carried out is presented as a MOL file (for a set of molecules, as an SDFile). The predicted activity spectrum is presented in PASS as a list of activities with the probabilities "to be active" (Pa) and "to be inactive" (Pi) calculated for each activity (Figure xxx). The list is arranged in descending order of Pa-Pi; thus, the more probable activities appear at the top of the list. Only activities with Pa>Pi are considered possible for a particular compound. The list can be shortened at any desired cutoff value, but Pa>Pi is used by default. If the user chooses a rather high value of Pa as the cutoff for selecting probable activities, the chance of confirming the predicted activities by experiment is high too, but many existing activities will be lost. For instance, if Pa>90% is used as a cutoff, about 90% of real activities will be lost; for Pa>80%, the portion of lost activities is 80%, etc.
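The ordering and cutoff rules described above can be sketched in a few lines of Python. This is only an illustration, not the PASS implementation: the `Prediction` type and the `rank_spectrum` function are hypothetical names, the first three rows are taken from the Cavinton spectrum shown later in this paper, and the last row is invented to show an activity with Pa < Pi being dropped.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    activity: str
    pa: float  # estimated probability "to be active"
    pi: float  # estimated probability "to be inactive"


def rank_spectrum(predictions, pa_cutoff=0.0):
    """Keep activities with Pa > Pi and Pa > pa_cutoff, sorted by descending Pa - Pi."""
    kept = [p for p in predictions if p.pa > p.pi and p.pa > pa_cutoff]
    return sorted(kept, key=lambda p: p.pa - p.pi, reverse=True)


spectrum = [
    Prediction("Spasmolytic", 0.540, 0.039),
    Prediction("Peripheral vasodilator", 0.929, 0.004),
    Prediction("Vasodilator", 0.855, 0.005),
    Prediction("Hypothetical activity", 0.100, 0.300),  # invented row, Pa < Pi
]

default_list = rank_spectrum(spectrum)                # default rule: Pa > Pi
strict_list = rank_spectrum(spectrum, pa_cutoff=0.7)  # stricter user-chosen cutoff

for p in default_list:
    print(f"{p.pa:.3f} {p.pi:.3f} {p.activity}")
```

Raising `pa_cutoff` shortens the list in the way the text describes: fewer activities remain, and those that remain are the ones most similar to typical "actives" of the training set.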

It is necessary to keep in mind that the probability Pa reflects the similarity of the molecule under prediction to the structures of the molecules that are most typical in the subset of “actives” in the training set. Therefore, there is usually no direct correlation between the Pa values and quantitative characteristics of activities.

Even an active and potent compound whose structure does not resemble the typical structures of “actives” from the training set may obtain a low Pa value during the prediction (even negative Pa-Pi values can be observed). This is explained by the way the estimates are constructed: the values of Pa for “actives” and of Pi for “inactives” are distributed uniformly.

 

 

 

Taking this into account, the following interpretation of prediction results is possible. If, for instance, Pa=0.9, then for 90% of “actives” from the training set the estimates are lower than for this compound, and only for 10% of “actives” are these values higher. If one rejects the suggestion that this compound is active, one will make a wrong decision with probability 0.9. If Pa<0.5 but Pa>Pi, then for more than half of the “actives” from the training set the estimates are higher than for this compound. If one rejects the suggestion that this compound is active, one will make a wrong decision with probability less than 0.5. In such a case the probability of confirming this kind of activity experimentally is small, but if it is confirmed, there is a more than 50% chance that the structure has high novelty and may become a new chemical entity (NCE).

If the predicted biological activity spectrum is wide, the structure of the compound is quite simple, and does not contain peculiarities, which are responsible for the selectivity of its biological action.

If it appears that the structure under prediction contains several new MNA descriptors (in comparison with the descriptors from the compounds of the training set), then the structure has low similarity with any structure from the training set, and the results of prediction should be considered as rather rough estimates.

Based on these criteria, one may choose which activities have to be tested for the studied compounds on the basis of a compromise between the novelty of the expected pharmacological action and the risk of obtaining a negative result in experimental testing. Certainly, one could also take into account a particular interest in some kinds of activity, the available experimental facilities, etc.

We have developed a special application, CWM Lead Finder, which uses clustering algorithms to match the biological activity spectra of a set of compounds with known biological activity against those of a set of untested compounds.

 

Mathematical Approach

The accuracy and efficiency of more than 200 different mathematical approaches were tested to select the most relevant algorithms [16]. One of the methods that provide a satisfactory quality of prediction is described below in more detail.

Definitions:

$n$ is the total number of compounds in the training set;
$n_i$ is the number of compounds that have the descriptor $i$;
$n_j$ is the number of compounds that reveal the activity $j$;
$n_{ij}$ is the number of compounds that have both the descriptor $i$ and the activity $j$;

$p_j = n_j/n$ is the estimate of the a priori probability of the activity $j$;
$p_{ij} = n_{ij}/n_i$ is the estimate of the conditional probability of the activity $j$ for the descriptor $i$;
$m$ is the number of descriptors for the compound under prediction;
$r_i = n_i/(n_i + 0.5/m)$ is a regulating factor;

$Pr_j$ is the initial estimate of the probability of the activity $j$ for the compound under prediction;
$CP_j$ is the cutting point;
$E_{1j}(CP_j)$ is the estimate of the 1st kind error probability;
$E_{2j}(CP_j)$ is the estimate of the 2nd kind error probability.

The 1st kind error is observed when the compound under prediction is actually active but $Pr_j < CP_j$;
the 2nd kind error is observed when the compound under prediction is actually inactive but $Pr_j > CP_j$.

LOO is the leave-one-out procedure.

For each compound in the training set, the values $n$ and $n_i$ are changed to $n-1$ and $n_i-1$, and, when the compound has the activity $j$, $n_j$ and $n_{ij}$ are changed to $n_j-1$ and $n_{ij}-1$; the estimates $Pr_j$ are then calculated.

MEP is the maximal error of prediction (see below).

 

Algorithm of Prediction

Structural descriptors are generated for the compound under prediction. The following values are calculated for each activity:

$$u_j = \sum_i \arcsin\{r_i(2p_{ij}-1)\}, \qquad v_j = \sum_i \arcsin\{r_i(2p_j-1)\}$$

$$s_j = \sin(u_j/m), \qquad t_j = \sin(v_j/m)$$

$$Pr_j = \frac{1}{2}\left(1 + \frac{s_j - t_j}{1 - s_j t_j}\right)$$
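As a minimal sketch of the calculation above, the following Python function evaluates the initial estimate for one activity from the counts named in the definitions. The function name `initial_estimate` and the input layout (a list of `(n_i, n_ij)` pairs, one per descriptor of the compound) are assumptions made for illustration, and treating a descriptor with n_i = 0 as contributing zero is our own choice, motivated by the regulating factor being zero in that case; it is not taken from the PASS publications.

```python
import math


def initial_estimate(n, n_j, descriptor_counts):
    """Initial estimate Pr_j for one activity j.

    n                 -- total number of compounds in the training set
    n_j               -- number of compounds that reveal activity j
    descriptor_counts -- list of (n_i, n_ij) pairs, one per descriptor i
                         of the compound under prediction
    """
    m = len(descriptor_counts)
    p_j = n_j / n
    u = 0.0
    v = 0.0
    for n_i, n_ij in descriptor_counts:
        if n_i == 0:
            # Assumption: a descriptor unseen in the training set has r_i = 0
            # and therefore adds arcsin(0) = 0 to both sums.
            continue
        r_i = n_i / (n_i + 0.5 / m)
        p_ij = n_ij / n_i
        u += math.asin(r_i * (2 * p_ij - 1))
        v += math.asin(r_i * (2 * p_j - 1))
    s = math.sin(u / m)
    t = math.sin(v / m)
    return (1 + (s - t) / (1 - s * t)) / 2


# Hypothetical counts: 1000 training compounds, 100 of them active.
neutral = initial_estimate(1000, 100, [(50, 5), (200, 20)])      # p_ij equals p_j
enriched = initial_estimate(1000, 100, [(50, 40), (200, 150)])   # p_ij well above p_j
print(neutral, enriched)
```

When every descriptor occurs among the actives exactly as often as the a priori rate predicts, s equals t and the estimate is 0.5; descriptors enriched among the actives push the estimate towards 1.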

Validation criteria: The LOO estimates of $Pr_j$ are calculated for each compound in the training set. The estimates of $E_{1j}(CP_j)$ and $E_{2j}(CP_j)$ are calculated for each activity. The cross point $CP_j^*$, at which

$$E_{1j}(CP_j^*) = E_{2j}(CP_j^*)$$

is then determined. The maximal error of prediction MEP is:

$$MEP_j = E_{1j}(CP_j^*) = E_{2j}(CP_j^*)$$

Results of the prediction:

The probability to be active is:

$$Pa = E_{1j}(Pr_j)$$

The probability to be inactive is:

$$Pi = E_{2j}(Pr_j)$$

The result of the prediction is presented as the list of activities with the corresponding Pa and Pi, sorted in descending order of the difference (Pa-Pi)>0.
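The relation between the LOO estimates, the two error functions, and the reported Pa and Pi can be illustrated with a small Python sketch. It assumes that the error probabilities are estimated as simple empirical fractions over the LOO estimates of known actives and inactives; the function names and the sample numbers are hypothetical, and the cross point is approximated by scanning the observed estimates rather than solved exactly.

```python
def error_first_kind(active_estimates, cp):
    """E1(cp): share of known actives whose LOO estimate lies below the cutting point."""
    return sum(e < cp for e in active_estimates) / len(active_estimates)


def error_second_kind(inactive_estimates, cp):
    """E2(cp): share of known inactives whose LOO estimate lies above the cutting point."""
    return sum(e > cp for e in inactive_estimates) / len(inactive_estimates)


def pa_pi(pr, active_estimates, inactive_estimates):
    """Pa = E1(Pr) and Pi = E2(Pr) for a compound with initial estimate pr."""
    return (error_first_kind(active_estimates, pr),
            error_second_kind(inactive_estimates, pr))


def maximal_error(active_estimates, inactive_estimates):
    """Approximate MEP: scan observed estimates for the point where E1 and E2 are closest."""
    candidates = sorted(set(active_estimates) | set(inactive_estimates))
    best = min(candidates,
               key=lambda cp: abs(error_first_kind(active_estimates, cp)
                                  - error_second_kind(inactive_estimates, cp)))
    return max(error_first_kind(active_estimates, best),
               error_second_kind(inactive_estimates, best))


# Hypothetical LOO estimates for one activity.
actives = [0.40, 0.62, 0.70, 0.81, 0.90]
inactives = [0.10, 0.20, 0.35, 0.45, 0.60]

pa, pi = pa_pi(0.75, actives, inactives)
mep = maximal_error(actives, inactives)
print(pa, pi, mep)
```

With these sample numbers, three of the five actives have a lower estimate than the query compound, so Pa is 0.6, and no inactive has a higher one, so Pi is 0.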

 

 

 

Figure 1. The Process of PASS Development

PASS Elements

The principal elements of PASS include the Training Set, Chemical Structure Description, Biological Activity Description, and the Mathematical Approach. They are described in more detail below.

The Training Set

The PASS 1.611 training set consists of about 46,000 biologically active compounds, of which about 15,000 substances are already launched drugs and about 13,000 drug candidates are currently under clinical or advanced preclinical testing. Since 1972 this training set has been compiled from many sources, including publications, patents, databases, private communications, etc. For the majority of compounds included in the training set, the biological activity spectrum was studied in detail.

In PASS Pro the customer can easily create their own training set. A training set consists of an SDFile with the field activity_prediction. This file is read into PASS. It takes about 5 minutes to read a training set of 1000 compounds.

Chemical Structure Description

The structure of a compound is described by descriptors that we call Multilevel Neighborhoods of Atoms (MNA). The general idea is illustrated below for ethanol.

Level 0 | Level 1 | Level 2
C | C(HHHC) | C(H(C)H(C)H(C)C(HHCO))
C | C(HHCO) | C(H(C)H(C)C(HHHC)O(HC))
O | O(HC) | O(H(O)C(HHCO))
H | H(C) | H(C(HHHC))
H | H(C) | H(C(HHHC))
H | H(C) | H(C(HHHC))
H | H(C) | H(C(HHCO))
H | H(C) | H(C(HHCO))
H | H(O) | H(O(HC))

In the first step, the 1st- and 2nd-level neighborhoods of the atoms are generated (see above). In the second step, duplicate MNA descriptors are eliminated, leaving the following descriptors:

H(C(HHHC))
H(C(HHCO))
H(O(HC))
C(H(C)H(C)H(C)C(HHCO))
C(H(C)H(C)C(HHHC)O(HC))
O(H(O)C(HHCO))

Despite the possibility to continue the procedure calculating the 3rd, 4th, etc. levels of atoms' neighborhoods, only descriptors of the 1st and 2nd levels are used, because this approximation is shown to provide the best quality of prediction. 
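The two steps can be reproduced for ethanol with a short Python sketch. It is an illustration under stated assumptions, not the PASS descriptor generator: the molecule is entered by hand as an explicit-hydrogen atom list and bond list, the function names are hypothetical, and the neighbour ordering (H before C before N before O) is chosen only because it reproduces the ethanol table above; ring marks and other additional atom information are not handled.

```python
# Ethanol CH3-CH2-OH with explicit hydrogens (index -> element).
ATOMS = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
BONDS = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

# Assumed ordering of neighbour descriptors by leading element; it matches
# the ethanol table in the text but is not claimed to be the PASS ordering.
RANK = {"H": 0, "C": 1, "N": 2, "O": 3}


def adjacency(n_atoms, bonds):
    """Build a neighbour list for every atom from the bond list."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def mna_levels(atoms, bonds, max_level=2):
    """Return per-atom descriptors for levels 0..max_level (one list per level)."""
    adj = adjacency(len(atoms), bonds)
    levels = [list(atoms)]  # level 0: the mark of the atom itself
    for _ in range(max_level):
        prev = levels[-1]
        current = []
        for i, mark in enumerate(atoms):
            # Concatenate the previous-level descriptors of the neighbours
            # in a fixed order so that equal environments give equal strings.
            parts = sorted((prev[j] for j in adj[i]),
                           key=lambda d: (RANK[d[0]], d))
            current.append(mark + "(" + "".join(parts) + ")")
        levels.append(current)
    return levels


levels = mna_levels(ATOMS, BONDS)
unique_level2 = sorted(set(levels[2]))  # second step: drop duplicates
for descriptor in unique_level2:
    print(descriptor)
```

Running the sketch yields nine level-2 strings for the nine atoms, which collapse to the six unique descriptors listed above once duplicates are removed.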

Biological Activity Description

Let us define biological activity as the result of a compound's interaction with a biological entity. In clinical studies the entity is the human organism. In preclinical testing it can be animals (in vivo) or experimental models (in vitro). The biological activity depends on a compound's structure, charge distribution, physico-chemical properties, and more.

 

   

Quality of Prediction

The quality of prediction can be estimated by leave-one-out (LOO) cross-validation. Each compound is removed in turn from the training set, and its activity spectrum is predicted on the basis of the remaining part of the training set. The result is compared with the known activity of the compound, and the maximal error of prediction (MEP) is calculated and averaged over all compounds and activities.

This error is about 0.15 for the current version of PASS, so the average accuracy of prediction in LOO cross-validation is about 0.85. Such accuracy is sufficient for practical use, especially taking into account that a random guess would give 1/1000 = 0.001 for 1000 activities.

 Interpretation of the Prediction's Results

The total number of MNA descriptors of the compound and the number of new MNA descriptors are shown with the result. If the number of new descriptors is more than 3, the result of the prediction may be questionable.

Pa and Pi are estimates of the probability that the compound is active and inactive, respectively. The values vary from 0 to 1. Only activities with Pa > Pi are considered possible for a particular compound.

If Pa > 0.7, the chance of finding the activity experimentally is high, but in many cases the compound may turn out to be a close analogue of known pharmaceutical agents.

If 0.5 < Pa < 0.7, the chance of finding the activity experimentally is lower, but the compound is probably not so similar to known pharmaceutical agents.

If Pa < 0.5, the chance of finding the activity experimentally is lower still, but the chance of finding a structurally new compound increases.

Thus, one may choose which activities to test for one's compounds on the basis of a compromise between the expected novelty of the pharmacological agent and the risk of obtaining too many false positives.

Using a biological profile, i.e. several important activities together, one can increase the quality of prediction considerably.

Various Applications of PASS

It takes about 30 minutes to calculate the biological activity spectra for 100,000 compounds on an ordinary IBM PC (Pentium, 500 MHz). One can therefore use PASS effectively to predict the activities of many compounds from large in-house and commercial databases.

At NCI the PASS parameters are provided with each compound by updating the database periodically. One can also calculate the PASS parameters on the fly. This has the advantage that one also gets a prediction for compounds found by searching online at CAS or at DiscoveryGate.

PASS can be usefully applied for:

Revealing new effects and mechanisms of action for the old substances in corporate and private data bases.

Finding new leads among the compounds from in-house and commercial databases.

Selecting the most prospective compounds for high throughput screening from a set of available samples.

Determining relevant screens for a particular compound.

 

Revealing New Effects and Mechanisms of Action  

This is illustrated below by the prediction of the biological activity spectrum for the well-known cerebrotonic drug Cavinton (Vinpocetine), which was launched by Gedeon Richter (Hungary) more than twenty years ago. Its structural formula and predicted biological activity spectrum are given below.

 

Predicted biological activity spectrum for Cavinton

45 Descriptors, 0 New Descriptors, 47 Predicted Activities

No | Pa | Pi | Activity | Experiment | Reference
1 | 0.929 | 0.004 | Peripheral vasodilator | |
2 | 0.900 | 0.000 | Multiple sclerosis treatment | |
3 | 0.855 | 0.005 | Vasodilator | + | [17, 18]
4 | 0.844 | 0.003 | Abortion inducer | + | [17]
5 | 0.812 | 0.001 | Antineoplastic enhancer | |
6 | 0.760 | 0.006 | Coronary vasodilator | + | [19]
7 | 0.732 | 0.007 | Spasmogenic | |
8 | 0.700 | 0.036 | Antihypoxic | + | [17, 20, 21]
9 | 0.650 | 0.004 | Lipid peroxidase inhibitor | + | [22, 23]
10 | 0.648 | 0.008 | Cognition disorders treatment | + | [17, 24, 25]
11 | 0.656 | 0.021 | Antiischemic | + | [17, 26-28]
12 | 0.577 | 0.013 | Acute neurologic disorders treatment | + | [17, 18]
13 | 0.540 | 0.039 | Spasmolytic | + | [18]
14 | 0.519 | 0.026 | Antianginal agent | |
15 | 0.486 | 0.037 | Antihypertensive | + | [18]
16 | 0.449 | 0.035 | Antiarrhythmic | + | [29]
17 | 0.432 | 0.063 | Sympatholytic | |
18 | 0.438 | 0.077 | Sedative | + | [18]
19 | 0.500 | 0.152 | Antiinflammatory, Pancreatic | |
20 | 0.328 | 0.020 | Antidepressant, Imipramin-like | |
21 | 0.300 | 0.010 | Thrombolytic | + | [17, 18, 20]
22 | 0.342 | 0.075 | Psychotropic | + | [18]
23 | 0.276 | 0.023 | Alpha 2 adrenoreceptor antagonist | + | [30]
24 | 0.273 | 0.029 | Anesthetic intravenous | |
25 | 0.547 | 0.304 | Vascular (peripheral) disease treatment | |
26 | 0.225 | 0.006 | Antineoplastic Alkaloid | |
27 | 0.291 | 0.086 | Cholinergic antagonist | |
28 | 0.263 | 0.066 | Benzodiazepine agonist partial | |
29 | 0.417 | 0.238 | Insulin promoter | |
30 | 0.222 | 0.045 | MAO-A inhibitor | |
31 | 0.353 | 0.188 | Cardiovascular analeptic | |
32 | 0.249 | 0.100 | Narcotic antagonist | |
33 | 0.300 | 0.161 | Acetylcholine release stimulant | |
34 | 0.236 | 0.104 | Antitumor-cytostatic | |
35 | 0.271 | 0.165 | Antiparkinsonian, rigidity relieving | |
36 | 0.218 | 0.127 | Antidepressant | |
37 | 0.247 | 0.157 | Analeptic | |
38 | 0.211 | 0.126 | Potassium channel antagonist | |
39 | 0.243 | 0.158 | Antiparkinsonian, tremor relieving | |
40 | 0.333 | 0.258 | 5 Hydroxytryptamine 3 agonist | |
41 | 0.233 | 0.172 | Respiratory analeptic | |
42 | 0.242 | 0.184 | Antipsoriatic | |
43 | 0.131 | 0.081 | Analgesic, opioid | |
44 | 0.147 | 0.128 | N-cholinergic agonist | |
45 | 0.285 | 0.267 | cAMP phosphodiesterase inhibitor | + | [17]
46 | 0.175 | 0.162 | Anesthetic general | |
47 | 0.375 | 0.370 | Male reproductive dysfunction treatment | |

 

Cavinton has been used in medical practice for twenty years. Many activities that were found in preclinical testing and clinical trials during this period can be compared with the result of the prediction. According to the available literature, only 16 of the 47 predicted activities of Cavinton have been found so far. These activities are marked by "+" in the table above.

In particular, PASS predicts the vasodilator and spasmolytic activities (Pa=0.855 and 0.540). This corresponds to the well-known pharmacological effects of Cavinton: it causes vasodilatation and increases the brain blood flow and metabolism. Antihypoxic and antiischemic effects are also predicted for Cavinton (Pa=0.700 and 0.656, respectively), and Cavinton is used for these purposes. Cavinton is also predicted as a lipid peroxidase inhibitor (Pa=0.650), an agent for cognition disorders treatment (0.648), an agent for acute neurologic disorders treatment (0.577), etc. Cavinton has all these activities.

The predicted biological activity spectrum of Cavinton suggests several new applications of the substance. Among them are: Multiple sclerosis treatment (Pa=0.900); Antineoplastic enhancer (0.812), Antineoplastic Alkaloid (0.225) and Antitumor-cytostatic (0.236); Antiparkinsonian, rigidity relieving (0.271) and Antiparkinsonian, tremor relieving (0.243); etc. While Multiple sclerosis treatment and Antineoplastic enhancer are predicted with high probability, the other additionally predicted activities have relatively small values of Pa.

Similarly, the predicted activity spectrum for any compound provides ideas for further testing. As a result some new effects and mechanisms will be found for old substances. Varying the cutoff value of Pa one may choose the desirable level of novelty vs. acceptable risk of negative result.

Finding Potential New Leads 

A researcher can define desirable and undesirable activities for a compound and then select such compounds from a set of structures with the help of PharmaExpert. For example, among the 15630 compounds of the ChemStar database (http://www.chemstar-ru.com), 959 compounds are predicted as Endothelin antagonist, 236 compounds as Angiotensin II antagonist, and 57 compounds as Angiotensin converting enzyme inhibitor.

If the purpose of the study is to find compounds with a dual mechanism of antihypertensive effect, e.g. Angiotensin converting enzyme inhibitor + Endothelin antagonist, only 11 compounds are predicted as having both activities. The best of the hits has Pa=0.170 (Endothelin antagonist) and Pa=0.244 (Angiotensin converting enzyme inhibitor). Based on this result one may decide either to test these 11 compounds or to carry out the prediction and selection for compounds from another database. In any case, by varying the cutoff value of Pa it is possible to choose compounds with lower or higher novelty (see: Interpretation of the Prediction's Results).
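The selection of compounds that are predicted to have several desired activities at once can be sketched as follows. This is not how PharmaExpert is implemented; it only illustrates the selection logic under assumptions: the function name `select_dual`, the compound identifiers, and all Pi values are invented, and only the two Pa values 0.170 and 0.244 echo the best hit quoted above.

```python
def select_dual(predictions, wanted, pa_cutoff=0.0):
    """Return IDs of compounds for which every wanted activity has Pa > Pi and Pa > pa_cutoff."""
    hits = []
    for compound_id, spectrum in predictions.items():
        ok = True
        for activity in wanted:
            if activity not in spectrum:
                ok = False
                break
            pa, pi = spectrum[activity]
            if not (pa > pi and pa > pa_cutoff):
                ok = False
                break
        if ok:
            hits.append(compound_id)
    return hits


ACE = "Angiotensin converting enzyme inhibitor"
ET = "Endothelin antagonist"

# Hypothetical predicted spectra: activity -> (Pa, Pi).
predictions = {
    "cmpd-001": {ET: (0.170, 0.120), ACE: (0.244, 0.110)},
    "cmpd-002": {ET: (0.310, 0.050)},                       # only one of the two activities
    "cmpd-003": {ET: (0.090, 0.200), ACE: (0.150, 0.100)},  # Pa < Pi for one activity
}

dual_hits = select_dual(predictions, [ACE, ET])
strict_hits = select_dual(predictions, [ACE, ET], pa_cutoff=0.2)
print(dual_hits, strict_hits)
```

Raising the cutoff empties the hit list in this toy example, which mirrors the decision described above: either test the low-Pa hits or screen another database.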

Selecting the Most Prospective Compounds for High-Throughput Screening

Sometimes one is interested in activities that are not yet included in PASS, and data are not available to train one's own knowledge base for PASS Pro. In such cases two other strategies are suitable.

The first strategy is based on the hypothesis that the more activities are predicted for a compound, the higher the chance of finding some useful pharmacological action for this compound. For each compound the following value is calculated:

$$P = \frac{1}{n}\sum \frac{Pa}{Pa+Pi}$$

where n is the number of biological activities under consideration and the sum runs over these activities.

All compounds are arranged in the descending order of P values, and only compounds with the highest values of P are selected for screening. 
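A minimal Python sketch of this first strategy is given below. The function name, the compound labels, and the (Pa, Pi) pairs are hypothetical; the treatment of a pair with Pa + Pi = 0 as contributing zero is our own assumption to avoid a division by zero and is not taken from the source.

```python
def mean_activity_share(spectrum):
    """P = (1/n) * sum(Pa / (Pa + Pi)) over the n activities under consideration."""
    if not spectrum:
        return 0.0
    # Assumption: a pair with Pa + Pi == 0 contributes nothing to the sum.
    terms = [pa / (pa + pi) for pa, pi in spectrum if pa + pi > 0]
    return sum(terms) / len(spectrum)


# Hypothetical predicted (Pa, Pi) pairs for three compounds.
library = {
    "A": [(0.9, 0.1), (0.6, 0.2)],
    "B": [(0.3, 0.3), (0.2, 0.6)],
    "C": [(0.8, 0.2), (0.1, 0.4)],
}

scores = {name: mean_activity_share(spectrum) for name, spectrum in library.items()}
ranking = sorted(scores, key=scores.get, reverse=True)  # highest P first
print(ranking)
```

Only the top of `ranking` would then be passed on to screening, as described above.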

The second strategy is based on the hypothesis that the more "novel" a compound is, the higher the probability of finding an NCE. Thus, the compounds with the highest number of new descriptors are selected.

Both strategies were tested on datasets including 10,000 - 70,000 compounds, and their efficacy was shown [31].

Determining Relevant Screens for a Particular Compound. 

Testing can be organized in descending order of difference (Pa-Pi) for different activities. For example, if we consider the example of Cavinton, it should be studied in the following tests: Peripheral vasodilator (0.929-0.004), Multiple sclerosis treatment (0.900-0.000), Vasodilator (0.855-0.005), Abortion inducer (0.844-0.003), Antineoplastic enhancer (0.812-0.001), Coronary vasodilator (0.760-0.006), etc.

In this case both safety and efficacy of a new compound will be characterized more comprehensively. Moreover, it is shown that the economic viability of such approach to testing is more than 500% [32]. 

Experimental Verification

The predictions of PASS have been confirmed by experiment. Some examples are given below.

The activity spectra were predicted for 300 new chemical compounds synthesized in the Chemical-Pharmaceutical Research Institute (Novokuznetsk). Twenty compounds were selected for testing as probable antiulcer agents. Nine compounds were synthesized and tested. A potent antiulcer activity was found for 5 of these compounds. These new antiulcer agents are NCE [33]. The economic advantage is about (300/20) × 100 = 1500% in this study.

The activity spectra were predicted for 520 new chemical compounds synthesized in the Institute of Organic Chemistry of the Russian Academy of Sciences (Moscow). Fourteen compounds were selected for testing as the most prospective. The results of 22 experiments on 5 different kinds of activity coincided with the predictions in 20 cases, so the accuracy of prediction is about 90%.

Based on the predicted biological activity spectra for about 20 macroheterocyclic compounds, 2 antitumor leads were found [34].

New antibacterial agents were found based on the biological activity spectra for derivatives of 1-amino-4-(5-aryloxazolyl-2)-butadienes-1,3 [35].

Analgesic, antiinflammatory, antioxidant and some additional activities were predicted and confirmed by experiment for some thiazole derivatives [36].

 

These and some other examples demonstrate that the approach to predicting many biological activities simultaneously can be effectively applied to compounds from different chemical series to find various pharmacological actions.

Naturally, the PASS approach has some limitations. They are:

The PASS approach can be applied only to so-called "drug-like" substances.

PASS can be applied only to activities for which the training set includes no fewer than 5 active compounds per activity.

The accuracy of the PASS predictions is significantly higher than that of a random guess, but PASS cannot predict the activity spectrum for essentially new compounds that have no descriptors in common with the training set.

In some cases PASS predicts both agonist and antagonist (stimulator and blocker) actions simultaneously. In that case only experiments can clarify the intrinsic activity of the compound, but it probably has an affinity for the corresponding receptor (enzyme).

 

 

Using PASS via Internet

Since July 1998 PASS is open for free testing via Internet (http://www.ibmh.msk.su/PASS/default.htm). 

Anyone who would like to obtain additional information about the biological potential of her compound may fill the registration form and send the structure file in ISIS (MDL Information Systems, Inc.) "MOLl" format.

Such files can be prepared, for example, with the chemical editor ISIS/Draw (MDL Information Systems, Inc.). ISIS/Draw is available free for personal or non-commercial use from the MDL web site http://www.mdl.com.

The molfile can be prepared with ISIS/Draw by drawing the sucture using the menu options and the mouse. After that one can choose "Edit" a "Select All". When the molecule is selected as a total, choose "File" a "Export" a "Molfile". Files have to be saved on the disk under a name defined by the user. 

 

Once the molfile is prepared and the registration form in the Internet version of PASS is filled in, click "Browse" to select the molfile, then click "Submit now" and wait for the result. In case of any problem, please send an e-mail to pass@ibmh.msk.su.

Conclusions

A new program has been developed for predicting the biological activity of drug-like compounds on the basis of their 2D structure. It can be applied effectively to finding new leads. This is demonstrated with examples of both compounds with known activities and newly synthesized structures studied as potential pharmacological agents. The program can be tested easily via the Internet, by downloading a light version, or by applying for an evaluation of the full version. Please send an email to us.

Acknowledgments

We gratefully acknowledge MDL Information Systems, Inc. for providing ISIS/Host, ISIS/Base and the MDDR database used in this study.

This is an edited version of the original paper, A. Kos 3.2.03.

Table 1

FLUOROURACIL'S KNOWN ACTIVITIES (CAS NO. 51-21-8)

Activity                                     Publ. Year
Antineoplastic                               1962
Embryotoxic                                  1966
Antiviral                                    1971
Thymidine Triphosphate Synthesis Inhibition  1977
RNA Synthesis Inhibition                     1980
Protein Synthesis Inhibition                 1987
Lipid Metabolism Regulator                   1990
Immunosuppressant                            1990
Antimetabolite                               1991
Antiviral (AIDS)                             1996
. . .                                        ?
. . .                                        ?

 

Table 2

EXAMPLES OF PHARMACEUTICALS WHOSE ADDITIONALLY DISCOVERED ACTIVITY WAS USED FOR ANOTHER INDICATION

Pharmaceutical   Therapeutic Effect   Year
Acetazolamide    Diuretic             1954
                 Antiepileptic        1956
Valproate        Anxiolytic           1961
                 Antiepileptic        1989
Levamisole       Anthelmintic         1968
                 Immunostimulant      1980
Alprostadil      Antiaggregant        1988
                 Erectant             1994
Aspirin          Analgesic            1899
                 Antiaggregant        1971
?                ?                    ?

 

References

[1] Wermuth C.G., ed., Medicinal chemistry in practice, Academic Press, London, 1996, 968 pp.

[2] Van de Waterbeemd H., ed., Structure-property correlations in drug research, Landes, Austin, 1996, 210 pp.

[3] Dean P.M., Molecular similarity in drug design, Blackie Academic, London, 1995.

[4] Livingstone D., Data analysis for chemists. Applications to QSAR and Chemical Product Design, Oxford Science Publ., Oxford, 1995, 239 pp.

[5] Kubinyi H., ed., 3D QSAR in drug design, Escom, Leiden, 1993, 759 pp.

[6] Avidon V., Criteria for similarity assessment of chemical structures and the basics of informational language for development of informational-logical system on biologically active compounds. Chem. & Pharmaceut. J. (Rus.), 1974, 8 (8), 22-25.

[7] Piruzyan L.A., Avidon V.V., Rozenblit A.B., et al. Statistical analysis of the information file on biologically active compounds. I. Data base on the structure and activity of biologically active compounds. Chem. & Pharmaceut. J. (Rus.), 1977, 11 (4), 35-40.

[8] Piruzyan L.A., Rudzit E.A. The methodical approaches to study biological activity of chemical compounds. Chem. & Pharmaceut. J. (Rus.), 1976, 10 (8), 21-27.

[9] Burov Yu.V., Korolchenko L.V., Poroikov V.V. National system for registration and biological testing of chemical compounds: facilities for new drugs' search. Bull. Natl. Center for Biologically Active Compounds (Rus.), 1990, No. 1, 4-25.

[10] Filimonov D.A., Poroikov V.V., Karaicheva E.I., et al. Computer-aided prediction of biological activity spectra of chemical substances on the basis of their structural formulae: computerized system PASS. Experimental and Clinical Pharmacology (Rus), 1995, 58 (2), 56-62.

[11] Filimonov D.A., Poroikov V.V. PASS: Computerized prediction of biological activity spectra for chemical substances. Bioactive Compound Design: Possibilities for Industrial Use, BIOS Scientific Publishers, Oxford, 1996, p.47-56.

[12] Poroikov V.V., Filimonov D.A. Computerized prediction of biological activity spectra for chemical substance - new approach to effective drug design. In: QSAR and Molecular Modelling Concepts, Computational Tools and Biological Applications. Barcelona: Prous Science Publishers, 1996, p.49-50.

[13] Poroikov V.V., Filimonov D.A., Stepanchikova A.V., et al. Optimization of synthesis and pharmacological testing of new compounds based on computerized prediction of their biological activity spectra. Chem. & Pharmaceut. J. (Rus), 1996, 30 (9), 20-23. (English translation by Consultants Bureau, New York: Pharmaceutical Chemistry Journal, 1996, 30 (9), 570-573).

[14] Poroikov V.V. PASS, a program for the prediction of activity spectra from molecular structure. Newsletter of The QSAR and Modelling Society, 1997, No. 8, 12-15.

[15] Gloriozova T.A., Filimonov D.A., Lagunin A.A., Poroikov V.V. Testing of computer system for prediction of biological activity spectra PASS on the set of new chemical compounds. Chem. & Pharmaceut. J. (Rus), 1996, In press.

[16] Filimonov D.A. Comparison of Algorithms for Computer Prediction of Biological Activity Spectra for Chemical Compounds on the Basis of Their Structural Formulae. II Rus. Natl. Congress "Man and Drugs", Moscow, Abstracts, 1995, 62-63.

[17] Summary of Cavinton (Vinpocetine) Gedeon Richter, Budapest-Hungary, 1994-06-07.

[18] Mashkovskii M.D. The Pharmaceuticals, Medicine, Moscow, 1997, v.1, 399-400.

[19] VIDAL. Pharmaceuticals in Russia. Moscow, AstraPharmService, 1997.

[20] Kiss B., Karpati E. Acta Pharm. Hung., 1996, 66 (5), 213-224.

[21] Plotnikova T.M., Plotnikov M.V., Bazhenova T.G. Bull. Exp. Biol. Med., 1991, 111 (2), 170-172.

[22] Karmazsin L., Olah V. A., Balla G., Makay A. Acta Paediatr. Hung. 1990, 30 (2), 217-224.

[23] Suno M., Nagaoka A. Nippon Yakurigaku Zasshi, 1988, 91 (5), 295-299.

[24] Boda J., Karsay K., Czako L., Fugi S., Kovacs A., Koncz I., Maczko P. A. Ther. Hung., 1989, 37 (3), 176-180.

[25] Molnar P., Gaal L. Eur. J. Pharmacol., 1992, 215 (1), 17-22.

[26] Kiss B., Karpati E. Acta Pharm. Hung., 1996, 66 (5), 213-224.

[27] Hadjiev D., Yancheva S. Arzneimittelforschung, 1976, 26 (10A), 1947-1950.

[28] Rischke R., Krieglstein J. Pharmacology, 1990, 41 (3), 153-160.

[29] Karpati E., Szporny L. Arzneimittelforschung, 1976, 26 (10A),1908-1912.

[30] Paulo T., Toth P.T., Nguyen T.T., Forgacs L., Torok T.L., Magyar K. J. Pharm. Pharmacol., 1986, 38 (9), 668-73.

[31] Poroikov V.V., Filimonov D.A., Stepanchikova A.V. Biological Activity Spectra Prediction as a Tool to Select the Most Prospective Compounds from Commercial and In-House Databases. Abstr. Intern. Med. Chem. Symp., Seoul, 1997, P.143.

[32] Poroikov V.V, Filimonov D.A, Boudunova A.P. Computer Assisted Prediction of Biological Activity Spectra: Estimating the Effectivity of Use in High Throughput Screening. Abstr: XIVth International Symposium on Medicinal Chemistry, Maastricht, the Netherlands, 1996, P-3.05.

[33] Trapkov V.A., Budunova A.P., Burova O.A., Filimonov D.A., Poroikov V.V. Discovery of New Antiulcer Agents by Computer Aided Prediction of Biological Activity. Problems in Medical Chemistry (Moscow), 1997, 43 (1), 41-57.

[34] Islyaikin M.K., Danilova E.A., Kudrik E.V., Smirnov R.P., Boudunova A.P., Kinzirskii A.S. Synthesis and study of antitumor action of macroheterocyclic compounds and their complexes with metals. Chemical & Pharmaceutical J. (Rus), 1997, 31 (8), 19-22.

[35] Maiboroda D.A., Babaev E.V., Goncharenko L.V. Synthesis and study of spectral and pharmacological properties of 1-amino-4-(5-aryloxazolyl-2)-butadienes-1,3. Chemical & Pharmaceutical J. (Rus), 1998, 32 (6), 24-28.

[36] Geronikaki A., Poroikov V., Hadjipavlou-Litina D., Mgonzo R., Filimonov D., Lagunin A. Synthesis, computer assisted prediction of biological activity spectra and experimental testing of new thiazole derivatives. Quantitative Structure-Activity Relationships, 1998, In press.

Anzali S., Barnickel G., Cezanne B., Krug M., Filimonov D., Poroikov V. (2001). Discriminating between drugs and nondrugs by Prediction of Activity Spectra for Substances (PASS). J. Med. Chem. 44: 2432-2437.

Avidon V.V. (1974). Criteria for the comparison of chemical structures and principles of construction of an information language for a logical information system for biologically active compounds. Pharm-Chem. J. (Rus). 8: 22-25.

Avidon V.V., Arolovich V.S., Kozlova S.P., Piruzian L.A. (1978a). Statistical study of information file on biologically active compounds. II. Choice of decision rule for biological activity prediction. Pharm-Chem. J. (Rus). 12: 88-93.

Avidon V.V., Arolovich V.S., Kozlova S.P., Piruzian L.A. (1978b). Statistical investigation of large volumes of data with respect to the biological activity of compounds III. Selection of a determinant for predicting biological activity. Pharm-Chem. J. (Rus). 12: 99–106.

Avidon V.V., Pomerantsev I.A., Rozenblit A.B., Golender V.E. (1982). Structure-activity relationship oriented languages for chemical structure representation. J. Chem. Inf. Comput. Sci. 22: 207-214.

Avidon V.V., Arolovich V.S., Blinova V.G., Freidina A.M. (1983). Statistical investigation of the data file on biologically active compounds. V. Allowance for the novelty of the chemical structure in the prediction of the biological activity by an improved method of substructural analysis. Pharm-Chem. J. (Rus). 17: 59-62.

Burov Yu.V., Poroikov V.V., Korolchenko L.V. (1990). National system for registration and biological testing of chemical compounds: facilities for new drugs search. Bull. Natl. Cent. Biol. Active Compnds (Rus.). No. 1: 4-25.

Delmas F., Di Giorgio C., Robin M., Azas N., Gasquet M., Detang C., Costa M., Timon-David P., Galy J.P. (2002). In vitro activities of position 2 substitution-bearing 6-nitro- and 6-aminobenzothiazoles and their corresponding anthranilic acid derivatives against Leishmania infantum and Trichomonas vaginalis. Antimicrob. Agents Chemother. 46: 2588–2594.

Di Giorgio C., Delmas F., Filloux N., Robin M., Seferian L., Azas N., Gasquet M., Costa M., Timon-David P., Galy J.P. (2003). In vitro activities of 7-substituted 9-chloro and 9-amino-2-methoxyacridines and their bis- and tetra-acridine complexes against Leishmania infantum. Antimicrob. Agents Chemother. 47: 174–180.

Di Giorgio C., Delmas F., Ollivier E., Elias R., Balansard G., Timon-David P. (2004). In vitro activity of the beta-carboline alkaloids harmane, harmine, and harmaline toward parasites of the species Leishmania infantum. Exp. Parasitol. 106: 67–74.

Dolzhenko A.V., Kolotova N.V., Koz'minykh V.O., Vasilyuk M.V., Kotegov V.P., Novoselova G.N., Syropyatov B.Ya., Vakhrin M.I. (2003). Substituted amides and hydrazides of dicarboxylic acids. Part 14. Synthesis and antimicrobial and antiinflammatory activity of 4-antipyrylamides, 2-thiazolylamides, and 1-triazolylamides of some dicarboxylic acids. Pharm-Chem. J. 37: 149–151.

Filimonov D.A., Poroikov V.V., Karaicheva E.I., Kazarian R.K., Budunova A.P., Mikhailovskii E.M., Rudnitskikh A.V., Goncharenko L.V., Burov Yu.V. (1995). Computer-aided prediction of biological activity spectra of chemical substances on the basis of their structural formulae: computerized system PASS. Exper. Clin. Pharmacol. (Rus). 58: 56-62.

Filimonov D.A., Poroikov V.V. (1996). PASS: computerized prediction of biological activity spectra for chemical substances. In: Bioactive Compound Design: Possibilities for Industrial Use, BIOS Scientific Publishers, Oxford (UK), pp.47-56.

Filimonov D., Poroikov V., Borodina Yu., Gloriozova T. (1999). Chemical Similarity Assessment through multilevel neighborhoods of atoms: definition and comparison with the other descriptors. J. Chem. Inf. Comput. Sci. 39: 666-670.

Filimonov D.A., Poroikov V.V. (2006). Prediction of biological activity spectra for organic compounds. Russian Chemical Journal, 50 (2), 66-75

Filimonov D.A., Poroikov V.V. (2008). Probabilistic approach in activity prediction. In: Chemoinformatics Approaches to Virtual Screening. Eds. Alexandre Varnek and Alexander Tropsha. Cambridge (UK): RSC Publishing, 182-216.

Filimonov D.A., Zakharov A.V., Lagunin A.A., Poroikov V.V. (2009). QNA based “Star Track” QSAR approach. SAR & QSAR Environ. Res. 20: 679-709.

Geronikaki A., Babaev E., Dearden J., Dehaen W., Filimonov D., Galaeva I., Krajneva V., Lagunin A., Macaev F., Molodavkin G., Poroikov V., Saloutin V., Stepanchikova A., Voronina T. (2004). Design of new anxiolytics: from computer prediction to synthesis and biological evaluation. Bioorg. Med. Chem. 12: 6559-6568.

Geronikaki A., Druzhilovsky D., Zakharov A., Poroikov V. (2008a). Computer-aided predictions for medicinal chemistry via Internet. SAR & QSAR Environ. Res. 19: 27-38.

Geronikaki A.A., Lagunin A.A., Hadjipavlou-Litina D.I., Elefteriou P.T., Filimonov D.A., Poroikov V.V., Alam I., Saxena A.K. (2008b). Computer-aided discovery of anti-inflammatory thiazolidinones with dual cyclooxygenase/lipoxygenase inhibition. J. Med. Chem. 51: 1601-1609.

Goel R.K., Kumar V., Mahajan M.P. (2005). Quinazolines revisited: search for novel anxiolytic and GABAergic agents. Bioorg. Med. Chem. Lett. 15: 2145–2148.

Golender V.E., Rozenblit A.E. (1978). Computer Methods for Drug Design. Riga: Zinatne, 232 pp.

Golender V.E., Rosenblit A.B. (1983). Logical and Combinatorial Algorithms for Drug Design, Research Studies Press, Wiley&Sons, 352 pp.

Labanauskas L., Brukstus A., Udrenaite E., Bucinskaite V., Susvilo I., Urbelis G. (2005). Synthesis and anti-inflammatory activity of 1-acylaminoalkyl-3,4-dialkoxybenzene derivatives. Il Farmaco. 60: 203–207.

Lagunin A., Stepanchikova A., Filimonov D., Poroikov V. (2000). PASS: prediction of activity spectra for biologically active substances. Bioinformatics. 16: 747-748.

Lagunin A.A., Gomazkov O.A., Filimonov D.A., Gureeva T.A., Dilakyan E.A., Kugaevskaya E.V., Elisseeva Yu.E., Solovyeva N.I., Poroikov V.V. (2003). Computer-aided selection of potential antihypertensive compounds with dual mechanisms of action. J. Med. Chem. 46: 3326-3332.

PASS program package, © Filimonov D.A., Poroikov V.V., Gloriozova T.A., Lagunin A.A. Russian State Patent Agency, N 2006613275 of 15.09.2006.

PharmaExpert program package, © Lagunin A.A., Poroikov V.V., Filimonov D.A., Gloriozova T.A. Russian State Patent Agency, N 2006613590 of 16.10.2006.

Poroikov V.V., Filimonov D.A., Boudunova A.P. (1993). Comparison of the Results of Prediction of the Spectra of Biological Activity of Chemical Compounds by Experts and the PASS System. Automat Document Math Linguistics. 27: 40-43.

Poroikov V.V., Filimonov D.A., Borodina Yu.V., Lagunin A.A., Kos A. (2000). Robustness of biological activity spectra predicting by computer program PASS for non-congeneric sets of chemical compounds. J. Chem. Inform. Comput. Sci. 40: 1349-1355.

Poroikov V., Akimov D., Shabelnikova E., Filimonov D. (2001). Top 200 medicines: can new actions be discovered through computer-aided prediction? SAR and QSAR in Environmental Research, 12 (4), 327-344.

Poroikov V.V., Filimonov D.A. (2002). How to acquire new biological activities in old compounds by computer prediction. J. Comput. Aid. Molec. Des., 16 (11), 819-824.

Poroikov V.V., Filimonov D.A., Ihlenfeldt W.-D., Gloriozova T.A., Lagunin A.A., Borodina Yu.V., Stepanchikova A.V., Nicklaus M.C. (2003). PASS Biological Activity Spectrum Predictions in the Enhanced Open NCI Database Browser. J. Chem. Inform. Comput. Sci. 43: 228-236.

Poroikov V., Filimonov D. (2005). PASS: Prediction of Biological Activity Spectra for Substances. In: Predictive Toxicology. Ed. by Christoph Helma. Taylor & Francis, 459-478.

Poroikov V., Lagunin A., Filimonov D. (2005). PharmaExpert: diseases, targets and ligands – three in one. QSAR and Molecular Modelling in Rational Design of Bioactive Molecules. Eds. Esin Aki Sener, Ismail Yalcin,  Ankara (Turkey), CADD & D Society, 514-515.

Poroikov V., Filimonov D., Lagunin A., Gloriozova T., Zakharov A. (2007). PASS: Identification of probable targets and mechanisms of toxicity. SAR & QSAR in Environmental Research., 18 (1-2), 101-110.

Sadym A., Lagunin A., Filimonov D., Poroikov V. (2003). Prediction of biological activity spectra via Internet. SAR & QSAR Environ. Res. 14: 339-347.

Stepanchikova A.V., Lagunin A.A., Filimonov D.A., Poroikov V.V. (2003). Prediction of biological activity spectra for substances: Evaluation on the diverse set of drug-like structures. Curr. Med. Chem. 10: 225-233.