Bayesian networks (BNs) are widely used to represent uncertain knowledge in many disciplines, including biology, computer science, risk analysis, service quality analysis, and business. However, as the number of nodes and edges grows, structure learning becomes harder and algorithms become inefficient. Heuristic optimization algorithms address this problem by searching for a near-optimal answer rather than an exact one; particle swarm optimization (PSO) is one of them. PSO is a swarm intelligence algorithm inspired by the way flocks of birds search for food. It is widely employed because it is easy to code, converges quickly, and parallelizes easily. We use a recently proposed version of PSO called generalized particle swarm optimization (GEPSO) to learn the Bayesian network structure. We construct an initial directed acyclic graph (DAG) by using the max-min parents and children (MMPC) algorithm and cross relative average entropy (CRAE). This DAG is used to create a population for the GEPSO optimization procedure. Moreover, we propose a velocity update procedure to increase the efficiency of the search process. Experimental results show that as the complexity of the dataset increases, our algorithm, Bayesian network generalized particle swarm optimization (BN-GEPSO), outperforms the PSO algorithm in terms of the Bayesian information criterion (BIC) score.

Bayesian networks (BN) [

In simple terms, a BN is a DAG, i.e., it consists of nodes and edges, where nodes represent random variables and edges represent the conditional dependencies among variables. Moreover, a BN also has conditional probability tables (CPTs) showing the probability of the occurrence of an event given a combination of nodes and their values [

We can use three methods to learn the Bayesian network structure: score-based, where we choose the Bayesian network having the best score [

In the past decades, researchers have proposed different heuristic approaches to learning Bayesian network structure, including (but not limited to) particle swarm optimization (PSO) [

The remainder of the paper is organized as follows. In Section 2, we discuss recent and past work on our topic and state our contribution. In Section 3, we present the basics of Bayesian networks and score-based BN structure learning. In Section 4, we discuss in detail the discretization of GEPSO and the velocity update procedure. In Section 5, we report our proposed methodology. In Section 6, we present the experimental setup, evaluation indicators, and results. Lastly, conclusions are presented in Section 7.

Recently researchers have adopted the following approaches to learning BN structure using PSO: local-information PSO (LIPSO) [

Experiments on higher-complexity networks commonly face challenges in both run time and accuracy.

Generalized particle swarm optimization (GEPSO) [

Ref. No. | PSO/Variant | Hybrid or Score-based | Scoring function used |
---|---|---|---|
9 | PSO | Hybrid | BIC |
19 | PSO | Score-based | BIC |
20 | Variant | Hybrid | BIC |
21 | Variant | Score-based | K2 |
22 | Variant | Score-based | BIC |
24 | Variant | Score-based | AIC |
39 | Variant | Score-based | BDe |
40 | Variant | Score-based | MDL |

In this paper, we have made two contributions: firstly, we discretize GEPSO and use it to learn BN structure (the discretization was inspired by [

Two basic components that make a BN are a DAG denoted as

An edge that goes from node N1 to node N2 (denoted as N1→N2) means that node N1 is the cause (parent) of node N2. The absence of an arrow indicates that the variables under observation do not depend on each other (marginal or conditional independence). Leveraging the Markov property, we can write the following, see
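As an illustration of the Markov property, the joint probability of a full assignment can be computed as the product of each node's probability given its parents. The sketch below assumes this standard factorization; the two-node network, its variable names, and its probabilities are hypothetical and not taken from the paper.

```python
# Hypothetical network Rain -> WetGrass with made-up probabilities.
parents = {"Rain": [], "WetGrass": ["Rain"]}

# CPTs: node -> {tuple of parent values -> {node value -> probability}}
cpts = {
    "Rain": {(): {True: 0.2, False: 0.8}},
    "WetGrass": {
        (True,): {True: 0.9, False: 0.1},
        (False,): {True: 0.1, False: 0.9},
    },
}


def joint_probability(assignment, parents, cpts):
    """Return P(assignment) as the product of P(node | parents(node))."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpts[node][parent_values][assignment[node]]
    return prob


# P(Rain=True, WetGrass=True) = P(Rain=True) * P(WetGrass=True | Rain=True)
p = joint_probability({"Rain": True, "WetGrass": True}, parents, cpts)
```

Because each factor only involves a node and its parents, the CPTs are all that is needed to evaluate any full assignment.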

The aim of the score-based BN structure learning is to use the training dataset (D) to determine the network structure (T) having the best score see

Since P(D) is independent of structure, P(T, D) is used as a scoring metric. There are different scoring metrics, but the one we will be using is the BIC score see
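A minimal sketch of a BIC computation for discrete data follows. It assumes the common form of the score, the maximum-likelihood log-likelihood minus (log N / 2) times the number of free parameters; the paper's own equation may differ in notation or sign convention. The function name, the data layout (a list of dictionaries), and the toy dataset are illustrative assumptions.

```python
import math
from collections import Counter


def bic_score(data, parents, cardinality):
    """Assumed form: log-likelihood - (log N / 2) * number of free parameters."""
    n_samples = len(data)
    score = 0.0
    for node, node_parents in parents.items():
        # Counts of (parent configuration, node value) and of each parent configuration.
        joint = Counter((tuple(row[p] for p in node_parents), row[node]) for row in data)
        config = Counter(tuple(row[p] for p in node_parents) for row in data)
        # Log-likelihood contribution of this node under maximum-likelihood CPTs.
        score += sum(c * math.log(c / config[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (values of node - 1) * (number of parent configurations).
        n_configs = math.prod(cardinality[p] for p in node_parents)
        score -= 0.5 * math.log(n_samples) * (cardinality[node] - 1) * n_configs
    return score


# Toy data in which B always equals A (hypothetical, for illustration only).
data = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
cardinality = {"A": 2, "B": 2}
with_edge = bic_score(data, {"A": [], "B": ["A"]}, cardinality)
no_edge = bic_score(data, {"A": [], "B": []}, cardinality)
```

The score decomposes over nodes, so a search procedure that changes one edge only needs to rescore the nodes whose parent sets changed.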

Here

After selecting the scoring function, we apply GEPSO to search for the optimal solution. Since we are working on a discrete optimization problem, discrete binary PSO (DBPSO) coding is used, where the particle position is represented with a matrix (see

Both of these matrices have size n × n.
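The matrix coding can be sketched as follows. The convention that entry (i, j) = 1 means an edge from node i to node j is an assumption made here for illustration, as are the helper names and the three-node example.

```python
nodes = ["A", "B", "C"]  # hypothetical variable names
n = len(nodes)


def edges_to_matrix(edges, nodes):
    """Build an n x n 0/1 matrix; entry [i][j] = 1 is assumed to mean edge i -> j."""
    index = {name: k for k, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for parent, child in edges:
        matrix[index[parent]][index[child]] = 1
    return matrix


def matrix_to_parents(matrix, nodes):
    """Read the parent set of each node from the columns of the matrix."""
    size = len(nodes)
    return {
        nodes[j]: [nodes[i] for i in range(size) if matrix[i][j]]
        for j in range(size)
    }


# Particle position for the structure A -> B, A -> C.
position = edges_to_matrix([("A", "B"), ("A", "C")], nodes)
# A velocity matrix has the same n x n shape; here it starts at zero.
velocity = [[0] * n for _ in range(n)]
```

Reading parents column by column gives exactly the parent sets that a decomposable score needs.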

The GEPSO formula is redefined to work in discrete space (using DBPSO). Refer to

We update the personal and global best position as shown in
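The bookkeeping for personal and global bests can be sketched as below. This is a generic PSO update, not the paper's exact equation; the function name and the caller-supplied comparison `is_better` are assumptions, so the sketch does not commit to which direction of the score counts as an improvement.

```python
def update_bests(positions, scores, pbest, pbest_scores, gbest, gbest_score, is_better):
    """Update personal bests in place and return the (possibly new) global best."""
    for k, (pos, score) in enumerate(zip(positions, scores)):
        if is_better(score, pbest_scores[k]):
            pbest[k] = [row[:] for row in pos]  # copy so later moves do not alter it
            pbest_scores[k] = score
        if is_better(score, gbest_score):
            gbest = [row[:] for row in pos]
            gbest_score = score
    return gbest, gbest_score


# Two hypothetical particles on a two-node problem.
positions = [[[0, 1], [0, 0]], [[0, 0], [1, 0]]]
scores = [-10.0, -12.0]
pbest = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
pbest_scores = [-11.0, -11.0]
gbest, gbest_score = update_bests(
    positions, scores, pbest, pbest_scores,
    [[0, 0], [0, 0]], -11.0,
    is_better=lambda new, old: new > old,  # assumed comparison for this example
)
```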

In our research, we keep the position update as it is and propose a selection method to update the velocity. We define the five velocities shown in

After that, we save all BIC scores in an array and normalize the array by its sum (so that the outputs lie in the interval [0, 1]). We sort the array (array = [a, b, c, d, e]) in ascending order; while doing so, we keep each velocity linked to its array index, so that we know whether “a” corresponds to V1 or V2, and so on. To remain unbiased, we draw a random number in the interval [0, 1] and pick the update velocity as in
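The normalize, sort, and draw steps can be sketched as follows. The final selection rule is not reproduced in this text, so the sketch assumes a roulette-wheel rule: the first velocity whose cumulative normalized share reaches the random number is chosen. That rule, the function name, and the example scores are assumptions; the sketch also assumes all scores share the same sign, so that the normalized values fall in [0, 1].

```python
import random


def select_velocity(scores, rng=random):
    """Pick a velocity label from {label: BIC score} via an assumed roulette-wheel rule."""
    total = sum(scores.values())
    # Normalizing by the sum maps same-sign scores to shares in [0, 1] that sum to 1.
    shares = {label: s / total for label, s in scores.items()}
    # Sort ascending while keeping each share linked to its velocity label.
    ordered = sorted(shares.items(), key=lambda item: item[1])
    r = rng.random()
    cumulative = 0.0
    for label, share in ordered:
        cumulative += share
        if r <= cumulative:
            return label
    return ordered[-1][0]  # guard against floating-point round-off


# Hypothetical BIC scores for the five candidate velocities.
scores = {"V1": -10.0, "V2": -30.0, "V3": -20.0, "V4": -25.0, "V5": -15.0}
chosen = select_velocity(scores, random.Random(0))
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing mean results over repeated experiments.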

In our proposed method, we get an initial DAG (initial structure) to generate PSO particles by combining the MMPC algorithm [

The MMPC algorithm takes the target node and the dataset as input and outputs a set of candidate parents and children (CPC) of the target node (the size of this set is not constrained in our approach). In the forward phase, the max-min heuristic returns the maximum association achieved (the maximum of all the minimum associations, given the target node and the CPC set) and the node that attains this value. If this association value is greater than zero, we add that node to the CPC set. In the backward phase, we check the CPC set for false positives: if conditioning on some node in the CPC set makes another node in the CPC set independent of the target variable, we remove that other node.
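The two phases can be sketched as below. The association measure is passed in as a function `assoc(x, target, conditioning_set)` that returns 0 when x and the target are judged independent; that interface, the helper names, and the toy oracle standing in for a statistical test on data are assumptions for illustration. The sketch enumerates every subset of the CPC set, which is only practical for small sets.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items, including the empty set."""
    items = list(items)
    for size in range(len(items) + 1):
        yield from combinations(items, size)


def min_assoc(x, target, cpc, assoc):
    """Minimum association of x with target over all subsets of the CPC set."""
    return min(assoc(x, target, z) for z in subsets(cpc))


def mmpc(target, variables, assoc):
    cpc = []
    # Forward phase: repeatedly add the node with the largest minimum association.
    while True:
        remaining = [v for v in variables if v != target and v not in cpc]
        if not remaining:
            break
        best = max(remaining, key=lambda v: min_assoc(v, target, cpc, assoc))
        if min_assoc(best, target, cpc, assoc) <= 0:
            break
        cpc.append(best)
    # Backward phase: drop nodes that some subset of the other CPC nodes separates from target.
    for x in list(cpc):
        others = [v for v in cpc if v != x]
        if any(assoc(x, target, z) == 0 for z in subsets(others)):
            cpc.remove(x)
    return cpc


def toy_assoc(x, target, conditioning_set):
    """Hypothetical oracle: C is related to T only through A; A and B are neighbours of T."""
    if x == "C":
        return 0.0 if "A" in conditioning_set else 0.5
    return 1.0


cpc_of_t = mmpc("T", ["A", "B", "C", "T"], toy_assoc)
```

In practice `assoc` would wrap a conditional independence test computed from the dataset, and implementations usually cap the conditioning-set size to keep the subset enumeration tractable.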

After MMPC executes, we have an undirected graph between the target node and its CPC set, so we use CRAE (

For two variables A_{i} and A_{j}, where |A_{i}| and |A_{j}| denote the number of values of the variables A_{i} and A_{j}, CRAE is given as follows:

We perform two operations similar to those used in [
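The concluding section names the two operations as adding and reversing edges. A minimal sketch of generating a population this way is given below, assuming that each particle is the initial DAG with one randomly chosen edge added or reversed and that any change creating a cycle is discarded. The function names, the one-change-per-particle rule, and the three-node example are assumptions, not the paper's exact procedure.

```python
import random


def is_acyclic(matrix):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in topological order."""
    n = len(matrix)
    indegree = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    queue = [j for j in range(n) if indegree[j] == 0]
    removed = 0
    while queue:
        i = queue.pop()
        removed += 1
        for j in range(n):
            if matrix[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    queue.append(j)
    return removed == n


def perturb(dag, rng):
    """Add or reverse one edge between a random node pair; keep the original if a cycle appears."""
    candidate = [row[:] for row in dag]
    i, j = rng.sample(range(len(dag)), 2)
    if candidate[j][i]:
        i, j = j, i  # orient the pair so that an existing edge, if any, is i -> j
    if candidate[i][j]:
        candidate[i][j], candidate[j][i] = 0, 1  # reverse the existing edge
    else:
        candidate[i][j] = 1  # add a new edge
    return candidate if is_acyclic(candidate) else [row[:] for row in dag]


def initial_population(dag, size, seed=0):
    rng = random.Random(seed)
    return [perturb(dag, rng) for _ in range(size)]


# Hypothetical initial DAG A -> B -> C, with entry [i][j] = 1 meaning edge i -> j.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
population = initial_population(chain, size=20, seed=1)
```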

The experiments were run on a personal laptop with a 10th-generation Intel Core i7-10750H CPU at 2.60 GHz, MS Windows 10 (x64), and 16 GB of RAM. The programming language was Python (3.9.7), the main library was “bnlearn,” and the development environment was Spyder IDE 5.1.5 (Anaconda). The datasets used for testing our algorithm are the ASIA network, a small benchmark dataset that relates a visit to Asia to lung diseases, and the SACHS network, which consists of data on 11 proteins and phospholipids derived from immune system cells. For each dataset, we use four different sample sizes. These datasets are available in the bnlearn library. Samples are generated by using the Python command “bn.sampling(model, number-samples)”.

Dataset | Nodes | Edges | Samples |
---|---|---|---|
ASIA | 8 | 8 | 500, 1000, 3000, 5000 |
SACHS | 11 | 17 | 500, 1000, 3000, 5000 |

To evaluate the learned Bayesian network structure, we use three indicators: the mean BIC score of the final global best structure, the time taken per iteration of PSO (TTI) in seconds, and the total time taken for complete execution (TTT) in minutes. The BIC score is a negative value; the smaller the value (i.e., the further it is from 0), the better the result. Each experiment is repeated five times to obtain mean values. The number of particles is 100, and the GEPSO search procedure runs for 50 iterations.

We have named our method BN-GEPSO, and we compare our results against discrete binary PSO (DBPSO) and LIPSO. DBPSO is in fact classic PSO in which position and velocity are coded as matrices: the three velocity matrices are added together, and the position is updated by adding the velocity matrix to the position matrix.

Dataset/Samples | Evaluation criteria | DBPSO | BN-GEPSO | LIPSO |
---|---|---|---|---|
ASIA | BIC | −1233.88 | −1231.19 | −1226.69 |
ASIA | BIC | −2284.15 | −2296.82 | −2299.74 |
ASIA | BIC | −6780.23 | −6796.50 | −6795.45 |
ASIA | BIC | −11352.54 | −11343.17 | −11342.68 |

Dataset/Samples | Evaluation criteria | DBPSO | BN-GEPSO | LIPSO |
---|---|---|---|---|
SACHS | BIC | −4818.6 | −6114.9 | −7500.3 |
SACHS | BIC | −8691 | −9402.5 | −10989.04 |
SACHS | BIC | −24270.7 | −25015.5 | −26956.63 |
SACHS | BIC | −39610.92 | −41256.84 | −44806.97 |

On the ASIA dataset, the BIC score of BN-GEPSO is competitive with those of DBPSO and LIPSO: BN-GEPSO obtains the best score in one of the four settings, and where it loses, the margin is small. On the SACHS dataset, which has more nodes and edges, LIPSO stays ahead of BN-GEPSO. One reason for this behavior is that LIPSO adds more information to the initial DAG through its mutual information step, which we do not use. Another important observation is that as the dataset gets bigger, BN-GEPSO outperforms the DBPSO algorithm. This was our target in this research: to show that GEPSO works better than PSO when learning the BN structure.

In terms of TTI and TTT, BN-GEPSO is slower than the other methods. One reason is that it has five velocity terms, whereas PSO has three. Another reason is that we do not use any constraining step to limit the maximum number of CPC nodes for each target node, while DBPSO has no CPC set and LIPSO constrains it to a maximum of 2. We did this to show how knowledge sharing in GEPSO works. Note that the fourth term in GEPSO, which chooses the best position of a random particle, serves to share information between particles. From our experiments, we can see that this information sharing is not as effective as using mutual information at the beginning, before the CRAE step.

We have discretized GEPSO (a new generalized PSO algorithm) and used it to learn the Bayesian network (BN) structure. We name this method Bayesian network generalized particle swarm optimization (BN-GEPSO). The discretization was inspired by discrete binary particle swarm optimization (DBPSO). BN-GEPSO uses the max-min parents and children (MMPC) algorithm and cross relative average entropy (CRAE) to generate an initial directed acyclic graph (DAG) structure. Then two operations, adding and reversing edges, are performed to generate an initial population for GEPSO. Finally, GEPSO iterates, and we choose the global best solution as our output. We also proposed a way to update the velocity, thereby improving the search process. Experiments on datasets of different sample sizes showed that BN-GEPSO outperforms DBPSO as the complexity of the dataset increases. Future research directions include reducing the time taken per iteration (TTI) and the total time taken for complete execution (TTT) by constraining the candidate parents and children (CPC) set, and applying this approach to continuous data, i.e., learning the BN structure from continuous data with the GEPSO paper as the reference instead of our equations. GEPSO has different parameters, weights, etc., which were not taken into consideration when discretizing, so the full power of this algorithm remains to be exploited.
This approach can also be extended to time-based data, as in dynamic Bayesian networks (where we work with data at different time slices). Other directions are experimenting on larger sample sizes and datasets by leveraging the parallel power of graphics processing units (GPUs), improving the velocity update procedure, and application-based studies, for example, using BN-GEPSO for the analysis of disease or building an application for insurance companies that predicts the risk of each customer to decide whether they are a good or bad fit (this approach will focus more on BN prediction, BN classification, and BN inference).

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under grant number RGP.2/132/43. The authors express their gratitude to the editor and referees for their valuable time and efforts on the manuscript.

Funds are available under the Grant Number RGP.

The authors declare that they have no conflicts of interest to report regarding the present study.