Optimal node clustering in wireless sensor networks (WSNs) is a major issue in reducing energy consumption and extending node life time and network reliability. Many techniques for optimizing the node clustering process in WSNs have been proposed in the literature, and metaheuristic algorithms are a subset of these techniques. The genetic algorithm (GA) is an evolutionary metaheuristic technique used to improve network reliability and extend network life time by optimizing the clustering process in the network. The GA dynamic clustering (GA-DC) algorithm is proposed in this paper to extend the network reliability and node life time of three-dimensional (3D) WSNs. The GA-DC algorithm uses an improved fitness function that takes into account a variety of metrics, such as energy expenditure per protocol round, clustering distance, and the number of long-distance wireless connections. Two types of simulation scenarios were run. First, simulation results show that the GA-DC algorithm increases network life time by 80% and network throughput by 55% when compared to the well-known LEACH protocol. Second, simulation results show that the uniform node distribution outperforms the normal and exponential distributions in terms of network life time by 5.7% and 7%, network reliability by 4.2% and 76%, and data throughput by 10.85% and 19.54%, respectively.

Rapid technological advances in the field of micro electro mechanical systems (MEMS) have led to the development of cheap and autonomous miniature sensor nodes [

Sensor nodes can send their collected data directly to the sink node [

Nature-inspired, or metaheuristic, algorithms have been applied to WSNs to balance energy consumption among sensor nodes while lowering overall energy consumption. The genetic algorithm (GA) is one of the most popular metaheuristic techniques, and one of the few used to solve the four fundamental problems of WSNs: energy-efficient clustering, localization of sensor nodes, data aggregation, and optimal coverage [

This paper makes the following contributions. First, the GA dynamic clustering (GA-DC) algorithm is proposed to extend the life time and reliability of 3D WSNs. Second, three random deployment strategies for sensor nodes are investigated and evaluated for better network life time and reliability. Third, a new GA fitness function is used to optimize the number of clusters in the network; it includes information about the type of connections that member nodes (MNs) establish with cluster heads (CHs) in a cluster and that CHs establish with the base station (BS) in the network.

The rest of the paper is organized as follows. Section 2 overviews the related work. Section 3 describes the GA operators in detail. Section 4 introduces the GA fitness function along with the cluster formation and maintenance phases. Section 5 provides the simulation setup and configuration. Simulation results are explained in Section 6. Finally, we conclude our research in Section 7.

Considerable research has used nature-inspired algorithms to optimize WSN life time. GA has shown great applicability to different challenges in WSNs. In [

GA is an evolutionary, bio-inspired, meta-heuristic and stochastic optimization algorithm [

GA is an iterative algorithm. At each iteration it applies three operators to find new chromosomes that may yield a better solution. The GA operators are [

Selection: At this stage, chromosomes with higher fitness values (parents) are selected to produce next generation's offspring. Roulette wheel, tournament, rank, and elitism are some examples of the selection operator.

Crossover: At this stage, the genetic material of the two selected parents is merged to produce offspring. Single-point, double-point, and uniform techniques are used individually or collectively in this phase.

Mutation: At this stage, single or multiple gene modifications are applied to the newly produced chromosomes in order to broaden the search space.

GA terminates when either of two conditions is met: a solution that reaches the optimality threshold is found, or the maximum number of iterations is reached. The GA then returns the best chromosome as the solution.
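The loop described above can be sketched in a few lines. This is a minimal, generic binary GA, not the GA-DC implementation: the function names, the tournament size, the single-elite policy, and the toy fitness used in the usage line are all assumptions made for illustration. The crossover rate of 1.0 (crossover is always applied) and the mutation rate of 0.02 mirror the simulation parameters listed later in the paper.

```python
import random

def tournament_select(population, fitness, rng, k=3):
    """Selection: return the fittest of k randomly drawn chromosomes."""
    return max(rng.sample(population, k), key=fitness)

def single_point_crossover(a, b, rng):
    """Crossover: swap the tails of two parents at one random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, rate, rng):
    """Mutation: flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def run_ga(fitness, n_genes, pop_size=20, max_iters=100,
           target=None, mutation_rate=0.02, seed=0):
    """Hypothetical driver: higher fitness is treated as better."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_iters):
        # Termination condition 1: threshold solution reached.
        if target is not None and fitness(best) >= target:
            break
        offspring = [best]  # elitism: carry the best chromosome forward
        while len(offspring) < pop_size:
            p1 = tournament_select(population, fitness, rng)
            p2 = tournament_select(population, fitness, rng)
            c1, c2 = single_point_crossover(p1, p2, rng)
            offspring.append(mutate(c1, mutation_rate, rng))
            offspring.append(mutate(c2, mutation_rate, rng))
        population = offspring[:pop_size]
        best = max(population, key=fitness)
    # Termination condition 2: max_iters exhausted.
    return best

# Toy usage: maximize the number of 1-genes in a 16-gene chromosome.
best = run_ga(sum, n_genes=16, target=16)
```

Because the best chromosome is always copied into the next generation, the best fitness in this sketch never decreases from one iteration to the next.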

We propose the genetic algorithm dynamic clustering (GA-DC) algorithm to improve network reliability, extend the network life time, and reduce the overall energy expenditure per protocol round. The GA-DC algorithm uses an enhanced fitness function (F) compared to other clustering algorithms. The GA-DC fitness function comprises the following components.

In order to extend network life time, energy consumption per protocol round must be minimized. The chromosome with the minimal energy expenditure is chosen as a solution. Total energy expenditure is the sum of the energy consumed within a cluster (E_{C}) and the energy consumed by the CH to transfer the aggregated data to the BS (E_{BS}). For a cluster of

For

The total energy (E_{T}) expenditure per protocol round is:

E_{TX}(i, CH) is the energy consumed in transmission from the i^{th} node to its corresponding CH. E_{RX} and E_{DA} are the energy consumed in data reception and data aggregation, respectively, and E_{TX}(j, BS) is the transmission energy consumed by the j^{th} CH in sending to the BS. This paper uses the simple node radio energy model [

E_{TX} depends on the number of bits (k) and on the distance between the transmitter and the receiver, which determines the switch between the free-space and multipath amplification models.

d_{0} is the distance threshold at which the model switches between the free-space and multipath models. d_{0} is given by

$$d_{0} = \sqrt{\varepsilon_{fs} / \varepsilon_{mp}}$$
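As an illustration of how the distance threshold selects the amplification model, the sketch below implements the first-order radio energy model that is commonly paired with these parameters in the LEACH literature. The exact expression used in this paper is not recoverable from the text, so the functional form and the function names are assumptions; the constants are taken from the simulation parameter table.

```python
import math

# Constants from the simulation parameter table, converted to joules.
E_ELEC = 50e-9        # electronics energy, J/bit
EPS_FS = 10e-12       # free-space amplifier, J/bit/m^2
EPS_MP = 0.0013e-12   # multipath amplifier, J/bit/m^4

# Threshold distance separating the two amplification models.
D0 = math.sqrt(EPS_FS / EPS_MP)

def tx_energy(k_bits, distance):
    """Assumed first-order model: energy to transmit k bits over `distance` m."""
    if distance < D0:
        # Free-space model: amplifier cost grows with d^2.
        return k_bits * (E_ELEC + EPS_FS * distance ** 2)
    # Multipath model: amplifier cost grows with d^4.
    return k_bits * (E_ELEC + EPS_MP * distance ** 4)

def rx_energy(k_bits):
    """Assumed first-order model: energy to receive k bits."""
    return k_bits * E_ELEC
```

With these constants, `D0` evaluates to about 87.7 m, which agrees with the 87 m listed for d_{0} in the parameter table.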

The clustering distance (_{C}_{C}

^{th} member node to its corresponding j^{th} CH, and the distance from the j^{th} CH to the BS respectively.

The total distance in the network (_{T}

The number of CHs

A node can connect either to a CH or to the BS. If the distance between the node and the CH or the BS is greater than the threshold value (d_{0}), the connection is a long-distance link that falls under the multipath (mp) model.

As the value of the _{mp}

The fitness function (F) of GA-DC protocol is a function of four independent variables and is given by:

ω1, ω2, ω3, and ω4 are the constant weighting coefficients of the four fitness parameters. Their values are application-dependent.
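To make the role of the weights concrete, the following sketch combines four quantities in a plain weighted sum. The actual GA-DC expression is not recoverable from the text, so both the weighted-sum form and the choice of the four terms (total energy per round, total distance, number of CHs, and number of long-distance links) are assumptions inferred from the component descriptions above. The default weights are placeholders, and a real implementation would likely normalize each term before weighting.

```python
def fitness_cost(total_energy, total_distance, n_cluster_heads,
                 n_long_links, weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical weighted-sum cost; lower values indicate a better chromosome.

    The four inputs and the linear combination are illustrative assumptions,
    not the paper's fitness function.
    """
    w1, w2, w3, w4 = weights
    return (w1 * total_energy
            + w2 * total_distance
            + w3 * n_cluster_heads
            + w4 * n_long_links)
```

Raising one weight relative to the others makes the search penalize that component more heavily, which is why the weights are described as application-dependent.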

The GA-DC protocol has two working phases, cluster formation and data collection. The two phases are explained below.

Cluster formation: In this phase, the GA-DC algorithm runs at the BS, since the BS knows the geographical location of all nodes. The GA-DC algorithm uses binary encoding, in which CHs are encoded as binary ‘1’ and member nodes as binary ‘0’. An example of binary encoding is shown in

Data processing and transfer: In this phase, data collected by MNs are transferred to the CHs. The CHs then process, aggregate, and compress the collected data into packets before transferring them to the BS. CHs organize data collection from their MNs through TDMA scheduling and send their data reports using CDMA.
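The binary encoding used in cluster formation can be decoded into clusters as sketched below. The paper does not state how MNs choose among CHs, so assigning each MN to its nearest CH by 3D Euclidean distance is an assumption, and the function name is hypothetical.

```python
import math

def decode_clusters(chromosome, positions):
    """Map a binary chromosome to {CH index: [member node indices]}.

    chromosome[i] == 1 marks node i as a CH, 0 marks it as an MN.
    positions[i] is the (x, y, z) coordinate of node i.
    Assumption: each MN joins the CH at the smallest 3D distance.
    """
    ch_ids = [i for i, gene in enumerate(chromosome) if gene == 1]
    clusters = {ch: [] for ch in ch_ids}
    for i, gene in enumerate(chromosome):
        if gene == 0 and ch_ids:
            nearest = min(
                ch_ids,
                key=lambda ch: math.dist(positions[i], positions[ch]),
            )
            clusters[nearest].append(i)
    return clusters

# Usage: nodes 0 and 3 are CHs, nodes 1 and 2 are MNs.
positions = [(0, 0, 0), (1, 0, 0), (9, 9, 9), (10, 10, 10)]
clusters = decode_clusters([1, 0, 0, 1], positions)
```

A chromosome with no CH decodes to an empty mapping, which a fitness function would need to penalize or reject.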

Algorithm 1 shows the pseudocode for the GA-DC algorithm.

In this section a number of network metrics are utilized to evaluate the performance of the GA-DC algorithm for various network configurations. These metrics include:

Network life time (NLT): Represents the time duration from the beginning of the network to the death of the last node. NLT is usually measured by the number of protocol rounds till the network is down.

Energy metric: Represents a set of measured energy parameters, including the total remaining energy per protocol round and the average energy consumption per protocol round.

Stability period: Represents the time interval (number of rounds) till the death of the first node in the network, usually referred to as the network reliability measure. The time interval from the death of the first node (FND) to the death of the last node (LND) is referred to as the instability period.

Network throughput: Represents the accumulated data reports reaching the BS during the network life time.
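The life time and stability metrics defined above follow directly from the round in which each node dies. The sketch below assumes a list of per-node death rounds is available from the simulation; the function name and the dictionary keys are illustrative.

```python
def lifetime_metrics(death_rounds):
    """Derive FND, LND, NLT, and the stability/instability periods.

    death_rounds: the protocol round in which each node died.
    """
    fnd = min(death_rounds)  # first node death
    lnd = max(death_rounds)  # last node death
    return {
        "FND": fnd,
        "LND": lnd,
        "NLT": lnd,                      # life time ends with the last node
        "stability_period": fnd,         # rounds until the first death
        "instability_period": lnd - fnd, # rounds between first and last death
    }

# Usage with made-up death rounds for three nodes.
metrics = lifetime_metrics([250, 900, 1965])
```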

Simulation and algorithm implementation are done using MATLAB. A number of simulation scenarios are implemented. The first scenario compares the performance of the GA-DC algorithm to the state-of-the-art LEACH protocol in a 3D network setting with the BS in the middle of the network and a uniform node distribution. The second scenario evaluates the performance of the GA-DC algorithm when different node distributions are used. In this paper, three common node distributions are used: uniform, normal, and exponential. Most of the literature assumes that nodes are uniformly distributed in a 2D environment. Under the uniform distribution, the location of a node is specified by generating a uniform random number in the interval [0,1] and then multiplying it by the network dimensions. The uniform distribution does not require any location information to determine a node position, and all network locations are equally likely to be chosen as a potential node position. The need to define distribution parameters is a major issue when distributing network nodes according to non-uniform distributions. Normal node distribution requires the sample mean and standard deviation, (
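The three deployment strategies can be sketched as follows. The paper's simulations use MATLAB; Python is used here only for illustration. The distribution parameters (`mean`, `std`, `rate`) are placeholders because their values are not given in the text, and clipping out-of-range samples to the network boundary is likewise an assumption.

```python
import random

def deploy_nodes(n, side, distribution, rng, mean=None, std=None, rate=None):
    """Return n (x, y, z) positions inside a cube of edge length `side`.

    mean/std (normal) and rate (exponential) are hypothetical parameters;
    samples outside [0, side] are clipped to the boundary (an assumption).
    """
    def clip(value):
        return min(max(value, 0.0), side)

    def coordinate():
        if distribution == "uniform":
            # Uniform number in [0, 1) scaled by the network dimension.
            return rng.random() * side
        if distribution == "normal":
            return clip(rng.gauss(mean, std))
        if distribution == "exponential":
            return clip(rng.expovariate(rate))
        raise ValueError(f"unknown distribution: {distribution}")

    return [(coordinate(), coordinate(), coordinate()) for _ in range(n)]

# Usage with placeholder parameters for a 200 x 200 x 200 volume.
rng = random.Random(1)
uniform_nodes = deploy_nodes(100, 200.0, "uniform", rng)
normal_nodes = deploy_nodes(100, 200.0, "normal", rng, mean=100.0, std=33.0)
expo_nodes = deploy_nodes(100, 200.0, "exponential", rng, rate=1 / 50.0)
```

Only the uniform case needs no extra parameters, which matches the observation above that non-uniform deployments require distribution parameters to be defined first.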

Three-dimensional (3D) networks are more realistic than 2D networks. The third dimension is important in determining which CHs to connect with. It is also important in calculating the energy expenditure budget during each protocol round. It has been shown that the life span of 3D networks is shorter than that of 2D networks [

Parameter | Value |
---|---|
N | 100 |
Node energy | 0.5 J |
E_{elec} | 50 nJ/bit |
E_{DA} | 5 pJ/bit |
ɛ_{fs} | 10 pJ/bit/m^{2} |
ɛ_{mp} | 0.0013 pJ/bit/m^{4} |
d_{0} | 87 m |
k | 4000 bits |
Network volume | 200 × 200 × 200 |
Crossover rate | 1.0 |
Mutation rate | 0.02 |
Rounds | 2500 |

Two sets of simulation scenarios are investigated in this section. The first scenario compares and evaluates the performance of the GA-DC algorithm with that of the well-known LEACH algorithm for clustering and routing in a 3D network with uniformly distributed nodes.

There is a point (P_{0}) beyond which the GA-DC algorithm outperforms the LEACH algorithm. P_{0} represents the point at which the remaining energy in the network is the same for the two algorithms. P_{0} occurs at round 738, at which 11.5% of the energy remains available. The GA-DC algorithm uses the remaining energy efficiently to extend the network life time.

The second scenario evaluates and compares the performance of GA-DC algorithm for three different node distributions. The goal of this scenario is to determine which node distribution is best in terms of energy consumption and extending the NLT.

Metric | Uniform | Normal | Exponential |
---|---|---|---|
NLT (rounds) | 1965 | 1859 | 1836 |
FND (rounds) | 250 | 240 | 142 |
LND (rounds) | 1965 | 1859 | 1836 |
Throughput (packets) | 12572 | 11341 | 10517 |
Most frequent number of CHs | 4 | 4 | 5 |

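The percentage improvement of the uniform distribution over each of the other distributions is the relative difference (M_{u} − M_{o}) / M_{o} × 100%, where M_{u} is the value of a metric under the uniform distribution and M_{o} is its value under the other distribution.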

Metric | Improvement over normal | Improvement over exponential |
---|---|---|
NLT | 5.7% | 7% |
FND | 4.2% | 76% |
Throughput | 10.85% | 19.54% |
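The improvement percentages can be cross-checked against the per-distribution results reported above. The sketch below assumes the improvement is the relative difference with respect to the non-uniform value, which reproduces the reported figures; the function and variable names are illustrative.

```python
def improvement_pct(uniform_value, other_value):
    """Relative improvement of the uniform result over another distribution."""
    return (uniform_value - other_value) / other_value * 100.0

# (uniform, normal, exponential) values from the results table.
results = {
    "NLT": (1965, 1859, 1836),
    "FND": (250, 240, 142),
    "Throughput": (12572, 11341, 10517),
}

improvements = {
    metric: (improvement_pct(u, n), improvement_pct(u, e))
    for metric, (u, n, e) in results.items()
}
```

For example, the NLT improvement over the normal distribution is (1965 − 1859) / 1859, or about 5.7%.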

In this work, the GA-DC algorithm is proposed for a 3D network setting. The GA-DC algorithm is based on a binary GA that distinguishes CHs from MNs. When compared to the state-of-the-art LEACH protocol, the GA-DC algorithm increased network life time by 80% and data throughput by 55%. The GA-DC algorithm demonstrated that the uniform distribution of network nodes in a 3D setting outperformed the normal and exponential distributions in terms of network life time, reliability, and network throughput. When compared to the normal distribution, the uniform distribution improved the NLT by 5.7%, the reliability measure by 4.2%, and throughput by 10.85%. Furthermore, when compared to the exponential distribution, the uniform distribution improved the NLT by 7%, the reliability measure by 76%, and throughput by 19.54%. The GA-DC algorithm employs a new fitness function that uses several parameters to compute the cost value of the proposed solution. The parameters include information about the proposed solution's energy consumption, inter- and intra-cluster distances, and short- and long-range wireless link connections.

The maximum likelihood estimation (MLE) technique computes the defining parameter(s) of a random distribution by maximizing the likelihood function, L, of the samples of interest. L(θ; x_{1},…, x_{n}) of n samples is defined by:

f_{X}(x_{1},…, x_{n}; θ) is the joint probability function from which the samples are drawn. Assuming that the random samples are independent, identically distributed (IID) random variables, L simplifies to:

To find the value of θ that maximizes the L function, the derivative of the L function is computed and set to zero.

It is usually easier to work with ln[L(θ)], since the maxima of L(θ) and ln[L(θ)] occur at the same θ value. This is especially true for distributions whose density contains an exponential function.
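For reference, the general MLE steps described above can be written out as follows. This is the standard textbook form; the symbol f_{X} for the probability function is an assumption about the paper's notation.

```latex
% Likelihood of n samples as the joint probability function
L(\theta; x_1, \ldots, x_n) = f_X(x_1, \ldots, x_n; \theta)

% IID samples: the joint function factors into a product
L(\theta) = \prod_{i=1}^{n} f_X(x_i; \theta)

% Log-likelihood and the stationarity condition for the estimator
\ln L(\theta) = \sum_{i=1}^{n} \ln f_X(x_i; \theta),
\qquad
\frac{\partial \ln L(\theta)}{\partial \theta} = 0
```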

For n independent random samples x_{1},…, x_{n}, the L function is given by:

Taking the natural logarithm of both sides of

To find the estimators for μ and σ that maximize the L function, we solve the following.

Solving
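The estimators for μ and σ correspond to the normal distribution. A worked version of the standard derivation is sketched below; it is the textbook result and may differ in notation from the paper's own equations.

```latex
% Likelihood of n IID normal samples
L(\mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}
  \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)

% Log-likelihood
\ln L(\mu, \sigma) = -n \ln \sigma - \frac{n}{2} \ln(2\pi)
  - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

% Stationarity conditions
\frac{\partial \ln L}{\partial \mu}
  = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0,
\qquad
\frac{\partial \ln L}{\partial \sigma}
  = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^2 = 0

% Resulting estimators
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
```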

For n independent random samples x_{1},…, x_{n}, the L function is given by:

Taking the natural logarithm of both sides of

To find the estimator for λ that maximizes the L function, we solve the following.
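The estimator for λ corresponds to the exponential distribution. The standard derivation is sketched below, assuming the density f(x; λ) = λe^{−λx} for x ≥ 0; the paper's own notation may differ.

```latex
% Likelihood of n IID exponential samples
L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
  = \lambda^{n} \exp\!\left( -\lambda \sum_{i=1}^{n} x_i \right)

% Log-likelihood
\ln L(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i

% Stationarity condition and resulting estimator
\frac{d \ln L}{d \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
\quad \Longrightarrow \quad
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}
```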