Lung cancer is one of the most dangerous diseases and has to be detected at an early stage so that patients can receive better treatment and clinical support. For lung cancer diagnosis, computed tomography (CT) scan images must be processed with image processing techniques, and an effective classification process is required for an appropriate diagnosis. In the present state of medical data processing, cancer detection is very time consuming and demands high exactitude. For that reason, this paper develops an improved model for lung cancer segmentation and classification using a genetic algorithm. In the model, the input CT images are pre-processed with an adaptive median filter and an average filter. The filtered images are enhanced with histogram equalization, and the Regions of Interest (ROI) containing cancer tissue are segmented using the Guaranteed Convergence Particle Swarm Optimization technique. The images are then classified with a Probabilistic Neural Network (PNN). The experiments are carried out by simulating the model in MATLAB with input CT lung images from the LIDC-IDRI (Lung Image Database Consortium-Image Database Resource Initiative) benchmark dataset. The results indicate that the proposed model outperforms existing methods, giving accurate classification results with minimal processing time.

In the present state of world health, lung tumors have become the second largest cause of death after heart attacks, as this type of tumor causes a higher death rate than other types of cancer [

Lung cancer can be broadly categorized into two types, Non-Small Cell Lung Cancer and Small Cell Lung Cancer, based on the cell features. Of these, Non-Small Cell Lung Cancer is the most common type, whereas the other category is diagnosed less often [

Grade I-the tumor cells are confined to the lung

Grade II-the cancer cells are spreading to the chest area

Grade III-the cancer with large and persistent tumor cells that are confined to the chest area

Grade IV-the tumor cells have started spreading to other body parts.

Several techniques are used to detect tumors from lung images, such as sputum cytology, MRI (Magnetic Resonance Imaging), X-ray imaging and Computed Tomography (CT) [

The proposed work for detecting lung cancer using a genetic algorithm uses CT lung images from the LIDC-IDRI dataset, which comprises a large number of cancerous and non-cancerous images for training and testing. The contributions of the proposed model for accurate lung cancer segmentation and classification are listed below.

The input CT images are pre-processed using two efficient filters, a median filter and an average filter.

For image enhancement, histogram equalization is applied.

A genetic algorithm is applied to segment the Region of Interest containing cancer cells.

Specifically, the proposed model uses Guaranteed Convergence Particle Swarm Optimization for cancer tissue segmentation.

The final identification of cancerous and non-cancerous images is performed with Probabilistic Neural Networks (PNN) based classification, which can produce accurate results in binary classification.

Performance evaluation of the proposed model is derived with evaluation factors such as accuracy, precision and recall.

A comparative analysis is carried out to demonstrate the efficiency of the work against existing models in lung cancer segmentation and classification.

The remainder of this paper is organized as follows: Section 2 discusses the existing models for lung tumor detection and the methods that use genetic algorithms in different ways for tumor diagnosis. The working procedure of the proposed model for cancer diagnosis is described in Section 3. Section 4 presents the results obtained on the evaluation factors with the proposed model and discusses the comparative evaluations. Finally, the conclusion and points for future enhancement are given in Section 5.

Myriad works have been developed in recent years for cancer detection. The method proposed by the authors in [

For classifying the images under benign and malignant classes, the work in [

In [

Massive Training Artificial Neural Network (MTANN) has been developed and presented in [

In medical image processing for lung tumor diagnosis, classification accuracy is very important for saving people's lives. Hence, this paper develops a Genetic Algorithm based Accurate Lung Cancer Segmentation and Classification model called Image Model- Lung Cancer Segmentation and Classification (IM-LCSC), which comprises the following phases for accurate cancer detection from CT lung images.

Image Pre-Processing

CT Image Enhancement

Cancer Segmentation using Guaranteed Convergence Particle Swarm Optimization (GCPSO)

Classification using Probabilistic Neural Networks (PNN)

Analysis Metrics for Performance Evaluation

Here, the first phase performs pre-processing of the input CT images, in which the noise content is removed using filters. The quality of the denoised images is further improved by image enhancement. Segmentation is then performed using the Guaranteed Convergence Particle Swarm Optimization (GCPSO) technique to separate the cancer tissue from the input CT image; GCPSO is incorporated here to enhance the segmentation accuracy. Finally, the segmented images are given to PNN training for classifying the images into two classes, CANCEROUS and NON-CANCEROUS. The operations of the proposed IM-LCSC are depicted in

As mentioned in the

From the input CT image, extra noise is removed in this phase using two filters, a median filter and an average filter. The median filter removes noise while preserving the sharpness of the image: each pixel in the input image is replaced with the median value of its neighbouring pixels. Here, the filter uses a 3 × 3 window over the image matrix ‘M’, which has ‘X’ rows and ‘Y’ columns. A new matrix of size (X+2) × (Y+2) is constructed by padding zeros around ‘M’, and a mask of size 3 × 3 is generated. The mask is placed at the first element of matrix ‘M’ and the elements covered by the mask are selected and sorted in ascending order [

The next step uses an average filter to eliminate the spatial noise introduced during acquisition of the input CT image. The average value of the neighbourhood of each pixel is determined, and the pixel is replaced with that average. For matrix ‘M’, after the elements under the mask are sorted in ascending order, their mean value is measured and used as the replacement. The process is repeated for all elements to provide a noise-free image to the image enhancement process.
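
The two pre-processing filters described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it uses a fixed 3 × 3 window with zero padding as described in this section (not an adaptive median filter), and the function names `filter_3x3`, `median_filter`, `average_filter` and `preprocess` are hypothetical.

```python
import numpy as np

def filter_3x3(image, reducer):
    """Slide a 3x3 mask over a zero-padded image and reduce each window."""
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    # Zero padding turns the X x Y matrix into an (X+2) x (Y+2) matrix.
    padded = np.pad(image, 1, mode="constant", constant_values=0)
    out = np.empty_like(image)
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + 3, j:j + 3]  # the 9 values under the mask
            out[i, j] = reducer(window)
    return out

def median_filter(image):
    """Replace each pixel with the median of its 3x3 neighbourhood."""
    return filter_3x3(image, np.median)

def average_filter(image):
    """Replace each pixel with the mean of its 3x3 neighbourhood."""
    return filter_3x3(image, np.mean)

def preprocess(image):
    """Median filtering followed by average filtering (assumed order)."""
    return average_filter(median_filter(image))
```

Because of the zero padding, border pixels are pulled toward zero; a different border mode could be chosen if that matters for the data at hand.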

Image enhancement improves the digital image quality and thereby the classification accuracy. For that, the proposed model uses histogram equalization, which applies small changes to the pixel intensities. Each pixel in the input image is mapped to a new intensity that is proportional to the rank of its original intensity among the image pixels.

For histogram equalization, the histogram of the input digital image is computed and then normalized into a probability distribution function. Normalization divides the frequency of each pixel value by the total number of pixels present in the lung image, which equals the number of entries in the image matrix with X rows and Y columns. The transformation aims to make the modified histogram flat. The steps and the computations involved in the process are provided below.

Compute the histogram of input CT image

Cumulative Distribution Function (CDF) is calculated for determining the gray levels

The gray levels are computed using,

The gray levels are mapped to the image pixels

Modified histogram is plotted using this.

The discrete gray scale image is denoted ‘{m}’ and ‘n_k’ is the number of occurrences of gray level ‘k’. The probability of occurrence of a pixel with gray level ‘k’ is computed as,

$$P_m(k) = \frac{n_k}{n}, \qquad 0 \le k < N$$

where ‘N’ is the total number of gray levels in the lung image, ‘n’ is the total number of image pixels and ‘P_m (k)’ is the image histogram for pixel value ‘k’, normalized to [0,1]. The cumulative distribution function with respect to P_m can be given as,

$$cdf_m(k) = \sum_{j=0}^{k} P_m(j)$$

A transformation G = T(m) is to be generated that produces a new image {G} with a flat histogram. Such an image has a cdf that is linear across the value range, so the cdf for the new image is given as,

$$cdf_G(k) = (k + 1)\,H, \qquad 0 \le k < N$$

where ‘H’ is a constant factor. Further, the properties of the cumulative distribution function allow such a transform to be performed, and the transform function is given as,

$$G = T(k) = cdf_m(k)$$
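
The histogram equalization steps of this section can be sketched as follows. This is an illustrative sketch, assuming an 8-bit integer image and a final rescaling of the cdf to the range [0, levels - 1]; the function name `equalize_histogram` and the rounding step are assumptions, not details taken from the paper.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Map each gray level k to its (rescaled) cumulative probability."""
    image = np.asarray(image, dtype=np.intp)
    hist = np.bincount(image.ravel(), minlength=levels)  # n_k for each level k
    pmf = hist / image.size                              # P_m(k) = n_k / n
    cdf = np.cumsum(pmf)                                 # cdf_m(k)
    # Assumed step: stretch the cdf from [0, 1] back to the gray-level range.
    mapping = np.round(cdf * (levels - 1)).astype(np.uint8)
    return mapping[image]                                # apply T to every pixel
```

For a low-contrast image whose intensities occupy only a few neighbouring gray levels, the output spreads those levels across the full range, which is the flattening effect the text describes.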

Particle Swarm Optimization (PSO) is a meta-heuristic model that can be used effectively for image processing. It is derived from the behaviour of birds searching for food. The basic idea behind PSO is communication and information sharing among particles. In the proposed model, each particle is assumed to have an initial velocity and position. With respect to the fitness rate, the velocity and position of each particle are updated. The equations for the position and velocity update of particles in PSO are given as follows,
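
For reference, the canonical PSO update can be written with the symbols used in this section. The form below is the textbook one and may differ from the authors' exact equations; the inertia weight w, the personal best position p_i(t) and the global best position g(t) are assumed notation that does not appear in the text.

```latex
\begin{aligned}
v_i(t+1) &= w\,v_i(t) + a_1 k_1 \bigl(p_i(t) - x_i(t)\bigr) + a_2 k_2 \bigl(g(t) - x_i(t)\bigr) \\
x_i(t+1) &= x_i(t) + v_i(t+1)
\end{aligned}
```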

where ‘k_1’ and ‘k_2’ are random numbers, ‘a_1’ and ‘a_2’ are the acceleration coefficients, and the fitness function (FF) of PSO is derived as,

where ‘n’ is the total number of clusters. In the proposed model, GCPSO is used for image segmentation, which focuses on the particle with the current best position in the space. In this process, the best particle in the swarm is considered as the particle member, and its position and velocity updates are given as follows,

The particles search the region around the global best position. The diameter of this search region is given as ‘α(t)’, and ‘r’ is a random factor that ranges from 0 to 1. The search space diameter is updated with respect to the findings, as follows.
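
In the standard GCPSO formulation of van den Bergh and Engelbrecht, the best particle and the search diameter are updated as sketched below, rewritten here with α(t) and r from this section. The index τ of the best particle, the inertia weight w and the global best position g(t) are assumed notation, and the thresholds are the order_success and order_failures factors named in the next paragraph; the authors' own equations may differ in detail.

```latex
\begin{aligned}
v_\tau(t+1) &= -x_\tau(t) + g(t) + w\,v_\tau(t) + \alpha(t)\,(1 - 2r) \\
x_\tau(t+1) &= g(t) + w\,v_\tau(t) + \alpha(t)\,(1 - 2r) \\
\alpha(t+1) &=
\begin{cases}
2\,\alpha(t)   & \text{if successes} > \text{order\_success} \\
0.5\,\alpha(t) & \text{if failures} > \text{order\_failures} \\
\alpha(t)      & \text{otherwise}
\end{cases}
\end{aligned}
```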

Here, the success and failure counts are the numbers of consecutive successes and consecutive failures, respectively. Moreover, order_success and order_failures are considered here as the threshold factors. The flow of the working process of GCPSO is given in

1. BEGIN |
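
As an illustration of the GCPSO flow, the sketch below minimizes a generic fitness function. It is a simplified version of the standard algorithm under stated assumptions, not the authors' segmentation code: the fitness used for segmentation is not recoverable from this text, and the parameter values (w, a1, a2, the thresholds, the initial diameter) and the function names `update_diameter` and `gcpso` are hypothetical.

```python
import numpy as np

def update_diameter(alpha, successes, failures, order_success=15, order_failures=5):
    """Grow the search diameter after repeated successes, shrink after failures."""
    if successes > order_success:
        return 2.0 * alpha
    if failures > order_failures:
        return 0.5 * alpha
    return alpha

def gcpso(fitness, dim, n_particles=20, iters=200, w=0.72, a1=1.49, a2=1.49,
          bounds=(-5.0, 5.0), order_success=15, order_failures=5, seed=0):
    """Minimize `fitness` with a simplified GCPSO; returns (best_position, best_value)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(bounds[0], bounds[1], (n_particles, dim))
    v = np.zeros((n_particles, dim))
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    alpha, successes, failures = 1.0, 0, 0
    for _ in range(iters):
        tau = int(np.argmin(pbest_f))        # index of the current best particle
        g, g_f = pbest[tau].copy(), pbest_f[tau]
        v_prev_tau = v[tau].copy()
        k1 = rng.random((n_particles, dim))  # random numbers k_1, k_2
        k2 = rng.random((n_particles, dim))
        # Ordinary PSO velocity update for every particle.
        v = w * v + a1 * k1 * (pbest - x) + a2 * k2 * (g - x)
        # The best particle instead samples around the global best position.
        r = rng.random(dim)
        v[tau] = -x[tau] + g + w * v_prev_tau + alpha * (1.0 - 2.0 * r)
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        # A success is an iteration in which the global best value improved.
        if pbest_f.min() < g_f:
            successes, failures = successes + 1, 0
        else:
            successes, failures = 0, failures + 1
        alpha = update_diameter(alpha, successes, failures,
                                order_success, order_failures)
    best = int(np.argmin(pbest_f))
    return pbest[best], float(pbest_f[best])
```

A call such as `gcpso(lambda p: float(np.sum(p ** 2)), dim=2)` minimizes a simple sphere function; for segmentation, the fitness would instead score a candidate set of cluster centres or thresholds on the enhanced image.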

In the proposed IM-LCSC, a Probabilistic Neural Network (PNN) is used to classify CT lung images into the classes CANCEROUS and NON-CANCEROUS, based on the segmentation done by GCPSO. The advantage of using PNN here is that the training process has low complexity. The structure of PNN is presented in

where ‘P’ is the number of patterns in the segmentation vector, ‘n’ denotes the number of training patterns present in the cluster ‘c’, ‘
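
A PNN decision can be sketched as a Gaussian Parzen-window estimate per class followed by picking the class with the largest estimate. The code below is a minimal illustration, assuming a single smoothing parameter `sigma` shared by all classes and feature vectors already extracted from the segmented images; the function name `pnn_classify` is hypothetical, and the normalizing constant of the Gaussian is omitted because it is identical for every class.

```python
import numpy as np

def pnn_classify(train_x, train_y, x, sigma=1.0):
    """Return the class whose training patterns give the highest kernel density at x."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    x = np.asarray(x, dtype=float)
    scores = {}
    for c in np.unique(train_y):
        patterns = train_x[train_y == c]               # the n training patterns of class c
        sq_dist = np.sum((patterns - x) ** 2, axis=1)  # squared distance to each pattern
        # Pattern layer (Gaussian kernels) followed by the summation layer (mean).
        scores[c] = np.mean(np.exp(-sq_dist / (2.0 * sigma ** 2)))
    return max(scores, key=scores.get)                 # decision layer
```

Training amounts to storing the labelled patterns, which is why the training cost is low; the cost is paid at classification time, where every stored pattern is visited.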

This section presents the analysis metrics for performance evaluation of the proposed IM-LCSC. The performance is analysed with the classification of images on the following values,

i. True positive, which is the number of cancerous lung images that are accurately classified as cancerous.

ii. True negative, which is the number of non-cancerous images that are classified as non-cancerous.

iii. False positive, which is the number of non-cancerous images that are classified under the cancerous class.

iv. False negative, which is the number of cancerous images that are classified under the non-cancerous class.

By determining the above values, the Sensitivity (True Positive Rate, also called Recall), Specificity (True Negative Rate), Precision and Accuracy Rate are calculated. The computations are provided below.

The True Positive Rate can be defined as the probability that an image of the cancerous class is accurately classified into the same class, and the equation is given as,

$$TPR = \frac{TP}{TP + FN}$$

The True Negative Rate can be defined as the proportion of images of the non-cancerous class that are identified as such; its derivation is

$$TNR = \frac{TN}{TN + FP}$$

The False Negative Rate is the proportion of cancerous images that are falsely classified as non-cancerous, and the calculation is given as,

$$FNR = \frac{FN}{FN + TP}$$

The False Positive Rate is derived as the proportion of images that are falsely classified as cancerous when they are really non-cancerous, and the derivation is presented below.

$$FPR = \frac{FP}{FP + TN}$$

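The Precision Rate is derived from the classification of lung CT images under the CANCEROUS and NON-CANCEROUS classes, as the proportion of images classified as cancerous that are truly cancerous, and the calculation is,

$$Precision = \frac{TP}{TP + FP}$$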

The quality of the IM-LCSC is evaluated by the accuracy rate of image classification using the following derivation.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
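
The six evaluation metrics of this section follow directly from the four confusion counts. The sketch below uses the standard definitions; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation metrics from the four confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),        # sensitivity / recall
        "tnr": tn / (tn + fp),        # specificity
        "fpr": fp / (fp + tn),        # non-cancerous images flagged as cancerous
        "fnr": fn / (fn + tp),        # cancerous images missed
        "precision": tp / (tp + fp),  # flagged images that are truly cancerous
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

As a made-up example (not a result from this paper), counts of TP = 90, TN = 85, FP = 15 and FN = 10 give a TPR of 0.90, a TNR of 0.85 and an accuracy of 0.875. Note that TPR + FNR = 1 and TNR + FPR = 1, so the false rates carry no information beyond the true rates.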

For evaluating the IM-LCSC in lung cancer diagnosis, the implementation is carried out in MATLAB V12. The results obtained on the metrics presented in Section 3.5 are compared with the available models, namely Massive Training Artificial Neural Networks (MTANN), Fuzzy C-Means Clustering (FCM) and Genetic Algorithm based Template Matching (GATM), to demonstrate the competence of the proposed model.

The proposed model is analyzed using input CT images obtained from the benchmark LIDC-IDRI dataset. The database contains lung images from 1018 patients. The images fall into three categories, namely nodule size ≥ 3 mm, nodule size < 3 mm and non-nodule ≥ 3 mm, and a sample image from the LIDC dataset is displayed in the

The evaluations are performed based on the analysis metrics such as True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), False Negative Rate (FNR), Precision Rate and Accuracy Rate. The

The results obtained in classifying lung images with False Positive Rate and False Negative Rates are portrayed in

Accuracy Rate and Precision Rate are the most important parameters to be considered for evaluating a classification model in medical image diagnosis. With that in mind, the computations are made and the results are portrayed in

In this paper, an improved model for lung cancer segmentation and classification using a genetic algorithm (IM-LCSC) is proposed for accurately diagnosing lung cancer from CT images. The model uses two filters, a median filter and an average filter, to eliminate noise from the input CT images, which are then enhanced with histogram equalization to improve image quality. The enhanced image is further processed with Guaranteed Convergence Particle Swarm Optimization to segment the cancer region. Based on the segmentation results, PNN is used for training and classifying CT images under the CANCEROUS and NON-CANCEROUS classes. The evaluation results show that the IM-LCSC produced 95.88%, which is greater than the other works in cancer diagnosis. Moreover, the model produces a minimal rate of false classification. Hence, the proposed model is more efficient and outperforms the other techniques compared.

The proposed technique achieves better efficiency through noise cancellation filters and the expected results of image enhancement, segmentation and classification. This research work aims to propose a better genetic algorithm that overcomes existing techniques and supports assessment during lung cancer detection, diagnosis and therapy; as a result, accurate classification results with minimal processing time are obtained. In future, the work can be improved with efficient machine learning and deep learning algorithms to improve the accuracy of the results, and the tumor can be categorized by stage to provide exact results to radiologists.