Urea is the fertilizer most commonly used by farmers. In this study, the variation of mid-infrared transmittance spectra with the addition of urea to soil was studied for five different urea concentrations. 150 gm of soil was dried in a hot air oven for 5 h at 80°C, and samples were then prepared by adding urea and water. The spectral signature of soil with urea was obtained using an infrared spectrometer that reads spectra in the mid-infrared region. The analysis was done using Partial Least Square Regression (PLSR) and Support Vector Machine (SVM) algorithms, with and without pre-processing by a Savitzky-Golay filter and a Gaussian filter. The score plot and the prediction versus reference plots were used in the PLSR analysis. RMSE and R-squared values were obtained from the analysis. The detection accuracy was better with the Gaussian filter than with the Savitzky-Golay filter for both the PLSR and SVM models. The RMSE for PLSR is 0.8% and for SVM is 16%. The results show that the SVM model has higher prediction accuracy than the PLSR model, with an R-squared value of 0.99 with and without filters. The SVM model gives better prediction without filters.

Soil is an important source of nutrients, minerals and several other constituents that are required for plant growth and for decomposing dead matter. The essential macronutrients for plants that are naturally present in soil are Nitrogen (N), Phosphorus (P) and Potassium (K). The NPK level in the soil determines which plants are suitable for a given piece of land. NPK may be present in their own form or in a different form; for example, nitrogen may be present as ammonia. The nitrogen present in the soil may also evaporate. Several fertilisers are used to improve the NPK content of soil. Soil nitrogen is an important component that supports plant growth, and the most common fertilizer used to improve the nitrogen content of soil is urea. Excess use of urea can, however, damage crops. It is therefore essential to analyse the urea content of soil and thereby estimate the amount of nitrogen in it. The standard methods are costly when a large number of samples must be analysed, so a rapid and low-cost technique is needed for precision farming. Mid Infra-red (MIR) spectroscopy has been shown to be fast, simple, convenient and accurate, and able to analyse several nutrients at the same time.

Near Infra-red (NIR) spectroscopy is used to find the chemical and physical properties of samples. In this method, 165 samples were collected from different places in a particular area, and 135 sample spectra were used for calibration and for analysis of nutrient content. Principal component analysis (PCA) and the partial least squares (PLS) algorithm were used. This method is expensive when a large number of samples are used for testing [

Understanding soil organic matter and moisture content is important for plant growth and health. The ability to predict these components using a Near Infrared sensor was examined. The different pre-processing techniques used to remove outliers were detailed. The standard error technique was used to validate the NIR sensor prediction of organic matter and moisture level [_{3}-N to NH_{4}-N reduction with standard Kjeldahl nitrogen digestion method were done. The nitrogen content was measured in soil, water, and body and plant tissue. This analysis helps to choose the correct method based on the requirement [

Canonical discriminant analysis, principal component analysis and band depth analysis were carried out to examine iron content and organic matter. The results make clear that the spectral shape is affected by iron and organic matter [

The land was prepared (548 × 640 cm area) in an agricultural plot at Bannari Amman Institute of Technology College, Sathyamangalam, Tamil Nadu. Soil samples were collected from the prepared plot and transferred to airtight plastic bags. Grass, stones and other foreign objects were removed from the soil samples by sieving with a 2 mm sieve.

150 gm of soil was dried in a hot air oven for 5 h at 80°C. It was split into 6 samples, each containing 20 gm of soil. Urea was added to the 20 gm soil samples in amounts of 0, 0.5, 1, 2, 3 and 4 gm respectively. Each soil sample was then mixed with 10 ml of water and kept for one day, after which it was mixed with 25 ml of water and stirred. Next, the mixed soil sample was filtered and the water sample collected in a beaker. Finally, it was filtered again with filter paper and collected in sample tubes for MIR spectroscopy analysis.

Mid-infrared spectroscopy is an essential and widely used analytical method. It detects the fundamental vibrations of minerals and organic matter, which have strong absorptions. MIR is used to determine the chemical composition of materials and is widely used in the pharmaceutical industry because of the high information content of the spectra. The spectral range of MIR spectroscopy is 400 cm^{−1} to 4000 cm^{−1} (2500 nm to 25000 nm). Electromagnetic radiation is passed through the sample; the sample absorbs and reflects part of the radiation, and the spectrum is constructed from this absorption and reflection. The resulting spectral signature shows how much energy was absorbed at each wavelength. It also responds to mineral composition and soil organic matter. The absorption reveals the molecular structure and the quality of the sample. MIR spectroscopy is low cost, easy to operate, and can analyse soil without chemical reagents.
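The two ways of quoting the MIR range (wavenumber in cm^{−1} and wavelength in nm or μm) are related by a simple reciprocal. The following is a minimal sketch of that conversion; the function names are illustrative and are not part of the study's software.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    if wavenumber_cm <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e7 / wavenumber_cm


def wavelength_um_to_wavenumber(wavelength_um: float) -> float:
    """Convert a wavelength in micrometres to a wavenumber in cm^-1 (1 cm = 1e4 um)."""
    if wavelength_um <= 0:
        raise ValueError("wavelength must be positive")
    return 1e4 / wavelength_um


# The limits of the MIR range quoted in the text.
mir_limits_nm = (wavenumber_to_wavelength_nm(4000.0), wavenumber_to_wavelength_nm(400.0))
```

The same helper can be used to check any wavelength range quoted in micrometres against its wavenumber equivalent.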

The Beer-Lambert law relates the attenuation of light to the properties of the substance through which it passes. The relationship can be expressed as

A = Ɛ l c

A - Absorbance (no units)

Ɛ - Molar absorptivity (L mol^{−1} cm^{−1})

l - Sample’s path length (cm)

c - Compound concentration (mol L^{−1})
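As an illustration of the quantities listed above, the sketch below evaluates the Beer-Lambert law in its usual form, A = Ɛ l c, and inverts it to recover a concentration from a measured absorbance. The numerical values are hypothetical and were chosen only to make the arithmetic easy to follow; they are not measurements from this study.

```python
def absorbance(molar_absorptivity: float, path_length_cm: float,
               concentration_mol_per_l: float) -> float:
    """Beer-Lambert law: A = epsilon * l * c (dimensionless)."""
    return molar_absorptivity * path_length_cm * concentration_mol_per_l


def concentration_from_absorbance(a: float, molar_absorptivity: float,
                                  path_length_cm: float) -> float:
    """Invert the Beer-Lambert law to obtain c in mol/L."""
    if molar_absorptivity <= 0 or path_length_cm <= 0:
        raise ValueError("molar absorptivity and path length must be positive")
    return a / (molar_absorptivity * path_length_cm)


# Hypothetical example values (not from the paper).
epsilon = 50.0   # L mol^-1 cm^-1
length = 0.1     # cm
conc = 0.02      # mol L^-1
a_example = absorbance(epsilon, length, conc)
c_recovered = concentration_from_absorbance(a_example, epsilon, length)
```

Because the law is linear in c, doubling the concentration doubles the absorbance, which is the behaviour that a linear regression on spectral data relies on.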

Attenuated Total Reflection (ATR) relies on an evanescent wave, an electromagnetic field that penetrates the sample and whose intensity decays quickly as it moves away from its source. The beam interacts with the sample and loses energy to it, so the intensity of the reflected wave that reaches the detector is reduced. ATR is one of the most important sampling techniques used in laboratories, allowing quick analysis of liquid and solid materials. MIR spectroscopy is a powerful technique for identifying unknown chemicals. ATR generally requires little or no sample preparation, which greatly accelerates sample analysis. It allows the IR beam to penetrate the sample over a very short path length and depth. It is useful for samples that are too thick to be examined in transmission and for those that absorb radiation strongly.

Absorbance is a measure of how much light of a defined wavelength a given material prevents from passing through it. The transmittance is defined as the ratio of the intensity transmitted through the sample to the intensity of the reference light:

T = I / I0

T - Transmittance (no unit)

I - Intensity of light transmitted through the sample (candela)

I0 - Intensity of reference light (candela).
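The relation between the recorded intensities, transmittance and absorbance can be sketched as follows. The sketch assumes the standard definitions T = I / I0 and A = −log10(T); the function names are illustrative only.

```python
import math


def transmittance(sample_intensity: float, reference_intensity: float) -> float:
    """T = I / I0, the fraction of the reference intensity that passes the sample."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return sample_intensity / reference_intensity


def percent_transmittance(sample_intensity: float, reference_intensity: float) -> float:
    """Transmittance expressed in percent."""
    return 100.0 * transmittance(sample_intensity, reference_intensity)


def absorbance_from_transmittance(t: float) -> float:
    """A = -log10(T); defined only for T > 0."""
    if t <= 0:
        raise ValueError("transmittance must be positive")
    return -math.log10(t)
```

For example, a sample that passes half of the reference intensity has T = 0.5, i.e. 50% transmittance, and an absorbance of about 0.30.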

To measure a sample in a spectrometer, a reference sample is required for the computations. In the proposed method, the sample is analysed in the mid-infrared region of the spectrum. Water is used as the reference because the soil samples are prepared using water. Once the sample is placed on the spectrometer, it is scanned and the intensity of the sample under test is recorded for wavelengths from 5.5 μm to 12.5 μm (wavenumber 1800 cm^{−1} to 800 cm^{−1}) with a sampling interval of 0.04 μm. Hence, for a single sample, 128 intensity values are recorded over this range in Comma Separated Value (CSV) format, and the readings can be visualised as a graph of intensity against wavelength in the interface software of the spectrometer. First, the water sample is analysed by MIR spectroscopy. Then the intensity spectra of the soil sample are recorded in the same way as for the water sample. For each soil sample, 30 spectra are recorded in order to train the models developed using Partial Least Square Regression (PLSR) and Support Vector Machine (SVM). This is the initial procedure for collecting the data, and it is repeated for the remaining five soil samples. Finally, the data for the 6 samples are collected in six different folders. Each folder contains 33 data sets: 3 water spectra and 30 soil spectra. The folders are named Pure Soil, Soil Urea 0.5 gm, Soil Urea 1 gm, Soil Urea 2 gm, Soil Urea 3 gm and Soil Urea 4 gm. All folders are consolidated with the help of a Python program.
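The consolidation program itself is not given in the text, so the following is only a sketch of how such a step might look. It assumes that each CSV file holds rows of the form `wavelength,intensity`, that water reference files have names beginning with `water`, and that the third folder is named "Soil Urea 2 gm"; all three are assumptions made for illustration. Each soil spectrum is divided by the mean water spectrum of its folder to give percent transmittance.

```python
import csv
from pathlib import Path

# Folder names as listed in the text, mapped to the urea mass in grams.
FOLDERS = {
    "Pure Soil": 0.0,
    "Soil Urea 0.5 gm": 0.5,
    "Soil Urea 1 gm": 1.0,
    "Soil Urea 2 gm": 2.0,
    "Soil Urea 3 gm": 3.0,
    "Soil Urea 4 gm": 4.0,
}


def read_spectrum(path):
    """Read the intensity column of one CSV file (assumed 'wavelength,intensity' rows)."""
    intensities = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue
            try:
                intensities.append(float(row[1]))
            except ValueError:
                continue  # skip a header line or other non-numeric row
    return intensities


def consolidate(root):
    """Return (spectra, labels): percent transmittance per soil file and its urea mass."""
    root = Path(root)
    spectra, labels = [], []
    for folder, urea_g in FOLDERS.items():
        files = sorted((root / folder).glob("*.csv"))
        # Hypothetical naming convention: water reference files start with "water".
        water = [read_spectrum(f) for f in files if f.name.lower().startswith("water")]
        soil = [read_spectrum(f) for f in files if not f.name.lower().startswith("water")]
        if not water:
            raise ValueError(f"no water reference found in {folder!r}")
        # Average the water spectra point by point to form the reference I0.
        reference = [sum(column) / len(column) for column in zip(*water)]
        for spectrum in soil:
            spectra.append([100.0 * i / i0 for i, i0 in zip(spectrum, reference)])
            labels.append(urea_g)
    return spectra, labels
```

With the data layout described in the text, a call such as `consolidate("data")` would be expected to return 180 spectra (6 folders × 30 soil files), each paired with its urea mass.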

Partial least squares regression is a popular numerical method. The basic principle of PLSR is to find the correlation between the samples and the variables. The sample data are plotted and decomposed into a latent structure over several iterations, which yields the T vector showing the most variation in the sample data. The same plotting and decomposition are done for the variable data, where the most variation is represented by the U vector.

Plotting u against t gives the relationship between variable and sample. The underlying model of PLSR is given as

$$X = T P^{\top} + E, \qquad Y = U Q^{\top} + F$$

Y - Sample matrix

X - Variable matrix

U and T - Projection of Y and X matrices

P and Q - Orthogonal matrices

E and F - Errors

Using the projections [U, T] and the orthogonal matrices [P, Q], the regression model between X and Y can be constructed. The regression equation is given as

Y - Sample matrix

X - Variable matrix

Th, Ch, Wh, Ph - Matrix generated by PLSR algorithm

ε_h - Residual matrix

A support vector machine was used for multi-class classification. The number of SVM models used for multi-class classification was equal to the number of classes. The SVM model is given below

_{i} - Training data

ϕ, C - Penalty parameter

For minimizing

The k decision functions are given below

The class to which the variable x belongs depends on the k decision functions

Class of variable x,
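The scheme described here, one SVM per class with the class chosen from the k decision functions, corresponds to what is commonly called one-vs-rest classification. The sketch below illustrates it with scikit-learn on synthetic data; the linear kernel, the penalty value C = 1.0 and the random feature vectors are assumptions for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic data: 6 classes (one per urea level), 30 samples each, 4 features.
centres = rng.normal(0.0, 5.0, size=(6, 4))
X = np.vstack([centre + rng.normal(0.0, 0.3, size=(30, 4)) for centre in centres])
y = np.repeat(np.arange(6), 30)

# One binary SVM is trained per class (k = 6 models in total).
ovr = OneVsRestClassifier(SVC(kernel="linear", C=1.0))
ovr.fit(X, y)

# k decision values per sample; the class with the largest value is selected.
scores = ovr.decision_function(X)
predicted = ovr.classes_[np.argmax(scores, axis=1)]
```

Here `ovr.estimators_` holds the six binary models, and taking the arg-max over the columns of `scores` reproduces what `ovr.predict` returns.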

The soil was divided into 6 parts, and urea was mixed into each part at a different concentration. The 6 soil samples were analysed using MIR spectroscopy. First, the raw data were used to develop the PLSR and SVM models for soil classification based on urea content. Then the data were pre-processed using a Savitzky-Golay filter (zero order) and a Gaussian filter (window size 7), and the PLSR and SVM models were developed again for the pre-processed data. After that, both models, with and without filtered data, were compared to find the best method. All the spectral analysis, pre-processing and model evaluation were done using the Unscrambler (Evaluation version) tool and are discussed in the following sections. The results show that the Root Mean Square Error (RMSE) is much lower for the PLSR model than for the SVM model, as indicated in the table of RMSE and R^{2} values below. From the table, the R^{2} value of the SVM model is about 0.99 with and without filters.

Model and dataset | RMSE (no filter) | R-squared (no filter) | RMSE (Savitzky-Golay filter) | R-squared (Savitzky-Golay filter) | RMSE (Gaussian filter) | R-squared (Gaussian filter)
---|---|---|---|---|---|---
PLSR calibration | 0.0083583 | 0.9768696 | 0.0062823 | 0.9847907 | 0.0040786 | 0.9929104
PLSR validation | 0.0084908 | 0.9763952 | 0.0063795 | 0.9844902 | 0.0041453 | 0.9927576
SVM calibration | 0.1681785 | 0.9913318 | 0.1602461 | 0.9923993 | 0.1588369 | 0.9925927
SVM validation | 0.1765683 | 0.9904004 | 0.1670446 | 0.9916455 | 0.1634977 | 0.9921014
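The percentages quoted for the RMSE can be checked directly against the table above. The short sketch below expresses the unfiltered RMSE values as percentages and computes how much the Gaussian filter reduces the RMSE relative to the unfiltered case; the dictionary layout and helper names are illustrative.

```python
# RMSE values copied from the table: (no filter, Savitzky-Golay, Gaussian).
RMSE = {
    "PLSR calibration": (0.0083583, 0.0062823, 0.0040786),
    "PLSR validation": (0.0084908, 0.0063795, 0.0041453),
    "SVM calibration": (0.1681785, 0.1602461, 0.1588369),
    "SVM validation": (0.1765683, 0.1670446, 0.1634977),
}


def as_percent(value: float) -> float:
    """Express a fractional RMSE as a percentage."""
    return 100.0 * value


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which 'after' is lower than 'before'."""
    return 100.0 * (before - after) / before


summary = {
    name: {
        "rmse_percent_no_filter": round(as_percent(none), 1),
        "gaussian_reduction_percent": round(relative_reduction(none, gauss), 1),
    }
    for name, (none, _savgol, gauss) in RMSE.items()
}
```

For the calibration sets this gives an unfiltered RMSE of 0.8% for PLSR and 16.8% for SVM, and shows that the Gaussian filter roughly halves the PLSR error while reducing the SVM error by only a few percent.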

The six soil samples with different urea concentrations were analysed using MIR spectroscopy. The FTIR spectrum of a standard urea solution has peaks in the ranges from 1500 cm^{−1} to 1800 cm^{−1} and from 3500 cm^{−1} to 3800 cm^{−1} [^{−1} of spectrum and y-axis represent the transmittance value in percentage. For classifying the different concentrations of urea in the soil spectra, spectrum data between wavenumber 1300 cm^{−1} and 1850 cm^{−1} as shown in ^{−1}.

The PLSR model was developed for the soil spectra dataset between wavenumbers 1300 cm^{−1} and 1850 cm^{−1}. For model evaluation, the score plot was used; it is shown in

The predicted and reference values for the calibration and validation datasets are shown in

The R-squared value i.e., the coefficient of determination is calculated using the formula

The Root Mean Square Error (RMSE) is the standard deviation of the residuals, i.e., the prediction errors. It is obtained by squaring the residuals, averaging the squared residuals, and taking the square root of the result. The RMSE shows how closely the data points are concentrated around the regression line. It is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_{i} - y_{i}\right)^{2}}$$

ŷ_{i} - Predicted values

y_{i} - Actual observations

N – Number of non-missing data points

i – variable index
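Both evaluation metrics can be computed in a few lines. The sketch below assumes the usual definitions: RMSE as the square root of the mean squared residual, and R-squared as one minus the ratio of the residual sum of squares to the total sum of squares. The exact formula used by the evaluation tool is not shown in the text, so the R-squared definition here is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual observations and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives an RMSE of 0 and an R-squared of 1; the two metrics are on different scales, which is why a model can have the lower RMSE on one scale of the response and another model the higher R-squared on a different scale.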

The spectrum data between wavenumbers 1300 cm^{−1} and 1850 cm^{−1} were taken and a model was developed using SVM. The SVM classifies the urea content in soil based on separating hyperplanes. The number of hyperplanes depends on the number of sample classes to be distinguished. The classification of soil using SVM based on urea content is shown in the

The Savitzky-Golay filter is applied to the spectrum data to smooth the signal. It is used to reduce small noise components. The concepts behind this filter are convolution and polynomial approximation. The polynomial coefficients are linear in the raw data values. If the smoothing window size is N*N and the polynomial order is k, the general filter equation for smoothing is

n = (N−1)/2

C = Convolution matrix

_{xy} = Original/raw data

To fit the polynomial to the raw data, the coefficients have to be found by solving the least-squares problem

A - Polynomial coefficient vector

The coefficient matrix is computed by

Because the fit is linear in the data, the coefficients can be calculated independently of the data. For that, a unit vector replaces f in the equation. The coefficient matrix is then converted as follows

The first coefficient was used to smooth the spectrum data. The other coefficients are used to compute derivatives. The Savitzky-Golay filtered data between wavenumbers 1300 cm^{−1} and 1850 cm^{−1} were then considered for urea content classification. The PLSR model was developed using the new dataset. The same steps and plots discussed in 3.1 were used to evaluate the PLSR model. The evaluation plot of the developed PLSR model after applying the Savitzky-Golay filter is shown in
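The least-squares construction of the Savitzky-Golay coefficients can be sketched with NumPy. Since a spectrum is a one-dimensional signal, the sketch uses a 1-D window of N points rather than an N*N window; that simplification, and the edge padding used at the ends of the signal, are assumptions made for illustration. Row 0 of the coefficient matrix holds the smoothing weights, and the remaining rows give the higher polynomial coefficients from which derivatives are obtained.

```python
import numpy as np


def savgol_coefficients(window: int, order: int) -> np.ndarray:
    """Least-squares coefficient matrix of shape (order + 1, window).

    Row p maps the window samples to the fitted polynomial coefficient a_p,
    so row 0 gives the smoothed value at the window centre.
    """
    if window % 2 == 0 or window < 1:
        raise ValueError("window must be a positive odd number")
    if order < 0 or order >= window:
        raise ValueError("order must satisfy 0 <= order < window")
    n = (window - 1) // 2
    offsets = np.arange(-n, n + 1, dtype=float)
    # Design matrix with columns 1, j, j^2, ..., j^order for each offset j.
    design = np.vander(offsets, order + 1, increasing=True)
    # The pseudo-inverse solves the least-squares fit for any data vector.
    return np.linalg.pinv(design)


def savgol_smooth(signal, window: int, order: int) -> np.ndarray:
    """Smooth a 1-D signal with the row-0 weights (edges padded by repetition)."""
    weights = savgol_coefficients(window, order)[0]
    n = (window - 1) // 2
    padded = np.pad(np.asarray(signal, dtype=float), n, mode="edge")
    # np.convolve flips its kernel, so the weights are reversed to apply them as written.
    return np.convolve(padded, weights[::-1], mode="valid")
```

With a zero-order polynomial, as used in this work, every weight equals 1/N and the filter reduces to a moving average over the window.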

The spectrum data between wavenumbers 1300 cm^{−1} and 1850 cm^{−1}, after applying the Savitzky-Golay filter, were taken and a model was developed using SVM. The classification of soil based on urea content is shown in the

The Gaussian filter is based on 2-D convolution and is used to smooth the data by removing noise. It is somewhat similar to the mean filter, but it uses a different kernel. A Gaussian filter alters the input data by convolving it with a Gaussian function. In 1D, the Gaussian function is given as

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^{2}}{2\sigma^{2}}}$$

The Gaussian function of 2D is given as

$$G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}$$

where σ is the standard deviation of the distribution.
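For a one-dimensional spectrum, Gaussian smoothing amounts to convolving the signal with a sampled and normalised 1-D Gaussian kernel. The sketch below uses the window size of 7 mentioned earlier; the standard deviation `sigma = 1.0` and the edge padding are assumptions, since neither is stated in the text.

```python
import numpy as np


def gaussian_kernel(window: int = 7, sigma: float = 1.0) -> np.ndarray:
    """Sampled 1-D Gaussian, normalised so that the weights sum to 1."""
    if window % 2 == 0 or window < 1:
        raise ValueError("window must be a positive odd number")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = (window - 1) // 2
    x = np.arange(-n, n + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_smooth(signal, window: int = 7, sigma: float = 1.0) -> np.ndarray:
    """Smooth a 1-D signal by convolution with the Gaussian kernel."""
    kernel = gaussian_kernel(window, sigma)
    n = (window - 1) // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(np.asarray(signal, dtype=float), n, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

Unlike the mean filter, whose weights are all equal, the Gaussian kernel gives the largest weight to the centre point and progressively smaller weights to points further away.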

The Gaussian filter was applied to the raw spectrum data to smooth the signal. After pre-processing with the Gaussian filter, the spectrum data between wavenumbers 1300 cm^{−1} and 1850 cm^{−1} were taken. The PLSR model was developed for this new dataset. The PLSR score plot of the Gaussian filtered data is shown in

The spectrum data between wavenumbers 1300 cm^{−1} and 1850 cm^{−1}, after applying the Gaussian filter, were taken and a model was developed using SVM. The classification of soil based on urea content is shown in the


Soil urea detection based on mid-infrared spectroscopy and machine learning has become more feasible, but the prediction performance depends on the algorithms implemented. The results of this work indicate that the pre-processing technique is important for increasing the accuracy of soil urea detection. The best pre-processing technique was assessed by comparison, which shows that the Gaussian filter is better than the Savitzky-Golay filter. Without a filter, the RMSE is 0.8% for the PLSR model and 16% for the SVM model, and it is further reduced when the data are pre-processed with the Savitzky-Golay or Gaussian filter. Although the RMSE is very low for PLSR, the R-squared value determines the quality of the prediction, and the results show that SVM has better R-squared values with and without filters. The R-squared values show that the Support Vector Machine model, with a value of 0.99, has higher accuracy than the Partial Least Square Regression model. In future, a portable handheld device will be designed incorporating this model. Such a device could help farmers protect their crops from both excess and deficient urea content and thereby increase their productivity.

We would like to thank the Management, Bannari Amman Institute of Technology, for providing the agricultural plot for this research, and Robert Bosch, Bangalore, for providing the MIR spectrometer used to measure the soil spectra for analysis.

