This study aims to share near-infrared analysis models of lignin and holocellulose content in pulpwood between two spectrometers from different batches, and proposes three combined algorithms: SPA-DS, MCUVE-DS and SiPLS-DS. The successive projection algorithm (SPA), Monte Carlo uninformative variable elimination (MCUVE) and synergy interval partial least squares (SiPLS) are each used to reduce the adverse effects of redundant information when a model is transferred with the full-spectrum direct standardization (DS) algorithm. These three algorithms can improve the accuracy and efficiency of model transfer and reduce the manpower and materials required for modeling. The results show that models built on the characteristic wavelengths screened by SPA, MCUVE and SiPLS all outperform full-spectrum modeling on the master instrument; SPA-PLS gives the best prediction, with RPDs above 6.5 for both components. The three wavelength selection methods are then combined with the DS algorithm to transfer the models between the two instruments. Among them, MCUVE combined with DS has the best transfer effect: after model transfer, the RMSEP is 0.701 for lignin and 0.839 for holocellulose, compared with 0.759 and 0.918 for full-spectrum model transfer.

Holocellulose (cellulose plus hemicellulose) and lignin are the main components of wood and are closely related to its other properties, as well as to its processing and utilization. In the paper industry, holocellulose content is closely related to pulp yield and quality, while lignin content is an important basis for setting cooking and bleaching process conditions [

In recent years, many scholars have carried out a large number of model transfer studies on model sharing between different instruments, most of which are standard sample algorithms [

The wood powder samples used in the experiments were provided by the Institute of Chemical Industry of Forest Products, Chinese Academy of Forestry (China), with a total of 82 log samples and their holocellulose and lignin content values. The logs were cut into wood chips and ground, and then the wood powder samples with a particle size of 0.250 to 0.425 mm (40 to 60 mesh) were selected to determine their holocellulose and lignin content according to GB/T2677 (1994). The results are shown in

| Component | Type of wood powder | Number | Minimum (%) | Maximum (%) | Average (%) |
|---|---|---|---|---|---|
| Holocellulose | Eucalyptus | 24 | 78.04 | 82.61 | 80.87 |
| | Chinese fir | 23 | 66.08 | 70.25 | 67.97 |
| | Poplar | 13 | 79.09 | 86.28 | 82.20 |
| | Acacia | 12 | 76.70 | 80.58 | 78.53 |
| | Masson pine | 10 | 71.54 | 74.01 | 72.80 |
| | Total | 82 | 66.08 | 86.28 | 76.14 |
| Lignin | Eucalyptus | 24 | 21.49 | 27.56 | 23.74 |
| | Chinese fir | 23 | 32.55 | 34.20 | 33.44 |
| | Poplar | 13 | 14.82 | 20.51 | 18.00 |
| | Acacia | 12 | 24.62 | 27.15 | 25.69 |
| | Masson pine | 10 | 28.48 | 28.95 | 28.63 |
| | Total | 82 | 14.82 | 34.20 | 26.43 |

The experiment employed two S450 near-infrared spectrometers produced by Shanghai Lengguang Technology Co., Ltd. (China). One was a new instrument (target) that had just been debugged by the company, and the other was an old instrument (master) that had been in use for two years. The old and new instruments are shown in

Spectra of the prepared wood powder samples were collected on the two benchtop NIR spectrometers. For each measurement, the sample was placed in a measuring cup and flattened with a 50 g weight so that it was evenly distributed. Normally, each sample would be loaded several times and the average spectrum taken as the measurement result. Because the S450 near-infrared spectrometer is equipped with a rotating table, the repeated loading steps could be omitted: each sample was rotated and measured 6 times, and the average spectrum was taken as the final sample spectrum. After each sample was scanned, a brush was used to remove the residual wood powder from the sample cup so as to avoid adverse effects on the accuracy of subsequent spectra. The spectral comparison of the two instruments is shown in

The successive projection algorithm (SPA) is a forward cyclic variable selection method [

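Before the start of the first iteration (Z = 1), choose any column $x_j$ of the spectral matrix as the initial wavelength vector, denoted as $x_{k(0)}$.

The set of unselected column vector positions is denoted as $S = \{\, j : 1 \le j \le J,\ j \notin \{k(0), \dots, k(Z-1)\} \,\}$, where $J$ is the number of wavelengths.

Calculate the projection of each remaining column vector $x_j$ ($j \in S$) onto the subspace orthogonal to $x_{k(Z-1)}$: $P x_j = x_j - \left(x_j^{T} x_{k(Z-1)}\right) x_{k(Z-1)} \left(x_{k(Z-1)}^{T} x_{k(Z-1)}\right)^{-1}$.

Extract the position of the wavelength variable with the largest projection norm: $k(Z) = \arg\max_{j \in S} \lVert P x_j \rVert$.

Let $x_j = P x_j$ for $j \in S$, set $Z = Z + 1$, and return to the second step until the required number of wavelengths has been selected.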
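The projection steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `spa_select`, the list-of-columns data layout and the fixed starting column are assumptions made for the example, and a complete SPA run would also try different starting columns and numbers of wavelengths and keep the subset with the lowest validation error.

```python
import math

def spa_select(columns, start, n_select):
    """Pick n_select column indices by successive projections.

    columns  : list of wavelength columns, each a list over samples
    start    : index of the initial column (assumed given here)
    n_select : total number of wavelengths to keep
    """
    work = [list(c) for c in columns]      # working copy; input is not modified
    selected = [start]
    for _ in range(n_select - 1):
        ref = work[selected[-1]]           # most recently selected (projected) column
        ref_norm2 = sum(v * v for v in ref)
        best_j, best_norm = None, -1.0
        for j, col in enumerate(work):
            if j in selected:
                continue
            # Project column j onto the subspace orthogonal to ref.
            coef = sum(a * b for a, b in zip(col, ref)) / ref_norm2
            proj = [a - coef * b for a, b in zip(col, ref)]
            work[j] = proj
            norm = math.sqrt(sum(v * v for v in proj))
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)            # column with the largest projection norm
    return selected

# Toy data: column 1 is collinear with column 0, so it is never preferred.
toy_columns = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
print(spa_select(toy_columns, start=0, n_select=3))
```

In the toy data the column that is a scaled copy of the starting column has zero projection norm, which is how the procedure avoids collinear wavelengths.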

The Monte Carlo uninformative variable elimination (MCUVE) algorithm [

The matrix is randomly sampled _{ij} of the _{j} and the variance is _{j}

The importance of the variable is defined using the UVE method and the importance indicator

Among them, _{j}_{j}

Arrange the
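The ranking step of MCUVE can be illustrated with a short Python sketch. It assumes the common MC-UVE definition of variable stability, the mean of a variable's regression coefficients over the Monte Carlo runs divided by their standard deviation, and it assumes the coefficients from the repeated PLS fits are already available as a matrix. The function names and the toy numbers are hypothetical.

```python
from statistics import mean, stdev

def mcuve_stability(coefs):
    """coefs[i][j] is the regression coefficient of variable j in Monte Carlo run i.

    Returns one stability value per variable: mean / standard deviation
    of that variable's coefficients across the runs (assumed definition).
    """
    per_variable = zip(*coefs)             # regroup coefficients by variable
    return [mean(b) / stdev(b) for b in per_variable]

def rank_variables(coefs):
    """Variable indices sorted from most to least stable (by absolute stability)."""
    s = mcuve_stability(coefs)
    return sorted(range(len(s)), key=lambda j: abs(s[j]), reverse=True)

# Three hypothetical Monte Carlo runs, three variables.
toy_coefs = [
    [1.0, 0.1, -2.0],
    [1.1, -0.1, -2.4],
    [0.9, 0.0, -1.6],
]
print(rank_variables(toy_coefs))
```

A variable whose coefficient is large and steady across runs ranks high, while one whose coefficient fluctuates around zero ranks low and is a candidate for elimination. How many of the top-ranked variables to keep is a separate choice, made in this study by the minimum predicted RMSEP.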

The basic idea of the Synergy intervals PLS (SiPLS) algorithm [

An important development in multivariate model transfer is direct standardization (DS), which relates the spectra $S_m$ measured on the master instrument to the spectra $S_t$ measured on the target instrument by the transformation matrix $F$:

$$S_m = S_t F$$

The transformation matrix $F$ is calculated as

$$F = S_t^{+} S_m$$

where $S_t^{+}$ is the generalized inverse of $S_t$.

Once $F$ has been determined, the spectrum of an unknown sample ($s_{unknown}$), measured on the target instrument, can be projected into the master instrument space as $s_{unknown}^{T} F$, and the property values can then be predicted by the old model.
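A small Python sketch may help make the DS calculation concrete. It assumes that the transfer set contains at least as many samples as wavelengths and that the target spectra matrix has full column rank, so the generalized inverse reduces to the least-squares form (SᵀS)⁻¹Sᵀ; this is plausible after wavelength selection (for example 13 wavelengths with 25 transfer samples) but not for the full spectrum, where a true pseudo-inverse routine would be needed. All function names are illustrative.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def ds_transfer_matrix(s_master, s_target):
    """F such that s_master is approximated by s_target @ F (least squares).

    Rows are transfer samples, columns are wavelengths.
    Assumes s_target has full column rank.
    """
    st_t = transpose(s_target)
    return solve(matmul(st_t, s_target), matmul(st_t, s_master))

def standardize(spectrum, f):
    """Map one target-instrument spectrum into the master-instrument space."""
    return matmul([spectrum], f)[0]

# Toy transfer set: 3 samples, 2 wavelengths, with a known instrument difference.
f_true = [[2.0, 0.0], [0.5, 1.0]]
s_target = [[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]
s_master = matmul(s_target, f_true)
f = ds_transfer_matrix(s_master, s_target)
print(standardize([1.0, 1.0], f))
```

The standardized spectrum can then be passed to the model calibrated on the master instrument without recalibration.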

The modeling method in this study is partial least squares regression (PLSR), and cross-validation uses the leave-one-out method [ The correlation coefficient (R), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP), Akaike information criterion (A_{IC}) and ratio of performance to deviation (RPD) were used for comprehensive evaluation. The closer the correlation coefficient (R) is to 1, the better the regression or prediction results of the model. The smaller the RMSECV and RMSEP, the better the model; RPD is used to verify the stability and predictive ability of the model, and RPD > 3 indicates that the model has high stability and good predictive ability [ The efficiency of model transfer is evaluated by A_{IC} [

$$A_{IC} = n \ln(\mathrm{RMSEP}) + 2p$$

where $n$ is the number of samples and $p$ is the number of wavelength variables in the model. The smaller the A_{IC}, the simpler the model and the higher the transfer efficiency of the model.
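The evaluation indices can be written out in a few lines of Python. This is a sketch under stated assumptions: RPD is taken as the sample standard deviation of the reference values divided by RMSEP, and A_{IC} is taken as n·ln(RMSEP) + 2p, with n the number of samples and p the number of wavelengths. The A_{IC} form is inferred from the tabulated results in this study, which it reproduces with n = 81 (the 82 samples minus the one eliminated spectrum); the function names are illustrative.

```python
import math
from statistics import stdev

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def rpd(y_true, y_pred):
    """Standard deviation of the reference values divided by RMSEP (assumed definition)."""
    return stdev(y_true) / rmsep(y_true, y_pred)

def aic(rmsep_value, n_wavelengths, n_samples=81):
    """A_IC = n * ln(RMSEP) + 2p (form inferred from the reported values)."""
    return n_samples * math.log(rmsep_value) + 2 * n_wavelengths

# Compare with the A_IC values reported for the transferred lignin models.
reported = [
    ("Full spectrum", 0.759, 1601, 3179.664),
    ("SPA-DS", 0.776, 13, 5.458),
    ("MCUVE-DS", 0.701, 607, 1185.225),
]
for name, err, p, tabulated in reported:
    print(f"{name}: computed {aic(err, p):.3f}, reported {tabulated}")
```

Because the 2p term dominates whenever hundreds of wavelengths are kept, A_{IC} here mainly rewards small wavelength subsets, which is why SPA-DS scores far lower than the other methods even when its RMSEP is slightly higher.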

Because the spectra collected by the S450 spectrometers contain a large number of wavelength points, using the full spectrum slows down both model building and model transfer and introduces more irrelevant information. Wavelength selection algorithms were therefore used to select characteristic wavelengths for each component and remove redundant ones, which both speeds up the establishment of the analysis model and improves its prediction accuracy. In this study, the SPA, MCUVE and SiPLS algorithms were each combined with the DS method for model transfer, and the results were compared with those of full-spectrum model transfer.

Of the 82 wood powder samples used in this study, one spectrum with excessive noise caused by measurement problems was eliminated before the samples were divided into the calibration set and the prediction set. The SPXY algorithm [

| Component | Instrument | LV | Calibration set R | Calibration set RMSECV | Prediction set R | Prediction set RMSEP | Prediction set RPD |
|---|---|---|---|---|---|---|---|
| Lignin | Master | 8 | 0.983 | 1.002 | 0.989 | 0.791 | 5.832 |
| Holocellulose | Master | 9 | 0.982 | 0.998 | 0.998 | 0.969 | 6.019 |

The SPA algorithm selected wavelengths for the two components of lignin and holocellulose respectively, and the threshold was set to select at least 10 wavelength points and at most 100 wavelength points. Based on the principle of minimum predicted RMSEP, 13 and 19 wavelength points were finally selected for lignin and holocellulose, respectively. The selected wavelength positions are shown in

For the MCUVE algorithm, the number of Monte Carlo simulations was set to 1000, and the minimum predicted RMSEP was used as the criterion for selecting the wavelengths for lignin and holocellulose respectively. Finally, 607 wavelengths were selected for lignin and 639 wavelengths for holocellulose. The wavelengths selected for the two components are shown in

Unlike the previous two methods, which select individual characteristic wavelengths, SiPLS selects wavelength bands. Following the principle of minimum RMSEP, the best results for the two components were obtained after trying multiple parameter settings. For both lignin and holocellulose, the spectrum was divided into 16 intervals, and 4 intervals were selected for combination. With the cross-validation number set to 6, a total of 400 wavelength points in the 4th, 5th, 8th, and 14th intervals were selected for lignin. This interval combination had the best prediction effect; the specific bands are shown in
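The interval search described above can be sketched as an exhaustive loop over interval combinations. The sketch assumes the 1601 wavelength points are split as evenly as possible, with the single leftover point going to the first interval, and it takes the model-scoring step as a caller-supplied function `score` (hypothetical; in this study it would be the RMSEP of a PLS model built on the chosen points).

```python
from itertools import combinations

def split_intervals(n_points, n_intervals):
    """Split point indices 0..n_points-1 into contiguous, near-equal intervals."""
    base, extra = divmod(n_points, n_intervals)
    intervals, start = [], 0
    for i in range(n_intervals):
        size = base + (1 if i < extra else 0)   # leftover points go to the first intervals
        intervals.append(range(start, start + size))
        start += size
    return intervals

def best_combination(n_points, n_intervals, n_combined, score):
    """Return (error, interval indices, point indices) of the lowest-scoring combination."""
    intervals = split_intervals(n_points, n_intervals)
    best = None
    for combo in combinations(range(n_intervals), n_combined):
        idx = [k for i in combo for k in intervals[i]]
        err = score(idx)
        if best is None or err < best[0]:
            best = (err, combo, idx)
    return best

intervals = split_intervals(1601, 16)
# Intervals 4, 5, 8 and 14 (1-based), as selected for lignin.
n_selected = sum(len(intervals[i - 1]) for i in (4, 5, 8, 14))
n_candidates = sum(1 for _ in combinations(range(16), 4))
print(n_selected, n_candidates)
```

With 16 intervals and 4 combined there are 1820 candidate combinations, so an exhaustive search is affordable, but each candidate still requires fitting and validating its own PLS model.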

_{2} near 2280–2330, 2335 nm. The characteristic wavelengths could be selected by all three algorithms for these functional group positions.

| Component | Wavelength selection | Number of wavelengths | LV | Master R | Master RMSEP | Master RPD | Target R | Target RMSEP | Target RPD |
|---|---|---|---|---|---|---|---|---|---|
| Lignin | Full spectrum | 1601 | 8 | 0.989 | 0.791 | 5.832 | 0.991 | 3.158 | 1.845 |
| | SPA | 13 | 8 | 0.990 | 0.676 | 6.820 | 0.983 | 2.299 | 2.006 |
| | MCUVE | 607 | 9 | 0.990 | 0.704 | 6.551 | 0.989 | 2.348 | 1.964 |
| | SiPLS | 400 | 9 | 0.991 | 0.746 | 6.178 | 0.980 | 1.931 | 2.388 |
| Holocellulose | Full spectrum | 1601 | 9 | 0.998 | 0.969 | 6.019 | 0.992 | 2.214 | 2.083 |
| | SPA | 19 | 9 | 0.994 | 0.813 | 7.161 | 0.989 | 7.346 | 0.793 |
| | MCUVE | 639 | 9 | 0.991 | 0.957 | 6.086 | 0.990 | 3.052 | 1.909 |
| | SiPLS | 400 | 8 | 0.991 | 0.878 | 6.637 | 0.989 | 15.712 | 0.371 |

In terms of calculation principle, the SPA algorithm was based on the principle of selecting collinear minimum variables [

The MCUVE algorithm judged the reliability of each variable by calculating its stability value; the number of variables selected was related to the set standard [

The SiPLS algorithm was different from the above two wavelength selection methods. Instead of selecting single wavelength points, it divided the full spectral range evenly into multiple bands and then combined the bands to establish a PLS model. With the minimum RMSEP as the standard, the optimal band combination was selected [

The two near-infrared instruments used in this study have high precision, a wide wavelength range, and small inter-instrument differences, which can be corrected by a relatively simple model transfer algorithm. However, the wide wavelength range and the large number of wavelength points also increase the amount of calculation and lengthen model transfer. Therefore, before the model was transferred, the wavelength selection algorithms were used to reduce the variables of the master spectral data. The SPXY algorithm was used to select 10, 15, 20, 25, and 30 samples, with matching sample numbers, from the spectra measured on the master and target instruments as the transfer set. Because wavelength selection reduces the collinearity between wavelength points and leaves far fewer wavelengths, it was not necessary to use the CCA and PDS algorithms, and the simple DS algorithm was sufficient to obtain a good model transfer effect. The prediction results of the master model for the target samples are shown in

| Component | Method | Transfer set size | Number of wavelengths | A_{IC} | R | RMSEP | RPD |
|---|---|---|---|---|---|---|---|
| Lignin | Full spectrum | 20 | 1601 | 3179.664 | 0.995 | 0.759 | 6.075 |
| | SPA-DS | 25 | 13 | 5.458 | 0.988 | 0.776 | 5.944 |
| | MCUVE-DS | 20 | 607 | 1185.225 | 0.993 | 0.701 | 6.580 |
| | SiPLS-DS | 25 | 400 | 793.598 | 0.981 | 0.924 | 4.988 |
| Holocellulose | Full spectrum | 15 | 1601 | 3195.088 | 0.988 | 0.918 | 6.350 |
| | SPA-DS | 15 | 19 | 48.542 | 0.988 | 1.139 | 5.116 |
| | MCUVE-DS | 25 | 639 | 1263.781 | 0.992 | 0.839 | 6.950 |
| | SiPLS-DS | 15 | 400 | 791.646 | 0.989 | 0.902 | 6.469 |

It can be seen from the model transfer results that, with the SPA-DS method, the A_{IC} values for lignin and holocellulose were reduced from 3179.664 and 3195.088 to 5.458 and 48.542, respectively, so the model transfer process was greatly simplified. However, the model transfer accuracy for these two components was slightly worse than that of the MCUVE-DS and full-spectrum methods. This was because wavelength selection was performed only on the master spectra, and the characteristic wavelengths of the target were slightly offset from those of the master. The SPA algorithm retained too few wavelengths to capture the characteristics of holocellulose in the target spectra, resulting in a poor model transfer effect.

Although the MCUVE algorithm did not give the best prediction for the master model, it had the best transfer effect when combined with the DS algorithm: the target RMSEP fell to 0.701 for lignin and 0.839 for holocellulose, lower than with the full-spectrum and SPA-DS methods. The A_{IC} values were reduced from 3179.664 and 3195.088 to 1185.225 and 1263.781, respectively, indicating that the MCUVE-DS method also improved model transfer efficiency. This was because the MCUVE algorithm selected more wavelengths, avoiding the problem that a method such as SPA may select too few wavelengths to cover the characteristic wavelengths of the target spectra. The method retained wavelengths with higher stability and removed some redundant wavelengths. It therefore had a better transfer effect than the full-spectrum and SPA-DS methods, although its transfer efficiency was lower than that of the SPA-DS method.

The prediction accuracy of the SiPLS-DS method for lignin was relatively poor because the near-infrared spectral features of lignin are relatively scattered, and SiPLS, as a band combination algorithm, is less flexible than the two single-point selection algorithms. The characteristic wavelengths of holocellulose, however, are concentrated in the second half of the spectral range, so several of them fall within the same selected bands, and the total number of wavelength points in these bands is also larger than that selected by the SPA algorithm. Therefore, the transfer effect of the holocellulose model was better than that of the full-spectrum and SPA-DS methods. Compared with the full spectrum, the A_{IC} values of the SiPLS-DS method for the two components decreased from 3179.664 and 3195.088 to 793.598 and 791.646, respectively, indicating that the method can simplify the model transfer process to a certain extent.

To address the problem of sharing NIR analysis models of lignin and holocellulose content in pulpwood between two NIR instruments from different batches, model transfer studies were carried out using SPA, MCUVE and SiPLS combined with the DS algorithm. The results show that models built on the characteristic wavelengths selected by these three algorithms are improved compared with full-spectrum modeling. The SPA-PLS model has the best prediction effect on the master, selects the fewest characteristic wavelengths and is fast to compute: only 13 wavelengths were selected for lignin and 19 for holocellulose, and the RPDs were all above 6.8. Without model transfer, however, its prediction performance for target samples was poor. Combining these three algorithms with the DS algorithm better realizes model transfer between the two instruments and also greatly simplifies the model transfer process. Among them, the MCUVE-DS model transfer strategy has the best prediction effect and is clearly better than full-spectrum model transfer. The model transfer result of SPA-DS for holocellulose is not as good as that of the full-spectrum model, which suggests that, to ensure a satisfactory model transfer, the number of wavelengths in the transferred model should not be too small.

The authors would like to express their gratitude to the Institute of Forest Products and Chemical Industry of the Chinese Academy of Forestry for the samples provided, and to Yunchao Hu and Zhijian Liu for the sample spectra collected.

The authors are grateful for the support of the

The authors declare that they have no conflicts of interest to report regarding the present study.