In hardware Trojan detection technology, destructive reverse engineering can restore an original integrated circuit with the highest accuracy. However, this method has a much higher overhead in terms of time, effort, and cost than bypass detection. This study proposes an algorithm, called mixed-feature gene expression programming, which applies non-destructive reverse engineering to the chip with bypass detection data. It aims to recover the original integrated circuit hardware, or else reveal the unknown circuit design in the chip.

The term hardware Trojan refers either to a particular circuit module deliberately implanted or changed during the process of designing or manufacturing an integrated circuit (IC), or to an unintentional design defect in the IC [

In the global semiconductor supply chain, the traditional security strategy of implementing protection based on the underlying hardware is no longer valid [

In today’s hardware Trojan detection technology [

Unlike some previous work that researched evolvable hardware [

A hardware Trojan primarily consists of two components: the trigger logic and the payload. A model of the structure of a hardware Trojan is shown in

Destructive detection methods usually use destructive reverse engineering to decapsulate an IC and obtain images of each layer, so as to reproduce and verify the trusted design of the final product [

Courbon et al. [

The specific operation process of the SVM method is depicted in

The author uses the golden layout of _{nm} as input. First, the images of each layer of _{nm} of each grid are tested in the classification conditions. These grids are used to train the classifier and are subsequently classified as Trojan-free (TF) or Trojan-inserted (TI) through SVM. Finally, the classification results are used to identify whether the chip contains a Trojan.

The main goal of the logic test method is to activate a hardware Trojan by applying functional, structural, or random test vectors, and then to compare the response with the correct one [

The bypass analysis method detects hardware Trojans through the parameters of an IC during normal operation, such as delay [

The concept of evolvable hardware (EHW) was initially proposed in 1993 by Hugo de Garis, then at the Advanced Telecommunications Research Institute in Japan, and by scientists from the Swiss Federal Institute of Technology. EHW aims to make use of the reconfigurable internal structure of programmable devices, as well as the capabilities of evolutionary algorithms in combinatorial optimization and universal search. The algorithm searches for the configuration bit string of a programmable device that suits a specific task, and thereby obtains a hardware circuit with the expected functions [

For example [

Xie et al. [

Gene expression programming (GEP) [

For example, consider a gene for the set of functions

The algebraic expression represented by

The redundant symbols in the tail are discarded directly. Then GEP can use fixed-length encoding to express different sizes and shapes of expression trees.

GEP has performed well in mining association rules, clustering, classification rules, time-series predictions, and sunspot predictions [

As the structure shows, GEP performs well in resolving tree-structured problems. In other words, if one circuit can be represented as a tree-shaped structure with

In GEP, uppercase letters represent operators and lowercase letters represent terminals. In this paper, however, we use the name string of a logic gate to represent that gate, and uppercase letters to represent input parameters. The purpose of this modification is to facilitate the use of circuit simulation software.
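
The following is a minimal sketch of how a gene written in this notation could be decoded into an expression tree and evaluated on logic values. The gene shown, the `ARITY` table, and the helper names `decode` and `evaluate` are illustrative assumptions and are not taken from the paper; the sketch only demonstrates the level-by-level reading order of GEP and the fact that unused tail symbols are discarded.

```python
from collections import deque

# Arity of each gate name; any symbol not listed is treated as an input.
ARITY = {"Not": 1, "And": 2, "Or": 2}

def decode(gene):
    """Build an expression tree from a gene, filling the tree level by level.

    Returns the root node and the number of symbols actually consumed.
    A node is a list: [symbol, child, child, ...].
    """
    symbols = iter(gene)
    root = [next(symbols)]
    queue = deque([root])
    used = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [next(symbols)]
            used += 1
            node.append(child)
            queue.append(child)
    return root, used

def evaluate(node, inputs):
    """Evaluate the tree on 0/1 logic values given as a dict of input names."""
    symbol, *children = node
    if symbol == "Not":
        return 1 - evaluate(children[0], inputs)
    if symbol == "And":
        return evaluate(children[0], inputs) & evaluate(children[1], inputs)
    if symbol == "Or":
        return evaluate(children[0], inputs) | evaluate(children[1], inputs)
    return inputs[symbol]

# Hypothetical gene: a head of 3 gate names followed by a tail of 4 inputs.
gene = ["Or", "And", "Not", "A", "B", "C", "D"]
tree, used = decode(gene)   # represents Or(And(A, B), Not(C)); "D" is unused
```

Only the first `used` symbols contribute to the circuit, so genes of the same fixed length can express trees of different sizes and shapes.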

However, if only logical values are used to represent the circuit, there would be too many isomorphic situations. For example, the circuit and its ET in

String 3 describes the corresponding effective gene of

The two circuits are fully equivalent in terms of logic values, and both are shown in expression

This example is only for the logic value of the circuit. A similar situation exists in terms of bypass information detection, and many different circuit structures could result from any single bypass information, which is referred to as

Therefore, if a circuit is described only by logical values or by a single kind of bypass information, the circuit structure cannot be confirmed because too many isomorphic circuits exist. This paper proposes an algorithm called mixed-feature GEP (MF-GEP), which represents multiple circuit features by using the same structure in GEP. This algorithm can reduce the number of isomorphic circuits and can be used to detect hardware Trojans.
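
To make the ambiguity concrete, the sketch below compares two structurally different circuits that are indistinguishable by logic values alone. The two circuits are a textbook De Morgan pair chosen for illustration; they are not the circuits described by the strings in this paper, and the gate counts are used only as a crude, assumed stand-in for a structural difference that a bypass measurement might expose.

```python
from itertools import product

def circuit_a(a, b):
    # Not(And(A, B)): two gates
    return 1 - (a & b)

def circuit_b(a, b):
    # Or(Not(A), Not(B)): three gates
    return (1 - a) | (1 - b)

def truth_table(circuit, n_inputs):
    """Outputs of the circuit for every input combination, in counting order."""
    return [circuit(*bits) for bits in product((0, 1), repeat=n_inputs)]

# Identical on every input vector, so a logic test cannot tell them apart.
same_logic = truth_table(circuit_a, 2) == truth_table(circuit_b, 2)

# The structures still differ (assumed gate counts for the two circuits).
gate_counts = {"circuit_a": 2, "circuit_b": 3}
```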

For a circuit

Let _{k}

_{k} _{1}, _{2},_{n}_{k}, and

_{k} _{1}, _{2},_{m}_{k} corresponding to circuit _{k} _{k}

_{k} _{k} _{k}, _{k} _{k} _{k}.

We provide the following definitions:

For two circuits

For

Then we call circuits _{k} for input vector _{k}.

For

Then we call circuits _{k}.

For

Then we call circuits _{k}.

Although Strings 2 and 3 represent two different circuits, they are exactly the same in the logical value test, and both represent the circuits in expression (2). According to Def 2, Strings 2 and 3 are FSVIC in logical values. However, it is always possible to find a certain feature (such as current) that makes Strings 2 and 3 PSVIC. Therefore, Strings 2 and 3 are considered as PSVIC.

A formal definition of a Trojan circuit is as follows:

If the circuits

In fact,

For the input domain

There are two sets of

For

For

The smaller the scale of

In order to make set

• Feature Value 1:

In digital logic terms, the description of this circuit is:

• Feature Value 2:

In terms of voltage, the description of this circuit is:

where $V_{SH}$ represents the higher threshold and lower potential limit of “1” in the circuit, and $V_{CES}$ represents a saturation voltage drop of the triode.

• Feature Value 3:

In terms of current, the description of this circuit is:

where

In addition, there are a variety of other bypass information detection items, such as delay and spectrum.

With any single one of the above three feature values, the circuit structure cannot be determined. However, it is possible to determine the circuit structure if the three features are combined.

In GEP, an operator represents only one computation. A GEP individual therefore describes only one feature value and corresponds to an unlimited number of isomorphic circuits, which makes it impossible to determine the actual circuit structure.

The MF-GEP proposed in this paper combines the test results of multiple feature values on a circuit into a single function representation, which integrates them into a composite function,

Detection values are included in a function. The input comes from multiple feature values and the output is a vector, like the Not-gate circuit. In GEP’s ET, it is still represented by “

where _{k} _{k} _{k}_{k}.

However, in an electronic component, each feature value should have a different importance. So, each member of this vector should be multiplied by a weight:

For normalization, it can be specified as

For example, corresponding to this Not gate circuit, the symbol Not indicates the following meaning:

where _{A} represents the input voltage of point _{A} represents the input current of point

Therefore, when using GEP evolution, a single symbol
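
As a rough illustration of one operator symbol carrying several feature values at once, the sketch below models a Not gate whose output is a vector of logic value, voltage, and current. The electrical constants, the additive current model, and the names `Signal`, `make_input`, `mf_not`, and `weighted` are all assumptions made for this example; the paper's own voltage and current expressions are not reproduced here.

```python
from typing import NamedTuple

# Illustrative constants only; none of these values come from the paper.
V_HIGH = 5.0   # assumed output voltage for logic 1
V_LOW = 0.2    # assumed output voltage for logic 0
I_GATE = 0.5   # assumed supply current drawn by one gate

class Signal(NamedTuple):
    logic: int
    voltage: float
    current: float   # current accumulated by the gates that produced this signal

def make_input(bit):
    """Wrap a primary input; inputs are assumed to draw no gate current."""
    return Signal(bit, V_HIGH if bit else V_LOW, 0.0)

def mf_not(x):
    """Mixed-feature Not: one symbol that updates all three features together."""
    out = 1 - x.logic
    return Signal(out, V_HIGH if out else V_LOW, x.current + I_GATE)

def weighted(signal, weights):
    """Multiply each feature of the output vector by its weight."""
    return [w * v for w, v in zip(weights, signal)]

single = mf_not(make_input(1))
double = mf_not(mf_not(make_input(1)))   # same logic value as the input itself
```

Under these assumptions, a doubly inverted signal has the same logic value and voltage as the original input but a different accumulated current, which is the kind of difference a single-feature representation cannot express.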

In this paper, four groups of experiments (Experiments 1–4) were designed initially. Experiment 5 was added later to investigate the new questions raised by Experiments 1–4. Because these are comparison experiments, most of the parameters and the fitness functions are exactly the same across them.

The circuit has multiple inputs and only one output. Experiment 1 used one feature, Experiments 2 and 3 used two features, Experiment 4 used three features, and Experiment 5 used two features. The three features are logical value, voltage value and current value. The GEP parameters of all the experiments are exactly the same.

Parameter | Value |
---|---|
Fitness | = 1 |
Selection mode | Tournament, size = 3 |
Population size | 10000 |
Head length | 20 |
Tail length | 21 |
Chromosome length | 1 |
Mutation rate | 0.05 |
Insert rate | 0.1 |
Root insert rate | 0.01 |
One-point recombination rate | 0.1 |
Two-point recombination rate | 0.1 |
Number of inputs | 4 |
Number of outputs | 1 |
Function set | Not, And, Or |
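
The head and tail lengths in the table are consistent with the usual GEP constraint that the tail must be long enough to supply inputs for any head, that is, tail = head × (maximum arity − 1) + 1. A short check, with `tail_length` as a helper name introduced here for illustration:

```python
def tail_length(head_length, max_arity):
    """Smallest tail that guarantees every gene decodes to a complete tree."""
    return head_length * (max_arity - 1) + 1

# Arities of the function set listed in the table.
ARITY = {"Not": 1, "And": 2, "Or": 2}

head = 20
tail = tail_length(head, max(ARITY.values()))
gene_length = head + tail
```

With a head of 20 and two-input gates as the widest function, this gives a tail of 21 and a gene of 41 symbols.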

The feature data include logical data, voltage data, and current data, and each has its own fitness function:

(1) Logical fitness function:

where _{i} is the actual logical value in the test data. The range of _{1} is now discussed as follows:

a) If all _{i}, there is

Then

Then the maximum value of _{1} is 1.

b) Because of the logic value, the worst case is that every _{i}, then every

Then

Then the minimum value of _{1} is 0.

c) Suppose there are two GEP individuals, _{1} and _{2}. For the same set of test data,

For _{1}, there are _{i..}

For _{2}, there are _{i.}

For _{1}, a rearrangement of (_{i}) leads to _{i} when

For _{2}, a rearrangement of (_{i}) leads to _{i} when

Because _{1}_{1}_{1}_{2}

Therefore, the conclusion can be drawn as follows:

The range of _{1} is in

With the improvement of the matching degree between the calculated value _{i}, the value of _{1} increases monotonically.
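
One simple function with the properties derived above (a range of 0 to 1, a value of 1 when every output matches, a value of 0 when none matches, and a value that grows with the number of matches) is the fraction of matching outputs. The sketch below uses that form as an assumption; it is not claimed to be the exact formula used in the paper, and `logic_fitness` is a name introduced here.

```python
def logic_fitness(predicted, actual):
    """Fraction of test vectors on which the evolved circuit's output matches."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

actual = [0, 1, 1, 0]
all_match = logic_fitness([0, 1, 1, 0], actual)
none_match = logic_fitness([1, 0, 0, 1], actual)
three_match = logic_fitness([0, 1, 1, 1], actual)
```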

(2) Voltage fitness function

_{i} is the actual voltage value in the test data. _{CC} _{DD}_{mx} = _{CC} _{DD}_{2} can be defined as

The range of each _{mx}_{2} are discussed as follows:

a) If all _{i}, there is

Then

Then the maximum value of _{2} is 1.

b) The worst case is that every _{i}, which means that every _{mx}

Then the minimum value of _{2} is 0.

c) Suppose there are _{i} in a GEP individual _{i}) leads to _{i} when

Let us define _{A} _{i}_{i}

Then,

Because _{mx}_{mx}

Then

When

Then

d) Suppose there are two GEP individuals, _{1} and _{2}.

For _{1}, _{2}, _{2}_{1}_{2}_{2}

Therefore, we can draw the conclusion that:

The range of _{2} is in

If individual _{2} is more similar to the destination circuit than _{1}, there is _{2}_{2}_{2}_{1}
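
A function with the stated behavior can be sketched as one minus the mean absolute voltage error, normalized by the maximum possible deviation. This particular form, the clipping of each deviation, and the names `voltage_fitness` and `v_max` are assumptions for illustration; the paper's own definition is not reproduced here.

```python
def voltage_fitness(predicted, actual, v_max):
    """1 minus the mean absolute deviation, scaled by the largest deviation v_max."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    if v_max <= 0:
        raise ValueError("v_max must be positive")
    # Each deviation is clipped to v_max so the result stays within [0, 1].
    total = sum(min(abs(p - a), v_max) for p, a in zip(predicted, actual))
    return 1.0 - total / (v_max * len(actual))

actual = [5.0, 0.0]
exact = voltage_fitness([5.0, 0.0], actual, v_max=5.0)
worst = voltage_fitness([0.0, 5.0], actual, v_max=5.0)
closer = voltage_fitness([4.5, 0.5], actual, v_max=5.0)
farther = voltage_fitness([3.0, 2.0], actual, v_max=5.0)
```

Under this assumed form, an individual whose voltages are closer to the measured values always scores higher, which matches the ordering property argued above.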

(3) Current fitness function

_{i} is the actual current value in the test data.

(4) Individual’s fitness

According to the previous description of the algorithm, the individual’s fitness should be a combination of the three fitness functions, and therefore the individual’s fitness is defined as:

_{i} is the weight of corresponding _{i} in final fitness. The sum of _{i} is 1:

If we set the weight vector

then

Its Boolean expression is:

The circuit makes the computation:

After several logic gates are added to the circuit of _{2}

Its Boolean expression becomes:

String 4 describes the corresponding effective gene of

Only a portion of the values can be tested if there are too many pins. The input value used to activate the Trojan _{2} _{2} may be missed at this time. In the following experiment, the input value _{2} will not be provided, and the output will be determined by the evolved circuit.

Four groups of experiments were designed using different provided values and weight vectors. The voltage and current values are meaningful only if the logic values of the circuit are correct in the first place. The individual’s fitness therefore combines several kinds of data, with the logical value given the largest proportion.
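
The effect of the weight vector can be sketched as a weighted sum of the per-feature fitness values, with weights that sum to 1. The weight vectors below are those listed for Experiments 1 to 4; the per-feature scores of the two example individuals and the name `combined_fitness` are hypothetical.

```python
import math

def combined_fitness(feature_fitness, weights):
    """Weighted sum of (logic, voltage, current) fitness values."""
    if len(feature_fitness) != len(weights):
        raise ValueError("one weight is required per feature")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(c * f for c, f in zip(weights, feature_fitness))

# Weight vectors C used in Experiments 1 to 4.
WEIGHTS = {
    1: [1, 0, 0],
    2: [0.8, 0.2, 0],
    3: [0.8, 0, 0.2],
    4: [0.6, 0.2, 0.2],
}

# Two hypothetical individuals: same logic and voltage match, different current match.
with_trojan = (1.0, 1.0, 1.0)
without_trojan = (1.0, 1.0, 0.4)

scores = {
    exp: (combined_fitness(with_trojan, c), combined_fitness(without_trojan, c))
    for exp, c in WEIGHTS.items()
}
```

In this toy setting, the two individuals tie under the weight vectors of Experiments 1 and 2 and are separated only when the current feature has a nonzero weight.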

Parameter | Value |
---|---|
Logic gate | And, Or, Not |
Values provided | Logic values |
Weight vector | C = [1, 0, 0] |
Number of runs | 100 |
Trojans discovered | 0 |

The simplified Boolean expression of all the circuits above is:

Parameter | Value |
---|---|
Logic gate | And, Or, Not |
Values provided | Logic values, voltage values |
Weight vector | C = [0.8, 0.2, 0] |
Number of runs | 100 |
Trojans discovered | 0 |

The simplified Boolean expression of all the circuits above is:

A Trojan still cannot be found, although both the logical value and the voltage value were provided. The reason is that in digital circuits, the logic value is expressed in the form of voltage values. For example, a voltage value less than

Parameter | Value |
---|---|
Logic gate | And, Or, Not |
Values provided | Logic values, current values |
Weight vector | C = [0.8, 0, 0.2] |
Number of runs | 100 |
Trojans discovered | 72 |

The simplified Boolean expression of both

The equivalent circuit has been discovered, although the original circuit was hidden.

Parameter | Value |
---|---|
Logic gate | And, Or, Not |
Values provided | Logic values, voltage values, current values |
Weight vector | C = [0.6, 0.2, 0.2] |
Number of runs | 100 |
Trojans discovered | 67 |

Its simplified Boolean expression is:

A circuit equivalent to the original circuit was found, even though the original circuit was hidden.

In this group of experiments, failed evolution runs produced some incorrect circuits. As shown in

Although three feature values were used in this group of experiments, the efficiency in discovering the Trojan was similar to that in Experiment 3, which used only two feature values. This is because the logical value itself is expressed in the form of voltage values, so the three feature values are effectively equivalent to two.

As discussed in Experiments 2 and 4, in digital circuits, the logical value is expressed in the form of voltage value,

Parameter | Value |
---|---|
Logic gate | And, Or, Not |
Values provided | Voltage values, current values |
Weight vector | C = [0, 0.8, 0.2] |
Number of runs | 100 |
Trojans discovered | 53 |

In 53 of the 100 runs, circuits equivalent to the Trojan circuit were obtained, which is no improvement over the results of Experiments 3 and 4. In addition, several false circuits were found, as shown in

Experiment 5 shows a lower probability of detecting Trojan circuits than Experiment 3. Nevertheless, the result is acceptable considering the randomness of evolutionary computation. The results indicate that voltage data and logical value data produce similar results in the mixed-feature GEP algorithm. This supports the hypothesis from Experiments 2 and 4 that two feature values are interchangeable in the algorithm if there is a simple correlation between them. Therefore, when it is uncertain whether such a simple and direct correlation exists, maximizing the variance between feature values helps to improve the effectiveness of the algorithm.

This paper proposes a mixed-feature GEP (MF-GEP) algorithm in which multiple feature values are fused into the same operator. MF-GEP, which builds on GEP's ability to discover mathematical formulas automatically through evolution, can detect Trojan circuits with a certain probability. The fewer features that are used, the higher the efficiency of the GEP evolution, but the further the result departs from the real circuit. Conversely, as the number of features increases, the efficiency of GEP evolution decreases, but the result gets closer to the real circuit. However, if there is a direct conversion relationship between the feature values used, these values effectively count as one feature, and the accuracy of MF-GEP evolution does not increase.
