#### 1. Introduction

As data rates rise to 50 Gb/s and beyond, traditional non-return-to-zero (NRZ) equalizers face growing challenges from the high insertion loss of backplane channels in high-speed electrical interconnects such as 400 Gb/s Ethernet (400 GbE), InfiniBand and OIF-CEI [1]. To alleviate this problem, 4-level pulse amplitude modulation (PAM4) has been standardized for the 400 GbE physical layer over backplanes. PAM4 constrains the effective signal bandwidth [2], which in turn lowers the channel insertion loss seen by the signal. In a 400 GbE PAM4 electrical link, forward error correction (FEC) [3], *e.g.* Reed-Solomon RS(544,514), is also deployed to further improve the bit error rate (BER) and to compensate for the inherent SNR loss of about 9.5 dB relative to NRZ. However, error propagation in the receiver's decision feedback equalizer (DFE) can easily produce long burst errors that exceed the error correction capability of the FEC code and thereby degrade FEC performance [4].

To handle error propagation, several studies have focused on estimating the probability of burst errors of various lengths with the aim of achieving the desired BER. For example, Dong [5] presented a recursive analytical model for a PAM4 link that evaluates the probability of burst errors of various lengths and the resulting FEC performance. Because the model simplifies the error patterns to those occurring only between neighboring levels, it cannot accurately calculate the probability of all possible symbol error patterns induced by single or multiple symbol errors. Other work modifies the link system to shorten the burst errors. Lu [6] investigated the effect of pre-coding on FEC performance under DFE error propagation through a simple Monte-Carlo model and demonstrated that pre-coding improves the performance margin under certain conditions. Zhang [7] extended this analysis by simulating the impact of pre-coding case by case for various DFE configurations, *e.g.* 1-tap or multi-tap DFE, and concluded that pre-coding is not always effective for a multi-tap DFE architecture in a PAM4 link and can even worsen FEC performance. Alternative methods such as FEC codeword interleaving and bit multiplexing, which can be implemented in the physical media attachment (PMA) sub-layer, improve FEC performance by breaking long burst errors into short ones [8].

Although DFE error propagation in NRZ systems has been studied in depth [9], its effect on PAM4 needs further exploration because of the reduced noise margin of PAM4. This paper focuses on mitigating DFE error propagation in PAM4 systems. Using an analytic method to estimate the burst run length probability, the BER performance is evaluated for a typical TX FFE \(+\) RX DFE \(+\) FEC configuration. In addition, to further improve BER performance by breaking a long burst error into several short ones, effective interleaving schemes combined with the FEC code are investigated for the PAM4 electrical link, and the performance improvement is evaluated both by simulation and by theoretical analysis.

#### 2. PAM4 Link System

##### 2.1 Architecture

Before estimating the effect of DFE error propagation, we first describe a typical high-speed PAM4 link system [10], [11]. As shown in Fig. 1, it consists of a transmitter (TX), a receiver (RX) and a lossy channel. At the transmitter, two lanes of 25 Gb/s pseudo-random binary sequence (PRBS31) data are generated and then shaped by a feed-forward equalizer (FFE) to compensate for the high-frequency attenuation caused by the channel. The two equalized 25 Gb/s lanes are then combined into one 50 Gb/s PAM4 lane and transmitted over the channel. A PAM4 signal uses four levels, *e.g.* 1, 1/3, \(-1/3\) and \(-1\), to represent 11, 10, 01 and 00, respectively. In the receiver, a DFE with two parallel feedback paths combined with a PAM4 decoder is therefore employed to cancel the post-cursor inter-symbol interference (ISI). In addition, to emulate actual link performance, impairments such as the device package (pkg) and crosstalk are added to the link model in the form of S-parameters [12], [13].

##### 2.2 DFE for PAM4 System

Figure 2 shows a typical DFE for a PAM4 system [14]. Three separate slicers with different threshold voltages (\(+V_{\mathrm{th}}\), 0, \(-V_{\mathrm{th}}\)) generate a 3-bit thermometer code (\(d_{\mathrm{U}}\), \(d_{\mathrm{M}}\), \(d_{\mathrm{B}}\)), which a decoder converts to two NRZ signals (\(d_{\mathrm{MSB}}\), \(d_{\mathrm{LSB}}\)). The parallel feedback structure is designed to remove as much ISI as possible. The PAM4 tap coefficients \(c_{i}'\) are generated from the coefficients \(c_{i}\) (\(i=1\)\({\sim}\)*k*), which can be obtained with an approach similar to that used in NRZ systems.

The adder output is then:

\[\begin{equation*} \begin{aligned} y(n) &= x(n)-\sum_{i=1}^k c_{i}' d(n-i) \\ &= x(n)-\sum_{i=1}^k \left( \frac{2c_{i}}{3} \times d (n-i)_{\mathrm{MSB}} +\frac{c_{i}}{3} \times d (n-i)_{\mathrm{LSB}} \right) \end{aligned} \tag{1} \end{equation*}\]

where \(x(n)\) and \(y(n)\) denote the *n*-th input and output of the adder, and \(d(n-i)\) denotes the (\(n-i\))-th decoded signal. The term \(\sum\limits_{i=1}^k c_{i}' d(n-i)\) is the compensation value provided by the feedback filter.

From Eq. (1), the correction applied to \(x(n)\), \(\sum\limits_{i=1}^k c_{i}' d(n-i)\), depends not only on the number of taps *k* but also on the values of the tap coefficients. If either or both bits of a previous symbol, *i.e.* \(d(n-i)_{\mathrm{MSB}}\) and \(d(n-i)_{\mathrm{LSB}}\), are decided incorrectly, one or even both bits of the current symbol may be decided in error with a certain probability. This is the basic mechanism of DFE error propagation for both PAM4 and NRZ. From this point of view, decreasing the number of taps, the magnitude of the tap coefficients, or both helps mitigate error propagation.
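As an illustration only, Eq. (1) and the slicer decoding of Fig. 2 can be sketched in Python. The function names, the idle initial feedback state and the slicer threshold \(V_{\mathrm{th}}=2/3\) (the midpoint between adjacent levels when the outer level is 1) are assumptions of this sketch and are not taken from the simulated link.

```python
from typing import List, Sequence, Tuple


def bits_to_level(msb: int, lsb: int) -> float:
    """Map (MSB, LSB) in {0, 1} to a PAM4 level: 11->1, 10->1/3, 01->-1/3, 00->-1."""
    d_msb = 2 * msb - 1  # antipodal value of the MSB, -1 or +1
    d_lsb = 2 * lsb - 1  # antipodal value of the LSB, -1 or +1
    return (2.0 * d_msb + d_lsb) / 3.0


def slice_pam4(y: float, v_th: float = 2.0 / 3.0) -> Tuple[int, int]:
    """Three slicers at +v_th, 0, -v_th give (d_U, d_M, d_B); decode to (MSB, LSB)."""
    d_u, d_m, d_b = int(y > v_th), int(y > 0.0), int(y > -v_th)
    msb = d_m
    lsb = d_u if d_m else d_b
    return msb, lsb


def dfe_pam4(x: Sequence[float], c: Sequence[float]) -> List[Tuple[int, int]]:
    """Apply Eq. (1) sample by sample and return the decided (MSB, LSB) pairs."""
    history = [0.0] * len(c)  # decided levels d(n-1)..d(n-k); assumed idle at start
    decisions = []
    for sample in x:
        # c_i' d(n-i) = (2 c_i / 3) d_MSB + (c_i / 3) d_LSB = c_i * decided level
        feedback = sum(ci * level for ci, level in zip(c, history))
        y = sample - feedback
        msb, lsb = slice_pam4(y)
        decisions.append((msb, lsb))
        # A wrong decision stored here corrupts the next k feedback values,
        # which is the error propagation mechanism discussed in the text.
        history = ([bits_to_level(msb, lsb)] + history)[: len(c)]
    return decisions
```

With noiseless samples whose post-cursor ISI equals the tap vector `c`, the sketch returns the transmitted bit pairs unchanged. Flipping one stored decision by hand shows how the error is fed into the following symbols.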

However, PAM4 DFE and NRZ DFE differ in two respects. First, in an NRZ DFE the correction applied to \(x(n)\) depends on only one bit per previous symbol, whereas in a PAM4 DFE it depends on the two bits \(d_{\mathrm{MSB}}\) and \(d_{\mathrm{LSB}}\) of each previous symbol, which makes error propagation more complex. Second, a bit error in a previous symbol is more likely to affect later symbols in PAM4 than in NRZ, because PAM4 has a lower noise margin for the same outer received symbol level. As shown in Fig. 3, if *h* denotes this outer level amplitude, the minimum distance between two adjacent PAM4 levels, \(d_{\min,\mathrm{PAM4}}\), is one third of that for NRZ, *i.e.* \(d_{\min,\mathrm{PAM4}}=(1/3)\times d_{\min,\mathrm{NRZ}}\).

##### 2.3 Using TX FFE for PAM4 System

It is known that for NRZ the DFE error propagation caused by the feedback filter can be reduced effectively by decreasing the tap magnitudes and the number of taps, at the cost of equalization performance [15]. To maintain BER performance, an FFE with appropriate post-cursor taps can be deployed in the PAM4 system to limit the magnitude and number of DFE taps, as shown in Fig. 4. The only difference between the PAM4 FFE and the NRZ FFE is the additional combiners, *i.e.* combiner II with \(\times\) 2 weight and combiner I with \(\times\) 1 weight, which convert the two equalized NRZ signals into one PAM4 signal.

Figure 5 compares the equalization performance of an RX 5-tap DFE combined with a TX 2-tap FFE and with a TX 3-tap FFE. The 3-tap FFE structure yields larger horizontal and vertical eye openings at both the DFE input and output than the 2-tap FFE structure. This illustrates that an FFE with a post-cursor tap, *e.g.* \(a_{1}\), can reduce the DFE tap coefficients and thus improve the voltage margin effectively without degrading the overall link performance.

#### 3. DFE Error Propagation

##### 3.1 Estimation of Burst Error Run Length

Unlike an NRZ system, in which burst errors due to DFE error propagation occur bit by bit, burst errors in PAM4 occur symbol by symbol. Before analyzing the burst length distribution, defined as the cumulative probability distribution of symbol burst errors as a function of burst length, we give the signal-to-noise ratio (\(\mathit{SNR}\)) based analytic models for the symbol error rate of PAM4 and the bit error rate of NRZ [16]:

\[\begin{equation*} \left\{\begin{aligned} & P_{\mathit{err}} = P_{s} \approx \frac{3}{4} \mathit{erfc} \left(\frac{\sqrt{\mathit{SNR}_{\mathrm{PAM4}}}} {2\sqrt{2}} \right) = \frac{3}{4} \mathit{erfc} \left(\frac{h}{3\sqrt{2}\sigma} \right) \\ &\hskip70mm \mathit{for}\quad \mbox{PAM4} \\ & P_{\mathit{err}} = P_{b} \approx \frac{1}{2} \mathit{erfc} \left(\frac{\sqrt{\mathit{SNR}_{\mathrm{NRZ}}}} {2\sqrt{2}} \right) =\frac{1}{2} \mathit{erfc} \left(\frac{h}{\sqrt{2}\sigma} \right) \\ & \hskip70mm\mathit{for}\quad \mbox{NRZ} \end{aligned}\right. \tag{2} \end{equation*}\]

where *h* is the outer level amplitude mentioned above and \(\sigma\) is the standard deviation of error caused by noise.
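A minimal numerical sketch of Eq. (2) is given below. The function names are illustrative, and the last helper only restates the one-third minimum-distance ratio of Section 2.2 in decibels, which gives the SNR loss of about 9.5 dB quoted in the Introduction.

```python
import math


def ser_pam4(h: float, sigma: float) -> float:
    """PAM4 symbol error rate from Eq. (2), outer level amplitude h, noise sigma."""
    return 0.75 * math.erfc(h / (3.0 * math.sqrt(2.0) * sigma))


def ber_nrz(h: float, sigma: float) -> float:
    """NRZ bit error rate from Eq. (2) for the same outer level amplitude h."""
    return 0.5 * math.erfc(h / (math.sqrt(2.0) * sigma))


def pam4_snr_penalty_db() -> float:
    """Penalty of a minimum distance reduced to one third: 20*log10(3) dB."""
    return 20.0 * math.log10(3.0)


if __name__ == "__main__":
    h, sigma = 1.0, 0.05  # illustrative values only
    print(f"PAM4 SER = {ser_pam4(h, sigma):.3e}")
    print(f"NRZ  BER = {ber_nrz(h, sigma):.3e}")
    print(f"SNR penalty = {pam4_snr_penalty_db():.2f} dB")
```

For equal \(h\) and \(\sigma\) the PAM4 error rate is far higher than the NRZ one. Raising the PAM4 amplitude to \(3h\) restores the NRZ argument of the erfc and leaves only the factor 3/2 from the larger number of decision boundaries.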

Next, let \(p(e_{i} | E)\) be the probability that the *i*-th symbol is detected in error after error pattern *E* has occurred. Then \(p(\mathit{brl}=l)\), the probability of a burst error with run length *l*, can be derived from \(p(e_{i} | E)\). Take \(p(\mathit{brl}=3)\) as an example (see Fig. 6). This case includes two error patterns: \(E_{3,1}=\{1,0,1\}\), in which the first and third symbols are in error, and \(E_{3,2}=\{1,1,1\}\), in which all three symbols are in error. The probability of \(\mathit{brl}=3\) is the sum of the two pattern probabilities:

\[\begin{equation*} \begin{aligned} &p (\mathit{brl}=3) = \sum_{j=1}^2 p(\mathit{brl}=3, E_{3,j}) \\ &\!\quad{}= p (\mathit{brl}=3, E_{3,1} = \{1, 0, 1\}) + p (\mathit{brl}=3, E_{3,2} = \{1, 1, 1\}) \\ &\!\quad{} = (1-p (e_{2} | \{e_{1}\} = \{1\})) \cdot p (e_{3} | \{e_{1}, e_{2} \} = \{1,0\}) \\ &\! \qquad\hphantom{={}} \cdot \prod_{i=4}^{\mathit{brl}_{\max} + 3} (1 - p(e_{i} | \{e_{1}, e_{2}, e_{3} \} = \{1, 0, 1\})) \\ &\! \quad \hphantom{={}} {} + p (e_{2} | \{e_{1} \} = \{1\}) \cdot p (e_{3} | \{e_{1}, e_{2} \} = \{1,1\}) \\ &\! \qquad\hphantom{={}} \cdot \prod_{i=4}^{\mathit{brl}_{\max} + 3} (1-p (e_{i} | \{e_{1}, e_{2}, e_{3} \} = \{1, 1, 1\})) \end{aligned} \tag{3} \end{equation*}\]

where \(p(e_{2} | \{e_{1}\} = \{1\})\) is the probability that the 2nd symbol is in error given that the first symbol is wrong, \(p(e_{3} | \{e_{1}, e_{2}\} = \{1,1\})\) is the probability that the 3rd symbol is in error given that the first and second symbols are wrong, and \(\prod\limits_{i=4}^{\mathit{brl}_{\max} +3} (1-p(e_{i} | \{e_{1}, e_{2}, e_{3} \} = \{1, 0, 1\}))\) is the probability that every symbol *i* (\(4\le i \le \mathit{brl}_{\max}+3\)) is either correct or in error for a reason other than the given error pattern \(E= \{ e_{1}, e_{2}, e_{3}\} = \{1,0,1\}\). \(\mathit{brl}_{\max}\) is the maximum burst run length. (The detailed derivation can be found in [9].)

In general, a burst of run length *l* contains \(2^{l-2}\) symbol error patterns, so \(p(\mathit{brl}=l)\) is given by:

\[\begin{equation*} \begin{aligned} p(\mathit{brl}=l) &= \sum_{j=1}^{2^{l-2}} p(\mathit{brl}=l, E_{l,j}) \\ &= \sum_{j=1}^{2^{l-2}} \prod_{i=2}^l p(e_{i}^{l,j}) \cdot \prod_{i=l+1}^{\mathit{brl}_{\max} +l} (1-p(e_{i} | E_{l,j})) \\ \end{aligned} \tag{4} \end{equation*}\]

where \(E_{l,j}\) is the *j*-th error pattern for \(\mathit{brl}=l\) (\(1\le l \le \mathit{brl}_{\max}\)), \(p(e_{i}^{l,j} )\), given in Eq. (5), is the probability that the *i*-th symbol takes the state (correct or in error) specified by pattern \(E_{l,j}\), and \(\prod\limits_{i=l+1}^{\mathit{brl}_{\max} +l} (1-p(e_{i} | E_{l,j}))\) has the same meaning as \(\prod\limits_{i=4}^{\mathit{brl}_{\max} + 3} (1-p(e_{i} | \{ e_{1}, e_{2}, e_{3} \} = \{1, 0, 1\}))\) in Eq. (3).

\[\begin{equation*} p(e_{i}^{l,j}) =\left\{\begin{aligned} &1 - p (e_{i} | \{ e_{1}, e_{2}, \ldots, e_{i-1} \}), & \mathit{if}\ e_{i}^{l,j} =0 \\[.5mm] & p (e_{i} | \{ e_{1}, e_{2}, \ldots, e_{i-1} \}), & \mathit{if}\ e_{i}^{l,j} =1 \\ \end{aligned}\right. \tag{5} \end{equation*}\]

With Eqs. (4) and (5), the probability of each burst error run length can be evaluated.
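The enumeration in Eqs. (4) and (5) can be sketched as follows. The conditional probability \(p(e_{i} | E)\) is passed in as a callable; `toy_cond_prob` is a hypothetical two-tap stand-in used only to make the sketch runnable and is not the model extracted from channels A, B or C. Treating the tail symbols as conditioned on the pattern followed by correct symbols is also an assumption of this sketch.

```python
from itertools import product
from typing import Callable, Tuple

History = Tuple[int, ...]  # error indicators e_1, e_2, ... (1 = symbol in error)


def burst_prob(l: int, brl_max: int, cond_prob: Callable[[History], float]) -> float:
    """p(brl = l) per Eq. (4): sum over all patterns that start and end in error."""
    total = 0.0
    # The l-2 middle symbols are free, giving 2^(l-2) patterns (one pattern for l <= 2).
    for middle in product((0, 1), repeat=max(l - 2, 0)):
        pattern = (1,) if l == 1 else (1, *middle, 1)
        prob = 1.0
        # Eq. (5): symbols 2..l take the state prescribed by the pattern.
        for i in range(1, l):
            p_err = cond_prob(pattern[:i])
            prob *= p_err if pattern[i] else 1.0 - p_err
        # Tail product of Eq. (4): the next brl_max symbols are not hit.
        history = pattern
        for _ in range(brl_max):
            prob *= 1.0 - cond_prob(history)
            history += (0,)
        total += prob
    return total


def toy_cond_prob(history: History) -> float:
    """Hypothetical model: 0.3 if the last symbol was wrong, plus 0.1 for the one before."""
    p = 0.0
    if len(history) >= 1 and history[-1]:
        p += 0.3
    if len(history) >= 2 and history[-2]:
        p += 0.1
    return p


if __name__ == "__main__":
    for run_length in range(1, 6):
        print(run_length, burst_prob(run_length, brl_max=8, cond_prob=toy_cond_prob))
```

Because the number of patterns doubles with each additional symbol, a direct enumeration of this kind is practical only for moderate run lengths.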

##### 3.2 Simulation of Burst Error Run Length

The distribution of burst error run length for the PAM4 system is simulated with the analytic model above for three backplane channels, A, B and C. Figure 7 depicts the channel frequency responses together with the near-end and far-end crosstalk. Table 1 lists the insertion losses at a 50 Gb/s data rate. For example, the insertion loss of channel C is about 11.6 dB at 12.5 GHz, the Nyquist frequency for PAM4, whereas it reaches 26.9 dB at 25 GHz, the Nyquist frequency for NRZ.

Table 2 lists part of the simulation results for TX 2-tap FFE \(+\) RX 5-tap DFE, and Fig. 8 plots the probabilities of different \(\mathit{brl}\) values under two equalization configurations: TX 2-tap FFE \(+\) RX 5-tap DFE and TX 3-tap FFE \(+\) RX 5-tap DFE. Table 2 shows that a random error may propagate into a short or long burst error with a certain probability. For channel A, for example, \(p(e_{2} | e_{1})\), \(p(e_{3} | e_{1})\) and \(p(e_{4} | e_{1})\), the probabilities that the second, third and fourth symbols are in error because of the first random symbol error, are 2.889e-1, 1.352e-1 and 6.636e-2, respectively, decreasing with the distance from the initial error. Furthermore, all the probabilities decrease significantly as \(\mathit{brl}\) increases, for both the 2-tap and the 3-tap FFE. In addition, the probability with the 3-tap FFE is lower than that with the 2-tap FFE for the same \(\mathit{brl}\), illustrating that increasing the number and/or magnitude of FFE post-cursor taps effectively mitigates DFE error propagation.

##### 3.3 Effect of Different Equalizer Configurations on BER

The discussion above shows that an FFE with a post-cursor tap reduces the probability of burst errors because it lowers the DFE tap coefficients. To verify this, a theoretical analysis is performed, and the analytical and simulated impacts of different equalizer configurations on BER are compared below.

First, we give the symbol error rate (\(\mathit{SER}\)) considering DFE error propagation according to the BER calculation of NRZ system [9]:

\[\begin{equation*} \mathit{SER} = \frac{\sum\limits_{w=1}^\infty p (W(E)=w)\cdot W(E) }{n} \tag{6} \end{equation*}\]

where *n* is the number of symbols in a block, \(p(W(E)=w)\) is the probability that a total of *w* symbols are in error among the *n* symbols, and \(W(E)\) is the weight of error pattern *E*. Furthermore, \(p(W(E)=w)\) can be approximated by the probability of a single burst error with *w* symbols in error as follows (see Appendix A):

\[\begin{equation*} \begin{aligned} & p (W(E)=w) \approx p (\mbox{burst error of $w$ symbols}) \\ &\quad{} \approx \sum_{l=w}^{\mathit{brl}_{\max}} p (\mathit{brl}=l, W(E)=w) \\ &\quad{} = \sum_{l=w}^{\mathit{brl}_{\max}} n \cdot p \cdot \left(\sum_{j, W(E_{l,j})=w} p(\mathit{brl}=l, E_{l,j})\right) \cdot (1-p)^{n-\mathit{brl}_{\max} -l} \\ \end{aligned} \tag{7} \end{equation*}\] |

where \(p(\mathit{brl}=l, W(E)=w)\) is the probability that error pattern *E* with weight of *w* happens in the burst error run length *l*, *p* is the random symbol error rate due to the channel loss and noise (see Eq. (2)), and \(\sum\limits_{j,W(E_{l,j} )=w} p(\mathit{brl}=l, E_{l,j})\) is the probability of \(\mathit{brl}=l\) for all error patterns \(E_{l,j}\) with weight of *w*.

Then, substituting Eq. (7) into Eq. (6) and rearranging gives the \(\mathit{SER}\) as follows (see Appendix B for the detailed derivation):

\[\begin{equation*} \mathit{SER} = \sum_{l=1}^{\mathit{brl}_{\max}} \sum_{\mathit{all}\ E} p(\mathit{brl}=l, E) \cdot W(E) \cdot p \cdot (1-p)^{n-\mathit{brl}_{\max} -l} \tag{8} \end{equation*}\] |

Finally, because of the linear coding, the \(\mathit{BER}\) of the PAM4 system can be obtained from the \(\mathit{SER}\) [17] and approximated as:

\[\begin{equation*} \begin{aligned} \mathit{BER} & \approx d_{\mathrm{avg}} \cdot \frac{\mathit{SER}}{\log_{2} M} = \left(2 - \frac{\log_{2} M}{M-1} \right) \cdot \frac{\mathit{SER}}{\log_{2} M} \\ & = \frac{2}{3} \cdot \sum_{l=1}^{\mathit{brl}_{\max}} \! \sum_{\mathit{all}\ E} p (\mathit{brl}=l, E) \cdot W(E) \cdot p \cdot (1-p)^{n-\mathit{brl}_{\max} -l} \end{aligned} \tag{9} \end{equation*}\] |

where \(d_{\mathrm{avg}}\) denotes the average Hamming distance and *M* is the number of levels, with \(M=4\) for PAM4. For NRZ, \(M=2\) and \(d_{\mathrm{avg}}=1\), so \(\mathit{BER} = \mathit{SER}\).
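The scaling factor in Eq. (9) can be checked numerically with the short sketch below; the function names are illustrative.

```python
import math


def avg_hamming_distance(levels: int) -> float:
    """d_avg = 2 - log2(M) / (M - 1), as used in Eq. (9)."""
    return 2.0 - math.log2(levels) / (levels - 1)


def ber_from_ser(ser: float, levels: int = 4) -> float:
    """BER ~= d_avg * SER / log2(M); the factor is 2/3 for PAM4 and 1 for NRZ."""
    return avg_hamming_distance(levels) * ser / math.log2(levels)


if __name__ == "__main__":
    print(ber_from_ser(3e-6, levels=4))  # PAM4: two thirds of the SER
    print(ber_from_ser(3e-6, levels=2))  # NRZ: equal to the SER
```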

Figure 9 compares the analytical results based on Eq. (9) with the simulation results for the 2-tap/3-tap FFE\(+\)5-tap DFE configurations. The theoretical results agree well with the simulation results for both configurations. In addition, the 3-tap FFE configuration with a post-cursor tap achieves better BER performance than the 2-tap FFE without a post-cursor tap, indicating that a suitable FFE configuration can mitigate DFE error propagation effectively.

##### 3.4 Effect of Error Propagation on BER with FEC Coding

The analysis above shows that DFE error propagation under different equalizer configurations produces burst errors of different run lengths and thus affects BER performance. Moreover, [9] shows that RS(544,514), which can correct a single burst error of up to 140 bits and provides a burst coding gain of 6.64 dB at a BER of \(10^{-15}\), can reduce the effect of DFE error propagation in an NRZ system. It is therefore necessary to investigate the performance improvement when FEC is applied to a PAM4 system.

Figure 10 gives the BER simulation results without and with FEC for the PAM4 system of Fig. 1. For random errors, RS(544,514) provides a coding gain of 5.2 dB at a BER of \(10^{-7}\) for the link with the 2-tap TX FFE, and a coding gain of 4.56 dB for the 3-tap TX FFE structure, which has better BER performance. Compared with random errors, however, DFE error propagation degrades the link performance further even with FEC deployed, with a loss of 0.56 dB for the 2-tap FFE and 0.36 dB for the 3-tap FFE. Furthermore, it is worth noting that the 3-tap FFE structure gains 0.55 dB at a BER of \(10^{-7}\) over the 2-tap FFE. This demonstrates again that an FFE with a post-cursor tap can mitigate DFE error propagation and improve link performance.

##### 3.5 Performance Improvement Using FEC Interleaving

Like an NRZ system, a PAM4 system can employ pre-interleaving and bit multiplexing, which break long burst errors into short ones, to enhance performance further [18]. Figure 11 shows a block diagram of the 400 GbE physical layer for the chip-to-module case [2], in which interleaving and/or multiplexing is realized in the host chip and the host chip communicates with the optical module over an 8-lane electrical link, *i.e.* 400GAUI-8.

In an NRZ system, either bit or symbol pre-interleaving can be realized without considering the LSB and MSB of an output symbol. In PAM4, however, the LSB and MSB of a PAM4 symbol must be arranged carefully to limit their effect on BER performance. It has been verified that taking the two bits from the same FEC symbol is much better than taking them from different FEC symbols.

In this paper, three schemes are simulated for a total of 4 FEC lanes: non-interleaving, bit interleaving and symbol interleaving, as shown in Fig. 12. Each scheme contains two blocks: the mapping of 4 FEC lanes to 16 sub-lanes, and PAM4 modulation. It is worth noting that the LSB and MSB of each generated PAM4 symbol come from the same FEC symbol in all three schemes, although the interleaving methods differ.

In the first scheme, *i.e.* non-interleaving, every two 10-bit FEC symbols from the same FEC lane are distributed alternately to two given sub-lane pairs, and four 2-bit data units from the 4 FEC lanes are sent into 8 pairs of sub-lanes, *i.e.* 16 sub-lanes. Each 2-bit unit from a sub-lane pair is modulated to one PAM4 symbol, which reduces the effect of an erroneous symbol on BER.

In the second scheme, *i.e.* bit pre-interleaving, each 2-bit data unit from the 4 FEC lanes is distributed alternately to one given sub-lane pair, and every 8 symbols from the same FEC lane are arranged into 8 different sub-lane pairs, *i.e.* 16 sub-lanes. As before, each 2-bit unit from the same FEC symbol is modulated to one PAM4 symbol. This scheme has the disadvantage that a short burst error can easily corrupt multiple FEC symbols. The third scheme, *i.e.* symbol pre-interleaving, solves this problem: each 10-bit FEC symbol from the 4 FEC lanes is distributed alternately to a given sub-lane pair, and 8 symbols from the same FEC lane are arranged into 8 different sub-lane pairs, *i.e.* 16 sub-lanes. This scheme has a larger interleaving depth than bit pre-interleaving. For example, for a 4-symbol PAM4 burst error that occurs around the boundaries (see Fig. 12 (b), (c)), the 4 erroneous symbols hit 4 FEC symbols in the bit pre-interleaving scheme but only 2 in the symbol pre-interleaving scheme. Symbol pre-interleaving therefore outperforms bit pre-interleaving.

Below we examine how the interleaving scheme affects the number of erroneous FEC symbols for a given PAM4 burst error length. Equations (10)\(\sim\)(16) give the number of erroneous symbols and the corresponding probability for the interleaving schemes described above and for the scheme in [19], and the results are compared in Table 3.

For the non-interleaving scheme in Fig. 12 (a), when a burst error of \(\mathit{brl}\) symbols occurs, the number of erroneous symbols and the corresponding probability can be calculated as:

\[\begin{align} &\mathit{error\ symbol\ number} \notag\\ &\quad{}= \left\{ \begin{aligned} &\mathit{ceil} \left(\frac{\mathit{brl}}{m/2}\right)+1, && \mathit{of}\ prob_{1} = \frac{|\mathit{brl}\% (m/2)-1|}{m/2} \\ &\mathit{ceil} \left(\frac{\mathit{brl}}{m/2}\right),\hphantom{+1} && \mathit{of}\ prob_{2} =1-prob_{1} \\ \end{aligned} \right. \tag{10} \end{align}\]

where *m* is the number of bits in an FEC symbol and \(\%\) denotes the modulo operation. For RS(544, 514), \(m=10\), so one FEC symbol contains \(m/2\) PAM4 symbols. Equation (10) gives the number of FEC symbols covered by a burst error of \(\mathit{brl}\) PAM4 symbols, together with its probability.
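Equation (10) can be transcribed directly into a short Python sketch. The function names are illustrative, and exact fractions are used so that the probabilities can be compared without rounding. The sketch follows the formula as written and makes no claim beyond it.

```python
from fractions import Fraction
from typing import Dict


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def fec_symbols_no_interleave(brl: int, m: int = 10) -> Dict[int, Fraction]:
    """Eq. (10): distribution of the number of erroneous FEC symbols.

    brl is the burst length in PAM4 symbols and m the number of bits per
    FEC symbol, so one FEC symbol spans m/2 PAM4 symbols.
    """
    span = m // 2
    base = ceil_div(brl, span)
    prob1 = Fraction(abs(brl % span - 1), span)  # burst straddles one more boundary
    dist: Dict[int, Fraction] = {}
    if prob1:
        dist[base + 1] = prob1
    if prob1 != 1:
        dist[base] = 1 - prob1
    return dist


if __name__ == "__main__":
    for burst in (1, 3, 6):
        print(burst, fec_symbols_no_interleave(burst))
```

For \(\mathit{brl}=6\) and \(m=10\) the sketch returns 2 erroneous FEC symbols with probability 1.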

For bit pre-interleaving in Fig. 12 (b), the length *x* of the shortened burst error on each FEC lane is calculated as follows:

\[\begin{equation*} x=\left\{ \begin{aligned} &\mathit{ceil}\left(\frac{\mathit{brl}}{4}\right), && \mathit{of}\ prob_{1} =\frac{\mathit{brl}\% 4}{4} \\ &\mathit{floor}\left(\frac{\mathit{brl}}{4}\right), && \mathit{of}\ prob_{2} =1-prob_{1} \end{aligned}\right. \tag{11} \end{equation*}\]

Thus the number of erroneous symbols caused by shorter burst on each FEC lane can be calculated as:

\[\begin{align} & \mathit{error\ symbol\ number} \notag\\ &\quad{}=\left\{\begin{aligned} &\mathit{ceil} \left(\frac{x}{m/2}\right)+1, &&\mathit{of}\ prob_{3} =\frac{|x\% (m/2)-1|}{m/2} \\ &\mathit{ceil} \left(\frac{x}{m/2}\right), && \mathit{of}\ prob_{4} =1-prob_{3} \\ \end{aligned}\right. \tag{12} \end{align}\]

Equation (11) gives the length *x* of the shortened burst that results when a burst error of \(\mathit{brl}\) symbols is divided among the 4 FEC lanes, together with the corresponding probability. Equation (12) gives the number of erroneous FEC symbols caused by a shortened burst of length *x*, and its probability.
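Chaining Eqs. (11) and (12) gives the distribution of erroneous FEC symbols per lane for bit pre-interleaving. The sketch below is a direct transcription under the assumption that the two stages combine by multiplying their probabilities; the function names and the default of 4 lanes are illustrative.

```python
from fractions import Fraction
from typing import Dict


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def fec_symbols_bit_interleave(brl: int, m: int = 10, lanes: int = 4) -> Dict[int, Fraction]:
    """Eqs. (11)-(12): erroneous FEC symbols per lane with bit pre-interleaving."""
    span = m // 2  # PAM4 symbols per FEC symbol
    p_ceil = Fraction(brl % lanes, lanes)  # prob_1 of Eq. (11)
    dist: Dict[int, Fraction] = {}
    # Eq. (11): the burst is shortened to ceil(brl/4) or floor(brl/4) symbols.
    for x, p_x in ((ceil_div(brl, lanes), p_ceil), (brl // lanes, 1 - p_ceil)):
        if p_x == 0:
            continue
        # Eq. (12): the shortened burst covers ceil(x/span) or one more FEC symbol.
        prob3 = Fraction(abs(x % span - 1), span)
        for count, p_c in ((ceil_div(x, span) + 1, prob3), (ceil_div(x, span), 1 - prob3)):
            if p_c:
                dist[count] = dist.get(count, Fraction(0)) + p_x * p_c
    return dist


if __name__ == "__main__":
    print(fec_symbols_bit_interleave(6))
```

For \(\mathit{brl}=6\) the sketch yields 2 erroneous FEC symbols with probability 1/10 and 1 with probability 9/10.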

For symbol pre-interleaving in Fig. 12 (c), a burst error of \(\mathit{brl}\) symbols covers *x* or \(x+1\) FEC symbols with a certain probability according to the following equation:

\[\begin{equation*} \begin{aligned} & x+1 = \mathit{ceil}\left(\frac{\mathit{brl}}{m/2} \right)+1, && \mathit{of}\ prob_{1} =\frac{|\mathit{brl}\% (m/2)-1|}{m/2} \\ & x = \mathit{ceil} \left(\frac{\mathit{brl}}{m/2}\right), && \mathit{of}\ prob_{2} =1-prob_{1} \\ \end{aligned} \tag{13} \end{equation*}\]

Then, the number of error symbols on each FEC lane can be calculated as

\[\begin{align} & \mathit{error\ symbol\ number} \notag\\ &\quad = \left\{ \begin{aligned} & \mathit{ceil} \left(\frac{x+1}{4}\right), && \mathit{of}\ prob_{3} =\frac{x\% 4}{4}prob_{2} \\ &&& \hphantom{\mathit{of} prob_{3} ={}}\ +\frac{(x+1)\% 4}{4}prob_{1} \\ & \mathit{floor} \left(\frac{x}{4}\right), && \mathit{of}\ prob_{4} =1-prob_{3} \\ \end{aligned}\right. \tag{14} \end{align}\]

Because of the symbol interleaving, the difference from bit pre-interleaving is that the shortened length is counted in whole FEC symbols. Equation (13) therefore gives the number of FEC symbols covered by a burst error of \(\mathit{brl}\) symbols, and Eq. (14) gives the number of erroneous FEC symbols on each FEC lane caused by these symbols, together with its probability.
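Equations (13) and (14) can be transcribed in the same way. The sketch below follows the two formulas literally, including the way \(prob_{3}\) mixes the two branches of Eq. (13); the function names and the default of 4 lanes are illustrative.

```python
from fractions import Fraction
from typing import Dict


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def fec_symbols_symbol_interleave(brl: int, m: int = 10, lanes: int = 4) -> Dict[int, Fraction]:
    """Eqs. (13)-(14): erroneous FEC symbols per lane with symbol pre-interleaving."""
    span = m // 2  # PAM4 symbols per FEC symbol
    x = ceil_div(brl, span)  # lower branch of Eq. (13)
    prob1 = Fraction(abs(brl % span - 1), span)  # probability of x + 1 FEC symbols
    prob2 = 1 - prob1  # probability of x FEC symbols
    # prob_3 of Eq. (14), weighted over both branches of Eq. (13).
    prob3 = Fraction(x % lanes, lanes) * prob2 + Fraction((x + 1) % lanes, lanes) * prob1
    dist: Dict[int, Fraction] = {}
    for count, p_c in ((ceil_div(x + 1, lanes), prob3), (x // lanes, 1 - prob3)):
        if p_c:
            dist[count] = dist.get(count, Fraction(0)) + p_c
    return dist


if __name__ == "__main__":
    print(fec_symbols_symbol_interleave(6))
```

For \(\mathit{brl}=6\) the result contains no count above 1, that is, at most one erroneous FEC symbol per lane.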

For the interleaving scheme in [19], a burst error of \(\mathit{brl}\) symbols is divided into shorter burst errors of length *x* as follows:

\[\begin{equation*} x=\left\{ \begin{aligned} & \mathit{ceil} \left(\frac{2\times \mathit{brl}}{4}\right), && \mathit{of}\ prob_{1} =\frac{(2\times \mathit{brl})\% 4}{4} \\ & \mathit{floor} \left(\frac{2\times \mathit{brl}}{4}\right), && \mathit{of}\ prob_{2} =1-prob_{1} \\ \end{aligned}\right. \tag{15} \end{equation*}\]

where the factor of 2 for \(\mathit{brl}\) comes from the fact that one error symbol may affect two bits at most in this scheme.

\[\begin{align} & \mathit{error\ symbol\ number} \notag\\ & \quad=\left\{ \begin{aligned} & \mathit{ceil}\left(\frac{x}{m}\right)+1, && \mathit{of}\ prob_{3} =\frac{|x\% m-1|}{m} \\ & \mathit{ceil}\left(\frac{x}{m}\right), && \mathit{of}\ prob_{4} =1-prob_{3} \\ \end{aligned}\right. \tag{16} \end{align}\]

As Table 3 shows, when a burst error of length 6 occurs, for example, it corrupts 2 FEC symbols with 100% probability in the non-interleaving scheme, whereas with bit pre-interleaving it corrupts 2 FEC symbols with a probability of only 10%, half of that for [19]. Symbol pre-interleaving corrupts only one FEC symbol. Symbol pre-interleaving therefore has the best BER performance, while bit pre-interleaving resists burst errors better than [19] but worse than the symbol scheme.
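The probability implied above for the scheme in [19], twice the 10% of bit pre-interleaving, can be reproduced from Eqs. (15) and (16). The sketch below transcribes the two equations under the assumption that their probabilities multiply; the function names are illustrative.

```python
from fractions import Fraction
from typing import Dict


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def fec_symbols_ref19(brl: int, m: int = 10, lanes: int = 4) -> Dict[int, Fraction]:
    """Eqs. (15)-(16): erroneous FEC symbols per lane for the scheme in [19]."""
    bits = 2 * brl  # one erroneous PAM4 symbol may affect two bits at most
    p_ceil = Fraction(bits % lanes, lanes)  # prob_1 of Eq. (15)
    dist: Dict[int, Fraction] = {}
    for x, p_x in ((ceil_div(bits, lanes), p_ceil), (bits // lanes, 1 - p_ceil)):
        if p_x == 0:
            continue
        # Eq. (16): here x is counted in bits, so the FEC symbol spans m bits.
        prob3 = Fraction(abs(x % m - 1), m)
        for count, p_c in ((ceil_div(x, m) + 1, prob3), (ceil_div(x, m), 1 - prob3)):
            if p_c:
                dist[count] = dist.get(count, Fraction(0)) + p_x * p_c
    return dist


if __name__ == "__main__":
    print(fec_symbols_ref19(6))
```

For \(\mathit{brl}=6\) the sketch gives 2 erroneous FEC symbols with probability 1/5, i.e. 20%.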

Figure 13 gives the simulated performance of the three FEC interleaving schemes in this paper and of [19] for the same equalization configuration, *i.e.* TX 3-tap FFE \(+\) RX 5-tap DFE. The interleaving schemes in this paper achieve better BER performance than [19] because they have a larger interleaving depth and the 2 bits of each PAM4 symbol come from the same FEC lane. In addition, symbol pre-interleaving improves performance more than bit pre-interleaving, and a total interleaving gain of 0.52 dB at a BER of \(10^{-7}\) can be obtained with this scheme. This is because a larger interleaving depth helps split a long burst error across different FEC symbols, so that more errors can be corrected after de-interleaving.

Interleaving enhances BER performance at the cost of storage and latency compared with the non-interleaving method. Both interleaving schemes take the same time to buffer 8 FEC symbols before they are read out, so their interleaving latency is almost equal. On the other hand, symbol pre-interleaving needs slightly more memory than bit pre-interleaving, since the former buffers the waiting data in units of FEC symbols and the latter in units of bits. The symbol pre-interleaving scheme therefore offers the better performance tradeoff and can be applied to 400 Gb/s Ethernet.

#### 4. Conclusion

In this paper, the effect of DFE error propagation on BER performance is evaluated for a multi-tap PAM4 DFE with two parallel feedback paths, and the analytical and simulated impacts of different equalizer configurations on BER are compared using an analytical model of the DFE burst error length distribution. FEC bit pre-interleaving and FEC symbol pre-interleaving are employed as mitigation methods, and their impact on PAM4 system performance is studied both by simulation and by theoretical analysis. Simulation results show that symbol pre-interleaving achieves better BER performance than bit pre-interleaving and is preferred for 400 Gb/s Ethernet in terms of the tradeoff between interleaving gain and cost. Future work will focus on the circuit implementation of the symbol pre-interleaving scheme.

#### Acknowledgments

This work was supported by the National Power Grid Corp Science and Technology Project (SGTYHT/17-JS-201).

#### References

[1] P.-C. Chiang, H.-W. Hung, H.-Y. Chu, G.-S. Chen, and J. Lee, “60Gb/s NRZ and PAM4 transmitters for 400GbE in 65nm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, USA, pp.42–43, March 2014.

[2] LAN/MAN Standards Committee, “IEEE standard for ethernet ― Amendment 10: Media access control parameters, physical layers, and management parameters for 200 Gb/s and 400 Gb/s operation,” IEEE P802.3bs, https://ieeexplore.ieee.org/servlet/opac?punumber=8207823, 12 Dec., 2017.

[3] G. Tzimpragos, C. Kachris, I.B. Djordjevic, M. Cvijetic, D. Soudris, and I. Tomkos, “A survey on FEC codes for 100G and beyond optical networks,” IEEE Communications Surveys & Tutorials, vol.18, no.1, pp.209–221, 2016.

[4] L. Tang, W. Gai, L. Shi, X. Xiang, K. Sheng, and A. He, “A 32Gb/s 133mW PAM-4 transceiver with DFE based on adaptive clock phase and threshold voltage in 65nm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, pp.114–116, March 2018.

[5] X.Q. Dong and C.X. Huang, “Improved engineering analysis in FEC system gain for 56G PAM4 applications,” Proc. DesignCon, Santa Clara, USA, Feb. 2018.

[6] Y.C. Lu, H. Wong, D. Tonietto, et al., “DFE error propagation characteristics in real 56Gbps PAM4 high-speed links with pre-coding and impact on the FEC performance,” Proc. DesignCon, Santa Clara, USA, Feb. 2017.

[7] G. Zhang, “Preliminary studies on DFE error propagation, precoding, and their impact on KP4 FEC performance for PAM4 signaling systems,” IEEE 802.3 Interim Meeting, http://www.ieee802.org/3/ck/public/18_09/zhang_3ck_01a_0918.pdf, 2018.

[8] T. Wang, Z. Wang, X. Wang, J. Sun, and A. Ghiasi, “Analysis and comparison of FEC schemes for 200GbE and 400GbE,” IEEE Communications Standards Magazine, vol.1, no.1, pp.24–30, 2017.

[9] Y.Z. Zhan and Q.S. Hu, “Effect of DFE error propagation and its mitigation using MUX-based FEC interleaving for 400 GbE electrical link,” High Technology Letters, vol.24, no.4, pp.387–395, 2018.

[10] F. Lv, X. Zheng, S. Yuan, Z. Wang, Y. He, C. Zhang, Z. Wang, F. Lv, and J. Wang, “A 40–80 Gb/s PAM4 wireline transmitter in 65nm CMOS technology,” IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, USA, pp.539–542, 2017.

[11] P.-J. Peng, J.-F. Li, L.-Y. Chen, and J. Lee, “A 56Gb/s PAM-4/NRZ transceiver in 40nm CMOS,” IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, USA, pp.110–111, March 2017.

[12] W. Yao, J. Lim, J. Zhang, K. Tseng, K. Qiu, and R. Brooks, “Design of package BGA pin-out for >25Gb/s high speed SerDes considering PCB via crosstalk,” IEEE Symposium on Electromagnetic Compatibility and Signal Integrity, Santa Clara, USA, pp.111–116, March 2015.

[13] N. Dikhaminjia, J. He, H. Deng, M. Tsiklauri, J. Drewniak, A. Chada, and B. Mutnury, “Effect of improved optimization of DFE equalization on crosstalk and jitter in high speed links with multi-level signal,” IEEE 68th Electronic Components and Technology Conference (ECTC), San Diego, USA, pp.2101–2106, June 2018.

[14] J. Lee, P.-C. Chiang, P.-J. Peng, L.-Y. Chen, and C.-C. Weng, “Design of 56 Gb/s NRZ and PAM4 SerDes transceivers in CMOS technologies,” IEEE J. Solid-State Circuits, vol.50, no.9, pp.2061–2073, 2015.

[15] A. Roshan-Zamir, O. Elhadidy, H.-W. Yang, and S. Palermo, “A reconfigurable 16/32 Gb/s dual-mode NRZ/PAM4 SerDes in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol.52, no.9, pp.2430–2447, 2017.

[16] C.Y. Liu and J. Caroselli, “Modeling and mitigation of error propagation of decision feedback equalization in high speed backplane transceivers,” Proc. DesignCon, Santa Clara, USA, Feb. 2006.

[17] G. Zhang, H.T. Zhang, S. Asuncion, et al., “A tutorial on PAM4 signaling for 56G serial link applications,” Proc. DesignCon, Santa Clara, USA, Feb. 2017.

[18] M. Shimanouchi, H. Wu, and M.P. Li, “Behavioral FEC models for high speed serial link BER simulation,” Proc. DesignCon, Santa Clara, USA, Feb. 2018.

[19] J. Slavick, “PMA muxing considerations,” IEEE P802.3bs 200 GbE & 400 GbE Task Force, Jan. 14–16, 2015.

#### Appendix A:

The probability \(p(W(E)=w)\) in Eq. (6), where \(1 \le w \le \infty\), is derived as follows.

For the simplest case, *i.e.* \(W(E=\{1\})=1\), the probability is written as:

\[\begin{equation*} p (W(E)=1) = n \cdot p \cdot p(\mathit{brl}=1) \cdot (1-p)^{n- \mathit{brl}_{\max} -1} \tag{A$\cdot$1} \end{equation*}\]

where *p* is the random error probability and \(p(\mathit{brl}=1)\) can be obtained from Eq. (4).

For the case of \(W(E)\ge 2\), however, there are two subcases: either the error pattern is a single burst error, or it consists of multiple random and/or burst errors. Since the probability of the former is much greater than that of the latter, \(p(W(E)=2)\) can be calculated as follows:
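As a numerical illustration, Eq. (A·1) can be evaluated directly once \(p(\mathit{brl}=1)\) is known. The sketch below is a minimal one: the function name `prob_weight_one` and the sample values are hypothetical, and \(p(\mathit{brl}=1)\) is passed in as a plain number rather than computed from Eq. (4).

```python
def prob_weight_one(n, p, p_brl_1, brl_max):
    """Evaluate Eq. (A.1): p(W(E) = 1).

    n        -- block length in symbols, as used in Eq. (A.1)
    p        -- random error probability
    p_brl_1  -- p(brl = 1); assumed to be supplied from Eq. (4)
    brl_max  -- maximum burst length considered
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return n * p * p_brl_1 * (1.0 - p) ** (n - brl_max - 1)


# Hypothetical sample values, chosen only to exercise the formula.
print(prob_weight_one(n=544, p=1e-4, p_brl_1=0.75, brl_max=10))
```

The exponent \(n - \mathit{brl}_{\max} - 1\) is copied from Eq. (A·1) as written; it is not re-derived here.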

\[\begin{align} & p (W(E)=2) \notag\\ &\quad{} = p (\text{burst error of 2 symbols}) \notag\\ & \quad{} \hphantom{={}} + p (\text{two separate errors}) \notag\\ & \quad{} \approx p (\text{burst error of 2 symbols}) \notag\\ & \quad{} = p (E=\{1,1\}) + p (E=\{1, 0, 1\}) \notag\\ & \quad{} \hphantom{={}} + p (E=\{1, 0, 0, 1\}) + p (E=\{1,0,0,0,1\}) \notag\\ & \quad{} \hphantom{={}} + p (E=\{1,0,0,0,0,1\}) + \cdots \notag\\ & \quad{} \approx \sum_{l=2}^{\mathit{brl}_{\max}} p (\mathit{brl}=l, W(E)=2) \notag\\ {}& \quad{} = \sum_{l=2}^{\mathit{brl}_{\max}} n \cdot p \cdot \left(\sum_{j,W(E_{l,j})=2} p (\mathit{brl}=l, E_{l,j})\right) \cdot (1-p)^{n- \mathit{brl}_{\max} -l} \tag{A$\cdot$2} \end{align}\]

Similarly, \(p(W(E)=w)\) can be written as Eq. (7).
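The single-burst approximation in Eq. (A·2) amounts to summing over all burst patterns of a given weight. The following sketch shows one way to organize that sum for a general weight \(w\). It assumes the pattern probabilities \(p(\mathit{brl}=l, E_{l,j})\) are already available as a lookup table; the table `example_patterns` holds made-up placeholder values, and the function name `prob_weight` is hypothetical.

```python
def prob_weight(w, n, p, pattern_probs, brl_max):
    """Single-burst approximation of p(W(E) = w), in the form of Eq. (A.2).

    pattern_probs maps an error pattern E (a tuple of 0/1 whose first and
    last entries are 1) to p(brl = len(E), E). The burst length l is the
    pattern length and the weight W(E) is the number of ones.
    """
    total = 0.0
    for pattern, prob in pattern_probs.items():
        l = len(pattern)
        if l > brl_max or sum(pattern) != w:
            continue  # keep only patterns of weight w within brl_max
        total += n * p * prob * (1.0 - p) ** (n - brl_max - l)
    return total


# Placeholder pattern table (assumed values, not taken from Eq. (4)).
example_patterns = {
    (1,): 0.6,
    (1, 1): 0.2,
    (1, 0, 1): 0.1,
    (1, 1, 1): 0.1,
}

print(prob_weight(2, n=544, p=1e-4, pattern_probs=example_patterns, brl_max=3))
```

For \(w=1\) only the pattern \(\{1\}\) contributes, so the function reduces to the form of Eq. (A·1).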

#### Appendix B:

The numerator in Eq. (6) is calculated as follows:

\[\begin{align} & \sum_{w=1}^\infty p (W(E)=w) \cdot W(E) \notag\\ & = \sum_{w=1}^\infty \left(\sum_{l=w}^{\mathit{brl}_{\max}} p (\mathit{brl}=l, W(E)=w)\right) \cdot W(E) \notag\\ & = \sum_{w=1}^\infty \left(\sum_{l=w}^{\mathit{brl}_{\max}} n\cdot p\cdot \left(\sum_{j, W(E_{l,j})=w} p_{\mathrm{sym}} (\mathit{brl}=l, E_{l,j})\right) \right. \notag\\ & \hphantom{={}}\left. \hphantom{\sum_{l=w}^{\mathit{brl}_{\max}}\hskip17.5mm}\vphantom{\sum_{l=w}^{\mathit{brl}_{\max}}} {} \cdot (1-p)^{n- \mathit{brl}_{\max} -l} \right) \cdot w \notag\\ & = \sum_{l=1}^{\mathit{brl}_{\max}} n \cdot p\cdot \left(\sum_{j, W(E_{l,j})=1} p(\mathit{brl}=l, E_{l,j})\right) \cdot (1{-}p)^{n- \mathit{brl}_{\max} -l} \cdot 1 \notag\\ &\hphantom{={}} +\sum_{l=2}^{\mathit{brl}_{\max}} n \cdot p\cdot \left(\sum_{j, W(E_{l,j})=2} p(\mathit{brl}=l, E_{l,j})\right) \cdot (1{-}p)^{n- \mathit{brl}_{\max} -l} \cdot 2 \notag\\ & \hphantom{={}} + \sum_{l=3}^{\mathit{brl}_{\max}} n \cdot p\cdot \left(\sum_{j, W(E_{l,j})=3} p(\mathit{brl}=l, E_{l,j})\right) \cdot (1{-}p)^{n-\mathit{brl}_{\max} -l} \cdot 3 \notag\\ &\hphantom{={}} +\sum_{l=4}^{\mathit{brl}_{\max}} n \cdot p\cdot \left(\sum_{j, W(E_{l,j})=4} p(\mathit{brl}=l, E_{l,j})\right) \cdot (1{-}p)^{n-\mathit{brl}_{\max} -l} \cdot 4 \notag\\ &\hphantom{={}} + \cdots \notag\\ & \qquad\vdots \notag\\ & = n \cdot p \cdot p(\mathit{brl}=1, E_{1,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -1} \cdot 1 \notag\\ &\hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=2, E_{2,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -2} \cdot 2 \notag\\ &\hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=3, E_{3,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -3} \cdot 2 \notag\\ & \hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=4, E_{4,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -4} \cdot 2 \notag\\ &\hphantom{={}} + \cdots \notag\\ & \hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=3, E_{3,2}) \cdot (1-p)^{n-\mathit{brl}_{\max} -3} \cdot 3 \notag\\ & \hphantom{={}} + n \cdot p \cdot \sum_{j, 
W(E_{4,j})=3} p(\mathit{brl}=4, E_{4,j}) \cdot (1-p)^{n-\mathit{brl}_{\max} -4} \cdot 3 \notag\\ & \hphantom{={}} + n \cdot p \cdot \sum_{j, W(E_{5,j})=3} p(\mathit{brl}=5, E_{5,j}) \cdot (1-p)^{n-\mathit{brl}_{\max} -5} \cdot 3 \notag\\ & \hphantom{={}} + \cdots \notag\\ & \qquad \vdots \notag\\ & = n \cdot p \cdot p(\mathit{brl}=1, E_{1,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -1} \cdot 1 \notag\\ & \hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=2, E_{2,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -2} \cdot 2 \notag\\ & \hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=3, E_{3,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -3} \cdot 2 \notag\\ & \hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=3, E_{3,2}) \cdot (1-p)^{n-\mathit{brl}_{\max} -3} \cdot 3 \notag\\ & \hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=4, E_{4,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -4} \cdot 2 \notag\\ & \hphantom{={}} + n \cdot p \cdot \sum_{j, W(E_{4,j})=3} p(\mathit{brl}=4, E_{4,j}) \cdot (1-p)^{n-\mathit{brl}_{\max} -4} \cdot 3 \notag\\ & \hphantom{={}} + \cdots \notag\\ & \hphantom{={}} + n \cdot p \cdot p(\mathit{brl}=5, E_{5,1}) \cdot (1-p)^{n-\mathit{brl}_{\max} -5} \cdot 2 \notag\\ & \hphantom{={}} + n \cdot p \cdot \sum_{j, W(E_{5,j})=3} p(\mathit{brl}=5, E_{5,j}) \cdot (1-p)^{n-\mathit{brl}_{\max} -5} \cdot 3 \notag\\ & \hphantom{={}} + \cdots \notag\\ &\qquad \vdots \notag\\ & = n \cdot p \cdot \left(\sum_{\mathit{all}\ E_{1,j}} p(\mathit{brl}=1, E_{1,j}) \cdot W(E_{1,j})\right) \cdot (1{-}p)^{n-\mathit{brl}_{\max} -1} \notag\\ &\hphantom{={}} + n \cdot p \cdot \left(\sum_{\mathit{all}\ E_{2,j}} p(\mathit{brl}=2, E_{2,j}) \cdot W(E_{2,j})\right) \cdot (1{-}p)^{n-\mathit{brl}_{\max} -2} \notag\\ & \hphantom{={}} + n \cdot p \cdot \left(\sum_{\mathit{all}\ E_{3,j}} p(\mathit{brl}=3, E_{3,j}) \cdot W(E_{3,j})\right) \cdot (1{-}p)^{n-\mathit{brl}_{\max} -3} \notag\\ &\hphantom{={}} + n \cdot p \cdot \left(\sum_{\mathit{all}\ E_{4,j}} p(\mathit{brl}=4, E_{4,j}) \cdot W(E_{4,j})\right) \cdot 
(1{-}p)^{n-\mathit{brl}_{\max} -4} \notag\\ & \hphantom{={}} +\cdots \notag\\ & \qquad\vdots \notag\\ & = n \cdot p\cdot \sum_{l=1}^{\mathit{brl}_{\max}} \left(\left(\sum_{\mathit{all}\ E_{l,j}} p(\mathit{brl}=l, E_{l,j}) \cdot W(E_{l,j}) \right) \right. \notag\\ & \hphantom{= n \cdot p\cdot \sum_{l=1}^{\mathit{brl}_{\max}}\left(\frac{a}{a}\right.}\left. \vphantom{\sum_{\mathit{all}\ E_{l,j}}} {}\cdot (1-p)^{n-\mathit{brl}_{\max} -l} \right) \notag\\ & = n \cdot p\cdot \sum_{l=1}^{\mathit{brl}_{\max}} \left(\left(\sum_{\mathit{all}\ E} p(\mathit{brl}=l, E) \cdot W(E) \right) \cdot (1-p)^{n-\mathit{brl}_{\max} -l} \right). \tag{A$\cdot$3} \end{align}\]

Then, Eq. (8) is obtained as follows:

\[\begin{align} &\mathit{SER} = \frac{\sum\limits_{w=1}^\infty p (W(E)=w)\cdot W(E)}{n} \notag\\ & {} = \frac{n\cdot p\cdot \sum\limits_{l=1}^{\mathit{brl}_{\max}} \left(\left(\sum\limits_{\mathit{all}\ E} p(\mathit{brl} = l, E) \cdot W(E) \right) \cdot (1-p)^{n- \mathit{brl}_{\max} -l} \right)}{n} \notag \\ &{} = \sum_{l=1}^{\mathit{brl}_{\max}} \sum_{\mathit{all}\ E} p(\mathit{brl}=l, E) \cdot W(E) \cdot p \cdot (1-p)^{n- \mathit{brl}_{\max} -l}. \tag{A$\cdot$4} \end{align}\]
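The regrouping from a sum over weights \(w\) to a sum over burst lengths \(l\) can be checked numerically. The sketch below evaluates the SER both ways, once grouped by weight as in the first line of Eq. (A·4) and once grouped by length as in its last line, and compares the results. The pattern table and all function names are hypothetical, and the pattern probabilities are placeholders rather than values derived from Eq. (4).

```python
from collections import defaultdict


def ser_by_length(n, p, pattern_probs, brl_max):
    """Last line of Eq. (A.4): sum over burst lengths l and all patterns E."""
    ser = 0.0
    for pattern, prob in pattern_probs.items():
        l = len(pattern)
        if l <= brl_max:
            ser += prob * sum(pattern) * p * (1.0 - p) ** (n - brl_max - l)
    return ser


def ser_by_weight(n, p, pattern_probs, brl_max):
    """First line of Eq. (A.4): (1/n) * sum over w of p(W(E) = w) * w."""
    p_w = defaultdict(float)  # accumulates p(W(E) = w) for each weight w
    for pattern, prob in pattern_probs.items():
        l = len(pattern)
        if l <= brl_max:
            p_w[sum(pattern)] += n * p * prob * (1.0 - p) ** (n - brl_max - l)
    return sum(w * prob_w for w, prob_w in p_w.items()) / n


# Placeholder pattern table (assumed values, for illustration only).
patterns = {(1,): 0.6, (1, 1): 0.2, (1, 0, 1): 0.1, (1, 1, 1): 0.1}

by_length = ser_by_length(544, 1e-4, patterns, brl_max=3)
by_weight = ser_by_weight(544, 1e-4, patterns, brl_max=3)
print(by_length, by_weight)
```

Both functions visit the same set of patterns and differ only in the order of summation, which is the content of the step from the weight-grouped form to Eq. (A·3).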