## A Mueller-Müller CDR with False-Lock-Aware Locking Scheme for a 56-Gb/s ADC-Based PAM4 Transceiver

Fumihiko TACHIBANA<sup>†a)</sup>, Huy CU NGO<sup>†</sup>, Go URAKAWA<sup>†</sup>, Takashi TOI<sup>†</sup>, Mitsuyuki ASHIDA<sup>†</sup>, Yuta TSUBOUCHI<sup>†</sup>, Mai NOZAWA<sup>†</sup>, Junji WADATSUMI<sup>†</sup>, Hiroyuki KOBAYASHI<sup>†</sup>, *and* Jun DEGUCHI<sup>†</sup>, Nonmembers

SUMMARY Although baud-rate clock and data recovery (CDR) such as Mueller-Müller (MM) CDR is adopted in ADC-based receivers (RXs), it suffers from false-lock points when the RXs handle PAM4 data patterns because of the absence of edge data. In this paper, a false-lock-aware locking scheme is proposed to address this issue. After the false-lock-aware locking scheme completes, the clock phase is adjusted to achieve maximum eye height by using the post-1-tap parameter of an FFE in the CDR loop. The proposed techniques are implemented in a 56-Gb/s PAM4 transceiver. A PLL uses an area-efficient "glasses-shaped" inductor. The RX comprises an AFE, a 28-GS/s 7-bit time-interleaved SAR ADC, and a DSP with a 31-tap FFE and a 1-tap DFE. The TX is based on a 7-bit DAC with a 4-tap FFE. The transceiver is fabricated in 16-nm CMOS FinFET technology and achieves a BER of less than 1e-7 with a 30-dB loss channel. The measurement results show that the MM CDR escapes from false-lock points and converges to near the optimum point for large eye height.

key words: baud-rate CDR, PAM4, false-lock-aware, phase adjustment

## 1. Introduction

Due to the increasing demand for data bandwidth, PAM4 signaling has been used instead of NRZ signaling to relax the impact of insertion loss (IL), because the required Nyquist frequency of PAM4 signaling is half that of NRZ signaling at the same data rate. Furthermore, ADC-based receivers (RXs) [1]–[11] have been adopted instead of analog-based RXs [12]–[15] to handle high-speed multi-level signaling beyond 50 Gb/s in relatively long-distance channels such as middle reach (MR) and long reach (LR), because they enable digital equalization using a DSP after the ADC. Typical ADC-based RXs use baud-rate clock and data recovery (CDR) such as Mueller-Müller (MM) CDR [16], which does not need edge data.
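The Nyquist-frequency argument can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `nyquist_ghz` is hypothetical and introduced only for illustration.

```python
import math


def nyquist_ghz(data_rate_gbps: float, levels: int) -> float:
    """Nyquist frequency in GHz for a given data rate and number of signal levels."""
    bits_per_symbol = math.log2(levels)            # 1 for NRZ, 2 for PAM4
    baud_rate_gbaud = data_rate_gbps / bits_per_symbol
    return baud_rate_gbaud / 2.0                   # Nyquist = half the symbol rate


pam4_nyquist = nyquist_ghz(56.0, levels=4)
nrz_nyquist = nyquist_ghz(56.0, levels=2)
print(f"PAM4: {pam4_nyquist} GHz, NRZ: {nrz_nyquist} GHz")
```

At 56 Gb/s, this gives 14 GHz for PAM4, which is the frequency at which the channel loss and the inductor Q-value are quoted later in the paper.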

However, the absence of edge data makes it difficult for the CDR to converge to the optimal lock point. In particular, NRZ-like transitions within the PAM4 data pattern introduce false-lock points near their edges, leading to a higher BER than at the correct lock point. To the best of our knowledge, no technique has been proposed to escape from the false-lock points generated by the PAM4 data pattern.

Another issue is how to adjust the clock phase to the optimum lock point for achieving large eye heights. Although several techniques have been proposed to adjust the clock phase [17], [18], the pre-1 cursor is not equalized because [17], [18] target analog-based RXs, which generally utilize only a decision feedback equalizer (DFE). This results in a smaller eye height for the CDR than when the pre-1 cursor is equalized. Moreover, although [3], which utilizes an ADC-based RX, mentions clock phase adjustment, no details are given.

<sup>†</sup>The authors are with Kioxia Corporation, Yokohama-shi, 247-8585 Japan.

a) E-mail: fumihiko.tachibana@kioxia.com

DOI: 10.1587/transfun.2023GCP0003

In this paper, we propose techniques to address these issues [19]. The false-lock-aware locking scheme allows the MM CDR to find the correct lock point. The clock phase adjustment, which controls the post-1-tap parameter, moves the clock phase to the point that achieves maximum eye height for data decision while both the pre-1 cursor and the post-1 cursor can be made zero for the CDR. This paper is organized as follows. Section 2 proposes the techniques for the MM CDR. Section 3 describes the transceiver architecture. Section 4 shows the experimental results. Finally, the conclusion is presented in Sect. 5.

## 2. Proposed Techniques for the MM CDR

The proposed techniques for the MM CDR consist of two parts. The first is a false-lock-aware locking scheme for escaping from the false-lock points. The second is a clock phase adjustment using the post-1-tap parameter of a feed-forward equalizer (FFE) in the CDR loop (CDR FFE) to achieve maximum eye height for data decision. These two techniques are described in detail in the following subsections.

### 2.1 False-Lock-Aware Locking Scheme

Figure 1 shows the simplified block diagram of the CDR loop. The output signal from the ADC is equalized in the CDR FFE, and its outputs are sent to an FFE for data decision (DATA FFE) and to the comparator for clock phase control, where cdr\_tap is the tap parameter of the CDR FFE. The output to the DATA FFE does not include the result of the post-1-tap FFE because a post-1-tap DFE for data decision (DATA DFE) is applied after the DATA FFE. In the comparator, the output of the CDR FFE for the CDR (FFE\_OUT(t)) is compared with the reference value for the comparator in the CDR loop (REFC), and the resulting data values D(t) and errors E(t) are sent to an MM phase detector (MM PD). The output of the MM PD (PD\_OUT) is sent to a loop filter (LF), and the clock phase is controlled via a phase interpolator (PI). A clock generator (CK\_GEN) generates several clocks for the RX from the PI clock signals.

Manuscript received May 22, 2023.

Manuscript revised September 12, 2023.

Manuscript publicized November 2, 2023.

Copyright © 2024 The Institute of Electronics, Information and Communication Engineers

**Fig. 1** Simplified block diagram of the CDR loop (©2022 IEEE [19]).

**Fig. 2** Simulation results of (a) MM PD output, (b) eye diagram, (c) histogram at the correct lock point, and (d) histogram at the false-lock point (©2022 IEEE [19]).
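To make the role of D(t) and E(t) concrete, the following is a minimal sketch of a textbook-style MM timing function applied to a synthetic three-tap channel. It is not the implementation of this work: the sign convention, the ideal decisions, the error definition E = y - h(0)·D, and all function names are assumptions chosen for illustration.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)


def mm_pd_average(d, e):
    """Average of the MM terms E(k)*D(k-1) - E(k-1)*D(k) (assumed sign convention)."""
    terms = [e[k] * d[k - 1] - e[k - 1] * d[k] for k in range(1, len(d))]
    return sum(terms) / len(terms)


def simulate_pd(h_pre, h_main, h_post, n=20000, seed=0):
    """MM PD average for random PAM4 data through a toy pre/main/post-cursor channel."""
    rng = random.Random(seed)
    d = [rng.choice(PAM4_LEVELS) for _ in range(n)]
    # Sample k sees the next symbol through h_pre and the previous one through h_post.
    y = [h_pre * d[k + 1] + h_main * d[k] + h_post * d[k - 1]
         for k in range(1, n - 1)]
    d_mid = d[1:n - 1]                      # ideal decisions aligned with y
    e = [y_k - h_main * d_k for y_k, d_k in zip(y, d_mid)]
    return mm_pd_average(d_mid, e)


late = simulate_pd(h_pre=0.10, h_main=1.0, h_post=0.20)
early = simulate_pd(h_pre=0.20, h_main=1.0, h_post=0.10)
balanced = simulate_pd(h_pre=0.15, h_main=1.0, h_post=0.15)
print(late, early, balanced)
```

For uncorrelated data, the averaged output of this sketch is proportional to h(1) - h(-1), so it vanishes when the pre-1 and post-1 cursors are equal.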

Figure 2 shows simulation results of the relation between the clock phase and the MM PD output with PAM4 signaling, the eye diagram after the CDR FFE, and the histograms at the correct and false-lock points. As shown in Figs. 2(a) and (b), there is a correct lock point around the center of the eye, at which the acquired data (blue plots) have the expected quad-modal distribution as shown in Fig. 2(c). On the other hand, there are two false-lock points near the data edges, although the acquired data at these points do not have a quad-modal distribution, as shown in Fig. 2(d).

**Fig. 3** Concept of offset-based technique for escaping from the false-lock point.

To investigate why these false-lock points exist, only the transitions between 0 and 3 (orange lines) are extracted from the PAM4 data pattern, as shown in Fig. 2(b). When only the extracted transitions are considered, there are two phase points near the data edges where the acquired data have a quad-modal distribution as shown in Fig. 2(d), so the comparator misjudges these points as PAM4 data. This incurs incorrect lock points near the data edges, and the CDR might converge to a false-lock point.

To avoid the false-lock points, two approaches are considered in this paper. The first is adding an offset value to PD\_OUT, which is inspired by [18], and the second is using the comparator in an NRZ mode. The concept of the first approach is shown in Fig. 3. The modified PD\_OUT (PD\_OUT') is expressed as Eq. (1).

$$\text{PD\_OUT}' = \text{PD\_OUT} + \text{offset} \tag{1}$$

Based on Eq. (1), the false-lock points are avoided by following the sequence shown in Fig. 3(b). Step 1: the offset value is set negative enough to make PD\_OUT' lower than zero at any clock phase, which forces the CDR to unlock. Step 2: the offset value is gradually increased until a clock phase that satisfies PD\_OUT'=0 appears. Because the absolute values of the peak and valley near the correct lock point are larger than those near the false-lock points, an adequate offset value can make the false-lock points disappear. As a result, the CDR locks to near the correct lock point. Step 3: the offset value gradually converges to zero, and the CDR finally locks to the correct lock point.
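The effect of the offset in Eq. (1) on the number of lock points can be illustrated with a toy model. The sketch below uses a hand-made piecewise-linear PD\_OUT curve, not simulation data from this work, and it assumes that a stable lock point is a zero crossing with negative slope; the knot values and function names are hypothetical.

```python
# Synthetic PD_OUT versus clock phase over one UI (phase in UI, correct lock at 0).
# Large peak/valley near the correct point, small ones near the data edges.
PHASE_KNOTS = [-0.5, -0.45, -0.35, -0.15, 0.15, 0.35, 0.45, 0.5]
PD_KNOTS = [0.0, 0.3, -0.3, 1.0, -1.0, 0.3, -0.3, 0.0]


def interp(x, xs, ys):
    """Piecewise-linear interpolation on sorted knots."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside knot range")


def stable_lock_points(offset, steps=1000):
    """Phases where PD_OUT' = PD_OUT + offset crosses zero with negative slope."""
    phases = [-0.5 + i / steps for i in range(steps + 1)]
    vals = [interp(p, PHASE_KNOTS, PD_KNOTS) + offset for p in phases]
    locks = []
    for i in range(steps):
        if vals[i] > 0 and vals[i + 1] <= 0:
            frac = vals[i] / (vals[i] - vals[i + 1])   # linear zero-crossing estimate
            locks.append(phases[i] + frac / steps)
    return locks


for offset in (-1.5, -0.5, 0.0):   # Step 1, Step 2, Step 3 of the sequence
    print(offset, [round(p, 3) for p in stable_lock_points(offset)])
```

In this toy curve, a strongly negative offset removes every lock point, an intermediate offset leaves a single lock point near the correct one, and zero offset restores all three, which is why the loop must already be tracking the correct point before the offset returns to zero.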

The drawback of this offset-based technique is that it is difficult to know the absolute values of the peak and valley near the correct lock point and those near the false-lock points. Therefore, the initial offset must start at a much larger absolute value than that of the peak and valley in order to guarantee that the CDR does not lock to the false-lock points. This makes the CDR unlock, and the integrator in the LF might saturate at the start of the sequence. Moreover, when the difference between the absolute values of the peak and valley near the correct lock point and those near the false-lock points is small, the offset range where only the correct lock point exists is narrow. In addition, the offset range could be small when the input signal is not equalized well in the analog domain. For these reasons, this offset-based technique is not adopted for avoiding false-lock points.

**Fig. 4** Simulation results of PD\_OUT with the comparator in the NRZ mode or the PAM4 mode (before the adaptation of FFE parameters) (©2022 IEEE [19]).

The second approach is to find the correct lock point using the characteristics of the MM PD with the comparator operating in the NRZ mode, as shown in Fig. 4. Because the false-lock points are generated by misjudgments at the additional stable points when the comparator operates in the PAM4 mode, these lock points can be removed by switching the comparator from the PAM4 mode to the NRZ mode, as shown in Fig. 4(a). Figure 4(b) shows simulation results of PD\_OUT with PAM4 signaling using the comparator in the NRZ and PAM4 modes. As shown in Fig. 4(b), the false-lock points disappear and only the correct lock point exists in the NRZ mode. However, the absolute value of the slope at the correct lock point in the NRZ mode is lower than that in the PAM4 mode, as shown in Fig. 4(b), which means that the gain of the MM PD is lower in the NRZ mode. This results in a lower tracking bandwidth (BW) of the jitter tolerance (JTOL) in the NRZ mode than in the PAM4 mode.

To achieve higher gain while avoiding the false-lock points, we propose a sequence that uses both the NRZ mode and the PAM4 mode, as shown in Fig. 5. Step 1: the CDR locks to the correct lock point using the comparator in the NRZ mode. Step 2: the parameters related to equalization, such as the FFE/DFE parameters, REFC, the reference value for the comparator in the DATA DFE (REFD), a variable-gain amplifier (VGA), and a continuous-time linear equalizer (CTLE), are adapted. Step 3: the comparator is switched to the PAM4 mode to obtain higher gain than in the NRZ mode. Step 4: the parameters related to equalization are re-adapted. Because the CDR locks to the correct lock point before switching to the PAM4 mode, the clock phase is unlikely to fall into the false-lock points. Since the proposed sequence just switches the comparator modes according to the steps, it does not need to evaluate the BER for detecting false-lock points, which reduces the adaptation time.

**Fig. 5** Proposed sequence of false-lock-aware locking scheme.

**Fig. 6** Connection from the comparator in the CDR loop to the CDR LMS adaptation circuit and the MM PD.

Note that the least mean square (LMS) algorithm fails to adapt the FFE parameters if it uses the output of the comparator in the NRZ mode, because the comparator converts data levels 1 and 2 to 0 and 3, respectively. However, appropriate adaptation of the FFE parameters in Step 2 is important for making the correct lock point more stable and for further reducing the risk of falling into false-lock points when the comparator is switched from the NRZ mode to the PAM4 mode in Step 3. Figure 6 shows the connection from the comparator in the CDR loop to the CDR LMS and the MM PD, where E\_NRZ is the E in the NRZ mode, D\_MM is the D for the MM PD, E\_MM is the E for the MM PD, and MM\_SEL is the selector signal between the NRZ mode and the PAM4 mode. The comparator outputs connected to the MM PD are switched between the NRZ mode and the PAM4 mode according to the steps. On the other hand, the output of the comparator in the PAM4 mode (E) is always connected to the adaptation circuit with the LMS algorithm, which allows the LMS algorithm to run while the MM PD is in the NRZ mode.
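The selection between the two comparator modes can be sketched as follows. This is an illustrative model only: the symbol values (-3, -1, +1, +3), the thresholds at 0 and ±2·ref, the scaling of `ref`, and the helper names are assumptions, and only E, E\_NRZ, D\_MM, E\_MM, and MM\_SEL correspond to signal names used in Fig. 6.

```python
def slice_pam4(y, ref):
    """PAM4-mode decision: returns (D, E) with thresholds at -2*ref, 0, +2*ref."""
    if y < -2 * ref:
        d = -3
    elif y < 0:
        d = -1
    elif y < 2 * ref:
        d = 1
    else:
        d = 3
    return d, y - d * ref


def slice_nrz(y, ref):
    """NRZ-mode decision: inner levels are mapped onto the outer levels."""
    d = -3 if y < 0 else 3
    return d, y - d * ref


def comparator_outputs(y, ref, mm_sel):
    """E always comes from the PAM4 mode (for LMS); the MM PD inputs follow MM_SEL."""
    d_pam4, e_pam4 = slice_pam4(y, ref)
    d_nrz, e_nrz = slice_nrz(y, ref)
    d_mm, e_mm = (d_nrz, e_nrz) if mm_sel == "NRZ" else (d_pam4, e_pam4)
    return {"E": e_pam4, "D_MM": d_mm, "E_MM": e_mm}


nrz_out = comparator_outputs(1.1, ref=1.0, mm_sel="NRZ")
pam4_out = comparator_outputs(1.1, ref=1.0, mm_sel="PAM4")
print(nrz_out, pam4_out)
```

For a sample close to an inner level, the NRZ-mode error is large while the PAM4-mode error stays small, which is consistent with the LMS adaptation needing the PAM4-mode E even while the MM PD uses the NRZ-mode outputs.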

### 2.2 Clock Phase Adjustment Using the Post-1-Tap Parameter of the CDR FFE

After the false-lock-aware locking scheme, the clock phase is adjusted to the point where it achieves maximum eye height, which is strongly associated with low BER. Theoretically, the MM CDR locks to the point where the pre-1 cursor (h(-1)) is equal to the post-1 cursor (h(1)), as shown in Fig. 7(a). On the other hand, the LMS algorithm adapts the CDR FFE parameters to make both h(-1) and h(1) zero regardless of whether the lock point is the optimal point for maximum eye height. As a result, the lock point is not always the optimal point for maximum eye height if the pre-1-tap parameter of the CDR FFE (cdr\_tap(-1)) and the post-1-tap parameter of the CDR FFE (cdr\_tap(1)) are both adapted by the LMS algorithm.

Several techniques have been proposed to adjust the clock phase by adding weights to "Early" and "Late" [17] or by adding an offset value [18] to the MM PD. However, these techniques utilize the difference between h(-1) and h(1) to adjust the clock phase, and h(-1) is not equalized in [17] and [18]. As a result, the eye height for the CDR remains smaller than when h(-1) is equalized.

To solve this issue, cdr\_tap(1) is adapted independently of the LMS algorithm, as shown in Fig. 7(b). With this scheme, the waveform of the single bit response (SBR(t)) is changed to SBR'(t) after the CDR FFE by controlling cdr\_tap(1), and the locked phase value is controlled accordingly (red arrow). Figure 7(c) shows the simulation results of the relation between cdr\_tap(1) and the locked phase value. As shown in Fig. 7(c), the clock phase is controlled by changing cdr\_tap(1). Whereas h(-1) is not equalized in [17], [18], both h(-1) and h(1) can be made zero with the proposed technique, resulting in a larger eye height for the CDR.
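The dependence of the locked phase on cdr\_tap(1) can be reproduced qualitatively with a toy model. The sketch below assumes a Gaussian single bit response, a baud-spaced post-1 tap so that SBR'(t) = SBR(t) + cdr\_tap(1)·SBR(t - 1 UI), and an MM lock condition h(-1) = h(1); the pulse shape, its width, and the function names are illustrative assumptions, so the direction and size of the shift should not be read as results of this work.

```python
import math

SIGMA_UI = 0.6  # assumed width of the synthetic pulse, in UI


def sbr(t):
    """Synthetic single bit response (Gaussian), time in UI."""
    return math.exp(-t * t / (2 * SIGMA_UI ** 2))


def sbr_after_ffe(t, cdr_tap_post1):
    """SBR'(t): response after a baud-spaced post-1 tap."""
    return sbr(t) + cdr_tap_post1 * sbr(t - 1.0)


def cursor_imbalance(tau, cdr_tap_post1):
    """h(-1) - h(1) when sampling at phase tau, with h(n) = SBR'(tau + n)."""
    return (sbr_after_ffe(tau - 1.0, cdr_tap_post1)
            - sbr_after_ffe(tau + 1.0, cdr_tap_post1))


def locked_phase(cdr_tap_post1, lo=-0.5, hi=0.5, iters=60):
    """Bisection for the phase where h(-1) = h(1)."""
    f_lo = cursor_imbalance(lo, cdr_tap_post1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = cursor_imbalance(mid, cdr_tap_post1)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


for tap in (-0.1, 0.0, 0.1):
    print(tap, round(locked_phase(tap), 4))
```

In this model the locked phase sits at the pulse center when the tap is zero and moves monotonically as the tap is changed, which is the mechanism that allows a single tap to act as a phase control.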

In the proposed sequence, cdr\_tap(1) is adapted independently to maximize REFD, which means that the eye height for data decision is also maximized. Figure 8(a) shows a flowchart of the cdr\_tap(1) adaptation sequence. Step 1: initialize the direction of the cdr\_tap(1) adaptation, the previous REFD (pREFD), and the current REFD (cREFD).



**Fig. 7** Clock phase adjustment via the cdr\_tap(1) adaptation (©2022 IEEE [19]).



**Fig. 8** Clock phase adjustment via the cdr\_tap(1) adaptation (©2022 IEEE [19]).



**Fig. 9** Transceiver block diagram (©2022 IEEE [19]).

Step 2: the LMS algorithm adapts the FFE parameters except for cdr\_tap(1), the DFE parameters, REFC, and REFD for a certain period of time. At the same time, REFD is accumulated into cREFD at every cycle. Step 3: cREFD is compared with pREFD, meaning that the averaged REFD in this period is compared with that in the previous period. If cREFD is larger than pREFD, the direction is unchanged; otherwise, the direction is reversed (from +1 to -1 or from -1 to +1). Step 4: according to the direction, cdr\_tap(1) is either incremented or decremented. At the same time, cREFD is saved to pREFD, cREFD is reset to zero, and the sequence returns to Step 2. Figure 8(b) shows an example of the cdr\_tap(1) adaptation. Because the FFE is already implemented in the CDR loop, no additional computational cost except for the cdr\_tap(1) adaptation is required for adjusting the clock phase. Note that because the sequence described in Fig. 8 only adapts the FFE/DFE parameters and REFC/REFD, it is used in Steps 2 and 4 of Fig. 5.
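Steps 1 to 4 amount to a sign-based hill climb on the accumulated REFD. The following is a minimal sketch of that loop; the callback `measure_refd`, which stands in for the REFD value produced by the LMS adaptation at a given tap setting, the toy REFD curve, and the period and cycle counts are all hypothetical.

```python
def adapt_cdr_tap(measure_refd, tap_init=0, periods=20, cycles_per_period=8):
    """Hill-climb cdr_tap(1) toward maximum accumulated REFD; returns the tap history."""
    # Step 1: initialize the direction, pREFD, and cREFD.
    direction = +1
    p_refd = 0.0
    c_refd = 0.0
    tap = tap_init
    history = []
    for _ in range(periods):
        # Step 2: accumulate REFD every cycle while the other parameters adapt.
        for _ in range(cycles_per_period):
            c_refd += measure_refd(tap)
        # Step 3: keep the direction if cREFD improved, otherwise reverse it.
        if c_refd <= p_refd:
            direction = -direction
        # Step 4: update the tap, save cREFD to pREFD, and reset cREFD.
        tap += direction
        p_refd, c_refd = c_refd, 0.0
        history.append(tap)
    return history


def toy_refd(tap):
    """Hypothetical REFD curve with its maximum at tap = 5."""
    return 100.0 - (tap - 5) ** 2


history = adapt_cdr_tap(toy_refd)
print(history)
```

With this toy curve the tap climbs to the maximum and then dithers within one step of it, which is the expected steady-state behavior of a scheme that always moves the tap by one step per period.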

## 3. Transceiver Architecture

The techniques proposed in Sect. 2 are implemented in a 56-Gb/s PAM4 transceiver with an ADC-based RX. The transceiver architecture is shown in Fig. 9. To improve the signal quality, LC-ladder-filter-assisted ESD protection is used for the transmitter (TX). A 7-bit DAC with a 1-pre/2-post-tap FFE function is used for the TX. The RX analog front end (RX-AFE) consists of a T-coil-assisted on-die termination, a 2-stage CTLE, and a 2-stage VGA. The RX-AFE is followed by a time-interleaved ADC (TI-ADC). The outputs of the TI-ADC are connected to an aligner, and the aligned outputs are sent to the DSP.

As shown in Fig. 9, the DSP consists of an offset/gain mismatch correction, a 3-pre/4-post-tap CDR FFE, an MM PD, an LF, a PI controller, a 4-pre/26-post-tap DATA FFE, and a 1-tap DATA DFE. It also includes a mismatch estimation circuit, an adaptation circuit with the LMS algorithm for the FFE/DFE parameters and REFC/REFD, an adaptation circuit for cdr\_tap(1), and a CTLE/VGA controller. The output of the CDR FFE without cdr\_tap(1) equalization is sent to the DATA FFE/DFE to reduce the bit width of the DATA FFE parameters, while the output of the CDR FFE with cdr\_tap(1) equalization is sent to the MM PD. The offset/gain mismatches are corrected in the digital domain, while the skew mismatches are corrected in the analog domain.

**Fig. 10** Glasses-shaped inductor (©2022 IEEE [19]).

An LC-type VCO is used in the PLL to generate the low-jitter clock distributed to the TX and to the PI, which is followed by the CK\_GEN for the ADC/DSP. In general, it is important for VCOs to achieve low-jitter performance with a small occupied area. However, conventional high-Q spiral inductors need a large area for low-jitter performance. To reduce the occupied area of the inductor, a custom "glasses-shaped" inductor has been designed, as shown in Fig. 10. Compared with a conventional inductor, the small voltage difference between the upper and lower layers in the inner winding helps reduce the equivalent lumped capacitance, resulting in a larger Q-value [20]. In most of the inductor, the current direction is the same as that of the adjacent wire, which yields a large inductance through mutual inductance. However, the current direction near the center is opposite, which reduces the total inductance. To relax the impact of the center part, the wire near the center is designed to be as short and as widely spaced as possible. Using the proposed inductor, the occupied area of the inductor can be reduced while maintaining the required inductance and Q-value. Figure 11 shows the relation between the occupied area and the simulated Q-values at 14 GHz. Compared with a conventional spiral inductor with the same inductance and Q-value, the occupied area of the proposed inductor (Area=X\*Y) is reduced by 28.7% at the moderate Q-value of 9.4, which is sufficient for the target clock jitter. In addition, a noise-canceling charge pump [21] is implemented in the PLL.

**Fig. 11** Simulation results of proposed inductor at 14 GHz (©2022 IEEE [19]).

**Fig. 12** TI-ADC architecture (©2022 IEEE [19]).

Figure 12(a) shows the block diagram of the TI-ADC. A hierarchical architecture with four rank-1 buffers is used in the 7-bit 32-way TI-ADC. Each rank-1 buffer is followed by two track-and-hold circuits (THs) running at 3.5 GS/s, and each TH is connected to a rank-2 buffer followed by four sub-ADCs. Figure 12(b) shows the timing chart of the TI-ADC. CK\_R1\_Xs are skew calibrated by the control signal from the DSP while CK\_R2\_Xs are not calibrated. Figure 12(c) shows the block diagram of the sub-ADC. Each sub-ADC uses a dynamic comparator in an asynchronous loop with 8 successive approximation cycles (1-bit redundancy for testing 8-bit operation).
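The interleaving hierarchy can be cross-checked with simple arithmetic. In the sketch below, the buffer and sub-ADC counts and the 28-GS/s aggregate rate are taken from the text, while the per-sub-ADC rate is a derived value that is not stated in the paper.

```python
RANK1_BUFFERS = 4        # rank-1 buffers in the TI-ADC
THS_PER_RANK1 = 2        # track-and-hold circuits per rank-1 buffer
SUB_ADCS_PER_TH = 4      # sub-ADCs behind each rank-2 buffer
ADC_RATE_GSPS = 28.0     # aggregate sampling rate

num_ths = RANK1_BUFFERS * THS_PER_RANK1
num_ways = num_ths * SUB_ADCS_PER_TH
th_rate_gsps = ADC_RATE_GSPS / num_ths
sub_adc_rate_gsps = ADC_RATE_GSPS / num_ways   # derived, not quoted in the text

print(num_ways, th_rate_gsps, sub_adc_rate_gsps)
```

The result reproduces the 32-way interleaving and the 3.5-GS/s track-and-hold rate, and it suggests that each sub-ADC has roughly 1.14 ns per conversion for its 8 asynchronous cycles.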

## 4. Measurement Results

A test chip is fabricated in 16-nm CMOS FinFET technology, and a die micrograph of the test chip is shown in Fig. 13. The chip area of the transceiver is  $4.0 \text{ mm}^2$  including test circuits. Figure 14 shows the channel IL used for the external loopback tests. The total loss from ball to ball is 30 dB at 14 GHz. Figures 15(a) and 15(b) show the TX output and the recovered RX eye diagram at 56 Gb/s. As shown in Fig. 15(b), an open eye pattern is obtained after the equalization by the DSP. Figure 16 shows the bathtub curve obtained from the external loopback tests. A BER of less than 1e-7 is achieved with PRBS31. Figure 17 shows the JTOL with PRBS15 at IL=29.3 dB. In the JTOL measurements, an arbitrary waveform generator outputs the PAM4 signal to the RX. As shown in Fig. 17, a 10-MHz tracking BW is achieved, which meets our target JTOL.

**Fig. 13** Die micrograph (©2022 IEEE [19]).

**Fig. 14** Insertion loss used for external loopback tests (©2022 IEEE [19]).

**Fig. 15** (a) TX output and (b) recovered eye diagram at 56 Gb/s PRQS7 (©2022 IEEE [19]).

**Fig. 16** Bathtub curve at 56 Gb/s (©2022 IEEE [19]).

Figure 18 shows the equalized eye diagram and the data histogram at the lock point with and without the proposed false-lock-aware locking scheme when the clock phase is set to the false-lock point before the adaptation sequence.



**Fig. 17** JTOL at 56 Gb/s (©2022 IEEE [19]).



**Fig. 18** Eye diagram and histogram at lock point with/without the false-lock-aware locking scheme at 56 Gb/s PRQS7 from the external loopback tests (IL=30 dB) (©2022 IEEE [19]).

With the proposed scheme, the MM CDR successfully escapes from the false-lock point, and the distributions of the data levels are separated. On the other hand, without this scheme the clock phase stays at the false-lock point, and the distributions of the data levels overlap even after the adaptation sequence. This result confirms that the false-lock-aware locking scheme is an effective way to find the correct lock point.

Figure 19 shows the measured REFD and BER against cdr\_tap(1). Ten different adaptation sequences are executed at each cdr\_tap(1) value to reduce the effect of run-to-run fluctuations in the adapted parameters. The averaged REFD and BER for various fixed cdr\_tap(1) values are plotted as the blue plots in Fig. 19. The measured REFD and BER obtained with the proposed cdr\_tap(1) adaptation technique are plotted as the red plots. As shown in Fig. 19, cdr\_tap(1) converges successfully, and REFD and BER are close to the optimum values when the proposed cdr\_tap(1) adaptation is used.

Table 1 shows a comparison with previous works at 56 Gb/s. Compared with previous works, our work includes both a false-lock-aware locking scheme and a phase adjustment scheme.

**Fig. 19** Plots of (a) averaged REFD and (b) BER against fixed cdr\_tap(1) at 56 Gb/s PRBS31 from the external loopback tests (IL=30 dB) (©2022 IEEE [19]).

**Table 1** Performance summary and comparison (©2022 IEEE [19]).

|                              | This work                 | [1]          | [3]              | [5]         |
|------------------------------|---------------------------|--------------|------------------|-------------|
| Technology                   | 16-nm FinFET              | 16-nm FinFET | 16-nm FinFET     | 7-nm FinFET |
| Power Supply [V]             | 0.8/0.9/1.0/1.8           | 0.9/1.2/1.8  | 0.85/0.9/1.2/1.8 | 0.75/0.9    |
| Data Rate [Gb/s]             | 56                        | 56           | 56               | 56          |
| IL [dB]                      | 30                        | 31           | 32               | 42.5        |
| Area/lane [mm<sup>2</sup>]   | 4.0 (incl. test circuits) | 1.4          | 2.2              | 0.468       |
| Power/lane (excl. DSP) [mW]  | 365                       | 550          | 325              | 146         |
| Power Efficiency [pJ/bit]    | 6.5                       | 9.8          | 5.8              | 2.6         |
| BER                          | 1.0E-07                   | 1.0E-15      | 1.0E-12          | 1.0E-07     |
| False-Lock-Aware             | Yes                       | -            | -                | -           |
| Phase Adjustment             | Yes                       | -            | Yes              | -           |
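The power-efficiency row of Table 1 follows from dividing the power per lane by the data rate, since 1 mW/(Gb/s) equals 1 pJ/bit. The short check below uses only the values listed in the table; the dictionary layout and function name are chosen for illustration.

```python
# (power per lane in mW, excluding the DSP; data rate in Gb/s), from Table 1
ENTRIES = {
    "This work": (365, 56),
    "[1]": (550, 56),
    "[3]": (325, 56),
    "[5]": (146, 56),
}


def pj_per_bit(power_mw, rate_gbps):
    """Energy per bit in pJ: mW divided by Gb/s."""
    return power_mw / rate_gbps


efficiency = {name: round(pj_per_bit(p, r), 1) for name, (p, r) in ENTRIES.items()}
print(efficiency)
```

The computed values match the table, which also makes explicit that the listed efficiency excludes the DSP power.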

## 5. Conclusion

In this paper, a technique is proposed that allows the MM CDR to escape from false-lock points by switching the comparator modes for the MM PD. Another technique, which uses the post-1-tap parameter, is proposed to adjust the clock phase to the point that achieves maximum eye height. The proposed techniques are implemented in a test chip fabricated in 16-nm CMOS FinFET technology. The measurement results confirm that the MM CDR escapes from false-lock points successfully and converges to near the optimum point for large eye height.

## Acknowledgments

This paper is based on results obtained from "Research and Development Project of the Enhanced Infrastructures for Post-5G Information and Communication Systems" (JPNP20017), commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

## References

- [1] Y. Frans, J. Shin, L. Zhou, P. Upadhyaya, J. Im, V. Kireev, M. Elzeftawi, H. Hedayati, T. Pham, S. Asuncion, C. Borrelli, G. Zhang, H. Zhang, and K. Chang, "A 56-Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16-nm FinFET," IEEE J. Solid-State Circuits, vol.52, no.4, pp.1101–1110, April 2017.
- [2] J. Hudner, D. Carey, R. Casey, K. Hearne, P.W.D.F. Neto, I. Chlis, M. Erett, C.F. Poon, A. Laraba, H. Zhang, S.L.C. Ambatipudi, D. Mahashin, P. Upadhyaya, Y. Frans, and K. Chang, "A 112 Gb/s PAM4 wireline receiver using a 64-way time-interleaved SAR ADC in 16 nm FinFET," IEEE Symp. VLSI Circuits, June 2018.
- [3] P. Upadhyaya, C.F. Poon, S.W. Lim, J. Cho, A. Roldan, W. Zhang, J. Namkoong, T. Pham, B. Xu, W. Lin, H. Zhang, N. Narang, K.H. Tan, G. Zhang, Y. Frans, and K. Chang, "A fully adaptive 19-58-Gb/s PAM-4 and 9.5-29-Gb/s NRZ wireline transceiver with configurable ADC in 16-nm FinFET," IEEE J. Solid-State Circuits, vol.54, no.1, pp.18–28, Jan. 2019.
- [4] S. Kiran, S. Cai, Y. Luo, S. Hoyos, and S. Palermo, "A 52-Gb/s ADC-based PAM-4 receiver with comparator-assisted 2-bit/stage SAR ADC and partially unrolled DFE in 65-nm CMOS," IEEE J. Solid-State Circuits, vol.54, no.3, pp.659–671, March 2019.
- [5] M. Pisati, A. Minuti, G. Bollati, F. Giunco, R.G. Massolini, G. Cesura, F. De Bernardinis, P. Pascale, C. Nani, N. Ghittori, E. Pozzati, M. Sosio, M. Garampazzi, and A. Milani, "A 243-mW 1.25-56-Gb/s continuous range PAM-4 42.5-dB IL ADC/DAC-based transceiver in 7-nm FinFET," IEEE J. Solid-State Circuits, vol.55, no.1, pp.6–18, Jan. 2020.
- [6] Y. Krupnik, Y. Perelman, I. Levin, Y. Sanhedrai, R. Eitan, A. Khairi, Y. Shifman, Y. Landau, U. Virobnik, N. Dolev, A. Meisler, and A. Cohen, "112-Gb/s PAM4 ADC-based SERDES receiver with resonant AFE for long-reach channels," IEEE J. Solid-State Circuits, vol.55, no.4, pp.1077–1085, April 2020.
- [7] H. Lin, C. Boecker, M. Hossain, S. Tangirala, R. Vu, S. Vamvakos, E. Groen, S. Li, P. Choudhary, N. Wang, M. Shibata, H. Taghavi, M. van Ierssel, A. Maniyar, A. Wodkowski, N. Nguyen, and S. Desai, "A 4 × 112 Gb/s ADC-DSP based multistandard receiver in 7 nm FinFET," IEEE Symp. VLSI Circuits, June 2020.
- [8] J. Im, K. Zheng, C.-H.A. Chou, L. Zhou, J.W. Kim, S. Chen, Y. Wang, H.-W. Hung, K. Tan, W. Lin, A.B. Roldan, D. Carey, I. Chlis, R. Casey, A. Bekele, Y. Cao, D. Mahashin, H. Ahn, H. Zhang, Y. Frans, and K. Chang, "A 112-Gb/s PAM4 long-reach wireline transceiver using a 36-way time-interleaved SAR ADC and inverter-based RX analog front-end in 7-nm FinFET," IEEE J. Solid-State Circuits, vol.56, no.1, pp.7–18, Jan. 2021.
- [9] M.A. LaCroix, E. Chong, W. Shen, E. Nir, F.A. Musa, H. Mei, M.-M. Mohsenpour, S. Lebedev, B. Zamanlooy, C. Carvalho, Q. Xin, D. Petrov, H. Wong, H. Ho, Y. Xu, S.N. Shahi, P. Krotnev, C. Feist, H. Huang, and D. Tonietto, "A 116 Gb/s DSP-based wireline transceiver in 7 nm CMOS achieving 6 pJ/b at 45 dB loss in PAM-4/duo-PAM-4 and 52 dB in PAM-2," IEEE ISSCC, Feb. 2021.
- [10] A. Varzaghani, B. Bozorgzadeh, J. Lam, A. Goel, X. Yuan, M. Elzeftawi, M. Izad, S. Sarkar, A. Baldisserotto, S.-R. Ryu, S. Mikes, J. Hwang, V. Joshi, S. Naraghi, D. Kadia, M. Ranjbar, P. Lee, D. Loizos, S. Zogopoulos, S. Verma, and S. Sidiropoulos, "A 1-to-112 Gb/s DSP-based wireline transceiver with a flexible clocking scheme in 5 nm FinFET," IEEE Symp. VLSI Circuits, June 2022.

- [11] A. Khairi, Y. Krupnik, A. Laufer, Y. Segal, M. Cusmai, I. Levin, A. Gordon, Y. Sabag, V. Rahinski, I. Lotan, G. Ori, N. Familia, S. Litski, T.W. Grafi, U. Virobnik, D. Lazar, Y. Horwitz, A. Balankutty, S. Kiran, S. Palermo, P.M. Li, F. O'Mahony, and A. Cohen, "A 1.41pJ/b 224-Gb/s PAM4 6-bit ADC-based SerDes receiver with hybrid AFE capable of supporting long reach channels," IEEE J. Solid-State Circuits, vol.58, no.1, pp.8–18, Jan. 2023.
- [12] R. Yousry, E. Chen, Y.-M. Ying, M. Abdullatif, M. Elbadry, A. ElShater, T.-B. Liu, J. Lee, D. Ramachandran, K. Wang, C.-H. Weng, M.-L. Wu, and T. Ali, "1.7 pJ/b 112 Gb/s XSR transceiver for intra-package communication in 7 nm FinFET technology," IEEE ISSCC, Feb. 2021.
- [13] R. Shivnaraine, M.V. Ierssel, K. Farzan, D. Diclemente, G. Ng, N. Wang, J. Musayev, G. Dutta, M. Shibata, A. Moradi, H. Vahedi, M. Farzad, P. Kainth, M. Yu, N. Nguyen, J. Pham, and A. McLaren, "A 26.5625-to-106.25 Gb/s XSR SerDes with 1.55 pJ/b efficiency in 7 nm CMOS," IEEE ISSCC, Feb. 2021.
- [14] B. Ye, K. Sheng, W. Gai, H. Niu, B. Zhang, Y. He, S. Jia, C. Chen, and J. Yu, "A 2.29 pJ/b 112 Gb/s wireline transceiver with RX 4-Tap FFE for medium-reach applications in 28 nm CMOS," IEEE ISSCC, Feb. 2022.
- [15] B. Zand, M. Bichan, A. Mahmoodi, M. Shashaani, J. Wang, R. Shulyzki, J. Guthrie, K. Tyshchenko, J. Zhao, E. Liu, N. Soltani, A. Freeman, R. Anand, S. Rubab, R. Khela, S. Sharifian, and K. Herterich, "A 1-58.125 Gb/s, 5–33 dB IL multi-protocol Ethernet-compliant analog PAM-4 receiver with 16 DFE Taps in 10 nm," IEEE ISSCC, Feb. 2022.
- [16] K. Mueller and M. Müller, "Timing recovery in digital synchronous data receivers," IEEE Trans. Commun., vol.24, no.5, pp.516–531, May 1976.
- [17] M.-C. Choi, H.-G. Ko, J. Oh, H.-Y. Joo, K. Lee, and D.-K. Jeong, "A 0.1-pJ/b/dB 28-Gb/s maximum-eye tracking, weight-adjusting MM CDR and adaptive DFE with single shared error sampler," IEEE Symp. VLSI Circuits, June 2020.
- [18] R. Dokania, A. Kern, M. He, A. Faust, R. Tseng, S. Weaver, K. Yu, C. Bil, T. Liang, and F. O'Mahony, "A 5.9 pJ/b 10 Gb/s serial link with unequalized MM-CDR in 14 nm tri-gate CMOS," IEEE ISSCC, Feb. 2015.
- [19] F. Tachibana, H.C. Ngo, G. Urakawa, T. Toi, M. Ashida, Y. Tsubouchi, M. Nozawa, J. Wadatsumi, H. Kobayashi, and J. Deguchi, "A 56-Gb/s PAM4 transceiver with false-lock-aware locking scheme for Mueller-Müller CDR," IEEE ESSCIRC, Sept. 2022.
- [20] B. Razavi, RF Microelectronics, 2nd ed., Prentice Hall, 2011.
- [21] G. Urakawa, H. Kobayashi, J. Deguchi, and R. Fujimoto, "A noise-canceling charge pump for area efficient PLL design," IEEE Symp. RFIT, Sept. 2020.



**Fumihiko Tachibana** received the B.E. and M.E. degrees in electronics engineering from the University of Tokyo, Tokyo, Japan, in 2003 and 2005, respectively. In 2005, he joined the Center for Semiconductor Research and Development, Toshiba Corporation, Kawasaki, Japan, where he was engaged in research and development of low-power digital circuits, embedded SRAMs, image sensors, and high-speed I/O. From 2013 to 2014, he was a Visiting Scholar with Stanford University, Stanford, CA, USA, where he was involved in research on energy-efficient image sensors. In 2017, he joined Kioxia Corporation, Kawasaki, Japan, where he has been engaged in research and development of efficient hardware and algorithms for machine learning applications, and of data converters and DSP for high-speed I/O. His current research interests include data converters and DSP for high-speed I/O.



**Huy Cu Ngo** received the B.E. degree in electrical and electronic engineering and the M.E. degree in physical electronics from the Tokyo Institute of Technology, Tokyo, Japan, in 2015 and 2017, respectively. In 2017, he joined NTT Device Technology Laboratories, Atsugi, Japan, where he was engaged in research on high-speed optical interconnects and deep learning inference accelerators using field-programmable gate arrays (FPGAs). In 2019, he joined Kioxia Corporation, Kawasaki, Japan, where he is involved in research and development of analog mixed-signal circuits and architectures for advanced high-speed wireline communication. His current interests include high-speed wireline transceivers and high-speed analog-to-digital converters.



**Go Urakawa** received the B.E. and M.E. degrees from Kyushu University, Fukuoka, Japan, in 2002 and 2004, respectively. In 2004, he joined the circuit design section for high-frequency analog integrated circuits in Semiconductor Company, Toshiba Corporation, Yokohama, Japan, where he was engaged in the development of integrated PLLs. In 2017, he joined Kioxia Corporation, Kawasaki, Japan. He has been engaged in research and development of advanced circuit design for high-speed I/O.



**Takashi Toi** received the B.S. and M.S. degrees in electrical and electronic engineering from the University of Tokyo, Tokyo, Japan, in 2014 and 2016, respectively. In 2016, he joined the Center for Semiconductor Research & Development, Toshiba Corporation, where he was involved in the development of clock and data recovery circuits for high-speed wireline communication. In 2017, he moved to Kioxia Corporation, Kawasaki, Japan. His present research interests include ultra-high-speed I/O design.



**Mitsuyuki Ashida** received the B.S. degree in physics from Science University of Tokyo, Tokyo, Japan, in 1999, and the M.S. degree in electronics from Tokyo Institute of Technology, Tokyo, Japan, in 2001. In 2001, he joined the Research & Development Center, Toshiba Corp., Kawasaki, Japan. From 2004, he was with the Center for Semiconductor Research & Development, Kawasaki, Japan. In 2017, he joined Kioxia Corporation, Kawasaki, Japan, and he has been engaged in the design of analog circuits for high-speed wireline communications.



**Hiroyuki Kobayashi** received the B.E. degree in electronic engineering from the Osaka Institute of Technology, Osaka, Japan, in 1998 and the M.E. degree in electronic engineering from Osaka University, Suita, Japan, in 2000. In 2000, he joined Toshiba Corporation, Kawasaki, Japan, where he was involved in the research and development of analog and RF circuits for wireless communications. In 2017, he joined Kioxia Corporation, Kawasaki, Japan. He has been engaged in research and development of advanced circuit design for high-speed I/O.



**Yuta Tsubouchi** received the B.E. and M.S. degrees in electronic engineering from Kyoto Institute of Technology, Japan, in 2007 and 2009, respectively. In 2009, he joined the Corporate Research & Development Center, Toshiba Corporation, where he was involved in the development of millimeter-wave transceivers and optical transceivers. From 2012 to 2017, he was an analog circuit engineer with the Center for Semiconductor Research & Development, Toshiba Corporation, Kawasaki, Japan. He is now with the Institute of Memory Technology Research & Development, Kioxia Corporation, Kawasaki, Japan. His present research interests include signal and power integrity of high-speed PCB systems.



**Mai Nozawa** received the B.E. and M.E. degrees from Waseda University, Tokyo, Japan, in 2004 and 2006, respectively. In 2006, she joined Toshiba Corporation, Kawasaki, Japan, where she was involved in the research and development of analog integrated circuits for wireless communications. In 2017, she joined Kioxia Corporation, Kawasaki, Japan. She has been engaged in research and development of advanced circuit design for high-speed I/O.



**Junji Wadatsumi** received the B.E. and M.E. degrees from Tokyo Institute of Technology, Tokyo, Japan, in 2003 and 2005, respectively. In 2005, he joined the Center for Semiconductor Research & Development, Toshiba Corporation, Kawasaki, Japan, where he was engaged in research and development of analog and RF circuits for wireless communications. In 2017, he moved to the Institute of Memory Technology Research & Development, Kioxia Corporation, and he has been engaged in the design of analog circuits and systems for high-speed wireline communications.



**Jun Deguchi** received the B.E. and M.E. degrees in machine intelligence and systems engineering and the Ph.D. degree in bioengineering and robotics from Tohoku University, Sendai, Japan, in 2001, 2003, and 2006, respectively. In 2004, he was a Visiting Scholar at the University of California, Santa Cruz, CA, USA. In 2006, he joined Toshiba Corporation, and was involved in the design of analog/RF circuits for wireless communications, CMOS image sensors, high-speed I/O, and accelerators for deep learning. From 2014 to 2015, he was a Visiting Scientist at the MIT Media Lab, Cambridge, MA, USA, and was involved in research on brain/neuro science. In 2017, he moved to Kioxia Corporation (formerly Toshiba Memory Corporation), and has been a Research Lead of an advanced circuit design team working on high-speed I/O, deep learning/neuromorphic accelerators, and quantum annealing. Dr. Deguchi has served as a member of the technical program committee (TPC) of the IEEE International Solid-State Circuits Conference (ISSCC) since 2016, and of the IEEE Asian Solid-State Circuits Conference (A-SSCC) since 2017. He has also served as a TPC vice-chair of IEEE A-SSCC 2019, and as a review committee member of the IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) 2020.