Dynamic linear feedback shift registers (DLFSRs) are a scheme for switching from one LFSR to another. In cryptography, each LFSR included in a DLFSR should generate maximal-length sequences, and the number of switches used to transfer between LFSRs should be small for efficient performance. This correspondence addresses the search for DLFSRs satisfying these conditions. An efficient probabilistic algorithm is given to find such DLFSRs with two or four switches, and it is proved to succeed with nonnegligible probability.
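As an aside on what "maximal-length" means in the abstract above, the following is a minimal Python sketch, not the algorithm of the paper. It steps a single Fibonacci LFSR and measures the period of its state sequence; an n-bit LFSR is maximal-length when that period is 2^n - 1. The function name `lfsr_period` and the tap-numbering convention are assumptions made for this illustration.

```python
def lfsr_period(taps, nbits, seed=1):
    """Return the state-cycle length of a Fibonacci LFSR.

    taps: 1-indexed bit positions XORed to form the feedback bit
          (assumed convention: position 1 is the least significant bit).
    """
    if nbits not in taps:
        # Without a tap on the bit that is shifted out, the update is not
        # invertible and the state need not return to the seed.
        raise ValueError("taps must include position nbits")
    mask = (1 << nbits) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero")
    start = state
    period = 0
    while True:
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        # Shift left by one and insert the feedback bit at the bottom.
        state = ((state << 1) | fb) & mask
        period += 1
        if state == start:
            return period


# Taps (4, 3) give the full period 2**4 - 1; taps (4, 2) do not.
print(lfsr_period((4, 3), 4), lfsr_period((4, 2), 4))
```

In this sketch the taps (4, 3) reach all 15 non-zero states, whereas (4, 2) cycles through only 6 of them, so only the first LFSR would qualify as a component of a DLFSR under the maximal-length requirement.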
Yizhou JIANG Sai HUANG Yixin ZHANG Zhiyong FENG Di ZHANG Celimuge WU
This letter proposes a novel modulation classification method for overlapped sources, named LRGP, involving multinomial logistic regression (MLR) and multi-gene genetic programming (MGGP). MGGP-based feature engineering is conducted to transform the cumulants of the received signals into highly discriminative features, and an MLR-based classifier is trained to identify the combination of the modulation formats of the overlapped sources instead of performing signal separation. Extensive simulations demonstrate that LRGP yields superior performance compared with existing methods.
Naoki FUJIEDA Kiyohiro SATO Ryodai IWAMOTO Shuichi ICHIKAWA
Instruction set randomization (ISR) is a cost-effective obfuscation technique that modifies or enhances the relationship between instructions and machine languages. An Instruction Register File (IRF), a list of frequently used instructions, can be used for ISR by providing a means of indirect access to them. This study examines the IRF that integrates a positional register, which was proposed as a supplementary unit of the IRF, for the sake of tamper resistance. According to our evaluation, with a new design for the contents of the positional register, the measure of tamper resistance was increased by 8.2% at a maximum, which corresponds to a 32.2% increase in the size of the IRF. The number of logic elements added by the positional register was 3.5% of that of the baseline processor.
Yusuke KIMURA Amir Masoud GHAREHBAGHI Masahiro FUJITA
In the process of VLSI design, an ECO (Engineering Change Order) may occur at any design phase. When an ECO happens after the netlist is generated and optimized, designers may prefer to modify the netlist directly. This is because if the ECO is performed in the high-level description, the netlist must be resynthesized and the result may be significantly different from the original one, even if the modification in the high-level description is small. As a result, the efforts spent on optimization so far may become useless. When the netlist is modified directly, the C description should be revised accordingly. This paper proposes a method to reconstruct a C description from the revised netlist. In the proposed method, designers need to provide a template represented in C, which has some vacant (blanked) places and is created from the original C description. The vacant places are automatically synthesized using a CEGIS-based method (Counter Example Guided Inductive Synthesis). Using a set of use-cases, our method tries to find the correct expressions for the vacant places so that the entire description becomes functionally equivalent to the given modified netlist, by only simulating the netlist. Experimental results show that the proposed method can reconstruct C descriptions successfully within practical time for several examples, including one having around 9,000 lines of executable statements. Moreover, the proposed method can be applied to equivalence checking between a netlist and a C description, as shown by our experimental results.
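The CEGIS loop mentioned in the abstract above can be sketched as follows. This is a toy Python illustration under stated assumptions, not the method of the paper: `simulate_netlist` is a hypothetical stand-in for netlist simulation, the template has a single vacant place, the candidate expressions are a fixed hand-written list, and the verification step is exhaustive simulation over a small 4-bit input domain.

```python
from itertools import product

MASK = 0xF  # 4-bit values keep the example small (assumption)


def simulate_netlist(a, b):
    # Hypothetical stand-in for simulating the revised netlist.
    return ((a | b) ^ a) & MASK


# Candidate expressions for the vacant place (hypothetical search space).
CANDIDATES = {
    "a & b": lambda a, b: a & b,
    "a | b": lambda a, b: a | b,
    "a ^ b": lambda a, b: a ^ b,
    "a + b": lambda a, b: (a + b) & MASK,
}


def template(hole, a, b):
    # Models a C template of the form: out = (<vacant place>) ^ a;
    return (hole(a, b) ^ a) & MASK


def cegis(domain):
    """Return (expression, counterexamples), or (None, counterexamples)."""
    examples = []  # input vectors collected as counterexamples
    while True:
        # Synthesis step: first candidate consistent with all examples so far.
        chosen = next(
            (name for name, f in CANDIDATES.items()
             if all(template(f, a, b) == simulate_netlist(a, b)
                    for a, b in examples)),
            None,
        )
        if chosen is None:
            return None, examples
        # Verification step: look for an input where template and netlist differ.
        cex = next(
            ((a, b) for a, b in domain
             if template(CANDIDATES[chosen], a, b) != simulate_netlist(a, b)),
            None,
        )
        if cex is None:
            return chosen, examples
        examples.append(cex)  # refine and try again


expr, cexs = cegis(list(product(range(16), repeat=2)))
print(expr, cexs)
```

Each counterexample rules out at least the candidate that produced it, so the loop terminates once the candidate list is exhausted or a candidate survives verification.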
MohammadAmin LOTFOLAHI Cheng-Zen YANG I-Shyan HWANG AliAkbar NIKOUKAR Yu-Hua WU
Ethernet passive optical network (EPON) is one of the energy-efficient access networks. Many studies have been done to reach maximum energy saving in the EPON. However, it is a trade-off between achieving maximum energy saving and guaranteeing QoS. In this paper, a predictive doze mode mechanism in an enhanced EPON architecture is proposed to achieve energy saving by using a logistic regression (LR) model. The optical line terminal (OLT) in the EPON employs an enhanced Doze Manager practicing the LR model to predict the doze periods of the optical network units (ONUs). The doze periods are estimated more accurately based on the historical high-priority traffic information, and logistic regression DBA (LR-DBA) performs dynamic bandwidth allocation accordingly. The proposed LR-DBA mechanism is compared with a scheme without energy saving (IPACT) and another scheme with energy saving (GDBA). Simulation results show that LR-DBA effectively improves the power consumption of ONUs in most cases, and the improvement can be up to 45% while it guarantees the QoS metrics, such as the high-priority traffic delay and jitter.
A 2nd-order ΔΣAD modulator architecture is proposed to simplify the operation phase using a ring amplifier and an SAR quantizer. The proposed modulator architecture can guarantee the reset time for the ring amplifier and relax the speed requirement on the asynchronous SAR quantizer. The SPICE simulation results demonstrate the feasibility of the proposed 2nd-order ΔΣAD modulator in 90nm CMOS technology. A simulated SNDR of 95.70dB is achieved when a -1dBFS sinusoidal input is sampled at 60MS/s with a bandwidth of BW=470kHz. The power consumption of the analog part of the modulator is 1.67mW at a supply voltage of 1.2V.
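To put the figures quoted above in context, the short Python sketch below applies two textbook relations that are not stated in the abstract itself and are assumed here: the ideal-quantizer relation SNDR = 6.02·ENOB + 1.76 dB, and the oversampling ratio OSR = fs / (2·BW). The function names are hypothetical.

```python
def enob_from_sndr(sndr_db):
    # Effective number of bits from SNDR (standard ideal-quantizer relation).
    return (sndr_db - 1.76) / 6.02


def oversampling_ratio(fs_hz, bw_hz):
    # Ratio of the sampling rate to the Nyquist rate of the signal band.
    return fs_hz / (2.0 * bw_hz)


enob = enob_from_sndr(95.70)            # roughly 15.6 bits
osr = oversampling_ratio(60e6, 470e3)   # roughly 63.8
print(f"ENOB = {enob:.2f} bits, OSR = {osr:.1f}")
```

Under these assumptions, the reported SNDR corresponds to a resolution of about 15.6 effective bits at an oversampling ratio of about 64.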
Ryo FUJIMOTO Takanori FUJISAWA Masaaki IKEHARA
This paper proposes a novel method to estimate the non-integer shift of images based on least squares approximation in the phase region. Conventional methods based on Phase Only Correlation (POC) take the correlation between an image and its shifted image, and then estimate the non-integer shift by fitting the model equation. The problem when estimating using POC is that the estimated peak of the fitted model equation may not match the true peak of the POC function. This causes errors in non-integer shift estimation. By calculating the phase difference directly in the phase region, the proposed method allows the estimation of sub-pixel shift through least squares approximation. In addition, by utilizing the characteristics of natural images, the proposed method limits the range adopted for the least squares approximation. With these improvements, the proposed method achieves high accuracy, which we validate through several examples.
Kotaro TERADA Masao YANAGISAWA Nozomu TOGAWA
As application hardware must be designed and implemented in a short time, high-level synthesis is becoming an increasingly essential EDA technique. In the deep-submicron era, interconnection delays are not negligible even in high-level synthesis, and thus distributed-register and -controller architectures (DR architectures) have been proposed to cope with this problem. It is also profitable to take data bitwidth into account in high-level synthesis. In this paper, we propose a bitwidth-aware high-level synthesis algorithm using operation chainings targeting Tiled-DR architectures. Our proposed algorithm optimizes the bitwidths of functional units and utilizes the vacant tiles by adding some extra functional units to realize effective operation chainings, generating high-performance circuits without increasing the total area. Experimental results show that our proposed algorithm reduces the overall latency by up to 47% compared to the conventional approach without area overheads by eliminating unnecessary bitwidths and adding efficient extra FUs for Tiled-DR architectures.
Mobility management is very important in mobile cellular networks, since to connect incoming calls, the network must maintain the locations of the mobiles. This study considers the zone-based registration methods that most mobile cellular networks have adopted. We focus on two special zone-based registration methods, called two-zone registration (2Z), and two-zone registration with implicit registration by outgoing calls (2Zi). Although some mathematical models for their performances have been presented, they still cannot accurately estimate 2Zi performance. We provide a new and simple mathematical model based on Markov chain theory that can accurately analyze the performances of 2Z and 2Zi. We also explain the propositions underlying the explicit expressions adopted by our model. We finally present various numerical results, to compare the performance of 2Zi with those of 2Z and one-zone registration (1Z), and show that in every case, 2Zi is superior to 2Z, and in most practical cases, to 1Z.
Kousuke IMAMURA Ryota HONDA Yoshifumi KAWAMURA Naoki MIURA Masami URANO Satoshi SHIGEMATSU Tetsuya MATSUMURA Yoshio MATSUDA
The development of an extremely efficient packet inspection algorithm for lookup engines is important in order to realize high throughput and to lower energy dissipation. In this paper, we propose a new lookup engine based on a combination of a mismatch detection circuit and a linked-list hash table. The engine has an automatic rule registration and deletion function; as a result, it is only necessary to input rules, and the various tables included in the circuits, such as the Mismatch Table, Index Table, and Rule Table, are automatically configured using the embedded hardware. This function utilizes a match/mismatch assessment for normal packet inspection operations. An experimental chip was fabricated using 40-nm 8-metal CMOS process technology. The chip operates at a frequency of 100MHz under a power supply voltage of VDD=1.1V. A throughput of 100Mpacket/s (=51.2Gb/s) is obtained at an operating frequency of 100MHz, which is three times greater than the throughput of 33Mpacket/s obtained with a conventional lookup engine without a mismatch detection circuit. The measured energy dissipation was 1.58pJ/b·Search.
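The throughput figures above can be cross-checked with simple arithmetic. The Python sketch below is only a consistency check on the quoted numbers; the packet size it derives is inferred from them and is not stated in the abstract, and the function names are hypothetical.

```python
def bits_per_packet(throughput_bps, packets_per_s):
    # Packet size implied by a bit rate and a packet rate.
    return throughput_bps / packets_per_s


def packets_per_clock(packets_per_s, clock_hz):
    # Number of lookups completed per clock cycle.
    return packets_per_s / clock_hz


size_bits = bits_per_packet(51.2e9, 100e6)   # implied packet size in bits
per_cycle = packets_per_clock(100e6, 100e6)  # lookups per clock cycle
speedup = 100 / 33                           # versus the conventional engine
print(size_bits, per_cycle, round(speedup, 2))
```

Read this way, 100Mpacket/s at 100MHz means one lookup per clock cycle, and 51.2Gb/s corresponds to 512 bits (64 bytes) per packet.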
Kazuyoshi TSUCHIYA Yasuyuki NOGAMI
Pseudorandom number generators have been widely used in Monte Carlo methods, communication systems, cryptography and so on. For cryptographic applications, pseudorandom number generators are required to generate sequences which have good statistical properties, long period and unpredictability. A Dickson generator is a nonlinear congruential generator whose recurrence function is the Dickson polynomial. Aly and Winterhof obtained a lower bound on the linear complexity profile of a Dickson generator. Moreover, Vasiga and Shallit studied the state diagram given by the Dickson polynomial of degree two. However, they do not specify sets of initial values which generate a long period sequence. In this paper, we show conditions for parameters and initial values to generate long period sequences, and asymptotic properties for periods by numerical experiments. We specify sets of initial values which generate a long period sequence. For suitable parameters, every element of this set occurs exactly once as a component of the generated sequence in one period. In order to obtain sets of initial values, we consider a logistic generator proposed by Miyazaki, Araki, Uehara and Nogami, which is obtained from a Dickson generator of degree two with a linear transformation. Moreover, we remark on the linear complexity profile of the logistic generator. The sets of initial values are described by values of the Legendre symbol. The main idea is to introduce a structure of a hyperbola to the sets of initial values. Our results ensure that the sequences generated by the Dickson generator of degree two have long period. As a consequence, the Dickson generator of degree two has some good properties for cryptographic applications.
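To illustrate why the choice of initial value matters in the abstract above, the following is a minimal Python sketch, not taken from the paper. It iterates the degree-two Dickson polynomial D_2(x, a) = x^2 - 2a modulo a prime p and reports the tail and cycle length of the resulting sequence. The function names and the small primes are assumptions chosen for readability; cryptographic parameters would be far larger.

```python
def dickson2(x, a, p):
    # Dickson polynomial of degree two, D_2(x, a) = x^2 - 2a, reduced mod p.
    return (x * x - 2 * a) % p


def orbit_shape(x0, a, p):
    """Return (tail_length, cycle_length) for x_{n+1} = D_2(x_n, a) mod p."""
    seen = {}  # value -> index at which it first appeared
    x, n = x0 % p, 0
    while x not in seen:
        seen[x] = n
        x = dickson2(x, a, p)
        n += 1
    return seen[x], n - seen[x]


# With p = 17 and a = 1, different seeds give very different orbits.
for x0 in (0, 3, 7):
    print(x0, orbit_shape(x0, 1, 17))
```

Even in this toy setting some seeds fall into a fixed point almost immediately while others enter a longer cycle, which is the behaviour that a specification of good initial values is meant to control.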
Hideo FUJIWARA Katsuya FUJIWARA
In our previous work, we introduced new concepts of secure scan design; shift register equivalent circuits (SR-equivalents, for short) and strongly secure circuits, and also introduced generalized shift registers (GSRs, for short) to apply them to secure scan design. In this paper, we combine both concepts of SR-equivalents and strongly secure circuits and apply them to GSRs, and consider the synthesis problem of strongly secure SR-equivalents using GSRs. We also consider the enumeration problem of GSRs that are strongly secure and SR-equivalent, i.e., the cardinality of the class of strongly secure SR-equivalent GSRs to clarify the security level of the secure scan architecture.
Ming LI Yupeng JIANG Dongdai LIN Qiuyan WANG
We regard a De Bruijn sequence of order n as a bijection on $\mathbb{F}_2^n$ and consider the transition mappings between them. It is shown that there are only two conjugate transformations that always transfer De Bruijn sequences to De Bruijn sequences.
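One natural reading of "a De Bruijn sequence as a bijection", assumed here for illustration, is the successor map that sends each n-bit window of the cyclic sequence to the next window. The Python sketch below is not from the paper: it builds a binary De Bruijn sequence with the standard Lyndon-word construction and checks that this successor map is a bijection on the n-bit states. The names `de_bruijn` and `successor_map` are hypothetical.

```python
def de_bruijn(n):
    """Binary De Bruijn sequence of order n (standard Lyndon-word construction)."""
    a = [0] * (n + 1)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


def successor_map(seq, n):
    """Map each cyclic n-bit window of seq to the window that follows it."""
    length = len(seq)
    windows = [tuple(seq[(i + j) % length] for j in range(n))
               for i in range(length)]
    return {windows[i]: windows[(i + 1) % length] for i in range(length)}


seq = de_bruijn(3)
succ = successor_map(seq, 3)
# All 2**3 states appear once as a key and once as a value.
print(seq, len(succ), set(succ) == set(succ.values()))
```

Because every n-bit state occurs exactly once in one period, the successor map permutes all 2^n states in a single cycle.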
Junji YAMADA Ushio JIMBO Ryota SHIOYA Masahiro GOSHIMA Shuichi SAKAI
The region that includes the register file is a hot spot in high-performance cores that limits the clock frequency. Although multibanking drastically reduces the area and energy consumption of the register files of superscalar processor cores, it suffers from low IPC due to bank conflicts. Our skewed multistaging drastically reduces not the bank conflict probability but the pipeline disturbance probability by the second stage. The evaluation results show that, compared with NORCS, which is the latest research on a register file for area and energy efficiency, a proposed register file with 18 banks achieves a 39.9% and 66.4% reduction in circuit area and in energy consumption, while maintaining a relative IPC of 97.5%.
Junji YAMADA Ushio JIMBO Ryota SHIOYA Masahiro GOSHIMA Shuichi SAKAI
An 8-issue superscalar core generally requires a 24-port RAM for the register file. The area and energy consumption of a multiported RAM increase in proportion to the square of the number of ports. A register cache can reduce the area and energy consumption of the register file. However, earlier register cache systems suffer from lower IPC caused by register cache misses. Thus, we proposed the Non-Latency-Oriented Register Cache System (NORCS) to solve the IPC problem with a modified pipeline. We evaluated NORCS mainly from the viewpoint of microarchitecture in the original article, and showed that NORCS maintains almost the same IPC as conventional register files. Researchers at NVIDIA adopted the same idea for their GPUs. However, the evaluation was not sufficient from the viewpoint of LSI design. In the original article, we used CACTI to evaluate the area and energy consumption. CACTI is a design space exploration tool for cache design, and adopts some rough approximations. Therefore, this paper presents a design of NORCS with FreePDK45, an open source process design kit for 45nm technology. We performed manual layout of the memory cells and arrays of NORCS, and executed SPICE simulation with RC parasitics extracted from the layout. The results show that, compared with a full-port register file, an 8-entry NORCS achieves a 75.2% and 48.2% reduction in area and energy consumption, respectively. The results also include the latency, which we did not present in our original article. The critical-path latencies are 307ps and 318ps for an 8-entry NORCS and a conventional multiported register file, respectively, when the same two cycles are allocated to register file read.
Wei GAO Lin HAN Rongcai ZHAO Yingying LI Jian LIU
Single-instruction multiple-data (SIMD) extension provides an energy-efficient platform to scale the performance of media and scientific applications while still retaining post-programmability. However, the major challenge is to translate the parallel resources of the SIMD hardware into real application performance. Currently, all the slots in the vector register are used when compilers exploit the SIMD parallelism of programs, which can be called sufficient vectorization. Sufficient vectorization means that all the data in the vector register is valid. Because all the slots that the vector register provides must be used, the sufficient vectorization method gives up opportunities to vectorize programs with low SIMD parallelism. In addition, the speedup obtained by full use of the vector register is sometimes not as great as that obtained by partial use. Specifically, as the vector registers provided by SIMD extensions become longer, the sufficient vectorization method cannot exploit the SIMD parallelism of programs completely. Therefore, an insufficient vectorization method is proposed, which refers to partial use of the vector register. First, the scenarios to which insufficient vectorization applies are analyzed. Second, methods of computing the inter-iteration and intra-iteration SIMD parallelism of loops are put forward. Furthermore, according to the relationship between the parallelism and the vector factor, a method is established to choose the vectorization method, in order to vectorize programs as well as possible. Finally, a code generation strategy for insufficient vectorization is presented. Benchmark test results show that the insufficient vectorization method vectorizes 107.5% more programs than the sufficient vectorization method, and the performance achieved by the insufficient vectorization method is 12.1% higher than that achieved by the sufficient vectorization method.
This paper proposes a low-power single-ended successive approximation register (SAR) analog-to-digital converter (ADC) that replaces the only analog active circuit, the comparator, with a digital circuit, namely an inverter-based comparator. The replacement facilitates design automation. The impact of inverter threshold voltage variation is minimal because an SAR ADC has only one comparator, and many applications are either insensitive to the resulting ADC offset or can easily correct it digitally. The proposed resetting approach mitigates leakage when the input is close to the threshold voltage. As an intrinsically headroom-free, and thus low-rail-voltage-friendly, structure, an inverter-based comparator also occupies a small area. Furthermore, an 11-bit ADC was designed and manufactured through a 0.35-µm CMOS process by adopting a low-power switching procedure. The ADC achieves an FOM of 181fJ/Conv.-step at a 25kS/s sampling rate when the supply voltage VDD is 1.2V.
The three-dimensional (3D) reconstruction of a medical image sequence can provide intuitive morphologies of a target and help doctors to make more reliable diagnoses and devise proper treatment plans. This paper aims to reconstruct the surface of a renal corpuscle from a microscope renal biopsy image sequence. First, the contours of the renal corpuscle in all slices are extracted automatically by using a context-based segmentation method with a coarse registration. Then, a new coevolutionary-based strategy is proposed to realize a fine registration. Finally, a Gauss-Seidel iteration method is introduced to achieve a non-rigid registration. Benefiting from the registrations, a smooth surface of the target can be reconstructed easily. Experimental results show that the proposed method can effectively register the contours and give an acceptable surface for medical doctors.
Lin GAO Jian HUANG Wen SUN Ping WEI Hongshu LIAO
The cardinality balanced multi-target multi-Bernoulli (CBMeMBer) filter has emerged as a promising tool for tracking a time-varying number of targets. However, the standard CBMeMBer filter may perform poorly when measurements are coupled with sensor biases. This paper extends the CBMeMBer filter for simultaneous target tracking and sensor biases estimation by introducing the sensor translational biases into the multi-Bernoulli distribution. In the extended CBMeMBer filter, the biases are modeled as the first order Gauss-Markov process and assumed to be uncorrelated with target states. Furthermore, the sequential Monte Carlo (SMC) method is adopted to handle the non-linearity and the non-Gaussian conditions. Simulations are carried out to examine the performance of the proposed filter.
Hideo FUJIWARA Katsuya FUJIWARA
We reported a secure scan design approach using shift register equivalents (SR-equivalents, for short) that are functionally equivalent but not structurally equivalent to shift registers [10], and also introduced generalized shift registers (GSRs, for short) to apply them to secure scan design [11]-[13]. In this paper, we combine both concepts of SR-equivalents and GSRs and consider the synthesis problem of SR-equivalent GSRs, i.e., how to modify a given GSR to an SR-equivalent GSR. We also consider the enumeration problem of SR-equivalent GSRs, i.e., the cardinality of the class of SR-equivalent GSRs to clarify the security level of the secure scan architecture.