1-9hit |
Mitsuo IKEDA Toshio KONDO Koyo NITTA Kazuhito SUGURI Takeshi YOSHITOME Toshihiro MINAMI Jiro NAGANUMA Takeshi OGURA
This paper presents an architecture for a single-chip MPEG-2 video encoder and demonstrates its flexibility and usefulness. The architecture based on three-layer cooperation provides flexible data-transfer that improves the encoder from the standpoints of versatility, scalability, and video quality. The LSI was successfully fabricated in the 0.25-µm four-metal CMOS process. Its small size and its low power consumption make it ideal for a wide range of applications, such as DVD recorders, PC-card encoders and HDTV encoders.
Hiroyuki UZAWA Kazuhiko TERADA Koyo NITTA
The power consumption of optical network units (ONUs) is a major issue in optical access networks. The downstream buffer is one of the largest power consumers among the functional blocks of an ONU. A cyclic sleep scheme for reducing power has been reported, which periodically powers off not only the downstream buffer but also other components, such as optical transceivers, when the idle period is long. However, when the idle period is short, it cannot power off those components even if the input data rate is low. Therefore, as continuous traffic, such as video, increases, the power-reduction effect decreases. To resolve this issue, we propose another sleep scheme in which the downstream buffer can be partially powered off by cooperative operation with an optical line terminal. Simulation and experimental results indicate that the proposed scheme reduces ONU power consumption without causing frame loss even while the ONU continuously receives traffic and the idle period is short.
Koyo NITTA Toshihiro MINAMI Toshio KONDO Takeshi OGURA
This paper describes a unique motion estimation and compensation (ME/MC) hardware architecture for a scene-adaptive algorithm. By statistically analyzing the characteristics of the scene being encoded and controlling the encoding parameters according to the scene, the quality of the decoded image can be enhanced. The most significant feature of the architecture is that the two modules for ME/MC can work independently. Since a time interval can be inserted between the operations of the two modules, a scene-adaptive algorithm can be implemented in the architecture. The ME/MC architecture is loaded on a single-chip MPEG-2 video encoder.
Koyo NITTA Hiroe IWASAKI Takayuki ONISHI Takashi SANO Atsushi SAGATA Yasuyuki NAKAJIMA Minoru INAMORI Ryuichi TANIDA Atsushi SHIMIZU Ken NAKAMURA Mitsuo IKEDA Jiro NAGANUMA
An H.264/AVC encoder LSI (named “SARA”) that supports High422 profile, as well as 422 profile of MPEG-2, has been developed for HDTV broadcasting infrastructures. It contains three motion estimation and compensation (ME/MC) engines with wide search ranges of -217.75 to +199.75 horizontally, -109.75 to +145.75 vertically, which can utilize almost all H.264/AVC ME/MC coding tools, such as multiple reference frame, variable block size, quarter-pel prediction, macroblock adaptive field/frame prediction (MBAFF), spatial/temporal direct mode, and weighted prediction. Our evaluations show that it can encode fast moving scenes with 1.2 dB to 1.7 dB higher than the JM. It was successfully fabricated in a 90-nm technology, and integrates 140 million transistors.
Saki HATTA Nobuyuki TANAKA Hiroyuki UZAWA Koyo NITTA
The application of network function virtualization (NFV) and software-defined networking (SDN) to passive optical networks (PONs) is attracting attention for the deployment of cost-effective access network systems. This paper presents a novel architecture of a programmable finite state machine (P-FSM) as a hardware accelerator for protocol processing in an optical line terminal (OLT). The P-FSM is programmable hardware that manages various types of FSMs to enhance flexibility in OLTs and achieve wired-rate performance with a negligible increase in total chip area. The P-FSM is implemented using three key technologies: a specific architecture for state management of communications protocols to minimize the logic area, a memory distributed implementation to minimize the program memory, and a new branch operation to minimize the memory area and reduce processing time. Evaluation results show that the P-FSM can handle 10G-EPON/NG-PON2 communications protocols in the same architecture while achieving wired-rate performance. The increase in the total designed area is only 1.5% to 4.9% depending on the number of protocols supported compared to the area of a conventional communications SoC without flexibility. We also clarify that our architecture has the scalability needed to modify the number of FSMs and the maximum number of ONU connections according to the system scale.
Yuta UKON Koji YAMAZAKI Koyo NITTA
Advanced information-processing services based on cloud computing are in great demand. However, users want to be able to customize cloud services for their own purposes. To provide image-processing services that can be optimized for the purpose of each user, we propose a technique for chaining image-processing functions in a CPU-field programmable gate array (FPGA) coupled server architecture. One of the most important requirements for combining multiple image-processing functions on a network, is low latency in server nodes. However, large delay occurs in the conventional CPU-FPGA architecture due to the overheads of packet reordering for ensuring the correctness of image processing and data transfer between the CPU and FPGA at the application level. This paper presents a CPU-FPGA server architecture with a real-time packet reordering circuit for low-latency image processing. In order to confirm the efficiency of our idea, we evaluated the latency of histogram of oriented gradients (HOG) feature calculation as an offloaded image-processing function. The results show that the latency is about 26 times lower than that of the conventional CPU-FPGA architecture. Moreover, the throughput decreased by less than 3.7% under the worst-case condition where 90 percent of the packets are randomly swapped at a 40-Gbps input rate. Finally, we demonstrated that a real-time video monitoring service can be provided by combining image processing functions using our architecture.
Kazuyoshi TAKAGI Koyo NITTA Hironori BOUNO Yasuhiko TAKENAGA Shuzo YAJIMA
Ordered Binary Decision Diagrams (OBDDs) are graph-based representations of Boolean functions which are widely used because of their good properties. In this paper, we introduce nondeterministic OBDDs (NOBDDs) and their restricted forms, and evaluate their expressive power. In some applications of OBDDs, canonicity, which is one of the good properties of OBDDs, is not necessary. In such cases, we can reduce the required amount of storage by using OBDDs in some non-canonical form. A class of NOBDDs can be used as a non-canonical form of OBDDs. In this paper, we focus on two particular methods which can be regarded as using restricted forms of NOBDDs. Our aim is to show how the size of OBDDs can be reduced in such forms from theoretical point of view. Firstly, we consider a method to solve satisfiability problem of combinational circuits using the structure of circuits as a key to reduce the NOBDD size. We show that the NOBDD size is related to the cutwidth of circuits. Secondly, we analyze methods that use OBDDs to represent Boolean functions as sets of product terms. We show that the class of functions treated feasibly in this representation strictly contains that in OBDDs and contained by that in NOBDDs.
Ken NAKAMURA Daisuke KOBAYASHI Yuya OMORI Tatsuya OSAWA Takayuki ONISHI Koyo NITTA Hiroe IWASAKI
In this paper, we describe a novel low-delay 4K 120-fps real-time HEVC decoder with a parallel processing architecture that conforms to the HEVC main 4:2:2 10 profile. It supports the hierarchical temporal scalable streams required for Ultra High Definition high-frame-rate broadcasting and also supports low-delay and high-bitrate decoding for video transmission uses. To achieve this support, the decoding processes are parallelized and pipelined at the frame level, slice level, and coding tree unit row level. The proposed decoder was implemented on three FPGAs operated at 133 and 150 MHz, and it achieved 300-Mbps stream decoding and 37-msec end-to-end delay with our concurrently developed 4K 120-fps encoder.
Ken NAKAMURA Yuya OMORI Daisuke KOBAYASHI Koyo NITTA Kimikazu SANO Masayuki SATO Hiroe IWASAKI Hiroaki KOBAYASHI
This paper proposes an efficient reference image sharing method for the image-division parallel video encoding architecture. This method efficiently reduces the amount of data transfer by using pre-transfer with area prediction and on-demand transfer with a transfer management table. Experimental results show that the data transfer can be reduced to 19.8-35.3% of the conventional method on average without major degradation of coding performance. This makes it possible to reduce the required bandwidth of the inter-chip transfer interface by saving the amount of data transfer.