1-9hit |
The CC-Link proposed by the Mitsubishi Electric Company is an industrial network used exclusively in most industries. However, the probabilities of data loss and interference with equipment control increase if the transmission time is greater than the link scan time of 381µs. The link scan time can be reduced by designing the CC-Link module as an external microprocessor (MPU) interface of R-IN32M3; however, it then suffers from expandability issues. Thus, in this paper, we propose a new CC-Link module utilizing R-IN32M3 to improve the expandability. In our designed CC-Link module, we devise a dual-port RAM (DPRAM) function in an external I/O module, which enables parallel communication between the DPRAM and the external MPU. Our experiment with the implemented CC-Link prototype demonstrates that our CC-Link design improves the communication speed owing to the parallel communication between DPRAM and external MPU, and expandability of remote I/O. Our design achieves miniaturization of the CC-Link module, wiring reduction, and an approximately 30% reduction in the link scan time. Furthermore, because we utilize both the Renesas R-IN32M3 and Xilinx XC95144XL chips widely used in diverse application areas, the designed CC-Link module reduces the investment cost. The proposed design is expected to significantly contribute to the utilization of the programmable logic controller memory and I/O expansion for factory automation and improvement of the investment efficiency in the flat panel display industry.
Sanchuan GUO Zhenyu LIU Guohong LI Takeshi IKENAGA Dongsheng WANG
H.264 video codec system requires big capacity and high bandwidth of Frame Store (FS) for buffering reference frames. The up-to-date three dimensional (3D) stacked Phase change Random Access Memory (PRAM) is the promising approach for on-chip caching the reference signals, as 3D stacking offers high memory bandwidth, while PRAM possesses the advantages in terms of high density and low leakage power. However, the write endurance problem, that is a PRAM cell can only tolerant limited number of write operations, becomes the main barrier in practical applications. This paper studies the wear reduction techniques of PRAM based FS in H.264 codec system. On the basis of rate-distortion theory, the content oriented selective writing mechanisms are proposed to reduce bit updates in the reference frame buffers. With the proposed control parameter a, our methods make the quantitative trade off between the quality degradation and the PRAM lifetime prolongation. Specifically, taking a in the range of [0.2,2], experimental results demonstrate that, our methods averagely save 29.9–35.5% bit-wise write operations and reduce 52–57% power, at the cost of 12.95–20.57% BDBR bit-rate increase accordingly.
Xu LI Kai LU Xiaoping WANG Bin DAI Xu ZHOU
Existing large-scale systems suffer from various hardware/software failures, motivating the research of fault-tolerance techniques. Checkpoint-restart techniques are widely applied fault-tolerance approaches, especially in scientific computing systems. However, the overhead of checkpoint largely influences the overall system performance. Recently, the emerging byte-addressable, persistent memory technologies, such as phase change memory (PCM), make it possible to implement checkpointing in arbitrary data granularity. However, the impact of data granularity on the checkpointing cost has not been fully addressed. In this paper, we investigate how data granularity influences the performance of a checkpoint system. Further, we design and implement a high-performance checkpoint system named AG-ckpt. AG-ckpt is a hybrid-granularity incremental checkpointing scheme through: (1) low-cost modified-memory detection and (2) fine-grained memory duplication. Moreover, we also formulize the performance-granularity relationship of checkpointing systems through a mathematical model, and further obtain the optimum solutions. We conduct the experiments through several typical benchmarks to verify the performance gain of our design. Compared to conventional incremental checkpoint, our results show that AG-ckpt can reduce checkpoint data amount up to 50% and provide a speedup of 1.2x-1.3x on checkpoint efficiency.
Hideki HORII Jeong Hee PARK Ji Hye YI Bong Jin KUH Yong Ho HA
We have integrated a phase change random access memory (PRAM), completely based on 0.24 µm-CMOS technologies using nitrogen doped GeSbTe films. The Ge2Sb2Te5 (GST) thin films are well known to play a critical role in writing current of PRAM. Through device simulation, we found that high-resistive GST is indispensable to minimize the writing current of PRAM. For the first time, we found the resistivity of GST film can be controlled with nitrogen doping. Doping nitrogen to GST film successfully reduced writing current. A 0.24 µm PRAM using N-doped GST films were demonstrated with writing pulse of 0.8 mA-50 ns for RESET and 0.4 mA-100 ns for SET. Also, the cell endurance has been enhanced with grain growth suppression effect of dopant nitrogen. Endurance performance of fully integrated PRAM using N-doped GST shows no fail bit up to 2E9 cycles. Allowing 1% failures, extrapolation to 85 indicates retention time of 2 years. All the results show that PRAM is one of the most promising candidates in the market for the next generation memories.
The center problem of a graph is motivated by a number of facility location problems. In this paper, we propose parallel algorithms for finding the center of interval graphs and circular-arc graphs. Our algorithms run in O(log n) time algorithm using O(n/log n) processors while the intervals and arcs are given in sorted order. Our algorithms are on the EREW PRAM model.
Yuh-Rau WANG Shi-Jinn HORNG Yu-Hua LEE Pei-Zong LEE
Based on the dimensionality reduction technique and the solution for proximate points problem, we achieve the optimality of the three-dimensional Euclidean distance transform (3D_EDT) computation. For an N N N binary image, our algorithms for both 3D_EDT and its applications can be performed in O (log log N) time using CRCW processors or in O (log N) time using EREW processors. To the best of our knowledge, all results described above are the best known. As for the n-dimensional Euclidean distance transform (nD_EDT) and its applications of a binary image of size Nn, all of them can be computed in O (nlog log N) time using CRCW processors or in O (nlog N) time using EREW processors.
The objective of thinning is to reduce the amount of information in image patterns to the minimum needed for recognition. Thinned image helps the extraction of important features such as end points, junction points, and connections from image patterns. The ultimate goal of parallel algorithms is to minimize the execution time while producing high quality thinned image. Though much research has been performed for parallel thinning algorithms, there has been no systematical approach for comparing the execution speed of parallel thinning algorithms. Several rough comparisons have been done in terms of iteration numbers. But, such comparisons may lead to wrong guides since the time required for iterations varies from one algorithm to the other algorithm. This paper proposes a formal method to analyze the performance of parallel thinning algorithms based on PRAM (Parallel Random Access Machine) model. Besides, the quality of skeletons, robustness to boundary noise sensitivity, and execution speed are considered. Six parallel algorithms, which shows relatively high performance, are selected, and analyzed based on the proposed analysis method. Experiments show that the proposed analysis method is sufficiently accurate to evaluate the performance of parallel thinning algorithms.
Akihiro FUJIWARA Michiko INOUE Toshimitsu MASUZAWA Hideo FUJIWARA
The medial axis transform (MAT) is an image representation scheme. For a binary image, the MAT is defined as a set of upright maximal squares which consist of pixels of value l entirely. The MAT plays an important role in image understanding. This paper presents a parallel algorithm for computing the MAT of an n n binary image. We show that the algorithm can be performed in O(log n) time using n2/log n processors on the EREW PRAM and in O(log log n) time using n2/log log n processors on the common CRCW PRAM. We also show that the algorithm can be performed in O(n2/p2 + n) time on a p p mesh and in O(n2/p2 + (n log p)/p) time on a p2 processor hypercube (for 1 p n). The algorithm is cost optimal on the PRAMs, on the mesh (for 1 p n) and on the hypercube (for 1 p n/log n).
Wei CHEN Koji NAKANO Toshimitsu MASUZAWA Nobuki TOKURA
Given a sorted set S of n points in the plane, the prefix convex hulls problem of S is to compute the convex hull for every prefix set of S. We present a parallel algorithm for this problem. Our algorithm runs in O(logn) time using n/logn processors in the CREW PRAM computational model. The algorithm is shown to be time and cost optimal. One of the techniques we adopt to achieve these optimal bounds is the use of a new parallel data structure Array-Tree.