Shimpei SATO Eijiro SASSA Yuta UKON Atsushi TAKAHASHI
To obtain high-performance circuits in advanced technology nodes, design methodology has to take large delay variations into account. Clock scheduling and speculative execution both incur implementation overheads, but they have the potential to improve performance by averaging the imbalance of maximum delay among paths and by utilizing valid data that become available earlier than in the worst-case scenario, respectively. In this paper, we propose a high-performance digital circuit design method that uses speculative execution with less overhead by effectively utilizing clock scheduling with delay insertion. Clock scheduling with delay insertion reduces the need for speculations, which are the source of overhead. Experiments show that a generated circuit achieves a 26% performance improvement with a 1.3% area overhead compared to a circuit without clock scheduling and without speculative execution.
Takahiro MAEKAWA Ayana KAWAMURA Takayuki NAKACHI Hitoshi KIYA
A privacy-preserving support vector machine (SVM) computing scheme is proposed in this paper. Cloud computing has been spreading in many fields. However, cloud computing raises serious issues for end users, such as unauthorized use of cloud services, data leaks, and compromised privacy. Accordingly, we consider privacy-preserving SVM computing. We focus on protecting the visual information of images by using a random unitary transformation. Some properties of the protected images are discussed. The proposed scheme enables us not only to protect images, but also to achieve the same performance as with unprotected images, even when using typical kernel functions such as the linear kernel, radial basis function (RBF) kernel, and polynomial kernel. Moreover, it can be carried out directly with well-known SVM algorithms, without preparing any algorithms specialized for secure SVM computing. In an experiment, the proposed scheme is applied to a face-based authentication algorithm with SVM classifiers to confirm its effectiveness.
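The claim that protected images give the same SVM performance can be illustrated with a small sketch. A unitary transformation preserves inner products and Euclidean distances, so the linear, polynomial, and RBF kernel values are unchanged. The code below is only an illustration under assumptions: it uses a real orthogonal matrix built from random Householder reflections as a stand-in for the random unitary transformation, and the helper names, vector length, and kernel parameters are hypothetical rather than taken from the paper.

```python
import math
import random


def householder(v):
    """Orthogonal reflection matrix I - 2 v v^T / (v^T v), as a list of rows."""
    n = len(v)
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_orthogonal(n, rng):
    """Product of n random Householder reflections (an assumed stand-in for
    the random unitary transformation; orthogonal, hence real unitary)."""
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n):
        q = matmul(householder([rng.gauss(0.0, 1.0) for _ in range(n)]), q)
    return q


def transform(q, x):
    """Apply the matrix q to the vector x."""
    return [sum(q[i][j] * x[j] for j in range(len(x))) for i in range(len(q))]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def poly_kernel(x, y, degree=3, c=1.0):
    return (dot(x, y) + c) ** degree


def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(0)
q = random_orthogonal(8, rng)
x = [rng.random() for _ in range(8)]   # toy "image" vectors
y = [rng.random() for _ in range(8)]
qx, qy = transform(q, x), transform(q, y)

for name, kernel in (("linear", linear_kernel),
                     ("polynomial", poly_kernel),
                     ("RBF", rbf_kernel)):
    print(name, kernel(x, y), kernel(qx, qy))
```

Each printed pair should agree up to floating-point rounding, even though the protected vectors themselves differ from the originals. This is why an off-the-shelf SVM solver can be run on protected data without modification.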
Songlin DU Yuhao XU Tingting HU Takeshi IKENAGA
High frame rate and ultra-low delay matching systems play an important role in various human-machine interactive applications, which demand better performance in matching deformable and out-of-plane rotating objects. Although many algorithms have been proposed for deformation tracking and matching, few of them are suitable for hardware implementation because of their complicated operations and long processing time. This paper proposes a hardware-oriented template update and recovery method for a high frame rate and ultra-low delay deformation matching system. In the proposed method, the new template is generated in real time by partially updating the template descriptor and adding new keypoints simultaneously with the matching process in pixels (proposal #1), which avoids a large inter-frame delay. The size and shape of the region of interest (ROI) are made flexible, and the Hamming threshold used for brute-force matching is adjusted according to pixel position and the flexible ROI (proposal #2), which solves the problem of template drift. The template is recovered from the previous one with a relative center-shifting vector when it is judged as lost via a region-wise difference check (proposal #3). Evaluation results indicate that the proposed method achieves real-time processing at 784fps at a resolution of 640×480 on a field-programmable gate array (FPGA), with a delay of 0.808ms/frame, and achieves satisfactory deformation matching results in comparison with other general methods.
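As a rough illustration of the brute-force matching step with a Hamming threshold, the following sketch matches binary descriptors stored as integers. It is a software toy under assumptions, not the FPGA design: the 8-bit descriptors, the function names, and the fixed threshold are hypothetical, and the position- and ROI-dependent threshold adjustment of proposal #2 is not modeled.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def brute_force_match(template, frame, threshold):
    """For each template descriptor, return the index of the closest frame
    descriptor, or None when the best distance exceeds the threshold."""
    matches = []
    for t in template:
        best_idx, best_dist = None, threshold + 1
        for i, f in enumerate(frame):
            d = hamming(t, f)
            if d < best_dist:
                best_idx, best_dist = i, d
        matches.append(best_idx)
    return matches


template = [0b10110010, 0b01001101]            # hypothetical 8-bit descriptors
frame = [0b10110011, 0b11110000, 0b01001101]

print(brute_force_match(template, frame, threshold=2))  # -> [0, 2]
print(brute_force_match(template, frame, threshold=0))  # -> [None, 2]
```

A tighter threshold rejects the first match, whose distance is 1. This shows how the threshold trades false matches against lost keypoints.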
Xina CHENG Yiming ZHAO Takeshi IKENAGA
Real-time 3D player tracking plays an important role in sports analysis, especially for live sports broadcasting services, which have a strict limit on processing time. For these kinds of applications, the 3D trajectories of players contribute to high-level game analysis such as tactic analysis and to commercial applications such as TV content. Thus, a real-time implementation of 3D player tracking is expected. To achieve real-time processing of 60fps videos with high accuracy (that is, a processing time of less than 16.67ms per frame), the following factors that limit the processing time of the target algorithm must be addressed: 1) the large image area of each player; 2) repeated processing of multiple players in multiple views; 3) the complex calculation of the observation algorithm. To deal with these challenges, this paper proposes a real-time implementation of multi-view volleyball player tracking on a GPU device, based on representative spatial selection and temporal combination. First, representative spatial pixel selection, which detects the pixels that best represent an image region in order to scale down the image spatially, reduces the number of pixels to process. Second, representative temporal likelihood combination shares the observation calculation by using the temporal correlation between images, so that the number of complex calculations is reduced. The experiments are based on videos of the Final and Semi-Final Games of the 2014 Japan Inter High School Games of Men's Volleyball in Tokyo Metropolitan Gymnasium. On a GeForce GTX 1080Ti GPU, the tracking system achieves real-time processing of 60fps videos and keeps the tracking accuracy higher than 97%.
Windowed interpolation DFT methods have been used to estimate the parameters of single-frequency and multi-frequency signals. Nevertheless, they do not work well for real-valued sinusoids whose positive- and negative-frequency components are closely spaced. In this paper, we describe a novel three-point windowed interpolation DFT method for frequency measurement of a real-valued sinusoidal signal. The exact representation of the windowed DFT with the maximum sidelobe decay window (MSDW) is constructed. The spectral superposition of the positive- and negative-frequency components is considered and calculated to improve the estimation performance. The simulation results match the theoretical values well. In addition, computer simulations demonstrate that the proposed algorithm provides high estimation accuracy and good noise suppression capability.
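For orientation, the sketch below shows the classical three-point interpolated DFT estimator for the Hann window, which is the two-term maximum sidelobe decay window. It is not the method proposed in the paper: it ignores the negative-frequency component, which is exactly the interference the paper accounts for, so it is only accurate when the tone lies well away from DC and the Nyquist frequency. The signal length, test frequency, and function names are assumptions chosen for illustration.

```python
import cmath
import math


def hann_dft_bin(x, k):
    """DFT bin k of x after applying a periodic Hann window."""
    n = len(x)
    return sum(
        x[i] * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        * cmath.exp(-2j * math.pi * k * i / n)
        for i in range(n)
    )


def estimate_frequency_bins(x):
    """Classical three-point interpolation for the Hann window:
    delta = 2 (|X[k+1]| - |X[k-1]|) / (|X[k-1]| + 2 |X[k]| + |X[k+1]|)."""
    n = len(x)
    mags = [abs(hann_dft_bin(x, k)) for k in range(n // 2)]
    # Peak search excludes the edge bins so that k-1 and k+1 always exist.
    k = max(range(1, n // 2 - 1), key=lambda i: mags[i])
    delta = 2.0 * (mags[k + 1] - mags[k - 1]) / (
        mags[k - 1] + 2.0 * mags[k] + mags[k + 1]
    )
    return k + delta


N = 256
true_bins = 20.3   # hypothetical test frequency, in units of DFT bins
signal = [math.cos(2 * math.pi * true_bins * i / N + 0.7) for i in range(N)]
estimate = estimate_frequency_bins(signal)
print(f"true = {true_bins} bins, estimated = {estimate:.4f} bins")
```

Moving `true_bins` toward zero brings the positive- and negative-frequency peaks close together, and the estimate from this uncorrected formula degrades. That is the regime targeted by the abstract above.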
Chuang ZHU Jie LIU Xiao Feng HUANG Guo Qing XIANG
This paper reports a high-quality hardware-friendly integer motion estimation (IME) scheme. According to different characteristics of CTU content, the proposed method adopts different adaptive multi-resolution strategies coupled with accurate full-PU modes IME at the finest level. Besides, by using motion vector derivation, IME for the second reference frame is simplified and hardware resource is saved greatly through processing element (PE) sharing. It is shown that the proposed architecture can support the real-time processing of 4K-UHD @60fps, while the BD-rate is just increased by 0.53%.
Ryohei BANNO Jingyu SUN Susumu TAKEUCHI Kazuyuki SHUDO
MQTT is one of the promising protocols for various data exchange in IoT environments. Typically, those environments have a characteristic called “edge-heavy”, which means that things at the network edge generate a massive volume of data with high locality. For handling such edge-heavy data, an architecture of placing multiple MQTT brokers at the network edges and making them cooperate with each other is quite effective. It can provide higher throughput and lower latency, as well as reduce the consumption of cloud resources. However, under this kind of architecture, heterogeneity can be a critical issue. Namely, the appropriate MQTT broker product may vary with the environment of each network edge, yet different products can hardly cooperate because the MQTT specification provides no interoperability between brokers. In this paper, we propose Interworking Layer of Distributed MQTT brokers (ILDM), which enables arbitrary kinds of MQTT brokers to cooperate with each other. ILDM, designed as a generic mechanism independent of any specific cooperation algorithm, provides APIs to facilitate the development of a variety of algorithms. By using the APIs, we also present two basic cooperation algorithms. To evaluate the usefulness of ILDM, we introduce a benchmark system which can be used for both a single broker and multiple brokers. Experimental results show that the throughput of five brokers running together through ILDM is up to 4.3 times higher than that of a single broker.
Chihiro GO Yuma KINOSHITA Sayaka SHIOTA Hitoshi KIYA
This paper proposes a novel multi-exposure image fusion (MEF) scheme for single-shot high dynamic range imaging with spatially varying exposures (SVE). Single-shot imaging with SVE enables us not only to produce images without color saturation regions from a single-shot image, but also to avoid ghost artifacts in the produced ones. However, the number of exposures is generally limited to two, and moreover it is difficult to decide the optimum exposure values before shooting. In the proposed scheme, a scene segmentation method is applied to input multi-exposure images, and then the luminance of the input images is adjusted according to both the number of scenes and the relationship between exposure values and pixel values. The proposed method with the luminance adjustment allows us to address the above two issues. In this paper, we focus on dual-ISO imaging as one form of single-shot imaging. In an experiment, the proposed scheme is demonstrated to be effective for single-shot high dynamic range imaging with SVE, compared with conventional MEF schemes with exposure compensation.
In this paper, we consider the clustering problem of independent general subspaces. That is, given data points lying near or on the union of independent low-dimensional linear subspaces, we aim to recover the subspaces and assign the corresponding label to each data point. To settle this problem, we take advantage of both a greedy strategy and an energy minimization strategy to propose a simple yet effective algorithm based on the assumption that an m-branched (i.e., perfect m-ary) tree, which is constructed by collecting m-nearest neighbor points in each node, has a high probability of containing the near-exact subspace. Specifically, subspace candidates are first enumerated by multiple m-branched trees. Each tree starts with a data point and grows by collecting nearest neighbors in breadth-first search order. Then, subspace proposals are further selected from the enumeration to initialize the energy minimization algorithm. Finally, both the proposals and the labeling result are finalized by iterative re-estimation and labeling. Experiments with both synthetic and real-world data show that the proposed method can outperform state-of-the-art methods and is practical in real applications.
Fanxin ZENG Xiping HE Guixin XUAN Zhenyu ZHANG Yanni PENG Linjie QIAN Li YAN
Based on the cyclotomic numbers of order eight, a class of balanced almost 8-QAM sequences with odd prime periods is presented. The resultant sequences have low two-level nontrivial autocorrelation values, and their distribution is determined. Furthermore, the smallest possible absolute sidelobes (SPASs) of autocorrelation functions of balanced almost 8-QAM sequences are derived. Compared with the obtained SPASs, some of the proposed sequences are optimal or suboptimal.
Mizuki YAMADA Keigo TAKEUCHI Kiyoyuki KOIKE
We propose hardware-aware sum-product (SP) decoding for low-density parity-check codes. To simplify an implementation using a fixed-point number representation, we transform SP decoding in the logarithm domain to that in the decision domain. A polynomial approximation is proposed to implement an update rule of the proposed SP decoding efficiently. Numerical simulations show that the approximate SP decoding achieves almost the same performance as the exact SP decoding when an appropriate degree in the polynomial approximation is used, that it improves the convergence properties of SP and normalized min-sum decoding in the high signal-to-noise ratio regime, and that it is robust against quantization errors.
Toshiki HIRAO Raula GAIKOVINA KULA Akinori IHARA Kenichi MATSUMOTO
Modern code review is a well-known practice for assessing software quality, in which developers discuss quality in a web-based review tool. However, this lightweight approach risks inefficient review participation, especially when comments become either excessive (i.e., too many) or underwhelming (i.e., too few). In this study, we investigate the phenomena of reviewer commenting. Through a large-scale empirical analysis of over 1.1 million reviews from five OSS systems, we conduct an exploratory study to investigate the frequency, size, and evolution of reviewer commenting. Moreover, we conduct a modeling study to understand the most important features that potentially drive reviewer comments. Our results show that (i) the number of comments and the number of words in the comments tend to vary among reviews and across studied systems; (ii) reviewers change their commenting behaviours over time; and (iii) human experience and patch property aspects impact the number of comments and the number of words in the comments.
Saya OHIRA Naoki TSUCHIYA Tetsuya MATSUMURA
We propose a three-dimensional (3D) sound processor architecture for consumer applications that includes super-directional modulation intellectual property (IP) and 3D sound processing IP. In addition, we propose an automatic design environment for the 3D sound processing IP. This processor can generate realistic small sound fields in arbitrary spaces using ultrasound. In particular, to reproduce 3D audio, the 3D sound processing IP has to reproduce the personal frequency characteristics of complex head-related transfer functions. For this reason, we have constructed an automatic design environment with high reconfigurability. This automatic design environment is based on high-level synthesis; given a parameter description file for filter design, it automatically generates a C-based algorithm simulator and synthesizes the IP hardware. It can reduce the design period to approximately 1/5 of that of conventional manual design. Applying the automatic design environment, a 3D sound processing IP was designed experimentally. The designed IP is suitable for consumer applications in terms of hardware amount and power consumption.
Haiyang LIU Lianrong MA Hao ZHANG
For an odd prime q and an integer m≤q, we can construct a regular quasi-cyclic parity-check matrix HI(m,q) that specifies a linear block code CI(m,q), called an improper array code. In this letter, we prove the minimum distance of CI(4,q) is equal to 10 for any q≥11. In addition, we prove the minimum distance of CI(5,q) is upper bounded by 12 for any q≥11 and conjecture the upper bound is tight.
The latency and energy consumption of DRAM are serious concerns because (1) the latency has not improved much for decades and (2) recent machines have a huge main memory capacity. Device-level studies reduce them by shortening the wait time of DRAM internal operations so that they finish faster and consume less energy. Applying these techniques aggressively to achieve approximate memory is a promising direction to further reduce the overhead, given that many data-center applications today are to some extent robust to bit-flips. To advance research on approximate memory, its effect on applications must be evaluated so that both researchers and potential users can investigate how it affects realistic applications. However, hardware simulators are too slow to run workloads repeatedly with different parameters. To this end, we propose a lightweight method to evaluate the effect of approximate memory. The idea is to count the number of DRAM internal operations that touch the approximate data of applications and calculate the probability of bit-flips from that count, instead of using heavy-weight simulators. The evaluation shows that our system is three orders of magnitude faster than cycle-accurate simulators, and we also give case studies evaluating the effect of approximate memory on some realistic applications.
Kohei YOSHIGAMI Taishi HAYASHI Masateru TSUNODA Hidetake UWANO Shunichiro SASAKI Kenichi MATSUMOTO
Recently, many studies have applied gamification to software engineering education and software development to enhance work results. Gamification is defined as “the use of game design elements in non-game contexts.” When applying gamification, we set various game rules, such as a time limit. However, it is not clear whether such a rule affects working time. For example, if we apply a time limit to impatient developers, the working time may become shorter, but the rule may also have a negative effect because of time pressure. In this study, we analyze, through subjective experiments, whether the rules affect work results such as working time. Our experimental results suggest that, for coding tasks, working time was shortened when we applied a rule that made developers aware of working time by showing the elapsed time.
A wireless sensor network consists of spatially distributed devices using sensors to monitor physical and environmental conditions. Key infection is a key distribution protocol for wireless sensor networks with a partially present adversary; a sensor node wishing to communicate secretly with other nodes simply sends a symmetric encryption key in the clear. The partially present adversary can eavesdrop on only a small fraction of the keys. Secrecy amplification is a post-deployment strategy to improve the security of key infection by combining multiple keys propagated along different paths. The previous mathematical analysis of secrecy amplification assumes that sensor nodes always transmit packets at the maximum strength. We provide a mathematical analysis of secrecy amplification where nodes adjust their transmission power adaptively (a.k.a. whispering mode).
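The idea of combining keys that travelled along different paths can be sketched as follows. The abstract only says that multiple keys are combined; hashing the length-prefixed concatenation with SHA-256 is an assumption made here for illustration, as are the function name, the 16-byte key size, and the two-path scenario.

```python
import hashlib
import os


def amplify(keys):
    """Derive one link key from several keys received over different paths.
    Each key is length-prefixed so that key boundaries are unambiguous."""
    h = hashlib.sha256()
    for k in keys:
        h.update(len(k).to_bytes(2, "big"))
        h.update(k)
    return h.digest()


# Hypothetical scenario: node A sent one key to B directly (in the clear)
# and a second key to B through an intermediate node C.
k_direct = os.urandom(16)
k_via_c = os.urandom(16)
link_key = amplify([k_direct, k_via_c])
print(link_key.hex())
```

Under this construction, an adversary who overheard only one of the two transmissions cannot compute `link_key`, which is the intuition behind improving key infection by using multiple paths.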
Yuu AIKOU Shahidatul SADIAH Toru NAKANISHI
In conventional ID-based user authentication, privacy issues may occur, since users' behavior histories are collected by Service Providers (SPs). Although anonymous authentication schemes such as group signatures have been proposed, these schemes rely on a Trusted Third Party (TTP) capable of tracing misbehaving users. Thus, the level of privacy is not high, because the TTP acting as the tracing authority can always trace users. Therefore, anonymous credential systems using a blacklist without such a TTP have been proposed, in which blacklisted anonymous users can be blocked. Recently, an RSA-based blacklistable anonymous credential system with improved efficiency has been proposed. However, this system still has an efficiency problem: the data size in the authentication is O(K'), where K' is the maximum number of sessions that the user can conduct. Furthermore, the O(K')-size data costs the user O(K') exponentiations. In this paper, a blacklistable anonymous credential system using a pairing-based accumulator is proposed. In the proposed system, the data size in the authentication is constant with respect to the parameters. Although the user's computational cost depends on the parameters, the dependent cost is O(δBL·K) multiplications, instead of exponentiations, where δBL is the number of sessions added to the blacklist after the last authentication of the user, and K is the number of past sessions of the user. A drawback of the proposed system is its O(n)-size public key, where n corresponds to the total number of all sessions of all users in the system. However, the user only has to download the public key once.
Shijie WANG Yuanyuan GAO Xiaochen LIU Guangna ZHANG Nan SHA Mingxi GUO Kui XU
In this paper, we explore how to enhance the physical layer security performance in downlink cellular networks through cooperative jamming technology. Idle user equipments (UEs) are used to cooperatively transmit jamming signals to confuse eavesdroppers (Eves). We propose a threshold-based jammer selection scheme to decide which idle UEs should participate in the transmission of jamming signals. Threshold conditions are carefully designed to decrease interference to the legitimate channel while maintaining the interference to the Eves. Moreover, fewer UEs are activated, which helps save the energy of cooperative UEs. Analytical expressions of the connection and secrecy performances are derived and validated through Monte Carlo simulations. Theoretical and simulation results reveal that our proposed scheme can improve connection performance while approaching the secrecy performance of [12]. Furthermore, only 43% of the idle UEs used in [12] are used for cooperative jamming, which helps decrease the energy consumption of the network.
Guangna ZHANG Yuanyuan GAO Huadong LUO Nan SHA Mingxi GUO Kui XU
In this paper, we investigate a novel joint multi-relay and jammer selection (JMRJS) scheme in order to improve the physical layer security of wireless networks. In the JMRJS scheme, all the relays that succeed in decoding the source signal are selected to assist in the source signal transmission, while all the remaining relay nodes act as friendly jammers that disturb the eavesdroppers by broadcasting artificial noise. Based on the more general Nakagami-m fading channel, we analyze the security performance of the JMRJS scheme for protecting the source signal against eavesdropping. Exact closed-form expressions of the outage probability (OP) and intercept probability (IP) for the JMRJS scheme over the Nakagami-m fading channel are derived. Moreover, we analyze the security-reliability tradeoff (SRT) of this scheme. Simulation results show that as the number of decode-and-forward (DF) relay nodes increases, the SRT of the JMRJS scheme improves notably. When the transmit power is below a certain value, the SRT of the JMRJS scheme consistently outperforms that of the joint single-relay and jammer selection (JSRJS) scheme and the joint equal-relay and jammer selection (JERJS) scheme. In addition, the SRT of this scheme is always better than that of the multi-relay selection (MRS) scheme.
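The selection rule described above (relays that decode the source forward it, all others jam) can be sketched as a simple partition. The decoding criterion used here, a half-duplex capacity of 0.5·log2(1 + SNR·|h|²) compared against a target rate, is a common modeling assumption and is not stated in the abstract. The function names and the values of m, SNR, and rate are likewise hypothetical. The sketch relies on the standard fact that the power gain of a Nakagami-m channel is Gamma distributed with shape m and mean Ω.

```python
import math
import random


def nakagami_power_gain(m, omega, rng):
    """Sample |h|^2 for Nakagami-m fading: Gamma(shape=m, scale=omega/m)."""
    return rng.gammavariate(m, omega / m)


def split_relays(gains, snr, rate):
    """Partition relay indices into forwarders (decoding succeeded) and
    jammers (decoding failed), under the assumed capacity criterion."""
    forwarders, jammers = [], []
    for idx, g in enumerate(gains):
        capacity = 0.5 * math.log2(1.0 + snr * g)  # two-slot DF assumption
        (forwarders if capacity >= rate else jammers).append(idx)
    return forwarders, jammers


rng = random.Random(1)
gains = [nakagami_power_gain(2.0, 1.0, rng) for _ in range(6)]
forwarders, jammers = split_relays(gains, snr=10.0, rate=1.0)
print("forwarding relays:", forwarders)
print("jamming relays:   ", jammers)
```

Repeating the draw many times and counting the events of interest would give a Monte Carlo estimate to set against closed-form OP and IP expressions, although the exact event definitions depend on the system model in the paper.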