The search functionality is under construction.

Keyword Search Result

[Keyword] PU(3318hit)

361-380hit(3318hit)

  • A Fully-Connected Ising Model Embedding Method and Its Evaluation for CMOS Annealing Machines

    Daisuke OKU  Kotaro TERADA  Masato HAYASHI  Masanao YAMAOKA  Shu TANAKA  Nozomu TOGAWA  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2019/06/10
      Vol:
    E102-D No:9
      Page(s):
    1696-1706

    Combinatorial optimization problems with a large solution space are difficult to solve using von Neumann computers alone. Ising machines, or annealing machines, have been developed as promising non-von Neumann computers to tackle these problems. To use an annealing machine, a combinatorial optimization problem is mapped onto the physical Ising model, which consists of spins, interactions between them, and their external magnetic fields. The annealing machine then searches for the ground state of the physical Ising model, which corresponds to the optimal solution of the original combinatorial optimization problem. A combinatorial optimization problem can first be described by an ideal fully-connected Ising model, but it is very hard to embed this model onto the physical Ising model topology of a particular annealing machine, which is one of the largest issues in annealing machines. In this paper, we propose a fully-connected Ising model embedding method targeting CMOS annealing machines. The key idea is that the proposed method replicates every logical spin in a fully-connected Ising model and embeds each logical spin onto the physical spins with the same chain length. Experimental results on an actual combinatorial problem show that the proposed method obtains spin embeddings superior to those of the conventional de facto standard method in terms of the embedding time and the probability of obtaining a feasible solution.
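
To make the Ising formulation in the abstract above concrete, the following minimal sketch evaluates the energy of a small fully-connected Ising model and finds its ground state by exhaustive search. It is an illustration only and not the embedding method of the paper; the sign convention of the energy, the function names `ising_energy` and `brute_force_ground_state`, and the example couplings are all assumptions.

```python
from itertools import product


def ising_energy(spins, J, h):
    """Energy of a spin configuration.

    Assumed convention: H = -sum_{i<j} J[i][j]*s_i*s_j - sum_i h[i]*s_i,
    with each spin s_i in {-1, +1}. Only the upper triangle of J is read.
    """
    n = len(spins)
    interaction = sum(
        J[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    field = sum(h[i] * spins[i] for i in range(n))
    return -interaction - field


def brute_force_ground_state(J, h):
    """Return the lowest-energy configuration by trying all 2**n states."""
    n = len(h)
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, J, h))


# Hypothetical 3-spin fully-connected model with ferromagnetic couplings.
J = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
h = [0.5, 0.0, 0.0]
ground = brute_force_ground_state(J, h)
```

Exhaustive search is only feasible for a handful of spins, which is why annealing hardware is of interest for larger instances.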

  • Consideration of Relationship between Human Preference and Pulse Wave Derived from Brain Activity

    Mami KITABATA  Yota NIIGAKI  Yuukou HORITA  

     
    LETTER

      Vol:
    E102-A No:9
      Page(s):
    1250-1253

    In this paper, we consider the relationship between human preference and brain activity, in particular pulse wave information obtained using NIRS. First, we extracted the pulse wave information from the Hb change signal of NIRS. By applying the FFT to the Hb signals, we found a second peak in the power spectrum that carries the frequency information of the pulse wave. The frequency deviation of this second peak may carry information about changes in brain activity, and it is associated with human preference when viewing significant image content.
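
The spectral step described above can be sketched as follows, under stated assumptions: the signal is synthetic (a slow 0.1 Hz component plus a 1.2 Hz pulse-like component), the sampling rate is invented, and "second peak" is read here as the second local maximum in ascending frequency order, which may differ from the authors' definition. A naive DFT is used in place of an FFT library so that the example stays self-contained.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum |X[k]|^2 via a naive DFT (fine for short signals)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        x_k = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def peak_frequencies(signal, fs, rel_threshold=0.01):
    """Frequencies (Hz) of local maxima above rel_threshold * max power, ascending."""
    p = power_spectrum(signal)
    floor = rel_threshold * max(p[1:])  # ignore the DC bin when scaling
    n = len(signal)
    return [
        k * fs / n
        for k in range(1, len(p) - 1)
        if p[k] >= floor and p[k] > p[k - 1] and p[k] > p[k + 1]
    ]


# Synthetic stand-in for an Hb signal (assumed values, not measured data).
fs = 10.0  # sampling rate in Hz
samples = [
    math.sin(2 * math.pi * 0.1 * t / fs) + 0.5 * math.sin(2 * math.pi * 1.2 * t / fs)
    for t in range(200)
]
peaks = peak_frequencies(samples, fs)
second_peak_hz = peaks[1]
```

Tracking how `second_peak_hz` shifts between recording windows would be one way to quantify the frequency deviation the abstract mentions.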

  • Cefore: Software Platform Enabling Content-Centric Networking and Beyond (Open Access)

    Hitoshi ASAEDA  Atsushi OOKA  Kazuhisa MATSUZONO  Ruidong LI  

     
    INVITED PAPER

      Publicized:
    2019/03/22
      Vol:
    E102-B No:9
      Page(s):
    1792-1803

    Information-Centric or Content-Centric Networking (ICN/CCN) is a promising novel network architecture that naturally integrates in-network caching, multicast, and multipath capabilities, without relying on centralized application-specific servers. Software platforms are vital for researching ICN/CCN; however, existing platforms lack a focus on extensibility and lightweight implementation. In this paper, we introduce a newly developed software platform enabling CCN, named Cefore. In brief, Cefore is lightweight, with the ability to run even on top of a resource-constrained device, but is also easily extensible with arbitrary plugin libraries or external software implementations. For large-scale experiments, a network emulator (Cefore-Emu) and network simulator (Cefore-Sim) have also been developed for this platform. Both Cefore-Emu and Cefore-Sim support hybrid experimental environments that incorporate physical networks into the emulated/simulated networks. In this paper, we describe the design, specification, and usage of Cefore as well as Cefore-Emu and Cefore-Sim. We show performance evaluations of in-network caching and streaming on Cefore-Emu and content fetching on Cefore-Sim, verifying the salient features of the Cefore software platform.

  • On Computational Complexity of Pipe Puzzles

    Takumu SHIRAYAMA  Takuto SHIGEMURA  Yota OTACHI  Shuichi MIYAZAKI  Ryuhei UEHARA  

     
    PAPER-Puzzles

      Vol:
    E102-A No:9
      Page(s):
    1134-1141

    In this paper, we investigate the computational complexity of pipe puzzles. A pipe puzzle is a kind of tiling puzzle; the input is a set of cards, and a part of a pipe is drawn on each card. For a given set of cards, we arrange them and connect the pipes. We have to connect all pipes without creating any local loop. While ordinary tiling puzzles, like jigsaw puzzles, ask the solver to arrange the tiles with local consistency, pipe puzzles ask the solver to join all pipes. We first show that the pipe puzzle is NP-complete in general even if the goal shape is quite restricted. We also investigate restricted cases and show some polynomial-time algorithms.

  • Dynamic Throughput Allocation among Multiple Servers for Heterogeneous Storage System

    Zhisheng HUO  Limin XIAO  Zhenxue HE  Xiaoling RONG  Bing WEI  

     
    PAPER-Computer System

      Publicized:
    2019/05/27
      Vol:
    E102-D No:9
      Page(s):
    1731-1739

    Previous works have studied throughput allocation in heterogeneous storage systems consisting of SSDs and HDDs in the dynamic setting, where users are not all present in the system simultaneously. However, those studies treat multiple servers as one large resource pool and cannot cope with the multi-server environment. We design a dynamic throughput allocation mechanism named DAM, which can handle the throughput allocation of multiple heterogeneous servers in the dynamic setting and provides a number of desirable properties. The experimental results show that DAM can dynamically allocate throughput across multiple servers while ensuring users' local allocations on each server, and can provide an efficient and fair throughput allocation across the whole system.

  • Reducing CPU Power Consumption with Device Utilization-Aware DVFS for Low-Latency SSDs

    Satoshi IMAMURA  Eiji YOSHIDA  Kazuichi OE  

     
    PAPER-Computer System

      Publicized:
    2019/06/18
      Vol:
    E102-D No:9
      Page(s):
    1740-1749

    Emerging solid state drives (SSDs) based on a next-generation memory technology have recently been released on the market. In this work, we call them low-latency SSDs because their device latency is an order of magnitude lower than that of conventional NAND flash SSDs. Although low-latency SSDs can drastically reduce the I/O latency perceived by an application, the overhead of OS processing included in the I/O latency has become noticeable because of the very low device latency. Since the OS processing is executed on a CPU core, the core's operating frequency should be maximized to reduce the OS overhead. However, a higher core frequency causes higher CPU power consumption during I/O accesses to low-latency SSDs. Therefore, we propose the device utilization-aware DVFS (DU-DVFS) technique, which periodically monitors the utilization of a target block device and applies dynamic voltage and frequency scaling (DVFS) to CPU cores executing I/O-intensive processes only when the block device is fully utilized. In this case, DU-DVFS can reduce the CPU power consumption without hurting performance because the delay of OS processing incurred by decreasing the core frequency can be hidden. Our evaluation with 28 I/O-intensive workloads on a real server containing an Intel® Optane™ SSD demonstrates that DU-DVFS reduces the CPU power consumption by 41.4% on average (up to 53.8%) with negligible performance degradation, compared to a standard DVFS governor on Linux. Moreover, the evaluation with multiprogrammed workloads composed of I/O-intensive and non-I/O-intensive programs shows that DU-DVFS is also effective for them because it can apply DVFS only to CPU cores executing I/O-intensive processes.
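
The decision rule described in this abstract can be sketched as a pure function. This is a simplified illustration, not the authors' implementation: the function name `choose_core_frequencies`, the utilization threshold, and the two frequency levels are hypothetical, and the sketch does not touch any real operating system interface.

```python
def choose_core_frequencies(device_utilization, io_intensive_cores, all_cores,
                            f_low_khz, f_max_khz, full_threshold=0.99):
    """Pick a target frequency per core for one monitoring period.

    The core frequency is lowered only when the block device is (almost) fully
    utilized, and only for cores running I/O-intensive processes; every other
    core keeps the maximum frequency. The threshold is an assumed value.
    """
    device_is_full = device_utilization >= full_threshold
    return {
        core: f_low_khz if (device_is_full and core in io_intensive_cores) else f_max_khz
        for core in all_cores
    }


# Hypothetical 4-core machine where cores 0 and 1 run I/O-intensive processes.
busy = choose_core_frequencies(1.0, {0, 1}, range(4), 1_200_000, 3_000_000)
idle = choose_core_frequencies(0.4, {0, 1}, range(4), 1_200_000, 3_000_000)
```

In `busy` only cores 0 and 1 are scaled down, while in `idle` every core stays at the maximum frequency because the device is not saturated and slower OS processing could no longer be hidden.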

  • Multi-Party Computation for Modular Exponentiation Based on Replicated Secret Sharing

    Kazuma OHARA  Yohei WATANABE  Mitsugu IWAMOTO  Kazuo OHTA  

     
    PAPER-Cryptography and Information Security

      Vol:
    E102-A No:9
      Page(s):
    1079-1090

    In recent years, multi-party computation (MPC) frameworks based on replicated secret sharing schemes (RSSS) have attracted attention as a method to achieve high efficiency among known MPCs. However, RSSS-based MPCs are still inefficient for several heavy computations such as algebraic operations, as they require a large amount and number of communications proportional to the number of multiplications in the operations (which is not the case with other secret sharing-based MPCs). In this paper, we propose RSSS-based three-party computation protocols for modular exponentiation, which is one of the most popular algebraic operations, in the case where the base is public and the exponent is private. Our proposed schemes are simple and efficient in both the asymptotic and practical senses. Regarding asymptotic efficiency, the proposed schemes require O(n)-bit communication and O(1) rounds, where n is the secret-value size, in the best setting, whereas the previous scheme requires O(n^2)-bit communication and O(n) rounds. Regarding practical efficiency, we show the performance of our protocol by experiments on a scenario for distributed signatures, which is useful for secure key management in distributed environments (e.g., distributed ledgers). As one of the cases, our implementation performs a modular exponentiation over a 3,072-bit discrete-log group with a 256-bit exponent (acceptable parameters for 128-bit security) in roughly 300 ms, even in the WAN setting.

  • Sub-Linear Time Aggregation in Probabilistic Population Protocol Model

    Ryota EGUCHI  Taisuke IZUMI  

     
    PAPER-Distributed algorithms

      Vol:
    E102-A No:9
      Page(s):
    1187-1194

    A passively mobile system is an abstract notion of mobile ad-hoc networks. It is a collection of agents with computing devices. Agents move in a region, but the algorithm cannot control their physical behavior (i.e., how they move). The population protocol model is one of the promising models in which the computation proceeds by pairwise communication between two agents. The communicating agents update their states by a specified transition function (algorithm). In this paper, we consider a general form of the aggregation problem with a base station. The base station is a special agent with more computational power than the others. In the aggregation problem, the base station has to sum up the inputs distributed to the other agents. We propose an algorithm that solves the aggregation problem in sub-linear parallel time using a relatively small number of states per agent. More precisely, our algorithm solves the aggregation problem with input domain X in O(√n log^2 n) parallel time and O(|X|^2) states per agent (except for the base station) with high probability.

  • Computational Complexity of Herugolf and Makaro

    Chuzo IWAMOTO  Masato HARUISHI  Tatsuaki IBUSUKI  

     
    PAPER-Puzzles

      Vol:
    E102-A No:9
      Page(s):
    1118-1125

    Herugolf and Makaro are Nikoli's pencil puzzles. We study the computational complexity of Herugolf and Makaro puzzles. It is shown that deciding whether a given instance of each puzzle has a solution is NP-complete.

  • Improved Optical Amplification Efficiency by Using Turbo Cladding Pumping Scheme for Multicore Fiber Optical Networks (Open Access)

    Hitoshi TAKESHITA  Keiichi MATSUMOTO  Hiroshi HASEGAWA  Ken-ichi SATO  Emmanuel Le Taillandier de GABORY  

     
    PAPER-Fiber-Optic Transmission for Communications

      Publicized:
    2019/01/24
      Vol:
    E102-B No:8
      Page(s):
    1579-1589

    We realize a multicore erbium-doped fiber amplifier (MC-EDFA) with 2dB optical gain improvement (average) by recycling the residual 0.98μm pump light from the MC-EDF output. Eight-channel per core wavelength division multiplexed (WDM) Nyquist PM-16QAM optical signal amplification is demonstrated over a 40-minute period. Furthermore, we demonstrate the proposed MC-EDFA's stability by using it to amplify a Nyquist PM-16QAM signal and evaluating the resulting Q-factor variation. We found that our scheme contributes to reducing the total power consumption of MC-EDFAs in spatial division multiplexing (SDM)/WDM networks by up to 33.5%.

  • Graph Similarity Metric Using Graph Convolutional Network: Application to Malware Similarity Match

    Bing-lin ZHAO  Fu-dong LIU  Zheng SHAN  Yi-hang CHEN  Jian LIU  

     
    LETTER-Information Network

      Publicized:
    2019/05/20
      Vol:
    E102-D No:8
      Page(s):
    1581-1585

    Nowadays, malware is a serious threat to the Internet. Traditional signature-based malware detection methods can be easily evaded by code obfuscation. Therefore, many researchers use the high-level structure of malware, such as the function call graph, which is less affected by obfuscation, to find malware variants. However, existing graph matching methods rely on approximate calculation, which is inefficient and cannot effectively guarantee accuracy. Inspired by the successful application of graph convolutional networks in node classification and graph classification, we propose a novel malware similarity metric method based on a graph convolutional network. We use the graph convolutional network to compute graph embedding vectors, and then we calculate the similarity metric of two graphs based on the distance between their graph embedding vectors. Experimental results on the Kaggle dataset show that our method can be applied to graph-based malware similarity measurement, and the accuracy of a clustering application using our method reaches 97% with high time efficiency.

  • Delay Distribution Based Remote Data Fetch Scheme for Hadoop Clusters in Public Cloud

    Ravindra Sandaruwan RANAWEERA  Eiji OKI  Nattapong KITSUWAN  

     
    PAPER-Network

      Publicized:
    2019/02/04
      Vol:
    E102-B No:8
      Page(s):
    1617-1625

    Apache Hadoop and its ecosystem have become the de facto platform for processing large-scale data, or Big Data, because they hide the complexity of distributed computing, scheduling, and communication while providing fault tolerance. Cloud-based environments are becoming a popular platform for hosting Hadoop clusters due to their low initial cost and limitless capacity. However, cloud-based Hadoop clusters bring their own challenges due to contradictory design principles. Hadoop is designed on the shared-nothing principle, while the cloud is based on the concepts of consolidation and resource sharing. Most of Hadoop's features are designed for on-premises data centers where the cluster topology is known. Hadoop depends on the rack assignment of servers (configured by the cluster administrator) to calculate the distance between servers. Hadoop calculates the distance between servers to find the best remote server from which to fetch non-local data. However, public cloud providers do not share rack information of virtual servers with their tenants. Without rack information, Hadoop may fetch data from a remote server that is on the other side of the data center. To overcome this problem, we propose a delay distribution based scheme to find the closest server from which to fetch non-local data for public cloud-based Hadoop clusters. The proposed scheme bases server selection on the delay distributions between server pairs. The delay distribution is calculated by periodically measuring the round-trip time between servers. Our experiments show that the proposed scheme outperforms conventional Hadoop by nearly 12% in terms of non-local data fetch time. This reduction in data fetch time will lead to a reduction in job run time, especially in real-world multi-user clusters where non-local data fetching can happen frequently.
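
A minimal sketch of delay-based server selection follows, assuming that each server pair's delay distribution is summarized by the median of its measured round-trip times. The summary statistic, the function name `closest_server`, the node names, and the sample values are all hypothetical; the paper may compare distributions differently, and no Hadoop API is used here.

```python
from statistics import median


def closest_server(local, candidates, rtt_samples_ms):
    """Pick the candidate whose RTT distribution to `local` has the smallest median.

    rtt_samples_ms maps (source, destination) pairs to periodically measured
    round-trip times. Using the median is an assumption of this sketch.
    """
    return min(candidates, key=lambda server: median(rtt_samples_ms[(local, server)]))


# Invented measurements for illustration only.
rtt_samples_ms = {
    ("node-a", "node-b"): [0.42, 0.40, 0.45, 3.10],  # one outlier sample
    ("node-a", "node-c"): [0.90, 0.95, 0.88, 0.91],
    ("node-a", "node-d"): [0.31, 0.33, 0.30, 0.35],
}
best = closest_server("node-a", ["node-b", "node-c", "node-d"], rtt_samples_ms)
```

Summarizing by the median rather than the mean keeps a single slow probe, such as the 3.10 ms sample for `node-b`, from dominating the comparison.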

  • OpenACC Parallelization of Stochastic Simulations on GPUs

    Pilsung KANG  

     
    LETTER-Fundamentals of Information Systems

      Publicized:
    2019/05/17
      Vol:
    E102-D No:8
      Page(s):
    1565-1568

    We present an OpenACC-based parallel implementation of stochastic algorithms for simulating biochemical reaction networks on modern GPUs (graphics processing units). To investigate the effectiveness of using OpenACC for leveraging the massive hardware parallelism of the GPU architecture, we carefully apply OpenACC's language constructs and mechanisms to implement a parallel version of stochastic simulation algorithms on the GPU. Comparing our OpenACC implementation with both NVIDIA CUDA and CPU-based implementations, we report our initial experiences with OpenACC's performance and programming productivity in the context of GPU-accelerated scientific computing.

  • Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster (Open Access)

    Takanobu BABA  Shinpei WATANABE  Boaz JESSIE JACKIN  Kanemitsu OOTSU  Takeshi OHKAWA  Takashi YOKOTA  Yoshio HAYASAKI  Toyohiko YATAGAI  

     
    PAPER-Human-computer Interaction

      Publicized:
    2019/03/29
      Vol:
    E102-D No:7
      Page(s):
    1310-1320

    The 3D holographic display has long been expected to serve as a future human interface, as it does not require users to wear special devices. However, its heavy computation requirement prevents the realization of such displays. A recent study indicates that objects and holograms with several giga-pixels should be processed in real time to realize high resolution and a wide view angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment in order to avoid heavy inter-node communications. Then, we applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations include a change in the way objects are decomposed, reduction of data transfer between the CPU and GPU, kernel integration, stream processing, and utilization of multiple GPUs within a node. The multi-node optimizations include methods for distributing object data from the host node to the other nodes. Experimental results show that intra-node optimizations attain an 11.52 times speed-up over the original single-node code. Further, multi-node optimizations using 8 nodes with 2 GPUs per node attain an execution time of 4.28 sec for generating a 1.6 giga-pixel hologram from a 3.2 giga-pixel object. This corresponds to a 237.92 times speed-up over sequential processing on a CPU and a 41.78 times speed-up over multi-threaded execution on a multicore CPU, using a conventional FFT-based algorithm.

  • Programmable Analog Calculation Unit with Two-Stage Architecture: A Solution of Efficient Vector-Computation (Open Access)

    Renyuan ZHANG  Takashi NAKADA  Yasuhiko NAKASHIMA  

     
    PAPER

      Vol:
    E102-A No:7
      Page(s):
    878-885

    A programmable analog calculation unit (ACU) is designed for vector computations in continuous time with a compact circuit scale. Our earlier study showed that it is feasible to retrieve arbitrary two-variable functions through support vector regression (SVR) in silicon. In this work, the dimensions of regression are expanded for vector computations. However, the hardware cost and computing error greatly increase along with the expansion of dimensions. A two-stage architecture is proposed to organize multiple ACUs for high-dimensional regression. The computation of high-dimensional vectors is separated into several computations of lower-dimensional vectors, which are implemented by the free combination of several ACUs at lower cost. In this manner, the circuit scale and regression error are reduced. The proof-of-concept ACU is designed and simulated in a 0.18μm technology. According to the circuit simulation results, all the demonstrated calculations with nine operands are executed without iterative clock cycles using 4960 transistors. The calculation error of the example functions is below 8.7%.

  • A Pulse-Tail-Feedback LC-VCO with 700Hz Flicker Noise Corner and -195dBc FoM (Open Access)

    Aravind Tharayil NARAYANAN  Kenichi OKADA  

     
    PAPER-Electronic Circuits

      Vol:
    E102-C No:7
      Page(s):
    595-606

    This paper proposes a pulse-tail-feedback VCO, in which the tail transistor is driven using pulse-shaped voltage signals with rail-to-rail swing. The proposed pulse-tail-feedback (PTFB) VCO relies on reducing the current conduction period of the tail transistor and operating the tail transistors in the triode region to reduce the flicker and thermal noise from the active elements. Mathematical analysis and circuit-level simulations of the phase noise mechanism in the proposed PTFB-VCO are also presented in this paper to validate the effectiveness of the proposed technique. A prototype LC-VCO with the proposed PTFB technique is fabricated in a standard 180nm CMOS. Laboratory measurement shows a power consumption of 1.35mW from a 1.2V supply at 4.6GHz. The proposed PTFB-VCO achieves a flicker corner of 700Hz, which enables the VCO to maintain a fairly constant figure-of-merit (FoM) of -195dB within a wide offset frequency range of 1kHz-10MHz.

  • Rapid Single-Flux-Quantum Truncated Multiplier Based on Bit-Level Processing (Open Access)

    Nobutaka KITO  Ryota ODAKA  Kazuyoshi TAKAGI  

     
    BRIEF PAPER-Superconducting Electronics

      Vol:
    E102-C No:7
      Page(s):
    607-611

    A rapid single-flux-quantum (RSFQ) truncated multiplier based on bit-level processing is proposed. In the multiplier, two operands are transformed into two serialized patterns of bits (pulses), and the multiplication is carried out by processing those bits. The result is obtained by counting bits. By calculating at the bit level, the proposed multiplier can be implemented in a small area. The gate-level design of the multiplier is shown. The layout of a 4-bit multiplier was also designed.

  • Entropy Based Illumination-Invariant Foreground Detection

    Karthikeyan PANJAPPAGOUNDER RAJAMANICKAM  Sakthivel PERIYASAMY  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2019/04/18
      Vol:
    E102-D No:7
      Page(s):
    1434-1437

    Background subtraction algorithms generate a background model of the monitoring scene and compare the background model with the current video frame to detect foreground objects. In general, most of the background subtraction algorithms fail to detect foreground objects when the scene illumination changes. An entropy based background subtraction algorithm is proposed to address this problem. The proposed method adapts to illumination changes by updating the background model according to differences in entropy value between the current frame and the previous frame. This entropy based background modeling can efficiently handle both sudden and gradual illumination variations. The proposed algorithm is tested in six video sequences and compared with four algorithms to demonstrate its efficiency in terms of F-score, similarity and frame rate.
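
The idea of driving background updates by frame entropy can be sketched as below. This is an assumption-laden illustration rather than the proposed algorithm: the abstract does not give the update rule, so the threshold, the blending factor `alpha`, and the policy of resetting the model on a large entropy change are invented here, and frames are represented as flat lists of grey levels.

```python
import math
from collections import Counter


def frame_entropy(pixels):
    """Shannon entropy (bits) of the grey-level histogram of a flattened frame."""
    total = len(pixels)
    counts = Counter(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def update_background(background, previous, current, threshold=0.5, alpha=0.05):
    """Blend the current frame into the background model.

    A large entropy change between consecutive frames is treated as a sudden
    illumination change and triggers a full reset; otherwise a slow running
    average is used. Threshold and alpha are assumed values.
    """
    if abs(frame_entropy(current) - frame_entropy(previous)) > threshold:
        return list(current)
    return [(1 - alpha) * b + alpha * c for b, c in zip(background, current)]


flat = [100] * 8                           # uniform frame: entropy 0 bits
split = [0, 0, 0, 0, 255, 255, 255, 255]   # two equally likely levels: 1 bit
```

Calling `update_background(flat, flat, split)` resets the model to the new frame because the entropy jumps by one bit, whereas `update_background(flat, flat, flat)` leaves the model essentially unchanged.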

  • Low-Complexity Blind Spectrum Sensing in Alpha-Stable Distributed Noise Based on a Gaussian Function

    Jinjun LUO  Shilian WANG  Eryang ZHANG  

     
    PAPER-Antennas and Propagation

      Publicized:
    2019/01/09
      Vol:
    E102-B No:7
      Page(s):
    1334-1344

    Spectrum sensing is a fundamental requirement for cognitive radio, and it is a challenging problem in impulsive noise modeled by symmetric alpha-stable (SαS) distributions. The Gaussian kernelized energy detector (GKED) performs better than the conventional detectors in SαS distributed noise. However, it fails to detect the DC signal and has high computational complexity. To solve these problems, this paper proposes a more efficient and robust detector based on a Gaussian function (GF). The analytical expressions of the detection and false alarm probabilities are derived and the best parameter for the statistic is calculated. Theoretical analysis and simulation results show that the proposed GF detector has much lower computational complexity than the GKED method, and it can successfully detect the DC signal. In addition, the GF detector performs better than the conventional counterparts including the GKED detector in SαS distributed noise with different characteristic exponents. Finally, we discuss the reason why the GF detector outperforms the conventional counterparts.

  • Serially Concatenated CPM in Two-Way Relay Channels with Physical-Layer Network Coding

    Nan SHA  Lihua CHEN  Yuanyuan GAO  Mingxi GUO  Kui XU  

     
    LETTER-Communication Theory and Signals

      Vol:
    E102-A No:7
      Page(s):
    934-937

    A physical-layer network coding (PNC) scheme is developed using serially concatenated continuous phase modulation (SCCPM) with symbol interleavers in a two-way relay channel (TWRC), i.e., SCCPM-PNC. The decoding structure of the relay is designed and the corresponding soft-input soft-output (SISO) iterative decoding algorithm is discussed. Simulation results show that the proposed SCCPM-PNC scheme achieves good bit error rate (BER) performance and that considerable improvements can be achieved by increasing the interleaver size and the number of iterations.
