Keyword Search Result

[Keyword] MPU(1519hit)

21-40hit(1519hit)

  • Information-Centric Function Chaining for ICN-Based In-Network Computing in the Beyond 5G/6G Era Open Access

    Yusaku HAYAMIZU  Masahiro JIBIKI  Miki YAMAMOTO  

     
    PAPER

      Publicized: 2023/10/06
      Vol: E107-B No:1
      Page(s): 94-104

    Information-Centric Networking (ICN), originally devised for efficient data distribution, is now being considered for edge computing environments. In this paper, we focus on a more flexible context, in-network computing, which is enabled by the ICN architecture. ICN-based in-network computing requires a function chaining (routing) method for chaining multiple functions located at different routers widely distributed in the network. Our proposal is a twofold approach: On-demand Routing for Responsive Route (OR3) and Route Records (RR). OR3 chains data and multiple functions more efficiently than an existing routing method. RR reactively stores routing information to reduce communication/computing overhead. We conduct a mathematical analysis to verify the correctness of the proposed routing algorithm. Moreover, we investigate the applicability of OR3/RR to an edge computing context in the future Beyond 5G/6G era, in which rich computing resources are provided by mobile nodes thanks to cutting-edge mobile device technologies. In mobile environments, the optimum from the viewpoint of “routing” differs greatly from that in a stable wired environment. We address this challenging issue and propose protocol enhancements for OR3 that consider node mobility. Evaluation results reveal that mobility-enhanced OR3 can discover stable paths for function chaining, enabling more reliable ICN-based in-network computing in highly dynamic network environments.

  • Minimization of Energy Consumption in TDMA-Based Wireless-Powered Multi-Access Edge Computing Networks

    Xi CHEN  Guodong JIANG  Kaikai CHI  Shubin ZHANG  Gang CHEN  Jiang LIU  

     
    PAPER-Communication Theory and Signals

      Publicized: 2023/06/19
      Vol: E106-A No:12
      Page(s): 1544-1554

    Many nodes in the Internet of Things (IoT) rely on batteries for power. Additionally, the demand for executing compute-intensive and latency-sensitive tasks on IoT nodes is increasing. In some practical scenarios, the computation tasks of wireless devices (WDs) are non-separable, that is, binary offloading strategies must be used. In this paper, we focus on the design of an efficient binary offloading algorithm that minimizes system energy consumption (EC) for TDMA-based wireless-powered multi-access edge computing networks, where WDs either compute tasks locally or offload them to hybrid access points (H-APs). We formulate the EC minimization problem, which is non-convex, and decompose it into a master problem that optimizes the binary offloading decision and a subproblem that optimizes the wireless power transfer (WPT) duration and the task offloading transmission durations. For the master problem, a deep reinforcement learning (DRL) based method is applied to obtain a near-optimal offloading decision. For the subproblem, we first consider the scenario where the nodes have no completion time constraints and obtain the optimal analytical solution. We then consider the scenario with such constraints. By jointly using the Golden Section Method and the bisection method, the optimal solution can be obtained thanks to the convexity of the constraint function. Simulation results show that the proposed DRL-based offloading algorithm achieves near-minimal EC.
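    The abstract names the Golden Section Method as one of the one-dimensional search tools used for the duration subproblem. As a rough illustration only, the sketch below applies a textbook golden-section search to a toy convex cost. The function `toy_energy` and the search interval are hypothetical stand-ins and are not the paper's objective or constraints.

```python
import math


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, ~0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def toy_energy(t):
    # Hypothetical convex cost in a duration t: one term falls with t, one grows.
    return 1.0 / t + 2.0 * t


t_star = golden_section_minimize(toy_energy, 0.1, 5.0)
print(f"minimizing duration ~ {t_star:.4f}")
```

    Each iteration reuses one of the two previous function evaluations, so the interval shrinks by a constant factor per single new evaluation of the cost.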

  • MHND: Multi-Homing Network Design Model for Delay Sensitive Applications Open Access

    Akio KAWABATA  Bijoy CHAND CHATTERJEE  Eiji OKI  

     
    PAPER-Network

      Publicized: 2023/07/24
      Vol: E106-B No:11
      Page(s): 1143-1153

    When mission-critical applications are provided over a network, high availability is required in addition to low delay. This paper proposes a multi-homing network design model, named MHND, that achieves low delay, high availability, and the order guarantee of events. MHND maintains the event occurrence order with a multi-homing configuration using conservative synchronization. We formulate MHND as an integer linear programming problem to minimize the delay. We prove that the distributed server allocation problem with MHND is NP-complete. Numerical results indicate that, as the multi-homing number, which is the number of servers to which each user belongs, increases, the availability increases at the cost of increased delay. Notably, a multi-homing number of two or more can achieve approximately an order of magnitude higher availability than conventional single-homing, at the expense of a delay increase of up to two times. By using MHND, flexible network design is achieved based on the acceptable delay in service and the required availability.

  • User Scheduling and Clustering for Distributed Antenna Network Using Quantum Computing

    Keishi HANAKAGO  Ryo TAKAHASHI  Takahiro OHYAMA  Fumiyuki ADACHI  

     
    PAPER-Wireless Communication Technologies

      Publicized: 2023/07/24
      Vol: E106-B No:11
      Page(s): 1210-1218

    In this study, an overloaded large-scale distributed antenna network is considered, for which the number of active users is larger than that of antennas distributed in a base station coverage area (called a cell). To avoid overload, users in each cell are divided into multiple user groups, and, to reduce the computational complexity required for multi-user multiple-input and multiple-output (MU-MIMO), users in each user group are grouped into multiple user clusters so that cluster-wise distributed MU-MIMO can be performed in parallel in each user group. However, as the network size increases, conventional computational methods may not be able to solve combinatorial optimization problems, such as user scheduling and user clustering, which are required for performing cluster-wise distributed MU-MIMO in a finite amount of time. In this study, we apply quantum computing to solve the combinatorial optimization problems of user scheduling and clustering for an overloaded distributed antenna network and propose a quantum computing-based user scheduling and clustering method. The results of computer simulations indicate that as the technology of quantum computers and their related algorithms evolves in the future, the proposed method can realize large-scale dense wireless systems and realize real-time optimization with a short optimization execution cycle.

  • Enhancing Cup-Stacking Method for Collective Communication

    Takashi YOKOTA  Kanemitsu OOTSU  Shun KOJIMA  

     
    PAPER-Computer System

      Publicized: 2023/08/22
      Vol: E106-D No:11
      Page(s): 1808-1821

    An interconnection network is an indispensable component for constructing parallel computers. It connects computation nodes so that the nodes can communicate with each other. As a parallel computation essentially requires inter-node communication according to a parallel algorithm, the interconnection network plays an important role in terms of communication performance. This paper focuses on the collective communication that is frequently performed in parallel computation and addresses the Cup-Stacking method proposed in our preceding work. The key points of the method are splitting a large packet into slices, re-shaping the slices, and stacking the slices, in a genetic algorithm (GA) manner. This paper discusses extending the Cup-Stacking method by introducing additional items (genes) and proposes the extended Cup-Stacking method. Furthermore, this paper comprehensively discusses the drawbacks and further optimization of the method. Evaluation results reveal the effectiveness of the extended method, where the proposed method achieves at most seven percent improvement in duration time over the former Cup-Stacking method.

  • Switch-Based Quorum Coordination for Low Tail Latency in Replicated Storage

    Gyuyeong KIM  

     
    LETTER-Information Network

      Publicized: 2023/08/22
      Vol: E106-D No:11
      Page(s): 1922-1925

    Modern distributed storage requires microsecond-scale tail latency, but the current coordinator-based quorum coordination causes a burdensome latency overhead. This paper presents Archon, a new quorum coordination architecture that supports low tail latency for microsecond-scale replicated storage. The key idea of Archon is to perform the quorum coordination in the network switch by leveraging the flexibility and capability of emerging programmable switch ASICs. Our in-network quorum coordination is based on the observation that the modern programmable switch provides nanosecond-scale processing delay and high flexibility simultaneously. To realize the idea, we design a custom switch data plane. We implement an Archon prototype on an Intel Tofino switch and conduct a series of testbed experiments. Our experimental results show that Archon can provide lower tail latency than the coordinator-based solution.

  • Recursive Probability Mass Function Method to Calculate Probability Distributions of Pulse-Shaped Signals

    Tomoya FUKAMI  Hirobumi SAITO  Akira HIROSE  

     
    PAPER-Digital Signal Processing

      Publicized: 2023/03/27
      Vol: E106-A No:10
      Page(s): 1286-1296

    This paper proposes an accurate and efficient method to calculate probability distributions of pulse-shaped complex signals. We show that the distribution over the in-phase and quadrature-phase (I/Q) complex plane is obtained by a recursive probability mass function of the accumulator for a pulse-shaping filter. In contrast to existing analytical methods, the proposed method provides complex-plane distributions in addition to instantaneous power distributions. Since digital signal processing generally deals with complex amplitude rather than power, the complex-plane distributions are more useful when considering digital signal processing. In addition, our approach is free from the derivation of signal-dependent functions. This fact results in its easy application to arbitrary constellations and pulse-shaping filters like Monte Carlo simulations. Since the proposed method works without numerical integrals and calculations of transcendental functions, the accuracy degradation caused by floating-point arithmetic is inherently reduced. Even though our method is faster than Monte Carlo simulations, the obtained distributions are more accurate. These features of the proposed method realize a novel framework for evaluating the characteristics of pulse-shaped signals, leading to new modulation, predistortion and peak-to-average power ratio (PAPR) reduction schemes.
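    To make the idea of a recursive probability mass function over the I/Q plane concrete, here is a minimal sketch, not the authors' algorithm. It assumes independent, equiprobable symbols, a short filter with integer taps (so that complex values compare exactly), and a single accumulator output formed as the sum of tap times symbol. The names `accumulator_pmf`, `qpsk`, and `taps` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def accumulator_pmf(taps, constellation):
    """PMF of sum_k taps[k] * s_k for i.i.d. equiprobable symbols s_k.

    The PMF is updated one filter tap at a time: each existing accumulator
    value branches into one new value per constellation point.
    """
    pmf = {0j: Fraction(1)}  # the empty accumulator is exactly zero
    p_symbol = Fraction(1, len(constellation))
    for h in taps:
        updated = defaultdict(Fraction)
        for value, prob in pmf.items():
            for s in constellation:
                updated[value + h * s] += prob * p_symbol
        pmf = dict(updated)
    return pmf


# Illustrative inputs only: QPSK points and a toy 3-tap integer filter.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
taps = [1, 2, 1]

pmf = accumulator_pmf(taps, qpsk)
peak_power = max(abs(v) ** 2 for v in pmf)
print(len(pmf), "distinct I/Q points, peak instantaneous power", peak_power)
```

    Because probabilities are kept as exact fractions and no integrals or transcendental functions appear, the sketch involves no floating-point rounding in the probabilities, which mirrors the motivation stated in the abstract. Merging identical accumulator values at every step is what keeps the table small compared with enumerating all symbol sequences.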

  • A Network Design Scheme in Delay Sensitive Monitoring Services Open Access

    Akio KAWABATA  Takuya TOJO  Bijoy CHAND CHATTERJEE  Eiji OKI  

     
    PAPER-Network Management/Operation

      Publicized: 2023/04/19
      Vol: E106-B No:10
      Page(s): 903-914

    Mission-critical monitoring services, such as finding criminals with a monitoring camera, require rapid detection of newly updated data, so suppressing delay is desirable. With this aim, this paper proposes a network design scheme to minimize this delay for monitoring services that consist of Internet-of-Things (IoT) devices located at terminal endpoints (TEs), databases (DBs), and applications (APLs). The proposed scheme determines the allocation of the DB and APLs and the selection of the server to which each TE belongs. The DB and APLs are allocated to an optimal server among multiple servers in the network. We formulate the proposed network design scheme as an integer linear programming problem. The delay reduction effect of the proposed scheme is evaluated under two network topologies and a monitoring camera system network. In the two network topologies, the delays of the proposed scheme are 78 and 80 percent of those of the conventional scheme. In the monitoring camera system network, the delay of the proposed scheme is 77 percent of that of the conventional scheme. These results indicate that the proposed scheme reduces the delay compared to the conventional scheme, where APLs are located near TEs. The computation time of the proposed scheme is acceptable for the design phase before the service is launched. The proposed scheme can contribute to a network design that detects newly added objects quickly in monitoring services.

  • Contact Pad Design Considerations for Semiconductor Qubit Devices for Reducing On-Chip Microwave Crosstalk

    Kaito TOMARI  Jun YONEDA  Tetsuo KODERA  

     
    BRIEF PAPER

      Publicized: 2023/02/20
      Vol: E106-C No:10
      Page(s): 588-591

    Reducing on-chip microwave crosstalk is crucial for semiconductor spin qubit integration. Toward crosstalk reduction and qubit integration, we investigate on-chip microwave crosstalk for gate electrode pad designs with (i) etched trenches between contact pads or (ii) contact pads with reduced sizes. We conclude that the design with feature (ii) is advantageous for high-density integration of semiconductor qubits with small crosstalk (below -25 dB at 6 GHz), favoring the introduction of flip-chip bonding.

  • Feedback Node Sets in Pancake Graphs and Burnt Pancake Graphs

    Sinyu JUNG  Keiichi KANEKO  

     
    PAPER-Fundamentals of Information Systems

      Publicized: 2023/06/30
      Vol: E106-D No:10
      Page(s): 1677-1685

    A feedback node set (FNS) of a graph is a subset of the nodes of the graph whose deletion makes the residual graph acyclic. By finding an FNS in an interconnection network, we can set a check point at each node in it to avoid a livelock configuration. Hence, finding an FNS is a critical issue for enhancing the dependability of a parallel computing system. In this paper, we propose a method to find FNSs in n-pancake graphs and n-burnt pancake graphs. By analyzing the types of cycles proposed in our method, we also give the number of the nodes in the FNS in an n-pancake graph, (n-2.875)(n-1)!+1.5(n-3)!, and that in an n-burnt pancake graph, 2^(n-1)(n-1)!(n-3.5).
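    The definition at the start of this abstract can be checked mechanically on small instances. The sketch below is not the paper's construction method; it only tests whether a given node set is an FNS of an n-pancake graph, assuming the usual definition in which nodes are permutations of 1..n and edges are prefix reversals of length 2 to n. The helper names are hypothetical.

```python
from itertools import permutations


def pancake_neighbors(perm):
    """Neighbors of a permutation under prefix reversals of length 2..n."""
    return [perm[:k][::-1] + perm[k:] for k in range(2, len(perm) + 1)]


def is_feedback_node_set(n, removed):
    """Return True if deleting `removed` leaves the n-pancake graph acyclic."""
    removed = set(removed)
    nodes = [p for p in permutations(range(1, n + 1)) if p not in removed]
    parent = {p: p for p in nodes}  # union-find forest over the remaining nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u in nodes:
        for v in pancake_neighbors(u):
            if v in parent and u < v:  # visit each undirected edge once
                root_u, root_v = find(u), find(v)
                if root_u == root_v:
                    return False  # this edge closes a cycle
                parent[root_u] = root_v
    return True


# The 3-pancake graph is a single 6-cycle, so any one node is an FNS.
print(is_feedback_node_set(3, []))
print(is_feedback_node_set(3, [(1, 2, 3)]))
```

    The check is exhaustive over all n! nodes, so it is only practical for small n.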

  • Efficient Construction of CGL Hash Function Using Legendre Curves

    Yuji HASHIMOTO  Koji NUIDA  

     
    PAPER-Cryptography and Information Security

      Publicized: 2023/02/07
      Vol: E106-A No:9
      Page(s): 1131-1140

    The CGL hash function is a provably secure hash function using walks on isogeny graphs of supersingular elliptic curves. A dominant cost of its computation comes from iterative computations of power roots over quadratic extension fields. In this paper, we reduce the necessary number of power root computations by almost half, by applying and also extending an existing method of efficient isogeny sequence computation on Legendre curves (Hashimoto and Nuida, CASC 2021). We also point out a relationship between 2-isogenies for Legendre curves and those for Edwards curves, which is of independent interest, and develop a method of efficient computation for 2^e-th roots in quadratic extension fields.

  • Attractiveness Computing in Image Media

    Toshihiko YAMASAKI  

     
    INVITED PAPER-Vision

      Publicized: 2023/06/16
      Vol: E106-A No:9
      Page(s): 1196-1201

    Our research group has been working on attractiveness prediction, reasoning, and even enhancement for multimedia content, which we call “attractiveness computing.” Attractiveness includes impressiveness, instagrammability, memorability, clickability, and so on. Analyzing such attractiveness was usually done by experienced professionals but we have experimentally revealed that artificial intelligence (AI) based on big multimedia data can imitate or reproduce professionals' skills in some cases. In this paper, we introduce some of the representative works and possible real-life applications of our attractiveness computing for image media.

  • Backup Resource Allocation Model with Probabilistic Protection Considering Service Delay

    Shinya HORIMOTO  Fujun HE  Eiji OKI  

     
    PAPER-Network

      Publicized: 2023/03/24
      Vol: E106-B No:9
      Page(s): 798-816

    This paper proposes a backup resource allocation model for virtual network functions (VNFs) that minimizes the total computing capacity allocated for backup while considering the service delay. If failures occur in primary hosts, the VNFs in the failed hosts are recovered by backup hosts whose allocation is pre-determined. We introduce probabilistic protection, where the probability that the protection by a backup host fails is limited to within a given value; it allows backup resource sharing to reduce the total allocated computing capacity. Previous work does not consider the service delay constraint in the backup resource allocation problem. The proposed model constrains, to within a given value, the probability that the service delay, which consists of the networking delay between hosts and the processing delay in each VNF, exceeds its threshold. We introduce a basic algorithm to solve our formulated delay-constrained optimization problem. For problem sizes that the basic algorithm cannot solve within an acceptable computation time limit, we develop a simulated annealing algorithm incorporating Yen's algorithm to handle the delay constraint heuristically. We observe that both algorithms in the proposed model reduce the total allocated computing capacity by up to 56.3% compared to a baseline; the simulated annealing algorithm can obtain feasible solutions in problems where the basic algorithm cannot.

  • A Fully Analog Deep Neural Network Inference Accelerator with Pipeline Registers Based on Master-Slave Switched Capacitors

    Yaxin MEI  Takashi OHSAWA  

     
    PAPER-Integrated Electronics

      Publicized: 2023/03/08
      Vol: E106-C No:9
      Page(s): 477-485

    A fully analog pipelined deep neural network (DNN) accelerator is proposed, which is constructed by using pipeline registers based on master-slave switched capacitors. The master-slave switched capacitor is an analog equivalent of the delayed flip-flop (D-FF), which has been used as a digital pipeline register. To estimate the performance of the pipeline register, it is applied to a conventional DNN that performs non-pipelined operation. Compared with the conventional DNN, the cycle time is reduced by 61.5% and the data rate is increased by 160%. The accuracy reaches 99.6% in an MNIST classification test. The energy consumption per classification is reduced by 88.2% to 0.128 µJ, achieving an energy efficiency of 1.05 TOPS/W and a throughput of 0.538 TOPS in a 180 nm technology node.
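    The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the data rate is inversely proportional to the cycle time and that average power equals throughput divided by energy efficiency. The derived baseline energy and the power figure are back-of-the-envelope values computed here and are not stated in the abstract.

```python
# Figures taken from the abstract.
cycle_time_reduction = 0.615   # cycle time reduced by 61.5%
energy_after_uj = 0.128        # energy per classification after the change, in microjoules
energy_reduction = 0.882       # energy per classification reduced by 88.2%
throughput_tops = 0.538
efficiency_tops_per_w = 1.05

# Assumption: data rate scales as 1 / cycle time.
data_rate_gain = 1 / (1 - cycle_time_reduction) - 1

# Derived values (not stated in the abstract).
energy_before_uj = energy_after_uj / (1 - energy_reduction)
power_w = throughput_tops / efficiency_tops_per_w

print(f"data rate gain: {data_rate_gain:.1%}")
print(f"implied baseline energy: {energy_before_uj:.3f} uJ")
print(f"implied average power: {power_w:.3f} W")
```

    Under the stated assumption, a 61.5% shorter cycle time corresponds to a data rate gain of about 160%, which agrees with the reported figure.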

  • Networking Experiment of Domain-Specific Networking Platform Based on Optically Interconnected Reconfigurable Communication Processors Open Access

    Masaki MURAKAMI  Takashi KURIMOTO  Satoru OKAMOTO  Naoaki YAMANAKA  Takayuki MURANAKA  

     
    PAPER-Network System

      Publicized: 2023/02/15
      Vol: E106-B No:8
      Page(s): 660-668

    A domain-specific networking platform based on optically interconnected reconfigurable communication processors is proposed. Some application examples of the reconfigurable communication processor and networking experiment results are presented.

  • Write Variation & Reliability Error Compensation by Layer-Wise Tunable Retraining of Edge FeFET LM-GA CiM

    Shinsei YOSHIKIYO  Naoko MISAWA  Kasidit TOPRASERTPONG  Shinichi TAKAGI  Chihiro MATSUI  Ken TAKEUCHI  

     
    PAPER

      Publicized: 2022/12/19
      Vol: E106-C No:7
      Page(s): 352-364

    This paper proposes a layer-wise tunable retraining method for edge FeFET Computation-in-Memory (CiM) to compensate for the accuracy degradation of neural networks (NNs) caused by FeFET device errors. The proposed retraining can tune the number of layers to be retrained to reduce inference accuracy degradation by errors that occur after retraining. Weights of the original NN model, accurately trained in a cloud data center, are written into edge FeFET CiM. The written weights are changed by FeFET device errors in the field. By partially retraining the written NN model, the proposed method combines the error-affected layers of the NN model with the retrained layers. The inference accuracy is thus recovered. After retraining, the retrained layers are re-written to CiM and affected by device errors again. In the evaluation, the recovery capability of the NN model by partial retraining is analyzed first. Then the inference accuracy after re-writing is evaluated. Recovery capability is evaluated with typical non-volatile memory (NVM) errors: normal distribution, uniform shift, and bit-inversion. For all types of errors, more than 50% of the degraded percentage of inference accuracy is recovered by retraining only the final fully-connected (FC) layer of Resnet-32. To simulate FeFET Local-Multiply and Global-accumulate (LM-GA) CiM, recovery capability is also evaluated with FeFET errors modeled based on FeFET measurements. Retraining only the FC layer achieves a recovery rate of up to 53%, 66%, and 72% for FeFET write variation, read-disturb, and data-retention, respectively. In addition, just adding two more retraining layers improves the recovery rate by 20-30%. In order to tune the number of retraining layers, inference accuracy after re-writing is evaluated by simulating the errors that occur after retraining. When typical NVM errors are injected, it is optimal to retrain the FC layer and 3-6 convolution layers of Resnet-32. The optimal number of layers can be increased or decreased depending on the balance between the size of errors before retraining and errors after retraining.

  • Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks

    Yuetsu KODAMA  Masaaki KONDO  Mitsuhisa SATO  

     
    PAPER

      Publicized: 2022/12/12
      Vol: E106-C No:6
      Page(s): 303-311

    The supercomputer, “Fugaku”, which ranked number one in multiple supercomputing lists, including the Top500 in June 2020, has various power control features, such as (1) an eco mode that utilizes only one of two floating-point pipelines while decreasing the power supply to the chip; (2) a boost mode that increases clock frequency; and (3) a core retention feature that turns unused cores to the low-power state. By orchestrating these power-performance features while considering the characteristics of running applications, we can potentially gain even better system-level energy efficiency. In this paper, we report on the performance and power consumption of Fugaku using SPEC HPC benchmarks. Consequently, we confirmed that it is possible to reduce the energy by about 17% while improving the performance by about 2% from the normal mode by combining boost mode and eco mode.
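    The closing result of this abstract combines an energy figure and a performance figure, and the implied change in average power follows from energy = power × time. The sketch below assumes that a 2% performance improvement means the run time shrinks by a factor of 1/1.02. The resulting power ratio is a derived estimate and is not a number reported in the abstract.

```python
# Figures taken from the abstract (both are approximate, "about" values).
energy_reduction = 0.17        # energy reduced by about 17%
performance_gain = 0.02        # performance improved by about 2%

energy_ratio = 1 - energy_reduction    # energy relative to normal mode
time_ratio = 1 / (1 + performance_gain)  # assumed: run time is inversely proportional to performance

# Energy = average power * time, so the average power ratio is energy ratio / time ratio.
power_ratio = energy_ratio / time_ratio

print(f"run time relative to normal mode: {time_ratio:.3f}")
print(f"average power relative to normal mode: {power_ratio:.3f}")
```

    Under this assumption, the combined boost and eco modes would draw roughly 15% less average power than normal mode while finishing slightly sooner.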

  • A Shallow SNN Model for Embedding Neuromorphic Devices in a Camera for Scalable Video Surveillance Systems

    Kazuhisa FUJIMOTO  Masanori TAKADA  

     
    PAPER-Biocybernetics, Neurocomputing

      Publicized: 2023/03/13
      Vol: E106-D No:6
      Page(s): 1175-1182

    Neuromorphic computing with a spiking neural network (SNN) is expected to provide a complement or alternative to deep learning in the future. The challenge is to develop optimal SNN models, algorithms, and engineering technologies for real use cases. As a potential use case for neuromorphic computing, we have investigated person monitoring and worker support with a video surveillance system, given its status as a proven deep neural network (DNN) use case. In the future, to increase the number of cameras in such a system, we will need a scalable approach that embeds only a few neuromorphic devices in a camera. Specifically, this will require a shallow SNN model that can be implemented in a few neuromorphic devices while providing a high recognition accuracy comparable to a DNN with the same configuration. A shallow SNN was built by converting ResNet, a proven DNN for image recognition, and a new configuration of the shallow SNN model was developed to improve its accuracy. The proposed shallow SNN model was evaluated with a few neuromorphic devices, and it achieved a recognition accuracy of more than 80% with about 1/130 of the energy consumption of a GPU running a DNN with the same configuration as the SNN.

  • A Computer-Aided Solution to Find All Feasible Schemes of Cyclic Interference Alignment for Propagation-Delay Based X Channels

    Conggai LI  Feng LIU  Xin ZHOU  Yanli XU  

     
    LETTER-Communication Theory and Signals

      Publicized: 2022/11/02
      Vol: E106-A No:5
      Page(s): 868-870

    To obtain a full picture of potential applications for propagation-delay based X channels, it is important to obtain all feasible schemes of cyclic interference alignment, including the encoder, channel instance, and decoder. However, as the dimension grows larger, theoretical analysis of this issue becomes tedious or even impossible. In this letter, we propose a computer-aided solution that searches the channel space and the scheduling space and can find all feasible schemes in detail. Examples are given for some typical X channels. Computational complexity is further analyzed.

  • Edge Computing Resource Allocation Algorithm for NB-IoT Based on Deep Reinforcement Learning

    Jiawen CHU  Chunyun PAN  Yafei WANG  Xiang YUN  Xuehua LI  

     
    PAPER-Network

      Publicized: 2022/11/04
      Vol: E106-B No:5
      Page(s): 439-447

    Mobile edge computing (MEC) technology guarantees the privacy and security of large-scale data in the Narrowband-IoT (NB-IoT) by deploying MEC servers near base stations to provide sufficient computing, storage, and data processing capacity to meet the delay and energy consumption requirements of NB-IoT terminal equipment. For the NB-IoT MEC system, this paper proposes a resource allocation algorithm based on deep reinforcement learning to optimize the total cost of task offloading and execution. Since the formulated problem is a mixed-integer non-linear programming (MINLP) problem, we cast it as a multi-agent distributed deep reinforcement learning (DRL) problem and address it using a dueling Q-learning network algorithm. Simulation results show that, compared with the deep Q-learning network and the all-local cost and all-offload cost algorithms, the proposed algorithm can effectively guarantee the success rates of task offloading and execution. In addition, when the execution task volume is 200KBit, the total system cost of the proposed algorithm can be reduced by at least 1.3%, and when the execution task volume is 600KBit, the total cost of system execution tasks can be reduced by 16.7% at most.
