The search functionality is under construction.

Keyword Search Result

[Keyword] speedup(8hit)

1-8hit
  • Symmetric Decomposition of Convolution Kernels

    Jun OU  Yujian LI  

     
    LETTER-Biocybernetics, Neurocomputing

      Pubricized:
    2018/10/18
      Vol:
    E102-D No:1
      Page(s):
    219-222

    It is a hot issue that speeding up the network layers and decreasing the network parameters in convolutional neural networks (CNNs). In this paper, we propose a novel method, namely, symmetric decomposition of convolution kernels (SDKs). It symmetrically separates k×k convolution kernels into (k×1 and 1×k) or (1×k and k×1) kernels. We conduct the comparison experiments of the network models designed by SDKs on MNIST and CIFAR-10 datasets. Compared with the corresponding CNNs, we obtain good recognition performance, with 1.1×-1.5× speedup and more than 30% reduction of network parameters. The experimental results indicate our method is useful and effective for CNNs in practice, in terms of speedup performance and reduction of parameters.

  • Parallel DFA Architecture for Ultra High Throughput DFA-Based Pattern Matching

    Yi TANG  Junchen JIANG  Xiaofei WANG  Chengchen HU  Bin LIU  Zhijia CHEN  

     
    PAPER

      Vol:
    E93-D No:12
      Page(s):
    3232-3242

    Multi-pattern matching is a key technique for implementing network security applications such as Network Intrusion Detection/Protection Systems (NIDS/NIPSes) where every packet is inspected against tens of thousands of predefined attack signatures written in regular expressions (regexes). To this end, Deterministic Finite Automaton (DFA) is widely used for multi-regex matching, but existing DFA-based researches have claimed high throughput at an expense of extremely high memory cost, so fail to be employed in devices such as high-speed routers and embedded systems where the available memory is quite limited. In this paper, we propose a parallel architecture of DFA called Parallel DFA (PDFA) taking advantage of the large amount of concurrent flows to increase the throughput with nearly no extra memory cost. The basic idea is to selectively store the underlying DFA in memory modules that can be accessed in parallel. To explore its potential parallelism we intensively study DFA-split schemes from both state and transition points in this paper. The performance of our approach in both the average cases and the worst cases is analyzed, optimized and evaluated by numerical results. The evaluation shows that we obtain an average speedup of 100 times compared with traditional DFA-based matching approach.

  • Efficient Scheduling for SDMG CIOQ Switches

    Mei YANG  Si Qing ZHENG  

     
    PAPER-Switching for Communications

      Vol:
    E89-B No:9
      Page(s):
    2457-2468

    Combined input and output queuing (CIOQ) switches are being considered as high-performance switch architectures due to their ability to achieve 100% throughput and perfectly emulate output queuing (OQ) switch performance with a small speedup factor S. To realize a speedup factor S, a conventional CIOQ switch requires the switching fabric and memories to operate S times faster than the line rate. In this paper, we propose to use a CIOQ switch with space-division multiplexing expansion and grouped input/output ports (SDMG CIOQ switch for short) to realize speedup while only requiring the switching fabric and memories to operate at the line rate. The cell scheduling problem for the SDMG CIOQ switch is abstracted as a bipartite k-matching problem. Using fluid model techniques, we prove that any maximal size k-matching algorithm on an SDMG CIOQ switch with an expansion factor 2 can achieve 100% throughput assuming input line arrivals satisfy the strong law of large numbers (SLLN) and no input/output line is oversubscribed. We further propose an efficient and starvation-free maximal size k-matching scheduling algorithm, kFRR, for the SDMG CIOQ switch. Simulation results show that kFRR achieves 100% throughput for SDMG CIOQ switches with an expansion factor 2 under two SLLN traffic models, uniform traffic and polarized traffic, confirming our analysis.

  • Performance Evaluation of Concurrent System Using Formal Model: Simulation Speedup

    Wan Bok LEE  Tag Gon KIM  

     
    PAPER

      Vol:
    E86-A No:11
      Page(s):
    2755-2766

    Analysis of concurrent systems, such as computer/communication networks and manufacturing systems, usually employs formal discrete event models. The analysis then includes model validation, property verification, and performance evaluation of such models. The DEVS (Discrete Event Systems Specification) formalism is a well-known formal modeling framework which supports specification of discrete event models in a hierarchical, modular manner. While validation and verification using formal models may not resort to discrete event simulation, accurate performance evaluation must employ discrete event simulation of formal models. Since formal models, such as DEVS models, explicitly represent communication semantics between component models, their simulation cost is much higher than using simulation languages with informal models. This paper proposes a method for simulation speedup in performance evaluation of concurrent systems using DEVS models. The method is viewed as a compiled simulation technique which eliminates run-time interpretation of communication paths between component models. The elimination has been done by a behavior-preserved transformation method, called model composition, which is based on the closed under coupling property in DEVS theory. Experimental results show that the simulation speed of transformed DEVS models is about 14 times faster than original ones.

  • A Hybrid Switch System Architecture for Large-Scale Digital Communication Network Using SFQ Technology

    Shinichi YOROZU  Yoshio KAMEDA  Shuichi TAHARA  

     
    PAPER-Digital Applications

      Vol:
    E84-C No:1
      Page(s):
    15-19

    Within the next few decades, high-end telecommunication systems on the larger nationwide network will require a switching capacity of over 5 Tbps. Advanced optical transmission technologies, such as wavelength division multiplexing (WDM) will support optical-fiber data transmission at such speeds. However, semiconductors may not be capable of high-throughput data switching because of the limitations by power consumption and operating speed, and pin count. Superconducting single flux quantum (SFQ) technology is a promising approach for overcoming these problems. This paper proposed an optical-electrical-SFQ hybrid switching system and a novel switch architecture. This architecture uses time-shifted internal speedup, shuffle and grouping exchange and a Batcher-Banyan switch. Our proposed switch consists of an interface circuit with small buffers, a Batcher sorter, a time-shift-speedup buffer (TSSB), a Banyan switch, and a slowdown buffer. Simulations showed good scalability up to 100 Tbps, which no router could ever offer such features.

  • PLL Frequency Synthesizer with Multi-Phase Detector

    Yasuaki SUMI  Kouichi SYOUBU  Shigeki OBOTE  Yutaka FUKUI  Yoshio ITOH  

     
    PAPER

      Vol:
    E82-A No:3
      Page(s):
    431-435

    The lock-up time of a PLL frequency synthesizer mainly depends on the total loop gain. Since the gain of the conventional phase detector is constant, it is difficult to improve the lock-up time by the phase detector. In this paper, we reconsider the operation of the phase detector and propose the PLL frequency synthesizer with multi-phase detector in which the gain of phase detector is increased by using four stage phase detectors and charge pumps. Then, a higher speed lock-up time and good spurious characteristics can be achieved.

  • A Method of Automatic Skew Normalization for Input Images

    Yasuo KUROSU  Hidefumi MASUZAKI  

     
    PAPER-Image Processing,Computer Graphics and Pattern Recognition

      Vol:
    E81-D No:8
      Page(s):
    909-916

    It becomes essential in practice to improve a processing rate and to divide an image into small segments adjusting a limited memory, because image filing systems handle large images up to A1 size. This paper proposes a new method of an automatic skew normalization, comprising a high-speed skew detection and a distortion-free dividing rotation. We have evaluated the proposed method from the viewpoints of the processing rate and the accuracy for typed documents. As results, the processing rate is 2. 9 times faster than that of a conventional method. A practical processing rate for A1 size documents can be achieved under the condition that the accuracy of a normalized angle is controlled within 0. 3 degrees. Especially, the rotation with dividing can have no error angle, even when the A1 size documents is divided into 200 segments, whereas the conventional method cause the error angle of 1. 68 degrees.

  • Performance Analysis of Parallel Test Generation for Combinational Circuits

    Tomoo INOUE  Takaharu FUJII  Hideo FUJIWARA  

     
    PAPER-Fault Tolerant Computing

      Vol:
    E79-D No:9
      Page(s):
    1257-1265

    The problem of test generation for VLSI circuits computationally requires prohibitive costs. Parallel processing on a multiprocessor system is one of available methods in order to speedup the process for such time-consuming problems. In this paper, we analyze the performance of parallel test generation for combinational circuits. We present two types of parallel test generation systems in which the communication methods are different; vector broadcasting (VB) and fault broadcasting (FB) systems, and analyze the number of generated test vectors, the costs of test vector generation, fault simulation and communication, and the speedup of these parallel test generation systems, where the two types of communication factors; the communication cut-off factor and the communication period, are applied. We also present experimental results on the VB and FB systems implemented on a network of workstations using ISCAS'85 and ISCAS'89 benchmark circuits. The analytical and experimental results show that the total number of test vectors generated in the VB system is the same as that in the FB system, the speedup of the FB system is larger than that of the VB, and it is effective in reducing the communication cost to switch broadcasted data from vectors to faults.