The search functionality is under construction.

Keyword Search Result

[Keyword] scalar(57hit)

1-20hit(57hit)

  • Simulation of Scalar-Mode Optically Pumped Magnetometers to Search Optimal Operating Conditions Open Access

    Yosuke ITO  Tatsuya GOTO  Takuma HORI  

     
    INVITED PAPER

      Pubricized:
    2023/12/04
      Vol:
    E107-C No:6
      Page(s):
    164-170

    In recent years, measuring biomagnetic fields in the Earth’s field by differential measurements of scalar-mode OPMs have been actively attempted. In this study, the sensitivity of the scalar-mode OPMs under the geomagnetic environment in the laboratory was studied by numerical simulation. Although the noise level of the scalar-mode OPM in the laboratory environment was calculated to be 104 pT/$\sqrt{\mathrm{Hz}}$, the noise levels using the first-order and the second-order differential configurations were found to be 529 fT/cm/$\sqrt{\mathrm{Hz}}$ and 17.2 fT/cm2/$\sqrt{\mathrm{Hz}}$, respectively. This result indicated that scalar-mode OPMs can measure very weak magnetic fields such as MEG without high-performance magnetic shield roomns. We also studied the operating conditions by varying repetition frequency and temperature. We found that scalar-mode OPMs have an upper limit of repetition frequency and temperature, and that the repetition frequency should be set below 4 kHz and the temperature should be set below 120°C.

  • Reducing Energy Consumption of Wakeup Logic through Double-Stage Tag Comparison

    Yasutaka MATSUDA  Ryota SHIOYA  Hideki ANDO  

     
    PAPER-Computer System

      Pubricized:
    2021/11/02
      Vol:
    E105-D No:2
      Page(s):
    320-332

    The high energy consumption of current processors causes several problems, including a limited clock frequency, short battery lifetime, and reduced device reliability. It is therefore important to reduce the energy consumption of the processor. Among resources in a processor, the issue queue (IQ) is a large consumer of energy, much of which is consumed by the wakeup logic. Within the wakeup logic, the tag comparison that checks source operand readiness consumes a significant amount of energy. This paper proposes an energy reduction scheme for tag comparison, called double-stage tag comparison. This scheme first compares the lower bits of the tag and then, only if these match, compares the higher bits. Because the energy consumption of tag comparison is roughly proportional to the total number of bits compared, energy is saved by reducing this number. However, this sequential comparison increases the delay of the IQ, thereby increasing the clock cycle time. Although this can be avoided by allocating an extra cycle to the issue operation, this in turn degrades the IPC. To avoid IPC degradation, we reconfigure a small number of entries in the IQ, where several oldest instructions that are likely to have an adverse effect on performance reside, to a single stage for tag comparison. Our evaluation results for SPEC2017 benchmark programs show that the double-stage tag comparison achieves on average a 21% reduction in the energy consumed by the wakeup logic (15% when including the overhead) with only 3.0% performance degradation.

  • Low Latency 256-bit $mathbb{F}_p$ ECDSA Signature Generation Crypto Processor

    Shotaro SUGIYAMA  Hiromitsu AWANO  Makoto IKEDA  

     
    PAPER

      Vol:
    E101-A No:12
      Page(s):
    2290-2296

    A 256-bit $mathbb{F}_p$ ECDSA crypto processor featuring low latency, low energy consumption and capability of changing the Elliptic curve parameters is designed and fabricated in SOTB 65nm CMOS process. We have demonstrated the lowest ever reported signature generation time of 31.3 μs at 238MHz clock frequency. Energy consumption is 3.28 μJ/signature-generation, which is same as the lowest reported till date. We have also derived addition formulae on Elliptic curve useful for reduce the number of registers and operation cycles.

  • An Improvement of Scalar Multiplication by Skew Frobenius Map with Multi-Scalar Multiplication for KSS Curve

    Md. Al-Amin KHANDAKER  Yasuyuki NOGAMI  

     
    PAPER

      Vol:
    E100-A No:9
      Page(s):
    1838-1845

    Scalar multiplication over higher degree rational point groups is often regarded as the bottleneck for faster pairing based cryptography. This paper has presented a skew Frobenius mapping technique in the sub-field isomorphic sextic twisted curve of Kachisa-Schaefer-Scott (KSS) pairing friendly curve of embedding degree 18 in the context of Ate based pairing. Utilizing the skew Frobenius map along with multi-scalar multiplication procedure, an efficient scalar multiplication method for KSS curve is proposed in the paper. In addition to the theoretic proposal, this paper has also presented a comparative simulation of the proposed approach with plain binary method, sliding window method and non-adjacent form (NAF) for scalar multiplication. The simulation shows that the proposed method is about 60 times faster than plain implementation of other compared methods.

  • Skewed Multistaged Multibanked Register File for Area and Energy Efficiency

    Junji YAMADA  Ushio JIMBO  Ryota SHIOYA  Masahiro GOSHIMA  Shuichi SAKAI  

     
    PAPER-Computer System

      Pubricized:
    2017/01/11
      Vol:
    E100-D No:4
      Page(s):
    822-837

    The region that includes the register file is a hot spot in high-performance cores that limits the clock frequency. Although multibanking drastically reduces the area and energy consumption of the register files of superscalar processor cores, it suffers from low IPC due to bank conflicts. Our skewed multistaging drastically reduces not the bank conflict probability but the pipeline disturbance probability by the second stage. The evaluation results show that, compared with NORCS, which is the latest research on a register file for area and energy efficiency, a proposed register file with 18 banks achieves a 39.9% and 66.4% reduction in circuit area and in energy consumption, while maintaining a relative IPC of 97.5%.

  • FXA: Executing Instructions in Front-End for Energy Efficiency

    Ryota SHIOYA  Ryo TAKAMI  Masahiro GOSHIMA  Hideki ANDO  

     
    PAPER-Computer System

      Pubricized:
    2016/01/06
      Vol:
    E99-D No:4
      Page(s):
    1092-1107

    Out-of-order superscalar processors have high performance but consume a large amount of energy for dynamic instruction scheduling. We propose a front-end execution architecture (FXA) for improving the energy efficiency of out-of-order superscalar processors. FXA has two execution units: an out-of-order execution unit (OXU) and an in-order execution unit (IXU). The OXU is the execution core of a common out-of-order superscalar processor. In contrast, the IXU consists only of functional units and a bypass network only. The IXU is placed at the processor front end and executes instructions in order. The IXU functions as a filter for the OXU. Fetched instructions are first fed to the IXU, and the instructions are executed in order if they are ready to execute. The instructions executed in the IXU are removed from the instruction pipeline and are not executed in the OXU. The IXU does not include dynamic scheduling logic, and thus its energy consumption is low. Evaluation results show that FXA can execute more than 50% of the instructions by using IXU, thereby making it possible to shrink the energy-consuming OXU without incurring performance degradation. As a result, FXA achieves both high performance and low energy consumption. We evaluated FXA and compared it with conventional out-of-order/in-order superscalar processors after ARM big.LITTLE architecture. The results show that FXA achieves performance improvements of 7.4% on geometric mean in SPECCPU INT 2006 benchmark suite relative to a conventional superscalar processor (big), while reducing the energy consumption by 17% in the entire processor. The performance/energy ratio (the inverse of the energy-delay product) of FXA is 25% higher than that of a conventional superscalar processor (big) and 27% higher than that of a conventional in-order superscalar processor (LITTLE).

  • Improvement of Renamed Trace Cache through the Reduction of Dependent Path Length for High Energy Efficiency

    Ryota SHIOYA  Hideki ANDO  

     
    PAPER-Computer System

      Pubricized:
    2015/12/04
      Vol:
    E99-D No:3
      Page(s):
    630-640

    Out-of-order superscalar processors rename register numbers to remove false dependencies between instructions. A renaming logic for register renaming is a high-cost module in a superscalar processor, and it consumes considerable energy. A renamed trace cache (RTC) was proposed for reducing the energy consumption of a renaming logic. An RTC caches and reuses renamed operands, and thus, register renaming can be omitted on RTC hits. However, conventional RTCs suffer from several performance, energy consumption, and hardware overhead problems. We propose a semi-global renamed trace cache (SGRTC) that caches only renamed operands that are short distance from producers outside traces, and solves the problems of conventional RTCs. Evaluation results show that SGRTC achieves 64% lower energy consumption for renaming with a 0.2% performance overhead as compared to a conventional processor.

  • Performance of Dynamic Instruction Window Resizing for a Given Power Budget under DVFS Control

    Hideki ANDO  Ryota SHIOYA  

     
    PAPER-Computer System

      Pubricized:
    2015/11/12
      Vol:
    E99-D No:2
      Page(s):
    341-350

    Dynamic instruction window resizing (DIWR) is a scheme that effectively exploits both memory-level parallelism and instruction-level parallelism by configuring the instruction window size appropriately for exploiting each parallelism. Although a previous study has shown that the DIWR processor achieves a significant speedup, power consumption has not been explored. The power consumption is increased in DIWR because the instruction window resources are enlarged in memory-intensive phases. If the power consumption exceeds the power budget determined by certain requirements, the DIWR processor must save power and thus, the performance previously presented cannot be achieved. In this paper, we explore to what extent the DIWR processor can achieve improved performance for a given power budget, assuming that dynamic voltage and frequency scaling (DVFS) is introduced as a power saving technique. Evaluation results using the SPEC2006 benchmark programs show that the DIWR processor, even with a constrained power budget, achieves a speedup over the conventional processor over a wide range of given power budgets. At the most important power budget point, i.e., when the power a conventional processor consumes without any power constraint is supplied, DIWR achieves a 16% speedup.

  • Improving Width-3 Joint Sparse Form to Attain Asymptotically Optimal Complexity on Average Case

    Hiroshi IMAI  Vorapong SUPPAKITPAISARN  

     
    LETTER

      Vol:
    E98-A No:6
      Page(s):
    1216-1222

    In this paper, we improve a width-3 joint sparse form proposed by Okeya, Katoh, and Nogami. After the improvement, the representation can attain an asymtotically optimal complexity found in our previous work. Although claimed as optimal by the authors, the average computation time of multi-scalar multiplication obtained by the representation is 563/1574n+o(n)≈0.3577n+o(n). That number is larger than the optimal complexity 281/786n+o(n)≈0.3575n+o(n) found in our previous work. To optimize the width-3 joint sparse form, we add more cases to the representation. After the addition, we can show that the complexity is updated to 281/786n+o(n)≈0.3575n+o(n), which implies that the modified representation is asymptotically optimal. Compared to our optimal algorithm in the previous work, the modified width-3 joint sparse form uses less dynamic memory, but it consumes more static memory.

  • MLP-Aware Dynamic Instruction Window Resizing in Superscalar Processors for Adaptively Exploiting Available Parallelism

    Yuya KORA  Kyohei YAMAGUCHI  Hideki ANDO  

     
    PAPER-Computer System

      Pubricized:
    2014/09/22
      Vol:
    E97-D No:12
      Page(s):
    3110-3123

    Single-thread performance has not improved much over the past few years, despite an ever increasing transistor budget. One of the reasons for this is that there is a speed gap between the processor and main memory, known as the memory wall. A promising method to overcome this memory wall is aggressive out-of-order execution by extensively enlarging the instruction window resources to exploit memory-level parallelism (MLP). However, simply enlarging the window resources lengthens the clock cycle time. Although pipelining the resources solves this problem, it in turn prevents instruction-level parallelism (ILP) from being exploited because issuing instructions requires multiple clock cycles. This paper proposed a dynamic scheme that adaptively resizes the instruction window based on the predicted available parallelism, either ILP or MLP. Specifically, if the scheme predicts that MLP is available during execution, the instruction window is enlarged and the window resources are pipelined, thereby exploiting MLP. Conversely, if the scheme predicts that less MLP is available, that is, ILP is exploitable for improved performance, the instruction window is shrunk and the window resources are de-pipelined, thereby exploiting ILP. Our evaluation results using the SPEC2006 benchmark programs show that the proposed scheme achieves nearly the best performance possible with fixed-size resources. On average, our scheme realizes a performance improvement of 21% over that of a conventional processor, with additional cost of only 6% of the area of the conventional processor core or 3% of that of the entire processor chip. The evaluation results also show 8% better energy efficiency in terms of 1/EDP (energy-delay product).

  • Implementation of an Elliptic Curve Scalar Multiplication Method Using Division Polynomials

    Naoki KANAYAMA  Yang LIU  Eiji OKAMOTO  Kazutaka SAITO  Tadanori TERUYA  Shigenori UCHIYAMA  

     
    LETTER

      Vol:
    E97-A No:1
      Page(s):
    300-302

    We implemented a scalar multiplication method over elliptic curves using division polynomials. We adapt an algorithm for computing elliptic nets proposed by Stange. According to our experimental results, the scalar multiplication method using division polynomials is faster than the binary method in an affine coordinate system.

  • An Adaptive Model for Particle Fluid Surface Reconstruction

    Fengquan ZHANG  Xukun SHEN  Xiang LONG  

     
    LETTER-Computer Graphics

      Vol:
    E96-D No:5
      Page(s):
    1247-1250

    In this letter, we present an efficient method for high quality surface reconstruction from simulation data of smoothed particles hydrodynamics (SPH). For computational efficiency, instead of computing scalar field in overall particle sets, we only construct scalar field around fluid surfaces. Furthermore, an adaptive scalar field model is proposed, which adaptively adjusts the smoothing length of ellipsoidal kernel by a constraint-correction rule. Then the isosurfaces are extracted from the scalar field data. The proposed method can not only effectively preserve fluid details, such as splashes, droplets and surface wave phenomena, but also save computational costs. The experimental results show that our method can reconstruct the realistic fluid surfaces with different particle sets.

  • Dual-Core Framework: Eliminating the Bottleneck Effect of Scalar Kernels on SIMD Architectures

    Yaohua WANG  Shuming CHEN  Hu CHEN  Jianghua WAN  Kai ZHANG  Sheng LIU  

     
    LETTER-Computer System

      Vol:
    E96-D No:2
      Page(s):
    365-369

    The efficiency of ubiquitous SIMD (Single Instruction Multiple Data) media processors is seriously limited by the bottleneck effect of the scalar kernels in media applications. To solve this problem, a dual-core framework, composed of a micro control unit and an instruction buffer, is proposed. This framework can dynamically decouple the scalar and vector pipelines of the original single-core SIMD architecture into two free-running cores. Thus, the bottleneck effect can be eliminated by effectively exploiting the parallelism between scalar and vector kernels. The dual-core framework achieves the best attributes of both single-core and dual-core SIMD architectures. Experimental results exhibit an average performance improvement of 33%, at an area overhead of 4.26%. What's more, with the increase of the SIMD width, higher performance gain and lower cost can be expected.

  • Delay Evaluation of Issue Queue in Superscalar Processors with Banking Tag RAM and Correct Critical Path Identification

    Kyohei YAMAGUCHI  Yuya KORA  Hideki ANDO  

     
    PAPER-Computer System

      Vol:
    E95-D No:9
      Page(s):
    2235-2246

    This paper evaluates the delay of the issue queue in a superscalar processor to aid microarchitectural design, where quick quantification of the complexity of the issue queue is needed to consider the tradeoff between clock cycle time and instructions per cycle. Our study covers two aspects. First, we introduce banking tag RAM, which comprises the issue queue, to reduce the delay. Unlike normal RAM, this is not straightforward, because of the uniqueness of the issue queue organization. Second, we explore and identify the correct critical path in the issue queue. In a previous study, the critical path of each component in the issue queue was summed to obtain the issue queue delay, but this does not give the correct delay of the issue queue, because the critical paths of the components are not connected logically. In the evaluation assuming 32-nm LSI technology, we obtained the delays of issue queues with eight to 128 entries. The process of banking tag RAM and identifying the correct critical path reduces the delay by up to 20% and 23% for 4- and 8-issue widths, respectively, compared with not banking tag RAM and simply summing the critical path delay of each component.

  • Scalar Multiplication on Kummer Surface Revisited

    Qiping LIN  Fangguo ZHANG  

     
    LETTER-Cryptography and Information Security

      Vol:
    E95-A No:1
      Page(s):
    410-413

    The main benefit of HECC is that it has much smaller parameter sizes and offers equivalent security as ECC and RSA. However, there are still more researches on ECC than on HECC. One of the reasons is that the computation of scalar multiplication cannot catch up. The Kummer surface can speed up the scalar multiplication in genus two curves. In this paper, we find that the scalar multiplication formulas of Duquesne in characteristic p > 3 on the Kummer surface are not correct. We verify and revise the formulas with mathematical software. The operation counts become 29M + 2S for new pseudo-addition formulas and 30M + 10S for doubling ones. And then we speed up the scalar multiplication on the Kummer surface with Euclidean addition chains.

  • Scalar Multiplication on Pairing Friendly Elliptic Curves

    Naoki KANAYAMA  Tadanori TERUYA  Eiji OKAMOTO  

     
    PAPER

      Vol:
    E94-A No:6
      Page(s):
    1285-1292

    In the present paper, we propose elliptic curve scalar multiplication methods on pairing-friendly elliptic curves. The proposed method is efficient on elliptic curves on which Atei pairing or optimal pairing is efficiently computed.

  • IP Lookup Using the Novel Idea of Scalar Prefix Search with Fast Table Updates

    Mohammad BEHDADFAR  Hossein SAIDI  Masoud-Reza HASHEMI  Ali GHIASIAN  Hamid ALAEI  

     
    PAPER

      Vol:
    E93-D No:11
      Page(s):
    2932-2943

    Recently, we have proposed a new prefix lookup algorithm which would use the prefixes as scalar numbers. This algorithm could be applied to different tree structures such as Binary Search Tree and some other balanced trees like RB-tree, AVL-tree and B-tree with minor modifications in the search, insert and/or delete procedures to make them capable of finding the prefixes of an incoming string e.g. an IP address. As a result, the search procedure complexity would be O(log n) where n is the number of prefixes stored in the tree. More important, the search complexity would not depend on the address length w i.e. 32 for IPv4 and 128 for IPv6. Here, it is assumed that interface to memory is wide enough to access the prefix and some simple operations like comparison can be done in O(1) even for the word length w. Moreover, insertion and deletion procedures of this algorithm are much simpler and faster than its competitors. In what follows, we report the software implementation results of this algorithm and compare it with other solutions for both IPv4 and IPv6. It also reports on a simple hardware implementation of the algorithm for IPv4. Comparison results show better lookup and update performances or superior storage requirements for Scalar Prefix Search in both average and worst cases.

  • On the Computational Sequence of Scalar Multiplication with Left-to-Right Recoded NAF and Sliding Window Technique

    Chien-Ning CHEN  Sung-Ming YEN  SangJae MOON  

     
    PAPER-Cryptography and Information Security

      Vol:
    E93-A No:10
      Page(s):
    1806-1812

    Simple power analysis (SPA) can be employed in examining the power consumption trace of elliptic curve scalar multiplication to retrieve the computational sequence. However, SPA cannot distinguish point addition from point subtraction. The attacker still requires an exhaustive search to recover the private key when it is recoded in NAF or recoded by the 2-bit sliding window method. The average Hamming weight of an n-bit NAF recoded scalar is n/3, and an exhaustive search among the 2n/3 candidates is required. This paper shows that in a left-to-right NAF recoded or a left-to-right 2-bit sliding window manipulated scalar the relative position of nonzero bits will reveal their values. Our analysis skill reduces the number of candidates of the scalar from the naive search of 2n/3 to 22n/9 and 20.19n respectively for the cases of NAF and sliding window method.

  • Sole Inversion Precomputation for Elliptic Curve Scalar Multiplications

    Erik DAHMEN  Katsuyuki OKEYA  

     
    PAPER-Cryptography and Information Security

      Vol:
    E93-A No:6
      Page(s):
    1140-1147

    This paper presents a new approach to precompute points [3]P, [5]P,..., [2k-1]P, for some k ≥ 2 on an elliptic curve over Fp. Those points are required for the efficient evaluation of a scalar multiplication, the most important operation in elliptic curve cryptography. The proposed method precomputes the points in affine coordinates and needs only one single field inversion for the computation. The new method is superior to all known methods that also use one field inversion, if the required memory is taken into consideration. Compared to methods that require several field inversions for the precomputation, the proposed method is faster for a broad range of ratios of field inversions and field multiplications. The proposed method benefits especially from ratios as they occur on smart cards.

  • Scalar Multiplication Using Frobenius Expansion over Twisted Elliptic Curve for Ate Pairing Based Cryptography

    Yasuyuki NOGAMI  Yumi SAKEMI  Takumi OKIMOTO  Kenta NEKADO  Masataka AKANE  Yoshitaka MORIKAWA  

     
    PAPER-Mathematics

      Vol:
    E92-A No:1
      Page(s):
    182-189

    For ID-based cryptography, not only pairing but also scalar multiplication must be efficiently computable. In this paper, we propose a scalar multiplication method on the circumstances that we work at Ate pairing with Barreto-Naehrig (BN) curve. Note that the parameters of BN curve are given by a certain integer, namely mother parameter. Adhering the authors' previous policy that we execute scalar multiplication on subfield-twisted curve (Fp2) instead of doing on the original curve E(Fp12), we at first show sextic twisted subfield Frobenius mapping (ST-SFM) in (Fp2). On BN curves, note is identified with the scalar multiplication by p. However a scalar is always smaller than the order r of BN curve for Ate pairing, so ST-SFM does not directly applicable to the above circumstances. We then exploit the expressions of the curve order r and the characteristic p by the mother parameter to derive some radices such that they are expressed as a polynomial of p. Thus, a scalar multiplication [s] can be written by the series of ST-SFMs . In combination with the binary method or multi-exponentiation technique, this paper shows that the proposed method runs about twice or more faster than plain binary method.

1-20hit(57hit)