
Keyword Search Result

[Keyword] sort (66 hits)

Results 1-20 of 66 hits

  • High-Parallelism and Pipelined Architecture for Accelerating Sort-Merge Join on FPGA Open Access

    Meiting XUE  Wenqi WU  Jinfeng LUO  Yixuan ZHANG  Bei ZHAO  

     
    PAPER-Algorithms and Data Structures

    Publicized: 2024/05/28
    Vol: E107-A No:10
    Page(s): 1582-1594

    Join is an important but data-intensive and compute-intensive operation in database systems. Moreover, there are multiple types of join operation, with diverse complexities, depending on the join condition and the data relationship. Because most existing solutions for accelerating the join operation on field-programmable gate arrays (FPGAs) focus only on the simplest join case, this study presents a novel architecture that is suitable for multiple types of join operation. The architecture has a modular design and consists of three components that are executed sequentially and in a pipelined fashion. Specifically, a top-K sorter is used instead of a full sorter to reduce resource utilization and to advance the merge processing. Furthermore, the architecture is fully compatible with both N-to-1 and N-to-M join relationships and adapts well to both equi-joins and band-joins. Experimental results show that this design, implemented on an FPGA, achieves a join throughput of 242.1 million tuples per second, outperforming other reported FPGA implementations.
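
    As a plain-software illustration of the operation being accelerated (not the paper's FPGA pipeline), a minimal sorted merge join for an equi-join on integer keys might look like the sketch below; representing each relation as a list of (key, payload) tuples is an assumption of this example.

```python
def merge_equi_join(r, s):
    """Equi-join two relations that are already sorted on their join key.

    r, s: lists of (key, payload) tuples sorted by key.
    Returns (key, r_payload, s_payload) matches, handling N-to-M key
    relationships by expanding whole groups of equal keys.
    """
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Find the group of equal keys on each side (N-to-M case).
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((key, r[a][1], s[b][1]))
            i, j = i_end, j_end
    return out
```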

  • A POMDP-Based Approach to Assortment Optimization Problem for Vending Machine Open Access

    Gaku NEMOTO  Kunihiko HIRAISHI  

     
    PAPER-Mathematical Systems Science

    Publicized: 2023/09/05
    Vol: E107-A No:6
    Page(s): 909-918

    Assortment optimization is one of the main problems facing retailers and has been widely studied. In this paper, we focus on vending machines, which raise many characteristic issues of their own. We first formulate an assortment optimization problem for vending machines, then propose a model that represents consumer decision making, and finally show a solution method based on a partially observable Markov decision process (POMDP). The problem involves incomplete state observation, stochastic consumer behavior, and policy decisions that maximize future expected rewards. Using computer simulation, we observe that sales increase compared with those achieved by heuristic methods under the same conditions; moreover, sales approach the theoretical upper bound.
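
    For readers unfamiliar with POMDPs, the core computation behind such a policy is the belief update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The sketch below illustrates it with hypothetical transition and observation tensors; it is not the authors' vending-machine model.

```python
import numpy as np

def belief_update(belief, T, O, action, obs):
    """One POMDP belief update: b'(s') is proportional to
    O[a, s', o] * sum_s T[a, s, s'] * b(s).

    belief: (S,) probability vector over hidden states.
    T:      (A, S, S) transition probabilities T[a, s, s'].
    O:      (A, S, O) observation probabilities O[a, s', o].
    """
    predicted = belief @ T[action]            # sum_s b(s) * T[a, s, s']
    updated = O[action, :, obs] * predicted   # weight by observation likelihood
    return updated / updated.sum()            # renormalize to a distribution
```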

  • Design of a Hippocampal Cognitive Prosthesis Chip

    Ming NI  Yan HAN  Ray C. C. CHEUNG  Xuemeng ZHOU  

     
    PAPER-Electronic Circuits

    Publicized: 2022/12/09
    Vol: E106-C No:7
    Page(s): 417-426

    This paper presents a hippocampal cognitive prosthesis chip designed to restore the ability to form new long-term memories that is lost due to hippocampal system damage. The system-on-chip (SOC) consists of a 16-channel micro-power low-noise amplifier (LNA), high-pass filters, analog-to-digital converters (ADCs), a 16-channel spike sorter, a generalized Laguerre-Volterra model multi-input, multi-output (GLVM-MIMO) hippocampal processor, an 8-channel neural stimulator, and peripheral circuits. The proposed LNA achieves a voltage gain of 50dB, input-referred noise of 3.95µVrms, and a noise efficiency factor (NEF) of 3.45 with a power consumption of 3.3µW. High-pass filters with a 300-Hz bandwidth are used to filter out the unwanted local field potential (LFP). Four 12-bit successive approximation register (SAR) ADCs with a signal-to-noise-and-distortion ratio (SNDR) of 63.37dB digitize the neural signals. A 16-channel spike sorter integrated in the chip enables a detection accuracy of 98.3% and a classification accuracy of 93.4% with a power consumption of 19µW/ch. The MIMO hippocampal model processor predicts output spatio-temporal patterns in CA1 from the recorded input spatio-temporal patterns in CA3. The neural stimulator performs bipolar, symmetric, charge-balanced stimulation with a maximum current of 310µA, triggered by the processor output. The chip has been fabricated in 40nm standard CMOS technology, occupying a silicon area of 3mm2.

  • Blockchain-Based Pension System Ensuring Security, Provenance and Efficiency

    Minhaz KAMAL  Chowdhury Mohammad ABDULLAH  Fairuz SHAIARA  Abu Raihan Mostofa KAMAL  Md Mehedi HASAN  Jik-Soo KIM  Md Azam HOSSAIN  

     
    LETTER-Office Information Systems, e-Business Modeling

    Publicized: 2023/02/21
    Vol: E106-D No:5
    Page(s): 1085-1088

    This letter presents a digitized pension system based on a consortium blockchain, with the aim of overcoming challenges of existing pension systems such as multiparty collaboration, manual intervention, high turnaround time, cost transparency, and auditability. In addition, the adoption of Hyperledger Fabric and the introduction of smart contracts aim to transform the multi-organizational workflow into a synchronized, automated, modular, and error-free procedure.

  • Operations Smart Contract to Realize Decentralized System Operations Workflow for Consortium Blockchain

    Tatsuya SATO  Taku SHIMOSAWA  Yosuke HIMURA  

     
    PAPER

    Publicized: 2022/05/27
    Vol: E105-B No:11
    Page(s): 1318-1331

    Enterprises have been paying attention to consortium blockchains such as Hyperledger Fabric, one of the most promising platforms, for efficient decentralized transactions that do not depend on any particular organization. A consortium blockchain-based system is typically built across multiple organizations. In such blockchain-based systems, operating the system across multiple organizations in a decentralized manner is essential to maintaining the value of introducing consortium blockchains. Decentralized system operations have recently become realistic with the evolution of consortium blockchains; for instance, since the release of Hyperledger Fabric v2.x, individual operational tasks for a blockchain network, such as executing configuration changes of channels (Fabric's sub-networks) and upgrades of chaincodes (Fabric's smart contracts), can be partially carried out in a decentralized manner. However, the operations workflows also include a preceding procedure in which operational information (e.g., configuration parameters) is pre-shared, coordinated, and pre-agreed among organizations before the operations can be executed, and this preceding procedure relies on costly manual tasks. To realize efficient decentralized operations workflows for consortium blockchain-based systems in general, we propose a decentralized inter-organizational operations method, called Operations Smart Contract (OpsSC), which defines an operations workflow as a smart contract. Furthermore, we design and implement OpsSC for blockchain network operations with Hyperledger Fabric v2.x. This paper presents OpsSC for operating channels and chaincodes, which are essential for managing blockchain networks, and clarifies the detailed workflows of those operations. A cost evaluation based on an estimation model shows that the total operational cost of executing a typical operational scenario, adding an organization to a blockchain network with ten organizations, could be reduced by 54 percent compared with a conventional script-based method. The implementation of OpsSC has been open-sourced and registered as a Hyperledger Labs project; Hyperledger Labs hosts experimental projects approved by Hyperledger.

  • Resource Efficient Top-K Sorter on FPGA

    Binhao HE  Meiting XUE  Shubiao LIU  Feng YU  Weijie CHEN  

     
    LETTER-Digital Signal Processing

    Publicized: 2022/03/02
    Vol: E105-A No:9
    Page(s): 1372-1376

    Top-K sorting is a variant of sorting that is used heavily in applications such as database management systems. Recently, the use of field-programmable gate arrays (FPGAs) to accelerate sorting operations has attracted the interest of researchers. However, existing hardware top-K sorting algorithms are either resource-intensive or of low throughput. In this paper, we present a resource-efficient top-K sorting architecture composed of L cascading sorting units, each of which is composed of P sorting cells. The K = P×L largest elements are produced when a variable-length input sequence is processed. This architecture can operate at a high frequency while consuming fewer resources. The experimental results show that our architecture achieves up to a 1.2x throughput-to-resource improvement over previous studies.
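
    A behavioural software model of the cascade idea (L units, each retaining P elements, so that the K = P·L largest items survive) is sketched below, under the assumption that each unit keeps its P largest values and passes rejected or evicted values downstream; the pipelined FPGA datapath itself is not modelled.

```python
import bisect

class SortingUnit:
    """Keeps the p largest values it has seen; smaller or evicted values cascade on."""
    def __init__(self, p):
        self.p = p
        self.store = []  # kept values, in ascending order

    def push(self, x):
        """Insert x; return the value that falls out (or None)."""
        if len(self.store) < self.p:
            bisect.insort(self.store, x)
            return None
        if x <= self.store[0]:
            return x                        # too small for this unit, pass downstream
        evicted = self.store.pop(0)         # evict the current minimum
        bisect.insort(self.store, x)
        return evicted

def top_k_cascade(stream, p, l):
    """Return the K = p*l largest elements of the stream, in descending order."""
    units = [SortingUnit(p) for _ in range(l)]
    for x in stream:
        item = x
        for u in units:
            if item is None:
                break
            item = u.push(item)
    result = []
    for u in units:                         # unit i ends up holding ranks i*p+1 .. (i+1)*p
        result.extend(sorted(u.store, reverse=True))
    return result
```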

  • Anomaly Detection Using Spatio-Temporal Context Learned by Video Clip Sorting

    Wen SHAO  Rei KAWAKAMI  Takeshi NAEMURA  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2022/02/08
    Vol: E105-D No:5
    Page(s): 1094-1102

    Previous studies on anomaly detection in videos have trained detectors to perform reconstruction or prediction tasks on normal data, so that frames on which task performance is low are detected as anomalies during testing. This paper proposes a new approach that sorts video clips using a generative network structure. Our approach learns spatial context from appearance and temporal context from the order relationship of the frames. Experiments were conducted on four datasets, and we categorized the anomalous sequences by appearance and motion. Evaluations were conducted not only on each dataset as a whole but also on each of the categories. Our method improved detection performance both on anomalies whose appearance differs from normality and on those whose motion differs. Moreover, combining our approach with a prediction method produced improvements in precision at high recall.
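
    As a rough illustration of a clip-sorting pretext task (not the authors' generative network), training samples could be generated as shuffled clips paired with the permutation that restores temporal order; the clip length and shuffling scheme below are assumptions of this sketch.

```python
import random

def make_sorting_sample(frames, clip_len=4):
    """Cut clip_len consecutive frames, shuffle them, and return the shuffled
    clip together with the permutation that restores temporal order."""
    start = random.randrange(len(frames) - clip_len + 1)
    clip = frames[start:start + clip_len]
    order = list(range(clip_len))
    random.shuffle(order)
    shuffled = [clip[i] for i in order]
    # label[j] = position in `shuffled` of the frame that belongs at time j.
    label = sorted(range(clip_len), key=lambda k: order[k])
    return shuffled, label
```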

  • ExamChain: A Privacy-Preserving Onscreen Marking System Based on Consortium Blockchain

    Haoyang AN  Jiageng CHEN  

     
    PAPER

    Publicized: 2021/12/06
    Vol: E105-D No:2
    Page(s): 235-247

    The development of educational informatization makes data privacy particularly important in education. As society develops, the education system becomes more complicated, and the results of educational evaluation become increasingly critical to students; the evaluation process must therefore be just and transparent. In recent years, Onscreen Marking (OSM) systems based on traditional cloud platforms have been widely used in various large-scale public examinations. However, because power is excessively concentrated in the existing scheme, the mainstream marking process is not transparent, and there is a hidden danger of black-box operation that can damage the fairness of the examination. In addition, issues related to data security and privacy remain severe challenges. This paper addresses the above problems by providing secure and private transactions in a distributed OSM, assuming a semi-trusted examination center. We have implemented a proof of concept for a consortium blockchain-based OSM in a privacy-preserving and auditable manner, enabling markers to mark on the distributed ledger anonymously. We also propose a high-level design of a distributed OSM system, which provides theoretical support for fair evaluation in educational informatization and has particular theoretical and practical value for combining education with blockchain.

  • Lempel-Ziv Factorization in Linear-Time O(1)-Workspace for Constant Alphabets

    Weijun LIU  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2021/08/30
    Vol: E104-D No:12
    Page(s): 2145-2153

    Computing the Lempel-Ziv factorization (LZ77) of a string is one of the most important problems in computer science. It is widely used in applications such as data compression, text indexing, and pattern discovery, and lies at the heart of file compressors such as gzip and 7zip. In this paper, we present a linear-time algorithm called Xone for computing the LZ77 factorization, which matches the space requirement of BGone, the previous best linear-time, constant-workspace LZ77 factorization algorithm, while greatly improving its efficiency. Experiments show that the two versions of Xone, XoneT and XoneSA, are about 27% and 31% faster than BGoneT and BGoneSA, respectively.
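
    For orientation only, the factorization being computed can be illustrated with a naive quadratic-time LZ77 routine; the linear-time, O(1)-workspace Xone and BGone algorithms are far more involved and are not reproduced here.

```python
def lz77_factorize(s):
    """Naive quadratic-time LZ77 factorization.

    Each factor is ('lit', ch) for a fresh character, or
    ('copy', start, length) for the longest earlier occurrence
    (self-overlapping copies are allowed, as in LZ77).
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_pos = 0, 0
        for j in range(i):                       # every earlier start position
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len == 0:
            factors.append(('lit', s[i]))        # no earlier occurrence
            i += 1
        else:
            factors.append(('copy', best_pos, best_len))
            i += best_len
    return factors
```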

  • Design and VLSI Implementation of a Sorted MMSE QR Decomposition for 4×4 MIMO Detectors

    Lu SUN  Bin WU  Tianchun YE  

     
    LETTER-VLSI Design Technology and CAD

    Publicized: 2020/10/12
    Vol: E104-A No:4
    Page(s): 762-767

    In this letter, a low-latency, high-throughput, and hardware-efficient sorted MMSE QR decomposition (MMSE-SQRD) for multiple-input multiple-output (MIMO) systems is presented. In contrast to the conventional method of extending the complex matrix to a real-valued model and then applying real-valued QR decomposition (QRD), we develop a highly parallel decomposition scheme based on the coordinate rotation digital computer (CORDIC) algorithm, which performs the QRD directly in the complex domain and then converts the complex result to its real counterpart. The proposed scheme greatly improves processing parallelism and curtails the nullification and sorting procedures. We also design the corresponding pipelined hardware architecture of the MMSE-SQRD for 4×4 MIMO detectors, based on a highly parallel Givens rotation structure using the CORDIC algorithm. The proposed MMSE-SQRD is implemented in SMIC 55nm CMOS technology, achieving a throughput of up to 50M QRD/s and a latency of 59 clock cycles with only 218 kilo-gates (KG). Compared with previous works, the proposed design achieves the highest normalized throughput efficiency and the lowest processing latency.
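
    As a floating-point reference for what the CORDIC datapath computes (not the fixed-point VLSI design), a complex-valued Givens-rotation QRD with a simple column-sorting step can be sketched with NumPy; the ascending-column-norm sorting criterion and the omission of the MMSE extension are simplifying assumptions of this sketch.

```python
import numpy as np

def sorted_givens_qrd(H):
    """Complex QR decomposition by Givens rotations, with columns pre-sorted
    by ascending norm as a simplified stand-in for detection-order sorting.
    Returns Q, R and the column permutation perm."""
    H = np.asarray(H, dtype=complex)
    m, n = H.shape
    perm = np.argsort(np.linalg.norm(H, axis=0))   # weakest column first
    R = H[:, perm].copy()
    Q = np.eye(m, dtype=complex)
    for col in range(n):
        for row in range(m - 1, col, -1):          # zero R[row, col] from the bottom up
            a, b = R[row - 1, col], R[row, col]
            r = np.hypot(abs(a), abs(b))
            if r == 0.0:
                continue
            c, s = a / r, b / r                    # complex Givens coefficients
            G = np.eye(m, dtype=complex)
            G[row - 1, row - 1] = np.conj(c)
            G[row - 1, row] = np.conj(s)
            G[row, row - 1] = -s
            G[row, row] = c
            R = G @ R                              # apply the rotation
            Q = Q @ G.conj().T                     # accumulate its inverse in Q
    return Q, R, perm
```

    Here Q is unitary, R is upper triangular, and Q @ R reconstructs H[:, perm], i.e., the matrix with its columns in detection order.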

  • High-Performance and Hardware-Efficient Odd-Even Based Merge Sorter

    Elsayed A. ELSAYED  Kenji KISE  

     
    PAPER-Computer System

    Publicized: 2020/08/13
    Vol: E103-D No:12
    Page(s): 2504-2517

    Data sorting is an important operation in computer science. It is used extensively in applications such as databases and searching. While high-performance sorting accelerators are in demand, it is also important to pay attention to the hardware resources required by such high-performance sorters. In this paper, we propose three FPGA-based architectures to accelerate sorting based on the merge-sorting algorithm. We call our proposals WMS (Wide Merge Sorter), EHMS (Efficient Hardware Merge Sorter), and EHMSP (Efficient Hardware Merge Sorter Plus). We target the Virtex UltraScale FPGA device. Evaluation results show that our proposed merge sorters are both high-performance and cost-effective: while using much fewer hardware resources, they achieve higher performance than the state-of-the-art. For instance, when 256 sorted records are produced per cycle, implementation results for the proposed EHMS show a significant reduction in the required numbers of flip-flops (FFs) and look-up tables (LUTs), to about 66% and 79%, respectively, of those of the state-of-the-art merge sorter. Moreover, while requiring fewer hardware resources, EHMS achieves about 1.4x higher throughput than the state-of-the-art merge sorter. For the same number of produced records, the proposed WMS also achieves about a 1.6x throughput improvement over the state-of-the-art while requiring about 81% of the FFs and 76% of the LUTs needed by the state-of-the-art sorter.
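
    For reference, the odd-even merge step that such hardware merge sorters build on (Batcher's construction) can be written recursively in software; this is only the algorithmic skeleton, not the proposed WMS/EHMS/EHMSP pipelines.

```python
def odd_even_merge(a, b):
    """Batcher's odd-even merge of two sorted lists of equal power-of-two length."""
    if len(a) == 1:
        x, y = a[0], b[0]
        return [min(x, y), max(x, y)]
    # Recursively merge the even- and odd-indexed subsequences.
    even = odd_even_merge(a[0::2], b[0::2])
    odd = odd_even_merge(a[1::2], b[1::2])
    # Interleave the two results and apply the final compare-exchange stage.
    merged = [None] * (len(a) + len(b))
    merged[0::2] = even
    merged[1::2] = odd
    for i in range(1, len(merged) - 1, 2):
        if merged[i] > merged[i + 1]:
            merged[i], merged[i + 1] = merged[i + 1], merged[i]
    return merged
```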

  • Compression by Substring Enumeration Using Sorted Contingency Tables

    Takahiro OTA  Hiroyoshi MORITA  Akiko MANADA  

     
    PAPER-Information Theory

    Vol: E103-A No:6
    Page(s): 829-835

    This paper proposes two variants of improved Compression by Substring Enumeration (CSE) with a finite alphabet. In previous studies on CSE, the encoder utilizes inequalities that bound the number of occurrences of a substring or a minimal forbidden word (MFW) to be encoded. The inequalities are derived from a contingency table containing the numbers of occurrences of substrings and MFWs. Moreover, the codeword length of a substring or an MFW grows with the difference between the upper and lower bounds deduced from the inequalities; however, the lower bound is not tight. We therefore derive a new, tight lower bound based on the contingency table and propose a new CSE algorithm using the new inequality. We also propose a new encoding order of substrings and MFWs based on a sorted contingency table, in which both the row and column marginal totals are sorted in descending order instead of the lexicographical order used in previous studies, and we propose a new CSE algorithm that is the first to use this encoding order. Experimental results show that the compression ratios of the proposed algorithms on all files of the Calgary corpus are better than those of a previous study on CSE with a finite alphabet. Moreover, the compression ratios of the second proposed CSE algorithm are better than or equal to those of a well-known compressor for 11 of the 14 files in the corpus.

  • Sorting Matrix Architecture for Continuous Data Sequences

    Meiting XUE  Huan ZHANG  Weijun LI  Feng YU  

     
    LETTER-Algorithms and Data Structures

    Vol: E103-A No:2
    Page(s): 542-546

    Sorting is one of the most fundamental problems in mathematics and computer science. Because high-throughput and flexible sorting is a key requirement in modern databases, this paper presents efficient techniques for designing a high-throughput sorting matrix that supports continuous data sequences. There have been numerous studies on the optimization of sorting circuits on FPGA (field-programmable gate array) platforms; these studies focused on attaining high throughput for a single command with a fixed data width. However, the proposed architectures do not meet the diverse data-type requirements of databases. A sorting matrix architecture is thus proposed to overcome this problem. Our design consists of a matrix of identical basic sorting cells. The sorting cells work in a pipelined and parallel fashion, and the matrix can simultaneously process multiple data streams, which can be combined into a single wide-channel data stream or several narrow-channel data streams. It can handle continuous sequences and allows variable-length data sequences to be sorted. Its maximum throughput is approximately 1.4 GB/s for 32-bit sequences and approximately 2.5 GB/s for 64-bit sequences on our platform.

  • Multiple Matrix Rank Minimization Approach to Audio Declipping

    Ryohei SASAKI  Katsumi KONISHI  Tomohiro TAKAHASHI  Toshihiro FURUKAWA  

     
    LETTER-Speech and Hearing

    Publicized: 2017/12/06
    Vol: E101-D No:3
    Page(s): 821-825

    This letter deals with the audio declipping problem and proposes a multiple-matrix rank minimization approach. We assume that short-time audio signals satisfy the autoregressive (AR) model and formulate the declipping problem as a multiple-matrix rank minimization problem. To solve this problem, an iterative algorithm based on the iterative partial matrix shrinkage (IPMS) algorithm is provided. Numerical examples show its efficiency.

  • Efficient Parallel Join Processing Exploiting SIMD in Multi-Thread Environments

    Gilseok HONG  Seonghyeon KANG  Chang soo KIM  Jun-Ki MIN  

     
    PAPER-Data Engineering, Web Information Systems

    Publicized: 2017/12/14
    Vol: E101-D No:3
    Page(s): 659-667

    In this paper, we study parallel join processing to improve the performance of the merge phase of sort-merge join by integrating all the parallelism provided by mainstream CPUs. Modern CPUs support SIMD instruction sets with wide SIMD registers, which allow multiple data items to be processed per instruction. We thus devise an efficient parallel join algorithm, called Parallel Merge Join with SIMD instructions (PMJS). In our proposed algorithm, we exploit data parallelism through SIMD instructions, and we also improve performance by avoiding conditional branch instructions. Furthermore, to take advantage of multiple cores, the proposed algorithm is multithreaded. To distribute the workload evenly across threads, we devise an efficient workload balancing algorithm based on a kernel density estimator, which accurately estimates the workload of each thread.
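
    The branch-avoiding flavour of the merge step can be illustrated even in scalar code by turning each comparison into an index instead of an if/else on the data path; the sketch below is a scalar stand-in, not the SIMD-register PMJS kernel.

```python
def branchless_merge(a, b):
    """Merge two sorted lists using the comparison result as an index,
    mimicking the branch-free select used in SIMD merge kernels."""
    out = []
    i = j = 0
    inf = float("inf")
    a = a + [inf]                        # sentinels avoid end-of-list branches
    b = b + [inf]
    for _ in range(len(a) + len(b) - 2): # exactly one output element per step
        take_b = int(b[j] < a[i])        # 0 or 1; no if/else on the data path
        out.append((a[i], b[j])[take_b])
        i += 1 - take_b
        j += take_b
    return out
```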

  • A Spectrum-Sharing Approach in Heterogeneous Networks Based on Multi-Objective Optimization

    Runze WU  Jiajia ZHU  Liangrui TANG  Chen XU  Xin WU  

     
    PAPER-Wireless Communication Technologies

    Publicized: 2016/12/27
    Vol: E100-B No:7
    Page(s): 1145-1151

    Deploying low-power nodes (LPNs), which reuse the spectrum licensed to a macrocell network, is considered a promising way to significantly boost network capacity. Because of the spectrum sharing, the deployment of LPNs can trigger severe interference, including intra-tier interference among dense LPNs and inter-tier interference between LPNs and the macro base station (MBS), which strongly affects system performance. In this paper, we investigate a downlink spectrum-sharing approach for two-tier networks consisting of small cells (SCs) with several LPNs and a macrocell with an MBS, aiming to mitigate the interference and improve the capacity of the SCs. The spectrum-sharing approach is formulated as a multi-objective optimization problem, which is solved by the nondominated sorting genetic algorithm II (NSGA-II); simulations show that the proposed spectrum-sharing approach is superior to the existing one.
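
    For reference, the ranking step that gives NSGA-II its name is the fast nondominated sort, which partitions candidate solutions into Pareto fronts; a minimal sketch over objective vectors to be minimized is given below (the spectrum-sharing objectives themselves are not modelled here).

```python
def fast_nondominated_sort(objectives):
    """Partition solutions into Pareto fronts (the ranking step of NSGA-II).

    objectives: list of equal-length tuples, all objectives to be minimized.
    Returns a list of fronts, each a list of solution indices.
    """
    def dominates(p, q):
        return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))

    n = len(objectives)
    dominated = [[] for _ in range(n)]  # indices that solution i dominates
    count = [0] * n                     # number of solutions dominating i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dominated[i].append(j)
            elif dominates(objectives[j], objectives[i]):
                count[i] += 1

    fronts = [[i for i in range(n) if count[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                count[j] -= 1
                if count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]                  # drop the trailing empty front
```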

  • A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism

    Ryohei KOBAYASHI  Kenji KISE  

     
    PAPER-Computer System

    Publicized: 2017/01/30
    Vol: E100-D No:5
    Page(s): 1003-1015

    Sorting is an extremely important computation kernel that has been accelerated in many fields such as databases, image processing, and genome analysis. Given the advent of the Internet of Things (IoT) era driven by advances in mobile technology, a sorting method is needed that is available in any environment, not only high-performance systems such as servers but also machines with low computational performance such as embedded systems. In this paper, we present an FPGA-based sorting accelerator that combines a sorting network and a merge sorter tree and is customizable by tuning design parameters. The proposed FPGA accelerator sorts data sent from a host PC via the PCIe bus and sends the fully sorted data sequence back. We also present a detailed analytical model that accurately estimates the sorting performance. Thanks to these characteristics, designers can know in advance how fast a developed sorting hardware will be and can implement the one that best fulfills the cost and performance constraints. Our experiments show that the proposed hardware achieves up to 19.5x the sorting performance of an Intel Core i7-3770K operating at 3.50GHz when sorting 256M 32-bit integer elements. However, this result is limited by insufficient memory bandwidth. To overcome this problem, we propose a data compression mechanism; the experimental results show that the sorting hardware with it achieves almost 90% of the estimated performance, whereas the hardware without it achieves only about 60%. To allow every designer to easily and freely use this accelerator, the RTL source code is released as open-source hardware.

  • Modular Serial Pipelined Sorting Architecture for Continuous Variable-Length Sequences with a Very Simple Control Strategy

    Tingting CHEN  Weijun LI  Feng YU  Qianjian XING  

     
    LETTER-Circuit Theory

    Vol: E100-A No:4
    Page(s): 1074-1078

    A modular, serial, pipelined sorting architecture for continuous input sequences is presented. It supports continuous sequences whose lengths can be changed dynamically, and it does so with a very simple control strategy. It consists of identical, serially cascaded sorting cells and lends itself to high-frequency implementation with any number of sorting cells, because both data and control signals are pipelined. With L cascaded sorting cells, it produces a fully sorted result for sequences whose length N is at most L+1; for longer sequences, the largest L elements are sorted out. Owing to the modular design, several independent smaller sorters can be dynamically configured to form a larger sorter.
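
    A behavioural model of the cascaded-cell idea, assuming each cell keeps the largest value it has seen and forwards the smaller one downstream, is sketched below; the pipelining and the control signalling of the hardware are not modelled.

```python
class SortingCell:
    """Holds the largest value it has seen; forwards the smaller one downstream."""
    def __init__(self):
        self.value = None

    def step(self, x):
        if self.value is None or x > self.value:
            self.value, x = x, self.value
        return x                       # value passed to the next cell (may be None)

def serial_sort(seq, l):
    """Return the largest min(len(seq), l) elements in descending order;
    for len(seq) <= l + 1 the single overflow value completes a full sort."""
    cells = [SortingCell() for _ in range(l)]
    overflow = []                      # whatever falls out of the last cell
    for x in seq:
        for cell in cells:
            x = cell.step(x)
            if x is None:
                break
        else:
            overflow.append(x)
    kept = [c.value for c in cells if c.value is not None]
    return kept + sorted(overflow, reverse=True)
```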

  • On the Stack Number and the Queue Number of the Bubble-Sort Graph

    Yuuki TANAKA  

     
    PAPER

    Vol: E99-A No:6
    Page(s): 1012-1018

    In this paper, we consider the stack layout of the bubble-sort graph. The bubble-sort graph is a Cayley graph on the symmetric group and plays an important role in the study of Cayley graphs as interconnection networks. The stack layout and queue layout problems treated in this paper have been studied widely. We show that the stack number of the bubble-sort graph BS(n) is either n-1 or n-2. In addition, we show that n-2 is an upper bound on the queue number of BS(n).
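
    For concreteness, BS(n) has the n! permutations of {1, ..., n} as vertices, two permutations being adjacent exactly when they differ by a swap of adjacent positions; the sketch below builds this adjacency structure (it does not compute the stack or queue layouts studied in the paper).

```python
from itertools import permutations

def bubble_sort_graph(n):
    """Build the bubble-sort graph BS(n): vertices are permutations of 1..n,
    and two vertices are adjacent iff they differ by a swap of adjacent positions."""
    vertices = list(permutations(range(1, n + 1)))
    adj = {v: set() for v in vertices}
    for v in vertices:
        for i in range(n - 1):
            u = list(v)
            u[i], u[i + 1] = u[i + 1], u[i]   # transpose adjacent positions i, i+1
            adj[v].add(tuple(u))
    return adj

# BS(n) is (n-1)-regular with n! vertices; for example, BS(3) is a 6-cycle.
```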

  • Sorting Method for Fully Homomorphic Encrypted Data Using the Cryptographic Single-Instruction Multiple-Data Operation

    Pyung KIM  Younho LEE  Hyunsoo YOON  

     
    PAPER-Fundamental Theories for Communications

    Vol: E99-B No:5
    Page(s): 1070-1086

    In this paper, we present a sorting method for numerical data subjected to fully homomorphic encryption (FHE) that is faster in wall-clock time. Owing to their circuit-based construction and the security properties of FHE, most existing sorting methods cannot be applied to encrypted data without significantly compromising efficiency. The proposed algorithm utilizes the cryptographic single-instruction multiple-data (SIMD) operation, which is supported by most existing FHE schemes, to reduce the computational overhead. We conducted a careful analysis of the number of required recryption operations, which are the computationally dominant operations in FHE. Accordingly, we verified that the proposed SIMD-based sorting algorithm completes the given task more quickly than existing sorting methods if the number of data items and/or the maximum bit length of each data item exceeds specific thresholds.
