Yuntao WANG Yoshinori AONO Tsuyoshi TAKAGI
The learning with errors (LWE) problem is considered as one of the most compelling candidates as the security base for the post-quantum cryptosystems. For the application of LWE based cryptographic schemes, the concrete parameters are necessary: the length n of secret vector, the moduli q and the deviation σ. In the middle of 2016, Germany TU Darmstadt group initiated the LWE Challenge in order to assess the hardness of LWE problems. There are several approaches to solve the LWE problem via reducing LWE to other lattice problems. Xu et al.'s group solved some LWE Challenge instances using Liu-Nguyen's adapted enumeration technique (reducing LWE to BDD problem) [23] and they published this result at ACNS 2017 [32]. In this paper, at first, we applied the progressive BKZ on the LWE challenge cases of σ/q=0.005 using Kannan's embedding technique. We can intuitively observe that the embedding technique is more efficient with the embedding factor M closer to 1. Then we will analyze the optimal number of samples m for a successful attack on LWE case with secret length of n. Thirdly based on this analysis, we show the practical cost estimations using the precise progressive BKZ simulator. Simultaneously, our experimental results show that for n ≥ 55 and the fixed σ/q=0.005, the embedding technique with progressive BKZ is more efficient than Xu et al.'s implementation of the enumeration algorithm in [32][14]. Moreover, by our parameter setting, we succeed in solving the LWE Challenge over (n,σ/q)=(70, 0.005) using 216.8 seconds (32.73 single core hours).
Rungsiman NARARATWONG Natthawut KERTKEIDKACHORN Nagul COOHAROJANANONE Hitoshi OKADA
Word boundary ambiguity in word segmentation has long been a fundamental challenge within Thai language processing. The Conditional Random Fields (CRF) model is among the best-known methods to have achieved remarkably accurate segmentation. Nevertheless, current advancements appear to have left the problem of compound words unaccounted for. Compound words lose their meaning or context once segmented. Hence, we introduce a dictionary-based word-merging algorithm, which merges all kinds of compound words. Our evaluation shows that the algorithm can accomplish a high-accuracy of word segmentation, with compound words being preserved. Moreover, it can also restore some incorrectly segmented words. Another problem involving a different word-chunking approach is sentence boundary ambiguity. In tackling the problem, utilizing the part of speech (POS) of a segmented word has been found previously to help boost the accuracy of CRF-based sentence segmentation. However, not all segmented words can be tagged. Thus, we propose a POS-based word-splitting algorithm, which splits words in order to increase POS tags. We found that with more identifiable POS tags, the CRF model performs better in segmenting sentences. To demonstrate the contributions of both methods, we experimented with three of their applications. With the word merging algorithm, we found that intact compound words in the product of topic extraction can help to preserve their intended meanings, offering more precise information for human interpretation. The algorithm, together with the POS-based word-splitting algorithm, can also be used to amend word-level Thai-English translations. In addition, the word-splitting algorithm improves sentence segmentation, thus enhancing text summarization.
Qin DAI Naoya INOUE Paul REISERT Kentaro INUI
A tremendous amount of knowledge is present in the ever-growing scientific literature. In order to efficiently grasp such knowledge, various computational tasks are proposed that train machines to read and analyze scientific documents. One of these tasks, Scientific Relation Extraction, aims at automatically capturing scientific semantic relationships among entities in scientific documents. Conventionally, only a limited number of commonly used knowledge bases, such as Wikipedia, are used as a source of background knowledge for relation extraction. In this work, we hypothesize that unannotated scientific papers could also be utilized as a source of external background information for relation extraction. Based on our hypothesis, we propose a model that is capable of extracting background information from unannotated scientific papers. Our experiments on the RANIS corpus [1] prove the effectiveness of the proposed model on relation extraction from scientific articles.
Kazuhito ITO Yuto ISHIHARA Shinichi NISHIZAWA
As LSI chips integrate more transistors and the operating power supply voltage decreases, LSI chips are becoming more vulnerable to the soft error caused by neutrons induced from cosmic rays. The soft error is detected by comparing the duplicated operation results in double modular redundancy (DMR) and the error is corrected by re-executing necessary operations. In this paper, based on the error recovery scheme of re-executing necessary operations, the minimization of the vote operations for error checking with respect to given resource constraints is considered. An ILP model for the optimal solution to the problem is presented and a heuristic algorithm is proposed to minimize the vote operations.
Kazuyoshi TSUCHIYA Chiaki OGAWA Yasuyuki NOGAMI Satoshi UEHARA
Pseudorandom number generators are required to generate pseudorandom numbers which have good statistical properties as well as unpredictability in cryptography. An m-sequence is a linear feedback shift register sequence with maximal period over a finite field. M-sequences have good statistical properties, however we must nonlinearize m-sequences for cryptographic purposes. A geometric sequence is a sequence given by applying a nonlinear feedforward function to an m-sequence. Nogami, Tada and Uehara proposed a geometric sequence whose nonlinear feedforward function is given by the Legendre symbol, and showed the period, periodic autocorrelation and linear complexity of the sequence. Furthermore, Nogami et al. proposed a generalization of the sequence, and showed the period and periodic autocorrelation. In this paper, we first investigate linear complexity of the geometric sequences. In the case that the Chan-Games formula which describes linear complexity of geometric sequences does not hold, we show the new formula by considering the sequence of complement numbers, Hasse derivative and cyclotomic classes. Under some conditions, we can ensure that the geometric sequences have a large linear complexity from the results on linear complexity of Sidel'nikov sequences. The geometric sequences have a long period and large linear complexity under some conditions, however they do not have the balance property. In order to construct sequences that have the balance property, we propose interleaved sequences of the geometric sequence and its complement. Furthermore, we show the periodic autocorrelation and linear complexity of the proposed sequences. The proposed sequences have the balance property, and have a large linear complexity if the geometric sequences have a large one.
The security of a software program critically depends on the prevention of vulnerabilities in the source code; however, conventional computer programs lack the ability to identify vulnerable code in another program. Our research was aimed at developing a technique capable of generating substitution code for the detection of buffer overflow vulnerability in C/C++ programs. The technique automatically verifies and sanitizes code instrumentation by comparing the result of each candidate variable with that expected from the input data. Our results showed that statements containing buffer overflow vulnerabilities could be detected and prevented by using a substitution variable and by sanitizing code vulnerabilities based on the size of the variables. Thus, faults can be detected prior to execution of the statement, preventing malicious access. Our approach is particularly useful for enhancing software security monitoring, and for designing retrofitting techniques in applications.
Low-density chaotic binary sequences generated by Bernoulli map are discussed in this paper. We theoretically evaluate auto-correlation functions of the low-density chaotic binary sequences based on chaos theory.
Kazuichi OE Mitsuru SATO Takeshi NANRI
The response times of solid state drives (SSDs) have decreased dramatically due to the growing use of non-volatile memory express (NVMe) devices. Such devices have response times of less than 100 micro seconds on average. The response times of all-flash-array systems have also decreased dramatically through the use of NVMe SSDs. However, there are applications, particularly virtual desktop infrastructure and in-memory database systems, that require storage systems with even shorter response times. Their workloads tend to contain many input-output (IO) concentrations, which are aggregations of IO accesses. They target narrow regions of the storage volume and can continue for up to an hour. These narrow regions occupy a few percent of the logical unit number capacity, are the target of most IO accesses, and appear at unpredictable logical block addresses. To drastically reduce the response times for such workloads, we developed an automated tiered storage system called “automated tiered storage with fast memory and slow flash storage” (ATSMF) in which the data in targeted regions are migrated between storage devices depending on the predicted remaining duration of the concentration. The assumed environment is a server with non-volatile memory and directly attached SSDs, with the user applications executed on the server as this reduces the average response time. Our system predicts the effect of migration by using the previously monitored values of the increase in response time during migration and the change in response time after migration. These values are consistent for each type of workload if the system is built using both non-volatile memory and SSDs. In particular, the system predicts the remaining duration of an IO concentration, calculates the expected response-time increase during migration and the expected response-time decrease after migration, and migrates the data in the targeted regions if the sum of response-time decrease after migration exceeds the sum of response-time increase during migration. Experimental results indicate that ATSMF is at least 20% faster than flash storage only and that its memory access ratio is more than 50%.
A parallel phrase matching (PM) engine for dictionary compression is presented. Hardware based parallel chaining hash can eliminate erroneous PM results raised by hash collision; while newly-designed storage architecture holding PM results solved the data dependency issue; Thus, the average compression speed is increased by 53%.
Shanding XU Xiwang CAO Jian GAO Chunming TANG
As an optimal combinatorial object, cyclic perfect Mendelsohn difference family (CPMDF) was introduced by Fuji-Hara and Miao to construct optimal optical orthogonal codes. In this paper, we propose a direct construction of disjoint CPMDFs from the Zeng-Cai-Tang-Yang cyclotomy. Compared with a recent work of Fan, Cai, and Tang, our construction doesn't need to depend on a cyclic difference matrix. Furthermore, strictly optimal frequency-hopping sequences (FHSs) are a kind of optimal FHSs which has optimal Hamming auto-correlation for any correlation window. As an application of our disjoint CPMDFs, we present more flexible combinatorial constructions of strictly optimal FHSs, which interpret the previous construction proposed by Cai, Zhou, Yang, and Tang.
In this paper, we deal with coding for synchronous errors. We introduce a synchronously erroneous finite state Markov channel model whose SIR is computable. We apply density evolution analysis [1] and the extended version for FSMC [2] to the channel Numerical experiments demonstrated spatially-coupled codes approach the SIR of the channel.
Yubo LI Shuonan LI Hongqian XUAN Xiuping PENG
In this letter, a generic method to construct mutually orthogonal binary zero correlation zone (ZCZ) sequence sets from mutually orthogonal complementary sequence sets (MOCSSs) with certain properties is presented at first. Then MOCSSs satisfying conditions are generated from binary orthogonal matrices with order N×N, where N=p-1, p is a prime. As a result, mutually orthogonal binary ZCZ sequence sets with parameters (2N2,N,N+1)-ZCZ can be obtained, the number of ZCZ sets is N. Note that each single ZCZ sequence set is optimal with respect to the theoretical bound.
Kento HASEGAWA Masao YANAGISAWA Nozomu TOGAWA
Recently, it has been reported that malicious third-party IC vendors often insert hardware Trojans into their products. Especially in IC design step, malicious third-party vendors can easily insert hardware Trojans in their products and thus we have to detect them efficiently. In this paper, we propose a machine-learning-based hardware-Trojan detection method for gate-level netlists using multi-layer neural networks. First, we extract 11 Trojan-net feature values for each net in a netlist. After that, we classify the nets in an unknown netlist into a set of Trojan nets and that of normal nets using multi-layer neural networks. By experimentally optimizing the structure of multi-layer neural networks, we can obtain an average of 84.8% true positive rate and an average of 70.1% true negative rate while we can obtain 100% true positive rate in some of the benchmarks, which outperforms the existing methods in most of the cases.
Wei LI Huajun GONG Chunlin SHEN Yi WU
Surface light field advances conventional light field rendering techniques by utilizing geometry information. Using surface light field, real-world objects with complex appearance could be faithfully represented. This capability could play an important role in many VR/AR applications. However, an accurate geometric model is needed for surface light field sampling and processing, which limits its wide usage since many objects of interests are difficult to reconstruct with their usually very complex appearances. We propose a novel two-step optimization framework to reduce the dependency of accurate geometry. The key insight is to treat surface light field sampling as a multi-view multi-texture optimization problem. Our approach can deal with both model inaccuracy and image to model misalignment, making it possible to create high-fidelity surface light field models without using high-precision special hardware.
Keisuke OSAWA Hiromasa HABUCHI Yusuke KOZAWA
Lighting constrained visible-light communications are expected as indoor communications of next generation. In lighting constrained visible-light communications, lighting devices are used not only for illuminating rooms but also for optical wireless communications. For lighting constrained visible-light communications, we have been proposed a variable N-parallel code-shift-keying (VN-CSK) using a modified prime sequence code (MPSC). The VN-CSK system using MPSC has not only a suppression function for reducing co-channel interference from neighboring lighting devices, but also a function for keeping constant data transmission regardless of dimming control. In this paper, the bit error rate (BER) of the VN-CSK system using MPSC is derived under an indoor visible-light communication channel by theoretical analysis. Moreover, we evaluate the BER performance for the brightness level (dimming control stage).
Kehai CHEN Tiejun ZHAO Muyun YANG
Learning semantic representation for translation context is beneficial to statistical machine translation (SMT). Previous efforts have focused on implicitly encoding syntactic and semantic knowledge in translation context by neural networks, which are weak in capturing explicit structural syntax information. In this paper, we propose a new neural network with a tree-based convolutional architecture to explicitly learn structural syntax information in translation context, thus improving translation prediction. Specifically, we first convert parallel sentences with source parse trees into syntax-based linear sequences based on a minimum syntax subtree algorithm, and then define a tree-based convolutional network over the linear sequences to learn syntax-based context representation and translation prediction jointly. To verify the effectiveness, the proposed model is integrated into phrase-based SMT. Experiments on large-scale Chinese-to-English and German-to-English translation tasks show that the proposed approach can achieve a substantial and significant improvement over several baseline systems.
Recent studies utilize multiple kernel learning to deal with incomplete-data problem. In this study, we introduce new methods that do not only complete multiple incomplete kernel matrices simultaneously, but also allow control of the flexibility of the model by parameterizing the model matrix. By imposing restrictions on the model covariance, overfitting of the data is avoided. A limitation of kernel matrix estimations done via optimization of an objective function is that the positive definiteness of the result is not guaranteed. In view of this limitation, our proposed methods employ the LogDet divergence, which ensures the positive definiteness of the resulting inferred kernel matrix. We empirically show that our proposed restricted covariance models, employed with LogDet divergence, yield significant improvements in the generalization performance of previous completion methods.
Seiji MIYOSHI Yoshinobu KAJIKAWA
We analyze the behaviors of the FXLMS algorithm using a statistical-mechanical method. The cross-correlation between a primary path and an adaptive filter and the autocorrelation of the adaptive filter are treated as macroscopic variables. We obtain simultaneous differential equations that describe the dynamical behaviors of the macroscopic variables under the condition that the tapped-delay line is sufficiently long. The obtained equations are deterministic and closed-form. We analytically solve the equations to obtain the correlations and finally compute the mean-square error. The obtained theory can quantitatively predict the behaviors of computer simulations including the cases of both not only white but also nonwhite reference signals. The theory also gives the upper limit of the step size in the FXLMS algorithm.