Yuma MASUBUCHI Masaki HASHIMOTO Akira OTSUKA
Binary code similarity comparison is mainly used to find bugs in software, to detect software plagiarism, and to reduce the workload of malware analysis. In this paper, we propose a method that compares binary code similarity function by function, using a combination of the Control Flow Graph (CFG) and the disassembled instruction sequence of each function, and that detects functions highly similar to a specified function. One challenge in similarity comparison is that different compile-time optimizations and different architectures produce different binary code. The main units for comparing code are instructions, basic blocks, and functions. Functions are challenging because they have a graph structure in which basic blocks are combined, which makes similarity relatively difficult to derive. However, analysis tools such as IDA display the disassembled instruction sequence in function units, so detecting similarity on a function basis has the advantage of being easier for analysts to understand. To address these challenges, we use machine learning methods from the field of natural language processing. In this field, the Transformer model, introduced in 2017, set new records on various language processing tasks, and as of 2021 it is the basis for BERT, which has also set new records on language processing tasks. There is also a method called node2vec, which uses machine learning to capture the features of each node in a graph structure. We propose SIBYL, a combination of Transformer and node2vec. In SIBYL, a method called triplet loss is used during training so that similar items are brought closer together and dissimilar items are pushed apart. To evaluate SIBYL, we created a new dataset from open-source software widely used in the real world, and conducted training and evaluation experiments on it.
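As a minimal sketch of the triplet loss idea mentioned above (not SIBYL's actual training code), the following assumes Euclidean distance between embedding vectors and a margin of 1.0; the abstract specifies neither, and the vectors are made-up illustrative values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is.
    The distance function and margin are assumptions for illustration."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings of three functions.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # same function, e.g. compiled for another architecture
negative = [3.0, 4.0]   # unrelated function

loss_easy = triplet_loss(anchor, positive, negative)  # positive already closer
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped
```

Minimizing this quantity over many triplets pulls embeddings of similar functions together and pushes dissimilar ones apart, which is the behaviour the abstract describes.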
In the evaluation experiments, we evaluated binary code similarity across different architectures using evaluation metrics such as Rank1 and MRR. The experimental results showed that SIBYL outperforms existing research. We believe this is because machine learning was able to capture the features of the graph structure and the order of instructions on a function-by-function basis. The results of these experiments are presented in detail, followed by a discussion and conclusion.
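For readers unfamiliar with the metrics, the sketch below shows how Rank1 and MRR (mean reciprocal rank) are commonly computed for a retrieval task of this kind. It assumes the usual definitions; the paper's exact evaluation protocol may differ, and the function names in the sample data are hypothetical.

```python
def rank_of_match(query, ranked_candidates):
    """Return the 1-based rank of the correct match, or None if absent."""
    for rank, candidate in enumerate(ranked_candidates, start=1):
        if candidate == query:
            return rank
    return None


def rank1_and_mrr(results):
    """results maps each query function name to its candidate list,
    sorted from most to least similar. Returns (Rank1, MRR)."""
    hits_at_1 = 0
    reciprocal_sum = 0.0
    for query, ranked in results.items():
        rank = rank_of_match(query, ranked)
        if rank == 1:
            hits_at_1 += 1
        if rank is not None:
            reciprocal_sum += 1.0 / rank
    n = len(results)
    return hits_at_1 / n, reciprocal_sum / n


# Hypothetical retrieval results for three query functions.
results = {
    "memcpy": ["memcpy", "strcpy", "memset"],            # correct at rank 1
    "strlen": ["strcmp", "strlen", "strcpy"],            # correct at rank 2
    "qsort": ["bsearch", "memcmp", "strtol", "qsort"],   # correct at rank 4
}
rank1, mrr = rank1_and_mrr(results)
```

Here Rank1 is 1/3 (one query out of three is answered correctly at the top position) and MRR is (1 + 1/2 + 1/4) / 3.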
The security and reliability of Arabic text exchanged via the Internet have become a challenging area for the research community. Arabic text is highly sensitive to modification by malicious attacks: it is easy to change diacritics, i.e., Fat-ha, Kasra, and Damma, which represent the syntax of the Arabic language, and such changes can alter the meaning. In this paper, a Hybrid of Natural Language Processing and Zero-Watermarking Approach (HNLPZWA) is proposed for content authentication and tampering detection of Arabic text. HNLPZWA embeds and detects the watermark logically, without altering the original text document to embed a watermark key. A fifth-level word-order mechanism based on a hidden Markov model is integrated with digital zero-watermarking techniques to improve on the tampering detection accuracy of approaches proposed in the previous literature. The fifth-level-order Markov model is used as a natural language processing technique to analyze the Arabic text. It extracts features of the interrelationships between contexts of the text, uses the extracted features as watermark information, and later validates them against the attacked Arabic text to detect any tampering. HNLPZWA has been implemented in PHP using the VS Code IDE. Its tampering detection accuracy is demonstrated in experiments on four datasets of varying lengths, under insertion, reorder, and deletion attacks at multiple random locations. The experimental results show that HNLPZWA is more sensitive to all kinds of tampering attacks, with a high level of tampering detection accuracy.
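The general idea of zero-watermarking with a fifth-order word model can be sketched as follows. This is an illustrative Python sketch, not the authors' PHP implementation: the pattern encoding (a truncated SHA-256 digest of each five-word context plus its next word) and the match-rate score are assumptions, and the sample sentence is English for readability.

```python
import hashlib

ORDER = 5  # fifth-level word order, as in the abstract


def build_watermark(text, order=ORDER):
    """Derive watermark patterns from the text without modifying it.
    Each pattern encodes one transition: `order` consecutive words
    followed by the next word. The hashing scheme is an assumption."""
    words = text.split()
    patterns = []
    for i in range(len(words) - order):
        state = " ".join(words[i:i + order])
        next_word = words[i + order]
        digest = hashlib.sha256(f"{state}|{next_word}".encode("utf-8")).hexdigest()
        patterns.append(digest[:16])
    return patterns


def match_rate(original_patterns, received_text, order=ORDER):
    """Fraction of the stored patterns still present in the received text."""
    if not original_patterns:
        raise ValueError("text too short to produce any watermark pattern")
    received = set(build_watermark(received_text, order))
    matched = sum(1 for p in original_patterns if p in received)
    return matched / len(original_patterns)


original = "the quick brown fox jumps over the lazy dog"
tampered = "the quick brown fox jumps over the sleepy dog"

key = build_watermark(original)          # stored separately from the text
intact_score = match_rate(key, original)
tampered_score = match_rate(key, tampered)
```

Because the key is stored apart from the document, the text itself is never altered. A single substituted word breaks every transition that contains it, so the match rate drops below 1.0 and the tampering becomes detectable.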
Khalid MAHMOOD Mazen ALOBAIDI Hironao TAKAHASHI
The automation of traceability links, or traceability matrices, is important to many software development paradigms. In turn, the efficiency and effectiveness of traceability link recovery in distributed software development is becoming increasingly vital because of the complexity of project development, which includes continuous change in requirements, geographically dispersed project teams, and the difficulty of managing the elements of a project: time, money, scope, and people. Therefore, traceability links among the requirements artifacts, which fulfill business objectives, are also critical to reducing risk and ensuring a project's success. This paper proposes the Autonomous Decentralized Semantic-based Traceability Link Recovery (AD-STLR) architecture. To the best of our knowledge, this is the first architectural approach that uses an autonomous decentralized concept, the DBpedia knowledge base, and the BabelNet 2.5 multilingual dictionary and semantic network to find similarity among different project artifacts and to automate traceability link recovery.
Hong LIU Yang YANG Xiumei YANG Zhengmin ZHANG
Small cell networks have been promoted as an enabling solution to enhance indoor coverage and improve spectral efficiency. Users usually deploy small cells on demand in residential areas or offices and pay no attention to the global profile. The reduction in cell radius leads to dense deployment, which makes the computational complexity of resource allocation intractable. In this paper, we develop a semi-distributed resource allocation algorithm that divides the small cell network into clusters with limited inter-cluster interference and selects a reference cluster for interference estimation to reduce the degree of coordination. Numerical results show that the proposed algorithm maintains similar system performance with low complexity and reduced information exchange overhead.
Harksoo KIM Choong-Nyoung SEON Jungyun SEO
Most commercial websites provide customers with menu-driven navigation and keyword search. However, these inconvenient interfaces increase the number of mouse clicks and decrease customers' interest in surfing the websites. To resolve this problem, we propose an information retrieval assistant that uses a natural language interface in online sales domains. The information retrieval assistant has a client-server structure consisting of a system connector and an NLP (natural language processing) server. The NLP server performs a linguistic analysis of users' queries with the help of coordinated NLP agents that are based on shallow NLP techniques. After receiving the results of the linguistic analysis from the NLP server, the system connector interacts with external information provision systems, such as conventional information retrieval systems and relational database management systems, according to the analysis results. Owing to the client-server structure, other information provision systems can easily be added to the information retrieval assistant with trivial modifications to the NLP server. In addition, the information retrieval assistant guarantees fast responses because it uses shallow NLP techniques. In a preliminary experiment, we found that, compared to the menu-driven system, the information retrieval assistant reduced bothersome tasks such as menu selection and mouse clicking because it provides a convenient natural language interface.
This paper presents a novel method to speed up neural network (NN) based face detection systems. NN-based face detection can be viewed as a classification and search problem. The proposed method formulates the face search problem as an integer nonlinear optimization problem (INLP) and extends basic particle swarm optimization (PSO) to handle it. PSO works with a population of particles, each representing a subwindow in an input image. The subwindows are evaluated by how well they match an NN-based face filter. A face is indicated when the filter response of the best particle is above a given threshold. Experiments on a set of 42 test images show the effectiveness of the proposed approach. Moreover, the effect of PSO parameter settings on the search performance was investigated.
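The search loop described above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the paper's algorithm: the NN face filter is replaced by a synthetic stand-in function, integer handling is done by rounding and clamping after each move, and the image size, window sizes, PSO coefficients, and threshold are arbitrary illustrative values.

```python
import random

IMG_W, IMG_H = 320, 240          # hypothetical image size
MIN_SIZE, MAX_SIZE = 20, 100     # hypothetical subwindow side lengths
TRUE_FACE = (140, 90, 48)        # used only by the stand-in filter below


def filter_response(window):
    """Stand-in for the NN face filter: 1.0 at TRUE_FACE, decaying with
    distance. A real system would run a trained network on the subwindow."""
    x, y, s = window
    tx, ty, ts = TRUE_FACE
    dist2 = (x - tx) ** 2 + (y - ty) ** 2 + (s - ts) ** 2
    return 1.0 / (1.0 + dist2 / 400.0)


def clamp_window(pos):
    """Round to integers and keep the subwindow inside the image."""
    s = min(max(int(round(pos[2])), MIN_SIZE), MAX_SIZE)
    x = min(max(int(round(pos[0])), 0), IMG_W - s)
    y = min(max(int(round(pos[1])), 0), IMG_H - s)
    return (x, y, s)


def pso_face_search(n_particles=30, n_iters=60, w=0.7, c1=1.5, c2=1.5,
                    threshold=0.5, seed=0):
    rng = random.Random(seed)
    # Each particle is a subwindow (x, y, size) with integer coordinates.
    positions = [clamp_window((rng.uniform(0, IMG_W), rng.uniform(0, IMG_H),
                               rng.uniform(MIN_SIZE, MAX_SIZE)))
                 for _ in range(n_particles)]
    velocities = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    pbest = list(positions)
    pbest_score = [filter_response(p) for p in positions]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g], pbest_score[g]
    history = [gbest_score]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(3):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
            moved = [positions[i][d] + velocities[i][d] for d in range(3)]
            positions[i] = clamp_window(moved)
            score = filter_response(positions[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = positions[i], score
                if score > gbest_score:
                    gbest, gbest_score = positions[i], score
        history.append(gbest_score)
    # A face is reported when the best particle's response passes the threshold.
    return gbest, gbest_score, gbest_score >= threshold, history


best_window, best_score, face_found, history = pso_face_search()
```

The appeal of this formulation is that the filter is evaluated only at the particles' positions rather than at every subwindow of an exhaustive sliding-window scan, which is where the speed-up claimed in the abstract would come from.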