Rizal Setya PERDANA Yoshiteru ISHIDA
This study presents a formulation for generating context-aware natural language by machine from visual representations. Given an image sequence as input, the visual storytelling task (VST) aims to generate a coherent, object-focused, and contextualized story. Previous works in this domain struggled to model architectures that work on temporal multi-modal data, which led to low-quality output such as low lexical diversity, monotonous sentences, and inaccurate context. This study introduces a further improvement: an end-to-end architecture, called cross-modal contextualize attention, optimized to extract visual-temporal features and generate a plausible story. Visual object and non-visual concept features are encoded from the convolutional feature map, and object detection features are joined with language features. Three scenarios are defined for decoding language generation by incorporating weights from a pre-trained language generation model. Extensive experiments confirm that the proposed model outperforms other models in terms of both automatic metrics and manual human evaluation.
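The joining of object-detection features with language features described above can be illustrated with a plain scaled dot-product cross-modal attention sketch. Everything here (shapes, random features, the `cross_modal_attention` helper) is a hypothetical stand-in, not the paper's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(lang, vis):
    """Language tokens attend to visual object features (scaled dot-product)."""
    d = lang.shape[-1]
    scores = lang @ vis.T / np.sqrt(d)   # (n_tokens, n_objects) affinities
    weights = softmax(scores, axis=-1)   # each token's distribution over objects
    return weights @ vis                 # visually contextualized token features

rng = np.random.default_rng(1)
lang = rng.normal(size=(5, 8))  # 5 word embeddings, dim 8
vis = rng.normal(size=(3, 8))   # 3 detected-object features, dim 8
ctx = cross_modal_attention(lang, vis)
```

Each row of `ctx` is a convex combination of the object features, weighted by how strongly that word attends to each detected object.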
Takeshi OKAMOTO Yoshiteru ISHIDA
More than forty thousand computer viruses have appeared since the first virus, and six new viruses appear every day on average. The enormous expansion of computer networks has opened a threat of explosive spread of computer viruses. In this paper, we propose a distributed approach against computer viruses that uses the computer network itself, allowing a distributed, agent-based defense. Our system is composed of an immunity-based system, similar to the biological immune system, and a recovery system, similar to the recovery mechanism by cell division. The immunity-based system recognizes "non-self" (which includes computer viruses) using "self" information, and uses agents analogous to an antibody, a natural killer cell, and a helper T-cell. The recovery system uses a copy agent that sends an uninfected copy to an infected computer on the LAN, or receives one from an uninfected computer on the LAN. We implemented a prototype in Java™, a multi-platform language. In experiments, we confirmed that the proposed system works against some existing computer viruses that infect MS-DOS™ programs.
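The self/non-self distinction at the heart of the immunity-based system can be sketched minimally as checksum-based integrity checking: "self" is a database of fingerprints of known-clean programs, and anything unknown or altered is flagged as "non-self". This is a deliberately simplified, hypothetical illustration; the file names and payloads are invented, and the paper's agent system is far richer than a single checksum test:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Checksum serving as the 'self' information for a program file."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical clean programs registered as "self".
self_db = {name: fingerprint(body) for name, body in {
    "editor.exe": b"\x90\x90clean editor code",
    "shell.exe": b"\x90\x90clean shell code",
}.items()}

def is_non_self(name: str, body: bytes) -> bool:
    """Antibody-like check: unknown or altered programs are 'non-self'."""
    known = self_db.get(name)
    return known is None or known != fingerprint(body)

# An infected copy (virus payload appended) no longer matches "self",
# so a copy agent could replace it with an uninfected copy from the LAN.
infected = b"\x90\x90clean editor code" + b"VIRUS"
```

In the paper's terms, a positive `is_non_self` result is what would trigger the copy agent to restore an uninfected copy from another computer on the LAN.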
Yoshiteru ISHIDA Norihiko ADACHI Hidekatsu TOKUMARU
This paper presents a simple algorithm for diagnosis in a graph-theoretical self-diagnosis model. The algorithm is based on a ranking method: it exploits the analogy between the rule used in ranking (a player defeated by more players, directly or indirectly, should be ranked lower) and the rule used in diagnosis (a unit identified as faulty by more other units, directly or indirectly, should be more strongly suspected as faulty). With this algorithm, faulty units are identified from a given syndrome by a simple matrix calculation. Although another algorithm has been proposed whose complexity is lower than that of the algorithm proposed here, this algorithm has the following two features: (1) Simplicity: the algorithm uses only matrix multiplication, and the matrix is obtained directly from syndromes, so the algorithm can easily be implemented as a computer program. (2) Universality: the algorithm can be used not only for self-diagnosis models of the PMC type but also, with slight modifications, for other types of self-diagnosis models, including probabilistic models. These features of the algorithm are investigated for systems with design D1t (t = 1, 2). The modification of the algorithm for the probabilistic self-diagnosis model is also discussed. The approach seems to open a channel between theoretical work on ranking in graph theory and syndrome decoding of models in system diagnosis.
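The "matrix multiplication on the syndrome" idea can be sketched as follows: encode the syndrome as a matrix S where S[i, j] = 1 means unit i identifies unit j as faulty, then accumulate direct and indirect accusations via matrix powers (a path of length k in S is a chain of k accusations) and suspect the most-accused units. The 4-unit syndrome below and the exact scoring rule are illustrative assumptions, not the paper's precise formulation:

```python
import numpy as np

# Hypothetical syndrome for 4 units: S[i, j] = 1 means unit i calls unit j faulty.
S = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
])

# Sum column totals of S, S^2, ..., S^n: direct plus indirect accusations.
n = S.shape[0]
score = np.zeros(n)
P = np.eye(n, dtype=int)
for _ in range(n):
    P = P @ S                 # P is now the next power of S
    score += P.sum(axis=0)    # accusations of each unit at this path length

ranking = np.argsort(-score)  # most-suspected unit first
```

Here unit 2 is accused directly by units 1 and 3 and indirectly via the chain 0→1→2, so it ends up the prime suspect; the whole computation is just repeated multiplication of a matrix read straight off the syndrome, which is the simplicity feature the abstract emphasizes.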
Rizal Setya PERDANA Yoshiteru ISHIDA
Automatic generation of textual stories from visual data, known as visual storytelling, is a recent advancement in the image-to-text problem. Instead of using a single image as input, visual storytelling processes a sequential array of images into coherent sentences. A story contains non-visual concepts as well as descriptions of literal objects. While previous approaches have applied external knowledge, our approach regards the non-visual concept as the semantic correlation between the visual and textual modalities. This paper therefore presents a new feature representation based on a canonical correlation analysis between the two modalities. An attention mechanism is adopted as the underlying architecture of the image-to-text problem, rather than a standard encoder-decoder model. The proposed end-to-end architecture, the Canonical Correlation Attention Mechanism (CAAM), extracts time-series correlation by maximizing the cross-modal correlation. Extensive experiments on the VIST dataset ( http://visionandlanguage.net/VIST/dataset.html ) were conducted to demonstrate the effectiveness of the architecture in terms of automatic metrics, with additional experiments showing the impact of the modality fusion strategy.
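The core quantity CAAM maximizes, the canonical correlation between visual and textual features, can be computed in closed form for fixed features via an SVD of the whitened cross-covariance. The sketch below uses synthetic stand-ins for the two modalities (a shared latent variable drives the first dimension of each), so the near-unit first canonical correlation is expected by construction; this is classical CCA, not the paper's trained end-to-end model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for per-step features (rows = time steps, cols = feature dims);
# a shared latent z drives one dimension of each modality.
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 2))])           # "visual" features
Y = np.hstack([2 * z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 2))])           # "textual" features

def canonical_correlations(X, Y, reg=1e-6):
    """Canonical correlations between two feature sets (SVD-based CCA)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(Xc)
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # regularized covariances
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    def inv_sqrt(S):                                 # S^{-1/2} via eigendecomposition
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T
    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)        # singular values = correlations

corrs = canonical_correlations(X, Y)
```

The dimensions tied to the shared latent yield a first canonical correlation close to 1, while the purely random dimensions stay near 0; maximizing this quantity over learned projections is, roughly, the cross-modal objective the abstract describes.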