Author Search Result

[Author] Kiyoharu AIZAWA(36hit)

1-20hit(36hit)

  • Robust Object-Based Watermarking Using Feature Matching

    Viet-Quoc PHAM  Takashi MIYAKI  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Application Information Security

      Vol:
    E91-D No:7
      Page(s):
    2027-2034

    We present a robust object-based watermarking algorithm using the scale-invariant feature transform (SIFT) in conjunction with a data embedding method based on the Discrete Cosine Transform (DCT). The message is embedded in the DCT domain of randomly generated blocks in the selected object region. To recognize the object region after it has been distorted, its SIFT features are registered in advance. In the detection scheme, we extract SIFT features from the distorted image and match them with the registered ones. We then recover the distorted object region based on the transformation parameters obtained from the SIFT matching result, after which the watermarked message can be detected. Experimental results demonstrated that our proposed algorithm is very robust to distortions such as JPEG compression, scaling, rotation, shearing, aspect ratio change, and image filtering.

  • Detection and Tracking of Facial Features by Using Edge Pixel Counting and Deformable Circular Template Matching

    Liyanage C. DE SILVA  Kiyoharu AIZAWA  Mitsutoshi HATORI  

     
    PAPER-Image Processing, Computer Graphics and Pattern Recognition

      Vol:
    E78-D No:9
      Page(s):
    1195-1207

    In this paper, face feature detection and tracking are discussed, using methods called edge pixel counting and deformable circular template matching. Instead of utilizing color or gray scale information of the facial image, the proposed edge pixel counting method utilizes the edge information to estimate the positions of face features such as the eyes, nose and mouth, using a variable size face feature template, the initial size of which is predetermined by using a facial image database. The method is robust in the sense that detection is possible with facial images of different skin colors and different facial orientations. Subsequently, by using deformable circular template matching, the two iris positions of the face are determined and are used in the edge pixel counting to track the features in the next frame. Although feature tracking using gray scale template matching often fails when the inter-frame correlation around the feature areas is very low due to facial expression changes (such as talking, smiling, and eye blinking), feature tracking using edge pixel counting can track facial features reliably. Some experimental results are shown to demonstrate the effectiveness of the proposed method.

  • Subband Image Coding with Biorthogonal Wavelets

    Cha Keon CHEONG  Kiyoharu AIZAWA  Takahiro SAITO  Mitsutoshi HATORI  

     
    PAPER-Image Coding and Compression

      Vol:
    E75-A No:7
      Page(s):
    871-881

    In this paper, subband image coding with symmetric biorthogonal wavelet filters is studied. In order to implement the symmetric biorthogonal wavelet basis, we use the Laplacian Pyramid Model (LPM) and the trigonometric polynomial solution method. These symmetric biorthogonal wavelet bases are used to form filters in each subband. The filter coefficients are also optimized with respect to the coding efficiency. From this optimization, we show that the values of a in the LPM generating kernel have the best coding efficiency in the range of 0.7 to 0.75. We also present an optimal bit allocation method based on considerations of the reconstruction filter characteristics. The step size of each subband uniform quantizer is determined by using this bit allocation method. The coding efficiency of the symmetric biorthogonal wavelet filter is compared with those of other filters: QMF, SSKF and the orthonormal wavelet filter. Simulation results demonstrate that the symmetric biorthogonal wavelet filter is useful as a basic means for image analysis/synthesis filters and can give better coding efficiency than other filters.

  • 3-D Modeling of Real World by Fusing Multi-View Range Data and Texture Images

    Conny GUNADI  Hiroyuki SHIMIZU  Kazuya KODAMA  Kiyoharu AIZAWA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E86-D No:5
      Page(s):
    947-955

    The construction of large-scale virtual environments is gaining more attention for its applications in virtual malls, virtual sightseeing, tele-presence, etc. This paper presents a framework for building a realistic virtual environment using a geometry-based approach. We propose an algorithm to construct a realistic 3-D model from multi-view range data and multi-view texture images. The proposed method uses the result of region segmentation of range images in some phases of the modeling process. It is shown that the relations obtained from region segmentation are quite effective in improving the result of registration as well as mesh merging.

  • SIFT-Based Non-blind Watermarking Robust to Non-linear Geometrical Distortions

    Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E96-D No:6
      Page(s):
    1368-1375

    This paper presents a non-blind watermarking technique that is robust to non-linear geometric distortion attacks. This is one of the most challenging problems for copyright protection of digital content because it is difficult to estimate the distortion parameters for the embedded blocks. In our proposed scheme, the locations of the blocks are recorded by the translation parameters from multiple Scale Invariant Feature Transform (SIFT) feature points. This method is based on two assumptions: SIFT features are robust to non-linear geometric distortion, and even such non-linear distortion can be regarded as “linear” distortion in local regions. We conducted experiments using 149,800 images (7 standard images and 100 images downloaded from Flickr, 10 different messages, 10 different embedding block patterns, and 14 attacks). The results show that the watermark detection performance is drastically improved, while the baseline method can achieve only chance level accuracy.

  • Estimation of Semantic Impressions from Portraits

    Mari MIYATA  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2021/03/18
      Vol:
    E104-D No:6
      Page(s):
    863-872

    In this paper, we present a novel portrait impression estimation method using nine pairs of semantic impression words: bitter-majestic, clear-pure, elegant-mysterious, gorgeous-mature, modern-intellectual, natural-mild, sporty-agile, sweet-sunny, and vivid-dynamic. In the first part of the study, we analyzed the relationship between the facial features in deformed portraits and the nine semantic impression word pairs over a large dataset, which we collected by a crowdsourcing process. In the second part, we leveraged the knowledge from the results of the analysis to develop a ranking network trained on the collected data and designed to estimate the semantic impression associated with a portrait. Our network demonstrated superior performance in impression estimation compared with current state-of-the-art methods.

  • Vision Chip for Very Fast Detection of Motion Vectors: Design and Implementation

    Zheng LI  Kiyoharu AIZAWA  

     
    PAPER-Imaging Circuits and Algorithms

      Vol:
    E82-C No:9
      Page(s):
    1739-1748

    This paper gives a detailed presentation of a "vision chip" for very fast detection of motion vectors. The chip's design consists of a parallel pixel array and column parallel block-matching processors. Each pixel of the pixel array contains a photo detector, an edge detector and 4 bits of memory. In the detection of motion vectors, first, the gray level image is binarized by the edge detector and subsequently the binary edge data is used in the block matching processor. The block-matching takes place locally in pixel and globally in column. The chip can create a dense field of motion where a vector is assigned to each pixel by overlapping 2×2 target blocks. A prototype with 16×16 pixels and four block-matching processors has been designed and implemented. Preliminary results obtained by the prototype are shown.
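
    The block matching on binary edge data described in this abstract can be illustrated with a minimal software sketch. It is not the chip's circuit or the authors' implementation: the function names, the 6×6 example frames, the ±1 pixel search range, and the use of an XOR mismatch count as the matching cost are all assumptions made for illustration.

```python
def get_block(img, top, left, size):
    """Cut a size x size block out of a binary image (list of lists of 0/1)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mismatch_count(block_a, block_b):
    """Number of positions where two binary blocks differ (XOR sum)."""
    return sum(a ^ b
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_displacement(prev, curr, top, left, size=2, search=1):
    """Find where the target block of `curr` at (top, left) best matches `prev`.

    Returns (cost, dy, dx): the best match in `prev` sits at (top + dy, left + dx).
    Candidates that fall outside the frame are skipped.
    """
    target = get_block(curr, top, left, size)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue
            cost = mismatch_count(target, get_block(prev, r, c, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Hypothetical binary edge maps: CURR is PREV shifted one pixel to the right.
PREV = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
CURR = [[0] + row[:-1] for row in PREV]

result = best_displacement(PREV, CURR, top=2, left=3)
```

    Because the inputs are single-bit edge values, the cost reduces to counting differing bits, which is one plausible reason binarized data suits simple per-pixel hardware.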

  • Quality Improvement Technique for Compressed Image by Merging a Reference Image

    Supatana AUETHAVEKIAT  Kiyoharu AIZAWA  Mitsutoshi HATORI  

     
    PAPER-Image Coding

      Vol:
    E81-B No:12
      Page(s):
    2269-2275

    A novel algorithm for improving a compressed image sequence by merging a reference image is presented. A high quality still image of the same scene is used as the reference image. The degraded images are improved by merging the reference image with them. The merging amount is controlled by the resemblance between the reference image and the compressed image after applying motion compensation. Experiments conducted on sequences of JPEG images are given. This technique does not need prior knowledge of the compression technique, so it can be applied to other techniques as well.

  • Users' Preference Prediction of Real Estate Properties Based on Floor Plan Analysis

    Naoki KATO  Toshihiko YAMASAKI  Kiyoharu AIZAWA  Takemi OHAMA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2019/11/20
      Vol:
    E103-D No:2
      Page(s):
    398-405

    With the recent advances in e-commerce, it has become important to recommend not only mass-produced daily items, such as books, but also items that are not mass-produced. In this study, we present an algorithm for real estate recommendations. Automatic property recommendations are a highly difficult task because no identical properties exist in the world, occupied properties cannot be recommended, and users rent or buy properties only a few times in their lives. For the first step of property recommendation, we predict users' preferences for properties by combining content-based filtering and Multi-Layer Perceptron (MLP). In the MLP, we use not only attribute data of users and properties, but also deep features extracted from property floor plan images. As a result, we successfully predict users' preference with a Matthews Correlation Coefficient (MCC) of 0.166.
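
    For readers unfamiliar with the metric quoted above, the Matthews Correlation Coefficient for binary classification can be computed from confusion-matrix counts as in the following minimal sketch. The function name and the convention of returning 0.0 when the denominator is zero are assumptions of this sketch, not details taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a convention assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)    # all predictions correct
inverted = matthews_corrcoef(tp=0, tn=0, fp=5, fn=5)   # all predictions wrong
chance = matthews_corrcoef(tp=5, tn=5, fp=5, fn=5)     # no better than chance
```

    The value ranges from -1 to 1, with 0 corresponding to chance-level prediction, which gives a scale against which to read the reported 0.166.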

  • Ubiquitous Home: Retrieval of Experiences in a Home Environment

    Gamhewage C. DE SILVA  Toshihiko YAMASAKI  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E91-D No:2
      Page(s):
    330-340

    Automated capture and retrieval of experiences at home is interesting due to the wide variety and personal significance of such experiences. We present a system for retrieval and summarization of continuously captured multimedia data from Ubiquitous Home, a two-room house equipped with a large number of cameras and microphones. Data from pressure-based sensors on the floor are analyzed to segment footsteps of different persons. Video and audio handover are implemented to retrieve continuous video streams corresponding to moving persons. An adaptive algorithm based on the rate of footsteps summarizes these video streams. A novel method for audio segmentation using multiple microphones is used for video retrieval based on sounds with high accuracy. An experiment, in which a family lived in this house for twelve days, was conducted. The system was evaluated by the residents who used the system for retrieving their own experiences; we report and discuss the results.

  • A Fast Adaptive Algorithm Using Gradient Vectors of Multiple ADF

    Kei IKEDA  Mitsutoshi HATORI  Kiyoharu AIZAWA  

     
    PAPER

      Vol:
    E75-A No:8
      Page(s):
    972-979

    The inherent simplicity of the LMS (Least Mean Square) Algorithm has led to its wide usage. However, it is well known that high speed convergence and low final misadjustment cannot be realized simultaneously by the conventional LMS method. To overcome this trade-off problem, a new adaptive algorithm using Multiple ADF's (Adaptive Digital Filters) is proposed. The proposed algorithm modifies coefficients using multiple gradient vectors of the squared error, which are computed at different points on the performance surface. First, the proposed algorithm using 2 ADF's is discussed. Simulation results show that both high speed convergence and low final misadjustment can be realized. The computation time of this proposed algorithm is nearly as much as that of LMS if parallel processing techniques are used. Moreover, the proposed algorithm using more than 2 ADF's is discussed. It is understood that if more than 2 ADF's are used, further improvement in the convergence speed is not realized, but a reduction of the final misadjustment and an improvement in the stability are realized. Finally, a method which can improve the convergence property in the presence of correlated input is discussed. It is indicated that using a priori knowledge and matrix transformation, the convergence property is much improved even when a strongly correlated signal input is applied.
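
    As background for the trade-off mentioned above, the following is a minimal sketch of the conventional LMS update only; it is not the proposed multiple-ADF algorithm, whose details are not given in the abstract. The function names, the 3-tap example system, the step size `mu`, and the noise-free system-identification setup are all assumptions chosen for illustration.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Conventional LMS: w <- w + mu * e * u, with e = d[n] - w . u."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, newest first, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = d[n] - y
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors


def demo(seed=0, n=2000, mu=0.05):
    """Identify a hypothetical 3-tap FIR system from noise-free observations."""
    rng = random.Random(seed)
    true_w = [0.5, -0.3, 0.1]
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    d = [sum(true_w[k] * (x[i - k] if i - k >= 0 else 0.0) for k in range(3))
         for i in range(n)]
    w, errors = lms_identify(x, d, num_taps=3, mu=mu)
    return w, true_w, errors


estimated, true_w, errors = demo()
```

    The single step size `mu` controls both how quickly the weights converge and how much they fluctuate once observation noise is present, which is the trade-off the abstract sets out to overcome.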

  • Ubiquitous Display Controlled by Mobile Terminals

    Kiyoharu AIZAWA  Kentaro KAKAMI  

     
    LETTER

      Vol:
    E85-B No:10
      Page(s):
    2214-2217

    Mobile terminals with Internet services such as i-mode are in wide use, and communication bandwidths are growing even further under 3G technology. However, displays of mobile terminals will remain small in view of their portable size and power consumption. In this paper, we propose a "ubiquitous display" that can be used in combination with mobile terminals. The user operates the mobile terminal and the ubiquitous display shows any content that requires a large screen space.

  • Image Acquisition by Pixel-Based Random-Access Image Sensor for a Real-Time IBR System

    Ryutaro OI  Takayuki HAMAMOTO  Kiyoharu AIZAWA  

     
    PAPER-Signal Processing

      Vol:
    E85-C No:3
      Page(s):
    505-510

    We have studied an image acquisition system for a real-time image-based rendering (IBR) system. In this area, most conventional systems sacrifice spatial or temporal resolution for a large number of input images. However, only a portion of the image data is needed for rendering, and the portion required is determined by the position of the imaginary viewpoint. In this paper, we propose an acquisition system for a real-time image-based rendering system that uses pixel-based random-access image sensors to eliminate the main bottleneck in conventional systems. We have developed a prototype CMOS image sensor, which has 128×128 pixels. We verified the prototype chip's selective readout function. We also verified the sample & hold feature.

  • Personalized Food Image Classifier Considering Time-Dependent and Item-Dependent Food Distribution (Open Access)

    Qing YU  Masashi ANZAWA  Sosuke AMANO  Kiyoharu AIZAWA  

     
    PAPER

      Publicized:
    2019/06/21
      Vol:
    E102-D No:11
      Page(s):
    2120-2126

    Since the development of food diaries could enable people to develop healthy eating habits, food image recognition is in high demand to reduce the effort in food recording. Previous studies have worked on this challenging domain with datasets having fixed numbers of samples and classes. However, in the real-world setting, it is impossible to include all of the foods in the database because the number of classes of foods is large and increases continually. In addition to that, inter-class similarity and intra-class diversity also bring difficulties to the recognition. In this paper, we solve these problems by using deep convolutional neural network features to build a personalized classifier which incrementally learns the user's data and adapts to the user's eating habit. As a result, we achieved the state-of-the-art accuracy of food image recognition by the personalization of 300 food records per user.

  • A Study of a Blind Multiple Beam Adaptive Array

    Sanghoon SONG  Yoonki CHOI  Kiyoharu AIZAWA  Mitsutoshi HATORI  

     
    PAPER-Communication Theory and Signals

      Vol:
    E81-A No:6
      Page(s):
    1270-1275

    In land mobile communication, CMA (Constant Modulus Algorithm) has been studied to reduce the multipath fading effect. With this method, the received power is not used efficiently, even though all the multipath components carry the same information. To make use of the received power efficiently, we propose a Blind Multiple Beam Adaptive Array. It has the following three features. First, we use CMA, which can reduce the multipath fading effect to some extent without a training signal. Second, we use the LMS algorithm, which can capture the multipath components that are separated from the reference signal to some extent. Third, we use an FDF (Fractional Delay Filter) and TED (Timing Error Detector) loop, which can detect and compensate for fractional delay. As a result of utilizing the multipath components that are suppressed by CMA, the proposed technique achieves better performance than the CMA adaptive array.

  • Movie Map for Virtual Exploration in a City

    Kiyoharu AIZAWA  

     
    INVITED PAPER

      Publicized:
    2021/10/12
      Vol:
    E105-D No:1
      Page(s):
    38-45

    This paper introduces our work on a Movie Map, which will enable users to explore a given city area using 360° videos. Visual exploration of a city is always needed. Nowadays, we are familiar with Google Street View (GSV), which is an interactive visual map. Despite the wide use of GSV, it provides sparse images of streets, which often confuses users and lowers user satisfaction. Forty years ago, a video-based interactive map was created; it is well known as the Aspen Movie Map. A Movie Map uses videos instead of sparse images and seems to improve the user experience dramatically. However, the Aspen Movie Map was based on analog technology, required a huge effort, and was never built again. Thus, we renovate the Movie Map using state-of-the-art technology. We build a new Movie Map system with an interface for exploring cities. The system consists of four stages: acquisition, analysis, management, and interaction. After acquiring 360° videos along streets in target areas, the analysis of videos is almost automatic. Frames of the video are localized on the map, intersections are detected, and videos are segmented. Turning views at intersections are synthesized. By connecting the video segments following the specified movement in an area, we can watch a walking view along a street. The interface allows for easy exploration of a target area. It can also show virtual billboards in the view.

  • Quality Enhancement of Conventional Compression with a Learned Side Bitstream

    Takahiro NARUKO  Hiroaki AKUTSU  Koki TSUBOTA  Kiyoharu AIZAWA  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2023/04/25
      Vol:
    E106-D No:8
      Page(s):
    1296-1299

    We propose Quality Enhancement via a Side bitstream Network (QESN) technique for lossy image compression. The proposed QESN utilizes the network architecture of deep image compression to produce a bitstream for enhancing the quality of conventional compression. We also present a loss function that directly optimizes the Bjontegaard delta bit rate (BD-BR) by using a differentiable model of a rate-distortion curve. Experimental results show that QESN improves the rate by 16.7% in the BD-BR compared to Better Portable Graphics.

  • Negative Learning to Prevent Undesirable Misclassification

    Kazuki EGASHIRA  Atsuyuki MIYAI  Qing YU  Go IRIE  Kiyoharu AIZAWA  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2023/10/05
      Vol:
    E107-D No:1
      Page(s):
    144-147

    We propose a novel classification problem setting where Undesirable Classes (UCs) are defined for each class. A UC is a class into which misclassification is specifically to be avoided. To address this setting, we propose a framework to reduce the probabilities for UCs while increasing the probability for the correct class.

  • Content-Adaptive Optimization Framework for Universal Deep Image Compression

    Koki TSUBOTA  Kiyoharu AIZAWA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2023/10/24
      Vol:
    E107-D No:2
      Page(s):
    201-211

    While deep image compression performs better than traditional codecs like JPEG on natural images, it faces a challenge as a learning-based approach: compression performance drastically decreases for out-of-domain images. To investigate this problem, we introduce a novel task that we call universal deep image compression, which involves compressing images in arbitrary domains, such as natural images, line drawings, and comics. Furthermore, we propose a content-adaptive optimization framework to tackle this task. This framework adapts a pre-trained compression model to each target image during testing to address the domain gap between pre-training and testing. For each input image, we insert adapters into the decoder of the model and optimize the latent representation extracted by the encoder and the adapter parameters in terms of rate-distortion, with the adapter parameters transmitted per image. To evaluate the proposed universal deep image compression, we constructed a benchmark dataset containing uncompressed images of four domains: natural images, line drawings, comics, and vector arts. We compare our proposed method with non-adaptive and existing adaptive compression methods, and the results show that our method outperforms them. Our code and dataset are publicly available at https://github.com/kktsubota/universal-dic.

  • FOREWORD (Open Access)

    Kiyoharu AIZAWA  

     
    FOREWORD

      Vol:
    E90-D No:1
      Page(s):
    75-75