
Keyword Search Result

[Keyword] image generation (13 hits)

1-13 of 13 hits
  • Multiple Layout Design Generation via a GAN-Based Method with Conditional Convolution and Attention

    Xing ZHU  Yuxuan LIU  Lingyu LIANG  Tao WANG  Zuoyong LI  Qiaoming DENG  Yubo LIU  

     
    LETTER-Computer Graphics

    Publicized: 2023/06/12
    Vol: E106-D No:9
    Page(s): 1615-1619

    Recently, many AI-aided layout design systems based on deep learning have been developed to reduce tedious manual intervention. However, most methods focus on a specific generation task. This paper explores the challenging problem of multiple layout design generation (LDG), which generates a floor plan or urban plan from a boundary input under a unified framework. One of the main challenges of multiple LDG is to obtain reasonable topological structures of the generated layout, with irregular boundaries and layout elements, for different types of design. This paper formulates the multiple LDG task as an image-to-image translation problem and proposes a conditional generative adversarial network (GAN), called LDGAN, with adaptive modules. The framework of LDGAN is based on a generator-discriminator architecture, where the generator integrates conditional convolution constrained by the boundary input and an attention module with channel and spatial features. Qualitative and quantitative experiments were conducted on the SCUT-AutoALP and RPLAN datasets, and the comparison with state-of-the-art methods illustrates the effectiveness and superiority of the proposed LDGAN.
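
    The pairing of boundary-conditioned convolution with channel and spatial attention can be sketched as follows; this is a minimal illustration, not the authors' implementation, and the CBAM-style attention design, the module shapes, and all names (ChannelSpatialAttention, ConditionalConvBlock) are hypothetical.

    ```python
    import torch
    import torch.nn as nn

    class ChannelSpatialAttention(nn.Module):
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.channel_mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())
            self.spatial_conv = nn.Sequential(
                nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

        def forward(self, x):
            # Channel attention: weight each feature map by its pooled statistics.
            w = self.channel_mlp(x.mean(dim=(2, 3)))[:, :, None, None]
            x = x * w
            # Spatial attention: weight each location by pooled channel statistics.
            s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
            return x * self.spatial_conv(s)

    class ConditionalConvBlock(nn.Module):
        """Convolution whose features are modulated by the boundary input."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            self.cond = nn.Conv2d(1, out_ch, 3, padding=1)  # boundary mask -> gate
            self.attn = ChannelSpatialAttention(out_ch)

        def forward(self, x, boundary):
            h = self.conv(x) * torch.sigmoid(self.cond(boundary))
            return self.attn(torch.relu(h))

    x = torch.randn(1, 64, 64, 64)        # intermediate generator features
    boundary = torch.rand(1, 1, 64, 64)   # site/floor boundary mask
    print(ConditionalConvBlock(64, 64)(x, boundary).shape)
    ```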

  • Multi-Scale Correspondence Learning for Person Image Generation

    Shi-Long SHEN  Ai-Guo WU  Yong XU  

     
    PAPER-Person Image Generation

    Publicized: 2022/04/15
    Vol: E106-D No:5
    Page(s): 804-812

    A generative model is presented for two types of person image generation. First, the model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to a target pose while preserving the texture of the source image. Second, the model is also used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to a desired clothing texture. The core idea of the proposed model is to establish multi-scale correspondence, which effectively addresses the misalignment introduced by pose transfer and thereby preserves richer appearance information. Specifically, the proposed model consists of two stages: 1) it first generates a target semantic map imposed on the target pose to provide more accurate guidance during the generation process; 2) after obtaining multi-scale feature maps from the encoder, the multi-scale correspondence is established, which is useful for fine-grained generation. Experimental results show that the proposed method is superior to state-of-the-art methods in pose-guided person image generation and demonstrate its effectiveness in clothing-guided person image generation.
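
    The multi-scale correspondence at the heart of the model can be sketched as a soft attention between source and target features at each scale of the encoder pyramid; this is a minimal sketch under assumed shapes, not the paper's exact architecture.

    ```python
    import torch
    import torch.nn.functional as F

    def correspondence_warp(src_feat, tgt_feat):
        """Warp source features to the target pose via soft correspondence."""
        b, c, h, w = src_feat.shape
        s = F.normalize(src_feat.flatten(2), dim=1)   # (B, C, HW)
        t = F.normalize(tgt_feat.flatten(2), dim=1)   # (B, C, HW)
        # Cosine similarity between every target and source location,
        # sharpened by a temperature before the softmax.
        attn = torch.softmax(t.transpose(1, 2) @ s * c ** 0.5, dim=-1)  # (B, HW, HW)
        warped = attn @ src_feat.flatten(2).transpose(1, 2)             # (B, HW, C)
        return warped.transpose(1, 2).reshape(b, c, h, w)

    # Correspondence is established at every scale of the encoder pyramid,
    # so fine scales can refine the coarse alignment.
    src_pyramid = [torch.randn(1, 64, s, s) for s in (8, 16, 32)]
    tgt_pyramid = [torch.randn(1, 64, s, s) for s in (8, 16, 32)]
    warped = [correspondence_warp(s_, t_) for s_, t_ in zip(src_pyramid, tgt_pyramid)]
    print([w.shape for w in warped])
    ```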

  • Face Image Generation of Anime Characters Using an Advanced First Order Motion Model with Facial Landmarks

    Junki OSHIBA  Motoi IWATA  Koichi KISE  

     
    PAPER

    Publicized: 2022/10/12
    Vol: E106-D No:1
    Page(s): 22-30

    Recently, deep learning for guided image generation has been progressing. Many methods have been proposed to generate an animation of facial expression change from a single face image by transferring facial expression information to that image. In particular, methods that use facial landmarks as the facial expression information can generate a wide variety of facial expressions. However, most methods target humans rather than anime characters. Moreover, when we trained several existing methods on an anime character face dataset, they generated images with noise even in regions with no change. The first order motion model (FOMM) is an image generation method that takes two images as input and transfers the facial expression or pose of one to the other. By explicitly computing the difference between the two images based on optical flow, FOMM generates images with little noise in the unchanged regions. Here we focus on the face image generation aspect of FOMM. However, FOMM cannot use a facial landmark image as the facial expression target, because the appearances of a face image and a landmark image are quite different. Therefore, we propose an advanced FOMM that accepts facial landmarks as the facial expression target, changing the input data and data flow accordingly. Additionally, to generate face images whose expressions follow the target landmarks more closely, we introduce a landmark estimation loss, computed by comparing the landmarks detected in the generated image with the target landmarks. Experiments on an anime character face image dataset demonstrate that our method is effective for landmark-guided face image generation for anime characters; it outperformed other methods quantitatively and generated face images with less noise.
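
    The landmark estimation loss can be written directly from the description above: a landmark detector (pretrained and frozen in practice; the toy stand-in below is hypothetical) extracts landmarks from the generated image, and their distance to the target landmarks is penalized.

    ```python
    import torch
    import torch.nn as nn

    def landmark_estimation_loss(generated, target_landmarks, detector):
        """L1 distance between detected and target landmark coordinates.

        generated:        (B, 3, H, W) generated anime face images
        target_landmarks: (B, K, 2)    target landmark coordinates
        detector:         network mapping images to (B, K, 2) landmarks
        """
        detected = detector(generated)
        return torch.mean(torch.abs(detected - target_landmarks))

    # Toy stand-in detector; in practice this would be a landmark network
    # trained on anime faces and frozen while training the generator.
    K = 68
    detector = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, K * 2),
                             nn.Unflatten(1, (K, 2)))
    loss = landmark_estimation_loss(torch.rand(2, 3, 64, 64),
                                    torch.rand(2, K, 2), detector)
    loss.backward()  # in training, gradients would reach the generator via `generated`
    ```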

  • Sketch Face Recognition via Cascaded Transformation Generation Network

    Lin CAO  Xibao HUO  Yanan GUO  Kangning DU  

     
    PAPER-Image

    Publicized: 2021/04/01
    Vol: E104-A No:10
    Page(s): 1403-1415

    Sketch face recognition refers to matching photos with sketches and has been used effectively in applications ranging from law enforcement to digital entertainment. However, due to the large modality gap between photos and sketches, sketch face recognition remains a challenging task. To reduce the domain gap between sketches and photos, this paper proposes a cascaded transformation generation network that performs cross-modality image generation and sketch face recognition simultaneously. The network is composed of a generation module, a cascaded feature transformation module, and a classifier module. The generation module aims to generate a high-quality cross-modality image; the cascaded feature transformation module extracts high-level semantic features for generation and recognition simultaneously; and the classifier module completes the sketch face recognition. The network is trained in an end-to-end manner, and the generated images strengthen recognition accuracy. Recognition performance was verified on the UoM-SGFSv2, e-PRIP, and CUFSF datasets; experimental results show that the proposed method outperforms other state-of-the-art methods.
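
    The three-module layout can be sketched roughly as below, assuming an encoder-decoder generator whose intermediate features also feed the cascaded transformation and classifier; all layer shapes are illustrative, not the authors' configuration.

    ```python
    import torch
    import torch.nn as nn

    class CascadedTransformNet(nn.Module):
        def __init__(self, n_ids):
            super().__init__()
            self.encoder = nn.Sequential(  # sketch -> shared features
                nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())
            self.decoder = nn.Sequential(  # shared features -> synthesized photo
                nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())
            # Cascaded feature transformation: successive blocks refine the
            # shared features into identity-discriminative ones.
            self.transform = nn.Sequential(
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
            self.classifier = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_ids))

        def forward(self, sketch):
            feat = self.encoder(sketch)
            photo = self.decoder(feat)                      # cross-modality generation
            logits = self.classifier(self.transform(feat))  # identity recognition
            return photo, logits

    net = CascadedTransformNet(n_ids=100)
    photo, logits = net(torch.rand(2, 1, 64, 64))
    print(photo.shape, logits.shape)
    ```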

  • Conditional Wasserstein Generative Adversarial Networks for Rebalancing Iris Image Datasets

    Yung-Hui LI  Muhammad Saqlain ASLAM  Latifa Nabila HARFIYA  Ching-Chun CHANG  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2021/06/01
    Vol: E104-D No:9
    Page(s): 1450-1458

    The recent development of deep learning-based generative models has sharply intensified interest in data synthesis and its applications. Data synthesis takes on added importance for pattern recognition tasks in which some classes of data are rare and difficult to collect. In an iris dataset, for instance, the minority-class samples include images of eyes with glasses, oversized or undersized pupils, misaligned iris locations, and irises occluded or contaminated by eyelids, eyelashes, or lighting reflections. Such class-imbalanced datasets often result in biased classification performance. Generative adversarial networks (GANs) are one of the most promising frameworks; they learn to generate synthetic data through a two-player minimax game between a generator and a discriminator. In this paper, we utilize the state-of-the-art conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP) to generate minority-class iris images, saving a large amount of the human labor required to collect rare data. With our model, researchers can generate as many iris images of rare cases as they want, which helps in developing deep learning algorithms whenever a large dataset is needed.
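
    The gradient penalty at the core of CWGAN-GP is standard and can be sketched as follows; the critic takes the conditioning label as a second input, network definitions are omitted, and the penalty weight follows the original WGAN-GP convention.

    ```python
    import torch

    def gradient_penalty(critic, real, fake, labels, device="cpu"):
        """WGAN-GP penalty: push the critic's gradient norm toward 1
        on random interpolates between real and fake samples."""
        eps = torch.rand(real.size(0), 1, 1, 1, device=device)
        interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
        score = critic(interp, labels).sum()
        grad, = torch.autograd.grad(score, interp, create_graph=True)
        return ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

    # Usage inside the critic update (lambda_gp = 10 in the original WGAN-GP):
    # d_loss = critic(fake, y).mean() - critic(real, y).mean() \
    #          + 10.0 * gradient_penalty(critic, real, fake, y)
    ```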

  • A Robust Depth Image Based Rendering Scheme for Stereoscopic View Synthesis with Adaptive Domain Transform Based Filtering Framework

    Wei LIU  Yun Qi TANG  Jian Wei DING  Ming Yue CUI  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2018/08/31
    Vol: E101-D No:12
    Page(s): 3138-3149

    Depth image based rendering (DIBR), which renders virtual views from a color image and the corresponding depth map, is one of the key procedures in the 2D-to-3D conversion process. However, troubling problems such as depth edge misalignment, disocclusions, and resampling cracks still exist in current DIBR systems. To solve these problems, this paper presents a robust depth image based rendering scheme for stereoscopic view synthesis. The core of the proposed scheme is a pair of depth map filters that share a common domain transform based filtering framework. As a first step, one filter of this framework realizes texture-depth boundary alignment and directional disocclusion-reduction smoothing simultaneously. Then, after 3D warping of the depth map, another adaptive filter is applied to the warped depth maps, guided by the delivered scene gradient structures, to further diminish the remaining cracks and noise. Finally, with the optimized depth map of the virtual view, backward texture warping retrieves the final virtual texture view. The proposed scheme yields visually satisfactory results for high-quality 2D-to-3D conversion, and experimental results demonstrate its excellent performance.
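
    The shared domain transform based filtering framework follows Gastal and Oliveira's recursive formulation; a minimal 1D sketch of its edge-aware smoothing, with illustrative sigma values, is shown below.

    ```python
    import numpy as np

    def domain_transform_filter_1d(signal, guide, sigma_s=20.0, sigma_r=0.1):
        """Edge-aware smoothing: the feedback coefficient shrinks where the
        guide signal (e.g., the color image) has strong edges, so smoothing
        stops at texture-depth boundaries."""
        # Domain transform: distance between neighbors grows with guide gradients.
        dt = 1.0 + (sigma_s / sigma_r) * np.abs(np.diff(guide))
        a = np.exp(-np.sqrt(2.0) / sigma_s)
        out = signal.astype(np.float64).copy()
        for i in range(1, len(out)):           # left-to-right pass
            w = a ** dt[i - 1]
            out[i] = (1 - w) * out[i] + w * out[i - 1]
        for i in range(len(out) - 2, -1, -1):  # right-to-left pass
            w = a ** dt[i]
            out[i] = (1 - w) * out[i] + w * out[i + 1]
        return out

    depth = np.r_[np.zeros(50), np.ones(50)] + 0.05 * np.random.randn(100)
    guide = np.r_[np.zeros(50), np.ones(50)]  # sharp edge in the color guide
    print(domain_transform_filter_1d(depth, guide)[48:52])
    ```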

  • Deep Relational Model: A Joint Probabilistic Model with a Hierarchical Structure for Bidirectional Estimation of Image and Labels

    Toru NAKASHIKA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2017/10/25
    Vol: E101-D No:2
    Page(s): 428-436

    Two different types of representations, such as an image and its manually assigned labels, generally have complex and strong relationships to each other. In this paper, we represent such deep relationships between two different types of visible variables using an energy-based probabilistic model, called a deep relational model (DRM), to improve prediction accuracy. A DRM stacks several layers from one visible layer to another, sandwiching several hidden layers between them. As with restricted Boltzmann machines (RBMs) and deep Boltzmann machines (DBMs), all connections (weights) between two adjacent layers are undirected. During maximum likelihood (ML) training, the network attempts to capture the latent complex relationships between the two visible variables with its deep architecture. Unlike deep neural networks (DNNs), 1) the DRM is a fully generative model, 2) it allows us to generate one visible variable given the other, and 3) its parameters can be optimized in a probabilistic manner. The DRM can also be fine-tuned using DNNs, as with deep belief net (DBN) or DBM pre-training. This paper presents experiments conducted to evaluate the performance of a DRM in image recognition and generation tasks on the MNIST dataset. In the image recognition experiments, the DRM outperformed DNNs even without fine-tuning. In the image generation experiments, the images generated by the DRM were much more realistic than those from the other generative models.
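
    A rough sketch of this bidirectional use, with a single hidden layer sandwiched between an image layer and a label layer and mean-field-style updates, might look as follows; the weights are random stand-ins, and real ML training would use sampling-based gradients rather than the fixed weights shown here.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    D_v, D_h, D_y = 784, 256, 10            # image, hidden, label sizes
    W = rng.normal(0, 0.01, (D_v, D_h))     # image <-> hidden (undirected)
    U = rng.normal(0, 0.01, (D_h, D_y))     # hidden <-> label (undirected)

    def infer_labels(v, steps=20):
        """Estimate labels from an image by alternating mean-field updates."""
        y = np.full(D_y, 1.0 / D_y)
        for _ in range(steps):
            h = sigmoid(v @ W + y @ U.T)    # hidden given both visible layers
            e = np.exp(h @ U)               # label given hidden (softmax)
            y = e / e.sum()
        return y

    def generate_image(y, steps=20):
        """Generate an image given labels: same model, opposite direction."""
        v = rng.random(D_v)
        for _ in range(steps):
            h = sigmoid(v @ W + y @ U.T)
            v = sigmoid(h @ W.T)            # image given hidden
        return v

    v = rng.random(D_v)                     # stand-in for an MNIST image
    print(infer_labels(v).round(3))
    ```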

  • Video Encoding Scheme Employing Intra and Inter Prediction Based on Averaged Template Matching Predictors

    Yoshinori SUZUKI  Choong Seng BOON  Thiow Keng TAN  

     
    PAPER-Image Processing and Video Processing

    Vol: E91-D No:4
    Page(s): 1127-1134

    In video compression, the information transmitted from the encoder to the decoder can be classified into two categories: side information, which carries instructions for actions to be performed, and data, such as the residual error of the texture. As video compression technology has matured, better compression has been achieved by increasing the ratio of side information to data while reducing the overall bit rate. However, there is a limit to this approach, because the side information itself becomes a significant fraction of the overall bit rate. In recent video compression technologies, the decoder tends to share the burden of decision making in order to achieve a higher compression ratio. To further improve coding efficiency, we give the decoder a more active role in reducing the amount of data. In this approach, the reconstructed pixels surrounding a target block are used to produce a better sample predictor of that block, which reduces both the side information and the residual error of the texture. Furthermore, multiple candidate sample predictors are combined to create a better predictor without increasing the amount of side information. In this paper, we apply this template matching method to a conventional video codec to improve the prediction performance of intra, inter, and bi-directional pictures. The results show improvements in coding efficiency of up to 5.8%.
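
    A minimal sketch of the averaged template matching predictor is shown below: the decoder matches the L-shaped template of already-reconstructed pixels around the target block against a causal search region and averages the best candidates, so no side information is needed. Window sizes, the search range, and the candidate count are illustrative, not the paper's settings.

    ```python
    import numpy as np

    def template_match_predict(recon, top, left, bs=4, tw=2, search=8, n_cand=4):
        """Predict block recon[top:top+bs, left:left+bs] from its L-shaped
        template of already-reconstructed pixels."""
        def template(img, t, l):  # pixels above and to the left of the block
            return np.r_[img[t - tw:t, l - tw:l + bs].ravel(),
                         img[t:t + bs, l - tw:l].ravel()]
        target = template(recon, top, left)
        scores = []
        for dt in range(-search, search + 1):
            for dl in range(-search, search + 1):
                t, l = top + dt, left + dl
                # Candidates must lie entirely in the causal (decoded) area.
                if tw <= t <= top - bs and tw <= l and l + bs <= recon.shape[1]:
                    cost = np.sum((template(recon, t, l) - target) ** 2)
                    scores.append((cost, t, l))
        scores.sort()
        # Averaging several candidates yields a better predictor at no extra cost.
        cands = [recon[t:t + bs, l:l + bs] for _, t, l in scores[:n_cand]]
        return np.mean(cands, axis=0)

    recon = np.random.rand(64, 64)  # previously reconstructed pixels
    print(template_match_predict(recon, 32, 32).shape)
    ```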

  • Free Iris and Focus Image Generation by Merging Multiple Differently Focused Images Based on a Three-Dimensional Filtering

    Kazuya KODAMA  Akira KUBOTA  

     
    PAPER

    Vol: E90-D No:1
    Page(s): 191-198

    This paper describes a method of free iris and focus image generation based on a transformation that integrates multiple differently focused images. First, we assume that objects are defocused according to a geometrical blurring model, so the images acquired on the imaging planes are related to the spatial information of the objects by a convolution with a three-dimensional blur. Then, based on a spatial frequency analysis of the blur, we design three-dimensional filters that generate free iris and focus images from the acquired images. The method enables us to generate not only an all-in-focus image, corresponding to an ideal pinhole iris, but also various images that would be acquired with virtual irises of sizes different from the original one. To generate such an image from multiple differently focused images, especially very many images, conventional methods usually analyze the focused regions of each acquired image independently and construct a depth map; based on the map, the regions are merged into the desired image. However, it is generally so difficult to perform such depth estimation robustly in all regions that these methods cannot prevent the merged results from containing visible artifacts, which severely degrade the quality of the generated images. In this paper, we propose a method of generating the desired images directly and robustly from very many differently focused images without depth estimation. Simulations of image generation are performed on synthetic images to study how the parameters of the blur and the filter affect the quality of the generated images. We also introduce pre-processing that corrects the size of the acquired images and a simple method for estimating the parameter of the three-dimensional blur. Finally, we show experimental results of free iris and focus image generation from real images.
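
    A frequency-domain sketch of the idea is shown below, with Gaussian per-slice blurs and a Wiener-style combination standing in for the paper's three-dimensional filter design; all function names and parameters are illustrative.

    ```python
    import numpy as np

    def virtual_iris_image(stack, sigmas, virtual_sigma, eps=1e-2):
        """Merge a focal stack into one image with a chosen virtual iris.

        stack:  (N, H, W) differently focused images
        sigmas: per-slice blur radii of some reference plane
        virtual_sigma: blur radius the virtual iris would produce there
        """
        n, h, w = stack.shape
        fy = np.fft.fftfreq(h)[:, None]
        fx = np.fft.fftfreq(w)[None, :]
        r2 = fy ** 2 + fx ** 2
        target = np.exp(-2 * (np.pi * virtual_sigma) ** 2 * r2)
        num = np.zeros((h, w), dtype=complex)
        den = np.full((h, w), eps)            # regularizer against division by ~0
        for img, s in zip(stack, sigmas):
            otf = np.exp(-2 * (np.pi * s) ** 2 * r2)  # Gaussian OTF of the slice
            num += otf * np.fft.fft2(img)     # matched-filter style combination
            den += otf ** 2
        return np.real(np.fft.ifft2(target * num / den))

    stack = np.random.rand(5, 64, 64)         # stand-in focal stack
    out = virtual_iris_image(stack, sigmas=[0.5, 1, 2, 4, 8], virtual_sigma=0.0)
    print(out.shape)
    ```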

  • Viewpoint Vector Rendering for Efficient Elemental Image Generation

    Kyoung Shin PARK  Sung-Wook MIN  Yongjoo CHO  

     
    PAPER

    Vol: E90-D No:1
    Page(s): 233-241

    This paper presents a fast elemental image generation algorithm, called Viewpoint Vector Rendering (VVR), for computer-generated integral imaging systems. VVR produces a set of elemental images in real time by assembling segmented areas of the directional scenes taken from a range of viewpoints. The algorithm is little affected by system factors such as the number of elemental lenses and the number of polygons, and it supports all display modes of the integral imaging system: real, virtual, and focused. This paper first describes the characteristics of the integral imaging system and then discusses the design, implementation, and performance evaluation of the VVR algorithm, which can easily be adapted to render integral images of complex 3D objects.
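
    Under the simplifying assumptions of a square lens array and one-pixel segments, the assembly step reduces to interleaving the directional scenes, as sketched below (not the paper's implementation).

    ```python
    import numpy as np

    def assemble_elemental_images(directional):
        """directional: (V, V, H, W) scenes rendered from a VxV viewpoint grid,
        with one pixel per lens; returns the (V*H, V*W) elemental image array."""
        v, _, h, w = directional.shape
        out = np.zeros((v * h, v * w))
        for i in range(v):          # viewpoint row
            for j in range(v):      # viewpoint column
                # Pixel (y, x) of directional scene (i, j) becomes pixel (i, j)
                # inside the elemental image under lens (y, x).
                out[i::v, j::v] = directional[i, j]
        return out

    directional = np.random.rand(4, 4, 16, 16)       # 4x4 viewpoints, 16x16 lenses
    print(assemble_elemental_images(directional).shape)  # (64, 64)
    ```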

  • Data Hiding under Fractal Image Generation via Fourier Filtering Method

    Shuichi TAKANO  Kiyoshi TANAKA  Tatsuo SUGIMURA  

     
    PAPER-Image Processing, Image Pattern Recognition

    Vol: E84-D No:1
    Page(s): 171-178

    This paper presents a new data hiding scheme that operates during fractal image generation via the Fourier filtering method, for computer graphics (CG) applications. The data hiding operations are performed in the frequency domain, and a method similar to the QAM used in digital communication is introduced for efficient embedding, exploiting both phase and amplitude components simultaneously. Consequently, the scheme not only generates a natural terrain surface without loss of fractalness, analogous to the conventional scheme, but also embeds larger amounts of data in an image, depending on the fractal dimension. The scheme ensures correct decoding of the embedded data under lossy compression such as JPEG by controlling the quantization exponent used in the embedding process.
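
    A minimal sketch of QAM-style embedding in the Fourier domain is shown below: each selected coefficient carries two bits in its phase and one in its amplitude, and conjugate symmetry keeps the image real. The coefficient selection, amplitude scale, and level counts are illustrative, not the paper's design.

    ```python
    import numpy as np

    def embed_qam(image, bits, radius=10, phase_levels=4, amp_levels=2):
        """Embed 3 bits per selected coefficient: 2 in phase, 1 in amplitude."""
        f = np.fft.fftshift(np.fft.fft2(image))
        h, w = image.shape
        cy, cx = h // 2, w // 2
        k = 0
        for dy in range(-radius, radius + 1):
            for dx in range(1, radius + 1):       # half-plane; mirror the rest
                if k + 3 > len(bits):
                    break
                p = bits[k] * 2 + bits[k + 1]     # 2 phase bits -> quadrant
                a = bits[k + 2]                   # 1 amplitude bit -> level
                phase = (p + 0.5) * 2 * np.pi / phase_levels
                amp = 50.0 * (1 + a) / amp_levels # quantized amplitude
                f[cy + dy, cx + dx] = amp * np.exp(1j * phase)
                # Conjugate-symmetric partner keeps the spatial image real.
                f[cy - dy, cx - dx] = np.conj(f[cy + dy, cx + dx])
                k += 3
        return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

    img = np.random.rand(64, 64)                  # stand-in fractal terrain
    bits = np.random.randint(0, 2, 300)
    print(embed_qam(img, bits).shape)
    ```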

  • High-Level Synthesis of a Multithreaded Processor for Image Generation

    Takao ONOYE  Toshihiro MASAKI  Isao SHIRAKAWA  Hiroaki HIRATA  Kozo KIMURA  Shigeo ASAHARA  Takayuki SAGISHIMA  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E78-A No:3
    Page(s): 322-330

    This paper describes the design procedure of a multithreaded processor dedicated to image generation, carried out by means of the high-level synthesis tool PARTHENON. The processor employs a multithreaded architecture, a novel and promising approach to parallel image generation. Special stress is put on the high-level synthesis scheme, which simplifies the behavioral description of the structure and control of complex hardware and therefore enables the design of the complicated mechanisms of a multithreaded processor. Implementation results of the synthesis are also shown to demonstrate the performance of the designed processor, which greatly improves the throughput of image generation over the conventional approach.

  • A High Speed Contour Fill Method for Character Image Generation

    Kazuki NAKASHIMA  Masashi KOGA  Katsumi MARUKAWA  Yoshihiro SHIMA  Yasuaki NAKANO  

     
    PAPER

    Vol: E77-D No:7
    Page(s): 832-838

    This paper proposes a new, high-speed method of filling in the contours of alphanumeric characters to produce correct binary image patterns. We call it the improved edge-fill method because it improves on a previously developed edge-fill method. Ambiguities of the conventional edge-fill method on binary images are eliminated by selecting fill pixels from combinations of Freeman chain codes, which express the contour lines. Consequently, the areas inside the contour lines are filled rapidly and correctly. With the new method, the processing time for character image generation is reduced by ten to twenty percent compared with the conventional method. The effectiveness of the new method is examined in experiments using both Arabic numerals and letters from the Roman alphabet. The results show that the fill method produces correct image patterns and that it can be applied to contour filling for alphanumeric characters.
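
    A simplified sketch of parity filling driven by Freeman chain codes is shown below: a contour pixel toggles the scanline parity only when the chain passes through it monotonically in y, which resolves the ambiguity at horizontal runs and local extrema. This is a simplification of the paper's improved edge fill, and the helper names are hypothetical.

    ```python
    import numpy as np

    MOVES = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
             (0, -1), (1, -1), (1, 0), (1, 1)]  # Freeman dirs 0..7 as (dy, dx)

    def chain_fill(start, chain, shape):
        # Rasterize the closed contour from the chain code.
        pts = [start]
        for c in chain:
            dy, dx = MOVES[c]
            pts.append((pts[-1][0] + dy, pts[-1][1] + dx))
        pts = pts[:-1]                        # closed contour: last == first
        n = len(pts)
        dys = [MOVES[c][0] for c in chain]
        def next_nonzero(i, step):            # skip horizontal moves
            j = i
            while dys[j % n] == 0:
                j += step
            return dys[j % n]
        toggles = []
        for i, (y, x) in enumerate(pts):
            dy_in = next_nonzero(i - 1, -1)   # vertical sense entering pixel i
            dy_out = next_nonzero(i, +1)      # vertical sense leaving pixel i
            if dy_in == dy_out:               # monotone crossing, not a tangent
                toggles.append((y, x))
        img = np.zeros(shape, dtype=np.uint8)
        rows = {}
        for y, x in toggles:
            rows.setdefault(y, []).append(x)
        for y, xs in rows.items():
            xs.sort()
            for x0, x1 in zip(xs[::2], xs[1::2]):
                img[y, x0:x1 + 1] = 1         # fill between crossing pairs
        for y, x in pts:
            img[y, x] = 1                     # include the contour itself
        return img

    # A small diamond-shaped contour: start at the top vertex, go clockwise.
    print(chain_fill((1, 4), [7, 7, 5, 5, 3, 3, 1, 1], (8, 9)))
    ```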