
Keyword Search Result

[Keyword] segmentation (284 hits)

1-20 hits (of 284 hits)

  • Aggregated to Pipelined Structure Based Streaming SSN for 1-ms Superpixel Segmentation System in Factory Automation Open Access

    Yuan LI  Tingting HU  Ryuji FUCHIKAMI  Takeshi IKENAGA  

     
    PAPER-Computer System

    Publicized: 2024/07/23, Vol: E107-D No:11, Page(s): 1396-1407

    1 millisecond (1-ms) vision systems are gaining increasing attention in diverse fields such as factory automation and robotics, as their ultra-low delay ensures seamless and timely responses. Superpixel segmentation is a pivotal preprocessing step that reduces the number of image primitives for subsequent processing. Recently, there has been a growing emphasis on deep network-based algorithms, which offer superior performance and better integration with other deep network tasks. The Superpixel Sampling Network (SSN) employs a deep network for feature generation and differentiable SLIC for superpixel generation, achieving high performance with a small number of parameters. However, implementing SSN on FPGAs for ultra-low delay is challenging because the final layer aggregates intermediate results. To address this limitation, this paper proposes an aggregated-to-pipelined structure for FPGA implementation. The final layer is decomposed into individual final layers, one for each intermediate result. This architectural adjustment eliminates the need for memory to store intermediate results. Concurrently, the decomposed layers enable a pipelined structure with pixel-streaming input to achieve ultra-low latency. To cooperate with the pipelined structure, a layer-partitioned memory architecture is proposed: each final layer has dedicated memory for storing superpixel center information, allowing values to be read and computed without conflicts. The calculation results of each final layer are accumulated, and the result for each pixel is obtained as the stream reaches the last layer. Evaluation results demonstrate that boundary recall and under-segmentation error remain comparable to SSN, with an average label consistency improvement of 0.035 over SSN. From a hardware performance perspective, the proposed system processes images at 1000 FPS with a delay of 0.947 ms per frame.
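
    A minimal NumPy sketch (not the authors' FPGA design) of the decomposition described above: a final layer that aggregates K intermediate feature maps is split into K per-intermediate final layers whose outputs are simply accumulated, so no buffer for the concatenated intermediates is needed. The 1x1-convolution form of the final layer and all shapes here are illustrative assumptions.

```python
# Demonstrates that an aggregating final layer equals a sum of decomposed final layers.
import numpy as np

rng = np.random.default_rng(0)
K, C, H, W, C_out = 4, 8, 16, 16, 5           # K intermediate maps of C channels each

intermediates = [rng.standard_normal((C, H, W)) for _ in range(K)]
weight = rng.standard_normal((C_out, K * C))  # final 1x1 conv over concatenated maps

# Aggregated form: concatenate all intermediate results, apply the final layer once.
concat = np.concatenate(intermediates, axis=0).reshape(K * C, -1)
aggregated = (weight @ concat).reshape(C_out, H, W)

# Decomposed form: one "final layer" per intermediate result, outputs accumulated.
decomposed = np.zeros((C_out, H, W))
for k, feat in enumerate(intermediates):
    w_k = weight[:, k * C:(k + 1) * C]        # weight slice belonging to this branch
    decomposed += (w_k @ feat.reshape(C, -1)).reshape(C_out, H, W)

print(np.allclose(aggregated, decomposed))    # True: both forms agree
```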

  • BiConvNet: Integrating Spatial Details and Deep Semantic Features in a Bilateral-Branch Image Segmentation Network Open Access

    Zhigang WU  Yaohui ZHU  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2024/07/16, Vol: E107-D No:11, Page(s): 1385-1395

    This article focuses on improving the BiSeNet v2 bilateral-branch image segmentation network, enhancing its ability to learn spatial details and its overall segmentation accuracy. A modified network called BiConvNet is proposed. First, to extract shallow spatial details more effectively, a parallel concatenated strip and dilated (PCSD) convolution module is proposed and used to extract local features and surrounding contextual features in the detail branch. Next, the semantic branch is reconstructed using lightweight depthwise separable convolutions and the high performance of ConvNet, enabling more efficient learning of deep semantic features. Finally, the bilateral guidance aggregation layer of BiSeNet v2 is fine-tuned to better fuse the feature maps output by the detail branch and the semantic branch. The experimental section discusses the contributions of strip convolution and dilated convolutions of different sizes to segmentation accuracy, and compares them with common convolutions such as Conv2d, CG convolution and CCA convolution. The experiments show that the proposed PCSD convolution module achieves the highest segmentation accuracy across all categories of the Cityscapes dataset among the compared convolutions. BiConvNet achieves a 9.39% accuracy improvement over BiSeNet v2 with only a slight increase of 1.18M in model parameters, reaching an mIoU of 68.75% on the validation set. Furthermore, comparative experiments with autonomous-driving image segmentation algorithms commonly used in recent years show that BiConvNet is strongly competitive in segmentation accuracy on the Cityscapes and BDD100K datasets.
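
    A hedged PyTorch sketch of a "parallel concatenated strip and dilated" (PCSD) style block, based only on the abstract: strip convolutions (1xk and kx1) for local detail and dilated convolutions for surrounding context, run in parallel and concatenated. Kernel sizes, dilation rates, and channel splits are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class PCSDBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 7, dilations=(2, 4)):
        super().__init__()
        branch_ch = out_ch // (2 + len(dilations))
        # Strip branches: a 1xk and a kx1 convolution capture thin, elongated detail.
        self.strip_h = nn.Conv2d(in_ch, branch_ch, (1, k), padding=(0, k // 2))
        self.strip_v = nn.Conv2d(in_ch, branch_ch, (k, 1), padding=(k // 2, 0))
        # Dilated branches enlarge the receptive field for surrounding context.
        self.dilated = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 3, padding=d, dilation=d) for d in dilations
        )
        fused_ch = branch_ch * (2 + len(dilations))
        self.fuse = nn.Sequential(
            nn.Conv2d(fused_ch, out_ch, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)
        )

    def forward(self, x):
        feats = [self.strip_h(x), self.strip_v(x)] + [conv(x) for conv in self.dilated]
        return self.fuse(torch.cat(feats, dim=1))

# Example: a 3-channel image through the block keeps the spatial size.
y = PCSDBlock(3, 64)(torch.randn(1, 3, 128, 256))
print(y.shape)  # torch.Size([1, 64, 128, 256])
```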

  • Pool-Unet: A Novel Tongue Image Segmentation Method Based on Pool-Former and Multi-Task Mask Learning Open Access

    Xiangrun LI  Qiyu SHENG  Guangda ZHOU  Jialong WEI  Yanmin SHI  Zhen ZHAO  Yongwei LI  Xingfeng LI  Yang LIU  

     
    PAPER-Image

    Publicized: 2024/05/29, Vol: E107-A No:10, Page(s): 1609-1620

    Automated tongue segmentation plays a crucial role in computer-aided tongue diagnosis. The challenge lies in developing algorithms that achieve higher segmentation accuracy while keeping a small memory footprint and swift inference. To address this issue, we propose Pool-Unet, a novel model integrating Pool-former and multi-task mask learning for tongue image segmentation. First, we collected 756 tongue images taken in various shooting environments and from different angles and accurately labeled the tongue under the guidance of a medical professional. Second, we propose the Pool-Unet model, combining a hierarchical Pool-former module with a U-shaped symmetric encoder-decoder with skip connections, which uses a patch expanding layer for up-sampling and a patch embedding layer for down-sampling to maintain spatial resolution, effectively capturing global and local information with fewer parameters and faster inference. Finally, a multi-task mask learning strategy is designed, which improves the generalization and anti-interference ability of the model through multi-task pre-training and self-supervised fine-tuning stages. Experimental results on the tongue dataset show that, compared to the state-of-the-art method (OET-NET), our method has 25% fewer model parameters, achieves 22% faster inference, and improves Mean Intersection over Union (MIOU) and Mean Pixel Accuracy (MPA) by 0.91% and 0.55%, respectively.
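
    A minimal PyTorch sketch of a PoolFormer-style block, the kind of hierarchical Pool-former module the abstract builds Pool-Unet around: the attention token mixer is replaced by simple average pooling, which is what keeps the parameter count and inference cost low. The normalization choice and MLP ratio are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class PoolFormerBlock(nn.Module):
    """Pooling replaces attention as the token mixer; only the channel MLP has weights."""
    def __init__(self, dim: int, pool_size: int = 3, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2,
                                 count_include_pad=False)
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1), nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):
        y = self.norm1(x)
        x = x + (self.pool(y) - y)        # parameter-free token mixing
        x = x + self.mlp(self.norm2(x))   # channel MLP
        return x

print(PoolFormerBlock(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```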

  • Joint 2D and 3D Semantic Segmentation with Consistent Instance Semantic Open Access

    Yingcai WAN  Lijin FANG  

     
    PAPER-Image

    Publicized: 2023/12/15, Vol: E107-A No:8, Page(s): 1309-1318

    2D and 3D semantic segmentation play important roles in robotic scene understanding. However, current 3D semantic segmentation heavily relies on 3D point clouds, which are susceptible to factors such as point cloud noise, sparsity, estimation and reconstruction errors, and data imbalance. In this paper, a novel approach is proposed to enhance 3D semantic segmentation by incorporating 2D semantic segmentation from RGB-D sequences. Firstly, the RGB-D pairs are consistently segmented into 2D semantic maps using the tracking pipeline of Simultaneous Localization and Mapping (SLAM). This process effectively propagates object labels from full scans to corresponding labels in partial views with high probability. Subsequently, a novel Semantic Projection (SP) block is introduced, which integrates features extracted from localized 2D fragments across different camera viewpoints into their corresponding 3D semantic features. Lastly, the 3D semantic segmentation network utilizes a combination of 2D-3D fusion features to facilitate a merged semantic segmentation process for both 2D and 3D. Extensive experiments conducted on public datasets demonstrate the effective performance of the proposed 2D-assisted 3D semantic segmentation method.

  • CyCSNet: Learning Cycle-Consistency of Semantics for Weakly-Supervised Semantic Segmentation Open Access

    Zhikui DUAN  Xinmei YU  Yi DING  

     
    PAPER-Computer Graphics

    Publicized: 2023/12/11, Vol: E107-A No:8, Page(s): 1328-1337

    Existing weakly-supervised segmentation approaches based on image-level annotations may focus on the most activated region in the image and tend to identify only part of the target object. Intuitively, high-level semantics among objects of the same category in different images could help to recognize corresponding activated regions of the query. In this study, a scheme called Cycle-Consistency of Semantics Network (CyCSNet) is proposed, which can enhance the activation of the potential inactive regions of the target object by utilizing the cycle-consistent semantics from images of the same category in the training set. Moreover, a Dynamic Correlation Feature Selection (DCFS) algorithm is derived to reduce the noise from pixel-wise samples of low relevance for better training. Experiments on the PASCAL VOC 2012 dataset show that the proposed CyCSNet achieves competitive results compared with state-of-the-art weakly-supervised segmentation approaches.

  • A CNN-Based Feature Pyramid Segmentation Strategy for Acoustic Scene Classification Open Access

    Ji XI  Yue XIE  Pengxu JIANG  Wei JIANG  

     
    LETTER-Speech and Hearing

    Publicized: 2024/03/26, Vol: E107-D No:8, Page(s): 1093-1096

    Currently, a significant portion of acoustic scene classification (ASC) research is centered on Convolutional Neural Network (CNN) models, primarily because CNNs can effectively extract time-frequency information from audio recordings of scenes when spectral data are used as input. 2D spectral features can express information along many dimensions. However, because spectral properties differ from image properties, the same pattern appearing at different positions on the spectrogram can carry different meanings. If a CNN-based ASC network does not distinguish between these different aspects of the input, system performance may decline. Considering this, a CNN-based feature pyramid segmentation (FPS) approach is proposed. The approach takes spectral features as the model input, splits them according to a preset scale, and feeds each segment-level feature into the CNN for learning. The high-level features from all scales are then fused and passed to a softmax classifier to categorize the different scenes. The experiments provide evidence of the efficacy of the FPS strategy and its potential to enhance ASC system performance.
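
    A hedged PyTorch sketch of the feature-pyramid-segmentation idea in the abstract: the input spectrogram is split at a preset scale, each segment is encoded by the CNN, and the segment-level features are fused before the softmax classifier. The shared-encoder choice, the frequency-axis split, and fusion by concatenation are assumptions.

```python
import torch
import torch.nn as nn

class FPSClassifier(nn.Module):
    def __init__(self, n_classes: int, n_segments: int = 4, feat_dim: int = 32):
        super().__init__()
        self.n_segments = n_segments
        self.encoder = nn.Sequential(                       # small CNN applied per segment
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(feat_dim * n_segments, n_classes)

    def forward(self, spec):                                # spec: (B, 1, freq, time)
        segments = torch.chunk(spec, self.n_segments, dim=2)    # split along frequency
        feats = [self.encoder(s) for s in segments]             # segment-level features
        return self.classifier(torch.cat(feats, dim=1))         # fuse, then classify

logits = FPSClassifier(n_classes=10)(torch.randn(2, 1, 128, 431))
print(logits.shape)  # torch.Size([2, 10])
```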

  • A Retinal Vessel Segmentation Network Fusing Cross-Modal Features Open Access

    Xiaosheng YU  Jianning CHI  Ming XU  

     
    LETTER-Image

    Publicized: 2023/11/01, Vol: E107-A No:7, Page(s): 1071-1075

    Accurate segmentation of fundus vessel structure can effectively assist doctors in diagnosing eye diseases. In this paper, we propose a fundus blood vessel segmentation network combined with cross-modal features and verify our method on the public data set OCTA-500. Experimental results show that our method has high accuracy and robustness.

  • Amodal Instance Segmentation of Thin Objects with Large Overlaps by Seed-to-Mask Extending Open Access

    Ryohei KANKE  Masanobu TAKAHASHI  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2024/02/29, Vol: E107-D No:7, Page(s): 908-911

    Amodal Instance Segmentation (AIS) aims to segment the regions of both visible and invisible parts of overlapping objects. The mainstream Mask R-CNN-based methods are unsuitable for thin objects with large overlaps because their object proposals rely on bounding boxes, for three reasons. First, capturing the entire shapes of overlapping thin objects is difficult. Second, the bounding boxes of close objects are almost identical. Third, a bounding box contains many objects in most cases. In this paper, we propose a box-free AIS method, Seed-to-Mask, for thin objects with large overlaps. The method specifies a target object using a seed and iteratively extends the segmented region. We have achieved better performance in experiments on artificial data consisting only of thin objects.
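
    A hedged illustration of the seed-to-mask idea: segmentation starts from a seed on the target object and the region is extended iteratively. The intensity-threshold growing rule below is a classical stand-in; the paper's extension step is learned, so this only illustrates the box-free, iterative nature of the approach.

```python
import numpy as np
from collections import deque

def grow_from_seed(image: np.ndarray, seed: tuple, tol: float = 0.1) -> np.ndarray:
    """image: 2D float array; seed: (row, col). Returns a boolean mask."""
    mask = np.zeros(image.shape, dtype=bool)
    ref = image[seed]
    frontier = deque([seed])
    mask[seed] = True
    while frontier:                                   # iterative extension from the seed
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
            if inside and not mask[nr, nc] and abs(image[nr, nc] - ref) < tol:
                mask[nr, nc] = True
                frontier.append((nr, nc))
    return mask

img = np.zeros((8, 8))
img[2, 1:7] = 1.0                                     # a thin horizontal object
print(grow_from_seed(img, seed=(2, 3)).sum())         # 6 pixels recovered from one seed
```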

  • Fresh Tea Sprouts Segmentation via Capsule Network Open Access

    Chunhua QIAN  Xiaoyan QIN  Hequn QIANG  Changyou QIN  Minyang LI  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2024/01/17, Vol: E107-D No:5, Page(s): 728-731

    The segmentation performance of fresh tea sprouts is inadequate due to the uncontrollable posture. A novel method for Fresh Tea Sprouts Segmentation based on Capsule Network (FTS-SegCaps) is proposed in this paper. The spatial relationship between local parts and whole tea sprout is retained and effectively utilized by a deep encoder-decoder capsule network, which can reduce the effect of tea sprouts with uncontrollable posture. Meanwhile, a patch-based local dynamic routing algorithm is also proposed to solve the parameter explosion problem. The experimental results indicate that the segmented tea sprouts via FTS-SegCaps are almost coincident with the ground truth, and also show that the proposed method has a better performance than the state-of-the-art methods.

  • Single-Line Text Detection in Multi-Line Text with Narrow Spacing for Line-Based Character Recognition

    Chee Siang LEOW  Hideaki YAJIMA  Tomoki KITAGAWA  Hiromitsu NISHIZAKI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2023/08/31, Vol: E106-D No:12, Page(s): 2097-2106

    Text detection is a crucial pre-processing step in optical character recognition (OCR) for the accurate recognition of text, including both fonts and handwritten characters, in documents. While current deep learning-based text detection tools can detect text regions with high accuracy, they often treat multiple lines of text as a single region. To perform line-based character recognition, it is necessary to divide the text into individual lines, which requires a line detection technique. This paper focuses on the development of a new approach to single-line detection in OCR that is based on the existing Character Region Awareness For Text detection (CRAFT) model and incorporates a deep neural network specialized in line segmentation. However, this new method may still detect multiple lines as a single text region when multi-line text with narrow spacing is present. To address this, we also introduce a post-processing algorithm to detect single text regions using the output of the single-line segmentation. Our proposed method successfully detects single lines, even in multi-line text with narrow line spacing, and hence improves the accuracy of OCR.
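
    A hedged sketch of post-processing that splits a detected multi-line text region into single lines. The paper combines CRAFT with a line-segmentation network and its own post-processing algorithm; the horizontal projection-profile split below is a common, simpler stand-in, shown only to illustrate the kind of step involved.

```python
import numpy as np

def split_lines(binary_region: np.ndarray, min_gap: int = 2):
    """binary_region: 2D array, 1 = text pixel. Returns (top, bottom) row spans of lines."""
    profile = binary_region.sum(axis=1)           # amount of text per row
    is_text = profile > 0
    spans, start = [], None
    for row, flag in enumerate(is_text):
        if flag and start is None:
            start = row                           # a line begins
        elif not flag and start is not None:
            spans.append((start, row))            # a line ends at a blank row
            start = None
    if start is not None:
        spans.append((start, len(is_text)))
    # Merge spans separated by gaps smaller than min_gap (noise between strokes).
    merged = [spans[0]] if spans else []
    for top, bottom in spans[1:]:
        if top - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], bottom)
        else:
            merged.append((top, bottom))
    return merged

region = np.zeros((20, 50), dtype=int)
region[2:8, 5:45] = 1                             # first text line
region[11:17, 5:45] = 1                           # second text line
print(split_lines(region))                        # [(2, 8), (11, 17)]
```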

  • Implementing Region-Based Segmentation for Hardware Trojan Detection in FPGAs Cell-Level Netlist

    Ann Jelyn TIEMPO  Yong-Jin JEONG  

     
    LETTER-Dependable Computing

    Publicized: 2023/07/28, Vol: E106-D No:11, Page(s): 1926-1929

    Field Programmable Gate Arrays (FPGAs) are gaining popularity because of their reconfigurability, which also brings security concerns such as hardware trojan insertion. Various detection methods have been proposed to overcome this threat, but most target the ASIC supply chain and cannot be applied directly to FPGA applications. In this paper, the authors implement a structural-feature-based method for detecting hardware trojans in a cell-level netlist, which is not yet well explored: the nets are segmented into smaller groups based on their interconnection and further analyzed by looking at their structural similarities. Experiments show positive performance, with an average detection rate of 95.41%, an average false alarm rate of 2.87%, and an average accuracy of 96.27%.
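
    A hedged Python sketch of the region-segmentation idea in the abstract: nets of a cell-level netlist are grouped into smaller regions based on their interconnection, here by expanding outward from each fanout point. The netlist representation, the fanout threshold, and the fixed-depth expansion are illustrative assumptions, not the authors' exact procedure.

```python
from collections import deque

def segment_regions(net_graph: dict, fanout_threshold: int = 2, depth: int = 2):
    """net_graph maps a net name to the list of nets it drives."""
    regions = {}
    for net, sinks in net_graph.items():
        if len(sinks) < fanout_threshold:         # only fanout points seed regions
            continue
        region, frontier = {net}, deque([(net, 0)])
        while frontier:                           # bounded breadth-first expansion
            current, d = frontier.popleft()
            if d == depth:
                continue
            for nxt in net_graph.get(current, []):
                if nxt not in region:
                    region.add(nxt)
                    frontier.append((nxt, d + 1))
        regions[net] = sorted(region)
    return regions

netlist = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n5", "n6"], "n5": ["n7"]}
for seed, nets in segment_regions(netlist).items():
    print(seed, "->", nets)
# n1 -> ['n1', 'n2', 'n3', 'n4', 'n5', 'n6']
# n3 -> ['n3', 'n5', 'n6', 'n7']
```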

  • Inverse Heat Dissipation Model for Medical Image Segmentation

    Yu KASHIHARA  Takashi MATSUBARA  

     
    LETTER-Artificial Intelligence, Data Mining

    Publicized: 2023/08/22, Vol: E106-D No:11, Page(s): 1930-1934

    The diffusion model has achieved success in generating and editing high-quality images because of its ability to produce fine details. Its superior generation ability has the potential to facilitate more detailed segmentation. This study presents a novel approach to segmentation tasks using an inverse heat dissipation model, a kind of diffusion-based model. The proposed method involves generating a mask that gradually shrinks to fit the shape of the desired segmentation region. We comprehensively evaluated the proposed method on multiple datasets under varying conditions. The results show that the proposed method outperforms existing methods and provides more detailed segmentation.

  • Fusion-Based Edge and Color Recovery Using Weighted Near-Infrared Image and Color Transmission Maps for Robust Haze Removal

    Onhi KATO  Akira KUBOTA  

     
    PAPER

    Publicized: 2023/05/23, Vol: E106-D No:10, Page(s): 1661-1672

    Various haze removal methods based on the atmospheric scattering model have been presented in recent years. Most methods have targeted strong haze images where light is scattered equally in all color channels. This paper presents a haze removal method using near-infrared (NIR) images for relatively weak haze images. In order to recover the lost edges, the presented method first extracts edges from an appropriately weighted NIR image and fuses it with the color image. By introducing a wavelength-dependent scattering model, our method then estimates the transmission map for each color channel and recovers the color more naturally from the edge-recovered image. Finally, the edge-recovered and the color-recovered images are blended. In this blending process, the regions with high lightness, such as sky and clouds, where unnatural color shifts are likely to occur, are effectively estimated, and the optimal weighting map is obtained. Our qualitative and quantitative evaluations using 59 pairs of color and NIR images demonstrated that our method can recover edges and colors more naturally in weak haze images than conventional methods.

  • A Lightweight and Efficient Infrared Pedestrian Semantic Segmentation Method

    Shangdong LIU  Chaojun MEI  Shuai YOU  Xiaoliang YAO  Fei WU  Yimu JI  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2023/06/13, Vol: E106-D No:9, Page(s): 1564-1571

    Thermal imaging pedestrian segmentation systems perform well under different illumination conditions, but they face drawbacks such as weak pedestrian texture information and blurred object boundaries. Meanwhile, high-performance large models incur high latency on edge devices with limited computing power. To solve these problems, this paper proposes a real-time thermal infrared pedestrian segmentation method whose feature extraction consists of two paths. First, lossless spatial downsampling is used to preserve boundary texture details on the spatial path. On the context path, atrous convolutions enlarge the receptive field to obtain more contextual semantic information. A parameter-free attention mechanism is then introduced at the end of each path for effective feature selection, and a Feature Fusion Module (FFM) fuses the selected semantic information of the two paths. Finally, method inference is accelerated through multi-threading on the edge computing device. In addition, we created a high-quality infrared pedestrian segmentation dataset to facilitate research. Comparative experiments against other methods on the self-built dataset and two public datasets demonstrate the effectiveness of our method. Our code is available at https://github.com/mcjcs001/LEIPNet.

  • Image Segmentation-Based Bicycle Riding Side Identification Method

    Jeyoen KIM  Takumi SOMA  Tetsuya MANABE  Aya KOJIMA  

     
    PAPER

    Publicized: 2022/11/02, Vol: E106-A No:5, Page(s): 775-783

    This paper attempts to identify which side of the road a bicycle is riding on using a common camera, toward realizing advanced bicycle navigation and riding-safety support systems. To identify the roadway area, the proposed method performs semantic segmentation on a front camera image captured by a bicycle drive recorder or smartphone. If the roadway area extends from the center of the image to the right, the bicyclist is riding on the left side of the roadway (i.e., the correct riding position in Japan); if it extends to the left, the bicyclist is on the right side (i.e., the incorrect riding position in Japan). We evaluated the accuracy of the proposed method on roads of various widths and traffic volumes using video captured while riding bicycles in Tsuruoka City, Yamagata Prefecture, and Saitama City, Saitama Prefecture, Japan. High accuracy (>80%) was achieved for every combination of segmentation model, riding-side identification method, and experimental condition. Given these results, we believe we have realized an effective image segmentation-based method for identifying which side of the roadway a bicycle is riding on.
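
    A minimal NumPy sketch of the riding-side rule described above: after semantic segmentation of the front-camera image, check whether the roadway class extends to the right or to the left of the image center. The class id, the lower image band used, and the comparison rule are illustrative assumptions.

```python
import numpy as np

ROADWAY_CLASS = 0  # hypothetical label id for the roadway class

def riding_side(seg_mask: np.ndarray, band: float = 0.3) -> str:
    """seg_mask: (H, W) array of per-pixel class ids from the segmentation model."""
    h, w = seg_mask.shape
    lower = seg_mask[int(h * (1 - band)):, :]          # look only at the lower image band
    road = lower == ROADWAY_CLASS
    left_ratio = road[:, : w // 2].mean()
    right_ratio = road[:, w // 2:].mean()
    # Roadway extending to the right of center => riding on the left side (correct in Japan).
    return "left side (correct in Japan)" if right_ratio > left_ratio else "right side"

mask = np.ones((120, 160), dtype=int)                  # everything non-road ...
mask[80:, 80:] = ROADWAY_CLASS                         # ... except roadway on the right half
print(riding_side(mask))                               # left side (correct in Japan)
```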

  • Efficiency Analysis for Inductive Power Transfer Using Segmented Parallel Line Feeder Open Access

    William-Fabrice BROU  Quang-Thang DUONG  Minoru OKADA  

     
    PAPER-Electronic Circuits

    Publicized: 2022/10/17, Vol: E106-C No:5, Page(s): 165-173

    A parallel line feeder (PLF), consisting of a two-wire transmission line operating in the MHz band, has been proposed as a wide-coverage, short-distance wireless charging scheme. In the MHz band, a PLF of several meters suffers from standing-wave effects, causing the power transfer efficiency to fluctuate with the receiver's position. This paper studies a modified version of the system in which the PLF is divided into individually compensated segments to mitigate the standing-wave effect. Modelling the PLF as a lossy transmission line, this paper theoretically shows that if the segments' lengths are properly chosen, the efficiency can be improved and stabilized for all positions. Experimental results at 27.12 MHz confirm the theoretical analysis and show that a fairly high efficiency of 70% can be achieved.

  • 3D Multiple-Contextual ROI-Attention Network for Efficient and Accurate Volumetric Medical Image Segmentation

    He LI  Yutaro IWAMOTO  Xianhua HAN  Lanfen LIN  Akira FURUKAWA  Shuzo KANASAKI  Yen-Wei CHEN  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2023/02/21, Vol: E106-D No:5, Page(s): 1027-1037

    Convolutional neural networks (CNNs) have become popular in medical image segmentation. The widely used deep CNNs are customized to extract multiple representative features for two-dimensional (2D) data, generally called 2D networks. However, 2D networks are inefficient at extracting three-dimensional (3D) spatial features from volumetric images. Although most 2D segmentation networks can be extended to 3D networks, the naively extended 3D methods are resource-intensive. In this paper, we propose an efficient and accurate network for fully automatic 3D segmentation. Specifically, we designed a 3D multiple-contextual extractor to capture rich global contextual dependencies from different feature levels. We then leveraged an ROI-estimation strategy to crop the ROI bounding box, and used a 3D ROI-attention module to improve the accuracy of in-region segmentation in the decoder path. Moreover, we used a hybrid Dice loss function to address the issues of class imbalance and blurry contours in medical images. By incorporating the above strategies, we realized a practical end-to-end 3D medical image segmentation method with high efficiency and accuracy. To validate the 3D segmentation performance of our proposed method, we conducted extensive experiments on two datasets and demonstrated favorable results over the state-of-the-art methods.
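
    A hedged PyTorch sketch of a hybrid Dice loss of the kind the abstract mentions for class imbalance and blurry contours: a soft Dice term combined with cross-entropy over 3D volumes. The equal weighting and the smoothing constant are assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_dice_loss(logits, target, n_classes: int, eps: float = 1e-6):
    """logits: (B, C, D, H, W) raw scores; target: (B, D, H, W) integer labels."""
    ce = F.cross_entropy(logits, target)                     # voxel-wise cross-entropy term
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, n_classes).permute(0, 4, 1, 2, 3).float()
    dims = (0, 2, 3, 4)                                      # sum over batch and volume
    intersection = (probs * onehot).sum(dims)
    cardinality = probs.sum(dims) + onehot.sum(dims)
    dice = (2 * intersection + eps) / (cardinality + eps)    # per-class soft Dice
    return ce + (1 - dice.mean())                            # equal-weight combination

logits = torch.randn(1, 3, 8, 32, 32, requires_grad=True)
target = torch.randint(0, 3, (1, 8, 32, 32))
loss = hybrid_dice_loss(logits, target, n_classes=3)
loss.backward()                                              # differentiable end to end
print(float(loss))
```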

  • A Novel Unambiguous Acquisition Algorithm Based on Segmentation Reconstruction for BOC(n,n) Signal Open Access

    Yuanfa JI  Sisi SONG  Xiyan SUN  Ning GUO  Youming LI  

     
    PAPER-Navigation, Guidance and Control Systems

    Publicized: 2022/08/26, Vol: E106-B No:3, Page(s): 287-295

    To improve frequency band utilization and avoid mutual interference between signals, the BD3 satellite signals adopt Binary Offset Carrier (BOC) modulation. On one hand, BOC modulation has a narrow main peak and strong anti-interference ability; on the other hand, the false acquisition locking caused by the multi-peak characteristic of BOC modulation needs to be resolved. In this context, this paper proposes a new unambiguous BOC(n,n) acquisition algorithm based on segmentation reconstruction. The algorithm splits the local BOC signal into four parts in each subcarrier period, and each branch signal is correlated with the received signal to generate four branch correlation signals. After a series of combined reconstructions, the final signal detection function completely eliminates secondary peaks. Simulations show that the algorithm completely eliminates sub-peak interference for BOC signals modulated by subcarriers with different phases, while retaining the narrow correlation peak. Experiments show that the proposed algorithm has superior performance in detection probability and peak-to-average ratio.

  • Split and Eliminate: A Region-Based Segmentation for Hardware Trojan Detection

    Ann Jelyn TIEMPO  Yong-Jin JEONG  

     
    PAPER-Dependable Computing

    Publicized: 2022/12/09, Vol: E106-D No:3, Page(s): 349-356

    Using third-party intellectual properties (3PIP) has become the norm in the IC design development process to meet time-to-market demands while minimizing cost. However, this flow introduces threats such as hardware trojans, which may compromise the security and trustworthiness of the underlying hardware by disclosing confidential information, impeding normal execution, or even permanently damaging the system. Over the years, different detection methods have been explored, from conventional methods that only identify whether a circuit is infected with a hardware trojan to machine learning approaches that identify which nets are most likely hardware trojans. However, their performance is not satisfactory in terms of maximizing the detection rate and minimizing the false positive rate. In this paper, a new hardware trojan detection approach is proposed in which the gate-level netlist is first segmented into regions before analyzing which nets might be hardware trojans. The segmentation process depends on the nets' connectivity, specifically on the fanout points. Further analysis then computes the structural similarity of each segmented region to differentiate hardware trojan nets from normal nets. Experimental results show 100% detection of the hardware trojan nets inserted in each benchmark circuit and an overall average false positive rate of 1.38%, resulting in a higher accuracy of 99.31% on average.

  • Spatial-Temporal Aggregated Shuffle Attention for Video Instance Segmentation of Traffic Scene

    Chongren ZHAO  Yinhui ZHANG  Zifen HE  Yunnan DENG  Ying HUANG  Guangchen CHEN  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2022/11/24, Vol: E106-D No:2, Page(s): 240-251

    To address the dispersion and dislocation of spatial focus regions in feature pyramid networks and the insufficient capture of feature dependencies in both the spatial and channel dimensions, this paper proposes a spatial-temporal aggregated shuffle attention method for video instance segmentation (STASA-VIS). First, a mixed subsampling (MS) module is designed to embed activating features from the low-level target area of the feature pyramid into the high level, so as to aggregate spatial information on the target area. Taking advantage of the coherent information in video frames, STASA-VIS takes the first of every five video frames as a keyframe, propagates the keyframe feature maps of the pyramid layers forward in the time domain, and fuses them with the mixed-subsampled features of the non-keyframes to achieve temporally consistent feature aggregation. Finally, STASA-VIS embeds shuffle attention in the backbone to capture the pixel-level pairwise relationships and dimensional dependencies among the channels while reducing computation. Experimental results show that the segmentation accuracy of STASA-VIS reaches 41.2% and the test speed reaches 34 FPS, outperforming state-of-the-art one-stage video instance segmentation (VIS) methods in accuracy while achieving real-time segmentation.
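
    A hedged sketch of the keyframe scheme described above: the first of every five frames is a keyframe whose pyramid features are propagated forward in time and fused with the non-keyframe features. The simple weighted fusion used here is an illustrative stand-in for the paper's aggregation, and the interval and weight are assumptions.

```python
import torch

def temporal_aggregate(frame_feats, key_interval: int = 5, alpha: float = 0.5):
    """frame_feats: list of per-frame feature tensors of identical shape."""
    fused, key_feat = [], None
    for t, feat in enumerate(frame_feats):
        if t % key_interval == 0:
            key_feat = feat                        # keyframe: use its own features
            fused.append(feat)
        else:
            # non-keyframe: fuse the propagated keyframe features with the current ones
            fused.append(alpha * key_feat + (1 - alpha) * feat)
    return fused

feats = [torch.randn(1, 64, 32, 32) for _ in range(10)]
out = temporal_aggregate(feats)
print(len(out), out[0].shape)                      # 10 torch.Size([1, 64, 32, 32])
```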

1-20 hits (of 284 hits)