White-box cryptographic implementations often use masking and shuffling as countermeasures against key extraction attacks. To counter these defenses, higher-order Differential Computation Analysis (HO-DCA) and its variants have been developed. These methods aim to breach these countermeasures without needing reverse engineering. However, these non-invasive attacks are expensive and can be thwarted by updating the masking and shuffling techniques. This paper introduces a simple binary injection attack, aptly named clear & return, designed to bypass advanced masking and shuffling defenses employed in white-box cryptography. The attack involves injecting a small amount of assembly code, which effectively disables run-time random sources. This loss of randomness exposes the unprotected lookup value within white-box implementations, making them vulnerable to simple statistical analysis. In experiments targeting open-source white-box cryptographic implementations, the attack strategy of hijacking entries in the Global Offset Table (GOT) or function calls shows effectiveness in circumventing run-time countermeasures.
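The effect of losing run-time randomness can be illustrated with a toy first-order Boolean masking scheme. This is only a conceptual sketch: `TOY_TABLE`, `masked_lookup`, `healthy_rng`, and `stuck_rng` are hypothetical names, the table is not taken from any real cipher, and the code does not model the GOT hijacking or the injected assembly described in the abstract.

```python
import random

# Toy 4-bit bijection standing in for a cipher lookup table (assumption).
TOY_TABLE = [(7 * v + 3) % 16 for v in range(16)]

def masked_lookup(x, key, rng):
    """Return the table output XORed with a fresh mask drawn from rng."""
    mask = rng()
    return TOY_TABLE[x ^ key] ^ mask

def healthy_rng():
    """A working random source: the output is hidden by a random mask."""
    return random.randrange(16)

def stuck_rng():
    """A disabled random source that always returns zero."""
    return 0

# With the random source stuck at zero, the mask vanishes and the
# "protected" output equals the unprotected lookup value.
exposed = masked_lookup(5, 9, stuck_rng)
```

With `stuck_rng`, every observed value is the raw table output, which is what makes simple statistical analysis sufficient once the randomness is gone.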
Xiangrun LI Qiyu SHENG Guangda ZHOU Jialong WEI Yanmin SHI Zhen ZHAO Yongwei LI Xingfeng LI Yang LIU
Automated tongue segmentation plays a crucial role in computer-aided tongue diagnosis. The challenge lies in developing algorithms that achieve higher segmentation accuracy while using less memory and keeping inference fast. To address this challenge, we propose a novel Pool-unet integrating Pool-former and Multi-task mask learning for tongue image segmentation. First, we collected 756 tongue images taken in various shooting environments and from different angles and accurately labeled the tongue under the guidance of a medical professional. Second, we propose the Pool-unet model, which combines a hierarchical Pool-former module with a U-shaped symmetric encoder-decoder with skip-connections. It uses a patch expanding layer for up-sampling and a patch embedding layer for down-sampling to maintain spatial resolution, capturing global and local information effectively with fewer parameters and faster inference. Finally, a Multi-task mask learning strategy is designed, which improves the generalization and anti-interference ability of the model through the Multi-task pre-training and self-supervised fine-tuning stages. Experimental results on the tongue dataset show that, compared to the state-of-the-art method (OET-NET), our method has 25% fewer model parameters, achieves 22% faster inference, and improves Mean Intersection Over Union (MIOU) and Mean Pixel Accuracy (MPA) by 0.91% and 0.55%, respectively.
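The two reported metrics can be sketched as follows. This is a minimal illustration using the common definitions of MIOU and MPA computed from a confusion matrix; the function names and the tiny label maps are assumptions, not the authors' evaluation code.

```python
def confusion_matrix(pred, truth, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            cm[t][p] += 1
    return cm

def miou_and_mpa(pred, truth, num_classes=2):
    """Return (mean IoU, mean pixel accuracy) averaged over classes present."""
    cm = confusion_matrix(pred, truth, num_classes)
    ious, accs = [], []
    for c in range(num_classes):
        tp = cm[c][c]
        true_total = sum(cm[c])                              # pixels labeled c
        pred_total = sum(cm[r][c] for r in range(num_classes))  # pixels predicted c
        union = true_total + pred_total - tp
        if union:
            ious.append(tp / union)
        if true_total:
            accs.append(tp / true_total)
    return sum(ious) / len(ious), sum(accs) / len(accs)

# Hypothetical 2x4 label maps: 0 = background, 1 = tongue.
truth = [[0, 0, 1, 1], [0, 1, 1, 1]]
pred = [[0, 0, 0, 1], [0, 1, 1, 1]]
miou, mpa = miou_and_mpa(pred, truth)
```

In this example one tongue pixel is predicted as background, giving per-class IoUs of 0.75 and 0.8 and per-class accuracies of 1.0 and 0.8.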
Sendren Sheng-Dong XU Albertus Andrie CHRISTIAN Chien-Peng HO Shun-Long WENG
During the COVID-19 pandemic, a robust system for masked face recognition has been required. Most existing solutions used many samples per identity for the model to recognize, but the processes involved are very laborious in a real-life scenario. Therefore, we propose “CPNet” as a suitable and reliable way of recognizing masked faces from only a few samples per identity. The prototype classifier uses a few-shot learning paradigm to perform the recognition process. To handle complex and occluded facial features, we incorporated the covariance structure of the classes to refine the class distance calculation. We also used sharpness-aware minimization (SAM) to improve the classifier. Extensive in-depth experiments on a variety of datasets show that our method achieves remarkable results with accuracy as high as 95.3%, which is 3.4% higher than that of the baseline prototype network used for comparison.
Min GAO Gaohua CHEN Jiaxin GU Chunmei ZHANG
Wearing a mask correctly is an effective method to prevent respiratory infectious diseases. However, in some complex settings, the accuracy of mask-wearing detection still needs to be improved. This research enhances a mask-wearing detection technique based on YOLOv7-tiny. Distribution Shifting Convolutions (DSConv) replace the 3×3 convolutions of the original YOLOv7-tiny model to simplify computation and increase detection precision. To decrease the loss of coordinate regression and enhance the detection performance, we adopt the loss function Intersection over Union with Minimum Points Distance (MPDIoU) instead of Complete Intersection over Union (CIoU) in the original model. The GSConv and VoVGSCSP modules are introduced into the model to keep it lightweight and easier to deploy. The P6 detection layer is designed to increase detection precision for tiny targets in challenging environments and decrease missed and false positive detection rates. The robustness of the model is increased further by creating and labeling a multi-environment mask-wearing data set that uses Mixup and Mosaic for data augmentation. The effectiveness of the model is validated using comparison and ablation experiments on the mask dataset. The results demonstrate that, compared to YOLOv7-tiny, the enhanced detection algorithm improves precision by 5.4%, Recall by 1.8%, mAP@.5 by 3%, and mAP@.5:.95 by 1.7%, while FLOPs are reduced by 8.5G. Therefore, the improved detection algorithm delivers faster and more accurate mask-wearing detection.
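The MPDIoU term mentioned above can be sketched as follows. This follows the commonly cited formulation (IoU minus the squared distances between the two boxes' top-left corners and bottom-right corners, each normalized by the squared image diagonal); whether the paper uses exactly this form is an assumption, and the function names are hypothetical.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU for boxes given as (x1, y1, x2, y2) in pixel coordinates."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corner points.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2  # top-left corners
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2  # bottom-right corners
    norm = img_w ** 2 + img_h ** 2               # squared image diagonal

    return iou - d1_sq / norm - d2_sq / norm

def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Regression loss: zero when the boxes coincide."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)

perfect = mpdiou_loss((0, 0, 2, 2), (0, 0, 2, 2), 10, 10)
shifted = mpdiou((0, 0, 2, 2), (1, 1, 3, 3), 10, 10)
```

Unlike plain IoU, the corner-distance terms keep changing even when the boxes do not overlap, which gives the regression a useful signal in that case.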
Due to the global outbreak of coronaviruses, people are increasingly wearing masks even when photographed. As a result, photos uploaded to web pages and social networking services with the lower half of the face hidden are less likely to convey the attractiveness of the photographed persons. In this study, we propose a method to complete facial mask regions using StyleGAN2, a type of Generative Adversarial Networks (GAN). In the proposed method, a reference image of the same person without a mask is prepared separately from a target image of the person wearing a mask. After the mask region in the target image is temporarily inpainted, the face orientation and contour of the person in the reference image are changed to match those of the target image using StyleGAN2. The changed image is then composited into the mask region while correcting the color tone to produce a mask-free image while preserving the person's features.
Takumi KOBAYASHI Masahiro MINAGAWA Akira BABA Keizo KATO Kazunari SHINBO
Improvement of the on/off ratio in organic field-effect transistors through the use of pentacene and molybdenum trioxide (MoO3) layers was attempted via the preparation of a discontinuous MoO3 layer using a mesh mask. We prepared three types of devices. Device A had a conventional top-contact structure with an n-type Si wafer and a 200-nm-thick SiO2 film onto which we deposited a 70-nm-thick pentacene film and a 30-nm-thick layer of Au top electrodes. Devices B and C had a similar structure to device A but received a continuous and a discontinuous MoO3 layer, respectively. The off current in Device B was remarkably high; in contrast, the off current in Device C was reduced and dependent on the separation of the MoO3 layer. It was deduced that the high resistance of the area without MoO3 contributed to the reduced off current.
Tomoya NITTA Tsubasa HIRAKAWA Hironobu FUJIYOSHI Toru TAMAKI
In this paper, we propose an extension of the Attention Branch Network (ABN) that uses instance segmentation to generate sharper attention maps for action recognition. Methods for visual explanation such as Grad-CAM usually generate blurry maps that are not intuitive for humans to understand, particularly when recognizing actions of people in videos. Our proposed method, Object-ABN, tackles this issue by introducing a new mask loss that makes the generated attention maps close to the instance segmentation result. Further, the Prototype Conformity (PC) loss and multiple attention maps are introduced to enhance the sharpness of the maps and improve classification performance. Experimental results with UCF101 and SSv2 show that the maps generated by the proposed method are much clearer, qualitatively and quantitatively, than those of the original ABN.
Jing WANG Yiyu LUO Weiming YI Xiang XIE
Speech separation is the task of extracting target speech while suppressing background interference components. In applications like video telephones, visual information about the target speaker is available, which can be leveraged for multi-speaker speech separation. Most previous multi-speaker separation methods are based on convolutional or recurrent neural networks. Recently, Transformer-based Seq2Seq models have achieved state-of-the-art performance in various tasks, such as neural machine translation (NMT) and automatic speech recognition (ASR). Transformer has shown an advantage in modeling audio-visual temporal context by multi-head attention blocks through explicitly assigning attention weights. Moreover, Transformer has no recurrent sub-networks, thus supporting parallelization of sequence computation. In this paper, we propose a novel speaker-independent audio-visual speech separation method based on Transformer, which can be flexibly applied to an unknown number and identity of speakers. The model receives both audio and visual streams, including the noisy spectrogram and speaker lip embeddings, and predicts a complex time-frequency mask for the corresponding target speaker. The model is made up of three main components: an audio encoder, a visual encoder, and a Transformer-based mask generator. Two encoder structures, ResNet-based and Transformer-based, are investigated and compared. The performance of the proposed method is evaluated in terms of source separation and speech quality metrics. The experimental results on the benchmark GRID dataset show the effectiveness of the method on the speaker-independent separation task in multi-talker environments. The model generalizes well to unseen speaker identities and noise types. Though only trained on 2-speaker mixtures, the model achieves reasonable performance when tested on 2-speaker and 3-speaker mixtures. Moreover, the model still shows an advantage compared with previous audio-visual speech separation works.
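How a complex time-frequency mask acts on a mixture spectrogram can be sketched as follows. In the abstract the mask is predicted by a network; here an oracle mask is computed from a known target purely for illustration, and the function names and the 2x2 spectrogram values are assumptions.

```python
def ideal_complex_mask(target_spec, mixture_spec, eps=1e-12):
    """Oracle mask M = S * conj(Y) / (|Y|^2 + eps), per time-frequency bin."""
    return [
        [s * y.conjugate() / (abs(y) ** 2 + eps) for s, y in zip(s_row, y_row)]
        for s_row, y_row in zip(target_spec, mixture_spec)
    ]

def apply_complex_mask(mask, mixture_spec):
    """Element-wise complex product: scales magnitude and rotates phase."""
    return [
        [m * y for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, mixture_spec)
    ]

# Hypothetical 2x2 complex spectrograms (frequency x time).
target = [[1 + 0j, 0.5 - 0.5j], [0 + 2j, -1 + 1j]]
interferer = [[0 + 1j, 0.5 + 0.5j], [1 + 0j, 0.25 + 0j]]
mixture = [
    [t + i for t, i in zip(t_row, i_row)]
    for t_row, i_row in zip(target, interferer)
]

mask = ideal_complex_mask(target, mixture)
recovered = apply_complex_mask(mask, mixture)
```

Because the mask is complex, it corrects the phase as well as the magnitude of each bin, so `recovered` matches `target` up to the small `eps` regularization.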
In flat panel display (FPD) lithography, a high resolution and large depth of focus (DOF) are required. The demands for high throughput have necessitated the use of large glass plates and exposure areas, thereby increasing focal unevenness and reducing process latitude. Thus, a large DOF is needed, particularly for high-resolution lithography. To manufacture future high-definition displays, 1.0μm line and space (L/S) is predicted to be required, and a technique to achieve this resolution with adequate DOF is necessary. To improve the resolution and DOF, resolution enhancement techniques (RETs) have been introduced. RETs such as off-axis illumination (OAI) and phase-shift masks (PSMs) have been widely used in semiconductor lithography, which utilizes narrowband illumination. To effectively use RETs in FPD lithography, modification for broadband illumination is required because FPD lithography utilizes such illumination as exposure light. However, thus far, RETs for broadband illumination have not been studied. This study aimed to develop techniques to achieve 1.0μm L/S resolution with an acceptable DOF. To this end, this paper proposes a method that combines our previously developed RET, namely, divided spectrum illumination (DSI), with an attenuated PSM (Att. PSM). Theoretical observations and simulations present the design of a PSM for broadband illumination. The transmittance and phase shift, whose degree varies according to the wavelength, are determined in terms of aerial image contrast and resist loss. The design of DSI for an Att. PSM is also discussed considering image contrast, DOF, and illumination intensity. Finally, the exposure results of 1.0μm L/S using DSI and PSM techniques are shown, demonstrating that a PSM greatly improves the resist profile, and DSI enhances the DOF by approximately 30% compared to conventional OAI. Thus, DSI and PSMs can be used in practical applications for achieving 1.0μm L/S with sufficient DOF.
Itaru KAMOHARA Ulrich WELLING Ulrich KLOSTERMANN Wolfgang DEMMERLE
This paper presents a simulation study on the printing behavior of three different EUV resist systems. Stochastic models for negative metal-based resist and conventional chemically amplified resist (CAR) were calibrated and then validated. As for negative-tone development (NTD) CAR, we commenced from a positive-tone development (PTD) CAR calibrated (material) and NTD development models, since state-of-the-art measurements are not available. A conceptual study between PTD CAR and NTD CAR shows that the stochastic inhibitor fluctuation differs for PTD CAR: the inhibitor level exhibits small fluctuation (Mack development). For NTD CAR, the inhibitor fluctuation depends on the NTD type, which is defined by categorizing the difference between the NTD and PTD development thresholds. The respective NTD types have different inhibitor concentration levels. Moreover, contact hole printing between negative metal-based and NTD CAR was compared to clarify the stochastic process window (PW) for a tone-reversed mask. For the latter comparison, the aerial image (AI) and secondary electron effect are comparable. Finally, the local CD uniformity (LCDU) for the same 20 nm size, 40 nm pitch contact hole was compared among the three different resists. The dose-dependent behavior of LCDU and the stochastic PW for NTD were different for the PTD CAR and metal-based resist. For NTD CAR, a small inhibitor level and large inhibitor fluctuation around the development threshold were observed, causing an LCDU increase, which is specific to the inverse Mack development resist.
Lu YIN Junfeng LI Yonghong YAN Masato AKAGI
Simultaneous utterances degrade speech understanding for both hearing-impaired persons and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works only estimate the speech magnitude and use the mixture phase for speech reconstruction. The use of the mixture phase has become a critical limitation for separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation, which jointly recovers the magnitude and the phase. For the phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is utilized due to its effectiveness and simplicity. The study implements the MISI algorithm based on the mask and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, as it introduces less phase distortion. To compensate for the error of phase recovery and minimize the signal distortion, an advanced mask is proposed for the magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for the magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly reduces the distortions of the separated speech.
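The limitation of reusing the mixture phase can be seen on a single time-frequency bin. The sketch below uses the usual definition of the ideal amplitude mask, IAM = |S| / |Y|, and reconstructs the target with the mixture phase; the bin values and function names are assumptions, and the MISI iterations themselves are not modeled.

```python
import cmath
import math

def ideal_amplitude_mask(target, mixture, eps=1e-12):
    """IAM for one bin: ratio of target magnitude to mixture magnitude."""
    return abs(target) / (abs(mixture) + eps)

def reconstruct_with_mixture_phase(mask, mixture):
    """Scale the mixture magnitude by the mask but keep the mixture phase."""
    magnitude = mask * abs(mixture)
    return cmath.rect(magnitude, cmath.phase(mixture))

# One hypothetical bin: the target and the interferer are 90 degrees apart.
target = 1 + 0j
interferer = 0 + 1j
mixture = target + interferer

iam = ideal_amplitude_mask(target, mixture)
estimate = reconstruct_with_mixture_phase(iam, mixture)
phase_error = cmath.phase(estimate) - cmath.phase(target)
```

The estimated magnitude is exact, yet the estimate is rotated by 45 degrees relative to the target, which is the residual error a phase recovery stage is meant to reduce.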
Mohammed Salah AL-RADHI Tamás Gábor CSAPÓ Géza NÉMETH
In this article, we propose a method called “continuous noise masking (cNM)” that eliminates residual buzziness in a continuous vocoder, i.e., a vocoder in which all parameters are continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only voice components that satisfy the cNM threshold condition while discarding the others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).
Tongxin YANG Toshinori SATO Tomoaki UKEZONO
Addition is a key fundamental function for many error-tolerant applications. Approximate addition is considered to be an efficient technique for trading off energy against performance and accuracy. This paper proposes a carry-maskable adder whose accuracy can be configured at runtime. The proposed scheme can dynamically select the length of the carry propagation to satisfy the quality requirements flexibly. Compared with a conventional ripple carry adder and a conventional carry look-ahead adder, the proposed 16-bit adder reduced the power consumption by 54.1% and 57.5%, respectively, and the critical path delay by 72.5% and 54.2%, respectively. In addition, results from an image processing application indicate that the quality of processed images can be controlled by the proposed adder. Good scalability of the proposed adder is demonstrated from the evaluation results using a 32-bit length.
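The accuracy and carry-length trade-off can be illustrated with a small behavioral model. This is a hypothetical sketch, not the circuit proposed in the paper: it assumes that the lowest `masked_bits` positions have their carries masked and are approximated by a bitwise OR, while the upper bits are added exactly with no carry-in from the lower part.

```python
def carry_masked_add(a, b, width=16, masked_bits=0):
    """Behavioral model of an adder whose low `masked_bits` carries are masked.

    masked_bits = 0 gives an exact addition. Because a + b == (a | b) + (a & b),
    the result never exceeds the exact sum, and the error equals the AND of the
    two low parts, so it is always smaller than 2 ** masked_bits.
    """
    full = (1 << width) - 1
    a &= full
    b &= full
    low_mask = (1 << masked_bits) - 1
    low = (a | b) & low_mask                              # no carry propagation
    high = ((a >> masked_bits) + (b >> masked_bits)) << masked_bits
    return high | low

exact = carry_masked_add(182, 69, masked_bits=0)
approx = carry_masked_add(182, 69, masked_bits=4)
error = exact - approx
```

Raising `masked_bits` at run time shortens the carry chain in this model at the cost of a larger worst-case error, which mirrors the configurable accuracy described in the abstract.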
Daijoon HYUN Younggwang JUNG Youngsoo SHIN
Multiple patterning lithography allows fine patterns beyond the lithography limit, but it suffers from a large process cost. In this paper, we address a method to reduce the number of V0 masks; it consists of two sub-problems. First, stitch-induced via (SIV) is introduced to reduce the number of V0 masks. It involves the redesign of standard cells to replace some vias in the V0 layer with SIVs, such that the remaining vias can be assigned to the reduced masks. Since SIV formation requires metal stitches in different masks, SIV replacement and metal mask assignment should be solved simultaneously. This sub-problem is formulated as integer linear programming (ILP). In the second sub-problem, inter-row via conflict aware detailed placement is addressed. Single row placement optimization is performed for each row to remove metal and inter-row via conflicts, while minimizing cell displacements. Since it is time consuming to consider many cell operations at once, we apply a few operations iteratively, where different operations are applied to each iteration and to each cell depending on whether the cell has a conflict in the previous iteration. Remaining conflicts are then removed by mapping conflict cells to white spaces. To this end, we minimize the number of cells to move and maximize the number of large white spaces before mapping. Experimental results demonstrate that cell placement with two V0 masks is completed by the proposed methods, with a 7-times speedup and a 21% reduction in total cell displacement, compared to conventional detailed placement.
Takuji MIKI Noriyuki MIURA Makoto NAGATA
This paper presents a low-power, small-area-overhead physical random number generator utilizing a SAR ADC embedded in sensor SoCs. An unpredictable random bit sequence is produced by an existing comparator in typical SAR ADCs, which results in little area overhead. Unlike other comparator-based physical random number generators, the proposed technique does not require an offset calibration scheme, since the SAR binary search algorithm automatically converges the two input voltages of the comparator to balance the differential circuit pair. Although the randomness slightly depends on a quantization error due to sharing the AD conversion scheme, the input signal distribution enhances the quality of the random bit sequence, which can be used for various security countermeasures such as masking techniques. Fabricated in 180nm CMOS, the 1Mb/s random bit generator achieves a high efficiency of 0.72pJ/bit with only 400μm2 area overhead, which occupies less than 0.5% of the SAR ADC, while retaining the 10-bit AD conversion function.
Bangan LIU Yun WANG Jian PANG Haosheng ZHANG Dongsheng YANG Aravind Tharayil NARAYANAN Dae Young LEE Sung Tae CHOI Rui WU Kenichi OKADA Akira MATSUZAWA
An energy-efficient modulator for an ultra-low-power (ULP) 60-GHz IEEE transmitter is presented in this paper. The modulator consists of a differential duobinary coder and a semi-digital finite-impulse-response (FIR) pulse-shaping filter. By virtue of differential duobinary coding and pulse shaping, the transceiver successfully solves the adjacent-channel-power-ratio (ACPR) issue of conventional on-off-keying (OOK) transceivers. The proposed differential duobinary code adopts an over-sampling precoder, which relaxes the timing requirement and reduces power consumption. The semi-digital FIR eliminates the power-hungry digital multipliers and accumulators, and improves the power efficiency through optimization of filter parameters. Fabricated in a 65nm CMOS process, this modulator occupies a core area of 0.12mm2. At throughputs of 1.7Gbps and 2.6Gbps, the modulator consumes 24.3mW and 42.8mW, respectively, while satisfying the IEEE 802.11ad spectrum mask.
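The coding idea can be sketched with textbook precoded duobinary signaling. This is an assumption-laden illustration: it does not model the over-sampling precoder or the semi-digital FIR filter of the paper, and the function names are hypothetical.

```python
def differential_precode(bits):
    """Precoder p[n] = b[n] XOR p[n-1], starting from p[-1] = 0."""
    out, prev = [], 0
    for b in bits:
        prev ^= b
        out.append(prev)
    return out

def duobinary_encode(bits):
    """Map precoded bits to +/-1 and add each symbol to the previous one."""
    symbols = [2 * p - 1 for p in differential_precode(bits)]
    prev = -1  # symbol corresponding to the initial precoder state 0
    levels = []
    for s in symbols:
        levels.append(s + prev)  # three-level signal: -2, 0, or +2
        prev = s
    return levels

def duobinary_decode(levels):
    """A zero level means the precoded bit toggled, i.e. the data bit was 1."""
    return [1 if y == 0 else 0 for y in levels]

data = [1, 0, 1, 1, 0, 0, 1]
levels = duobinary_encode(data)
decoded = duobinary_decode(levels)
```

Thanks to the precoder, each data bit is recovered from a single received level without error propagation: an OOK-style envelope detector only needs to tell the zero level from the nonzero ones.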
This paper presents a hierarchical-masked image filtering method for privacy protection. Cameras are widely used for various applications, e.g., crime surveillance, environment monitoring, and marketing. However, invasion of privacy has become a serious social problem, especially regarding the use of surveillance cameras. Many surveillance cameras point at many people; thus, a large amount of private information about our daily activities is under surveillance. However, several surveillance cameras currently on the market and related research often have a complicated or institutional masking privacy-protection functionality. To overcome this problem, a Hierarchical-Masked image Filtering (HMF) method is proposed, which has unmaskable (mask reversal) capability, is applicable to current surveillance camera systems for privacy-information protection, and can satisfy privacy-protection related requirements. This method has five main features: unmasking of the original image from only the masked image and a cipher key, hierarchical-mask level control using parameters for the length of a pseudorandom number, robustness against malicious attackers, fast processing on an embedded processor, and applicability of the mask operation to current surveillance camera systems. Previous studies have difficulty in providing these features. To evaluate HMF on actual equipment, an HMF-based prototype system is developed that mainly consists of a USB web camera, an ultra-compact single board computer, and a notebook PC. Through experiments, it is confirmed that the proposed method achieves mask level control and is robust against attacks. The increase in processing time of the HMF-based prototype system compared with a conventional non-masking system is only about 1.4%. This paper also reports on the comparison of the proposed method with conventional privacy protection methods and on favorable responses of people toward the HMF-based prototype system both domestically and abroad. Therefore, the proposed HMF method can be applied to embedded systems such as those equipped with surveillance cameras for protecting privacy.
Audio hashing has been successfully employed for protection, management, and indexing of digital music archives. For a reliable audio hashing system, improving hash matching accuracy is crucial. In this paper, we improve binary audio hash matching performance by utilizing auxiliary information, a resilience mask, which is obtained while constructing the hash DB. The resilience mask contains reliability information for each hash bit. We propose a new type of resilience mask by considering spectrum scaling and additive noise distortions. Experimental results show that the proposed resilience mask is effective in improving hash matching performance.
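One way a per-bit reliability mask can enter hash matching is a masked bit error rate, sketched below. How the resilience mask is derived from spectrum scaling and additive noise is not modeled; the function names and the 8-bit hash values are hypothetical.

```python
def bit_error_rate(query, reference, num_bits):
    """Plain Hamming distance between two hashes, normalized by hash length."""
    return bin(query ^ reference).count("1") / num_bits

def masked_bit_error_rate(query, reference, resilience_mask):
    """Count mismatches only on bits the mask marks as reliable (mask bit = 1)."""
    reliable = bin(resilience_mask).count("1")
    errors = bin((query ^ reference) & resilience_mask).count("1")
    return errors / reliable

# Hypothetical 8-bit hashes; bit 5 is flagged as unreliable in the mask.
query = 0b10110010
reference = 0b10011010
mask = 0b11011111

plain = bit_error_rate(query, reference, 8)
masked = masked_bit_error_rate(query, reference, mask)
```

Ignoring the unreliable bit removes one of the two mismatches here, so the distorted query scores closer to its true reference than it does under the plain distance.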
Ahmed AWAD Atsushi TAKAHASHI Chikaaki KODAMA
As advanced technology nodes are pushed into the sub-16nm regime, printing in optical micro-lithography will rely heavily on aggressive Optical Proximity Correction (OPC) for the foreseeable future. Although acceptable pattern fidelity is achieved under process variations, mask design time and mask manufacturability are crucial parameters that the industry strongly demands be addressed in the OPC recipe. In this paper, we propose an intensity-based OPC algorithm to find a highly manufacturable mask solution for a target pattern with acceptable pattern fidelity under process variations within a short computation time. This is achieved by utilizing a fast intensity estimation model in which intensity is numerically correlated with local mask density and kernel type, so that the intensity is estimated in a short time and with acceptable accuracy. This estimated intensity is used to guide feature shifting, alignment, and concatenation following a linearly interpolated variational intensity error model to achieve high mask manufacturability while preserving acceptable pattern fidelity under process variations. Experimental results show the effectiveness of our proposed algorithm on the public benchmarks.
Yukihiro KUDOH Yuta UCHIDA Taiju TAKAHASHI
A black mask (BM) is a layer used to improve the display quality by suppressing light leakage. In general, the BM is formed by a photolithography process. In this study, a novel technique for the fabrication of a quasi-black mask (q-BM) is proposed; the q-BM was composed of vertical and hybrid orientation areas, patterned by a separation coating technique using an electro-spray deposition method. Using our technique, the q-BM can be formed easily without the additional masks used for the BM.