Keyword Search Result

[Keyword] anonymization (7 hits)

Hits 1-7
  • Preventing Fake Information Generation Against Media Clone Attacks Open Access

    Noboru BABAGUCHI  Isao ECHIZEN  Junichi YAMAGISHI  Naoko NITTA  Yuta NAKASHIMA  Kazuaki NAKAMURA  Kazuhiro KONO  Fuming FANG  Seiko MYOJIN  Zhenzhong KUANG  Huy H. NGUYEN  Ngoc-Dung T. TIEU  

     
    INVITED PAPER

    Publicized: 2020/10/19  Vol: E104-D  No: 1  Page(s): 2-11

    Fake media has been spreading due to remarkable advances in media processing and machine learning technologies, causing serious problems in society. We are conducting a research project called Media Clone, aimed at developing methods for protecting people from media clones: fake but skillfully fabricated replicas of real media. Such media can be created from fake information about a specific person. Our goal is to develop a trusted communication system that can defend against media clone attacks. This paper describes some research results of the Media Clone project, in particular various methods for protecting personal information from being used to generate fake information. We focus on 1) fake information generation in the physical world, 2) anonymization and abstraction in the cyber world, and 3) modeling of media clone attacks.

  • An Overview of De-Identification Techniques and Their Standardization Directions

    Heung Youl YOUM  

     
    INVITED PAPER

    Publicized: 2020/05/14  Vol: E103-D  No: 7  Page(s): 1448-1461

    De-identification [1]-[5], [30]-[71] is the process that organizations can use to remove personal information from data that they collect, use, archive, and share with other organizations. It is recognized as an important tool for organizations to balance the use of data against the privacy protection of personal information. Its objective is to remove the association between a set of identifying attributes and the data principal, where an identifying attribute is an attribute in a dataset that can contribute to uniquely identifying a data principal within a specific operational context, and a data principal is the entity to which the data relates. This paper provides an overview of de-identification techniques, including the data release models. It also describes the current standardization activities of standards development organizations with respect to de-identification. It suggests future standardization directions, including potential future work items.

  • Anonymization Technique Based on SGD Matrix Factorization

    Tomoaki MIMOTO  Seira HIDANO  Shinsaku KIYOMOTO  Atsuko MIYAJI  

     
    PAPER-Cryptographic Techniques

    Publicized: 2019/11/25  Vol: E103-D  No: 2  Page(s): 299-308

    Time-sequence data is high dimensional and contains a lot of information, which can be utilized in various fields, such as insurance, finance, and advertising. Personal data, including time-sequence data, is converted to anonymized datasets, which need to strike a balance between privacy and utility. In this paper, we consider low-rank matrix factorization as an anonymization method and evaluate its efficiency. We convert time-sequence datasets to matrices and evaluate both privacy and utility. The record IDs in time-sequence data are changed at regular intervals to reduce re-identification risk. However, since individuals tend to behave in a similar fashion over periods of time, there remains a risk of record linkage even if record IDs are different. Hence, we evaluate the re-identification and linkage risks as privacy risks of time-sequence data. Our experimental results show that matrix factorization is a viable anonymization method and that it can achieve better utility than existing anonymization methods.
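
To make the idea of low-rank matrix factorization trained by SGD concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the hyperparameters (`rank`, `epochs`, `lr`, `reg`), the positive initialization, and the toy matrix are all illustrative assumptions. It also assumes that the low-rank reconstruction is what would be released in place of the original matrix, which the abstract does not state explicitly.

```python
import random


def sgd_factorize(matrix, rank=1, epochs=2000, lr=0.01, reg=0.0, seed=0):
    """Approximate `matrix` (a list of rows) as P * Q^T using plain SGD."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Small positive initial factors (an illustrative choice, not from the paper).
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for _ in range(epochs):
        rng.shuffle(cells)  # visit the matrix entries in random order
        for i, j in cells:
            pred = sum(P[i][k] * Q[j][k] for k in range(rank))
            err = matrix[i][j] - pred
            for k in range(rank):
                p_ik, q_jk = P[i][k], Q[j][k]
                # Gradient step on the squared error with optional L2 penalty.
                P[i][k] += lr * (err * q_jk - reg * p_ik)
                Q[j][k] += lr * (err * p_ik - reg * q_jk)
    return P, Q


def reconstruct(P, Q):
    """Return the low-rank approximation P * Q^T as a list of rows."""
    return [[sum(p * q for p, q in zip(p_row, q_row)) for q_row in Q]
            for p_row in P]


# Toy "records x time slots" matrix (made-up values with exact rank 1).
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
P, Q = sgd_factorize(data, rank=1)
approx = reconstruct(P, Q)
max_error = max(abs(a - d) for a_row, d_row in zip(approx, data)
                for a, d in zip(a_row, d_row))
print(max_error)
```

A smaller `rank` discards more detail from each record, so in a sketch like this it is the knob that trades utility against privacy.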

  • Study on Record Linkage of Anonymizied Data

    Hiroaki KIKUCHI  Takayasu YAMAGUCHI  Koki HAMADA  Yuji YAMAOKA  Hidenobu OGURI  Jun SAKUMA  

     
    INVITED PAPER

    Vol: E101-A  No: 1  Page(s): 19-28

    Data anonymization is required before a big-data business can run effectively without compromising the privacy of the personal information it uses. It is not trivial to choose the best algorithm to anonymize given data securely for a given purpose. In accurately assessing the risk of data being compromised, there needs to be a balance between utility and security. Therefore, using common pseudo microdata, we propose a competition for the best anonymization and re-identification algorithms. This paper reports the results of the competition and an analysis of the effectiveness of the anonymization techniques. The competition results reveal that there is a tradeoff between utility and security, and that 20.9% of records were re-identified on average.
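
A re-identification rate such as the one quoted above can be understood as the fraction of anonymized records that an attacker links back to the correct individual. The sketch below measures that fraction for a simple nearest-neighbor linkage attack. The attack, the function names, and the toy data are assumptions made for illustration; the abstract does not describe the algorithms or scoring rules used in the competition.

```python
import math


def nearest_original(vector, originals):
    """Guess the individual whose original vector is closest (Euclidean)."""
    return min(originals, key=lambda name: math.dist(vector, originals[name]))


def reidentification_rate(originals, anonymized, truth):
    """Fraction of anonymized records linked to the correct individual."""
    hits = sum(1 for pseudonym, vector in anonymized.items()
               if nearest_original(vector, originals) == truth[pseudonym])
    return hits / len(anonymized)


# Made-up data: original attribute vectors, perturbed anonymized copies,
# and the ground-truth mapping that only the evaluator knows.
originals = {"a": (1.0, 1.0), "b": (5.0, 5.0), "c": (9.0, 1.0)}
anonymized = {"p1": (1.2, 0.9), "p2": (5.5, 4.0), "p3": (4.8, 4.9)}
truth = {"p1": "a", "p2": "b", "p3": "c"}

rate = reidentification_rate(originals, anonymized, truth)
print(f"{rate:.1%} of records re-identified")
```

Here the record behind `p3` was perturbed enough to be linked to the wrong individual, so two of the three records are re-identified.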

  • Novel Method to Watermark Anonymized Data for Data Publishing

    Yuichi NAKAMURA  Yoshimichi NAKATSUKA  Hiroaki NISHI  

     
    PAPER-Privacy

    Publicized: 2017/05/18  Vol: E100-D  No: 8  Page(s): 1671-1679

    In this study, an anonymization infrastructure for the secondary use of data is proposed. The proposed infrastructure can publish data that includes private information while preserving privacy by using anonymization techniques. The infrastructure considers a situation where ill-motivated users redistribute the data without authorization. To address this problem, we propose a watermarking method for anonymized data. The proposed method is implemented, and its tolerance against attacks is evaluated.

  • Internet Data Center IP Identification and Connection Relationship Analysis Based on Traffic Connection Behavior Analysis

    Xuemeng ZHAI  Mingda WANG  Hangyu HU  Guangmin HU  

     
    PAPER-Fundamental Theories for Communications

    Publicized: 2016/10/21  Vol: E100-B  No: 4  Page(s): 510-517

    Identifying IDC (Internet Data Center) IP addresses and analyzing the connection relationships of IDCs can reveal IDC network resource allocation and network layout, which is helpful for optimizing IDC resource allocation. Recent research mainly focuses on minimizing electricity consumption and optimizing network resource allocation based on IDC traffic behavior analysis. However, the lack of network-wide IP information from network operators has led to problems such as management difficulties and unbalanced resource allocation among IDCs, which remain unsolved. In this paper, we propose a method for IP identification and connection relationship analysis of IDCs based on flow connection behavior analysis. In our method, frequent IP addresses are extracted and aggregated in the backbone communication network based on the traffic characteristics of IDCs. After that, connection graphs of frequent IPs (CGFIP) are built by analyzing the behavior of the users who visit the IDC servers, and IDC IP blocks are then identified using the CGFIP. Furthermore, the connection behavior characteristics of IDCs are analyzed based on the connection graphs of IDCs (CGIDC). Our findings show that the method can accurately identify IDC IP addresses and can also effectively reflect the relationships among IDCs.

  • Achieving High Data Utility K-Anonymization Using Similarity-Based Clustering Model

    Mohammad Rasool SARRAFI AGHDAM  Noboru SONEHARA  

     
    PAPER

    Publicized: 2016/05/31  Vol: E99-D  No: 8  Page(s): 2069-2078

    In data sharing, privacy has become one of the main concerns, particularly when the shared datasets contain private, sensitive information about individuals. A model that is widely used to protect the privacy of individuals when publishing micro-data is k-anonymity. It reduces the linking confidence between private sensitive information and a specific individual by generalizing the identifying attributes of each individual so that they match those of at least k-1 others in the dataset. K-anonymity can also be defined as clustering with the constraint of a minimum of k tuples in each group. However, the accuracy of the data in a k-anonymous dataset decreases due to the large information loss caused by generalization and suppression. Also, most current approaches are designed for continuous numerical attributes; for categorical attributes they do not perform efficiently and depend on hierarchical attribute taxonomies, which often do not exist. In this paper, we propose a new model for k-anonymization called Similarity-Based Clustering (SBC). It is based on clustering, and it measures similarity and calculates distances between tuples containing numerical and categorical attributes without hierarchical taxonomies. Based on this model, a bottom-up greedy algorithm is proposed. Our extensive study on two real datasets shows that the proposed algorithm, in comparison with existing well-known algorithms, offers much higher data utility and significantly reduces information loss. Data utility is maintained above 80% over a wide range of k values.
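
The k-anonymity condition described above can be checked mechanically: every combination of quasi-identifier values must be shared by at least k records. The sketch below shows that check together with a simple fixed generalization (age to a ten-year band, ZIP code to a three-digit prefix). It is not the SBC algorithm proposed in the paper; the field names, the generalization rule, and the records are made-up assumptions for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in groups.values())


def generalize(record):
    """Coarsen age to a ten-year band and ZIP to its first three digits."""
    decade = record["age"] // 10 * 10
    return {**record,
            "age": f"{decade}-{decade + 9}",
            "zip": record["zip"][:3] + "**"}


# Made-up micro-data; "disease" plays the role of the sensitive attribute.
records = [
    {"age": 23, "zip": "13053", "disease": "flu"},
    {"age": 27, "zip": "13068", "disease": "cold"},
    {"age": 35, "zip": "14850", "disease": "flu"},
    {"age": 38, "zip": "14853", "disease": "asthma"},
]
generalized = [generalize(r) for r in records]

print(is_k_anonymous(records, ["age", "zip"], 2))      # raw data
print(is_k_anonymous(generalized, ["age", "zip"], 2))  # after generalization
```

Generalization makes each record indistinguishable from one other record on the quasi-identifiers, at the cost of the precision that the abstract calls information loss.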