Logging Inter-Thread Data Dependencies in Linux Kernel

Takafumi KUBOTA; Naohiro AOTA; Kenji KONO

doi:10.1587/transinf.2019EDP7255

IEICE TRANSACTIONS on Information

Logging Inter-Thread Data Dependencies in Linux Kernel

Takafumi KUBOTA, Naohiro AOTA, Kenji KONO

Full Text Views

0

Cite this

Summary :

Logging is a practical and useful way of diagnosing failures in software systems. The logged events are crucially important to learning what happened during a failure. If key events are not logged, it is almost impossible to track error propagations in the diagnosis. Tracking an error propagation becomes utterly complicated if inter-thread data dependency is involved. An inter-thread data dependency arises when one thread accesses to share data corrupted by another thread. Since the erroneous state propagates from a buggy thread to a failing thread through the corrupt shared data, the root cause cannot be tracked back solely by investigating the failing thread. This paper presents the design and implementation of K9, a tool that inserts logging code automatically to trace inter-thread data dependencies. K9 is designed to be “practical”; it scales to one million lines of code in C, causes negligible runtime overheads, and provides clues to tracking inter-thread dependencies in real-world bugs. To scale to one million lines of code, K9 ditches rigorous static analysis of pointers to detect code locations where inter-thread data dependency can occur. Instead, K9 takes the best-effort approach and finds out “most” of those code locations by making use of coding conventions. This paper demonstrates that K9 is applicable to Linux and captures relevant code locations, in spite of the best-effort approach, enough to provide useful clues to root causes in real-world bugs, including a previously unknown bug in Linux. The paper also shows K9 runtime overhead is negligible. K9 incurs 1.25% throughput degradation and 0.18% CPU usage increase, on average, in our evaluation.

Publication: IEICE TRANSACTIONS on Information Vol.E103-D No.7 pp.1633-1646

Publication Date: 2020/07/01

Publicized: 2020/04/06

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2019EDP7255

Type of Manuscript: PAPER

Category: Software System

Authors

Takafumi KUBOTA
  Keio University
Naohiro AOTA
  Keio University
Kenji KONO
  Keio University

Keyword

logging automation, inter-thread dependency, debugging, operating systems

Cite this

Copy

Takafumi KUBOTA, Naohiro AOTA, Kenji KONO, "Logging Inter-Thread Data Dependencies in Linux Kernel" in IEICE TRANSACTIONS on Information, vol. E103-D, no. 7, pp. 1633-1646, July 2020, doi: 10.1587/transinf.2019EDP7255.
Abstract: Logging is a practical and useful way of diagnosing failures in software systems. The logged events are crucially important to learning what happened during a failure. If key events are not logged, it is almost impossible to track error propagations in the diagnosis. Tracking an error propagation becomes utterly complicated if inter-thread data dependency is involved. An inter-thread data dependency arises when one thread accesses to share data corrupted by another thread. Since the erroneous state propagates from a buggy thread to a failing thread through the corrupt shared data, the root cause cannot be tracked back solely by investigating the failing thread. This paper presents the design and implementation of K9, a tool that inserts logging code automatically to trace inter-thread data dependencies. K9 is designed to be “practical”; it scales to one million lines of code in C, causes negligible runtime overheads, and provides clues to tracking inter-thread dependencies in real-world bugs. To scale to one million lines of code, K9 ditches rigorous static analysis of pointers to detect code locations where inter-thread data dependency can occur. Instead, K9 takes the best-effort approach and finds out “most” of those code locations by making use of coding conventions. This paper demonstrates that K9 is applicable to Linux and captures relevant code locations, in spite of the best-effort approach, enough to provide useful clues to root causes in real-world bugs, including a previously unknown bug in Linux. The paper also shows K9 runtime overhead is negligible. K9 incurs 1.25% throughput degradation and 0.18% CPU usage increase, on average, in our evaluation.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019EDP7255/_p

Copy

@ARTICLE{e103-d_7_1633,
author={Takafumi KUBOTA, Naohiro AOTA, Kenji KONO, },
journal={IEICE TRANSACTIONS on Information},
title={Logging Inter-Thread Data Dependencies in Linux Kernel},
year={2020},
volume={E103-D},
number={7},
pages={1633-1646},
abstract={Logging is a practical and useful way of diagnosing failures in software systems. The logged events are crucially important to learning what happened during a failure. If key events are not logged, it is almost impossible to track error propagations in the diagnosis. Tracking an error propagation becomes utterly complicated if inter-thread data dependency is involved. An inter-thread data dependency arises when one thread accesses to share data corrupted by another thread. Since the erroneous state propagates from a buggy thread to a failing thread through the corrupt shared data, the root cause cannot be tracked back solely by investigating the failing thread. This paper presents the design and implementation of K9, a tool that inserts logging code automatically to trace inter-thread data dependencies. K9 is designed to be “practical”; it scales to one million lines of code in C, causes negligible runtime overheads, and provides clues to tracking inter-thread dependencies in real-world bugs. To scale to one million lines of code, K9 ditches rigorous static analysis of pointers to detect code locations where inter-thread data dependency can occur. Instead, K9 takes the best-effort approach and finds out “most” of those code locations by making use of coding conventions. This paper demonstrates that K9 is applicable to Linux and captures relevant code locations, in spite of the best-effort approach, enough to provide useful clues to root causes in real-world bugs, including a previously unknown bug in Linux. The paper also shows K9 runtime overhead is negligible. K9 incurs 1.25% throughput degradation and 0.18% CPU usage increase, on average, in our evaluation.},
keywords={},
doi={10.1587/transinf.2019EDP7255},
ISSN={1745-1361},
month={July},}

Copy

TY - JOUR
TI - Logging Inter-Thread Data Dependencies in Linux Kernel
T2 - IEICE TRANSACTIONS on Information
SP - 1633
EP - 1646
AU - Takafumi KUBOTA
AU - Naohiro AOTA
AU - Kenji KONO
PY - 2020
DO - 10.1587/transinf.2019EDP7255
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2020
AB - Logging is a practical and useful way of diagnosing failures in software systems. The logged events are crucially important to learning what happened during a failure. If key events are not logged, it is almost impossible to track error propagations in the diagnosis. Tracking an error propagation becomes utterly complicated if inter-thread data dependency is involved. An inter-thread data dependency arises when one thread accesses to share data corrupted by another thread. Since the erroneous state propagates from a buggy thread to a failing thread through the corrupt shared data, the root cause cannot be tracked back solely by investigating the failing thread. This paper presents the design and implementation of K9, a tool that inserts logging code automatically to trace inter-thread data dependencies. K9 is designed to be “practical”; it scales to one million lines of code in C, causes negligible runtime overheads, and provides clues to tracking inter-thread dependencies in real-world bugs. To scale to one million lines of code, K9 ditches rigorous static analysis of pointers to detect code locations where inter-thread data dependency can occur. Instead, K9 takes the best-effort approach and finds out “most” of those code locations by making use of coding conventions. This paper demonstrates that K9 is applicable to Linux and captures relevant code locations, in spite of the best-effort approach, enough to provide useful clues to root causes in real-world bugs, including a previously unknown bug in Linux. The paper also shows K9 runtime overhead is negligible. K9 incurs 1.25% throughput degradation and 0.18% CPU usage increase, on average, in our evaluation.
ER -

IEICE TRANSACTIONS on Information