Mathematical formulae play an important role in many scientific domains. Despite the importance of mathematical formula search, conventional keyword-based retrieval methods are not sufficient for searching mathematical formulae, which are structured as trees. The increasing number and structural complexity of mathematical formulae in scientific articles make large-scale, structure-aware formula search techniques necessary. In this paper, we formulate three types of measures that represent distinctive features of the semantic similarity of math formulae, and develop efficient hash-based algorithms for their approximate calculation. Our experiments using the NTCIR-11 Math-2 Task dataset, a large-scale test collection for math information retrieval with about 60 million formulae, show that the proposed method improves search precision while keeping scalability and runtime efficiency high.
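As a rough illustration of what hash-based, structure-aware comparison of formula trees can look like, the sketch below hashes every subtree of a formula and compares two formulae by the overlap of their hash sets. This is not the paper's algorithm or any of its three measures: the nested-tuple tree encoding, the function names `subtree_hashes` and `structural_similarity`, and the use of Jaccard overlap are all assumptions made for this example.

```python
import hashlib
from typing import Set, Tuple, Union

# Hypothetical encoding: a formula tree is a nested tuple
# (label, child, child, ...); leaves are plain strings.
Tree = Union[str, Tuple]


def subtree_hashes(tree: Tree, out: Set[str]) -> str:
    """Hash every subtree bottom-up, add each digest to `out`,
    and return the digest of the root of `tree`."""
    if isinstance(tree, str):
        payload = "leaf:" + tree
    else:
        label, *children = tree
        child_digests = [subtree_hashes(child, out) for child in children]
        payload = "node:" + label + "(" + ",".join(child_digests) + ")"
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    out.add(digest)
    return digest


def structural_similarity(a: Tree, b: Tree) -> float:
    """Jaccard overlap of the two subtree-hash sets (an assumed measure)."""
    hashes_a: Set[str] = set()
    hashes_b: Set[str] = set()
    subtree_hashes(a, hashes_a)
    subtree_hashes(b, hashes_b)
    return len(hashes_a & hashes_b) / len(hashes_a | hashes_b)


# x^2 + 1 versus x^2 + y: they share the subtrees x, 2, and x^2.
f1 = ("plus", ("power", "x", "2"), "1")
f2 = ("plus", ("power", "x", "2"), "y")
similarity = structural_similarity(f1, f2)
```

Because identical subtrees always produce identical digests, shared structure can be found with set operations or an inverted index over digests rather than by pairwise tree matching, which is the general reason hashing helps at the scale of tens of millions of formulae.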
Shunsuke OHASHI
The University of Tokyo
Giovanni Yoko KRISTIANTO
The University of Tokyo
Goran TOPIC
National Institute of Informatics
Akiko AIZAWA
The University of Tokyo, National Institute of Informatics
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Shunsuke OHASHI, Giovanni Yoko KRISTIANTO, Goran TOPIC, Akiko AIZAWA, "Efficient Algorithm for Math Formula Semantic Search" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 4, pp. 979-988, April 2016, doi: 10.1587/transinf.2015DAP0023.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2015DAP0023/_p
@ARTICLE{e99-d_4_979,
author={Shunsuke OHASHI and Giovanni Yoko KRISTIANTO and Goran TOPIC and Akiko AIZAWA},
journal={IEICE TRANSACTIONS on Information},
title={Efficient Algorithm for Math Formula Semantic Search},
year={2016},
volume={E99-D},
number={4},
pages={979--988},
abstract={Mathematical formulae play an important role in many scientific domains. Despite the importance of mathematical formula search, conventional keyword-based retrieval methods are not sufficient for searching mathematical formulae, which are structured as trees. The increasing number and structural complexity of mathematical formulae in scientific articles make large-scale, structure-aware formula search techniques necessary. In this paper, we formulate three types of measures that represent distinctive features of the semantic similarity of math formulae, and develop efficient hash-based algorithms for their approximate calculation. Our experiments using the NTCIR-11 Math-2 Task dataset, a large-scale test collection for math information retrieval with about 60 million formulae, show that the proposed method improves search precision while keeping scalability and runtime efficiency high.},
keywords={},
doi={10.1587/transinf.2015DAP0023},
ISSN={1745-1361},
month={April},}
TY - JOUR
TI - Efficient Algorithm for Math Formula Semantic Search
T2 - IEICE TRANSACTIONS on Information
SP - 979
EP - 988
AU - Shunsuke OHASHI
AU - Giovanni Yoko KRISTIANTO
AU - Goran TOPIC
AU - Akiko AIZAWA
PY - 2016
DO - 10.1587/transinf.2015DAP0023
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2016
AB - Mathematical formulae play an important role in many scientific domains. Despite the importance of mathematical formula search, conventional keyword-based retrieval methods are not sufficient for searching mathematical formulae, which are structured as trees. The increasing number and structural complexity of mathematical formulae in scientific articles make large-scale, structure-aware formula search techniques necessary. In this paper, we formulate three types of measures that represent distinctive features of the semantic similarity of math formulae, and develop efficient hash-based algorithms for their approximate calculation. Our experiments using the NTCIR-11 Math-2 Task dataset, a large-scale test collection for math information retrieval with about 60 million formulae, show that the proposed method improves search precision while keeping scalability and runtime efficiency high.
ER -