Constructive Visual Analytics for Text Similarity Detection

Abdul-Rahman, A.; Roe, G.; Olsen, M.; Gladstone, C.; Whaling, R.; Cronk, N.; Morrissey, R.; Chen, M.

v36i1pp237-248.pdf (4.228Mb)

Date

2017

Abstract

Detecting similarity between texts is a frequently encountered text mining task. Because the measurement of similarity is typically composed of a number of metrics, and some measures are sensitive to subjective interpretation, a generic detector obtained using machine learning often has difficulties balancing the roles of different metrics according to the semantic context exhibited in a specific collection of texts. In order to facilitate human interaction in a visual analytics process for text similarity detection, we first map the problem of pairwise sequence comparison to that of image processing, allowing patterns of similarity to be visualized as a 2D pixelmap. We then devise a visual interface to enable users to construct and experiment with different detectors using primitive metrics, in a way similar to constructing an image processing pipeline. We deployed this new approach for the identification of commonplaces in 18th‐century literary and print culture. Domain experts were then able to make use of the prototype system to derive new scholarly discoveries and generate new hypotheses.
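To make the pixelmap idea concrete, the following is a minimal sketch, not the authors' system. It assumes exact token equality as the only primitive metric and a hypothetical three-stage pipeline; the names `pixelmap`, `diagonal_smooth`, `threshold` and `run_pipeline` are illustrative inventions. Cell (i, j) of the map compares token i of one text with token j of the other, so a shared passage shows up as a run of lit cells along a diagonal, and image-style filters can then isolate such runs.

```python
from typing import Callable, List, Sequence

Pixelmap = List[List[float]]
Stage = Callable[[Pixelmap], Pixelmap]  # one step of a hypothetical detector pipeline


def pixelmap(a: Sequence[str], b: Sequence[str]) -> Pixelmap:
    """Cell (i, j) is 1.0 when a[i] == b[j], else 0.0 (assumed primitive metric: exact match)."""
    return [[1.0 if x == y else 0.0 for y in b] for x in a]


def diagonal_smooth(k: int) -> Stage:
    """Average each cell with the next k-1 cells down its diagonal, so aligned runs score high."""
    def stage(m: Pixelmap) -> Pixelmap:
        rows = len(m)
        cols = len(m[0]) if m else 0
        out = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Cells past the map edge count as 0, so a run must fit entirely inside.
                total = sum(m[i + d][j + d] for d in range(k) if i + d < rows and j + d < cols)
                out[i][j] = total / k
        return out
    return stage


def threshold(t: float) -> Stage:
    """Binarize: keep cells whose score is at least t."""
    def stage(m: Pixelmap) -> Pixelmap:
        return [[1.0 if v >= t else 0.0 for v in row] for row in m]
    return stage


def run_pipeline(m: Pixelmap, stages: List[Stage]) -> Pixelmap:
    """Apply stages in order, like an image processing pipeline."""
    for s in stages:
        m = s(m)
    return m


if __name__ == "__main__":
    a = "the cat sat on the mat today".split()
    b = "yesterday the cat sat on the mat".split()
    result = run_pipeline(pixelmap(a, b), [diagonal_smooth(3), threshold(1.0)])
    hits = [(i, j) for i, row in enumerate(result) for j, v in enumerate(row) if v]
    print(hits)  # → [(0, 1), (1, 2), (2, 3), (3, 4)]
```

Here each hit marks the start of a run of at least three consecutive matching tokens; isolated matches such as a stray "the" are filtered out. Swapping in a different primitive metric (for example, a stemmed or fuzzy token comparison) or reordering stages changes the detector without touching the rest of the pipeline, which is the kind of user-driven construction the abstract describes.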

BibTeX

@article {10.1111:cgf.12798,
journal = {Computer Graphics Forum},
title = {{Constructive Visual Analytics for Text Similarity Detection}},
author = {Abdul-Rahman, A. and Roe, G. and Olsen, M. and Gladstone, C. and Whaling, R. and Cronk, N. and Morrissey, R. and Chen, M.},
year = {2017},
volume = {36},
number = {1},
pages = {237--248},
publisher = {© 2017 The Eurographics Association and John Wiley & Sons Ltd.},
ISSN = {1467-8659},
DOI = {10.1111/cgf.12798}
}

URI

http://dx.doi.org/10.1111/cgf.12798
https://diglib.eg.org:443/handle/10.1111/cgf12798

Collections

36-Issue 1