Constructive Visual Analytics for Text Similarity Detection

Abdul-Rahman, A.; Roe, G.; Olsen, M.; Gladstone, C.; Whaling, R.; Cronk, N.; Morrissey, R.; Chen, M.

dc.contributor.author	Abdul-Rahman, A.	en_US
dc.contributor.author	Roe, G.	en_US
dc.contributor.author	Olsen, M.	en_US
dc.contributor.author	Gladstone, C.	en_US
dc.contributor.author	Whaling, R.	en_US
dc.contributor.author	Cronk, N.	en_US
dc.contributor.author	Morrissey, R.	en_US
dc.contributor.author	Chen, M.	en_US
dc.contributor.editor	Chen, Min and Zhang, Hao (Richard)	en_US
dc.date.accessioned	2017-03-13T18:13:03Z
dc.date.available	2017-03-13T18:13:03Z
dc.date.issued	2017
dc.identifier.issn	1467-8659
dc.identifier.uri	http://dx.doi.org/10.1111/cgf.12798
dc.identifier.uri	https://diglib.eg.org:443/handle/10.1111/cgf12798
dc.description.abstract	Detecting similarity between texts is a frequently encountered text mining task. Because the measurement of similarity is typically composed of a number of metrics, and some measures are sensitive to subjective interpretation, a generic detector obtained using machine learning often has difficulties balancing the roles of different metrics according to the semantic context exhibited in a specific collection of texts. In order to facilitate human interaction in a visual analytics process for text similarity detection, we first map the problem of pairwise sequence comparison to that of image processing, allowing patterns of similarity to be visualized as a 2D pixelmap. We then devise a visual interface to enable users to construct and experiment with different detectors using primitive metrics, in a way similar to constructing an image processing pipeline. We deployed this new approach for the identification of commonplaces in 18th‐century literary and print culture. Domain experts were then able to make use of the prototype system to derive new scholarly discoveries and generate new hypotheses.	en_US
dc.publisher	© 2017 The Eurographics Association and John Wiley & Sons Ltd.	en_US
dc.subject	Visualization
dc.subject	Visual Analytics
dc.subject	I.3.6 [Computer Graphics]: Methodology and Techniques—Interaction techniques I.4.m [Image Processing and Computer Vision]: Miscellaneous I.7.5 [Document and Text Processing]: Document Capture—Document Analysis
dc.title	Constructive Visual Analytics for Text Similarity Detection	en_US
dc.description.seriesinformation	Computer Graphics Forum
dc.description.sectionheaders	Articles
dc.description.volume	36
dc.description.number	1
dc.identifier.doi	10.1111/cgf.12798

Files in this item

Name: v36i1pp237-248.pdf
Size: 4.228Mb
Format: PDF

This item appears in the following Collection(s)

36-Issue 1
Regular Issue
