Please use this identifier to cite or link to this item:
http://hdl.handle.net/UCSP/15843
Title: | Paradigmatic Clustering for NLP |
Authors: | Santisteban Pablo, Julio Omar; Tejada Cárcamo, Javier |
Keywords: | Cluster analysis;Data mining;Graph theory;Natural language processing systems;asymmetric similarity;clustering;Clustering techniques;paradigmatic;Similarity measure;Synthetic and real data;Traditional approaches;Word Sense Disambiguation;Clustering algorithms |
Issue Date: | 2016 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Related URI: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84964770393&doi=10.1109%2fICDMW.2015.233&partnerID=40&md5=26ff37a5a3402b53a73baf00f81bd862 |
Abstract: | How can we retrieve meaningful information from a large, sparse graph? Traditional approaches focus on generic clustering techniques and on discovering dense clusters in a network graph; however, they tend to overlook interesting patterns such as paradigmatic relations. In this paper, we propose a novel graph clustering technique that models a node's relations through paradigmatic analysis. We exploit a node's relations to extract its existing sets of signifiers. The resulting clusters offer a different view of the graph and provide insight into the structure of sparse network graphs. Our proposed algorithm, PaC (Paradigmatic Clustering), uses paradigmatic analysis supported by an asymmetric similarity; in contrast to traditional graph clustering methods, it yields good results on word-sense disambiguation tasks. In addition, we propose a novel paradigmatic similarity measure. The algorithm is evaluated through extensive experiments and empirical analysis on synthetic and real data. © 2015 IEEE. |
URI: | http://repositorio.ucsp.edu.pe/handle/UCSP/15843 |
ISBN: | 9781467384926 |
Appears in Collections: | Artículos de investigación (Research articles) |
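The abstract mentions clustering based on paradigmatic relations supported by an asymmetric similarity, but the record does not give the PaC algorithm or its similarity measure. As a rough illustration only, the sketch below uses a common containment-style neighbourhood overlap, sim(a, b) = |N(a) ∩ N(b)| / |N(a)|, which is asymmetric because the denominator depends on a. Two nodes that share many neighbours (contexts) are treated as paradigmatically related, in the linguistic sense of being substitutable in the same contexts. The functions `build_neighbors`, `asym_sim` and `paradigmatic_groups` and the `threshold` parameter are hypothetical names introduced here; this is not the paper's PaC algorithm or its proposed paradigmatic similarity measure.

```python
from collections import defaultdict


def build_neighbors(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from (u, v) pairs."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def asym_sim(nbrs, a, b):
    """Hypothetical asymmetric similarity: share of a's contexts that b also has.

    sim(a, b) = |N(a) & N(b)| / |N(a)|, so in general sim(a, b) != sim(b, a).
    """
    na = nbrs.get(a, set())
    if not na:
        return 0.0
    return len(na & nbrs.get(b, set())) / len(na)


def paradigmatic_groups(nbrs, threshold=0.5):
    """Hypothetical grouping: link a and b when sim(a, b) >= threshold in either
    direction, then return the connected components of those links."""
    nodes = sorted(nbrs)
    links = defaultdict(set)
    for a in nodes:
        for b in nodes:
            if a != b and asym_sim(nbrs, a, b) >= threshold:
                links[a].add(b)
                links[b].add(a)

    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Iterative depth-first search over the similarity links.
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(links[x] - comp)
        seen |= comp
        groups.append(comp)
    return groups


if __name__ == "__main__":
    # Toy word co-occurrence graph (invented for illustration).
    edges = [("cat", "feed"), ("cat", "pet"),
             ("dog", "feed"), ("dog", "pet"), ("dog", "walk")]
    g = build_neighbors(edges)
    print(asym_sim(g, "cat", "dog"), asym_sim(g, "dog", "cat"))
    print(paradigmatic_groups(g, threshold=0.5))
```

On the toy graph, every context of "cat" is also a context of "dog" (similarity 1.0), but only two of the three contexts of "dog" are shared with "cat" (about 0.67). That asymmetry is the property the sketch is meant to show. The all-pairs loop is quadratic in the number of nodes, so it is only suitable for small examples, not the large sparse graphs the abstract targets.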
Files in This Item:
There are no files associated with this item.