Which Words Pillar the Semantic Expression of a Sentence?

Zhang, Cheng; Cao, Jingxu; Yan, Dongmei; Song, Dawei; Lv, Jinxin (2023). Which Words Pillar the Semantic Expression of a Sentence? In: Proceedings of the 2023 IEEE 35th International Conference on Tools with Artificial Intelligence (ICTAI 2023). (In Press)

DOI: https://doi.org/10.1109/ICTAI59109.2023.00121

Abstract

Understanding sentence semantics is central to many machine learning applications, notably text classification. This understanding is typically delegated to deep learning models, which are computationally expensive, especially on long sequences. Because individual words contribute unevenly to a sentence's meaning, removing less pertinent words is a natural way to reduce a model's computational burden. Current word-removal approaches mostly rely on truncation, stop-word elimination, or attention mechanisms, but these techniques lack a solid theoretical grounding in semantics and offer little interpretability. To bridge this gap, we introduce the concept of 'Semantic Pillar Words' (SPW), grounded in a semantic Euclidean space: the semantics of each word are represented as a set of semantic points, and the semantics of a text sequence correspond to the convex hull of those points. We then propose a method for Semantic Pillar Word extraction, 'SPW-Conv', which prunes text dynamically and interpretably while striving to preserve the semantic pillars of the original text. Experiments on three text classification datasets show that SPW-Conv outperforms existing methods. Notably, retaining fewer than 80% of the words in a sentence suffices to capture its semantics, yielding classification accuracy comparable to that obtained with the full original text.
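
The convex-hull intuition can be made concrete with a small example. The Python snippet below is not the authors' SPW-Conv method; it is a minimal sketch, assuming toy 2-D word embeddings (real embeddings are high-dimensional and would typically be projected down first, e.g. with PCA), that keeps only the words whose embedding points are vertices of the sentence's convex hull. All names and values in it are hypothetical.

# Minimal sketch of the convex-hull idea (not the authors' SPW-Conv):
# keep only the words whose embedding points are vertices of the hull.
import numpy as np
from scipy.spatial import ConvexHull

def semantic_pillar_words(words, embeddings):
    """Return the words whose embedding points lie on the convex hull.

    words      : list of N tokens
    embeddings : (N, d) array of word vectors (hypothetical toy values here;
                 ConvexHull needs N > d, so high-dimensional embeddings
                 would be projected down before this step)
    """
    points = np.asarray(embeddings, dtype=float)
    hull = ConvexHull(points)
    keep = sorted(set(hull.vertices))  # indices of hull-vertex points
    return [words[i] for i in keep]

# Toy usage with 2-D "embeddings" so the hull is easy to picture.
words = ["the", "cat", "sat", "on", "a", "mat"]
embeddings = np.array([
    [0.1, 0.1],   # "the" -- interior point, pruned
    [1.0, 0.0],   # "cat"
    [0.0, 1.0],   # "sat"
    [0.2, 0.2],   # "on"  -- interior point, pruned
    [-1.0, 0.0],  # "a"
    [0.0, -1.0],  # "mat"
])
print(semantic_pillar_words(words, embeddings))
# -> ['cat', 'sat', 'a', 'mat']

Interior points ("the" and "on" in the toy data) do not change the hull, so removing them leaves the sentence's semantic region intact; in this sense the remaining words "pillar" the semantic expression of the sentence.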
