Opsomer, Rob; Knoth, Petr; van Polen, Freek; Trapman, Jantine and Wiering, Marco (2008).
Abstract
In this paper we apply machine learning text classification methods to two tasks: categorizing children's speech in the CHILDES Database by gender and by age. Both tasks are binary. For age, we distinguish two groups of children aged between 1.9 and 3.0 years. The boundary between the groups lies at 2.4, which is both the mean and the median age in our data set. We show that the machine learning approach, based on a bag of words, achieves much better results than measures traditionally used by linguists, such as average utterance length or the Type-Token Ratio. We achieve classification accuracies of 80.5% and 70.5% on the age and gender tasks, respectively.
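The abstract contrasts bag-of-words features with two traditional linguistic measures, average utterance length and the Type-Token Ratio. The sketch below shows how each of these feature types could be computed for one child's transcript. It is a minimal illustration, not the authors' pipeline: it assumes simple whitespace tokenization, lower-casing, and utterance length counted in words rather than morphemes. The function names are hypothetical, and the paper's actual preprocessing and classifier are not described in this record.

```python
from collections import Counter


def tokenize(utterances):
    """Split each utterance on whitespace and lower-case it (assumed tokenization)."""
    return [w.lower() for u in utterances for w in u.split()]


def type_token_ratio(utterances):
    """Number of distinct word types divided by the total number of word tokens."""
    tokens = tokenize(utterances)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_utterance_length(utterances):
    """Average number of words per utterance (counted in words, not morphemes)."""
    if not utterances:
        return 0.0
    return sum(len(u.split()) for u in utterances) / len(utterances)


def bag_of_words(utterances):
    """Sparse word-count vector for a whole transcript, stored as a Counter."""
    return Counter(tokenize(utterances))


if __name__ == "__main__":
    utts = ["the doggie", "want the ball", "ball"]
    print(type_token_ratio(utts))       # 4 distinct types / 6 tokens
    print(mean_utterance_length(utts))  # (2 + 3 + 1) / 3 words per utterance
    print(bag_of_words(utts))           # per-word counts for this transcript
```

The two traditional measures each reduce a transcript to a single number, whereas the bag of words keeps one count per vocabulary item. That richer representation is what a text classifier can exploit, which is consistent with the gap in accuracy the abstract reports.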
About
- Item ORO ID
- 24749
- Item Type
- Conference or Workshop Item
- ISSN
- 1568-7805
- Extra Information
- Proceedings of the twentieth Belgian-Dutch Conference on Artificial Intelligence, Enschede, October 30-31, 2008. Anton Nijholt, Maja Pantic, Mannes Poel and Hendri Hondorp (eds.)
- Academic Unit or School
-
Faculty of Science, Technology, Engineering and Mathematics (STEM) > Knowledge Media Institute (KMi)
Faculty of Science, Technology, Engineering and Mathematics (STEM)
- Research Group
-
Centre for Research in Computing (CRC)
Big Scientific Data and Text Analytics Group (BSDTAG)
- Copyright Holders
- © 2008 Universiteit Twente, Enschede
- Depositing User
- Kay Dave