Volume: 16 | Article ID: art00018
Preprocessing pipeline for Italian Cultural Heritage multimedia datasets
DOI: 10.2352/issn.2168-3204.2019.1.0.18 | Published Online: May 2019

Preprocessing is a fundamental step in Information Retrieval, Text Mining, and Natural Language Processing (NLP). While English-language datasets can rely on well-established tools and methods for text preprocessing, the situation for Italian is more nuanced, owing to several factors, not least that fewer experiments and studies have been carried out and fewer algorithms developed. Here we present work in progress whose purpose is to define a pipeline for preprocessing Italian texts. The individual steps of the pipeline have been implemented and tested separately on Cultural Heritage datasets. The results have been evaluated in the context of unsupervised automatic keyword extraction algorithms, such as RAKE and TextRank.
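As a rough illustration of what preprocessing ahead of keyword extraction can involve, the sketch below lowercases Italian text, tokenizes it, removes stopwords, and groups the remaining words into RAKE-style candidate phrases. It is not the authors' pipeline: the stopword list `ITALIAN_STOPWORDS` and the function names are hypothetical, and the phrase splitting is a simplified version of RAKE's candidate-generation step only, with no scoring.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real pipeline would use a much fuller Italian list.
ITALIAN_STOPWORDS = {
    "il", "lo", "la", "i", "gli", "le", "l", "un", "una", "uno",
    "di", "a", "da", "in", "con", "su", "per", "tra", "fra",
    "e", "o", "che", "è", "del", "dello", "della", "dei", "delle", "dell",
    "nel", "nella", "nell", "al", "alla", "all",
}


def tokenize(text):
    """Lowercase the text and return runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in ITALIAN_STOPWORDS]


def candidate_phrases(text):
    """Return RAKE-style candidates: maximal runs of non-stopword tokens.

    Phrases are broken at punctuation and at stopwords; no scoring is done.
    """
    phrases = []
    for fragment in re.split(r"[.,;:!?()]+", text):
        current = []
        for token in tokenize(fragment):
            if token in ITALIAN_STOPWORDS:
                if current:
                    phrases.append(" ".join(current))
                    current = []
            else:
                current.append(token)
        if current:
            phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    sample = "La tradizione orale lombarda e i canti popolari"
    print(remove_stopwords(tokenize(sample)))
    print(candidate_phrases(sample))
```

Splitting on apostrophes means elided articles such as "l'" or "dell'" surface as the short tokens "l" and "dell", which is why they are included in the stopword list; this is one of the Italian-specific details that an English-oriented tokenizer would not handle.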

  Cite this article 

Maria Teresa Artese, Isabella Gagliardi, "Preprocessing pipeline for Italian Cultural Heritage multimedia datasets," in Proc. IS&T Archiving 2019, 2019, pp. 81-85.

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2019
Archiving Conference
Society for Imaging Science and Technology