
Cultural artifact classification supports preservation, scholarship, and anti-trafficking enforcement, but computational approaches face severe data scarcity: limited labeled collections, inconsistent metadata, and complex annotation requirements. We present a methodology for large-scale dataset creation from aggregated heritage platforms that combines cultural marker extraction from unstructured text descriptions, cross-lingual querying across multiple European languages, visual clustering using self-supervised features, and collection-aware data splitting. Applied to Europeana, the methodology produced datasets ranging from 33,859 to 310,252 objects across 6 to 62 classes of varying granularity. Visual clustering removed 34% of noisy content while enabling fine-grained subcategories. A k-nearest-neighbor probe of the resulting feature space revealed that 79.5% of nearest neighbors share the same institutional source, nearly matching the 83.0% same-civilization rate. Institutional signatures are therefore almost as dominant in the feature space as cultural content. We quantify this bias per civilization and propose collection-aware splitting as a necessary countermeasure for realistic evaluation.
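
To make the last two ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "collection-aware" means every record from a given institution lands on the same side of the split, and that the neighbor probe uses plain Euclidean distance. The record keys (`institution`, `label`) and both function names are hypothetical.

```python
import random
from collections import defaultdict


def collection_aware_split(records, test_fraction=0.2, seed=0):
    """Split records so that no institution appears in both train and test.

    `records` is a list of dicts with a hypothetical "institution" key.
    """
    by_institution = defaultdict(list)
    for rec in records:
        by_institution[rec["institution"]].append(rec)

    # Shuffle institutions (not individual records) reproducibly.
    institutions = sorted(by_institution)
    random.Random(seed).shuffle(institutions)

    target = test_fraction * len(records)
    train, test = [], []
    for inst in institutions:
        # Assign whole institutions to test until the target size is reached.
        bucket = test if len(test) < target else train
        bucket.extend(by_institution[inst])
    return train, test


def same_attribute_neighbor_rate(features, attributes, k=1):
    """Fraction of k-nearest neighbors that share the query's attribute.

    `features` is a list of equal-length numeric vectors; `attributes` holds
    one value per vector (for example an institution or civilization name).
    """
    hits = total = 0
    for i, query in enumerate(features):
        # Squared Euclidean distance to every other vector, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(query, other)), j)
            for j, other in enumerate(features)
            if j != i
        )
        for _, j in dists[:k]:
            hits += attributes[j] == attributes[i]
            total += 1
    return hits / total if total else 0.0
```

Calling `same_attribute_neighbor_rate` once with institution names and once with civilization labels over the same feature vectors gives the kind of comparison the abstract reports. Because whole institutions are assigned at once, the test set can overshoot `test_fraction` when a few institutions hold most of the records.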

This work addresses the challenge of identifying the provenance of illicit cultural artifacts, a task often hindered by the lack of specialized expertise among law enforcement and customs officials. To support rapid assessment, we propose an improved deep learning model built on a pre-trained ResNet and fine-tuned for archaeological artifact recognition through transfer learning. The model integrates multi-level feature extraction, capturing both textural and structural features of artifacts, and incorporates self-attention mechanisms to enhance contextual understanding. In addition, we developed two artifact datasets: one of mixed earthenware types and one of coins. Both datasets are categorized by the age and region of the artifacts. Evaluations of the proposed model on these datasets demonstrate improved recognition accuracy, which we attribute to the enhanced feature representation.