Automated Metadata Annotation Article


The data behind the whizzy new AI technology is no older than the World Wide Web, which is barely 25 years old. To understand the limitations and potential biases of a pre-trained machine learning model like ChatGPT, it’s important to acknowledge what type of data it has been trained on. For example, scholarly and historical content is often not available in consumable, homogenized, and interoperable formats at the large volume that machine learning requires. MIT Press just published a paper, based on a panel I organized for the 2021 International Conference on Dublin Core and Metadata Applications, that presents the current state of automated metadata annotation in cultural heritage and research data: M. Wu, H. Brandhorst, M. Marinescu, J. Lopez, M. Hlava, and J. Busch. “Automated metadata annotation: What is and is not possible with machine learning.” 5:1 Data Intelligence (2023) 122–138.