2020
Kapidakis, S. Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020), vol. 2, 2020, ISBN: 978-989-758-474-9.
Tags: Controlled Terms, controlled vocabularies, Dendrogram, Dublin core, harvesting, Language, Linked Open Data, Metadata, OAI-PMH, Repeated Values

When resource descriptions use exactly the same value for an entity, that value is more easily parsed, identified and utilized by automatic procedures. The use of controlled values, although common and very useful, is usually not enforced during data entry. In this paper we study the use of controlled values in many harvested collections, examining all Dublin Core elements and their similarity. We mainly focus on the element language, as there is considerable standardization on how to denote language values, followed by other elements that normally use controlled values. We discovered values that are repeated many times and in many collections, and many more values that are used only once! The lack of coordination among collections during their creation results in many variations for each value, even when the value is used consistently and many times within a collection. The study uses dendrograms to reveal the current usage of the Dublin Core elements within and among active collections by clustering collections with similar values, and it helps in adopting better guidelines, designing better tools and improving the effectiveness of the collections.