2020
Kapidakis, S. Consistency and Interoperability on Dublin Core Element Values in Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting. In: Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020), vol. 2, 2020, ISBN: 978-989-758-474-9.
@conference{Kapidakis2020,
title = {Consistency and Interoperability on Dublin Core Element Values in Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting},
author = {Kapidakis, S.},
url = {https://www.scitepress.org/Papers/2020/101120/101120.pdf},
isbn = {978-989-758-474-9},
year = {2020},
date = {2020-11-04},
booktitle = {Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020)},
journal = {Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020)},
volume = {2},
pages = {181--188},
abstract = {When resource descriptions use the exact same value for an entity, this value is more easily parsed, identified and utilized by automatic procedures. The use of controlled values, even when it is common and very useful, is usually not enforced during data entry. In this paper we study the use of controlled values in many harvested collections, covering all Dublin Core elements and their similarity. We mainly focus on the element language, as there is a lot of standardization on how to denote language values, followed by other elements that normally use controlled values. We discovered values that are repeated many times and in many collections, and many more values that are used only once. The lack of coordination among collections during their creation results in many variations for each value, even when the value is used consistently and many times inside a collection. The study uses dendrograms to reveal the current usage of the Dublin Core elements inside and among active collections by clustering the collections with similar values, and helps in adopting better guidelines, designing better tools and improving the effectiveness of the collections.},
keywords = {Controlled Terms, controlled vocabularies, Dendrogram, Dublin core, harvesting, Language, Linked Open Data, Metadata, OAI-PMH, Repeated Values},
pubstate = {published},
tppubtype = {conference}
}
2019
Kapidakis, S. Repeated Values on Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting. In: 11th International Conference on Management of Digital EcoSystems, MEDES 2019, November 12--14, 2019, Limassol, Cyprus, ACM, 2019, ISBN: 978-1-4503-6238-2.
@conference{Kapidakis2019b,
title = {Repeated Values on Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting},
author = {Kapidakis, S.},
url = {https://dl.acm.org/doi/10.1145/3297662.3365795},
doi = {10.1145/3297662.3365795},
isbn = {978-1-4503-6238-2},
year = {2019},
date = {2019-11-14},
booktitle = {11th International Conference on Management of Digital EcoSystems MEDES 2019 November 12--14, 2019, Limassol, Cyprus, ACM 2019},
publisher = {ACM},
abstract = {Libraries use repeated values to always denote each entity or group of entities in a specific way. When resources have metadata elements with the exact same value, their correlation is made obvious, making the retrieval of all matching metadata records easier. The library uses guidelines on which metadata elements should only use controlled terms, and how these values will be selected. In this paper, we study the use of repeated values in many collections and also their effectiveness when all the collections are used together. We discovered values that are repeated often and values that are unusual, misused or just rare. Many metadata elements may use controlled terms as values, although such terms are traditionally used in only some of them. We observe differences in the use of the Dublin Core elements. The lack of coordination among collections results in many variations for each value. The study reveals the current usage of repeated values in active collections and helps in adopting better guidelines, designing better tools and improving the effectiveness of the collections.},
keywords = {harvesting, metadata harvesting, open archive initiative},
pubstate = {published},
tppubtype = {conference}
}
2018
Kapidakis, S. Metadata Synthesis and Updates on Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting. In: International Conference on Theory and Practice of Digital Libraries, TPDL 2018, LNCS 10450, Springer, 2018, ISSN: 0302-9743.
@conference{Kapidakis2018b,
title = {Metadata Synthesis and Updates on Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting},
author = {Kapidakis, S.},
url = {https://www.springerprofessional.de/en/metadata-synthesis-and-updates-on-collections-harvested-using-th/16097186},
issn = {0302-9743},
year = {2018},
date = {2018-09-14},
booktitle = {International Conference on Theory and Practice of Digital Libraries, TPDL 2018},
pages = {16--31},
publisher = {LNCS 10450, Springer},
abstract = {Harvesting tasks gather information into a central repository. We studied the metadata returned from 744,179 harvesting tasks from 2,120 harvesting services in 529 harvesting rounds during a period of two years. To achieve that, we initiated nearly 1,500,000 tasks, because a significant part of the Open Archive Initiative harvesting services never worked or have ceased working, while many other services fail occasionally. We studied the synthesis (elements and verbosity of values) of the harvested metadata, and how it evolved over time. We found that most services utilize almost all Dublin Core elements, but there are services with minimal descriptions. Most services have very minimal updates and, overall, the harvested metadata is slowly improving over time, with “description” and “relation” improving the most. Our results help us to better understand how and when the metadata are improved, and to have more realistic expectations about the quality of the metadata when we design harvesting or information systems that rely on them.},
keywords = {digital libraries, harvesting, Metadata, open archive},
pubstate = {published},
tppubtype = {conference}
}