2019
Triantafyllou, Ioannis; Vorgia, Froso; Koulouris, Alexandros. Hypatia Digital Library: a novel text classification approach for small text fragments. Journal Article. In: Journal of Integrated Information Management, vol. 4, no. 2, pp. 16-23, 2019.
@article{Ioannis2019,
title = {Hypatia Digital Library: a novel text classification approach for small text fragments},
author = {Triantafyllou, Ioannis and Vorgia, Froso and Koulouris, Alexandros},
url = {http://ejournals.uniwa.gr/index.php/JIIM/article/view/4420},
doi = {10.26265/jiim.v4i2.4420},
year = {2019},
date = {2019-10-30},
journal = {Journal of Integrated Information Management},
volume = {4},
number = {2},
pages = {16--23},
abstract = {Purpose - The purpose of this paper is to further investigate prior work of the authors in text classification in Hypatia, the digital library of the University of Western Attica. The main objective is to provide an accurate automated classification tool as an alternative to manual assignments.
Design/methodology/approach - The crucial point in text classification is the selection of the most important term-words for document representation. The specific document collection consists of 718 abstracts in Medicine, Tourism and Food Technology. Two weighting methods were investigated: classic TF.IDF and DEVMAX.DF. The last one was proposed by the authors as a more accurate term-word selection tool for smaller text fragments. Classification was conducted by applying 14 classifiers available on WEKA.
Findings - Classification process yielded an excellent ~97% precision score and DEVMAX.DF proved to perform better than classic TF.IDF.},
keywords = {algorithms, DEVMAX, digital libraries, text classification, TF-IDF, WEKA, word stemming},
pubstate = {published},
tppubtype = {article}
}
2018
Kapidakis, S. Metadata Synthesis and Updates on Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting. Conference. International Conference on Theory and Practice of Digital Libraries, TPDL 2018, LNCS 10450, Springer, 2018, ISSN: 0302-9743.
@conference{Kapidakis2018b,
title = {Metadata Synthesis and Updates on Collections Harvested using the Open Archive Initiative Protocol for Metadata Harvesting},
author = {Kapidakis, S.},
url = {https://www.springerprofessional.de/en/metadata-synthesis-and-updates-on-collections-harvested-using-th/16097186},
issn = {0302-9743},
year = {2018},
date = {2018-09-14},
booktitle = {International Conference on Theory and Practice of Digital Libraries, TPDL 2018},
pages = {16--31},
publisher = {LNCS 10450, Springer},
abstract = {Harvesting tasks gather information to a central repository. We studied the metadata returned from 744179 harvesting tasks from 2120 harvesting services in 529 harvesting rounds during a period of two years. To achieve that, we initiated nearly 1,500,000 tasks, because a significant part of the Open Archive Initiative harvesting services never worked or have ceased working while many other services fail occasionally. We studied the synthesis (elements and verbosity of values) of the harvested metadata, and how it evolved over time. We found that most services utilize almost all Dublin Core elements, but there are services with minimal descriptions. Most services have very minimal updates and, overall, the harvested metadata is slowly improving over time with “description” and “relation” improving the most. Our results help us to better understand how and when the metadata are improved and have more realistic expectations about the quality of the metadata when we design harvesting or information systems that rely on them.},
keywords = {digital libraries, harvesting, Metadata, open archive},
pubstate = {published},
tppubtype = {conference}
}