Invenio digital library technology for open research data repositories


2016 (San Francisco, Oct 2016)


Tibor Simko et al. (presenter TBD)

talk (poster)


In this paper we present the new Invenio 3 digital library framework and
demonstrate its application in the field of open research data
repositories. The Invenio digital library framework is composed of more than
sixty independently developed packages that share a set of common
patterns and communicate via well-established APIs. Digital
repository managers can cherry-pick individual modules to
build a customised digital repository solution that targets their
individual needs and use cases. We present how the Invenio technology
has been applied in two research data services: (1) the CERN Open Data
portal that provides access to the approved open datasets and software
of the ALICE, ATLAS, CMS and LHCb collaborations; (2) the Zenodo service
that offers an open research data archiving solution to world-wide
scientific communities in any research discipline. We discuss the role
of underlying technologies such as JSON Schema for controlling
metadata structure, Elasticsearch for information retrieval, and the
CERN EOS system for data storage, as well as the role of virtual
environments (CernVM) and container-based solutions (Docker) that,
together with the archived data analysis software (Jupyter notebooks,
custom analysis code), aim to make research data analyses reproducible
even many years after their publication.
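
As an illustration of how a schema can control metadata structure, the
following minimal sketch checks a record against a small schema
description. It is not Invenio code: the schema, its field names
(title, creators, publication_year) and the validate_record helper are
hypothetical, and the helper handles only the "required", "properties"
and "type" keywords rather than the full JSON Schema vocabulary.

```python
import json

# Hypothetical, simplified schema for a dataset record. The field names are
# illustrative assumptions, not the schema used by any Invenio service.
RECORD_SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["title", "creators"],
  "properties": {
    "title": {"type": "string"},
    "creators": {"type": "array"},
    "publication_year": {"type": "integer"}
  }
}
""")

# Mapping from the schema type names used above to Python types.
JSON_TYPES = {"string": str, "integer": int, "array": list, "object": dict}


def validate_record(record, schema):
    """Return a list of error messages; an empty list means the record passed.

    Only the "required", "properties" and "type" keywords are handled.
    """
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    errors = []
    # Every field listed under "required" must be present.
    for field in schema.get("required", []):
        if field not in record:
            errors.append(f"missing required field: {field}")
    # Every present field with a declared type must match that type.
    for field, rules in schema.get("properties", {}).items():
        if field not in record:
            continue
        value = record[field]
        expected = JSON_TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: expected {rules['type']}")
    return errors


if __name__ == "__main__":
    record = {"title": "Example dataset", "creators": ["A. Author"]}
    print(validate_record(record, RECORD_SCHEMA))
```

A record that passes returns an empty list, so a repository could refuse
to store any record for which the list is non-empty. A production
service would more likely rely on a complete JSON Schema validator than
on a hand-written subset like this one.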

