Report on Repository Survey in Europe, November 2023
Page 11 of 36
Table 2: Languages of metadata and resources in the repositories
Countries with over 15 repositories in the survey. ENG % and % LOCAL 2nd: English is the
predominant language, with the local language as second predominant language. LOCAL %
and % ENG 2nd: the local language is predominant, with English as second predominant
language.
Country           ENG %   % LOCAL 2nd   LOCAL %   % ENG 2nd   Other second languages
Croatia           9%      6%            91%       69%         Italian
Portugal          22%     22%           78%       67%         Spanish
Poland            32%     26%           68%       59%
Spain             34%     32%           66%       59%         Catalan (3)
Austria           41%     35%           59%       53%         Hungarian
Serbia            42%     42%           58%       45%         Russian
Germany           56%     42%           44%       36%
Switzerland (*)   70%     75%           20%       15%
Italy             80%     67%           20%       20%         Spanish
United Kingdom    100%                                        Spanish (6), French (5), German (3), Welsh (2), Polish, Italian & Chinese (a repository belonging to the international publisher)
(*) Local languages: DEU, ITA, FRA
Who can deposit
Over 75% of repositories in the survey serve their local communities and offer
services only to persons affiliated with their institution (Figure 7). 6% of
respondent repositories are open to anyone, 4% are open to domain
communities, and 1% are open to persons from a specific country. Most of the 9%
who chose the ‘other’ category clarified that the repository was an institutional
repository offering a mediated deposit service, whereby repository staff
deposit content on behalf of the creators. The share of institutional
repositories was therefore actually over 80%.
Figure 7: Repository accepts content from which communities
National networks
About half of respondent repositories indicated they were part of a national level
network or service (Figure 8). The types of services/networks are varied and include
harvesters, portals and other discovery/indexing services; communities of practice;
shared platforms; open source platform networks; and domain networks. However, the
responses were inconsistent in many countries, with some respondents from a given
country indicating they belong to a network and others indicating they did not. This
could be because respondents had a different interpretation of what is a national
network or national services, but also some national networks may serve only a subset
of repositories in their country.
However, a substantial share of responding repositories, almost half, feel part of
an existing network in their country. Several repositories belong to more than
one type of national network or service. Given that the communities advancing
open access and those advancing research data management are often
distinct from each other, it is not surprising that respondents from these different
sectors named different national services.
Figure 8: Number of respondents who are part of a national network
Hosting model for repository
57% (223) of respondent repositories are locally hosted, while 43% (165) of respondent
repositories are hosted by an external provider (Figure 9). Most external providers
are national hosting platforms, university data centres, or national cloud services.
7 respondent repositories are hosted by commercial providers.
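As a quick check on the hosting figures, the two reported counts imply that 388 repositories answered this question. The short Python sketch below recomputes the rounded shares from those counts; the function name `shares` is illustrative and not taken from the survey.

```python
def shares(counts):
    """Return each category's share of the total as a rounded whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts reported in the hosting paragraph.
hosting = {"local": 223, "external": 165}

print(sum(hosting.values()))  # number of repositories behind the percentages
print(shares(hosting))        # rounded shares for local and external hosting
```

The rounded results match the 57% and 43% given in the text.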
Figure 9: Local or external hosting of repository
Software platforms
DSpace is the most commonly used software platform, with 41% of respondents
indicating they currently use it. Other widely used platforms are EPrints (11%),
Fedora/Islandora (11%) and Dataverse (4%). Several other platforms were also
reported: Invenio (3%), Pure (3%), OPUS (3%), Omega-PSIR (2%), Samvera (1%), and
Figshare (1%), along with a variety of other software types (Figure 10). It is worth
noting that 8% of respondents run their repositories on locally developed software:
4% of institutional repositories and 22% of national / domain / generalist
repositories use a locally developed software platform.
Add-ons/patch/code added to the codebase
About 61% of all respondents indicated that they have changed or added to the
basic “out of the box” versions of the repository software platform (Figure 11).
Figure 10: Software platforms used by repositories
Figure 11: Number of respondents that adopt add-ons, patches or new code
Such changes to the codebase are more frequent in EPrints (83.7%) and DSpace (63.2%)
repositories than in repositories on all the other platforms (58%).
Software Upgrades
42% of repositories upgraded their repository platforms in 2022, and 74% of
repositories stated that they were planning to upgrade in 2023. 21% of the repositories
that upgraded in 2022 plan to do so again in 2023. In total, about 60% of
respondents either upgraded their repository in 2022 or plan to
upgrade to a more recent version in 2023 (Figures 12 and 13).
Figure 12: Year of last major upgrade
Figure 13: Year of next major upgrade
Metadata schemas
The most common metadata schema adopted in repositories is Dublin Core, with
77% of repositories indicating they provide support for Dublin Core (Figure 14). 26%
provide support for the DataCite schema, which was initially developed for
research data. Unsurprisingly, there was a positive correlation between collecting
research data and supporting the DataCite schema. Just
under half of respondents indicated that they support more than one type of
metadata schema.
Figure 14: Metadata schemas available to use in the repository
OpenAIRE Guidelines
The OpenAIRE guidelines, which are more extensive and detailed than Dublin Core and
include additional metadata elements such as funder and project IDs and access status,
are becoming a widely used standard in Europe as they have been recommended by the
European Commission (EC) as part of their open access policy. Many repositories in
Europe (74%) have adopted the OpenAIRE Guidelines (Figure 15). It is worth noting that
a significant number of repositories (167) are still using older versions of the
Guidelines (which are less granular and don’t include identifier schemes for authors,
organisations or funders, and the COAR Controlled Vocabularies), meaning they do not
meet the current EC requirements for metadata.
Figure 15: Support for OpenAIRE Guidelines
Licences
Almost all repositories (96%) offer users the option of choosing a specific licence,
the most common of which are Creative Commons licences (91%). Some repositories offer
several licensing options (Figure 16).
Figure 16: Licences available in the repository
Author IDs
ORCID IDs are quite widely supported: 260 repositories (66%) provide a metadata
field for ORCID in their records, 71 (18%) support national IDs, and 78 support
other types of IDs. 97 repositories, about 25% of respondents, do not support
any type of author ID (Figure 17).
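The survey does not say whether repositories validate the values entered in their ORCID field. As an illustration of what such a check could look like, the sketch below verifies the final character of an ORCID identifier, which is a check character computed with the ISO 7064 MOD 11-2 algorithm. The function names are illustrative, and 0000-0002-1825-0097 is the sample identifier used in ORCID's own documentation.

```python
import re

# Four hyphen-separated groups of four; the last character may be a digit or X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the hyphenated layout and the trailing check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # sample identifier from ORCID documentation
print(is_valid_orcid("0000-0002-1825-0098"))  # same identifier with the last character altered
```

A check like this only catches typing errors; it does not confirm that the identifier is registered or belongs to the named author.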
Figure 17: Author IDs supported by the repository
Resource Persistent Identifiers
Many repositories assign at least one type of persistent identifier (PID) to the
resources deposited, with the most common one being DOIs (Digital Object Identifiers)
(46%), followed by Handles (44%). 67 repositories support both Handles and DOIs. In the
“other” category, most indicated that they are using a URN (Uniform Resource Name) or
ISSN. About 10% of repositories do not assign / support any type of PID for the
resources in their repository.
Figure 18: Persistent identifiers for resources assigned by the repository
Other services
Preservation
Approximately 63% of respondents (229) have a formal preservation policy in
place at their repository, while 37% (136) indicated they have no preservation
policy (Figure 19). In the comments, 15 respondents indicated that they were
in the process of developing a policy, and several respondents noted that,
while they don’t have a formal policy, they do have a variety of preservation
practices and procedures in place, including making back-up copies/mirroring
content elsewhere. Some repositories are integrated with broader institutional
preservation systems.
Figure 19: Repositories with a preservation policy
Usage statistics
Most respondent repositories (73%) are collecting usage statistics, with several using
more than one usage statistics service. Only 33 repositories (about 10%) indicated
that they do not collect any type of usage statistics. The most common approach is to
use the local repository statistics functionality provided by the software platform.
Figure 20: Type of repository statistics services used by the repository
Curation
Most repositories apply some level of curation upon deposit of a new resource. Metadata
validation is the most common (checking that it is correct and/or complete), followed
by mediated deposit (repository staff deposit on behalf of the researchers) and content
validation (checking file formats and copyright) (Figure 21). In the “other” category,
respondents listed things such as review for compliance with other deposit guidelines,
checksum validation, and ethics review. Repositories do not undertake editorial review,
but rather ensure resources are described and formatted properly.
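The completeness part of metadata validation can be sketched in a few lines of Python. This is only an illustration: the list of required elements is an assumption made for the example (the names are Dublin Core elements, but the survey does not say which fields any repository requires), and `missing_elements` is a hypothetical helper.

```python
# Assumed for illustration only: Dublin Core element names that a repository
# might treat as mandatory. The survey does not specify any such list.
REQUIRED_ELEMENTS = ("title", "creator", "date", "identifier", "rights")


def missing_elements(record, required=REQUIRED_ELEMENTS):
    """Return the required elements that are absent or blank in a record."""
    missing = []
    for name in required:
        value = record.get(name)
        # Treat None, empty strings and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# A made-up record: "identifier" is blank and "rights" is not supplied.
record = {
    "title": "Example dataset",
    "creator": "Doe, Jane",
    "date": "2023-05-01",
    "identifier": "",
}
print(missing_elements(record))
```

A repository could run a check of this kind at deposit time and return the list to the depositor or to the staff member handling a mediated deposit.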
Figure 21: Curation process undertaken by the repository
Certification
23% of respondents said that the repository has undergone some type of
certification (Figure 22), with CoreTrustSeal being the most common, followed
by DINI and Data Seal of Approval. No significant difference in certification
rates was found across repository types, apart from a slightly higher rate for
research data repositories. 19 respondents reported compliance with national
aggregator requirements or OpenAIRE as “certification” (which is less a
certification than a validation of the use of the OpenAIRE guidelines) (Table 3).
Figure 22: The repository has undergone some type of certification
Table 3: Type of certification undergone by repository
CoreTrustSeal                    20
DINI                             14
National Aggregator Compliance   14
Data Seal of Approval            7
OpenAIRE Compliance              5
ISO                              3
Other value added services
Numerous other services beyond the ones mentioned above were described by
respondents. Most common is the integration of repositories with other institutional
services, such as a CRIS (current research information system), academic profile
pages, or university websites.
A significant number of respondents also indicated that repository resources are
reused by other types of external systems such as aggregators and discovery
systems. They are also integrated into customised collections at the national level,
reused for research assessment exercises (e.g. REF), and incorporated into
national education curricula.
Enhancement of repository records using metadata from other systems (e.g. using
ORCID, Crossref records) is also common, as is the export of repository metadata
to other systems. Other tools/functionalities such as using the CORE recommender
system, digitization services, plagiarism detection, and request-a-copy were also
mentioned.
Training was also widely referred to, especially by data repositories, which often
provide training to researchers on how to format their data and how to complete
data management plans. Some repositories offer assistance for authors to
navigate copyright and other licensing issues.
Main funding sources
Institutional funding represents the predominant funding source for repositories,
with 77% of respondents indicating their main funding source was their institution.
13% receive external project funding (Figure 23). Very few repositories (5, or just
over 1%) charge depositors a fee, and on further examination, these fees were applied
only to certain types of deposits (i.e., unusually large data sets that require
significant storage capacity). Most repositories rely on a single funding source,
with only a few receiving funds from more than one source (mainly institution and
project funds).
Figure 23: Predominant funding sources for the repository
Staffing
After removing several outlier responses with unrealistically high numbers (we
presume these questions were misinterpreted by some), the average number of
staff per repository was found to be just under 3 full-time staff members (FTE). The
staffing for repositories is spread across several positions: repository managers,
technical support, metadata and content curation, and “other” positions. Close
to half of the staffing of repositories (47%) is devoted to metadata and content
curation, 27% to the repository manager position and 19% to technical support
positions (Figure 24). Over half of respondent repositories have 2 or fewer full-time
employees (Figure 25).
Figure 24: Distribution of staffing in repositories
Figure 25: Number of staff members per repository