December 2023
Kathleen Shearer, COAR | Silvia Nakano, COAR | Eloy Rodrigues, University of Minho
Natalia Manola, OpenAIRE | Martine Pronk, LIBER | Vanessa Proudman, Sparc Europe
Report on Repository Survey in Europe, December 2023
Page 2 of 36
Executive Summary
“I think repositories are absolutely necessary as part of the chain to create a
sustainable Open Access world.” - Survey respondent
Open Science is ushering in a new paradigm for research; one in which all
researchers have unprecedented access to the full corpus of research for
analysis, text and data mining, and other new research methods. A prerequisite
for achieving this vision is a strong and well-functioning network of repositories that
provides human and machine access to the wide range of valuable research
outputs. It will require transitioning repositories from isolated institutional services
towards the vision of the next generation repository, whereby repositories are part
of a distributed, globally networked infrastructure for scholarly communication,
on top of which layers of value-added services can be deployed.
In January 2023, OpenAIRE, LIBER, SPARC Europe, and COAR launched a joint
strategy aimed at strengthening the European repository network. Through this
strategy we are committed to working together - and with other relevant
organisations - to develop and execute a plan that will reinforce and enhance
repositories in Europe. As a first step, a survey of the European repository
landscape was undertaken in February-March 2023.
The survey had 394 responses from repositories in 34 countries. We found that,
collectively, European repositories acquire, preserve and provide open access to
tens or possibly hundreds of millions of valuable research outputs and represent
critical, not-for-profit infrastructure in the European open science landscape. They
are used for sharing articles that may be paywalled in published journals, but also
for providing access to a large variety of other types of research outputs including
research data, theses/dissertations, conference papers, preprints, code, and so
on.
A large proportion of repositories are based at universities, making them quite
sustainable and, by every indication, their collections are being well-used by the
research community and beyond. The number and range of value-added
services to which repositories are contributing demonstrates that European
repositories have been progressing towards the vision of the next generation
repository, which is about moving beyond the repository as an institutional service,
to the networked repository that is an integral part of the broader ecosystem. In
addition, repositories are well placed to support the expansion of open science
practices across Europe and the reformation of research assessment, which
places a greater emphasis on inclusiveness, diversity, and transparency.
However, to fully achieve our vision, there is still work to be done. The survey has
exposed a number of important areas where the current repository landscape
could be strengthened. In particular, we found that repositories struggle with three
main challenges:
(1) maintaining up-to-date, highly functioning software platforms;
(2) applying consistent and comprehensive good practices in terms of
metadata, preservation, and usage statistics; and
(3) gaining appropriate visibility in the scholarly ecosystem.
Despite the challenges, the current climate offers exciting opportunities for
repositories. Many funders are actively promoting the repository route for articles
because of their role in supporting equitable access to content (i.e. no fees to
access or deposit). The value proposition for open science is growing and
repositories are increasingly recognised as the main mechanism for collecting
and providing access to a wide range of other research outputs. Add to this the
nascent but growing interest in the publish-review-curate model, in which
repositories have a central function¹, and it seems they are well placed to expand
their current role in the ecosystem.
To support this evolving role for repositories, OpenAIRE, LIBER, SPARC Europe and
COAR have identified three areas where we can work together to help advance
and strengthen repositories in Europe:
- Highlighting the value proposition and advocating for the critical role of repositories in Europe
- Propagating best practices for repositories across the continent
- Assisting with the creation and coordination of national networks
In the coming months, our organisations will develop more concrete plans for advancing each of these areas.
¹ From cOAlition S: To illustrate how a scholar-led communication system can (and already does) work in practice and supports the principles of Open Science, we highlight the Publish, Review, Curate (PRC) model, which we find particularly promising. https://www.coalition-s.org/wp-content/uploads/2023/10/Towards_Responsible_Publishing_web.pdf
Table of Contents
Executive Summary
Introduction
Results
  Number of respondents
  Types of institutions
  Predominant content types in the repository
  Number of items in the repository
  Languages of metadata and content
  Who can deposit
  National networks
  Hosting model for repository
  Software platforms
  Add-ons/patch/code added to the codebase
  Software Upgrades
  Metadata schemas
  OpenAIRE Guidelines
  Licences
  Author IDs
  Resource Persistent Identifiers
  Other services
  Preservation
  Usage statistics
  Curation
  Certification
  Other value added services
  Main funding sources
  Staffing
  Sustainability
  Challenges
  Solutions / strategies
Analysis
  Coverage
  Collections
  Multilingualism
  Services
  Metadata and persistent identifiers
  Technologies and functionalities
  Certification
  Sustainability and funding
Conclusions
Opportunities and Next Steps
Data Availability Statement
Introduction
Open Science is ushering in a new paradigm for research; one in which all
researchers have unprecedented access to the full corpus of research for
analysis, text and data mining, and other new research methods. A prerequisite
for achieving this vision is a strong and well-functioning network of repositories that
provides human and machine access to the wide range of valuable research
outputs. This will require transitioning repositories from isolated institutional services
towards the vision of the next generation repository, whereby repositories are part
of a distributed, globally networked infrastructure for scholarly communication,
on top of which layers of value-added services can be deployed.
Yet, progress towards this vision has been relatively slow, and many repositories
continue to struggle with older technologies and a number of other challenges.
To address this, COAR and other key stakeholders in different regions and countries
have been working together to adopt strategies that will strengthen repository
networks and accelerate the adoption of leading-edge functionalities².
Currently, Europe has one of the most well-developed networks globally with
hundreds of repositories hosted by universities, research centres, government
departments, and not-for-profit organisations. However, there are significant
variations across the European repository landscape. For Europe to maintain its
position as a global leader in open science, we must ensure there is a strong and
sustainable network of open repositories.
In January 2023, OpenAIRE, LIBER, SPARC Europe, and COAR launched a joint
strategy aimed at strengthening the European repository network. Through this
strategy we are committed to working together - and with other relevant
organisations - to develop and execute an action plan that will reinforce and
enhance repositories in Europe.
As a first step, a survey of the European repository landscape was undertaken in
February-March 2023. The aim of the survey was to gain a better understanding
of the repository ecosystem in Europe. The survey was designed and disseminated
by partner organisations through various channels including website
announcements, email lists, Twitter (X) and other social media.
This report provides the results of the survey and will assist the organisations in
developing relevant and effective activities to strengthen repositories in the
region.
² https://www.coar-repositories.org/news-updates/what-we-do/regional-initiatives/
Results
For the purposes of this survey, an open repository was defined as a digital
management system that collects one or more types of research output and
provides free access to the content to all users (with the exception of restrictions
for sensitive data).
Number of respondents
There were 394 responses from 34 countries in Europe (Figure 1), with 10 countries
(Austria, Croatia, Germany, Italy, Poland, Portugal, Serbia, Spain, Switzerland, UK)
that each had over 15 responses. In certain areas, we provide a brief snapshot
of results for each of these countries, based on a more in-depth analysis of
their situation.
Figure 1: Geographic distribution of survey respondent repositories
Types of institutions
Most respondent repositories were based at universities, followed by research centres (Figure 2). The rest fell into the “other” category, which was composed of
a diversity of institution types including libraries, university departments, scientific
institutions, hospitals, government entities and not-for-profit organisations. Two
respondent repositories were managed by publishers. As many university
repositories are managed by the library, we assume that a number of the
repositories reported as being based at a university are also located in the
library.
Figure 2: Types of institutions where repositories are based
Predominant content types in the repository
Most repositories reported collecting a variety of content types, with 54% of
respondents indicating that the predominant content type in the repository was
published articles (Table 1). Theses and dissertations are predominant for 19% of
respondents and research data for 13%. 14% of respondents indicated preprints
were in the top 3 of their content types, but only 1% (5 repositories) reported they
were the predominant type.
Repositories with research data as their predominant content type tend not to
collect publications, theses, preprints, and other content types; rather, they
seem to specialise in research data only. The repositories that collect
predominantly publications (articles, theses, and preprints) usually collect a
variety of content types, including research data (Figure 3).
Respondents who chose the “other” category were not asked to specify content
types, so we do not have further information about what these are.
Table 1: Top three most predominant content types in repositories
Rank    Published articles   Preprints   Research data   Theses & dissertations   Conference proceedings   Other
1st     213 (54%)            5 (1%)      52 (13%)        74 (19%)                 10 (3%)                  20 (5%)
2nd     68 (17%)             23 (6%)     11 (3%)         71 (18%)                 97 (25%)                 57 (14%)
3rd     14 (4%)              26 (7%)     19 (5%)         80 (20%)                 89 (23%)                 42 (11%)
Top 3   75%                  14%         21%             57%                      50%                      30%
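The percentages in Table 1 can be reproduced from the counts. The short Python sketch below is illustrative only and is not part of the survey methodology. It assumes that every percentage is calculated against all 394 survey responses (the first-choice counts themselves sum to 374), an assumption that is consistent with the published figures. The names TOTAL_RESPONSES, COUNTS, share and top3_share are introduced here for illustration.

```python
# Counts copied from Table 1, as (1st, 2nd, 3rd) choice per content type.
COUNTS = {
    "Published articles": (213, 68, 14),
    "Preprints": (5, 23, 26),
    "Research data": (52, 11, 19),
    "Theses & dissertations": (74, 71, 80),
    "Conference proceedings": (10, 97, 89),
    "Other": (20, 57, 42),
}

# Assumption: percentages use all 394 survey responses as the denominator.
TOTAL_RESPONSES = 394


def share(count, total=TOTAL_RESPONSES):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def top3_share(ranks, total=TOTAL_RESPONSES):
    """Return the share of respondents placing a type in their top three."""
    return share(sum(ranks), total)


if __name__ == "__main__":
    for name, ranks in COUNTS.items():
        per_rank = ", ".join(f"{share(c)}%" for c in ranks)
        print(f"{name}: {per_rank}; top 3: {top3_share(ranks)}%")
```

Running the script prints one line per content type, which can be compared against the rows of Table 1.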
Figure 3: Top six predominant types
Number of items in the repository
Collection sizes (number of items in each repository) vary significantly across
respondent repositories, with about 20% having fewer than 1,000 items and the six
largest repositories having more than a million records each (Figure 4). The largest
repository, Europe PMC, contains over 8.5 million full text records. The most
frequent collection sizes of repositories are from 1,000 to 10,000 items (32.5%);
10,000 to 50,000 (27.5%); and fewer than 1,000 items (21.8%). The average collection
size for institutional repositories (repositories that collect their local
research outputs) is 64,859 items, and for other repositories (domain, data, and
national repositories) the average is 386,088 items.
Figure 4: Repository collection sizes
Languages of metadata and content
Figure 5: Predominant language of resources in repositories
For 57% of the repositories, the predominant language of the repository records is English. If we exclude UK repositories (62 respondents), 47% (142 of 299) of repositories reported that English was the predominant language of content (Figures 5 and 6).
Figure 6: Second most predominant language of resources in repositories
For repositories whose predominant language is not English, it is always the national language that was reported as being predominant. For the majority of those repositories, English is the second most predominant language, with a few exceptions shown in the table below (Table 2). Among the countries with more than 15 repositories represented in the survey, some were notable for having low proportions of English content: Croatia, Portugal, Poland, and Spain.