

December 2023

Kathleen Shearer, COAR | Silvia Nakano, COAR | Eloy Rodrigues, University of Minho
Natalia Manola, OpenAIRE | Martine Pronk, LIBER | Vanessa Proudman, SPARC Europe

Report on Repository Survey in Europe, December 2023 Page 2 of 36
Executive Summary

“I think repositories are absolutely necessary as part of the chain to create a sustainable Open Access world.” - Survey respondent

Open Science is ushering in a new paradigm for research: one in which all researchers have unprecedented access to the full corpus of research for analysis, text and data mining, and other new research methods. A prerequisite for achieving this vision is a strong and well-functioning network of repositories that provides human and machine access to the wide range of valuable research outputs. It will require transitioning repositories from isolated institutional services towards the vision of the next generation repository, whereby repositories are part of a distributed, globally networked infrastructure for scholarly communication, on top of which layers of value-added services can be deployed.
In January 2023, OpenAIRE, LIBER, SPARC Europe, and COAR launched a joint strategy aimed at strengthening the European repository network. Through this strategy we are committed to working together - and with other relevant organisations - to develop and execute a plan that will reinforce and enhance repositories in Europe. As a first step, a survey of the European repository landscape was undertaken in February-March 2023. The survey had 394 responses from repositories in 34 countries. We found that, collectively, European repositories acquire, preserve and provide open access to tens or possibly hundreds of millions of valuable research outputs and represent critical, not-for-profit infrastructure in the European open science landscape. They are used for sharing articles that may be paywalled in published journals, but also for providing access to a large variety of other types of research outputs including research data, theses/dissertations, conference papers, preprints, code, and so on.
A large proportion of repositories are based at universities, making them quite sustainable and, by every indication, their collections are being well used by the research community and beyond. The number and range of value-added services to which repositories are contributing demonstrate that European repositories have been progressing towards the vision of the next generation repository, which is about moving beyond the repository as an institutional service to the networked repository that is an integral part of the broader ecosystem. In addition, repositories are well placed to support the expansion of open science practices across Europe and the reform of research assessment, which places a greater emphasis on inclusiveness, diversity, and transparency.
However, to fully achieve our vision, there is still work to be done. The survey has exposed a number of important areas where the current repository landscape could be strengthened. In particular, we found that repositories struggle with three main challenges:
(1) maintaining up-to-date, highly functioning software platforms;
(2) applying consistent and comprehensive good practices in terms of metadata, preservation, and usage statistics; and
(3) gaining appropriate visibility in the scholarly ecosystem.
Despite the challenges, the current climate offers exciting opportunities for repositories. Many funders are actively promoting the repository route for articles because of repositories' role in supporting equitable access to content (i.e. no fees to access or deposit). The value proposition for open science is growing and repositories are increasingly recognised as the main mechanism for collecting and providing access to a wide range of other research outputs. Add to this the nascent but growing interest in the publish-review-curate model, in which repositories have a central function [1], and it seems they are well placed to expand their current role in the ecosystem.
To support this evolving role for repositories, OpenAIRE, LIBER, SPARC Europe and COAR have identified three areas where we can work together to help advance and strengthen repositories in Europe:

  1. Highlighting the value proposition and advocating for the critical role of repositories in Europe
  2. Propagating best practices for repositories across the continent
  3. Assisting with the creation and coordination of national networks

In the coming months, our organisations will develop more concrete plans for advancing each of these areas.

1 From cOAlition S: To illustrate how a scholar-led communication system can (and already does) work in practice and supports the principles of Open Science, we highlight the Publish, Review, Curate (PRC) model, which we find particularly promising. https://www.coalition-s.org/wp-content/uploads/2023/10/Towards_Responsible_Publishing_web.pdf

Table of Contents
  Executive Summary
  Introduction
  Results
    Number of respondents
    Types of institutions
    Predominant content types in the repository
    Number of items in the repository
    Languages of metadata and content
    Who can deposit
    National networks
    Hosting model for repository
    Software platforms
    Add-ons/patch/code added to the codebase
    Software Upgrades
    Metadata schemas
    OpenAIRE Guidelines
    Licences
    Author IDs
    Resource Persistent Identifiers
    Other services
    Preservation
    Usage statistics
    Curation
    Certification
    Other value added services
    Main funding sources
    Staffing
    Sustainability
    Challenges
    Solutions / strategies
  Analysis
    Coverage
    Collections
    Multilingualism
    Services
    Metadata and persistent identifiers
    Technologies and functionalities
    Certification
    Sustainability and funding
  Conclusions
  Opportunities and Next Steps
  Data Availability Statement

Introduction
Open Science is ushering in a new paradigm for research: one in which all researchers have unprecedented access to the full corpus of research for analysis, text and data mining, and other new research methods. A prerequisite for achieving this vision is a strong and well-functioning network of repositories that provides human and machine access to the wide range of valuable research outputs. This will require transitioning repositories from isolated institutional services towards the vision of the next generation repository, whereby repositories are part of a distributed, globally networked infrastructure for scholarly communication, on top of which layers of value-added services can be deployed.
Yet progress towards this vision has been relatively slow, and many repositories continue to struggle with older technologies and a number of other challenges. To address this, COAR and other key stakeholders in different regions and countries have been working together to adopt strategies that will strengthen repository networks and accelerate the adoption of leading-edge functionalities [2].
Currently, Europe has one of the most well-developed networks globally, with hundreds of repositories hosted by universities, research centres, government departments, and not-for-profit organisations. However, there are significant variations across the European repository landscape. For Europe to maintain its position as a global leader in open science, we must ensure there is a strong and sustainable network of open repositories.
In January 2023, OpenAIRE, LIBER, SPARC Europe, and COAR launched a joint strategy aimed at strengthening the European repository network. Through this strategy we are committed to working together - and with other relevant organisations - to develop and execute an action plan that will reinforce and enhance repositories in Europe.
As a first step, a survey of the European repository landscape was undertaken in February-March 2023. The aim of the survey was to gain a better understanding of the repository ecosystem in Europe. The survey was designed by the partner organisations and disseminated through various channels, including website announcements, email lists, Twitter (X) and other social media. This report provides the results of the survey and will assist the organisations in developing relevant and effective activities to strengthen repositories in the region.

2 https://www.coar-repositories.org/news-updates/what-we-do/regional-initiatives/

Results
For the purposes of this survey, an open repository was defined as a digital management system that collects one or more types of research output and provides free access to the content to all users (with the exception of restrictions for sensitive data).
Number of respondents
There were 394 responses from 34 countries in Europe (Figure 1), with 10 countries (Austria, Croatia, Germany, Italy, Poland, Portugal, Serbia, Spain, Switzerland, UK) that each had over 15 responses. In certain areas, we provide a short snapshot of the results for each of these countries, along with a more in-depth analysis of their situation.

Figure 1: Geographic distribution of survey respondent repositories
Types of institutions
Most respondent repositories were based at universities, followed by research centres (Figure 2). The rest fell into the “other” category, which was composed of a diversity of institution types including libraries, university departments, scientific institutions, hospitals, government entities and not-for-profit organisations. Two respondent repositories were managed by publishers. As many university repositories are managed by the library, we assume that a number of the repositories reported as being based at a university are also located in the library.

Figure 2: Types of institutions where repositories are based
Predominant content types in the repository
Most repositories reported collecting a variety of content types, with 54% of respondents indicating that the predominant content type in the repository was published articles (Table 1). Theses and dissertations are predominant for 19% of respondents and research data for 13%. Preprints were among the top three content types for 14% of respondents, but only 1% (5 repositories) reported them as the predominant type.
Repositories with research data as their predominant content type tend not to collect publications, theses, preprints, and other content types; rather, they seem to specialise in research data only. The repositories that collect predominantly publications (articles, theses, and preprints) usually collect a variety of content types, including research data (Figure 3).

Respondents were not asked to specify content types if they chose the “other” category, so we do not have further information about what they are.

Table 1: Top three most predominant content types in repositories

Rank    Published articles  Preprints  Research data  Theses & dissertations  Conference proceedings  Other
1st     213 (54%)           5 (1%)     52 (13%)       74 (19%)                10 (3%)                 20 (5%)
2nd     68 (17%)            23 (6%)    11 (3%)        71 (18%)                97 (25%)                57 (14%)
3rd     14 (4%)             26 (7%)    19 (5%)        80 (20%)                89 (23%)                42 (11%)
Top 3   75%                 14%        21%            57%                     50%                     30%
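
The percentages in Table 1 can be reproduced from the reported counts. The short Python sketch below is illustrative only and is not part of the survey methodology. It assumes that every percentage is calculated against all 394 responses; the report does not state this explicitly, but the assumption is consistent with the published figures. The names `RANK_COUNTS`, `percent` and `top3_shares` are hypothetical.

```python
# Counts copied from Table 1. The denominator of 394 is an assumption:
# the report does not say which base the percentages use, but 394 (the
# total number of responses) matches the published figures.
RESPONSES = 394

CONTENT_TYPES = [
    "Published articles",
    "Preprints",
    "Research data",
    "Theses & dissertations",
    "Conference proceedings",
    "Other",
]

# One list per rank, in the same column order as CONTENT_TYPES.
RANK_COUNTS = {
    "1st": [213, 5, 52, 74, 10, 20],
    "2nd": [68, 23, 11, 71, 97, 57],
    "3rd": [14, 26, 19, 80, 89, 42],
}


def percent(count, total=RESPONSES):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


def top3_shares():
    """Sum each column across the three ranks and express it as a percentage."""
    column_totals = [sum(column) for column in zip(*RANK_COUNTS.values())]
    return {
        name: percent(total)
        for name, total in zip(CONTENT_TYPES, column_totals)
    }


if __name__ == "__main__":
    for name, share in top3_shares().items():
        print(f"{name}: {share}%")
```

Summing each column across the three ranks before dividing is what produces the "Top 3" row: for example, published articles were ranked first, second or third by 213 + 68 + 14 = 295 respondents, or about 75% of 394.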

Figure 3: Top six predominant types

Number of items in the repository
Collection sizes (number of items in each repository) vary significantly across respondent repositories, with about 20% having fewer than 1,000 items and the six largest repositories having more than a million records each (Figure 4). The largest repository, Europe PMC, contains over 8.5 million full text records. The most frequent collection sizes are 1,000 to 10,000 items (32.5%), 10,000 to 50,000 items (27.5%), and fewer than 1,000 items (21.8%). The average collection size for institutional repositories (repositories that collect their local research outputs) is 64,859 items, and for other repositories (domain, data, and national repositories) the average is 386,088 items.

Figure 4: Repository collection sizes

Languages of metadata and content

Figure 5: Predominant language of resources in repositories
For 57% of the repositories, the predominant language of the repository records is English. If we exclude UK repositories (62 respondents), 47% (142 of 299) of repositories reported that English was the predominant language of content (Figures 5 and 6).

Figure 6: Second most predominant language of resources in repositories
For repositories whose predominant language is not English, it is always the national language that was reported as being predominant. For the majority of those repositories, English is the second most predominant language, with a few exceptions shown in the table below (Table 2). Among the countries with more than 15 repositories represented in the survey, four were notable for having low proportions of English content: Croatia, Portugal, Poland, and Spain.
