Report on Repository Survey in Europe, December 2023 Page 24 of 36
Sustainability
66% of respondents indicated that the repository was “very” sustainable and 31% indicated the repository was “somewhat” sustainable for the next three years. Some respondents noted that the repository has been a well-established service already for many years and is well used and well supported by their institution. Several respondents provided more information about their sustainability challenges, grouped into several categories:
• Time and resource requirements to properly curate metadata and content
• Replacement of repositories with CRIS systems, which do not fully support the needs for managing a variety of content types
• Complexities of regular software upgrades
• High cost of employing outside companies to support software upgrades and ongoing maintenance of the system
• Lack of expected functionalities of the repository platforms
• Understaffing
• Project-oriented funding model

Figure 26: Respondents’ perceptions of repository sustainability
Together, 97% of respondents (351) felt their repository was either “very” or “somewhat” sustainable, with only 13 respondents (3%) indicating that it was “not sustainable” (Figure 26). The 3% of respondents who felt their repository was unsustainable came from different countries and repository types, so no geographic generalisations could be drawn.

Challenges
Respondents were asked to rank a number of challenges to their repository operations. Software upgrades ranked as the biggest challenge (Figure 27), with 39% of respondents indicating this was a big challenge and another 38% indicating that it was somewhat of a challenge. This was followed by employing skilled staff, with 28% asserting that this was a big challenge, and then underfunding, which was flagged by 26% of respondents. It should be noted that more than 50% of respondents indicated that all five issues proposed in the survey were either “a big challenge” or “somewhat of a challenge”.
We received numerous comments related to this survey question that fell into several other categories: lack of needed functionality in the software, policy trends moving away from repositories (e.g. a growing emphasis on gold open access), and the complexities of the growing diversity and size of collections.

Figure 27: Challenges for repositories

Solutions / strategies
In terms of helping to address existing challenges, respondents ranked several proposed activities as “somewhat” or “very” helpful (Figure 28). The majority of respondents ranked all options provided as very helpful or somewhat helpful. Advocacy for repositories was ranked highest, with 58% indicating it would be very helpful and 34% saying it would be somewhat helpful. A community of practice for technical support was also well regarded, with 92% of respondents indicating this would be very or somewhat helpful. Greater national and regional coordination for repositories and training for managers were also considered very or somewhat helpful by the vast majority of respondents (86% and 85% respectively). A national or regional platform for hosting was ranked lowest, but was still considered helpful by just over 50% of respondents (this could be explained by the fact that several countries already have a national platform, so it is not needed in those jurisdictions).
Some respondents suggested other solutions, including the development of tools and services to assist with ingest and curation, increased funding, and improved incentives for researchers to deposit.

Figure 28: Activities that will help to address challenges

Analysis
The survey had 394 responses from 34 countries in Europe. While this is only a portion of the repositories in the region (OpenDOAR and Re3Data together list over 3,000 European repositories), we believe that the collected sample is somewhat representative of the European repository landscape, though perhaps slightly skewed towards publication repositories.
A strong and well-functioning network of repositories that provides human and machine access to the wide range of valuable research outputs is needed for Europe to reap the full benefits of open science. This will require repositories to be sustainable; user friendly and technically agile; and able to interoperate with a range of other value-added services. Below is a summary of our key findings, highlighting the strengths and weaknesses in the current landscape.

Coverage
The majority of the repositories that responded to the survey were institutional repositories. This may be partially due to the fact that the survey was disseminated by library-based organisations, but it also reflects the reality that the vast majority of repositories in Europe are managed by universities/university libraries or research centres. Institutional repositories are generally quite sustainable, as they are hosted by long-lived institutions that have committed budgets towards this activity. While OA repositories are open for anyone to access their content, most repositories focus on collecting research outputs created by members of a specific community, and typically fall into one of a few categories: institutional, national, international, and domain repositories. With the current prevalence and variety of repositories in Europe, all researchers will have at least one repository in which they can share their research outputs.

Collections
European repositories collect a variety of content types, including journal articles (author accepted manuscripts and publisher versions), e-theses and dissertations, research data, and a range of other materials. Institutional, international, and national repositories tend to collect a diversity of content types and disciplines; domain repositories usually focus on a specific content type in addition to their disciplinary coverage. Collection sizes vary significantly across respondent repositories, with an average size of 64,859 items for institutional repositories and 386,088 items for the other types of repositories (data, domain, generalist and national repositories). Collection sizes range from less than a thousand items to millions of resources. The largest respondent repository was Europe PMC, which contains over 8 million full text records. Even with a low estimate of around 1,500 active repositories in Europe and an average of 65,000 items per repository, European repositories would collectively provide open access to close to 100 million items (1,500 × 65,000 = 97.5 million). This represents a significant amount of content, considering that the large knowledge graphs that aggregate from many different content providers contain between 200 and 300 million objects (footnotes 3 and 4).
Repositories support bibliodiversity in the ecosystem. They do not charge for access or for researchers to deposit, and they collect and preserve a range of content types in many domains and disciplines. The most predominant content types in repositories are journal articles, theses and dissertations, and conference proceedings. Research data is also common, followed by a long tail of other types of scholarly and educational materials. Some repositories contain preprints, but they still represent a small portion of items in the European network. Repositories, therefore, are well placed to support the expansion of open science practices across Europe and the reform of research assessment, which places a greater emphasis on inclusiveness, diversity, and transparency.

Multilingualism
There is a growing recognition in Europe and beyond of the need to support and encourage publishing in local languages, as this ensures that the public has access to the research (which they often fund). The survey found that repositories play an important role in preserving and disseminating content in a variety of languages, especially local languages.
Only 17% of respondent repositories collect content in a single language, and these are almost exclusively repositories in predominantly English-speaking countries (UK and Ireland). Most repositories therefore collect content in at least two languages.
The survey also found some diversity in the languages of resources across the European repository network, with at least 29 languages represented in total. That said, the predominant language for over 50% of respondent repositories was English, even for many repositories from non-English speaking countries. There are 24 official languages in the EU; however, over 200 languages are spoken across the continent (footnote 5). So there are still many languages that do not seem to be well represented in the European repository network.

3 On Nov 24, 2023, OpenAlex had 246 million objects in its aggregation: https://help.openalex.org/
4 On Nov 24, 2023, OpenAIRE had 239 million objects in its aggregation: https://graph.openaire.eu/

Repositories tend to collect resources in just two or three languages, with the main local language being either the most predominant or the second most predominant after English. As English is the lingua franca for research, especially in the STM (science, technology and medicine) fields, these results are not unexpected. As there are fewer international venues for disseminating non-English content, institutional repositories are playing a role in this respect for some of their local communities. In a few cases, repositories publish metadata and abstracts in two languages (usually the original language of the resource and English), which can lead to better discovery in the predominantly English-focused indexing and discovery services.
We know from anecdotal information (footnote 6) that repository platforms have typically been developed with English in mind and do not support all languages equally. Therefore, managing non-English content can involve extra effort for those repositories, such as translating the platform interface and curating metadata to correctly assign language codes, especially for languages that use non-Roman characters. This may also partially explain the predominance of English in repositories.

Services
Along with their primary role of collecting and providing access to research outputs, repositories are active participants in a broader scholarly ecosystem, feeding their metadata and resources into various types of networks and services that repurpose the content and/or combine it with others. Almost all repositories offer certain baseline services: metadata checking, deposit support, back-up copies, and usage statistics.
The vast majority of repositories expose their metadata (and sometimes full text resources) to external discovery services using the OAI-PMH protocol. In addition, repository records are increasingly indexed or visible in other external systems, such as DataCite (because the repositories are minting DOIs).
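As a minimal sketch of how an external discovery service might harvest this metadata, the Python below builds an OAI-PMH ListRecords request for Dublin Core (oai_dc) records and parses a small hand-written sample response. The base URL https://repository.example.org/oai, the helper names, and the sample record are hypothetical and invented for illustration; they are not survey data, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL; from_date is an optional YYYY-MM-DD bound."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def dc_values(record, tag):
    """Return the text of every dc:<tag> element inside one OAI record."""
    return [el.text for el in record.iterfind(f".//dc:{tag}", NS) if el.text]


def parse_records(xml_text):
    """Extract a few Dublin Core fields from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": dc_values(rec, "title"),
            "languages": dc_values(rec, "language"),
            "identifiers": dc_values(rec, "identifier"),
        })
    return records


# Hypothetical sample response (hand-written, not from a real repository).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2023-11-24</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example thesis title</dc:title>
          <dc:language>fr</dc:language>
          <dc:identifier>https://doi.org/10.1234/example.5678</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = list_records_url("https://repository.example.org/oai", from_date="2023-01-01")
records = parse_records(SAMPLE_RESPONSE)
```

A real harvester would also need to follow resumption tokens for large result sets and handle error responses; both are left out here to keep the sketch short.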
The next generation repository vision treats repositories not only as institutional services but also as the foundation for other services built on the collective contents of repositories (footnote 7). The impact and value of repositories, therefore, can be demonstrated both by local usage and download statistics and by the downstream reuse of repository resources in other contexts. In this respect, repository collections are increasingly being repurposed and reused in innovative ways. In particular, the integration of repository records into institutional or national systems, such as CRIS systems (especially common in the UK), academic profile pages, university websites, and other internal research administrative tools, is widespread. The reuse and repurposing of repository content in other collections, such as domain collections, specialised portals, and education curricula, is also becoming quite common.

5 https://www.tomedes.com/translator-hub/european-languages
6 See the discussion and analysis by the COAR Task Force on Supporting Multilingualism and non-English Content in Repositories

Metadata and persistent identifiers
The use of persistent identifiers (PIDs) and comprehensive, standardised metadata is a fundamental requirement for the discovery and reuse of repository resources. The vast majority of repositories support Dublin Core metadata, which has typically been the default for repositories. This, therefore, is the baseline of interoperability for repositories in Europe. In addition, there has been quite widespread adoption of the OpenAIRE guidelines, which are more granular and require additional metadata elements to be added to repository records.
In terms of persistent identifiers (PIDs), most repositories are now using PIDs for their resources (either handles or DOIs). This is a positive development because it ensures a certain level of permanence for the resources in repositories (for example, if URLs change when a repository changes platforms or upgrades to a new version). Other types of PIDs are also increasingly supported by repositories, including author, funder, and institution IDs. These bring additional benefits: enabling the analysis and tracking of research outputs according to the funder, university, or author; and providing an opportunity for repositories to integrate metadata from those external systems to enhance their local metadata.
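To illustrate why a PID offers more permanence than a platform URL, the following minimal sketch turns a DOI or handle into a resolver URL that stays the same even if the repository's own URLs change. The helper name resolver_url and the example identifiers are hypothetical. The patterns are deliberate simplifications: the sketch assumes a DOI looks like "10.<digits>/<suffix>" and that any other identifier with a dotted numeric prefix followed by a slash is a handle.

```python
import re
from typing import Optional

DOI_RESOLVER = "https://doi.org/"
HANDLE_RESOLVER = "https://hdl.handle.net/"

# Simplified patterns (assumptions for illustration, not full validators).
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
HANDLE_RE = re.compile(r"^\d+(\.\d+)*/\S+$")


def resolver_url(pid: str) -> Optional[str]:
    """Return a resolver URL for a DOI or handle, or None if unrecognised."""
    pid = pid.strip()
    # Drop a leading scheme label or resolver address, if one is present.
    for prefix in ("doi:", "hdl:", DOI_RESOLVER, HANDLE_RESOLVER):
        if pid.lower().startswith(prefix):
            pid = pid[len(prefix):]
            break
    # DOIs are checked first because they also fit the generic handle pattern.
    if DOI_RE.match(pid):
        return DOI_RESOLVER + pid
    if HANDLE_RE.match(pid):
        return HANDLE_RESOLVER + pid
    return None


doi_link = resolver_url("doi:10.1234/example.5678")
handle_link = resolver_url("20.500.12345/678")
unknown = resolver_url("not a pid")
```

Storing the bare identifier and deriving the link at display time, as here, means a platform migration only has to update the resolver's target, not every citation of the resource.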
The fact that a repository supports certain metadata schemas and PIDs does not always mean that its collections have high quality metadata. While most repositories do support standardised and granular metadata schemas, they often rely on the author to fill in the metadata fields. Since authors may not be

7 https://www.coar-repositories.org/news-updates/what-we-do/next-generation-repositories