Report on Repository Survey in Europe, December 2023
Page 24 of 36
Sustainability
66% of respondents indicated that the repository was “very” sustainable and 31%
indicated the repository was “somewhat” sustainable for the next three years.
Some respondents noted that the repository has already been a well-established
service for many years and is well used and well supported by their institution.
Several respondents provided more information about their sustainability
challenges, grouped into several categories:
• Time and resource requirements to properly curate metadata and content
• Replacement of repositories with CRIS systems, which do not fully support the
needs for managing a variety of content types
• Complexities of regular software upgrades
• High cost of employing outside companies to support software upgrades and
ongoing maintenance of the system
• Lack of expected functionalities of the repository platforms
• Understaffing
• Project-oriented funding model
Together, 97% of respondents (351) felt their repository was either “very” or “somewhat” sustainable, with only 13 respondents (3%) indicating that it was “not sustainable” (Figure 26). The 3% of respondents who felt their repository was unsustainable came from different countries and repository types, so no geographic generalisations could be inferred.
Figure 26: Respondents’ perceptions of repository sustainability
Challenges
Respondents were asked to rank a number of challenges to their repository
operations. Software upgrades ranked as the biggest challenge (Figure 27), with
39% of respondents indicating this was a big challenge and another 38%
indicating that it was somewhat of a challenge. This was followed by employing
skilled staff, with 28% asserting that this was a big challenge, and then
underfunding, which was flagged by 26% of respondents. It should be noted that
more than 50% of respondents indicated that all five of the issues proposed in the
survey were either “a big challenge” or “somewhat of a challenge”.
We received numerous comments related to this survey question that fell into
several other categories: lack of needed functionality in the software,
policy trends moving away from repositories (e.g. a growing emphasis on gold
open access), and complexities of the growing diversity and size of collections.
Figure 27: Challenges for repositories
Solutions / strategies
In terms of helping to address existing challenges, the majority of respondents
ranked all of the proposed activities as “very” or “somewhat” helpful (Figure 28).
Advocacy for repositories was ranked highest, with 58% indicating it would be
very helpful and 34% saying it would be somewhat helpful. A community of practice
for technical support was also well received, with 92% of respondents
indicating this would be very or somewhat helpful. Greater national and regional
coordination for repositories and training for managers were also considered
very or somewhat helpful by the vast majority of respondents (86% and 85%
respectively). A national or regional platform for hosting was ranked lowest, but
was still considered helpful by just over 50% of respondents (this could be
explained by the fact that several countries already have a national platform, so
it is not needed in those jurisdictions).
Some respondents provided other suggested solutions, including the
development of tools and services to assist with ingest and curation, increased
funding, and improving incentives for researchers to deposit.
Figure 28: Activities that will help to address challenges
Analysis
The survey had 394 responses from 34 countries in Europe. While this is only a
portion of the repositories in the region - OpenDOAR and Re3Data together list
over 3,000 European repositories - we believe that the collected sample is
somewhat representative of the European repository landscape, but perhaps
slightly skewed towards publication repositories.
A strong and well-functioning network of repositories that provides human and
machine access to the wide range of valuable research outputs is needed for
Europe to reap the full benefits of open science. This will require repositories to be
sustainable; user friendly and technically agile; and able to interoperate with a
range of other value-added services. Below, we summarise our key findings and
highlight the strengths and weaknesses in the current landscape.
Coverage
The majority of the repositories that responded to the survey were institutional
repositories. This may be partially due to the fact that the survey was disseminated
by library-based organisations, but also reflects the reality that the vast majority of
repositories in Europe are managed by universities/university libraries or research
centres. Institutional repositories are generally quite sustainable, as they are
hosted by long-lived institutions that have committed budgets towards this
activity. While OA repositories are open for anyone to access their content, most
focus on collecting research outputs created by members of a specific community
and typically fall into one of a few categories: institutional, national,
international, and domain repositories. With the current prevalence and variety of
repositories in Europe, all researchers will have at least one repository in
which they can share their research outputs.
Collections
European repositories collect a variety of content types, including journal
articles (author accepted manuscripts and publisher versions), e-theses and
dissertations, and research data, as well as a range of other materials. Institutional,
international, and national repositories tend to collect a diversity of content types
and disciplines; domain repositories usually focus on a specific content type in
addition to their disciplinary coverage. Collection sizes vary significantly across
respondent repositories, from fewer than a thousand items to millions of
resources. The average size is 64,859 items for institutional repositories and
386,088 items for the other types of repositories (data, domain, generalist and
national repositories). The largest respondent repository was
Europe PMC, which contains over 8 million full text records. Even if we consider a
low estimate of around 1,500 active repositories in Europe with an average of
65,000 items per repository, this would mean European repositories collectively
provide open access to close to 100 million items. This represents a
significant amount of content if we consider that the large knowledge graphs
that aggregate from many different content providers contain between 200 and 300
million objects3,4.
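The estimate above is simple arithmetic and can be reproduced in a few lines. The sketch below is illustrative only: the figures of 1,500 repositories and 65,000 items come from the text, while the function and variable names are assumptions introduced here.

```python
# Rough reproduction of the estimate in the text (not an exact count).
ACTIVE_REPOSITORIES = 1_500   # low estimate of active European repositories
AVERAGE_ITEMS = 65_000        # rounded average number of items per repository


def estimate_total_items(repositories: int, average_items: int) -> int:
    """Multiply the repository count by the average collection size."""
    return repositories * average_items


total = estimate_total_items(ACTIVE_REPOSITORIES, AVERAGE_ITEMS)
print(f"Estimated items: {total:,}")  # 97,500,000, i.e. close to 100 million

# Compare with the 200-300 million objects reported for large knowledge graphs.
for graph_size in (200_000_000, 300_000_000):
    print(f"Share of a graph with {graph_size:,} objects: {total / graph_size:.0%}")
```

On these assumptions, the repository network holds roughly a third to a half as many items as one of the large aggregators.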
Repositories support bibliodiversity in the ecosystem. They do not charge for
access or for researchers to deposit, and they collect and preserve a range of
content types in many domains and disciplines. The predominant content
types in repositories are journal articles, theses and dissertations, and conference
proceedings. Research data is also common, followed by a long tail of other
types of scholarly and educational materials. Some repositories contain preprints,
but they still represent a small portion of items in the European network.
Repositories, therefore, are well placed to support the expansion of open science
practices across Europe and the reformation of research assessment, which
places a greater emphasis on inclusiveness, diversity, and transparency.
Multilingualism
There is a growing recognition in Europe and beyond of the need to support and
encourage publishing in local languages, as this ensures that the public has
access to the research (which they often fund). The survey found that repositories
play an important role in preserving and disseminating content in a variety of
languages, especially local languages. Only 17% of respondent repositories collect
content in a single language, and these are almost exclusively repositories in
predominantly English-speaking countries (UK and Ireland); most
repositories therefore collect content in at least two languages.
The survey also found some diversity in the languages of resources across the
European repository network, with at least 29 languages represented in total. That
said, the predominant language for over 50% of respondent repositories was
3 On Nov 24, 2023, OpenAlex had 246 million objects in its aggregation: https://help.openalex.org/
4 On Nov 24, 2023, OpenAIRE had 239 million objects in its aggregation: https://graph.openaire.eu/
English, even for many repositories from non-English speaking countries. There are
24 official languages in the EU; however, over 200 languages are spoken across
the continent5, so many languages still do not seem to be well
represented in the European repository network. Repositories tend to collect
resources in just two or three languages, with the main local language either
the most predominant or the second most predominant after English. As English is
the lingua franca for research, especially in the STM (science, technology and
medicine) fields, these results are not unexpected.
As there are fewer international venues for disseminating non-English content,
institutional repositories are playing a role in this respect for some of their local
communities. In a few cases, repositories publish metadata and abstracts in two
languages (usually the original language of the resource and English), which can
lead to better discovery in (the predominantly English-focused) indexing and
discovery services. We know from anecdotal information6 that repository
platforms have typically been developed with English in mind and do not support
all languages equally. Therefore, managing non-English content can involve extra
effort for those repositories, such as translating the platform interface and
curating metadata to correctly assign language codes, especially for languages
that use non-Roman scripts. This may also partially explain the predominance
of English in repositories.
Services
Along with their primary role of collecting and providing access to research
outputs, repositories are active participants in a broader scholarly ecosystem,
feeding their metadata and resources into various types of networks and services
that repurpose the content and/or combine it with others. Almost all repositories
offer certain baseline services: metadata checking, deposit support, back-up
copies, and usage statistics. The vast majority of repositories expose their
metadata (and sometimes full text resources) to external discovery services using
the OAI-PMH protocol. In addition, repository records are increasingly indexed / visible
in other external systems, such as DataCite (because they are minting DOIs).
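To make the harvesting mechanism concrete, the following minimal sketch builds an OAI-PMH `ListRecords` request URL of the kind an external discovery service would send to a repository. The `verb`, `metadataPrefix`, `set` and `resumptionToken` arguments are defined by the OAI-PMH protocol; the base URL and the function name are hypothetical and used only for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; each repository publishes its own OAI-PMH base URL.
EXAMPLE_BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token is not None:
        # A resumption token continues a previous (paged) response and must
        # not be combined with any argument other than the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec  # optional selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(EXAMPLE_BASE_URL))
```

A harvester would fetch this URL, process the returned XML, and repeat the request with the resumption token from each response until no token is returned.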
The next generation repository vision sees repositories as more than institutional
services: they are the foundation for other services built on the collective contents
5 https://www.tomedes.com/translator-hub/european-languages
6 See the discussion and analysis by the COAR Task Force on Supporting Multilingualism and non-English
Content in Repositories
of repositories7. The impact and value of repositories, therefore, can be
demonstrated by both local usage and download statistics, as well as the
downstream reuse of repository resources in other contexts. In this respect,
repository collections are increasingly being repurposed and reused in innovative
ways. In particular, the integration of repository records into institutional or
national systems, such as CRIS systems (especially common in the UK), academic
profile pages, university websites, and other internal research administrative tools,
is widespread. The reuse and repurposing of repository content into other
collections, such as domain collections, specialised portals, and education curricula,
is also becoming quite common.
Metadata and persistent identifiers
The use of PIDs and comprehensive and standardised metadata is a fundamental
requirement for the discovery and reuse of repository resources. The vast majority
of repositories support Dublin Core metadata, which has typically been the
default for repositories. This, therefore, is the baseline of interoperability for
repositories in Europe. In addition, there has also been quite widespread adoption
of the OpenAIRE guidelines, which are more granular and require additional
metadata elements to be added to repository records.
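As an illustration of what this baseline looks like in practice, the sketch below reads the fields of a simple Dublin Core record as it might be exposed by a repository. The two namespace URIs are the standard ones for `oai_dc` and Dublin Core elements; the sample record, its values (including the handle) and the function name are invented for this example and do not come from the survey.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample record, for illustration only.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example thesis title</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:type>doctoralThesis</dc:type>
  <dc:language>por</dc:language>
  <dc:identifier>https://hdl.handle.net/00000/example</dc:identifier>
</oai_dc:dc>"""


def dc_fields(xml_text: str) -> dict:
    """Collect Dublin Core elements into a dict of element name -> values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        # ElementTree reports namespaced tags as "{namespace}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


record = dc_fields(SAMPLE_RECORD)
print(record["title"], record["language"])
```

Because every Dublin Core element is optional and repeatable, values are kept as lists; richer profiles such as the OpenAIRE guidelines add further elements beyond this baseline.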
In terms of persistent identifiers (PIDs), most repositories are now using PIDs for their
resources (either handles or DOIs). This is a positive development because it
ensures a certain level of permanence for the resources in repositories (for
example, if URLs change when a repository changes platforms or upgrades to a
new version). Other types of PIDs, including author, funder, and institution IDs,
are also increasingly supported by repositories. These bring additional
benefits: enabling the analysis and tracking of research outputs according to the
funder, university, or author; and providing an opportunity for repositories to
integrate metadata from those external systems to enhance their local
metadata.
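The permanence that PIDs provide comes from resolution: a stable identifier is turned into the current location of the resource by a resolver service. The minimal sketch below maps a DOI or handle to its public resolver URL (`https://doi.org/` and `https://hdl.handle.net/`). The function name and the example identifiers are hypothetical, and the DOI pattern is a deliberate simplification rather than a full validation rule.

```python
import re

# Simplified pattern: "10." + registrant code + "/" + suffix.
# Real DOIs allow more variation; this is an assumption for illustration.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def resolver_url(pid: str) -> str:
    """Return the public resolver URL for a DOI or a handle."""
    pid = pid.strip()
    if DOI_PATTERN.match(pid):
        return "https://doi.org/" + pid
    if "/" in pid:
        # Handles have the form prefix/suffix and resolve via the Handle proxy.
        return "https://hdl.handle.net/" + pid
    raise ValueError(f"Not recognised as a DOI or handle: {pid!r}")


# Made-up identifiers, for illustration only.
print(resolver_url("10.1234/example.5678"))
print(resolver_url("20.500.12345/678"))
```

If a repository migrates to a new platform, only the target registered with the resolver needs to change; links built this way continue to work.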
The fact that a repository supports certain metadata schemas and PIDs
does not always equate to the collections having high quality metadata. While
most repositories do support standardised and granular metadata schemas, they
often rely on the author to fill in the metadata fields. Since authors may not be
7 https://www.coar-repositories.org/news-updates/what-we-do/next-generation-repositories