Report on Repository Survey in Europe, November 2023 Page 11 of 36
Table 2: Languages of metadata and resources in the repositories (countries with over 15 repositories in survey)

Country | English predominant: ENG % | English predominant: % LOCAL as 2nd predominant language | Local predominant: LOCAL % | Local predominant: % ENG as 2nd predominant language | Other second languages
Croatia | 9% | 6% | 91% | 69% | Italian
Portugal | 22% | 22% | 78% | 67% | Spanish
Poland | 32% | 26% | 68% | 59% | -
Spain | 34% | 32% | 66% | 59% | Catalan (3)
Austria | 41% | 35% | 59% | 53% | Hungarian
Serbia | 42% | 42% | 58% | 45% | Russian
Germany | 56% | 42% | 44% | 36% | -
Switzerland (*) | 70% | 75% | 20% | 15% | -
Italy | 80% | 67% | 20% | 20% | Spanish
United Kingdom | 100% | - | - | - | Spanish (6), French (5), German (3), Welsh (2), Polish, Italian & Chinese (a repository belonging to the international publisher)

(*) Local languages: DEU, ITA, FRA

Report on Repository Survey in Europe, December 2023 Page 12 of 36
Who can deposit
Over 75% of repositories in the survey serve their local communities and offer services only to persons affiliated with their institution (Figure 7). 6% of respondent repositories are open to anyone, 4% are open to domain communities, and 1% are open to persons from a specific country. Most of the 9% who chose the ‘other’ category clarified that the repository was an institutional repository offering a mediated deposit service, whereby repository staff deposit content on behalf of the creators. The share of institutional repositories was therefore actually over 80%.

Figure 7: Repository accepts content from which communities

National networks
About half of respondent repositories indicated they were part of a national level network or service (Figure 8). The types of services/networks are varied and include harvesters, portals and other discovery/indexing services; communities of practice; shared platforms; open source platform networks; and domain networks. However, the responses were inconsistent in many countries, with some respondents from a given country indicating they belong to a network and others indicating they do not. This could be because respondents interpreted “national network” or “national service” differently, but also because some national networks may serve only a subset of repositories in their country.

However, a substantial share, almost half of all responding repositories in a country, feel part of an existing network. Several repositories belong to more than one type of national network or service. Given that the communities advancing open access and those advancing research data management are often distinct from each other, it is not surprising that respondents from these different sectors named different national services.

Figure 8: Number of respondents who are part of a national network

Hosting model for repository
57% (223) of respondent repositories are locally hosted, while 43% (165) are hosted by an external provider (Figure 9). Most external providers are national hosting platforms, university data centres, or national cloud services. Seven respondent repositories are hosted by commercial providers.

Figure 9: Local or external hosting of repository

Software platforms
DSpace is the most commonly used software platform, with 41% of respondents indicating they currently use it. Other widely used platforms are EPrints (11%), Fedora/Islandora (11%) and Dataverse (4%). Several other platforms were also reported: Invenio (3%), Pure (3%), OPUS (3%), Omega-PSIR (2%), Samvera (1%), and Figshare (1%), along with a variety of other software types (Figure 10). It is worth noting that 8% of respondents run their repositories on locally developed software (4% of institutional repositories and 22% of national / domain / generalist repositories use a locally developed software platform).

Add-ons/patch/code added to the codebase
About 61% of all respondents indicated that they have changed or added to the basic “out of the box” versions of the repository software platform (Figure 11).

Figure 11: Number of respondents that adopt add-ons, patches or new code

Figure 10: Software platforms used by repositories

This situation is more frequent in EPrints (83.7%) and DSpace (63.2%) repositories than in all the other platforms (58%).

Software Upgrades
42% of repositories upgraded their repository platforms in 2022, and 74% of repositories stated that they were planning to upgrade in 2023. 21% of repositories that upgraded in 2022 plan to do so again in 2023. In total, about 60% of respondents have either updated their repository in 2022 or are planning to update to a more recent version in 2023 (Figures 12 and 13).

Figure 12: Year of last major upgrade

Figure 13: Year of next major upgrade

Metadata schemas
The most common metadata schema adopted in repositories is Dublin Core, with 77% of repositories indicating they provide support for it (Figure 14). 26% provide support for the DataCite schema, which was initially developed for research data. Unsurprisingly, there was a positive correlation between repositories that collect research data and those that support the DataCite schema. Just under half of respondents indicated that they support more than one type of metadata schema.
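As an illustration of what a simple Dublin Core record looks like, the following minimal sketch builds one in the oai_dc XML container using Python's standard library. The field values, including the DOI, are made up for the example, and the helper name `build_dc_record` is hypothetical; it is not taken from the survey or from any repository platform.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the oai_dc container and the Dublin Core element set.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialised XML uses "oai_dc:" and "dc:".
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build an oai_dc element from a mapping of DC element name -> list of values."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Illustrative values only (assumption: none of these come from the survey).
record = build_dc_record({
    "title": ["Example dataset"],
    "creator": ["Doe, Jane"],
    "identifier": ["https://doi.org/10.1234/example"],
    "rights": ["http://creativecommons.org/licenses/by/4.0/"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Every Dublin Core element is optional and repeatable, which is why the sketch maps each element name to a list of values.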

Figure 14: Metadata schemas available to use in the repository

OpenAIRE Guidelines
The OpenAIRE Guidelines, which are more extensive and detailed than Dublin Core and include additional metadata elements such as funder and project IDs and access status, are becoming a widely used standard in Europe, as they have been recommended by the European Commission (EC) as part of its open access policy. Many repositories in Europe (74%) have adopted the OpenAIRE Guidelines (Figure 15). It is worth noting that a significant number of repositories (167) are still using older versions of the Guidelines, which are less granular and include neither identifier schemes for authors, organisations or funders nor the COAR Controlled Vocabularies, meaning they do not meet the current EC requirements for metadata.


Figure 15: Support for OpenAIRE Guidelines

Licences
Almost all repositories (96%) offer users the option of choosing a specific licence, the most common of which are Creative Commons licences (91%). Some repositories offer several licensing options (Figure 16).

Figure 16: Licences available in the repository

Author IDs
ORCID IDs are quite widely supported: 260 repositories (66%) provide a metadata field for ORCID in their records, 71 (18%) support national IDs, and 78 support other types of ID. 97 repositories, about 25% of respondents, do not support any type of author ID (Figure 17).
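A repository that offers an ORCID field can check values at deposit time. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below validates only the format and the check character; it does not confirm that the iD is actually registered. The function names are illustrative and are not taken from the survey or from any repository platform.

```python
import re

# Four hyphen-separated groups of four; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string has the ORCID iD shape and a correct check character."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


# 0000-0002-1825-0097 is a sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check character)
```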

Figure 17: Author IDs supported by the repository

Resource Persistent Identifiers
Many repositories assign at least one type of persistent identifier (PID) to the resources deposited, the most common being DOIs (Digital Object Identifiers, 46%), followed by Handles (44%). 67 repositories support both Handles and DOIs. In the “other” category, most indicated that they are using a URN (Uniform Resource Name) or ISSN. About 10% of repositories do not assign or support any type of PID for the resources in their repository.
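To make the identifier types above concrete, here is a rough sketch that guesses the type of a PID string from its shape. The regular expressions are deliberate simplifications and are assumptions, not the official syntax definitions: for instance, the Handle pattern accepts only numeric prefixes, and the ISSN pattern does not verify the check digit. DOIs are tested before Handles because a DOI is itself a Handle whose prefix starts with "10.". The function name `classify_pid` is hypothetical.

```python
import re

# Simplified patterns (assumptions, not the normative syntax of each scheme).
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
HANDLE_RE = re.compile(r"^[0-9][0-9.]*/\S+$")
URN_RE = re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.IGNORECASE)
ISSN_RE = re.compile(r"^\d{4}-\d{3}[\dXx]$")

# Common resolver and scheme prefixes that are stripped before matching.
KNOWN_PREFIXES = (
    "https://doi.org/", "http://doi.org/", "doi:",
    "https://hdl.handle.net/", "http://hdl.handle.net/", "hdl:",
)


def classify_pid(value: str) -> str:
    """Return 'doi', 'handle', 'urn', 'issn' or 'unknown' for a PID string."""
    text = value.strip()
    for prefix in KNOWN_PREFIXES:
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    if DOI_RE.match(text):
        return "doi"
    if HANDLE_RE.match(text):
        return "handle"
    if URN_RE.match(text):
        return "urn"
    if ISSN_RE.match(text):
        return "issn"
    return "unknown"


# All identifiers below are made-up examples.
for pid in ("https://doi.org/10.1234/abc", "hdl:1721.1/12345",
            "urn:nbn:de:1234-5678", "1234-5679", "not a pid"):
    print(pid, "->", classify_pid(pid))
```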

Figure 18: Persistent identifiers for resources assigned by the repository

Other services

Preservation
Approximately 63% of respondents (229) have a formal preservation policy in place at their repository, while 37% (136) indicated they have no preservation policy (Figure 19). In the comments, some respondents (15) indicated that they were in the process of developing a policy, and several noted that, while they don’t have a formal policy, they do have a variety of preservation practices and procedures in place, including making back-up copies or mirroring content elsewhere. Some repositories are integrated with broader institutional preservation systems.

Figure 19: Repositories with a preservation policy

Usage statistics
Most respondent repositories (73%) collect usage statistics, with several using more than one usage statistics service. Only 33 repositories (about 10%) indicated they do not collect any type of usage statistics. The most common approach is to use the local repository statistics functionality provided by the software platform.


Figure 20: Type of repository statistics services used by the repository

Curation
Most repositories apply some level of curation upon deposit of a new resource. Metadata validation (checking that metadata is correct and/or complete) is the most common, followed by mediated deposit (repository staff deposit on behalf of the researchers) and content validation (checking file formats and copyright) (Figure 21). In the “other” category, respondents listed things such as review for compliance with other deposit guidelines, checksum validation, and ethics review. Repositories do not undertake editorial review, but rather ensure resources are described and formatted properly.
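The completeness side of metadata validation can be sketched in a few lines. This is a minimal illustration, assuming a deposit record is held as a dictionary keyed by Dublin Core element name. The list of required fields is an assumption made for the example, since actual requirements vary by repository and are not specified in the survey, and the function name `missing_fields` is hypothetical.

```python
# Assumption: the fields a hypothetical repository treats as mandatory.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier", "rights")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or contain only blank values."""
    missing = []
    for name in REQUIRED_FIELDS:
        values = record.get(name) or []
        if isinstance(values, str):
            values = [values]  # accept a single string as well as a list
        if not any(value.strip() for value in values):
            missing.append(name)
    return missing


# Illustrative deposit: the creator is blank; identifier and rights are absent.
deposit = {"title": "A title", "creator": ["  "], "date": "2023"}
print(missing_fields(deposit))  # ['creator', 'identifier', 'rights']
```

A check like this covers only completeness; checking that values are correct (for example, that a date is well formed) would need field-specific rules.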

Figure 21: Curation process undertaken by the repository

Certification
23% of respondents said that the repository has undergone some type of certification (Figure 22), with CoreTrustSeal being the most common, followed by DINI and Data Seal of Approval. No significant difference in certification rates was found across repository types, apart from a slightly higher rate for research data repositories. 19 respondents indicated compliance with national aggregator requirements or OpenAIRE as “certification” (which is less a certification than a validation of the use of the OpenAIRE guidelines) (Table 3).

Figure 22: The repository has undergone some type of certification

Table 3: Type of certification undergone by repository

Certification | Number of repositories
CoreTrustSeal | 20
DINI | 14
National Aggregator Compliance | 14
Data Seal of Approval | 7
OpenAIRE Compliance | 5
ISO | 3

Other value added services
Numerous other services beyond the ones mentioned above were described by respondents. The most common is the integration of repositories with other institutional services, such as a CRIS (current research information system), academic profile pages, or university websites.
A significant number of respondents also indicated that repository resources are reused by external systems such as aggregators and discovery systems, integrated into customised collections at the national level, reused for research assessment exercises (e.g. REF), and incorporated into national education curricula.
Enhancement of repository records using metadata from other systems (e.g. using ORCID, Crossref records) is also common, as is the export of repository metadata to other systems. Other tools/functionalities such as using the CORE recommender system, digitization services, plagiarism detection, and request-a-copy were also mentioned.
Training was also widely referred to, especially by data repositories, which often provide training to researchers on how to format their data and how to complete data management plans. Some repositories offer assistance for authors to navigate copyright and other licensing issues.

Main funding sources
Institutional funding represents the predominant funding source for repositories, with 77% of respondents indicating their main funding source was their institution. 13% receive external project funding (Figure 23). Very few repositories (5, or just over 1%) charge a fee for depositors, and after further examination, these fees were only applied for certain types of deposits (i.e., unusually large data sets that require significant storage capacity). Most repositories rely on a single funding source, with only a few receiving funds from more than one source (mainly institution and project funds).

Figure 23: Predominant funding sources for the repository

Staffing
After removing several outlier responses with unrealistically high numbers (we presume these questions were misinterpreted by some), the average number of staff per repository was found to be just under 3 full-time equivalent (FTE) staff members. The staffing for repositories is spread across several positions: repository managers, technical support, metadata and content curation, and “other” positions. Close to half of the staffing of repositories (47%) is devoted to metadata and content curation, 27% to the repository manager position and 19% to technical support positions (Figure 24). Over half of respondent repositories have 2 or fewer full-time employees (Figure 25).

Figure 24: Distribution of staffing in repositories

Figure 25: Number of staff members per repository