Presentation from José Carvalho and Pedro Principe, University of Minho, at ETD 2019 Conference (22nd International Symposium on Electronic Theses and Dissertations), Porto, Nov 7, 2019.
Interoperability is the key: repositories networks promoting the quality and interoperability of repository metadata & vocabularies
1. INTEROPERABILITY IS THE KEY
repositories networks promoting the quality and
interoperability of repository metadata & vocabularies
José Carvalho and Pedro Príncipe
jcarvalho@sdum.uminho
pedroprincipe@sdum.uminho.pt
2. AGENDA
Power of infrastructures and Repositories Networks - intro
RCAAP and OpenAIRE - overview
Feedback from the participants
Theses and Dissertations @ RCAAP and Portuguese use case
Theses and Dissertations @OpenAIRE
PAUSE
Use cases and demos
TID workflow in RCAAP & Broker service from OpenAIRE
Tips for interoperability and trends
Discussions
4. THE REAL VALUE OF REPOSITORIES LIES IN
THE POTENTIAL TO INTERCONNECT THEM
TO CREATE A NETWORK OF REPOSITORIES.
The Case for Interoperability for Open Access Repositories, COAR
8. Project Goals
- Increase the visibility, accessibility and dissemination of Portuguese research results
- Facilitate access to information about Portuguese scientific output
- Integrate Portugal in the wide range of international initiatives in this domain
12. The Workflow
01 Registration on RENATES
Institutions register the thesis on the national registry of theses. Attribution of the TID.
02 Deposit on Repository
Institutions deposit on a repository integrated in the RCAAP Network.
03 Harvesting on Search Portal
The Search Portal harvests the theses and makes them available on the API.
04 Validation on RENATES
The RENATES system checks for the TID on the Portal and the URL for the item on the repository.
13. What is RENATES
Administrative registry of theses, maintained by the institutions for the government.
https://renates2.dgeec.mec.pt/
14. 1- Registration on RENATES
- Institutions register the thesis on a national service and obtain an identifier
TID = Thesis Identifier
- After the conclusion and acceptance of the work, the deposit is made by the institution.
15. 2 - Deposit on Repository
Institutions deposit on a repository integrated in the RCAAP Network and associate the TID (Thesis ID)
The repository needs to be on the Search Portal Directory:
https://www.rcaap.pt/directory.jsp
16. 2 - Deposit on Repository
Submission process with specific forms and a new field: dc.identifier.tid
17. 2 - Deposit on Repository
DSpace functionality - Get metadata from RENATES based on TID
19. 3 - Harvesting on Search Portal
Every night the national search portal RCAAP harvests all repositories and makes the thesis information available on the API.
API Interface: https://www.rcaap.pt/api/
https://www.rcaap.pt/api/v1/documents?format=xml&jsonp=rcaapCallback&resultsPerPage=10&includeAllRepositories=true&identifier=201816962
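As a small illustration, the example query above can be assembled from its parts. This is only a sketch: the parameter names and values are copied from the URL on the slide, while the function name `build_tid_query` is an assumption made for this example and is not part of the RCAAP API.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL on the slide.
API_BASE = "https://www.rcaap.pt/api/v1/documents"


def build_tid_query(tid: str, results_per_page: int = 10) -> str:
    """Return the Search Portal API URL that looks up a thesis by its TID.

    The query parameters mirror the example URL; this helper is hypothetical.
    """
    params = {
        "format": "xml",
        "jsonp": "rcaapCallback",
        "resultsPerPage": results_per_page,
        "includeAllRepositories": "true",
        "identifier": tid,
    }
    return f"{API_BASE}?{urlencode(params)}"


print(build_tid_query("201816962"))
```

Building the URL with `urlencode` keeps the parameters readable and escapes any unusual identifier values.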
20. 4 - Validation on RENATES
Every night, the RENATES service checks the API for any thesis whose TID has not yet been identified (or closed).
When a TID is available on the portal, the RENATES system gets the URI (usually a handle) and closes the process.
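The nightly check described above could look roughly like the following. It is a sketch under stated assumptions, not the RENATES implementation: `close_found_tids`, the `lookup_uri` callback and the sample URI are all hypothetical, and the portal is replaced by an in-memory dictionary.

```python
from typing import Callable, Dict, Iterable, Optional


def close_found_tids(
    open_tids: Iterable[str],
    lookup_uri: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return {tid: uri} for every open TID that the portal already exposes.

    lookup_uri stands in for a call to the Search Portal API and returns
    None when no thesis with that TID has been harvested yet.
    """
    closed = {}
    for tid in open_tids:
        uri = lookup_uri(tid)
        if uri is not None:
            closed[tid] = uri  # the process for this TID can be closed
    return closed


# Stand-in for the portal: one thesis harvested, one still missing.
# The URI and the second TID are made-up placeholders.
fake_portal = {"201816962": "https://repository.example.org/handle/123/456"}
print(close_found_tids(["201816962", "999999999"], fake_portal.get))
```

TIDs that are not found simply stay open and are checked again on the next run.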
21. To Be Done
1 - The RENATES service gets all available metadata and registers a DOI for the thesis that points to the repository item. The TID is converted into a DOI from the beginning, initially not active. RCAAP has a DOI registry at national level.
2 - Centralized digital preservation
22. Main Aspects
- The TID is key to identifying the thesis in the ecosystem.
- ORCID identifiers are used in RENATES systems and also in some repositories
- A persistent identifier is usually associated with the repository.
- Registration of DOIs for each thesis is on the way! (with ORCIDs)
- Funding information can be associated with theses!
- Digital Preservation is done at repository level (may be centralized in the future)
- Metadata (based on OpenAIRE Guidelines v3) & ETD-MS (OpenAIRE 4 in the near future)
25. Data Curation for Theses
Work based on repositories to identify
- Theses without TID
- Works not deposited
Lists sent to institutions so they can comply.
Mandatory from August 2013!
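The two curation lists above can be derived by comparing the registry with the repository records. The sketch below assumes a simplified record shape (a dictionary with `handle` and an optional `tid` key) and a hypothetical function name; the real data comes from RENATES and the harvested repositories.

```python
def curation_lists(registered_tids, repository_records):
    """Return (handles of records without a TID, registered TIDs never deposited)."""
    # Records that exist in a repository but carry no TID.
    without_tid = [r["handle"] for r in repository_records if not r.get("tid")]
    # TIDs that do appear on some repository record.
    deposited = {r["tid"] for r in repository_records if r.get("tid")}
    # TIDs known to the registry that no repository record mentions.
    not_deposited = sorted(set(registered_tids) - deposited)
    return without_tid, not_deposited


# Sample data is illustrative only.
records = [
    {"handle": "h/1", "tid": "201816962"},
    {"handle": "h/2", "tid": ""},
    {"handle": "h/3"},
]
print(curation_lists(["201816962", "201800001"], records))
```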
26. Support of the Community
Helpdesk by email and phone (provided by RCAAP Project)
Webinars
eLearning contents
Workshops at events
32. Building the OpenAIRE Research Graph (information space)
[Diagram: Content Providers (publications repositories, data repositories, hybrid repositories, registries, OA journals, software repositories, research infrastructures), under GUIDELINES and TERMS OF USE, feed the Research Graph Services, which produce the OpenAIRE Graph & Dashboards.]
35. The OpenAIRE research graph
Providing an open metadata research graph of interlinked scientific products, with Open Access information, linked to funding information and research communities
- Open
- Complete
- De-duplicated
- Transparent
- Participatory
- Decentralized
- Trusted
36. De-duplicated
• Publications: products with “equivalent” PIDs, title, authors, dates are grouped
• Dataset: products with “equivalent” PIDs are grouped
• Software: products with “equivalent” PIDs and original URLs are grouped
• Other products: products with “equivalent” PIDs, title, authors, dates are grouped
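A rough sketch of grouping by product type follows. It is not the OpenAIRE de-duplication algorithm: here “equivalent” is simplified to an exact match after lower-casing and whitespace normalisation, all listed fields are required to match, and the record fields (`type`, `pid`, `title`, `authors`, `date`, `url`) and sample values are assumptions made for the example.

```python
from collections import defaultdict


def _norm(text):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(str(text).lower().split())


def dedup_key(product):
    """Build the comparison key that applies to the product's type."""
    kind = product["type"]
    pid = _norm(product.get("pid", ""))
    if kind == "dataset":
        return (kind, pid)
    if kind == "software":
        return (kind, pid, _norm(product.get("url", "")))
    # Publications and other products: PID, title, authors and date.
    authors = tuple(sorted(_norm(a) for a in product.get("authors", [])))
    return (kind, pid, _norm(product.get("title", "")), authors,
            _norm(product.get("date", "")))


def group_duplicates(products):
    """Return lists of products that share the same key, in first-seen order."""
    groups = defaultdict(list)
    for product in products:
        groups[dedup_key(product)].append(product)
    return list(groups.values())


sample = [
    {"type": "publication", "pid": "10.1234/ABC", "title": "Open  Repositories",
     "authors": ["A. Silva"], "date": "2019"},
    {"type": "publication", "pid": "10.1234/abc", "title": "open repositories",
     "authors": ["a. silva"], "date": "2019"},
    {"type": "dataset", "pid": "10.1234/abc"},
]
for group in group_duplicates(sample):
    print(len(group), group[0]["type"])
```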
41. Evolution of OpenAIRE-Guidelines
2010 - Literature Guidelines v1
2012 - Literature Guidelines v2; Data Guidelines v1
2013 - Literature Guidelines v3
2014 - Data Guidelines v2
2015 - CRIS-CERIF Guidelines v1
2018 - Guidelines for institutional and thematic repos. v4.0; CRIS-CERIF v1.1
2018 - Guidelines for Software Repositories; Other Research Products
42. Metadata Goals in OpenAIRE
Goal | Metadata Groups
Discovery and Citability | Descriptive metadata
Accessibility and Reuse | Access Rights, License Conditions
Contextualization | Research Project, Linked Research Artefacts
Interoperability | Identifier for Entities, Controlled Vocabularies
Reporting | Funding Reference
TDM | File Location, License Conditions
51. Metadata Quality Challenges
Issue | Affects | Proposed Solutions
Missing values | Indexing, discovery, reuse | Curation by repository team; use OpenAIRE Validator, Broker service
Missing Links and Identifier | Interlinking with other research products; Contextualisation | ScholeXplorer, Broker service
Lack of controlled values | Discovery | Use agreed controlled vocabularies according to OpenAIRE Guidelines
Mandatory values only | Discovery and reuse | Broker service
52. References
● Guidelines at https://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/v4.0.0/
● Schema and examples on GitHub https://github.com/openaire/guidelines-literature-repositories
54. OpenAIRE’s e-infrastructure Commons – BROKER CONCEPT
[Diagram: CONTENT PROVIDERS (publications repositories, research data repositories, CRIS systems, registries such as projects, OA journals, software repositories), under GUIDELINES and TERMS OF USE, feed the INFO SPACE SERVICES (validation, cleaning, de-duplication, enrichment by inference), which link the entities Project initiative, Funder, Funding, Result (Publication, Data, Software) and Organization.]
Repositories in OpenAIRE may be interested in acquiring metadata about publications that are “potentially of interest to them”, i.e. that could be part of their collection: to add new records or to enrich existing records with extra metadata.
59. Interoperable metadata is key for
effective content sharing
Use our validation service and see how you can apply the
OpenAIRE Guidelines to expose your contents using
global standards.
VALIDATE
60. Reach a wider audience around the world
Register your datasource in OpenAIRE and be part of a
global interlinked network.
REGISTER
61. Improve your metadata. Get more connections
OA Broker service offers a wealth of information on scholarly communication data.
ENRICH
Find out what interests you and subscribe to enrich your records.
More & Missing events that may enrich your Repository:
• Persistent identifiers
• Open Access Versions
• Projects
• Subjects
• Abstracts
… datasets, software
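To make the “More & Missing” idea concrete, here is a minimal sketch of applying such events to a local record. The event shape (`kind`, `field`, `value`) and the function name are assumptions for illustration only; they are not the OpenAIRE Broker format.

```python
def apply_enrichments(record, events):
    """Return a copy of the record enriched with "missing" and "more" events.

    "missing" fills a field only when the record has no value for it;
    "more" appends an additional value to a (possibly new) list field.
    """
    enriched = dict(record)
    for event in events:
        field, value = event["field"], event["value"]
        if event["kind"] == "missing":
            if not enriched.get(field):
                enriched[field] = value
        elif event["kind"] == "more":
            existing = enriched.get(field)
            if existing is None:
                values = []
            elif isinstance(existing, list):
                values = list(existing)  # copy so the input stays untouched
            else:
                values = [existing]
            if value not in values:
                values.append(value)
            enriched[field] = values
    return enriched


record = {"title": "A thesis", "subjects": ["open access"]}
events = [
    {"kind": "missing", "field": "abstract", "value": "Short abstract."},
    {"kind": "more", "field": "subjects", "value": "repositories"},
    {"kind": "missing", "field": "title", "value": "Ignored, title exists"},
]
print(apply_enrichments(record, events))
```

Existing values are never overwritten, so the repository manager keeps control over what was curated locally.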
62. Open research impact empowers
Open Science
Open Metrics service by sharing your usage data.
Get the benefit of an aggregated environment to
broaden the mechanisms for impact assessment.
MEASURE
Get usage statistics reports for your datasource
67. Support materials for Content Providers Dashboard uptake
Support – guides
• Provide - How to validate and register your repository
• Provide - How to enrich research artifacts
• Usage Statistics – How to track the usage activity of your repository
• ScholeXplorer - Literature & Data interlinking
• Making your repository Open
Training – webinars
• Make your content count - OpenAIRE Content Providers Dashboard: service for repository managers
• OpenAIRE metrics service: usage statistics
• OpenAIRE Guidelines for data providers: new Metadata Application Profile for Literature Repositories
71. Integrated Vision
Why?
- Develop innovative services based on a national integrated information ecosystem
How?
- Based on international guidelines, identifiers, tools and services
What?
- Facilitate integrations between services (repositories, harvesters, funding, government, …)
- Focus on the end-user (researcher & science manager)
72. Integrated Vision of the Repositories Network
- Focus on community needs (and the support of the community)
- Researcher/User Centric approach
- Adoption of existing protocols, metadata schemas and guidelines
- Focus on Metadata Quality
- Get the added value from different services
73. National initiative to ensure the creation and sustained development of a national integrated information ecosystem to support research management
https://pt-cris.pt
79. GUIDELINES
- Initially DRIVER Guidelines, then OpenAIRE Guidelines
- Working on implementation of OpenAIRE 4 Literature Guidelines
- COAR Taxonomies for Document Types, Access Types and Versions
85. Repository Software - DSpace
Currently DSpace 5, but shaping DSpace 7!
Integrates the concept of Entities
Will be OpenAIRE 4 compliant
Use of API as main integration endpoint
86. APIs
Use of existing information from “authoritative” data sources:
- OpenAIRE API project list
- SHERPA/RoMEO (DSpace)
- RENATES
- ...
88. Validation Reports
At Search Portal level
One report for each harvesting process
Shows validated records, transformations and errors by type
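A validation report of this kind can be thought of as a tally over the outcomes of one harvesting run. The sketch below assumes each record yields a `(status, detail)` pair with statuses `validated`, `transformed` or `error`; these names, and the function itself, are hypothetical and not the Search Portal's actual report format.

```python
from collections import Counter


def summarize_harvest(outcomes):
    """Summarize one harvesting run.

    outcomes is an iterable of (status, detail) pairs; for errors,
    detail names the error type.
    """
    outcomes = list(outcomes)  # allow generators, we iterate twice
    by_status = Counter(status for status, _ in outcomes)
    errors_by_type = Counter(
        detail for status, detail in outcomes if status == "error"
    )
    return {"by_status": dict(by_status), "errors_by_type": dict(errors_by_type)}


run = [
    ("validated", None),
    ("validated", None),
    ("transformed", "date format"),
    ("error", "missing title"),
    ("error", "missing title"),
    ("error", "invalid type"),
]
print(summarize_harvest(run))
```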
89. Integrations with other systems
- Always…
- Using existing metadata profiles / mapping
- Adopting existing protocols (OAI-PMH, SWORD, REST API, ...)
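As an example of adopting an existing protocol, an OAI-PMH harvest starts with a plain HTTP request. The sketch below builds a `ListRecords` request using the standard OAI-PMH arguments (`verb`, `metadataPrefix`, `set`, `from`); the base URL is a placeholder, not a real repository endpoint, and the helper name is an assumption.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec          # restrict to one set, if any
    if from_date:
        params["from"] = from_date        # incremental harvest, e.g. "2019-11-07"
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint for illustration only.
print(list_records_url("https://repository.example.org/oai/request",
                       from_date="2019-11-07"))
```

Using `from` lets a nightly harvester fetch only records changed since the previous run.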
90. PTCRIS Sync
- PTCRIS Sync - Framework for synchronization with ORCID - https://github.com/fccn/PTCRISync
[Diagram labels: Curriculum Vitae; ORCID API; Any other CRIS System]
91. Requisites
We identified five use cases: claim tasks, deposits on repositories from external sources, synchronization, authority control for entities (authors, organizations, funding) and data curation.
94. Authentication
Federated Authentication based on the “National Researcher Identifier”, Ciência ID (which aggregates other author identifiers such as ORCID, Scopus, ResearcherID, ...)
1 Claim
a. Binding CID to RI user
b. Claim Author Profile
c. Works claim
95. From a system to the Repository
- From departments to institutional repository
- From Curriculum Vitae to Repository
[Diagram: User or System, SWORD v2 API, Repository]
96. 2. Direct deposit from CV
Wizard that permits:
• Choose Repository
• Choose Collection
• Choose full-text (FT) file and access type
• Introduce funding information
• Agree with deposit licence
99. 4. Authority Control for Authors
4.a Authors
Possibility to associate an author name with a unique identifier (ORCID and/or Ciência ID)
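When author names are bound to ORCID iDs, a cheap sanity check is the iD's check character, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The sketch below validates only the format and checksum; it does not confirm that the iD is registered, and the function names are chosen for this example.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True when the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

Rejecting malformed identifiers at deposit time avoids polluting authority records that other services later rely on.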
101. From the Repository to Other Systems
[Diagram labels: Repository; OAI-PMH; REST API; User on Curriculum Service; National Harvester; RENATES]
102. 1.c - Works claim
1 Claim
1.a Binding CID to RI user
1.b Claim Author Profile
1.c Work Claim
Possibility for a Ciência Vitae user to import works from repositories (via RCAAP Portal)
103. Data Curation
- Connecting Author Names with ORCID IDs
- Converting project IDs into project entity
info:eu-repo/grantAgreement/EC/FP7/612425/EU
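Converting a project ID string into a project entity starts with splitting the identifier into its parts. The sketch below assumes the `info:eu-repo/grantAgreement/Funder/FundingProgramme/ProjectID/Jurisdiction` layout used by the OpenAIRE Guidelines; the `ProjectRef` type and function name are hypothetical, and optional trailing segments beyond the jurisdiction are ignored.

```python
from typing import NamedTuple, Optional

PREFIX = "info:eu-repo/grantAgreement/"


class ProjectRef(NamedTuple):
    funder: str
    programme: str
    project_id: str
    jurisdiction: Optional[str]


def parse_project_id(identifier: str) -> ProjectRef:
    """Split a grantAgreement identifier into funder, programme and project ID."""
    if not identifier.startswith(PREFIX):
        raise ValueError(f"not a grantAgreement identifier: {identifier!r}")
    parts = identifier[len(PREFIX):].split("/")
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"incomplete grantAgreement identifier: {identifier!r}")
    jurisdiction = parts[3] if len(parts) > 3 and parts[3] else None
    return ProjectRef(parts[0], parts[1], parts[2], jurisdiction)


print(parse_project_id("info:eu-repo/grantAgreement/EC/FP7/612425/EU"))
```

The resulting fields can then be matched against an authoritative project list to create or link the project entity.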
104. OpenAIRE 4 on repositories
Author information is stored in repositories with the OpenAIRE 4 schema and will be harvested in this same format by the RCAAP portal.
106. Services Provided by the Repository Infrastructure
National Funder - Identify and report publications with national funding for reporting and evaluation.
Theses & Dissertations - Support the legal deposit of theses and dissertations
Synchronization - Repositories use the national harvester as a proxy to other services (validation, improvement, integration with CV, ORCID, Funder, Theses & Dissertations legal deposit, ...)
107. New Approaches
Hierarchical Metadata (Entities Concept)
The need to describe specific concepts more deeply (authors, funding, journals, events, affiliations,...)
Relations focus on known identifiers
Use of ORCID, ISNI, project IDs, ISSN, ISBN, DOIs,...
Reproducible Local data model
Possibility to reproduce entities and relations to other systems (from the repository to the harvester)
108. New Approaches
Concept of a living item - as it may be improved, updated, related over time by third party services (like linking research data to thesis…)
Need of international alignment - on guidelines, protocols, data models - to make research management really global
109. Practical Example of the Integrated Vision
[Diagram labels: Curriculum Vitae; CRIS System; PTCRIS Sync; National Harvester; Search Portal; Institutional Repositories; Broker; RENATES; Thesis]
111. How to reproduce the network ?
You have access to…
- Services
- Software
- Guidelines
- Protocols
- Use cases
- And an open community!
What more do you need?
Interoperability is Key
to an integrated research
and added value services!