FAIR Data Digest #3

On the benefits of ISNI identifiers, what ELSA is and how FAIR data can support legislation

Jun 20, 2023

Hi everyone,

Welcome to the 3rd edition of my newsletter! Besides the fact that we have suffered the first official heat wave in Belgium this year, I have other hot information to share. This time: some work updates about the use of the ISNI identifier, an online webinar on training data scientists in ethical, legal and social aspects, and a video recommendation about law as code.

🏢 If you work with data from different data sources, you need to identify where the data overlap, ideally by common identifiers. In my work update I present a Python script that I wrote to fill identifier gaps based on the ISNI database. It helped identify more than 9,000 identifiers of contributors to relevant translations of contemporary Belgian history.

📅 I attended an online webinar about the development of a training curriculum for data scientists on Ethical, Legal and Social Aspects (ELSA) as part of the FAIR Data Spaces project. My main takeaway: there should be a minimum shared vocabulary between computer scientists and ELSA researchers, because data sharing is not only about technical challenges!

🎥 Last but not least, as citizens we are subject to legislation on many levels. Working out which legislation applies, and when, is a perfect use case for FAIR data. I will share some resources related to law as code that highlight the benefits of FAIR data for legislation, with examples from the European Publications Office and the Interoperable Europe program.

🏢 Work updates

As mentioned in the last issue, for the BELTRANS project we use different identifiers to integrate data about book translation contributors from different data sources. For example, record 14150228 from the Royal Library of Belgium (KBR) contains the International Standard Name Identifier (ISNI) 0000 0001 2148 8752, and so does record cb119276389 from the National Library of France (BnF). This means both records are about the same person.
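To make the matching idea concrete, here is a minimal sketch of how two records could be compared by ISNI. The dictionary layout and the function names are my own assumptions for illustration and not the BELTRANS data model; only the record numbers and the ISNI come from the example above. The check character test follows ISO 7064 MOD 11-2, which ISNI uses for its last character.

```python
def normalize_isni(raw):
    """Remove spaces and hyphens so differently formatted ISNIs compare equal."""
    return "".join(ch for ch in raw if ch not in " -").upper()


def has_valid_check_character(isni):
    """Verify the 16th character with the ISO 7064 MOD 11-2 algorithm."""
    compact = normalize_isni(isni)
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    total = 0
    for digit in compact[:15]:
        total = (total + int(digit)) * 2
    expected = (12 - total % 11) % 11
    expected_char = "X" if expected == 10 else str(expected)
    return compact[15] == expected_char


def same_person(record_a, record_b):
    """Two records match if both carry the same (normalized) ISNI."""
    isni_a = record_a.get("isni") or ""
    isni_b = record_b.get("isni") or ""
    if not isni_a or not isni_b:
        return False  # without an ISNI on both sides we cannot find a match
    return normalize_isni(isni_a) == normalize_isni(isni_b)


# Hypothetical record layout; identifiers taken from the example above.
kbr_record = {"source": "KBR", "id": "14150228", "isni": "0000 0001 2148 8752"}
bnf_record = {"source": "BnF", "id": "cb119276389", "isni": "0000000121488752"}

print(same_person(kbr_record, bnf_record))
```

Normalizing before comparing matters because data sources do not always format the ISNI the same way, with or without spaces.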

In the case of the BELTRANS project, we have CSV lists of person contributors that mention different identifiers in different columns. Yet some rows have missing identifiers: for example, person contributors for whom we have the ISNI identifier and identifiers from other data sources, but not yet one from KBR. And if a data source does not contain the ISNI identifier for a person, we cannot find a match!

However, instead of just passively using the ISNI identifier to find matches between data sources, we can also use it to actively search, because ISNI also has a central database. Last week I implemented a script that reads a person contributor CSV file and lets you indicate which “gaps” in the identifier columns you would like to have filled based on data from the central ISNI database.

I have put the script that uses an Application Programming Interface (API) of the ISNI database to fetch identifiers under an open license on GitHub. So far, the script simply tries to fill the gaps if possible. However, future versions could also be used to update existing wrong identifiers or produce warnings. Let me know if you would need such a feature!
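The gap-filling idea can be sketched in a few lines of Python. This is not the script from GitHub, just an illustration under assumptions: the column names (`isni`, `kbr_id`, `bnf_id`), the contributor name and the `fake_lookup` function are hypothetical, and the lookup is a stand-in for a real request to the ISNI API so that the example runs offline.

```python
import csv
import io


def fill_gaps(rows, isni_column, target_columns, lookup):
    """Fill empty cells in target_columns using identifiers returned by lookup(isni).

    Existing values are never overwritten. Returns the number of filled cells.
    """
    filled = 0
    for row in rows:
        isni = (row.get(isni_column) or "").strip()
        if not isni:
            continue  # nothing to search with
        missing = [c for c in target_columns if not (row.get(c) or "").strip()]
        if not missing:
            continue  # no gaps in this row
        found = lookup(isni)
        for column in missing:
            value = found.get(column)
            if value:
                row[column] = value
                filled += 1
    return filled


def fake_lookup(isni):
    """Hypothetical stand-in for a query to the central ISNI database."""
    known = {
        "0000 0001 2148 8752": {"kbr_id": "14150228", "bnf_id": "cb119276389"},
    }
    return known.get(isni, {})


# Hypothetical contributor list: the first row lacks the KBR identifier.
sample_csv = """name,isni,kbr_id,bnf_id
Contributor A,0000 0001 2148 8752,,cb119276389
Contributor B,,,
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
filled_count = fill_gaps(rows, "isni", ["kbr_id", "bnf_id"], fake_lookup)
print(filled_count, rows[0]["kbr_id"])
```

Passing the lookup in as a function keeps the CSV logic separate from the network call, which makes it easy to test the gap filling without hitting the real database.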

From an initial contributor list with blanks of the BELTRANS project, I could find 1,398 missing KBR identifiers, 6,504 missing identifiers for the National Library of France and 1,234 missing identifiers of the Royal Library of the Netherlands.

Check out the code on GitHub

📅 Events

FAIR Data Spaces - ELSA curriculum

If there is one term you can’t get around these days, it is data spaces! This term usually describes an environment in which different organizations and consumers manage and share data in a decentralized way. And I guess that you already know which qualities the data need to have to make this possible … spoiler: they have to be FAIR!

Managing data in such a way does not only revolve around technical challenges, but also around Ethical, Legal and Social Aspects (ELSA), something (future) data scientists should know about.

Last week I attended a workshop about the development of an ELSA curriculum for data scientists. This workshop was part of the FAIR Data Spaces research project, funded by the German Federal Ministry of Education and Research (BMBF).

Besides the acronym ELSA, I learned about a lot of things such as the CRISP-DM (CRoss-Industry Standard Process for Data Mining) model to guide data science or the Personal Health Train that enables care professionals to analyze shared data.

My main takeaways: it is all about awareness of ELSA topics, and there needs to be a minimum shared vocabulary between technical and ELSA stakeholders. And last but not least, computer scientists should not hesitate to get in touch with ethics researchers, as they are the experts of the real scientific field of ethics, as Jona Boeddinghaus stressed in his presentation.

Resources about the FAIR Data Spaces project are available on their Zenodo community, thus in a FAIR way!

Read about the workshop

🎥 Videos not to miss

Law as Code

Legislation can be very complex. On top of that there are different governmental organizations that produce and use legislation: one municipality may have slightly different rules or fees compared to others.

For you as a citizen it is also important to quickly get the answers you seek and not have to study dozens of legal documents. This is where FAIR data comes in! Imagine you could find information in or across legal documents and query it in an interoperable way.

Recently Interoperable Europe hosted a webinar about law as code (see video below). Among others, the webinar featured presentations about the EU Rail Ontology for the EU mobility data space and local council decisions as Linked Data in Flanders, Belgium.

How can FAIR legal documents be created in the first place? At the last Extended Semantic Web Conference (ESWC), the Publications Office of the EU presented LegalHTML, their workflow to streamline the production of formal, structural and semantic representations of legal acts. Another example is the use case of the Flemish legislation (not to be confused with the local council decisions above). You will find more information about that in my blog post from last week on a network event.

That’s it for this week of the FAIR Data Digest. I hope you found the content interesting. Don’t forget to share or subscribe. See you next week!

Sven
