FAIR Data Digest #20

the Halloween edition on scary data anti-patterns

Oct 31, 2023

Hello everyone,

Welcome to today’s Halloween edition about data anti-patterns, which will hopefully scare you so much that you avoid them :-) These are the things you had better steer clear of when aiming for FAIR data.

Anti-patterns? In short, they are the opposite of best practices: common or obvious solutions that are not only ineffective but also risky. As with biases, it is good to know about them so you can try to avoid them. I mainly know the term from software engineering, but apparently it is also used in project management and business processes (Q76438, Wikipedia).

In the following I will list some anti-patterns I have seen “in the wild” that directly relate to the FAIR principles: their evil twins, if you like 🧛

daryl_mitchell from Saskatoon, Saskatchewan, Canada, CC BY-SA 2.0, via Wikimedia Commons

Findability: the needle in the haystack

Findability is probably the most important principle to get started: if you don’t know that something exists or you know it but you don’t know where it is, then you simply cannot use it.

Providing unique and persistent identifiers for things is the common solution to deal with findability. As easy as it sounds, there are many possible pitfalls, some obvious and some more tricky.

😱 reusing identifiers
➡️ Recycling is good, but reusing identifiers is bad: identifiers should be unique!
😱 implementation details as part of the identifier
➡️ If you change your software or provider, the identifier will change. That is why “identifiers” such as http://my-url.com/items?query=abc are bad.
😱 thinking that persistent identifiers are only a technical problem for the ICT department
➡️ Providing and maintaining the technical infrastructure is important, but knowing the data, use cases, stakeholders and community best practices is just as important
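
The second pitfall in the list above can be spotted mechanically, at least to some extent. The following is a minimal Python sketch, not an official FAIR check: the function name `identifier_smells` and the list of file suffixes are assumptions made up for illustration, and the second URL is a hypothetical example. Only the first URL comes from the text above.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical heuristics: suffixes that hint at the software behind a URL.
TECH_SUFFIXES = (".php", ".asp", ".aspx", ".jsp", ".cgi", ".html")


def identifier_smells(identifier: str) -> List[str]:
    """Return reasons why a URL-style identifier may not stay stable."""
    parsed = urlparse(identifier)
    smells = []
    if parsed.query:
        # Query strings usually mirror how the current software looks things up.
        smells.append("contains a query string")
    if parsed.path.lower().endswith(TECH_SUFFIXES):
        smells.append("exposes a technology-specific file extension")
    if parsed.port is not None:
        smells.append("includes an explicit port number")
    return smells


print(identifier_smells("http://my-url.com/items?query=abc"))
print(identifier_smells("https://example.org/id/item/42"))
```

An empty list does not prove that an identifier is persistent. It only means that none of these simple warning signs were found; uniqueness and long-term maintenance still need people who know the data.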

From a very practical perspective, searching nowadays often happens via Google. Have you ever tried to google a persistent identifier? Often you won’t find much. The following paper investigates how FAIR persistent identifiers actually are and which role validity can play (DOI: 10.3233/DS-190024).

By the way, there are different ways to implement persistent identifiers. You can check out the Persistent Identifier Guide of the Dutch Digital Heritage Network to know which system works best for you. Just answer 25 questions.


Accessibility: what’s in the box?!

To stay in the Halloween theme, imagine you’ve found a secret door, but you don’t know how to open it. You would be more than happy getting some information about how to access the room behind it: that would be metadata about accessibility.

😱 not providing any metadata
➡️ “~~Pics~~ metadata or it didn’t happen”. Help others to find out more about something (something that possibly is no longer available). I have heard that the phone number of a research project’s PI can be the most valuable piece of metadata :-) (DOI: 10.5281/zenodo.8344854, YouTube)
😱 not using an open standard for metadata
➡️ After all the effort of making data available, please go the extra mile and provide easily accessible metadata. Your users will thank you!
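
To make “metadata in an open standard” a bit more concrete, here is a minimal sketch that writes a metadata record as JSON-LD using the schema.org vocabulary, which is one open standard among several. The helper name `build_dataset_metadata` and all values (dataset name, identifier, access note, contact details) are hypothetical placeholders, not taken from a real dataset.

```python
import json


def build_dataset_metadata(name, description, identifier, access_note,
                           contact_name, contact_phone):
    """Return a schema.org Dataset description as a plain dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        # Tells people how to get in, even if the data itself is not open.
        "conditionsOfAccess": access_note,
        "creator": {
            "@type": "Person",
            "name": contact_name,
            "telephone": contact_phone,
        },
    }


# All values below are made-up placeholders.
record = build_dataset_metadata(
    name="Haunted house temperature readings",
    description="Hourly temperature measurements from a fictional house.",
    identifier="https://example.org/id/dataset/haunted-house",
    access_note="Available on request from the contact person.",
    contact_name="Jane Doe",
    contact_phone="+00 000 000 000",
)

print(json.dumps(record, indent=2))
```

The point of the sketch is that the record stays useful even when the data is gone or restricted: it still says what existed and whom to ask.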

Interoperability: it just works

Interoperability is FAIR’s tongue-twister that even grammar checkers have trouble with. It means exchanging information between computer systems without re-inventing the wheel each time. It’s all about standards (again).

😱 not using a standard
➡️ Standards are important, not just to charge your devices via USB, but also for your data! Think of the amazing feeling when you can simply double-click on a file and it just opens without problems.
😱 using a standard in the wrong way
➡️ Read the docs so that you apply the standard (vocabulary) you describe your data with correctly. Imagine you annotate your website incorrectly and, as a result, it ranks worse on Google and other search engines.
😱 using a proprietary format
➡️ You have done an amazing job creating some valuable data. Don’t make its value and success dependent on an untrustworthy commercial product that may cease to exist or that only a fraction of your users are able to use.

Reusability:

You found something online, for free … how valuable do you think it is for your use case, and to what extent are you allowed to use it? Likewise, if you create something, you probably want to be credited for it, so you should indicate how someone else can reuse what you created. These questions concern the reusability (of data). There are a few things to avoid.

😱 no license
➡️ No one likes to be sued! Indicate a license to give security to your users on the one hand, and tell them how you would like them to use it on the other hand.
😱 not mentioning any sources or provenance
➡️ Everything has its origin, and knowing it is important for assessing its value for reuse. Based on which criteria did you curate the data? Which software did you use? Which data sources?
😱 too complicated
➡️ Make it easy for anyone to reuse your data/software. Once again, this can be achieved by following standards. For example, when providing code, try to follow common coding and installation guidelines.
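
The first two pitfalls in this list are easy to overlook, so a small automated reminder can help. Below is a minimal sketch that reports which reuse-related fields are missing from a metadata record. The checklist, the function name `reuse_gaps` and the example record are assumptions for illustration; the field names `license`, `isBasedOn` and `creator` are borrowed from the schema.org vocabulary, but any standard you already use would do.

```python
# Hypothetical minimal checklist: metadata field -> the problem its absence signals.
REUSE_FIELDS = {
    "license": "no license",
    "isBasedOn": "no sources or provenance",
    "creator": "nobody to credit",
}


def reuse_gaps(metadata: dict) -> list:
    """Return the reusability problems found in a metadata record."""
    return [
        problem
        for field, problem in REUSE_FIELDS.items()
        if not metadata.get(field)  # missing or empty both count as a gap
    ]


# Made-up record that names a creator but nothing else.
example = {"name": "Pumpkin harvest counts", "creator": "Jane Doe"}

print(reuse_gaps(example))
```

A check like this only tells you that a field is filled in, not that the license fits your intentions or that the provenance is complete. That part still needs a human.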

These were just a few anti-patterns that stand in opposition to some of the FAIR principles. Feel free to google to find more specific data anti-patterns. Happy Halloween 🎃

That’s it for this week of the FAIR Data Digest. If you found the content interesting, please share and subscribe. See you in two weeks!

Sven
