FAIR Data Digest #18
On Reliable Research Software (Harvard Data Science Review) and announcements for the newsletter: frequency of posts and DOI identifiers
Hi everyone,
This edition starts with two announcements before it dives into the topic of software engineering skills for data science.
This newsletter started as a weekly newsletter in which I shared topics around FAIR data and updates about my work. Selecting one or more topics each week is fun and helps me summarize and understand topics better. However, depending on the topic, it takes some time to prepare the content. Additionally, I don’t get much response, and opening rates are declining instead of growing. I have therefore decided to post only every other week. This helps me to prepare the content better and keeps your inboxes from overflowing :-)
🔍 The second announcement relates to making this newsletter even more FAIR! Up until now, metadata about the newsletter has been available on Wikidata: the newsletter itself (Q119153950) and editions of it such as the last edition (Q122839768) on the European Day of Languages. As of today, each edition of the newsletter gets its own Digital Object Identifier (DOI). This is thanks to the Rogue Scholar platform, about which you can learn more in today’s insights section!
🖥️ It is very likely that you depend on certain software for your work on a daily basis, e.g. Excel or chat programs. The same is true for researchers in many fields. Yet, specialized research software often has to be developed for specific purposes. Often, this software is developed by the researchers themselves, whether they are formally trained in software engineering or not. Imagine the computer program you depend on was created by a colleague of yours who will retire soon. Not a very comforting idea! In today’s coding corner I link to a very interesting article from the Harvard Data Science Review that provides an analysis and lessons learned about the needed professionalization of software development in research.
🔍 Insights: Findable and Preserved Scholarly Communication
Web content in the form of text often does not live very long. Websites and blogs are abandoned or go offline, or they evolve and move such that previous content is difficult to find. For modern science communication in the form of scholarly blogs in particular, this is less than ideal.
The Rogue Scholar platform (Q122915746) offers solutions! Its slogan is “Science blogging on steroids”. You can register your science blog and the platform acts as a sort of data aggregator. One of the only prerequisites is that the full text of your content is openly available under a Creative Commons license, for example as an RSS (Q45432) feed.
You get the following advantages:
Your blog posts become findable via assigned DOIs
You are linked to your blog posts via your ORCID identifier (Q51044)
Your content becomes searchable via the platform
Besides findability, your content will be preserved for the long-term
If needed, you can have a Mastodon (Q27986619) bot boost (“tweet”) updates on your blog posts
This also offers advantages for the scientific community. Not only does your content become more visible and findable, but in case your blog is discontinued, the DOI can link to an archived version of the content. So it stays citable.
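To make the identifiers mentioned in this edition a bit more concrete, here is a minimal Python sketch of how a DOI, an ORCID iD, or a Wikidata item ID turns into a link you can click. The function names are my own choice for illustration, and the DOI in the usage lines is a made-up placeholder, not the DOI of any real post. The Wikidata ID is the one of this newsletter.

```python
def doi_url(doi: str) -> str:
    """Build the resolver link for a DOI, e.g. '10.1234/example' (placeholder)."""
    return f"https://doi.org/{doi.strip()}"


def orcid_url(orcid: str) -> str:
    """Build the profile link for an ORCID iD."""
    return f"https://orcid.org/{orcid.strip()}"


def wikidata_url(qid: str) -> str:
    """Build the item page link for a Wikidata item ID such as 'Q119153950'."""
    qid = qid.strip()
    # Wikidata item IDs are the letter Q followed by digits.
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


if __name__ == "__main__":
    print(wikidata_url("Q119153950"))   # the FAIR Data Digest item
    print(doi_url("10.1234/example"))   # hypothetical DOI, for illustration only
```

The point of the sketch is that the identifier stays the same while the resolver decides where the link leads, which is why a DOI can later point to an archived copy.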
Interested? Getting DOIs for the first 50 posts per year is completely free. Go and check it out and support the platform.
🖥️ Coding Corner: Better Research Software
Current research in many disciplines heavily depends on software. But just knowing how to code is not sufficient to enable high-quality data science or advance a field. Research is about collaboration: software needs to be shared and understood, and its developers need to be trained!
Last week I read an article in the Harvard Data Science Review (Q65011533) about the experiences in Research Software Engineering at the eScience Institute of the University of Washington (Q101097636).
Based on practical experience, the authors identified three needs that can be addressed to different degrees depending on the scope of the research project. The scope can be, for example, a solo project by a single developer, a project of a whole lab, or a project meant to serve a community. The three needs are:
Readability of the code so that others can also understand and maintain it
Resilience (or reliability) of software to give confidence that the software works as intended
Reusability of software by others, without adapting the code
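As a small illustration of the three needs, here is a Python sketch of my own (it is not taken from the article). The function name `column_mean` and the sample data are hypothetical. Readability comes from the descriptive name, type hints and docstring. Resilience comes from checking the input and failing with a clear message. Reusability comes from passing the column name as a parameter instead of hard-coding it.

```python
from statistics import mean


def column_mean(rows: list[dict[str, float]], column: str) -> float:
    """Return the arithmetic mean of `column` across all `rows`.

    Raises ValueError if `rows` is empty and KeyError if any row
    lacks the requested column.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # Collect the positions of rows that lack the column, so the error
    # message tells the user exactly where the data is incomplete.
    missing = [i for i, row in enumerate(rows) if column not in row]
    if missing:
        raise KeyError(f"column {column!r} missing in rows {missing}")
    return mean(row[column] for row in rows)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    samples = [{"temperature": 1.0}, {"temperature": 3.0}]
    print(column_mean(samples, "temperature"))
```

A few assertions like `column_mean(samples, "temperature") == 2.0` already act as a first automated test and give others confidence when they change the code.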
The article provides practical examples from different disciplines. Furthermore, it investigates current gaps in software engineering curricula for non-computer-science students. Alternatives are proposed and lessons from a new Research Software Engineering team are shared.
Interested in how you can improve and measure your software engineering quality with the 3Rs or why it is easier to recruit senior software engineers for academia rather than junior engineers? Check out the full open access article at the Harvard Data Science Review.
That’s it for this edition of the FAIR Data Digest. If you found the content interesting, please share and subscribe. See you in two weeks!
Sven