FAIR Data Digest #12
FAIR proceedings: semantification of the open access publishing service CEUR-WS
Hi everyone,
I’m on vacation now, but thanks to pre-scheduling you won’t miss out on any FAIR data updates! Today I want to share with you a research paper in which the authors report on the creation of large amounts of FAIR data, which they host on Wikidata.
🧪 Researchers write scientific papers which get peer-reviewed and, if accepted, eventually make it into the proceedings of a conference. So far so good. However, depending on the publishing platform, you may end up with a lot of PDF files that are not really FAIR, meaning you cannot easily explore the research papers. In today’s Science Spotlight, I summarize an interesting paper in which the authors report on how they improved the CEUR-WS publishing workflow and made the metadata available on Wikidata.
🧪 Science Spotlight: CEUR-WS Semantification
If you’re organizing a small workshop, you are most likely looking for a place to publish the proceedings. CEUR-WS (Q27230297) has you covered! It’s a free online service for publishing conference and workshop proceedings, and it has been operated by a team of unpaid volunteers since 1995.
Currently there are more than 3,300 volumes published by CEUR-WS, containing over 65,000 PDF documents. However, most of the metadata is only indirectly available, either via other indexing services or thanks to earlier manual curation. Apparently there have been attempts in the past to make this data more accessible, but so far none of them has been sustainable, because hosting the data permanently requires a service with Persistent Identifiers (PIDs).
“PIDs can only remain persistent if someone is committed to ensuring they stay accessible to users. This requires an engagement or a service level agreement for PID availability, in contrast to URIs, where no such agreement exists”
This is where two tools come into play: Wikidata, which is hosted by the Wikimedia Foundation (Q180), and Semantic MediaWiki, an extension of the MediaWiki software. In the paper “Semantification of CEUR-WS with Wikidata as a target Knowledge Graph” (Q118799186) that I want to share with you today, the authors present a solution: instead of yet another effort to create RDF out of the proceedings and store it in a local RDF database, the whole publishing workflow is adapted. A MediaWiki serves as the Content Management System and single source of truth for the metadata, and Wikidata serves as a free, externally hosted database for the FAIR metadata.
They have already applied the workflow and made proceedings and event entities available on Wikidata. You can find more details in their paper. What is the benefit, you may ask? One of them is data exploration! Check out the live query in the caption of the following image: it shows the locations of all proceedings events, which is possible because this information is now stored in a FAIR fashion.

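If you want to try this kind of exploration yourself, here is a minimal Python sketch of how such a location query could look against the public Wikidata Query Service. It is not the authors' query. I am assuming that proceedings volumes are linked to the CEUR-WS series (Q27230297) via P179 ("part of the series"), to their event via P4745 ("is proceedings from"), and that the event's location (P276) carries a coordinate (P625). The function names and the User-Agent string are made up for illustration, so you may need to adjust things to the actual data model.

```python
import json
import re
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed modelling (not taken from the paper):
#   volume --P179 (part of the series)--> CEUR-WS (Q27230297)
#   volume --P4745 (is proceedings from)--> event
#   event --P276 (location)--> place --P625 (coordinate location)--> point
QUERY = """
SELECT ?event ?eventLabel ?coord WHERE {
  ?proceedings wdt:P179 wd:Q27230297 ;
               wdt:P4745 ?event .
  ?event wdt:P276 ?location .
  ?location wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""


def parse_point(wkt):
    """Turn a WKT literal like 'Point(8.4 49.0)' into (lat, lon), or None."""
    match = re.fullmatch(r"Point\(([-\d.eE+]+) ([-\d.eE+]+)\)", wkt.strip())
    if match is None:
        return None
    # WKT stores longitude first, then latitude.
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def extract_events(result):
    """Collect (label, lat, lon) tuples from a SPARQL JSON result."""
    rows = []
    for binding in result["results"]["bindings"]:
        coords = parse_point(binding["coord"]["value"])
        if coords is None:
            continue  # skip rows whose coordinate cannot be parsed
        label = binding.get("eventLabel", {}).get("value", "")
        rows.append((label, coords[0], coords[1]))
    return rows


def run_query(query=QUERY):
    """Send the query to the endpoint and return the decoded JSON result."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        # Hypothetical User-Agent; replace it with your own contact details.
        headers={"User-Agent": "fair-data-digest-example/0.1 (illustrative)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling extract_events(run_query()) gives you a list of event labels with latitude and longitude that you can feed into any map library. The parsing is kept separate from the network call so you can test it on a saved result without hitting the endpoint.
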
That’s it for this week of the FAIR Data Digest. I hope you found the content interesting. Don’t forget to share or subscribe. See you next week!
Sven