“By making data easier to find, others can use it too in their research”

How a metadata catalogue can help science take further steps

This spring Utrecht University geoscientists will launch a new metadata catalogue. But what exactly is such a catalogue, and how can it help you as a scientist? Project manager Ronald Pijnenburg gives us a tour of the new system. He explains why the catalogue brings something new, and why this system is reusable within other research disciplines.

More and more scientists publish their research data on online platforms. The data is often publicly available; anybody who wants to can download and use a dataset. However, this is easier said than done, because for each discipline there are hundreds of places to share research data. How do you find the data relevant to your research in all these scattered sources?

The answer to this question: with the help of a metadata catalogue. “That is a place where data from several publication sources can be found in one spot,” says geoscientist Ronald Pijnenburg. “The data files themselves stay where they are. Only the metadata, that is, the description of the research, is included in the catalogue. Think for instance of the title, the authors, keywords and the subject matter of the dataset. And of course, you will find a link to the data files.”

Extensive search system

Ronald Pijnenburg is project manager of EPOS NL, the Dutch branch of EPOS, a European data portal for Earth Sciences. Together with a group of colleagues from Utrecht University and Delft University of Technology, he works on a sub-catalogue, EPOS Multi-scale Laboratories (MSL). It helps you find lab data from a hundred Earth science laboratories across Europe. In turn, this sub-catalogue feeds into the central EPOS catalogue. This spring EPOS will be officially launched at a conference in Vienna.

Ronald Pijnenburg in the Earth Simulation Laboratory, photo by Annemiek van der Kuil, PhotoA

Such a metadata catalogue resembles the digital search system of a library, Ronald Pijnenburg explains. You enter the subject you are searching for, press enter, and the system provides a list of search results matching your query. “Let’s take my own PhD research into sandstone layers in Groningen (see text box below) as an example. Suppose I am searching for data on sandstone. Via the central EPOS portal I find that there are two discipline groups offering data on sandstone, including Multi-scale Labs (MSL). Within the MSL data service, I can refine my search query. In this way I will find exactly the data I am looking for. The same can be done via the sub-catalogue that MSL offers alongside the central EPOS portal. Here, have a look!”

On his screen a page appears with a search bar and, to the left, a whole series of search terms. “In the catalogue that was developed specifically for MSL, you can tick more specific boxes to indicate what data you are looking for,” he points out. “For instance, you could filter on rock type or on the equipment used to collect the data. A bit like Funda, the Dutch real estate website, where you also tick your search criteria.”
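For readers who like to see the idea in code: the tick boxes described above amount to filtering metadata records on fixed fields. The sketch below is only an illustration and not the actual EPOS-MSL implementation. The record fields, the sample titles and the example.org links are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    # Hypothetical fields: a catalogue stores descriptions, not the data files.
    title: str
    authors: list
    material: str
    equipment: str
    data_url: str  # link to the repository where the files actually live


def filter_records(records, **facets):
    """Keep only records whose fields match every ticked facet."""
    return [
        record
        for record in records
        if all(getattr(record, field) == value for field, value in facets.items())
    ]


# Made-up sample records, for illustration only.
RECORDS = [
    MetadataRecord("Compaction tests A", ["Author One"], "sandstone",
                   "triaxial apparatus", "https://example.org/data/a"),
    MetadataRecord("Friction tests B", ["Author Two"], "sandstone",
                   "ring shear apparatus", "https://example.org/data/b"),
    MetadataRecord("Dissolution tests C", ["Author Three"], "limestone",
                   "triaxial apparatus", "https://example.org/data/c"),
]

sandstone_only = filter_records(RECORDS, material="sandstone")
refined = filter_records(RECORDS, material="sandstone",
                         equipment="triaxial apparatus")
```

Ticking one box (`material="sandstone"`) keeps two of the three sample records; ticking a second box narrows the result to one. In each case the record only points to the data through its link.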

About Ronald Pijnenburg

Ronald Pijnenburg sitting in front of the Earth Simulation Laboratory, photo by Annemiek van der Kuil, PhotoA

Ronald Pijnenburg works at the Faculty of Geosciences. He manages the Dutch team that is building and managing the EPOS-MSL metadata catalogue. Ronald Pijnenburg is a geoscientist himself; his PhD research dealt with the earthquakes in Groningen that were caused by gas extraction. He conducted research into the sandstone layer containing the gas: what happens to the sandstone if you produce gas and so lower the gas pressure?

Open data

Via ‘material’ and ‘sandstone’ Ronald Pijnenburg keeps on clicking, until he arrives at ‘Slochteren-sandstone’, the Groningen subtype from his PhD research. “As soon as you have found a suitable dataset, the catalogue will link you through to the portal where the data can be found online. There you can download the data files directly.”

Nearly all research data linked in EPOS are publicly available, but there are exceptions. In Yoda, for instance, the Utrecht University data management platform, the person publishing the data can choose to lock the data and only make the description (the metadata) available. If another researcher wants to access the data, the owner must give permission.

What search terms to use?

To make sure that the information in a metadata catalogue is easy to find, it is important that every author adds the same search terms to their publications. Ronald Pijnenburg: “You can only find other data on, for instance, sandstone if everybody uses exactly the same term without spelling errors. That is why we have agreed upon a vocabulary with the approximately one hundred laboratories involved in EPOS-MSL. The intention is that everybody now uses the same terms. As you can imagine, it takes quite a bit of work to put that into practice.”
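Why an agreed vocabulary matters can be shown with a small sketch. The terms and variants below are invented for illustration and are not the real EPOS-MSL vocabulary. The point is that a keyword is only accepted if it maps to one agreed term.

```python
# Hypothetical vocabulary: each agreed term with the spellings accepted for it.
VOCABULARY = {
    "sandstone": {"sandstone", "sand stone", "sandstones"},
    "limestone": {"limestone", "lime stone", "limestones"},
}


def normalise_keyword(raw):
    """Return the agreed term for a typed keyword, or None if it is unknown."""
    # Ignore differences in capitalisation and spacing.
    cleaned = " ".join(raw.strip().lower().split())
    for term, variants in VOCABULARY.items():
        if cleaned in variants:
            return term
    return None  # unknown or misspelled: ask the author to pick a listed term


accepted = normalise_keyword("  Sand  Stone ")
rejected = normalise_keyword("sandstne")
```

Here `accepted` becomes `"sandstone"`, while the misspelled `"sandstne"` is rejected instead of silently creating a second, unfindable keyword.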

Smart reuse of data

Metadata catalogues are important to science for several reasons, says Ronald Pijnenburg. “As research data become easier to find, they become easier for others to reuse. In this way you prevent people from reinventing the wheel. You can also set up entire studies based on existing data. That is more efficient, also in terms of costs. Students and their supervisors, whose budgets are often limited, may benefit from this.”

A metadata catalogue can also be a way for researchers to come into contact with their peers. “If you see in the search results that a particular scientist has done a lot of research on the same topic as you have, it could be a reason to approach that person. In this way, data management eventually reinforces the scientific community.”

Moreover, the catalogue boosts multidisciplinary research. “In the central EPOS portal, all relevant research disciplines are listed. As a result, it is not difficult to find data outside of your own field of expertise. Someone with a lab background, for instance, is encouraged to have a look at satellite data. In my case, I would draw a circle on the map around the region of Groningen. Next I would be shown all datasets from that region. Not only the kinds of data I am familiar with, but also other types of data.”
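Drawing a circle on the map comes down to a distance check on the location stored in each metadata record. The sketch below is an assumption about how such a search could work, not a description of the EPOS portal itself. It uses the standard haversine formula, approximate coordinates for the city of Groningen, and made-up datasets.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def datasets_within(datasets, centre, radius_km):
    """Return every dataset located inside the circle, whatever its type."""
    lat0, lon0 = centre
    return [d for d in datasets
            if distance_km(lat0, lon0, d["lat"], d["lon"]) <= radius_km]


GRONINGEN = (53.22, 6.57)  # approximate city centre

# Hypothetical datasets from different disciplines.
DATASETS = [
    {"title": "Lab compaction tests", "type": "lab", "lat": 53.25, "lon": 6.80},
    {"title": "Subsidence map", "type": "satellite", "lat": 53.30, "lon": 6.60},
    {"title": "Survey elsewhere", "type": "seismic", "lat": 52.09, "lon": 5.12},
]

nearby = datasets_within(DATASETS, GRONINGEN, radius_km=50)
```

With a radius of 50 kilometres, both the lab dataset and the satellite dataset near Groningen are returned, while the third dataset, located far to the south-west, is left out.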

Other discipline, same system

The way in which research data can be found with the help of the EPOS-MSL metadata catalogue can also easily be applied to other disciplines. On the same basis, a metadata portal is being developed for the CD2 project (‘Connecting Data in Child Development’) of the Consortium on Individual Development (CID). At Utrecht University, the studies YOUth and RADAR are part of this consortium. In this catalogue researchers collect data on various longitudinal studies into the development of children in the Netherlands. The types of data range from biomedical measurements and questionnaires to observations of parent-child interactions and eye tracking experiments. The test portal of the CD2 project is now online. Check it out.

Generate impact with your research

Proper data and metadata add value to science. But many researchers are still hardly aware of this, says Ronald Pijnenburg. “Data management is often seen as something that has to be done, a boring chore you don’t want to spend too much time on. The ‘why’ is also not always clear. How can my dataset be of help to anybody else?”

The power of international metadata catalogues, he explains, lies in making specific datasets findable alongside hundreds of other datasets, both from the same research discipline and from other disciplines. “Reuse of such data has proved its value time and time again in advancing scientific knowledge. Quite useful, I should think.”

But adding proper metadata to your published data also has personal advantages, he emphasises. “If the data are easier to find, you generate more impact with your research. And when your collected data are reused by others, your work becomes more widely known and recognised. This may help advance your career.”

Tips for others

His most important tip for researchers? “Go and have a look at what data portals and data catalogues are available in your discipline. At the Faculty of Geosciences we have made a useful overview. In addition, many faculties have their own data steward or data manager; they are good starting points for your questions. RDM Support is also happy to help you on your way.”