
13.1 Interdisciplinary data integration and evidence amalgamation

Aims of the project

The aim of the PhD project is to describe under what circumstances we can reap epistemic benefits from scaling up and interlinking data sets. To this aim we investigate different ways of aligning and connecting existing data sets, and we develop the means to assess the evidential value of the resulting integration, focusing on data pertaining to the mesoscopic level of social interactions. Our analyses will contribute to a confirmation theory, i.e., a philosophical analysis of evidential relations between data and theory, tailored to meta-analysis and big data research.

The project is best located at the intersection of philosophy of science and statistics, in particular in confirmation theory on the side of philosophy, and in meta-analysis and social-science methodology on the statistical side. The results of the project will be applicable to the SCOOP data repository, because the project takes the SCOOP data alignment and integration as its primary case study. The project is linked to the overall aim of the SCOOP proposal to establish a national data infrastructure for the social sciences.

Background

The objective of the SCOOP data infrastructure is to facilitate a more comprehensive understanding of sustainable cooperation by making accessible and interlinking data from a variety of disciplines. But when does such an integrated data infrastructure provide evidence that is not already provided by its component parts? Is additional data collection needed, or could a combination of available data already provide additional insight? Can we use the available data in a better way to acquire new knowledge, for example by creating reliable information on a meso-level? These questions point to the need to investigate the conditions under which integrating and amassing a variety of data is evidentially valuable. We need a sound methodological basis for our data efforts, and a critical appraisal of when initiatives towards bigger and more varied data pay off in terms of the evidence that the data provide.

A quick example will illustrate what we have in mind. Say that we have collected data on norm compliance in a sociological field study, e.g., people’s willingness to wear masks in public transport, and that we also have available several social psychological experiments on norm compliance in which certain causal factors for it were identified. How can we make use of both studies to obtain a more complete picture of norm compliance? Can we somehow subtract the causal factors found by the psychologists to identify other such factors on the basis of the sociological data? What meta-analytic tools can we use to obtain more precision on variables that are included in both studies?
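To make the last question concrete, the sketch below shows one standard meta-analytic tool, fixed-effect inverse-variance pooling, applied to a variable measured in both studies. It is only an illustration under strong assumptions: both studies are taken to estimate the same effect on a common scale, the estimates are treated as independent, and all numbers and names (`pool_fixed_effect`, the effect sizes, the standard errors) are hypothetical rather than taken from SCOOP data.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Pool independent estimates of one effect by inverse-variance weighting.

    Assumes a fixed-effect model: every study estimates the same underlying
    effect, and each standard error is known.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    # More precise studies (smaller standard error) receive more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect of perceived peer compliance on mask wearing:
# one estimate from a field study, one from a lab experiment.
field_estimate, field_se = 0.30, 0.10
lab_estimate, lab_se = 0.50, 0.20

pooled, pooled_se = pool_fixed_effect(
    [field_estimate, lab_estimate], [field_se, lab_se]
)
print(f"pooled estimate: {pooled:.3f}, standard error: {pooled_se:.3f}")
```

Under these assumptions the pooled standard error is smaller than that of either study, which is the sense in which combination yields "more precision". Whether the assumptions hold across disciplines is exactly the kind of question the project addresses.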

The motivations for our project are methodological and theoretical. On the methodological side, an important motivation is that systematic and validated re-use of data will become increasingly attractive and rewarding as our data repositories fill up. A further motivation lies in the potential use of targeted new data collections for data integration. One of the challenges for data integration in SCOOP is in connecting micro-level data about the individual to macroscopic data on society, for example when we wish to combine social psychological findings with sociological ones. A targeted data collection on the mesoscopic scale, in which smaller communities and organizations are the unit of analysis, may well widen the possibilities for integrating SCOOP studies. Similarly, new data from the mesoscopic scale will present a valuable addition to existing databases. The results of the project proposed here may provide valuable input to the broader task, for the SCOOP consortium as a whole, to set up its own data collection.

The primary motivation for developing a principled stance on data integration and evidence amalgamation is of a more theoretical nature: the SCOOP research consortium aims to bring to light how cooperative arrangements can be made resilient. For this purpose the research traverses disciplinary boundaries: sustainable cooperation is a multifarious notion and requires study in all its aspects, irrespective of their disciplinary location. Evidence amalgamation is therefore not only a response to a data revolution that is already underway, it is also a necessity for other research programs like SCOOP, in which a societal problem is tackled from a multitude of angles. Ultimately, the goal of the project is not merely to provide a tool for valuable data integration, but more generally to contribute to the foundations for interdisciplinary science.

Because of this theoretical motivation the proposed research also connects to the goals of the SCOOP program in a derivative sense: the results of the project can contribute to cross-disciplinary collaborations in science, and thereby to sustainable cooperation and value creation in the scientific realm. The SCOOP program can be exemplary for its very own topic.

Description of the project

In statistical methodology we already find proposals to interlink studies by statistical or other means; the “mixed treatment comparisons” from medical science are a case in point (Jansen et al 2008), and so is the idea of “semantic interoperability” between data sets, i.e., the idea that we can zip together data sets through shared variables. And in the philosophy of science there is pioneering work on the methodology of data-intensive research and evidence amalgamation, partly in response to statistical meta-analysis but also building on confirmation theory (Leonelli 2016, Fletcher et al 2019). Taking the data sets in SCOOP as our working material, this project investigates frameworks and tools that can be used for evidence amalgamation in the social sciences, with special attention to the integration of studies across disciplinary divides and at different scales (micro, meso, macro). These methods will subsequently be assessed on the basis of the evidential value of the resulting connections, using concepts and assessment criteria from confirmation theory.
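The idea of zipping data sets together through shared variables can be sketched in a few lines. The example below is a minimal illustration, not a description of the SCOOP infrastructure: the function `join_on_shared` and the variables `region`, `year`, `mask_compliance`, and `trust_score` are hypothetical, and the sketch simply assumes that the shared variables have the same meaning and coding in both data sets, which is the substantive assumption behind semantic interoperability.

```python
def join_on_shared(left, right, keys):
    """Join two lists of records on the variables named in `keys`.

    Only records that agree on every shared variable are combined
    (an inner join); unmatched records are dropped.
    """
    # Index the right-hand records by their values on the shared variables.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Merge the two records; on a name clash the left value wins.
            joined.append({**match, **row})
    return joined


# Hypothetical sociological field data and psychological survey data.
field_study = [
    {"region": "north", "year": 2020, "mask_compliance": 0.72},
    {"region": "south", "year": 2020, "mask_compliance": 0.55},
]
survey = [
    {"region": "north", "year": 2020, "trust_score": 3.4},
    {"region": "east", "year": 2020, "trust_score": 2.9},
]

for record in join_on_shared(field_study, survey, keys=("region", "year")):
    print(record)
```

Only the "north" records are linked here. The records that drop out illustrate a practical question for evidence amalgamation: how much of each data set actually survives the integration.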

In brief, the research questions that drive the project are:

  • What meta-analytic tools and theoretical structures are most useful for interdisciplinary data integration and evidence amalgamation?
  • What determines the evidential value of the results of data integration and evidence amalgamation, and how can we evaluate this value?
  • Can the insights on data integration and evidence amalgamation be of help to the SCOOP data infrastructure and, if so, how?

The following research areas will be involved in answers to these questions, likely but not necessarily in the order in which they are presented below.

  1. Meta-analysis: with the increased availability of data, meta-analysis is developing rapidly, and so are other techniques for integrating research findings (Cooper et al 2009, Hox et al 2018). There is substantial experience with these techniques in the medical sciences (notably in clinical trials), and meta-studies are a growing trend in the social sciences. These developments call for philosophical scrutiny. In the philosophy of science the topic of meta-analysis is heavily underrepresented and, while data scientists and statisticians are developing new tools, substantive methodological reflection is rare.
  2. Social epistemology: some of the techniques for integrating data and data analysis borrow from the literature on opinion aggregation and collective decision making. This is an exciting area of cross-disciplinary methodology with many research opportunities; recent work on meta-analysis and opinion pooling is a case in point (Romeijn 20XX). There is an extensive philosophical literature on how people can learn from each other and come to consensus, for instance in a setting in which people share their opinions on an issue and then adapt their own opinion in the light of what the group seems to think. This literature also has connections to political theory, e.g., through voting theory, and to economics, e.g., the analysis of information exchange in markets and auctions. Insights from this literature are relevant for a philosophical understanding of meta-analysis.
  3. Epistemic decision theory: there is a trickle of results in statistical methodology and decision theory on the “value of information”, starting with Good’s seminal paper and continuing up until the present (Pettigrew 2016). The case of data science and the transdisciplinary integration of data sets offers an interesting new domain of application for this research. By way of example, a long-standing question within the philosophy of science is why evidence from a variety of sources, i.e., from different disciplines, should be more convincing than evidence gathered from a single vantage point. The “value of information” literature provides a new inroad into this question.
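The “value of information” mentioned under the third research area can be illustrated with a small calculation. The sketch below compares the expected utility of acting on a prior with the expected utility of acting after observing a signal and updating by Bayes’ rule. The decision problem, the probabilities, and the function names are all hypothetical, chosen only to show the structure of the comparison; they are not drawn from the cited literature.

```python
def best_expected_utility(belief, utilities):
    """Expected utility of the best act, given a belief over states."""
    return max(
        sum(belief[state] * payoff[state] for state in belief)
        for payoff in utilities.values()
    )


def value_of_information(prior, likelihoods, utilities):
    """Expected gain from observing a signal before choosing an act.

    `likelihoods[signal][state]` is the probability of the signal given
    the state; observing the signal is assumed to be free of cost.
    """
    eu_now = best_expected_utility(prior, utilities)
    eu_after = 0.0
    for signal_given_state in likelihoods.values():
        p_signal = sum(prior[s] * signal_given_state[s] for s in prior)
        if p_signal == 0:
            continue  # this signal cannot occur under the prior
        # Bayesian update on the signal, then choose the best act.
        posterior = {
            s: prior[s] * signal_given_state[s] / p_signal for s in prior
        }
        eu_after += p_signal * best_expected_utility(posterior, utilities)
    return eu_after - eu_now


# Hypothetical problem: accept or reject a hypothesis H.
prior = {"H": 0.6, "not_H": 0.4}
utilities = {
    "accept": {"H": 1.0, "not_H": 0.0},
    "reject": {"H": 0.0, "not_H": 1.0},
}
likelihoods = {
    "positive": {"H": 0.8, "not_H": 0.3},
    "negative": {"H": 0.2, "not_H": 0.7},
}

print(round(value_of_information(prior, likelihoods, utilities), 3))
```

In this toy problem the signal raises the expected utility from 0.6 to 0.76, a gain of 0.16, and a signal that is equally likely under both states yields a gain of zero. How such gains behave when signals come from different disciplines, rather than from one source, is the kind of question the variety-of-evidence discussion raises.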

The PhD student will develop a theoretical framework for evaluating data amalgamation. The hope is that the theoretical framework will ultimately facilitate SCOOP’s data project, i.e., that it can be used to draw up practical guidelines for SCOOP researchers in building up their data infrastructure. For this aim the project can rely on experience in the management of big data research projects among the supervisors.

Finally, to facilitate engagement with the primary case study, the PhD student will work in close contact with a part-time data scientist from the CIT at RUG, who will be involved more practically in structuring and inventorying the data currently collected in SCOOP. The supervision team of the PhD project will determine how the data scientist can best be deployed within the SCOOP project. S/he will assist with the computational and statistical side of data integration in SCOOP, and serve as a sparring partner and informant for the PhD student and the supervision team.

Methodology

The methodology of the project will be philosophical and statistical analysis informed by the empirical studies from the SCOOP consortium.

Literature

  • Cooper, H., L. V. Hedges, and J. C. Valentine, Eds. (2009). The Handbook of Research Synthesis and Meta-Analysis, Second Edition. Russell Sage.
  • Fletcher, S. C., J. Landes, and R. Poellinger (2019). Evidence amalgamation in the sciences: an introduction. Synthese 196: 3163–3188. (Introduction to a special issue on the topic.)
  • Hox, J., M. Moerbeek, and R. van de Schoot (2018). Multilevel Analysis: Techniques and Applications, Third Edition. Routledge.
  • Jansen, J. P., B. Crawford, G. Bergman, and W. Stam (2008). Bayesian Meta-Analysis of Multiple Treatment Comparisons: An Introduction to Mixed Treatment Comparisons. Value Health 11: 956–964.
  • Kuiper, R. M., V. Buskens, W. Raub, and H. Hoijtink (2013). Combining Statistical Evidence From Several Studies: A Method Using Bayesian Updating and an Example From Research on Trust Problems in Social and Economic Exchange. Sociological Methods & Research 42(1): 60–
  • Leonelli, S. (2016). Data-Centric Biology: A Philosophical Study. University of Chicago Press.
  • Pettigrew, R. (2016). Accuracy and the Laws of Credence. Oxford University Press.
  • Romeijn, J. W. (20XX). Stein’s paradox in social epistemology. Manuscript.

Project stakeholders

Jan-Willem Romeijn (Philosophy RUG), Ronald Stolk (Centre for Information Technology RUG), and Tanja van der Lippe (Sociology UU and leader of the SCOOP data team).

Location

Faculty of Philosophy of the University of Groningen
