Some things are impossible. You cannot levitate a feather with your mind, no matter how hard you try. And yet, some things are more impossible than others. Levitating a feather with one’s mind is in some sense easier than levitating a rock (McCoy and Ullman, 2019, Shtulman and Morgan, 2017). Such graded judgments of impossibility are the topic of ongoing study in cognitive science, philosophy, and cognitive development. The underlying idea is that people’s understanding of imaginary worlds is rooted in their understanding of the real world (Byrne, 2007), so studying what makes things easier or harder in the imagination can reveal people’s understanding of everyday reality. This research has produced several explanations for what makes some things seem more impossible than others. These explanations are not mutually exclusive and include, for example, moves across ontological hierarchies (Griffiths, 2015), causal violations (Shtulman & Morgan, 2017), perceived similarity (Goulding & Friedman, 2023), and violations of core knowledge and intuitive physics (Lewry et al., 2021, McCoy and Ullman, 2019).
But, just as there is a dividing line between the merely improbable and the truly impossible, there may be a category of events even more impossible than impossible. Levitating a feather with one’s mind is impossible in our world, but can still be imagined as occurring in a fictional world, and fits into our intuitive theories of possible worlds (Lewis, 1986). In contrast, events like “levitating a feather using the number five” or “finding the square-root of a dog” cannot be evaluated, construed, or imagined in any possible world.1 Borrowing from philosophy, we refer to such events as inconceivable (Gendler & Hawthorne, 2002). Our suggestion is that just as it has been fruitful for cognitive science to study people’s understanding of the impossible, it is useful to study people’s understanding of the inconceivable.
While the relationship between conceivability and possibility has been the topic of much philosophical research (e.g., Balog, 1999, Chalmers, 1996, Gendler and Hawthorne, 2002), inconceivability has received less empirical and computational study in cognitive science. The most closely related line of work has investigated children’s distinction between events that are impossible (e.g., walking on water) and events that are possible but highly improbable (e.g., growing a beard down to one’s toes, or singing Jingle Bells at a birthday party) (e.g., Browne and Woolley, 2004, Goulding and Friedman, 2021, Goulding et al., 2023, Komatsu and Galotti, 1986, Lane et al., 2016, Shtulman, 2009, Shtulman and Carey, 2007). A central finding across these studies is that although even preschool-age children can distinguish impossible from probable events, they tend to judge improbable (or expectation-violating) events as impossible, and the ability to distinguish improbable from impossible events is refined with age. Some have taken this to suggest that young children try to conceive of an improbable event, fail, and treat their inability to conceive of it as evidence that the event cannot occur at all (Harris, 2021). However, such an account would not explain the impossible/inconceivable divide for adults: adults can easily imagine things they judge to be impossible, at least in our world. In general, these findings leave open the question of whether adults naturally treat the impossible and the inconceivable as distinct modal categories (as they do for the impossible and the improbable), or whether people naturally treat inconceivability as simply an instance of impossibility.
We begin by investigating whether people can readily distinguish the impossible from the inconceivable, using categorization studies similar to those used to investigate the differences between improbable and impossible events. These categorization tests follow the logic of the cognitive development studies mentioned above, which conclude that young children do not distinguish improbable from impossible events early in development, but refine their understanding over early childhood (e.g., Shtulman, 2009, Shtulman and Carey, 2007, Shtulman and Phillips, 2018). In our first experiment (Section “Experiment 1: Modal classification task”), we introduce a novel set of stimuli featuring event descriptions in four different modal categories (probable, improbable, impossible, and inconceivable), and validate them by examining whether people intuitively and reliably distinguish inconceivable event descriptions from event descriptions belonging to the other modal categories. We find that people are highly consistent in their categorizations, suggesting that the set of inconceivable event descriptions is easily distinguished from the sets of impossible and improbable event descriptions.
Having established that people readily make this distinction, we then examine how it might be made. Broadly, there are two functional accounts of how the distinction between impossibility and inconceivability may work, corresponding to a difference in kind and a difference in degree. An account based on a difference in kind would argue that encountering the inconceivable is similar to the mind encountering a category error (Magidor, 2009, Ryle, 1949), leading either to termination of further processing or to the need to coerce the meaning of the original event. Category errors may in turn be similar to the “type errors” that arise when a program written in a typed programming language applies an operation to a value of the wrong type (Magidor, 2009, Sosa and Ullman, 2022). This general view also connects to a literature on type-theoretic distinctions in semantic theory (e.g., Bekki, 2014, Chatzikyriakidis and Luo, 2020, Luo, 2012, Montague, 1973, Sutton, 2024), and to psycholinguistic studies of selectional restrictions (Chomsky, 1965, Katz and Fodor, 1963) in language processing (e.g., Paczynski and Kuperberg, 2012, Sitnikova et al., 2008, Warren and McConnell, 2007, Warren et al., 2015). An alternative possibility is that people have a single, graded notion of probability, on which all events, from the probable to the improbable, impossible, and inconceivable, lie along one spectrum. Each event is processed and assigned some probability of occurring, and modal categories are then read out through a simple transformation of the underlying probabilities. As a simplified example, one could set thresholds on probability values such that improbable events have a probability of about one in a hundred, impossible events about one in a million, and inconceivable events about one in a trillion.
This view also connects to prior proposals that selectional restrictions can be defined based on statistical associations (Resnik, 1993, Resnik, 1996), and to evidence that world knowledge violations elicit patterns of neural activity similar to those elicited by selectional restriction violations (e.g., Hagoort et al., 2004, Matsuki et al., 2011). Our experiments investigated two potential versions of the second account (“difference in degree”), but we do not resolve the question of whether the inconceivable differs from the impossible in kind or in degree. We return to the different mechanisms for distinguishing the inconceivable from the impossible in the Discussion (Section “Discussion”).
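The two accounts make different predictions about where the impossible/inconceivable boundary comes from. The toy sketch below, which is our own illustration and not a model from the experiments, contrasts them. The cutoffs mirror the simplified one-in-a-hundred, one-in-a-million, one-in-a-trillion example; the names `degree_account`, `kind_account`, `square_root`, and `Dog` are hypothetical, and the type check is only an analogy for a category error.

```python
import math

# Hypothetical cutoffs from the simplified example: ~1e-2 improbable,
# ~1e-6 impossible, ~1e-12 inconceivable. Checked smallest-first.
DEGREE_CUTOFFS = [(1e-12, "inconceivable"), (1e-6, "impossible"), (1e-2, "improbable")]
# Under the kind account, no probability value alone yields "inconceivable".
KIND_CUTOFFS = [(1e-6, "impossible"), (1e-2, "improbable")]


def readout(p: float, cutoffs) -> str:
    """Return the label of the first (smallest) cutoff that p falls under."""
    for cutoff, label in cutoffs:
        if p <= cutoff:
            return label
    return "probable"


def degree_account(p: float) -> str:
    """Difference in degree: every category is read off one probability."""
    return readout(p, DEGREE_CUTOFFS)


class Dog:
    """Hypothetical non-numeric entity, as in 'the square root of a dog'."""


def square_root(x: float) -> float:
    return math.sqrt(x)  # math.sqrt raises TypeError for non-numeric arguments


def kind_account(operation, argument, p: float) -> str:
    """Difference in kind: an ill-typed event is rejected before any
    probability is consulted; well-typed events get a graded readout."""
    try:
        operation(argument)
    except TypeError:
        return "inconceivable"  # analogous to a category error
    return readout(p, KIND_CUTOFFS)
```

Under the degree account, two events with the same near-zero probability must receive the same category. Under the kind account, inconceivability depends on whether the event is well-formed at all, independent of its probability.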
In our second experiment (Section “Experiment 2: Subjective ratings of event likelihood”), we ask people to rate the likelihood of the events from Experiment 1 on a continuous scale from 0 to 100. We find that people overwhelmingly rate impossible and inconceivable events at the bottom of the scale, and the ratings for these two categories cannot be distinguished statistically. This suggests that people’s modal distinctions between impossible and inconceivable are based on information beyond simple perceptions (or transformations) of event likelihood, which weighs against a basic version of the “graded” view: namely, that modal categories are read off using thresholds on probability.
We then investigate another version of the graded view by asking to what extent distinctions between modal categories can be made based on the statistical patterns of language. From a cognitive perspective, one proposal has been that people try to imagine or simulate an event occurring, and use the difficulty of imagination to judge the possibility of the event (Gendler and Hawthorne, 2002, Harris, 2021, Kahneman and Tversky, 1981). One could argue that event descriptions we encounter frequently are easier to simulate or imagine, whereas those we encounter very rarely are harder to simulate. To investigate this view, we examine whether statistical language models (LMs) show behavioral signatures similar to those of humans in distinguishing inconceivable event descriptions from other modal categories. LMs are trained to predict sequences of tokens, which they learn to do by observing vast amounts of text typically obtained from Internet posts, news datasets, and books. Through this training, LMs may implicitly learn the latent properties of the world that make certain linguistic expressions more or less likely. For example, since people communicate about events (McRae & Matsuki, 2009) and physical observations (Louwerse, 2011, Louwerse, 2018), it may be reasonable to expect that language itself contains structured information about the world. Indeed, prior work has shown that LMs may capture important aspects of commonsense and world knowledge (see Chang & Bergen, 2023 for review), including the distinction between possible and impossible events (Kauf et al., 2023), and the structure of perceptual spaces (Abdou et al., 2021, Merullo et al., 2023, Patel and Pavlick, 2022).
The autoregressive next-token-prediction objective used to train LMs also connects to human language comprehension, which involves predicting upcoming linguistic content (e.g., Altmann and Kamide, 1999, Kutas and Federmeier, 2011, Levy, 2008, Shain et al., 2024, Smith and Levy, 2013) and integrating generalized event knowledge (e.g., Bicknell et al., 2010, Matsuki et al., 2011, McRae and Matsuki, 2009). It is therefore argued that LMs may learn structured information about the world (and about events occurring in it) in service of optimizing the objective of next-word prediction.
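An autoregressive model scores a whole string with the chain rule, summing the log-probability of each token given the tokens before it. The sketch below is a deliberately small stand-in, assuming a word-level bigram model with add-one smoothing rather than a Transformer: it conditions only on the previous word, and the three-sentence corpus is invented for illustration. Neural LMs use the same decomposition but condition on the full preceding context and operate over subword tokens.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts, adding <s> and </s> boundaries."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, nxt)] += 1
    return context_counts, bigram_counts, vocab


def sentence_logprob(sentence, context_counts, bigram_counts, vocab):
    """log P(w_1..w_n) = sum_i log P(w_i | w_{i-1}), with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    v = len(vocab) + 1  # +1 reserves probability mass for an unseen word type
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p = (bigram_counts[(prev, nxt)] + 1) / (context_counts[prev] + v)
        total += math.log(p)
    return total


# Toy corpus, invented for illustration only.
CORPUS = ["the dog ate the bone", "the dog chased the cat", "the cat ate the fish"]
model = train_bigram(CORPUS)
print(sentence_logprob("the dog ate the fish", *model))
print(sentence_logprob("the number ate the dog", *model))
```

The second sentence scores lower because its word transitions are rare or unseen in the corpus, which is the kind of frequency-based signal the graded view would rely on.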
From a machine learning perspective, the question of whether the distinction between impossibility and inconceivability can be learned from statistical patterns is also of considerable interest. If both impossible and inconceivable events occur with vanishingly small frequency, it is not obvious how LMs would learn to distinguish them in string probability space. Indeed, our work’s focus on distinctions within the region of low probabilities (“shades of zero”) differs from two major previous approaches to evaluating LMs. The first approach focuses on distinguishing between broad categories that are expected to be associated with high or low probabilities, for example by performing targeted comparisons between sentences that are grammatical/ungrammatical (e.g., Hu et al., 2020, Marvin and Linzen, 2018, Warstadt et al., 2020), or that describe possible/impossible events (e.g., Kauf et al., 2023). The second approach studies the behavior of LMs on naturally occurring sentences, which are expected to fall within the region of the string distribution with the most probability mass, for example by examining the alignment between model-derived string distributions and human reading times on naturalistic sentences (e.g., Brothers and Kuperberg, 2021, Hofmann et al., 2022, Shain et al., 2024, Smith and Levy, 2013). While these approaches are reasonable and useful, we suggest that studying distinctions within the near-zero region of the probability distribution, such as the distinction between sentences describing impossible and inconceivable events, can provide new insights into the patterns and commonsense knowledge learned by LMs, and into how their representations of event likelihood relate to people’s conceptual categories.
In our third experiment (Section “Experiment 3: Language model evaluation”), we test whether LMs separate the modal categories in our stimuli through the probabilities they assign to event description strings, and whether these probabilities align with humans’ subjective ratings of event likelihood (as measured in Experiment 2). We evaluate five Transformer-based language models from the GPT-2 (Radford et al., 2019) and Llama 3 (AI@Meta, 2024) model families, ranging in size from 124 million to 70 billion parameters, as well as a strong non-neural baseline (Liu et al., 2024). We find that modal categories are strongly separated within the event description probabilities estimated by the neural models, but not within those estimated by the non-neural baseline. Similarly, the subjective event likelihood ratings provided by humans in Experiment 2 are predicted by the neural models’ probabilities (but not by the baseline model’s probabilities), although the models’ probabilities vary widely across impossible and inconceivable event descriptions, whereas people assign all of these descriptions extremely low likelihood ratings. We additionally test how distinctions between modal categories emerge over the course of training for a single OLMo model (Groeneveld et al., 2024), and find that the distinction between inconceivable and impossible emerges before the distinction between impossible and improbable. These analyses contribute a new empirical investigation of commonsense and event knowledge in modern LMs, which is a central topic in artificial intelligence (Chang and Bergen, 2023, Li et al., 2022, Zhou et al., 2020).
In particular, we demonstrate that LMs capture a distinction within the distribution of non-possible events (impossible versus inconceivable), which goes beyond previous studies focusing on distinctions either within the distribution of possible events (likely versus unlikely) or across the boundary of possible and impossible (e.g., Kauf et al., 2024, Kauf et al., 2023, Wang et al., 2018).
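The text does not specify here how separation between two modal categories is quantified. One simple option, offered as an assumption rather than as the measure used in Experiment 3, is the fraction of cross-category pairs that a model orders as expected. This equals the area under the ROC curve for telling the two sets apart. `pairwise_separation` is a hypothetical helper, and its inputs would be per-description log-probabilities from any LM.

```python
from itertools import product


def pairwise_separation(higher, lower):
    """Fraction of cross-category pairs ordered as expected (ties count half).

    `higher`: log-probabilities for descriptions from the category expected to
    be more likely (e.g., impossible). `lower`: log-probabilities for the
    category expected to be less likely (e.g., inconceivable).
    0.5 is chance level; 1.0 means every pair is ordered as expected.
    """
    if not higher or not lower:
        raise ValueError("both categories need at least one item")
    wins = 0.0
    for a, b in product(higher, lower):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(higher) * len(lower))
```

Because a string’s log-probability generally decreases as the string gets longer, a comparison like this is easiest to interpret when the descriptions being compared are matched in length.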
Overall, our findings reveal high-level similarities between the ways that humans and statistical language models treat inconceivability. First, both humans and models can categorize probable, improbable, impossible, and inconceivable event descriptions (comparing Experiments 1 & 3). Second, the probabilities assigned to event descriptions by neural LMs predict humans’ subjective ratings of event likelihood (comparing Experiments 2 & 3). Third, the distinction between improbable and impossible emerges later during LM training than the distinctions between other pairs of categories, which is qualitatively consistent with findings in developmental psychology (e.g., Shtulman & Carey, 2007). However, our findings also raise questions about how humans distinguish impossible from inconceivable events when the subjective likelihoods of both are indistinguishable (all near zero), and they suggest future directions for investigating children’s perceptions of inconceivability. We return to these issues in the Discussion (Section “Discussion”), once we have the results in hand.