Understanding Online Conversations Through Fractal Dimension: A Case Study Linking Semantic and Structural Analysis of Reddit Posts on AI

M.N (Corresponding Author*), R.E (Author), L P. R Jr (Author), M.G (Author), A G.B (Author), K P.R (Author)

Corresponding Author

Micheal Nayebare

Ph.D Candidate

University of Michigan School of Information

University of Michigan

2200 Hayward St, Ann Arbor, MI 48109

Email: mnayebar@umich.edu

Ron Eglash

Director, The Center for Generative Justice

2635 Alex Drive, Ann Arbor MI 48103

Email: eglash@generativejustice.org

Lionel P. Robert Jr

Professor, School of Information

Professor, College of Engineering Robotics Department

University of Michigan

2200 Hayward St, Ann Arbor, MI 48109

Email: lprobert@umich.edu

Mark Guzdial (he/him/his)

Director, Program in Computing for the Arts and Sciences

Professor of Electrical Engineering and Computer Science

Professor of Information, School of Information (courtesy)

University of Michigan

2200 Hayward St, Ann Arbor, MI 48109

Email: mjguz@umich.edu

Audrey G. Bennett | 奥黛丽 · 博奈特

Professor of Art & Design and Communication and Media

University Diversity & Social Transformation Professor

Penny W. Stamps School of Art & Design

LSA, Department of Communication and Media

University of Michigan

2000 Bonisteel Blvd, Ann Arbor, MI 48109

Email: agbennet@umich.edu

Kwame Porter Robinson

Assistant Professor

Department of Technology, Information Systems & Analytics (TISA)

The Mike Ilitch School of Business

Wayne State University

2771 Woodward Ave, Detroit, MI 48201

Email: kwamepr@wayne.edu

Abstract

Fractal dimension can measure the complexity of a branching structure. Botanical trees, for example, have sparser branching in poor environments and more complex branching in good environments. We hypothesize that fractal dimension can likewise distinguish sparse from complex branching in the structures of online conversations. To test this we measured the fractal dimension of Reddit posts about AI. Posts about purely technical content (e.g. distinctions between different algorithms) had lower fractal dimension than those about social controversies (job loss, racial bias, etc.), suggesting that the controversial conversations had more complex branching structures. A sentiment analysis revealed that social posts had more negative sentiment, consistent with characterizing them as more controversial. We also found that even within each category (social vs technical), higher fractal dimension was associated with more negative sentiment. The fractal model offers further insights when considering its analogous biological models. While it is common to use the metaphor of a “conversation tree,” we find that fractal metrics reveal a structure closer to Diffusion Limited Growth, found in bacteria colonies, fungi, and rhizomatic plant spread, where “sub-trees” can vary in fractal dimension from the parent. The fractal dimension of social controversy subtrees has a stronger coupling to that of the parent than does that of the technical subtrees, which has potential implications for differences in the underlying semantic processes. Overall, the application of fractal models to online conversations allows correlations to be drawn between structural and semantic aspects, and offers a new way to illuminate the underlying characteristics.

Keywords:

fractal dimension

sub-trees

digital posts

social networks

online behavior

content moderation

1. Introduction

Online conversations are increasingly one of the most influential forces in society. There is potential harm from misinformation and disinformation on social media (Aïmeur et al., 2023; Allcott et al., 2019), but also great promise in democratic forums such as citizen assemblies (Lacelle-Webster & Warren, 2021). In both negative and positive cases, research on solutions such as automated moderation, self-governance mechanisms and other ameliorations is aided by the development of analytic metrics (Horta et al., 2023; Maciel et al., 2007). In this paper, we examine fractal dimension as a metric that can be used to understand relationships between the semantic and structural aspects of conversations. Our findings indicate that fractal dimension holds promise in this regard: it co-varies with certain semantic metrics, and fractal models can further illuminate some of the underlying structural aspects. We conclude with a discussion of how the metric might be applied to moderation as well as to the design of democratic forums.

Our starting point was the simple concept of a “conversation tree,” in which the first comment creates the “trunk,” the next replies branch off from that, subsequent replies branch off from those branches, and so on. Eglash et al. (2024) offered an artificial model in which differences in conversation branching could be measured with fractal dimension. They speculated that more sedate conversations would have sparser branching, giving a low fractal dimension, and that more controversial conversations would have more complex branching, giving a higher fractal dimension. They had developed this analysis in the context of OpenAI’s “Democratic Inputs to AI” program, in which 10 research groups developed democratic forums for recommendations on AI policy (Moats & Ganguly, 2025). During the weekly meetings, several groups reported that they struggled to find topics and forum structures in which there was enough controversy to spark engaged conversations, but not so much that it degraded into chaos. Eglash et al. noted that this balance point (conversations that are neither too sparse nor too dense) sounded like a similar balance point in biological branching structures.

Their literature review indicated that botanical trees grown in poor conditions tend to have branching that is too sparse, and trees with cancerous conditions have branching that is too dense. The same holds in physiology: in neural, pulmonary, and circulatory structures, low nutrition or other growth restrictions lead to sparse branching, while cancer or infections lead to densely tangled branching. In both cases (botanical and physiological research) fractal dimension has been used to identify this healthy balance point. Sparse branching has a low fractal dimension, pathologically dense branching has a high fractal dimension, and healthy branching is between the two (Fig. 1). Eglash et al. hypothesized that the same fractal metric could be used to find a healthy balance point for conversation trees; i.e. “robust deliberations” (Clark et al., 2019; Dubberly et al., 2009; Guydish et al., 2021). Their model was then extended from simple conversations to broader communication structures, with the ultimate goal of describing how fractal models might be used to describe similar patterns of democratization of AI at many different scales, from local governance of ownership to global human rights in AI policy.

Fig. 1

fractal dimension covaries with the density of branching in tree shapes

In this paper we have a more modest goal. First, we want to test the hypothesis that fractal dimension can distinguish between sparse and dense branching in conversation trees. Second, we want to test the hypothesis that this fractal metric correlates with semantic differences. In keeping with the original intentions of Eglash et al. to contribute to AI democratization, our test data utilized two types of Reddit posts, both on the topic of AI. For the sparse branching we used Reddit discussions on the technical aspects of AI (details of algorithms etc.). For the dense branching we utilized discussions on social aspects of AI, which tended to be more controversial discussions around race, gender, bias, job loss, etc. In summary:

Hypothesis (H₁)

Estimates of fractal dimension can be obtained for Reddit posts

Hypothesis (H₂)

Reddit conversations on socially controversial topics (race, gender, immigration, etc.) will tend to have a higher fractal dimension than those on technical topics (algorithms and mathematical concepts).

By testing these hypotheses, we seek to develop fractal dimension as a new metric that can be useful in exploring the structural complexity of online conversations, especially in relation to their semantic characteristics. In particular, we seek insight into fostering engagement and meaningful discourse, without causing chaotic, polarized or otherwise unhelpful situations to unfold.

Our results show that socially controversial conversations have a higher fractal dimension than technical conversations. In terms of content moderation, this suggests that tools might be developed that monitor the “temperature” of conversations, alerting users or moderators when the conversation needs to be “cooled down”, or signaling that a topic has become less engaging and the conversation is becoming more repetitive (low fractal dimension). Other applications of this metric might examine the rate of change of fractal dimension in relation to disinformation, trolling or other dysfunctional phenomena.

In addition to the utility of fractal dimension as a metric, our results also show that fractal models can illuminate certain structural features of the conversation. Eglash et al. (2024) presented a model of synchronous conversation (such as you would find in a Zoom call or physical classroom), in which the primary question or provocation had a strong influence on the secondary replies, as they did on tertiary replies, and so on, creating a branching structure similar to botanical trees. The Reddit posts we examined, however, are highly asynchronous. A reader might scan through all the posts before deciding which comment to reply to. Rather than the fractal model of botanical trees, the structure more closely resembles Diffusion Limited Growth (DLG), as seen with bacterial colonies, fungal mycelia, crabgrass, and other forms of biotic spread (Tronnolone et al., 2018; Matsuura, 1999).

DLG patterns are also fractal, but more decentralized, with numerous “sub-trees” emerging (Fig. 2). In the biological case the sub-trees vary because growth is following nutrient diffusion patterns. In the conversation case we hypothesize that sub-trees vary because readers are following their interest patterns. The semantic analysis results also show that sub-tree conversations are slightly independent from their original posts.

Fig. 2

fractal dimension covaries with the density of branching in Diffusion Limited Growth. Simulations created using the application at https://cfbrasz.github.io/DLA.html, and measured with Fractal Dimension Estimator.

2.0 Semantic metrics

The semantic analyses we utilize are well studied in the literature. We utilize three indicators for semantic content. The first is sentiment, categorized as positive, neutral or negative. Gao et al. (2021) and Yu et al. (2024) used a BERT-based model; we follow a similar approach but with a different model, the lexicon- and rule-based VADER (Hutto et al., 2014), to determine the median sentiment score for each reply thread. The second is a data set categorization: because we hypothesized that fractal dimension would be higher for more engaged, controversial, passionate conversations, we needed examples of more engaged versus more sedate Reddit posts. We manually categorized AI post titles as belonging to either the technical category (those focused on details of algorithms, training methods, computer vision, and so on) or the social category (those post titles asking about race, gender, job loss and so on).

2.1 Structural terminology

We used the terminology of numbered levels to describe the post hierarchy. The original provocation or question is level 0. Every reply to the level 0 post is a level 1, every reply to a level 1 post is a level 2, and so on. We found that level 1 posts often had distinct topical differences, as if they were quasi-independent “sub-trees.” Thus the structure was less like a botanical tree with a central trunk and more like the spread by rhizomes or runners seen in crabgrass, bamboo, blackberries, strawberries and other plants with vegetative propagation. In those cases the plant essentially “diffuses” out as far as it can (diffusion limited growth), popping up as numerous “sub-trees” where it can.
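The level numbering and the grouping into level-1 sub-trees can be sketched in a few lines of Python. This is a minimal illustration only: the `parent_of` mapping, the identifiers and the function names are hypothetical and are not taken from the extraction script used in the study.

```python
from collections import defaultdict


def assign_levels(parent_of, root_id):
    """Return {post_id: level}; the original post is level 0."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    levels = {root_id: 0}
    stack = [root_id]
    while stack:
        node = stack.pop()
        for child in children[node]:
            levels[child] = levels[node] + 1  # one level deeper than its parent
            stack.append(child)
    return levels


def level1_subtrees(parent_of, root_id):
    """Group every reply under the level-1 reply it descends from."""
    subtrees = defaultdict(list)
    for reply in parent_of:
        top = reply
        while parent_of[top] != root_id:  # climb until just below the root
            top = parent_of[top]
        subtrees[top].append(reply)
    return dict(subtrees)


# Hypothetical thread: two level-1 replies ("a", "b") under the original post.
thread = {"a": "op", "b": "op", "c": "a", "d": "c", "e": "b"}
levels = assign_levels(thread, "op")
subtrees = level1_subtrees(thread, "op")
```

In this toy thread, reply "d" sits at level 3 and belongs to the sub-tree rooted at "a", while "e" belongs to the sub-tree rooted at "b".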

For example, Wang et al. (2009) examine how rice plants (which propagate with underground root systems) vary their root branching patterns. The researchers found that as roots move into drying soil, the fractal dimension for that “sub-tree” of roots becomes lower. The fractal dimension is still within a typical range for that species, but modulated by the water stress. Unlike a botanical tree, which has a very consistent branching pattern across the entire structure, these kinds of Diffusion Limited Growth patterns allow for sub-tree structures that are quasi-independent, maintaining enough similarity to the parent to be within that species’ normal range, but enough independence to respond to local conditions. The same was true for fractal dimension in the Reddit posts: sub-trees had some independence, but collectively they were correlated with the fractal dimension of the level 0 tree. Interestingly, the correlation of fractal dimension for sub-trees and level 0 trees was slightly lower for social posts than for technical posts (r = 0.745 versus r = 0.749; see Section 4.1 and Fig. 7), which may indicate that more controversial, passionate conversations allowed for more variety of dialog structures.

Some of the Reddit post sub-trees (i.e. level 1 threads) reached a depth of 12 or more, but this created an overwhelming amount of complexity when dealing with large datasets, so we recommend moderate level capping. Another important structural feature we observed was that many posts were joking replies, providing entertainment value but offering little in the way of helpful information. Even if you could not read the language, joking replies tended to be extremely brief “one liners,” in contrast to those which we designated as having “information relevance.” Our results show that for any individual sub-tree, its fractal dimension is influenced by the fractal dimension of its OP tree (Fig. 7). Sub-trees on average share their parent's structural complexity. However, that coupling (the slope in Fig. 7) is stronger for social posts, suggesting that they are more likely to invoke structurally complex responses than the technical posts.

2.2 Determining the fractal dimension using the box-counting method

A common application for fractal geometry is the assessment of “ruggedness” or “irregularity” of a curve; e.g. a fracture line or geographic coastline (Mandelbrot, 1983, 1989). Because conversation threads are commonly indented, and the indentation is proportional to the depth of the reply, we can treat the indentation contour as a rugged coastline (Fig. 3).

Fig. 3

Illustration of how a threaded conversation is converted into a rugged contour

Fig. 4

Box counting method of measuring fractal dimension of rugged curve

The fractal dimension of irregular curves is often estimated using the box-counting method (Barnsley, 2014; Foroutan-pour et al., 1999; Ai et al., 2014). As shown in Fig. 4, this method detects all boxes in a grid which contain part of the curve. By using progressively finer grids, we obtain the rate at which the number of occupied boxes increases as the grid size shrinks. The slope of that relation (box count versus inverse box size on a log-log plot) is the estimate of the fractal dimension. According to Mandelbrot (1989), box-counting has been found to be a simple, reliable, and desirable approach to estimating fractal dimension in both linear and non-linear fractal images. The approach has been used widely to estimate fractal dimension in medical images (Korchiyne et al., 2014; Penn et al., 1996; Hadzieva et al., 2014), irregularity and roughness (So et al., 2009), and shape classification (Li et al., 2009).
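As a minimal illustration of box counting on a curve, the following Python sketch samples a polyline densely, counts the grid boxes that contain at least one sample for each box size r, and fits the slope of log N(r) against log(1/r) by least squares. The function names, the sampling step (a quarter of the smallest box) and the example box sizes are illustrative assumptions, not the implementation or parameters used in this study.

```python
import math


def sample_polyline(points, step):
    """Return points spaced at most `step` apart along the polyline."""
    samples = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        for i in range(1, n + 1):
            t = i / n
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


def box_count(samples, r):
    """Number of r-by-r grid boxes containing at least one sample."""
    return len({(math.floor(x / r), math.floor(y / r)) for x, y in samples})


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def box_counting_dimension(points, box_sizes):
    """Estimate fractal dimension as the slope of log N(r) vs log(1/r)."""
    samples = sample_polyline(points, min(box_sizes) / 4)
    log_inv_r = [math.log(1 / r) for r in box_sizes]
    log_n = [math.log(box_count(samples, r)) for r in box_sizes]
    return least_squares_slope(log_inv_r, log_n)


# Illustrative curves (not study data): a flat line and a tall sawtooth.
sizes = [1, 2, 4, 8, 16]
straight = box_counting_dimension([(0, 0), (64, 0)], sizes)
sawtooth = box_counting_dimension(
    [(i, 0 if i % 2 == 0 else 8) for i in range(65)], sizes
)
```

With these inputs the flat line yields an estimate close to 1, and the sawtooth yields a clearly higher value, which is the qualitative behaviour the method relies on. Estimates from so few box sizes are coarse and depend on the range of sizes chosen.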

3. Methodology

3.1 Criteria for selecting socially controversial and technical posts

All posts were retrieved from Reddit using PRAW (Python Reddit API Wrapper)—a Python package that allows access to Reddit's API—alongside a custom Python script to extract the needed features and data points. A sample of the extracted, filtered, and cleaned datasets can be found in Appendix A. We used the step-by-step design process shown in Fig. 5. To test our hypothesis that high fractal dimension is correlated with controversial posts, we categorized Reddit posts and subreddits (e.g., r/MachineLearning) into one of two categories: technical subjects (algorithms, mathematics, data methods, etc.) or socially controversial topics (race, gender, job loss, etc.).

We carried out our analysis on a total of 100 Reddit posts, with 55 in the social controversy category and 45 in the technical category. We lost 5 posts due to the thresholds we set in the original data set in Appendix A. We focused on AI in part because it is a topic in which highly technical and highly social discussions can both be found, and in part because it is one of the technologies for which stakeholders were actively seeking public opinion through participation (Hansen et al., 2022). We classified social issues related to AI that were most likely to lead to disagreement or controversies as those that talked about race, gender, class, labor, Diversity, Equity, and Inclusion (DEI), immigrants, Islam, Israel, borders, and so on. This is similar to work done by Jang et al. (2017). These categories were classified as "socially controversial" conversations.

Both the controversial and the technical categories were examined from two perspectives: the original post together with its entire reply tree, which we also refer to as the global scale, and its sub-trees (the local scale), meaning each level-1 reply together with its recursive replies. The objective was to compare fractal dimension from both perspectives.

Fig. 5

Illustration of the step-by-step design process used in the study

The second criterion required that all social and technical posts had at least two level-1 branches that extended to a depth of more than three levels. The same threshold was applied to sub-trees. Only 5 posts from the original dataset did not meet this threshold, and were removed from the pool. These were “degenerate cases”: for example, single comment trees, pure linear trees with single chained replies, and starburst trees (those with many replies at level-1 and no other replies). In all computations, the original post was considered to be at the root or level-0. Reddit is a highly moderated platform where data points such as upvotes, number of comments, and comment timestamps are dynamic and subject to change over time (Srinivasan, 2023). For this reason, we caution that our analysis is only a snapshot in time (not unusual for online social media metrics).
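The depth threshold can be expressed as a simple filter. The sketch below is one possible reading of the criterion, assuming that "depth" is counted in levels from the original post at level 0 and that a branch qualifies when its deepest reply lies beyond level three; the `parent_of` mapping and function names are hypothetical.

```python
def deepest_level_by_branch(parent_of, root_id):
    """Map each level-1 reply to the deepest level reached beneath it."""
    deepest = {}
    for reply in parent_of:
        level, top = 1, reply
        while parent_of[top] != root_id:  # climb towards the original post
            top = parent_of[top]
            level += 1
        deepest[top] = max(deepest.get(top, 0), level)
    return deepest


def meets_threshold(parent_of, root_id, min_branches=2, min_depth=3):
    """True if at least `min_branches` level-1 branches go deeper than `min_depth`."""
    deepest = deepest_level_by_branch(parent_of, root_id)
    deep_branches = sum(1 for level in deepest.values() if level > min_depth)
    return deep_branches >= min_branches


# Hypothetical degenerate cases and one thread that passes.
starburst = {"a": "op", "b": "op", "c": "op"}
linear = {"a": "op", "b": "a", "c": "b", "d": "c"}
branching = {
    "a": "op", "a2": "a", "a3": "a2", "a4": "a3",
    "b": "op", "b2": "b", "b3": "b2", "b4": "b3",
}
results = [meets_threshold(t, "op") for t in (starburst, linear, branching)]
```

Under these assumptions the starburst and the single linear chain are rejected and the thread with two deep branches is retained.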

Prior to our analysis, we excluded posts where the content was removed. We retained replies where the author was deleted but the content was preserved.

3.2 Determining the fractal dimension

To determine the fractal dimension of social versus technical conversations, we used the box-counting method, where a sequence of box grids of descending sizes was layered on top of conversation contour images. To obtain the conversation contour images, horizontal white space indentation was converted to X-axis coordinates. Since each reply line appears one line down from the last (regardless of the length of reply), that vertical spacing provides the Y-axis coordinates. This was easily facilitated by Reddit's hierarchical post structure. We added transition points between these coordinate pairs to create continuous curves, as shown in Fig. 3. Direct coordinate mapping without transition considerations generated sharp 45-degree angles that compromised the fractal measurement accuracy. Finally, the fractal dimension was calculated as the gradient of the plot of log(N(r)) against log(1/r), where N(r) is the number of boxes of size r that contain part of the contour. In addition, to properly account for the degree of conversational structural complexity, we penalized replies below the average word count.
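The coordinate mapping can be sketched as follows. The exact form of the transition points is not specified here, so this sketch assumes a staircase: the contour first moves down one line and then across to the new indentation, which avoids a diagonal between consecutive replies. The function name and the input format (a list of reply levels in display order) are hypothetical.

```python
def contour_points(levels):
    """Turn display-order reply levels into a staircase contour.

    x is the indentation (reply level), y is the line index.
    """
    points = []
    for y, x in enumerate(levels):
        if points and points[-1][0] != x:
            # Assumed transition point: step down first, then across,
            # so consecutive replies are not joined by a diagonal.
            points.append((points[-1][0], y))
        points.append((x, y))
    return points


# Hypothetical thread: original post, a reply, a nested reply, then a
# second reply to the first comment.
contour = contour_points([0, 1, 2, 2])
```

The resulting list of points can be passed to any curve-based box-counting routine.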

We had anticipated that a medium degree of boundary ruggedness would be associated with more meaningful conversations, a low degree with trivial conversations (two-word replies), and a high degree with conversations that have the opposite problem (being too pedantic—simply concerned with trivial aspects). If two-word or one-word replies were creating more rugged landscapes, this would not be beneficial. Therefore, we penalized the X-axis indentation for low word counts, making them shorter relative to conversations with longer replies. This affected social and technical posts equally.

Moreover, these one-liner replies often lack the 'information relevance' needed to advance the conversation or contribute helpful discussions. We had already established the average word count per comment to be approximately 50–60 words in both conversation categories. The penalty was implemented as follows: 50+ words: no penalty (1.0); 20–49 words: gentle penalty (0.7); fewer than 20 words: moderate penalty (0.4). Both coordinates were responsible for the fractal dimension, a notion that Foroutan-pour et al. (1999) also agreed with, albeit in a different context. The penalties were applied directly to X coordinates and were transitively transferred to the transition coordinates. The contour images and fractal dimension for some of the posts are shown in Fig. 6. Images with more rugged contour lines would require more boxes to cover all the details, and the rate of change of this scale was more likely to be higher than for images with less rugged boundaries. We argued that social conversations were more likely to develop contour images with more complex rugged boundaries and therefore would need a higher rate of scale reduction than technical conversations. This higher rate of scale change is what we defined as the "degree of conversational structural complexity." In terms of content moderation, this meant that controversial conversations would require continuous attention and flexible moderation approaches. For platforms and community managers, this signaled that one-size-fits-all content moderation rules were likely not to work for highly complex, evolving threads. We estimated and confirmed the Koch curve coastline metrics using the following fractal dimension estimator.
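The word-count penalty can be written as a small lookup. The three factors (1.0, 0.7, 0.4) and their word-count bands are those stated above; treating the factor as a multiplier on the X coordinate is our reading of "applied directly to X coordinates" and should be taken as an assumption, as should the function names.

```python
def length_penalty(word_count):
    """Penalty factor for a reply, by word count."""
    if word_count >= 50:
        return 1.0  # no penalty
    if word_count >= 20:
        return 0.7  # gentle penalty
    return 0.4      # moderate penalty


def penalized_x(level, word_count):
    """Indentation coordinate after the penalty (assumed multiplicative)."""
    return level * length_penalty(word_count)


# Hypothetical replies at level 3 with 60, 30 and 5 words.
xs = [penalized_x(3, words) for words in (60, 30, 5)]
```

Under this reading, a five-word reply at level 3 contributes less than half the horizontal excursion of a sixty-word reply at the same level, which flattens contours dominated by one-liners.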

Fig. 6

At top, the contour of a socially controversial post (A) and its corresponding box counting graph (B). At bottom, the smoother contour of a technical post (C) and the lower slope of its box counting graph (D). The difference is reflected in their respective fractal dimensions: the social post (B) shows FD = 1.529, while the technical post (D) shows FD = 0.859.

4. Results

4.1 Using the box-counting method

The results were consistent with our hypothesis, in that the fractal dimension of socially controversial conversations was higher than that of technical conversations. The fractal dimension of socially controversial conversations showed mean = 1.373, median = 1.414, range = 0.983–1.600. Those of technical conversations yielded mean = 0.779, median = 0.780, range = 0.176–1.349. The difference between social and technical posts is statistically significant (p < 0.001). See Fig. 6. This showed that socially controversial conversations had more complex structures than technical posts.

Figure 7 shows the correlation of fractal dimension between the level 0 tree (the entire conversation extending from the original post (OP)) and the level 1 subtrees. It was about the same for both: r = 0.745 (p < 0.001) for social and r = 0.749 (p < 0.001) for technical. This indicates that the complexity or branching density of replies does not stray far from that of the OP tree as a whole. A biological analogy might be a plant that propagates by rhizomes: each time a “subtree” pops up to the surface, its branching will vary a bit depending on the surrounding soil conditions, but it will stay within the range characteristic for that species. Social vs technical in this analogy is like contrasting species: each has a different characteristic fractal dimension, so the subtrees stay within the range for their particular species as well, despite local variations.

While the correlation of subtree and OP was about the same for social and technical posts, the slopes of these relationships were quite different: 1.359 for the social posts, and 0.653 for the technical posts (p < 0.001). The steeper slope of the social posts shows a stronger coupling between the complexity of the initial post’s tree and the complexity of sub-trees. One interpretation is that in the social case, when an OP is more complex or provocative, it evokes a disproportionately complex pattern of replies: an amplifier of complexity, so to speak. For technical threads, by contrast, a flatter slope could mean the replies tend to stay bounded by established problem-solving conventions. A biological analogy might be a species that is valuing replication over adaptation (technical posts), versus a species that has an increased mutation rate (as seen, for example, in drug-resistant microbes (Boyce, 2022)).
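That two relationships can share nearly the same correlation while having very different slopes follows from the definitions: Pearson's r is scale-free, whereas the regression slope scales with the spread of the dependent variable. The short Python sketch below makes this concrete with invented numbers (they are not the study's data), and it assumes the OP tree value is the independent variable and the sub-tree value the dependent one.

```python
import math


def pearson_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope of ys on xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Invented values for illustration only.
op_fd = [0.9, 1.1, 1.3, 1.5, 1.7]
sub_flat = [0.80, 0.95, 0.98, 1.15, 1.20]
sub_steep = [2 * value for value in sub_flat]  # same pattern, doubled spread

r_flat, slope_flat = pearson_and_slope(op_fd, sub_flat)
r_steep, slope_steep = pearson_and_slope(op_fd, sub_steep)
```

Here the two correlations are identical while the second slope is twice the first, so a difference in slope carries information that the correlation coefficient alone does not.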

Fractal Dimension: Social vs. Technical Posts

Fig. 7

Pearson’s correlation between the fractal dimension of sub-trees and original posts in socially controversial and technical categories.

4.2 Sentiment analysis

For sentiment analysis, we used the VADER model (Hutto et al., 2014). We found that social posts, which had a higher fractal dimension than technical posts, also had a higher negativity score, consistent with our hypothesis that the social posts were more controversial. We also found the same patterns within each category (technical and social posts): first, lower fractal dimension correlates with higher positive sentiment; second, higher fractal dimension correlates with higher negative sentiment in original posts. Taken together, this suggests that disagreement or controversy is correlated with higher fractal dimension, which is consistent with our hypothesis. See Fig. 8.

Sentiment Analysis Results for Social and Technical Posts

Fig. 8

Sentiment analysis shows that for both social and technical posts, lower fractal dimension is associated with more positive sentiment, and higher fractal dimension is associated with more negative sentiment.

5. Limitations

This study, like many other social media studies, had several limitations. First, it used a categorization (technical discussions vs social controversy discussions) which, while intuitively reasonable, did not have prior literature built up around it. Second, the results we report here are for Reddit posts, and may not be applicable to other types of conversations such as Zoom or face-to-face discussions. Conversations carried out in real time are more sequential: it is rare that someone says “I want to go back to something said earlier”; more typically respondents are replying in real time to the last comment someone made. This is not the case in post-based conversations, which are completely asynchronous and in which users can respond to a comment made months ago.

We agree with the observations in Srinivasan (2023) that Reddit is a highly moderated platform with algorithmic insertions (AI-generated data), bots, trolls, deletions, and other uncertainties arising from its black-box algorithms. Therefore it could be that our results were affected by Reddit’s internal recommendation algorithms. The computational process we used is also a potential limitation. We analysed only 100 posts, limiting our sentiment analysis; we recommend that other researchers explore more posts as well. Finally, and perhaps of the greatest importance, we used only Reddit posts and therefore likely could have missed other phenomena from social networks such as X, Meta, Wikipedia and so on.

6. Future work

The original goal expressed in Eglash et al. (2024) was to develop metrics that could be used to facilitate democratic deliberations, with a particular focus on AI policy through online democratic participation. Traditional forums such as town hall meetings, focus groups, and other formal mechanisms have advantages in that they are heavily moderated and strictly organized, but for that reason they have been criticized as a smokescreen that gives the impression of participation while more powerful actors can manipulate the forums (Heeks, 1999). However, democracy does not have to be administered only through such mechanisms. Bottom-up approaches can be embraced as well, especially when the goal is to empower and give voice to those most affected by the decisions being made (Birhane et al., 022). But the more we strive towards open, bottom-up discussions, the more need arises for moderation.

While most of these open discussion platforms are moderated by humans, some rely on AI moderation algorithms (Savaget et al., 2019; McKinsey, 2019; Tsai et al., 2024; McKinney, 2024). Some evidence suggests that AI can outperform human moderators in managing public debates (Tessler et al., 2024; Neff et al., 2023; Laniado et al., 2011) by incorporating approaches like "narrative building" to enhance public engagement in digital governance and decision-making (Marmolejo-Ramos, 2022). Other claims include improved political paradigms (Savaget et al., 2019), support for value-based democratic processes, information access and aggregated opinions (Gudiño et al., 2024; Lazzar & Manuali, 2024; Bak et al., 2022), and encouraging citizen input (Delgado et al., 202). These applications of AI are what Nayebare et al. (2023) define as AI possessing "low mimicry" and "high democracy." But they are all susceptible to the ways that the AI has been trained, and it is notoriously difficult to root out bias in AI models or data. Thus, having additional parameters as an independent measure on conversations, such as the fractal dimension metric we examined in this paper, may prove to be all the more important as AI is increasingly utilized in democratic deliberations. As potential pathways to such applications, we recommend that this study be extended to community deliberation platforms, such as Taiwan's Pol.is: a computational democratic real-time project for gathering, analyzing and Recursive Republic, which is based on the vTaiwan model (that relies on a "voices in the room" approach).

Other applications for fractal metrics may be found in more general uses of social media. For example, we have only examined static values of fractal dimension, but there are likely rates of change in the fractal measure as conversations ebb and flow. Spotting “unnatural” rates of change might be of use as a potential indicator of abusive manipulation of social media. Fractal metrics have already been used as an indicator for socially generated networks in the case of open source: Turnu et al. (2013) show that excessively high fractal dimension correlates with the number of software bugs and other defects.

7. Conclusion

This paper demonstrated a method to determine the fractal dimension of Reddit posts. The main hypothesis was that fractal dimension would be higher for more engaged, controversial conversations. To test it, we started from the informal observation that technical posts about AI on Reddit (those regarding models, algorithms, and other mathematical and computing aspects) seemed calmer and more subdued, whereas posts about social controversies (race, gender, job loss, and so on) seemed far more passionately engaged. We therefore divided our collection of posts into those two categories (social vs. technical) and tested whether the social posts had a higher fractal dimension. The results were consistent with our hypothesis: the fractal dimension of socially controversial conversations had mean = 1.373, whereas the technical conversations had mean = 0.779 (p < 0.001).
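For readers unfamiliar with how a fractal dimension is estimated from a 2D image, the following is a minimal box-counting sketch. It is not the software used in this study; the function names, the choice of box sizes, and the representation of the image as a set of occupied pixel coordinates are assumptions made for illustration.

```python
import math

def box_count(points, size):
    """Number of size-by-size grid boxes containing at least one point."""
    return len({(x // size, y // size) for x, y in points})

def box_counting_dimension(points, sizes=(1, 2, 4, 8, 16)):
    """Estimate fractal dimension as the slope of log N(s) against log(1/s).

    points: set of (x, y) integer pixel coordinates that are "on"
            (assumed representation of a rendered conversation tree).
    sizes:  box side lengths in pixels (assumed; should span the image scale).
    """
    points = set(points)
    if not points:
        raise ValueError("no occupied pixels; dimension is undefined")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    # Ordinary least-squares slope of ys on xs.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

With these box sizes, a filled 32 × 32 square gives a slope of 2 and a single straight row of 32 pixels gives a slope of 1, which is a quick sanity check for any implementation.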

Sentiment analysis results were also consistent with the hypothesis. We would expect more controversial posts to have a higher fractal dimension and to involve more disagreement, which would show up as higher levels of negative sentiment, lower levels of positive sentiment, or both. Social posts, which did have higher fractal dimension, had more negative sentiment and less positive sentiment than technical posts, suggesting that they were more controversial. The analysis within each category also supported the hypothesis: technical posts with lower fractal dimension had more positive sentiment and less negative sentiment than technical posts with higher fractal dimension. The same was true within the social post category. Overall, sentiment analysis was consistent with the hypothesis that fractal dimension is correlated with controversy.

Finally, we note that the relationship between the fractal dimension of the entire branching structure from the original post (OP), and the fractal dimension of its subtrees, was also illuminating. When we graph the linear correlation of the two (OP versus subtree), we find that the slope is higher for social posts than for technical posts. It may be that for technical threads, a flatter slope indicates that replies tend to stay bounded by established technical approaches to reasoning, whereas in the social case, a provocative or controversial post evokes a more complex pattern of replies. While both unitary trees and Diffusion Limited Growth are fractal patterns, the relation between OP fractal dimension and subtree fractal dimension suggests that Reddit posts are more consistent with the model of Diffusion Limited Growth.
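The slope comparison described above can be sketched as an ordinary least-squares fit run separately for each category. This is a minimal sketch under stated assumptions: the function name is hypothetical, OP fractal dimension is placed on the x-axis and subtree fractal dimension on the y-axis, and the fitting routine actually used in the study may differ.

```python
def ols_slope_intercept(pairs):
    """Least-squares line y = slope * x + intercept through (x, y) pairs.

    pairs: iterable of (op_fd, subtree_fd) tuples for one category
           (hypothetical layout, one tuple per thread).
    """
    pairs = list(pairs)
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Calling the function once on the social pairs and once on the technical pairs yields the two slopes being compared; a formal test of whether they differ would additionally need their standard errors.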

We hope that these results will inspire other kinds of structural and semantic inquiries facilitated by fractal modeling of social media, and that future applications of this metric will contribute to important areas such as democratic deliberation, identification of media abuse, and other areas that offer social benefits.

Appendix A

The author confirms that all data generated or analysed during this study are included in this published article. Example from: https://doi.org/10.7302/1x3y-9c95

8. Declarations

Data Availability

The datasets generated and/or analyzed during the current study are available in the University of Michigan - Deep Blue Data repository: https://doi.org/10.7302/1x3y-9c95

Funding

This work was funded by the National Science Foundation (NSF) Future of Work at the Human-Technology Frontier under Grant No. 2128756. All opinions stated or implied in this document are those of the authors and not their respective institutions or the National Science Foundation.

Author Contribution

M.N (Corresponding Author*), R.E (Author), L P. R Jr (Author), M.G (Author), A G.B (Author), K P.R (Author)

Acknowledgements

Not applicable

References

Ai T, Zhang R, Zhou HW, Pei JL (2014) Box-counting methods to directly estimate the fractal dimension of a rock surface. Appl Surf Sci 314:610–621

Alsinet T, Argelich J, Béjar R, Martínez S (2021) Measuring polarization in online debates. Appl Sci 11(24):11879

Aïmeur E, Amri S, Brassard G (2023) Fake news, disinformation and misinformation in social media: a review. Social Netw Anal Min 13(1):30

Allcott H, Gentzkow M, Yu C (2019) Trends in the diffusion of misinformation on social media. Res Politics 6(2). https://doi.org/10.1177/2053168019848554

Alsinet T, Argelich J, Béjar R, Martínez S (2024) On the Complexity of the Bipartite Polarization Problem: From Neutral to Highly Polarized Discussions. Algorithms 17(8):369

Andersen VN, Hansen KM (2007) How deliberation makes better citizens: The Danish Deliberative Poll on the euro. Eur J Polit Res 46(4):531–556

Bak P, Tang C, Wiesenfeld K (1988) Self-organized criticality. Phys Rev A 38(1):364

Barnsley MF (2014) Fractals everywhere. Academic

Birhane A, Isaac W, Prabhakaran V, Diaz M, Elish MC, Gabriel I, Mohamed S (2022) Power to the people? Opportunities and challenges for participatory AI. In Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, pp. 1–8

Clark L, Pantidi N, Cooney O, Doyle P, Garaialde D, Edwards J, Spillane B et al (2019) What makes a good conversation? Challenges in designing truly conversational agents. In Proceedings of the 2019 CHI conference on human factors in computing systems, pp. 1–12

Delgado F, Yang S, Madaio M, Yang Q (2021) Stakeholder Participation in AI: Beyond Add Diverse Stakeholders and Stir. arXiv preprint arXiv:2111.01122

Eglash R, Nayebare M, Robinson K et al (2024) AI governance through fractal scaling: integrating universal human rights with emergent self-governance for democratized technosocial systems. AI Soc. https://doi.org/10.1007/s00146-024-02029-4

Eglash R, Robinson KP, Bennett A, Robert L, Garvin M (2024) Computational reparations as generative justice: Decolonial transitions to unalienated circular value flow. Big Data Soc 11(1):20539517231221732

Eglash R (2017) Generative Technologies from Africa. In Global Africa, edited by Dorothy L. Hodgson and Judith A. Byfield University of California Press

Foroutan-pour K, Dutilleul P, Smith DL (1999) Advances in the implementation of the box-counting method of fractal dimension estimation. Appl Math Comput 105(2–3):195–210

Gao H, Jacobson N, Liang J, Zhang C (2021) Between passion and politics: How emotions drive engagement and polarization on Reddit

Gudiño JF, Grandi U, Hidalgo C (2024) Large language models (LLMs) as agents for augmented democracy. Philosophical Trans A 382(2285):20240100

Guydish AJ, Fox Tree JE (2021) Good conversations: Grounding, convergence, and richness. New Ideas Psychol 63:100877

Hadzieva E, Bogatinoska DC, Petroski R, Shuminoska M, Gjergjeska L, Karadimce A, Trajkova V (2016) Is the fractal dimension of the contour-lines a reliable tool for classification of medical images? In MATEC Web of Conferences (Vol. 76, p. 05002). EDP Sciences

Hansen SS (2022) Public AI imaginaries: How the debate on artificial intelligence was covered in Danish newspapers and magazines 1956–2021. Nordicom Rev 43(1)

Heeks R (1999) The tyranny of participation in information systems: Learning from development projects. Development Informatics working paper, (4)

Hutto CJ, Gilbert EE (2014) VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. In Proceedings of the Eighth International Conference on Weblogs and Social Media (ICWSM-14), pp. 216–225

Jang M, Dori-Hacohen S, Allan J (2017) Modeling controversy within populations. In Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pp. 141–149

Laniado D, Tasso R (2011) Co-authorship 2.0: Patterns of collaboration in Wikipedia. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, pp. 201–210

Maciel C, Bicharra Garcia AC (2007) Design and Metrics of a ‘Democratic Citizenship Community’ in Support of Deliberative Decision-Making. In International Conference on Electronic Government, pp. 388–400. Berlin, Heidelberg: Springer Berlin Heidelberg

Mandelbrot BB (1983) The fractal geometry of nature/Revised and enlarged edition. New York

Mandelbrot BB (1989) Fractal geometry: what is it, and what does it do? Proceedings of the Royal Society of London. A. Mathematical and Physical Sciences 423(1864):3–16

Marmolejo-Ramos F, Workman T, Walker C, Lenihan D, Moulds S, Correa JC, Sonna B (2022) AI-powered narrative building for facilitating public participation and engagement. Discover Artif Intell 2(1):7

Matsuura S (1999) Growth and colony patterning of filamentous fungi. Forma 14(4):315–320

McKinney S (2024) Integrating Artificial Intelligence into Citizens’ Assemblies: Benefits, Concerns and Future Pathways. J Deliberative Democracy 20(1). https://doi.org/10.16997/jdd.1556

Moats D, Ganguly C (2025) Bringing AI participation down to scale. Patterns 6(5):101241. https://doi.org/10.1016/j.patter.2025.101241

Nayebare M, Eglash R, Kimanuka U, Baguma R, Mounsey J (2023) Interim report for Ubuntu-AI: A bottom-up approach to more democratic and equitable training and outcomes for machine learning. Conference: Democratic Inputs for AI

Neff JJ, Laniado D, Kappler KE, Volkovich Y, Aragón P, Kaltenbrunner A (2013) Jointly they edit: Examining the impact of community identification on political interaction in Wikipedia. PLoS ONE 8(4):e60584

Penn AI, Loew MH (1996) Estimating fractal dimension of medical images. Medical Imaging 1996: Image Processing, vol 2710. SPIE, pp 840–851

Saeed MH, Ali S, Blackburn J, De Cristofaro E, Zannettou S, Stringhini G (2022) TrollMagnifier: Detecting state-sponsored troll accounts on Reddit. In 2022 IEEE Symposium on Security and Privacy (SP), pp. 2161–2175. IEEE

Savage VM, Deeds EJ, Fontana W (2008) Sizing up allometric scaling theory. PLoS Comput Biol 4(9):e1000171

Savaget P, Chiarini T, Evans S (2019) Empowering political participation through artificial intelligence. Sci Public Policy 46(3):369–380

So GB, So HR, Jin GG (2017) Enhancement of the box-counting algorithm for fractal dimension estimation. Pattern Recognit Lett 98:53–58

Srinivasan K (2023) Paying attention. Technical Report, Mimeo

Tronnolone H, Tam A, Szenczi Z, Green JEF, Balasuriya S, Tek EL, Binder BJ (2018) Diffusion-limited growth of microbial colonies. Sci Rep 8(1):5992

Tsai LL, Pentland A, Braley A, Chen N, Enríquez JR, Reuel A (2024) Generative AI for Pro-Democracy Platforms

Turnu I, Concas G, Marchesi M, Tonelli R (2013) The fractal dimension of software networks as a global quality metric. Inf Sci 245:290–303

Wang H, Siopongco J, Wade LJ, Yamauchi A (2009) Fractal analysis on root systems of rice plants in response to drought stress. Environ Exp Bot 65(2–3):338–344

Yu Y, Jiang J, Dhillon PS (2024) Characterizing the Structure of Online Conversations Across Reddit. Proceedings of the ACM on Human-Computer Interaction 8(CSCW2):1–23


Fractal Dimension Estimator is a software tool to measure the fractal dimension (FD) of a 2D image: http://www.fractal-lab.org/Downloads/FDEstimator.html

We should note that this is only a measure of the conversations’ structural complexity. It may well be that the concepts expressed are highly complex, even if the structure is not. “E = MC²” requires only 5 symbols.

Introducing Democratic Fine-Tuning: https://meaningalignment.substack.com/p/introducing-democratic-fine-tuning