How AI Is Changing Human Aesthetic Sense: The Problem of "Aesthetic Rumination"
As generative AI continuously outputs the "average of human civilization," our aesthetic sense is converging. An examination of the crisis of creation and aesthetics in the AI era.
On video-centric social media, more and more viewers are asking, “Was this made by AI?” This chain of doubt ultimately harms creators themselves. Dissatisfaction with generative AI seems to boil down to two issues. One is that creators hide their use of generative AI in their workflows. The other, more deep-seated issue is how generative AI tools are affecting our very sense of aesthetics. The latter is not called urgent as loudly as the former. Yet if our sensibility, our sense of what is beautiful, is itself being transformed, that could be a more serious problem than the collapse of trust mechanisms on the internet.
What Generative AI Is Essentially Doing
First, consider what happens when multimodal content such as images, audio, or video is created with generative AI. The user describes what they want in language, in the form of a prompt. The text data volume is typically a few tens to hundreds of kilobytes. The generated video, on the other hand, can range from several megabytes to hundreds of megabytes, so the output holds hundreds to thousands of times more data than the input. From the perspective of information theory, lossless expansion at such a ratio is impossible: the prompt does not contain enough information to determine the output, so the process inevitably alters the original information by supplementing, embellishing, and discarding. This supplementation of information, from low entropy to high entropy, is precisely what generative AI models do. Of course, no one passes the RGB value of every pixel to the model. To bridge the enormous gap between prompt text and multimodal content, the model fills in a massive amount of “average patterns” obtained from its training data.
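The size gap described above can be checked with simple arithmetic. The sketch below is only an illustration: the specific sizes are assumed values picked from the ranges quoted in the text, the function names are hypothetical, and raw byte counts are a crude proxy for information content rather than a true entropy measurement.

```python
# Illustrative arithmetic only. The sizes are assumptions picked from the
# ranges quoted in the text, and byte counts are a rough stand-in for
# information content, not a real entropy estimate.
KB = 1024
MB = 1024 * KB


def expansion_ratio(prompt_bytes: int, output_bytes: int) -> float:
    """Bytes of generated output per byte of prompt."""
    return output_bytes / prompt_bytes


def share_not_from_prompt(prompt_bytes: int, output_bytes: int) -> float:
    """Rough share of the output that the prompt cannot account for."""
    return 1 - prompt_bytes / output_bytes


# Hypothetical low and high cases: (prompt size, generated video size).
cases = {
    "low": (50 * KB, 5 * MB),
    "high": (100 * KB, 200 * MB),
}

for name, (prompt, video) in cases.items():
    ratio = expansion_ratio(prompt, video)
    share = share_not_from_prompt(prompt, video)
    print(f"{name}: {ratio:.1f}x expansion, about {share:.2%} supplied by the model")
```

Under these assumed sizes, nearly all of the output has to come from somewhere other than the prompt, which is the gap the model fills from its training data.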
The Paradox of “Vague Instructions Yield Better Results”
Interestingly, the common intuition that “more detailed instructions yield better results” does not necessarily hold in the world of generative AI. Rather, the vaguer the request and the lower the required standard, the easier it becomes for the model to generate output that “meets the requirements.” This is because the model itself lacks creative agency; what it is essentially doing is “averaging human civilization.” The lower the requirement, the more freely the model can fill in the gaps with the “average” extracted from its training data. What large-scale models ultimately output, therefore, is nothing more than “the average of human civilization, accompanied by the user’s aesthetic sense.”
Generative Aesthetic Rumination: The Aesthetic Convergence Triggered by AI
Here, I want to introduce an important concept: “Generative Aesthetic Rumination,” an idea proposed by the author in an article published by the Chinese tech media “sspai.” Generative AI models favor “instructions with many ambiguities” when creating, because ambiguity lets them fill the output with the “average” obtained from training. As the output spreads, this filled-in information is absorbed by the audience, which includes other creators. If everyone creates from this “average,” all creation in the world ultimately becomes an “average-generation competition.” The averaged information output by generative AI, referred to as “Form Plasticity” in the article, becomes part of the expression of the next work. Through repeated re-creation and re-absorption, it is eventually incorporated into our very sense of aesthetics. The core of the concept is that this recursive process drives aesthetic convergence across society as a whole.
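The recursive loop described above can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not a model from the original article: each creator's style is reduced to a single number, the “model output” is taken to be the plain population mean, and `ai_weight` is a hypothetical parameter for how strongly each creator absorbs that average in every generation.

```python
import random
import statistics


def simulate_rumination(n_creators=200, generations=10, ai_weight=0.3, seed=0):
    """Return the spread (population std dev) of styles after each generation.

    Toy assumptions: a style is one number, the "AI output" is the population
    mean, and every creator moves a fraction `ai_weight` toward that mean
    before producing the next work.
    """
    rng = random.Random(seed)
    styles = [rng.gauss(0.0, 1.0) for _ in range(n_creators)]
    spreads = [statistics.pstdev(styles)]
    for _ in range(generations):
        average = statistics.fmean(styles)  # what the "model" outputs
        styles = [(1 - ai_weight) * s + ai_weight * average for s in styles]
        spreads.append(statistics.pstdev(styles))
    return spreads


if __name__ == "__main__":
    for generation, spread in enumerate(simulate_rumination()):
        print(f"generation {generation}: spread = {spread:.4f}")
```

In this sketch the spread shrinks by a factor of `1 - ai_weight` every generation, so diversity decays geometrically, while setting `ai_weight` to 0 leaves it unchanged. Real aesthetic dynamics are far richer; the point is only that re-absorbing an average is enough, on its own, to produce convergence.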
Information Entropy Compression That Began Before AI
However, “Generative Aesthetic Rumination” is not a problem caused solely by generative models. It has been progressing for a long time as part of the evolution of media: from traditional newspapers and television to blogs and RSS feeds, then to video sites, and finally to short videos. Every time media evolved, information entropy was rapidly compressed. Especially in the internet environment that followed the rise of ByteDance, people themselves opened the Pandora’s box of “recommendation algorithms.” Because of the “come fast, leave fast” nature of the “attention economy,” “capturing people’s interest” became the only shortcut to success, and “using the average” became the “average choice” for all creators. The emergence of generative AI merely accelerated this trend, but it did so at an unprecedented scale.
What Only Humans Can Do
Persona 5 Strikers, a spin-off of the game Persona 5, has an interesting narrative setting. Its stage, “a world where everyone has an omniscient AI,” strongly resembles the real world after 2022, when AI technologies led by large language models made great strides. At the end of the story, the AI character Sophia sees fireworks in Shibuya and becomes more resolute in her goal of “becoming humanity’s best friend.”
Real-world AI models, however, sit inside data center servers and can only “compromise” with humanity’s average knowledge and average values. They cannot truly experience the real world, cultivate a genuine aesthetic sense, or hold biases, let alone possess agency. None of this is possible for current AI. But humans can do all of it.
Using this precious sensibility as a mere errand runner for AI is one choice. There is another path, however: experiencing this world for yourself, with your eyes, your feet, and all your senses, and feeling it as your own while still using AI as a tool. There are things AI can never do. Aesthetics, experience, spirit: these are domains that models lying in data centers cannot reach, and no number of parameters can replace them. To feel, to think, to create is something only you, as a human, can do.
---
With the rapid proliferation of generative AI, it is becoming difficult to tell whether content is AI-generated. What deserves more attention, however, is that AI-generated content is quietly transforming our very aesthetic standards. From the perspective of information entropy, generative AI is essentially outputting nothing more than a “weighted average” of human cultural products. If this averaged output is fed back in as input for creation, aesthetic diversity may gradually be lost. In the AI era, it may be increasingly important for each of us to actively maintain and continually polish our own aesthetic sense.
Frequently Asked Questions
- What is "Generative Aesthetic Rumination"?
- It refers to a recursive process where the "averaged information" output by generative AI is absorbed by creators, becomes part of their next creation, and is absorbed again. This concept explains the phenomenon where, through repeated cycles, the aesthetic sense of society as a whole converges.
- Why does generative AI yield better results with "vague instructions"?
- Generative AI models themselves lack creative agency and are essentially designed to output the "average of human civilization" derived from training data. When user instructions are too specific, the model is constrained by them. The more vague the instructions, the more freely it can fill in with average patterns, often resulting in higher apparent completeness.
- How can individuals protect their aesthetic sense in the AI age?
- It is important not to be exposed only to AI-generated content, but to experience the real world through your own five senses. Visiting art museums, spending time in nature, and engaging with works from diverse genres to accumulate your own unique aesthetic experiences is key to not being swept away by averaged AI outputs.