How to Polish Mermaid Diagrams with Gemini Nano Banana
A method to convert rough Mermaid diagrams written by engineers into infographics suitable for business materials using Gemini's image generation feature "Nano Banana." Introduces a four-sentence system prompt and practical examples.
Mermaid diagrams are indispensable for communication among engineers. However, many have likely experienced being told, “It’s hard to tell what this is about” when pasting them directly into meeting materials or client-facing documents. While text-based Mermaid can accurately represent structure, it often fails to produce “business-oriented diagrams” with a polished appearance.
Bridging this gap is a new method using Google Gemini’s image generation feature, commonly known as “Nano Banana.” According to an article published on June 6, 2026, by Qiita user ktdatascience, a mere four-sentence system prompt can transform Mermaid “blueprints” into infographics with white backgrounds, Japanese Gothic fonts, and official icons. Moreover, the time required for polishing is nearly zero.
Why Use Mermaid as an Intermediary?
The key to this method is that Nano Banana is never handed a free-form textual description of the diagram. The author notes that in early trials, passing lengthy text directly to the image generation API caused the model to summarize the content arbitrarily, which distorted elements and arrow directions.
The fix was to use Mermaid as an intermediate language. Mermaid functions as a “structural intermediate language” that precisely describes elements and their relationships. By first fixing the structure in Mermaid and then handing it to Nano Banana, the model’s role is limited to “polishing.”
The division of roles is clear: humans take responsibility for design (structure), while AI handles only visual adjustments. Because the error-prone “summarization” step is completed by a human first, the output is stable and high quality.
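As a minimal sketch of what "fixing the structure first" can look like, the snippet below builds a Mermaid flowchart from an explicit list of nodes and edges. The helper `to_mermaid` and the node names are hypothetical and are not taken from the original article; the point is only that every element and arrow direction is stated explicitly before any image model sees it.

```python
from typing import Iterable, Tuple

# (source id, target id, edge label); use "" for an unlabeled arrow.
Edge = Tuple[str, str, str]


def to_mermaid(nodes: dict[str, str], edges: Iterable[Edge], direction: str = "LR") -> str:
    """Render nodes and directed edges as Mermaid flowchart text."""
    lines = [f"flowchart {direction}"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id}["{label}"]')
    for src, dst, label in edges:
        arrow = f"-->|{label}|" if label else "-->"
        lines.append(f"    {src} {arrow} {dst}")
    return "\n".join(lines)


# Hypothetical three-tier web app, used only for illustration.
NODES = {"user": "User", "alb": "Load Balancer", "app": "App Server", "db": "Database"}
EDGES = [("user", "alb", "HTTPS"), ("alb", "app", ""), ("app", "db", "SQL")]

diagram = to_mermaid(NODES, EDGES)
print(diagram)
```

The printed text is what would be pasted into the Gem. Since the structure is already explicit, the model has nothing left to decide about which elements exist or which way the arrows point.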
Breaking Down the Four-Sentence System Prompt
The custom instructions (system prompt) for the published Gem are as follows:
Summarize and structure the input, then generate an infographic image in Japanese Gothic font. Use official icons for visualization. If the input exceeds 30 characters, first summarize and structure the content before generating the image. Use a white background and omit the title.
The intentions behind these four sentences are explained as follows:
- “Summarize and structure the input” directs the AI to break the content down into a structure first, whether the input is Mermaid or long text. This gives a foundation that resists distortion even with rough input.
- “In Japanese Gothic font” fixes the typeface to a standard one, since serif or pop fonts would be unsuitable for business materials.
- “Use official icons for visualization” asks for the official icons of AWS and other services, though the author cautions against expecting too much here.
- “If the input exceeds 30 characters” is a safety net for accidentally pasted long text, again requiring summarization and structuring first.
- “Use a white background and omit the title” keeps the image from looking out of place in documents or Slack, and stops the AI from generating a title that the user may want to add themselves.
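For readers who keep the prompt under version control, here is a small sketch that stores the four sentences separately and joins them into the custom instruction. The 30-character check is only an illustration: how the model itself counts characters is not stated in the article, so `triggers_summarize_rule` (a hypothetical helper) simply uses Python's `len` as an assumption.

```python
# The four sentences quoted in the article, kept separate so each can be
# reviewed or tweaked on its own.
SYSTEM_PROMPT_SENTENCES = [
    "Summarize and structure the input, then generate an infographic image in Japanese Gothic font.",
    "Use official icons for visualization.",
    "If the input exceeds 30 characters, first summarize and structure the content before generating the image.",
    "Use a white background and omit the title.",
]
SYSTEM_PROMPT = " ".join(SYSTEM_PROMPT_SENTENCES)

# Threshold named in the third sentence of the prompt.
LONG_INPUT_THRESHOLD = 30


def triggers_summarize_rule(user_input: str) -> bool:
    """Assumption: approximate the model's character count with len()."""
    return len(user_input.strip()) > LONG_INPUT_THRESHOLD


short_input = "A --> B"
typical_input = "flowchart LR\n    user --> alb --> app --> db"

print(triggers_summarize_rule(short_input))
print(triggers_summarize_rule(typical_input))
```

Under this rough count, even a small Mermaid diagram is longer than 30 characters, so in practice the "summarize and structure first" instruction is likely to apply to almost every input.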
Considerations for Model Selection
As of early 2026, Nano Banana is no longer a single model. Users need to choose from the following three options based on the use case:
- Nano Banana (Gemini 2.5 Flash Image): Fast and lightweight, but Japanese text tends to be distorted.
- Nano Banana Pro (Gemini 3 Pro Image): Suitable for infographics, diagrams, and multilingual text. Optimal for this use case.
- Nano Banana 2 (Gemini 3.1 Flash Image): Achieves Pro-level quality at Flash speed.
If the priority is readable Japanese text and structurally sound diagrams, choosing Pro or 2 is recommended. Google itself positions Nano Banana Pro as suitable for “visualizing ideas, turning data into infographics, and illustrating handwritten notes,” which aligns perfectly with this use case.
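The selection guidance above can be written down as a small lookup, sketched here. The names are the display names used in this article, not API model identifiers, and the `suits_japanese_diagrams` flag is an interpretation of the article's recommendation rather than an official rating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    nickname: str
    model_family: str
    note: str
    suits_japanese_diagrams: bool  # interpretation of the article's advice


OPTIONS = [
    ModelOption("Nano Banana", "Gemini 2.5 Flash Image",
                "Fast and lightweight; Japanese text tends to be distorted", False),
    ModelOption("Nano Banana Pro", "Gemini 3 Pro Image",
                "Suited to infographics, diagrams, and multilingual text", True),
    ModelOption("Nano Banana 2", "Gemini 3.1 Flash Image",
                "Pro-level quality at Flash speed", True),
]


def recommended(needs_readable_japanese: bool) -> list[str]:
    """Return the nicknames that fit the stated priority."""
    return [
        option.nickname
        for option in OPTIONS
        if option.suits_japanese_diagrams or not needs_readable_japanese
    ]


print(recommended(needs_readable_japanese=True))
```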
How to Use in Three Steps
The usage is surprisingly simple:
- Write the content to be illustrated in Mermaid (as you normally would).
- Paste that Mermaid into a Gem configured with the above prompt and send it.
- Use the resulting image directly in documents, Slack, or meeting minutes.
Because the workflow is just writing Mermaid in VS Code, copying it, and pasting it into the Gem, the polishing effort drops to nearly zero. The author published three examples, each showing how a simple Mermaid diagram is transformed by Nano Banana.
- AWS architecture diagram (three-tier web app): a diagram that was little more than arrows becomes visually clear with icons, so one can easily trace “where requests come from and where data is stored.”
- Data analytics pipeline: the polished version proves effective when explaining the pipeline to non-engineers.
- Business flow (sequence diagram): reading the raw sequence diagram requires background knowledge, but the polished version naturally conveys the flow: “submit → payment → email sent.”
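If the Mermaid source already lives inside a Markdown document, the copy step can be scripted. The sketch below assumes the diagrams sit in fenced blocks tagged `mermaid`; the helper `extract_mermaid_blocks` and the sample order flow are hypothetical and only mirror the "submit → payment → email sent" example in spirit.

```python
import re

# Build the fence marker indirectly so this example nests cleanly in Markdown.
FENCE = "`" * 3
MERMAID_BLOCK = re.compile(
    rf"^{FENCE}mermaid[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_mermaid_blocks(markdown_text: str) -> list[str]:
    """Return the body of every fenced block tagged as mermaid."""
    return [m.group(1).rstrip("\n") for m in MERMAID_BLOCK.finditer(markdown_text)]


# Hypothetical document containing one sequence diagram.
sample = "\n".join([
    "# Order flow",
    FENCE + "mermaid",
    "sequenceDiagram",
    "    User->>Shop: Submit order",
    "    Shop->>Payment: Charge",
    "    Shop->>User: Send confirmation email",
    FENCE,
    "Some trailing notes.",
])

blocks = extract_mermaid_blocks(sample)
print(blocks[0])
```

Each extracted block can then be pasted into the Gem as-is, which keeps the human-authored structure untouched on its way to the image model.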
Editorial Opinion
What makes this method particularly interesting is how it works around AI’s “summarization habit.” When lengthy text is fed to an image generation AI, the model often oversimplifies the content. When it is given Mermaid first, a “non-redundant structural representation,” the AI has little room for unnecessary summarization. This represents a new form of “prompt engineering” for using generative AI in business.
In the short term, this method significantly reduces time spent creating diagram illustrations for meeting minutes and specifications. Especially in situations where “diagrams drawn by engineers are hard to understand,” this offers an immediately effective improvement. Since it generates images that can be directly pasted into Slack, Notion, or Google Docs, it is practical to adopt without changing existing workflows.
In the long term, the combination of text-based diagrammatic representations like Mermaid and generative AI is likely to expand further. While this case focuses on “polishing,” advanced use cases such as “generating diagrams from multiple perspectives from Mermaid” or “converting old architecture diagrams into the latest service configurations” are conceivable. However, the author acknowledges limitations in the automatic application of official icons, indicating that full automation is still some distance away.
What the editorial team finds noteworthy is how this method fundamentally achieves a “separation of design and decoration.” Engineers focus on structure, while AI handles appearance. This clear division of roles ensures stable quality. This concept holds potential beyond simple diagram creation, extending to software development documentation as a whole.
Reference
- 【Diagram】Convert Engineers’ “Rough Mermaid” into Business-Friendly Illustrations - Qiita — Published June 6, 2026
Frequently Asked Questions
- Is this method free to use?
- It depends on your plan. The method requires access to Gemini's image generation feature (Nano Banana) with a suitable model (Nano Banana Pro or 2). Available models vary by plan, so check the official website for the latest availability.
- Can it be used with other diagrammatic representations like PlantUML or Graphviz?
- In principle, the same method can be applied to any diagramming language that expresses structure clearly. The same caution applies: do not paste long free-form descriptions directly. Fixing the structure first and then polishing works for all text-based diagram notations.
- What about the copyright of generated images?
- Rights for images generated by Gemini (Nano Banana) follow Google's terms of service. Generally, images generated based on user input belong to the user, but for business use, it is recommended to check both your company's policy and Google's terms.