BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Yuyang Peng, Shishi Xiao, Keming Wu, Qisheng Liao, Bohan Chen, Kevin Lin, Danqing Huang, Ji Li, Yuhui Yuan
Tsinghua University, Brown University, University of Liverpool, Microsoft Research Asia, Microsoft

Generated Infographics


Generated Slides


Abstract

Recently, state-of-the-art text-to-image generation models such as Flux and Ideogram 2.0 have made significant progress in sentence-level visual text rendering. In this paper, we focus on the more challenging scenario of article-level visual text rendering and address a novel task: generating high-quality business content, including infographics and slides, from user-provided article-level descriptive prompts and ultra-dense layouts. The fundamental challenges are twofold: significantly longer context lengths and the scarcity of high-quality business content data.

In contrast to most previous works, which focus on a limited number of sub-regions and sentence-level prompts, ensuring precise adherence to ultra-dense layouts with tens or even hundreds of sub-regions in business content is far more challenging. We make two key technical contributions: (i) a scalable, high-quality business content dataset, Infographics-650K, equipped with ultra-dense layouts and prompts and constructed with a layer-wise retrieval-augmented infographic generation scheme; and (ii) a layout-guided cross attention scheme, which injects tens of region-wise prompts into a set of cropped region latent spaces according to the ultra-dense layouts and refines each sub-region flexibly during inference using a layout conditional CFG.

On our BizEval prompt set, our system achieves strong results compared with previous state-of-the-art systems such as Flux and SD3. We also conduct thorough ablation experiments to verify the effectiveness of each component. We hope that Infographics-650K and BizEval will encourage the broader community to advance business content generation.



Scalable Data Engine



Layout-Guided Cross Attention

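The abstract describes this scheme as injecting region-wise prompts into cropped region latents according to the layout. The NumPy sketch here illustrates that idea under simplifying assumptions of our own; it is not the authors' implementation. Queries, keys, and values are used without learned projections, every latent position first attends to a hypothetical global prompt embedding `global_emb`, and where regions overlap, the later region in the list overwrites the earlier one. Boxes `(x0, y0, x1, y1)` are in latent-grid coordinates with exclusive upper bounds, and all function names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(q, kv):
    # Single-head scaled dot-product attention with no learned projections
    # (assumption for brevity): q is (Nq, d), kv is (Nk, d), result is (Nq, d).
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ kv


def layout_guided_cross_attention(latent, global_emb, regions):
    """latent: (H, W, d) image features used as queries.
    global_emb: (L, d) tokens of a hypothetical global/background prompt.
    regions: list of ((x0, y0, x1, y1), emb) with emb of shape (Li, d);
             boxes use latent-grid coordinates with exclusive upper bounds.
    """
    H, W, d = latent.shape
    # Every position first attends to the global prompt.
    out = cross_attention(latent.reshape(-1, d), global_emb).reshape(H, W, d)
    for (x0, y0, x1, y1), emb in regions:
        crop = latent[y0:y1, x0:x1]  # cropped region latent
        # Tokens inside the box attend only to this region's prompt tokens.
        region_out = cross_attention(crop.reshape(-1, d), emb)
        # Paste the result back; later regions overwrite earlier ones.
        out[y0:y1, x0:x1] = region_out.reshape(crop.shape)
    return out
```

Because each crop attends only to its own prompt tokens, text described for one sub-region cannot be attended to from another, which is one simple way to keep tens of sub-regions aligned with their prompts.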


Layout Conditional CFG

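The abstract describes layout conditional CFG only as a way to refine each sub-region flexibly during inference. One plausible reading, sketched here as an assumption rather than the paper's exact formulation, is classifier-free guidance with a spatially varying scale: each layout region gets its own guidance weight, and the standard update ε = ε_u + s · (ε_c − ε_u) is applied with s taken from a per-position map. `guidance_scale_map` and `layout_conditional_cfg` are hypothetical names, and the per-region scales are illustrative inputs.

```python
import numpy as np


def guidance_scale_map(h, w, regions, default_scale):
    """Build an (h, w) map of CFG scales (hypothetical helper).
    regions: list of ((x0, y0, x1, y1), scale) in latent-grid coordinates
    with exclusive upper bounds; later regions overwrite earlier ones.
    """
    scale = np.full((h, w), float(default_scale))
    for (x0, y0, x1, y1), s in regions:
        scale[y0:y1, x0:x1] = s
    return scale


def layout_conditional_cfg(eps_uncond, eps_cond, scale_map):
    """eps_uncond, eps_cond: (C, H, W) noise predictions; scale_map: (H, W).
    Standard CFG update, except the guidance weight varies per position.
    """
    # Broadcast the (H, W) scale map over the channel dimension.
    return eps_uncond + scale_map[None, :, :] * (eps_cond - eps_uncond)
```

With an empty region list the map is constant, and the update reduces to ordinary CFG with `default_scale`.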


Local Region Quality Assessment Pipeline



Generation Quality Comparison





BibTeX


@misc{peng2025bizgenadvancingarticlelevelvisual,
  title={BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation},
  author={Yuyang Peng and Shishi Xiao and Keming Wu and Qisheng Liao and Bohan Chen and Kevin Lin and Danqing Huang and Ji Li and Yuhui Yuan},
  year={2025},
  eprint={2503.20672},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.20672},
}

@article{liu2024glyphv2,
  title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2406.10208},
  year={2024}
}

@article{liu2024glyph,
  title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2403.09622},
  year={2024}
}

Acknowledgements

Website adapted from the following template