Few Shot Font Generation Fused with Component Position Information

doi:10.3778/j.issn.1673-9418.2305030

Abstract

Abstract: Local-component methods for few-shot font generation usually ignore the role that component position information plays in glyph generation, which causes deviations in the overall structural layout of the generated characters. To capture component position information effectively, this paper proposes a few-shot font generation method that fuses component position information features (CPI-Font). CPI-Font takes MX-Font as its base framework and adds a newly designed global position information extractor. The extractor uses coordinate attention to extract global position information and multi-head component attention to weight the importance of each component within that information, thereby capturing the global positional relationships of local components within the whole glyph and avoiding structural deviations in the generated glyphs. Experiments against current mainstream models are carried out on a Chinese character dataset of 39 publicly constructed fonts. The proposed model achieves an LPIPS of 0.112, an FID of 88.5, and content and style accuracies of 83.1% and 70.4%, respectively, outperforming the other algorithms on all three evaluation metrics. These results indicate that CPI-Font captures component position information effectively and delivers relatively advanced few-shot font generation performance.

Key words: font generation, global position information, local component
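The abstract names coordinate attention as the mechanism for extracting global position information but gives no implementation detail. The following is a minimal sketch of the general coordinate-attention idea only (direction-aware pooling along height and width, a shared channel reduction, then per-row and per-column sigmoid gates); it is not the CPI-Font extractor. The function names, the random weights, the tensor sizes, and the use of ReLU without normalization are all assumptions made for illustration.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matmul(w, z):
    """Multiply an (out x in) weight matrix by an (in x n) matrix.

    Applied over the channel axis, this plays the role of a 1x1 convolution.
    """
    return [[sum(w_row[k] * z[k][n] for k in range(len(z)))
             for n in range(len(z[0]))] for w_row in w]


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Hypothetical helper: x is a C x H x W feature map as nested lists."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Direction-aware pooling: average each row over width, each column over height.
    z_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]        # C x H
    z_w = [[sum(x[c][i][j] for i in range(H)) / H for j in range(W)]
           for c in range(C)]                                             # C x W
    # Concatenate along the spatial axis, then apply a shared channel reduction.
    z = [z_h[c] + z_w[c] for c in range(C)]                               # C x (H+W)
    f = [[max(0.0, v) for v in row] for row in matmul(w_reduce, z)]       # M x (H+W)
    # Split back into the two directions and compute one gate per row and per column.
    f_h = [row[:H] for row in f]
    f_w = [row[H:] for row in f]
    a_h = [[sigmoid(v) for v in row] for row in matmul(w_h, f_h)]         # C x H
    a_w = [[sigmoid(v) for v in row] for row in matmul(w_w, f_w)]         # C x W
    # Reweight every position by the product of its row gate and column gate.
    y = [[[x[c][i][j] * a_h[c][i] * a_w[c][j] for j in range(W)]
          for i in range(H)] for c in range(C)]
    return y, a_h, a_w


# Illustrative sizes and random weights (assumptions, not values from the paper).
random.seed(0)
C, H, W, M = 4, 3, 5, 2
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w_reduce = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(M)]
w_h = [[random.uniform(-1, 1) for _ in range(M)] for _ in range(C)]
w_w = [[random.uniform(-1, 1) for _ in range(M)] for _ in range(C)]
y, a_h, a_w = coordinate_attention(x, w_reduce, w_h, w_w)
```

Because each gate depends on a whole row or a whole column, the output at position (i, j) carries information about where that position sits within the feature map, which is the property the abstract relies on for capturing component position.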

YANG Na, YIN Yanjun, ZHANG Wenxuan, YUN Fei. Few Shot Font Generation Fused with Component Position Information[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(6): 1556-1565.
