Keeping Alignment in Python When Printing ASCII and Chinese Characters

Introduction

When printing strings in Python, we often pad them with spaces or tabs so that the output lines up as a list or an ASCII table.

For example, consider the following ASCII table:

# ╔════╦════════╦═════════╦═══════╗
# ║ id ║  name  ║ course  ║ score ║
# ╠════╬════════╬═════════╬═══════╣
# ║  1 ║ Alex   ║ English ║    90 ║
# ║  2 ║ Elaine ║ Math    ║    92 ║
# ║  3 ║ Tom    ║ Science ║    88 ║
# ║  4 ║ Sophia ║ History ║    94 ║
# ╚════╩════════╩═════════╩═══════╝

This works well when every string is in English, but once a string contains Chinese or other wide non-ASCII characters, the columns become difficult to align:

World!!       โ•‘ # 7 normal spaces
Helloไฝ ๅฅฝ     โ•‘ # 5 normal spaces
Helloไฝ ๅฅฝ      โ•‘ # 6 normal spaces

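The reason is that Chinese characters and other full-width characters are rendered wider than English letters (about two columns each in a typical monospaced font), so padding by character count no longer works and these characters need special treatment.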

Analysis

To align ASCII text, we usually pad the gaps with ordinary spaces. The ordinary space " " is an ASCII character with the Unicode code point U+0020. When the text contains Chinese characters or Chinese punctuation marks, we also need the Chinese full-width space, code point U+3000, to fill the gaps.
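As a quick illustration, the two space characters can be inspected with the standard `unicodedata` module. This only prints the code point, the Unicode name, and the East Asian Width class of each one.

```python
import unicodedata

# Ordinary ASCII space versus the full-width (ideographic) space.
for ch in (" ", "\u3000"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.east_asian_width(ch))

# U+0020 SPACE Na
# U+3000 IDEOGRAPHIC SPACE F
```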

Here is a simple demonstration. Count the Chinese characters in the string, then pad it with that many ordinary spaces plus one full-width space for each remaining character. In effect, every wide character is paired with a narrow space and every narrow character with a wide space, so two strings with the same number of characters end up with the same total width.

This produces the following output:

Helloไฝ ๅฅฝ  ใ€€ใ€€ใ€€ใ€€ใ€€โ•‘ # 2 normal spaces + 5 full-width spaces
World!!ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€ใ€€โ•‘ # 7 full-width spaces

Code

import re

re_chinese = re.compile(r"[\u4e00-\u9fa5\๏ผ\๏ผŸ\ใ€‚\๏ผ‚\๏ผ‡\๏ผˆ\๏ผ‰\๏ผŠ\๏ผ‹\๏ผŒ\๏ผ\๏ผ\๏ผš\๏ผ›\๏ผœ\๏ผ\๏ผž\๏ผ \๏ผป\๏ผผ\๏ผฝ\๏ผพ\๏ผฟ\๏ฝ€\๏ฝ›\๏ฝœ\๏ฝ\๏ฝž\๏ฝŸ\๏ฝ \ใ€\ใ€ƒ\ใ€Š\ใ€‹\ใ€Œ\ใ€\ใ€Ž\ใ€\ใ€\ใ€‘\ใ€”\ใ€•\ใ€–\ใ€—\ใ€˜\ใ€™\ใ€š\ใ€›\ใ€œ\ใ€\ใ€ž\ใ€Ÿ\ใ€ฐ\ใ€พ\ใ€ฟ\โ€“\โ€”\โ€˜\โ€™\โ€›\โ€œ\โ€\โ€ž\โ€Ÿ\โ€ฆ\โ€ง\๏น\๏ผŽ]", re.S)

def format_ascii(text) :
    t= re.findall(re_chinese,text)
    count = len(t)
    return text  + " " * count + u"\u3000" * (len(text) - count)


print(format_ascii("Helloไฝ ๅฅฝ") + "โ•‘")
print(format_ascii("World!!") + "โ•‘")

Conclusion

This article covered the Chinese string alignment problem that comes up in Python development. The approach above basically meets our needs, though some details may have been overlooked. Better ideas are welcome.
