Keeping Output Aligned in Python When Strings Mix ASCII and Chinese Characters
Introduction
When we print strings in Python, we often add spaces or tabs between words to align the output as a list or an ASCII table.
For example, the following ASCII table:
# ╔════╦════════╦═════════╦═══════╗
# ║ id ║ name   ║ course  ║ score ║
# ╠════╬════════╬═════════╬═══════╣
# ║ 1  ║ Alex   ║ English ║ 90    ║
# ║ 2  ║ Elaine ║ Math    ║ 92    ║
# ║ 3  ║ Tom    ║ Science ║ 88    ║
# ║ 4  ║ Sophia ║ History ║ 94    ║
# ╚════╩════════╩═════════╩═══════╝
This works when every string is plain English, but once a string contains Chinese or other wide non-ASCII characters, the output becomes difficult to align:
World!! โ # 7 normal spaces
Helloไฝ ๅฅฝ โ # 5 normal spaces
Helloไฝ ๅฅฝ โ # 6 normal spaces
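The reason is that Chinese characters, like many other non-ASCII characters, are rendered wider than English letters, so padding with normal spaces alone (5 or 6 of them in the example above) does not line up with the ASCII row. Strings containing Chinese therefore need special treatment.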
Analysis
To align ASCII text, we usually pad the gaps with enough spaces. The ordinary space " " is an ASCII character with the Unicode code point U+0020. If the text contains Chinese characters or Chinese punctuation marks, we also need the Chinese full-width space, code point U+3000, to fill the blanks.
Here is a simple demonstration. We count the Chinese characters in the string, append the same number of ordinary spaces, and then append one full-width space for each remaining character. In effect, every Chinese character is paired with an ordinary space and every ASCII character is paired with a full-width space, so strings with the same number of characters end up with the same rendered width.
This produces the following output:
Helloไฝ ๅฅฝ ใใใใใโ # 2 normal spaces + 5 full-width spaces
World!!ใใใใใใใโ # 7 full-width spaces
Code
import re
re_chinese = re.compile(r"[\u4e00-\u9fa5\๏ผ\๏ผ\ใ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ\๏ผ \๏ผป\๏ผผ\๏ผฝ\๏ผพ\๏ผฟ\๏ฝ\๏ฝ\๏ฝ\๏ฝ\๏ฝ\๏ฝ\๏ฝ \ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใ\ใฐ\ใพ\ใฟ\โ\โ\โ\โ\โ\โ\โ\โ\โ\โฆ\โง\๏น\๏ผ]", re.S)
def format_ascii(text) :
t= re.findall(re_chinese,text)
count = len(t)
return text + " " * count + u"\u3000" * (len(text) - count)
print(format_ascii("Helloไฝ ๅฅฝ") + "โ")
print(format_ascii("World!!") + "โ")
Conclusion
This article covered the Chinese string alignment problem we ran into during Python development. The approach above basically meets our needs, although there may be details we have not noticed. Better ideas are welcome.