This article covers converting between full-width and half-width characters in Python 3. We hope it helps anyone learning Python 3.
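1. Background
· Problem solved: quickly and conveniently converting text between full-width and half-width automatically
· Use case: replacing full-width characters with half-width characters in student answer data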
2. How full-width and half-width work
· Full-width: Double Byte Character, abbreviated DBC
· Half-width: Single Byte Character, abbreviated SBC
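· On Windows, in the GB2312/GBK code page, Chinese characters and full-width characters each occupy two bytes, and those bytes come from the upper half of the byte range (ASCII chart 2, codes 128–255);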
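· The first byte of a full-width character is always 163, and the second byte is the code of the corresponding half-width character plus 128 (the space is an exception: the full-width space and the half-width space have to be handled separately);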
· For a Chinese character, the first byte is greater than 163; for example, '阿' is 176 162. Chinese characters are left unconverted when detected.
· Example: half-width A is 65, so full-width A is 163 (first byte) and 193 (second byte, 128 + 65).
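The byte rule above can be checked directly from Python. This is a minimal sketch, assuming Python's built-in gb2312 codec matches the Windows encoding described here; gb2312_bytes and follows_rule are hypothetical helper names, not part of the original article. It only tests letters and digits, since a few punctuation slots in GB2312 do not follow the rule.

```python
import string


def gb2312_bytes(ch):
    """Return the GB2312 encoding of one character as a list of byte values."""
    return list(ch.encode("gb2312"))


print(gb2312_bytes("A"))        # half-width A      -> [65]
print(gb2312_bytes("\uff21"))   # full-width A      -> [163, 193]
print(gb2312_bytes("阿"))        # Chinese character -> [176, 162]
print(gb2312_bytes("\u3000"))   # full-width space  -> [161, 161], the exception


def follows_rule(ch):
    """True if the full-width form of `ch` encodes as (163, code + 128)."""
    fullwidth = chr(ord(ch) + 0xFEE0)  # Unicode offset between the two forms
    return gb2312_bytes(fullwidth) == [163, ord(ch) + 128]


# Check every ASCII letter and digit against the rule.
rule_holds = all(follows_rule(c) for c in string.ascii_letters + string.digits)
print(rule_holds)
```

The full-width space encodes as 161 161 rather than 163 160, which is why it needs its own branch in any conversion routine.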
Full-width/half-width example (the text file test.txt contains both full-width and half-width characters):
F:\test>type test.txt
123456
123456
abcdefg
abcdefg
中國你好
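Full-width and half-width characters can look almost identical on screen, so inspecting code points is a more reliable way to tell them apart. The following is a minimal sketch, assuming full-width forms are U+3000 and U+FF01 to U+FF5E and half-width forms are U+0020 to U+007E; width_class and describe are hypothetical helper names introduced only for illustration.

```python
def width_class(ch):
    """Label one character as 'full', 'half', or 'other' (e.g. Chinese)."""
    code = ord(ch)
    if code == 0x3000 or 0xFF01 <= code <= 0xFF5E:
        return "full"
    if 0x0020 <= code <= 0x007E:
        return "half"
    return "other"


def describe(line):
    """Summarize which width classes occur in a line of text."""
    return "/".join(sorted({width_class(ch) for ch in line}))


# Sample lines modelled on the test.txt listing; escapes make the
# full-width digits unambiguous in source form.
samples = ["123456", "\uff11\uff12\uff13\uff14\uff15\uff16", "abcdefg", "中國你好"]
for line in samples:
    print(line, "->", describe(line))
```

Lines that mix forms report both classes, for example "full/half", which makes stray full-width characters in otherwise ASCII data easy to spot.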
3. Implementing full-width/half-width conversion in Python 3
# -*- coding:utf-8 -*-
'''
Full-width: Double Byte Character, abbreviated DBC
Half-width: Single Byte Character, abbreviated SBC
'''
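def DBC2SBC(ustring):
    '''
    Convert full-width characters to half-width
    '''
    rstring = ""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 0x3000:
            # The full-width space maps directly to the ASCII space
            inside_code = 0x0020
        else:
            # Other full-width forms sit 0xfee0 above their ASCII counterparts
            inside_code -= 0xfee0
            if not (0x0021 <= inside_code <= 0x7e):
                # Not a full-width ASCII variant (e.g. Chinese): keep unchanged
                rstring += uchar
                continue
        rstring += chr(inside_code)
    return rstring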
def SBC2DBC(ustring):
    '''
    Convert half-width characters to full-width
    '''
    rstring = ""
    for uchar in ustring:
        inside_code = ord(uchar)
        if inside_code == 0x0020:
            # The ASCII space maps directly to the full-width space
            inside_code = 0x3000
        else:
            if not (0x0021 <= inside_code <= 0x7e):
                # Not a printable ASCII character (e.g. Chinese): keep unchanged
                rstring += uchar
                continue
            inside_code += 0xfee0
        rstring += chr(inside_code)
    return rstring
s = '''
array('0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4',
'5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9',
'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E',
'F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J',
'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O',
'P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',
'U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y',
'Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd',
'e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i',
'j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n',
'o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's',
't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x',
'y' => 'y', 'z' => 'z',
'(' => '(', ')' => ')', '〔' => '[', '〕' => ']', '【' => '[',
'】' => ']', '〖' => '[', '〗' => ']', '"' => '[', '"' => ']',
'\'' => '[', '\'' => ']', '{' => '{', '}' => '}', '《' => '<',
'》' => '>',
'%' => '%', '+' => '+', '—' => '-', '-' => '-', '~' => '-',
':' => ':', '。' => '.', '、' => ',', ',' => '.', '、' => '.',
';' => ',', '?' => '?', '!' => '!', '…' => '-', '‖' => '|',
'"' => '"', '\'' => '`', '\'' => '`', '|' => '|', '〃' => '"',
' ' => ' ');
'''
# Full-width to half-width
print(DBC2SBC(s))
# Half-width to full-width
print(SBC2DBC(s))
s = '''
中文測試
'''
# Full-width to half-width
print(DBC2SBC(s))
# Half-width to full-width
print(SBC2DBC(s))
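The same mapping can also be expressed as a lookup table applied with str.translate, which avoids building the result one character at a time. This is a sketch of an alternative approach, not the method used above; FULL_TO_HALF, HALF_TO_FULL, to_half_width and to_full_width are hypothetical names chosen for illustration.

```python
# Build the tables once: U+FF01..U+FF5E <-> U+0021..U+007E, U+3000 <-> U+0020.
FULL_TO_HALF = {code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x0020
HALF_TO_FULL = {half: full for full, half in FULL_TO_HALF.items()}


def to_half_width(text):
    """Replace full-width ASCII variants and the full-width space."""
    return text.translate(FULL_TO_HALF)


def to_full_width(text):
    """Replace printable ASCII characters and the space with full-width forms."""
    return text.translate(HALF_TO_FULL)


# Full-width "ABC", a full-width space, then full-width "123".
fullwidth_sample = "\uff21\uff22\uff23\u3000\uff11\uff12\uff13"
print(to_half_width(fullwidth_sample))   # ABC 123
print(to_half_width("中文測試"))          # Chinese text passes through unchanged
print(to_full_width("ABC 123") == fullwidth_sample)
```

Characters missing from the table are left untouched by str.translate, so Chinese text needs no special case. The standard library's unicodedata.normalize('NFKC', text) also folds full-width forms to half-width, but it rewrites other compatibility characters as well, so a dedicated table is the narrower choice when only this conversion is wanted.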
Source: 陳鵬個人博客 (Chen Peng's personal blog)