Problems Writing £ (pound sterling) To MS SQL Server using pymssql

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Mon Nov 17 19:01:15 EST 2008


On Mon, 17 Nov 2008 15:05:43 -0200, J. Cliff Dyer <jcd at sdf.lonestar.org>
wrote:
> On Mon, 2008-11-17 at 15:55 +0000, Darren Mansell wrote:
>> On Mon, 2008-11-17 at 15:24 +0000, Tim Golden wrote:
>> > Darren Mansell wrote:
>> > >
>> > > I'm trying to write a £ symbol to an MS SQL server using pymssql.
>> > > This works, but when selecting the data back (e.g. using SQL
>> > > management studio) the £ symbol is replaced with Â£ (latin capital
>> > > letter A with circumflex).
>
> As I was trying to explain in my other email, the £ does *not* have an
> "extra symbol" attached to it.  It is being encoded as UTF-8 and then
> decoded as Latin-1 (ISO-8859-1).  If you had other non-ASCII characters
> in your text, they would also be mis-decoded, but would probably not
> show the original character in the output.  That was just a coincidence.

I was curious as to how much of a coincidence it was. From
http://en.wikipedia.org/wiki/Utf8
- all code points below 0x80 are encoded as themselves
- 0x80 to 0xFF are encoded as two bytes 0b110000xx 0b10xxxxxx (where the
"x"s represent the 8 bits to be encoded).
The second byte will be the same as the original if the original falls in
the range 0b10000000 to 0b10111111, that is, 0x80 to 0xBF.  0x80 to 0x9F
aren't assigned printable characters in Latin-1, and we're left with the
range 0xA0 to 0xBF. So any of those 32 characters is encoded in utf8 by
just "prepending" 0xC2 to it.
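
The bit layout described above can be checked by hand. What follows is only a sketch, written for Python 3 (the interpreter session in this message is Python 2). The helper name encode_two_byte is made up for illustration, and it handles only code points in the two-byte range, 0x80 to 0x7FF.

```python
def encode_two_byte(cp):
    """Hand-rolled UTF-8 encoding for code points in the two-byte range."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("only two-byte code points (0x80 to 0x7FF) are handled")
    first = 0b11000000 | (cp >> 6)         # 110xxxxx: the high bits
    second = 0b10000000 | (cp & 0b111111)  # 10xxxxxx: the low six bits
    return bytes([first, second])

# Agrees with the built-in codec over the whole upper half of Latin-1.
for cp in range(0x80, 0x100):
    assert encode_two_byte(cp) == chr(cp).encode("utf-8")

# Code points whose second UTF-8 byte equals the code point itself.
same_second = [cp for cp in range(0x80, 0x100) if encode_two_byte(cp)[1] == cp]
print(hex(same_second[0]), hex(same_second[-1]), len(same_second))
print(encode_two_byte(0xA3).hex())
```

The list same_second covers 0x80 to 0xBF (64 values); dropping the 32 code points without printable characters leaves 0xA0 to 0xBF.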

py> for i in range(128,256):
...   c = unichr(i)
...   u = c.encode('utf8')
...   if ord(u[1])==i:
...      print i, hex(i), c, u.encode('hex')
...
128 0x80 ? c280
129 0x81  c281
130 0x82 ? c282
131 0x83 ? c283
132 0x84 ? c284
133 0x85 ?c285
134 0x86 ? c286
135 0x87 ? c287
136 0x88 ? c288
137 0x89 ? c289
138 0x8a ? c28a
139 0x8b ? c28b
140 0x8c ? c28c
141 0x8d  c28d
142 0x8e ? c28e
143 0x8f  c28f
144 0x90  c290
145 0x91 ? c291
146 0x92 ? c292
147 0x93 ? c293
148 0x94 ? c294
149 0x95 ? c295
150 0x96 ? c296
151 0x97 ? c297
152 0x98 ? c298
153 0x99 ? c299
154 0x9a ? c29a
155 0x9b ? c29b
156 0x9c ? c29c
157 0x9d  c29d
158 0x9e ? c29e
159 0x9f ? c29f
160 0xa0  c2a0
161 0xa1 ¡ c2a1
162 0xa2 ¢ c2a2
163 0xa3 £ c2a3
164 0xa4 ¤ c2a4
165 0xa5 ¥ c2a5
166 0xa6 ¦ c2a6
167 0xa7 § c2a7
168 0xa8 ¨ c2a8
169 0xa9 © c2a9
170 0xaa ª c2aa
171 0xab « c2ab
172 0xac ¬ c2ac
173 0xad ­ c2ad
174 0xae ® c2ae
175 0xaf ¯ c2af
176 0xb0 ° c2b0
177 0xb1 ± c2b1
178 0xb2 ² c2b2
179 0xb3 ³ c2b3
180 0xb4 ´ c2b4
181 0xb5 µ c2b5
182 0xb6 ¶ c2b6
183 0xb7 · c2b7
184 0xb8 ¸ c2b8
185 0xb9 ¹ c2b9
186 0xba º c2ba
187 0xbb » c2bb
188 0xbc ¼ c2bc
189 0xbd ½ c2bd
190 0xbe ¾ c2be
191 0xbf ¿ c2bf
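
For comparison, here is a small sketch of the mis-decoding itself, again in Python 3 rather than the Python 2 used above. It assumes the data really went through "encode as UTF-8, decode as Latin-1", as described in the quoted message; the variable names are illustrative only.

```python
pound = "\u00a3"    # POUND SIGN
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, outside 0xA0 to 0xBF

# Encode as UTF-8, then decode the same bytes as Latin-1.
garbled = pound.encode("utf-8").decode("latin-1")
garbled_e = e_acute.encode("utf-8").decode("latin-1")

print(ascii(garbled))    # '\xc2\xa3': A-circumflex followed by the pound sign
print(ascii(garbled_e))  # '\xc3\xa9': A-tilde followed by the copyright sign

# The original character survives only in the first case.
print(pound in garbled, e_acute in garbled_e)

# As long as no bytes were lost, the damage can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == pound)
```

Reversing the two steps only works while every byte is still intact; if the text has passed through a step that replaced or dropped characters, the round trip may fail.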

-- 
Gabriel Genellina



