From 20ddfb7fcded992bd0e98b9ef8797104fe0900fd Mon Sep 17 00:00:00 2001
From: Harvey Falcic
->>> s = test.non_utf8_c_str() +@@ -5974,7 +5974,7 @@ bytes are represented as high surrogate characters that can be used to obtain the original byte sequence: -+>>> s = example.non_utf8_c_str() >>> s 'h\udce9llo wörld'+>>> b = s.encode('utf-8', errors='surrogateescape') >>> b b'h\xe9llo w\xc3\xb6rld' @@ -5985,7 +5985,7 @@ One can then attempt a different encoding, if desired (or simply leave the byte string as a raw sequence of bytes for use in binary protocols): -+@@ -5995,7 +5995,7 @@ Note, however, that text strings containing surrogate characters are rejected with the default strict codec error handler. For example: ->>> b.decode('latin-1') 'héllo wörld'+>>> with open('test', 'w') as f: ... print(s, file=f) ...