Add missing checks for failures in calls to PyUnicode_AsUTF8String.
Previously a segfault could occur when passing invalid UTF-8 strings (lone low surrogates), e.g. passing u"\udcff" to the C layer (Python 3).
parent 069ce1f6e9
commit b0e29fbdf3
12 changed files with 92 additions and 25 deletions
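As a rough illustration of the failure the commit message describes, here is a plain-Python sketch (not SWIG's generated C code). It performs the same strict UTF-8 encoding that PyUnicode_AsUTF8String is documented to use. A lone surrogate such as "\udcff" cannot be encoded, so the conversion fails, and a checked caller must report that failure instead of carrying on with a missing result. The helper name to_utf8_or_type_error is hypothetical. Raising TypeError mirrors the error shown for wrapped char * arguments in the documentation change.

```python
def to_utf8_or_type_error(s: str) -> bytes:
    """Hypothetical helper: strictly encode s as UTF-8 for a char * argument.

    PyUnicode_AsUTF8String also encodes with the "strict" error handler and
    returns NULL (with an exception set) when encoding fails; this sketch
    checks for the failure instead of using a missing result.
    """
    try:
        # Strict UTF-8 encoding: lone surrogates raise UnicodeEncodeError.
        return s.encode("utf-8")
    except UnicodeEncodeError as exc:
        # Report the failure as a TypeError, like the wrapped call in the docs.
        raise TypeError(f"cannot convert {s!r} to a UTF-8 char *") from exc


print(to_utf8_or_type_error("h\xe9llo"))  # b'h\xc3\xa9llo'
try:
    to_utf8_or_type_error("\udcff")
except TypeError as err:
    print("TypeError:", err)
```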
@@ -6521,14 +6521,16 @@ string that cannot be completely decoded as UTF-8:
<div class="code"><pre>
%module example

%include <std_string.i>

%inline %{

const char * non_utf8_c_str(void) {
  return "h\xe9llo w\xc3\xb6rld";
}

void instring(const char *s) {
  ...
}

%}
</pre></div>
@@ -6590,6 +6592,20 @@ For more details about the <tt>surrogateescape</tt> error handler, please see
<a href="https://www.python.org/dev/peps/pep-0383/">PEP 383</a>.
</p>

<p>
When Python 3 strings are passed to the C/C++ layer, they are likewise expected to be convertible to valid UTF-8.
For example, when the wrapped <tt>instring</tt> function above is called, any string that cannot be encoded as UTF-8,
such as one containing a lone surrogate, results in a TypeError because the attempted conversion fails:
</p>

<div class="targetlang"><pre>
>>> example.instring('h\xe9llo')
>>> example.instring('h\udce9llo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: in method 'instring', argument 1 of type 'char const *'
</pre></div>
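The two calls above look alike but behave differently. A plain-Python sketch (no SWIG module needed) shows why. 'h\udce9llo' is exactly what the surrogateescape handler produces when decoding the non-UTF-8 bytes b'h\xe9llo'. Such a string round-trips through surrogateescape, but it cannot be strictly encoded as UTF-8, which a char * argument requires. The variable names are illustrative only.

```python
raw = b"h\xe9llo"  # 0xe9 is not followed by a UTF-8 continuation byte

# Decoding with surrogateescape maps the invalid byte 0xe9 to U+DCE9.
s = raw.decode("utf-8", "surrogateescape")
print(repr(s))  # 'h\udce9llo'

# A valid character such as U+00E9 encodes strictly without trouble...
print("h\xe9llo".encode("utf-8"))  # b'h\xc3\xa9llo'

# ...but the escaped surrogate does not, so the strict conversion fails.
try:
    s.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

# Encoding with surrogateescape restores the original bytes.
print(s.encode("utf-8", "surrogateescape") == raw)  # True
```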
<p>
In some cases, users may wish to instead handle all byte strings as bytes
objects in Python 3. This can be accomplished by adding