Add missing checks for failures in calls to PyUnicode_AsUTF8String.

Previously a segfault could occur when passing invalid UTF-8 strings (low
surrogates), e.g. passing u"\udcff" to the C layer (Python 3).
William S Fulton 2017-12-04 18:41:55 +00:00
commit b0e29fbdf3
12 changed files with 92 additions and 25 deletions


@@ -6521,14 +6521,16 @@ string that cannot be completely decoded as UTF-8:
<div class="code"><pre>
%module example
%include &lt;std_string.i&gt;
%inline %{
const char* non_utf8_c_str(void) {
const char * non_utf8_c_str(void) {
return "h\xe9llo w\xc3\xb6rld";
}
void instring(const char *s) {
...
}
%}
</pre></div>
@@ -6590,6 +6592,20 @@ For more details about the <tt>surrogateescape</tt> error handler, please see
<a href="https://www.python.org/dev/peps/pep-0383/">PEP 383</a>.
</p>
<p>
When Python 3 strings are passed to the C/C++ layer, they are expected to be valid UTF-8 strings too.
For example, when the <tt>instring</tt> function above is wrapped and called, any string that cannot be
encoded as UTF-8 will result in a TypeError because the attempted conversion fails:
</p>
<div class="targetlang"><pre>
&gt;&gt;&gt; example.instring('h\xe9llo')
&gt;&gt;&gt; example.instring('h\udce9llo')
Traceback (most recent call last):
File "&lt;stdin&gt;", line 1, in &lt;module&gt;
TypeError: in method 'instring', argument 1 of type 'char const *'
</pre></div>
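<p>
The failure can be reproduced in pure Python without any wrapper. The following is a minimal sketch, assuming the wrapper's conversion behaves like a strict UTF-8 encode (the commit message names <tt>PyUnicode_AsUTF8String</tt>). The helper name <tt>to_utf8_or_none</tt> is hypothetical and not part of SWIG.
</p>

```python
def to_utf8_or_none(s):
    """Return the strict UTF-8 encoding of s, or None if s cannot be encoded.

    Hypothetical helper for illustration only; it mimics a conversion
    that reports failure instead of crashing.
    """
    try:
        return s.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates such as '\udce9' are not valid in strict UTF-8.
        return None


# A string containing U+00E9 encodes to the two-byte sequence C3 A9.
print(to_utf8_or_none("h\xe9llo"))    # b'h\xc3\xa9llo'

# A string containing a lone low surrogate cannot be encoded strictly.
print(to_utf8_or_none("h\udce9llo"))  # None

# With the surrogateescape error handler the original byte is restored.
print("h\udce9llo".encode("utf-8", "surrogateescape"))  # b'h\xe9llo'
```

<p>
The last line shows why such strings exist at all: <tt>surrogateescape</tt> maps an undecodable byte to a lone surrogate on decoding, and only the same handler can turn it back into that byte on encoding.
</p>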
<p>
In some cases, users may wish to instead handle all byte strings as bytes
objects in Python 3. This can be accomplished by adding