Modify

Opened 7 years ago

Last modified 6 years ago

#5086 new Patches

Fix for possible assertion failure in MSVC isctype.c

Reported by: Kazutoshi Satoda <k_satoda@…> Owned by: jsiek
Milestone: To Be Determined Component: tokenizer
Version: Boost Development Trunk Severity: Problem
Keywords: Cc:

Description

Ticket #4791 about warning C6328 was closed as fixed. But in fact, the fix (r66855) only silenced the warning, not addressing the real problem indicated by the warning.

The problem is explained here. In short: (for character classification functions in <cctype>)

the valid range of values for its input argument is:

0 <= c <= 255, plus the special value EOF.

Otherwise, the behavior is undefined. Thus, passing an user provided value of char (may be signed) can cause UB. The existing comment just above struct traits_extension addresses the same issue.

Nothing was changed by the static_cast<int> introduced by the fix. It just expressed what the compiler implicitly does (integral promotion).

Here is the patch to fix the real problem, including a test about the problem.

Attachments (1)

boost_tokenizer_tolerate_negative_char.patch (2.5 KB) - added by Kazutoshi Satoda <k_satoda@…> 7 years ago.
svn diff for trunk r68230

Download all attachments as: .zip

Change History (8)

Changed 7 years ago by Kazutoshi Satoda <k_satoda@…>

svn diff for trunk r68230

comment:1 Changed 7 years ago by Kazutoshi Satoda <k_satoda@…>

Version: Boost 1.45.0Boost Development Trunk

comment:2 Changed 6 years ago by Marshall Clow

Good explanation; I see the problem that needs to be fixed.

However, it seems to me that this patch will change the behavior of non-MS systems.

isspace ( -5 ) will become isspace ( 251 ) - is that correct?

comment:3 in reply to:  2 Changed 6 years ago by Kazutoshi Satoda <k_satoda@…>

isspace ( -5 ) will become isspace ( 251 ) - is that correct?

Yes. But the former was undefined behavior. I think change to any certain behavior from undefined behavior is enough convincing.

comment:4 Changed 6 years ago by Marshall Clow

I don't see where that's undefined behavior (except on MS-systems)

Looking at the C specification (since the C++ one defers back to that), all I see is:

7.4.1.9 The isspace function

Synopsis:

#include <ctype.h> 
int isspace(int c);

Description: The isspace function tests for any character that is a standard white-space character or is one of a locale-specific set of characters for which isalnum is false. The standard white-space characters are the following: space (’ ’), form feed (’\f’), new-line (’\n’), carriage return (’\r’), horizontal tab (’\t’), and vertical tab (’\v’). In the "C" locale, isspace returns true only for the standard white-space characters.

Is there something that I'm missing?

comment:5 in reply to:  4 Changed 6 years ago by Kazutoshi Satoda <k_satoda@…>

Is there something that I'm missing?

Yes, here: 7.4 Character handling <ctype.h> p1

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

comment:6 Changed 6 years ago by Marshall Clow

Thanks for the pointer.

But I still see a problem with the path, specifically, when handling EOF. Casting EOF into an unsigned char will make it "not EOF" any more.

comment:7 in reply to:  6 Changed 6 years ago by Kazutoshi Satoda <k_satoda@…>

But I still see a problem with the path, specifically, when handling EOF.

A value char(EOF), which become equal to EOF after integer promotion if EOF is representable by char, in a string should never be treated as end of file in the first place.

This patch will somewhat cure that case, too.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The owner will remain jsiek.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.