Opened 5 years ago

Closed 5 years ago

#9473 closed Bugs (fixed)

make_u32regex() accepts illegal UTF-8

Reported by: Peter Klotz <peter.klotz@…>
Owned by: John Maddock
Milestone: To Be Determined
Component: regex
Version: Boost 1.54.0
Severity: Problem
Keywords:
Cc:


The attached example shows that make_u32regex() accepts two kinds of illegal UTF-8.

It accepts code points reserved for UTF-16 surrogate pairs when they are encoded as 3-byte UTF-8 sequences, e.g. "\xed\xa0\x80" representing U+D800.
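
To see why this sequence is illegal, decode it by hand: a 3-byte sequence 1110xxxx 10yyyyyy 10zzzzzz carries 4 + 6 + 6 = 16 payload bits. The minimal sketch below uses a hypothetical helper `decode_3byte` (not part of Boost.Regex) to show that ED A0 80 yields 0xD800, which lies in the surrogate range U+D800..U+DFFF that RFC 3629 forbids in UTF-8.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: decodes one 3-byte UTF-8 sequence, checking only
// the lead/continuation bit patterns (no overlong or surrogate checks).
std::optional<std::uint32_t> decode_3byte(unsigned char b0, unsigned char b1, unsigned char b2)
{
    // Lead byte must be 1110xxxx, continuation bytes must be 10xxxxxx.
    if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
        return std::nullopt;
    // Concatenate 4 + 6 + 6 payload bits.
    return (std::uint32_t(b0 & 0x0F) << 12) |
           (std::uint32_t(b1 & 0x3F) << 6) |
            std::uint32_t(b2 & 0x3F);
}

// Code points U+D800..U+DFFF are reserved for UTF-16 surrogate pairs
// and must never appear in well-formed UTF-8.
bool is_surrogate(std::uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}
```

A decoder that stops at the bit-pattern check, as `decode_3byte` does, will happily return 0xD800 here; the separate `is_surrogate` test is what a validating decoder has to add.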

It accepts overlong UTF-8 encodings, where the code point value has been padded on the left with additional zero bits, e.g. "\xc0\x80" representing U+0000, whose correct 1-byte encoding is "\x00".
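
Both problems can be caught while decoding. The sketch below is a hypothetical standalone validator (`is_valid_utf8` and `min_utf8_width` are illustrative names, not Boost APIs) following the well-formedness rules of RFC 3629: after decoding each sequence it rejects code points that would fit in fewer bytes (overlong forms such as C0 80), UTF-16 surrogates, and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: the shortest UTF-8 length for a code point.
// A sequence longer than this is an overlong encoding.
constexpr std::size_t min_utf8_width(std::uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Hypothetical validator: returns true only for well-formed UTF-8.
bool is_valid_utf8(std::string_view s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const auto b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        // Sequence length and initial payload bits come from the lead byte.
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false; // stray continuation byte or invalid lead byte

        if (s.size() - i < len) return false; // truncated sequence
        // Append 6 payload bits from each continuation byte (10xxxxxx).
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len != min_utf8_width(cp)) return false;    // overlong, e.g. C0 80
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        i += len;
    }
    return true;
}
```

Comparing the sequence length against the shortest possible width covers every overlong form at once, because a sequence can never carry a value too large for its own length.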

Boost.Locale already contains code to protect against overlong encodings (see method width() in

Attachments (1)

main.cpp (1.6 KB) - added by Peter Klotz <peter.klotz@…> 5 years ago.


Change History (2)

Changed 5 years ago by Peter Klotz <peter.klotz@…>

Attachment: main.cpp added

comment:1 Changed 5 years ago by John Maddock

Resolution: fixed
Status: new → closed

Fixed in Git develop.
