Opened 12 years ago

Closed 7 years ago

#698 closed Bugs (fixed)

The case insensitive modifier doesn't work when followed by a character class

Reported by: nobody
Owned by: John Maddock
Milestone:
Component: regex
Version: Boost 1.45.0
Severity: Problem
Keywords:
Cc:

Description (last modified by John Maddock)

My name is Florin Trofin (ftrofin at _adobe_ dot com) and I work for 
Adobe Systems. We are using boost/regex 1.33.1 in one of our projects 
and we've encountered the following bug:

The case-insensitive modifier is supposed to make matching case
insensitive from the point where it first appears until the end of the
expression. Given the text "ABC abc aCb" and the search expression
"(?i)[bc]", we expect b, B, c and C to be found, but only the lowercase
b and c are found. With "(?i)a[bc]" as the search expression,
ab/ac/AB/Ab/AC/Ac are found as expected. The problem appears only when
a character class [] immediately follows the case-insensitive modifier
"(?i)".

We also have issues with character equivalence on Mac. Japanese
character equivalence in general does not work: if we have [[=x=]],
where x is a Japanese character in hiragana or katakana, the
equivalence class does not match correctly. Please let me know if you
want me to open a separate bug for this issue.

If you need more info please let me know. Thx!

Attachments (1)

boostbug698.cpp (568 bytes) - added by Stephen Wassell <stephen.wassell@…> 7 years ago.
Sample code demonstrating this bug (compiles with MSVC9)


Change History (8)

comment:1 Changed 12 years ago by John Maddock

I can't reproduce this; the test program I'm using is below.
Can you check whether it reproduces the issue for you?
BTW I'm testing with the latest Boost-cvs, but the only
patches I'm aware of making are to non-greedy repeats, which
shouldn't have any effect here.

Re equivalence classes: unfortunately there is no portable way
to make this work. It requires that the regex engine be able
to decode the collation string produced by the locale in order
to extract the primary equivalence class.  The "kind" of sort
key used by the platform is determined in a fairly heuristic
way in find_sort_syntax() in
boost/regex/v4/primary_transform.hpp, and the actual sort
key is produced in cpp_regex_traits::primary_transform().
You may, with a bit of debugging, be able to find out
what's going wrong (I don't have access to a Mac BTW).  The
most important thing would be to find out what kind of sort
keys are returned by std::collate<>::transform.

Test program follows, John Maddock.

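#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(int, char**)
{
   boost::regex e("(?i)[bc]");
   std::string s("ABC abc aCb");
   boost::sregex_iterator i(s.begin(), s.end(), e), j;
   // Print each match on its own line, advancing until the end iterator.
   while(i != j)
   {
      std::cout << *i << std::endl;
      ++i;
   }
   return 0;
}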

comment:2 Changed 12 years ago by sf-robot

Status: assigned → closed

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

comment:3 Changed 7 years ago by Stephen Wassell <stephen.wassell@…>

Severity: Problem

This bug still exists as described in the original post. I've tested with Boost 1.45 but I believe it's still there in 1.47.

The regular expression "(?i)[dh]og" should match on both "HOG" and "dog" but in fact only matches on "dog". Please see the attached sample code boostbug698.cpp.

I've fixed the bug as follows: at boost/regex/v4/basic_regex_creator.hpp line 1216 change m_icase to l_icase.

< if(&c != re_is_set_member(&c, &c + 1, static_cast<re_set_long<mask_type>*>(state), *m_pdata, m_icase))

> if(&c != re_is_set_member(&c, &c + 1, static_cast<re_set_long<mask_type>*>(state), *m_pdata, l_icase))

Changed 7 years ago by Stephen Wassell <stephen.wassell@…>

Attachment: boostbug698.cpp added

Sample code demonstrating this bug (compiles with MSVC9)

comment:4 Changed 7 years ago by Stephen Wassell <stephen.wassell@…>

Resolution: None
Status: closed → reopened

comment:5 Changed 7 years ago by Stephen Wassell <stephen.wassell@…>

Summary: The case insensitive modifier doesn't work → The case insensitive modifier doesn't work when followed by a character class
Version: None → Boost 1.45.0

comment:6 Changed 7 years ago by John Maddock

Description: modified (diff)


comment:7 Changed 7 years ago by John Maddock

Resolution: fixed
Status: reopened → closed

(In [74898]) Fix case change bug. Fixes #698.
