
Ticket #698 (closed Bugs: fixed)

Opened 8 years ago

Last modified 3 years ago

The case insensitive modifier doesn't work when followed by a character class

Reported by: nobody
Owned by: johnmaddock
Milestone:
Component: regex
Version: Boost 1.45.0
Severity: Problem
Keywords:
Cc:

Description (last modified by johnmaddock)

My name is Florin Trofin (ftrofin at _adobe_ dot com) and I work for 
Adobe Systems. We are using boost/regex 1.33.1 in one of our projects 
and we've encountered the following bug:

The case-insensitive modifier is supposed to make matching case
insensitive from the point where it first appears to the end of the
pattern. Given the search text "ABC abc aCb" and the find string
"(?i)[bc]", we expect b/B/c/C to be found, but only b/c are found. With
the find string "(?i)a[bc]", ab/ac/AB/Ab/AC/Ac are found as expected.
The problem occurs only when a character class [] immediately follows
the case-insensitive modifier "(?i)".
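
As a cross-check of the expected results above, here is a minimal sketch that uses std::regex rather than Boost. This is a stand-in, not the reporter's code: the default ECMAScript grammar of std::regex does not accept the inline "(?i)" modifier, so case insensitivity is requested with the std::regex::icase flag instead. The helper name find_all_icase is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every match of `pattern` in `text`, ignoring case.
// Assumption: the icase flag stands in for the inline "(?i)" modifier,
// which std::regex's ECMAScript grammar does not parse.
std::vector<std::string> find_all_icase(const std::string& pattern,
                                        const std::string& text)
{
   const std::regex e(pattern, std::regex::icase);
   std::vector<std::string> hits;
   // A default-constructed sregex_iterator marks the end of the sequence.
   for (std::sregex_iterator i(text.begin(), text.end(), e), j; i != j; ++i)
      hits.push_back(i->str());
   return hits;
}
```

Calling find_all_icase("[bc]", "ABC abc aCb") should return the six one-character matches B, C, b, c, C, b, which is the behaviour expected from "(?i)[bc]" in this report.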

We also have issues with character equivalence on Mac. Japanese
character equivalence in general does not work: if we have [[=x=]]
where x is a hiragana or katakana character, the equivalence class
does not match correctly. Please let me know if you want me to open a
separate bug for this issue.

If you need more info please let me know. Thx!

Attachments

boostbug698.cpp (568 bytes) - added by Stephen Wassell <stephen.wassell@…> 3 years ago.
Sample code demonstrating this bug (compiles with MSVC9)

Change History

comment:1 Changed 8 years ago by johnmaddock


I can't reproduce this. The test program I'm using is below;
can you check whether it reproduces the issue for you?
BTW I'm testing with the latest Boost CVS, but the only
patches I'm aware of making are to non-greedy repeats, which
shouldn't have any effect here.

Re equivalence classes: unfortunately there is no portable way
to make this work. It requires that the regex engine be able
to decode the collation string produced by the locale in order
to extract the primary equivalence class. The "kind" of sort
key used by the platform is determined in a fairly heuristic
way in find_sort_syntax() in
boost/regex/v4/primary_transform.hpp, and the actual sort
key is produced in cpp_regex_traits::primary_transform().
You may - with a bit of debugging - be able to find out
what's going wrong (I don't have access to a Mac BTW). The
most important thing would be to find out what kind of sort
keys are returned by std::collate<>::transform.
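
One way to inspect those sort keys is sketched below, using only the standard library. The helper name sort_key_hex is hypothetical, and the sketch uses the narrow-character facet std::collate<char>; whether the narrow or wide facet is the relevant one for a given build is an assumption to check.

```cpp
#include <cstdio>
#include <locale>
#include <string>

// Return the collation sort key of `s` under `loc` as a hex string,
// two hex digits per byte of the key.
std::string sort_key_hex(const std::locale& loc, const std::string& s)
{
   const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
   const std::string key = coll.transform(s.data(), s.data() + s.size());
   std::string out;
   char buf[3];
   for (std::string::size_type i = 0; i < key.size(); ++i)
   {
      // Go through unsigned char so bytes above 0x7f are not sign-extended.
      std::snprintf(buf, sizeof buf, "%02x",
                    static_cast<unsigned>(static_cast<unsigned char>(key[i])));
      out += buf;
   }
   return out;
}
```

Printing sort_key_hex(std::locale(""), text) for a hiragana character and its katakana counterpart would show how the platform encodes the keys and whether they share a common prefix. Note that std::locale("") throws std::runtime_error if the environment's locale is not supported.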

Test program follows, John Maddock.

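#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(int, char**)
{
   // Inline case-insensitive modifier immediately followed by a character class.
   boost::regex e("(?i)[bc]");
   std::string s("ABC abc aCb");
   // Walk every match in s; the default-constructed iterator j marks the end.
   boost::sregex_iterator i(s.begin(), s.end(), e), j;
   while(i != j)
   {
      std::cout << *i << std::endl;   // print the matched text
      ++i;
   }
   return 0;
}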

comment:2 Changed 8 years ago by sf-robot

  • Status changed from assigned to closed

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

comment:3 Changed 3 years ago by Stephen Wassell <stephen.wassell@…>

  • Severity set to Problem

This bug still exists as described in the original post. I've tested with Boost 1.45 but I believe it's still there in 1.47.

The regular expression "(?i)[dh]og" should match both "HOG" and "dog" but in fact matches only "dog". Please see the attached sample code boostbug698.cpp.

I've fixed the bug as follows: at boost/regex/v4/basic_regex_creator.hpp line 1216 change m_icase to l_icase.

< if(&c != re_is_set_member(&c, &c + 1, static_cast<re_set_long<mask_type>*>(state), *m_pdata, m_icase))

> if(&c != re_is_set_member(&c, &c + 1, static_cast<re_set_long<mask_type>*>(state), *m_pdata, l_icase))


comment:4 Changed 3 years ago by Stephen Wassell <stephen.wassell@…>

  • Status changed from closed to reopened
  • Resolution None deleted

comment:5 Changed 3 years ago by Stephen Wassell <stephen.wassell@…>

  • Version changed from None to Boost 1.45.0
  • Summary changed from The case insensitive modifier doesn't work to The case insensitive modifier doesn't work when followed by a character class

comment:6 Changed 3 years ago by johnmaddock

  • Description modified (diff)

Confirmed.

comment:7 Changed 3 years ago by johnmaddock

  • Status changed from reopened to closed
  • Resolution set to fixed

(In [74898]) Fix case change bug. Fixes #698.
