Opened 16 years ago

Closed 15 years ago

#101 closed Support Requests (Works For Me)

regex performance issue

Reported by: nobody Owned by: John Maddock
Milestone: Component: None
Version: None Severity:
Keywords: Cc:

Description

Is boost::regex_merge able to process more than 20 KB of
data per second?

I'm using a function like the following and can't get it
above that rate:

std::string RegExpBinReplace(LPCSTR szWhat, std::string szWhere,
                             DWORD len, LPCSTR szReplacement)
{
	const boost::regex e(szWhat, boost::regbase::normal);
	std::ostringstream t(std::ios::out | std::ios::binary);
	std::ostream_iterator<char, char> oi(t);
	boost::regex_merge(oi, szWhere.begin(), szWhere.begin() + len,
	                   e, szReplacement);
	return t.str();
}

Compiled with Visual C++ 6, running on an AMD XP 2000+.

Change History (2)

comment:1 Changed 16 years ago by nobody

Logged In: NO 

Oops... I asked Dr. John Maddock the same thing directly.
Here is his answer:

>> Second, I found a "leak" in your documentation: you often use
>>   ...
>>   std::ostringstream t;
>>   std::ostream_iterator<char> oi(t);
>>   boost::regex_merge(oi, in.begin(), in.end(), expr, format);
>>   ...
>> I implemented a tiny sed(1) facility, but using ostream_iterator slows my
>> program down terribly when executing the sed substitution command
>> (s/regex/replacement/) many times on a 14 KB HTML file (it takes several
>> seconds to finish, while sed(1) takes less than one second).
>
>> I found the problem is in the re_details::re_copy_out function, which looks
>> fast but is not fast at all with ostream_iterators.

> Try using an ostreambuf iterator instead (my docs should be changed to do the
> same): it's a lot quicker.

> Make sure you turn on all optimisations before making comparisons: the
> stream iterators are excruciatingly slow until optimisations are turned on
> (at which point they can actually be pretty fast).

>> So why not make re_details::string_out_iterator public (in the
>> documentation)?

> There shouldn't be any need for that: it's a workaround for broken std
> libraries; you should really be able to use std::back_inserter with strings.

> regards,

> John Maddock

Since I am a beginner in C++, I am still using
re_details::string_out_iterator, which is faster than
ostream_iterator (but not fast enough): my own sed now
searches and replaces in a >30 KB HTML file more than 25 times
in 0.63 s on a PIII/500 under Linux 2.4.18 with gcc 2.95.3.

bye
Claudio
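
The two alternatives suggested in the quoted reply (an ostreambuf iterator, or std::back_inserter on a string) might look roughly like the sketch below. It is an illustration only: std::regex and std::regex_replace from the C++11 standard library stand in for boost::regex and boost::regex_merge so that the example has no external dependency, and the function names replace_via_streambuf and replace_via_back_inserter are hypothetical. No timings are claimed here; measure with optimisations turned on.

```cpp
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Assumption: std::regex / std::regex_replace are used as stand-ins for
// boost::regex / boost::regex_merge. Function names are hypothetical.

// Variant 1: write through the stream's buffer with ostreambuf_iterator,
// which bypasses the formatted-output path that ostream_iterator goes through.
std::string replace_via_streambuf(const std::string& in,
                                  const std::regex& expr,
                                  const std::string& format)
{
    std::ostringstream t;
    std::ostreambuf_iterator<char> out(t);
    std::regex_replace(out, in.begin(), in.end(), expr, format);
    return t.str();
}

// Variant 2: skip the stream entirely and append to a string.
std::string replace_via_back_inserter(const std::string& in,
                                      const std::regex& expr,
                                      const std::string& format)
{
    std::string result;
    result.reserve(in.size());  // a guess at the output size, to limit reallocations
    std::regex_replace(std::back_inserter(result),
                       in.begin(), in.end(), expr, format);
    return result;
}
```

Both variants return the same text; they differ only in how the output characters are collected.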

comment:2 Changed 15 years ago by John Maddock

Status: assigned → closed
Logged In: YES 
user_id=14804

A couple of points:

* The current CVS code has just been updated with a much 
faster version; however, the new version still won't improve 
performance for long searches, which were already 
substantially faster than, for example, the GNU regex library.
* C++ iostreams can be rather slow unless you are 
very careful: check, for example, that sync_with_stdio is false 
(otherwise the performance hit can be *huge*).
* Outputting the result to a string and then copying it to stdout 
is going to be a heck of a lot slower than writing directly to the 
output (due to memory allocation requirements).

John Maddock.
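
The last two points above could be combined along the following lines. This is a minimal sketch under stated assumptions, not the library's recommended code: std::regex_replace again stands in for boost::regex_merge, and replace_to_stream and prepare_fast_stdout are hypothetical helper names.

```cpp
#include <iostream>
#include <iterator>
#include <ostream>
#include <regex>
#include <string>

// Assumption: std::regex_replace is a stand-in for boost::regex_merge.
// Both helper names below are hypothetical.

// Write the substituted text straight into the destination stream's buffer,
// without building an intermediate std::string first.
void replace_to_stream(std::ostream& os, const std::string& in,
                       const std::regex& expr, const std::string& format)
{
    std::regex_replace(std::ostreambuf_iterator<char>(os),
                       in.begin(), in.end(), expr, format);
}

// Call once, before any output is written: decouples the C++ standard
// streams from C stdio so that std::cout is allowed to buffer.
void prepare_fast_stdout()
{
    std::ios::sync_with_stdio(false);
}
```

A caller would run prepare_fast_stdout() at program start and then call replace_to_stream(std::cout, text, expr, format) for each substitution. After sync_with_stdio(false), mixing std::cout with printf on the same output is no longer safe.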