
Opened 15 years ago

Closed 15 years ago

#101 closed Support Requests (Works For Me)

regex performance issue

Reported by: nobody Owned by: John Maddock
Milestone: Component: None
Version: None Severity:
Keywords: Cc:

Description

Is boost::regex_merge able to process more than 20 KB of data per second?

I'm using a function like the following, and can't get it above that rate:

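#include <windows.h>          // LPCSTR, DWORD
#include <boost/regex.hpp>
#include <iterator>
#include <sstream>
#include <string>

// Returns szWhere[0, len) with every match of szWhat replaced by szReplacement.
// Note: the expression is recompiled on every call.
std::string RegExpBinReplace(LPCSTR szWhat, std::string szWhere, DWORD len, LPCSTR szReplacement)
{
	const boost::regex e(szWhat, boost::regbase::normal);
	std::ostringstream t(std::ios::out | std::ios::binary);
	std::ostream_iterator<char, char> oi(t);
	boost::regex_merge(oi, szWhere.begin(), szWhere.begin() + len, e, szReplacement);
	return t.str();
}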

Compiled with Visual C++ 6, running on an AMD Athlon XP 2000+.
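One way to check a figure like 20 KB per second is to time the replacement directly. The sketch below is a hypothetical harness (measure_kb_per_sec is a made-up name, not part of any library). It uses C++11 std::regex in place of boost::regex so that it builds without Boost; regex_replace is the name Boost later gave to regex_merge, with the same iterator-based interface. The expression is compiled once, outside the timed loop, and the result is appended to a std::string. Absolute numbers depend heavily on the compiler, the optimisation flags and the regex library, so treat this only as a measuring aid.

```cpp
#include <chrono>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical timing helper. std::regex stands in for boost::regex here;
// only the measuring approach is illustrated, not Boost's actual speed.
// Runs the replacement `reps` times over `input` and returns the observed
// throughput in kilobytes (1024 bytes) per second. The output of the last
// run is left in `last_result` so it can be checked.
double measure_kb_per_sec(const std::string& input, const std::string& pattern,
                          const std::string& fmt, int reps, std::string& last_result)
{
    const std::regex e(pattern);  // compile once, outside the timed loop
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        last_result.clear();
        // Append the substituted text straight into a std::string.
        std::regex_replace(std::back_inserter(last_result),
                           input.begin(), input.end(), e, fmt);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    const double kb = static_cast<double>(input.size()) * reps / 1024.0;
    return elapsed.count() > 0 ? kb / elapsed.count() : 0.0;
}
```

Run it on the real input and pattern, with optimisations enabled (for example /O2 with MSVC or -O2 with gcc), before comparing against another tool.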


Change History (2)

comment:1 Changed 15 years ago by nobody

Logged In: NO 

Oops... I asked Dr. John Maddock the same question directly; here is his answer:

>> Second, I found a "leak" in your documentation: you often use
>>   ...
>>   std::ostringstream t;
>>   std::ostream_iterator<char> oi(t);
>>   boost::regex_merge(oi, in.begin(), in.end(), expr, format);
>>   ...
>> I implemented a tiny sed(1) facility, but using ostream_iterator slows my
>> program down terribly while executing the sed substitution command
>> (s/regex/replacement/) many times on a 14 KB html file (it takes several
>> seconds to accomplish the tasks, while sed(1) takes less than one second).
>
>> I found the problem is in the re_detail::re_copy_out function, which seems
>> fast but is not fast at all with ostream_iterators.

> Try using an ostreambuf_iterator instead (my docs should be changed to do the
> same): it's a lot quicker.

> Make sure you turn on all optimisations before making comparisons: the
> stream iterators are excruciatingly slow until optimisations are turned on
> (at which point they can actually be pretty fast).

>> So why not make re_detail::string_out_iterator public (i.e. documented)?

> There shouldn't be any need for that: it's a workaround for broken std
> libraries; you should really be able to use std::back_inserter with strings.

> regards,

> John Maddock

Since I am a beginner in C++, I am still using re_detail::string_out_iterator,
which is faster than ostream_iterator (but not fast enough): my own sed now
performs more than 25 search-and-replace passes over a >30 KB html file in
0.63 s on a PIII/500 under Linux 2.4.18 + gcc 2.95.3.

bye
Claudio
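Maddock's suggestions in the quoted reply come down to choosing a different output iterator. A minimal sketch, assuming C++11 std::regex as a stand-in for boost::regex (std::regex_replace takes the same output-iterator argument as boost::regex_replace, the later name of regex_merge); the three function names are hypothetical. All three produce the same text and differ only in how each character reaches its destination.

```cpp
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// 1. Formatted stream iterator, as in the docs quoted above: each character
//    goes through ostream's formatted operator<< (sentry, width checks).
std::string via_ostream_iterator(const std::string& in, const std::regex& e, const char* fmt)
{
    std::ostringstream t;
    std::regex_replace(std::ostream_iterator<char>(t), in.begin(), in.end(), e, fmt);
    return t.str();
}

// 2. Stream-buffer iterator: characters are written straight to the streambuf,
//    skipping the formatted-output machinery.
std::string via_ostreambuf_iterator(const std::string& in, const std::regex& e, const char* fmt)
{
    std::ostringstream t;
    std::regex_replace(std::ostreambuf_iterator<char>(t), in.begin(), in.end(), e, fmt);
    return t.str();
}

// 3. Append directly to a std::string; no stream is involved at all.
std::string via_back_inserter(const std::string& in, const std::regex& e, const char* fmt)
{
    std::string out;
    out.reserve(in.size());  // a hint only: replacements may grow or shrink the text
    std::regex_replace(std::back_inserter(out), in.begin(), in.end(), e, fmt);
    return out;
}
```

Which variant is fastest depends on the standard library and the optimisation settings, so it is worth measuring on the real input rather than assuming.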

comment:2 Changed 15 years ago by John Maddock

Status: assigned → closed
Logged In: YES 
user_id=14804

A couple of points:

* The current CVS code has just been updated with a much faster version;
however, the new version won't do much for long searches, which were already
substantially faster than, for example, the GNU regex library.
* Using C++ iostreams can be rather slow unless you are very careful: check
that sync_with_stdio is false, for example (otherwise the performance hit can
be *huge*).
* Outputting the result to a string and then copying it to stdout is going to
be a heck of a lot slower than writing directly to the output (due to memory
allocation requirements).
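The last two points can be sketched as follows, again with C++11 std::regex standing in for boost::regex; configure_fast_stdio and replace_to_stream are hypothetical helper names. The substituted text is written straight into the output stream's buffer instead of being collected in a temporary std::string and copied afterwards.

```cpp
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

// Call once at program start, before any other I/O on the standard streams
// (calling it later has implementation-defined effects).
void configure_fast_stdio()
{
    // After this, the C++ standard streams may buffer independently of C stdio,
    // so avoid mixing printf/puts with std::cout in the same program.
    std::ios_base::sync_with_stdio(false);
}

// Writes `text`, with every match of `e` replaced by `fmt`, directly into the
// buffer of `os` (e.g. std::cout), with no intermediate std::string.
void replace_to_stream(std::ostream& os, const std::string& text,
                       const std::regex& e, const std::string& fmt)
{
    std::regex_replace(std::ostreambuf_iterator<char>(os),
                       text.begin(), text.end(), e, fmt);
}
```

For a sed-like tool this means each substitution pass can stream its result out as it goes, rather than allocating a new string for the whole file.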

John Maddock.
