This is the Title of the Book, eMatter Edition
Copyright © 2004 O’Reilly & Associates, Inc. All rights reserved.
Apache::args Versus Apache::Request::param Versus CGI::param
|
457
First of all, the 6-ms difference in average processing time we saw on the fast machine when running the light set has now grown to 110 ms. This means that the few extra operations that Apache::Registry performs turn out to be very expensive on a slow machine.
Second, when the heavy set is used, the time difference is no longer close to the one found with the light set, as it was on the fast machine. We expected the added code to take about the same time to execute in the handler and in the script. Instead, we see a difference of 673 ms (822 ms – 149 ms).
The explanation lies in the fact that the machines differ in more than just CPU speed. Many other characteristics may differ as well, for example, the size of the processor cache. If one machine has a processor cache large enough to hold the whole handler and the other doesn’t, this can be very significant, given that in our heavy benchmark set, 99.9% of the CPU activity was dedicated to running the calculation code.
This demonstrates that none of the results and conclusions made here should be taken for granted. Most likely you will see similar behavior on your machine; however, only after you have run the benchmarks and analyzed the results can you be sure of what is best for your situation. If you later happen to use a different machine, make sure you run the tests again, as they may lead to a completely different decision (as we found when we tried the same benchmark on different machines).
Apache::args Versus Apache::Request::param Versus CGI::param
Apache::args, Apache::Request::param, and CGI::param are the three most common ways to process input arguments in mod_perl handlers and scripts. Let’s write three Apache::Registry scripts that use Apache::args, Apache::Request::param, and CGI::param to process a form’s input and print it out. Note that Apache::args is equivalent to Apache::Request::param only when every key has a single value. For multi-valued keys (e.g., when using checkbox groups), you will have to write some extra code. If you do a simple:
my %params = $r->args;
only the last value will be stored and the earlier ones will be lost, because that’s what happens when you turn a list into a hash. Assuming that you have the following list:
(rules => 'Apache', rules => 'Perl', rules => 'mod_perl')
and assign it to a hash, the following happens:
$hash{rules} = 'Apache';
$hash{rules} = 'Perl';
$hash{rules} = 'mod_perl';
,ch13.24285 Page 457 Thursday, November 18, 2004 12:42 PM
Chapter 13: TMTOWTDI: Convenience and Habit Versus Performance
So at the end only the following pair will get stored:
rules => 'mod_perl'
With CGI.pm or Apache::Request, you can solve this by extracting the whole list by its key:
my @values = $q->param('rules');
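If you stay with $r->args, the extra code needed for multi-valued keys can be quite small: group the flat key/value list into a hash of array references instead of assigning it straight to a hash. The following is a minimal plain-Perl sketch, assuming a literal list stands in for what $r->args returns in list context (so it runs without mod_perl); the variable names are illustrative.

```perl
use strict;
use warnings;

# Stand-in for the flat key/value list that $r->args returns in list
# context (a literal list here, so the sketch runs without mod_perl).
my @pairs = (rules => 'Apache', rules => 'Perl', rules => 'mod_perl',
             lang  => 'en');

# Plain hash assignment: each repeated key overwrites the previous value.
my %flat = @pairs;

# Grouping: keep every value for a key in an array reference.
my %multi;
for (my $i = 0; $i < @pairs; $i += 2) {
    my ($key, $value) = @pairs[$i, $i + 1];
    push @{ $multi{$key} }, $value;
}

print "flat:  rules => $flat{rules}\n";          # only mod_perl survives
print "multi: rules => @{ $multi{rules} }\n";    # Apache Perl mod_perl
```

Code that expects one value per key would then read $multi{lang}[0], so this changes the data structure rather than being a drop-in replacement for the plain hash.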
In addition, Apache::Request and CGI.pm have many more functions that ease input processing, such as handling file uploads. However, Apache::Request is theoretically much faster, since its guts are implemented in C, glued to Perl using XS code.
Assuming that the only functionality you need is the parsing of key/value pairs, and that every key has a single value, we will compare the almost identical scripts in Examples 13-3, 13-4, and 13-5 by passing them various query strings.
Example 13-3. processing_with_apache_args.pl
use strict;
# Apache::Registry passes the request object as the first argument.
my $r = shift;
$r->send_http_header('text/plain');
# Safe only when every key has a single value (see above).
my %args = $r->args;
print join "\n", map {"$_ => $args{$_}" } keys %args;
Example 13-4. processing_with_apache_request.pl
use strict;
use Apache::Request ( );
my $r = shift;
my $q = Apache::Request->new($r);
$r->send_http_header('text/plain');
my %args = map {$_ => $q->param($_) } $q->param;
print join "\n", map {"$_ => $args{$_}" } keys %args;
Example 13-5. processing_with_cgi_pm.pl
use strict;
use CGI;
my $r = shift;
my $q = CGI->new;
$r->send_http_header('text/plain');
my %args = map {$_ => $q->param($_) } $q->param;
print join "\n", map {"$_ => $args{$_}" } keys %args;
All three scripts and the modules they use are preloaded at server startup in startup.pl:
use Apache::RegistryLoader ( );
use CGI ( );
CGI->compile('param');
use Apache::Request ( );
# Preload registry scripts
Apache::RegistryLoader->new->handler(
    "/perl/processing_with_cgi_pm.pl",
    "/home/httpd/perl/processing_with_cgi_pm.pl"
);
Apache::RegistryLoader->new->handler(
    "/perl/processing_with_apache_request.pl",
    "/home/httpd/perl/processing_with_apache_request.pl"
);
Apache::RegistryLoader->new->handler(
    "/perl/processing_with_apache_args.pl",
    "/home/httpd/perl/processing_with_apache_args.pl"
);
1;
We use four different query strings, generated by:
my @queries = (
    join("&", map {"$_=" . 'e' x 10} ('a'..'b')),
    join("&", map {"$_=" . 'e' x 50} ('a'..'b')),
    join("&", map {"$_=" . 'e' x 5 } ('a'..'z')),
    join("&", map {"$_=" . 'e' x 10} ('a'..'z')),
);
The first string is:
a=eeeeeeeeee&b=eeeeeeeeee
which is 25 characters in length and consists of two key/value pairs. The second string is also made of two key/value pairs, but the values are 50 characters long (a total of 105 characters). The third and fourth strings are each made from 26 key/value pairs, with value lengths of 5 and 10 characters and total lengths of 207 and 337 characters, respectively. The query_len column in the report table shows which of these four strings was used.
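These lengths are easy to double-check. The sketch below rebuilds the four query strings with the same generator and prints the number of pairs (counted as the number of = signs, which assumes neither keys nor values contain one) and the total length of each. It should report lengths of 25, 105, 207, and 337 characters.

```perl
use strict;
use warnings;

# Same generator as above: 2 or 26 keys, values of 10, 50, 5, or 10 'e's.
my @queries = (
    join("&", map {"$_=" . 'e' x 10} ('a'..'b')),
    join("&", map {"$_=" . 'e' x 50} ('a'..'b')),
    join("&", map {"$_=" . 'e' x 5 } ('a'..'z')),
    join("&", map {"$_=" . 'e' x 10} ('a'..'z')),
);

for my $query (@queries) {
    # Count pairs by counting '=' signs (assumes none appear in keys or values).
    my $pairs = () = $query =~ /=/g;
    printf "pairs=%2d query_len=%3d\n", $pairs, length $query;
}
```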
We conduct the benchmark with a concurrency level of 50 and generate 5,000
requests for each test. The results are:
name val_len pairs query_len | avtime rps
apreq 10 2 25 | 51 945
apreq 50 2 105 | 53 907
r_args 50 2 105 | 53 906
r_args 10 2 25 | 53 899
apreq 5 26 207 | 64 754
apreq 10 26 337 | 65 742
r_args 5 26 207 | 73 665
r_args 10 26 337 | 74 657
cgi_pm 50 2 105 | 85 573
cgi_pm 10 2 25 | 87 559
cgi_pm 5 26 207 | 188 263
cgi_pm 10 26 337 | 188 262
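where apreq stands for Apache::Request::param( ), r_args stands for Apache::args( ) or $r->args( ), and cgi_pm stands for CGI::param( ). The avtime column is the average processing time in milliseconds, and rps is the number of requests per second.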
You can see that Apache::Request::param and Apache::args have similar performance with a few key/value pairs, but the former is faster with many key/value pairs. CGI::param is significantly slower than the other two methods.
These results also suggest that processing gets progressively slower as the number of key/value pairs grows, while longer values have much less of a slowdown impact. To verify that, let’s use the Apache::Request::param method and first test several query strings made of five key/value pairs with value lengths growing from 10 characters to 60 in steps of 10:
my @strings = map {'e' x (10*$_)} 1..6;
my @ae = ('a'..'e');
my @queries = ( );
for my $string (@strings) {
push @queries, join "&", map {"$_=$string"} @ae;
}
The results are:
val_len query_len | avtime rps
10 77 | 55 877
20 197 | 55 867
30 257 | 56 859
40 137 | 56 858
50 317 | 56 857
60 377 | 58 828
Indeed, the value length has very little influence on speed: the average processing time barely changes (from 55 ms to 58 ms) as the values grow from 10 to 60 characters.
Now let’s use a fixed value length of 10 characters and test with a varying number of key/value pairs, from 2 to 26 (2, 6, 11, 16, 21, and 26 pairs):
my @az = ('a'..'z');
my @queries = map { join("&", map {"$_=" . 'e' x 10 } @az[0..$_]) }
    (1, 5, 10, 15, 20, 25);
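As a cross-check on the pairs and query_len columns of the results, the same generator can report how many pairs each slice @az[0..$_] produces and how long each resulting string is. A small plain-Perl sketch (the @counts and @lengths arrays are illustrative names); it reports 2, 6, 11, 16, 21, and 26 pairs with lengths of 25, 77, 142, 207, 272, and 337 characters.

```perl
use strict;
use warnings;

# Same generator as above: slices @az[0..$_] with 10-character values.
my @az = ('a'..'z');
my @queries = map { join("&", map {"$_=" . 'e' x 10 } @az[0..$_]) }
    (1, 5, 10, 15, 20, 25);

my (@counts, @lengths);
for my $query (@queries) {
    my @fields = split /&/, $query;    # one field per key/value pair
    push @counts,  scalar @fields;
    push @lengths, length $query;
}

print "pairs:     @counts\n";
print "query_len: @lengths\n";
```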
The results are:
pairs query_len | avtime rps
2 25 | 53 906
6 77 | 55 869
11 142 | 57 838
16 207 | 61 785
21 272 | 64 754
26 337 | 66 726
The average processing time column shows that the number of key/value pairs has a significant impact on processing speed: it grows from 53 ms with 2 pairs to 66 ms with 26 pairs.
Buffered Printing and Better print( ) Techniques
As you probably know, this statement:
local $|=1;
disables buffering of the currently select( )ed file handle (the default is STDOUT).
Under mod_perl, the STDOUT file handle is automatically tied to the output socket. If
STDOUT buffering is disabled, each print( ) call also calls ap_rflush( ) to flush
Apache’s output buffer.
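Because $| always refers to the currently selected handle, the usual idiom for changing it on another handle is to select that handle, set $|, and select the previous one again. The plain-Perl sketch below (no mod_perl involved, so STDOUT is not tied here) shows that idiom alongside the autoflush( ) method from the core IO::Handle module; the flag_of( ) helper and the in-memory handle are illustrative assumptions.

```perl
use strict;
use warnings;
use IO::Handle;    # core module; adds autoflush() to file handles

# Illustrative helper: read $| for a handle without leaving it selected.
sub flag_of {
    my ($fh) = @_;
    my $previous = select($fh);
    my $flag     = $|;
    select($previous);
    return $flag;
}

# An in-memory handle stands in for a real output stream.
open my $out, '>', \my $buffer or die "Cannot open in-memory handle: $!";
my $stdout_before = flag_of(\*STDOUT);

# Classic idiom: select the handle, set $|, then restore the old selection.
my $old = select($out);
$| = 1;
select($old);

my $after_select = flag_of($out);        # now 1 for $out
my $stdout_after = flag_of(\*STDOUT);    # STDOUT is left as it was

# The IO::Handle method changes the same flag without an explicit select().
$out->autoflush(0);
my $after_method = flag_of($out);        # back to 0
```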
If your code makes many print( ) calls (bad style for generating output), you will experience a degradation in performance, and the more print( ) calls are made, the more severe it is.
Many old CGI scripts were written like this:
print "<body bgcolor=\"black\" text=\"white\">";
print "<h1>Hello</h1>";
print "<a href=\"foo.html\">foo</a>";
print "</body>";
This example has multiple print( ) calls, which will cause performance degradation with $|=1. It also uses too many backslashes, which makes the code less readable and makes it more difficult to format the HTML so that the script’s output is easily readable. The code below solves both problems:
print qq{
<body bgcolor="black" text="white">
<h1>Hello</h1>
<a href="foo.html">foo</a>
</body>
};
You can easily see the difference. Be careful, though, when printing an <html> tag.
The correct way is:
print qq{<html>
<head></head>
};
You can also try the following:
print qq{
<html>
<head></head>
};
but note that some older browsers expect the first characters after the headers and the empty line to be <html>, with no spaces before the opening left angle bracket. If there are any other characters, they might not accept the output as HTML and might print it as plain text instead. Even if this approach works with your browser, it might not work with others.
Another approach is to use the here document style:
print <<EOT;
<html>
<head></head>
EOT
Performance-wise, the qq{} and here document styles compile down to exactly the
same code, so there should not be any real difference between them.
Remember that the terminator of the here document (EOT in our example) must be aligned to the left side of the line, with no spaces or other characters before it and nothing but a newline after it.
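A quick way to convince yourself that the two styles are interchangeable is to build the same fragment both ways and compare the strings. This sketch only checks that the resulting strings are equal; it does not inspect the compiled code. The $title variable is an illustrative addition, included to show that both forms interpolate.

```perl
use strict;
use warnings;

my $title = 'Test page';    # illustrative; both forms interpolate it

# The same fragment built with qq{} ...
my $with_qq = qq{<html>
<head><title>$title</title></head>
};

# ... and with a here document (terminator flush against the left margin).
my $with_heredoc = <<EOT;
<html>
<head><title>$title</title></head>
EOT

print "Strings are ", ($with_qq eq $with_heredoc ? "identical" : "different"), "\n";
```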
Yet another technique is to pass the arguments to print( ) as a list:
print "<body bgcolor=\"black\" text=\"white\">",
"<h1>Hello</h1>",
"<a href=\"foo.html\">foo</a>",
"</body>";
This technique makes fewer print( ) calls but still suffers from so-called backslashitis
(quotation marks used in HTML need to be prefixed with a backslash). Single quotes
can be used instead:
'<a href="foo.html">foo</a>'
but then how do we insert a variable? The string will need to be split again:
'<a href="',$foo,'.html">', $foo, '</a>'
This is ugly, but it’s a matter of taste. We tend to use the qq operator:
print qq{<a href="$foo.html">$foo</a>
Some text
<img src="bar.png" alt="bar" width="1" height="1">
};
What if you want to make fewer print( ) calls, but you don’t have the output ready all at once? One approach is to buffer the output in an array and then print it all at once:
my @buffer = ( );
push @buffer, "<body bgcolor=\"black\" text=\"white\">";
push @buffer, "<h1>Hello</h1>";
push @buffer, "<a href=\"foo.html\">foo</a>";
push @buffer, "</body>";
print @buffer;
An even better technique is to pass print( ) a reference to the string. The print( )
used under Apache overloads the default CORE::print( ) and knows that it should
automatically dereference any reference passed to it. Therefore, it’s more efficient to
pass strings by reference, as it avoids the overhead of copying.
my $buffer = "<body bgcolor=\"black\" text=\"white\">";
$buffer .= "<h1>Hello</h1>";
$buffer .= "<a href=\"foo.html\">foo</a>";
$buffer .= "</body>";
print \$buffer;
If you print references in this way, your code will not be backward compatible with mod_cgi, which uses the CORE::print( ) function.
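If the same code has to run under both mod_perl and mod_cgi, one option is to dereference scalar references yourself before handing them to print( ), much as my_print( ) does in Example 13-6; under mod_cgi, CORE::print( ) would otherwise output a string such as SCALAR(0x...) instead of the buffer’s contents. A minimal sketch, in which print_maybe_ref( ) is a hypothetical helper and an in-memory file handle stands in for STDOUT:

```perl
use strict;
use warnings;

# Hypothetical helper: print each argument, dereferencing scalar
# references first, so callers may pass \$buffer under mod_cgi too.
sub print_maybe_ref {
    my $fh = shift;
    for my $item (@_) {
        print {$fh} ref($item) eq 'SCALAR' ? $$item : $item;
    }
}

# An in-memory handle stands in for STDOUT.
open my $out, '>', \my $captured or die "Cannot open in-memory handle: $!";

my $buffer = "<body>";
$buffer   .= "<h1>Hello</h1>";
$buffer   .= "</body>";

print_maybe_ref($out, \$buffer, "\n");    # a reference and a plain string
close $out;

print $captured;    # <body><h1>Hello</h1></body>
```

The extra ref( ) check and function call cost something per item, so whether this gains anything over plain printing is something to measure rather than assume.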
Now to the benchmarks. Let’s compare the printing techniques we have just discussed. The benchmark that we are going to use is shown in Example 13-6.
Example 13-6. benchmarks/print.pl
use Benchmark;
use Symbol;
my $fh = gensym;
open $fh, ">/dev/null" or die;
my @text = (
"<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">\n",
"<HTML>\n",
" <HEAD>\n",
" <TITLE>\n",
" Test page\n",
" </TITLE>\n",
" </HEAD>\n",
" <BODY BGCOLOR=\"black\" TEXT=\"white\">\n",
" <H1>\n",
" Test page \n",
" </H1>\n",
" <A HREF=\"foo.html\">foo</A>\n",
"text line that emulates some real output\n" x 100,
" <HR>\n",
" </BODY>\n",
"</HTML>\n",
);
my $text = join "", @text;
sub multi {
my @copy = @text;
my_print($_) for @copy;
}
sub single {
my $copy = $text;
my_print($copy);
}
sub array {
my @copy = @text;
my_print(@copy);
}
sub ref_arr {
my @refs = \(@text);
my_print(@refs);
}
sub concat {
my $buffer;
$buffer .= $_ for @text;
my_print($buffer);
}
sub my_join {
my $buffer = join '', @text;
my_print($buffer);
}
sub my_print {
for (@_) {
print $fh ref($_) ? $$_ : $_;
}
}
timethese(100_000, {
join => \&my_join,
array => \&array,
ref_arr => \&ref_arr,
multi => \&multi,
single => \&single,
concat => \&concat,
});
timethese(100_000, {
'array /b' => sub {my $ofh=select($fh);$|=0;select($ofh); array( ) },
'array /u' => sub {my $ofh=select($fh);$|=1;select($ofh); array( ) },
'ref_arr/b' => sub {my $ofh=select($fh);$|=0;select($ofh); ref_arr( )},
'ref_arr/u' => sub {my $ofh=select($fh);$|=1;select($ofh); ref_arr( )},
'multi /b' => sub {my $ofh=select($fh);$|=0;select($ofh); multi( ) },
'multi /u' => sub {my $ofh=select($fh);$|=1;select($ofh); multi( ) },
'single /b' => sub {my $ofh=select($fh);$|=0;select($ofh); single( ) },
'single /u' => sub {my $ofh=select($fh);$|=1;select($ofh); single( ) },
'concat /b' => sub {my $ofh=select($fh);$|=0;select($ofh); concat( ) },
'concat /u' => sub {my $ofh=select($fh);$|=1;select($ofh); concat( ) },
'join /b' => sub {my $ofh=select($fh);$|=0;select($ofh); my_join( )},
'join /u' => sub {my $ofh=select($fh);$|=1;select($ofh); my_join( )},
});
Under Perl 5.6.0 on Linux, the first set of results, sorted by CPU clocks, is:
Benchmark: timing 100000 iterations of array, concat, multi, ref_array
single: 6 wallclock secs ( 5.42 usr + 0.16 sys = 5.58 CPU)
join: 8 wallclock secs ( 8.63 usr + 0.14 sys = 8.77 CPU)
concat: 12 wallclock secs (10.57 usr + 0.31 sys = 10.88 CPU)
ref_arr: 14 wallclock secs (11.92 usr + 0.13 sys = 12.05 CPU)
array: 15 wallclock secs (12.95 usr + 0.26 sys = 13.21 CPU)
multi: 38 wallclock secs (34.94 usr + 0.25 sys = 35.19 CPU)
Printing a single string is obviously the fastest; join, string concatenation, an array of references to strings, and an array of strings are very close to each other (the results may vary according to the length of the strings); and one print( ) call per string is the slowest.
Now let’s look at the same benchmark, where the printing was either buffered or not:
Benchmark: timing 100000 iterations of
single /b: 10 wallclock secs ( 8.34 usr + 0.23 sys = 8.57 CPU)
single /u: 10 wallclock secs ( 8.57 usr + 0.25 sys = 8.82 CPU)
join /b: 13 wallclock secs (11.49 usr + 0.27 sys = 11.76 CPU)
join /u: 12 wallclock secs (11.80 usr + 0.18 sys = 11.98 CPU)
concat /b: 14 wallclock secs (13.73 usr + 0.17 sys = 13.90 CPU)
concat /u: 16 wallclock secs (13.98 usr + 0.15 sys = 14.13 CPU)
ref_arr/b: 15 wallclock secs (14.95 usr + 0.20 sys = 15.15 CPU)
array /b: 16 wallclock secs (16.06 usr + 0.23 sys = 16.29 CPU)
ref_arr/u: 18 wallclock secs (16.85 usr + 0.98 sys = 17.83 CPU)
array /u: 19 wallclock secs (17.65 usr + 1.06 sys = 18.71 CPU)
multi /b: 41 wallclock secs (37.89 usr + 0.28 sys = 38.17 CPU)
multi /u: 48 wallclock secs (43.24 usr + 1.67 sys = 44.91 CPU)
First, we see the same ranking among the different printing techniques. Second, buffered printing is always faster, but it has a significant speed impact only when print( ) is called for each short string.
Now let’s go back to the $|=1 topic. You might still decide to disable buffering in two situations:
• You use relatively few print( ) calls. You achieve this by arranging for print( ) statements to print multiline text, not one line per print( ) statement.
• You want your users to see output immediately. If you are about to produce the results of a database query that might take some time to complete, you might want users to get some feedback while they are waiting. Ask yourself whether you prefer getting the output a bit slower but steadily from the moment you press the Submit button, or having to watch the “falling stars” for a while and then getting the whole output at once, even if it’s a few milliseconds faster (assuming the browser didn’t time out during the wait).
An even better solution is to keep buffering enabled and call $r->rflush( ) to flush the buffers when needed. This way you can place the first part of the page you are sending in the buffer and flush it a moment before you perform a lengthy operation such as a database query. This kills two birds with one stone: you show some of the data to the user immediately so she will see that something is actually happening, and you don’t suffer from the performance hit caused by disabling buffering.
Here is an example of such code:
use CGI ( );
my $r = shift;
my $q = CGI->new;
print $q->header('text/html');
print $q->start_html;
print $q->p("Searching... Please wait");
$r->rflush;    # send everything printed so far to the client
# imitate a lengthy operation
for (1..5) {
    sleep 1;
}
print $q->p("Done!");
print $q->end_html;
The script prints the beginning of the HTML document along with a polite request to wait, then flushes the output buffer just before it starts the lengthy operation.
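Outside mod_perl (under mod_cgi, for example) there is no $r->rflush( ), but the same keep-buffering-and-flush-on-demand pattern can be sketched with the flush( ) method that the core IO::Handle module adds to file handles. To make the effect observable, this sketch writes to a temporary file (an assumption standing in for the client connection) and reads it back through a second handle; on_disk( ) and the file name are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use IO::Handle;    # core module; adds flush() to file handles

# Hypothetical helper: what a second reader currently sees in the file.
sub on_disk {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    local $/;                          # slurp the whole file
    my $data = <$in>;
    return defined $data ? $data : '';
}

# A temporary file stands in for the connection to the browser.
my $path = File::Spec->catfile(File::Spec->tmpdir, "flush_demo_" . $$ . ".html");
open my $out, '>', $path or die "Cannot write $path: $!";

print {$out} "<p>Searching... Please wait</p>\n";
my $before_flush = on_disk($path);     # still in Perl's buffer, so ''

$out->flush;                           # send the first part out now
my $after_flush = on_disk($path);      # now contains the "please wait" line

sleep 1;                               # imitate a lengthy operation
print {$out} "<p>Done!</p>\n";
close $out;                            # close flushes the rest
my $at_end = on_disk($path);
unlink $path;
```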
Now let’s run the web benchmark and compare the performance of buffered versus unbuffered printing in the multi-printing code used in the last benchmark. We are going to use two otherwise identical handlers, the first of which has its STDOUT stream (tied to the socket) unbuffered. The code appears in Example 13-7.
Example 13-7. Book/UnBuffered.pm
package Book::UnBuffered;
use Apache::Constants qw(:common);
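sub handler {
my $r = shift;
# local is undone when its enclosing scope exits, so set it here
# rather than at file scope, where it would be reset after loading.
local $|=1; # Switch off buffering.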
$r->send_http_header('text/html');
print "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">\n";
print "<html>\n";
print " <head>\n";
print " <title>\n";
print " Test page\n";
print " </title>\n";
print " </head>\n";
print " <body bgcolor=\"black\" text=\"white\">\n";
print " <h1> \n";
print " Test page \n";
print " </h1>\n";
print " <a href=\"foo.html\">foo</a>\n" for 1..100;
print " <hr>\n";