=head1 NAME
X<operator>

perlop - Perl operators and precedence

=head1 DESCRIPTION

=head2 Operator Precedence and Associativity
X<operator, precedence> X<precedence> X<associativity>

Operator precedence and associativity work in Perl more or less like
they do in mathematics.

I<Operator precedence> means some operators are evaluated before
others. For example, in C<2 + 4 * 5>, the multiplication has higher
precedence so C<4 * 5> is evaluated first yielding C<2 + 20 == 22>
and not C<6 * 5 == 30>.

I<Operator associativity> defines what happens if a sequence of the
same operators is used one after another: whether the evaluator will
evaluate the left operations first or the right. For example, in
C<8 - 4 - 2>, subtraction is left associative so Perl evaluates the
expression left to right. C<8 - 4> is evaluated first making the
expression C<4 - 2 == 2> and not C<8 - 2 == 6>.

Perl operators have the following associativity and precedence,
listed from highest precedence to lowest. Operators borrowed from
C keep the same precedence relationship with each other, even where
C's precedence is slightly screwy. (This makes learning Perl easier
for C folks.) With very few exceptions, these all operate on scalar
values only, not array values.

    left        terms and list operators (leftward)
    left        ->
    nonassoc    ++ --
    right       **
    right       ! ~ \ and unary + and -
    left        =~ !~
    left        * / % x
    left        + - .
    left        << >>
    nonassoc    named unary operators
    nonassoc    < > <= >= lt gt le ge
    nonassoc    == != <=> eq ne cmp ~~
    left        &
    left        | ^
    left        &&
    left        || //
    nonassoc    ..  ...
    right       ?:
    right       = += -= *= etc.
    left        , =>
    nonassoc    list operators (rightward)
    right       not
    left        and
    left        or xor

In the following sections, these operators are covered in precedence order.

Many operators can be overloaded for objects. See L<overload>.
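
The following minimal sketch (an illustration, not part of the original
text) checks the precedence and associativity rules above with ordinary
scalar variables. The variable names are arbitrary; the values in the
comments follow from the table, where C<**> is listed as right-associative
and as binding more tightly than unary minus.

```perl
    use strict;
    use warnings;

    # Precedence: * binds tighter than +, so this is 2 + (4 * 5).
    my $sum = 2 + 4 * 5;        # 22, not 30

    # Left associativity: - groups left to right, so this is (8 - 4) - 2.
    my $diff = 8 - 4 - 2;       # 2, not 6

    # Right associativity: ** groups right to left, so this is 2 ** (3 ** 2).
    my $pow = 2 ** 3 ** 2;      # 512, not 64

    # ** binds tighter than unary minus, so this is -(2 ** 4).
    my $neg = -2 ** 4;          # -16, not 16

    print "$sum $diff $pow $neg\n";    # prints "22 2 512 -16"
```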

=head2 Terms and List Operators (Leftward)
X<list operator> X<operator, list> X<term>

TERMs have the highest precedence in Perl. They include variables,
quote and quote-like operators, any expression in parentheses,
and any function whose arguments are parenthesized. Actually, there
aren't really functions in this sense, just list operators and unary
operators behaving as functions because you put parentheses around
the arguments. These are all documented in L<perlfunc>.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.

In the absence of parentheses, the precedence of list operators such as
C<print>, C<sort>, or C<chmod> is either very high or very low depending on
whether you are looking at the left side or the right side of the operator.
For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort,
but the commas on the left are evaluated after. In other words,
list operators tend to gobble up all arguments that follow, and
then act like a simple TERM with regard to the preceding expression.
Be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. The parentheses
enclose the argument list for C<print> which is evaluated (printing
the result of C<$foo & 255>).
Then one is added to the return value
of C<print> (usually 1). The result is something like this:

    1 + 1, "\n";    # Obviously not what you meant.

To do what you meant properly, you must write:

    print(($foo & 255) + 1, "\n");

See L<Named Unary Operators> for more discussion of this.

Also parsed as terms are the C<do {}> and C<eval {}> constructs, as
well as subroutine and method calls, and the anonymous
constructors C<[]> and C<{}>.

See also L<Quote and Quote-like Operators> toward the end of this section,
as well as L</"I/O Operators">.

=head2 The Arrow Operator
X<arrow> X<dereference> X<< -> >>

"C<< -> >>" is an infix dereference operator, just as it is in C
and C++. If the right side is either a C<[...]>, C<{...}>, or a
C<(...)> subscript, then the left side must be either a hard or
symbolic reference to an array, a hash, or a subroutine respectively.
(Or technically speaking, a location capable of holding a hard
reference, if it's an array or hash reference being used for
assignment.) See L<perlreftut> and L<perlref>.

Otherwise, the right side is a method name or a simple scalar
variable containing either the method name or a subroutine reference,
and the left side must be either an object (a blessed reference)
or a class name (that is, a package name). See L<perlobj>.

=head2 Auto-increment and Auto-decrement
X<increment> X<auto-increment> X<++> X<decrement> X<auto-decrement> X<-->

"++" and "--" work as in C. That is, if placed before a variable,
they increment or decrement the variable by one before returning the
value, and if placed after, increment or decrement after returning the
value.

    $i = 0;  $j = 0;
    print $i++;  # prints 0
    print ++$j;  # prints 1

Note that just as in C, Perl doesn't define B<when> the variable is
incremented or decremented.
You just know it will be done sometime
before or after the value is returned. This also means that modifying
a variable twice in the same statement will lead to undefined behaviour.
Avoid statements like:

    $i = $i ++;
    print ++ $i + $i ++;

Perl will not guarantee what the result of the above statements is.

The auto-increment operator has a little extra builtin magic to it. If
you increment a variable that is numeric, or that has ever been used in
a numeric context, you get a normal increment. If, however, the
variable has been used in only string contexts since it was set, and
has a value that is not the empty string and matches the pattern
C</^[a-zA-Z]*[0-9]*\z/>, the increment is done as a string, preserving each
character within its range, with carry:

    print ++($foo = '99');  # prints '100'
    print ++($foo = 'a0');  # prints 'a1'
    print ++($foo = 'Az');  # prints 'Ba'
    print ++($foo = 'zz');  # prints 'aaa'

C<undef> is always treated as numeric, and in particular is changed
to C<0> before incrementing (so that a post-increment of an undef value
will return C<0> rather than C<undef>).

The auto-decrement operator is not magical.

=head2 Exponentiation
X<**> X<exponentiation> X<power>

Binary "**" is the exponentiation operator. It binds even more
tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (This is
implemented using C's pow(3) function, which actually works on doubles
internally.)

=head2 Symbolic Unary Operators
X<unary operator> X<operator, unary>

Unary "!" performs logical negation, i.e., "not". See also C<not> for a lower
precedence version of this.
X<!>

Unary "-" performs arithmetic negation if the operand is numeric. If
the operand is an identifier, a string consisting of a minus sign
concatenated with the identifier is returned.
Otherwise, if the string
starts with a plus or minus, a string starting with the opposite sign
is returned. One effect of these rules is that -bareword is equivalent
to the string "-bareword". If, however, the string begins with a
non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
the string to a number and the arithmetic negation is performed. If the
string cannot be cleanly converted to a number, Perl will give the warning
B<Argument "the string" isn't numeric in negation (-) at ...>.
X<-> X<negation, arithmetic>

Unary "~" performs bitwise negation, i.e., 1's complement. For
example, C<0666 & ~027> is 0640. (See also L<Integer Arithmetic> and
L<Bitwise String Operators>.) Note that the width of the result is
platform-dependent: ~0 is 32 bits wide on a 32-bit platform, but 64
bits wide on a 64-bit platform, so if you are expecting a certain bit
width, remember to use the & operator to mask off the excess bits.
X<~> X<negation, binary>

Unary "+" has no effect whatsoever, even on strings. It is useful
syntactically for separating a function name from a parenthesized expression
that would otherwise be interpreted as the complete list of function
arguments. (See examples above under L<Terms and List Operators (Leftward)>.)
X<+>

Unary "\" creates a reference to whatever follows it. See L<perlreftut>
and L<perlref>. Do not confuse this behavior with the behavior of
backslash within a string, although both forms do convey the notion
of protecting the next thing from interpolation.
X<\> X<reference> X<backslash>

=head2 Binding Operators
X<binding> X<operator, binding> X<=~> X<!~>

Binary "=~" binds a scalar expression to a pattern match. Certain operations
search or modify the string $_ by default. This operator makes that kind
of operation work on some other string.
The right argument is a search
pattern, substitution, or transliteration. The left argument is what is
supposed to be searched, substituted, or transliterated instead of the default
$_. When used in scalar context, the return value generally indicates the
success of the operation. Behavior in list context depends on the particular
operator. See L</"Regexp Quote-Like Operators"> for details and
L<perlretut> for examples using these operators.

If the right argument is an expression rather than a search pattern,
substitution, or transliteration, it is interpreted as a search pattern at run
time. Note that this means that its contents will be interpolated twice, so

    '\\' =~ q'\\';

is not ok, as the regex engine will end up trying to compile the
pattern C<\>, which it will consider a syntax error.

Binary "!~" is just like "=~" except the return value is negated in
the logical sense.

=head2 Multiplicative Operators
X<operator, multiplicative>

Binary "*" multiplies two numbers.
X<*>

Binary "/" divides two numbers.
X</> X<slash>

Binary "%" computes the modulus of two numbers. Given integer
operands C<$a> and C<$b>: If C<$b> is positive, then C<$a % $b> is
C<$a> minus the largest multiple of C<$b> that is not greater than
C<$a>. If C<$b> is negative, then C<$a % $b> is C<$a> minus the
smallest multiple of C<$b> that is not less than C<$a> (i.e. the
result will be less than or equal to zero). If the operands
C<$a> and C<$b> are floating point values and the absolute value of
C<$b> (that is C<abs($b)>) is less than C<(UV_MAX + 1)>, only
the integer portion of C<$a> and C<$b> will be used in the operation
(Note: here C<UV_MAX> means the maximum of the unsigned integer type).
If the absolute value of the right operand (C<abs($b)>) is greater than
or equal to C<(UV_MAX + 1)>, "%" computes the floating-point remainder
C<$r> in the equation C<($r = $a - $i*$b)> where C<$i> is a certain
integer that makes C<$r> have the same sign as the right operand
C<$b> (B<not> the sign of the left operand C<$a>, as with the C function
C<fmod()>) and an absolute value less than that of C<$b>.
Note that when C<use integer> is in scope, "%" gives you direct access
to the modulus operator as implemented by your C compiler. This
operator is not as well defined for negative operands, but it will
execute faster.
X<%> X<remainder> X<modulus> X<mod>

Binary "x" is the repetition operator. In scalar context or if the left
operand is not enclosed in parentheses, it returns a string consisting
of the left operand repeated the number of times specified by the right
operand. In list context, if the left operand is enclosed in
parentheses or is a list formed by C<qw/STRING/>, it repeats the list.
If the right operand is zero or negative, it returns an empty string
or an empty list, depending on the context.
X<x>

    print '-' x 80;     # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);  # tab over

    @ones = (1) x 80;       # a list of 80 1's
    @ones = (5) x @ones;    # set all elements to 5

=head2 Additive Operators
X<operator, additive>

Binary "+" returns the sum of two numbers.
X<+>

Binary "-" returns the difference of two numbers.
X<->

Binary "." concatenates two strings.
X<string, concatenation> X<concatenation>
X<cat> X<concat> X<concatenate> X<.>

=head2 Shift Operators
X<shift operator> X<operator, shift> X<<< << >>>
X<<< >> >>> X<right shift> X<left shift> X<bitwise shift>
X<shl> X<shr> X<shift, right> X<shift, left>

Binary "<<" returns the value of its left argument shifted left by the
number of bits specified by the right argument. Arguments should be
integers. (See also L<Integer Arithmetic>.)

Binary ">>" returns the value of its left argument shifted right by
the number of bits specified by the right argument. Arguments should
be integers. (See also L<Integer Arithmetic>.)

Note that both "<<" and ">>" in Perl are implemented directly using
"<<" and ">>" in C. If C<use integer> (see L<Integer Arithmetic>) is
in force then signed C integers are used, else unsigned C integers are
used. Either way, the implementation isn't going to generate results
larger than the size of the integer type Perl was built with (32 bits
or 64 bits).

The result of overflowing the range of the integers is undefined
because it is also undefined in C. In other words, using 32-bit
integers, C<< 1 << 32 >> is undefined. Shifting by a negative number
of bits is also undefined.

=head2 Named Unary Operators
X<operator, named unary>

The various named unary operators are treated as functions with one
argument, with optional parentheses.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.)
is followed by a left parenthesis as the next token, the operator and
arguments within parentheses are taken to be of highest precedence,
just like a normal function call.
For example,
because named unary operators have higher precedence than ||:

    chdir $foo    || die;   # (chdir $foo) || die
    chdir($foo)   || die;   # (chdir $foo) || die
    chdir ($foo)  || die;   # (chdir $foo) || die
    chdir +($foo) || die;   # (chdir $foo) || die

but, because * has higher precedence than named unary operators:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

Regarding precedence, the filetest operators, like C<-f>, C<-M>, etc., are
treated like named unary operators, but they don't follow this functional
parenthesis rule. That means, for example, that C<-f($file).".bak"> is
equivalent to C<-f "$file.bak">.
X<-X> X<filetest> X<operator, filetest>

See also L<"Terms and List Operators (Leftward)">.

=head2 Relational Operators
X<relational operator> X<operator, relational>

Binary "<" returns true if the left argument is numerically less than
the right argument.
X<< < >>

Binary ">" returns true if the left argument is numerically greater
than the right argument.
X<< > >>

Binary "<=" returns true if the left argument is numerically less than
or equal to the right argument.
X<< <= >>

Binary ">=" returns true if the left argument is numerically greater
than or equal to the right argument.
X<< >= >>

Binary "lt" returns true if the left argument is stringwise less than
the right argument.
X<< lt >>

Binary "gt" returns true if the left argument is stringwise greater
than the right argument.
X<< gt >>

Binary "le" returns true if the left argument is stringwise less than
or equal to the right argument.
X<< le >>

Binary "ge" returns true if the left argument is stringwise greater
than or equal to the right argument.
X<< ge >>

=head2 Equality Operators
X<equality> X<equal> X<equals> X<operator, equality>

Binary "==" returns true if the left argument is numerically equal to
the right argument.
X<==>

Binary "!=" returns true if the left argument is numerically not equal
to the right argument.
X<!=>

Binary "<=>" returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater than the right
argument. If your platform supports NaNs (not-a-numbers) as numeric
values, using them with "<=>" returns undef. NaN is not "<", "==", ">",
"<=" or ">=" anything (even NaN), so those 5 return false. NaN != NaN
returns true, as does NaN != anything else. If your platform doesn't
support NaNs then NaN is just a string with numeric value 0.
X<< <=> >> X<spaceship>

    perl -le '$a = "NaN"; print "No NaN support here" if $a == $a'
    perl -le '$a = "NaN"; print "NaN support here" if $a != $a'

Binary "eq" returns true if the left argument is stringwise equal to
the right argument.
X<eq>

Binary "ne" returns true if the left argument is stringwise not equal
to the right argument.
X<ne>

Binary "cmp" returns -1, 0, or 1 depending on whether the left
argument is stringwise less than, equal to, or greater than the right
argument.
X<cmp>

Binary "~~" does a smart match between its arguments. Smart matching
is described in L<perlsyn/"Smart matching in detail">.
This operator is only available if you enable the "~~" feature:
see L<feature> for more information.
X<~~>

"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified
by the current locale if C<use locale> is in effect. See L<perllocale>.
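
As an illustration (a sketch, not taken from the original text), the
numeric and string comparison operators can order or equate the same
values differently. The sample list is arbitrary, and the string ordering
shown assumes C<use locale> is I<not> in effect, so C<cmp> compares
characters by code point.

```perl
    use strict;
    use warnings;

    my @values = (10, 9, 100, 2);

    # <=> compares numerically.
    my @numeric = sort { $a <=> $b } @values;    # (2, 9, 10, 100)

    # cmp compares stringwise, character by character.
    my @string = sort { $a cmp $b } @values;     # (10, 100, 2, 9)

    # <=> returns -1, 0 or 1.
    my @signs = (1 <=> 2, 2 <=> 2, 3 <=> 2);     # (-1, 0, 1)

    # "1.0" and "1" are numerically equal but are different strings.
    my $num_equal = ("1.0" == "1") ? 1 : 0;      # 1
    my $str_equal = ("1.0" eq "1") ? 1 : 0;      # 0

    print "@numeric | @string | @signs\n";
```

The same C<sort { $a E<lt>=E<gt> $b }> versus C<sort { $a cmp $b }> choice
is the usual way to pick numeric or string ordering for C<sort>.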

=head2 Bitwise And
X<operator, bitwise, and> X<bitwise and> X<&>

Binary "&" returns its operands ANDed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "&" has lower precedence than the relational and equality
operators, so for example the parentheses are essential in a test like

    print "Even\n" if ($x & 1) == 0;

=head2 Bitwise Or and Exclusive Or
X<operator, bitwise, or> X<bitwise or> X<|> X<operator, bitwise, xor>
X<bitwise xor> X<^>

Binary "|" returns its operands ORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Binary "^" returns its operands XORed together bit by bit.
(See also L<Integer Arithmetic> and L<Bitwise String Operators>.)

Note that "|" and "^" have lower precedence than the relational and
equality operators, so for example the parentheses are essential in a
test like

    print "false\n" if (8 | 2) != 10;

=head2 C-style Logical And
X<&&> X<logical and> X<operator, logical, and>

Binary "&&" performs a short-circuit logical AND operation. That is,
if the left operand is false, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Or
X<||> X<operator, logical, or>

Binary "||" performs a short-circuit logical OR operation. That is,
if the left operand is true, the right operand is not even evaluated.
Scalar or list context propagates down to the right operand if it
is evaluated.

=head2 C-style Logical Defined-Or
X<//> X<operator, logical, defined-or>

Although it has no direct equivalent in C, Perl's C<//> operator is related
to its C-style or. In fact, it's exactly the same as C<||>, except that it
tests the left hand side's definedness instead of its truth.
Thus, C<$a // $b>
is similar to C<defined($a) || $b> (except that it returns the value of C<$a>
rather than the value of C<defined($a)>) and is exactly equivalent to
C<defined($a) ? $a : $b>. This is very useful for providing default values
for variables. If you actually want to test if at least one of C<$a> and
C<$b> is defined, use C<defined($a // $b)>.

The C<||>, C<//> and C<&&> operators return the last value evaluated
(unlike C's C<||> and C<&&>, which return 0 or 1). Thus, a reasonably
portable way to find out the home directory might be:

    $home = $ENV{'HOME'} // $ENV{'LOGDIR'} //
        (getpwuid($<))[7] // die "You're homeless!\n";

Note, however, that the left operand of C<||> is evaluated in boolean
(scalar) context, so you shouldn't use it for selecting between two
aggregates for assignment:

    @a = @b || @c;              # this is wrong
    @a = scalar(@b) || @c;      # really meant this
    @a = @b ? @b : @c;          # this works fine, though

As more readable alternatives to C<&&> and C<||> when used for
control flow, Perl provides the C<and> and C<or> operators (see below).
The short-circuit behavior is identical. The precedence of "and"
and "or" is much lower, however, so that you can safely use them after a
list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);

Using "or" for assignment is unlikely to do what you want; see below.

=head2 Range Operators
X<operator, range> X<range> X<..> X<...>

Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a
list of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list.
The range operator is useful for writing
C<foreach (1..10)> loops and for doing slice operations on arrays. In
the current implementation, no temporary array is created when the
range operator is used as the expression in C<foreach> loops, but older
versions of Perl might burn a lot of memory when you write something
like this:

    for (1 .. 1_000_000) {
        # code
    }

The range operator also works on strings, using the magical
auto-increment; see below.

In scalar context, ".." returns a boolean value. The operator is
bistable, like a flip-flop, and emulates the line-range (comma) operator
of B<sed>, B<awk>, and various editors. Each ".." operator maintains its
own boolean state. It is false as long as its left operand is false.
Once the left operand is true, the range operator stays true until the
right operand is true, I<AFTER> which the range operator becomes false
again. It doesn't become false till the next time the range operator is
evaluated. It can test the right operand and become false on the same
evaluation it became true (as in B<awk>), but it still returns true once.
If you don't want it to test the right operand till the next
evaluation, as in B<sed>, just use three dots ("...") instead of
two. In all other regards, "..." behaves just like ".." does.

The right operand is not evaluated while the operator is in the
"false" state, and the left operand is not evaluated while the
operator is in the "true" state. The precedence is a little lower
than || and &&. The value returned is either the empty string for
false, or a sequence number (beginning with 1) for true. The
sequence number is reset for each range encountered. The final
sequence number in a range has the string "E0" appended to it, which
doesn't affect its numeric value, but gives you something to search
for if you want to exclude the endpoint.
You can exclude the
beginning point by waiting for the sequence number to be greater
than 1.

If either operand of scalar ".." is a constant expression,
that operand is considered true if it is equal (C<==>) to the current
input line number (the C<$.> variable).

To be pedantic, the comparison is actually C<int(EXPR) == int(EXPR)>,
but that is only an issue if you use a floating point expression; when
implicitly using C<$.> as described in the previous paragraph, the
comparison is C<int(EXPR) == int($.)> which is only an issue when C<$.>
is set to a floating point value and you are not reading from a file.
Furthermore, C<"span" .. "spat"> or C<2.18 .. 3.14> will not do what
you want in scalar context because each operand is evaluated
using its integer representation.

Examples:

As a scalar operator:

    if (101 .. 200) { print; } # print 2nd hundred lines, short for
                               #   if ($. == 101 .. $. == 200) ...

    next LINE if (1 .. /^$/);  # skip header lines, short for
                               #   ... if ($. == 1 .. /^$/);
                               # (typically in a loop labeled LINE)

    s/^/> / if (/^$/ .. eof());  # quote body

    # parse mail messages
    while (<>) {
        $in_header =   1  .. /^$/;
        $in_body   = /^$/ .. eof;
        if ($in_header) {
            # ...
        } else { # in body
            # ...
        }
    } continue {
        close ARGV if eof;             # reset $. each file
    }

Here's a simple example to illustrate the difference between
the two range operators:

    @lines = (" - Foo",
              "01 - Bar",
              "1 - Baz",
              " - Quux");

    foreach (@lines) {
        if (/0/ .. /1/) {
            print "$_\n";
        }
    }

This program will print only the line containing "Bar". If
the range operator is changed to C<...>, it will also print the
"Baz" line.

And now some examples as a list operator:

    for (101 .. 200) { print; }     # print $_ 100 times
    @foo = @foo[0 .. $#foo];        # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];  # slice last 5 items

The range operator (in list context) makes use of the magical
auto-increment algorithm if the operands are strings. You
can say

    @alphabet = ('A' .. 'Z');

to get all normal letters of the English alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros.

If the final value specified is not in the sequence that the magical
increment would produce, the sequence goes until the next value would
be longer than the final value specified.

If the initial value specified isn't part of a magical increment
sequence (that is, a non-empty string matching "/^[a-zA-Z]*[0-9]*\z/"),
only the initial value will be returned. So the following will only
return an alpha:

    use charnames 'greek';
    my @greek_small = ("\N{alpha}" .. "\N{omega}");

To get lower-case greek letters, use this instead:

    my @greek_small = map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );

Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
return two elements in list context.

    @list = (2.18 .. 3.14); # same as @list = (2 .. 3);

=head2 Conditional Operator
X<operator, conditional> X<operator, ternary> X<ternary> X<?:>

Ternary "?:" is the conditional operator, just as in C. It works much
like an if-then-else. If the argument before the ? is true, the
argument before the : is returned, otherwise the argument after the :
is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd
or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are
legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

Because this operator produces an assignable result, using assignments
without parentheses will get you in trouble. For example, this:

    $a % 2 ? $a += 10 : $a += 2

really means this:

    (($a % 2) ? ($a += 10) : $a) += 2

rather than this:

    ($a % 2) ? ($a += 10) : ($a += 2)

That should probably be written more simply as:

    $a += ($a % 2) ? 10 : 2;

=head2 Assignment Operators
X<assignment> X<operator, assignment> X<=> X<**=> X<+=> X<*=> X<&=>
X<<< <<= >>> X<&&=> X<-=> X</=> X<|=> X<<< >>= >>> X<||=> X<//=> X<.=>
X<%=> X<^=> X<x=>

"=" is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue
might trigger, such as from tie(). Other assignment operators work similarly.
The following are recognized:

    **=    +=    *=    &=    <<=    &&=
           -=    /=    |=    >>=    ||=
           .=    %=    ^=           //=
                 x=

Although these are grouped by family, they all have the precedence
of assignment.

Unlike in C, the scalar assignment operator produces a valid lvalue.
Modifying an assignment is equivalent to doing the assignment and
then modifying the variable that was assigned to.
This is useful
for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;

Similarly, a list assignment in list context produces the list of
lvalues assigned to, and a list assignment in scalar context returns
the number of elements produced by the expression on the right hand
side of the assignment.

=head2 Comma Operator
X<comma> X<operator, comma> X<,>

Binary "," is the comma operator. In scalar context it evaluates
its left argument, throws that value away, then evaluates its right
argument and returns that value. This is just like C's comma operator.

In list context, it's just the list argument separator, and inserts
both its arguments into the list. These arguments are also evaluated
from left to right.

The C<< => >> operator is a synonym for the comma, but forces any word
(consisting entirely of word characters) to its left to be interpreted
as a string (as of 5.001). This includes words that might otherwise be
considered a constant or function call.

    use constant FOO => "something";

    my %h = ( FOO => 23 );

is equivalent to:

    my %h = ("FOO", 23);

It is I<NOT>:

    my %h = ("something", 23);

If the argument on the left is not a word, it is first interpreted as
an expression, and then the string value of that is used.

The C<< => >> operator is helpful in documenting the correspondence
between keys and values in hashes, and other paired elements in lists.

        %hash = ( $key => $value );
        login( $username => $password );

=head2 List Operators (Rightward)
X<operator, list, rightward> X<list operator>

On its right side, a list operator has very low precedence,
such that it controls all comma-separated expressions found there.
The only operators with lower precedence are the logical operators
"and", "or", and "not", which may be used to evaluate calls to list
operators without the need for extra parentheses:

    open HANDLE, "filename"
        or die "Can't open: $!\n";

See also discussion of list operators in L<Terms and List Operators (Leftward)>.

=head2 Logical Not
X<operator, logical, not> X<not>

Unary "not" returns the logical negation of the expression to its right.
It's the equivalent of "!" except for the very low precedence.

=head2 Logical And
X<operator, logical, and> X<and>

Binary "and" returns the logical conjunction of the two surrounding
expressions. It's equivalent to && except for the very low
precedence. Like &&, it short-circuits: the right expression is
evaluated only if the left expression is true.

=head2 Logical or, Defined or, and Exclusive Or
X<operator, logical, or> X<operator, logical, xor>
X<operator, logical, defined or> X<operator, logical, exclusive or>
X<or> X<xor>

Binary "or" returns the logical disjunction of the two surrounding
expressions. It's equivalent to || except for the very low precedence.
This makes it useful for control flow:

    print FH $data or die "Can't write to FH: $!";

Like ||, it short-circuits: the right expression is evaluated
only if the left expression is false. Due to its precedence, you should
probably avoid using this for assignment, only for control flow.

    $a = $b or $c;      # bug: this is wrong
    ($a = $b) or $c;    # really means this
    $a = $b || $c;      # better written this way

However, when it's a list-context assignment and you're trying to use
"||" for control flow, you probably need "or" so that the assignment
takes higher precedence.

    @info = stat($file) || die;     # oops, scalar sense of stat!
    @info = stat($file) or die;     # better, now @info gets its due

Then again, you could always use parentheses.

Binary "xor" returns the exclusive-OR of the two surrounding expressions.
It cannot short circuit, of course.

=head2 C Operators Missing From Perl
X<operator, missing from perl> X<&> X<*>
X<typecasting> X<(TYPE)>

Here is what C has that Perl doesn't:

=over 8

=item unary &

Address-of operator. (But see the "\" operator for taking a reference.)

=item unary *

Dereference-address operator. (Perl's prefix dereferencing
operators are typed: $, @, %, and &.)

=item (TYPE)

Type-casting operator.

=back

=head2 Quote and Quote-like Operators
X<operator, quote> X<operator, quote-like> X<q> X<qq> X<qx> X<qw> X<m>
X<qr> X<s> X<tr> X<'> X<''> X<"> X<""> X<//> X<`> X<``> X<<< << >>>
X<escape sequence> X<escape>

While we usually think of quotes as literal values, in Perl they
function as operators, providing various kinds of interpolating and
pattern matching capabilities. Perl provides customary quote characters
for these behaviors, but also provides a way for you to choose your
quote character for any of them. In the following table, a C<{}> represents
any pair of delimiters you choose.

    Customary  Generic        Meaning        Interpolates
        ''       q{}          Literal             no
        ""      qq{}          Literal             yes
        ``      qx{}          Command             yes*
                qw{}         Word list            no
        //       m{}       Pattern match          yes*
                qr{}          Pattern             yes*
                 s{}{}      Substitution          yes*
                tr{}{}    Transliteration         no (but see below)
        <<EOF                 here-doc            yes*

        * unless the delimiter is ''.
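
As a quick illustration of the table, the following sketch builds the same
strings with customary and generic quotes; the variable names are arbitrary.
Only the double-quote forms interpolate C<$name>, and C<qw> splits its
contents on whitespace without interpolating anything.

```perl
use strict;
use warnings;

my $name = 'world';

my @quoted = (
    'Hello, $name',     # customary single quotes: no interpolation
    q{Hello, $name},    # generic form, braces as delimiters: same result
    "Hello, $name",     # customary double quotes: $name is interpolated
    qq[Hello, $name],   # generic form, brackets as delimiters: same result
);

# Word list: split on whitespace, never interpolated.
my @words = qw(alpha $beta gamma);

print "$_\n" for @quoted;
print join('|', @words), "\n";
```

Any bracketing pair would work in place of the braces and square brackets
above; they were chosen only to show that the delimiter is up to you.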

Non-bracketing delimiters use the same character fore and aft, but the four
sorts of brackets (round, angle, square, curly) will all nest, which means
that

    q{foo{bar}baz}

is the same as

    'foo{bar}baz'

Note, however, that this does not always work for quoting Perl code:

    $s = q{ if($a eq "}") ... }; # WRONG

is a syntax error. The C<Text::Balanced> module (available from CPAN, and
part of the standard distribution since Perl 5.8) is able to do this
properly.

There can be whitespace between the operator and the quoting
characters, except when C<#> is being used as the quoting character.
C<q#foo#> is parsed as the string C<foo>, while C<q #foo#> is the
operator C<q> followed by a comment. Its argument will be taken
from the next line. This allows you to write:

    s {foo}  # Replace foo
      {bar}  # with bar.

The following escape sequences are available in constructs that interpolate
and in transliterations.
X<\t> X<\n> X<\r> X<\f> X<\b> X<\a> X<\e> X<\x> X<\0> X<\c> X<\N>

    \t          tab             (HT, TAB)
    \n          newline         (NL)
    \r          return          (CR)
    \f          form feed       (FF)
    \b          backspace       (BS)
    \a          alarm (bell)    (BEL)
    \e          escape          (ESC)
    \033        octal char      (example: ESC)
    \x1b        hex char        (example: ESC)
    \x{263a}    wide hex char   (example: SMILEY)
    \c[         control char    (example: ESC)
    \N{name}    named Unicode character

The character following C<\c> is mapped to some other character by
converting letters to upper case and then (on ASCII systems) by inverting
the 7th bit (0x40). The most interesting range is from '@' to '_'
(0x40 through 0x5F), resulting in a control character from 0x00
through 0x1F. A '?' maps to the DEL character. On EBCDIC systems only
'@', the letters, '[', '\', ']', '^', '_' and '?' will work, resulting
in 0x00 through 0x1F and 0x7F.
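
One way to check the C<\c> mapping is to print the code points it produces.
This sketch assumes an ASCII platform; on EBCDIC systems the values differ,
and only the characters listed above are valid there.

```perl
use strict;
use warnings;

# Each pair is (escape as written, the character it produces).
# On ASCII: letters are upper-cased first, then bit 0x40 is flipped.
my @cases = (
    [ '\cA', "\cA" ],   # 'A' (0x41) ^ 0x40 = 0x01
    [ '\ca', "\ca" ],   # lower case is upper-cased first: also 0x01
    [ '\c[', "\c[" ],   # '[' (0x5B) ^ 0x40 = 0x1B, ESC
    [ '\ck', "\ck" ],   # 'K' (0x4B) ^ 0x40 = 0x0B, vertical tab
    [ '\c?', "\c?" ],   # '?' (0x3F) ^ 0x40 = 0x7F, DEL
);

for my $case (@cases) {
    my ($escape, $char) = @$case;
    printf "%-4s => 0x%02X\n", $escape, ord $char;
}
```

The C<\ck> line is the vertical-tab substitute mentioned in the note that
follows.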

B<NOTE>: Unlike C and other languages, Perl has no C<\v> escape sequence for
the vertical tab (VT - ASCII 11), but you may use C<\ck> or C<\x0b>.

The following escape sequences are available in constructs that interpolate
but not in transliterations.
X<\l> X<\u> X<\L> X<\U> X<\E> X<\Q>

    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote non-word characters till \E

If C<use locale> is in effect, the case map used by C<\l>, C<\L>,
C<\u> and C<\U> is taken from the current locale. See L<perllocale>.
If Unicode (for example, C<\N{}> or wide hex characters of 0x100 or
beyond) is being used, the case map used by C<\l>, C<\L>, C<\u> and
C<\U> is as defined by Unicode. For documentation of C<\N{name}>,
see L<charnames>.

All systems use the virtual C<"\n"> to represent a line terminator,
called a "newline". There is no such thing as an unvarying, physical
newline character. It is only an illusion that the operating system,
device drivers, C libraries, and Perl all conspire to preserve. Not all
systems read C<"\r"> as ASCII CR and C<"\n"> as ASCII LF. For example,
on classic Mac OS (before Mac OS X) these are reversed, and on systems
without a line terminator, printing C<"\n"> may emit no actual data. In
general, use C<"\n"> when you mean a "newline" for your system, but use
the literal ASCII when you need an exact character. For example, most
networking protocols expect and prefer a CR+LF (C<"\015\012"> or
C<"\cM\cJ">) for line terminators, and although they often accept just
C<"\012">, they seldom tolerate just C<"\015">. If you get in the habit
of using C<"\n"> for networking, you may be burned some day.
X<newline> X<line terminator> X<eol> X<end of line>
X<\n> X<\r> X<\r\n>

For constructs that do interpolate, variables beginning with "C<$>"
or "C<@>" are interpolated. Subscripted variables such as C<$a[3]> or
C<< $href->{key}[0] >> are also interpolated, as are array and hash slices.
But method calls such as C<< $obj->meth >> are not.

Interpolating an array or slice interpolates the elements in order,
separated by the value of C<$">, so is equivalent to interpolating
C<join $", @array>. "Punctuation" arrays such as C<@*> are only
interpolated if the name is enclosed in braces C<@{*}>, but special
arrays C<@_>, C<@+>, and C<@-> are interpolated, even without braces.

You cannot include a literal C<$> or C<@> within a C<\Q> sequence.
An unescaped C<$> or C<@> interpolates the corresponding variable,
while escaping will cause the literal string C<\$> to be inserted.
You'll need to write something like C<m/\Quser\E\@\Qhost/>.

Patterns are subject to an additional level of interpretation as a
regular expression. This is done as a second pass, after variables are
interpolated, so that regular expressions may be incorporated into the
pattern from the variables. If this is not what you want, use C<\Q> to
interpolate a variable literally.

Apart from the behavior described above, Perl does not expand
multiple levels of interpolation. In particular, contrary to the
expectations of shell programmers, back-quotes do I<NOT> interpolate
within double quotes, nor do single quotes impede evaluation of
variables when used within double quotes.

=head2 Regexp Quote-Like Operators
X<operator, regexp>

Here are the quote-like operators that apply to pattern
matching and related activities.

=over 8

=item qr/STRING/msixpo
X<qr> X</i> X</m> X</o> X</s> X</x> X</p>

This operator quotes (and possibly compiles) its I<STRING> as a regular
expression. I<STRING> is interpolated the same way as I<PATTERN>
in C<m/PATTERN/>. If "'" is used as the delimiter, no interpolation
is done. Returns a Perl value which may be used instead of the
corresponding C</STRING/msixpo> expression. The returned value is a
normalized version of the original pattern. It magically differs from
a string containing the same characters: C<ref(qr/x/)> returns "Regexp",
even though dereferencing the result returns undef.

For example,

    $rex = qr/my.STRING/is;
    print $rex;                 # prints (?si-xm:my.STRING)
    s/$rex/foo/;

is equivalent to

    s/my.STRING/foo/is;

The result may be used as a subpattern in a match:

    $re = qr/$pattern/;
    $string =~ /foo${re}bar/;   # can be interpolated in other patterns
    $string =~ $re;             # or used standalone
    $string =~ /$re/;           # or this way

Since Perl may compile the pattern at the moment of execution of the qr()
operator, using qr() may have speed advantages in some situations,
notably if the result of qr() is used standalone:

    sub match {
        my $patterns = shift;
        my @compiled = map qr/$_/i, @$patterns;
        grep {
            my $success = 0;
            foreach my $pat (@compiled) {
                $success = 1, last if /$pat/;
            }
            $success;
        } @_;
    }

Precompilation of the pattern into an internal representation at
the moment of qr() avoids a need to recompile the pattern every
time a match C</$pat/> is attempted. (Perl has many other internal
optimizations, but none would be triggered in the above example if
we did not use the qr() operator.)

Options are:

    m   Treat string as multiple lines.
    s   Treat string as single line. (Make . match a newline)
    i   Do case-insensitive pattern matching.
    x   Use extended regular expressions.
    p   When matching preserve a copy of the matched string so
        that ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o   Compile pattern only once.

If a precompiled pattern is embedded in a larger pattern then the effect
of 'msixp' will be propagated appropriately. The effect of the 'o'
modifier is not propagated, being restricted to those patterns
explicitly using it.

See L<perlre> for additional information on valid syntax for STRING, and
for a detailed look at the semantics of regular expressions.

=item m/PATTERN/msixpogc
X<m> X<operator, match>
X<regexp, options> X<regexp> X<regex, options> X<regex>
X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c>

=item /PATTERN/msixpogc

Searches a string for a pattern match, and in scalar context returns
true if it succeeds, false if it fails. If no string is specified
via the C<=~> or C<!~> operator, the $_ string is searched. (The
string specified with C<=~> need not be an lvalue--it may be the
result of an expression evaluation, but remember the C<=~> binds
rather tightly.) See also L<perlre>. See L<perllocale> for
discussion of additional considerations that apply when C<use locale>
is in effect.

Options are as described in C<qr//>; in addition, the following match
process modifiers are available:

    g   Match globally, i.e., find all occurrences.
    c   Do not reset search position on a failed match when /g is in effect.

If "/" is the delimiter then the initial C<m> is optional. With the C<m>
you can use any pair of non-alphanumeric, non-whitespace characters
as delimiters. This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome). If "?"
is the delimiter, then the match-only-once rule of C<?PATTERN?> applies.
If "'" is the delimiter, no interpolation is performed on the PATTERN.

PATTERN may contain variables, which will be interpolated (and the
pattern recompiled) every time the pattern search is evaluated, except
for when the delimiter is a single quote. (Note that C<$(>, C<$)>, and
C<$|> are not interpolated because they look like end-of-string tests.)
If you want such a pattern to be compiled only once, add a C</o> after
the trailing delimiter. This avoids expensive run-time recompilations,
and is useful when the value you are interpolating won't change over
the life of the script. However, mentioning C</o> constitutes a promise
that you won't change the variables in the pattern. If you change them,
Perl won't even notice. See also L<"qr/STRING/msixpo">.

If the PATTERN evaluates to the empty string, the last
I<successfully> matched regular expression is used instead. In this
case, only the C<g> and C<c> flags on the empty pattern are honoured;
the other flags are taken from the original pattern. If no match has
previously succeeded, this will (silently) act instead as a genuine
empty pattern (which will always match).

Note that it's possible to confuse Perl into thinking C<//> (the empty
regex) is really C<//> (the defined-or operator). Perl is usually pretty
good about this, but some pathological cases might trigger this, such as
C<$a///> (is that C<($a) / (//)> or C<$a // />?) and C<print $fh //>
(C<print $fh(//> or C<print($fh //>?). In all of these examples, Perl
will assume you meant defined-or. If you meant the empty regex, just
use parentheses or spaces to disambiguate, or even prefix the empty
regex with an C<m> (so C<//> becomes C<m//>).
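
The delimiter rules above can be checked with a short sketch. The path and
the C<user@host> string are made-up inputs. The sketch contrasts a
slash-delimited pattern full of escaped slashes with the same pattern written
using C<m{}>. It also shows that, with C<'> as the delimiter, an C<@> is
matched literally instead of starting an array interpolation.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

# Leaning toothpicks: with "/" as the delimiter, every "/" in the
# pattern must be escaped.
my $toothpicks = $path =~ /^\/usr\/local\//;

# Same test with braces as delimiters: no escaping needed.
my $braces = $path =~ m{^/usr/local/};

# With "'" as the delimiter nothing is interpolated, so "@host" is
# matched literally instead of being treated as an array.
my $address = 'mail user@host now';
my $literal = $address =~ m'user@host';

# With any other delimiter, $user is interpolated; the "@" is escaped
# so that it does not start an array name.
my $user   = 'user';
my $interp = $address =~ m/\b$user\@host\b/;

print "toothpicks=$toothpicks braces=$braces ",
      "literal=$literal interp=$interp\n";
```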

If the C</g> option is not used, C<m//> in list context returns a
list consisting of the subexpressions matched by the parentheses in the
pattern, i.e., (C<$1>, C<$2>, C<$3>...). (Note that here C<$1> etc. are
also set, and that this differs from Perl 4's behavior.) When there are
no parentheses in the pattern, the return value is the list C<(1)> for
success. With or without parentheses, an empty list is returned upon
failure.

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the
remainder of the line, and assigns those three fields to $F1, $F2, and
$Etc. The conditional is true if any variables were assigned, i.e., if
the pattern matched.

The C</g> modifier specifies global pattern matching--that is,
matching as many times as possible within the string. How it behaves
depends on the context. In list context, it returns a list of the
substrings matched by any capturing parentheses in the regular
expression. If there are no parentheses, it returns a list of all
the matched strings, as if there were parentheses around the whole
pattern.

In scalar context, each execution of C<m//g> finds the next match,
returning true if it matches, and false if there is no further match.
The position after the last match can be read or set using the pos()
function; see L<perlfunc/pos>. A failed match normally resets the
search position to the beginning of the string, but you can avoid that
by adding the C</c> modifier (e.g. C<m//gc>).
Modifying the target string also resets the search position.

You can intermix C<m//g> matches with C<m/\G.../g>, where C<\G> is a
zero-width assertion that matches the exact position where the previous
C<m//g>, if any, left off. Without the C</g> modifier, the C<\G> assertion
still anchors at pos(), but the match is of course only attempted once.
Using C<\G> without C</g> on a target string that has not previously had a
C</g> match applied to it is the same as using the C<\A> assertion to match
the beginning of the string. Note also that, currently, C<\G> is only
properly supported when anchored at the very beginning of the pattern.

Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = "";
    while (defined($paragraph = <>)) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";

    # using m//gc with \G
    $_ = "ppooqppqq";
    while ($i++ < 2) {
        print "1: '";
        print $1 while /(o)/gc; print "', pos=", pos, "\n";
        print "2: '";
        print $1 if /\G(q)/gc;  print "', pos=", pos, "\n";
        print "3: '";
        print $1 while /(p)/gc; print "', pos=", pos, "\n";
    }
    print "Final: '$1', pos=",pos,"\n" if /\G(.)/;

The last example should print:

    1: 'oo', pos=4
    2: 'q', pos=5
    3: 'pp', pos=7
    1: '', pos=7
    2: 'q', pos=8
    3: '', pos=8
    Final: 'q', pos=8

Notice that the final match matched C<q> instead of C<p>, which a match
without the C<\G> anchor would have done. Also note that the final match
did not update C<pos> -- C<pos> is only updated on a C</g> match. If the
final match did indeed match C<p>, it's a good bet that you're running an
older (pre-5.6.0) Perl.

A useful idiom for C<lex>-like scanners is C</\G.../gc>.
You can combine several regexps like this to process a string part-by-part,
doing different actions depending on which regexp matched. Each
regexp tries to match where the previous one leaves off.

    $_ = <<'EOL';
        $url = URI::URL->new( "http://www/" );   die if $url eq "xXx";
    EOL
    LOOP:
    {
        print(" digits"),       redo LOOP if /\G\d+\b[,.;]?\s*/gc;
        print(" lowercase"),    redo LOOP if /\G[a-z]+\b[,.;]?\s*/gc;
        print(" UPPERCASE"),    redo LOOP if /\G[A-Z]+\b[,.;]?\s*/gc;
        print(" Capitalized"),  redo LOOP if /\G[A-Z][a-z]+\b[,.;]?\s*/gc;
        print(" MiXeD"),        redo LOOP if /\G[A-Za-z]+\b[,.;]?\s*/gc;
        print(" alphanumeric"), redo LOOP if /\G[A-Za-z0-9]+\b[,.;]?\s*/gc;
        print(" line-noise"),   redo LOOP if /\G[^A-Za-z0-9]+/gc;
        print ". That's all!\n";
    }

Here is the output (split into several lines):

    line-noise lowercase line-noise UPPERCASE line-noise UPPERCASE
    line-noise lowercase line-noise lowercase line-noise lowercase
    line-noise lowercase lowercase line-noise lowercase lowercase
    line-noise MiXeD line-noise. That's all!

=item ?PATTERN?
X<?>

This is just like the C</pattern/> search, except that it matches only
once between calls to the reset() operator. This is a useful
optimization when you want to see only the first occurrence of
something in each file of a set of files, for instance. Only C<??>
patterns local to the current package are reset.

    while (<>) {
        if (?^$?) {
                            # blank line between header and body
        }
    } continue {
        reset if eof;       # clear ?? status for next file
    }

This usage is vaguely deprecated, which means it just might possibly
be removed in some distant future version of Perl, perhaps somewhere
around the year 2168.
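
The match-only-once behavior can be seen in a self-contained sketch that
replaces real files with two hypothetical in-memory line lists. The operator
is spelled C<m?...?> here, with the explicit C<m>. Perl also accepts this
spelling, and later Perl releases require it.

```perl
use strict;
use warnings;

# Hypothetical input: two "files", each a list of lines with a blank
# line between the header and the body.
my @files = (
    [ 'From: a', 'Subject: x', '', 'body 1', '', 'more body' ],
    [ 'From: b', '', 'body 2', '' ],
);

my @header_lengths;
for my $lines (@files) {
    my $n = 0;
    for (@$lines) {
        # m?...? succeeds only once until reset() is called, so later
        # blank lines in the same "file" are ignored.
        if (m?^$?) {
            push @header_lengths, $n;   # lines before the first blank line
        }
        $n++;
    }
    reset;   # clear the ?? status before the next "file"
}

print "@header_lengths\n";
```

Without the C<reset>, the second "file" would contribute nothing, because
the pattern had already matched once.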

=item s/PATTERN/REPLACEMENT/msixpogce
X<substitute> X<substitution> X<replace> X<regexp, replace>
X<regexp, substitute> X</m> X</s> X</i> X</x> X</p> X</o> X</g> X</c> X</e>

Searches a string for a pattern, and if found, replaces that pattern
with the replacement text and returns the number of substitutions
made. Otherwise it returns false (specifically, the empty string).

If no string is specified via the C<=~> or C<!~> operator, the C<$_>
variable is searched and modified. (The string specified with C<=~> must
be a scalar variable, an array element, a hash element, or an assignment
to one of those, i.e., an lvalue.)

If the delimiter chosen is a single quote, no interpolation is
done on either the PATTERN or the REPLACEMENT. Otherwise, if the
PATTERN contains a $ that looks like a variable rather than an
end-of-string test, the variable will be interpolated into the pattern
at run-time. If you want the pattern compiled only once the first time
the variable is interpolated, use the C</o> option. If the pattern
evaluates to the empty string, the last successfully executed regular
expression is used instead. See L<perlre> for further explanation on these.
See L<perllocale> for discussion of additional considerations that apply
when C<use locale> is in effect.

Options are as with m// with the addition of the following
replacement-specific options:

    e   Evaluate the right side as an expression.
    ee  Evaluate the right side as a string then eval the result.

Any non-alphanumeric, non-whitespace delimiter may replace the
slashes. If single quotes are used, no interpretation is done on the
replacement string (the C</e> modifier overrides this, however). Unlike
Perl 4, Perl 5 treats backticks as normal delimiters; the replacement
text is not evaluated as a command.
If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own
pair of quotes, which may or may not be bracketing quotes, e.g.,
C<s(foo)(bar)> or C<< s<foo>/bar/ >>. A C</e> will cause the
replacement portion to be treated as a full-fledged Perl expression
and evaluated right then and there. It is, however, syntax checked at
compile-time. A second C<e> modifier will cause the replacement portion
to be C<eval>ed before being run as a Perl expression.

Examples:

    s/\bgreen\b/mauve/g;            # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/;     # run-time pattern

    ($foo = $bar) =~ s/this/that/;  # copy first, then change

    $count = ($paragraph =~ s/Mister\b/Mr./g);  # get change-count

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;                   # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;      # yields 'abc  246xyz'
    s/\w/$& x 2/eg;                 # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;          # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;   # expr now, so /e
    s/^=(\w+)/pod($1)/ge;           # use function call

    # expand variables in $_, but dynamics only, using
    # symbolic dereferencing
    s/\$(\w+)/${$1}/g;

    # Add one to the value of any numbers in the string
    s/(\d+)/1 + $1/eg;

    # This will expand any embedded scalar variable
    # (including lexicals) in $_ : First $1 is interpolated
    # to the variable name, and then evaluated
    s/(\$\w+)/$1/eeg;

    # Delete (most) C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;        # trim whitespace in $_, expensively

    for ($variable) {           # trim whitespace in $variable, cheap
        s/^\s+//;
        s/\s+$//;
    }

    s/([^ ]*) *([^ ]*)/$2 $1/;  # reverse 1st two fields

Note the use of $ instead of \ in the last example. Unlike
B<sed>, we use the \<I<digit>> form in only the left hand side.
Anywhere else it's $<I<digit>>.

Occasionally, you can't use just a C</g> to get all the changes
to occur that you might want. Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;

=back

=head2 Quote-Like Operators
X<operator, quote-like>

=over 4

=item q/STRING/
X<q> X<quote, single> X<'> X<''>

=item 'STRING'

A single-quoted, literal string. A backslash represents a backslash
unless followed by the delimiter or another backslash, in which case
the delimiter or backslash is interpolated.

    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');
    $baz = '\n';                # a two-character string

=item qq/STRING/
X<qq> X<quote, double> X<"> X<"">

=item "STRING"

A double-quoted, interpolated string.

    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
                if /\b(tcl|java|python)\b/i;      # :-)
    $baz = "\n";                # a one-character string

=item qx/STRING/
X<qx> X<`> X<``> X<backtick>

=item `STRING`

A string which is (possibly) interpolated and then executed as a
system command with C</bin/sh> or its equivalent. Shell wildcards,
pipes, and redirections will be honored. The collected standard
output of the command is returned; standard error is unaffected.
In scalar context, it comes back as a single (potentially multi-line)
string, or undef if the command failed. In list context, returns a
list of lines (however you've defined lines with $/ or
$INPUT_RECORD_SEPARATOR), or an empty list if the command failed.

Because backticks do not affect standard error, use shell file descriptor
syntax (assuming the shell supports this) if you care to address this.
To capture a command's STDERR and STDOUT together:

    $output = `cmd 2>&1`;

To capture a command's STDOUT but discard its STDERR:

    $output = `cmd 2>/dev/null`;

To capture a command's STDERR but discard its STDOUT (ordering is
important here):

    $output = `cmd 2>&1 1>/dev/null`;

To exchange a command's STDOUT and STDERR in order to capture the STDERR
but leave its STDOUT to come out the old STDERR:

    $output = `cmd 3>&1 1>&2 2>&3 3>&-`;

To read both a command's STDOUT and its STDERR separately, it's easiest
to redirect them separately to files, and then read from those files
when the program is done:

    system("program args 1>program.stdout 2>program.stderr");

The STDIN filehandle used by the command is inherited from Perl's STDIN.
For example:

    open BLAM, "blam" or die "Can't open: $!";
    open STDIN, "<&BLAM";
    print `sort`;

will print the sorted contents of the file "blam".

Using single-quote as a delimiter protects the command from Perl's
double-quote interpolation, passing it on to the shell instead:

    $perl_info  = qx(ps $$);            # that's Perl's $$
    $shell_info = qx'ps $$';            # that's the new shell's $$

How that string gets evaluated is entirely subject to the command
interpreter on your system. On most platforms, you will have to protect
shell metacharacters if you want them treated literally.
This is difficult to do in practice, because it's unclear which characters
need escaping, and how.
See L<perlsec> for a clean and safe example of a manual fork() and exec()
to emulate backticks safely.

On some platforms (notably DOS-like ones), the shell may not be
capable of dealing with multiline commands, so putting newlines in
the string may not get you what you want. You may be able to evaluate
multiple commands in a single line by separating them with the command
separator character, if your shell supports that (e.g. C<;> on many Unix
shells; C<&> on the Windows NT C<cmd> shell).

Beginning with v5.6.0, Perl will attempt to flush all files opened for
output before starting the child process, but this may not be supported
on some platforms (see L<perlport>). To be safe, you may need to set
C<$|> ($AUTOFLUSH in English) or call the C<autoflush()> method of
C<IO::Handle> on any open handles.

Beware that some command shells may place restrictions on the length
of the command line. You must ensure your strings don't exceed this
limit after any necessary interpolations. See the platform-specific
release notes for more details about your particular environment.

Using this operator can lead to programs that are difficult to port,
because the shell commands called vary between systems, and may in
fact not be present at all. As one example, the C<type> command under
the POSIX shell is very different from the C<type> command under DOS.
That doesn't mean you should go out of your way to avoid backticks
when they're the right way to get something done. Perl was made to be
a glue language, and one of the things it glues together is commands.
Just understand what you're getting yourself into.

See L</"I/O Operators"> for more discussion.
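
When the shell is the problem, the list form of a piped open() is a
shorter alternative to the manual fork() and exec() that L<perlsec>
describes. When given more than one element, this form runs the program
directly, so shell metacharacters in the arguments need no escaping. The
following is a minimal sketch. It assumes a platform that supports
list-form pipe opens (most Unix systems; see L<perlport>). The helper name
C<capture> is hypothetical, and C<$^X> (the running perl) is used only
because it is a command that is certain to exist.

```perl
use strict;
use warnings;

# Run a command without a shell and return its standard output.
# capture() is a hypothetical helper name, not a core function.
sub capture {
    my @command = @_;
    # With more than one element in the list, open() runs the program
    # directly (no /bin/sh), so shell metacharacters in the arguments
    # are passed through untouched.
    open(my $pipe, '-|', @command)
        or die "Can't run $command[0]: $!";
    my $output = do { local $/; <$pipe> };   # slurp all output
    close($pipe)
        or warn "$command[0] exited with status ", $? >> 8, "\n";
    return $output;
}

# The child perl prints its arguments separated by spaces. ";", "$HOME"
# and "*" reach it unchanged, because no shell is involved.
my $output = capture($^X, '-e', 'print "@ARGV\n"', 'a;b', '$HOME', '*');
print $output;
```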

=item qw/STRING/
X<qw> X<quote, list> X<quote, words>

Evaluates to a list of the words extracted out of STRING, using embedded
whitespace as the word delimiters. It can be understood as being roughly
equivalent to:

    split(' ', q/STRING/);

the differences being that it generates a real list at compile time, and
in scalar context it returns the last element in the list. So
this expression:

    qw(foo bar baz)

is semantically equivalent to the list:

    'foo', 'bar', 'baz'

Some frequently seen examples:

    use POSIX qw( setlocale localeconv );
    @EXPORT = qw( foo bar baz );

A common mistake is to try to separate the words with commas or to
put comments into a multi-line C<qw>-string. For this reason, the
C<use warnings> pragma and the B<-w> switch (that is, the C<$^W> variable)
produce warnings if the STRING contains the "," or the "#" character.

=item tr/SEARCHLIST/REPLACEMENTLIST/cds
X<tr> X<y> X<transliterate> X</c> X</d> X</s>

=item y/SEARCHLIST/REPLACEMENTLIST/cds

Transliterates all occurrences of the characters found in the search list
with the corresponding character in the replacement list. It returns
the number of characters replaced or deleted. If no string is
specified via the =~ or !~ operator, the $_ string is transliterated. (The
string specified with =~ must be a scalar variable, an array element, a
hash element, or an assignment to one of those, i.e., an lvalue.)

A character range may be specified with a hyphen, so C<tr/A-J/0-9/>
does the same replacement as C<tr/ACEGIBDFHJ/0246813579/>.
For B<sed> devotees, C<y> is provided as a synonym for C<tr>.
If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has
its own pair of quotes, which may or may not be bracketing quotes,
e.g., C<tr[A-Z][a-z]> or C<tr(+\-*/)/ABCD/>.

Note that C<tr> does B<not> do regular expression character classes
such as C<\d> or C<[:lower:]>. The C<tr> operator is not equivalent to
the tr(1) utility. If you want to map strings between lower/upper
cases, see L<perlfunc/lc> and L<perlfunc/uc>, and in general consider
using the C<s> operator if you need regular expressions.

Note also that the whole range idea is rather unportable between
character sets, and even within a character set ranges may cause results
you probably didn't expect. A sound principle is to use only ranges
that begin from and end at either alphabets of equal case (a-e, A-E),
or digits (0-4). Anything else is unsafe. If in doubt, spell out the
character sets in full.

Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.

If the C</c> modifier is specified, the SEARCHLIST character set
is complemented. If the C</d> modifier is specified, any characters
specified by SEARCHLIST not found in REPLACEMENTLIST are deleted.
(Note that this is slightly more flexible than the behavior of some
B<tr> programs, which delete anything they find in the SEARCHLIST,
period.) If the C</s> modifier is specified, sequences of characters
that were transliterated to the same character are squashed down
to a single instance of the character.

If the C</d> modifier is used, the REPLACEMENTLIST is always interpreted
exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter
than the SEARCHLIST, the final character is replicated until it is long
enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.

Examples:

    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case

    $cnt = tr/*/*/;             # count the stars in $_

    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky

    $cnt = tr/0-9//;            # count the digits in $_

    tr/a-zA-Z//s;               # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    tr/a-zA-Z/ /cs;             # change non-alphas to single space

    tr [\200-\377]
       [\000-\177];             # clear the 8th bit

If multiple transliterations are given for a character, only the
first one is used:

    tr/AAA/XYZ/

will transliterate any A to X.

Because the transliteration table is built at compile time, neither
the SEARCHLIST nor the REPLACEMENTLIST is subjected to double quote
interpolation. That means that if you want to use variables, you
must use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;

=item <<EOF
X<here-doc> X<heredoc> X<here-document> X<<< << >>>

A line-oriented form of quoting is based on the shell "here-document"
syntax. Following a C<< << >> you specify a string to terminate
the quoted material, and all lines following the current line down to
the terminating string are the value of the item.

The terminating string may be either an identifier (a word), or some
quoted text. An unquoted identifier works like double quotes.
There may not be a space between the C<< << >> and the identifier,
unless the identifier is explicitly quoted. (If you put a space it
will be treated as a null identifier, which is valid, and matches the
first empty line.) The terminating string must appear by itself
(unquoted and with no surrounding whitespace) on the terminating line.
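
The small illustration below uses hypothetical variable names. The unquoted identifier C<END> interpolates like a double-quoted string, and the body starts on the line after C<<< <<END >>>. The rest of the statement, here a concatenation, stays on the line that contains the C<<< << >>>.

```perl
use strict;
use warnings;

my $name = 'world';

# The here-doc body begins on the next line; ". '!'" is still part of
# this statement. END must appear alone, at the start of its own line.
my $greeting = <<END . '!';
Hello, $name
END

print $greeting, "\n";
```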

If the terminating string is quoted, the type of quotes used determines
the treatment of the text.

=over 4

=item Double Quotes

Double quotes indicate that the text will be interpolated using exactly
the same rules as normal double quoted strings.

       print <<EOF;
    The price is $Price.
    EOF

       print << "EOF"; # same as above
    The price is $Price.
    EOF


=item Single Quotes

Single quotes indicate the text is to be treated literally with no
interpolation of its content. This is similar to single quoted
strings except that backslashes have no special meaning, with C<\\>
being treated as two backslashes and not one as they would in every
other quoting construct.

This is the only form of quoting in perl where there is no need
to worry about escaping content, something that code generators
can and do make good use of.

=item Backticks

The content of the here doc is treated just as it would be if the
string were embedded in backticks. Thus the content is interpolated
as though it were double quoted and then executed via the shell, with
the results of the execution returned.

       print << `EOC`; # execute command and get results
    echo hi there
    EOC

=back

It is possible to stack multiple here-docs in a row:

       print <<"foo", <<"bar"; # you can stack them
    I said foo.
    foo
    I said bar.
    bar

       myfunc(<< "THIS", 23, <<'THAT');
    Here's a line
    or two.
    THIS
    and here's another.
    THAT

Just don't forget that you have to put a semicolon on the end
to finish the statement, as Perl doesn't know you're not going to
try to do this:

       print <<ABC
    179231
    ABC
       + 20;

If you want to remove the line terminator from your here-docs,
use C<chomp()>.

    chomp($string = <<'END');
    This is a string.
    END

If you want your here-docs to be indented with the rest of the code,
you'll need to remove leading whitespace from each line manually:

    ($quote = <<'FINIS') =~ s/^\s+//gm;
       The Road goes ever on and on,
       down from the door where it began.
    FINIS

If you use a here-doc within a delimited construct, such as in C<s///eg>,
the quoted material must come on the lines following the final delimiter.
So instead of

    s/this/<<E . 'that'
    the other
    E
     . 'more '/eg;

you have to write

    s/this/<<E . 'that'
     . 'more '/eg;
    the other
    E

If the terminating identifier is on the last line of the program, you
must be sure there is a newline after it; otherwise, Perl will give the
warning B<Can't find string terminator "END" anywhere before EOF...>.

Additionally, the quoting rules for the end of string identifier are not
related to Perl's quoting rules -- C<q()>, C<qq()>, and the like are not
supported in place of C<''> and C<"">, and the only interpolation is for
backslashing the quoting character:

    print << "abc\"def";
    testing...
    abc"def

Finally, quoted strings cannot span multiple lines. The general rule is
that the identifier must be a string literal. Stick with that, and you
should be safe.

=back

=head2 Gory details of parsing quoted constructs
X<quote, gory details>

When presented with something that might have several different
interpretations, Perl uses the B<DWIM> (that's "Do What I Mean")
principle to pick the most probable interpretation. This strategy
is so successful that Perl programmers often do not suspect the
ambiguity of what they write. But from time to time, Perl's
notions differ substantially from what the author honestly meant.

This section hopes to clarify how Perl handles quoted constructs.
The most common reason to learn this is to unravel labyrinthine
regular expressions, but because the initial steps of parsing are the
same for all quoting operators, they are all discussed together.

The most important Perl parsing rule is the first one discussed
below: when processing a quoted construct, Perl first finds the end
of that construct, then interprets its contents. If you understand
this rule, you may skip the rest of this section on the first
reading. The other rules are likely to contradict the user's
expectations much less frequently than this first one.

Some passes discussed below are performed concurrently, but because
their results are the same, we consider them individually. For different
quoting constructs, Perl performs different numbers of passes, from
one to four, but these passes are always performed in the same order.

=over 4

=item Finding the end

The first pass is finding the end of the quoted construct, where
the information about the delimiters is used in parsing.
During this search, text between the starting and ending delimiters
is copied to a safe location. The copied text thus becomes
delimiter-independent.

If the construct is a here-doc, the ending delimiter is a line
whose content is the terminating string. Therefore C<<<EOF> is
terminated by C<EOF> immediately followed by C<"\n"> and starting
from the first column of the terminating line.
When searching for the terminating line of a here-doc, nothing
is skipped. In other words, lines after the here-doc syntax
are compared with the terminating string line by line.

For constructs other than here-docs, single characters are used as starting
and ending delimiters.
If the starting delimiter is an opening punctuation character
(that is C<(>, C<[>, C<{>, or C<< < >>), the ending delimiter is the
corresponding closing punctuation character (that is C<)>, C<]>, C<}>, or C<< > >>).
If the starting delimiter is an unpaired character like C</> or a closing
punctuation character, the ending delimiter is the same as the starting delimiter.
Therefore a C</> terminates a C<qq//> construct, while a C<]> terminates
C<qq[]> and C<qq]]> constructs.

When searching for single-character delimiters, escaped delimiters
and C<\\> are skipped. For example, while searching for terminating C</>,
combinations of C<\\> and C<\/> are skipped. If the delimiters are
bracketing, nested pairs are also skipped. For example, while searching
for closing C<]> paired with the opening C<[>, combinations of C<\\>, C<\]>,
and C<\[> are all skipped, and nested C<[> and C<]> are skipped as well.
However, when backslashes are used as the delimiters (like C<qq\\> and
C<tr\\\>), nothing is skipped.
During the search for the end, backslashes that escape delimiters
are removed (strictly speaking, they are not copied to the safe location).

For constructs with three-part delimiters (C<s///>, C<y///>, and
C<tr///>), the search is repeated once more.
If the first delimiter is not an opening punctuation character, the three
delimiters must be the same, such as C<s!!!> and C<tr)))>, in which case the
second delimiter terminates the left part and starts the right part at once.
If the left part is delimited by bracketing punctuation (that is C<()>,
C<[]>, C<{}>, or C<< <> >>), the right part needs another pair of
delimiters such as C<s(){}> and C<tr[]//>.
In these cases, whitespace
and comments are allowed between the two parts, though a comment must follow
at least one whitespace character; otherwise a character expected as the start
of the comment may be regarded as the starting delimiter of the right part.

During this search no attention is paid to the semantics of the construct.
Thus:

    "$hash{"$foo/$bar"}"

or:

    m/
      bar       # NOT a comment, this slash / terminated m//!
     /x

do not form legal quoted expressions. The quoted part ends on the
first C<"> and the first C</>, respectively, and the rest happens to be
a syntax error. Because the slash that terminated C<m//> was followed by
a C<SPACE>, the example above is not C<m//x>, but rather C<m//> with no C</x>
modifier. So the embedded C<#> is interpreted as a literal C<#>.

Also no attention is paid to C<\c\> (multichar control char syntax) during
this search. Thus the second C<\> in C<qq/\c\/> is interpreted as a part
of C<\/>, and the following C</> is not recognized as a delimiter.
Instead, use C<\034> or C<\x1c> at the end of quoted constructs.

=item Interpolation
X<interpolation>

The next step is interpolation in the text obtained, which is now
delimiter-independent. There are multiple cases.

=over 4

=item C<<<'EOF'>

No interpolation is performed.
Note that the combination C<\\> is left intact, since escaped delimiters
are not available for here-docs.

=item C<m''>, the pattern of C<s'''>

No interpolation is performed at this stage.
Any backslashed sequences including C<\\> are treated at the stage
of L</"parsing regular expressions">.

=item C<''>, C<q//>, C<tr'''>, C<y'''>, the replacement of C<s'''>

The only interpolation is removal of C<\> from pairs of C<\\>.
Therefore C<-> in C<tr'''> and C<y'''> is treated literally
as a hyphen and no character range is available.
C<\1> in the replacement of C<s'''> does not work as C<$1>.

=item C<tr///>, C<y///>

No variable interpolation occurs. String modifying combinations for
case and quoting such as C<\Q>, C<\U>, and C<\E> are not recognized.
The other escape sequences such as C<\200> and C<\t> and backslashed
characters such as C<\\> and C<\-> are converted to appropriate literals.
The character C<-> is treated specially and therefore C<\-> is treated
as a literal C<->.

=item C<"">, C<``>, C<qq//>, C<qx//>, C<< <file*glob> >>, C<<<"EOF">

C<\Q>, C<\U>, C<\u>, C<\L>, C<\l> (possibly paired with C<\E>) are
converted to corresponding Perl constructs. Thus, C<"$foo\Qbaz$bar">
is converted to C<$foo . (quotemeta("baz" . $bar))> internally.
The other escape sequences such as C<\200> and C<\t> and backslashed
characters such as C<\\> and C<\-> are replaced with appropriate
expansions.

Let it be stressed that I<whatever falls between C<\Q> and C<\E>>
is interpolated in the usual way. Something like C<"\Q\\E"> has
no C<\E> inside. Instead, it has C<\Q>, C<\\>, and C<E>, so the
result is the same as for C<"\\\\E">. As a general rule, backslashes
between C<\Q> and C<\E> may lead to counterintuitive results. So,
C<"\Q\t\E"> is converted to C<quotemeta("\t")>, which is the same
as C<"\\\t"> (since TAB is not alphanumeric). Note also that:

    $str = '\t';
    return "\Q$str";

may be closer to the conjectural I<intention> of the writer of C<"\Q\t\E">.

Interpolated scalars and arrays are converted internally to the C<join> and
C<.> catenation operations. Thus, C<"$foo XXX '@arr'"> becomes:

    $foo . " XXX '" . (join $", @arr) . "'";

All operations above are performed simultaneously, left to right.
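
The equivalences above can be checked directly. This small sketch uses hypothetical variable names and compares each interpolated string with the explicit expression it is said to expand to.

```perl
use strict;
use warnings;

my $foo = 'x';
my $bar = '.*';

# \Q applies quotemeta() to everything up to \E (or the end of the string),
# including the interpolated $bar.
my $quoted   = "$foo\Qbaz$bar";
my $explicit = $foo . quotemeta("baz" . $bar);

# "\Q\\E" contains no \E: it is \Q, an escaped backslash, and a plain "E".
my $no_e = "\Q\\E";

# Arrays interpolate as join($", @array); $" is a single space by default.
my @arr    = (1, 2, 3);
my $joined = "$foo XXX '@arr'";

print "$quoted\n$no_e\n$joined\n";
```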

Because the result of C<"\Q STRING \E"> has all metacharacters
quoted, there is no way to insert a literal C<$> or C<@> inside a
C<\Q\E> pair. If protected by C<\>, C<$> will be quoted to become
C<"\\\$">; if not, it is interpreted as the start of an interpolated
scalar.

Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
C<< "a $b -> {c}" >> really means:

    "a " . $b . " -> {c}";

or:

    "a " . $b -> {c};

Most of the time, the interpolated scalar is taken to be the longest
possible text that does not include spaces between components and that
contains matching braces or brackets. Because the outcome may be
determined by voting based on heuristic estimators, the result is not
strictly predictable. Fortunately, it's usually correct for ambiguous cases.

=item the replacement of C<s///>

Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, and interpolation
happens as with C<qq//> constructs.

It is at this step that C<\1> is begrudgingly converted to C<$1> in
the replacement text of C<s///>, in order to correct the incorrigible
I<sed> hackers who haven't picked up the saner idiom yet. A warning
is emitted if the C<use warnings> pragma or the B<-w> command-line flag
(that is, the C<$^W> variable) was set.

=item C<RE> in C<?RE?>, C</RE/>, C<m/RE/>, C<s/RE/foo/>

Processing of C<\Q>, C<\U>, C<\u>, C<\L>, C<\l>, C<\E>,
and interpolation happens (almost) as with C<qq//> constructs.

However, any other combinations of C<\> followed by a character
are not substituted but only skipped, in order to parse them
as regular expressions at the following step.
As C<\c> is skipped at this step, C<@> of C<\c@> in RE is possibly
treated as an array symbol (for example C<@foo>),
even though the same text in C<qq//> gives interpolation of C<\c@>.
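
This short sketch uses hypothetical names. It shows how C<\Q...\E> behaves inside a pattern and the C<$1> idiom that is preferred over C<\1> in the replacement of C<s///>.

```perl
use strict;
use warnings;

my $literal = 'a.b';

# Without \Q, the "." from $literal is a regex wildcard after interpolation.
my $wild_match  = ('axb' =~ /^$literal$/)     ? 1 : 0;

# With \Q...\E, the interpolated text is metaquoted, so "." is literal.
my $quoted_miss = ('axb' =~ /^\Q$literal\E$/) ? 1 : 0;
my $quoted_hit  = ('a.b' =~ /^\Q$literal\E$/) ? 1 : 0;

# In the replacement part of s///, write $1 and $2 rather than \1 and \2.
(my $swapped = 'key=value') =~ s/(\w+)=(\w+)/$2=$1/;

print "$wild_match $quoted_miss $quoted_hit $swapped\n";
```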

Moreover, inside C<(?{BLOCK})>, C<(?# comment )>, and
a C<#>-comment in a C<//x>-regular expression, no processing is
performed whatsoever. This is the first step at which the presence
of the C<//x> modifier is relevant.

Interpolation in patterns has several quirks: C<$|>, C<$(>, C<$)>, C<@+>
and C<@-> are not interpolated, and constructs C<$var[SOMETHING]> are
voted (by several different estimators) to be either an array element
or C<$var> followed by an RE alternative. This is where the notation
C<${arr[$bar]}> comes in handy: C</${arr[0-9]}/> is interpreted as
array element C<-9>, not as a regular expression from the variable
C<$arr> followed by a digit, which would be the interpretation of
C</$arr[0-9]/>. Since voting among different estimators may occur,
the result is not predictable.

The lack of processing of C<\\> creates specific restrictions on
the post-processed text. If the delimiter is C</>, one cannot get
the combination C<\/> into the result of this step. C</> will
finish the regular expression, C<\/> will be stripped to C</> on
the previous step, and C<\\/> will be left as is. Because C</> is
equivalent to C<\/> inside a regular expression, this does not
matter unless the delimiter happens to be a character special to the
RE engine, such as in C<s*foo*bar*>, C<m[foo]>, or C<?foo?>; or an
alphanumeric char, as in:

    m m ^ a \s* b mmx;

In the RE above, which is intentionally obfuscated for illustration, the
delimiter is C<m>, the modifiers are C<mx>, and after delimiter-removal the
RE is the same as for C<m/ ^ a \s* b /mx>. There's more than one
reason you're encouraged to restrict your delimiters to non-alphanumeric,
non-whitespace choices.

=back

This step is the last one for all constructs except regular expressions,
which are processed further.
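
The pattern-interpolation quirks described above can be seen in this small sketch. The names C<@arr> and C<$arr> are hypothetical. Wrapping the scalar in C<(?:...)> is one way, chosen for this example and not the only one, to force the "scalar followed by a character class" reading without relying on the heuristic.

```perl
use strict;
use warnings;

my @arr = ('a' .. 'j');   # ten elements, so $arr[-9] is 'b'
my $arr = 'x';            # an unrelated scalar with the same name

# ${arr[0-9]} forces the array-element reading; the subscript 0-9 is -9.
my $elem_match = ('b' =~ /^${arr[0-9]}$/) ? 1 : 0;

# Putting $arr inside (?:...) forces "scalar, then a digit class".
my $class_match = ('x7' =~ /^(?:$arr)[0-9]$/) ? 1 : 0;

# Alphanumeric delimiter "m": the same pattern as m/ ^ a \s* b /mx.
my $odd_delim = ('a   b' =~ m m ^ a \s* b mmx) ? 1 : 0;

print "$elem_match $class_match $odd_delim\n";
```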

=item parsing regular expressions
X<regexp, parse>

Previous steps were performed during the compilation of Perl code,
but this one happens at run time--although it may be optimized to
be calculated at compile time if appropriate. After preprocessing
described above, and possibly after evaluation if concatenation,
joining, casing translation, or metaquoting are involved, the
resulting I<string> is passed to the RE engine for compilation.

Whatever happens in the RE engine might be better discussed in L<perlre>,
but for the sake of continuity, we shall do so here.

This is another step where the presence of the C<//x> modifier is
relevant. The RE engine scans the string from left to right and
converts it to a finite automaton.

Backslashed characters are either replaced with corresponding
literal strings (as with C<\{>), or else they generate special nodes
in the finite automaton (as with C<\b>). Characters special to the
RE engine (such as C<|>) generate corresponding nodes or groups of
nodes. C<(?#...)> comments are ignored. All the rest is either
converted to literal strings to match, or else is ignored (as is
whitespace and C<#>-style comments if C<//x> is present).

Parsing of the bracketed character class construct, C<[...]>, is
rather different from the rule used for the rest of the pattern.
The terminator of this construct is found using the same rules as
for finding the terminator of a C<{}>-delimited construct, the only
exception being that C<]> immediately following C<[> is treated as
though preceded by a backslash. Similarly, the terminator of
C<(?{...})> is found using the same rules as for finding the
terminator of a C<{}>-delimited construct.

It is possible to inspect both the string given to the RE engine and the
resulting finite automaton.
See the arguments C<debug>/C<debugcolor>
in the C<use L<re>> pragma, as well as Perl's B<-Dr> command-line
switch documented in L<perlrun/"Command Switches">.

=item Optimization of regular expressions
X<regexp, optimization>

This step is listed for completeness only. Since it does not change
semantics, details of this step are not documented and are subject
to change without notice. This step is performed over the finite
automaton that was generated during the previous pass.

It is at this stage that C<split()> silently optimizes C</^/> to
mean C</^/m>.

=back

=head2 I/O Operators
X<operator, i/o> X<operator, io> X<io> X<while> X<filehandle>
X<< <> >> X<@ARGV>

There are several I/O operators you should know about.

A string enclosed by backticks (grave accents) first undergoes
double-quote interpolation. It is then interpreted as an external
command, and the output of that command is the value of the
backtick string, like in a shell. In scalar context, a single string
consisting of all output is returned. In list context, a list of
values is returned, one per line of output. (You can set C<$/> to use
a different line terminator.) The command is executed each time the
pseudo-literal is evaluated. The status value of the command is
returned in C<$?> (see L<perlvar> for the interpretation of C<$?>).
Unlike in B<csh>, no translation is done on the return data--newlines
remain newlines. Unlike in any of the shells, single quotes do not
hide variable names in the command from interpretation. To pass a
literal dollar-sign through to the shell you need to hide it with a
backslash. The generalized form of backticks is C<qx//>. (Because
backticks always undergo shell expansion as well, see L<perlsec> for
security concerns.)
X<qx> X<`> X<``> X<backtick> X<glob>

In scalar context, evaluating a filehandle in angle brackets yields
the next line from that file (the newline, if any, included), or
C<undef> at end-of-file or on error. When C<$/> is set to C<undef>
(sometimes known as file-slurp mode) and the file is empty, it
returns C<''> the first time, followed by C<undef> subsequently.

Ordinarily you must assign the returned value to a variable, but
there is one situation where an automatic assignment happens. If
and only if the input symbol is the only thing inside the conditional
of a C<while> statement (even if disguised as a C<for(;;)> loop),
the value is automatically assigned to the global variable $_,
destroying whatever was there previously. (This may seem like an
odd thing to you, but you'll use the construct in almost every Perl
script you write.) The $_ variable is not implicitly localized.
You'll have to put a C<local $_;> before the loop if you want that
to happen.

The following lines are equivalent:

    while (defined($_ = <STDIN>)) { print; }
    while ($_ = <STDIN>) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while ($_ = <STDIN>);
    print while <STDIN>;

This also behaves similarly, but avoids $_ :

    while (my $line = <STDIN>) { print $line }

In these loop constructs, the assigned value (whether assignment
is automatic or explicit) is then tested to see whether it is
defined. The defined test avoids problems where a line has a string
value that would be treated as false by Perl, for example a "" or
a "0" with no trailing newline. If you really mean for such values
to terminate the loop, they should be tested for explicitly:

    while (($_ = <STDIN>) ne '0') { ... }
    while (<STDIN>) { last unless $_; ... }

In other boolean contexts, C<< <I<filehandle>> >> without an
explicit C<defined> test or comparison elicits a warning if the
C<use warnings> pragma or the B<-w>
command-line switch (the C<$^W> variable) is in effect.

The filehandles STDIN, STDOUT, and STDERR are predefined. (The
filehandles C<stdin>, C<stdout>, and C<stderr> will also work except
in packages, where they would be interpreted as local identifiers
rather than global.) Additional filehandles may be created with
the open() function, amongst others. See L<perlopentut> and
L<perlfunc/open> for details on this.
X<stdin> X<stdout> X<stderr>

If a <FILEHANDLE> is used in a context that is looking for
a list, a list comprising all input lines is returned, one line per
list element. It's easy to grow to a rather large data space this
way, so use with care.

<FILEHANDLE> may also be spelled C<readline(*FILEHANDLE)>.
See L<perlfunc/readline>.

The null filehandle <> is special: it can be used to emulate the
behavior of B<sed> and B<awk>. Input from <> comes either from
standard input, or from each file listed on the command line. Here's
how it works: the first time <> is evaluated, the @ARGV array is
checked, and if it is empty, C<$ARGV[0]> is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a list
of filenames. The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...                 # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work.
It really does shift the @ARGV array and put the current filename
into the $ARGV variable.
It also uses filehandle I<ARGV>
internally--<> is just a synonym for <ARGV>, which
is magical. (The pseudo code above doesn't work because it treats
<ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up
containing the list of filenames you really want. Line numbers (C<$.>)
continue as though the input were one big happy file. See the example
in L<perlfunc/eof> for how to reset line numbers on each file.

If you want to set @ARGV to your own list of files, go right ahead.
This sets @ARGV to all plain text files if no @ARGV was given:

    @ARGV = grep { -f && -T } glob('*') unless @ARGV;

You can even set them to pipe commands. For example, this automatically
filters compressed arguments through B<gzip>:

    @ARGV = map { /\.(gz|Z)$/ ? "gzip -dc < $_ |" : $_ } @ARGV;

If you want to pass switches into your script, you can use one of the
C<Getopt> modules (such as L<Getopt::Long> or L<Getopt::Std>) or put a
loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        # ...                   # other switches
    }

    while (<>) {
        # ...                   # code for each line
    }

The <> symbol will return C<undef> for end-of-file only once.
If you call it again after this, it will assume you are processing another
@ARGV list, and if you haven't set @ARGV, will read input from STDIN.

If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the
filehandle to input from, or its typeglob, or a reference to the
same.
For example:

    $fh = \*STDIN;
    $line = <$fh>;

If what's within the angle brackets is neither a filehandle nor a simple
scalar variable containing a filehandle name, typeglob, or typeglob
reference, it is interpreted as a filename pattern to be globbed, and
either a list of filenames or the next filename in the list is returned,
depending on context. This distinction is determined on syntactic
grounds alone. That means C<< <$x> >> is always a readline() from
an indirect handle, but C<< <$hash{key}> >> is always a glob().
That's because $x is a simple scalar variable, but C<$hash{key}> is
not--it's a hash element. Even C<< <$x > >> (note the extra space)
is treated as C<glob("$x ")>, not C<readline($x)>.

One level of double-quote interpretation is done first, but you can't
say C<< <$foo> >> because that's an indirect filehandle as explained
in the previous paragraph. (In older versions of Perl, programmers
would insert curly brackets to force interpretation as a filename glob:
C<< <${foo}> >>. These days, it's considered cleaner to call the
internal function directly as C<glob($foo)>, which is probably the right
way to have done it in the first place.) For example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is roughly equivalent to:

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chomp;
        chmod 0644, $_;
    }

except that the globbing is actually done internally using the standard
C<File::Glob> extension. Of course, the shortest way to do the above is:

    chmod 0644, <*.c>;

A (file)glob evaluates its (embedded) argument only when it is
starting a new list. All values must be read before it will start
over. In list context, this isn't important because you automatically
get them all anyway.
However, in scalar context the operator returns
the next value each time it's called, or C<undef> when the list has
run out. As with filehandle reads, an automatic C<defined> is
generated when the glob occurs in the test part of a C<while>,
because legal glob returns (e.g. a file called F<0>) would otherwise
terminate the loop. Again, C<undef> is returned only once. So if
you're expecting a single value from a glob, it is much better to
say

    ($file) = <blurch*>;

than

    $file = <blurch*>;

because the latter will alternate between returning a filename and
returning false.

If you're trying to do variable interpolation, it's definitely better
to use the glob() function, because the older notation can cause people
to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);

=head2 Constant Folding
X<constant folding> X<folding>

Like C, Perl does a certain amount of expression evaluation at
compile time whenever it determines that all arguments to an
operator are static and have no side effects. In particular, string
concatenation happens at compile time between literals that don't do
variable substitution. Backslash interpolation also happens at
compile time. You can say

    'Now is the time for all' . "\n" .
        'good men to come to.'

and this all reduces to one string internally. Likewise, if
you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) {  }
    }

the compiler will precompute the number which that expression
represents so that the interpreter won't have to.
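
One way to observe constant folding is the core L<B::Deparse> module, which turns compiled code back into Perl source. In this sketch the exact layout of the deparsed text (indentation, any C<use strict> lines) may vary between perl versions. Only the folded constant itself should be relied on.

```perl
use strict;
use warnings;
use B::Deparse;    # core module that turns compiled code back into Perl source

my $deparse = B::Deparse->new;

# The expression uses only literals, so it is folded at compile time and
# the compiled sub contains just the resulting constant (5 + 6553600).
my $source = $deparse->coderef2text(sub { 5 + 100 * 2**16 });

print "$source\n";
```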

=head2 No-ops
X<no-op> X<nop>

Perl doesn't officially have a no-op operator, but the bare constants
C<0> and C<1> are special-cased not to produce a warning in a void
context, so you can, for example, safely do

    1 while foo();

=head2 Bitwise String Operators
X<operator, bitwise, string>

Bitstrings of any size may be manipulated by the bitwise operators
(C<~ | & ^>).

If the operands to a binary bitwise op are strings of different
sizes, B<|> and B<^> ops act as though the shorter operand had
additional zero bits on the right, while the B<&> op acts as though
the longer operand were truncated to the length of the shorter.
The granularity for such extension or truncation is one or more
bytes.

    # ASCII-based examples
    print "j p \n" ^ " a h";            # prints "JAPH\n"
    print "JA" | "  ph\n";              # prints "japh\n"
    print "japh\nJunk" & '_____';       # prints "JAPH\n"
    print 'p N$' ^ " E<H\n";            # prints "Perl\n"

If you are intending to manipulate bitstrings, be certain that
you're supplying bitstrings: if an operand is a number, that will imply
a B<numeric> bitwise operation. You may explicitly show which type of
operation you intend by using C<""> or C<0+>, as in the examples below.

    $foo =  150  |  105;    # yields 255  (0x96 | 0x69 is 0xFF)
    $foo = '150' |  105;    # yields 255
    $foo =  150  | '105';   # yields 255
    $foo = '150' | '105';   # yields string '155' (under ASCII)

    $baz = 0+$foo & 0+$bar;     # both ops explicitly numeric
    $biz = "$foo" ^ "$bar";     # both ops explicitly stringy

See L<perlfunc/vec> for information on how to manipulate individual bits
in a bit vector.

=head2 Integer Arithmetic
X<integer>

By default, Perl assumes that it must do most of its arithmetic in
floating point.
But by saying

    use integer;

you may tell the compiler that it's okay to use integer operations
(if it feels like it) from here to the end of the enclosing BLOCK.
An inner BLOCK may countermand this by saying

    no integer;

which lasts until the end of that BLOCK. Note that this doesn't
mean everything is only an integer, merely that Perl may use integer
operations if it is so inclined. For example, even under
C<use integer>, if you take C<sqrt(2)>, you'll still get
C<1.4142135623731> or so.

Used on numbers, the bitwise operators ("&", "|", "^", "~", "<<",
and ">>") always produce integral results. (But see also
L</"Bitwise String Operators">.) However, C<use integer> still has
meaning for them. By default, their results are interpreted as unsigned
integers, but if C<use integer> is in effect, their results are
interpreted as signed integers. For example, C<~0> usually evaluates
to a large integral value. However, C<use integer; ~0> is C<-1> on
two's-complement machines.

=head2 Floating-point Arithmetic
X<floating-point> X<floating point> X<float> X<real>

While C<use integer> provides integer-only arithmetic, there is no
analogous mechanism to provide automatic rounding or truncation to a
certain number of decimal places. For rounding to a certain number
of digits, sprintf() or printf() is usually the easiest route.
See L<perlfaq4>.

Floating-point numbers are only approximations to what a mathematician
would call real numbers. There are infinitely more reals than floats,
so some corners must be cut. For example:

    printf "%.20g\n", 123456789123456789;
    #        produces 123456789123456784

Testing for exact floating-point equality or inequality is
not a good idea.
Here's a (relatively expensive) work-around to compare
whether two floating-point numbers are equal to a particular number of
significant digits. See Knuth, volume II, for a more robust treatment of
this topic.

    sub fp_equal {
        my ($X, $Y, $POINTS) = @_;
        # "%.Ng" rounds each value to N significant digits
        my $tX = sprintf("%.${POINTS}g", $X);
        my $tY = sprintf("%.${POINTS}g", $Y);
        return $tX eq $tY;
    }

The POSIX module (part of the standard perl distribution) implements
ceil(), floor(), and other mathematical and trigonometric functions.
The Math::Complex module (part of the standard perl distribution)
defines mathematical functions that work on both the reals and the
imaginary numbers. Math::Complex is not as efficient as POSIX, but
POSIX can't work with complex numbers.

Rounding in financial applications can have serious implications, and
the rounding method used should be specified precisely. In these
cases, it probably pays not to trust whichever system rounding is
being used by Perl, but to implement the rounding function you
need yourself.

=head2 Bigger Numbers
X<number, arbitrary precision>

The standard Math::BigInt and Math::BigFloat modules provide
variable-precision arithmetic and overloaded operators, although
they're currently pretty slow. At the cost of some space and
considerable speed, they avoid the normal pitfalls associated with
limited-precision representations.

    use Math::BigInt;
    $x = Math::BigInt->new('123456789123456789');
    print $x * $x;

    # prints 15241578780673678515622620750190521

Several modules let you calculate with unlimited precision (bound only
by memory and CPU time) or with fixed precision. There are also
some non-standard modules that provide faster implementations via
external C libraries.
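
The difference is easiest to see side by side. The sketch below uses only the standard Math::BigInt and Math::BigFloat modules to contrast native arithmetic with the arbitrary-precision classes. The particular values (C<2**64 + 1> and C<0.1 + 0.2>) are illustrative choices, not anything these modules require.

```perl
use strict;
use warnings;
use Math::BigInt;
use Math::BigFloat;

# Native arithmetic: 2**64 + 1 needs 65 bits, more than a 64-bit UV holds
# and more than a double's 53-bit mantissa can represent, so the + 1 is lost.
my $native = 2**64 + 1;

# Math::BigInt keeps every digit. bpow() and binc() modify the object in
# place and return it, so the calls can be chained.
my $exact = Math::BigInt->new(2)->bpow(64)->binc;

# Math::BigFloat stores decimal fractions exactly, so 0.1 + 0.2 is 0.3.
my $sum = Math::BigFloat->new('0.1') + Math::BigFloat->new('0.2');

printf "native:   %.0f\n", $native;
print  "BigInt:   $exact\n";
print  "BigFloat: $sum\n";
```

The native line prints 18446744073709551616, silently dropping the + 1, while the BigInt line keeps it.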

Here is a short, but incomplete summary:

    Math::Fraction          big, unlimited fractions like 9973 / 12967
    Math::String            treat string sequences like numbers
    Math::FixedPrecision    calculate with a fixed precision
    Math::Currency          for currency calculations
    Bit::Vector             manipulate bit vectors fast (uses C)
    Math::BigIntFast        Bit::Vector wrapper for big numbers
    Math::Pari              provides access to the Pari C library
    Math::BigInteger        uses an external C library
    Math::Cephes            uses external Cephes C library (no big numbers)
    Math::Cephes::Fraction  fractions via the Cephes library
    Math::GMP               another one using an external C library

Choose wisely.

=cut
Generated: Tue Mar 17 22:47:18 2015 | Cross-referenced by PHPXref 0.7.1 |