
PHP tokens and opcodes: 3 useful extensions for understanding how the Zend Engine works

November 15, 2009 - PHP

When a PHP script is executed, it goes through several stages before the final result is displayed: lexing, parsing, compiling and executing. In this post I will walk through each stage using a small sample script, and at the end I will list some useful PHP extensions that can be used to inspect the output of each intermediate stage.

Let's take a sample PHP script, saved as debug.php, as an example:

  <?php
  function increment($a) {
  return $a+1;
  }
  $a = 3;
  $b = increment($a);
  echo $b;
  ?>

Try running this script from the command line (note that -f runs a file, whereas -r expects inline code):

  ~ sabhinav$ php -f debug.php
  4

This PHP script goes through the following stages before the result is output:

  • Lexing: The PHP code inside debug.php is converted into tokens
  • Parsing: The tokens are processed to arrive at meaningful expressions
  • Compiling: The derived expressions are compiled into opcodes
  • Execution: The opcodes are executed to arrive at the final result

Let's see how a PHP script passes through each of these steps.

Lexing: During this stage the human-readable PHP script is converted into tokens. For the first two lines of our PHP script:

  <?php
  function increment($a) {

the tokens look like this (try to match the tokens below against the two lines of PHP code above and you will get a feel for it). Each array holds the token number, the matched text and the line number; single characters such as ( and { appear as plain strings:

  ~ sabhinav$ php -r 'print_r(token_get_all(file_get_contents("debug.php")));'
  Array
  (
      [0] => Array
          (
              [0] => 368      // 368 is the token number; its symbolic name is T_OPEN_TAG (see below)
              [1] => <?php
              [2] => 1
          )
      [1] => Array
          (
              [0] => 371
              [1] =>
              [2] => 2
          )
      [2] => Array
          (
              [0] => 334
              [1] => function
              [2] => 2
          )
      [3] => Array
          (
              [0] => 371
              [1] =>
              [2] => 2
          )
      [4] => Array
          (
              [0] => 307
              [1] => increment
              [2] => 2
          )
      [5] => (
      [6] => Array
          (
              [0] => 309
              [1] => $a
              [2] => 2
          )
      [7] => )
      [8] => Array
          (
              [0] => 371
              [1] =>
              [2] => 2
          )
      [9] => {
      [10] => Array
          (
              [0] => 371
              [1] =>
              [2] => 2
          )

A list of parser tokens can be found here: http://www.php.net/manual/en/tokens.php
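
Token numbers such as 368 belong to the PHP build that produced them and can differ between PHP versions, so it is safer to compare against the T_* constants or to resolve names with token_name() than to hard-code the numbers. A minimal sketch, assuming the tokenizer extension is loaded (the variable names are only illustrative):

```php
<?php
// Tokenize the smallest possible script: just an opening tag.
$tokens = token_get_all("<?php\n");

// The first element is an array: [token number, text, line number].
$id = $tokens[0][0];

// Compare against the constant instead of a literal such as 368.
$isOpenTag = ($id === T_OPEN_TAG);

// Print the number used by this PHP build next to its symbolic name.
printf("%d => %s\n", $id, token_name($id));
```

The number printed on your machine may not be 368, but the symbolic name stays the same.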

Every token number has a symbolic name attached to it. Below is our PHP script with each generated token prefixed by its symbolic name:

  ~ sabhinav$ php -r '$tokens = token_get_all(file_get_contents("debug.php")); foreach ($tokens as $token) {
  if (is_array($token)) { echo token_name($token[0]); echo $token[1]; } }'
  T_OPEN_TAG<?php
  T_WHITESPACE T_FUNCTIONfunctionT_WHITESPACE
  T_STRINGincrementT_VARIABLE$aT_WHITESPACE T_WHITESPACE
  T_RETURNreturnT_WHITESPACE T_VARIABLE$a
  T_LNUMBER1T_WHITESPACE
  T_WHITESPACE
  T_VARIABLE$aT_WHITESPACE T_WHITESPACE T_LNUMBER3T_WHITESPACE
  T_VARIABLE$bT_WHITESPACE T_WHITESPACE T_STRINGincrementT_VARIABLE$a
  T_WHITESPACE
  T_ECHOechoT_WHITESPACE T_VARIABLE$b
  T_WHITESPACE
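
The one-liner above skips single-character tokens such as (, + and ;, which token_get_all() returns as plain strings rather than arrays. The sketch below prints both kinds, one per line. The function name dump_tokens and the CHAR label are made up for this illustration and are not part of the tokenizer API.

```php
<?php
// Return one description line per token, including single-character tokens.
function dump_tokens(string $source): array
{
    $lines = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens carry [token number, text, line number].
            [$id, $text, $line] = $token;
            $lines[] = sprintf('%-14s line %d  %s', token_name($id), $line, json_encode($text));
        } else {
            // Plain strings are one-character tokens; CHAR is our own label.
            $lines[] = sprintf('%-14s %s', 'CHAR', json_encode($token));
        }
    }
    return $lines;
}

$lines = dump_tokens("<?php\n\$a = 3;\n");
echo implode(PHP_EOL, $lines), PHP_EOL;
```

json_encode() is used only to make whitespace and newlines visible in the output.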

Parsing and Compiling: With the tokens generated in the step above, the Zend Engine knows every detail of the script: where the spaces are, where the newline characters are, where a user-defined function is declared, and so on. Over the next two stages the tokens are parsed and then compiled into opcodes. Below are the compiled opcodes for our complete sample script:

  ~ sabhinav$ php -r '$op_codes = parsekit_compile_file("debug.php", $errors, PARSEKIT_SIMPLE); print_r($op_codes); print_r($errors);'
  Array
  (
      [0] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
      [1] => ZEND_NOP UNUSED UNUSED UNUSED
      [2] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
      [3] => ZEND_ASSIGN T(0) T(0) 3
      [4] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
      [5] => ZEND_EXT_FCALL_BEGIN UNUSED UNUSED UNUSED
      [6] => ZEND_SEND_VAR UNUSED T(0) 0x1
      [7] => ZEND_DO_FCALL T(1) 'increment' 0x83E710CA
      [8] => ZEND_EXT_FCALL_END UNUSED UNUSED UNUSED
      [9] => ZEND_ASSIGN T(2) T(0) T(1)
      [10] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
      [11] => ZEND_ECHO UNUSED T(0) UNUSED
      [12] => ZEND_RETURN UNUSED 1 UNUSED
      [function_table] => Array
          (
              [increment] => Array
                  (
                      [0] => ZEND_EXT_NOP UNUSED UNUSED UNUSED
                      [1] => ZEND_RECV T(0) 1 UNUSED
                      [2] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
                      [3] => ZEND_ADD T(0) T(0) 1
                      [4] => ZEND_RETURN UNUSED T(0) UNUSED
                      [5] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
                      [6] => ZEND_RETURN UNUSED NULL UNUSED
                  )
          )
      [class_table] =>
  )

As we can see above, the Zend Engine captures the flow of our PHP script. For instance, [3] => ZEND_ASSIGN T(0) T(0) 3 corresponds to $a = 3; in our PHP code. Read on to understand what these T(0) operands in the opcodes mean.

Executing the opcodes: The generated opcodes are executed one by one. The listing below, produced by vld, shows the details of every opcode the engine will execute (with vld.execute=0 the opcodes are dumped without running the script):

  ~ sabhinav$ php -d vld.active=1 -d vld.execute=0 -f debug.php
  Branch analysis from position: 0
  Return found
  filename:       /Users/sabhinav/Workspace/interview/facebook/peaktraffic/debug.php
  function name:  (null)
  number of ops:  13
  compiled vars:  !0 = $a, !1 = $b
  line     #  op                           fetch          ext  return  operands
  -------------------------------------------------------------------------------
     2     0  EXT_STMT
           1  NOP
     5     2  EXT_STMT
           3  ASSIGN                                                   !0, 3
     6     4  EXT_STMT
           5  EXT_FCALL_BEGIN
           6  SEND_VAR                                                 !0
           7  DO_FCALL                                      1          'increment'
           8  EXT_FCALL_END
           9  ASSIGN                                                   !1, $1
     7    10  EXT_STMT
          11  ECHO                                                     !1
     8    12  RETURN                                                   1
  Function increment:
  Branch analysis from position: 0
  Return found
  filename:       /Users/sabhinav/Workspace/interview/facebook/peaktraffic/debug.php
  function name:  increment
  number of ops:  7
  compiled vars:  !0 = $a
  line     #  op                           fetch          ext  return  operands
  -------------------------------------------------------------------------------
     2     0  EXT_NOP
           1  RECV                                                     1
     3     2  EXT_STMT
           3  ADD                                              ~0      !0, 1
           4  RETURN                                                   ~0
     4     5* EXT_STMT
           6* RETURN                                                   null
  End of function increment.

The first table represents the main script, while the second represents the user-defined function increment. The line compiled vars: !0 = $a tells us that during execution the engine refers to $a internally as !0, which lets us relate this listing to [3] => ZEND_ASSIGN T(0) T(0) 3 from the parsekit output.
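
To make the relationship between the listing and the script concrete, here is a toy interpreter that steps through a hand-written, simplified version of the opcodes above. Everything in it (run_ops, the operand encoding, the arrays) is hypothetical and exists only for illustration. It does not mirror the Zend Engine's real data structures, and the EXT_* and NOP entries are left out.

```php
<?php
// Toy model only: operands are ['var', n] for !n, ['tmp', n] for ~n / $n,
// and ['const', value] for literals. This is not how the Zend Engine stores them.
function run_ops(array $ops, array $functions, array $args = [])
{
    $slots = ['var' => [], 'tmp' => []];
    $pending = []; // arguments queued by SEND_VAR for the next call

    // Read the current value of an operand.
    $read = function (array $operand) use (&$slots) {
        return $operand[0] === 'const' ? $operand[1] : $slots[$operand[0]][$operand[1]];
    };

    foreach ($ops as $op) {
        switch ($op[0]) {
            case 'RECV':     // receive the n-th argument into a variable
                $slots['var'][$op[1][1]] = $args[$op[2] - 1];
                break;
            case 'ASSIGN':
                $slots[$op[1][0]][$op[1][1]] = $read($op[2]);
                break;
            case 'ADD':
                $slots[$op[1][0]][$op[1][1]] = $read($op[2]) + $read($op[3]);
                break;
            case 'SEND_VAR':
                $pending[] = $read($op[1]);
                break;
            case 'DO_FCALL': // run the callee's ops and store its return value
                $slots[$op[1][0]][$op[1][1]] = run_ops($functions[$op[2]], $functions, $pending);
                $pending = [];
                break;
            case 'ECHO':
                echo $read($op[1]);
                break;
            case 'RETURN':
                return $read($op[1]);
        }
    }
    return null;
}

$functions = [
    'increment' => [
        ['RECV', ['var', 0], 1],
        ['ADD', ['tmp', 0], ['var', 0], ['const', 1]],
        ['RETURN', ['tmp', 0]],
    ],
];

$main = [
    ['ASSIGN', ['var', 0], ['const', 3]],   // $a = 3;
    ['SEND_VAR', ['var', 0]],
    ['DO_FCALL', ['tmp', 1], 'increment'],
    ['ASSIGN', ['var', 1], ['tmp', 1]],     // $b = increment($a);
    ['ECHO', ['var', 1]],                   // echo $b;
    ['RETURN', ['const', 1]],
];

$result = run_ops($main, $functions); // echoes 4
```

Reading the $main array next to the first vld table shows how each statement of the script maps onto one or more operations.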

The tables also report the number of operations (number of ops: 13), which can be used as a rough measure when benchmarking and tuning the performance of your PHP script.

If the APC cache is enabled, it caches the opcodes, thereby avoiding repeated lexing, parsing and compiling every time the same PHP script is called.

3 PHP extensions providing an interface to the Zend Engine: Below are 3 very useful PHP extensions for geeky PHP developers (especially helpful for PHP extension developers).

  • Tokenizer: The tokenizer functions provide an interface to the PHP tokenizer embedded in the Zend Engine. Using these functions you may write your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.
  • Parsekit: These parsekit functions allow runtime analysis of opcodes compiled from PHP scripts.
  • Vulcan Logic Disassembler (vld): Provides functionality to dump the internal representation of PHP scripts. See the homepage of the VLD project for download instructions.
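
As a small illustration of the kind of source-analysis tool the tokenizer makes possible, the sketch below lists the named functions declared in a piece of PHP source. The helper find_function_names is hypothetical and deliberately simplified: it assumes the name follows the function keyword as a T_STRING token and ignores closures and other edge cases.

```php
<?php
// Collect the names of functions declared in $source (simplified sketch).
function find_function_names(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $names = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // Look past whitespace and a by-reference "&" for the function name.
        for ($j = $i + 1; $j < $count; $j++) {
            $next = $tokens[$j];
            $text = is_array($next) ? $next[1] : $next;
            if ((is_array($next) && $next[0] === T_WHITESPACE) || $text === '&') {
                continue;
            }
            if (is_array($next) && $next[0] === T_STRING) {
                $names[] = $text;
            }
            break; // anything else (e.g. "(" of a closure) ends the search
        }
    }
    return $names;
}

// The sample script from this post, without the closing tag.
$sample = <<<'PHP'
<?php
function increment($a) {
return $a+1;
}
$a = 3;
$b = increment($a);
echo $b;
PHP;

$found = find_function_names($sample);
print_r($found);
```

Because the check compares against the T_FUNCTION and T_STRING constants, it does not depend on the token numbers of a particular PHP version.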

Hope this is of some help for PHP geeks out there. Enjoy!
