Saturday, April 23, 2011

Oracle PL/SQL language pitfalls

Oracle PL/SQL is an procedural language for Oracle relational database and tools like Oracle Forms and Reports. The language was based on the Ada programming language and even uses a variant of Descriptive Intermediate Attributed Notation for Ada (DIANA). I've been developing in the language for the past 4 years and wanted to share some pitfalls I stumbled upon during this time. Since the language is supposed to be based on Ada I will compare some cases to their counterparts in the Ada language - skipping the ones without obvious mappings (like SQL only code). We will start with a trivial problem and increment to more sophisticated ones.

PL/SQL and Ada are both block structured languages where each block can be divided into roughly three sections. The declaration for specifying block-local variables, sub-program units, records etc. The code section containing the algorithmic code and an exception section where exceptions from the current block can be caught and handled. Further language features like package specifications, object oriented programming etc. will not be discussed in this post as they are not required to demonstrate any of our problems. Single line comments in Ada and PL/SQL start with double dash (--), multi-line comments will not be used by us.

An example PL/SQL block:

<<example_block>> -- optional label
declare
  -- variable declarations
  -- sub program units (functions/procedures)
begin
  -- code
exception
  when exception_name then -- handle a named exception
    handle_exception();
  when others then -- catch all exceptions handler
    handle_all_exceptions();
end example_block;


The usage of 'end example_block' is only valid if the block was <<labeled>> otherwise just use 'end;'.

Since we got our basics down, we can start with our first problem.

Weak compile time checks

The longer I work on large legacy code bases the more I appreciate errors caught during compilation.

In the following example, a VARCHAR2 variable is declared with a length equal to one. You can think about the VARCHAR2 type as string. In the declaration section a constant literal 'abc' is assigned to this string which exceeds it's capacity.

set serveroutput ON

DECLARE
  PROCEDURE test_literal IS
    v_string VARCHAR2(1) := 'abc';
  BEGIN
    dbms_output.put_line( v_string );
  END;
BEGIN
  dbms_output.put_line('Before Call');
  test_literal();
  dbms_output.put_line('After Call');
END;

/

From the above code, it is obvious that there is a bug which can be determined on compilation - sadly PL/SQL compiles the code without any warnings.
We have declared a local procedure in this block to demonstrate that the issue really arises on runtime and not during compilation.
Save the code to literal_string.sql and run it in a SQL Plus session.


SQL> @C:\blog\literal_string.sql

Before Call

ORA-06502: PL/SQL: numeric or value error: character buffer too small
ORA-06512: at line 4
ORA-06512: at line 10


As you probably guessed, this resulted in a run-time exception being raised (ORA-06502). The same problem extends to numbers exceeding the defined precision (ie. NUMBER(2) assigned 100 as a value). Since the value is literal it would be nice if the language at least reported it during compilation. Let's see how a similar example behaves in Ada.

with Ada.Text_IO;

procedure Literal_String
is
  v_string : String (1 .. 1) := "abc";
begin
  Ada.Text_IO.Put_Line ( v_string );
end Literal_String;


Save the file to literal_string.adb and compile with gnatmake -gnato literal_string.adb.

mulander@bunkier_mysli:~/code/blog/plsql-pitfalls$ gnatmake -gnato literal_string.adb
gcc -c -gnato literal_string.adb
literal_string.adb:5:33: warning: wrong length for array of subtype of "Standard.String" defined at line 5
literal_string.adb:5:33: warning: "Constraint_Error" will be raised at run time
gnatbind -x literal_string.ali
gnatlink literal_string.ali


The -gnato compiler flag enables run-time overflow checks but it doesn't affect this example.
The compiler nicely warned us that the length of the assigned string exceeds the allowed range for the variable declared at line 5. The compiler even informed us which exception will be raised on run-time. Adding a -we flag would treat all warnings as errors and wouldn't even produce an executable. We didn't supply the flag so let's run the program.


mulander@bunkier_mysli:~/code/blog/plsql-pitfalls$ ./literal_string

raised CONSTRAINT_ERROR : literal_string.adb:5 length check failed
mulander@bunkier_mysli:~/code/blog/plsql-pitfalls$


We got our runtime exception - exactly the one we were warned about. This may seem trivial but trust me - things like this can sometimes save hours spent on handling a runtime error on a production system.

Ada also has a nice syntax for string literal declarations that infers the length from the assignment.


with Ada.Text_IO;

procedure Literal_String_2
is
v_string : String := "abc";
begin
Ada.Text_IO.Put_Line ( v_string );
end Literal_String_2;


Save this to literal_string_2.adb and compile with gnatmake -gnato literal_string_2.adb. I'll skip the output as the program behaves correctly and no exceptions are raised.

What happens if we assign a literal exceeding the inferred length?


with Ada.Text_IO;

procedure Literal_String_3
is
  v_string : String := "abc";
begin
  v_String := "abcd";
  Ada.Text_IO.Put_Line ( v_string );
end Literal_String_3;


Save the content to literal_string_3.adb and compile with gnatmake -gnato literal_string_3.adb.


mulander@bunkier_mysli:~/code/blog/plsql-pitfalls$ gnatmake -gnato literal_string_3.adb
gcc -c -gnato literal_string_3.adb
literal_string_3.adb:7:15: warning: wrong length for array of subtype of "Standard.String" defined at line 5
literal_string_3.adb:7:15: warning: "Constraint_Error" will be raised at run time
gnatbind -x literal_string_3.ali
gnatlink literal_string_3.ali


Again we will skip the output. The proper Constraint_Error exception is raised like the compiler promised. You can't imagine how happy I am seeing this behavior. At this point we can move on to our second problem.

Implicit type conversions
Implicit type conversions are often hard to track down and can lead to significant problems during application run-time. One of the most severe problems caused by them are hard to predict choices when overloaded functions/procedures are involved - this time it's not I what I want to focus on. The specific bug I encountered is really interesting because of the innocent code it appears in and an exception that could be avoided with a more strict compilation phase.


set serveroutput on

DECLARE
  n_first NUMBER := 1;
  n_second NUMBER := 2;
BEGIN
  dbms_output.put_line('Before faulty code');
  dbms_output.put_line('n_first + n_second ' || n_first + n_second );
  dbms_output.put_line('After faulty code');
END;

/


Imagine for a moment that the middle output statement is a debug log added and left by a programmer in an important part of the system. Let's run the code in SQL Plus:


SQL> @C:\blog\concat.sql

Before faulty code

ORA-06502: PL/SQL: numeric or value error: character to number conversion error
ORA-06512: at line 7


Why did this code result in a run-time error? My guess is operator precedence rules and implicit type conversions. From the linked documentation page we can see that +, - and || (addition, subtraction and concatenation operators) have equal precedence and as stated in the document such operators are applied in no particular order. So what happened?

The following line:

dbms_output.put_line('n_first + n_second ' || n_first + n_second );


Can be further simplified to:

procedure_call( string_type CONCAT number_type ADD number_type )

We know that before the function call the operators will be evaluated so we can drop the call:

string_type CONCAT number_type ADD number_type

Now, since both concatenation and addition are of equal precedence we can start with an assumption that concatenation was performed first (leftmost operation). So number_type was converted to a string and concatenated to our first string_type.

string_type ADD number_type

Now, we are left with an addition operation of a string and a number. Since adding a number to a string doesn't make any sense for the compiler an implicit string to number conversion happens. This operation fails with an obvious run-time exception. A really nasty bug hard to find in such an innocent statement.

So how does Ada handle this problem? Easy, no implicit type conversions.


with Ada.Text_IO;

procedure Concat is
  n_first : Integer := 1;
  n_second: Integer := 2;
begin
  Ada.Text_IO.Put_Line ( "n_first + n_second" & n_first + n_second );
end Concat;


Save the file to concat.adb and compile with gnatmake -gnato concat.adb


mulander@bunkier_mysli:~/code/blog/plsql-pitfalls$ gnatmake -gnato concat.adb
gcc -c -gnato concat.adb
concat.adb:7:47: invalid operand types for operator "&"
gnatmake: "concat.adb" compilation error


The file doesn't even compile. The concatenation operator & expects a string as it's left and right value. The language will not attempt any implicit conversions even in a simple case like string_type CONCAT number_type. If you want to achieve concatenation of numbers with strings in Ada you need to explicitly cast them:


with Ada.Text_IO;

procedure Concat is
  n_first : Integer := 1;
  n_second: Integer := 2;
begin
  Ada.Text_IO.Put_Line ( "n_first + n_second"
    & Integer'Image(n_first + n_second) );
end Concat;


This time the code compiles without errors and runs with no exceptions raised. It's true that mitigating the problem in PL/SQL is also simple. All one needs to do is to surround the addition of variables with additional parentheses. I still prefer being informed on compilation - even with just a warning. PL/SQL type conversion problems are not limited to simple cases like this but extend to built-in functions. The TO_DATE function when converting a string to a date type can result in an exception being raised which only depends on two variable. The programmer forgetting to specify the string format and the national language support database level variable set to a value which expects a string in a different format than in the provided variable. Bugs like this are annoying to find.

We are done with problems that can be compared to Ada directly. The following sections will only relate to PL/SQL.

Too_Many_Rows exception
Generally the problem with the too_many_rows exception is the moment it's raised. It's best to start with an example:

set serveroutput on

DECLARE
  n_some_value NUMBER := -1e10;
BEGIN
  BEGIN
    -- The following rowset will be returned by
    -- this select statement:
    -- 1
    -- 2
    -- This will raise a too_many_rows exception
    SELECT LEVEL
      INTO n_some_value
      FROM dual
   CONNECT BY LEVEL <= 2

      ; 
  EXCEPTION
    WHEN too_many_rows THEN
      dbms_output.put_line( 'too_many_rows exception' );
      NULL; -- Typical wrong way to handle this case
  END;
  -- So what's the value of n_some_value?
  dbms_output.put_line( 'n_some_value='|| n_some_value );
END;
/

n_some_value is being initialized to a constant -1e10 then a select into statement performs a query which always returns two rows of data and tries to save it into n_some_value. Raising the too_many_rows exception is of course correct in this case. The problem is at which point is it raised? Try to guess the value of n_some_value printed out at the end of the code and compare it to the actual result:


SQL> @C:\blog\too_many_rows.sql

too_many_rows exception
n_some_value=1

PL/SQL procedure successfully completed


You probably expected to see n_some_value=-10000000000 on the output right? We got 1 instead. What happened here is the query got executed, the first row was fetched and stored in the variable n_some_value. Since there is more data on the cursor (select into is automatically converted by the database into a cursor fetch) the code is attempting to fetch the second row and in a select into statement only single row queries are allowed so the too_many_rows exception gets raised. Since our variable was set during the first fetch it no longer holds the initialization value of -1e10. This is an easy bug to fix (reinitialize the variable in the exception handler) but it's hard to track down if the error handling code was incorrect or missed this reinitialization in the first place.

Code optimization
The title of this section might be a bit misleading. I love the fact that the compiler tries hard to optimize away unnecessary code - the problem is that it's hard to sometimes guess what could get optimized away and in specific use cases this can change actual program behavior leading to Heisenbugs.
Please keep in mind, that the only time I hit this bug was a really bad programming approach so the language can't really be blamed for making a wrong decision here but still at least it could be a bit more predictable.

SET serveroutput ON

DECLARE
  v_date_string VARCHAR2(8) := '20110229'; -- only 28 days in feb 2011
  b_valid_date  NUMBER := 1;
  invalid_date  EXCEPTION;
  -- introducing ORA-01839 exception by name
  pragma exception_init(invalid_date,-01839);
BEGIN
  BEGIN
    IF TO_DATE ( v_date_string, 'yyyymmdd' ) =
       TO_DATE ( v_date_string, 'yyyymmdd' )
    THEN
      NULL; -- do nothing
      --dbms_output.put_line('oops');
    END IF;
  EXCEPTION
    WHEN invalid_date THEN
      b_valid_date := 0;
  END;
  -- an important block of logic executed only if the date was valid
  IF b_valid_date = 1
  THEN
    dbms_output.put_line( '... executing ...' );
  ELSE
    dbms_output.put_line( '+++ ERROR - invalid date +++' );
  END IF;
END;

/


Save this snippet to opt_away_simple.sql and run it in SQL Plus.


SQL> @C:\blog\opt_away_simple.sql

... executing ...

PL/SQL procedure successfully completed

SQL>


In this and all of the following snippets we expect to see +++ ERROR - invalid date +++ as the output of the program. Here is the convoluted logic that was supposed to guarantee date validation. The oracle TO_DATE(string,format) function can convert a string in date format to a date type. If the date matches the format but the date is incorrect (ie. 29 Feb 2011) an ORA-01839 exception will be raised which stands for date not valid for month specified. In the exception handler a control variable is set to 0 marking that the date was not valid. Now try to uncomment the put_line statement inside the IF control structure. Running with the uncommented code will give us the expected behavior. See how nasty that would be to debug?
To the point. Why is the code behaving this way? The PL/SQL compiler decided that the whole begin/exception/end block containing the if statement and exception handling code is not needed hence it was optimized away during compilation. The code literally does not exist after compilation. Here is my guess why:
The TO_DATE() function has no side effects. It doesn't modify any tables, doesn't return any values that get stored to a variable and doesn't insert any rows to tables. The only implicit side effect it has is a possible exception being raised if the provided date string is incorrect. The condition in the if statement also has no side effects except for the possible exception from TO_DATE() and at both sides of the equality comparison the input parameters to TO_DATE() are exactly the same. Since the function is deterministic (always returns the same output for a given set of input parameters) for the compiler this look like code that always evaluates to truth. The code section for the if statement is NULL which stands for a do nothing operation. In this case the whole if is a do nothing operation so the compiler decides to remove it. Since there is no longer any code that can raise the invalid_date exception the control flag will not be set hence our unexpected behavior.

Now here is the really funny part. Let's modify the code to use a boolean variable instead of a number. No other changes to the code and all of the above still remains valid (my reasoning for optimizing away the code).

SET serveroutput ON

DECLARE
  v_date_string VARCHAR2(8) := '20110229'; -- only 28 days in feb 2011
  b_valid_date  BOOLEAN := TRUE;
  invalid_date  EXCEPTION;
  -- introducing ORA-01839 exception by name
  PRAGMA exception_init(invalid_date,-01839);
BEGIN
  BEGIN
    IF TO_DATE( v_date_string, 'yyyymmdd' ) =
       TO_DATE( v_date_string, 'yyyymmdd' )
    THEN
      NULL; -- do nothing
      --dbms_output.put_line('oops');
    END IF;
  EXCEPTION
    WHEN invalid_date THEN
      b_valid_date := FALSE;
  END;
  -- an important block of logic executed only if the date was valid
  IF b_valid_date
  THEN
    dbms_output.put_line( '... executing ...' );
  ELSE
    dbms_output.put_line( '+++ ERROR - invalid date +++' );
  END IF;
END;

/


Save this snippet to opt_away_bool.sql and run it in SQL Plus.


SQL> @C:\blog\opt_away_bool.sql

+++ ERROR - invalid date +++

PL/SQL procedure successfully completed

SQL>


Now this is interesting. Changing just the type of our control variable made the code behave correctly - I really have no idea why in this case the code was not optimized away like when we used a number.
Here is an even more interesting example. Let's change the if statement checking the control variable from IF b_valid_date to IF b_valid_date = TRUE.

SET serveroutput ON

DECLARE
  v_date_string VARCHAR2(8) := '20110229'; -- only 28 days in feb 2011
  b_valid_date  BOOLEAN := TRUE;
  invalid_date exception;
  -- introducing ORA-01839 exception by name
  pragma exception_init(invalid_date,-01839);
BEGIN
  BEGIN
    IF TO_DATE( v_date_string, 'yyyymmdd' ) =
       TO_DATE( v_date_string, 'yyyymmdd' )
    THEN
      NULL; -- do nothing
      --dbms_output.put_line('oops');
    END IF;
  EXCEPTION
    WHEN invalid_date THEN
      b_valid_date := FALSE;
  END;
  -- an important block of logic executed only if the date was valid
  IF b_valid_date = TRUE
  THEN
    dbms_output.put_line( '... executing ...' );
  ELSE
    dbms_output.put_line( '+++ ERROR - invalid date +++' );
  END IF;
END;

/


Save this snippet to opt_away_bool_bad.sql and run it in SQL Plus.


SQL> @C:\blog\opt_away_bool_bad.sql

... executing ...

PL/SQL procedure successfully completed

SQL>


We are back to our incorrect behavior just with a single line change outside the scope of the optimized away block. Sadly I am not aware of any utilities that could help us review the decisions made by the compiler. The last two snippets will remain a mystery for some time.

Summary
I really despised the language at first but as time passed by I really started to like large parts of it. PL/SQL is not a bad language but it's a shame that even though in large part it was based on Ada, the language designers decided to skip some of it's features - especially the really strong type system. I can imagine that some of the gripes with the language remain in order to maintain backwards compatibility and that new language features are slowly being added in. I have a final suggestions for everyone working with a language that isn't considered the cool kid in the industry/community: try to understand the inner workings of the tool you use - it may bring new interest to an otherwise mundane task.