ABIS Infor - 2012-10

MySQL: stored procedures and SQL/PSM

Peter Vanroose (ABIS) - September 2012

Abstract

This contribution is a brief tutorial on stored procedures in MySQL. At the same time, it's a report on our (ABIS') experiences with porting stored procedures written in SQL PL (DB2) to MySQL. To be successful, such a port requires some sort of standardised language across platforms. Hence the introductory section on the SQL/PSM standard.

SQL and procedural program logic

The SQL language is designed primarily to query (and update) data in a relational database. It is a 4th generation language which means that it is declarative (not imperative like 3rd generation languages such as Java). Data manipulation statements SELECT, DELETE, INSERT, UPDATE don't need variables nor explicit iterations or other control flow constructs.

Sometimes, however, it makes sense to combine pure data manipulation (DML) with procedural logic and variables, e.g. to perform some non-database action on every row of a result table, or to store intermediate results for later use, or to enforce a certain chronology or interdependency of several DML statements (e.g. insert into two tables, then commit).

In a relational database, stored procedures are the natural container for procedural program logic. They allow a mix of DML statements with sequential logic and flow constructs (like IF ... ELSE or WHILE ...) and they can be called through the database server from any SQL client since "CALL procedure_name" is a standard SQL statement. Question remains: which 3rd generation programming language to use to implement stored procedures?

Unfortunately, until about 10 years ago there was no standard to be followed, so all RDBMS platforms had (and still have) their own proprietary solution. Oracle was one of the first systems to integrate SQL and procedural logic through PL/SQL which is syntactically based on Ada (and Pascal). Microsoft's SQL Server borrowed the syntax of Sybase's Transact-SQL. IBM's DB2 was relatively late with incorporating procedural logic into SQL (only in the late 1990s) and came up with their SQL PL. PostgreSQL first developed PL/pgSQL (a language similar to Oracle's PL/SQL) but later switched to PL/PSM. Last in the row, MySQL waited until version 5 (released in 2005) to support stored procedures.

By that time, the SQL:2003 ISO & ANSI standard was ready. It added (compared to the earlier SQL:1999 standard) a chapter on procedural program logic, called SQL/PSM, "Persistent Stored Modules". Luckily the MySQL 5.0 development team wisely decided to follow the ISO/ANSI standard w.r.t. their programming language for stored procedures (and, by the way, also for triggers, user defined functions, and events). Actually both the DB2 (SQL PL) and PostgreSQL choices are also fully compatible with the SQL:2003 standard SQL/PSM. Unfortunately PL/SQL (Oracle) and t-SQL (SQL Server) aren't.

Stored procedures, triggers and user defined functions written for DB2 should thus work on MySQL without any modification, and vice versa. At least, if those procedures stick to the SQL/PSM standard syntax and don't use any DB2 or MySQL nostandard extensions. But... is this really the case? Can you just migrate any DB2 stored procedure to MySQL? We made the exercise at ABIS, so read on to learn what we found out!

SQL/PSM in action in MySQL

Apparently, a lot of MySQL users (and especially database designers) seem to have missed some of the recent evolutions and stayed in the version 4 era: new features of version 5 (although available since more than 7 years) seem to remain unused -- and unknown! The MySQL procedural objects (and the SQL/PSM syntax) seem to be amongst the best kept secrets of the last decade! Hopefully this article will get you "triggered" to start exploring this wonderful world ...

SQL/PSM can only be used inside a procedural object: a stored procedure, a trigger, a function, or an event. I'll only look at stored procedures here. They are useful in an RDBMS context since they allow database designers and administrators to encapsulate certain "business logic" into a database object which sits inside the server and can be called from any SQL client program. This avoids code duplication on the clients, avoids unnecessary network data traffic between server and client, allows for better data security, and simplifies data and application design with guaranteed data consistency.

What's a procedure? Well, it should have a name, a (possibly empty) list of parameters, and a "body" which contains one or more SQL statements. These can be a mix of "traditional" DML and additional SQL/PSM syntax for local variables, error handling, loop control, and IF conditions. The following SQL creates a procedure with some of these ingredients:

CREATE PROCEDURE proc1      /* name */
(IN parm1 INT, OUT parm2 VARCHAR(32)) /* parameter list */
LANGUAGE SQL /* optional clause: language used for the body */
BEGIN /* start of a block of statements */
DECLARE var1 VARCHAR(10); /* local variable */
IF parm1 = 32 THEN /* start of IF; condition syntax as with "WHERE" */
SET var1 = 'ABIS'; /* this is the "assignment" statement */
ELSE
SET var1 = 'unknown';
END IF; /* end of IF composite statement */
INSERT INTO table1 VALUES (var1);/* SQL DML statement */
SET parm2 = 'Procedure proc1 ran successfully';
END /* end of the body; matches the "BEGIN" 9 lines ago */

The names used in this example (proc1, parm1, parm2, var1, table1) would of course be not well chosen for a "real life" procedure; I'll use such "generic" names to stress the fact that my examples are not meant to do useful things but only try to illustrate the SQL/PSM syntax in MySQL. Also notice the comments (/* ... */): use this to document your procedures!

Once a procedure named "proc1" is created, it can be called from any client by issuing an SQL statement of the form

       CALL proc1(32, @var);

or with any other numeric constant or expression between the parentheses instead of 32. Since the second parameter in this example is a "return" field (keyword "OUT"), the way to specify it in the CALL statement depends on the client environment in use. The "@var" syntax would be the way to specify it when using the "mysql" command-line client interface. When using ODBC or JDBC, or from within PHP, a question mark would be needed as parameter marker:

       CALL proc1(32, ?);

When the CALL succeeds, the statements inside the procedure body should have been executed (according to the program flow); in the example, the two side effects should be that table1 now has an extra row containing 'ABIS', and that the client receives the text 'Procedure proc1 ran successfully' into the parameter marker for the second parameter.

Variables inside a procedure are invisible to the caller: their scope is limited to the procedure body. Actually, scope is limited by any BEGIN ... END block enclosing the variable declaration. The following procedure has two such nested blocks:

CREATE PROCEDURE proc2 (OUT p1 CHAR(5), OUT p2 CHAR(5))
BEGIN
DECLARE x1, x2 CHAR(5) DEFAULT 'outer'; /* this is how variables get initialized */
BEGIN /* start of inner block
DECLARE x1 CHAR(5) DEFAULT 'inner';
SET p1 = x1;
END; /* notice the ";"
SET p2 = x1;
END

Two of the three variables in this example procedure are both named x1 which is perfectly legal: the scope of the "inner" x1 is limited to the inner BEGIN ... END block, while that of the outer x1 and x2 are limited to the outer BEGIN ... END, i.e., the full procedure. So x2 is still "visible" inside the inner block, while the outer x1 is of course made invisible by the declaration of an identically named variable inside that inner block.

Needless to say that, when called, this procedure returns the text contents 'inner' and 'outer' as the two parameter marker values, in that order.

Loops and cursors

The most often needed DML statement is SELECT; but since it generally returns a result table, while variables and parameters can only be "scalar", a cursor will be needed to iterate through the returned table rows one by one. Suppose we want to obtain the comma-separated, concatenated values of the first 100 rows of a two-column table t1 (column lengths 32):

CREATE PROCEDURE proc3 (OUT p1 VARCHAR(3500), OUT p2 VARCHAR(3500))
BEGIN
DECLARE x1, x2 CHAR(32);
DECLARE n INT DEFAULT 0;
DECLARE c1 CURSOR FOR SELECT * FROM t1; /* cursor declaration */
SET p1 = 'Col1: '; SET p2 = 'Col2: '; /* needed for "concat" to work properly */
OPEN c1; /* cursor query is executed now */
WHILE n < 100 DO /* start iteration */
FETCH c1 INTO x1, x2; /* one result row is returned */
SET p1 = concat(p1, rtrim(x1), ', '); SET p2 = concat(p2, rtrim(x2), ', ');
SET n = n + 1;
END WHILE; /* end iteration */
CLOSE c1; /* clean up the cursor area */
END

This example illustrates the WHILE syntax, and at the same time the typical use of a cursor: DECLARE, OPEN, FETCH (repeatedly), and CLOSE. Note that cursor declarations must be placed after all variable declarations but before any executable statements. The only atypical aspect is the fact that the loop in this example doesn't traverse the cursor until the end of the result set but stops after 100 iterations. Stopping at end-of-result requires error handling: since attempting to FETCH after an end-of-table is an "exception" returned by the database engine, the procedural logic should explicitly catch that exception and finish the WHILE loop at that point. Here is a typical way to do this with SQL/PSM:

CREATE PROCEDURE proc4 (OUT p1 VARCHAR(3500))
BEGIN
DECLARE x1 CHAR(32);
DECLARE end_indicator INT; /* implicitly initialized to NULL */
DECLARE c1 CURSOR FOR SELECT col1 FROM t1;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000'
SET end_indicator = 1; /* what to do when this exception occurs */
SET p1 = 'Col1: '; OPEN c1; FETCH c1 INTO x1;
WHILE end_indicator IS NULL DO /* handler sets condition to false */
SET p1 = concat(p1, rtrim(x1), ', ');
FETCH c1 INTO x1;
END WHILE;
CLOSE c1;
END

"Continue handlers" are "triggered" when an exception is thrown from within the database engine. Every possible type of exception has an associated SQLSTATE which is standardized across RDBMS platforms; SQL state "02000" stands for "end-of-data". The procedural logic will continue where it left off when an exception is catched with a handler, but only after the handler body statement is executed. Such a statement can of course be a BEGIN...END block, if needed.

By default, if no continue handler would have been declared, the thrown exception would have been passed on to the caller of the procedure, but with an implicit rollback of whatever happened inside the procedure. This would leave p1 at NULL.

MySQL's SQL/PSM versus DB2's SQL PL

To find out how "standardized" the MySQL implementation of the SQL/PSM specification really is, we tried to port all our DB2 stored procedures to MySQL. First, we ran into DB2 non-standard extensions of SQL PL that were used in our existing procedures. Actually, there was only one such extension that was used systematically in all our DB2 procedures: the SQLCODE variable (a DB2 variant of SQLSTATE) into which the database engine writes a particular value for each exception, but without causing the procedure to be interrupted, even if no handler is present.

So we first rewrote our DB2 procedures --in DB2--, making sure not to use SQLCODE anymore. Instead we had to introduce the corresponding continue handler(s), thereby introducing an additional "flag" variable (like the end_indicator variable in the example). This worked out fine: the new procedures ran perfectly in DB2.

New attempt to blindly run the DB2 "create procedure" statements on MySQL. Now we observed some syntactic differences, luckily not in the body of the procedures but in the optional clauses specified just before the body: DB2 allows options like "LANGUAGE SQL", "QUALIFIER", "DISABLE DEBUG MODE", "WITH EXPLAIN", "ISOLATION LEVEL", "VALIDATE BIND", and "DATE FORMAT EUR". For use with MySQL, "VALIDATE BIND" had to be replaced by "SECURITY DEFINER"; the alternative "VALIDATE RUN" would have to be replaced by "SECURITY INVOKER". For the other options, MySQL uses "environmental settings" which it inherits from the environment executing the CREATE statement. For example, the "ISOLATION LEVEL" setting is found in the tx_isolation session variable of the MySQL client. And the DATE FORMAT setting corresponds to MySQL's date_format session variable.

As with DB2, all relevant settings are stored together with the procedure, in the database catalog, hence will be re-activated when someone calls the procedure, thereby temporarily overriding that user's corresponding session variables.

The only option clause which we could keep was the "LANGUAGE SQL": required in DB2, optional with MySQL.

After these small modifications, the CREATE PROCEDURE statements from DB2 worked on MySQL! But did they run properly? To verify this, we had to create identical tables on both systems, have the same test data in both, and migrate then run the unit test programs from DB2 on MySQL. And indeed: it turned out that MySQL worked exactly as expected!

Conclusion

For a "port" of environments (applications, tables, data, ...) to be successful, standardisation is crucial. SQL (and especially its DML statements) has been standardised for quite some time now, but the SQL/PSM standardisation is relatively new.

Three relational database systems have embraced this new standard as their preferred language for implementing triggers and stored procedures: DB2, PostgreSQL, and MySQL.

Our experience with porting stored procedures from DB2 to MySQL shows relatively positive: it suffices to get rid of known non-standard constructs (like the use of SQLCODE with DB2), and to understand the way to configure the environment, in order to successfully port a procedure from one platform to the other.