ABIS Infor - 2011-08

SAS and the adaptation to the RDBMS world

Koen De Backer (ABIS) - 30 June 2011

Abstract

In most companies data are held in various formats, not just SAS. The relational databases (DB2, Oracle, SQL Server,...) that we use, have their own peculiarities that may not always be obvious.

"Proc SQL" was developed by SAS to make use of some of the powerful components of SQL. The "Proc SQL" procedure available in SAS can minimise coding and maximise resource usage, particularly when working with very large data collections.

Using "libname" for a database and working through "Proc SQL" doesn't mean your code is going to be handled as if running against a SAS data source. There are the usual design differences between SAS data sets and RDBMS tables, coding considerations, and the "libname" option versus "SQL Pass-Through".

Design differences between SAS data sets and relational tables

Data sets are known for the lack of methodology in creating and documenting them. This is partially due to the ease of creation (get some disk space, the write access to go with it and build a SAS data set). Structure can very easily be changed.

On the other hand, most RDBMSes are very strict on what goes into the database, the access people have and the procedure to add new structures and elements. The latter environments usually have good documented data models so that users can understand the relationships, also data element definitions and a central repository to go with this.

Getting relational

The terminology is different for a data set versus a table, data sets have variables and observations (although SAS tries to change to the relational terms), tables have columns and rows.

As far as data structure is concerned, SAS data sets very often have the information you look for in just one or two sources. In an RBMS, information will probably be 'normalised' and spread over many tables.

Any column in a data table can serve as an identifying (key) variable, a.k.a. "primary key" consisting of one or more columns. This key has to distinguish any one row from any other row in the table. The database system guarantees uniqueness of a primary key value. This rule enables the system to link unambiguously to a single row in a data table.

Tables may also contain columns that link to columns in other tables. These links have special meanings in relational systems. The so-called "foreign key" reference requires the special protection of a "referential integrity" constraint.

Something you may not be used to is the use of "index"-es. The original SAS data sets don't very often have them. An index may be used to define the primary key, enforce referential integrity or make retrieval of a subset easier.

SAS fully supports the relational model

Overview of SAS relational features in a nutshell:

  • Base SAS
  • Library (database), data set (table), views, indexes, integrity constraints, data dictionary views
  • Proc SQL
  • SAS ODBC Driver
  • SAS/SHARE and SAS/SHARE*NET
  • Proc Server and Proc Operate
  • Lock management
  • Authentication
  • SAS/ACCESS
  • Access to data from various sources using native data engine or ODBC

SAS will let you choose the technique to retrieve/update (relational) data

Although currently SAS delivers the features to support the relational model, for most companies SAS is not their primary relational data store. Using SAS together with a external relational data store implies the use of an effective link with that same data store. Although many scenarios can be built, there are two basic methods to access a database through SAS/Access.

First there is the method based on "SQL Pass-Through" through which SAS passes the SQL code directly to the database.

Second we have the "libname" method which assigns a SAS name (libref) to a database as if it were a SAS library. All statements are then translated into SQL, be it not always with the same satisfying quality. It also lets you query combinations of tables and datasets with "Proc SQL" although not without some considerations based on knowledge of the table structures, data quantities and some caution with complex sql queries.

Conclusion

Implementing data models should help in setting up more robust (consistent) data collections, SAS can do this for you. The SAS "relational features" can be used to improve existing SAS data library structures.

If SAS is not the relational data store, but holds the interface to get at the data, then remember to keep a check on the complexity of your statements, decide when to use "libname" or "Pass-Through" and be wary of statements that combine objects from both worlds.

The course: "SAS programming part 2: Digging deeper" (see http://www.abis.be/html/en0886.html) explores this topic in more depth.