ABIS Infor - 2012-04

Know Thine Own Database

Steven Scheldeman (ABIS) - 12 March 2012

Abstract

It has become inconceivable for any modern company to work without a database. Of course, this can't be achieved without any knowledge of the the language to communicate with such a DB (SQL) and the actual software of the DB (DB2, Oracle, SQL Server, MySQL, ...). However, there's something that a lot of people tend to overlook. A profound knowledge of the actual software, the DB platform and SQL as a language, isn't the same as knowing your own database.

Designing your database: Pouring the foundations.

As we all know, the design of your database is vital with regards to it's efficiency. Starting with the preliminary analysis of the data as used by your company and ultimately arriving at the actual structures of tables and views, each step along this path will influence the manner in which your DB is used. These first steps, these foundations, will have an impact on every query. So it is essential in this phase that all blue-prints of your database are extensively documented.

Implementation & Maintenance: Constructing your castle.

Once you've decided upon the design, you need to translate this into actual physical and logical structures known as "The Database". Lots of things need to be taken into consideration: performance issues, memory usage, indexes, data integrity, triggers, procedures, ... and so forth. And all of these considerations can be implemented using SQL. We don't have to stress the need for in depth documentation during this phase. Omit the documentation and you will yourselves in unknown territory without a guide.

On top of it all, a database isn't static. The needs and activities of any company can and will change through time, and so will the demands we pose on our DB. Throughout time we keep on renovating our "castle": we add columns, tables, views. Some are no longer used, or might even get deleted entirely. In other words: our database is alive. And this demands a constant revision of our blue-print, the map which guides us in search of our data.

Querying the Data: a Treasure Hunt.

In the end, we arrive at the sole purpose of a DB: to retrieve and maintain the vital information of our company in an efficient and performant manner. And what do we see? For many end-users is the retrieval of data more like a treasure hunt than an actual easy-to-execute query. What can prevent this phenomenon?

The first step is a sufficient knowledge of SQL. We not solely talking about a syntactical knowledge, but also about a profound understanding of the execution order of a query. If you understand in which order a query is executed, you can influence the efficiency and performance of your query, with positive results.

A second step is the analysis of the demand for data. This is so essential that some companies employ a dedicated team just to analyse any question, before a single query is written. Let's be honest: all of us have posed questions which seemed clear and precise in our heads, but were in reality too vague, or too restrictive to produce answers that fitted our needs. A correct analysis of a demand for data is as essential for an efficient query, as any syntactical knowledge of SQL.

However, there is a third step which is often overlooked: a profound knowledge of your own database. A lot of companies considerd this a something of lesser importance, something that an end-user will pick up along the way. We beg to differ: such a knowledge is just as neccesary as a correct analysis, or a syntactical expertise in SQL. It is the map that will guid us on our treasure hunt.

We're convinced that such a map should include three essential parts: a schematical drawing indicating all tables (views) and the links between them, a syntactical description of all tables and columns and a semantical description of the same. The third part is just as essential as the first two, but sadly it's usually missing.

Nowadays there are a lot of GUIs which can generate such a schematical drawing, and provide without any problem a thorough syntactical description. This shouldn't come as a surprise since these two parts are based on the data which can be found in the catalogue, an intrinsic part of any DB-software. The semantical description however, can't be generated by the software, because it needs human input to create. The meaning of each column, of each number, each string found in a field must be documented, written out and explained ... and this takes time.

We'll repeat this to be totally clear: documenting each table and each column in such a manner that the meaning of each piece of data is crystal clear to anyone who reads this semantical description ("This number represents the actual number of this specific product in our warehouse"), takes a lot of time and effort. But it is an effort that pays. Once it is finished (and maintained), new end-users have an meaningful description of the DB which allows them to use the DB without any additional human support.The time gained in this manner more than makes up for the initial time and effort spent in generating that semantical description. It will have a profound impact on understanding the DB, the relations between data-items, and wil diminish the need for other employees to spent their time guiding new ones. That is to say, only if new employees are given leeway to study this description for a couple of days.

Customized training courses

When we observed these discrepancies in the way our own clients documented their databases, we decided to tackle these aspects of querying in the courses here at ABIS. Our SQL training goes beyond the pure syntactical aspect of the language. As in integral part of of the course, we spent time and effort to the order of execution of a query, to data demand analysis and to the logical structure of a given database. This is explored in stages: we start at the very basics to guide you to the very high end performance queries. And each of these courses can be tailored to your own company database.