SQL for BI and Data Science

During this training, you will learn how to use SQL for analysing data. It may come as a surprise that so-called "in-database" analytics is used much less than it could be: most often, the (production) data is first copied to an analytics platform or BI tool to do the "data science" there. For large data volumes, or for data "in the cloud", this is not the best choice since precious time is lost by copying the data. This course will teach you how you can perform data analytics in the database itself. The learned techniques will of course also be applicable on data which has been copied, to an environment that supports SQL, e.g. to a data warehouse (DW), or to a Big Data environment like Hadoop (with Hive) or Spark.

After a short introduction on data warehouses, data lakes, business intelligence (BI) and Data Science topics, we dive into a thorough discussion on the syntactic possibilities offered by SQL. More specifically, statistical, analytical and OLAP functions will be treated, as well as the use of SQL for ETL, for data exploration, and (if applicable) for MapReduce. The most important ones are individually discussed and demonstrated; realistic labs should allow the participant to get an idea of the applicability of these SQL functions and options.

The SQL syntax covered in this course is platform independent. When relevant, platform specific syntax will be pointed out with respect to Oracle, Db2, SQL Server and Hive/Spark.

At the end of this training, participants will be able to:

  • understand how SQL (ie. the SELECT instruction) can be used for reporting and explorative purposes in the context of DW, BI, Big Data and Data Science;
  • write effective SQL from a DW / BI / Big Data perspective;
  • write problem-based SQL statements, and evaluate possible alternative SQL constructs for their effectiveness

Schedule

datedur.lang.locationprice 
03 Jun2Leuven 1200 EUR (excl. VAT)
03 Jun2web based 1200 EUR (excl. VAT)
SESSION INFO AND ENROLMENT

Intended for

This course targets everyone involved in the development of business reports, by means of SQL statements on relational database systems or Hadoop (Hive) or Spark SQL.

Background

In order to have the full benefit of this course, it is important to be really proficient with SQL (level SQL workshop and being familiar with complex queries; having followed the SQL advanced course is an advantage). Furthermore it's important to be aware of some elementary statistical terminology (see e.g. Statistics fundamentals).

Main topics

  • Data warehouse, business intelligence, Data Science -- positioning
  • Extract / Transform / Load (ETL)
  • the dimensional model
  • Statistics and analytics -- support in SQL
  • Purpose and context -- multi-platform support.
  • AVG, CORRELATION, COUNT, COVARIANCE, MAX, MIN, RAND, STDDEV, SUM, VARIANCE, MEDIAN, ...
  • Online analytical processing in SQL
  • Purpose and context -- multi-platform support.
  • RANK, DENSE_RANK, ROW_NUMBER, PARTITION BY, ORDER BY, ROWS, RANGE, GROUP BY, GROUPING SETS, ROLLUP, CUBE, OVER
  • WINDOWING functions: SUM ... OVER ..., LEAD; LAG and similar aggregation functions, and their application in a.o. trend analysis
  • Modelling: histograms; linear regression; normality testing; trend analysis; confidence intervals; ...
  • Text analytics; regular expressions

Training method

Class training alternated with exercises. These can be done on Db2, Oracle, SQL Server, MariaDB, Hive, or Spark. (Please indicate your choice on the enrolment form.)

Duration

2 days.

Course leader

Peter Vanroose, Arnout Veugelen.

Reviews

Zeer fijne instructeur/docent Neemt de tijd en weet de stof helder over te brengen.

 
  (, )

Ik ben zeer tevreden

 
  (, )

Het was een erg interessante cursus en Peter was een kundige docent.

 
  (, )

très bon

 
  (, )

excellente

 
  (, )

Also interesting

Enrollees for this training also took the following courses:


SESSION INFO AND ENROLMENT