SQL (or the stored procedure) can handle most of the database
computations. If the computations are complex or hard to deal in
SQL, we use another programming language to read data out of the
database to manipulate it. Such a programming language handles the
data read and manipulation with a simple script. So we call the
process the post-SQL scripting.

The scenarios that SQL is not good at handling include complex
set-based operations, order-based operations, associative
operations and multi-step computations, etc. Due to SQL’s
incomplete set orientation and lack of explicit set data type,
it’s almost impossible to reuse the intermediate sets generated
during the computation. The forced aggregate after each grouping
operation makes it impossible to use the post-grouping subsets. The
unordered-set-based SQL adopts a very roundabout way to handle
order-based computations, like inter-row (group) computations and
ranking operations. The language generates temporary sequence
numbers using JOIN(s) or subqueries, making the program hard to
write and slow to compute. Record reference is another SQL’s
incapability. The language uses a subquery or a JOIN statement to
express the relationship. Code becomes ridiculously complicated
when there are multiple data levels or when self-joins are needed.
SQL doesn’t foster multi-step coding. Programmers have to write
very long query containing layers of subqueries. Though stored
procedures can alleviate the problem, they are not always
available. DBA has strict rules about the privileges of using
stored procedures, and old and small databases don’t support
stored procedures. Besides, it’s inconvenient to debug a stored
procedure. This makes it unsuitable to do a procedural
computation.

There are other scenarios that require post-SQL scripting. To
migrate the algorithm between different database products or
between database and non-relational database, data source or output
target isn’t the database but a file, and mixed computation
performed between multiple databases, for example. All these
external database computations need post-SQL scripting.

The most important role of a post-SQL script is to achieve the
complex computations that SQL is not good at. It would be better if
they had some advanced features, such as the ability of processing
data of various sources (files and non-relational databases, for
example) and handling a relatively large amount of data, and
satisfactory performance. But, the basic thing is that the
scripting tool should be able to perform database read/write
conveniently.

The popular post-SQL scripting tools are Java, Python pandas and
esProc. Now let’s look at and examine their scripting
abilities.

JAVA

High-level languages, such C++ and Java, are theoretically
almighty and thus are able to manage computations SQL is hard to
deal with. Java supports generic type and has relatively
comprehensive set orientation to handle complex set-based
computations. A Java array has intrinsic sequence numbers to
implement order-based computations conveniently. Java can express a
relationship using object reference and handle join operations
well. It also supports procedural syntax, including branch and
loop, to achieve complex multi-step computations.

Unfortunately, Java lacks class libraries for structured
computations. It hardcodes even the simplest structured
computations and creates the most basic data type manually. That
makes Java code lengthy and complicated.

Here’s an example of order-based computation: get the number
of longest consecutively rising trading days for a stock. Database
AAPL stores a stock’s price information in fields including
transaction dates and closing price.

In an intuitive way, we loop through the records ordered by
dates, add 1 to the number of consecutive rising days (the initial
value is 0) if the closing price of the current record is greater
than that of the previous one; and compare the number of
consecutively rising days with the current largest number of
consecutively rising days (the initial value is 0) to get the new
largest number and reset the number as 0. Repeat the process until
the loop is over and the current largest number is the final
desired one.

SQL can’t implement the intuitive algorithm because it
doesn’t support order-based computations. But it has its own
tricks. It groups stock records ordered by dates. The way is like
this: put records whose closing prices rise consecutively in one
group, that is, group the current record with the previous one if
its closing price rises and create a new group if its price lowers;
then count the members in every group and find the largest number,
which is the largest number of consecutively rising days.

Examples of such SQL code can be found
here
.