After developing a web application or web service (SaaS) and overcoming a tumultuous period of its growth, you may eventually reach a dead end and become lost in how to grow further. For some reason, new possibilities don’t give you what you expected, and covering new markets doesn’t grant you the anticipated profit. As a result, you put in your SaaS solution more than you get in return.
In order to be able to move further and find new possibilities for growth, one needs to analyze and reconsider the past. What is in demand and what is not, how certain software features and the demand for them are interconnected, what the reason is and how different times influence it – such a thorough analysis can help you to identify the growing-point. A deep analysis of the accumulated data in the SaaS database is one of the methods that can help to understand this past. So, let’s take a closer look at the available analytical approaches.
In Search of the Best Data Analysis Method for SaaS App
When you choose to develop a SaaS application, it is important to undergo various development and testing stages, which include a great variety of decisions. And deciding on choosing OLAP or OLTP is one of them.
In general, a SaaS (Software-as-a-Service) solution can be useful for any business, because they can help a company to cover the processes connected to email, calendaring, etc. and are accessible from anywhere and anytime, as long as a user has an internet connection and a compatible device. CRM software, ERP systems, HRM software, Learning management systems (LMS), and other solutions can be developed as SaaS. Therefore, it is vital to understand which complex data analysis method and which database will be the best in your particular situation depending on which data you need to gather and why.
Most probably, your Software-as-a-Service solution is developed with the use of traditional row-oriented databases (MySQL, MS SQL, etc.). Indeed, they provide integrity and are able to process simple transactions in no time, while following the ACID (Atomicity, Consistency, Isolation, and Durability) principle. For example, a user somehow interacts with UI, then within one transaction, this action is registered in the database and connected with the data that is already presented there. After that, the transaction is analyzed online to ensure SaaS operating activity.
Let’s take the analysis of the inventory restock on time, which includes the current amount of resources a supplier has and the set logistics operations. The analysis is performed by following Online Transaction Processing (OLTP) that includes the same row-oriented databases. But in the case of reconsidering the past to analyze historical data, its performance leaves much to be desired.
Building a SaaS solution based on Online Analytical Processing (OLAP) can be a better choice to deal with this task. Let’s see why.
Choosing OLAP Over OLTP
In the OLAP vs OLTP battle, opting to the first option may be a smarter decision if a solution includes the following:
- the vast majority of requests to database focuses on reading data for its analysis;
- data is added to database but is not changed (the historical data is gathered, and there will be no further alterations to the history);
- there are loads of data to gather (there is a need to minimize the overload on the main database, therefore, the entries should be analyzed in packs, not one by one, or when the data load is at its minimum);
- during data reading, a large amount of database rows but a small number of columns are used (many entries are analyzed over a long period of time so that it will be possible to create several diagrams with different parameters).
When there is still not so much data in the row-oriented database, the solution in which data analysis is using a replica of the current database will suit perfectly. To put it simply, let’s imagine that your SaaS solution uses a MySQL database as one of the typical representatives of a row-oriented approach. In this case, data analysis is built on a replica of MySQL. But if there is a lot of data, this method will be less productive as compared with specialized multi-dimensional structures.
However, this is not the topic of discussion. We want to dive deeper into the column-oriented databases as a more preferred option, because they store data in columns, which grants maximum performance in situation we covered in point #4. And this is the key difference: OLTP works with row-oriented databases, while OLAP is for column-oriented databases.
When you choose OLAP, you gain the following benefits:
- data is gathered only from relevant columns, and same-type info is effectively compressed to decrease the size of database and increase its performance;
- data approximation is supported, which decreases the amount of requests to hard disk and increases processing speed;
- data is physically sorted by primary key, which streamlines the receiving of particular entries and intervals;
- vector computing is performed on parts of columns, which reduces expenses required for supervision and elevates the efficiency of using CPU.
Differences in Using OLAP and OLTP
To understand this type of data processing and why it holds the cards when there is a need to analyze big data, let’s view an example.
Knowing your customer helps you to decide which sales approach to take. So, imagine that you need to analyze who buys shaving gel more often. You want to take the attribute Gender to identify if male, female, or other consumers buy it more or less.
By using your MySQL database, you should have to create an index with the Gender attribute. Considering that there are many fields in a database, and you may need analytics covering different attributes, the database must have an index per each field. For example, today, you decide to sort out your customers by gender, tomorrow you want to know their education or zodiac sign. In this situation, you need to add indexes to all fields, which will be daunting for the system performance, because you will need to change all indexes each time the information alters in the database. After only your one INSERT request, many changes will be happening in your MySQL database. But if you refuse to create those indexes, any request related to the analytics will leave you with an extra long waiting time.
However, if you choose to use a column-oriented database for gathering and sorting out analytics, instead of working with row-oriented MySQL, the data record will be shown in columns, while the needed fields – in rows. Thus, you will be able to easily sort out all of your customers by using the SELECT request. As a result, the request runtime of getting the data on all customers who chose “female” in the Gender category will have the same duration that MySQL spends on getting info on one customer (if omit the time required for data transfering).
For example, in the column-oriented database, the information is stored like this: 513001,502333,455332,457889; Jones,Dickinson,Smith,Beck; Joanna,Rose,Taylor,Samuel; female,female,other,male. It allows the system to compress the data by female attribute and fetch it by Gender much faster.
As for the row-oriented database, it is stored like this: 513001,Jones,Joanna, female; 502333, Smith,Jamie,female; 455332,Beck,Samuel,male.
A row-oriented database stores all attributes of each row altogether, while a column-oriented database uses one logical file for each attribute (column). Thus, storing only the data on the Gender field, the logical file will be of a smaller size, which allows your system to perform fetch faster. In the case of using a row-oriented database, the software would have to search the entire data.
Choosing the Best Data Analysis Method for Your SaaS
Now knowing more details about OLAP and OLTP, we can finally answer the question of which data analysis approach to go for to have effective analytics when there is a lot of data to deal with. Many companies choose OLAP as their best option and can be based on corresponding databases, such as ClickHouse, Microsoft SASS, or a cloud data warehouse like Amazon Redshift.
Generally, gathering analytics is not of a prime importance when you build a SaaS, and you don’t need it at an earlier stage of your app launch. The data needs time to be gathered. Therefore, the implementation of the chosen method can be done later. Even if you have done simple data analytics based on the MySQL replica, nothing stops you from changing your mind later. You won’t waste much time, as you will need to only adapt the amount of the required work according to your choice of using a column-oriented database instead. А business level and presentation level of your analytics will remain the same.
Analytics should be built as an external data consumer that gathers information from your SaaS solution. The system is reading the data as per schedule (for example, with the help of Cron) or by some events with specially created scripts that are external in relation to your SaaS. Usually but not necessarily, these scripts are created with Python, and they are able to perform any additional data processing. However, the key goal of these scripts is to transfer the data from the row-oriented database to the column-oriented one to let the analytics system have faster processing time.
Choosing a column-oriented database depends on many requirements and circumstances.The simplest example is when you already use AWS, you probably should pay attention to Amazon Redshift. In case you have a Microsoft platform, your best choice may be Microsoft SASS.
When you understand the key principle of creating the analytics of historical data that has to find tendencies and patterns, you know how to minimize the efforts needed for achieving the desired outcome. As we mentioned, at first, your MySQL replica can do its job. After gathering some data, you can gently switch to OLAP and a more suitable column-oriented database. If you want to know more about the cloud software development and other services we provide, please contact us, and our experts will be glad to turn your idea into reality.