Data is growing day by day and is very difficult to understand. Most of the companies turn to Business Intelligence tools that can help them analyze and present data in an understandable and attractive manner.
In today’s world, when there’s a plethora of visualization tools available it is difficult to select one that will satisfy all your needs. It is easy to learn any visualization tool, but selecting one is the actual task.
To make life simpler for you, I’m comparing the three tools that I have used:
- Apache Superset
We started with QuickSight as our business intelligence tool since we were using Redshift as the data source. It is available on AWS. One of the advantages of QuickSight is that you can easily connect it to Redshift and many other sources.
You can name the data source whatever you wish. The connection type can be left to default. The Database server is the hostname of your Redshift cluster ,i.e, you can see an endpoint on the top of the window when you click on the name of the cluster. The Port is 5439 by default. All other details can be found in your Redshift cluster with ease.
Here comes the main thing! Before clicking on the validate connection you have to create a security group for that. You can do that using the EC2 AWS service. There is an option called security groups. Create new security group. Enter details requested and you’ll find the Add inbound rule.
All the details should be selected as shown in the figure above except the CIDR, IP which is next to source. That allows the client to connect to your cluster if you are in that region. You can refer to this link for the IP address and select one according to the region in which your cluster is working. Click on the Save button and you are done! 🙂
Now you will be able to access all the tables that are there in your cluster and visualize the data using different visual types. But one of the disadvantages of QuickSight is that it has few visual options and also the visualization is not up to the mark.
QuickSight is not the best match for our purpose so we considered testing Kibana as it is one of the best open-source visualization dashboards available. But before diving into the details, you need to know that Kibana only works with ElasticSearch.
You can easily download and install it on your computer. Go to the bin directory and run command ./kibana. You can go to localhost:5601 and check if kibana is running. Before running kibana make sure your elasticsearch is running. For the elasticsearch after installing run command ./elasticsearch inside the bin directory. Go to localhost:9200 and check if elasticsearch is running.
You can load data into elasticsearch using logstash. Once you have the data inside logstash, you can visualize using Kibana. It has a variety of visual types available. You can shape your data using different charts, maps, tables and so on. Also, the visualization looks quite fascinating. But as we know nothing is perfect. The main issue that we face while using Kibana is that we can not directly perform joins between the tables. There is one option called parent-child relationship for the joins. But, again, that is not a feasible option.
After playing around with different visualization tools, we finally found the One, APACHE SUPERSET. This is the business intelligence tool that we like the most.
Step 1: Installation & Configuration
Apache Superset works only with python version < 3.8. It won’t even work with python 3.8. Follow this for installation and configuration. You can install it according to the steps explained in that document.
If you are trying to run pip install apache-superset and if you get an error regarding building wheels of py-arrow, read the first line of Step 1 of this article. You might be using python version 3.8 so you should shift it to python 3.7 or lesser than that.
These are the commands that are useful for the installation of Apache Superset on Mac. You can ignore superset load_examples command since it will load the sample data and you won’t require that. It also takes quite a while to load. You could use docker for the installation of Apache Superset but it’s way too slow. I’d suggest avoiding it.
Once you are done with these steps, you can go to localhost:8088 and you should be able to login using the name and password that you have entered while executing flask fab create-admin command. Now you can connect Apache Superset with various data sources.
I’d suggest running all these commands using anaconda.
Step 2: Connection to the data sources
Since we are using Redshift as a data warehouse, I will show you here the connection of Redshift with Apache Superset. But, you can also connect to various other data sources. Refer to this!
Go to Sources -> Database -> click on + sign which is next to the filter list.
That will open a form where you can enter details such as the name of the database. SQL alchemy URI for Redshift is:
Select checkbox according to requirement and click on the Save button. Go to SQLLab where you should be able to see your schema and tables. You can run sql queries there and click on explore where you can select different visual types and later add them to dashboards.
You might face errors such as can’t load plugin psycopg2 so you have to install those things. Here are the commands which you should run if you face any error related to plugin. Make sure you install them inside the environment.
Another error you might come across is connection timeout(500). So that shows you have some issues with Redshift and not Apache Superset. Just as we created a security group while connecting QuickSight to Redshift, we have to create a security group for Apache Superset. Follow the same steps. Inbound rules should look something like this.
Once you are done with all the above steps, you can test the connection that will show an alert message: Seems Ok!
That’s it! Thanks for reading my post. Hopefully, it would have helped you to overcome the issues that we have faced and pick a business intelligence tool that works for you. Also read my post on how business can minimize their losses by taking adequate precautions during global outbreak.