Apache Superset is a powerful, open-source data visualization tool that allows users to explore, visualize, and share data insights through interactive, customizable dashboards. However, as your Apache Superset deployment scales and data volumes grow, performance bottlenecks may arise, impacting the user experience. In this blog post, we’ll explore six quick tips to optimize Apache Superset performance for better scalability and enhanced user satisfaction.

6 Tips for Apache Superset Performance and Scalability Optimization

1. Scale with Multiple Instances Behind a Load Balancer

If many users access Superset dashboards simultaneously, consider running multiple Superset instances behind a load balancer. This setup distributes incoming traffic across the instances so that no single server becomes a bottleneck. Load balancing not only improves performance but also improves fault tolerance, helping keep Superset available to your users.

2. Optimize Gunicorn Processes and Threads

Gunicorn is the WSGI HTTP server that runs your Superset application. Properly configuring the number of Gunicorn worker processes and threads per worker can significantly improve resource utilization. By default, Gunicorn starts a single worker with a single thread, which does not make full use of a multi-core server. Adjust the workers and threads parameters based on your server’s capabilities to handle more concurrent requests efficiently.

gunicorn \
  --workers=4 \
  --threads=30 \
  --bind 0.0.0.0:8088 \
  "superset.app:create_app()"

Adjust the --workers and --threads values according to your server’s CPU and memory resources.

If you run Superset with the docker-compose files provided in the Superset codebase, changing the following environment variables should be sufficient:

SERVER_WORKER_AMOUNT=4
SERVER_THREADS_AMOUNT=30
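
As a starting point for choosing these values, the Gunicorn documentation suggests roughly (2 × CPU cores) + 1 workers. The sketch below applies that rule of thumb. The function name suggest_gunicorn_settings and the default of 30 threads (taken from the example above) are illustrative assumptions, not Superset or Gunicorn APIs, and any result should be validated under real load.

```python
import os


def suggest_gunicorn_settings(cpu_count=None, threads_per_worker=30):
    """Return a starting point for --workers and --threads.

    Uses the (2 x cores) + 1 rule of thumb from the Gunicorn docs.
    threads_per_worker defaults to the value used in this post; it is
    an assumption to tune, not a recommendation from Gunicorn.
    """
    if cpu_count is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        cpu_count = os.cpu_count() or 1
    workers = 2 * cpu_count + 1
    return {
        "workers": workers,
        "threads": threads_per_worker,
        # Upper bound on requests handled at the same time by this instance.
        "max_concurrent_requests": workers * threads_per_worker,
    }


if __name__ == "__main__":
    print(suggest_gunicorn_settings())
```

Memory is the other constraint: every worker is a separate process that loads the full Superset application, so fewer workers with more threads may suit a memory-limited server better.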

3. Configure SQLAlchemy Database Connection Pool

When running many threads, it’s crucial to configure the SQLAlchemy database connection pool properly. Each thread may need its own database connection, and without proper pooling you could hit connection limits or see degraded performance. Adjust the pool_size and max_overflow settings in your SQLAlchemy configuration so that database connections are used efficiently.

# In your superset_config.py
SQLALCHEMY_ENGINE_OPTIONS = {
    'pool_size': 30,
    'max_overflow': 10,
    'pool_timeout': 30,
    'pool_recycle': 1800,
}

Modify the pool_size and max_overflow based on your application’s concurrency needs. 

Note that these settings are applied per Gunicorn worker process: each worker maintains its own connection pool, so the total number of database connections grows with the number of workers.
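
To make the per-worker effect concrete, the following sketch computes the worst-case number of connections for a given configuration. The function names, and the db_limit and instances parameters, are hypothetical helpers introduced for illustration; they are not part of Superset or SQLAlchemy.

```python
def max_db_connections(workers, pool_size, max_overflow):
    """Worst-case connections opened by one Superset instance.

    Each Gunicorn worker process holds its own SQLAlchemy pool, and a
    pool can grow to pool_size + max_overflow connections.
    """
    return workers * (pool_size + max_overflow)


def fits_database_limit(workers, pool_size, max_overflow, db_limit, instances=1):
    """Check the worst case against the database's connection limit.

    db_limit and instances are hypothetical inputs: look up the real limit
    in your database (for example max_connections in PostgreSQL).
    """
    total = instances * max_db_connections(workers, pool_size, max_overflow)
    return total <= db_limit


if __name__ == "__main__":
    # Values from this post: 4 workers, pool_size=30, max_overflow=10.
    print(max_db_connections(4, 30, 10))  # 160
```

With the values used in this post, a single instance can open up to 160 connections, which already exceeds PostgreSQL's default max_connections of 100. Running several instances behind a load balancer multiplies the figure again.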

4. Implement Domain Sharding for Embedded Dashboards

If you’re using Superset via the embedded SDK and have numerous small dashboards on the same page, you might notice requests being queued. Browsers typically limit the number of concurrent connections to a single domain (commonly six over HTTP/1.1). To reduce dashboard load time, implement domain sharding by serving Superset dashboards from multiple subdomains (e.g., superset1.example.com, superset2.example.com). This allows the browser to open more simultaneous connections.
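
One way to spread embedded dashboards across subdomains is to have the host application assign each dashboard to a shard deterministically, so the same dashboard always loads from the same subdomain and browser caching stays effective. The following is a minimal sketch under that assumption. SHARD_DOMAINS and shard_for_dashboard are hypothetical names, and the sketch assumes every subdomain already routes to the same Superset deployment.

```python
import zlib

# Hypothetical shard hostnames, following the example.com pattern above.
SHARD_DOMAINS = [
    "https://superset1.example.com",
    "https://superset2.example.com",
]


def shard_for_dashboard(dashboard_id, domains=SHARD_DOMAINS):
    """Pick a shard domain for a dashboard, stable across requests.

    zlib.crc32 is used instead of the built-in hash() because hash() of a
    string is randomized per Python process.
    """
    if not domains:
        raise ValueError("at least one shard domain is required")
    index = zlib.crc32(str(dashboard_id).encode("utf-8")) % len(domains)
    return domains[index]


if __name__ == "__main__":
    for dashboard in ("sales-overview", "ops-latency", "finance-kpis"):
        print(dashboard, "->", shard_for_dashboard(dashboard))
```

The host application would then pass the chosen domain to the embedded SDK when it mounts each dashboard.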

[Figure: Apache Superset Dashboard Optimization Workflow]

5. Enable Redis Caching

Caching significantly speeds up Apache Superset. Make sure Redis caching is enabled in your Superset configuration. Redis keeps frequently accessed data in memory, so Superset does not have to fetch it from the database repeatedly. This shortens query times and reduces load times for large dashboards.

# In your superset_config.py
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': 300,
    'CACHE_KEY_PREFIX': 'superset_',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

Ensure that Redis is installed and running on your server.
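
In Superset's configuration, the cache for chart query results is set separately from CACHE_CONFIG, through DATA_CACHE_CONFIG. Here is a minimal sketch of that second setting, assuming a Superset version that supports DATA_CACHE_CONFIG and the same local Redis instance as above. The timeout, key prefix, and Redis database index 1 are illustrative values, not recommendations.

```python
# In your superset_config.py
# Cache for chart query results (assumed values; tune for your freshness needs).
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 600,  # seconds; illustrative only
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://localhost:6379/1",
}
```

A longer timeout means fewer queries reach the database but staler charts, so choose it according to how often the underlying data changes.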

6. Tune Slow-Loading Queries

If your Superset dashboard UI loads quickly but individual charts take time to render, the underlying queries may need optimization. Use EXPLAIN ANALYZE (or your database’s equivalent) to inspect the SQL queries behind your charts and identify performance issues. Then add indexes, rewrite complex queries, or pre-aggregate data to improve execution times. Faster queries can drastically reduce chart loading times.
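
The effect of an index on a query plan can be seen in a small, self-contained example. This sketch uses SQLite's EXPLAIN QUERY PLAN as a stand-in for EXPLAIN ANALYZE, since SQLite ships with Python. The orders table, the index name, and the query are hypothetical, and plan output on your own database will look different.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); keep the readable part.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for a dataset behind a chart.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

chart_sql = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, chart_sql, (42,))
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, chart_sql, (42,))

print("before:", plan_before)  # full table scan
print("after: ", plan_after)   # lookup via idx_orders_customer
```

Before the index exists, the plan reports a scan of the whole table. Afterwards it reports a search using idx_orders_customer, which is the kind of change to look for when tuning the queries behind slow charts.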

Conclusion

Optimizing Apache Superset is essential for maintaining a responsive and efficient data visualization platform, especially as your organization scales. By implementing the six Superset optimization tips mentioned above, you can unlock the full potential of Apache Superset and provide a better experience for your users.

Regular monitoring and performance tuning will help you stay ahead of potential issues and keep your deployment robust and scalable. Mobisoft Infotech can help you implement these optimization strategies in your Apache Superset environment.

Author's Bio:

Pritam Barhate

Pritam Barhate, with 14+ years of experience in technology, heads Technology Innovation at Mobisoft Infotech. He has rich experience in design and development and has been a consultant for a variety of industries and startups. At Mobisoft Infotech, he primarily focuses on technology resources and develops the most advanced solutions.