Processing AWS ALB Logs with GoAccess, Excel, and SQLite

Introduction

Sometimes, on projects with tight budgets, we don’t have access to expensive APM tools like Datadog and New Relic. On such projects there is no easy way to find out which APIs are slow and causing performance issues for your application. In such cases, processing AWS Application Load Balancer logs, which (if enabled) are stored in an S3 bucket, can lead to valuable insights.

This tutorial walks you through the process of downloading Application Load Balancer logs from AWS S3, consolidating them, and analyzing them using tools like GoAccess, Excel, and SQLite. By the end of this guide, you’ll be able to extract meaningful metrics and identify slow requests or errors impacting your application’s performance.

You will need:

  • AWS CLI configured with the necessary permissions to access your Application Load Balancer logs in S3.
  • GoAccess installed on your local machine for log analysis.
  • SQLite3 installed for querying log data. SQLite should be available by default on all modern Linux and Mac machines. If you are using Windows, please use WSL2.
  • Excel or any other spreadsheet software for additional data manipulation.


Step 1: Configure AWS CLI and Download the Logs

First, ensure your AWS CLI is configured correctly:

aws configure

Provide your AWS Access Key ID, Secret Access Key, Default Region, and Default Output Format when prompted.

Next, synchronize your Application Load Balancer logs from the S3 bucket to a local directory:

aws s3 sync s3://your-alb-logs-bucket/path/to/logs ./alb-logs
  • Replace s3://your-alb-logs-bucket/path/to/logs with the actual S3 path to your Application Load Balancer logs.
  • ./alb-logs is the local directory where the logs will be downloaded.

Note: The logs will be downloaded into a nested directory structure organized by year, month, and day.
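
For reference, ALB access logs are typically stored under a key structure like the following (the account ID, region, and load balancer name shown here are placeholders):

./alb-logs/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/01/15/123456789012_elasticloadbalancing_us-east-1_app.my-alb.abc123def456_20240115T1200Z_10.0.0.1_abcdef.log.gz

If you only need logs for a particular period, you can narrow the sync with include/exclude filters, for example:

aws s3 sync s3://your-alb-logs-bucket/path/to/logs ./alb-logs --exclude "*" --include "*/2024/01/*"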


Step 2: Consolidate Log Files

To simplify processing, it’s helpful to move all .log.gz files into a single directory.

Navigate to your logs directory:

cd alb-logs

Use the following command to find and move all .log.gz files to the current directory:

find . -type f -name "*.log.gz" -exec mv {} . \;

Explanation:

  • find .: Searches in the current directory and subdirectories.
  • -type f: Looks for files.
  • -name "*.log.gz": Matches files ending with .log.gz.
  • -exec mv {} . \;: Executes the mv command to move each found file to the current directory.
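
If you are consolidating a very large number of files, a slightly faster variant batches many files into each mv call (a sketch; mv -t is specific to GNU coreutils, so it works on Linux but not with the BSD mv that ships with macOS):

find . -type f -name "*.log.gz" -exec mv -t . {} +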

Step 3: Combine Logs into a Single File

Unzip and combine all log files into one file for easier processing:

gunzip -c *.log.gz > combined-log.txt

Explanation:

  • gunzip -c: Unzips the files and outputs to stdout.
  • *.log.gz: Matches all compressed log files.
  • > combined-log.txt: Redirects the output to combined-log.txt.

Optional: Check the number of log lines:

cat combined-log.txt | wc -l

This command counts the number of lines in the combined log file, giving you an idea of the volume of data you’re working with.
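
It can also help to print a single line and confirm the field order matches what the later steps assume:

head -n 1 combined-log.txt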


Step 4: Analyze Logs with GoAccess

Use GoAccess to parse and analyze the combined log file:

cat combined-log.txt | goaccess --log-format=AWSELB -a -o report.html

Explanation:

  • cat combined-log.txt: Outputs the content of the combined log file.
  • |: Pipes the output to the next command.
  • goaccess --log-format=AWSELB -a -o report.html: Runs GoAccess with its built-in AWS ELB log format, enables the per-host user-agent list (-a), and writes an HTML report to report.html.

View the Report:

Open report.html in your web browser to explore the visual analytics provided by GoAccess.
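
If you would rather not build the combined text file just for GoAccess, you can also feed the compressed logs to it directly, using the same flags as above:

gunzip -c *.log.gz | goaccess --log-format=AWSELB -a -o report.html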


Step 5: Prepare Logs for CSV Processing

Replace spaces with tabs to convert the log file into a tab-separated values (TSV) format that is easier to sort and import into other tools:

sed -e 's/ /\t/g' combined-log.txt > combined-log.tsv

Explanation:

  • sed -e 's/ /\t/g': Uses sed to substitute every space with a tab character.
  • combined-log.txt: Input file.
  • > combined-log.tsv: Redirects the output to combined-log.tsv.

Note: Some ALB log fields, such as request and user_agent, are quoted strings that contain spaces, so the columns after them shift in the resulting TSV. Fields that appear before the first quoted field, including target_processing_time (column 7), stay correctly aligned, which is what the sorting in the next step relies on.
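
To see how much the quoted fields shift the columns, a quick sanity check like the following (a sketch using standard awk, sort, and uniq) prints how many lines have each tab-separated field count:

awk -F'\t' '{print NF}' combined-log.tsv | sort -n | uniq -c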

Step 6: Sort Logs by Target Response Time

Identify the slowest requests by sorting the logs based on the target_processing_time field:

sort -k7nr combined-log.tsv | head -200 > top-slow-200.tsv

Explanation:

  • sort -k7nr combined-log.tsv: Sorts the TSV file numerically (n) and in reverse order (r) based on the 7th column (k7), which corresponds to target_processing_time.
  • | head -200: Takes the top 200 entries from the sorted output.
  • > top-slow-200.tsv: Writes the result to top-slow-200.tsv.

Note: The column index starts at 1, so k7 refers to the 7th column.
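
To preview just the timestamp and response-time columns of the slowest requests without opening a spreadsheet, a quick cut works (cut splits on tabs by default; the column numbers follow the field order described above):

cut -f2,7 top-slow-200.tsv | head -n 20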


Step 7: Load Logs into SQLite Database

Loading the TSV data into SQLite allows for powerful querying capabilities.

Create a SQLite Database and Table

Create a new SQLite database:

sqlite3 logs.db

Within the SQLite shell, create a logs table matching the AWS Application Load Balancer log fields:

CREATE TABLE logs (
    type TEXT,
    time TEXT,
    elb TEXT,
    client_port TEXT,
    target_port TEXT,
    request_processing_time REAL,
    target_processing_time REAL,
    response_processing_time REAL,
    elb_status_code INTEGER,
    target_status_code INTEGER,
    received_bytes INTEGER,
    sent_bytes INTEGER,
    request TEXT,
    user_agent TEXT,
    ssl_cipher TEXT,
    ssl_protocol TEXT,
    target_group_arn TEXT,
    trace_id TEXT,
    domain_name TEXT,
    chosen_cert_arn TEXT,
    matched_rule_priority INTEGER,
    request_creation_time TEXT,
    actions_executed TEXT,
    redirect_url TEXT,
    error_reason TEXT,
    target_port_list TEXT,
    target_status_code_list TEXT,
    classification TEXT,
    classification_reason TEXT,
    conn_trace_id TEXT
);

Configure SQLite for TSV Import

Set the separator to a tab character and switch to tab-separated mode:

.separator "\t"
.mode tabs

Import the TSV Data

Import the data from combined-log.tsv into the logs table:

.import combined-log.tsv logs

Note: Ensure that combined-log.tsv is in the same directory as your SQLite database or provide the full path. Because quoted fields were split in Step 5, .import may warn that some rows contain more columns than the table; the early columns such as time and target_processing_time still line up correctly.

Query the Data

Set the output mode to CSV and specify an output file:

.mode csv
.output slow-requests.csv

Execute a query to find requests with a target_processing_time greater than 10 seconds:

SELECT time, target_processing_time, request
FROM logs
WHERE target_processing_time > 10
ORDER BY target_processing_time DESC
LIMIT 40;

This query writes the timestamp, target processing time, and request field of the 40 slowest requests that took longer than 10 seconds to slow-requests.csv, which you can open in Excel or any other spreadsheet software for further analysis.
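
As a further example, you can summarize traffic by status code in the same session (a sketch; requests the load balancer could not dispatch to a target record a target_processing_time of -1, so they are filtered out, and .output is switched to a separate file first):

.output status-summary.csv
SELECT elb_status_code,
       COUNT(*) AS requests,
       ROUND(AVG(target_processing_time), 3) AS avg_target_time
FROM logs
WHERE target_processing_time >= 0 -- exclude rows where the target never responded
GROUP BY elb_status_code
ORDER BY requests DESC;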

Reset Output and Exit SQLite:

.mode list
.quit

Conclusion

This process enables you to find out which endpoints are slow and causing performance issues, so you can investigate them further. We hope you found this tutorial helpful. If so, please share it with your friends and colleagues.


Author's Bio:

Pritam Barhate

Pritam Barhate, with 14+ years of experience in technology, heads Technology Innovation at Mobisoft Infotech. He has rich experience in design and development and has consulted for a variety of industries and startups. At Mobisoft Infotech, he primarily focuses on technology resources and develops advanced solutions.