Introduction
Sometimes on projects with tight budgets we don’t have access to expensive APM tools like Datadog and New Relic. On such projects there is no easy way to find out which API endpoints are slow and causing performance issues for your application. In these cases, processing the AWS Application Load Balancer logs, which (if enabled) are stored in an S3 bucket, can yield valuable insights.
This tutorial walks you through the process of downloading Application Load Balancer logs from AWS S3, consolidating them, and analyzing them using tools like GoAccess, Excel, and SQLite. By the end of this guide, you’ll be able to extract meaningful metrics and identify slow requests or errors impacting your application’s performance.
You will need:
- AWS CLI installed and configured with the necessary permissions to access your Application Load Balancer logs in S3.
- GoAccess installed on your local machine for log analysis.
- SQLite3 installed for querying log data. On all modern Linux and Mac machines SQLite is available by default. If you are using Windows, use WSL2.
- Excel or any other spreadsheet software for additional data manipulation.
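If access logging is not yet enabled on your load balancer, you can turn it on with the AWS CLI before following the rest of this tutorial. A minimal sketch, assuming a placeholder load balancer ARN and the bucket and prefix names used below (the bucket must also have a policy that allows the load balancer to write to it):
aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/50dc6c495c0c9188 \
    --attributes Key=access_logs.s3.enabled,Value=true Key=access_logs.s3.bucket,Value=your-alb-logs-bucket Key=access_logs.s3.prefix,Value=path/to/logs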
Step 1: Configure AWS CLI and Download the Logs
First, ensure your AWS CLI is configured correctly:
aws configure
Provide your AWS Access Key ID, Secret Access Key, Default Region, and Default Output Format when prompted.
Next, synchronize your Application Load Balancer logs from the S3 bucket to a local directory:
aws s3 sync s3://your-alb-logs-bucket/path/to/logs ./alb-logs
- Replace s3://your-alb-logs-bucket/path/to/logs with the actual S3 path to your Application Load Balancer logs.
- ./alb-logs is the local directory where the logs will be downloaded.
Note: The logs will be downloaded into a nested directory structure organized by year, month, and day.
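Because of that structure, you can also sync a narrower slice of logs by pointing at a deeper prefix. For example, a sketch that downloads a single day (the account ID, region, and date here are placeholders):
aws s3 sync s3://your-alb-logs-bucket/path/to/logs/AWSLogs/123456789012/elasticloadbalancing/us-east-1/2024/05/01/ ./alb-logs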
Step 2: Consolidate Log Files
To simplify processing, it’s helpful to move all .log.gz files into a single directory.
Navigate to your logs directory:
cd alb-logs
Use the following command to find and move all .log.gz files into the current directory:
find . -type f -name "*.log.gz" -exec mv {} . \;
Explanation:
- find . : Searches the current directory and its subdirectories.
- -type f : Matches only files.
- -name "*.log.gz" : Matches files ending with .log.gz.
- -exec mv {} . \; : Runs the mv command on each matched file, moving it to the current directory.
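If you want to preview the operation first, the same find expression without -exec lists the matching files, and piping it to wc -l counts them:
find . -type f -name "*.log.gz" | wc -l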
Step 3: Combine Logs into a Single File
Unzip and combine all log files into one file for easier processing:
gunzip -c *.log.gz > combined-log.txt
Explanation:
- gunzip -c : Unzips the files and writes the result to stdout.
- *.log.gz : Matches all compressed log files.
- > combined-log.txt : Redirects the output to combined-log.txt.
Optional: Check the number of log lines:
cat combined-log.txt | wc -l
This command counts the number of lines in the combined log file, giving you an idea of the volume of data you’re working with.
Step 4: Analyze Logs with GoAccess
Use GoAccess to parse and analyze the combined log file:
cat combined-log.txt | goaccess --log-format=AWSELB -a -o report.html
Explanation:
- cat combined-log.txt : Outputs the content of the combined log file.
- | : Pipes the output to the next command.
- goaccess --log-format=AWSELB -a -o report.html : Runs GoAccess with the AWS ELB log format, enables the per-host user-agent list (-a), and writes the report to report.html.
View the Report:
Open report.html in your web browser to explore the visual analytics provided by GoAccess.
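If you would rather not create the intermediate text file at all, GoAccess can read the decompressed stream directly from the .log.gz files:
gunzip -c *.log.gz | goaccess --log-format=AWSELB -a -o report.html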
Step 5: Prepare Logs for CSV Processing
Replace spaces with tabs to convert the log file into a tab-separated values (TSV) format suitable for CSV processing:
sed -e 's/ /\t/g' combined-log.txt > combined-log.tsv
Explanation:
- sed -e 's/ /\t/g' : Uses sed to substitute every space with a tab character.
- combined-log.txt : Input file.
- > combined-log.tsv : Redirects the output to combined-log.tsv.
Note: The BSD sed bundled with macOS does not interpret \t in the replacement text; there you can use tr ' ' '\t' < combined-log.txt > combined-log.tsv instead.
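One caveat with this blanket substitution: quoted fields such as request and user_agent contain spaces themselves, so they end up split across several columns. The first twelve fields, including target_processing_time, are unaffected, which is enough for the sorting in the next step. If you need the later fields intact for the SQLite import, a quote-aware conversion helps. A minimal awk sketch: splitting each line on the quote character leaves the odd-numbered chunks outside quotes, so spaces are replaced with tabs only there:
awk 'BEGIN { FS = OFS = "\"" } { for (i = 1; i <= NF; i += 2) gsub(/ /, "\t", $i) } 1' combined-log.txt > combined-log.tsv
Note that the quoted fields keep their surrounding quotes, which may then be imported as part of the value; you can strip them later in SQL if needed.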
Step 6: Sort Logs by Target Response Time
Identify the slowest requests by sorting the logs on the target_processing_time field:
sort -k7nr combined-log.tsv | head -200 > top-slow-200.tsv
Explanation:
- sort -k7nr combined-log.tsv : Sorts the TSV file numerically (n) and in reverse order (r) on the 7th column (-k7), which corresponds to target_processing_time.
- | head -200 : Takes the top 200 entries from the sorted output.
- > top-slow-200.tsv : Writes the result to top-slow-200.tsv.
Note: Column indexes in sort start at 1, so -k7 refers to the 7th column.
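To spot-check the result, you can print just the timestamp and response-time columns (cut uses tab as its default delimiter):
cut -f2,7 top-slow-200.tsv | head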
Step 7: Load Logs into SQLite Database
Loading the TSV data into SQLite allows for powerful querying capabilities.
Create a SQLite Database and Table
Create a new SQLite database:
sqlite3 logs.db
Within the SQLite shell, create a logs table matching the AWS Application Load Balancer log fields:
CREATE TABLE logs (
type TEXT,
time TEXT,
elb TEXT,
client_port TEXT,
target_port TEXT,
request_processing_time REAL,
target_processing_time REAL,
response_processing_time REAL,
elb_status_code INTEGER,
target_status_code INTEGER,
received_bytes INTEGER,
sent_bytes INTEGER,
request TEXT,
user_agent TEXT,
ssl_cipher TEXT,
ssl_protocol TEXT,
target_group_arn TEXT,
trace_id TEXT,
domain_name TEXT,
chosen_cert_arn TEXT,
matched_rule_priority INTEGER,
request_creation_time TEXT,
actions_executed TEXT,
redirect_url TEXT,
error_reason TEXT,
target_port_list TEXT,
target_status_code_list TEXT,
classification TEXT,
classification_reason TEXT,
conn_trace_id TEXT
);
Configure SQLite for TSV Import
Set the separator to a tab character and switch to tab-separated mode:
.separator "\t"
.mode tabs
Import the TSV Data
Import the data from combined-log.tsv into the logs table:
.import combined-log.tsv logs
Note: Ensure that combined-log.tsv is in the same directory as your SQLite database, or provide the full path.
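A quick sanity check after the import: the row count should match the line count you saw in Step 3.
SELECT COUNT(*) FROM logs;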
Query the Data
Set the output mode to CSV and specify an output file:
.mode csv
.output slow-requests.csv
Execute a query to find requests with a target_processing_time greater than 10 seconds:
SELECT time, target_processing_time, request
FROM logs
WHERE target_processing_time > 10
ORDER BY target_processing_time DESC
LIMIT 40;
This query selects the timestamp, target processing time, and request details for the 40 slowest requests that took more than 10 seconds.
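The same table supports other diagnostics as well. For example, a sketch that counts error responses by status code, which can point to failing endpoints rather than slow ones (run .output stdout first if you don’t want the result appended to the CSV file):
SELECT elb_status_code, COUNT(*) AS occurrences
FROM logs
WHERE elb_status_code >= 500
GROUP BY elb_status_code
ORDER BY occurrences DESC;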
Reset Output and Exit SQLite:
.output stdout
.mode list
.quit
Conclusion
This process enables you to identify which endpoints are slow and causing performance issues, so you can investigate those endpoints further. I hope you found this tutorial helpful. If so, please share it with your friends and colleagues.
Author's Bio:
Pritam Barhate, with 14+ years of experience in technology, heads Technology Innovation at Mobisoft Infotech. He has rich experience in design and development and has consulted for a variety of industries and startups. At Mobisoft Infotech, he primarily focuses on technology resources and developing advanced solutions.