Visualizing Data Flow with Amazon QuickSight Sankey Diagrams
By Braincuber Team
Published on February 11, 2026
Understanding the flow of data, goods, or users through a complex system is often difficult with traditional bar charts or tables. Whether you are tracking request traffic between microservices or the movement of packages through a global supply chain, you need to see the connections.
This is where Sankey Diagrams shine. They visualize "flow" from one set of values to another, with the width of each band proportional to the size of the flow. In this tutorial, we will build a visibility dashboard for "TransOcean Logistics" to track the volume of shipments moving between regional hubs, ports, and last-mile delivery centers using Amazon QuickSight and Amazon Athena.
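Conceptually, a Sankey diagram is drawn from a weighted edge list: every distinct (source, destination) pair becomes one band, and its width is the total of some measure over the matching rows. The minimal sketch below shows that aggregation in plain Python; the sample rows and the `aggregate_links` helper are hypothetical illustrations, not part of QuickSight.

```python
from collections import defaultdict


def aggregate_links(rows):
    """Sum weight_kg per (source, destination) pair; each pair becomes one Sankey band."""
    links = defaultdict(int)
    for row in rows:
        links[(row["source"], row["destination"])] += row["weight_kg"]
    return dict(links)


# Hypothetical sample rows in the same shape as the shipment logs used later.
rows = [
    {"source": "NYC_Hub", "destination": "Port_Newark", "weight_kg": 12},
    {"source": "NYC_Hub", "destination": "Port_Newark", "weight_kg": 8},
    {"source": "Port_Newark", "destination": "Queens_Depot", "weight_kg": 20},
]

for (src, dst), weight in aggregate_links(rows).items():
    print(f"{src} -> {dst}: {weight} kg")
```

Two rows collapse into the single NYC_Hub to Port_Newark band (20 kg), which is exactly why many raw log lines can still produce a readable diagram.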
Why Sankey Diagrams?
- Identify Bottlenecks: Instantly see where the flow concentrates (e.g., a port handling a disproportionate share of traffic).
- Trace Paths: Follow a shipment's journey from origin to destination across multiple hops.
- Visualize Volume: The thickness of the bands makes it easy to compare traffic volumes intuitively.
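To make "where the flow concentrates" concrete: a node's inflow is the sum of the bands entering it and its outflow is the sum of the bands leaving it, so the port with the largest share of inbound volume is the first bottleneck candidate. A minimal sketch, assuming the links are already aggregated into a `{(source, destination): weight}` dictionary; the `node_throughput` helper and the numbers are hypothetical.

```python
from collections import defaultdict


def node_throughput(links):
    """Return {node: (inflow, outflow)} from a {(source, destination): weight} mapping."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (src, dst), weight in links.items():
        outflow[src] += weight
        inflow[dst] += weight
    nodes = set(inflow) | set(outflow)
    return {node: (inflow[node], outflow[node]) for node in nodes}


# Hypothetical link weights in kg, for illustration only.
links = {
    ("NYC_Hub", "Port_Newark"): 300,
    ("Lon_Hub", "Port_Singapore"): 500,
    ("Tok_Hub", "Port_Singapore"): 400,
    ("Port_Singapore", "Jurong_Depot"): 900,
    ("Port_Newark", "Queens_Depot"): 300,
}

flows = node_throughput(links)
total_into_ports = sum(inn for node, (inn, _) in flows.items() if node.startswith("Port_"))

# Rank ports by inbound volume and show each port's share of all port-bound cargo.
for node, (inn, out) in sorted(flows.items(), key=lambda kv: -kv[1][0]):
    if node.startswith("Port_"):
        print(f"{node}: {inn} kg in, {out} kg out, {inn / total_into_ports:.0%} of port inbound")
```

With these made-up numbers, Port_Singapore receives 900 of the 1,200 kg entering ports (75%), which is the kind of concentration a thick band makes visible at a glance.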
Step 1: Generating the Data
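In production, our application logs shipment transfers to CSV files in an S3 bucket; for this tutorial, the script below generates synthetic logs in the same format. Each row represents a package moving from a `source` location to a `destination` location.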
```python
import csv
import random
from datetime import datetime

import boto3  # used only by the optional S3 upload at the end

# Define our logistics network
HUBS = ['NYC_Hub', 'Lon_Hub', 'Tok_Hub', 'Sin_Hub']
PORTS = ['Port_Newark', 'Port_Felixstowe', 'Port_Yokohama', 'Port_Singapore']
LAST_MILE = ['Queens_Depot', 'Camden_Depot', 'Shibuya_Depot', 'Jurong_Depot']


def generate_logs(num_records=1000):
    filename = f"shipment_logs_{datetime.now().strftime('%Y%m%d%H%M%S')}.csv"
    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ['timestamp', 'packet_id', 'source', 'destination', 'weight_kg', 'status']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        # Header row: the Athena table is declared with skip.header.line.count = 1
        writer.writeheader()
        for _ in range(num_records):
            # Each row is one hop of the flow: Hub -> Port -> Port -> Last Mile
            path_type = random.choice(['hub_to_port', 'port_to_port', 'port_to_lastmile'])
            if path_type == 'hub_to_port':
                src, dst = random.choice(HUBS), random.choice(PORTS)
            elif path_type == 'port_to_port':
                # random.sample picks two distinct ports, so no record is skipped
                src, dst = random.sample(PORTS, 2)
            else:
                src, dst = random.choice(PORTS), random.choice(LAST_MILE)
            writer.writerow({
                'timestamp': datetime.now().isoformat(),
                'packet_id': f"PKG-{random.randint(10000, 99999)}",
                'source': src,
                'destination': dst,
                'weight_kg': random.randint(1, 50),
                'status': 'IN_TRANSIT',
            })

    # Upload to S3 after the file is closed (uncomment to run)
    # s3 = boto3.client('s3')
    # s3.upload_file(filename, 'your-bucket-name', f'logs/{filename}')
    print(f"Generated {filename}")


if __name__ == "__main__":
    generate_logs()
```
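Because the Athena table in Step 2 skips exactly one header line, it is worth checking that each generated file starts with the expected header and that its rows look sane before uploading. A minimal sketch; `validate_shipment_csv` is a hypothetical helper, and the 1 to 50 kg range simply mirrors the generator above.

```python
import csv
import io

EXPECTED_FIELDS = ["timestamp", "packet_id", "source", "destination", "weight_kg", "status"]


def validate_shipment_csv(fileobj):
    """Check the header and basic row invariants; return the number of data rows."""
    reader = csv.DictReader(fileobj)
    if reader.fieldnames != EXPECTED_FIELDS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    count = 0
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["source"] == row["destination"]:
            raise ValueError(f"line {line_no}: source equals destination")
        weight = int(row["weight_kg"])
        if not 1 <= weight <= 50:
            raise ValueError(f"line {line_no}: weight_kg out of range: {weight}")
        count += 1
    return count


# In-memory sample; for a real file use:
#   with open(filename, newline="") as f:
#       validate_shipment_csv(f)
sample = io.StringIO(
    "timestamp,packet_id,source,destination,weight_kg,status\n"
    "2026-02-11T10:00:00,PKG-12345,NYC_Hub,Port_Newark,12,IN_TRANSIT\n"
)
print(validate_shipment_csv(sample))  # 1
```

A file written without a header fails the first check immediately, which is much easier to diagnose than a silently missing first row in Athena.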
Step 2: Querying with Athena
Once the data is in S3, we need to define a table over it so Athena can query it. Open the Athena console and run the two statements below one at a time (Athena accepts a single SQL statement per query), replacing your-bucket-name with your own bucket. The `timestamp` column is wrapped in backticks because TIMESTAMP is a reserved word in Athena DDL.

```sql
CREATE DATABASE IF NOT EXISTS transocean_logistics;

CREATE EXTERNAL TABLE IF NOT EXISTS transocean_logistics.shipment_logs (
  `timestamp` STRING,
  packet_id STRING,
  source STRING,
  destination STRING,
  weight_kg INT,
  status STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-bucket-name/logs/'
TBLPROPERTIES ('skip.header.line.count'='1');
```
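The aggregation a Sankey visual performs is essentially a GROUP BY over source and destination. Before wiring up QuickSight, you can dry-run that logic locally. The sketch below uses Python's built-in SQLite as a stand-in for Athena with made-up rows; the SELECT itself is standard SQL and, with the table name qualified as `transocean_logistics.shipment_logs`, should also run in the Athena query editor.

```python
import sqlite3

# Local stand-in for Athena: an in-memory SQLite table with the columns the Sankey needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipment_logs (source TEXT, destination TEXT, weight_kg INTEGER)")
conn.executemany(
    "INSERT INTO shipment_logs VALUES (?, ?, ?)",
    [
        ("NYC_Hub", "Port_Newark", 12),
        ("NYC_Hub", "Port_Newark", 8),
        ("Port_Newark", "Port_Yokohama", 30),
        ("Port_Yokohama", "Shibuya_Depot", 25),
    ],
)

# One output row per Sankey band: source, destination, total weight, record count.
query = """
    SELECT source, destination, SUM(weight_kg) AS total_kg, COUNT(*) AS shipments
    FROM shipment_logs
    GROUP BY source, destination
    ORDER BY total_kg DESC
"""
links = conn.execute(query).fetchall()
for row in links:
    print(row)
conn.close()
```

The `total_kg` and `shipments` columns correspond to the two Weight options discussed in Step 3.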
Step 3: Visualizing in QuickSight
Now for the main event. We will visualize the flow of goods to identify which ports are the busiest transshipment points.
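The port-to-port rows from Step 1 pair ports at random, so the data will almost certainly contain bands in both directions (for example Port_Newark to Port_Yokohama and Port_Yokohama back to Port_Newark), which makes the flow graph cyclic. Sankey layouts are designed for flows that move in one direction, and some layout libraries (d3-sankey, for instance) reject circular links outright. How QuickSight renders cyclic data is not covered here, so treat the following as a diagnostic sketch: `has_cycle` is a hypothetical helper that runs a depth-first search over the aggregated links.

```python
from collections import defaultdict


def has_cycle(links):
    """Return True if the directed (source, destination) graph contains any cycle."""
    graph = defaultdict(list)
    for src, dst in links:  # works for a dict of links or a set of (src, dst) pairs
        graph[src].append(dst)
    state = {}  # node -> "visiting" (on the current DFS path) or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph.get(node, []):
            if state.get(nxt) == "visiting":
                return True  # back edge: nxt is still on the current path
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


# Hypothetical aggregated links (kg) with a two-way port pair.
links = {
    ("NYC_Hub", "Port_Newark"): 20,
    ("Port_Newark", "Port_Yokohama"): 30,
    ("Port_Yokohama", "Port_Newark"): 15,
    ("Port_Yokohama", "Shibuya_Depot"): 25,
}
print(has_cycle(links))  # True
del links[("Port_Yokohama", "Port_Newark")]
print(has_cycle(links))  # False
```

If the check returns True and the visual misbehaves, one option is to filter out the port-to-port rows or chart them in a separate visual.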
- Connect Data: In QuickSight, create a new "Dataset" using Athena as the source. Select the `transocean_logistics` database and the `shipment_logs` table.
- Create Analysis: Click "Visualize". In the "Visual types" pane, select the Sankey Diagram icon (it looks like a flowing river).
- Map Fields:
  - Source: Drag `source` to the "Source" well.
  - Destination: Drag `destination` to the "Destination" well.
  - Weight: Drag `weight_kg` (or a count of records) to the "Weight" well.
- Analyze: You will now see bands connecting your hubs to ports, ports to other ports, and ports to depots. Hover over the "Port_Singapore" node to see how much cargo flows through it compared with "Port_Newark".
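The choice of Weight field changes what the diagram says: summing `weight_kg` sizes bands by tonnage, while a record count sizes them by number of shipments. Here is a small hypothetical example where the two disagree; `band_weights` is an illustrative helper, not a QuickSight feature.

```python
from collections import Counter, defaultdict


def band_weights(rows, measure="weight"):
    """Aggregate rows per (source, destination) by total kg, or by shipment count."""
    if measure == "count":
        return dict(Counter((r["source"], r["destination"]) for r in rows))
    totals = defaultdict(int)
    for r in rows:
        totals[(r["source"], r["destination"])] += r["weight_kg"]
    return dict(totals)


# Hypothetical rows: one heavy Singapore shipment vs. three light Newark shipments.
rows = [
    {"source": "Port_Singapore", "destination": "Jurong_Depot", "weight_kg": 45},
    {"source": "Port_Newark", "destination": "Queens_Depot", "weight_kg": 5},
    {"source": "Port_Newark", "destination": "Queens_Depot", "weight_kg": 5},
    {"source": "Port_Newark", "destination": "Queens_Depot", "weight_kg": 5},
]
print(band_weights(rows))           # {('Port_Singapore', 'Jurong_Depot'): 45, ('Port_Newark', 'Queens_Depot'): 15}
print(band_weights(rows, "count"))  # {('Port_Singapore', 'Jurong_Depot'): 1, ('Port_Newark', 'Queens_Depot'): 3}
```

By weight the Singapore band is three times wider; by count the Newark band is. Pick the measure that matches the question: tonnage for capacity planning, record count for handling workload.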
Conclusion
By visualizing raw logs as a Sankey diagram, TransOcean can spot at a glance if, for example, 60% of its real-world cargo flows through Port Singapore, a potential single point of failure, and reroute shipments proactively. (The uniformly random data generated in Step 1 will show roughly balanced flows; such skew only appears in real traffic.) This architecture (logs in S3, queries via Athena, visualization in QuickSight) is serverless, scalable, and well suited to any flow-based data.
