Can I use real-time data?

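Yes, within limits. Data becomes queryable only after the log files land in S3, so there is always a short delay. In QuickSight you can either schedule automatic refreshes of a SPICE dataset or use Direct Query mode, which queries Athena each time a visual loads, for near real-time visualization.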

What other visualizations work well for flow?

Network graphs and chord diagrams also work well for flow data, but Sankey diagrams are usually the better choice when you need to show the magnitude (volume) of each flow relative to other paths.

Is this limited to logistics?

Not at all. This pattern is widely used for User Journey Mapping (Website Page A -> Page B -> Checkout), Cash Flow Analysis, and IT Network Traffic visualization.

Visualizing Data Flow with Amazon QuickSight Sankey Diagrams

Understanding the flow of data, goods, or users through a complex system is often difficult with traditional bar charts or tables. Whether you are tracking server latency between microservices or the movement of packages through a global supply chain, you need to see the connections.

This is where Sankey Diagrams shine. They visualize "flow" from one set of values to another, with the width of the lines representing the magnitude. In this tutorial, we will build a visibility dashboard for "TransOcean Logistics" to track the volume of shipments moving between regional hubs, ports, and last-mile delivery centers using Amazon QuickSight and Amazon Athena.

Why Sankey Diagrams?

Identify Bottlenecks: Instantly see where the "flow" concentrates (e.g., a single port handling too much traffic).
Trace Paths: Follow a shipment's journey from origin to destination across multiple hops.
Visualize Volume: The thickness of the bands makes it easy to compare traffic volumes intuitively.

Step 1: Generating the Data

Our application logs shipment transfers to CSV files in an S3 bucket. Each row represents a package moving from a source location to a destination location (the source and destination columns).

log_generator.py

import csv
import random
from datetime import datetime

# Define our logistics network
HUBS = ['NYC_Hub', 'Lon_Hub', 'Tok_Hub', 'Sin_Hub']
PORTS = ['Port_Newark', 'Port_Felixstowe', 'Port_Yokohama', 'Port_Singapore']
LAST_MILE = ['Queens_Depot', 'Camden_Depot', 'Shibuya_Depot', 'Jurong_Depot']

def generate_logs(num_records=1000):
    """Write num_records simulated shipment transfers to a timestamped CSV file."""
    filename = f"shipment_logs_{datetime.now().strftime('%Y%m%d%H%M%S')}.csv"

    with open(filename, 'w', newline='') as csvfile:
        fieldnames = ['timestamp', 'packet_id', 'source', 'destination', 'weight_kg', 'status']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        # Write the header row; the Athena table in Step 2 skips the first line of each file
        writer.writeheader()

        for _ in range(num_records):
            # Each row is one hop of the overall flow: Hub -> Port -> Port -> Last Mile
            path_type = random.choice(['hub_to_port', 'port_to_port', 'port_to_lastmile'])

            if path_type == 'hub_to_port':
                src, dst = random.choice(HUBS), random.choice(PORTS)
            elif path_type == 'port_to_port':
                src = random.choice(PORTS)
                # A port never ships to itself, so pick the destination from the other ports
                dst = random.choice([p for p in PORTS if p != src])
            else:
                src, dst = random.choice(PORTS), random.choice(LAST_MILE)

            writer.writerow({
                'timestamp': datetime.now().isoformat(),
                'packet_id': f"PKG-{random.randint(10000,99999)}",
                'source': src,
                'destination': dst,
                'weight_kg': random.randint(1, 50),
                'status': 'IN_TRANSIT'
            })

    # Upload to S3 (uncomment to run; requires boto3 and AWS credentials)
    # import boto3
    # s3 = boto3.client('s3')
    # s3.upload_file(filename, 'your-bucket-name', f'logs/{filename}')
    print(f"Generated {filename}")

if __name__ == "__main__":
    generate_logs()
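
Before uploading, it can help to check locally what the Sankey diagram will eventually draw. The sketch below is not part of the original pipeline: it assumes the column order used by log_generator.py, and the helper name link_weights is made up for illustration. It sums weight_kg for every source/destination pair, which is roughly the figure that ends up as the width of each band.

```python
import csv
import glob
from collections import defaultdict

# Column order written by log_generator.py (assumed unchanged)
FIELDNAMES = ['timestamp', 'packet_id', 'source', 'destination', 'weight_kg', 'status']

def link_weights(csv_path):
    """Sum weight_kg for every (source, destination) pair in one log file."""
    totals = defaultdict(int)
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f, fieldnames=FIELDNAMES):
            if row['timestamp'] == 'timestamp':
                continue  # skip the header row if the file has one
            totals[(row['source'], row['destination'])] += int(row['weight_kg'])
    return dict(totals)

if __name__ == "__main__":
    # Print the five heaviest links of each generated file in the current directory
    for path in glob.glob("shipment_logs_*.csv"):
        links = link_weights(path)
        heaviest = sorted(links.items(), key=lambda kv: kv[1], reverse=True)[:5]
        for (src, dst), kg in heaviest:
            print(f"{src} -> {dst}: {kg} kg")
```

Because the generator picks locations uniformly at random, the totals should come out fairly even; a strongly skewed result would point to a problem in the data rather than a real bottleneck.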

Step 2: Querying with Athena

Once the data is in S3, we need to create a schema so Athena can query it. Go to the Athena Console and run the following SQL.

athena_schema.sql

CREATE DATABASE IF NOT EXISTS transocean_logistics;

CREATE EXTERNAL TABLE IF NOT EXISTS transocean_logistics.shipment_logs (
  `timestamp` STRING,
  packet_id STRING,
  source STRING,
  destination STRING,
  weight_kg INT,
  status STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-bucket-name/logs/'
TBLPROPERTIES ('skip.header.line.count'='1');
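
To confirm the table is readable before moving on to QuickSight, you could run a quick aggregate from Python. This is a minimal sketch and not part of the original tutorial: the helper names build_flow_query and start_flow_query are made up, the results location s3://your-bucket-name/athena-results/ is a placeholder, and it assumes boto3 is installed with credentials that can query Athena.

```python
def build_flow_query(database="transocean_logistics", table="shipment_logs"):
    """Return SQL that totals shipments and weight per source/destination pair."""
    return (
        "SELECT source, destination, "
        "COUNT(*) AS shipments, SUM(weight_kg) AS total_weight_kg "
        f"FROM {database}.{table} "
        "GROUP BY source, destination "
        "ORDER BY total_weight_kg DESC"
    )

def start_flow_query(output_location="s3://your-bucket-name/athena-results/"):
    """Submit the query to Athena and return its execution ID.

    output_location is a placeholder; point it at a bucket you own.
    """
    import boto3  # imported here so the query builder works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=build_flow_query(),
        QueryExecutionContext={"Database": "transocean_logistics"},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]

if __name__ == "__main__":
    print(build_flow_query())
    # Uncomment to run against your own account:
    # print(start_flow_query())
```

The query runs asynchronously, so the returned ID is what you would look up in the Athena console (or poll with get_query_execution) to see the results.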

Step 3: Visualizing in QuickSight

Now for the main event. We will visualize the flow of goods to identify which ports are the busiest transshipment points.

Connect Data: In QuickSight, create a new "Dataset" using Athena as the source. Select the transocean_logistics database and the shipment_logs table.
Create Analysis: Click "Visualize". In the "Visual types" pane, select the Sankey Diagram icon (it looks like a flowing river).
Map Fields:
- Source: Drag source to the "Source" well.
- Destination: Drag destination to the "Destination" well.
- Weight: Drag weight_kg (or count of records) to the "Weight" well.
Analyze: You will now see thick bands connecting your Hubs to Ports, and Ports to Depots. Hover over the "Port_Singapore" node to see exactly how much cargo is flowing through it compared to "Port_Newark".
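
The comparison you make by hovering over nodes can also be expressed as a number. The sketch below is an illustration rather than anything QuickSight runs: node_share is a hypothetical helper, and the sample link weights are made up. It computes, for each node, the fraction of total shipped weight carried on links that enter or leave that node.

```python
from collections import defaultdict

def node_share(links):
    """links maps (source, destination) -> weight.

    Returns {node: fraction of total link weight on links touching that node}.
    """
    total = sum(links.values())
    if total == 0:
        return {}
    touched = defaultdict(float)
    for (src, dst), weight in links.items():
        touched[src] += weight
        if dst != src:  # do not count a self-loop twice
            touched[dst] += weight
    return {node: weight / total for node, weight in touched.items()}

if __name__ == "__main__":
    # Made-up weights (kg) for three links, purely for illustration
    sample = {
        ("NYC_Hub", "Port_Newark"): 40,
        ("Port_Newark", "Port_Singapore"): 30,
        ("Port_Singapore", "Jurong_Depot"): 30,
    }
    shares = node_share(sample)
    for node, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{node}: {share:.0%}")
```

Shares add up to more than 100% across nodes because every link touches two nodes; the value is meant for ranking nodes against each other, not for summing.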

Conclusion

By visualizing raw logs as a Sankey diagram, TransOcean can quickly spot imbalances in its network, for example that 60% of its cargo flows through Port Singapore, which would make that port a potential single point of failure. (The figure is illustrative; the random generator in Step 1 spreads traffic roughly evenly.) An insight like this allows the team to reroute shipments proactively. This architecture (logs to S3, query via Athena, visualize in QuickSight) is serverless, scalable, and well suited to any flow-based data.
