Solving the Traveling Salesman Problem with Graph Neural Networks: A Complete Python Tutorial

Have you ever wondered how delivery companies like UPS or Amazon optimize their routes to save millions of dollars in fuel costs? The answer lies in solving one of computer science's most famous problems: the Traveling Salesman Problem (TSP).

In this comprehensive tutorial, you'll learn how to apply Graph Neural Networks (GNNs) — one of the hottest areas in deep learning — to solve TSP. By the end, you'll have a working implementation that predicts optimal routes using PyTorch Geometric.

What is the Traveling Salesman Problem?

The Traveling Salesman Problem (TSP) is a classic optimization problem that asks:

"Given a list of cities and the distances between them, what is the shortest possible route that visits each city exactly once and returns to the starting city?"

Why TSP Matters

TSP isn't just an academic puzzle — it has real-world applications in:

🚚 Logistics: Route optimization for delivery trucks
🔧 Manufacturing: Minimizing tool movement in CNC machines
🧬 Bioinformatics: DNA sequencing
🔌 Electronics: Printed circuit board drilling

The Challenge

Here's the catch: TSP is NP-hard. For just 20 cities, there are over 60 quadrillion possible routes! Traditional algorithms struggle as the problem size grows, making machine learning approaches increasingly attractive.

Why Use Graph Neural Networks for TSP?

Graph Neural Networks are a natural fit for TSP because:

Native Graph Structure: TSP is inherently a graph problem — cities are nodes, routes are edges
Spatial Learning: GNNs can learn spatial relationships between cities
Generalization: A trained GNN can solve new TSP instances without retraining
Speed: Once trained, inference is fast compared to traditional solvers

Our Approach: Edge Classification

Instead of predicting the tour sequence directly, we'll train a GNN to predict which edges belong in the optimal tour. This is a binary classification problem:

Label = 1: Edge is in the optimal tour
Label = 0: Edge is not in the tour

Setting Up Your Environment

First, let's install the required dependencies:

pip install torch torch-geometric numpy matplotlib scikit-learn networkx

Import the necessary libraries:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv
from torch_geometric.data import Data, DataLoader
import numpy as np
import matplotlib.pyplot as plt

Representing TSP as a Graph

Graph Structure

In our TSP graph:

Nodes: Cities with (x, y) coordinates
Edges: Connections between all pairs of cities (complete graph)
Edge Weights: Euclidean distances

Node Features

Each city gets 4 features:

node_features = [
    x_coordinate,        # Absolute x position
    y_coordinate,        # Absolute y position  
    normalized_x,        # x / max_range (0-1)
    normalized_y         # y / max_range (0-1)
]

Edge Features

Each edge gets 2 features:

edge_features = [
    distance,            # Euclidean distance
    normalized_distance  # distance / max_possible_distance
]

Data Generation Code

Here's how we generate a TSP instance:

class TSPDataGenerator:
    def __init__(self, seed=42):
        np.random.seed(seed)
        torch.manual_seed(seed)

    def generate_tsp_instance(self, num_cities=20, coord_range=(0, 100)):
        # Generate random city coordinates
        coords = np.random.uniform(
            coord_range[0], coord_range[1], 
            size=(num_cities, 2)
        )

        # Create complete graph edges and compute distances
        edge_list = []
        edge_attrs = []

        for i in range(num_cities):
            for j in range(i + 1, num_cities):
                # Compute Euclidean distance
                dist = np.sqrt(
                    (coords[i, 0] - coords[j, 0])**2 + 
                    (coords[i, 1] - coords[j, 1])**2
                )

                # Add edge in both directions (undirected)
                edge_list.extend([[i, j], [j, i]])
                edge_attrs.extend([[dist, dist/100.0], [dist, dist/100.0]])

        # Create node features
        node_features = torch.zeros(num_cities, 4, dtype=torch.float)
        for i in range(num_cities):
            node_features[i] = torch.tensor([
                coords[i, 0], coords[i, 1],
                coords[i, 0] / coord_range[1],
                coords[i, 1] / coord_range[1]
            ])

        edge_index = torch.tensor(edge_list, dtype=torch.long).t().contiguous()
        edge_attr = torch.tensor(edge_attrs, dtype=torch.float)

        return Data(
            x=node_features,
            edge_index=edge_index,
            edge_attr=edge_attr,
            coords=torch.tensor(coords, dtype=torch.float)
        )

Generating Labels

We use the nearest neighbor heuristic to generate approximate optimal tours:

def compute_tour_nearest_neighbor(self, coords):
    num_cities = len(coords)
    unvisited = set(range(num_cities))
    tour = [0]  # Start at city 0
    unvisited.remove(0)

    current = 0
    while unvisited:
        # Find nearest unvisited city
        nearest = min(unvisited, 
                     key=lambda c: np.linalg.norm(coords[current] - coords[c]))
        tour.append(nearest)
        unvisited.remove(nearest)
        current = nearest

    return tour

Visualizing TSP Instances

Here's an example of what a TSP instance looks like with 20 cities:

TSP Instance Example

Figure 1: A sample TSP instance with 20 cities. The goal is to find the shortest route visiting each city exactly once.

Building the GNN Model

Our architecture has three main components:

Node Encoder: GNN layers to learn city representations
Edge Encoder: MLP to process distance features
Edge Classifier: Combines node and edge features for prediction

Complete Model Code

class TSPGNN(nn.Module):
    def __init__(self, num_node_features, num_edge_features, 
                 hidden_dim=128, num_layers=3, dropout=0.3):
        super(TSPGNN, self).__init__()

        # Node feature encoder (GCN layers)
        self.node_conv1 = GCNConv(num_node_features, hidden_dim)
        self.node_convs = nn.ModuleList([
            GCNConv(hidden_dim, hidden_dim)
            for _ in range(num_layers - 2)
        ])
        self.node_conv_final = GCNConv(hidden_dim, hidden_dim)

        # Edge feature encoder
        self.edge_encoder = nn.Sequential(
            nn.Linear(num_edge_features, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim)
        )

        # Edge classifier: combines node and edge features
        self.edge_classifier = nn.Sequential(
            nn.Linear(hidden_dim * 3, hidden_dim),  # 2 nodes + 1 edge
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim // 2, 2)  # Binary classification
        )

        self.dropout = dropout

    def forward(self, x, edge_index, edge_attr):
        # Process node features through GNN
        x = F.relu(self.node_conv1(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)

        for conv in self.node_convs:
            x = F.relu(conv(x, edge_index))
            x = F.dropout(x, p=self.dropout, training=self.training)

        x = F.relu(self.node_conv_final(x, edge_index))

        # Process edge features
        edge_features = self.edge_encoder(edge_attr)

        # Get node features for each edge
        row, col = edge_index
        node_i = x[row]  # Source node features
        node_j = x[col]  # Target node features

        # Combine and classify
        combined = torch.cat([node_i, node_j, edge_features], dim=1)
        logits = self.edge_classifier(combined)

        return F.log_softmax(logits, dim=1)

Why This Architecture Works

GNN Layers: Each layer aggregates information from neighboring cities, learning spatial patterns
Multiple Layers: With 3 layers, each city "sees" cities up to 3 hops away
Edge Encoder: Learns that shorter distances are often in optimal tours
Combined Features: The classifier sees both city locations AND distances for its decision

Training the Model

Training Loop

def train_model(model, train_loader, val_loader, epochs=100, lr=0.001):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.NLLLoss()

    best_val_acc = 0
    patience_counter = 0

    for epoch in range(epochs):
        model.train()
        total_loss = 0

        for batch in train_loader:
            batch = batch.to(device)
            optimizer.zero_grad()

            out = model(batch.x, batch.edge_index, batch.edge_attr)
            loss = criterion(out, batch.y)

            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        # Validation
        val_acc = evaluate(model, val_loader, device)

        if val_acc > best_val_acc:
            best_val_acc = val_acc
            patience_counter = 0
            torch.save(model.state_dict(), 'best_tsp_model.pt')
        else:
            patience_counter += 1

        if patience_counter >= 20:  # Early stopping
            print(f"Early stopping at epoch {epoch}")
            break

        if (epoch + 1) % 10 == 0:
            print(f'Epoch {epoch+1} | Loss: {total_loss/len(train_loader):.4f} | Val Acc: {val_acc:.4f}')

    return model

Key Training Tips

Use Early Stopping: Prevents overfitting
Class Imbalance: Only ~14% of edges are in the tour — consider weighted loss
Learning Rate: 0.001 works well for Adam
Batch Size: Use 1 for simplicity (each graph is one batch)

Training Progress

Here's what the training curves look like for our GNN model:

Training Curves

Figure 2: Training and validation metrics over epochs. Notice how the model learns to predict edges in optimal tours with increasing accuracy.

Evaluating Results

Edge Classification Metrics

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(model, test_loader, device):
    model.eval()
    all_preds, all_labels = [], []

    with torch.no_grad():
        for batch in test_loader:
            batch = batch.to(device)
            out = model(batch.x, batch.edge_index, batch.edge_attr)
            pred = out.argmax(dim=1)

            all_preds.extend(pred.cpu().numpy())
            all_labels.extend(batch.y.cpu().numpy())

    print(f"Accuracy: {accuracy_score(all_labels, all_preds):.4f}")
    print(f"Precision: {precision_score(all_labels, all_preds, average='weighted'):.4f}")
    print(f"Recall: {recall_score(all_labels, all_preds, average='weighted'):.4f}")
    print(f"F1-Score: {f1_score(all_labels, all_preds, average='weighted'):.4f}")

Expected Results

With proper training, you should see:

Metric	Expected Range
Accuracy	75-90%
Precision	0.65-0.85
Recall	0.70-0.90
F1-Score	0.70-0.85

Model Performance Analysis

Let's examine the confusion matrix to understand how well our model performs:

Confusion Matrix

Figure 3: Confusion matrix showing the model's edge classification performance. The model correctly identifies most edges that belong (and don't belong) to optimal tours.

Tour Quality Metrics

The following visualization shows how our predicted tours compare to optimal solutions:

Tour Quality Analysis

Figure 4: Comparison of predicted tour lengths vs. optimal tour lengths. Our GNN model produces tours that are close to optimal solutions.

Visualizing Predictions

def visualize_tour(coords, true_tour, pred_edges):
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))

    # True tour
    axes[0].scatter(coords[:, 0], coords[:, 1], c='red', s=200, zorder=3)
    for i in range(len(true_tour)):
        start = true_tour[i]
        end = true_tour[(i + 1) % len(true_tour)]
        axes[0].plot([coords[start, 0], coords[end, 0]],
                    [coords[start, 1], coords[end, 1]], 'b-', lw=2)
    axes[0].set_title('Optimal Tour')

    # Predicted edges
    axes[1].scatter(coords[:, 0], coords[:, 1], c='red', s=200, zorder=3)
    for i, j in pred_edges:
        axes[1].plot([coords[i, 0], coords[j, 0]],
                    [coords[i, 1], coords[j, 1]], 'g-', lw=1, alpha=0.5)
    axes[1].set_title('Predicted Edges')

    plt.tight_layout()
    plt.savefig('tsp_comparison.png', dpi=150)
    plt.show()

Here's an example of our model's predictions on a sample TSP instance:

Sample TSP Predictions

Figure 5: Side-by-side comparison of optimal tour (left) and predicted edges (right) for a 20-city TSP instance.

Understanding Model Confidence

The model also provides probability scores for each edge. Here's a visualization of the probability distribution:

Probability Analysis

Figure 6: Probability heatmap showing which edges the model believes are most likely to be in the optimal tour. Darker colors indicate higher confidence.

Predicted Tour Visualization

Here's a complete predicted tour from our trained model:

Predicted Tour

Figure 7: A complete tour constructed from the model's edge predictions. The tour visits all cities and returns to the starting point.

Complete Code Repository

You can find the complete implementation on GitHub:

🔗 Graph Neural Networks Tutorial Repository

The repository includes:

✅ Complete TSP implementation
✅ Step-by-step tutorials
✅ Pre-trained models
✅ Visualization scripts
✅ Additional GNN examples (supply chain optimization)

Conclusion and Next Steps

In this tutorial, you learned how to:

✅ Represent the Traveling Salesman Problem as a graph
✅ Design a GNN architecture for edge prediction
✅ Train and evaluate the model
✅ Visualize and interpret results

Where to Go From Here

Improve the Model:

Try Graph Attention Networks (GAT) for better performance
Implement reinforcement learning for direct tour optimization
Add post-processing with 2-opt to improve tour quality

Scale Up:

Handle larger instances (50-100+ cities)
Use sparse graphs to reduce computation
Implement hierarchical approaches for very large problems

Real-World Applications:

Add time window constraints for delivery scheduling
Handle multiple vehicles for fleet routing
Incorporate real map data with actual road distances

🚀 Key Takeaways

Concept	What You Learned
TSP	Classic optimization problem with real-world applications
GNNs	Neural networks that naturally handle graph-structured data
Edge Prediction	Simpler than sequence prediction, works with variable sizes
PyTorch Geometric	Powerful library for GNN implementations

Resources

Did you find this tutorial helpful? Drop a comment below or share it with fellow ML enthusiasts!

Have questions? Feel free to leave a comment.

About the Author

Hi, I'm Israel, a data scientist and AI engineer passionate about transforming real-world challenges into innovative solutions with machine learning and data. I love mentoring and supporting others as they grow in their tech careers. When I'm not coding or coaching, you'll likely find me immersed in a game of chess or enjoying a good action movie with my family. I hope you enjoyed this blog post and learnt something.

Follow for more ML tutorials: Hashnode | GitHub | LinkedIn

Command Palette