Solving the Traveling Salesman Problem with Graph Neural Networks: A Complete Python Tutorial
Learn how to apply GNNs to classic optimization problems with PyTorch Geometric
Solving the Traveling Salesman Problem with Graph Neural Networks: A Complete Python Tutorial
Have you ever wondered how delivery companies like UPS or Amazon optimize their routes to save millions of dollars in fuel costs? The answer lies in solving one of computer science's most famous problems: the Traveling Salesman Problem (TSP).
In this comprehensive tutorial, you'll learn how to apply Graph Neural Networks (GNNs) โ one of the hottest areas in deep learning โ to solve TSP. By the end, you'll have a working implementation that predicts optimal routes using PyTorch Geometric.
๐ Table of Contents
- What is the Traveling Salesman Problem?
- Why Use Graph Neural Networks for TSP?
- Setting Up Your Environment
- Representing TSP as a Graph
- Building the GNN Model
- Training the Model
- Evaluating Results
- Complete Code Repository
- Conclusion and Next Steps
What is the Traveling Salesman Problem?
The Traveling Salesman Problem (TSP) is a classic optimization problem that asks:
"Given a list of cities and the distances between them, what is the shortest possible route that visits each city exactly once and returns to the starting city?"
Why TSP Matters
TSP isn't just an academic puzzle โ it has real-world applications in:
- ๐ Logistics: Route optimization for delivery trucks
- ๐ง Manufacturing: Minimizing tool movement in CNC machines
- ๐งฌ Bioinformatics: DNA sequencing
- ๐ Electronics: Printed circuit board drilling
The Challenge
Here's the catch: TSP is NP-hard. For just 20 cities, there are over 60 quadrillion possible routes! Traditional algorithms struggle as the problem size grows, making machine learning approaches increasingly attractive.
Why Use Graph Neural Networks for TSP?
Graph Neural Networks are a natural fit for TSP because:
- Native Graph Structure: TSP is inherently a graph problem โ cities are nodes, routes are edges
- Spatial Learning: GNNs can learn spatial relationships between cities
- Generalization: A trained GNN can solve new TSP instances without retraining
- Speed: Once trained, inference is fast compared to traditional solvers
Our Approach: Edge Classification
Instead of predicting the tour sequence directly, we'll train a GNN to predict which edges belong in the optimal tour. This is a binary classification problem:
- Label = 1: Edge is in the optimal tour
- Label = 0: Edge is not in the tour
Setting Up Your Environment
First, let's install the required dependencies:
pip install torch torch-geometric numpy matplotlib scikit-learn networkx
Import the necessary libraries:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv
from torch_geometric.data import Data, DataLoader
import numpy as np
import matplotlib.pyplot as plt
Representing TSP as a Graph
Graph Structure
In our TSP graph:
- Nodes: Cities with (x, y) coordinates
- Edges: Connections between all pairs of cities (complete graph)
- Edge Weights: Euclidean distances
Node Features
Each city gets 4 features:
node_features = [
x_coordinate, # Absolute x position
y_coordinate, # Absolute y position
normalized_x, # x / max_range (0-1)
normalized_y # y / max_range (0-1)
]
Edge Features
Each edge gets 2 features:
edge_features = [
distance, # Euclidean distance
normalized_distance # distance / max_possible_distance
]
Data Generation Code
Here's how we generate a TSP instance:
class TSPDataGenerator:
def __init__(self, seed=42):
np.random.seed(seed)
torch.manual_seed(seed)
def generate_tsp_instance(self, num_cities=20, coord_range=(0, 100)):
# Generate random city coordinates
coords = np.random.uniform(
coord_range[0], coord_range[1],
size=(num_cities, 2)
)
# Create complete graph edges and compute distances
edge_list = []
edge_attrs = []
for i in range(num_cities):
for j in range(i + 1, num_cities):
# Compute Euclidean distance
dist = np.sqrt(
(coords[i, 0] - coords[j, 0])**2 +
(coords[i, 1] - coords[j, 1])**2
)
# Add edge in both directions (undirected)
edge_list.extend([[i, j], [j, i]])
edge_attrs.extend([[dist, dist/100.0], [dist, dist/100.0]])
# Create node features
node_features = torch.zeros(num_cities, 4, dtype=torch.float)
for i in range(num_cities):
node_features[i] = torch.tensor([
coords[i, 0], coords[i, 1],
coords[i, 0] / coord_range[1],
coords[i, 1] / coord_range[1]
])
edge_index = torch.tensor(edge_list, dtype=torch.long).t().contiguous()
edge_attr = torch.tensor(edge_attrs, dtype=torch.float)
return Data(
x=node_features,
edge_index=edge_index,
edge_attr=edge_attr,
coords=torch.tensor(coords, dtype=torch.float)
)
Generating Labels
We use the nearest neighbor heuristic to generate approximate optimal tours:
def compute_tour_nearest_neighbor(self, coords):
num_cities = len(coords)
unvisited = set(range(num_cities))
tour = [0] # Start at city 0
unvisited.remove(0)
current = 0
while unvisited:
# Find nearest unvisited city
nearest = min(unvisited,
key=lambda c: np.linalg.norm(coords[current] - coords[c]))
tour.append(nearest)
unvisited.remove(nearest)
current = nearest
return tour
Visualizing TSP Instances
Here's an example of what a TSP instance looks like with 20 cities:

Figure 1: A sample TSP instance with 20 cities. The goal is to find the shortest route visiting each city exactly once.
Building the GNN Model
Our architecture has three main components:
- Node Encoder: GNN layers to learn city representations
- Edge Encoder: MLP to process distance features
- Edge Classifier: Combines node and edge features for prediction
Complete Model Code
class TSPGNN(nn.Module):
def __init__(self, num_node_features, num_edge_features,
hidden_dim=128, num_layers=3, dropout=0.3):
super(TSPGNN, self).__init__()
# Node feature encoder (GCN layers)
self.node_conv1 = GCNConv(num_node_features, hidden_dim)
self.node_convs = nn.ModuleList([
GCNConv(hidden_dim, hidden_dim)
for _ in range(num_layers - 2)
])
self.node_conv_final = GCNConv(hidden_dim, hidden_dim)
# Edge feature encoder
self.edge_encoder = nn.Sequential(
nn.Linear(num_edge_features, hidden_dim),
nn.ReLU(),
nn.Dropout(dropout),
nn.Linear(hidden_dim, hidden_dim)
)
# Edge classifier: combines node and edge features
self.edge_classifier = nn.Sequential(
nn.Linear(hidden_dim * 3, hidden_dim), # 2 nodes + 1 edge
nn.ReLU(),
nn.Dropout(dropout),
nn.Linear(hidden_dim, hidden_dim // 2),
nn.ReLU(),
nn.Dropout(dropout),
nn.Linear(hidden_dim // 2, 2) # Binary classification
)
self.dropout = dropout
def forward(self, x, edge_index, edge_attr):
# Process node features through GNN
x = F.relu(self.node_conv1(x, edge_index))
x = F.dropout(x, p=self.dropout, training=self.training)
for conv in self.node_convs:
x = F.relu(conv(x, edge_index))
x = F.dropout(x, p=self.dropout, training=self.training)
x = F.relu(self.node_conv_final(x, edge_index))
# Process edge features
edge_features = self.edge_encoder(edge_attr)
# Get node features for each edge
row, col = edge_index
node_i = x[row] # Source node features
node_j = x[col] # Target node features
# Combine and classify
combined = torch.cat([node_i, node_j, edge_features], dim=1)
logits = self.edge_classifier(combined)
return F.log_softmax(logits, dim=1)
Why This Architecture Works
- GNN Layers: Each layer aggregates information from neighboring cities, learning spatial patterns
- Multiple Layers: With 3 layers, each city "sees" cities up to 3 hops away
- Edge Encoder: Learns that shorter distances are often in optimal tours
- Combined Features: The classifier sees both city locations AND distances for its decision
Training the Model
Training Loop
def train_model(model, train_loader, val_loader, epochs=100, lr=0.001):
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
criterion = nn.NLLLoss()
best_val_acc = 0
patience_counter = 0
for epoch in range(epochs):
model.train()
total_loss = 0
for batch in train_loader:
batch = batch.to(device)
optimizer.zero_grad()
out = model(batch.x, batch.edge_index, batch.edge_attr)
loss = criterion(out, batch.y)
loss.backward()
optimizer.step()
total_loss += loss.item()
# Validation
val_acc = evaluate(model, val_loader, device)
if val_acc > best_val_acc:
best_val_acc = val_acc
patience_counter = 0
torch.save(model.state_dict(), 'best_tsp_model.pt')
else:
patience_counter += 1
if patience_counter >= 20: # Early stopping
print(f"Early stopping at epoch {epoch}")
break
if (epoch + 1) % 10 == 0:
print(f'Epoch {epoch+1} | Loss: {total_loss/len(train_loader):.4f} | Val Acc: {val_acc:.4f}')
return model
Key Training Tips
- Use Early Stopping: Prevents overfitting
- Class Imbalance: Only ~14% of edges are in the tour โ consider weighted loss
- Learning Rate: 0.001 works well for Adam
- Batch Size: Use 1 for simplicity (each graph is one batch)
Training Progress
Here's what the training curves look like for our GNN model:

Figure 2: Training and validation metrics over epochs. Notice how the model learns to predict edges in optimal tours with increasing accuracy.
Evaluating Results
Edge Classification Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
def evaluate(model, test_loader, device):
model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
for batch in test_loader:
batch = batch.to(device)
out = model(batch.x, batch.edge_index, batch.edge_attr)
pred = out.argmax(dim=1)
all_preds.extend(pred.cpu().numpy())
all_labels.extend(batch.y.cpu().numpy())
print(f"Accuracy: {accuracy_score(all_labels, all_preds):.4f}")
print(f"Precision: {precision_score(all_labels, all_preds, average='weighted'):.4f}")
print(f"Recall: {recall_score(all_labels, all_preds, average='weighted'):.4f}")
print(f"F1-Score: {f1_score(all_labels, all_preds, average='weighted'):.4f}")
Expected Results
With proper training, you should see:
| Metric | Expected Range |
| Accuracy | 75-90% |
| Precision | 0.65-0.85 |
| Recall | 0.70-0.90 |
| F1-Score | 0.70-0.85 |
Model Performance Analysis
Let's examine the confusion matrix to understand how well our model performs:

Figure 3: Confusion matrix showing the model's edge classification performance. The model correctly identifies most edges that belong (and don't belong) to optimal tours.
Tour Quality Metrics
The following visualization shows how our predicted tours compare to optimal solutions:

Figure 4: Comparison of predicted tour lengths vs. optimal tour lengths. Our GNN model produces tours that are close to optimal solutions.
Visualizing Predictions
def visualize_tour(coords, true_tour, pred_edges):
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# True tour
axes[0].scatter(coords[:, 0], coords[:, 1], c='red', s=200, zorder=3)
for i in range(len(true_tour)):
start = true_tour[i]
end = true_tour[(i + 1) % len(true_tour)]
axes[0].plot([coords[start, 0], coords[end, 0]],
[coords[start, 1], coords[end, 1]], 'b-', lw=2)
axes[0].set_title('Optimal Tour')
# Predicted edges
axes[1].scatter(coords[:, 0], coords[:, 1], c='red', s=200, zorder=3)
for i, j in pred_edges:
axes[1].plot([coords[i, 0], coords[j, 0]],
[coords[i, 1], coords[j, 1]], 'g-', lw=1, alpha=0.5)
axes[1].set_title('Predicted Edges')
plt.tight_layout()
plt.savefig('tsp_comparison.png', dpi=150)
plt.show()
Here's an example of our model's predictions on a sample TSP instance:

Figure 5: Side-by-side comparison of optimal tour (left) and predicted edges (right) for a 20-city TSP instance.
Understanding Model Confidence
The model also provides probability scores for each edge. Here's a visualization of the probability distribution:

Figure 6: Probability heatmap showing which edges the model believes are most likely to be in the optimal tour. Darker colors indicate higher confidence.
Predicted Tour Visualization
Here's a complete predicted tour from our trained model:

Figure 7: A complete tour constructed from the model's edge predictions. The tour visits all cities and returns to the starting point.
Complete Code Repository
You can find the complete implementation on GitHub:
๐ Graph Neural Networks Tutorial Repository
The repository includes:
- โ Complete TSP implementation
- โ Step-by-step tutorials
- โ Pre-trained models
- โ Visualization scripts
- โ Additional GNN examples (supply chain optimization)
Conclusion and Next Steps
In this tutorial, you learned how to:
- โ Represent the Traveling Salesman Problem as a graph
- โ Design a GNN architecture for edge prediction
- โ Train and evaluate the model
- โ Visualize and interpret results
Where to Go From Here
Improve the Model:
- Try Graph Attention Networks (GAT) for better performance
- Implement reinforcement learning for direct tour optimization
- Add post-processing with 2-opt to improve tour quality
Scale Up:
- Handle larger instances (50-100+ cities)
- Use sparse graphs to reduce computation
- Implement hierarchical approaches for very large problems
Real-World Applications:
- Add time window constraints for delivery scheduling
- Handle multiple vehicles for fleet routing
- Incorporate real map data with actual road distances
๐ Key Takeaways
| Concept | What You Learned |
| TSP | Classic optimization problem with real-world applications |
| GNNs | Neural networks that naturally handle graph-structured data |
| Edge Prediction | Simpler than sequence prediction, works with variable sizes |
| PyTorch Geometric | Powerful library for GNN implementations |
Resources
- ๐ PyTorch Geometric Documentation
- ๐ Graph Neural Networks Review Paper
- ๐ Stanford CS224W: Machine Learning with Graphs
Did you find this tutorial helpful? Drop a comment below or share it with fellow ML enthusiasts!
Have questions? Feel free to leave a comment.
About the Author
Hi, I'm Israel, a data scientist and AI engineer passionate about transforming real-world challenges into innovative solutions with machine learning and data. I love mentoring and supporting others as they grow in their tech careers. When I'm not coding or coaching, you'll likely find me immersed in a game of chess or enjoying a good action movie with my family. I hope you enjoyed this blog post and learnt something.
