Back to Projects
Project Overview
This project involved analyzing customer behavior patterns from a UK-based online retail dataset containing over 541,000 transactions to identify distinct customer segments using RFM (Recency, Frequency, Monetary) analysis. The goal was to help the business understand their customer base better and develop targeted marketing strategies.
The analysis revealed 6 distinct customer segments, each with unique characteristics and business implications. The findings were visualized using an interactive Tableau dashboard featuring KPI metrics, revenue analysis, trend visualizations, and detailed segment performance tables. The dashboard includes interactive filters, cross-filtering capabilities, and drill-down functionality to explore customer patterns at multiple levels of granularity.
Python
Pandas
NumPy
Scikit-learn
Tableau
RFM Analysis
K-means Clustering
Data Visualization
Problem Statement
The UK online retail company needed to understand their customer base better to:
- Identify high-value customers for retention strategies
- Develop targeted marketing campaigns for different customer groups
- Optimize resource allocation based on customer profitability
- Predict customer churn and implement preventive measures
- Improve overall customer lifetime value
Methodology
1. Data Preprocessing
Cleaned and prepared the dataset by handling missing values, removing duplicates, and standardizing data formats. The dataset contained transaction records with customer IDs, purchase dates, quantities, and monetary values.
# Data preprocessing example
import pandas as pd
import numpy as np
# Load and clean data
df = pd.read_excel('Online Retail.xlsx')
df = df.dropna(subset=['CustomerID'])
df = df[df['Quantity'] > 0]
df['TotalAmount'] = df['Quantity'] * df['UnitPrice']
2. RFM Analysis
Implemented RFM (Recency, Frequency, Monetary) analysis to evaluate customer value:
- Recency: Days since last purchase
- Frequency: Number of transactions
- Monetary: Total amount spent
# RFM Analysis
from datetime import datetime, timedelta
# Calculate RFM metrics
snapshot_date = df['InvoiceDate'].max() + timedelta(days=1)
rfm = df.groupby('CustomerID').agg({
'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
'InvoiceNo': 'count',
'TotalAmount': 'sum'
}).rename(columns={
'InvoiceDate': 'Recency',
'InvoiceNo': 'Frequency',
'TotalAmount': 'Monetary'
})
3. Customer Segmentation
Applied K-means clustering to group customers into distinct segments based on their RFM scores. The optimal number of clusters was determined using the elbow method and silhouette analysis.
# K-means clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Standardize features
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm)
# Apply K-means clustering
kmeans = KMeans(n_clusters=6, random_state=42)
rfm['Cluster'] = kmeans.fit_predict(rfm_scaled)
Key Results
The analysis successfully identified distinct customer segments with actionable business insights. Key metrics and segment characteristics are summarized below. For real-time, interactive exploration of all metrics, trends, and detailed segment performance, use the Tableau dashboard embedded below.
6
Customer Segments Identified
6.84%
Revenue from Champions
£6.7M
Total Revenue Analyzed
Customer Segments Identified:
- Champions: High-value, frequent customers who recently made purchases - ideal for VIP programs and exclusive offers
- Loyal Customers: Regular customers with good recency and frequency - potential for upselling and cross-selling
- At Risk: Previously valuable customers showing declining engagement - require immediate re-engagement campaigns
- Can't Lose: High-value customers who haven't purchased recently - critical for win-back strategies
- About to Sleep: Customers with declining frequency and recency - benefit from reactivation campaigns
- Lost: Customers with low recency, frequency, and monetary value - consider low-cost retention or new acquisition strategies
Note: The interactive dashboard below provides dynamic filtering, detailed metrics breakdowns, monthly trend analysis, and drill-down capabilities to explore these segments in depth.
Interactive Tableau Dashboard
Explore the comprehensive Tableau dashboard below to visualize customer segments and their characteristics. The interactive dashboard provides real-time insights through:
Key Performance Indicators (KPIs)
- Total Customers - Count of unique customers with trend indicators
- Total Revenue - Sum of all transaction amounts in currency format
- Average RFM Score - Mean RFM score across all customer segments
Primary Visualizations
- Revenue by Segment - Bar chart showing total revenue and customer count per segment
- Revenue Share - Pie chart displaying percentage distribution of revenue across segments
Secondary Visualizations
- Monthly Revenue Trends - Line chart tracking revenue patterns over time by segment
- Customer Distribution - Scatter plot mapping recency vs monetary value, with frequency shown as bubble size
Detailed Analysis
- Segment Performance Table - Comprehensive metrics including customer count, average recency, frequency, monetary value, and RFM scores
- Interactive Filters - Date range, segment selection, country filters, and revenue thresholds
- Cross-filtering & Drill-down - Click any segment or data point to filter across all visualizations and explore individual customer details
Tableau Dashboard
To display the dashboard, please update the JavaScript code below with your published Tableau dashboard URL.
You can get the embed URL from Tableau Public or Tableau Server by clicking Share > Embed.
The dashboard file is located at: customer-segmentation-uk/tableau/customer-segmentation.twb
Business Impact & Recommendations
Strategic Recommendations:
- Champions: Implement VIP programs and exclusive offers to maintain loyalty
- Loyal Customers: Upsell and cross-sell opportunities to increase value
- At Risk: Immediate re-engagement campaigns with personalized offers
- Can't Lose: Win-back campaigns with significant incentives
- About to Sleep: Reactivation campaigns with product recommendations
- Lost: Low-cost retention efforts or consider them as acquisition targets
Expected Outcomes:
- 15-20% increase in customer retention rates
- 25% improvement in marketing campaign effectiveness
- 10-15% increase in average customer lifetime value
- Better resource allocation and ROI on marketing spend
Technical Implementation
Data Pipeline:
- Data extraction from Excel files
- Data cleaning and preprocessing using Pandas
- RFM metric calculation and scoring
- K-means clustering implementation
- Segment analysis and interpretation
- Tableau dashboard creation and deployment
Key Libraries Used:
# Key imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import warnings
warnings.filterwarnings('ignore')
Lessons Learned
- Data quality is crucial for meaningful segmentation results
- Business context is essential for interpreting clustering results
- Visualization plays a key role in communicating insights to stakeholders
- Regular model updates are necessary as customer behavior evolves
- Combining statistical analysis with business knowledge yields better results