Customer Segmentation Analysis

Project Overview

This project involved analyzing customer behavior patterns from a UK-based online retail dataset containing over 541,000 transactions to identify distinct customer segments using RFM (Recency, Frequency, Monetary) analysis. The goal was to help the business understand their customer base better and develop targeted marketing strategies.

The analysis revealed 6 distinct customer segments, each with unique characteristics and business implications. The findings were visualized using an interactive Tableau dashboard featuring KPI metrics, revenue analysis, trend visualizations, and detailed segment performance tables. The dashboard includes interactive filters, cross-filtering capabilities, and drill-down functionality to explore customer patterns at multiple levels of granularity.

Python Pandas NumPy Scikit-learn Tableau RFM Analysis K-means Clustering Data Visualization

Problem Statement

The UK online retail company needed to understand their customer base better to:

Identify high-value customers for retention strategies
Develop targeted marketing campaigns for different customer groups
Optimize resource allocation based on customer profitability
Predict customer churn and implement preventive measures
Improve overall customer lifetime value

Methodology

1. Data Preprocessing

Cleaned and prepared the dataset by handling missing values, removing duplicates, and standardizing data formats. The dataset contained transaction records with customer IDs, purchase dates, quantities, and monetary values.

# Data preprocessing example
import pandas as pd
import numpy as np

# Load and clean data
df = pd.read_excel('Online Retail.xlsx')
df = df.dropna(subset=['CustomerID'])
df = df[df['Quantity'] > 0]
df['TotalAmount'] = df['Quantity'] * df['UnitPrice']
            

2. RFM Analysis

Implemented RFM (Recency, Frequency, Monetary) analysis to evaluate customer value:

Recency: Days since last purchase
Frequency: Number of transactions
Monetary: Total amount spent

# RFM Analysis
from datetime import datetime, timedelta

# Calculate RFM metrics
snapshot_date = df['InvoiceDate'].max() + timedelta(days=1)
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,
    'InvoiceNo': 'count',
    'TotalAmount': 'sum'
}).rename(columns={
    'InvoiceDate': 'Recency',
    'InvoiceNo': 'Frequency',
    'TotalAmount': 'Monetary'
})
            

3. Customer Segmentation

Applied K-means clustering to group customers into distinct segments based on their RFM scores. The optimal number of clusters was determined using the elbow method and silhouette analysis.

# K-means clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Standardize features
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm)

# Apply K-means clustering
kmeans = KMeans(n_clusters=6, random_state=42)
rfm['Cluster'] = kmeans.fit_predict(rfm_scaled)
            

Key Results

The analysis successfully identified distinct customer segments with actionable business insights. Key metrics and segment characteristics are summarized below. For real-time, interactive exploration of all metrics, trends, and detailed segment performance, use the Tableau dashboard embedded below.

6

Customer Segments Identified

6.84%

Revenue from Champions

59.18%

Customers at Risk

£6.7M

Total Revenue Analyzed

Customer Segments Identified:

Champions: High-value, frequent customers who recently made purchases - ideal for VIP programs and exclusive offers
Loyal Customers: Regular customers with good recency and frequency - potential for upselling and cross-selling
At Risk: Previously valuable customers showing declining engagement - require immediate re-engagement campaigns
Can't Lose: High-value customers who haven't purchased recently - critical for win-back strategies
About to Sleep: Customers with declining frequency and recency - benefit from reactivation campaigns
Lost: Customers with low recency, frequency, and monetary value - consider low-cost retention or new acquisition strategies

Note: The interactive dashboard below provides dynamic filtering, detailed metrics breakdowns, monthly trend analysis, and drill-down capabilities to explore these segments in depth.

Interactive Tableau Dashboard

Explore the comprehensive Tableau dashboard below to visualize customer segments and their characteristics. The interactive dashboard provides real-time insights through:

Key Performance Indicators (KPIs)

Total Customers - Count of unique customers with trend indicators
Total Revenue - Sum of all transaction amounts in currency format
Average RFM Score - Mean RFM score across all customer segments

Primary Visualizations

Revenue by Segment - Bar chart showing total revenue and customer count per segment
Revenue Share - Pie chart displaying percentage distribution of revenue across segments

Secondary Visualizations

Monthly Revenue Trends - Line chart tracking revenue patterns over time by segment
Customer Distribution - Scatter plot mapping recency vs monetary value, with frequency shown as bubble size

Detailed Analysis

Segment Performance Table - Comprehensive metrics including customer count, average recency, frequency, monetary value, and RFM scores
Interactive Filters - Date range, segment selection, country filters, and revenue thresholds
Cross-filtering & Drill-down - Click any segment or data point to filter across all visualizations and explore individual customer details

Business Impact & Recommendations

Strategic Recommendations:

Champions: Implement VIP programs and exclusive offers to maintain loyalty
Loyal Customers: Upsell and cross-sell opportunities to increase value
At Risk: Immediate re-engagement campaigns with personalized offers
Can't Lose: Win-back campaigns with significant incentives
About to Sleep: Reactivation campaigns with product recommendations
Lost: Low-cost retention efforts or consider them as acquisition targets

Expected Outcomes:

15-20% increase in customer retention rates
25% improvement in marketing campaign effectiveness
10-15% increase in average customer lifetime value
Better resource allocation and ROI on marketing spend

Technical Implementation

Data Pipeline:

Data extraction from Excel files
Data cleaning and preprocessing using Pandas
RFM metric calculation and scoring
K-means clustering implementation
Segment analysis and interpretation
Tableau dashboard creation and deployment

Key Libraries Used:

# Key imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import warnings
warnings.filterwarnings('ignore')
            

Lessons Learned

Data quality is crucial for meaningful segmentation results
Business context is essential for interpreting clustering results
Visualization plays a key role in communicating insights to stakeholders
Regular model updates are necessary as customer behavior evolves
Combining statistical analysis with business knowledge yields better results