Customer Segmentation Analysis

RFM analysis of UK retail customers using Python and Tableau to identify distinct customer segments and create actionable business strategies


Project Overview

This project involved analyzing customer behavior patterns from a UK-based online retail dataset containing over 541,000 transactions to identify distinct customer segments using RFM (Recency, Frequency, Monetary) analysis. The goal was to help the business understand their customer base better and develop targeted marketing strategies.

The analysis revealed 6 distinct customer segments, each with unique characteristics and business implications. The findings were visualized using an interactive Tableau dashboard featuring KPI metrics, revenue analysis, trend visualizations, and detailed segment performance tables. The dashboard includes interactive filters, cross-filtering capabilities, and drill-down functionality to explore customer patterns at multiple levels of granularity.

Python, Pandas, NumPy, Scikit-learn, Tableau, RFM Analysis, K-means Clustering, Data Visualization

Problem Statement

The UK online retail company needed to understand their customer base better to:

Methodology

1. Data Preprocessing

Cleaned and prepared the dataset by handling missing values, removing duplicates, and standardizing data formats. The dataset contained transaction records with customer IDs, purchase dates, quantities, and monetary values.

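    # Data preprocessing example
    import pandas as pd
    import numpy as np

    # Load and clean data
    df = pd.read_excel('Online Retail.xlsx')
    df = df.dropna(subset=['CustomerID'])   # drop rows with no customer ID
    df = df.drop_duplicates()               # remove exact duplicate rows
    df = df[df['Quantity'] > 0].copy()      # drop non-positive quantities (e.g. returns)
    df['TotalAmount'] = df['Quantity'] * df['UnitPrice']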
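
To make the effect of each cleaning rule visible, here is a minimal sketch that applies them to a tiny synthetic table. The rows and the helper name `clean_transactions` are assumptions made for illustration; they are not part of the original project code or data.

```python
import pandas as pd

# Tiny synthetic stand-in for the transaction table (all values are made up).
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "C536367", "536368"],
    "CustomerID": [17850.0, 17850.0, None, 13047.0, 13047.0],
    "Quantity": [6, 6, 2, -1, 3],
    "UnitPrice": [2.55, 2.55, 1.85, 4.25, 3.39],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the cleaning rules described above."""
    out = df.dropna(subset=["CustomerID"])   # rows without a customer ID
    out = out.drop_duplicates()              # exact duplicate rows
    out = out[out["Quantity"] > 0].copy()    # non-positive quantities
    out["TotalAmount"] = out["Quantity"] * out["UnitPrice"]
    return out


clean = clean_transactions(raw)
print(len(raw), "rows before,", len(clean), "rows after cleaning")
```

Of the five synthetic rows, one is dropped for a missing customer ID, one as a duplicate, and one for a negative quantity, leaving two rows.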

2. RFM Analysis

Implemented RFM (Recency, Frequency, Monetary) analysis to evaluate customer value:

    # RFM Analysis
    from datetime import timedelta

    # Calculate RFM metrics relative to the day after the last recorded invoice
    snapshot_date = df['InvoiceDate'].max() + timedelta(days=1)

    rfm = df.groupby('CustomerID').agg({
        'InvoiceDate': lambda x: (snapshot_date - x.max()).days,  # days since last purchase
        'InvoiceNo': 'count',   # transaction rows per customer ('nunique' would count distinct invoices)
        'TotalAmount': 'sum'    # total spend
    }).rename(columns={
        'InvoiceDate': 'Recency',
        'InvoiceNo': 'Frequency',
        'TotalAmount': 'Monetary'
    })
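
The raw metrics are often converted to 1-5 scores before interpretation. The sketch below shows one common way to do that with quintiles. It is an assumption about how scoring could be done, not the project's confirmed method. The table `rfm_demo` is random synthetic data and `score_quintiles` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic RFM table for illustration only (random values, not project data).
rng = np.random.default_rng(0)
rfm_demo = pd.DataFrame({
    "Recency": rng.integers(1, 366, size=200),
    "Frequency": rng.integers(1, 50, size=200),
    "Monetary": rng.gamma(2.0, 300.0, size=200),
})


def score_quintiles(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Hypothetical helper: rank-based 1-5 score (5 = best).

    Ranking with method="first" breaks ties, so qcut always gets unique bin edges.
    """
    ranks = values.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, q=5, labels=[1, 2, 3, 4, 5]).astype(int)


# Low recency is good, so its ranking direction is reversed.
rfm_demo["R"] = score_quintiles(rfm_demo["Recency"], higher_is_better=False)
rfm_demo["F"] = score_quintiles(rfm_demo["Frequency"])
rfm_demo["M"] = score_quintiles(rfm_demo["Monetary"])
rfm_demo["RFM_Score"] = rfm_demo[["R", "F", "M"]].sum(axis=1)
```

With this scheme the most recent, most frequent, and highest-spending customers each receive a 5, so the combined score ranges from 3 to 15.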

3. Customer Segmentation

Applied K-means clustering to group customers into distinct segments based on their RFM scores. The optimal number of clusters was determined using the elbow method and silhouette analysis.

    # K-means clustering
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Standardize the three RFM features so no single metric dominates the distances
    rfm_features = rfm[['Recency', 'Frequency', 'Monetary']]
    scaler = StandardScaler()
    rfm_scaled = scaler.fit_transform(rfm_features)

    # Apply K-means clustering
    kmeans = KMeans(n_clusters=6, random_state=42)
    rfm['Cluster'] = kmeans.fit_predict(rfm_scaled)
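
The elbow method and silhouette analysis mentioned above can be sketched as a loop over candidate cluster counts. This example runs on synthetic data with three deliberately well-separated groups, so it recovers k = 3. The project's real data led to six clusters, and the range of k values and the data here are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an RFM matrix: three well-separated customer groups.
rng = np.random.default_rng(42)
centers = np.array([[10, 40, 5000], [120, 10, 800], [300, 2, 100]], dtype=float)
X = np.vstack([c + rng.normal(scale=[5, 2, 50], size=(100, 3)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = model.inertia_  # within-cluster sum of squares (elbow curve)
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

# Pick the k with the highest mean silhouette; inspect inertias for the elbow.
best_k = max(silhouettes, key=silhouettes.get)
print("best k by silhouette:", best_k)
```

In practice the two criteria do not always agree, so the final choice usually also weighs how interpretable the resulting segments are.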

Key Results

The analysis identified six distinct customer segments with actionable business insights. Key metrics and segment characteristics are summarized below. For interactive exploration of all metrics, trends, and detailed segment performance, use the Tableau dashboard embedded below.

Customer Segments Identified: 6
Revenue from Champions: 6.84%
Customers at Risk: 59.18%
Total Revenue Analyzed: £6.7M
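
Share-style KPIs such as these can be derived from a per-customer table with one grouped aggregation. The sketch below uses a hypothetical six-customer table; the amounts and the "Loyal" label are invented and do not reproduce the figures above.

```python
import pandas as pd

# Hypothetical per-customer table: segment labels and amounts are made up.
customers = pd.DataFrame({
    "Segment": ["Champions", "Champions", "At Risk", "At Risk", "At Risk", "Loyal"],
    "Monetary": [500.0, 300.0, 50.0, 30.0, 20.0, 100.0],
})

summary = customers.groupby("Segment").agg(
    Customers=("Monetary", "size"),  # customers per segment
    Revenue=("Monetary", "sum"),     # revenue per segment
)
summary["CustomerSharePct"] = 100 * summary["Customers"] / summary["Customers"].sum()
summary["RevenueSharePct"] = 100 * summary["Revenue"] / summary["Revenue"].sum()
print(summary.round(2))
```

Reporting the customer share and the revenue share side by side shows whether a segment is large in headcount, in value, or both.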

Customer Segments Identified:

Note: The interactive dashboard below provides dynamic filtering, detailed metrics breakdowns, monthly trend analysis, and drill-down capabilities to explore these segments in depth.

Interactive Tableau Dashboard

Explore the Tableau dashboard below to visualize customer segments and their characteristics. The interactive dashboard presents its insights through:

Key Performance Indicators (KPIs)

Primary Visualizations

Secondary Visualizations

Detailed Analysis

Business Impact & Recommendations

Strategic Recommendations:

Expected Outcomes:

Technical Implementation

Data Pipeline:

  1. Data extraction from Excel files
  2. Data cleaning and preprocessing using Pandas
  3. RFM metric calculation and scoring
  4. K-means clustering implementation
  5. Segment analysis and interpretation
  6. Tableau dashboard creation and deployment
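
Steps 5 and 6 of the pipeline above could look roughly like the following sketch: profile each cluster by its average RFM values, then write the labelled table to CSV as a data source for Tableau. The table `rfm_clustered` is invented for illustration, and CSV export is an assumed hand-off format. An in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical clustered RFM table (values invented for illustration).
rfm_clustered = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6],
    "Recency": [5, 8, 200, 250, 40, 60],
    "Frequency": [30, 25, 2, 1, 10, 8],
    "Monetary": [3000.0, 2500.0, 80.0, 40.0, 600.0, 500.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# Step 5: average RFM values and size per cluster, used to name the segments.
profile = (
    rfm_clustered.groupby("Cluster")[["Recency", "Frequency", "Monetary"]]
    .mean()
    .assign(Customers=rfm_clustered.groupby("Cluster").size())
)
print(profile)

# Step 6: export the labelled customers; replace the buffer with a file path.
buffer = io.StringIO()
rfm_clustered.to_csv(buffer, index=False)
```

In this toy profile, cluster 0 (recent, frequent, high spend) would be a natural candidate for a "Champions" label, while cluster 1 looks like lapsed customers.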

Key Libraries Used:

    # Key imports
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import silhouette_score
    import warnings
    warnings.filterwarnings('ignore')

Lessons Learned