Customer Segmentation Analysis

RFM analysis of UK retail customers using Python and Tableau to identify distinct customer segments and create actionable business strategies


Project Overview

This project involved analyzing customer behavior patterns from a UK-based online retail dataset to identify distinct customer segments using RFM (Recency, Frequency, Monetary) analysis. The goal was to help the business understand their customer base better and develop targeted marketing strategies.

The analysis revealed 6 distinct customer segments, each with unique characteristics and business implications. The findings were visualized using Tableau dashboards to provide actionable insights for business decision-making.

Tools and techniques: Python, Pandas, NumPy, Scikit-learn, Tableau, RFM Analysis, K-means Clustering, Data Visualization

Problem Statement

The UK online retail company needed to understand their customer base better to:

Methodology

1. Data Preprocessing

Cleaned and prepared the dataset by handling missing values, removing duplicates, and standardizing data formats. The dataset contained transaction records with customer IDs, purchase dates, quantities, and monetary values.

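```python
# Data preprocessing example
import pandas as pd
import numpy as np

# Load and clean data
df = pd.read_excel('Online Retail.xlsx')
df = df.dropna(subset=['CustomerID'])     # rows without a customer ID cannot be segmented
df = df.drop_duplicates()                 # remove duplicate transaction lines
df = df[df['Quantity'] > 0]               # drop returns/cancellations (negative quantities)
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
df['TotalAmount'] = df['Quantity'] * df['UnitPrice']
```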
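Because the Excel file is not bundled with this page, the cleaning steps can be checked on a few hand-made rows that use the same column names. The rows below are synthetic, the helper `clean_transactions` is a hypothetical name, and the extra `UnitPrice > 0` filter is an added assumption that is not in the snippet above.

```python
import numpy as np
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the preprocessing steps described above."""
    out = df.dropna(subset=["CustomerID"])                      # anonymous rows can't be segmented
    out = out.drop_duplicates()                                 # exact duplicate line items
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)]   # assumption: also drop zero/negative prices
    out = out.copy()                                            # avoid chained-assignment warnings
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    out["TotalAmount"] = out["Quantity"] * out["UnitPrice"]
    return out


# Synthetic rows shaped like the Online Retail columns (not real data)
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "C536379", "536367"],
    "CustomerID": [17850.0, 17850.0, np.nan, 14527.0, 13047.0],
    "Quantity": [6, 6, 2, -1, 3],
    "UnitPrice": [2.55, 2.55, 1.85, 27.50, 4.25],
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-01 08:26", "2010-12-01 08:28",
                    "2010-12-01 09:41", "2010-12-01 08:34"],
})

clean = clean_transactions(raw)
print(clean[["InvoiceNo", "CustomerID", "TotalAmount"]])
```

Of the five rows, the duplicate, the row with no customer ID, and the negative-quantity cancellation are dropped, leaving two line items.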

2. RFM Analysis

Implemented RFM (Recency, Frequency, Monetary) analysis to evaluate customer value:

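```python
# RFM Analysis
from datetime import timedelta

# Reference date: one day after the last transaction in the dataset
snapshot_date = df['InvoiceDate'].max() + timedelta(days=1)

# Calculate RFM metrics per customer
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,  # Recency: days since last purchase
    'InvoiceNo': 'nunique',   # Frequency: distinct invoices ('count' would count line items)
    'TotalAmount': 'sum',     # Monetary: total spend
}).rename(columns={
    'InvoiceDate': 'Recency',
    'InvoiceNo': 'Frequency',
    'TotalAmount': 'Monetary',
})
```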

3. Customer Segmentation

Applied K-means clustering to group customers into distinct segments based on their standardized Recency, Frequency, and Monetary values. The optimal number of clusters was determined using the elbow method and silhouette analysis.

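```python
# K-means clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Standardize the three RFM features so no single scale dominates the distance metric
features = ['Recency', 'Frequency', 'Monetary']
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm[features])

# Apply K-means clustering (n_init set explicitly: its default differs across scikit-learn versions)
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
rfm['Cluster'] = kmeans.fit_predict(rfm_scaled)
```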
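The page states that the number of clusters was chosen with the elbow method and silhouette analysis but does not show that step. The sketch below is one way to run both checks; it uses synthetic three-feature data from `make_blobs` with four planted groups as a stand-in for the scaled RFM table, so the k it selects says nothing about the six segments found in the retail data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (Recency, Frequency, Monetary) with 4 well-separated groups
X, _ = make_blobs(
    n_samples=600,
    centers=[[0, 0, 0], [8, 0, 0], [0, 8, 0], [0, 0, 8]],
    cluster_std=1.0,
    random_state=42,
)
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    inertias[k] = km.inertia_                             # elbow method: within-cluster sum of squares
    silhouettes[k] = silhouette_score(X_scaled, labels)   # closer to 1 = better-separated clusters

for k in inertias:
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

best_k = max(silhouettes, key=silhouettes.get)
print("k with highest silhouette:", best_k)
```

Inertia always falls as k grows, so the elbow is read off as the point where the drop flattens; the silhouette score gives a single maximum that can confirm or break a tie between candidate k values.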

Key Results

  - Customer segments identified: 6
  - Revenue from Champions: 23%
  - Customers at risk: 18%
  - Total revenue analyzed: £2.1M
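Headline figures like these are typically shares computed from the per-customer table once each cluster has a segment name. The sketch assumes that "Revenue from Champions" is a share of total revenue and "Customers at Risk" a share of customer count; the eight customers and the segment labels other than "Champions" and "At Risk" are hypothetical, and the output does not reproduce the project's numbers.

```python
import pandas as pd

# Hypothetical per-customer table after clustering and naming segments
rfm = pd.DataFrame({
    "Monetary": [500.0, 300.0, 120.0, 80.0, 60.0, 40.0, 25.0, 15.0],
    "Segment": ["Champions", "Champions", "Loyal", "Loyal",
                "At Risk", "At Risk", "Hibernating", "Hibernating"],
})

# Customers and revenue per segment, then each as a share of the total
summary = rfm.groupby("Segment").agg(
    Customers=("Monetary", "size"),
    Revenue=("Monetary", "sum"),
)
summary["CustomerShare"] = summary["Customers"] / summary["Customers"].sum()
summary["RevenueShare"] = summary["Revenue"] / summary["Revenue"].sum()

print(summary.sort_values("RevenueShare", ascending=False).round(3))
```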

Customer Segments Identified:

Data Visualizations

Created comprehensive Tableau dashboards to visualize customer segments and their characteristics:

Revenue by Segment Bar Chart

Interactive Tableau visualization showing revenue distribution across customer segments

Customer Distribution Pie Chart

Visual representation of customer segment proportions

Monthly Revenue Trends

Time series analysis showing revenue patterns by segment over time

RFM Scatter Plot

Three-dimensional visualization of Recency, Frequency, and Monetary values
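The monthly trend view needs revenue broken down by month and segment. One way to prepare it, assumed here rather than taken from the project, is to join each transaction to its customer's segment in pandas and export a long-format CSV (one row per month and segment) that is straightforward to connect to in Tableau. The transactions, segment labels, and the file name `monthly_revenue_by_segment.csv` are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned transactions and cluster-derived segment labels
tx = pd.DataFrame({
    "CustomerID": ["A", "A", "B", "B", "C"],
    "InvoiceDate": pd.to_datetime(["2011-01-05", "2011-02-10", "2011-01-20",
                                   "2011-01-25", "2011-02-01"]),
    "TotalAmount": [100.0, 50.0, 30.0, 20.0, 70.0],
})
segments = pd.DataFrame({
    "CustomerID": ["A", "B", "C"],
    "Segment": ["Champions", "At Risk", "Champions"],
})

# Attach each transaction's segment, then total revenue per calendar month ("MS" = month start)
monthly = (
    tx.merge(segments, on="CustomerID", how="left")
      .groupby([pd.Grouper(key="InvoiceDate", freq="MS"), "Segment"])["TotalAmount"]
      .sum()
      .reset_index()
      .rename(columns={"InvoiceDate": "Month", "TotalAmount": "Revenue"})
)
print(monthly)

# Long format: one row per (Month, Segment)
monthly.to_csv("monthly_revenue_by_segment.csv", index=False)
```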

Business Impact & Recommendations

Strategic Recommendations:

Expected Outcomes:

Technical Implementation

Data Pipeline:

  1. Data extraction from Excel files
  2. Data cleaning and preprocessing using Pandas
  3. RFM metric calculation and scoring
  4. K-means clustering implementation
  5. Segment analysis and interpretation
  6. Tableau dashboard creation and deployment

Key Libraries Used:

```python
# Key imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import warnings
warnings.filterwarnings('ignore')
```

Lessons Learned