Project Overview
This project analyzed customer behavior in a UK-based online retail dataset to identify distinct customer segments using RFM (Recency, Frequency, Monetary) analysis. The goal was to help the business understand its customer base and develop targeted marketing strategies.
The analysis revealed 6 distinct customer segments, each with unique characteristics and business implications. The findings were visualized using Tableau dashboards to provide actionable insights for business decision-making.
Python
Pandas
NumPy
Scikit-learn
Tableau
RFM Analysis
K-means Clustering
Data Visualization
Problem Statement
The UK online retail company needed a clearer picture of its customer base in order to:
- Identify high-value customers for retention strategies
- Develop targeted marketing campaigns for different customer groups
- Optimize resource allocation based on customer profitability
- Predict customer churn and implement preventive measures
- Improve overall customer lifetime value
Methodology
1. Data Preprocessing
Cleaned and prepared the dataset by handling missing values, removing duplicates, and standardizing data formats. The dataset contained transaction records with customer IDs, purchase dates, quantities, and monetary values.
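# Data preprocessing example
import pandas as pd
import numpy as np
# Load the raw transactions
df = pd.read_excel('Online Retail.xlsx')
# Drop rows without a customer ID, then drop exact duplicate rows
df = df.dropna(subset=['CustomerID']).drop_duplicates()
# Standardize formats: integer customer IDs and datetime invoice dates
df['CustomerID'] = df['CustomerID'].astype(int)
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
# Keep only positive quantities (negative quantities represent returns)
df = df[df['Quantity'] > 0].copy()
# Revenue per line item
df['TotalAmount'] = df['Quantity'] * df['UnitPrice']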
2. RFM Analysis
Implemented RFM (Recency, Frequency, Monetary) analysis to evaluate customer value:
- Recency: Days since last purchase
- Frequency: Number of transactions
- Monetary: Total amount spent
# RFM Analysis
from datetime import timedelta
# Reference date: one day after the most recent invoice in the data
snapshot_date = df['InvoiceDate'].max() + timedelta(days=1)
# Calculate RFM metrics per customer
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (snapshot_date - x.max()).days,  # days since last purchase
    'InvoiceNo': 'nunique',  # distinct invoices; 'count' would count line items instead
    'TotalAmount': 'sum'  # total amount spent
}).rename(columns={
    'InvoiceDate': 'Recency',
    'InvoiceNo': 'Frequency',
    'TotalAmount': 'Monetary'
})
3. Customer Segmentation
Applied K-means clustering to group customers into distinct segments based on their RFM scores. The optimal number of clusters was determined using the elbow method and silhouette analysis.
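# K-means clustering
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Standardize only the three RFM features so the code stays correct if rerun
# after the 'Cluster' column has been added
rfm_features = rfm[['Recency', 'Frequency', 'Monetary']]
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm_features)
# Apply K-means clustering (n_init set explicitly for consistent behavior across versions)
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
rfm['Cluster'] = kmeans.fit_predict(rfm_scaled)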
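The elbow method and silhouette analysis mentioned above are not shown in the snippet, so here is a minimal sketch of how such a comparison could look. The helper name `evaluate_cluster_counts` and the synthetic `demo_rfm` array are assumptions for illustration, not the project's actual code or data; on the real table, `rfm[['Recency', 'Frequency', 'Monetary']]` would take the place of `demo_rfm`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_cluster_counts(features, k_values, random_state=42):
    """Return {k: (inertia, silhouette)} for each candidate cluster count."""
    scaled = StandardScaler().fit_transform(features)
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(scaled)
        # Inertia feeds the elbow plot; silhouette measures cluster separation
        results[k] = (model.inertia_, silhouette_score(scaled, labels))
    return results


# Synthetic stand-in for an RFM table: three well-separated groups (assumption)
rng = np.random.default_rng(0)
centers = np.array([[10, 20, 2000], [200, 2, 100], [60, 8, 600]])
demo_rfm = np.vstack(
    [c + rng.normal(scale=[3, 1, 30], size=(50, 3)) for c in centers]
)

scores = evaluate_cluster_counts(demo_rfm, range(2, 8))
for k, (inertia, sil) in scores.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")

# Candidate with the highest silhouette score
best_k = max(scores, key=lambda k: scores[k][1])
```

In practice the inertia values are plotted against k to look for the "elbow", and the silhouette score is used as a second opinion. The two do not always agree, so the final choice usually also depends on whether the resulting segments make business sense.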
Key Results
- Customer segments identified: 6
- Revenue from Champions: 23%
- Total revenue analyzed: £2.1M
Customer Segments Identified:
- Champions: High-value, frequent customers who recently made purchases
- Loyal Customers: Regular customers with good recency and frequency
- At Risk: Previously valuable customers showing declining engagement
- Can't Lose: High-value customers who haven't purchased recently
- About to Sleep: Customers whose purchases are becoming less frequent and less recent
- Lost: Customers who have not purchased in a long time and have low frequency and monetary value
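K-means returns arbitrary integer labels, so names like the ones above have to be assigned by inspecting each cluster's average RFM values. The sketch below shows one way to build such a profile table. The function name `profile_clusters`, the toy data, and the `segment_names` mapping are all hypothetical and only illustrate the step.

```python
import pandas as pd


def profile_clusters(rfm):
    """Mean Recency, Frequency, Monetary and customer count per cluster."""
    profile = rfm.groupby("Cluster").agg(
        Recency=("Recency", "mean"),
        Frequency=("Frequency", "mean"),
        Monetary=("Monetary", "mean"),
        Customers=("Recency", "size"),
    )
    # Highest-spending clusters first, to make the table easier to read
    return profile.sort_values("Monetary", ascending=False)


# Toy RFM table with cluster labels already assigned (illustrative values)
demo = pd.DataFrame({
    "Recency": [5, 8, 300, 280, 40, 45],
    "Frequency": [30, 25, 1, 2, 10, 12],
    "Monetary": [5000.0, 4200.0, 50.0, 80.0, 900.0, 1100.0],
    "Cluster": [2, 2, 0, 0, 1, 1],
})
profile = profile_clusters(demo)
print(profile)

# Hypothetical mapping, chosen by reading the profile table above
segment_names = {2: "Champions", 1: "Loyal Customers", 0: "Lost"}
demo["Segment"] = demo["Cluster"].map(segment_names)
```

Because the mapping is written by hand, it needs to be revisited whenever the clustering is refit, since the integer labels can change between runs.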
Data Visualizations
Created comprehensive Tableau dashboards to visualize customer segments and their characteristics:
- Revenue by Segment Bar Chart: Interactive Tableau visualization showing revenue distribution across customer segments
- Customer Distribution Pie Chart: Visual representation of customer segment proportions
- Monthly Revenue Trends: Time series analysis showing revenue patterns by segment over time
- RFM Scatter Plot: Three-dimensional visualization of Recency, Frequency, and Monetary values
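Dashboards like these are typically fed from small aggregated tables. The following is a rough sketch of how revenue by segment and monthly revenue by segment could be prepared in Pandas before loading into Tableau. The function names, the toy data, and the CSV file name in the comment are assumptions; the column names follow the snippets earlier on this page.

```python
import pandas as pd


def revenue_by_segment(transactions, segments):
    """Total revenue and revenue share (%) per segment."""
    merged = transactions.merge(segments, on="CustomerID", how="inner")
    out = merged.groupby("Segment")["TotalAmount"].sum().to_frame("Revenue")
    out["RevenueSharePct"] = 100 * out["Revenue"] / out["Revenue"].sum()
    return out.sort_values("Revenue", ascending=False)


def monthly_revenue_by_segment(transactions, segments):
    """Revenue per calendar month (rows) and segment (columns)."""
    merged = transactions.merge(segments, on="CustomerID", how="inner")
    merged["Month"] = merged["InvoiceDate"].dt.to_period("M").astype(str)
    return merged.pivot_table(
        index="Month", columns="Segment", values="TotalAmount",
        aggfunc="sum", fill_value=0,
    )


# Toy transactions and segment assignments (illustrative values only)
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-02-10", "2011-01-20", "2011-02-15"]
    ),
    "TotalAmount": [60.0, 60.0, 50.0, 30.0],
})
seg = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Segment": ["Champions", "At Risk", "At Risk"],
})

share = revenue_by_segment(tx, seg)
monthly = monthly_revenue_by_segment(tx, seg)
# share.to_csv("revenue_by_segment.csv")  # hypothetical export for Tableau
```

Keeping the aggregation in code rather than in the dashboard makes figures such as a segment's revenue share reproducible from the same cleaned data used for clustering.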
Business Impact & Recommendations
Strategic Recommendations:
- Champions: Implement VIP programs and exclusive offers to maintain loyalty
- Loyal Customers: Upsell and cross-sell opportunities to increase value
- At Risk: Immediate re-engagement campaigns with personalized offers
- Can't Lose: Win-back campaigns with significant incentives
- About to Sleep: Reactivation campaigns with product recommendations
- Lost: Low-cost win-back efforts only, or treat them as re-acquisition targets
Expected Outcomes:
- 15-20% increase in customer retention rates
- 25% improvement in marketing campaign effectiveness
- 10-15% increase in average customer lifetime value
- Better resource allocation and ROI on marketing spend
Technical Implementation
Data Pipeline:
- Data extraction from Excel files
- Data cleaning and preprocessing using Pandas
- RFM metric calculation and scoring
- K-means clustering implementation
- Segment analysis and interpretation
- Tableau dashboard creation and deployment
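The "RFM metric calculation and scoring" step is not shown elsewhere on this page. One common approach, sketched here under the assumption of quintile scoring, assigns each customer a 1 to 5 score per metric with `pd.qcut`. The function name `score_rfm`, the bin count, and the toy data are illustrative and may differ from what the project actually used.

```python
import pandas as pd


def score_rfm(rfm, bins=5):
    """Add quantile scores R, F, M (1..bins, higher is better) and their sum."""
    scored = rfm.copy()
    labels = list(range(1, bins + 1))
    # Rank first so tied values cannot produce duplicate bin edges in qcut
    r_rank = scored["Recency"].rank(method="first")
    f_rank = scored["Frequency"].rank(method="first")
    m_rank = scored["Monetary"].rank(method="first")
    # Recency is inverted: fewer days since the last purchase scores higher
    scored["R"] = pd.qcut(r_rank, bins, labels=labels[::-1]).astype(int)
    scored["F"] = pd.qcut(f_rank, bins, labels=labels).astype(int)
    scored["M"] = pd.qcut(m_rank, bins, labels=labels).astype(int)
    scored["RFM_Score"] = scored[["R", "F", "M"]].sum(axis=1)
    return scored


# Toy RFM table for ten customers (illustrative values only)
demo_rfm = pd.DataFrame(
    {
        "Recency": [3, 10, 25, 40, 60, 90, 120, 200, 300, 360],
        "Frequency": [20, 18, 15, 12, 10, 8, 6, 4, 2, 1],
        "Monetary": [5000, 4000, 3000, 2500, 2000, 1500, 1000, 500, 200, 50],
    },
    index=pd.Index(range(1, 11), name="CustomerID"),
)
scored = score_rfm(demo_rfm)
print(scored[["R", "F", "M", "RFM_Score"]])
```

Scores like these can be used on their own for rule-based segments, or alongside the clustering output as a sanity check on which clusters contain the strongest customers.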
Key Libraries Used:
# Key imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import warnings
warnings.filterwarnings('ignore')
Lessons Learned
- Data quality is crucial for meaningful segmentation results
- Business context is essential for interpreting clustering results
- Visualization plays a key role in communicating insights to stakeholders
- Regular model updates are necessary as customer behavior evolves
- Combining statistical analysis with business knowledge yields better results