Manipulate data in Excel
Data cleaning, formatting, Excel formulas, telecom churn
This project simulates a real-world task for a data analyst at a UK broadband company; turning messy customer data into something clean, structured, and ready for ad-hoc analysis. Using Excel, I applied a range of practical tools to get the dataset into shape for deeper insight in Power BI or SQL.
About dataset:
The dataset used for this project is a modified version of a public Kaggle telecom dataset, adapted by me to reflect more realistic UK broadband customer scenarios. It includes over 700 records covering customer churn, billing history, support tickets, and demographics. The data is split across 4 interlinked tables:
- Customer data: Demographics, plans, churn
- Contact records: Names, phone numbers, and cities
- Billing history: Invoices, payments, overdue
- Support tickets: Issues raised and resolution times
Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.
Project & Learning Objectives:
This Excel-based project focused on replicating practical workflows a business data analyst might follow. Along the way, I aimed to:
- Clean and prepare data in Excel
Spot churn trends by contract or region
Build workflows using tables and ranges
Get data ready for Power BI or SQL
In this section, I take on typical tasks a manager might throw my way; cleaning up raw CRM data, digging into insights, and shaping it all in Excel.
Each task begins with a short scenario, followed by the formulas and functions used. To keep things focused, snippets show 15–20 rows, display only relevant columns, and highlight any cleaned or updated fields.

Communicate Insights
Prepare summaries, notes or key actions for your team or manager.
Cleaning data:
Q (Manager): Got this from CRM, formatting’s broken. Sort it out quick!
A: All sorted, gave it a full clean-up.
I prepped the dataset for analysis by fixing casing, trimming spaces, filling blanks, and removing duplicates — all common issues in CRM exports. Here’s how I tackled it in Excel:
Pre-cleaning scan: No-formula tools for quick fixes
Alt: Pre-cleaning scan (no-formula tools)
Before applying any formulas, I ran a visual scan and used Excel’s built-in tools to identify quick wins and obvious data issues. These steps helped spot duplicates, missing values, and common typos before deeper cleaning.
Remove Duplicates: Quickly deleted exact email repeats using Data → Remove Duplicates.
Go To Special → Blanks: Highlighted empty cells to find partially entered records or missing fields.
Sort & Filter: Used filter dropdowns and A–Z sorting to group similar issues and visually detect problems like inconsistent casing or strange symbols.
Find & Replace: Replaced obvious typos like ” at “ with @, or “.con” with “.com”.
Text to Columns: Where needed, split full names into first and last name using space as a delimiter via Data → Text to Columns.
All steps were applied directly to the working sheet without helper columns.
=TEXTJOIN() — Merge multiple values into a single field
I merged first_name and last_name into a new full_name column using =TEXTJOIN() to simplify filtering, reporting, and name referencing across the dataset.
While tools like =CONCAT() or Power Query can do this too, I chose =TEXTJOIN() for its flexibility and simplicity in quick Excel workflows.
Formula applied in D2 and copied down: =TEXTJOIN(” “, TRUE, B2, C2)
See formulas in Google Sheets here.
=LOWER() + =TRIM() — Standardise casing and remove excess spaces
I noticed formatting issues in email entries, so I created a helper column to clean them using simple functions. I started by combining =TRIM() to remove leading/trailing spaces with =LOWER() to standardise casing.
Formula applied in E2 and copied down: =LOWER(TRIM(D2))
See formulas in Google Sheets here.
=FIND() + IF() — Flag and fix emails missing the “@” symbol
Some email entries were missing the “@” symbol or had incorrect characters in its place. I added a helper column to flag these rows using =FIND() and then applied =IF() to either keep the cleaned value or fix the error with simple substitutions.
Formula applied in F2: =IF(ISNUMBER(FIND(“@”, E2)), E2, SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(E2, “”””, “@”), “,”, “@”), ” at “, “@”))
See formulas in Google Sheets here.
=CLEAN() — Remove non-printable characters
To prevent issues caused by invisible line breaks or tab spaces (often copied in from CRMs or web forms), I used =CLEAN() to remove non-printable characters. This can also be followed by =PROPER() if consistent casing is needed (eg. for names).
Formula applied in G2 and copied down: =CLEAN(A2)
=SUBSTITUTE() + =TRIM() — Fix unwanted characters and tidy spacing
Some email entries had repeated dots or stray characters like commas or extra spaces. I used =SUBSTITUTE() to replace these and paired it with =TRIM() to clean up the spacing.
Formula applied in H2 and copied down: =TRIM(SUBSTITUTE(G2, “..”, “.”))
• Removing and replacing text characters: Using =SUBSTITUTE(), [also =UNICODE() and =UNICHAR()]…
• Concatenating text data: [Find another example than Date]. This can be done by using the CONCATENATE() or CONCAT() functions for adjacent cells, or the more versatile TEXTJOIN() using “??” as delimiters [between text values], as follows:
• Splitting data: Besides the =LEFT(), =RIGHT() and =MID() functions used to split text data by extracting a number of characters as required, I used in this example…
• Combining text functions: Using =FIND(), and =LEN() functions…
• Converting data with VALUE and TEXT: in the ‘bookings’ sheet, I noticed that the tour prices (base-rate-£ and total-paid-£) are stored as Text values (aligned to Left). Focusing only on the ‘total-paid-£’ column, I created the new column (?) ‘total-paid’ and converted its numeric values into Number/Currency using the =VALUE() function, as follows:
For further exploration, I extracted the weekday and month (columns ?) [on which bookings were made] from the ‘booking-date’ column using the opposite function =TEXT() that converts numeric values (Numbers and Dates) into Text, as follows:
• Automating Data Validation with Named Ranges: …
Billing Data: Late Payments and Underpayments [Check!]
As part of simulating real-world business scenarios, I extended the dataset to include monthly billing and payment records. This allowed me to demonstrate Excel skills commonly used in financial operations, customer billing, and revenue assurance roles.
Key tasks included:
-
Using
=IF()
formulas to flag late payments based on due and payment dates -
Highlighting underpaid invoices with conditional formatting
-
Creating calculated fields to measure discrepancies between billed and paid amounts
-
Summarising total revenue vs. outstanding balances using pivot tables
-
Automating field validation with data rules and drop-down lists (eg, payment status)
A-
A-
Working with numbers & dates
• Generating valid Dates: in the ‘bookings’ worksheet, the booking dates come in 3 separate columns specifying day, month and year, and I’m required to create a full date in the ‘dd/mm/yyyy’ format using the =DATE() function, as follows:
• Calculations with Dates: Now for argument’s sake, let’s say I want to calculate every customer’s age at the time of booking (in whole years) since I have their date of birth. I could either subtract the two dates (booking_date minus birth_date) and divide the age in days by 365.25:
=DAYS([@[booking_date]],[@[birth_date]])/365.25
Other useful functions for working with Dates include:
=TODAY() and =NOW(), “volatile” functions will return the current date [and time] to use as a reference point for instance.
=WORKDAY() will return the serial number of the date before or after a specified number of workdays. Similarly, the =WORKDAY.INTL() will help specify the non-working day(s) by selecting from a fixed set of weekend arguments.
To deal with and separate out holidays, such as Easter Monday, the best way is to have them listed somewhere else in the workbook and then just select them using this syntax: =WORKDAY(start_date, days, [holidays-range]).
=YEARFRAC() will do the same job for years rather than days. [x]
Date calculations with EOMONTH and EDATE:
…
• Rounding numbers: Unlike modifying cell formatting to just change how numbers are displayed, the ROUND, ROUNDDOWN, and ROUNDUPfunctions will alter the underlying number to a certain precision (number of digits after the decimal point).
Examples:
=ROUND(14.5,0) is 15, and =ROUND(14.4,0) is 14
=ROUNDDOWN(14.9,0) returns 14
=ROUNDUP(14.1,0) returns 15
Defined Names
• Cell Referencing and Naming: Rr: What we need to do is just tell Excel that that needs to stay fixed, or what Excel calls “absolute”. And we do this by putting a dollar sign in front of the column and a dollar sign in front of the row for the cell that we want to stay locked. […] So four options, relative (=A1:A3*B1), absolute (=A1:A3*$B$1), and two mixed cell references (=A1:A3*$B1) and (=A1:A3*B$1).
• Named Range: another elegant way to call a single cell (or a range of cells) something sensible rather than typing the $-sign by renaming it from the “name box” (subject to some naming rules). Now the cell will be referred to by the new name in all my calculations, and I would automatically get an absolute cell reference.
Also, other tool options for naming ranges include Define Name with an additional “scope” specificity, and Create from Selection to create from multiple Named Ranges, both accessed from the Formulas > Defined Names group.
File Reference
View this cleaned version in my Excel snippet folder (OneDrive link)
Full version available in the shared Google Drive project folder
Final ‘bookings’ sheet (cleaned):
No Url Found
https://alidataportfolio.co.uk/wp-content/uploads/2023/01/bookings-cleaned.xlsx
Can you fix this? There’s messy data all over!
Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.
Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.
Note: I used `.xlsx` to retain colour-coding and formatting; final cleaned datasets are also available in `.csv`
Note: I came across an error about “Table name already exists”, and I managed to resolve it by adjusting/renaming the tables’ names from the Formulas > Name Manager
• Applied a Table format Cmd+T with Banded Rows/Columns and gave the table the name ‘bookings’.
Q: Which region has the highest churn rate?
Q: Compare monthly churn across 2023 — are there spikes we should worry about?
Q: Who are our most loyal customers (tenure + on-time payments)?
Q: Show me average resolution time by support issue type.
Q: Any link between late payments and churn?
Q: Which broadband bundles are most popular among customers over 65?
Field glossaries (Metadata):
Customer churn metadata:
- ― tv_streaming: Indicates whether customer has streaming TV service
- ― movie_streaming: Indicates whether customer has streaming movies service
- ― contract_length: Contract length (e.g. month-to-month, 1 year)
- ― contract_type: Type of contract chosen (e.g. prepaid, rolling)
- ― payment_method: Customer’s payment method
- ― is_paperless: Indicates if billing is paperless
- ― device_protection: Indicates if device protection is active
- ― tech_support: Indicates if customer has access to technical support
- ― monthly_charges: Current monthly subscription fee
- ― total_charges: Total charges billed to the customer to date
- ― contract_start_date: Start date of the customer’s current contract
- ― last_login: Most recent login date for customer account
- ― monthly_charges_formatted: Formatted version of monthly charges (as text)
- ― paperless_billing_formatted: Formatted version of paperless billing field
- ― contract_type_formatted: Formatted version of contract type field
- ― total_charges_formatted: Formatted version of total charges field
- ― tv_streaming: Indicates whether customer has streaming TV service
- ― movie_streaming: Indicates whether customer has streaming movies service
- ― contract_length: Contract length (e.g. month-to-month, 1 year)
- ― contract_type: Type of contract chosen (e.g. prepaid, rolling)
- ― payment_method: Customer’s payment method
- ― is_paperless: Indicates if billing is paperless
- ― device_protection: Indicates if device protection is active
- ― tech_support: Indicates if customer has access to technical support
- ― monthly_charges: Current monthly subscription fee
- ― total_charges: Total charges billed to the customer to date
- ― contract_start_date: Start date of the customer’s current contract
- ― last_login: Most recent login date for customer account
- ― monthly_charges_formatted: Formatted version of monthly charges (as text)
- ― paperless_billing_formatted: Formatted version of paperless billing field
- ― contract_type_formatted: Formatted version of contract type field
- ― total_charges_formatted: Formatted version of total charges field
Contacts metadata:
- ― customer_id: Unique customer identifier
- ― first_name: Customer’s first name (may contain inconsistent casing or spacing)
- ― last_name: Customer’s last name (may contain inconsistent casing or spacing)
- ― email: Email address (may contain spacing or formatting issues)
- ― phone: Mobile number with UK prefix (may contain spacing or format issues)
- ― city: Customer’s city of residence, matched to region
Billing and tickets metadata:
- ― customer_id: Unique customer identifier
- ― invoice_date: Date when the invoice was issued (varied formats)
- ― amount_due: Monthly charge amount due (may include symbols or spacing)
- ― amount_paid: Amount actually paid by customer (may vary from amount_due)
- ― payment_status: Payment state (Paid, Unpaid, Partial; inconsistent casing)
- ― payment_method: Method of payment (e.g., credit card, direct debit; may have spacing issues)
- ― ticket_id: Unique support ticket ID
- ― opened_date: Date when support request was opened (mixed formats)
- ― resolved_date: Date when ticket was closed (may be blank for open cases)
- ― issue_type: Type of issue (technical, billing, complaint, etc.)
- ― resolution_status: Ticket status (Resolved/Open)
- ― resolution_time_days: Number of days taken to resolve issue