Manipulate data in Excel
Data cleaning, formatting, Excel formulas, telecom churn
This section focuses on preparing telecom customer data for analysis using Microsoft Excel. The aim is to clean, format, and transform multiple raw datasets so they can be used for further analysis in tools like Power BI or SQL. I used Excel 2019 for Mac to carry out this work, applying a wide range of formulas and logic to simulate typical data analyst tasks in the telecom industry.
The project is based on a simulated dataset representing a UK broadband service provider. The original data was sourced from a public Kaggle dataset and then modified by me to include more realistic formatting issues, calculated fields, and structure suited for demonstrating key Excel formulas and functions. It includes customer records, payment history, contact information, and support interactions — all areas commonly reviewed to understand customer churn and improve service performance.
Datasets used:
The dataset includes several interlinked tables:
• Customer data: Demographics, service subscriptions, churn status
• Contact records: Names, phone numbers, and cities
• Billing history: Monthly invoices, payments, and overdue status
• Support tickets: Issues raised and resolution times
Each dataset includes a short glossary of field names and definitions to support clarity and consistency. A summary is available below (at the bottom of this page), and a full metadata tab is included in each Excel file.
You can also access the full Excel files here, including the original datasets, cleaned versions, and metadata tabs for reference or review.
Learning objectives:
• Apply Excel tools, formulas, and functions to clean and prepare real-world customer data
• Use Named Ranges and structured Tables to automate and simplify analysis
• Transform and categorise fields using logical, text, and lookup functions
Project objectives:
• Identify trends in customer churn by contract type, region, and payment behaviour
• Highlight common characteristics of high-risk customers
• Prepare structured outputs for use in Power BI or SQL-based querying
• Build a repeatable data cleaning workflow using Excel
Cleaning & manipulating text
Here, I attempted to identify any messy data that needs to be cleaned up before I can work with it using some useful formulas and functions Excel has to offer, as follows:
• Applied a Table format Cmd+T with Banded Rows/Columns and gave the table the name ‘bookings’.
• Checked for and replaced Blanks Home > Find & Select > Go To Special > Blanks with ‘Unknown’, and Duplicates Data or Table > Remove Duplicates.
👉 Click here to view dataset explanation
• Used some of Excel’s cleaning functions such as =CLEAN() to remove any ASCII/non-printing characters, and =TRIM() to get rid of the extra spaces I spotted in the ‘customers’ worksheet. Other useful functions include =UPPER(), =LOWER() and =PROPER() to change the case of the text.
In this example, I combined the =TRIM(), the =PROPER(), and the =TEXTJOIN() functions to remove any leading or trailing spaces, apply the correct capitalisation, and convert the First-name and Last-name values into one Full-name (column D).
👉 Click here to view dataset explanation

• Removing and replacing text characters: Using =SUBSTITUTE(), [also =UNICODE() and =UNICHAR()]…
• Concatenating text data: [Find another example than Date]. This can be done by using the CONCATENATE() or CONCAT() functions for adjacent cells, or the more versatile TEXTJOIN() using “??” as delimiters [between text values], as follows:
• Splitting data: Besides the =LEFT(), =RIGHT() and =MID() functions used to split text data by extracting a number of characters as required, I used in this example…
• Combining text functions: Using =FIND(), and =LEN() functions…
• Converting data with VALUE and TEXT: in the ‘bookings’ sheet, I noticed that the tour prices (base-rate-£ and total-paid-£) are stored as Text values (aligned to Left). Focusing only on the ‘total-paid-£’ column, I created the new column (?) ‘total-paid’ and converted its numeric values into Number/Currency using the =VALUE() function, as follows:
For further exploration, I extracted the weekday and month (columns ?) [on which bookings were made] from the ‘booking-date’ column using the opposite function =TEXT() that converts numeric values (Numbers and Dates) into Text, as follows:
• Automating Data Validation with Named Ranges: …
Billing Data: Late Payments and Underpayments [Check!]
As part of simulating real-world business scenarios, I extended the dataset to include monthly billing and payment records. This allowed me to demonstrate Excel skills commonly used in financial operations, customer billing, and revenue assurance roles.
Key tasks included:
-
Using
=IF()
formulas to flag late payments based on due and payment dates -
Highlighting underpaid invoices with conditional formatting
-
Creating calculated fields to measure discrepancies between billed and paid amounts
-
Summarising total revenue vs. outstanding balances using pivot tables
-
Automating field validation with data rules and drop-down lists (eg, payment status)
A-
A-
A-
Working with numbers & dates
• Generating valid Dates: in the ‘bookings’ worksheet, the booking dates come in 3 separate columns specifying day, month and year, and I’m required to create a full date in the ‘dd/mm/yyyy’ format using the =DATE() function, as follows:
• Calculations with Dates: Now for argument’s sake, let’s say I want to calculate every customer’s age at the time of booking (in whole years) since I have their date of birth. I could either subtract the two dates (booking_date minus birth_date) and divide the age in days by 365.25:
=DAYS([@[booking_date]],[@[birth_date]])/365.25
Other useful functions for working with Dates include:
=TODAY() and =NOW(), “volatile” functions will return the current date [and time] to use as a reference point for instance.
=WORKDAY() will return the serial number of the date before or after a specified number of workdays. Similarly, the =WORKDAY.INTL() will help specify the non-working day(s) by selecting from a fixed set of weekend arguments.
To deal with and separate out holidays, such as Easter Monday, the best way is to have them listed somewhere else in the workbook and then just select them using this syntax: =WORKDAY(start_date, days, [holidays-range]).
=YEARFRAC() will do the same job for years rather than days. [x]
Date calculations with EOMONTH and EDATE:
…
• Rounding numbers: Unlike modifying cell formatting to just change how numbers are displayed, the ROUND, ROUNDDOWN, and ROUNDUPfunctions will alter the underlying number to a certain precision (number of digits after the decimal point).
Examples:
=ROUND(14.5,0) is 15, and =ROUND(14.4,0) is 14
=ROUNDDOWN(14.9,0) returns 14
=ROUNDUP(14.1,0) returns 15
Defined Names
• Cell Referencing and Naming: Rr: What we need to do is just tell Excel that that needs to stay fixed, or what Excel calls “absolute”. And we do this by putting a dollar sign in front of the column and a dollar sign in front of the row for the cell that we want to stay locked. […] So four options, relative (=A1:A3*B1), absolute (=A1:A3*$B$1), and two mixed cell references (=A1:A3*$B1) and (=A1:A3*B$1).
• Named Range: another elegant way to call a single cell (or a range of cells) something sensible rather than typing the $-sign by renaming it from the “name box” (subject to some naming rules). Now the cell will be referred to by the new name in all my calculations, and I would automatically get an absolute cell reference.
Also, other tool options for naming ranges include Define Name with an additional “scope” specificity, and Create from Selection to create from multiple Named Ranges, both accessed from the Formulas > Defined Names group.
• Calculations with Named Ranges using =COUNTIFS() and =SUMIFS().
Similarly, =AVERAGEIFS(), =MAXIFS() and =MINIFS() will return the average (arithmetic mean), maximum value and minimum value, respectively, of all cells that meet multiple criteria.
Final ‘bookings’ sheet (cleaned):
No Url Found
https://alidataportfolio.co.uk/wp-content/uploads/2023/01/bookings-cleaned.xlsx
Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.
Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.
Note: I came across an error about “Table name already exists”, and I managed to resolve it by adjusting/renaming the tables’ names from the Formulas > Name Manager
Field glossaries (Metadata):
Customer churn metadata:
- ― tv_streaming: Indicates whether customer has streaming TV service
- ― movie_streaming: Indicates whether customer has streaming movies service
- ― contract_length: Contract length (e.g. month-to-month, 1 year)
- ― contract_type: Type of contract chosen (e.g. prepaid, rolling)
- ― payment_method: Customer’s payment method
- ― is_paperless: Indicates if billing is paperless
- ― device_protection: Indicates if device protection is active
- ― tech_support: Indicates if customer has access to technical support
- ― monthly_charges: Current monthly subscription fee
- ― total_charges: Total charges billed to the customer to date
- ― contract_start_date: Start date of the customer’s current contract
- ― last_login: Most recent login date for customer account
- ― monthly_charges_formatted: Formatted version of monthly charges (as text)
- ― paperless_billing_formatted: Formatted version of paperless billing field
- ― contract_type_formatted: Formatted version of contract type field
- ― total_charges_formatted: Formatted version of total charges field
- ― tv_streaming: Indicates whether customer has streaming TV service
- ― movie_streaming: Indicates whether customer has streaming movies service
- ― contract_length: Contract length (e.g. month-to-month, 1 year)
- ― contract_type: Type of contract chosen (e.g. prepaid, rolling)
- ― payment_method: Customer’s payment method
- ― is_paperless: Indicates if billing is paperless
- ― device_protection: Indicates if device protection is active
- ― tech_support: Indicates if customer has access to technical support
- ― monthly_charges: Current monthly subscription fee
- ― total_charges: Total charges billed to the customer to date
- ― contract_start_date: Start date of the customer’s current contract
- ― last_login: Most recent login date for customer account
- ― monthly_charges_formatted: Formatted version of monthly charges (as text)
- ― paperless_billing_formatted: Formatted version of paperless billing field
- ― contract_type_formatted: Formatted version of contract type field
- ― total_charges_formatted: Formatted version of total charges field
Contacts metadata:
- ― customer_id: Unique customer identifier
- ― first_name: Customer’s first name (may contain inconsistent casing or spacing)
- ― last_name: Customer’s last name (may contain inconsistent casing or spacing)
- ― email: Email address (may contain spacing or formatting issues)
- ― phone: Mobile number with UK prefix (may contain spacing or format issues)
- ― city: Customer’s city of residence, matched to region
Billing and tickets metadata:
- ― customer_id: Unique customer identifier
- ― invoice_date: Date when the invoice was issued (varied formats)
- ― amount_due: Monthly charge amount due (may include symbols or spacing)
- ― amount_paid: Amount actually paid by customer (may vary from amount_due)
- ― payment_status: Payment state (Paid, Unpaid, Partial; inconsistent casing)
- ― payment_method: Method of payment (e.g., credit card, direct debit; may have spacing issues)
- ― ticket_id: Unique support ticket ID
- ― opened_date: Date when support request was opened (mixed formats)
- ― resolved_date: Date when ticket was closed (may be blank for open cases)
- ― issue_type: Type of issue (technical, billing, complaint, etc.)
- ― resolution_status: Ticket status (Resolved/Open)
- ― resolution_time_days: Number of days taken to resolve issue
Customer data prep with Excel formulas and logic