g

ADP

New portfolio in progress

Back to top

Excel Data Preparation for Broadband Churn Analytics

CLEAN, SORT & FILTER THE DATA

With the ultimate goal of analysis in mind, which is to identify trends within the data to answer business questions and make decisions as a result, I started off by preparing the data in spreadsheets for further exploration. First up, I have imported the main .csv files (bookings, customers, prices, discounts, found here) into .xlsx file format, and consolidated ‘prices’ and ‘discounts’ into one ‘prices’ worksheet.

Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.

Learning objectives:

• Use Excel tools, formulas, and functions to clean and prepare data for analysis.
• Use Named Ranges and Tables to automate my analysis.
• Use logical and lookup functions to transform and categorise data.

Cleaning & manipulating text

Here, I attempted to identify any messy data that needs to be cleaned up before I can work with it using some useful formulas and functions Excel has to offer, as follows:

• Applied a Table format Cmd+T with Banded Rows/Columns and gave the table the name ‘bookings’.

• Checked for and replaced Blanks Home > Find & Select > Go To Special > Blanks with ‘Unknown’, and Duplicates Data or Table > Remove Duplicates.

• Used some of Excel’s cleaning functions such as =CLEAN() to remove any ASCII/non-printing characters, and =TRIM() to get rid of the extra spaces I spotted in the ‘customers’ worksheet. Other useful functions include =UPPER(), =LOWER() and =PROPER() to change the case of the text.

In this example, I combined the =TRIM(), the =PROPER(), and the =TEXTJOIN() functions to remove any leading or trailing spaces, apply the correct capitalisation, and convert the First-name and Last-name values into one Full-name (column D), respectively as follows:

Removing and replacing text characters: Using =SUBSTITUTE(), [also =UNICODE() and =UNICHAR()]…

Concatenating text data: [Find another example than Date]. This can be done by using the CONCATENATE() or CONCAT() functions for adjacent cells, or the more versatile TEXTJOIN() using “??” as delimiters [between text values], as follows:

Splitting data: Besides the =LEFT(), =RIGHT() and =MID() functions used to split text data by extracting a number of characters as required, I used in this example…

Combining text functions: Using =FIND(), and =LEN() functions…

Converting data with VALUE and TEXT: in the ‘bookings’ sheet, I noticed that the tour prices (base-rate-£ and total-paid-£) are stored as Text values (aligned to Left). Focusing only on the ‘total-paid-£’ column, I created the new column (?) ‘total-paid’ and converted its numeric values into Number/Currency using the =VALUE() function, as follows:

For further exploration, I extracted the weekday and month (columns ?) [on which bookings were made] from the ‘booking-date’ column using the opposite function =TEXT() that converts numeric values (Numbers and Dates) into Text, as follows:

Automating Data Validation with Named Ranges: …

Working with numbers & dates

Generating valid Dates: in the ‘bookings’ worksheet, the booking dates come in 3 separate columns specifying day, month and year, and I’m required to create a full date in the ‘dd/mm/yyyy’ format using the =DATE() function, as follows:

Calculations with Dates: Now for argument’s sake, let’s say I want to calculate every customer’s age at the time of booking (in whole years) since I have their date of birth. I could either subtract the two dates (booking_date minus birth_date) and divide the age in days by 365.25:
=DAYS([@[booking_date]],[@[birth_date]])/365.25

Other useful functions for working with Dates include:

=TODAY() and =NOW(), “volatile” functions will return the current date [and time] to use as a reference point for instance.

=WORKDAY() will return the serial number of the date before or after a specified number of workdays. Similarly, the =WORKDAY.INTL() will help specify the non-working day(s) by selecting from a fixed set of weekend arguments.

To deal with and separate out holidays, such as Easter Monday, the best way is to have them listed somewhere else in the workbook and then just select them using this syntax: =WORKDAY(start_date, days, [holidays-range]).

=YEARFRAC() will do the same job for years rather than days. [x]

Date calculations with EOMONTH and EDATE:

Rounding numbers: Unlike modifying cell formatting to just change how numbers are displayed, the ROUND, ROUNDDOWN, and ROUNDUPfunctions will alter the underlying number to a certain precision (number of digits after the decimal point).

Examples:
=ROUND(14.5,0) is 15, and =ROUND(14.4,0) is 14
=ROUNDDOWN(14.9,0) returns 14
=ROUNDUP(14.1,0) returns 15

Defined Names

Cell Referencing and Naming: Rr: What we need to do is just tell Excel that that needs to stay fixed, or what Excel calls “absolute”. And we do this by putting a dollar sign in front of the column and a dollar sign in front of the row for the cell that we want to stay locked. […] So four options, relative (=A1:A3*B1), absolute (=A1:A3*$B$1), and two mixed cell references (=A1:A3*$B1) and (=A1:A3*B$1).

Named Range: another elegant way to call a single cell (or a range of cells) something sensible rather than typing the $-sign by renaming it from the “name box” (subject to some naming rules). Now the cell will be referred to by the new name in all my calculations, and I would automatically get an absolute cell reference.

Also, other tool options for naming ranges include Define Name with an additional “scope” specificity, and Create from Selection to create from multiple Named Ranges, both accessed from the Formulas > Defined Names group.

Calculations with Named Ranges using =COUNTIFS() and =SUMIFS().

Similarly, =AVERAGEIFS(), =MAXIFS() and =MINIFS() will return the average (arithmetic mean), maximum value and minimum value, respectively, of all cells that meet multiple criteria.

Final ‘bookings’ sheet (cleaned):
[embeddoc url=”” viewer=”microsoft”]
https://alidataportfolio.co.uk/wp-content/uploads/2023/01/bookings-cleaned.xlsx

Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.

Note: all shortcuts I demonstrate in this guide are based on using MS Excel 2019 on a Mac.

Note: I came across an error about “Table name already exists”, and I managed to resolve it by adjusting/renaming the tables’ names from the Formulas > Name Manager