Over the last seven years or so I’ve worked with lots of clients. Ranging from SMEs right up to enterprise companies and multinationals. Generally speaking, things have gone pretty well.
However, this hasn’t been the case on every project I’ve ever worked on. Here’s one example of things not going so smoothly:
This doesn’t make me (or clients!) particularly happy. Failure really, really irritates me. But I believe that perfection is something to strive for but never achieve, and part of that process (for me at least) is using data, not gut instinct, to improve results.
As marketers, we have lots of challenges, and one of those is ‘where do we start?’ – how do we know what to focus on? Some of these questions are quite tactical:
- Which keywords have the best conversion rate?
- What time should we post on social media?
- Which landing pages are bouncing?
The answer to this question is normally, ‘just sort the data’. In this post, I’m going to tell you why that’s probably the wrong approach, and show you a better way of doing things.
What’s wrong with traditional sort?
There are lots of variations of sorting models, but the most common are lexicographical (A-Z or Z-A) and numerical (ascending or descending). We’re not going to focus on lexicographical in this article – but will be talking about the problems with numerical sort. One of the most common real-world examples we see of numerical sort is on ecommerce sites, sorting products by price (low to high or high to low):
In this context, regular sort is just fine. So there’s no issue here from me. But let’s take a look at some other examples more specific to marketing analysis.
We’ll take one of the questions we asked earlier: ‘which landing pages are bouncing?’ This is easy. Just sort the data:
But there’s a problem here. This is not accurate or particularly useful. Why? Let’s take a look at the Page View column:
We don’t have enough data to be confident they are our highest bouncing pages, or as Google put it:
“The true value of a metric (e.g. bounce rate, conversion rate, time on site etc) for dimensions (e.g. channel, page, etc) with small participants will be imprecise.”
Avinash Kaushik, Digital Marketing Evangelist, Google
What does this actually mean?
Let’s use an example. If one person visits a page and bounces, the bounce rate for that page is 100%. If the next 99 people visit don’t bounce, then the bounce rate is 1%. The initial dataset is just too small to make judgements on. We need to have more data (page views) to be confident that the bounce rate for that page is what is shown.
So Google’s engineers were given the following challenge…
“Compute the “expected true value” for each row on a table.”
The solution they came up with was weighted sort.
What is weighted sort?
Weighted sort is Google’s algorithm that sorts percentage columns in Google Analytics (GA) by its importance, instead of numerical order. This results in the data being reordered to reflect a more accurate picture. It does this by weighting metrics that have more participants as more accurate than those with few – resulting in the expected true value (ETV) of a metric.
How is weighted sort calculated?
Avinash Kaushik has a pretty good explanation over on his site – I’ll try and give the TL;DR version!
Imagine a scale:
On one end are zero site visits, and the other, there are many:
Values to the left (closer to zero) are, arguably, not very accurate because we don’t have enough data to be confident in its output. So they are deemed to be very close to the dataset’s average:
Values to the right (closer to many) are more accurate because we have more data to be statistically confident in their accuracy, so they are deemed to be very close to the figure shown:
Values in the middle are deemed to be a combination of the two.
This results in the data being re-sorted as per its ETV.
How weighted sort works – an example
If we go back to our original data set. There’s a ‘Sort Type’ option field above the data. We’ll hit that dropdown and select ‘Weighted’:
The result can be seen below. Notice the top three results all have a lower absolute bounce rate than the page in position four. This is because the top three all have more page views, so we can be more confident of their bounce rate metrics, whereas the one in position four has less page views, so its bounce rate (ETV) is brought closer to the site’s average (59.03%).
You can replace the dimension (pages) with another dimension in GA, such as channel, source, medium, and more.
So this just works for bounce rate?
Nope – it works across all percentage metrics in Google Analytics (except the metric % Exits). However, it’s not available in other Google products such as Ads or Data Studio.
However, I have good news!
A downloadable weighted sort template
This is where Dr. Pete Meyers, Marketing Scientist at Moz, comes in. He wrote a post a few years ago that included an explanation of weighted sort, and a free download of four variations of ETV. I’ll let him explain how it’s all calculated in his post, but the versions are listed below:
- A basic ETV formula
- Google’s ETV sort
- Weighted ETV formula
- Log-based ETV formula
I’d recommend you look at the bottom two as they are a more accurate calculation but don’t expect the same results from each formula. For example, the log model is more appropriate when you have larger range values (e.g. 1-100,000). Feel free to experiment with them and work out what’s best for your data and capabilities.
This is really where Dr. Pete’s post ends – he makes a couple of suggestions without any real detail, so I’m going to try and explain how you can take some of the questions and apply them to your marketing activity, but first, let’s take a look at his spreadsheet.
On the ‘Weighted ETV’ sheet, you’ll see five columns:
- Column A – the dimension
- Column B – the metric you’re measuring against
- Column C – the metric you want sort by
- Column D – a multiple of those two metrics.
- Column E – the ETV
You’ll see an additional column on the log ETV sheet – the log formula uses the Log (base 2) of visits instead of the raw visit value when working out ETV. That’s more complex maths, which we definitely won’t go into here!
What are the applications of ETV?
Weighted sort in GA covers bouncing pages, conversion rate, and other percentage metrics. But where else can we use ETV? What other questions can we ask? Well, anything with a reasonable amount of data!
- Top converting channel by revenue?
- Top converting channel by leads?
- What are my most valuable pages?
- Which keywords have the best conversion rate? (N.B. use Google Ads data for this)
- What time should we post on social media?
- Which landing pages have the best organic search opportunity?
The opportunities here are (almost!) limitless. Let’s take a couple of these questions and apply ETV to see what difference it makes!
What time should we post on social media?
I’m not a huge fan of this question given Facebook’s algorithm changes over the years, but it seems important to people so I’ll tackle it here. You’ll need your data from whatever your channel of choice is. The stats you’ll need are:
- Time of day
- Total engagements
- Engagement rate
Something like this:
Add your data into one of Dr Pete’s sheets (I used the Log ETV for this due to the range of data). And hey presto:
Note the best time in the first screenshot was between 0600-0700 in the morning, but ETV shows us that perhaps a better time would 1100-1200 or 1300-1400. It’s a small shift, but could have an impact on your social engagement rates.
Let’s move onto something which might show a bigger shift in data…
Which keywords have the best conversion rate?
Again, let’s get our data. Unlike the previous screenshot, we have hundreds of rows of data here because we’re not limited to twenty-four one-hour slots, so what you see below is only a snapshot. This time we’ll need the keywords, the number of clicks (or page views) and the conversion rate:
Add some weighted sort magic…
The outlook has dramatically changed – we even have an answer to a bonus question we didn’t even think of – should we bid on competitor keywords? The answer is emphatically, yes! The exact match keyword [sofa.com] is one of the best converting terms on the site and phrase and [loaf] and “loaf sofa” aren’t doing too badly either!
This data is incredibly useful, and could enable PPC professionals to hone in on those top converting keywords, or work out why some other key phrases aren’t performing as well.
Powerful stuff right? Now onto my favourite example…
Which keywords have the best organic search opportunity?
This one is a doozy. I should probably say you might not get the complete answer you’re looking for because there are other factors at play – but the below method isn’t a bad starting point.
The hypothesis is this – we use Google Search Console data to work out what keywords have:
- An average position just off page one
- Have a high CTR
- Are showing with enough impressions to make an impact if the average position was to move up a bit
We then use some judgement (and a bit of looking at the SERPs) to determine the intent of the keyword itself to focus on ideally, transactional terms.
So what’s the process? Well, let’s export Google Search Console data – the Search Analytics for Sheets add-on works pretty well here – to get our information. You’ll need all the data here; impressions, CTR, clicks & position – you can choose either query or URL for the dimension depending on whether you’re focussed on the keyword itself, or landing pages to improve. I’ve gone with the query here and selected a few months as the timeframe.
- Filter keywords with average position between 10 & 20
- Filter keywords with impressions > 500 (or whatever number is appropriate but 500 is not a bad baseline)
We then just want the impression & CTR data:
Take that data, and apply one of the weighted sort formulas to the CTR by impressions. Here’s what I get:
I can now focus my efforts on keywords and pages that:
- Rank on page 2
- Have enough impressions
- Have a high CTR
- Have commercial intent
This is pure gold. I should highlight there are obviously other factors at play here, but this gives you the basis to go and work on pages or keywords that Google thinks your site is relevant for and is already in a good place to improve.
It’s much easier to go from page two, to page one, than it is to create brand new content from scratch, and get to page one. This method helps SEOs and content marketers prioritise their focus – and it’s all thanks to weighted sort and ETV.
If all goes to plan, this post will be published around the time I’m speaking on stage at MeasureFest. You can check the slides for that over on my Slideshare too, although if you’ve got this far in the post, you won’t need to.
If you take this and move forward with it – I’d love to hear any success stories you’ve had – so don’t hesitate to reach out to me on Twitter!