Clustering Google Display Placements Using Tf-Idf + KMeans

You can group display placements by their website content and evaluate them even when each placement has only a small sample. This helps you spot the good and bad performers and adjust accordingly.

Grouping display placements by their website content uncovers clusters that perform very well alongside some that perform poorly. The approach solves the problem of judging placements with low sample sizes: once placements of the same nature are grouped, the combined numbers are large enough to act on, for example by blocking placements or adding positive patterns to the managed placement list.

For our solution we need the following:

  • A large list of placement URLs exported from Google Ads
  • Python code that scrapes each website, places the extracted words and n-grams in a vector space, and builds a Tf-Idf matrix. Each webpage is then described by a vector instead of raw text, which makes computations easy.
  • KMeans clustering run on the Tf-Idf matrix
  • Word clouds to visualize the clusters
  • Performance data joined to every cluster

I had to experiment with the number of clusters (and some other settings in the Tf-Idf Vectorizer) until the results made sense, but after a short while I got this:

These are just four example clusters that performed poorly once the Google Ads data was joined. With this approach you can scan hundreds of word clouds quickly and, where appropriate, block the placements behind them. If low sample sizes are a problem for your placements, you can also use the cluster-level performance (value per click) to estimate a good bid for a managed placement.
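Joining performance data to the clusters and computing value per click might look like the sketch below. The placement names, click counts and values are invented for illustration, and `cluster_value_per_click` is a hypothetical helper, not part of any Google Ads API; real data would come from a Google Ads placement report.

```python
from collections import defaultdict


def cluster_value_per_click(assignments, performance):
    """Sum clicks and value per cluster, then divide value by clicks."""
    totals = defaultdict(lambda: {"clicks": 0, "value": 0.0})
    for placement, cluster_id in assignments.items():
        stats = performance.get(placement)
        if stats is None:
            continue  # placement has no performance row
        totals[cluster_id]["clicks"] += stats["clicks"]
        totals[cluster_id]["value"] += stats["value"]

    # Clusters without clicks get None instead of a division by zero.
    return {
        cluster_id: (t["value"] / t["clicks"] if t["clicks"] else None)
        for cluster_id, t in totals.items()
    }


# Hypothetical cluster assignments and performance rows.
example_assignments = {
    "recipes-a.example": 0,
    "recipes-b.example": 0,
    "games-a.example": 1,
    "games-b.example": 1,
}
example_performance = {
    "recipes-a.example": {"clicks": 4, "value": 6.0},
    "recipes-b.example": {"clicks": 6, "value": 9.0},
    "games-a.example": {"clicks": 5, "value": 0.0},
    "games-b.example": {"clicks": 15, "value": 2.0},
}

value_per_click = cluster_value_per_click(example_assignments, example_performance)
print(value_per_click)
```

Each placement here has too few clicks to judge alone, but the pooled cluster figure is more stable and could serve as a starting point for a bid on a new placement that falls into the same cluster.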
