Clustering Google Display Placements Using Tf-Idf + KMeans

Stefan Neefischer

Last Updated: Aug 22, 2024
at 10:35 am

You can group display placements by their website content and evaluate them even when each placement has a low sample size. This helps you detect the well and poorly performing ones and make adjustments.

Grouping display placements by their website content uncovers some very well performing clusters along with some poor ones. The problem this approach solves is how to handle and judge elements with a low sample size. By grouping elements of the same nature, we suddenly have big numbers to work with, and we can take actions like blocking placements or adding positive patterns to our managed placement list.


For our solution, we need this:

A large list of placement URLs out of Google Ads.
Python code for scraping all websites, putting all extracted words/n-grams in a vector space, and building a Tf-Idf matrix. Instead of text, we now have a vector that describes each web page, so we can easily run computations on it.
Run KMeans clustering on the Tf-Idf matrix.
Visualize clusters with word clouds.
Join performance data to every cluster.

We had to play around with the number of clusters (and some other settings in the Tf-Idf Vectorizer) until the clusters made sense, but after a short while we found four example clusters that performed poorly once the Google Ads data was joined. With this approach, it's possible to scan hundreds of word clouds easily and block the placements in the weak clusters where appropriate. If you have a problem with low sample sizes for your placements, you can also use the cluster performance (value per click) to estimate a good bid for your managed placements.
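
Joining performance data to the clusters and reading off a value per click could look something like the sketch below. It uses only the Python standard library. The cluster assignments, the performance figures, the field names (clicks, value), and the helper name cluster_performance are made up for illustration; real numbers would come from a Google Ads placement report.

```python
from collections import defaultdict

# Hypothetical cluster assignment per placement (output of the clustering step).
placement_cluster = {
    "recipes-example.com": 0,
    "cooking-example.net": 0,
    "games-example.org": 1,
    "arcade-example.io": 1,
}

# Hypothetical per-placement performance; each placement alone has few clicks.
placement_stats = {
    "recipes-example.com": {"clicks": 12, "value": 30.0},
    "cooking-example.net": {"clicks": 8, "value": 10.0},
    "games-example.org": {"clicks": 15, "value": 0.0},
    "arcade-example.io": {"clicks": 25, "value": 4.0},
}


def cluster_performance(placement_cluster, placement_stats):
    """Sum clicks and conversion value per cluster and derive value per click."""
    totals = defaultdict(lambda: {"placements": 0, "clicks": 0, "value": 0.0})
    for url, stats in placement_stats.items():
        if url not in placement_cluster:
            continue  # placement was not scraped or clustered
        bucket = totals[placement_cluster[url]]
        bucket["placements"] += 1
        bucket["clicks"] += stats["clicks"]
        bucket["value"] += stats["value"]

    for bucket in totals.values():
        clicks = bucket["clicks"]
        bucket["value_per_click"] = bucket["value"] / clicks if clicks else 0.0
    return dict(totals)


performance = cluster_performance(placement_cluster, placement_stats)

# Rank clusters from worst to best value per click to find blocking candidates.
worst_first = sorted(performance, key=lambda c: performance[c]["value_per_click"])
```

A placement with only a handful of clicks can then borrow the value per click of its cluster as a starting point for a bid, rather than relying on its own sparse data.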
