From Tagging to Keyword Topics

Before you can create the perfect holistic SEO landing page, you need to understand what users are actually searching for.

Keyword Tagging

Need relevant keyword topics? Understand what users actually search for.

The keyword research process in SEO and SEA took a big step forward with named entity recognition (NER). It helps to identify and categorize entities, in other words the key pieces of information in a text. Google's "refine keywords" option in Keyword Planner builds on the same idea.

You will notice that it works nicely for some keywords, while for others no entities are found. This happens especially often with rare words that are mainly used in your business domain. But there are approaches that let you build your own well-working solution.


  • Think of a big list of signal words like "buy", "shop", "order" that are mapped to a descriptive name for that group; in our example a good name would be "Transaction". You will already have a good idea which tag categories fit your business. By running a large keyword list against a big lookup list of signal words and different tag categories, you will gain new insights that were hidden before.
  • You can do this in Excel for smaller projects with some scripting, but it does not scale well. In our solutions we use machine learning approaches running in Python. Let's look at the full process.
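The signal-word idea above can be sketched in a few lines of Python. The tag names and signal words here are just illustrative examples, not a definitive taxonomy:

```python
# Hypothetical signal-word lookup: each signal word maps to a tag category.
SIGNAL_WORDS = {
    "buy": "Transaction",
    "shop": "Transaction",
    "order": "Transaction",
    "cheap": "Price",
    "price": "Price",
}

def tag_keyword(keyword):
    """Return the set of tag categories whose signal words occur in the keyword."""
    tokens = keyword.lower().split()
    return {SIGNAL_WORDS[t] for t in tokens if t in SIGNAL_WORDS}

keywords = ["buy red shoes", "cheap shoes order online", "red shoes"]
for kw in keywords:
    print(kw, "->", tag_keyword(kw) or {"untagged"})
```

Running a real keyword export through a loop like this immediately shows which categories dominate and which keywords remain untagged.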


You have no idea about the keywords: Clustering, N-Gram Analysis

When you are completely new to a business and have no idea how users conduct their searches, it is difficult to start with keyword tagging. First of all, you'll need a plan for the most important categories in order to define lookup lists that make sense. For that reason you should start with keyword clustering techniques or n-gram analysis to get a basic understanding of the keyword topics.
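A basic n-gram analysis needs nothing more than the standard library. This sketch counts unigrams and bigrams over a (made-up) keyword list so the most frequent topic words surface first:

```python
from collections import Counter

def ngrams(keywords, n=1):
    """Count n-grams across a keyword list to surface frequent topic words."""
    counts = Counter()
    for kw in keywords:
        tokens = kw.lower().split()
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts

keywords = ["buy red shoes", "red shoes sale", "buy blue shoes"]
print(ngrams(keywords, 1).most_common(3))  # most frequent single words
print(ngrams(keywords, 2).most_common(2))  # most frequent word pairs
```

Sorting the result by count (or by aggregated search volume, if you have it) gives you a first map of the topic landscape.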

You already have an idea about your keyword topics: Keyword Tagging

If you already know some of your keyword topics, you can start defining your lookup lists right away. I highly recommend using n-gram analysis and starting with the words that have the highest counts. This will give you the best keyword tagging coverage in a short period of time. You can use our free n-gram analyzer tool for this task.


Build your own entity database

Build up a centralized entity database for your business. Don't use it just for tagging your keywords; think of applying it to SERP snippets or your competitors' content.

Keywords grouped by Entity Tags

Entity recognition applied to a keyword list. You can include additional metrics like search volume alongside the keyword count.

1) Use existing data feeds

If you have a product feed available, you can start copying and pasting right away. If not, you probably already have logic in your site and URL structure from which you can derive tag categories.

2) Start adding tag categories

You will already know a lot about your keywords. Use this knowledge and start adding the tag categories that come to mind, together with some signal words for each tag group. There is no need for a perfect list at this stage.

3) Categorize N-Grams

Try to add the n-grams with the biggest search volume to the existing tag groups. You can use our free n-gram tool for this. By doing this you will get the best coverage of tagged search queries in the shortest amount of time. I'm sure you will discover a lot of new categories in the process; add them to your lookup list.

4) Add close variants

At this stage you can use the existing lookup list and search for similar words among the untagged words. Regex functions and string distance metrics will add a lot of new lookup keys to your list. Have a look at the Python module FuzzyWuzzy, which uses Levenshtein distance for string similarity.
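As a rough sketch of this matching step, the standard library's difflib can stand in for FuzzyWuzzy here (FuzzyWuzzy's `fuzz.ratio` works similarly but is Levenshtein-based); the lookup keys and untagged words are invented examples:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1]; difflib stands in for FuzzyWuzzy's fuzz.ratio."""
    return SequenceMatcher(None, a, b).ratio()

lookup_keys = ["sneaker", "sandal"]
untagged = ["sneakers", "snaeker", "jacket"]

for word in untagged:
    best = max(lookup_keys, key=lambda k: similarity(word, k))
    if similarity(word, best) >= 0.8:  # threshold is a tunable assumption
        print(word, "-> close variant of", best)
```

Plurals and typos like "sneakers" and "snaeker" clear the threshold and inherit the base word's tag, while unrelated words like "jacket" do not.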

5) Add semantically similar words

This is the "wow" moment for most people. If you create word embeddings with word2vec on your complete keyword set, you can search for semantically similar words. If you have an entity "color" with the value "green", the word2vec model will surface "red" and "blue" as semantically similar words. This will boost your list size.


Avoid RegEx or String Similarity Comparisons

Regular expressions are very expensive in computation time when you process large keyword files. The same problem occurs when you use similarity algorithms like Levenshtein. We used those approaches only once, for setting up our tag category database and defining our lookup lists. That way we can use Python dictionaries for all the tagging, which is blazing fast.
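This split between a one-time expensive step and fast everyday lookups can be sketched as follows; the lookup entries and corpus tokens are hypothetical:

```python
import re

# One-time, expensive step: use a regex to harvest close variants (here,
# simple plurals) from the corpus and fold them into the lookup dict.
lookup = {"shoe": "Product", "shirt": "Product"}
corpus_tokens = ["shoes", "shirts", "buy", "order", "shoe"]

pattern = re.compile(r"^(%s)s?$" % "|".join(map(re.escape, lookup)))
for token in corpus_tokens:
    m = pattern.match(token)
    if m and token not in lookup:
        lookup[token] = lookup[m.group(1)]  # variant inherits the base tag

# From here on, tagging is a plain O(1) dict lookup per token.
def tag(token):
    return lookup.get(token, "untagged")

print(tag("shoes"), tag("buy"))
```

The regex runs once during list building; the millions of subsequent tagging operations never touch it.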

Use Dictionaries or Flash Text for lookups

There is a great Python module called FlashText that can be used for the keyword tagging process. Load your lookup database into its KeywordProcessor and start extracting entities. It is all about speed: other approaches may be fine for looking up 100,000 keywords, but think of 100,000 website pages you want to tag with your entity database; now you will see whether your solution scales or not. If you are interested in the FlashText algorithm, you can have a look here.
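Basic FlashText usage looks like this; the `try/except` adds a much-simplified stand-in (single-token matching only, unlike real FlashText) so the sketch runs even where the module is not installed:

```python
try:
    from flashtext import KeywordProcessor  # pip install flashtext
except ImportError:
    # Simplified stand-in exposing the same two methods used below.
    class KeywordProcessor:
        def __init__(self):
            self._map = {}

        def add_keyword(self, keyword, clean_name=None):
            self._map[keyword.lower()] = clean_name or keyword

        def extract_keywords(self, text):
            return [self._map[t] for t in text.lower().split() if t in self._map]

kp = KeywordProcessor()
kp.add_keyword("buy", "Transaction")    # signal word -> tag category
kp.add_keyword("order", "Transaction")
kp.add_keyword("cheap", "Price")

print(kp.extract_keywords("buy cheap shoes"))
```

The second argument to `add_keyword` is the "clean name" that `extract_keywords` returns, which is exactly the signal-word-to-tag mapping described above.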

Go Parallel in your processing

If speed is very important to you, or if the datasets are very big, you can think about splitting the tagging process across multiple worker scripts. When you deploy your Python solution to AWS Lambda or Google Cloud Functions, you can easily use those scaling approaches.
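The splitting idea can be sketched locally with the standard library; threads are used here purely for illustration (for CPU-bound tagging you would use processes, or fan the chunks out to serverless functions as described above). All names and data are invented:

```python
from concurrent.futures import ThreadPoolExecutor

SIGNAL_WORDS = {"buy": "Transaction", "cheap": "Price"}

def tag_chunk(chunk):
    """Tag one slice of the keyword list; each worker gets its own slice."""
    return [
        (kw, sorted({SIGNAL_WORDS[t] for t in kw.split() if t in SIGNAL_WORDS}))
        for kw in chunk
    ]

keywords = ["buy red shoes", "cheap shirts", "red shoes"] * 4
chunks = [keywords[i::3] for i in range(3)]  # split work for three workers

with ThreadPoolExecutor(max_workers=3) as pool:
    results = [pair for part in pool.map(tag_chunk, chunks) for pair in part]

print(len(results))
```

Because each chunk is tagged independently against the same lookup dictionary, the work parallelizes with no coordination between workers; in a serverless setup each chunk would simply become one function invocation.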