1) Use existing data feeds
If you have a product feed available, you can start copying and pasting right away. If not, you probably already have a site logic and a URL structure from which you can extract information about tag categories.
2) Start adding tag categories
Do you already know which keywords are the most popular in your business? Use this knowledge and start adding the tag categories that come to mind, each with a few signal words. There is no need for a perfect list at this stage.
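To make this concrete, here is a minimal sketch of what such a first lookup list could look like and how it tags a query. The categories, signal words, and the `tag_query` helper are hypothetical examples, not a prescribed format.

```python
# Hypothetical starter lookup list: tag category -> signal words.
lookup = {
    "color": ["green", "red", "blue"],
    "brand": ["nike", "adidas"],
    "product": ["shoes", "jacket"],
}


def tag_query(query, lookup):
    """Return {word: category} for every query word found in the lookup list."""
    tags = {}
    for token in query.lower().split():
        for category, signal_words in lookup.items():
            if token in signal_words:
                tags[token] = category
    return tags


tags = tag_query("Green Nike shoes size 42", lookup)
print(tags)
```

Words that stay untagged ("size" and "42" here) are the candidates you work through in the following steps.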
3) Categorize N-Grams
Try to assign the N-Grams with the highest search volumes to the existing tag groups. By doing this, you’ll get the best coverage of tagged search queries as quickly as possible. We’re sure you’ll find many new categories in this process; add them to your lookup list.
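One way to find the N-Grams worth categorizing first is to sum the search volume of every query an N-Gram appears in and work down the list from the top. The following is a rough sketch under assumptions: the sample queries, volumes, and lookup keys are made up, and `ngram_volumes` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample data: (search query, search volume).
queries = [
    ("green running shoes", 900),
    ("nike running shoes", 700),
    ("green rain jacket", 400),
    ("cheap rain jacket", 100),
]

# Hypothetical lookup list so far: N-Gram -> tag category.
lookup = {"green": "color", "nike": "brand", "running shoes": "product"}


def ngram_volumes(queries, max_n=2):
    """Sum the search volume of every 1..max_n word N-Gram across all queries."""
    volumes = Counter()
    for query, volume in queries:
        tokens = query.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                volumes[" ".join(tokens[i:i + n])] += volume
    return volumes


volumes = ngram_volumes(queries)

# N-Grams that are not in the lookup list yet, highest volume first.
untagged = [(gram, vol) for gram, vol in volumes.most_common() if gram not in lookup]
for gram, vol in untagged[:5]:
    print(gram, vol)
```

Working through `untagged` from the top gives you the biggest coverage gain per lookup key you add.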
4) Add close variants
At this stage, you can use the existing lookup list and look for similar words among the untagged words. Regex functions and string distance metrics will add a lot of new lookup keys to your list. Have a look at the Python package FuzzyWuzzy, which uses Levenshtein distance for string similarity.
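To show the idea without any dependency, the sketch below implements Levenshtein distance by hand instead of calling FuzzyWuzzy. The lookup keys, the untagged words, the `suggest_variants` helper, and the 0.75 threshold are all assumptions for illustration; you would tune the threshold on your own data.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (char_a != char_b)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# Hypothetical lookup list and untagged words.
lookup = {"adidas": "brand", "sneaker": "product", "green": "color"}
untagged = ["addidas", "sneakers", "gren", "jacket"]


def suggest_variants(untagged, lookup, threshold=0.75):
    """Propose a tag category for untagged words that are close to a lookup key."""
    suggestions = {}
    for word in untagged:
        best_key = max(lookup, key=lambda key: similarity(word, key))
        if similarity(word, best_key) >= threshold:
            suggestions[word] = lookup[best_key]
    return suggestions


suggestions = suggest_variants(untagged, lookup)
print(suggestions)
```

Treat the result as suggestions to review rather than as automatic additions, because short words can be one edit away from an unrelated word.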
5) Add semantically similar words
It’s the “wow” moment for most people. If you train word embeddings with word2vec on your complete keyword set, you can search for semantically similar words. If you have an entity “color” with the single value “green,” the word2vec model will typically suggest “red” and “blue” as semantically similar words. This will grow your list considerably.
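Why does this work? Words that appear in the same contexts tend to mean similar things. The sketch below is not word2vec. It is a simple count-based stand-in that uses only the standard library to illustrate the same idea on a handful of made-up queries. On a real keyword set you would train an actual word2vec model, for example with the gensim library. The helper names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical keyword set.
queries = [
    "green running shoes",
    "red running shoes",
    "blue running shoes",
    "green rain jacket",
    "red rain jacket",
    "blue rain jacket",
    "nike running shoes",
]


def build_context_vectors(queries):
    """For every word, count the other words it shares a query with."""
    vectors = defaultdict(Counter)
    for query in queries:
        tokens = query.lower().split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Rank all other words by how similar their contexts are to those of `word`."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:topn]


vectors = build_context_vectors(queries)
for candidate, score in most_similar("green", vectors):
    print(candidate, round(score, 3))
```

In this toy data, “red” and “blue” occur in exactly the same contexts as “green,” so they rank first. A trained word2vec model generalizes the same signal to much larger and noisier keyword sets.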