Scrape the Google Autosuggest with Python

Finding keywords with autocomplete works fine by hand. But can you do it fast and at scale? With this Python script, you can collect up to 20,000 long-tail keywords in less than 60 seconds!

You have probably already used Google's autocomplete for your keyword research. Several well-known SEO tools also use autosuggest results to discover long-tail keywords. If you want to run this process at scale for thousands of seed keywords, take a look at this Python solution. No proxy servers are required to fetch the autocomplete keywords!

How to use the autocomplete scraper script?

  • Put your seed keywords into keyword_seeds.csv
  • Run the main.py script
  • Look at results in keyword_suggestions.csv
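
The script reads the first column of keyword_seeds.csv with pandas' default settings, so the first row is treated as a header and not as a keyword. Here is a minimal sketch for creating a compatible seed file; the helper name write_seed_file, the header name "Keyword", and the example seeds are assumptions made for illustration only.

```python
import csv

def write_seed_file(seeds, path="keyword_seeds.csv", header="Keyword"):
    """Write seed keywords to a one-column CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # The scraper skips the first row, so it must be a header, not a keyword
        writer.writerow([header])
        for seed in seeds:
            seed = seed.strip()
            if seed:  # skip blank entries
                writer.writerow([seed])
```

Calling write_seed_file(["python tutorial", "seo tools"]) would create a file with a header line followed by one keyword per line.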

With the current settings, you can fetch the autocomplete results for 50 seed keywords in less than one minute. If you have thousands of seed keywords, you have to slow down the requests with these two settings:

  • WAIT_TIME:
    The delay in seconds before each autosuggest request. Set this to 1 or 2 seconds and you should be fine even with thousands of keywords. Of course, this will increase the run time of your script.
  • MAX_WORKERS:
    The number of seed keywords that are scraped in parallel, which reduces the run time. If you have too many workers, the total number of requests per second might be too high for Google and the script can get blocked. In that case, reduce this value to 5 or 10.
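
To get a feeling for how these two settings interact, here is a rough back-of-the-envelope sketch. It relies on the fact that the script sends one request per character in charList (37 per seed keyword) and that each worker handles one seed keyword at a time. The function estimate_run and the average network latency of 0.2 seconds per request are assumptions for illustration, not measured values.

```python
import math
import string

# Same suffix alphabet as the script: space, a-z, 0-9 (37 characters)
CHAR_LIST = " " + string.ascii_lowercase + string.digits

def estimate_run(num_seeds, wait_time, max_workers, latency=0.2):
    """Return (total_requests, peak_requests_per_second, run_time_seconds).

    latency is an assumed average round-trip time per request in seconds.
    """
    total_requests = num_seeds * len(CHAR_LIST)
    # A worker sleeps wait_time, then waits for the response, for every request
    seconds_per_request = wait_time + latency
    # At most one request per busy worker is in flight at any moment
    busy_workers = min(max_workers, num_seeds)
    peak_rate = busy_workers / seconds_per_request
    # Seeds are processed in "rounds" of max_workers seeds each
    rounds = math.ceil(num_seeds / max_workers)
    run_time = rounds * len(CHAR_LIST) * seconds_per_request
    return total_requests, peak_rate, run_time

if __name__ == "__main__":
    for seeds, wait, workers in [(50, 0.1, 20), (5000, 1, 10)]:
        total, rate, secs = estimate_run(seeds, wait, workers)
        print(f"{seeds} seeds: {total} requests, ~{rate:.0f} req/s, ~{secs / 60:.1f} min")
```

Under these assumptions, the default settings send 1,850 requests for 50 seed keywords in roughly half a minute, while 5,000 seed keywords with WAIT_TIME = 1 and MAX_WORKERS = 10 would take several hours.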

Here is the script to scrape the Google autosuggest:

# Pemavor.com Autocomplete Scraper
# Author: Stefan Neefischer (stefan.neefischer@gmail.com)

import concurrent.futures
import pandas as pd
import itertools
import requests
import string
import json
import time

startTime = time.time()

# If you use more than 50 seed keywords, you should slow down your requests; otherwise Google may block the script
# If you have thousands of seed keywords, use e.g. WAIT_TIME = 1 and MAX_WORKERS = 10

WAIT_TIME = 0.1
MAX_WORKERS = 20

# Set the autocomplete language
lang = "en"

# Characters appended to each seed keyword: space, a-z, 0-9
charList = " " + string.ascii_lowercase + string.digits

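def makeGoogleRequest(query):
    # If you make requests too quickly, you may be blocked by Google
    time.sleep(WAIT_TIME)
    URL = "http://suggestqueries.google.com/complete/search"
    PARAMS = {"client": "firefox",
              "hl": lang,
              "q": query}
    headers = {'User-agent': 'Mozilla/5.0'}
    try:
        # The timeout keeps a stalled connection from blocking a worker forever
        response = requests.get(URL, params=PARAMS, headers=headers, timeout=10)
    except requests.RequestException:
        return "ERR"
    if response.status_code != 200:
        return "ERR"
    try:
        # The response is a JSON array; element [1] holds the list of suggestions
        return json.loads(response.content.decode('utf-8'))[1]
    except (ValueError, IndexError):
        # Undecodable or unexpected response body
        return "ERR"
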
def getGoogleSuggests(keyword):
    # Query the seed keyword followed by each character in charList
    queryList = [keyword + " " + char for char in charList]
    suggestions = []
    for query in queryList:
        suggestion = makeGoogleRequest(query)
        if suggestion != 'ERR':
            suggestions.append(suggestion)

    # Flatten the per-query lists, drop duplicates, and remove empty suggestions
    suggestions = set(itertools.chain(*suggestions))
    suggestions.discard("")

    return suggestions


# Read the CSV file that contains the keywords you want to send to Google autocomplete
# (the first row is treated as a header)
df = pd.read_csv("keyword_seeds.csv")
# Take the values of the first column as keywords, skipping empty cells
keywords = df.iloc[:, 0].dropna().astype(str).tolist()

resultList = []

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    # Submit one task per seed keyword and remember which future belongs to which keyword
    futuresGoogle = {executor.submit(getGoogleSuggests, keyword): keyword for keyword in keywords}

    for future in concurrent.futures.as_completed(futuresGoogle):
        key = futuresGoogle[future]
        for suggestion in future.result():
            resultList.append([key, suggestion])

# Convert the results to a dataframe
outputDf = pd.DataFrame(resultList, columns=['Keyword', 'Suggestion'])

# Save the dataframe as a CSV file
outputDf.to_csv('keyword_suggestions.csv', index=False)
print('keyword_suggestions.csv File Saved')

print(f"Execution time: {time.time() - startTime:.2f} sec")
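
The script takes element [1] of the decoded JSON response, which implies that the endpoint answers with an array whose first element is the query and whose second element is the list of suggestions. The following offline sketch illustrates that parsing step and the de-duplication across queries without sending any requests. The helper names and the sample payloads are made up for illustration and are not real Google responses.

```python
import itertools
import json

def parse_suggest_payload(raw_bytes):
    """Decode a response body shaped like [query, [suggestion, ...]] and return the suggestions."""
    data = json.loads(raw_bytes.decode("utf-8"))
    return data[1]

def merge_suggestions(batches):
    """Flatten several suggestion lists, drop duplicates and empty strings, and sort the result."""
    merged = set(itertools.chain.from_iterable(batches))
    merged.discard("")
    return sorted(merged)

if __name__ == "__main__":
    # Hypothetical payloads for the queries "shoes a" and "shoes b"
    raw_a = b'["shoes a", ["shoes amazon", "shoes adidas", ""]]'
    raw_b = b'["shoes b", ["shoes brands", "shoes adidas"]]'
    batches = [parse_suggest_payload(raw_a), parse_suggest_payload(raw_b)]
    print(merge_suggestions(batches))
```

Sorting is not part of the original script; it is only used here to make the merged result deterministic.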
