Scrape the Google Autosuggest with Python

Finding keywords with autocomplete works fine by hand. But can you do it fast and at scale? With this Python script, you can collect 20,000 long-tail keywords in less than 60 seconds.

I bet you have already used Google's autocomplete for your keyword research. Several well-known SEO tools also use autosuggest results to discover long-tail keywords. If you want to run this process at scale for thousands of seed keywords, take a look at this Python solution. No proxy servers are required to fetch the autocomplete keywords.

Based on this Python code, we published a free online tool to scrape Google Autosuggest. Beyond what the code shown here does, the tool can also scrape YouTube, Google Product, and Google News keyword suggestions. It's also possible to navigate and explore keywords by topic and search intent.


    How to use the autocomplete scraper script

    • Put your seed keywords into keyword_seeds.csv
    • Run the main.py script
    • Look at results in keyword_suggestions.csv
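As a minimal sketch of the first step, the seed file can be created with pandas. The seed keywords and the column name `Keyword` are assumptions made for illustration. The script reads the first column whatever it is called, but `pd.read_csv` treats the first row as a header, so a file without a header row would lose its first keyword.

```python
import pandas as pd

# Hypothetical seed keywords, for illustration only
seeds = ["running shoes", "trail running", "marathon training"]

# One column with a header row: pd.read_csv treats the first row as
# column names, so without a header the first keyword would be skipped.
seed_df = pd.DataFrame({"Keyword": seeds})
seed_df.to_csv("keyword_seeds.csv", index=False)

# Read the file back the same way the script does: first column as a list
loaded = pd.read_csv("keyword_seeds.csv").iloc[:, 0].tolist()
print(loaded)
```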

    With the current settings, you'll fetch the autocomplete results for 50 seed keywords in less than one minute. If you have thousands of seed keywords, you have to slow down the requests with these two settings:

    • WAIT_TIME:
      The delay in seconds before each autosuggest lookup. Set this to 1 or 2 seconds and you should be fine even with thousands of keywords. Of course, this increases the run time of your script.
    • MAX_WORKERS:
      We scrape keywords in parallel, which reduces the run time. With too many workers, the total number of requests per second might be too high for Google and the script can still get blocked. In that case, reduce this value to 5 or 10.
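To get a feel for how the two settings trade off, here is a rough back-of-the-envelope sketch. The helper functions `estimate_min_runtime` and `peak_requests_per_second` are hypothetical and not part of the script. They count only the sleep before each request and ignore network latency, so the real run time will be longer and the real request rate lower.

```python
import math
import string

# Same character set as the script: space + a-z + 0-9 = 37 lookups per seed
CHARS_PER_SEED = len(" " + string.ascii_lowercase + string.digits)

def estimate_min_runtime(seed_count, wait_time, max_workers):
    """Lower bound in seconds: counts only the sleep before each request."""
    # Each seed keyword is handled by one worker from start to finish,
    # so the busiest worker processes ceil(seed_count / max_workers) seeds.
    seeds_per_worker = math.ceil(seed_count / max_workers)
    return seeds_per_worker * CHARS_PER_SEED * wait_time

def peak_requests_per_second(wait_time, max_workers):
    """Upper bound: every worker sends one request per wait_time."""
    return max_workers / wait_time

print(estimate_min_runtime(50, 0.1, 20))    # default settings, 50 seeds
print(estimate_min_runtime(5000, 1, 10))    # slowed down, 5000 seeds
print(peak_requests_per_second(0.1, 20))
print(peak_requests_per_second(1, 10))
```

Under these assumptions, the default settings spend about 11 seconds just waiting for 50 seeds, while 5,000 seeds with WAIT_TIME = 1 and MAX_WORKERS = 10 need at least 18,500 seconds, a bit over five hours.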

    Here is the script to scrape the Google autosuggest:

    # Pemavor.com Autocomplete Scraper
    # Author: Stefan Neefischer (stefan.neefischer@gmail.com)

    import concurrent.futures
    import pandas as pd
    import itertools
    import requests
    import string
    import json
    import time

    startTime = time.time()

    # If you use more than 50 seed keywords, slow down your requests; otherwise Google may block the script.
    # If you have thousands of seed keywords, use e.g. WAIT_TIME = 1 and MAX_WORKERS = 10.

    WAIT_TIME = 0.1
    MAX_WORKERS = 20

    # Set the autocomplete language
    lang = "en"

    # Characters appended to each seed keyword: space, a-z, 0-9
    charList = " " + string.ascii_lowercase + string.digits

    def makeGoogleRequest(query):
        # If you make requests too quickly, you may be blocked by Google
        time.sleep(WAIT_TIME)
        URL = "http://suggestqueries.google.com/complete/search"
        PARAMS = {"client": "firefox",
                  "hl": lang,
                  "q": query}
        headers = {'User-agent': 'Mozilla/5.0'}
        try:
            # The timeout keeps a stalled connection from hanging a worker thread
            response = requests.get(URL, params=PARAMS, headers=headers, timeout=10)
            if response.status_code == 200:
                # The response is a JSON array; the suggestions are its second element
                return json.loads(response.content.decode('utf-8'))[1]
        except (requests.RequestException, ValueError):
            # Network errors and undecodable responses count as a failed lookup
            pass
        return "ERR"

    def getGoogleSuggests(keyword):
        # Query the seed keyword followed by each character in charList
        queryList = [keyword + " " + char for char in charList]
        suggestions = []
        for query in queryList:
            suggestion = makeGoogleRequest(query)
            if suggestion != 'ERR':
                suggestions.append(suggestion)

        # Flatten the per-query lists and drop duplicates
        suggestions = set(itertools.chain(*suggestions))
        # Remove empty suggestions
        suggestions.discard("")

        return suggestions


    # Read the CSV file that contains the seed keywords to send to Google autocomplete
    df = pd.read_csv("keyword_seeds.csv")
    # Take the values of the first column as keywords, skipping empty cells
    keywords = df.iloc[:, 0].dropna().astype(str).tolist()

    resultList = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futuresGoogle = {executor.submit(getGoogleSuggests, keyword): keyword for keyword in keywords}

        for future in concurrent.futures.as_completed(futuresGoogle):
            key = futuresGoogle[future]
            for suggestion in future.result():
                resultList.append([key, suggestion])

    # Convert the results to a dataframe
    outputDf = pd.DataFrame(resultList, columns=['Keyword', 'Suggestion'])

    # Save dataframe as a CSV file
    outputDf.to_csv('keyword_suggestions.csv', index=False)
    print('keyword_suggestions.csv File Saved')

    print(f"Execution time: { ( time.time() - startTime ) :.2f} sec")
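To make the parsing step easier to follow, here is a small offline sketch of what the script does with each response: it decodes a JSON array, takes the second element as the list of suggestions, and merges all lists into one de-duplicated set. The payloads below are made up for illustration and only mirror the shape the script relies on (query first, suggestions second); real responses may carry additional elements.

```python
import itertools
import json

# Illustrative payloads (made up for this example), one per lookup
raw_responses = [
    '["python a", ["python anaconda", "python append", ""]]',
    '["python b", ["python basics", "python append"]]',
]

# The suggestions are the second element of each decoded array
per_query = [json.loads(raw)[1] for raw in raw_responses]

# Flatten the lists, drop duplicates, and remove empty strings
suggestions = set(itertools.chain.from_iterable(per_query))
suggestions.discard("")

print(sorted(suggestions))
```

Because the results are collected in a set, a suggestion that appears for several lookups of the same seed keyword ends up in keyword_suggestions.csv only once per seed.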

    Do you need a custom solution with Python?

    One-size-fits-all solutions can't meet everybody's unique needs. We know that. If this script doesn't quite fit your use case, we can provide a custom Python solution. Contact us to explore the possibilities.
