Python Status Code Checker for XML Sitemaps

Did you know that you can easily automate a lot of monitoring routines for important SEO KPIs yourself? Here is a small Python recipe that works with only one parameter!

The only parameter you’ll need is the sitemap URL. When you run the script, you get the HTTP status code for every URL listed in the sitemap.

The workflow looks like this:

  • If the main sitemap points to multiple sub-sitemaps in XML format, the script collects all of them
  • All available XML sitemaps are parsed and the URLs are extracted
  • Every URL is requested and its status code is recorded
  • All results are stored in a CSV file: the first column contains the URL, the second one the status code
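
The first two steps can be sketched with the standard library alone. This is only an illustration of the idea, not what the script below does internally (it relies on advertools for this part). The names `extract_locs`, `collect_urls` and the `fetch` callable are hypothetical; `fetch` stands for any function that takes a URL and returns the XML text, for example a thin wrapper around `requests.get`.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return (kind, locs) for one sitemap document.

    kind is "index" for a <sitemapindex> (locs are sub-sitemap URLs)
    and "urlset" for a regular <urlset> (locs are page URLs).
    """
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return kind, locs


def collect_urls(sitemap_url, fetch):
    """Walk a sitemap and any sub-sitemaps, returning all page URLs.

    fetch is a hypothetical callable: URL in, XML text out.
    """
    kind, locs = extract_locs(fetch(sitemap_url))
    if kind == "urlset":
        return locs
    urls = []
    for sub_sitemap in locs:
        # A sitemap index points to further sitemaps, so recurse into each
        urls.extend(collect_urls(sub_sitemap, fetch))
    return urls
```

Passing `fetch` in as a parameter keeps the parsing logic separate from the network code, so it can be tried out on XML strings before it is pointed at a live site.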

Schedule your monitoring routines

There are some tools that do the same job, but this Python script has another advantage: it lets you run a lot of monitoring routines for important SEO KPIs in an automated way. You don’t have to open a desktop application and start the check by hand. Schedule daily script runs and send an e-mail listing the URLs whose status code is not “200”.
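
As a rough sketch of the alerting idea, the snippet below reads the result CSV and prepares an e-mail for every URL that did not return 200. It assumes the CSV layout described above (URL in the first column, status code in the second, with or without a header row). The function names `find_failing_urls` and `build_alert` and the addresses passed to them are hypothetical.

```python
import csv
from email.message import EmailMessage


def find_failing_urls(csv_path):
    """Return (url, status_code) pairs whose status code is not 200."""
    failing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue
            url, status = row[0], row[1]
            # Skip a header row or anything that is not an integer
            if not status.lstrip("-").isdigit():
                continue
            if int(status) != 200:
                failing.append((url, int(status)))
    return failing


def build_alert(failing, sender, recipient):
    """Build (but do not send) an e-mail listing the failing URLs."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(failing)} sitemap URLs did not return 200"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{code}  {url}" for url, code in failing))
    return msg
```

To actually send the message, you could hand it to `smtplib.SMTP(...).send_message(msg)` with the host and credentials of your own mail server. The daily run itself can be triggered by whatever scheduler you already use, such as cron or the Windows Task Scheduler.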

I’m currently playing around with sending results of multiple SEO audit scripts to Grafana. I like the idea of putting a lot of different audit tasks into a centralized dashboard.

But now have fun with the script!

# Sitemap URL Status Code Checker
# Author: Stefan Neefischer
# 1) Enter your Sitemap URL
# 2) Get CSV File with URL and Http Status Code

import advertools as adv
import pandas as pd
import requests
import time
import warnings

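def getStatuscode(url):
    try:
        # A HEAD request is faster because only the headers are fetched
        r = requests.head(url, verify=False, timeout=25, allow_redirects=False)
        return r.status_code
    except requests.exceptions.RequestException:
        # The request failed (timeout, DNS error, connection refused, ...)
        return -1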

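def sitemap_status_code_checker(site, SLEEP):
    # Get all URLs from the sitemap (sub-sitemaps are followed automatically)
    print("Start scraping sitemap urls")
    sitemap = adv.sitemap_to_df(site)
    sitemap = sitemap.dropna(subset=["loc"]).reset_index(drop=True)
    url_list = list(sitemap["loc"].unique())
    print("all sitemap urls have been scraped")

    print("Checking status code")
    # Loop over the full list and pause between requests
    url_statuscodes = []
    for url in url_list:
        check = [url, getStatuscode(url)]
        url_statuscodes.append(check)
        time.sleep(SLEEP)

    # Save the result as a CSV file: URL first, status code second
    url_statuscodes_df = pd.DataFrame(url_statuscodes, columns=["url", "status_code"])
    url_statuscodes_df.to_csv("sitemapUrls_withStatusCode.csv", index=False)
    print("sitemapUrls_withStatusCode.csv created and saved")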

# Enter your XML Sitemap
sitemap = ""

SLEEP = 1.0 # Time in seconds the script should wait between requests

# Run the check
sitemap_status_code_checker(sitemap, SLEEP)
