Did you know that you can easily automate a lot of monitoring routines for important SEO KPIs yourself? Here is a small Python recipe that needs only one parameter!
The only parameter you need for this Python script is the sitemap URL. When you run the script, you get the HTTP status code for every URL.
The workflow looks like this:
- If the main sitemap points to multiple sub sitemaps in XML format, the script collects all of them
- All available XML sitemaps are parsed and the URLs are extracted
- All URLs are checked for their status code
- All results are stored in a CSV file: the first column contains the URL, the second one the status code
Schedule your monitoring routines
There are some tools that do the same job, but this Python script has another advantage: you can run a lot of monitoring routines for important SEO KPIs in an automated way. You don’t have to open a desktop application and run the check manually. Schedule daily script runs and send an e-mail for URLs whose status code is not “200”.
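As a starting point for such an alert, here is a minimal sketch that reads the CSV produced by the checker, keeps every URL that did not return 200, and mails the list via the standard-library `smtplib`. The sender, recipient, and SMTP host are placeholders you would replace with your own values:

```python
import smtplib
from email.message import EmailMessage

import pandas as pd


def find_broken_urls(df):
    # Keep every row whose status code is not 200
    # (this also catches the -1 marker the checker writes on request errors)
    return df[df["status_code"] != 200]


def send_alert(broken_df, sender, recipient, smtp_host="localhost"):
    # Build a plain-text alert listing every problematic URL
    msg = EmailMessage()
    msg["Subject"] = f"SEO alert: {len(broken_df)} URLs are not returning 200"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{row.url} -> {row.status_code}" for row in broken_df.itertuples()]
    msg.set_content("\n".join(lines))
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)


# Typical use after the checker has written its CSV, e.g. from a daily cron job:
# results = pd.read_csv("sitemapUrls_withStatusCode.csv")
# broken = find_broken_urls(results)
# if not broken.empty:
#     send_alert(broken, "monitor@example.com", "seo-team@example.com")
```

Scheduled via cron (or Windows Task Scheduler), this turns the checker into a hands-off daily monitor that only bothers you when something actually breaks.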
I’m currently playing around with sending results of multiple SEO audit scripts to Grafana. I like the idea of putting a lot of different audit tasks into a centralized dashboard.
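I haven’t settled on a final setup yet, but assuming an InfluxDB data source behind Grafana, one way to feed the results in is to convert each row of the checker’s output into InfluxDB line protocol. The measurement name below is my own choice, and actually posting the lines to InfluxDB’s write endpoint is left out:

```python
import time

import pandas as pd


def to_influx_lines(df, measurement="seo_status"):
    # Convert each result row into one InfluxDB line-protocol record,
    # tagging the URL and storing the status code as an integer field
    ts = int(time.time() * 1e9)  # nanosecond timestamp, shared by the whole batch
    lines = []
    for row in df.itertuples():
        # Commas, spaces, and equals signs must be escaped in tag values
        url_tag = (
            row.url.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")
        )
        lines.append(f"{measurement},url={url_tag} status_code={row.status_code}i {ts}")
    return lines
```

Each daily run then appends a fresh batch of points, and a Grafana panel can plot the count of non-200 URLs over time alongside other audit metrics.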
But now have fun with the script!
```python
# Pemavor.com Sitemap URL Status Code Checker
# Author: Stefan Neefischer
# 1) Enter your Sitemap URL
# 2) Get CSV File with URL and Http Status Code

import advertools as adv
import pandas as pd
import requests
import time
import warnings

warnings.filterwarnings("ignore")


def getStatuscode(url):
    try:
        # It is faster to only request the header
        r = requests.head(url, verify=False, timeout=25, allow_redirects=False)
        return r.status_code
    except requests.RequestException:
        return -1


def sitemap_status_code_checker(site, SLEEP):
    # Get all URLs from the sitemap (sub sitemaps are followed automatically)
    print("Start scraping sitemap urls")
    sitemap = adv.sitemap_to_df(site)
    sitemap = sitemap.dropna(subset=["loc"]).reset_index(drop=True)
    url_list = list(sitemap["loc"].unique())
    print("all sitemap urls have been scraped")
    print("Checking status code")
    # Loop over the full list
    url_statuscodes = []
    for url in url_list:
        print(url)
        check = [url, getStatuscode(url)]
        time.sleep(SLEEP)
        url_statuscodes.append(check)
    # Save the result as a CSV file
    url_statuscodes_df = pd.DataFrame(url_statuscodes, columns=["url", "status_code"])
    url_statuscodes_df.to_csv("sitemapUrls_withStatusCode.csv", index=False)
    print("sitemapUrls_withStatusCode.csv created and saved")


# Enter your XML Sitemap
sitemap = "https://www.pemavor.com/sitemap.xml"
SLEEP = 1.0  # Time in seconds the script should wait between requests
sitemap_status_code_checker(sitemap, SLEEP)
```