Python Requests Retry Failed Requests
Although the Requests library makes it easy to send HTTP requests in Python, requests often fail because of network connection issues, overloaded servers, or other reasons. This tutorial covers the common causes of failure and shows you how to build a Python Requests retry script that attempts failed requests again.
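The two main methods we’ll cover are using an existing retry wrapper (Python Sessions with HTTPAdapter) and coding your own retry wrapper.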
What to Know to Build the Python Requests’ Retry Logic
Should retries be attempted in all cases or only in specific scenarios? When is the appropriate time to retry, and how many attempts should be made?
In this section, we’ll answer those questions and provide code examples to help you build the retry logic for a Python Requests web crawler.
Types of Failed Requests
Understanding the reasons behind a failed request will allow you to develop strategies to deal with each case. Essentially, failed requests fall into two groups: requests that timed out (there’s no response from the server) and requests that returned an error.
Let’s see each one.
Timed out
A request may time out, resulting in no response from the server. That can happen for several reasons, such as overloaded servers, problems with how the server responds, or slow network connections.
When a request times out, check your internet connection first: if it’s stable, the problem is probably on the server side.
You can catch exceptions related to timeouts, such as requests.Timeout, and implement a Python retry mechanism conditionally or with strategies like exponential backoff. We’ll look at these later on.
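As a minimal sketch of catching a timeout, the hypothetical helper below (fetch_or_none, with an assumed default timeout of 5 seconds) returns None instead of raising when the server doesn’t answer in time:

```python
import requests


def fetch_or_none(url, timeout=5):
    """Return the response, or None if the request timed out."""
    try:
        # Without an explicit timeout, Requests waits indefinitely.
        return requests.get(url, timeout=timeout)
    except requests.Timeout:
        # Raised for both connection and read timeouts.
        return None


# Example call (placeholder URL):
# response = fetch_or_none("https://example.com")
```

A caller can then decide whether a None result deserves another attempt.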
Returned an Error
When a request is unsuccessful, it’ll most often return an error response, which typically comes with a specific status code and an error message. The status code tells you what went wrong, and the message adds details that can point to the actual problem.
Your first approach to addressing this scenario is to review both the status code and error message while ensuring that the request is properly formed. If you suspect that the error results from a temporary problem or server issues, you may retry the request with caution.
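One way to review both pieces of information is sketched below. The helper name describe_failure and the 200-character preview length are assumptions made for illustration:

```python
import requests


def describe_failure(response):
    """Summarize an unsuccessful response, or return None if it succeeded."""
    if response.ok:  # True for status codes below 400
        return None
    # reason is the short phrase sent with the status code, e.g. "Not Found";
    # the body often carries a longer, server-specific error message.
    body_preview = response.text[:200]
    return f"{response.status_code} {response.reason}: {body_preview}"


# Example call (placeholder URL):
# print(describe_failure(requests.get("https://example.com/missing")))
```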
Status Codes for a Python Requests Retry Loop
Errors in client-server communication fall in the 4xx (client error) and 5xx (server error) status code ranges. They include:
- 400 Bad Request.
- 401 Unauthorized.
- 403 Forbidden.
- 404 Not Found.
- 405 Method Not Allowed.
- 408 Request Timeout.
- 429 Too Many Requests.
- 500 Internal Server Error.
- 501 Not Implemented.
- 502 Bad Gateway.
- 503 Service Unavailable.
- 504 Gateway Timeout.
- 505 HTTP Version Not Supported.
The most common ones you’ll see while web scraping are:
Error Code | Explanation |
---|---|
403 Forbidden | The server understands the request but won’t fulfill it because the client doesn’t have the right permissions or access. |
429 Too Many Requests | The server has received too many requests from the same IP within a given time frame, so it’s rate-limiting the client. |
500 Internal Server Error | A generic server error occurred, indicating that something went wrong on the server while processing the request. |
502 Bad Gateway | The server acting as a gateway or proxy received an invalid response from an upstream server. |
503 Service Unavailable | The server is too busy or undergoing maintenance and can’t handle the request right now. |
504 Gateway Timeout | An upstream server didn’t respond quickly enough to the gateway or proxy. |
You can check out the MDN docs for more information on HTTP response status codes.
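To decide in code whether a status is worth retrying, a small lookup is usually enough. The sketch below assumes the retry list 429, 500, 502, 503, and 504 (the codes this tutorial retries on in Method 1) and uses the standard library’s HTTPStatus to print each code’s phrase. The names RETRYABLE_STATUS_CODES and is_retryable are hypothetical:

```python
from http import HTTPStatus

# Assumed retry list: rate limiting plus transient server-side errors.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}


def is_retryable(status_code):
    """Return True if the status code is on the assumed retry list."""
    return status_code in RETRYABLE_STATUS_CODES


for code in (403, 404, 429, 503):
    verdict = "retry" if is_retryable(code) else "don't retry"
    print(code, HTTPStatus(code).phrase, "->", verdict)
```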
Number of Retries
Setting the number of retries for a failed request depends on several considerations, such as the type of request error and the response time. Errors like 429 Too Many Requests are temporary and should have more retries than those that aren’t.
While there’s no best maximum number of retries, it’s recommended to set a reasonable limit to avoid indefinite retries and potential performance issues. You can start with small values like three or five.
Delay
Delays between requests should be set to prevent websites and APIs from becoming overloaded and to maintain compliance with rate limits.
Fixed or Random Delay
A fixed delay between requests can be introduced using the time.sleep() function from the time module. To add randomness to the delay, you can combine time.sleep() with the random module.
Just like the number of retries, there isn’t a rule set in stone for how long the delay should be, but you can experiment with different reasonable delay values around 300ms to find an optimal balance.
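Both kinds of delay can be sketched with the standard library alone. The helper names random_delay and pause, and the default range of 0.1 to 0.5 seconds around 300ms, are assumptions for illustration:

```python
import random
import time


def random_delay(base=0.3, jitter=0.2):
    """Return a delay in seconds drawn uniformly from base - jitter to base + jitter."""
    return random.uniform(base - jitter, base + jitter)


def pause(fixed=None):
    """Sleep for a fixed delay if one is given, otherwise for a random one."""
    delay = fixed if fixed is not None else random_delay()
    time.sleep(delay)
    return delay


# pause(fixed=0.3)  # always waits 300 ms
# pause()           # waits a random time around 300 ms
```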
Backoff Strategy for the Delays
The backoff strategy is a commonly used technique that sets increasing delays between retries instead of fixed or random ones. The delay grows exponentially with each retry, scaled by a backoff factor. This approach generally helps to handle temporary issues while avoiding overloading servers.
The backoff algorithm is this:
For example, here are the delay sequences for backoff factors 2, 3, and 10:
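As a rough sketch, assuming the delay before retry n is backoff_factor * 2 ** (n - 1) (a common form of exponential backoff; libraries differ on details such as whether the first retry waits at all), the hypothetical helper delay_sequence computes the first five delays for each factor:

```python
def delay_sequence(backoff_factor, retries=5):
    """Delays in seconds before retries 1..retries, doubling each time."""
    return [backoff_factor * 2 ** (attempt - 1) for attempt in range(1, retries + 1)]


for factor in (2, 3, 10):
    print(factor, delay_sequence(factor))
```

Under this assumed formula, a factor of 2 gives delays of 2, 4, 8, 16, and 32 seconds.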
Best Methods to Retry Python Requests
In this section, we’ll look at the best methods to retry Python Requests. They include:
- Use an existing retry wrapper: Python Sessions with HTTPAdapter.
- Code your retry wrapper.
We recommend the first one, but the second one might be suitable in some scenarios.
Method 1: Use an Existing Retry Wrapper: Python Sessions with HTTPAdapter
Python Requests uses the urllib3 HTTP client under the hood. With the HTTPAdapter class from Requests and the Retry utility class from the urllib3 package, you can set up retries in Python. The HTTPAdapter class lets you specify a retry strategy and also change the behavior of requests.
Retry on Failure
To implement the Python Requests retry logic in case of failure, start by defining options.
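A minimal sketch of such a setup, using the values discussed in this section, could look like this (the URL in the final comment is a placeholder):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 4 times, and only for these response status codes.
retry_strategy = Retry(
    total=4,
    status_forcelist=[429, 500, 502, 503, 504],
)

# The adapter carries the retry strategy.
adapter = HTTPAdapter(max_retries=retry_strategy)

# The session applies the adapter to every URL matching the mounted prefixes.
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# Example call (placeholder URL):
# response = session.get("https://example.com")
```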
We set the maximum number of retries to 4 and specified that the request should only be reattempted if the response has a status code of 429, 500, 502, 503, or 504.
The retry strategy is passed to the HTTPAdapter when creating a new adapter object. The adapter is then mounted to a session object, which is used to make all requests.
Sessions and HTTPAdapter with a Backoff Strategy
To use the backoff strategy to set increasing delays between retries, add the backoff_factor parameter in the retry wrapper:
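The sketch below wraps the setup in a hypothetical helper, make_retry_session, with an assumed backoff_factor of 2:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retry_session(backoff_factor=2):
    """Build a session that retries with growing delays between attempts."""
    retry_strategy = Retry(
        total=4,
        backoff_factor=backoff_factor,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


# Example call (placeholder URL):
# response = make_retry_session().get("https://example.com")
```

urllib3 derives the actual sleep time from backoff_factor and the number of consecutive failed attempts. The exact formula, and whether the first retry waits at all, can vary between urllib3 versions, so it’s worth checking the documentation for the version you have installed.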
Method 2: Code Your Retry Wrapper
Unlike in the previous option, we’ll create a custom wrapper for the retry logic now. That way, you’ll have the flexibility of implementing a custom error handler, logging, and more.
Python Requests: Retry on Failure
To keep it simple, let’s create the Python function retry_request to mimic the retry logic of method 1.
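One possible shape for this function is sketched below. The decisions to count total as retries on top of the first attempt and to return the last response when retries run out are assumptions of this sketch:

```python
import requests


def retry_request(url, total=4, status_forcelist=(429, 500, 502, 503, 504), **kwargs):
    """GET url, retrying on connection problems or on the listed status codes.

    Returns the first response whose status is not in status_forcelist.
    If every attempt fails, returns the last response received (or None
    when no attempt produced a response at all).
    """
    last_response = None
    for _ in range(total + 1):  # one initial attempt plus up to `total` retries
        try:
            response = requests.get(url, **kwargs)
        except (requests.ConnectionError, requests.Timeout):
            continue  # network problem: try again
        if response.status_code not in status_forcelist:
            return response  # success, or an error that isn't worth retrying
        last_response = response
    return last_response


# Example call (placeholder URL):
# response = retry_request("https://example.com", timeout=10)
```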
It takes in the target URL as its first argument, then total for the number of retries, and status_forcelist to specify the status codes for which to retry the request.
Retry Python Requests with a Backoff Strategy
To retry Python Requests with a backoff strategy, take the previous code as a base. Then, create a separate function named backoff_delay to calculate the delay and use the time.sleep() function to make it happen like this:
Using the backoff_delay function, you’ll have the following:
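A sketch of both pieces together follows. It assumes the delay before retry n is backoff_factor * 2 ** (n - 1) and that total counts retries on top of the first attempt. The parameter names mirror those used earlier, but the body is an illustration rather than a definitive implementation:

```python
import time

import requests


def backoff_delay(backoff_factor, attempts):
    """Seconds to wait before retry number `attempts` (1-based)."""
    return backoff_factor * (2 ** (attempts - 1))


def retry_request(url, backoff_factor=2, total=4,
                  status_forcelist=(429, 500, 502, 503, 504), **kwargs):
    """GET url, sleeping for an exponentially growing delay before each retry."""
    last_response = None
    for attempt in range(total + 1):  # attempt 0 is the initial request
        if attempt > 0:
            time.sleep(backoff_delay(backoff_factor, attempt))
        try:
            response = requests.get(url, **kwargs)
        except (requests.ConnectionError, requests.Timeout):
            continue  # network problem: back off and try again
        if response.status_code not in status_forcelist:
            return response
        last_response = response
    return last_response
```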
Avoid Getting Blocked by Error 403 with Python Requests
Getting blocked because you’re identified as a bot is the biggest problem when crawling. Some websites may block your IP address or take other measures to prevent you from accessing the site if you are detected as a bot.
To prove this, let’s attempt to scrape a protected page on G2.com:
Run the code, and you’ll get a response like the one below, indicating that you were blocked with a 403 error:
Best Practice: Retry Python Requests with a Decorator
Using a decorator to implement retries is a cleaner approach, as the Python Requests retry logic contained in the decorator can be easily applied to multiple methods or functions.
Instead of implementing the decorator yourself, you can use Tenacity, a community-maintained package that simplifies the process of adding retry behavior to requests.
Start by installing Tenacity with pip install tenacity.
The retry decorator from Tenacity takes in arguments like stop for when to give up (for example, after a maximum number of attempts) and wait for the delay between attempts, among others.
Here you have it implemented in a scraper:
POST Retry with Python Requests
In addition to the GET method we’ve been using, requests with other methods can be retried, such as POST for creating new resources on the server (for example, to submit a form) and PUT for updating existing resources.
You can use Tenacity to retry a POST request by replacing requests.get with requests.post in the decorated function.
Conclusion
Handling failed requests is critical to building a robust and reliable web scraper. In this tutorial, we looked into the importance of retrying failed requests and what to know to code them. Now you know:
- The most important Python Requests retry logic considerations.
- The two best options for retries.
- How to retry requests with different HTTP methods.
One of the biggest challenges is getting access denied because you’re detected as a bot. To overcome that barrier, a popular web scraping API like ZenRows will help prevent you from getting blocked and save you tons of time and effort against anti-bot measures.
Frequent Questions
How Do You Retry a Request in Python?
You can retry a request in Python by either using the existing retry wrapper (a Requests Session with HTTPAdapter) or creating a custom wrapper with loops and exception handling.
How Do You Force Keep-Alive in Python Requests?
You can force Keep-Alive in Python Requests by using the Session object and setting the Connection header to keep-alive. That lets the underlying TCP connection be reused for other requests to the same server, which improves performance when making multiple requests to the same endpoint.
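A short sketch of that setup follows (the URLs in the comments are placeholders):

```python
import requests

# Create a session object so connections are pooled and reused.
session = requests.Session()

# Set the header explicitly; Requests sessions send it by default as well.
session.headers.update({"Connection": "keep-alive"})

# Requests made through the same session can reuse the open connection:
# first = session.get("https://example.com/page-1")
# second = session.get("https://example.com/page-2")
```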
How Do You Handle Timeouts in Python Requests?
You can handle timeouts using the timeout parameter, which specifies the maximum amount of time (in seconds) that the request should wait for a response before raising a timeout exception.
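The timeout parameter also accepts a (connect, read) tuple to limit the two phases separately. The sketch below assumes a hypothetical fetch_status helper and limits of 3 and 10 seconds:

```python
import requests


def fetch_status(url, connect_timeout=3, read_timeout=10):
    """Describe the outcome of a GET request with separate timeouts."""
    try:
        # A (connect, read) tuple sets the two limits independently;
        # a single number would apply to both.
        response = requests.get(url, timeout=(connect_timeout, read_timeout))
    except requests.ConnectTimeout:
        return "connect timeout"  # the connection wasn't established in time
    except requests.ReadTimeout:
        return "read timeout"  # connected, but the server was too slow to respond
    return f"status {response.status_code}"


# Example call (placeholder URL):
# print(fetch_status("https://example.com"))
```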
What is Error Code 404 in Python Requests?
Error code 404 is the HTTP status Not Found, which means that the server couldn’t find the requested resource.