I am getting hit numerous times by crawlers on a page which triggers an API call. I would like to limit access to that page for bots who do not respect my robots.txt.
Note: This question is not a duplicate; I want rate limiting, not IP blacklisting.
Check out the Rack::Attack gem! It is battle-tested in production environments.
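For example, a throttle rule in config/initializers/rack_attack.rb along these lines caps requests to a given path per IP; the "/geocode" path, limit, and period are placeholders to adapt to your own route:

# config/initializers/rack_attack.rb
# Throttle hits to the geocoding endpoint to 20 per hour per client IP.
Rack::Attack.throttle("geocode/ip", limit: 20, period: 1.hour) do |req|
  req.ip if req.path.start_with?("/geocode")
end

Throttled requests get an HTTP 429 response by default in recent versions (503 in older ones), and you can also safelist well-behaved bots.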
If you are already using Redis in your project, you can implement a request counter for API requests very simply. This approach lets you go beyond limiting bot access and throttle different API requests with different policies, based on your preferences. Take a look at this gem, or follow this guide if you want to implement the limit yourself.
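A bare-bones counter along those lines with the redis-rb gem uses INCR plus EXPIRE per IP; the REDIS constant, key name, and limits below are assumptions for the sketch, not something the gem or guide mandates:

# Sketch of a controller before_action; assumes a configured client, e.g. REDIS = Redis.new.
def throttle_api_calls
  key = "api_requests:#{request.remote_ip}"
  count = REDIS.incr(key)                       # atomic per-IP counter
  REDIS.expire(key, 1.hour.to_i) if count == 1  # start the 1-hour window on the first hit
  head :too_many_requests if count > 20         # respond 429 once over the limit
end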
So, for anyone interested, I found an alternative solution that works without adding Rack::Attack or Redis. It's a little hacky, but hey, it might help someone else.
# Count this IP's requests in the Rails cache and block once the limit is hit.
count = 0
unless Rails.cache.read("user_ip_#{get_ip}_count").nil?
  count = Rails.cache.read("user_ip_#{get_ip}_count") + 1
  if count > 20
    flash[:error] = "You're doing that too much. Slow down."
    redirect_to root_path and return false
  end
end
Rails.cache.write("user_ip_#{get_ip}_count", count, expires_in: 60.minutes)
This limits any requests to the geocoder to 20 per hour. The get_ip helper below makes it easy to test locally with a fixed IP:
def get_ip
  if Rails.env.production?
    @ip = request.remote_ip
  else
    @ip = "{YOUR_IP}" # hard-code your own IP here when testing locally
  end
end
Update
I thought this was a great idea, but it turns out it doesn't work because crawlers keep changing their IP addresses. I have instead implemented this rather simple check:
if request.bot?
  Rails.logger.info "Bot Request Denied from #{get_ip}"
  flash[:error] = "Bot detected."
  redirect_to root_path and return false
end
This uses the handy Rails gem voight_kampff, which provides the request.bot? check.
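If it helps, after adding gem 'voight_kampff' to the Gemfile, one way to apply the check to every hit on the action is a before_action; the controller and filter names below are illustrative, not part of my actual app:

class GeocoderController < ApplicationController
  before_action :block_bots

  private

  def block_bots
    return unless request.bot?  # request.bot? comes from voight_kampff
    Rails.logger.info "Bot Request Denied from #{get_ip}"
    flash[:error] = "Bot detected."
    redirect_to root_path and return false
  end
end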