
Using Python 3 and the requests library, I am trying to download an xlsx file from an HTTPS ASP.NET form page. I create a session and first try to log in to the site. The site uses HTTPS, but I do not have access to the SSL certificate, so I am passing verify=False, which I am happy with for this purpose.

I have manually set the User-Agent header (with help from here) to match the one my browser sends, as captured in the Network tab of the IE F12 developer tools. The page seems to require a browser user agent, and the default python-requests user agent may be forbidden.

I am also capturing __VIEWSTATE and __VIEWSTATEGENERATOR from the response text, as advised in this answer, and adding them to my POST data along with the username and password.

import requests
import bs4


login_payload = {'ctl00_txtEmailAddr':my_login, 'ctl00_txtPwd': pwd}
headers = {'User-Agent': user_agent,
       'Accept':r'*/*',
       'Accept-Encoding':r'gzip, deflate',
       'Connection': r'Keep-Alive'}

s = requests.Session()
req = requests.Request('GET', my_url, headers=headers)
prep0 = s.prepare_request(req)
s.headers.update(headers)
resp = s.send(
            prep0,
            verify=False,
            allow_redirects=True,
         )

soup = bs4.BeautifulSoup(resp.text)
login_payload["__VIEWSTATE"] = soup.select_one("#__VIEWSTATE")["value"]
login_payload["__VIEWSTATEGENERATOR"] = soup.select_one("#__VIEWSTATEGENERATOR")["value"]

req_login = requests.Request('POST', juvo_url, headers=s.headers,
                             data=login_payload)
prep1 = s.prepare_request(req_login)
login_resp = s.send(prep1, verify=False)
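
For comparison, here is a minimal sketch of collecting every hidden input on the login page rather than only the two I pick out by id. It uses the standard library's html.parser instead of bs4 so that it runs without extra packages. The class name, the helper name and the sample markup are all hypothetical, and I am only assuming that the server expects the other hidden fields to be posted back.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing or empty value attribute is stored as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html_text):
    """Return a dict of all hidden form fields found in html_text."""
    parser = HiddenInputCollector()
    parser.feed(html_text)
    return parser.fields


# Hypothetical markup, only to show the shape of the result.
sample = (
    '<form><input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />'
    '<input type="hidden" name="__EVENTTARGET" value="" />'
    '<input type="text" name="ctl00$txtEmailAddr" /></form>'
)
fields = hidden_fields(sample)
print(fields)
```

With the real page this would be called as login_payload.update(hidden_fields(resp.text)) before building the POST request.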

Here is the rest of the request body as captured from the browser, in case it helps. I am not sending these fields in my code.

__EVENTTARGET=&__EVENTARGUMENT=&forErrorMsg=&ctl00%24txtEmailAddr=*MYLOGIN*&ctl00%24txtPwd=*MYPASSWORD*&ctl00%24ImgBtnLoging.x=0&ctl00%24ImgBtnLoging.y=0
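
To make the captured body easier to compare with my payload, here is a small sketch that decodes it with urllib.parse.parse_qsl and lists the field names the browser sends that my code does not. The set my_payload_keys is copied by hand from the code above. Whether the server actually cares about these differences is an assumption I have not verified.

```python
from urllib.parse import parse_qsl

# Body exactly as captured from the browser (credentials redacted).
captured_body = (
    "__EVENTTARGET=&__EVENTARGUMENT=&forErrorMsg="
    "&ctl00%24txtEmailAddr=*MYLOGIN*&ctl00%24txtPwd=*MYPASSWORD*"
    "&ctl00%24ImgBtnLoging.x=0&ctl00%24ImgBtnLoging.y=0"
)

# keep_blank_values=True keeps the empty fields such as __EVENTTARGET.
browser_fields = dict(parse_qsl(captured_body, keep_blank_values=True))

# Keys my Python payload currently sends.
my_payload_keys = {
    "ctl00_txtEmailAddr",
    "ctl00_txtPwd",
    "__VIEWSTATE",
    "__VIEWSTATEGENERATOR",
}

# Field names present in the browser request but absent from my payload.
missing = sorted(set(browser_fields) - my_payload_keys)
for name in missing:
    print(repr(name), "=", repr(browser_fields[name]))
```

The decoded names use $ (for example ctl00$txtEmailAddr) where my payload uses an underscore, so the browser's login fields show up in the missing list as well.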

In other attempts, with more code on top of the above, every page I request either returns "Object moved to here" (where "here" is a direct link to the file I need, and that link works in the browser) or redirects me to the login page. This includes requesting the file through the direct hyperlink copied from IE.
If I try to download the file in Python using the direct link taken from the response history, I get an HTML file containing the same thing: either "Object moved to here" or the HTML of the login page, depending on the response.

The response status is always 302 or 200, as seen with urllib3 debug logging enabled, but I have yet to see any response body other than the login page or "Object moved to here".
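
A minimal sketch of how the redirect chain could be listed for one response, to see exactly where each 302 points. It relies only on the history, status_code, url and headers attributes that a requests response has. The function name is hypothetical, and the stand-in objects and example.invalid URLs below are made up so the snippet runs without touching the real site.

```python
from types import SimpleNamespace


def redirect_chain(resp):
    """Return (status_code, url, Location header) for every hop, final response last."""
    hops = list(resp.history) + [resp]
    return [(r.status_code, r.url, r.headers.get("Location")) for r in hops]


# Stand-ins for requests.Response objects (hypothetical URLs).
first_hop = SimpleNamespace(
    status_code=302,
    url="https://example.invalid/report.aspx",
    headers={"Location": "/login.aspx"},
    history=[],
)
final = SimpleNamespace(
    status_code=200,
    url="https://example.invalid/login.aspx",
    headers={},
    history=[first_hop],
)

chain = redirect_chain(final)
for status, url, location in chain:
    print(status, url, "->", location)
```

With the real session this would be called as redirect_chain(login_resp). A chain that ends on the login page would suggest the session is not being treated as authenticated.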

The closest I have got is the set of response headers below, returned by a GET request to the URL copied from the browser after I changed the date in it (in Python) to the one I am interested in. (This may actually be a website vulnerability, if I can get this far without being logged in.)

{'Cache-Control': 'private', 'Content-Length': '873', 'Content-Type': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; charset=utf-8', 'Location': 'redacted login page with a whole load of params', 'Server': 'Microsoft-IIS/7.5', 'content-disposition': 'attachment;filename='redacted filename', 'X-AspNet-Version': '2.0.50727', 'X-Powered-By': 'ASP.NET'}
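
Since these headers claim a spreadsheet but the body is only 873 bytes, a quick check of the body itself may help. An xlsx file is a ZIP archive, so a real one starts with the bytes PK\x03\x04. The sketch below classifies a response body on that basis. The function names and the sample bodies are hypothetical.

```python
ZIP_MAGIC = b"PK\x03\x04"  # local file header signature that starts a ZIP archive


def looks_like_xlsx(body):
    """True if the bytes start like a ZIP archive, as every xlsx file does."""
    return body[:4] == ZIP_MAGIC


def classify_body(body):
    """Rough guess at what a downloaded body really is, ignoring Content-Type."""
    if looks_like_xlsx(body):
        return "xlsx"
    text = body[:2048].decode("utf-8", errors="replace").lower()
    if "object moved" in text:
        return "redirect page"
    if "<html" in text:
        return "html"
    return "unknown"


# Made-up bodies, only to show the three outcomes.
print(classify_body(b"PK\x03\x04" + b"\x00" * 16))
print(classify_body(b"<html><head><title>Object moved</title></head></html>"))
print(classify_body(b"<html><body><form id='login'></form></body></html>"))
```

With the real response this would be called as classify_body(resp.content) before writing anything to disk.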

With almost every SO hyperlink now purple, any clues/suggestions would be greatly appreciated.
Many thanks.

Martin Gergov
jmejay
    Just a suggestion: compare your Python response with the browser's and find the differences, then imitate the browser request as closely as you can. – KC. Dec 05 '18 at 06:27
