
I have this script:

    import requests
    from requests import get
    from bs4 import BeautifulSoup
    import csv
    import pandas as pd
    f = open('olanda.csv', 'wb')
    writer = csv.writer(f)

    url = ('https://www......')
    response = get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    type(soup)

    table = soup.find('table', id='tablepress-94').text.strip()

    print(table)

    writer.writerow(table.split(), delimiter = ',')

    f.close()

When it writes to the CSV file, everything ends up in a single cell, like this:

Sno.,CompanyLocation,1KarifyNetherlands,2Umenz,Benelux,BVNetherlands,3TovertafelNetherlands,4Behandeling,BegrepenNetherlands,5MEXTRANetherlands,6Sleep.aiNetherlands,7OWiseNetherlands,8Healthy,WorkersNetherlands,9&thijs,|,thuis,in,jouw,situatieNetherlands,10HerculesNetherlands, etc.

I want the output in a single column, with each comma-separated value in its own row.

I tried to use delimiter = ',' but I got:

TypeError: a bytes-like object is required, not 'str'.

Screenshot of the table: [image]

How can I do this? Thanks!

Steven

1 Answer


Please see the documentation for csv.writer.

The delimiter parameter is passed when instantiating the writer, not to the writerow method.
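Additionally, the TypeError you saw comes from opening the file in binary mode: in Python 3, csv.writer needs a file opened in text mode. A minimal sketch of the corrected setup (the header names below are guessed from your output sample):

    import csv

    # open in text mode; the csv docs recommend newline=''
    # ('wb' is what raises "a bytes-like object is required, not 'str'"
    # in Python 3)
    f = open('olanda.csv', 'w', newline='')
    writer = csv.writer(f, delimiter=',')  # delimiter goes here

    writer.writerow(['Sno.', 'Company', 'Location'])
    f.close()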

Furthermore, from what I understand, your approach is wrong: you want to scrape the HTML table and store it as a CSV file, but what you actually do is get the text of the table,

    table = soup.find('table', id='tablepress-94').text.strip()

which is one concatenated string of all the cells with no structure, and then split it on whitespace, which throws the row/column layout away.

Instead, you should be collecting the table values into a dict, e.g. using:

    table = soup.find('table', id='tablepress-94')
    # assuming the table has a
    # <thead><tr><th>Column 1</th>..<th>Column n</th></tr></thead>
    # section with column headers, else adjust accordingly
    column_headers = [th.get_text() for th in table.find('thead').find('tr').find_all('th')]
    print(column_headers)

    # collect the body into a dict: one list of cell texts per column
    data = {}
    row_index = 0
    for row in table.find('tbody').find_all('tr'):
        row_index += 1
        for column_index, td in enumerate(row.find_all('td')):
            # invent a header if this row has more cells than headers
            if column_index >= len(column_headers):
                column_headers.append('Column_' + str(column_index + 1))
            data.setdefault(column_headers[column_index], []).append(td.get_text())

    print(data)
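With the table in your screenshot, data should come out shaped roughly like {'Sno.': ['1', '2', ...], 'Company': ['Karify', ...], 'Location': ['Netherlands', ...]}: one list of cell values per column header.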

Now write your scraped table data, data, into the CSV properly:

    writer.writerow(column_headers)  # write the header row first

    # then write each row of data
    for row in range(row_index):
        writer.writerow([data[column][row] for column in column_headers])
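As a side note, since you already import pandas: a much shorter alternative is pandas.read_html. A sketch assuming the table parses cleanly (read_html also needs lxml or html5lib installed):

    import pandas as pd

    # read_html returns a list of DataFrames; attrs narrows it to this table
    df = pd.read_html(response.text, attrs={'id': 'tablepress-94'})[0]
    df.to_csv('olanda.csv', index=False)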

Nikos M.