
I am selecting data from an Amazon Redshift table with 500 million rows. I have 64-bit Python installed.

Code:

import psycopg2  # PostgreSQL/Redshift driver used by SQLAlchemy
from sqlalchemy import create_engine
import pandas as pd

# Connect to Redshift via the PostgreSQL dialect
engine = create_engine('postgresql://username:pwd@host/dbname')

# Pull the entire table into a DataFrame
data_frame = pd.read_sql_query('SELECT * FROM table_name;', engine)

Every time I run the code I get an "Out of Memory" error. I have 16 GB of RAM. I am not sure how to resolve this issue.

Would really appreciate any help on this! Thanks


1 Answer


First, you are trying to access a very large dataset through SQLAlchemy, while specialized tooling such as BigQuery would be a more suitable choice. I suggest learning about it at https://www.kaggle.com/learn/intro-to-sql

Also, the query most likely returns more data than your machine can hold in memory. Setting a limit on how many rows you pull should help:

data_frame = pd.read_sql_query('SELECT * FROM table_name LIMIT 1000000;', engine)
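
If you eventually need more than a sample of the table, pandas can also stream the result instead of loading everything at once, via the chunksize argument of pd.read_sql_query. A minimal sketch, assuming the same connection string and table name as in the question (the chunk size of 100,000 is just an illustrative value):

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string; replace with your Redshift credentials
engine = create_engine('postgresql://username:pwd@host/dbname')

# With chunksize set, read_sql_query returns an iterator of DataFrames,
# so only one chunk is held in memory at a time
chunks = pd.read_sql_query('SELECT * FROM table_name;', engine, chunksize=100_000)

for chunk in chunks:
    # Process or aggregate each chunk here instead of keeping all rows around
    print(len(chunk))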