PyTip - Saving memory when reading CSV file in pandas

# Saving memory while reading a CSV file in Pandas

A CSV file may contain many columns, but if you only need a few of them, always pass the usecols option to pd.read_csv, as shown below:

import os

import pandas as pd

# All columns in the file, in order (names= supplies them, since the file has no header row)
bse_daily_csv_all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
                          "open", "high", "low", "close", "last", "prevclose",
                          "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]

# Only the columns that are actually needed
bse_daily_csv_use_cols = ['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']

df_bse_daily = pd.read_csv(os.path.join(os.getcwd(), '..', '5. BTD', 'data', 'bse_daily_365d.csv'),
                           sep='|',
                           names=bse_daily_csv_all_cols,
                           usecols=bse_daily_csv_use_cols,
                           skip_blank_lines=True,
                           parse_dates=['ts'])
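
To try the same idea without the BSE file, here is a minimal sketch on a tiny in-memory CSV. The sample rows, the shortened column list and the `load` helper are all made up for illustration; only `names`, `usecols`, `sep` and `parse_dates` come from the call above.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited sample with no header row (values are made up).
SAMPLE = (
    "2020-01-01|500001|AAA|A|Q|10.0|12.0|9.5|11.0\n"
    "2020-01-02|500002|BBB|B|Q|20.0|21.5|19.0|20.5\n"
)

all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
            "open", "high", "low", "close"]
use_cols = ["ts", "sc_code", "high", "low", "close"]


def load(usecols=None):
    """Read the sample; usecols=None keeps every column."""
    return pd.read_csv(io.StringIO(SAMPLE), sep="|", names=all_cols,
                       usecols=usecols, parse_dates=["ts"])


df_all = load()
df_few = load(use_cols)

# deep=True also counts the strings stored in object columns
mem_all = df_all.memory_usage(deep=True).sum()
mem_few = df_few.memory_usage(deep=True).sum()
print(f"all columns: {mem_all} bytes, selected columns: {mem_few} bytes")
```

Columns left out of usecols never make it into the DataFrame, so the saving shows up directly in the memory figure.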

Compare the memory usage field in the two df.info() outputs below:

# when not mentioning usecols
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 657058 entries, 0 to 657057
Data columns (total 16 columns):
ts              657058 non-null datetime64[ns]
sc_code         657058 non-null int64
sc_name         657058 non-null object
sc_group        657058 non-null object
sc_type         657058 non-null object
open            657058 non-null float64
high            657058 non-null float64
low             657058 non-null float64
close           657058 non-null float64
last            657058 non-null float64
prevclose       657058 non-null float64
no_trades       657058 non-null int64
no_of_shrs      657058 non-null int64
net_turnover    657058 non-null int64
tdcloindi       657058 non-null object
isin            657037 non-null object
dtypes: datetime64[ns](1), float64(6), int64(4), object(5)
memory usage: 67.7+ MB
 
# when usecols is mentioned
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 657058 entries, 0 to 657057
Data columns (total 8 columns):
ts            657058 non-null datetime64[ns]
sc_code       657058 non-null int64
sc_name       657058 non-null object
high          657058 non-null float64
low           657058 non-null float64
close         657058 non-null float64
prevclose     657058 non-null float64
no_of_shrs    657058 non-null int64
dtypes: datetime64[ns](1), float64(4), int64(2), object(1)
memory usage: 37.6+ MB
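
The `+` after both figures means pandas has not counted the strings held in the object columns, so the real footprint is larger than reported. A small sketch of how to ask for the full figure, using a made-up three-row frame (the data is hypothetical; `memory_usage` and `info` are regular pandas calls):

```python
import io

import pandas as pd

# Hypothetical frame with one numeric and one object (string) column
df = pd.DataFrame({
    "sc_code": [500001, 500002, 500003],
    "sc_name": pd.Series(["AAA LTD", "BBB LTD", "CCC LTD"], dtype=object),
})

# Shallow: object columns are counted as pointers only (this is the "+" case)
shallow = df.memory_usage().sum()

# Deep: the Python strings behind the object column are measured too
deep = df.memory_usage(deep=True).sum()

# df.info() can print the deep figure as well; capture it in a buffer here
buf = io.StringIO()
df.info(memory_usage="deep", buf=buf)
report = buf.getvalue()

print(f"shallow: {shallow} bytes, deep: {deep} bytes")
print(report)
```

On a frame with string columns the deep figure is the one to compare when checking what usecols actually saved.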

So when you know the structure of the file beforehand, use it: reading only 8 of the 16 columns brought the reported memory usage down from 67.7+ MB to 37.6+ MB.

bobby_dreamer
