PyTip - Saving memory when reading CSV file in pandas

11.04.2020 — python, pandas — 1 min read

# Saving memory while reading a CSV file in Pandas

A CSV file may contain many columns, but if you only need a few of them, always pass the `usecols` option to `pd.read_csv`, as shown below:

    import os

    import pandas as pd

    # Names for all 16 columns, in the order they appear in the file
    bse_daily_csv_all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
                              "open", "high", "low", "close", "last", "prevclose",
                              "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]

    # The subset of columns that is actually needed
    bse_daily_csv_use_cols = ['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']

    df_bse_daily = pd.read_csv(os.path.join(os.getcwd(), '..', '5. BTD', 'data', 'bse_daily_365d.csv'),
                               sep='|',
                               names=bse_daily_csv_all_cols,
                               usecols=bse_daily_csv_use_cols,  # load only these columns
                               skip_blank_lines=True,
                               parse_dates=['ts'])
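
To try the same pattern without the original data file, here is a minimal, self-contained sketch. The three sample rows and the shortened column lists are made up for illustration; only the `names`, `usecols`, `sep` and `parse_dates` options mirror the call above.

```python
import io

import pandas as pd

# Hypothetical pipe-separated sample without a header row (not real BSE data).
SAMPLE = (
    "2020-04-07|500002|ABB LTD|A|900.0|925.5|890.0|915.2\n"
    "2020-04-08|500002|ABB LTD|A|915.0|930.0|905.0|920.4\n"
    "2020-04-09|500002|ABB LTD|A|921.0|940.0|918.0|935.1\n"
)

# Names for every column in the file, in file order.
all_cols = ["ts", "sc_code", "sc_name", "sc_group", "open", "high", "low", "close"]

# Only these columns are parsed and kept in memory.
use_cols = ["ts", "sc_code", "close"]

df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep="|",
    names=all_cols,
    usecols=use_cols,
    parse_dates=["ts"],
)

print(df.columns.tolist())
print(df.dtypes)
```

The resulting frame contains only the three requested columns; the other five are skipped while parsing rather than loaded and dropped afterwards.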

Compare the `memory usage` line in the two outputs below:

    # when not mentioning usecols
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 657058 entries, 0 to 657057
    Data columns (total 16 columns):
    ts              657058 non-null datetime64[ns]
    sc_code         657058 non-null int64
    sc_name         657058 non-null object
    sc_group        657058 non-null object
    sc_type         657058 non-null object
    open            657058 non-null float64
    high            657058 non-null float64
    low             657058 non-null float64
    close           657058 non-null float64
    last            657058 non-null float64
    prevclose       657058 non-null float64
    no_trades       657058 non-null int64
    no_of_shrs      657058 non-null int64
    net_turnover    657058 non-null int64
    tdcloindi       657058 non-null object
    isin            657037 non-null object
    dtypes: datetime64[ns](1), float64(6), int64(4), object(5)
    memory usage: 67.7+ MB

    # when usecols is mentioned
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 657058 entries, 0 to 657057
    Data columns (total 8 columns):
    ts            657058 non-null datetime64[ns]
    sc_code       657058 non-null int64
    sc_name       657058 non-null object
    high          657058 non-null float64
    low           657058 non-null float64
    close         657058 non-null float64
    prevclose     657058 non-null float64
    no_of_shrs    657058 non-null int64
    dtypes: datetime64[ns](1), float64(4), int64(2), object(1)
    memory usage: 37.6+ MB
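
The `+` after each figure means pandas did not count the contents of the `object` (string) columns, so the real footprint is higher. As a rough sketch of how to measure the difference more precisely, the snippet below compares a full read with a `usecols` read using `memory_usage(deep=True)`. The synthetic data and the helper name `total_bytes` are assumptions for illustration, not part of the original analysis.

```python
import io

import pandas as pd

# Build a small synthetic CSV in memory (made-up values, loosely modelled on the columns above).
rows = 1000
full = pd.DataFrame({
    "sc_code": list(range(rows)),
    "sc_name": ["COMPANY %d" % i for i in range(rows)],
    "sc_group": ["A"] * rows,
    "close": [100.0 + i for i in range(rows)],
    "isin": ["ISIN%08d" % i for i in range(rows)],
})
csv_text = full.to_csv(index=False)


def total_bytes(df):
    """Hypothetical helper: total memory in bytes, including string contents."""
    return int(df.memory_usage(deep=True).sum())


df_all = pd.read_csv(io.StringIO(csv_text))
df_some = pd.read_csv(io.StringIO(csv_text), usecols=["sc_code", "close"])

bytes_all = total_bytes(df_all)
bytes_some = total_bytes(df_some)

print("all columns:", bytes_all, "bytes")
print("usecols    :", bytes_some, "bytes")
```

Because string columns are usually the most expensive ones, leaving them out with `usecols` tends to save more than the default `info()` summary suggests.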

Here, loading 8 of the 16 columns reduced the reported memory usage from 67.7+ MB to 37.6+ MB. So when you know the structure of your data beforehand, use it.