bobby_dreamer

PyTip - Saving memory when reading CSV file in pandas

python, pandas

# Saving memory while reading a CSV file in Pandas

A CSV file may contain many columns, but if you only need a few of them, always pass the `usecols` option to `pd.read_csv`, as shown below.

    import os
    import pandas as pd

    # Names for all 16 columns in the file, in order
    bse_daily_csv_all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
                              "open", "high", "low", "close", "last", "prevclose",
                              "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]

    # Only the columns that are actually needed
    bse_daily_csv_use_cols = ['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']

    df_bse_daily = pd.read_csv(
        os.path.join(os.getcwd(), '..', '5. BTD', 'data', 'bse_daily_365d.csv'),
        sep='|',
        names=bse_daily_csv_all_cols,    # column names for every field in the file
        usecols=bse_daily_csv_use_cols,  # load only these columns
        skip_blank_lines=True,
        parse_dates=['ts'],
    )
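To try the same pattern without the original data file, here is a minimal sketch that reads a small in-memory, pipe-delimited sample. The sample rows and the shortened column lists are made up for illustration; only the `names` / `usecols` combination mirrors the call above.

```python
import io
import pandas as pd

# Hypothetical sample data: pipe-delimited, no header row
sample = io.StringIO(
    "2020-01-01|500002|ABB LTD|A|1290.0|1275.5\n"
    "2020-01-01|500003|AEGIS LOGIS|A|195.0|192.3\n"
)

# Illustrative names for every field in the sample
all_cols = ["ts", "sc_code", "sc_name", "sc_group", "high", "low"]
# Subset of columns to actually load
use_cols = ["ts", "sc_code", "high"]

df = pd.read_csv(sample, sep="|", names=all_cols, usecols=use_cols, parse_dates=["ts"])

print(list(df.columns))  # only the columns listed in use_cols are loaded
print(df.dtypes)
```

The columns left out of `use_cols` are never turned into DataFrame columns, so they do not take up memory in the result.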

Compare the `memory usage` line in the two `df.info()` outputs below.

    # when not mentioning usecols
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 657058 entries, 0 to 657057
    Data columns (total 16 columns):
    ts              657058 non-null datetime64[ns]
    sc_code         657058 non-null int64
    sc_name         657058 non-null object
    sc_group        657058 non-null object
    sc_type         657058 non-null object
    open            657058 non-null float64
    high            657058 non-null float64
    low             657058 non-null float64
    close           657058 non-null float64
    last            657058 non-null float64
    prevclose       657058 non-null float64
    no_trades       657058 non-null int64
    no_of_shrs      657058 non-null int64
    net_turnover    657058 non-null int64
    tdcloindi       657058 non-null object
    isin            657037 non-null object
    dtypes: datetime64[ns](1), float64(6), int64(4), object(5)
    memory usage: 67.7+ MB

    # when usecols is mentioned
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 657058 entries, 0 to 657057
    Data columns (total 8 columns):
    ts            657058 non-null datetime64[ns]
    sc_code       657058 non-null int64
    sc_name       657058 non-null object
    high          657058 non-null float64
    low           657058 non-null float64
    close         657058 non-null float64
    prevclose     657058 non-null float64
    no_of_shrs    657058 non-null int64
    dtypes: datetime64[ns](1), float64(4), int64(2), object(1)
    memory usage: 37.6+ MB

Loading 8 of the 16 columns cut the reported memory usage from 67.7+ MB to 37.6+ MB. So when you know the structure of the file beforehand, use it.
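The comparison can also be made in code rather than by reading the `df.info()` output. The following is a rough sketch on synthetic data (the rows and column lists are invented for illustration, not taken from the BSE file). It uses `DataFrame.memory_usage(deep=True)`, which also measures the contents of object columns; the `+` in the `memory usage` figures above indicates that those contents were not fully counted.

```python
import io
import pandas as pd

# Hypothetical data: 1,000 identical pipe-delimited rows, no header row
row = "2020-01-01|500002|ABB LTD|A|1290.0|1275.5\n"
csv_text = row * 1000

all_cols = ["ts", "sc_code", "sc_name", "sc_group", "high", "low"]
use_cols = ["ts", "sc_code", "high"]

# Read every column
df_all = pd.read_csv(io.StringIO(csv_text), sep="|", names=all_cols,
                     parse_dates=["ts"])

# Read only the selected columns
df_some = pd.read_csv(io.StringIO(csv_text), sep="|", names=all_cols,
                      usecols=use_cols, parse_dates=["ts"])

# deep=True also counts the strings stored in object columns
mem_all = df_all.memory_usage(deep=True).sum()
mem_some = df_some.memory_usage(deep=True).sum()

print(f"all columns:  {mem_all:,} bytes")
print(f"with usecols: {mem_some:,} bytes")
```

The exact byte counts depend on the pandas version and platform, so treat the printed numbers as indicative only.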