A CSV file may contain many columns, but if you only need a few specific ones, always pass the usecols option to pd.read_csv, like below:
```python
import os

import pandas as pd

# Names for all 16 pipe-separated fields, in file order
bse_daily_csv_all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
                          "open", "high", "low", "close", "last", "prevclose",
                          "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]

# Only the columns actually needed
bse_daily_csv_use_cols = ['ts', 'sc_code', 'sc_name', 'high', 'low', 'close', 'prevclose', 'no_of_shrs']

df_bse_daily = pd.read_csv(os.path.join(os.getcwd(), '..', '5. BTD', 'data', 'bse_daily_365d.csv'),
                           sep='|',
                           names=bse_daily_csv_all_cols,
                           usecols=bse_daily_csv_use_cols,
                           skip_blank_lines=True,
                           parse_dates=['ts'])
```
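The same pattern can be tried without the original file. This is a minimal sketch that assumes a tiny in-memory, pipe-separated sample in place of bse_daily_365d.csv (the two rows, the "SAMPLE CO" name and the INE000000000 ISIN are made-up placeholders, not real BSE data). It shows that names labels every field while usecols decides which ones end up in the DataFrame:

```python
import io

import pandas as pd

# Placeholder rows with the same 16 pipe-separated fields as the real file
sample = (
    "2019-01-01|100001|SAMPLE CO|A|Q|10.0|11.0|9.5|10.5|10.4|10.1|5|1000|10500|N|INE000000000\n"
    "2019-01-02|100001|SAMPLE CO|A|Q|10.5|11.5|10.0|11.0|10.9|10.5|7|1200|13200|N|INE000000000\n"
)

all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
            "open", "high", "low", "close", "last", "prevclose",
            "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]
use_cols = ["ts", "sc_code", "sc_name", "high", "low", "close",
            "prevclose", "no_of_shrs"]

df = pd.read_csv(
    io.StringIO(sample),
    sep="|",
    names=all_cols,      # a label for every field in the file
    usecols=use_cols,    # only these eight are kept in the DataFrame
    parse_dates=["ts"],
)
print(list(df.columns))
print(df.dtypes)
```

Note that the resulting columns follow the file's order, not the order of the usecols list; here the two happen to match.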
Compare the memory usage line in the DataFrame.info() output below, first without and then with usecols:
```
# when usecols is not specified
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 657058 entries, 0 to 657057
Data columns (total 16 columns):
ts              657058 non-null datetime64[ns]
sc_code         657058 non-null int64
sc_name         657058 non-null object
sc_group        657058 non-null object
sc_type         657058 non-null object
open            657058 non-null float64
high            657058 non-null float64
low             657058 non-null float64
close           657058 non-null float64
last            657058 non-null float64
prevclose       657058 non-null float64
no_trades       657058 non-null int64
no_of_shrs      657058 non-null int64
net_turnover    657058 non-null int64
tdcloindi       657058 non-null object
isin            657037 non-null object
dtypes: datetime64[ns](1), float64(6), int64(4), object(5)
memory usage: 67.7+ MB

# when usecols is specified
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 657058 entries, 0 to 657057
Data columns (total 8 columns):
ts              657058 non-null datetime64[ns]
sc_code         657058 non-null int64
sc_name         657058 non-null object
high            657058 non-null float64
low             657058 non-null float64
close           657058 non-null float64
prevclose       657058 non-null float64
no_of_shrs      657058 non-null int64
dtypes: datetime64[ns](1), float64(4), int64(2), object(1)
memory usage: 37.6+ MB
```
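The trailing + in "67.7+ MB" means the figure is a lower bound: by default info() does not measure the contents of object (string) columns. For an exact comparison, memory_usage(deep=True) or info(memory_usage="deep") can be used. The sketch below assumes a synthetic 10,000-row file with placeholder values standing in for the real BSE data, so only the relative drop is meaningful, not the absolute sizes:

```python
import io

import pandas as pd

all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
            "open", "high", "low", "close", "last", "prevclose",
            "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]
use_cols = ["ts", "sc_code", "sc_name", "high", "low", "close",
            "prevclose", "no_of_shrs"]
text_cols = {"sc_name", "sc_group", "sc_type", "tdcloindi", "isin"}

# Synthetic stand-in for the real file: n rows of placeholder values,
# written out as headerless pipe-separated text.
n = 10_000
data = {}
for col in all_cols:
    if col == "ts":
        data[col] = ["2019-01-01"] * n
    elif col in text_cols:
        data[col] = [f"{col}_{i}" for i in range(n)]
    else:
        data[col] = list(range(n))
buf = io.StringIO()
pd.DataFrame(data).to_csv(buf, sep="|", header=False, index=False)
text = buf.getvalue()


def load(**kwargs):
    """Parse the synthetic text with the same basic options as the answer."""
    return pd.read_csv(io.StringIO(text), sep="|", names=all_cols,
                       parse_dates=["ts"], **kwargs)


full = load()
subset = load(usecols=use_cols)

# deep=True also measures the actual string data held in text columns
full_bytes = full.memory_usage(deep=True).sum()
subset_bytes = subset.memory_usage(deep=True).sum()
print(f"all 16 columns: {full_bytes / 1e6:.2f} MB")
print(f"usecols (8):    {subset_bytes / 1e6:.2f} MB")

# Same kind of report as above, but with exact sizes instead of "+" estimates
subset.info(memory_usage="deep")
```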
So when you know the structure of the file beforehand, use it: here, loading only 8 of the 16 columns cut the reported memory usage from 67.7+ MB to 37.6+ MB.
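Column types are also structure you may know in advance. Going one step beyond usecols (this is not part of the comparison above), the dtype parameter of pd.read_csv can shrink the frame further, for example by storing a repeated text column such as sc_name as category and downcasting numeric columns. This is a sketch under assumptions: the dtype choices are hypothetical and must fit your real data (float32 keeps only about 7 significant digits, int32 overflows above 2,147,483,647), and the rows are placeholders:

```python
import io

import pandas as pd

all_cols = ["ts", "sc_code", "sc_name", "sc_group", "sc_type",
            "open", "high", "low", "close", "last", "prevclose",
            "no_trades", "no_of_shrs", "net_turnover", "tdcloindi", "isin"]
use_cols = ["ts", "sc_code", "sc_name", "high", "low", "close",
            "prevclose", "no_of_shrs"]

# Hypothetical dtypes for the kept columns; verify they fit your data first.
dtypes = {
    "sc_code": "int32",
    "sc_name": "category",   # few distinct names repeated across many days
    "high": "float32",
    "low": "float32",
    "close": "float32",
    "prevclose": "float32",
}

# Placeholder rows: two scrips repeated over 28 days, as in a daily price file.
rows = []
for day in range(1, 29):
    for code, name in [(100001, "SAMPLE ONE"), (100002, "SAMPLE TWO")]:
        rows.append(f"2019-02-{day:02d}|{code}|{name}|A|Q|10.0|11.0|9.5|10.5"
                    f"|10.4|10.1|5|1000|10500|N|INE000000000")
text = "\n".join(rows)


def load(**kwargs):
    """Read only use_cols from the placeholder text, plus any extra options."""
    return pd.read_csv(io.StringIO(text), sep="|", names=all_cols,
                       usecols=use_cols, parse_dates=["ts"], **kwargs)


plain = load()
typed = load(dtype=dtypes)
print(plain.memory_usage(deep=True).sum(), typed.memory_usage(deep=True).sum())
print(typed.dtypes)
```

The trade-off is that you take responsibility for the types: if a value does not fit the declared dtype, read_csv raises an error or you silently lose precision in the float32 columns.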