At times, companies do change their names and due which in the stock exchanges their ticker changes. In NSE, this ticker is called as “SYMBOL” atleast they used to be.
These changes are available in NSE Website here
These change in symbol csv file link is
CSV file looks like this,
So, when you are doing an analysis with Bhavcopy data, if a company changes their symbol, you got to handle it somehow. So, the method, i am using below is, i’m updating the past symbol to the latest/current symbol.
Lets say, you downloaded that file and read that file into a dataframe like below,
# Read the CSV content into a Pandas DataFrame
df_symchange = pd.read_csv(csv_data, names=colnames, header=None)
Below is what i had tried,
time
df_nse = df_nse5y.copy()
# Convert date columns to datetime
df_symchange["dateChanged"] = pd.to_datetime(df_symchange["dateChanged"])
df_nse["Date"] = pd.to_datetime(df_nse["Date"])
# Sort df_symchange by dateChanged to process changes in chronological order
df_symchange = df_symchange.sort_values(by='dateChanged', ascending=True).reset_index(drop=True)
# Map changes using a cumulative replacement strategy
df_nse = df_nse.merge(
df_symchange,
left_on="Stock",
right_on="previous",
how="left",
suffixes=("", "_change")
)
# Replace the stock symbol where the date is within the change window
df_nse["Stock"] = df_nse.apply(
lambda row: row["latest"] if pd.notnull(row["dateChanged"]) and row["Date"] <= row["dateChanged"] else row["Stock"],
axis=1
)
# Drop extra columns added during the merge
df_nse.drop(columns=["previous", "latest", "dateChanged", "company"], inplace=True)
This took,
CPU times: total: 4.53 s
Wall time: 4.6 s
Vectorization usually doesn’t work for me, like outputs wouldn’t match up. This matched.
This is the code, i use to compare dataframes,
def compare_dataframes(df1, df2):
# Fill missing values with a placeholder before comparison
df1_filled = df1.fillna('missing')
df2_filled = df2.fillna('missing')
# Compare element-wise
comparison = df1_filled == df2_filled
# Check where values differ
differences = ~comparison
# Find mismatched rows and columns
mismatched = differences.any(axis=1) # Rows with differences
print("Number of mismatched rows:", mismatched.sum())
print("Rows with differences:\n", df1[mismatched].shape)
Inputs
compare_dataframes(df_nse1, df_nse3)
Output
Number of mismatched rows: 0
Rows with differences:
(0, 5)
After running the program, ‘ADANITRANS’ is updated as ‘ADANIENSOL’
That’s all for now