Note that in your case SettingWithCopyWarning is a valid warning: the chained assignment does not work as expected. df.iloc[idx] returns a copy of the row rather than a view into the original object, so df.iloc[idx]['labels'] = '|'.join(labels) modifies that copy and leaves the row in the original df unchanged. This seems to happen when the dataframe has mixed datatypes.
As for the different results from .loc and .iloc: your row labels differ from the row integer locations (probably because of a train/test split). .loc selects rows (and/or columns) by label, while .iloc selects them by integer location. When you assign through .loc with a row label that does not exist, .loc cannot find it among the existing rows, so it creates a new row.
Please find the examples after the solutions.
Solutions
Basic idea: You should avoid chained assignments and use the correct labels/integer locations.
Solution 1: reset_index and .loc
If you don't need to keep the row index, one solution is to call reset_index before your code and then use df.loc[idx, 'labels'] = '|'.join(labels).
import pandas as pd
df = pd.DataFrame({'instances': ["a", "b", "c", "d"],
'labels': [1, 2, 3, 4]},
index=[0, 2, 4, 5])
df
instances labels
0 a 1
2 b 2
4 c 3
5 d 4
df = df.reset_index(drop=True)
df
instances labels
0 a 1
1 b 2
2 c 3
3 d 4
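This makes the dataframe row labels the same as the row integer locations, so .loc[n, 'labels'] refers to the same cell as .iloc[n, df.columns.get_loc('labels')]. (Note that .iloc accepts only integer locations, so .iloc[n, 'labels'] itself would raise an error.)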
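As a minimal sketch of Solution 1 end to end: the string-valued labels column, the list new_labels and the position idx below are made up for illustration and are not taken from your code.

```python
import pandas as pd

# Hypothetical data: a string 'labels' column and non-contiguous row labels,
# as you might get after a train/test split.
df = pd.DataFrame(
    {"instances": ["a", "b", "c", "d"], "labels": ["x", "y", "z", "w"]},
    index=[0, 2, 4, 5],
)

# Make the row labels equal to the integer locations (0, 1, 2, 3).
df = df.reset_index(drop=True)

new_labels = ["cat", "dog"]  # hypothetical replacement labels
idx = 1                      # hypothetical row position

# One indexing call on df itself, so there is no chained assignment.
df.loc[idx, "labels"] = "|".join(new_labels)
```

After the reset, idx can be read either as a label or as a position, which is why .loc updates the intended row instead of appending a new one.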
Solution 2: Use column integer locations of 'labels' and .iloc
Example: Update labels of the 4th row to 100
col_idx = df.columns.get_loc("labels") # get the column integer locations of 'labels'
df.iloc[3, col_idx] = 100
df
instances labels
0 a 1
2 b 2
4 c 3
5 d 100
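If you would rather stay with .loc while keeping the original index, another option is to translate the integer location into a row label through df.index. This is only a sketch; the variable names pos and row_label are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"instances": ["a", "b", "c", "d"], "labels": [1, 2, 3, 4]},
    index=[0, 2, 4, 5],
)

pos = 3  # integer location of the row to update (the 4th row)

# Option A: pure integer locations with .iloc.
col_idx = df.columns.get_loc("labels")
df.iloc[pos, col_idx] = 100
value_after_iloc = df.iloc[pos, col_idx]

# Option B: look up the row label at that position, then use .loc.
row_label = df.index[pos]  # the label stored at position 3, which is 5 here
df.loc[row_label, "labels"] = 200
```

Both options address the same cell and neither adds a row, because the label passed to .loc is taken from the existing index.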
More Examples
Example of Valid SettingWithCopyWarning
import pandas as pd
df = pd.DataFrame({'instances': ["a", "b", "c", "d"],
'labels': [1, 2, 3, 4]},
index=[0, 2, 4, 5])
df
instances labels
0 a 1
2 b 2
4 c 3
5 d 4
Suppose I want to update the label of the first row to 100.
df.iloc[0]['labels'] = 100
df
This emits the warning and fails to update the value.
/usr/local/lib/python3.7/dist-packages/pandas/core/series.py:1056: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
cacher_needs_updating = self._check_is_chained_assignment_possible()
instances labels
0 a 1
2 b 2
4 c 3
5 d 4
If all columns have the same datatype (e.g. all str or all int), the chained iloc assignment works and does not emit SettingWithCopyWarning, at least in the pandas version used here. Apparently, pandas handles mixed-type and single-type dataframes differently when it comes to chained assignments. See this post, which points to this GitHub issue.
You can also read this post or the pandas documentation to gain a better understanding of chained assignment.
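To compare the two forms side by side, here is a small sketch on the same mixed-dtype dataframe. The warning is silenced only to keep the demo output clean, and the variable names value_after_chained and value_after_single are introduced here for illustration.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {"instances": ["a", "b", "c", "d"], "labels": [1, 2, 3, 4]},
    index=[0, 2, 4, 5],
)
col_idx = df.columns.get_loc("labels")

# Chained assignment: df.iloc[0] is a temporary copy of the mixed-dtype row,
# so the write lands on the copy and is lost.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    df.iloc[0]["labels"] = 100
value_after_chained = df.iloc[0, col_idx]  # still the original value

# Single indexing call: the write goes to df itself.
df.iloc[0, col_idx] = 100
value_after_single = df.iloc[0, col_idx]
```

The only difference between the two writes is whether the row and the column are selected in one indexing call or in two.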
Example of Additional Row by .loc
df
instances labels
0 a 1
2 b 2
4 c 3
5 d 4
The row labels in our example are (0, 2, 4, 5), while the row integer locations are (0, 1, 2, 3). When you assign through .loc with a label that does not exist, it creates a new row.
df.loc[1, 'labels'] = 100
df
instances labels
0 a 1
2 b 2
4 c 3
5 d 4
1 NaN 100
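If you want to rule out this accidental enlargement, one possible guard is to check that the label is in the index before assigning. This is a sketch rather than a required pattern; the names label and updated are made up here.

```python
import pandas as pd

df = pd.DataFrame(
    {"instances": ["a", "b", "c", "d"], "labels": [1, 2, 3, 4]},
    index=[0, 2, 4, 5],
)

label = 1  # not one of the row labels (0, 2, 4, 5)

# Only assign when the label already exists, so .loc cannot append a row.
if label in df.index:
    df.loc[label, "labels"] = 100
    updated = True
else:
    updated = False
```

With the check in place the dataframe keeps its four rows, and updated tells you that nothing was written.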