Non-parametric post-hoc test for a small dataset

Question

In the lab where I work, we performed an experiment to see whether cyanobacteria contribute to changes in the silica concentration of a lake. The experiment had three variables:

the type of organism
whether or not silica was added to the batch
the type of broth (Salda Lake water only, or Salda Lake water mixed with broth)

There were two replicates for every combination of variables, which gave 22 samples in total. We made microscope observations and measured the Mg and Ca concentrations of each sample (18 of the samples have non-NA concentrations).
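Because only 18 of the 22 samples have concentrations, it matters how a hand-written permutation test treats the missing values. The sketch below reuses the p-value logic of `perm_dunn` (the function in the code further down) for a single pair of groups; the helper name `pairwise_perm_p` and the numbers are hypothetical. `np.mean` returns NaN as soon as one value is NaN, and `NaN >= NaN` is False, so the p-value collapses to 0 whenever a group still contains a missing value.

```python
import numpy as np


def pairwise_perm_p(data1, data2, n_perm=1000, seed=42):
    """Same p-value logic as perm_dunn, for a single pair of groups (hypothetical helper)."""
    observed_diff = np.abs(np.mean(data1) - np.mean(data2))
    cmb = np.concatenate([data1, data2])
    rng = np.random.default_rng(seed)
    perm_diffs = []
    for _ in range(n_perm):
        rng.shuffle(cmb)
        perm_diffs.append(np.abs(np.mean(cmb[:len(data1)]) - np.mean(cmb[len(data1):])))
    # If either group holds a NaN, observed_diff and every perm_diff are NaN,
    # and NaN >= NaN is False, so the count (and therefore the p-value) is 0
    return np.sum(np.array(perm_diffs) >= observed_diff) / n_perm


# Hypothetical concentrations; the second group has one missing value
with_na = pairwise_perm_p(np.array([5.0, 6.0, 5.5]), np.array([5.0, 6.0, np.nan]))
without_na = pairwise_perm_p(np.array([5.0, 6.0, 5.5]), np.array([5.0, 6.0]))
print(with_na, without_na)
```

Here the two groups have identical means once the NA is removed, yet the version with the NA reports p = 0. Dropping NA rows before testing (for example with `data.dropna(subset=[dv])`) avoids this.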

I performed a permutation Kruskal-Wallis test to see whether any of the variables above had a significant effect on Mg or Ca concentrations. I ran the test separately for organism type, silica addition and broth type, and found that organism type was significant for both Mg and Ca, while the other variables showed no significant differences.
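For reference, one common way to build a permutation Kruskal-Wallis test is to keep the usual H statistic (here from `scipy.stats.kruskal`, which applies the tie correction) and obtain its null distribution by shuffling the group labels across all samples. This is a minimal sketch, not necessarily how the test above was implemented; `perm_kruskal` and the example data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def perm_kruskal(data, dv, between, n_perm=10000, random_state=42):
    """Permutation Kruskal-Wallis test: shuffle group labels, recompute H."""
    d = data.dropna(subset=[dv])  # drop missing concentrations first
    values = d[dv].to_numpy()
    labels = d[between].to_numpy()
    groups = pd.unique(labels)

    def h_stat(lab):
        # Kruskal-Wallis H (tie-corrected) for a given labelling
        return stats.kruskal(*[values[lab == g] for g in groups]).statistic

    observed = h_stat(labels)
    rng = np.random.default_rng(random_state)
    count = 0
    for _ in range(n_perm):
        # Permuting the labels keeps every group size fixed
        if h_stat(rng.permutation(labels)) >= observed - 1e-12:
            count += 1
    # +1 counts the observed labelling itself, so p is never exactly 0
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical example with three perfectly separated groups
df = pd.DataFrame({"group": list("AAABBBCCC"), "Mg": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
h, p = perm_kruskal(df, "Mg", "group", n_perm=2000)
print(h, p)
```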

When I then ran Dunn's test as a post-hoc test (with Holm's correction), the results were similar, except that it showed no significant differences between organism types for Ca. However, I then learned that, because of my small sample size, a permutation version of Dunn's test might be more appropriate. Because I don't know how to perform a permutation version of Dunn's test (again with Holm's correction), I used code suggested by Copilot. This time, however, the results also showed a significant difference for broth type. Naturally, I'm now wondering whether there is a problem in the code or whether one of the tests made a Type I or Type II error. If there is no problem with the code, I'm thinking of calculating effect sizes. You can find my code below. I'd appreciate any suggestions you might have.
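For the effect-size step, two common rank-based choices are epsilon-squared for the overall Kruskal-Wallis result, E²_R = H / ((n² - 1)/(n + 1)), which simplifies to H/(n - 1), and Cliff's delta for each pair of groups. A minimal sketch follows; the function names and values are hypothetical. With groups this small, the effect sizes themselves will be quite imprecise.

```python
import numpy as np
from scipy import stats


def epsilon_squared(*groups):
    """Rank-based epsilon-squared for a Kruskal-Wallis test: H / (n - 1)."""
    h = stats.kruskal(*groups).statistic
    n = sum(len(g) for g in groups)
    return h / (n - 1)


def cliffs_delta(x, y):
    """Cliff's delta for one pair of groups: P(X > Y) - P(X < Y), in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x[:, None] - y[None, :]  # all len(x) * len(y) pairwise differences
    return (np.sum(diff > 0) - np.sum(diff < 0)) / diff.size


# Hypothetical values for three groups
a, b, c = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
print(epsilon_squared(a, b, c))
print(cliffs_delta(b, a))
```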

P.S. I can also provide my code for the permutation Kruskal-Wallis test if needed.

```python
import itertools as it

import numpy as np
import pandas as pd
import statsmodels.stats.multitest as smt


# Define the permutation function
def perm_dunn(data, dv, between, n_perm=10000, random_state=42):
    """
    Performs a permutation Dunn's test on the given DataFrame.

    Parameters:
    - data: pandas DataFrame containing the data.
    - dv: The name of the dependent variable column (e.g., 'Mg' or 'Ca').
    - between: The name of the categorical grouping variable column.
    - n_perm: Number of permutations to perform.
    - random_state: Seed for reproducibility.

    Returns:
    - result: A pandas DataFrame containing the observed difference, p-value, and Holm's correction.

    Exceptions:
    - Raises a ValueError if the dependent variable is not numeric.
    - Raises a ValueError if the grouping variable is not categorical.
    """
    # Check if the dependent variable is numeric
    if not pd.api.types.is_numeric_dtype(data[dv]):
        raise ValueError('The dependent variable must be numeric.')

    # Check if the grouping variable is categorical
    if data[between].dtype not in ['object', 'category']:
        raise ValueError('The grouping variable must be categorical.')

    # Drop rows where the dependent variable is NA; otherwise np.mean() returns NaN,
    # every comparison with NaN is False, and the p-value comes out as 0
    data = data.dropna(subset=[dv])

    groups = data[between].unique()
    result = {}

    for (group1, group2) in it.combinations(groups, 2):
        data1 = data[data[between] == group1][dv].values
        data2 = data[data[between] == group2][dv].values
        # Test statistic: absolute difference of the raw group means
        # (Dunn's test itself compares mean ranks computed over all groups)
        observed_diff = np.abs(np.mean(data1) - np.mean(data2))

        # Pool only the two groups being compared and reshuffle them
        cmb = np.concatenate([data1, data2])
        perm_diffs = []
        np.random.seed(random_state)
        for _ in range(n_perm):
            np.random.shuffle(cmb)
            perm_data1 = cmb[:len(data1)]
            perm_data2 = cmb[len(data1):]
            perm_diffs.append(np.abs(np.mean(perm_data1) - np.mean(perm_data2)))

        perm_diffs = np.array(perm_diffs)
        # Count the observed labelling as one permutation so p is never exactly 0
        p_value = (np.sum(perm_diffs >= observed_diff) + 1) / (n_perm + 1)
        # Save the statistic, p-value and group names (as strings) for this pair
        result[(group1, group2)] = [observed_diff, p_value, str(group1), str(group2)]

    # Apply Holm's correction across the pairwise comparisons for this variable
    p_values = [value[1] for value in result.values()]
    reject, p_values_corr, _, _ = smt.multipletests(p_values, method='holm')
    for i, key in enumerate(result.keys()):
        result[key].append(p_values_corr[i])

    # Create a DataFrame from the results
    result = pd.DataFrame(result).T
    result.columns = ['Observed Difference', 'p-value', 'Group 1', 'Group 2', 'Holm Correction']
    # Order the columns as Group 1, Group 2, Observed Difference, p-value, Holm Correction
    result = result[['Group 1', 'Group 2', 'Observed Difference', 'p-value', 'Holm Correction']]

    return result


# Perform the permutation Dunn's test for each variable, for Mg and Ca respectively
cyano_tests = cyano.drop(columns=["SampleName", "Replicate"])
mg_results, ca_results = [], []
for col_name in ['Shape', 'Silica', 'Broth']:
    mg_results.append(perm_dunn(data=cyano_tests, dv='Mg', between=col_name))
    ca_results.append(perm_dunn(data=cyano_tests, dv='Ca', between=col_name))
# DataFrame._append is private (DataFrame.append was removed in pandas 2.0); use pd.concat
perm_dunn_mg = pd.concat(mg_results, ignore_index=True)
perm_dunn_ca = pd.concat(ca_results, ignore_index=True)

# Print the results
print(perm_dunn_mg)
print(perm_dunn_ca)
```
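One structural difference worth checking: `perm_dunn` compares raw group means and reshuffles only the two groups of each pair, whereas Dunn's test ranks all observations together and compares mean ranks using a pooled, tie-corrected standard error, z = (R̄ᵢ - R̄ⱼ) / sqrt([N(N+1)/12 - Σ(t³ - t)/(12(N - 1))] (1/nᵢ + 1/nⱼ)). A permutation analogue that keeps those ingredients could look like the sketch below. It assumes "permutation Dunn's test" means: fix the pooled ranks, shuffle group labels across all samples, and count how often |z| reaches the observed value. It is not a reference implementation, and `perm_dunn_ranks` and the example data are hypothetical.

```python
import itertools as it

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests


def perm_dunn_ranks(data, dv, between, n_perm=10000, random_state=42):
    """Sketch of a permutation Dunn's test with Holm's correction.

    Ranks are computed once over ALL groups; group labels are then shuffled
    across all samples and Dunn's |z| is recomputed for every pair.
    """
    d = data.dropna(subset=[dv])
    values = d[dv].to_numpy(dtype=float)
    ranks = stats.rankdata(values)  # average (mid-) ranks for ties
    labels = d[between].to_numpy()
    groups = pd.unique(labels)
    n = len(values)
    # Tie-corrected variance term of Dunn's test; it does not change under permutation
    _, t = np.unique(values, return_counts=True)
    var = n * (n + 1) / 12 - np.sum(t**3 - t) / (12 * (n - 1))
    pairs = list(it.combinations(groups, 2))

    def abs_z(lab):
        # Dunn's |z| = |mean rank difference| / pooled standard error, per pair
        z = []
        for g1, g2 in pairs:
            r1, r2 = ranks[lab == g1], ranks[lab == g2]
            se = np.sqrt(var * (1 / len(r1) + 1 / len(r2)))
            z.append(abs(r1.mean() - r2.mean()) / se)
        return np.array(z)

    observed = abs_z(labels)
    rng = np.random.default_rng(random_state)
    exceed = np.zeros(len(pairs))
    for _ in range(n_perm):
        exceed += abs_z(rng.permutation(labels)) >= observed - 1e-12
    p = (exceed + 1) / (n_perm + 1)  # observed labelling counts as one permutation
    p_holm = multipletests(p, method="holm")[1]
    return pd.DataFrame({
        "Group 1": [g1 for g1, _ in pairs],
        "Group 2": [g2 for _, g2 in pairs],
        "|z|": observed,
        "p-value": p,
        "Holm Correction": p_holm,
    })


# Hypothetical example: three organism groups with three samples each
df = pd.DataFrame({"Shape": list("AAABBBCCC"), "Mg": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
res = perm_dunn_ranks(df, "Mg", "Shape", n_perm=2000)
print(res)
```

Running both functions on the same columns would show how much of the difference between the two sets of results comes from the statistic and the permutation scheme. With very small groups, the number of distinct relabellings is limited, which bounds how small a permutation p-value can get regardless of `n_perm`.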
