
Inputs. I am given a finite set $S$ of symbols. I know there should exist some total order $<$ on $S$, but I'm not given this ordering and it could be anything.

I am also given a collection of assertions. Each assertion takes the form $s_1<s_2<\cdots<s_m$, where $s_1,\dots,s_m$ are distinct symbols of $S$. An assertion probably won't mention all of the symbols of $S$, just a subset, and different assertions will probably cover different subsets.

Warmup problem. The starter problem is: Given $n$ assertions, identify whether they are all internally self-consistent, i.e., whether there exists a total order on $S$ that is consistent with all of the assertions, and if so, output an example of such a total order.

The real problem. In practice, a few assertions might be faulty. Almost all of them should be correct, though. So, the real problem is: if the assertions are not all internally self-consistent, find a minimal subset of assertions to label as "probably-erroneous", such that if you remove the probably-erroneous assertions, the remainder are all self-consistent.

What I know. I know how to solve the warmup problem (just compute the transitive closure of the union of the partial orders given by each assertion, and check that the result is antisymmetric; or, in other words, create a graph with $S$ as vertex set and an edge $s\to t$ if $s<t$ appears in any assertion, then check for cycles). However, I don't know how to solve the real problem. Any ideas?
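The graph-based check for the warmup problem can be sketched as follows. This is only an illustrative sketch, assuming symbols are hashable values and each assertion is a list of symbols in increasing order; the name `consistent_order` is hypothetical. It uses Kahn's topological sort, which detects a cycle and produces an example total order in the same pass.

```python
from collections import defaultdict, deque

def consistent_order(symbols, assertions):
    """Return a total order consistent with all assertions, or None if they conflict.

    Assumes `symbols` has no duplicates and every symbol in an assertion is in it.
    """
    succ = defaultdict(set)            # succ[s] = symbols asserted to come right after s
    indeg = {s: 0 for s in symbols}    # number of distinct incoming edges per symbol
    for chain in assertions:
        # Adjacent pairs suffice: the remaining pairs follow by transitivity.
        for a, b in zip(chain, chain[1:]):
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1

    queue = deque(s for s in symbols if indeg[s] == 0)
    order = []
    while queue:
        s = queue.popleft()
        order.append(s)
        for t in succ[s]:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)

    # If some symbol was never emitted, it lies on (or behind) a cycle.
    return order if len(order) == len(symbols) else None
```

Calling `consistent_order(["a", "b", "c", "d"], [["a", "b", "c"], ["b", "d"]])` returns some order that respects both chains, while adding the assertion `["c", "a"]` makes it return `None`.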

Real-world parameters. In the application domain where I've run into this, $S$ might have up to a few hundred symbols, and I might have up to a few thousand assertions, with each assertion typically mentioning dozens of symbols.

Gilles 'SO- stop being evil'
D.W.

2 Answers

3

This sounds like weighted FAS (Feedback Arc Set). Find a minimal (weight) feedback arc set in a directed graph and report the assertions that contributed to the wrongly directed arcs.

For each pair of symbols s and t, the arc from s to t is weighted by the number of assertions that have s < t, and the arc from t to s by the number of assertions that have t < s.

Minimum feedback arc set is a known NP-hard problem (its decision version is NP-complete).
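A minimal Python sketch of the weighted-arc construction, assuming each assertion is a list of distinct symbols and counting every pair implied by a chain, not only adjacent ones. The function names are hypothetical. Since finding the best order is the hard part, the sketch does not search for one: it only scores a candidate order supplied by the caller and lists the assertions that disagree with it.

```python
from collections import Counter
from itertools import combinations

def arc_weights(assertions):
    """weights[(s, t)] = number of assertions that place s before t."""
    weights = Counter()
    for chain in assertions:
        # combinations() keeps chain order, so each pair is (earlier, later).
        for s, t in combinations(chain, 2):
            weights[(s, t)] += 1
    return weights

def backward_weight(order, weights):
    """Total weight of the arcs that point against the candidate order."""
    pos = {s: i for i, s in enumerate(order)}
    return sum(c for (s, t), c in weights.items() if pos[s] > pos[t])

def conflicting_assertions(order, assertions):
    """Indices of assertions that contribute at least one backward arc."""
    pos = {s: i for i, s in enumerate(order)}
    return [i for i, chain in enumerate(assertions)
            if any(pos[a] > pos[b] for a, b in zip(chain, chain[1:]))]
```

For example, with the assertions `[["a", "b", "c"], ["a", "c"], ["c", "a"]]` and the candidate order `["a", "b", "c"]`, only the third assertion is reported as conflicting.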

On a different note, your problem is a classic social choice problem in which each "assertion" is a vote.

2

I thought I'd jot down the two approaches that have occurred to me, but maybe you can do better.

Approach 1. Here's a randomized procedure that processes the assertions in a random order and emits a candidate set of assertions labelled as "probably-erroneous" such that the remainder are all self-consistent. We can repeat the procedure many times, and keep the best output (the one with the smallest number of assertions labelled as "probably-erroneous").

The procedure:

  • Step 1. Randomly permute the assertions. Create an empty graph $G$ with vertex set $S$ and with no edges.
  • Step 2. For each assertion, in the order chosen in step 1, do the following:

    • Suppose the assertion is $s_1<s_2<\cdots<s_m$. Test whether adding the edges $s_1\to s_2, \dots, s_{m-1} \to s_m$ would create a cycle in $G$.
    • If this would create a cycle, label this assertion as "probably-erroneous", and don't add the edges to $G$.
    • If this would not create a cycle, add the edges to $G$.
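The steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not a tuned implementation: it assumes assertions are lists of symbols, re-checks acyclicity from scratch for each assertion (simple but not the fastest option), and uses hypothetical names `is_acyclic`, `greedy_pass`, and `best_of`. The `trials` and `seed` parameters are assumptions added for reproducibility.

```python
import random

def is_acyclic(edges, vertices):
    """Kahn's algorithm: True if the directed graph (vertices, edges) has no cycle."""
    succ = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    stack = [v for v in vertices if indeg[v] == 0]
    removed = 0
    while stack:
        v = stack.pop()
        removed += 1
        for t in succ[v]:
            indeg[t] -= 1
            if indeg[t] == 0:
                stack.append(t)
    return removed == len(vertices)

def greedy_pass(symbols, assertions, rng):
    """One run of Steps 1-2; returns indices labelled probably-erroneous."""
    order = list(range(len(assertions)))
    rng.shuffle(order)                       # Step 1: random permutation
    edges = set()                            # Step 1: graph G starts with no edges
    rejected = []
    for i in order:                          # Step 2
        chain = assertions[i]
        new_edges = set(zip(chain, chain[1:]))
        if is_acyclic(edges | new_edges, symbols):
            edges |= new_edges               # no cycle: keep the assertion
        else:
            rejected.append(i)               # cycle: label it, leave G unchanged
    return sorted(rejected)

def best_of(symbols, assertions, trials=100, seed=0):
    """Repeat the pass and keep the smallest rejected set found."""
    rng = random.Random(seed)
    return min((greedy_pass(symbols, assertions, rng) for _ in range(trials)), key=len)
```

The result depends on the random orders tried, so it is a candidate answer rather than a guaranteed minimum; increasing `trials` only improves the odds of finding a small set.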

Approach 2. Create a $|S|\times|S|$ matrix $V[\cdot,\cdot]$, where $V[s,t]$ counts the number of "votes" for the claim $s<t$. Initialize all counts in the matrix to zero.

Treat the assertion $s_1<s_2<\cdots<s_m$ as a collection of $m(m-1)/2$ votes for the claims $s_1<s_2$, $s_1<s_3$, $s_2<s_3$, etc. Process each assertion, incrementing the counts in $V$ appropriately. This gives you the full matrix $V$.

Now search $V$ for a pair of symbols $s,t$ with $V[s,t]>0$ and $V[t,s]>0$ that minimizes the quantity $V[t,s]/(V[s,t]+V[t,s])$. Conclude that $s<t$ is correct, and label all assertions which contradict this (which voted for $t<s$) as "probably-erroneous". Remove these assertions from the set of assertions, and restart the entire procedure, until the assertions that remain are all self-consistent.

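This procedure is not guaranteed to terminate with a correct answer. For example, consider the five assertions $A<P$, $P<Z$, $A<Q$, $Q<Z$, $Z<A$: no pair of symbols receives votes in both directions, so the procedure finds nothing to remove, yet the assertions are inconsistent because $A<P<Z<A$ is a cycle. Still, it's a heuristic that might be worth trying.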
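A rough Python sketch of Approach 2, assuming assertions are lists of distinct, mutually comparable symbols (such as strings) and storing $V$ as a sparse counter rather than a dense matrix. The names `vote_matrix`, `votes_for`, and `prune_by_votes` are hypothetical. The loop stops as soon as no pair has votes in both directions, so the caller should still run a cycle check on what remains.

```python
from collections import Counter
from itertools import combinations

def vote_matrix(assertions, active):
    """V[(s, t)] = number of active assertions voting for s < t."""
    V = Counter()
    for i in active:
        for s, t in combinations(assertions[i], 2):   # m(m-1)/2 votes per chain
            V[(s, t)] += 1
    return V

def votes_for(chain, a, b):
    """True if the chain mentions both symbols and places a before b."""
    return a in chain and b in chain and chain.index(a) < chain.index(b)

def prune_by_votes(assertions):
    """Return indices labelled probably-erroneous by the voting heuristic."""
    active = set(range(len(assertions)))
    flagged = []
    while True:
        V = vote_matrix(assertions, active)
        # Pairs with votes both ways, scored by the share of votes for t < s.
        contested = [(V[(t, s)] / (V[(s, t)] + V[(t, s)]), s, t)
                     for (s, t) in list(V) if V[(t, s)] > 0]
        if not contested:
            return sorted(flagged)        # nothing left to resolve by voting
        _, s, t = min(contested)          # conclude that s < t is correct
        losers = {i for i in active if votes_for(assertions[i], t, s)}
        flagged.extend(losers)
        active -= losers
```

On the five-assertion example above, `prune_by_votes` returns an empty list even though the input is cyclic, which illustrates why a final consistency check is still needed.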

D.W.