6

Imagine you have two systems of delimiting. One with paired delimiters, [ and ]:

 [abc]

Then another system which uses a single interstitial delimiter, /:

 a/b/c

It's easy to see how to encode structure in the first case, as they nest cleanly:

[ab[cd[e]f]]

But let's say you are looking to encode arbitrary nested structures in the second scheme, using only some number of interstitial delimiters. Whatever the encoding winds up being, a/////b//c////d///e///f would be an example of "following the rules", while //a//b/c///d///e///f/// would not.

So you're basically able to put a unary-encoded integer from 0..∞ (let's say 0 is /, 1 is //) between your elements.
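To make the "following the rules" condition concrete, here is a minimal sketch that checks a string and pulls out the unary integers. The function name `parse_interstitial` is hypothetical, and it assumes tokens are any non-empty runs of non-`/` characters and that a run of n+1 slashes means the integer n (so 0 is /, 1 is //).

```python
import re


def parse_interstitial(s: str) -> tuple[list[str], list[int]]:
    """Split s into tokens and the unary integers between them.

    Raises ValueError if delimiters appear before the first token
    or after the last one, since they must be strictly interstitial.
    """
    if not re.fullmatch(r"[^/]+(?:/+[^/]+)*", s):
        raise ValueError(f"delimiters must sit between elements: {s!r}")
    tokens = re.findall(r"[^/]+", s)
    # A run of k slashes stands for the integer k - 1.
    gaps = [len(run) - 1 for run in re.findall(r"/+", s)]
    return tokens, gaps


tokens, gaps = parse_interstitial("a/////b//c////d///e///f")
print(tokens, gaps)
```

Calling it on //a//b/c///d///e///f/// raises `ValueError`, because of the leading and trailing slashes.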

It's obviously possible to encode, though the results won't necessarily be pretty or visually intuitive. One way is to note that there are two things to record at each delimiter.

  • Whether the ensuing token enters a new nesting level or stays at the current one. So for a?b, we want to know ...a[b... or ...ab...

  • How many nesting levels close right after the ensuing token. So for d?e, we want to know e.g. ...d[e]... or ...d[e]]... or ...d[e]]]...

The first is just a yes or no, and the second is a number which can range from 0 to however many nesting levels you've gotten so far. So multiply the nesting level to drop by 2, add 1 if you're going a level deeper in and leave it alone otherwise.

[ab[cd[e]f]] => a 0 b 1 c 0 d 3 e 4 f => a/b//c/d////e/////f
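The rule above can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: tokens are single characters, the outermost pair of brackets is implicit, each `[` is followed directly by a token, and nothing closes right after the first token (it has no delimiter in front of it to carry that information). The names `encode`, `to_slashes` and `decode` are made up for the example. The slash rendering takes "0 is /, 1 is //" literally, so the integer n becomes n+1 slashes.

```python
def encode(nested: str) -> tuple[list[str], list[int]]:
    """Return (tokens, codes) where each code is 2 * drop + enter."""
    tokens: list[str] = []
    enters: list[bool] = []
    drops: list[int] = []
    pending_enter = False
    for ch in nested:
        if ch == "[":
            pending_enter = True      # the next token opens a level
        elif ch == "]":
            drops[-1] += 1            # one more level closes after the last token
        else:
            tokens.append(ch)
            enters.append(pending_enter)
            drops.append(0)
            pending_enter = False
    # The first token has no delimiter before it, so it gets no code.
    codes = [2 * d + int(e) for e, d in zip(enters[1:], drops[1:])]
    return tokens, codes


def to_slashes(tokens: list[str], codes: list[int]) -> str:
    """Render the codes in unary: the integer n becomes n + 1 slashes."""
    out = tokens[0]
    for code, token in zip(codes, tokens[1:]):
        out += "/" * (code + 1) + token
    return out


def decode(tokens: list[str], codes: list[int]) -> str:
    """Rebuild the bracket string from tokens and codes."""
    out = "[" + tokens[0]             # implicit outermost opening bracket
    for code, token in zip(codes, tokens[1:]):
        enter, drop = code % 2, code // 2
        out += ("[" if enter else "") + token + "]" * drop
    return out


tokens, codes = encode("[ab[cd[e]f]]")
print(codes)                          # [0, 1, 0, 3, 4]
print(decode(tokens, codes))          # [ab[cd[e]f]]
```

The round trip through `decode` is what shows that the integer sequence pins down the nesting, at least within the assumptions listed.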

That's pretty mechanical, but if I've got it right, I think it verifies it can be done.

But here's my question: Is there a known encoding for this kind of problem that would more intuitively convey the structure to a human reader, perhaps at the cost of making longer strings? Let's say a system that would decay such that simple cases like [ab[cd]e] could look more like a/b//c///d//e or similar, while still being able to encode everything distinctly?

I realize the quality I'm asking for is a bit "nebulous", but perhaps you see what I mean. One thing I don't like about the encoding I chose above is it imposes a left-to-right "leaning" property, when there isn't anything particularly left-like or right-like about the nesting properties being encoded. I wonder how that might be excised by making different choices.

2 Answers

4

\n and \t would get the job done, as best I can tell. In memory they're each only a character, but visually they offer a lot more. You then have an encoding scheme of \n followed by $x$ \t characters, where $x$ is the nesting level. So this:

[a b [c d [e] f]]

in memory becomes:

a\nb\n\tc\n\td\n\t\te\n\tf

visually becomes:

a
b
    c
    d
        e
    f
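A minimal sketch of that conversion, assuming single-character tokens, whitespace that can be ignored, and the outermost bracket counting as nesting level 0 (the function name `indent_encode` is just for illustration):

```python
def indent_encode(nested: str) -> str:
    """Put each token on its own line, indented by one tab per nesting level."""
    depth = -1                        # the outermost "[" brings this to 0
    lines: list[str] = []
    for ch in nested:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif not ch.isspace():
            lines.append("\t" * depth + ch)
    return "\n".join(lines)


encoded = indent_encode("[a b [c d [e] f]]")
print(repr(encoded))                  # 'a\nb\n\tc\n\td\n\t\te\n\tf'
print(encoded)
```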

As Evil mentions, you could similarly remove \n and \t between characters at the same level to achieve something like this:

ab
    cd
        e
    f
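The compact form could be produced the same way, starting a new line only when a bracket has been crossed since the previous token. Again this is a sketch with a hypothetical function name, assuming single-character tokens and ignorable whitespace:

```python
def indent_encode_compact(nested: str) -> str:
    """Like the one-token-per-line form, but keep adjacent same-level tokens together."""
    depth = -1                        # the outermost "[" brings this to 0
    lines: list[str] = []
    start_new_line = True
    for ch in nested:
        if ch == "[":
            depth += 1
            start_new_line = True
        elif ch == "]":
            depth -= 1
            start_new_line = True
        elif not ch.isspace():
            if start_new_line:
                lines.append("\t" * depth + ch)
                start_new_line = False
            else:
                lines[-1] += ch       # same level, no bracket in between
    return "\n".join(lines)


print(repr(indent_encode_compact("[a b [c d [e] f]]")))   # 'ab\n\tcd\n\t\te\n\tf'
```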
ryan
2

I'll propose a new idea inspired by the everyday slinky. It might be more difficult to perceive exactly what the nesting is, but it might give you a better overall view. We can use a technique similar to the compression and decompression of slinkies:

[image: slinky magic]

Take for example the nesting: [a b c d [e f [g] h i] j k l m]. We can visualize it in two dimensions based on nest depth like so:

a b c d           j k l m

        e f   h i 

            g    

Or alternatively, the inverse:

            g    

        e f   h i

a b c d           j k l m
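For what it's worth, the two pictures can be generated mechanically. This is a rough sketch assuming single-character tokens, two columns per token, and the outermost bracket as depth 0; `depth_pairs` and `depth_rows` are names I made up for it:

```python
def depth_pairs(nested: str) -> list[tuple[str, int]]:
    """Return (value, depth) pairs, with the outermost bracket at depth 0."""
    depth = -1
    pairs: list[tuple[str, int]] = []
    for ch in nested:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif not ch.isspace():
            pairs.append((ch, depth))
    return pairs


def depth_rows(nested: str, invert: bool = False) -> list[str]:
    """One text row per depth; each token keeps its horizontal position."""
    pairs = depth_pairs(nested)
    max_depth = max(d for _, d in pairs)
    width = 2 * len(pairs) - 1
    rows = [[" "] * width for _ in range(max_depth + 1)]
    for i, (value, d) in enumerate(pairs):
        rows[d][2 * i] = value
    lines = ["".join(row).rstrip() for row in rows]
    return lines[::-1] if invert else lines


print("\n\n".join(depth_rows("[a b c d [e f [g] h i] j k l m]")))
print()
print("\n\n".join(depth_rows("[a b c d [e f [g] h i] j k l m]", invert=True)))
```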

For either of these we need to determine which should be the compression and which should be the decompression (high vs low nests).

We could try both out:

  1. High level is compression, low level is decompression:

    a b c d e f g h i j k l m

  2. Low level is compression, high level is decompression:

    a b c d e f g h i j k l m

Personally, I think option two is more visually representative: the large space between the upper levels clearly shows the lack of deeper nests, and the condensed space shows the densely packed nests more easily.


Further, we could make a formal algorithm. Let's assume the input is given as a sequence of (value, depth) pairs $(x_i, d_i)$, with the outermost level at depth 0. Let $d_m$ denote the maximum depth of all values in the sequence. For the deepest level, we will pad the elements with 0 spaces, and for the highest level we will pad the elements with $2^{d_m - 1}$ spaces on both sides, with the padding halving at each level in between (so 4, 2, 1 and then 0 spaces in the example below). The only caveat is that we do not pad the leftmost value on the left or the rightmost value on the right. Take the example: [[ab[cd[efg]h][ij]k]lm].

a    b   c  d efg h  i  j   k      l        m
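Here is a small sketch of that padding rule, reading it as: depth $d_m$ gets 0 spaces and any shallower depth $d$ gets $2^{d_m - d - 1}$ spaces on each side, with the outermost level at depth 0. That reading is my assumption, chosen because it reproduces the line above; the function names are hypothetical and tokens are assumed to be single characters.

```python
def depth_pairs(nested: str) -> list[tuple[str, int]]:
    """Return (value, depth) pairs, with the outermost bracket at depth 0."""
    depth = -1
    pairs: list[tuple[str, int]] = []
    for ch in nested:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif not ch.isspace():
            pairs.append((ch, depth))
    return pairs


def slinky_symmetric(nested: str) -> str:
    """Pad every value on both sides; shallower values get more padding."""
    pairs = depth_pairs(nested)
    d_max = max(d for _, d in pairs)

    def pad(d: int) -> int:
        return 0 if d == d_max else 2 ** (d_max - d - 1)

    out = pairs[0][0]                 # no padding left of the first value
    for (_, prev_d), (value, d) in zip(pairs, pairs[1:]):
        # Right padding of the previous value plus left padding of this one.
        out += " " * (pad(prev_d) + pad(d)) + value
    return out                        # no padding right of the last value


print(slinky_symmetric("[[ab[cd[efg]h][ij]k]lm]"))
```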

The big problem with this is that it does not guarantee a unique encoding. If you do want a unique encoding with a similar approach, though maybe not as visually clear, follow the same algorithm but pad only the left side and start at 1 space for the deepest level rather than 0. So [[ab[cd[efg]hij]k]lm] becomes:

a    b  c  d e f g  h  i  j    k        l        m

Keep in mind, though, that this is unique only for a sequence of depths, not necessarily for the nesting itself. For example, [ab][cd][ef][gh] would be considered the same as [abcdefgh] because all the values are at the same depth.
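A quick sketch of the left-padded variant, which also shows the collision. I'm assuming depth $d$ gets $2^{d_m - d}$ spaces on its left (1 space at the deepest level, doubling per level up), the outermost level is depth 0, and tokens are single characters; the function names are made up.

```python
def depth_pairs(nested: str) -> list[tuple[str, int]]:
    """Return (value, depth) pairs, with the outermost bracket at depth 0."""
    depth = -1
    pairs: list[tuple[str, int]] = []
    for ch in nested:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif not ch.isspace():
            pairs.append((ch, depth))
    return pairs


def slinky_left(nested: str) -> str:
    """Pad each value on the left only: 1 space at the deepest level, doubling upward."""
    pairs = depth_pairs(nested)
    d_max = max(d for _, d in pairs)
    out = pairs[0][0]                 # the first value gets no left padding
    for value, d in pairs[1:]:
        out += " " * 2 ** (d_max - d) + value
    return out


print(slinky_left("[[ab[cd[efg]hij]k]lm]"))
# Different nestings with the same depth sequence give the same string:
print(slinky_left("[ab][cd][ef][gh]") == slinky_left("[abcdefgh]"))   # True
```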

ryan