
Many academic papers describe algorithms, and they seem to use a similar "syntax".

Is there a standard for this language? If I want to describe an algorithm, should I just improvise my own notation?

For example, papers generally use

a <-- b

to assign the value of b to a, rather than a = b. But where is that standardized?

Caleb Stanford
Makketronix

6 Answers


No. There is no universal standard. There are some conventions that have become more popular over time, through gradual evolution.

A good starting point would be to look at the pseudocode notation used in a few common algorithms textbooks, pick one you like, and try to emulate it. Anything done in a popular, well-regarded textbook is probably going to be reasonable and understandable to others.

D.W.

The word you are looking for is pseudocode. :) There is no official standard, but as you have observed, there are general conventions that most people follow. Partly this is dictated by popular LaTeX packages (see, for example, this guide).

It's worth noting that the point of pseudocode is not to be formally specified, but to be human-readable. As a result, it is unlikely that all aspects of it could be standardized. For example, in pseudocode one sometimes takes a shortcut like:

for all vertices v reachable from u, do:
   ...

with the understanding that we can implement this using a DFS or a BFS. To a human reader, this is often clearer than writing out the DFS or BFS explicitly. In short, pseudocode cannot be fully standardized, because its goal is to be read and easily understood by a human.
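As a rough sketch, here is one way the "for all vertices v reachable from u" shortcut might be expanded in Python, assuming the graph is stored as a dictionary mapping each vertex to a list of its neighbours, and counting u as reachable from itself. The traversal below is an iterative DFS; replacing the stack with a FIFO queue would make it a BFS, which is exactly the detail the pseudocode is free to leave unstated.

```python
from collections.abc import Hashable, Iterator, Mapping, Sequence


def reachable(graph: Mapping[Hashable, Sequence[Hashable]], u: Hashable) -> Iterator[Hashable]:
    """Yield each vertex reachable from u (u included), via iterative depth-first search."""
    seen = {u}      # vertices already discovered
    stack = [u]     # frontier; a FIFO queue here would turn this into BFS
    while stack:
        v = stack.pop()
        yield v
        for w in graph.get(v, ()):  # vertices with no entry are treated as having no neighbours
            if w not in seen:
                seen.add(w)
                stack.append(w)


# "for all vertices v reachable from u, do: ..." then becomes:
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": [], "e": ["a"]}
for v in reachable(graph, "a"):
    print(v)
```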

That being said, there are some conventions that most pseudocode follows, which are considered easier to read than the alternatives. A short list is:

  • Pseudocode is generally written imperatively, that is, as a set of functions or procedures consisting of sequentially composed instructions, if-then blocks, and while and for loops.

  • For variable assignment, $a \gets b$ or $a := b$ is generally preferred to $a = b$, as it distinguishes assignment from equality.

  • Mathematical expressions are usually written as formulas rather than programmatically: e.g. "while $n \le \sqrt{m + 1}$" is preferred over while (n <= sqrt(m + 1)).

  • Data structures are named explicitly using standard names: for example, "dynamic array" for a growable list with O(1) indexed access, rather than a language-specific name such as "vector".

  • Function calls and control flow jumps are usually made explicit using a keyword: e.g. "return $x$", "call $f(x, y)$", "break", and "continue". "goto" and "label" are also sometimes used, though probably somewhat discouraged.
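To make these conventions concrete, here is a hypothetical pseudocode fragment for Euclid's algorithm (in the comments, using ←, ≠ and an explicit return) alongside a direct Python transcription, assuming non-negative integer inputs. Each ← becomes Python's =, while ≠ becomes != (an equality test would be ==), which is the assignment/equality distinction the arrow keeps visible.

```python
def gcd(a: int, b: int) -> int:
    # Pseudocode:
    #   procedure GCD(a, b)
    #       while b ≠ 0 do
    #           t ← b
    #           b ← a mod b
    #           a ← t
    #       return a
    while b != 0:       # "≠" in pseudocode
        t = b           # "t ← b": assignment, not a comparison
        b = a % b       # "b ← a mod b"
        a = t           # "a ← t"
    return a            # explicit "return a"


print(gcd(48, 18))  # → 6
```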

Caleb Stanford

There is no common standard because every kind of language, whether theoretical (more abstract languages such as the lambda calculus) or practical (programming languages used to write actual code), has a preferred mode of usage, or a type of problem for which it makes more sense than the alternatives.

As an example: any Turing-complete language can, in principle, solve any problem that is computable at all, and basically everything worth talking about in this context is Turing-complete. So all such languages can perform the same work. But depending on the problem, a purely functional language like Haskell may be easy and elegant, while solving the same problem in an imperative language like C might be very hard, or at least annoying. And vice versa, of course: you would write a kernel driver in C, but not in Prolog.

The same goes for papers. Sometimes the lambda calculus may be the best way to talk about a problem; other times you might be better off with a very basic imperative "language" that merely formalizes natural language a little. You would use the lambda calculus, for example, if the problem benefits from the semantics of that language (and from the proofs and theorems already established for it). But if you just want a quick pseudo-language to make some ideas clearer, a simple imperative notation with intuitive semantics may be just fine.
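As a loose illustration of that contrast, here is factorial written twice in Python: once as a plain imperative loop, and once in a lambda-calculus flavour using only anonymous functions, with recursion obtained through the Z fixed-point combinator. The second version is not pure lambda calculus (it still relies on Python integers, multiplication and a conditional expression); it only hints at how differently the same idea reads in the two notations.

```python
# Imperative style: mutable state and a loop, close to typical paper pseudocode.
def factorial_imperative(n: int) -> int:
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


# Lambda-calculus flavour: only single-argument anonymous functions and application.
# Z is the strict (call-by-value) fixed-point combinator, so no named recursion is needed.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))
factorial_lambda = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

print(factorial_imperative(5), factorial_lambda(5))  # → 120 120
```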

Finally, you can also whip up a quick DSL (domain-specific language) that contains not only abstract language semantics but also features of the domain in its notation. For example, most of the symbols, conventions and semantics of mathematics itself are basically a DSL; and if you compare them with the mathematical notation used in physics, you will find so many differences that a person familiar with one might be unable to understand the other, even though each makes the most sense for its own domain.

Within all these languages, the textual representation is usually a matter of taste and convention, which may differ between subfields of CS, or even from country to country.

AnoE

As others note, there won't be any one universal standard for pseudocode.

That said, Rosen's Discrete Mathematics and Its Applications has a particularly well-defined system for pseudocode (in Appendix 3); if you are looking for a particular grammar, it could be a good model.


There is no official standard, but I would argue that algorithm pseudocode conventionally follows much of the syntax of Dijkstra's Guarded Command Language (GCL), with influence from other structured-programming syntaxes of the late 1970s and early 1980s. GCL itself is used explicitly in texts such as Carroll Morgan's Programming From Specifications (1990–1998). Because there is no officially maintained standard for pseudocode, and many CS papers do not explicitly place their algorithm descriptions within a syntax and semantics, the relationship is not one of strict conformance, but of family resemblance.

The formatting provided by the standard LaTeX packages for algorithms is also strongly influenced by this syntax, and this tooling helps perpetuate the convention.

Some examples:

  • Using symbols like $\geq$ instead of >=, the syntax used by most executable programming languages
  • The use of logical predicates, with richer set-theory syntax, as guards
  • The use of structured-programming-era blocks for conditionals and loops, with keyword pairs such as if / end if and do / od, rather than braces as in C-style languages or significant whitespace as in Python
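To make the guarded-command style concrete, here is a small Python (3.9+) sketch of GCL's do ... od loop, applied to the classic subtractive GCD program `do x > y → x := x − y [] y > x → y := y − x od`. In GCL, when several guards hold, any one of them may be chosen; the sketch models that nondeterminism with a random choice, which is an assumption of this illustration rather than part of GCL itself. The loop ends as soon as no guard holds. The helper names `do_od`, `sub_x` and `sub_y` are hypothetical.

```python
import random
from typing import Callable

State = dict[str, int]
Guard = Callable[[State], bool]
Command = Callable[[State], None]


def do_od(state: State, arms: list[tuple[Guard, Command]], rng: random.Random) -> State:
    """GCL-style 'do G1 -> S1 [] G2 -> S2 ... od': repeatedly run the command of some
    arm whose guard holds; stop as soon as no guard holds."""
    while True:
        enabled = [command for guard, command in arms if guard(state)]
        if not enabled:
            return state
        rng.choice(enabled)(state)  # nondeterministic choice, modelled as random


def sub_x(s: State) -> None:
    s["x"] -= s["y"]  # x := x - y


def sub_y(s: State) -> None:
    s["y"] -= s["x"]  # y := y - x


# do x > y -> x := x - y  []  y > x -> y := y - x  od
gcd_arms = [
    (lambda s: s["x"] > s["y"], sub_x),
    (lambda s: s["y"] > s["x"], sub_y),
]

print(do_od({"x": 48, "y": 18}, gcd_arms, random.Random(0)))  # → {'x': 6, 'y': 6}
```

For positive inputs the loop stops with x = y = the GCD of the starting values.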

There are some counter-examples, including the use of $\leftarrow$ for assignment rather than GCL's :=. I suspect this is due to a certain continuity of mathematical style, and because it saves slightly on horizontal space when formatting pages close to the page limits of journals and conferences. A more careful reading of historical and present-day papers could help trace the style's lineage more precisely.

Adam Burke

The type of pseudocode I’ve most commonly seen is loosely based on ALGOL, sometimes falling back to natural language or algebraic notation.

For example, compare this sample (from Wikipedia) of Algol-60:

BEGIN
    INTEGER p, q;
    y := 0; i := k := 1;
    for p := 1 step 1 until n do
        for q := 1 step 1 until m do
            if abs(a[p, q]) > y then
                begin y := abs(a[p, q]);
                    i := p; k := q
                end
END Absmax

With this pseudocode from the second paper I checked at random:

Notation: The flight corridor ℬ; global guide path ;
Initial and goal position: 0, g; local guide point h
Input: , 0, g
Output: ℬ

Initialize ℬcur = GenerateOneSphere(0);
ℬ.PushBack(ℬcur);
while True do
    h = GetForwardPointOnPath(, ℬcur);
    ℬcur = BatchSample(h, ℬcur);
    ℬ.PushBack(ℬcur);
    if g ∈ ℬcur then
        break;
    end
end
WaypointAndTimeInitialization(ℬ);

This doesn’t quite match the formatting of the original (Algorithm 1 on page 4), for example its syntax highlighting, and it introduces mathematical notation such as ∈ that very few programming languages support (Haskell being one that can), but you can see the similarities.

In particular, it retains the semicolons and end statements of ALGOL, even though only parsers that ignore whitespace need them, and they are completely redundant in pseudocode written for human eyes. This listing does not use the := assignment operator, but I have seen many that do. Because = sometimes means assignment, and sometimes a test for equality, you need to be careful to be unambiguous. This is likely one reason for the use of the left arrow (one of whose most influential users was Donald E. Knuth).
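For comparison, here is one possible Python transcription of the Algol-60 Absmax sample above, assuming the matrix is given as a list of rows. It keeps Algol's 1-based subscripts (via `enumerate(..., start=1)`) and the initial values y = 0 and i = k = 1; indentation takes the place of begin/end, and no semicolons are needed.

```python
def absmax(a: list[list[float]]) -> tuple[float, int, int]:
    """Return (y, i, k): the greatest absolute value in matrix a and its 1-based
    row and column subscripts, following the Algol-60 Absmax sample."""
    y = 0.0
    i = k = 1
    for p, row in enumerate(a, start=1):          # for p := 1 step 1 until n do
        for q, value in enumerate(row, start=1):  # for q := 1 step 1 until m do
            if abs(value) > y:
                y = abs(value)
                i, k = p, q
    return y, i, k


print(absmax([[1.0, -7.0], [3.0, 5.0]]))  # → (7.0, 1, 2)
```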

Davislor