Questions tagged [data-compression]
277 questions
39 votes, 7 answers
Can PRNGs be used to magically compress stuff?
This idea occurred to me as a kid learning to program, on first encountering PRNGs. I still don't know how realistic
it is, but now there's Stack Exchange.
Here's a 14-year-old's scheme for an amazing compression algorithm:
Take a PRNG and seed…
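The teaser cuts off, but the classic version of this scheme is: find a seed whose PRNG output equals the file, then store only the seed. A minimal sketch (a toy LCG with illustrative parameters, not any specific generator) shows both the mechanism and the catch: a k-bit seed can only ever reproduce 2^k distinct outputs, so for data that did not come from the generator, no space-saving seed exists.

```python
def lcg_bytes(seed, n):
    """n pseudo-random bytes from a toy 31-bit LCG (illustrative parameters)."""
    state, out = seed, bytearray()
    for _ in range(n):
        state = (1103515245 * state + 12345) % 2**31
        out.append((state >> 16) & 0xFF)
    return bytes(out)

def find_seed(data, seed_bits=16):
    """'Compress' data to a seed by exhaustive search over all 2**seed_bits seeds."""
    for seed in range(2 ** seed_bits):
        if lcg_bytes(seed, len(data)) == data:
            return seed
    return None  # pigeonhole: 2**16 seeds cannot cover all 256**8 8-byte files

payload = lcg_bytes(4242, 8)   # data that really did come from the PRNG
seed = find_seed(payload)
print(seed, lcg_bytes(seed, 8) == payload)
```

The search succeeds here only because the payload was produced by the PRNG in the first place; for an arbitrary 8-byte file, the loop almost always falls through to `None`.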
user15782
38 votes, 5 answers
Is there a known maximum for how much a string of 0's and 1's can be compressed?
A long time ago I read a newspaper article where a professor of some sort said that in the future we will be able to compress data to just two bits (or something like that).
This is of course not correct (and it could be that my memory of what he…
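The hard ceiling here is a counting argument, not an engineering one; a quick tally of how many shorter strings exist makes the pigeonhole principle concrete.

```python
# There are 2**n bit strings of length n, but only 2**n - 1 strings of
# length strictly less than n, so no lossless scheme can shorten them all.
n = 10
inputs = 2 ** n
shorter_outputs = sum(2 ** k for k in range(n))  # lengths 0 .. n-1
print(inputs, shorter_outputs)  # 1024 vs 1023: at least one string cannot shrink
```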
user24636
36 votes, 6 answers
Do lossless compression algorithms reduce entropy?
According to Wikipedia:
Shannon's entropy measures the information contained in a message as opposed to the portion of the message that is determined (or predictable). Examples of the latter include redundancy in language structure or statistical…
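The quoted distinction can be made concrete in a few lines: the entropy of the symbol distribution is what a lossless coder must keep, while everything predictable is what it can strip. A minimal sketch:

```python
import math
from collections import Counter

def entropy_per_symbol(msg):
    """Shannon entropy H = sum p(x) * log2(1/p(x)) over symbol frequencies,
    in bits per symbol."""
    counts = Counter(msg)
    total = len(msg)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(entropy_per_symbol("aaaa"))  # 0.0 -- fully predictable, no information
print(entropy_per_symbol("abab"))  # 1.0 -- one bit per symbol
print(entropy_per_symbol("abcd"))  # 2.0 -- four equiprobable symbols
```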
robert
35 votes, 5 answers
Enumerate all non-isomorphic graphs of a certain size
I'd like to enumerate all undirected graphs of size $n$, but I only need one instance of each isomorphism class. In other words, I want to enumerate all non-isomorphic (undirected) graphs on $n$ vertices. How can I do this?
More precisely, I want…
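For small n a brute-force answer works and illustrates what "one per isomorphism class" means: canonicalize each labelled graph over all vertex relabellings and keep the distinct canonical forms (for serious sizes, nauty's geng is the standard tool). A sketch:

```python
from itertools import combinations, permutations

def nonisomorphic_graphs(n):
    """One representative edge set per isomorphism class of graphs on n
    vertices (brute force over 2**C(n,2) graphs; only viable for small n)."""
    vertices = range(n)
    all_edges = list(combinations(vertices, 2))
    seen, reps = set(), []
    for mask in range(2 ** len(all_edges)):
        edges = [e for i, e in enumerate(all_edges) if mask >> i & 1]
        # canonical form: lexicographically smallest edge set over all relabellings
        canon = min(
            tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
            for p in permutations(vertices)
        )
        if canon not in seen:
            seen.add(canon)
            reps.append(edges)
    return reps

print(len(nonisomorphic_graphs(4)))  # 11 non-isomorphic graphs on 4 vertices
```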
D.W.
29 votes, 7 answers
Efficient compression of simple binary data
I have a file containing ordered binary numbers from $0$ to $2^n - 1$:
0000000000
0000000001
0000000010
0000000011
0000000100
...
1111111111
7z did not compress this file very efficiently (for n = 20, 22 MB were compressed to 300 kB).
Are there…
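Part of what makes this file special is that it is fully determined by n: a few-line generator reproduces it exactly, so its ideal compressed size is tiny no matter what a general-purpose tool reports. A sketch, with zlib standing in for 7z and a smaller n:

```python
import zlib

n = 12  # smaller than the question's n = 20, same structure
raw = "\n".join(format(i, "0%db" % n) for i in range(2 ** n)).encode()

packed = zlib.compress(raw, 9)
print(len(raw), len(packed))  # generic compressor still leaves thousands of bytes
# Kolmogorov view: the file is equivalent to the one-line recipe above,
# i.e. essentially just the number n -- a couple of bytes of information.
```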
DSblizzard
26 votes, 12 answers
Is von Neumann's "state of sin" quote about random numbers no longer applicable?
Some chap said the following:
Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.
That's always taken to mean that you can't generate true random numbers with just a computer. And he said…
Paul Uszak
25 votes, 1 answer
Compression of domain names
I am curious as to how one might very compactly compress the domain of an arbitrary IDN hostname (as defined by RFC5890) and suspect this could become an interesting challenge. A Unicode host or domain name (U-label) consists of a string of Unicode…
eggyal
24 votes, 3 answers
Approximating the Kolmogorov complexity
I've studied something about the Kolmogorov Complexity, read some articles and books from Vitanyi and Li and used the concept of Normalized Compression Distance to verify the stilometry of authors (identify how each author writes some text and group…
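The Normalized Compression Distance mentioned here drops straight out of a real compressor; a minimal sketch with zlib standing in for the compressor C:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, approximating Kolmogorov complexity
    with compressed sizes: NCD = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"lorem ipsum dolor sit amet consectetur adipiscing " * 20
print(ncd(a, a))  # near 0: a string is maximally similar to itself
print(ncd(a, b))  # closer to 1: unrelated strings share little structure
```

For stylometry the same function is applied to whole texts; better compressors (bzip2, LZMA) give tighter approximations at the cost of speed.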
woliveirajr
23 votes, 5 answers
Data compression using prime numbers
I have recently stumbled upon the following interesting article, which claims to compress random data sets efficiently, always by more than 50%, regardless of the type and format of the data.
Basically it uses prime numbers to uniquely construct a…
Klangen
22 votes, 7 answers
Why are these (lossless) compression methods of many similar png images ineffective?
I just came across the following thing: I put multiple identical copies of a png image into a folder and then tried to compress that folder with the following methods:
tar czf folder.tar.gz folder/
tar cf folder.tar folder/ && xz --stdout…
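One likely culprit (beyond the PNGs already being DEFLATE-compressed internally) is match-window size: gzip's DEFLATE can only point back 32 KiB, so identical files sitting megabytes apart in the tar stream are invisible to it, while xz's LZMA dictionary spans far more. A sketch of just that effect:

```python
import lzma
import os
import zlib

block = os.urandom(1 << 20)     # one 1 MiB "file": incompressible on its own
doubled = block + block         # two identical copies, 1 MiB apart

gz = zlib.compress(doubled, 9)  # DEFLATE: 32 KiB back-references miss the copy
xz = lzma.compress(doubled)     # LZMA: dictionary large enough to see the repeat
print(len(doubled), len(gz), len(xz))
```

DEFLATE returns roughly the full 2 MiB, while LZMA drops close to 1 MiB because it can reference the first copy when encoding the second.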
a_guest
20 votes, 4 answers
Compressing two integers disregarding order
Comparing an ordered pair (x, y) to an unordered pair {x, y} (a set), the information-theoretic difference is only one bit, since whether x or y comes first takes exactly one bit to represent.
So, if we're given a set {x,y} where x,y are…
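One concrete way to cash in that bit, assuming x and y are nonnegative integers, is a pairing function that only enumerates sorted pairs; the function names below are hypothetical, not from any library:

```python
def encode_unordered(x, y):
    """Map an unordered pair {x, y} of nonnegative ints (x == y allowed) to a
    single integer via the triangular pairing lo + hi*(hi+1)//2."""
    lo, hi = sorted((x, y))
    return hi * (hi + 1) // 2 + lo

def decode_unordered(code):
    """Invert the pairing: recover (lo, hi) from the code."""
    hi = 0
    while (hi + 1) * (hi + 2) // 2 <= code:
        hi += 1
    lo = code - hi * (hi + 1) // 2
    return lo, hi

print(decode_unordered(encode_unordered(7, 3)))  # (3, 7)
```

For x, y drawn from [0, N) the codes occupy N*(N+1)/2 values instead of the N*N needed for ordered pairs, i.e. just under one bit saved.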
Troy McClure
19 votes, 4 answers
Can data be compressed to size smaller than Shannon data compression limit?
I was reading about data compression algorithms and the theoretical limit for data compression. Recently I encountered a compression method called "Combinatorial Entropy Encoding", the main idea of this method is to encode the file as the characters…
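The "combinatorial entropy" idea is real (usually called enumerative coding), and a sketch shows why it meets rather than beats the Shannon limit: a string is replaced by its count of ones plus its rank among equals, and the rank alone needs about log2 C(n, k) bits.

```python
from math import ceil, comb, log2

def rank(bits):
    """Lexicographic rank of a bit string among all strings of the same
    length with the same number of ones (enumerative coding)."""
    r, ones = 0, bits.count("1")
    for i, b in enumerate(bits):
        if b == "1":
            r += comb(len(bits) - i - 1, ones)  # strings with a 0 here come first
            ones -= 1
    return r

s = "0110100101"
n, k = len(s), s.count("1")
print(rank(s), comb(n, k), ceil(log2(comb(n, k))))  # rank needs ~log2 C(n,k) bits
```

Summed over all k with the cost of transmitting k itself, this comes out at the entropy of the source, not below it.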
Hesham Hany
18 votes, 7 answers
Can random suitless $52$ playing card data be compressed to approach, match, or even beat entropy encoding storage? If so, how?
I have real data I am using for a simulated card game. I am only interested in the ranks of the cards, not the suits. However it is a standard $52$ card deck so there are only $4$ of each rank possible in the deck. The deck is shuffled well for…
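Ignoring suits, a well-shuffled deck is a uniformly random multiset permutation, so the entropy target the question is chasing can be computed directly; a quick check:

```python
from math import factorial, log2

# 52 cards, 13 ranks, 4 indistinguishable copies of each rank:
# the number of distinct suitless orderings is 52! / (4!)^13.
arrangements = factorial(52) // factorial(4) ** 13
bits = log2(arrangements)
print(bits)  # about 166 bits, versus log2(52!) of about 226 with suits
```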
David James
17 votes, 5 answers
Are there any compression algorithms based on PI?
What we know is that the decimal expansion of π is infinite and quite likely contains every possible finite string of digits (i.e., π may be a disjunctive sequence).
I've seen recently some prototype of πfs which assumes that every file you've created (or anybody else) or you will create,…
kenorb
15 votes, 1 answer
Why is compression ratio using bzip2 for a sequence of "a"s so jumpy?
library(ggplot2)
compress <- function(str) {
  length(memCompress(paste(rep("a", str), collapse = ""), type = "bzip2")) /
    nchar(paste(rep("a", str), collapse = ""))
}
cr <- data.frame(i = 1:10000, r = sapply(1:10000, compress))
ggplot(cr[cr$i >= 5000 &…
Raffael