While working on my Azul AI I needed a cheap way to store combinations of combinations of tiles that get copied around in memory. The more compactly this can be done, the faster my memcpys become and the more game states can be cached in working memory.

The main idea is to create a number that represents a single unique combination of elements, without the order of those elements mattering.

For example, u([1, 2, 3]) = u([3, 2, 1]) = $65$ if all the elements are base-10 digits.

Note about A009994

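If you know there are three elements which can each be one of 10 different values, then the number matches the position of the sorted digits in A009994 (numbers with digits in nondecreasing order), as enumerated by the script below.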

from itertools import combinations_with_replacement

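# Taken from [OEIS](https://oeis.org/A009994)
# Thanks Chai Wah Wu
def A009994generator(max_len):
    # Yields every number with fewer than max_len digits (digits 1-9 only)
    # whose digits are in nondecreasing order.
    l = 1
    while l < max_len:
        for i in combinations_with_replacement('123456789', l):
            yield int(''.join(i))
        l += 1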

for n,i in enumerate(A009994generator(4), start=1):
    print("{}: {}".format(n, i))
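
As a quick check of the example at the top, here is a minimal sketch that sorts the digits first and then finds their 1-based position in the same enumeration the generator produces. The helper name `unordered_index` is made up for this illustration, and it assumes every digit is in 1..9, like the generator.

```python
from itertools import combinations_with_replacement

def unordered_index(digits):
    """1-based position of the sorted digits in the generator's output.

    Hypothetical helper: assumes every digit is in 1..9.
    """
    target = tuple(sorted(digits))
    position = 0
    # Walk through lengths 1, 2, ... in the same order as the generator.
    for length in range(1, len(target) + 1):
        for combo in combinations_with_replacement(range(1, 10), length):
            position += 1
            if combo == target:
                return position
    raise ValueError("digits must be a non-empty list of values in 1..9")

print(unordered_index([1, 2, 3]))  # 65
print(unordered_index([3, 2, 1]))  # 65, the order does not matter
```

This is a linear search, so it only illustrates the idea; it is far too slow to be the cheap encoding the rest of the post is looking for.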

I believe this could be useful to compress bitfields:


//An enum with four variants:
enum Stuff {
  One,
  Two,
  Three,
  Four
}

If you put this into a bitfield with two bits per element, you could store [One, Two, Three, Four] as something like 00011011.
But if you don't care whether it's [One, Two, Three, Four] or [Two, One, Four, Three], you could sort the list so that it's [One, Two, Three, Four] every time.
You can then use the fact that the first element can be any of the four variants and, since it was One, so can the second; but the third element can only be one of three variants (Two, Three, or Four), since the one before it was Two. Spending two whole bits on that would be a 25% waste of space!
If we use the "index" among the available options as the bit value, we might be able to save a lot of space.

000000  [One, One, One, One]
000001  [One, One, One, Two]
000010  [One, One, One, Three]
000011  [One, One, One, Four]
000100  [One, One, Two, Two]
000101  [One, One, Two, Three]
000110  [One, One, Two, Four]
000111  [One, One, Three, Three]
001000  [One, One, Three, Four]
001001  [One, One, Four, Four]
001010  [One, Two, Two, Two]
001011  [One, Two, Two, Three]
001100  [One, Two, Two, Four]
001101  [One, Two, Three, Three]
001110  [One, Two, Three, Four]
001111  [One, Two, Four, Four]
010000  [One, Three, Three, Three]
010001  [One, Three, Three, Four]
010010  [One, Three, Four, Four]
010011  [One, Four, Four, Four]
010100  [Two, Two, Two, Two]
010101  [Two, Two, Two, Three]
010110  [Two, Two, Two, Four]
010111  [Two, Two, Three, Three]
011000  [Two, Two, Three, Four]
011001  [Two, Two, Four, Four]
011010  [Two, Three, Three, Three]
011011  [Two, Three, Three, Four]
011100  [Two, Three, Four, Four]
011101  [Two, Four, Four, Four]
011110  [Three, Three, Three, Three]
011111  [Three, Three, Three, Four]
100000  [Three, Three, Four, Four]
100001  [Three, Four, Four, Four]
100010  [Four, Four, Four, Four]

Unfortunately I've been unable to make a function to convert between them without a map. Though I'm working on it...
Until then I guess sorting the elements and looking them up in a table will work 😕
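
The sort-and-look-up fallback can be sketched as follows. This is only an illustration under my own assumptions: the enum is mirrored as a Python `IntEnum`, and `SLOTS`, `TABLE`, `INDEX`, `encode` and `decode` are hypothetical names, not anything from the actual AI.

```python
from enum import IntEnum
from itertools import combinations_with_replacement

class Stuff(IntEnum):
    One = 0
    Two = 1
    Three = 2
    Four = 3

SLOTS = 4  # how many elements each stored set holds

# Every sorted combination, in the same order as the table above.
TABLE = list(combinations_with_replacement(Stuff, SLOTS))
INDEX = {combo: i for i, combo in enumerate(TABLE)}

def encode(items):
    """Sort the items, then look up their index in the table."""
    return INDEX[tuple(sorted(items))]

def decode(index):
    """Return the sorted combination stored at this index."""
    return TABLE[index]

bits = (len(TABLE) - 1).bit_length()
print(len(TABLE), bits)  # 35 6
print(encode([Stuff.Two, Stuff.One, Stuff.Four, Stuff.Three]))  # 14
```

With 35 entries the index fits in 6 bits instead of the 8 bits a plain two-bits-per-element bitfield needs, at the cost of keeping the table around.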

Update 2021-04-26 Encoding!

After a lot of attempts, and this problem burning in the back of my mind, 2 months later I've found a solution.
The breakthrough was realizing that if you can count how many options there are left, you can work out which option you're at.

You could do this by initializing a counting loop at some state:

count = 0
for i in range(0, 4):
  for j in range(i, 4):
    for k in range(j, 4):
      for l in range(k, 4):
        count += 1
print(count)

The loop counts 35 states, which we already knew from the table, but for some reason it didn't click that we could easily count the states above our original number by just starting at it.

Expressing this as mafs would be:

$\sum_{i = 1}^4 \sum_{j = i}^4 \sum_{k = j}^4 \sum_{l = k}^4 1$

Similarly, you can count just the options for the last two digits, and then remove those from the total.
Doing this for each position tells you which one of the options the initial state is.

$\begin{alignedat}{2} &S_4(O, A) &&= \sum_{i = A}^O \sum_{j = i}^O \sum_{k = j}^O \sum_{l = k}^O 1 \\ &S_3(O, B) &&= \sum_{i = B}^O \sum_{j = i}^O \sum_{k = j}^O 1 \\ &S_2(O, C) &&= \sum_{i = C}^O \sum_{j = i}^O 1 \\ &S_1(O, D) &&= O - D \\ &E(O, A, B, C, D) &&= \underbrace{S_4(O, 1)}_{\text{All options}} - \underbrace{(S_1(O,D) + S_2(O, C+2) + S_3(O, B+2) + S_4(O, A+2))}_{\text{All options at or above the initial state}} \end{alignedat}$

Here $O$ is the base (4 in this case), $A \leq B \leq C \leq D$ are the sorted elements counted from 0, and the set size is also 4.
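
To sanity-check the formula, here is a minimal sketch that evaluates the sums directly and compares the result with a plain enumeration. It takes the "all options" term to be the count of all sorted 4-tuples, $S_4(O, 1)$; the generic helper `S(k, O, start)` is my own shorthand for $S_k(O, \text{start})$.

```python
from itertools import combinations_with_replacement

def S(k, O, start):
    """Number of non-decreasing k-tuples with entries in start..O (inclusive)."""
    if k == 0:
        return 1
    return sum(S(k - 1, O, i) for i in range(start, O + 1))

def E(O, A, B, C, D):
    """Index of the sorted state (A, B, C, D), elements counted from 0."""
    total = S(4, O, 1)
    # S_1(O, D) = O - D also counts the state itself, so this is "at or above".
    at_or_above = (O - D) + S(2, O, C + 2) + S(3, O, B + 2) + S(4, O, A + 2)
    return total - at_or_above

# Compare against the position in a lexicographic enumeration.
states = list(combinations_with_replacement(range(4), 4))
assert all(E(4, *state) == i for i, state in enumerate(states))

print(E(4, 0, 0, 0, 0), E(4, 0, 0, 1, 1), E(4, 3, 3, 3, 3))  # 0 4 34
```

The nested sums are evaluated naively here, which is fine for a 4-by-4 example but not something to run in a hot loop.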

You could probably just, uh, count up instead, but I didn't really think of that at the time...
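
Counting up instead could look like the sketch below. It swaps the nested sums for the standard stars-and-bars closed form (the number of non-decreasing tuples of length $r$ over $m$ symbols is $\binom{m + r - 1}{r}$), so it is not the method from the formula above, and the names `count_sorted`, `rank` and `unrank` are made up for this example.

```python
from math import comb

def count_sorted(length, symbols):
    """Number of non-decreasing tuples of `length` items over `symbols` values."""
    return comb(symbols + length - 1, length)

def rank(state, base):
    """Index of a sorted tuple, found by counting the states below it."""
    index, low = 0, 0
    for pos, value in enumerate(state):
        rest = len(state) - pos - 1
        # States that agree so far but use a smaller value at this position.
        for smaller in range(low, value):
            index += count_sorted(rest, base - smaller)
        low = value
    return index

def unrank(index, length, base):
    """Inverse of rank: rebuild the sorted tuple from its index."""
    if not 0 <= index < count_sorted(length, base):
        raise ValueError("index out of range")
    state, low = [], 0
    for pos in range(length):
        rest = length - pos - 1
        value = low
        # Skip whole blocks of states that start with a smaller value.
        while index >= count_sorted(rest, base - value):
            index -= count_sorted(rest, base - value)
            value += 1
        state.append(value)
        low = value
    return tuple(state)

print(rank((0, 1, 2, 3), 4))  # 14
print(unrank(34, 4, 4))       # (3, 3, 3, 3)
```

Unlike the table lookup, this gives both directions without storing anything, which is what the earlier section was missing.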

In the end though you end up with some nice numbers:

E(4, 0,0,0,0) =  0
E(4, 0,0,0,1) =  1
E(4, 0,0,0,2) =  2
E(4, 0,0,0,3) =  3
E(4, 0,0,1,1) =  4
E(4, 0,0,1,2) =  5
...
E(4, 3,3,3,3) = 34

Update 2021-04-28

What I am actually looking for is apparently something called "Arithmetic coding".
I can generate a statistical model for which "symbols" should be available in each step really easily.
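
One way such a model could look, as a rough sketch: give each allowed next symbol a weight equal to the number of ways the sorted sequence can still be finished after it. This assumes all sorted states are equally likely, covers only the model and not the arithmetic coder itself, and `symbol_model` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def symbol_model(previous, remaining, base):
    """Probability of each allowed next symbol in a sorted sequence.

    `previous` is the last symbol written (use 0 at the start) and
    `remaining` is how many symbols are still to be written, this one included.
    """
    after = remaining - 1
    # Weight of s = number of non-decreasing tails of length `after` over s..base-1.
    weights = {
        s: comb((base - s) + after - 1, after)
        for s in range(previous, base)
    }
    total = sum(weights.values())
    return {s: Fraction(w, total) for s, w in weights.items()}

model = symbol_model(previous=0, remaining=4, base=4)
print(model[0], model[3])  # 4/7 1/35
```

Multiplying the probabilities along any full state gives 1/35 under this model, so an ideal coder would spend about $\log_2 35 \approx 5.13$ bits per state.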

Typical that you find the answer a couple of days after having made progress 🤣

2020-03-01 failed attempt at implementation

I'll work through an example with a three-digit number in $\text{base}_0 = 10$: $562$.

First we must sort the digits in ascending order. I've named the positions in the number $a$, $b$, and $c$.

$$a \leq b \leq c$$

$$562 \to 256$$

Now we take the first digit ($a = 2$ in this case) and do a "normal" step of turning digits into a number. The next step does the same, but since we know the digit cannot be smaller than the previous digit, we can remove those possibilities from the base.

$\begin{alignedat}{3} u &= a*(\text{base}_0)^2 &&+ b*(\text{base}_0-a)^1 &&+ c*(\text{base}_0-b)^0 \\ u &= 2*10^2 &&+ 5*8 &&+ 6*1 \\ &= 200 &&+ 40 &&+ 6 \\ u & = 246 \end{alignedat}$

And that's it! Now you have a number that represents the original number, but without such pesky unimportant things as digit position encoded.

To go the other way is also quite simple with some integer math.

$u = 246$, with digits $a_u = 2, b_u = 4, c_u = 6$.

Let's find $a$ first:

$\begin{alignedat}{2} &u_a &&= u \\ &a &&= \bigg\lfloor \frac{u_a}{\text{base}_0^2} \bigg\rfloor \\ &a &&= \bigg\lfloor \frac{246}{10^2} \bigg\rfloor = \lfloor 2.46 \rfloor = 2 \\ &a_n &&= a * \text{base}_0^2 \\ &a_n &&= 2*10^2 = 200 \end{alignedat}$

Continuing with $b$:

We first remove $a_n$ from $u$:

$\begin{alignedat}{2} &u_b &&= u_a - a_n \\ &u_b &&= 246 - 200 = 46 \end{alignedat}$

Then we do the same thing as we did to find $a$, but this time we change the base just like we did when encoding, only dividing instead of multiplying. NB: $a_u$, the "unordered" $a$ (the first digit of $u$), is what's being used here, NOT the original $a$.

$\begin{alignedat}{2} &b &&= \bigg\lfloor \frac{u_b}{(\text{base}_0-a_u)^1} \bigg\rfloor \\ &b &&= \bigg\lfloor \frac{46}{(10-2)^1} \bigg\rfloor = \lfloor 5.75 \rfloor = 5 \\\\ &b_n &&= 5 * 8 = 40 \end{alignedat}$

Then $c$:

$\begin{alignedat}{2} &u_c &&= 46 - 40 = 6 \\\\ &c &&= \bigg\lfloor \frac{u_c}{(\text{base}_0-b_u)^0} \bigg\rfloor \\ &c &&= \bigg\lfloor \frac{6}{(10-4)^0} \bigg\rfloor = 6 \\\\ &c_n &&= 6 * 1 = 6 \end{alignedat}$

At last, now that we have $a$, $b$, and $c$ we can construct $n$:

$\begin{alignedat}{2} &n &&= a*10^2 + b * 10^1 + c * 10^0 \\ &n &&= 2 * 10^2 + 5 * 10^1 + 6 * 10^0 \\ &n &&= 256 \end{alignedat}$
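
The walkthrough above can be condensed into integer-only code. This is a sketch under my own assumptions: it is hard-wired to three decimal digits, and `encode` and `decode` are illustrative names.

```python
BASE = 10

def encode(number):
    """Sort the three digits, then weight them as in the walkthrough."""
    a, b, c = sorted(int(d) for d in str(number).zfill(3))
    return a * BASE**2 + b * (BASE - a) + c

def decode(u):
    """Peel off one digit at a time with integer division."""
    a, rest = divmod(u, BASE**2)   # a = floor(u / base^2)
    b, c = divmod(rest, BASE - a)  # b = floor(rest / (base - a)), c = what is left
    return a * BASE**2 + b * BASE + c

print(encode(562))          # 246
print(decode(246))          # 256
print(decode(encode(559)))  # 564, not 559
```

The last line hints at one reason the attempt fails: as far as I can tell, whenever $c \geq \text{base}_0 - a$ the division for $b$ overshoots, so decoding only round-trips for some inputs.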

Update 2021-03-03

I've turned the ideas into functions which are a little more concise.

$\begin{aligned} &n(x)=\lfloor \log_{10} x \rfloor + 1 \\ &f(x, y) = \frac{(y \bmod 10^{n(y)+1-x}) - (y \bmod 10^{n(y)-x})}{10^{n(y)-x}} \\ &u(o) = \sum_{i = 1}^{n(o)} f(i,o)*(10-f(i-1, o))^{n(o)-i} &o \in \text{A009994} \\ &o(u) = \sum_{i=1}^{n(u)} \frac{f(i, u)*10^{n(u)-i}}{(10-f(i-1, u))^{n(u)-i}}*10^{n(u)-i} &u \in \text{A009994} \end{aligned}$

$f(x, y)$ takes an index $x$ and a number $y$, then gives you the digit at that position, counting from left to right.

$n(x)$ counts how many digits there are in a number.

$u(o)$ encodes a number to its unordered representation (the digits must be in increasing order). $o(u)$ decodes an unordered number back into an "ordered" number.

Update #2 2021-03-03

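from math import log10, floor

def n(x):
    # Number of decimal digits in x (x must be a positive integer).
    return floor(log10(x)) + 1

def f(x, y):
    # Digit of y at position x, counted from the left starting at 1.
    # f(0, y) is 0, so the first digit is always weighted with the full base.
    return (y % 10**(n(y)+1-x) - y % 10**(n(y)-x)) / (10**(n(y)-x))

def u(o):
    # Encode: the digits of o must already be in increasing order.
    total = 0
    for i in range(1, n(o)+1):
        total += f(i, o) * (10 - f(i-1, o))**(n(o)-i)
    return total

def o(u):
    # Decode an unordered number back into the "ordered" number.
    total = 0
    for i in range(1, n(u)+1):
        total += (f(i, u) * 10**(n(u)-i) / ((10 - f(i-1, u))**(n(u)-i))) * 10**(n(u)-i)
    return total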

print(u(256))
print(o(246))

Output:

246.0
256.0

Update #3 2021-03-03

Unfortunately these functions do not give a perfect compression level.
It is better, just not perfect, and probably not worth it.
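
To put a number on "not perfect", here is a small sketch that runs the same weighting over every sorted three-digit combination and compares the largest code with what a perfect index would need. It treats the digits as plain integers 0 to 9 rather than going through `n` and `f`, and `encode` is an illustrative name.

```python
from itertools import combinations_with_replacement

def encode(a, b, c, base=10):
    """Same weighting as u(o), for sorted digits a <= b <= c."""
    return a * base**2 + b * (base - a) + c

triples = list(combinations_with_replacement(range(10), 3))
codes = {encode(*t) for t in triples}

print(len(triples))                     # 220 sorted combinations
print(len(codes))                       # 220 distinct codes
print(max(codes))                       # 918
print((len(triples) - 1).bit_length())  # 8 bits for a perfect index
print(max(codes).bit_length())          # 10 bits for this scheme
```

So the largest code drops from 999 to 918, while a gap-free index would top out at 219.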
