Pagina's

2013/12/30

MWC58 Fast RNG small period part 4

/*
"MWC" stands for "Multiply With Carry", "58" stands for the period: > 2^58 
Actually the period is larger than 2^60, the first version had a smaller period.
A big drawback: The usable period is a lot smaller. Despite this drawback,
MWC58 might be usefull:
-128 non correlated sequences of  random numbers,
 each sequence contains at least 590 million numbers,
 sequences are repeatable, initialisation is fast.
-seed another RNG

It's  based on: Marsaglia's message(1998)
    
MWC58:
  
    private static uint mwc58()
    {
        z0 = m0 * (z0 & 65535) + (z0 >> 16);   //  both (m0<<15)-1 and (m0<<16)-1 are prime
        z1 = m1 * (z1 & 65535) + (z1 >> 16);   //  both (m1<<15)-1 and (m1<<16)-1 are prime
        return z0 + (z1 << 16);
    }

Why is this fast? From a publication by Mark A. Overton(2011):
 
    Fast, High-Quality, Parallel Random Number Generators
    
Two independent generators are used: z0 = m0 * ... , z1 = m1 * ...   
The compiler places instructions in an order that causes the processor to 
pipeline the computations of the generators. As a result of this parallelizing, 
the generators execute even faster than expected. The second multiply, z1 = m1 * ...
only adds one more clock cycle.  

The period "p0" of the first generator is: p0 = m0 * 2^15 ,
with a seed value "s0": 0 < s0 <= 2 * p0 - 2 .
The period of MWC58: m0 * m1 * 2^30 .
The usable period is a lot smaller. The 16 lsb's (least significant bits) are
the 16 lsb's of the first generator: z0 , with period: p0 = m0 * 2^15.
The most significant bits depend much on the second generator: z1 ,
where the highest bits have the largest dependencies.
For the smallest value of m0: 18030, MWC58 generates 18030 * 2^15 (~5.9 * 10^8)
random numbers in ~2.25 seconds. In other words: When 5.9 * 10^8 random numbers
are needed, MWC58 generates them fast.

With 18030 <= m <= 65184 , there are 256 valid multipliers:
    ushort[] m = { 18030 ... 65184 };
With fixed multipliers it is possible to use a 32 bits seed value,
from which valid values for s0 and s1 can be dereived.
When we want different sequences, using different seed values,
to be more divergent (less correlated), it is better to use the seed for
indices in the "ushort[] m" array.

Off topic: The k-th (ordered) combination of { 0, 1, 2, 3, ... , n } , or
the k-th 2-element subset of the set { 0, 1, 2, 3, ... , n }
Triangular numbers and their inverse are involved.


 
    private static uint[] tri_com(uint k, uint n) // n > 0                //     n = 3   
    {                                                                     //   k | i | j 
        uint t, i, j;                                                     //  ---|---|---
        t = n * (n + 1) / 2;                                              //   0 | 0 | 1 
        k %= t;                                                           //   1 | 0 | 2 
        i = n - 1 - (uint)((Math.Sqrt(8 * (t - k - 1) + 1) - 1) / 2);     //   2 | 0 | 3 
        j = (i * (i + 1) / 2 + k) % n + 1;                                //   3 | 1 | 2 
        uint[] ij = { i, j };                                             //   4 | 1 | 3 
        return ij;                                                        //   5 | 2 | 3 
    }                                                                     //   6 | 0 | 1 
                                                                       
When combined indices are used like: 0,3 1,2 (2 combinations)
random sequences generated by those combinations are less (not?) correlated.
With 256 multipliers there are 128 combinations.
 
The scaled version:

"mwc_uint(u)" returns a value x ( 0 <= x <= u ).
Before I used a trick to save random bits, which is not necessary any longer.
An example: mwc_uint(1) returns 0 or 1.
Under the hood 32 random bits are/were generated, they were put in a buffer,
from which the bits were taken out, one by one.
That required some code: buffer empty? refill, etc.
Some code takes some time.
Now one bit is returned, 31 remaining bits are not used,
it is faster to generate 32 new bits, and to use just one of them.
The rather small period is a bottleneck, a solution: CMR63

Results:
    
    10^7 times rand.Next() takes: 107 ms (~11 ns per random number)
    10^7 times rand.Next(int someValue) takes: 236 ms (~24 ns ...)
    
    |--------|--------------------------------------------------------|      
    |        |                        ms 10^7 *                       |      rand.Next()    11 ns      
    |        |-------------|--------------|-------------|-------------|          mwc58()     4 ns
    |      u | mwc_uint(u) | well_uint(u) | gen_uint(u) | g_uint_c(u) |     rand.Next(i)    24 ns                 
    |--------|------|------|-------|------|-------|-----|------|------|      mwc_uint(u)    18 ns, mean  ~12 ns
    |      0 |   20 |      |    24 |      |    24 |     |   24 |      |     well_uint(u)    44 ns, mean  ~32 ns  
    |      1 |      |  51  |       |   74 |       |  73 |      |  139 |      gen_uint(u)    50 ns, mean  ~35 ns   
    |      2 |   96 |      |   121 |      |   124 |     |  296 |      |      g_uint_c(u)   280 ns, mean ~250 ns
    |      3 |      |  51  |       |   80 |       |  78 |      |  211 |       
    |      4 |  133 |      |   222 |      |   242 |     |  518 |      |                 
    |      7 |      |  51  |       |   82 |       |  83 |      |  282 |               
    |      8 |  153 |      |   272 |      |   290 |     |  680 |      |     
    |  2^7-1 |      |  51  |       |   98 |       | 102 |      |  562 |     
    |  2^7-0 |  172 |      |   344 |      |   380 |     | 1079 |      |           
    | 2^15-1 |      |  51  |       |  131 |       | 130 |      | 1123 |               
    | 2^15-0 |  174 |      |   379 |      |   414 |     | 1634 |      |               
    | 2^30-1 |      |  51  |       |  193 |       | 193 |      | 2164 |  
    | 2^30-0 |  175 |      |   431 |      |   494 |     | 2672 |      |  
    | 2^31-1 |      |  51  |       |  195 |       | 210 |      | 2251 |  
    | 2^31-0 |  175 |      |   427 |      |   461 |     | 2747 |      |  
    | 2^32-1 |      |  51  |       |  196 |       | 211 |      | 2307 |  
    |--------|------|------|-------|------|-------|-----|------|------| 

http://www.cacert.at/cgi-bin/rngresults  (code: find and hover mouse over "MWC58")
 
|---------|---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
|  Gene-  |  Speed  | Upload | Entropy  | Birthday | Matrix | 6*8 Ma- | Minimum  | Random   | Squeeze  | Overlap- |
|  rator  | (MiB/s) |  Size  | (->8)    | Spacing  | Ranks  | trix R  | Distance | Spheres  |          | ping Sums|
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
|   mwc58 |  1000   | 19 MiB | 7.999990 | 0.161056 |  0.504 |  0.920  | 0.877503 | 0.432728 | 0.677887 | 0.458630 |
| well512 |   400   | 19 MiB | 7.999992 | 0.446363 |  0.636 |  0.879  | 0.967495 | 0.690789 | 0.942252 | 0.005987 |
|  Random |   370   | 17 MiB | 7.999989 | 0.000000 |  0.392 |  0.136  | 0.063920 | 0.905097 | 0.098440 | 0.005467 |
|     CSP |    34   | 15 MiB | 7.999987 | 0.032102 |  0.084 |  0.942  | 0.894277 | 0.036221 | 0.706538 | 0.023523 |
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|

*/

using System;
class mwc
{
    private static uint m0 = 18030, z0 = 325080900, m1 = 65184, z1 = 4248953856;

    private static uint mwc58()
    {
        z0 = m0 * (z0 & 65535) + (z0 >> 16);
        z1 = m1 * (z1 & 65535) + (z1 >> 16);
        return z0 + (z1 << 16);
    }

    private static uint g_u = 0;
    private static int g_s = 0;

    private static uint mwc_uint(uint u)
    {
        uint x;
        if (u == 0) return 0;
        if (u != g_u)
        {
            g_u = u;
            g_s = 32 - nob(u);
        }
        do
        {
            z0 = m0 * (z0 & 65535) + (z0 >> 16);
            z1 = m1 * (z1 & 65535) + (z1 >> 16);
            x = (z0 + (z1 << 16)) >> g_s;
        }
        while (x > u);
        return x;
    }

    private static void init(uint seed)
    {
        ushort[] m = {
            18030, 18273, 18513, 18879, 19074, 19098, 19164, 19215, 19584, 19599,
            19950, 20088, 20508, 20544, 20664, 20814, 20970, 21153, 21243, 21423,
            21723, 21954, 22125, 22188, 22293, 22860, 22938, 22965, 22974, 23109,
            23124, 23163, 23208, 23508, 23520, 23553, 23658, 23865, 24114, 24219,
            24660, 24699, 24864, 24948, 25023, 25308, 25443, 26004, 26088, 26154,
            26550, 26679, 26838, 27183, 27258, 27753, 27795, 27810, 27834, 27960,
            28320, 28380, 28689, 28710, 28794, 28854, 28959, 28980, 29013, 29379,
            29889, 30135, 30345, 30459, 30714, 30903, 30963, 31059, 31083, 31215,
            31353, 31488, 31743, 32430, 32718, 33105, 33189, 33249, 33375, 33378,
            33663, 33768, 33858, 33894, 34158, 34323, 34383, 34590, 34653, 34890,
            35355, 35523, 35643, 36309, 36594, 36804, 36969, 37698, 37935, 37959,
            38079, 38223, 38283, 38484, 38568, 38610, 38649, 38733, 38850, 39444,
            39618, 39690, 39948, 40833, 40995, 41019, 41064, 41289, 41628, 41793,
            41874, 42153, 42444, 42513, 42594, 42633, 42699, 42819, 42903, 42975,
            43038, 43155, 43473, 43563, 43995, 44019, 44568, 44574, 44994, 45723,
            45729, 45780, 45789, 45915, 45939, 46515, 47088, 47529, 48015, 48033,
            48195, 48204, 48393, 49209, 49248, 49299, 49458, 50034, 50223, 50580,
            50589, 50694, 50853, 50988, 51198, 51558, 51618, 51729, 51744, 51813,
            51873, 51933, 52023, 52215, 52275, 52509, 52743, 52950, 53130, 53199,
            53529, 53709, 53898, 53934, 53958, 54144, 54168, 54399, 54474, 54564,
            54885, 55044, 55074, 55179, 55254, 55680, 55809, 55848, 55869, 56205,
            56538, 56604, 56790, 56859, 57039, 57204, 57225, 57525, 57603, 57774,
            57780, 57918, 58149, 58368, 58443, 58758, 59253, 59325, 59775, 60009,
            60060, 60489, 60735, 60990, 61140, 61578, 61914, 62505, 62634, 62778,
            62790, 62865, 62874, 62904, 63129, 63273, 63444, 63663, 63765, 63885,
            64185, 64314, 64455, 64545, 64860, 65184 };

        seed &= 127; m0 = m[seed]; z0 = m0 * m0;
        seed ^= 255; m1 = m[seed]; z1 = m1 * m1;
    }

    private static int nob(uint u) // number of bits, position of the highest set bit, u > 0  
    {
        return
        u < 1u << 16 ? u < 1u << 08 ? u < 1u << 04 ? u < 1u << 02 ? u < 1u << 01 ? 01 : 02 :
                                                                    u < 1u << 03 ? 03 : 04 :
                                                     u < 1u << 06 ? u < 1u << 05 ? 05 : 06 :
                                                                    u < 1u << 07 ? 07 : 08 :
                                      u < 1u << 12 ? u < 1u << 10 ? u < 1u << 09 ? 09 : 10 :
                                                                    u < 1u << 11 ? 11 : 12 :
                                                     u < 1u << 14 ? u < 1u << 13 ? 13 : 14 :
                                                                    u < 1u << 15 ? 15 : 16 :
                       u < 1u << 24 ? u < 1u << 20 ? u < 1u << 18 ? u < 1u << 17 ? 17 : 18 :
                                                                    u < 1u << 19 ? 19 : 20 :
                                                     u < 1u << 22 ? u < 1u << 21 ? 21 : 22 :
                                                                    u < 1u << 23 ? 23 : 24 :
                                      u < 1u << 28 ? u < 1u << 26 ? u < 1u << 25 ? 25 : 26 :
                                                                    u < 1u << 27 ? 27 : 28 :
                                                     u < 1u << 30 ? u < 1u << 29 ? 29 : 30 :
                                                                    u < 1u << 31 ? 31 : 32;
    }

    static void Main()
    {
        // test ( part 1 ? )
    }    
}

2013/12/09

WELL512 scaled random numbers part 3


/*
"WELL" stands for "Well Equidistributed Long-period Linear",
"512" stands for the period: 2^512
Chris Lomont's version is used, visit:

    http://www.lomont.org/Math/Papers/2008/Lomont_PRNG_2008.pdf
  
and read about the advantages of the "WELL512" PRNG.
A "state" array of 16 uints is used, it's seeded by another PRNG: "System.Random".
"System.Random" uses a seed, the absolute value of an int.
Sequences of random numbers generated by System.Random, using different seed values,
are correlated, the left side of the image above clearly shows patterns.
The right side looks pretty random, sequences appear to be less correlated.
In other words: It is possible to regenerate a high quality random sequence,
using the same seed value. Different sequences, using different seed value's, 
seem to be less correlated.


    Results:
    
    10^7 times rand.Next() takes: 107 ms (~11 ns per random number)
    10^7 times rand.Next(int someValue) takes: 236 ms (~24 ns ...)
    
    |--------|------------------------------------------|       rand.Next()    11 ns      
    |        |                 ms 10^7 *                |      rand.Next(i)    24 ns              
    |        |--------------|-------------|-------------|      well_uint(u)    44 ns, mean  ~32 ns
    |      u | well_uint(u) | gen_uint(u) | g_uint_c(u) |       gen_uint(u)    50 ns, mean  ~35 ns    
    |--------|-------|------|-------|-----|------|------|       g_uint_c(u)   280 ns, mean ~250 ns
    |      0 |    24 |      |    24 |     |   24 |      |         
    |      1 |       |   74 |       |  73 |      |  139 |          
    |      2 |   121 |      |   124 |     |  296 |      |            
    |      3 |       |   80 |       |  78 |      |  211 |       
    |      4 |   222 |      |   242 |     |  518 |      |                 
    |      7 |       |   82 |       |  83 |      |  282 |               
    |      8 |   272 |      |   290 |     |  680 |      |     
    |  2^7-1 |       |   98 |       | 102 |      |  562 |     
    |  2^7-0 |   344 |      |   380 |     | 1079 |      |           
    | 2^15-1 |       |  131 |       | 130 |      | 1123 |               
    | 2^15-0 |   379 |      |   414 |     | 1634 |      |               
    | 2^30-1 |       |  193 |       | 193 |      | 2164 |  
    | 2^30-0 |   431 |      |   494 |     | 2672 |      |  
    | 2^31-1 |       |  195 |       | 210 |      | 2251 |  
    | 2^31-0 |   427 |      |   461 |     | 2747 |      |  
    | 2^32-1 |       |  196 |       | 211 |      | 2307 |  
    |--------|-------|------|-------|-----|------|------|


http://www.cacert.at/cgi-bin/rngresults  find and hover mouse over "well512" for the code.

|---------|---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
|  Gene-  |  Speed  | Upload | Entropy  | Birthday | Matrix | 6*8 Ma- | Minimum  | Random   | Squeeze  | Overlap- |
|  rator  | (MiB/s) |  Size  | (->8)    | Spacing  | Ranks  | trix R  | Distance | Spheres  |          | ping Sums|
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|
| WELL512 |   400   | 19 MiB | 7.999992 | 0.446363 |  0.636 |  0.879  | 0.967495 | 0.690789 | 0.942252 | 0.005987 |
|  Random |   370   | 17 MiB | 7.999989 | 0.000000 |  0.392 |  0.136  | 0.063920 | 0.905097 | 0.098440 | 0.005467 |
|     CSP |    34   | 15 MiB | 7.999987 | 0.032102 |  0.084 |  0.942  | 0.894277 | 0.036221 | 0.706538 | 0.023523 |
|-------- |---------|--------|----------|----------|--------|---------|----------|----------|----------|----------|  
  
*/

using System;
using System.Drawing; // add a ref

class scaled_well512
{
    private static uint[] state = new uint[16];
    private static int index = -1;

    private static void init_well(int seed)
    {
        seed = Math.Abs(seed);
        index = seed & 15;
        var rand = new Random(seed);
        uint u = (uint)rand.Next(1 << 30);
        for (int i = 0; i < 15; i++)
        {
            state[i] = (uint)rand.Next(1 << 30) << 2 | u & 3u;
            state[i] ^= (uint)rand.Next(1 << 30);
            state[i] ^= (uint)rand.Next(1 << 30);
            u >>= 2;
        }
        state[15] = (uint)rand.Next(1 << 30) << 2 | (uint)rand.Next(4);
        state[15] ^= (uint)rand.Next(1 << 30);
        state[15] ^= (uint)rand.Next(1 << 30);
    }

    private static uint well512()
    {
        uint a, b, c, d;
        a = state[index];
        c = state[index + 13 & 15];
        b = a ^ c ^ (a << 16) ^ (c << 15);
        c = state[index + 9 & 15];
        c ^= c >> 11;
        a = state[index] = b ^ c;
        d = a ^ ((a << 5) & 0xda442d24u);
        index = index + 15 & 15;
        a = state[index];
        state[index] = a ^ b ^ d ^ (a << 2) ^ (b << 18) ^ (c << 28);
        return state[index];
    }

    private static ulong buf = 0;              // 64 random bits buffer
    private static int av = 0;                 // available bits in buf
    private static uint g_u = 0;               // copy of u
    private static int g_nd = 0;               // nd (needed) bits = highest set bit of u
    private static int g_s = 0;                // shift
    private static bool ones = false;          // true if u = 10, 110, 1110, ...

    private static uint well_uint(uint u)
    {
        uint x; int s;
        if (u == 0) return 0;
        if (u != g_u)
        {
            g_u = u;
            g_nd = nob(u);
            g_s = 64 - g_nd;
            ones = (u + 1 & u + 2) == 0;
        }
    L0: if (av < g_nd)
        {
            av += 32;
            buf <<= 32;
            buf |= well512();
        }
        x = (uint)buf << g_s >> g_s;
        buf >>= g_nd;
        av -= g_nd;
        if (x <= u) return x;
        if (ones) goto L0;                     // faster when u = 2, 6, 14, ...
        s = nob(x ^ u) - 1;
        av += s;
        buf <<= s;
        buf ^= x;
        goto L0;
    }

    private static int nob(uint u) // number of bits, position of the highest set bit, u > 0  
    {
        return
        u < 1u << 16 ? u < 1u << 08 ? u < 1u << 04 ? u < 1u << 02 ? u < 1u << 01 ? 01 : 02 :
                                                                    u < 1u << 03 ? 03 : 04 :
                                                     u < 1u << 06 ? u < 1u << 05 ? 05 : 06 :
                                                                    u < 1u << 07 ? 07 : 08 :
                                      u < 1u << 12 ? u < 1u << 10 ? u < 1u << 09 ? 09 : 10 :
                                                                    u < 1u << 11 ? 11 : 12 :
                                                     u < 1u << 14 ? u < 1u << 13 ? 13 : 14 :
                                                                    u < 1u << 15 ? 15 : 16 :
                       u < 1u << 24 ? u < 1u << 20 ? u < 1u << 18 ? u < 1u << 17 ? 17 : 18 :
                                                                    u < 1u << 19 ? 19 : 20 :
                                                     u < 1u << 22 ? u < 1u << 21 ? 21 : 22 :
                                                                    u < 1u << 23 ? 23 : 24 :
                                      u < 1u << 28 ? u < 1u << 26 ? u < 1u << 25 ? 25 : 26 :
                                                                    u < 1u << 27 ? 27 : 28 :
                                                     u < 1u << 30 ? u < 1u << 29 ? 29 : 30 :
                                                                    u < 1u << 31 ? 31 : 32;
    }

    static void Main()
    {
        random_bitmap();
        well_bitmap();
    }

    private static void random_bitmap()
    {
        int xmax = 450;
        int ymax = 450;
        var bmp = new Bitmap(xmax + 1, ymax + 1);
        int x, y, i = 0; uint z;
        for (x = 0; x <= xmax; x++)
        {
            for (y = 0; y < ymax; y++)
            {
                var rand = new Random(i++);
                z = (uint)rand.Next(1 << 30);
                if ((z & 1) == 0) bmp.SetPixel(x, y, Color.White);
                else bmp.SetPixel(x, y, Color.Black);
            }
        }
        bmp.Save(xmax.ToString() + "R.bmp");
    }

    private static void well_bitmap()
    {
        int xmax = 450;
        int ymax = 450;
        var bmp = new Bitmap(xmax + 1, ymax + 1);
        int x, y, i = 0; uint z;
        for (x = 0; x <= xmax; x++)
        {
            for (y = 0; y < ymax; y++)
            {
                init_well(i);
                i += 1;
                //z = well512();
                z = well_uint(1 << 30);
                z >>= 0;
                if ((z & 1) == 0) bmp.SetPixel(x, y, Color.White);
                else bmp.SetPixel(x, y, Color.Black);
            }
        }
        bmp.Save(xmax.ToString() + "W.bmp");
    }
}