<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Computo ergo sum</title>
    <link>https://hiliev.eu/</link>
      <atom:link href="https://hiliev.eu/index.xml" rel="self" type="application/rss+xml" />
    <description>Computo ergo sum</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><copyright>(cc) 2008&amp;ndash;2020 Hristo Iliev</copyright><lastBuildDate>Sun, 09 Jul 2017 20:15:00 +0200</lastBuildDate>
    <image>
      <url>https://hiliev.eu/images/icon_hu0b7a4cb9992c9ac0e91bd28ffd38dd00_9727_512x512_fill_lanczos_center_2.png</url>
      <title>Computo ergo sum</title>
      <link>https://hiliev.eu/</link>
    </image>
    
    <item>
      <title>Recovering Datasets from Broken ZFS raidz Pools</title>
      <link>https://hiliev.eu/post/recovering-datasets-from-broken-zfs-raidz-pools/</link>
      <pubDate>Sun, 09 Jul 2017 20:15:00 +0200</pubDate>
      <guid>https://hiliev.eu/post/recovering-datasets-from-broken-zfs-raidz-pools/</guid>
      <description>&lt;p&gt;There are generally two kinds of people&amp;ndash;those who&amp;rsquo;ve suffered a severe data loss and those who are about to suffer a severe data loss.
I repeatedly jump back and forth between the two kinds.&lt;/p&gt;
&lt;p&gt;Recently, a combination of hardware defects and a series of power outages rendered the raidz pool of the NAS of my previous research group unreadable.
The OS, an old Solaris 10 x86, refused to import the pool, giving the dreaded I/O error message.
We tried importing it in various modern OpenSolaris-based live distributions, even forcing the kernel to try and fix errors when possible, without success.
Perhaps disabling the ZIL (because of performance problems with NFS clients) wasn&amp;rsquo;t such a good idea after all.
The lack of resources for proper preventive maintenance meant that there were no real backups to restore from.
Gone were a lot of research data, source code, PhD theses, emails, and web content.
Despair was growing, as it all happened in the middle of several ongoing project calls, and so was the need to accept that the data was most likely gone for good and that one had to start anew. Instead, I got curious&amp;ndash;what could really break in the &amp;ldquo;unbreakable&amp;rdquo; ZFS?
Until that moment, ZFS had been to me just a magical filesystem that could do things such as cheaply creating multiple filesets and instantaneous snapshots, and I had never taken a real interest in learning how all of this is implemented.
This time my curiosity won and I asked the sysadmin to wait a while before wiping the disks and let me first poke around the filesystem and see if I could make it readable again.
In the end, what started as a set of Python scripts to read and display on-disk data structures quickly grew into a very functional minimalistic ZFS implementation capable of reading and exporting entire datasets.&lt;/p&gt;
&lt;p&gt;It turned out that a fundamental structure of the ZFS pool known as Meta Object Set (MOS) was badly damaged, therefore the pool couldn&amp;rsquo;t be restored to an importable state, at least not without a more than considerable amount of effort, but reading what was readable brought back to life 1.4 TiB of data, among which all the research data and theses.
Most of the hard work in restoring the data happened automatically using a Python tool I assembled in the process.
It wasn&amp;rsquo;t easy to develop, especially given the poor state of implementation information about ZFS and the differences between the on-disk format and the one described in the only official documentation I was able to find online, therefore I&amp;rsquo;m making it available under open-source license in hope that it could help someone else too.
Most kudos go to Max Bruning for his conference talks on YouTube and especially the 
&lt;a href=&#34;http://mbruning.blogspot.de/2009/12/zfs-raidz-data-walk.html&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;ZFS Raidz Data Walk&lt;/a&gt; blog article and to the FreeBSD project for their excellent port of ZFS, which served as reference source code for some of the modules in my implementation.&lt;/p&gt;
&lt;p&gt;The tool&amp;ndash;aptly named &lt;code&gt;py-zfs-rescue&lt;/code&gt;&amp;ndash;is available under the 3-clause BSD license in its 
&lt;a href=&#34;https://github.com/hiliev/py-zfs-rescue&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;technical-details&#34;&gt;Technical Details&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;py-zfs-rescue&lt;/code&gt; is not for the faint of heart: there are no command-line options of any kind and the configuration is performed exclusively by altering the source code, so some knowledge of Python 3 is required.
One also has to have a good idea of how ZFS is structured internally, so here is a quick overview of the ZFS on-disk format.&lt;/p&gt;
&lt;p&gt;ZFS is an incredibly complex filesystem, but fundamentally it is just a huge tree of &lt;em&gt;blocks&lt;/em&gt; with the leaf blocks containing data and the rest containing various types of metadata.
The tree is rooted in what is known as &lt;em&gt;uberblock&lt;/em&gt;, which serves the same purpose as the superblock in most filesystems.
The uberblock itself (actually an entire array of uberblocks) is part of the ZFS &lt;em&gt;label&lt;/em&gt;, four copies of which are found on every device, disk partition, or file that is part of a ZFS vdev. Besides the uberblock array, the label contains a collection of key-value pairs with information about the type of the vdev and its constituent elements.
A typical label (in a 6-dev raidz1) looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;txg: 8510825
name: pool
version: 10
guid: 6106808927530115088
vdev_tree:
  children[0]:
    guid: 6106808927530115088
    id: 0
    type: disk
    path: /dev/dsk/c3t0d0s7
    devid: id1,sd@f0000000049be3b9a000ea8d90002/h
    whole_disk: 0
    phys_path: /pci@0,0/pci15d9,d280@1f,2/disk@0,0:h
    DTL: 35
  ... two children omitted for brevity ...
  children[3]:
    guid: 9245908906035854570
    id: 3
    type: disk
    path: /dev/dsk/c3t3d0s7
    faulted: 1
    devid: id1,sd@f0000000049bbf10a000ac4500003/h
    whole_disk: 0
    phys_path: /pci@0,0/pci15d9,d280@1f,2/disk@3,0:h
    DTL: 33
  ... two children omitted for brevity ...
  guid: 14559490109128549798
  asize: 2899875201024
  nparity: 1
  id: 0
  metaslab_array: 14
  metaslab_shift: 34
  is_log: 0
  ashift: 9
  type: raidz
hostid: 237373741
pool_guid: 1161904676014256579
hostname: spof
state: 0
top_guid: 14559490109128549798
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The label contains the pool &lt;code&gt;name&lt;/code&gt; (&lt;code&gt;pool&lt;/code&gt; in our case), the &lt;code&gt;guid&lt;/code&gt; of the component, the ZFS version, the list of vdevs in the pool together with their type and constituent devices (&lt;code&gt;vdev_tree&lt;/code&gt;), the pool &lt;code&gt;state&lt;/code&gt;, and information about the host that the pool belongs to (ours is named &lt;code&gt;spof&lt;/code&gt; for Single Point Of Failure, which it indeed proved to be&amp;hellip;).
By matching the GUIDs of the individual vdev components with the GUIDs in the &lt;code&gt;vdev_tree&lt;/code&gt; list, the OS is capable of assembling the pool even if the device names or paths change.
Faulty components are marked accordingly, like the fourth child (&lt;code&gt;/dev/dsk/c3t3d0s7&lt;/code&gt;) in this case.
There are two copies of the label at the beginning of each component and two copies at the end.&lt;/p&gt;
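&lt;p&gt;As an illustration of the label placement&amp;ndash;a minimal sketch, assuming the standard label size of 256 KiB (the &lt;code&gt;label_offsets&lt;/code&gt; helper is mine, not part of &lt;code&gt;py-zfs-rescue&lt;/code&gt;)&amp;ndash;the byte offsets of the four copies can be computed as:&lt;/p&gt;

```python
def label_offsets(dev_size, label_size=256 * 1024):
    # two label copies at the front of the device, two at the back;
    # 256 KiB per label copy is the standard ZFS layout
    return [0, label_size,
            dev_size - 2 * label_size, dev_size - label_size]
```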
&lt;p&gt;Each and every I/O operation in ZFS is performed in the context of a specific transaction, which groups a set of modifications to the data stored on the disk.
When a ZFS object is written to the disk, the transaction number is recorded as part of the metadata.
Unlike most other filesystems, ZFS stores data and metadata in blocks of varying sizes.
Each block is located by its &lt;em&gt;block pointer&lt;/em&gt;, which holds the type of the block, a checksum of its contents, the location and physical (possibly compressed) size (collectively known as a DVA) of up to three copies of the block data, the logical size of the data, and the compression type.&lt;/p&gt;
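&lt;p&gt;The DVAs are conventionally printed as &lt;code&gt;vdev:offset:asize&lt;/code&gt; triplets of hexadecimal numbers, as in the dataset dump further below. A hypothetical helper (not part of the tool) for unpacking such a triplet could look like this:&lt;/p&gt;

```python
def parse_dva(text):
    # split a textual DVA such as "0:92e4c6400:600" into its
    # vdev id, byte offset, and allocated size (all hexadecimal)
    vdev, offset, asize = (int(part, 16) for part in text.split(":"))
    return vdev, offset, asize
```

&lt;p&gt;Note that the offset is relative to the start of the vdev&amp;rsquo;s allocatable space; to read from the raw device one additionally has to skip the 4 MiB reserved at the front for the first two labels and the boot block.&lt;/p&gt;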
&lt;p&gt;ZFS is organised as a set of objects with each object represented by a &lt;em&gt;dnode&lt;/em&gt; (equivalent to the &lt;em&gt;inode&lt;/em&gt; in Unix filesystems) containing pointers to up to three associated groups of data blocks.
For some really small objects the data is stored within the free space of the dnode block itself and there are no associated data blocks.
dnodes are organised in arrays called &lt;em&gt;Object Sets&lt;/em&gt; with the notable exception of the dnode for the top-level object set (the MOS), a pointer to which is located in the uberblock.
By default, there are three copies of the MOS, two copies of the other metadata objects including the object sets and directory objects, and a single copy of the file data blocks.
The metadata blocks are usually compressed with 
&lt;a href=&#34;https://en.wikipedia.org/wiki/LZJB&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;LZJB&lt;/a&gt; (LZ4 in newer ZFS versions) while the file data blocks are uncompressed unless the dataset is configured accordingly.
The maximum block size is 128 KiB; larger objects are stored as block trees, with the blocks at the inner nodes containing arrays of block pointers to the lower levels.
Some simple modulo and integer division arithmetic is used to figure out which intermediate (&lt;em&gt;indirect&lt;/em&gt; in ZFS terminology) block at each level of the tree contains the relevant pointer.
The depth of the block tree is stored in the dnode.
All top-level objects such as datasets, dataset property lists, space maps, snapshots, etc., are stored in the MOS.&lt;/p&gt;
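&lt;p&gt;The index arithmetic can be sketched as follows, assuming 128 KiB indirect blocks holding 1024 block pointers of 128 bytes each (the &lt;code&gt;indirect_path&lt;/code&gt; function is illustrative, not taken from the tool):&lt;/p&gt;

```python
def indirect_path(blkid, nlevels, ptrs_per_block=1024):
    # for a data (level-0) block with id blkid, return the pointer
    # index to follow in each indirect block, from the top level down;
    # nlevels is the tree depth stored in the dnode
    path = []
    for level in range(nlevels - 1, 0, -1):
        path.append((blkid // ptrs_per_block ** (level - 1)) % ptrs_per_block)
    return path
```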
&lt;p&gt;Datasets are implemented as separate object sets consisting of all files and directories in a given dataset plus two (or more in newer ZFS implementations) special ZFS objects&amp;ndash;the &lt;em&gt;master node&lt;/em&gt; of the dataset and the &lt;em&gt;delete queue&lt;/em&gt;.
A typical dataset looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-text&#34;&gt;[ 0] &amp;lt;unallocated dnode&amp;gt;
[ 1] [ZFS master node] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS master node] 200L/200P DVA[0]=&amp;lt;0:92e4c6400:600&amp;gt; DVA[1]=&amp;lt;0:b1ca2b1800:600&amp;gt; birth=448585 fletcher4 off LE contiguous fill=1&amp;gt;
[ 2] [ZFS delete queue] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS delete queue] 200L/200P DVA[0]=&amp;lt;0:b2024e5c00:600&amp;gt; DVA[1]=&amp;lt;0:13110532c00:600&amp;gt; birth=449850 fletcher4 off LE contiguous fill=1&amp;gt;
[ 3] [ZFS directory] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS directory] 200L/200P DVA[0]=&amp;lt;0:b2024e5800:600&amp;gt; DVA[1]=&amp;lt;0:13110532800:600&amp;gt; birth=449850 fletcher4 off LE contiguous fill=1&amp;gt; bonus[264]
[ 4] [ZFS plain file] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS plain file] 200L/200P DVA[0]=&amp;lt;0:92e4ff400:600&amp;gt; birth=448593 fletcher2 off LE contiguous fill=1&amp;gt; bonus[264]
[ 5] [ZFS plain file] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS plain file] 200L/200P DVA[0]=&amp;lt;0:92e4ffc00:600&amp;gt; birth=448593 fletcher2 off LE contiguous fill=1&amp;gt; bonus[264]
[ 6] [ZFS plain file] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS plain file] 200L/200P DVA[0]=&amp;lt;0:92e500000:600&amp;gt; birth=448593 fletcher2 off LE contiguous fill=1&amp;gt; bonus[264]
[ 7] [ZFS plain file] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS plain file] 200L/200P DVA[0]=&amp;lt;0:b0c5cd3400:600&amp;gt; birth=450114 fletcher2 off LE contiguous fill=1&amp;gt; bonus[264]
[ 8] [ZFS directory] 1B 1L/16384 blkptr[0]=&amp;lt;[L0 ZFS directory] 200L/200P DVA[0]=&amp;lt;0:95e52f000:600&amp;gt; DVA[1]=&amp;lt;0:b1ce66a400:600&amp;gt; birth=449176 fletcher4 off LE contiguous fill=1&amp;gt; bonus[264]
... (many) more dnodes ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Directories are implemented simply as key-value collections stored in so-called &lt;em&gt;ZAPs&lt;/em&gt; (ZAP stands for ZFS Attribute Processor), with the file name as the key and, as the value, a bit field combining the index of the entry in the object set and the file type.
The master node of each dataset (always at index 1 in the object set) contains the index of the root directory&amp;rsquo;s ZAP, which tends to always be equal to &lt;code&gt;3&lt;/code&gt;.
File metadata such as owner, permissions, ACLs, timestamps, etc. is stored in the file&amp;rsquo;s dnode.
In order to reach the content of a specific file in a given dataset, the following has to be done:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Locate the dataset&amp;rsquo;s dnode in the MOS&lt;/li&gt;
&lt;li&gt;Read the content of the dataset&amp;rsquo;s object set&lt;/li&gt;
&lt;li&gt;Read the master node to find the root directory&amp;rsquo;s index&lt;/li&gt;
&lt;li&gt;Read the root directory to find the index of the next directory in the file path&lt;/li&gt;
&lt;li&gt;Recursively repeat the directory traversal until the index of the file object is found&lt;/li&gt;
&lt;li&gt;Walk the associated block tree to find pointers to all the file data blocks&lt;/li&gt;
&lt;/ul&gt;
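&lt;p&gt;The traversal can be illustrated with a toy model in which an object set is a plain dictionary mapping dnode indices to either directory ZAPs (name-to-index dictionaries) or file contents. The layout mirrors the dataset dump above; everything else about this sketch is heavily simplified:&lt;/p&gt;

```python
def resolve(objset, path):
    # the master node is always object 1; its ROOT entry gives the
    # index of the root directory's ZAP (3 in the toy object set)
    index = objset[1]["ROOT"]
    for name in path.strip("/").split("/"):
        index = objset[index][name]   # descend one directory level
    return objset[index]              # the file object itself

# toy object set: 1 = master node, 3 = root directory,
# 8 = a subdirectory, 4 = a plain file
objset = {1: {"ROOT": 3},
          3: {"home": 8},
          8: {"thesis.txt": 4},
          4: "file contents"}
```

&lt;p&gt;&lt;code&gt;resolve(objset, &#34;/home/thesis.txt&#34;)&lt;/code&gt; then walks the master node, the root directory, and the &lt;code&gt;home&lt;/code&gt; directory before returning the file&amp;rsquo;s contents.&lt;/p&gt;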
&lt;p&gt;The Python tool is capable of recursively following the root directory of a given dataset and either producing a CSV file with the content of the fileset (similar to &lt;code&gt;ls -lR&lt;/code&gt;) or creating a &lt;code&gt;tar&lt;/code&gt; file of the content with the associated metadata (owner, timestamps, and permissions).
It keeps symbolic links but ignores device nodes and special files.
It is possible to configure it to skip certain objects (provided as lists of IDs), which is useful when working with really large datasets.
The current version performs caching of certain objects, most notably the block trees, and achieves about 11 MiB/s read speed on our faulted server without any read-ahead optimisations.
A peculiar feature is the ability to access the pool remotely via a simple binary TCP protocol, e.g., over an SSH tunnel, which is exactly how I was using it throughout the entire development process.
This was more a result of the way the program was developed than a deliberate design decision, but I think it&amp;rsquo;s pretty nifty.
Supported are ZFS mirror and raidz1 vdevs as implemented in ZFS version 10 (the version an ancient Solaris 10 x86 ships with).
For raidz1 the tool is able to recover information on faulty devices using the checksum.
Up to date status information is available on the project&amp;rsquo;s GitHub page.&lt;/p&gt;
&lt;p&gt;I really hope nobody will ever need to use this tool.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Recipe: Obtaining Peak VM Size in Pure Fortran</title>
      <link>https://hiliev.eu/post/recipe-obtaining-peak-vm-memory-size-in-pure-fortran/</link>
      <pubDate>Thu, 22 May 2014 12:34:30 +0200</pubDate>
      <guid>https://hiliev.eu/post/recipe-obtaining-peak-vm-memory-size-in-pure-fortran/</guid>
      <description>&lt;p&gt;Often in High Performance Computing one needs to know about the various memory metrics of a given program with the peak memory usage probably being the most important one.
While the &lt;code&gt;getrusage(2)&lt;/code&gt; syscall provides some of that information, its use in Fortran programs is far from optimal and there are lots of metrics that are not exposed by it.&lt;/p&gt;
&lt;p&gt;On Linux one could simply parse the &lt;code&gt;/proc/PID/status&lt;/code&gt; file.
Being a simple text file it could easily be processed entirely with the built-in Fortran machinery as shown in the following recipe:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-fortran&#34;&gt;program test
  integer :: vmpeak

  call get_vmpeak(vmpeak)
  print *, &#39;Peak VM size: &#39;, vmpeak, &#39; kB&#39;
end program test

!---------------------------------------------------------------!
! Returns current process&#39; peak virtual memory size             !
! Requires Linux procfs mounted at /proc                        !
!---------------------------------------------------------------!
! Output: peak - peak VM size in kB                             !
!---------------------------------------------------------------!
subroutine get_vmpeak(peak)
  implicit none
  integer, intent(out) :: peak
  character(len=80) :: stat_key, stat_value
  !
  peak = 0
  open(unit=1000, file=&#39;/proc/self/status&#39;, status=&#39;old&#39;, err=99)
  do while (.true.)
    read(unit=1000, fmt=*, err=88, end=88) stat_key, stat_value
    if (stat_key == &#39;VmPeak:&#39;) then
      read(stat_value, fmt=*) peak
      exit
    end if
  end do
88 close(unit=1000)
  if (peak == 0) goto 99
  return
  !
99 print *, &#39;ERROR: procfs not mounted or not compatible&#39;
  peak = -1
end subroutine get_vmpeak
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code accesses the status file of the calling process &lt;code&gt;/proc/self/status&lt;/code&gt;.
The unit number is hard-coded, which could present problems in some cases.
Compilers implementing Fortran 2008 support the &lt;code&gt;NEWUNIT&lt;/code&gt; specifier and the following code could be used instead:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-fortran&#34;&gt;integer :: unitno 

open(newunit=unitno, file=&#39;/proc/self/status&#39;, status=&#39;old&#39;, err=99)
! ...
close(unit=unitno)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With older compilers the same functionality could be simulated using the 
&lt;a href=&#34;http://fortranwiki.org/fortran/show/newunit&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;following code&lt;/a&gt;.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>MPI Programming Basics</title>
      <link>https://hiliev.eu/post/mpi-programming-basics/</link>
      <pubDate>Tue, 18 Mar 2014 17:38:34 +0100</pubDate>
      <guid>https://hiliev.eu/post/mpi-programming-basics/</guid>
      <description>&lt;p&gt;Embracing current developments in educational technology, the IT Center of RWTH Aachen University (formerly the Center for Computing and Communication) has made available online the audio recordings of most tutorials delivered during this year&amp;rsquo;s 
&lt;a href=&#34;http://www.itc.rwth-aachen.de/ppces/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PPCES seminar&lt;/a&gt;.
Participation in PPCES is free of charge and the course materials have always been available online, but this is the first time that proper audio recordings were made.&lt;/p&gt;
&lt;p&gt;All videos (presentation slides + audio) are available on the 
&lt;a href=&#34;http://www.youtube.com/channel/UCtdrEoe46tD2IvJJRs_JH1A&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PPCES YouTube channel&lt;/a&gt; under Creative Commons Attribution license.
Course materials are available in the 
&lt;a href=&#34;https://sharepoint.campus.rwth-aachen.de/units/rz/HPC/public/Shared%20Documents/Forms/PPCES%202014.aspx&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;PPCES 2014 archive&lt;/a&gt; under unclear (read: do not steal blatantly) license.&lt;/p&gt;
&lt;p&gt;My own contribution to PPCES&amp;ndash;as usual&amp;ndash;consists of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Message passing with MPI, part 1:
Basic concepts and point-to-point communication&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/LBgx_S5ougk&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Message passing with MPI, part 2:
Collective operations and often-used patterns&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/CliFXC3kG90&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tracing and profiling MPI applications with VampirTrace and Vampir&lt;/p&gt;

&lt;div style=&#34;position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;&#34;&gt;
  &lt;iframe src=&#34;https://www.youtube.com/embed/pZVSs__h76Q&#34; style=&#34;position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;&#34; allowfullscreen title=&#34;YouTube Video&#34;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Big thanks to all the people who made recording and publishing the sessions possible.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Linear Congruency Considered Harmful</title>
      <link>https://hiliev.eu/post/linear-congruency-considered-harmful/</link>
      <pubDate>Sun, 15 Dec 2013 21:43:32 +0100</pubDate>
      <guid>https://hiliev.eu/post/linear-congruency-considered-harmful/</guid>
      <description>&lt;p&gt;Recently I stumbled upon 
&lt;a href=&#34;http://stackoverflow.com/questions/20452420/correct-openmp-pragmas-for-pi-monte-carlo-in-c-with-not-thread-safe-random-numbe&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;this Stack Overflow question&lt;/a&gt;.
The question&amp;rsquo;s author was puzzled as to why he didn&amp;rsquo;t see any improvement in the resultant value of $\pi$, approximated using a parallel implementation of the well-known Monte Carlo method, when he increased the number of OpenMP threads.
His expectation was that, since the number of Monte Carlo trials that each thread performs was kept constant, adding more threads would linearly increase the sample size and therefore improve the precision of the approximation.
He did not observe such improvement and blamed it on possible data races although all proper locks were in place.
The question seems to be related to an assignment that he got at his university.
What strikes me is the part of the assignment that requires him to use a specific 
&lt;a href=&#34;https://en.wikipedia.org/wiki/Linear_congruential_generator&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;linear congruential pseudo-random number generator&lt;/a&gt; (LCPRNG for short).
In his case, a terrible LCPRNG.&lt;/p&gt;
&lt;p&gt;An inherent problem with all algorithmic pseudo-random number generators is that they are deterministic and only mimic randomness since each new output is a well-defined function of the previous output(s) (thus the &lt;em&gt;pseudo-&lt;/em&gt; prefix).
The more previous outputs the generator function takes into account, the better the &amp;ldquo;randomness&amp;rdquo; of the output sequence can be made.
Since the internal state can only be of a finite length, every now and then the generator function would map the current state to one of the previous ones.
At that point the generator starts repeating the same output sequence again and again.
The length of the unique part of the sequence is called the cycle length of the generator.
The longer the cycle length, the better the PRNG.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Linear congruency is the worst method for generating pseudo-random numbers.&lt;/strong&gt;
The only reason it is still used is that it is extremely easy to be implemented, takes very small amount of memory, and it works acceptably well in some cases if the parameters are 
&lt;a href=&#34;https://en.wikipedia.org/wiki/Linear_congruential_generator#Period_length&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;chosen wisely&lt;/a&gt;.
It&amp;rsquo;s just that Monte Carlo simulations are rarely among those cases.
So what is the problem with LCPRNGs?
The problem is that their output depends solely on the previous one as the congruential relation is&lt;/p&gt;
&lt;p&gt;$$p_{i+1} \equiv (A \cdot p_i + B) \pmod{C},$$&lt;/p&gt;
&lt;p&gt;where $A$, $B$ and $C$ are constants.
If the initial state (the seed of the generator) is $p_0$, then the &lt;em&gt;i&lt;/em&gt;-th output is the result of $i$ applications of the generator function $f$ to the initial state, $p_i = f^i(p_0)$.
When it happens that an output repeats the initial state, i.e., $p_N = p_0$ for some $N &amp;gt; 0$, the generator loops since&lt;/p&gt;
&lt;p&gt;$$p_{N+i} = f^{N+i}(p_0) = f^i(f^N(p_0)) = f^i(p_N) = f^i(p_0) = p_i.$$&lt;/p&gt;
&lt;p&gt;As is also true with the human society, short memory leads to history repeating itself in (relatively short) cycles.&lt;/p&gt;
&lt;p&gt;The generator from the question uses $C = 741025$ and therefore it produces pseudo-random numbers in the range $[0, 741024]$.
For each test point two numbers are sampled consecutively from the output sequence, therefore a total of $C^2$ or about 550 billion points are possible.
Right?
Wrong!
The choice of parameters results in this particular LCPRNG having a cycle length of 49400, which is orders of magnitude worse than even the ANSI C pseudo-random generator &lt;code&gt;rand()&lt;/code&gt;, itself widely considered bad.
Since the cycle length is even, once the sequence folds over, the same set of 24700 points is repeated over and over again.
The unique sequence covers $49400/C$ or about 6.7% of the output range (which is already quite small).&lt;/p&gt;
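&lt;p&gt;The cycle length of any LCPRNG can be measured directly by iterating the recurrence until a state repeats&amp;ndash;a brute-force sketch, fine for small moduli like $C = 741025$ (the parameters in the usage example below are placeholders, not the ones from the question):&lt;/p&gt;

```python
def cycle_length(a, b, c, seed):
    # record the step at which each state is first seen; when a state
    # repeats, the difference in steps is the length of the cycle
    seen = {}
    state, step = seed, 0
    while state not in seen:
        seen[state] = step
        state = (a * state + b) % c
        step += 1
    return step - seen[state]
```

&lt;p&gt;For example, &lt;code&gt;cycle_length(5, 3, 16, 0)&lt;/code&gt; returns 16&amp;ndash;a full-period generator by the Hull&amp;ndash;Dobell theorem&amp;ndash;while poorly chosen parameters give cycles vastly shorter than the modulus, exactly as with the generator from the question.&lt;/p&gt;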
&lt;p&gt;A central problem in Monte Carlo simulations is the so called ergodicity or the ability of the simulated system to pass through all possible states.
Because of the looping character of the LCPRNG and the very short cycle length, there are many states that remain unvisited and therefore the simulation exhibits really bad ergodicity.
Not only this, but the output space is partitioned into 16 ($\lceil C/49400\rceil$) disjoint sets and there are only 16 unique initial values (seeds) possible.
Therefore only 32 different sets of points can be drawn from that generator (why 32 and not 16 is left as an exercise to the reader).&lt;/p&gt;
&lt;p&gt;How does this relate to the bad approximation of $\pi$?
The method used in the question is a geometric approximation based on the idea that if a set of points ${ P_i }$ is drawn randomly and uniformly from $ [0, 1) \times [0, 1) $, the probability that such a point lies inside a unit circle centred at the origin of the coordinate system is $\frac{\pi}{4}$.
Therefore:&lt;/p&gt;
&lt;p&gt;$$\pi \approx 4\frac{\sum_{i=1}^N \theta{}(P_i)}{N},$$&lt;/p&gt;
&lt;p&gt;where $\theta{}(P_i)$ is an indicator function that has a value of 1 for all points ${ P(x,y): x^2+y^2 \leq 1}$ and 0 for all other points and $N$ is the number of trials.
Now, it is well known that the precision of the approximation is proportional to $1/\sqrt{N}$ and therefore more trials give better results.
The problem in this case is that, due to the looping nature of the LCPRNG, the sum in the numerator is simply $m \times S_0$, where&lt;/p&gt;
&lt;div&gt;$$S_0 = \sum_{i=1}^{24700} \theta{}(P_i).$$&lt;/div&gt;
&lt;p&gt;For large $N$ we have $m \approx N/24700$ and therefore the approximation is stuck at the value of:&lt;/p&gt;
&lt;p&gt;$$\tilde{\pi} = 4 \frac{\sum_{i=1}^{24700} \theta(P_i)}{24700}.$$&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t matter if one samples 24700 points or 247000000 points.
The result is going to be the same, and the precision in the latter case is not going to be 100 times better but rather exactly the same as in the former case, with 9999 times the computational resources now effectively wasted.&lt;/p&gt;
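&lt;p&gt;For comparison, the same estimator driven by a decent generator&amp;ndash;a minimal sketch using Python&amp;rsquo;s built-in Mersenne twister from the &lt;code&gt;random&lt;/code&gt; module&amp;ndash;keeps improving as $N$ grows:&lt;/p&gt;

```python
import random

def estimate_pi(trials, seed=2017):
    # Monte Carlo estimate of pi: the fraction of uniformly drawn
    # points of the unit square that fall inside the unit circle is pi/4
    rng = random.Random(seed)   # MT19937, period 2**19937 - 1
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        # s = x*x + y*y lies in [0, 2); int(s) truncates to 0 for
        # points inside the circle and to 1 for points outside
        hits += 1 - int(x * x + y * y)
    return 4 * hits / trials
```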
&lt;p&gt;Adding more threads could improve the precision if:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;each thread has its own PRNG, i.e. the generator state is thread-private and not
globally shared, and&lt;/li&gt;
&lt;li&gt;the seed in each thread is chosen carefully so as not to reproduce some other thread&amp;rsquo;s
generator output.&lt;/li&gt;
&lt;/ul&gt;
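&lt;p&gt;In Python one would satisfy both conditions by giving each worker its own generator instance, each seeded differently&amp;ndash;a sketch under the (practically safe, not mathematically guaranteed) assumption that distinct integer seeds of the Mersenne twister yield effectively independent streams:&lt;/p&gt;

```python
import random

def make_generators(n, base_seed=12345):
    # one private PRNG instance per thread, so there is no globally
    # shared state; a large odd stride keeps the integer seeds distinct
    return [random.Random(base_seed + 1000003 * i) for i in range(n)]
```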
&lt;p&gt;It was already shown that there are at most 32 unique sets of points, therefore using only up to 32 threads makes sense, with an expected 5.7-fold increase in the precision of the approximation (less than one decimal digit).&lt;/p&gt;
&lt;p&gt;This leaves me scratching my head: was his lecturer grossly incompetent, or did s/he deliberately give him an exercise with such a bad PRNG so that he could learn how easily beautiful Monte Carlo methods are spoiled by bad pseudo-random generators?&lt;/p&gt;
&lt;p&gt;It should be noted that having a cyclic PRNG is not necessarily a bad thing.
Even if two different seed values result in the same unique sequence, they usually start the generator output at different positions in the sequence.
And if the sample size is small relative to the cycle length (or respectively the cycle length is huge relative to the sample size), it would appear as if two independent sequences are being sampled.
Not in this case though.&lt;/p&gt;
&lt;p&gt;Some final words.
Never use linear congruential PRNGs for Monte Carlo simulations!
Ne-ver!
Use something like 
&lt;a href=&#34;https://en.wikipedia.org/wiki/Mersenne_twister&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Mersenne twister MT19937&lt;/a&gt; instead.
Also don&amp;rsquo;t try to reinvent 
&lt;a href=&#34;https://en.wikipedia.org/wiki/RANDU&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;RANDU&lt;/a&gt; with all its ill consequences to the simulation science.
Thank you!&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
