Partially synchronising a structured folder (on a Synology machine) to a smaller storage volume

21 November 2014

One of the most enjoyable things writing code has to offer is automating tasks to the point where you can completely forget about them. There is nothing like walking down the street, getting an email on your phone from some long-forgotten process reminding you that something happened, and thinking “I can’t remember how that works but I built that”. Recently I’ve been dusting off and cleaning up some of my personal scripts to publish on my GitHub account, and in the process I stumbled upon a problem I solved a long time ago which at first glance seemed really simple but ended up being rather treacherous.

The situation is as follows: imagine a Synology server storing a video library where files are stored in a certain directory structure. This structure is important because Kodi (previously XBMC) requires specific naming conventions in order to successfully index TV shows in its library.

Now imagine that you move out to go to college and cobble together a small machine to use as an HTPC. The CPU is fine, the GPU is fine, but the storage capacity is nowhere near what you’ve grown accustomed to. You can’t mount the server at home as remote storage because your Austrian roommate will forever nag you about the traffic messing up his Dota game, and the connection isn’t fast enough to stream anyway. So what do you do?

I figured that the most logical solution was to just rsync the files in the directory to the local machine until the volume was full, starting with the most recently modified files. Seems easy, right? How about something like

find . -type f -printf '%T@ %p\n'

Just pipe that into sort and xargs to rsync the files individually to the HTPC until the volume is full. Unfortunately it's not that simple. Synology machines are equipped with BusyBox binaries instead of proper GNU utilities. This means that there is no -printf option on find, the xargs and stat applets are severely limited, and the default shell is the Almquist shell, which limits the options and convenience quite a bit. Fancy shell globbing tricks won't be possible either. A savage option would be to pipe something out of ls, but even that was off the table because the '-e' (list full date and time) flag of the BusyBox implementation doesn't work. A compromise could be to use find's '-mtime' flag to simply get all the files no older than a certain age, but I didn't like the idea of having the hdd just sitting there idly until new files came by.

The next step could be to look into options like utilizing the Debian chroot, or to find some other way to get better binaries on the system. None of that seems like the 'clean' way to do it, but luckily there is another route to take.

My solution consists of two scripts: a Python script that runs on the Synology server, listing all the files in the source directory and creating symlinks to the latest files in a temporary directory until a certain amount of bytes is reached, and a small shell script that runs on the HTPC.

This way, every time the latestsync.sh script is run on the small machine, a directory is made on the large server and filled with symlinks to the newest files, laid out in a copy of the original directory structure from the specified path, until this temporary tree has reached the amount of bytes allocated in the script. Then rsync on the smaller machine starts pulling everything from that directory down to the local volume.
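
To give an idea of what the server-side script does, the core logic boils down to walking the tree, sorting by modification time and symlinking until the byte budget runs out. The following is only an illustrative sketch with made-up names and paths, not the actual latestfilter.py (which is linked below):

import os

def build_symlink_tree(source, staging, max_bytes):
    # Collect (mtime, size, path) for every file under the source directory
    entries = []
    for directory, _, filenames in os.walk(source):
        for filename in filenames:
            path = os.path.join(directory, filename)
            info = os.stat(path)
            entries.append((info.st_mtime, info.st_size, path))

    # Newest files first, then link them until the budget is spent
    used = 0
    for mtime, size, path in sorted(entries, reverse=True):
        if used + size > max_bytes:
            break
        link = os.path.join(staging, os.path.relpath(path, source))
        if not os.path.isdir(os.path.dirname(link)):
            os.makedirs(os.path.dirname(link))  # mirror the directory structure
        os.symlink(path, link)
        used += size

# Hypothetical paths; the budget matches the -a value in the cron example below
build_symlink_tree('/volume1/RAID5/series', '/volume1/RAID5/latest', 1500000000000)

The shell script on the HTPC then only has to rsync that staging directory, presumably dereferencing the symlinks (rsync's -L/--copy-links) so that the actual file contents are transferred instead of dead links.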

To set it up, drop the latestfilter.py script somewhere on the remote server, for example:

wget https://raw.githubusercontent.com/vdloo/dotfiles/master/code/scripts/system/latestfilter.py -P /volume1/RAID5/other/code/scripts/system

On the local machine, download the latestsync.sh script to some directory:

wget https://raw.githubusercontent.com/vdloo/dotfiles/master/code/scripts/system/latestsync.sh -P ~/.dotfiles-public/code/scripts/system/

And run the script once in a while using a cron job like the one below, adjusted to your configuration. In this example -h is the remote host, -u the remote user, -p the SSH port, -a the amount of bytes to allocate and -l the local target directory:

0       7,12,17,22      *       *       *       SHELL=/bin/bash PATH=/bin:/sbin:/usr/bin:/usr/sbin bash /home/vdloo/.dotfiles-public/code/scripts/system/latestsync.sh -h example.com -u vdloo -p 22 -a 1500000000000 -l /mnt/lvmdisk/series > /dev/null 2>&1

Permalink


Implementing Fibonacci in finite fields, in Racket

03 July 2014

A couple of days ago Ruud van Asseldonk published a post about Fibonacci numbers in finite fields. The post first explains that Binet's formula holds up in a finite field, and then presents an implementation of this method in C++. Since I have started going through SICP, I thought it would be a nice exercise to port this implementation to Scheme.
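
In short: Binet's formula gives the n-th Fibonacci number in closed form,

F_n = \frac{(1 + \sqrt{5})^n - (1 - \sqrt{5})^n}{2^n \sqrt{5}}

and the key observation is that in a prime field GF(p) where 5 is a quadratic residue, \sqrt{5} is just another field element, so the right-hand side can be evaluated entirely with modular addition, subtraction, multiplication and exponentiation.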

The author's implementation consists of five functions:

  1. addmod
  2. submod
  3. mulmod
  4. powmod
  5. fib

Porting the first two functions is pretty straightforward.

The original addmod function

u64 addmod(u64 a, u64 b, u64 p)
{
    if (p - b > a) return a + b;
    else return a + b - p;
}

becomes:

(define (addmod a b p)
    (if  (> (- p b) a)
      (+ a b)
      (- (+ a b) p)
    )
)

and the original submod function

u64 submod(u64 a, u64 b, u64 p)
{
    if (a >= b) return a - b;
    else return p - b + a;
}

becomes:

(define (submod a b p)
    (if (>= a b)
      (- a b)
      (+ (- p b) a)
    )
)

The mulmod function is a bit more complicated:

u64 mulmod(u64 a, u64 b, u64 p)
{
    u64 r = 0;

    while (b > 0)
    {
        if (b & 1) r = addmod(r, a, p);
        b >>= 1;
        a = addmod(a, a, p);
    }

    return r;
}

The implementation in Scheme displays more clearly the difference between a functional programming language and the imperative C++; the Scheme version reads practically inside out compared to the original.

(define (mulmod a b p)
    (mulmod_rec a b p 0)
)
(define (mulmod_rec a b p r)
    (if (> b 0)
      (mulmod_rec (addmod a a p)
          (arithmetic-shift b -1)
          p
          (if (= 1 (bitwise-and b 1)) (addmod a r p) r)
      )
      r
    )
)

Instead of a while loop, the Scheme implementation of mulmod calls itself. Scheme barely needs dedicated looping constructs because of the way the language was designed: proper tail calls are guaranteed, so a recursive procedure like this one runs as an iterative process and no special form is required.

Porting this function was the moment where I decided to move over from mit-scheme to plt-scheme, also known as Racket. In order to stay close to the original I wanted to use the bitwise AND (&) and arithmetic right shift (>>) operations. Unfortunately, to the best of my knowledge mit-scheme doesn't offer built-in bitwise operations on anything but fixnums and bit strings, but luckily Racket does.

Next up is powmod:

u64 powmod(u64 a, u64 e, u64 p)
{
    u64 r = 1;

    while (e > 0)
    {
        if (e & 1) r = mulmod(r, a, p);
        e >>= 1;
        a = mulmod(a, a, p);
    }

    return r;
}

Which follows almost the exact same structure as mulmod:

(define (powmod a e p)
    (powmod_rec a e p 1)
)
(define (powmod_rec a e p r)
    (if (> e 0)
        (powmod_rec (mulmod a a p)
            (arithmetic-shift e -1)
            p
            (if (= 1 (bitwise-and e 1)) (mulmod r a p) r)
        )
        r
    )
)

Note that instead of assigning 0 to r before the while loop as the C++ mulmod does, we seed that 0 as a parameter when mulmod calls mulmod_rec. The same goes for powmod and powmod_rec, which seed 1.

To tie it all together we have the fib function:

u64 fib(u64 n)
{
    u64 p     = 0xa94fad42221f27ad;
    u64 v     = 0x0b92025517515f58;
    u64 v_inv = 0x242d231e3eb01b01;

    u64 a        = powmod(1 + v, n, p);
    u64 b        = powmod(p + 1 - v, n, p);
    u64 pow2_inv = powmod(2, p - 1 - n, p);

    u64 diff   = submod(a, b, p);
    u64 factor = mulmod(v_inv, pow2_inv, p);

    return mulmod(diff, factor, p);
}

Which in Racket looks like:

(define (fib n)
    (define q     12200160415121876909)
    (define v     833731445503647576)
    (define v_inv 2606778372125104897)
    (mulmod (submod (powmod (+ 1 v) n q)
              (powmod (- (+ q 1) v) n q)
              q
          )
          (mulmod v_inv
              (powmod 2 (- q 1 n) q)
              q
          )
          q
    )
)
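
As a quick sanity check (my own addition, not something from the original post), the formula is easy to verify against the plain iterative definition in a few lines of Python, using the decimal constants from the Racket version; Python's built-in three-argument pow performs modular exponentiation:

p     = 12200160415121876909  # the prime modulus, q in the Racket version
v     = 833731445503647576    # sqrt(5) mod p
v_inv = 2606778372125104897   # multiplicative inverse of v mod p

def fib_ff(n):
    a = pow(1 + v, n, p)          # (1 + sqrt(5))^n mod p
    b = pow(p + 1 - v, n, p)      # (1 - sqrt(5))^n mod p
    inv2n = pow(2, p - 1 - n, p)  # 2^(-n) mod p, by Fermat's little theorem
    return (a - b) * v_inv * inv2n % p

# Compare against the iterative definition; the loop should finish silently
a, b = 0, 1
for n in range(1, 94):
    assert fib_ff(n) == b
    a, b = b, a + b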

So now we have all the components to generate a Fibonacci number. It would of course be way more gratifying to have the program compute all the numbers up to the maximum of N=93 (the largest index whose Fibonacci number still fits in the field) instead of simply computing one at a time. To do that we can use the when form in a recursive procedure. Full disclosure though: it would probably be faster to use the regular iterative method if you are going to calculate all the numbers anyway, but that doesn't take away the fun.

(define (fib-iter n count result)
    (display "Computing fib for ")(display count)(display ": ")
    (display result)(newline)
    (when (< count n)
      (fib-iter n
            (+ count 1)
            (fib count)
      )
    )
)
(define (multifib n)
    (fib-iter n 1 0)
)
(multifib 93)

And that's all there is to it: a Racket Fibonacci calculator that, thanks to finite fields, runs in effectively constant time per number. Lots of praise to the original author for this fascinating nugget of information and for figuring out how to translate it into a working program.

You can find the put-together version of this Racket script in a GitHub Gist.

Permalink


The SICP reading project

02 July 2014

Programming all day is fun, but the reality is that I am in college studying something non-technical. I just wrapped up my third year of pursuing a BBA in International Media and Entertainment Management, a degree that focuses mainly on topics such as marketing, social research and, as the name suggests, management. While those fields are interesting in their own right, it is not the thought of The Lean Startup methodology or Alan Bryman's Social Research Methods that has me behind a keyboard late at night drinking coffee. Tinkering with computers is something special; there truly is nothing like it. Writing code gives you the chance to create systems limited in their complexity and potency only by your own capabilities and imagination.

Because the curriculum of my formal studies rarely ventures into the technical side of things, I have to resort to refining my programming skills in my free time. Writing code on school nights and at weekends is great, but doing so during a break is on a whole other level. The prospect of being able to sit down and hack away at the ideas in my head without disturbance is what has me counting down the days on the calendar to these rare long stints of uninterrupted time. Reaching that state of total immersion and focus is just a bit easier when the deadline for the next paper isn't in the back of your head.

For the past two years I have tried to use a large part of my summer break to sit down and teach myself something of value. In the summer of 2012 I spent many nights poring over Beej's Guide to Network Programming and taking apart the source of various packet-forging tools in order to cook up a way to send data over wifi without associations, a project that in retrospect turned out to be an impractical solution to the problem I was trying to solve, but that nonetheless served as a valuable learning experience. Last year I used the summer break to brush up on my PHP. I took a five-month back-end web development internship after the summer and I felt the need to prepare myself, because I wasn't really sure what to expect when transitioning from writing code as a hobbyist to doing so in a professional environment for the first time.

Even though I ended up doing fine during my internship, there were moments where I struggled because I am a self-taught programmer. Those moments could often be attributed to situations where I had simply never seen the problem at hand before. There were also situations where I had encountered the problem before, but in my non-traditional studies had only seen a specific implementation of the solution, failing to comprehend the abstract concept behind it.

When I think about autodidacticism I remember reading Richard Feynman's biography Surely You're Joking, Mr. Feynman!, where he mentions "having peculiar methods for doing integrals" because he was self-taught from the book Advanced Calculus by Frederick S. Woods. Feynman argues that he had an advantage over his peers because he had a different "box of tools": only after his peers had exhausted their own more standard "toolkit" would they come to him with their problem, and if he was able to solve it with his less conventional methods he would be regarded in a positive manner. During my internship I noticed this pattern as well, but unfortunately the dynamic works both ways. There were times where I felt at an advantage because I possessed skills that are not generally taught in a formal education, like jumping around in the command line and configuring servers. A Unix environment had always been my IDE, so these things were native to me, but when it came to things like design principles and programming paradigms I felt at a loss. There was a layer of common understanding that the people with a background in formal education seemed to have, a foundation of shared knowledge that made them able to communicate ideas to each other by relating the problem presented to a set of core principles that transcended implementation, a foundation that I was never forced to pick up in my self-studies.

Differentiating between programmers who are self-taught and those who have had a more traditional education is difficult, because in reality all programmers are self-educated. The only way to really learn is by tinkering, and that is something that in the end you have to do on your own. Learning software development is an iterative process where you bounce back and forth between doing and soaking up information. Being self-taught and being formally educated are not mutually exclusive in programming; that is what makes it possible to be formally educated and have unconventional methods, but also to be self-taught and have conventional methods. It is perfectly possible for someone to sit at home all day studying on Coursera or reading through college textbooks and learn the same things as someone attending a computer science course at a university. Conversely, it is also possible for a computer science major to pick up unconventional skills because he chooses to start working on his assignments without reading the course literature; we can all understand the urge to start building the proverbial IKEA closet without reading the manual. What I observed the real difference among programmers to be is the distinction between having learned from a structured set of information and having learned by means of "bricolage".

The word "bricolage" comes from the French verb "bricoler", which means "to tinker"; a "bricoleur" is a handyman. Anthropologist Claude Lévi-Strauss used the term in his 1962 book The Savage Mind to draw an analogy with mythical thought and how it "can reach brilliant unforeseen results on the intellectual plane" even though "the characteristic feature of mythical thought is that it expresses itself by means of a heterogeneous repertoire which, even if extensive, is nevertheless limited". In other words, mythical thought tries to solve new problems with the use of an existing collection of resources. Lévi-Strauss continues the analogy:

The "bricoleur" is adept at performing a large number of diverse tasks; but, unlike the engineer, he does not subordinate each of them to the availability of raw materials and tools conceived and procured for the purpose of the project. His universe of instruments is closed and the rules of his game are always to make do with "whatever is at hand".

The analogy of bricolage fits writing code really well; it is at the very core of hacker culture to celebrate the creative misuse of whatever tools and systems are available. It is exactly this spirit that drives people to pursue incredible feats such as implementing a LISP in sed. It says a lot about the culture around programming that the collective instantly recognizes the ingenuity of such projects and upvotes them to the top of Hacker News and /r/programming. The problem with bricolage in a context where any task can be completed with any tool, however, is that it becomes unnecessary to look for alternative and more appropriate solutions when encountering a new problem. If any task can be completed by "working with what you have", you are never forced to expand beyond your comfort zone. Here lies the danger of the Law of the instrument.

My learning process has generally been: a. form an idea of what needs to be done, b. find a way to do it, c. do it. This works very well for rapid development and for picking up a wide range of knowledge along the way; however, this bottom-up approach forces you to learn only the bare essentials of everything you want to do, making it more difficult to form a big picture in the long run. Step b, "find a way to do it", is where the bricolage manifests itself. You start searching from the things you know (the available materials): you google for keywords that in your mind are related to the goal you are trying to achieve, and when you find the solution you remember where to find it for the next time. This cycle repeats itself until you have a network of knowledge that enables you to solve many problems, but it won't necessarily yield the best solutions. You keep adding knowledge to your toolbox based on the limited set of knowledge you already had, essentially placing yourself in a bubble. You cannot google for what you do not know.

Only looking for what you need when you need it introduces many blind spots (unknown unknowns). I believe that in order to minimize the effect that the things you don't know have on your work, you can do two things: 1. get exposed to ideas that are novel to you, and 2. move as far away from implementation and as close to concepts as possible by focusing on abstraction. The first one is like watching cable television instead of Netflix: how would you ever know you like watching manatees on National Geographic if all you do is queue the next Person of Interest episode? The second one is about learning "why" instead of "how". Both of those solutions point to a formal education in computer science, because a university will guide you by telling you what to study. Seeing as I am already in the process of getting a different degree, I'm going to have to settle for something else.

I like books. Fiction, non-fiction, everything. They are a good use of trees and bytes, and they often present information on a topic from start to finish in a nicely structured fashion, something the web rarely offers. Whenever you scour the internet for programming book recommendations, the top results are usually K&R and SICP. While finishing K&R is still on my bucket list, SICP seems a more appropriate fit for my current learning goals.

Structure and Interpretation of Computer Programs, authored by Hal Abelson, Jerry Sussman and Julie Sussman, is a well-known computer science textbook that was used for the introductory course at MIT for many years. It is a book often praised for its focused attention on abstraction and for being one of the first books to teach "programming" instead of a "programming language". Many have argued that even though the book was originally published in 1985, nothing has been written since that teaches the concepts from the book better than the book itself. The book uses Scheme (a LISP dialect, typically run with mit-scheme) to explain its concepts. LISP is a family of programming languages that has been around since 1958 and is still relevant to the current industry, especially now that functional programming languages are on the rise again. You can't bat an eye before coming across someone's blog post about the next cool new thing in Haskell or Clojure.

So, as an attempt to break out of the cycle of predominantly learning only what I need to know in order to make things work, I have started working through the SICP exercises. Why specifically this book instead of any other book on computer science? I could argue that articles such as the one by Brian Harvey, or Peter Norvig's and Paul Graham's reviews on Amazon, convinced me, but the reason I haven't given up yet is the available community support. I have worked through a couple of exercises already, and because of the large number of blogs from people who have already done the exercises, there is a unique opportunity to look into the thought process others had while solving the same problems, which is really nice.

You can follow my take on the SICP exercises on GitHub.

Permalink