b64: A software collection of Base64 encoders

b64

Description
Purpose
Get the source code
Details and requirements
Test suites
Benchmarks
A word about newline at EOF
A word about reading from STDIN and flushing to STDOUT "on-the-fly"
Table 1: current status of make testflush for different encoders and platforms
A word about Awk and builtin bitwise functions
A word about Awk and the shebang line
A word about using Erlang applications as command line tools
A word about the node executable for JavaScript programs (for Linux users only)
Alternative encoder
Links

Description

b64 - A collection of Base64 encoding tools, written in different programming languages.

Usage

where <base64_encoder> is one of the following:

Purpose

The main purpose is to offer a "language collection" of Base64 encoders.
This project should be seen as a collection of sample code.
The code is not necessarily optimized, although some attempts have been made to optimize it.
The source code is there to be copied, pasted, adapted, and improved.
Nevertheless, the code has been tested, and should work out-of-the-box.
If you find or suspect a bug, or want to suggest improvements, please contact me at johan at kuu dot se.

The code aims to be standalone, with as few dependencies as possible, avoiding external programs and libraries whenever possible.
The exceptions are the Bourne shell and Awk versions, which depend on the external od tool.
However, od can be assumed to be included in any modern UNIX-like distribution, be it Linux, *BSD, MSYS/MSYS2, or Mac OS X.
Another exception is the Perl version b64_pm.pl, which depends on MIME::Base64::Perl. (There are, however, also two standalone Perl versions.)

The main purpose of this project is to encode to Base64.
There are no plans for this project to include Base64 decoders.
The only decoder is a single script, b64_decode.pl, which is mainly used to test the encoders.
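The idea of testing an encoder by decoding its output can be sketched in Python. This is not b64_decode.pl: round_trip_ok() is a hypothetical helper that uses Python's standard base64 module as the reference decoder, and it assumes that the encoder output may be wrapped over several lines.

```python
import base64


def round_trip_ok(encoded: bytes, original: bytes) -> bool:
    """Return True if `encoded` is valid Base64 that decodes back to `original`.

    Hypothetical helper: whitespace (e.g. line breaks in wrapped encoder
    output) is removed first, then the data is decoded strictly.
    """
    compact = b"".join(encoded.split())
    try:
        # validate=True rejects any character outside the Base64 alphabet
        decoded = base64.b64decode(compact, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return decoded == original
```

In a test run, `encoded` would be the captured STDOUT of one of the encoders and `original` the raw bytes of the input file.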

Get the source code

To get all files, clone the fossil repository:

fossil clone http://kuu.se/fossil/b64.cgi b64.fossil

You can also browse the files here.

Details and requirements

Encoders

Decoder

Makefile

Test suites

Type make helptest for details on how to run the test suites.

Benchmarks

Type make helpbench for details on how to run the benchmarks.
Five benchmarks have been considered:

  1. Small text file (leviathan.txt)
  2. Small binary file (kuuse.png)
  3. Big binary file (flyttkortlegacy.png)
  4. Redirection, big binary file (flyttkortlegacy.png)
  5. Pipe, small text file (leviathan.txt)

Note: The results from these benchmarks should only be seen as very rough indicators of differences between scripts and/or platforms.

A word about newline at EOF

Encoding a text file and encoding the very same text copied into a variable will not produce the same result (the padding will be different).
This is because the file ends with a newline byte before EOF, so the file size is one byte larger than the string length.
To get an identical encoded result, either add a newline to the string variable, or remove the last newline from the file.
To create a file without the last newline, try this:

perl -pe 'chomp if eof' leviathan.txt > leviathan_without_last_newline.txt
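The effect can be reproduced in a few lines of Python (used here only for illustration). The chomp() helper is hypothetical and mimics what the Perl one-liner above does: it removes a single trailing newline.

```python
import base64


def chomp(data: bytes) -> bytes:
    """Remove one trailing newline, like Perl's chomp (hypothetical helper)."""
    return data[:-1] if data.endswith(b"\n") else data


text = b"Hello"           # the text copied into a string variable
file_bytes = b"Hello\n"   # the same text as stored in a file, ending with a newline

print(base64.b64encode(text))        # b'SGVsbG8='  (5 bytes: one '=' of padding)
print(base64.b64encode(file_bytes))  # b'SGVsbG8K'  (6 bytes: no padding)

# Removing the last newline from the file data makes the results identical
print(base64.b64encode(chomp(file_bytes)) == base64.b64encode(text))  # True
```

The extra byte changes the input length modulo 3, which is why both the padding and the last characters change.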

A word about reading from STDIN and flushing to STDOUT "on-the-fly"

All the encoders can read from STDIN, using either redirection or a pipe, and output is always sent to STDOUT.
However, different languages deal differently with I/O operations, both when reading input and when flushing output.
Check here for all the gory details about I/O. (Credits to my friend Mattias for the link.)
This makes it tricky to read data and encode it "on the fly".
This can be seen by running the test suite:

    make testflush

The first problem is the shell scripts (b64.sh, b64_awk.sh), which read lines of data, not individual bytes.
This makes it hard to encode the input in groups of 3 bytes, which Base64 requires (every 3 input bytes become 4 output characters).
Besides, flushing output "on-the-fly" from shell scripts does not work as expected.
Even when using the stdbuf tool, data sent to STDOUT is not flushed correctly.
In fact, on Linux and FreeBSD, these two scripts are the only test cases that fail.
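To illustrate why line-based reading is a problem, here is a small Python demonstration (not part of b64): encoding each line separately puts padding in the middle of the output, while cutting the input at multiples of 3 bytes reproduces the encoding of the whole input.

```python
import base64

lines = [b"one\n", b"two\n"]   # what a line-oriented reader hands over
whole = b"".join(lines)

# Encoding line by line: each 4-byte line gets its own '=' padding
per_line = b"".join(base64.b64encode(line) for line in lines)
print(per_line)                  # b'b25lCg==dHdvCg==' (padding in the middle)
print(base64.b64encode(whole))   # b'b25lCnR3bwo='     (the correct encoding)

# Cutting the input into 3-byte groups gives the same result as the whole input
chunks = [whole[i:i + 3] for i in range(0, len(whole), 3)]
print(b"".join(base64.b64encode(c) for c in chunks) == base64.b64encode(whole))  # True
```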

The second problem is Awk.
I could not make Awk flush its own data "on-the-fly", so I had to use the stdbuf tool.
It is not an elegant solution, but at least data is flushed correctly "on-the-fly".
Note that stdbuf is not available on MS Windows, so there b64.awk and two more test cases fail when running make testflush.

The third problem is Python.
To make b64.py flush data "on-the-fly" correctly, the input buffer must be very small. It works, but is not efficient.
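One possible direction, sketched in Python under stated assumptions: read with read1(), which makes at most one read on the underlying stream and therefore returns whatever is available instead of waiting for a full buffer, encode only complete 3-byte groups, keep the 0 to 2 leftover bytes for the next round, and flush after every write. This is a swapped-in technique, not how b64.py works; encode_stream() and BUF_IN_SIZE are hypothetical names, with the buffer size taken from the footnote of Table 1.

```python
import base64
import io

# Mirrors the buffer size in the footnote of Table 1 (3 * 16 bytes);
# the name BUF_IN_SIZE is hypothetical.
BUF_IN_SIZE = 3 * 16


def encode_stream(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                  bufsize: int = BUF_IN_SIZE) -> None:
    """Base64-encode src to dst, writing output as soon as input arrives."""
    pending = b""
    while True:
        # read1() makes at most one read on the underlying stream, so it
        # returns what is available (up to bufsize) instead of waiting
        # until bufsize bytes have arrived.
        chunk = src.read1(bufsize)
        if not chunk:
            break  # EOF
        pending += chunk
        usable = len(pending) - len(pending) % 3   # largest multiple of 3
        if usable:
            dst.write(base64.b64encode(pending[:usable]))
            dst.flush()                             # push output immediately
            pending = pending[usable:]              # keep the 0-2 leftover bytes
    if pending:
        # At EOF, encode the last 1-2 bytes with '=' padding
        dst.write(base64.b64encode(pending))
        dst.flush()
```

From a script it could be called as encode_stream(sys.stdin.buffer, sys.stdout.buffer). Whether this makes the very small buffer unnecessary has not been checked with make testflush.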

The fourth problem is Erlang. Or rather, my lack of knowledge about Erlang.
Flushing data "on-the-fly" can probably be implemented efficiently, but I chose to use a small buffer, much like the fix for the Python version.

Anyone with a solution to any of these problems is very welcome to contact me at johan at kuu dot se.

Table 1 shows the current status of make testflush for different encoders and platforms.

Table 1: Current status of make testflush for different encoders and platforms

Program                               Linux Ubuntu 14.04  FreeBSD 10.1  MSYS2/MS Windows 8  Mac OS X 10.5.8
b64.sh                                ERROR               ERROR         ERROR               Not tested
b64.awk                               OK *, **            OK *, **      ERROR               Not tested
b64_no_builtin_bitwise_functions.awk  OK *, **            OK *, **      ERROR               Not tested
b64_awk.sh                            ERROR               ERROR         ERROR               Not tested
b64.pl                                OK                  OK            OK                  Not tested
b64_pm.pl                             OK                  OK            OK                  Not tested
b64_opt.pl                            OK                  OK            OK                  Not tested
b64.py                                OK **               OK **         OK **               Not tested
b64.js                                OK                  OK            OK                  Not tested
b64                                   OK                  OK            OK                  Not tested
b64.class                             OK                  OK            OK                  Not tested
b64.beam                              OK **               OK **         OK **               Not tested

*  Works with the help of the external tool 'stdbuf'
** Works when using a small input buffer when reading from STDIN, i.e. BUF_IN_SIZE_STDIN_FLUSH_UGLY_HACK = 3 * 16

A word about Awk and builtin bitwise functions

Base64 encoding relies on bitwise operations (shifts and masks), no matter which programming language is used.
However, in the case of Awk, not all versions come with builtin bitwise functions.
There are at least three Awk versions:

  1. GNU Awk (a.k.a. gawk on some platforms) - builtin bitwise functions? YES
    • Linux
    • MSYS/MinGW
  2. OpenBSD Awk - builtin bitwise functions? YES
    • OpenBSD (quite obvious, eh?)
  3. BSD Awk - builtin bitwise functions? NO
    • FreeBSD
    • NetBSD
    • Mac OS X

In the third case (FreeBSD, NetBSD, Mac OS X), the bitwise functions have to be emulated.
This is far less efficient than using an Awk version with builtin bitwise functions.
If you want to do some serious Awk development with Base64 on these platforms, you should probably stick to the gawk version when available.
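The difference can be illustrated with a Python sketch (Python is used only for illustration; this is not the Awk code). The same 3-byte-to-4-character algorithm is written once with shift and mask operators, as with gawk's builtin functions, and once with only integer division and modulo, which is one common way to emulate them. It is an assumption that b64_no_builtin_bitwise_functions.awk uses exactly this approach, and both function names are hypothetical.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_bitwise(data: bytes) -> str:
    """Encode using shifts (>>) and masks (&)."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        n = int.from_bytes(group.ljust(3, b"\0"), "big")  # 24-bit value
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 3 bytes -> 4 chars, 2 bytes -> 3 chars, 1 byte -> 2 chars; pad with '='
        keep = len(group) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


def encode_arithmetic(data: bytes) -> str:
    """Same algorithm without bitwise operators: a right shift by k bits
    becomes integer division by 2**k, and masking with 0x3F becomes % 64."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        padded = list(group) + [0] * (3 - len(group))
        n = padded[0] * 65536 + padded[1] * 256 + padded[2]  # 24-bit value
        chars = [ALPHABET[(n // 2 ** shift) % 64] for shift in (18, 12, 6, 0)]
        keep = len(group) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)
```

Both functions produce identical output; the arithmetic version simply replaces each cheap bit operation with a division and a modulo, which in an interpreted Awk without builtins becomes noticeably slower.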

A word about Awk and the shebang line

The shebang line for Awk is tricky.
First, Awk uses the '-f' flag when running a program file.
This adds an argument to the shebang line, which is fine when awk is given with an absolute path, but absolute paths are not portable.
Which one should be used?

#!/usr/bin/awk -f

or

#!/usr/local/bin/awk -f

Then someone (me, maybe?) says: Use this!

#!/usr/bin/env -S awk -f

Unfortunately, that solution only works with BSD env. It doesn't work with the GNU env found on Linux and Cygwin, at least not in versions older than GNU coreutils 8.30, which added the -S option, so it's not portable.
We also have to detect whether gawk is present on the system, because of the builtin bitwise functions issue.
So we end up with this:

#!/bin/sh
false {
    eval "AWK=`(gawk 'BEGIN{print \"gawk\"}') 2>/dev/null||echo awk`"
    eval "exec ${AWK} -f `which $0` ${1+\"$@\"}"
}

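As you can see, the shebang line says it is actually a shell script.
Basically it says: try to execute gawk, and if that fails, use awk instead, then run the same file again as an Awk program.
To Awk, the false { ... } block is a pattern that never matches, so the shell lines are ignored.
This should be portable to all platforms.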

Shebang credits to:
http://perfec.to/shebang/shebang.nawk.txt

A word about using Erlang applications as command line tools

Writing the Erlang version of b64 came out of pure curiosity.
Erlang is not very user-friendly when it comes to invoking a program from the command line, especially with command-line arguments.
Erlang is a very powerful language, but it isn't focused on creating CLI tools.
Read the code as it was created: with curiosity. ;-) If you want to do some serious Erlang development with Base64, you should probably stick to Erlang's own base64 module:
http://www.erlang.org/doc/man/base64.html

A word about the node executable for JavaScript programs (for Linux users only)

Problem:
The 'node' binary is called 'nodejs' on some Linux distros, so the following "shebang-line" will fail.

#!/usr/bin/env node

Fix:
Debian/Ubuntu: sudo apt-get install nodejs-legacy
Other distros: sudo ln -s "$(which nodejs)" /usr/bin/node

More info:
http://stackoverflow.com/questions/21168141/can-not-install-packages-using-node-package-manager-in-ubuntu/21171188#21171188

Alternative encoder

An easy way to Base64 encode using openssl:

Links

This page:
http://kuu.se/fossil/b64.cgi
Theory:
http://en.wikipedia.org/wiki/Base64
RFC:
https://tools.ietf.org/html/rfc4648
The 'base64' command tool, available for many platforms:
http://www.fourmilab.ch/webtools/base64/
Another C implementation:
http://base64.sourceforge.net/
A C/C++ implementation as a lib:
http://libb64.sourceforge.net/
A nice web interface for encoding images to CSS classes:
http://b64.io/
An efficient JavaScript implementation:
http://bl.ocks.org/mbostock/492147
Misc Base64:
https://en.wikibooks.org/wiki/Algorithm_Implementation/Miscellaneous/Base64