b64
Description
Usage
Purpose
Get the source code
Details and requirements
Test suites
Benchmarks
A word about newline at EOF
A word about reading from STDIN and flushing to STDOUT "on-the-fly"
Table 1: Current status of make testflush for different encoders and platforms
A word about Awk and builtin bitwise functions
A word about Awk and the shebang line
A word about using Erlang applications as command line tools
A word about the node executable for JavaScript programs (for Linux users only)
Alternative encoder
Links
Description
b64 - A collection of Base64 encoding tools, written in different programming languages.
Usage
- File:
<base64_encoder> leviathan.txt
- Redirection:
<base64_encoder> < leviathan.txt
- Pipe:
cat leviathan.txt | <base64_encoder>
where <base64_encoder> is one of the following:
b64.sh
b64.awk
b64_no_builtin_bitwise_functions.awk
b64_awk.sh
b64.pl
b64_opt.pl
b64_pm.pl
b64.py
b64.js
b64
java b64
erl -noshell -s b64 main -s init stop
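Whichever <base64_encoder> is used, the core work is the same: take the input 3 bytes at a time, split the resulting 24 bits into four 6-bit indexes into the Base64 alphabet, and pad the last group with "=" when the input length is not a multiple of 3. The following is a minimal Python sketch of that idea, not a copy of b64.py; the names encode and main are hypothetical. A file name argument is read directly, otherwise STDIN is read, which covers both the redirection and the pipe usage.

```python
import sys

# Standard Base64 alphabet (RFC 4648, section 4).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode(data: bytes) -> str:
    """Encode bytes to Base64, processing 3 input bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = len(chunk)
        # Pack up to 3 bytes into a 24-bit integer, zero-filling missing bytes.
        b = chunk + b"\x00" * (3 - n)
        triple = (b[0] << 16) | (b[1] << 8) | b[2]
        # Split the 24 bits into four 6-bit indexes into the alphabet.
        chars = [ALPHABET[(triple >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte -> 2 chars + "==", 2 input bytes -> 3 chars + "=".
        if n < 3:
            chars[n + 1:] = "=" * (3 - n)
        out.append("".join(chars))
    return "".join(out)


def main(argv):
    # "File" usage: read the named file. Otherwise read STDIN, which
    # covers both the "Redirection" and the "Pipe" usages.
    if len(argv) > 1:
        with open(argv[1], "rb") as f:
            data = f.read()
    else:
        data = sys.stdin.buffer.read()
    # This sketch ends its output with a newline; the project's encoders may differ.
    sys.stdout.write(encode(data) + "\n")
```

To run it as a command-line tool, save it to a file and append the usual `if __name__ == "__main__": main(sys.argv)` guard. The sketch reads the whole input into memory, which keeps it short but is not suited to very large files.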
Purpose
The main purpose is to offer a "language collection" of Base64 encoders.
This project should be seen as a collection of sample code.
The code is not necessarily optimized, even if there have been attempts to do so.
The source code is there to be copied, pasted, adapted, and improved.
Nevertheless, the code has been tested, and should work out-of-the-box.
If you find or suspect a bug, or have suggestions for improvements, please contact me at johan at kuu dot se.
The code aims to be standalone, with as few dependencies as possible, using no external programs or libraries whenever possible.
The exceptions are the Bourne Shell and Awk versions, which depend on the external od tool.
However, od is assumed to be included in any modern UNIX-like distribution, be it Linux, *BSD, MSYS/MSYS2, or Mac OS X.
Another exception is the Perl version b64_pm.pl, which depends on MIME::Base64::Perl. (However, there are also two standalone Perl versions.)
The main purpose of this project is to encode to Base64.
There are no plans for this project to include Base64 decoders.
The only decoder is a single script, b64_decode.pl, which is mainly used to test the encoders.
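Testing an encoder with a decoder amounts to a round trip: encode a file, decode the result, and compare it with the original bytes. Below is a small Python sketch of such a check. It uses Python's own base64 module as the reference decoder instead of b64_decode.pl, and check_encoder is a hypothetical helper, not part of the project's Makefile.

```python
import base64
import binascii
import subprocess


def check_encoder(cmd, path):
    """Run the encoder command `cmd` (a list) on `path` and return True
    if decoding its output gives back the original bytes."""
    with open(path, "rb") as f:
        original = f.read()
    result = subprocess.run(cmd + [path], capture_output=True, check=True)
    # Drop whitespace (e.g. a trailing newline) before decoding;
    # validate=True makes any other stray character an error.
    encoded = b"".join(result.stdout.split())
    try:
        return base64.b64decode(encoded, validate=True) == original
    except binascii.Error:
        return False
```

For example, `check_encoder(["perl", "b64.pl"], "leviathan.txt")` would run the Perl encoder on leviathan.txt, assuming both files are in the current directory.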
Get the source code
To get all files, clone the fossil repository:
fossil clone http://kuu.se/fossil/b64.cgi b64.fossil
You can also browse the files online at http://kuu.se/fossil/b64.cgi.
Details and requirements
Encoders
b64.sh
Bourne shell script.
This version depends on sh and od, both available on all modern UNIX-like systems, including Linux, *BSD, MSYS, Cygwin, and Mac OS X.
NOTE: Very slow on MSYS/Windows XP. However, when tested on MSYS64/MS Windows 8, performance is acceptable, even on big binary files.
b64.awk
Awk script. Depends on the awk executable.
This version uses the built-in bitwise operator functions, such as and(), or(), lshift(), and rshift().
b64_no_builtin_bitwise_functions.awk
Awk script. Depends on the awk executable.
This version emulates the bitwise operator functions, such as and(), or(), lshift(), and rshift().
To be used only with versions of Awk lacking built-in bitwise operator functions.
NOTE: Not surprisingly, this version is less efficient than b64.awk. Very slow on big binary files.
b64_awk.sh
Awk embedded in a Bourne shell script.
Auto-detects whether Awk supports built-in bitwise operator functions.
It can be seen as a platform-independent combination of the two Awk scripts above.
This version depends on sh and awk.
b64.pl
Perl script - standalone.
b64_pm.pl
Perl script - requires MIME::Base64::Perl (included in this repository).
Uses a more optimized (but also less legible!) encoder than b64.pl.
b64_opt.pl
Perl script - standalone and optimized. Combines b64.pl and b64_pm.pl.
b64.py
Python script. Depends on python. Has been tested with both python2 and python3.
b64.js
The JavaScript version depends on node.
Note that the executable may be called nodejs on some Linux distributions, which may cause some problems. Read more in "A word about the node executable for JavaScript programs".
b64
C (source: b64.c). The C version depends on a C compiler. Tested with gcc and clang.
b64.class
Java (source: b64.java). The Java version depends on java, and needs javac to compile. Any version will do.
b64.beam
Erlang (source: b64.erl). The Erlang version requires the erl shell, and needs erlc to compile.
Decoder
b64_decode.pl
Perl script - standalone. Only used to test the encoders.
Makefile
Makefile
Requires make and diff.
(NOTE: diff is not included in MSYS2 by default. Install it with: pacman -S diffutils)
Tool to compile and run the C, Java, and Erlang versions.
Type make or make help to get help instructions.
The Makefile also includes test suites and benchmarks for the different encoders.
Type make helptest to get instructions about how to run the test suites.
Type make helpbench to get instructions about how to run the benchmarks.
Test suites
Type make helptest for details on how to run the test suites.
Benchmarks
Type make helpbench for details on how to run the benchmarks.
Five benchmarks have been considered:
- Small text file (leviathan.txt)
- Small binary file (kuuse.png)
- Big binary file (flyttkortlegacy.png)
- Redirection, big binary file (flyttkortlegacy.png)
- Pipe, small text file (leviathan.txt)
Note: The results from these benchmarks should only be seen as very rough indicators of differences between scripts and/or platforms.
A word about newline at EOF
Encoding a text file and encoding the very same text copied into a variable will not produce the same result (the last characters and the padding will differ).
This is because the file ends with a newline byte before EOF, so the file size is one byte larger than the string length.
To get an identical encoded result, we either have to add a newline to the string variable, or remove the last newline from the file.
To create a file without the last newline, try this:
perl -pe 'chomp if eof' leviathan.txt > leviathan_without_last_newline.txt
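The effect is easy to reproduce. In this Python sketch, "Man" stands in for the text held in a variable, and the same text plus a final newline stands in for the file; strip_final_newline is a hypothetical helper that mimics the perl -pe 'chomp if eof' command above.

```python
import base64


def strip_final_newline(data: bytes) -> bytes:
    """Drop one trailing newline, like perl -pe 'chomp if eof'."""
    return data[:-1] if data.endswith(b"\n") else data


text = b"Man"              # the text held in a variable
file_bytes = text + b"\n"  # the same text as saved in a file

print(base64.b64encode(text))                             # b'TWFu'
print(base64.b64encode(file_bytes))                       # b'TWFuCg=='
print(base64.b64encode(strip_final_newline(file_bytes)))  # b'TWFu'
```

The extra newline byte (0x0A) turns a 3-byte input into a 4-byte one, which adds a whole padded group ("Cg==") at the end.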
A word about reading from STDIN and flushing to STDOUT "on-the-fly"
All the encoders can read from STDIN, using either redirection or a pipe, and output is always sent to STDOUT.
However, different languages deal differently with I/O operations, both when reading input and when flushing output.
Check here for all the gory details about I/O. (Credits to my friend Mattias for the link.)
This makes it tricky to read data and encode it "on the fly".
This can be seen by running the test suite
make testflush
The first problem is the shell scripts (b64.sh, b64_awk.sh), which read lines of data, not individual bytes.
This makes it harder to encode the input in chunks whose length is a multiple of 3 bytes, which Base64 needs: every 3 input bytes map to exactly 4 output characters, and only the final chunk may be padded.
Besides, flushing output "on-the-fly" from shell scripts does not work as expected.
Even when using the stdbuf tool, data sent to STDOUT is not flushed correctly.
In fact, these two scripts are the only test cases that fail on Linux and FreeBSD.
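Why chunk boundaries matter can be shown with Python's standard base64 module: encoding pieces separately and concatenating the results only gives the same output as encoding everything at once when every piece except the last has a length that is a multiple of 3. Lines read by a shell script can have any length, so this is hard to guarantee there. encode_in_chunks is a hypothetical helper for illustration.

```python
import base64

data = b"Hello, Base64!"  # 14 bytes


def encode_in_chunks(data: bytes, size: int) -> bytes:
    """Encode each `size`-byte chunk separately and concatenate the results."""
    return b"".join(base64.b64encode(data[i:i + size])
                    for i in range(0, len(data), size))


whole = base64.b64encode(data)
# 4-byte chunks: each one gets "==" padding, which ends up mid-output.
print(encode_in_chunks(data, 4) == whole)  # False
# 3-byte chunks (or any multiple of 3): identical to encoding at once.
print(encode_in_chunks(data, 3) == whole)  # True
print(encode_in_chunks(data, 6) == whole)  # True
```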
The second problem is Awk.
I could not make Awk flush its own data "on-the-fly", so I had to use the stdbuf tool.
It is not an elegant solution, but at least data is flushed correctly "on-the-fly".
Note that stdbuf is not available on MS Windows, so there both Awk scripts, b64.awk and b64_no_builtin_bitwise_functions.awk, also fail when running make testflush (see Table 1).
The third problem is Python.
To make b64.py flush data "on-the-fly" correctly, the input buffer must be very small.
It works, but is not efficient.
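One possible alternative to a very small buffer, sketched here in Python 3 under the assumption of binary streams, is to read with read1(), which returns whatever data is already available (up to the requested size) instead of waiting for the buffer to fill, then encode the largest multiple of 3 bytes and flush at once, keeping the 0-2 leftover bytes for the next round. This is not what b64.py does; stream_encode is a hypothetical function, and CHUNK merely mirrors BUF_IN_SIZE_STDIN_FLUSH_UGLY_HACK = 3 * 16 from the notes under Table 1.

```python
import base64

# Hypothetical read size, mirroring BUF_IN_SIZE_STDIN_FLUSH_UGLY_HACK = 3 * 16.
CHUNK = 3 * 16


def stream_encode(inp, out, chunk=CHUNK):
    """Read binary stream `inp` piece by piece, write Base64 to binary
    stream `out`, and flush after every piece so output appears early."""
    pending = b""
    while True:
        # read1() returns as soon as some data is available (at most
        # `chunk` bytes) instead of waiting for a full buffer.
        piece = inp.read1(chunk)
        if not piece:
            break
        pending += piece
        # Only whole 3-byte groups can be encoded without padding.
        cut = len(pending) - len(pending) % 3
        if cut:
            out.write(base64.b64encode(pending[:cut]))
            out.flush()
            pending = pending[cut:]
    # At EOF, encode the remaining 0-2 bytes (padded if needed).
    out.write(base64.b64encode(pending))
    out.flush()
```

On a pipe it would be called as `stream_encode(sys.stdin.buffer, sys.stdout.buffer)`. Whether this approach passes make testflush would still need to be verified.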
The fourth problem is Erlang. Or rather, my lack of knowledge about Erlang.
Flushing data "on-the-fly" can probably be implemented efficiently, but I chose to use a small buffer, much like the fix for the Python version.
Anyone with a solution to any of these problems is very welcome to contact me at johan at kuu dot se.
Table 1 shows the current status of make testflush for different encoders and platforms.
Table 1: Current status of make testflush for different encoders and platforms
Program | Linux Ubuntu 14.04 | FreeBSD 10.1 | MSYS2/MS Windows 8 | Mac OS X 10.5.8 |
---|---|---|---|---|
b64.sh | ERROR | ERROR | ERROR | Not tested |
b64.awk | OK *, ** | OK *, ** | ERROR | Not tested |
b64_no_builtin_bitwise_functions.awk | OK *, ** | OK *, ** | ERROR | Not tested |
b64_awk.sh | ERROR | ERROR | ERROR | Not tested |
b64.pl | OK | OK | OK | Not tested |
b64_pm.pl | OK | OK | OK | Not tested |
b64_opt.pl | OK | OK | OK | Not tested |
b64.py | OK ** | OK ** | OK ** | Not tested |
b64.js | OK | OK | OK | Not tested |
b64 | OK | OK | OK | Not tested |
b64.class | OK | OK | OK | Not tested |
b64.beam | OK ** | OK ** | OK ** | Not tested |
* Works with the help of the external tool 'stdbuf'
** Works when using a small input buffer when reading from STDIN, i.e. BUF_IN_SIZE_STDIN_FLUSH_UGLY_HACK = 3 * 16
A word about Awk and builtin bitwise functions
Encoding to Base64 relies on bitwise operations (shifts and masks), whether builtin or emulated, no matter which programming language is used.
However, in the case of Awk, not all versions come with builtin bitwise functions.
There are at least three Awk versions:
- GNU Awk (a.k.a. gawk on some platforms) - builtin bitwise functions? YES
  - Linux
  - MSYS/MinGW
- OpenBSD Awk - builtin bitwise functions? YES
  - OpenBSD (quite obvious, eh?)
- BSD Awk - builtin bitwise functions? NO
  - FreeBSD
  - NetBSD
  - Mac OS X
In the third case (FreeBSD, NetBSD, Mac OS X), the bitwise functions have to be emulated.
This is far less efficient than using an Awk version with builtin bitwise functions.
If you want to do some serious Awk development with Base64 on these platforms, you should probably stick to the gawk version when available.
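The emulation only needs plain arithmetic: a left shift by n is a multiplication by 2^n, a right shift is an integer division by 2^n, and and/or can be computed one bit at a time with modulo 2. The sketch below shows the idea in Python (where the real operators would normally be used); it is a hypothetical illustration, not the code of b64_no_builtin_bitwise_functions.awk. In Awk, the integer division x // 2 would be written int(x / 2). The bit-by-bit loops also show why the emulated version is so much slower.

```python
def lshift(x, n):
    """x << n using multiplication only."""
    return x * 2 ** n


def rshift(x, n):
    """x >> n for non-negative x, using integer division only."""
    return x // 2 ** n


def bit_and(a, b):
    """a & b for non-negative integers, one bit at a time."""
    result, place = 0, 1
    while a > 0 and b > 0:
        if a % 2 == 1 and b % 2 == 1:
            result += place
        a, b, place = a // 2, b // 2, place * 2
    return result


def bit_or(a, b):
    """a | b for non-negative integers, one bit at a time."""
    result, place = 0, 1
    while a > 0 or b > 0:
        if a % 2 == 1 or b % 2 == 1:
            result += place
        a, b, place = a // 2, b // 2, place * 2
    return result


# Example: the four 6-bit Base64 indexes of the bytes "Man" (77, 97, 110).
triple = bit_or(bit_or(lshift(77, 16), lshift(97, 8)), 110)
indexes = [bit_and(rshift(triple, s), 63) for s in (18, 12, 6, 0)]
print(indexes)  # [19, 22, 5, 46], i.e. "TWFu"
```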
A word about Awk and the shebang line
The shebang line for Awk is tricky.
First, Awk uses the '-f' flag when running a program file.
This adds an argument to the shebang line. That works when awk is given with an absolute path, but an absolute path is not portable, since awk is installed in different places on different systems:
What should be used?
#!/usr/bin/awk -f
or
#!/usr/local/bin/awk -f
Then someone (me, maybe?) says: Use this!
#!/usr/bin/env -S awk -f
Unfortunately, that solution only works with BSD env. It does not work on Linux or on Cygwin, so it is not portable. (Newer GNU coreutils releases, 8.30 and later, also support env -S, but older systems do not.)
We also have to autodetect if gawk is present on the system, because of the builtin bitwise functions issue.
So we end up with this:
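#!/bin/sh
# sh: "false" fails harmlessly, then the eval lines pick gawk if it
# runs (else awk) and re-execute this very file with it.
# awk: "false" is an uninitialized variable, so this block never runs.
false {
    eval "AWK=`(gawk 'BEGIN{print \"gawk\"}') 2>/dev/null||echo awk`"
    eval "exec ${AWK} -f `which $0` ${1+\"$@\"}"
}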
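As we see, the shebang line says it actually is a shell script.
Basically, it says: try to execute gawk, and if that fails, use awk instead; then run this very file again with the chosen Awk.
Awk itself never runs the block, since false is just an uninitialized (and therefore false) variable.
This should be portable to all platforms.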
Shebang credits to:
http://perfec.to/shebang/shebang.nawk.txt
A word about using Erlang applications as command line tools
Writing the Erlang version of b64 came out of pure curiosity.
Erlang is not very user-friendly when it comes to invoking a program from the command line, especially with command-line arguments.
Erlang is a very powerful language, but it isn't focused on creating CLI tools.
Read the code as it was created: with curiosity. ;-)
If you want to do some serious Erlang development with Base64, you should probably stick to Erlang's own base64 module:
http://www.erlang.org/doc/man/base64.html
A word about the node executable for JavaScript programs (for Linux users only)
Problem:
The 'node' binary is called 'nodejs' on some Linux distros, so the following shebang line will fail:
#!/usr/bin/env node
Fix:
Debian/Ubuntu: sudo apt-get install nodejs-legacy
Other distros: sudo ln -s "$(which nodejs)" /usr/bin/node
Alternative encoder
An easy way to Base64 encode using openssl (the -A flag makes openssl write the encoded output on a single line, without line breaks):
- With a file:
openssl base64 -A -in leviathan.txt
- Redirection:
openssl base64 -A < leviathan.txt
- Pipe:
cat leviathan.txt | openssl base64 -A
Links
This page:
http://kuu.se/fossil/b64.cgi
Theory:
http://en.wikipedia.org/wiki/Base64
RFC:
https://tools.ietf.org/html/rfc4648
The 'base64' command-line tool, available for many platforms:
http://www.fourmilab.ch/webtools/base64/
Another C implementation:
http://base64.sourceforge.net/
A C/C++ implementation as a lib:
http://libb64.sourceforge.net/
A nice web interface for encoding images to CSS classes:
http://b64.io/
An efficient JavaScript implementation:
http://bl.ocks.org/mbostock/492147
Misc Base64:
https://en.wikibooks.org/wiki/Algorithm_Implementation/Miscellaneous/Base64