Description
Embed text/binary files as objects in C executables using either:
- ld(1) and objcopy(1), or
- xxd(1) and gcc/clang.
Quick start
make demo
Check Makefile.demo for details on how to create linkable object files from text files.
Notes about objcopy and xxd
Whether you use ld+objcopy or xxd+gcc doesn't matter much: both have limitations.
One limitation of ld+objcopy is that the output is a binary object file, so the generated symbol names and types are hard to change (objcopy --redefine-sym can rename a symbol, but not give it a C type).
On the other hand, xxd (with the -i option) outputs C code to stdout, which makes it possible to modify the output before passing it to the compiler.
Example:
echo hello > hello.txt
xxd -i hello.txt
The output:
unsigned char hello_txt[] = {
0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
};
unsigned int hello_txt_len = 6;
Notes about objcopy and xxd: Risk of conflicting names
Both ld+objcopy and xxd use the file name (possibly including a path) to create the symbol name.
Hyphens, slashes and dots are all converted to underscores (i.e. -, / and . all become _, which is why hello.txt becomes hello_txt).
This means that files with different names may produce the very same symbol name:
echo dummy > dummy_hello.txt
echo dummy > dummy-hello.txt
mkdir dummy
echo dummy > dummy/hello.txt
xxd -i dummy_hello.txt
xxd -i dummy-hello.txt
xxd -i dummy/hello.txt
Same output for all 3 files:
unsigned char dummy_hello_txt[] = {
0x64, 0x75, 0x6d, 0x6d, 0x79, 0x0a
};
unsigned int dummy_hello_txt_len = 6;
An easy way to avoid conflicts is to never use either - or _ in filenames.
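The collision can be reproduced with a small C helper. This is a minimal sketch assuming the mangling rule described above (every character that is not a letter or digit becomes _); xxd_symbol_name() and symbols_collide() are hypothetical names, and ld additionally wraps the result as _binary_<name>_start, _end and _size.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: derive the symbol name that xxd -i uses for a path,
 * assuming every character that is not a letter or digit becomes '_'.
 * Writes at most size-1 characters plus a terminating NUL into out. */
void xxd_symbol_name(const char *path, char *out, size_t size)
{
    size_t i;

    if (size == 0)
        return;
    for (i = 0; path[i] != '\0' && i < size - 1; i++)
        out[i] = isalnum((unsigned char)path[i]) ? path[i] : '_';
    out[i] = '\0';
}

/* Hypothetical check: return 1 if two paths map to the same symbol name.
 * Names longer than 255 characters are compared truncated. */
int symbols_collide(const char *p1, const char *p2)
{
    char s1[256], s2[256];

    xxd_symbol_name(p1, s1, sizeof s1);
    xxd_symbol_name(p2, s2, sizeof s2);
    return strcmp(s1, s2) == 0;
}
```

A build script could run such a check over all input files and refuse to link files whose names collide.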
Notes about objcopy and xxd: Only for text files handled as C strings - modify the xxd output
As pointed out here, if you want a more "C string-friendly" approach, and just want to printf() the text,
it would be more convenient to handle the text as a NUL-terminated string (char array):
const char hello_txt[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a, 0x00 };
const size_t hello_txt_len = 7;
The array length has to be incremented by one to make room for the terminating NUL byte (strlen(hello_txt) still returns 6), and the length should be of type size_t to match C string functions such as strlen().
This "C string-friendly" approach has its disadvantages though, for example when compiling from stdin
(convenient to avoid creating an intermediate C file):
xxd -i hello.txt | cc -c -xc -g - -o hello.o
(no output)
xxd -i hello.txt | sed 's/unsigned int/size_t/' | cc -c -xc -g - -o hello.o
<stdin>:4:1: error: unknown type name ‘size_t’
Oops. The output from xxd does not include any C headers, so only built-in C types may be used.
This can be resolved by including a header, but it also makes things more complicated:
(echo '#include <stddef.h>' && xxd -i hello.txt) | sed 's/unsigned int/size_t/' | cc -c -xc -g - -o hello.o
(no output)
Let's keep it simple, and use the output from xxd "as is". Or, at least, almost.
Declaring the variables as const makes sense:
xxd -i hello.txt | sed 's/unsigned/const unsigned/'
const unsigned char hello_txt[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
};
const unsigned int hello_txt_len = 6;
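And as we know the length of the text, we can still use printf(), even if the string isn't NUL-terminated. Instead of
printf("hello_txt=%s\n", hello_txt);
which reads past the end of the array, limit the output with a precision (which must be an int):
printf("hello_txt=%.*s\n", (int)hello_txt_len, (const char *)hello_txt);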
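A complete example combining the const xxd output with the bounded printf(). This is a sketch: the array is pasted from the xxd output above, and print_hello() is a hypothetical helper that writes to any FILE *.

```c
#include <stdio.h>

/* Output of: xxd -i hello.txt | sed 's/unsigned/const unsigned/' */
const unsigned char hello_txt[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
};
const unsigned int hello_txt_len = 6;

/* Print the embedded text without relying on a terminating NUL byte:
 * the precision in "%.*s" limits how many bytes printf() reads.
 * The precision argument must be an int, hence the cast. */
int print_hello(FILE *out)
{
    return fprintf(out, "hello_txt=%.*s", (int)hello_txt_len,
                   (const char *)hello_txt);
}
```

The embedded text already ends with 0x0a, so no extra \n is added to the format string; print_hello(stdout) prints "hello_txt=hello" followed by that newline.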
Notes (xxd only): Speed up parsing using C
In this project, the input files are parsed by a Makefile rule (see the rule 'struct' in the Makefile).
The Makefile rule works, but it is really big and not very elegant.
It is also probably quite slow for a large number of input files, as it makes heavy use of shell commands:
- Lots of printf's
- find (both files and directories)
- xxd
- file -b --mime-type to get the MIME type
- Piped output to either gcc or clang (uses cc to make it portable)
To speed things up (and make the Makefile rule much smaller), replace the shell commands with a C program:
- Use readdir(3) instead of find to get files and directories.
- Use the xxd source code to embed xxd inside the C program.
- Use the libmagic(3) library to get the MIME type instead of using file -b --mime-type.
- Use popen(3) to pipe output to cc.
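The xxd and popen(3) parts of that plan could look roughly like the sketch below. It is not the xxd source code: emit_xxd_i() re-implements the layout of the xxd -i output shown earlier (12 bytes per line), and compile_to_object() pipes it into cc. Both names are hypothetical, and the readdir(3) and libmagic(3) parts are left out.

```c
#define _POSIX_C_SOURCE 200809L /* popen(), pclose() */
#include <stdio.h>

/* Hypothetical sketch: write C code in the same layout as "xxd -i" for a
 * buffer, using sym as the symbol name. For an empty buffer this emits an
 * empty initializer, which ISO C before C23 does not allow.
 * Returns 0 on success, -1 on a write error. */
int emit_xxd_i(FILE *out, const char *sym,
               const unsigned char *data, size_t len)
{
    size_t i;

    if (fprintf(out, "unsigned char %s[] = {\n", sym) < 0)
        return -1;
    for (i = 0; i < len; i++) {
        int last = (i == len - 1);
        int end_of_line = last || (i % 12 == 11);

        /* 12 bytes per line, indented by two spaces, comma-separated */
        if (fprintf(out, "%s0x%02x%s%s",
                    (i % 12 == 0) ? "  " : "",
                    (unsigned)data[i],
                    last ? "" : ",",
                    end_of_line ? "\n" : " ") < 0)
            return -1;
    }
    if (fprintf(out, "};\nunsigned int %s_len = %lu;\n",
                sym, (unsigned long)len) < 0)
        return -1;
    return 0;
}

/* Hypothetical sketch: pipe the generated C code into the compiler with
 * popen(3), like "xxd -i file | cc -c -xc - -o obj". obj must not contain
 * a single quote. Returns 0 on success, -1 on failure. */
int compile_to_object(const char *sym, const unsigned char *data,
                      size_t len, const char *obj)
{
    char cmd[512];
    FILE *cc;
    int rc;

    if (snprintf(cmd, sizeof cmd, "cc -c -xc - -o '%s'", obj) >= (int)sizeof cmd)
        return -1;
    cc = popen(cmd, "w");
    if (cc == NULL)
        return -1;
    rc = emit_xxd_i(cc, sym, data, len);
    /* pclose() waits for cc and returns its exit status */
    if (pclose(cc) != 0)
        rc = -1;
    return rc;
}
```

Writing to a FILE * keeps emit_xxd_i() usable both for the pipe to cc and for writing a plain C file when debugging.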
Notes about objcopy and xxd: Summary
- Use xxd instead of objcopy, as it allows us to tweak the output.
- Modify the xxd output as little as possible (but do declare variables as const unsigned ...).
- Pipe xxd output to gcc/clang to avoid creating intermediate C files.
- Parse the last line of the xxd output to get the symbol name, i.e. hello_txt_len.
- Avoid filenames containing - or _, to avoid symbol conflicts when linking many files together.
- Optionally, append C code to the xxd output before piping it to gcc/clang (see below for details).
- Consider writing a C program to speed up file parsing.
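Parsing the last line could be done with sscanf(). This is a sketch assuming the input is the complete xxd -i output as one string; parse_xxd_last_line() is a hypothetical helper that also accepts the const prefix added by the sed command above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: extract the array name (e.g. "hello_txt") and the
 * length from the last line of "xxd -i" output, i.e. from
 *     unsigned int hello_txt_len = 6;
 * An optional "const " prefix is accepted. name must hold at least 256
 * bytes. Returns 0 on success, -1 otherwise. */
int parse_xxd_last_line(const char *xxd_out, char *name, unsigned long *len)
{
    const char *line, *end;
    char buf[512];
    size_t n;

    /* Skip trailing newlines, then find the start of the last line */
    end = xxd_out + strlen(xxd_out);
    while (end > xxd_out && (end[-1] == '\n' || end[-1] == '\r'))
        end--;
    line = end;
    while (line > xxd_out && line[-1] != '\n')
        line--;

    n = (size_t)(end - line);
    if (n >= sizeof buf)
        return -1;
    memcpy(buf, line, n);
    buf[n] = '\0';

    if (strncmp(buf, "const ", 6) == 0)
        memmove(buf, buf + 6, n - 6 + 1);
    if (sscanf(buf, "unsigned int %255s = %lu;", name, len) != 2)
        return -1;

    /* Strip the "_len" suffix to get the name of the data array */
    n = strlen(name);
    if (n <= 4 || strcmp(name + n - 4, "_len") != 0)
        return -1;
    name[n - 4] = '\0';
    return 0;
}
```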
Library with web content as in-memory objects: libwwwmem
Step 1: Static HTML
One example of using embedded files is web content files (HTML/CSS/JS/JPEG/PNG/GIF/SVG) compiled
into a C web server as object files, hence the term "wwwmem".
An advantage of using embedded web content is speed: no disk access is needed to get the web content.
A disadvantage is the limited flexibility, as all web content has to be available at compile time,
but this is not necessarily a problem for simple web servers.
Step 2: Lookup methods
One problem to solve is how to map a file path in a URI request to a C symbol which points to the requested content.
Each file path must be saved in a C struct together with the pointer to the file contents.
Directory paths must also be saved, to be able to handle Directory Index and Directory Listings.
This C struct must be generated either by a script (in our case, the Makefile) or by the C program which parses the files.
The lookup method to find a specific path may vary, from a simple loop (slow and inefficient, but simple) to more advanced techniques such as MPHF (Minimal Perfect Hash Function), which are faster, but may be considered overkill for a web server serving just a few pages.
Step 3: Directory index and listing
By default, trying to access a URI which is a directory returns either the directory's index.html file, or 403 Forbidden.
Use either a global configuration (command-line arguments or configuration file),
or per-directory .htaccess files to change the default behaviour.
Step 4: Using HTML templates
If the web server is CGI-enabled, using HTML templates instead of static HTML pages offers more flexibility.
Step 5: Dynamically loaded library
To make web content maintenance as flexible as possible, separate the web server code
from the libwwwmem library, which is dynamically loaded at run-time.
The libwwwmem library contains a list of functions (one function to access each web content object) and the web content objects themselves.
Because the library is dynamically loaded, the web content may be modified, added or deleted, then recompiled and reloaded
without recompiling, or even stopping, the web server itself.
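Reloading the library at run time could be built on dlopen(3) and dlsym(3). This is a minimal sketch: it assumes libwwwmem exports a lookup function called wwwmem_lookup with the signature below, and both that name and wwwmem_reload() are hypothetical. On older glibc versions the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed signature of a lookup function exported by libwwwmem
 * (hypothetical): returns the content for a path and stores its length. */
typedef const unsigned char *(*wwwmem_lookup_fn)(const char *path, size_t *len);

typedef struct {
    void *handle;           /* handle returned by dlopen() */
    wwwmem_lookup_fn lookup;
} wwwmem_lib;

/* (Re)load the web content library from so_path.
 * Returns 0 on success, -1 on failure (lib is then left empty). */
int wwwmem_reload(wwwmem_lib *lib, const char *so_path)
{
    void *sym;

    /* dlopen() hands back the already-loaded object if the same path is
     * still open, so close the old library first. Pointers into the old
     * library's web content become invalid here. */
    if (lib->handle != NULL) {
        dlclose(lib->handle);
        lib->handle = NULL;
        lib->lookup = NULL;
    }
    lib->handle = dlopen(so_path, RTLD_NOW | RTLD_LOCAL);
    if (lib->handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    sym = dlsym(lib->handle, "wwwmem_lookup");
    if (sym == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(lib->handle);
        lib->handle = NULL;
        return -1;
    }
    /* POSIX requires this object-to-function pointer conversion to work,
     * although ISO C alone does not define it. */
    lib->lookup = (wwwmem_lookup_fn)sym;
    return 0;
}
```

A real server would also have to make sure no request is still using the old content when the reload happens, for example by reloading between requests.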
Step 6: Test suite
Since libwwwmem is compiled, test routines may be added to validate static HTML, htaccess directives and template syntax at compile time.
libwwwmem details
Step 1: Static HTML
- "Objectify" a file, for example www/css/style.css:
Using objcopy:
rm -rf build; mkdir build
ld -r -o build/www_css_style_css.o [-z execstack] --format=binary www/css/style.css
objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents build/www_css_style_css.o
nm build/www_css_style_css.o
00000000000000b2 D _binary_www_css_style_css_end
00000000000000b2 A _binary_www_css_style_css_size
0000000000000000 D _binary_www_css_style_css_start
Using xxd:
rm -rf build; mkdir build
xxd -i www/css/style.css | cc -c -xc -g -Wall -Wextra -Werror -ansi -pedantic - -o build/www_css_style_css.o
nm build/www_css_style_css.o
0000000000000000 D www_css_style_css
00000000000000b4 D www_css_style_css_len
- Map the file www/css/style.css to the C symbol www_css_style_css.
Let's assume that the web server root directory is www/.
An HTTP request to get www/css/style.css would look like this:
GET /css/style.css HTTP/1.0
The HTTP response:
HTTP/1.0 200 OK
Content-Length: 178
Content-Type: text/css
Date: Fri, 31 Dec 1999 23:59:59 GMT
body {
font-family: Arial;
background-color: #bbf0ff;
}
.
.
.
The problem is that there is no way to map the path string /css/style.css to the symbol _binary_www_css_style_css_start
from within the web server program at run time.
Some additional information is required to map the file to the symbol:
To be able to return either HTTP/1.0 200 OK or HTTP/1.0 404 Not Found, there must be a way to search for a symbol,
and return 404 if the symbol does not exist. The search is done using a hash table.
The index of each entry (bucket) in the hash table is the hash value of the full path (i.e. hash_function("/css/style.css")).
Each bucket is a linked list, but normally with one single element (unless collisions occurred).
The element is a key/value pair.
The key is the full path to the file as a string (i.e. /css/style.css).
The key must be compared with the requested path even when the list has only one element: a path that does not exist may still hash to an occupied bucket, and must result in 404 rather than in the wrong content. When the list has more than one element, the key is also what identifies the matching element.
The value is a structure:
- A pointer to the web contents (i.e. _binary_www_css_style_css_start).
- The Content-Length header (i.e. _binary_www_css_style_css_size, or _binary_www_css_style_css_end - _binary_www_css_style_css_start).
- The Content-Type header, to indicate the MIME type (e.g. text/css).
- A pointer to the next element in the linked list.
This results in 3 structures:
#include <stddef.h> /* size_t */

/* The web content object struct */
typedef struct web_content_t
{
    const unsigned char *content_start; /* _binary_www_css_style_css_start */
    const size_t content_len;           /* _binary_www_css_style_css_size */
    const char *mime;                   /* MIME type, e.g. "text/css" */
    struct web_content_t *next;         /* struct tag needed: the typedef is not declared yet */
} web_content_t;

/* The hash element (bucket) struct */
typedef struct hash_element_t
{
    const char *key;                    /* The file path, e.g. "/css/style.css" */
    const web_content_t *value;         /* The web content object */
    struct hash_element_t *next;        /* Next element in the linked list */
} hash_element_t;

/* The hash table struct */
typedef struct hash_table_t
{
    hash_element_t *bucket;             /* The dynamic array of buckets of keys/values */
    size_t size;                        /* Number of buckets in the array */
} hash_table_t;
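A lookup over these structures could look like the sketch below. Two assumptions are not from libwwwmem itself: the djb2 hash stands in for hash_function(), and an empty bucket is marked with key == NULL. The structures are repeated so that the example compiles on its own.

```c
#include <stddef.h>
#include <string.h>

typedef struct web_content_t
{
    const unsigned char *content_start;
    size_t content_len;
    const char *mime;
    struct web_content_t *next;
} web_content_t;

typedef struct hash_element_t
{
    const char *key;                 /* NULL marks an empty bucket (assumption) */
    const web_content_t *value;
    struct hash_element_t *next;     /* next element on a collision */
} hash_element_t;

typedef struct hash_table_t
{
    hash_element_t *bucket;          /* array of bucket heads */
    size_t size;                     /* number of buckets */
} hash_table_t;

/* djb2 string hash by Daniel J. Bernstein, used here only as an example;
 * the document does not specify libwwwmem's hash function. */
unsigned long hash_function(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the web content for a request path, or NULL for "404 Not Found". */
const web_content_t *hash_lookup(const hash_table_t *t, const char *path)
{
    const hash_element_t *e;

    if (t->size == 0)
        return NULL;
    /* The key is always compared, so a missing path that hashes to an
     * occupied bucket is still reported as not found. */
    for (e = &t->bucket[hash_function(path) % t->size]; e != NULL; e = e->next)
        if (e->key != NULL && strcmp(e->key, path) == 0)
            return e->value;
    return NULL;
}
```

The generated code (from the Makefile or a C program) would fill the bucket array statically; a NULL result maps directly to the 404 response.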
MIME
The MIME type may be obtained using the file tool:
file -b --mime-type www/css/style.css
text/plain
The commonly used MIME types on the web:
text/plain
text/html
text/css
text/javascript
image/png
image/gif
image/jpeg
image/svg+xml
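As shown above, file reports text/plain for style.css, so the response would not get the text/css type. A common alternative (swapped in here, not what the Makefile does) is to map the file extension to the MIME types listed above. mime_from_extension() is a hypothetical helper, and the application/octet-stream fallback is an assumption.

```c
#include <string.h>

/* Hypothetical alternative to "file -b --mime-type": map the file name
 * extension (case-sensitive) to a common web MIME type. */
const char *mime_from_extension(const char *path)
{
    static const struct { const char *ext; const char *mime; } map[] = {
        { ".html", "text/html" },
        { ".htm",  "text/html" },
        { ".txt",  "text/plain" },
        { ".css",  "text/css" },
        { ".js",   "text/javascript" },
        { ".png",  "image/png" },
        { ".gif",  "image/gif" },
        { ".jpg",  "image/jpeg" },
        { ".jpeg", "image/jpeg" },
        { ".svg",  "image/svg+xml" },
    };
    const char *dot = strrchr(path, '.');
    size_t i;

    /* No extension, or the last dot belongs to a directory name */
    if (dot == NULL || strchr(dot, '/') != NULL)
        return "application/octet-stream";
    for (i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(dot, map[i].ext) == 0)
            return map[i].mime;
    return "application/octet-stream";
}
```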
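With xxd, the symbol name can simply be changed by editing the output before compiling it (see https://stackoverflow.com/questions/15594988/objcopy-prepends-directory-pathname-to-symbol-name for the corresponding objcopy problem):
xxd -i input.txt | sed 's/input_txt/test/' | gcc -c -xc - -o obj.o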
Step 2: Lookup methods
The paths of all web content files and directories must be included in a list. This list of paths is used as keys for the lookup method.
The lookup method to find a specific path may vary, from a loop using strcmp (slow and inefficient, but simple) to more advanced
techniques such as MPHF (Minimal Perfect Hash Function), which are faster, but may be considered overkill for a web server
serving just a few pages.
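The simplest method, the loop, is a linear search with strcmp(). A sketch with a hypothetical path_entry_t list, where directories have no content:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type: one entry per file or directory path.
 * Directories have content == NULL. */
typedef struct {
    const char *path;
    const unsigned char *content;
    size_t len;
} path_entry_t;

/* Linear search with strcmp(): O(n) per request, but trivial and fine for
 * a handful of paths. Returns NULL if the path is not in the list. */
const path_entry_t *loop_lookup(const path_entry_t *list, size_t n,
                                const char *path)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(list[i].path, path) == 0)
            return &list[i];
    return NULL;
}
```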
To be able to handle Directory index and Directory listing, all directory paths must be included as separate keys in the list.
That is, if /path/to/an/image/pic1.png
is a file, the following entries have to be in the list:
/path/to/an/image/pic1.png
/path/to/an/image/
/path/to/an/
/path/to/
/path/
There are probably more efficient ways to solve this problem.
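Generating those keys from a file path means cutting the path at each /. A sketch with a hypothetical emit_path_keys() that prints one key per line, so its output could be fed to a script. Keys shared by several files (such as /path/) are printed once per file, so duplicates still have to be removed, e.g. with sort -u.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the path itself followed by each parent
 * directory (with a trailing slash), stopping before the root "/".
 * Returns the number of keys written. */
int emit_path_keys(FILE *out, const char *path)
{
    size_t len = strlen(path);
    size_t k;
    int count = 1;

    fprintf(out, "%s\n", path);
    /* Walk backwards over k = len-1 .. 1. Every '/' except the leading one
     * (the root) and a trailing one (already covered by the path itself)
     * ends a parent directory. */
    for (k = len; k-- > 1; ) {
        if (path[k] == '/' && k != len - 1) {
            fprintf(out, "%.*s\n", (int)(k + 1), path);
            count++;
        }
    }
    return count;
}
```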
The lookup methods:
|-------------------|-----------------|----------------------------------------------|
| Number of keys    | Search function | Description                                  |
|-------------------|-----------------|----------------------------------------------|
| very small (< 10) | loop            | Not using a hash function at all.            |
| very small (< 10) | hf_oa_minimal   | Open-addressing probing, with collisions.    |
| small (< 100)     | hf_oa_perfect   | Open-addressing probing, without collisions. |
| medium (< 10 000) | hf_mphf_bob     | MPHF by Bob Jenkins.                         |
| big (> 10 000)    | hf_bdz_ph       | BDZ_PH, extracted from the CMPH library.     |
|-------------------|-----------------|----------------------------------------------|
Step 3: Directory index and listing
Directory index file
The default behaviour for accessing the URI /path/to/ is to return /path/to/index.html and status 200 OK.
Use DirectoryIndex to use another filename instead of index.html, or a list of filenames, checked for existence from left to right.
Directory listing
The default behaviour when /path/to/index.html (or whatever is indicated by DirectoryIndex) does not exist,
is to return status 403 Forbidden.
Use Options +Indexes to return status 200 OK and a list of all files in the directory.
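The decision between index file, listing and 403 could be sketched as follows, assuming the embedded files are available as a NULL-terminated list of paths. dir_options_t and resolve_directory() are hypothetical names; they only model DirectoryIndex and Options +Indexes, not .htaccess parsing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-directory options, mirroring DirectoryIndex and
 * Options +Indexes. */
typedef struct {
    const char *const *directory_index; /* NULL-terminated, or NULL for the default */
    int indexes;                        /* non-zero if "Options +Indexes" is set */
} dir_options_t;

/* Return 1 if path is one of the embedded files (NULL-terminated list). */
static int path_exists(const char *const *files, const char *path)
{
    for (; *files != NULL; files++)
        if (strcmp(*files, path) == 0)
            return 1;
    return 0;
}

/* Resolve a request for a directory URI such as "/path/to/". Returns 200
 * with the index file path in out, 200 with an empty out for a directory
 * listing, or 403. out_size must be at least 1. */
int resolve_directory(const char *dir, const dir_options_t *opt,
                      const char *const *files, char *out, size_t out_size)
{
    static const char *const default_index[] = { "index.html", NULL };
    const char *const *names =
        opt->directory_index != NULL ? opt->directory_index : default_index;

    /* DirectoryIndex candidates are checked from left to right */
    for (; *names != NULL; names++) {
        int n = snprintf(out, out_size, "%s%s", dir, *names);

        if (n >= 0 && (size_t)n < out_size && path_exists(files, out))
            return 200;
    }
    out[0] = '\0';
    /* No index file: list the directory only with Options +Indexes */
    return opt->indexes ? 200 : 403;
}
```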
Usage
To use these options, do one of the following:
- Create a .htaccess file in any or all directories:
  DirectoryIndex index.html index.cgi
  Options +Indexes
- Enable .htaccess parsing from the command line:
  ./PRG --parse-htaccess=1
- Use command-line arguments to set the options globally (this ignores .htaccess parsing):
  ./PRG --directory-index='index.html index.cgi' --directory-listing=1
- Combine global options with per-directory .htaccess parsing:
  ./PRG --directory-index='index.html index.cgi' --directory-listing=1 --parse-htaccess=1
Step 4: Using HTML templates
The HTML template: the tags
The HTML template is inspired by these two libraries: HTML::Template (Perl) and the C Template Library (C).
These are the original HTML::Template
template tags:
TMPL_VAR
TMPL_LOOP
TMPL_INCLUDE
TMPL_IF
TMPL_ELSE
TMPL_UNLESS
The C Template Library offers some additional tags, but to keep things simple (and compatible with HTML::Template),
the additional tags are not supported:
TMPL_ELSIF
TMPL_BREAK
TMPL_CONTINUE
The HTML template: the tags - details
Tags may be written as HTML comments (useful when validating templates as valid HTML):
<TMPL_VAR NAME="PARAM1">
<!-- TMPL_VAR NAME=PARAM1 -->
Variables may be escaped as HTML, JS, URL:
<TMPL_VAR NAME="PARAM1" ESCAPE=HTML>
<TMPL_VAR NAME="PARAM1" ESCAPE=JS>
<TMPL_VAR NAME="PARAM1" ESCAPE=URL>
Variables may have a default value:
<TMPL_VAR NAME="PARAM1" DEFAULT="the devil">
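How ESCAPE=HTML and DEFAULT could combine when a variable is written out is sketched below. tmpl_write_var() is a hypothetical helper, and escaping exactly & < > " ' is an assumption; the HTML::Template documentation is the reference for its actual rules.

```c
#include <stdio.h>

/* Hypothetical sketch: write a template variable to out.
 * value == NULL means the parameter is unset, so dflt (DEFAULT="...") is
 * used instead. With escape_html set, characters that are special in HTML
 * are written as character references. Returns 0 on success, -1 on error. */
int tmpl_write_var(FILE *out, const char *value, const char *dflt,
                   int escape_html)
{
    const char *p;

    if (value == NULL)
        value = dflt != NULL ? dflt : "";
    if (!escape_html)
        return fputs(value, out) == EOF ? -1 : 0;
    for (p = value; *p != '\0'; p++) {
        const char *rep = NULL;

        switch (*p) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        }
        if ((rep != NULL ? fputs(rep, out) : putc(*p, out)) == EOF)
            return -1;
    }
    return 0;
}
```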
A libwwwmem template which includes another template uses the same syntax as HTML::Template:
<TMPL_INCLUDE NAME="filename.tmpl">
The difference is that, when using libwwwmem, "filename.tmpl" is internally mapped to
the pointer _binary_filename_tmpl_start instead of reading "filename.tmpl" from disk.
The HTML template: the library functions
Difference from HTML::Template:
The libwwwmem library functions are named with the HTML::Template method names in mind, but not all methods are implemented.
Most HTML::Template methods are overloaded (they behave differently depending on their arguments), which is not possible in C, so libwwwmem includes either
only one "version" of each corresponding HTML::Template method, or none.
Examples:
- The HTML::Template->param() method may be called in several ways:
  - param(): return list of current template parameters. Not implemented by libwwwmem.
  - param(PARAM): return value of "PARAM". Not implemented by libwwwmem.
  - param(PARAM => 'value'): assign single value to "PARAM".
  - param(LOOP_PARAM => array_ref): assign array ref to "LOOP_PARAM".
  - param(SUB_PARAM => sub { return 'value' }): assign sub ref to "SUB_PARAM".
  libwwwmem implements one single function, wwwmem_tmpl_param(struct tmpl_param *).
  The tmpl_param struct contains 3 linked lists: 1 for single values, 1 for loop arrays, and 1 for sub refs (list of function pointers).
  This way, all kinds of parameters may be set using one single call.
- The HTML::Template->config() method with arguments sets one or more configuration options.
  With no arguments, the current configuration is returned. libwwwmem only permits configuration flags to be set using the wwwmem_tmpl_new() function, and does not implement any config method.
- The HTML::Template->output() method supports the optional argument print_to => *STDOUT.
  libwwwmem implements two functions, wwwmem_tmpl_output() and wwwmem_tmpl_output_to(int fd).
  wwwmem_tmpl_output() is actually a wrapper for wwwmem_tmpl_output_to(STDERR_FILENO).
- As a template always is an in-memory object, the HTML::Template methods for dealing with file caching do not make sense with libwwwmem.
libwwwmem functions:
wwwmem_tmpl_new()
wwwmem_tmpl_config()
wwwmem_tmpl_param()
wwwmem_tmpl_clear_params()
wwwmem_tmpl_output()
wwwmem_tmpl_output_to()
wwwmem_tmpl_query()
TODO: ASCII/UTF-8/UTF-16?
Step 5: Dynamically loaded library
TODO: DETAILS
Step 6: Test suite
TODO: DETAILS
Links
man(1) objcopy:
https://linux.die.net/man/1/objcopy
man(1) ld:
https://linux.die.net/man/1/ld
man(1) xxd:
https://linux.die.net/man/1/xxd
xxd source code:
https://github.com/lwilletts/xxd
Tutorials:
http://www.linuxjournal.com/content/embedding-file-executable-aka-hello-world-version-5967
https://dvdhrm.wordpress.com/tag/objcopy/
https://gareus.org/wiki/embedding_resources_in_executables
HTML template:
https://metacpan.org/pod/HTML::Template (Perl)
http://libctemplate.sourceforge.net/doc.html (C)