Welcome to mod_spin for Apache 2.2+ from Rexursive
==================================================
mod_spin is an Apache module that provides the following functionality (in
conjunction with some other modules):
- a simple template language with data replacement capabilities only
- persistent application and session data tracking
- dynamic linking of applications into Apache as shared libraries
- parameters, cookies and multipart/form-data parsing via libapreq2
- simple API for (kind of) MVC controller functionality
- simple API for pooled (or not) access to SQL databases via APR DBD
mod_spin is written in C, just like Apache itself, and it uses APR, which was
written to be cross platform, secure and fast. Generally speaking, you
should see speed improvements when compared to Java, PHP and Perl solutions,
sometimes even by an order of magnitude.
This software exists to enable easy development and deployment of high
performance web applications written in C (or perhaps even other languages) on
Linux (or Unix) systems running Apache. It should be particularly easy to do
that on systems that use the RPM packaging system, such as Fedora Core, Red Hat
Enterprise Linux, CentOS and similar distributions. Obviously, other types of
packages can be built too, but RPM support is already in mod_spin.
How does mod_spin work?
=======================
mod_spin is essentially a content handler, meaning, for a specified file
extension, mod_spin will read the file, parse it into an Abstract Syntax
Tree (AST) and then replace the occurrences of references with values coming
from the application (this is where mod_spin is similar to Velocity). There is
no predefined file extension for mod_spin templates. I sometimes use ".sm" for
"spin macro", but you can use whatever you like, as long as you tell Apache
what that is.
At one point I was considering mod_spin as an output filter, but eventually
couldn't find enough justification to do so. The code would be much more
complex and I wanted to keep things simple. One day, maybe...
The application, a shared library (or .so), is dynamically linked at run-time
(i.e. when the request is handled by Apache) and its entry functions are
called, one when the library is dynamically linked into Apache, one during the
fixup phase of the request and one during the handler phase. These functions
take one argument, which is a structure containing the context. The context
holds the data to be replaced, parsed parameters, session and application
information and the current request. It then executes whatever code is
appropriate for the current request, most likely based on the URI and the
parameters (this completely depends on what the application actually does, of
course). This execution results in data structures holding the values that are
to be placed into the template. This data is then placed inside the template
by traversing the AST and replacing references with values. The end result (a
bucket brigade) is given to Apache output filters to push out into the world
(and possibly modify the content as well).
Before the application entry functions are called, mod_spin takes care of
application and session tracking (it relies on cookies for that). It does that
in a persistent manner (i.e. the values associated with the application and
session are stored in an SQL database). The Apache Portable Runtime DBD layer
is used to provide access to various SQL backends, like PostgreSQL, MySQL,
SQLite2/3 and Oracle.
What are mod_spin applications?
===============================
They are simply shared libraries. You would normally get those as a result of
writing, compiling and linking a set of C program files. You can, of course, get
those as a result of compiling and linking some other language. Keep in mind
that mod_spin expects its data in a particular way. IF THAT'S NOT FOLLOWED,
MANY THINGS WILL BREAK AND YOU MIGHT CAUSE SECURITY PROBLEMS ON YOUR SYSTEM.
That being said, mod_spin applications are probably not for someone that isn't
comfortable with application development in C. If you're looking for a
scripting language, mod_spin isn't it. Actually, one of the main reasons for
writing mod_spin was that I wanted full access to C Unix API but without the
need to hammer (X)HTML out of my code. With mod_spin you can keep your focus
on business logic and forget presentation for the most part.
See sections "Service function", "Prepare function" and "Init function" for
all the details related to the entry points into the application.
Why not CSP (C Server Pages)?
=============================
C Server Pages are an implementation similar to JSP (Java Server Pages), but
unlike JSP, feature C language snippets, not Java, placed into HTML. Such a
page is then converted into a C program, compiled and then linked into a shared
library, which is dynamically linked into Apache at run-time. In essence, it
taps directly into the C run-time system, just like mod_spin does. It is
probably faster because it does no template processing.
However, just like JSP, it suffers from similar problems. The first one is a
confusing mix of C programming language with HTML. This makes it completely
unusable (on the presentation level) by non-experts, even if trivial changes to
the page are required (e.g. a spelling fix can cause serious functionality
problems, even security violations), not to mention that the mixed code is
truly unreadable. The second one is the "translate -> compile -> link ->
dynamic link -> run" process, which creates further complications and opens up
the possibilities for strange run-time errors. And finally, a full C
development environment has to be installed on the system running CSP, in case
any of the pages ever get changed. These arguments are more or less the same as
the ones made when a template language such as Velocity is compared to JSP.
mod_spin avoids the above by defining a simple, data replacement only
template language and leaves all of the programming logic where it belongs -
in the application. At the same time, the application behaves mostly (but not
completely) neutral as far as presentation of data is concerned. It is almost
irrelevant what the output is going to look like, so most of the time
programmers are only busy working on functionality, not looks.
The above arguments, however, don't cut it for everyone (as I have observed in
my encounters with other developers), so if you're one of those, mod_spin is
probably not for you.
Security concerns
=================
Just like anything else written in C, if you aren't careful, you can shoot
yourself in the foot quite effectively. Buffer overflows and similar problems
can, however, be avoided if problematic functions aren't used and good
programming practices followed. mod_spin makes heavy use of APR, which is an
example of an API that was designed from the ground up to be secure.
Even if you're most careful, security problems can happen. It is therefore
good to follow guidelines for secure Apache setup. In extreme circumstances
(e.g. when you're allowing others to deploy their own applications into
Apache running mod_spin), IT IS ADVISABLE TO RUN A SEPARATE INSTANCE OF
APACHE, BEHIND THE MAIN SERVER, WITH A SOLE PURPOSE OF RUNNING MOD_SPIN
APPLICATIONS. Virtual hosting for multiple clients is one of the examples
where such a scenario might be effective. Applying chroot jail, SELinux and/or
running different Apache instances under different user IDs on unprivileged
ports will go a long way toward ensuring that even if someone breaks in, the
potential for damage is minimal. Also, technologies like GCC's stack
protector, kernel's exec-shield and NX bits may prove useful.
Here are some very important security implications that may arise from the
use of mod_spin. This is in relation to connection pools. To understand the
issues involved, one needs to understand how Apache deals with multiple client
connections.
Apache will generally spin up numerous processes or threads in order to handle
multiple connections from clients. There is no guarantee that a process/thread
that handled one client's connection will handle it again in the future. The
process/thread will be assigned at Apache's discretion. So, it is possible,
even likely, that a process that handled something related to one client,
handles another client next time. If there is anything in the memory space of
this process/thread left over from the previous client, it will be completely
accessible to the next client. Applications of mod_spin, shared libraries, are
linked directly into the running process and they have full access to the
memory space of that process.
The above means that an application can fetch any previously opened connection
from the pool of connections and use it at will. Depending on how this
connection was opened in the first place (by the original client), this will
enable reading and/or writing of data that otherwise might not be accessible.
IT IS CLEAR THAT THIS IS A SERIOUS SECURITY IMPLICATION.
Generally speaking, you should make sure that mod_spin applications linked
into an instance of Apache are all from the same "security realm". For
instance, if you're using mod_spin to enable dynamic applications virtually
hosted on a single server (machine) for your customers, allowing two different
customers to deploy mod_spin applications into the same instance of Apache will
allow them to read/write each other's databases. This might be accidental or,
more seriously, intentional and malicious. YOU SHOULD ABSOLUTELY MAKE SURE
THAT SUCH APPLICATIONS ARE DEPLOYED INTO DIFFERENT INSTANCES OF APACHE,
RUNNING UNDER DIFFERENT USER ACCOUNTS!
The general recommendation is: if you don't have full control over all
applications and you're using session/application persistent store and/or
connection pools, you should have separate instances of Apache for each
identifiable "security realm".
Stability
=========
If you ever wrote a C program, you know that one of the most dreadful things
is the infamous Segmentation Fault (SIGSEGV, Signal 11). It happens when your
program tries to dereference a memory location that is invalid, such as NULL.
mod_spin makes reasonable effort to ensure that the raw data it handles (the
template, session and application data, parameters, cookies etc.) is processed
in a manner that produces no segfaults. As for the context, the data that your
own application prepares, mod_spin doesn't have any control over what's in
there. It will take certain precautions against obvious stuff like NULL
pointers, but some of the other errors might be complicated to detect and
handle. And because mod_spin is a small and lightweight piece of software, it
doesn't do any of that. It simply relies on you (yes, that's YOU!) that the
data placed in the context is going to be good.
If the data is not good, the code will segfault, bringing down with it the
child Apache process inside which it was executing. This is not a big concern
from Apache's point of view, as the parent process will fork as many new
processes as it needs - however, your server might suffer a denial of service
attack because of this. So, make your context data good!
Note that the above scenario is only applicable to the prefork Apache MPM.
Other MPM modules might behave in a different way (i.e. more than one thread
of Apache can be affected), so keep that in mind when deploying mod_spin under
those scenarios.
Memory management
=================
Apache Portable Runtime uses memory pools for most memory allocation and
mod_spin naturally follows. It is a good and fast approach. However, some
memory pools may have a rather long life cycle (namely the per-thread pool of
mod_spin and its sub-pools, used for parsed templates). Although the code of
mod_spin tries to avoid these long lasting pools whenever possible, it is
sometimes unavoidable to have things put into them. Also, connection pools
will be associated with the per-thread pool (i.e. objects from the connection
pool may be pointing to objects from the per-thread pool for longer than one
request). This can lead, over time and given a huge number of requests or a lot
of template changes, to small memory leaks.
To avoid this, Apache comes with a configuration directive that helps in
reduction of such problems. This directive is MaxRequestsPerChild, and it is
set to 10,000 by default. You may also want to consider using the MaxMemFree
directive, which forces Apache's pool machinery to release free memory more
aggressively. Making changes to both of these directives may have performance
implications, so test in your scenario before applying.
CAUTION: If you have a very large number of templates, say thousands, and
SpinCache is turned on (default) and SpinCacheCount is set high, your Apache
may consume HUGE amounts of memory, even to the point where the system goes
into heavy swapping (depending upon the amount of memory the system has). It
is quite clear that this can be exploited as a denial of service attack against
the machine. Use SpinCache and SpinCacheCount directives wisely to avoid that.
This is a trade-off, of course, as mod_spin needs to parse the templates on
every connection if you turn template caching off, so there will be some
degradation in performance.
Language construct overview
===========================
The template language of mod_spin is simple: it has a loop and a few forms of
conditionals. You can see some examples below.
Loop:
#for(${reference})
text within the loop and a ${reference.column}
#end
Conditionals:
#if(${reference})
${reference} is not NULL
#else
${reference} is NULL
#end
#if(${reference} == "literal text")
${reference} equals text
#else
${reference} doesn't equal text
#end
#if(${reference} == ${anotherreference})
${reference} equals ${anotherreference}
#else
${reference} doesn't equal ${anotherreference}
#end
#if($#{reference} == 3)
the size of ${reference} equals 3
#else
the size of ${reference} doesn't equal 3
#end
#if($@{reference.column} % 2 == 0)
current index of ${reference.column} is divisible by 2
#else
current index of ${reference.column} is not divisible by 2
#end
Conditionals can also be reversed in meaning, by using #unless instead of #if.
All valid #if constructs are possible with #unless as well. Here is an
example:
#unless(${reference})
${reference} is NULL
#else
${reference} is not NULL
#end
Data types and loops
====================
You can place two different types of data in the context: single and rows.
Singles are simply character strings. They are pointed to by a char* and
limited by the length. Generally, mod_spin does not rely on '\0' being present
at the end of the string. However, regular C APIs mostly handle strings that
have the ending '\0' character. Therefore, all single data, although being
declared as 'size' in length, actually gets a '\0' character at the end
(naturally, the space for this character is allocated when the single is
created). This is very useful when communicating with regular C APIs, as it
saves a lot of copying and memory allocation. If you design your own functions
that create single data, you MUST FOLLOW THIS CONVENTION OR YOU'RE SETTING
YOURSELF UP FOR A WHOLE HEAP OF BUFFER OVERFLOWS!
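The convention above can be sketched in a few lines of plain C. This is only
an illustration: demo_single_t and demo_single_make() are hypothetical names,
not part of the mod_spin API, and malloc() is used here instead of an APR pool
so that the example stands on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a single; NOT the real rxv_spin_data_t. */
typedef struct {
    size_t size;  /* number of bytes of text, terminator not counted */
    char  *data;  /* 'size' bytes of text followed by one '\0' */
} demo_single_t;

/* Copy 'size' bytes from 'src' (which need not be terminated) into a
 * freshly allocated single. Returns 0 on success, -1 if out of memory. */
static int demo_single_make(demo_single_t *out, const char *src, size_t size)
{
    assert(out != NULL && (src != NULL || size == 0));

    out->data = malloc(size + 1);   /* one extra byte for the '\0' */
    if (out->data == NULL)
        return -1;
    if (size > 0)
        memcpy(out->data, src, size);
    out->data[size] = '\0';         /* always terminate ... */
    out->size = size;               /* ... but never count the terminator */
    return 0;
}
```

The caller declares the length, the allocation is one byte larger than that,
and the result can be handed to functions such as strlen() or strcmp() without
another copy.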
Rows are data that looks a lot like something that would be returned from an
SQL query: there are named columns and the data contains a certain number of
rows. However, unlike what's returned by SQL queries (i.e. single pieces of
data), each actual piece of data can again be either rows or single. This then
enables nesting of multiple data dimensions. The nested #for loops are used to
spin around such data. That's where the name mod_spin comes from.
Figure 1: Example data - Single
+------+
| type | rxv_spin_data_e: RXV_SPIN_DATA_SGL
+------+
| size | apr_size_t: number of characters in data
+------+            +--------------------------------+------+
| data | char* ---> | The actual data of size 'size' | '\0' |
+------+            +--------------------------------+------+
Figure 2: Example data - Rows
+------+
| type | rxv_spin_data_e: RXV_SPIN_DATA_RWS
+------+
| size | apr_size_t: number of rows in arrays pointed to by values of cols
+------+
| cols | apr_hash_t* ---+
+------+                |
                        |
          +-------------+
          |
    +-----+-------+
    | key | value | rxv_spin_data_t* ---+
    +-----+-------+                     |
    | key | value |                     |
    +-----+-------+      +--------------+----------------------------------+
    | key | value |      | Array of rxv_spin_data_t structures 'size' long |
    +-----+-------+      +-------------------------------------------------+
    | key | value |
    +--+--+-------+
       |                +------------------------+
       +---> char* ---> | zero terminated string |
                        +------------------------+
The final data type that is replaced into the template is always single.
mod_spin doesn't know how to replace full rows because the presentation would
be undefined. That's why you have to use #for loops to spin around rows data
type to place the singles contained there in their correct places inside the
template.
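To make the rows layout and the spinning more concrete, here is a small
self-contained model of it. It is a sketch under stated assumptions: the
demo_* types and functions are hypothetical, a plain array of named columns
stands in for the apr_hash_t, and the output goes into a caller supplied
buffer rather than a bucket brigade. demo_spin() imitates what a template
like "#for(${rows})${rows.key};#end" would produce.

```c
#include <assert.h>
#include <string.h>

typedef enum { DEMO_SGL, DEMO_RWS } demo_type_e;
typedef struct demo_data demo_data_t;

typedef struct {
    const char  *key;     /* column name */
    demo_data_t *values;  /* array of 'size' cells, one per row */
} demo_col_t;

struct demo_data {
    demo_type_e type;
    size_t      size;     /* single: text length; rows: number of rows */
    const char *text;     /* single only */
    demo_col_t *cols;     /* rows only (stand-in for the hash) */
    size_t      ncols;    /* rows only */
};

/* Cell at 'row' in column 'key', or NULL if there is no such cell. */
static const demo_data_t *demo_cell(const demo_data_t *rows,
                                    const char *key, size_t row)
{
    size_t i;

    if (rows == NULL || rows->type != DEMO_RWS || row >= rows->size)
        return NULL;
    for (i = 0; i < rows->ncols; i++)
        if (strcmp(rows->cols[i].key, key) == 0)
            return &rows->cols[i].values[row];
    return NULL;
}

/* One pass per row: substitute the cell if it is a single, then add ';'. */
static void demo_spin(const demo_data_t *rows, const char *key,
                      char *out, size_t outsz)
{
    size_t row, used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    if (rows == NULL || rows->type != DEMO_RWS)
        return;                      /* nothing to spin around */
    for (row = 0; row < rows->size; row++) {
        const demo_data_t *cell = demo_cell(rows, key, row);
        size_t len = (cell != NULL && cell->type == DEMO_SGL) ? cell->size : 0;

        if (used + len + 2 > outsz)
            break;                   /* stop rather than overflow */
        if (len > 0)
            memcpy(out + used, cell->text, len);
        used += len;
        out[used++] = ';';
        out[used] = '\0';
    }
}
```

Because a cell can itself be of type rows, the same lookup can be repeated
inside the loop body, which is what nested #for loops amount to.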
Metadata
========
Some of the API calls, like rxv_spin_meta_vstr() and rxv_spin_meta() return a
pointer to rxv_spin_data_t that has a type of RXV_SPIN_DATA_MTA, or metadata.
This data type is never used in the AST and if it is passed into it, the
results are unpredictable. Its only purpose is to facilitate the API itself by
making sure lengths of data arrays are stored somewhere, so that the
programmer using the API doesn't have to use separate variables to store them.
API calls know how to handle metadata, so if you stick to those, you should be
fine.
IMPORTANT: Placing metadata into AST can have unpredictable results. DON'T DO
THAT!
References
==========
They come in three flavours:
${reference} - the value (text, rows) of the reference (regular reference)
$#{reference} - the size of the reference (size reference)
$@{reference} - current index of the reference within the loop (index ref.)
The first form is straightforward as it simply takes the value of the
reference and it uses that. Used within text, only singles get substituted.
Used in a #for loop, singles get looped around once, rows get the number of
loops equivalent to the number of rows.
The second form takes the size of the reference, which for singles means the
length of the text and for rows the number of rows in each of the columns.
This form doesn't make sense in a #for loop and if placed there it will be
treated as a regular reference.
The third form takes the current index of the reference within a loop, if
applicable. Indexes start at 1. If the index wouldn't make sense (e.g. the
reference is a single, there is no loop etc.), it is treated as zero or as
NULL in conditionals and replaced with "0" if placed within text. This form
doesn't make sense in a #for loop and if placed there it will be treated as a
regular reference.
Text
====
Any text that isn't part of the #for loop or #if/#unless, will be literally
copied into the output. The space occupied by #for, #if and #unless will not
be space filled in the output, but removed as if it never existed.
References, which are case sensitive, placed inside the text will be replaced
with their values from the context or nothing if that value is NULL or the
reference does not exist. References are never recursively substituted (this
may create denial of service or security problems and it is therefore
avoided). If such functionality is desired, it belongs in your application.
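The substitution rules in this section (case sensitive names, nothing for
NULL or missing references, no recursive substitution) can be illustrated with
a short sketch. It is not how mod_spin does it internally (mod_spin works on a
parsed AST), and demo_binding_t, demo_lookup() and demo_expand() are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flat context: a reference name and its (possibly NULL) value. */
typedef struct {
    const char *name;
    const char *value;
} demo_binding_t;

/* Case sensitive lookup; returns NULL for unknown names and NULL values. */
static const char *demo_lookup(const demo_binding_t *ctx, size_t n,
                               const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strlen(ctx[i].name) == len && strncmp(ctx[i].name, name, len) == 0)
            return ctx[i].value;
    return NULL;
}

/* Expand ${name} references found in 'tpl' into 'out' (capacity 'outsz').
 * Returns 0 on success or -1 if 'out' is too small. */
static int demo_expand(const char *tpl, const demo_binding_t *ctx, size_t n,
                       char *out, size_t outsz)
{
    size_t used = 0;

    assert(tpl != NULL && out != NULL && outsz > 0);
    while (*tpl != '\0') {
        const char *close;

        if (tpl[0] == '$' && tpl[1] == '{' &&
            (close = strchr(tpl + 2, '}')) != NULL) {
            const char *val = demo_lookup(ctx, n, tpl + 2,
                                          (size_t)(close - (tpl + 2)));
            size_t len = (val != NULL) ? strlen(val) : 0;

            if (used + len >= outsz)
                return -1;
            if (len > 0)
                memcpy(out + used, val, len);  /* copied, never re-scanned */
            used += len;
            tpl = close + 1;
        } else {
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *tpl++;              /* literal text */
        }
    }
    out[used] = '\0';
    return 0;
}
```

A value that happens to contain "${other}" ends up in the output verbatim,
since substituted text is never scanned again.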
Loops
=====
These are quite simple:
#for(${reference})
text within the loop and a ${reference.column}
#end
You can only use regular references to loop around and other types of
references placed in #for will be treated as regular. For instance:
#for($#{reference})
some text here
#end
is the same as:
#for(${reference})
some text here
#end
The #for loop won't spin if the data it is supposed to process is NULL. This
can happen if the appropriate data for the reference cannot be found in the
context, or if the value of it is NULL.
Conditionals
============
The only other command in mod_spin apart from #for loop is the conditional, as
shown above in the overview. Again, it looks like this, for the simplest of
expressions (i.e. a reference):
#if(${reference})
something if ${reference} is not NULL
#else
something if ${reference} is NULL
#end
or the negative variant:
#unless(${reference})
something if ${reference} is NULL
#else
something if ${reference} is not NULL
#end
You can also use:
#if(${ref}) something #end
#if(${ref}) something #else#end
#if(${ref})#else something #end
and naturally:
#unless(${ref}) something #end
#unless(${ref}) something #else#end
#unless(${ref})#else something #end
Expressions valid in conditionals
=================================
An expression placed inside #if or #unless always starts with a reference and
placing anything else on the left is an error (or more explicitly, a parsing
error). Here are all the forms of expressions allowed in conditionals and when
they yield truth:
#if(${ref}) - ref exists and is not NULL
#if($#{ref}) - the size of ref is greater than zero
#if($@{ref}) - the current index of ref is greater than zero
#if(${ref} =~ /regex/) - ref matches Perl compatible regular expression regex
#if($#{ref} =~ /regex/) - size of ref, as string, matches regex
#if($@{ref} =~ /regex/) - index of ref, as string, matches regex
#if(${ref} == "str") - ref is the same as literal string str
#if($#{ref} == "str") - size of ref, as string, is the same as string str
#if($@{ref} == "str") - index of ref, as string, is the same as string str
#if(${ref} == num) - ref, as number, equals num (integer >= 0)
#if($#{ref} == num) - size of ref is num
#if($@{ref} == num) - index of ref is num
#if(${ref1} == ${ref2}) - ref1 is equal to ref2, as strings
#if(${ref1} == $#{ref2}) - ref1 is equal to size of ref2, as strings
#if(${ref1} == $@{ref2}) - ref1 is equal to index of ref2, as strings
#if($#{ref1} == ${ref2}) - size of ref1 is equal to ref2, as numbers
#if($#{ref1} == $#{ref2}) - size of ref1 is equal to size of ref2
#if($#{ref1} == $@{ref2}) - size of ref1 is equal to index of ref2
#if($@{ref1} == ${ref2}) - index of ref1 is equal to ref2, as numbers
#if($@{ref1} == $#{ref2}) - index of ref1 is equal to size of ref2
#if($@{ref1} == $@{ref2}) - index of ref1 is equal to index of ref2
#if(${ref} % mod == num) - ref, as number, modulo mod (integer > 0) is num
#if($#{ref} % mod == num) - size of ref modulo mod is num
#if($@{ref} % mod == num) - index of ref modulo mod is num
Regular expression matches, integer comparisons, literal string, reference to
reference and modulo expression comparisons don't work for regular references
that are not singles and they will always yield false. No pointer comparisons
are ever done, so attempting to compare rows type data will always fail. Of
course, all this cannot be determined at parse time, but at runtime. As you
can see, in expressions with a reference on the right, the reference on the
left is the "master" and it determines the type of comparison done in the
expression. For regular references, this is a string comparison (i.e. text is
compared, not pointers), for sizes and indexes, it is a numerical comparison.
Numbers, except mod in the modulo expression, are all integers greater than or
equal to zero. Literal strings are double quoted (e.g. "a string"). To escape
the double quote itself, use "a string with a \" in it". Regular expressions
are specified within slashes (e.g. /^begin.*$/). To escape the slash, use
a backslash before it: /^begin\/.*$/. Note that mod_spin isn't aware of any
character encodings and from its perspective bytes are characters. If you need
to make comparisons that take into account character encodings, you will have
to do that inside your applications (for now).
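The "master on the left" rule for reference to reference comparisons can be
modelled roughly as follows. This is a sketch, not mod_spin code: the demo_*
names are hypothetical, and two details are assumptions of the example rather
than documented behaviour, namely that a size or index is rendered as a plain
decimal string when compared as a string, and that a missing value on the
right counts as zero in a numerical comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { DEMO_VALUE, DEMO_SIZE, DEMO_INDEX } demo_kind_e;

/* One side of "left == right". */
typedef struct {
    demo_kind_e kind;
    const char *text;    /* DEMO_VALUE: text of the single, or NULL */
    long        number;  /* DEMO_SIZE or DEMO_INDEX: the size or index */
} demo_operand_t;

static long demo_as_number(const demo_operand_t *op)
{
    if (op->kind != DEMO_VALUE)
        return op->number;
    return (op->text != NULL) ? atol(op->text) : 0;  /* assumption: NULL is 0 */
}

/* The kind of the left operand decides how the comparison is done. */
static int demo_equal(const demo_operand_t *left, const demo_operand_t *right)
{
    char buf[32];
    const char *rtext;

    assert(left != NULL && right != NULL);
    if (left->kind != DEMO_VALUE)      /* size or index: compare as numbers */
        return demo_as_number(left) == demo_as_number(right);
    if (left->text == NULL)            /* missing or NULL on the left: false */
        return 0;
    if (right->kind == DEMO_VALUE) {
        rtext = right->text;
    } else {                           /* assumption: plain decimal rendering */
        snprintf(buf, sizeof buf, "%ld", right->number);
        rtext = buf;
    }
    return rtext != NULL && strcmp(left->text, rtext) == 0;
}
```

With a single holding "03" and a reference of size 3, the single on the left
gives a string comparison ("03" against "3", false), while the size on the
left gives a numerical one (3 against 3, true).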
Indexes for singles and rows outside a relevant loop are assumed to be zero.
If reference on the left doesn't exist or is NULL, the whole expression will
always evaluate to false and the bit after #else (if any) will end up being
processed.
Regular references are converted to numbers using the atol() function, so
strings that don't start with numbers turn out as zero.
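The consequences of using atol() are easy to check in isolation. The wrapper
below is a hypothetical helper written for this example only; the behaviour
it shows is that of the standard C library function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert the text of a single the way atol() does. */
static long demo_ref_as_number(const char *text)
{
    assert(text != NULL);
    return atol(text);   /* parses leading digits, ignores the rest */
}
```

For instance, "42" and "42abc" both convert to 42, "  7" converts to 7
(leading white space is skipped), and "abc" or an empty string convert to 0.
A single containing "12 apples" would therefore compare equal to the number 12.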
Loops, conditionals and impossible references
=============================================
Normally, a mod_spin template will look something like this:
First text ${ref1}
#for(${ref2})
replicate some other text and ${ref2.col1}
#end
The ${ref1} will be replaced with the value found in the context, if the data
it points to is a single, or nothing at all if the data it points to is of
type rows. The #for loop will spin around ${ref2} and replicate the enclosing
text for all instances of data that ${ref2} points to. The ${ref2.col1} will
be replaced with the current row value of the column "col1", if ${ref2}
happens to be a data type rows and ${ref2.col1} resolves to a data type
single for the current row.
Now let's examine an example where impossible references are used:
First text ${impossible.reference}
#for(${second.impossible.reference})
replicate some other text and ${second.impossible.reference.column}
#end
The first reference ${impossible.reference}, can never be found in the context
because there is no #for loop to spin the data in ${impossible} in order to
find ${impossible.reference}. So, when creating the AST, mod_spin will simply
ignore this reference. The reference used to spin the #for loop,
${second.impossible.reference}, is also something that cannot exist, so
mod_spin will ignore the whole loop and never place any of it into AST.
Note that this is different from the first code snippet with, for instance, the
value of ${ref2} being NULL, or not existing at all. The parsed #for loop and
the text it encloses will be placed into the AST, but it won't be replicated
because there is no data to spin the loop around.
The above discussion applies to conditional statements as well. For instance:
First text ${impossible.reference}
#if(${second.impossible.reference})
replicate some other text and ${second.impossible.reference.column}
#end
The above example would yield exactly the same output as the previous example
with the #for loop. However, if you use the #else, then whatever is placed
within it will be used. For instance:
First text ${impossible.reference}
#if(${second.impossible.reference})
replicate some other text and ${second.impossible.reference.column}
#else
this will always be in the output
#end
In the above example, the text placed between #else and #end will always be in
the output, because the #if would never be true, given that the reference is
impossible.
Similarly, other conditional forms behave as expected, because they always
have a reference on the left hand side. For instance:
#if($#{impossible.reference} == 3)
this will never be visible
#else
this will always be visible
#end
And here is one example for the #unless, the negative conditional:
#unless(${second.impossible.reference})
replicate some other text and ${second.impossible.reference.column}
#end
The above will always end up in the output, because the reference is
impossible.
Service function
================
The service function is the main entry function into your application. It is
called in the handler phase of request processing and BEFORE template
processing, so it has the potential to change which template is going to be
processed as well as to decline or do the processing completely.
If SpinApplication isn't specified, the shared library will not be loaded at
all, but the template will be processed as normal. However, there will be no
data in the context and therefore none will be placed in the final output.
The entry function (by default called rxv_spin_service()) takes one argument -
the context. It returns an integer which is similar to what an Apache handler
would return. The meaning is as follows:
OK: Everything was OK, commit application/session store and continue with
template processing. Note here that by manipulating filename field within the
request_rec structure, you can change which template is to be processed. Make
sure other fields (e.g. finfo) that are related to filename are properly
updated as well.
HTTP_ERROR (e.g. HTTP_INTERNAL_SERVER_ERROR): Stop all processing and give
control back to Apache request processing. Don't commit application/session
store, since there was an error in processing.
ANYTHING ELSE: Commit application/session store and give control back to
Apache request processing without processing the template.
Some examples of ANYTHING ELSE would be:
REDIRECT (e.g. HTTP_TEMPORARY_REDIRECT): Any further processing should not be
done as this request is going to be externally redirected. Note here that the
application HAS TO set the "Location" header in headers_out. Failing that, the
client will have problems.
DECLINED or DONE: The service function either decided it's not something this
handler should do (DECLINED) or has done all the work on behalf of it (DONE).
These are short-circuit return codes that will greatly affect Apache request
processing, so be careful with them.
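The three outcomes described above can be summarised as a small decision
function. This is a model of the documented behaviour, not the mod_spin
source. The constants repeat the values found in Apache's httpd.h so that the
sketch compiles on its own, all demo_* names are hypothetical, and treating
"HTTP_ERROR" as any status from 400 to 599 (the range Apache's
ap_is_HTTP_ERROR() macro checks) is an assumption.

```c
#include <assert.h>

/* Same values as in Apache's httpd.h, repeated to keep this standalone. */
enum {
    DEMO_OK = 0,
    DEMO_DECLINED = -1,
    DEMO_DONE = -2,
    DEMO_HTTP_TEMPORARY_REDIRECT = 307,
    DEMO_HTTP_INTERNAL_SERVER_ERROR = 500
};

typedef struct {
    int commit_store;      /* commit the application/session store? */
    int process_template;  /* go on to template processing? */
} demo_outcome_t;

/* Assumption: an HTTP error is any 4xx or 5xx status. */
static int demo_is_http_error(int rv)
{
    return rv >= 400 && rv < 600;
}

/* What happens after the service function returns 'rv'. */
static demo_outcome_t demo_after_service(int rv)
{
    demo_outcome_t o = { 0, 0 };

    if (rv == DEMO_OK) {
        o.commit_store = 1;            /* OK: commit and render */
        o.process_template = 1;
    } else if (!demo_is_http_error(rv)) {
        o.commit_store = 1;            /* anything else: commit, no template */
    }                                  /* HTTP error: neither */

    /* The template is never processed without a commit. */
    assert(!o.process_template || o.commit_store);
    return o;
}
```

A redirect such as HTTP_TEMPORARY_REDIRECT, as well as DECLINED and DONE, all
land in the middle branch: the store is committed, but no template is rendered.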
Prepare function
================
There is an extra hook that is called before the request is handled, in the
fixup phase of Apache request processing. By default, this function is called
rxv_spin_prepare() and it takes one argument - the context. It is called if it
exists in the shared library.
The main purpose of the hook is to allow application writers to insert their
code before the actual request processing. This comes in handy, especially if
you want some code executed for URIs that aren't handled by the mod_spin
handler, but fall under the application umbrella. For instance, you can have
authentication code in this function, thus using a mod_spin application to
regulate access to all URIs of a configured application (see spin_app for an
example).
This function is not called for sub-requests. It is also not called unless an
application is configured for that particular request (i.e. per virtual host,
directory, location etc.).
Please note that at the point of call of this function, request parameters
have not been parsed yet by libapreq2. Meaning, although you do get the
context, you are getting only some of the information that is normally
available to the rxv_spin_service() function. This was designed on purpose,
since some of the requests that pass through here may not be handled by
mod_spin handler at all. In other words, we are not consuming the body of the
requests here because other modules may need it as is.
The function returns an integer, which can be OK, DECLINED, DONE or any other
HTTP_code. It will be returned back to the caller (i.e. Apache hook machinery)
directly from the fixup hook.
Init function
=============
This function, if it exists, will be called when the application is dynamically
linked into Apache. It is used for process specific initialisation. By
default, this function is called rxv_spin_init(). It takes context as an
argument, but it doesn't return any values.
The code of mod_spin will allow only a single thread to execute this function
at a time. However, if the function is such that it should be called only once
per process, the function itself will have to make sure code isn't executed
more than once.
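One simple way for an init function to guard against repeated execution is a
static flag. The sketch below uses hypothetical names (demo_init() and
demo_setup_runs) and an untyped context pointer, since it is only meant to
show the guard. It relies on the guarantee stated above that only one thread
executes the function at a time. Without that guarantee, a plain flag like
this would not be safe.

```c
#include <assert.h>

static int demo_setup_runs = 0;   /* how many times the one-off work ran */

/* Stand-in for an init entry point; the real one takes the context. */
static void demo_init(void *ctx)
{
    static int initialised = 0;   /* one flag per process, shared by threads */

    (void)ctx;                    /* unused in this sketch */
    if (initialised)
        return;                   /* already done in this process */

    demo_setup_runs++;            /* the once-per-process work goes here */
    initialised = 1;
    assert(demo_setup_runs == 1);
}
```

However many threads load the application within one process, the guarded
block runs once.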
You can attach pool cleanup function(s) to thread specific pool if there is a
need to clean up after the init function. This should be done in the init
function itself.
Loading of applications
=======================
First, some background. All loading of shared libraries is done using the
apr_dso_load() call from APR. On Linux (and some other Unix variants), this
translates into a dlopen() call, which gives back a handle to the loaded
library. If a process attempts to open the same library again, the same handle
will be given back and the reference count for that library will be increased.
In a multi-threaded environment, this means that if multiple threads of
execution open the same library, they are all given the same handle and the
reference count equals the number of times the library was opened. Since
Apache could be running in a multi-threaded configuration (e.g. worker MPM),
it is very difficult to control when a library will be completely unloaded.
Doing so would require per-process read/write locks, and the code would become
much more complicated and bug-prone. Instead, mod_spin relinquishes control of
unloading libraries to Apache itself.
So, mod_spin makes sure that each thread keeps a cache of loaded DSOs and that
it opens a particular library only once. This is done to avoid registering a
pool cleanup for each call to apr_dso_load(), which would quickly grow the
private thread pool. Once new applications are deployed, in order to reliably
reload them, the main Apache process has to be given the SIGUSR1 signal (i.e.
a graceful restart has to be initiated), so that all child processes die and
new ones replace them. This ensures new applications are loaded across the
board.
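The "open each library only once per thread" idea can be sketched as a small
lookup table that is consulted before loading. This is not mod_spin's code:
every name below is hypothetical, and a stand-in loader is used in place of
apr_dso_load() so the sketch compiles without APR.

```c
#include <string.h>

#define DEMO_MAX_DSO 16

/* Stand-in for the real loader; mod_spin itself uses apr_dso_load(). */
typedef void *(*demo_loader_fn)(const char *path);

/* One cache per thread; the owning thread is the only one touching it. */
typedef struct demo_dso_cache {
    const char *paths[DEMO_MAX_DSO];   /* caller keeps these strings alive */
    void *handles[DEMO_MAX_DSO];
    int count;
} demo_dso_cache_t;

/* Return the handle for path, loading it only on first use by this cache. */
void *demo_dso_get(demo_dso_cache_t *cache, const char *path,
                   demo_loader_fn load)
{
    int i;
    void *handle;

    for (i = 0; i < cache->count; i++)
        if (strcmp(cache->paths[i], path) == 0)
            return cache->handles[i];      /* already loaded: reuse */

    if (cache->count == DEMO_MAX_DSO)
        return NULL;                       /* cache full (sketch limit) */
    handle = load(path);
    if (handle == NULL)
        return NULL;                       /* load failed */
    cache->paths[cache->count] = path;
    cache->handles[cache->count] = handle;
    cache->count++;
    return handle;
}

/* Fake loader used only to show how often a real load would happen. */
int demo_load_calls = 0;

void *demo_fake_load(const char *path)
{
    (void)path;
    demo_load_calls++;
    return &demo_load_calls;               /* any non-NULL pointer will do */
}
```

Asking the same cache twice for the same path calls the loader once, which is
the property that keeps the private thread pool from growing.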
Maximum nesting depth
=====================
The combined nesting depth of #for and #if/#unless commands is limited to
RXV_SPIN_MAX_DEPTH, as defined in private.h, which is currently 32. Why have
such a limit and why is the limit so low?
The limit is there to make the code of mod_spin simple and fast and to avoid
logic errors in templates caused by inadvertent use of deep nesting. The limit
is low because templates that require nesting depth anywhere near this limit
are doing something very, very wrong. The purpose of template language
constructs is not to introduce programming logic (in the sense of solving the
business problem the application is meant to solve), but to make simple
presentation level choices depending on the data generated by the application.
I cannot stress enough that ALL business logic should be in the application and
application alone.
Just like you are not supposed to have a great variety of commands available
in your template language and not supposed to be able to modify the data from
within the template language, you should not give in to the temptation of
fixing programming issues inside the template. A few nesting levels in the
template are quite sufficient for the majority of real-world problems. The
limit currently set is way above that.
However, if you find that this is not adequate for you, for whatever reason,
feel free to modify private.h and recompile.
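To get a feel for how far a template is from the limit, the combined nesting
depth can be measured with a few lines of C. This is a rough sketch and not
mod_spin's parser: it only looks for the literal directives, the names are
hypothetical, and treating a depth of exactly 32 as acceptable is an
assumption.

```c
#include <string.h>

/* Mirrors the RXV_SPIN_MAX_DEPTH value quoted above. */
#define DEMO_MAX_DEPTH 32

/* Returns the deepest combined nesting of #for/#if/#unless found in tpl,
 * or -1 if an #end appears without a matching opener. */
int demo_max_nesting(const char *tpl)
{
    int depth = 0;
    int deepest = 0;
    const char *p;

    for (p = tpl; (p = strchr(p, '#')) != NULL; p++) {
        if (strncmp(p, "#for(", 5) == 0 ||
            strncmp(p, "#if(", 4) == 0 ||
            strncmp(p, "#unless(", 8) == 0) {
            if (++depth > deepest)
                deepest = depth;
        } else if (strncmp(p, "#end", 4) == 0) {
            if (--depth < 0)
                return -1;             /* unbalanced template */
        }
        /* #else and anything else starting with '#' leave depth alone */
    }
    return deepest;
}

/* Non-zero when the template is balanced so far and within the limit. */
int demo_depth_ok(const char *tpl)
{
    int d = demo_max_nesting(tpl);

    return d >= 0 && d <= DEMO_MAX_DEPTH;
}
```

A #for containing a single #if measures as depth 2, a long way below the
limit.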
Presentation issues and the application
=======================================
You'll find that some of the presentation level decisions will be made within
your application as well (huh?). When given the choice of placing some
presentation level logic into the application as compared to contaminating the
template with business logic, I have chosen to go with the former. A cleverly
designed application will have a separate part that turns data generated by
business logic into a presentation friendly format. For instance, when (X)HTML
pages are created, some characters, like ``"'' and ``&'' have special meaning.
Business logic won't bother itself with making sure those are escaped.
However, the part of your application that makes sure presentation is nice,
will. Another example is a list of items on the page that should have rows
displayed in alternating colours (this particular problem can be solved by
newer versions of CSS, but the browsers that support that are still not in
widespread use). Business logic, again, won't bother itself with that.
The presentation "beautifier" will.
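As an illustration of the escaping case, a "beautifier" helper might look
like the sketch below. The function name is hypothetical and the sketch uses
malloc() to stay self-contained; in a real mod_spin application you would more
likely allocate from an APR pool, and Apache's own ap_escape_html() may
already do what you need.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc()ed copy of in with &, <, > and " replaced by
 * character entities, or NULL if memory runs out. The caller frees it. */
char *demo_html_escape(const char *in)
{
    /* Worst case: every character becomes the 6-byte "&quot;". */
    char *out = malloc(strlen(in) * 6 + 1);
    char *w = out;

    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;

        switch (*in) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);

            memcpy(w, rep, n);         /* copy the entity */
            w += n;
        } else {
            *w++ = *in;                /* ordinary character */
        }
    }
    *w = '\0';
    return out;
}
```

The business logic hands over raw strings, and only this presentation layer
knows that the output happens to be (X)HTML.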
For instance, one might have boilerplate API calls (similar to what mod_spin
provides already, as indicated below) that add columns to the rows data type
with the sole purpose of marking certain things for the template to pick up.
One such example would be to add a column "firstrow", which would have all
data NULL, except for the first row. The same can be done for the last row
and for alternating rows (odd/even).
Then the template can have:
#for(${rowsofdata})
#if(${rowsofdata.firstrow})
Output this only on the first row
#end
#unless(${rowsofdata.firstrow})
Output this for all rows except for the first one
#end
#if(${rowsofdata.oddrow})
Output this only on the odd row
#else
Output this only on the even row
#end
#unless(${rowsofdata.lastrow})
Output this for all rows except for the last one
#end
#if(${rowsofdata.lastrow})
Output this only on the last row
#end
#end
These API calls would then fall into the "beautifier" category. Use your
imagination to come up with more...
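The marker columns used in the template above could be computed along these
lines. This is a sketch under assumptions: the real rows data type of mod_spin
is not shown here, so each column is modelled as a plain array of strings in
which NULL means "not set", and the function name is hypothetical.

```c
#include <stddef.h>

/* Fill three marker columns for n rows. A non-NULL entry makes the
 * corresponding #if in the template true, NULL makes it false. */
void demo_mark_rows(size_t n, const char **firstrow,
                    const char **lastrow, const char **oddrow)
{
    size_t i;

    for (i = 0; i < n; i++) {
        firstrow[i] = (i == 0)     ? "1" : NULL;
        lastrow[i]  = (i + 1 == n) ? "1" : NULL;
        /* Rows are counted from 1 for display, so index 0 is "odd". */
        oddrow[i]   = (i % 2 == 0) ? "1" : NULL;
    }
}
```

The application would run something like this after the business logic has
produced its rows and before the data is handed to the template.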
How do I include other templates?
=================================
I find that duplicating functionality is not a good thing. So, I tried to stay
away from that. Apache already has mod_include, which can be used as a filter
or a handler and provides excellent support for inclusion of other files.
If the files that you're including are not dynamic (at least not very
dynamic), you should even consider generating finished files beforehand, using
some of the available replacement techniques, such as XSLT. This will be good
for the performance of your web server. It is worth the effort to reduce
what's dynamic to a minimum.
Session and application tracking
================================
mod_spin relies on mod_unique_id for a unique identifier and produces an MD5
hash of it to serve as the session identifier. It then produces an HMAC MD5 of
the hash, using the crypto salt (key). Both of these (unique id hash and the
HMAC) are then served to the client in a cookie, usually called SpinSession.
Only if both of these are returned to the server correctly will mod_spin use
this unique id as the session id. Otherwise, the session simply won't exist.
This should make both guessing of session identifiers and denial-of-service
attacks caused by opening of fake sessions significantly more difficult.
You must define the SpinCookie configuration parameter, or sessions won't be
supported for that application at all.
Each session will have a corresponding record in the spinstore table of the
SQL database you point to using the SpinStore configuration directive. The
application will have a record identified with "__application" in the same
table as well.
Through a simple API, you can get values for each key, either on the
application level (i.e. shared among multiple sessions) or session level (i.e.
private session data).
The maintenance of stale sessions is easy. Simply define a cron job that goes
around and deletes from the table whatever is older than you like. The
SpinTimeout configuration directive (if defined above zero) won't actually
delete any records from the table - the parameter is used to determine when
the data of the session is too old and should therefore be ignored. You need
to have an outside job for cleaning records that are very old.
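The "ignore, don't delete" behaviour of the timeout amounts to a simple age
check. The sketch below is illustrative only: the function name is
hypothetical, and measuring the timeout in seconds is an assumption, since the
unit is not stated here.

```c
#include <time.h>

/* Returns non-zero when session data should be ignored as too old.
 * A timeout of zero or less disables the check, matching the "if defined
 * above zero" wording. ASSUMPTION: timeout is in seconds. */
int demo_session_expired(time_t last_access, time_t now, long timeout)
{
    if (timeout <= 0)
        return 0;                      /* no timeout configured */
    return difftime(now, last_access) > (double)timeout;
}
```

Note that nothing is removed from the table by such a check; the record stays
until the outside job deletes it.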
Although basic concepts have been pinched from JSP/Servlet world, applications
have a slightly different meaning in mod_spin. Basically, whatever uses the
same application database file falls under the "same application" umbrella.
You can configure SpinStore per server, virtual host, directory or location.
So, applications can cross boundaries freely. Sessions follow the same rule,
so you can have separate private session data for different definitions of
SpinStore.
Application configuration
=========================
Each application can (but doesn't have to) have a configuration file. The
filename is specified via the SpinAppConfig run-time configuration directive.
The file is regular XML and it looks like this:
]>
The value associated with spinparameter1
The value associated with spinparameter2
It is preferable to include the DTD in the document (it is small) in order to
avoid parsing problems.
The configuration is loaded and reloaded automatically by mod_spin. Once the
configuration is parsed, the keys and values of the tags are placed
into the application's store. On each new request, the configuration file is
checked for modification. If the configuration file is newer, it is parsed
again and the keys and values are placed into the application store. New
values will overwrite existing values associated with same keys, but other
key/value pairs in the application store will not be changed.
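The reload rule described above (re-read only when the file is newer,
overwrite matching keys, leave everything else alone) can be sketched as
follows. This is not mod_spin's implementation: the store is modelled as a
tiny fixed-size table, modification times are plain numbers, and all names are
hypothetical.

```c
#include <string.h>

#define DEMO_STORE_MAX 32

/* Very small stand-in for the application store. */
typedef struct demo_store {
    const char *keys[DEMO_STORE_MAX];
    const char *vals[DEMO_STORE_MAX];
    int count;
} demo_store_t;

/* Look a key up; returns NULL when it is not in the store. */
const char *demo_store_get(const demo_store_t *s, const char *key)
{
    int i;

    for (i = 0; i < s->count; i++)
        if (strcmp(s->keys[i], key) == 0)
            return s->vals[i];
    return NULL;
}

/* Overwrite an existing key or append a new one; -1 if the table is full. */
int demo_store_set(demo_store_t *s, const char *key, const char *val)
{
    int i;

    for (i = 0; i < s->count; i++) {
        if (strcmp(s->keys[i], key) == 0) {
            s->vals[i] = val;
            return 0;
        }
    }
    if (s->count == DEMO_STORE_MAX)
        return -1;
    s->keys[s->count] = key;
    s->vals[s->count] = val;
    s->count++;
    return 0;
}

/* Apply freshly parsed pairs if the file is newer than the last load.
 * Returns 1 if applied, 0 if the file was unchanged, -1 on error. */
int demo_config_apply(demo_store_t *s, const char **keys, const char **vals,
                      int n, long file_mtime, long *loaded_mtime)
{
    int i;

    if (file_mtime <= *loaded_mtime)
        return 0;                      /* not newer: nothing to do */
    for (i = 0; i < n; i++)
        if (demo_store_set(s, keys[i], vals[i]) != 0)
            return -1;
    *loaded_mtime = file_mtime;
    return 1;
}
```

Keys that the application put into the store itself survive a reload
untouched, because only keys present in the configuration are written.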
Authentication
==============
Apache provides enough authentication mechanisms to not duplicate this
functionality in mod_spin. And because Apache's request_rec structure contains
all environment variables, the information about the user using the resource
is always available to your applications. At this point in time, I do not
feel that keeping user data similar to session and application data is
necessary. Things like that mostly belong in the application.
However, you can wrap Apache authentication with the spin_auth application and
small amount of your own code. See spin_auth and spin_app applications for all
details.
Connection pools
================
mod_spin has a simple API for accessing SQL relational databases. In order to
improve performance of connecting to database (and other) servers, mod_spin
uses the popular pool approach. Each connection is identified by the
connection string, which is specified when the connection is opened. mod_spin
creates a hash of all those connections and stores connection structures,
which are database specific, as values in this table. Any subsequent attempt
to open a connection to the database of the same type and with the same
connection string (the keys are case sensitive) will reuse the existing
connection. This can dramatically improve performance of applications that
frequently use (database) connections.
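The reuse rule (same database type, same connection string, compared case
sensitively) can be sketched with a small table. This is only an
illustration: mod_spin keeps its connections in a hash, whereas this sketch
uses a linear table and a stand-in open function. All names are hypothetical,
and handing back an unpooled connection when the table is full is a choice
made for the sketch, not documented behaviour.

```c
#include <string.h>

#define DEMO_POOL_MAX 8

/* Stand-in for whatever really opens a connection. */
typedef void *(*demo_open_fn)(const char *type, const char *conninfo);

typedef struct demo_conn_pool {
    const char *types[DEMO_POOL_MAX];
    const char *conninfos[DEMO_POOL_MAX];
    void *conns[DEMO_POOL_MAX];
    int count;
} demo_conn_pool_t;

/* Reuse a pooled connection when type and connection string both match
 * exactly (strcmp is case sensitive); otherwise open a new one. */
void *demo_conn_get(demo_conn_pool_t *pool, const char *type,
                    const char *conninfo, demo_open_fn open_conn)
{
    int i;
    void *conn;

    for (i = 0; i < pool->count; i++)
        if (strcmp(pool->types[i], type) == 0 &&
            strcmp(pool->conninfos[i], conninfo) == 0)
            return pool->conns[i];

    conn = open_conn(type, conninfo);
    if (conn != NULL && pool->count < DEMO_POOL_MAX) {
        pool->types[pool->count] = type;
        pool->conninfos[pool->count] = conninfo;
        pool->conns[pool->count] = conn;
        pool->count++;
    }
    return conn;                       /* unpooled if the table was full */
}

/* Fake opener: hands out distinct dummy "connections" and counts calls. */
int demo_open_calls = 0;

void *demo_fake_open(const char *type, const char *conninfo)
{
    static int slots[DEMO_POOL_MAX];

    (void)type;
    (void)conninfo;
    if (demo_open_calls >= DEMO_POOL_MAX)
        return NULL;
    return &slots[demo_open_calls++];
}
```

Because the comparison is case sensitive, "dbname=shop" and "dbname=Shop" end
up as two separate connections.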
Each Apache thread will have its own pool of connections (see also the
threading discussion that follows). While this is good for performance, it has
downsides.
With every thread having its own private connections to the back-end server,
the total number of connections can be rather big (i.e. number of threads
multiplied by number of connections per thread). Each connection takes memory,
CPU cycles and sockets for communication, which, depending on the number of
connections, might not be negligible. This alone can overwhelm the machine and
can ultimately result in denial of service. That is another reason why it is a
good idea to run a separate instance of Apache for heavily loaded applications.
Luckily, Apache is fast to start and it doesn't consume a lot of memory (in
today's terms), so you can have many instances of it running at once. With this
approach, you're turning your Apache server into a transaction processing
server.
The SpinConnPool configuration directive enables system administrators to
control pooling of connections. This can be useful if you find that a
particular application is causing a large number of connections to be kept
open.
The SpinConnCount configuration directive enables system administrators to
control the number of connections in the pool. By default, up to 5 connections
will be kept in the per-thread connection pool.
You can register any type of connection with the connection pool. As long as
it has a unique connection string, you should be fine. This can then be used
for any type of connection you'd like to keep hanging around for the lifetime
of the thread. LDAP and similar services come to mind first.
Admittedly, this kind of simple database API will not be everyone's cup of
tea. There are very nice alternatives (Apache/APR Util mod_dbd/apr_dbd, which
mod_spin code uses internally, and SQL Relay come to mind first) that solve
all of these problems (in a slightly different manner) and more. Also, some
people prefer truly cross-platform solutions like ODBC. Feel free to
completely ignore mod_spin's database API.
Threading
=========
Some Apache MPMs (Multi-Processing Modules), e.g. worker, spin off multiple
threads of execution. Also, you might spin off some threads in your
application code as well. There are several issues that might affect thread
safety, the most important being the connection pool.
Each thread will have its own connection pool. The separation is achieved
using the apr_threadkey_private_* set of functions, which on Linux/Unix map
onto pthread_key_* functions. However, if you spin off your own threads, you
MUST SYNCHRONISE access, or you will experience problems.
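The per-thread separation can be illustrated directly with the pthread_key_*
functions mentioned above. This is a sketch and not mod_spin's code: the
"pool" is just a block of memory, and all demo_* names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t demo_pool_key;
static pthread_once_t demo_pool_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit for each thread that created a pool. */
static void demo_pool_free(void *pool)
{
    free(pool);
}

static void demo_make_key(void)
{
    pthread_key_create(&demo_pool_key, demo_pool_free);
}

/* Return the calling thread's private pool, creating it on first use. */
void *demo_thread_pool(void)
{
    void *pool;

    pthread_once(&demo_pool_once, demo_make_key);
    pool = pthread_getspecific(demo_pool_key);
    if (pool == NULL) {
        pool = calloc(1, 64);
        if (pool != NULL && pthread_setspecific(demo_pool_key, pool) != 0) {
            free(pool);
            pool = NULL;
        }
    }
    return pool;
}

typedef struct demo_probe {
    void *main_pool;   /* pool seen by the creating thread */
    int differs;       /* set by the worker */
} demo_probe_t;

static void *demo_worker(void *arg)
{
    demo_probe_t *probe = arg;
    void *mine = demo_thread_pool();

    probe->differs = (mine != NULL && mine != probe->main_pool);
    return NULL;
}

/* Returns 1 when a second thread was handed a different pool than the
 * caller, 0 when it was not, -1 if the thread could not be started. */
int demo_pools_are_private(void)
{
    pthread_t t;
    demo_probe_t probe;

    probe.main_pool = demo_thread_pool();
    probe.differs = 0;
    if (probe.main_pool == NULL ||
        pthread_create(&t, NULL, demo_worker, &probe) != 0)
        return -1;
    pthread_join(t, NULL);
    return probe.differs;
}
```

Threads created by your own application code get the same treatment only if
they go through the same key; anything they share outside of it still needs
explicit synchronisation.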
mod_spin code does not do any synchronisation of its own (i.e. it is thread
unsafe), simply because it makes sure beforehand that all variables are
strictly something a single thread can use without worrying about any other
threads. This ensures there is no resource competition among threads and
generally trades higher memory usage for greater speed.
IMPORTANT NOTE: As of MIT Kerberos (krb5-libs) version 1.3.6 and PostgreSQL
7.4.6, the combination is not thread safe. If you use the worker (or another
thread-based) Apache MPM, you can experience segfaults. There are also some
memory leaks associated with the krb5 library, so if you don't use database
pools, you may see Apache child processes slowly growing in size (the memory
leaks have been fixed in version 1.4 of krb5-libs). A lower
MaxRequestsPerChild or a periodic graceful restart should fix that. Generally
speaking, the prefork MPM is recommended in these scenarios.
Licensing exceptions
====================
The code of mod_spin is licensed under the terms of the GNU General Public
Licence, or GPL for short. However, I have made exceptions in certain files of
mod_spin to make it possible to link this code with Apache itself, its
modules, as well as dynamically link shared libraries that are mod_spin
applications.
It would not be legally possible to link mod_spin against Apache (both
dynamically and statically) unless this exception was made. Also, it would not
be possible to distribute statically linked Apache that includes mod_spin.
This exception takes care of that as well. You MUST OBEY THE GPL for all
mod_spin code.
It would also not be legally possible to link any non-GPL licensed mod_spin
applications (shared libraries) dynamically, at run-time, into mod_spin.
Because I do not want to attempt to force anyone to use a particular licence
for their own work, you get permission to dynamically link, at run-time, any
of mod_spin applications (shared libraries) with mod_spin. Furthermore, Apache
can have dynamically linked modules that aren't licensed under the GPL, which
would also cause legal problems. The exception makes sure this is OK too.
This dynamic linking has to be through the interface of SpinApplication or
LoadModule run-time configuration directives of Apache, as provided by
mod_spin code or Apache itself. Nothing but dynamic linking of mod_spin
applications and Apache third party modules is covered by this exception and
you MUST OBEY THE GPL for all mod_spin code.
If you modify mod_spin code, you may extend these exceptions to your version
of the file, but you are not obligated to do so.