Welcome to mod_spin for Apache 2.x from Rexursive
=================================================
mod_spin is an Apache module that provides the following functionality (in
conjunction with some other modules):
- a simple template language with data replacement capabilities only
- persistent application and session data tracking
- dynamic linking of applications into Apache 2 as shared libraries
- parameters, cookies and multipart/form-data parsing via libapreq2
- simple API for (kind of) MVC controller functionality
- simple API for pooled (or not) access to SQL databases
mod_spin is written in C, just like Apache 2 itself, and it uses APR, which
was designed to be cross-platform, secure and fast. Generally speaking, you
should see speed improvements when compared to Java, PHP and Perl solutions,
sometimes even by an order of magnitude.
This software exists to enable easy development and deployment of high
performance web applications written in C (or perhaps even other languages) on
Linux (or Unix) systems running Apache 2. It should be particularly easy to do
that on systems that use the RPM packaging system, such as Fedora Core, Red
Hat Enterprise Linux, CentOS and similar distributions. Obviously, other types
of packages can be built too, but RPM support is already in mod_spin.
How does mod_spin work?
=======================
mod_spin is essentially a content handler: for the specified file
extension(s), mod_spin will read the file, parse it into an Abstract Syntax
Tree (AST) and then replace the occurrences of references with values coming
from the application (this is where mod_spin is similar to Velocity). There is
no predefined file extension for mod_spin templates. I sometimes use ".sm" for
"spin macro", but you can use whatever you like, as long as you tell Apache
what that is.
At one point I was considering mod_spin as an output filter, but eventually
couldn't find enough justification to do so. The code would be much more
complex and I wanted to keep things simple. One day, maybe...
The application, a shared library (or .so), is dynamically linked at run-time
(i.e. when the request is handled by Apache) and its entry function is called.
This function takes one argument, which is a structure containing the context
(where the data to be replaced is stored), parsed parameters, session and
application information and the current request. It then executes whatever
code is appropriate for the current request, most likely based on the URI and
the parameters (this completely depends on what the application actually does,
of course). This execution results in data structures holding the values that
are to be placed into the template. This data is then placed inside the
template by traversing the AST and replacing references with values. The end
result (a bucket brigade) is given to Apache output filters to push out into
the world (and possibly modify the content as well).
Before the application entry function is called, mod_spin takes care of
application and session tracking (it relies on cookies for that). It does that
in a persistent manner (i.e. the values associated with the application and
session are stored in a file). SDBM database functionality, already included
in APR, is used to provide hashed searches for values based on specified keys.
What are mod_spin applications?
===============================
They are simply shared libraries. You would normally get those as a result of
writing, compiling and linking a set of C program files. You can, of course,
get those as a result of compiling and linking code in some other language.
Keep in mind
that mod_spin expects its data in a particular way. IF THAT'S NOT FOLLOWED,
MANY THINGS WILL BREAK AND YOU MIGHT CAUSE SECURITY PROBLEMS ON YOUR SYSTEM.
That being said, mod_spin applications are probably not for someone who isn't
comfortable with application development in C. If you're looking for a
scripting language, mod_spin isn't it. Actually, one of the main reasons for
writing mod_spin was that I wanted full access to the C Unix API but without
the need to hammer (X)HTML out of my code. With mod_spin you can keep your
focus on business logic and forget presentation for the most part.
See section "Service function" for all the details related to the entry point
into the application.
Why not CSP (C Server Pages)?
=============================
C Server Pages are an implementation similar to JSP (Java Server Pages), but
unlike JSP, they feature C language snippets, not Java, placed into HTML. Such
a page is then converted into a C program, compiled and then linked into a
shared library, which is dynamically linked into Apache at run-time. In
essence, it taps directly into the C run-time system, just like mod_spin does.
It is probably faster because it does no template processing.
However, it suffers from problems similar to those of JSP. The first one is a
confusing mix of the C programming language with HTML. This makes it completely
unusable (on the presentation level) by non-experts, even if only trivial
changes to the page are required (e.g. a spelling fix can cause serious
functionality problems, even security violations), not to mention that the
mixed code is truly unreadable. The second one is the "translate -> compile ->
link -> dynamic link -> run" process, which creates further complications and
opens up the possibility of strange run-time errors. And finally, a full C
development environment has to be installed on the system running CSP, in case
any of the pages ever get changed. These arguments are more or less the same as
the ones made when a template language such as Velocity is compared to JSP.
mod_spin avoids the above by defining a simple, data replacement only
template language and leaving all of the programming logic where it belongs -
in the application. At the same time, the application remains mostly (but not
completely) neutral as far as presentation of data is concerned. It is almost
irrelevant what the output is going to look like, so most of the time
programmers are only busy working on functionality, not looks.
The above arguments, however, don't cut it for everyone (as I have observed in
my encounters with other developers), so if you're one of those, mod_spin is
probably not for you.
Security concerns
=================
Just like anything else written in C, if you aren't careful, you can shoot
yourself in the foot quite effectively. Buffer overflows and similar problems
can, however, be avoided if problematic functions aren't used and good
programming practices are followed. mod_spin makes heavy use of APR, which is
an example of an API that was designed from the ground up to be secure.
Even if you're very careful, security problems can happen. It is therefore
good to follow guidelines for secure Apache setup. In extreme circumstances
(e.g. when you're allowing others to deploy their own applications into
Apache running mod_spin), IT IS ADVISABLE TO RUN A SEPARATE INSTANCE OF
APACHE, BEHIND THE MAIN SERVER, WITH THE SOLE PURPOSE OF RUNNING MOD_SPIN
APPLICATIONS. Virtual hosting for multiple clients is one example where such
a scenario might be effective. Applying a chroot jail, SELinux and/or
running different Apache instances under different user IDs on unprivileged
ports will go a long way toward ensuring that even if someone breaks in, the
potential for damage is minimal.
Here are some very important security implications that may arise from the
use of mod_spin. They relate to tracking of application and session data and
to connection pools. To understand the issues involved, one needs to
understand how Apache deals with multiple client connections. One also needs
to understand the Unix file permission model.
Apache will generally spin up numerous processes or threads in order to handle
multiple connections from clients. There is no guarantee that a process/thread
that handled one client's connection will handle it again in the future. The
process/thread will be assigned at Apache's discretion. So, it is possible,
even likely, that a process that handled something related to one client,
handles another client next time. If there is anything in the memory space of
this process/thread left over from the previous client, it will be completely
accessible to the next client. Applications of mod_spin, shared libraries, are
linked directly into the running process and they have full access to the
memory space of that process.
The above means that an application can fetch any previously opened connection
from the pool of connections and use it at will. Depending on how this
connection was opened in the first place (by the original client), this will
enable reading and/or writing of data that otherwise might not be accessible.
IT IS CLEAR THAT THIS IS A SERIOUS SECURITY IMPLICATION.
Generally speaking, you should make sure that mod_spin applications linked
into an instance of Apache are all from the same "security realm". For
instance, if you're using mod_spin to enable dynamic applications virtually
hosted on a single server (machine) for your customers, allowing two different
customers to deploy mod_spin applications into the same instance of Apache will
allow them to read/write each other's databases. This might be accidental or,
more seriously, intentional and malicious. YOU SHOULD ABSOLUTELY MAKE SURE
THAT SUCH APPLICATIONS ARE DEPLOYED INTO DIFFERENT INSTANCES OF APACHE,
RUNNING UNDER DIFFERENT USER ACCOUNTS!
The second problem is connected to this in terms of user accounts and access
to files. Session and application data tracking files will be readable and
writable by the user Apache processes run as (this is defined in the Apache
configuration file). So, in the above scenario with two customers, they would
be able to read/write each other's session/application files. Once again, YOU
SHOULD ABSOLUTELY MAKE SURE THAT SUCH APPLICATIONS ARE DEPLOYED INTO DIFFERENT
INSTANCES OF APACHE, RUNNING UNDER DIFFERENT USER ACCOUNTS!
The general recommendation is: if you don't have full control over all
applications and you're using session/application persistent store and/or
connection pools, you should have separate instances of Apache for each
identifiable "security realm".
Stability
=========
If you have ever written a C program, you know that one of the most dreadful
things is the infamous Segmentation Fault (SIGSEGV, Signal 11). It happens when
your program tries to dereference an invalid memory location, such as NULL.
mod_spin makes a reasonable effort to ensure the raw data it handles (the
template, session and application data, parameters, cookies etc.) is processed
in a manner that produces no segfaults. As for the context, the data that your
own application prepares, mod_spin doesn't have any control over what's in
there. It will take certain precautions against obvious stuff like NULL
pointers, but some of the other errors might be complicated to detect and
handle. And because mod_spin is a small and lightweight piece of software, it
doesn't attempt to detect those. It simply relies on you (yes, that's YOU!) to
make sure the data placed in the context is good.
If the data is not good, the code will segfault, bringing down with it the
child Apache process inside which it was executing. This is not a big concern
from Apache's point of view, as the parent process will fork as many new
processes as it needs - however, your server might suffer a denial of service
attack because of this. So, make your context data good!
Note that the above scenario is only applicable to the prefork Apache MPM.
Other MPMs might behave in a different way (e.g. more than one thread of
Apache can be affected), so keep that in mind when deploying mod_spin under
those scenarios.
Memory leaks
============
Apache Portable Runtime uses memory pools for most memory allocation and
mod_spin naturally follows. It is a good and fast approach. However, some
memory pools may have a rather long life cycle (namely the per-thread pool of
mod_spin and its sub-pools, used for parsed templates). Although the code of
mod_spin tries to avoid these long-lasting pools whenever possible, it is
sometimes unavoidable to have things put into them. Also, the connection pools
will be associated with the per-thread pool. This can lead, over time and given
a huge number of requests or a lot of template changes, to small memory leaks.
That's why mod_spin as of version 0.9.4 has a new configuration parameter,
SpinClearCount. After the defined number of requests handled by the thread, the
per-thread pool is destroyed, including all its sub-pools and database
connections. This causes templates to be re-parsed and database connections to
be reopened, which is a performance penalty, but it might be useful in some
pathological corner cases.
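The counting behind SpinClearCount can be pictured with a minimal sketch. It is
not mod_spin's code: the structure and function names are hypothetical, and the
real module destroys an APR pool where this sketch merely bumps a counter. The
only facts taken from this document are that the count is per thread and that a
value of zero turns the feature off.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread bookkeeping; not a mod_spin structure. */
typedef struct {
    unsigned long served;       /* requests handled since the last clear */
    unsigned long clear_count;  /* configured limit; 0 disables clearing */
    unsigned long clears;       /* how many times the pool was cleared */
} thread_state_t;

/*
 * Call once per finished request. Returns 1 when the per-thread pool
 * should be destroyed and recreated, 0 otherwise.
 */
static int request_done(thread_state_t *t)
{
    assert(t != NULL);
    if (t->clear_count == 0)
        return 0;                     /* feature switched off */
    if (++t->served < t->clear_count)
        return 0;                     /* limit not reached yet */
    t->served = 0;                    /* start counting afresh */
    t->clears++;                      /* stand-in for destroying the pool */
    return 1;
}
```

With a limit of 3, every third call reports that the pool should be cleared,
which is when parsed templates and database connections would be lost.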
Language constructs
===================
The template language of mod_spin has only three commands: a loop and two
conditionals. They look like this:
    #for(${reference})
      some text within the loop and a ${reference.column}
    #end

    #if(${reference})
      some text to replace if ${reference} is not NULL
    #else
      some text to replace if ${reference} is NULL
    #end

    #unless(${reference})
      some text to replace if ${reference} is NULL
    #else
      some text to replace if ${reference} is not NULL
    #end
That's it. Everything else is a matter for the application, not the template
language.
References, which are case sensitive, placed inside the text will be replaced
with their values from the context or nothing if that value is NULL or the
reference does not exist. References are never recursively substituted (this
may create denial of service or security problems and it is therefore
avoided). If such functionality is desired, it belongs in your application.
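The replacement rules can be illustrated with a minimal, self-contained sketch.
It is not how mod_spin does it (mod_spin parses the template into an AST and
works with APR data structures); the flat ctx_entry array and the substitute()
function are hypothetical. What the sketch does show is the behaviour described
above: case-sensitive lookup, nothing emitted for missing or NULL references,
and no re-scanning of substituted values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat context entry: a reference name and its value (or NULL). */
typedef struct {
    const char *name;
    const char *value;
} ctx_entry;

/* Case-sensitive lookup; returns NULL if the name is absent or has no value. */
static const char *ctx_lookup(const ctx_entry *ctx, size_t n,
                              const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(ctx[i].name) == len && memcmp(ctx[i].name, name, len) == 0)
            return ctx[i].value;
    }
    return NULL;
}

/*
 * Copy tmpl into out (capacity cap, including the final '\0'), replacing each
 * ${name} with its value. Returns 0 on success, -1 if out is too small.
 * Substituted values are copied verbatim and never scanned again.
 */
static int substitute(const char *tmpl, const ctx_entry *ctx, size_t n,
                      char *out, size_t cap)
{
    size_t o = 0;

    assert(tmpl != NULL && out != NULL && cap > 0);
    while (*tmpl != '\0') {
        const char *close;

        if (tmpl[0] == '$' && tmpl[1] == '{' &&
            (close = strchr(tmpl + 2, '}')) != NULL) {
            const char *v = ctx_lookup(ctx, n, tmpl + 2,
                                       (size_t)(close - (tmpl + 2)));
            size_t vlen = (v != NULL) ? strlen(v) : 0;

            if (o + vlen >= cap)
                return -1;
            if (vlen > 0)
                memcpy(out + o, v, vlen);
            o += vlen;
            tmpl = close + 1;         /* continue after the closing brace */
        } else {
            if (o + 1 >= cap)
                return -1;
            out[o++] = *tmpl++;       /* ordinary text is copied as is */
        }
    }
    out[o] = '\0';
    return 0;
}
```

A value that itself looks like a reference, such as "${name}", ends up in the
output literally, which is the non-recursive behaviour the text asks for.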
Data types and loops
====================
You can place two different types of data in the context: single and rows.
Singles are simply character strings. They are pointed to by a char* and
limited by the length. Generally, mod_spin does not rely on '\0' being present
at the end of the string. However, regular C APIs mostly handle strings that
have the ending '\0' character. Therefore, all single data, although being
declared as 'size' in length, actually gets a '\0' character at the end
(naturally, the space for this character is allocated when the single is
created). This is very useful when communicating with regular C APIs, as it
saves a lot of copying and memory allocation. If you design your own functions
that create single data, you MUST FOLLOW THIS CONVENTION OR YOU'RE SETTING
YOURSELF UP FOR A WHOLE HEAP OF BUFFER OVERFLOWS!
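As an illustration of the convention, here is a minimal sketch of a function
that creates single-like data. The single_t type and the single_make() and
single_free() functions are hypothetical stand-ins, not mod_spin's
rxv_spin_data_t or its API, and plain malloc() is used instead of APR pools so
that the example stays self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a single; not mod_spin's rxv_spin_data_t. */
typedef struct {
    size_t size;  /* number of characters, excluding the terminator */
    char  *data;  /* 'size' characters followed by one '\0' */
} single_t;

/* Build a single from len bytes of src; returns NULL on failure. */
static single_t *single_make(const char *src, size_t len)
{
    single_t *s;

    assert(src != NULL || len == 0);
    if (len == SIZE_MAX)
        return NULL;                  /* len + 1 would overflow */
    s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->data = malloc(len + 1);        /* one extra byte for the terminator */
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    if (len > 0)
        memcpy(s->data, src, len);
    s->data[len] = '\0';              /* the convention: always terminate */
    s->size = len;
    return s;
}

static void single_free(single_t *s)
{
    if (s != NULL) {
        free(s->data);
        free(s);
    }
}
```

Because the terminator is always present, data created this way can be handed
to functions such as strlen() or strcmp() without copying first.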
Rows are data that looks a lot like something that would be returned from an
SQL query: there are named columns and the data contains a certain number of
rows. However, unlike what's returned by SQL queries (i.e. single pieces of
data), each actual piece of data can again be either rows or single. This then
enables nesting of multiple data dimensions. The nested #for loops are used to
spin around such data. That's where the name mod_spin comes from.
Figure 1: Example data - Single

+------+
| type | unsigned char: RXV_SPIN_DATA_SGL
+------+
| size | size_t: number of characters in data
+------+            +--------------------------------+------+
| data | char* ---> | The actual data of size 'size' | '\0' |
+------+            +--------------------------------+------+
Figure 2: Example data - Rows

+------+
| type | unsigned char: RXV_SPIN_DATA_RWS
+------+
| size | size_t: number of rows in each array pointed to by values of cols
+------+
| cols | apr_hash_t* ---+
+------+                |
                        |
          +-------------+
          |
    +-----+-------+
    | key | value | rxv_spin_data_t* ---+
    +-----+-------+                     |
    | key | value |                     |
    +-----+-------+      +--------------+----------------------------------+
    | key | value |      | Array of rxv_spin_data_t structures 'size' long |
    +-----+-------+      +-------------------------------------------------+
    | key | value |
    +--+--+-------+
       |                +------------------------+
       +---> char* ---> | zero terminated string |
                        +------------------------+
The final data type that is replaced into the template is always single.
mod_spin doesn't know how to replace full rows because the presentation would
be undefined. That's why you have to use #for loops to spin around rows data
type to place the singles contained there in their correct places inside the
template.
The #for loop won't spin if the data it is supposed to process is NULL. This
can happen if the appropriate data for the reference cannot be found in the
context, or if the value of it is NULL. The same applies to actual references
that are replaced into the template - if the end result is NULL, nothing is
replaced.
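The way a #for loop spins around rows can be sketched as follows. This is a
simplified model, not mod_spin's implementation: the column_t and rows_t types
are hypothetical, columns hold plain strings only (nested rows are left out),
and the loop body is reduced to a prefix, one column reference and a suffix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical column: a name plus one string per row (NULL = no value). */
typedef struct {
    const char  *name;
    const char **values;
} column_t;

/* Hypothetical rows: every column holds exactly 'size' values. */
typedef struct {
    size_t          size;
    size_t          ncols;
    const column_t *cols;
} rows_t;

/* Value of column 'col' in row 'row', or NULL if there is no such column. */
static const char *rows_get(const rows_t *r, const char *col, size_t row)
{
    for (size_t c = 0; c < r->ncols; c++) {
        if (strcmp(r->cols[c].name, col) == 0)
            return r->cols[c].values[row];
    }
    return NULL;
}

/*
 * Emulates:  #for(${ref})<prefix>${ref.col}<suffix>#end
 * Returns the number of iterations, or -1 if out (capacity cap) is too small.
 */
static int spin_column(const rows_t *r, const char *col,
                       const char *prefix, const char *suffix,
                       char *out, size_t cap)
{
    size_t used = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (r == NULL)
        return 0;                     /* NULL data: the loop does not spin */
    for (size_t i = 0; i < r->size; i++) {
        const char *v = rows_get(r, col, i);
        int n = snprintf(out + used, cap - used, "%s%s%s",
                         prefix, (v != NULL) ? v : "", suffix);

        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)r->size;
}
```

The enclosed text is replicated once per row even when the referenced value is
NULL; only the reference itself is replaced with nothing.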
Metadata
========
Some of the API calls, like rxv_spin_meta_vstr() and rxv_spin_meta(), return a
pointer to rxv_spin_data_t that has a type of RXV_SPIN_DATA_MTA, or metadata.
This data type is never used in the AST and if it is passed into the AST, it
might cause errors. Its only purpose is to facilitate the API itself by making
sure lengths of data arrays are stored somewhere, so that the programmer using
the API doesn't have to use separate variables to store them. API calls know
how to handle metadata, so if you stick to those, you should be fine.
IMPORTANT: Placing metadata into the AST can have unpredictable results.
Conditionals
============
The only other command in mod_spin apart from the #for loop is the
conditional, as shown above. Again, it looks like this:

    #if(${reference})
      something if ${reference} is not NULL
    #else
      something if ${reference} is NULL
    #end

or the negative variant:

    #unless(${reference})
      something if ${reference} is NULL
    #else
      something if ${reference} is not NULL
    #end

You can also use:

    #if(${ref}) something #end
    #if(${ref}) something #else#end
    #if(${ref})#else something #end

and naturally:

    #unless(${ref}) something #end
    #unless(${ref}) something #else#end
    #unless(${ref})#else something #end
Loops and impossible references
===============================
Normally, a mod_spin template will look something like this:

    First text ${ref1}
    #for(${ref2})
      replicate some other text and ${ref2.col1}
    #end
The ${ref1} will be replaced with the value found in the context, if the data
it points to is a single, or nothing at all if the data it points to is of
type rows. The #for loop will spin around ${ref2} and replicate the enclosing
text for all instances of data that ${ref2} points to. The ${ref2.col1} will
be replaced with the current row value of the column "col1", if ${ref2}
happens to be a data type rows and ${ref2.col1} resolves to a data type
single for the current row.
Now let's examine an example where impossible references are used:
    First text ${impossible.reference}
    #for(${second.impossible.reference})
      replicate some other text and ${second.impossible.reference.column}
    #end
The first reference ${impossible.reference}, can never be found in the context
because there is no #for loop to spin the data in ${impossible} in order to
find ${impossible.reference}. So, when creating the AST, mod_spin will simply
ignore this reference. The reference used to spin the #for loop,
${second.impossible.reference}, is also something that cannot exist, so
mod_spin will ignore the whole loop and never place any of it into the AST.
Note that this is different from the first code snippet with, for instance, the
value of ${ref2} being NULL, or not existing at all. The parsed #for loop and
the text it encloses will be placed into the AST, but it won't be replicated
because there is no data to spin the loop around.
The above discussion applies to conditional statements as well. For instance:
    First text ${impossible.reference}
    #if(${second.impossible.reference})
      replicate some other text and ${second.impossible.reference.column}
    #end
The above example would yield exactly the same output as the previous example
with the #for loop. However, if you use the #else, then whatever is placed
within it will be used. For instance:
    First text ${impossible.reference}
    #if(${second.impossible.reference})
      replicate some other text and ${second.impossible.reference.column}
    #else
      this will always be in the output
    #end
In the above example, the text placed between #else and #end will always be in
the output, because the #if would never be true, given that the reference is
impossible.
And one example for the #unless, the negative conditional:
    #unless(${second.impossible.reference})
      replicate some other text and ${second.impossible.reference.column}
    #end
The above will always end up in the output, because the reference is
impossible.
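One way to picture the "impossible reference" check is the sketch below. The
rule it encodes is inferred from the examples in this section and is an
assumption, not a description of mod_spin's parser: a reference without dots is
always possible, while a dotted reference is possible only if everything before
its last dot is the reference of an enclosing #for loop. The function name and
the representation of the loop stack are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * ref   - reference text without the ${ } wrapper, e.g. "ref2.col1"
 * loops - references of the enclosing #for loops, outermost first
 * depth - number of entries in loops
 * Returns 1 if the reference could be resolved at this point, 0 otherwise.
 */
static int ref_is_possible(const char *ref,
                           const char *const *loops, size_t depth)
{
    const char *dot;
    size_t plen;

    assert(ref != NULL);
    dot = strrchr(ref, '.');
    if (dot == NULL)
        return 1;                     /* plain reference: looked up directly */
    plen = (size_t)(dot - ref);       /* length of the part before the column */
    for (size_t i = 0; i < depth; i++) {
        if (strlen(loops[i]) == plen && memcmp(loops[i], ref, plen) == 0)
            return 1;                 /* an enclosing loop spins this data */
    }
    return 0;                         /* nothing spins the prefix */
}
```

Under this rule, ${ref2.col1} inside #for(${ref2}) is possible, whereas
${impossible.reference} outside of any loop is not.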
Service function
================
The service function is the entry function into your application. It is called
BEFORE template processing, so it can change which template is going to be
processed, decline the request, or do the processing completely on its own.
The entry function (by default called rxv_spin_service()) takes one argument -
the context. It returns an integer which is similar to what an Apache handler
would return. The meaning is as follows:
OK: Everything was OK, continue with template processing. Note here that by
manipulating the filename field within the request_rec structure, you can
change which template is to be processed. Make sure other fields (e.g. finfo)
that are related to filename are properly updated as well.

REDIRECT (e.g. HTTP_TEMPORARY_REDIRECT): No further processing should be done
as this request is going to be externally redirected. Note here that the
application HAS TO set the "Location" header in headers_out. Failing that, the
client will have problems.

DECLINED or DONE: The service function either decided it's not something this
handler should do (DECLINED) or has done all the work on behalf of it (DONE).
These are short-circuit return codes that will greatly affect Apache request
processing, so be careful with them.

ANYTHING ELSE: This will result in an internal server error.
If SpinApplication isn't specified, the shared library will not be loaded at
all, but the template will be processed as normal. However, there will be no
data in the context and therefore none will be placed in the final output.
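The handling of the service function's return value can be summarised in a
small sketch. It is standalone and therefore does not include Apache's headers:
the SK_ constants repeat the numeric values from httpd.h, and the next_step
type and both functions are hypothetical. Treating every 3xx status as a
redirect (as Apache's ap_is_HTTP_REDIRECT() macro does) is an assumption about
what "REDIRECT" covers.

```c
#include <assert.h>

/* Numeric values as defined in Apache's httpd.h, repeated here so that the
 * sketch compiles without the Apache headers. */
#define SK_OK                           0
#define SK_DECLINED                   (-1)
#define SK_DONE                       (-2)
#define SK_HTTP_TEMPORARY_REDIRECT     307
#define SK_HTTP_INTERNAL_SERVER_ERROR  500

typedef enum {
    NEXT_TEMPLATE,       /* go on and process the template */
    NEXT_REDIRECT,       /* external redirect, Location must be set */
    NEXT_SHORT_CIRCUIT,  /* DECLINED or DONE, hand the code back to Apache */
    NEXT_ERROR           /* anything else */
} next_step;

/* Decide what happens after the service function returned rc. */
static next_step classify_service_result(int rc)
{
    if (rc == SK_OK)
        return NEXT_TEMPLATE;
    if (rc >= 300 && rc < 400)
        return NEXT_REDIRECT;
    if (rc == SK_DECLINED || rc == SK_DONE)
        return NEXT_SHORT_CIRCUIT;
    return NEXT_ERROR;
}

/* Status the handler would give back to Apache for a given rc. */
static int handler_status(int rc)
{
    switch (classify_service_result(rc)) {
    case NEXT_TEMPLATE:
        return SK_OK;                 /* after the template was processed */
    case NEXT_REDIRECT:
    case NEXT_SHORT_CIRCUIT:
        return rc;                    /* passed through unchanged */
    default:
        return SK_HTTP_INTERNAL_SERVER_ERROR;
    }
}
```

A real service function would return the constants from httpd.h directly; the
local copies exist only to keep the example self-contained.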
Loading of applications
=======================
Early versions of mod_spin (up to and including 0.9.12) had very primitive
logic for loading applications (libraries, Dynamic Shared Objects, DSOs). On
each request, the library would be loaded and then unloaded. For small
libraries, this was almost acceptable, but for libraries that pulled in a lot
of dependencies (i.e. other libraries), the performance penalty was severe
(I've measured over 8 times performance degradation in some of the cases, but
it could be even worse). To avoid that, a new system had to be implemented in
0.9.13.
First, some background. All loading of shared libraries is done using the
apr_dso_load() call from APR. On Linux (and some other Unix variants), this
translates into a dlopen() call, which gives back a handle to a loaded library.
If a process attempts to open the same library again, the same handle will be
given back and a reference count for that library will be increased. In a
multi-threaded environment, this would mean that if multiple threads of
execution attempt to open the same library, they would be given back the same
handle and the reference count would be equal to the number of times the
library was opened across all threads. Since Apache 2 could be running in a
multi-threaded configuration (e.g. worker MPM), it is very difficult to control
when a library will be completely unloaded. Doing so would involve introducing
per-process read/write locks, and the code would become much more complicated
and bug-prone. Instead, newer versions of mod_spin relinquish control over
unloading of libraries to Apache itself.
So, mod_spin 0.9.13 and above make sure that each thread keeps a cache of
loaded DSOs and that it opens a particular library only once. This is done to
avoid registering a pool cleanup for each call to apr_dso_load(), which would
quickly grow the private thread pool. Once new applications are deployed, in
order to reliably reload them, the main Apache process has to be given the
SIGUSR1 signal (i.e. a graceful restart has to be initiated), so that all child
processes die and new ones replace them. This will ensure new applications are
loaded across the board.
Note that if a SpinClearCount other than zero is specified, the private thread
pool will be cleaned after the specified number of requests served by the
thread. Each time the pool is cleaned, apr_dso_unload() will be called through
the pool cleanup functionality. However, dlopen() keeps its reference count per
process, so relying on this functionality for reloading of new applications is
completely unreliable. The only reliable way is to gracefully restart Apache.
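The per-thread cache idea can be sketched as below. This is an illustration
under stated assumptions, not mod_spin's code: the dso_cache_t type, the fixed
capacity and the function names are hypothetical, and a counting fake loader
takes the place of apr_dso_load() so that the example runs without APR. The
point it makes is that the loader is invoked at most once per library path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DSO_CACHE_MAX 8               /* arbitrary capacity for the sketch */

typedef void *(*dso_loader_fn)(const char *path);

/* Hypothetical per-thread cache of already opened libraries. */
typedef struct {
    const char   *path[DSO_CACHE_MAX];
    void         *handle[DSO_CACHE_MAX];
    size_t        used;
    dso_loader_fn load;               /* what actually opens a library */
} dso_cache_t;

/* Return the cached handle for path, loading the library on first use.
 * The caller must keep the path string alive for as long as the cache. */
static void *dso_cache_get(dso_cache_t *c, const char *path)
{
    void *h;

    assert(c != NULL && path != NULL && c->load != NULL);
    for (size_t i = 0; i < c->used; i++) {
        if (strcmp(c->path[i], path) == 0)
            return c->handle[i];      /* hit: no second load, no new cleanup */
    }
    if (c->used == DSO_CACHE_MAX)
        return NULL;                  /* cache full */
    h = c->load(path);
    if (h == NULL)
        return NULL;                  /* load failed, nothing cached */
    c->path[c->used] = path;
    c->handle[c->used] = h;
    c->used++;
    return h;
}

/* Fake loader standing in for apr_dso_load(); it only counts its calls. */
static int loader_calls;

static void *counting_loader(const char *path)
{
    loader_calls++;
    return (void *)path;              /* any non-NULL token serves as handle */
}
```

Asking such a cache for the same path twice yields the same handle with a
single load. Picking up a newly deployed library still needs a graceful
restart, as described above.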
Maximum nesting depth
=====================
The combined nesting depth of #for and #if/#unless commands is limited to
RXV_SPIN_MAX_DEPTH, as defined in private.h, which is currently 32. Why have
such a limit and why is the limit so low?
The limit is there to make the code of mod_spin simple and fast and to avoid
logic errors in templates caused by inadvertent use of deep nesting. The limit
is low because templates that require nesting depth anywhere near this limit
are doing something very, very wrong. The purpose of template language
constructs is not to introduce programming logic (in the sense of solving the
business problem the application is meant to solve), but to make simple
presentation level choices depending on the data generated by the application.
I cannot stress enough that ALL business logic should be in the application and
application alone.
So, the whole thing is designed on purpose. You are not supposed to have a
great variety of commands available in your template language, you should not
be able to modify the data from within the template language and you should not
give in to the temptation of fixing programming issues inside the template. In
my experience, a few nesting levels in the template are quite sufficient for
the majority of real-world problems. The limit currently set is way above that.
However, if you find that this is not adequate for you, for whatever reason,
feel free to modify private.h and recompile.
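A depth check of this kind can be sketched in a few lines. The scanner below is
a deliberately naive illustration, not mod_spin's parser: it looks for the
command keywords anywhere in the text, and both MAX_DEPTH and template_depth()
are hypothetical names (the value 32 is the one quoted above for
RXV_SPIN_MAX_DEPTH).

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 32   /* value quoted in the text for RXV_SPIN_MAX_DEPTH */

/*
 * Returns the deepest combined nesting of #for, #if and #unless found in t,
 * or -1 if the nesting exceeds MAX_DEPTH or the commands are unbalanced.
 */
static int template_depth(const char *t)
{
    int depth = 0;
    int deepest = 0;

    assert(t != NULL);
    for (; *t != '\0'; t++) {
        if (*t != '#')
            continue;
        if (strncmp(t, "#for(", 5) == 0 ||
            strncmp(t, "#if(", 4) == 0 ||
            strncmp(t, "#unless(", 8) == 0) {
            if (++depth > MAX_DEPTH)
                return -1;            /* nested too deeply */
            if (depth > deepest)
                deepest = depth;
        } else if (strncmp(t, "#end", 4) == 0) {
            if (--depth < 0)
                return -1;            /* #end without an opening command */
        }
    }
    return (depth == 0) ? deepest : -1;   /* unclosed command is an error */
}
```

A template with a #for containing an #if has a depth of 2, far below the limit,
which matches the observation that real templates rarely need more than a few
levels.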
Presentation issues and the application
=======================================
You'll find that some of the presentation level decisions will be made within
your application as well (huh?). When given the choice of placing some
presentation level logic into the application as compared to contaminating the
template with business logic, I have chosen to go with the former. A cleverly
designed application will have a separate part that turns data generated by
business logic into a presentation friendly format. For instance, when (X)HTML
pages are created, some characters, like ``"'' and ``&'' have special meaning.
Business logic won't bother itself with making sure those are escaped.
However, the part of your application that makes sure presentation is nice,
will. Another example is a list of items on the page that should have rows
displayed in alternating colours (this particular problem can be solved by
newer versions of CSS, but the browsers that support that are still not in
widespread use). Business logic, again, won't bother itself with that.
The presentation "beautifier" will.
For instance, one might have boilerplate API calls (similar to what mod_spin
provides already, as indicated below) that add columns to the rows data type
with the sole purpose of marking certain things for the template to pick up.
One such example would be to add a column "firstrow", which would have all data
NULL, except for the first row. The same can be done for the last row and,
again, for alternating rows (odd/even).
Then the template can have:
    #for(${rowsofdata})
      #if(${rowsofdata.firstrow})
        Output this only on the first row
      #end
      #unless(${rowsofdata.firstrow})
        Output this for all rows except for the first one
      #end
      #if(${rowsofdata.oddrow})
        Output this only on the odd row
      #else
        Output this only on the even row
      #end
      #unless(${rowsofdata.lastrow})
        Output this for all rows except for the last one
      #end
      #if(${rowsofdata.lastrow})
        Output this only on the last row
      #end
    #end
These API calls would then fall into the "beautifier" category. Use your
imagination to come up with more...
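A "beautifier" of the kind described here could look like the following sketch.
It is not one of mod_spin's API calls: mark_rows() and MARK are hypothetical,
and the marker columns are plain arrays of strings rather than rows data. Any
non-NULL value works as a marker, because the template conditionals only test
for NULL. Counting rows from 1, so that the first row is odd, is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Any non-NULL value works as a marker; the template only tests for NULL. */
static const char MARK[] = "1";

/*
 * Fill three marker columns for n rows. Each array must have room for n
 * entries. A NULL entry means "marker not set for this row".
 */
static void mark_rows(size_t n, const char **firstrow,
                      const char **lastrow, const char **oddrow)
{
    assert(n == 0 || (firstrow != NULL && lastrow != NULL && oddrow != NULL));
    for (size_t i = 0; i < n; i++) {
        firstrow[i] = (i == 0)     ? MARK : NULL;
        lastrow[i]  = (i == n - 1) ? MARK : NULL;
        oddrow[i]   = (i % 2 == 0) ? MARK : NULL;  /* rows counted from 1 */
    }
}
```

The three arrays would then be added as the "firstrow", "lastrow" and "oddrow"
columns of the rows data, which is all the template above needs.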
How do I include other templates?
=================================
I find that duplicating functionality is not a good thing. So, I tried to stay
away from that. Apache already has mod_include, which can be used as a filter
or a handler and provides excellent support for inclusion of other files.
If the files that you're including are not dynamic (at least not very dynamic),
you should even consider generating finished files beforehand, using some of
the available replacement techniques, such as XSLT. This will be good for the
performance of your web server. On my old 1 GHz Athlon system, I have
benchmarked Apache 2 and it was capable of delivering around 2,500 static pages
per second, each around 10 kB in size (that's 25 MB/s bandwidth). Tomcat behind
Apache was able to deliver around 60 dynamic pages per second, of roughly the
same size, on the same machine (that's 600 kB/s bandwidth). It is worth an
effort to reduce what's dynamic to a minimum.
Template file size
==================
With Apache 2.0.49 and the APR that comes with it, running on Fedora Core 1,
apr_off_t, off_t, apr_size_t and size_t are all 32-bit integer values. That
means that the maximum template size on this platform is 2 GB (I'm not sure
what you'd use such large web pages for, but nevertheless). I'm guessing that
on 64-bit platforms those values would be 64-bit integers, which would make the
maximum template size much, much larger, but I have not verified that.
Session and application tracking
================================
The simplest way would be to use mod_usertrack, which is part of Apache. This
is in fact what mod_spin did up to and including version 1.0.4. However, the
cookie generated this way is very predictable (it is simply a timestamp), so
anyone could easily figure it out. That's why mod_spin 1.0.5 and above uses a
different approach. It relies on mod_unique_id to provide a unique session
identifier, then it produces an MD5 hash of it, using the crypto salt. Both of
these (unique id and the hash) are then served to the client in a cookie,
usually called SpinSession. Only if both of these are returned back to the
server correctly, will mod_spin use this unique id as the session id.
Otherwise, the session simply won't exist. This should make both guessing of
session identifiers and denial of service attacks caused by opening of fake
sessions significantly more difficult.
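The check of the returned cookie can be pictured with the sketch below. Several
things in it are assumptions made for illustration only: the "id:hash" cookie
layout, hashing the salt followed by the id, and all function names. Most
importantly, a 64-bit FNV-1a hash is used as a stand-in for MD5 purely to keep
the example self-contained; it is NOT suitable for real session protection.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a over salt followed by id. A stand-in for MD5, not a secure hash. */
static uint64_t toy_hash(const char *salt, const char *id)
{
    const char *parts[2] = { salt, id };
    uint64_t h = 14695981039346656037ULL;

    for (int p = 0; p < 2; p++) {
        for (const unsigned char *c = (const unsigned char *)parts[p];
             *c != '\0'; c++) {
            h ^= *c;
            h *= 1099511628211ULL;
        }
    }
    return h;
}

/* Write "<id>:<16 hex digits>" into out. Returns 0, or -1 if it won't fit. */
static int cookie_make(const char *salt, const char *id,
                       char *out, size_t cap)
{
    int n;

    assert(salt != NULL && id != NULL && out != NULL && cap > 0);
    n = snprintf(out, cap, "%s:%016llx", id,
                 (unsigned long long)toy_hash(salt, id));
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/*
 * Returns 1 if the hash in the cookie matches the id in the cookie, in which
 * case id_out holds the session id. Returns 0 otherwise; id_out is then not
 * meaningful and no session should be used.
 */
static int cookie_check(const char *salt, const char *cookie,
                        char *id_out, size_t cap)
{
    const char *sep;
    char expect[17];
    size_t idlen;

    assert(salt != NULL && cookie != NULL && id_out != NULL);
    sep = strrchr(cookie, ':');
    if (sep == NULL)
        return 0;
    idlen = (size_t)(sep - cookie);
    if (idlen == 0 || idlen >= cap)
        return 0;
    memcpy(id_out, cookie, idlen);
    id_out[idlen] = '\0';
    snprintf(expect, sizeof expect, "%016llx",
             (unsigned long long)toy_hash(salt, id_out));
    return strcmp(sep + 1, expect) == 0;
}
```

A client that alters the id, or that does not know the salt, cannot produce a
matching pair, so the server simply treats the request as having no session.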
As of mod_spin 1.0.5, you must define the SpinCookie configuration parameter,
or sessions won't be supported for that application at all.
Each session will have corresponding SDBM files (.dir and .pag) in the
SpinWorkspace directory (if defined), named after the session id. Each
application will have those as well, named __app.dir and __app.pag. There is
nothing special about these files - they are simply a collection of key/value
pairs. Through a simple API, you can get values for each key, either on the
application level (i.e. shared among multiple sessions) or session level (i.e.
private data).
Given those things are just files, the maintenance of stale sessions is easy.
Simply define a cron job that goes around and removes whatever is older than
what you consider a valid session (i.e. files that have not been accessed for
longer than a period you define). Note that directories for keeping application
and session data must be private (i.e. read/write by owner only) and they
cannot be symlinks. The code of mod_spin will refuse to use them otherwise. You
also need to make sure that nothing but application and session data is stored
in this directory. Otherwise, it may collide with application and session
files.
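The kind of check described for the workspace directory can be sketched with
plain POSIX calls. This is not mod_spin's code, and workspace_is_private() is a
hypothetical name. The "private" and "not a symlink" conditions come from the
text above; requiring that the directory is owned by the effective user of the
process is an additional assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if path is a real directory (not a symlink) that gives no
 * permissions to group or others and belongs to the effective user,
 * 0 otherwise (including when path does not exist).
 */
static int workspace_is_private(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) != 0)        /* lstat() does not follow symlinks */
        return 0;
    if (!S_ISDIR(st.st_mode))         /* a symlink to a directory fails here */
        return 0;
    if ((st.st_mode & (S_IRWXG | S_IRWXO)) != 0)
        return 0;                     /* group or others have some access */
    return st.st_uid == geteuid();
}
```

A directory created with mode 0700 by the user Apache runs as passes the check;
the same directory with mode 0755, or a symlink pointing to it, does not.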
Although basic concepts have been pinched from JSP/Servlet world, applications
have a slightly different meaning in mod_spin. Basically, whatever uses the
same application database file falls under the "same application" umbrella.
You can configure SpinWorkspace per server, virtual host, directory or
location. So, applications can cross boundaries freely. Sessions follow the
same rule, so you can have separate private session data for different
definitions of SpinWorkspace.
Application configuration
=========================
Each application can (but doesn't have to) have a configuration file. The
filename is specified via the SpinAppConfig run-time configuration directive.
The file is regular XML and it looks like this:
]>
The value associated with spinparameter1
The value associated with spinparameter2
It is best to include the DTD in the document (it is small) in order to avoid
parsing problems.
The configuration is loaded and reloaded automatically by mod_spin. Once the
configuration is parsed, the keys and values of the tags are placed
into the application's SDBM file. Every time this file is opened, the
configuration file is checked for modification. If the configuration file is
newer, it is parsed again and the keys and values are reloaded into the SDBM
file.
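The reload check can be pictured with the sketch below. It assumes the
modification time of the configuration file is compared against the time of
the last load, which the text above does not spell out. The app_config_t
structure and the function name are hypothetical, and the actual parsing step
is left as a comment.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical bookkeeping for one application's configuration. */
typedef struct {
    time_t loaded_at;   /* mtime of the file when it was last parsed */
    int reloads;        /* how many times the file has been (re)parsed */
} app_config_t;

/* Re-parse the configuration if the file is newer than the last load.
 * Returns 1 if a reload happened, 0 if not (or if the file is missing). */
int config_refresh(app_config_t *cfg, const char *path)
{
    struct stat st;

    assert(cfg != NULL && path != NULL);
    if (stat(path, &st) != 0)
        return 0;
    if (st.st_mtime <= cfg->loaded_at)
        return 0;
    /* ... parse the XML and store keys/values in the SDBM file here ... */
    cfg->loaded_at = st.st_mtime;
    cfg->reloads++;
    return 1;
}
```

Calling this every time the application's store is opened gives the automatic
reload behaviour: the first call loads, later calls do nothing until the file
changes.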
Authentication
==============
Apache provides enough authentication mechanisms to not duplicate this
functionality in mod_spin. And because Apache's request_rec structure contains
all environment variables, the information about the user using the resource
is always available to your applications. At this point in time, I do not
feel that keeping user data similar to session and application data is
necessary. Things like that mostly belong in the application.
However, you can wrap Apache authentication with the spin_auth application and
a small amount of your own code. See the spin_auth and spin_app applications
for all details.
Connection pools
================
mod_spin has a simple API for accessing SQL relational databases. In order to
reduce the cost of connecting to database (and other) servers, mod_spin uses
the popular pool approach. Each connection is identified by the type (of the
database) and the connection string, which are specified when the connection
is opened. mod_spin keeps a hash table keyed on these two values and stores
connection structures, which are database specific, as values in this table.
Any subsequent attempt to open a connection to a database of the same type and
with the same connection string (the keys are case sensitive) will reuse the
existing connection. This can dramatically improve performance of applications
that frequently use (database) connections.
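The lookup-or-open idea can be sketched like this. It is a toy with a small
fixed array instead of a real hash table, and every name in it (conn_pool_t,
pool_get, demo_open) is invented for the example. None of it is mod_spin's
API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_MAX     8
#define TYPE_MAX     32
#define CONNINFO_MAX 256

typedef struct {
    char type[TYPE_MAX];           /* database type, e.g. "pgsql" */
    char conninfo[CONNINFO_MAX];   /* connection string */
    void *handle;                  /* database specific connection */
} pool_entry_t;

typedef struct {
    pool_entry_t entries[POOL_MAX];
    size_t used;
} conn_pool_t;

typedef void *(*open_fn)(const char *type, const char *conninfo);

/* Return the pooled connection for (type, conninfo), opening one through
 * open_conn only if no entry matches. Keys are compared case sensitively.
 * Returns NULL if the pool is full, a key is too long or the open fails. */
void *pool_get(conn_pool_t *pool, const char *type, const char *conninfo,
               open_fn open_conn)
{
    void *handle;
    size_t i;

    assert(pool != NULL && type != NULL && conninfo != NULL);
    assert(open_conn != NULL);
    for (i = 0; i < pool->used; i++) {
        if (strcmp(pool->entries[i].type, type) == 0 &&
            strcmp(pool->entries[i].conninfo, conninfo) == 0)
            return pool->entries[i].handle;   /* reuse existing connection */
    }
    if (pool->used == POOL_MAX ||
        strlen(type) >= TYPE_MAX || strlen(conninfo) >= CONNINFO_MAX)
        return NULL;
    handle = open_conn(type, conninfo);
    if (handle == NULL)
        return NULL;
    strcpy(pool->entries[pool->used].type, type);
    strcpy(pool->entries[pool->used].conninfo, conninfo);
    pool->entries[pool->used].handle = handle;
    pool->used++;
    return handle;
}

/* Demo opener: hands out distinct dummy handles instead of connecting. */
void *demo_open(const char *type, const char *conninfo)
{
    static int handles[POOL_MAX];
    static size_t next;

    (void)type;
    (void)conninfo;
    return next < POOL_MAX ? (void *)&handles[next++] : NULL;
}
```

Asking twice for the same type and connection string yields the same handle,
while a connection string that differs only in case opens a second connection.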
Each Apache thread will have its own pool of connections (see also the
threading discussion that follows). While this is good for performance, it has
downsides.
With every thread having its own private connections to the back-end server,
the total number of connections can be rather big (i.e. number of threads
multiplied by number of connections per thread; 100 threads with 3 connections
each means 300 open connections). Each connection takes memory,
CPU cycles and sockets for communication, which, depending on the number of
connections, might not be negligible. This alone can overwhelm the machine and
can ultimately result in denial of service. That is another reason why it is a
good idea to run a separate instance of Apache for heavily loaded applications.
Luckily, Apache is fast to start and it doesn't consume a lot of memory (in
today's terms), so you can have many instances of it running at once. With this
approach, you're turning your Apache server into a transaction processing
server.
As of version 1.0.2 of mod_spin, connection pools have been made more generic.
Now you can register any type of connection with the connection pool. It will
be treated as RXV_SPIN_CONN_FOREIGN type and as long as it has a unique
connection string, you should be fine. This can then be used for any type of
connection you'd like to keep hanging around for the lifetime of the thread.
LDAP and similar services come to mind first.
Admittedly, this kind of simple database API will not be everyone's cup of
tea. There are very nice alternatives (SQL Relay comes to mind first) that
solve all of these problems and more. Also, some people prefer to program
against truly cross platform solutions like ODBC. Feel free to completely
ignore mod_spin's database API.
Threading
=========
Some Apache 2 MPMs (Multi-Processing Modules), e.g. worker, spin off multiple
threads of execution. Also, you might spin off some threads in your
application code as well. Several issues may affect thread safety, the most
important being the connection pool, followed closely by the SDBM database
handles used for application/session tracking.
SDBM database handles are allocated from each thread when they are needed
(i.e. the SDBM files are opened by the thread for the thread). So, as long as
you don't spin off any threads of your own, you should be fine. If you do spin
off threads and want to use the same handles (they are stored in
rxv_spin_guts_t), you MUST SYNCHRONISE, or you might experience weird problems
in database access, especially in terms of locking (SDBM is not capable of
promoting a shared lock into an exclusive lock). This, of course, applies to
other variables within all structures as well, so having mutex/rwlock
variables outside of the context structure is a must.
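If you do share something between your own threads, the synchronisation could
look like the sketch below. It uses plain pthreads and a made-up
shared_store_t as a stand-in for whatever is reached through the shared
handle. The point is only that the mutex lives outside the shared structure
and every access goes through it.

```c
#include <pthread.h>

#define WRITERS_MAX       16
#define WRITES_PER_THREAD 10000

/* Stand-in for data reached through a handle shared between threads. */
typedef struct {
    long writes;
} shared_store_t;

/* The lock is kept outside the shared structure. */
static pthread_mutex_t store_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every access to the shared store takes the lock first. */
void store_write(shared_store_t *store)
{
    pthread_mutex_lock(&store_lock);
    store->writes++;
    pthread_mutex_unlock(&store_lock);
}

static void *writer(void *arg)
{
    int i;

    for (i = 0; i < WRITES_PER_THREAD; i++)
        store_write(arg);
    return NULL;
}

/* Run nthreads writers against one store and return the number of writes
 * recorded, or -1 on a bad argument or a thread creation failure. */
long run_writers(int nthreads)
{
    pthread_t tids[WRITERS_MAX];
    shared_store_t store = { 0 };
    int i, started = 0;

    if (nthreads < 1 || nthreads > WRITERS_MAX)
        return -1;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, writer, &store) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == nthreads ? store.writes : -1;
}
```

With the lock in place no update is lost, however the writer threads
interleave. Depending on the platform, this may need to be built with
-pthread.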
The situation with connection pools is slightly more complicated. Each thread
will have its own connection pool. The separation is achieved using the
apr_threadkey_private_* set of functions, which on Linux/Unix map onto the
pthread_key_* functions. However, if you spin off your own threads and share a
pool between them, you MUST SYNCHRONISE access, or you will experience
problems.
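For the curious, the per-thread separation can be sketched directly with
pthreads. This shows the general technique only and is not the mod_spin
source: thread_pool_t and the function names are hypothetical, and the pool
itself is reduced to a single counter.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread pool; a real one would hold the connections. */
typedef struct {
    int nconns;
} thread_pool_t;

static pthread_key_t pool_key;
static pthread_once_t pool_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit for every thread that created a pool. */
static void pool_destroy(void *pool)
{
    free(pool);
}

static void pool_key_init(void)
{
    pthread_key_create(&pool_key, pool_destroy);
}

/* Return the calling thread's private pool, creating it on first use.
 * Returns NULL if the pool could not be allocated or stored. */
thread_pool_t *thread_pool_get(void)
{
    thread_pool_t *pool;

    pthread_once(&pool_once, pool_key_init);
    pool = pthread_getspecific(pool_key);
    if (pool == NULL) {
        pool = calloc(1, sizeof *pool);
        if (pool != NULL && pthread_setspecific(pool_key, pool) != 0) {
            free(pool);
            pool = NULL;
        }
    }
    return pool;
}

static void *differs_worker(void *other)
{
    static int yes;
    thread_pool_t *mine = thread_pool_get();

    return (mine != NULL && mine != other) ? (void *)&yes : NULL;
}

/* Return 1 if a freshly started thread gets a pool distinct from other,
 * 0 if it does not, -1 if the thread could not be started. */
int other_thread_pool_differs(thread_pool_t *other)
{
    pthread_t tid;
    void *res = NULL;

    if (pthread_create(&tid, NULL, differs_worker, other) != 0)
        return -1;
    pthread_join(tid, &res);
    return res != NULL;
}
```

Within one thread every call returns the same pool, and a second thread gets
its own, so no locking is needed as long as a pool is never handed to another
thread.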
mod_spin code does not do any synchronisation of its own (i.e. it is thread
unsafe), simply because it makes sure beforehand that all variables are
strictly something a single thread can use without worrying about any other
threads. This ensures there is no resource competition among threads.
IMPORTANT NOTE: As of MIT Kerberos (krb5-libs) version 1.3.6 and PostgreSQL
7.4.6, the combination is not thread safe. If you use worker (or other thread
based) Apache MPM, you can experience segfaults. There are also some memory
leaks associated with krb5 library, so if you don't use database pools, you
may see Apache child processes slowly growing in size (the memory leaks have
been fixed in the upcoming version 1.4 of krb5-libs). A periodic graceful
restart should work around that. Generally speaking, the prefork MPM is
recommended in these scenarios.
PostgreSQL code
===============
Function conninfo_parse(), located in the file spin/db.c, has been modified
from the PostgreSQL 7.4.1 distribution. This function is Copyright (c)
PostgreSQL Global Development Group and Regents of the University of
California. It is included here under the following licence:
Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group
Portions Copyright (c) 1994, The Regents of the University of California
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement
is hereby granted, provided that the above copyright notice and this
paragraph and the following two paragraphs appear in all copies.
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO
PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
Licensing exceptions
====================
The code of mod_spin is licensed under the terms of the GNU General Public
Licence, or GPL for short. However, I have made exceptions in certain files of
mod_spin to make it possible to link this code with Apache itself, its modules,
as well as dynamically link shared libraries that are mod_spin applications.
It would not be legally possible to link mod_spin against Apache (both
dynamically and statically) unless this exception was made. Also, it would not
be possible to distribute statically linked Apache that includes mod_spin.
This exception takes care of that as well. You MUST OBEY THE GPL for all
mod_spin code.
It would also not be legally possible to link any non-GPL licensed mod_spin
applications (shared libraries) dynamically, at run-time, into mod_spin.
Because I do not want to attempt to force anyone to use a particular licence
for their own work, you get permission to dynamically link, at run-time, any
of mod_spin applications (shared libraries) with mod_spin. Furthermore, Apache
can have dynamically linked modules that aren't licensed under the GPL, which
would also cause legal problems. The exception makes sure this is OK too.
This dynamic linking has to be through the interface of SpinApplication and
SpinAppEntry or LoadModule run-time configuration directives of Apache, as
provided by mod_spin code or Apache itself. Nothing but dynamic linking of
mod_spin applications and Apache third party modules is covered by this
exception and you MUST OBEY THE GPL for all mod_spin code.
If you modify mod_spin code, you may extend these exceptions to your version of
the file, but you are not obligated to do so.