Welcome to mod_spin for Apache 2.x from Rexursive
=================================================

mod_spin is an Apache module that provides the following functionality (in conjunction with some other modules):

- a simple template language with data replacement capabilities only
- persistent application and session data tracking
- dynamic linking of applications into Apache 2 as shared libraries
- parameters, cookies and multipart/form-data parsing via libapreq2
- simple API for (kind of) MVC controller functionality
- simple API for pooled (or not) access to SQL databases

mod_spin is written in C, just like Apache 2 itself, and it uses APR, which was written to be cross platform, secure and perform well. Generally speaking, you should see speed improvements when compared to Java, PHP and Perl solutions, sometimes even by an order of magnitude.

This software exists to enable easy development and deployment of high performance web applications written in C (or perhaps even other languages) on Linux (or Unix) systems running Apache 2. It should be particularly easy to do that on the systems that run RPM packaging system, such as Fedora Core, Red Hat Enterprise Linux, CentOS and similar distributions. Obviously, other types of packages can be built too, but RPM support is already in mod_spin.

How does mod_spin work?
=======================

mod_spin is essentially a content handler, meaning, for a specified file extension(s), mod_spin will read the file, parse it into an Abstract Syntax Tree (AST) and then replace the occurrences of references with values coming from the application (this is where mod_spin is similar to Velocity).

There is no predefined file extension for mod_spin templates. I sometimes use ".sm" for "spin macro", but you can use whatever you like, as long as you tell Apache what that is.

At one point I was considering mod_spin as an output filter, but eventually couldn't find enough justification to do so.
The code would be much more complex and I wanted to keep things simple. One day, maybe...

The application, a shared library (or .so), is dynamically linked at run-time (i.e. when the request is handled by Apache) and its entry function is called. This function takes one argument, which is a structure containing the context (where the data to be replaced is stored), parsed parameters, session and application information and the current request. It then executes whatever code is appropriate for the current request, most likely based on the URI and the parameters (this completely depends on what the application actually does, of course). This execution results in data structures holding the values that are to be placed into the template.

This data is then placed inside the template by traversing the AST and replacing references with values. The end result (a bucket brigade) is given to Apache output filters to push out into the world (and possibly modify the content as well).

Before the application entry function is called, mod_spin takes care of application and session tracking (it relies on cookies for that). It does that in a persistent manner (i.e. the values associated with the application and session are stored in a file). SDBM database functionality, already included in APR, is used to provide hashed searches for values based on specified keys.

What are mod_spin applications?
===============================

They are simply shared libraries. You would normally get those as a result of writing, compiling and linking a set of C program files. You can, of course, get those as a result of compiling and linking some other language. Keep in mind that mod_spin expects its data in a particular way. IF THAT'S NOT FOLLOWED, MANY THINGS WILL BREAK AND YOU MIGHT CAUSE SECURITY PROBLEMS ON YOUR SYSTEM.

That being said, mod_spin applications are probably not for someone that isn't comfortable with application development in C.
If you're looking for a scripting language, mod_spin isn't it. Actually, one of the main reasons for writing mod_spin was that I wanted full access to C Unix API but without the need to hammer (X)HTML out of my code. With mod_spin you can keep your focus on business logic and forget presentation for the most part.

See section "Service function" for all the details related to the entry point into the application.

Why not CSP (C Server Pages)?
=============================

C Server Pages are an implementation similar to JSP (Java Server Pages), but unlike JSP, feature C language snippets, not Java, placed into HTML. Such a page is then converted into a C program, compiled and then linked into a shared library, which is dynamically linked into Apache at run-time. In essence, it taps directly into the C run-time system, just like mod_spin does. It is probably faster because it does no template processing.

However, just like JSP, it suffers from similar problems. The first one is a confusing mix of C programming language with HTML. This makes it completely unusable (on the presentation level) by non-experts, even if trivial changes to the page are required (i.e. a spelling fix can cause serious functionality problems, even security violations), not to mention that the mixed code is truly unreadable. The second one is the "translate -> compile -> link -> dynamic link -> run" process, which creates further complications and opens up the possibilities for strange run-time errors. And finally, a full C development environment has to be installed on the system running CSP, in case any of the pages ever get changed. These arguments are more or less the same as those made when a template language such as Velocity is compared to JSP.

mod_spin avoids the above by defining a simple, data replacement only template language and leaves all of the programming logic where it belongs - in the application.
At the same time, the application behaves mostly (but not completely) neutral as far as presentation of data is concerned. It is almost irrelevant what the output is going to look like, so most of the time programmers are only busy working on functionality, not looks.

The above arguments, however, don't cut it for everyone (as I have observed in my encounters with other developers), so if you're one of those, mod_spin is probably not for you.

Security concerns
=================

Just like anything else written in C, if you aren't careful, you can shoot yourself in the foot quite effectively. Buffer overflows and similar problems can, however, be avoided if problematic functions aren't used and good programming practices followed. mod_spin makes heavy use of APR, which is an example of an API that was designed from the ground up to be secure.

Even if you're most careful, security problems can happen. It is therefore good to follow guidelines for secure Apache setup. In extreme circumstances (e.g. when you're allowing others to deploy their own applications into Apache running mod_spin), IT IS ADVISABLE TO RUN A SEPARATE INSTANCE OF APACHE, BEHIND THE MAIN SERVER, WITH A SOLE PURPOSE OF RUNNING MOD_SPIN APPLICATIONS. Virtual hosting for multiple clients is one of the examples where such a scenario might be effective. Applying chroot jail, SELinux and/or running different Apache instances under different user IDs on unprivileged ports will go a long way toward ensuring that even if someone breaks in, the potential for damage is minimal.

Here are some very important security implications that may arise from the use of mod_spin. This is in relation to tracking of application and session data and to connection pools. To understand the issues involved, one needs to understand how Apache deals with multiple client connections. One also needs to understand Unix file permission model.
Apache will generally spin up numerous processes or threads in order to handle multiple connections from clients. There is no guarantee that a process/thread that handled one client's connection will handle it again in the future. The process/thread will be assigned at Apache's discretion. So, it is possible, even likely, that a process that handled something related to one client, handles another client next time. If there is anything in the memory space of this process/thread left over from the previous client, it will be completely accessible to the next client. Applications of mod_spin, shared libraries, are linked directly into the running process and they have full access to the memory space of that process.

The above means that an application can fetch any previously opened connection from the pool of connections and use it at will. Depending on how this connection was opened in the first place (by the original client), this will enable reading and/or writing of data that otherwise might not be accessible. IT IS CLEAR THAT THIS IS A SERIOUS SECURITY IMPLICATION.

Generally speaking, you should make sure that mod_spin applications linked into an instance of Apache are all from the same "security realm". For instance, if you're using mod_spin to enable dynamic applications virtually hosted on a single server (machine) for your customers, allowing two different customers to deploy mod_spin application into the same instance of Apache will allow them to read/write each other's databases. This might be accidental or, more seriously, intentional and malicious. YOU SHOULD ABSOLUTELY MAKE SURE THAT SUCH APPLICATIONS ARE DEPLOYED INTO DIFFERENT INSTANCES OF APACHE, RUNNING UNDER DIFFERENT USER ACCOUNTS!

The second problem is connected to this in terms of user accounts and access to files. Session and application data tracking files will be readable and writable by the user Apache processes run as (this is defined in the Apache configuration file).
So, in the above scenario with two customers, they would be able to read/write each other's session/application files. Once again, YOU SHOULD ABSOLUTELY MAKE SURE THAT SUCH APPLICATIONS ARE DEPLOYED INTO DIFFERENT INSTANCES OF APACHE, RUNNING UNDER DIFFERENT USER ACCOUNTS!

General recommendation is: if you don't have full control over all applications and you're using session/application persistent store and/or connection pools, you should have separate instances of Apache for each identifiable "security realm".

Stability
=========

If you ever wrote a C program, you know that one of the most dreadful things is the infamous Segmentation Fault (SIGSEGV, Signal 11). It happens when your program tries to dereference a memory location that is invalid, such as NULL. mod_spin makes reasonable effort to ensure the raw data it handles (the template, session and application data, parameters, cookies etc.) is processed in a manner that produces no segfaults.

As for the context, the data that your own application prepares, mod_spin doesn't have any control over what's in there. It will take certain precautions against obvious stuff like NULL pointers, but some of the other errors might be complicated to detect and handle. And because mod_spin is a small and lightweight piece of software, it doesn't do any of that. It simply relies on you (yes, that's YOU!) that the data placed in the context is going to be good.

If the data is not good, the code will segfault, bringing down with it the child Apache process inside which it was executing. This is not a big concern from Apache's point of view, as the parent process will fork as many new processes as it needs - however, your server might suffer a denial of service attack because of this. So, make your context data good!

Note that the above scenario is only applicable to the prefork Apache MPM. Other MPM modules might behave in a different way (i.e.
more than one thread of Apache can be affected), so keep that in mind when deploying mod_spin under those scenarios.

Memory leaks
============

Apache Portable Runtime uses memory pools for most memory allocation and mod_spin naturally follows. It is a good and fast approach. However, some memory pools may have rather long life cycle (namely per-thread pool of mod_spin and its sub-pools, used for parsed templates). Although the code of mod_spin tries to avoid these long lasting pools whenever possible, it is sometimes unavoidable to have things put into them. Also, the connection pools will be associated with the per-thread pool. This can lead, over time and given huge number of requests or a lot of template changes, to small memory leaks.

That's why mod_spin as of version 0.9.4 has a new configuration parameter SpinClearCount. After defined number of requests handled by the thread, the per-thread pool is destroyed, including all its sub-pools and database connections. This causes templates to be re-parsed and database connections to be reopened, which is a performance penalty, but it might be useful in some pathological corner cases.

Language constructs
===================

The template language of mod_spin has only three commands: a loop and two conditionals. They look like this:

  #for(${reference})
  some text within the loop and a ${reference.column}
  #end

  #if(${reference})
  some text to replace if ${reference} is not NULL
  #else
  some text to replace if ${reference} is NULL
  #end

  #unless(${reference})
  some text to replace if ${reference} is NULL
  #else
  some text to replace if ${reference} is not NULL
  #end

That's it. Everything else is the matter for the application, not the template language.

References, which are case sensitive, placed inside the text will be replaced with their values from the context or nothing if that value is NULL or the reference does not exist.
References are never recursively substituted (this may create denial of service or security problems and it is therefore avoided). If such functionality is desired, it belongs in your application.

Data types and loops
====================

You can place two different types of data in the context: single and rows.

Singles are simply character strings. They are pointed to by a char* and limited by the length. Generally, mod_spin does not rely on '\0' being present at the end of the string. However, regular C APIs mostly handle strings that have the ending '\0' character. Therefore, all single data, although being declared as 'size' in length, actually gets a '\0' character at the end (naturally, the space for this character is allocated when the single is created). This is very useful when communicating with regular C APIs, as it saves a lot of copying and memory allocation. If you design your own functions that create single data, you MUST FOLLOW THIS CONVENTION OR YOU'RE SETTING YOURSELF UP FOR A WHOLE HEAP OF BUFFER OVERFLOWS!

Rows are data that looks a lot like something that would be returned from an SQL query: there are named columns and data contains certain number of rows. However, unlike what's returned by SQL queries (i.e. single pieces of data), each actual piece of data can again be either rows or single. This then enables nesting of multiple data dimensions. The nested #for loops are used to spin around such data. That's where the name mod_spin comes from.
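The trailing '\0' convention for singles can be illustrated with a small standalone sketch. The names sketch_single_t, sketch_single_make() and sketch_single_free() are hypothetical and are not part of the mod_spin API; malloc() is used only so the sketch compiles on its own, whereas real mod_spin code would presumably allocate from an APR pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a single; NOT the real rxv_spin_data_t. */
typedef struct {
    size_t size;  /* number of characters in data, not counting the '\0' */
    char *data;   /* 'size' characters followed by exactly one '\0' */
} sketch_single_t;

/* Create a single from len bytes at src. The source buffer does not have
 * to be NUL-terminated; the copy always is. Returns NULL on failure. */
sketch_single_t *sketch_single_make(const char *src, size_t len)
{
    sketch_single_t *s;

    assert(src != NULL || len == 0);

    if (len == (size_t)-1)          /* len + 1 would overflow */
        return NULL;

    s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;

    s->data = malloc(len + 1);      /* one extra byte for the terminator */
    if (s->data == NULL) {
        free(s);
        return NULL;
    }

    if (len > 0)
        memcpy(s->data, src, len);
    s->data[len] = '\0';            /* the convention: always terminate */
    s->size = len;                  /* size excludes the terminator */
    return s;
}

/* Release a single created by sketch_single_make(). */
void sketch_single_free(sketch_single_t *s)
{
    if (s != NULL) {
        free(s->data);
        free(s);
    }
}
```

Because the terminator is always present, s->data can be handed straight to functions such as strlen() or strcmp() without copying, while code that honours 'size' never needs to look for the '\0'.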
Figure 1: Example data - Single

+------+
| type | unsigned char: RXV_SPIN_DATA_SGL
+------+
| size | size_t: number of characters in data
+------+            +--------------------------------+------+
| data | char* ---> | The actual data of size 'size' | '\0' |
+------+            +--------------------------------+------+

Figure 2: Example data - Rows

+------+
| type | unsigned char: RXV_SPIN_DATA_RWS
+------+
| size | size_t: number of rows in each array pointed to by values of cols
+------+
| cols | apr_hash_t* ---+
+------+                |
                        |
          +-------------+
          |
   +-----+-------+
   | key | value | rxv_spin_data_t* ---+
   +-----+-------+                     |
   | key | value |                     |
   +-----+-------+      +--------------+----------------------------------+
   | key | value |      | Array of rxv_spin_data_t structures 'size' long |
   +-----+-------+      +-------------------------------------------------+
   | key | value |
   +--+--+-------+
      |                +------------------------+
      +---> char* ---> | zero terminated string |
                       +------------------------+

The final data type that is replaced into the template is always single. mod_spin doesn't know how to replace full rows because the presentation would be undefined. That's why you have to use #for loops to spin around rows data type to place the singles contained there in their correct places inside the template.

The #for loop won't spin if the data it is supposed to process is NULL. This can happen if the appropriate data for the reference cannot be found in the context, or if the value of it is NULL. The same applies to actual references that are replaced into the template - if the end result is NULL, nothing is replaced.

Metadata
========

Some of the API calls, like rxv_spin_meta_vstr() and rxv_spin_meta(), return a pointer to rxv_spin_data_t that has a type of RXV_SPIN_DATA_MTA, or metadata. This data type is never used in the AST and if it is passed into it, it might cause errors.
Its only purpose is to facilitate the API itself by making sure lengths of data arrays are stored somewhere, so that the programmer using the API doesn't have to use separate variables to store them. API calls know how to handle metadata, so if you stick to those, you should be fine.

IMPORTANT: Placing metadata into AST can have unpredictable results.

Conditionals
============

The only other command in mod_spin apart from #for loop is the conditional, as shown above. Again, it looks like this:

  #if(${reference})
  something if ${reference} is not NULL
  #else
  something if ${reference} is NULL
  #end

or the negative variant:

  #unless(${reference})
  something if ${reference} is NULL
  #else
  something if ${reference} is not NULL
  #end

You can also use:

  #if(${ref}) something #end
  #if(${ref}) something #else#end
  #if(${ref})#else something #end

and naturally:

  #unless(${ref}) something #end
  #unless(${ref}) something #else#end
  #unless(${ref})#else something #end

Loops and impossible references
===============================

Normally, mod_spin template will look something like this:

  First text ${ref1}
  #for(${ref2})
  replicate some other text and ${ref2.col1}
  #end

The ${ref1} will be replaced with the value found in the context, if the data it points to is a single, or nothing at all if the data it points to is of type rows. The #for loop will spin around ${ref2} and replicate the enclosing text for all instances of data that ${ref2} points to. The ${ref2.col1} will be replaced with the current row value of the column "col1", if ${ref2} happens to be a data type rows and ${ref2.col1} resolves to a data type single for the current row.
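The spinning described above can be pictured with a deliberately simplified, standalone sketch. It models only one level of rows whose columns hold plain strings (NULL meaning "no value"), and it renders into a fixed buffer. All sketch_* names are hypothetical; the real module works on an AST, rxv_spin_data_t structures and bucket brigades, none of which is reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified column: a name plus one value per row. */
typedef struct {
    const char *name;            /* column name, e.g. "col1" */
    const char *const *values;   /* one entry per row; NULL means no value */
} sketch_column_t;

/* Hypothetical simplified rows: 'size' rows, 'ncols' named columns. */
typedef struct {
    size_t size;
    size_t ncols;
    const sketch_column_t *cols;
} sketch_rows_t;

/* Append src to the NUL-terminated buffer out of capacity cap,
 * truncating silently if it does not fit. */
void sketch_append(char *out, size_t cap, const char *src)
{
    size_t used = strlen(out);

    if (used + 1 < cap)
        strncat(out, src, cap - used - 1);
}

/* Roughly emulates: #for(${ref})<prefix>${ref.<column>}<suffix>#end */
void sketch_spin(const sketch_rows_t *rows, const char *column,
                 const char *prefix, const char *suffix,
                 char *out, size_t cap)
{
    const sketch_column_t *col = NULL;
    size_t i;

    assert(out != NULL && cap > 0);
    out[0] = '\0';

    if (rows == NULL)            /* NULL data: the loop does not spin */
        return;

    for (i = 0; i < rows->ncols; i++)
        if (strcmp(rows->cols[i].name, column) == 0)   /* case sensitive */
            col = &rows->cols[i];

    for (i = 0; i < rows->size; i++) {
        sketch_append(out, cap, prefix);
        if (col != NULL && col->values[i] != NULL)     /* NULL -> nothing */
            sketch_append(out, cap, col->values[i]);
        sketch_append(out, cap, suffix);
    }
}
```

With three rows whose "col1" values are "a", NULL and "c", a prefix of "[" and a suffix of "]", the enclosed text is replicated once per row and the NULL value simply contributes nothing.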
Now let's examine an example where impossible references are used:

  First text ${impossible.reference}
  #for(${second.impossible.reference})
  replicate some other text and ${second.impossible.reference.column}
  #end

The first reference, ${impossible.reference}, can never be found in the context because there is no #for loop to spin the data in ${impossible} in order to find ${impossible.reference}. So, when creating the AST, mod_spin will simply ignore this reference.

The reference used to spin the #for loop, ${second.impossible.reference}, is also something that cannot exist, so mod_spin will ignore the whole loop and never place any of it into AST.

Note that this is different from the first code snippet with, for instance, the value of ${ref2} being NULL, or not existing at all. The parsed #for loop and the text it encloses will be placed into the AST, but it won't be replicated because there is no data to spin the loop around.

The above discussion applies to conditional statements as well. For instance:

  First text ${impossible.reference}
  #if(${second.impossible.reference})
  replicate some other text and ${second.impossible.reference.column}
  #end

The above example would yield exactly the same output as the previous example with the #for loop. However, if you use the #else, then whatever is placed within it will be used. For instance:

  First text ${impossible.reference}
  #if(${second.impossible.reference})
  replicate some other text and ${second.impossible.reference.column}
  #else
  this will always be in the output
  #end

In the above example, the text placed between #else and #end will always be in the output, because the #if would never be true, given that the reference is impossible.

And one example for the #unless, the negative conditional:

  #unless(${second.impossible.reference})
  replicate some other text and ${second.impossible.reference.column}
  #end

The above will always end up in the output, because the reference is impossible.
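One way to picture the parse-time rule behind impossible references is the sketch below. It encodes an assumption inferred from the examples in this section, not the actual parser logic: a reference without a dot is always possible, and a dotted reference is possible only if everything before its last dot is exactly the reference of an enclosing #for loop. The function name sketch_ref_possible() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if ref could ever resolve, 0 if it is "impossible".
 * loops holds the references of the enclosing #for loops (outermost
 * first), nloops is their count. Comparison is case sensitive. */
int sketch_ref_possible(const char *ref, const char *const *loops,
                        size_t nloops)
{
    const char *dot;
    size_t plen, i;

    assert(ref != NULL);

    dot = strrchr(ref, '.');
    if (dot == NULL)
        return 1;       /* top level: looked up directly in the context */

    plen = (size_t)(dot - ref);      /* length of the part before the dot */
    for (i = 0; i < nloops; i++)
        if (strlen(loops[i]) == plen && strncmp(loops[i], ref, plen) == 0)
            return 1;   /* an enclosing loop spins the parent rows */

    return 0;           /* nothing spins the parent: impossible */
}
```

Under this assumed rule, ${ref2.col1} inside #for(${ref2}) is possible, while ${impossible.reference} outside any loop is not, which matches the behaviour described above.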
Service function
================

Service function is the entry function into your application. It is called BEFORE template processing, so it has the potential to change which template is going to be processed as well as to decline or do the processing completely.

The entry function (by default called rxv_spin_service()) takes one argument - the context. It returns an integer which is similar to what an Apache handler would return. The meaning is as follows:

OK:
  Everything was OK, continue with template processing. Note here that by manipulating filename field within the request_rec structure, you can change which template is to be processed. Make sure other fields (e.g. finfo) that are related to filename are properly updated as well.

REDIRECT (e.g. HTTP_TEMPORARY_REDIRECT):
  Any further processing should not be done as this request is going to be externally redirected. Note here that the application HAS TO set the "Location" header in headers_out. Failing that, the client will have problems.

DECLINED or DONE:
  The service function either decided it's not something this handler should do (DECLINED) or has done all the work on behalf of it (DONE). These are short-circuit return codes that will greatly affect Apache request processing, so be careful with them.

ANYTHING ELSE:
  This will result in an internal server error.

If SpinApplication isn't specified, the shared library will not be loaded at all, but the template will be processed as normal. However, there will be no data in the context and therefore none will be placed in the final output.

Loading of applications
=======================

Early versions of mod_spin (up to and including 0.9.12) had a very primitive logic of loading applications (libraries, Dynamic Shared Objects, DSOs). On each request, the library would be loaded and then unloaded. For small libraries, this was almost acceptable, but for libraries that pulled in a lot of dependencies (i.e.
other libraries), the performance penalty was severe (I've measured over 8 times performance degradation in some of the cases, but it could be even worse). To avoid that, a new system had to be implemented in 0.9.13.

First, some background. All loading of shared libraries is done using apr_dso_load() call from APR. On Linux (and some other Unix variants), this translates into dlopen() call, which gives back a handle to a loaded library. If a process attempts to open the same library again, the same handle will be given back and a reference count for that library will be increased. In a multi-threaded environment, this would mean that if multiple threads of execution attempt to open the same library, they would be given back the same handle and the reference count would be equal to the number of thread passes that opened the library.

Since Apache 2 could be running in a multi-threaded configuration (e.g. worker MPM), it is very difficult to control when a library will be completely unloaded. Something like that would involve introduction of per-process read/write locks, the code would become much more complicated and bug-prone. Instead, new version of mod_spin relinquishes the control of unloading of libraries to Apache itself.

So, mod_spin 0.9.13 and above make sure that each thread keeps cache of loaded DSOs and that it opens a particular library only once. This is done to avoid registering of a pool cleanup for each call to apr_dso_load(), which would quickly grow private thread pool. Once the new applications are deployed, in order to reliably reload them, the main Apache process has to be given SIGUSR1 signal (i.e. a graceful restart has to be initiated), so that all child processes die and new ones replace them. This will ensure new applications are loaded across the board.

Note that if SpinClearCount other than zero is specified, the private thread pool will be cleaned after specified number of requests served by the thread.
Each time the pool is cleaned, the apr_dso_unload() will be called through the pool cleanup functionality. However, dlopen() keeps reference count per process, so relying on this functionality for reloading of new applications is completely unreliable. The only reliable way is to gracefully restart Apache.

Maximum nesting depth
=====================

The combined nesting depth of #for and #if/#unless commands is limited to RXV_SPIN_MAX_DEPTH, as defined in private.h, which is currently 32. Why have such a limit and why is the limit so low?

The limit is there to make the code of mod_spin simple and fast and to avoid logic errors in templates caused by inadvertent use of deep nesting. The limit is low because templates that require nesting depth anywhere near this limit are doing something very, very wrong. The purpose of template language constructs is not to introduce programming logic (in the sense of solving the business problem the application is meant to solve), but to make simple presentation level choices depending on the data generated by the application. I cannot stress enough that ALL business logic should be in the application and application alone.

So, the whole thing is designed on purpose. You are not supposed to have a great variety of commands available in your template language, you should not be able to modify the data from within the template language and you should not give in to the temptation of fixing programming issues inside the template.

In my experience, a few nesting levels in the template are quite sufficient for majority of the real world problems. The limit currently set is way above that. However, if you find that this is not adequate for you, for whatever reason, feel free to modify private.h and recompile.

Presentation issues and the application
=======================================

You'll find that some of the presentation level decisions will be done within your application as well (huh?).
When given the choice of placing some presentation level logic into the application as compared to contaminating the template with business logic, I have chosen to go with the former.

Cleverly designed application will have a separate part that makes data generated by business logic into a presentation friendly format. For instance, when (X)HTML pages are created, some characters, like ``"'' and ``&'' have special meaning. Business logic won't bother itself with making sure those are escaped. However, the part of your application that makes sure presentation is nice, will.

Another example is a list of items on the page that should have rows displayed in alternating colours (this particular problem can be solved by newer version of CSS, but the browsers that support that are still not in widespread use). Business logic, again, won't bother itself with that. Presentation "beautifier" will.

For instance, one might have a boilerplate API calls (similar to what mod_spin provides already, as indicated below) that adds columns to the rows data type with a sole purpose of marking certain things for the template to pick up. One such example would be to add a column "firstrow", which would have all data NULL, except for the first row. Similar can be done for the last row. Again, similar can be done for alternating rows (odd/even). Then the template can have:

  #for(${rowsofdata})
    #if(${rowsofdata.firstrow})
      Output this only on the first row
    #end
    #unless(${rowsofdata.firstrow})
      Output this for all rows except for the first one
    #end
    #if(${rowsofdata.oddrow})
      Output this only on the odd row
    #else
      Output this only on the even row
    #end
    #unless(${rowsofdata.lastrow})
      Output this for all rows except for the last one
    #end
    #if(${rowsofdata.lastrow})
      Output this only on the last row
    #end
  #end

These API calls would then fall into the "beautifier" category. Use your imagination to come up with more...

How do I include other templates?
=================================

I find that duplicating functionality is not a good thing. So, I tried to stay away from that. Apache already has mod_include, which can be used as a filter or a handler and provides excellent support for inclusion of other files.

If the files that you're including are not dynamic (at least not very dynamic), you should even consider generating finished files beforehand, using some of the available replacement techniques, such as XSLT. This will be good for the performance of your web server. On my old 1 GHz Athlon system, I have benchmarked Apache 2 and it was capable of delivering around 2,500 static pages per second, each around 10 kB in size (that's 25 MB/s bandwidth). Tomcat behind Apache was able to deliver around 60 dynamic pages per second, of roughly the same size, on the same machine (that's 600 kB/s bandwidth). It is worth an effort to reduce what's dynamic to a minimum.

Template file size
==================

With Apache 2.0.49 and the APR that comes with it, running on Fedora Core 1, apr_off_t, off_t, apr_size_t and size_t are all 32-bit integer values. That means that the maximum template size on this platform is 2 GB (I'm not sure what you'd use such large web pages for, but nevertheless). I'm guessing on 64-bit platforms those values would be 64-bit integers, which would make possible template size much, much larger, but I have not verified that.

Session and application tracking
================================

The simplest way would be to use mod_usertrack, which is part of Apache. This is in fact what mod_spin, up to and including 1.0.4, did. However, the cookie generated this way is very predictable (it is simply a timestamp), so anyone could easily figure it out.

That's why mod_spin 1.0.5 and above uses a different approach. It relies on mod_unique_id to provide a unique session identifier, then it produces an MD5 hash of it, using the crypto salt.
Both of these (unique id and the hash) are then served to the client in a cookie, usually called SpinSession. Only if both of these are returned back to the server correctly, will mod_spin use this unique id as the session id. Otherwise, the session simply won't exist. This should make both guessing of session identifiers and denial of service attacks caused by opening of fake sessions significantly more difficult.

As of mod_spin 1.0.5, you must define SpinCookie configuration parameter, or the sessions won't be supported for that application at all.

Each session will have corresponding SDBM files (.dir and .pag) in the SpinWorkspace directory (if defined), named after the session id. Each application will have those as well, named __app.dir and __app.pag. There is nothing special about these files - they are simply a collection of key/value pairs. Through a simple API, you can get values for each key, either on the application level (i.e. shared among multiple sessions) or session level (i.e. private data).

Given those things are just files, the maintenance of stale sessions is easy. Simply define a cron job that goes around and kills whatever is older than you consider a valid session (i.e. has not been accessed for longer than defined).

Note that directories for keeping application and session data are considered private (i.e. read/write by owner only) and they cannot be symlinks. The code of mod_spin will refuse to use them if they are not. You also need to make sure that nothing but application and session data is stored in this directory. Otherwise, it may collide with application and session files.

Although basic concepts have been pinched from JSP/Servlet world, applications have a slightly different meaning in mod_spin. Basically, whatever uses the same application database file falls under the "same application" umbrella. You can configure SpinWorkspace per server, virtual host, directory or location. So, applications can cross boundaries freely.
Sessions are scoped by SpinWorkspace in the same way, so you can have separate session private data for each definition of SpinWorkspace.

Application configuration
=========================

Each application can (but doesn't have to) have a configuration file. The filename is specified via the SpinAppConfig run-time configuration directive. The file is regular XML and it looks like this:

    ]> The value associated with spinparameter1 The value associated with spinparameter2

It is preferred to include the DTD in the document (it is small) in order to avoid parsing problems.

The configuration is loaded and reloaded automatically by mod_spin. Once the configuration is parsed, the keys and values of the tags are placed into the application's SDBM file. Every time this file is opened, the configuration file is checked for modification. If the configuration file is newer, it is parsed again and the keys and values are reloaded into the SDBM file.

Authentication
==============

Apache provides enough authentication mechanisms to not duplicate this functionality in mod_spin. And because Apache's request_rec structure contains all environment variables, the information about the user using the resource is always available to your applications.

At this point in time, I did not feel that keeping user data similar to session and application data was necessary. Things like that mostly belong in the application. However, you can wrap Apache authentication with the spin_auth application and a small amount of your own code. See the spin_auth and spin_app applications for all details.

Connection pools
================

mod_spin has a simple API for accessing SQL relational databases. In order to reduce the cost of connecting to database (and other) servers, mod_spin uses the popular pool approach. Each connection is identified by the type (of the database) and the connection string, which are specified when the connection is opened.
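
One way to picture a pool keyed like this is a lookup on the (type, connection string) pair, where opening with a pair that has been seen before hands back the existing connection. The following is only a minimal sketch under assumptions: conn_t, pool_t, pool_open() and the "pgsql" type name are all hypothetical and are not mod_spin's real structures or API, and a fixed array stands in for the hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_MAX 8   /* arbitrary capacity for the sketch */

/* Hypothetical connection record, not mod_spin's real structure. */
typedef struct {
    char type[16];      /* e.g. "pgsql" (made-up type name) */
    char connstr[128];  /* connection string, compared byte for byte */
    int  handle;        /* stands in for the database specific structure */
} conn_t;

/* One pool per thread is assumed, so the sketch does no locking. */
typedef struct {
    conn_t slots[POOL_MAX];
    int    used;
} pool_t;

/* Return the pooled connection for (type, connstr), creating it on
 * first use. Returns NULL if the pool is full or a key does not fit. */
conn_t *pool_open(pool_t *p, const char *type, const char *connstr)
{
    conn_t *c;
    int i;

    assert(p != NULL && type != NULL && connstr != NULL);

    for (i = 0; i < p->used; i++) {
        c = &p->slots[i];
        /* strcmp() is case sensitive, so "dbname=Shop" != "dbname=shop" */
        if (strcmp(c->type, type) == 0 && strcmp(c->connstr, connstr) == 0)
            return c;   /* reuse: no new connection is made */
    }

    if (p->used == POOL_MAX ||
        strlen(type) >= sizeof(p->slots[0].type) ||
        strlen(connstr) >= sizeof(p->slots[0].connstr))
        return NULL;

    c = &p->slots[p->used];
    strcpy(c->type, type);
    strcpy(c->connstr, connstr);
    c->handle = p->used;    /* a real pool would connect to the server here */
    p->used++;
    return c;
}
```

With a zeroed pool_t, calling pool_open() twice with "pgsql" and "dbname=shop" would return the same record both times, while "dbname=Shop" would occupy a second slot.
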
mod_spin creates a hash of all those connections and stores connection structures, which are database specific, as values in this table. Any subsequent attempt to open a connection to the database of the same type and with the same connection string (the keys are case sensitive) will reuse the existing connection. This can dramatically improve performance of applications that frequently use (database) connections.

Each Apache thread will have its own pool of connections (see also the threading discussion that follows). While this is good for performance, it has downsides. With every thread having its own private connections to the back-end server, the total number of connections can be rather big (i.e. number of threads multiplied by number of connections per thread). Each connection takes memory, CPU cycles and sockets for communication, which, depending on the number of connections, might not be negligible. This alone can overwhelm the machine and can ultimately result in denial of service. That is another reason why it is a good idea to run a separate instance of Apache for heavily loaded applications. Luckily, Apache is fast to start and it doesn't consume a lot of memory (in today's terms), so you can have many instances of it running at once. With this approach, you're turning your Apache server into a transaction processing server.

As of version 1.0.2 of mod_spin, connection pools have been made more generic. Now you can register any type of connection with the connection pool. It will be treated as RXV_SPIN_CONN_FOREIGN type and as long as it has a unique connection string, you should be fine. This can then be used for any type of connection you'd like to keep hanging around for the lifetime of the thread. LDAP and similar services come to mind first.

Admittedly, this kind of simple database API will not be everyone's cup of tea. There are very nice alternatives (SQL Relay comes to mind first) that have solved all of these problems and more.
Also, some people prefer to program against truly cross-platform solutions like ODBC. Feel free to completely ignore mod_spin's database API.

Threading
=========

Some Apache 2 MPMs (Multi-Processing Modules), e.g. worker, spin off multiple threads of execution. Also, you might spin off some threads in your application code as well. There are several issues that might affect thread safety, the most important being the connection pool, followed closely by the SDBM database handles used for application/session tracking.

SDBM database handles are allocated from each thread when they are needed (i.e. the SDBM files are opened by the thread for the thread). So, as long as you don't spin off any threads of your own, you should be fine. If you do spin off threads and want to use the same handles (they are stored in rxv_spin_guts_t), you MUST SYNCHRONISE, or you might experience weird problems in database access, especially in terms of locking (SDBM is not capable of promoting a shared lock into an exclusive lock). This, of course, applies to other variables within all structures as well, so having mutex/rwlock variables outside of the context structure is a must.

The situation with connection pools is slightly more complicated. Each thread will have its own connection pool. The separation is achieved using the apr_threadkey_private_* set of functions, which on Linux/Unix map into pthread_key_* functions. However, if you spin off your own threads, you MUST SYNCHRONISE access, or you will experience problems.

mod_spin code does not do any synchronisation of its own (i.e. it is thread unsafe), simply because it makes sure beforehand that all variables are strictly something a single thread can use without worrying about any other threads. This ensures there is no resource competition among threads.

IMPORTANT NOTE: As of MIT Kerberos (krb5-libs) version 1.3.6 and PostgreSQL 7.4.6, the combination is not thread safe.
If you use the worker (or another thread-based) Apache MPM, you can experience segfaults. There are also some memory leaks associated with the krb5 library, so if you don't use database pools, you may see Apache child processes slowly growing in size (the memory leaks have been fixed in the upcoming version 1.4 of krb5-libs). A periodic graceful restart should fix that. Generally speaking, the prefork MPM is recommended in these scenarios.

PostgreSQL code
===============

Function conninfo_parse(), located in the file spin/db.c, has been modified from the PostgreSQL 7.4.1 distribution. This function is Copyright (c) PostgreSQL Global Development Group and Regents of the University of California. It is included here under the following licence:

Portions Copyright (c) 1996-2003, PostgreSQL Global Development Group

Portions Copyright (c) 1994, The Regents of the University of California

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

Licensing exceptions
====================

The code of mod_spin is licensed under the terms of the GNU General Public Licence, or GPL for short.
However, I have made exceptions in certain files of mod_spin to make it possible to link this code with Apache itself, its modules, as well as dynamically link shared libraries that are mod_spin applications.

It would not be legally possible to link mod_spin against Apache (either dynamically or statically) unless this exception was made. Also, it would not be possible to distribute a statically linked Apache that includes mod_spin. This exception takes care of that as well. You MUST OBEY THE GPL for all mod_spin code.

It would also not be legally possible to link any non-GPL licensed mod_spin applications (shared libraries) dynamically, at run-time, into mod_spin. Because I do not want to attempt to force anyone to use a particular licence for their own work, you get permission to dynamically link, at run-time, any mod_spin applications (shared libraries) with mod_spin. Furthermore, Apache can have dynamically linked modules that aren't licensed under the GPL, which would also cause legal problems. The exception makes sure this is OK too.

This dynamic linking has to be through the interface of the SpinApplication and SpinAppEntry or LoadModule run-time configuration directives of Apache, as provided by mod_spin code or Apache itself. Nothing but dynamic linking of mod_spin applications and Apache third party modules is covered by this exception and you MUST OBEY THE GPL for all mod_spin code.

If you modify mod_spin code, you may extend these exceptions to your version of the file, but you are not obligated to do so.