Welcome to mod_spin for Apache 2.2+ from Rexursive(R)
=====================================================

mod_spin is an Apache module that provides the following functionality (in
conjunction with some other modules):

- a simple template language with data replacement capabilities only
- persistent application and session data tracking
- dynamic linking of applications into Apache as shared libraries
- parameters, cookies and multipart/form-data parsing via libapreq2
- simple API for (kind of) MVC controller functionality
- simple API for pooled (or not) access to SQL databases via APR DBD

mod_spin is written in C, just like Apache itself, and it uses APR, which was
written to be cross platform, secure and to perform well. Generally speaking,
you should see speed improvements when compared to Java, PHP and Perl
solutions, sometimes even by an order of magnitude.

This software exists to enable easy development and deployment of high
performance web applications written in C (or perhaps even other languages) on
Linux (or Unix) systems running Apache. It should be particularly easy to do
that on systems that use the RPM packaging system, such as Fedora, Red Hat
Enterprise Linux, CentOS and similar distributions. Obviously, other types of
packages can be built too, but RPM support is already in mod_spin.

How does mod_spin work?
=======================

mod_spin is a content handler and/or a filter, meaning, for a specified file
extension, mod_spin will read the file (or other input), parse it into an
Abstract Syntax Tree (AST) and then replace the occurrences of references with
values coming from the application (this is where mod_spin is similar to
Velocity).

There is no predefined file extension for mod_spin templates. I sometimes use
".sm" for "spin macro", but you can use whatever you like, as long as you tell
Apache what that is.

The application, a shared library (or .so), is dynamically linked at run-time
(i.e.
when the request is handled by Apache) and its entry functions are called, one
when the library is dynamically linked into Apache, one during the fixup phase
of the request and one during the handler phase. These functions take one
argument, which is a structure containing the context. The context holds the
data to be replaced, parsed parameters, session and application information
and the current request. The application then executes whatever code is
appropriate for the current request, most likely based on the URI and the
parameters (this completely depends on what the application actually does, of
course). This execution results in data structures holding the values that are
to be placed into the template. This data is then placed inside the template
by traversing the AST and replacing references with values. The end result (a
bucket brigade) is given to Apache output filters to push out into the world
(and possibly modify the content as well).

Before the application entry functions are called, mod_spin takes care of
application and session tracking (it relies on cookies for that). It does that
in a persistent manner (i.e. the values associated with the application and
session are stored in an SQL database or XML files). The Apache Portable
Runtime DBD layer is used to provide access to various SQL backends, like
PostgreSQL, MySQL, SQLite2/3, Oracle etc.

What are mod_spin applications?
===============================

They are simply shared libraries. You would normally get those as a result of
writing, compiling and linking a set of C program files. You can, of course,
get those as a result of compiling and linking some other language. Keep in
mind that mod_spin expects its data in a particular way. IF THAT'S NOT
FOLLOWED, MANY THINGS WILL BREAK AND YOU MIGHT CAUSE SECURITY PROBLEMS ON YOUR
SYSTEM.

That being said, mod_spin applications are probably not for someone who isn't
comfortable with application development in C.
If you're looking for a scripting language, mod_spin isn't it. Actually, one
of the main reasons for writing mod_spin was that I wanted full access to the
C Unix API but without the need to hammer (X)HTML out of my code. With
mod_spin you can keep your focus on business logic and forget presentation for
the most part.

See sections "Service function", "Prepare function" and "Init function" for
all the details related to the entry points into the application.

Why not CSP (C Server Pages)?
=============================

C Server Pages are an implementation similar to JSP (Java Server Pages), but
unlike JSP, they feature C language snippets, not Java, placed into HTML. Such
a page is then converted into a C program, compiled and then linked into a
shared library, which is dynamically linked into Apache at run-time. In
essence, it taps directly into the C run-time system, just like mod_spin does.
It is probably faster because it does no template processing.

However, just like JSP, it suffers from similar problems. The first one is a
confusing mix of the C programming language with HTML. This makes it
completely unusable (on the presentation level) by non-experts, even if
trivial changes to the page are required (e.g. a spelling fix can cause
serious functionality problems, even security violations), not to mention that
the mixed code is truly unreadable. The second one is the "translate ->
compile -> link -> dynamic link -> run" process, which creates further
complications and opens up the possibilities for strange run-time errors. And
finally, a full C development environment has to be installed on the system
running CSP, in case any of the pages ever get changed. These arguments are
more or less the same as those made when a template language such as Velocity
is compared to JSP.

mod_spin avoids the above by defining a simple, data replacement only,
template language and leaves all of the programming logic where it belongs -
in the application.
At the same time, the application stays mostly (but not completely) neutral as
far as presentation of data is concerned. It is almost irrelevant what the
output is going to look like, so most of the time programmers are only busy
working on functionality, not looks.

The above arguments, however, don't cut it for everyone (as I have observed in
my encounters with other developers), so if you're one of those, mod_spin is
probably not for you.

Security concerns
=================

Just like anything else written in C, if you aren't careful, you can shoot
yourself in the foot quite effectively. Buffer overflows and similar problems
can, however, be avoided if problematic functions aren't used and good
programming practices are followed. mod_spin makes heavy use of APR, which is
an example of an API that was designed from the ground up to be secure.

Even if you're most careful, security problems can happen. It is therefore
good to follow guidelines for secure Apache setup. In extreme circumstances
(e.g. when you're allowing others to deploy their own applications into Apache
running mod_spin), IT IS ADVISABLE TO RUN A SEPARATE INSTANCE OF APACHE,
BEHIND THE MAIN SERVER, WITH A SOLE PURPOSE OF RUNNING MOD_SPIN APPLICATIONS.
Virtual hosting for multiple clients is one of the examples where such a
scenario might be effective. Applying a chroot jail, SELinux and/or running
different Apache instances under different user IDs on unprivileged ports will
go a long way toward ensuring that even if someone breaks in, the potential
for damage is minimal. Also, technologies like GCC's stack protector, the
kernel's exec-shield and NX bits may prove useful.

Here are some very important security implications that may arise from the use
of mod_spin. This is in relation to connection pools. To understand the issues
involved, one needs to understand how Apache deals with multiple client
connections.
Apache will generally spin up numerous processes or threads in order to handle
multiple connections from clients. There is no guarantee that a process/thread
that handled one client's connection will handle it again in the future. The
process/thread will be assigned at Apache's discretion. So, it is possible,
even likely, that a process that handled something related to one client,
handles another client next time. If there is anything in the memory space of
this process/thread left over from the previous client, it will be completely
accessible to the next client. Applications of mod_spin, shared libraries, are
linked directly into the running process and they have full access to the
memory space of that process.

The above means that an application can fetch any previously opened connection
from the pool of connections and use it at will. Depending on how this
connection was opened in the first place (by the original client), this will
enable reading and/or writing of data that otherwise might not be accessible.
IT IS CLEAR THAT THIS IS A SERIOUS SECURITY IMPLICATION.

Generally speaking, you should make sure that mod_spin applications linked
into an instance of Apache are all from the same "security realm". For
instance, if you're using mod_spin to enable dynamic applications virtually
hosted on a single server (machine) for your customers, allowing two different
customers to deploy mod_spin applications into the same instance of Apache
will allow them to read/write each other's databases. This might be accidental
or, more seriously, intentional and malicious. YOU SHOULD ABSOLUTELY MAKE SURE
THAT SUCH APPLICATIONS ARE DEPLOYED INTO DIFFERENT INSTANCES OF APACHE,
RUNNING UNDER DIFFERENT USER ACCOUNTS!

General recommendation is: if you don't have full control over all
applications and you're using session/application, SQL/XML based, persistent
store and/or connection pools, you should have separate instances of Apache
for each identifiable "security realm".

Stability
=========

If you ever wrote a C program, you know that one of the most dreadful things
is the infamous Segmentation Fault (SIGSEGV, Signal 11). It happens when your
program tries to dereference a memory location that is invalid, such as NULL.

mod_spin makes reasonable effort to ensure that the raw data it handles (the
template, session and application data, parameters, cookies etc.) is processed
in a manner that produces no segfaults. As for the context, the data that your
own application prepares, mod_spin doesn't have any control over what's in
there. It will take certain precautions against obvious stuff like NULL
pointers, but some of the other errors might be complicated to detect and
handle. And because mod_spin is a small and lightweight piece of software, it
doesn't do any of that. It simply relies on you (yes, that's YOU!) to make
sure that the data placed in the context is good.

If the data is not good, the code will segfault, bringing down with it the
child Apache process inside which it was executing. This is not a big concern
from Apache's point of view, as the parent process will fork as many new
processes as it needs - however, your server might suffer a denial of service
attack because of this. So, make your context data good!

Note that the above scenario is only applicable to the prefork Apache MPM.
Other MPM modules might behave in a different way (i.e. more than one thread
of Apache can be affected), so keep that in mind when deploying mod_spin under
those scenarios.

Memory management
=================

Apache Portable Runtime uses memory pools for most memory allocation and
mod_spin naturally follows. It is a good and fast approach. However, some
memory pools may have a rather long life cycle. Although the code of mod_spin
tries to avoid these long lasting pools whenever possible, it is sometimes
unavoidable to have things put into them. Also, template cache and connection
pools will be associated with the process pool (i.e.
objects from the connection pool may be pointing to objects from the process
pool for longer than one request). This can lead, over time and given a huge
number of requests, to small memory leaks.

To avoid this, Apache comes with a configuration directive that helps reduce
such problems. This directive is MaxRequestsPerChild, and it is set to 10,000
by default. You may also want to consider using the MaxMemFree directive,
which forces Apache's pool machinery to release free memory more aggressively.
Making changes to both of these directives may have performance implications,
so test in your scenario before applying.

Language construct overview
===========================

The template language of mod_spin is simple: it has a loop and a few forms of
conditionals. You can see some examples below.

Loop:

  #for(${reference})
    text within the loop and a ${reference.column}
  #end

Conditionals:

  #if(${reference})
    ${reference} is not NULL
  #else
    ${reference} is NULL
  #end

  #if(${reference} == "literal text")
    ${reference} equals text
  #else
    ${reference} doesn't equal text
  #end

  #if(${reference} == ${anotherreference})
    ${reference} equals ${anotherreference}
  #else
    ${reference} doesn't equal ${anotherreference}
  #end

  #if($#{reference} == 3)
    the size of ${reference} equals 3
  #else
    the size of ${reference} doesn't equal 3
  #end

  #if($@{reference.column} % 2 == 0)
    current index of ${reference.column} is divisible by 2
  #else
    current index of ${reference.column} is not divisible by 2
  #end

Conditionals can also be reversed in meaning, by using #unless instead of #if.
All valid #if constructs are possible with #unless as well. Here is an
example:

  #unless(${reference})
    ${reference} is NULL
  #else
    ${reference} is not NULL
  #end

Data types and loops
====================

You can place two different types of data in the context: single and rows.
Singles are simply character strings. They are pointed to by a char* and
limited by the length.
Generally, mod_spin does not rely on '\0' being present at the end of the
string. However, regular C APIs mostly handle strings that have the ending
'\0' character. Therefore, all single data, although being declared as 'size'
in length, actually gets a '\0' character at the end (naturally, the space for
this character is allocated when the single is created). This is very useful
when communicating with regular C APIs, as it saves a lot of copying and
memory allocation. If you design your own functions that create single data,
you MUST FOLLOW THIS CONVENTION OR YOU'RE SETTING YOURSELF UP FOR A WHOLE HEAP
OF BUFFER OVERFLOWS!

Rows are data that looks a lot like something that would be returned from an
SQL query: there are named columns and the data contains a certain number of
rows. However, unlike what's returned by SQL queries (i.e. single pieces of
data), each actual piece of data can again be either rows or single. This then
enables nesting of multiple data dimensions. The nested #for loops are used to
spin around such data. That's where the name mod_spin comes from.

The final data type that is replaced into the template is always single.
mod_spin doesn't know how to replace full rows because the presentation would
be undefined. That's why you have to use #for loops to spin around the rows
data type to place the singles contained there in their correct places inside
the template.

References
==========

They come in three flavours:

  ${reference}  - the value (text, rows) of the reference (regular reference)
  $#{reference} - the size of the reference (size reference)
  $@{reference} - current index of the reference within the loop (index ref.)

The first form is straightforward as it simply takes the value of the
reference and uses that. Used within text, only singles get substituted. Used
in a #for loop, singles get looped around once, rows get the number of loops
equivalent to the number of rows.

The second form takes the size of the reference, which for singles means the
length of the text and for rows the number of rows in each of the columns.
This form doesn't make sense in a #for loop and if placed there it will be
treated as a regular reference.

The third form takes the current index of the reference within a loop, if
applicable. Indexes start at 1. If the index wouldn't make sense (e.g. the
reference is a single, there is no loop etc.), it is treated as zero or as
NULL in conditionals and replaced with "0" if placed within text. This form
doesn't make sense in a #for loop and if placed there it will be treated as a
regular reference.

Text
====

Any text that isn't part of the #for loop or #if/#unless will be literally
copied into the output. The space occupied by #for, #if and #unless will not
be filled with spaces in the output, but removed as if it never existed.
References, which are case sensitive, placed inside the text will be replaced
with their values from the context or nothing if that value is NULL or the
reference does not exist.

References are never recursively substituted (this may create denial of
service or security problems and it is therefore avoided). If such
functionality is desired, it belongs in your application.

Loops
=====

These are quite simple:

  #for(${reference})
    text within the loop and a ${reference.column}
  #end

You can only use regular references to loop around and other types of
references placed in #for will be treated as regular. For instance:

  #for($#{reference})
    some text here
  #end

is the same as:

  #for(${reference})
    some text here
  #end

The #for loop won't spin if the data it is supposed to process is NULL. This
can happen if the appropriate data for the reference cannot be found in the
context, or if the value of it is NULL.

Conditionals
============

The only other command in mod_spin apart from the #for loop is the
conditional, as shown above in the overview. Again, it looks like this, for
the simplest of expressions (i.e.
a reference):

  #if(${reference})
    something if ${reference} is not NULL
  #else
    something if ${reference} is NULL
  #end

or the negative variant:

  #unless(${reference})
    something if ${reference} is NULL
  #else
    something if ${reference} is not NULL
  #end

You can also use:

  #if(${ref}) something #end
  #if(${ref}) something #else#end
  #if(${ref})#else something #end

and naturally:

  #unless(${ref}) something #end
  #unless(${ref}) something #else#end
  #unless(${ref})#else something #end

Expressions valid in conditionals
=================================

An expression placed inside #if or #unless always starts with a reference and
placing anything else on the left is an error (or more explicitly, a parsing
error). Here are all the forms of expressions allowed in conditionals and when
they yield truth:

  #if(${ref})               - ref exists and is not NULL
  #if($#{ref})              - the size of ref is greater than zero
  #if($@{ref})              - the current index of ref is greater than zero
  #if(${ref} =~ /regex/)    - ref matches Perl compatible regular expression regex
  #if($#{ref} =~ /regex/)   - size of ref, as string, matches regex
  #if($@{ref} =~ /regex/)   - index of ref, as string, matches regex
  #if(${ref} == "str")      - ref is the same as literal string str
  #if($#{ref} == "str")     - size of ref, as string, is the same as string str
  #if($@{ref} == "str")     - index of ref, as string, is the same as string str
  #if(${ref} == num)        - ref, as number, equals num (integer >= 0)
  #if($#{ref} == num)       - size of ref is num
  #if($@{ref} == num)       - index of ref is num
  #if(${ref1} == ${ref2})   - ref1 is equal to ref2, as strings
  #if(${ref1} == $#{ref2})  - ref1 is equal to size of ref2, as strings
  #if(${ref1} == $@{ref2})  - ref1 is equal to index of ref2, as strings
  #if($#{ref1} == ${ref2})  - size of ref1 is equal to ref2, as numbers
  #if($#{ref1} == $#{ref2}) - size of ref1 is equal to size of ref2
  #if($#{ref1} == $@{ref2}) - size of ref1 is equal to index of ref2
  #if($@{ref1} == ${ref2})  - index of ref1 is equal to ref2, as numbers
  #if($@{ref1} == $#{ref2}) - index of ref1 is equal to size of ref2
  #if($@{ref1} == $@{ref2}) - index of ref1 is equal to index of ref2
  #if(${ref} % mod == num)  - ref, as number, modulo mod (integer > 0) is num
  #if($#{ref} % mod == num) - size of ref modulo mod is num
  #if($@{ref} % mod == num) - index of ref modulo mod is num

Regular expression matches, integer comparisons, literal string, reference to
reference and modulo expression comparisons don't work for regular references
that are not singles and they will always yield false. No pointer comparisons
are ever done, so attempting to compare rows type data will always fail. Of
course, none of this can be determined at parse time, only at runtime.

As you can see, in expressions with a reference on the right, the reference on
the left is the "master" and it determines the type of comparison done in the
expression. For regular references, this is a string comparison (i.e. text is
compared, not pointers), for sizes and indexes, it is a numerical comparison.

Numbers, except mod in the modulo expression, are all integers greater than or
equal to zero. Literal strings are double quoted (e.g. "a string"). To escape
the double quote itself, use "a string with a \" in it". Regular expressions
are specified within slashes (e.g. /^begin.*$/). To escape the slash, use a
backslash before it: /^begin\/.*$/.

Note that mod_spin isn't aware of any character encodings and from its
perspective bytes are characters. If you need to make comparisons that take
into account character encodings, you will have to do that inside your
applications (for now).

Indexes for singles and rows outside a relevant loop are assumed to be zero.
If the reference on the left doesn't exist or is NULL, the whole expression
will always evaluate to false and the bit after #else (if any) will end up
being processed. Regular references are converted to numbers using the atol()
function, so strings that don't start with numbers turn out as zero.
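
As a rough illustration of the numeric rules above, here is a minimal C sketch
of how the text of a regular reference could be turned into a number and used
in the "== num" and "% mod == num" forms. Only atol() comes from the
description above; the helper names (ref_as_number, number_equals,
modulo_matches) are hypothetical and are not part of the mod_spin API. The
sketch also assumes the text is '\0' terminated, as singles are.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: number value of a regular reference's text.
 * atol() stops at the first non-digit, so "42abc" gives 42 and "abc"
 * gives 0. A NULL reference is treated as zero here (an assumption). */
static long ref_as_number(const char *text)
{
    return text != NULL ? atol(text) : 0;
}

/* Hypothetical helper for "#if(${ref} == num)". A missing or NULL
 * reference on the left makes the whole expression false. */
static int number_equals(const char *text, long num)
{
    if (text == NULL)
        return 0;
    return ref_as_number(text) == num;
}

/* Hypothetical helper for "#if(${ref} % mod == num)". mod has to be an
 * integer greater than zero; anything else is treated as false here. */
static int modulo_matches(const char *text, long mod, long num)
{
    if (text == NULL || mod <= 0)
        return 0;
    return ref_as_number(text) % mod == num;
}
```

With these helpers, number_equals("007", 7) is true, while
modulo_matches("abc", 2, 1) is false because "abc" converts to zero.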

Loops, conditionals and impossible references
=============================================

Normally, a mod_spin template will look something like this:

  First text ${ref1}
  #for(${ref2})
    replicate some other text and ${ref2.col1}
  #end

The ${ref1} will be replaced with the value found in the context, if the data
it points to is a single, or nothing at all if the data it points to is of
type rows. The #for loop will spin around ${ref2} and replicate the enclosed
text for all instances of data that ${ref2} points to. The ${ref2.col1} will
be replaced with the current row value of the column "col1", if ${ref2}
happens to be of data type rows and ${ref2.col1} resolves to a data type
single for the current row.

Now let's examine an example where impossible references are used:

  First text ${impossible.reference}
  #for(${second.impossible.reference})
    replicate some other text and ${second.impossible.reference.column}
  #end

The first reference, ${impossible.reference}, can never be found in the
context because there is no #for loop to spin the data in ${impossible} in
order to find ${impossible.reference}. So, when creating the AST, mod_spin
will simply ignore this reference.

The reference used to spin the #for loop, ${second.impossible.reference}, is
also something that cannot exist, so mod_spin will ignore the whole loop and
never place any of it into the AST.

Note that this is different from the first code snippet with, for instance,
the value of ${ref2} being NULL, or not existing at all. The parsed #for loop
and the text it encloses will be placed into the AST, but it won't be
replicated because there is no data to spin the loop around.

The above discussion applies to conditional statements as well. For instance:

  First text ${impossible.reference}
  #if(${second.impossible.reference})
    replicate some other text and ${second.impossible.reference.column}
  #end

The above example would yield exactly the same output as the previous example
with the #for loop.
However, if you use the #else, then whatever is placed within it will be used.
For instance:

  First text ${impossible.reference}
  #if(${second.impossible.reference})
    replicate some other text and ${second.impossible.reference.column}
  #else
    this will always be in the output
  #end

In the above example, the text placed between #else and #end will always be in
the output, because the #if would never be true, given that the reference is
impossible.

Similarly, other conditional forms behave as expected, because they always
have a reference on the left hand side. For instance:

  #if($#{impossible.reference} == 3)
    this will never be visible
  #else
    this will always be visible
  #end

And here is one example for the #unless, the negative conditional:

  #unless(${second.impossible.reference})
    replicate some other text and ${second.impossible.reference.column}
  #end

The above will always end up in the output, because the reference is
impossible.

References with explicit indexes
================================

It is possible to use references that have an explicit index within a #for
loop. It is done like this:

  #for(${abc})
    Reference with implicit index: ${abc.xyz}
    Reference with explicit index: ${abc.xyz[3]}
  #end

The second line within the loop above would always display the third row
(provided the data there is single) of the "xyz" column in "abc" rows.

Such references can be used anywhere where they can be successfully resolved.
For instance, this should also work (i.e. produce output):

  Outside any loop: ${abc.xyz[3].x[2]}

Provided, of course, that abc is rows containing column xyz that has at least
three rows, containing column x, which has at least two rows and the second
element in that column is a single.

Service function
================

Service function is the main entry function into your application.
It is called in the handler phase of request processing and BEFORE template
processing, so it has the potential to change which template is going to be
processed (only when used as a handler) as well as to decline or do the
processing completely.

If SpinApplication isn't specified, the shared library will not be loaded at
all, but the template will be processed as normal. However, there will be no
data in the context and therefore none will be placed in the final output.

The entry function (by default called rxv_spin_service()) takes one argument -
the context. It returns an integer which is similar to what an Apache handler
would return. Please note that when using mod_spin as a filter, the only valid
return code is OK. The meaning of return codes is as follows:

OK:
  Everything was OK, commit application/session store and continue with
  template processing. Note here that by manipulating the filename field
  within the request_rec structure, you can change which template is to be
  processed. Make sure other fields (e.g. finfo) that are related to filename
  are properly updated as well.

HTTP_ERROR (e.g. HTTP_INTERNAL_SERVER_ERROR):
  Stop all processing and give control back to Apache request processing.
  Don't commit application/session store, since there was an error in
  processing.

ANYTHING ELSE:
  Commit application/session store and give control back to Apache request
  processing without processing the template.

Some examples of ANYTHING ELSE would be:

REDIRECT (e.g. HTTP_TEMPORARY_REDIRECT):
  Any further processing should not be done as this request is going to be
  externally redirected. Note here that the application HAS TO set the
  "Location" header in headers_out. Failing that, the client will have
  problems.

DECLINED or DONE:
  The service function either decided it's not something this handler should
  do (DECLINED) or has done all the work on behalf of it (DONE). These are
  short-circuit return codes that will greatly affect Apache request
  processing, so be careful with them.
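
The return code rules above can be summarised in a small, self-contained C
sketch. It is not the mod_spin implementation: the spin_outcome type and the
classify_service_result() function are hypothetical names used only for
illustration, the constants are redefined locally with the values they have in
Apache's httpd.h so the sketch compiles without Apache headers, and treating
"HTTP_ERROR" as any status from 400 to 599 is an assumption.

```c
/* Values as in Apache's httpd.h, repeated here so the sketch stands alone. */
#define OK                           0
#define DECLINED                    -1
#define DONE                        -2
#define HTTP_TEMPORARY_REDIRECT    307
#define HTTP_INTERNAL_SERVER_ERROR 500

/* Hypothetical summary of what happens after the service function returns. */
typedef struct {
    int commit_store;     /* commit application/session store? */
    int process_template; /* continue with template processing? */
} spin_outcome;

/* Hypothetical helper mapping a service function return code to an outcome. */
static spin_outcome classify_service_result(int rc)
{
    spin_outcome out = { 0, 0 };

    if (rc == OK) {
        /* Everything was OK: commit and process the template. */
        out.commit_store = 1;
        out.process_template = 1;
    } else if (rc >= 400 && rc < 600) {
        /* Assumed meaning of HTTP_ERROR: no commit, no template. */
    } else {
        /* ANYTHING ELSE (redirects, DECLINED, DONE): commit only. */
        out.commit_store = 1;
    }
    return out;
}
```

For example, classify_service_result(HTTP_TEMPORARY_REDIRECT) reports that the
store is committed but the template is skipped, which matches the REDIRECT
case described above.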

Prepare function
================

There is an extra hook that is called before the request is handled, in the
fixup phase of Apache request processing. By default, this function is called
rxv_spin_prepare() and it takes one argument - the context. It is called if it
exists in the shared library.

The main purpose of the hook is to allow application writers to insert their
code before the actual request processing. This comes in handy, especially if
you want some code executed for URIs that aren't handled by the mod_spin
handler, but fall under the application umbrella. For instance, you can have
authentication code in this function, thus using a mod_spin application to
regulate access to all URIs of a configured application (see spin_app for an
example).

This function is not called for sub-requests. It is also not called unless an
application is configured for that particular request (i.e. per virtual host,
directory, location etc.).

Please note that at the point of call of this function, request parameters
have not been parsed yet by libapreq2. Meaning, although you do get the
context, you are getting only some of the information that is normally
available to the rxv_spin_service() function. This was designed on purpose,
since some of the requests that pass through here may not be handled by the
mod_spin handler at all. In other words, we are not consuming the body of the
requests here.

The function returns an integer, which can be OK, DECLINED, DONE or any other
HTTP_code. It will be returned back to the caller (i.e. Apache hook machinery)
directly from the fixup hook.

Init function
=============

This function, if it exists, will be called when the application is
dynamically linked into Apache. It is used for process specific
initialisation. By default, this function is called rxv_spin_init(). It takes
the context as an argument, but it doesn't return any values.

The code of mod_spin will allow only a single thread to execute this function
at a time.
However, if the function is such that it should be called only once per
process, the function itself will have to make sure its code isn't executed
more than once.

You can attach pool cleanup function(s) to the process specific pool if there
is a need to clean up after the init function. This should be done in the init
function itself.

Loading of applications
=======================

All loading of shared libraries is done using the apr_dso_load() call from
APR, so mod_spin makes sure that each process keeps a cache of loaded DSOs and
that it opens a particular library only once. This is done to avoid
registration of a pool cleanup for each call to apr_dso_load(), which would
quickly grow the process pool.

Once the new applications are deployed, in order to reliably reload them, the
main Apache process has to be gracefully restarted, so that all child
processes die and new ones replace them. This will ensure new applications are
loaded across the board.

Maximum nesting depth
=====================

The combined nesting depth of #for and #if/#unless commands is limited to
RXV_SPIN_MAX_DEPTH, as defined in private.h, which is currently 32. Why have
such a limit and why is the limit so low?

The limit is there to make the code of mod_spin simple and fast and to avoid
logic errors in templates caused by inadvertent use of deep nesting. The limit
is low because templates that require nesting depth anywhere near this limit
are doing something very, very wrong. However, if you find that this is not
adequate for you, for whatever reason, feel free to modify private.h and
recompile.

Presentation issues and the application
=======================================

You'll find that some of the presentation level decisions will be made within
your application as well (huh?). When given the choice of placing some
presentation level logic into the application as compared to contaminating the
template with business logic, I have chosen to go with the former.
A cleverly designed application will have a separate part that turns data
generated by business logic into a presentation friendly format. For instance,
when (X)HTML pages are created, some characters, like ``"'' and ``&'' have
special meaning. Business logic won't bother itself with making sure those are
escaped. However, the part of your application that makes sure presentation is
nice, will.

Another example is a list of items on the page that should have rows displayed
in alternating colours (this particular problem can be solved by a newer
version of CSS, but the browsers that support that are still not in widespread
use). Business logic, again, won't bother itself with that.

How do I include other templates?
=================================

I find that duplicating functionality is not a good thing. So, I tried to stay
away from that. Apache already has mod_include, which can be used as a filter
or a handler and provides excellent support for inclusion of other files.

If the files that you're including are not dynamic (at least not very
dynamic), you should even consider generating finished files beforehand, using
some of the available replacement techniques, such as XSLT. This will be good
for the performance of your web server. It is worth an effort to reduce what's
dynamic to a minimum.

Session and application tracking
================================

mod_spin relies on mod_unique_id to provide a unique session identifier by
producing an MD5 hash of it. It then produces an HMAC MD5 of the hash, using
the crypto salt (key). Both of these (unique id hash and the HMAC) are then
served to the client in a cookie, usually called SpinSession. Only if both of
these are returned back to the server correctly, will mod_spin use this unique
id as the session id. Otherwise, the session simply won't exist. This should
make both guessing of session identifiers and denial of service attacks caused
by opening of fake sessions significantly more difficult.
You must define the SpinCookie configuration parameter, or sessions won't be supported for that application at all. Each session will have a corresponding record in the table specified by the SpinStoreTable configuration directive, in the SQL database you point to using the SpinStore configuration directive. The application will have a record identified with "__application" in the same table as well. Through a simple API, you can get values for each key, either on the application level (i.e. shared among multiple sessions) or session level (i.e. private session data).

If you configure the file based store backend, the application will store its data in the "__application" file located in the directory specified in the SpinStore directive. Each session will have its own data stored in the same directory, in a file named after the session ID. The format of these files and of the records in the SQL database is XML.

The maintenance of stale sessions is easy. Simply define a cron job that goes around and deletes from the table whatever is older than you like. The SpinTimeout configuration directive (if defined above zero) won't actually delete any records from the table - the parameter is used to determine when the data of the session is too old and should therefore be ignored. You need to have an outside job for cleaning records that are very old. Ditto for the file based backend store, except that session files (instead of SQL records) that haven't been accessed for a certain amount of time should be removed.

Although the basic concepts have been pinched from the JSP/Servlet world, applications have a slightly different meaning in mod_spin. Basically, whatever uses the same application database file falls under the "same application" umbrella. You can configure SpinStore per server, virtual host, directory or location. So, applications can cross boundaries freely. Sessions follow the same rule, so you can have multiple sets of session private data for different definitions of SpinStore.
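
The SpinTimeout rule described above (old data is ignored, not deleted) can be sketched as a small predicate. This is only an illustration: session_is_stale() is a hypothetical helper and not part of the mod_spin API, and it assumes that the timeout is expressed in seconds and that the time of last access is known to the caller.

```c
#include <time.h>

/*
 * Hypothetical helper, not part of mod_spin.
 * Returns 1 if session data should be ignored as too old, 0 otherwise.
 * Assumption: timeout_secs is in seconds; zero or less means "no timeout".
 */
static int session_is_stale(time_t last_access, time_t now, long timeout_secs)
{
    if (timeout_secs <= 0)
        return 0;   /* timeout not defined above zero: nothing is ignored */

    /* The record stays in the store; it is merely treated as expired. */
    return difftime(now, last_access) > (double)timeout_secs;
}
```

An external cleanup job would apply a similar comparison, most likely with a much larger age, before actually deleting records or files.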
Application configuration
=========================

Each application can (but doesn't have to) have a configuration file. The filename is specified via the SpinAppConfig run-time configuration directive. The file is regular XML and it looks like this:

]>

The value associated with spinparameter1

The value associated with spinparameter2

It is preferred to include the DTD in the document (it is only small) in order to avoid parsing problems. The configuration is loaded and reloaded automatically by mod_spin. Once the configuration is parsed, the keys and values of the

tags are placed into the application's store. On each new request, if the last parsing was more than ten seconds ago, the configuration file is checked for modification. If the configuration file is newer, it is parsed again and the keys and values are placed into the application store. New values will overwrite existing values associated with the same keys, but other key/value pairs in the application store will not be changed.

Authentication
==============

Apache provides enough authentication mechanisms that there is no need to duplicate this functionality in mod_spin. And because Apache's request_rec structure contains all environment variables, the information about the user using the resource is always available to your applications. At this point in time, I did not feel that keeping user data similar to session and application data was necessary. Things like that mostly belong in the application. However, you can wrap Apache authentication with the spin_auth application and a small amount of your own code. See the spin_auth and spin_app applications for all details.

Connection pools
================

mod_spin has a simple API for accessing SQL relational databases. In order to improve performance of connecting to database (and other) servers, mod_spin uses the popular pool approach. Each connection is identified by the connection string, which is specified when the database is opened. mod_spin creates a hash table keyed by those identifiers and stores connection pools, which are database specific, as values in this table. Any subsequent attempt to open a connection to a database of the same type and with the same connection string (the keys are case sensitive) will reuse an existing connection from the pool. This can dramatically improve performance of applications that frequently use (database) connections.

Each Apache process will have its own pool of connections (see also the thread safety discussion that follows). While this is good for performance, it has downsides.
With every process having its own private connections to the back-end server, the total number of connections can be rather big. Each connection takes memory, CPU cycles and sockets for communication, which, depending on the number of connections, might not be negligible. This alone can overwhelm the machine and can ultimately result in denial of service. That is another reason why it may be a good idea to run a separate instance of Apache for heavily loaded applications. Luckily, Apache is fast to start and it doesn't consume a lot of memory (in today's terms), so you can have many instances of it running at once.

The SpinConnPool configuration directive enables system administrators to control pooling of connections. This can be useful if you find that a particular application is causing a large number of connections to be kept open.

The SpinConnCount configuration directive enables system administrators to control the number of connections in the pool. By default, up to 5 connections will be kept in the per-process connection pool, for each connection identifier.

You can register any type of connection with the connection pool. As long as it has a unique connection string, you should be fine. This can then be used for any type of connection you'd like to keep hanging around for the lifetime of the thread. LDAP and similar services come to mind first.

Of course, this kind of simple database API will not be everyone's cup of tea. There are very nice alternatives (SQL Relay comes to mind first) that have solved all of these problems (in a slightly different manner) and more. Also, some people prefer to program against truly cross platform solutions like ODBC. Feel free to completely ignore mod_spin's database API.

Thread safety
=============

Some Apache MPMs (Multi-Processing Modules), e.g. worker, spin off multiple threads of execution, under which mod_spin should be OK.
However, if you spin off your own threads and want to use mod_spin structures across threads, you MUST SYNCHRONISE access, or you will experience problems.
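
As a rough illustration of what "synchronise access" means here, the sketch below guards a shared counter with a mutex. It uses plain POSIX threads to stay self-contained; inside an Apache module you would more likely reach for APR's apr_thread_mutex_lock() and apr_thread_mutex_unlock(). The shared_state structure and the run_workers() helper are hypothetical stand-ins for whatever your application shares between its own threads, not mod_spin structures.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

/* Hypothetical state shared by threads the application starts itself. */
struct shared_state {
    pthread_mutex_t lock;
    long hits;
};

struct worker_arg {
    struct shared_state *state;
    int iterations;
};

/* Each worker updates the shared counter only while holding the lock. */
static void *worker(void *p)
{
    struct worker_arg *arg = p;

    for (int i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(&arg->state->lock);
        arg->state->hits++;
        pthread_mutex_unlock(&arg->state->lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter, or -1 on error. */
static long run_workers(int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    struct shared_state state;
    struct worker_arg arg = { &state, iterations };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    if (pthread_mutex_init(&state.lock, NULL) != 0)
        return -1;
    state.hits = 0;

    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&state.lock);
    return started == nthreads ? state.hits : -1;
}
```

Without the lock/unlock pair around the increment, concurrent updates could be lost and the final count would be unpredictable.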