Recently at Quoin, we undertook a project with a client to redesign and add community building functionality to an existing high traffic website. The client’s major requirements were for users to be able to create accounts and rate and review content in various ways. Users can then search and view this user-generated information via numerous JavaScript widgets. With the expectation of thousands of users interacting with the site’s content, the database server would need to process a large amount of data quickly in order to present that data to users.
As developers, there are some interesting problems that we need to solve in order to fulfill the client’s requests while building an application capable of handling the high volume of traffic. Time-intensive SQL queries are required for the database server to sort and filter that content. With many users accessing the site simultaneously, the database server could quickly become overwhelmed. Ensuring that all database queries are optimized and efficient is essential; this can only do so much however. Another approach is to try and reduce the total number of queries made, especially when those queries are particularly complex and resource intensive.
Certain calls to the database are unavoidable, logging in a user or recording a users rating of content for example. These queries are generally unique to the specific user. However, other queries are the same or very similar for all users, such as determining how many times certain content has been rated in a given time period. These are the queries we want to avoid.
Rather than querying the database each time a user makes one of these general
requests, we can use Quartz
to schedule those queries to run as a cronjob. The cronjob runs periodically,
writing the results to files accessible by the web server. The user then
requests that file rather than making a call to the database.
An important design decision developer’s must make is the file format used to
store the data; we have used JSON
extensively for this client’s site. JSON is a simple notation that can easily
be parsed and generated; JSON data can also be used to directly instantiate
objects in JavaScript. This is incredibly useful when writing JavaScript
widgets in a framework such as Dojo.
These widgets make asynchronous calls to request the appropriate JSON data
file. The client side widgets further sort and filter this data and ultimately
use it to populate the site with meaningful information.
These data files can be cached locally or in content distribution networks, possibly for long periods of time depending on the caching techniques used. In the best-case scenario, what was originally a query to the database server can be replaced with a locally cached file containing the necessary information
A second benefit of this approach is scalability. We launch our server application via Spring. This allows us to adjust the frequency of the Quartz scheduled cronjobs by modifying a simple xml configuration file. If the load on the database server becomes too high, then the site managers can reduce the frequency of intensive queries. If the site updates its infrastructure, allowing greater load on the database server, the managers can increases the frequency of these queries.
There is a drawback to this approach however - the data is not real time. Thus, any query that must give real time results cannot be replaced with this technique. However, if the query determines the number of times some content has been rated in the last year, the fact that the data is a few minutes old is probably not important.