Courant News: Email Engine


07.19.09 Posted in Courant News by Max

As I work on the last pieces needed to launch the YDN on Courant next month, I finally got around to implementing the email engine for Courant and the YDN site.

Goals/Requirements

There were a few basic requirements driven by the volume of email the YDN handles every day (the subscriber list is on the order of several thousand addresses):

  1. Process the email queue asynchronously, outside the web server process. Sending thousands of emails takes a long time, and we don’t want to tie up our web server while that happens.
  2. When administrators/editors submit new mass email jobs (e.g., daily headlines), they should not have to wait for the server to process the request. The page should load quickly, and the processing should happen asynchronously (out of band).
  3. Since emails could be customized per recipient, avoid relying on the Bcc: field. Essentially, each user gets their own email object in the system.

Implementation

For the email engine itself, I started with the django-mailer app as my foundation. It stores outgoing emails in a queue in the database, and a management command, run by cron every N minutes, processes the queued emails. I also pulled in patches from various forks to reuse the SMTP connection, streamline creation of combined HTML + text emails, and limit the number of emails the management command processes in a single run.
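The queue-draining step can be illustrated with a minimal, framework-free sketch. This is not django-mailer’s code: `QueuedMessage` and `send_batch` are hypothetical names, and an in-memory list stands in for the database table. It shows the two patches described above, one SMTP connection reused for the whole run and a cap on how many messages a single cron invocation handles. `connect` is assumed to be any zero-argument callable returning an `smtplib.SMTP`-like object, for example `lambda: smtplib.SMTP("localhost")` if a local mail server is available.

```python
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Callable, List


@dataclass(eq=False)  # compare by identity so list.remove() drops exactly this item
class QueuedMessage:
    """Hypothetical stand-in for one row in the database mail queue."""
    msg: EmailMessage
    deferred: bool = False


def send_batch(queue: List[QueuedMessage],
               connect: Callable[[], smtplib.SMTP],
               limit: int = 100) -> int:
    """Send up to `limit` non-deferred messages over a single SMTP connection.

    Sent messages are removed from `queue`; failures are marked deferred and
    left in place for a later retry pass. Returns the number sent.
    """
    batch = [item for item in queue if not item.deferred][:limit]
    if not batch:
        return 0  # nothing to do: no SMTP connection is opened

    sent = 0
    conn = connect()  # one connection reused for the whole batch
    try:
        for item in batch:
            try:
                conn.send_message(item.msg)
            except smtplib.SMTPException:
                item.deferred = True  # keep it queued for a retry pass
            else:
                queue.remove(item)
                sent += 1
    finally:
        conn.quit()
    return sent
```

Because an empty queue returns before any connection is opened, running such a command every minute costs almost nothing when there is nothing to send.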

On the interface side, I created a Django admin action for both issues and articles that renders the templates for the text and HTML versions and then creates a django-mailer Message instance for each subscriber. I quickly discovered that this only meets the above requirements for very small subscriber lists. With 5,000 subscribers, it took about 45 seconds on my development VM to generate all the emails and save them in the database; meanwhile, the browser sat waiting for a response and could time out.
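A rough, standard-library-only sketch of the per-subscriber work the admin action performs. `string.Template` stands in for Django’s template engine, and `build_messages`, the template text, and the `(name, address)` subscriber tuples are all hypothetical. Each subscriber gets a separate `multipart/alternative` message with a plain-text part and an HTML part, which is what requirement #3 asks for (no Bcc:).

```python
import html
from email.message import EmailMessage
from string import Template
from typing import Iterable, List, Tuple

# string.Template stands in for Django's template engine in this sketch.
TEXT_TEMPLATE = Template("Hi $name,\n\nToday's headlines: $headlines\n")
HTML_TEMPLATE = Template("<p>Hi $name,</p>\n<p>Today's headlines: $headlines</p>\n")


def build_messages(subject: str, from_addr: str,
                   subscribers: Iterable[Tuple[str, str]],
                   headlines: str) -> List[EmailMessage]:
    """Render one text+HTML email per (name, address) subscriber."""
    messages = []
    for name, address in subscribers:
        context = {"name": name, "headlines": headlines}
        msg = EmailMessage()
        msg["Subject"] = subject
        msg["From"] = from_addr
        msg["To"] = address  # exactly one recipient per message, no Bcc:
        msg.set_content(TEXT_TEMPLATE.substitute(context))
        # Escape values for the HTML part, roughly what Django's autoescaping does.
        html_context = {k: html.escape(v) for k, v in context.items()}
        msg.add_alternative(HTML_TEMPLATE.substitute(html_context), subtype="html")
        messages.append(msg)
    return messages
```

Running this loop once per subscriber inside the web request is the kind of work that took about 45 seconds for 5,000 subscribers, which is why it needs to move out of the request.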

This clearly didn’t meet requirement #2, so I added another step to the process. When the admin action executes, it now renders the email contents and stores them in a new model, MessageJob, which has a simple TextField containing a delimited list of recipient addresses. I then created a new management command that, when called by cron, expands each MessageJob into the individual Messages.
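The split between the cheap in-request step and the expensive cron step can be sketched in plain Python. The dataclasses below only imitate the models (the real ones would be Django models), the field and function names are guesses, and the newline delimiter is an assumption, since the post only says “delimited list”.

```python
from dataclasses import dataclass
from typing import List

DELIMITER = "\n"  # assumption: the post does not specify the delimiter


@dataclass
class MessageJob:
    subject: str
    from_address: str
    text_body: str
    html_body: str
    recipients_text: str  # stands in for the TextField of delimited addresses
    processed: bool = False

    @property
    def recipients(self) -> List[str]:
        return [a.strip() for a in self.recipients_text.split(DELIMITER) if a.strip()]


@dataclass
class PendingMessage:
    """Hypothetical stand-in for one django-mailer Message row."""
    to: str
    subject: str
    from_address: str
    text_body: str
    html_body: str


def create_job(subject: str, from_address: str, text_body: str,
               html_body: str, addresses: List[str]) -> MessageJob:
    """Cheap step run inside the admin request: one row, no per-recipient work."""
    return MessageJob(subject, from_address, text_body, html_body,
                      DELIMITER.join(addresses))


def process_mail_jobs(jobs: List[MessageJob], queue: List[PendingMessage]) -> int:
    """Expensive step run later by cron: fan each unprocessed job out into messages."""
    created = 0
    for job in jobs:
        if job.processed:
            continue
        for address in job.recipients:
            queue.append(PendingMessage(address, job.subject, job.from_address,
                                        job.text_body, job.html_body))
            created += 1
        job.processed = True  # so the next cron run skips this job
    return created
```

In a real database, creating a job’s messages and marking it processed would ideally happen in one transaction, so a crash halfway through cannot cause the next cron run to expand the same job twice.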

So now the execution flow looks like this (see images at end of post):

  1. The editor checks the box next to an issue, selects the “Send email update” admin action, and clicks “Go” (submitting the action form).
  2. A MessageJob is generated containing the text and HTML versions of the email, the subject line, the from address, and the delimited list of recipients. The page loads for the editor and displays a message confirming that the job has been queued.
  3. Shortly afterwards, the system cron calls “manage.py process_mail_jobs”, which pulls out any unprocessed MessageJobs and creates a new Message instance for each recipient.
  4. Shortly afterwards, the system cron calls “manage.py send_mail”, which attempts to send all the Messages queued in the database. Any that fail to send are retried when cron calls “manage.py retry_deferred” (standard django-mailer behavior).
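Step 4’s defer-and-retry cycle amounts to a small state machine. The sketch below is hypothetical: the `MailItem` class, the status strings, and the `MAX_RETRIES` cap are illustrative additions, not necessarily how django-mailer’s `retry_deferred` command behaves. It simply moves deferred messages back into the send queue so the next `send_mail` run picks them up.

```python
from dataclasses import dataclass
from typing import List

MAX_RETRIES = 5  # hypothetical cap; the post does not specify one


@dataclass
class MailItem:
    to: str
    status: str = "queued"  # "queued" | "deferred" | "failed"
    retries: int = 0


def retry_deferred(items: List[MailItem]) -> int:
    """Requeue deferred messages; give up on ones that exhausted their retries."""
    requeued = 0
    for item in items:
        if item.status != "deferred":
            continue
        if item.retries >= MAX_RETRIES:
            item.status = "failed"  # stop retrying instead of looping forever
        else:
            item.status = "queued"  # the next send run will try it again
            item.retries += 1
            requeued += 1
    return requeued
```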

Those three cron jobs run every minute (configurable if you want them to run less often, though there is almost no penalty for calling them when the queues are empty), either via entries you set up manually in your user crontab or via an app like django-chronograph (which will come preconfigured with Courant News).

Conclusion

So the modified django-mailer system, integrated with the admin actions, meets all of the requirements we defined for the email engine. The next step will be to implement the email subscription spec that Rob and I wrote back in December, though that work may be deferred to version 2.

I have not yet done extensive performance tests comparing a local mail server (e.g., Exim) with an external service (e.g., Gmail), but that’s something I’ll be exploring in the not-too-distant future. Even if sending takes a long time, users of the website and admin should not notice, and system performance should stay well within acceptable bounds. User registration, password reset, and other such emails will get higher priority and jump to the front of the queue, so even while a large subscription mailing is being processed, those critical messages will still go out (almost) immediately.
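The “jump to the front of the queue” behavior comes down to ordering by priority first and queue order second. Here is a minimal in-memory sketch using `heapq`; the `MailQueue` class and the numeric `HIGH`/`MEDIUM`/`LOW` levels are hypothetical (django-mailer has its own notion of message priority, which this does not reproduce).

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical priority levels: a lower number is sent first.
HIGH, MEDIUM, LOW = 1, 2, 3


class MailQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def enqueue(self, message_id: str, priority: int = MEDIUM) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), message_id))

    def next_batch(self, limit: int) -> List[str]:
        """Pop up to `limit` message ids, highest priority first."""
        batch = []
        while self._heap and len(batch) < limit:
            batch.append(heapq.heappop(self._heap)[2])
        return batch
```

For example, a password-reset email enqueued with `HIGH` after three `LOW` newsletter messages still comes out first. In a database-backed queue, the same effect is usually an ORDER BY on priority and creation time rather than an in-memory heap.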

I’m holding off on checking this work into the public Courant News repository because I currently have a mess of Nando stuff interleaved with it, and I need to sort that out first. I also want to add some forms and views for creating email subscriptions on the public website, so I’ll probably push a check-in once that is complete.



3 Responses to “Courant News: Email Engine”

  1. Neal Poole says:

    Looking good! :)

    On a slightly unrelated note, it looks like the Trac wiki for Courant News is getting spammed: the timeline is full of spam wiki pages.

  2. Max says:

    Thanks for the heads-up, Neal. I just went and cleaned up the wiki a bit. I’ll monitor it more closely for spam in the future.

  3. […] Courant News included new email and analytics tracking systems, which allowed us to push breaking news updates to our email […]
