Author Archives: david

A parable about problem solving in software development

I’ve told a lot of people this story over the years. Mostly whilst drunk. The responses are usually pretty similar – hilarity, incredulity and just a little bit of “There but for the grace of god go I” style sympathy. I’ve had multiple requests to write it up, so I’ve finally acquiesced.

It’s a parable about what happens when you’re always solving the problem right in front of you rather than questioning whether this is a problem you actually need to be solving. It’s unfortunately mostly true. I’ve anonymized and fictionalized bits of it, mostly to protect the innocent and the guilty (and occasionally to make it a better story), but 90% of what I describe in the following really happened, and 90% of that happened more or less as I described it (as best as I can remember). If you know me, you can probably figure out where it happened. If you don’t know me, I’m not going to tell you.

At the beginning of our story we had one central API off which maybe a dozen smaller apps and services hung. The API controlled our data storage, the operations you could reasonably perform on it, and generally encapsulated our model. It, and each of the apps, lived in their own source control repo and were deployed separately.

This API was implemented via JSON-RPC over HTTP. It wasn’t RESTful, but maybe it was a bit RESTy. RESTish perhaps.

It kinda worked. It wasn’t perfect, but it was at least vaguely functional.

We essentially had two problems with it:

  1. Each of the apps talking to it had written its own client library (or was just including raw HTTP calls straight in the code)
  2. It was quite slow

As well as the core API, we also had a message queuing system. It was pretty good. We didn’t use it for a lot – just some job queueing and notifications to send to the users – but it worked well for that. We’d had a few problems with the client libraries, but they were easy to fix.

At some point it occurred to one of us that the reason our HTTP API was slow was of course that HTTP was slow. So clearly the best solution was to replace our slow shitty HTTP RPC with our hot new message queue based RPC. What could go wrong!

Well, you know, it didn’t really go wrong. It mostly worked. It was… a bit strange, but it basically worked. We wrote an event-driven server which implemented most of what we were doing with the HTTP API (including all the blocking calls we were making to our ORM. Oops). It polled a message queue; clients would create their own message queue to receive responses on. Then a client would post a message to the server, which would reply on the client’s message queue (I think there were some tags added to the messages to make sure things lined up. I hope there were, otherwise this all sounds horribly precarious).
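
To make the shape of that a bit more concrete, here’s roughly what the client end of RPC over a message queue looks like. This is a from-memory sketch rather than our actual code – the broker object, its queue API and the message format are all invented for illustration:

require "json"
require "securerandom"

# A minimal RPC-over-message-queue client. `broker` stands in for whatever
# message queue library you happen to be using; its API here is hypothetical.
class MessageQueueRpcClient
  def initialize(broker)
    @requests = broker.queue("api.requests")            # shared queue the server polls
    @replies  = broker.queue("replies.#{Process.pid}")  # this client's private reply queue
  end

  def call(method, params)
    tag = SecureRandom.uuid # correlates the reply with this request
    @requests.publish(JSON.generate(
      "method"   => method,
      "params"   => params,
      "reply_to" => @replies.name,
      "tag"      => tag
    ))
    # Block until the server posts a reply carrying our tag.
    loop do
      reply = JSON.parse(@replies.pop)
      return reply["result"] if reply["tag"] == tag
    end
  end
end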

This was basically unproblematic. It might even have been a slight improvement on our previous system. RPC over message queue is a legitimate tactic after all. We of course didn’t have any benchmarks because why would you benchmark sweeping changes you make in the name of performance, but it was at the very least not obviously worse than the previous system.

Our next problem was the various client libraries that we were reimplementing everywhere. This was obviously stupid. Code reuse is good, right?

So we rationalized them, pulled them all out into their own repo, and produced a client package. You used it by installing it on your system (which was just a single command using the packaging system we were using), and then you could talk to an API server. It was straightforward enough.

So we’d solved our reimplementing problems, and we were at least claiming we’d solved our performance problems (and maybe we even had. At this late stage I honestly couldn’t tell you).

Thing is… it turned out that this was actually quite irritating to develop against.

It had already been a little painful before, but now in order to add a feature you had to do all the following steps:

  1. Make a change to the server code
  2. Make a change to the client library code
  3. Make a change to the application code
  4. Restart the server (no code reloading in our custom daemon)
  5. Install the client library on your system
  6. Restart your application (no code reloading when a system package changes)

We decided to solve the first two problems first.

We noticed that a lot of the code between the client and the server was duplicated anyway (similar structure on each side after all). So we ended up commonizing it and putting the client library in the repo with the server. Not all of the server code was needed in the client library obviously, but it was much easier to just put it all in one directory and have a flag that let you test if you were running in client or server mode. So now at least all the changes you had to make to both client and server were in one repo, and they might even have been the same code.

At the moment what we have here is a slightly baroque architecture, but it’s fundamentally not that much worse than many you’d encounter in the wild. It’s not good, but looking at it from the outside you can sortof see where we’re coming from. What follows next is the point at which it all starts to go completely tea party.

You see, code duplication between client and server was still a problem.

In particular, data model duplication. If you had a Kitten model, you needed a Kitten model in both the client and the server and you needed to maintain both. This was quite a nuisance.

At this point some bright spark (it wasn’t me, I swear) realised something: Our ORM supported highly pluggable backends. They didn’t even need to be SQL – there were examples of people using it for document storage databases, even REST APIs. We had this API server, why not make it an ORM backend?

And if we’re doing that, can we do it in a way that reuses the models we’re already using? We’re already detecting if we’re running in client or server mode, can’t we just have it use a different backend in the two cases?

Well, of course we can.

Of course, the really nice thing about having an ORM is how you can chain things and build rich queries. So we do want to support the full range of query syntax for the ORM.

A weekend of caffeine fuelled development from this one guy later, we all arrived on a Monday morning to find a grand new vision in place. Here’s how it worked:

  1. We have the same ORM models on both client and server
  2. If we are in client mode, our backend uses the JSON-RPC server rather than talking to the database
  3. Given a query object, we do a JSON RPC call to the corresponding backend methods on the server. This returns a bunch of models

Simple, right?

I’m going to unpack that.

  1. I make a bunch of method calls to the ORM
  2. This generates a Query object
  3. We pass this Query object to a custom JSON serializer that has to support the full range of subtypes of Query
  4. We send that JSON over a message queue
  5. Our server pops the JSON off a message queue, deserializes it and calls a custom method to build a Query object
  6. This Query object is passed to the ORM backend
  7. The ORM backend converts the query object into an SQL query
  8. The database adapter executes that SQL query and returns a bunch of rows
  9. Those rows get wrapped as model objects
  10. Those model objects get serialized as JSON and passed across the client message queue
  11. The client pops the model JSON from the message queue
  12. The client parses the JSON and wraps the resulting array of hashes as models

…yeah.

Anyway, we arrive on a Monday morning to find this all in place and broadly working (“There are just a few details to polish”).

And, you know what? We decided to roll with it. We were quite irritated with the status quo, and this clearly would make our lives easier – there was an awful lot less code to write when we wanted to add a feature and boy did we need to add features. So although we were probably a little suspicious, we decided to let that slide.

Of course… you see that long pipeline over there? Lot of moving parts isn’t it? Many of them, custom crap we’ve written. I bet that’s going to break, don’t you?

Of course it broke. A lot.

And naturally, as seems to happen, muggins here gets to be the guy in charge of fixing those bugs (how did this happen? I don’t know. I think the problem is that I don’t step back fast enough when the call for volunteers arrives. Or maybe people have an uncanny knack for spotting I’m actually quite good at it despite my best efforts to pretend I’m not).

One of the most common sources of bugs was user error. Specifically it was user error that was made really easy by our setup.

It required three steps to push a code change through to your application: you had to restart the server, you had to install the package, and you had to restart your application. If you forgot any one of those three steps, your client and server code would be out of sync (and remember how much of this was shared) and the resulting errors would be subtle and confusing. This frequently drove people to despair.

Remember how I believe in “Fail early, fail often”? It turns out I’ve believed this for some time (the first evidence I can find of my thinking along these lines comes from 2007. That would have been within about a year of my learning to program).

So the solution I hit upon for the problem was “Well, don’t do that then”. When a server or a client started up, it would create a signature that was a (MD5 I think) hash of all its code. This would then be transmitted along with every RPC call, and if the server detected that the client’s hash differed from its own it would instead respond with an error saying “No, you’re running the wrong client code. I’m not going to talk to you”. Unsubtle, but effective in making the error clear.
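
For the curious, the check was roughly this shape (reconstructed from memory – the file layout and method names here are made up):

require "digest/md5"

# Both client and server compute a fingerprint over the shared code at startup.
def code_signature
  sources = Dir["lib/**/*.rb"].sort.map { |path| File.read(path) }
  Digest::MD5.hexdigest(sources.join)
end

# Server side: refuse to talk to a client running different code.
def check_client!(request)
  unless request["code_signature"] == code_signature
    raise "No, you're running the wrong client code. I'm not going to talk to you."
  end
end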

This solved the immediate problem, and we decided it was good enough.

Most of the next six months (when I wasn’t doing feature dev) I was fixing bugs with the pipeline – this particular obscure query was crashing our deserializer. This one query was somehow generating 17MB of JSON data and the parser didn’t like that very much. That sort of thing.

During this time people were getting increasingly irritated with the dev process. It was all very well having those errors be detected, but what you really wanted was for those errors to be fixed. And to not have to do three slow steps to make a simple change.

This was when my true contribution to our little Lovecraftian beauty came in.

“Well”, I reasoned, “the server has all the code, right? And the client needs all the code? And the server is already sending data to the client…”

So.

The package remained as a tiny shim library that needed to be installed to talk to the server, but it included really very little code (it still checked the code md5, but this now basically never changed).

Here is the code loading protocol:

  1. On startup, the client would make its first RPC call. This was a “Hey, give me the code” call. The server would reply with a list of file paths and their source code
  2. The client would create a temporary directory and write all the files into that temporary directory
  3. The client would add that temporary directory to the load path and require the entry point to the library
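
In code, the client end of that protocol amounted to not much more than this (again a sketch from memory – rpc_call is the shim’s existing RPC helper and the entry point name is made up):

require "tmpdir"
require "fileutils"

# Fetch the library source from the server, write it into a temporary
# directory, then load it as though it had been installed locally.
def bootstrap_client_code
  files = rpc_call("give_me_the_code") # => { "lib/foo.rb" => "...source...", ... }

  code_dir = Dir.mktmpdir("client-code")
  files.each do |relative_path, source|
    path = File.join(code_dir, relative_path)
    FileUtils.mkdir_p(File.dirname(path))
    File.open(path, "w") { |f| f.write(source) }
  end

  $LOAD_PATH.unshift(File.join(code_dir, "lib"))
  require "our_api_client" # the library's entry point
end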

This removed the install step: The client would forever and always be running the latest version of the code, because it fetched it from the server at start up. We still had to restart the server and the client, but at least one of the more irritating and easy to forget steps was removed.

I don’t think we ever implemented code reloading, though it’s obvious how we could have – on code changes, the server would just have to broadcast the changed files, which could again be written to the file system and reloaded.

Fortunately better judgement prevailed before we hit that point.

We were coming up to the first major release we’d have with all this infrastructure in place.

It was obviously not going to go well.

The site was dramatically slow in comparison to its previous “This is too slow!” HTTP incarnation. Why? Because it turns out that serializing and deserializing lots of ORM queries and models is really fucking slow! When we had the HTTP implementation in place we were a bit more careful about what we were doing, but this was all behind the scenes and invisible to us and mostly out of our hands.

It was also still quite buggy. Despite my best efforts to keep the whole thing reliable and functioning – I’d patched a lot of bugs – we kept finding new ones. The problem wasn’t in fixing individual bugs, it was that the core architecture was basically a disaster.

One night while wrestling with insomnia I had a revelation.

“OH MY GOD. IT’S JUST A LIBRARY”.

A weekend of caffeine fuelled development from me later, everyone arrived on a Monday morning to find a grand new vision in place. Here’s how it worked:

  1. Everything lived in a single repo.
  2. Everything that was previously server code was now just sitting in a single library that everything put directly on their load path.
  3. Everything talked to the database directly, via that library.

That’s. It.

It took a little bit of time to get it stable after that – there were a lot of places where our bug workarounds now became bugs in their own right. There were a few days where it was touch and go – this was about a month before release and there was some serious head scratching and concerned moments where we thought we were going to have to release it in its previous form after all. But we got there, and the result was unsurprisingly both faster and more reliable than what came before it.

Obviously this is how we should have done it in the first place. It’s not just obvious in retrospect, it should have been obvious in the beginning. We were just too focused on fixing this one problem with our current system rather than calling the system itself into question to see it.

The project structure changed a bit in the time since then, but as far as I know this is still essentially how it looks, and I imagine how it will continue to look indefinitely.

Unless someone decided that what was really needed was to abstract out some part of the database access into an RPC server. I hope no one did that, but I’m a little afraid to ask and find out.


A manifesto for error reporting

See also: a rewritten and somewhat calmer version of this post I wrote later.


So I do a lot of debugging.

It’s not because I write a lot of broken code so much as that I seem to be a go-to guy for “Something mysterious is happening. Can you lend a hand?”

That’s fine. It’s something I’m reasonably good at, and sometimes it’s even enjoyable.

But it means I have a bit of a triggering issue which will get me very angry.

“What is that triggering issue, David? Tell us, please!” I hear you say.

Well, I’m glad you asked!

That issue is very simple: Bad error reporting.

A thing developers don’t seem to understand is that what happens when things go wrong is every bit as important as what happens when things go right. Possibly more important.

If you don’t realise this, when things go wrong you will feel the wrath of myself and all the ops people who have to deal with your software floating through the air, trying to set you on fire with our minds.

While I’m reasonably sure psychic powers aren’t actually a thing, do you really want to take that chance?

So, if you don’t want to experience spontaneous geek rage induced combustion, here is some helpful advice for you to follow.

First, a word on process. When something goes wrong, the question I am asking is “How can I make this not go wrong?”. In order to answer this, I must first answer the following questions:

  1. Where has it gone wrong?
  2. What has gone wrong with it?
  3. Why has it gone wrong?

Your job as a writer of software is to make it as easy as possible for me to answer these three questions.

Next a note of context:

How am I attempting to answer this question?

Well, in an ideal world, I’m attempting to answer it because I have a nice precise test case which reproduces the problem.

However, I first need to get to that point, and in order to get to that point I need enough information to give a pretty good answer to the first two questions. An entire application is not a test case, especially not if it’s in a complicated deployment environment. I need enough information about where it has gone wrong to extract a smaller test case and I need enough information about what has gone wrong to put that test case in a state where it will demonstrate the problem.

So what I’m actually looking at initially is almost certainly a log file. It’s OK if this log file is really the screen of a console, but the point is that something, somewhere, has given me a textual record that says “Hey, something’s gone a bit wrong. Here’s some info for you to look at”.

There is a possibility that if you’re writing an application or a framework or something you have deliberately avoided producing such a textual record of anything, or are piping your errors to /dev/null or something. Hopefully this is not the case, because if it is you don’t need to worry about spontaneous combustion, because whoever has to deploy and maintain your code has probably already tracked you down to your home address and killed you in your sleep. No jury would convict.

So, from now on, I’m assuming you’ve done the decent thing and there’s some way of going from errors that occur to logs of such errors.

What can you do to make everyone’s lives easier?

Error messages

Obviously the prerequisite of this is that you actually tell me something in your error message. You’d never just write an error message that said “Something went wrong”, right? So assuming you’ve already got error messages that tell me roughly what went wrong, here is how to have error messages that tell me exactly what went wrong:

If your error message is triggered by a value, for the love of god include that value in your error message.

People don’t seem to do this. I don’t understand why. It’s very simple.

Don’t do:

   error "Bad argument"

Do do:

   error "Bad argument: #{argument.inspect}"

Even better if you tell me exactly why it is invalid:

   error "You can't curdle a frobnitzer which has already been curdled: #{argument.inspect}"

(Side note: All examples here will be in ruby, because that’s mostly what I’ve been working with when this has been pissing me off. The examples should be easily portable and the principles are language agnostic).

That’s it. You’ve already made my life at least 27% simpler with this one step.

Why is this important?

It’s important because tracking data flow is hard. It’s entirely possible that the function you’ve raised an error in is about 50 calls deep. I can probably track down what has been passed to it eventually after carefully looking through calls and such-like, but I shouldn’t need to. If you are not including the value in your error message then you have exactly the information I need at your fingertips and are failing to tell me. That’s kinda a dick move.

Exceptions are awesome. Do more of those

You know what are great?

Exceptions. Exceptions are great.

I mean obviously I’d prefer it if your code isn’t throwing exceptions at all, but I’d rather it’s not throwing them because it doesn’t need to, because everything is going swimmingly, not because it wouldn’t throw them if something went wrong.

Why are exceptions great?

Exceptions are nice for structuring error handling in code, they provide good classification for error recoveries, etc. etc.

That’s not what I care about here.

Exceptions contain one thing that elevates them to the status of patron saint of people who have to debug problems.

They carry a stack trace.

It’s like a glorious little audit trail that points the finger at exactly where the problem occurred. If you’ve followed the previous instructions and given them a good error message too then you’ve probably told me exactly what I need to know to reproduce the problem (there are some, ahem, exceptions to this which I will get on to later, but this is true most of the time).

Side note: I know this isn’t true in all languages. e.g. C++ exceptions don’t carry stack traces (I think) and Haskell ones have less than useful stack traces due to lazy evaluation. You have my sympathies. Everyone else, no excuses.

Further, they carry exactly the information I want to appear in the log on top of that: An error category and a message. An exception which bubbles up to the log file is my best friend for problem debugging.

Some specific notes on exceptions:

If you see an exception, say an exception

Never hide exceptions from me. Ever.

If you catch an exception, I need to know about it unless you’re really goddamn sure I don’t (examples where you may validly be goddamn sure I don’t include Python’s StopIteration and any other exceptions used for control flow. Yes this is a valid thing to do).

I don’t care if you send an email, dump it in a log file, whatever you want. I just need to know about it, and I need to know at the very least the exception class, the exception message and for the love of god the exception stack trace.
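
A minimal version of what I’m asking for looks something like this (logger and do_the_risky_thing are obviously stand-ins, and whether you re-raise afterwards is up to you):

begin
  do_the_risky_thing
rescue => e
  # At the very least: the class, the message and the full backtrace.
  logger.error("#{e.class}: #{e.message}")
  logger.error(e.backtrace.join("\n"))
  raise # or handle it, just don't silently swallow it
end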

Thou Shalt Not Fuck With The Stack Trace

A lot of frameworky things (rails, rspec, etc. I define framework as any library or application where the usage pattern is “Don’t call our code, we’ll call yours”) think that exceptions are confusing and unhelpful. They might show you some of the stack trace, but you really don’t want the whole thing do you? Here, let us filter out those unhelpful bits.

NO.

NO NO NO NO NO NO NO NO NO NO.

NO.

Bad developer. Wrist slap. No cookie.

The chances that you actually correctly understand what is the important bit of the stack trace are effectively zero. Even if you somehow manage to correctly understand this, you are removing important context. The lack of that context will confuse me more than its presence. If I ever find you are doing this I will simply have to do everything again with the “stop lying to me you bastard” flag turned on.

And that’s terrible.

Except…

There is one case in which fucking with the stack trace is not only permitted but also mandatory.

It is OK to add more information to the stack trace.

In particular, if there is another stack trace involved you should also include that.

I often see code like this:

begin
   ...
rescue LowLevelException => e
   raise MyLibrarySpecificException, e.message
end

Please take my outrage as read.

It doesn’t look like you’re doing it but you are once again fucking with the stack trace. Remember what I said about not doing that?

It’s OK to wrap exceptions. I understand the reasoning for doing it, and it’s often a good idea.

However: Your language almost certainly gives you the capability to override the stack trace. When you are wrapping an exception you must do this so that it includes the original stack trace. Ideally you would include both back traces, so your logs would contain something like:

MyLibrarySpecificException: Wrapped LowLevelException: "A message"
   this
   error
   was 
   thrown
   here
   -- WRAPPED BACKTRACE --
   the 
   original
   error
   was 
   thrown
   there

The details don’t matter. The point is: Include both back traces if you can, include only the original stack trace of the exception you’re wrapping if you absolutely must.

Here’s an example of how you can do that in Ruby:

class MyLibrarySpecificException < StandardError
  attr_reader :wrapped_exception

  def initialize(wrapped_exception)
    # Keep a reference to the original exception and mention it in our own message.
    super("Wrapped #{wrapped_exception.class}: #{wrapped_exception.message}")
    @wrapped_exception = wrapped_exception
  end

  def backtrace
    # Our own backtrace first, then the wrapped exception's, clearly separated.
    super + ["---WRAPPED EXCEPTION---"] + wrapped_exception.backtrace
  end
end

Enough of exceptions. Some more general principles.

If something goes wrong, tell me

This rant isn’t about Padrino, but it was a triggering point for it.

One of Padrino’s more interesting behaviours is that if you have a syntax error in one of your controller files it won’t fail to start. Instead what will happen is it will log a warning, continue loading and then just go “Eh, I don’t know anything about that” if you try to use routes defined in a controller it failed to load.

This is not helpful.

A common design principle seems to be that you should attempt to do the right thing – recover from errors, guess what the user meant, etc.

This is really not helpful.

The problem with fuzzy behaviour is that it produces fuzzy results. Postel’s Law is deeply unhelpful for library design: Code which you are running should be correct. If it’s a bit wrong, you should not attempt to run it, you should error out and make me fix my code.

This is because errors in code are signs of error in thought. The chances of my accidentally calling your code with the wrong value is much higher than the chances of me deliberately being a bit sloppy (and if I’m deliberately being a bit sloppy it’s OK to slap my wrist and punish me for it). Code which is doing the wrong thing is going to be a problem now or a problem later, and I’d much rather you told me it was a problem now so I can fix it now rather than having to locate it later.

On the subject of “now rather than later”.

Validate early, validate often

Suppose I write the following code:

class HelpfulHashWrapper
  def initialize(hash)
    @hash = hash
  end
 
  def do_something(some_key)
    return @hash[some_key]
  end
end

(ignore the fact that this class is stupid)

Now suppose I do the following:

1.8.7 :029 > my_wrapper = HelpfulHashWrapper.new(nil)
 => #<HelpfulHashWrapper:0x... @hash=nil> 
1.8.7 :032 > my_wrapper.do_something "hi"
NoMethodError: undefined method `[]' for nil:NilClass
	from (irb):26:in `do_something'
	from (irb):32

Where is the error here?

Hint: It’s not the point where the exception was raised.

I constructed the HelpfulHashWrapper with an argument that was never going to work. My HelpfulHashWrapper unhelpfully didn’t tell me that I had put it into an invalid state.

Why is this important?

Remember when I said that the first question I needed to be able to answer was “Where has it gone wrong?”

If I get an error when I try to use an object in an invalid state, I’m not really able to answer that question. Instead what I need to do is backtrack to the point where the object got put into an invalid state. This is hard work. The following version of the class will make my life much easier:

class HelpfulHashWrapper
  def initialize(hash)
    raise "I can only helpfully wrap hashes. #{hash.inspect} is not a hash" unless hash.is_a? Hash
    @hash = hash
  end
 
  def do_something(some_key)
    return @hash[some_key]
  end
end

I will now discover very early on when I’ve done something wrong, rather than waiting to find it at a mysterious later date.

Basically: The closer to the original error you report the problem, the easier it is for me to identify and fix the problem.

In summary

  1. Above all else, give me helpful error messages
  2. Helpful error messages contain any invalid values and a reason as to why they’re invalid.
  3. Throw exceptions if something goes wrong.
  4. Your application should record all exceptions it receives.
  5. Do not fuck with the stack trace
  6. Do not attempt to help me by not throwing an exception. If something maybe should throw an exception, it should throw an exception.
  7. Validate your internal state, and throw an exception when your state becomes invalid, not when I try to use it in an invalid state.

Doing these things will significantly reduce my blood pressure, will make your ops guys love you (or at least resent you slightly less bitterly), and will reduce your chances of spontaneous combustion by at least 43%.


A meta-recipe

I moved flat in December.

Shortly after I moved, an oriental supermarket opened up not a minute’s walk from my front door (it literally takes me longer to get out of my building than it takes for me to get to the supermarket from there). This fills me with joy.

As a result my closest source of food to home contains fairly exotic Asian ingredients (or at least exotic to white boy here. I imagine it’s the Chinese equivalent of Tesco). Naturally this has influenced my cooking quite significantly.

The problem with this of course being that I have only the roughest idea of how to cook most Asian cuisine. I’ve had a go in the past at a few standard recipes, but I’ve never really been very recipe oriented anyway.

Fortunately, I’ve eaten a lot of Asian food of various types, and am always extremely happy to improvise! The result is a lot of food that is, shall we say, Asian inspired…

Here’s a meta recipe I’ve been cooking a lot recently. It’s basically soup. It works pretty well.

The primary starting point is you take hot water, put it on to boil, and add rice vinegar (I’ve been using sushi vinegar recently, but I’ve used brown rice vinegar in the past and both work well) and about a teaspoon of Japanese red pepper (this is really nice. It has a lot of flavour and just a little bit of heat, so you can use lots of it to get a really nice flavour without it getting too hot). Towards the end of the cooking process I also add a large dollop of miso (I’m currently using red miso). I’ve vaguely gathered the impression that you’re not supposed to boil miso because it kills stuff in it, which is why I add it towards the end once I’ve reduced the heat.

So to recap, we’re basically making a broth that is rice vinegar, red pepper and miso. Mmm.

Other things that are very tasty to add to this broth include sesame (either whole seeds or oil) and peanut butter.

No, really. Peanut butter. Don’t knock it. It’s an amazing addition.

To this broth we then add… stuff. Pretty much whatever is available. That’s why it’s a meta-recipe.

Things I have successfully added:

  • Noodles
  • Glutinous rice (this makes it less of a soup and more of a rice dish, also increases the cooking time by lots. I’m tempted to premake a whole bunch of glutinous rice just so I can add bits of it instead of cooking it each time)
  • Frozen vegetable gyoza (don’t judge. These are amazing)
  • Eggs (especially salted duck eggs)
  • Tofu
  • Frozen soy beans
  • Frozen peas
  • Bean sprouts
  • Any other vegetables to hand
  • etc.

It’s not complicated. You’re basically making a big bowl of tasty soup and adding anything you have to hand.

So, yeah, basically this is a post saying “HEY GUYS HAVE YOU HEARD ABOUT THIS THING CALLED SOUP IT’S PRETTY AMAZING?”

But specifically you should try out the incredibly simple broth “recipe”, because it works really well (don’t forget to try adding peanut butter. Unless you’re allergic, in which case you probably shouldn’t), and you should also try out salted duck eggs, because they’re really nice.

And don’t forget about the frozen gyoza. Mmm. Gyoza.


Exploring your twitter archive with unix

So I have access to my twitter archive now. I was very excited by this, and then a month later I still hadn’t done anything about it.

I decided to fix this.

Everything I’m doing today is on a linux system (running Mint if you must know). You will need the following things installed to follow all of it.

  • atool – This is basically an archive management tool. Actually you don’t need this at all, but I used it as part of set up and it’s totally worth having on every system you run so I’m mentioning it anyway.
  • git – Again, this is strictly optional, but you might want to replace it with some other VCS. When I’m working with a bunch of files I like to put them under version control, so any destructive operations I perform (deliberately or inadvertently) can easily be backed out of. I like git, so I used git, but there’s no reason to use it specifically over just about anything else here.
  • moreutils – I use this in precisely one step, but it’s quite useful for that step. You can easily find a way to do without it.
  • wget – Program for downloading files using the command line. Non-essential (just use your browser if you prefer) but useful.
  • jq – This is absolutely the core utility I’m using and you need it installed in order to do anything useful with this post.

So step one is to get the data. Go to your twitter settings, click “Download my archive”. They’ll email you the archive. Download it. Come back when you’re done.

david@volcano-base ~ $ mkdir -p data
david@volcano-base ~ $ cd data/
david@volcano-base ~/data $ aunpack ~/Downloads/tweets.zip 
Archive:  /home/david/Downloads/tweets.zip
  inflating: Unpack-2420/data/js/tweets/2013_03.js  
  (etc)
tweets.zip: extracted to `tweets' (multiple files in root)
david@volcano-base ~/data $ cd tweets/
css/  data/ img/  js/   lib/  
david@volcano-base ~/data $ cd tweets/data/js/tweets/
david@volcano-base ~/data/tweets/data/js/tweets $ ls
2008_04.js  2008_09.js  2009_01.js  
  (etc)

You’re now in a directory with lots of .js files.

Before we do anything, lets put everything under git.

david@volcano-base ~/data/tweets/data/js/tweets $ git init
Initialized empty Git repository in /home/david/data/tweets/data/js/tweets/.git/
david@volcano-base ~/data/tweets/data/js/tweets $ git add *.js
david@volcano-base ~/data/tweets/data/js/tweets $ git commit -m "Initial commit of data files"
[master (root-commit) 9aeb0dc] Initial commit of data files
 59 files changed, 699462 insertions(+)
 create mode 100644 2008_04.js
 (etc)

Now lets take a look at what we have here:

david@volcano-base ~/data/tweets/data/js/tweets $ head -n25 2008_04.js 
Grailbird.data.tweets_2008_04 = 
 [ {
  "source" : "web",
  "entities" : {
    "user_mentions" : [ ],
    "media" : [ ],
    "hashtags" : [ ],
    "urls" : [ ]
  },
  "geo" : {
  },
  "id_str" : "799851705",
  "text" : "Yay for google whoring.",
  "id" : 799851705,
  "created_at" : "Tue Apr 29 21:30:07 +0000 2008",
  "user" : {
    "name" : "David R. MacIver",
    "screen_name" : "DRMacIver",
    "protected" : false,
    "id_str" : "14368342",
    "profile_image_url_https" : "https://si0.twimg.com/profile_images/2609387884/1tu53xdcpssixve5o09m_normal.jpeg",
    "id" : 14368342,
    "verified" : false
  }
}, {

First thing we see: These aren’t JSON files. They are actually Javascript. Fortunately they’re formatted in such a way that we can easily turn them into JSON:

david@volcano-base ~/data/tweets/data/js/tweets $ for file in *.js; do tail -n+2 $file | sponge $file; done
david@volcano-base ~/data/tweets/data/js/tweets $ git diff | head
diff --git a/2008_04.js b/2008_04.js
index 5737052..2046162 100644
--- a/2008_04.js
+++ b/2008_04.js
@@ -1,4 +1,3 @@
-Grailbird.data.tweets_2008_04 = 
  [ {
   "source" : "web",
   "entities" : {
diff --git a/2008_06.js b/2008_06.js
 
david@volcano-base ~/data/tweets/data/js/tweets $ git commit -a -m "remove initial assignment line so all our files are valid javascript"
[master 8d41481] remove initial assignment line so all our files are valid javascript
 59 files changed, 59 deletions(-)

What’s going on here?

Well, from the tail man page:

-n, --lines=K output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth

So tail -n+2 outputs all lines starting with the second.

sponge is from moreutils. According to its man page:

sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

So what we’re doing in this loop is for each javascript file we’re stripping off the first line, buffering it up and then writing it back to the original file. (I expect there’s also a sed one liner to do this, but this was easier than looking up what it was)

Right. Now for something actually interesting!

For a starting point, I’ve often wondered how much I’ve actually written on twitter. So lets do that.

david@volcano-base ~/data/tweets/data/js/tweets $ cat *.js | jq -r '.[] | .text' | wc -w
326212

Before I analyze what I’ve just done, I’m going to marvel at the fact that that’s a metric fuckton of words. Depending on how you count that’s about three novels tweeted. I don’t know if that says good or bad things about me.

Now lets unpack the command.

First the uninteresting bits: cat *.js concatenates all the js files and spews them to stdout, wc -w counts the number of words fed to its stdin. You knew that though.

Now lets talk about the jq command, which is the interesting bit. To be clear: I’m learning jq as I write this (I’ve been meaning to for a while and then didn’t), so I really don’t know that much about it, so all of what I say may be wrong.

As I understand it, the jq model is that everything in it is a stream of JSON values, and that’s also how it parses its STDIN. This is why concatenating the JSON files and feeding it to jq works: It parses one value from STDIN, then another, then another. So we’re starting with a stream of arrays.

We’re then building up a filter. ‘.’ is simply the filter which pipes its input to its output, but by modifying it as ‘.[]’ we get a filter which accepts a stream of arrays and unpacks them by reading each array and then streaming its contents to the output one at a time. So we’ve taken our stream of arrays and turned it into a stream of the array contents. Let’s verify that:

david@volcano-base ~/data/tweets/data/js/tweets $ < 2008_04.js  jq '.[]' | head -n50
{
  "user": {
    "verified": false,
    "id": 14368342,
    "profile_image_url_https": "https://si0.twimg.com/profile_images/2609387884/1tu53xdcpssixve5o09m_normal.jpeg",
    "id_str": "14368342",
    "protected": false,
    "screen_name": "DRMacIver",
    "name": "David R. MacIver"
  },
  "created_at": "Tue Apr 29 21:30:07 +0000 2008",
  "id": 799851705,
  "text": "Yay for google whoring.",
  "id_str": "799851705",
  "geo": {},
  "entities": {
    "urls": [],
    "hashtags": [],
    "media": [],
    "user_mentions": []
  },
  "source": "web"
}
{
  "user": {
    "verified": false,
    "id": 14368342,
    "profile_image_url_https": "https://si0.twimg.com/profile_images/2609387884/1tu53xdcpssixve5o09m_normal.jpeg",
    "id_str": "14368342",
    "protected": false,
    "screen_name": "DRMacIver",
    "name": "David R. MacIver"
  },
  "created_at": "Tue Apr 29 10:36:22 +0000 2008",
  "id": 799412882,
  "text": "I get more and more sick of Java with every passing day. Impressive, given that I'm not even writing it any more...",
  "id_str": "799412882",
  "geo": {},
  "entities": {
    "urls": [],
    "hashtags": [],
    "media": [],
    "user_mentions": []
  },
  "source": "web"
}
{
  "user": {
    "verified": false,
    "id": 14368342,

(I’ve switched to just using the 2008_04.js file because we’re only looking at a small amount of data)

We do indeed get a sequence of JSON objects one after another (note no commas or array markers).

We then build up a pipe inside the jq language. The “.text” filter reads its input stream, looks up the text property on it and outputs the result as follows.

david@volcano-base ~/data/tweets/data/js/tweets $ < 2008_04.js  jq  '.[] | .text' | head -n10
"Yay for google whoring."
"I get more and more sick of Java with every passing day. Impressive, given that I'm not even writing it any more..."
"Unleashing my inner interior decorator."
"Having trouble with the twitter UI, which is just embarassing given how simple it is. :-)"
"giving this twitter thing another try."

The final bit to explain here is the -r flag. From the man page:

--raw-output / -r: With this option, if the filter’s result is a string then it will be written directly to standard output rather than being formatted as a JSON string with quotes. This can be useful for making jq filters talk to non-JSON-based systems.

Indeed it can:

david@volcano-base ~/data/tweets/data/js/tweets $ < 2008_04.js  jq  -r '.[] | .text' | head -n10
Yay for google whoring.
I get more and more sick of Java with every passing day. Impressive, given that I'm not even writing it any more...
Unleashing my inner interior decorator.
Having trouble with the twitter UI, which is just embarassing given how simple it is. :-)
giving this twitter thing another try.

So we’ve chained all these together to get the actual text.

Now lets save that text so we can do a bit more analysis on it (the same jq command as before, but redirected into a file: cat *.js | jq -r '.[] | .text' > all_tweets.txt):

david@volcano-base ~/data/tweets/data/js/tweets $ head all_tweets.txt 
Yay for google whoring.
I get more and more sick of Java with every passing day. Impressive, given that I'm not even writing it any more...
Unleashing my inner interior decorator.
Having trouble with the twitter UI, which is just embarassing given how simple it is. :-)
giving this twitter thing another try.
No, seriously. I mean it. Why do people use ASP? Every site I've encountered using it has driven me insane with its awfullness
Barack Obama is the eleventh doctor!
Brillig and the Slithy Toves would make a great band name
According to Victoria I am well positioned to be the antichrist
I know understand why tea and crumpets are our traditional fare
david@volcano-base ~/data/tweets/data/js/tweets $ git add all_tweets.txt
david@volcano-base ~/data/tweets/data/js/tweets $ git commit all_tweets.txt -m "Just the text for all tweets"
[master df1e778] Just the text for all tweets
 1 file changed, 20553 insertions(+)
 create mode 100644 all_tweets.txt

And a sanity check:

david@volcano-base ~/data/tweets/data/js/tweets $ wc -w all_tweets.txt 
326212 all_tweets.txt

Good, the same answer.

Now, lets ask another interesting question: How much have I written not counting @replies?

Answer: Not nearly so much.

david@volcano-base ~/data/tweets/data/js/tweets $ grep -v '@' all_tweets.txt | wc -w
88966

So apparently most of my twitter usage is conversations: We go from about 3 novels to a short novel or long novella if I remove them.

What’s going on here?

Well, I’m doing a text search on all_tweets.txt. grep ‘@’ gives me all lines which contain an @, and then the -v flag inverts the sense of the match:

david@volcano-base ~/data/tweets/data/js/tweets $ grep '@' all_tweets.txt | head
@benaud I could. It's kinda hacked together and purely for sending messages. Still want it?
@t_a_w JQuery doesn't meet my needs in two crucial ways: a) I'm not doing something in the browser. b) It left survivors.
@t_a_w JQuery fails my needs in two crucial ways. Firstly, I'm not trying to do something in the browser. Secondly, it left survivors.
@gnufied It's possible I was being slightly facetious...
@gnufied But that is not object orientated! Encapsulation! You are a bad programmer and should go back to writing C with global variables.
@t_a_w I'll have you know that the book I just bought covers the state of the art up to at *least* 1990.
@jherber Not sure.I don't think arrays are the issue so much as the general collections API. Certainly there are too damn many toList calls.
@jherber Oh, hey. I'd forgotten about the embarrassing slowness of List.sort. Thanks for reminding me. I'll add fix that.
@mikesten Right, but there are two reports. The one I sent a link to is everything on expertise-identification, but there are other ones.
@mikesten The full super-scary thing or just the work applicable ones? :-)
david@volcano-base ~/data/tweets/data/js/tweets $ grep -v '@' all_tweets.txt | head
Yay for google whoring.
I get more and more sick of Java with every passing day. Impressive, given that I'm not even writing it any more...
Unleashing my inner interior decorator.
Having trouble with the twitter UI, which is just embarassing given how simple it is. :-)
giving this twitter thing another try.
No, seriously. I mean it. Why do people use ASP? Every site I've encountered using it has driven me insane with its awfullness
Barack Obama is the eleventh doctor!
Brillig and the Slithy Toves would make a great band name
According to Victoria I am well positioned to be the antichrist
I know understand why tea and crumpets are our traditional fare

We could have also worked this out with jq:

david@volcano-base ~/data/tweets/data/js/tweets $ cat *.js | jq -r '.[] | select(.entities.user_mentions | length == 0) | .text' | wc -w 
91071

What’s going on here?

What we’ve done is we’ve added a filter in the middle of our two previous filters. This is a select filter, as explained here in the manual. For each row of input, it passes that to the filter it’s wrapping (note that the thing inside is a filter!), then reads all the output from that filter until it either runs out of output or it finds a true value.

Here we only ever output one value (whether the length of the user_mentions array is 0), but here’s an illustration of what happens if you output multiple values:

david@volcano-base ~/data/tweets/data/js/tweets $ echo  -e '[false]\n[true]\n[false,true]\n[true,false]' | jq 'select(.[])'
[
  true
]
[
  false,
  true
]
[
  true,
  false
]

Observe that the arrays with any true value in them get returned but the arrays with all false values don’t.

An interesting thing to note: These answers are not exactly the same. I have more words written with no user mentions than I do words from tweets with no @s in them. What’s going on?

Let’s find out:

david@volcano-base ~/data/tweets/data/js/tweets $ cat *.js | jq -r '.[] | select(.entities.user_mentions | length == 0) | select(.text | contains("@")) | .text' | grep -o '@[[:alnum:]_]\+' | sort | uniq -c | sort -nr
     68 @njbartlett
     19 @sylviazygalo
      8 @Lars_Westergren
      7 @burningodzilla
      4 @jeanfinds
      3 @a_y_alex
      2 @torsemaciver
      2 @Ms_Elevator
      2 @charlesarmstrong
      2 @cambellmichael
      2 @benreesman
      2 @allbery_b
      1 @zarkonnen
      1 @trampodevs
      1 @tooPsychedout
      1 @timperret
      1 @thatsoph
      1 @nuart11
      1 @missbossy
      1 @mikevpotts
      1 @michexile
      1 @MetalBeetleLtd
      1 @mccraicmccraig
      1 @MatthijsAKrul
      1 @m
      1 @lucasowen85
      1 @lisascott869
      1 @lauren0donnell
      1 @Knewton_Pete
      1 @kittenhuffers
      1 @jnbartlet
      1 @itomicsam
      1 @iamreddaveMaximum
      1 @georgie_guy
      1 @geezusfreeek
      1 @GarethAugustus
      1 @drmaciver
      1 @debashshg
      1 @dbasishg
      1 @CTerry1985
      1 @communicaI
      1 @cipher3d
      1 @carnotwitk

These are the usernames that appear in tweets that have no user_mentions in them. Based on a few random samples, none of them appear to be valid twitter user names. Some of them are obviously typos, some of them I recognise as people who have changed their usernames or deleted their accounts. So I think that explains that. Now lets explain the command.

The jq command should hopefully be obvious – it’s just more of the same, but I’ve added a new select (this time using contains for string matching). The remainder is interesting though.

The grep string ‘@[[:alnum:]_]\+’ is a POSIX regular expression which matches strings which start with @ and then are followed by any combination of alphanumeric characters and underscores. So e.g. “@foobarbaz” matches, as does “@foobar123”, or “@foo_bar_baz”. The -o flag says to only print the matches rather than the lines containing matches as is grep’s normal behaviour.

So e.g.

david@volcano-base ~/data/tweets/data/js/tweets $ <all_tweets.txt grep -o '@[[:alnum:]_]\+' | head
@benaud
@t_a_w
@t_a_w
@gnufied
@gnufied
@t_a_w
@jherber
@jherber
@mikesten
@mikesten

What’s going on with the bit after the grep?

As a whole unit, ‘sort | uniq -c | sort -nr’ means “Give me a tabulated count of all the lines I feed in here, ordered by reverse frequency”.

Taking it apart what happens is this:

First we feed the results into sort. This, err, sorts them (in dictionary order). We then pass the output of that to uniq.

What uniq classically does is it removes all consecutive lines which are the same. So for example:

david@volcano-base ~/data/tweets/data/js/tweets $ <all_tweets.txt grep -o '@[[:alnum:]_]\+' | head | uniq
@benaud
@t_a_w
@gnufied
@t_a_w
@jherber
@mikesten

Note that it doesn’t remove all duplicates: Only adjacent ones. That’s why we had to sort first.

Adding the -c flag causes it to count up those adjacent lines:

david@volcano-base ~/data/tweets/data/js/tweets $ <all_tweets.txt grep -o '@[[:alnum:]_]\+' | head | uniq -c
      1 @benaud
      2 @t_a_w
      2 @gnufied
      1 @t_a_w
      2 @jherber
      2 @mikesten

So between the sorting and the counting, this gives us our tallies.

We then sort again to put it in order. However we don’t want to sort numbers by their string value (this would put e.g. 2 after 11), so the -n flag to sort tells it to sort numerically (I don’t actually know what it does with the rest of the text. I think it just pulls off the first number and uses that). This would put it in ascending numerical order, so the -r flag reverses that.

OK. So, the text is actually a more reliable guide than the metadata if I just want to know who I’m tweeting at. And I do. So lets get that.

david@volcano-base ~/data/tweets/data/js/tweets $ <all_tweets.txt grep -o '@[[:alnum:]_]\+' | sort | uniq -cd | sort -nr | head
    793 @angusprune
    641 @LovedayBrooke
    503 @petermacrobert
    412 @pozorvlak
    360 @stef
    339 @bluealchemist
    322 @drsnooks
    313 @reyhan
    271 @communicating
    239 @zarkonnen

I’m not sure if that’s the exact answer I would have expected, but it’s not terribly surprising.

Now I’d like to take a look at what words I use.

In order to do this I’ll use grep again to extract words from the all_tweets.txt file:

david@volcano-base ~/data/tweets/data/js/tweets $ grep -o '\b[[:alpha:]]\+\b' all_tweets.txt | head
Yay
for
google
whoring
I
get
more
and
more
sick

The ‘\b’ special character there is the only thing new in this (also [:alpha:], but that just means any alphabet character). ‘\b’ means word boundary.

One thing stands out here: There are going to be a lot of really uninteresting words in this, like “I” and “more” and “and”. In natural language processing, these are typically called stop words. We want to remove those from the calculation.

So what do we do?

Well, first we find a list of stop words. Some brief googling led me to this file, which seems to be good enough. Let’s fetch it:

david@volcano-base ~/data/tweets/data/js/tweets $ wget https://stop-words.googlecode.com/svn/trunk/stop-words/stop-words/stop-words-english1.txt
--2013-03-03 13:01:39--  https://stop-words.googlecode.com/svn/trunk/stop-words/stop-words/stop-words-english1.txt
Resolving stop-words.googlecode.com (stop-words.googlecode.com)... 2a00:1450:400c:c05::52, 173.194.78.82
Connecting to stop-words.googlecode.com (stop-words.googlecode.com)|2a00:1450:400c:c05::52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4812 (4.7K) [text/plain]
Saving to: `stop-words-english1.txt'
 
100%[=====================================================================================================================================================================================>] 4,812       --.-K/s   in 0.001s  
 
2013-03-03 13:01:39 (8.29 MB/s) - `stop-words-english1.txt' saved [4812/4812]

Now! An annoying bit.

What came next was mysteriously not working for me. It turns out that there were some problems with this file.

The first reason it was not working was that the presence of a BOM (Byte Order Mark) at the beginning of this file was confusing poor grep. So first we have to strip the BOM like so:

david@volcano-base ~/data/tweets/data/js/tweets $ tail -c+4 stop-words-english1.txt | sponge stop-words-english1.txt

-c works like -n except that it functions on characters (really, bytes) instead of lines. So what we’re doing here is stripping out the first three bytes of the file, which are the BOM.

The second reason it was not working was to do with line endings.

Basically, line endings are actually special characters which say “Here! Start a new line!”. Unfortunately there is some disagreement as to just what those special characters are. Unix considers it to be a single newline character, traditionally represented as ‘\n’, whereas Windows (and the web) consider it to be two characters, ‘\r\n’ – the first being a carriage return, meaning “Go back to the beginning of the line”.

We’re going to be feeding these lines to grep to use as patterns, and grep is of the unix, so it wants its new lines to be pure ‘\n’ and will consider ‘\r’ to be part of the pattern to be matched on. So we need to remove those:

david@volcano-base ~/data/tweets/data/js/tweets $ sed -i 's/\r//' stop-words-english1.txt

The pattern we’re giving sed says “replace the first carriage return on each line with an empty string”. The -i flag means “And then replace the file with the results of this”. Normally it would write the results to the console.

Some poking at the file caused me to realise it also doesn’t consider “I” to be a stop word. I don’t know why. Lets fix that:

david@volcano-base ~/data/tweets/data/js/tweets $ echo i >> stop-words-english1.txt
david@volcano-base ~/data/tweets/data/js/tweets $ tail stop-words-english1.txt
you'd
you'll
your
you're
yours
yourself
yourselves
you've
zero
i

General lesson here: Real world data is often messy. If your code isn’t working make sure the bug isn’t in your data.

Now our stop words are ready to use. Lets add them to the git repo:

david@volcano-base ~/data/tweets/data/js/tweets $ git add stop-words-english1.txt
david@volcano-base ~/data/tweets/data/js/tweets $ git commit stop-words-english1.txt -m "File containing stop words"
[master bec4115] File containing stop words
 1 file changed, 636 insertions(+)
 create mode 100644 stop-words-english1.txt

We’re now prepared to use the stop word list:

david@volcano-base ~/data/tweets/data/js/tweets $ grep -o '\b[[:alpha:]]\+\b' all_tweets.txt | grep -xvi -f stop-words-english1.txt | head
Yay
google
whoring
sick
Java
passing
day
Impressive
m
writing

grep -f means “Use the lines from this file as patterns and match on any of them”

The flag -v as before means “invert the match”, i.e. only give us things that don’t match the pattern. -i tells it to match case insensitively and -x tells it to only match things which match the whole line (So e.g. the fact that we have a single character ‘i’ in the list shouldn’t exclude words containing i).

Annoyingly the word m is still coming through. Rather than add every single character word to the stop words, lets just change our search to only match words of three letters or longer:

david@volcano-base ~/data/tweets/data/js/tweets $ grep -o '\b[[:alpha:]]\{3,\}\b' all_tweets.txt | grep -xvi -f stop-words-english1.txt | head
Yay
google
whoring
sick
Java
passing
day
Impressive
writing
Unleashing

Replacing the ‘\+’ with ‘\{3,\}’ has made the pattern only match words which are at least 3 characters long.

Now, lets actually get the answer:

david@volcano-base ~/data/tweets/data/js/tweets $ grep -o '\b[[:alpha:]]\{3,\}\b' all_tweets.txt | grep -xvi -f stop-words-english1.txt | sort | uniq -c | sort -nr | head
   1573 http
   1134 don
    901 people
    826 good
    794 angusprune
    641 LovedayBrooke
    610 time
    525 work
    504 petermacrobert
    484 bit

Err. Clearly there are some problems here.

First problem is that fairly obviously some of these are @replies showing up as words. Apparently word boundary (‘\b’) doesn’t mean what I thought it did. It’s also clearly extracting things like http from http:// and don from don’t.

Here’s my replacement solution:

david@volcano-base ~/data/tweets/data/js/tweets $ grep -o '[^[:space:]]\{3,\}\b' all_tweets.txt | grep -v '@' | grep -xvi -f stop-words-english1.txt | sort | uniq -c | sort -nr | head -n25
    887 people
    810 good
    600 time
    519 work
    443 problem
    361 code
    350 bad
    335 Yeah
    327 idea
    300 pretty
    300 point
    287 bit
    285 day
    279 find
    278 lot
    275 wrong
    250 coffee
    246 hard
    229 read
    224 feel
    221 thought
    216 twitter
    197 today
    191 long
    190 works

Basically instead of looking for purely alphabetic words we look for any sequences of non whitespace characters. We then filter out things with @ in them afterwards.

The results here turn out to be… really uninteresting. About the only things that look remotely specific to me on here are “coffee” and “code”, both of which it’s true I do care about quite a lot. Mmm. Tasty, tasty code.

Before I go, here’s one more thing:

david@volcano-base ~/data/tweets/data/js/tweets $ cat *.js | jq -r '.[] | .entities.urls[] | .expanded_url'  | head
http://ncommandments.com/40
http://twitpic.com/43pl3t
http://twitpic.com/43nccl
http://ncommandments.com/893
http://ncommandments.com/42
http://twitpic.com/4dnzhr
http://yfrog.com/h2797xpj
http://twitpic.com/4a1t1z
http://twitpic.com/49yzs2
http://twitpic.com/49euse

Every URL you’ve ever posted to twitter (well, it would be without the head at the end to truncate it). Just more chaining of jq filters – we unpack the arrays, then we get the URLs off the object as an array, then we unpack that, then we get the .expanded_url off the url objects.

And that’s about it for now. I can’t think of anything else I particularly want to do. Time line analysis might be interesting – i.e. what’s changed over time (particularly in terms of who I tweet at), but I’m not very interested in doing that right now so I think I’ll leave it there.

Questions? Anything you’d like know how to do?


Now syndicating drmaciver.com to twitter

I realised it was a bit silly that I was tweeting my Imbibliotech posts to Twitter and not my drmaciver.com posts, given that my twitter account is much more strongly associated with this online identity than with Imbibliotech.

One solution would be to tweet neither of course, but that would deny me a fabulous opportunity for self-promoting tooting of my own horn, so we can’t have that.

So hopefully starting with this post I’ll be tweeting new drmaciver.com posts. Hopefully this shouldn’t be too annoying, and if it is, well, I blame the internet for having apparently decided that RSS is inferior to proprietary self-promotion platforms. Those of us who continue to do it right will get punished by seeing it in both mediums of course, but such was always the way.
