No fail-whale purgatory for us—debunking the twitpocalypse
June 13, 2009
The interwebs are all a-twitter with speculation about a nefarious Twitter bug that could soon doom us all to fail-whale purgatory. The truth, however, is that the so-called Twitpocalypse will probably not do much damage at all.
Each message that is posted to the popular Twitter microblogging service is automatically assigned a unique numerical ID that distinguishes it from other messages. The ID values for new messages are getting higher as the volume of total messages posted to the service increases. The premise behind the Twitpocalypse scare is that the ID numbers will soon exceed 2,147,483,647, which is the highest value that can fit in a 32-bit signed integer. In theory, when Twitter ID values exceed that number, the resulting integer overflow will cause things to break. At the current rate that users are posting Twitter messages, the ID numbers will reach that value by the end of the night.
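To make the overflow concrete, here is a minimal sketch that simulates what commonly happens when a value past 2,147,483,647 is forced into a 32-bit signed field. The helper name as_int32 is our own invention for illustration, and the wraparound shown is the typical two's-complement result; the actual behavior in a given client depends on its language and platform.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, the largest 32-bit signed value


def as_int32(value: int) -> int:
    """Simulate storing `value` in a 32-bit signed integer.

    Hypothetical helper: keeps only the low 32 bits and reinterprets
    them as a two's-complement signed number.
    """
    return (value + 2**31) % 2**32 - 2**31


last_safe_id = INT32_MAX
first_unsafe_id = INT32_MAX + 1

print(as_int32(last_safe_id))     # still fits, value unchanged
print(as_int32(first_unsafe_id))  # wraps around to a negative number
```

A client that ends up with a negative or otherwise mangled ID would then fail to look up or reply to the right message, which is the kind of breakage the scare is about.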
The Twitter service itself is designed to accommodate higher numbers and is not susceptible to the problem. In practice, it’s only going to affect a small number of the most poorly written third-party Twitter programs. Many modern high-level programming languages completely insulate programmers from these kinds of problems and use abstract numerical types that don’t have the same kind of hard size limitations.
For software that is written in lower-level programming languages, it’s still unclear whether the Twitpocalypse is going to be problematic. Third-party Twitter applications get their data from Twitter through the service’s APIs. If a Twitter client application is retrieving a message stream as XML, then the program will parse the XML, pull out all of the values as strings, and then convert them to other types as needed. Hopefully, most third-party Twitter software developers are smart enough to use appropriate types for the ID values.
For the vast majority of desktop clients, there is no need to treat the ID value as an integer. In fact, we suspect that most clients that are using Twitter’s XML feed store the ID values internally as strings. The only major use for the ID on the client side is to specify which message a reply is directed to via the Twitter API’s in_reply_to_status_id parameter. This is sent as part of the URL string, so it doesn’t need to be an integer for that.
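As a rough sketch of the string-only approach described above, the following parses a simplified, made-up XML snippet (not a verbatim Twitter response), keeps each ID as a string, and drops it into a query string. The function names and the status parameter are assumptions for illustration; only in_reply_to_status_id comes from the discussion above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Simplified, hypothetical stand-in for an XML timeline response.
SAMPLE_XML = """<statuses>
  <status>
    <id>2147483648</id>
    <text>First post past the 32-bit limit</text>
  </status>
</statuses>"""


def extract_status_ids(xml_text):
    """Return every status ID as a string, with no integer conversion."""
    root = ET.fromstring(xml_text)
    return [status.findtext("id") for status in root.iter("status")]


def build_reply_query(reply_text, in_reply_to_id):
    """Build the query string for a reply; the ID is passed through as text."""
    return urlencode({
        "status": reply_text,  # assumed parameter name for the message body
        "in_reply_to_status_id": in_reply_to_id,
    })


ids = extract_status_ids(SAMPLE_XML)
query = build_reply_query("@someone thanks", ids[0])
print(query)
```

Because the ID never passes through a fixed-width integer type, its size is irrelevant to a client written this way.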
The other major data interchange format that is supported by Twitter’s API is JSON. Unlike XML, JSON does differentiate between string and number types. Twitter’s JSON timeline specifies that the ID value is a number, so it is likely treated as such by programs. Most third-party Twitter software that uses JSON, however, will rely on a canned JSON library rather than a hand-rolled JSON parser. This means that it will be up to the library to take the JSON content and determine how to handle the type conversion. This is where we start to see some potential problems. We suspect that most JSON libraries handle large numbers appropriately, but some might not.
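For a sense of what "handling large numbers appropriately" looks like, here is a small sketch using the JSON module from Python's standard library, whose integers are not limited to 32 bits. The payload is a made-up fragment, not a real Twitter response, and other libraries in other languages may behave differently.

```python
import json

INT32_MAX = 2**31 - 1

# Hypothetical JSON fragment with an ID one past the 32-bit signed limit.
payload = '{"id": 2147483648, "text": "hello"}'

# Default behavior: the number is parsed into an arbitrary-precision int.
parsed = json.loads(payload)
print(parsed["id"] > INT32_MAX)

# Defensive alternative: ask the parser to hand back integers as strings,
# so the ID is never subject to any numeric size limit downstream.
as_text = json.loads(payload, parse_int=str)
print(type(as_text["id"]).__name__)
```

A library that instead maps every JSON number onto a 32-bit integer would be the weak link, regardless of how carefully the client code around it is written.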
All things considered, we doubt that we will see a lot of third-party Twitter software breaking down as a result of this issue.
This post was written by Ryan Paul on June 12, 2009, 5:03 PM, courtesy of arstechnica.com.