Tuesday, May 12, 2015

Security: Hacking Wordpress/PHP, Ruby Comparisons, and Lessons Learned

Another US-CERT alert went out for Wordpress last week.  I'm not shocked.  I call Wordpress one of the biggest hacker honeypots around because it keeps popping up in cybersecurity news.  Yet it's not fair to be subjective.  To be more objective, I'll take a look at verified software vulnerabilities to assess what's happening under the hood.

Part 1: Lots of Vulnerabilities


I went back to MITRE's CVE database again (NIST's mirror isn't as user friendly) to compare how many vulnerabilities have been occurring in Wordpress and PHP -- the core language behind Wordpress -- versus another popular web development language, Ruby (and Ruby on Rails).  See my previous post on CVEs and preventing software vulnerabilities for using these databases.  Anyways, a simplistic comparison of all vulnerabilities found in Ruby versus PHP is staggering:

CVE Total Counts

$ curl -s "https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=ruby" | grep "CVE" | wc -l
288
$ curl -s "https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=php" | grep "CVE" | wc -l
5812

For the studious: since Ruby doesn't really have a good equivalent to Wordpress, the fairer comparison for Wordpress is Ruby on Rails, and the Rails CVE counts are not on the same scale either.  Here are the CVE dumps for those too:

$ curl -s "https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=rails" | grep CVE | wc -l
122
$ curl -s "https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=wordpress" | grep CVE | wc -l
951
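One caveat on the counting method: grepping for "CVE" counts every line that mentions the string, including page titles and navigation, so the totals are rough upper bounds. A minimal Python sketch of counting distinct CVE identifiers instead is below. The sample text is a made-up excerpt with placeholder IDs, not real MITRE output, and the function name is hypothetical.

```python
import re

# CVE identifiers look like CVE-YYYY-NNNN, with four or more digits in the last part.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")


def count_distinct_cves(page_text):
    """Count distinct CVE identifiers appearing anywhere in the text."""
    return len(set(CVE_ID.findall(page_text)))


# Hypothetical excerpt of a search results page (placeholder IDs, not real entries).
sample = """
<title>CVE - Search Results</title>
<a href="/cgi-bin/cvename.cgi?name=CVE-2099-0001">CVE-2099-0001</a>
<a href="/cgi-bin/cvename.cgi?name=CVE-2099-12345">CVE-2099-12345</a>
<p>About CVE</p>
"""

# What `grep CVE | wc -l` would report for the same text.
lines_with_cve = sum(1 for line in sample.splitlines() if "CVE" in line)

print("lines mentioning CVE:", lines_with_cve)
print("distinct CVE IDs:", count_distinct_cves(sample))
```

On this sample the line count is twice the number of distinct IDs, which is the kind of inflation to keep in mind when reading the totals above. The relative gap between PHP and Ruby is large enough that it likely survives the correction.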

It should be noted -- especially for the skeptical -- that PHP and Ruby have some relevant differences.  PHP version 1.0 was released in June 1995 while Ruby 1.0 was released in December 1996, so PHP is only slightly older than Ruby.  The CVE dumps above are for all known time, so I don't think their age explains the difference in quantity (and the CVE list only goes back to 1999 anyways).  I chose to compare these two languages instead of solutions built on top of them, like Wordpress, because they are both popular, both interpreted languages, and both end up on the front line of cybersecurity -- running public websites.

Adoption and Presence


A coworker sent me over to BuiltWith for an unrelated comparison of website software, so I thought it would be a good tool for digging deeper into whether software adoption is related to the quantity of vulnerabilities.  BuiltWith is a service that scrapes online footprints to determine which software runs public websites -- everything from operating systems to content management systems like Wordpress.  It's pretty nifty.  I asked BuiltWith what the adoption rates look like for PHP and Ruby on Rails (there was no pure Ruby data, which makes sense for BuiltWith's purpose):

Framework Usage


Essentially, PHP pops up 30 times more often in websites than Ruby, but PHP usage is essentially stable while Ruby's is on the rise; so it's worth another look at whether the frequency of vulnerabilities is related to usage or adoption.  Here are the vulnerabilities found over time and the basic trend in each language:

CVE Counts per Year

Language    2012    2013    2014    2015 Forecast    Linear Regress (CVEs/year)
PHP          171     127     150            128.3                         -10.5
Ruby          33      75      27             39.0                          -3.0
* Note: the 2015 forecast comes from the FORECAST function and the linear regression column from SLOPE, both in Google Sheets and both fitted to the 2012-2014 counts.
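For anyone who wants to check the trend figures without a spreadsheet, here is a minimal Python sketch that fits an ordinary least-squares line to the 2012-2014 counts and extrapolates it to 2015. The function name is hypothetical. With only three data points per language, the slope and the forecast are indicative at best.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


years = [2012, 2013, 2014]
counts = {"PHP": [171, 127, 150], "Ruby": [33, 75, 27]}

for language, yearly in counts.items():
    slope, intercept = fit_line(years, yearly)
    forecast_2015 = slope * 2015 + intercept
    print(f"{language}: slope {slope:.1f} CVEs/year, 2015 forecast {forecast_2015:.1f}")
```

Run against the counts in the table, this gives a slope of -10.5 and a 2015 forecast of about 128.3 for PHP, and a slope of -3.0 and a forecast of 39.0 for Ruby, matching the last two columns.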

Contrast the vulnerabilities trend in Ruby -- a decreasing rate of occurrence -- with the adoption rate of Ruby according to BuiltWith -- a ~33% annual growth.  Of course, there are many reasons for finding vulnerabilities: a large adoption may mean more hackers want to target a larger victim population, or popularity could drive up the number of bugs in an effort to meet many (possibly insecure) feature requests.  To play Devil's Advocate, PHP does indeed have a decreasing rate of vulnerabilities too, but its adoption isn't growing the way Ruby's is.  So does a 20-fold difference in vulnerabilities between PHP and Ruby get explained away by their sheer online presence?

Part 2: Writing Bugs


Instances and rates of vulnerabilities don't capture the severity of each vulnerability, or the ease of a hacker exploiting the software, or the laziness of programmers.  (CVSS does focus our attention, though.)  Also, cybersecurity news only gives us superficial recommendations like "security patch available, update now!" instead of digging deeper into buggy software.  At most, we may read that an "XSS" or "SQL injection" vulnerability was found -- as if those phrases lend some kind of meaning to our decision to use one software over another.  I decided to dive deeper into one of these XSS & SQL injection Wordpress bugs, similar to the one that caused the US-CERT alert issued last week, and found some disturbing practices in PHP programming and some ignored best practices.

One recent vulnerability in Wordpress came down to programmers being too lazy to scrub data input.  Something as seemingly benign as a Wordpress forum was exploited by submitting HTML into the comment field; and because Javascript can be embedded into any valid HTML data stream, the PHP server parsed and rendered back to the web browser client whatever had been submitted as a comment.  The marvel of this hack is that Wordpress comment moderators are often Wordpress administrators who have logged in with unlimited access to the site, so malicious Javascript embedded in a comment field on their web browsers would gain unauthorized, elevated privileges to execute against the entire website.  The studious guy who found this vulnerability made a horrific demo of Javascript uploading content to the PHP server without the moderator's/administrator's knowledge.  That demo leverages cross-site scripting (XSS) to upload the content but it could have injected malicious SQL into Wordpress.

I've respected the programmer's motto of being lazy and keeping things simple but after seeing this hack I wonder: have we become too lazy?  Apparently the Wordpress comment functionality above wasn't unknown to developers.  Wordpress designed the comment form this way as a feature -- Wordpress users wanted to "texturize" comments with options like italicized fonts, embedded hyperlinks, etc. so Wordpress developers enabled HTML parsing of comments by the PHP engine.  They were actually doing a kind of data filtering of the comment field but did not thoroughly sanitize it!

In the first chapter of his concise work PHP Security, Shiflett's rationale for bringing up thorough data scrubbing at the very start of his book harkens back to the CVE reports I've cited before:

The vast majority of security vulnerabilities in popular PHP applications can be traced to a failure to filter input. (pg. 21)

A bit of Googling on problems with PHP programmers filtering data returned some disturbing practices.  For example, PHP had come up with a global requirement for quoted data to be escaped to prevent SQL injections but Shiflett notes that in reality this caused complications that encouraged programmers to fall back to merely stripping data of quotes or slashes instead of checking for valid data.  When this global quoting requirement troubled one PHP programmer, not a single suggestion included either checking for valid data or using best practices in filtering data.  Shiflett suggests using standard data sanitation functions in PHP, including htmlentities() for interacting with front-end data (and its exhaustive ENT_QUOTES parameter), mysql_real_escape_string() for back-end data, etc.  Why were these functions not included in answers for the PHP programmer having issues with quoted data?
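To make the two halves of that advice concrete, here is a rough Python analogue, not Shiflett's code and not PHP. It assumes html.escape() as a stand-in for htmlentities() with ENT_QUOTES (it escapes fewer characters, but covers the HTML metacharacters and both quote styles), and bound query parameters as a stand-in for mysql_real_escape_string(). The table name and sample strings are made up.

```python
import html
import sqlite3

# Front-end side: escape data on its way into an HTML page.
comment = "<script>alert('xss')</script>"
safe_for_html = html.escape(comment, quote=True)
print(safe_for_html)

# Back-end side: hand user data to the database as a bound parameter
# instead of stripping quotes or concatenating it into the SQL string.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (body TEXT)")
hostile = "nice post'); DROP TABLE comments; --"
db.execute("INSERT INTO comments (body) VALUES (?)", (hostile,))

# The hostile text is stored verbatim as data and never runs as SQL.
stored = db.execute("SELECT body FROM comments").fetchone()[0]
print(stored == hostile)
```

Neither step strips anything from the input. The data stays intact and is encoded for the context it is headed into, which is the opposite of the strip-the-slashes habit described above.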

To go back to the above case of the Wordpress vulnerability, what constitutes "thorough" data sanitization?  In the strictest sense, the Wordpress fix required not allowing HTML input from the user that hadn't been predefined.  To keep that feature intact, Wordpress developers had done data filtering that only allowed a subset of HTML to be valid input, but that kind of sanitization was undermined by forgetting another data sanitization step: checking for valid data size.  The hack demo'ed above, after all, leveraged a very lengthy comment field.  So the vulnerability came down to incomplete or superficial data sanitization.  If I extrapolate these mishaps to web application security for Ruby, then there isn't a magic pill.  Ruby or Rails might have more programmatic means for data sanitization but the same fundamental process applies.  (I'll leave Ruby data sanitization for a follow-up blog.)  Generally speaking then, I don't buy the excuse that vulnerabilities are merely a lack of forethought.  It gets back to how we program.

Preventing Bugs


One of my coworkers boiled this entire blog down to "bad coding" but I think that is laziness itself speaking on our behalf.  The takeaway I got from writing about this XSS and SQL injection bug is the latent, perennial problem of feature requests over best practices.  Allowing texturized comments was valued more than thorough data sanitization.  In the PHP / Wordpress examples above, data sanitization was needed for:

  1. String fields of maximum length
  2. String data input of only predefined HTML tags

More generally, the development lifecycle should include a systematic means of exhaustively scrubbing data input and output.
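The two checks in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Wordpress fix: the size limit, the tag allowlist, and all names are hypothetical, and the sketch rejects bad input outright where a real application would more likely lean on a maintained sanitizer library.

```python
from html.parser import HTMLParser

MAX_COMMENT_BYTES = 4096                          # hypothetical storage limit
ALLOWED_TAGS = {"a", "b", "em", "i", "strong"}    # hypothetical allowlist
ALLOWED_ATTRS = {"a": {"href"}}                   # attributes permitted per tag


class TagAuditor(HTMLParser):
    """Collects every tag or attribute that falls outside the allowlist."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"tag <{tag}> is not allowed")
            return
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                self.problems.append(f"attribute {name} on <{tag}> is not allowed")
            elif name == "href" and not (value or "").lower().startswith(
                ("http://", "https://")
            ):
                self.problems.append("href must be an http(s) URL")


def validate_comment(text):
    """Return a list of problems; an empty list means both checks passed."""
    # Check 1: size, in bytes, assuming the storage limit is byte-based.
    if len(text.encode("utf-8")) > MAX_COMMENT_BYTES:
        return ["comment exceeds the maximum length"]
    # Check 2: only predefined tags and attributes.
    auditor = TagAuditor()
    auditor.feed(text)
    auditor.close()
    return auditor.problems


print(validate_comment('<em>nice</em> <a href="https://example.com">post</a>'))
print(validate_comment('<a href="https://example.com" onmouseover="evil()">x</a>'))
```

The size check runs first on purpose: an oversized comment is rejected before any HTML parsing, so the tag filter never sees input that would later be cut off in storage.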

Programmers have many kinds of testing available, and unit tests are a good kind to write in-flight, alongside the code.  For every input or output operation, we should iterate over data that abuses the interface to determine whether more data sanitization is needed.  This leads to one characteristic where Ruby and PHP differ.  Ruby ships with a standard unit testing library out of the box, whereas PHP programmers must choose and install one of several frameworks to start writing unit tests.
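As an illustration of iterating over abusive data, here is a small sketch using Python's built-in unittest module (Python here only as a neutral stand-in; the same shape works in Ruby's or PHP's test frameworks). The function under test, its length limit, and the list of hostile inputs are all hypothetical.

```python
import html
import unittest

MAX_LENGTH = 500  # hypothetical limit


def render_comment(text):
    """Hypothetical function under test: reject oversized input, escape the rest."""
    if len(text) > MAX_LENGTH:
        raise ValueError("comment too long")
    return html.escape(text, quote=True)


# Inputs that deliberately abuse the interface.
ABUSIVE_INPUTS = [
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    '" onmouseover="alert(1)',
    "'; DROP TABLE comments; --",
]


class RenderCommentTest(unittest.TestCase):
    def test_markup_is_neutralized(self):
        for text in ABUSIVE_INPUTS:
            with self.subTest(text=text):
                rendered = render_comment(text)
                for char in "<>\"'":
                    self.assertNotIn(char, rendered)

    def test_oversized_input_is_rejected(self):
        with self.assertRaises(ValueError):
            render_comment("a" * (MAX_LENGTH + 1))


# Run the suite in-process so the result can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RenderCommentTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The list of abusive inputs is the part worth growing over time: every new bug report or CVE write-up becomes one more entry, and the test loop stays the same.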

These lessons learned aren't news.  Even good Ruby web applications will need to be written with the idea of systematically preventing bugs and weaknesses by exhaustively testing data sanitization.  MITRE's Common Weakness Enumeration (CWE) database is blatantly sarcastic about its best practices being ignored by programmers, as attested by its reports.  (See CWE-79 and CWE-434.)  To be fair, MITRE's recommendations can be harsh, like not trusting any input, even a PNG sourced by HTML, so some of the lessons boil down to classic security debates about functionality.  And to be honest, I haven't typically tested for unexpected data, have only done the minimum in leveraging programmatic data filtering, and I've never included unit tests in my own programs.  These are all recommendations I make knowing that being lazy isn't an excuse.  What is left unknown is whether PHP programmers are lazier than Ruby programmers.  :)

