character set

Getting Hacked


Recently one of my clients’ sites got hacked.

I went to update something on the site and discovered that the page title was a long string of code, and other odd characters had popped up in the site content.

It’s a WordPress site, so I quickly logged into the admin area and went to the site settings. The “Site Title” had been changed to a long string of code. Since the site title shows up on every page of the site, it was ruining every page. This was a site where SEO had recently become critical due to changes in Google’s algorithms, and the client was watching the traffic stats every day.

I quickly replaced the hacker’s code in the site title and moved on to the rest of the site.  

I started looking at pages (there are a couple thousand pages on this site). I was seeing this character on various pages: Á. I was puzzled as to why hackers would do something like that, but on one page I went into the editor and deleted all of them I could find. Easy enough. Then I hit the Update button. Disaster! Everything following the spot where I had deleted one of the stray characters was gone! Before panic set in, I tried to fix a 2nd page. Arrgh! The same result. I moved on to a 3rd page, but this time copied the HTML to Dreamweaver and made the fixes there. I copied it all back and clicked Update. Success! I updated a few more pages, and fixed the first 2 that I had broken with text from a recent database backup I thankfully had. Whew!

Over the next few days I noticed more and more stray characters and odd code that had crept in. What could be causing this?! After slogging through fixing a dozen or so pages, I got the bright idea to google that character and found info that led me to check the character set. Not something you normally have to mess with in WordPress, but a quick view of the source code told me that the character set had been changed to UTF-7!
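For anyone who wants to run the same check without reading the page source by eye, here is a minimal Python sketch. The sample markup and the function name `find_declared_charset` are made up for illustration; they are not from the hacked site. A real page can also declare its charset in other places (an HTTP header, for instance), which this does not cover.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
CHARSET_PATTERN = re.compile(
    r"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_\-]+)",
    re.IGNORECASE,
)


def find_declared_charset(html):
    """Return the charset named in the first <meta> declaration, or None."""
    match = CHARSET_PATTERN.search(html)
    return match.group(1).upper() if match else None


# Hypothetical page source, standing in for "View Source" in the browser.
sample_page = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-7" />
<title>Example</title>
</head><body>Hello</body></html>"""

declared = find_declared_charset(sample_page)
print(declared)
if declared != "UTF-8":
    print("Warning: page does not declare UTF-8")
```

Anything other than UTF-8 coming back from a check like this on a WordPress site is worth a closer look.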

OK, so what is the character set? It’s a short line of code in the page that tells your browser which character encoding to use when turning the page’s raw bytes into text. Different characters are needed if you want your site to display content in Japanese, Russian, or Greek, and many older encodings each covered only a few languages. For the web, the standard is UTF-8, which covers all of them and is the default in WordPress. (UTF stands for Unicode Transformation Format.)
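To see why the declared encoding matters, here is a small Python sketch that reads one piece of text back under different encodings. It uses only Python’s built-in codecs. The sample string is made up, and Latin-1 is used purely as a stand-in for "some wrong encoding"; the exact stray characters a browser shows depend on the bytes and on the encoding it ends up using.

```python
# A non-breaking space (U+00A0) and a bullet (U+2022) around ordinary text.
text = "A\u00a0B \u2022 C"

utf8_bytes = text.encode("utf-8")

# Read with the right encoding: the text comes back intact.
print(utf8_bytes.decode("utf-8") == text)

# Read the same bytes as Latin-1: each multi-byte character turns into
# several unrelated characters, including stray accented letters.
as_latin1 = utf8_bytes.decode("latin-1")
print(repr(as_latin1))

# UTF-7 writes non-ASCII characters as "+...-" runs of base64, so its
# bytes look nothing like the UTF-8 version of the same text.
utf7_bytes = text.encode("utf-7")
print(utf7_bytes)
```

The text never changes here, only the rule used to read its bytes, which is why switching the declared character set can scramble a whole site without touching the content.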

So the dang hackers had somehow changed the character set to UTF-7!

It only took me a few minutes to go into my database and change the character set back to good ole UTF-8, and presto! All the stray characters had vanished. Turns out that “&nbsp;” (the HTML code for a hard space) was showing up as the Á character under UTF-7. Other codes, such as the one for a bullet (•), were also being rendered incorrectly.
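The fix itself amounts to changing one stored setting. The sketch below shows the idea in Python against a throwaway in-memory SQLite table, not a real WordPress database. It assumes a `wp_options`-style table with `option_name` and `option_value` columns and a `blog_charset` row, which is where a default WordPress install keeps this setting. On a live site the table prefix may differ and the database is MySQL or MariaDB, so treat this as an illustration of the steps rather than a script to run as is. The helper names `get_charset` and `reset_charset` are hypothetical.

```python
import sqlite3

# Stand-in for the site's database (assumption: real sites use MySQL/MariaDB).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_options (option_name TEXT PRIMARY KEY, option_value TEXT)"
)
# Simulate the tampered setting described in the post.
conn.execute(
    "INSERT INTO wp_options VALUES (?, ?)", ("blog_charset", "UTF-7")
)


def get_charset(db):
    """Read the stored charset setting, or None if the row is missing."""
    row = db.execute(
        "SELECT option_value FROM wp_options WHERE option_name = ?",
        ("blog_charset",),
    ).fetchone()
    return row[0] if row else None


def reset_charset(db, charset="UTF-8"):
    """Overwrite the stored charset setting and commit the change."""
    db.execute(
        "UPDATE wp_options SET option_value = ? WHERE option_name = ?",
        (charset, "blog_charset"),
    )
    db.commit()


before = get_charset(conn)
reset_charset(conn)
after = get_charset(conn)
print(before, "->", after)
```

Whatever tool you use for the real thing, take a fresh database backup before changing anything by hand.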

Yay!  Useful lesson learned.

Then it was back to trying to secure my site against future attacks… That’s for another posting.